Data modeling is the process of defining datapoints and structures at a detailed or abstract level to communicate information about the data shape, content, and relationships to target audiences. Data models can be focused on a very specific universe of discourse or an entire enterprise’s informational concerns. The final product for a data modeling exercise varies from a list of critical subject areas, an entity-relationship diagram (ERD) with or without details about attributes, or even a data definition language (DDL) script containing all the SQL commands to build a set of physical structures within some chosen database management system (DBMS).
Use of ERDs
Target audiences for these deliverables may be solution architects, solution developers, business users, executives, or database administrators (DBAs). ERDs express entities and entity-to-entity relationships. Entities are the data objects in focus, where examples may include Customer, Order, Invoice, Address, Phone, and so forth, and attributes are the descriptors of the entity. As an example, a Customer entity may contain attributes such as Customer First Name, Customer Last Name, Customer Credit Rating, and so on. Entities and attributes are considered the logical idea, so the table and column become the physical equivalents.
Because of their focus, ERDs are a pervasive tool used in the data modeling process. ERDs are assembled based on sets of data structural approaches, including normalized, dimensionalized, data vault, or something else. Each structural approach groups the attributes in differing ways. Similarly, ERD tools may have a variety of symbols,, since the data modeling industry has not achieved standardization, and these notations include Crow’s Foot, IDEF1X, Barker, and others.
Levels of Data Models
There are three generally agreed-upon levels of data models that may be attempted: conceptual, logical, and physical. However, opinions on what constitutes each of these levels, and who their targeted audiences are, varies based on schools of thought. Conceptual data models communicate a high-level perspective into the idea of the data under discussion. Because of this high-level nature, conceptual data models often lack much of the detail found in other data model types. For an enterprise perspective, the data model may be a list of data subject areas. A conceptual data model for a focused solution may result in a draft of an ERD, minus the attributes, or even a diagram not following any standard but conveying the idea. In some cases, an unattributed ERD may be looked at as the logical data model. At the other end of the continuum, a logical data model will have every piece of detail fleshing out the entities, all attributes, data types, optionality, definitions, and only stop at things that are a part of the implementation on the chosen DBMS.
Physical data models, in their own quiet way, are the most controversial data modeling component. Every DBMS operates in a unique fashion. Consequently, many details are unique to a given DBMS. ERD tools offer an agreed-upon-fiction in representing a “physical ERD.” Each tool may offer unique ways of presenting some physical-only characteristics. One may be able to flag entities/tables or attributes/columns as logical-only or physical-only. But ultimately, many necessary facets are settings buried within the tool, not represented in the diagram.
These hidden elements are only seen as clauses and keywords inside the DDL script that is to be executed by the DBAs implementing the solution. Because of this, that final DDL script can be viewed as the actual physical model. Shortcomings are natural, as the diagram is called an “entity-relationship diagram,” not a “table-foreign key diagram.”
Data Modeling Is Still Young
Data modeling is still quite a young practice. Standard data modeling practices are only “standard” within a single organization, or even a single team within an organization. With the wide variety of possibilities, our data modeling youth still shows.