Normalization clusters data items together based on the functional dependencies among them. This normalized arrangement expresses the semantics of the business items being represented. Denormalization means that, for some reason, grouping the data strictly by functional dependencies has been ignored. Why might one cast aside reasonable designs for something less rational? There are many reasons for denormalizing.
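As a point of reference for the sketches that follow, a minimal, purely illustrative pair of normalized structures might look like this (the table and column names are hypothetical, not drawn from any particular model):

```sql
-- Illustrative only: each non-key column depends on its own table's key.
CREATE TABLE customer (
    customer_id     INTEGER      PRIMARY KEY,
    customer_name   VARCHAR(100) NOT NULL,
    customer_region VARCHAR(50)
);

CREATE TABLE sales_order (
    order_id     INTEGER       PRIMARY KEY,
    customer_id  INTEGER       NOT NULL REFERENCES customer (customer_id),
    order_date   DATE          NOT NULL,
    order_amount DECIMAL(12,2)
);
```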
First off, we have the very “tiny” reasons: Perhaps some specific query against two normalized structures takes an excessive amount of time. Maybe investigation of that query reveals that no indexing solution helps performance, but copying one or two data items from one structure onto the other, so that the query touches one less data structure, resolves the issue. Or perhaps a similar performance problem involving a calculated result is resolved by pre-calculating the value and instantiating it as a physical column.
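As a hedged sketch of those “tiny” changes, assuming the illustrative customer and sales_order structures above, the copied attribute and the pre-calculated result might simply become ordinary physical columns:

```sql
-- Illustrative only: copy customer_region onto sales_order so a frequent
-- query no longer needs the join back to customer.
ALTER TABLE sales_order ADD COLUMN customer_region VARCHAR(50);

-- Pre-calculate a derived value and store it as a physical column rather
-- than computing it in every query.
ALTER TABLE sales_order ADD COLUMN order_amount_with_tax DECIMAL(12,2);
```

The trade-off, of course, is that such columns must now be kept in sync (by ETL, triggers, or application code) whenever the source values change.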
We have made a few relatively minor changes that denormalize our designs but enhance the performance of important queries. Of course, those new to data modeling should remain conscious of these modifications, as the logical data model will not contain these new columns, but the physical data model will include these alterations and have them marked as “physical only” objects.
Next in line may be some slightly “larger” reasons for denormalization. Again, an important query or queries are not returning data fast enough for the solution being built. However, instead of simply tweaking designs by adding a column or two, more is necessary. The end result of these performance investigations may be a suggestion to create an entire summary structure of pre-processed and aggregated data, as sketched below.
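A minimal sketch of such a summary structure, again assuming the illustrative tables above (aggregation syntax such as DATE_TRUNC varies by platform):

```sql
-- Illustrative only: a pre-aggregated summary structure, refreshed on a
-- schedule rather than calculated at query time.
CREATE TABLE monthly_sales_summary AS
SELECT customer_id,
       DATE_TRUNC('month', order_date) AS order_month,
       COUNT(*)                        AS order_count,
       SUM(order_amount)               AS total_order_amount
FROM   sales_order
GROUP  BY customer_id, DATE_TRUNC('month', order_date);
```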
Or the suggestion may be to create a new structure that pre-joins two or more other structures. Because these options represent a more significant kind of change, one should first confirm that there actually is a performance problem when executing the solution, investigate further by running “Explain Plan” commands to see where the bottlenecks likely are, and only then work through options to relieve them.
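For example, one might first look at the plan and only then build the pre-joined structure. The exact command varies by platform (EXPLAIN PLAN FOR in Oracle, EXPLAIN or EXPLAIN ANALYZE in others), and the names below are illustrative:

```sql
-- Illustrative only: confirm where the time is going before denormalizing.
EXPLAIN
SELECT c.customer_name, o.order_date, o.order_amount
FROM   customer    c
JOIN   sales_order o ON o.customer_id = c.customer_id;

-- If the join itself proves to be the bottleneck, a pre-joined structure
-- is one option.
CREATE TABLE customer_order_flat AS
SELECT c.customer_id, c.customer_name, c.customer_region,
       o.order_id, o.order_date, o.order_amount
FROM   customer    c
JOIN   sales_order o ON o.customer_id = c.customer_id;
```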
Those who jump directly to denormalized solutions without even having a performance issue are likely weaker architects who have little faith in their tools. And again, an entire table could be flagged as “physical only” or, alternatively, one may have a data modeling tool that can mark two logical structures to be merged as they are transformed into a physical design.
On a larger scale, there are structural and architectural denormalizations. This means that a complete set of data structures will not be normalized. Staging or raw areas can be considered “denormalized” in that the “designers” of these areas simply reflect the data as it arrives rather than analyzing functional dependencies.
Structures may actually be normalized, but any such normalization was done by those providing the data and may not have been done in the context of the containing solution. Therefore, within the solution at hand, these structures can only be taken “as is” and treated as denormalized.
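A sketch of what such a staging structure often looks like, purely illustrative: one wide record that mirrors the incoming feed, with loose datatypes and no functional-dependency analysis:

```sql
-- Illustrative only: a raw staging structure mirroring an incoming file.
CREATE TABLE stg_customer_orders_raw (
    source_file_name  VARCHAR(255),
    load_timestamp    TIMESTAMP,
    customer_number   VARCHAR(50),
    customer_name     VARCHAR(200),
    order_number      VARCHAR(50),
    order_date_text   VARCHAR(50),
    order_amount_text VARCHAR(50)
);
```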
Beyond these staging areas, there may be layers or entire solutions in which architectural decisions have been made to create designs that are multidimensional, data vault, or something else that follows rules and patterns outside the norm. These are physical structures adhering to non-normalized conventions and serving a specific objective and need.
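As one hedged example of such a pattern, a dimensional (star schema) design might look roughly like this; the structure names are illustrative:

```sql
-- Illustrative only: one fact table surrounded by denormalized dimensions.
CREATE TABLE dim_customer (
    customer_key    INTEGER PRIMARY KEY,
    customer_name   VARCHAR(100),
    customer_region VARCHAR(50)
);

CREATE TABLE dim_date (
    date_key   INTEGER PRIMARY KEY,
    full_date  DATE,
    month_name VARCHAR(20),
    year_num   INTEGER
);

CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer (customer_key),
    date_key     INTEGER REFERENCES dim_date (date_key),
    order_amount DECIMAL(12,2)
);
```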
Regardless of the physical configuration chosen, the best architects make sure they understand what a normalized version of the subject areas would look like, because knowing that is simply knowing the data. There are many scenarios where denormalization is done: some small, some large, but always with a purpose.
Never be afraid of denormalizing, but always be aware of the good reasons for doing so.