Many people seem to become filled with anxiety over the word “normalization.” Mentioning it causes folks to slowly back away toward the exits. Why? What might have caused this data modeling phobia? Do people have images flashing through their minds of a data modeler running around wearing a hockey mask, carrying a chainsaw, and screaming, “Give me your primary keys”? I hope not. I tend to believe that any such anxiety comes from simply never having had a proper explanation of the process and the idea behind it. People may have heard that normalization is somehow related to math and logic. Math and logic all by themselves can start the fear molecules dancing for many. Five-syllable words, too, can give the impression of noses tilting upward, but the idea behind normalization is quite simple.
Ultimately, normalization means that one’s data attributes are organized based on the dependencies between them. What the data means drives how the data is arranged. As is often said, data items on an entity should depend on “the key, the whole key, and nothing but the key.” If you have a business object (a person, a store, an order), first give a thought to what makes that object unique. Data attributes like a Social Security number, an employee identifier, a store number, or an order number are the likely suspects. Whatever it is, it is unique for a given instance of that object.
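To make the idea concrete, here is a minimal sketch in Python (the Employee entity and its attribute names are hypothetical, chosen purely for illustration): every non-key attribute is a fact about the key and about nothing else.

```python
from dataclasses import dataclass

# A hypothetical entity: each non-key attribute depends on the key,
# the whole key, and nothing but the key.
@dataclass
class Employee:
    employee_id: int   # the key: uniquely identifies one employee
    birth_date: str    # a fact about this employee, and only this employee
    eye_color: str     # likewise: one value per employee_id
    # department_name would NOT belong here if it is really a fact about
    # a department_id rather than about the employee; that attribute
    # depends on a different key and belongs on a Department entity.
```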
When looking at the other data attributes, the question to ask of each one is, “Which business object has only one value for this data attribute?” You end up with simple arrangements, such as a person uniquely identified by a Social Security number who has a single birth date, a single shoe size, a single eye color, and so forth. I call this the Theory of Onesies: every piece of data belongs where it exists only once. Take a breath and apply this thought across all the data items you are dealing with. The simple Theory of Onesies will help you get your data normalized into Third Normal Form, or even Boyce-Codd Normal Form, without worrying. No integrals or differentials required.
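As a rough illustration (the record layout and entity names below are invented for the example, not taken from any particular system), applying the onesie test to a flat row splits person facts from order facts, so that each attribute lands on the entity whose key it is a onesie for:

```python
from dataclasses import dataclass

# A flat, unnormalized record that mixes facts about two different
# business objects: a person and an order.
flat_row = {
    "ssn": "123-45-6789",
    "birth_date": "1980-04-01",
    "order_number": "A-1001",
    "order_date": "2024-05-15",
}

# Onesie test: a person has one birth date, so birth_date depends on ssn.
@dataclass
class Person:
    ssn: str          # key
    birth_date: str   # one value per person

# An order has one order date, so order_date depends on order_number.
@dataclass
class Order:
    order_number: str  # key
    order_date: str    # one value per order
    ssn: str           # reference to the one person who placed the order

person = Person(ssn=flat_row["ssn"], birth_date=flat_row["birth_date"])
order = Order(order_number=flat_row["order_number"],
              order_date=flat_row["order_date"],
              ssn=flat_row["ssn"])
```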
If you have entities with composite keys (the primary key comprises more than one attribute) or, even worse, composite keys and multiple candidate keys (alternative primary key choices) with overlapping columns, then there may be some normalization subtleties that still need working through. But if you do not have any composite primary keys or multiple candidate keys with overlapping columns, then life is a breeze. Even when composite keys and multiple candidate keys do arise, there is plenty of literature out there covering what may be necessary. One is never left alone in the dark to be tormented by the specters of normalization.
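For a sense of what those subtleties look like, consider a sketch of a composite-key entity (again with invented names): each attribute must be a onesie for the whole key, not just for part of it.

```python
from dataclasses import dataclass

# A hypothetical entity with a composite primary key:
# (order_number, line_number) together identify one order line.
@dataclass
class OrderLine:
    order_number: str   # composite key, part 1
    line_number: int    # composite key, part 2
    product_id: str     # a onesie for the whole (order_number, line_number) key
    quantity: int       # likewise: one quantity per order line
    # product_description would violate "the whole key": it depends only on
    # product_id, not on which order line it appears in, so it belongs on a
    # separate Product entity keyed by product_id.
```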
Applying the Theory of Onesies is a simple shortcut to attaining normalized data, and it works for most data, most of the time. No muss, no fuss, and no math. One and done. What could be easier? If your data is normalized, then every attribute has a single value that is determined by the identity of the object instance it resides within. It is a onesie for the key. The better one knows one’s data, the faster one can step through the process. The only time problems arise is with unknown data items. But if the data is not understood, then why is it being included in the first place? A good data modeler should always question the relevance of data that is ill-defined or undefined. If an item is worth including, it is also worth being known and understood. So why spend so much effort being afraid of normalization?