Newsletters




A Bridge Too Far


In multidimensional data modeling practice, there is the concept of a bridge dimension. A bridge dimension, being the dimensionalized equivalent of a normalized data design’s associative entity, allows for the resolution of a many-to-many relationship. By its very nature, the bridge is saying, “Yes, let’s normalize our dimensional designs.” The bridge dimension exists to resolve a many-to-many relationship between two other dimensions.

The first question a data modeler needs to ask concerns what the context and need are for the exposure of a many-to-many relationship.

What are the exact business questions to be answered? Generally, there is no need for the bridge dimension because the entire star schema exposes those many-to-many relationships via each of the individual dimensions’ relationships to the fact under analysis. If analysis of the many-to-many relationship itself is the object of the user’s desire, then consider a different design. In a dimensionalized design, the fact is the focus of analysis; should the many-to-many need to be analyzed, then seriously consider creating a fact, or even a factless fact, focused on that relationship. In this fashion, the possible bridge dimension becomes a fact table all by itself rather than a bridge dimension at all.

There was a circumstance in which a marketing group had a sales territory dimension and a salesperson dimension, since a salesperson might occasionally be associated with more than one sales territory. Because each sales territory wanted credit for the salesperson’s sales, a bridge dimension between the salesperson and sales territory was established. The enterprise did not want to double count sales for global totals, so a flag was added that made one of the many salespeople to sales territory combinations the “primary.”

If more probing questions had been asked during development, issues may have been resolved in another fashion that would not have needed a bridge dimension. What if the sales, by customer or by store/sales location, naturally fell into one specific sales territory? And if a salesperson actually had a primary sales territory in which they worked at each point in time? Then the fact could have had one connection to a simple sales territory dimension and another direct relationship to the salesperson.

The salesperson dimension could have the current sales territory assignment of that salesperson, along with the sales territory the salesperson was in at the time of the sales. Potentially, there could have even been a “credited” sales territory for any overrides to the usual, expected rules.

Essentially, the bridge sets up the possibility of joining a large fact table to a smallish dimension, then joining into a largish bridge, followed by a join into another smallish dimension. In many data platforms, this kind of joining across tables can be a querying performance bottleneck. Flirting with poor performance is a reason to avoid establishing a bridge dimension construct.

Functionally, a bridge table can hurt you in another way. Depending on the user’s query, based on the way the bridge sets things up, the user may easily end up multiplying their results and obtaining an incorrect answer to their query via our old friend, the Cartesian product. A designer can attempt to avoid this by adding flags or even multiplier values onto the bridge, but this starts creeping into the realm of overcomplicating the user’s life.

This unwanted data duplication is another reason to avoid creating a bridge dimension. There are approaches that can be used to avoid a bridge dimension if a designer is bold enough to explore other options. It is truly rare for there to be a set of circumstances that require only a bridge dimension as the solution. Be bold and avoid bridge dimensions.


Sponsors