Big data tool vendors try to downplay the notion that data warehouses and data marts still need to exist, even in a big data world. They want potential customers to see their shiny new big data toys as the be-all and end-all for conquering information needs. Relational DBMSs are painted as “old-fashioned,” “yesterday,” and “inadequate.”
They beckon potential customers to take a dip in the refreshing data lake. The fact that big data, in all of its glory, is only part of a larger business intelligence solution simply gets lost in the dialog. Or, at least, that part of the discussion waits until after those new big data tools are acquired and paid for.
Source system data typically does not flag changes or maintain historical values. Even when source systems do acknowledge change, they may present it inconsistently. A massively parallel processing approach can read through transactional data quickly, but it cannot provide data that is not there to begin with. Under these circumstances, the original data must be manipulated and transformed into new, history-aware data structures. Data warehouse structures serve that purpose. In addition, operational solutions are often defined with a bare minimum of data points in an effort to optimize performance: only those elements necessary for required functionality exist. Businesses often need to reach beyond those minimal elements with additional derivations and calculations for analysis and decision-making.
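One common technique for building such history-aware structures is the Type 2 slowly changing dimension; the article does not prescribe it, so treat what follows as a minimal sketch rather than a recommended design. It assumes a hypothetical customer dimension whose only tracked attribute is `city`, and a source system that supplies nothing but a full snapshot of current values with no change flags. Each load compares the snapshot with the current dimension rows, closes out any row whose value changed, and appends a new version with its own effective date. The names `DimRow` and `apply_snapshot` are illustrative only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class DimRow:
    customer_id: str          # natural key from the source system (hypothetical)
    city: str                 # tracked attribute (hypothetical)
    valid_from: date          # first day this version applies
    valid_to: Optional[date]  # exclusive end date; None marks the current version


def apply_snapshot(dim: List[DimRow], snapshot: Dict[str, str], load_date: date) -> List[DimRow]:
    """Derive history from a change-unaware source snapshot (customer_id -> city).

    New keys get a first version; changed values expire the old version and
    append a new one. Deletions in the source are not handled in this sketch.
    """
    current = {row.customer_id: row for row in dim if row.valid_to is None}
    for customer_id, city in snapshot.items():
        row = current.get(customer_id)
        if row is None:
            # Key not seen before: insert its first version.
            dim.append(DimRow(customer_id, city, load_date, None))
        elif row.city != city:
            # Value changed: close the old version and append the new one.
            row.valid_to = load_date
            dim.append(DimRow(customer_id, city, load_date, None))
    return dim
```

Loading `{"C1": "Austin"}` on one date and `{"C1": "Denver"}` on a later date leaves two rows for `C1`: the Austin version, now closed, and the Denver version, marked current. That history existed nowhere in the source system itself.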
Before data warehousing practices took hold, each report went its own way, and these calculations and derivations varied from one report to the next. The integration performed by the processes that built the data warehouses and data marts is what brought consistency, consolidating business rules and providing one version of the truth for corporate analysis. The need for consistency still exists. The need for data warehousing has not vanished.
Data warehousing efforts are often discouraged because of excessive costs. Recently, India succeeded in sending its first spacecraft into orbit around Mars for a price tag of about $74 million; a similar effort by the United States cost $671 million. Like India’s Mars mission, data warehousing can be performed more cost-effectively than it usually is. More often than not, the excess lies in the planning, the approach, and the scope. In many cases, the data warehouse is used as a mechanism to fundamentally change many other practices across IT and the organization in general. Far too often, the data warehouse is blamed for costs that actually stem from bad or insufficient operational data administration. Good data administration enhances the understanding of all systems, including the data warehouse, but fixing bad data management and administration is not the function of the data warehouse. Focused projects, focused development, and unambiguous goals can help shepherd even a data warehouse to success without breaking the bank.
The end user community will not be composing unassisted ad hoc MapReduce processes just yet. But it will be running standard reports and expected analyses of sales data over and over. The efficient way to support those standard reports and expected analyses is to have them execute against data marts and data warehouses designed to optimize that workload.
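One way a data mart can be tuned for a repeated workload is to pre-aggregate detail rows into a summary at the grain the standard reports actually use. The sketch below illustrates that idea under stated assumptions: an in-memory dictionary stands in for a summary table, `Sale` is a hypothetical fact row with integer amounts, and monthly totals by region are assumed to be the "standard report." The aggregation runs once per load, and every later report reads only the small summary instead of rescanning the detail.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, NamedTuple, Tuple

SummaryKey = Tuple[int, int, str]  # (year, month, region)


class Sale(NamedTuple):
    sale_date: date
    region: str    # hypothetical descriptive dimension
    amount: int    # hypothetical metric, kept integral to avoid float rounding


def build_monthly_summary(sales: Iterable[Sale]) -> Dict[SummaryKey, int]:
    """Pre-aggregate detail rows once, at the grain the standard reports use."""
    summary: Dict[SummaryKey, int] = defaultdict(int)
    for sale in sales:
        summary[(sale.sale_date.year, sale.sale_date.month, sale.region)] += sale.amount
    return dict(summary)


def monthly_report(summary: Dict[SummaryKey, int], year: int, month: int) -> Dict[str, int]:
    """A repeated 'standard report' that reads only the pre-aggregated summary."""
    return {
        region: total
        for (y, m, region), total in summary.items()
        if (y, m) == (year, month)
    }
```

The design choice here is the usual warehouse trade-off: extra work at load time in exchange for cheap, consistent answers to the questions that get asked over and over.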
Yes, embracing and exploring big data can open the door to mind-expanding and profitable new thoughts. However, corporations still need to progress towards success by walking before they run.
Organizations need to start by building a business intelligence environment that handles the usual, the expected, and the necessary; then they can expand into the potentially mind-blowing. Do not assume that the simpler slicing and dicing of the “usual” metrics by the “usual” descriptive dimensions has lost any of its organizational value. The data warehouse remains an important component of the organization’s informational ecosystem.