The modern data warehouse needs to support advanced analytics on multiple types of data from semi-structured to streaming.
Driving BI dashboards is still at the heart of a data warehouse, but even those now demand data be current. Old-school batch ETL won't cut it.
More advanced predictive and prescriptive analytics are needed to keep businesses competitive, but those powerful capabilities are driven by machine learning, not by standard business intelligence.
Paige Roberts, open source relations manager at Vertica, explained why our old definitions no longer apply, what changed, and what drove those changes during her Data Summit Connect 2021 presentation, “Your Definition of a Data Warehouse Is Probably Wrong.”
The annual Data Summit event is being held virtually again this year—May 10–May 12—due to the ongoing COVID-19 pandemic.
The typical definition for a data warehouse is: A technology that aggregates structured data from one or more sources so that it can be analyzed for business intelligence, she explained. However, it’s all wrong, she pointed out.
“It has evolved,” she said. “The most important thing about a data warehouse is fast, efficient analytics. Everyone in the organizations should use the data to make important decisions.”
The environment has changed. Data warehouses now face larger data volumes, more concurrent users, faster response requirements, and an increased demand for analytics.
Requirements should now also cover use cases such as predictive maintenance, fraud detection, electronic health records, customer support, and product recommendations.
A data warehouse now includes new types of infrastructure, new ways to store structured data, new types and amounts of data, and new types of analytics.
“A data warehouse is not a single thing you buy,” Roberts said. “It has an enterprise architectural design.”
There’s more to a data warehouse than needing a dashboard, she said. Storage is not the goal of a data warehouse. Doing analytics well, doing different analytics well, is what a data warehouse is all about, she pointed out.
There has been a tidal wave of IoT data that has left everything else in the dust, she said. Analyzing data isn’t limited by where it sits, or in what format.
“We have to be able to scale for tremendous amounts of data, for more people, and more workloads,” Roberts said.
Instead of building only dashboards, teams are now trying advanced analytics, going beyond BI with machine learning.
“You can accomplish a lot of things that you couldn’t before with predictive analytics,” Roberts said.
According to Roberts, the new data warehouse definition should be: An enterprise architectural design with an analytical database at its heart that unites and analyzes many kinds of data from across an enterprise and beyond to see what has happened, what will likely happen, and what action will help, improving decision-making, both human and automated.
Register here now for Data Summit Connect 2021, which continues through Wednesday, May 12.