In all scenarios, business requirements should be the ultimate determinant, and this is very much the case with data analytics platforms. “Ultimately, it is always about how well business challenges, such as customer churn, fraud detection, or security threat perception, are addressed,” said Sri Raghavan, director of data science and advanced analytics, product marketing for Teradata. “Data platforms that are also accompanied by a palette of analytics capabilities—algorithms, visualizations, workflows—that can be used by a wide range of personas will always be dominant.”
There are considerable choices above and beyond the data environment itself. “Today’s analytics requirements range from real-time and time-series-based analysis right down to standard BI,” said Kelker. Some analytics processes are too expensive to run in the cloud due to data ingestion costs and therefore require edge AI solutions with local algorithms and tinyML (tiny machine learning).
“The days of master data management are over,” Kelker stated. The focus has moved away from data models toward algorithms, he noted. “The explosion of external data fields intermeshed with internal data makes this increasingly difficult to design upfront. Modern concepts, such as streaming architectures and data meshes, are blending the worlds of data storage and analytics together.”
The main issue is that “organizations need real-time, actionable insights to inform critical decision making,” said Scott Gnau, VP of data platforms at InterSystems. “Seamless, cross-silo access to the right data at the right time is difficult due to increasing complexity and latency challenges. Scalable, high-performance data platforms that connect distributed data to the composable stack need to serve as the foundation of modern analytics strategies.”
DATA WAREHOUSES, LAKES, AND EVERYTHING IN BETWEEN
What kind of role is emerging for today’s data warehouses, and how have data lakes shaped it? It’s important that “data warehouses and data lakes operate in unison if businesses want to stay ahead of the game,” said Adams. “Data warehouses typically ingested information from relational databases that was then extracted by business intelligence tools for further analysis.” Data lakes have created “an undercurrent for warehouses,” whereby they can now store all business data—contacts, user information, documents, pictures, logs, or any other data the business and its users generate, he noted. “While having a breadth of diverse information in data lakes makes transforming data a more difficult task, it gives organizations a wealth of information that was previously inaccessible.”
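The contrast Adams draws—structured rows landing in a warehouse for BI extraction, while a lake holds heterogeneous raw objects whose transformation is deferred—can be illustrated with a minimal sketch. All table names, file keys, and sample records here are hypothetical, and an in-memory SQLite database stands in for a real warehouse:

```python
import sqlite3
import json

# "Warehouse": relational, schema-on-write -- rows must fit the table definition.
wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE contacts (id INTEGER, name TEXT, region TEXT)")
wh.executemany("INSERT INTO contacts VALUES (?, ?, ?)",
               [(1, "Acme", "EMEA"), (2, "Globex", "APAC")])

# A BI tool would then extract aggregates like this for further analysis.
rows = wh.execute(
    "SELECT region, COUNT(*) FROM contacts GROUP BY region ORDER BY region"
).fetchall()

# "Lake": schema-on-read -- store anything as raw objects, interpret later.
lake = {
    "contacts/2024-01-01.json": json.dumps([{"id": 1, "name": "Acme"}]),
    "logs/app.log": "2024-01-01 INFO service started\n",
    "docs/readme.txt": "free-form text, pictures, documents...",
}

# Transformation is deferred: parse only when a consumer needs the data.
parsed = json.loads(lake["contacts/2024-01-01.json"])
```

The warehouse side rejects anything that does not match its schema, which is what makes BI extraction cheap; the lake side accepts everything, which is what makes transformation "a more difficult task" while widening what is available to analyze.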
A converged data warehouse-lake architecture is the best path forward for supporting increasingly complex analytic data environments, Raghavan said. “Data lakes, or data swamps, require robust solutions to understand, search, and analyze the data in a context-sensitive manner, while not losing the associated lineage and provenance information,” he pointed out. “Data warehouses have been rearchitected to meet the emerging need for analytics that is near real-time and can handle large volumes of data.”
Raghavan also pointed out that “today’s data warehouses have become high-efficiency, super-compute clusters where ETL processes are not only used to deliver clean data but also combined with state-of-the-art feature engineering and modeling capabilities to deliver high-performance models and operationalizations at scale.” Data lakes have contributed to these super data warehouses “by simply increasing the volume and the breadth of data that could be ingested into a data warehouse. The presence of a loosely coupled compute-storage architecture ensures that subsets of the data can be selected for ETL [processes] and more production-ready work within the warehouse.”
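Raghavan’s point—selecting a subset of lake data for ETL, then running feature engineering inside the warehouse—can be sketched roughly as follows. The event records, column names, and per-user features are illustrative assumptions, with an in-memory SQLite database again standing in for the warehouse:

```python
import sqlite3

# Hypothetical raw events, as they might sit in lake storage.
raw_events = [
    {"user": "u1", "amount": 20.0, "ok": True},
    {"user": "u1", "amount": 35.0, "ok": False},
    {"user": "u2", "amount": 5.0,  "ok": True},
    {"user": "u1", "amount": None, "ok": True},   # dirty record
]

# ETL: select and clean only the subset needed inside the warehouse.
clean = [e for e in raw_events if e["amount"] is not None]

wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE events (user TEXT, amount REAL, ok INTEGER)")
wh.executemany("INSERT INTO events VALUES (?, ?, ?)",
               [(e["user"], e["amount"], int(e["ok"])) for e in clean])

# Feature engineering in the warehouse: per-user aggregates a model
# (e.g., for churn or fraud scoring) might consume downstream.
features = wh.execute("""
    SELECT user,
           AVG(amount) AS avg_spend,
           AVG(1 - ok) AS failure_rate
    FROM events
    GROUP BY user
    ORDER BY user
""").fetchall()
```

Because compute and storage are loosely coupled, only the filtered subset ever crosses into warehouse compute; the full breadth of raw events stays in cheaper lake storage.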