The Future of Data Analytics: Out of the Warehouse, Through the Lake, and into the Fabric


TRAWLING THROUGH DATA LAKES 

Data lakes, once seen as the next generation of analytic data storage, have also encountered issues as they proliferated across enterprises.

Data lakes came on the scene more than a decade ago, with their adoption tied to the rise of Hadoop as a big data storage and processing environment. However, as has been the case with Hadoop, governance has been a challenge for data lakes. “Hadoop lacked data governance,” said Stalla-Bourdillon. “With increased regulation and tightened controls, organizations simply couldn’t answer simple questions about what data they had and who had access to it. With the rise of new table formats, capabilities such as native support for row-level deletes, which are critical for privacy and data protection compliance but were previously available only in data warehouses, are now possible in data lakes. However, with the expansion of data privacy and data localization requirements, global data lakes still struggle to accommodate today’s data challenges and hinder organizations when it comes to purpose limitation, data minimization, transparency, and data flow management and control, as well as data quality and provenance.”
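
As an illustration of the table-format capability Stalla-Bourdillon describes, the sketch below shows a row-level delete run directly against a data lake table using Delta Lake’s Python API with Spark. The quote does not name a specific table format, so the choice of Delta Lake, the table path, and the user_id column are assumptions made for the example.

from pyspark.sql import SparkSession
from delta.tables import DeltaTable

# Spark session configured for Delta Lake (requires the delta-spark package)
spark = (
    SparkSession.builder
    .appName("row-level-delete-example")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Hypothetical Delta table stored on data lake object storage
events = DeltaTable.forPath(spark, "s3://example-lake/events")

# Row-level delete: remove every record for one data subject,
# e.g., to honor a "right to erasure" request under privacy regulation
events.delete("user_id = 'user-123'")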

Organizations “would simply dump their data into one data lake repository with no way to analyze it or make it actionable in real time,” said Gnau. “While data lakes have helped businesses organize their raw data into these central repositories, they still are not typically involved in operational and transactional data flows. Only on rare occasions do I see that there’s enough governance and metadata to make it usable for anyone other than the person who puts the data in.”

Data lakes “risk becoming data swamps when clear ownership and governance cannot be established,” agreed Werner. “That is something that, with increasing regulation on personal data and AI, will only become more important. Healthy skepticism is mounting about whether trawling for data, dumping it in a lake, and working out what to do with it later is a wise strategy. Leaders are returning to the basics: engage with the business to understand their needs, empower them to deliver the vast majority of those needs, and excite your data experts with the task of working on only the hardest problems.”

Ten years ago, Stalla-Bourdillon continued, “industry experts would have argued that global data lakes are the only way to build analytics capabilities. The emergence of the federated data governance movement shows that there is now an alternative.”

While data lakes “helped stimulate a move to broader data management applications, the role of data lakes is now waning,” Gnau said. “Companies are adopting architectures like data fabric that can provide a holistic and connected view of the entire life cycle of data.”

As companies increasingly rely on unstructured data, “there is a shift towards analytics that aims to provide a holistic view of the company, including the human element,” said Kon Leong, CEO at ZL Technologies. “This requires the agility to manipulate near-real-time data, not at the sandbox level but at the ‘beach’ level, a capability that is not present in the data lake or warehouse model. The corporate need to better understand human resources, whether for talent analytics or operational improvement, has paved the way for a new paradigm: the virtual, managed data lake.”

THE BEST OF BOTH WORLDS

Enter new approaches such as data fabric, a centralized metadata architecture that integrates multiple disparate data platforms and pipelines, and data lakehouses, which support the transformational and governance capabilities of data warehouses but with the openness of data lakes. Data lakehouses and data fabric are “an evolution of exciting platform capabilities that are responding to the exponential growth of data,” said McGuigan. “The data warehouse, data lake, data fabric, and data lakehouse are all platform technologies that are a means to an end, enabling business users to make better decisions faster based on data. With the cloud, it’s even easier to democratize these capabilities, so while the technology may be a data lakehouse, the integration and adoption in real business scenarios should be seamless.”

Werner compares the adoption and growth of data lakehouses to that of data warehouses two decades ago. “Every one of these concepts is just a pattern for making data valuable for a business need,” he said. “The lakehouse makes explicit that one size, one pattern, does not fit all, and that any modern data architecture will need to marshal a range of storage and processing technologies for different business needs.” In addition, he noted, “we see many customers also considering a more decentralized approach like data mesh to achieve quicker time to value and enable data analytics beyond a central center of excellence.”

At the same time, business requirements need to be considered first. “The livelihood of data lakehouses depends on data analysts’ ability to uphold the crucial role of determining what the use cases are and defining the organization of the data inside in such a way that it will satisfy the desired business outcome,” said Zisk. “A data lakehouse is only as valuable as the understanding of its use cases by the people putting it together.”
