In a world where every industry stresses “doing more with less,” particular technologies and strategies that conserve resources while maximizing business value are crucial, yet often elusive.
DBTA’s webinar, Building a Data Mesh on your Lakehouse, offered best practices and tools for implementing a data mesh, an architectural framework that addresses data efficiency challenges through decentralized ownership, and explained how a data lakehouse and data pipeline automation can accelerate its adoption and effectiveness.
A rampant disorder plagues enterprise data stacks, which sprawl across disparate platforms, software, and tools and induce context-switching madness. As Pavithra Rao, solutions architect at Databricks, explained, stitching together a plethora of platforms is unnecessarily expensive and complex: data silos drive high operational costs, inconsistent policies reduce trust in the data, and disparate tools slow down cross-team productivity.
A data lakehouse—particularly the unified data lakehouse offered by Databricks—takes a different approach, according to Rao. As a single platform, a data lakehouse supports multiple personas, protected by one security and governance model across the organization, where all unstructured, structured, and semi-structured data is stored in a single place.
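To make that concrete, the following is a minimal sketch of what a single governance model over structured and unstructured data can look like, assuming a Databricks notebook with Unity Catalog enabled and a preconfigured `spark` session; the catalog, schema, table, volume, and group names are hypothetical.

```python
# Minimal sketch, assuming a Databricks notebook where `spark` is preconfigured
# and Unity Catalog is enabled. Catalog, schema, and group names are hypothetical.

# A catalog and schema act as the single namespace for one domain's data.
spark.sql("CREATE CATALOG IF NOT EXISTS sales_domain")
spark.sql("CREATE SCHEMA IF NOT EXISTS sales_domain.curated")

# Structured data lands in a governed Delta table ...
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_domain.curated.orders (
        order_id BIGINT, customer_id BIGINT, amount DECIMAL(10, 2))
""")

# ... while unstructured files (documents, images, etc.) live in a governed volume.
spark.sql("CREATE VOLUME IF NOT EXISTS sales_domain.curated.contracts")

# One security model covers both: access is granted to groups once,
# rather than re-implemented in every downstream tool.
spark.sql("GRANT SELECT ON TABLE sales_domain.curated.orders TO `analysts`")
spark.sql("GRANT READ VOLUME ON VOLUME sales_domain.curated.contracts TO `analysts`")
```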
Furthermore, a data lakehouse and a data mesh go hand-in-hand, providing autonomy within workspaces while enforcing governance across them. A data mesh lets those who know the data best be responsible for its value, reducing reliance on central teams while avoiding silos, and the Databricks Lakehouse offers a flexible architecture that can support both centralized and distributed data mesh design patterns through open protocols.
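As one illustration of those open protocols, a domain team can package a governed table as a shared data product using Delta Sharing. This is a minimal sketch, assuming metastore-admin privileges in a Databricks notebook; the share, recipient, and table names are hypothetical and continue the example above.

```python
# Minimal sketch of publishing a data product across workspaces through the
# open Delta Sharing protocol; names are hypothetical.

# The domain team packages its table as a share (the data product boundary).
spark.sql("CREATE SHARE IF NOT EXISTS sales_orders_product")
spark.sql("ALTER SHARE sales_orders_product ADD TABLE sales_domain.curated.orders")

# A recipient (another workspace, team, or platform) is granted read access
# centrally, so governance stays consistent across domains.
spark.sql("CREATE RECIPIENT IF NOT EXISTS marketing_workspace")
spark.sql("GRANT SELECT ON SHARE sales_orders_product TO RECIPIENT marketing_workspace")
```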
The Databricks Lakehouse additionally implements a unified and collaborative environment for all data personas to co-create and share data products. Rao argued that a data mesh and the Databricks Lakehouse are complementary paradigms which—when implemented correctly—can create value from data at scale.
Jon Osborn, field CTO at Ascend.io, built on Rao’s argument by pairing Ascend.io’s intelligent pipelines with the Databricks Lakehouse for optimal data mesh usage. By automating data pipelines, Ascend.io gives the lakehouse a central platform for ingesting, transforming, orchestrating, and sharing data.
According to Osborn, data sharing is the most complex data mesh implementation challenge. With Ascend.io’s platform, however, it becomes a straightforward operation: through Live Data Share, Ascend.io enables users to publish, subscribe to, and link data products seamlessly within and across data clouds.
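The subscribing side of that pattern can be illustrated with the open delta-sharing Python client; the sketch below is not Ascend.io’s Live Data Share interface, which the webinar demonstrates, but it shows the same publish-and-subscribe idea, with a hypothetical profile path and share coordinates.

```python
# Minimal sketch of consuming a shared data product with the open delta-sharing
# Python client (not Ascend.io's Live Data Share API); paths and names are hypothetical.
import delta_sharing

# The provider distributes a small profile file containing the sharing server
# endpoint and a bearer token; the consumer never needs direct storage access.
profile = "/path/to/config.share"

# Data products are addressed as share.schema.table, regardless of which cloud
# or workspace publishes them.
table_url = f"{profile}#sales_orders_product.curated.orders"

# Load the shared table into pandas (load_as_spark is also available on Spark).
orders = delta_sharing.load_as_pandas(table_url)
print(orders.head())
```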
This results in game-changing efficiencies, said Osborn: with a 75% reduction in tool costs and a 7x increase in pipelines per engineer, organizations can consolidate their tooling and increase business impact through faster pipeline builds that properly support a data mesh architecture and data lakehouse.
For an in-depth discussion of using the Databricks Lakehouse and Ascend.io’s data pipeline automation platform to successfully implement a data mesh, you can view an archived version of the webinar here.