Implementing a data fabric can spark fears within an organization concerned about the business impact of the transformation: How will our existing tech stack be affected? How long will it take to implement? How much money will we lose in the shift?
At Data Summit 2023, Doug MacWilliams, director in the data engineering and analytics practice at West Monroe, led the session, “Implementing a Data Fabric,” discussing a technology-agnostic methodology for adopting a data fabric that enables faster data onboarding, automatic adaptation to data changes, and manageable solutions across technologies, all within a monitorable, maintainable fabric.
The annual Data Summit conference returned to Boston, May 10-11, 2023, with pre-conference workshops on May 9.
MacWilliams posed a familiar origin story: a company starts simply, in both its technology and its strategy, and grows through known technologies and the best of intentions into larger, more complex data platforms.
As the company evolves, challenges arise in the form of disjointed processes and tech debt that require re-platforming to remediate.
With the proliferation of technologies and services flooding the market, these re-platforming initiatives become overwhelmingly complex, MacWilliams explained. Companies get distracted evaluating technology and wholly miss the opportunity to optimize their data processing operations and methodology.
Common pain points that reveal a need for a data fabric include:
- Data teams struggling to keep pace with the evolving demands of the business
- Difficulty in finding and retaining talent
- Failing data platform builds
- Platform builds that take a long time to complete
- Accumulating tech debt
- A need for fresher, near real-time data that meets SLAs
To alleviate these pain points, MacWilliams outlined the following features of a modern, tech-agnostic data fabric:
- Data platforms should be organized around three logical functions into which every unit of work falls: consolidating disparate data, enriching and transforming it, and publishing it.
- Data platforms should utilize consistent processing steps to create a unified process that all data follows, built from purpose-built, highly flexible pipelines.
- Data platforms should utilize parameterized, reusable pipelines, a shift away from the conventional approach of copying large parts of ETL code.
- Data platforms should consolidate metadata to provide visibility into the data fabric and to intelligently orchestrate it enterprise-wide.
- Data platforms should shift from large DAGs/process trees to configured dependencies powered by metadata (see the sketch after this list).
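To make these ideas concrete, here is a minimal sketch of what a metadata-driven, parameterized pipeline might look like. It is not code from the session; the catalog format and all names (CATALOG, run_fabric, the handler functions) are hypothetical, chosen only to illustrate the pattern of reusable handlers per logical function and dependencies declared as configuration rather than hard-coded into a large DAG.

```python
"""Sketch: a metadata-driven, parameterized data fabric (hypothetical example)."""
from graphlib import TopologicalSorter  # stdlib topological sort, Python 3.9+

# Consolidated metadata: each dataset declares its logical function
# (consolidate / enrich / publish), its parameters, and its upstream
# dependencies -- configuration, not a hand-built process tree.
CATALOG = {
    "raw_orders":      {"function": "consolidate", "source": "s3://raw/orders", "depends_on": []},
    "raw_customers":   {"function": "consolidate", "source": "s3://raw/customers", "depends_on": []},
    "orders_enriched": {"function": "enrich", "depends_on": ["raw_orders", "raw_customers"]},
    "orders_report":   {"function": "publish", "target": "warehouse.orders", "depends_on": ["orders_enriched"]},
}

# One reusable, parameterized handler per logical function, instead of
# copied-and-pasted ETL code for every dataset.
def consolidate(name, meta):
    print(f"[consolidate] loading {name} from {meta['source']}")

def enrich(name, meta):
    print(f"[enrich] transforming {name} from {meta['depends_on']}")

def publish(name, meta):
    print(f"[publish] writing {name} to {meta['target']}")

HANDLERS = {"consolidate": consolidate, "enrich": enrich, "publish": publish}

def run_fabric(catalog):
    """Order datasets by their declared dependencies, then dispatch each
    one to the shared handler for its logical function."""
    graph = {name: set(meta["depends_on"]) for name, meta in catalog.items()}
    for name in TopologicalSorter(graph).static_order():
        HANDLERS[catalog[name]["function"]](name, catalog[name])

if __name__ == "__main__":
    run_fabric(CATALOG)
```

In a real platform the catalog would live in a metadata store and the handlers would wrap whatever processing engine the stack already uses; the point of the pattern is that onboarding a new dataset becomes a configuration change rather than a new pipeline build.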
By modernizing data platform methodology, and in turn, applying a tech-agnostic data fabric, companies can derive greater value from their existing tech stacks as well as from their teams, MacWilliams concluded.
Many Data Summit 2023 presentations are available for review at https://www.dbta.com/DataSummit/2023/Presentations.aspx.