Data and analytics needs evolve just as quickly as the tools built to meet them. The sheer number of technologies available today can leave any data leader overwhelmed when deciding which tools to use and how to leverage them for maximum benefit. However, a rickety connection between disparate tools does not a data infrastructure make; rather, looking at wider trends and successful strategies in the data and analytics space can revolutionize the way your enterprise gets data to the people who need it most.
Cameron O'Rourke, senior director of product strategy at Incorta, and Eldad Chai, CEO of Satori, came together for a DBTA webinar, “Top Trends in Data Engineering,” to discuss key patterns and methods that illuminate which areas of data and analytics need some technological TLC.
O’Rourke analyzed the methodologies that many enterprises routinely apply to data engineering, as well as their shortcomings. The “modern” data pipeline, despite its promising adjective, ultimately moves data through too many stages, reducing its quantity and quality while increasing the time it takes to reach its target. The journey from the “raw zone” where data first lands, through refinement, to its end state results in data loss, stale data, and a lack of accuracy and flexibility.
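To make the critique concrete, here is a minimal, hypothetical sketch of that kind of staged pipeline, using DuckDB purely for illustration; the table names, file names, and schema are assumptions, not anything shown in the webinar.

```python
# Hypothetical illustration of a staged pipeline: each hop copies and
# reshapes the data, adding latency and discarding detail along the way.
import duckdb

con = duckdb.connect("warehouse.db")  # assumed local warehouse file

# Stage 1: land source extracts in a "raw zone" table.
con.sql("CREATE TABLE raw_orders AS SELECT * FROM read_json_auto('orders.json')")

# Stage 2: refine -- flatten, cast, and filter, dropping columns on the way.
con.sql("""
    CREATE TABLE refined_orders AS
    SELECT order_id, order_date, CAST(total AS DOUBLE) AS total
    FROM raw_orders
    WHERE status = 'complete'
""")

# Stage 3: pre-aggregate for the BI tool. Row-level detail is now gone, and
# the numbers are only as fresh as the last run of this batch job.
con.sql("""
    CREATE TABLE daily_revenue AS
    SELECT order_date, SUM(total) AS revenue
    FROM refined_orders
    GROUP BY order_date
""")
```

Every stage is another copy to maintain, another place for the data to go stale, and another point where detail a future use case might need gets thrown away.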
The agile data lakehouse is a strategy that accommodates decentralization and scale; or, as O'Rourke explained, it means doing more with less.
“We want to land the data in its original form and retain that original form, and then apply different use cases and workloads to the data, essentially as it is. This means different analytics issues and different domains have access to information in the way they're used to seeing it,” said O’Rourke.
This method supplies data scientists with a wealth of raw data and increases accessibility through an open store that users can query directly. O’Rourke acknowledged, however, that this process is anything but trivial; so how does it actually work?
O’Rourke explained that the key to improving data engineering is letting the more advanced and capable query engines available today do more of the work, reducing up-front data engineering. While conventional processes flatten, convert, and aggregate data before the query engine can process it, data engineers are now leveraging analytics platforms that bring data in directly from source systems. Leaving data in its original form allows users to derive insights immediately, without waiting for preparation, which increases agility, freshness, and accuracy.
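As a contrast to the staged pipeline above, here is a minimal sketch of the direct approach O'Rourke describes, again using DuckDB as a stand-in for a modern query engine; the file layout and column names are assumptions.

```python
# A minimal sketch of querying landed data in its original form: no
# flattening, conversion, or pre-aggregation step runs ahead of the query.
import duckdb

# The engine reads the raw files at query time, so results are as fresh
# as the latest landed file and no detail has been discarded up front.
fresh_revenue = duckdb.sql("""
    SELECT o.order_date, SUM(i.qty * i.unit_price) AS revenue
    FROM read_parquet('landing/orders/*.parquet')      AS o
    JOIN read_parquet('landing/order_items/*.parquet') AS i
      ON i.order_id = o.order_id
    GROUP BY o.order_date
    ORDER BY o.order_date
""").df()
```

Because nothing was discarded up front, a data scientist can return to the same files with an entirely different question, and no new pipeline needs to be built.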
Breaking free from the traditional pipeline and introducing more simplicity, agility, and speed are the critical responses needed to empower modern, effective data engineering, concluded O’Rourke.
Chai’s analysis of data engineering trends focused on a Satori survey of more than 300 data leaders in data engineering, architecture, BI, analytics, and data science, spanning multiple industries and enterprise sizes. Satori asked about common data challenges, including data sharing and comprehensive roadmaps for continuous benefit, and how those ambitions stack up against the reality of each enterprise’s people, technology, and processes.
The survey revealed a multitude of statistics highlighting the broader concerns of data workers and their workloads. While 75% of companies plan to increase their use of data, a seemingly positive prediction, more data and more people using it raise a few alarms: How will processes scale? What will break? How can data be made easy to prepare, ingest, and access?
To answer these questions, Chai directed viewers toward a significant trend: 61% of companies take either manual or siloed approaches to enabling data access. As to why, Chai presented the statistics: 75% of companies deal with sensitive or regulated data, and 61% of data leaders spend more than 10% of their time on access management and security. This accumulation of overhead and manual labor is an obvious detriment to data access, as regulation and scale become the boogeymen of effective, agile data engineering.
Luckily, Chai pointed to a promising data point: the 20% of teams that have automated data access significantly improved their data processes and were able to get data to the people who need it more quickly. Transitioning toward dynamic, real-time access, automation, self-service, and baked-in security is fundamental to making data accessible. These strategies, Chai concluded, can be implemented with Satori’s solution, a cloud service that is identity- and data-aware and dynamically manages access to data across all data silos.
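The pattern Chai describes can be sketched generically. The snippet below is a hypothetical illustration of identity- and data-aware masking decided at query time; the role names, classification tags, and helper functions are invented for the example and are not Satori's actual API.

```python
# Hypothetical sketch of dynamic, identity- and data-aware access control:
# the decision happens at read time instead of via manual, per-user grants.
from dataclasses import dataclass

SENSITIVE_COLUMNS = {"ssn", "email", "salary"}  # assumed classification tags

@dataclass
class User:
    name: str
    roles: set

def apply_policy(user: User, row: dict) -> dict:
    """Return the row with sensitive fields masked unless the caller's
    identity carries a role that permits seeing them."""
    if "pii_reader" in user.roles:
        return row
    return {
        col: ("***" if col in SENSITIVE_COLUMNS else val)
        for col, val in row.items()
    }

# Self-service in action: any analyst can query immediately, and masking is
# applied automatically, so no ticket or manual grant blocks access.
analyst = User(name="dana", roles={"analyst"})
record = {"customer_id": 42, "email": "a@example.com", "salary": 90000}
print(apply_policy(analyst, record))
# -> {'customer_id': 42, 'email': '***', 'salary': '***'}
```

In a real deployment, the classification would come from automated data discovery and the check would sit in front of every query across silos, which is what lets access management scale without manual work.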
For more information about data engineering trends, and to see Satori’s full report, you can view an archived version of the webinar here.