While these forces converge, “we believe that the days of maintaining both data warehouses and data lakes are limited,” said Minnick. “The lines between data engineering, data analytics, and data science are becoming increasingly blurry. The people in these fields are working more closely together than ever, driving the need to have everyone collaborating on the same data.” As a result, he predicted, “a data lakehouse will be the norm since it builds on the open data lake, where most businesses already store the majority of their data. The data lakehouse also adds the transactional support and performance of a data warehouse needed for analytics that data lakes never delivered. The unified approach of the lakehouse will enable better collaboration, with all data team members working from the same data instead of their own siloed versions.”
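To make concrete what “transactional support that data lakes never delivered” means, here is a minimal sketch of the idea behind a lakehouse transaction log: writers atomically append numbered commit files, and readers replay the log to reconstruct a consistent snapshot of which data files are live. The function names (`commit`, `snapshot`) and log layout are illustrative only, not any real lakehouse implementation.

```python
import json
import os
import tempfile

def commit(log_dir, actions):
    """Atomically append one commit (a list of add/remove actions) to the log."""
    version = len(os.listdir(log_dir))
    path = os.path.join(log_dir, f"{version:08d}.json")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(actions, f)
    os.rename(tmp, path)  # atomic rename: readers see all of the commit or none
    return version

def snapshot(log_dir):
    """Replay the log in order to get the current set of live data files."""
    live = set()
    for name in sorted(os.listdir(log_dir)):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(log_dir, name)) as f:
            for action in json.load(f):
                if action["op"] == "add":
                    live.add(action["file"])
                elif action["op"] == "remove":
                    live.discard(action["file"])
    return live

log_dir = tempfile.mkdtemp()
commit(log_dir, [{"op": "add", "file": "part-000.parquet"}])
# A compaction-style commit: swap the old file for a new one in one step.
commit(log_dir, [{"op": "add", "file": "part-001.parquet"},
                 {"op": "remove", "file": "part-000.parquet"}])
print(sorted(snapshot(log_dir)))  # ['part-001.parquet']
```

Because each commit is a single atomic file rename, a reader never observes a half-applied change, which is exactly the guarantee plain object-store data lakes lack.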
Kelker also sees data mesh and streaming architectures playing an increasing role in analytics environments. “A lot of this design will be impacted by forces that pull in both directions—5G and multi-access edge computing concepts can suddenly make connected use cases with large data possible. At the same time, the advancement of edge AI, tinyML, and smarter processing chips with low power will remove the need for all data to be transmitted to the cloud. Both forces are competing over how much data needs to be transferred to and from the cloud.”
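The edge-side processing Kelker describes can be sketched in a few lines: rather than streaming every reading to the cloud, a device keeps a local aggregate and transmits only anomalous readings. The sensor values, field names, and 4-sigma threshold below are hypothetical, chosen purely to illustrate the data-volume trade-off.

```python
import random
import statistics

random.seed(7)
# Simulated on-device temperature samples, with one injected fault.
readings = [random.gauss(20.0, 0.5) for _ in range(1000)]
readings[500] = 35.0

mean = statistics.fmean(readings)
stdev = statistics.stdev(readings)

# Ship upstream only readings far outside the norm; everything else
# collapses into a tiny local summary.
to_cloud = [r for r in readings if abs(r - mean) > 4 * stdev]
summary = {"count": len(readings), "mean": round(mean, 2)}

print(f"{len(to_cloud)} of {len(readings)} readings transmitted; summary={summary}")
```

The point of the sketch is the ratio: a thousand raw readings reduce to one summary record plus a handful of anomalies, which is the bandwidth argument for pushing intelligence to the edge.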
Overall, the trend is toward “data and analytics commoditizing in the cloud,” said Stijn “Stan” Christiaens, co-founder and chief data citizen at Collibra. “While the drivers for that are strong and growth is fast, it will still take time for organizations to fully adopt and migrate to a modern infrastructure while at the same time ‘keeping the lights on’ [in] the old.” He also foresees more focus on privacy-enhancing technologies, such as encryption that keeps analytic patterns in data intact, and on synthetic data, as well as greater focus on real-time use cases, for example with time-series approaches.
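A hedged sketch of the synthetic-data idea Christiaens alludes to: fit only aggregate statistics on sensitive records, then generate brand-new artificial records that preserve those patterns for analytics without copying any real row. The salary column and lognormal model here are assumptions for illustration, not anything prescribed in the article.

```python
import math
import random
import statistics

random.seed(42)
# Stand-in for sensitive records (e.g., salaries), lognormally distributed.
real = [random.lognormvariate(11.0, 0.4) for _ in range(5000)]

# Fit: retain only the aggregate pattern (log-scale mean and spread),
# never the individual rows.
mu = statistics.fmean(math.log(x) for x in real)
sigma = statistics.stdev([math.log(x) for x in real])

# Generate: draw entirely new records from the fitted distribution.
synthetic = [random.lognormvariate(mu, sigma) for _ in range(5000)]

# Aggregate analytics on the synthetic set track the real set closely.
print(round(statistics.median(real)), round(statistics.median(synthetic)))
```

Real privacy-preserving synthesis adds formal guarantees (for example, differential privacy) on top of this fit-and-sample pattern; the sketch shows only the core trade of row-level fidelity for pattern-level fidelity.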
Wingfield sees a rise in containerized analytics architectures, which make analytics capabilities “more composable so that they can be more flexibly combined into applications.” In addition, data lake and data warehouse architectures “will also need to be containerized to meet the resource demands associated with big data, artificial intelligence, machine learning, streaming analytics, and other resource-intensive decision intelligence tools that are straining older data lake architectures.”
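The composability Wingfield describes can be illustrated in miniature: each analytics capability is a small self-contained stage (in a containerized architecture, each might run as its own service), and applications assemble stages into pipelines as needed. The stage names and data shape below are purely illustrative.

```python
from functools import reduce

def clean(rows):
    """Drop records with missing values."""
    return [r for r in rows if r.get("value") is not None]

def enrich(rows):
    """Annotate each record with a derived flag."""
    return [{**r, "high": r["value"] > 100} for r in rows]

def aggregate(rows):
    """Reduce records to a single total."""
    return sum(r["value"] for r in rows)

def compose(*stages):
    """Combine independent stages into one pipeline, applied left to right."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)

pipeline = compose(clean, enrich, aggregate)
rows = [{"value": 120}, {"value": None}, {"value": 30}]
print(pipeline(rows))  # 150
```

Because each stage exposes the same rows-in/rows-out contract, applications can recombine them freely, which is the property containerization extends from functions to independently deployed services.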
There will also be a push to democratize data analytics as these platforms open up and become more ubiquitous within enterprises. “The initial personas who were doing a lot of work on data lakes were architects and hardcore data scientists who were able to create complex analytic pipelines with multiple programming frameworks,” said Raghavan. “This was a hard-to-replicate core competency that resulted in a hard-to-justify mystique.” This will change, with more use cases, more types of users, and more applications derived from data lakes, said Raghavan.