What types of platforms are most viable for modern data analytics requirements? Enterprises today have a wide variety of choices, including data lakes, warehouses, and lakehouses, whether resident in an on-site data center or accessed via the cloud. It’s a matter of finding the best fit for the business task at hand.
Choices will depend on the nature and purpose of the data. “There’s no one-size-fits-all solution,” said Teresa Wingfield, director of product marketing at Actian. A flexible, multi-platform approach is needed with a wide range of data capabilities—be it a data warehouse, data lake, transactional database, IoT database, or third-party data service, she noted.
The data platform best for analytics “depends on the type of information you want to analyze,” said Matthew Adams, senior cloud architect at ZL Technologies. “If you hold large amounts of easily categorizable, structured data—sales metrics, warehouse items, or customer information—data warehouses are ideal. Structured reservoirs and warehouses cater to analytics by design as they are organically segmented and neatly packaged.” In contrast, “if you want to analyze information that is less structured—emails, files, or messages—then a data lake is a better approach.”
MORE CHOICES
There is a third way, observed Herb VanHook, vice president of enterprise CTO Services at BMC Software. Emerging data lakehouses address requirements for analyzing both structured and unstructured data, he said. “Lakehouses are a new construct that applies querying capabilities associated with structured data and data warehouses alongside the diverse workloads that data lakes use to process many formats. Many cloud providers offer a set of different platform capabilities, including a data warehouse with querying, a data lake with analytics processing, and managed infrastructure, to run these platforms.”
The data lakehouse architecture offers a compelling option, “since it combines the best qualities of data warehouses and lakes to provide a single solution for all significant data workloads,” including SQL analytics, business intelligence, data science, and AI, agreed Joel Minnick, vice president of marketing at Databricks.
Data warehouses “have been used in the past for lagging indicators, such as sales of the last quarter per product line,” said Prashant Kelker, partner for digital strategy and solutions at ISG. “The focus is now shifting to leading indicators, for example, what could the forecasted sales be per product line? Leading indicators are making way for either judgment or prediction algorithms. Data lakes and data warehouses are merging into concepts like data lakehouses.”
Neither the data lake nor the data warehouse is a perfect solution, Minnick pointed out. “They have contrasting benefits that often require organizations to deploy both architectures, which is costly to maintain, complicated, and causes information silos that slow down decision making. Data lakehouses provide the low-cost, open standards support of a data lake with data warehouse-like performance, reliability, quality, and scale. Lakehouses can support both structured and unstructured data, including video, audio, and text.”
In addition, the range of choices extends well beyond the bounds of the enterprise, with many options easily accessible via APIs, which support “a number of easily integrated data orchestration platforms,” said Pragyansmita Nayak, chief data scientist at Hitachi Vantara Federal. “Open and easily extensible architectures enable this data exchange without significant development and learning.”