To maximize the value of their data, organizations are building large data teams with both supply-side functions that provide analytics-ready data from across the enterprise, and demand-side teams that analyze, transform, model, and commercialize data.
Multiple cloud data platforms improve agility, reduce cost, and get more data to more users. The shift to the cloud entails rewriting decades of on-premise data pipelines, shifting ETL processes to ELT to enable end-user agility, and adopting multiple cloud analytics platforms and data science tools.
Thus, data teams need a modern framework, architecture, and suite of tools to make their cloud data platforms work together efficiently and securely.
Sumit Sarkar, director of product marketing at Immuta, discussed all this and more during his Data Summit Connect 2021 presentation, “Examining the Current State of DataOps and Data Engineering.”
The annual Data Summit event is being held virtually again this year—May 10–May 12—due to the ongoing COVID-19 pandemic.
DataOps is a collaborative data management practice focused on improving the communication, integration and automation of data flows between data engineers and data consumers across an organization, Sarkar explained.
“It’s a mix of philosophies, methods, and tools,” Sarkar said.
DataOps is not a data integration tool, anything using Apache Airflow, or another type of unpaid engineer, he said.
One emerging trend is that data teams are adopting a diverse mix of cloud-based platforms. The migration or expansion of cloud capabilities seems to be a few years behind but is moving forward quickly, Sarkar said. According to a recent survey conducted by Immuta, 52% of data teams plan to adopt two or more cloud data platforms in the next 12-24 months.
“It’s a sign of organizations expanding best of breed capabilities,” Sarkar said.
The next trend is the growing use of sensitive and regulated data. According to the Immuta survey, 75% of respondents currently collect and use sensitive data or expect to do so. Data engineers are outnumbered 30:1 by data consumers, according to Sarkar.
Challenges multiply as organizations add cloud data platforms. They include a proliferation of roles, hard-to-classify data, and disparate data protection.
“Roles are becoming harder to manage across workloads within the cloud,” Sarkar said.
DataOps teams should automate certain tasks within the cloud, he recommended. These areas include the ingestion and transformation layer; the storage layer (Object Store Controls); the query and processing layer; the output layer (analytical tool controls); and the decoupled policy enforced on the query and processing layer.
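As a rough illustration of what a decoupled policy enforced at the query and processing layer might look like, the Python sketch below defines a masking rule once, apart from any platform, and applies it to query results. It is not from Sarkar's presentation or any Immuta product; the names `MaskingPolicy`, `apply_policy`, and the `pii_approved` attribute are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaskingPolicy:
    """Hypothetical policy: mask sensitive columns unless the user holds an attribute."""
    sensitive_columns: frozenset
    required_attribute: str


def apply_policy(rows, user_attributes, policy):
    """Return copies of the rows, masking sensitive columns for unauthorized users."""
    if policy.required_attribute in user_attributes:
        # Authorized users see the data unchanged (copied, not shared).
        return [dict(row) for row in rows]
    return [
        {col: ("***" if col in policy.sensitive_columns else val)
         for col, val in row.items()}
        for row in rows
    ]


# One policy definition, reusable across whatever platforms run the query.
policy = MaskingPolicy(frozenset({"ssn"}), "pii_approved")
rows = [{"name": "Ada", "ssn": "123-45-6789"}]

analyst_view = apply_policy(rows, {"analyst"}, policy)
steward_view = apply_policy(rows, {"analyst", "pii_approved"}, policy)
```

Because the rule is keyed to user attributes instead of per-platform roles, adding another platform in this sketch would mean reusing the same policy object, not defining a new set of roles.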
The impact of this automation is more permitted use cases, higher data engineering productivity, fewer roles to manage, and less time to data access, he said.
Register here now for Data Summit Connect 2021, which continues through Wednesday, May 12.