The cloud is changing the way businesses use data, and at the same time the way we use data is changing the cloud.
Organizations are adopting multiple cloud data platforms to maximize data's value, but in doing so they often increase complexity for data engineering and DataOps teams, which may undermine the very reason for investing in those platforms.
DBTA recently held a webinar featuring Sumit Sarkar, senior director of product marketing at Immuta, who discussed the three biggest hurdles for DataOps teams managing cloud data platforms and how to overcome them.
DataOps is a collaborative data management practice focused on improving the communication, integration and automation of data flows between data engineers and data consumers across an organization, Sarkar explained.
DataOps is not a data integration tool, anything using Apache Airflow, or even another type of underpaid engineer, he said.
Right now, DataOps teams are seeing several trends emerge, including the use of diverse cloud platforms and the use of sensitive and regulated data.
According to an Immuta survey, the “State of Data Engineering and Operations for Analytics, 2020,” 75% of respondents expect to be “entirely” or “primarily” cloud-based in the next 24 months.
Data teams found “masking or anonymizing data” and “monitoring and auditing data use” to be the most challenging steps in the data management process, Sarkar said. These challenges multiply as organizations add more cloud data platforms.
The three biggest hurdles that come with more cloud data platforms and more sensitive data are:
- Multiplied role explosion
- Hard-to-classify data
- Disparate data protection
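The first hurdle can be made concrete with a little arithmetic. The sketch below is not from the webinar; it assumes, purely for illustration, that access is granted through one role per combination of platform, dataset, and access level. The function `count_roles` and every platform, dataset, and access-level name in it are hypothetical.

```python
from itertools import product


def count_roles(platforms, datasets, access_levels):
    """Count roles when every (platform, dataset, access level)
    combination needs its own role (an assumption for illustration)."""
    return len(list(product(platforms, datasets, access_levels)))


# Hypothetical inventory, not taken from the webinar.
platforms = ["warehouse_a", "lake_b"]
datasets = ["orders", "customers", "payments"]
access_levels = ["read_masked", "read_full"]

# Roles needed on a single platform versus on both platforms.
one_platform = count_roles(platforms[:1], datasets, access_levels)
two_platforms = count_roles(platforms, datasets, access_levels)
print(one_platform, two_platforms)
```

Under this assumption, adding a second platform doubles the number of roles to manage from 6 to 12, and each additional dataset or access level multiplies the count again.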
There are several layers of the modern cloud data ecosystem that can be automated, he said. These areas include the ingestion and transformation layer; the storage layer (object store controls); the query and processing layer; the output layer (analytical tool controls); and the decoupled policy enforced on the query and processing layer.
According to Sarkar, the impact of this automation is a 400% increase in permitted use cases, a 40% increase in data engineering productivity, a 200X simplification of the roles to manage, and a reduction in time to data access from months to seconds.
An archived on-demand replay of this webinar is available here.