DataOps, an approach to delivering data quickly and accelerating the deployment of analytics solutions, can be a key driver of data analytics democratization. Though ideal in principle, it is not without obstacles relating to data analytics processes and practices.
Frank Cervone, program coordinator of information science and data analytics at San Jose State University, led the Data Summit session, “Developing Database DevOps,” to explore how DataOps can be optimized to lead organizations to better business outcomes in both data and analytics work.
The annual Data Summit conference returned to Boston, May 10-11, 2023, with pre-conference workshops on May 9.
Cervone began by stating that DataOps is more than an extension of DevOps concepts to data; rather, as Gartner defines it, DataOps is a collaborative data management practice focused on improving the communication, integration, and automation of data flows between data managers and data consumers across an organization.
“What we’re really focusing on is how to improve the communication and integration of data flows across an organization,” said Cervone.
DataOps is a relatively new phenomenon, which leads to misunderstandings about what the strategy truly entails. It became critical in the world of data because of the growing complexity and variety of data sources, explained Cervone.
The problems that DataOps attempts to solve are:
- Complexity and shifting requirements
- Strategic alignment of data projects
- Reducing delays without impacting data quality
- Addressing questions of data provenance and lineage
- Lack of trust in data
These concerns are warranted, said Cervone. According to a 2019 Experian study, 98% of companies rely on data to enhance customer experience. Further, according to a Gartner study from 2015, more than 60% of data analytics projects fail because organizations cannot keep up with their own complex data environments.
Though the issues associated with DataOps are similar to those of DevOps, “one of the differences that are important is, while DevOps brings the development and operation teams together, DataOps operates externally,” said Cervone.
DataOps is rooted in agile and in working within a distributed data architecture, where the question is, “How can we implement agile processes [across] organizations as a whole?” According to Cervone, this implementation may occur without the organization knowing it is going on, since formally introducing the process may create more difficulty than implementing it without an introduction.
The end goal of DataOps is to take advantage of data governance while building and deploying reliable data pipelines and producing predictable analytics processes. Cervone strongly emphasized the data governance aspect of DataOps: without governance, DataOps will be impossible to deploy.
“When we think about implementing DataOps, there are a couple different ways to look at it; one of them is a high-level framework, which requires establishing a groundwork that prepares and optimizes for operation, implementing through knowledge, trust, and use cases, and then a series of iterative improvements,” explained Cervone.
DataOps may also be implemented through a lifecycle approach: a cyclical process that begins by establishing the framework through planning, development, integration, and testing, and then transitions to implementation and improvement through deployment, operation, and monitoring.
Cervone stressed that there are a few key takeaways when engaging in a DataOps development journey:
- Gain stakeholder alignment on KPIs early and then revisit them periodically
- Automate as many tasks as possible through relevant tools
- Deploy and iterate as a culture
- Invest or divest to self-service, depending on one’s perspective
- Focus on data quality, then scale
- Encourage data observability to understand the health of data in systems
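
To make the data quality and observability takeaways more concrete, the following is a minimal Python sketch of an automated health check on a batch of records. It is an illustration only and was not presented in the session; the function name `health_report`, the `loaded_at` field, and the 24-hour freshness threshold are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def health_report(rows, required_fields, now, max_age=timedelta(hours=24)):
    """Summarize volume, completeness, and freshness for a batch of records.

    `rows` is a list of dicts. The `loaded_at` key (a timezone-aware
    datetime) is a hypothetical load timestamp used for this sketch.
    """
    total = len(rows)

    # Completeness: share of rows where each required field is missing.
    # An empty batch is treated as fully incomplete.
    null_rate = {}
    for field in required_fields:
        missing = sum(1 for row in rows if row.get(field) is None)
        null_rate[field] = missing / total if total else 1.0

    # Freshness: the newest record must be no older than `max_age`.
    latest = max(
        (row["loaded_at"] for row in rows if row.get("loaded_at") is not None),
        default=None,
    )
    fresh = latest is not None and (now - latest) <= max_age

    return {"row_count": total, "null_rate": null_rate, "fresh": fresh}


# Example usage with a fixed clock so the result is reproducible.
now = datetime(2023, 5, 10, 12, 0, tzinfo=timezone.utc)
rows = [
    {"customer_id": 1, "email": "a@example.com", "loaded_at": now - timedelta(hours=2)},
    {"customer_id": 2, "email": None, "loaded_at": now - timedelta(hours=1)},
]
report = health_report(rows, ["customer_id", "email"], now)
```

A check like this could run after each pipeline load, with the resulting report tracked over time so that drops in volume, completeness, or freshness are noticed before data consumers encounter them.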
Many Data Summit 2023 presentations are available for review at https://www.dbta.com/DataSummit/2023/Presentations.aspx.