Large datasets are now being leveraged for real-time analysis, and the tried-and-true approach of cobbling together technologies and policies to achieve workable data governance and security no longer works. The rapid emergence of AI and machine learning, among other things, has turned the data governance and security world upside down.
Traditionally, enterprise data governance initiatives “have primarily been driven by manual processes and reactive to changing data privacy, security policies, and regulatory requirements,” said Matt Aslett, director of research, data, and analytics at ISG Software Research. “This poses challenges for enterprise attempts to respond quickly to evolving security threats, competitive dynamics, and regulations.”
These traditional approaches to data governance and security “are insufficient to manage the risks posed by AI systems,” said Steven Tiell, global head of AI governance advisory at SAS. “They fall short because they’re generally confined to engineering and product teams, and AI requires a more stakeholder-rich ecosystem.”
There’s a “long list” of data governance and data security challenges these days, said Heather Gentile, director of product for IBM watsonx. There’s “shadow data, AI, and mounting compliance measures and the squandered potential of untapped, unstructured data,” she said. “One of the biggest gaps, however, isn’t a tactical issue—it’s a strategic one.”
The issue is the “siloed approaches to governance and security” seen within enterprises, Gentile continued. “They treat them as separate disciplines, rather than integrated and reinforcing efforts with common objectives. Data and AI that aren’t properly governed can lead to security risks, and data and AI that aren’t properly secured can lead to compliance and operational risks.”
The siloed approach is driven “by extreme demand for operational and analytical applications,” said Scott Gnau, VP of data platforms at InterSystems. “As such, the notion of provenance becomes very important—where was it created, who touched it, where did it travel. True provenance means traceability to a data’s source and enables trust in applications.”
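To make Gnau’s three provenance questions concrete, here is a minimal sketch, in Python, of what a lineage record answering them might look like. All names and fields are illustrative assumptions, not any vendor’s actual schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenanceRecord:
    """Illustrative lineage entry answering Gnau's three questions:
    where the data was created, who touched it, where it traveled."""
    dataset_id: str
    created_at: datetime                              # when it was created
    source_system: str                                # where it was created
    touched_by: list = field(default_factory=list)    # who touched it
    hops: list = field(default_factory=list)          # where it traveled

    def record_access(self, user: str, system: str) -> None:
        """Append an access event so downstream consumers can trace trust."""
        self.touched_by.append((user, datetime.now(timezone.utc)))
        if system not in self.hops:
            self.hops.append(system)

# Example: trace a customer table from its source through an AI pipeline
record = ProvenanceRecord(
    dataset_id="customers_v3",
    created_at=datetime.now(timezone.utc),
    source_system="crm_prod",
)
record.record_access(user="etl_service", system="lakehouse")
record.record_access(user="feature_pipeline", system="vector_store")
print(record.hops)  # ['lakehouse', 'vector_store']
```

Even a record this simple supports the traceability Gnau describes: any application consuming the dataset can inspect its source and full travel history before deciding to trust it.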
The rise of AI is also “giving way to new data types and locations” that may exist under the IT department’s radar, said Rick Vanover, senior director of product strategy at Veeam. “Vector databases, training data, the large language models, and more are components of an AI infrastructure and exist somewhere. This isn’t new. Call it the next ‘shadow IT.’”
The challenge is that “once an AI solution is part of a business process, it should be managed, protected, and subject to the same compliance and procedural elements as everything else,” said Vanover. “The biggest shortcoming for governance and security lies in the identification and governance of data.”
“Most issues arise from outdated, fragmented systems, which have insufficient access controls, lack observability, and have poor data lineage,” said Omar Khawaja, field CISO at Databricks. “Fragmented data estates are drivers of data silos, leading to challenges with implementing privacy, regulation, and security consistently.”
Another area threatening enterprise data security is poor data quality and a lack of oversight. Unreliable, incomplete, or missing data makes AI algorithms less trustworthy, and it can have broader implications for an organization as well, including inaccurate conclusions, vulnerabilities, and compliance risks.
AI EXACERBATES GOVERNANCE AND SECURITY ISSUES
But nothing has made governance and security more difficult than the rise of AI and machine learning.
To some extent, the challenges around data governance and security may look familiar to those who have been overseeing such efforts over the years. “AI is heavily dependent on data, so the same data governance and privacy issues that impact data and analytics also impact AI,” said Aslett.
And AI brings the risk of even more data silos, said Gnau. This is a challenge that has vexed data professionals for decades.
Unfortunately, it will only get worse. “As organizations become more data-driven, all business outcomes ultimately depend on the integrity of that data. With this, one main challenge persists—data silos,” said Shayde Christian, chief data and analytics officer for Cloudera. A typical organization has more than 10 data silos, he added.
This contributes to “fragmented efforts toward governance, via well-managed but isolated islands of data, which can provide the illusion of control, though these efforts are usually fragile in nature,” Christian said.
Then there’s the pervasiveness of AI. “Sensitive data is now woven into a range of enterprise AI tools and applications, making it more vulnerable,” said Gentile.
There is also the scale required for AI. Consider the example of banking: “Banks have models they’ve been using for decades—they understand the risks associated with these models and have mitigation built into standards and controls,” said Tiell. “In contrast, with today’s AI and GenAI models, the risks are not as well-known. Model-building techniques have become less transparent and less explainable, and models are often exposed to consumers and many more individuals who may not even know they’re using AI.”
The torrid pace of AI development itself is creating governance and security challenges. “In my hundreds of meetings with executives, boards, and security teams, I’ve found two conflicting mindsets about adopting AI,” Khawaja related. “One party is business and tech leaders eager to invest—I call them the gas. And on the other [side], you have security, legal, compliance, and governance teams, or the brakes. With AI adoption, you need both to avoid taking shortcuts in the decision-making processes and governance control implementation.”
Another challenge in governing and securing AI-bound data is model drift.
“When the underlying data or real-world conditions the model operates within shift, the initial model assumptions become outdated,” said Tiell. “This requires more robust governance.”
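To ground Tiell’s point, here is a minimal sketch of one common way such governance can be operationalized: a statistical check comparing a feature’s training distribution against recent production data. The two-sample Kolmogorov–Smirnov test and the threshold below are illustrative choices, not methods any of the speakers prescribe:

```python
# Minimal drift check: compare a feature's training distribution
# against recent production data. Assumes SciPy is available.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
training_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)    # baseline
production_feature = rng.normal(loc=0.4, scale=1.0, size=5_000)  # shifted world

res = ks_2samp(training_feature, production_feature)
if res.pvalue < 0.01:
    # Initial model assumptions are now outdated; trigger a governance review.
    print(f"Drift detected (KS statistic={res.statistic:.3f}); flag model for review.")
else:
    print("No significant drift detected.")
```

Running a check like this on a schedule turns “the real-world conditions shifted” from a postmortem finding into a routine governance signal.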
Generative AI (GenAI) also potentially exacerbates governance risks, as it “can inadvertently generate biased or harmful content,” said Aslett. “Enterprises must ensure that ethical guidelines are in place to direct AI-generated outputs and prevent unintended consequences. GenAI models also learn from historical data, which may contain biases. Organizations need robust mechanisms to detect and rectify bias during model training and deployment. Plus, enterprises need to create and enforce policies that govern the use of sensitive data by GenAI applications, including strong privacy controls and best practices to safeguard against breaches.”
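One of the “robust mechanisms” Aslett mentions can start as simply as a fairness metric computed over model outputs. The sketch below measures a demographic parity gap; the group labels, synthetic data, and tolerance are assumptions for illustration only:

```python
# Illustrative bias check: demographic parity difference between two
# groups in a model's approval decisions.
import numpy as np

rng = np.random.default_rng(7)
groups = rng.choice(["A", "B"], size=10_000)                          # protected attribute
approved = rng.random(10_000) < np.where(groups == "A", 0.60, 0.48)   # model decisions

rate_a = approved[groups == "A"].mean()
rate_b = approved[groups == "B"].mean()
parity_gap = abs(rate_a - rate_b)

print(f"Approval rate A={rate_a:.2%}, B={rate_b:.2%}, gap={parity_gap:.2%}")
if parity_gap > 0.10:  # example tolerance; real policy would set this
    print("Potential bias: route model to ethics/governance review.")
```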
The range of data types absorbed into AI systems also widens the need for governance and security. “When focusing on AI and machine learning use cases, all data—not just the traditional transactional and data-warehouse-resident data, but unorganized documents and logs and textual data—must be visible,” said Kunju Kashalikar, senior director of product management for Pentaho.