The seemingly endless list of requirements for propelling AI toward success is certainly overwhelming. Narrowing the focus to one fundamental item on that list, data management, and addressing it effectively can set off a domino effect of positive outcomes. However, ensuring that data is relevant, clean, accurate, well-structured, and prepared for AI remains a lofty, and often complex, task.
Danny Sandwell, technology strategist at erwin by Quest, and Helen Kinsella, community of practice leader for data governance and privacy at Informatica, joined DBTA’s webinar, “Achieving AI-Ready Data: Active Metadata, Data Catalogs, and Data Observability,” to discuss the latest advancements, technologies, and best practices in data management that can make AI-ready data a tangible reality.
Sandwell explained that it isn’t surprising “how much people are aligning AI with data governance…making it [AI] their driver for data governance.”
“AI is really going to be…the next GDPR in terms of that hyper accelerator for people to really, seriously look at how they’re governing their data and how they’re managing their data in the background to make sure that it’s…going to bring success to their efforts in the AI world,” Sandwell continued.
According to Sandwell, bridging the gap between data and the real world will be key to these efforts. Organizations can do this by adhering to the four “C’s” of AI-ready datasets:
- Consistent: Trusted, value-scored, ready, and available
- Contextual: Context-based indexing and searching
- Curated: Business definitions, rules, processes, and context
- Clean: Watermarked, trusted, and free from drift
Built atop these four components is erwin by Quest’s framework for AI model governance, which covers three phases: governing data input, certifying the AI model, and observing model output. In the input phase, it is crucial to apply the four “C’s” of AI-ready data, supported by data cataloging, classification, lineage observation, and scoring.
Next, the AI model and its data should be certified as fit for use. Finally, the output should be monitored for data quality, data drift (when the data starts changing significantly over time), and sensitive data leaks.
Examining why enterprise data is generally not ready for AI is critical to understanding how to fix it.
According to Informatica’s “CDO Insights 2024: Charting a Course to AI Readiness” study, 93% of AI adopters encountered roadblocks, which were identified as the following:
- Data quality (42%)
- Data privacy and protection (40%)
- AI ethics (38%)
- Quantity of domain-specific data for training and fine-tuning of LLMs (38%)
- AI governance (36%)
Ultimately, Kinsella noted, bad data leads to bad AI: the success of AI depends entirely on robust, responsible, and relevant data. Describing what good, AI-ready data looks like, Kinsella identified the following attributes:
- Accurate, transparent, and contextual, leveraging a universal metadata foundation that helps deliver AI answers tailored to your unique business
- Governed, democratized, and secure, aligning to set standards and helping you deliver AI that is compliant, private, and unbiased
- Complete, resilient, enterprise-scale, and consistent, making AI more powerful and reliable
Achieving this kind of AI-ready data is the ideal, but it runs into another obstacle: the sheer volume of data organizations face today.
“There’s too much data to do everything manually…to look at that profile, to create that individual rule,” Kinsella explained. “They’re looking for quality of data, to observe it, because they need to feed that data into the AI models—but they need to do that in a way that can scale.”
In response to this challenge, Informatica developed the Intelligent Data Management Cloud (IDMC), an AI-powered data management platform built on a metadata system of record to make relevant, responsible, and robust data for AI possible. IDMC is an end-to-end data management platform designed to advance AI success. It uses Informatica’s CLAIRE engine to simplify data management through GenAI-powered natural language interaction, helping customers scale their data management beyond what manual effort alone can achieve.
For the full, in-depth webinar featuring detailed explanations, real-world examples, a roundtable discussion, and more, you can view an archived version of the webinar here.