For many, governance remains a dirty word: It’s bureaucratic, restrictive, and slows things down. This perception is diametrically opposed to data governance’s true objective, which is to enable rapid yet appropriate exploitation of enterprise data assets.
Consider a metaphor: Brakes don’t exist to make cars go slow; brakes allow cars to go fast. In the same vein, AI provides capabilities that allow your data to go fast (i.e., deliver value). But AI solutions also require brakes to ensure that they don’t veer off the road in dangerous ways, such as ways that affected parties do not consider fair, equitable, or right.
As a result, AI is both a benefactor to and a beneficiary of data governance. Here’s how AI contributes to the practice of data governance while simultaneously creating new challenges to overcome.
The Opportunity: Three Ways AI Benefits Data Governance
- Data as an Asset
“Data is an asset.” “Organizations are data-driven.” These aspirational statements are often proclaimed but hard to achieve. Digital natives understand data as a core, value-added business enabler. Yet for non-natives, the formal strategic and fiscal connection between data and business value can be murky.
Enter AI. Unlike the dashboards of yesteryear, AI-enabled products and services, from chatbots to self-driving cars, provide a clear connection between the data that goes in and the business outcome that comes out. AI has also raised awareness of the risks (including incumbent bias) associated with data in the wild.
Additionally, AI has made organizations cognizant of the need to mindfully create, and not just collect, information. Historically, business processes were designed with just enough data to operate, and analytics mined (and made do with) the available data exhaust. Emerging “big data” platforms and analytics techniques such as machine learning and natural language processing made it possible to exploit diverse data sources such as social media. But even in such scenarios, a Twitter stream that provided rich analytics fodder was a natural byproduct of the application rather than data captured by design.
With AI, organizations are increasingly aware of the need to proactively identify the information that must be generated or captured to drive future value, even if that data is not core to the process itself.
So, has data hit the balance sheet? Not yet. But the link between data investment and business outcomes (or the lack thereof) has never been clearer.
- Augmented Data Management
Given the heavy lifting that data management has always entailed, it is no surprise that it is a target for AI augmentation. Using machine learning, natural language processing, and computer vision, AI can be a powerful ally in the following core data management tasks:
- Data curation: Organizations can evaluate incoming data streams to identify patterns, categorize the content, and identify discrete attributes. AI may be applied to help understand structured sources, to perform initial analysis of dirtier, more chaotic data from sources such as sensors (common in IoT), or to tag private or sensitive information subject to enterprise or regulatory constraints.
- Data integration: Organizations can identify common attributes and linkages across data sources. AI can also propose rules, or flag suspicious records and poor-quality content, based on conformance to data standards (including valid values) established within or across identified datasets.
- Master data management (MDM): Organizations can automate the ongoing identification and matching of master attributes (i.e., people, places, things) across sources. Going a step beyond traditional approaches, AI-enabled MDM often uses network/graph models to represent relationships.
- Data quality: Organizations can apply standard statistical analysis to score the overall quality of a data feed and its attributes based on comparison to known value sets. Metrics can be evaluated at both the data stream and attribute levels, allowing more granular data grading. In addition, relevant data quality rules can be proposed based on comparison to similar data attributes.
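The data quality task can be illustrated with a minimal sketch. It uses plain rule-based conformance checks rather than machine learning, and it assumes each attribute has a known value set; the attribute names, value sets, and function names are hypothetical. Each attribute is scored as the share of conforming values, the stream is graded as the unweighted mean of those scores (an assumption; real tools may weight attributes differently), and records that fail any check are flagged, which is the same valid-value test the data integration task relies on.

```python
# Minimal sketch: rule-based data quality scoring against known value sets.
# Attribute names, value sets, and function names are hypothetical.

Record = dict[str, object]

# Hypothetical reference data: the "known value sets" each attribute is checked against.
KNOWN_VALUE_SETS: dict[str, set[str]] = {
    "country": {"US", "CA", "MX"},
    "status": {"active", "inactive"},
}


def attribute_scores(records: list[Record], value_sets: dict[str, set[str]]) -> dict[str, float]:
    """Attribute-level grade: share of records whose value is in the known value set."""
    scores: dict[str, float] = {}
    for attr, valid in value_sets.items():
        # A missing attribute yields None, which never appears in a value set.
        conforming = sum(1 for r in records if r.get(attr) in valid)
        scores[attr] = conforming / len(records) if records else 0.0
    return scores


def stream_score(attr_scores: dict[str, float]) -> float:
    """Stream-level grade: unweighted mean of the attribute scores (an assumption)."""
    return sum(attr_scores.values()) / len(attr_scores) if attr_scores else 0.0


def flag_suspicious(records: list[Record], value_sets: dict[str, set[str]]) -> list[int]:
    """Indexes of records with at least one value outside its known value set."""
    return [
        i
        for i, r in enumerate(records)
        if any(r.get(attr) not in valid for attr, valid in value_sets.items())
    ]


feed: list[Record] = [
    {"country": "US", "status": "active"},
    {"country": "CA", "status": "inactve"},   # misspelled status
    {"country": "XX", "status": "inactive"},  # unknown country code
    {"country": "MX"},                        # missing status
]

scores = attribute_scores(feed, KNOWN_VALUE_SETS)
print(scores)                                   # {'country': 0.75, 'status': 0.5}
print(stream_score(scores))                     # 0.625
print(flag_suspicious(feed, KNOWN_VALUE_SETS))  # [1, 2, 3]
```

In the AI-augmented setting described above, value sets and rules like these would be proposed from similar attributes rather than hard-coded; the fixed dictionary only keeps the sketch self-contained.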
A word of caution: Augmented data management does not occur autonomously. Algorithms trained on standard industry constructs provide a running start, but they do not negate the need to train against your own data. Humans must also remain in the loop to direct ongoing learning: monitoring overall efficacy, providing corrective input, and training systems on new content categories.
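One way to keep humans in the loop is sketched below, assuming a classifier (not shown) that emits a label and a confidence score for each column it tags, as in the sensitive-data tagging described under data curation. The class name, the 0.9 threshold, and the column names are hypothetical. Confident predictions are accepted automatically, the rest wait for a data steward, each steward decision is kept as a labeled example for retraining, and the override rate serves as a rough efficacy signal.

```python
from dataclasses import dataclass, field


@dataclass
class StewardReviewGate:
    """Hypothetical human-in-the-loop gate between a tagging model and a data steward."""

    threshold: float  # hypothetical confidence cut-off for auto-acceptance
    accepted: dict[str, str] = field(default_factory=dict)
    pending: dict[str, str] = field(default_factory=dict)
    training_labels: list[tuple[str, str]] = field(default_factory=list)
    reviewed: int = 0
    overrides: int = 0

    def route(self, item: str, label: str, confidence: float) -> str:
        """Auto-accept confident predictions; queue the rest for a data steward."""
        if confidence >= self.threshold:
            self.accepted[item] = label
            return "accepted"
        self.pending[item] = label
        return "pending"

    def resolve(self, item: str, steward_label: str) -> None:
        """Apply the steward's decision and keep it as a labeled example for retraining."""
        predicted = self.pending.pop(item)
        self.accepted[item] = steward_label
        self.training_labels.append((item, steward_label))
        self.reviewed += 1
        if steward_label != predicted:
            self.overrides += 1

    def override_rate(self) -> float:
        """Share of reviewed predictions the steward corrected."""
        return self.overrides / self.reviewed if self.reviewed else 0.0


gate = StewardReviewGate(threshold=0.9)
gate.route("col:cust_email", "PII", 0.97)        # accepted automatically
gate.route("col:acct_note", "non-sensitive", 0.62)
gate.route("col:dob", "non-sensitive", 0.55)
gate.resolve("col:acct_note", "non-sensitive")   # steward agrees
gate.resolve("col:dob", "PII")                   # steward corrects the model
print(gate.override_rate())                      # 0.5
```

Retraining itself is out of scope here; the sketch only shows where the steward's corrective input would be captured.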
For more articles like this one, go to the 2020 Data Sourcebook
Data stewards, beware: Your job is still safe, though it may become a little less monotonous and gain a new algorithmic customer. Is the effort worth it? A global pharmaceutical company reports that more than 80% of its core datasets are managed through automated processes.
- AI-Driven or Augmented Analytics
Self-service is the holy grail of analytics and data governance. The sticking point is that self-service is (still) not self-enabled. Yes, widely available tools allow non-technical users to easily generate advanced analyses, from forecasts to machine learning models. All hail the citizen data scientist!