Maintaining Data Consistency

Maintaining consistency is more difficult than it may appear at first. Companies capture data in a multitude of ways. In many cases, customers enter data via web forms, and both the accuracy and the consistency of that data can be an issue. Moreover, data is often imported from third-party source systems, which may use alternative formats to represent “facts.” Indeed, even separate operational units within a single enterprise may represent data differently.
Master data management (MDM) is one approach companies have used to maintain data consistency. MDM technology consolidates, cleanses, and augments corporate data, synchronizing data among all applications, business processes, and analytical tools. Master data management tools provide a central repository for cross-referenced data in the organization, building a single view of organizational data.
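The "single view" idea can be sketched as a golden-record merge: two source systems hold partial, differently formatted views of the same customer, and a cross-reference on a shared key yields one consolidated record. This is a minimal illustration, not a real MDM product's API; all field names and the last-source-wins precedence rule are assumptions.

```python
def build_golden_record(records):
    """Merge records describing one entity into a single view,
    letting non-empty values from later (higher-priority) sources
    override earlier ones."""
    golden = {}
    for record in records:
        for field, value in record.items():
            if value not in (None, ""):
                golden[field] = value
    return golden

# Hypothetical partial views of the same customer from two systems.
crm = {"customer_id": "C100", "name": "Pat Li", "phone": ""}
billing = {"customer_id": "C100", "phone": "555-0100", "email": "pat@example.com"}

print(build_golden_record([crm, billing]))
```

In practice the hard part is the cross-referencing itself (deciding that the CRM and billing rows describe the same customer); here the shared `customer_id` is simply assumed.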
The second element of data quality is completeness. Different stakeholders in an organization need different information. For example, the academic records department in a university may be most interested in a student’s grade point average, the courses in which the student is enrolled, and the student’s progress toward graduation. The dean of students wants to know if the student is living on campus, the extra-curricular activities in which the student participates, and any disciplinary problems the student has had. The bursar’s office wants to know the scholarships the student has received and the student’s payment history. A good data system will not only capture all that information but also ensure that none of the key elements are missing.
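Ensuring that "none of the key elements are missing" amounts to a completeness check run per stakeholder. The sketch below is illustrative only; the office names and required fields are invented for the university example above.

```python
# Hypothetical mapping: which fields each office considers key.
REQUIRED_BY_OFFICE = {
    "registrar": ["gpa", "enrolled_courses", "credits_earned"],
    "bursar": ["scholarships", "payment_history"],
}

def missing_fields(record, office):
    """Return the key fields for an office that are absent or empty
    in this record."""
    return [field for field in REQUIRED_BY_OFFICE[office]
            if record.get(field) in (None, "", [])]

student = {"gpa": 3.4, "enrolled_courses": ["CS101"], "scholarships": []}
print(missing_fields(student, "registrar"))
print(missing_fields(student, "bursar"))
```

The same record can be complete for one stakeholder and incomplete for another, which is why completeness is defined against each consumer's needs rather than against the schema alone.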
The last element of good quality data is conciseness. Information flows into organizations through several different avenues. Inevitably, records will be duplicated and information commingled, and nobody likes to receive three copies of the same piece of direct mail.
Because companies currently operate within such a dynamic information environment, no matter how diligent enterprises are, their systems will contain faulty, incorrect, duplicate, and incomplete information. Indeed, if companies do nothing at all, the quality of their data will degrade. Time decay is an ongoing, consistent cause of data errors. People move. They get married and change their names. They get divorced and change their names again. Corporate records have no way to keep up.
But time is only one of the root causes for bad data. Corporate change also poses a problem. As companies grow, they add new applications and systems, making other applications and systems obsolete. In addition, an enterprise may merge with or purchase another organization whose data is in completely different formats. Finally, companies are increasingly incorporating data from outside sources. If not managed correctly, each of these events can introduce large-scale problems with corporate data.
The third root cause of data quality problems is that old standby—human error. People already generate a lot of data and are generating even more as social media content and unstructured data become more significant. Sadly, people make mistakes. People are inconsistent. People omit things. People enter data multiple times. Inaccuracies, omissions, inconsistencies, and redundancies are hallmarks of poor data quality.
Given that data deterioration is an ongoing facet of enterprise information, for a data quality program to work, it must be ongoing and iterative. Modern data quality programs rest on a handful of key activities—data profiling and assessment, data improvement, data integration, and data augmentation.
In theory, data improvement programs are not complicated. The first step is to characterize or profile the data at hand and measure how closely it conforms to what is expected. The next step is to fix the mistakes. The third step is to eliminate duplicated and redundant data. Finally, data quality improvement programs should address holes in the enterprise information environment by augmenting existing data with data from appropriate sources. Frequently, data improvement programs do not address enterprise data in its entirety but focus on high-value, high-impact information used in what can be considered mission-critical business processes.
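The four steps just described can be sketched as a tiny pipeline: profile, correct, de-duplicate, augment. The rules, field names, and reference source below are illustrative assumptions, not any vendor's API, and real programs run these steps iteratively rather than once.

```python
import re

def profile(records):
    """Step 1: measure conformance -- here, the share of records
    whose email matches a simple expected pattern."""
    pattern = r"[^@\s]+@[^@\s]+\.[^@\s]+$"
    ok = sum(1 for r in records if re.match(pattern, r.get("email", "")))
    return ok / len(records)

def correct(records):
    """Step 2: apply simple fixes (trim whitespace, lowercase emails)."""
    return [{**r, "email": r.get("email", "").strip().lower()} for r in records]

def drop_duplicates(records):
    """Step 3: remove exact duplicates on the email key."""
    seen, unique = set(), []
    for r in records:
        if r["email"] not in seen:
            seen.add(r["email"])
            unique.append(r)
    return unique

def augment(records, reference):
    """Step 4: fill missing fields from an external reference source,
    keeping the record's own values where they exist."""
    return [{**reference.get(r["email"], {}), **r} for r in records]

raw = [{"email": " Ada@Example.com "}, {"email": "ada@example.com"}]
reference = {"ada@example.com": {"country": "US"}}

print(round(profile(raw), 2))  # conformance before cleanup
cleaned = augment(drop_duplicates(correct(raw)), reference)
print(cleaned)
```

Profiling before and after the cleanup steps gives the program a measurable baseline, which matters because, as noted above, these programs must be ongoing and iterative.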
The Big Data Challenge
To date, most data quality programs have focused on structured data. But, ironically, just as the tools, processes, and organizational structures needed to implement an effective data quality program have matured, the emergence of big data has the potential to completely rewrite the rules of the game.