Data quality has been one of the central issues in information management since the beginning: not the beginning of modern computing and the corporate information infrastructure, but the beginning of modern economics, and probably earlier. Data quality is what audits are all about.
Nonetheless, the issues surrounding data quality took on added importance with the data explosion sparked by the large-scale integration of computing into every aspect of business activity. The need for high-quality data was captured in the punch-card days of the computer revolution by the epigram "garbage in, garbage out." If the data isn’t good, the outcome of the business process that uses that data isn’t good either.
Data growth has always been robust, and the rate keeps accelerating with every new generation of computing technology. Mainframe computers generated and stored huge amounts of information, but then came minicomputers and then personal computers. At that point, everybody in a corporation, and many people at home, were generating valuable data that was used in many different ways. Relational databases became the repositories of information across the enterprise, from financial data to product development efforts, from manufacturing to logistics to customer relationships to marketing. Unfortunately, given the organizational structure of most companies, data was frequently captured in divisional silos and could not be shared among departments such as finance and sales, or manufacturing and logistics. Because data was captured in different ways by different organizational units, integrating it to provide a holistic picture of business activities was very difficult.
The explosion in the amount of structured data generated by a corporation sparked two key developments. First, it cast a sharp spotlight on data quality. The equation was pretty simple. Bad data led to bad business outcomes. Second, efforts were put in place to develop master data management programs so data generated by different parts of an organization could be coordinated and integrated, at least to some degree.
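As a rough illustration of the integration problem just described, the short Python sketch below shows how the same customer, captured separately by two departmental systems, only lines up after the records are normalized and merged into a single master record. The field names, sample values, and matching rules are illustrative assumptions, not a description of any particular master data management product.

```python
# Hypothetical records for the same customer, captured by two different silos.
sales_record = {"cust_name": "ACME Corp.", "phone": "(555) 010-4477", "region": "NE"}
finance_record = {"customer": "Acme Corporation", "tel": "5550104477", "terms": "NET30"}

def normalize_phone(value: str) -> str:
    """Strip punctuation so phone numbers from different systems can be compared."""
    return "".join(ch for ch in value if ch.isdigit())

def normalize_name(value: str) -> str:
    """Lowercase and drop common suffixes; real matching needs far more than this."""
    cleaned = value.lower().replace(".", "").replace(",", "")
    for suffix in (" corp", " corporation", " inc"):
        cleaned = cleaned.removesuffix(suffix)
    return cleaned.strip()

# A naive comparison of the raw values fails; only after normalization do the
# two silos' records match, allowing their attributes to be consolidated into
# one "golden" master record.
if (normalize_name(sales_record["cust_name"]) == normalize_name(finance_record["customer"])
        and normalize_phone(sales_record["phone"]) == normalize_phone(finance_record["tel"])):
    master_record = {
        "name": finance_record["customer"],          # choose one source as the survivor
        "phone": normalize_phone(sales_record["phone"]),
        "region": sales_record["region"],
        "payment_terms": finance_record["terms"],
    }
    print(master_record)
```

Even this toy example hints at why such programs are hard: every pair of systems needs its own normalization and survivorship rules, and those rules have to be agreed on across departments.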
Challenges to Data Quality and MDM
Efforts in both data quality and master data management have been only partially successful. Not only is data quality difficult to achieve; it is a difficult problem even to approach. In addition, the scope of the problem keeps broadening. Master data management presents many of the same challenges that data quality itself does. Moreover, the complexity of implementing master data management solutions has restricted them to relatively large companies. The bottom line is that both data quality programs and master data management solutions are tricky to implement successfully, in part because the impact of poor-quality and disjointed data is largely hidden from sight. Too often, data quality seems to be nobody’s specific responsibility.
Despite the difficulties in marshaling corporate resources to address these issues, the high cost of poor-quality and poorly integrated data has become clearer over the past decade, and a better understanding of what defines data quality, along with a general methodology for implementing data quality programs, has emerged. Establishing this general foundation for data quality and master data management programs is significant, particularly because the corporate information environment is undergoing a tremendous upheaval, generating turbulence as vigorous as that created by the arrival of mainframes and personal computers.