The spread of the internet and of mobile devices such as smartphones and tablets is not only generating more data than ever before; many kinds of data, much of it unstructured or semistructured, have also become very important. The use of RFID and other kinds of sensor data has led to a data tsunami of epic proportions. Cloud computing has created an imperative for companies to integrate data from many different sources both inside and outside the corporation. And compliance with regulations in a wide range of industries means that data has to be held for longer periods of time and must be correct. In short, the basics for data quality and master data management are in place, but the basics are not nearly sufficient.
The Current Situation
In 2002, the Data Warehousing Institute estimated that poor data quality cost American businesses about $600 billion a year. Over the years, that figure has been the number most commonly cited as the price tag for bad data. Of course, the accuracy of such an eye-popping number covering the entire scope of American industry is hard to assess.
However, a more recent study of businesses in the U.K. presented an even starker picture. It found that as much as 16% of many companies’ budgets is squandered because of poor data quality. Departments such as sales, operations, and finance waste on average 15% of their budgets, according to the study. That figure climbs to 18% for IT. And the number is even higher for customer-facing activities such as customer loyalty programs. In all, 90% of the companies surveyed said that their activities were hindered by poor data.
When specific functional areas are assessed, the substantial cost that poor data quality exacts becomes clear. For example, contact information was one of the first targets for data quality programs. Inaccurate, incomplete, and duplicated address information hurts the results of direct marketing campaigns. In one particularly egregious example, a major pharmaceutical company once reported that 25% of the glossy brochures it mailed were returned. Not only are potential sales missed, but current customers can also be alienated. And marketing material that is delivered to the wrong recipient is pure cost.
Marketing is only one area in which the impact of poor information is visible. One European bank found that 100% of customer complaints had their roots in poor or outright incorrect information. Moreover, the bank’s study showed that customers who register complaints are much more likely to shop for alternative suppliers than those who don’t. The difference in churn between customers whose complaints are rooted in poor data quality and customers who do not complain is a direct cost of poor data quality.
And the list goes on. Poor data quality in manufacturing slows time to market, leads to inventory management problems, and can result in product defects. Bad logistics data can have a material impact on both the front end and back end of the manufacturing process.
The Benefits of Improving Data Quality
On the other side of the equation, improving data quality can lead to huge benefits. One company reported that improving the quality of data available to its call center personnel resulted in nearly $1 million in savings. Another realized $150,000 in billing efficiencies by improving its customer contact information.
As the cost/benefit equation of data quality has become more apparent, the need to define data quality has become more pressing. In addition to the core characteristics of accuracy and timeliness, the most concise expression of the attributes of high-quality data is consistency, completeness, and compactness. Consistency means that each “fact” is represented in the same way across the information ecosystem. For example, a date is always represented by two digits for the month, two for the day, and four for the year, in that order, everywhere in a company’s information ecosystem. Moreover, the “facts” represented must be logical. An “order due” date, for example, cannot be earlier than an “order placed” date.
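The two consistency rules just described, a single date representation and a logical ordering of dates, can be expressed as simple automated checks. The following is a minimal sketch in Python, not a prescribed implementation. The field names order_placed and order_due, the function names, and the slash separator in MM/DD/YYYY are all assumptions made for illustration; the rule itself fixes only the digit counts and their order.

```python
import re
from datetime import date, datetime
from typing import Optional

# Assumed layout: two-digit month, two-digit day, four-digit year.
# The "/" separator is an illustrative choice, not part of the rule.
DATE_PATTERN = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")
DATE_FORMAT = "%m/%d/%Y"


def parse_date(text: str) -> Optional[date]:
    """Return a date if text is a valid MM/DD/YYYY value, otherwise None."""
    # The regex enforces the exact digit counts; strptime alone would
    # also accept unpadded values such as "3/5/2024".
    if not DATE_PATTERN.fullmatch(text):
        return None
    try:
        return datetime.strptime(text, DATE_FORMAT).date()
    except ValueError:
        # Right shape, impossible date (for example 02/30/2024).
        return None


def find_issues(order: dict) -> list:
    """Check one order record (hypothetical field names) for consistency."""
    issues = []
    placed = parse_date(order["order_placed"])
    due = parse_date(order["order_due"])
    if placed is None:
        issues.append("order_placed is not a valid MM/DD/YYYY date")
    if due is None:
        issues.append("order_due is not a valid MM/DD/YYYY date")
    # The logical rule can only be checked when both dates parsed.
    if placed is not None and due is not None and due < placed:
        issues.append("order_due is earlier than order_placed")
    return issues


if __name__ == "__main__":
    sample = {"order_placed": "03/15/2024", "order_due": "03/01/2024"}
    for issue in find_issues(sample):
        print(issue)
```

A check of this kind reports every problem it finds in a record rather than stopping at the first one, which tends to suit data quality audits where the goal is to measure how much is wrong, not merely to reject input.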