Ensuring data quality is a big part of regulatory compliance. Its importance cannot be overstated. Poor data quality costs the typical company at least 10% of revenue, and 20% is probably a better estimate in the view of data quality expert Thomas C. Redman. According to software marketing and technology expert Hollis Tibbetts, “Incorrect, inconsistent, fraudulent, and redundant data cost the U.S. economy over $3 Trillion a year.”
The cost of poor data quality notwithstanding, high-quality data is crucial for complying with regulations. Think about it. If the data is not accurate, how can you be sure that the proper controls are being applied to the right pieces of data to comply with the appropriate regulations?
Metadata Management
Good data quality starts with metadata. Accurate data definitions are required to apply compliance controls to the correct data. But what is metadata?
Metadata characterizes data, providing documentation such that data can be understood and more readily consumed by an organization. Metadata answers the who, what, when, where, why, and how questions for users of the data.
Metadata is required to place data into the proper categories for determining which regulations apply. For example, PCI-DSS applies to payment card transactions, HIPAA applies to healthcare data, and so on. Some data will be subject to multiple regulations, and some data will not be regulated at all. Without proper metadata definitions, it is impossible to apply regulatory controls to the correct data.
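To make this concrete, consider a minimal sketch of a metadata catalog implemented as a database table. The table name, columns, and sample rows below (data_element_catalog and so on) are hypothetical, intended only to illustrate how metadata can tie data elements to the regulations that govern them:

    -- Hypothetical metadata catalog: each row documents one data element
    -- and records a regulation that governs it ('NONE' if unregulated).
    CREATE TABLE data_element_catalog (
        element_name  VARCHAR(128) NOT NULL,  -- the table.column being described
        business_def  VARCHAR(500) NOT NULL,  -- what the data means in business terms
        data_steward  VARCHAR(128) NOT NULL,  -- who is accountable for the data
        regulation    VARCHAR(32)  NOT NULL,  -- e.g., 'PCI-DSS', 'HIPAA', 'NONE'
        PRIMARY KEY (element_name, regulation)
    );

    -- A payment card number is subject to PCI-DSS...
    INSERT INTO data_element_catalog VALUES
      ('CUSTOMER.CARD_NUMBER',
       'Primary account number of the customer''s payment card',
       'J. Smith', 'PCI-DSS');

    -- ...whereas a patient diagnosis code falls under HIPAA.
    INSERT INTO data_element_catalog VALUES
      ('PATIENT.DIAGNOSIS_CD',
       'ICD-10 diagnosis code recorded for the patient',
       'M. Jones', 'HIPAA');

With even this simple structure in place, a compliance question such as “Which columns hold PCI-DSS data?” becomes a query rather than a research project.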
Data Quality
The next step is to ensure that the data, once accurately defined, is itself accurate. Imposing regulatory controls on the wrong data does no good at all. This raises the question “How good is your data quality?” Estimates show that poor data quality is a pervasive problem across industries. According to Redman, payroll record changes have a 1% error rate; billing records have a 2% to 7% error rate; and the error rate for credit records is as high as 30%.
But what can a DBA do about poor-quality data? Data quality is a business responsibility, but the DBA can help by instituting technology controls. Building constraints into the database, including referential integrity, can improve overall data quality. Additional constraints should be defined in the database as appropriate to enforce uniqueness and to control data value ranges using CHECK constraints and triggers.
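As a brief sketch, the following DDL shows how such controls can be declared directly in the database. The department and payroll tables and their columns are hypothetical; the constraint syntax itself is standard SQL:

    CREATE TABLE department (
        dept_id    INTEGER     NOT NULL PRIMARY KEY,
        dept_name  VARCHAR(60) NOT NULL
    );

    CREATE TABLE payroll (
        emp_id     INTEGER      NOT NULL,
        emp_ssn    CHAR(11)     NOT NULL UNIQUE,   -- uniqueness constraint
        salary     DECIMAL(9,2) NOT NULL,
        pay_grade  SMALLINT     NOT NULL,
        dept_id    INTEGER      NOT NULL,
        PRIMARY KEY (emp_id),
        -- Referential integrity: every employee must belong to an
        -- existing department.
        FOREIGN KEY (dept_id) REFERENCES department (dept_id),
        -- CHECK constraints restrict columns to valid data value ranges.
        CONSTRAINT chk_salary    CHECK (salary > 0),
        CONSTRAINT chk_paygrade  CHECK (pay_grade BETWEEN 1 AND 15)
    );

With these declarations in place, the DBMS itself rejects any row that violates the rules, so bad data is stopped at the point of entry instead of being discovered after the fact.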
Another technology tactic that can be deployed to improve data quality is data profiling. Data profiling is the process of examining the existing data in the database and collecting statistics and other information about that data. With data profiling, you can discover the quality, characteristics, and potential problems of information. Using the statistics collected by the data profiling solution, business analysts can undertake projects to clean up problematic data in the database.
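Dedicated profiling tools automate this analysis, but the basic statistics can be gathered with straightforward SQL. The queries below are a sketch against a hypothetical customer table; the email, tax_id, and birth_date columns are assumptions for illustration:

    -- Completeness: how many customers are missing an e-mail address?
    SELECT COUNT(*)                 AS total_rows,
           COUNT(*) - COUNT(email)  AS missing_email
    FROM customer;

    -- Uniqueness: which tax identifiers appear more than once?
    SELECT tax_id, COUNT(*) AS occurrences
    FROM customer
    GROUP BY tax_id
    HAVING COUNT(*) > 1;

    -- Value ranges: do any birth dates fall outside a plausible range?
    SELECT MIN(birth_date) AS earliest, MAX(birth_date) AS latest
    FROM customer;

Statistics like these tell the business analyst where to focus cleanup efforts before regulatory controls are applied.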
Data profiling can dramatically reduce the time and resources required to find problematic data. Furthermore, it allows business analysts and data stewards to have more control of the maintenance and management of enterprise data.
Data Governance
Data governance programs are becoming more popular as corporations work to comply with a growing number of increasingly strict governmental regulations. A data governance program oversees the management of the availability, usability, integrity, and security of enterprise data. A sound data governance program includes a governing body or council, a defined set of procedures, and a plan to execute those procedures.
So an organization with a strong data governance practice will have better control over its information. When data management is instituted as an officially sanctioned mandate of an organization, data is treated as an asset. That means data elements are defined in business terms; data stewards are assigned; data is modeled and analyzed; metadata is defined, captured, and managed; and data is archived for long-term data retention.