Though the term “big data” is still debated, it represents something qualitatively new. Big data does not just mean the explosion of transactional data driven by the widespread use of sensors and other data-generating devices. It also refers to the desire and ability to extract analytic value from new data types such as video and audio. And it refers to the trend toward capturing huge amounts of data produced by the internet, mobile devices, and social media.
The availability of more data, new types of data, and data from a wider array of sources has had a major impact on data analysis and business intelligence. In the past, people identified a problem they wanted to solve and then gathered and analyzed the data needed to solve it. With big data, that workflow is reversed: companies realize they have access to huge amounts of new data—tweets, for example—and then work to determine how to extract value from it.
Data quality programs will have to evolve to meet these new challenges. Perhaps the first step will be methods for developing appropriate metadata. Big data is generally complex and messy and comes from many different sources, so good metadata is essential. Data classification, efficient data integration, and the establishment of standards and data governance will also be critical elements of data quality programs that encompass big data.
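As a minimal sketch of what such metadata might capture (the field names below are illustrative assumptions, not a standard), a record attached to each incoming feed could note at least its source, capture time, format, and classification, so downstream consumers can judge its fitness for use:

from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class FeedMetadata:
    # Illustrative fields only; a real program would align these
    # with its own classification and governance standards.
    source: str            # originating system or device
    captured_at: datetime  # when the data was produced
    media_type: str        # e.g., "application/json", "video/mp4"
    schema_version: str    # version of the record layout
    classification: str    # e.g., "public", "internal"

meta = FeedMetadata(
    source="twitter-firehose",
    captured_at=datetime.now(timezone.utc),
    media_type="application/json",
    schema_version="1.0",
    classification="public",
)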
Ensuring data quality has long been a serious challenge in many organizations. Frequently, data quality problems are masked: business processes seem to be working well enough, and it is hard to determine beforehand what the return on investment in a data quality program would be. In addition, in many organizations nobody seems to “own” responsibility for the overall quality of corporate data. People are responsible for, or sensitive to, their own slice of the data pie but are not concerned with the pie as a whole.
What’s Ahead
It should not be a surprise, then, that in a recent survey of data quality professionals, two-thirds of the respondents rated the data quality programs in their organizations as only “OK”—that is, meeting some goals—or poor. On the brighter side, 70% indicated that their company’s management regarded data and information as important corporate assets and recognized the value of improving their quality. On balance, however, data quality must be improved: in another survey, 61% of IT and business professionals said they lacked confidence in their company’s data.
During the next several years, data quality professionals will face a series of complex challenges. Perhaps the most immediate is to be able to view data quality issues within their organizations holistically. Data generated by one division—marketing, let’s say—may be consumed by another—manufacturing, perhaps. Data quality professionals need to be able to respond to the needs of both.
Second, data quality professionals must develop tools, processes, and procedures to manage big data. Since much big data is also real-time data, data quality must become a real-time process integrated into the enterprise information ecosystem. Finally, and perhaps most importantly, data quality professionals will have to set priorities; nobody can do everything at once.
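As a hedged illustration of what real-time data quality can look like in practice (the record shape and validation rules below are assumptions made for the sketch), a streaming pipeline can check each record as it arrives, passing good records downstream and diverting failures to a quarantine for review rather than silently dropping them:

from typing import Iterable, Iterator

def is_valid(record: dict) -> bool:
    # Example rules only: require a non-empty id and a non-negative amount.
    has_id = bool(record.get("id"))
    amount = record.get("amount")
    return has_id and isinstance(amount, (int, float)) and amount >= 0

def validate_stream(records: Iterable[dict], quarantine: list) -> Iterator[dict]:
    # Validate each record as it arrives; yield good records,
    # divert bad ones for later inspection instead of dropping them.
    for record in records:
        if is_valid(record):
            yield record
        else:
            quarantine.append(record)

quarantine: list = []
incoming = [{"id": "a1", "amount": 9.5}, {"id": "", "amount": -2}]
clean = list(validate_stream(incoming, quarantine))
print(len(clean), "passed;", len(quarantine), "quarantined")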
About the author
Elliot King has reported on IT for 30 years. He is the chair of the Department of Communication at Loyola University Maryland, where he is a founder of an M.A. program in Emerging Media. He has written six books and hundreds of articles about new technologies. Follow him on Twitter @joyofjournalism. He blogs at emergingmedia360.org.