One of the hallmarks of big data is that it is coming from everywhere and anywhere, creating massive headaches for data and IT executives. Data flowing in may have different formats, represent murky timelines, or be of less-than-stellar quality.
Such concerns, while valid, mean little to business decision makers. They simply expect IT and data management departments to pull all available information resources together and make them highly accessible to the business, whether as actionable insights, through automated decisioning systems, or via self-service platforms.
The good news is that some enterprises are gaining a semblance of control over, and are seeing business value from, their big data assets. They are enriching their capabilities through a variety of data governance initiatives. Some companies have more routine, established and comprehensive processes and policies in place, while others have ad hoc approaches to the same challenges. The troubling news is that most organizations are only just beginning to recognize the scope of the challenge that lies ahead of them and are not satisfied with the pace of data integration.
To better understand the impact of new data sources on data governance practices, IBM commissioned Unisphere Research, a division of Information Today, Inc., to survey 304 managers responsible for data management in their organizations (“Governance Moves Big Data from Hype to Confidence”).
The survey finds that organizations are investing heavily in initiatives that will increase the amount of data at their disposal, and that the percentage of organizations with big data projects in production is expected to triple in the next 18 months. However, as the amount of data grows, organizations are spending more time finding the data they need than analyzing it.
Concurrently, very few companies feel entirely confident about the data that is coming in from all sources, and they are significantly less confident in data gathered through social media and public cloud applications than they are in data generated internally. Internal, structured data evokes the highest level of confidence.
Figure 1: What Is Your Confidence Level in Each of These Data Sources?
Data source | Confident | Not confident |
Structured data in your internal systems | 63% | 14% |
Data provided by your business partners | 37% | 29% |
Unstructured data in your internal systems | 21% | 44% |
Data stored in a public cloud | 23% | 50% |
Social media data | 13% | 61% |
Note: The remaining respondents for each data type had neutral responses (neither confident nor not confident).
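The note above implies that the neutral share for each data source can be recovered by subtraction. The following is a minimal sketch in Python, assuming the three response categories total 100% for each source (the survey's own figures may differ slightly because of rounding). The names `confidence` and `neutral_share` are illustrative and do not come from the survey.

```python
# Figure 1 percentages, stored as (confident, not_confident) per data source.
confidence = {
    "Structured data in your internal systems": (63, 14),
    "Data provided by your business partners": (37, 29),
    "Unstructured data in your internal systems": (21, 44),
    "Data stored in a public cloud": (23, 50),
    "Social media data": (13, 61),
}


def neutral_share(confident, not_confident):
    """Return the neutral percentage.

    Assumption: confident + not confident + neutral = 100 for each source.
    """
    return 100 - confident - not_confident


# Derive the neutral share for every data source in the figure.
neutral = {
    source: neutral_share(conf, not_conf)
    for source, (conf, not_conf) in confidence.items()
}

for source, pct in neutral.items():
    print(f"{source}: {pct}% neutral")
```

Under that assumption, roughly a quarter to a third of respondents took no position on any given source, which suggests that many managers have not yet formed a view either way.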
While confidence is an issue overall, managers generally trust reports based on the analysis of big data, even though the quality of the underlying data may not match that of traditional, internal data. As data grows within enterprises, analysts may find themselves spending more time than ever defending the quality of their data. Security concerns also lurk beneath the surface of every conversation about data and data analytics. As companies accumulate more and more sensitive data about their customers, the need to keep that information private and secure is paramount.