For a long time, data integration has been the holy grail for data organizations, promising a single, accurate picture of relevant data from across the enterprise, regardless of original source or format. Now, with the rise of new approaches, including master data management (MDM) and data virtualization, there is hope that this goal is within reach. But the challenges keep coming, and lately there has been a surge in unstructured data that may fall outside the realm of MDM.
“Just when architects thought they had integrated all their enterprise applications, their businesses started using SaaS apps,” Zeb Mahmood, principal of product and strategy at SnapLogic, tells DBTA. “And more recently, the challenge has not been around where the data resides, on-premise or in the cloud, but with the nature of the new data in the enterprise: the unstructured, high velocity, high volume ‘big data.’ Data integration is never complete.”
The problem is that large portions of this new class of information are not visible to decision makers. “Most is passing through unnoticed,” David Flammia, product owner of LP Insights at LivePerson, Inc., tells DBTA. “The primary reason behind that is the complexity of the raw data formats, and the general inability to reasonably convert the data into a meaningful, business intelligence-oriented type of format. And when data analysis does occur within a business, it mostly happens disparately; that is, data source analysis happens independently amongst all channels, so the insights found are very narrow in scope, and limited in value.”
Access the full version of this article in the DBTA September Best Practices Issue: Data Integration, Master Data Management, and Data Virtualization
Thus, observers concur that there is still plenty of work to be done before organizations truly bring together the disparate data pulsing through their systems and networks. “The reality is that today's organizations have a ways to go with unstructured data management, integration, and protection,” David Gibson, vice president of strategy for Varonis, tells DBTA.
“People aren’t adequately managing unstructured data because it’s a hard data set to get a handle on. In a recent survey we did, we found that 67% of respondents said that senior management in their organizations either doesn’t know where all company data resides or is not sure. It’s really hard to manage something when you don’t know where it is, who is using it, who has access to it, and what it contains.”
It’s time to expand data integration efforts to address the new realities of big and unstructured data, Scott Gidley, senior director of R&D and data management for SAS, tells DBTA. “The most common approach is ETL [extract-transform-load]. And while this is a solid, viable option, there are numerous data integration options available today, including ELT [extract-load-transform], to consider. It’s important to weigh all options.”
Enter MDM, emerging as a strategy that may help deliver unity in enterprise data. “The uptake of MDM has been strong and the market continues to mature,” Rick Clements, program director for MDM and big data strategy at IBM, tells DBTA. “MDM has gone past the early adopter stage and is now in the early majority with more organizations across all industries and sizes realizing value and returns on investment.”
Where is this return on investment coming from? Analytics and business intelligence are the low-hanging fruit for organizations attempting data integration efforts. MDM can play a key role here as well. An MDM roadmap is needed, Umesh Karpe, vice president of the data warehousing and business intelligence practice for iGATE, tells DBTA. Such a roadmap needs to include multiple entities for the MDM repository, covering subjects such as “customers, suppliers, products, sites, and in some cases even finance,” Karpe says. “The MDM repository becomes the golden copy and clean certified data is used across the organization for operational and analytical applications.”
The challenge of data integration — and the promise of MDM — doesn’t stop at the organization’s walls, either. “Information is fundamentally federated and that information is most valuable when it is presented as part of the active transaction and represents the most recent picture possible,” Steve Jones, global head for master data management at Capgemini, tells DBTA. This includes external information such as social media and supplier catalogs, as well as internal systems. “The new thinking is about providing a consistent federated view rather than fighting against how companies work with technology. This shift in thinking on information has highlighted the need to identify the unique customers, suppliers, products, and other key elements across systems, rather than simply try and consolidate into single unachievable repositories.”
Ultimately, MDM offers great promise as a way to address thorny data integration problems, for both structured and unstructured data stores. “MDM considerably reduces the spaghetti of interfaces across business applications and hence optimizes business processes and outcomes,” KR Sanjiv, senior vice president and global head of analytics and information management for Wipro Technologies, tells DBTA. “MDM can eliminate a significant number of integration points.”
While MDM offerings currently on the market do not manage unstructured data in a comprehensive way, that is not a showstopper, says SnapLogic’s Mahmood. “It’s not a big limitation,” he explains. “The unstructured data of interest for MDM is not large unstructured text or machine-generated big data, but mostly images and PDFs, which can be managed by integration with content management systems.”