Olsson goes so far as to predict that more traditional relational databases will come roaring back as the preferred solution for all levels of big data challenges. “Every company will always need a way to store and access structured data, and relational databases are really the only way to do it,” he says.
Plus, the relational databases and data warehouses now coming on the market address big data analytics with advanced tools, and are being configured to integrate more easily with newer big data-centric platforms and frameworks, such as Hadoop. These new databases and data warehouses accommodate both structured and unstructured data within their file systems.
“Organizations of all sizes need data integration capabilities that fully support their business requirements, including ETL, data federation, replication, synchronization, changed data capture, data quality and governance, master data management, natural language processing, business-to-business data exchange, and more,” says Kopp-Hensley. “Many types of data architecture are in use today, but an ideal strategy for addressing all these business needs is to use a combination of methodologies.”
It Is Time to Accept Database Diversity
Olsson agrees, noting that it’s time “to stop fighting database diversity, and to accept the fact that you have to live with several database platforms. Good execution can be a great cost saver, so focus instead on being good at supporting each of these platforms.”
There are initiatives to bring newer platforms, such as Hadoop, more in line with traditional platforms such as data warehouses. “Most organizations have concluded that Hadoop plays a key role in data warehouse infrastructure,” Haddad points out. “Now, people who are managing this hybrid infrastructure are struggling with questions such as, ‘what type of data should I use, and when should I use Hadoop versus the data warehouse?’” Haddad advises data managers to consult the numerous reference architectures that now exist to provide guidance on where data should be deployed and managed.
In many instances, the attraction of new-generation data platforms will be economic. “Enterprises are not in a hurry to discard their traditional relational and SQL platforms which have worked well for them for certain specialized applications, such as payroll or HR,” says Jim Vogt, president and CEO of Zettaset. “However, there is clearly a shift toward adoption of Hadoop as a more cost-effective database option for certain types of data, especially the unstructured data that is being gathered from sensors or log files. So for some time, expect new platforms to coexist with legacy platforms.”
New Approaches to Enterprise Decision Making
Still, the confluence of new database types and new frameworks designed for big data processing and analysis has elevated the process of managing business intelligence and analytics to new levels. Along with the variety, volume, and velocity of data these new environments are handling, there is an even more profound transformation afoot. New data environments are also reinventing the way enterprises approach decision making, and are even creating new job roles for people working with information.
“In the old days of an RDBMS-centric world, we were forced to think about the questions we wanted to ask of our data before we even put them into relational tables,” says Will Hayes, chief product officer for LucidWorks. In addition, organizations required a great deal of staff expertise in building databases or data warehouses, and constructing the necessary schemas, tables and SQL queries. “Developers were left with little ability to construct the necessary schemas, tables and SQL queries required, especially as these applications went to production,” he says. “Effectively scaling RDBMSs required expensive technology, as well as a Ph.D. in database replication.”
Today’s new generation of solutions takes away a lot of the pain, and the specialized skills requirements, associated with database management, Hayes says. “NoSQL data stores and index technologies mean more flexibility in the questions you could ask with simple loading of semi-structured data,” he explains. “Developers find that tools like MongoDB and Cassandra offer easy-to-understand transport mechanisms, such as REST APIs and JSON for loading and retrieving data. These technologies, most of which are open source, scale horizontally, requiring little knowledge of complex replication schemes.”
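To make the contrast Hayes describes more concrete, here is a minimal Python sketch. It uses only the standard library as a stand-in and does not represent the API of MongoDB, Cassandra, or any other product named above. The sensor records, field names, and helper functions are all hypothetical, invented for illustration. The first loader fixes a table layout before any data arrives, so fields nobody planned for are lost. The second keeps each JSON record as it came in, so a new question can be asked later.

```python
import json
import sqlite3

# Hypothetical sensor records (illustrative only); they do not share one shape.
RAW_EVENTS = [
    '{"device": "pump-1", "temp_c": 71.5}',
    '{"device": "pump-2", "temp_c": 64.0, "vibration": 0.12}',
    '{"device": "door-7", "open": true}',
]


def load_schema_first(events):
    """Relational style: the table layout is decided before loading."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (device TEXT, temp_c REAL)")
    for line in events:
        doc = json.loads(line)
        # Fields the schema did not anticipate (vibration, open) are dropped;
        # a missing temp_c is stored as NULL.
        conn.execute(
            "INSERT INTO readings VALUES (?, ?)",
            (doc["device"], doc.get("temp_c")),
        )
    return conn


def load_documents(events):
    """Document style: keep every record whole, whatever fields it has."""
    return [json.loads(line) for line in events]


def devices_with_field(docs, field):
    """Ask a question that was not planned when the data was loaded."""
    return [d["device"] for d in docs if field in d]


conn = load_schema_first(RAW_EVENTS)
hot = conn.execute("SELECT device FROM readings WHERE temp_c > 70").fetchall()
docs = load_documents(RAW_EVENTS)

print(hot)
print(devices_with_field(docs, "vibration"))
```

The relational table still answers the question it was designed for, which is why the two approaches tend to coexist. Only the document list can answer the later question about vibration data, because the fixed schema discarded that field at load time.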
Linking Structured and Unstructured Worlds
It pays to link the structured and unstructured worlds within a comprehensive data integration framework, says Sid Probstein, chief technology officer for Attivio. “You can gain huge insights from analyzing three types of information at the same time: transactional, or what happened; CRM or sensor systems; and human-generated content from things like open-ended survey questions, social media, or email,” he points out.
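As a rough illustration of analyzing the three types of information together, the Python sketch below merges them into one view per customer. Everything in it is an assumption made for the example: the sample records, the shared `customer` key, and the `build_profiles` helper are hypothetical and are not drawn from Attivio or any other product.

```python
from collections import defaultdict

# Hypothetical sample data, invented for illustration.
orders = [  # transactional: what happened
    {"customer": "c1", "total": 120.0},
    {"customer": "c2", "total": 35.5},
    {"customer": "c1", "total": 40.0},
]
crm = {  # CRM system: who the customer is
    "c1": {"segment": "enterprise"},
    "c2": {"segment": "smb"},
}
comments = [  # human-generated content: survey text, email, social media
    {"customer": "c1", "text": "Delivery was late again"},
    {"customer": "c2", "text": "Great support"},
]


def build_profiles(orders, crm, comments):
    """Combine the three sources into one record per customer."""
    profiles = defaultdict(
        lambda: {"spend": 0.0, "segment": None, "comments": []}
    )
    for order in orders:
        profiles[order["customer"]]["spend"] += order["total"]
    for customer, record in crm.items():
        profiles[customer]["segment"] = record["segment"]
    for comment in comments:
        profiles[comment["customer"]]["comments"].append(comment["text"])
    return dict(profiles)


profiles = build_profiles(orders, crm, comments)
for customer, profile in sorted(profiles.items()):
    print(customer, profile)
```

With the sources side by side, an analyst can see in one place that a high-spending enterprise customer is also the one complaining about late deliveries, which none of the three sources shows alone.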