Big data represents an enormous shift for IT, said Craig S. Mullins in a presentation at Data Summit 2016 in NYC on what relational database professionals need to know about big data technologies. Mullins, a principal of Mullins Consulting and the author of the DBA Corner column for DBTA, provided an overview of the changes that have taken place in the DBMS arena in recent years and of the key technologies that are having a high impact.
Aside from the well-known five Vs of big data, Mullins said that big data represents several other shifts: from mostly internal data to information from multiple sources; from transactional data alone to transactional plus analytical data; from structured data alone to structured plus unstructured or differently structured data; from persistent data to data that is constantly on the move; and from local storage to storage in the cloud.
For years, the IT landscape was dominated by three DBMSs: IBM DB2, Microsoft SQL Server, and Oracle Database. With the advent of big data, however, things are changing very rapidly. There is not just relational, but also NoSQL; not just DBMSs, but also Hadoop; and not just commercial, but also open source software. Offering a prediction for the future, Mullins said he expects to see DBMSs that are not limited to a single type of engine. Big players will offer a data management platform with a variety of engines for different purposes.
There are many drivers contributing to the growth of NoSQL, said Mullins. First, there are more users: 1,000 users used to be a lot and 10,000 was extreme, but the web renders these numbers quaint today. There is also more data, and some traditional database applications can have difficulty scaling to terabytes of it. Data is also more varied, and "unstructured" data from documents and social media is not easily handled by relational databases. Moreover, different analytics use cases can require different technologies. Finally, schema-free databases offer simpler DBMS infrastructure and more rapid development.
Mullins compared and contrasted the different types of NoSQL databases, including column store, key/value, document store, graph, and multi-model, and also compared relational with NoSQL.
“Don’t let anyone tell you that NoSQL will overtake relational or that ACID is not important,” said Mullins. “NoSQL does not replace relational. It augments it.”
In addition, he emphasized, it is important to remember that many of the newer big data technologies are relatively immature. Pig is at v0.15.0, and HBase and Hive have only very recently reached version 1, he noted.
And despite the promise of all of these newer data management and storage technologies, he added, it is important to remember that the real goal is to advance data-based decision making.
Mullins has made the slides from his presentation available at http://conferences.infotoday.com/documents/245/A104_Mullins.pdf.
Many additional presentations from Data Summit 2016 have also been made available for review at www.dbta.com/DataSummit/2016/Presentations.aspx.
Data Summit is an annual 2-day conference, preceded by a day of workshops. Data Summit offers a comprehensive educational experience designed to guide you through all of the key issues in data management and analysis today. The event brings together IT managers, data architects, application developers, data analysts, project managers, and business managers for an intense immersion into the key technologies and strategies for becoming a data-informed business.
SAVE THE DATE FOR DATA SUMMIT 2017
MAY 16 - 17, 2017
HILTON NEW YORK MIDTOWN, NYC