There’s no getting around it—unstructured data is now flooding into enterprises at an unprecedented rate. Many data managers and administrators say most of the data they are handling will be unstructured within the next few years.
In a survey of 264 data managers and professionals conducted by Unisphere Research, a division of Information Today, Inc., and sponsored by MarkLogic, respondents almost unanimously agree that unstructured data (which they define as business documents, presentations, and social media data) is on the rise and ready to engulf their current data management systems. Sixteen percent say they already have more unstructured than structured data within their enterprises, and another 33% predict that this shift will occur within the next several years. (“Big Data Is Real and It Is Here: 2012 Survey on Managing Big and Unstructured Data”)
As unstructured data overtakes structured data within enterprises, the coming year will see the start of a reassessment of how data is architected, stored, and queried in enterprises. In fact, 73% of respondents to the Unisphere survey admit that their current data infrastructures are not fully equipped to manage this data. To meet this challenge, new technologies and solutions have already begun to transform data management within enterprises. And enterprises are enthusiastically embracing new approaches that move data environments beyond traditional centralized relational database systems. Many data managers recognize that new architecture, new technologies, and new processes are needed to deliver real-time, actionable analytics to today’s decision makers.
The challenge with today’s dominant database model is that log files, graphics, documents, and social media data do not fit well into the orderly rows-and-columns world of relational databases. It is almost futile to attempt to design a database schema that will capture the potentially huge and unpredictable influx of unstructured data coming from a multitude of sources inside and outside the enterprise. Data managers and administrators lack the time and resources to keep up with all the new data sources that are being added, or taken offline, at a moment’s notice. Yet data analytics and reporting derived from all this new data often need to be delivered on a real-time or near-real-time basis. In addition, maintaining database performance is becoming more complex and difficult. Standby solutions such as adding storage capacity, adding processing capacity, or tuning the database simply aren’t enough to stave off this tsunami.
In response, a new generation of databases and data platforms is coming into full fruition. Another recent survey of 304 data managers and professionals, conducted by Unisphere Research and sponsored by SAP, finds that 43% of organizations across North America currently have big data initiatives underway, many supported by the adoption of new technologies such as NoSQL and NewSQL databases and Hadoop. (“2013 Big Data Opportunities Survey”)
A few years ago, many database professionals began experimenting with Not Only SQL (NoSQL) databases, which set aside the fixed schemas of relational systems and, in their simplest form, retrieve data through plain key-value lookups. NoSQL databases support the unstructured or nonrelational data types now flooding organizations. They also have another advantage, as they can be readily run on commodity hardware. A survey of 298 data managers and professionals conducted by Unisphere Research among IOUG members finds about 11% have adopted NoSQL within enterprise settings, a number expected to grow to 15% within the year. (“Big Data, Big Challenges, Big Opportunities: 2012 IOUG Big Data Strategies Survey,” sponsored by Oracle)
NoSQL databases are finding their way into the enterprise, supporting activities as varied as mobile applications, ecommerce sites, and content stores. Four major database groups fall under the NoSQL umbrella: key-value stores, which enable the storage of schemaless data as a key paired with the actual data; column family databases, which store data by column rather than by row, as relational databases do; graph databases, which employ structures with nodes, edges, and properties to represent and store data; and document databases, which facilitate simple storage and retrieval of document aggregates.
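To make the four groups concrete, here is a minimal sketch in Python that models each one with ordinary in-memory containers. It illustrates the data shapes only and is not how any particular product stores data; every name and sample value in it (key_value, column_family, neighbors, and so on) is hypothetical.

```python
# Hypothetical in-memory illustrations of the four NoSQL data models.
# Plain Python containers stand in for real database engines.

# Key-value store: an opaque value looked up by its key, with no schema.
key_value = {"session:42": b'{"user": "ana", "cart": [101, 205]}'}

# Column family: values are grouped by column, then by row key.
# Columns can be sparse; row2 simply has no "city" entry.
column_family = {
    "name": {"row1": "Ana", "row2": "Ben"},
    "city": {"row1": "Lisbon"},
}

# Graph: nodes and edges, each of which can carry properties.
nodes = {"ana": {"type": "person"}, "acme": {"type": "company"}}
edges = [("ana", "works_at", "acme", {"since": 2011})]

# Document store: a self-contained aggregate stored and fetched as a unit.
documents = {
    "order-1": {"customer": "Ana", "items": [{"sku": 101, "qty": 2}]},
}


def column_values(family, column):
    """Read one whole column without touching the others."""
    return family.get(column, {})


def neighbors(edge_list, node, label):
    """Follow outgoing edges with the given label from a node."""
    return [dst for src, lbl, dst, _props in edge_list
            if src == node and lbl == label]
```

A call such as column_values(column_family, "name") reads a single column across all rows, while neighbors(edges, "ana", "works_at") walks a relationship directly; a row-oriented table would need a full scan or a join to do the same.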
It is notable that the pendulum has also been swinging back to SQL, as evidenced by the rise of a new class of databases, called “NewSQL,” which represent a way for enterprises to leverage the best of the SQL and NoSQL worlds. These databases bring together the ability to manage both structured and unstructured data within a single environment, intermingling structured schemas and queries with distributed data structures. NewSQL databases are modern relational database management systems. They differ from traditional relational databases in that they seek to provide the same scalable performance as NoSQL systems for online transaction processing (read-write) workloads while still maintaining the ACID guarantees of a traditional single-node database system. NewSQL databases are entering the enterprise supporting applications as varied as fraud detection, digital advertising, market segmentation analysis, real-time pricing and billing, and retail loyalty programs.
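The ACID guarantee that NewSQL systems aim to preserve can be shown with a small sketch. The example below uses Python's built-in sqlite3 module, which is a traditional single-node database and not a NewSQL product; it is here only to show the atomicity half of the promise. The accounts table, the transfer function, and the balances are hypothetical.

```python
import sqlite3

# Single-node, in-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(db, src, dst, amount):
    """Move amount from src to dst as one all-or-nothing transaction."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back if it raises.
        with db:
            db.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # An overdraft violates the CHECK constraint and raises
            # IntegrityError, which undoes the credit applied above.
            db.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(db):
    """Return the current balance of every account as a dict."""
    return dict(db.execute("SELECT id, balance FROM accounts"))
```

A transfer that would overdraw the source account fails as a whole, so the credit to the destination never becomes visible. NewSQL systems aim to keep this behavior while spreading the workload across many nodes, which this single-node sketch does not attempt.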