What is the new era of data management?
There has been a significant change in the IT world recently: solution developers no longer believe that a relational database is the answer to every data management challenge. After 40 years, data management had come to be seen as a quiet part of IT where the products and providers were firmly established. It is now evident that information management has become dynamic again, with a broad set of solutions offering new options for managing today's big data challenges.
Organizations of all sizes and all industries around the world realize that they need to extract greater value from available information - to better serve their customers, to identify ways to reduce costs, and to better manage and reduce risk. The global rise of intelligent mobile devices, social media, and instrumentation with smart meters and sensors has resulted in an explosive growth of information - both structured data and unstructured content like text and video. Simultaneously, the "consumerization" of IT is driving higher expectations from all users. The trick is to find the best ways to manage and analyze all this information better to meet growing expectations, and deliver better results for the organization. This is today's big data challenge.
There is an important lesson to keep in mind as we consider these new challenges and solutions. In the late 1980s, the death of the mainframe was predicted because of the new era of "distributed computing." In reality, the new systems opened new uses of computing to drive business growth, while the volume of computing best done on the mainframe also grew ... and continues to grow. Unique values of relational database systems and their continued evolution will likewise make them the best choice for many solutions; consequently, this new era of data management does not mean the "death of the relational database." It is fair to say, however, that it could mean trouble for software providers that ignore this new wave of needs, and instead tell customers to fit the new "square" workloads into their existing "round" systems.
What solutions are available today to manage big data (and not-so-big data)?
New NoSQL, big data and cloud technologies, while each important to the new generation of solutions, are likewise not the answer to all needs. When selecting an IT solution, it's important to consider both function and the qualities of service required. Requirements for security, reliability, and transactional integrity are among a set of characteristics that can vary greatly across solutions - even those that have common functional needs. In this new era of data management, it is critical that we optimize systems for function and qualities of service requirements as well as cost efficiency.
It is also important to recognize that not all of the data challenges are "big." Some technologies under the category of NoSQL (Not only SQL/relational) are growing in popularity because they are easier to use in solutions with relatively modest requirements. In these cases, using a traditional relational database system may add unnecessary complexity and cost. For example, triple store databases are being used to capture relationships that enable an integration of different sets of data without the complexity of merging them into a rigid unified structure.
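To make the triple store idea concrete, here is a minimal sketch in Python. It is not how any particular product works; the TripleStore class, the identifiers such as "customer:42", and the two source systems are all hypothetical and chosen only for illustration. Each fact is stored as a (subject, predicate, object) triple, so data from two systems can sit side by side and be linked by a shared identifier without first being merged into one rigid schema.

```python
class TripleStore:
    """Tiny in-memory store; each fact is a (subject, predicate, object) triple."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()

# Facts from a hypothetical customer system.
store.add("customer:42", "name", "Ada Lopez")
store.add("customer:42", "placed", "order:9001")

# Facts from a hypothetical support system, added without any schema merge.
store.add("ticket:7", "raised_by", "customer:42")
store.add("ticket:7", "status", "open")

# Everything stated about the customer, and everything that points at the customer.
about_customer = store.match(subject="customer:42")
linked_to_customer = store.match(obj="customer:42")
```

The integration happens at query time: the support ticket is found through the relationship that names the customer, not through a shared table design.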
Cost efficient solutions often involve a complementary integration of more than one system - each managing the data and analysis it does best, then sharing the resulting insights with the others.
Solution examples include:
- Map-Reduce (Hadoop) Analytics - These systems enable analysis over a very broad and diverse set of information such as data and content available via the Internet - for example, to cost effectively analyze petabytes of structured and unstructured information
- Streams Analytics - Systems that perform ultra low latency analysis of flowing information that may never be stored - for example, telemetry from medical devices
- Time Series Analytics - Systems with optimized data structures and capabilities that dramatically reduce required storage space and increase performance - for example, for unlocking hidden insights among the growing volume of data from smart meters and sensors
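As an illustration of the streams idea in the list above, the following Python sketch analyzes readings as they flow past and keeps only a small sliding window in memory, so the full stream is never stored. It is a simplified assumption of how such analysis might look, not a description of any specific streams product; the function name rolling_alerts, the window size, the threshold, and the sample heart-rate values are all hypothetical.

```python
from collections import deque


def rolling_alerts(readings, window=3, threshold=120.0):
    """Yield (index, mean) whenever the mean of the last `window` readings exceeds `threshold`."""
    recent = deque(maxlen=window)  # older readings fall out automatically
    for index, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window:
            mean = sum(recent) / window
            if mean > threshold:
                yield index, mean


# Hypothetical telemetry; an iterator stands in for a live feed that is consumed once.
heart_rates = iter([80, 85, 90, 130, 140, 150, 95, 90])
alerts = list(rolling_alerts(heart_rates))
alert_positions = [index for index, _ in alerts]
```

Because the generator holds only the most recent readings, memory use stays constant no matter how long the feed runs, which is the property that makes this style of analysis suitable for data that may never be stored.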
Even relational data systems are being optimized to save clients the time and money currently wasted on inefficient "one size fits all" systems. Examples of different workloads for relational data systems include:
- Transaction Processing - Top performance, throughput and reliability for mission critical transactional systems - for example, core banking, retail point-of-sale, ERP systems
- Deep Analytics - Optimized performance and simplicity for complex analytics workloads - for example, for data mining and predictive analytics
- Operational Analytics - Balanced performance for complex analytics and a high volume of concurrent operational queries - for example, for supporting a call center with operational insights during each customer contact
What are the implications for information professionals?
The current pace of innovation in information management and analytics presents new opportunities for information professionals. Leading organizations are thinking differently about the types of skills they need to make the best use of information. While the roles of database designer and administrator continue, they are evolving as systems increasingly automate many of the activities required of DBAs. Organizations that take advantage of today's generation of self-managing systems free their teams to deliver higher-level value. This is an opportunity for information professionals to learn about new technologies, broaden their skills, and elevate the impact they have on their business.
We are also seeing the need for much better collaboration between those focused on the question of what information should be analyzed with what algorithms, and those answering the question of what systems can best manage that information and analysis. When these groups do not work together to define solutions, there is a significant risk that the resulting solution will not deliver optimal performance and cost efficiency and, consequently, will fail to deliver the desired business advantages.
How will organizations and industries be impacted by the changes?
It is exciting to watch how leading organizations are taking advantage of the new technologies to transform themselves and redefine what is required to successfully compete in their industries.
In healthcare, advanced analytics solutions are identifying patterns among millions of patient records to help doctors identify disease trends and best practices for effective and cost efficient treatments. Different systems are being used to analyze telemetry data from medical devices, identifying patients in trouble faster than doctors and nurses can observe. And an even more advanced system, Watson, is being deployed that can answer diagnostic questions in minutes based on text reports and case histories that would take doctors weeks or months to find - if ever.
Another example that matters to all of us is the advances being made by smarter energy utilities. Data from smart meters and sensors throughout the energy grid is being analyzed in real time as it streams through systems; captured and analyzed by efficient time-series data systems over near-term history to optimize operational systems; and analyzed along with other historical data in warehouse systems to anticipate and optimize long-term planning. And if that were not enough - Hadoop-based systems are being used to analyze structured and unstructured information for decisions such as optimal wind turbine placement.
Regardless of industry, every sector has examples or potential use cases for data-driven innovation. As organizations build advanced solutions, they must consider optimizing for qualities of service and cost efficiency, as well as function. It is a new world of data management, complete with striking innovations. This means that the conventional answer to a data challenge may no longer be the one that delivers the most business value.
About the author:
Bernard M. Spang is director, Strategy and Marketing, Database Software and Systems, at IBM.