Prior to the “historical data” stage, record-level storage and retrieval are key – store this URL, retrieve that user profile. Systems must write to the database, read the data back as needed, and ensure those reads are serviced quickly, because there is often a user waiting for that data on the other side of a browser. NoSQL alternatives are delivering compelling capabilities for a growing number of use cases where fast lookup against very large datasets is the driving requirement.
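To make the record-level pattern concrete, here is a minimal sketch of the write-then-read-fast workload, assuming a locally running Redis server and the redis-py client; the key name and profile fields are invented for illustration:

```python
import json
import redis  # assumes the redis-py client and a Redis server on localhost:6379

# Connect to a key-value store acting as the record-level NoSQL tier.
kv = redis.Redis(host="localhost", port=6379)

# Write a record: store this user's profile under a single key.
profile = {"user_id": 123, "name": "Alice", "last_url": "https://example.com/home"}
kv.set("user:123", json.dumps(profile))

# Read it back: the next page load needs this answered in milliseconds.
stored = kv.get("user:123")  # returns bytes, or None if the key is missing
if stored is not None:
    print(json.loads(stored)["last_url"])
```

The same store-by-key, fetch-by-key shape applies whichever NoSQL store sits underneath.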
When data is even newer, and by necessity faster, we are often interested in a specific data point relative to other data that has just arrived: data that answers questions like “How is my network traffic trending?”, “What is my composite risk by trading desk?” or even “What is the current state of my online game leaderboard?” Queries like these over high-velocity data are commonly referred to as real-time analytics. Increasingly, real-time analytics play a critical role in making the best possible decisions in an operational environment. We have recently seen NewSQL alternatives addressing this younger market.
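As a hedged illustration of the kind of query involved, here is a small standard-library-only sketch that keeps a one-minute sliding window over incoming network events and answers “who is sending the most traffic right now?”; the event shape and window size are assumptions, not anything the tools above prescribe:

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60  # assumed window for the "how is traffic trending?" question

events = deque()                    # (timestamp, source, bytes) tuples inside the window
bytes_by_source = defaultdict(int)  # rolling aggregate the dashboard queries

def ingest(source, nbytes, now=None):
    """Add one event and expire anything that has fallen out of the window."""
    now = time.time() if now is None else now
    events.append((now, source, nbytes))
    bytes_by_source[source] += nbytes
    while events and events[0][0] < now - WINDOW_SECONDS:
        _, old_source, old_bytes = events.popleft()
        bytes_by_source[old_source] -= old_bytes

def top_sources(n=5):
    """The real-time analytic: current top talkers, computed as data arrives."""
    return sorted(bytes_by_source.items(), key=lambda kv: kv[1], reverse=True)[:n]

ingest("10.0.0.7", 1500)
ingest("10.0.0.9", 900)
print(top_sources())
```

A real-time analytics database performs this kind of windowed aggregation at far higher event rates, but the shape of the question is the same.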
Some of the latest, most exciting developments in big data focus on the very beginning of the data value continuum. At this stage, immediately after it is created, data is highly interactive. This is the realm of high-velocity analytics and decisioning – how fast can we place a trade and verify it doesn’t exceed a risk threshold, or serve an ad and use data from previous interactions to optimize the placement? These transactions require very fast, scalable systems that can ingest massive amounts of data and analyze and act on it in real time. In-memory SQL relational databases can provide the extraordinary velocity and sophistication these high-volume applications require.
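To show what “place a trade and verify it doesn’t exceed a risk threshold” looks like as a single decision, here is a hedged, in-process sketch; in practice this check would run inside the in-memory database as one transaction, and the desk names, limit and exposure model below are invented for illustration:

```python
import threading

RISK_LIMIT = 1_000_000              # assumed per-desk exposure limit, in dollars
exposure_by_desk = {"rates": 0, "fx": 0}
_lock = threading.Lock()            # stands in for the database's transactional isolation

def place_trade(desk, notional):
    """Verify the new exposure stays under the limit, then book the trade, atomically."""
    with _lock:
        new_exposure = exposure_by_desk[desk] + notional
        if new_exposure > RISK_LIMIT:
            return False            # reject: the trade would breach the risk threshold
        exposure_by_desk[desk] = new_exposure
        return True                 # accept: the check and the update happen together

print(place_trade("fx", 750_000))   # True
print(place_trade("fx", 500_000))   # False: would push exposure past 1,000,000
```

The point is that the check and the write are one atomic step on hot data, which is exactly where ingest speed and transactional consistency have to coexist.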
The “Closed Loop” Big Data Framework
Why is it important to define data according to its age? One reason is that the value of the data can then be maximized at each point in its lifecycle by using purpose-built tools designed for specific jobs. But another important realization is that these stages can work together within a “closed loop” system in a way that radically increases the value of the dataset as a whole. Within a closed loop data framework, data is ingested by high-velocity systems and then passed on to data warehouse and Hadoop systems.
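A minimal sketch of that hand-off might look like the following, with the landing directory, batch size and event shape all assumed for illustration; the fast tier accepts events immediately and periodically ships batches to wherever the warehouse or Hadoop side picks them up:

```python
import json
import os
import time
import uuid

LANDING_DIR = "warehouse_landing"   # assumed drop zone the analytic systems load from
BATCH_SIZE = 1000                   # assumed flush threshold

_buffer = []

def ingest(event):
    """Fast path: accept the event now, queue it for the analytic tier."""
    _buffer.append(event)
    if len(_buffer) >= BATCH_SIZE:
        flush()

def flush():
    """Hand a batch to the slower, higher-capacity systems as newline-delimited JSON."""
    if not _buffer:
        return
    os.makedirs(LANDING_DIR, exist_ok=True)
    path = os.path.join(LANDING_DIR, f"events-{int(time.time())}-{uuid.uuid4().hex}.jsonl")
    with open(path, "w") as f:
        for event in _buffer:
            f.write(json.dumps(event) + "\n")
    _buffer.clear()

ingest({"sensor_id": "s-42", "reading": 3.7, "ts": time.time()})
flush()  # in practice a timer would also trigger this
```

In production the export would typically go through a connector rather than flat files, but the division of labor between the fast tier and the analytic tier is the same.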
When it comes down to it, there is only one reason to perform analytics of any kind: you want to make a better decision. When a closed loop system can gain knowledge from an analytic stage and feed it back into real-time decision-making, tremendous value can be achieved. Imagine a smart grid sensor system, ingesting thousands of events per second, that uses historical analytics to discover that prior to a catastrophic failure in the power grid, a particular piece of equipment starts to trend in an abnormal way. Now imagine that the closed loop system instantly feeds that knowledge back into the real-time decision-making system, so it can detect that runaway trend before the failure happens again. The outcome can be astounding.
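Here is a hedged sketch of that feedback loop; the equipment readings, the mean/standard-deviation baseline and the three-sigma rule are all invented for illustration, since no particular model is prescribed, but they show the shape of the loop: the analytic tier publishes what “abnormal” looks like, and the real-time tier checks every new reading against it:

```python
import statistics

# --- Analytic tier (batch, over historical data): learn what "abnormal" looks like ---
def learn_baseline(historical_readings):
    """Compute a per-device baseline offline, e.g. over warehouse/Hadoop data."""
    return statistics.mean(historical_readings), statistics.stdev(historical_readings)

# --- Real-time tier (per event): apply the learned baseline to each new reading ---
def is_runaway(reading, baseline, sigmas=3.0):
    """Flag readings that drift outside the band the historical analysis identified."""
    mean, stdev = baseline
    return abs(reading - mean) > sigmas * stdev

# The closed loop: the batch result is fed back into the high-velocity path.
history = [50.1, 49.8, 50.3, 50.0, 49.9, 50.2, 50.1]   # invented equipment readings
baseline = learn_baseline(history)

for reading in [50.2, 50.4, 57.9]:                      # incoming sensor events
    if is_runaway(reading, baseline):
        print(f"ALERT: reading {reading} matches the pre-failure pattern")
```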
In the coming months and years, the deluge of data will cut across every industry. Being able to effectively manage and analyze this data at every stage of the lifecycle will become a key competitive advantage for the winners in any given market. Achieving this big data future is impossible with the old “centralized, monolithic database” approach. Instead, it will require a new approach: one that employs specialized databases across each stage of the data lifecycle. This is how big data will finally move beyond big promise to big value.