In 2015, we will see a shift in the primary method of big data processing, from batch processing to near real-time data processing. What will drive this technological shift, and which business segments will benefit most? What use cases are already emerging?
The Constant Data Flow of Modern Living
We are living in a "sensors world," where data is constantly gathered by smartphones, smartwatches, sport trackers, smart devices, and even smart houses. Service providers collect this data for research, analysis of marketing campaign effectiveness, proactive solutions, and more. All these "sensors" generate a constant flow of data that must be stored, processed, and analyzed in order to bring value.
Over the last couple of years, companies have begun to realize the value of the vast amounts of data collected by consumer products. As a result, technologies such as Apache Hadoop have changed the IT world, making it possible to store and process this data within reasonable time spans. Cloud providers have made this technology available to businesses by offering ready-to-use Hadoop clusters as a service.
Today's Business Market as a Big Data Processing Driver
The ability to process large data sets is no longer enough for modern business: today's climate demands near real-time data processing. In response, the open source community has released technologies such as Apache Storm, Apache Kafka, and, later, Apache Spark. Cloud providers also had to react to the demand, which is why 2014 saw the release of brand-new cloud services focused on near real-time Big Data processing.
Let's look at the business segments that benefit most from near real-time data processing:
Manufacturing
Manufacturing control systems have a great number of sensors that monitor dozens of events. This data needs to be visible in real time, and the system should be able to react to events as they occur. For example, if any parameter exceeds a safety threshold, the system should stop the process to prevent negative consequences.
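The threshold rule described above can be sketched in a few lines. This is a minimal illustration, assuming each reading arrives as a small record and that a hypothetical `SAFETY_LIMITS` table holds one upper limit per parameter; the names and limit values are invented for the example, not taken from any real control system. The point is that each reading is checked the moment it arrives, rather than after a batch is collected.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical safety limits per parameter (illustrative values only).
SAFETY_LIMITS = {"temperature_c": 90.0, "pressure_bar": 12.0}


@dataclass
class Reading:
    sensor: str
    parameter: str
    value: float


def monitor(readings: Iterable[Reading]) -> Optional[Reading]:
    """Consume readings as they arrive and return the first one that breaches
    its safety limit (the caller would then stop the process), or None."""
    for reading in readings:
        limit = SAFETY_LIMITS.get(reading.parameter)
        if limit is not None and reading.value > limit:
            # React immediately; no further readings are consumed.
            return reading
    return None
```

Because `monitor` accepts any iterable, the same function works on a list in a test or on a live generator that yields readings from a message queue.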
IT Systems and Network Monitoring
The number of servers keeps growing, producing massive amounts of machine data in the form of logs and various types of events, while businesses need to optimize resource usage and IT operations. Proactive customer support is one example of where real-time data processing applies in IT: instead of waiting for a customer to run into an issue, the system monitors network metrics and social media, detects anomalies itself, and triggers the IT support team to launch an investigation.
Field Assets Monitoring and Alerting
Public transportation is a good example of how field assets monitoring can be utilized. Sensors can be deployed on taxis, buses, trucks, oil rigs, vending machines, or subway turnstiles. If we can aggregate in a single place the current position of each public transport vehicle, the number of people on board, and the number of people waiting in line for a bus, and visualize this data in real time, it becomes a smart tool that helps traffic dispatchers optimize traffic and expenses.
Financial Transaction Processing
People all over the world increasingly prefer paying with credit cards rather than cash; what's more, companies like Apple and Google already enable customers to pay with smartphones in stores. This is a challenge for real-time data processing, because a constantly increasing volume of transactions must be processed, validated, and checked for fraud. Applied wisely, real-time data processing could even help prevent finance-related issues such as risky trades or stock exchange meltdowns.
Marketing
Thanks to the popularity of social networks, marketers can now collect and analyze customer sentiment in real time, based on data retrieved from users' accounts. For example, a company hosting a marketing event can use social networks to see how people perceive its products, gather feedback about the event, or track changes in its brand reputation.
Healthcare
Hospitals' intensive care units (ICUs) continuously collect data streams on patient vitals such as respiration, heart rate, and blood pressure. Real-time monitoring, combined with information about the applied treatment and automatic anomaly detection, can save lives by enabling an early reaction to anomalies and giving doctors more time to prevent critical conditions.
Near Real Time Data Processing in Action
What kind of problems can near real time data processing effectively solve? Market leaders, technology evangelists and market researchers say that the following typical usage scenarios are trending:
Recommendation Engines
Users visit a website to read news, listen to music, or buy something, but they often don't know exactly what they want. In such cases, a recommendation engine can help a customer find something interesting based on analysis of other users' behavior, current trends, and so on. As a result, if users find the service useful, it is possible to improve indicators such as the Repeat Visitor Ratio (RVR) and Average Session Length (ASL), and to increase revenue.
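One simple way to base recommendations on "the behavior of other users" is item co-occurrence: items that often appear together in the same session are recommended for each other. The sketch below assumes sessions arrive one at a time as lists of item IDs; the `CoOccurrenceRecommender` class is hypothetical and deliberately much simpler than production recommendation engines, but it shows how counts can be updated incrementally as data streams in, so recommendations stay current without a nightly batch job.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, List


class CoOccurrenceRecommender:
    """Recommends items that other users consumed together with a given item."""

    def __init__(self) -> None:
        # pair_counts[a][b] = number of sessions containing both a and b
        self.pair_counts: Dict[str, Counter] = defaultdict(Counter)

    def record_session(self, items: Iterable[str]) -> None:
        # Count each unordered pair of distinct items seen in one session.
        for a, b in combinations(sorted(set(items)), 2):
            self.pair_counts[a][b] += 1
            self.pair_counts[b][a] += 1

    def recommend(self, item: str, k: int = 3) -> List[str]:
        # .get avoids creating empty entries for items never seen before.
        counts = self.pair_counts.get(item, Counter())
        return [other for other, _ in counts.most_common(k)]
```

For example, after sessions `["jazz", "blues", "rock"]`, `["jazz", "blues"]`, and `["jazz", "soul"]`, the top recommendation for `"jazz"` is `"blues"`, since the two appeared together most often.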
Risk Management
We are living in a world where the risk of an investment can depend on the news. Whether we win or lose depends on whether we can identify a specific trend and how quickly we react to the change.
Fraud Detection
The number of web-based financial transactions is constantly growing, and with it, the need for better ways to detect fraud online. The key to fraud detection is a real-time anomaly detection algorithm that can identify suspicious customer behavior, which may indicate malicious activity.
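As a rough illustration of real-time anomaly detection, the sketch below flags a transaction whose amount lies more than a chosen number of standard deviations from that customer's own history. It assumes a simple per-customer z-score rule, which is a stand-in swapped in for illustration; real fraud systems combine many more signals and models. The running mean and variance use Welford's online algorithm, so each transaction is handled in constant time without storing past amounts. The class names and the `threshold` and `min_history` defaults are hypothetical.

```python
import math
from collections import defaultdict


class RunningStats:
    """Welford's online algorithm: mean and variance updated one value at a time."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def stddev(self) -> float:
        # Sample standard deviation; treated as 0 for fewer than two values.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


class FraudDetector:
    """Flags a transaction whose amount deviates strongly from the customer's history."""

    def __init__(self, threshold: float = 3.0, min_history: int = 5) -> None:
        self.threshold = threshold      # hypothetical z-score cutoff
        self.min_history = min_history  # skip customers with too little history
        self.stats = defaultdict(RunningStats)

    def check(self, customer_id: str, amount: float) -> bool:
        s = self.stats[customer_id]
        suspicious = False
        if s.n >= self.min_history:
            sd = s.stddev()
            suspicious = sd > 0 and abs(amount - s.mean) / sd > self.threshold
        if not suspicious:
            # Only normal transactions update the baseline, so a burst of
            # fraudulent amounts does not quietly become "normal".
            s.update(amount)
        return suspicious
```

Excluding flagged transactions from the baseline is a design choice: it keeps an attacker from gradually shifting the baseline, at the cost of needing a separate path to accept legitimately changed spending habits.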
Application Monitoring
In the past, IT operations teams split monitoring into server, network, and application health monitoring. Today, the trend is to combine all these monitoring types into a single view, since an application's health depends on the state of the servers, the network infrastructure, and so on. Application monitoring opens the door to operational optimization through automatic anomaly detection, which can help reduce the time required for support actions and improve the quality of service.
Transaction Cost Analysis
In a long supply chain, the cost of a single transaction may depend on numerous factors, some of which are difficult to predict, such as weather, news, or political events. Continuous transaction cost analysis lets business management see, in real time, the influence of their decisions and take the required actions as soon as possible.
The Internet of Things
The IoT is one of the latest trends, showing that a large number of connected devices with centralized storage can bring value to people. A great example is a Smart City, where many devices collect information about the environment, road traffic, weather, and resource usage, and this information is visualized in real time. This can help city management make informed decisions that improve residents' quality of life.
Real Time Data Management
What solutions can help manage real time data? Here are three notable services offered by the most popular cloud providers:
- Amazon Kinesis: a fully managed, cloud-based service for real-time processing of large, distributed data streams. The idea behind Amazon Kinesis is close to that of Apache Kafka: both are highly scalable event stores that guarantee message order (within a shard in Kinesis, within a partition in Kafka). The main difference is that Kinesis is a ready-to-use, out-of-the-box service that provides not only storage but also a framework for real-time event processing.
- Azure Stream Analytics: a fully managed service providing low-latency, highly scalable, complex event processing over streaming data in the cloud. Its main advantage is a SQL-like language used to specify transformations, along with the ability to monitor the scale and speed of the overall streaming pipeline.
- Rackspace Managed Big Data Services: lets a user easily spin up an Apache Hadoop cluster with Apache Spark, a popular open source solution that is currently the de facto standard for Big Data and real-time stream processing.
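The ordering guarantee mentioned for Kinesis and Kafka comes from partitioning: every record carries a partition key, records with the same key always go to the same shard or partition, and each shard is an append-only log. The toy `PartitionedLog` below is a hypothetical, in-memory sketch of that idea only; it is not how either service is implemented, and MD5 is used here purely as a convenient stable hash for illustration.

```python
import hashlib
from typing import List, Tuple


class PartitionedLog:
    """Toy event store: records with the same key land in the same shard, in order."""

    def __init__(self, num_shards: int) -> None:
        # Each shard is an append-only list of (partition_key, data) records.
        self.shards: List[List[Tuple[str, str]]] = [[] for _ in range(num_shards)]

    def shard_for(self, partition_key: str) -> int:
        # A stable hash (unlike Python's salted hash()) maps a key to one shard.
        digest = hashlib.md5(partition_key.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % len(self.shards)

    def put(self, partition_key: str, data: str) -> int:
        shard = self.shard_for(partition_key)
        self.shards[shard].append((partition_key, data))
        return shard

    def read_key(self, partition_key: str) -> List[str]:
        # Reading one shard sequentially returns that key's records in write order.
        shard = self.shards[self.shard_for(partition_key)]
        return [data for key, data in shard if key == partition_key]
```

Note that order is guaranteed only per key (per shard), not across the whole stream; that trade-off is what lets such systems scale by adding shards.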
If for some reason an out-of-the-box solution doesn't fit your needs, you can deploy Apache Hadoop and Apache Spark on an Infrastructure as a Service (IaaS) provider and have full control over the cluster configuration.
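Whichever option is chosen, a typical near real-time job aggregates events over time windows, for example counting vehicles or transactions every 10 seconds. Azure Stream Analytics expresses this in its SQL-like language with window functions such as `TumblingWindow`; the Python generator below is only a minimal sketch of the same tumbling-window idea (fixed-size, non-overlapping windows), assuming integer timestamps in seconds and events that arrive in timestamp order. Handling late or out-of-order events, which real stream processors must do, is deliberately left out.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]], window_seconds: int
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Yield (window_start, counts_per_key) for each fixed, non-overlapping window.

    Assumes events arrive ordered by timestamp; a window is emitted as soon as
    the first event of a later window shows up, and the last one at stream end.
    """
    current_start = None
    counts: Counter = Counter()
    for ts, key in events:
        start = ts - ts % window_seconds  # align timestamp to its window start
        if current_start is not None and start != current_start:
            yield current_start, dict(counts)  # previous window is closed
            counts = Counter()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)
```

With 10-second windows, events at seconds 1, 4, and 9 fall into the window starting at 0, an event at second 12 into the window starting at 10, and so on; results are emitted incrementally, which is what makes the approach suitable for streams rather than only for batches.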
The Growth of the Internet of Things
The growth of IoT, along with increased recognition of the value of data, poses new business challenges that can be solved by real-time monitoring. Cloud providers, Big Data real-time stream processing solutions, and data science are ready to respond to this demand by providing new services that ensure a better understanding of our environment (and even support life-saving decisions). In 2015, we will see a growing number of solutions based on real-time data processing emerge.
Vadym Fedorov is a solutions architect at SoftServe Inc. He has 12 years of experience in enterprise application development, including expertise in OOD, OOP, SOA applications design and architecture approaches. In the last two years he has been focusing on cloud applications design and development. Vadym holds multiple certifications and a master's degree in high-precision mechanics from Sevastopol State Technical University. He is also a contributor to the SoftServe United blog.