To help companies get more value from big data, SAP has introduced HANA Vora, a new in-memory computing engine that leverages and extends the Apache Spark execution framework to provide enriched, interactive analytics on Hadoop. HANA Vora is a completely new product built from the ground up, aimed at processing data more effectively to support business decisions.
According to SAP, companies increasingly face complex hurdles in dealing with distributed big data, a situation compounded by the lack of business process awareness across enterprise apps, analytics, big data and Internet of Things (IoT) sources. SAP HANA Vora, which is intended to be installed and run on Hadoop distributions - whether public cloud, private cloud, or on-premises deployments at customer sites - is aimed at simplifying and expediting the process of gaining value from this data.
It will also be available in the HANA Cloud Platform as a development tool.
While customers seek to get more from big data, the company contends, mining large datasets for contextual information in Hadoop can be a challenge. SAP HANA Vora enables organizations to conduct OLAP processing directly on these large, rich data sets all in-memory.
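As a rough illustration of what OLAP-style, in-memory processing over Hadoop-resident data can look like, the following is a minimal sketch using stock Apache Spark SQL rather than Vora's own extensions, which the announcement does not detail; the HDFS path and column names are assumptions made for the example.

```python
# Minimal sketch with plain Apache Spark (PySpark): cache a Hadoop-resident
# dataset in memory and run an OLAP-style aggregation over it.
# The HDFS path and column names are illustrative, not from SAP's announcement.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("olap-on-hadoop").getOrCreate()

# Read sales records stored as Parquet on HDFS and pin them in memory.
sales = spark.read.parquet("hdfs:///data/sales_events")
sales.cache()

# Drill down by region and product category, entirely in memory.
summary = (sales
           .groupBy("region", "product_category")
           .agg(F.sum("revenue").alias("total_revenue"),
                F.countDistinct("customer_id").alias("customers")))
summary.show()
```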
Just as SAP has done with HANA - pushing processing down to where the data resides in memory - Vora pushes processing down into Hadoop, said Mike Eacritt, vice president of product management for SAP HANA and SAP HANA Vora.
According to Eacritt, SAP HANA Vora takes Hadoop and Spark and augments them with enterprise-grade features, so that enterprise data sources can be combined with Hadoop data on the fly for real-time processing without having to consolidate it. This means that if a customer has HANA connected to Hadoop, both can process the data without moving it into one place. “This gives you the ability to take all of the context from the data that you have in Hadoop and enrich it with metadata, because Hadoop data is actually very metadata-light – it’s a lot of files, whereas enterprise files are very metadata-rich,” Eacritt explained.
SAP has a customer in the aerospace and defense field that had 300 million records in a single HANA table. The organization was able to connect and match that data with its Hadoop data within an hour, avoiding the undesirable prospect of moving the 300 million records into its Hadoop store for processing. “We were able to use data virtualization to be able to add two data stores together and end users were able to query them like they were all in one store,” said Eacritt.
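To make the federation idea described above more concrete, here is a hedged sketch using generic Spark SQL rather than Vora's actual connectors: an enterprise table reached over JDBC is joined with Hadoop data and queried as though both lived in one store. The JDBC endpoint, table, and column names are hypothetical, and running it would require the appropriate JDBC driver on the Spark classpath.

```python
# Illustrative sketch of querying two data stores as one logical store with
# generic Spark SQL (not SAP HANA Vora's own API).
# The JDBC URL, table, and column names below are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("federated-query").getOrCreate()

# Enterprise side: a relational table (e.g., a HANA table) reached via JDBC.
orders = (spark.read.format("jdbc")
          .option("url", "jdbc:sap://hana-host:30015")   # hypothetical endpoint
          .option("dbtable", "SALES.ORDERS")             # hypothetical table
          .option("user", "analyst")
          .option("password", "secret")
          .load())

# Hadoop side: sensor or log data already sitting in HDFS as Parquet.
events = spark.read.parquet("hdfs:///data/machine_events")

# Register both sources so end users can query them as one logical store.
orders.createOrReplaceTempView("orders")
events.createOrReplaceTempView("events")

spark.sql("""
    SELECT o.order_id, o.customer_id, COUNT(*) AS related_events
    FROM orders o
    JOIN events e ON o.asset_id = e.asset_id
    GROUP BY o.order_id, o.customer_id
""").show()
```

Per SAP's description, Vora's value-add over a generic pull like this lies in pushing the processing down into the Hadoop cluster itself and enriching the Hadoop data with metadata, rather than routing everything through a single JDBC connection.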
The new solution addresses problems that SAP has seen in organizations as people try to get access to and extract value from their big data stores. “What we are trying to address is the distributed nature of data and how to get access to that distributed nature of data – so across thousands of nodes in a data lake for example, how do we extend that to the enterprise.”
Hadoop offers very cheap storage, but the data lake metaphor can mislead, Eacritt said: people picture water, which flows together easily, whereas a data lake holds many different types of data, and it is not always easy to get value out of it across many projects. “The data lake has this impression that you can go with your cup and get anything you want out of it.”
SAP, Eacritt said, sees problems with inefficient processing of data from many sources inside and outside organizations, with data stored in silos, and with the question of how to unify those silos to make the data consumable. In addition, moving the data around carries cost and creates more silos as data is deployed in shadow IT processes, creating new problems in terms of governance, compliance, and data quality.
“What we have done is really extended enterprise-type software into Hadoop, and brought a lot of our know-how into the Hadoop and open source world,” said Eacritt. The goal, he said, is to fuel what SAP is calling “precision decisions” - drawing precision from very deep data while keeping the right context. “Precision decisions require a lot of nuance and context. The term is something we came up with to try to explain what enterprise-meets-big-data will enable.”
SAP HANA Vora is expected to help customers in industries where highly interactive big data analytics in business process context is paramount, such as financial services, telecommunications, healthcare and manufacturing.
Use case examples where SAP HANA Vora can potentially benefit customers include:
- Mitigating risk and fraud by detecting new anomalies in financial transactions and customer history data;
- Optimizing telecommunication bandwidth by analyzing traffic patterns to help avoid network bottlenecks and improve network quality of service (QoS);
- Delivering preventive maintenance and improving the product recall process by analyzing bill-of-materials, service records and sensor data together.
SAP HANA Vora is planned to be released to customers in late September; a cloud-based developer edition is planned to be available at the same time.