Databricks, the company founded by the creators of Apache Spark, has formed a new partnership with SAP. The companies have collaborated on a Databricks-certified Apache Spark distribution offering for the SAP HANA platform.
The Databricks-certified distribution offering for SAP HANA contains the Spark processing engine, which works with any Hadoop distribution out of the box, providing a complete data store and processing layer for Hadoop. This production-ready distribution offering is the first result of Databricks' new partnership with SAP.
“SAP HANA is both an incredibly powerful and fast analytics engine, as well as a repository for some of the most valuable enterprise data by virtue of the enterprise applications that it helps run,” said Ion Stoica, CEO of Databricks. “This integration will help enable the large and growing community of Hadoop and Spark developers and applications to harness these capabilities immediately via Spark as well as extend the reach of SAP HANA.”
SAP HANA integrated with Spark will help enable real-time applications and interactive analysis that combine corporate application data with content stored in the Hadoop Distributed File System (HDFS). Developers and data scientists working with Spark can also benefit from end-to-end data processing acceleration in SAP HANA by leveraging its comprehensive suite of in-memory engines and libraries for transactional applications, analytics, predictive analysis, machine learning, and text, graph and geospatial analysis. This helps simplify the integration of mission-critical applications with contextual data stored in Hadoop and similar data stores. As a result, in-memory computation can happen where the data resides, which can help minimize costly and time-consuming data movement.
Developers and data scientists will be able to more easily create a new class of applications with SAP HANA and Spark that span data domains. Examples include applications that integrate inventory analysis with social media trends for retailers; combine sensor data with billing systems to deliver personalized resource and cost-saving recommendations for utilities; or bring together patient data and epidemiological information to support better staffing decisions for healthcare providers.
The full production-ready distribution offering, based on Apache Spark 1.0, can be deployed in the cloud or on premises and is available for immediate download from SAP at no cost at spr.ly/SAP_and_Spark.