Databricks, founded by the team that created Apache Spark, is releasing Apache Spark 2.3.0 on its Unified Analytics Platform.
Databricks is the first vendor to support Apache Spark 2.3, delivered within its compute engine, Databricks Runtime 4.0, which is now generally available.
In addition to support for Spark 2.3, Databricks Runtime 4.0 introduces new features, including Machine Learning Model Export to simplify production deployments, as well as performance optimizations.
The Apache Spark community made multiple valuable contributions to the Spark 2.3 release, which was introduced on February 28.
“The community continues to expand on Apache Spark’s role as a unified analytics engine for big data and AI. This is a major milestone to introduce the continuous processing mode of Structured Streaming with millisecond low-latency, as well as other features across the project,” said Matei Zaharia, creator of Apache Spark and chief technologist and co-founder of Databricks. “By making these innovations available in the newest version of the Databricks Runtime, Databricks is immediately offering customers a cloud-optimized environment to run Spark 2.3 applications with a complete suite of surrounding tools.”
The Databricks Runtime, built on top of Apache Spark, is the cloud-optimized core of the Databricks Unified Analytics Platform that focuses on making big data and artificial intelligence simple for enterprise organizations.
The enhancements introduced in Spark 2.3, which is supported within the latest Databricks Runtime 4.0, focus on usability, stability, and refinement.
In addition to introducing stream-to-stream joins and extending the functionality of SparkR, Python, MLlib, and GraphX, the new release provides a millisecond-latency Continuous Processing mode for Structured Streaming.
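As a rough illustration of the new stream-to-stream joins, the following sketch joins two streaming DataFrames on an event-time constraint. The ad-impression and click feeds, their column names, and the watermark durations are hypothetical, stand-ins built from Spark's synthetic "rate" source to outline the API; real applications would read from Kafka or another streaming source.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, expr

spark = SparkSession.builder.appName("stream-stream-join-sketch").getOrCreate()

# Synthetic streams standing in for real impression and click feeds.
impressions = (spark.readStream.format("rate").option("rowsPerSecond", 5).load()
               .select(col("value").alias("impressionAdId"),
                       col("timestamp").alias("impressionTime")))

clicks = (spark.readStream.format("rate").option("rowsPerSecond", 5).load()
          .select(col("value").alias("clickAdId"),
                  col("timestamp").alias("clickTime")))

# Watermarks bound how long the engine buffers state while waiting for matches.
impressionsWithWatermark = impressions.withWatermark("impressionTime", "10 minutes")
clicksWithWatermark = clicks.withWatermark("clickTime", "20 minutes")

# Join clicks to impressions that occurred up to one hour before the click.
joined = impressionsWithWatermark.join(
    clicksWithWatermark,
    expr("""
        clickAdId = impressionAdId AND
        clickTime >= impressionTime AND
        clickTime <= impressionTime + interval 1 hour
    """))

query = joined.writeStream.format("console").outputMode("append").start()
```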
In the continuous mode, instead of micro-batch execution, new records are processed immediately upon arrival, reducing latencies to milliseconds and satisfying low-latency requirements.
Developers can now choose either mode, continuous or micro-batch, depending on their latency requirements, to build real-time streaming applications at scale while benefiting from the fault-tolerance and reliability guarantees that the Structured Streaming engine affords.
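A minimal sketch of how that choice surfaces in the API, assuming a simple rate source: the same query can run with the default micro-batch trigger or, new in Spark 2.3, with the experimental continuous trigger, whose interval sets how often progress is checkpointed rather than the per-record latency.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("trigger-mode-sketch").getOrCreate()

# A simple source; continuous processing in Spark 2.3 supports map-like
# operations (projections and selections), not aggregations.
events = (spark.readStream.format("rate").option("rowsPerSecond", 10).load()
          .selectExpr("timestamp", "value AS event_id"))

# Default micro-batch execution: results are committed in small batches.
micro_batch_query = (events.writeStream
                     .format("console")
                     .trigger(processingTime="1 second")
                     .start())

# Continuous processing (experimental in 2.3): records are processed as they
# arrive; the interval controls how often progress is checkpointed.
continuous_query = (events.writeStream
                    .format("console")
                    .trigger(continuous="1 second")
                    .start())
```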
For more information about this news, visit www.databricks.com.