DataStax, a provider of a NoSQL platform powered by Apache Cassandra, and Databricks, a company founded by the creators of Apache Spark, have formed a partnership to integrate Cassandra and Spark. Together, the companies say, these technologies can boost analytics performance in a transactional database and allow companies to act more quickly on customers’ needs.
According to Martin Van Ryswyk, executive vice president of engineering at DataStax, the companies hope the integration will accomplish two things: give existing users in both communities a useful new tool, and attract new users who are still evaluating their technology choices.
“More and more, we see customers in the community wanting to do analytics on data in as real time as possible. That is what this is really about. By taking what is shaping up to be a really interesting, well architected, fast analytic tool like Spark and the Spark ecosystem, and getting it to run on top of the Cassandra database, where real time data is, in addition to where it has been playing, which is on top of HDFS, we are moving analytics into real-time.”
Apache Cassandra is a distributed, scalable database that allows users to create online applications that are always on and can process large amounts of data in real time.
Originally developed at UC Berkeley’s AMPLab, Apache Spark is a processing engine that can run applications in Hadoop clusters up to 100x faster than Hadoop MapReduce in memory, and up to 10x faster on disk. It also provides SQL, streaming, machine learning, and graph computation functionality out of the box, which simplifies building end-to-end analytic workflows.
“There are online, line-of-business, transaction-oriented systems, and then there are more data warehouse, batch-oriented databases, and they serve different needs,” said Van Ryswyk. “But the world has changed. People need analytics faster and earlier in the cycle for all of their applications.”
Van Ryswyk said the two organizations are actively working on the integration and expect to share more information at the Spark Summit, June 30 through July 2, with the results of the partnership coming to market shortly after.
For more information, visit www.datastax.com and www.databricks.com.