Hydrolix, the company transforming the economics of log data with its streaming data lake platform, is unveiling a new Apache Spark connector that brings the power of Databricks to customers’ wealth of event data. The Hydrolix Spark Connector enables Databricks users to ingest and store full-fidelity event data in Hydrolix and work with it from Databricks, offering low-latency queries, a streamlined user experience, and rapid information extraction.
Though Databricks is a powerful analytics platform, it is limited by the fidelity, range, and query performance of the underlying data, according to Hydrolix.
To address this limitation, the Hydrolix Spark Connector enables Databricks users to economically store full-fidelity event data in Hydrolix, where information can be rapidly extracted from both real-time and historical data. Ultimately, the connector combines Hydrolix’s high-performance, cost-effective, and always “hot” log and event data with Spark’s massive JOINs, advanced machine learning workflows, and shuffling to power complex query operations at low latency without driving up costs.
“The Hydrolix Spark Connector allows Databricks users to store massive amounts of time series data over long periods of time at full fidelity in the Hydrolix data lake,” said Alok Aggarwal, director of the Innovation Lab at Hydrolix. “With this connector, Databricks users can unleash the full power of Databricks against all of their data and model across longer time periods such as year-over-year and multiyear datasets quickly and cost-effectively.”
The Hydrolix Spark Connector opens up new use cases for Databricks users, enabling them to query all of their event data and explore log data in Databricks. Particularly useful for data science, business intelligence, and machine learning, the Hydrolix/Databricks integration can generate insights associated with:
- Predicting inventory and product demand
- Capacity planning
- Detecting outliers for anomaly and threat detection
- Fraud detection
- Training machine learning models
Other streamlined capabilities include:
- Analyzing and visualizing Hydrolix data in Databricks notebooks
- Joining log data in Hydrolix with data from other sources to produce new insights
- Using MLlib for machine learning tasks, including use cases such as fraud detection, capacity prediction, and anticipating customer churn
- Utilizing Hydrolix summary tables for real-time summaries in Databricks
To learn more about the Hydrolix Spark Connector, please visit https://hydrolix.io/.