Syncsort, a provider of specialized solutions for big data, is making a major open source contribution to the IBM z Systems mainframe with a connector for Apache Spark.
The connector will enable organizations to access and gain new insights from their critical mainframe data with Apache Spark’s advanced analytics capabilities and Spark SQL.
“We were very carefully looking at what the main drivers are of Apache Spark and why it is becoming such an effective open source project and compute platform,” said Tendü Yogurtçu, general manager at Syncsort. “The main reason and driver for this is really the ability to accommodate a variety of workloads.”
Spark Packages, launched by Databricks, makes it easy for users to find, discuss, rate and install packages that enhance Spark’s capabilities.
“We saw that this promise of having a single compute platform for multiple types of workloads will be really critical moving forward to 2016 and 2017 for use cases of interest, including streaming, ETL and IoT use cases,” Yogurtçu said.
With this new connector, customers can simply specify the location of multiple datasets and the associated COBOL copybook metadata. The Spark mainframe connector automatically transfers the datasets in parallel via a secure connection into Spark’s DataFrame objects.
Users can then manipulate this DataFrame object and join it with their other data sources for further analysis.
Syncsort’s mainframe connector conforms to Spark’s Data Sources API specification, and because of Spark’s ability to operate on data in memory, the connector will allow queries to access mainframe data without offloading the data first.
Mainframe record formats including fixed, variable, sequential and VSAM files are all supported. The connector also handles compressed data transfer, minimizing network bandwidth and optimizing overall elapsed time.
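Variable-length records differ from fixed ones in that each record carries its own length. Assuming the data arrives in binary with the standard z/OS record descriptor word (RDW), a 4-byte prefix whose first two bytes give the big-endian record length including the RDW itself, a reader could split records as in the sketch below. The function names are hypothetical, and this says nothing about how Syncsort's connector handles these formats internally.

```python
import struct


def split_rdw_records(data: bytes) -> list:
    """Split variable-length records that each begin with a 4-byte RDW."""
    records = []
    pos = 0
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError(f"truncated RDW at offset {pos}")
        # RDW: 2-byte big-endian length (counting the RDW), then 2 reserved bytes
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        if length < 4 or pos + length > len(data):
            raise ValueError(f"bad record length {length} at offset {pos}")
        records.append(data[pos + 4:pos + length])  # payload without the RDW
        pos += length
    return records


def with_rdw(payload: bytes) -> bytes:
    """Hypothetical helper: prefix a payload with an RDW (used to build test data)."""
    return struct.pack(">HH", len(payload) + 4, 0) + payload


# Usage: three records of different lengths, including an empty one
data = with_rdw(b"AB") + with_rdw(b"") + with_rdw(b"XYZ")
records = split_rdw_records(data)
```

Each payload could then be decoded field by field against its copybook layout, as with fixed-length records.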
“We see this as our first step towards the bigger picture with our streaming, ETL and IoT use cases with our contributions to Apache Spark and related big data projects,” Yogurtçu said.
According to Yogurtçu, the connector will benefit data scientists and data stewards who run interactive queries, explore existing legacy data, perform analytics, and combine other data sources to gain additional insights.
“This has been very beneficial for us and our partners,” Yogurtçu said. “We are accommodating real-time workloads in addition to batch workloads and we’ll be making some related announcements with that. This is just the first step.”