Talend, a provider of open source integration software, has announced the availability of Talend Open Studio for Big Data, to be released under the Apache Software License. Talend Open Studio for Big Data is based on Talend Open Studio, augmented with native support for Apache Hadoop. In addition, Talend Open Studio for Big Data will be bundled in Hortonworks' Apache Hadoop distribution, Hortonworks Data Platform, constituting a key integration component of Hortonworks Data Platform.
Talend Open Studio for Big Data aims to improve the efficiency of integration job design by offering a graphical development environment. It provides native support for Hadoop Distributed File System (HDFS), Pig, HBase, Sqoop and Hive. By leveraging Hadoop's MapReduce architecture for highly distributed data processing, Talend generates native Hadoop code and runs data transformations directly inside Hadoop for maximum scalability. This feature enables organizations to combine Hadoop-based processing with traditional data integration processes, either ETL- or ELT-based, for better overall performance.
The goal is to put simple, easy-to-use tools into the market and democratize big data, Jim Walker, director, product marketing at Talend, tells 5 Minute Briefing. "We are releasing this under the Apache license so it will be completely free for download from our website."
In addition, he notes, Talend Open Studio for Big Data is a core component of the Talend Platform for Big Data, which enables organizations to increase their productivity by deploying big data solutions more quickly. The Talend Platform for Big Data integrates data of all types - structured, semi-structured and unstructured - and maximizes an organization's resources by abstracting the technical complexity of big data tools and technologies, according to Talend. The Talend Platform for Big Data is compatible with all Apache Hadoop distributions and has been certified for use with Hortonworks Data Platform.
Talend Platform for Big Data provides an intuitive set of graphical components and a workspace that allow interaction with a big data source or target without the need to learn and write complicated code. It also provides data quality functions that take advantage of Hadoop's massively parallel environment, enabling developers to use this high-performance processing environment to identify duplicate records across these huge data stores in moments, not days. The platform also extends into profiling big data and addressing other quality issues, as the Talend data quality functions can be employed for big data tasks. Talend Platform for Big Data includes the ability to schedule, monitor and deploy any big data job, and it is built on a shared repository so that data analysts can collaborate and share project metadata and artifacts.
What is missing from many big data projects is governance, emphasizes Walker. "It is still a bit of the wild west out there. We don't see the governance around these projects scoping out just yet," says Walker. "It is our job to put all of the governance constructs that we have built out within the Talend Unified Platform for a common code repository, scheduling of jobs, scheduling of events, the metadata management, the monitoring of these projects - that governance from a project point of view around this set of tools. Talend Platform for Big Data extends those capabilities around the core capabilities that allow you to get these projects up and running at will. That extends into data quality as well, which is a key part of our overall strategy."
A preview version of the product is available at www.talend.com/download-tosbd.
For more information, please visit www.talend.com.