EMC announced a new distribution of Apache Hadoop called Pivotal HD that features native integration of EMC's Greenplum MPP database with Apache Hadoop.
Pivotal HD with HAWQ provides a key advancement by offering SQL processing for Hadoop, extending the platform's reach to SQL programmers. Unlike competing Hadoop distributions, EMC says, Pivotal HD does this without moving data between systems or using connectors that require users to store the data twice. Capabilities of EMC’s HAWQ technology include Dynamic Pipelining, a query optimizer, horizontal scaling, SQL compliance, interactive query, deep analytics, and support for common Hadoop formats.
In a blog post on the Greenplum website, solutions architect Donald Miner writes:
“HAWQ is a relational database that runs atop of HDFS. It has its own execution engine, separate from MapReduce, and manages its own data, which is stored on HDFS. HAWQ bridges the gap, a SQL interface layer on top of HDFS that also organizes data. It also boasts a core feature called GPXF (Greenplum Extension Framework) that allows HAWQ to read data from flat files in HDFS that are stored in just about any common format (delimited text, sequence files, protobuf and avro.) In addition, it has native support for HBase, supporting HBase predicate pushdown, hive connectivity, and offering a ton of intelligent features to retrieve HBase data.”
Because Pivotal HD supports SQL standards-compliant data mining tools, SQL-trained data analysts can connect to, query, and analyze data sets stored in the Hadoop Distributed File System (HDFS). Also, according to EMC, Pivotal HD delivers query response times up to 600x faster than those of current SQL-like interfaces for Hadoop.
Pivotal HD provides operational support through a command center that enables administrators and developers to install and manage large clusters from interactive web user interfaces. The command center also exposes a command line interface for scripting and a programmer-friendly web services API for complex automation tasks. The command center enables administrators to deploy large clusters, configure services and roles, manage services, and monitor HDFS jobs and tasks.
Expected to be available at the end of the first quarter of this year, Pivotal HD will be offered as a software-only or appliance-based solution.