In 2016, Hadoop marked its 10th anniversary and now represents much more than a platform for the storage and batch processing of vast quantities of data from disparate sources in many formats. The Apache Hadoop framework, consisting of Hadoop Common, Hadoop Distributed File System (HDFS); Hadoop YARN, and Hadoop MapReduce, remains central to most big data projects and to the creation of data lakes, but Hadoop has also expanded to represent a large ecosystem of more than 100 interconnected open source Hadoop-related projects.
The Apache Hadoop framework allows for the distributed processing of large datasets across compute clusters, enabling scale up from single commodity servers to thousands of machines for local computing and storage. Designed to detect and handle failures at the application layer, the framework supports high availability.
First created by Doug Cutting, a Yahoo engineer, the Apache Hadoop open source framework, which got its name from Cutting’s son’s toy elephant, Hadoop was embraced early on by large web companies such as Facebook, Twitter, LinkedIn, Yahoo!, and Amazon.
But today, as IT and data management moves from a back office concern to a high level objective, a wide range of companies view data more than ever before as a valuable enterprise resource. And, as part of that evolution Hadoop is being embraced as a key technology to enable the storing and analyzing of more data than ever before and for longer periods of time. A key aspect of the framework is that it that allows for the distributed processing of large datasets across clusters of commodity hardware and can scale from single server to many machines, each offering local compute and storage, with highly available service on top of clusters.
Along with a rich and growing assortment of Hadoop related projects flourishing as part of the main Apache Hadoop project, there is also an expanding array vendors offering enterprise distributions as well as related tools and services, helping Hadoop to increase its array of enterprise features, in areas such as security, flexibility, and accessibility.
HERE ARE THE WINNERS OF THE 2016 DBTA READERS' CHOICE AWARDS FOR BEST HADOOP SOLUTION:
Winner:
Hortonworks
Finalists:
Cloudera
MapR