After more than 10 years, there is no technology more closely aligned with the advent of big data than Hadoop. The Apache Hadoop framework allows for the distributed processing of large datasets across clusters of computers, enabling scale from single commodity servers to thousands of machines, each offering local computation and storage. Designed to detect and handle failures at the application layer, the framework supports high availability.
Hadoop has forever changed the economics and dynamics of large-scale computing, and its use among enterprises looking to augment their traditional data warehouses continues to grow.
“Hadoop doesn’t move the data, it brings a function to the data,” said Marco Vasquez, senior technical director, MapR, at Data Summit 2018. “You don’t need a lot of clusters and infrastructure to get going.”
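Vasquez's point is the essence of Hadoop's MapReduce model: the job is shipped to the nodes where the data already lives, and only the aggregated results move. As a minimal sketch of that idea, the classic word-count job below uses the standard org.apache.hadoop.mapreduce API; the input and output paths passed on the command line are placeholders, and a real deployment would tune formats, combiners, and cluster configuration to its own data.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper runs on the nodes holding each input split ("function to the data"),
  // emitting a (word, 1) pair for every token it sees.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer aggregates the counts for each word after the shuffle phase.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // pre-aggregate on each node to cut shuffle traffic
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g., an HDFS input directory (placeholder)
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory must not already exist
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```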
As the velocity of data accelerates and its variety expands, corporate data is pulled into a vortex, said Michael Corey, co-founder of LicenseFortress, and a Microsoft Data Platform MVP, Oracle ACE, and VMware vExpert. “As organizations race to exploit this information, many of the traditional safeguards on data go by the wayside. The more progressive DBAs understand this problem and are stepping in to protect their organization from the organization’s need for speed before it’s too late. Over time, traditional DBAs will transform to becoming DBAs of the ‘dataverse’ (Hadoop, NoSQL, etc.) and, whatever form or velocity the data takes, they will make sure it’s accessible, accurate, and secure.”
Best Hadoop Solution
Winner: Cloudera
Finalists
Hortonworks
Amazon Elastic MapReduce