In the new world of big data and the data-driven enterprise, data has been likened to the new oil, a company’s crown jewels, and the transformative effect of the advent of electricity. Whatever you liken it to, the message is clear: enterprise data is of high value.
And, after more than 10 years, there is no technology more aligned with the advent of big data than Hadoop. The Apache Hadoop framework allows for the distributed processing of large datasets across compute clusters, scaling from a single commodity server to thousands of machines, each offering local computation and storage. Designed to detect and handle failures at the application layer, the framework supports high availability.
Despite the ongoing centrality of relational databases, a recent Unisphere Research report, titled “The 2016 Enterprise Data Management Survey,” found that Hadoop is gaining traction in the enterprise, with about 40% of respondents indicating that they have a Hadoop installation. Of those using Hadoop, about two-thirds said that their relational databases are included in their Hadoop implementation. About 42% of the respondents also said they intend to add Hadoop in the future, though most were in the planning process, with about 5% at the proof-of-concept stage.
Apache Spark, another new technology in the Hadoop ecosystem, consists of a processing engine that accelerates structured database queries, unmodified Hadoop Hive queries, and analytics of streaming data, among other tasks. It is still in the early adopter stage, with about 18% of respondents using Spark, according to the same survey.
Along with a rich and growing assortment of projects flourishing as part of the main Apache Hadoop project, there is also an expanding array of vendors offering enterprise distributions as well as related tools and services. These offerings are increasingly helping to make Hadoop, and its larger ecosystem of open source projects, more robust, secure, and enterprise-hardened.
Best Hadoop Solution
Cloudera
FINALISTS
Hortonworks
Amazon Elastic MapReduce