It was all things Hadoop at this year's Hadoop Summit in San Jose, California. The big data giant continues to expand and evolve as big data technology and analytics become more mainstream.
2014 is turning out to be a banner year for Hadoop, as evidenced by the huge number of attendees at the summit and the wide range of companies represented. Here are six key points the summit illustrated about the advances of Hadoop and big data.
- Education - As big data continues to grow and expand, so too does Hadoop. The sheer number of mainstream companies that attended illustrated this point. It wasn’t just tech companies at the event. Comcast, AT&T, Bank of America and American Express, to name a few, were all present at the Hadoop Summit. Not only does that mean Hadoop is being used more widely, but it also means that attendees of the conference, and companies in general, are more informed and educated about Hadoop and big data than ever before.
- Hadoop 2.0 and YARN - With more data and more uses, it was paramount to the big data world that Hadoop take the next steps forward. A lot of community work is happening around Hadoop 2.0 and YARN, with the promise of better performance and a richer set of APIs to support multiple applications on HDFS. With it, Hadoop is taking initial steps toward becoming a data operating system in the world of fixed, on-premises clusters. In addition, HDFS is taking steps to support application types other than raw MapReduce.
- Hadoop Becomes More Interactive and Real-Time Conscious - Apache Spark was also a popular topic of conversation at the summit. With technologies such as Spark, big data users can gain more from their data than ever before. In a world that values interactivity and real-time capabilities above almost anything else, that’s sure to make big data users happy. Users can not only do more with Hadoop, but they can do it faster than ever before.
- The Cloud - Not surprisingly, 2014 is also a banner year for big data in the cloud. Just as personal computing has moved to the cloud in recent years, so too has big data. The cloud has so many advantages that they are impossible to ignore, as evidenced by the increasing number of providers adding cloud capabilities. Cloud computing is more affordable, offers greater storage capacity and provides more flexibility. And as cloud capabilities continue to improve, it’s natural to assume that this enormous growth will continue.
- The Stinger Initiative - The recent completion of the Stinger Initiative prompted conversations on the development of Hive. As described on the Hortonworks website, “The Stinger Initiative is a broad, community-based effort to drive the future of Apache Hive, delivering 100x performance improvements at petabyte scale with familiar SQL semantics.” Apache Hive is making waves for the way it’s evolving and improving in order to provide users the best possible querying capabilities. Adam Kawa, a Spotify data engineer, confirmed as much in his praise at the event. “If we gave the awards for the greatest software comebacks, Apache Hive would receive one this year,” Kawa said. “In 2008, Apache Hive made unthinkable queries possible. Today it makes them unthinkably fast.”
- Teamwork - 2014 is also the year that traditional databases and big data platforms become teammates rather than opponents. Big data was never intended to replace traditional databases. It has always been about improving on and complementing the traditional methods. The real power of data comes from the two working together. It’s not about one or the other, but about what they can do together.
2014 is a year of change and innovation for Hadoop and big data. Hopefully, the advances of this year will spur continued growth and implementation in the future.
About the author
Ashish Thusoo is the CEO and co-founder of Qubole, a pioneering Big Data startup.