With well over a hundred open source projects now part of the Hadoop ecosystem, it can be hard to know which technologies are best for which requirements. To help users get started with Hadoop and understand their technology choices, James Casaletto will present “Harnessing the Hadoop Ecosystem” at Data Summit 2016 in NYC. Casaletto is a solutions architect for MapR, where he develops and deploys big data solutions with Apache Hadoop.
Attendees can expect to come away from the session with an understanding of the key components of the Hadoop ecosystem, how they fit into a solution, and how to decide among them when several options are available, said Casaletto. In addition, Casaletto plans to present a “soup to nuts, end to end, comprehensive” example of a common solution built on Hadoop that will be generally applicable to many customers.
“The main takeaway I want to offer attendees is the ability to see an end-to-end solution and how it got designed based on some requirements and specifications,” said Casaletto. As he walks the group through the design choices and the options available in the ecosystem, he said, he will emphasize that there is no single way to build a solution. “There are many choices in how you ingest, process, analyze, and visualize data,” he said.
Casaletto said he also plans to touch on a few common misperceptions about Hadoop. One is a set of mistaken beliefs about how the offerings of Hadoop providers MapR, Cloudera, and Hortonworks differ from one another and from “off-the-shelf Apache Hadoop.” Another concerns how to get started: many organizations hold off on using Hadoop because they think it is only used for a “really big cluster,” when in reality a customer can start small, he said.
There is also a general lack of knowledge and awareness about the sorts of problems that can be solved with Hadoop, Casaletto said. However, he emphasized, for a very first project it is important to choose low-hanging fruit, meaning a well-known use case, and not something esoteric. “If it fails, it can seem that Hadoop failed, so it is important to get that first one right.”
Casaletto will present “Harnessing the Hadoop Ecosystem” as part of Data Summit’s Hadoop Day on Tuesday, May 10, at 10:45 a.m. at the New York Hilton Midtown in NYC.