The data lake is one of the hottest topics in the data industry today. It is a massive storage reservoir that allows data to be stored in its rawest form. Hadoop Day at Data Summit 2015 concluded with a panel on all things data lake featuring James Casaletto, solutions architect for MapR; Joe Caserta, president and founder of Caserta Concepts; and George Corugedo, CTO of RedPoint Global Inc.
All three panelists agreed that data lakes are on the rise in the data industry, though to varying degrees. “We don’t see it in start-ups too much, but it is prevalent in our enterprise clients. Our enterprise clients have been trying to offload dark data,” said Caserta.
Corugedo said he believes adoption of data lakes isn’t as high as it could be, but it is increasing. He attributes the slower uptake to companies starting out with Hadoop with only vague intentions. “Until you have a very specific purpose for the infrastructure that is paid for by a business unit, it is just going to remain an R&D activity that is going to move to enterprise adoption,” explained Corugedo.
“Data lake” is a hot buzzword in the industry, but it may have a different definition depending on whom you are speaking with. “Data lakes are going to continue to evolve and I don’t think it is necessary to have a definition. I think it is more important for the customers to have a vision of what data lakes mean to you,” said Casaletto. Caserta added that data warehousing technologies are 30 years old and people still don’t know what they are. “Data lakes have only been around for a few years, so the definition of data lakes doesn’t vary that much. Most people can get their head around the concept,” he stated.
Corugedo felt it was a buzz term that can mean whatever you want it to mean. What matters, he said, is the role and the success criteria that the customer sets for the data lake.
“When a company is considering Hadoop it is important that they have an organizational need for it,” said Casaletto, responding to a question about the biggest challenges to Hadoop adoption. Caserta noted that there is a technical learning curve for people who are familiar with relational databases. Applying governance to data lakes is also very difficult at this time, which presents another challenge for organizations.
The amount of data produced has increased steadily every year, and that trend shows no sign of slowing down. Caserta noted that not only is more data being created, but there are also more and more data sources. “Distributing your compute system to analyze data is not a fad, that is future,” stated Caserta. Corugedo agreed, saying that the term “data lake” may fade but the idea is here to stay.