With the influx of big data solutions and technologies comes a bevy of new problems, according to Data Summit 2015 panelists Miles Kehoe, search evangelist at Avalon Consulting, and Anne Buff, business solutions manager for SAS best practices at the SAS Institute.
Kehoe and Buff opened the second day of Data Summit with a keynote discussion focusing on resolving data conundrums.
The issues discussed included data quality, leveraging Hadoop, using data lakes, valuing the cloud, and measuring data.
According to Buff, some of the information most valuable to the business is found in “dirty” data, but as organizations move to a more operational environment, data quality becomes critical.
“Whether you are using Spark or Hive, without the data quality, everything you put in big data is just a big garbage can,” Kehoe said.
Hadoop is intended more for data scientists than for the general power user, Kehoe said, even as the data storage method gains popularity in the big data space.
“I see Hadoop in the entire big data infrastructure, it’s not a solution, it’s a bunch of programs,” Kehoe said.
While Buff agreed and said it’s creating a skills gap, she added that the technology is still very valuable.
“To abandon it completely would be a shortfall because I think it’s giving us a really good premise to where not only within organizations can we find some consistency, but this is also a building block to be able to start seeing the sharing of data across organizations to form a lot of different capabilities,” Buff said.
A quick survey of attendees in the room during the panel revealed that half used Hadoop in their organizations. Not nearly as many used data lakes, however.
The use of data lakes gives organizations flexibility in storing and housing data, but there are pitfalls, so enterprises should be careful, according to Buff.
“From an analytics standpoint it’s a great playground or sandbox because it’s this utopia of putting our data in one place and make it naked, make it raw so we could do whatever we want with it,” Buff said. “But, oh my god that’s the biggest risk you could imagine.”
A content management system is a better way to go, Kehoe said. While using data lakes is a mixed bag, moving on to the cloud is promising, Buff said, as it allows organizations to begin a culture shift to focus on future problems.
When it comes to measuring data, asking the right questions is crucial, Kehoe observed.
“When you are using data for answering questions and metrics and measures…there are two different considerations,” Buff said. “When talking about metrics and measures you should be talking about data as a value for business process output versus if we are looking for new information now we use data as a business process input.”
Lastly, Buff and Kehoe predicted which technologies and issues would dominate in the next five years.
As more and more organizations struggle to define, use, and secure their data, Buff said there would be a rise in efforts to work out the ethics behind using certain data, with legal and marketing taking a bigger role.
“We are going to have to be very transparent in how we are using data without giving away our ‘secret sauce,’ ” Buff said.
This shift may lead to co-branding and the sharing of information across companies.
“Companies that already have a customer set can start to understand how these relationships are across brands and why,” Buff said.
Kehoe agreed and said security will remain incredibly important, as will the move by enterprises toward NoSQL, improved Hadoop, and more.