Even the most ambitious data analytics initiatives tend to get buried by the 80/20 rule, with data analysts and scientists able to devote only 20% of their time to actual business analysis while the rest is spent simply finding, cleansing, and organizing data. This is unsustainable, because the pressure to deliver insights quickly keeps increasing. When time to answer is critical, “you can’t afford to spend hours cleaning up data, nor can you waste time worrying whether your data is good enough,” said Peter Bailis, Stanford University professor and CEO of Sisu.
The need to flip the 80/20 ratio is urgent. “Just 5 or 6 years ago, innovative companies were satisfied with one- or even multiple-day delays for insights from their data,” said Ben Newton, director of operations analytics at Sumo Logic. “That is no longer the case. Many companies have hours, or even minutes, to respond to user behavior and market trends. The companies that are winning are basing much of their competitive muscle on the ability to leverage their data effectively and quickly.”
Data teams spend “copious amounts of time finding, cleansing and organizing data,” agreed Thameem Khan, general manager of data catalog and preparation at Boomi. “This creates a number of problems that hamper business progress, especially as it relates to understanding where data is, what the data says, and if the data is actually available to be used.” As a result, business users may have to wait weeks for data teams to deliver answers.
At one organization serving energy markets, data analysts and scientists relied on a platform built from Apache Kafka, Kubernetes, and a Hadoop stack for long-term storage to process data for analytics and run machine learning models. “These were all complicated technologies requiring specialized and rare engineering talent,” said Andrew Stevenson, CTO at Lenses.io. The analysts’ screens were full of terminals running models, and if a desktop was rebooted, the result was a production incident, he explained. The people charged with maintaining these systems, the platform and data engineering teams, had to try to understand energy markets and machine learning while also rewriting the models and applications to make them deployable, noted Stevenson. Needless to say, he added, inordinate amounts of time and effort were spent “simply trying to access data, with governance, and deploy and run data applications.”
Tempered Optimism
Even the most optimistic business users are finding their enthusiasm tempered by the delays and complications that occur at the back end of their data infrastructures. “Business leaders are excited about leveraging analytics for decisioning inside of their organization, but they trigger a data landmine and report problems with accessing, preparing, cleansing, and managing data, ultimately stalling development of trustworthy and transparent analytical models,” Kim Kaluba, senior manager for data management at SAS, pointed out.