There’s a new generation of technologies reshaping data management as we know it. To explore the game-changing technologies and approaches having the most profound impact on today’s enterprises, DBTA asked industry experts and leaders to cite the developments they see as most beneficial. The following are eight areas driving the most change.
1-Data as a Service
Data as a service (DaaS), which provides access to all forms of data across the network through a standardized service layer, is seen as the most promising development by some industry leaders. “To effectively leverage data for competitive advantage, we believe a fundamental technology change is needed to the solutions that enterprises use to organize and access their data,” said Von Wright, chief strategy and marketing officer for K2View. A DaaS platform facilitates this, “rapidly integrating data across disparate sources and delivering it in real time to any end user application—a serious challenge for traditional data solutions.”
Before DaaS, most data management platforms stored information using a row-and-column approach that created complex, inflexible, and time-intensive transactions every time a user needed to access data, Wright added. Instead, DaaS supports models “built on specific business needs rather than a predefined technology or structure.”
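As a rough sketch of the service-layer idea, the Python example below exposes a single business-entity endpoint that assembles a customer view on demand from two disparate sources. It uses FastAPI only for illustration; the endpoint, source systems, and data are hypothetical and do not represent any vendor's product.

```python
from fastapi import FastAPI

app = FastAPI()

# Hypothetical in-memory stand-ins for two disparate source systems (a CRM and a billing system).
CRM = {"c-42": {"name": "Acme Corp", "segment": "enterprise"}}
BILLING = {"c-42": {"open_invoices": 3, "balance": 1250.00}}

@app.get("/customers/{customer_id}")
def customer_view(customer_id: str):
    """Assemble a business-entity view on request from multiple sources,
    exposed through one standardized service endpoint."""
    return {
        "customer_id": customer_id,
        **CRM.get(customer_id, {}),
        **BILLING.get(customer_id, {}),
    }

# Run with, for example: uvicorn daas_sketch:app   (assuming this file is saved as daas_sketch.py)
```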
2-Data Governance Tools
The ability to prepare data for advanced analytics “will have the biggest impact on an enterprise’s ability to compete on data,” said Ron Agresta, director of product management at SAS. “As data continues to be produced at an unprecedented rate and AI and analytics become more ingrained in businesses, it’s essential that organizations empower their data analysts, data scientists, and data engineers with technology that makes it easy to find, clean, and transform.” This calls for an environment “built for collaboration and governance,” he stated.
A well-designed data governance structure “should have an easy-to-use interface to encourage self-service data preparation while also having industry-proven data quality functions built in,” Agresta continued. The goal of such an effort “should be to empower the analytical community to access and use all data, in its original form, and validated for their analytical processes,” said Agresta, who noted that organizational culture often poses a challenge to this vision. “Many organizations view the data as a departmental asset rather than a corporate asset. Data preparation with a self-service component challenges that paradigm and thus moves an organization to the vision of data for all rather than data for the few.”
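As a minimal illustration of the “find, clean, and transform” step in self-service data preparation, the Python sketch below uses pandas on a small invented extract; the column names and values are assumptions for the example, not any vendor's functionality.

```python
import pandas as pd

# Invented raw extract with typical problems: missing keys, duplicate rows, numbers stored as text.
raw = pd.DataFrame({
    "customer_id": ["17", "17", "23", None],
    "signup_date": ["2017-03-01", "2017-03-01", "2017-03-15", "2017-04-02"],
    "revenue": ["1,200", "1,200", "850", "430"],
})

clean = (
    raw.dropna(subset=["customer_id"])   # drop records with no customer key
       .drop_duplicates()                # remove repeated rows
       .assign(
           signup_date=lambda d: pd.to_datetime(d["signup_date"]),               # real dates
           revenue=lambda d: d["revenue"].str.replace(",", "").astype(float),    # real numbers
       )
)

print(clean.dtypes)
print(clean)
```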
3-Real-Time Streaming
The march to the real-time enterprise is also bringing about a new generation of solutions geared to improving organizations’ abilities to sense and respond to opportunities or issues. Greg Benson, computer science professor at the University of San Francisco and chief scientist at SnapLogic, pointed to the rise of real-time streaming platforms such as Spark and Flink, used in conjunction with Kafka as a reliable distributed message queue. “The combination of these technologies enables reliable real-time analytics and machine learning applications.”
“Real-time analytics on streaming data is changing the way businesses compete,” said Colin Britton, chief strategy officer for Logtrust. “It’s also making it easy to democratize and liberate data access to more users, from data scientists to business users.” However, he added, “most companies are still struggling to get there because they’re using legacy databases or building bespoke systems by patching together the legacy systems they have with partial, limited solutions. Companies are reluctant to walk away from proprietary technologies where they’ve made heavy investments, so they wait too long, and they miss the wave. Scale and complexity are also barriers. It’s really hard to scale and get real-time results with much of the technology that’s out there.”
While stream analytics is still in the “experimental stages,” it’s proving to be a powerful approach to “monitor, analyze, and react to real-time events with a focus on fast detection and correction to assure quality,” said Abay Radhakrishnan, CTO architect at Sungard Availability Services.
The main challenge to making real-time data streaming and analytics a reality for organizations is a shortage of skills, Benson cautioned. “Spark and Flink are often deployed in a DevOps environment, which means finding both DevOps talent and also the programming skills for both Spark and Flink. They both require a new way of approaching problems.” At the same time, in the foreseeable future, such capabilities will also be widely available through cloud providers, helping to ease skill requirements, he added.
Radhakrishnan agreed that there’s currently a lack of expertise in data science, analytical, and real-time application programming skills that holds streaming analytics back. “Another issue is ensuring the security, sensitivity, and compliance of data sharing between all of the parties involved through the entire process of data collection, analysis, storage, and target visualization as it may involve third parties and cloud environments,” he added.
“Businesses need to react to the world in real time—be it for fraud detection, social sentiment analysis, or other uses,” said Benson. “They need to identify trends in a torrent of incoming data, then take action to improve the customer experience, reduce costs, or achieve some other benefit. Spark and Flink are incredibly powerful in this regard. They are fault-tolerant and designed to scale to very large amounts of data. Spark and Flink are similar: They both support batch computations on large-scale data and real-time streaming applications. Spark leans toward batch and interactive queries, while Flink was built from the ground up as a streaming first platform.”
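For a concrete flavor of the Kafka-plus-streaming-engine pattern Benson describes, here is a minimal PySpark Structured Streaming sketch. The broker address, topic name, and event schema are assumptions for illustration, and the job requires Spark's Kafka integration package on the classpath.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("streaming-analytics-sketch").getOrCreate()

# Assumed shape of the incoming events.
schema = StructType([
    StructField("user_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read the raw event stream from a Kafka topic (broker and topic are illustrative).
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "transactions")
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Maintain per-user spend over one-minute windows, tolerating two minutes of late data.
per_user = (
    events
    .withWatermark("event_time", "2 minutes")
    .groupBy(F.window("event_time", "1 minute"), "user_id")
    .agg(F.sum("amount").alias("spend"))
)

# Print running results; a production job would write to a store or another topic instead.
query = per_user.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```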
It won’t be long, however, until streaming analytics “will be more widely adopted as more companies unlock the power of and become dependent upon real-time data,” Britton predicted. “It will be an integral part of data operations, enabling automation and data-driven business processes.”
4-Artificial Intelligence, Machine Learning, and Deep Learning
Of course, nothing is shaking up the data landscape more than the rise of cognitive computing. “You cannot complete any discussion without the mention of machine learning and artificial intelligence,” said Dheeraj Remella, chief technologist at VoltDB. “While machine learning provides the basis for uncovering indicators and patterns from a swath of data harvested from months or years of collection, enterprises are able to differentiate themselves with what I would consider the first cut of artificial intelligence: automated decision making. Automating the decisions from known business rules provides improved levels of efficiency.”
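A minimal sketch of that “first cut” of automated decision making might look like the Python snippet below, where codified business rules take precedence and a model-supplied risk score is just one more input; the rules, thresholds, and field names are invented for illustration.

```python
APPROVED_COUNTRIES = {"US", "CA", "GB"}   # illustrative rule data

def decide(txn: dict, risk_score: float) -> str:
    """Automate a decision from known business rules, using a learned risk score as one input."""
    if txn["amount"] > 10_000:                    # hard limit set by the business
        return "manual_review"
    if txn["country"] not in APPROVED_COUNTRIES:  # compliance rule
        return "block"
    # The score comes from a model trained on historical data; the rule makes the final call.
    return "approve" if risk_score < 0.8 else "manual_review"

print(decide({"amount": 250.0, "country": "US"}, risk_score=0.12))   # -> approve
```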
Ben Loria, chief data scientist at O’Reilly, said deep learning is a “technique that’s largely responsible for the widespread adoption of AI.” The applications associated with deep learning are now woven into the platforms of leading data companies such as Google, Microsoft, and Amazon, which have all “introduced deep learning across their services and are replacing their existing machine learning systems with deep learning-based models, including recommenders, search ranking, and forecasting,” he said. “Lately, there’s been a big emphasis on generative adversarial networks (GANs) and recurrent neural networks (RNNs)—extensions of deep learning that are expanding the bounds of what we can do with AI. Another important trend is the expanding footprint of deep learning in data science products.”
Loria cautioned, however, that “our understanding of deep learning systems is still emerging—and remains a work in progress.” Skills issues also pose a problem for the proliferation of the technology. “The shallow AI talent pool will be a huge bottleneck in getting deep learning and other AI projects off the ground,” he warned. “LinkedIn data indicates that there are around 20,000 active AI developers, compared to the millions that we’ll need to make deep learning succeed in the industry.” Efforts to democratize AI and deep learning will mean wider adoption, with more emphasis on training developers and tech professionals in other disciplines, rather than hiring Ph.D.-level data scientists, said Loria. “Thus, we’ll see more practical applications of the technology.”
Machine learning and decision automation have already gained considerable ground among forward-thinking enterprises, Remella observed. “The maturity of the automation is improving with the incorporation of machine learning into the decision making,” he said. “While this particular aspect is still in its early stages in a maturity model, this is going to change quickly with enhanced decision making frameworks and platforms.”
However, Remella also pointed to a looming skills shortage that could delay or inhibit machine learning projects. “The scarcity of really good data scientists is one challenge and the business expectation that data science is going to be able to make the enterprise revolutionary instantly is another,” said Remella. “This will be leading to a lot of disillusionment.”
A more targeted form of machine learning, called in-database machine learning, may help “change the scale and speed through which these machine learning algorithms are modeled, trained, and deployed, removing common barriers and accelerating time to insight on predictive analytics projects,” said Ben Smith, senior product manager for Vertica at Micro Focus. “In-database machine learning technology frees data scientists from the volume constraints of traditional tools and enables them to discover patterns buried in ever-larger datasets, including data that resides in data lakes.” Smith predicted that soon, “most, if not all, organizations will be utilizing the full breadth and size of their data volumes to develop and deploy machine learning applications—no longer relying on down-sampled data and sluggish computation.”
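To make the principle concrete, the toy Python example below moves the model fitting into the SQL engine itself, using SQLite's user-defined aggregates so that individual rows never leave the database; it is a stand-in for the in-database idea only, not Vertica's actual interface.

```python
import sqlite3

class SimpleLinearRegression:
    """Toy aggregate that fits y = a + b*x inside the database engine,
    so only the fitted coefficients leave the SQL layer."""
    def __init__(self):
        self.n = self.sx = self.sy = self.sxx = self.sxy = 0.0

    def step(self, x, y):                 # called once per row by the engine
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def finalize(self):                   # called once when the aggregation completes
        b = (self.n * self.sxy - self.sx * self.sy) / (self.n * self.sxx - self.sx ** 2)
        a = (self.sy - b * self.sx) / self.n
        return f"y = {a:.3f} + {b:.3f}*x"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (ad_spend REAL, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1.0, 3.1), (2.0, 5.0), (3.0, 6.9), (4.0, 9.2)])
conn.create_aggregate("linreg", 2, SimpleLinearRegression)

# The "training" happens inside the query; no down-sampling or data export is needed.
print(conn.execute("SELECT linreg(ad_spend, revenue) FROM sales").fetchone()[0])
```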
5-Augmented Intelligence—The Other AI
The other AI, augmented intelligence, is also becoming a part of leading analytics platforms, and helps address the skills deficiencies associated with artificial intelligence development. “One of the major factors limiting digital transformation initiatives is workforce data literacy—the ability to more effectively read, work with, analyze, and argue with data,” stated James Fisher, senior vice president for strategic marketing for Qlik. “Analytics platforms that incorporate augmented intelligence can help close this gap, and will change the way enterprises can compete with data. Augmented intelligence combines the power of human intuition with machine intelligence and artificial intelligence, expanding the range of user insights based on intent. The more a user engages with data, the more the analytics platform will learn and suggest associations between data sources, generating charts and visualizations to reveal untapped opportunities for growth and productivity.”
6-Containers
Containers—in which applications, data, dependencies, and runtimes are housed within a portable environment—“are currently having the most positive impact on enterprises’ ability to compete on data,” observed Laurent Bride, CTO of Talend. “They’re in production everywhere. They enable enterprises to scale in the cloud much more easily and to innovate faster, by automating processes and deploying across clusters.” With the emergence of container orchestration and solutions such as Kubernetes, containerization will be highly strategic to optimizing storage, security, and networking.
There is still a learning and adoption curve with containers, as “abstract complexities come into play when it comes to understanding the state of infrastructure,” Bride added. “Additionally, there are slight differences among cloud providers, so container deployment across clouds is not as seamless as companies would like.”
7-Open Standards (Especially JSON)
The rise of standardization in recent years has enabled organizations to take advantage of a range of data technologies and database types to fit their purposes. In particular, JSON (JavaScript Object Notation), a lightweight data-interchange format, is human-readable as well as machine-friendly. According to Sachin Smotra, director of product management at Couchbase, in recent years, the standard “has emerged as the primary object model for enterprise applications providing the flexibility required for rapidly evolving applications.” In particular, Smotra said, the NoSQL movement has been able to “capitalize on the growing popularity of JSON, creating a new type of database that could store JSON data natively and provide schema flexibility.”
At this time, Smotra observed, “JSON object models are being widely adopted within enterprise—both for building new applications and modernizing existing applications.” The challenge that needs to be addressed with JSON is the fact that “traditional data management approaches are being disrupted, which requires a revolution of the core ecosystem that was built over 40 years ago for relational technology. The lack of a schema for JSON is both a strength and a weakness. The evolution of the application and associated data model requires operational discipline, a new approach to application management and processes to support this new way of thinking.”
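A small Python example makes that trade-off concrete: two versions of the same logical document coexist with no schema migration, but readers have to be written to tolerate the difference. The field names are invented for illustration.

```python
import json

# Two versions of the same logical document: the application added a field over time,
# and no schema migration was required to store either one.
order_v1 = json.loads('{"order_id": "A-100", "total": 42.50}')
order_v2 = json.loads('{"order_id": "A-200", "total": 19.99, "loyalty_tier": "gold"}')

# Schema flexibility is a strength (no migrations) and a weakness (fields may be absent),
# so readers need the operational discipline to handle both shapes.
for order in (order_v1, order_v2):
    tier = order.get("loyalty_tier", "standard")   # sensible default for older documents
    print(order["order_id"], order["total"], tier)
```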
8-Multi-Model Data Management
With so much data now streaming through enterprises, there’s a need to be able to view and leverage data from many perspectives. This is giving rise to a new generation of multi-model data management technologies. Such systems “enable organizations to view their data from a variety of perspectives across the same system—helping them harness this data for more strategic decision making around competitive threats, market opportunities, and customer services,” said Jeff Fried, director of product management for InterSystems. “Multi-model also addresses the wide variety and huge volume of data that is being generated and gives organizations the flexibility and agility to use all the datasets and data types required to compete in the business environment.”
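As a minimal sketch of the multi-model idea, the Python example below stores JSON documents in SQLite and reads the same rows both as whole documents and as relational columns. It assumes the bundled SQLite includes the JSON1 functions (true of recent Python builds), and the data is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, doc TEXT)")
conn.executemany(
    "INSERT INTO customers (doc) VALUES (?)",
    [('{"name": "Ada", "city": "London", "orders": [120, 80]}',),
     ('{"name": "Lin", "city": "Taipei", "orders": [45]}',)],
)

# Document perspective: hand whole JSON records to an application that wants objects.
docs = [row[0] for row in conn.execute("SELECT doc FROM customers")]

# Relational perspective on the same rows: project JSON fields into columns for analytics.
rows = conn.execute(
    "SELECT json_extract(doc, '$.city') AS city, "
    "       json_array_length(doc, '$.orders') AS order_count "
    "FROM customers"
).fetchall()

print(docs)
print(rows)   # [('London', 2), ('Taipei', 1)]
```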
While multi-model data management isn’t a new concept, there’s been a notable increase in adoption in recent years, Fried observed. “We’re now seeing a significant increase in adoption, as the technology moves out of the experimental phase into the mainstream market.” The challenge, he added, is getting around “established mindsets and old-school cultures. There’s a proliferation of models and single-model databases. Telling DBAs that they can look across models and have the same data expressed in different ways at high performance runs counter to the orientation of their favorite database tool.”
However, with the rise of cognitive computing and real-time analytics, data managers are recognizing the need for multi-model data. “Organizations that embrace polyglot persistence will start to question its performance and practicality,” Fried predicted. “This will push multi-model capabilities across the chasm into the early mainstream. The increase in information will, in turn, fuel the rise of machine learning and cognitive computing.”