Oracle has introduced a new Big Data Preparation Cloud Service.
According to the company, the service is intended to reduce the time and costs associated with traditional data preparation methods.
Despite the increasing talk about the need for companies to become “data-driven,” and the perception that people who work with business data spend most of their time on analytics, the company contends that in reality many organizations devote far more time and effort to importing, profiling, cleansing, repairing, standardizing, and enriching their data.
The new service is built on Hadoop/Spark, enhanced with Natural Language Processing and a Reference Dataset Knowledge Service. According to Oracle, it provides an intuitive and interactive user experience, guiding users with a machine learning-driven recommendation engine so they can reduce the time needed to ingest and prepare new datasets for multiple downstream processes. In addition to helping simplify complex business operations, the new cloud service also aims to help users avoid error-prone setups and configurations.
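Oracle does not describe the service's APIs in this announcement, but the kind of hand-built profiling, cleansing, and enrichment work it says it automates might look roughly like the following PySpark sketch. The file paths, column names, and rules are illustrative assumptions, not anything from the product.

```python
# Illustrative only: a hand-written PySpark pipeline for the kind of profiling,
# cleansing, and enrichment work Oracle says the service automates.
# Dataset locations, column names, and rules are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("manual-data-prep").getOrCreate()

raw = spark.read.option("header", True).csv("s3://example-bucket/customers.csv")

# Profile: count null or blank values per column to find problem fields.
profile = raw.select(
    *[F.count(F.when(F.col(c).isNull() | (F.trim(F.col(c)) == ""), c)).alias(f"{c}_nulls")
      for c in raw.columns]
)
profile.show(truncate=False)

# Cleanse and standardize: trim whitespace, normalize case, repair phone format.
cleaned = (
    raw.withColumn("country", F.upper(F.trim(F.col("country"))))
       .withColumn("email", F.lower(F.trim(F.col("email"))))
       .withColumn("phone", F.regexp_replace(F.col("phone"), r"[^0-9+]", ""))
       .dropDuplicates(["customer_id"])
)

# Enrich: join against a reference dataset (a stand-in for what the article
# calls the Reference Dataset Knowledge Service).
countries = spark.read.option("header", True).csv("s3://example-bucket/iso_countries.csv")
enriched = cleaned.join(countries, on="country", how="left")

enriched.write.mode("overwrite").parquet("s3://example-bucket/customers_prepared/")
```

Each of these steps is the sort of manual, developer-written work the recommendation engine is meant to suggest or perform automatically.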
There are a few ways the Oracle Big Data Preparation Cloud Service is differentiated from other offerings, said Jeff Pollock, Oracle vice president of product management.
“Data governance is table stakes. In order to have a viable tool in the enterprise software space you need some governance for the base capabilities,” he observed. “What most tools will provide is some level of automation or a recommendation service to help non-developers with the mapping, enrichment, and validation that they want to put on the data while they are ingesting and processing it. That is part of the definition of what it takes to even have a tool in this category.”
Where the Oracle solution differs from others, Pollock said, is that it provides a stronger recommendation service to automate that processing. “And we are the only vendor in this category that has combined the machine learning infrastructure of the Apache Spark project with the Natural Language Processing capabilities of the Apache UIMA (Unstructured Information Management Architecture) project. The result is that we do the statistical matching that comes with machine learning, but we also apply what is called entity extraction, or data classification, that comes with Natural Language Processing, and that comes together to provide a better recommendation engine to help automate more of that process of ingesting and preparing the data. That is the big differentiator that we have got in this category.”
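Oracle has not published how its recommendation engine is implemented beyond the Spark-plus-UIMA description above, but the general idea of pairing statistical matching with entity-style classification can be shown in a small, self-contained Python sketch. This is not Oracle's implementation and does not use Apache UIMA; the patterns, threshold, and recommended actions are invented for illustration.

```python
# Illustrative only: a toy version of combining statistical matching with
# entity extraction to guess what a column contains and recommend a prep step.
import re
from collections import Counter

# "Entity extraction" stand-in: pattern classifiers for common value types.
ENTITY_PATTERNS = {
    "email":    re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone":    re.compile(r"^\+?[0-9\-\s()]{7,}$"),
    "us_zip":   re.compile(r"^\d{5}(-\d{4})?$"),
    "iso_date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}

def classify_column(values, threshold=0.8):
    """Return the most likely entity type for a column of sample values,
    based on the share of values each pattern matches (statistical matching)."""
    values = [v for v in values if v]          # ignore blanks
    if not values:
        return None
    scores = Counter()
    for v in values:
        for label, pattern in ENTITY_PATTERNS.items():
            if pattern.match(v.strip()):
                scores[label] += 1
    if not scores:
        return None
    label, hits = scores.most_common(1)[0]
    return label if hits / len(values) >= threshold else None

# A recommendation engine could then map the detected type to a prep action.
RECOMMENDED_ACTIONS = {
    "email":    "lowercase and validate domain",
    "phone":    "strip formatting, normalize to a standard format",
    "us_zip":   "enrich with city/state from a reference dataset",
    "iso_date": "cast to date type",
}

sample = ["alice@example.com", "BOB@Example.org", "", "carol@example.net"]
kind = classify_column(sample)
print(kind, "->", RECOMMENDED_ACTIONS.get(kind, "no recommendation"))
```

The sketch prints `email -> lowercase and validate domain` for the sample column, which mirrors the described workflow: classify the data first, then recommend how to prepare and enrich it.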
In addition, Pollock said, many tools in this space focus only on the analytics use cases with data preparation capabilities to feed a BI tool, as opposed to the operational use case of moving data from point A to point B in an application domain.
Oracle, he said, believes there is also a “huge untapped opportunity” in the operational data preparation space. “A lot of our use cases are around streamlining the operations and building a repeatable data pipeline of activities. We are aiming to help the operational developers with their data flows, so in the end we will see what used to be a collaboration between a business user and an IT developer flattened out, with an IT analyst role covering the full domain without having to have a hardcore developer, or with the self-service capabilities going directly to the line-of-business users without having to get IT involved at all.” In the end, the goal is to simplify the user experience around preparing and enriching data to get it ready for both analytics and operational use cases, said Pollock.
The new service became available at the end of 2015 as part of the Oracle PaaS offering.
More information is available from Oracle at https://cloud.oracle.com/en_US/bigdatapreparation