At the Strata Data Conference in New York, Paxata, provider of the Adaptive Information big data prep platform, announced early availability of its Intelligent Ingest capability as part of its next major release. The new automated ingest capabilities aim to make it simpler for business consumers to rapidly incorporate data from any cloud or in any format and prepare it for business analysis.
The premise of Paxata has been to democratize data preparation, a process that historically relied on data quality and ETL tools that are useful for very technically skilled people "but not so great" for less technical users, said Nenshad Bardoliwalla, co-founder and chief product officer at Paxata, who discussed the new capabilities at the conference.
When Paxata was started in 2012, the idea was to simplify the entire process from raw data to visualization, he noted. “On the one hand, the world was moving to Hadoop, and data lakes and MongoDB, and other polyglot datastores, and on the other hand, end users were moving to Tableau and Qlik and other end user solutions, and there was nothing to bridge the gap in between.” When the platform was introduced at Strata 4 years ago, one of the key tenets was to use intuitive visual experiences, machine learning algorithms, and distributed computing to provide a simple environment for a less technical analyst to turn raw data into useful information, Bardoliwalla noted.
“What has happened over the last 4 years is that the big data paradigm has become practically mainstream and companies are pouring all sorts of frankly strange data into the data lake, such as XML, JSON, Avro and Parquet. There is a never-ending list of these and what we realized is that we make data easy to transform—and that has always been our value proposition—but just getting it in and working with it has become really hard. We wanted to take the intelligence that Paxata had developed as part of the data prep process—where in our system we can automatically join data on the fly, union data on the fly, standardize values on the fly—and automate the ingest process too,” said Bardoliwalla.
According to Bardoliwalla, automating the ingest process is the next step in the company’s effort to make life easier for business consumers: it lets them turn raw data into information and exploit the business value of the different data types in the data lake without having to know anything about the underlying storage formats.
“Why does the user have to know that the extension ‘.avro’ means an Avro file, or that a file may be .avro but also compressed using LZ4?” Bardoliwalla continued. “Those are words that technical people use, but people can’t get value from the data lake if they can’t ingest it into the tools they work with. So the Fall ‘17 release is about making it really easy for someone to point at any arbitrary dataset or file format in their data lake, database, Amazon S3, or BLOB storage, and actually have the system infer how to bring that in and make it consumable for the end user. What Intelligent Ingest allows you to do is to click on a file in a data lake about which you previously had no information, and learn that it is XML, compressed, and turn it into a view that looks like a table you would find in Excel.”
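The kind of inference Bardoliwalla describes, recognizing, say, an LZ4-compressed Avro file regardless of its extension, is commonly done by reading a file's leading "magic bytes" rather than trusting its name. The sketch below is a minimal illustration of that general technique, not Paxata's implementation; the function name and signature table are hypothetical.

```python
# Minimal sketch of content-based format detection via magic bytes.
# This illustrates the general approach, not Paxata's actual engine.

MAGIC_SIGNATURES = [
    (b"Obj\x01", "avro"),          # Avro object container file header
    (b"PAR1", "parquet"),          # Parquet files begin (and end) with PAR1
    (b"\x04\x22\x4d\x18", "lz4"),  # LZ4 frame format magic number
    (b"\x1f\x8b", "gzip"),         # gzip magic bytes
    (b"<?xml", "xml"),             # XML declaration
]

def sniff_format(header: bytes) -> str:
    """Guess a file's format from its first bytes, ignoring the extension."""
    for magic, name in MAGIC_SIGNATURES:
        if header.startswith(magic):
            return name
    # JSON has no magic number; fall back to checking the first token.
    stripped = header.lstrip()
    if stripped[:1] in (b"{", b"["):
        return "json"
    return "unknown"
```

A real ingest layer would go further: after detecting a compression wrapper such as LZ4 or gzip, it would decompress the stream and sniff again to find the inner format (e.g., Avro inside LZ4), then hand the data to the appropriate parser.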
To access an overview of Intelligent Ingest, go here.