Tamr Inc. has been issued a patent (US9,542,412) from the United States Patent and Trademark Office covering the principles underlying its enterprise-scale data unification platform.
The patent, titled “Method and System for Large Scale Data Curation,” describes a comprehensive approach for integrating a large number of data sources by normalizing, cleaning, integrating, and deduplicating them using machine learning techniques supplemented by human expertise.
Traditional approaches to data integration had outlived their usefulness, according to Mike Stonebraker, co-founder and CTO of Tamr. Stonebraker noted that Tamr’s goal was to build an end-to-end system for enterprise-scale data curation that leveraged modern machine learning techniques to reduce the time and cost of producing clean, unified datasets. He added that the patent confirms the uniqueness of Tamr’s invention.
In addition to Stonebraker, other Tamr employees named in the patent include Nik Bates-Haus, George Beskales, Dan Bruckner, Ihab Ilyas, and Alex Pagan.
The patent describes several features and advantages implemented in the company’s software, such as the techniques used to obtain training data for the machine learning algorithms; a unified methodology for linking attributes and database records in a holistic fashion; multiple methods for pruning the large space of candidate matches so that matching scales to high data volumes; and novel ways to generate highly relevant questions for experts across all stages of the data curation lifecycle.
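To illustrate what pruning the space of candidate matches means in general, here is a minimal sketch of blocking, a generic record-linkage technique. It is not the method claimed in the patent and not Tamr’s software. The function names, the sample records, and the choice of blocking key (the first three letters of the name) are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    """Hypothetical blocking key: first three letters of the lowercased name."""
    return record["name"].strip().lower()[:3]


def candidate_pairs(records):
    """Return the set of record-id pairs whose records share a blocking key."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec["id"])

    pairs = set()
    for ids in blocks.values():
        # Only records inside the same block are compared with each other.
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs


# Made-up sample records, for illustration only.
records = [
    {"id": 1, "name": "Acme Corp"},
    {"id": 2, "name": "ACME Corporation"},
    {"id": 3, "name": "Globex"},
    {"id": 4, "name": "Globex Inc."},
    {"id": 5, "name": "Initech"},
]

pairs = candidate_pairs(records)
all_pairs = len(records) * (len(records) - 1) // 2

print(f"{len(pairs)} candidate pairs instead of {all_pairs}: {sorted(pairs)}")
```

In this toy example only the pairs that share a key are passed on for detailed comparison, rather than every possible pair. A key this crude would miss matches whose names begin differently, which is one reason practical systems tend to combine several pruning methods.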
Other characteristics of Tamr’s unique data unification system covered by the patent include scalability through automation, data cleaning, non-programmer orientation, incremental data integration, and data curation.
For more information about Tamr, go to www.tamr.com.