Galileo is emerging from stealth with a machine learning (ML) data intelligence platform for unstructured data. The platform lets data scientists quickly inspect, discover, and fix critical ML data errors across the entire ML lifecycle. It is currently in private beta with Fortune 500 companies and startups across multiple industries.
“The motivation for Galileo came from our personal experiences at Apple, Google, and Uber AI and from conversations with hundreds of ML teams working with unstructured data where we noticed that, while they have a long list of model-focused MLOps tools to choose from, the biggest bottleneck and time sink for high quality ML is always around fixing the data they work with. This is critical, but prohibitively manual, ad-hoc and slow, leading to poor model predictions and avoidable model biases creeping into production for the business,” said Vikram Chatterji, co-founder and CEO of Galileo. “With unstructured data across the enterprise being generated at an unprecedented scale and now rapidly leveraged for ML, we are building Galileo with the goal of being the intelligent data bench for data scientists to systematically and quickly inspect, fix and track their ML data in one place.”
It is common for data scientists to use spreadsheets and Python scripts to inspect and fix their unstructured training data. According to the company, this "data detective work" consumes more than 50% of a data scientist's time. It is ad hoc, manual, and error-prone, and it leads to poor data transparency across the organization, causing avoidable mispredictions and biases in production models.
According to the vendor, Galileo takes a different approach to this problem. With just a few lines of code added while training a model, Galileo auto-logs the data, applies statistical algorithms developed by the team, and surfaces the model's failure points along with actions and integrations to fix them immediately, all within one platform. The company says this cuts the time needed to find critical errors in ML data across training and production models from weeks to minutes.
Galileo also acts as a collaborative system of record for data scientists' training runs. It shows how specific changes to data and model parameters affect overall performance, which the company describes as key for ML teams to be truly data-driven.
Half of the Galileo team is made up of researchers from Apple, Google, and Stanford AI who focus on advancing data-centric ML research, which is then built into the Galileo platform for any ML team to use.
The other half of the team focuses on building systems that perform very low-latency, in-memory computations on millions of data points while using minimal system resources. This combination lets Galileo customers get fast, actionable data insights throughout the ML workflow.
In addition to emerging from stealth, the company announced that it has raised $5.1 million in seed funding. The Factory led the round, with participation from Anthony Goldbloom (co-founder and CEO of Kaggle) and other angel investors. Company advisers include Amy Chang (board member at Disney and P&G) and Pete Warden (one of the creators of TensorFlow).
The company plans to use the funding to hire across all departments and to accelerate research and development, meeting industry demand for a purpose-built product that finds and fixes ML data blind spots across workflows built on unstructured data.
For more information about this news, visit www.rungalileo.io.