
Iterative Announces Data Chain, the Open Source, AI-Based Tool for Perfecting Unstructured Data

Iterative, the company dedicated to streamlining the workflow of AI engineers and creator of widely used open source projects in MLOps, is announcing the upcoming release of Data Chain, a new open source tool designed to help organizations derive greater value from unstructured data. By processing and evaluating unstructured data, Data Chain gives enterprises open source data curation and pre-processing capabilities they need to implement AI successfully.

The push toward AI, especially in its generative form, continues to highlight the data inefficiencies that persist at many organizations. According to Iterative, processing unstructured data at scale is a key inhibitor of AI success, and it reveals a broader incompatibility between today’s structured data technologies and the latest Python-based AI workloads.

Without the ability to effectively process and curate unstructured data—such as videos, images, and PDFs—any generative AI (GenAI) implementation is doomed from the start.

“People are solving all these kinds of problems themselves using Python, [which is] super inefficient,” explained Dmitry Petrov, CEO of Iterative. “They're wasting tons of time [trying] to answer this question [of retrieving unstructured data], [and] your ML engineers will spend days [trying] to solve this problem. That's not the best use of their time.”

Acknowledging this widespread challenge among DVC users, Iterative designed Data Chain to democratize the use of AI models for evaluating and processing unstructured data in an easy-to-manage format. Using AI to perfect AI (or, rather, the data serving it) alleviates the pain of complex workarounds for developers trying to extract value from their unstructured data.

“Any good toolset gets you to the next level,” said Petrov. “What ‘next level’ here means is that people should be able to use AI to curate their data for AI.”

Offering AI-based analytical capabilities such as “large language models (LLMs) judging LLMs” and multimodal GenAI evaluations, Data Chain brings intelligent data curation and pre-processing to the open source community. Additionally, Data Chain can store and structure Python object responses with the latest model schemas, including those utilized by leading LLM and AI foundational model providers, according to Iterative.

“The biggest challenge in adopting artificial intelligence in the enterprise today is the lack of practices and tools for data curation and generative AI evaluation that can ensure the quality of results,” said Petrov. “As the next step, we need AI models that can evaluate and improve AI models. So far this has only happened at the industry forefront; take a look at DeepMind's AlphaGo training against itself, or OpenAI’s DALL-E 3 curating its own dataset. Our goal is to change this.”

To learn more about Data Chain, please visit
