By now, I'm sure most of us have spent quality time with generative AI systems such as ChatGPT and have been amazed at how convincingly they mimic human intelligence. Furthermore, it's just as easy to integrate generative AI into application code as it is to interact with these systems manually.
However, when it comes to adding generative AI capabilities to enterprise applications, we usually find that something is missing: the generative AI programs simply don't have the context to interact with an application's users or a company's customers. The models are trained on very large datasets, but those datasets don't usually include the fine-grained, up-to-date information about a company's products, customers, or other context needed to generate intelligent responses.
Retrieval Augmented Generation (RAG) solves this problem by integrating a search component with a generative model. RAG first retrieves relevant documents from a database or application API, then combines those documents into the prompt that is fed to the AI. The AI can then generate a response grounded in this input data.
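To make the flow concrete, here's a minimal sketch in Python. The retrieve_documents and generate_answer helpers are hypothetical stand-ins for whatever search backend and AI service you happen to use:

```python
# A minimal sketch of the RAG flow. retrieve_documents and generate_answer
# are hypothetical placeholders for your search backend and AI service.

def retrieve_documents(question: str, top_k: int = 3) -> list[str]:
    """Return the top_k stored documents most relevant to the question."""
    raise NotImplementedError  # e.g., a vector-index search (see below)

def generate_answer(prompt: str) -> str:
    """Send the prompt to a generative model and return its reply."""
    raise NotImplementedError  # e.g., a chat-completion API call

def answer_with_rag(question: str) -> str:
    docs = retrieve_documents(question)        # 1. retrieve
    context = "\n\n".join(docs)                # 2. augment
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return generate_answer(prompt)             # 3. generate
```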
This allows an AI like ChatGPT to answer questions such as "What are the available shoe sizes for your explorer hiking shoe?" or "What is the status of my recent order for a spelunking headlamp?"
RAG is a powerful but relatively straightforward process. First, we need a way to retrieve relevant data from a store using a natural language query. In most cases, we do that using a vector index.
Like all indexes, vector indexes are designed to locate items in a large set using a search term. However, while traditional indexes use exact matches on the search terms, vector indexes search for things that are semantically similar. In other words, they look for items that appear to "mean" something similar.
Vector indexes work best on textual or image data but can be applied to structured data as well. We convert textual data to vectors using an embedding model; almost all AI services offer one. The vectors are long arrays of numbers representing positions in a high-dimensional space. They look meaningless to humans, but a vector search can compare them to find "similar" items.
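As a sketch, here's what that conversion looks like with OpenAI's Python client; the model name is one of OpenAI's current embedding models, and other providers expose very similar calls:

```python
# Sketch: converting text to an embedding vector with OpenAI's embeddings
# API. Other providers' embedding endpoints look much the same.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.embeddings.create(
    model="text-embedding-3-small",  # one of OpenAI's embedding models
    input="Do we have a sales agreement with CompanyX?",
)
vector = response.data[0].embedding
print(len(vector))  # 1536 dimensions for this model
```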
Let's say we have a collection of PDF documents containing legal agreements and that we've created a vector index on them. When a user asks, "Do we have a sales agreement with CompanyX?", we convert that question to a vector using the embedding model. We then search the vector index to find the "most similar" documents in the database. These will include documents with exact matches on terms like "CompanyX," but also documents containing words similar to "sales agreement." For instance, the vector index will consider "purchase contract" to be semantically similar to "sales agreement" and so will return those contracts as well.
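Under the hood, "most similar" usually means something like cosine similarity between vectors. Here's a brute-force sketch; a real vector index (HNSW, IVF, and so on) avoids scanning every document, but the ranking idea is the same:

```python
# Sketch: brute-force similarity search over pre-computed document vectors.
# A real vector index avoids the full scan but ranks by the same measure.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar(question_vec, doc_vecs, docs, k=3):
    """Return the k documents whose vectors best match the question."""
    scored = [(cosine_similarity(question_vec, v), doc)
              for v, doc in zip(doc_vecs, docs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```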
Once we've retrieved the relevant data, we feed it into the context of the query sent to the AI. The model then processes the augmented text as it normally would and answers the question in context.
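Here's a sketch of that last step, again using OpenAI's chat API as an example provider; the model name and system message are illustrative choices:

```python
# Sketch: stuffing the retrieved documents into the prompt and asking
# the model to answer from them. Model and wording are illustrative.
from openai import OpenAI

client = OpenAI()

def answer_in_context(question: str, documents: list[str]) -> str:
    context = "\n\n---\n\n".join(documents)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context. "
                        "If the answer isn't in the context, say so."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```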
RAG is a great way to integrate generative AI with custom databases. And despite the complexity of the underlying technologies, the programming model is quite simple. Not all databases support vector indexes, but they can be found in the leading non-relational systems, including Redis, MongoDB, and Neo4j.
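As one example, a vector search in MongoDB Atlas is just another aggregation stage. The sketch below assumes a collection whose documents carry an embedding field covered by an Atlas Vector Search index; the index, field, database, and collection names are all illustrative:

```python
# Sketch: vector search in MongoDB Atlas via the $vectorSearch stage.
# "vector_index", "embedding", and the database/collection names are
# illustrative; they must match your own Atlas Vector Search setup.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://...")   # your connection string
agreements = client["legal"]["agreements"]

def vector_search(query_vector: list[float], k: int = 5):
    pipeline = [
        {"$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": query_vector,
            "numCandidates": 100,  # candidates considered before final ranking
            "limit": k,
        }},
        {"$project": {"text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]
    return list(agreements.aggregate(pipeline))
```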
The need for RAG may diminish as generative AI systems gain larger context windows. Right now, ChatGPT can only process about 32,000 tokens of information embedded in a question. Google's Gemini has already pushed that limit into the millions. In some cases, RAG will be unnecessary; we'll simply load all potentially relevant information into these massive context windows. But unless we want to push an entire database to an AI every time we want to ask a question about it, RAG will continue to have a role to play.