Large language models are trained on vast amounts of public text, but they don't know anything about your specific data — your company's documents, your customer records, your codebase, your policies. Ask a vanilla LLM "what's our refund policy?" and it has no idea; it might even hallucinate a plausible-sounding but wrong answer.
RAG (Retrieval-Augmented Generation) solves this. It's a technique where the agent retrieves relevant information from your data sources before generating a response, then uses that retrieved information to ground its answer. RAG is what makes agents useful for real business work: without it, agents are limited to the general knowledge in their public training data; with it, they can answer questions about your specific context.
This guide explains what RAG is, how it works, why it matters, and what to look for when evaluating RAG-based agent features. If you're new to agent concepts, start with our How AI Agents Work guide.
The problem RAG solves
LLMs have three properties that make them challenging for business use:
- They're trained on public data. They don't know anything about your private documents, internal tools, or customer data.
- Their training data has a cutoff. They don't know about events after their training completed.
- They hallucinate. When asked about something they don't know, they often generate plausible-sounding but incorrect answers rather than admitting ignorance.
RAG addresses all three problems by giving the model access to your specific, current data at inference time. Instead of relying on what the model learned during training, RAG retrieves the relevant facts from your data sources and includes them in the model's context. The model then generates a response grounded in those facts. This reduces hallucination rather than eliminating it, since the model can still misread or go beyond the retrieved text.
How RAG works, step by step
RAG involves four main steps. The first (indexing) happens ahead of time; the other three run for each user query:
Step 1: Document ingestion and indexing
Before RAG can work, your documents need to be indexed. This typically involves: (1) splitting documents into chunks (paragraphs, sections, or fixed-size chunks), (2) generating an embedding (a numerical vector that represents the chunk's meaning) for each chunk, (3) storing each embedding in a vector database together with the chunk's text and a reference to its source document. This step happens once (or whenever your documents change), not for each query.
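To make the indexing step concrete, here is a minimal Python sketch under loud assumptions: `embed` is a toy hashed bag-of-words function standing in for a real embedding model, and a plain list of records stands in for a vector database. The names `chunk_words`, `embed`, and `build_index` are hypothetical, and the chunk sizes are illustrative, not recommendations.

```python
import hashlib
import math
import re

DIM = 64  # toy size; real embedding models produce hundreds to thousands of dimensions


def chunk_words(text, size=50, overlap=10):
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once the remaining words are already covered by the previous chunk's overlap.
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text):
    """Toy embedding (assumption): hash each word into one of DIM buckets, then L2-normalise."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9']+", text.lower()):
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(documents, size=50, overlap=10):
    """Chunk and embed each document; each record stands in for one row in a vector database."""
    index = []
    for doc_id, text in documents.items():
        for n, chunk in enumerate(chunk_words(text, size, overlap)):
            # Keep the text and a source reference next to the vector so answers can cite it later.
            index.append({"source": f"{doc_id}#chunk{n}", "text": chunk, "vector": embed(chunk)})
    return index


docs = {"refund-policy.txt": "Customers may request a refund within 30 days of purchase."}
index = build_index(docs)
print(len(index), index[0]["source"])  # prints: 1 refund-policy.txt#chunk0
```

The overlap between neighbouring chunks reduces the chance that a fact split across a chunk boundary is lost from both chunks, at the cost of storing some text twice.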
Step 2: Query embedding
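When a user asks a question, the question is converted into an embedding using the same embedding model that was used for document indexing. This embedding represents the semantic meaning of the question. Using the same model matters because vectors produced by different embedding models live in different spaces and can't be meaningfully compared.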
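A sketch of the query side, with the same caveats: `ToyEmbedder` is a hypothetical stand-in for an embedding model, and recording the model name in `index_meta` is one simple way (an assumption, not a standard) to make sure queries land in the same vector space as the indexed chunks.

```python
import hashlib
import math
import re


class ToyEmbedder:
    """Hypothetical stand-in for an embedding model (hashed bag of words, L2-normalised).

    Mixing `name` into the hash means two "models" place the same word in different
    dimensions, mimicking how vectors from different real models are not comparable.
    """

    def __init__(self, name, dim=64):
        self.name = name
        self.dim = dim

    def embed(self, text):
        vec = [0.0] * self.dim
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            digest = hashlib.md5(f"{self.name}:{word}".encode()).hexdigest()
            vec[int(digest, 16) % self.dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]


def embed_query(question, embedder, index_meta):
    """Embed the question, refusing to use a different model from the one that built the index."""
    if (embedder.name, embedder.dim) != (index_meta["model"], index_meta["dim"]):
        raise ValueError("embed queries with the same model that was used to build the index")
    return embedder.embed(question)


model = ToyEmbedder("toy-embed-v1")
index_meta = {"model": model.name, "dim": model.dim}  # stored alongside the index at build time
query_vec = embed_query("What's our refund policy?", model, index_meta)
print(len(query_vec))  # prints 64
```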
Step 3: Retrieval
The query embedding is compared against the stored chunk embeddings, usually with a similarity measure such as cosine similarity, and the most similar chunks (typically 3-10) are retrieved. Large vector databases usually use approximate nearest-neighbor search rather than comparing against every stored vector. "Similar" here means semantically similar: the chunks most likely to contain information relevant to answering the question.
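The comparison can be sketched as exact (brute-force) cosine search over an in-memory list. This is a simplification: the record layout (`source` and `vector` keys) is hypothetical, the 3-dimensional vectors are hand-picked for readability, and production vector databases typically replace the full scan with an approximate nearest-neighbor index.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the vectors' lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, index, k=5):
    """Score every stored chunk and return the k best as (score, record) pairs, best first."""
    scored = ((cosine(query_vec, rec["vector"]), rec) for rec in index)
    # key= compares scores only, so records themselves never need to be comparable.
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


index = [
    {"source": "refunds.md", "vector": [0.9, 0.1, 0.0]},
    {"source": "shipping.md", "vector": [0.1, 0.9, 0.0]},
    {"source": "careers.md", "vector": [0.0, 0.1, 0.9]},
]
results = retrieve([1.0, 0.2, 0.0], index, k=2)
print([rec["source"] for _, rec in results])  # prints ['refunds.md', 'shipping.md']
```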
Step 4: Generation
The retrieved chunks are inserted into the LLM's prompt along with the user's question. The model generates a response that's grounded in the retrieved text, ideally with citations back to the source material. The response is sent back to the user.
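Here is one way the generation prompt might be assembled, numbering each chunk so the model can cite it. The instruction wording and the `source`/`text` keys are assumptions, and the resulting string would be sent to whichever LLM API your stack uses (not shown).

```python
def build_prompt(question, chunks):
    """Build a grounded prompt; `chunks` are dicts with hypothetical 'source' and 'text' keys."""
    # Number each chunk so the model can cite it as [1], [2], ...
    context = "\n\n".join(
        f"[{i}] (source: {chunk['source']})\n{chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the numbered sources below. "
        "Cite the sources you use, e.g. [1]. "
        "If the sources do not contain the answer, say that you don't know.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


chunks = [
    {"source": "refund-policy.txt", "text": "Customers may request a refund within 30 days of purchase."},
]
print(build_prompt("What's our refund policy?", chunks))
```

Telling the model to admit when the sources lack an answer targets the hallucination problem described earlier; it helps, but models don't always comply.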
Why RAG matters for agents
RAG is what makes agents useful for business work. Without RAG, an agent can only answer questions about public information — useless for "what's our refund policy?" or "who handled the Acme account last year?" With RAG, the same agent can answer those questions by retrieving the relevant information from your documents, emails, and CRM.
Three specific capabilities RAG enables:
- Domain-specific answers. Agents can answer questions about your company, your products, your customers, your policies — anything in your data.
- Current information. Because RAG retrieves from your live data sources, agents can answer questions about current state (current inventory, current deals, current customer status) rather than relying on stale training data.
- Citable responses. Good RAG implementations include citations back to the source documents, so users can verify the agent's claims. This is essential for high-stakes use cases like legal or medical research.
RAG vs fine-tuning
RAG and fine-tuning are two different ways to give a model access to your data, and they're often confused. The key difference:
- RAG retrieves relevant documents at inference time and includes them in the model's context. The model itself isn't changed. RAG is great for factual questions, document search, and any use case where the underlying data changes frequently.
- Fine-tuning trains the model on your data, modifying its weights. Fine-tuning is great for style, tone, and domain-specific patterns, but it's expensive, slow to update, and prone to overfitting on small datasets.
For most business use cases, RAG is the right choice. It's faster to set up, easier to update (just update your documents — no retraining needed), and more transparent (you can see what documents were retrieved). Fine-tuning is worth considering for specific cases like matching brand voice or domain-specific language, but it's rarely the right starting point.
Limitations of RAG
RAG is powerful but not perfect. Important limitations to understand:
- Retrieval quality matters more than model quality. If the right documents aren't retrieved, the model can't use them. RAG systems are only as good as their retrieval step.
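- Chunking affects quality. How documents are split into chunks affects what's retrieved. Too small and you lose context; too large and you retrieve irrelevant information along with the relevant part. There is no universally correct chunk size, so chunking strategy usually has to be tuned to your documents and the questions users ask.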
- Models still hallucinate. Even with retrieved context, models sometimes generate claims that aren't supported by the documents. Always verify critical claims.
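- RAG doesn't work equally well for all question types. Questions requiring synthesis across many documents (e.g., "summarize our top 10 deals this quarter") are harder for RAG than specific factual questions, because retrieval passes only a handful of chunks to the model and much of the relevant material may never reach it.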
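Because retrieval is the bottleneck, it is worth measuring on its own. A common metric is hit rate at k (sometimes called recall@k): the share of test questions for which at least one known-relevant source appears in the top k results. This sketch assumes you hand-label a small evaluation set; `retrieve_fn` is a hypothetical function returning ranked source IDs, faked here with canned rankings.

```python
def hit_rate_at_k(eval_set, retrieve_fn, k=5):
    """Share of questions where at least one labelled-relevant source is in the top k results.

    eval_set: list of (question, set_of_relevant_source_ids) pairs labelled by hand.
    retrieve_fn: hypothetical function (question, k) -> ranked list of source ids.
    """
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    hits = sum(1 for question, relevant in eval_set if relevant & set(retrieve_fn(question, k)))
    return hits / len(eval_set)


# Toy retriever with canned rankings, standing in for a real search call.
canned = {
    "How long is the refund window?": ["refund-policy.txt", "faq.md", "shipping.md"],
    "Who handled the Acme account last year?": ["crm/globex", "crm/initech", "crm/acme"],
}
eval_set = [
    ("How long is the refund window?", {"refund-policy.txt"}),
    ("Who handled the Acme account last year?", {"crm/acme"}),
]
print(hit_rate_at_k(eval_set, lambda q, k: canned[q][:k], k=2))  # prints 0.5
print(hit_rate_at_k(eval_set, lambda q, k: canned[q][:k], k=3))  # prints 1.0
```

The second question only succeeds at k=3, which is exactly the kind of retrieval gap this metric surfaces before it shows up as a bad answer.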
RAG in practice: what to look for
If you're evaluating agent platforms that include RAG features (often marketed as "chat with your data" or "knowledge base"), look for these capabilities:
- Citations. Does the agent cite its sources? Can you click through to the original document? Citations are essential for verification.
- Multiple data sources. Can the agent retrieve from multiple sources (documents, CRM, email, codebase) or just one? Most business questions span multiple sources.
- Easy updates. How do you add new documents? Is it automatic (synced from a folder) or manual (upload each document)? Easy updates are essential for keeping the knowledge base current.
- Access control. Does the agent respect your existing permissions? If user A doesn't have access to document X, the agent shouldn't retrieve it for user A. This is critical for enterprise use.
- Transparency. Can you see what documents were retrieved for a given query? This helps you debug bad answers and understand the agent's reasoning.
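Access control and transparency can both be handled at retrieval time. In this sketch, each chunk carries a hypothetical `allowed_groups` field copied from the source system's permissions during indexing. Chunks the user cannot read are filtered out before ranking (filtering after taking the top k could leave fewer than k results), and the function returns source names with scores so you can inspect what was retrieved. Vectors are assumed to be unit length, so a dot product equals cosine similarity.

```python
def dot(a, b):
    """Dot product; equals cosine similarity when both vectors have unit length."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_for_user(query_vec, index, user_groups, k=5):
    """Return (source, score) pairs for the top-k chunks this user is allowed to read."""
    # Permission filter first, so forbidden chunks can never reach the prompt.
    visible = [rec for rec in index if rec["allowed_groups"] & user_groups]
    scored = [(rec["source"], dot(query_vec, rec["vector"])) for rec in visible]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


index = [
    {"source": "salaries.xlsx", "allowed_groups": {"hr"}, "vector": [1.0, 0.0]},
    {"source": "handbook.pdf", "allowed_groups": {"all-staff"}, "vector": [0.6, 0.8]},
]
print(retrieve_for_user([1.0, 0.0], index, {"all-staff"}))  # prints [('handbook.pdf', 0.6)]
```

The salaries file is the closest match to the query, but a user outside the `hr` group never sees it; the returned list doubles as a log of what the agent was shown.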
Frequently asked questions
Does RAG eliminate hallucinations?
No, but it significantly reduces them for factual questions. With RAG, the model is more likely to ground its answers in retrieved documents rather than generating plausible-sounding but false claims. However, models can still misinterpret retrieved documents or generate claims that go beyond what the documents actually say. Always verify critical claims.
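One way to support that verification is an automated first pass. The sketch below is a deliberately crude lexical heuristic (an assumption, not a standard method): it flags answer sentences whose longer words mostly don't appear in the retrieved chunks. It will miss paraphrased or subtly wrong claims, so it supplements human review rather than replacing it.

```python
import re


def flag_unsupported(answer, chunks, threshold=0.5):
    """Return answer sentences whose words (longer than 3 letters) mostly don't appear in the chunks."""
    source_words = set(re.findall(r"[a-z0-9]+", " ".join(chunks).lower()))
    flagged = []
    # Split the answer into sentences at ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        # Skip short words as a rough way to ignore stopwords like "a", "of", "may".
        words = [w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if len(w) > 3]
        if words and sum(w in source_words for w in words) / len(words) < threshold:
            flagged.append(sentence)
    return flagged


chunks = ["Customers may request a refund within 30 days of purchase."]
answer = "Customers may request a refund within 30 days. Refunds include free lifetime upgrades."
print(flag_unsupported(answer, chunks))  # prints ['Refunds include free lifetime upgrades.']
```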
What data sources can RAG use?
Almost any text data: PDFs, Word documents, web pages, emails, CRM records, code, databases (via text export), spreadsheets, support tickets, chat logs. The main requirement is that the data can be converted to text and chunked into retrievable units. Images and audio require additional processing (OCR, transcription) before RAG can work.
Is RAG expensive to run?
It depends on scale. For small knowledge bases (under 1,000 documents), RAG is cheap — vector database costs are minimal and the additional LLM tokens for retrieved context are small. For large knowledge bases (100,000+ documents), vector database costs and retrieval latency become significant. Most agent platforms handle RAG infrastructure for you, so you pay their subscription fee rather than building it yourself.
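A back-of-envelope calculation shows why scale changes the picture. The figures are illustrative assumptions, not benchmarks: about 10 chunks per document, 1,536-dimensional float32 embeddings (4 bytes per value), and 5 retrieved chunks of roughly 300 tokens each per query. Index overhead and metadata are ignored, so real storage will be higher.

```python
def rag_sizing(num_documents, chunks_per_doc=10, dim=1536, bytes_per_value=4,
               chunks_per_query=5, tokens_per_chunk=300):
    """Estimate raw vector storage (GB) and extra prompt tokens per query; all defaults are assumptions."""
    num_chunks = num_documents * chunks_per_doc
    storage_gb = num_chunks * dim * bytes_per_value / 1e9
    extra_tokens_per_query = chunks_per_query * tokens_per_chunk
    return storage_gb, extra_tokens_per_query


for n in (1_000, 100_000):
    gb, tokens = rag_sizing(n)
    print(f"{n:>7,} docs: ~{gb:.2f} GB of raw vectors, +{tokens:,} prompt tokens per query")
```

Under these assumptions, 1,000 documents need about 0.06 GB of raw vectors and 100,000 documents about 6.14 GB, while the extra prompt cost stays at 1,500 tokens per query in both cases: storage and search grow with the knowledge base, but per-query token cost depends on how many chunks you retrieve.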
Can I build RAG myself or should I use a platform?
If you have technical resources and specific requirements, building RAG yourself gives you more control. If you want to get started quickly or don't have a development team, use an agent platform that includes RAG (Lindy, Relevance, Copilot Studio, Claude Projects). Most businesses start with a platform and only build custom RAG when they hit platform limitations.
The bottom line
RAG is the technology that makes AI agents useful for real business work. By retrieving relevant information from your data sources before generating responses, RAG lets agents answer questions about your specific context rather than relying only on general knowledge from their training data. Without RAG, agents are limited to general-purpose chatbot capabilities; with RAG, they become genuine knowledge workers that can access and reason about your company's information.
For most users, RAG will be invisible — it's the technology behind "chat with your data" features in agent platforms. But understanding that it exists, how it works, and what its limitations are helps you set realistic expectations and evaluate agent platforms more critically. Choose platforms with strong RAG capabilities, and your agents will be dramatically more useful for your specific work.
Want more agent concepts explained?
Our glossary covers 35+ AI agent terms in plain English.