RAG, or Retrieval-Augmented Generation, is a technique in artificial intelligence that enhances a language model's responses by supplying it with relevant information retrieved from an external knowledge source at the time of each query. Rather than relying solely on what the model learned during training, RAG allows the model to consult current, domain-specific, or private data before generating an answer.
The core mechanism works in two stages. When a user submits a query, a retrieval component searches a designated data source (often a vector database or document store) and identifies the passages or records most relevant to that input. Those retrieved pieces of text are then passed to the language model alongside the original query, giving the model grounded context from which to compose its response.
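The two stages can be sketched in a few lines of Python. This is a deliberately minimal illustration, not a production implementation: retrieval here is naive keyword overlap (real systems typically use vector search), and the documents, query, and function names are all hypothetical.

```python
def retrieve(query, documents, top_k=2):
    """Stage 1: rank documents by word overlap with the query."""
    query_words = set(query.lower().split())
    scored = [
        (len(query_words & set(doc.lower().split())), doc)
        for doc in documents
    ]
    # Sort by overlap score, highest first; keep only matching docs.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(query, context):
    """Stage 2: pass retrieved passages to the model alongside the query."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


# Illustrative document store.
documents = [
    "The refund policy allows returns within 30 days of purchase.",
    "Shipping is free for orders over 50 dollars.",
    "Support is available by email from 9am to 5pm.",
]

query = "What is the refund policy?"
context = retrieve(query, documents)
prompt = build_prompt(query, context)
print(prompt)
```

The resulting prompt, with the retrieved passages prepended, is what would be sent to the language model; the model never needs the full document store, only the context surfaced for this query.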
This architecture addresses one of the most significant limitations of large language models: their knowledge is frozen at the point of training. A model trained several months ago has no awareness of events, documents, or data that emerged after that cutoff. RAG sidesteps this problem by connecting the model to a living knowledge base that can be updated independently, without retraining the model itself.
In practice, organizations use RAG to build AI assistants that can reason over internal documentation, customer support archives, legal corpora, product catalogs, or any structured or unstructured text repository. The model does not memorize this content. Instead, it reads the retrieved context on demand and formulates a response grounded in that material.
RAG is commonly contrasted with fine-tuning, the other common strategy for specializing a language model. Fine-tuning bakes knowledge directly into the model's weights through additional training, whereas RAG keeps the model and the knowledge source separate. This separation makes RAG faster to update, cheaper to maintain, and easier to audit, since the source documents can be inspected, corrected, or replaced without touching the model itself.
RAG is not a single product or framework but a design pattern. It can be implemented with a variety of retrieval engines, embedding models, and language models, and the quality of the output depends heavily on how well the retrieval step surfaces genuinely relevant context.
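Since output quality hinges on the retrieval step, it is worth seeing what embedding-based retrieval actually computes: cosine similarity between a query vector and each passage vector. The sketch below uses tiny hand-made vectors as stand-ins for real embedding-model output; the vectors and labels are purely illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings" of a query and two passages.
# A real embedding model would produce vectors with hundreds of dimensions.
query_vec = [0.9, 0.1, 0.0]
passage_vecs = {
    "refund policy passage": [0.8, 0.2, 0.1],
    "shipping rates passage": [0.1, 0.9, 0.3],
}

# Rank passages by similarity to the query, most similar first.
ranked = sorted(
    passage_vecs,
    key=lambda name: cosine_similarity(query_vec, passage_vecs[name]),
    reverse=True,
)
print(ranked[0])
```

Swapping in a different embedding model or retrieval engine changes only how these vectors are produced and searched; the overall pattern, retrieve then generate, stays the same.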