Understanding Retrieval Augmented Generation Methods

Mar 6, 2025

RAG from Scratch Course Notes

Introduction

  • Instructor: Lance Martin, Software Engineer at LangChain.
  • Focus: Implementing RAG (Retrieval Augmented Generation) from scratch.
  • Motivation: Most world data is private, while LLMs are trained on publicly available data. RAG allows integration of private data into LLMs.

Overview of RAG

  • RAG Definition: Combines retrieval of external data with generation from LLMs.
  • Steps of RAG:
    1. Indexing external data (creating a retrieval database).
    2. Retrieving relevant documents based on a query.
    3. Generating answers using the retrieved documents.
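The three steps above can be sketched end-to-end. This is a toy illustration, not the course's implementation: keyword overlap stands in for an embedding model, and the "generation" step just assembles a prompt where a real system would call an LLM.

```python
# Minimal sketch of the three RAG steps: index, retrieve, generate.
# Keyword overlap is a stand-in for real embeddings; the LLM call is stubbed.

def index(docs):
    # Step 1: build a simple retrieval "database" of tokenized documents.
    return [(doc, set(doc.lower().split())) for doc in docs]

def retrieve(db, query, k=1):
    # Step 2: rank documents by word overlap with the query, keep top-k.
    q = set(query.lower().split())
    ranked = sorted(db, key=lambda item: len(q & item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def generate(query, context):
    # Step 3: a real system would send this prompt to an LLM;
    # here we just stitch it together to show the data flow.
    return f"Answer '{query}' using: {' | '.join(context)}"

db = index(["RAG combines retrieval with generation",
            "LLMs are trained on public data"])
print(generate("what does RAG combine?", retrieve(db, "what does RAG combine")))
```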

Key Concepts

1. Context Window Size

  • Context windows have grown from roughly 4-8k tokens to a million or more.
  • This allows for feeding extensive external data into LLMs.

2. Indexing External Data

  • Methods to index data:
    • SQL databases, vector stores, etc.
    • Documents are split and indexed so relevant chunks can be retrieved by heuristics or embedding similarity.

3. Query Transformation

  • Process of modifying a user's query to improve retrieval success:
    • Query Rewriting: Rephrasing for clarity.
    • Decomposition: Breaking down a complex query into simpler sub-questions.
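Decomposition can be sketched as follows. In practice an LLM proposes the sub-questions; splitting on "and" here is purely a stand-in to show the input/output shape.

```python
# Sketch of query decomposition: a complex query becomes simpler sub-questions.
# A real system would prompt an LLM; splitting on "and" is a toy stand-in.

def decompose(query):
    parts = [p.strip() for p in query.split(" and ")]
    # Make each sub-part a standalone question.
    return [p if p.endswith("?") else p + "?" for p in parts]

print(decompose("What is RAG and how does indexing work?"))
```

Each sub-question would then be retrieved and answered independently (or sequentially, feeding earlier answers into later retrievals).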

4. Routing

  • Identifying the correct database for the query:
    • Logical routing (LLM-driven) or semantic routing (embedding-based).
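A minimal routing sketch, under the assumption that each datastore is described by a handful of keywords. A real logical router prompts an LLM to choose; keyword-overlap scoring is a stand-in here.

```python
# Sketch of routing: pick the datastore whose description best matches the query.
# ROUTES and its keyword sets are invented for illustration.

ROUTES = {
    "sql_db": {"revenue", "orders", "table", "rows"},
    "vector_store": {"docs", "paper", "notes", "article"},
}

def route(query):
    # Score each route by keyword overlap; a real system would use an
    # LLM (logical routing) or embedding similarity (semantic routing).
    words = set(query.lower().split())
    return max(ROUTES, key=lambda name: len(words & ROUTES[name]))

print(route("summarize the paper notes"))
```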

5. Query Construction

  • Converting natural language into domain-specific languages (DSL) for databases.
    • Examples: Text-to-SQL, Text-to-Cypher (for graph databases), Text-to-metadata filters (for vector stores).
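The text-to-metadata-filter case can be sketched as below. A real system has an LLM emit the structured filter from a schema; the regex and the `year`/`media_type` fields here are invented to show the target shape.

```python
# Sketch of query construction: turn natural language into a structured
# metadata filter for a vector store. The fields are hypothetical; a real
# system would have an LLM fill a schema.
import re

def build_filter(query):
    f = {}
    # Pull out a four-digit year, if present.
    m = re.search(r"\b(19|20)\d{2}\b", query)
    if m:
        f["year"] = int(m.group(0))
    # Toy media-type detection.
    if "video" in query.lower():
        f["media_type"] = "video"
    return f

print(build_filter("videos about RAG from 2024"))
```

The retriever then applies this filter alongside semantic search, so only documents matching the metadata are considered.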

Indexing Techniques

  • Various methods to enhance indexing:
    • Embedding Methods: Create fixed-length representations of documents.
    • Reranking Techniques: After retrieval, documents can be reranked for relevance.
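The two bullets above compose into a retrieve-then-rerank pipeline. A hedged sketch: a tiny hashing trick produces a fixed-length vector (real systems use learned embedding models), and a second pass reranks the candidates by exact query-term overlap as a stand-in for a cross-encoder or LLM reranker.

```python
# Sketch of retrieve-then-rerank: fixed-length hashed embeddings for the
# first pass, exact term overlap as a stand-in reranker for the second.
import math
import zlib

DIM = 16  # fixed length of every document representation

def embed(text):
    v = [0.0] * DIM
    for tok in text.lower().split():
        v[zlib.crc32(tok.encode()) % DIM] += 1.0  # deterministic hashing
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve_and_rerank(docs, query, k=2):
    # Pass 1: approximate retrieval by embedding similarity.
    qv = embed(query)
    candidates = sorted(docs, key=lambda d: cosine(qv, embed(d)), reverse=True)[:k]
    # Pass 2: rerank the small candidate set with a finer relevance signal.
    qtoks = set(query.lower().split())
    return sorted(candidates,
                  key=lambda d: len(qtoks & set(d.lower().split())),
                  reverse=True)
```

The design point: embeddings are cheap enough to scan the whole corpus, while the more precise (and more expensive) relevance check runs only on the top-k candidates.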

Generation Methods

  • Involves using retrieved documents to generate responses:
    • RAG with Feedback: Incorporates feedback into retrieval and generation processes.
  • Chain of Thought: The model works through intermediate reasoning steps, which can be evaluated before the final answer is generated.
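A feedback loop over retrieval and generation can be sketched as below. This is a simplified illustration, not the course's implementation: the grader and the query rewriter stand in for LLM calls, and `retrieve`/`rewrite` are passed in as plain functions.

```python
# Sketch of RAG with feedback: grade the retrieved context against the
# question, and retry with a rewritten query if it doesn't support an answer.

def grade(question, context):
    # Stand-in grader: context counts as relevant if it shares any word
    # with the question. A real system would ask an LLM to grade relevance.
    return bool(set(question.lower().split()) & set(context.lower().split()))

def answer_with_feedback(question, retrieve, rewrite, max_tries=2):
    q = question
    for _ in range(max_tries):
        ctx = retrieve(q)
        if grade(question, ctx):
            return f"Answer based on: {ctx}"
        q = rewrite(q)  # feedback step: transform the query and retry
    return "I don't know."
```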

Active RAG

  • LLMs decide when and where to retrieve based on previous outputs.
  • State Machines: Allows for more complex workflows with multiple decision points.
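Active RAG can be sketched as a small state machine in which each node decides the next node, making retrieval, grading, and generation explicit decision points (frameworks like LangGraph generalize this pattern). The document store, node functions, and rewrite rule below are all invented for illustration.

```python
# Sketch of active RAG as a state machine: each node returns the next
# node's name plus the updated state. All data here is toy/hypothetical.

DOCS = {"rag overview": "rag pairs retrieval with generation"}

def retrieve(state):
    # Decision point: fetch context for the current query.
    state["context"] = DOCS.get(state["query"], "")
    return "grade", state

def grade(state):
    # Decision point: is the context relevant enough to answer from?
    ok = bool(set(state["query"].split()) & set(state["context"].split()))
    return ("generate" if ok else "rewrite"), state

def rewrite(state):
    # Stand-in for an LLM-driven query rewrite.
    state["query"] = "rag overview"
    return "retrieve", state

def generate(state):
    state["answer"] = f"Based on: {state['context']}"
    return "done", state

STEPS = {"retrieve": retrieve, "grade": grade,
         "rewrite": rewrite, "generate": generate}

def run(state, start="retrieve", max_steps=10):
    node = start
    for _ in range(max_steps):
        node, state = STEPS[node](state)
        if node == "done":
            break
    return state
```

A failed retrieval routes through `rewrite` and back to `retrieve`, which is exactly the "decide when and where to retrieve" behavior described above.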

Recent Advancements in RAG

  • Multi-Representation Indexing: Indexing documents by their summaries for retrieval, while passing the full document to the LLM for generation.
  • Hierarchical Indexing (RAPTOR): Recursively clustering documents and summarizing the clusters, building a tree that supports retrieval at multiple levels of abstraction.
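The multi-representation idea can be sketched as below: search runs over a compact representation, but the full document is what gets returned for generation. Taking the first sentence is a stand-in for an LLM-written summary.

```python
# Sketch of multi-representation indexing: retrieve over summaries,
# return full documents. First sentence = toy stand-in for an LLM summary.

def summarize(doc):
    return doc.split(".")[0]  # stand-in summary

def build_index(docs):
    # Store (summary, full_document) pairs; only summaries are searched.
    return [(summarize(d), d) for d in docs]

def retrieve_full(index, query):
    q = set(query.lower().split())
    best = max(index, key=lambda pair: len(q & set(pair[0].lower().split())))
    return best[1]  # hand the FULL document to the LLM, not the summary
```

This keeps the searched representations short and focused while the generator still sees complete context. RAPTOR extends the idea by applying summarization recursively over clusters.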

Conclusion

  • RAG is not dead but is evolving with advancements in LLMs and retrieval techniques.
  • Emphasis on combining retrieval with reasoning, especially in long-context scenarios.
  • Future of RAG likely includes more sophisticated models and techniques, improving accuracy and efficiency.