Understanding Retrieval Augmented Generation Methods

Mar 6, 2025

RAG from Scratch Course Notes

Introduction

  • Instructor: Lance Martin, Software Engineer at LangChain.
  • Focus: Implementing RAG (Retrieval Augmented Generation) from scratch.
  • Motivation: Most world data is private, while LLMs are trained on publicly available data. RAG allows integration of private data into LLMs.

Overview of RAG

  • RAG Definition: Combines retrieval of external data with generation from LLMs.
  • Steps of RAG:
    1. Indexing external data (creating a retrieval database).
    2. Retrieving relevant documents based on a query.
    3. Generating answers using the retrieved documents.
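The three steps above can be sketched end-to-end. This is a toy illustration, not the course's implementation: keyword overlap stands in for an embedding model, and the "generation" step just assembles a prompt where a real system would call an LLM.

```python
# Minimal sketch of the three RAG steps: index, retrieve, generate.
# Keyword overlap is a stand-in for real embeddings; the LLM call is stubbed.

def index(docs):
    # Step 1: build a simple retrieval "database" of tokenized documents.
    return [(doc, set(doc.lower().split())) for doc in docs]

def retrieve(db, query, k=1):
    # Step 2: rank documents by word overlap with the query, keep top-k.
    q = set(query.lower().split())
    ranked = sorted(db, key=lambda item: len(q & item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def generate(query, context):
    # Step 3: a real system would send this prompt to an LLM;
    # here we just stitch it together to show the data flow.
    return f"Answer '{query}' using: {' | '.join(context)}"

db = index(["RAG combines retrieval with generation",
            "LLMs are trained on public data"])
print(generate("what does RAG combine?", retrieve(db, "what does RAG combine")))
```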

Key Concepts

1. Context Window Size

  • Context windows have grown from roughly 4-8k tokens to a million or more.
  • This allows for feeding extensive external data into LLMs.

2. Indexing External Data

  • Methods to index data:
    • SQL databases, vector stores, etc.
    • Documents are split and indexed so relevant chunks can be retrieved by heuristics or embedding similarity.

3. Query Transformation

  • Process of modifying a user's query to improve retrieval success:
    • Query Rewriting: Rephrasing for clarity.
    • Decomposition: Breaking down a complex query into simpler sub-questions.
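Decomposition can be sketched as follows. In practice an LLM proposes the sub-questions; splitting on "and" here is purely a stand-in to show the input/output shape.

```python
# Sketch of query decomposition: a complex query becomes simpler sub-questions.
# A real system would prompt an LLM; splitting on "and" is a toy stand-in.

def decompose(query):
    parts = [p.strip() for p in query.split(" and ")]
    # Make each sub-part a standalone question.
    return [p if p.endswith("?") else p + "?" for p in parts]

print(decompose("What is RAG and how does indexing work?"))
```

Each sub-question would then be retrieved and answered independently (or sequentially, feeding earlier answers into later retrievals).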

4. Routing

  • Identifying the correct database for the query:
    • Logical routing (LLM-driven) or semantic routing (embedding-based).
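A minimal routing sketch, under the assumption that each datastore is described by a handful of keywords. A real logical router prompts an LLM to choose; keyword-overlap scoring is a stand-in here.

```python
# Sketch of routing: pick the datastore whose description best matches the query.
# ROUTES and its keyword sets are invented for illustration.

ROUTES = {
    "sql_db": {"revenue", "orders", "table", "rows"},
    "vector_store": {"docs", "paper", "notes", "article"},
}

def route(query):
    # Score each route by keyword overlap; a real system would use an
    # LLM (logical routing) or embedding similarity (semantic routing).
    words = set(query.lower().split())
    return max(ROUTES, key=lambda name: len(words & ROUTES[name]))

print(route("summarize the paper notes"))
```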

5. Query Construction

  • Converting natural language into domain-specific languages (DSL) for databases.
    • Examples: Text-to-SQL, Text-to-Cypher (for graph databases), Text-to-metadata filters (for vector stores).
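The text-to-metadata-filter case can be sketched as below. A real system has an LLM emit the structured filter from a schema; the regex and the `year`/`media_type` fields here are invented to show the target shape.

```python
# Sketch of query construction: turn natural language into a structured
# metadata filter for a vector store. The fields are hypothetical; a real
# system would have an LLM fill a schema.
import re

def build_filter(query):
    f = {}
    # Pull out a four-digit year, if present.
    m = re.search(r"\b(19|20)\d{2}\b", query)
    if m:
        f["year"] = int(m.group(0))
    # Toy media-type detection.
    if "video" in query.lower():
        f["media_type"] = "video"
    return f

print(build_filter("videos about RAG from 2024"))
```

The retriever then applies this filter alongside semantic search, so only documents matching the metadata are considered.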

Indexing Techniques

  • Various methods to enhance indexing:
    • Embedding Methods: Create fixed-length representations of documents.
    • Reranking Techniques: After retrieval, documents can be reranked for relevance.
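The two bullets above compose into a retrieve-then-rerank pipeline. A hedged sketch: a tiny hashing trick produces a fixed-length vector (real systems use learned embedding models), and a second pass reranks the candidates by exact query-term overlap as a stand-in for a cross-encoder or LLM reranker.

```python
# Sketch of retrieve-then-rerank: fixed-length hashed embeddings for the
# first pass, exact term overlap as a stand-in reranker for the second.
import math
import zlib

DIM = 16  # fixed length of every document representation

def embed(text):
    v = [0.0] * DIM
    for tok in text.lower().split():
        v[zlib.crc32(tok.encode()) % DIM] += 1.0  # deterministic hashing
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve_and_rerank(docs, query, k=2):
    # Pass 1: approximate retrieval by embedding similarity.
    qv = embed(query)
    candidates = sorted(docs, key=lambda d: cosine(qv, embed(d)), reverse=True)[:k]
    # Pass 2: rerank the small candidate set with a finer relevance signal.
    qtoks = set(query.lower().split())
    return sorted(candidates,
                  key=lambda d: len(qtoks & set(d.lower().split())),
                  reverse=True)
```

The design point: embeddings are cheap enough to scan the whole corpus, while the more precise (and more expensive) relevance check runs only on the top-k candidates.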

Generation Methods

  • Involves using retrieved documents to generate responses:
    • RAG with Feedback: Incorporates feedback into retrieval and generation processes.
  • Chain of Thought: The model works through intermediate reasoning steps, which can be evaluated before the final answer is generated.
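A feedback loop over retrieval and generation can be sketched as below. This is a simplified illustration, not the course's implementation: the grader and the query rewriter stand in for LLM calls, and `retrieve`/`rewrite` are passed in as plain functions.

```python
# Sketch of RAG with feedback: grade the retrieved context against the
# question, and retry with a rewritten query if it doesn't support an answer.

def grade(question, context):
    # Stand-in grader: context counts as relevant if it shares any word
    # with the question. A real system would ask an LLM to grade relevance.
    return bool(set(question.lower().split()) & set(context.lower().split()))

def answer_with_feedback(question, retrieve, rewrite, max_tries=2):
    q = question
    for _ in range(max_tries):
        ctx = retrieve(q)
        if grade(question, ctx):
            return f"Answer based on: {ctx}"
        q = rewrite(q)  # feedback step: transform the query and retry
    return "I don't know."
```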

Active RAG

  • LLMs decide when and where to retrieve based on previous outputs.
  • State Machines: Allows for more complex workflows with multiple decision points.
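Active RAG can be sketched as a small state machine in which each node decides the next node, making retrieval, grading, and generation explicit decision points (frameworks like LangGraph generalize this pattern). The document store, node functions, and rewrite rule below are all invented for illustration.

```python
# Sketch of active RAG as a state machine: each node returns the next
# node's name plus the updated state. All data here is toy/hypothetical.

DOCS = {"rag overview": "rag pairs retrieval with generation"}

def retrieve(state):
    # Decision point: fetch context for the current query.
    state["context"] = DOCS.get(state["query"], "")
    return "grade", state

def grade(state):
    # Decision point: is the context relevant enough to answer from?
    ok = bool(set(state["query"].split()) & set(state["context"].split()))
    return ("generate" if ok else "rewrite"), state

def rewrite(state):
    # Stand-in for an LLM-driven query rewrite.
    state["query"] = "rag overview"
    return "retrieve", state

def generate(state):
    state["answer"] = f"Based on: {state['context']}"
    return "done", state

STEPS = {"retrieve": retrieve, "grade": grade,
         "rewrite": rewrite, "generate": generate}

def run(state, start="retrieve", max_steps=10):
    node = start
    for _ in range(max_steps):
        node, state = STEPS[node](state)
        if node == "done":
            break
    return state
```

A failed retrieval routes through `rewrite` and back to `retrieve`, which is exactly the "decide when and where to retrieve" behavior described above.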

Recent Advancements in RAG

  • Multi-Representation Indexing: Indexing documents by their summaries for retrieval, while passing the full document to the LLM for generation.
  • Hierarchical Indexing (RAPTOR): Recursively clustering documents and summarizing the clusters, building a tree that supports retrieval at multiple levels of abstraction.
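The multi-representation idea can be sketched as below: search runs over a compact representation, but the full document is what gets returned for generation. Taking the first sentence is a stand-in for an LLM-written summary.

```python
# Sketch of multi-representation indexing: retrieve over summaries,
# return full documents. First sentence = toy stand-in for an LLM summary.

def summarize(doc):
    return doc.split(".")[0]  # stand-in summary

def build_index(docs):
    # Store (summary, full_document) pairs; only summaries are searched.
    return [(summarize(d), d) for d in docs]

def retrieve_full(index, query):
    q = set(query.lower().split())
    best = max(index, key=lambda pair: len(q & set(pair[0].lower().split())))
    return best[1]  # hand the FULL document to the LLM, not the summary
```

This keeps the searched representations short and focused while the generator still sees complete context. RAPTOR extends the idea by applying summarization recursively over clusters.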

Conclusion

  • RAG is not dead but is evolving with advancements in LLMs and retrieval techniques.
  • Emphasis on combining retrieval with reasoning, especially in long-context scenarios.
  • Future of RAG likely includes more sophisticated models and techniques, improving accuracy and efficiency.