
Enhancing RAG Systems to Retain Context

Apr 23, 2025

Lecture on Improving RAG Systems: Tackling the Lost-Context Problem

Introduction

  • Issues with RAG Agents:
    • Inaccurate answers
    • Hallucinations due to lost context
  • Goal: Improve retrieval accuracy and reduce hallucinations

Example of Lost Context Problem

  • Document: Wikipedia article on Berlin
    • Chunked into sentences for demonstration
  • Context Loss:
    • "Berlin" mentioned only in the first chunk
    • Subsequent chunks refer to Berlin only as "it" or "the city"
  • Standard RAG System:
    • Chunks sent independently to embedding models
    • Loss of context affects retrieval accuracy

Techniques to Mitigate Context Loss

  • Contextual Retrieval
  • Late Chunking

What is RAG?

  • Retrieval-Augmented Generation
  • The system retrieves relevant document chunks and passes them to an LLM to ground its answer (minimal flow sketched below)
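A minimal sketch of the retrieve-then-generate loop, assuming a placeholder `embed()` helper and an in-memory array of chunk vectors (a real system would call an embedding API and a vector store):

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: call your embedding model/API here and return a vector."""
    raise NotImplementedError

def retrieve(query: str, chunks: list[str], chunk_vectors: np.ndarray, k: int = 3) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query embedding."""
    q = embed(query)
    # Cosine similarity between the query vector and every chunk vector
    sims = chunk_vectors @ q / (np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q))
    top = np.argsort(sims)[::-1][:k]
    return [chunks[i] for i in top]

def build_prompt(query: str, chunks: list[str], chunk_vectors: np.ndarray) -> str:
    """Assemble retrieved chunks into a grounding prompt for the generator LLM (call not shown)."""
    context = "\n\n".join(retrieve(query, chunks, chunk_vectors))
    return f"Answer the question using only this context:\n{context}\n\nQuestion: {query}"
```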

Chunking in RAG Systems

  • Splitting documents into smaller segments before embedding
  • Strategies include fixed token/character counts and overlapping chunks (a simple example is sketched below)
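As a concrete example of the simplest strategy, a fixed-size character split with overlap might look like this (the chunk size and overlap values are illustrative, not from the lecture):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping some shared context
    return chunks
```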

Late Chunking

  • Developed by Jina AI
  • Embed the whole document first, then derive per-chunk vectors (sketch below)
  • Benefits:
    • A long-context embedding model sees the full document, so token embeddings retain surrounding context
    • Token vectors are pooled/aggregated per chunk, preserving that context in each chunk embedding
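A rough sketch of the idea, assuming a Hugging Face long-context embedding model that exposes per-token hidden states (the model name and the use of tokenizer offset mappings are assumptions, not from the lecture). The whole document is encoded once, then each chunk vector is the mean of the token embeddings falling inside that chunk's character span:

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "jinaai/jina-embeddings-v2-base-en"  # assumption: any long-context embedding model could be used
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_NAME, trust_remote_code=True)

def late_chunk(document: str, boundaries: list[tuple[int, int]]) -> list[torch.Tensor]:
    """Embed the whole document once, then mean-pool token embeddings per chunk span.

    `boundaries` are (start, end) character offsets of each chunk in `document`.
    """
    enc = tokenizer(document, return_tensors="pt", return_offsets_mapping=True, truncation=True)
    offsets = enc.pop("offset_mapping")[0]               # (num_tokens, 2) character span per token
    with torch.no_grad():
        token_embs = model(**enc).last_hidden_state[0]   # (num_tokens, hidden_dim)

    chunk_vectors = []
    for start, end in boundaries:
        # Select tokens whose character span falls inside this chunk (special tokens have empty spans)
        mask = (offsets[:, 0] >= start) & (offsets[:, 1] <= end) & (offsets[:, 1] > offsets[:, 0])
        chunk_vectors.append(token_embs[mask].mean(dim=0))
    return chunk_vectors
```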

Long Context Embedding Models

  • Examples: Mistral, Qwen
  • Jina AI's jina-embeddings-v3 supports ~8,000 tokens

Workflow for Late Chunking

  • Fetch the document, determine chunk boundaries, embed the full document, pool per-chunk vectors, and upsert them into the vector store (see the sketch below)
  • Works around limitations in workflow tools such as n8n
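Put together, the late-chunking ingest differs from a standard RAG ingest mainly in the order of operations: whole-document embedding first, per-chunk pooling second. A minimal sketch, assuming the `late_chunk` helper above and a hypothetical `vector_store.upsert` interface:

```python
def ingest_document(document: str, vector_store) -> None:
    """Late-chunking ingest: embed the whole document, then pool and upsert per-chunk vectors."""
    # 1. Derive chunk boundaries (here: naive fixed-size character windows)
    size = 1000
    boundaries = [(i, min(i + size, len(document))) for i in range(0, len(document), size)]

    # 2. Embed the full document once and pool token vectors per chunk (see late_chunk above)
    vectors = late_chunk(document, boundaries)

    # 3. Upsert chunk text plus pooled vector into the vector store (hypothetical API)
    for (start, end), vec in zip(boundaries, vectors):
        vector_store.upsert(id=f"chunk-{start}", vector=vec.tolist(), text=document[start:end])
```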

Contextual Retrieval

  • Introduced by Anthropic
  • Uses an LLM to generate context for each chunk, rather than relying on the embedding model to recover it
  • Process:
    • Document split into chunks
    • Each chunk passed to an LLM along with the document
    • The LLM writes a short descriptive blurb situating the chunk
    • The blurb is prepended to the chunk before embedding (sketch below)
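A minimal sketch of the contextual-retrieval preprocessing step, assuming an OpenAI-style chat client; the prompt wording and model name are illustrative, not Anthropic's exact prompt:

```python
from openai import OpenAI

client = OpenAI()  # assumption: any chat-completion LLM client would work here

SITUATE_PROMPT = (
    "Here is a document:\n{document}\n\n"
    "Here is one chunk from it:\n{chunk}\n\n"
    "Write one short sentence situating this chunk within the overall document, "
    "so the chunk can be understood on its own."
)

def contextualize(document: str, chunks: list[str], model: str = "gpt-4o-mini") -> list[str]:
    """Prepend an LLM-generated context blurb to each chunk before it is embedded."""
    enriched = []
    for chunk in chunks:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user",
                       "content": SITUATE_PROMPT.format(document=document, chunk=chunk)}],
        )
        blurb = resp.choices[0].message.content.strip()
        enriched.append(f"{blurb}\n\n{chunk}")  # blurb + original chunk are embedded together
    return enriched
```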

Implementation Challenges

  • Cost and time constraints: one LLM call per chunk adds up on large corpora
  • Managing API rate limits and token/context-window limits (a simple backoff pattern is sketched below)
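One common mitigation for rate limits is exponential backoff with jitter around each LLM call; a sketch with arbitrary retry parameters:

```python
import random
import time

def with_backoff(call, max_retries: int = 5):
    """Retry a rate-limited API call with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:  # in practice, catch the client's specific rate-limit error
            if attempt == max_retries - 1:
                raise
            time.sleep((2 ** attempt) + random.random())
```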

Evaluation and Results

  • Late Chunking and Contextual Retrieval tested against simple RAG setup
  • Contextual Retrieval improves accuracy significantly
  • Issues with scalability and cost for large document sets

Conclusion

  • Advanced RAG techniques improve retrieval accuracy
  • Cost and scalability remain challenges
  • Community resources available for further exploration and implementation