Enhancing RAG Systems to Retain Context
Apr 23, 2025
Lecture on Improving RAG Systems: Tackling Lost Context Problem
Introduction
Issues with RAG Agents:
Inaccurate answers
Hallucinations due to lost context
Goal: Improve retrieval accuracy and reduce hallucinations
Example of Lost Context Problem
Document: Wikipedia article on Berlin
Chunked into sentences for demonstration
Context Loss:
"Berlin" mentioned by name only in the first chunk
Subsequent chunks refer to Berlin only as "it" or "the city"
Standard RAG System:
Chunks sent independently to embedding models
Loss of context affects retrieval accuracy
Techniques to Mitigate Context Loss
Contextual Retrieval
Late Chunking
What is RAG?
Retrieval-Augmented Generation
AI retrieves relevant information to answer questions
Chunking in RAG Systems
Splitting documents into segments for processing
Different strategies: token/character count, overlapping chunks
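The fixed-size, overlapping strategy can be sketched as follows; the character counts and sizes are illustrative defaults, not values from the lecture:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character chunks.

    Consecutive chunks share `overlap` characters so that context
    straddling a boundary appears in both chunks.
    """
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Overlap softens but does not solve the lost-context problem: a pronoun in chunk 5 may still refer to an entity named only in chunk 1.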
Late Chunking
Developed by Jina AI
Embedding whole document first, then chunking
Benefits:
Long context embedding models maintain context
Pooling/aggregating vectors to preserve context
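The pooling step above can be sketched as follows, assuming you already have per-token embeddings for the whole document (NumPy stand-in; in a real setup these would come from a long-context embedding model):

```python
import numpy as np

def late_chunk(token_embeddings: np.ndarray,
               chunk_spans: list[tuple[int, int]]) -> np.ndarray:
    """Mean-pool per-token vectors into one vector per chunk.

    Because the token vectors were computed over the WHOLE document,
    each token has already attended to the full context; pooling them
    afterwards ("late") keeps that context in every chunk vector.
    """
    return np.stack([token_embeddings[start:end].mean(axis=0)
                     for start, end in chunk_spans])
```

Contrast with standard chunking, where each chunk is embedded in isolation and references like "the city" carry no information about Berlin.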
Long Context Embedding Models
Examples: Mistral, Qwen
Jina AI's v3 embedding model supports ~8,000 tokens
Workflow for Late Chunking
Fetch document, chunk it, embed it, and upsert in vector store
Requires workarounds for limitations in low-code tools like n8n
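The four workflow steps can be sketched as a pipeline; every callable here is a hypothetical stand-in for whatever fetcher, embedding model, pooling function, and vector store you actually use:

```python
def run_pipeline(fetch, split, embed_document, pool, upsert, url):
    """Late-chunking workflow: fetch -> chunk -> embed whole doc -> pool -> upsert.

    fetch(url) -> str                      # retrieve the raw document
    split(text) -> list[(start, end)]      # token spans for each chunk
    embed_document(text) -> token vectors  # long-context model, whole doc at once
    pool(vectors, spans) -> chunk vectors  # the "late" chunking step
    upsert(vectors) -> None                # write to the vector store
    """
    text = fetch(url)
    spans = split(text)
    token_vecs = embed_document(text)
    chunk_vecs = pool(token_vecs, spans)
    upsert(chunk_vecs)
    return len(spans)
```

Injecting the callables like this is also how you would work around a tool that lacks a native late-chunking node: each step becomes a separate HTTP or code node.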
Contextual Retrieval
Introduced by Anthropic
Uses an LLM to restore context for each chunk, rather than relying on a long-context embedding model
Process:
Document split into chunks
Each chunk analyzed by LLM for context
Descriptive blurb generated
Added to chunk for embedding
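A minimal sketch of that four-step process; the prompt wording and the `llm` callable are assumptions for illustration, not Anthropic's exact implementation:

```python
def contextualize_chunks(document: str, chunks: list[str], llm) -> list[str]:
    """Prepend an LLM-generated situating blurb to each chunk before embedding.

    `llm` is any callable mapping a prompt string to a completion string.
    The full document is sent along with each chunk so the model can
    explain what the chunk refers to (e.g. that "the city" means Berlin).
    """
    prompt_template = (
        "<document>\n{doc}\n</document>\n"
        "Here is a chunk from the document:\n"
        "<chunk>\n{chunk}\n</chunk>\n"
        "Write a short blurb situating this chunk within the overall document."
    )
    contextualized = []
    for chunk in chunks:
        blurb = llm(prompt_template.format(doc=document, chunk=chunk))
        contextualized.append(f"{blurb}\n\n{chunk}")
    return contextualized
```

Note the cost driver: the whole document is resent for every chunk, which is why prompt caching and rate limits matter at scale.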
Implementation Challenges
Cost and time constraints
Managing rate limits and token limits
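One common way to manage provider rate limits when contextualizing many chunks is exponential backoff with jitter; this generic sketch assumes nothing about a particular API client, which is why it catches a broad exception:

```python
import random
import time

def call_with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0):
    """Retry fn() with exponentially growing delays plus random jitter.

    In practice you would catch your API client's specific rate-limit
    exception instead of the generic Exception used here.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)
```

Token limits need a different fix: batching or truncating the document portion of each prompt so requests stay under the model's context window.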
Evaluation and Results
Late Chunking and Contextual Retrieval tested against simple RAG setup
Contextual Retrieval improves accuracy significantly
Issues with scalability and cost for large document sets
Conclusion
Advanced RAG techniques improve retrieval accuracy
Cost and scalability remain challenges
Community resources available for further exploration and implementation