Enhancing RAG Systems to Retain Context
Apr 23, 2025
Lecture on Improving RAG Systems: Tackling Lost Context Problem
Introduction
Issues with RAG Agents:
Inaccurate answers
Hallucinations due to lost context
Goal: Improve retrieval accuracy and reduce hallucinations
Example of Lost Context Problem
Document: Wikipedia article on Berlin
Chunked into sentences for demonstration
Context Loss:
"Berlin" mentioned by name only in the first chunk
Subsequent chunks refer to Berlin only as "it" or "the city"
Standard RAG System:
Chunks sent independently to embedding models
Loss of context affects retrieval accuracy
Techniques to Mitigate Context Loss
Contextual Retrieval
Late Chunking
What is RAG?
Retrieval-Augmented Generation
AI retrieves relevant information to answer questions
Chunking in RAG Systems
Splitting documents into segments for processing
Different strategies: token/character count, overlapping chunks
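The fixed-size, overlapping strategy can be sketched as follows; the character counts and sizes are illustrative defaults, not values from the lecture:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character chunks.

    Consecutive chunks share `overlap` characters so that context
    straddling a boundary appears in both chunks.
    """
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Overlap softens but does not solve the lost-context problem: a pronoun in chunk 5 may still refer to an entity named only in chunk 1.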
Late Chunking
Developed by Jina AI
Embedding whole document first, then chunking
Benefits:
Long context embedding models maintain context
Pooling/aggregating vectors to preserve context
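The pooling step above can be sketched as follows, assuming you already have per-token embeddings for the whole document (NumPy stand-in; in a real setup these would come from a long-context embedding model):

```python
import numpy as np

def late_chunk(token_embeddings: np.ndarray,
               chunk_spans: list[tuple[int, int]]) -> np.ndarray:
    """Mean-pool per-token vectors into one vector per chunk.

    Because the token vectors were computed over the WHOLE document,
    each token has already attended to the full context; pooling them
    afterwards ("late") keeps that context in every chunk vector.
    """
    return np.stack([token_embeddings[start:end].mean(axis=0)
                     for start, end in chunk_spans])
```

Contrast with standard chunking, where each chunk is embedded in isolation and references like "the city" carry no information about Berlin.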
Long Context Embedding Models
Examples: Mistral, Qwen
Jina AI's v3 embedding model supports ~8,000 tokens
Workflow for Late Chunking
Fetch document, chunk it, embed it, and upsert in vector store
Requires workarounds for limitations in low-code tools like n8n
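The four workflow steps can be sketched as a pipeline; every callable here is a hypothetical stand-in for whatever fetcher, embedding model, pooling function, and vector store you actually use:

```python
def run_pipeline(fetch, split, embed_document, pool, upsert, url):
    """Late-chunking workflow: fetch -> chunk -> embed whole doc -> pool -> upsert.

    fetch(url) -> str                      # retrieve the raw document
    split(text) -> list[(start, end)]      # token spans for each chunk
    embed_document(text) -> token vectors  # long-context model, whole doc at once
    pool(vectors, spans) -> chunk vectors  # the "late" chunking step
    upsert(vectors) -> None                # write to the vector store
    """
    text = fetch(url)
    spans = split(text)
    token_vecs = embed_document(text)
    chunk_vecs = pool(token_vecs, spans)
    upsert(chunk_vecs)
    return len(spans)
```

Injecting the callables like this is also how you would work around a tool that lacks a native late-chunking node: each step becomes a separate HTTP or code node.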
Contextual Retrieval
Introduced by Anthropic
Uses an LLM to restore context for each chunk, rather than relying on a long-context embedding model
Process:
Document split into chunks
Each chunk analyzed by LLM for context
Descriptive blurb generated
Added to chunk for embedding
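A minimal sketch of that four-step process; the prompt wording and the `llm` callable are assumptions for illustration, not Anthropic's exact implementation:

```python
def contextualize_chunks(document: str, chunks: list[str], llm) -> list[str]:
    """Prepend an LLM-generated situating blurb to each chunk before embedding.

    `llm` is any callable mapping a prompt string to a completion string.
    The full document is sent along with each chunk so the model can
    explain what the chunk refers to (e.g. that "the city" means Berlin).
    """
    prompt_template = (
        "<document>\n{doc}\n</document>\n"
        "Here is a chunk from the document:\n"
        "<chunk>\n{chunk}\n</chunk>\n"
        "Write a short blurb situating this chunk within the overall document."
    )
    contextualized = []
    for chunk in chunks:
        blurb = llm(prompt_template.format(doc=document, chunk=chunk))
        contextualized.append(f"{blurb}\n\n{chunk}")
    return contextualized
```

Note the cost driver: the whole document is resent for every chunk, which is why prompt caching and rate limits matter at scale.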
Implementation Challenges
Cost and time constraints
Managing rate limits and token limits
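One common way to manage provider rate limits when contextualizing many chunks is exponential backoff with jitter; this generic sketch assumes nothing about a particular API client, which is why it catches a broad exception:

```python
import random
import time

def call_with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0):
    """Retry fn() with exponentially growing delays plus random jitter.

    In practice you would catch your API client's specific rate-limit
    exception instead of the generic Exception used here.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)
```

Token limits need a different fix: batching or truncating the document portion of each prompt so requests stay under the model's context window.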
Evaluation and Results
Late Chunking and Contextual Retrieval tested against simple RAG setup
Contextual Retrieval improves accuracy significantly
Issues with scalability and cost for large document sets
Conclusion
Advanced RAG techniques improve retrieval accuracy
Cost and scalability remain challenges
Community resources available for further exploration and implementation