Enhancing RAG with Knowledge Graphs

Jul 18, 2024

Introduction

  • Retrieval Augmented Generation (RAG) is a common pattern in LLM applications.
  • Core idea: let the LLM draw on a private corpus for domain-specific knowledge.
  • Shortcoming of RAG: it fails on global questions about the corpus as a whole.

Problems with RAG

  • RAG retrieves document snippets relevant to each query.
  • Answer quality is limited to the text within those snippets.
  • It struggles with queries about themes or concepts spanning entire documents.
  • Example problem: a query about the major themes across a document corpus, which no handful of snippets can answer.

Introduction of Knowledge Graphs

  • Aim: provide a deeper understanding of the concepts in the corpus and the relations among them.
  • Sense-making: understanding the core connections among entities (people, places, events, concepts).
  • Extracting entities and their relationships enables better answers to global queries.

Process Overview

  1. Offline Steps (Indexing Time)

    • Chunk documents.
    • Extract element instances (entities and relationships).
    • Summarize entities and relationships.
    • Cluster similar entities into communities.
    • Summarize these communities.
  2. Query Time (Lookup Time)

    • Use community summaries for initial query processing.
    • Generate intermediate answers from community summaries.
    • Rank and score intermediate answers with an LLM.
    • Concatenate top-ranked answers for final global answer.
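The two phases above can be sketched as a small pipeline. This is a minimal illustration, not the paper's implementation: `chunk_document` splits on words rather than tokens, the function names are my own, and the LLM call is left as a stub.

```python
def llm(prompt: str) -> str:
    """Stub for an LLM call; swap in a real chat-completion client."""
    raise NotImplementedError


def chunk_document(text: str, chunk_size: int = 600, overlap: int = 100) -> list[str]:
    """Split a document into overlapping chunks (words stand in for tokens)."""
    words = text.split()
    chunks, step = [], chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks


def build_index(documents: list[str]) -> list[str]:
    """Offline phase: chunk, extract elements, summarize (community steps omitted)."""
    chunks = [c for doc in documents for c in chunk_document(doc)]
    elements = [llm("Extract entities and relationships:\n" + c) for c in chunks]
    return [llm("Summarize these entities and relationships:\n" + e) for e in elements]
```

Overlapping chunks reduce the chance that an entity mention is cut in half at a chunk boundary, at the cost of some duplicated extraction work.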

Detailed Steps

Offline Steps

  1. Document Chunking

    • Experiments with various chunk sizes.
  2. Extracting Concepts and Relationships

    • The most computationally expensive step: entities and relationships are extracted with few-shot LLM prompting.
  3. Summarizing Entities and Relationships

    • Summarize extracted concepts and their connections through LLM prompts.
  4. Community Clustering

    • Nodes (entities) are clustered based on strong relationship edges.
    • The result is a mutually exclusive, collectively exhaustive hierarchy of communities.
    • Summarize these clusters (community summaries).
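The paper uses hierarchical Leiden clustering for this step; as a simple stand-in, the sketch below (assumed names, not the paper's code) groups entities by keeping only strong edges and taking connected components, which illustrates the idea that strongly related entities end up in the same community.

```python
from collections import defaultdict


def communities(edges: dict[tuple[str, str], float], min_weight: float = 2.0) -> list[set[str]]:
    """Group entities into communities: connected components over strong edges."""
    adj = defaultdict(set)
    nodes: set[str] = set()
    for (a, b), w in edges.items():
        nodes.update((a, b))
        if w >= min_weight:  # keep only strong relationship edges
            adj[a].add(b)
            adj[b].add(a)
    seen, result = set(), []
    for node in nodes:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:  # depth-first search for one component
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        result.append(comp)
    return result
```

Unlike this flat sketch, Leiden produces clusters at multiple levels of granularity, which is what makes community summaries available at several zoom levels.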

Query Answering

  1. Using Community Summaries
    • Chunk and shuffle community summaries.
    • Generate and rank intermediate answers for each chunk.
    • Top-ranked answers are used for final comprehensive answer.
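This map-reduce step can be sketched as follows. The map side (an LLM producing an intermediate answer and a helpfulness score per summary chunk) is assumed to have already run; the reduce helper below, with an illustrative name and a word-count budget standing in for a token budget, keeps the top-scored answers until the context window is full.

```python
def reduce_answers(scored: list[tuple[str, int]], budget_words: int = 50) -> str:
    """Concatenate intermediate answers in descending score order within a budget."""
    kept, used = [], 0
    for answer, score in sorted(scored, key=lambda p: p[1], reverse=True):
        if score == 0:  # zero-scored (unhelpful) answers are filtered out
            continue
        n = len(answer.split())
        if used + n > budget_words:
            break
        kept.append(answer)
        used += n
    return " ".join(kept)
```

The concatenated text is then handed to the LLM one final time to be synthesized into the global answer.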

Evaluation of Graph RAG

  • Datasets Used: Podcast transcripts, news articles.
  • Evaluation questions requiring global understanding of the corpus were generated with an LLM.

Metrics for Answer Evaluation

  • Comprehensiveness: Broadly covering the topic/query.
  • Diversity: Varied and covering different ideas.
  • Empowerment: Helps the reader understand the topic.
  • Directness: Specificity and relevance to the question.
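The paper evaluates answers head-to-head with an LLM judge on these metrics. The helper below is hypothetical (the exact prompt wording and metric phrasings are my paraphrases of the list above); it only constructs the pairwise-comparison prompt, and the judge call itself is omitted.

```python
# Paraphrased metric definitions; not the paper's exact wording.
METRICS = {
    "comprehensiveness": "How broadly does the answer cover all aspects of the question?",
    "diversity": "How varied is the answer in the different ideas it covers?",
    "empowerment": "How well does the answer help the reader understand the topic?",
    "directness": "How specifically and relevantly does the answer address the question?",
}


def judge_prompt(question: str, answer_a: str, answer_b: str, metric: str) -> str:
    """Build a pairwise LLM-as-judge prompt for one evaluation metric."""
    return (
        f"Metric: {metric} -- {METRICS[metric]}\n"
        f"Question: {question}\n"
        f"Answer A: {answer_a}\n"
        f"Answer B: {answer_b}\n"
        "Which answer is better on this metric? Reply A, B, or tie."
    )
```

Pairwise comparison sidesteps the difficulty of getting an LLM to assign stable absolute scores: it only has to pick a winner per metric.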

Results and Comparison

  • Graph RAG vs. Naive RAG: Better performance in comprehensiveness and diversity.
  • Example Comparison: Public figures in entertainment articles.
    • Graph RAG showed deeper understanding by categorizing figures rather than listing frequently mentioned names.

Conclusion

  • Graph RAG enhances traditional RAG with deeper understanding via entity graphs.
  • Produces more comprehensive, direct answers to complex, corpus-wide queries.
  • The trend points toward integrating knowledge graphs into RAG techniques.

Summary

  • Enhanced RAG with knowledge graphs solves major traditional RAG issues.
  • Knowledge graphs used to understand and summarize document-wide concepts.
  • Promising results in producing higher quality, globally aware answers.