
Exploring LLM Hallucinations and Mitigation

May 22, 2025

Lecture on LLM Hallucinations

Introduction

  • Focus on LLM hallucinations, detection, and mitigation.
  • Emphasis on building Retrieval-Augmented Generation (RAG) systems.
  • Discuss metrics from the RAGAS paper for evaluating RAG pipelines.

Understanding Hallucinations

  • Hallucinations in LLMs: false or misleading information stated as fact, a byproduct of their statistical nature.
  • Example: Air Canada's chatbot gave a customer false refund-policy information.
  • There is no complete solution; hallucinations occur because LLMs model probability, not truth.
  • Mitigation is possible when facts are available to ground the LLM's responses.

RAG Pipelines

  • Retrieve document chunks relevant to the user's query.
  • The LLM uses the retrieved context to generate its response, reducing the chance of hallucination.
  • The faithfulness (groundedness) of the LLM's output can then be checked against the retrieved context.
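A minimal sketch of such a pipeline, using sentence-transformers for retrieval over a toy in-memory corpus. The chunk texts, model name, and prompt wording are illustrative, and `llm` stands for any prompt-to-text callable (e.g. a wrapper around a chat API):

```python
from sentence_transformers import SentenceTransformer, util

# Toy corpus standing in for a real document store (contents are made up).
CHUNKS = [
    "Refund requests must be submitted within 90 days of the flight date.",
    "Bereavement fares are discounted tickets for travel after a death in the family.",
    "Economy fares include one checked bag of up to 23 kg.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
chunk_embeddings = embedder.encode(CHUNKS, convert_to_tensor=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (cosine similarity)."""
    query_embedding = embedder.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, chunk_embeddings)[0]
    return [CHUNKS[i] for i in scores.topk(k).indices.tolist()]

def rag_answer(query: str, llm) -> str:
    """Generate an answer grounded in the retrieved context."""
    context = "\n".join(retrieve(query))
    prompt = (
        "Answer the question using ONLY the context below. "
        "If the context is insufficient, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return llm(prompt)  # `llm`: any callable mapping a prompt string to text
```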

Metrics for RAG Systems (RAGAS Framework)

  • Faithfulness: Is the LLM's output grounded in the provided context?
  • Context Relevance: Effectiveness of retrieval—are retrieved documents relevant to the query?
  • Answer Relevance: Does the generated answer address the user's question directly?
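The ragas library packages these metrics; a rough usage sketch follows. Import names and the expected dataset schema have changed across releases (context relevance, for example, was later split into context precision/recall), and the LLM-backed metrics expect an LLM API key to be configured, so treat this as an approximation rather than the current API:

```python
from datasets import Dataset          # pip install ragas datasets
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_relevancy

# One evaluation record: a question, the retrieved chunks, and the answer.
data = Dataset.from_dict({
    "question": ["What is the airline's bereavement refund policy?"],
    "contexts": [["Bereavement fares must be requested before travel."]],
    "answer":   ["Refunds can be requested within 90 days after travel."],
})

scores = evaluate(data, metrics=[faithfulness, answer_relevancy, context_relevancy])
print(scores)  # dict-like, e.g. {'faithfulness': 0.0, 'answer_relevancy': 0.84, ...}
```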

Methods for Evaluating Metrics

  • Sentence Embedding Models: Measure semantic similarity between text snippets.
  • LLMs as Judges: Prompt an LLM to score the faithfulness of responses.
    • Example: the Lynx model for hallucination detection in specialized domains.
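Both approaches in miniature, with an illustrative judge prompt (the model name, snippets, and prompt wording are assumptions, not from the lecture):

```python
from sentence_transformers import SentenceTransformer, util

# 1) Semantic similarity between two snippets via sentence embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")
a = "Customers may request a refund within 90 days."
b = "Refund requests are accepted for up to three months."
emb = model.encode([a, b], convert_to_tensor=True)
print(f"similarity: {util.cos_sim(emb[0], emb[1]).item():.2f}")  # near 1.0 = same meaning

# 2) An LLM-as-judge prompt template for scoring faithfulness.
JUDGE_PROMPT = """You are a strict grader. Reply with a single number between
0 and 1: the fraction of the answer that is supported by the context.

Context: {context}
Answer: {answer}
Score:"""
```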

Detailed Evaluation Techniques

  • Faithfulness Checks:
    • Use sentence embeddings and cosine similarity to compare the answer with the retrieved context.
    • Use an LLM as a judge to score the answer's faithfulness directly.
    • Two-step method: break the answer into factual statements and verify each one against the context (sketched after this list).
  • Answer Relevance:
    • Generate questions from the answer and compare them with the original question using embeddings (sketched after this list).
  • Context Relevance:
    • Extract the sentences relevant to the question from the retrieved context.
    • Measure the ratio of relevant to total sentences (sketched after this list).
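A sketch of the two-step faithfulness check. The prompts paraphrase the RAGAS idea rather than quote the paper, and `llm` is the same hypothetical prompt-to-text callable as in the earlier sketches:

```python
def faithfulness_score(question: str, answer: str, context: str, llm) -> float:
    """Fraction of the answer's factual statements supported by the context."""
    # Step 1: decompose the answer into atomic factual statements.
    raw = llm(
        "Break the following answer into short, self-contained factual "
        f"statements, one per line.\n\nQuestion: {question}\nAnswer: {answer}"
    )
    statements = [s.lstrip("0123456789.- ").strip() for s in raw.splitlines()]
    statements = [s for s in statements if s]
    if not statements:
        return 0.0

    # Step 2: verify each statement against the retrieved context.
    supported = 0
    for statement in statements:
        verdict = llm(
            "Can the statement be inferred from the context? "
            f"Reply with exactly Yes or No.\n\nContext: {context}\n"
            f"Statement: {statement}"
        )
        supported += verdict.strip().lower().startswith("yes")
    return supported / len(statements)
```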
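Answer relevance in the same style: generate questions the answer would address, then compare them with the original question in embedding space (the number of questions and the embedding model are arbitrary choices):

```python
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def answer_relevance(question: str, answer: str, llm, n: int = 3) -> float:
    """Mean cosine similarity between the original question and n questions
    generated from the answer; low scores flag off-topic answers."""
    raw = llm(
        f"Write {n} questions that the following answer would directly "
        f"answer, one per line.\n\nAnswer: {answer}"
    )
    generated = [q.strip() for q in raw.splitlines() if q.strip()]
    if not generated:
        return 0.0
    q_emb = embedder.encode(question, convert_to_tensor=True)
    g_emb = embedder.encode(generated, convert_to_tensor=True)
    return util.cos_sim(q_emb, g_emb).mean().item()
```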
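And context relevance as a sentence ratio; the naive split on periods is just for illustration:

```python
def context_relevance(question: str, context: str, llm) -> float:
    """Ratio of context sentences the judge deems necessary for the answer."""
    total = [s for s in context.split(".") if s.strip()]  # naive sentence split
    if not total:
        return 0.0
    extracted = llm(
        "From the context below, copy out only the sentences needed to "
        "answer the question. If none are relevant, reply with "
        f"'Insufficient information'.\n\nQuestion: {question}\n"
        f"Context: {context}"
    )
    if "insufficient information" in extracted.lower():
        return 0.0
    kept = [s for s in extracted.split(".") if s.strip()]
    return min(len(kept) / len(total), 1.0)
```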

Reference-Free Metrics

  • These metrics do not require a gold reference answer, which makes them suitable for real-time use.

Application of Metrics

  • Real-Time Use:
    • Check faithfulness before displaying an answer to the user (see the sketch after this section).
    • Show a fallback message if a faithful answer cannot be generated.
  • Development Phase:
    • Test RAG pipeline with user questions.
    • Evaluate and improve with faithfulness, context, and answer relevance metrics.
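A sketch of the real-time gate, reusing the hypothetical `retrieve` and `faithfulness_score` helpers from the earlier sketches; the threshold and fallback message are arbitrary:

```python
FALLBACK = "Sorry, I couldn't find a reliable answer in our documentation."

def guarded_answer(question: str, llm, threshold: float = 0.8) -> str:
    """Display the answer only if it passes a faithfulness check."""
    context = "\n".join(retrieve(question))
    draft = llm(f"Context:\n{context}\n\nQuestion: {question}")
    if faithfulness_score(question, draft, context, llm) < threshold:
        return FALLBACK
    return draft
```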

Conclusion

  • Hallucinations are a product of LLMs' statistical nature but can be managed.
  • Faithfulness checks and RAGAS framework metrics are essential for improving RAG systems.
  • Invitation to subscribe for further practical insights.