Exploring Context Windows in LLMs

Feb 8, 2025

Lecture on Large Language Models (LLMs) and Context Windows

Understanding the Context Window

  • Definition: The LLM's equivalent of working memory; it determines how much of a conversation the model can retain without forgetting earlier details.
  • Functionality: Everything inside the window is available to the model, allowing it to reference previous exchanges when generating responses.
  • Limitations: When a conversation exceeds the context window, the earliest details fall out of it and are forgotten, which can lead to inaccuracies (see the sketch below).
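
A minimal sketch of this forgetting behavior, assuming a fixed token budget and a hypothetical count_tokens() helper (real systems count with the model's own tokenizer): older messages are dropped first once the budget is exceeded.

```python
# Sketch: once a conversation exceeds the window, the oldest messages
# are the first to fall out -- which is why the model "forgets" them.
def count_tokens(text: str) -> int:
    # Hypothetical stand-in: ~1.5 tokens per word (see the average below).
    return int(len(text.split()) * 1.5)

def fit_to_window(messages: list[str], window_size: int) -> list[str]:
    """Keep only the most recent messages that fit in the token budget."""
    kept, used = [], 0
    for msg in reversed(messages):        # walk newest to oldest
        cost = count_tokens(msg)
        if used + cost > window_size:
            break                         # everything older is dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))           # restore chronological order
```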

Measuring Context Windows

  • Tokens: Whereas humans read text as characters and words, LLMs process text as tokens, the smallest unit of language the model works with.
    • Tokens can be individual characters, parts of words, whole words, or short phrases.
    • Example: the word "a" in "Martin drove a car" is a single token, while "amoral" may be split into two tokens ("a" and "moral"), depending on the tokenizer.
  • Tokenization: Process of converting language to tokens using a tokenizer.
    • Different tokenizers may produce varying results for the same text.
    • A typical English word averages about 1.5 tokens.
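
To see tokenization in practice, here is a small sketch using OpenAI's tiktoken library (one tokenizer among many; models such as IBM Granite ship their own and may split the same text differently):

```python
# pip install tiktoken -- OpenAI's tokenizer library, used here purely
# as an example; other models' tokenizers can produce different splits.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["Martin drove a car", "amoral"]:
    ids = enc.encode(text)                      # text -> token IDs
    pieces = [enc.decode([i]) for i in ids]     # each ID back to its text
    print(f"{text!r} -> {len(ids)} tokens: {pieces}")
```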

Context Window Size and Processing

  • Self-Attention Mechanism: Used by transformer models to determine relationships and dependencies between tokens.
    • Each token's relevance to every other token is computed as a set of attention weights (sketched below).
  • Window Size Increases: Early LLMs had context windows of around 2,000 tokens; newer models such as IBM Granite 3 support 128,000 tokens.
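
For intuition, a minimal numpy sketch of scaled dot-product self-attention, the computation behind those weights (toy dimensions; no batching, masking, or multiple heads):

```python
# Toy scaled dot-product self-attention in numpy.
import numpy as np

def self_attention(x: np.ndarray, Wq, Wk, Wv) -> np.ndarray:
    """x: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) projections."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    # Every token scores every other token: an n x n matrix.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted mix of values

rng = np.random.default_rng(0)
n, d = 6, 8                                 # 6 tokens, model width 8
x = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(self_attention(x, Wq, Wk, Wv).shape)  # (6, 8): one vector per token
```

The n x n scores matrix is the source of the quadratic scaling discussed under the challenges below.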

Components Within a Context Window

  • User Input and Model Responses: Both contribute to filling the context window.
  • System Prompts: Hidden instructions conditioning model behavior.
  • Supplementary Information: Documents, source code, and other external data can be included in the window, as in retrieval-augmented generation (RAG).
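
A sketch of how these components share a single token budget (the message layout and the 1.5-tokens-per-word estimate are illustrative, not any particular chat API):

```python
# Everything below competes for the same context window.
WINDOW_SIZE = 128_000   # e.g., IBM Granite 3's context length

system_prompt = "You are a helpful assistant. Answer concisely."
retrieved_docs = ["<contents of policy.pdf>"]          # supplementary info
history = ["user: What does the refund policy say?",   # prior exchanges
           "assistant: Refunds are allowed within 30 days."]
new_message = "user: And for digital purchases?"

prompt = "\n".join([system_prompt, *retrieved_docs, *history, new_message])

# Rough estimate at ~1.5 tokens per word; large documents shrink the room
# left for conversation history and for the model's own response.
est_tokens = int(len(prompt.split()) * 1.5)
assert est_tokens <= WINDOW_SIZE, "prompt would overflow the context window"
```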

Challenges with Large Context Windows

  • Increased Compute Requirements: Self-attention processing scales quadratically with sequence length (see the arithmetic sketch after this list).
    • Doubling the input tokens requires roughly four times the processing power.
  • Performance Issues: Models often struggle to use information buried in the middle of long contexts, effectively taking shortcuts past it (the "lost in the middle" effect).
    • Performance is best when the relevant information appears at the start or end of the context.
  • Safety Concerns: Longer windows increase vulnerability to adversarial prompts and jailbreaking.
    • Embedded malicious content is harder to detect.
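
Back-of-the-envelope arithmetic for the quadratic term above, counting only attention-matrix entries:

```python
# Each layer/head scores every token against every other: n^2 entries,
# so doubling the context length quadruples this cost.
for n in [2_000, 4_000, 128_000]:
    print(f"{n:>7,} tokens -> {n * n:>18,} attention scores")
# Doubling 2,000 -> 4,000 tokens quadruples the count
# (4,000,000 -> 16,000,000); at 128,000 tokens it is 16,384,000,000.
```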

Conclusion

  • Balancing Act: Choosing a context window size means weighing information needs against computational cost and the performance and safety issues above.