Exploring Context Windows in LLMs
Feb 8, 2025
Lecture on Large Language Models (LLMs) and Context Windows
Understanding the Context Window
Definition:
The LLM's equivalent of working memory; it determines how long a conversation can continue before earlier details are forgotten.
Functionality:
Holds the entire conversation so far, allowing the model to reference previous exchanges when generating responses.
Limitations:
When a conversation thread exceeds the context window, the earliest details fall out of scope and are forgotten, which can lead to inaccuracies.
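A minimal sketch of this sliding-window behavior, assuming a hypothetical count_tokens heuristic and an illustrative token budget; a real client would count tokens with the model's actual tokenizer:

```python
# Minimal sketch: drop the oldest turns once the conversation exceeds a
# token budget. count_tokens() is a hypothetical stand-in for a real
# tokenizer, and CONTEXT_LIMIT is an illustrative window size.
CONTEXT_LIMIT = 4_096

def count_tokens(text: str) -> int:
    # Placeholder heuristic: ~1.5 tokens per word (see "Tokenization" below).
    return int(len(text.split()) * 1.5)

def trim_history(messages: list[str], limit: int = CONTEXT_LIMIT) -> list[str]:
    trimmed = list(messages)
    while trimmed and sum(count_tokens(m) for m in trimmed) > limit:
        trimmed.pop(0)  # the earliest exchange is "forgotten" first
    return trimmed

history = ["hello " * 1_000, "short reply", "latest question"]
print(len(trim_history(history, limit=1_000)))  # 2: the oldest turn is dropped
```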
Measuring Context Windows
Tokens:
Unlike humans, who read language as characters and words, AI models process it as tokens, their smallest unit of language.
Tokens can be individual characters, parts of words, whole words, or short phrases.
Example: "a" in "Martin drove a car" is a single token, while a word like "amoral" may be split into two tokens ("a" and "moral").
Tokenization:
Process of converting language to tokens using a tokenizer.
Different tokenizers may produce varying results for the same text.
A typical English word averages about 1.5 tokens.
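One way to see tokenization in action is with OpenAI's open-source tiktoken package (an assumption for illustration; the lecture does not name a tokenizer, and other tokenizers will split the same text differently):

```python
# Sketch using the tiktoken package (pip install tiktoken); token splits
# vary by tokenizer, so the output may differ from the lecture's example.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["Martin drove a car", "amoral"]:
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{text!r} -> {len(ids)} tokens: {pieces}")
```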
Context Window Size and Processing
Self-Attention Mechanism:
Utilized by transformer models to determine relationships and dependencies between tokens.
The relevance of each token to every other token is computed via attention weight vectors.
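A toy NumPy sketch of scaled dot-product attention, the computation at the heart of self-attention; the random matrices stand in for the learned query, key, and value projections of a four-token sequence:

```python
# Toy scaled dot-product attention; Q, K, V are random stand-ins for the
# learned projections a transformer would compute for each token.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8                             # 4 tokens, 8-dim head
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))
V = rng.normal(size=(seq_len, d_k))

scores = Q @ K.T / np.sqrt(d_k)                 # pairwise token relevance
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
output = weights @ V                            # weighted mix of value vectors

print(weights.shape)  # (4, 4): one attention weight per token pair
```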
Window Size Increases:
Early LLMs had context windows of around 2,000 tokens; newer models such as IBM Granite 3 support 128,000 tokens.
Components Within a Context Window
User Input and Model Responses:
Both contribute to filling the context window.
System Prompts:
Hidden instructions conditioning model behavior.
Supplementary Information:
Documents, source code, and external data sources can be included in the window to augment generation (as in retrieval-augmented generation).
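An illustrative sketch of how these components share a single window, using the common chat-message format; the role names and contents here are hypothetical:

```python
# Everything below - the hidden system prompt, retrieved reference
# material, and the user's message - consumes tokens from the same
# context window.
system_prompt = "You are a concise assistant."   # hidden instructions
retrieved_doc = "...contents of an uploaded policy document..."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "system", "content": f"Reference material:\n{retrieved_doc}"},
    {"role": "user", "content": "Summarize the attached policy."},
]
```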
Challenges with Large Context Windows
Increased Compute Requirements:
Self-attention compares every token with every other token, so processing scales quadratically with sequence length.
Doubling the input tokens therefore requires roughly four times the compute, as illustrated below.
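A back-of-the-envelope illustration of that quadratic growth in token-pair comparisons:

```python
# Self-attention scores every token against every other token, so the
# number of comparisons grows with the square of the input length.
for n in [2_000, 4_000, 8_000, 128_000]:
    print(f"{n:>7,} input tokens -> {n * n:>18,} token-pair scores")
```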
Performance Issues:
Models may struggle with information buried in the middle of long contexts, taking shortcuts that skip over it ("lost in the middle").
Performance is best when the relevant information sits at the start or end of the context.
Safety Concerns:
Longer windows increase vulnerability to adversarial prompts and jailbreaking.
Malicious content embedded deep within a long context is harder to detect.
Conclusion
Balancing Act:
Selecting the right number of tokens involves balancing information needs with computational demands and potential performance issues.