Understanding Large Language Models (LLMs)
Apr 23, 2025
Lecture on Building Large Language Models (LLMs)
Overview
Focus: how LLMs work and the components needed to train one.
Key components: Architecture, training loss & algorithm, data, evaluation, and system.
LLMs are based on the Transformer architecture; this lecture emphasizes non-architecture aspects such as data, evaluation, and systems.
Key Components of LLMs
Architecture
LLMs are neural networks, and architecture choices matter.
Architecture is not detailed in this lecture because existing resources on Transformers are abundant.
Training
Involves specific training algorithms and loss functions.
The choice of training loss is crucial for model optimization.
Data
Critical to model performance; involves collecting and processing vast amounts of internet data.
Importance of data quality and diversity.
Evaluation
Determines model efficacy through benchmarks and specific metrics like perplexity.
Systems
Optimization of LLMs on hardware is crucial due to size and computational demands.
Pre-Training vs. Post-Training
Pre-training: the classical language modeling paradigm.
Models probability distributions over token sequences.
Post-training: a recent trend focused on building AI assistants.
Example: Transition from GPT-3 to ChatGPT.
Language Modeling
Probabilistic model over sequences of tokens.
Generative Models: generate sentences/data by sampling from a trained model.
Auto-regressive Language Models: predict the next token based on the previous ones.
Tokenization
Converts text to tokens, essential for handling diverse languages and typos.
BPE (Byte Pair Encoding): a common tokenization method (sketched below).
Tokens are typically subword units; good tokenization is necessary for effective model training.
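A toy BPE trainer to make the idea concrete: it repeatedly merges the most frequent adjacent symbol pair. Production tokenizers operate on bytes and are heavily optimized; this only illustrates the core loop:

```python
from collections import Counter

def bpe_merges(corpus, num_merges):
    """Tiny Byte Pair Encoding trainer: repeatedly merge the most frequent
    adjacent symbol pair, starting from words split into characters."""
    words = [list(w) for w in corpus]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for w in words:
            for a, b in zip(w, w[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the chosen merge everywhere in the corpus.
        new_words = []
        for w in words:
            merged, i = [], 0
            while i < len(w):
                if i + 1 < len(w) and (w[i], w[i + 1]) == best:
                    merged.append(w[i] + w[i + 1])
                    i += 2
                else:
                    merged.append(w[i])
                    i += 1
            new_words.append(merged)
        words = new_words
    return merges, words

merges, tokenized = bpe_merges(["lower", "lowest", "newer", "wider"], 4)
print(merges)     # learned merge rules, most frequent first
print(tokenized)  # corpus re-segmented into subword tokens
```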
Evaluation of LLMs
Perplexity: measures the model's uncertainty in next-token prediction (see the sketch below); less used for comparing models because its value depends on the tokenizer.
Academic benchmarks aggregate multiple NLP tasks.
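Perplexity is the exponential of the average negative log-likelihood per token; a quick sketch with hypothetical per-token log-probabilities:

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(average negative log-likelihood per token).
    Lower is better; the value depends on the tokenizer, which is why
    comparing models with different tokenizers is tricky."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# A model that assigns p = 0.25 to every token has perplexity 4.0.
print(perplexity([math.log(0.25)] * 8))
```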
Data for LLMs
Involves crawling and cleaning internet data.
Challenges: undesirable content and duplication, addressed with heuristic filtering and deduplication (sketched below).
Model-based filtering further improves data quality.
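A sketch of what heuristic filtering and exact deduplication can look like; the thresholds and rules here are illustrative assumptions, not those of any particular pipeline:

```python
import hashlib

def keep_document(text):
    """Illustrative heuristic filters in the spirit of web-data pipelines;
    real pipelines use many more, and more carefully tuned, rules."""
    words = text.split()
    if len(words) < 50:  # too short to be useful training text
        return False
    alpha_ratio = sum(c.isalpha() for c in text) / max(len(text), 1)
    if alpha_ratio < 0.6:  # mostly markup, numbers, or noise
        return False
    return True

seen_hashes = set()

def is_new_document(text):
    """Exact deduplication via content hashing; near-duplicate detection
    (e.g., MinHash) is also common in practice."""
    h = hashlib.sha1(text.encode()).hexdigest()
    if h in seen_hashes:
        return False
    seen_hashes.add(h)
    return True
```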
Scaling Laws
Larger models and more data improve performance.
Predictive scaling laws guide resource allocation and model size decisions.
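One concrete instance is the parametric loss fitted by Hoffmann et al. (2022) ("Chinchilla"); the coefficients below are that paper's fitted values and should be read as illustrative, not universal:

```python
def chinchilla_loss(N, D):
    """Parametric scaling law L(N, D) = E + A/N**alpha + B/D**beta,
    with N = parameters and D = training tokens. Coefficients are the
    fitted values reported by Hoffmann et al. (2022)."""
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / N**alpha + B / D**beta

# Predicted loss for a 1B-parameter model trained on 20B tokens:
print(chinchilla_loss(1e9, 20e9))
```

A well-known consequence of this fit is the compute-optimal rule of thumb of roughly 20 training tokens per parameter.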
Post-Training
Align LLMs to follow instructions and reflect human preferences.
Supervised Fine-Tuning (SFT): fine-tunes models on human-labeled data.
Reinforcement Learning from Human Feedback (RLHF): optimizes models based on human preferences.
DPO (Direct Preference Optimization): a simpler alternative to RLHF that maximizes the likelihood of preferred outputs over dispreferred ones (see the sketch after this list).
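For concreteness, a minimal PyTorch sketch of the DPO objective from Rafailov et al. (2023); the tensors below are hypothetical summed per-response log-probabilities from the policy and a frozen reference model:

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss: -log sigmoid(beta * [(logp_w - ref_logp_w) - (logp_l - ref_logp_l)]),
    where w is the preferred response and l the dispreferred one."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(beta * margin).mean()

# Hypothetical log-probs for a batch of two preference pairs:
loss = dpo_loss(
    torch.tensor([-10.0, -12.0]), torch.tensor([-11.0, -11.5]),
    torch.tensor([-10.5, -12.2]), torch.tensor([-10.8, -11.6]),
)
print(loss)
```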
Evaluation of Post-Training
Challenges in evaluating open-ended responses.
Human annotators or LM judges rank model outputs by preference (sketched below).
Importance of unbiased and effective evaluation methods.
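A sketch of pairwise win-rate evaluation; `judge` is a hypothetical callable (a human annotation interface or an LM judge) that returns whether its first argument is preferred, and randomizing presentation order is one common mitigation for position bias:

```python
import random

def win_rate(model_outputs, baseline_outputs, judge):
    """Fraction of pairwise comparisons the model wins against a baseline."""
    wins = 0
    for ours, theirs in zip(model_outputs, baseline_outputs):
        if random.random() < 0.5:        # randomize which side the judge sees first
            wins += judge(ours, theirs)
        else:
            wins += not judge(theirs, ours)
    return wins / len(model_outputs)

# Hypothetical judge that prefers longer answers (a known LM-judge bias):
print(win_rate(["a long answer", "hi"],
               ["short", "a longer reply"],
               lambda a, b: len(a) > len(b)))
```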
Systems and Computing
Efficient use of GPUs crucial for cost-effective LLM training.
Low Precision Computing: reduces memory use and increases speed.
Operator Fusion: combines operations into a single kernel to minimize data-transfer latency.
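A minimal PyTorch sketch combining both ideas, assuming a CUDA GPU and a toy stand-in model: `torch.autocast` runs matmuls in bfloat16, and `torch.compile` can fuse elementwise operations into fewer kernels:

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()  # hypothetical tiny "model"
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# torch.compile can fuse operations into fewer kernels, cutting memory traffic.
model = torch.compile(model)

x = torch.randn(32, 1024, device="cuda")
optimizer.zero_grad()
# Autocast runs matmuls in bfloat16 (less memory traffic, faster tensor cores)
# while keeping numerically sensitive ops in float32.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
```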
Conclusion
Training and deploying LLMs is complex, involving significant systems and data considerations.
Ongoing research and development in data collection, scaling laws, and post-training improvements.
Additional Learning Resources
CS courses on natural language processing and LLMs are recommended for deeper understanding.