🧠 Understanding Large Language Models (LLMs)
Dec 18, 2024
Lecture: Building Large Language Models (LLMs)
Introduction
Focus on building and understanding Large Language Models (LLMs).
Examples of LLMs: ChatGPT, Claude, Gemini, Llama.
Overview of key components: architecture, training loss, training algorithm, data, evaluation, and systems.
Key Components of LLMs
Architecture
LLMs are neural networks, specifically based on transformers.
Discussion of architecture is kept brief, since it was covered in previous lectures and is well documented online.
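As a quick refresher (since the lecture doesn't dwell on it), the core transformer operation is scaled dot-product self-attention. A minimal NumPy sketch, ignoring multiple heads, masking, and batching:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each position builds its output as a weighted sum of all value
    vectors, with weights given by query-key similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq, seq) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V

# Self-attention on 4 tokens with 8-dimensional representations.
x = np.random.randn(4, 8)
print(scaled_dot_product_attention(x, x, x).shape)  # (4, 8)
```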
Training Loss and Algorithm
How a model is trained matters as much as its architecture.
Covers pre-training (modeling language at scale) and post-training (turning language models into AI assistants).
Data
Critical for training models.
Involves large internet datasets and careful curation/cleaning.
Evaluation
Evaluation through perplexity and NLP benchmarks.
Challenges in evaluation are acknowledged.
Systems
Concerns running and training models efficiently on modern hardware.
Essential given the enormous size of these models.
Pre-Training
Language Modeling
LLMs model a probability distribution over sequences of words (tokens).
Autoregressive models predict the next word from the preceding ones, factoring the joint probability via the chain rule.
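Concretely, the chain rule writes P(x_1, ..., x_n) = ∏_i P(x_i | x_1, ..., x_{i-1}), so the model only ever needs to score the next token. A toy Python sketch, where the uniform `next_token_prob` is a hypothetical stand-in for a real trained network:

```python
import math

VOCAB = ["the", "cat", "sat", "<eos>"]

def next_token_prob(context, token):
    """Hypothetical stand-in for a trained LLM: P(token | context).
    Uniform here purely for illustration."""
    return 1.0 / len(VOCAB)

def sequence_log_prob(tokens):
    """log P(x1..xn) = sum_i log P(x_i | x_1..x_{i-1})  (chain rule)."""
    return sum(
        math.log(next_token_prob(tokens[:i], tok))
        for i, tok in enumerate(tokens)
    )

print(sequence_log_prob(["the", "cat", "sat"]))  # 3 * log(1/4)
```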
Generative Models
Language models as generative models can generate text.
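Generation is then just repeated next-token sampling. A sketch reusing the same kind of hypothetical model stub:

```python
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "<eos>"]

def next_token_probs(context):
    """Hypothetical model stub: one probability per vocabulary token."""
    return [1.0 / len(VOCAB)] * len(VOCAB)

def generate(prompt, max_tokens=20):
    tokens = list(prompt)
    for _ in range(max_tokens):
        tok = random.choices(VOCAB, weights=next_token_probs(tokens))[0]
        if tok == "<eos>":          # stop token ends the generation
            break
        tokens.append(tok)
    return tokens

print(generate(["the"]))
```

In practice the sampling distribution is usually sharpened with a temperature or truncated (top-k / top-p) rather than sampled raw.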
Tokenization
Tokenizers segment text into tokens, units typically sitting between characters and words.
Why tokenizers matter: they generalize beyond a fixed word list, handle typos, and keep sequences short.
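Most modern tokenizers are trained with byte-pair encoding (BPE): start from characters and repeatedly merge the most frequent adjacent pair. A minimal training-loop sketch:

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across the corpus, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def bpe_train(corpus, num_merges):
    # Start from characters; each word is a tuple of current symbols.
    words = Counter(tuple(w) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        merges.append(pair)
        new_words = Counter()
        for symbols, freq in words.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                    out.append(symbols[i] + symbols[i + 1])  # apply the merge
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_words[tuple(out)] += freq
        words = new_words
    return merges

print(bpe_train("low lower lowest", num_merges=3))
# e.g. [('l', 'o'), ('lo', 'w'), ('low', 'e')]
```

Each learned merge becomes a vocabulary entry, which is how frequent words end up as single tokens while rare words fall back to subword pieces.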
Evaluation of LLMs
Perplexity
Measures how well the model predicts held-out text; intuitively, the average number of tokens the model hesitates between. Its value depends on the tokenizer and dataset.
No longer standard in academic benchmarking, but still important during development.
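Perplexity is the exponentiated average negative log-likelihood per token: PPL = exp(-(1/N) Σ log p_i). A minimal sketch, given the probabilities a model assigned to each observed token:

```python
import math

def perplexity(token_probs):
    """token_probs: model probability assigned to each observed token.

    A perfect model gives PPL = 1; uniform guessing over a vocabulary
    of size V gives PPL = V.
    """
    n = len(token_probs)
    avg_nll = -sum(math.log(p) for p in token_probs) / n
    return math.exp(avg_nll)

# Uniform guessing over a 4-word vocabulary -> perplexity 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # 4.0
```

Because the average is per token, the number depends directly on the tokenizer, which is why perplexities are only comparable when the tokenizer and evaluation data match.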
Academic Benchmarks
HELM and the Hugging Face Open LLM Leaderboard evaluate models across a broad range of tasks.
Evaluation Challenges
Inconsistencies between benchmark implementations and train-test contamination.
Grading open-ended generations reliably.
Data Collection and Processing
Involves web crawling (e.g., Common Crawl) and aggressive filtering to obtain quality data.
Steps include text extraction from HTML, deduplication, and heuristic quality filtering (sketched below).
Synthetic data and multimodal data as advancing areas.
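To make the filtering step concrete, here is a hedged sketch of the kinds of heuristics such pipelines apply; the specific thresholds are illustrative, not from the lecture:

```python
import hashlib

def passes_heuristics(doc: str) -> bool:
    """Illustrative quality filters; real pipelines use many more rules
    plus model-based filtering."""
    words = doc.split()
    if not (50 <= len(words) <= 100_000):                 # too short / too long
        return False
    if sum(len(w) for w in words) / len(words) > 15:      # gibberish check
        return False
    if doc.count("{") + doc.count("}") > 20:              # likely markup/code
        return False
    return True

def deduplicate(docs):
    """Exact dedup via content hashing; real pipelines also do fuzzy
    (MinHash-style) dedup at document and line level."""
    seen, kept = set(), []
    for doc in docs:
        h = hashlib.sha256(doc.encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            kept.append(doc)
    return kept

docs = ["some page text " * 20, "some page text " * 20, "hi"]
print(len(deduplicate([d for d in docs if passes_heuristics(d)])))  # 1
```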
Scaling Laws
Larger models trained on more data keep performing better, with no clear plateau yet.
Scaling laws let you predict performance gains from added compute, data, and parameters before training.
The Chinchilla paper derived the compute-optimal allocation: roughly 20 training tokens per model parameter.
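The Chinchilla finding is often summarized as a rule of thumb: train with about 20 tokens per parameter, with training compute approximated as C ≈ 6·N·D FLOPs (N parameters, D tokens). A back-of-envelope sketch under those two assumptions:

```python
def chinchilla_allocation(compute_flops, tokens_per_param=20.0):
    """Compute-optimal split assuming C ~= 6 * N * D and D ~= 20 * N.

    Solving: N = sqrt(C / (6 * tokens_per_param)), D = tokens_per_param * N.
    The 20x ratio is the commonly cited Chinchilla rule of thumb.
    """
    n_params = (compute_flops / (6 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example: a 1e24-FLOP training budget (illustrative).
n, d = chinchilla_allocation(1e24)
print(f"~{n/1e9:.0f}B parameters, ~{d/1e12:.1f}T tokens")
```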
Post-Training
Supervised Fine-Tuning (SFT)
Fine-tuning the pre-trained model on human-written examples of desired answers.
SFT data requirements are modest: a few thousand high-quality examples can suffice.
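Mechanically, SFT is ordinary next-token training on (prompt, answer) pairs, usually with the loss masked so only answer tokens count. A hedged PyTorch sketch (shapes and names illustrative):

```python
import torch
import torch.nn.functional as F

def sft_loss(logits, labels, answer_mask):
    """Cross-entropy on next-token prediction, counted only on answer tokens.

    logits: (seq, vocab) model outputs; labels: (seq,) target token ids;
    answer_mask: (seq,) 1.0 where the token belongs to the desired answer.
    """
    loss = F.cross_entropy(logits, labels, reduction="none")  # per-token loss
    return (loss * answer_mask).sum() / answer_mask.sum()

# Toy shapes: 6 tokens, vocabulary of 10; the last 3 tokens are the answer.
logits = torch.randn(6, 10)
labels = torch.randint(0, 10, (6,))
mask = torch.tensor([0., 0., 0., 1., 1., 1.])
print(sft_loss(logits, labels, mask))
```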
Reinforcement Learning from Human Feedback (RLHF)
Overcomes limitations of SFT by optimizing directly for human preferences instead of imitating demonstrations.
Involves training a reward model on human preference comparisons, then improving the LLM's outputs against it.
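The reward model is commonly trained on pairwise comparisons with a Bradley-Terry-style loss, pushing the preferred response's reward above the rejected one's. A minimal PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def reward_model_loss(r_chosen, r_rejected):
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).

    r_chosen / r_rejected: scalar rewards the model assigns to the
    human-preferred and dispreferred responses for the same prompt.
    """
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy batch of 4 preference pairs.
r_chosen = torch.tensor([1.2, 0.3, 2.0, -0.5])
r_rejected = torch.tensor([0.1, 0.5, 1.0, -1.0])
print(reward_model_loss(r_chosen, r_rejected))
```

The LLM is then tuned to maximize this reward (e.g., with PPO), typically with a penalty that keeps it close to the SFT model.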
Evaluation of Post-Training
Challenging because answers are open-ended, so perplexity no longer applies.
Relies on human evaluations (e.g., Chatbot Arena) and LLM-assisted evaluations.
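Chatbot Arena turns pairwise human votes into a leaderboard using an Elo-style rating system (newer versions fit a Bradley-Terry model, but the intuition is the same). The classic Elo update:

```python
def elo_update(rating_a, rating_b, score_a, k=32):
    """Classic Elo update after one A-vs-B comparison.

    score_a: 1.0 if A wins, 0.0 if A loses, 0.5 for a tie.
    """
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))
    rating_a += k * (score_a - expected_a)
    rating_b += k * ((1.0 - score_a) - (1.0 - expected_a))
    return rating_a, rating_b

# Two models start at 1000; model A wins one head-to-head vote.
print(elo_update(1000.0, 1000.0, score_a=1.0))  # (1016.0, 984.0)
```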
Miscellaneous
Systems Optimization
Low-precision computation and operator fusion to make better use of GPUs.
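In PyTorch terms, these correspond to mixed-precision autocasting and, as one route to kernel fusion, graph compilation. A hedged sketch (the toy model is illustrative only):

```python
import torch

# Toy model standing in for a transformer block.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
)

# Operator fusion: torch.compile can fuse elementwise ops into fewer kernels.
model = torch.compile(model)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
x = torch.randn(8, 1024, device=device)

# Low precision: run matmuls in bfloat16 to cut memory traffic and use
# fast matrix units where available.
with torch.autocast(device_type=device, dtype=torch.bfloat16):
    y = model(x)

print(y.dtype)  # torch.bfloat16 inside the autocast region
```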
Future Topics
Topics not covered include architecture advances, inference optimizations, and the legality of data usage.
Further Reading
Recommended related Stanford courses for deeper study: CS224N, CS324, and CS336.