
Understanding Large Language Models (LLMs)

Dec 18, 2024

Lecture: Building Large Language Models (LLMs)

Introduction

  • Focus on building and understanding Large Language Models (LLMs).
  • Examples of LLMs: ChatGPT, Claude, Gemini, Llama.
  • Overview of key components: architecture, training loss, training algorithm, data, evaluation, and systems.

Key Components of LLMs

Architecture

  • LLMs are neural networks, specifically based on transformers.
  • Architecture is discussed only briefly, since it was covered in previous lectures and is well documented online.

Training Loss and Algorithm

  • Covers how models are trained: the loss being optimized and the algorithm that optimizes it.
  • Distinguishes pre-training (modeling raw language) from post-training (turning the model into an AI assistant).

Data

  • Critical for training models.
  • Involves large internet datasets and careful curation/cleaning.

Evaluation

  • Evaluation through perplexity and NLP benchmarks.
  • Challenges in evaluation are acknowledged.

Systems

  • Concerns running models efficiently on modern hardware.
  • Increasingly important because of the sheer size of the models.

Pre-Training

Language Modeling

  • LLMs model a probability distribution over sequences of words (tokens).
  • Autoregressive models factor this distribution with the chain rule of probability and predict one token at a time, as shown below.
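
In symbols, the chain rule factors the joint probability of a sequence into next-token conditionals; the model is trained to predict each token from its prefix:

```latex
% Autoregressive factorization via the chain rule of probability
p(x_1, \dots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_1, \dots, x_{t-1})
```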

Generative Models

  • As generative models, language models can generate new text by repeatedly sampling from the learned next-token distribution (sketched below).
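
A minimal sketch of temperature sampling in plain Python; the function name and setup are illustrative, not from the lecture:

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    """Sample a token id from a vector of logits using temperature scaling."""
    # Lower temperature sharpens the distribution; higher flattens it.
    scaled = [l / temperature for l in logits]
    # Softmax (subtract the max for numerical stability).
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one token id according to the resulting distribution.
    return random.choices(range(len(probs)), weights=probs, k=1)[0]

print(sample_next_token([2.0, 1.0, 0.1], temperature=0.7))
```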

Tokenization

  • Tokenizers segment raw text into tokens, typically subword units.
  • They matter because they generalize beyond whole words, handle typos, and shorten sequence length; see the sketch after this list.
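
A quick look at subword tokenization, assuming the `tiktoken` library is installed (`pip install tiktoken`); the choice of encoding is illustrative:

```python
import tiktoken

enc = tiktoken.get_encoding("gpt2")

tokens = enc.encode("Tokenizers handle typos gracefully")
print(tokens)                              # list of integer token ids
print([enc.decode([t]) for t in tokens])   # the subword pieces behind those ids
```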

Evaluation of LLMs

Perplexity

  • Measures how well the model predicts held-out text: the exponentiated average negative log-likelihood per token, i.e., the number of tokens the model is effectively hesitating between (sketch below). Depends on the tokenizer and dataset.
  • Not typically used in academic benchmarking, since it is not comparable across setups, but important during development.
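
A minimal sketch of the definition; the helper name is ours, not from the lecture:

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(average negative log-likelihood per token).

    `token_log_probs` holds the model's log p(x_t | x_<t) for each
    token of a held-out text (natural log assumed).
    """
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# A model that assigns every token probability 1/2 has perplexity 2:
print(perplexity([math.log(0.5)] * 10))  # 2.0
```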

Academic Benchmarks

  • HELM and the Hugging Face Open LLM Leaderboard aggregate model performance across many tasks.

Evaluation Challenges

  • Benchmarks can be inconsistent across implementations and suffer from train-test contamination.
  • Open-ended generations are hard to score automatically.

Data Collection and Processing

  • Involves crawling the web and filtering it down to high-quality data.
  • Steps include text extraction, deduplication, and heuristic quality filtering (a toy version is sketched after this list).
  • Synthetic data and multimodal data are active areas of progress.
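
A toy version of heuristic filtering and exact deduplication for web text; the thresholds are illustrative, not those of any particular pipeline:

```python
import hashlib

def keep_document(text, seen_hashes):
    # Exact dedup: drop documents we have already seen.
    h = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if h in seen_hashes:
        return False
    seen_hashes.add(h)
    # Heuristic quality filters: very short pages and pages that are
    # mostly non-alphabetic tend to be boilerplate or markup residue.
    if len(text.split()) < 50:
        return False
    alpha_ratio = sum(c.isalpha() for c in text) / max(len(text), 1)
    return alpha_ratio >= 0.6

seen = set()
doc = "word " * 100                     # stand-in for an extracted page
print(keep_document(doc, seen))         # True: kept
print(keep_document(doc, seen))         # False: exact duplicate dropped
```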

Scaling Laws

  • Larger models trained on more data perform predictably better.
  • Scaling laws fit this trend, letting you forecast performance from compute, model size, and data before training.
  • The Chinchilla paper shows how to split a fixed compute budget between model size and training tokens (sketch below).
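
A back-of-the-envelope sketch of the Chinchilla-style rule of thumb: training compute C ≈ 6·N·D (N = parameters, D = tokens), with the compute-optimal choice roughly D ≈ 20·N. The constants are the paper's approximations; treat this as illustrative only:

```python
def chinchilla_optimal(compute_flops):
    # Substitute D = 20 N into C = 6 N D  =>  C = 120 N^2.
    n_params = (compute_flops / 120) ** 0.5
    n_tokens = 20 * n_params
    return n_params, n_tokens

n, d = chinchilla_optimal(5.76e23)  # roughly Chinchilla's training budget
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")  # ~7e10 params, ~1.4e12 tokens
```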

Post-Training

Supervised Fine-Tuning (SFT)

  • Fine-tunes the pre-trained model on human-written examples of desired answers (loss sketched below).
  • Data requirements are modest compared with pre-training.
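
A minimal sketch of the SFT loss, assuming PyTorch and a causal LM that returns per-position logits; the function and variable names are illustrative:

```python
import torch
import torch.nn.functional as F

def sft_loss(logits, input_ids, prompt_len):
    """Next-token cross-entropy, computed only on the response tokens."""
    # Shift so that position t predicts token t+1.
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:].clone()
    # Mask out prompt tokens: we only train on the human-written answer.
    shift_labels[:, : prompt_len - 1] = -100  # ignored by cross_entropy
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
    )
```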

Reinforcement Learning from Human Feedback (RLHF)

  • Overcomes the imitation ceiling of SFT by directly maximizing human preferences.
  • Trains a reward model on human preference comparisons, then optimizes the LLM against it (see the sketch below).
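
A sketch of the pairwise (Bradley-Terry style) reward-model loss commonly used in RLHF, assuming PyTorch; `r_chosen` / `r_rejected` stand for scalar reward-model scores of preferred and dispreferred answers:

```python
import torch
import torch.nn.functional as F

def reward_model_loss(r_chosen, r_rejected):
    # Push the preferred answer's score above the dispreferred one's.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

loss = reward_model_loss(torch.tensor([1.3, 0.2]), torch.tensor([0.4, 0.9]))
print(loss)  # smaller when chosen answers consistently outscore rejected ones
```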

Evaluation of Post-Training

  • Hard because answers are open-ended, so perplexity no longer applies.
  • Relies on human pairwise comparisons (e.g., Chatbot Arena) and LLM-as-judge evaluations; one way to turn pairwise votes into ratings is sketched below.
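
Chatbot Arena ranks models from pairwise human votes; a standard Elo update is one simple way to turn such votes into ratings (the Arena itself has used Elo / Bradley-Terry-style estimates; this sketch is ours):

```python
def elo_update(rating_a, rating_b, a_wins, k=32):
    # Expected score of A under the Elo model.
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    score_a = 1.0 if a_wins else 0.0
    rating_a += k * (score_a - expected_a)
    rating_b += k * ((1 - score_a) - (1 - expected_a))
    return rating_a, rating_b

print(elo_update(1000, 1000, a_wins=True))  # (1016.0, 984.0)
```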

Miscellaneous

Systems Optimization

  • Low-precision computation and operator fusion reduce GPU memory traffic and improve utilization (sketched below).
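
A minimal sketch of both ideas, assuming PyTorch 2.x on a CUDA GPU: automatic mixed precision for low-precision matmuls, and torch.compile, which can fuse elementwise operators into fewer GPU kernels:

```python
import torch

model = torch.nn.Linear(4096, 4096).cuda()
compiled = torch.compile(model)  # graph capture enables kernel fusion

x = torch.randn(8, 4096, device="cuda")
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    y = compiled(x)  # matmul runs in bfloat16 instead of float32
print(y.dtype)  # torch.bfloat16
```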

Future Topics

  • Unexplored areas include architecture advances, inference-time optimizations, and the legality of data use.

Further Reading

  • Recommended related courses for deeper study: CS224N, CS324, and CS336.