🤖

Overview of Large Language Models

Aug 19, 2025

Overview

This lecture explains how large language models (LLMs), especially transformers, predict the next word in a text, how they are trained, and why their scale and behavior are unique.

How Language Models Predict Text

  • Large language models predict the next word in a text by assigning probabilities to all possible next words.
  • Chatbots use LLMs to generate responses by predicting one word at a time, creating natural conversation.
  • Although the model's probability calculations are deterministic, the response is produced by sampling, which can pick less likely words at random, so the same prompt can yield different responses (see the sketch after this list).
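
To make the prediction-and-sampling idea concrete, here is a minimal sketch, assuming Python with NumPy; the vocabulary, scores, and temperature value are invented for illustration and are not from the lecture.

    import numpy as np

    # Hypothetical next-word scores after a prompt such as "The cat sat on the".
    # Both the vocabulary and the numbers are made up for this example.
    vocab = ["mat", "floor", "roof", "moon"]
    logits = np.array([3.2, 2.1, 0.5, -1.0])

    def sample_next_word(logits, temperature=1.0, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        # Softmax turns raw scores into probabilities; this step is deterministic.
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        # Randomness enters here: sampling can occasionally pick a less likely word.
        return rng.choice(len(probs), p=probs)

    # Repeated calls on the same scores can return different words.
    for _ in range(3):
        print(vocab[sample_next_word(logits, temperature=0.8)])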

Training Language Models

  • Models are trained with vast amounts of text, often scraped from the internet.
  • Training adjusts internal continuous values called parameters or weights, often numbering in the hundreds of billions.
  • Training starts with random parameters and refines them using actual text examples.
  • The algorithm compares predicted words with actual text and tweaks parameters using backpropagation to increase accuracy (see the sketch after this list).
  • The sheer amount of computation required is enormous and only feasible with specialized hardware (GPUs).
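
As an illustration of that loop, here is a minimal sketch, assuming Python with PyTorch and a toy bigram model (a single weight matrix), vastly simpler than a real LLM but following the same compare-and-adjust pattern.

    import torch
    import torch.nn.functional as F

    # Toy corpus and vocabulary; real models train on vast amounts of scraped text.
    text = "the cat sat on the mat the dog sat on the rug"
    words = text.split()
    vocab = sorted(set(words))
    stoi = {w: i for i, w in enumerate(vocab)}
    data = torch.tensor([stoi[w] for w in words])

    # One weight matrix mapping the current word to scores for the next word.
    # It starts random; LLMs do the same with hundreds of billions of weights.
    W = torch.randn(len(vocab), len(vocab), requires_grad=True)
    optimizer = torch.optim.SGD([W], lr=0.5)

    for step in range(200):
        inputs, targets = data[:-1], data[1:]      # predict each next word
        logits = W[inputs]                         # scores over the vocabulary
        loss = F.cross_entropy(logits, targets)    # compare predictions with actual text
        optimizer.zero_grad()
        loss.backward()                            # backpropagation computes the adjustments
        optimizer.step()                           # tweak the parameters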

Pre-training and Fine-tuning

  • The initial stage, called pre-training, teaches the model to autocomplete random text passages.
  • To become effective assistants, models then undergo reinforcement learning from human feedback (RLHF), where humans flag and correct model errors (the sketch after this list contrasts the two kinds of training data).
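
To make the contrast concrete, the hypothetical records below sketch what a pre-training example and an RLHF preference example might look like; the field names and texts are illustrative, not drawn from any real dataset.

    # Pre-training data: raw text, where the target is simply the next word.
    pretraining_example = {
        "context": "Photosynthesis converts sunlight into",
        "next_word": "energy",
    }

    # RLHF data: a prompt plus candidate answers, with a human preference label
    # used to steer the model toward more helpful responses.
    rlhf_example = {
        "prompt": "Explain photosynthesis to a child.",
        "response_a": "Plants eat sunlight to make their own food.",
        "response_b": "Photosynthesis is a biochemical pathway in chloroplasts.",
        "preferred": "response_a",
    }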

Transformers and Attention

  • Transformers revolutionized LLMs by processing text in parallel rather than word by word.
  • Words are encoded as lists of numbers (vectors) since training only works with numerical values.
  • Transformers use the attention mechanism, letting words influence each other's meaning based on context (see the sketch after this list).
  • Another key operation is the feed-forward neural network, increasing the model's capacity to learn patterns.
  • Each processing layer refines the word representations to improve next-word predictions.
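
The following is a minimal sketch of scaled dot-product self-attention, assuming Python with NumPy and toy dimensions; real transformers use many attention heads per layer, followed by feed-forward layers, but the core operation is the same.

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def attention(X, Wq, Wk, Wv):
        # X holds one vector per word; Wq, Wk, Wv are learned weight matrices.
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])   # how strongly each word attends to every other word
        weights = softmax(scores, axis=-1)        # attention weights sum to 1 per word
        return weights @ V                        # context-aware word representations

    rng = np.random.default_rng(0)
    seq_len, d_model = 4, 8                       # 4 words, 8-dimensional vectors (toy sizes)
    X = rng.normal(size=(seq_len, d_model))       # word vectors for one short sentence
    Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
    print(attention(X, Wq, Wk, Wv).shape)         # (4, 8): one updated vector per word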

Model Behavior and Limitations

  • Model behavior emerges from the tuning of billions of parameters, making their decisions hard to interpret.
  • Output is often fluent and useful, but the exact reasons for predictions are difficult to pinpoint.

Key Terms & Definitions

  • Large Language Model (LLM) – A model that predicts the next word in text, trained on massive datasets.
  • Parameter/Weight – A continuous value inside a model, adjusted during training to improve predictions.
  • Backpropagation – The algorithm used to update parameters in neural networks based on prediction errors.
  • Transformer – A neural network architecture that uses attention and processes text in parallel.
  • Attention – An operation allowing words to influence each other's meaning in context.
  • Feed-forward Neural Network – A type of neural network layer for learning patterns in data.
  • Reinforcement Learning from Human Feedback (RLHF) – Training where humans correct model outputs to refine predictions.
  • GPU – A chip specialized for performing many operations in parallel, essential for training large models.

Action Items / Next Steps

  • Watch the suggested deep learning series for detailed explanations of transformers and attention.
  • View the lecturer's talk for further insights into LLMs.