Overview
This lecture explains the basics of large language models (LLMs), how they predict and generate text, the transformative role of the Transformer architecture, and the distinction between web-based and code-based LLMs.
Intuition Behind Large Language Models
- LLMs work because language is predictable; words often appear together in meaningful, patterned ways.
- Predictability in language is quantified using probabilities assigned to words or sequences of words.
Types of Language Models
- The simplest model, the unigram language model, assigns probabilities to individual words based on frequency in large text collections (corpora).
- Bigram and trigram models extend this by considering the probability of a word given one or two preceding words, producing more coherent generated text (a minimal bigram sketch follows this list).
- Extending to higher-order n-gram models (more previous words) quickly becomes computationally impractical due to the vast number of combinations.
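A minimal sketch of a bigram model in Python; the toy corpus, word-level splitting, and sampling loop are assumptions for illustration only, not an implementation prescribed by the lecture:

```python
import random
from collections import defaultdict

# Toy corpus standing in for a large text collection (illustration only).
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word follows each preceding word.
bigram_counts = defaultdict(lambda: defaultdict(int))
for prev, curr in zip(corpus, corpus[1:]):
    bigram_counts[prev][curr] += 1

def next_word_probs(prev):
    """P(word | prev) estimated as the relative frequency of the bigram."""
    counts = bigram_counts[prev]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

print(next_word_probs("the"))  # -> {'cat': 0.25, 'mat': 0.25, 'dog': 0.25, 'rug': 0.25}

# Generate text by repeatedly sampling a word given only the previous one.
word = "the"
output = [word]
for _ in range(6):
    probs = next_word_probs(word)
    word = random.choices(list(probs), weights=list(probs.values()))[0]
    output.append(word)
print(" ".join(output))
```

With only one preceding word of context the generated text is barely coherent, which is exactly the limitation that motivates longer contexts and, ultimately, full LLMs.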
Large Language Models and the Internet
- Modern LLMs are trained on massive text datasets drawn from the internet, learning billions of parameters that capture statistical relationships between word sequences.
- This training enables them to generate human-like text responses to prompts.
Model Architectures and the Transformer
- Early LLMs used recurrent neural networks (RNNs), but these struggled with efficiency and with vanishing gradients over long sequences.
- Transformers, introduced by Google researchers in the 2017 paper "Attention Is All You Need," use an "attention" mechanism to relate each word to every other word in the input, greatly improving model performance.
- Attention allows the model to focus on relevant words and relationships without tracking the entire history, making computations more efficient.
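A rough sketch of the scaled dot-product attention computation in Python with NumPy; real Transformers add learned query/key/value projections, multiple heads, and positional information, none of which is shown here:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Relate every position to every other position via attention weights."""
    d_k = Q.shape[-1]
    # Similarity of each query with every key, scaled to stabilize the softmax.
    scores = Q @ K.T / np.sqrt(d_k)                           # shape: (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over each row
    # Each output vector is a weighted mix of all the value vectors.
    return weights @ V, weights

# Toy example: 4 "words", each represented by an 8-dimensional vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
# In a real Transformer, Q, K, and V come from learned linear projections of x;
# here we reuse x directly just to show the mechanics.
output, attn = scaled_dot_product_attention(x, x, x)
print(attn.round(2))  # 4x4 matrix: how much each word attends to every other word
```

The 4x4 weight matrix is the "each word relates to every other word" relationship described above, computed in one pass rather than by stepping through the sequence.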
Categories and Examples of LLMs
- Popular web interfaces such as ChatGPT and Google Bard (now Gemini) make LLMs accessible without any coding.
- Many models (GPT, ELMo, BERT, Claude, Llama) are instead used through code, via APIs or libraries; these can be integrated into business systems for custom tasks.
- Platforms like Hugging Face provide access to hundreds of thousands of pre-trained models for tasks like text generation, translation, and classification.
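As one example of the code-based route, the sketch below uses the Hugging Face `transformers` library's `pipeline` API; the model name "gpt2" is just one small, publicly available model chosen for illustration, and running this requires the library installed plus an internet connection on first use:

```python
from transformers import pipeline

# Download a small pre-trained model and wrap it in a text-generation pipeline.
generator = pipeline("text-generation", model="gpt2")

# Ask the model to continue a prompt by predicting the next words.
result = generator("Large language models work because", max_new_tokens=30)
print(result[0]["generated_text"])
```

The same `pipeline` interface exposes other tasks (e.g., "translation" or "text-classification"), which is how the platform covers the range of applications mentioned above.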
Key Terms & Definitions
- Large Language Model (LLM) — A machine learning system trained to predict the next word(s) in text, based on massive datasets.
- Unigram Model — A model assigning probabilities to single words regardless of context.
- Bigram/Trigram/N-gram Model — Models that assign probabilities to word sequences of length two, three, or n, based on previous words.
- Transformer — A neural network architecture that uses attention mechanisms to efficiently model relationships between all words in a sequence.
- Attention — A technique allowing models to weigh the importance of each word relative to every other word in the input.
Action Items / Next Steps
- Review Transformer and attention mechanisms in more detail in the next lecture.
- Explore platforms like Hugging Face to see practical applications and available models.