📚

Exploring Large Language Models Overview

May 28, 2025

View transcript

Review flashcards

Lecture on Large Language Models

Introduction

Overview of a recent 30-minute talk on large language models (LLMs).
Discussion on the Llama 270b model by Meta AI.
Comparison to models like GPT (OpenAI) with closed architecture.

What is a Large Language Model?

Consists of two files: a parameters file (70 billion parameters, 140GB) and a run file (neural network architecture code).
Can run on a MacBook, does not need internet connection.
Parameters represent a "lossy compression" of the internet, storing vast amounts of data in a compact form.

Training and Inference

Training: Involves compressing large amounts of internet text data using powerful GPU clusters.
- Example: Llama 270b requires 6,000 GPUs, 12 days, $2 million.
Inference: Running the model—cheap compared to training.
The model predicts the next word in a sequence, learning a lot about the world.
Models achieve a "Gestalt" understanding rather than memorization of data.

Neural Network Architecture

Transformer neural network architecture is used.
The model learns to predict the next word, enabling it to "dream" or generate text resembling internet documents.
Models behave as empirical artifacts; difficult to understand fully but can be measured and evaluated.

Fine-Tuning and Improving Models

Pre-training: Phase where large datasets are used to gather knowledge.
Fine-tuning: Involves customizing models into assistant models through high-quality Q&A datasets.
- Uses human labelers to create ideal response data.
Stage Two: Further fine-tuning using comparison labels (Reinforcement Learning from Human Feedback).
Open-source models like Llama allow easier fine-tuning by users.

Current State and Future Directions

Closed models (e.g., GPT, Claude) lead in performance, open models (e.g., Llama) follow closely.
Scaling Laws: Model performance improves predictably with increased parameters and data.
Capabilities: Models now use tools (browsers, calculators) for enhanced problem-solving.
Multi-modality: Involves understanding and generating from multiple types of input (text, images, audio).

System One vs. System Two Thinking

Current language models operate on instinct (System One).
System Two involves deeper reasoning and problem-solving, future goal for language models.
Self-improvement in narrow domains is a potential development path.

Security Concerns

Jailbreak Attacks: Exploit loopholes to bypass model restrictions (e.g., role-playing scenarios).
Data Poisoning: Attackers can manipulate model training data to introduce backdoors.
Prompt Injection: Injecting misleading instructions to alter model behavior.
Security in LLMs involves a continuous cycle of attack and defense.

Conclusion

LLMs are evolving into complex systems, comparable to operating systems in functionality.
The field is rapidly developing with both exciting opportunities and significant security challenges.

Full transcript