Understanding Large Language Models

Oct 16, 2024

Introduction to Large Language Models

Overview

  • Re-recorded talk on large language models (LLMs) for YouTube.
  • Discusses LLMs using the example of Llama 2 70B by Meta AI.

What is a Large Language Model?

  • An LLM comprises just two files: a parameters file and a run file (the code that runs the parameters).
  • Example: the Llama 2 series, released in several sizes (7B, 13B, 34B, and 70B parameters).
  • Open-weights models like Llama 2 make their parameters available, unlike proprietary models such as ChatGPT.
  • Parameters are stored as float16, so the 70B model's parameters file is about 140 GB (see the sketch below).
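
A minimal sketch of the size arithmetic, assuming 2 bytes per parameter for float16; the figures are the approximate ones quoted in the talk.

```python
# Rough size of the parameters file for Llama 2 70B (illustrative arithmetic).
n_params = 70_000_000_000        # 70 billion parameters
bytes_per_param = 2              # float16 = 2 bytes per parameter
size_gb = n_params * bytes_per_param / 1e9
print(f"parameters file: ~{size_gb:.0f} GB")   # -> ~140 GB
```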

Running LLMs

  • A Llama 2 model can be run with just these two files on a laptop.
  • Basic inference requires no internet connection.
  • The demo uses a scaled-down model so generation runs at a watchable speed (a minimal inference sketch follows below).
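
A minimal local-inference sketch, assuming the Hugging Face transformers library and access to the gated meta-llama/Llama-2-7b-chat-hf weights; the talk's own demo uses a small compiled C program rather than this stack.

```python
# Local inference sketch: once the weights and code are on disk,
# no internet connection is needed to generate text.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"   # assumes license access to these weights
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "Write a short poem about the sky."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```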

Obtaining Parameters: Model Training

  • Training compresses a large text dataset (roughly 10 TB of internet text) into the parameters using a GPU cluster.
  • Llama 2 70B training specifics: roughly 6,000 GPUs for about 12 days, at a cost of around $2 million.
  • The result can be thought of as a lossy compression of the training text (rough arithmetic below).
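
Back-of-the-envelope arithmetic on the talk's approximate numbers; the per-GPU-hour cost is an implied figure, not a quoted one.

```python
# Rough "compression ratio" and compute figures (all numbers approximate).
dataset_gb = 10_000          # ~10 TB of internet text
params_gb = 140              # 70B parameters stored as float16
print(f"~{dataset_gb / params_gb:.0f}x lossy compression of the text")   # ~70x

gpus, days, cost_usd = 6_000, 12, 2_000_000
gpu_hours = gpus * days * 24
print(f"~{gpu_hours:,} GPU-hours, implying ~${cost_usd / gpu_hours:.2f} per GPU-hour")
```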

Neural Network Functionality

  • Task: predict the next word in a sequence.
  • Doing this well forces the network to learn a great deal of general world knowledge from the training data.
  • At inference time, the model generates text by repeatedly sampling the next word from the distribution it learned in training (toy sketch below).
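
A toy sketch of autoregressive generation; next_word_probs is a hypothetical stand-in for the network, which in reality computes the distribution from billions of parameters.

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat", "."]

def next_word_probs(context):
    # Hypothetical stand-in for the neural network: a real model computes a
    # context-dependent distribution; here it is a fixed uniform toy one.
    return np.full(len(vocab), 1.0 / len(vocab))

context = ["the", "cat"]
for _ in range(4):
    probs = next_word_probs(context)
    context.append(np.random.choice(vocab, p=probs))
print(" ".join(context))   # prints the context plus four sampled words
```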

Understanding Neural Networks

  • The architecture and training math are well understood, but how knowledge ends up distributed across billions of parameters is not.
  • Issue: models like GPT-4 show odd knowledge-retrieval failures, e.g. the "reversal curse": a model may answer "Who is Tom Cruise's mother?" correctly yet fail the reversed question "Who is Mary Lee Pfeiffer's son?".

Model Training and Fine-tuning

Pre-training vs. Fine-tuning

  • Pre-training: trains on large amounts of internet text to acquire general knowledge; this is the expensive stage.
  • Fine-tuning: swaps in a much smaller, high-quality Q&A dataset to shape the model's behavior into that of an assistant.

Fine-tuning Process

  • Collect a smaller set of high-quality, human-written question-and-answer conversations.
  • Continued training on this data teaches the model to answer in the style of a helpful assistant (data-format sketch below).
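
An illustrative sketch of how one Q&A example might be laid out for supervised fine-tuning; the tags and exact chat template are assumptions, since each model family uses its own format.

```python
# One hypothetical fine-tuning example (format is illustrative, not Llama 2's actual template).
example = {
    "question": "Can you explain what a neural network is?",
    "answer": "A neural network is a function with many adjustable parameters...",
}

training_text = (
    "<user>\n" + example["question"] + "\n"
    "<assistant>\n" + example["answer"]
)
# Fine-tuning continues next-token prediction on text like this; typically the
# loss is only counted on the assistant's answer tokens.
print(training_text)
```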

Further Fine-tuning (Stage 3)

  • Uses comparison labels: a labeler picks the best of several candidate responses, which is often easier than writing an answer from scratch.
  • Reinforcement Learning from Human Feedback (RLHF) is an example method for this stage (reward-model loss sketch below).
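
A sketch of the pairwise objective commonly used to turn comparison labels into a reward model (a Bradley-Terry style loss); the scores below are made-up numbers, and the talk does not spell out Meta's exact formulation.

```python
import math

def comparison_loss(score_chosen, score_rejected):
    # Low loss when the reward model scores the labeler's chosen response higher.
    return -math.log(1.0 / (1.0 + math.exp(-(score_chosen - score_rejected))))

print(comparison_loss(2.0, 0.5))   # small loss: model agrees with the comparison label
print(comparison_loss(0.5, 2.0))   # large loss: model disagrees
```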

Labeling Instructions

  • Labeling instructions can run to many pages, asking labelers to produce helpful, truthful, and harmless outputs.

Human-Machine Collaboration

  • Labeling is increasingly a human-machine collaboration: models draft or check candidate labels and humans verify them, reducing the human workload.

Current Model Landscape and Performance

Open vs. Proprietary Models

  • Closed models (e.g., the GPT series, Claude) currently perform best, but their weights are not available to users.
  • Open-weights models (e.g., Llama 2) can be freely fine-tuned and run by anyone.

Scaling Laws

  • Next-word-prediction accuracy is a smooth, predictable function of N (number of parameters) and D (amount of training text).
  • Simply scaling up models and data reliably improves performance, without requiring new algorithms (illustrative formula below).
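
An illustrative scaling-law sketch in the style of the published Chinchilla fit (Hoffmann et al., 2022); the constants here are rough placeholders, not values from the talk.

```python
# Loss predicted from parameter count N and training tokens D (constants are
# illustrative placeholders in the spirit of the Chinchilla fit, not exact values).
def predicted_loss(N, D, E=1.7, A=400.0, B=400.0, alpha=0.34, beta=0.28):
    return E + A / N**alpha + B / D**beta

print(predicted_loss(N=7e9,  D=2e12))    # smaller model: higher predicted loss
print(predicted_loss(N=70e9, D=2e12))    # 10x more parameters: lower predicted loss
```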

Future Directions

System 1 vs. System 2 Thinking

  • LLMs currently produce each next word instinctively, analogous to System 1 thinking.
  • Goal: a System 2 mode in which the model can trade time and compute for more accurate answers, deliberating before responding (speculative sketch below).
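
One speculative way to convert extra inference compute into accuracy, in the spirit of System 2 (a generic self-consistency idea, not a method from the talk); ask_model is a hypothetical stand-in for an LLM call.

```python
import random
from collections import Counter

def ask_model(question):
    # Hypothetical stand-in for an LLM call that is usually, but not always, right.
    return random.choice(["42", "42", "42", "41"])

def answer_with_deliberation(question, n_samples=9):
    # Spend more inference-time compute: sample several answers, keep the majority.
    votes = Counter(ask_model(question) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

print(answer_with_deliberation("What is 6 x 7?"))
```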

Self-improvement

  • Inspired by AlphaGo, which surpassed imitation of human games through self-play reinforcement learning.
  • Challenge: open-ended language tasks lack the clear reward function that makes self-improvement straightforward in narrow domains like Go.

Customization and Specialized Tasks

  • Customizing models for specific tasks, e.g. via OpenAI's GPTs App Store, by adding custom instructions and uploaded reference files.

Challenges and Security Concerns

Jailbreak and Prompt Injection Attacks

  • Jailbreaks bypass safety training via roleplay framing or by encoding the request (e.g., in base64), as illustrated below.
  • Prompt injections hijack the model with attacker-supplied instructions hidden in content it processes (e.g., faint text embedded in an image or a web page).
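
A harmless illustration of the encoded-query idea: safety training largely covers plain-English requests, so the same text in base64 may not trigger a refusal (the string here is benign).

```python
import base64

query = "Tell me a secret about the weather."         # benign placeholder text
encoded = base64.b64encode(query.encode()).decode()   # what an attacker would send
print(encoded)
print(base64.b64decode(encoded).decode())             # what the model effectively reads
```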

Data Poisoning and Backdoor Attacks

  • Manipulated training or fine-tuning data could plant trigger phrases that switch the model into attacker-controlled behavior when they appear in a prompt.

Defense and Ongoing Security Efforts

  • Defenses are developed and patched in as new attacks emerge, in a cat-and-mouse dynamic similar to traditional security.

Conclusion

  • LLMs as part of a new computing paradigm with unique challenges and opportunities.
  • Active development and interest in improving capabilities and security of LLMs.