Exploring Large Language Models Overview

May 28, 2025

Lecture on Large Language Models

Introduction

  • Overview of a recent 30-minute talk on large language models (LLMs).
  • Discussion on the Llama 270b model by Meta AI.
  • Comparison to models like GPT (OpenAI) with closed architecture.

What is a Large Language Model?

  • Consists of two files: a parameters file (70 billion parameters, 140GB) and a run file (neural network architecture code).
  • Can run on a MacBook, does not need internet connection.
  • Parameters represent a "lossy compression" of the internet, storing vast amounts of data in a compact form.

Training and Inference

  • Training: Involves compressing large amounts of internet text data using powerful GPU clusters.
    • Example: Llama 270b requires 6,000 GPUs, 12 days, $2 million.
  • Inference: Running the model—cheap compared to training.
  • The model predicts the next word in a sequence, learning a lot about the world.
  • Models achieve a "Gestalt" understanding rather than memorization of data.

Neural Network Architecture

  • Transformer neural network architecture is used.
  • The model learns to predict the next word, enabling it to "dream" or generate text resembling internet documents.
  • Models behave as empirical artifacts; difficult to understand fully but can be measured and evaluated.

Fine-Tuning and Improving Models

  • Pre-training: Phase where large datasets are used to gather knowledge.
  • Fine-tuning: Involves customizing models into assistant models through high-quality Q&A datasets.
    • Uses human labelers to create ideal response data.
  • Stage Two: Further fine-tuning using comparison labels (Reinforcement Learning from Human Feedback).
  • Open-source models like Llama allow easier fine-tuning by users.

Current State and Future Directions

  • Closed models (e.g., GPT, Claude) lead in performance, open models (e.g., Llama) follow closely.
  • Scaling Laws: Model performance improves predictably with increased parameters and data.
  • Capabilities: Models now use tools (browsers, calculators) for enhanced problem-solving.
  • Multi-modality: Involves understanding and generating from multiple types of input (text, images, audio).

System One vs. System Two Thinking

  • Current language models operate on instinct (System One).
  • System Two involves deeper reasoning and problem-solving, future goal for language models.
  • Self-improvement in narrow domains is a potential development path.

Security Concerns

  • Jailbreak Attacks: Exploit loopholes to bypass model restrictions (e.g., role-playing scenarios).
  • Data Poisoning: Attackers can manipulate model training data to introduce backdoors.
  • Prompt Injection: Injecting misleading instructions to alter model behavior.
  • Security in LLMs involves a continuous cycle of attack and defense.

Conclusion

  • LLMs are evolving into complex systems, comparable to operating systems in functionality.
  • The field is rapidly developing with both exciting opportunities and significant security challenges.