Understanding Prompting and Fine-Tuning Methods

Aug 8, 2024

Lecture Notes: Prompting, Instruction Fine-Tuning, and RLHF

Introduction

  • Lecturer: Jesse Mu, PhD student in the CS Department (NLP group)
  • Topic: Prompting, Instruction Fine-Tuning, and RLHF (Reinforcement Learning from Human Feedback)
  • Relevance: Key concepts behind the training of modern chatbots like ChatGPT and Bing Chat

Course Logistics

  • Project Proposals: Due recently; mentors are being assigned.
  • Assignment 5: Due Friday at midnight; suggested tools include Colab, AWS, Azure, or Kaggle for GPU access.
  • Course Feedback Survey: Posted on Ed; due Sunday by 11:59 pm.

Lecture Overview

Large Language Models (LLMs)

  • Increase in compute and data for LLMs over the years.
  • Pre-training helps LLMs learn features like syntax, co-reference, sentiment, and more.
  • LLMs act as rudimentary world models due to vast internet data.
  • Examples of abilities: math reasoning, code generation, medical text comprehension.

Zero-shot and Few-shot Learning

  • Zero-shot learning: Perform tasks without explicit training, e.g., question answering by predicting the next token.
  • Few-shot learning: Specify tasks by giving example inputs/outputs in the prompt, which improves performance (see the prompt-assembly sketch after this list).
  • Models:
    • GPT (2018): 117 million parameters, trained on books.
    • GPT-2 (2019): 1.5 billion parameters, trained on 40GB web text.
    • GPT-3 (2020): 175 billion parameters, enables few-shot learning.
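
A minimal sketch of how a few-shot prompt can be assembled. The sentiment-classification task, the example reviews, and the commented-out model call are illustrative assumptions, not material from the lecture:

    # Minimal sketch of few-shot (in-context) learning: the task is specified
    # entirely in the prompt via input/output demonstrations, with no gradient
    # updates. The sentiment task and the commented-out model call are
    # illustrative placeholders, not a specific API.

    def build_few_shot_prompt(demonstrations, query):
        """Concatenate labeled demonstrations, then append the new input."""
        blocks = [f"Review: {text}\nSentiment: {label}" for text, label in demonstrations]
        blocks.append(f"Review: {query}\nSentiment:")
        return "\n\n".join(blocks)

    demonstrations = [
        ("The movie was a delight from start to finish.", "positive"),
        ("I walked out halfway through.", "negative"),
    ]

    prompt = build_few_shot_prompt(demonstrations, "A forgettable, by-the-numbers sequel.")
    # completion = language_model.complete(prompt)  # hypothetical model call
    print(prompt)

The model is expected to continue the pattern and emit a label for the final review, which is what makes few-shot specification work without any parameter updates.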

Prompt Engineering

  • Chain of Thought Prompting: Demonstrate reasoning steps in prompts to improve task performance.
  • Zero-shot Chain of Thought Prompting: Simple instructions like "let's think step by step" can improve results (see the sketch after this list).
  • Prompt Engineering: Emerging field, involves constructing effective prompts for various tasks.
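
As a concrete illustration of zero-shot chain-of-thought prompting, the sketch below appends the generic reasoning trigger to a question. The arithmetic question and the commented-out generation calls are placeholders, not the lecture's examples:

    # Sketch of zero-shot chain-of-thought prompting: a generic trigger phrase
    # asks the model to produce intermediate reasoning before the final answer.
    # The question and the commented-out generate() calls are hypothetical.

    question = "A cafeteria had 23 apples. It used 20 and bought 6 more. How many are left?"
    cot_prompt = f"Q: {question}\nA: Let's think step by step."

    # reasoning = language_model.generate(cot_prompt)          # hypothetical call
    # answer = language_model.generate(cot_prompt + reasoning +
    #                                  "\nTherefore, the answer (a number) is")
    print(cot_prompt)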

Instruction Fine-Tuning

  • Objective: Align language models with user intent by fine-tuning on instruction-output pairs (a minimal fine-tuning sketch follows this list).
  • Data Sets: Large datasets like Super-NaturalInstructions (over 1.6k tasks, ~3 million total instances) are used.
  • Evaluation Benchmarks: MMLU and BIG-Bench for assessing performance on diverse tasks.
  • Benefits: Generalizes to unseen tasks; smaller instruction-tuned models can outperform much larger models that are not fine-tuned.
  • Challenges: Human data collection is expensive; creative/open-ended tasks have no single correct answer; the token-level objective penalizes all mistakes equally, regardless of how much they matter.
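
A minimal sketch of what instruction fine-tuning looks like in code, assuming the Hugging Face transformers and PyTorch libraries. The gpt2 checkpoint, the two-example dataset, and the prompt template are placeholders; in practice one trains on a large instruction dataset and often masks the loss on the instruction tokens:

    # Minimal instruction fine-tuning sketch: supervised next-token training
    # on (instruction, output) pairs. Checkpoint, data, and template are
    # illustrative placeholders.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # stand-in for a pre-trained LM
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

    pairs = [
        ("Translate to French: Hello, how are you?", "Bonjour, comment allez-vous ?"),
        ("Summarize: The cat sat on the mat all afternoon.", "A cat lounged on a mat."),
    ]

    model.train()
    for instruction, output in pairs:
        # Standard language-modeling loss over the concatenated instruction + output.
        text = f"Instruction: {instruction}\nResponse: {output}{tokenizer.eos_token}"
        batch = tokenizer(text, return_tensors="pt")
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()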

Reinforcement Learning from Human Feedback (RLHF)

  • Objective: Maximize expected reward (human preference) for language model outputs.
  • Method:
    • Train a reward model to predict human preferences.
    • Use policy gradient methods to optimize language model parameters.
    • Include a penalty term to keep the policy from diverging too far from the pre-trained model (the objective is written out after this list).
  • Challenges: Human feedback is expensive, noisy, and miscalibrated; reward hacking; over-optimization.
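
In standard RLHF notation (a generic reconstruction; the lecture's exact symbols may differ), the policy p_theta^RL is trained to maximize the learned reward while staying close to the pre-trained model p^PT:

    \max_{\theta}\; \mathbb{E}_{\hat{s} \sim p_{\theta}^{\mathrm{RL}}(\cdot \mid s)}
      \Big[ \mathrm{RM}_{\phi}(s, \hat{s})
            \;-\; \beta \log \frac{p_{\theta}^{\mathrm{RL}}(\hat{s} \mid s)}{p^{\mathrm{PT}}(\hat{s} \mid s)} \Big]

where RM_phi is the learned reward model and beta controls the strength of the divergence penalty. The reward model itself is typically fit to pairwise human comparisons:

    \mathcal{L}_{\mathrm{RM}}(\phi) = -\,\mathbb{E}_{(s,\,\hat{s}^{w},\,\hat{s}^{l})}
      \Big[ \log \sigma\big( \mathrm{RM}_{\phi}(s, \hat{s}^{w}) - \mathrm{RM}_{\phi}(s, \hat{s}^{l}) \big) \Big]

where s^w is the human-preferred output, s^l the less-preferred one, and sigma the logistic sigmoid.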

Advanced Concepts and Future Directions

  • Constitutional AI: Using AI feedback to critique and improve language model outputs (a critique-and-revise sketch follows this list).
  • Self-Improvement: Fine-tuning models on their own outputs, particularly for Chain of Thought reasoning.
  • Challenges: High data requirements, reward hacking, hallucination, and security (jailbreaking issues).
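
A rough sketch of the critique-and-revise loop behind Constitutional-AI-style feedback. The principle text, prompt format, and the generate callable are illustrative assumptions, not the published method's exact prompts:

    # Rough sketch of an AI-feedback critique/revise loop: the model critiques
    # its own answer against a written principle, then revises it. The
    # principle, prompt format, and generate() callable are illustrative.

    PRINCIPLE = "Identify any ways the response is harmful, unethical, or inaccurate."

    def critique_and_revise(generate, prompt):
        """generate(text) -> str is a hypothetical call to a language model."""
        answer = generate(prompt)
        critique = generate(f"{prompt}\n{answer}\nCritique request: {PRINCIPLE}")
        revised = generate(f"{prompt}\n{answer}\nCritique: {critique}\n"
                           "Rewrite the response to address the critique:")
        return revised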

Conclusion

  • Current State: Instruction fine-tuning and RLHF have significantly improved LLM capabilities but still face challenges.
  • Future Work: Exploring safer and more efficient methods to align AI models with human values and preferences.
  • Open Questions: Addressing fundamental limitations like hallucination and data efficiency for RLHF.

Final Remarks

  • Exciting Time: Fast-paced developments in LLM research, requiring continual updates and innovations.

End of Notes