
Understanding Transformer Models and Their Applications

Apr 22, 2025

Transformer Models: An Introduction and Catalog (2023 Edition)

Overview

  • Transformer models overview and history
  • Discussion on foundational vs. fine-tuned models
  • Recent updates and additions in 2023

Update Summary

  • May 2023 Update

    • Added new models, especially from Llama family
    • Included new image of models added since Feb 2023
    • Explained difference between fine-tuned and pre-trained models
    • Added section on model license status
    • Excluded models without public details (e.g., GPT-4, PaLM 2)
    • Collaborators: Ananth Sankar, Jie Bing, Praveen Kumar Bodigutla, Timothy J. Hazen, and Michaeel Kazi
  • January 2023 Update

    • Added models contributing to ChatGPT development
    • Included models from EleutherAI, Anthropic, and Stability AI
    • Added discussion on Reinforcement Learning from Human Feedback (RLHF) and Diffusion models
    • Highlighted noteworthy transformers
    • Added timeline view of transformer models

What are Transformers?

  • Defined by architectural traits
  • Introduced in 2017 by Google researchers in "Attention Is All You Need"
  • Core features: Encoder-decoder architecture, multi-head attention

Key Architectural Features

  • Encoder/Decoder Architecture

    • Encoder: Encodes the input sequence into an intermediate representation
    • Decoder: Decodes that representation into the output sequence
    • Multi-head self-attention and feed-forward networks
  • Attention Mechanism

    • Maps a query and a set of key-value pairs to an output (a weighted sum of the values)
    • Allows token representations to be computed in parallel (see the sketch after this list)
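
To make the attention mechanism concrete, here is a minimal NumPy sketch of scaled dot-product attention as defined in "Attention Is All You Need"; the function name, array shapes, and toy inputs are illustrative choices, not taken from the catalog.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V, mask=None):
        """Q: (seq_q, d_k), K: (seq_k, d_k), V: (seq_k, d_v).
        Returns the attention output (seq_q, d_v) and the attention weights."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)            # similarity of each query to each key
        if mask is not None:
            scores = np.where(mask, scores, -1e9)  # block disallowed positions
        scores -= scores.max(axis=-1, keepdims=True)
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
        return weights @ V, weights

    # Toy usage: 3 query tokens attending over 4 key/value tokens
    rng = np.random.default_rng(0)
    out, w = scaled_dot_product_attention(rng.normal(size=(3, 8)),
                                          rng.normal(size=(4, 8)),
                                          rng.normal(size=(4, 16)))
    print(out.shape, w.shape)  # (3, 16) (3, 4)

Multi-head attention runs several such attention functions in parallel on learned projections of the queries, keys, and values, and concatenates the results.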

Foundation vs. Fine-tuned Models

  • Foundation Models: Trained on broad data, adaptable to various tasks
  • Fine-tuned Models: Further trained on specific data for a specific task
  • Examples: BERT and GPT-3 are foundation models; InstructGPT is GPT-3 fine-tuned to follow instructions (a fine-tuning sketch follows this list)
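
To make the foundation vs. fine-tuned distinction concrete, here is a minimal fine-tuning sketch using the Hugging Face transformers and datasets libraries; the checkpoint ("bert-base-uncased"), the IMDB dataset, and all hyperparameters are illustrative assumptions, not prescriptions from the catalog.

    # Minimal sketch: adapt a pretrained (foundation) encoder to a specific
    # downstream task by fine-tuning. Checkpoint, dataset, and hyperparameters
    # are illustrative assumptions.
    from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                              Trainer, TrainingArguments)
    from datasets import load_dataset

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)   # adds a new classification head

    dataset = load_dataset("imdb")           # example task-specific dataset
    dataset = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
        batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=1,
                               per_device_train_batch_size=8),
        train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),
        tokenizer=tokenizer,                 # enables dynamic padding
    )
    trainer.train()                          # updates the weights on the new task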

Impact of Transformers

  • Revolutionized NLP and extended beyond language to other fields (e.g., image, audio)
  • Hugging Face's role in making pretrained Transformers broadly accessible (see the usage sketch below)
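
As a small illustration of that accessibility, the pipeline API in Hugging Face transformers runs a pretrained model in a couple of lines; the task and the library-chosen default checkpoint are just an example.

    from transformers import pipeline

    # A single high-level call downloads a pretrained checkpoint and runs inference.
    classifier = pipeline("sentiment-analysis")
    print(classifier("Transformers made state-of-the-art NLP easy to use."))
    # e.g. [{'label': 'POSITIVE', 'score': 0.99}]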

Diffusion Models

  • State-of-the-art in image generation
  • Relation to autoencoders and GANs

Catalog Features

  • Organized by family, architecture, task, extensions, application, publication date

Pretraining Architecture

  • Encoder Pretraining (e.g., BERT): Used for understanding tasks
  • Decoder Pretraining (e.g., GPT): Used for generation tasks
  • Encoder-Decoder Pretraining (e.g., T5, BART): Best for sequence-to-sequence tasks (see the loading sketch after this list)
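
The three pretraining architectures map onto distinct Auto classes in Hugging Face transformers; a minimal loading sketch follows, with the checkpoint names ("bert-base-uncased", "gpt2", "t5-small") chosen only as familiar examples.

    from transformers import (AutoModelForMaskedLM, AutoModelForCausalLM,
                              AutoModelForSeq2SeqLM)

    # Encoder-only pretraining (understanding tasks): BERT-style masked LM
    encoder_model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

    # Decoder-only pretraining (generation tasks): GPT-style causal LM
    decoder_model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Encoder-decoder pretraining (sequence-to-sequence tasks): T5-style denoising
    seq2seq_model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")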

Pretraining Tasks

  • Language modeling, masked language modeling, denoising autoencoding
  • Next sentence prediction and replaced token detection (a masking sketch follows this list)
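
To make the masked language modeling objective concrete, below is a minimal PyTorch sketch of BERT-style token masking: roughly 15% of positions are selected as prediction targets, and of those 80% become [MASK], 10% become a random token, and 10% are left unchanged. The probabilities follow the BERT recipe; the function name and tensor layout are illustrative assumptions.

    import torch

    def mask_tokens(input_ids, mask_token_id, vocab_size, mlm_prob=0.15):
        """BERT-style masking: select ~15% of positions as prediction targets;
        replace 80% of them with [MASK], 10% with a random token, keep 10%."""
        labels = input_ids.clone()
        selected = torch.bernoulli(torch.full(input_ids.shape, mlm_prob)).bool()
        labels[~selected] = -100  # non-selected positions are ignored by the loss

        masked_ids = input_ids.clone()
        replace = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & selected
        masked_ids[replace] = mask_token_id

        random = (torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool()
                  & selected & ~replace)
        masked_ids[random] = torch.randint(vocab_size, input_ids.shape)[random]
        return masked_ids, labels  # the model learns to predict labels from masked_ids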

Applications

  • Primarily NLP, but extending to vision, audio, and other domains

Catalog Table

  • Detailed descriptions and links to model documentation

Further Reading

  • Hugging Face Transformers documentation
  • Various surveys and academic papers on LLMs

These notes provide a structured overview of Transformer models, their history, and their applications, and serve as a guide for exploring the different model families in the catalog and their capabilities.