Transformer Models: An Introduction and Catalog (2023 Edition)
Overview
- Transformer models overview and history
- Discussion on foundational vs. fine-tuned models
- Recent updates and additions in 2023
Update Summary
May 2023 Update
- Added new models, especially from the LLaMA family
- Included new image of models added since Feb 2023
- Explained difference between fine-tuned and pre-trained models
- Added section on model license status
- Excluded models without public details (e.g., GPT-4, PaLM 2)
- Collaborators: Ananth Sankar, Jie Bing, Praveen Kumar Bodigutla, Timothy J. Hazen, and Michaeel Kazi
January 2023 Update
- Added models contributing to ChatGPT development
- Included models from Eleuther.ai, Anthropic, Stability.ai
- Added discussion of Reinforcement Learning from Human Feedback (RLHF) and diffusion models
- Highlighted noteworthy transformers
- Added timeline view of transformer models
What are Transformers?
- Defined by architectural traits
- Introduced in 2017 by Google in "Attention Is All You Need"
Key Architectural Features
- Encoder-decoder architecture
- Multi-head self-attention (see the sketch below)
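To make multi-head attention concrete, here is a minimal NumPy sketch of scaled dot-product attention, the operation each head computes; the function name and shapes are illustrative, not tied to any library:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for a single attention head.

    Q, K, V: arrays of shape (seq_len, d_k) -- the query, key, and value
    projections of the input token embeddings.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # token-to-token affinities
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ V                             # weighted sum of values

# Toy self-attention: 4 tokens, one 8-dimensional head, Q = K = V.
x = np.random.default_rng(0).normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)  # (4, 8)
```

A full multi-head layer applies several such heads to different learned linear projections of the same input and concatenates their outputs.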
Foundation vs. Fine-tuned Models
- Foundation Models: Trained on broad data, adaptable to various tasks
- Fine-tuned Models: Further trained on specific data for a specific task
- Examples: BERT and GPT-3 are foundation models; InstructGPT is GPT-3 fine-tuned to follow instructions (see the fine-tuning sketch below)
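A minimal sketch of that adaptation step, assuming the Hugging Face transformers library, PyTorch, and the standard bert-base-uncased checkpoint: the pre-trained (foundation) encoder is loaded and a fresh classification head is attached for fine-tuning.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the pre-trained BERT encoder and attach a new 2-class head;
# only the head's weights are randomly initialized.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Fine-tuning would continue training this model on task-specific
# labeled data (e.g., with the Trainer API or a plain PyTorch loop).
inputs = tokenizer("Transformers are remarkably adaptable.", return_tensors="pt")
print(model(**inputs).logits.shape)  # torch.Size([1, 2])
```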
Impact of Transformers
- Revolutionized NLP and extended beyond language to other fields (e.g., image, audio)
- Hugging Face's role in making Transformers accessible (see the pipeline example below)
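For instance, the pipeline API runs a fine-tuned model in a few lines; this sketch assumes transformers is installed and lets it pick its default sentiment model, which downloads on first use.

```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # default fine-tuned model
print(classifier("The catalog makes these models easy to navigate."))
# e.g., [{'label': 'POSITIVE', 'score': 0.99...}]
```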
Diffusion Models
- State of the art in image generation
- Related to denoising autoencoders and GANs; trained to invert a gradual noising process (sketched below)
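A minimal sketch of the forward (noising) process that diffusion models are trained to invert, using the common variance-preserving formulation x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps; the schedule below mirrors the original DDPM setup but is otherwise illustrative.

```python
import numpy as np

def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0): interpolate toward pure Gaussian noise."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

# Linear beta schedule over 1000 steps (DDPM-style); alpha_bar is the
# cumulative product of (1 - beta_t).
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.normal(size=16)                           # toy "image"
x_mid = forward_diffuse(x0, 500, alpha_bar, rng)   # partially noised
x_end = forward_diffuse(x0, 999, alpha_bar, rng)   # nearly pure noise
```

The network is trained to predict the added noise at each step; generation then starts from pure noise and applies the learned reversal step by step.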
Catalog Features
- Organized by family, architecture, task, extensions, application, publication date
Pretraining Architecture
- Encoder pretraining: bidirectional context; used for understanding tasks (e.g., classification)
- Decoder pretraining: left-to-right context; used for generation tasks
- Encoder-decoder pretraining: best for sequence-to-sequence tasks such as translation and summarization (contrast illustrated below)
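The contrast between encoder and decoder pretraining is easy to see with two pipelines; this sketch assumes the transformers library plus the bert-base-uncased and gpt2 checkpoints.

```python
from transformers import pipeline

# Encoder pretraining (BERT): bidirectional context, predicts masked tokens.
fill = pipeline("fill-mask", model="bert-base-uncased")
print(fill("Paris is the [MASK] of France.")[0]["token_str"])  # e.g., 'capital'

# Decoder pretraining (GPT-2): left-to-right, predicts the next token.
generate = pipeline("text-generation", model="gpt2")
print(generate("The Transformer architecture", max_new_tokens=10)[0]["generated_text"])
```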
Pretraining Tasks
- Language modeling, masked language modeling, denoising autoencoding (masking sketched below)
- Next-sentence prediction, replaced-token detection
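As an example, masked language modeling builds its training pairs by corrupting the input; below is a simplified BERT-style masking sketch (real BERT masks about 15% of tokens and splits them 80/10/10 between [MASK], random tokens, and unchanged tokens; this version only applies [MASK]).

```python
import random

def mask_tokens(tokens, mask_token="[MASK]", mask_prob=0.15, seed=1):
    """Return (corrupted, labels): labels keep the original token at
    masked positions and None elsewhere (ignored by the training loss)."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            corrupted.append(mask_token)   # hide the token from the model
            labels.append(tok)             # ...and ask it to predict it
        else:
            corrupted.append(tok)
            labels.append(None)
    return corrupted, labels

print(mask_tokens("the cat sat on the mat by the door".split()))
```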
Applications
- Primarily in NLP but extends to other domains
Catalog Table
- Detailed descriptions and links to model documentation
Further Reading
- Hugging Face Transformers documentation
- Various surveys and academic papers on LLMs
These notes give a structured overview of Transformer models, their history, and their applications, serving as a guide for understanding and exploring the different model families and their capabilities.