
Understanding Transformer Models and Their Applications

Apr 22, 2025

Transformer Models: An Introduction and Catalog (2023 Edition)

Overview

  • Transformer models overview and history
  • Discussion on foundational vs. fine-tuned models
  • Recent updates and additions in 2023

Update Summary

  • May 2023 Update

    • Added new models, especially from Llama family
    • Included new image of models added since Feb 2023
    • Explained difference between fine-tuned and pre-trained models
    • Added section on model license status
    • Excluded models without public details (e.g., GPT-4, PaLM 2)
    • Collaborators: Ananth Sankar, Jie Bing, Praveen Kumar Bodigutla, Timothy J. Hazen, and Michaeel Kazi
  • January 2023 Update

    • Added models contributing to ChatGPT development
    • Included models from EleutherAI, Anthropic, and Stability AI
    • Added discussion on Reinforcement Learning from Human Feedback (RLHF) and Diffusion models
    • Highlighted noteworthy transformers
    • Added timeline view of transformer models

What are Transformers?

  • Defined by architectural traits
  • Introduced in 2017 by Google researchers in "Attention Is All You Need"
  • Core features: Encoder-decoder architecture, multi-head attention

Key Architectural Features

  • Encoder/Decoder Architecture

    • Encoder: Encodes the input sequence into an intermediate representation
    • Decoder: Decodes that representation into the output sequence
    • Multi-head self-attention and feed-forward networks
  • Attention Mechanism

    • Maps a query and a set of key-value pairs to an output (a weighted sum of the values)
    • Allows token representations to be computed in parallel (see the sketch after this list)
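
To make the attention mechanism concrete, here is a minimal NumPy sketch of scaled dot-product attention as defined in "Attention Is All You Need"; the function name, array shapes, and toy inputs are illustrative choices, not taken from the catalog.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V, mask=None):
        """Q: (seq_q, d_k), K: (seq_k, d_k), V: (seq_k, d_v).
        Returns the attention output (seq_q, d_v) and the attention weights."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)            # similarity of each query to each key
        if mask is not None:
            scores = np.where(mask, scores, -1e9)  # block disallowed positions
        scores -= scores.max(axis=-1, keepdims=True)
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
        return weights @ V, weights

    # Toy usage: 3 query tokens attending over 4 key/value tokens
    rng = np.random.default_rng(0)
    out, w = scaled_dot_product_attention(rng.normal(size=(3, 8)),
                                          rng.normal(size=(4, 8)),
                                          rng.normal(size=(4, 16)))
    print(out.shape, w.shape)  # (3, 16) (3, 4)

Multi-head attention runs several such attention functions in parallel on learned projections of the queries, keys, and values, and concatenates the results.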

Foundation vs. Fine-tuned Models

  • Foundation Models: Trained on broad data, adaptable to various tasks
  • Fine-tuned Models: Further trained on specific data for a specific task
  • Examples: BERT and GPT-3 are foundation models; InstructGPT is GPT-3 fine-tuned to follow instructions (a fine-tuning sketch follows this list)
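
To make the foundation vs. fine-tuned distinction concrete, here is a minimal fine-tuning sketch using the Hugging Face transformers and datasets libraries; the checkpoint ("bert-base-uncased"), the IMDB dataset, and all hyperparameters are illustrative assumptions, not prescriptions from the catalog.

    # Minimal sketch: adapt a pretrained (foundation) encoder to a specific
    # downstream task by fine-tuning. Checkpoint, dataset, and hyperparameters
    # are illustrative assumptions.
    from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                              Trainer, TrainingArguments)
    from datasets import load_dataset

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)   # adds a new classification head

    dataset = load_dataset("imdb")           # example task-specific dataset
    dataset = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
        batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=1,
                               per_device_train_batch_size=8),
        train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),
        tokenizer=tokenizer,                 # enables dynamic padding
    )
    trainer.train()                          # updates the weights on the new task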

Impact of Transformers

  • Revolutionized NLP and extended beyond language to other fields (e.g., image, audio)
  • Hugging Face's role in making pretrained Transformers broadly accessible (see the usage sketch below)
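
As a small illustration of that accessibility, the pipeline API in Hugging Face transformers runs a pretrained model in a couple of lines; the task and the library-chosen default checkpoint are just an example.

    from transformers import pipeline

    # A single high-level call downloads a pretrained checkpoint and runs inference.
    classifier = pipeline("sentiment-analysis")
    print(classifier("Transformers made state-of-the-art NLP easy to use."))
    # e.g. [{'label': 'POSITIVE', 'score': 0.99}]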

Diffusion Models

  • State-of-the-art in image generation
  • Relation to autoencoders and GANs

Catalog Features

  • Organized by family, architecture, task, extensions, application, publication date

Pretraining Architecture

  • Encoder Pretraining (e.g., BERT): Used for understanding tasks
  • Decoder Pretraining (e.g., GPT): Used for generation tasks
  • Encoder-Decoder Pretraining (e.g., T5, BART): Best for sequence-to-sequence tasks (see the loading sketch after this list)
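
The three pretraining architectures map onto distinct Auto classes in Hugging Face transformers; a minimal loading sketch follows, with the checkpoint names ("bert-base-uncased", "gpt2", "t5-small") chosen only as familiar examples.

    from transformers import (AutoModelForMaskedLM, AutoModelForCausalLM,
                              AutoModelForSeq2SeqLM)

    # Encoder-only pretraining (understanding tasks): BERT-style masked LM
    encoder_model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

    # Decoder-only pretraining (generation tasks): GPT-style causal LM
    decoder_model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Encoder-decoder pretraining (sequence-to-sequence tasks): T5-style denoising
    seq2seq_model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")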

Pretraining Tasks

  • Language modeling, masked language modeling, denoising autoencoding
  • Next sentence prediction and replaced token detection (a masking sketch follows this list)
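
To make the masked language modeling objective concrete, below is a minimal PyTorch sketch of BERT-style token masking: roughly 15% of positions are selected as prediction targets, and of those 80% become [MASK], 10% become a random token, and 10% are left unchanged. The probabilities follow the BERT recipe; the function name and tensor layout are illustrative assumptions.

    import torch

    def mask_tokens(input_ids, mask_token_id, vocab_size, mlm_prob=0.15):
        """BERT-style masking: select ~15% of positions as prediction targets;
        replace 80% of them with [MASK], 10% with a random token, keep 10%."""
        labels = input_ids.clone()
        selected = torch.bernoulli(torch.full(input_ids.shape, mlm_prob)).bool()
        labels[~selected] = -100  # non-selected positions are ignored by the loss

        masked_ids = input_ids.clone()
        replace = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & selected
        masked_ids[replace] = mask_token_id

        random = (torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool()
                  & selected & ~replace)
        masked_ids[random] = torch.randint(vocab_size, input_ids.shape)[random]
        return masked_ids, labels  # the model learns to predict labels from masked_ids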

Applications

  • Primarily NLP, but extending to vision, audio, and other domains

Catalog Table

  • Detailed descriptions and links to model documentation

Further Reading

  • Hugging Face Transformers documentation
  • Various surveys and academic papers on LLMs

These notes provide a structured overview of Transformer models, their history, and their applications, and serve as a guide for exploring the different model families in the catalog and their capabilities.