Attention Mechanisms: Overview of Bahdanau et al. (2014), "Neural Machine Translation by Jointly Learning to Align and Translate", the paper that introduced attention in sequence-to-sequence models.
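A minimal NumPy sketch of the additive attention scoring described there; the weight names (W_dec, W_enc, v) and all dimensions are illustrative assumptions, not values from the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def additive_attention(decoder_state, encoder_states, W_dec, W_enc, v):
    """Score each encoder state against the current decoder state,
    then return the attention-weighted context vector.

    decoder_state:  (d_dec,)    current decoder hidden state
    encoder_states: (T, d_enc)  one hidden state per source token
    W_dec, W_enc, v: learned parameters of the scoring network
    """
    # score_t = v^T tanh(W_dec s + W_enc h_t)  -- one scalar per source position
    scores = np.tanh(encoder_states @ W_enc.T + decoder_state @ W_dec.T) @ v
    weights = softmax(scores)            # alignment probabilities over source tokens
    context = weights @ encoder_states   # weighted sum of encoder states
    return context, weights

# Toy usage with random parameters (illustrative only)
rng = np.random.default_rng(0)
d_enc, d_dec, d_att, T = 8, 8, 16, 5
context, weights = additive_attention(
    rng.standard_normal(d_dec),
    rng.standard_normal((T, d_enc)),
    rng.standard_normal((d_att, d_dec)),
    rng.standard_normal((d_att, d_enc)),
    rng.standard_normal(d_att),
)
print(weights.round(3), context.shape)   # weights sum to 1; context is (d_enc,)
```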
Transformers: The base architecture for most modern LLMs. Overview of the paper "Attention Is All You Need": encoder and decoder stacks, multi-head attention, positional encoding.
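A short NumPy sketch of two ingredients named above, scaled dot-product attention and sinusoidal positional encoding, following the formulas in "Attention Is All You Need"; the single-head setup and toy shapes are simplifications.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (single head, no mask)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # (T_q, T_k) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # (T_q, d_v)

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(same angle)."""
    pos = np.arange(seq_len)[:, None]
    two_i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Toy usage: add positions to embeddings, then run self-attention (Q = K = V)
rng = np.random.default_rng(0)
x = rng.standard_normal((6, 8)) + sinusoidal_positional_encoding(6, 8)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)   # (6, 8)
```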
Generative AI vs. Discriminative AI
Discriminative Models: Classical supervised models (e.g., RNN classifiers) that learn the conditional distribution p(y | x), typically mapping inputs to outputs of fixed length.
Generative Models: Learn the underlying data distribution and can generate new data; training typically combines unsupervised pre-training, supervised fine-tuning, and reinforcement learning.
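To make the contrast concrete, a toy sketch: the discriminative model scores labels given an input, p(y | x), while the generative model learns p(x) and samples new sequences token by token. The hand-set logistic weights and the bigram "corpus" are invented for illustration.

```python
import numpy as np
from collections import defaultdict

# Discriminative: estimate p(y | x) and pick the most likely label.
# Hand-set logistic weights stand in for learned parameters.
def discriminative_predict(x, w=np.array([2.0, -1.5]), b=0.1):
    p_pos = 1.0 / (1.0 + np.exp(-(x @ w + b)))   # p(y = positive | x)
    return "positive" if p_pos > 0.5 else "negative"

# Generative: model p(x) over sequences and sample new data from it.
# A bigram counter over a toy corpus, then token-by-token sampling.
corpus = ["the cat sat", "the dog sat", "the cat ran"]
counts = defaultdict(lambda: defaultdict(int))
for line in corpus:
    tokens = ["<s>"] + line.split() + ["</s>"]
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1

def generative_sample(rng):
    token, out = "<s>", []
    while True:
        choices, freq = zip(*counts[token].items())
        token = rng.choice(choices, p=np.array(freq) / sum(freq))
        if token == "</s>":
            return " ".join(out)
        out.append(token)

rng = np.random.default_rng(0)
print(discriminative_predict(np.array([1.0, 0.2])))   # labels an existing input
print(generative_sample(rng))                         # generates a new sentence
```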
Large Language Models (LLMs)
Definition: Trained on huge datasets, capable of multiple tasks (text generation, summarization, etc.).
Foundations of LLMs: Large training-data requirements, neural-network complexity (scale and parameter count), unsupervised pre-training, supervised fine-tuning.
Model Types: Encoder-only (BERT), Decoder-only (GPT), Encoder-Decoder (T5); a loading sketch appears at the end of this section.
Applications: Various tasks like transcription, translation, question answering.
OpenAI and Open Source Models: Overview of models like GPT-3, GPT-4, BLOOM, and Llama 2.
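A hedged sketch of the three model families listed under Model Types, using the Hugging Face transformers library; the specific checkpoints (bert-base-uncased, gpt2, t5-small) are assumptions chosen for illustration, not models these notes single out.

```python
# pip install transformers torch   (assumed environment)
from transformers import (
    AutoTokenizer,
    AutoModel,                 # encoder-only backbones such as BERT
    AutoModelForCausalLM,      # decoder-only models such as GPT-2
    AutoModelForSeq2SeqLM,     # encoder-decoder models such as T5
)

# Encoder-only (BERT): produces contextual embeddings, usually fine-tuned for classification.
bert = AutoModel.from_pretrained("bert-base-uncased")
bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
embeddings = bert(**bert_tok("Attention is all you need.", return_tensors="pt")).last_hidden_state

# Decoder-only (GPT-style): autoregressive next-token generation.
gpt2 = AutoModelForCausalLM.from_pretrained("gpt2")
gpt2_tok = AutoTokenizer.from_pretrained("gpt2")
ids = gpt2.generate(**gpt2_tok("Large language models are", return_tensors="pt"), max_new_tokens=20)
print(gpt2_tok.decode(ids[0], skip_special_tokens=True))

# Encoder-decoder (T5): maps an input sequence to an output sequence, e.g. translation.
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
t5_tok = AutoTokenizer.from_pretrained("t5-small")
ids = t5.generate(**t5_tok("translate English to German: The cat sat.", return_tensors="pt"), max_new_tokens=20)
print(t5_tok.decode(ids[0], skip_special_tokens=True))
```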