Overview
This lecture explains how graphics cards (GPUs) function, explores their physical and computational architecture, and discusses why they're ideal for video game rendering, Bitcoin mining, and AI.
GPU vs CPU
- GPUs execute trillions of calculations per second, essential for rendering modern video games.
- GPUs have thousands of simple cores; CPUs have far fewer (e.g., 24) but more complex cores, each more flexible and faster on a single task.
- GPU analogy: cargo ship (huge capacity, slow, inflexible route); CPU analogy: jumbo jet (fast, flexible, smaller payload).
- CPUs can run operating systems and a wide range of instructions; GPUs execute simple, repetitive calculations in parallel.
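The serial-vs-parallel contrast above can be sketched in plain Python, using a vectorized NumPy operation as a stand-in for thousands of GPU cores applying one instruction to many data points at once. The function names and pixel data are illustrative, not from the lecture:

```python
import numpy as np

# "CPU-style": handle one element at a time in a serial loop.
def brighten_serial(pixels, amount):
    out = []
    for p in pixels:
        out.append(min(p + amount, 255))  # clamp to 8-bit max
    return out

# "GPU-style": one instruction applied to the whole array at once.
def brighten_parallel(pixels, amount):
    return np.minimum(pixels + amount, 255)

pixels = [10, 200, 250, 128]
print(brighten_serial(pixels, 40))                      # [50, 240, 255, 168]
print(brighten_parallel(np.array(pixels), 40).tolist()) # same result, one op
```

Both produce the same answer; the point is that the second form maps each array element to an independent task, which is exactly the shape of work GPUs are built for.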
Physical Architecture of GPUs
- The GA102 GPU contains 28.3 billion transistors with 7 Graphics Processing Clusters (GPCs), each with 12 Streaming Multiprocessors (SMs).
- Each SM contains 4 processing blocks (called "warps" in the lecture) and 1 ray tracing core; each block has 32 CUDA (shading) cores and 1 tensor core.
- The RTX 3090 has 10,496 CUDA cores; chips with manufacturing defects have faulty cores disabled and are sold as lower-tier cards.
- CUDA cores perform simple arithmetic; tensor cores handle matrix math; ray tracing cores execute ray tracing algorithms.
- Special function units handle complex operations like division and trigonometric functions.
- Additional components: memory controllers, PCIe interface, 6 MB cache, Gigathread Engine for task scheduling.
Graphics Card Components
- Key parts: display ports, power connector, PCIe pins, voltage regulator module, heat sink/fans, and 24 GB of GDDR6X graphics memory.
- Graphics memory feeds the GPU with required data at high speeds (up to 1.15 TB/s bandwidth).
- Advanced memory (GDDR6X, GDDR7) uses multi-level signaling for faster data transfer.
- HBM (High Bandwidth Memory) is used for AI chips, enabling large, fast memory stacks.
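Multi-level signaling can be sketched as follows: instead of 2 voltage levels carrying 1 bit per symbol, GDDR6X-style PAM4 uses 4 levels to carry 2 bits per symbol, doubling data per transfer. The Gray-coded bit-to-level mapping below is a common convention assumed for illustration, not taken from the lecture or the GDDR6X spec:

```python
# PAM4-style multi-level signaling sketch: 4 voltage levels encode
# 2 bits per symbol (mapping is an illustrative Gray-code assumption).
LEVELS = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}

def encode_pam4(bits):
    # Pack each pair of bits into one 4-level symbol.
    assert len(bits) % 2 == 0
    return [LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

bits = [1, 0, 0, 1, 1, 1, 0, 0]
symbols = encode_pam4(bits)
print(symbols)  # 4 symbols carry all 8 bits -> twice the bits per transfer
```

Gray coding is typically chosen so that adjacent voltage levels differ by only one bit, limiting the damage from small signal errors.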
Computational Architecture & Parallelism
- GPUs excel at "embarrassingly parallel" problems, easily split into thousands of independent tasks.
- Use SIMD (Single Instruction Multiple Data): same operation executed across many data points (e.g., transforming millions of vertices in game scenes).
- SIMD maps onto the GPU's structure: threads → warps (32 threads) → thread blocks → grids, managed by the Gigathread Engine.
- Modern GPUs use SIMT (Single Instruction Multiple Threads), allowing independent execution within warps for greater flexibility.
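The vertex-transform example of SIMD can be sketched with NumPy, where one matrix operation (a single "instruction") transforms every vertex in the array at once instead of looping vertex by vertex. The rotation and sample vertices are illustrative:

```python
import numpy as np

# SIMD sketch: one operation (rotation about the z-axis) applied to
# many data points (all vertices) simultaneously.
def rotate_z(vertices, theta):
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    return vertices @ rot.T  # one matrix op transforms every row (vertex)

verts = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
print(rotate_z(verts, np.pi / 2).round(6))  # each vertex rotated 90 degrees
```

In a real game scene the `verts` array holds millions of rows, and each warp of 32 threads runs this same instruction on its own slice of the data.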
Bitcoin Mining & AI
- GPUs ran many SHA-256 hashes in parallel for Bitcoin mining, but ASICs have replaced them due to higher efficiency.
- Tensor cores multiply and add matrices rapidly, performing core operations required by neural networks and AI models.
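The mining workload above is a good example of an embarrassingly parallel problem: each nonce can be tested independently. A toy proof-of-work search using Python's `hashlib` sketches the idea; the header bytes and leading-zeros difficulty rule are simplified stand-ins for Bitcoin's real block format and target comparison:

```python
import hashlib

# Toy proof-of-work sketch: repeatedly compute double SHA-256 over a
# header + nonce until the hash meets a (simplified) difficulty rule.
# Every nonce is independent, which is why GPUs - and now ASICs - can
# test enormous numbers of them in parallel.
def mine(header: bytes, difficulty_zeros: int, max_nonce: int = 1_000_000):
    target_prefix = "0" * difficulty_zeros
    for nonce in range(max_nonce):
        data = header + nonce.to_bytes(4, "little")
        digest = hashlib.sha256(hashlib.sha256(data).digest()).hexdigest()
        if digest.startswith(target_prefix):
            return nonce, digest
    return None  # no valid nonce found in range

result = mine(b"example-block-header", difficulty_zeros=4)
print(result)  # (winning nonce, hash beginning with "0000")
```

Raising `difficulty_zeros` by one multiplies the expected number of hashes by 16, which is why raw parallel hash throughput, first from GPUs and then from purpose-built ASICs, dominates this workload.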
Key Terms & Definitions
- GPU (Graphics Processing Unit) – A processor specialized for high-volume, parallel numerical tasks, mainly graphics.
- CPU (Central Processing Unit) – The main processor handling general computing tasks.
- CUDA core – Simple processor in a GPU for arithmetic operations.
- Tensor core – Unit optimized for matrix math used in AI/neural networks.
- Ray tracing core – Handles algorithms for realistic light rendering.
- SIMD (Single Instruction Multiple Data) – Processing model applying one operation to many data points.
- SIMT (Single Instruction Multiple Threads) – Flexible GPU execution model where threads in a group can diverge.
- GDDR6X/GDDR7 – High-speed graphics memory standards.
- HBM (High Bandwidth Memory) – Stacked memory technology for high data rates.
Action Items / Next Steps
- Watch supplementary videos on CPU architecture and detailed graphics rendering pipelines.
- Review definitions of GPU components and parallel processing models for exam prep.