
[Lecture 31] GPUs as General Purpose Processors: Overview

Apr 11, 2025

Lecture Notes: GPUs and Accelerators as General Purpose Processors

Introduction

  • Focus on GPUs as general purpose processors, not graphics processing.
  • Programming with CUDA or OpenCL for general purpose processing.

Lecture Organization

  1. Typical programming structure for CUDA/OpenCL.
  2. Bulk synchronous parallel programming model.
  3. Memory hierarchy and management.
  4. Performance optimization techniques.
  5. Collaborative computing (time permitting).

Recommended Readings

  • CUDA Programming Guide.
  • Programming Massively Parallel Processors by Wen-mei Hwu and David Kirk.

Historical Context

  • General purpose GPU programming began 12-13 years ago.
  • Tesla architecture: first Nvidia GPUs for general purpose processing.

GPU Architecture

  • 240 streaming processors (also called CUDA cores).
  • SIMT execution across 30 streaming multiprocessors (SMs), each containing SIMD pipelines.
  • Memory hierarchy includes a shared L2 cache and HBM2 device memory.
  • Tensor cores for machine learning, introduced in the Volta architecture.

Programming Model

  • Bulk Synchronous Parallel Programming Model.
  • No global synchronization inside a kernel; threads in different blocks synchronize only at kernel boundaries.
  • SPMD model allows for irregular workloads.
  • Key: correctly mapping computation to threads and blocks (see the vector-add sketch after this list).
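
A minimal sketch of the typical CUDA structure: allocate device memory, copy inputs over, launch a kernel in which each thread computes one output element, and copy results back. Kernel and buffer names are illustrative, not from the lecture.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Each thread handles one element: the computation-to-thread mapping is
// the global index blockIdx.x * blockDim.x + threadIdx.x.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                       // guard: the grid may be larger than n
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    int threads = 256;                          // threads per block
    int blocks = (n + threads - 1) / threads;   // enough blocks to cover n elements
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);              // expect 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```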

GPU Memory Hierarchy

  • Components: registers, shared memory, global memory.
  • Shared memory allows fast data exchange within a block (see the reduction sketch after this list).
  • Memory coalescing is important for performance.
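
A sketch of how threads in one block exchange data through shared memory: stage values in on-chip `__shared__` storage, synchronize with `__syncthreads()`, then reduce within the block. The kernel name and tile size are illustrative.

```cuda
// Block-level partial sum. Launch with 256 threads per block (a power of
// two) so the tile and the tree reduction line up.
__global__ void blockSum(const float *in, float *blockSums, int n) {
    __shared__ float tile[256];                 // one slot per thread in the block

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f; // load from global into shared memory
    __syncthreads();                            // barrier: all loads visible to the block

    // Tree reduction inside the block; only intra-block synchronization is possible.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        blockSums[blockIdx.x] = tile[0];        // one partial sum per block
}
```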

Performance Considerations

  • CPU-GPU data transfer bottlenecks.
  • Importance of occupancy and latency hiding through multi-threading.
  • Memory coalescing for accessing global memory efficiently.
  • Shared memory bank conflicts and padding to improve performance (see the transpose sketch after this list).
  • Divergence avoidance in warp execution.
  • Optimizing atomic operations via privatization (see the histogram sketch after this list).
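
A common illustration of both coalescing and padding is a shared-memory matrix transpose: reads and writes to global memory stay coalesced because consecutive threads touch consecutive addresses, and padding the tile by one column keeps the column-wise shared-memory accesses from hitting the same bank. Tile size and names are illustrative.

```cuda
#define TILE 32

// Launch with dim3 block(TILE, TILE) and a grid covering the matrix.
__global__ void transpose(const float *in, float *out, int width, int height) {
    __shared__ float tile[TILE][TILE + 1];      // +1 padding avoids bank conflicts

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < width && y < height)
        tile[threadIdx.y][threadIdx.x] = in[y * width + x];   // coalesced read
    __syncthreads();

    // Swap block coordinates so the write is also coalesced.
    x = blockIdx.y * TILE + threadIdx.x;
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < height && y < width)
        out[y * height + x] = tile[threadIdx.x][threadIdx.y]; // coalesced write
}
```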

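Privatization in its classic form is a histogram: each block accumulates a private copy in shared memory with cheap shared-memory atomics, then merges into the global histogram once at the end, so contention on global atomics drops to one update per bin per block. Bin count and names are illustrative.

```cuda
#define NUM_BINS 256

__global__ void histogram(const unsigned char *data, int n, unsigned int *globalHist) {
    __shared__ unsigned int localHist[NUM_BINS];

    // Cooperatively zero the block-private histogram.
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        localHist[b] = 0;
    __syncthreads();

    // Grid-stride loop: atomics go to fast shared memory, not global memory.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += gridDim.x * blockDim.x)
        atomicAdd(&localHist[data[i]], 1u);
    __syncthreads();

    // Merge the private copy into the global histogram.
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        atomicAdd(&globalHist[b], localHist[b]);
}
```
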
Collaborative Computing

  • Unified memory for seamless CPU-GPU collaboration (see the sketch after this list).
  • Partitioning workloads between CPUs and GPUs for efficiency.
  • Fine-grained task partitioning possible with unified memory.
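
A small sketch of CPU-GPU collaboration via unified memory (cudaMallocManaged): both processors work on halves of the same allocation and the driver migrates pages on demand. The partition point and names are illustrative; here the CPU and GPU portions run back to back for portability, though devices that support concurrent managed access could overlap them.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Illustrative kernel: the GPU scales its share of a managed array in place.
__global__ void scaleGpu(float *data, int start, int end, float factor) {
    int i = start + blockIdx.x * blockDim.x + threadIdx.x;
    if (i < end)
        data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    const int split = n / 2;    // illustrative CPU/GPU partition point

    float *data;
    cudaMallocManaged(&data, n * sizeof(float));   // one allocation visible to CPU and GPU
    for (int i = 0; i < n; ++i) data[i] = 1.0f;

    // CPU handles the first half of the array.
    for (int i = 0; i < split; ++i) data[i] *= 2.0f;

    // GPU handles the second half; pages migrate to the device on demand.
    int threads = 256;
    int blocks = (n - split + threads - 1) / threads;
    scaleGpu<<<blocks, threads>>>(data, split, n, 2.0f);
    cudaDeviceSynchronize();    // wait for the GPU before reading its half on the CPU

    printf("data[0] = %f, data[n-1] = %f\n", data[0], data[n - 1]);
    cudaFree(data);
    return 0;
}
```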

Conclusion

  • GPUs are powerful for general purpose processing but require careful programming to optimize performance.
  • Collaborative patterns and unified memory are advancing the efficiency of CPU-GPU workloads.