Lecture Notes on Computer Architecture: Processing in Memory (PIM)
Lecture Introduction
This is Lecture 3 of the Computer Architecture course, covering processing-in-memory (PIM) architectures.
Focus areas include processing near memory in this lecture, with processing using memory covered in later lectures.
Key Concepts of Processing in Memory
Processing in Memory (PIM): Incorporates compute capability within or near the memory or storage.
Processing Near Memory: Places processing elements near the memory arrays or storage units.
PIM is commonly divided into two approaches: processing near memory and processing using memory (computing with the operational properties of the memory circuitry itself).
Lecture Structure
Processing Near Memory: Focus of this lecture.
Real-world Processing in Memory Architectures: Covered in subsequent lectures.
Challenges and Enablers: What is needed to make PIM usable in real-world systems.
Processing Using Memory: To be covered in future lectures.
Why Processing in Memory?
Challenges:
Data access is a major bottleneck due to increasingly data-hungry applications.
High energy consumption and high latency, dominated by data movement rather than computation.
Energy-efficient and sustainable computing is essential.
Opportunities:
Minimize data movement by performing computations directly inside or close to the memory.
Utilize near data processing to improve overall system efficiency.
System Trends and Analysis
Bottlenecks:
Accessing main memory has significant overheads in both time and energy.
Typical designs are processor-centric, creating inefficiencies in data movement.
Key Statistics:
Up to 55% of execution time is spent fetching data from memory.
More than half of mobile system energy is spent on data movement.
High energy usage due to data movement rather than computation.
Key Solutions and Architectural Changes
Paradigm Shift: Transition to computing with minimal data movement.
Compute where the data is (processors, caches, memory, storage).
Memory as an Accelerator: Memory should not only store data but also perform computations when necessary.
New System Design: Develop innovative hardware and software interfaces, new ISAs, and programming frameworks.
Real-world PIM Systems
Samsung HBM-PIM: High-bandwidth memory with processing units integrated near the DRAM banks.
SK hynix AiM (Accelerator-in-Memory): GDDR6-based memory with specialized processing units for AI/ML workloads.
Alibaba HB-PNM: Hybrid-bonded logic-to-DRAM design with near-memory processing for recommendation systems.
UPMEM PIM: General-purpose processing units (DPUs) integrated in DRAM chips to accelerate memory-intensive tasks.
Examples and Programming Models
Example Workloads:
Graph processing: Large, sparse graphs benefit significantly from high-throughput, low-energy processing elements near memory.
Google Workloads for Consumer Devices: Improving energy efficiency by offloading functions like packing and quantization to near-memory processors.
Upmem PIM Programming:
Utilize up to 2560 DPUs, each with exclusive access to its own dedicated 64 MB DRAM bank.
Split workloads across DPUs and within DPUs across multiple tasklets to maximize parallelism.
Use at least 11 tasklets per DPU to keep all stages of the DPU's pipeline active and reach full throughput.
Conclusion
Processing in memory offers significant advantages but also requires careful design and programming adjustments.
The Upmem PIM system demonstrates real-world implementation and practical programming considerations.
Future lectures will further explore enabling adoption, detailed programming, and architectural challenges to ensure smooth integration of PIM systems into the real world.
Next Steps:
Understand practical PIM programming through lab exercises on Upmem PIM systems.
Further discussion on addressing architectural and programming challenges to boost PIM adoption in practical computing environments.
References
Detailed reading on processing in memory concepts, historical perspectives on the evolution of PIM, and recent advancements in both academic and industry-driven PIM research.
Access to SDK documentation and user manuals for hands-on exercises.