System Design Concepts: Capacity Estimations

Jul 21, 2024

System Design Concepts for Beginners: Capacity Estimations

Introduction

  • Capacity estimations in system design interviews are crucial, especially in big tech companies.
  • Typically follows after clarifying system requirements.
  • Often, estimations don't directly affect design in interviews.
  • Estimations demonstrate problem-solving approach.

Importance in Real Life

  • To determine resources needed for high-level system functioning:
    • Number of servers
    • Data partitions
    • Memory for caching

Key Points of Estimation

  • Demonstrates organized thinking and problem-solving skills.
  • Important numbers to estimate:
    • Traffic
    • Storage
    • Bandwidth
    • Memory for caching
  • Focus on traffic and storage in interviews; optional to include bandwidth and caching.

Traffic Estimation

  • Input: Number of daily active users (DAUs).
    • Example: 10 million DAUs.
  • Convert users to requests:
    • Consider user interactions (read and write requests).
  • **Typical usage consideration:**
    • Example scenario: Paste bin service (upload & view text).
    • Assume 10% of users upload once per day; therefore, 10 million * 10% = 1 million write requests per day.
    • Determine read to write ratio; assume read-heavy (example: 50 to 1 ratio).
    • Calculate read requests per day (example: 1 million writes * 50 = 50 million reads per day).
  • Convert to requests per second:
    • Divide daily requests by seconds in a day (86,400, rounded to ~100K for easy math).
    • Example results: 1 million / 100K = 10 write requests/second and 50 million / 100K = 500 read requests/second.
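
The traffic arithmetic above can be sketched in a few lines of Python. This is only a sketch of the paste bin example; the function and parameter names are illustrative, and every input is an assumed figure from these notes rather than a measurement.

```python
# Back-of-the-envelope traffic estimate for the paste bin example.
# All inputs are the assumed figures from the notes, not measurements.

SECONDS_PER_DAY_ROUNDED = 100_000  # 86,400 rounded up for easy mental math


def estimate_traffic(dau, write_fraction, writes_per_writer, read_write_ratio):
    """Return (writes/day, reads/day, writes/sec, reads/sec)."""
    writes_per_day = round(dau * write_fraction * writes_per_writer)
    reads_per_day = writes_per_day * read_write_ratio
    writes_per_sec = writes_per_day / SECONDS_PER_DAY_ROUNDED
    reads_per_sec = reads_per_day / SECONDS_PER_DAY_ROUNDED
    return writes_per_day, reads_per_day, writes_per_sec, reads_per_sec


writes_day, reads_day, write_rps, read_rps = estimate_traffic(
    dau=10_000_000, write_fraction=0.10, writes_per_writer=1, read_write_ratio=50
)
print(f"{writes_day:,} writes/day, {reads_day:,} reads/day")
print(f"{write_rps:.0f} writes/sec, {read_rps:.0f} reads/sec")
```

Dividing by the rounded 100K instead of 86,400 understates the rate by about 14%, which is well within the tolerance of this kind of estimate.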

Storage Estimation

  • Identify data artifacts: Focus on the largest storage requirement.
    • Example: Pastes in a paste bin service.
  • **Estimate storage based on data size:**
    • Example: Average paste size ~10KB.
    • Calculate daily storage need: 10KB * 1 million write requests = 10GB/day.
  • **Consider data retention period:**
    • Example: Data expires in 5 years => Storage for 5 years (~1,825 days, rounded to 2,000 days).
    • Calculate total storage: 10GB/day * 2000 days = 20TB.
    • Consider data replication (example: 3 times) => 60TB total.
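
As a rough check of the storage numbers above, the same steps can be written out in Python. The helper name and the decimal unit constants are assumptions made for this sketch; the inputs are the example figures from the notes.

```python
# Storage estimate: daily volume -> retention total -> replicated total.
# Decimal units (1 KB = 1,000 bytes) keep the mental math simple.
KB = 1_000
GB = 1_000_000_000
TB = 1_000_000_000_000


def estimate_storage_bytes(writes_per_day, avg_item_bytes, retention_days, replication_factor):
    """Return (daily bytes, total bytes over retention, total bytes with replicas)."""
    daily = writes_per_day * avg_item_bytes
    total = daily * retention_days
    replicated = total * replication_factor
    return daily, total, replicated


daily, total, replicated = estimate_storage_bytes(
    writes_per_day=1_000_000,
    avg_item_bytes=10 * KB,
    retention_days=2_000,  # 5 years is ~1,825 days, rounded up
    replication_factor=3,
)
print(f"{daily // GB} GB/day, {total // TB} TB total, {replicated // TB} TB replicated")
```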

Bandwidth Estimation

  • **Incoming data per second:**
    • Write requests: 10 requests/second * 10KB = 100KB/second.
  • **Outgoing data per second:**
    • Read requests: 500 requests/second * 10KB = 5MB/second.
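
Bandwidth is just request rate times payload size. A minimal sketch, assuming every read and write carries one average-sized (~10KB) paste; the function name is illustrative.

```python
# Bandwidth = requests per second * average payload size.
KB = 1_000  # decimal units, as in the notes
MB = 1_000_000


def bandwidth_bytes_per_sec(requests_per_sec, avg_payload_bytes):
    """Bytes transferred per second for a given request rate."""
    return requests_per_sec * avg_payload_bytes


incoming = bandwidth_bytes_per_sec(10, 10 * KB)   # write path
outgoing = bandwidth_bytes_per_sec(500, 10 * KB)  # read path
print(f"incoming: {incoming // KB} KB/s")
print(f"outgoing: {outgoing // MB} MB/s")
```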

Cache Memory Estimation

  • 80-20 rule for caching: roughly 20% of the data generates 80% of the traffic.
    • Example: Cache 20% of daily read requests.
    • Calculate memory for cache: 50 million daily reads * 20% * 10KB = 100GB (equivalently, 50 million * 2KB).
    • Consider duplicates (many reads hit the same paste); actual usage will be less.
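
The cache figure can be reproduced as follows. This is a sketch under the notes' assumptions (20% of one day's reads, ~10KB per paste); it gives an upper bound because repeated reads of the same paste are counted separately. The function name is hypothetical.

```python
# Cache sizing with the 80-20 rule: hold the payloads for 20% of daily reads.
KB = 1_000
GB = 1_000_000_000


def cache_memory_bytes(reads_per_day, cached_percent, avg_item_bytes):
    """Upper-bound cache size; duplicate reads are not deduplicated."""
    cached_reads = reads_per_day * cached_percent // 100
    return cached_reads * avg_item_bytes


cache_bytes = cache_memory_bytes(
    reads_per_day=50_000_000, cached_percent=20, avg_item_bytes=10 * KB
)
print(f"cache: {cache_bytes // GB} GB")
```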

Application Servers Estimation

  • Formula: Number of requests per second / Requests per second a single server can handle.
    • Consider server specifications and whether the workload is CPU-bound.
    • Example: 500 requests/second, 16 requests/second per server => 500 / 16 ≈ 32 servers; plan for roughly 30-50 servers.
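
The server count formula, rounded up to whole machines, might look like this. The `headroom` parameter is an assumption added for this sketch (it is not in the notes); it shows one way a baseline of about 32 servers could grow toward the upper end of the 30-50 range.

```python
import math


def servers_needed(peak_rps, rps_per_server, headroom=0.0):
    """Servers required, rounded up; headroom=0.5 adds 50% spare capacity."""
    return math.ceil(peak_rps / rps_per_server * (1 + headroom))


baseline = servers_needed(500, 16)                     # no spare capacity
with_headroom = servers_needed(500, 16, headroom=0.5)  # hypothetical 50% buffer
print(baseline, with_headroom)
```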

Common Sizes and Estimations

  • **Text Estimates:**
    • English language ~500K words.
    • One line of text ~10 words; one word ~5 characters, or 5 bytes at 1 byte per character.
  • **Media Estimates:**
    • HD image ~3MB (1280x720 pixels, 24-bit depth).
    • Profile image ~300KB (300x300 pixels).
    • 1 minute HD video ~50MB (compression included).
    • Consider storage for multiple resolutions (double size for HD).
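
The text and image figures above follow from simple multiplication, sketched here. The helper names are illustrative, and the image sizes are uncompressed bitmap sizes (width * height * bytes per pixel); real files in compressed formats are usually smaller.

```python
def raw_image_bytes(width, height, bit_depth=24):
    """Uncompressed bitmap size in bytes."""
    return width * height * bit_depth // 8


def text_bytes(lines, words_per_line=10, bytes_per_word=5):
    """Approximate text size, ignoring spaces and punctuation."""
    return lines * words_per_line * bytes_per_word


hd_image = raw_image_bytes(1280, 720)      # close to the ~3MB rule of thumb
profile_image = raw_image_bytes(300, 300)  # close to the ~300KB rule of thumb
one_line = text_bytes(1)
print(hd_image, profile_image, one_line)
```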

Conclusion

  • Estimations are approximate; round aggressively to keep the arithmetic simple.
  • Understand common sizes for text and media.
  • Demonstrates organized and methodical problem-solving skills in design interviews.