System Design Mock Interview: Rate Limiter

Jul 4, 2024

Introduction

  • Speaker: Engineering Manager at Meta
  • Topic: Designing a rate limiter

What is a Rate Limiter?

  • Definition: A system that blocks or allows a certain number of requests within a specified period.
  • Purpose:
    • Prevents denial-of-service (DoS) and distributed denial-of-service (DDoS) attacks.
    • Reduces operational costs by managing server load.
    • Prevents system overload and ensures fair usage among users.

Implementation Considerations

  • Server-side Implementation:
    • Rationale: Harder for malicious actors to bypass compared to client-side implementations.
    • Throttling Criteria: Uses unique identifiers such as IP addresses or user IDs.
      • IP addresses are generally more reliable than user IDs for rate limiting, since user IDs can be multiplied cheaply by creating new accounts.
  • Communication:
    • HTTP 429 status response when a user is throttled.
    • Log traffic patterns for analysis and adjustments.

Rate Limiting Algorithms

  1. Token Bucket System:

    • Each incoming request consumes a token.
    • Tokens are refilled at a constant rate (e.g., 1 token/second).
    • When tokens are exhausted, subsequent requests are throttled (HTTP 429).
    • Downside: A burst of traffic can drain the bucket quickly, throttling subsequent legitimate requests.
    • Adjustment: Tune the bucket capacity and refill rate to match expected traffic.
  2. Fixed Window System:

    • Fixed number of requests allowed within a time window (e.g., 30 requests/10 seconds).
    • Once the limit is reached, no requests are allowed until the next window.
    • Downside: A burst straddling the boundary of two windows can briefly allow up to twice the limit.
    • Variants: Sliding windows to adjust dynamically based on traffic.
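
The token bucket described above can be sketched in a few lines. This is a minimal single-process illustration; the class and parameter names are my own, not from the talk:

```python
import time

class TokenBucket:
    """Token bucket: tokens refill at a constant rate; each request consumes one."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity          # maximum tokens the bucket can hold
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at bucket capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # caller should respond with HTTP 429
```

A bucket of capacity 3 with a refill rate of 1 token/second admits three back-to-back requests, then throttles until tokens accumulate again.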
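The fixed window variant is even simpler: count requests per (client, window) pair and reject once the count hits the limit. A minimal in-memory sketch (names are mine):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Allow at most `limit` requests per `window_seconds` per client key."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)   # (key, window index) -> request count

    def allow(self, key: str) -> bool:
        # All requests in the same window share one counter; a real system
        # would expire old (key, window) entries instead of growing forever.
        window_index = int(time.time() // self.window)
        bucket = (key, window_index)
        if self.counts[bucket] >= self.limit:
            return False   # caller should respond with HTTP 429
        self.counts[bucket] += 1
        return True
```

Note how the counter resets abruptly at each window boundary, which is exactly what produces the edge-of-window spikes mentioned above.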

Example: Implementing a Rate Limiter for Twitter

  • Preferred Algorithm: Sliding window.
  • Rationale: Handles traffic spikes and lulls effectively.
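
One common way to realize a sliding window is a sliding-window log: keep the timestamps of recent requests and evict those older than the window. This is a single-process sketch under that assumption (a production system would keep the log in a shared cache):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Sliding-window log: track timestamps of recent requests per client."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.log = {}   # key -> deque of request timestamps

    def allow(self, key: str) -> bool:
        now = time.monotonic()
        q = self.log.setdefault(key, deque())
        # Evict timestamps that have slid out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False   # caller should respond with HTTP 429
        q.append(now)
        return True
```

Because the window slides continuously with each request, there is no boundary at which a burst can double the effective limit.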

High-Level Design

Components

  • Rate Limiter Middleware: Processes requests before they reach API servers.
  • Rule Engine: Defines rate limiting rules and criteria.
    • Handles IP or user-based throttling.
  • High Throughput Cache: Stores request counts for fast read/write operations.
  • Logging Service: Logs data for retrospective analysis.

Workflow

  1. Client Requests: Sent to the Rate Limiter Middleware.
  2. Middleware Checks: Validates against rules from Rule Engine.
  3. Cache: Reads/Writes request counts or status.
  4. Response:
    • Success: Forward request to API servers and ultimately return a 200 response to the client.
    • Failure: Return HTTP 429 response.
  5. Logging: Logs the transaction for analysis.
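
The five workflow steps above can be tied together in a short middleware sketch. The rule engine, high-throughput cache, and logging service are stubbed here with in-memory stand-ins; all names are illustrative, not from the talk:

```python
import time
from collections import defaultdict

RULES = {"default": {"limit": 100, "window": 3600}}   # rule engine stand-in
cache = defaultdict(int)                              # high-throughput cache stand-in
audit_log = []                                        # logging service stand-in

def handle_request(client_id: str) -> int:
    """Return the HTTP status the middleware would produce for this request."""
    rule = RULES["default"]                           # 2. validate against the rules
    window_index = int(time.time() // rule["window"])
    key = (client_id, window_index)
    cache[key] += 1                                   # 3. read/write the request count
    # 4. forward to API servers (200) or throttle (429)
    status = 200 if cache[key] <= rule["limit"] else 429
    audit_log.append((client_id, status))             # 5. log for retrospective analysis
    return status
```

In a real deployment the forward step would proxy to the API servers and the cache writes would be atomic operations against a shared store.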

Additional Considerations

  • Rules Cache: Optimizes the read operations from Rule Engine.
  • Distributed Environment:
    • Use a Load Balancer to manage requests across multiple data centers.
    • Shared high-throughput cache across distributed data centers to maintain global rate limiting.
    • Replicate the rules cache across data centers if needed, so each region can read rules locally.
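
A common pattern for the shared, globally consistent counter is an atomic increment with a TTL (as Redis provides via INCR and EXPIRE). The sketch below uses a minimal in-memory stand-in for that shared cache; the interface is hypothetical:

```python
import time

class SharedCache:
    """In-memory stand-in for a shared cache exposing increment-with-TTL."""

    def __init__(self):
        self.store = {}   # key -> (count, expiry time)

    def incr_with_ttl(self, key: str, ttl: float) -> int:
        now = time.monotonic()
        count, expiry = self.store.get(key, (0, now + ttl))
        if now >= expiry:                 # window expired: start a fresh count
            count, expiry = 0, now + ttl
        count += 1
        self.store[key] = (count, expiry)
        return count

def allow(cache: SharedCache, client_id: str, limit: int, window: float) -> bool:
    # Every data center increments the same key in the shared cache,
    # so the limit is enforced globally rather than per region.
    return cache.incr_with_ttl(f"rl:{client_id}", window) <= limit
```

The design trade-off is latency: a single shared cache gives exact global limits, while per-region replicas are faster but only approximately consistent.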

Handling Distributed Systems

  • Shared Cache: Ensures consistent rate limiting across data centers.
  • Load Balancer: Manages incoming traffic distribution.
  • Rule Consistency: Shared globally to prevent circumvention attempts.

User Perspective

  • Non-Malicious User:
    • Understand vendor rate limits and plan accordingly.
    • Handle 429 responses gracefully (e.g., showing appropriate errors, using retry strategies).
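
One widely used retry strategy on the client side is exponential backoff with jitter. A minimal sketch, assuming `fetch` is any callable that performs the request and returns an HTTP status code (the function and parameter names are illustrative):

```python
import random
import time

def request_with_backoff(fetch, max_retries: int = 5,
                         base_delay: float = 1.0, sleep=time.sleep):
    """Retry a request on HTTP 429, doubling the wait before each attempt."""
    delay = base_delay
    for _ in range(max_retries):
        status = fetch()
        if status != 429:
            return status
        # Add a small random jitter so many throttled clients
        # don't all retry at exactly the same moment.
        sleep(delay + random.uniform(0, delay / 10))
        delay *= 2
    return status   # still throttled after max_retries; surface the error
```

A real client would also honor the server's Retry-After header when one is present instead of relying purely on its own schedule.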

Conclusion

  • Rate limiting is crucial for maintaining system integrity and performance.
  • The design should be adaptable to handle different traffic patterns and operational scales.
  • Goal: Understand requirements, constraints, and provide a robust, flexible solution.

Final Thoughts

  • System design interviews test your ability to understand and solve complex problems within constraints.
  • Focus on high-level architecture and decision rationale.

Acknowledgment

  • Outro thanking the guest for their valuable insights and discussing the importance of system design interviews.