System Design Mock Interview: Designing Spotify

Jul 10, 2024

Introduction

  • Objective: Present a high-quality system design answer for interview preparation.
  • Presenter: Mark, former engineering manager at Google.
  • Constraints: Focus on finding and playing music.

Understanding Spotify

  • Spotify Components:
    • Songs (Audio files)
    • Playlists
    • Users
    • Artists
    • Podcasts

Key Use Cases

  • Finding Music: Searching and discovering songs.
    • Input: Specific song, artist, genre, etc.
  • Playing Music: Streaming chosen songs.

Metrics and Capacity

  • User Base: 1 billion users.
  • Songs: 100 million songs.
  • Data Estimates:
    • MP3 file size: ~5 MB per song.
    • Total audio data: 100 million songs × ~5 MB ≈ 500 TB (500 trillion bytes).
    • With 3x replication: ~1.5 PB.
    • Metadata: ~100 GB.
    • User data: ~1 TB.
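The estimates above are simple multiplication. Below is a minimal Python sketch that reproduces them. The ~1 KB of metadata per song and ~1 KB of data per user are assumptions back-derived from the ~100 GB and ~1 TB totals, not figures given in the interview. Decimal units (1 MB = 10^6 bytes) are used, matching "500 trillion bytes".

```python
# Back-of-envelope capacity estimate using the figures above.
# METADATA_BYTES_PER_SONG and USER_BYTES_PER_USER are ASSUMPTIONS chosen
# to reproduce the ~100 GB and ~1 TB totals; they are not from the source.

KB, MB, GB, TB, PB = 10**3, 10**6, 10**9, 10**12, 10**15  # decimal units

USERS = 1_000_000_000
SONGS = 100_000_000
AUDIO_BYTES_PER_SONG = 5 * MB
REPLICATION_FACTOR = 3
METADATA_BYTES_PER_SONG = 1 * KB  # assumption
USER_BYTES_PER_USER = 1 * KB      # assumption


def estimate() -> dict:
    """Return storage estimates in the units used in the notes."""
    audio = SONGS * AUDIO_BYTES_PER_SONG
    return {
        "audio_TB": audio / TB,
        "replicated_PB": audio * REPLICATION_FACTOR / PB,
        "metadata_GB": SONGS * METADATA_BYTES_PER_SONG / GB,
        "user_data_TB": USERS * USER_BYTES_PER_USER / TB,
    }


for name, value in estimate().items():
    print(f"{name}: {value:g}")
```

The printout shows audio at 500 TB, 1.5 PB replicated, 100 GB of metadata, and 1 TB of user data, which makes it clear that audio storage dominates the design.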

High-Level Design Components

  • Client: Spotify App on mobile phones.
  • Servers:
    • Spotify Web/Application Servers (multiple).
    • Load Balancer for distributing requests.
  • Databases:
    • Song Audio Database: Blob storage (e.g., Amazon S3).
    • Metadata Database: Relational Database (e.g., Amazon RDS).

Detailed Design

  • Data Storage:
    • Audio files in S3 (immutable, scalable blob storage).
    • Metadata in RDS (supports structured queries and updates).

Example Song Metadata

  • Attributes: Song ID, URL, Artist, Genre, Album cover link, Audio link.
  • Size distribution: Audio data (~1.5 PB replicated) dwarfs metadata (~100 GB), so each is kept in the store suited to it.
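To make the record concrete, here is a sketch of the metadata table that uses Python's built-in sqlite3 as a stand-in for RDS. The column names are hypothetical, and "title" is an added assumption so that songs can be found by name. The table stores only a link to the audio. At ~100 GB for 100 million songs, each row is on the order of 1 KB, compared with ~5 MB for the audio file itself.

```python
import sqlite3

# In-memory SQLite stands in for the relational metadata store (e.g., RDS).
# Column names are hypothetical; "title" is an added assumption.
# Audio bytes are NOT stored here, only a link into blob storage.
SCHEMA = """
CREATE TABLE songs (
    song_id         INTEGER PRIMARY KEY,
    title           TEXT NOT NULL,
    artist          TEXT NOT NULL,
    genre           TEXT,
    song_url        TEXT,          -- the 'URL' attribute from the notes
    album_cover_url TEXT,
    audio_url       TEXT NOT NULL  -- pointer into blob storage, e.g. an S3 key
);
CREATE INDEX idx_songs_artist ON songs(artist);
CREATE INDEX idx_songs_genre ON songs(genre);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO songs VALUES (?, ?, ?, ?, ?, ?, ?)",
    (1, "Example Song", "Example Artist", "rock",
     "https://example.com/song/1",
     "s3://example-covers/1.jpg",
     "s3://example-audio/1.mp3"),
)

# Looking up the audio link is a cheap metadata query; no audio is read.
row = conn.execute(
    "SELECT audio_url FROM songs WHERE song_id = ?", (1,)
).fetchone()
print(row[0])
```

The indexes on artist and genre support the search inputs listed under Key Use Cases without scanning the whole table.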

Use Case: Finding and Playing Music

  • Finding Music:
    • User initiates a search via app -> Request to Web Server -> Query Metadata DB -> Return results to app.
    • No audio database access needed.
  • Playing Music:
    • User selects a song -> Request to Web Server -> Fetch metadata (audio link) -> Access Song Audio DB -> Stream audio to app.
    • A WebSocket connection could be used for continuous streaming.
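The two flows can be sketched with hypothetical in-memory dicts that stand in for the Metadata DB and the Song Audio DB. The names METADATA_DB, AUDIO_BLOBS, search, and play are illustrative only. Search reads only metadata. Play uses metadata to find the audio link and then yields the audio in chunks. In a real app those chunks would travel over HTTP or a WebSocket, and the generator below models only the chunking.

```python
from typing import Iterator

# Hypothetical in-memory stand-ins for the two stores.
METADATA_DB = {
    1: {"title": "Song A", "artist": "Artist X", "genre": "jazz",
        "audio_url": "blob://audio/1"},
    2: {"title": "Song B", "artist": "Artist Y", "genre": "rock",
        "audio_url": "blob://audio/2"},
}
AUDIO_BLOBS = {
    "blob://audio/1": b"\x00" * 10,
    "blob://audio/2": b"\x01" * 10,
}


def search(query: str) -> list:
    """Finding music: touches only metadata, never the audio store."""
    q = query.lower()
    return [
        song_id
        for song_id, meta in METADATA_DB.items()
        if q in meta["title"].lower()
        or q in meta["artist"].lower()
        or q == meta["genre"]
    ]


def play(song_id: int, chunk_size: int = 4) -> Iterator[bytes]:
    """Playing music: look up the audio link, then stream the blob in chunks."""
    audio_url = METADATA_DB[song_id]["audio_url"]
    data = AUDIO_BLOBS[audio_url]
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


print(search("artist x"))                      # matches on artist
print([len(c) for c in play(1)])               # chunk sizes of the stream
```

For example, search("rock") returns only song 2, and play(1) yields chunks of 4, 4, and 2 bytes.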

Optimizations

  • Content Delivery Network (CDN):
    • Use CDN (e.g., AWS CloudFront) for caching popular songs.
    • Flow: The first request for a song goes through the Web Server, which places the song in the CDN; later requests for it are served directly from the CDN.
  • Caching Layers:
    • Web Server Cache: Temporary storage for frequently accessed songs.
    • Local Storage: Cache recent songs on user’s device.
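The caching layers can be sketched as a tiered lookup. The client checks the device cache first, then the CDN, then the origin (the Web Server reading from blob storage), and it fills the faster tiers on each miss. LRU eviction and the names LRUCache and fetch_song are assumptions, because the notes do not specify an eviction policy. The Web Server cache could be added as one more LRUCache tier between the CDN and the origin.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity least-recently-used cache (eviction policy is an assumption)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used


def fetch_song(song_id, device, cdn, origin, log):
    """Check device cache, then CDN, then origin; fill faster tiers on a miss."""
    audio = device.get(song_id)
    if audio is not None:
        log.append("device")
        return audio
    audio = cdn.get(song_id)
    if audio is not None:
        log.append("cdn")
    else:
        log.append("origin")
        audio = origin[song_id]  # Web Server reads from blob storage
        cdn.put(song_id, audio)  # first request populates the CDN
    device.put(song_id, audio)   # cache locally for replays
    return audio


origin = {"song-1": b"audio-1"}
cdn = LRUCache(100)
device = LRUCache(2)
log = []
fetch_song("song-1", device, cdn, origin, log)        # miss everywhere
fetch_song("song-1", device, cdn, origin, log)        # replay on same device
other_device = LRUCache(2)
fetch_song("song-1", other_device, cdn, origin, log)  # another user, CDN hit
print(log)
```

The log shows the pattern described above: one trip to the origin, a local replay, and a CDN hit for a second user.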

Load Balancing

  • Metrics for Load Balancing:
    • Network bandwidth, memory usage, and outstanding requests/streams.
    • Example: AWS ELB or similar technology, to balance on actual load rather than with plain round-robin.
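Here is a minimal sketch of choosing a server from the metrics listed above. The scoring rule takes the most constrained of bandwidth, memory, and stream utilization. That rule and the ServerStats fields are hypothetical, and they are not how AWS ELB or any specific load balancer computes load.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    name: str
    bandwidth_util: float  # fraction of network bandwidth in use, 0.0 to 1.0
    memory_util: float     # fraction of memory in use, 0.0 to 1.0
    active_streams: int    # outstanding requests/streams
    max_streams: int       # hypothetical per-server stream capacity


def load_score(s: ServerStats) -> float:
    """Lower is better; the most saturated resource dominates (assumed rule)."""
    return max(s.bandwidth_util, s.memory_util, s.active_streams / s.max_streams)


def pick_server(servers: list) -> ServerStats:
    """Route the next stream to the least-loaded server."""
    return min(servers, key=load_score)


servers = [
    ServerStats("web-1", bandwidth_util=0.9, memory_util=0.4, active_streams=100, max_streams=1000),
    ServerStats("web-2", bandwidth_util=0.5, memory_util=0.6, active_streams=300, max_streams=1000),
    ServerStats("web-3", bandwidth_util=0.3, memory_util=0.2, active_streams=900, max_streams=1000),
]
print(pick_server(servers).name)
```

In this example web-2 wins even though web-1 has the fewest streams, because web-1's bandwidth is nearly saturated. Round-robin and least-connections would both miss that signal.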

Global Scale Considerations

  • Data Replication:
    • Multi-region replication for availability and performance.
    • Geo-aware data placement for local access efficiency.
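Geo-aware placement can be sketched as routing each user to the nearest region that holds a replica. The region names and coordinates below are hypothetical. Great-circle (haversine) distance serves only as a stand-in for network latency, and real deployments often route on measured latency or through DNS instead.

```python
import math

# Hypothetical regions holding replicas, with approximate (lat, lon).
REGIONS = {
    "us-east": (39.0, -77.5),
    "eu-west": (53.3, -6.3),
    "ap-southeast": (1.35, 103.8),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_region(user_location, regions=REGIONS):
    """Pick the replica region closest to the user (distance as latency proxy)."""
    return min(regions, key=lambda r: haversine_km(user_location, regions[r]))


print(nearest_region((48.85, 2.35)))     # a user in Paris
print(nearest_region((-33.87, 151.21)))  # a user in Sydney
```

With these coordinates, the Paris user is routed to eu-west and the Sydney user to ap-southeast. If a region is down, the same function can be rerun over the remaining regions.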

Conclusion

  • Wrap up with a review of initial requirements.
  • Brief mention of geolocation for enhanced performance.
  • Open for questions and additional considerations.

Final Thoughts

  • Practice both visual (diagrams) and verbal communication during system design.
  • Always relate back to user requirements and scalability during interviews.