Designing a Scalable Video Streaming Platform

Aug 11, 2024

Systems Design Lecture Notes

Introduction

  • Host: Jordan
  • Topic: Systems design for a video streaming platform (similar to YouTube/Netflix).
  • Tone: Tired, wants to move quickly through the content.

Key Objectives

  • Build a scalable video streaming site.
  • Essential functionality includes:
    • Users can post videos.
    • Users can watch videos.
    • Users can comment on videos.
    • Users can search for videos by name.

Capacity Estimates

  • Targeting 1 billion users.
  • Average 1,000 views per video.
  • Average video size: 100 MB.
  • Videos posted daily: 1 million.
  • Estimated storage needed per year: about 40 PB (petabytes), since 1 million videos/day × 100 MB × 365 days ≈ 36.5 PB.
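
The storage figure can be checked with a few lines of arithmetic. This is a minimal sketch using only the numbers listed above; decimal units (1 PB = 10^9 MB) are an assumption, and the result covers a single raw copy of each video, so replication or extra encodings would raise it.

```python
# Back-of-the-envelope storage estimate using the figures from the notes.
VIDEOS_PER_DAY = 1_000_000
AVG_VIDEO_SIZE_MB = 100
DAYS_PER_YEAR = 365

# Assumption: decimal units (1 TB = 10^6 MB, 1 PB = 10^9 MB).
MB_PER_TB = 1_000_000
MB_PER_PB = 1_000_000_000


def daily_storage_tb() -> float:
    """Raw storage added per day, in terabytes."""
    return VIDEOS_PER_DAY * AVG_VIDEO_SIZE_MB / MB_PER_TB


def yearly_storage_pb() -> float:
    """Raw storage added per year, in petabytes."""
    return VIDEOS_PER_DAY * AVG_VIDEO_SIZE_MB * DAYS_PER_YEAR / MB_PER_PB


if __name__ == "__main__":
    print(f"{daily_storage_tb():.0f} TB/day")     # 100 TB/day
    print(f"{yearly_storage_pb():.1f} PB/year")   # 36.5 PB/year
```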

Video Streaming Concepts

  • Support for multiple devices and varying network speeds.
  • Dynamic loading of video resolutions based on bandwidth. Examples:
    • Slow devices or connections: switch back and forth between 480p and 720p as bandwidth changes.
    • Fast devices or connections: 1080p, 4K, etc.
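
One way to picture the resolution switching is a small lookup against a bitrate ladder. The sketch below is illustrative only: the `LADDER` thresholds, the `pick_resolution` name, and the `safety_factor` parameter are assumptions, not values from the lecture.

```python
from typing import List, Tuple

# Hypothetical ladder: (label, minimum sustained bandwidth in Mbps).
# The thresholds are illustrative assumptions, ordered from lowest to highest.
LADDER: List[Tuple[str, float]] = [
    ("480p", 1.5),
    ("720p", 3.0),
    ("1080p", 6.0),
    ("4K", 20.0),
]


def pick_resolution(measured_mbps: float, safety_factor: float = 0.8) -> str:
    """Return the highest resolution that fits within the usable bandwidth.

    Only a fraction of the measured bandwidth is treated as usable, so that
    small dips do not immediately stall playback.
    """
    usable = measured_mbps * safety_factor
    choice = LADDER[0][0]  # always fall back to the lowest rung
    for label, min_mbps in LADDER:
        if usable >= min_mbps:
            choice = label
    return choice
```

A player would call something like this before requesting each chunk, which is why a slow connection can bounce between 480p and 720p as its measured speed changes.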

Video Chunking

  • Videos split into chunks for efficient loading.
  • Benefits of chunking:
    • Parallel uploads: Upload several chunks at once over multiple connections.
    • Lower barrier to start watching: Load the first chunk and start streaming.
    • Adapt to changing network speeds: Load the best quality chunk based on current speed.
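
As a rough illustration of chunking, the sketch below cuts a byte string into fixed-size, ordered pieces. The function name and the byte-based split are assumptions made for simplicity; real streaming formats split on time boundaries rather than raw byte offsets.

```python
from typing import Iterator, Tuple


def split_into_chunks(data: bytes, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (chunk_order, chunk_bytes) pairs covering `data` in order.

    The last chunk may be shorter than `chunk_size`.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for order, start in enumerate(range(0, len(data), chunk_size)):
        yield order, data[start:start + chunk_size]


if __name__ == "__main__":
    for order, chunk in split_into_chunks(b"abcdefghij", 4):
        print(order, chunk)
```

Because each chunk carries its order, chunks can be uploaded or fetched independently and reassembled later.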

Database Schema

  1. Subscribers Table: Tracks user subscriptions (user ID, subscribed to ID).
  2. User Videos Table: Tracks videos posted by users (user ID, video ID, timestamp, metadata).
  3. Users Table: Basic user information (user ID, email, password).
  4. Video Comments Table: Comments on videos (video ID, commenter ID, content, timestamp).
  5. Video Chunks Table: Metadata for video chunks (video ID, encoding, resolution, chunk order).
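
The five tables can be sketched as DDL. The example below uses Python's built-in `sqlite3` module purely as a runnable stand-in; the column names, types, and key choices are assumptions derived from the field lists above, not a schema given in the lecture.

```python
import sqlite3

# Assumed column names and keys; only the field lists come from the notes.
SCHEMA = """
CREATE TABLE users (
    user_id  INTEGER PRIMARY KEY,
    email    TEXT NOT NULL UNIQUE,
    password TEXT NOT NULL          -- store a hash in practice, never plaintext
);
CREATE TABLE subscribers (
    user_id          INTEGER NOT NULL,
    subscribed_to_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, subscribed_to_id)
);
CREATE TABLE user_videos (
    user_id   INTEGER NOT NULL,
    video_id  INTEGER NOT NULL,
    timestamp INTEGER NOT NULL,
    metadata  TEXT,
    PRIMARY KEY (user_id, video_id)
);
CREATE TABLE video_comments (
    video_id     INTEGER NOT NULL,
    commenter_id INTEGER NOT NULL,
    content      TEXT NOT NULL,
    timestamp    INTEGER NOT NULL
);
CREATE INDEX idx_comments_video ON video_comments (video_id, timestamp);
CREATE TABLE video_chunks (
    video_id    INTEGER NOT NULL,
    encoding    TEXT NOT NULL,
    resolution  TEXT NOT NULL,
    chunk_order INTEGER NOT NULL,
    PRIMARY KEY (video_id, encoding, resolution, chunk_order)
);
"""


def create_schema() -> sqlite3.Connection:
    """Create the tables in an in-memory database and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```

Keying the chunks table on (video ID, encoding, resolution, chunk order) keeps all chunks of one rendition together and in playback order, which matches how a player requests them.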

Database Choices

  • Use MySQL for most tables: its B-tree indexes make it a good fit for read-heavy access.
  • Consider Cassandra for write-heavy tables (like comments): its LSM tree architecture turns writes into in-memory updates followed by sequential flushes.
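
To show why an LSM tree favors writes, here is a toy version in plain Python. It is a sketch of the general idea only: the `TinyLSM` class is hypothetical, and the commit log, bloom filters, and compaction that a real engine such as Cassandra's relies on are left out.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class TinyLSM:
    """Toy LSM tree: writes hit an in-memory table, which is flushed to
    sorted, immutable segments once it reaches a size limit."""

    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable_limit = memtable_limit
        self.memtable: Dict[str, str] = {}
        self.segments: List[List[Tuple[str, str]]] = []  # oldest first

    def put(self, key: str, value: str) -> None:
        # A write is just a dictionary update; no on-disk structure is rewritten.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Sort once and append as a new segment (a sequential write on disk).
        if self.memtable:
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        # Reads check the memtable first, then segments from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            i = bisect.bisect_left(segment, (key,))
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return None
```

The trade-off is visible in `get`: a read may have to look through several segments, which is the cost paid for cheap writes.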

Video Upload Process

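  • Videos are uploaded far less often than they are watched; thus, the write path can afford extra up-front work (chunking, processing) and should focus on being efficient and reliable rather than instant.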
  • Use message brokers for processing and load balancing:
    • RabbitMQ for uploading chunk references.
    • Kafka for aggregating metadata and chunk completion events.
  • Chunks are uploaded to an object store (e.g., S3).
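
The upload step can be sketched as parallel writes to an object store followed by handing the resulting references to a broker. In the example below, `FakeObjectStore`, `upload_chunks`, and the key format are hypothetical stand-ins; no real S3, RabbitMQ, or Kafka client is used.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock
from typing import Dict, List, Tuple


class FakeObjectStore:
    """In-memory stand-in for an object store such as S3 (illustration only)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self._lock = Lock()

    def put(self, key: str, body: bytes) -> None:
        with self._lock:
            self._objects[key] = body

    def get(self, key: str) -> bytes:
        with self._lock:
            return self._objects[key]


def upload_chunks(store: FakeObjectStore, video_id: str,
                  chunks: List[bytes], workers: int = 4) -> List[str]:
    """Upload chunks concurrently and return their keys in chunk order.

    The returned keys are the chunk references that would then be
    published to a message broker for downstream processing.
    """
    def upload_one(item: Tuple[int, bytes]) -> str:
        order, body = item
        key = f"{video_id}/chunk-{order:05d}"  # assumed key layout
        store.put(key, body)
        return key

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order even though uploads run in parallel.
        return list(pool.map(upload_one, enumerate(chunks)))
```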

Processing and Aggregation

  • Use Flink for stream processing, ensuring every message is processed at least once.
  • Video processing:
    • Chunks uploaded to S3, then processed by various nodes.
    • References sent to Kafka and RabbitMQ.
    • Flink tracks state and aggregates chunk completion.
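
The state that the aggregation step keeps can be sketched in plain Python. This is not the Flink API: `UploadTracker` is a hypothetical stand-in for per-video keyed state, written so that the duplicate deliveries allowed by at-least-once processing do not cause a video to be marked complete twice.

```python
from typing import Dict, Set


class UploadTracker:
    """Tracks which chunks of each video have finished processing."""

    def __init__(self) -> None:
        self.expected: Dict[str, int] = {}   # video_id -> total chunk count
        self.done: Dict[str, Set[int]] = {}  # video_id -> finished chunk orders

    def register_video(self, video_id: str, total_chunks: int) -> None:
        """Record the metadata event that says how many chunks to expect."""
        self.expected[video_id] = total_chunks
        self.done.setdefault(video_id, set())

    def on_chunk_processed(self, video_id: str, chunk_order: int) -> bool:
        """Handle one completion event.

        Returns True exactly once per video, on the event that completes it.
        Raises KeyError if the video was never registered.
        """
        total = self.expected[video_id]
        seen = self.done[video_id]
        if chunk_order in seen:
            return False  # duplicate delivery: ignore
        seen.add(chunk_order)
        return len(seen) == total

    def is_complete(self, video_id: str) -> bool:
        return len(self.done[video_id]) == self.expected[video_id]
```

Storing finished chunk orders in a set, rather than a counter, is what makes reprocessing the same message harmless.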

Search Index

  • Create an inverted index for video titles and descriptions.
  • Partition the index so that load is balanced across nodes and each query has to aggregate results from as few partitions as possible.
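
A minimal single-node inverted index might look like the sketch below. The `InvertedIndex` class, the `tokenize` helper, and the match-all-terms query behavior are assumptions chosen for illustration; partitioning across nodes is not shown.

```python
import re
from collections import defaultdict
from typing import DefaultDict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each term to the set of video IDs whose text contains it."""

    def __init__(self) -> None:
        self.postings: DefaultDict[str, Set[int]] = defaultdict(set)

    def add(self, video_id: int, text: str) -> None:
        """Index a video's title or description."""
        for term in tokenize(text):
            self.postings[term].add(video_id)

    def search(self, query: str) -> Set[int]:
        """Return the videos that contain every term in the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


if __name__ == "__main__":
    index = InvertedIndex()
    index.add(1, "Funny cat compilation")
    index.add(2, "Cat training tips")
    print(index.search("cat training"))  # {2}
```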

Conclusion

  • The lecture emphasized the importance of design considerations in building a scalable video streaming service.
  • Focus on an architecture that accommodates dynamic user needs and varying network conditions.
  • The speaker closed with remarks on the complexity of the design and on being exhausted by the session.