Scalable Video Chat System Design

Jun 9, 2025

Overview

This lecture explains the system design of scalable video group chat applications (like Zoom or Skype), covering networking choices, server architecture, stream forwarding, and recording strategies.

Problem Requirements

  • Support both one-on-one and group video chats.
  • Enable large calls with up to 100 participants.
  • Record calls on a backend server rather than on the client, so recordings can be accessed later from the cloud.

Networking Fundamentals

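  • TCP: Reliable, connection-oriented protocol; provides a handshake, ordered delivery, retransmission of lost packets, and congestion and flow control.
  • UDP: Connectionless protocol with no ordering or retransmission, so it is faster but less reliable; well suited to live video/audio streams, where a dropped frame is acceptable and a late frame is useless.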
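
To illustrate why dropped frames are tolerable for live media, here is a minimal sketch of a receiver that shows frames as they arrive and discards any frame that turns up after a newer one has already been shown, instead of waiting for a retransmission. The function name and the use of integer sequence numbers are assumptions made for illustration; the lecture does not specify a receiver design.

```python
def playable_frames(received_seq_numbers):
    """Return the frames a live player would show, in arrival order.

    Frames that arrive after a newer frame was already shown are dropped,
    and missing frames are simply skipped (no retransmission request).
    """
    last_played = -1
    played = []
    for seq in received_seq_numbers:
        if seq > last_played:
            played.append(seq)
            last_played = seq
        # else: the frame is late, so showing it would move the video backwards
    return played


# Frame 2 arrives after frame 3, and frame 5 never arrives at all.
print(playable_frames([0, 1, 3, 2, 4, 6]))
```

The viewer sees a brief glitch around the missing frames, but playback never stalls, which is the trade-off that makes UDP attractive for calls.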

Peer-to-Peer vs. Centralized Servers

  • Peer-to-peer (P2P) is efficient for one-on-one but scales poorly for large groups.
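  • NAT (Network Address Translation) hides peers behind private addresses, which complicates P2P; STUN servers let each peer discover its public IP and port so peers can connect directly, but this does not work with every NAT configuration.
  • Centralized servers reduce client load in group calls: each user sends to and receives from a single server instead of every other participant.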
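
The scaling argument above comes down to simple counting. The sketch below compares a full-mesh P2P call with a centralized one for n participants; the function names and dictionary keys are made up for illustration.

```python
def mesh_load(n):
    """Full mesh: every participant connects directly to every other one."""
    return {
        "connections_per_client": n - 1,
        "uploads_per_client": n - 1,           # one copy of the stream per peer
        "total_connections": n * (n - 1) // 2,
    }


def central_load(n):
    """Centralized: every participant connects only to the server."""
    return {
        "connections_per_client": 1,
        "uploads_per_client": 1,               # the server fans the stream out
        "total_connections": n,
    }


for n in (2, 10, 100):
    print(n, mesh_load(n), central_load(n))
```

For a one-on-one call the two approaches cost about the same, but at 100 participants a mesh would need each client to upload 99 copies of its stream, which is why large calls go through a server.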

Selective Forwarding and Stream Customization

  • Compositing all streams into a single video on the server increases server CPU load and forces the same view on every participant.
  • Selective Forwarding Units (SFUs): Server sends each client only the streams and resolutions they request, reducing bandwidth and offering a customizable view.
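
A minimal sketch of the selective-forwarding idea, assuming the server keeps a table of which publisher and resolution each subscriber has asked for. The class and method names are hypothetical and do not correspond to any particular SFU implementation.

```python
from collections import defaultdict


class SelectiveForwarder:
    def __init__(self):
        # subscriber -> {publisher: requested resolution}
        self.subscriptions = defaultdict(dict)

    def subscribe(self, subscriber, publisher, resolution):
        """Record that `subscriber` wants `publisher`'s stream at `resolution`."""
        self.subscriptions[subscriber][publisher] = resolution

    def unsubscribe(self, subscriber, publisher):
        self.subscriptions[subscriber].pop(publisher, None)

    def recipients(self, publisher, resolution):
        """Subscribers that asked for exactly this publisher and resolution."""
        return [
            subscriber
            for subscriber, wanted in self.subscriptions.items()
            if wanted.get(publisher) == resolution
        ]


sfu = SelectiveForwarder()
sfu.subscribe("alice", "bob", "720p")    # alice has bob pinned full-screen
sfu.subscribe("carol", "bob", "180p")    # carol shows bob as a thumbnail
print(sfu.recipients("bob", "720p"))
print(sfu.recipients("bob", "180p"))
```

The server does no decoding or mixing here: it only looks up who wants a packet and forwards it, which keeps CPU usage low.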

Stream Encoding Strategies

  • Server-side transcoding: Server converts a single client stream to multiple resolutions/formats (high server load).
  • Intermediary/proxy server: Offloads encoding from the main server but adds latency.
  • Client-side encoding (common in practice): Client sends multiple resolution streams; server simply forwards, minimizing server CPU usage.
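
With client-side encoding, the server's remaining job is to decide which of the uploaded resolutions to forward to each viewer. The sketch below picks the highest layer that fits a viewer's estimated downlink bandwidth. The layer names and bitrates are illustrative assumptions, not figures from the lecture.

```python
# Hypothetical layers a client might upload, in ascending quality: (name, kbps).
LAYERS = [("180p", 150), ("360p", 500), ("720p", 1500)]


def pick_layer(available_kbps):
    """Return the best layer that fits, falling back to the lowest one."""
    chosen = LAYERS[0][0]
    for name, kbps in LAYERS:
        if kbps <= available_kbps:
            chosen = name
    return chosen


for bandwidth in (100, 600, 5000):
    print(bandwidth, "kbps ->", pick_layer(bandwidth))
```

Switching a viewer between layers is only a forwarding decision, so no transcoding happens on the server.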

Partitioning, Sharding, and Replication

  • Calls are sharded by chat ID using consistent hashing to balance load across servers.
  • Replication uses an active-passive strategy: Zookeeper monitors the servers, and a passive backup takes over when the active server fails.
  • Stateless design: servers only forward streams, so a backup needs very little state to take over.
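
A minimal sketch of sharding calls by chat ID with a consistent-hash ring. The server names, the number of virtual nodes, and the choice of MD5 as the hash are assumptions made for illustration; the lecture does not prescribe them.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, servers, vnodes=100):
        # Each server is placed on the ring many times to even out the load.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._positions = [position for position, _ in self._ring]

    def server_for(self, chat_id):
        """Walk clockwise from the chat's position to the next server."""
        index = bisect.bisect_right(self._positions, _hash(chat_id))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["sfu-a", "sfu-b", "sfu-c"])
print(ring.server_for("chat-42"))
```

The property that matters for failover is that removing one server only reassigns the chats that lived on it; chats on the remaining servers stay where they were.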

Video Recording Architecture

  • Dedicated recording servers subscribe to video calls, encode, and upload footage to cloud storage (e.g., S3).
  • Full-stream HD recordings for all users may require sharding the recording workload over multiple servers.
  • Kafka is used for buffering streams, partitioned by chat ID; stateful consumers process and align frames via timestamps for final export.
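
The timestamp alignment step can be sketched as a merge of per-participant frame sequences. Plain Python lists stand in for the buffered streams here (no Kafka client is used), and the `Frame` type and its fields are hypothetical.

```python
import heapq
from typing import NamedTuple


class Frame(NamedTuple):
    timestamp_ms: int
    participant: str
    payload: bytes


def merge_by_timestamp(*streams):
    """Merge streams that are each already sorted by timestamp."""
    return list(heapq.merge(*streams, key=lambda frame: frame.timestamp_ms))


alice = [Frame(0, "alice", b"a0"), Frame(33, "alice", b"a1"), Frame(66, "alice", b"a2")]
bob = [Frame(10, "bob", b"b0"), Frame(43, "bob", b"b1")]

for frame in merge_by_timestamp(alice, bob):
    print(frame.timestamp_ms, frame.participant)
```

Because all frames of a call land in the same partition when partitioning by chat ID, a single consumer can see every participant's frames and perform this merge before export.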

Key Terms & Definitions

  • TCP (Transmission Control Protocol) — Reliable, connection-oriented network protocol with delivery guarantees.
  • UDP (User Datagram Protocol) — Lightweight, connectionless network protocol; no delivery guarantees.
  • NAT (Network Address Translation) — Router method mapping private network IPs to a public IP.
  • STUN (Session Traversal Utilities for NAT) — Protocol to discover public-facing IP addresses for P2P.
  • SFU (Selective Forwarding Unit) — Server that routes requested streams to clients at requested resolutions.
  • Zookeeper — Service for maintaining configuration information and providing distributed synchronization.
  • Kafka — Distributed messaging system used for buffering and streaming data.

Action Items / Next Steps

  • Review diagrams and data flow for typical video group chat architectures.
  • Study the trade-offs between server load, client complexity, and network efficiency in stream forwarding.
  • Understand the basics of sharding, replication, and failover in distributed systems.