Overview
This lecture explains the system design of scalable video group chat applications (like Zoom or Skype), covering networking choices, server architecture, stream forwarding, and recording strategies.
Problem Requirements
- Support both one-on-one and group video chats.
- Enable large calls with up to 100 participants.
- Allow recording of calls to a backend server, not the client, for later cloud access.
Networking Fundamentals
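- TCP: Reliable, connection-oriented protocol that provides a handshake, ordered delivery, retransmission of lost packets, and congestion and flow control.
- UDP: Connectionless protocol with no retransmission or ordering guarantees, which keeps latency low; well suited to live video/audio streams, where an occasional dropped frame is acceptable.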
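To make the "dropped frames are acceptable" point concrete, here is a minimal sketch of how a receiver on top of UDP might treat loss and reordering. The header layout (sequence number plus capture timestamp) and the names `pack_frame`, `unpack_frame`, and `LossTolerantReceiver` are assumptions made for illustration, not something specified in the lecture. No sockets are opened; the datagrams are just byte strings.

```python
import struct

# Hypothetical header: 4-byte sequence number + 8-byte capture timestamp (ms).
HEADER = struct.Struct("!IQ")


def pack_frame(seq: int, timestamp_ms: int, payload: bytes) -> bytes:
    """Build the bytes that would be sent as one UDP datagram."""
    return HEADER.pack(seq, timestamp_ms) + payload


def unpack_frame(datagram: bytes):
    """Split a datagram back into (seq, timestamp_ms, payload)."""
    seq, timestamp_ms = HEADER.unpack_from(datagram)
    return seq, timestamp_ms, datagram[HEADER.size:]


class LossTolerantReceiver:
    """Plays frames as they arrive and never asks for a retransmission."""

    def __init__(self):
        self.last_seq = -1
        self.played = []   # sequence numbers that were rendered
        self.skipped = 0   # gaps observed when a newer frame arrived
        self.late = 0      # frames that arrived after a newer one was played

    def on_datagram(self, datagram: bytes) -> bool:
        seq, _timestamp_ms, _payload = unpack_frame(datagram)
        if seq <= self.last_seq:
            # Too late to be useful for live playback: drop it.
            self.late += 1
            return False
        # Any sequence numbers between last_seq and seq are treated as lost.
        self.skipped += seq - self.last_seq - 1
        self.last_seq = seq
        self.played.append(seq)
        return True


receiver = LossTolerantReceiver()
for seq in (0, 1, 3, 2, 4):  # frame 2 arrives after frame 3
    receiver.on_datagram(pack_frame(seq, seq * 40, b"frame"))
```

With TCP, frame 3 would have been held back until frame 2 was retransmitted; here playback simply moves on, which is the trade-off the notes describe.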
Peer-to-Peer vs. Centralized Servers
- Peer-to-peer (P2P) is efficient for one-on-one but scales poorly for large groups.
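- NAT (Network Address Translation) hides clients behind private addresses, which complicates direct P2P connections. STUN servers let each peer discover its public IP and port so peers can connect directly, but this does not work for every NAT configuration (for example, symmetric NATs).
- Centralized servers reduce client load in group calls: each user sends to and receives from a single server instead of every other participant.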
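The scaling argument above comes down to simple counting. The following sketch compares a full P2P mesh with a single central server; the function names are made up for this example.

```python
def mesh_connections(n: int) -> int:
    """Total peer-to-peer connections in a full mesh of n participants."""
    return n * (n - 1) // 2


def mesh_uploads_per_client(n: int) -> int:
    """In a mesh, each client uploads its stream once per other participant."""
    return n - 1


def central_connections(n: int) -> int:
    """With a central server, each participant holds one connection to it."""
    return n


def central_uploads_per_client(n: int) -> int:
    """Each client uploads once; the server fans the stream out."""
    return 1


for n in (2, 10, 100):
    print(
        f"n={n}: mesh {mesh_connections(n)} connections, "
        f"{mesh_uploads_per_client(n)} uploads/client; "
        f"central {central_connections(n)} connections, "
        f"{central_uploads_per_client(n)} upload/client"
    )
```

For a 100-person call, the mesh needs 4,950 connections and 99 uploads per client, while the centralized design needs 100 connections and one upload per client.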
Selective Forwarding and Stream Customization
- Centralized stream compilation (the server merges all streams into a single video) increases server CPU load and forces the same view on every participant.
- Selective Forwarding Units (SFUs): Server sends each client only the streams and resolutions they request, reducing bandwidth and offering a customizable view.
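As a rough illustration of selective forwarding, the sketch below keeps a table of which publisher and resolution each subscriber asked for, and uses it to decide where an incoming packet should go. The class and method names are hypothetical, and real SFUs deal with far more (RTP, congestion feedback, keyframes).

```python
from collections import defaultdict


class SelectiveForwardingUnit:
    """Toy SFU: forwards packets without decoding or re-encoding them."""

    def __init__(self):
        # subscriber -> {publisher: requested resolution}
        self.subscriptions = defaultdict(dict)

    def subscribe(self, subscriber: str, publisher: str, resolution: str) -> None:
        self.subscriptions[subscriber][publisher] = resolution

    def unsubscribe(self, subscriber: str, publisher: str) -> None:
        self.subscriptions[subscriber].pop(publisher, None)

    def route(self, publisher: str, resolution: str) -> list:
        """Return the subscribers that should receive this publisher's
        packet at this resolution."""
        return [
            subscriber
            for subscriber, wanted in self.subscriptions.items()
            if wanted.get(publisher) == resolution
        ]


sfu = SelectiveForwardingUnit()
sfu.subscribe("alice", "bob", "720p")    # alice pins bob full-size
sfu.subscribe("alice", "carol", "180p")  # carol is a thumbnail for alice
sfu.subscribe("dave", "bob", "180p")     # dave only wants a small bob
```

Here a 720p packet from bob is forwarded only to alice, and nobody receives carol's 720p stream at all, which is where the bandwidth saving comes from.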
Stream Encoding Strategies
- Server-side transcoding: Server converts a single client stream to multiple resolutions/formats (high server load).
- Intermediary/proxy server: Offloads encoding from the main server but adds latency.
- Client-side encoding (common in practice): Client sends multiple resolution streams; server simply forwards, minimizing server CPU usage.
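With client-side encoding, the server's job reduces to picking which of the already encoded streams to forward. A minimal sketch, assuming three layers with illustrative bitrates (the layer names, bitrates, and `pick_layer` are assumptions, not values from the lecture):

```python
from typing import NamedTuple


class Layer(NamedTuple):
    name: str
    kbps: int


# Illustrative layers a client might upload, sorted from lowest to highest.
SIMULCAST_LAYERS = [
    Layer("180p", 150),
    Layer("360p", 500),
    Layer("720p", 1500),
]


def pick_layer(available_kbps: int) -> str:
    """Pick the highest layer that fits the subscriber's bandwidth,
    falling back to the lowest layer if none fits."""
    best = SIMULCAST_LAYERS[0]
    for layer in SIMULCAST_LAYERS:
        if layer.kbps <= available_kbps:
            best = layer
    return best.name


# Uplink cost paid by the publishing client for sending every layer.
total_uplink_kbps = sum(layer.kbps for layer in SIMULCAST_LAYERS)
```

The trade-off is visible in `total_uplink_kbps`: the client pays extra upload bandwidth and encoding work so that the server does no transcoding.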
Partitioning, Sharding, and Replication
- Calls are sharded by chat ID using consistent hashing to balance load across servers.
- Replication uses an active-passive strategy: backup servers monitored by Zookeeper take over on failure.
- Stateless design: servers only forward streams, so backup servers need to keep very little state.
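The sharding step can be sketched with a small consistent-hash ring. This is one common way to build such a ring (SHA-256 with virtual nodes); the class name, server names, and virtual-node count are assumptions for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring mapping chat IDs to servers."""

    def __init__(self, servers, vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, server)
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def add(self, server: str) -> None:
        # Each server owns several points on the ring to even out load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        self._ring = [(h, s) for h, s in self._ring if s != server]

    def lookup(self, chat_id: str) -> str:
        """Return the first server clockwise from the chat ID's hash."""
        h = self._hash(chat_id)
        index = bisect.bisect_left(self._ring, (h, ""))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["sfu-1", "sfu-2", "sfu-3"])
chats = [f"chat-{i}" for i in range(200)]
before = {chat: ring.lookup(chat) for chat in chats}

ring.remove("sfu-2")  # simulate losing one server
after = {chat: ring.lookup(chat) for chat in chats}
moved = [chat for chat in chats if before[chat] != after[chat]]
```

Only chats that were on the removed server end up in `moved`; every other call stays where it was, which is the reason to prefer consistent hashing over `hash(chat_id) % num_servers`.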
Video Recording Architecture
- Dedicated recording servers subscribe to video calls, encode, and upload footage to cloud storage (e.g., S3).
- Full-stream HD recordings for all users may require sharding the recording workload over multiple servers.
- Kafka is used for buffering streams, partitioned by chat ID; stateful consumers process and align frames via timestamps for final export.
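The timestamp alignment done by the consumers can be sketched as bucketing frames into fixed time slots. The snippet uses plain tuples in place of Kafka messages; the 40 ms slot size, the frame tuple layout, and `align_frames` are assumptions for illustration only.

```python
from collections import defaultdict


def align_frames(frames, slot_ms: int = 40):
    """Group frames from all participants into time slots.

    frames: iterable of (participant, timestamp_ms, payload) in arrival order.
    Returns a list of (slot_start_ms, {participant: payload}) sorted by time.
    If a participant has several frames in one slot, the one with the
    greatest timestamp is kept.
    """
    slots = defaultdict(dict)  # slot_start_ms -> {participant: (ts, payload)}
    for participant, timestamp_ms, payload in frames:
        slot_start = timestamp_ms // slot_ms * slot_ms
        current = slots[slot_start].get(participant)
        if current is None or timestamp_ms >= current[0]:
            slots[slot_start][participant] = (timestamp_ms, payload)
    return [
        (slot_start, {p: payload for p, (_ts, payload) in by_participant.items()})
        for slot_start, by_participant in sorted(slots.items())
    ]


# Frames as they might be read from one partition (one chat ID), interleaved.
frames = [
    ("alice", 0, "a0"),
    ("bob", 5, "b0"),
    ("bob", 45, "b1"),
    ("alice", 41, "a1"),
]
timeline = align_frames(frames)
```

Partitioning by chat ID means every frame of a call reaches the same consumer, so that consumer can hold this per-slot state until the slot is complete and then hand it to the encoder.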
Key Terms & Definitions
- TCP (Transmission Control Protocol) — Reliable, connection-oriented network protocol with delivery guarantees.
- UDP (User Datagram Protocol) — Lightweight, connectionless network protocol; no delivery guarantees.
- NAT (Network Address Translation) — Router method mapping private network IPs to a public IP.
- STUN (Session Traversal Utilities for NAT) — Protocol to discover public-facing IP addresses for P2P.
- SFU (Selective Forwarding Unit) — Server that routes requested streams to clients at requested resolutions.
- Zookeeper — Service for maintaining configuration information and providing distributed synchronization.
- Kafka — Distributed messaging system used for buffering and streaming data.
Action Items / Next Steps
- Review diagrams and data flow for typical video group chat architectures.
- Study the trade-offs between server load, client complexity, and network efficiency in stream forwarding.
- Understand the basics of sharding, replication, and failover in distributed systems.