Video Chat System Design Overview

Overview

This lecture covers the system design of large-scale group video chat applications (such as Zoom or Skype), focusing on networking, architecture for scalability, selective forwarding, recording, and failover strategies.

Functional Requirements

Support both one-on-one and group video chats with up to 100 participants.
Enable server-side recording of video calls for later access.
Minimize client load and maximize scalability.

Networking Refresher

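TCP guarantees ordered, reliable delivery with congestion and flow control, but retransmissions and in-order delivery add latency: one lost packet holds up everything behind it.
UDP is preferred for real-time video/audio because it has no connection setup or retransmission delays, and media playback can tolerate some dropped packets.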
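
To make "tolerating dropped packets" concrete, here is a minimal sketch of a receiver that keeps playing when a packet is missing instead of waiting for a retransmission. The packet header layout (sequence number plus timestamp) and all names are assumptions made for illustration, not a real media protocol, and the socket layer is left out: each datagram is assumed to arrive from a UDP socket.

```python
import struct

# Hypothetical header: 32-bit sequence number, 64-bit timestamp in milliseconds.
HEADER = struct.Struct("!IQ")

def pack_media_packet(seq, timestamp_ms, payload):
    """Build one datagram: header followed by the raw media payload."""
    return HEADER.pack(seq, timestamp_ms) + payload

def unpack_media_packet(datagram):
    """Split a datagram back into (seq, timestamp_ms, payload)."""
    seq, timestamp_ms = HEADER.unpack_from(datagram)
    return seq, timestamp_ms, datagram[HEADER.size:]

class LossTolerantReceiver:
    """Plays packets as they arrive and skips gaps rather than stalling."""

    def __init__(self):
        self.highest_seq = -1
        self.lost = 0      # packets skipped over because they had not arrived in time
        self.played = []   # payloads handed to the decoder, in play order

    def on_datagram(self, datagram):
        seq, _timestamp_ms, payload = unpack_media_packet(datagram)
        if seq <= self.highest_seq:
            return  # late or duplicate packet: too old to be useful, drop it
        self.lost += seq - self.highest_seq - 1  # count the gap, but do not wait for it
        self.highest_seq = seq
        self.played.append(payload)
```

If packets 0, 1, 3, 2, 4 arrive in that order, the receiver plays 0, 1, 3, 4 and records one loss. With TCP, packet 3 would instead have waited until packet 2 was delivered.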

Peer-to-Peer vs Centralized Server

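Peer-to-peer works for small calls but doesn't scale to large groups: in a full mesh, each client must send its stream to, and receive a stream from, every other participant.
NAT (Network Address Translation) hides private IPs behind a public address; STUN servers let each peer discover its own public address and port so that peers can connect directly, but a direct connection cannot always be established this way (for example, behind symmetric NATs).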
Centralized server architecture reduces client load and is necessary for large video calls and recording.
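
The client-load argument comes down to simple arithmetic, sketched below. The function names are hypothetical and the 1.5 Mbps per-stream bitrate is an assumed figure chosen only to make the numbers concrete.

```python
# Assumed bitrate for one video stream; real values depend on codec and resolution.
ASSUMED_MBPS_PER_STREAM = 1.5

def mesh_client_streams(participants):
    """Full-mesh peer-to-peer: each client exchanges a stream with every other client."""
    others = participants - 1
    return {"upload": others, "download": others}

def central_client_streams(participants):
    """Centralized server: one upload per client, one download per other participant."""
    return {"upload": 1, "download": participants - 1}

def central_server_streams(participants):
    """Server-side stream count if every stream is forwarded to every other client."""
    return {"inbound": participants, "outbound": participants * (participants - 1)}

def bandwidth_mbps(stream_count, mbps_per_stream=ASSUMED_MBPS_PER_STREAM):
    """Total bandwidth for a number of streams at the assumed bitrate."""
    return stream_count * mbps_per_stream
```

For a 100-person call, each mesh client would upload 99 streams (148.5 Mbps at the assumed bitrate) versus a single upload to a central server. The server, in turn, would forward 9,900 outbound streams if it relayed everything, which is why the server can become the bottleneck.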

Centralized Server Architecture

Each client sends/receives video streams only to/from the central chat server.
Central chat server can become a bottleneck if not managed carefully.

Selective Forwarding (SFU)

Clients tell the chat server over a WebSocket connection which streams they want and at which resolutions.
Three main approaches:
1. Server transcodes one client stream to multiple resolutions (high server load).
2. Use intermediary (proxy) servers for transcoding (adds latency).
3. Clients encode and send multiple resolution streams directly, a technique known as simulcast (more client work, less server load; the preferred method).
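
A rough sketch of how the third approach could work on the server side: each publisher uploads several resolutions, each subscriber sends a request, and the server only picks which uploaded layer to forward. The JSON message format, layer names, and function names are all hypothetical; the lecture does not specify a wire format.

```python
import json

def parse_subscription(message):
    """Parse a subscriber's request (hypothetical JSON format) into {publisher: max_height}."""
    request = json.loads(message)
    return {s["publisher"]: s["max_height"] for s in request["streams"]}

def pick_layer(available, max_height):
    """Pick the largest uploaded layer that fits max_height, else the smallest one."""
    fitting = [name for name, height in available.items() if height <= max_height]
    if fitting:
        return max(fitting, key=lambda name: available[name])
    return min(available, key=lambda name: available[name])

def forwarding_plan(message, publishers):
    """Decide which layer of which publisher to forward; nothing is transcoded."""
    wanted = parse_subscription(message)
    return {
        publisher: pick_layer(publishers[publisher], max_height)
        for publisher, max_height in wanted.items()
        if publisher in publishers
    }

# Layers each publisher currently uploads: layer name -> frame height in pixels.
publishers = {
    "alice": {"low": 180, "medium": 360, "high": 720},
    "bob": {"low": 180, "medium": 360, "high": 720},
    "carol": {"low": 180, "medium": 360, "high": 720},
}
# The subscriber wants Alice large (active speaker) and Bob as a thumbnail; Carol is off-screen.
message = json.dumps({"streams": [
    {"publisher": "alice", "max_height": 720},
    {"publisher": "bob", "max_height": 200},
]})
plan = forwarding_plan(message, publishers)
```

Here `plan` selects Alice's "high" layer and Bob's "low" layer, and Carol's stream is not forwarded at all. The server's work is reduced to selection and forwarding, at the cost of extra encoding on each client.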

Partitioning and Replication

Calls are sharded by chat ID using consistent hashing to distribute load evenly.
Large calls may be further partitioned across multiple servers, with clients connecting to more than one server if needed.
Active-passive replication, with ZooKeeper coordinating failover to the standby, keeps calls available if a server fails.
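
The sharding and failover ideas above can be sketched in a few lines. This is an in-memory illustration under stated assumptions: the server names, the virtual-node count, and both class names are hypothetical, and the failover step is triggered by an explicit method call where the design described here would rely on ZooKeeper to detect the failure and coordinate the switch.

```python
import bisect
import hashlib

def stable_hash(key):
    """Hash a string to an integer that is identical across processes and runs."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class ConsistentHashRing:
    """Maps chat IDs to servers; removing a server only moves that server's chats."""

    def __init__(self, servers, vnodes=100):
        # Each server gets many points on the ring so load spreads evenly.
        self._ring = sorted(
            (stable_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def server_for(self, chat_id):
        # Walk clockwise to the first point after the chat's hash, wrapping around.
        idx = bisect.bisect_right(self._points, stable_hash(chat_id)) % len(self._ring)
        return self._ring[idx][1]

class ReplicatedShard:
    """One active server plus a passive standby (stand-in for ZooKeeper-managed failover)."""

    def __init__(self, active, passive):
        self.active = active
        self.passive = passive

    def report_failure(self, server):
        # Promote the standby only if the failed server is the current active one.
        if server == self.active and self.passive is not None:
            self.active, self.passive = self.passive, None
        return self.active

ring = ConsistentHashRing(["sfu-a", "sfu-b", "sfu-c"])
shard = ReplicatedShard(active=ring.server_for("chat-42"), passive="standby-1")
```

Calling `shard.report_failure(shard.active)` promotes `standby-1`, so clients of "chat-42" would reconnect to it. Because the ring uses consistent hashing, dropping one server from the ring leaves every chat hosted on the other servers where it was.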

Video Recording Architecture

Recording servers subscribe to the video chat, request desired streams, and encode/upload them to cloud storage (e.g., S3).
For multiple high-definition user streams, distribute recording tasks across multiple servers and partition frames in Kafka by chat ID, so that all frames of one call land in the same partition.
Frame timestamps allow proper alignment when combining streams.
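
As an illustration of the two ideas above, the sketch below keys frames by chat ID and merges per-user streams by timestamp. It does not use Kafka itself: `partition_for` is a CRC32 stand-in for whatever key hashing a Kafka client applies, and the `Frame` type and function names are hypothetical.

```python
import heapq
import zlib
from typing import NamedTuple

class Frame(NamedTuple):
    timestamp_ms: int
    chat_id: str
    user: str
    data: bytes

def partition_for(chat_id, num_partitions):
    """Stable key-to-partition mapping (a stand-in, not Kafka's own partitioner)."""
    return zlib.crc32(chat_id.encode()) % num_partitions

def align_streams(*streams):
    """Merge per-user streams, each already in timestamp order, into one timeline."""
    return list(heapq.merge(*streams, key=lambda frame: frame.timestamp_ms))

alice = [Frame(t, "chat-42", "alice", b"") for t in (0, 40, 80)]
bob = [Frame(t, "chat-42", "bob", b"") for t in (20, 40, 60)]
timeline = align_streams(alice, bob)
```

The merged `timeline` has timestamps 0, 20, 40, 40, 60, 80, so a recording server can composite the frames that belong to the same instant. Since every frame of "chat-42" maps to the same partition, a single consumer sees the whole call.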

Key Terms & Definitions

NAT (Network Address Translation): A technique, usually performed by a router, that maps private IP addresses to a public IP for external communication.
STUN Server: Helps devices discover their public IP and NAT mapping for peer connections.
UDP (User Datagram Protocol): Connectionless network protocol with no delivery or ordering guarantees, suited for real-time media.
TCP (Transmission Control Protocol): Reliable, ordered protocol with flow/congestion control, used where reliability is more important than latency.
SFU (Selective Forwarding Unit): A server that forwards only the selected streams/resolutions to each client.
ZooKeeper: Coordination service for distributed systems, used here to manage failover.
Kafka: Distributed, partitioned log used to buffer and partition streaming frames for recording and processing.

Action Items / Next Steps

Review the concepts of TCP, UDP, NAT, and STUN.
Understand the differences between peer-to-peer and centralized architectures for group calls.
Study how selective forwarding optimizes server and client load.
Learn failover strategies using active-passive replication and ZooKeeper.
Read about Kafka stream processing for distributed video recording.