Overview
This lecture covers the systems design of large-scale video group chat applications (like Zoom or Skype), focusing on networking, architecture for scalability, selective forwarding, recording, and failover strategies.
Functional Requirements
- Support both one-on-one and group video chats with up to 100 participants.
- Enable server-side recording of video calls for later access.
- Minimize client load and maximize scalability.
Networking Refresher
- TCP guarantees ordered, reliable delivery with congestion and flow control, but adds latency.
- UDP is preferred for real-time video/audio: it skips TCP's retransmission and ordering delays, and live media can tolerate some dropped packets.
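To make the "tolerate dropped packets" point concrete, here is a minimal sketch of a receiver that keeps playing when datagrams go missing or arrive late. The 12-byte header layout and the `LossTolerantReceiver` class are assumptions made for illustration, not a real protocol such as RTP.

```python
import struct

# Hypothetical 12-byte header: 32-bit sequence number, 64-bit capture time in ms.
HEADER = struct.Struct("!IQ")


def pack_packet(seq: int, timestamp_ms: int, payload: bytes) -> bytes:
    """Prefix a media payload with the illustrative header."""
    return HEADER.pack(seq, timestamp_ms) + payload


def unpack_packet(datagram: bytes):
    """Split a datagram back into (seq, timestamp_ms, payload)."""
    seq, timestamp_ms = HEADER.unpack_from(datagram)
    return seq, timestamp_ms, datagram[HEADER.size:]


class LossTolerantReceiver:
    """Plays whatever arrives in time; never waits for a retransmission."""

    def __init__(self):
        self.last_seq = -1
        self.skipped = 0   # packets that were lost or arrived too late
        self.played = []   # payloads handed to the decoder, in order

    def on_datagram(self, datagram: bytes) -> bool:
        seq, _timestamp_ms, payload = unpack_packet(datagram)
        if seq <= self.last_seq:
            return False   # late or duplicate packet: drop it
        self.skipped += seq - self.last_seq - 1   # count the gap and move on
        self.last_seq = seq
        self.played.append(payload)
        return True


receiver = LossTolerantReceiver()
for seq in (0, 1, 3, 2, 4):   # packet 2 arrives after packet 3
    receiver.on_datagram(pack_packet(seq, seq * 33, b"frame%d" % seq))
print(receiver.played, receiver.skipped)
```

With TCP, packet 3 would be held back until packet 2 arrived. Here the late packet is simply discarded, which costs one frame but no added delay.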
Peer-to-Peer vs Centralized Server
- Peer-to-peer works for small calls but doesn't scale for large groups due to high client load.
- NAT (Network Address Translation) hides private IPs; STUN servers let peers discover their public address and port so they can connect directly, but this does not work behind every NAT (symmetric NATs, for example).
- Centralized server architecture reduces client load and is necessary for large video calls and recording.
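The client-load argument can be checked with a little arithmetic. The sketch below counts streams per client in a full peer-to-peer mesh versus a central server, assuming every participant sends one stream and views everyone else; the function names are illustrative.

```python
def mesh_load(participants: int) -> dict:
    """Full mesh: every client sends its stream to every other client."""
    n = participants
    return {
        "uploads_per_client": n - 1,
        "downloads_per_client": n - 1,
        "connections": n * (n - 1) // 2,
    }


def central_load(participants: int) -> dict:
    """Central server: every client uploads once and the server fans out."""
    n = participants
    return {
        "uploads_per_client": 1,
        "downloads_per_client": n - 1,
        "connections": n,
    }


for size in (2, 5, 100):
    print(size, mesh_load(size), central_load(size))
```

For a 100-person call, the mesh needs 99 uploads per client and 4,950 connections overall, while the central design needs one upload per client and 100 connections. Downloads are still 99 per client until selective forwarding trims them.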
Centralized Server Architecture
- Each client sends/receives video streams only to/from the central chat server.
- Central chat server can become a bottleneck if not managed carefully.
Selective Forwarding (SFU)
- Clients tell the chat server over a WebSocket connection which streams and resolutions they want.
- Three main approaches:
  - Server transcodes one client stream to multiple resolutions (high server load).
  - Use intermediary (proxy) servers for transcoding (adds latency).
  - Clients encode and send multiple resolution streams directly (more client work, less server load; the preferred method).
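The preferred approach, where clients upload several resolutions and the server only forwards, can be sketched as a routing table. The layer names, the `SelectiveForwarder` class, and the subscription shape are hypothetical; in a real system, subscriptions would arrive over the WebSocket connection.

```python
# Hypothetical resolution layers that each publisher uploads in parallel.
LAYERS = ("180p", "360p", "720p")


class SelectiveForwarder:
    """Forwards each incoming packet only to the clients that asked for it."""

    def __init__(self):
        # subscriber -> {publisher: wanted layer}
        self.subscriptions = {}

    def subscribe(self, subscriber: str, wanted: dict) -> None:
        """Replace a subscriber's wish list, e.g. {"bob": "720p"}."""
        for layer in wanted.values():
            if layer not in LAYERS:
                raise ValueError(f"unknown layer: {layer}")
        self.subscriptions[subscriber] = dict(wanted)

    def recipients(self, publisher: str, layer: str) -> list:
        """Return who should receive a packet of `layer` from `publisher`."""
        return sorted(
            subscriber
            for subscriber, wanted in self.subscriptions.items()
            if subscriber != publisher and wanted.get(publisher) == layer
        )


sfu = SelectiveForwarder()
sfu.subscribe("alice", {"bob": "720p", "carol": "180p"})  # bob is alice's main view
sfu.subscribe("dave", {"bob": "180p"})                    # dave shows bob as a thumbnail
print(sfu.recipients("bob", "720p"))
print(sfu.recipients("bob", "180p"))
print(sfu.recipients("carol", "360p"))
```

The server never decodes or re-encodes video in this sketch. It only looks up who wants which layer, which is why this option keeps server load low.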
Partitioning and Replication
- Calls are sharded by chat ID using consistent hashing to distribute load evenly.
- Large calls may be further partitioned across multiple servers, with clients connecting to more than one server if needed.
- Active-passive replication with failover using ZooKeeper ensures availability if a server fails.
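As an illustration of sharding by chat ID, here is a minimal consistent-hash ring. The server names, the virtual-node count, and the use of SHA-256 are assumptions for this sketch, not details from the lecture.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes for smoother load spreading."""

    def __init__(self, servers, vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, server)
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != server]

    def lookup(self, chat_id: str) -> str:
        """Return the first server clockwise from the chat ID's position."""
        if not self._ring:
            raise LookupError("no servers in the ring")
        index = bisect.bisect_right(self._ring, (_hash(chat_id), ""))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["chat-1", "chat-2", "chat-3"])
chat_ids = [f"call-{i}" for i in range(1000)]
before = {chat_id: ring.lookup(chat_id) for chat_id in chat_ids}
ring.remove("chat-2")  # simulate losing a server
moved = sum(1 for chat_id in chat_ids if ring.lookup(chat_id) != before[chat_id])
print(f"{moved} of {len(chat_ids)} calls were reassigned")
```

Only calls that lived on the removed server are reassigned; every other call keeps its server. A plain `hash(chat_id) % server_count` scheme would instead reshuffle most calls whenever the server count changes.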
Video Recording Architecture
- Recording servers subscribe to the video chat, request desired streams, and encode/upload them to cloud storage (e.g., S3).
- For calls with many high-definition streams, distribute recording work across multiple servers, using Kafka to partition frames by chat ID.
- Frame timestamps allow proper alignment when combining streams.
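A rough sketch of timestamp-based alignment: if each user's frames arrive in capture order, the streams can be merged into one timeline with a k-way merge. The `Frame` type and its fields are hypothetical, and the sketch assumes all clients share a common clock.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Frame(NamedTuple):
    timestamp_ms: int  # capture time, assumed comparable across clients
    user: str
    data: bytes


def merge_by_timestamp(*streams: Iterable[Frame]) -> List[Frame]:
    """Merge per-user streams (each already time-ordered) into one timeline."""
    return list(heapq.merge(*streams, key=lambda frame: frame.timestamp_ms))


alice = [Frame(0, "alice", b"a0"), Frame(40, "alice", b"a1"), Frame(80, "alice", b"a2")]
bob = [Frame(10, "bob", b"b0"), Frame(50, "bob", b"b1")]

for frame in merge_by_timestamp(alice, bob):
    print(frame.timestamp_ms, frame.user)
```

`heapq.merge` only looks at the head of each stream, so a recorder can interleave long recordings without loading them fully into memory.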
Key Terms & Definitions
- NAT (Network Address Translation): A technique, usually implemented on a router, that maps private IP addresses to a public IP for external communication.
- STUN Server: Helps devices discover their public IP and NAT mapping for peer connections.
- UDP (User Datagram Protocol): Connectionless, fast network protocol suited for real-time media.
- TCP (Transmission Control Protocol): Reliable, ordered protocol with flow/congestion control, used where reliability is more important than speed.
- Selective Forwarding Unit (SFU): Server that forwards only selected streams/resolutions to each client.
- ZooKeeper: Coordination service for distributed servers, used here to detect failures and drive failover.
- Kafka: Distributed, partitioned message log used to buffer and partition streaming frames for recording and processing.
Action Items / Next Steps
- Review the concepts of TCP, UDP, NAT, and STUN.
- Understand the differences between peer-to-peer and centralized architectures for group calls.
- Study how selective forwarding optimizes server and client load.
- Learn failover strategies using active-passive replication and ZooKeeper.
- Read about Kafka stream processing for distributed video recording.