Goal: To simplify and explain basic system design concepts for beginners.
Importance of Availability
Definition: Availability is the percentage of time a system is operational and performing its intended function.
Real-World Impact: Downtime can lead to significant revenue losses and other negative consequences, e.g., Facebook's 6-hour outage in October 2021 led to an estimated $60 million loss in ad revenue.
Different Levels of Availability: The availability a system needs depends on the cost of its downtime.
Example: An air traffic control system needs far higher availability than a restaurant reservation system.
Measuring Availability: Often measured in "nines" (9's):
2 Nines (99%): about 3.65 days downtime/year
3 Nines (99.9%): about 8.76 hours downtime/year
4 Nines (99.99%): about 52.6 minutes downtime/year
5 Nines (99.999%): about 5.3 minutes downtime/year (challenging but achievable for some systems)
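The downtime figures above follow directly from the availability percentage. A minimal sketch of the arithmetic, assuming a 365-day year; the function name downtime_seconds_per_year is made up for illustration:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a non-leap, 365-day year


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Return the downtime budget, in seconds per year, for a given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * SECONDS_PER_YEAR


# Print the yearly downtime budget for two to five nines.
for percent in (99.0, 99.9, 99.99, 99.999):
    seconds = downtime_seconds_per_year(percent)
    print(f"{percent}% -> {seconds / 3600:.2f} hours/year ({seconds / 60:.1f} minutes)")
```

Each extra nine cuts the downtime budget by a factor of ten, which is why every additional nine is harder to reach than the one before.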
Achieving High Availability
Factors Affecting Availability
Hardware failures, power outages, natural disasters
Resource exhaustion (e.g., disk space, overloads)
Software bugs (e.g., null pointers, memory leaks)
Design Strategy: Accept that failures are inevitable and design to mask these localized failures.
Eliminate Single Points of Failure
Example: If the only application server goes down, the whole website is down
Solution: Redundancy (e.g., multiple app servers sharing the load)
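To see why redundancy helps, here is a rough sketch of the combined availability of several identical servers. It assumes failures are independent and that one working server is enough to serve traffic; real failures are often correlated (shared power, shared bugs), so treat the result as an upper bound. The function name parallel_availability is illustrative only.

```python
def parallel_availability(single: float, replicas: int) -> float:
    """Availability of `replicas` identical servers when any one of them suffices.

    Assumes independent failures: the group is down only when every replica
    is down at the same time.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    all_down_probability = (1 - single) ** replicas
    return 1 - all_down_probability


# One 99% server versus two or three of them sharing the load.
for count in (1, 2, 3):
    print(count, "server(s):", f"{parallel_availability(0.99, count):.6f}")
```

Under these assumptions, two servers at 99% each give roughly 99.99% combined, which is where much of the gain from removing a single point of failure comes from.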
Load Balancer
Manages client requests and distributes them to available servers
Can itself fail -> make it redundant too (a backup setup or multiple active load balancers)
Backup Setup: A passive secondary load balancer takes over the floating IP if the primary fails
Active-Active Setup: Both load balancers serve traffic, with DNS pointing clients at both
DNS Issue: DNS servers are not immediately aware of a load balancer failure; an additional monitoring service is needed to update DNS
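The load balancer ideas above can be sketched in a few lines. This is a simplified illustration, not how any particular load balancer or DNS provider works: the class RoundRobinBalancer, the function healthy_dns_records, the server names, and the is_healthy callback are all hypothetical, and the addresses come from the range reserved for documentation.

```python
from itertools import cycle
from typing import Callable, Iterable, List


class RoundRobinBalancer:
    """Hand out servers in rotation, skipping any that fail the health check."""

    def __init__(self, servers: Iterable[str], is_healthy: Callable[[str], bool]):
        self._servers = list(servers)
        if not self._servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(self._servers)
        self._is_healthy = is_healthy

    def next_server(self) -> str:
        # Try each server at most once per call, so a full outage cannot loop forever.
        for _ in range(len(self._servers)):
            candidate = next(self._rotation)
            if self._is_healthy(candidate):
                return candidate
        raise RuntimeError("no healthy servers available")


def healthy_dns_records(lb_addresses: Iterable[str],
                        is_healthy: Callable[[str], bool]) -> List[str]:
    """What a monitoring service might publish to DNS: only the live load balancers."""
    return [address for address in lb_addresses if is_healthy(address)]


# Example: app-2 is down, so requests alternate between app-1 and app-3.
down = {"app-2"}
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"], lambda s: s not in down)
print([balancer.next_server() for _ in range(4)])

# Example: the first load balancer has failed, so only the second stays in DNS.
print(healthy_dns_records(["203.0.113.10", "203.0.113.11"],
                          lambda a: a != "203.0.113.10"))
```

In practice the health check would be a network probe run on a schedule, and clients may keep using a cached DNS answer until its TTL expires, which is why a DNS-based setup reacts more slowly than a floating IP.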
Geographic Considerations
Geographic Redundancy
Distribute servers globally to mitigate regional outages
Improves latency by serving requests from the nearest server
Trade-offs: Increased complexity and cost
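A small sketch of how geographic redundancy can serve both goals at once: route each client to the lowest-latency region, and fall back to the next best one during a regional outage. The region names, latency numbers, and the function pick_region are invented for illustration; real systems usually rely on DNS-based or anycast routing rather than application code like this.

```python
from typing import Dict, Set


def pick_region(latency_ms: Dict[str, float], healthy: Set[str]) -> str:
    """Return the healthy region with the lowest measured latency for a client."""
    candidates = {region: ms for region, ms in latency_ms.items() if region in healthy}
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=candidates.get)


# Hypothetical latencies measured from one client to each region.
client_latency = {"eu-west": 20.0, "us-east": 95.0, "ap-south": 180.0}

print(pick_region(client_latency, {"eu-west", "us-east", "ap-south"}))  # nearest region
print(pick_region(client_latency, {"us-east", "ap-south"}))  # eu-west is out
```

The fallback is what turns extra regions into extra availability; the price is the added complexity of keeping data and deployments in sync across regions.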
General Advice
Start Simple: Begin with the simplest design, then optimize
Consider Trade-offs: More components increase complexity and the potential for failure
Justify Redundancy: Add redundancy only where it is critical for system function and data integrity
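The trade-off behind this advice can be put in numbers. When a request must pass through several components in a row, the chain is only up when all of them are up, so each added component lowers the total. A minimal sketch, assuming independent failures and made-up availability figures; serial_availability is an illustrative name:

```python
from math import prod
from typing import Sequence


def serial_availability(components: Sequence[float]) -> float:
    """Availability of a request path that needs every component to be up.

    Assumes the components fail independently of one another.
    """
    return prod(components)


# Hypothetical path: load balancer -> app server -> database, each at 99.9%.
path = [0.999, 0.999, 0.999]
print(f"{serial_availability(path):.6f}")
```

Three components at 99.9% each give a path that is slightly worse than any single one of them, which is why every extra component should earn its place.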
Conclusion
This video aims to make system design concepts simple and accessible
Future videos will cover more topics like load balancing and DNS in detail