Understanding Scalability in System Design

Oct 28, 2024

Lecture Notes: Scalability in System Design

Introduction to Scalability

  • Scalability is crucial for applications that may experience sudden traffic surges.
  • Aim: Build applications that maintain performance under pressure.

What is Scalability?

  • A system is scalable if it can handle increased loads by adding resources without sacrificing performance.
  • Focus on efficiency and cost-effectiveness in scaling strategies.

Key Considerations

  • Added resources (processors, servers) must be coordinated efficiently; coordination overhead can otherwise erode the performance gains they provide.
  • Evaluate scalability by comparing systems, not just labeling them as scalable or not.
  • Use response vs. demand curves to visually assess scalability:
    • X-axis: Demand
    • Y-axis: Response Time

Limits of Scalability

  • No system is infinitely scalable; every system has limits.
  • Identify the tipping point on the response vs. demand curve where performance degrades; a small sketch of locating it from measured data follows.
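
A minimal sketch of locating that tipping point from load-test data. The sample measurements and the slope threshold below are illustrative assumptions, not numbers from any real system.

```python
# Locate the approximate tipping point on a response vs. demand curve.
# Both the sample measurements and the slope factor are illustrative.

def find_tipping_point(measurements, slope_factor=3.0):
    """measurements: list of (demand, response_ms) pairs, sorted by demand.
    Returns the demand level where the curve's slope first exceeds
    `slope_factor` times the initial slope, or None if it never does."""
    if len(measurements) < 3:
        return None
    (d0, r0), (d1, r1) = measurements[0], measurements[1]
    base_slope = (r1 - r0) / (d1 - d0)
    for (da, ra), (db, rb) in zip(measurements[1:], measurements[2:]):
        slope = (rb - ra) / (db - da)
        if slope > slope_factor * max(base_slope, 1e-9):
            return db
    return None

# Example: response time stays flat, then degrades sharply past ~800 req/s.
samples = [(100, 45), (200, 47), (400, 52), (800, 70), (1000, 160), (1200, 420)]
print(find_tipping_point(samples))  # -> 1000
```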

Common Causes of Scaling Bottlenecks

  1. Centralized Components
    • E.g., a single database server can become a bottleneck.
  2. High Latency Operations
    • Time-consuming tasks can slow down overall response time.
    • Mitigation strategies: optimize the slow operation itself, cache its results, and replicate data closer to where it is read.

Principles for Building Scalable Systems

  1. Statelessness

    • Servers do not retain client-specific data between requests.
    • Enhances horizontal scalability and fault tolerance.
    • For stateful applications, externalize state management (a session-store sketch follows this list).
  2. Loose Coupling

    • Design components to operate independently with minimal dependencies.
    • Use well-defined APIs for communication.
    • Allows for scaling specific parts without affecting the entire system.
  3. Asynchronous Processing

    • Use event-driven architecture for non-blocking operations.
    • Reduces tight coupling and the risk of cascading failures (a queue-based sketch follows this list).
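
A minimal sketch of externalizing session state so that any server instance can handle any request. The in-memory SessionStore below stands in for a shared store such as Redis or Memcached, and all names here are illustrative.

```python
import uuid

# Stand-in for an external shared store (e.g., Redis or Memcached in production).
# Keeping session data here, rather than in server memory, lets any instance
# behind the load balancer serve any request.
class SessionStore:
    def __init__(self):
        self._data = {}

    def save(self, session_id, state):
        self._data[session_id] = state

    def load(self, session_id):
        return self._data.get(session_id, {})

store = SessionStore()

def handle_login(username):
    # The server generates an ID but keeps no per-client state locally.
    session_id = str(uuid.uuid4())
    store.save(session_id, {"user": username, "cart": []})
    return session_id

def handle_add_to_cart(session_id, item):
    # Any server instance can rebuild the client's context from the store.
    state = store.load(session_id)
    state.setdefault("cart", []).append(item)
    store.save(session_id, state)
    return state

sid = handle_login("alice")
print(handle_add_to_cart(sid, "book"))  # {'user': 'alice', 'cart': ['book']}
```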
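
A minimal sketch of queue-based asynchronous processing: the request handler enqueues work and returns immediately, while a worker drains the queue in the background. A thread and queue.Queue stand in for a real message broker such as RabbitMQ or Kafka; the job names are illustrative.

```python
import queue
import threading
import time

jobs = queue.Queue()  # stands in for a message broker (e.g., RabbitMQ, Kafka)

def worker():
    # Consumes jobs independently of the request path, so slow tasks
    # do not block callers and failures stay isolated.
    while True:
        job = jobs.get()
        if job is None:  # shutdown signal
            break
        time.sleep(0.1)  # simulate a slow operation (e.g., sending an email)
        print(f"processed {job}")
        jobs.task_done()

def handle_request(payload):
    # Non-blocking: enqueue the work and respond right away.
    jobs.put(payload)
    return {"status": "accepted"}

t = threading.Thread(target=worker, daemon=True)
t.start()
for i in range(3):
    print(handle_request(f"job-{i}"))
jobs.join()      # wait for the worker to drain the queue
jobs.put(None)   # signal shutdown
```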

Scaling Strategies

  1. Vertical Scaling (Scaling Up)

    • Increase capacity of a single machine (CPU, RAM, Storage).
    • Limitations: hardware ceilings and steeply rising cost per unit of added capacity.
  2. Horizontal Scaling (Scaling Out)

    • Add more machines to share the workload.
    • Better fault tolerance and cost-effectiveness for large-scale systems.
    • Challenges: Data consistency and managing distributed systems.

Techniques for Building Scalable Systems

  1. Load Balancing

    • Distribute incoming requests efficiently across available servers.
    • Algorithms: round-robin, least connections, and performance-based routing (round-robin and least connections are sketched after this list).
  2. Caching

    • Store frequently accessed data closer to where it's needed.
    • Use client-side, server-side, or distributed caches (a server-side TTL cache is sketched after this list).
    • Consider a Content Delivery Network (CDN) to serve content from locations close to globally distributed users.
  3. Sharding

    • Split large datasets into smaller pieces across different servers.
    • Parallel processing and workload distribution.
    • Choose an effective sharding strategy to avoid hotspots (a hash-based routing sketch follows this list).
  4. Avoid Centralized Resources

    • Centralized components become bottlenecks under load.
    • Use multiple queues for processing and break long tasks into smaller tasks.
  5. Modularity in Design

    • Create loosely coupled modules communicating through APIs.
    • Enhances scalability and maintainability.
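
A minimal sketch of two of the listed load-balancing algorithms, round-robin and least connections, over a static server pool; the server names and connection counts are illustrative assumptions.

```python
import itertools

servers = ["app-1", "app-2", "app-3"]          # illustrative backend pool
active_connections = {s: 0 for s in servers}   # tracked by the balancer

# Round-robin: cycle through the pool in order.
_rr = itertools.cycle(servers)
def round_robin():
    return next(_rr)

# Least connections: pick the server currently handling the fewest requests.
def least_connections():
    return min(servers, key=lambda s: active_connections[s])

def dispatch(request, strategy=least_connections):
    server = strategy()
    active_connections[server] += 1
    try:
        return f"{server} handled {request}"   # forward the request here
    finally:
        active_connections[server] -= 1

print(dispatch("GET /home"))
print([round_robin() for _ in range(4)])  # ['app-1', 'app-2', 'app-3', 'app-1']
```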
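
A minimal server-side caching sketch with a time-to-live, so repeated reads skip the slow lookup. The fetch_user_profile function and the 30-second TTL are illustrative; a distributed cache such as Redis would play the same role across many servers.

```python
import time
from functools import wraps

def ttl_cache(ttl_seconds=30):
    """Cache results in process memory for `ttl_seconds`."""
    def decorator(fn):
        store = {}
        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit and now - hit[1] < ttl_seconds:
                return hit[0]            # cache hit: skip the slow path
            value = fn(*args)
            store[args] = (value, now)   # cache miss: store with timestamp
            return value
        return wrapper
    return decorator

@ttl_cache(ttl_seconds=30)
def fetch_user_profile(user_id):
    time.sleep(0.2)                      # simulate a slow database query
    return {"id": user_id, "name": f"user-{user_id}"}

fetch_user_profile(42)   # slow: hits the "database"
fetch_user_profile(42)   # fast: served from the in-memory cache
```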
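
A minimal sketch of hash-based shard routing: a stable hash of the key decides which server holds the data, spreading keys roughly evenly to reduce hotspots. The four shard names and user keys are illustrative; production systems often use consistent hashing so that adding shards moves fewer keys.

```python
import hashlib

SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]  # illustrative

def shard_for(key: str) -> str:
    """Map a key to a shard with a stable hash, so the same key always
    routes to the same server and keys spread roughly evenly."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Writes and reads for a given user always go to the same shard.
for user in ["user:1001", "user:1002", "user:1003"]:
    print(user, "->", shard_for(user))
```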

Continuous Improvement

  • Scalability is an ongoing process of monitoring and optimization.
  • Key metrics to monitor: CPU usage, memory consumption, network bandwidth, response times, and throughput (a minimal collection sketch follows this list).
  • Adapt architecture as application needs change.
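
A minimal monitoring sketch, assuming the third-party psutil package is installed; response times and throughput would come from the application's own instrumentation, so only placeholders appear for them here.

```python
import psutil  # third-party; assumed available (pip install psutil)

def collect_metrics():
    """Snapshot the host-level metrics listed above; application-level
    numbers (response time, throughput) are placeholders here."""
    net = psutil.net_io_counters()
    return {
        "cpu_percent": psutil.cpu_percent(interval=1),
        "memory_percent": psutil.virtual_memory().percent,
        "bytes_sent": net.bytes_sent,
        "bytes_recv": net.bytes_recv,
        "p95_response_ms": None,   # from request logs / APM (placeholder)
        "throughput_rps": None,    # from request logs / APM (placeholder)
    }

print(collect_metrics())
```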

Conclusion

  • Building scalable systems requires thoughtful design and ongoing evaluation.
  • Subscribe to the system design newsletter for more insights.