Essential Principles of System Design

Aug 14, 2024

System Design Tutorial Notes

Introduction

  • Focus on scalability, reliability, data handling, and high-level architecture.
  • Concepts are framed around what system design interviews typically cover.
  • Emphasis on understanding how system components fit together, not just on writing code.

Computer Architecture Basics

Layers of a Computer System

  • Computers operate on binary data (0s and 1s).
  • Bits: Smallest unit of data.
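  • Bytes: 1 byte = 8 bits; enough to represent one ASCII character or an integer from 0 to 255.
  • Larger units: Kilobyte (KB), Megabyte (MB), Gigabyte (GB), Terabyte (TB); each step is 1,000x the previous in decimal (SI) usage, or 1,024x in binary usage.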
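A minimal sketch of how a raw byte count maps onto these units. It assumes the binary (1,024-based) convention by default; the function name human_readable is illustrative only.

```python
# Unit names from smallest to largest, as listed in the notes.
UNITS = ["B", "KB", "MB", "GB", "TB"]

def human_readable(num_bytes: int, base: int = 1024) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop once the value fits in this unit, or when we run out of units.
        if value < base or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= base
    raise AssertionError("unreachable: the loop always returns at the last unit")

print(human_readable(1536))          # 1.5 KB
print(human_readable(5 * 1024**3))   # 5.0 GB
print(human_readable(1000, base=1000))  # 1.0 KB (decimal convention)
```

Passing base=1000 switches to the decimal convention that disk vendors commonly use, which is why a "1 TB" drive shows up as less than 1 TB in many operating systems.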

Storage Types

  • Disk Storage: HDD (hard disk drive) vs. SSD (solid state drive).
    • HDD: Slower (80-160 MB/s), non-volatile, larger capacity.
    • SSD: Faster (500-3500 MB/s), more expensive, non-volatile.
  • RAM (Random Access Memory): Volatile memory used for active data.
    • Size: Ranges from GB in consumer devices to hundreds of GB in servers.
    • Speed: Often exceeds 5,000 MB/s.
  • Cache: Small, very fast memory (typically KB to MB) that keeps frequently accessed data close to the CPU.
    • Levels: L1, L2, and L3; L1 is the smallest and fastest, L3 the largest and slowest.
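To make the throughput figures above concrete, here is a back-of-the-envelope calculation of how long it takes to stream 1 GB (taken as 1,000 MB) from each storage tier. It is an idealized sketch: it assumes constant sequential throughput using the ranges quoted in these notes, and ignores seek time, random access, and caching effects.

```python
# Sequential throughput in MB/s, taken from the ranges quoted in the notes.
# RAM is listed at its "often exceeds" lower bound.
THROUGHPUT_MB_S = {
    "HDD (low)": 80,
    "HDD (high)": 160,
    "SSD (low)": 500,
    "SSD (high)": 3500,
    "RAM (lower bound)": 5000,
}

def read_time_seconds(size_mb: float, throughput_mb_s: float) -> float:
    """Idealized time to read size_mb at a constant throughput."""
    return size_mb / throughput_mb_s

for name, speed in THROUGHPUT_MB_S.items():
    print(f"{name:18} 1 GB in {read_time_seconds(1000, speed):6.2f} s")
```

Even in this idealized model, the gap between a slow HDD (12.5 s) and RAM (0.2 s) is more than 60x, which is the main reason hot data is kept in memory or cache.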

CPU and Motherboard

  • CPU: Processes instructions (fetch, decode, execute).
  • Motherboard: Connects all components and facilitates data flow.
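The fetch-decode-execute cycle can be illustrated with a toy interpreter. The instruction set below (LOAD, ADD, MUL, HALT on a single accumulator register) is entirely hypothetical and chosen only to show the three phases; real CPUs decode binary machine code, not text.

```python
def run(program: list[str]) -> int:
    """Run a toy accumulator machine and return the final accumulator value."""
    acc = 0  # accumulator register
    pc = 0   # program counter: index of the next instruction
    while pc < len(program):
        instruction = program[pc]                 # fetch
        pc += 1
        opcode, *operands = instruction.split()   # decode
        if opcode == "LOAD":                      # execute
            acc = int(operands[0])
        elif opcode == "ADD":
            acc += int(operands[0])
        elif opcode == "MUL":
            acc *= int(operands[0])
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return acc

print(run(["LOAD 2", "ADD 3", "MUL 4", "HALT"]))  # 20
```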

High-Level Architecture of a Production-ready App

CI/CD Pipeline

  • Continuous Integration and Continuous Deployment (CI/CD) automate code testing and deployment.
  • Tools: Jenkins, GitHub Actions.

Handling User Requests

  • Load Balancers: Distribute incoming requests across multiple servers.
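  • Reverse Proxies: Sit in front of the servers, receive user requests, and forward them to the right backend, which can improve performance (e.g., through caching and load balancing).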

Data Storage

  • Data is usually kept on separate storage servers rather than on the production (application) servers themselves.
  • Logging and Monitoring: Essential for tracking interactions and anomalies.
    • Tools: PM2 for back end; Sentry for front end.

Alerting and Debugging

  • Alerts are pushed to channels the team already watches (e.g., Slack) so issues are noticed and resolved in real time.
  • Debugging process: Identify the issue, reproduce it in a safe environment (e.g., staging), and apply a hotfix.

Key Principles of System Design

  • Scalability: Ability to handle a growing user base and load.
  • Maintainability: Future developers can understand and enhance the system.
  • Efficiency: Optimal resource usage.
  • Failure Planning: Building resilience against errors and outages.

CAP Theorem (Brewer's Theorem)

  • Consistency: All nodes have the same data at the same time.
  • Availability: System is operational and responsive to requests.
  • Partition Tolerance: Continues functioning despite network partitions.
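  • Trade-offs: A distributed system can guarantee at most two of the three properties at once. Because network partitions cannot be ruled out in practice, the real choice during a partition is between consistency and availability.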

Measuring System Performance

  • Availability: Measured in uptime/downtime (e.g., aiming for 99.999% availability).
  • Service Level Objectives (SLO): Performance goals set for services.
  • Reliability: Consistent performance under expected conditions.
  • Throughput: Amount of work handled per unit of time (e.g., requests per second or queries per second).
  • Latency: Time taken to handle a single request.
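An availability target translates directly into a downtime budget. The sketch below computes that budget and checks observed downtime against an SLO; it assumes a 365-day year (leap years ignored), and the function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored

def allowed_downtime_minutes(availability_pct: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Downtime budget implied by an availability target over a period."""
    return period_minutes * (1 - availability_pct / 100)

def meets_slo(downtime_minutes: float, availability_pct: float,
              period_minutes: float = MINUTES_PER_YEAR) -> bool:
    """True if observed downtime stays within the budget implied by the SLO."""
    return downtime_minutes <= allowed_downtime_minutes(availability_pct, period_minutes)

for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.2f} min/year")
```

This prints roughly 5256, 525.6, 52.56, and 5.26 minutes per year: each extra "nine" cuts the allowed downtime by a factor of ten, which is why 99.999% is so hard to achieve.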

Networking Basics

  • IP Addresses: Unique identifiers for devices (IPv4 and IPv6).
  • Data Transmission: Governed by Internet Protocol, involving packets with headers.
  • Transport Layer: TCP (reliable) vs. UDP (faster but less reliable).
  • DNS: Translates domain names to IP addresses.
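Python's standard ipaddress module can show the difference between the two address families. This sketch parses a few example addresses and reports the version, the address size in bits (32 for IPv4, 128 for IPv6), and whether the address is private or loopback. The helper name describe is illustrative.

```python
import ipaddress

def describe(text: str) -> tuple[int, int, bool, bool]:
    """Return (IP version, address size in bits, is_private, is_loopback)."""
    addr = ipaddress.ip_address(text)  # raises ValueError for malformed input
    return addr.version, addr.max_prefixlen, addr.is_private, addr.is_loopback

for text in ("192.168.1.10", "8.8.8.8", "::1"):
    print(text, describe(text))
```

The much larger IPv6 address space (2^128 vs. 2^32 addresses) is the main reason IPv6 was introduced.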

Application Layer Protocols

  • HTTP: Request-response protocol; stateless.
  • WebSockets: Two-way communication for real-time updates.
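  • SMTP, POP3, IMAP: Email protocols; SMTP sends messages, while POP3 and IMAP retrieve them from a mail server.
  • FTP (File Transfer Protocol): Transfers files between a client and a server, e.g., uploading files when maintaining a website.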
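To show what "request-response" and "stateless" mean on the wire, here is a sketch that serializes a minimal HTTP/1.1 GET request. The host, path, and token are placeholder values. Because the server keeps no state between requests, the client has to resend anything the server needs (here, an Authorization header) on every request. The resulting bytes could be written to a TCP connection opened with socket.create_connection; that step is omitted here.

```python
from typing import Dict, Optional

def build_get_request(host: str, path: str,
                      headers: Optional[Dict[str, str]] = None) -> bytes:
    """Serialize a minimal HTTP/1.1 GET request (request line, headers, blank line)."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    lines.append("Connection: close")
    # Each line ends with CRLF; an empty line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

# Stateless: credentials travel with every request instead of living on the server.
request = build_get_request("example.com", "/profile",
                            {"Authorization": "Bearer example-token"})
print(request.decode("ascii"))
```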

API Design Basics

CRUD Operations

  • CRUD (Create, Read, Update, Delete) operations form the basis of most APIs.
  • Common HTTP methods: POST (create), GET (read), PUT (update), DELETE (delete).
  • REST vs. GraphQL vs. gRPC: Different paradigms for API design.
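A minimal sketch of the CRUD-to-HTTP mapping in a REST-style handler. It assumes a hypothetical in-memory "users" collection and returns (status code, body) tuples instead of using a real web framework; the status codes follow common REST conventions (201 Created, 204 No Content, 404 Not Found, 405 Method Not Allowed).

```python
import itertools

class InMemoryResource:
    """Hypothetical in-memory collection that maps HTTP methods to CRUD operations."""

    def __init__(self):
        self._items = {}
        self._ids = itertools.count(1)  # server-assigned ids: 1, 2, 3, ...

    def handle(self, method, item_id=None, body=None):
        if method == "POST":  # Create: the server assigns the id
            new_id = next(self._ids)
            self._items[new_id] = dict(body)
            return 201, {"id": new_id, **self._items[new_id]}
        if method not in ("GET", "PUT", "DELETE"):
            return 405, None  # method not allowed
        if item_id not in self._items:
            return 404, None  # unknown resource
        if method == "GET":  # Read
            return 200, {"id": item_id, **self._items[item_id]}
        if method == "PUT":  # Update: full replacement of the stored record
            self._items[item_id] = dict(body)
            return 200, {"id": item_id, **self._items[item_id]}
        del self._items[item_id]  # Delete
        return 204, None

users = InMemoryResource()
print(users.handle("POST", body={"name": "Ada"}))  # (201, {'id': 1, 'name': 'Ada'})
```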

Best Practices

  • Maintain backward compatibility with versioning.
  • Implement rate limiting to prevent abuse.
  • Configure CORS (Cross-Origin Resource Sharing) to control which web origins may call the API from a browser.
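The notes do not specify a rate-limiting algorithm; one common choice is a token bucket, sketched below. Each client gets a bucket that holds up to `capacity` tokens and refills at `rate` tokens per second; a request is allowed only if a token is available. The clock is injectable so the behavior can be tested deterministically. Per-client bookkeeping (e.g., a dict keyed by API key) is left out.

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: bursts up to `capacity`, refills at `rate` tokens/second."""

    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller would typically respond with HTTP 429 Too Many Requests
```

A token bucket tolerates short bursts while still enforcing an average rate; a fixed-window counter is simpler but lets clients send up to twice the limit around window boundaries.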

Caching and Content Delivery Networks (CDNs)

Caching Techniques

  • Browser Caching: Stores resources on local machines.
  • Server-Side Caching: Reduces database queries.
  • Database Caching: Keeps frequently queried results in an in-memory store such as Redis.
  • CDN: Distributes static content geographically to reduce latency.
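Server-side and database caching both rely on keeping hot data in fast storage and evicting the rest. This sketch shows a least-recently-used (LRU) cache and a cache-aside read path. It is an in-process stand-in: a real deployment would typically use a separate store such as Redis, and the `db` dict here is a hypothetical placeholder for a slow database query.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-size cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()  # ordered from least to most recently used

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry

def get_user(user_id, cache, db):
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    cached = cache.get(user_id)
    if cached is not None:
        return cached
    value = db[user_id]  # stand-in for a slow database query
    cache.put(user_id, value)
    return value
```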

Proxy Servers

  • Forward Proxy: Acts on behalf of clients.
  • Reverse Proxy: Acts on behalf of servers, managing requests and load balancing.

Load Balancers

  • Distributes requests among servers to prevent overload.

Load Balancing Strategies

  • Round Robin, Least Connections, Least Response Time, IP Hashing.
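Three of these strategies are simple enough to sketch in a few lines each (Least Response Time is omitted because it needs live latency measurements). The server names are placeholders. IP hashing uses SHA-256 rather than Python's built-in hash(), because hash() on strings is randomized per process and would not keep a client on the same server across restarts.

```python
import hashlib
import itertools

class RoundRobin:
    """Hand out servers in a fixed rotating order."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self, client_ip=None):
        return next(self._cycle)

class LeastConnections:
    """Send each request to the server with the fewest active connections."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self, client_ip=None):
        server = min(self.active, key=self.active.get)  # ties go to the first server listed
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1  # call when the connection closes

class IPHash:
    """Map each client IP to the same server every time (sticky routing)."""
    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
        return self.servers[int.from_bytes(digest[:8], "big") % len(self.servers)]
```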

Database Essentials

  • Types: SQL (relational) vs. NoSQL (schema-less) databases.
  • Scaling Methods: Vertical (scale-up) vs. Horizontal (scale-out).
  • Sharding: Distributing data across servers.
  • Replication: Keeping copies for high availability.
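A minimal sketch combining hash-based sharding with replication. It assumes plain in-memory dicts as stand-ins for database servers, routes each key to a shard with a stable hash modulo the shard count, and writes synchronously to every replica of that shard. Real systems differ in the details (asynchronous replication, leader election, rebalancing).

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Stable hash-based shard assignment (same key -> same shard on every run)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

class ShardedStore:
    """Hypothetical store: each shard is a list of replica dicts."""

    def __init__(self, num_shards: int, replicas_per_shard: int = 2):
        self.shards = [[{} for _ in range(replicas_per_shard)] for _ in range(num_shards)]

    def put(self, key, value):
        # Synchronous replication: write to every copy of the owning shard.
        for replica in self.shards[shard_for(key, len(self.shards))]:
            replica[key] = value

    def get(self, key, replica_index=0):
        # Any replica can serve reads, which spreads load and survives a replica failure.
        return self.shards[shard_for(key, len(self.shards))][replica_index].get(key)
```

One caveat of plain modulo sharding: changing the number of shards reassigns most keys, which is why consistent hashing is often used when shards are added or removed frequently.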

Database Performance Techniques

  • Caching, indexing, and query optimization.
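Indexing is easy to observe with SQLite, which ships with Python. This sketch creates a hypothetical users table, then compares the query plan for a lookup by email before and after adding an index: without the index SQLite must scan the whole table, and with it SQLite can search the index directly. The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(10_000)],
)

def plan(sql, params=()):
    """Return SQLite's query plan; the last column of each row is the description."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

query = "SELECT name FROM users WHERE email = ?"
before_plan = plan(query, ("user42@example.com",))  # full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after_plan = plan(query, ("user42@example.com",))   # search via idx_users_email

print(before_plan)
print(after_plan)
```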

Conclusion

  • Importance of thoughtful system design for scalability, maintainability, and efficiency.
  • Understanding trade-offs and choosing the right architecture based on specific use cases.