Key Concepts in System Design

Oct 15, 2024

System Design Crash Course

Overview

  • Covers scalability, reliability, data handling, and high-level architecture.
  • Focus on concepts for system design interviews.
  • Importance of understanding how different parts of a computer work together.

Computer Architecture Basics

Data Representation

  • Computers understand binary (0s and 1s).
  • Bit: smallest data unit.
  • Byte: 8 bits; can represent one ASCII character or an integer from 0 to 255.
  • Data sizes: Kilobyte (KB), Megabyte (MB), Gigabyte (GB), Terabyte (TB); each is roughly 1,000 times the previous one.
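
The units above can be illustrated with a short Python sketch. The variable names are only illustrative, and the decimal (power-of-1000) size convention is an assumption; some tools report binary (power-of-1024) sizes instead.

```python
# One byte is 8 bits, so it can hold 2**8 distinct values (0 to 255).
values_per_byte = 2 ** 8

# The ASCII character "A" has code 65, which fits in a single byte.
code = ord("A")
bits = format(code, "08b")       # the code as an 8-character binary string
encoded = "A".encode("ascii")    # a bytes object holding exactly one byte

# Decimal size units (assumed convention): each step is a factor of 1000.
KB = 1000
MB = 1000 * KB
GB = 1000 * MB
TB = 1000 * GB

print(values_per_byte, code, bits, len(encoded), TB)
```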

Storage Types

  • Disk Storage: HDD vs. SSD
    • Non-volatile, holds OS, applications, user files.
    • SSDs are faster (roughly 500-3,500 MB/s) than HDDs (roughly 80-160 MB/s).
  • RAM (Random Access Memory):
    • Volatile memory, stores active data.
    • Faster read/write speeds than disk (5,000 MB/s or more).
  • Cache Memory:
    • Smaller than RAM, faster access (few nanoseconds for L1 cache).
    • Purpose: reduce average data access time.
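
To get a feel for what these throughput figures mean, here is a rough calculation of how long it would take to move 1 GB at each speed. It is a simplified sketch: it assumes sustained sequential throughput and ignores latency, and the function name is hypothetical.

```python
def transfer_seconds(size_mb: float, throughput_mb_per_s: float) -> float:
    """Time to move size_mb megabytes at a sustained throughput."""
    return size_mb / throughput_mb_per_s

one_gb_in_mb = 1000  # decimal convention assumed

# Throughput values are taken from the ranges quoted in the list above.
hdd_slow = transfer_seconds(one_gb_in_mb, 80)     # slow end of the HDD range
ssd_fast = transfer_seconds(one_gb_in_mb, 3500)   # fast end of the SSD range
ram = transfer_seconds(one_gb_in_mb, 5000)        # lower bound quoted for RAM

print(f"HDD: {hdd_slow:.2f} s, SSD: {ssd_fast:.2f} s, RAM: {ram:.2f} s")
```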

CPU and Motherboard

  • CPU: Executes instructions, processes operations.
  • Motherboard: Connects components, allows data flow.

High-Level Architecture of a Production App

CI/CD Pipeline

  • Automates building, testing, and deploying code from the repository to the production server.
  • Tools: Jenkins, GitHub Actions.

Load Balancers and Reverse Proxies

  • Distribute user requests across multiple servers.
  • Example: NGINX.

External Storage and Monitoring

  • External storage for data; logging and monitoring systems for performance tracking.
  • Example tools: PM2 (Node.js process manager with log handling, backend), Sentry (error tracking, commonly used on the frontend).

Alerting and Debugging

  • Alerting systems notify developers of issues (e.g., Slack integration).
  • Debugging process: identify issue, replicate in safe environment, fix, and roll out hotfix.

Principles of Good Design

Key Principles

  • Scalability: The system can grow with its user base.
  • Maintainability: Future developers can understand and improve the system.
  • Efficiency: Makes the best use of available resources.
  • Planning for Failure: The system keeps working when components fail and holds up under stress.

Core Elements

  1. Moving Data: Seamless data flow.
  2. Storing Data: SQL vs NoSQL, indexing strategies.
  3. Transforming Data: Raw data into meaningful information.

CAP Theorem

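  • Consistency: All nodes return the same, most recent data.
  • Availability: The system stays operational and responds to requests.
  • Partition Tolerance: The system continues to function during network partitions.
  • Trade-offs are required: when a network partition occurs, a distributed system must choose between consistency and availability, based on its requirements.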

Measuring System Availability

  • Availability: The proportion of time the system is operational and accessible.
  • Measured as a percentage (e.g., 99.9% availability).
  • Service Level Objectives (SLOs): Goals for system performance.
  • Service Level Agreements (SLAs): Contracts defining service commitments.
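
An availability percentage can be turned into a yearly downtime budget with simple arithmetic. The sketch below assumes a 365-day year; the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years

def allowed_downtime_hours(availability_percent: float) -> float:
    """Hours per year a system may be down and still meet the target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)

for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {allowed_downtime_hours(target):.2f} hours of downtime per year")
```

For example, 99.9% availability leaves a budget of about 8.76 hours of downtime per year.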

System Resilience

  • Reliability: The system performs consistently over time.
  • Fault Tolerance: The system continues operating when components fail.
  • Redundancy: Backup systems to prevent service loss.
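
The effect of redundancy can be estimated with a simple probability calculation. This is a sketch under a strong assumption: the replicas fail independently, which real systems only approximate (shared power, network, or software bugs can take out several copies at once). The function name is hypothetical.

```python
def combined_availability(single: float, replicas: int) -> float:
    """Availability of redundant replicas when any one working copy is enough.

    Assumes failures are independent of each other.
    """
    return 1 - (1 - single) ** replicas

# One server at 99% versus two or three redundant servers at 99% each.
for n in (1, 2, 3):
    print(n, f"{combined_availability(0.99, n):.6f}")
```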

Networking Basics

Communication

  • IP Address: Unique identifier for a device on a network.
  • Packets: Units of transmitted data; each carries an IP header with the source and destination addresses used for routing.
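
Python's standard `ipaddress` module gives a quick way to inspect an address. The address and network below are arbitrary examples chosen for illustration.

```python
import ipaddress

# Example values only; any IPv4 address and network would work here.
addr = ipaddress.ip_address("192.168.1.10")
network = ipaddress.ip_network("192.168.1.0/24")

print(addr.version)      # IP version of the address (4 or 6)
print(addr.is_private)   # whether the address is in a private range
print(addr in network)   # whether the address belongs to the network
print(int(addr))         # the same IPv4 address as a 32-bit integer
```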

Protocols

  • TCP: Reliable, connection-oriented.
  • UDP: Faster, connectionless, suitable for time-sensitive communication.
  • DNS: Translates domain names to IP addresses.

Application Layer Protocols

  • HTTP: Request-response protocol for web browsing.
  • WebSockets: Two-way communication for real-time updates.
  • SMTP, IMAP, POP3: Email transmission and retrieval protocols.
  • FTP: File transfer protocol.
  • SSH: Secure remote login and command execution; SFTP and SCP transfer files over it.

API Design Principles

  • CRUD Operations: Create, Read, Update, Delete.
  • REST: Stateless interactions using standard HTTP methods.
  • GraphQL: Flexible querying to avoid data overfetching/underfetching.
  • gRPC: Efficient for microservices using protocol buffers.
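
In a REST-style API, the CRUD operations are conventionally mapped to HTTP methods (POST, GET, PUT, DELETE). The sketch below shows that mapping with a hypothetical in-memory `users` resource and a hypothetical `handle` function; a real service would sit behind an HTTP framework and a database.

```python
from typing import Optional

# Hypothetical in-memory "users" resource keyed by id.
users = {}
_next_id = 1

def handle(method: str, user_id: Optional[int] = None, body: Optional[dict] = None):
    """Dispatch an HTTP-style method to a CRUD operation.

    Returns a (status_code, payload) tuple.
    """
    global _next_id
    if method == "POST":                        # Create
        user = {**(body or {}), "id": _next_id}
        users[_next_id] = user
        _next_id += 1
        return 201, user
    if user_id not in users:
        return 404, None
    if method == "GET":                         # Read
        return 200, users[user_id]
    if method == "PUT":                         # Update (full replacement)
        users[user_id] = {**(body or {}), "id": user_id}
        return 200, users[user_id]
    if method == "DELETE":                      # Delete
        del users[user_id]
        return 204, None
    return 405, None                            # method not supported

status, created = handle("POST", body={"name": "Ada"})
print(status, created)
print(handle("GET", created["id"]))
```

The handler keeps no per-client session between calls: each request carries everything needed to process it, which is the sense in which REST interactions are stateless.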

Caching and CDNs

Caching Techniques

  • Browser Caching: Local storage of resources for faster access.
  • Server Caching: Reduces expensive operations by storing frequent data.
  • Database Caching: Improves performance for data-driven applications.
  • CDNs: Geographically distributed servers for serving static content.
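
Server-side caching can be sketched in a few lines with `functools.lru_cache` from the Python standard library. The `load_profile` function is a hypothetical stand-in for an expensive operation such as a database query.

```python
from functools import lru_cache

call_count = 0  # counts how often the "expensive" work really runs

@lru_cache(maxsize=128)
def load_profile(user_id: int) -> dict:
    """Stand-in for an expensive lookup (database query, remote call)."""
    global call_count
    call_count += 1
    return {"id": user_id, "name": f"user-{user_id}"}

load_profile(1)   # miss: the function body runs
load_profile(1)   # hit: the result comes from the cache
load_profile(2)   # miss: a different argument

info = load_profile.cache_info()
print(call_count, info.hits, info.misses)
```

This sketch never invalidates entries; a real cache also needs a policy for expiring or refreshing stale data.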

Proxy Servers

Types of Proxy Servers

  • Forward Proxy: Sits in front of clients and forwards their requests to servers.
  • Reverse Proxy: Sits in front of web servers and intercepts client requests before they reach them.

Load Balancers

  • Distribute traffic, ensuring no single server is overwhelmed.
  • Common algorithms: Round Robin, Least Connections, IP Hashing.
  • Health checks detect unresponsive servers so that traffic goes only to healthy ones.
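
The three algorithms named above can be sketched as follows. This is a simplified illustration, not how any particular load balancer implements them; the class and function names are hypothetical, and health status is set by hand rather than by real health checks.

```python
import hashlib
from itertools import cycle

class RoundRobinBalancer:
    """Hands out servers in a fixed rotation, skipping any marked unhealthy."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._rotation = cycle(self._servers)

    def mark_down(self, server):
        """Record a failed health check."""
        self._healthy.discard(server)

    def mark_up(self, server):
        """Record a passed health check."""
        if server in self._servers:
            self._healthy.add(server)

    def pick(self):
        # Try each server at most once per call so the loop always ends.
        for _ in range(len(self._servers)):
            server = next(self._rotation)
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy servers available")

def least_connections(active_connections):
    """Pick the server with the fewest active connections.

    active_connections maps server name -> current connection count.
    """
    return min(active_connections, key=active_connections.get)

def ip_hash(client_ip, servers):
    """Map a client IP to a server so the same client reaches the same server."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]

balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([balancer.pick() for _ in range(4)])
print(least_connections({"app-1": 3, "app-2": 1, "app-3": 2}))
```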

Database Essentials

Types of Databases

  • Relational Databases: SQL, ACID compliant (PostgreSQL, MySQL).
  • NoSQL Databases: Flexible, schema-less (MongoDB, Cassandra).
  • In-Memory Databases: Fast data retrieval (Redis, Memcached).

Scaling Techniques

  • Vertical Scaling: Upgrading a single server (more CPU, RAM, or storage).
  • Horizontal Scaling: Distributing across multiple servers (sharding, replication).
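
Sharding needs a rule for deciding which server holds a given record. One simple approach, shown here as a sketch, is to hash the record key and take the result modulo the number of shards. The function name and key format are hypothetical.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index in the range 0..num_shards-1."""
    # A stable hash is used so the mapping is the same across processes
    # (Python's built-in hash() is randomized per process for strings).
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

for user in ("user:1", "user:2", "user:3"):
    print(user, "-> shard", shard_for(user, 4))
```

A known drawback of plain modulo hashing is that changing the number of shards remaps most keys; consistent hashing is a common alternative when shards are added or removed often.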

Performance Techniques

  • Caching, Indexing, Query Optimization: Improve data access speed.
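
The effect of an index can be observed with SQLite, which ships with Python. The sketch below uses a hypothetical `users` table and asks SQLite for its query plan before and after creating an index on the `email` column. The exact wording of the plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

query = "SELECT name FROM users WHERE email = ?"
params = ("user42@example.com",)

def plan(sql: str) -> str:
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[-1] for row in rows)  # last column holds the detail text

before = plan(query)   # without an index, SQLite scans the whole table
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query)    # with the index, SQLite can search by email

print("before:", before)
print("after: ", after)
```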