Notes on System Design Interview: Key Components and Preparation

Jun 29, 2024

Key Components for a Successful System Design Interview

Importance of Preparation and Knowledge

  • Knowledge: Familiarity with system design concepts, and with how to combine them, is crucial.

Stages of a System Design Interview

Problem Understanding

  • Example Problem: Counting likes/views on YouTube, Instagram, etc.
  • Clarification: Define the metrics, and ask the interviewer questions to clarify requirements.

Problem Scope Clarification

Reasons for Asking Questions

  1. For Interviewer: Questions let the interviewer assess how you would approach an ambiguous problem in real life.
  2. For Interviewee: Understanding requirements fully helps in determining the right technologies and building blocks.

Key Categories to Focus On

  • Users: Who will use the system and how.
  • Scale: How much data will be handled, expected spikes in traffic.
  • Performance: How fast must the system be.
  • Cost: Budget constraints and cost-effective solutions.

Specific Questions to Ask

  • Users: Who uses total counts? What granularity of data is required?
  • Scale: Expected queries per second, data per request, traffic spikes.
  • Performance: Real-time vs. delayed processing capabilities.
  • Cost: Development vs. maintenance costs.
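
The scale questions above usually feed a quick back-of-the-envelope estimate. A minimal sketch of that arithmetic follows; the function name estimate_load and every traffic figure in it are hypothetical placeholders, not numbers from these notes.

```python
# Back-of-the-envelope load estimate.
# All input figures are hypothetical; replace them with the interviewer's answers.

SECONDS_PER_DAY = 24 * 60 * 60


def estimate_load(events_per_day, bytes_per_event, peak_factor):
    """Return average QPS, peak QPS, and raw data volume per day in GB."""
    avg_qps = events_per_day / SECONDS_PER_DAY
    peak_qps = avg_qps * peak_factor  # spikes modeled as a simple multiplier
    daily_gb = events_per_day * bytes_per_event / 1e9
    return {"avg_qps": avg_qps, "peak_qps": peak_qps, "daily_gb": daily_gb}


estimate = estimate_load(
    events_per_day=864_000_000,  # assumed daily view events
    bytes_per_event=100,         # assumed payload size
    peak_factor=3,               # assumed spike multiplier
)
print(estimate)
```

The peak figure, not the average, is what the ingestion path has to be sized for.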

Functional Requirements

  • APIs Definition: Define actions and parameters (e.g., countVideoViews(videoId)).
  • Generalization: Account for event types (views, likes, shares) and functions (sum, average).
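
One way to picture the generalization step is to start from countVideoViews(videoId) and add an event type and an aggregation function as parameters. The sketch below is only an illustration: the class EventStatsService and its method names are hypothetical, and the in-memory dictionary stands in for a real database.

```python
from collections import defaultdict


class EventStatsService:
    """In-memory stand-in for the counting API (hypothetical names)."""

    def __init__(self):
        # (video_id, event_type) -> list of recorded values
        self._values = defaultdict(list)

    def process_event(self, video_id, event_type, value=1):
        """Write path: record one event, e.g. a view, like, or share."""
        self._values[(video_id, event_type)].append(value)

    def get_stats(self, video_id, event_type, function="sum"):
        """Read path: apply the requested function to the recorded values."""
        values = self._values.get((video_id, event_type), [])
        if function == "sum":
            return sum(values)
        if function == "average":
            return sum(values) / len(values) if values else 0.0
        raise ValueError(f"unsupported function: {function}")


service = EventStatsService()
service.process_event("video-1", "view")
service.process_event("video-1", "view")
service.process_event("video-1", "watch_seconds", 30)
service.process_event("video-1", "watch_seconds", 60)
print(service.get_stats("video-1", "view"))                      # sum of views
print(service.get_stats("video-1", "watch_seconds", "average"))  # mean watch time
```

Keeping every value in a list is deliberately naive; it only makes the two parameters (event type, function) visible.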

Non-Functional Requirements

  • Attributes: Scalability, performance, availability, eventual consistency, cost minimization.
  • Writing Down Requirements: Helps in the decision-making process for technology stacks.

High-Level Architecture

  • Initial Components: Database, web services for data processing and retrieval.
  • Gradual Development: Start simple, then expand to full architecture.

Data Model Definition

  • Options: Storing raw events vs. aggregated data.
  • Trade-offs: Pros and cons of each method (speed vs. storage cost).
  • Combining Approaches: Benefits of storing both raw events and aggregated data.
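
The two data model options can be contrasted in a few lines. This is a sketch under assumptions: the RawEvent fields and the one-minute bucket size are illustrative choices, not something these notes prescribe.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RawEvent:
    """One row per event: flexible to query later, but storage grows with traffic."""
    video_id: str
    event_type: str
    timestamp: int  # Unix seconds


def aggregate_per_minute(events):
    """Collapse raw events into one counter per (video, event type, minute)."""
    counts = Counter()
    for event in events:
        minute_start = event.timestamp - event.timestamp % 60
        counts[(event.video_id, event.event_type, minute_start)] += 1
    return counts


raw_events = [
    RawEvent("A", "view", 0),
    RawEvent("A", "view", 59),
    RawEvent("A", "view", 60),
]
per_minute = aggregate_per_minute(raw_events)
print(per_minute)
```

Reads against the aggregated form are fast, while the raw events remain available for recomputing counts if an aggregation bug is found.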

Data Storage Technologies

  • SQL vs. NoSQL: Evaluation against non-functional requirements.
  • Scaling SQL Databases: Sharding, leader-follower setup, cluster proxy.
  • Scaling NoSQL Databases: Apache Cassandra, consistent hashing, gossip protocol.
  • Data Modeling in SQL: Normalization and relationships among tables.
  • Data Modeling in NoSQL: Query-based design and denormalization.
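
Consistent hashing, mentioned above for scaling NoSQL stores, can be sketched as a sorted ring of hashed points. This is a generic illustration rather than Cassandra's actual implementation; the class HashRing, the virtual-node count, and the use of SHA-256 are assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=100):
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        """Walk clockwise from the key's hash to the next node point."""
        index = bisect.bisect_right(self._points, self._hash(key))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("video-42"))
```

The property worth mentioning in an interview: when a node is added, only the keys that land on the new node change owner; all other keys stay where they were.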

Data Processing Overview

  • Real-Time Requirements: Pre-aggregation vs. in-memory processing.
  • Push vs. Pull Approach: Pull (the processing service reads events from a queue) tends to offer better fault tolerance and is easier to scale.
  • Partitioning: Distributing data across multiple queues.
  • System Components: Consumer, aggregator, internal queue, database writer, dead letter queue, embedded databases.
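
A rough sketch of how the aggregator, database writer, and dead letter queue listed above might fit together. The class Aggregator, the write_to_db callback, and the plain-list dead letter queue are hypothetical simplifications; a real service would also need persistence for the in-memory counts.

```python
from collections import Counter


class Aggregator:
    """Pre-aggregates events in memory, then flushes totals to a database writer."""

    def __init__(self, write_to_db):
        self._counts = Counter()
        self._write_to_db = write_to_db  # callback: (video_id, count) -> None
        self.dead_letter_queue = []      # failed writes are parked here for retry

    def consume(self, video_id):
        """Called once per event pulled from the partition."""
        self._counts[video_id] += 1

    def flush(self):
        """Swap in a fresh counter and write out the accumulated totals."""
        pending, self._counts = self._counts, Counter()
        for video_id, count in pending.items():
            try:
                self._write_to_db(video_id, count)
            except Exception:
                # Do not lose the data: keep it in the dead letter queue.
                self.dead_letter_queue.append((video_id, count))


database = {}


def write_to_db(video_id, count):
    if video_id == "bad":  # simulate a failing write
        raise RuntimeError("database unavailable")
    database[video_id] = database.get(video_id, 0) + count


aggregator = Aggregator(write_to_db)
for video_id in ["A", "A", "bad", "A"]:
    aggregator.consume(video_id)
aggregator.flush()
print(database, aggregator.dead_letter_queue)
```

Three events for the same video become a single database write, which is the point of pre-aggregation.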

Full Data Ingestion Path

  • Overview: Sequence from API Gateway through partitioner service to database.
  • Partition Management: Strategies to handle hot partitions, service discovery.
  • Fault Tolerance: Leader-follower replication, quorum writes.
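
The quorum idea can be reduced to two small checks: a write succeeds once enough replicas acknowledge it, and reads are guaranteed to see the latest write when the read and write quorums overlap (R + W > N). In the sketch below, replicas are modeled as plain dictionaries and None marks an unreachable replica; both are assumptions made for illustration.

```python
def quorums_overlap(n, w, r):
    """True when every read quorum shares at least one replica with every write quorum."""
    return r + w > n


def quorum_write(replicas, key, value, write_quorum):
    """Write to every reachable replica; succeed if enough of them acknowledged."""
    acks = 0
    for replica in replicas:
        if replica is None:  # None models an unreachable replica
            continue
        replica[key] = value
        acks += 1
    return acks >= write_quorum


replicas = [{}, None, {}]  # three replicas, one of them down
print(quorum_write(replicas, "video-1", 5, write_quorum=2))
print(quorums_overlap(n=3, w=2, r=2))
```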

Data Retrieval Path

  • Query Service: Real-time counts vs. stored metrics, data roll-up strategies.
  • Cache: Distributed cache for improved performance.
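
The cache in the retrieval path typically follows a cache-aside pattern: check the cache first and fall back to the query service on a miss. The sketch below is a single-process stand-in for a distributed cache; the class TtlCache, the 10-second TTL, and the injected clock are illustrative assumptions.

```python
import time


class TtlCache:
    """Cache-aside helper: serve cached values until they expire, then reload."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit
        value = loader(key)  # cache miss: ask the slower backing store
        self._entries[key] = (value, now + self._ttl)
        return value


fake_now = [0.0]   # controllable clock so the example is deterministic
loader_calls = []


def load_count(video_id):
    loader_calls.append(video_id)
    return 1234  # hypothetical count returned by the query service


cache = TtlCache(ttl_seconds=10, clock=lambda: fake_now[0])
cache.get_or_load("video-1", load_count)
cache.get_or_load("video-1", load_count)  # served from cache
fake_now[0] = 11.0
cache.get_or_load("video-1", load_count)  # expired, reloaded
print(len(loader_calls))
```

A short TTL fits this problem because counts are allowed to be slightly stale (eventual consistency).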

Technology Stack

  • Frameworks & Load Balancers: Netty, Hystrix, NGINX.
  • Database and Storage Solutions: Vitess, Cassandra, InfluxDB, Hadoop.
  • Event Processing Frameworks: Apache Kafka, Apache Spark.
  • Embedded Databases: RocksDB.
  • Service Registry Solutions: Zookeeper, Eureka.

Additional Considerations

  • Performance Testing: Load testing, stress testing, soak testing.
  • Health Monitoring: Metrics, dashboards, alerts.
  • Audit Systems: Weak vs. strong audit systems.
  • Handling Popular Events: Scaling partitions, fallback strategies.
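
One common way to keep a very popular video from overloading a single partition is to spread its events over several sub-keys. The sketch below shows that idea only; the function names, the fan-out of 4, and the explicit set of hot videos are assumptions, and other strategies (such as splitting the partition itself) are equally valid.

```python
import hashlib
import random


def stable_hash(value):
    """Deterministic hash (Python's built-in hash() is randomized per process)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


def partition_for(video_id, num_partitions, hot_videos=frozenset(), fanout=4):
    """Pick a partition; hot videos get a random suffix so their load is spread."""
    key = video_id
    if video_id in hot_videos:
        key = f"{video_id}#{random.randrange(fanout)}"
    return stable_hash(key) % num_partitions


# A regular video always maps to the same partition.
print(partition_for("video-1", num_partitions=8))

# A hot video is spread over up to `fanout` partitions.
hot = {"viral-video"}
print({partition_for("viral-video", 8, hot_videos=hot) for _ in range(200)})
```

The cost is on the read side: the total for a hot video has to be summed across all of its sub-keys.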

Summary of Design Steps

  1. Clarify Requirements: Define APIs and non-functional attributes.
  2. High-Level Design: Outline data paths and architecture.
  3. Focus on Data: Address ingestion, storage, processing, retrieval.
  4. Discuss Bottlenecks: Performance tests and monitoring.
  5. Technology Choices: Align solutions with requirements and constraints.