🛒

E-commerce System Design Overview

Jun 9, 2025

Overview

This lecture covers the high-level system design for an e-commerce platform like Amazon, focusing on the products database, shopping cart service, order processing, and optimizing read performance for scalability and speed.

System Requirements & Capacity Estimates

  • Focus is on product catalog, cart management, inventory, and fast/scalable reads; excludes reviews, payments, AWS, and support features.
  • Assume up to 1 billion products (~1PB data, including images).
  • Estimated about 10 million orders/day (≈100 orders/sec).
  • Requires sharding and caching due to data volume and performance needs.

Product Database Design

  • Read throughput is prioritized; writes by admins/sellers are infrequent.
  • Single-leader replication is sufficient; B-tree indexing provides fast reads.
  • NoSQL (MongoDB) chosen for flexible schema (products vary widely); data stored as JSON documents.
  • Images stored in S3, with CDN for fast, global access.

Shopping Cart Service Design

  • Avoid store-and-overwrite or simple list approaches due to potential concurrency issues.
  • Optimal model: represent cart as a set (table with cart ID and product ID) to minimize locking.
  • Use CRDTs (Conflict-Free Replicated Data Types), specifically observed-remove sets, for eventual consistency across multiple leaders and avoid lost updates.
  • Writes and reads are synchronized per user session to avoid missing recent updates.

Order Processing & Inventory Management

  • Each product has a stock counter; decrements on order require locking to prevent race conditions.
  • To scale, use Kafka for order/event buffering and Flink for stream processing.
  • Orders placed into a unified Kafka queue for atomicity, then split per product for inventory checks.
  • Inventory updates from sellers flow into Kafka and are reflected in real-time inventory via Flink.
  • Email notifications sent post-processing (may not be idempotent; two-phase commit can address).

Optimizing Read Performance

  • Pre-cache popular products/data using Redis; cache images with CDN.
  • Identify popular products with Spark Streaming, aggregating order/click metrics daily and exporting to HDFS.
  • Batch job ranks products by popularity per category for cache prepopulation.

Search Indexing Strategy

  • Use Elasticsearch with inverted index (term → product ID/description/thumbnail).
  • Prefer local (category- and score-based) partitioning over global index to minimize duplication and speed up pagination.
  • Popularity scores guide internal shard allocation for fast retrieval of top items.

Cache & Index Swap Process

  • Populate a standby cluster with updated cache/index after batch job completes.
  • Update coordination service (Zookeeper/DNS) to point services to the new cluster, minimizing downtime during switch.
  • Schedule swap during off-peak hours to reduce impact.

Key Terms & Definitions

  • NoSQL (MongoDB) — Database supporting flexible JSON schemas, ideal for diverse product data.
  • B-tree Index — Tree-based data structure for efficient read queries.
  • CDN (Content Delivery Network) — Distributed servers for fast image/content access.
  • CRDT (Conflict-Free Replicated Data Type) — Data type ensuring eventual consistency across distributed nodes.
  • Kafka — Distributed messaging system for buffering events and orders.
  • Flink — Stream processing framework for real-time data.
  • Elasticsearch — Search engine using inverted indexes for quick full-text search.
  • Sharding — Partitioning data to distribute load across multiple servers.
  • Redis — In-memory data store used for caching.

Action Items / Next Steps

  • Review concepts of CRDTs and their use in distributed systems.
  • Study stream processing pipeline setup (Kafka, Flink, Spark).
  • Understand how inverted indexes and sharding enhance search scalability.
  • (If assigned) Implement sample cart service or search indexer following these principles.