Overview
This lecture covers the high-level system design for an e-commerce platform like Amazon, focusing on the products database, shopping cart service, order processing, and optimizing read performance for scalability and speed.
System Requirements & Capacity Estimates
- Focus is on product catalog, cart management, inventory, and fast/scalable reads; excludes reviews, payments, AWS, and support features.
- Assume up to 1 billion products (~1PB data, including images).
- Estimated about 10 million orders/day (≈115 orders/sec on average).
- Requires sharding and caching due to data volume and performance needs.
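The estimates above can be sanity-checked with quick back-of-envelope arithmetic; the per-product size below is an assumption chosen to match the lecture's ~1 PB figure.

```python
# Back-of-envelope capacity estimates (totals from the lecture; per-product
# size is an assumption that makes the numbers line up).
PRODUCTS = 1_000_000_000        # ~1 billion products
DATA_PER_PRODUCT_KB = 1_000     # ~1 MB each, including images (assumed)
ORDERS_PER_DAY = 10_000_000

total_pb = PRODUCTS * DATA_PER_PRODUCT_KB / 1e12  # 1 PB = 1e12 KB
orders_per_sec = ORDERS_PER_DAY / 86_400          # seconds per day

print(f"~{total_pb:.0f} PB of product data")      # -> ~1 PB
print(f"~{orders_per_sec:.0f} orders/sec average")  # -> ~116 orders/sec
```

At ~1 PB, the catalog cannot fit on one machine, which is what forces sharding; the order rate alone is modest, but peaks will be far above the daily average.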
Product Database Design
- Read throughput is prioritized; writes by admins/sellers are infrequent.
- Single-leader replication is sufficient; B-tree indexing provides fast reads.
- NoSQL (MongoDB) chosen for flexible schema (products vary widely); data stored as JSON documents.
- Images stored in S3, with CDN for fast, global access.
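The flexible-schema point is easiest to see with two concrete documents; the field names below are illustrative, not from the lecture.

```python
# Two product documents with different shapes -- a document store lets each
# category carry its own attributes without schema migrations.
laptop = {
    "_id": "prod-123",
    "name": "UltraBook 14",
    "category": "laptops",
    "attributes": {"cpu_cores": 8, "ram_gb": 16},
    # Image bytes live in S3 behind a CDN; the document stores only URLs.
    "image_urls": ["https://cdn.example.com/prod-123/main.jpg"],
}
tshirt = {
    "_id": "prod-456",
    "name": "Plain Tee",
    "category": "apparel",
    "attributes": {"sizes": ["S", "M", "L"], "material": "cotton"},
}
# With pymongo this would be roughly: db.products.insert_many([laptop, tshirt])
```

Note the documents share only a few core fields; everything category-specific lives under `attributes`, which a rigid relational schema would handle awkwardly.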
Shopping Cart Service Design
- Avoid store-and-overwrite or simple-list approaches, which risk lost updates under concurrent writes.
- Optimal model: represent cart as a set (table with cart ID and product ID) to minimize locking.
- Use CRDTs (Conflict-Free Replicated Data Types), specifically observed-remove sets, to achieve eventual consistency across multiple leaders and avoid lost updates.
- Reads and writes are pinned per user session (read-your-writes consistency) so users always see their own most recent cart changes.
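A minimal sketch of an observed-remove set, assuming the cart stores product IDs. The key idea: each add gets a unique tag, and a remove deletes only the tags it has observed, so a concurrent re-add on another replica survives the merge.

```python
import uuid

class ORSet:
    """Minimal observed-remove set (OR-set) sketch for a shopping cart."""

    def __init__(self):
        self.adds = {}      # element -> set of unique add-tags
        self.removes = {}   # element -> set of tags removed so far

    def add(self, element):
        # Every add gets a fresh tag, so re-adds are distinguishable.
        self.adds.setdefault(element, set()).add(uuid.uuid4().hex)

    def remove(self, element):
        # Remove only the tags observed locally at remove time.
        observed = self.adds.get(element, set())
        self.removes.setdefault(element, set()).update(observed)

    def contains(self, element):
        live = self.adds.get(element, set()) - self.removes.get(element, set())
        return bool(live)

    def merge(self, other):
        # Merging is a per-element union of both tag sets; order-independent.
        for e, tags in other.adds.items():
            self.adds.setdefault(e, set()).update(tags)
        for e, tags in other.removes.items():
            self.removes.setdefault(e, set()).update(tags)
```

If replica A removes an item while replica B concurrently re-adds it, the re-add carries a new tag that A's remove never observed, so after merging the item is still in the cart: the "add wins" behavior you usually want for carts.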
Order Processing & Inventory Management
- Each product has a stock counter; decrements on order require locking to prevent race conditions.
- To scale, use Kafka for order/event buffering and Flink for stream processing.
- Orders are placed into a single Kafka topic to preserve a consistent order of events, then split into per-product partitions for inventory checks.
- Inventory updates from sellers flow into Kafka and are reflected in real-time inventory via Flink.
- Email notifications are sent after processing; since sending may not be idempotent, two-phase commit can guard against duplicate sends.
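The race condition that motivates locking the stock counter can be shown in a single-process sketch (the lecture's point is that this lock becomes a bottleneck, which is why orders are partitioned through Kafka instead). All names here are illustrative.

```python
import threading

class Inventory:
    """Per-product stock counters guarded by a lock.

    The check-then-decrement must be atomic: without the lock, two
    concurrent orders could both see stock == 1 and both succeed,
    overselling the product.
    """

    def __init__(self):
        self._stock = {}
        self._lock = threading.Lock()

    def restock(self, product_id, qty):
        with self._lock:
            self._stock[product_id] = self._stock.get(product_id, 0) + qty

    def try_order(self, product_id, qty=1):
        with self._lock:
            if self._stock.get(product_id, 0) >= qty:
                self._stock[product_id] -= qty
                return True  # order accepted
            return False     # out of stock
```

Hammering this with 20 concurrent order threads against a stock of 5 yields exactly 5 successes. A streaming design replaces the shared lock with single-threaded processing per product partition, achieving the same atomicity without contention.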
Optimizing Read Performance
- Pre-cache popular products/data using Redis; cache images with CDN.
- Identify popular products with Spark Streaming, aggregating order/click metrics daily and exporting to HDFS.
- Batch job ranks products by popularity per category for cache prepopulation.
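The ranking step reduces to a weighted group-by-category aggregation. This is a plain-Python sketch of the logic a Spark batch job would run at scale; the event weights are assumptions, not from the lecture.

```python
from collections import Counter, defaultdict

# Hypothetical daily event log: (product_id, category, event_type).
events = [
    ("p1", "laptops", "order"), ("p1", "laptops", "click"),
    ("p2", "laptops", "click"), ("p3", "apparel", "order"),
    ("p1", "laptops", "order"), ("p3", "apparel", "click"),
]

# Orders weighted above clicks when scoring popularity (weights assumed).
WEIGHTS = {"order": 5, "click": 1}

scores = defaultdict(Counter)
for product_id, category, event_type in events:
    scores[category][product_id] += WEIGHTS[event_type]

# Top-N per category feeds the Redis cache-prepopulation step.
top_per_category = {cat: c.most_common(2) for cat, c in scores.items()}
print(top_per_category)
```

In the real pipeline the same aggregation runs over a full day of events exported to HDFS, and the resulting top-N lists decide which product documents get preloaded into Redis.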
Search Indexing Strategy
- Use Elasticsearch with inverted index (term → product ID/description/thumbnail).
- Prefer local (category- and score-based) partitioning over a global index to minimize duplication and speed up pagination.
- Popularity scores guide internal shard allocation for fast retrieval of top items.
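A toy inverted index makes the term-to-products mapping concrete (in Elasticsearch each posting would also carry the description and thumbnail; here we keep only IDs):

```python
from collections import defaultdict

# Toy corpus: product_id -> searchable text (illustrative data).
docs = {
    "p1": "wireless bluetooth headphones",
    "p2": "wired gaming headphones",
    "p3": "bluetooth speaker",
}

# Build the inverted index: term -> set of product IDs containing it.
index = defaultdict(set)
for product_id, text in docs.items():
    for term in text.split():
        index[term].add(product_id)

def search(query):
    # AND query: intersect the posting lists of every query term.
    return sorted(set.intersection(*(index.get(t, set()) for t in query.split())))

print(search("bluetooth headphones"))  # -> ['p1']
```

Keeping each shard's postings pre-sorted by popularity score is what lets the top results for a query come back without scanning the full posting list.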
Cache & Index Swap Process
- Populate a standby cluster with the updated cache/index after the batch job completes.
- Update the coordination service (ZooKeeper/DNS) to point services at the new cluster, minimizing downtime during the switch.
- Schedule swap during off-peak hours to reduce impact.
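The swap amounts to a blue/green cutover: services resolve the active cluster through a single pointer, so switching is one atomic update rather than a rolling reconfiguration. This sketch stands in for a ZooKeeper node or DNS record; the cluster names are hypothetical.

```python
import threading

class ClusterPointer:
    """Single source of truth for which cache/index cluster is live."""

    def __init__(self, active):
        self._active = active
        self._lock = threading.Lock()

    def resolve(self):
        # Every service call looks up the current cluster here.
        with self._lock:
            return self._active

    def swap(self, new_active):
        # Atomic cutover; returns the old cluster so it can be
        # repopulated as the standby for the next batch cycle.
        with self._lock:
            old, self._active = self._active, new_active
        return old

pointer = ClusterPointer("cache-blue")
# ... batch job finishes populating "cache-green" ...
previous = pointer.swap("cache-green")
print(pointer.resolve())  # -> cache-green
```

Because readers only ever see the old pointer or the new one, never a half-updated state, the swap itself causes no downtime; scheduling it off-peak just limits the cold-cache impact on the freshly promoted cluster.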
Key Terms & Definitions
- NoSQL (MongoDB) — Database supporting flexible JSON schemas, ideal for diverse product data.
- B-tree Index — Tree-based data structure for efficient read queries.
- CDN (Content Delivery Network) — Distributed servers for fast image/content access.
- CRDT (Conflict-Free Replicated Data Type) — Data type ensuring eventual consistency across distributed nodes.
- Kafka — Distributed messaging system for buffering events and orders.
- Flink — Stream processing framework for real-time data.
- Elasticsearch — Search engine using inverted indexes for quick full-text search.
- Sharding — Partitioning data to distribute load across multiple servers.
- Redis — In-memory data store used for caching.
Action Items / Next Steps
- Review concepts of CRDTs and their use in distributed systems.
- Study stream processing pipeline setup (Kafka, Flink, Spark).
- Understand how inverted indexes and sharding enhance search scalability.
- (If assigned) Implement sample cart service or search indexer following these principles.