Hi everybody, welcome back. In today's video we're going to discuss how to design amazon.com. Everything we discuss in today's video also applies to interview questions like design eBay, design walmart.com, design Flipkart, or any other e-commerce website.
First we're going to discuss functional requirements, followed by non-functional requirements. Then we're going to do some capacity estimation, some high-level database design, and the APIs. After that we will do a high-level system design for a smaller user base and then scale the system for millions of users. As we go through the process, we're going to call out assumptions. We are going to start by gathering functional requirements. We only have one hour for the system design interview, so without wasting any time on basic functionality like creating the user profile, we are going to dive directly into the core system functionality.
One, we want users to be able to search for products. Two, when users open the Amazon.com app or web page, we want to show some recommended products on the user's homepage. Three, users should be able to place an order and view the status of their existing orders.
Four, users should be able to read or write reviews for the products. A system design interview is only one hour, so practically you can take three to five requirements and dive deep into the design of those requirements.
But for a system to be operational, you will require a lot more functionality than just these few requirements. In that case, you can call out assumptions. For example, we're going to assume that user profile creation is provided. Let's assume that product onboarding is also provided.
And then we also assume the payment gateway is provided. Let's look at the non-functional requirements. When a user opens the Amazon app, they should be able to see the product recommendations on their home screen without any delay.
So we want low latency for the recommendation service. Similarly, when a user searches for a product, they should get the search results immediately. So for both of these services, we need low latency. When a user is placing an order, making a payment, or checking the status of an order, we want to show the most accurate information to the user.
So we want a highly consistent system for these services. Let's do the capacity estimation now. Amazon has about 300 million monthly active users. Let's say each user searches for 10 products a month. 300 million users multiplied by 10 searches a month is equal to 3 billion searches a month.
We're going to convert that into the number of searches per second. So we divide 3 billion by 30 days in a month multiplied by 24 hours in a day multiplied by 60 minutes in an hour multiplied by 60 seconds in a minute. If you calculate 24 hours multiplied by 60 minutes multiplied by 60 seconds, it is equal to 86,400 seconds. Since we are just doing estimations, for easier calculation we are going to round that number up to 100,000, which is equal to 10 to the power 5. We can rewrite the equation as 3 multiplied by 10 to the power 9, divided by 30 multiplied by 10 to the power 5, which is approximately equal to 1,000 searches per second. So our system should be capable of handling about 1,000 searches every second.
Let's look at the storage estimate. Let's say we have 10 million products in the product catalog, and each product needs 10 MB of storage for product images and descriptions. So the total storage required is 10 million products multiplied by 10 MB per product, which is 10 multiplied by 10 to the power 6 products times 10 multiplied by 10 to the power 6 bytes, which is equal to 100 multiplied by 10 to the power 12 bytes, or 100 terabytes of storage for all the product information.
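The estimates above can be double-checked with a few lines of Python. This is only a sketch of the arithmetic from the video; the constant names are made up for illustration, and 1 MB is assumed to mean 10 to the power 6 bytes. It also computes the unrounded rate so you can see how much the 100,000-seconds shortcut changes the answer.

```python
# Back-of-the-envelope capacity estimate using the numbers assumed in the video.
MONTHLY_ACTIVE_USERS = 300_000_000
SEARCHES_PER_USER_PER_MONTH = 10
DAYS_PER_MONTH = 30
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400
ROUNDED_SECONDS_PER_DAY = 10**5       # rounded up for easier mental math

searches_per_month = MONTHLY_ACTIVE_USERS * SEARCHES_PER_USER_PER_MONTH

# Estimate with the rounded day length, as done in the video.
rounded_searches_per_second = searches_per_month // (DAYS_PER_MONTH * ROUNDED_SECONDS_PER_DAY)

# Same estimate without rounding, for comparison.
exact_searches_per_second = searches_per_month / (DAYS_PER_MONTH * SECONDS_PER_DAY)

# Storage: 10 million products at 10 MB each (assuming 1 MB = 10**6 bytes).
PRODUCT_COUNT = 10_000_000
BYTES_PER_PRODUCT = 10 * 10**6
total_terabytes = PRODUCT_COUNT * BYTES_PER_PRODUCT / 10**12

print(rounded_searches_per_second, round(exact_searches_per_second), total_terabytes)
```

The rounded and unrounded rates are within about 15 percent of each other, which is close enough for an interview estimate. Keep in mind this is an average; peak traffic would be higher.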
Let's look at the database design. First is the user database.
Since user information is pretty structured in nature, we can use a SQL database and have a schema like this. We'll have user ID as primary key. We'll have username and password, first name, last name of the user.
It would also be nice to know when the user last logged into our system. We can also have a created date timestamp to know when the user ID was created. We could also keep the user's billing address and delivery address in the same table.
But if we do that, then every time a user wants to add a new address, we'll either have to duplicate some of the information that we already stored in this table, or we have to restrict the user to have only one billing and one delivery address. So in order to avoid this situation, we're going to create a separate database called Address Database. Address information is also structured, so it makes sense to use the SQL database. We're going to have Address ID as the primary key, and then we will also have User ID as the foreign key.
This way, we can say which address belongs to which user. Then we will have an effective date, which indicates the date from which the user started residing at that address. Then we will have standard address fields like Address Line 1, Address Line 2, City, State, Zip Code, and Country.
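Here is a minimal sketch of the user and address tables, using Python's built-in sqlite3 module so it runs anywhere. The column names and types are assumptions for illustration, since the video only names the fields. The address_type column corresponds to the address type field of the address table (billing, home, or office). Storing a password hash rather than the raw password is also an assumption on my part, not something stated in the video.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE users (
    user_id       INTEGER PRIMARY KEY,
    username      TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL,          -- assumption: store a hash, not the password
    first_name    TEXT,
    last_name     TEXT,
    last_login    TEXT,                   -- when the user last logged in
    created_at    TEXT NOT NULL           -- when the user ID was created
);
CREATE TABLE addresses (
    address_id     INTEGER PRIMARY KEY,
    user_id        INTEGER NOT NULL REFERENCES users(user_id),
    effective_date TEXT NOT NULL,
    address_line_1 TEXT NOT NULL,
    address_line_2 TEXT,
    city           TEXT,
    state          TEXT,
    zip_code       TEXT,
    country        TEXT,
    address_type   TEXT                   -- e.g. 'billing', 'home', 'office'
);
""")

conn.execute(
    "INSERT INTO users (user_id, username, password_hash, first_name, last_name, created_at) "
    "VALUES (1, 'jdoe', 'not-a-real-hash', 'Jane', 'Doe', '2024-01-01')"
)
# One user can now have any number of addresses without duplicating user columns.
conn.executemany(
    "INSERT INTO addresses (user_id, effective_date, address_line_1, city, country, address_type) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "2024-01-01", "1 Main St", "Seattle", "US", "billing"),
        (1, "2024-03-01", "99 Office Park", "Seattle", "US", "office"),
    ],
)
address_count = conn.execute(
    "SELECT COUNT(*) FROM addresses WHERE user_id = 1"
).fetchone()[0]
print(address_count)
```

Because user_id is a foreign key, the database rejects an address that points at a user who does not exist.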
The address type will indicate whether the address is a billing address, home address, or office address. Now let's talk about the product database. Product data is not very structured. Let's understand that with the help of an example.
If we use a SQL database to store product information, we will be wasting a lot of space, because different kinds of products have different attributes and most columns would be empty for most products. You can see all the yellow highlights, which indicate the space that would be wasted. So we will be using a NoSQL document database; DynamoDB and MongoDB are popular choices.
These are two examples of how products will be stored as JSON documents. Now let's talk about the order database. You can place an order for one item or you can place an order for 100 items.
If you use a SQL database, then you will store the order information in one table and all the items within the order in a separate table. So you will need two SQL tables. Instead of doing that, I am choosing to go with a NoSQL database. That way we can store all the items within the order as an array of items within the same JSON document.
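To make the document idea concrete, here is a rough sketch of what two product documents and one order document could look like, written as Python dictionaries and serialized to JSON. All field names and values are made up for illustration. Prices are kept in cents, which is my own choice to avoid floating-point rounding.

```python
import json

# Two products with completely different attribute sets. In a SQL table each
# attribute would need its own column, mostly empty for other product types.
products = [
    {
        "product_id": "P100",
        "name": "Wireless mouse",
        "category": "electronics",
        "price_cents": 1999,
        "attributes": {"connectivity": "Bluetooth", "battery": "AA"},
    },
    {
        "product_id": "P200",
        "name": "Cotton T-shirt",
        "category": "clothing",
        "price_cents": 999,
        "attributes": {"size": "M", "color": "blue", "material": "cotton"},
    },
]

# One order document holds all of its items as an array, so no second
# "order items" table is needed.
order = {
    "order_id": "O1",
    "user_id": 1,
    "status": "PLACED",
    "items": [
        {"product_id": "P100", "quantity": 2, "unit_price_cents": 1999},
        {"product_id": "P200", "quantity": 1, "unit_price_cents": 999},
    ],
}

order_total_cents = sum(
    item["quantity"] * item["unit_price_cents"] for item in order["items"]
)
print(json.dumps(order, indent=2))
print(order_total_cents)
```

The trade-off is that the whole order is read and written as one unit, which fits how orders are normally used.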
Now let's talk about the database for customer reviews. We're going to use a NoSQL key value database. Key value databases are pretty similar to document databases.
We're going to use product ID as the key. The product reviews, descriptions, and links to attachments like videos and images in the review will be part of the value. You can also include the user ID as part of the value to identify which user wrote which review. Now let's talk about the APIs. The first is getRecommendationService.
We will pass a user ID to this service so that we can get recommendations for that particular user. This API will return a list of, let's say, 10 recommended products for this user. The next one is the searchService. We will pass the search string that the user is searching for and the user ID as parameters.
That way, the searchService will only show the list of products that can be delivered to the user's location. The next one is addToCartService. We will pass the user ID, the product ID, the quantity, and the amount to this service, and it will return a Boolean true or false value indicating whether it was able to add these items to the user's cart. The next one is placeOrderService. We will pass the user ID, order ID, billing address, and the user's payment information to this service. It will return true if the payment was successful and inventory was available for the items in the order, and false otherwise. The next one is checkOrderStatusService.
We will pass the order ID to the service and it will return the status of the order. Let's start with the high-level system design. We have displayed these functional requirements in the top right corner of the page. The first requirement is search for the product.
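As a recap of the five APIs just described, before getting into the design itself, here is one way the signatures could be written down as a Python interface. The method names, parameter names, and types are assumptions for illustration; the video only lists what goes in and what comes out.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ProductSummary:
    """Hypothetical shape of one product in a result list."""
    product_id: str
    name: str


class StorefrontApi(Protocol):
    """Interface sketch only; each method would be its own service in the design."""

    def get_recommendations(self, user_id: str) -> list[ProductSummary]:
        """Return a list of, say, 10 recommended products for the user."""
        ...

    def search(self, query: str, user_id: str) -> list[ProductSummary]:
        """Return products matching the query that can be delivered to the user."""
        ...

    def add_to_cart(self, user_id: str, product_id: str,
                    quantity: int, amount_cents: int) -> bool:
        """Return True if the item was added to the user's cart."""
        ...

    def place_order(self, user_id: str, order_id: str,
                    billing_address_id: str, payment_token: str) -> bool:
        """Return True if payment succeeded and inventory was available."""
        ...

    def check_order_status(self, order_id: str) -> str:
        """Return the current status of the order."""
        ...
```

In an interview it is usually enough to state the inputs and outputs like this; you can refine error handling and pagination if the interviewer asks.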
To search for a product, a user will call the search service, and the search service will look for the product in the product database. We will store each search in the search history database. We will run batch jobs against the search history database to generate recommendations for the user. These batch jobs will use machine learning algorithms to find the best products to recommend to the user. We will store the recommendations generated by these batch jobs in a recommendation database.
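Here is a toy sketch of what such a batch job could look like. To keep it short, it does not use machine learning at all: it is a simple popularity heuristic that recommends the most widely searched products a user has not searched for yet. The function name and the idea of recording searches as (user ID, product ID) pairs are assumptions for illustration.

```python
from collections import Counter, defaultdict


def build_recommendations(search_history, top_n=10):
    """Precompute recommendations per user from (user_id, product_id) pairs.

    Stand-in heuristic, not ML: rank products by how many distinct users
    searched for them, and skip products the user already searched for.
    """
    searched_by_user = defaultdict(set)
    for user_id, product_id in search_history:
        searched_by_user[user_id].add(product_id)

    popularity = Counter(
        product_id
        for products in searched_by_user.values()
        for product_id in products
    )

    recommendations = {}
    for user_id, already_seen in searched_by_user.items():
        candidates = [
            (product_id, count)
            for product_id, count in popularity.items()
            if product_id not in already_seen
        ]
        # Most popular first; ties broken alphabetically so output is stable.
        candidates.sort(key=lambda pair: (-pair[1], pair[0]))
        recommendations[user_id] = [product_id for product_id, _ in candidates[:top_n]]
    return recommendations


history = [
    ("u1", "laptop"), ("u1", "mouse"),
    ("u2", "laptop"), ("u2", "keyboard"),
    ("u3", "keyboard"), ("u3", "monitor"),
]
# The returned dict plays the role of the recommendation database.
recommendation_db = build_recommendations(history, top_n=2)
print(recommendation_db)
```

The important part for the design is the shape of the job: it reads the search history in bulk, and writes one ready-made list per user.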
Because the recommendations are precomputed, the Get Recommendation Service can retrieve them faster. The next requirement is to place the order. Before the user can place an order, they need to add items to their cart, so we need the Add to Cart Service. The user can add products to the cart either from the recommended products on the homepage or by using the search service.
So the Add to Cart Service will be triggered either through the Search Service or the Get Recommendation Service. It will store the items in the cart database. Once the items are added to the cart, the user can place an order.
So the Place Order Service will be triggered through the Add to Cart Service. When a user places an order, the items will be removed from the cart database and added to the order database. The next requirement is to check the order status.
So the Check Order Status Service will read the status of the order from the order database and display it on the user's screen. The next requirement is to read or write product reviews. The Product Review Service will read the product reviews from the product review database and display them on the user's screen. The user may also want to view product reviews when searching for products or browsing the recommended products on the home page.
So product review service can also be triggered through get recommendation service as well as the search service. Now that we have done the design for all the requirements, it's time to scale our system to accommodate millions of users. First we're going to add a load balancer.
That way the load on the system is evenly distributed across multiple servers. The load balancer itself can be a single point of failure. If the load balancer goes down, we don't want our entire system to go down.
So we will have a standby load balancer. Instead of running one instance of each service, we will run multiple instances of all the services. That way, our system is more resilient to failures. When there are millions of users searching for products, we still want our search service to return the results quickly. So instead of directly searching the database, we will use Elasticsearch.
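To see why a search engine is faster than scanning the product database, here is a toy inverted index in plain Python. Elasticsearch is built on Lucene, which uses inverted indexes, but this sketch is not the Elasticsearch API. The function names and sample catalog are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(catalog):
    """Map each lowercase word in a product title to the product IDs containing it."""
    index = defaultdict(set)
    for product_id, title in catalog.items():
        for term in title.lower().split():
            index[term].add(product_id)
    return index


def search(index, query):
    """Return product IDs whose titles contain every word in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the first term's matches and intersect with the rest.
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())
    return matches


catalog = {
    "P1": "Wireless Mouse",
    "P2": "Wireless Keyboard",
    "P3": "Gaming Mouse Pad",
}
index = build_inverted_index(catalog)
print(search(index, "wireless mouse"))
```

Each query only touches the entries for its own words instead of every product, which is what keeps search fast as the catalog grows. A real engine adds ranking, typo tolerance, and sharding on top of this.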
Before placing an order, a user may search for and view many items. So the product database is going to be read-heavy, and we will implement a master-slave design for our product database.
Master database will handle all the write requests, and all the read replicas will handle all the read requests. For all the other databases, we only have one copy of the database. So if a database fails, we lose all the information. To avoid that situation, we will keep multiple copies of the databases.
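The read and write split can be sketched with plain dictionaries standing in for database servers. The class and method names are hypothetical. Replication here is synchronous for simplicity, while real systems often replicate asynchronously, so a replica can briefly lag behind the master.

```python
import itertools


class ReplicatedProductStore:
    """Toy master plus read replicas; dicts stand in for database servers."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        # Rotate reads across replicas in round-robin order.
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, product_id, document):
        """All writes go to the master, then get copied to every replica."""
        self.primary[product_id] = document
        for replica in self.replicas:
            replica[product_id] = dict(document)

    def read(self, product_id):
        """Reads never touch the master; returns (replica index, document)."""
        replica_index = next(self._next_replica)
        return replica_index, self.replicas[replica_index].get(product_id)


store = ReplicatedProductStore(replica_count=2)
store.write("P1", {"name": "Wireless mouse"})
first_read = store.read("P1")
second_read = store.read("P1")
print(first_read, second_read)
```

Adding more replicas increases read capacity without adding load to the master, and the extra copies also protect against losing data when one server fails.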
The search history database is going to grow very quickly. In order to maintain the performance of the recommendation batch jobs, we can either delete old records from the search history database or archive them. User search history provides really good insights, so instead of deleting it at this point, we're going to archive it, and we can set a retention policy so that it gets deleted after a few years.
Most users will be only concerned about their recent orders. They are less likely to interact with the orders they placed over a year ago. So we can archive all of those past orders into an archive database. That way we can ensure good performance for the order database.
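An archiving job for old orders could look roughly like this. Dictionaries stand in for the order database and the archive database, and the function name, field names, and the one-year cutoff default are assumptions for illustration.

```python
from datetime import datetime, timedelta


def archive_old_orders(orders, archive, now, max_age=timedelta(days=365)):
    """Move orders placed before the cutoff from `orders` into `archive`.

    Returns the number of orders moved.
    """
    cutoff = now - max_age
    # Collect IDs first so the dict is not modified while iterating over it.
    old_order_ids = [
        order_id
        for order_id, order in orders.items()
        if order["placed_at"] < cutoff
    ]
    for order_id in old_order_ids:
        archive[order_id] = orders.pop(order_id)
    return len(old_order_ids)


order_db = {
    "O1": {"user_id": 1, "placed_at": datetime(2022, 1, 15)},
    "O2": {"user_id": 1, "placed_at": datetime(2024, 5, 1)},
}
archive_db = {}
archived_count = archive_old_orders(order_db, archive_db, now=datetime(2024, 6, 1))
print(archived_count, sorted(order_db), sorted(archive_db))
```

The same pattern applies to the search history database, with a second step that deletes archived records once the retention period has passed.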
Whenever a user is placing an order, the place order service will check against the inventory database before placing the order. So that is something we forgot to mention earlier. Whenever a user opens the Amazon app or the webpage, we want their recommendation to load immediately.
So we will store their recommendations in a cache. We will also add a message queue between the search service, the get recommendation service, and the add to cart service. That way we are decoupling those services and reducing the dependencies between them. We will also cache the reviews for popular products and users' recent order statuses. And there we have our fully scaled system.
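As one example of the caching described above, here is a small cache-aside sketch for recommendations. The class name, the loader function, and the time-to-live setting are assumptions for illustration; the video does not say how cache entries expire. In production this role is usually played by a dedicated cache server rather than an in-process dictionary.

```python
import time


class RecommendationCache:
    """Cache-aside: check the cache first, fall back to the database on a miss."""

    def __init__(self, load_from_db, ttl_seconds=300, clock=time.monotonic):
        self._load_from_db = load_from_db
        self._ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries = {}  # user_id -> (expires_at, recommendations)

    def get(self, user_id):
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit, no database call
        # Cache miss or expired entry: load from the database and remember it.
        recommendations = self._load_from_db(user_id)
        self._entries[user_id] = (now + self._ttl_seconds, recommendations)
        return recommendations


db_calls = []


def load_recommendations(user_id):
    """Stand-in for a query against the recommendation database."""
    db_calls.append(user_id)
    return ["P100", "P200"]


cache = RecommendationCache(load_recommendations, ttl_seconds=300)
first = cache.get("u1")
second = cache.get("u1")  # served from the cache
print(first, second, db_calls)
```

The expiry time matters because the batch jobs keep producing new recommendations, and a cache that never expires would keep showing stale ones.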