Transcript for:
E-commerce System Design Overview

Hello everyone, and welcome back. Today we'll be doing yet another systems design problem, probably going to be Amazon retail. Now, if it looks like I'm going to pass out, that's because I probably am, so I'm going to try and keep this video relatively short; however, if you guys are anything like the women that I've seen in the past, apparently the long ones are too painful anyway, so hopefully this should be just good for you. Let's go ahead and get started.

Alrighty, so today we are going to be building Amazon, or for you lovely folks over in India right now, Flipkart. The basic idea here is, as we know, you can go and see a product; once you see the product it has some price, probably an indication of whether or not it's in stock, and a button to add it to your cart, and then of course once you're on the cart screen you can go ahead and place an order. As you can see in my cart, it shows the items that I've currently got ready to check out: we've got some lotion, some socks, some motor oil, a broom. No relation between any of those items, just random stuff that I picked out.

Cool, let's talk about some problem requirements. The main thing here is that we obviously just want to build a pretty limited feature set of Amazon's retail site, and at least as far as I'm concerned, what's hard to build, or what's worth building for the sake of this video, is going to be the cart service, dealing with inventory management, dealing with when items go out of stock, and just in general making everything fast and scalable. A couple of features that we aren't going to cover as far as Amazon is concerned: the reviews (we basically did that with Yelp, it's the same idea), payments (that probably deserves its own video if anything), anything related to Amazon Web Services (you're on crack if you think I'm covering that in this video), or of course any user, seller, or customer service related type of things. I have a friend who works at Amazon on one of these teams, and it just goes to show you, it deserves an entire team. So obviously there are tons of complicated services going on in Amazon retail; nonetheless, we're just going to cover a very small piece of that.

Cool, so let's talk about some quick capacity estimates. These are going to be super handwavy, because this stuff changes constantly and you're kind of at the whim of your interviewer here. First off, let's assume there are around 350 million unique products on Amazon (I Googled that one), so why not just round it up to a billion; we want to build for scale. Additionally, apparently there are 10 million orders per day (now we see why Bezos is so rich). If you divide by your 24 hours, your 60 minutes per hour, and your 60 seconds per minute, that leads us to around 100 orders per second, which is actually not so bad if you think about it, considering that most of those orders are probably going to different product IDs, and those will probably be spread out over different databases, but we'll talk about that later. The next thing is that I basically start getting really handwavy here and say let's estimate we have around a megabyte of data per product. If we multiply our billion products by our megabyte of data, that's around a petabyte of total data, so we're obviously going to need some partitioning over our products database, which is not a huge deal. Keep in mind all these products have images, so at some point we want to throw them in S3 and maybe use a CDN to speed up those load times.
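If you want to sanity-check those numbers, here's the arithmetic as a quick Python sketch; the inputs are just the rounded guesses from above, nothing authoritative:

```python
# Back-of-the-envelope check of the capacity estimates above.
orders_per_day = 10_000_000
orders_per_sec = orders_per_day / (24 * 60 * 60)
print(f"{orders_per_sec:.0f} orders/sec")         # ~116 orders/sec

products = 1_000_000_000                          # 350M, rounded way up for scale
bytes_per_product = 1_000_000                     # ~1 MB of data per product
total_pb = products * bytes_per_product / 1e15
print(f"{total_pb:.0f} PB of product data")       # ~1 PB, hence the partitioning
```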
Finally, regardless of how much scale I listed out here, if you were just like "I'm going to throw this on one database," your interviewer is probably going to say you need more scale than that. So we're just going to imagine that everything is as slow as possible, that we've got ten bajillion users editing everything at a time, and we're going to try and build something super bulletproof.

Cool, so what is the high-level overview? I got a comment recently saying that I should try and structure my videos a little bit better, so I'm going to attempt to do that with this very slide. Apparently, and I've heard this in reference to Amazon and other e-commerce sites, even as little as 100 milliseconds of delay on our load times can really hurt the conversion rates on our site, and so as a result we want to make reads as fast as possible wherever we can, especially when loading potential products, actually clicking into a product to see the view of it, and then checking out as well, because we want to make writes fast too. So if I had to make a table of contents here: the first thing I'm going to cover really quickly is our products database, then our cart service, then dealing with placing orders, and then finally just optimizing reads in general.

Cool, so the first thing that we're going to talk about is our products database. Like I mentioned, we basically have a petabyte of product data stored, or we can imagine that we do, and if that's the case we're going to have to go ahead and partition this thing. Additionally, we also want to be ensuring fault tolerance, and even within all of that, we want to make sure that reads are as fast as possible. Generally speaking, I don't really think we have to worry too much about write throughput here, because editing a product description is probably something that an administrator or a seller might do, and they're probably not doing it too frequently; I would imagine it's very infrequent that two people are going to be editing the description of a product at the same time. So as a result, I mainly care about our read throughput here, as opposed to our write throughput, and I think single-leader replication should be fine. I also think that using a B-tree is in theory going to be optimal, because as we know, with a B-tree you just go down the tree and find your read, as opposed to an SSTable setup where you have to actually check multiple SSTable files, which is obviously going to take longer. Of course the B-tree is going to be slower to write to, but we don't really care if we don't have that many writes.

Cool, so another thing, as far as Amazon products are concerned: I think we want to have a flexible schema here. The reason being that products are in many different types of departments and have many different pieces of data amongst them, so having that flexibility could be useful. This is one of the very few cases where I'm interested in NoSQL solely for the data model, or the flexibility there, and I think MongoDB is going to be the choice here, considering it is a single-leader replicated, B-tree indexed type of database, and it also allows us to just make JSON documents for products. I think we are typically going to want to load that full JSON document at once, so the data locality is actually going to help us out here.
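Just to make the flexible-schema point concrete, here's a minimal sketch of what two products in different departments might look like as MongoDB documents. This assumes a local MongoDB instance reachable via pymongo, and every field name here is made up for illustration:

```python
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["amazon"]

# Two products in different departments carry totally different fields,
# which is exactly the schema flexibility argued for above.
db.products.insert_one({"_id": 12, "name": "Kleenex tissues",
                        "category": "household", "sheet_count": 160})
db.products.insert_one({"_id": 40, "name": "Memory foam mattress",
                        "category": "mattress", "size": "queen",
                        "firmness": "medium"})

# Rendering the product page is one document read: data locality at work.
print(db.products.find_one({"_id": 40}))
```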
And we don't really need to do very much denormalization; we're literally just loading the product, and it's not like there are any joins we need to do, at least not in this very stripped-down version of Amazon.

Cool, the next thing we can talk about is actually building a cart. I'm going to discuss a few different implementations of how we can do this, ideally getting better and better as we go. The idea here is that we don't just want it to be the case that one person can edit their cart at a time: I might have my account open, you might have my account open, and we could both be editing that cart at once, and if we're doing that, we don't want to be stepping on each other's toes or just having super slow writes.

Cool, so what would be a really bad implementation? Well, I guess it would be taking the full list that you have in your JavaScript front-end application and just sending that over to the database, overwriting the existing cart with it. Why? Well, for starters, if two people write at the same time and one isn't aware of the other's write, one is just going to override the other. So if we already had eggs and ham in the database (oops, did the wrong thing there) and then this guy writes milk and carrots, that write is just going to completely get rid of them. That would be bad; we don't want to step on each other's toes like that.

Of course, we could also just store the cart as a list and then add elements to the end of that list: I'm going to add ham, this guy's going to add eggs. But that's not going to be great either, because we're going to have contention when adding elements to the end of a list. We need to grab a lock on the list itself in order to append, and so if we have tons of people, not just two guys but maybe 10 or 20, trying to add to the end of the cart, we're going to have a lot of contention on that lock. Now, in reality, is it going to be the case that 10 people are editing the same Amazon cart at the same time? Probably not, but like I mentioned, the whole point here is to push this problem to the extreme in order to ultimately think through some fundamentals of systems design.

Cool, so let's continue talking about building a cart. Ideally we don't just want to keep a list, because like I mentioned, whenever we have to maintain order for something, we're going to have to grab locks to append onto the end of it. So what could we do instead? Well, in order to avoid having to lock, we could just use a set. As opposed to adding things to a list, what we could do instead is add things to a typical table where you've got your cart ID and your product ID, and since those rows aren't actually contending with one another, we can just throw them into the table without the two writers competing over locks (there's a small sketch of the two write semantics below). However, the one situation where we would still have to grab locks is if we want to be able to delete items from our cart: the person who adds an item and the person who deletes it, one is going to have to grab a lock on that row to add it and the other is going to have to grab a lock on that row to delete it, otherwise we would have race conditions and you have no clue what's actually going to happen.
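To see why overwriting the whole cart loses data while the set-of-rows approach doesn't, here's a tiny sketch of the two write semantics; pure illustration, no real database involved:

```python
# Two clients write concurrently, neither having seen the other's change.
client_a = {"eggs", "ham"}
client_b = {"milk", "carrots"}

# Bad: each client overwrites the full cart, so the last writer silently
# erases the other's items.
cart = set()
cart = client_a            # A's write lands first...
cart = client_b            # ...then B's write clobbers eggs and ham
print(cart)                # {'milk', 'carrots'}

# Better: each (cart_id, product_id) row is inserted independently, which
# behaves like set union, so concurrent adds never clobber each other.
cart = set()
cart |= client_a
cart |= client_b
print(cart)                # {'carrots', 'eggs', 'ham', 'milk'}
```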
Of course, like I mentioned, this is all going to be super hypothetical: are we ever really going to have a lot of adds and removes of the same item within a second? Probably not. Maybe I'm desperately trying to add 20 boxes of tissues to my Amazon cart and my mom keeps deleting them because she's pissed at me for buying too many tissues; I don't really think that's going to happen in real life, but again, we're going to pretend like it does.

Cool, so let's imagine there was a ton of contention on a single one of these rows. How could we avoid that locking contention? Generally speaking, at least as far as I've come across, or what came to mind when I was making this video, there are two main ways to deal with avoiding contention. The idea is, if we're going to have to grab locks to write two things that compete with each other, we have two options. One is to process them one at a time by first buffering them in a centralized place like Kafka. In this case, I don't really think Kafka is going to work very well, the reason being that if I add an item to my cart, it's going to go into Kafka and be processed asynchronously in the background. So if I add an item to my cart and then click check out, unless Kafka has already allowed that message to be processed, all of a sudden I'm going to check out a bunch of items and I might not even have the one that I just added. That would be bad. We definitely want something that's at least synchronous here, synchronous insofar as when I add an item and then check out, I can see that I actually added that item and it's part of my cart.

Cool, so our second option would be to send all the writes to different places so they don't have to compete with one another, and in this case, "different places" really means multiple database leaders for that particular cart. A couple of complications are going to come up here. If we have multiple leaders, how is it that everyone is going to end up seeing the same cart? Well, they won't; there's going to be eventual consistency. But we can actually ensure eventual consistency by virtue of using a CRDT. These are something I talk about relatively frequently on this channel: CRDT stands for conflict-free replicated data type, and in this case we want to use such a CRDT to build a set amongst our leaders. The idea is that we're going to have a CRDT on every single one of these leaders, each of them is going to have its own copy of what it thinks is in the cart, and eventually those guys will sync up with one another and converge to the same value. What I'm going to demonstrate in the next slide is something called an observe-remove set, which is going to let us build a CRDT set where you can add elements, delete them, and then add them back, which is not trivial.

The one thing to note here is that, like I mentioned, there are multiple leaders, and so in order to avoid the case where I write an item to a database, then check out, and my item is not in the cart, we should make sure that if I'm writing to one leader out of the many, I also read from that same one, so that I can actually see my own items on the checkout screen. Otherwise, you basically make it possible to check out without the item you just added.
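A minimal sketch of that read-your-own-writes routing, under the assumption that we just pin each client session to one leader by hashing the session ID; the hostnames are invented:

```python
import hashlib

# Invented hostnames for the cart database leaders.
LEADERS = ["cart-leader-0.internal",
           "cart-leader-1.internal",
           "cart-leader-2.internal"]

def leader_for_session(session_id):
    # Hash the session so one user's writes AND reads land on the same
    # leader; other sessions may hit other leaders, and the CRDT handles
    # merging their concurrent edits later.
    digest = hashlib.sha256(session_id.encode()).hexdigest()
    return LEADERS[int(digest, 16) % len(LEADERS)]

# The add-to-cart write and the checkout read route identically:
print(leader_for_session("session-abc"))
print(leader_for_session("session-abc"))   # same leader, so I see my own item
```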
Cool, so hopefully that all makes sense so far. Let's go ahead and talk about this observe-remove set. The overarching idea for this CRDT is, one, it's an operation-based CRDT, so I'm not going to be throwing my whole set into the database every single time; I'm just going to be throwing in the element that I added or removed. And the main idea is that with every single element we add or remove, we also append some unique tag, which is really just a random UUID. We're going to throw that unique tag on the product ID whenever we're adding or removing an item from the cart, and that tag is basically going to tell me what a given client actually knew was in its version of the CRDT when performing an addition or a removal.

So let's go ahead and demonstrate this. Let's say that at timestamp 1, we've got one client making a write to this left database over here, and he wants to add product 12 to the cart, so we assign this random tag, a32. Is that a delete? No, it is not. At timestamp 2, this guy over here reads from this leader and sees that product 12 is in there with that given tag a32. Now his client says, hey, I want to go ahead and delete this thing. How do we delete a given item? All we have to do is put it in the database with the tag (12, a32) and a delete flag saying, okay, we actually got rid of this thing. Cool. The next aspect of this is that at timestamp 4 we have yet another client who says, oh, you know what, I'm also editing this cart, I want product 12 too. Since this is a unique addition, we're actually going to generate yet another new tag, in this case 6d27, and that of course is not going to be a delete.

The main idea here is that when we eventually merge together all these operations, both databases are going to agree on what the general state is. Because a32 was added and then deleted, we know we can basically cancel that guy out from both of these replicas once they sync up, so a32 is going to get removed. But note how this user never actually deleted product 12 with tag 6d27, so that one sticks around, and this is actually going to be the eventual result of our CRDT: we would all agree that 6d27 was still in the set and not yet deleted. However, in an alternate sequence of events, let's say this guy actually goes fourth: what happens at timestamp 3 is this guy adds 6d27, but at timestamp 4 this guy deletes both a32 and 6d27, because he sees them both; he's actually read from this database first, and then at timestamp 4 goes and deletes them both. In that case we would have nothing remaining in our cart; he just deleted product 12 entirely. So hopefully that makes enough sense for how you can both delete products and also add them back: the tag is basically going to be the source of truth.
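Here's a minimal in-memory sketch of that observe-remove set, just to make the tag mechanics concrete. This is a toy, not the actual database representation, and the tags are truncated for readability (a real system would use the full UUID):

```python
import uuid

class ORSet:
    def __init__(self):
        self.adds = set()           # (element, tag) pairs this replica has seen added
        self.removed_tags = set()   # tags this replica has observed being deleted

    def add(self, element):
        tag = uuid.uuid4().hex[:4]  # short random tag like 'a32f', unique per add
        self.adds.add((element, tag))
        return tag

    def remove(self, element):
        # Tombstone every tag for this element that THIS replica has observed;
        # a concurrent add with an unseen tag survives the remove.
        self.removed_tags |= {t for (e, t) in self.adds if e == element}

    def contains(self, element):
        return any(e == element and t not in self.removed_tags
                   for (e, t) in self.adds)

    def merge(self, other):
        # Merging is set union on both components, so replicas converge
        # regardless of the order in which they sync: the CRDT property.
        self.adds |= other.adds
        self.removed_tags |= other.removed_tags

# Replaying the walkthrough: left adds product 12, right observes and deletes
# it, then left re-adds product 12 under a fresh tag.
left, right = ORSet(), ORSet()
left.add(12)             # tag #1 ('a32' in the slides)
right.merge(left)        # right observes tag #1
right.remove(12)         # right tombstones tag #1
left.add(12)             # tag #2 ('6d27'), unknown to right
left.merge(right); right.merge(left)
print(left.contains(12), right.contains(12))   # True True: the re-add survives
```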
Cool, the next thing we're going to be talking about here is our order service. The general overarching theme here is that whenever we create an order, I have to check whether there's stock for that order. So how do we check if there's stock? Well, obviously, for each product there's some sort of counter saying here's how much inventory I have left: you know, product 10 has 100 left in stock. So if I'm going to buy five units of product 10, now I have to decrement that quantity from 100 to 95, and if you recall, every single time we have to do some sort of decrement operation on a counter, we have to grab a lock on it; otherwise we can have lost updates. For example, if we have this counter here at 10 and two different processes both try to add one at a time, what that really means is they first have to read the current value, add one to it, and then write it back. As you can see, if they both do that at the same time without locking, they can both read that the current value is 10 and both write back 11. That would be a bad race condition, hence the need for the lock.

So if we do have to grab a lock, that of course opens us up to lock contention. Like I said, in reality this probably isn't such a big deal: we mentioned we have around 100 orders per second, and since those orders are mostly on different product IDs, we're going to be grabbing different locks for each of those 100 orders, so I imagine there's not really that much contention in reality. However, of course, for the sake of this problem, we're going to pretend like there's a ton of contention. So what are we going to do in such a scenario? Well, let's go back to our avoiding-contention playbook. Basically, there are two things we can do. One was what we did last time, where we have multiple leaders and attempt to use some sort of eventual consistency in order to distribute our writes to multiple different places, so that we don't have to grab as many locks, or at least we don't grab the same locks on each database node.

Cool, so in this case, let's imagine we use something like a counter CRDT. This is something I've covered in the past, but basically the idea is that you keep track of the number that you have incremented on every single leader instance, and eventually those guys sync together and you get a good idea of the total number of increments, or decrements in this case. The idea is that we could effectively see the remaining quantity for orders, at least as well as each leader database knew it to be, and those guys would eventually converge. The problem, however, is that we only find out the true value of the CRDT once these guys actually sync up with one another. Let's imagine (I'm just going over to this example right here, and I'll erase what I had so that I can write it out again) that we both start out with 10 remaining quantity on both of these guys. The first thing that happens is one order comes in, goes to this database over here, and it was for a size of 10; then this other guy comes along, going to the right database, and it was also for a size of 10. Now these guys sync up and they say, oh shoot, what actually happened is that we oversold our product by 10 units: we accidentally sold 20 and we only meant to sell 10. Now we have to go back and tell everyone that even though we told them their order was filled, it actually wasn't.

So what would be a better solution than that? Ideally, we use our other option, which is some sort of buffering of our events, which we'd do in Kafka, and then processing them one at a time via a stream processing framework. Basically, the idea now is that the orders are going to go into a Kafka queue (we can partition these by product ID if we need to scale them out, because there are so many product IDs), and then once we figure out whether we actually have the proper amount of stock for a given order, we can go ahead and email the user.
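As an aside on the locking point above: one common way to make the check-and-decrement safe on a single node, without an explicit application-level lock, is a single conditional UPDATE that the database applies atomically. That's a different technique from the streaming approach this design lands on, so treat this as a sketch of the alternative; the table and column names are made up, and sqlite3 is used purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (product_id INTEGER PRIMARY KEY, quantity INTEGER)")
conn.execute("INSERT INTO inventory VALUES (10, 100)")

def reserve(conn, product_id, n):
    # The WHERE clause makes check-and-decrement one atomic statement, so two
    # concurrent orders can't both read 100 and both write back 95.
    cur = conn.execute(
        "UPDATE inventory SET quantity = quantity - ? "
        "WHERE product_id = ? AND quantity >= ?",
        (n, product_id, n))
    conn.commit()
    return cur.rowcount == 1   # True only if stock was actually reserved

print(reserve(conn, 10, 5))    # True:  100 -> 95
print(reserve(conn, 10, 200))  # False: refusing to oversell
```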
Now, keep in mind that the emailing-users part is not idempotent. If, for some reason, our Flink node (or whatever it is we're using for stream processing) goes down after we send that email but before we actually acknowledge the message, we're going to re-run this event from Kafka and send another email, and tough luck, our client gets two emails. Not the end of the world, but just note that it's not idempotent unless we use two-phase commit; or rather, it wouldn't be idempotent without two-phase commit, but if we did use two-phase commit, it wouldn't be possible for us to end up replaying that email, because we would acknowledge the message at the same exact time that we sent the email.

Cool, so now I have myself a little diagram here to explain how order processing is going to work when we're using streaming. We've got our order service over here; this is where I come and place my order, you know, for some tissues and some lotion. Now, these are two different product IDs, right? Tissues could be product ID 22, lotion could be product ID 50. So over here, the first thing that's going to happen is that exact message is going to be sent into this first Kafka queue, and the nice thing about using one Kafka queue for the whole order is that it assures atomicity: we know that once this message is in Kafka, it's eventually going to be processed by the rest of our components. If, before it gets there, I wanted to split these guys up and send them into two different Kafka queues, it's possible that one gets there and the other doesn't, and then we'd only process half of our order. So again, it's very nice to have a unified Kafka queue for our orders, and once the message actually gets in there, we can use Flink to basically guarantee that every single message is going to be processed.

What our first Flink node is going to do is actually split up these orders. I'm going to call this the order-splitter Flink node (whatever, bad name, who cares); it basically just routes each individual component of the order, so each product of the order and the quantity that we actually want, into the proper Kafka queue for it. These Kafka queues are sharded by product ID, so these would be the individual product orders. Cool. The last thing that happens is we take a product ID, we take the quantity of it that we actually want to buy, and we check its current quantity on the Flink node. So let's imagine I wanted a quantity of three and there were three remaining: what I would do is decrement this three to zero, send an email to the client saying you're good to go, we actually filled this order, and then the next thing I'd say is, oh shoot, this thing is at zero quantity, let's go ahead and write that this product is out of stock to our product database. The last piece of the puzzle is: how is Flink actually able to cache the current stock of a given product? Well, besides whatever incoming orders there are, it also needs to get any incoming inventory changes that come from our sellers. So if this is our seller, he's basically just going to upload, hey, for product 10 we just added 250 more stock, and that would flow through Kafka back into our Flink consumer, and then this zero gets edited to 250 so that we can start fulfilling subsequent orders. Cool, hopefully that all makes sense right there.
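To tie the pieces of that diagram together, here's a toy, single-process stand-in for the inventory-check node. Real Flink would key this state by product ID, checkpoint it, and deal with the idempotency caveat above; the event shapes here are invented for illustration:

```python
stock = {}   # product_id -> remaining quantity (Flink keyed state in reality)

def send_email(addr, msg):            # stub; a real pipeline would use a sink
    print(f"email to {addr}: {msg}")

def mark_out_of_stock(pid):           # stub for the write to the product DB
    print(f"product {pid} is now out of stock")

def handle_event(event):
    pid = event["product_id"]
    if event["type"] == "inventory_change":      # from the sellers' Kafka queue
        stock[pid] = stock.get(pid, 0) + event["delta"]
    elif event["type"] == "order_item":          # from the order-splitter queue
        remaining = stock.get(pid, 0)
        if remaining >= event["quantity"]:
            stock[pid] = remaining - event["quantity"]
            send_email(event["email"], "order confirmed")
            if stock[pid] == 0:
                mark_out_of_stock(pid)
        else:
            send_email(event["email"], "sorry, out of stock")

# Replaying the example above: seller adds stock, then an order drains it.
handle_event({"type": "inventory_change", "product_id": 10, "delta": 3})
handle_event({"type": "order_item", "product_id": 10, "quantity": 3,
              "email": "me@example.com"})
```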
The last aspect of this video is going to be attempting to optimize our reads. Like I mentioned, the faster we can make the reads on our service, the more likely we are to sell products, and that is a huge deal. The idea is, you know, we can't make all of our reads fast; that would just be "put everything in memory, get everything into cache," which would be nice, but realistically there's a certain subset of our items we ideally want to make the reads faster for. So which items would those be? Well, ideally the popular ones.

Cool, so the first question that should come up is: which items are popular? There are many different ways to determine this; I'm sure Amazon has a variety, even internally. One would just be the number of orders on a product, another would be the number of clicks on a product, maybe the reviews on the product, et cetera. It doesn't really matter; the idea is that at the end of the day, we're doing some sort of counting aggregation over a time window, and we might want to export those metrics, let's say, once per day. So of course you already know what I'm going to say: what are we going to do? Stream processing, because then we can window our data and say, hey, here are our metrics for the last day. Let's say we use Spark Streaming, because we just want to process things in mini-batches; there's no reason to process them one at a time, it's not that urgent. So let's say we keep all of these counts in Spark Streaming, and let's imagine we're doing this based off the number of orders on a product. So in Spark Streaming, right over here, not only are we keeping track of the actual quantity remaining, but we're also keeping track of the number of orders. We do that, and then once per day, at some set time, we can export, for each product ID, the number of orders that it had, and send that right over to HDFS.

Now, once all this data is in HDFS, we can go ahead and run a Spark job to rank all of the items in terms of their popularity. We would calculate some popularity score, we would rank them, and we could even split them into given departments: so, for example, all the silverware, all the mattresses, all the video games, each ranked in their own categories. We could even do some exponential weighting: if I have yesterday's data and the days before that, I can slightly incorporate those into my popularity score, but weight today's data more heavily (there's a tiny sketch of that below). Again, this is all implementation details; you have a lot of options for how you want to do this ranking, and it doesn't really matter. The point that does matter is that once we know which items are particularly popular, we can go ahead and pre-populate our caches with them. We can do that with the product data that we see in MongoDB, throwing it into a Redis cache for products, and additionally we can take the corresponding images of those products and throw them into a CDN. The nice thing about a CDN, of course, is that it's geographically distributed, meaning that for a given product, since those images are so big, if the CDN node hosting the image is physically closer to me, I can load it faster.
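That exponential-weighting idea in miniature: today's order count matters most, and older days decay geometrically. The decay factor of 0.5 is an arbitrary choice for illustration:

```python
def popularity_score(daily_orders, decay=0.5):
    # daily_orders[0] is today, daily_orders[1] is yesterday, and so on;
    # each day back gets multiplied by another factor of `decay`.
    return sum(count * (decay ** age) for age, count in enumerate(daily_orders))

print(popularity_score([40, 10, 10]))   # 40*1 + 10*0.5 + 10*0.25 = 47.5
```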
Cool, so another part of reads that we really want to optimize is text search. Of course, if you go to Amazon, one of the things you'll notice is that massive search bar at the top, and you want it to be quick when you search for a given product: if I search for a mattress, I want to see only mattresses, and I want that to happen really quickly. Now, if you've been watching my channel for a decently long time, you'd know that a search index is going to be completely necessary here, and what that really is under the hood is an inverted index, where an inverted index is an index from search term to a list of product IDs, or product descriptions, that contain that search term. And in order to make things even faster: I said product ID over here, but what would really be better is an actual product description, or perhaps a mini one, maybe not everything but something like a thumbnail of the product, because then we don't have to go back and do an additional read once we hit our search index. So a little bit of data denormalization is probably going to be in order here.

Cool, so the next question is: how do we actually want to partition our search index? Well, we could do global term-based partitioning, which would be super nice: if I ever search "mattress," all of the document IDs, or the product IDs, that contain the term mattress are going to be on a single node. That would be great, because then if I ever do that search, I basically only have to hit one node. In reality, though, there are probably too many products, especially when you consider their description sizes, for us to be doing that. The reason I say that is, keep in mind that if we want to keep a global index, the same product is going to have to be on multiple different nodes: for example, if we have the term "Kleenex" on one node and the term "tissues" on another node, then on node one I now have to keep the full product description, and on node two I also have to keep the full product description, so now we're really, really duplicating data, which is not ideal. Another thing about global term-based partitioning: do we really need all of those product IDs on the same partition? Perhaps not, because at the end of the day, on our front end we use pagination, so as long as every single document ID on a given page of Amazon results is on the same partition, we can still get super fast queries.

Cool, so let's talk about local indexing, because that is really our only alternative to global indexing. The idea here is that we put a set of products on every single node of our search index and then index them locally: we just build that inverted index locally, right off of that set of products (there's a small sketch of this below). So let's imagine we have one product called Kleenex tissues; as you can see, the term for Kleenex and the term for tissues are both going to be on this particular node. If on a different node we had, I don't know, Softy tissues, it would also have the term tissues on that node too. Okay, cool, so that is what local indexing is, but now the question is: how do we want to perform the local partitioning? We have to decide which products are actually going to go on which node to be locally partitioned, and when I say locally partitioned (typo on my end), I mean locally indexed. Great, so what products are going to go on each partition? The main thing here is that we ideally want similar products to be on the same partition, or rather similar product categories, which would be a little bit easier, because I would imagine that in MongoDB most of these products are going to have some sort of tag: for example, a tag would just be "mattress" or "gardening" or something like that. So similar products can go on the same search index node.
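Here's a tiny sketch of local indexing: each search node builds an inverted index over only the products it stores, and a query fans out to every node and unions the results. The product data is invented:

```python
from collections import defaultdict

def build_local_index(products):
    # products: {product_id: title/description text stored on this node}
    index = defaultdict(set)
    for pid, text in products.items():
        for term in text.lower().split():
            index[term].add(pid)   # term -> product IDs, local to this node
    return index

node1 = build_local_index({12: "Kleenex tissues", 40: "memory foam mattress"})
node2 = build_local_index({77: "Softy tissues"})

# A search for "tissues" scatters to every node and gathers the union:
print(node1["tissues"] | node2["tissues"])   # {12, 77}
```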
Additionally, in our batch job, keep in mind we've basically generated a popularity score for every single product in each category. So here's the mattress product category: we've got product ID 2 with a score of 3.6, product 19 with a score of 2.1, and so on and so on. Ideally, keep in mind that this is effectively also going to be the order of items we return when we search for the term mattress, and so what's going to be nice is that as long as we partition by popularity score range, if I want to search for page one of mattress listings, I basically just want these terms right here, and if I want page two of mattress listings, I want these terms right here, so we can actually just partition those accordingly, which is going to make our life pretty easy. To summarize: first we partition the products by their product category, and then internally, within a product category, we can partition by the popularity score that we calculate in Spark.

Cool, the last piece of this puzzle is actually doing the population of both the caches and the indexes after we go ahead and compute our popularity scores for all of the products. Ideally, we don't want to do this against our current caches and indexes, or our current cluster, the reason being that it would slow things down by a lot: if we add a bunch of writes to the current cluster, they're going to compete for a lot of the networking bandwidth there, which slows down actual user reads, and that would be bad. So ideally, what we could do instead is have some sort of standby cluster. We don't have to have this provisioned at all times, but maybe 30 minutes before our batch job runs, you provision a bunch of EC2 instances or something like that, depending on what platform you're actually using. The first thing you do is go ahead and populate them from Spark, so you add everything into your standby cluster. The next thing you do is go over to ZooKeeper, or whatever coordination service it is you're using to tell everything in your application where everyone lives (or your DNS service, anything like that), and say, hey, wait a second, we're now using all of these URLs that point to our standby cluster. Then we broadcast that to all of our existing servers via something like a ZooKeeper watch, in order to keep them updated on where to actually point their queries, and then, you know, maybe 20 minutes later we go ahead and decommission the nodes in our current cluster, since they can become the standby eventually; we don't want to waste a bunch of money. Additionally, one other point here: you could probably do this on a per-region basis, where you purposely choose a time to do this when that region has most of its people sleeping, so traffic is particularly low. But those are other considerations you can just talk about with your interviewer.
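For that coordination step specifically, here's a hedged sketch of what the cutover might look like with ZooKeeper via the kazoo client; the znode path, hosts, and payload are all invented:

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1.internal:2181")
zk.start()
zk.ensure_path("/config/search-cluster")

# Every app server registers a watch; it fires whenever the znode changes,
# which is the "broadcast via a ZooKeeper watch" step described above.
@zk.DataWatch("/config/search-cluster")
def on_cluster_change(data, stat):
    if data:
        print("now routing search queries to:", data.decode())

# Once the batch job has populated the standby cluster, it flips the pointer:
zk.set("/config/search-cluster", b"http://standby-es.internal:9200")
```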
Cool, my voice is starting to give out on me, so give me about one second here, and then we will talk about this. Okay, so I'm now going to try to demonstrate the flow of how Amazon is going to work. The first thing we can do is actually just search for products off a given search term; as we know, that means we're just going to be hitting our Elasticsearch. Like we mentioned, Elasticsearch is going to work really nicely for us here: if we shard by both the product type, or some sort of product tag, and also some sort of popularity score range, we can distribute these out pretty evenly, because we ran a batch job, so we know how big the whole sample size is and can distribute those as necessary.

Cool, the next aspect of this is going to be the product service. Once I perform my search, all of a sudden I see a thumbnail for an item, and I want to click the item and get the full product description, so this is the second thing in the pipeline. Either that product is going to be in our product cache, because we pre-loaded it there since it's a popular one, or if not, we hit our cache, get a cache miss, and then go over to our product database, which again is going to be MongoDB, so we get that flexible data format with JSON documents, and we can just shard that on the product ID. Cool, hopefully that makes sense.

Number three is we submit an order; actually, technically we'd have to put things in the cart first, so the cart is going to be number three here. I've basically lined up our cart DB; keep in mind the schema is effectively the cart ID, the product ID, whether or not it's deleted, and the unique tag on that product ID. We're going to do that in Riak. Riak is not a database I talk about super often on this channel, but think of it like Cassandra, except it supports CRDTs, so Riak would work very nicely here. Cool, so we're going to just partition by cart ID there, so that all of the writes for a given cart are on the same node, and as a result, when we do perform our anti-entropy, all of our state will eventually converge.

Cool, the next aspect of this is going to be the order service. The first thing that happens, as we discussed, in order to keep things atomic, is we send our full order into the full-orders Kafka queue. From there, Flink is going to take it in and split it up into multiple different per-product orders. These then get sent into a second Kafka queue, where we have orders per product ID, with of course the amount of stock of that product ID that we want to buy, and then finally that hits our last Flink node, which performs our inventory checks. This Flink node also takes in inventory changes, which is just another Kafka queue: if you are someone who is selling things on Amazon, you can write your incoming inventory changes to that queue, for example "product 10: add 250 stock," "product 10: remove 100 stock," and Flink can take those into account. The last aspect of this is that once we go ahead and process whether an order can be fulfilled or not, we would actually want to send an email over to the user that made the order, so we're going to have to have the email address associated with the order somewhere in our data model.

Finally, of course, like I mentioned, we want to get some measure of how popular a given product is. So in this Flink node right here, which is sharded by product ID (keep in mind it's sharded by product ID because Kafka is sharded by product ID and we have one Flink consumer per Kafka shard), what's going to happen is that we can aggregate the number of orders, or the amount of stock purchased, for an item in a given day, upload that to HDFS at the end of the day, and then run our Spark batch job to, one, update the products database, or rather actually update the product cache over here (Jordan, stop erasing): update the product cache, and update our Elasticsearch instance to rebalance our partitions by their popularity score. We would do this in the standby cluster, but I just didn't have enough space in the diagram.
Then we would update ZooKeeper to say, hey, we're going to have to switch over to our new Elasticsearch and our new caches and our new CDNs, et cetera, and yeah, that would be it right there: once we switch over, the user starts reading from all of those. Well guys, I hope you enjoyed this video. Again, apologies, I'm so tired and my voice is giving out; however, I do think this one was moderately understandable compared to the Reddit video, so hopefully it worked out well, and I will see you all in the next one.