Transcript for:
Designing a Scalable Video Streaming Platform

hello everyone, welcome back to the channel. Today we've got yet another systems design video for you. Before we get started, let's play a quick game: am I (a) intoxicated or (b) tired? Unfortunately, because it's a Wednesday, the answer is going to be tired, which is why I want to get through this one fairly fast. So let's get into it.

Unless you're watching the super secret "Jordan has no life" OnlyFans version of these systems design videos, you're probably watching this on YouTube, which means you at least understand the product I'm going to attempt to build here. YouTube (or Netflix) is just a video watching site where a given channel or user can post a variety of videos and watch a variety of videos. An example would be the one right here, where the video has a title, a number of views, can have some comments, and is able to buffer in real time as your network bandwidth fluctuates. The gist is that we're going to attempt to build something like this that actually scales to billions of users.

So let's outline some problem requirements. First, users can post videos. Second, users can watch videos; the reason I have the arrow here is that we're going to be watching a lot of videos, as one does. Third, users can comment on videos. Fourth, users can search for videos by name. The reason I'm not really covering view counts in this one is that, frankly, if you watched the TinyURL video, the process we'd use to aggregate video views is very similar to the one we used there to aggregate link clicks, so it would be redundant; I'd recommend watching that video if you're curious about view aggregation. Again, number two, watching videos, is the most important requirement, so we want to optimize for read speed.

Let's quickly outline some capacity estimates while we're at it, since I said we're building for large scale. Imagine our product has a billion users (I wish; I'd be retired right now). The average video has around 1,000 views, but some are going to be super popular: certain channels, such as mine of course with billions of subscribers, get millions of views per video. Additionally, videos are around 100 megabytes on average, and about a million of them are posted per day. To figure out how much storage we need to hold all of these videos, we take the 100 megabyte size per video, multiply by the 1 million videos posted per day, and multiply by the approximate number of days per year (I use 400 instead of 365), which comes to about 40 petabytes of storage per year. Obviously that's a ton, but when you're making a crap ton of money it's doable; you just get a lot of hard drives.
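To make the back-of-the-envelope math concrete, here's the storage estimate above as a quick calculation. The input figures are the rough numbers just stated; the note about extra copies is only a reminder that transcoded variants (discussed later) would multiply this further.

```python
# Back-of-the-envelope storage estimate using the rough figures from above.
AVG_VIDEO_SIZE_MB = 100          # average upload size
VIDEOS_PER_DAY = 1_000_000       # uploads per day
DAYS_PER_YEAR = 400              # rounded up from 365 for easy math

mb_per_year = AVG_VIDEO_SIZE_MB * VIDEOS_PER_DAY * DAYS_PER_YEAR
petabytes_per_year = mb_per_year / 1_000_000_000  # 1 PB = 10^9 MB

print(f"~{petabytes_per_year:.0f} PB/year of raw uploads")  # ~40 PB/year
# Note: this counts only the raw uploads; storing extra resolutions and
# encodings (480p, 720p, 1080p, multiple codecs) multiplies the figure further.
```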
Before we actually get into any design, server choices, or database schemas, let's do an introduction to video streaming. The first big concept to think about is that we're going to support streaming, or rather watching footage, on many different types of devices: I might watch on my phone, you on your laptop, someone else on their tablet, so we may have to support many different device types. Additionally, we have to support different networking speeds: me on my gigabit ethernet is going to be able to watch clips at a higher rate than you can on your 2G cell phone. As a result, what would be really great is if devices with different bandwidths, or different abilities to load footage, could load it dynamically. For example, a slow phone might first watch some footage in 480p, switch over to 720p, and if the connection gets slower again, go back to 480p. A medium device might fluctuate between 4K, 1080p, and 720p and back, and a super fast connection might just watch the entire video in 4K, which would be great. So the gist is that to do that, we need to support multiple resolutions for our clips, and to support different devices, multiple encodings.

Okay, so let's talk about video chunking, which is going to let us do this more efficiently. Keep in mind that we want multiple resolutions and multiple encodings for every single clip. What that really means is that when we upload a clip from our device, say I'm posting a Jordan Has No Life video, YouTube is going to take my video and make multiple copies of it: we'd need a 1080p version and a 720p version. If your device wants to constantly switch between the two as your bandwidth or network speed scales up and down, it would have to load both of those clips, and the problem with that is that we don't need both clips in full; we only need the parts that don't overlap with one another. If I've already watched the first quarter of the clip in 1080p, I only want to load the remaining three quarters in 720p. The way we can do this is by splitting all of our videos into chunks. For example, if I've got my 1080p and my 720p video right here, maybe I watch the first two chunks in 1080p, my device then realizes its network speed has gotten worse, so it says, "hey, let's load the 720p chunk after that", oh wait, it got better, back to 1080p, oh no, worse again, 720p for the last two. The fact that we have all of these different chunks means that, with relatively small granularity, we can pick the right chunk at the right time and take advantage of our changing networking speeds.
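As a rough illustration of the client-side logic just described, below is a minimal sketch of how a player might pick the resolution of the next chunk from a measured bandwidth estimate. The thresholds and the `measure_bandwidth`/`fetch_chunk` helpers are hypothetical, purely for illustration; real HLS/DASH clients use more sophisticated heuristics.

```python
# A minimal sketch of adaptive chunk selection. Thresholds and the
# measure_bandwidth()/fetch_chunk() helpers are hypothetical placeholders.
RESOLUTION_BY_MIN_BANDWIDTH_MBPS = [
    (25.0, "2160p"),   # 4K for very fast connections
    (8.0, "1080p"),
    (4.0, "720p"),
    (0.0, "480p"),     # fallback for slow connections
]

def pick_resolution(measured_mbps: float) -> str:
    """Return the highest resolution this connection can comfortably sustain."""
    for min_mbps, resolution in RESOLUTION_BY_MIN_BANDWIDTH_MBPS:
        if measured_mbps >= min_mbps:
            return resolution
    return "480p"

def play(video_id: str, num_chunks: int, measure_bandwidth, fetch_chunk):
    """Fetch chunks one at a time, re-deciding resolution before each chunk."""
    for chunk_index in range(num_chunks):
        resolution = pick_resolution(measure_bandwidth())
        chunk_bytes = fetch_chunk(video_id, chunk_index, resolution)
        yield chunk_bytes  # hand off to the decoder/renderer
```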
So let's quickly outline the advantages of splitting our videos into a bunch of chunks. The first is parallel uploads: imagine this is our server, or our backend, or S3, wherever the chunks end up. If we can upload in parallel, we effectively establish multiple connections to this guy and send the chunks over different ports. Not really a huge deal; frankly, I'm not even sure this would increase speeds very much, because your bandwidth might be limited on the device side, but no big deal. The bigger thing is that there's a lower barrier to starting a video: rather than having to load one entire huge file onto your device, you can load just the first chunk, start watching once it's loaded, and then load the second, third, and fourth chunks from there. That's why on YouTube you can get partway through a video and then it might start buffering: we're only loading a little bit at a time and asynchronously loading the rest in the background. And of course, the last thing, which we just covered on the previous slide, is that as our networking speed changes, we can grab the best chunk for us based on our current networking abilities. As a result, we now have a bunch of different chunks per video, so we're going to need some sort of table or database to keep track of all of them, because things get complicated when we have so many different copies of every single file.

Okay, so let's start outlining some of the database tables we'll actually use for our service. I'm not going to go into these in too much detail; I'll outline a schema and possibly suggest an actual database choice, but it's worth noting that because most of these are not going to be the main bottleneck in our system, we don't have to over-optimize them; just choose a decent schema and be reasonable about it.

First, a subscribers table. Imagine we have a user ID and another user ID describing who we're subscribed to. This lets us answer: if I'm Jordan and I'm a viewer of YouTube videos, who am I actually subscribed to? That's a query I'd run pretty frequently when building something like the equivalent of my news feed in YouTube. Again, I'm not going to describe how we'd actually build that out, because we've basically already made a video on it; it's the same problem as the Twitter/Facebook/Instagram/Reddit problem, where when a video gets published, you deliver it to caches based on who's subscribed. So watch the Twitter video if you're curious about building a home screen or news feed. But we still need this subscribers table for a different reason, which I'll cover later in the video. Since we want to answer "I'm Jordan, who am I subscribed to?", we should index on the user ID field so I can quickly find all the people I'm subscribed to, and we should also partition on that user ID field so that all of those queries go to the same physical node.

Additionally, if we really wanted to, another thing we should know is: if I have a YouTube channel, as I do, who is subscribed to me? The nice thing here is that, so we don't have to make a two-part write every single time someone subscribes to someone, we can use change data capture to ensure that our second table is eventually updated, and we can do this using Kafka and Flink, as we've done so many times in these videos. If we have all these subscriber rows right here, we can partition Kafka on the "subscribed to" field, and that way, if user 10 has a YouTube channel, you can see they have subscribers 3, 6, and 5; those all get added and cached locally on our Flink node. We can either make a second database table out of that or just cache it, which personally I'd prefer for purposes later in the video, and I'll explain that down the line. This is going to come in handy when uploading videos; we'll talk about that in a bit.
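Here's a minimal sketch of the idea just described: consuming a change-data-capture stream of subscription rows, partitioned by the channel being subscribed to, and building an in-memory subscriber cache per channel. The event shape and handler are hypothetical stand-ins for the Kafka + Flink pieces.

```python
# A sketch of maintaining a per-channel subscriber cache off a CDC stream of
# subscription events. The event shape is an illustrative assumption; in the
# real design this state would live in a Flink keyed-state operator keyed
# (partitioned) by the channel being subscribed to.
from collections import defaultdict

subscribers_by_channel: dict[int, set[int]] = defaultdict(set)

def handle_subscription_event(event: dict) -> None:
    """event = {"subscriber_id": 3, "subscribed_to": 10, "op": "insert"|"delete"}"""
    channel_id = event["subscribed_to"]
    if event["op"] == "insert":
        subscribers_by_channel[channel_id].add(event["subscriber_id"])
    else:
        subscribers_by_channel[channel_id].discard(event["subscriber_id"])

def subscriber_count(channel_id: int) -> int:
    # Used later on the upload path to decide whether a new video is worth
    # pushing to the CDN proactively.
    return len(subscribers_by_channel[channel_id])
```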
Now for the second, third, and fourth tables I wanted to talk about. The first is user videos, which just outlines, if I'm a user, what videos I've actually posted. It has a user ID, a video ID, the timestamp when the video was posted, and of course some metadata about the video like the name and the description. For a given user, we want to be able to quickly figure out all of the videos they've posted, and we should do so by keeping all of those entries on one single node. We can do that by partitioning on the user ID and using some combination of the user ID, the video ID, and the timestamp as either a sort key or an index, so I can quickly list all the videos I've posted, ordered by when I posted them. That should hopefully be self-explanatory; again, I'm not too concerned with most of these tables performance-wise, because we care more about actually watching the videos than loading their metadata.

Next we've got a users table. Don't worry about this one too much; keep it simple: a user ID, an email, a password, and whatever other information you have to store. If we partition on the user ID field and index on it, it should be relatively easy to figure out where the row for a given user lives without doing some sort of distributed query.

The last table on this slide is the video comments table. This one has a video ID, and I should preface that by saying in reality it would probably be a channel ID plus a video ID, because that's how I'm eventually planning to uniquely identify a video: a combination of the channel name and the video ID that comes with it. We can partition on that so all the comments for a single video go on the same node, use the timestamp as part of the index so they're sorted by timestamp, and of course store the commenter ID, the content, et cetera.

Another thing worth talking about is our video chunks table, where, as you can see, we now have a bunch of encodings and resolutions to worry about. I've pretty stupidly switched the order of these on the slide: H.264 would be the encoding, 1080p the resolution. The gist is that by partitioning by video ID, we ensure that all the chunks for a given video are on the same node. Additionally, by sorting by video ID, encoding, resolution, and then chunk order, we can say: hey, if I have really good internet and can watch every single chunk at 1080p in H.264, I can just load all of them; they'll be right next to one another on disk, and that's a simple query. So that allows easily getting all the chunks for a given resolution, and even if we can't load every chunk at one resolution because our network speed fluctuates, it's still a pretty easy query to grab the ones from another resolution, so I wouldn't worry about that too much. And then of course we'd have some sort of hash to describe the chunk, and the URL to wherever we're storing it, whether that's S3 or Hadoop or something else.
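To make the chunk-table layout concrete, here's a small sketch of the row shape and the composite key just described. The field names are my own illustrative choices, not a prescribed schema.

```python
# A sketch of a video_chunks row and its key layout. Field names are
# illustrative. Partition key: video_id (all chunks of a video colocate on one
# node); sort key: (encoding, resolution, chunk_index) so one resolution's
# chunks sit contiguously on disk and can be range-scanned in playback order.
from dataclasses import dataclass

@dataclass
class VideoChunkRow:
    video_id: int       # partition key
    encoding: str       # e.g. "h264" -- first component of the sort key
    resolution: str     # e.g. "1080p"
    chunk_index: int    # order of the chunk within the video
    chunk_hash: str     # content hash, handy for idempotent re-writes
    storage_url: str    # e.g. an S3 (or HDFS) location for the chunk bytes

# Example query shape against this layout: "all h264/1080p chunks of video 42,
# in playback order" is a single-partition range scan on the sort key.
```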
So let's talk about the database choices we want to use for the aforementioned tables. The main thing to note is that we're worrying about reads a lot more than writes, and typically when we're optimizing for reads, we want to keep our database schema simple and cache where possible. We can get away with something like MySQL here because it uses a B-tree: if you recall, a B-tree is just a big tree on disk, and it's a lot easier to traverse that on-disk tree than to check multiple different SSTables, which would be the case for an LSM-tree index. So a B-tree index would be perfectly acceptable here, and as a result I tend to opt for something like MySQL for all of the tables I listed.

The one caveat is that some videos may get a lot of comments: if it's a super popular video, the first 10 minutes may bring a ton of comments, and MySQL might not be able to handle that kind of write load. For that situation, I think you could make the argument to use Cassandra, for a couple of reasons. One, the LSM-tree architecture means all writes first go to memory. Two, it's a leaderless architecture, so as long as we partition by that combination of channel ID and video ID, and since the timestamp makes it so that no two comments are exactly the same, we're not really going to have to deal with any write conflicts. Unless, of course, we support comment editing or child comments, because then we can have dependency issues: imagine I write a comment to this replica and then try to edit it, but the edit goes to another replica before the comment itself exists there. Then Cassandra might be problematic and we might have to look for a different solution. But assuming we just allow writing comments, no editing and no replies, then Cassandra should be okay.
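For illustration, here's roughly what the comments table described above could look like in CQL, issued through the Python driver. The keyspace, table, and column names are my own; treat this as a sketch of the partition/clustering layout rather than a definitive schema.

```python
# A sketch of the comments table layout in Cassandra. Keyspace, table, and
# column names are illustrative assumptions. Partition key: (channel_id,
# video_id) so all comments for one video colocate; clustering key:
# created_at DESC so a partition reads back newest-first.
from cassandra.cluster import Cluster  # pip install cassandra-driver

session = Cluster(["127.0.0.1"]).connect("videos")
session.execute("""
    CREATE TABLE IF NOT EXISTS video_comments (
        channel_id   bigint,
        video_id     bigint,
        created_at   timeuuid,
        commenter_id bigint,
        content      text,
        PRIMARY KEY ((channel_id, video_id), created_at)
    ) WITH CLUSTERING ORDER BY (created_at DESC)
""")
```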
Okay, the next thing we're going to talk about is video uploads. These are going to get a little complicated, but the main idea to remember is that we aren't uploading very often: I upload once per week, for example, and I watch many, many more YouTube videos than that. If we're uploading infrequently, the idea is to do as much work as possible on the write path and as little work as possible on the read path, because most of us are just watching videos the majority of the time. The other main idea is that we want to keep uploads relatively cheap, and when I say cheap I literally mean cost-wise, and also correct: if I upload a video, I expect to see it eventually; it doesn't have to be instant, but it has to be up there eventually. So the idea is that for each chunk, we've got to do some sort of post-processing, because we want to store multiple variants: a 480p version, a 720p version, a 1080p version, as well as the different encodings, and all of those require us to store multiple copies. We can maintain a list of servers that do this processing for us, and it doesn't really matter which one actually does the processing, as long as it eventually gets done.

So with that being said, what can we actually do? Well, for starters, we know that processing these events takes time: actually converting a video from one format to another is relatively compute-heavy, and the same goes for encoding it into a variety of formats. So we're not just going to upload a video to a server and say, "hey, do this synchronously and get back to me on whether it worked"; we're going to do this in the background of our application, and typically when that happens, we need some sort of message broker. Assuming we put events on a broker and process them in the background, what type of broker should we actually use?

Well, let's think about it. Remember, I already said we don't really care which of these workers does the processing for a given chunk. In this case, I think what we want is probably an in-memory message queue. If you recall from the concepts videos, the reason for using something like Kafka, a log-based message broker, is that there's only one consumer, and because all messages are replayable, we can build up state on that consumer. The issue with an in-memory broker is that you don't really know the order of the messages will be consistent, especially if they get replayed, and in addition it might not be as durable. You can use things like a write-ahead log and replication of your broker to ensure every event gets played at least once, but the gist is: if you want to be able to rebuild the same exact state after your broker goes down, you'd much rather use Kafka. That being said, we don't actually care about state here, because all our consumer is going to do is read a file, process it, and write it back out, and when you're just reading, processing, and writing a file, we don't care what happens from message to message; everything we need to know is encapsulated within one message, so we don't have to maintain any state. Additionally, we want to get this process done as fast as possible, as long as it doesn't make us sacrifice read speed. So using an in-memory message queue lets us do round-robin style load balancing, where as each of these workers is ready to take an event, it pulls one off the queue (so no one else gets it), and when it's done, it reaches back out saying "done", and only then does the event get deleted from the message broker. For that reason I definitely support using an in-memory message broker for this type of task: something like RabbitMQ, Amazon SQS, or ActiveMQ would work great here.
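Below is a minimal sketch of the stateless worker loop described here: pull a message referencing a chunk, transcode it into the variants, write the outputs, report completion, then acknowledge so the broker deletes the message. The `queue_client`, `download`, `transcode`, `upload`, and `report_done` helpers are hypothetical placeholders rather than any specific broker's API.

```python
# A sketch of a stateless chunk-processing worker fed by an in-memory queue
# (RabbitMQ/SQS-style semantics: a pulled message is invisible to other
# workers; only an explicit ack deletes it, so a crashed worker's message gets
# redelivered). All helper objects/functions here are hypothetical.
import json

TARGET_VARIANTS = [("h264", "1080p"), ("h264", "720p"), ("h264", "480p")]

def run_worker(queue_client, download, transcode, upload, report_done):
    while True:
        message = queue_client.receive()          # blocks until a chunk event arrives
        task = json.loads(message.body)           # {"user_id", "video_id", "chunk_index", "source_url"}
        raw_bytes = download(task["source_url"])
        outputs = []
        for encoding, resolution in TARGET_VARIANTS:
            variant_bytes = transcode(raw_bytes, encoding, resolution)
            url = upload(task, encoding, resolution, variant_bytes)
            outputs.append({"encoding": encoding, "resolution": resolution, "url": url})
        report_done(task, outputs)                # publish "chunk completed" downstream (Kafka)
        queue_client.ack(message)                 # only now is the message deleted
```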
Okay, so now let's talk about the chunks themselves. Keep in mind that we have this message broker right here and a bunch of processing nodes reading from it, and the question is, what do we actually put in the broker? The answer should be: certainly not the chunks. The chunks are going to be decently big; they might not be multiple gigabytes, but they might be a couple hundred megabytes, and as a result they're probably too big to put in the broker. So ideally what we'd first do is upload them to either a distributed file store or an object store, and after doing so, put a reference to them in the broker: the chunk goes here, and then we enqueue something like "chunk at some URL" into our RabbitMQ. Now, it is possible to have partial failure states here, where step one goes through and step two doesn't. The big thing to note is that it doesn't really matter: we're storing a few extra chunks, but we can always run some background cleanup job that goes through S3 and asks, "do we see this in our chunk metadata table?"; if not, get rid of it, and if we do, keep it. I think that's preferable to doing something like a two-phase commit here, which we want to avoid at all costs unless we absolutely need it.
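Here's a small sketch of that upload step: put the raw chunk bytes in S3 first, then enqueue only a reference to them. The bucket name, key scheme, and `enqueue` helper are illustrative assumptions; the S3 call itself is standard boto3.

```python
# A sketch of "upload the chunk to object storage, then enqueue a reference".
# Bucket name, key scheme, and the enqueue() helper are illustrative; the
# broker message carries only metadata, never the chunk bytes themselves.
import json
import boto3

s3 = boto3.client("s3")
RAW_CHUNK_BUCKET = "raw-video-chunks"  # hypothetical bucket

def upload_chunk(user_id: int, video_id: int, chunk_index: int,
                 chunk_bytes: bytes, enqueue) -> None:
    key = f"{user_id}/{video_id}/{chunk_index}"
    s3.put_object(Bucket=RAW_CHUNK_BUCKET, Key=key, Body=chunk_bytes)

    # Only after the bytes are durably in S3 do we enqueue the reference.
    enqueue(json.dumps({
        "user_id": user_id,
        "video_id": video_id,
        "chunk_index": chunk_index,
        "source_url": f"s3://{RAW_CHUNK_BUCKET}/{key}",
    }))
    # If enqueue() fails after the S3 write, we just leak an orphan object;
    # a periodic cleanup job can delete S3 keys absent from the chunk table.
```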
So the question now is: we know we're putting our chunks somewhere, but where? I think we've got a couple of options. The first is HDFS, the Hadoop Distributed File System. HDFS would run on a cluster of nodes, and what we could do, if we really cared about processing speed, is run our consumers, the processing nodes that do the encoding, on the HDFS cluster itself: devote some resources to processing and some to storage. The nice thing about that is really good data locality, because all of these nodes are probably in the same data center, or at least close to one another, so they'd be able to process and write chunks back to HDFS super quickly compared to something like S3, where every single node has to make multiple network calls to load each chunk from S3 and write it back. So HDFS is probably the optimal solution if we care about write speed. However, in practice two big things come up. One, HDFS is just objectively more expensive to run; S3 is going to be cheaper. Two, truthfully, in modern computing, or at least from what I understand about modern data centers, we're a lot more CPU-bound than network-bound, so even though all of those network calls are expensive, what's more important is that the transcoding itself just takes time; relative to the CPU work, the networking probably isn't as significant, and we won't notice as much latency from those calls. Considering S3 is cheaper, and considering we don't really care that much about making these uploads super fast, I'm going to go with the S3 solution; I imagine this is more similar to what these companies do in real life.

Okay, now this is where things start getting kind of complex: aggregation. We've got a bunch of different processor nodes, and they're outputting a bunch of different copies of every single chunk. Considering these are all distributed nodes, how do we figure out when a video has had all of its chunks finish processing? This is the solution I came up with, and it's probably going to be the most controversial part of the video in terms of people trying to correct me in the comments, so please feel free. The first thing: imagine we've got three chunks to upload. Like I mentioned, we're going to literally just put them in object storage, S3 right here, and that needs to actually finish. We can then start enqueuing the metadata for all of those chunks in RabbitMQ; we'd say something like "chunk one is s3.com/abc, chunk two is s3.com/def". In addition, in parallel with uploading those events, we also upload a second event over here to a different Kafka queue. The reason I use Kafka is that we want to maintain state over here in Flink, and I'll explain what that does in a second. The type of message we'd send to this Kafka queue is this guy: imagine we're uploading video 10; I'd say in advance, "hey, video 10 has three chunks, that's what we're going to be looking out for". Then we have our Flink node over here keeping state. The reason I use Flink, as always, is that it makes sure every message actually gets processed at least once, so even with failures in our system, eventually the node comes back up, or another node takes its place, and the message still gets processed. So "video 10 has three chunks" means this guy now knows that video 10 has three chunks coming in, and it can wait on them: I can say video 10 expects three, and imagine chunk C isn't in yet; once I see chunk A and chunk B, we're still not done, and eventually chunk C gets over to Flink and we say, okay, now we're good to go.

How does chunk C get there? Well, all of the processors, once they're done, are going to do two things (let me switch up the order here): one, they upload that chunk information to Kafka, basically saying, "hey, we finished encoding this"; two, they go back to RabbitMQ and say, "hey, get rid of my chunk, I no longer need to process this, it's completely done, and I've alerted Flink that this process is complete". So the message gets dequeued from RabbitMQ, the completion event gets enqueued to Kafka over here, and it flows on to Flink. When this state is completely filled, meaning we have the number of chunks for the video that we expect, we're complete, and we can do all of the downstream things we actually need to do.
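Here's a minimal sketch of that completion-tracking logic, written as plain Python keyed by (user_id, video_id). In the real design this state would live in Flink keyed state with at-least-once delivery; the event shapes and field names are illustrative, taken from this walkthrough.

```python
# A sketch of the per-(user_id, video_id) state Flink keeps. Two event types
# feed it: a video-metadata event announcing the expected number of chunks,
# and chunk-completed events from the processors. Both handlers are
# idempotent, so at-least-once redelivery is harmless.
from dataclasses import dataclass, field

@dataclass
class VideoState:
    expected_chunks: int | None = None                        # set by the metadata event
    completed_chunks: set[int] = field(default_factory=set)   # chunk indices seen so far
    chunk_rows: dict[int, list[dict]] = field(default_factory=dict)

states: dict[tuple[int, int], VideoState] = {}

def on_video_metadata(user_id: int, video_id: int, num_chunks: int) -> None:
    state = states.setdefault((user_id, video_id), VideoState())
    state.expected_chunks = num_chunks        # re-setting the same value is a no-op
    maybe_finish(state)

def on_chunk_completed(user_id: int, video_id: int, chunk_index: int,
                       variants: list[dict]) -> None:
    state = states.setdefault((user_id, video_id), VideoState())
    state.completed_chunks.add(chunk_index)   # a duplicate event changes nothing
    state.chunk_rows[chunk_index] = variants
    maybe_finish(state)

def maybe_finish(state: VideoState) -> None:
    if state.expected_chunks is not None and \
       len(state.completed_chunks) == state.expected_chunks:
        # All chunks processed: write the video chunks table, then user videos,
        # then optionally push to the CDN (the ordering is covered below).
        pass
```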
Now, the nice thing about this is that you may say to yourself, "oh Jordan, there's tons of room for partial failures here", and you're correct: this step could fail, that step could fail, and as a result we might end up replaying some of these events. But all of these operations are idempotent, which means that even if we were to do one again, it would have the same effect as if we had only done it once. So if I'm never able to dequeue an event from RabbitMQ, if that fails, the chunk just gets reprocessed: it gets sent right back through S3, we go back to the Kafka queue saying, "hey, we've just completed this chunk again", and even if that data was already present on the Flink node, the Flink node says, "actually, I already have the data for this video and this chunk; I'm still good to go, I'm not double-processing this video, I understand it's a duplicate, this is totally fine". So all of these operations are idempotent and the whole thing is completely fault tolerant, or at least I should hope.

Okay, I did all that, but it was a bit abstract, so let's actually talk about the data models we might put into each of our queues, which should hopefully make what we're doing a little clearer. We've got our client over here; after uploading to S3, the client is also going to put some data in RabbitMQ. The first idea is that we've got our user ID and video ID (because we're sharding by user ID and video ID), so let's imagine user 10 is uploading video 100: this is the first chunk, here's the S3 link we got after uploading it, and we put that in RabbitMQ. The processor processes it: it asks S3, "hey, get me s3.com/10/100/abc", S3 responds, and then the processor puts into Kafka, headed for Flink, "here's chunk number one of video 100 for user 10", along with a list of all the different chunk copies it made: their encodings, their resolutions, their hashes, and their new S3 URLs. In addition to that, we've got this second data model right here, which is more about the video metadata: for user 10, for video 100, there are three total chunks, the name is "Jordan's feet", and the description is that they're so stinky. Anyway, we put these in Kafka and they eventually flow over to Flink. It doesn't actually matter whether this message gets to Flink before that one does, because at the end of the day both of them are required for us to consider the video completed: we store both pieces of data on Flink regardless, effectively do a join on them via the user ID and video ID fields, and use this num_chunks field right here to determine when we're done.
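For concreteness, here are the two message shapes just described, written out as Python dicts with the example values from the walkthrough. The exact field names and URLs are my own illustrative choices.

```python
# The two event shapes flowing into the pipeline, with the example values from
# the walkthrough. Field names and URLs are illustrative.

# 1. Chunk-completed event (processor -> Kafka -> Flink), one per source chunk:
chunk_completed_event = {
    "user_id": 10,
    "video_id": 100,
    "chunk_index": 1,
    "variants": [
        {"encoding": "h264", "resolution": "1080p",
         "hash": "abc123", "url": "s3://processed-chunks/10/100/1/h264/1080p"},
        {"encoding": "h264", "resolution": "720p",
         "hash": "def456", "url": "s3://processed-chunks/10/100/1/h264/720p"},
    ],
}

# 2. Video-metadata event (client -> Kafka -> Flink), one per upload:
video_metadata_event = {
    "user_id": 10,
    "video_id": 100,
    "num_chunks": 3,
    "name": "Jordan's feet",
    "description": "so stinky",
}

# Flink joins the two streams on (user_id, video_id) and declares the video
# complete once num_chunks distinct chunk_index values have been seen.
```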
Hopefully that clears things up a little, but we've still got plenty more to do as far as video uploading goes. We can now tell when a video has finished aggregating, when all of its chunks have been processed, but now we have to go to all of our databases and actually say, "hey, here's the data you were looking for". So we can now begin to update our derived data (oops, had the eraser on), and to start we're going to put it in two places. The first is the video chunks table: we've got this massive data model now and we want to upload it all at once, atomically, into a single partition, which we can do by partitioning by the user ID and the video ID, so we make sure that for a given video, all of the chunk metadata is on the same node. After those video chunks are uploaded, we can then go ahead and upload to user videos: now we say user 10 has video 100 and the name is "Jordan's feet". If you notice, there is technically a partial failure scenario here, where step one, the write to the chunks table, goes through and step two fails. That being said, because this is Flink, we know we're not going to consider this event fully acknowledged until step two goes through. So even if step one succeeds and step two fails, there's a small period of time where the user videos table is not as up to date as the chunks table, but you can only really find out that a video exists through the user videos table: you'd have a bunch of chunk metadata sitting in the chunks table, but no one would actually be able to watch the video, because they'd have to access it through the user videos table, by going to a user's page and clicking the video. So we basically wait a little bit: this Flink node eventually comes back up, or someone else takes over its state, and it retries the event until step one goes through and step two goes through, and then we're good to go. Of course, this is totally fine as long as all of those writes are idempotent, and they are, because the data for those chunks is the same both before and after the Flink node crashes; it just overwrites itself, no big deal, and the same goes for the user videos table. This is why the order is important: if I wrote to the user videos table first, it might look like I've posted a new video, but then when you click it, you can't load any chunks for it. That's why I've deliberately chosen to upload the video chunks before the user video metadata.

Okay, lots of video uploading details in this one, and we've got plenty more to go. The next part is that for some of our videos, we actually know in advance that they'll be popular. This is pretty cool, because for a lot of social networks or social media it's very hard to say in advance whether a post is going to be popular: if I'm blind to who's posting and every poster is effectively anonymous, when a new post comes in I have no idea whether people are actually going to view it a lot. That being said, with a service like YouTube or Netflix, we have the privilege of knowing when a particular video is going to be popular in advance, precisely because we know how many subscribers the channel has. Take my channel, for example, with my one billion subscribers: we can say, "hey, if Jordan's posting, let's push this thing to a CDN", so that when users first try to load it, those chunks load more quickly. We can do just that, but how? In the past you might have thought, okay, this guy is just going to be partitioned by video ID; we're sending video metadata and video chunks, and intuitively it would make a lot of sense to partition both Kafka queues by video ID and also partition Flink by video ID. However, what I'm now proposing is that rather than do that, we should partition everything by user ID, the reason being that the subscriptions table can be partitioned by user ID. So for my user, I know exactly who is subscribed to me, and as a result I now know: Jordan has 10 subscribers, and here are his new videos; he's got this one right here, video one, with three of three chunks, and he's got this many subscribers. Oh shoot, that's actually going to be a very popular video. So not only are we uploading to the video chunks table and then the user videos table, but lastly we can even upload to the CDN. Again, the ordering is important here.
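Here's a small sketch of the ordered, idempotent sequence the Flink operator performs once a video is complete, including the subscriber-count check used to decide on the proactive CDN push. The threshold and the helper functions (`upsert_*`, `subscriber_count`, `push_to_cdn`) are hypothetical.

```python
# A sketch of the ordered side effects run once all chunks of a video are done.
# The write order matters: chunk metadata first, then user_videos (which makes
# the video discoverable), then optionally the CDN. All writes are idempotent
# upserts, so a crash-and-retry of the whole sequence is safe. The threshold
# and helper functions are hypothetical.
POPULARITY_THRESHOLD = 100_000   # illustrative subscriber count for a CDN push

def finish_video(user_id, video_id, metadata, chunk_rows,
                 upsert_video_chunks, upsert_user_video,
                 subscriber_count, push_to_cdn):
    # 1. Chunk metadata first: harmless if the video isn't discoverable yet.
    upsert_video_chunks(user_id, video_id, chunk_rows)

    # 2. Now the video becomes visible on the channel page.
    upsert_user_video(user_id, video_id,
                      name=metadata["name"], description=metadata["description"])

    # 3. Proactively warm the CDN only for channels we expect to be popular.
    if subscriber_count(user_id) >= POPULARITY_THRESHOLD:
        for chunk in chunk_rows:
            push_to_cdn(chunk["url"])
```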
Because if these first two writes succeed and then the CDN push fails, you can still watch the video; it's just going to be a little slower, and we can always fall back to a pull-based approach into the CDN rather than a push-based approach, and eventually it will get pushed to the CDN anyway because of how Flink retries. Whereas if we published to the CDN first, then all of a sudden we're taking up a bunch of room in our CDN for a video that isn't even watchable yet, which would just be a bad situation. So be careful when doing these kinds of sequential uploads to think a little about what the ordering means.

Okay, let's now talk about actually building a search index. This is also going to be a little complicated; I'd rather cover it more in depth in a separate video, so I'm not going to go too crazy with it here, but hopefully this outlines some of the problems you have to consider when building a large, massively distributed search index. For starters, we want to take our video title and our video description and make sure that if anyone searches for them, the video is easily accessible and that query is quick. The way we do this, if you've watched my concepts series, is via an inverted index. For example, if we've got video 22 over here and this is its title, we'd split out the main tokens, like these guys right here, and then say, for this given word, here's the list of video IDs that actually contain that word. Of course, as you can see, I only took the four non-common words; how you actually break the document up depends on the tokenization process of the search index you use.
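Below is a minimal sketch of building an inverted index with a crude tokenizer and stop-word filter, as described above. Real search engines (Elasticsearch/Lucene analyzers, for instance) are far more sophisticated; the stop-word list here is an illustrative assumption.

```python
# A sketch of an inverted index: token -> set of (user_id, video_id) postings.
# The tokenizer and stop-word list are deliberately crude placeholders for
# whatever analyzer the real search index would apply.
import re
from collections import defaultdict

STOP_WORDS = {"a", "an", "the", "of", "for", "and", "to", "in"}  # illustrative

def tokenize(text: str) -> list[str]:
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]

inverted_index: dict[str, set[tuple[int, int]]] = defaultdict(set)

def index_video(user_id: int, video_id: int, title: str, description: str) -> None:
    for token in tokenize(title) + tokenize(description):
        inverted_index[token].add((user_id, video_id))

def search(term: str) -> set[tuple[int, int]]:
    return inverted_index.get(term.lower(), set())

# index_video(10, 22, "Designing a Scalable Video Streaming Platform", "...")
# search("streaming")  -> {(10, 22)}
```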
Okay, so the next thing we need to think about is partitioning, because partitioning, and therefore the actual speed of our query, depends directly on how well we balance out our data. If we can keep all queries going to one single node, as opposed to having to aggregate over multiple different nodes, it's going to be a lot faster. So ideally, for a given search term, we'd want the user ID and video ID combo on a single node. The question is, is it possible? We've discussed how much storage we have and how many videos get posted, so can we get this done? Say we post a million videos a day over 10 years: that's about 4 billion videos. If the user ID and the video ID are integers, combined they're 8 bytes, so 8 bytes times 4 billion is 32 GB of IDs for a search term, even one that appeared in every single video, which in theory is super possible. However, in reality, keep in mind that if we only store the user IDs and video IDs, we can get those back, but then we'd still have to do a distributed query against our video metadata table to actually serve the search, because even though we have the IDs, we still need the names, the descriptions, and possibly the thumbnails of those videos. So in reality it's probably going to be a lot more than 32 GB per term, and as a result we're probably going to have to do more partitioning.

So we've got a couple of decently solid options. The first is that we don't do any term-specific partitioning at all, and instead just round-robin all the documents and keep a local index per partition. That can be okay, but if we have too many partitions, say 100 different partitions of search indexes, then every search has to aggregate across 100 different partitions, which is probably going to be pretty expensive. Another, perhaps hybrid, option is to store all the documents for a term on one single node where we can, but for very popular terms, take "fortnite" for example, do an inner shard and keep all the fortnite documents on a combination of, say, five nodes, and then aggregate those together.

So, search index partitioning, continued. The idea is that in reality we're going to have to store a bunch of metadata for the video; we have to denormalize our data a little, or else we're looking at a potentially super expensive query. We'd denormalize by also storing the video name and the video description within our search index, and that of course takes up more space, meaning we can't partition exactly the way we originally wanted to. So rather than taking a super popular term and trying to fit it all on one node, what we can do is repartition it over a few different nodes and then make some client-side optimizations that know to check all of those nodes and aggregate the results together. Internally, something like Elasticsearch would ideally do this for us, but the idea is that you'd take a term like "Jordan" and in reality shard it via Jordan$1, Jordan$2, Jordan$3, and that way we can repartition our popular terms.
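Here's a small sketch of that "shard a hot term across sub-keys" idea: route each posting for a popular term to one of N sub-shards (Jordan$1 ... Jordan$N) and fan reads out across all of them. The hot-term list, sub-shard counts, and routing function are hypothetical; a real system would measure term frequency rather than hard-coding it, or rely on the search engine's own routing.

```python
# A sketch of splitting a hot search term across sub-shards (Jordan$1,
# Jordan$2, ...) and fanning queries out over all of them. The HOT_TERMS map
# and the hash-based routing are illustrative assumptions.
import hashlib

HOT_TERMS = {"jordan": 3, "fortnite": 5}   # term -> number of sub-shards

def shard_key_for_write(term: str, doc_id: tuple[int, int]) -> str:
    """Pick one sub-shard for a new posting of a hot term (or the term itself)."""
    n = HOT_TERMS.get(term)
    if n is None:
        return term
    bucket = int(hashlib.md5(repr(doc_id).encode()).hexdigest(), 16) % n
    return f"{term}${bucket + 1}"

def shard_keys_for_read(term: str) -> list[str]:
    """A query for a hot term must fan out to every sub-shard and merge results."""
    n = HOT_TERMS.get(term)
    return [term] if n is None else [f"{term}${i + 1}" for i in range(n)]

# shard_keys_for_read("jordan") -> ["jordan$1", "jordan$2", "jordan$3"]
```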
What this also means is that if we keep our data denormalized, if we store the actual title and description of the video within our search index, then all of a sudden, if we want to be able to edit those, it gets super expensive, because now we have to find everywhere they appear in the search index and change them. The question is, do we actually care? Probably not, because at the end of the day the benefit of denormalizing our data within the search index for reads, for viewing videos, is a lot bigger than the cost of slow uploads or slow edit speeds for our video descriptions and titles. So what would that actually look like when we bring it together? Imagine we have our user videos table, which, to remind you, is a user ID, a video ID, a name, a timestamp, a description (probably ordered timestamp, then name, then description). We upload first to this table, then use change data capture to derive another stream off of it, which we shard by video ID. As a result of sharding by video ID, the Flink instance that reads this data is actually going to have the old video name and old video description on it, and if we know the old name and description, we can go to Elasticsearch and specifically tell it to remove that old video name and description; Elasticsearch would then reach out to the proper partitions, remove them, and also write the new ones. Now, of course, this can be pretty expensive; we might even find ourselves writing to 10 different partitions at a time. That being said, (a) it's doable, and (b) we don't really care, because even if it takes a couple of minutes for that video name change to propagate to other users, it doesn't really matter as long as they'll eventually see it. So this does seem like a feasible solution to me; as always, if you guys in the comment section have a better idea, I'm open to hearing it.

And then, finally, we've got this absolute behemoth of a diagram, so let's talk through it. The first thing I'll start with is the top left: over here we've got our load balancer, which goes to our horizontally scaled user service. The user service is responsible for both the users table and the subscribers table. The users table is sharded on user ID; this is probably not going to be a bottleneck, so I think just using MySQL, partitioning it, and replicating it in a single-leader fashion is going to be fine. Additionally, we've got our subscribers table, which can also be fine with MySQL: we don't subscribe to channels that often, maybe once a day or so, so write speed is probably not the bottleneck there; more importantly, I just care about correctness. As long as we shard this guy on user ID, not who they're subscribed to but who is actually doing the subscribing, we can use change data capture to eventually get this into our Flink node over here, so it eventually flows over there.

Additionally, we've got our comment service: for a given video, we have the ability to comment on it, and because certain videos may have a lot of comments, I think using the increased write throughput that Cassandra gives us could be very good. All we have to do is shard by the channel ID and the video ID and sort on the timestamp of the comment, and we should be good to go. Keep in mind that Cassandra will present problems if we start to edit comments or have child comments, but I'd consider that more of a stretch goal; worst comes to worst, we could always use single-leader replication and just add more partitions, which I think would be pretty feasible as well, though it may be a little more expensive.

Finally, the most important piece of the puzzle is the upload service. The first thing we do when uploading is upload all of our chunks to S3. From there, we do two things: the upload service puts the actual video metadata, such as the number of chunks, the name, and the description, in Kafka, and it puts references to the chunks in RabbitMQ. RabbitMQ lets us efficiently load balance the processing of those chunks across our chunk processing nodes, which read them from S3 all the way over here, write the results right back into S3, and put references to those results into Kafka before they dequeue the original messages from RabbitMQ. Once the chunk processors do that, we've got all of these chunk-completed events in Kafka, and the idea is that the chunk-completed events have access to the user ID that's posting the video and the video ID of the video. As long as we shard on user ID, Flink knows, for a given user, the number of subscribers, and for the video, the number of chunks and which chunks have completed. Once all of those things are done, it can perform all of the external writes it needs to do, and as long as those are idempotent, this should be totally fine.
Of course, the first thing we'd want to do is upload to the video chunks table; remember, order is important here, so that's number one. Number two is uploading to the user videos table; that helps us out quite a bit, because now it says, "hey, this video is actually accessible", and once you click it, we start reading from the video chunks table, which tells the client what to read from S3. Number three: if this user is a popular user, if they have a lot of subscribers, we can also push to the CDN. And last but not least, from our user videos table we can do some change data capture, which basically says: we know all of these users have videos, and we can see that one of them was either just uploaded, or, if I'm a client writing over here to change the name of my video, it still flows through the same exact pipeline. We then go into this video metadata queue, which is of course Kafka so that we can use Flink again, shard on the video ID itself so that we have access to the old name and old description of that video, and then make whatever removes or adds to Elasticsearch we need to make. I have Elasticsearch as one big circle here because I want to keep it abstract; I'm going to go into a lot more depth on it in a future video, but the gist is that Elasticsearch is probably going to have to write to a bunch of different partitions, and that's okay: we don't really need this path to be fast, we just need it to work. The only thing that really needs to be fast is this guy right here, reading from S3 and reading from the video chunks table. In reality, there would probably be some sort of caching layer on the video chunks table, particularly for popular videos, but I've chosen not to go into that too much because we've already seen how caching works in a distributed setting.

Anyway guys, wow, this was a long one, the longest so far by far. Thanks for hanging in there. My voice hurts, so I'm going to go drink some water, and I am now going to go to sleep. With that being said, have yourselves a great week, I hope you enjoyed this one, and I will see you in the next one.