What we'll be doing today is a bunch of this. The items with checkboxes next to them we'll try to do today; whatever doesn't have a checkbox we'll most probably do tomorrow. We'll most probably also have to do ASGs tomorrow — the implementation of ASGs. What is an ASG? An auto scaling group. That will probably happen tomorrow because my AWS account has limits on how many machines I can create, and I've pretty much used it to the peak. I've requested an increased limit, but I don't think it has come through yet, so we'll do that tomorrow, and most probably these two as well. The rest we'll try to cover today.

Where can you find the slides? On the 100xdevs site, in the section called "Horizontal and Vertical Scaling, Indexing in Databases." That's what we'll cover this week: some advanced concepts in scaling, and some more advanced concepts in databases. We'll also cover a little bit of Rust to show how it's better than Node.js, specifically when you want to scale. Rust is good for ten different reasons — it's memory safe, it's faster — but what makes it scalable? Node.js is very inferior in a comparison with languages like Java, Rust, and Golang, because those can spin off multiple threads; they are multi-threaded. Node.js — JavaScript — is single threaded. So we'll do some coding and implementation and try to see whether the other languages really are better.

We'll look at the cluster module, which lets us get somewhat close to the capabilities of Golang or Rust — let's say not very close; there is a real difference between multi-threading the way it's done in Golang and Rust and the way we'll be hacking it today — but one way to hack multiple threads into JavaScript is the cluster module, so we'll spend some time there. We'll understand horizontal scaling, then vertical scaling (these should actually be flipped: vertical scaling should come first), and auto scaling groups, which let you horizontally scale on AWS; most clouds will have some construct for creating autoscaling groups. What does an autoscaling group mean? A bunch of servers governed by a rule you describe: if CPU usage goes above 50%, please increase the number of servers — auto scale up; and if CPU usage goes very low, say 10%, auto scale down.

We'll talk a bit about capacity estimation, which is mostly a buzzword and a common system design question — they'll ask you, okay, estimate the number of videos uploaded to YouTube every day; how do you do that mental math? Load balancers probably not today: when we understand ASGs we'll understand load balancers and the bunch of other constructs used to create an autoscaling group. Indexing we'll cover today; normalization and sharding tomorrow. The interesting bits here, I'd say, are ASGs and normalization; everything else should be fairly straightforward.

Cool, that's enough context for today, let's get right into it. Let's kick things off by understanding vertical scaling. Actually, I want to start with a poll: how many of you already understand this, either from first principles or because you've heard of it? Do you know what horizontal and vertical scaling are? Generally most of us do... no, most of us don't... sorry — 50-50ish.
Okay, so half of us know and half of us don't. It's very obvious once you think about it. What does vertical scaling mean? Maybe even before that: why do you need scaling at all? If your application gets a lot of load, you need to scale up. Look at this — this is what we'll be covering today. If this is your AWS EC2 machine where you're running a Node.js process — say node index.js — it can be used by a lot of people, not just one. Do you think an application like Swiggy is used by only 10 people? No, it's used by millions. So you need to somehow make this process very powerful; you need to say: take as many resources as you want — more CPUs, (theoretically) more GPUs, more memory, more file system — so you can handle all these requests.

When you increase the size of an EC2 machine to do this — when you move from, say, a c5.xlarge to a c5.4xlarge — what does that transition mean? What is a c5.xlarge? Just a machine type in AWS: whenever you try to create an EC2 instance, it asks what size of machine you need, and the bigger the size, the more you pay. So suppose you scale this way — "I'm expecting a lot of load, let me pick not a micro but the biggest machine that exists here," say c5a.metal, which is about $4 per hour. How much is that? About $96 a day, which comes to roughly $2,900 a month — somewhere around 2.4 lakh rupees a month. And if you genuinely expect a lot of load, Swiggy doesn't care: if an application like Swiggy is making millions, why would they care, as long as getting a big machine helps them handle the load?

So this is what vertical scaling means: you're vertically increasing the size of the same machine. If your machine first had one CPU, now it has 20; if it had 8 GB of RAM, now it has 32 GB. You vertically scale the machine and you're hoping your Node.js process will be able to handle more requests.

But is that actually what happens? If you keep adding resources to the same machine, will the Node.js process handle more load? 50% say yes, 50% say no. The answer is no. You could argue "something might happen": if the Node.js process uses a lot of memory, then sure, going from 8 GB to far more memory might help it. But specifically for processing — for running for-loops, for whatever expensive operation is being handled here — where does all of that run? On a CPU, the central processing unit. And what did we keep saying again and again? JavaScript is single threaded. Being single threaded means it can only utilize one of those CPUs. It does not matter how much you scale your machine — 4 vCPUs, 8, 16, 32, 64 — there is no point making the machine beefier and beefier if all you run on it is one Node process. Node cannot take advantage of all those CPUs.
It will still run on a single CPU, irrespective of how big you make your machine. So should you just go with a very small machine then? Yes — unless the process is going to use a lot of memory. The only problem that might come up on a very small machine is that some build process can't happen, because the build effectively needs more than one thread's worth of CPU. What do I mean by build? Converting JSX to JavaScript, for instance. A lot of you compile React projects on a t2.micro — which many of you might be using, since it's the free-tier eligible option, and we've recommended again and again that if you're starting a machine, start there. Running the Node.js process there is fine. It only has one vCPU, but it doesn't matter whether it has one vCPU or a different machine has 32: they will still perform similarly. Not exactly the same, but very similar. Performance certainly won't grow linearly: if there is one CPU and you rent a machine 32 times bigger, you are paying 32 times more money, but you don't get 32 times the performance — you might get 1.1 times, something extra squeezed out here and there on a bigger machine, but nothing linear. The downside of the small machine is that a build might fail: the machine can run out of CPU or memory and say, sorry, I cannot build your React project. But if it's a single Node.js process, just throw it on a micro machine; it'll work.

So the question is: is there any point in vertically scaling a Node.js process? Vertical scaling means increasing the size of your machine to support more load — not necessarily a bigger hard drive; it means increasing the memory and the number of CPUs. For example, here's a Node.js file on my Mac which I run with node index.js. It doesn't matter how many vCPUs I have — my current Mac has 10 — my Node.js process will run on a single one. No matter how much load it gets, no matter what optimizations I do, it can only be handled by a single processor, a single CPU, because of its single-threaded nature. There is a single thread going into any function, any Express handler; there's no way to parallelize it. You can't really split it — "you do this part on thread one, you do that part on thread two." You cannot parallelize, which means vertical scaling is not the most effective way to scale Node.js. You can try, but you'll see diminishing returns, and after a point no effect at all. So if you have a Node.js process, there's no point renting a very beefy machine: rent a small one and you'll get the same performance.

This is in stark contrast to multi-threaded languages — languages that do let you use all the cores of your machine. What are good examples? Java, Golang, Rust. If you've done Java, you might have created a thread pool; if you've done Rust, you might have spawned threads; if you've done Go, you might have heard of goroutines, which let you create small routines that can run on different threads. What do all three let you do? They let you parallelize: "you run here, you do this task, you do that task," and so on. And why is that useful?
Because if you have a beefy machine, why would you keep so many cores idle? Why would you tell Rust, "you only run over here; I'll only give you one-tenth of the power that exists on this machine"? You don't want that. And hence you can — in fact, you have to — write code in a particular way. It's not as easy as "I will write a for-loop and it will get parallelized across all these cores." If I give you a problem statement that says find the sum of all the numbers from 1 to some big number, say 10^9, and you write a for-loop — for let i = 0; i < veryBigNumber; i++, answer = answer + i, and then console.log(answer) — then whether you write this in JavaScript or similar code in Rust, Golang, or Java, all of them will compute it on a single thread only. Writing multi-threaded code isn't "I write the loop and the CPU parallelizes it itself." If you want to write the optimal version of this in Java, Golang, or Rust, you have to write it something like this: I need to find the sum from 1 to 10^9; I will split it into a bunch of 10 chunks — 1 to 10^8, then 10^8 to 2×10^8, then 2×10^8 to 3×10^8, and so on — each chunk being 10% of the sum I need to find. Then you write code that looks roughly like: for each slice in slices (say we store those ranges in a bunch of slices), do a thread::spawn of a function that runs a for-loop to find the sum for that slice, and eventually aggregate the results from all 10 cores — return answer1 + answer2 + and so on.

So it is somewhat difficult to write multi-threaded code. You can't just write a for-loop in Golang and assume it'll run on multiple threads and be extremely fast. As a developer it's your responsibility to write this code, to spawn the threads — to tell the compiler of Golang or Rust: please break this job down into 10 parts, do them individually on 10 different threads, and eventually accumulate them into an answer variable.

How would you do a daily task? Say you have to do three things today — get medicine, get food, go to the beach. You can either do all three yourself, sequentially: one, then two, then three. Or you can parallelize: I will do one thing, you do another, someone else does the third, and when the results are there, I just join them and all three of my tasks are done in parallel. That's what the code on the right lets you do in a language like Rust, Java, or Golang.

So, to summarize the point: there is little point in vertically scaling a Node.js process, and even for those other languages it only works if you write code like this. I can't write a plain for-loop and assume it'll run on multiple cores; I have to tell the language, "hey, please spawn a new thread that does this, and another that does that" — the first thread finds the sum from 0 to 10^8, the second from 10^8 to 2×10^8 — and they run on separate threads, on separate cores, computing in parallel, and you accumulate the results at the end.
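The lecture sketches this chunk-and-spawn pattern verbally; here is a minimal sketch of the same idea in Node.js using the built-in worker_threads module (the range, chunk count, and file name are illustrative, not from the lecture):

```js
// parallel-sum.js — chunk-and-spawn using Node's built-in worker_threads
const {
  Worker, isMainThread, parentPort, workerData,
} = require("worker_threads");

if (isMainThread) {
  const N = 100_000_000; // sum 1..10^8
  const CHUNKS = 10;     // one worker (one thread) per slice
  const size = N / CHUNKS;

  const jobs = [];
  for (let i = 0; i < CHUNKS; i++) {
    jobs.push(new Promise((resolve, reject) => {
      // each worker re-executes this same file with its own range
      const w = new Worker(__filename, {
        workerData: { start: i * size + 1, end: (i + 1) * size },
      });
      w.on("message", resolve);
      w.on("error", reject);
    }));
  }

  Promise.all(jobs).then((partials) => {
    // accumulate the ten partial sums into the final answer
    console.log(partials.reduce((a, b) => a + b, 0));
  });
} else {
  // worker: sum this slice and report back to the main thread
  let sum = 0;
  for (let i = workerData.start; i <= workerData.end; i++) sum += i;
  parentPort.postMessage(sum);
}
```

The same shape — split the range, spawn workers, join the partial results — is what the Rust, Golang, or Java versions described above do with threads.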
Can we see this in action? Can I show you that what I'm saying actually happens? Sure. Let's create an infinite loop and try to run it locally on whatever machine you have. You might have to install htop if you want to see stats like the ones I see on my machine; you can also run top and see something similar. Either way you need some way to monitor — if you're on a Windows machine, which I've seen a lot of you are, you can open the system monitor and look at CPU usage.

This time, rather than a for-loop that finds the sum up to a very big number, let's create an infinite loop that keeps on running forever and ever. Copy this code — very simple: let c = 0 to initialize a variable, and then an infinite while-loop that keeps doing the same thing again and again and never stops. This should keep the CPU very busy. Ideally you might hope that, on a 10-core machine, this heavy task of incrementing c would be parallelized across the various threads or cores — but does that happen? Let's see. I'll copy the code, open a random folder, and — since I don't need any dependencies — just do vi index.js and paste it in. You can open the file in Visual Studio Code, Notepad, wherever. I'll wait 10-15 seconds in case you want to try it locally as well. What have I done? Just created an index.js file with the contents from the website: define a variable, then increment it in an infinite loop. (I used Vim; feel free to use VS Code.)

Now run the file: node index.js. What is it doing? Infinitely running a loop, trying to compute something — a very useless something, of course, but computing nonetheless. How many cores of my machine do you think are busy? If all of my cores were doing this, the Zoom call would have stopped, or at least stuttered a bit — if I open GTA on a very old laptop, the laptop begins to struggle; I can't run Zoom on it at the same time. So if this were actually multi-threaded, you would have begun to see interruptions in the Zoom call. Since that isn't happening, it means Zoom is running on a separate thread — or rather, a separate core is able to handle all the Zoom load, another the browser, and just one of them is very busy right now, running at 100% capacity trying to compute this sum toward infinity, while the rest handle the terminal, the Zoom call, and so on.
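For reference, the file being run is just this (copied in spirit from the slide):

```js
// index.js — pegs exactly one core at 100%; Node will never spread
// this loop across cores on its own
let c = 0;
while (true) {
  c++;
}
```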
A great question to ask at this point: you're on a 10-core machine — does that mean you can only have 10 applications running, one per core? Not exactly. If you've gone through a class on operating systems, you know you can have many processes; they interleave with each other. Of course, if I start multiple applications that each run an infinite loop, none of them will ever be idle — they'll still keep interleaving, but the processes will become extremely slow. Contrast that with a browser, a terminal, and a Zoom application: the CPU doesn't need to be working all the time. For example, right now I'm sending you video at 30 FPS, which means one frame roughly every 33 milliseconds. I don't need 100% of any CPU's compute for that — it's more like, "send this frame to the other side, then you're free for the next 30 milliseconds, then send another frame, then free again." Those processes can remain idle from time to time, and they interleave: the same CPU might first be running my browser code, then the terminal, then Zoom, and so on. But one of them is very busy right now.

How can I confirm that? I can open another terminal and run htop, which lets me see the stats on my machine. What's at the very top — the most computationally expensive process running right now? node index.js. The second most expensive? OBS, which is recording this video. The third? The Zoom app, which is streaming it. Everything else sits at 8%, 2%, and so on — but that one process at 99.3%, completely using the power of one of the cores, is the infinitely running loop. Which makes sense: a CPU is not an employee with labor laws; it can run at 100% capacity forever, so an infinite loop will keep it pinned forever. Why are OBS and Zoom lower? Because, as I said, Zoom doesn't need to send everything at every instant; it becomes free from time to time, so it doesn't need 100% of any CPU. What if I start 100 such applications? Exactly what you'd expect: they begin to choke, because even though each loop wants to keep running, the OS won't have the time to pick every one of them up often enough, and everything slows down — which is what happens when you open GTA on a very old laptop.

Hopefully it's clear: one CPU is being used by this process. Now the objection you might raise: "so Node.js is an extremely powerless language, you should never use it — it only runs on a single thread or a single core, and on such a big Mac you can't use the machine's full power." Sort of true, but you can do a bunch of things. You can say: I have something computationally expensive to do, let me start multiple Node.js processes — three of them, so I can use three CPUs. Of course, it becomes very hard to share data between them; they can't really talk to each other, which is something you can do within a language like Golang or Rust. But you can say: that's fine, at least the infinite loop is running in three places, I have three workers doing work — "I was able to parallelize Node.js, my Node.js runs on multiple threads, multiple vCPUs, my Node.js is multi-threaded." Can you claim that? Technically, sure:
on a big machine you can run multiple Node.js processes in multiple terminals, and if you look at htop now, the top three processes are all node index.js, each at 100%. So can you make Node.js use all the power of your machine? Yes. Then why do you need Golang or Rust? As I said: number one, this is very ugly — opening three terminals and doing it by hand. It would be nice if a single command could parallelize. Number two, it's very hard to share data between these processes: if this one wants to send some data to that one, it's painful. That becomes much easier in a language like C++, and much safer in a language like Rust — which is one of the reasons Rust was introduced. Rust is designed to be extremely memory safe in exactly these situations: if you spawn two threads, they should be allowed to share data amongst each other without dangling pointers and things like that.

So we've found a slightly weird way to make Node.js sort of work on a beefy machine and extract all of its power. Now let's see how the same thing is done in Rust. If you had to do this in Rust, would you run three different Rust processes, or can you do something better? If you don't understand this code, that's fine — the syntax is a little different; we're using something called a thread, and you can think of the whole thing as pseudocode. I've defined a main function, and I'm iterating from 0 to 2, so three iterations. Rather than running the infinite loop directly, I first spawn a thread, and inside that thread I create a counter and run the loop. Then the main function itself also runs an infinite loop. So when main starts — when the Rust program runs — control reaches this point, three threads get spawned (0..3 means it runs three times, for _ = 0, 1, 2), each of them overworking on its own infinite loop, and then the main function runs its own infinite loop. So we effectively have four threads running: three of them trying to compute a sum, plus the main one spinning away. Theoretically, this is exactly the thing I said Golang, Java, and Rust provide: they let you spawn threads — tell another worker "please do this job," or tell another vCPU "this piece can run on a different CPU, and this piece on yet another." That's exactly what we're doing here. Let me try to run it — and it's fine if you're not able to, if you don't have the Rust compiler locally right now. I'm just pasting this into a file called main.rs.
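The snippet being walked through is roughly the following — a reconstruction from the description above, so treat the details as approximate:

```rust
use std::thread;

fn main() {
    // spawn three threads; each runs its own infinite counting loop
    for _ in 0..3 {
        thread::spawn(|| {
            let mut counter: u64 = 0;
            loop {
                counter += 1;
            }
        });
    }

    // the main thread spins too, so roughly four cores end up busy
    let mut counter: u64 = 0;
    loop {
        counter += 1;
    }
}
```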
Will this work without Cargo? It won't, so let me initialize an empty Rust project: mkdir week-22. If you have Cargo locally — the package manager of Rust, very similar to npm — then just as you initialize an empty package.json with npm init -y, you bootstrap a Rust application with cargo init. It creates the equivalent of package.json, which is Cargo.toml, and the equivalent of index.js, which is main.rs. When I run cargo init, it initializes a Rust project for me: looking at the files, there's a Cargo.toml, similar to package.json, and a src folder containing main.rs, whose contents I can simply replace with the code we have. So, two steps: cargo init inside the folder where I want the project, then replace the contents of main.rs.

Now I can run this Rust project locally by typing cargo run. It gives a bunch of warnings — that's fine — and then stays stuck here. But notice what htop says: there is a process using 385%. What does 385% mean? It means it's using nearly four vCPUs. And which process is it? This cargo run process that's executing our Rust program. What does that tell you? I did not have to open four terminals: in a single terminal, from a single Rust file, I said "run an infinite loop, and also spawn three threads that do the same thing." Of course, in the real world you're not running infinite loops. You'd be handling an HTTP request coming from a user, or — the use case I mentioned — finding the sum from 1 to 100 billion: first thread, please find the sum from 1 to 100 million; second thread, from 100 million to 200 million; you get all of those values, finally add them up, and return the result. You'd parallelize it; that's the real-world use case. But hopefully the point is clear: a language like Rust — or Golang, where you can create goroutines, or Java, where a thread pool lets you create threads — can have a single program using more than one CPU of your machine. That's great for a backend application, because you want to scale backends vertically as far as you can; when vertical doesn't work you scale horizontally as well, but for the first 100,000 users, scaling vertically is great. And it's much harder to vertically scale Node.js. Can you do it, though? You can. (This slide should say vertical, by the way: implementing vertical, not horizontal, scaling in Node.js.) So suppose I say: you have to implement vertical scaling in a Node.js process — somehow figure it out.

First, a quick poll: do we understand everything I've taught until now? 82% — I think two or three recap questions will be fine, a small 3-4 minute break. Okay, is that question answered? "Even though JavaScript is single threaded, how does it run asynchronous operations?" — that one is a great question for checking someone's fundamentals, whether they fundamentally understand JavaScript or not. Cool, shall we proceed to the cluster module? Are we good to go, guys, or do we need more recapping? I'm
assuming so. All right, not taking a break — we can take one later on; 87% is great. Let's proceed: implementing vertical scaling in a Node.js project. Say you do want to implement it: you have a very beefy machine, you want your million users to go to this single big machine, and you want to use Node.js only. What can you do? You can start multiple Node processes: "I will just start node index.js 10 times, my machine runs 10 different Node.js processes, and hence I've used all the power of the machine." But this has a lot of problems. First, it's a very ugly approach (a slightly better one, as someone suggested, is spawning work onto worker threads). Second, the processes will have port conflicts — that's the biggest issue here. If you start multiple processes together and this is an Express server running on port 3000, you cannot start another Express server running on port 3000; there'll be a port conflict.

If you want to see it quickly, let's try. Which of these folders is an Express project... none handy, so let me quickly create one, even though this part is pretty straightforward: mkdir express-project, cd express-project, and in index.js: import express from "express", const app = express(), and app.listen(3000) — of course there'd normally be a bunch of handlers in between, but assume there are none. Run node index.js once and it tells you that you don't have Express — you also don't have a package.json — so: npm init -y, npm install express. I'm not using TypeScript, simply JavaScript. Let me open it in Visual Studio Code so it's easier to look at and understand what's happening. Running node index.js still gives an error — you cannot use import like this in a plain JavaScript project — so switch to require. There you go: the Express project runs. Run it once and it's all good, a single server on port 3000; if I go to Google Chrome and open localhost:3000, it says message: hello world. But if I try to run the same thing again — I repeat, I open another terminal and run node index.js — a port conflict happens: "address already in use." You cannot start another process listening on the same port. So the naive approach we were talking about — starting multiple node index.js processes — does not work.
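The toy server from this demo is essentially the following (a minimal sketch):

```js
// index.js — minimal Express server; run `node index.js` in two terminals
// and the second process dies with EADDRINUSE (address already in use)
const express = require("express");

const app = express();

app.get("/", (req, res) => {
  res.json({ message: "hello world" });
});

app.listen(3000);
```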
So what can we do? We can use the cluster module. This may have changed recently, by the way — someone mentioned worker threads, so maybe there's a better way to do this now — but historically the cluster module is what I've used, and its API has itself changed over time (cluster.isMaster was deprecated in favor of cluster.isPrimary). So feel free to Google a bit after class whether there's a better way to do this today; still, this was conventionally the way to scale Node.js applications for the longest time, and probably still is.

What does the code say? import express from "express" — same as we were doing until now. Then import cluster from "cluster": this cluster module is something Node.js provides, a Node internal library, so you don't have to do npm install cluster; it comes built in with Node.js, very similar to fs or http. Then import os from "os" — we use this library only to find the total CPUs that exist on the machine: const totalCPUs = os.cpus().length. On my machine this will be 10, because I have 10 vCPUs. const port = 3000. Nothing too complicated so far.

Now comes the mildly complicated bit. If you've done forking in C++ in the past, this will be straightforward; if you haven't, it might seem a little overwhelming, but bear with me. What we're doing, eventually, is starting multiple processes somehow — which means this file will run multiple times, maybe 10 or 20 times, so the same code is going to run for all of these workers. What this if-check does is ask: is this the process the user started? I will start one process by running node index.js, and that process needs to start a bunch of other processes — it spawns 10 workers that handle all the incoming requests. So the cluster module says: if this is the primary Node.js execution — the index.js process started by the user — then run this branch. And what does this branch do? for (let i = 0; i < totalCPUs; i++) cluster.fork(). What does cluster.fork() do?
It starts another process that runs this same file — so that process also starts from the top and executes all of this — but for that process, cluster.isPrimary is false, so control reaches the else branch, where you write all of your Express logic to do whatever you're supposed to do. And if you are using the cluster module, there will be no port conflicts: even though it might feel like we're still starting 10 processes and there ought to be a conflict, there isn't one when you fork the same process via cluster rather than starting node index.js again and again in separate terminals.

The part that feels overwhelming here is this cluster.isPrimary path, so once more: you run node index.js once in your terminal, and for that run, cluster.isPrimary is true, because it's the primary execution of this file — the user-started process. It ends up forking 10 times, which effectively starts node index.js another 10 times, but for all 10 of those, cluster.isPrimary is false. Their code also starts executing from the top, but control reaches the else branch, where you define the Express logic that actually handles requests.

And what is that Express logic? const app = express(). It also prints "worker <process.pid> started" — which tells you something: every process has its own separate PID, so these are not lightweight threads you're starting; you're starting genuinely separate processes. That's starkly different from how you do it in Golang, where a request comes in and you spawn a goroutine to handle it. Here we just start 10 processes — it's simply cleaner to write than 10 terminals, and there's no port conflict: a single node index.js starts 10 processes, all listening on the same port 3000. There's a simple hello-world route, and a slightly better one: app.get("/api/:n") — if you want the sum from 1 to some number n, this request handler does it. That's exactly what the code shows: it gets n from the parameters, starts with let count = 0, clamps n to a smaller number if it's greater than some very big number, then iterates from zero up to n incrementing count, and finally returns that to the user along with process.pid. We send back process.pid because we eventually want to debug whether multiple requests are being handled by different processes.
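Putting the pieces together, the file being described looks roughly like this (a reconstruction; the clamp value and log text are approximations):

```ts
// index.ts — reconstruction of the cluster example walked through above
import express from "express";
import cluster from "cluster"; // built into Node, no npm install needed
import os from "os";

const totalCPUs = os.cpus().length; // 10 on the demo machine
const port = 3000;

if (cluster.isPrimary) {
  // the user-started process: fork one worker per CPU
  for (let i = 0; i < totalCPUs; i++) {
    cluster.fork();
  }

  // if a worker dies for any reason, fork a replacement so we
  // keep using all the cores
  cluster.on("exit", (worker) => {
    console.log(`worker ${worker.process.pid} died, forking a new one`);
    cluster.fork();
  });
} else {
  // every forked worker lands here and runs its own Express server;
  // the cluster module shares port 3000 between them
  const app = express();
  console.log(`worker ${process.pid} started`);

  app.get("/", (req, res) => {
    res.json({ message: "hello world" });
  });

  // CPU-heavy route: sum 1..n, echoing the pid so we can see which
  // worker handled the request
  app.get("/api/:n", (req, res) => {
    let n = parseInt(req.params.n, 10);
    if (n > 5_000_000_000) n = 5_000_000_000; // clamp huge inputs
    let count = 0;
    for (let i = 0; i <= n; i++) count += i;
    res.send(`final count is ${count} (pid ${process.pid})`);
  });

  app.listen(port);
}
```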
What is process.pid? Whenever you start a process on your machine, it gets an associated process ID — a PID. Whether you run node index.js, do a go run on some Golang file, or even my Google Chrome browser here: every process you start has a PID associated with it. And that log should confirm that we are able to start multiple processes. Let's run the code.

Pasting the code here again to quickly go over it top to bottom: a bunch of imports (you can also hard-code the CPU count if using os feels overwhelming — if you have five CPUs on your machine, hard-code 5). If cluster.isPrimary, we print a bunch of logs and then fork the process once per CPU. We also say: if one of these workers exits, please fork again — so if a worker does end up crashing for whatever reason, we restart it and are still using all 10 CPUs on our machine. Alternatively, when a worker dies, you can exit the parent process, which takes every other child process down with it — a lot of the time that is the approach: if one worker dies, kill everything, and pm2 restarts the application; that's how the restart happens. Or you just do a cluster.fork() to restart it. And again: cluster.fork() runs this file in a new process, but this time cluster.isPrimary is false, control reaches the else branch, and you start an Express server listening on port 3000.

Let's run node index.js — we can't yet, because of those import statements, so let's convert this into a TypeScript project (feel free to try it in your own time, or right now). npx tsc --init is all I ran, to initialize a tsconfig.json; then I replaced index.js with index.ts, since this is TypeScript code. Then npm install express and npm install @types/express — notice I did not have to do an npm install cluster. How do I compile? tsc -b. It creates an index.js file in the root folder, because I did not update tsconfig with a rootDir and an outDir, which is something we usually do — but today there's just a single index.ts, so it doesn't matter. So: index.ts is the TypeScript file, tsc -b converts it into a JavaScript file, and node index.js runs it. It keeps erroring at first because I have something already running on port 3000, so let me stop that and restart — and notice what it says: "worker <pid> started," "worker <pid> started," over and over. There are multiple processes running with different PIDs — genuinely different processes, not lightweight threads — all handling requests on port 3000. If I go to localhost:3000/api/10, it says final count is 55, along with a process.pid
of 91296. Now, there is one small problem. What would you expect when I refresh this page? That the PID keeps changing. But notice the PID remains the same. It did change initially, but as I keep refreshing it stays the same for a while, and only if I refresh again after a gap does it change. So even though we have multiple processes, the same one is handling all these requests. Does anyone know why? Because my honest answer is: I don't know. Statefulness is what I thought as well... maybe caching... "it's just single threaded" — no, that's the wrong answer. "Do worker threads run on the far-left core of the CPU?" — I don't understand that question. "The child process copies the parent one" — yes, that part is exactly right. I guess no one knows the answer. Somehow, when the request goes out from the browser, it carries some sort of fingerprint, and the backend keeps routing it to the same worker. This does not happen across browsers: if I start a Brave browser, its requests are handled by 91288, while here it's 91292; and if I send the same request via curl, you'll see 91291. So it is multi-process — before the class, when I was testing this, I kept refreshing and thought "why is this not changing, is this not working at all?" — then I realized something is up with how the backend handles requests from the same device and pipes them to the same worker. If you wait for about 10 seconds, the PID changes; if you keep refreshing with no gap, the request keeps going to the same PID. I don't know the exact reason — probably something on the backend; try to figure out the stickiness yourselves. (A plausible explanation, for what it's worth: browsers keep the underlying TCP connection alive between requests, and a kept-alive connection stays with the worker that accepted it, so new PIDs only show up on fresh connections — curl's, another browser's, or one opened after the keep-alive times out.) Anyway, that's pretty much it: that's how you do vertical scaling in Node.js.

Next up, capacity estimation. Actually — should we take a break, guys, yes or no? I can answer some questions during that time. I don't think we'll be able to reach indexing today, because we still have some backend stuff to do. That's a yes, so we'll take a break and restart in 9 minutes, at 7:50 India time. (One question answered meanwhile: tasks that have to wait — but can wait asynchronously — are what you should use Node.js for, because while they're being awaited the Node.js thread can do other things.)

Okay, cool, guys, let's get right into it — slide number four, capacity estimation. This is one of those slightly odd things we just have to check off our list. I have a video with Keerti Purswani where we designed YouTube, and the first thing she asked me was: how much video do you think is uploaded to YouTube every day? These are the questions that get asked in these system design interviews; that's what capacity estimation means. There are some practical learnings from this, but a lot of it sounds pretty useless to me as an
interview question — I don't know how much video gets uploaded to YouTube every day; I don't follow those stats. But a lot of the time you're expected to arrive at a number close to the expected load of a system like this. So you'll face these questions from time to time: if you're building a YouTube, how many users come every day? How many uploads happen every day? How many uploads happen every second? And so on. Practically, they become useful when you want to understand how to scale your own system. We'll go through a few architectures here: a payment application, a chess application, and — remind me if I forget — a third one, a video transcoder application.

A common system design interview framing is: how would you scale your servers? In system design they will never ask "your system has 10 users, how would you build it?" They'll ask: your system has a million users — how would you scale it? How would you handle spikes? (What are spikes? You had only 10 users, suddenly you have a million, then 10 again.) How would you scale your infrastructure up, and how would you scale it down? How can you support a certain SLA given some traffic?

What is an SLA? A service level agreement. If you ever use AWS, say, or Razorpay, they give you an SLA: "we will be up 99.9% of the time." And if they drop below that, you don't have to pay them, or you pay 20% less — you get a discount in case they don't follow through on what they promised you. (A note on the arithmetic: 99.9% uptime actually allows roughly 8.8 hours of downtime in a year; "about 52 minutes" would be a 99.99% SLA.) So the question here is: if you make a promise like that, how can you make sure you always meet it and never blow through your downtime budget?

The answers usually require, number one, paper math. If the SLA is 99.9% and the maximum expected spike is a million users, then I should always have some servers running, or I should check every few minutes whether to scale up — so that when a spike happens, say during cricket matches, which happen a couple of times a year, my infrastructure waits at most a few minutes before scaling up by a lot. The reasoning goes: IPL happens twice every year; my SLA leaves me a limited downtime budget; so whenever this spike happens, I have to recover within, say, 15 minutes at most, or I've failed to follow through on my SLA. Which means whatever logic of mine auto-scales servers and brings up infrastructure needs to keep checking, probably every 5 minutes: is there a spike? Do I need to auto scale? And when a spike like IPL happens — 20 crore people in India watching the same thing — the expected bandwidth cannot be handled by a single cloud provider; AWS alone does not have enough wires in India for that kind of bandwidth. So you have to keep multiple vendors available, and the question becomes: when the spike happens, how can I bring those vendors up within my 15 minutes? These are the things you talk about when you do capacity estimation.
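To make the SLA arithmetic concrete (illustrative, not from the lecture):

```js
// downtime budget implied by an uptime SLA
const slaUptime = 0.999; // "three nines"
const minutesPerYear = 365 * 24 * 60; // 525,600
const budgetMinutes = (1 - slaUptime) * minutesPerYear;
console.log(budgetMinutes); // ≈ 526 minutes ≈ 8.8 hours per year
```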
Next: estimating requests per second. There's an obsession in these interviews with per-second numbers — requests per second, bandwidth per second. They'll ask how many requests come in a day. You'll say: oh, a million users — a million daily active users. They'll ask how many requests each user sends; you'll say maybe 10. So: 10 million requests a day. What does that translate to per second? 10 million divided by the number of seconds in a day, which is 3600 × 24 = 86,400 — roughly 116 requests per second.

Once you arrive at that requests-per-second number — say it comes out to 100 requests a second — you then ask: how much can my infrastructure support? If I have a single Node.js machine in AWS, how many requests can it reasonably handle in one second? Say the answer is 10 (it isn't 10 for Node.js, but say it is). That tells you: I need to have 10 machines available, because I need to handle 100 requests a second. That's how you arrive at the first number. With more realistic figures: say a Node.js process can handle a thousand requests a second, and the number of requests you're receiving is 10,000 a second — then you need 10 servers available that can handle all of this load.

Then they'll ask: what happens if there is a spike — how do you scale up and scale down? Assuming you have monitoring that tells you how many requests are coming — something that can spit out "the current number of requests per second over the last 10 minutes was X" — you can use that number to scale your infrastructure up or down: auto-scaling machines based on the load that is estimated from time to time. Periodically you check the current load across your 10 machines; if the load becomes a lot, you scale up; if it becomes really low, you scale down.

The final architecture looks something like this. Say you have four machines in AWS right now, all four receiving requests from users. Every few minutes each one aggregates: "in the last 10 minutes I received 100,000 requests," "a million," "a million," "a million." These counts somehow reach a central system — and there are multiple ways to get them there; you don't have to do it synchronously. You don't send an HTTP request to the aggregator for every single incoming request — no, you batch them: "whenever a thousand requests have come in, I'll let the aggregation service know a thousand came." Even if one report gets missed, it's fine. So the servers aggregate how many requests came in the last 10 minutes, and they spit that number out to some database or some process.
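Putting that mental math in one place (numbers from the lecture's example):

```js
// back-of-the-envelope capacity estimation
const requestsPerDay = 10_000_000;  // 1M daily users × 10 requests each
const secondsPerDay = 3600 * 24;    // 86,400
console.log(requestsPerDay / secondsPerDay); // ≈ 116 requests/second

// with the lecture's "realistic" figures:
const peakRps = 10_000;     // traffic you must handle per second
const perServerRps = 1_000; // what one Node.js process can take
console.log(Math.ceil(peakRps / perServerRps)); // => 10 servers needed
```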
That process notices "oh, the number went up" and tells the auto scaling group: you need to scale up, bring up 10 more servers, the load in the last 10 minutes has increased by a lot. Similarly, if the load decreases by a lot and the final number becomes 10 requests a second, the same process scales things back down. What is this process? Depending on how crudely you implement it, it can be different things; for all practical purposes it can be a plain Node.js process that says, "I will receive the current request count every 10 minutes, and I will scale my infrastructure up or down based on that."

Do we understand the diagram? We'll reach auto scaling groups tomorrow — maybe we'll actually implement this as well — but do we at least theoretically understand: multiple machines sending data to a single service, which aggregates the total number of requests per second and sends it to a process that scales up or down? Only about 80% understand, so there's some obvious question I've missed — can you post it in the Q&A really quickly? "How many requests can Node.js handle?" It depends on how heavy the request is: if it's simply returning "hello" to the user, quite a lot — maybe 20,000 requests a second; if the handler is doing some for-loop from one to a million, of course it can handle fewer. "No port conflict means one port is handling all 10 processes?" Yes — that is exactly what the cluster module provides you. Someone also posted an answer for the cluster PID question. "Recap, please, come once again?" Okay, let me explain one more time.

You are an engineer — forget that this is a system design interview. You actually built a simple website to begin with; eventually we'll get into complicated use cases like a real-time game and a video transcoder, but for now you created an e-commerce website. Right now you have a single server — the backend of your e-commerce website, running on an EC2 machine — and it's receiving requests from end users. Suddenly 10 million users come: some sale happens, everyone's visiting your website, and you have no system to scale up or scale down. Then what happens? Your server begins to get choked. Your users will not get responses: they'll hit timeouts, get bad status codes, or wait a really long time for a response to come back. That's bad — you should scale up your infrastructure whenever this happens. The question is: how do you know how many requests are coming, and how can you scale up or down based on that? One crude way to do this — actually not crude, it's pretty good for an application like this — is to create an auto scaling group (use that as a buzzword for now; tomorrow we'll understand it better). But from first principles, what does an auto scaling group do?
You give it a number — start 10 machines — and it starts 10 machines for you. You give it another number — now scale down to three — and it scales down to three, terminating the seven machines. So the most crude way of doing this is: create an auto scaling group and tell it, "your average CPU usage should be 50%." That means you're telling the auto scaling group: if CPU usage goes above 50% — if it becomes 80% — please scale up the infrastructure, increase the number of machines that currently exist, so that this number goes back down toward 50%. And the other thing you're telling it: if usage ever drops to 10% — all the users leave, it's night time in India right now, nobody is visiting your four servers — average CPU utilization goes down, and the auto scaling group slowly deletes machines so that the average climbs back up toward the target. Of course you always need at least one machine — it can't reason "30 is less than 50, so I'll delete the last machine as well"; the auto scaling group will have a configured minimum. But you get the idea: that's the crudest form of auto scaling.

When you're asked this question in a system design interview, though, they'll give you weirder use cases. For example, the metric might be requests per second: "this single process can handle 10 requests a second — how will you build a system that scales up and down based on the number of requests?" Your answer — and your implementation — should be: I'll have a certain number of machines, and every few minutes each machine tells another process (say a Node.js process, for now): "I received 100 requests in the last 10 minutes," "1,000," "2,000," "3,000." That process computes an average — "on average we're receiving 100 requests a second" — and tells the AWS auto scaling group to scale up when the requests per second we're handling are very high; and vice versa, when the average drops below a certain threshold, to scale down, because then we're over-provisioned, we have too many machines, and we can get rid of some to save costs.

So that's one architecture for auto scaling when you have constraints like a very strict SLA and a known per-process request capacity, where the load might increase slowly or there might be a spike — and irrespective of what happens, you need to scale your infrastructure up or down. That's the slightly over-engineered way; the most crude and easy way is to give the auto scaling group an average CPU utilization or an average incoming bandwidth target: if it goes above the threshold, scale up; if it goes below, scale down.
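As a sketch of the over-engineered version — a hypothetical aggregator process along the lines described above (every route, port, and threshold here is invented for illustration):

```js
// aggregator.js — hypothetical requests-per-second scaler
const express = require("express");

const app = express();
app.use(express.json());

let windowCount = 0; // requests reported in the current window

// each backend server batches its request count and reports it here
app.post("/report", (req, res) => {
  windowCount += req.body.count;
  res.sendStatus(200);
});

const PER_SERVER_RPS = 10;      // "one process handles 10 requests/second"
const WINDOW_SECONDS = 10 * 60; // aggregate over 10-minute windows

setInterval(() => {
  const rps = windowCount / WINDOW_SECONDS;
  const desired = Math.max(1, Math.ceil(rps / PER_SERVER_RPS));
  console.log(`avg ${rps.toFixed(1)} req/s -> want ${desired} servers`);
  // here you'd call your cloud's API to resize the auto scaling
  // group, e.g. AWS Auto Scaling's SetDesiredCapacity
  windowCount = 0; // start the next window
}, WINDOW_SECONDS * 1000);

app.listen(4000);
```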
So now comes a slightly more complicated use case: what if they ask you, now you have a real-time application, a chess-like application, how would you scale that? In a chess-like application you don't really have requests per second; you have persistent connections. If there are 100,000 people active on your website, they aren't sending you a lot of messages; the number of messages they send is low, but each of them holds a persistent connection to your servers, and persistent connections are expensive. There is some metric like: a single Node.js process can only hold around 10,000 persistent connections; if you have more than 10,000 sockets open, you'll run into issues. Which means, for this real-time application, the metric you're worried about is not the number of requests but the number of people currently active on the website.

If you go to chess.com, you will see they have this number aggregated somehow: currently this many people are playing. If you go to other live-streaming websites like Twitch, a lot of the time they show on the top right: currently 8,000 people are active on this website. So you always have this number. How it gets aggregated is a separate question, and in a distributed system like this it is also a challenge: this machine has 100 connections currently open, this one has 1,000, this one 1,300, this one 1,400; how do you aggregate that? The answer remains the same: you create an aggregation service. Any time a person connects, the server tells it, "hey, I have 1,000 people now"; any time a person disconnects, it tells it, "sorry, I have 999 now." You can also batch this: not every change needs to be reported immediately. Even if this number is off by a bit, or is slow to catch up to the real value, that is fine; you don't need it to be exactly accurate at all times. So every 2 minutes or so, each server reports its active users, the number reaches the aggregator, and the aggregator writes it to the database so users can see it on the front end as well, because you want this aggregated number shown on the front end too. A small sketch of this aggregation service follows below.

The aggregator also computes an average: okay, 80,000 people are live, we have four servers currently, which means every server has 20,000 open sockets; I need to increase the number of servers, because I know a single machine can only handle 10,000 at a time. So it tells an auto scaling group, it tells AWS: please scale up to eight servers, and the number gets increased. If these connections go down, then of course the reverse happens.

One very challenging thing here, compared to the earlier use cases: if you had 20,000 people on a server and you scaled up, you need to slowly drain those users onto the other servers; you need to move some of them over, which requires a bunch of client-side logic. On your website code, your React code, you write the logic to disconnect and reconnect to a fresh WebSocket server. The other problem comes when you scale down: we increased the number of servers, but what if you decrease them? Then a server will die, and all the people with persistent connections to it now need to be moved to other servers. This is what becomes challenging with WebSocket connections; it is not a challenge for HTTP servers. For HTTP, a request goes out and the request finishes, so if a server dies it's fine: subsequent requests simply get routed to the other servers.
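A minimal sketch of that aggregation service, assuming the 10,000-sockets-per-server limit from above; the port, the report shape, and the scaling step are all made up for illustration:

```ts
// Sketch only: WebSocket servers POST their open-socket counts here every
// couple of minutes; this service aggregates them and decides fleet size.
import http from "node:http";

const SOCKETS_PER_SERVER = 10_000;
const counts = new Map<string, number>(); // serverId -> open sockets (last report)

http.createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    const { serverId, openSockets } = JSON.parse(body); // e.g. {"serverId":"ws-3","openSockets":1300}
    counts.set(serverId, openSockets);

    const totalActive = [...counts.values()].reduce((a, b) => a + b, 0);
    // totalActive is also what you'd persist to the DB for the "8,000 live" UI
    const desiredServers = Math.max(1, Math.ceil(totalActive / SOCKETS_PER_SERVER));
    if (desiredServers !== counts.size) {
      console.log(`${totalActive} sockets -> want ${desiredServers} servers`);
      // real life: tell the auto scaling group, then drain/reconnect clients
    }
    res.end("ok");
  });
}).listen(4000); // websocket servers report here every ~2 minutes
```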
But for a WebSocket connection, the server that is dying will take down all of its WebSocket connections with it, so you have to re-initiate those connections to a different server from the browser. You probably need to show the user somewhere on the UI that they are disconnected, and then, when they eventually reconnect to a server, show them that they are reconnected, something like that. But yeah, that's a brief of how you do this in a WebSocket system, a real-time chat application. Do we understand this? 82%. Let me share the last one, and then let's call it and do questions. That means we will do horizontal scaling tomorrow, which is fine.

The last one is a video transcoder, and then I'll open it up to questions. The third weird use case is a video transcoder, or a Replit-like website, or you could even say LeetCode, but not really: LeetCode submissions finish in 2 seconds, whereas on Replit you have to give the user access to some compute for, sometimes, 2 hours, and video transcoding can also take 2 hours. What are these three use cases? Let me quickly explain, because the way you scale all three of them is actually fairly similar.

Video transcoding means: when you go to youtube.com and upload a video, you only give YouTube a single mp4 file, and YouTube converts it into various qualities: 720p, 360p, and so on, three qualities or so. This process is called video transcoding: it is transcoding your 1080p file into various qualities. This can take a long time, and it is a very expensive operation. What do I mean by expensive? I mean converting a 1080p file into three variants will take a lot of the CPU on the machine it runs on. So if suddenly 100 people come to your website and upload videos, even though 100 is a really small number, only 100 people are uploading, you need a lot of compute; you need almost 100 decently sized machines, because it takes a lot of time for an mp4 file to get converted to 360p and so on. A rough sketch of what such a transcode actually runs is below.

So the question is: how do you scale a system like this, where you know you will have a use case that needs compute held for the user for a long time? Video transcoding takes 2 hours; and if you ever go to Replit and create a new repl in any language, let's say Node.js, you also get access to what is basically a Replit machine. I have a video on Replit, so you can look at that; I don't follow this architecture there, I follow a very over-engineered Kubernetes architecture, though I do share this architecture there too. (Okay, this is taking a long time to load for me.) So when you go to Replit, you have access to a machine, because you have a terminal where you run a bunch of commands, and they're running on someone else's machine. The same happens on LeetCode when you submit: it runs on someone else's machine, only on LeetCode the job is very small, so this architecture isn't really needed there. But for the other two, whenever people come to your website and say "I have uploaded a video" or "I want to create a repl", you need to give them access to some compute immediately. What do I mean by compute? I mean you have to give them access to, say, a 2 GB, 2 CPU machine: this is exclusively yours for the next 2 hours.
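For a feel of what that expensive operation actually is, here's a rough sketch: one ffmpeg process per target quality, each CPU-bound for roughly the length of the video. The file names are made up; the flags are the common libx264 ones.

```ts
// Sketch only: fan a single upload out into several renditions. Each encode
// will happily peg a CPU core, which is why a transcode "owns" the machine.
import { spawn } from "node:child_process";

const renditionHeights = [720, 360]; // target qualities, as in the example

for (const height of renditionHeights) {
  const ffmpeg = spawn("ffmpeg", [
    "-i", "upload.mp4",          // the single mp4 the user gave you
    "-vf", `scale=-2:${height}`, // keep aspect ratio, force an even width
    "-c:v", "libx264",           // the CPU-bound H.264 encode
    "-c:a", "aac",
    `out_${height}p.mp4`,
  ]);
  ffmpeg.on("close", (code) => console.log(`${height}p rendition exited with ${code}`));
}
```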
An mp4 transcode is a very expensive operation for a really long time: converting 1080p to 360p, or converting 1080p to 720p, can be a 2-hour-long process that needs both CPUs at 100%, at peak. So whenever a user is uploading, they need access to some part of your compute. Say all the machines you have sit in one very big pool; you have to share these machines with users somehow. The first person comes, you give them access to the first machine; the second user comes, you give them the second machine; the third user comes, you give them the third machine. It's when the fourth user comes that the question arises: should we auto scale, or should we do something else?

There can be multiple answers here. One answer is: yes, please auto scale. What you can do is, for whenever a new request comes in to upload a new mp4 file, keep a warm pool of servers. What does warm pool mean? It means these machines are ready: there's nothing running on them, but they are ready to pick up a request from the, let's call it, YouTube uploader service: "someone uploaded an mp4, I will handle it now", and then another person uploads, and another machine handles that. As machines keep getting claimed, you keep replenishing the warm pool, so that you always have, let's say, 20 machines that are warm. By warm I mean, you get the idea, they're ready: not doing anything, but ready to pick up requests. If you know that at peak you will only get 20 uploads in a minute, if your SLA clearly states "you can send me 20 uploads in a minute and I will handle them; if you send me 21 I might not be able to", then you probably need a warm pool of 20 at all times. As people send requests, the warm pool keeps refilling, and you keep assigning machines one by one. Why do you have to assign every user a whole machine? Because, as I said, this is a 2 CPU, 2 GB machine, and all of those resources will be completely used by one video transcoding process. The same is true for Replit: if you go to Replit, you get some vCPU, some GB of memory, some disk space, until your Replit session is closed. They need to give you some part of their compute for a long time. A small sketch of this warm-pool bookkeeping follows below.

So this is one way. What is the problem with it? Number one, you're maintaining a warm pool: you have 20 machines always running that might remain idle, so you are paying for 20 machines at all times. That's not a problem at scale: if you have a lot of users, if you're starting 2,000 machines a month anyway, it doesn't matter that 20 sit idle in a warm pool; the cost is nothing by comparison. That's the first problem. What is the second problem? Resource sharing is also sort of a problem here. You are restricting things so that every user has exclusive access to two CPUs even when they're not using them. Maybe the 1080p-to-720p conversion takes 100% of the CPU, but the 1080p-to-360p conversion does not take much at all. So it would be nice if this could be shared, if I could have 20 CPUs shared by, say, 30 videos.
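A minimal sketch of that warm-pool bookkeeping, assuming a hypothetical `launchMachine` provisioning call (in real life, something like your cloud's run-instance API) and the 20-machine SLA from above:

```ts
// Sketch only: keep WARM_POOL_SIZE idle machines ready; hand one over the
// moment an upload (or new repl) arrives, then top the pool back up.
const WARM_POOL_SIZE = 20; // matches the "20 uploads a minute" SLA

interface Machine { id: string }

const warmPool: Machine[] = [];
let nextId = 0;

async function launchMachine(): Promise<Machine> {
  // stand-in for your cloud's "start a 2 CPU / 2 GB instance" call
  return { id: `machine-${nextId++}` };
}

async function replenish(): Promise<void> {
  while (warmPool.length < WARM_POOL_SIZE) warmPool.push(await launchMachine());
}

// Called by the uploader service for each new video / repl session.
async function assignMachine(): Promise<Machine> {
  const machine = warmPool.pop() ?? (await launchMachine()); // empty pool = SLA breached, cold start
  void replenish(); // refill in the background; don't make the user wait
  return machine;
}

replenish().then(() => assignMachine()).then((m) => console.log(`user gets ${m.id}`));
```

The key property is that `assignMachine` never waits on provisioning in the happy path: the user pays no cold-start cost as long as the SLA holds.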
Because a lot of the time a video has reached its 360p stage and no longer needs as much CPU. But you can't really share compute here; you are restricting things so that every video gets its own fixed CPU and memory allocation. That is the downside of this approach, but it is still a pretty decent approach to follow for this third use case of a Replit-like website or a video transcoding website: have a warm pool, and keep replenishing the warm pool.

What is the other thing you can do? The other thing is an architecture we've discussed a few times: any time a video transcode request comes in, you push it to a queue, and you have workers that asynchronously pick jobs up from the queue. When the queue length becomes very big is when you scale up your servers. So if 100 people submit a video upload, you can scale up: this queue has become very long, let me scale the number of workers up to 10. And as the length goes down throughout the day, because the workers keep transcoding and the queue keeps getting smaller, you can scale back down. (A tiny sketch of this queue-length check follows a little below.) The problem with this architecture is that it works for video uploads but not for Replit. It works for video uploads because YouTube tells you, after you upload, "please wait, your video is processing" for an hour or so; they have that kind of expectation, that SLA, with you. That is not true for Replit, and that is why Replit breaks if all of us try to create a repl at the same time: they cannot keep a warm pool of 1,000 machines, because they don't usually get a spike of 1,000 people, but if all of us went there right now we would most probably crash Replit, because they don't have such a big warm pool. And they cannot use this queue-based architecture either, because it is slow: if the queue has only, say, three workers, everyone gets picked up slowly, one by one, and you can't tell the user, "sorry, wait for the next 30 minutes, we will get compute for you." Which is why the queue architecture will not work for Replit. What works for Replit is the original warm-pool architecture, or the Kubernetes architecture I've already discussed in a YouTube video.

Let's call it here, guys; we've discussed a little too much, I think I've been speaking for a long time, so let's do some candid discussion, questions, all that jazz.

"Is there any other special kind of way to scale applications in some use case?" No, these are the three I can think of; generally what gets asked is one of these. Though I guess there can be a fourth thing, sorry, one last thing: cricket matches. Say there's a cricket match happening in Mumbai, here on the India map is Mumbai. The video transcoding part is actually something that doesn't need to scale: a single stream goes from Mumbai to some server in some data center, let's say in Mumbai itself, and that server converts the incoming stream, which is usually RTMP, over to an HLS stream. The problem is: how do you send this to so many people? So usually the scaling problems we've discussed here are not about sending data; they're about a lot of people saying "I want to sign up, I want to sign in, I want to do something", your typical e-commerce-website backend route handlers.
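Here's the tiny queue-length check mentioned above, sketched with Redis as the queue; the client library, the key name, and the jobs-per-worker ratio are all assumptions, not anything prescribed in the lecture:

```ts
// Sketch only: scale the transcoding fleet off the backlog length.
// Assumes a Redis list "transcode:jobs" holds the pending uploads.
import { createClient } from "redis";

const JOBS_PER_WORKER = 5; // arbitrary: how deep a backlog one worker may own

async function main() {
  const redis = createClient(); // defaults to localhost:6379
  await redis.connect();

  setInterval(async () => {
    const backlog = await redis.lLen("transcode:jobs"); // pending videos
    const desiredWorkers = Math.max(1, Math.ceil(backlog / JOBS_PER_WORKER));
    // real life: set the auto scaling group's desired capacity here
    console.log(`${backlog} queued videos -> want ${desiredWorkers} workers`);
  }, 60_000);
}

main();
```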
A cricket match is a completely different problem. If you have a cricket match happening, the problem is that too many people are asking for, say, 720p video, which is a lot of bandwidth, and India cannot easily support that much bandwidth. A lot of people are asking for 720p video: how do you send it to them? For all practical purposes, not raw mp4 but HLS is the better format for streaming on these websites. Getting the RTMP stream in, transcoding it to HLS, creating three qualities out of it, that is the easy bit. The hard bit is sending it to so many people. How do you scale that? And the answer, surprisingly, is auto scaling groups on AWS again. There's a video on YouTube where Hotstar discusses how they scale their infrastructure, and the answer, surprisingly, was auto scaling groups on AWS. I thought they would have some very elegant, cloud-native architecture, but apparently not; they use ASGs, which is something we'll discuss tomorrow, to scale up infrastructure in case of spikes.

All right, with that, let's answer some questions.

"It will stop all the processes; how can two different cloud providers come into the picture?" Well, for example, in the case of a cricket match, one cloud provider is not enough. So you will have deals with multiple CDNs, content delivery networks; some of them will have servers in Rajasthan, some will have servers in Gujarat, and whenever the peak spike of IPL happens, you turn on all the cloud providers and say, "everyone, please serve this traffic."

"Will the crypto course launch?" Okay, I'll answer that.

"Can you show the calculation for an SLA?" Which SLA? For example, the most basic one was paper math: our server can handle 100 requests per second; our system has a million users; each user sends, say, 1,000 requests in a day. So what did I do? Number of users, times the number of requests per user in a day, divided by 3600 times 24: that is the number of requests per second we receive. A single server can handle 100, so how many servers do I need? That number, whatever it is, divided by 100: that is the number of servers I need to handle the load. (I've written this arithmetic out as a small snippet just below.)

"Can a chess game be made with an HTTP server instead of WebSockets?" For most practical purposes yes, but there is one thing for which you cannot use HTTP. The thing with chess is you don't send a lot of data, and there are two benefits of WebSockets. One: whenever you're sending a lot of data, you should use WebSockets, so you don't have to do the three-way handshake every time an HTTP request is sent; in chess you don't play a move that often, so that part is fine. The other problem WebSockets solve is server-side events: the server has to push events. If one person makes a move, it needs to be pushed to the other side, and HTTP cannot do that; you cannot push something over plain HTTP. So the client would need to keep polling, "has the move been made? has the move been made?", which is bad. The good thing about WebSockets is that push. Whatever you can make with WebSockets you can make with HTTP by polling, but you shouldn't.

"I have won the bounty but got no reply to my message." Please ping Anur or Rookie, they will handle it. End of month, end of year, is when the money gets spent, so you should have it by the 31st, maybe the 2nd or 3rd. Also, there are a lot of pending bounties, small and big, but if it's less than $100, Aran is the one who's clearing them.
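Here's that back-of-the-envelope SLA math written out; the numbers (a million users, 1,000 requests per user per day, 100 req/s per server) are the ones from the example:

```ts
// Back-of-the-envelope capacity estimation, as described above.
const users = 1_000_000;
const requestsPerUserPerDay = 1_000;
const perServerReqPerSec = 100;

const secondsPerDay = 3_600 * 24; // 86,400
const avgReqPerSec = (users * requestsPerUserPerDay) / secondsPerDay;
const serversNeeded = Math.ceil(avgReqPerSec / perServerReqPerSec);

console.log(avgReqPerSec.toFixed(0)); // ~11574 req/s
console.log(serversNeeded);           // 116 servers (average load; add headroom for peaks)
```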
"What are the best open-source blockchain projects, especially related to fintech?" All the DeFi protocols, Uniswap, Drift Protocol, have their contracts open source; maybe not their full codebases, but most of them have their contracts open source, so go look at those.

All right, thank you, thank you, thank you. The Notion doc is not opening; there's another Notion doc by Rohan Vya and one by Abta. So, last two things, guys, then I'll call it. Is my internet bad? Well, then you're out of luck. Let's try; either Notion is down or my internet is, and it is Notion. Oh, there you go, let's see. Sorry guys. Oh no, oops, it did not reload; this one's still reloading. We'll have to call it here, guys, sorry, these two Notion blocks got missed. Create a thread if that's super important, create a thread and tag me there, and I'll pick it up from there soon enough. Let's call it here, guys. Good class. I'll see you tomorrow, when we will be implementing some auto scaling groups in AWS, and maybe doing a bunch of other AWS things. Good night, guys!