Even if you've written best-in-class Spark code, your jobs may take forever to complete if you haven't allocated CPU and memory resources correctly. Hey everyone, welcome back. In this video we are going to talk about executor tuning: how do you decide the number of executors you should create, and the amount of memory and the number of cores you should allot to those executors?

Let's first have a look at how executors are created and what they look like inside a node. Here I've taken one node; a node is basically one machine in a cluster, and you could have several of these machines in the cluster. The configuration of this machine is that it has 17 cores and 20 GB of RAM. Now let's say you want to do a spark-submit. When we do a spark-submit, we specify the number of executors we want to create, and the number of cores and the amount of memory we want to allot to those executors (sketched in code below). So let's go ahead and create three executors, and let's say I've assigned five cores and 6 GB of RAM to each of them. In total you would end up using 5 × 3 = 15 cores and 6 × 3 = 18 GB of RAM from this node. So you end up with three executors, each having five cores and 6 GB of RAM. That's how executors get created inside a node during spark-submit.

But the numbers I assigned here were based on wishful thinking. There are certain rules you need to follow in order to size these executors optimally: to decide how many executors to create, and how much memory and how many cores to allot to each of them. So imagine you had to create and run a Spark job: how would you decide the number of executors, and the cores and memory assigned to those executors? It's a very critical question, so let's take a few examples to understand how we would do that.

Say you have a cluster with the following configuration: five nodes, which is five different machines, and each of those machines has 12 cores and 48 GB of RAM. The question is, how do you decide the number of executors to create when running a Spark job, the cores per executor, and the memory per executor? There is a way to think about this. You naturally have three options: thin executors, fat executors, and optimally sized executors. Just from the names we'd always want to go with optimally sized executors, but all three have advantages and disadvantages of their own. So let's first have a look at fat executors: if you were to create fat executors, what would the configuration look like, and what benefits and disadvantages would they have?
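As a rough sketch of what that single-node example (three executors, five cores and 6 GB each) could look like in practice: the app name and script name below are just placeholders, and `spark.executor.instances` / `--num-executors` assumes dynamic allocation is not in play.

```python
from pyspark.sql import SparkSession

# Roughly equivalent to:
#   spark-submit --num-executors 3 --executor-cores 5 --executor-memory 6g my_job.py
spark = (
    SparkSession.builder
    .appName("executor-tuning-demo")          # placeholder name
    .config("spark.executor.instances", "3")  # number of executors
    .config("spark.executor.cores", "5")      # cores per executor
    .config("spark.executor.memory", "6g")    # memory per executor
    .getOrCreate()
)
```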
So first of all, what are fat executors? Fat executors are executors which occupy a large portion of the resources on a node. Here each node has 12 cores and 48 GB of RAM, and a fat executor is going to occupy a good portion of those resources. Now let's do the calculation. To calculate the number of executors and the cores, we first leave out one core and 1 GB of RAM per node for the operating system, YARN and other processes. That's the first thing we should do. So if we had 12 cores and 48 GB of RAM and we subtract one of each, per node we are left with 11 cores and 47 GB of RAM.

Now, we said fat executors occupy a large portion of the resources, so we'll say one fat executor takes all 11 cores and all 47 GB of RAM. That simply means one node is going to have exactly one executor, which takes up all 11 cores and 47 GB of RAM, with the small remainder left for the operating system and other processes. One node has one executor, and our cluster has five nodes, so five nodes are simply going to give us five executors. So num-executors is going to be five, and executor-cores, how much is that going to be? Take a guess: it's going to be 11, as we already decided, and executor-memory is going to be 47 GB. This is the configuration we supply when we do a spark-submit for a job. So with fat executors we end up with five of them, one on each node, and you can see these are very strong, very powerful executors, because each has 11 cores and 47 GB of RAM. That's how the scenario would look for fat executors.
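As a sketch, the fat-executor configuration derived above (five executors, 11 cores and 47 GB each) would translate to something like the block below; keep in mind that, as the overhead rule later in the video explains, in practice some of that 47 GB would have to be set aside as overhead rather than all of it being requested as executor memory.

```python
from pyspark.sql import SparkSession

# Fat executors: one executor per node, taking (almost) the whole node
#   spark-submit --num-executors 5 --executor-cores 11 --executor-memory 47g my_job.py
spark = (
    SparkSession.builder
    .appName("fat-executors-demo")             # placeholder name
    .config("spark.executor.instances", "5")   # one fat executor per node
    .config("spark.executor.cores", "11")      # all usable cores of the node
    .config("spark.executor.memory", "47g")    # all usable RAM of the node
    .getOrCreate()
)
```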
Now let's have a look at thin executors. What are thin executors? They are just the opposite of fat executors: thin executors occupy minimal resources from the node. Let's quickly do the calculation. We've already left out one core and 1 GB of RAM, so per node we are again left with 11 cores and 47 GB of RAM. Because a thin executor takes up minimal resources, we decide that one executor only gets one core. We have 11 cores, so one node is going to end up with 11 executors: simple unitary method, one core per executor and 11 cores in total. Now, what is the memory per executor? We know we have 11 executors on a node and 47 GB of RAM in total on that node, so 47 divided by 11 is roughly 4 GB. So one thin executor gets one core and about 4 GB of RAM. The last thing to find out is the total number of executors: one node has 11 executors and we have a total of five nodes, so we end up with 11 × 5 = 55 executors. The configuration looks like this: num-executors is 55, executor-cores is one, and executor-memory is close to 4 GB. That's how the configuration looks for thin and fat executors.

Now let's understand the advantages and disadvantages of each, and then we'll see what an optimally sized executor looks like. Let's start with the advantages of a fat executor. The first advantage is increased parallelism. Fat executors are pretty powerful: they have a lot of cores and a lot of memory, so they can crunch a lot of data. Each core can run a task, and because many cores are present on one executor, many tasks can run together, thereby increasing the parallelism. It's also very beneficial because it lets you load data that requires a significant amount of memory: tasks that need a lot of memory can easily be consumed and processed by a fat executor. Another advantage is when managing a lot of executors is a concern: we've seen that one node has only one executor (or a minimal number of them), so in cases where managing many executors is a concern, you could think of fat executors. The next advantage is enhanced data locality. Given that the executor already has a large memory, it can fit a lot of partitions in that memory, which means the data it wants to process is already local to it, already loaded in memory, so it doesn't need to shuffle data from other nodes in the cluster. That's why data locality is enhanced, which reduces network traffic because you don't need to move data across the cluster, and that in turn improves the overall application speed.

Coming to the disadvantages: we have a lot of resources within one executor, and if we don't fully utilize them, we end up paying for resources that are sitting idle. The second disadvantage is fault tolerance. Imagine you have two fat executors sitting on node one and node two, and both are crunching a lot of data, say 32 GB each. Now say, for whatever reason, one executor fails and crashes: the amount of computation, the amount of effort that needs to be redone is going to be large, because it was processing a huge amount of data, 32 GB, so a good amount of computation time is going to be lost
in order to recompute that data, and this is going to reduce the application's reliability. The last disadvantage is HDFS throughput. For those of you using HDFS, HDFS throughput simply means the rate at which you can write data to HDFS or read data from HDFS. Now, if you use a lot of cores, basically more than three to five (really more than five) cores per executor, this tends to cause a lot of garbage collection. I've discussed garbage collection in my previous video on Spark memory management, so if you haven't watched it, please go ahead and watch it. Garbage collection is a process within the JVM in which it cleans your memory of unwanted objects: if your memory is full, it cleans out the unwanted objects, and during this time it pauses your program, then resumes it. Imagine this pause happening again and again and again: it is going to take a performance toll on your program. That is one of the reasons why it's recommended to have somewhere between three and five cores per executor. So those are some of the advantages and disadvantages of fat executors.

Now let's talk about the advantages and disadvantages of thin executors. The first advantage is, again, increased parallelism, and this might confuse you a little bit because you saw increased parallelism for fat executors as well, but this is a little different. Here you would have a lot of executors, so this is basically executor-level parallelism; the earlier one was task-level parallelism, where one executor had a lot of cores, each core capable of processing one task, and many such tasks could run in parallel. This is executor-level parallelism, and that's how it's different. The advantage is that you can still process a lot of things in parallel, but the work needs to be lightweight: the amount of work each executor does has to stay small, because you cannot stuff in a lot of data when each executor has very little memory. So lightweight tasks still work fine. The second advantage is fault tolerance. We saw earlier that a fat executor processing a huge amount of data crashed, and you lost a large amount of computation that had already happened. In this case your executors are very small, so even if you lose one, it is easy to recompute whatever had already been done on it. So fault tolerance is pretty good when compared to fat executors.

Now the disadvantages. The first is high network traffic: because each executor has very little memory, there are chances that the data it needs is not fully present on that executor, so it needs to move data across the cluster to bring the relevant data in, and this is going to happen on all the executors in the cluster. That is why it increases the network traffic. The last disadvantage is reduced data locality. We
know that each of these executors has a small amount of memory, so the number of partitions it can load into that memory is also going to be small, and therefore the number of partitions that are local to the executor will be small. Say it needs partition P10: it would need to load P10 into memory, but P10 is not local to this executor, and the reason it's not local is that it couldn't be loaded in the first place, because the memory itself was small. This leads to reduced data locality. So those are the overall advantages and disadvantages of both thin and fat executors.

Okay, now that we've seen both, let's try to understand how we size an optimal executor. There are a few rules that we should always keep in mind when sizing an optimal executor, and there are four of them. The first one: leave out one core and 1 GB of RAM per node for Hadoop, YARN and operating system processes. The second one concerns the YARN Application Master. The Application Master is the one responsible for negotiating resources from the Resource Manager: when you say you want to create an executor with 11 cores and 47 GB of RAM, it is the Application Master that goes and asks the Resource Manager for those resources so the executor can be provisioned. We also need to leave out something for the Application Master to function properly, so you either leave out one whole executor, or you leave out one core and 1 GB of RAM. The reason we have two options here is that the Application Master generally works quite well with just one core and 1 GB of RAM. For simplicity you may just subtract one executor, but that's not very suitable when you have fat executors: with a configuration like 11 cores and 47 GB of RAM, you wouldn't want to give away such a big executor for an Application Master that just needs one core and 1 GB of RAM. So if your executors are small, just subtract one executor when you define num-executors; otherwise spare one core and 1 GB of RAM. That's the second important thing to remember. The third rule is three to five tasks, meaning cores, per executor: as we discussed, HDFS throughput deteriorates if we have more than five cores per executor and it leads to a lot of garbage collection, so as a general rule of thumb, and more as a general good practice than purely an HDFS argument, keep three to five cores per executor. And the last rule: when you define your executor memory, it should exclude the memory overhead. The overhead memory, as we discussed in my last video on Spark memory management, is used for internal system processes, so we need to spare out some memory; the actual executor memory should always exclude the overhead memory. We are going to look at an example that takes all of these rules into consideration, so don't worry if this sounds intimidating right now; the whole procedure is also sketched in code just below.
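Here's a minimal Python sketch of that procedure, assuming exactly the rules above: one core and 1 GB per node for the OS and Hadoop/YARN daemons, one core and 1 GB cluster-wide for the Application Master, a chosen core count per executor, and an overhead of max(384 MB, 10% of executor memory). Rounding is done by flooring at each step, so it can land a gigabyte below the looser hand calculations in the video.

```python
import math

def size_executors(num_nodes, cores_per_node, ram_per_node_gb, cores_per_executor=5):
    """Rough executor sizing following the four rules from the video."""
    # Rule 1: leave 1 core and 1 GB of RAM per node for OS / Hadoop / YARN daemons
    usable_cores = cores_per_node - 1
    usable_ram_gb = ram_per_node_gb - 1

    # Cluster-wide totals
    total_cores = usable_cores * num_nodes
    total_ram_gb = usable_ram_gb * num_nodes

    # Rule 2: leave 1 core and 1 GB (cluster-wide) for the YARN Application Master
    total_cores -= 1
    total_ram_gb -= 1

    # Rule 3: 3-5 cores per executor; derive the executor count from that
    num_executors = total_cores // cores_per_executor

    # Rule 4: the per-executor memory share must exclude the overhead,
    # which is max(384 MB, 10% of the executor memory)
    mem_share_gb = math.floor(total_ram_gb / num_executors)
    overhead_gb = max(0.384, 0.10 * mem_share_gb)
    executor_memory_gb = math.floor(mem_share_gb - overhead_gb)

    return num_executors, cores_per_executor, executor_memory_gb

# size_executors(5, 12, 48, cores_per_executor=5)  -> (10, 5, 20)
# size_executors(3, 16, 48, cores_per_executor=4)  -> (11, 4, 10); the video rounds
#                                                     the overhead down and lands on 11 GB
```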
Okay, so now let's go ahead and size an optimal executor. Let me quickly reiterate: we have a five-node cluster and each of those nodes has 12 cores and 48 GB of RAM. Let's go through the rules we just walked through and follow each of them. The first one was to leave out one core and 1 GB of RAM for Hadoop, YARN and other operating system daemons, and this is done at the node level, so it's quite important to understand that this is per node. Per node, after subtracting that, we are left with 11 cores and 47 GB of RAM.

Now let's follow the second rule: leave out either one executor, or one core and 1 GB of RAM, for the Application Master, and this is at the cluster level (the previous rule was at the node level). Before doing that, let's first calculate the totals: per node we have 47 GB and we have five nodes, so the total memory is 47 × 5 = 235 GB; the total cores are 11 per node across five nodes, so 55 cores. These are the resources we have at cluster level. Now we apply the rule and subtract either one core and 1 GB of RAM or one executor; we'll go with one core and 1 GB of RAM. Subtracting 1 GB of RAM gives me 234 GB, and subtracting one core gives me 54 cores. That's the usable configuration of my cluster.

Now I need to find out how many executors to create, and for each of those executors the number of cores and the memory to allot. The simplest part is the cores: we just follow the rule that says assign three to five cores per executor (cores and tasks are interchangeable here, because one core processes one task). So I'll assign five cores to each executor, and that helps me find the total number of executors: total executors = total cores ÷ cores per executor = 54 ÷ 5, which is close to 10. So I'm going to have 10 executors, each with five cores. The last calculation is the memory per executor: I take the total memory, 234 GB, and divide it by the number of executors, which gives me something close to 23.4 GB per executor. But remember the last rule we have yet to follow: we need to subtract the overhead memory from this, and that gives the actual executor memory. Here is
the calculation for it: the overhead is the maximum of 384 MB and 10% of the executor memory. Let's do it. Let me round the 23.4 GB down to 23 GB; the actual memory per executor is then 23 GB minus max(384 MB, 10% of 23 GB = 2.3 GB), so 23 − 2.3 GB, and let's round that down as well, to 20 GB. So that is the actual memory per executor. We've finally found everything: the number of executors is 10, the cores per executor are five, and the memory per executor is 20 GB. So num-executors is 10, executor-cores is five, and executor-memory is 20 GB.

Now, a possible question you may have: why are we not talking about the size of the data? Say we are processing 10 GB or 100 GB here; wouldn't the data size affect all of these calculations? The reason I'm not talking about the data size specifically is that I want you to focus on one very important metric, which is the memory per core. Let's calculate it: each executor has five cores and those five cores get 20 GB of memory, which means one core gets 20 ÷ 5 = 4 GB of memory. One core processes one partition, so as long as my partition is less than or equal to 4 GB, the processing should happen seamlessly without any issues. Whether the input is 10 GB or 100 GB, we need to think about this at a more granular level: what is the size of my partition? If my data is partitioned into sizes of 128 MB, the configuration we've created here is very good, because each core can live with up to 4 GB of data. I believe that is how you should look at this problem: understand your partition size, then figure out what partition size the configuration you've created can handle, in other words how much memory has been allocated to each core, because that memory is the amount of data it will be able to process for one partition. That was the most important reason, but I hope all of this makes sense and made things clear for you.

Okay, so to quickly summarize the benefits of an optimal executor: we've tried to maintain a balance between the thin executor and the fat executor. Here we've assigned five cores and 20 GB of RAM in this example, which is not too low and not too high as we've seen with thin and fat executors, so this is a good configuration for good parallelism. We also avoid falling into issues with HDFS throughput, one consequence of which is large GC cycles that pause your program: we've assigned five cores, so this should be fine for HDFS throughput. And the last one is data locality: the executor has 20 GB of RAM, so the number of partitions it can hold should be a good number, and those partitions are going to be local to that executor, so data locality is still preserved, in fact enhanced. Those are some of the advantages of an optimally sized executor.
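Put together, a sketch of how the first worked example could be submitted (names are placeholders again); note that YARN still adds a memory overhead, by default max(384 MB, 10% of executor memory), on top of the 20 GB, which is exactly why we subtracted it from the 23 GB share above.

```python
from pyspark.sql import SparkSession

# Optimal sizing for the 5-node (12-core, 48 GB) cluster:
#   spark-submit --num-executors 10 --executor-cores 5 --executor-memory 20g my_job.py
spark = (
    SparkSession.builder
    .appName("optimal-executors-demo")          # placeholder name
    .config("spark.executor.instances", "10")   # 54 usable cores / 5 cores per executor
    .config("spark.executor.cores", "5")        # within the 3-5 cores per executor rule
    .config("spark.executor.memory", "20g")     # ~23 GB share minus ~10% overhead
    .getOrCreate()
)
# Memory per core here is 20 GB / 5 = 4 GB, so each task comfortably handles
# a typical 128 MB partition, and anything up to roughly 4 GB.
```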
Okay, so let's solve one more example of sizing an optimal executor. This is the configuration of your cluster: you have three nodes, and each of those nodes has 16 cores and 48 GB of RAM. Let's again follow those four rules. We first leave out one core and 1 GB of RAM per node, which leaves us with 15 cores and 47 GB of RAM per node. Now let's calculate the total cores and total memory in the cluster that we can use: total cores are 15 × 3 = 45 cores, and total memory is 47 × 3 = 141 GB. Next, the second rule: leave out 1 GB of RAM and one core for the Application Master. Doing that gives me 44 cores and 140 GB of RAM, and these are now the cluster resources I can use.

So now let's find out how many executors we want to create, the number of cores we should give each executor, and the memory. Let's start with the cores: again we follow the three-to-five rule. If we give five cores per executor, we're not going to fully use the 44 cores, so instead let's give four cores per executor. With four cores per executor, the number of executors is simply total cores ÷ cores per executor = 44 ÷ 4 = 11 executors, each with four cores. Now let's see the memory: I have a total of 140 GB and 11 executors, which gives me something very close to 12-and-something GB, so let's assume 12 GB per executor. And if you remember, I also need to subtract the memory overhead, which is again max(384 MB, 10% of the executor memory); 10% of 12 GB is 1.2 GB, so the overhead is 1.2 GB, and just to keep the calculation simple I'll take it as 1 GB. So the actual memory is 12 GB minus the 1 GB of overhead, which is 11 GB. If we narrow down the final numbers: num-executors is 11 as we calculated, executor-cores is four, and executor-memory is 11 GB. That's how your calculation is going to look.
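And, as a sketch, the second example maps onto the same settings as below; keep in mind that how many of the 11 executors actually launch also depends on how YARN can pack 4-core executors onto the 15 usable cores of each node.

```python
from pyspark.sql import SparkSession

# Optimal sizing for the 3-node (16-core, 48 GB) cluster:
#   spark-submit --num-executors 11 --executor-cores 4 --executor-memory 11g my_job.py
spark = (
    SparkSession.builder
    .appName("optimal-executors-demo-2")        # placeholder name
    .config("spark.executor.instances", "11")   # 44 usable cores / 4 cores per executor
    .config("spark.executor.cores", "4")        # within the 3-5 cores per executor rule
    .config("spark.executor.memory", "11g")     # ~12 GB share minus ~1 GB overhead
    .getOrCreate()
)
```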
It's great to see that you've reached the end of the video. To summarize, we've learned how to size an executor and the approaches you can follow to size executors optimally. We've seen the various options: what fat executors are, what thin executors are, the advantages and disadvantages of both, and finally how to size an optimal executor and the rules that govern that sizing. I hope all of that made sense, and if it did, please don't forget to like and share this video. Thank you so much for watching.