Transcript for:
Understanding Thread Scheduling and Prioritization

All right, well, welcome everyone. First of the recorded sessions. I hope you all are doing well, and I hope I've made it to my destination safely.

Real quick, the convoy effect from last week. I don't know why it tripped me up, but a convoy is basically when you have a large process in front of a bunch of small ones. Kind of like getting stuck behind a school bus: we have all these cars stuck behind a school bus, where that one slow thing is keeping all the other things that would be fast from getting through. So, got that covered. Just the wording of it tripped me up last week, but we're good to go on that.

So let's talk about thread scheduling. There is a distinction between user-level threads and kernel-level threads. Usually it's the thread library in software that's scheduling the user-level threads, and the kernel-level threads are taken care of by the OS. And we don't usually schedule processes; we schedule threads. With the many-to-one and many-to-many models we talked about previously, where you have many user-level threads mapping to one kernel-level thread, or many user-level threads mapping to many kernel-level threads, there's usually some sort of LWP, a lightweight process, which looks to the thread library like a virtual processor. It's basically like a socket for a hose: it's where the user-level threads connect to the kernel-level threads. You set those up, kind of like, this is where you need to go in order to connect.

So with the two levels of threads, there are two different types of contention going on. The process has a bunch of threads it's trying to get to run, and which ones is it going to send off? That's process-contention scope: those threads are contending with each other for time on an LWP, and the priority there is usually set by the programmer. System-contention scope is which of the kernel threads gets scheduled onto the CPU, or if you have a core with two hardware threads, which one is going to run right there. That's usually set by the OS.

Okay, so Pthread scheduling. If we look at this slide (let me get my laser pointer here... there we go), we're setting up five threads. We set up an array for our thread IDs, we've got our thread attributes, and we initialize the attributes. Then we get the scope and check it: it's either going to be process scope or system scope. The API gives us two constants: PTHREAD_SCOPE_PROCESS schedules threads using process-contention scope, and PTHREAD_SCOPE_SYSTEM schedules them using system-contention scope. So the API lets you set which one you want, but which one you can actually use is usually limited by the OS; Linux and macOS only allow you to set the system scope. Then we're creating our threads to run this code.
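Something like the following sketch captures what that slide is doing; the runner function here is just a placeholder, not the actual slide code:

    #include <pthread.h>
    #include <stdio.h>

    #define NUM_THREADS 5

    /* Trivial thread body; the slide's real work function is not shown here. */
    void *runner(void *param) {
        pthread_exit(0);
    }

    int main(void) {
        int i, scope;
        pthread_t tid[NUM_THREADS];
        pthread_attr_t attr;

        pthread_attr_init(&attr);   /* start with default attributes */

        /* Ask what contention scope we got by default. */
        if (pthread_attr_getscope(&attr, &scope) != 0) {
            fprintf(stderr, "Unable to get scheduling scope\n");
        } else if (scope == PTHREAD_SCOPE_PROCESS) {
            printf("PTHREAD_SCOPE_PROCESS\n");
        } else if (scope == PTHREAD_SCOPE_SYSTEM) {
            printf("PTHREAD_SCOPE_SYSTEM\n");
        } else {
            fprintf(stderr, "Illegal scope value\n");
        }

        /* Request system-contention scope (the only one Linux/macOS accept). */
        pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);

        for (i = 0; i < NUM_THREADS; i++)
            pthread_create(&tid[i], &attr, runner, NULL);

        for (i = 0; i < NUM_THREADS; i++)
            pthread_join(tid[i], NULL);

        return 0;
    }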
Here we're setting it to system scope, then creating our threads, running our threads, and then joining the threads back again.

Okay, so that's the general picture of thread scheduling. But now, when we start running things: if we're just by ourselves in a room with one computer, not connected to any other computers, running one process on one CPU, life is simple. Things get more complicated as we have more processors. When we say a multiprocessor computer, it could be that there are multiple physical CPUs, or multiple cores in one CPU, or a single core that runs multiple hardware threads. We have to deal with NUMA systems, non-uniform memory access, where the cost of reaching memory from one core might be different than from another. And then there's heterogeneous multiprocessing, which you mostly see on cell phones, where you have some faster cores and some slower cores.

So let's take a look at some of the problems we run into with multiple cores. The first issue: you have all these threads wanting to use the cores. Do the cores pick from one big common ready queue, or does each core have its own queue to pick a thread from? With symmetric multiprocessing, each core schedules its own threads. The problem with a single shared queue is you end up with race conditions: one core goes to grab a thread, and while it's in the middle of scheduling it, another core says, hey, I want that thread too. So you have to do locking, and it just causes a lot of issues. Giving each core its own queue of threads, so it can schedule itself, makes life a lot easier.

Okay, now, we've mentioned something like this before: if you have a single thread on a single core, we're often alternating between using the CPU and being stalled on memory. As cores get faster and use less power, more cores let us do more work, and it also lets us run multiple hardware threads per core. So we end up with something like this, where two threads can run on the same core at the same time. There are processors that allow more; I know there are server processors that allow eight threads per core, but generally two is sufficient to keep a core busy. A lot of times these hardware threads share some of the core's resources, so switching between two threads that are on the same core is not as expensive as switching to a completely different thread. You can have both of them loaded, one ready to run when the other one gets paused, and that saves time: while one thread is waiting on memory, the other one can be computing on the CPU.

All right, and most computers... let's go here: if I look at Task Manager, and then Performance, and the CPU, you'll see that it has eight cores but 16 logical processors. That's because each core can handle two hardware threads, so the operating system sees it as having essentially 16 logical processors. It thinks it can run 16 threads at a time, which it essentially can, and it's doing that just to keep each core busy.
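As a quick aside, a program can ask the OS how many of these logical processors it exposes. A minimal sketch on a POSIX-style system, assuming _SC_NPROCESSORS_ONLN is available (it is on Linux and macOS):

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        /* Number of logical processors currently online, as seen by the OS. */
        long n = sysconf(_SC_NPROCESSORS_ONLN);
        printf("Logical processors: %ld\n", n);
        return 0;
    }

On the machine shown in Task Manager, this would report 16, even though there are only 8 physical cores.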
And so while each core can run two hardware threads, the operating system sees each hardware thread as a separate CPU. This is called chip multithreading, or CMT. So if you have four cores and each of them can do two threads per core, that means you have eight logical processors, and that's what you often see, as I showed you in Task Manager: it looks like you have a lot more processors than you actually have cores, because we're running two threads on every core.

As we mentioned before, there are two levels of scheduling here. We have these software threads that want to run, and the operating system assigns them to logical processors; you want to spread those threads out so that each core is being regularly utilized, not just one core getting hounded while the others sit idle. So the OS has to choose a logical processor to run each software thread on, and then the hardware has to decide, for each core with its two hardware threads, which one it's going to run right now. Two levels of deciding.

Now, these cores are choosing which threads to run themselves, but sometimes one core can get a lot of threads while another core is just sitting there idle. So we need a way to balance the threads so that each core is working about evenly. If I go back to my Task Manager again, you'll see that there is something going on on all of them, and they're all doing approximately the same amount of work. There are times where, like this one, CPU 2 seems to be taking the brunt of something, but generally they're fairly even; at least there's something working on each of them at all times. So there needs to be a way to balance: if one CPU has a lot of threads in its queue, can we move some of those threads to a different queue?

There are two ways of doing this: push migration and pull migration. Push migration means you have some task looking over the CPU cores, and when it sees a core that's heavily loaded, it pushes some of those threads over to another core. That takes a little overhead, because you have to have something keeping an eye on the cores and how loaded they are. The other option is pull migration: a core that's running low on threads looks around, sees that another core has a lot more threads, and pulls some of them over to itself. And you don't have to pick just one; there can be a balance of both, some pushing and some pulling, to keep things even.

But here's the problem. Most of the time, each core has some memory of its own. It may be that each core has its own L1 cache, while L2 or L3 is shared among the cores. So if a thread started on one core, even if that core is loaded with lots of threads, that core probably has in its cache the stuff that this particular thread needs to run. If that thread gets pushed or pulled to a different core, it essentially has to warm up the cache on the new core from scratch. That takes time, and it can take a good amount of time, especially if it has to go all the way back to RAM.
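This is also why Linux exposes a call that lets a process pin itself to particular cores, which is the "hard affinity" idea coming up in a moment. A minimal sketch, assuming a Linux system (sched_setaffinity is Linux-specific):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(2, &set);   /* allow this process to run only on logical CPU 2 */

        /* pid 0 means "the calling process/thread". */
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        printf("Pinned to CPU 2\n");
        return 0;
    }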
So when you're pushing or pulling, there has to be a balance, because too much of that and you're losing what's called processor affinity. Affinity is the idea that I already have my cache loaded on this core, so I would like to stay on this core. There are a couple of ways an operating system can deal with this. Soft affinity means the operating system will try to keep a thread on the same core; it's not going to start moving things willy-nilly, but it's not going to guarantee it either. Hard affinity usually lets the thread or the process specify: I want to run on this core, and I want to stick to this core. You'll see a lot of games that are heavily loaded onto a single core, because that keeps everything together and keeps the cached data ready; but games then don't tend to use the other cores as much, which can be problematic.

All right, so those are some of the scheduling issues. And remember, NUMA just means non-uniform memory access: when we're scheduling and moving threads around, we have to think about how the threads access memory. If a thread has already started running, got bumped out of a core, and is coming back in, it's going to want that same core, because the memory it needs is probably already in that core's cache.

Okay, real-time CPU scheduling. Some devices have to get things done within a certain time limit. For example, if you have anti-lock brakes, the computer has to decide whether or not to activate them, and it has only a few milliseconds to decide before it's too late and you've lost control of your vehicle. Or a lot of these self-driving vehicles have a very limited amount of time to detect whether they're about to run into something before they have to decide whether to brake or swerve. So there are certain operating systems that require that things get done within a certain amount of time, and that of course causes a lot of issues.

When we talk about real-time scheduling, there's soft real-time and hard real-time. Soft real-time systems are ones where the critical real-time tasks get the highest priority, but there's no guarantee they'll be done within a certain amount of time. Hard real-time systems mean that tasks have to be serviced by a specific deadline. You'll often see those where safety is critical, such as vehicles or factory floors, where you're worried about safety or about the quality of your product; there, it may have to be hard real-time.

Now, you have to worry about latency. Event latency is how long it takes from when an event occurs until it starts being serviced, and there are two types of latency that make it up. There's interrupt latency, which is the time from when the interrupt arrives to when the interrupt service routine can actually start running. I think I've got this on the next slide; yeah. So the interrupt happens, you may have to finish up whatever instruction is currently executing, and then you have the context switching that happens.
And then finally, you can take care of that request, the thing that has to happen. Dispatch latency is how long it takes, once the scheduler decides to respond, to stop the current process and actually start executing the response to the event. You have to deal with conflicts, preempting whatever is running and freeing up any resources it's holding, and then you have to actually dispatch the new process; that stretch, where the real work isn't happening yet, is the dispatch latency. So you may be waiting for one process to get out of the way before the response can actually run.

Okay, so how do we make sure that a task gets taken care of in time? Well, there's not really a guarantee, but for real-time scheduling you have to be preemptive: you have to be able to stop something else and take care of the thing that has to be handled now. Usually there's priority-based scheduling, so the things that have to be done now get priority. But that by itself only guarantees soft real-time. There's the reality that some things cannot be done in a certain amount of time, and if you have conflicting priorities, where you need both of these things done in a window that isn't big enough, you can't always guarantee it. To get hard real-time, you have to have deadlines, and you have to have the capability of actually meeting them.

Now, there's this periodic model, where a process has to run once every period. A periodic task has a few characteristics: a processing time t, the time it takes to actually run; a deadline d, by which it has to be finished; and a period p, after which the whole thing happens again. So the task comes around every period, it has to be done by its deadline within that period, and then the next period starts. The relationship is 0 <= t <= d <= p: the processing time has to be no more than the time to the deadline, which has to be no more than the period. And the rate of a periodic task is 1/p; that's how often it happens.

All right. With rate-monotonic scheduling, priorities are assigned based on the inverse of the period. These tasks are happening every period, so we take the inverse of that period: tasks with shorter periods, which have to happen more often, get higher priority, and tasks with longer periods, which don't have to happen as often, get lower priority. So it looks something like this. Process P1 needs to finish within its 50-millisecond period, while process P2 has a 100-millisecond period. P1 has the shorter period, so it gets the higher priority: we run P1 first, then P2 can run; when P1's next period starts, P1 goes again, and P2 finishes in the gaps. Because P1 has the 50-millisecond interval and P2 has the 100-millisecond interval, P2 can afford to wait; we take care of P1 first because it's shorter, and then we can take care of the longer-period task later.
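The slides don't show it, but there's a classic rule of thumb for when rate-monotonic scheduling is guaranteed to work: add up t/p for every task and compare it against the bound N(2^(1/N) - 1) for N tasks, which is about 0.83 for two tasks. A small sketch; the 50 and 100 ms periods match the example above, but the burst times here are made up purely for illustration:

    #include <math.h>
    #include <stdio.h>

    #define N 2

    int main(void) {
        double t[N] = {20.0, 35.0};   /* hypothetical processing times (ms) */
        double p[N] = {50.0, 100.0};  /* periods from the lecture example (ms) */

        double u = 0.0;
        for (int i = 0; i < N; i++)
            u += t[i] / p[i];

        /* Liu & Layland utilization bound for rate-monotonic scheduling. */
        double bound = N * (pow(2.0, 1.0 / N) - 1.0);

        printf("Utilization: %.2f, RM bound: %.2f\n", u, bound);
        if (u <= bound)
            printf("Guaranteed schedulable under rate-monotonic.\n");
        else
            printf("Not guaranteed; deadlines might be missed.\n");
        return 0;
    }

With these numbers the utilization is 0.75, which is under the bound, so rate-monotonic handles it. Push the processing times up and you get exactly the failure case described next.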
Okay, but here's the problem: what if the processes are long enough that we can't get them all done? If process P2 needs to finish by its deadline here, but it's still in the middle of running when that deadline arrives, it doesn't get a chance to finish. So if we have multiple processes, and one has a shorter period, but we can't get both processes done within the longer period, then we have an issue.

So another option is earliest-deadline-first scheduling. Instead of giving priority to whatever has the smallest period, we look at which task has the nearest deadline. So here I take care of process one, then process two; and then here I'm working on process one, but I switch over to process two to make sure it gets done before its deadline. Priorities are assigned according to deadlines: the later the deadline, the lower the priority.

Okay, another option: proportional share scheduling. You have some total number of shares, T; you basically divide the processing time up into T shares, and each application receives N shares, where N has to be less than T. That ensures each application receives N/T of the total processing time. It guarantees that everything gets some processing time, but it doesn't necessarily guarantee it all gets done in time.

So the POSIX.1b standard has real-time scheduling available in it. There are two real-time scheduling classes: SCHED_FIFO, first-in first-out scheduling for real-time threads, and SCHED_RR, round-robin scheduling. You can get the current scheduling policy, and you can also set the scheduling policy, with a pair of Pthreads API calls.

If we look at some sample code: we have two integers, i and policy; we set up our thread IDs and our attributes, just like we've always been doing. Here we're getting the current policy, and we're displaying whether it's some other policy, round-robin, or FIFO. And here we can try to set it to FIFO; of course, I could also choose round-robin or the other policy. I'm trying to set the policy, and if the call returns zero, it was able to do that successfully; if it doesn't return zero, we have an error. Otherwise, we've set the policy, and we can then create the threads, run them, and join them back. This is all very similar code to what we've looked at before. Okay?

So real-time scheduling just has a lot of extra issues, because we need certain jobs done by a certain time, and it becomes very hard to guarantee that, especially if we don't have enough processing power for everything that's going on.
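For reference, here's a minimal sketch of that POSIX scheduling-policy example, reconstructed in the spirit of the slide (the runner function is just a placeholder):

    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    #define NUM_THREADS 5

    void *runner(void *param) {
        /* Real work would go here. */
        pthread_exit(0);
    }

    int main(void) {
        int i, policy;
        pthread_t tid[NUM_THREADS];
        pthread_attr_t attr;

        pthread_attr_init(&attr);

        /* Report the current scheduling policy. */
        if (pthread_attr_getschedpolicy(&attr, &policy) != 0) {
            fprintf(stderr, "Unable to get policy\n");
        } else {
            if (policy == SCHED_OTHER)     printf("SCHED_OTHER\n");
            else if (policy == SCHED_RR)   printf("SCHED_RR\n");
            else if (policy == SCHED_FIFO) printf("SCHED_FIFO\n");
        }

        /* Ask for FIFO real-time scheduling; a nonzero return means the
           request failed, e.g. if the OS restricts real-time policies. */
        if (pthread_attr_setschedpolicy(&attr, SCHED_FIFO) != 0)
            fprintf(stderr, "Unable to set policy to SCHED_FIFO\n");

        for (i = 0; i < NUM_THREADS; i++)
            pthread_create(&tid[i], &attr, runner, NULL);

        for (i = 0; i < NUM_THREADS; i++)
            pthread_join(tid[i], NULL);

        return 0;
    }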
All right, so some operating system examples. Linux: the old way Linux did this, starting with kernel version 2.5, was the O(1) scheduler, meaning scheduling took a constant amount of time no matter how many tasks there were. It's preemptive, so we can kick stuff out, and it's priority-based, with two priority ranges: real time, the stuff that has to be taken care of now, and time-sharing, for things that don't need to be handled as urgently. The real-time range was priorities 0 to 99, and anything above that, 100 to 140, was time-sharing. In that global priority scheme, lower values indicate higher priority, and higher-priority tasks get a larger time quantum. A task stays runnable as long as there's time left in its time slice; if it runs out of time, all the other tasks get a chance to run their slices before it gets to go again.

So each CPU has a runqueue with two arrays: one active and one expired. When a task's time slice runs out, it goes into the expired array, and both arrays are indexed by priority. When the active array is empty, the two switch places: the expired array, holding the threads that still need more time on the CPU, becomes the active array, the old active array becomes the expired one, and the scheduler runs through them again, still indexed by priority. This worked well, but it gave poor response times for interactive processes, things that really needed some priority, and it could slow some jobs down.

So as of Linux version 2.6 (2.6.23, specifically) there's a newer scheduler: CFS, the Completely Fair Scheduler. It keeps track of how long each task has actually run versus how long it thinks the task should have to wait before it gets a chance to run. The nice value is called that because nice threads go last: if you add to a task's nice value, you're lowering its priority, and it's going to take longer to get the CPU. CFS keeps a virtual runtime for each task, how much it has effectively run, scaled by a decay factor based on its nice value, and it keeps the tasks in a balanced binary tree ordered by that virtual runtime. The task at the bottom left of the tree is the one that has had the least virtual runtime, so that's the one it schedules next, with the target latency being the window in which CFS thinks every runnable task should get a turn.

Real-time scheduling is the same as the POSIX scheme, and the nice values map back into that priority range: nice -20 maps to priority 100, and +19 maps to 139, so there's a 39-value spread for the normal priorities. And then you have your real-time priorities, which of course take priority over everything else.

Okay, so Linux does support load balancing: if one core is doing too much, we can move threads to another core, pushing or pulling threads between cores. But it also has scheduling domains. What a scheduling domain does is look at the physical layout of the processor: if, for example, an L2 cache is shared among a group of cores, Linux tries to keep a thread on one of the cores sharing that cache, to keep down the delays from things not being cached. So it tries to be aware of that.

So Windows has priority-based, preemptive scheduling; it can kick stuff out, and the dispatcher's job is to be the scheduler. A thread will run until it blocks, it uses up its time slice, or it's preempted by a higher-priority thread. Windows has a 32-level priority scheme, and it's a little unusual: the variable class has priorities 1 through 15, and the real-time class has values 16 through 31. Priority 0 is reserved for the memory-management thread, and in Windows, higher numbers mean higher priority. There's a queue for each priority, and there's an idle thread that runs when no queue has anything ready.
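In the Windows API, this shows up as a priority class for the process plus a relative priority for each thread, which is what the table coming up maps into those numeric values. A minimal sketch, Windows-only, assuming the standard SetPriorityClass and SetThreadPriority calls:

    #include <windows.h>
    #include <stdio.h>

    int main(void) {
        /* Put the whole process into the HIGH priority class... */
        if (!SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS))
            fprintf(stderr, "SetPriorityClass failed (%lu)\n", GetLastError());

        /* ...and give this thread the ABOVE_NORMAL relative priority
           within that class. */
        if (!SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_ABOVE_NORMAL))
            fprintf(stderr, "SetThreadPriority failed (%lu)\n", GetLastError());

        printf("Class: %lu, relative priority: %d\n",
               GetPriorityClass(GetCurrentProcess()),
               GetThreadPriority(GetCurrentThread()));
        return 0;
    }

Per the table described next, high class plus above-normal relative priority works out to numeric priority 14.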
So if we look at the priority classes, we have the real-time priority class, which is the only one that's non-variable: it gets the highest priority all the time. Then there's the high priority class, above normal, normal, below normal, and so on, just the different levels of priority, and of course the idle class for when something's not really going on. Within each of these classes there's a relative priority: time-critical, highest, above normal, normal, below normal, lowest, and idle. And you'll see the values here. For real time, every value, no matter what relative priority level it's at, is higher than in any other class: the other classes top out at 15, but real time has 16 to 31, just like I mentioned, real time is 16 to 31 and all the others share 1 through 15. So you see the high priority class: the lowest it gets (aside from idle) is 11, which is still higher than most of the others; even the lowest value in high is still better than half of the above-normal class. And above normal gets down to 8, which again is fairly similar. So based on which priority class your thread is in, and then what relative priority it has within that class, you can see the actual numeric value that gets interpreted as its priority.

Windows 7 added user-mode scheduling (UMS). There's a runtime library for it; C++ has the Concurrency Runtime framework, which allows applications to create and manage threads independently of the kernel. If you have a very large number of threads, this can be more efficient than letting the kernel manage the threads for you.

Okay, that's all I have. Thank you all for listening and I'll get my next video up ASAP.