Queueing is not the easiest thing, and getting it right can be a massive challenge, which is why Sam Rose, one of my favorite writers and blog post creators, wrote an article about it. His articles aren't your usual just-talking-about-a-thing-in-text; they have the craziest, most fascinating visualizations too. If you haven't already seen my load balancing video, it's also all about one of his posts, with similar-quality visualizations. I haven't seen this one yet; I've heard incredible things and I'm really excited to take a look. Also, Encore is the place this one is hosted, which is really cool: simple development for complex problems. Huge shout out to them for allowing Sam to justify the time he spends working on something like this. I'm so excited; without further ado, let's go straight in.

Queueing: an interactive study of queueing strategies. We're Encore, and we build tools to help developers create distributed systems and event-driven applications. In this blog you're going on an interactive journey to help you understand common queueing strategies. It's been meticulously crafted with love and attention by Sam Rose. Huge shout out again to Sam, and to Encore for making this blog post worth the time, because Sam, as they said, is so meticulous that he deserves to get paid for this.

Queues are everywhere. We queue at bars, in restaurants, and at the bank. When you load this web page, the request to fetch it is interacting with dozens of different queues on its way from your machine to the server this page is hosted on. Queues are fundamental. In this post, we're going to explore queueing in the context of HTTP requests. We'll start simply and gradually introduce more complex queues. When you're finished with this post, you will know all of the following: why queues are useful, three different types of queues, how these three queues compare to each other, and one extra queueing strategy you can apply to any type of queue to make sure you don't drop priority requests. Interesting; he always sneaks these one-more
things in, almost Apple-style, that end up being one of the most useful and interesting parts of the content.

Why do we need queues? Let's start at the beginning: one client and one server. For this to work, I'm going to need your help. You're going to play a central role in this post by being the user that sends requests. Click the button to send a request. Nice. Oh, the... why? Why do you make these so good? The 30 FPS isn't even going to do it justice in the video. I always say you should check out the original source, but seriously, go play with these on the website; it's linked in the description, as it always is. This is stunning on my 60 FPS; I actually want to go and take a look at 120 FPS later. Yeah, you see here, if there are other requests coming in, they all bounce because the server can't handle it. The PHP experience.

When you clicked, a request traveled from the button to the server. Also, the coloring of the icons here in the text, in the copy, so it's clear... oh, so good. When the request arrived at the server, the server began processing it. The request shrinks in a clock-like fashion until it has disappeared, at which point it is considered processed. If you click fast enough, you will notice that the server can only process one request at a time, and requests that arrive while the server is busy get dropped. Animation too fast? You can... oh, you go way too hard for these, man.

Dropped requests are a problem. If these are HTTP requests, dropping them would mean showing an error page to the user. This could cause the user to leave your site, which could be a lost signup or even a lost sale. We want to avoid dropping requests as much as possible. One of the ways you could tackle the problem is to introduce a queue. The next example puts a queue between you and the server. New requests will be temporarily highlighted to make it easier to see when they end up in the queue. But if you fill up the queue, you see we're still bouncing, so a lot of things don't make it in, but at least we have a backlog here
of a few. Now, when a request arrives at the server and the server is busy, the request is queued. The server takes requests off the queue in the order they were added, processing them at a steady rate. Queues help us smooth the flow of requests to the server. We can't control the rate at which requests arrive, but for the cost of some memory, we can hold on to them until the server is ready. Having users wait a bit longer than usual is often better than showing them an error. Absolutely agree. It's hard for me to not go on the tangent of: we should show them something if we can, using a really cheap, scalable layer that will immediately give them some feedback, because sitting on a blank white screen or the boring loading bar in the browser just feels unresponsive.

First in, first out: the classic FIFO queue. This is fun. Even one of my doctors... I have a doctor that I talk to through an app, and when you hit them up they respond saying you're in a FIFO (first in, first out) queue, and then give a brief description every time. It's really cute, because my doctor's a code nerd; he built his whole ingest system in Perl, if I recall, which is nuts, that my doctor isn't a full-time engineer but still did that for fun. But yeah, it's funny that FIFO has both broken out of the traditional engineering world as a term and also needs to be explained every time you use it.

The queue we used in the previous example is what we call a first in, first out, or FIFO, queue. As the name suggests, requests are processed in the order that they were added. So we send them, and they come in, and the ones that don't fit in the queue get dropped. Still, while the queues do help absorb bursts of requests, they can get full. If a request arrives and the queue is full, we still drop that request. Queues do not prevent requests from getting dropped. You might be thinking: why not just make the queue longer? If the queue's longer, you can hold more requests and there's less risk of dropping them. Let's give that a try. In the next
example, the queue size has been increased from 3 to 5. The catch is that if a user has to wait more than 5 seconds, they'll almost always give up. We represent this by making the request hollow; when this happens, we refer to the request as timing out. So see those requests at the back there getting emptied before they even make it, because they're waiting for so long? Yeah, this is a really good visualization of that problem. Things start off okay, but you'll see that when you fill up the queue, the requests that join in the last two slots are at risk of timing out. When a request gets to the front of the queue after timing out, the server wastes time processing it. This then increases the chance that the request behind it will also time out. If you send requests faster than they can be processed, you can end up only serving timed-out requests.

I kind of want to riff on this part too: the requests that timed out are still being handled. So if we fill this back up, you'll see eventually (right now these ones aren't timed out yet) we're going to start getting requests that have timed out, and those are still taking time to respond to. So even though some of these requests don't have a user waiting anymore, they are blocking requests that do have users waiting for them, and we end up with a whole bunch of requests that just never end up going to any user, which sucks. If you send requests faster than they can be processed, you can end up only serving timed-out requests. Also very true; we saw that there, where basically every single request was being timed out. All of the server's time is spent serving requests that the user has given up on. Queue length is an important parameter to get right, and getting it right depends on your application. If in doubt, it's better to be too short than too long. That's a bold claim, but honestly, having some requests bounce is better than having all requests time out. Fair point. I've definitely gone to services where they went viral on HN, and all of
a sudden every request is taking forever. You just open the page and it just sits there and hangs for, like, 10 minutes. It will eventually resolve, but that's because the queue is way too long. Wait, why don't we just drop requests that have timed out? Why are we serving them? All good questions. There's nothing stopping you from devising a system to check if requests are still valid before serving them, but that's not something HTTP servers often do by default. It's a way bolder claim than you're used to making, but you stand by it. Also, I should have mentioned this before: Sam's in chat, so keep an eye out for Sam (samwho) and anything he says, because he realistically spent a long time making sure this is how things behave in the real world. Not much stuff will, out of the box, skip processing timed-out requests. That's a painful and sad reality, and I have no reason not to believe it, because yeah, it sucks, but so does most code, let's be real. Very fair points, and I will triple down on anything Sam says, because I'm like a dumber video version of him. So yeah, totally agree with everything he said here, and the boldness of the statement that we should have shorter queues rather than too-long ones? Yeah, checks out.

LIFO. This is a fun one. One thing we can do to prevent the problem of only serving timed-out requests is to use a last in, first out, or LIFO, queue, sometimes called a stack. In a LIFO queue, the most recently added request is processed first. It's kind of like chat: I just saw this message from Sam, and then I went back and saw the other two from before, because this one was first. It means it can feel more responsive: you feel like you sent a message and then I responded to it, even if I get to those other ones later. It still feels much better overall, but yes, some users will just be sitting there forever, and I'll show that by doing this. We see that one at the front can just hang out forever, and now we have these two requests here that don't get resolved until the queue is empty, but they've already timed
out. Processing the most recent request first means that it's not possible for us to be in a situation where we only serve timed-out requests. However, the mischievous among you may have noticed that it is possible to have a timed-out request stuck at the back of the queue indefinitely if you keep clicking the button. Yeah, that must be how most of my chatters feel. For web requests, this failure mode is often preferable. It doesn't matter if the request has been stuck for 20 seconds or 20 minutes; the user is gone. Again, bold but very true. I've never sat and waited for a page longer than 10 seconds unless it was Jira and it was my job. I could go on a long rant about how Jira does all of these things wrong just for me using it, but I'll resist the urge, because we already hate Jira and Confluence and all of that. May as well serve new requests ASAP and let the old ones clear out when things calm down. Yeah, very fair.

Priority queues. Both FIFO and LIFO queues treat all requests the same; the only thing that differs is which end of the queue requests are getting added to. It's not uncommon, though, for some requests to be more important than others. For example, you probably want to make sure a checkout request on your store is treated as the highest priority. To help with this, let's look at priority queues. Oh boy. The way priority queues work is that when a new request arrives, it is added to the queue in a position determined by its priority. Requests with a higher priority get added closer to the front; lower priority requests get added behind older and higher priority ones. In the following example, we introduce the idea of priority requests. These are visually distinct from regular requests by color and by the use of stripes. Priority requests jump the queue, getting placed at the front when they arrive. Send a priority request while the queue has one request in it. Makes sense, cool, checks out. If you're struggling to keep up, what's happening here is we have these requests in the queue, but then the priority request goes in
front of the others. I really like this visualization, because we have things going left to right, but then we have this jump to the front of the queue. Really good visualization here. Yeah, and striping the priority requests for the color-blind: as a fellow color-blind person, very much appreciated. Notice the priority requests get added to the front of the queue. This ensures that they're processed first and don't wait as long as low priority requests. When the queue is full, however, we still have to drop priority requests. Yep: even with prioritization, when the queue is full, the queue is full, and you can still bounce a bunch of priority requests just by bad luck.

Active queue management. Oh boy, this is like the thing that we pay Amazon for with SQS. What we would like to be able to do is push low priority requests out of the queue to make room when a priority request does arrive. This is where active queue management comes in. Up until now, we've only dropped requests that arrive when the queue is full. This is called tail drop, because we're dropping at the tail end of the queue. This is a simple strategy, but it has some problems, one of which is that we are dropping priority requests. To make it less likely that we'll drop those priority requests, we're going to drop low priority requests before the queue gets full. The way we're going to do it is proportional to how full the queue is: if the queue is empty, we will never drop; if the queue is 50% full, there's a 50% chance we'll drop; if the queue is 90% full, there's a 90% chance we drop. Remember, this rule only applies to low priority requests; priority requests are never dropped unless the queue is full. See that we have the spot open? There's a decent chance it gets skipped. You see some of these requests are getting dropped even though there's room in the queue, because it is doing its best to leave room for priority requests, and it makes it much more likely now that we actually get all our priority requests in.
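That proportional drop rule is simple enough to sketch in code. Here's a toy version of my own (the function name, the capacity of 10, and the numbers are made up for illustration; this isn't Sam's or Encore's actual implementation):

```python
import random

# Toy sketch of the drop rule described above: low-priority requests are
# dropped with probability equal to how full the queue is, while priority
# requests are only dropped when the queue is completely full.
def should_drop(queue_len: int, capacity: int, is_priority: bool,
                rng: random.Random) -> bool:
    if queue_len >= capacity:
        return True                  # tail drop: no room for anyone
    if is_priority:
        return False                 # priority requests always get a slot
    fullness = queue_len / capacity  # 0.0 when empty, approaching 1.0 when full
    return rng.random() < fullness   # 50% full => 50% drop chance

rng = random.Random(1)
# With the queue 90% full, most low-priority arrivals get dropped early:
drops = sum(should_drop(9, 10, False, rng) for _ in range(1000))
print(f"dropped {drops}/1000 low-priority requests at 90% full")
```

Priority requests only ever fail the first check, which is why the visualization keeps slots open for them: low-priority traffic gets thinned out before the queue is actually full.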
Really good visualization, as always. It may take a few goes to see this in action, but you should notice that as the queue gets more full, more requests are getting dropped, even though the queue does have space in it. Priority requests are only dropped when the queue is entirely full. This is called random early detection (RED). What RED is doing is trading off dropping more low priority requests in return for dropping fewer priority requests. Best of all, it's cheap and simple to implement. Don't remind me of the Nikon RED acquisition; I should do a video on it, but there's so much to cover there. Because of this, RED is a commonly used AQM algorithm in the wild. Technically speaking, because we're using different probabilities for low priority and high priority stuff, what we're using here is called weighted random early detection. These abbreviations are nuts, but it's also really useful to have a term for this: the idea of each request having a weight to it, that weight being used to calculate whether we should put it on the queue or not, and the size of the queue factoring into that calculation. That's a lot of data points, and none of them is "is the user still connected". We still haven't touched on the idea of what we're going to do with a request that times out. So I'll just force this to time out now, and now we're losing priority requests even though there would be room on the queue if we were to drop that timed-out request; we're still not even doing that, and even then we're seeing a significantly better likelihood that a priority request gets in. Yeah, when you do this realistically, you do like 10 of these and then one of these, and you almost always get that priority request in. "I find big scary acronyms intimidating, so I try to only introduce them after I think the reader understands what they're referring to." You do a great job of that; you've been phenomenal at this throughout. You're the only reason these abbreviations aren't driving me insane.

Anyways: comparing queues. We've spoken
about lots of different types of queues; let's see how they compare to each other. Below we're going to see all of the request queues that we've explored throughout this post: FIFO, LIFO, priority, and priority plus RED. Good old visualizations. I will say, priority plus RED not being red hurts my brain a bit; I understand, though. Clicking the buttons will send requests to all of the queues. Work through all of the goals to generate the data we need for a good comparison. It looks like a lot at first, but I promise it won't take more than a minute or two. Add 15 requests to each queue. So we're losing a lot of these regular requests, you see them on the bottom especially, and all of the ones here have timed out, but now all those priority requests are getting through, almost always. This is a really good visualization, and you can see how these priority requests are getting thrown so far back that some of them are going to time out, for sure. The last in, first out queue is just sitting on this huge pile of timed-out requests that are never going to come through. This is making me just hate last in, first out, honestly, is what this is doing. I don't know if that was your goal, but if it was, you've succeeded. Yeah, this is stunning. LIFO is rough; I thought that was one of the better options, and I'm clearly seeing now why not. Really good visualization.

Now that you've generated a good amount of data, let's dive into the numbers. All the data below is taken from the requests that I just sent above. That's really cool: you're going to do a data visualization of me just hammering on buttons. You may also notice the graphs updating as requests get processed; feel free to send more requests to see how it changes the data.

Wait time. Let's start with arguably the most important metric: how long do requests spend inside of each queue? Below is a bar graph showing latency percentiles for each queue, split out by the regular requests and the priority requests. Change the percentile using the toggle below the graph.
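Before digging into the percentile graphs, the FIFO-versus-LIFO difference is worth seeing in code. This is a toy simulation of my own (bursty arrivals, a queue capped at 5 requests, one request served per tick; none of it is from the post's actual TypeScript), just to reproduce the shape of those latency bars:

```python
import random
from statistics import quantiles

# Toy model: requests arrive in bursts, the server serves one per tick,
# and the queue holds at most 5. We record how long each served request
# waited, for a FIFO queue versus a LIFO queue (a stack).
def simulate(lifo: bool, seed: int = 42, ticks: int = 10_000) -> list[int]:
    rng = random.Random(seed)
    queue: list[int] = []  # each entry is the tick the request arrived on
    waits: list[int] = []
    for tick in range(ticks):
        # Bursty arrivals: most ticks bring 0 requests, some bring 3 at once.
        for _ in range(rng.choice([0, 0, 1, 3])):
            if len(queue) < 5:  # tail drop when the queue is full
                queue.append(tick)
        if queue:
            arrived = queue.pop() if lifo else queue.pop(0)
            waits.append(tick - arrived)
    return waits

for name, lifo in [("FIFO", False), ("LIFO", True)]:
    cuts = quantiles(simulate(lifo), n=100)
    print(f"{name}: p50={cuts[49]:.0f} ticks, p99={cuts[98]:.0f} ticks")
```

With a bounded FIFO queue, a served request can never wait longer than the queue length, so the tail is capped; LIFO serves fresh requests first, so its median is tiny while whatever sinks to the bottom of the stack racks up enormous waits. That's the great-median, terrible-tail shape you'll see in the graphs below.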
Does anything stand out as you increase the percentile? Reminder: the 50th percentile is the time that 50% of requests take to respond, or less. If the 50th percentile is 1 second, that means 50% of your requests are processed in 1 second or less; it also means 50% take 1 second or more. So if we had a 75th percentile of 1 second, that would mean 75% are 1 second or less. LIFO: goated or nah? Okay, the average response time here is a fair point: LIFO responds quickly at the 50th percentile, but as we go to the 95th... yeah, see the difference there. The 95th is awful, because once something takes a while... if we go back up here, I put a few requests on, we have the LIFO there, and I start putting these in front: those can just take forever now, even if I'm not doing priority requests, just normal ones. This one's just going to sit there almost indefinitely. If you've ever been to a web page where you load it and it's just hanging there, and you wait a while and then refresh and all of a sudden it works: not unlikely that LIFO is the reason why.

Yeah, I guess those requests there really screwed with my numbers here. But what we do see here that's really cool, with the blue as the priority requests, is that these two algorithms, which obviously are focused on making sure priority stuff can get through more often, do a significantly better job of making sure those priority requests come in. At the 95th percentile they're all under 3 seconds; at the 99th percentile they're still under three and a half seconds. But again, at the 50th percentile, so for half of requests, these aren't too different, although I did manage to make RED really buffed out here. But as you go into the 75th, the 90th, and the 99th, we start to see the weaknesses of LIFO specifically. Let's be real: a priority request should never take 18 seconds. Imagine hitting a checkout button and then 18 seconds later you finally get to check out; that's not acceptable. And the only things here that let priority requests come in under that
5-second mark, which is a really important number to hit, are the priority queue and the RED one. Yep. Also, a fun question here in chat: which approach is used by most web servers? Almost always first in, first out, because most web servers are assuming they're not getting more traffic than they can handle, so they just throw it in FIFO. But once you start dropping requests, FIFO gets rough, because when it's full, it's full: now I can't even get priority requests in. Let's talk about the standout here, which according to Sam is LIFO: at the 50th percentile it looks to perform well, but it gets dramatically worse as you approach the 99th percentile. This makes sense, as it tries to serve the newest requests as fast as it can, which also means the requests that aren't the new ones get served as slow as they can. It gets a great median at the cost of very poor tail-end performance. Yep.

Dropped requests. Below are a set of bar graphs showing how many dropped requests each queue had, split out by the priority requests and the normal requests. So the only one that didn't drop more priority requests is RED, the good old... the ranked queue, because again, the ranked queue is smart and goes out of its way to make sure there's a little bit more room for priority requests, and it's more likely to drop low priority requests. So we did get a significant bump in the low priority requests dropped, but also a nice little increase in the priority requests served. That's a big difference: that's like 16 to 23 in this example. It might seem small, but that's a significant percentage of customers who get their checkout handled where others wouldn't. So if you just used a FIFO, you just lost seven customers, potentially, in this small little example. It also increased the overall drop rate; that's another very important point that I should jump in on here. The overall number of requests being dropped by these ranked systems is higher, because sometimes it'll just drop a
request even when the queue isn't full. So here we'll see it's dropping requests because it's doing that algorithm to leave some room for priority, but that's why that priority request got in and it didn't in any of the others. So you see here, these all dropped 24 priority requests now, and this one hasn't dropped any more priority requests, but it did drop a couple more low priority requests. Checks and balances; it depends on how valuable the priority stuff is versus the low priority stuff. You'll notice that priority with RED has the fewest dropped priority requests and the most dropped low priority requests. This is the trade-off we make when we use a ranked system; all of the other queues should be equal. It isn't a sure thing for priority with RED to always have the fewest dropped priority requests; it can sometimes have the same as the other queues, depending on how you click. It will never have more dropped priority requests than the other queues, though.

If you're curious how this priority queue could ever possibly have the same performance, the easiest way to see it is just shipping a ton of low priority requests. If you have a shitload of low priority requests and very rarely have high priority ones... let's get that full; now we do a high priority, now we do a couple more high priorities. It did sneak one additional one in, but only like one more, and there it bounced again. If you're mostly low priority stuff, the difference is negligible, but if you have a decent ratio of low to high priority, it helps a lot. It's almost like you have to think of two different axes here: one axis is the ratio of low priority to high priority requests; the other is how high priority a high priority request is compared to a low priority one. If you're an e-commerce site and you want to make sure people can check out, how much more valuable is a checkout than somebody opening the homepage or going to their cart? Because if they can't open their
cart, they might just never check out in the first place. So that might be a value difference close enough that you rank more requests higher; the gap no longer matters as much and you're more okay with the timeouts. But if users don't even go to the shopping cart, they just insta-buy, maybe everybody on the site uses the buy now button: everything else should be low priority and the buy now button should be instant. But this depends on you and your needs. So it's the gap between how many of these types of requests are made, as well as the gap in how important those requests are to you: how high priority is high priority, and how often does that high priority thing occur? These are things you have to consider when you make these decisions.

Let's look at timeouts. This is where things are going to be fun, because again, some of these queues just process timed-out requests all the time and let things sit there and time out. We saw some of the times here where the 99th percentile for LIFO was 18.2 seconds; that's a shitload of timeouts. So let's take a look at what these numbers look like overall, because it looks like FIFO had by far the most high priority things time out, because again, it just gets stuffed on the back; there's nothing to give it a chance to go in front. The LIFO queue's count is a little lower, because I have to press the high priority and then go spam the low priority. I could force that to be higher, though; let's do that, just keep spamming this. Yeah, so if you're still getting a lot of low priority requests and you have a small burst of high priority ones, you're going to run into this problem again. See how many I can get to time out here, and also how bad I can make that 99th percentile, intentionally. I'm actually curious now; it's like a mini game, just to see how much you can screw up priorities. I like this a lot; it's like Universal Paperclips for people who are even nerdier. Yeah, look at that, look at what I managed to brute force here: that's a 29-second, almost a 30
second. I won't reload the page, don't worry; I'm actually really curious to see it at the end. 29 seconds is insane for a 99th percentile. Even the 75th... the 75th isn't actually that bad; the jump between those two is insane. But the percentile here, 21.2 seconds for priority requests, is insane. Imagine a user checking out and it takes over 20 seconds to process that request: they closed the page long ago, they're not buying those items anymore, you're screwed. So yeah, to see the gap there is nuts. And to go back to where we were here with the timed-out requests, we're seeing we've now timed out six of those last in, first out requests, versus FIFO being at 28 timeouts; pretty big gap. But also, RED has by far the fewest timeouts, because it usually just drops. It's a little hard to predict this one, because it depends more on how you complete the goals, but you should see that FIFO and LIFO have the most timed-out priority requests. The priority queues will process priority requests faster and thus have them time out less often. Yep, we don't even have any timeouts on priority requests for these two. You should also see that LIFO has the fewest timed-out requests overall; this is again because it prioritizes the newest requests.

Conclusion. Oh boy, these aren't your usual conclusions, so don't skim through and go to the next video, because as Sam said, there's going to be something fun at the end. I hope you enjoyed this journey through queueing. Let's recap what we've learned: there are lots of different types of queues, each with their own sets of trade-offs. Queues help absorb bursts of requests but don't fully prevent requests from being dropped. Priority queues can help ensure important requests are processed first. Active queue management can help process more priority requests by dropping low priority requests before the queue gets full. For HTTP requests, it's common to see first in, first out queues used in practice. I would argue, with this post to back me up, that LIFO queues are a better choice for
most workloads. If you need to prioritize some requests over others, a priority queue with AQM is a very good choice, though it's seen less often in the wild. Queueing is both a wide and a deep topic; we've looked at only one use case here, but queues are used everywhere. I hope this post has ignited a desire to learn more about the topic, and if it has, I can recommend the following reads. As always, a bunch of useful stuff here. Is this all of... oh, that's why you had me wait: all the dropped requests are on the bottom of the page. You nerd. You absolute nerd, that's so good. I love that you put all this effort into something that is basically impossible to see without screwing with the page; that's adorable. Oh my god, 10 out of 10.

So, one last thing I have to bring up is that Sam's actually open sourced all of the code for his visualizers, and you can see here all of his articles that have these visualizers. You can just look at the code for the queueing one, in TypeScript of course. I'm just curious what your dependencies are: of course Pixi (legacy version) in your core deps, chroma-js, Font Face Observer, GSAP. Oh, you're using good old GSAP; that's nuts. I've regularly said GSAP is one of those things that's a nice reminder that I'm not actually that good at web. The amount of effort, but also quality, you can get using GreenSock is nuts. It's a beautiful animation library that's even lagging my machine a little here, but you can do this type of just crazy stuff with it, and nothing else comes close. Everybody, seriously, go give Sam a follow; he's a legend: samwho, with two o's. Also check out his blog; he has one about retries on here, but he has a whole blog of his own as well, samwho.dev, full of things like this. It's nuts; highly, highly recommend checking him out. Fantastic stuff. Also, I have to say, considering how wide and deep queueing is, this is a fantastic overview; it touches on every single thing most people would need to get started and have the conversation. I knew a decent
bit about queueing before, but I still feel like I understand it much better now, and I hope everybody watching feels similar. If I should do another video about a Sam post, let me know in the comments. Until next time: peace, nerds.