Transcript for:
Azure Proximity Placement Groups - Key Points and Best Practices

hey everyone in this video i want to explore proximity placement groups dive into what they are and what are some of the best practices around actually using them as always if this is useful please go ahead and like subscribe comment and share and you can hit the bell icon to get notified of new content now we often have a need to separate resources in azure and we may think that hey this is also a mechanism for proximity but as i want to talk about it really isn't now we think about separation for resiliency for some problem that may happen that problem may happen at a node level a rack level a data center level maybe even a regional level this could be a natural disaster it could be some kind of incident it could be a software problem introduced but there's some impact to some blast radius so we want to separate out resources to protect against those various blast radiuses and there's different mechanisms to do that for example we often think about availability sets so i can think what's actually happening behind the scenes is well there's racks of servers i can think that azure cloud is not magic there's physical infrastructure sitting somewhere so i can think about there are these various racks and i can think that each rack is a fault domain something could happen to a node in there to some top of rack switch to some power supply unit so one of the mechanisms to distribute is availability sets and when i use an availability set it will distribute my workload over these various fault domains that's about distributing hey across racks now those racks themselves they're sitting in a particular physical building a region is made up of lots of physical buildings and they actually get exposed to me in many regions today now there's kind of a group of buildings typically but some particular group of buildings would be an availability zone so now i can think about okay there's an availability zone and what i'm going to see in a subscription per region is i'll 
kind of see an availability zone one and then likewise there's another kind of set of buildings i'd see an availability zone two and i would see again i'll just join kind of a couple an availability zone three so when i use these what i'm now getting isolation from is some building level problem we think about isolation of power communication and cooling so these are different buildings within a certain region so that's kind of the next layer up i can think about these availability zones well they're in a certain region so i could say region one could be east us two whatever that may be and there are other regions as well i can think hey there's a whole set of other infrastructure maybe availability zones over here as well and these are all part of let's say region two and i would use a different region to really maximize that isolation i'm hundreds of miles away typically so if there was some natural disaster here in my primary region it's very unlikely that it's also impacting my other region over here so these are all about separation hey separation at a rack level availability sets separation at building level hey availability zones separation at huge distances using different regions but there is a side effect you might say from a side effect perspective it does give some element of proximity as well i can think about well hey look if i'm using an availability set isn't there some proximity of well they're kind of in the same cluster you might say hey i'm in a certain availability zone well there's some particular set of data centers just as a side effect and if i talk about a region well we talk about a region as this two millisecond round trip latency envelope so if i put things in the same region well they should be within this kind of two milliseconds but this proximity is not guaranteed there's no sla around it and as azure advances its architecture there's no guarantee that my availability set is within some certain physical distance availability 
zones are typically not a single building they're actually typically sets of buildings could be two three four it varies completely again there's no guarantee but putting things in an availability zone is not any kind of latency boundary the region yes i'm within a certain amount and so the requirement i want to address is i do want to get as close as possible together i want the minimum possible latency for my services and typically between those vms and virtual machine scale sets and this is what we're talking about for proximity placement group it's a construct for virtual machines so it's infrastructure as a service a construct for virtual machines and virtual machine scale sets to get resource proximity between them to get the lowest possible latency now this is vms and virtual machine scale sets only it's not other services now other services might build on top of virtual machine scale sets think about aks node pools but they would have to do the work to use proximity placement groups and to my knowledge that doesn't happen today now what exactly does this mean then if i'm talking about this proximity placement group what is this really going to give me so if i think about a proximity placement group i do this in my nice shiny pen so my proximity placement group my expectation is sub millisecond latency that is not a guarantee there's not an sla around that that is the expectation i would expect to see this kind of sub millisecond latency this is what you would typically see when you're leveraging this and again what i'm talking about here is virtual machines and virtual machine scale sets this is what this is going to apply to now these are available in all regions even if the region doesn't have availability zones for example i can still leverage this and realize this is designed where i really want this very small latency i have some application that is super sensitive to latency i might think about sap hana workloads i might 
think about high performance computing i'm thinking that the latency is more important than really anything else what is the most important aspect to me is the performance availability well it can kind of take a second seat to the importance of this super low latency and the reason for this is let's think about availability zone for a second so let's really expand out the idea of a particular az so i'm just going to look at for example az1 it's a particular availability zone now the reality of that availability zone as i said is typically not a single building there's no set number but let's pretend for example it's three buildings three data centers make up az1 in the region i'm working with and i can think they're all connected via some networking which then goes off and connects to the rest of the data centers in that particular region which then connects to the microsoft backbone one so i have these different data centers when i create a proximity placement group once resources actually start to get created into it that proximity placement group gets pinned to a specific data center so the ppg is going to be within a data center that's what it's giving me everything i add into that proximity placement group is going to be in a single physical facility and that's what's going to give me that low latency and then i go and create the resources hey i go and create vms and i add them to that proximity placement group and that could be individual vms it could be vms that are actually part of a vm scale set so they'll all now be in that same data center but think what that means an availability zone is typically multiple buildings so i have the capacity of multiple buildings i have the different types of skus of different buildings as soon as i use a proximity placement group what i'm now doing is the only capacity i have available to me is within that single building so the capacity that i can now leverage for things in this proximity placement group is what i can fit 
and what is available in that particular data center the skus i can use are what exist in that particular data center so that's why when we talk about using the proximity placement group we're saying that performance that low low latency is more important than potentially the availability of the service by being able to scale out because it's possible there's not capacity available to do auto scale actions for example but it's more important that i have this super super low latency so that's when i would think about actually using this now i do want to stress another point this is not i'm going to do this in red we'll say red this is not about disks there is no correlation here that i'm using a proximity placement group that the disks that i connect to these vms will be put into the same physical data center it is not about reducing latency from the vm to the disk that is not a feature of proximity placement groups it is low latency between the vms that's what it is it is not about low latency between the vm and its disk that is not a feature of proximity placement groups so hopefully that kind of clears that up so when i think about using a proximity placement group what i actually do first is i create a proximity placement group i could think about hey look if i just go into the portal i can say hey i want to create a new proximity placement group and i would just like everything else i can pick a certain resource group so i'll say hey ppg demo i pick the region so i'm picking the region for this proximity placement group and i just give it a name so i might say southcentral us ppg1 whatever i want to do and that's it notice i've not specified an availability zone or anything else at this point i've created the proximity placement group and it's kind of just floating out there in azure resource manager it's not pinned to any particular data center at this time it exists i then create resources for example i could go into here so that now exists i can go and look 
at my resource and notice currently it's aligned everything in that proximity placement group is in the same physical building because there's nothing in it now i could go over now and i could create a virtual machine i'd say hey create vm i would obviously pick south central us if i want to use that proximity placement group has to be the same region i could also now pick a particular availability zone hey i want to be in zone one and then under advanced i can pick my proximity placement group down here and i would obviously go and fill in all of the other options for it now at this point when this vm is actually created the control plane of azure goes through those decisions to pick where it's going to provision that vm the ppg is pinned so when the first resource that's in that ppg is provisioned into the fabric from the control plane then the ppg is pinned to a data center that meets that criteria so in this case hey i deployed that vm into az1 the ppg would get pinned to a building a data center that is part of az1 in that region for my subscription at that point it's pinned every other resource i add into that specific ppg would go into that same building but stop this is not how i should use a ppg i should not go and create a vm in the portal and use it because realize something within each of these data centers there's actually different vm skus supported not every vm sku is in every data center there are different clusters compute clusters that support the different vm skus dv5 ev5 fs nc m series all those different skus so i could think about well what's actually supported in this building for example i'm just making these up let's say we support dv5 and ev5 and fs maybe this building supports dv5 as well and ev5 oh and it actually supports nc as well and the m series so different data centers support different skus i might be using different types of sku that i need to be able to communicate i have some multi-tiered application i want that super 
low latency so what's important is when my ppg is pinned to a particular building i want to make sure it doesn't get pinned to a building that doesn't support the skus that i actually want so how can i help influence this well i don't want to use the portal and we'll come back to that but i don't want to do that the best case scenario is we like declarative technologies i could think about hey i'm going to create a template now that template could be an arm json file it could be bicep which remember transpiles into an arm json file it could be terraform there are others but they're all declarative and what i would then do is as part of this definition i would go okay well i'm creating a ppg and i want hey i'm going to have an ev5 in that ppg and maybe there's multiple of them i'm also going to have nc and that's also going to be in that ppg etc etc and then what happens is i deploy that into azure and i go via the azure resource manager so i say hey i want to deploy this and this is actually kind of batched up it takes all of this it doesn't do it sequentially it batches up it looks at everything i'm asking for and that will help it make the right decision because now what it can do is all those different types of skus it will understand what's in that template what it wants to use in the ppg and it can then help it make the right decision and say okay i've got this combination well i'll go and create the ppg in this physical building that matches the requirements of hey az1 and those different skus so that's the ideal scenario so think this this is what i want to use i want to create a template of all the different skus in there it will then not serially process instead it batches up those instructions it's smart it looks at everything i'm going to do and will make the right decision now if you see an error like over constrained allocation request it means hey the data center it picked doesn't support all of those skus and there may be scenarios where it's just not 
possible maybe you're picking some combination of vms that do not exist in one single data center for example if you mix intel and amd skus i think that can be very tricky to get in the same building now what about if you don't use this now this is what you should be using anyway this is the best practice use declarative technologies i can version control it it's idempotent i can rerun it i can modify it and just rerun it again it will make it so but let's say i'm not let's say i'm using i'll do it in red just to emphasize i don't really want to do this let's say i'm using powershell or the cli or it could even be the portal so i think about once again i'm going to create the ppg the first resource i should be creating is the most exotic the biggest most exotic vm for example i'd probably say here the nc then i would say ev5 etc because starting with the most exotic vm is giving me the best shot that the most exotic will get placed in a data center that at least supports that exotic vm and i have a pretty good chance it will support the other ones i'm going to say once again if i get that over constrained allocation request it didn't work i could try reordering delete all my resources and reorder the operation but best practice use the template this sequential approach is gonna be kind of prone to not working it's not going to get the match i want but if you have to hey you could do that because remember as soon as the control plane so i'm submitting this to azure resource manager it's creating resources even before it starts the resource the control plane of azure makes decisions around placement as soon as it has made the first decision about the first resource that's going in that ppg the ppg is pinned at that point it's not moving everything has to be able to fit from a capacity perspective and have support from a sku perspective in the data center where that ppg has been pinned to now if you have existing vms if you deallocate the vm you can then add it to the ppg and start it 
up and it will then try and start it in the existing ppg it doesn't have to be done at creation time there is a side effect of ppgs that can be useful i talked about availability sets over multiple racks and i actually kind of drew it in an availability zone you cannot actually create an availability set in a specific az it doesn't let you it's one or the other but if you create a proximity placement group and then i create a resource in that proximity placement group in a specific az so i create a vm in az1 in a ppg that ppg is now pinned to a data center in az1 if i then add an availability set to the ppg that availability set is now within az1 as well so there's a little hack if that's the right word you can actually get an availability set in a specific az if you use the proximity placement group but remember you're then dealing with those constraints of the capacity and skus of that particular data center and the cluster there's no guarantee a cluster only exists in one data center i think that's pretty common which is what an availability set lives within but there's no guarantee so that may add additional constraints that are not attractive to you now when i created that proximity placement group remember there was that attribute and if we go back for a second it had that aligned attribute it was telling me hey everything in the ppg is aligned everything within that ppg is in the same physical building there may be times that certain maintenance is performed and a resource may get pushed out because there's some movement or something happens and it may be put in a different building in which case it would show non-aligned at that point you could stop the resource deallocate it and then restart it and at that point maintenance is done it should go back into that ppg you can do Get-AzProximityPlacementGroup -ColocationStatus it will show you that aligned status as well if i deallocate every resource in the ppg the ppg gets unpinned it's floating around 
again when i start the first resource when that control plane makes the decision about where it's going to locate in which building the first resource that's starting in the ppg the ppg gets pinned again so just understand the ppg is pinned while there is some resource running if i deallocate everything so there's no compute resources being used at this point in that ppg it gets unpinned it's floating around again as soon as i start the first one it will get pinned down and that's it i mean that's the whole point of ppgs it's to reduce the latency between the vms by co-locating them it is not about reducing the latency from the vm to its disk it's only vms and vm scale sets if i'm thinking about the smallest possible latency make sure you're also using accelerated networking that really helps accelerate and reduce those latencies between the workloads but that was it as always a lot of work goes into creating this content so i really would appreciate that subscription but i really hope that helped clear everything up and as always until next video take care you
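
The difference between deploying everything in one batched template and creating resources one at a time can be sketched in a few lines of Python. This is only a toy model, not how the Azure control plane actually decides placement: the data center names (dc-a, dc-b) and the SKU families each one supports are made up, just like the example in the video, and the functions pin_batched and pin_sequential are hypothetical helpers written for this illustration. Capacity is ignored entirely.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical SKU families per data center in one availability zone,
# mirroring the made-up example from the video. Real support varies.
DATACENTERS: Dict[str, Set[str]] = {
    "dc-a": {"Dv5", "Ev5", "Fs"},
    "dc-b": {"Dv5", "Ev5", "NC", "M"},
}


def pin_batched(requested: List[str]) -> Optional[str]:
    """Template-style placement: look at every requested SKU up front and
    return a data center that supports all of them, or None if none does."""
    needed = set(requested)
    for name, supported in DATACENTERS.items():
        if needed <= supported:
            return name
    return None


def pin_sequential(requested: List[str]) -> Tuple[Optional[str], List[str]]:
    """One-at-a-time placement: the first SKU pins the PPG, and every later
    SKU must fit in that same data center. Returns the pinned data center
    and the list of SKUs that could not be placed there."""
    pinned: Optional[str] = None
    failed: List[str] = []
    for sku in requested:
        if pinned is None:
            # The first resource decides where the PPG is pinned.
            pinned = next((n for n, s in DATACENTERS.items() if sku in s), None)
            if pinned is None:
                failed.append(sku)
        elif sku not in DATACENTERS[pinned]:
            # Stands in for an over constrained allocation request.
            failed.append(sku)
    return pinned, failed


if __name__ == "__main__":
    print(pin_batched(["Dv5", "NC"]))     # dc-b
    print(pin_sequential(["Dv5", "NC"]))  # ('dc-a', ['NC'])
    print(pin_sequential(["NC", "Dv5"]))  # ('dc-b', [])
```

In this toy model the common SKU created first pins the group to a building that cannot host the NC VM, while starting with the most exotic SKU, or evaluating the whole request at once, lands on the building that supports everything.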