Transcript for:
Convex MPC and Nonlinear Trajectory Optimization

So, what did we talk about last time? Last time we went through a handful of examples of convex MPC applications, just to give you a taste of some of the things you can do with it that are widely used. We're going to dig into a couple of those later in the class in some detail — in particular we'll talk about legged locomotion and quadrupeds, and we'll do the rocket soft-landing problem — so we'll do a couple of case studies on those, which should be fun. Then we started on nonlinear trajectory optimization: we defined the problem and talked about the differential dynamic programming / iterative LQR algorithm, and I probably somewhat overwhelmed you with math — apologies, it was a lot. So, in the interest of not totally steamrolling you with math, today we're going to recap that at a high level, give you the big picture plus some of the implementation details and the extensions you need to actually make it work in practice. We'll also talk a little about how you can handle constraints in DDP — remember, the way we set it up last time it's basically an LQR-type recursion, with no provision for actuator or state constraints — and maybe mention so-called free-time or minimum-time problems, which don't fit super naturally into the DDP setup. Those are problems like "get to the goal as fast as possible," which come up reasonably often. Any questions about anything from last time? OK — hopefully today's recap will make the stuff from last time make more sense. I'm going to turn off the buzzing things on my phone. OK, cool, here we go.

First, a DDP recap. Remember, what we're ultimately trying to solve here is the unconstrained nonlinear trajectory optimization problem. It's a minimization over our state and control trajectories of a cost function that looks like what we've been using the whole time — a stage cost that gets added up over time plus a terminal cost on the last state — subject to our nonlinear dynamics. That's the problem we're solving.
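Written out, in the standard notation this course has been using:

```latex
\min_{x_{1:N},\, u_{1:N-1}} \;\; J(x,u) \;=\; \sum_{k=1}^{N-1} \ell(x_k, u_k) \;+\; \ell_N(x_N)
\qquad \text{subject to} \qquad x_{k+1} = f(x_k, u_k).
```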
The way the algorithm works, there's a backward pass that looks like a slightly generalized version of the Riccati stuff we know from LQR. We start with the value function and Taylor expand it to second order in Δx — so we've got V(x), a gradient term, and a Hessian term — and we initialize it at the end of the trajectory, at time N, from the terminal cost: the Hessian and gradient of the terminal cost get us started. Then we go backwards with the Bellman backup: at each step we minimize over Δu the second-order Taylor expansion of the action-value function, which we're calling S here but which you'll mostly see called Q elsewhere. When you do that minimization, you get Δu equal to a feedforward term — a constant vector, which we call d — plus a feedback term that looks just like, and in fact is exactly, the LQR feedback controller. Then we plug that Δu back into the value function and get the next cost-to-go Hessian and gradient, big P and little p. We wrote all of these matrices down last time — I'm just summarizing here, so if you want the definitions go look at last lecture's notes and/or the code, and we'll do some more code examples today. So that's the backward pass: we go backwards, we get these corrections on our controls, and we get this feedback policy. Now we have a new control trajectory.
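In symbols — hedged, since the signs depend on convention; this version is written to match the rollout below, where u′ₖ = uₖ − α dₖ − Kₖ(x′ₖ − xₖ):

```latex
V_k(x_k + \Delta x) \;\approx\; p_k^\top \Delta x + \tfrac{1}{2}\,\Delta x^\top P_k\, \Delta x + \text{const},
\qquad P_N = \frac{\partial^2 \ell_N}{\partial x^2}, \quad p_N = \frac{\partial \ell_N}{\partial x},
```
```latex
\Delta u_k^\star \;=\; \arg\min_{\Delta u}\; S_k(x_k + \Delta x,\, u_k + \Delta u)
\;=\; -\,d_k - K_k\,\Delta x,
\qquad d_k = G_{uu}^{-1}\, g_u, \quad K_k = G_{uu}^{-1}\, G_{ux},
```

where g_u, G_uu, and G_ux are the gradient and Hessian blocks of the quadratic expansion of S at step k; plugging Δu⋆ back in gives P_k and p_k.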
The next step of the algorithm is a forward pass, or forward rollout, and I'll write it out in pseudocode with a line search — it's just the Armijo rule we've been using all along, applied to this rollout. As we go, we compute the expected change in the total cost, ΔJ. The new trajectory we get after applying Δu we'll call the primed trajectory, and we start it at time step 1 with x₁′ = x₁ — the initial state is given, we don't get to pick it. Then we roll out: for k from 1 to N−1, we compute the new control u′ₖ from the old uₖ plus the feedforward and feedback terms, but with an α on the feedforward term. There's an important little detail here: α goes only on d, the feedforward term — it does not go on the feedback term. Then we plug that new u into the dynamics to get the next state, and we also accumulate the expected change in cost: α times gᵤᵀdₖ at that time step, where gᵤ is the gradient of the cost-to-go (the action-value function) with respect to u at that step. The reason we do this is for the Armijo rule: remember, in Armijo we compute the expected decrease in the cost under the linearization — the gradient of the cost with respect to u times the step we're taking, scaled by α. It's the same thing we've been doing; the only difference is that here we do it per time step and add it up during the rollout. Then the line search, which is exactly our Armijo rule applied in this context: we start with α = 1, compute X′, U′, and ΔJ using the rollout procedure above (which depends on X, U, d, K, and α), and if the Armijo condition isn't satisfied we chop α down by some factor. The condition is the same as before: the actual cost on the new trajectory, J(X′, U′), has to be less than the old cost minus the expected decrease, with a tolerance on it. We backtrack until that's satisfied, and once it is, the primes become the new trajectory and we repeat. Then there's the question of when you're done: we run until the feedforward updates d are small — until ‖d‖∞, the largest single element of d, is below some tolerance — basically until it's not really taking steps anymore. And there are a couple of hyperparameters in there: c, the factor we chop the step length down by, is typically about one half, and b, the tolerance on matching the expected linear decrease, is usually pretty small — say 10⁻⁴ to 0.01. Does that make sense to everybody? So: we do the backward pass, which is all the stuff we derived last time, and then the forward rollout, where we apply the corrections, roll out the dynamics with the new controls, evaluate the cost and the expected cost decrease under the linearization, do the backtracking Armijo thing, chop α, repeat until Armijo is satisfied, and then take that update and do it again. That's the general strategy. A student asked: in the forward rollout we compute the u′ and x′ for a single value of α — we start at α = 1, the full Newton step — so do we go back through that loop if the line search says the step was too big? Yes, exactly. Think of it this way: the step you're taking is on the full trajectory, so to evaluate it you have to roll out the dynamics along the full trajectory and evaluate the cost, which depends on the entire trajectory, as do the updates. So in the pseudocode, the comparison on J is evaluated on the entire trajectory: you do the rollout, compute the whole primed trajectory and its cost, you've already accumulated the expected decrease, you check the Armijo condition, and if it's satisfied you keep it — the primes become the new trajectory and you do a new backward pass and a new forward pass. If not, you chop α in half, do another forward rollout, and evaluate the cost and the Armijo condition again until it's satisfied.
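To make that concrete, here's a minimal Python sketch of the forward pass just described. This is not the course notebook (that's in Julia); the dynamics f, the cost function, and the backward-pass outputs d, K, and gu are assumed to be provided.

```python
def forward_pass(f, cost, X, U, d, K, gu, c=0.5, b=1e-2):
    """One iLQR/DDP forward rollout with an Armijo backtracking line search.

    f(x, u) -> next state    (discrete dynamics, e.g. one RK4 step)
    cost(X, U) -> scalar     (total trajectory cost)
    X, U         current state/control trajectories (lists of 1-D arrays)
    d, K, gu     feedforward terms, feedback gains, and action-value
                 gradients w.r.t. u from the backward pass, one per step
    """
    N = len(X)
    J = cost(X, U)
    alpha = 1.0
    while alpha > 1e-10:
        Xp, Up = [X[0]], []            # initial state is given: x1' = x1
        dJ = 0.0                       # expected decrease under the linearization
        for k in range(N - 1):
            # alpha scales ONLY the feedforward term, never the feedback term
            u = U[k] - alpha * d[k] - K[k] @ (Xp[k] - X[k])
            Up.append(u)
            Xp.append(f(Xp[k], u))
            dJ += alpha * gu[k] @ d[k]
        if cost(Xp, Up) <= J - b * dJ:  # Armijo check on the whole trajectory
            return Xp, Up, dJ
        alpha *= c                      # step was too big: backtrack and re-roll
    raise RuntimeError("line search failed to find an acceptable step")
```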
All right, let's look at some code — there are some fun things in here. Can everyone see that? OK, we have to wait for things to compile for a second. We're going to do a few examples; the first is a cart-pole. Has everyone seen a cart-pole before? It's a pretty standard setup: a little cart on wheels with a motor that drives it back and forth on a track, with a pendulum attached to the cart. You don't have direct control over the pendulum — you can only slide the cart back and forth — and the goal is to slide the cart around so it swings the pendulum up. It's one of those classic underactuated nonlinear systems that gets used as a control benchmark a lot. We're going to use a library for the dynamics so we don't have to write them down ourselves — the RobotZoo library gives us the continuous dynamics — and then we do RK4 on them to discretize in time, at 20 Hz, which is reasonable. Then we write down a bunch of bookkeeping functions for all the Jacobians we need: these are the A and B matrices, and these are the tensor terms we talked about last time — dA/dx, dB/dx, and so on. We just call ForwardDiff on all of it, and note that what I'm doing is exactly what we wrote down last time: we take df/dx, which is A, vectorize it with vec, and then call ForwardDiff again on that, so we get one really big Jacobian. So those are the second derivatives. Then there's some bookkeeping — we'll run this over 5 seconds — and the cost function, which is just quadratic stuff: a pretty small cost on the controls, modest stage costs, and then a big terminal cost. The idea is that the terminal cost is what says we want the pendulum up by the end. Really it's saying: I don't care too much about getting it up until the end of the trajectory, but you'd better have it swung up by then. If I wanted it to swing up faster or sooner, I might jack up the stage cost — you should go play with this, it's kind of fun. So: stage cost, terminal cost, total cost function. Then the backward pass, which is exactly what we just wrote down, with a bunch of derivative stuff. The fun part is that we're going to write down both the Gauss-Newton iLQR version and the full-Newton DDP version that has all the extra tensor terms, and we'll try them both and see what happens.
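As a rough illustration of that derivative bookkeeping: the notebook does this with ForwardDiff in Julia; here's the same "vec trick" structure sketched in Python with a finite-difference stand-in. Everything here is a placeholder — f is the discrete dynamics, x and u are some nominal state and control, and .ravel() plays the role of vec (up to flattening order).

```python
import numpy as np

def jacobian(g, z, eps=1e-6):
    """Finite-difference Jacobian of a vector-valued g at z (a stand-in for
    ForwardDiff, just so the nested-differentiation structure is visible)."""
    g0 = np.atleast_1d(g(z))
    J = np.zeros((g0.size, z.size))
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        J[:, i] = (np.atleast_1d(g(z + dz)) - g0) / eps
    return J

# First derivatives of the discrete dynamics x_next = f(x, u):
A = jacobian(lambda xk: f(xk, u), x)     # df/dx
B = jacobian(lambda uk: f(x, uk), u)     # df/du

# "vec trick" for second derivatives: A is a matrix-valued function of x,
# so flatten ("vec") it and take the Jacobian of the flattened function again.
dA_dx = jacobian(lambda xk: jacobian(lambda xx: f(xx, u), xk).ravel(), x)
dB_dx = jacobian(lambda xk: jacobian(lambda uk: f(xk, uk), u).ravel(), x)
dA_du = jacobian(lambda uk: jacobian(lambda xx: f(xx, uk), x).ravel(), u)
dB_du = jacobian(lambda uk: jacobian(lambda uu: f(x, uu), uk).ravel(), u)
```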
One thing to flag, and we'll come back to it in a second: remember when we did nonlinear optimization earlier in the class, we often had to regularize the Hessian to get a descent direction? It turns out you have to do that here as well — it's all the same stuff. First, the Gauss-Newton version, which is exactly what we wrote down, so there shouldn't be any surprises. We need an initial guess: we start at the origin hanging down, the goal state is at the origin with the pendulum up, and we initialize it just sitting at the initial state with a little bit of random noise on the controls. There's a really good reason to initialize with some random noise on the controls — anyone want to guess why? [A student answered.] Yeah, exactly. If it's hanging straight down, there's this weird phenomenon: you want the gradient information to tell you to swing up, but sitting exactly at the bottom there's an ambiguity about whether to swing left or right, and it doesn't matter — the problem is completely symmetric at the initial condition. That annoying symmetry means the gradients are actually zero if you're exactly at the bottom, so it's often a good idea to put a little random noise on things to avoid your gradients being zero at weird configurations like that. By the way, has anyone heard of the hairy ball theorem? This is an instance of it — it's a funny one. The usual wording is that you can't comb a hairy ball (people usually say a hairy tennis ball) flat without a cowlick, and the deal is that you can't have a vector field on a sphere — or the analogous picture on a circle — with only one fixed point. The easy picture: if I draw a circle and want the gradients everywhere to point toward the upward equilibrium, it turns out you can't do that. I can draw gradients pointing up almost all the way around and that kind of works, but down at the bottom there's nothing consistent to do — you end up needing at least two flat spots on the circle. That's a general property of spheres in n dimensions: you can't have just one fixed point, and this is an instance of it. "Hairy ball theorem" just sounds funny — look it up, it's kind of cool. OK, so that's what we're doing there: a little noise, the initial rollout, the cost, and then the meat of the algorithm, which is the whole thing we just wrote down. We start with a backward pass, do the forward rollout with the line search, then evaluate the cost, and there's a little extra bookkeeping / bailout logic in there to make sure we're not messing things up — because it's nonlinear, with a nasty problem you can get NaNs and bad things happening, so we check for that kind of thing. Otherwise it's just what we did: chop the step size down, check the Armijo condition, and repeat until we're happy; then, once we're happy, we update everything, that becomes the new trajectory, and we repeat the whole thing.
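Structurally, the outer loop looks something like the sketch below. Here backward_pass is a placeholder for the recursion described above, passed in as a function, and forward_pass is the earlier sketch; a real implementation also guards against NaNs and diverging costs, as just mentioned.

```python
import numpy as np

def ilqr(f, cost, backward_pass, X, U, tol=1e-3, max_iters=1000):
    """Outer iLQR/DDP loop: backward pass, forward rollout with line search,
    repeat until the feedforward corrections d are small.

    backward_pass(f, cost, X, U) -> (d, K, gu) is assumed to implement the
    Riccati-like recursion; forward_pass is the earlier sketch.
    """
    d, K, gu = backward_pass(f, cost, X, U)
    for _ in range(max_iters):
        if max(np.max(np.abs(dk)) for dk in d) < tol:  # ||d||_inf termination
            break
        X, U, _ = forward_pass(f, cost, X, U, d, K, gu)
        d, K, gu = backward_pass(f, cost, X, U)
    return X, U, K
```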
OK, let's go ahead and run it and see what happens — it takes a minute, my code is not fast; it's calling ForwardDiff a bunch. So that's 691 iterations, there's the cost, and this is what the trajectories look like — very reasonable. Let's animate it and have a look. That looks sort of weird, but that's it. It looks kind of unphysical the way it's animated — it looks like it's torquing the pendulum around without doing a lot of swinging — but it is obeying the physics as we've written them down; it just looks a little funky. OK, keep that number in your head: about 690 iterations. Now let's try the full DDP with all the extra tensor stuff and see what happens. We have to calculate those extra derivatives, and then we swap in the version with all the extra terms. A couple of things to notice here. I have an is-positive-definite check in there with all the regularization stuff, and I set it to print whenever it has to regularize. Notice that never happened before: we were doing Gauss-Newton, and if you remember, with Gauss-Newton you don't have those extra curvature terms, and our cost functions are nice, so we never had to regularize — everything was benign. But as soon as we pick up those extra curvature terms from the dynamics, it messes us up a little — if we roll it out again, you can see it's now having to regularize the Hessian a bunch of times. So: before, we took 691 iterations; here it's 552 — slightly fewer iterations than before — but in terms of wall-clock time it was definitely slower, because computing all those extra second derivatives is expensive. That's why we often don't bother with it. Note also that the solution you get is exactly the same — I didn't do anything different, and if you watch it, it's the same solution. So in terms of iteration count it may or may not converge a bit faster, but in wall-clock time it's usually not much of a win, because it's way more expensive per iteration. Other things you can play around with — and you should go play with the code: notice that the solution we just got basically did one pump to get up. If you make R, the control cost, bigger, and stretch out the length of the trajectory a little, you can get solutions that take multiple pumps and swing up more slowly, because it's more expensive to use control effort. Stuff like that — play around with it, it's kind of fun.
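For reference, the shape of the cost being tuned here is the usual quadratic form — the exact weights live in the notebook; this is just the structure:

```latex
\ell(x_k, u_k) = \tfrac{1}{2}(x_k - x_{\mathrm{goal}})^\top Q\,(x_k - x_{\mathrm{goal}}) + \tfrac{1}{2}\,u_k^\top R\, u_k,
\qquad
\ell_N(x_N) = \tfrac{1}{2}(x_N - x_{\mathrm{goal}})^\top Q_N\,(x_N - x_{\mathrm{goal}}),
```

with R small, Q modest, and Q_N large; making R bigger penalizes control effort and produces the slower, multi-pump swing-ups.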
Any questions about this? A student asked: what if you wanted it to end up somewhere else — say with the pendulum sticking straight out — is that just a matter of changing the terminal state? So, first of all, that's not an equilibrium point: you can't actually sit still with the pendulum out there, because the system is underactuated. You could do some weird thing where you're constantly accelerating to hold it, but that doesn't quite work. What you can do is get it to some angle instantaneously, with zero velocity, at the end of the trajectory, and to do that you'd change this x_goal — you can make that anything you want. Want to try it real quick? Now it's trying to get to π/2 at the end of the trajectory… it looks like it didn't quite get there, and it also didn't converge because we didn't give it enough iterations, but basically, yes, you can get it to end up there with zero velocity if you want. You can play with all of this stuff. Another question: we do a similar thing in the homework with LQR — is the advantage of this that you don't need a reference trajectory? Right — this generates the reference trajectory. In fact it gives you the reference trajectory, and it also gives you the LQR tracking controller for free, because those K's that come out of the backward pass are the LQR controller. So if you run this, you get not only the reference trajectory but also the LQR tracking controller, which is kind of cool and quite useful — in fact, it's quite useful if you're doing this online: you can run this thing at, say, 10 Hz and run the LQR controller as a tracking controller at 1000 Hz in between solves. There are cool things you can do with that. Another student asked about one of the variables in the code being initialized to a nonzero value — I think that's just me being careful: because of the loop conditions I don't want it to be zero at the beginning, or it would immediately return something dumb. OK, we've got one more example to do: remember the acrobot? We talked about it before — it's a double pendulum where you have one motor at the elbow joint but you're unactuated at the shoulder. It's called the acrobot because it's a toy model of an acrobat on a trapeze: you have no wrist torque, you're just hanging on the bar, and all you can do is bend at your waist. Again we use a library with the model in it, RK4 at 20 Hz, same stuff as before — 10 seconds at 20 Hz — and the same kind of cost, with a big terminal cost to try to get it to swing up. We'll do iLQR and skip the tensor terms here. Same deal: it starts hanging down and we try to get it up to the balanced configuration. One thing to point out again: same as before, if you hang it exactly straight down the gradients are zero and it won't do anything, so you need to perturb it a little — basically you have to break the symmetry so it either goes left or right at the beginning and can figure out what it's doing. So we give it some random noise and go solve it with the same code we just used. A couple hundred iterations — here's what that looks like, and we'll animate it. OK, that's cool.
Now I want to show you something. We've got this random-initial-conditions situation happening here, where I'm initializing the controls with the random perturbation we just talked about. If you run this a dozen times, you'll mostly get this answer, but once in every 10 or 20 tries you'll get a slightly different answer, and I saved one of those, so let's try it real quick. Again, it's just slightly different initial conditions, and it's basically the same cost, but look at the trajectory — it's quite different. Here's that solution: it's a qualitatively different trajectory. What's going on there? Any thoughts on why that happened? [A student suggested that because of the randomization at the start, the runs land in different local optima.] That's exactly right. This is a non-convex problem, so it potentially has lots of local optima, and we're essentially doing Newton's method, which is going to land in whichever local minimum is closest to where we started — just like when we did the Newton's-method examples at the beginning of the class: with multiple local minima, the one you land in depends on where you start. Here we start with a random guess, so we land in the closest local minimum to wherever that was, and in this case there are a handful of different solutions with almost the same cost that look pretty similar, and it's totally random chance which one you end up in — here we found two different local optima. In practice, most of the time we don't really care: in robotics applications like this, as long as it's getting where we want, it obeys the torque limits, and it doesn't look stupid, we're pretty much fine with it. And generally there are actually an infinite number of possible swing-up solutions here. Similar to the cart-pole, we can play with the cost: if I jack up R and try this again, I can get solutions with multiple pumps and so on — we can try that real quick. Any other questions about this? You should play with this — you can try all kinds of stuff; in particular, playing with the cost function and getting it to do different numbers of pumps on the way up is kind of fun. So: local optima — beware of them. A student asked: if you swung it up somewhere else, would the arm have to be straight, or could it end at an angle? The key thing here is that we're trying to swing it up to something that's an equilibrium where it can balance. If you swing it to some other location — you could totally make the terminal condition some random configuration with zero velocity — it'll get there instantaneously at the end, but then it'll just fall down. There are, though, definitely weirder solutions: instead of straight up, you could have a bent configuration that's also balanced, and if you found one of those and wanted to swing up to it, sure — you'd just have to force where the elbow ends up. Remember, there are no constraints in here, so all of this is being done with the cost function; I just set that final goal state.
Actually, just for fun: if I make this terminal weight a little bigger, we can try it again and see if we get a different number of pumps — literally, it's just this terminal cost being big that's doing all of the work right now. Let's see if we get something different. Run it again… oh, we got badness. This is not very high-quality code, and there are definitely ways to get into trouble in here, so we'd need to fix that. But yes, you can do that, and it all comes down to that x_N — the x_goal you put in the cost function. Now, if you wanted to find one of those slightly-bent-but-still-balanced configurations, how would you do it? OK, so what's the definition of an equilibrium point? Right: ẋ = 0 — our dynamics equal zero; it's a root of the dynamics. So what you could do, for instance: there are two angles, and you want the velocities to be zero, so I'd set one of them — say the elbow — to something arbitrary, like 10 degrees, and then use Newton's method to solve for the shoulder angle such that ẋ = 0. That's how you'd do it: you get that configuration, you plug it in as your goal state right here, and you'll swing up to it; then you can wrap it in an LQR balancing controller once you get to the top. A student asked whether only one of the components has to be zero — no, they all have to be zero: ẋ = 0. It's just a root of the dynamics, an equilibrium point.
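A generic sketch of that Newton solve, assuming you've built a residual function from the acrobot's continuous dynamics with the elbow angle and the velocities pinned — the residual and the choice of unknowns here are hypothetical, just to show the shape of it:

```python
import numpy as np

def newton_root(residual, z0, tol=1e-10, max_iters=50, eps=1e-6):
    """Newton's method on residual(z) = 0 with a finite-difference Jacobian.
    For the bent acrobot balance, z could be the unknown shoulder angle (plus,
    if that pose needs a holding torque, the elbow torque), and residual(z)
    would return the acceleration part of x_dot with the elbow angle fixed at
    the desired value and the velocities set to zero."""
    z = np.atleast_1d(np.asarray(z0, dtype=float))
    for _ in range(max_iters):
        r = np.atleast_1d(residual(z))
        if np.linalg.norm(r) < tol:
            return z
        J = np.zeros((r.size, z.size))
        for i in range(z.size):
            dz = np.zeros_like(z)
            dz[i] = eps
            J[:, i] = (np.atleast_1d(residual(z + dz)) - r) / eps
        z = z - np.linalg.solve(J, r)
    return z
```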
Anybody else? All right — go play with this, play with the cost functions, mess around, have fun. So, to sum that up a little: we did the cart-pole and the acrobot, and one takeaway is that full DDP can converge in fewer iterations, but iLQR often wins in wall-clock time because its iterations are cheaper. The other is that, in general, these problems are non-convex, which means you can land in different local optima depending on your initial guess. A student asked: by "initial guess," do you mean the initial guess for the actuation over the full trajectory? That's right — here we're solving for trajectories, everything is about a trajectory, and you make an initial guess on the full control-input trajectory. Remember how this works: we have the initial state, the initial conditions; we don't really get to guess the state trajectory, we only get to guess the controls, and then we roll those controls out through the dynamics to kick things off. That's a key property of these kinds of algorithms: they are always dynamically feasible — they always obey the dynamics, because we're always doing rollouts. Then, to finish off the details: you saw that we had to do regularization in there, which makes sense — we had to do it in Newton's method before, and there's nothing different here; it's a non-convex situation and it's Newton's method. Basically, the cost-to-go and/or the action-value Hessians can become indefinite in the backward pass, and that makes bad things happen: it makes us go uphill instead of downhill on the cost function, which we don't like. So we need to regularize. You definitely need to do it in the full-Newton DDP scenario, and even in iLQR, where you maybe don't strictly need to, it's often a good idea for numerical reasons: floating-point messiness — if you're close to having a zero eigenvalue, that's not good — so padding a little bit with a regularizer is generally a good idea. This is where it gets a little more subtle than the normal Newton step, where it was very straightforward: you just pad the Hessian. Here there are several options. The obvious one is to add a multiple of the identity to the action-value Hessian, ∂²S — which is basically what we do in regular Newton — meaning you just regularize the whole thing, all the time. The second option is to regularize the P's or the G's in the backward pass as needed, which is kind of what we did before: as you go backwards, you check P (or the action-value Hessian) at each time step, and if it's indefinite you regularize it on the fly at that step. The third option: if you go stare at the equations, the only matrix you actually have to invert in that min-over-u step is G_uu — specifically, your d looks like G_uu⁻¹ g_u and your K looks like G_uu⁻¹ G_ux — so that's the only one that really needs to be full rank, and you could just regularize that one. That option is maybe reasonable for iLQR, where you're only doing this for numerical-conditioning reasons, but in full DDP it's probably not the move, because in the DDP scenario your cost-to-go itself can be indefinite, which is bad. And for iLQR: if your cost function is convex, you shouldn't need to do it at all, but you might still want to because of floating-point roundoff and conditioning, so you should probably always just keep a little regularizer in there — same as for Newton in general. Any questions about that?
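One common way to implement any of those options is to bump the offending block until a Cholesky factorization goes through — a sketch, not the notebook's exact logic:

```python
import numpy as np

def regularize(H, beta0=1e-6, max_doublings=60):
    """Make a (possibly indefinite) Hessian block positive definite by adding
    beta * I, doubling beta until a Cholesky factorization succeeds. In the
    backward pass you'd apply this to G_uu (option 3) or to P / the full
    action-value Hessian (options 1 and 2)."""
    n = H.shape[0]
    beta = 0.0
    for _ in range(max_doublings):
        try:
            np.linalg.cholesky(H + beta * np.eye(n))
            return H + beta * np.eye(n)
        except np.linalg.LinAlgError:
            beta = beta0 if beta == 0.0 else 2.0 * beta
    raise ValueError("failed to regularize the Hessian block")
```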
All right, so that's kind of the whole thing — now you know how to do iLQR. Some notes about it, pros and cons, because next we're going to do a whole other class of algorithms for solving this nonlinear trajectory optimization problem, and you should know why you might prefer one over the other. Good things: it can be very fast, both in terms of iterations and wall-clock time — it's often one of the most efficient methods, because it really exploits the full dynamic-programming structure of the problem. Another cool one: because we're always doing these forward rollouts, it's always dynamically feasible. Why you might care about that: if you need to stop early, before the algorithm has fully converged — say you're in a real-time, online scenario — this is very nice, because you can just run the result on the robot and it will obey physics; it might be suboptimal, but it's feasible. And the other cool thing we mentioned: it comes with a time-varying LQR tracking controller for free. Not only does it give you the reference state and control trajectories, it also gives you the feedback LQR controller, and you can play fun tricks where you use that feedback controller — which comes out of this for free — to run the controller at a much higher rate online in between nonlinear solves. So if you can only run the nonlinear solve at, say, 10 Hz, but you need to run the controller on the robot at 500 Hz to stabilize it and get good performance, you can run this at 10 Hz and run the LQR controller at the much higher rate in between. OK, now the bad stuff. One: it does not natively handle constraints. As we've done it so far, there are no input constraints — no torque limits — and no obstacle or state constraints. We'll talk in a second about how to incorporate some of that, but out of the box it doesn't do it. Two — and this is kind of a double-edged sword with the whole dynamically-feasible thing — because you're always doing rollouts, when you kick this thing off you have to guess the controls and roll the dynamics out to get the states; you can't seed it with a guessed state trajectory, even an infeasible one, and that's actually something that's very useful in a lot of situations. In particular it's useful in maze-like situations with lots of obstacles: these methods often struggle with getting around obstacles, and a really common move is to use a sampling-based planner like an RRT to find some kind of pose trajectory that makes it through the obstacles and then use that as an initial guess of your state trajectory. It won't obey the dynamics — the RRT doesn't know anything about the dynamics, it's just sampling over poses — but you can throw it in and use it to initialize a trajectory optimizer that then gets you a dynamically feasible path through the obstacles. That's a very common move, and unfortunately these algorithms are not really well suited to it — so they're kind of bad for maze or so-called bug-trap problems, basically problems with lots of obstacles. And the final one — we haven't really talked about this, but I'm just going to tell you, and we can talk more if you're interested — these sorts of algorithms generally, anything with a Riccati recursion, can suffer from numerical ill-conditioning on long trajectories. It turns out the Riccati recursion is numerically unstable in terms of floating-point error propagation, so for very long trajectories with lots and lots of time steps the numerical error gets worse and worse, and it can get pretty bad if you have a poorly scaled, nasty problem. It's the vanishing/exploding-gradient type of phenomenon.
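As an aside on that tracking-controller point in the "good things" above: the between-solves controller is just the time-varying LQR policy evaluated around the last solution — something like this sketch, where X_ref, U_ref, and K are whatever came out of the most recent solve (names here are illustrative):

```python
def tracking_control(x, k, X_ref, U_ref, K):
    """Time-varying LQR tracking between trajectory-optimization solves:
    feed forward the reference control and correct with the feedback gains
    the backward pass already produced. x is the measured state at step k."""
    return U_ref[k] - K[k] @ (x - X_ref[k])
```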
Any questions about any of that? A student asked about convergence guarantees. Definitely not — the best you can hope for is that it converges to a local optimum from a given starting point, and you can prove results like that, but they're honestly kind of meaningless in practice, because in floating-point arithmetic you can get into trouble. Even when these nice local-convergence guarantees exist, they don't really hold up in practice — they only hold in exact, infinite-precision arithmetic, and in practice life is hard in general. So this is getting into somewhat voodoo territory: this stuff works great, you can definitely make it work, but it's not as bulletproof as the convex stuff — you have to actually try a little harder, do some tuning and trial and error to make it work. Any other questions? OK, so: some quick notes about constraints. Everything we've talked about so far has no constraints, and obviously we care about that, so people have been messing around with how to handle constraints in these kinds of methods for a long time, and there are a handful of techniques — many options, depending on the type of constraint. In particular, if you just have torque limits — just bounds on u — there are some special tricks you can play. The simplest is to use what's called a squashing function, or a saturating nonlinearity. This is a bit of a hack and maybe not the best way to go, but it's a common hack in practice, so I'll show you. Maybe the most common choice is tanh, or really any kind of sigmoid function. The idea is that the u's your solver is optimizing get plugged into this squashing function at the input to the dynamics, and the u's that actually go into the dynamics are the ones after the squashing function. The gist is you want something that saturates out at the limits but is roughly linear in the middle: you write it down so it's bounded between u_min and u_max, and it spits out some ũ that's squashed between the control bounds, and that's the one you actually run. For instance, with tanh it looks like ũ = u_max · tanh(u / u_max): scale everything and rely on the saturating behavior to keep you within the control bounds. This works, and it's a very reasonable hack — it's effective, it will definitely make you satisfy the control limits — but it adds additional nonlinearity to the problem, which can make it harder to solve and take more iterations to converge, so that's not necessarily great. It's also, in general, going to give you a slightly suboptimal solution: because the squashing rounds things off, you get solutions that are smooth and never quite get to the limits. That's actually maybe desirable in practice — it smooths things out and keeps you from going full bang-bang, railing the limits all the time, which is not necessarily bad — but making it need more iterations to converge is not awesome.
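The mechanics of the hack, as a tiny sketch — the unbounded variable the solver works with never hits the dynamics directly:

```python
import numpy as np

def squashed_dynamics(f, x, u_raw, u_max):
    """Squashing-function hack: the solver optimizes an unbounded u_raw, but
    the control that actually enters the dynamics is saturated to lie within
    [-u_max, u_max] by a tanh."""
    u = u_max * np.tanh(u_raw / u_max)
    return f(x, u)
```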
So there's a better option. Remember, we're doing this backward pass and solving a min over u at each step, and what we did before was take the gradient with respect to u, set it to zero, and so on — but you don't have to do that. Remember when we did dynamic programming and Pontryagin, we made a whole point about how, when you solve that min-over-u problem at each time step going backwards, you can just enforce the control limits explicitly right there? You can do that here: instead of the closed-form answer, you solve a little box-constrained QP in the backward pass, which is extremely cheap. So, with slightly abusive notation, you get Δu_k = argmin over Δu of the action-value function S_k(x_k + Δx, u_k + Δu), subject to your torque-limit constraints, u_min ≤ u_k + Δu ≤ u_max. (A student caught a typo on the board there — that should be a Δx — yes, thanks.) So instead of just setting the gradient equal to zero and solving for Δu in closed form, you solve this little guy, and remember S is quadratic — and we make it convex — so it's a tiny QP and really easy to solve. OK, so that's control limits: pretty easy. It turns out state constraints are a lot harder to deal with, and you will see many terrible things out there in the wild if you get into this. It's very common to see people use penalties — just add penalties to the cost function to try to do this. We talked about that at the beginning of the class: penalties are bad, they cause ill-conditioning, and they never really exactly satisfy the constraints anyway, so don't do that. A better option — what do you think you would do, based on the other things we've done in this class, in particular at the beginning when we talked about constrained optimization a whole bunch? How did we teach you to solve QPs? Right: that whole augmented Lagrangian thing. The cool thing about the augmented Lagrangian method is that it takes your constraints and basically transforms them into a quadratic term and a linear term in the cost function: the quadratic term is the quadratic penalty, and the linear term is the Lagrange-multiplier term. So it looks just like a quadratic-plus-linear cost, which is exactly what's already in this DDP/iLQR machinery — the cost-to-go is quadratic plus linear — so the augmented Lagrangian stuff slots right in, super cleanly, and pretty much just works. You can basically wrap the entire DDP algorithm in an outer augmented Lagrangian method, and that's maybe the standard, legit way to do state constraints in a DDP-type method. In fact, one of the first things we did in our lab, four or five years ago, was to develop a solver called ALTRO that does exactly this — an augmented Lagrangian trajectory optimizer; it's just DDP with an augmented Lagrangian method smooshed together. That works pretty well, and it's maybe the best way to handle constraints in this framework.
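In symbols, the augmented Lagrangian wraps the trajectory cost with one linear and one quadratic constraint term — written here for equality constraints c(x, u) = 0; the inequality case adds the usual active-set bookkeeping:

```latex
\mathcal{L}_\rho(x, u, \lambda) \;=\; J(x, u) \;+\; \lambda^\top c(x, u) \;+\; \frac{\rho}{2}\,\|c(x, u)\|^2.
```

Because the multiplier term is linear in the constraint and the penalty is quadratic, both fold directly into the quadratic-plus-linear expansions the backward pass already uses; the outer loop then updates the multipliers between DDP solves (for equality constraints, the standard update is λ ← λ + ρ c).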
A student asked how to reconcile those two points — with the augmented Lagrangian, aren't we still using a penalty term? We talked about this at the beginning of class. The deal is: with a plain quadratic penalty, to exactly satisfy the constraints in theory you have to jack the penalty weight up to infinity, and obviously you can't do that; once it gets really big it makes everything ill-conditioned and things just barf — that's why you don't want to do it. With the augmented Lagrangian, we introduce that extra multiplier term, and the gist is that you use the penalty to help you estimate the Lagrange multiplier, updating the multiplier estimates at each AL iteration. Because you have that extra linear term with the multiplier estimate in it, you converge on the true Lagrange multiplier, at which point the penalty's job goes away — it converges, it turns out, at some finite value of the penalty weight, so you don't have to drive the penalty up huge, and that alleviates the ill-conditioning. We talked about this a while back, so go check out that lecture again if you want, but that's the reason for the augmented Lagrangian: it fixes the ill-conditioning issue. Anybody else? Another student said I didn't sound totally convinced this is a good method and asked about the downsides. No — it works; if you're going to do DDP and you need to handle state constraints, this is probably the best way to go — definitely better than putting hacky penalty stuff in there. There have been some attempts at doing interior-point methods inside DDP; it's not great and I would not recommend it — it turns out you can't really make the primal-dual stuff work well there (we haven't talked about that in this class; come talk to me if you want to know more; there are a couple of papers that have tried it and it doesn't work very well). So if you're going to do state constraints in DDP, do it this way — the augmented Lagrangian is the move. Another question: with the squashing function, can we always be sure the trajectories will be feasible? Yes — by construction, ũ can't exceed the control bounds. But to be clear, the squashing-function thing is a hack: it works, but it's a hack, and if you're going to be serious about this you should probably do the box-constrained QP thing instead, which is the more legit way to do it. And another question: in that case we also have the feedback gains — how do those work, given that we're clamping the controls? That's a very good question, and there's a subtle gotcha there: you also have to clamp the controls in the forward pass, and there are a couple of ways to do that. If you want the details, there's a paper by Emo Todorov and a couple of others called "Control-Limited Differential Dynamic Programming" — we can send it out after this — which was the first paper to really do this method, from about ten years ago. What they do is solve that clamped problem in the backward pass, and then they also zero out the rows of the K matrix that correspond to the saturated elements, which is effectively the same as clamping them in the forward pass. Check out the paper if you want the details. OK, anything else about that? Let's see how we're doing on time. There's some stuff we could say about minimum-time problems — does it make sense to do that right now? I don't know, I'm feeling tired, and I'm seeing some sleepy eyes out there; I'm with you, I'm not feeling super energetic either, and we've got about ten minutes left, so maybe it's time to call it for today. We'll do that next time.