Transcript for:
Nonlinear Trajectory Optimization and DDP Methods

okay so um last time we finished off the DDP story — talked about some of the implementation details, like how the backward pass works, how to do a line search, all that good stuff — and then we talked a little bit about constraints in DDP. Essentially there are some hacky ways to handle just control limits, but if you care about state constraints, the best thing to do is basically wrap all of DDP in an augmented Lagrangian method and do the AL stuff we already talked about, where you treat DDP as the inner solve of the augmented Lagrangian method. Okay, so today, just one more quick thing to mention that's sort of a DDP thing but also a more general thing that applies to the stuff we're going to talk about later too: how to think about minimum-time or free-time problems. These come up fairly often in certain applications — specifically, we're talking here about "get to the goal as fast as possible." Places where this shows up are obviously in racing — drone racing, which has been hot recently, like autonomous drone racing; you've probably seen the recent Science papers about people getting the fastest-ever times and beating humans at drone racing, which is an example of this — also race car stuff. But these show up in other applications too; a really common one is minimum time to climb for airplanes, trying to get to altitude as efficiently as possible. So we'll talk about that a little bit today — it's not super obvious how to do it with the algorithms we've been talking about. And then we're going to switch gears today: I mentioned when we got into DDP that there are sort of two broad classes of algorithms for nonlinear trajectory optimization. So far we've talked about iLQR and DDP — those are maybe prototypical representatives of what are called
indirect methods, and today we're going to talk about the most prototypical example of what are called direct methods — direct trajectory optimization. Roughly speaking, "indirect" refers to leveraging dynamic programming and Pontryagin-style optimal control ideas to attack the problem — which is obviously what we did with DDP; it's differential dynamic programming, so it's dynamic programming stuff. Direct methods basically refer to just straight-up discretizing the problem and posing it as a more-or-less standard-form nonlinear optimization problem, using an off-the-shelf nonlinear programming solver, say — which is what we're going to talk about today. As part of that discussion we'll talk about what that means, and then we'll briefly cover one of the very standard techniques for solving standard-form nonlinear programs, which is sequential quadratic programming. It's what it sounds like: you take your nonlinear problem, quadratize and linearize some stuff to make it look like a QP, solve the QP to get a step, take the step. It's like a generalization of Newton's method to the inequality-constrained context. We'll talk about that real quick, and that'll lead us into direct collocation, which is sort of the most standard, common, widely known direct method. Okay, any questions about that stuff, what we did last time, or what's coming up? Everyone good? Okay, let's do it. So first we'll talk a little bit about the minimum-time thing, which is applicable in both settings — handling free or minimum time. If you wanted to solve a minimum-time problem in continuous time, the way you would actually write it down looks like this: we've got a min over our x trajectory, our u trajectory, and then, basically,
our final time t_f as well. And your cost function looks like this: we have our integral from t_0 to t_f, say — and what goes inside the integral? How do I get the total time, which is what I want to minimize? It turns out it's just "1 dt": your stage cost is one. If you integrate 1 dt, what comes out is the final time — it just adds up the time, and that's what you're after. So that's the cost function, subject to your dynamics, the usual stuff, and then maybe x at t_f equals your goal as a constraint, for example, and you might have your torque limits and things like that. So this would be a pretty representative minimum-time problem. Cool. So this is weird for a couple of reasons. When I discretize this to solve it on a computer, if I did it in a very naive way with fixed time steps, this is literally telling me that I want to minimize the number of time steps — which is a weird thing that isn't great to write down. If I were literally doing that, it would be some weirdo mixed-integer thing, and it also makes the size of the problem change. You basically don't want to write that down as a numerical problem to solve on a computer — we really don't want the size of the problem, the number of knot points, changing; I don't even really know how to do that effectively in an algorithm. So how else might we do this? Anyone have any ideas? Thoughts? Yeah — you got it. We want to do this with a fixed number of knot points, and the move is you change the Δt's. We basically want to make the Δt's decision variables, and the quick-and-dirty way to do that is to make the time steps control inputs.
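For reference, the continuous-time minimum-time problem described above, written out (with t_0, t_f, and x_goal as on the board):

```latex
\begin{aligned}
\min_{x(t),\,u(t),\,t_f} \quad & \int_{t_0}^{t_f} 1 \, dt \;=\; t_f - t_0 \\
\text{s.t.} \quad & \dot{x}(t) = f(x(t), u(t)) \\
& x(t_f) = x_{\mathrm{goal}} \\
& u_{\min} \le u(t) \le u_{\max}
\end{aligned}
```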
So if you have an h_k for each time step — the Δt for that step — we're going to make those new controls. Say we have our discrete-time dynamics; we're going to write this as f(x_k, ũ_k), where this is your Runge-Kutta method — your RK4 step or whatever — and ũ_k is our original u_k stacked with h_k, where h is the time step from the Runge-Kutta method. So I've got my continuous dynamics inside this f_RK — your Runge-Kutta thing with h in it — and I just make h a new control input and stack it up like this. Now, as far as the control problem is concerned, h is something the solver gets to play with. The other thing you want to do in this case — there are lots of little gotchas — is scale the cost. Remember, this cost is supposed to approximate the integral, and the bounds on the integral are changing, so what you really want to do is scale all the stage costs by h. And then it all makes sense: going back to the continuous version, where the integrand of the stage cost was one, if we do this properly, the sum over the h's is all that shows up, and that's the total time. Yeah — question? Yes, this is a little subtle. The deal is, the Runge-Kutta thing is only so accurate: it's really accurate if the time step is small, and if you stretch the time steps out really big it becomes pretty bad — your integration accuracy, your Runge-Kutta thing, is going to be bad. So what can happen here is the solver can effectively cheat the physics by stretching some time steps out super big, and you don't want that. How do you fix it? You definitely want to do a line search here, but the simple thing is I can put inequality constraints on h — it looks just like control limits. We always have torque limits, right? So I can put upper and lower bounds on h as an inequality constraint in the problem; it looks exactly the same as torque limits, no problem. So I can just bound h and say, okay, you can't make h too big. Makes sense? Yeah, that's easy to deal with. And yeah — there's a good reason for this, which is a little subtle. You could make all the h's the same if you wanted to, but there's a good reason to make them all separate. Okay, we haven't talked about this a ton, but basically, if I make them all separate h's, this plays nice with all the algorithms we know: in particular it plays nice with Riccati, and it plays nice with DDP — it's a simple modification. If I make one h for the whole problem, you can't do Riccati anymore — specifically, one global h couples across all the time steps, it breaks that setup, and I can't use Riccati. You can still do it, but remember we had that nice stair-step Jacobian structure? Because h affects all the time steps, it makes that Jacobian have fill-in, which is annoying. So it turns out that having them all be different, and then constraining them with box constraints to say they can't be too big or too small, works better practically in the algorithms we know — if you don't believe me, try it. But basically, one global h messes up the Jacobian structure and you can't do Riccati.
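As a minimal sketch of this reformulation — the dynamics function, the RK4 step, and the bounds here are illustrative stand-ins (a double integrator), not from the lecture:

```python
import numpy as np

def f(x, u):
    # continuous-time dynamics; a double integrator as a stand-in
    return np.array([x[1], u[0]])

def rk4_step(x, u, h):
    # one explicit RK4 step of size h
    k1 = f(x, u)
    k2 = f(x + 0.5 * h * k1, u)
    k3 = f(x + 0.5 * h * k2, u)
    k4 = f(x + h * k3, u)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def f_discrete(x, u_tilde):
    # u_tilde = [u; h]: the step size rides along as an extra control
    u, h = u_tilde[:-1], u_tilde[-1]
    return rk4_step(x, u, h)

def stage_cost(x, u_tilde):
    # stage cost scaled by h so the sum approximates the integral;
    # for pure minimum time the integrand is 1, so the cost is just h
    h = u_tilde[-1]
    return h * 1.0

# bounds on h enter the solver exactly like torque limits (values arbitrary):
h_min, h_max = 0.01, 0.1
```

Summing `stage_cost` over all steps then gives the total time, and the box constraints on h keep the solver from cheating the physics.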
Oh — your hand was up before? That was supposed to be RK for Runge-Kutta, yeah — that's the discrete-time dynamics, like RK4, say. Yep. So the key idea here is you're decoupling the number of knot points from the total time. You can make the number of knot points whatever you need — 10, 100, a thousand — but by letting the solver choose the h's, it can shrink or stretch the actual wall-clock time while keeping the number of discrete knot points the same, so the problem size doesn't change. This is one optimization problem: I can write it down once, throw it at the solver, and get the answer. Yeah — you definitely could. The reason is, if you make N too small, then effectively h would have to be really big, which is bad for accuracy; and also, if you put an upper bound on the size of h, you could just make the problem infeasible, so it can't find an answer. So yeah, absolutely, that would be bad — you shouldn't do that; you should probably make N conservatively big in general. Anybody else? Yeah — lots of applications. I kind of mentioned racing: if you want to race a car around a track and find the minimum lap time, you would do this. Drone racing, which has been hot recently — you do this. Classical examples would be minimum time to climb for airplanes, like fighter jets; this used to be a big deal — how do I get up to 30,000 feet as fast as possible? There are lots and lots of examples where we care about getting to a goal state as fast as possible. It shows up all the time — we've done this for quantum problems; it turns out for quantum computers you care about this, because the longer it takes, the more weird noise and decoherence effects you get, so you want to do it as fast as possible. Tons of examples where you care about this. If you care about going fast, this
is what you want, essentially. Make sense? Cool. All right, so a couple more notes. One is that, the way we've written this down, this is always a nonlinear, nonconvex problem — even if the dynamics are linear. Because of what we've done: say we have linear dynamics; if we make the time step a control, it's now nonlinear in the new controls, because I have (Ax + Bu) all times h, and if h is a decision variable I now have x·h and u·h terms happening, which are nonlinear. Make sense, everybody? So these are basically always nonlinear problems, which is annoying, even if the dynamics are linear — because the modified dynamics are nonlinear. Yeah — if you have equality constraints in your problem, the only kind of equality constraints that are convex — that are "legal" — are linear ones. It comes back to the geometry: draw a straight line between two points in the set, and the line has to stay in the set. With any kind of curved equality I can't do that, because I can't draw a straight line that stays on a curved surface. So the only legal convex equalities have to be affine — that's the only kind of surface that contains the line; it's sort of a trivial thing. Okay. And then the other note — we already kind of talked about this — is that you basically need to put constraints on h to make this work in practice; otherwise the solver will try to stretch the time steps out in weird ways and cheat the physics. Oh yeah, also: if you don't have constraints on this, the solver can make the time step negative, which is obviously bad and wrong — you can't have negative time. So you at least need positivity constraints on h when you do this; it will do weird stuff otherwise, so you've got to have constraints. Yeah — what do you mean, go back and
forth? Oh yeah, it'll do super weird stuff. Just thinking about it from a cost-function perspective: I'm summing up the time steps, so if I can have negative time, I could make that sum zero. Very weird, and obviously wrong — negative times are "optimal" with respect to this cost function, so it'll naively try to make the time negative. Okay, so we don't want that. All right, cool. Any questions about this? It's very easy — literally just take your h from Runge-Kutta, stack it in your u, and now you have this ũ and you can do minimum time. Or not just minimum time: you can weight time against all the other stuff. Say you need to get to a goal state but you don't know what a good time for that is — then you do this, and you don't even have to weight the time; you can just let it be free, and it'll figure out the most efficient way to get there. I don't know if it's going to take five or ten seconds — let the solver figure it out for you. The free-time thing is a very useful thing to do. Okay, with that, switching gears: now we're going to talk about direct methods — direct trajectory optimization. The basic idea here is, whereas before with DDP there was all this dynamic-programming-ish stuff and all this complicated control theory happening, here it's a lot simpler. The basic strategy is you literally just discretize — or transcribe — the continuous control problem in the most obvious way possible. This is sometimes called direct transcription. We just take that continuous problem and turn it into a sort of standard-form nonlinear program, also called an NLP. Sort of annoying to me — to me NLP means nonlinear program, but now it means natural language processing, so I can't say that anymore around here
— all the computer scientists think it's natural language. But okay, here's what we mean. A standard-form nonlinear program is the following thing: it's min over x of f(x), which can be nonlinear, subject to some equality constraints, which can be nonlinear — c(x) = 0 — and then some inequality constraints, which can also be nonlinear — d(x) ≤ 0. I would call this a "standard" NLP. For our purposes, f is going to be your cost function; the equalities for us are basically going to be the dynamics constraints — they can be other things too, like a goal state — and the inequalities for us are going to be torque limits, obstacles, that kind of stuff: the other constraints. So what this is all about is taking the continuous optimal control problem, like we have up there with the integral, discretizing everything in the most straightforward way possible, and stuffing it into f, c, and d so it looks like this. Once it looks like that, I can throw it at lots of standard numerical solvers on my computer. A couple of notes. When we do this, it's pretty standard — if we're using Newton's method — to need everything here to be C2 smooth: it's got to have first and second derivatives so I can do Newton. So we tend to assume all the functions f, c, and d are C2, meaning I can take two derivatives and nothing weird happens. And then there are lots of off-the-shelf solvers for problems that look like this — large-scale NLP solvers — that we can just throw this at if we can get it to look like this.
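To make the transcription concrete, here's a hypothetical packing of a small trajectory optimization problem into this f/c/d form. The double integrator, forward-Euler defects, fixed h, and all the names here are illustrative choices, not from the lecture:

```python
import numpy as np

# min f(z)  s.t.  c(z) = 0,  d(z) <= 0
# double integrator, N knot points, z = [x_1, u_1, ..., x_{N-1}, u_{N-1}, x_N]
N, nx, nu, h = 5, 2, 1, 0.1
u_max = 2.0
x_goal = np.array([1.0, 0.0])

def unpack(z):
    xs, us = [], []
    idx = 0
    for k in range(N - 1):
        xs.append(z[idx:idx + nx]); idx += nx
        us.append(z[idx:idx + nu]); idx += nu
    xs.append(z[idx:idx + nx])
    return xs, us

def f_cost(z):
    # quadratic control effort, scaled by the time step
    _, us = unpack(z)
    return sum(h * float(u @ u) for u in us)

def c_eq(z):
    # dynamics defects (forward Euler for brevity) plus the goal constraint
    xs, us = unpack(z)
    defects = [xs[k + 1] - xs[k] - h * np.array([xs[k][1], us[k][0]])
               for k in range(N - 1)]
    return np.concatenate(defects + [xs[-1] - x_goal])

def d_ineq(z):
    # torque limits written as d(z) <= 0
    _, us = unpack(z)
    u = np.concatenate(us)
    return np.concatenate([u - u_max, -u - u_max])
```

Any NLP solver that accepts this (f, c, d) triple can then be handed the problem as-is.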
So some examples. Maybe the most common would be IPOPT, which is open source, free, and was developed at CMU in Larry Biegler's lab in chemical engineering. Another one that's very popular in the controls community, and specifically the aerospace community, is SNOPT, which stands for Sparse Nonlinear Optimizer; that one's commercial, non-free — like 500 bucks for a license, I think, last time I checked. And then another one that's pretty big is called KNITRO, which is also commercial. So for obvious reasons we're going to use IPOPT in here — it's free, and it's pretty good; it works pretty well. And yeah, IPOPT, as the name implies, is an interior-point method — it stands for Interior Point OPTimizer. SNOPT is an active-set method — remember we talked about this kind of stuff before — and KNITRO actually does both: it has a whole bunch of options and can do both interior-point and active-set stuff, and it switches back and forth whenever it's convenient. And SNOPT is an SQP method; KNITRO kind of does both. One of the most common strategies you'll see for doing this stuff — also because we already know how to solve QPs in here; we talked a lot about QPs — is sequential quadratic programming, which just means you take your nonlinear problem, make a starting guess, linearize a bunch of stuff, turn it into a QP, solve the QP, take a step. It's sort of a generalization of Newton's method in a lot of ways, so it's a pretty natural thing to do. Let's talk about that — SQP. Okay. So the idea is we're going to take a second-order Taylor expansion of not just the objective function f, but the Lagrangian — that's the full Newton version, to be clear; it includes the curvature of the constraints — and then we linearize the constraints c(x) and d(x) to approximate the NLP as a QP, which we know how to solve. So let's write that down. We're minimizing over Δx:

min over Δx of f(x) + gᵀΔx + ½ ΔxᵀHΔx, subject to c(x) + C Δx = 0 and d(x) + D Δx ≤ 0.

So we Taylor-expand a bunch of stuff, and in the Δ's this is a QP, where H = ∂²L/∂x², g = ∂L/∂x, C = ∂c/∂x, and D = ∂d/∂x. To be clear, the L we're talking about is the Lagrangian — in this problem we'd have λ and μ multipliers to handle the equalities and the inequalities, so L(x, λ, μ) = f(x) + λᵀc(x) + μᵀd(x). We can write that QP down by taking all those derivatives, and we already know how to solve QPs, so we use our favorite QP solver to solve this thing for the Δ's. There are some details in here, but basically we're going to use that QP solution to compute a primal-dual search direction, or step — that's going to give us some Δz consisting of Δx and then Δ's on the multipliers as well. And then with that we can do a line search, all the usual stuff, with a merit function; we already know how to do all that. And to be clear, this should look extremely familiar: if I eliminate the inequality constraints — if I get rid of that d line — this is just Newton's method with constraints, which we already did at the beginning of class. Specifically, if I eliminate the d's and the μ's, this Δ is just Δx, and it's exactly what we did early in the class. Same stuff. With only equality constraints, this is Newton on the KKT conditions, which we already did:

[ H  Cᵀ ] [ Δx ]   [ -g    ]
[ C  0  ] [ Δλ ] = [ -c(x) ]

just to refresh our memory — this thing is a KKT system, and we already did all this stuff.
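A sketch of that equality-constrained special case — one Newton step on the KKT system. The toy problem and function names here are made up for illustration (c is affine, so a single step lands on the solution):

```python
import numpy as np

# min f(x)  s.t.  c(x) = 0, with f(x) = x0^2 + x1^2, c(x) = x0 + x1 - 1

def grad_f(x):
    return 2.0 * x

def hess_L(x, lam):
    # d^2L/dx^2 = d^2f/dx^2 + lam^T d^2c/dx^2 (second term vanishes: c is affine)
    return 2.0 * np.eye(2)

def c(x):
    return np.array([x[0] + x[1] - 1.0])

def jac_c(x):
    return np.array([[1.0, 1.0]])

def kkt_step(x, lam):
    H, C = hess_L(x, lam), jac_c(x)
    g = grad_f(x) + C.T @ lam
    # [ H  C^T ] [ dx  ]   [ -g    ]
    # [ C  0   ] [ dlam] = [ -c(x) ]
    K = np.block([[H, C.T], [C, np.zeros((1, 1))]])
    d = np.linalg.solve(K, -np.concatenate([g, c(x)]))
    return x + d[:2], lam + d[2:]

x, lam = np.array([0.0, 0.0]), np.zeros(1)
x, lam = kkt_step(x, lam)   # one step solves this quadratic problem exactly
```

The SQP iteration is the same picture with the inequality rows kept in the subproblem and a QP solver in place of the single linear solve.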
So the only difference is that now we're adding the inequalities into the subproblem directly — it's sort of generalizing this to the inequality situation — and if you already have a QP solver that can handle it, you just throw it at the QP solver. So this becomes a QP instead of a single linear-system solve, and then we get this search direction over both sets of multipliers and do all the same stuff as before. Cool — any questions about that? Okay. So yeah, basically think of this as a generalization of Newton to handle inequalities. In principle you can use any QP solver you want for these subproblems, but good implementations try in general to warm-start the QPs as much as possible based on the last iteration. It turns out you can do clever things here — the last QP looks something like the next one in general, so you can play lots of warm-starting games. In good implementations of this stuff, particularly SNOPT, which is maybe the industrial-scale version of this, the QP solver is extremely specialized and very tightly integrated with the overall NLP solver, with tricks they play to efficiently leverage prior information in the next QP solve. And then the other thing to note here is that for good performance, especially on trajectory optimization problems, you really need to take advantage of all the sparsity structure. We talked about how the KKT system for the LQR problem is super sparse with block structure, and how Riccati efficiently harnesses that by doing back-substitution. Similarly, with all of these solvers, if you want them to not suck, you have to take advantage of that sparsity, and most of the good large-scale solvers do this to some degree — with SNOPT it's even in the name, Sparse Nonlinear Optimizer — so they're doing semi-smart sparsity stuff.
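A quick way to see that sparsity: the stacked dynamics-constraint Jacobian is block-banded. A sketch with placeholder A, B blocks and arbitrary sizes:

```python
import numpy as np

# defect_k = x_{k+1} - A x_k - B u_k gives a Jacobian that is nonzero
# only in a band near the diagonal
N, nx, nu = 20, 4, 2
nz = N * nx + (N - 1) * nu            # stacked decision vector [x_1,u_1,...,x_N]
J = np.zeros(((N - 1) * nx, nz))
A, B = np.ones((nx, nx)), np.ones((nx, nu))   # placeholder Jacobian blocks
for k in range(N - 1):
    r, col = k * nx, k * (nx + nu)
    J[r:r + nx, col:col + nx] = -A                             # d(defect)/d(x_k)
    J[r:r + nx, col + nx:col + nx + nu] = -B                   # d(defect)/d(u_k)
    J[r:r + nx, col + nx + nu:col + 2 * nx + nu] = np.eye(nx)  # d(defect)/d(x_{k+1})

density = np.count_nonzero(J) / J.size
# note: making one global h a decision variable would add a dense column
# touching every block row — the fill-in mentioned earlier
```

Even at this toy size the matrix is over 90% zeros, and the fraction of nonzeros only shrinks as N grows, which is why exploiting the band structure is the difference between tractable and hopeless.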
Yeah — question: yes, if you're naive about this, these are huge problems. They might have, you know, 100,000 decision variables, so we're talking about factorizing a 100,000-by-100,000 matrix if you treat it as dense. You can easily write down trajectory optimization problems with a million variables, and if you treat that naively as a million-by-million dense matrix, you'll never be able to solve it. The only way it's even tractable is that most of it is zeros — it's a block-banded kind of thing where only the middle band is nonzero — so you kind of have to be smart about that. And yes, absolutely, it's wall-clock time, it's also memory, and at a certain point it's just totally intractable if you don't do this. Cool — so that's what you need for good performance. Oh, there's one other note: if the constraints in the original problem are convex — say you've got conic constraints in the original problem — then as long as they're convex, you can actually pass them straight into the subproblems. Here we talked about just SQP, but let's say you have, I don't know, a rocket landing where you have conic constraints on the thrust vector but full nonlinear dynamics. Those conic thrust constraints, in principle, you can just throw right into the subproblems as-is, and instead of SQP you'd have SCP. As long as they're conic, or whatever convex form, and you have a good solver that can handle those cones, you can handle them right in the individual subproblems. This generalization of SQP is called sequential convex programming, or SCP, and it's been kind of hot for the last few years — it's still a pretty active research area for things like rocket landing and stuff like this. In fact, I think this past year there was a big review article in IEEE Control Systems Magazine about SCP methods — it's sort of hot.
So specifically: conic problems are really common, and the generalization to sequential convex programming is where convex inequalities are passed directly to the subproblem solver. And yeah, this SCP stuff is still a pretty active research area. Roughly speaking, SQP methods were developed in the 80s and into the 90s; then, in the mid-to-late 90s and into the 2000s, interior-point methods were the hot thing — kind of all of the state of the art, like IPOPT, is mid-2000s — and now this SCP stuff is becoming popular as a research area. Okay, any questions about that stuff? Yeah — yeah, absolutely. So what would it be — going back to here, the same sort of comments apply: you'll maybe get slightly worse tail convergence, where the difference is maybe a handful of iterations, but it's often cheaper and easier to implement, so people do it all the time. I'd say Gauss-Newton stuff in general is super common — so common that people often don't even mention it; it's a very, very normal thing to do, whenever you feel like it. In fact, in SQP methods people do try to use the full Hessian of the Lagrangian a lot, but it's actually extremely problematic. The reason is, remember we talked about this whole Hessian-of-the-Lagrangian thing often becoming indefinite and nonconvex? It's very annoying to deal with that when you're trying to pass these subproblems to convex solvers, so people often just do Gauss-Newton methods. So yeah, basically you should probably just always do Gauss-Newton, if I'm being honest — if you're doing this stuff yourself, it's just way, way easier and tends to cause fewer headaches. It's not usually worth it to do the full Newton thing, with some exceptions. Cool. All right, any other questions about this stuff? Okay, so now we're going to talk about direct collocation — now that we know how to solve these things, we can
actually do it. Also, to talk about the difference for a sec: with DDP, the method sort of comes with the solver — the solver is very much tied up with the algorithm; they're all kind of one thing. Here, you set the problem up, you discretize it, and then it's an NLP and you can use any solver you want — IPOPT, SNOPT, whatever. So for indirect methods the solver is intimately tied up in the whole thing, whereas for direct methods it's all about: you just discretize and then call a solver, and those two are kind of separated. All right, so a few things to note — there are some cool things about these. One is that, so far, we've used explicit Runge-Kutta for everything, which looks like this: you've got continuous-time dynamics, and we convert them to a discrete form where I can plug in an x_k and a u_k and just get out an x_{k+1}. If you're doing rollouts, which we are doing in DDP, this makes a lot of sense, because you want to be able to simulate forward in time efficiently. But in these collocation — in these direct methods — we're not doing rollouts anymore; we're just writing down the dynamics as equality constraints and shoving them into a solver. So we don't actually care about that now, and it turns out this is potentially a big win. To be specific about this: we're just enforcing these things as equality constraints, and that frees us up to use integration methods that aren't explicit — i.e., implicit Runge-Kutta methods. Let's write this down. We have something like d_k(x_k, u_k, x_{k+1}, u_{k+1}) = 0 — our constraints are of the form c(decision variables) = 0. So I can obviously put
that in that form with my explicit Runge-Kutta step by moving the x_{k+1} over and setting the whole thing to zero — but I can also do anything I want, so I can have some more complicated nonlinear relationship. Remember at the beginning of class we talked about implicit Runge-Kutta methods — like implicit Euler — where you needed to do Newton on the update equation just to solve for the next time step? That's annoying if you have to do a rollout: if I'm doing a rollout, I have to do a bunch of Newton solves just to get the next time step. But in this context, where I stack the whole problem up and these just become equality constraints in the problem, I don't care — instead of doing the Newton stuff myself, I'm letting the solver do it for me, kind of implicitly, as it solves the whole problem. So I get this for free; I don't have to worry about it. Okay, so that's fun. So what we're going to do now is get into the details of how the classic direct collocation method works — this is often called "dircol," if you're cool and in the know. The idea, at a basic level, is that you represent all your trajectories as polynomial splines, and you end up enforcing the dynamics constraints as constraints on the spline derivatives, which are easy to calculate in closed form. That's the high-level idea. Classic dircol, which we're going to do now, makes a couple of particular choices: it uses cubic splines for the states and piecewise-linear interpolation for the controls. These are just choices — you can do whatever you want; this is kind of the lowest order that makes sense. And there are actually some cool things that happen with these particular choices, which we're going to show you, that are kind of advantageous — but you can totally go higher if you want to. It turns out that for robotics applications, this is kind of a sweet spot.
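For instance, an implicit-integrator defect constraint might look like this — the pendulum dynamics here are an assumed stand-in, and the solver, not us, drives the residual to zero:

```python
import numpy as np

def f(x, u):
    # pendulum dynamics as a stand-in: x = [theta, theta_dot]
    return np.array([x[1], -9.81 * np.sin(x[0]) + u[0]])

def implicit_midpoint_defect(xk, uk, xk1, h):
    # d_k(x_k, u_k, x_{k+1}) = 0 enforces an implicit-midpoint step;
    # no inner Newton solve needed — the NLP solver handles the
    # implicit equation as just another equality constraint
    xm = 0.5 * (xk + xk1)
    return xk1 - xk - h * f(xm, uk)
```

An explicit RK step rearranged as x_{k+1} − f_RK(x_k, u_k) = 0 fits the same template; the point is that implicit methods now cost nothing extra.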
reason for this. In aerospace applications — interplanetary trajectories, flying-to-Mars kind of stuff — it is common to use much, much higher-order splines; for those interplanetary problems at JPL, I think eighth-order splines are common, which is really high order. The reason is that those space trajectories are very smooth, so you can get away with representing them as high-order polynomials, whereas in robotics we wouldn't do that, for the following reason: for our applications the dynamics are often not that smooth, because we have impacts, grasping, and other non-smooth things going on. The deal with polynomial interpolation is that to represent something with, say, an eighth-order polynomial, the underlying function I'm trying to fit really needs to be something like C⁸-smooth — it needs to have eight continuous derivatives. In robotics applications that's just not a thing, so for us it wouldn't make sense. If you try to fit smooth polynomials to jerky functions, it just doesn't work — there's no win. You get these weird ringing effects — people have seen the Runge phenomenon? You might have seen the same thing with Fourier transforms: try to take the Fourier transform of a step function and you get this crazy ringing, and you have to go to really, really high order. The same thing happens here. So if you have non-smooth, step-function, jerky things in either the controls or the states, higher-order polynomials basically don't make sense. This is a good sweet spot for robotics — maybe if your problem is super smooth it makes sense to go higher order, but for us, not so much. Let's maybe write something down
about that. So: very high-order polynomials are sometimes used — for example, in spacecraft trajectory stuff — but it's not common for us and we're not going to do it, although the same machinery applies and you can figure it out if you need to. Again, the DIRCOL stuff we're about to do is very, very common and a sweet spot for robotics. So let's write these down. On the states, we'll do a very cartoony version: say we've got x(t), which might look wiggly like this, and say this point is t_k and this one over here is t_{k+1}. The t_k's we call knot points — we've done this already — so this would be x_k and this x_{k+1}. Those are knot points. Now we're going to introduce another concept for this collocation stuff: points in between the knot points. The knot points are the things that show up in your optimization problem — those are the decision variables — but for these collocation methods we also end up needing to evaluate the dynamics in between the knot points, and we call those collocation points. In particular, for the DIRCOL algorithm we're about to do, we end up evaluating the dynamics halfway between the knot points, at t_{k+1/2}, and that point is called a collocation point. Okay, that's the x's. Then, similarly, we said we're just going to do piecewise-linear interpolation on the u's, so those look like straight-line chunks — u_k here, u_{k+1} there. That's what this stuff looks like. And similarly, for that collocation point, I can do the same thing on the u's: I'll evaluate u_{k+1/2} over there — same kind of thing. That's right — yeah, that's what
DIRCOL assumes, yeah. A first-order hold is the same as saying linear interpolation, or a linear spline if you like — although we wouldn't usually call it that. Same deal, though. You can go higher order; it's just that in a lot of our robotics applications it doesn't make sense. Even linear interpolation can be kind of dumb in some cases, if you're really thrashing between your control limits — which happens. And that's actually one of the good things about this: it smooths stuff out and doesn't let you totally thrash your actuators. Okay, so that's the cartoon of what's going on; now we're going to write some of this down, just to give you an idea of how you go about deriving these things. I'm going to tease it a little — we're not going to do the full details — but the idea is that between a pair of knot points, I'm representing the state trajectory as a cubic: x(t) = c₀ + c₁t + c₂t² + c₃t³ — a constant term, a linear term, a quadratic term, and a cubic term, written with these c coefficients. And I can differentiate that cubic polynomial in closed form, analytically, which turns out to be very useful: ẋ(t) = c₁ + 2c₂t + 3c₃t². Makes sense — super easy. The thing that's cool about cubic splines, and why they're so popular in so many applications, is that a cubic spline is uniquely determined by knowing the endpoints and the derivatives at the endpoints. The reason that's true is that there are four coefficients and four pieces of endpoint data, so it gives you a good way of interpolating if I know x and ẋ at the endpoints — which is convenient for lots of dynamics applications. So, just to give you guys a little teaser, if you haven't seen this stuff before,
for general interest, the way you go about this is: I can take those two equations I just wrote down and write them in matrix form. I have some big matrix multiplying all the c's — if I stack up a vector of the coefficients c₀, c₁, c₂, c₃ over here — and it spits out x_k, ẋ_k, x_{k+1}, ẋ_{k+1}. The idea is I want to be able to uniquely determine the polynomial from the endpoint stuff, so I write this down as a matrix equation, and what I'm doing is literally plugging in those two equations and going across. At t = 0, x is just c₀ and ẋ is just c₁ — I'm just picking off the coefficients. At t = h, evaluating these things, the x row looks like 1, h, h², h³, and the ẋ row is 0, 1, 2h, 3h². So I write this matrix equation down, and then I just invert that matrix, and now I have everything I need to work with just the endpoint information — I don't need to worry about the c's anymore. Basically, I'm solving for the c's in terms of this stuff, but by having this matrix I never actually need the c's: anything that involves these polynomials I can do using the endpoint data instead. We're never going to do anything with the c's — we're going to use the x's and ẋ's for everything. To be clear — I'll just write it down — you can do this by hand, or use a toolbox to compute it, whatever, but at the end of the day you just work with this matrix. This is a general-interest thing so you know what's going on; from now on we're never going to touch the coefficients, we're only going to use stuff like this and write down constraints that are just between the
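That bookkeeping can be sketched in a few lines (a sketch, assuming the t = 0 / t = h convention above; the function name is mine): build the 4×4 matrix mapping cubic coefficients to endpoint data, and invert it to recover coefficients from endpoints.

```python
import numpy as np

def endpoint_matrix(h):
    """A maps cubic coefficients [c0, c1, c2, c3] to the endpoint data
    [x(0), xdot(0), x(h), xdot(h)] -- one row per equation."""
    return np.array([
        [1.0, 0.0, 0.0,     0.0],         # x(0)    = c0
        [0.0, 1.0, 0.0,     0.0],         # xdot(0) = c1
        [1.0, h,   h**2,    h**3],        # x(h)
        [0.0, 1.0, 2.0 * h, 3.0 * h**2],  # xdot(h)
    ])

# Coefficients from endpoint data, so we never carry the c's around:
#   c = np.linalg.solve(endpoint_matrix(h), [x_k, xdot_k, x_kp1, xdot_kp1])
```

Round-tripping through the matrix and its inverse recovers the coefficients exactly, which is what lets everything downstream work purely with knot-point values.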
knot points. The details here aren't super important, but this also goes the other way: I can carry around just the x's and ẋ's at the knot points plus this matrix, and if I want to know x at the collocation points, I slap that evaluation in front of this matrix and get the collocation points from the endpoint stuff. Make sense? That's the gist — we never use the polynomial coefficients anymore. Okay, cool. Now we're going to run through the rest of this. We evaluate the spline at the collocation point, t_{k+1/2}: looking up here and plugging in t = h/2, I get the following — we'll call this x_{k+1/2}, which is x evaluated at t_k + h/2. Using the spline stuff, and again in terms of the endpoints rather than the coefficients, I get x_{k+1/2} = ½(x_k + x_{k+1}) + (h/8)(ẋ_k − ẋ_{k+1}), which makes sense. For those ẋ's I now plug in the dynamics — the continuous dynamics, to be clear — so this becomes x_{k+1/2} = ½(x_k + x_{k+1}) + (h/8)[f(x_k, u_k) − f(x_{k+1}, u_{k+1})]. These f's are the continuous-time dynamics, not discretized — all of this is continuous time. Okay, so that's x at the collocation point. I also need ẋ at the collocation point — ẋ evaluated at t_{k+1/2} — and we're going to get that from the spline. The idea is: get ẋ from the spline, then evaluate the dynamics, and set the constraint that says the spline derivative at the midpoint has to equal the dynamics at the midpoint. Plugging t = h/2 into the spline-derivative expression — it's easy — gives ẋ_{k+1/2} = −(3/(2h))(x_k − x_{k+1}) − ¼(ẋ_k + ẋ_{k+1}), and, same deal, I plug the dynamics in for the ẋ's: ẋ_{k+1/2} = −(3/(2h))(x_k − x_{k+1}) − ¼[f(x_k, u_k) + f(x_{k+1}, u_{k+1})]. Cool — that's ẋ at the midpoint. So I've got x at the midpoint and ẋ at the midpoint, and the last thing I need is u at the midpoint. This one's easy, because it's just linear interpolation — just averaging the two endpoints: u_{k+1/2} = u(t_k + h/2) = ½(u_k + u_{k+1}). Okay, deep breath. Cool — now I have everything at the midpoint of the spline, and we're just going to write down dynamics constraints by setting the ẋ from the spline equal to the ẋ from the dynamics. We evaluate the dynamics at the midpoint x and midpoint u — call that f — and set it equal to the ẋ coming from the spline; that's our constraint. Here's what it looks like: we'll have a whole bunch of these constraints c_k that couple adjacent time steps, adjacent knot points, each depending on the x's and u's at those time steps. I evaluate f, my continuous dynamics, at (x_{k+1/2}, u_{k+1/2}), where those come from the splines — x_{k+1/2} from the cubic up here, u_{k+1/2} from my linear interpolation. I plug that into f to get my dynamics, and remember that's supposed to equal ẋ, so I set it equal to the ẋ at the midpoint that I got from evaluating the spline. This is literally just saying the spline derivative has to equal the dynamics at the midpoint. So I subtract these and set the result to zero — getting my brackets right — which gives c_k = f(x_{k+1/2}, u_{k+1/2}) + (3/(2h))(x_k − x_{k+1}) + ¼[f(x_k, u_k) + f(x_{k+1}, u_{k+1})] = 0, where this last part is just what we wrote down up here for ẋ from the spline derivative. And again, the f's here are all the continuous dynamics, not the discrete dynamics — this is all continuous-time stuff, continuous splines. I'm
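Putting the three midpoint formulas together, a minimal sketch of this collocation constraint might look like the following (f is the continuous-time dynamics ẋ = f(x, u); the function name is mine):

```python
import numpy as np

def hermite_simpson_residual(f, x_k, u_k, x_kp1, u_kp1, h):
    """c_k(x_k, u_k, x_{k+1}, u_{k+1}) = 0 enforces: cubic-spline
    derivative at the midpoint equals the dynamics at the midpoint."""
    f_k, f_kp1 = f(x_k, u_k), f(x_kp1, u_kp1)
    # spline values at the collocation point t_k + h/2
    x_mid = 0.5 * (x_k + x_kp1) + (h / 8.0) * (f_k - f_kp1)
    u_mid = 0.5 * (u_k + u_kp1)                    # linear interpolation
    xdot_mid = -1.5 / h * (x_k - x_kp1) - 0.25 * (f_k + f_kp1)
    return f(x_mid, u_mid) - xdot_mid
```

As a sanity check: for constant-rate dynamics ẋ = u with x_{k+1} = x_k + h·u, the residual is exactly zero, since a cubic represents that trajectory exactly.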
belaboring this because it was a topic of much confusion the last time we did this, so I'll keep noting it. Okay, cool. [A student points out a typo on the board.] Oh yeah — yep, not the best handwriting; good catch. Okay, so some notes on this — and please, ask questions. [Question: what is this picture?] This is supposed to be the x trajectory — we're using a cubic polynomial here, hence the wiggle — and we're using linear interpolation on the u's. That's just the cartoon, and then the math down here is how you actually compute the midpoint given a cubic spline or a linear interpolant. [Question about continuity between segments.] We'll get into this — it's actually one of the cool things about this. Basically, say I have just two splines: I'm going to have a constraint that the splines have to agree with each other at their shared endpoint, and that constrains both x and ẋ to equal the previous spline's values. So at the endpoints you're enforcing matching between adjacent spline chunks. And if you think about this for a second — take the two-spline case — how many coefficients do I have in total for two spline chunks? Eight. So I have eight free degrees of freedom, which means I can have eight constraints. If I set the splines equal at the shared endpoint, how many degrees of freedom does that remove? Two constraints. If I pin the outer endpoints, how many more? So even after I constrain the initial state and the outer endpoints and glue the chunks together, I still have degrees of freedom left — and that's where the midpoints come in: one midpoint constraint per chunk, and now it's all fully determined. Basically, the deal is that my constraints at the endpoints of the splines are stitching constraints that make the splines match each other, and then the dynamics actually get enforced by evaluating at
the midpoint. [Question: could you enforce the dynamics at the endpoints instead?] There's nothing stopping you from doing that, but it turns out doing it this way gives you third-order integration accuracy, which is subtle — there's actually a very cool, slightly subtle thing going on here that we haven't quite hit yet; it's coming in a minute when I get all the way through. [Question about B-splines.] B-splines are just splines — I'm not sure I'm following; are you asking about stitching with B-splines? B-splines are just a different choice of representation. It turns out all polynomial splines are equivalent up to a linear change of variables, and it doesn't really matter how you choose to represent them: you could use the polynomial coefficients, you could use control points like Bézier curves, and here we're choosing to represent them with the knot-point states. For these kinds of control problems that turns out to be very convenient, and there are reasons we want to do it — for instance, we want to enforce collision constraints on the actual states at those knot points, so it makes sense for us to use the knot points. You could do it other ways, but it would get messier for other things downstream, often because we care about explicit state constraints and control constraints. I don't know if that answers your question, but to first order it doesn't matter which polynomial basis representation you use — Bézier, whatever. It should be clear from what we just did: I went from polynomial coefficients to knot points by a matrix multiplication, and any way you want to represent the polynomial is just a matrix multiply away — it's a linear change of basis, basically. It turns out polynomials form a vector space, and all of
this is just changes of basis, so it's just matrix multiplies. I don't know if that — all right, anybody else? [Question about the midpoint evaluation.] Yeah — literally all that's doing is: I went up here, plugged in t = h/2, and multiplied those matrices together to get this. All I'm doing is evaluating the polynomial at h/2. And it's just a choice — I could use any interpolation I want; I could also use a cubic on the controls if I wanted to, it just turns out not to be worth it. In principle you can use any order you want for any of these things — arbitrary polynomials — it just turns out this combination is very common and is a classic method that's very, very widely used. This is kind of the simplest version that works well in practice, but you can go higher order on any of this stuff if you want to. [Question: what if the control changes fast?] Yep, exactly — you'd have to add a collocation point on the u, with more constraints. It depends; there's some subtlety here. When you say "changes fast," there are two options: you can make the order of the polynomial higher to capture more wiggles, or you can increase the number of knot points and have more time steps. Those are your choices. And it turns out that for very smooth problems — which can happen in spacecraft trajectory design, for instance — higher-order polynomials are good. If you have a non-smooth problem where things are very jerky, increasing the polynomial order can hurt more than help, and you're better off adding more knot points. So it depends on the problem and how smooth it is. You need enough expressive capability in your function representation to capture the true function, and your ways of increasing the expressiveness are these two choices: you
can have more knot points — more segments — or you can have higher-order polynomials inside the segments. If your function is infinitely smooth and has all the derivatives ever, then increasing the polynomial order is advantageous; if your function is not very smooth, that turns out not to help, and you want to add more knot points. That's kind of it — it just depends on your application which way is better. Okay, any other questions on this? Okay, let's keep trucking then. So, a couple of notes and then the cool punch line about this. Note that we're evaluating stuff at the collocation point in the middle, but, to be clear, only the x_k's and u_k's are decision variables in the optimization problem. The midpoint collocation values — x_{k+1/2}, u_{k+1/2} — are not actually explicitly decision variables; they get calculated internally inside the constraint function when you evaluate the constraints. This integration scheme — an implicit integration with a cubic polynomial — is actually an instance of an implicit Runge-Kutta method, and it has a specific name: just as a fun fact, if you care, this is called Hermite-Simpson integration, the same way RK4 or whatever has a name. It turns out to achieve third-order integration accuracy, like an explicit RK3 would — which, you know, RK3 can do too. But the reason this is cool, and one of the reasons for this whole discussion, is that — remember, implicit is free when we do things this way — this implicit Hermite-Simpson scheme gets the same accuracy as RK3 with fewer dynamics calls than an explicit RK3 would need. That's not totally obvious, but I'm hopefully going to make it obvious in a sec. Okay, so, to make this clear, I'm going to write
down explicit RK3, just for the purposes of this conversation. It looks like RK4 but with three stages instead of four: we evaluate f at the beginning, we evaluate f at roughly the midpoint, and we evaluate it at a third, slightly odd extra point — you can look this up on Wikipedia if you care; it's the same idea as RK4, just third order. That's what it looks like, and you get x_{k+1} by a weighted average over those stage evaluations. Just from looking at that, it very obviously requires three dynamics calls per time step — makes sense, as expected. So that's that one. Now look at this Hermite-Simpson thing again: f(x_{k+1/2}, u_{k+1/2}) + (3/(2h))(x_k − x_{k+1}) + ¼[f(x_k, u_k) + f(x_{k+1}, u_{k+1})] = 0. How many f evaluations are in this guy? It looks like three — but over the whole trajectory, it's not. Why? Think about it with just two segments: I go from x₁ to x₂ to x₃, and the f(x₂, u₂) in this constraint also shows up in the next one. The evaluations get shared across knot points from adjacent time steps — the endpoint of one segment is the first point of the next, and on and on. So if I look at this on average across the trajectory, it's only two dynamics calls per time step instead of three, because I get to recycle the calls across time steps. Does that make sense — everyone see that? These endpoint evaluations get reused by adjacent segments. The easy way to see it is to take two time steps: if you add up the total number of calls over two time steps, it's fewer than you'd naively count, because you get to recycle the shared middle one. So, basically, asymptotically, for a long trajectory it's only two dynamics calls per step — and the reason we care is that the nonlinear dynamics calls end up dominating a lot of the computational
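A quick sanity check of that counting argument (a sketch; the tally and function name are mine): explicit RK3 needs three fresh f calls per segment with no sharing, while Hermite-Simpson needs one f per knot point — shared by the two neighboring constraints — plus one per midpoint.

```python
def dynamics_call_count(N):
    """Total continuous-dynamics evaluations over N segments.

    Explicit RK3: 3 per segment, no sharing (its stage points don't
    land on the next segment's knot). Hermite-Simpson: (N + 1) knot
    evaluations + N midpoint evaluations = 2N + 1, i.e. ~2 calls per
    segment for long horizons."""
    rk3 = 3 * N
    hs = (N + 1) + N
    return rk3, hs
```

For a 100-segment trajectory this gives 300 versus 201 calls — roughly two per step instead of three once the shared knot evaluations are cached.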
cost of these algorithms, so going from three calls to two cuts roughly a third off the overall flop count on these things, which is a pretty big deal. Also, we talked at the beginning of the class about how implicit Runge-Kutta methods tend to be more stable than explicit ones, and there are stability reasons this works better too — so you get both a cheaper-compute win and an integration-stability win. Okay, cool — it's a win. And then, just before we get out of here, let's do the cool demo: we're going to do the acrobot again, same stuff as last time. Here are those DIRCOL dynamics we just derived, written out in the code; the cost function is the same quadratic cost we've been using, blah blah blah; here are those dynamics constraints evaluated at all the time steps — I'm being super dumb about it here and not being efficient by recycling calls, but if you do this, you should, for efficiency. This is the whole constraint, everything stacked up in here, and this is a bunch of boilerplate code to call IPOPT — we made this a lot nicer for you on the homeworks; this is just crappy old code that's very ugly, none of it is important, it's literally just boilerplate to call the solver, so don't worry about it. We have a goal constraint — a swing-up constraint like last time, but now set as an explicit constraint, not weighted into the cost function. I make a dumb initial guess where the arm is just hanging down the whole time, put a little noise on it, and we solve. It's going to take a couple of minutes, because my code is not fast... and once that churns through, it prints and plots. Yeah, cool — there, it finished. It took, I don't know, 30 seconds or whatever — I think it was about 843 iterations — and you can see this works. Cool, we got our little animation, and it looks basically the same as what we got
last time with DDP, which is cool. But here's the really cool thing about these direct methods that you can't do with DDP. I solved this once, right? Let's say, for the sake of argument, that I have some way of getting a good initial guess on part of the state — say just one of the states, one of the pose variables or something like that. Let's say I somehow know those: this could be, for example, from a sample-based or graph-based planner run just on the pose, where I don't know the velocities — or just part of the state, like center-of-mass dynamics or something. The cool thing is that, because these are all just constraints, I can hand the solver a dynamically infeasible initial guess. In DDP I can only guess the u's; here I can guess the x's and u's separately, and they don't have to match or obey the dynamics at all. So what I'm going to do here is plug in a guess on just the first state — just this one angle, not the other angle, and not any velocities — and I'm going to add some noise to it. That's my new initial guess: it's totally infeasible, it's noisy, and it's only part of the state vector — one out of the four states. Watch this: if I re-solve, warm-started with that guess, it now takes only a couple of iterations and it's almost instant — pretty sick — and I get the same answer. So one of the things DIRCOL lets you do that's really, really cool is guess potentially very dumb, totally dynamically infeasible initial guesses on the states that don't have to obey the physics — but if they cartoon the answer kind of okay, the solve becomes really, really fast. Where this gets really useful is maze-like problems where you have to navigate around obstacles, where trajectory optimization can be really bad but RRTs or graph-based planners can be really good. I can use a sample-based planner to find a
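A sketch of how such a warm start might be assembled (all names and the stacking order here are hypothetical, not from the lecture code): seed only one state component with a planner path, leave velocities and controls at zero, and add a little noise.

```python
import numpy as np

def warm_start_guess(theta_path, nx, nu, noise=1e-2, seed=0):
    """Dynamically infeasible warm start for a direct method: only the
    first state component is seeded (e.g. an angle from an RRT path).

    Assumes a hypothetical [x_1 ... x_N, u_1 ... u_{N-1}] stacking of
    the decision variables."""
    rng = np.random.default_rng(seed)
    N = len(theta_path)
    X = np.zeros((nx, N))
    X[0, :] = np.asarray(theta_path) + noise * rng.standard_normal(N)
    U = np.zeros((nu, N - 1))
    # column-major ravel stacks one knot point's state at a time
    return np.concatenate([X.ravel(order="F"), U.ravel(order="F")])
```

The point is only that nothing here obeys the dynamics — the collocation constraints take care of that during the solve.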
path that goes around the obstacles successfully — just in, say, R² or R³ — and then use that to warm-start, say, the center-of-mass trajectory of my legged robot or quadruped, and this will converge super quickly. These trajectory optimizers are really good at reasoning about the full dynamics but really bad at reasoning about dense obstacles, so this gives you the ability to trade those off and get best-of-both-worlds results out of sample-based planning plus trajectory optimization. Okay, we went a little over — sorry about that. Go play with this, it's cool, and I'll catch you guys next time.