Okay, so last time we got into stochastic optimal control, and in particular the LQG problem, which is basically the generalization of all the LQR stuff we've already spent a lot of time on to the stochastic setting: there's noise in the dynamics themselves, so the dynamics are stochastic, and you no longer have access to the full state — you've got sensors that only observe, say, pieces of the state, and those have noise on them too. So we've gone from full state, everything clean and deterministic, to noise on everything, where you never really have access to the full state. Last time we got through the separation principle and certainty equivalence. These are bedrock classic results that should stick in your brain — they're super cool. Even though they don't hold in the general nonlinear case, we do this all the time anyway: pretty much all modern control systems are based on that separation into a state estimator and a controller, and the theoretical justification comes from the LQG story we just told.

One more note on the separation principle: there are RL methods now that try not to do it. If anyone's familiar, a lot of reinforcement learning methods take as input some big lagged history of observations instead of just the current state. The reasoning is that the history captures distribution information, rather than forcing everything to be summed up into a single state vector — so it's trying, to some degree, not to do the separation-principle thing.

The other thing I'll mention about LQG: it's one of these great theoretical playgrounds. Control theorists love it because it's simple enough that you can actually prove theorems, but meaty enough that the results mean something, and there's still lots of good work happening in this framework. Some of my favorite work in the last few years is by Ben Recht at Berkeley and one of his PhD students, Sarah Dean, who's now a professor at Cornell. They tried to answer some deep theoretical questions about model-free versus model-based RL algorithms, and the way they got at it was by working in the LQG setting, where you can get real answers. Specifically, they looked at this question: is it better to do something like policy gradient, where you directly optimize a feedback policy, or is it better to learn a model first and then use the model to do control — model-based versus model-free RL? Since they were doing LQG, we know the optimal policy is linear, so the comparison is concrete: try to learn the K matrix directly, versus learn the A and B matrices and plug them into the Riccati equation to get the LQR policy. What do you guys think the answer is? Which one is better? Most of modern RL does the policy-gradient thing, where you directly learn the feedback policy.
In the general setting it's impossible to say anything about this, but in the LQG setting it's easy to try: you can code both up — policy gradient on one hand, system ID plus Riccati on the other — and compare. Anyone have a guess about which would be better, specifically in the sense of sample efficiency? Given the same setup — noisy observations, and you can try stuff and explore — should you fit the K matrix directly, or fit A and B and then compute a K from those? Which requires fewer samples?

We got a couple of different votes. Personally, going in knowing nothing, I would have guessed that learning K is better because it's less to learn — it's literally just the feedback matrix, whereas A and B together are bigger than K. But it turns out they were able to prove, in this series of papers, that at least in the LQG setting it is better to do model-based RL: it is more data efficient to learn A and B and plug them into the Riccati equation to get the controller than to directly learn the K matrix. Which is interesting, and to me a little counterintuitive. And it's not task-specific — it turns out to be better, period: if you look at how much data you need to collect to reach a given level of cost performance, the model-based version is strictly better. An interesting nugget. This is recent work from the last couple of years — very current, very much in the RL conversation — and it's built on LQG precisely because that's the setting where you can actually prove hard results; the general nonlinear case is too hard for theory. That's just some random musing by me, but that stuff is awesome; if you're interested, I highly recommend checking out those papers.
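To make the model-based recipe concrete, here's a minimal sketch of that route in Python — fit A and B from rollout data by least squares, then plug the identified model into the Riccati recursion. To keep it short, this identifies from full-state data, ignoring the noisy-observation side of the story, and all the numbers (the system, noise level, costs) are made up for illustration; it is not the setup from the papers.

```python
import numpy as np

np.random.seed(0)
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])   # made-up "unknown" system
B_true = np.array([[0.005], [0.1]])

# collect rollout data with random exploratory inputs
X, U, Xn = [], [], []
x = np.zeros(2)
for _ in range(500):
    u = np.random.randn(1)
    xn = A_true @ x + B_true @ u + 0.01 * np.random.randn(2)
    X.append(x); U.append(u); Xn.append(xn)
    x = xn

# system ID: fit x_{k+1} ~ [A B] [x; u] by least squares
Z = np.hstack([np.array(X), np.array(U)])
Theta, *_ = np.linalg.lstsq(Z, np.array(Xn), rcond=None)
A_hat, B_hat = Theta.T[:, :2], Theta.T[:, 2:]

# plug the identified model into the Riccati recursion for the LQR gain
Q, R = np.eye(2), np.array([[0.1]])
P = Q.copy()
for _ in range(500):
    K = np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)
    P = Q + A_hat.T @ P @ (A_hat - B_hat @ K)
print(K)  # certainty-equivalent LQR gain computed from the learned model
```

The model-free alternative would optimize the entries of K directly from sampled rollout costs; the result from the papers is that the route above reaches a given cost with less data.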
Okay, so today we're going to finish off this LQG story, specifically by talking about optimal estimation. Last time we made the argument that, because of the separation principle, the optimal move in the LQG setting is to do LQR on your best estimate of the state. We already know how to do LQR, so the remaining story is how you actually get that best estimate of the state — this is optimal estimation. The answer is the Kalman filter, but there's more to the story than that, so we'll go through it today and maybe hip you to some things you haven't seen before. Who's done a Kalman filter before? Most of you. Who's seen it actually derived from scratch? Okay, most of you — good. So we'll do that, finish the LQG story, and then talk about duality — the duality between estimation and control. Who's seen that before? A couple of people. It's a really cool thing that will bring us full circle back to a bunch of stuff we've already covered in this class. That's the plan; let's get into it.

First, a quick recap from last time. We did the whole certainty-equivalence and separation-principle thing, and what those results say is that, at least in the LQG context, the best thing you can do is run a state estimator to get an optimal state estimate, plug whatever that estimate is into the optimal LQR controller, and life is good. The idea is that in the stochastic setting the controller doesn't have to reason about the full probability distribution over states: you design a deterministic optimal controller, separately design a state estimator, hook them together, and that's optimal. Not true in general, but definitely true in the LQG setting — we proved it. Even though it's not true in general, we frequently do this in practice anyway, even though it's suboptimal. Whether you can get away with it comes down to what the distribution over states coming out of your estimator looks like: you can get away with it as long as the distribution is unimodal, because then the peak of the distribution is a reasonable estimate of where you are. As soon as the distribution goes bimodal, it's a bad situation and you don't want to mess with it.

Before the next thing, let's recap what we mean by optimal state estimation. Last time we briefly discussed the question of what you're actually optimizing. In general I might have some messy probability distribution, and at the end of the day, to do this separation/certainty-equivalence thing, I need a single state to plug into my controller. If the distribution is wiggly and weird, maybe multimodal, there's a real question about what the right "optimal state" to return is. We talked about two options. One very reasonable choice is the maximum of the distribution — the highest point — called the MAP estimate, for maximum a posteriori. The other reasonable choice is the least-squares, or minimum mean-squared-error, answer, which is roughly speaking the x in the middle of the distribution — the average, the mean. For a multimodal distribution that can actually be a pretty bad estimate, because the mean can sit at a low-probability point. Mathematically, the MAP estimate is the arg max of the posterior distribution — the distribution of the state conditioned on the measurements, where the measurements are y. It's called the posterior because it's what you believe about the state after all the measurement information has been folded in: p(x | y), the probability of the state given the measurements.
So one option is to return the peak of that posterior — that's MAP. The other is this MMSE estimate, and I want to belabor this one a bit, because we're going to reuse a trick from this discussion. The MMSE estimate is the arg min, over our estimate x̂, of the expectation E[(x − x̂)ᵀ(x − x̂)]. This is basically least squares, or minimum variance — they're all kind of the same thing.

Now for a trick that I only know as the "trace trick" — that's what I've heard people call it; if anyone's heard it under a different name, let me know. We're going to use it a bunch today, which is why I'm showing it in this simpler context. The thing inside the expectation is an inner product — a dot product — so it's a scalar. That means I can wrap it in a trace operator, and that's completely legal: the trace of a scalar is the same scalar. So far that's trivial and does nothing: E[tr((x − x̂)ᵀ(x − x̂))]. Here's the fun part: inside a trace, I can cyclically permute a product of matrices without changing the result — tr(ABC) = tr(CAB) — and that's not hard to show. So I'm allowed to reorder the product: tr((x − x̂)(x − x̂)ᵀ). Instead of a dot product, that's now an outer product. And what is the expectation of that outer product? We've seen it already — it's the covariance matrix. Trace is a linear operator and expectation is a linear operator, so I can flip them around without changing the answer: move the trace outside the expectation, and the expectation of the outer product is the covariance, so we get the trace of the covariance.

So the punch line: least squares — minimizing the expected squared error, the inner product — is equivalent to minimizing the trace of the covariance. Fun fact, and useful; it shows up a lot, particularly in Kalman-filtery stuff, but it's a generally good math trick to know about.
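Written out as one chain, the trace trick is:

$$E\big[(x-\hat{x})^{T}(x-\hat{x})\big]
= E\big[\operatorname{tr}\!\big((x-\hat{x})^{T}(x-\hat{x})\big)\big]
= E\big[\operatorname{tr}\!\big((x-\hat{x})(x-\hat{x})^{T}\big)\big]
= \operatorname{tr}\!\big(E\big[(x-\hat{x})(x-\hat{x})^{T}\big]\big)
= \operatorname{tr}(\Sigma)$$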
One last note, which we said last time: what happens to the MAP estimator and the MMSE estimator in the case of a Gaussian? They're the same. For a symmetric, unimodal distribution like a Gaussian, the two coincide, and we're going to leverage that fact today and basically say we don't care which one we use. In the derivation we're about to do, you can get the Kalman filter from either perspective — solve either optimization problem and you get the same answer, because they coincide. Today we'll use the MMSE perspective because it's more convenient for the way we're doing it, but there are many routes to the Kalman filter — at least three or four that I know of — and I'm showing you one. My favorite, which we're not doing (a bit of a bummer), is the geometric one: the original Kalman paper from the '60s derives the filter geometrically, as a bunch of projections, which is super cool if you haven't seen it. The original Kalman paper is awesome; you should check it out. The only other book I know that does it that way is Kailath's linear-estimation book, which is a great book. I'll give suggested reading at the end if you're into this stuff — I'm a nerd, I like these things — but you should definitely check out the original Kalman paper and the Kailath linear-estimation book.

So, with the fact that MAP and MMSE are the same for a Gaussian, we'll now leverage these ideas and go through a derivation of the Kalman filter, which is the optimal linear state estimator. In this LQG context — linear dynamics, Gaussian noise on everything — we're going to find the optimal state estimator, and it is the Kalman filter, which you've all heard of. Like I said, even if you've seen this before, I think it's healthy to see it derived a couple of different ways and wrestle with it a little.

The perspective we're taking today: a recursive linear MMSE estimator. We're going to play some tricks and take shortcuts. We'll assume out of the gate that the estimator is recursive — that it has a recursive structure — which we know it does; you can derive it without that assumption and reach the same conclusion, but we'll just assume it. We'll also assume out of the gate that it's linear, which makes things go quicker. And we'll take the MMSE route we just talked about; you can also go the MAP route and get the same answer. Recursive means we maintain an estimate and, at each time step, update that estimate given a new measurement — as opposed to batch, which means taking a whole bunch of measurements at the same time and solving for the whole trajectory at once. That's what recursive means: you update at every time step.
The general idea is to bootstrap this: assume a recursive structure, and assume we already have an estimate of the state at time k that includes all measurements up to the current time. We'll write our estimate x̂ with a little shorthand subscript, "k given k" — the first index is the time step, the second is which measurements it's conditioned on. The long-hand version is that x̂_{k|k} is the expectation of x_k conditioned on the measurements y_1 through y_k: our estimate of x at time k given all measurements up to and including the current time step. We assume we have access to this at time k, and we assume there's a recursion: given one new measurement at time k+1, we'll figure out how to update the estimate to include that one new data point. Once we've done that, the recursion is bootstrapped and you can run it for all time. That's the strategy.

Similarly, we'll assume we know the covariance of this estimate — the error covariance. Another way of saying this is that we're assuming Gaussian beliefs over the state: we know the Gaussian distribution on the state, and it stays Gaussian. We'll write the state covariance as Σ_{k|k}, the expectation of the outer product of the estimation error (x_k − x̂_{k|k}) with itself. The reason for the two-index notation is that we're going to end up with a two-phase update rule: a prediction step that produces x̂ and Σ at k+1 given k — the next time step, before the new measurement has been included — and a measurement-update step that produces k+1 given k+1. So the game is to figure out how to update x̂ and Σ to include a new measurement at time k+1.

To be clear, we're taking a lot of shortcuts: assuming linearity holds, assuming Gaussians stay Gaussian, assuming the recursion. All of these are true, and I'm not going to prove them — I'll just assume that structure going in and derive the details. We'll break the update into those two steps and derive each one now.
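In symbols, the shorthand we just set up is:

$$\hat{x}_{k|k} = E\big[x_k \mid y_{1:k}\big], \qquad
\Sigma_{k|k} = E\big[(x_k-\hat{x}_{k|k})(x_k-\hat{x}_{k|k})^{T} \mid y_{1:k}\big]$$

and the two phases we're after are the prediction, which produces $(\hat{x}_{k+1|k}, \Sigma_{k+1|k})$, and the measurement update, which produces $(\hat{x}_{k+1|k+1}, \Sigma_{k+1|k+1})$.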
Any questions so far?

Q: All of this is in discrete time. Do you discretize, then linearize, then do this — or can you treat the nonlinear system as linear first?
A: It kind of doesn't matter what order you do that in. You can linearize in continuous time and then discretize, or discretize the nonlinear system first and then linearize it, and you'll get basically the same answer.

Q: Sorry — could you instead estimate the entire trajectory conditioned on all the measurements?
A: Yeah — that's what you do offline, when you have the entire data set and want to reconstruct the trajectory after the fact. And that's actually just batch least squares: you're solving one big least-squares problem over the whole trajectory. What we're doing here is slightly different: it's causal, meaning the current estimate depends only on the present and past measurements, not the future. In the version you're describing, the estimate at some time in the middle of the trajectory is informed by future measurements — which is fine for offline analysis, but on a robot in real time you can't use measurements you don't have yet to figure out where you are now. So this is a purely causal scheme that only takes into account past measurements you actually have.

Q: Why do I care about past measurements at all?
A: Because they tell you where you are now. Here's a simple example: I only have position measurements, but my state is position and velocity. How do I know my velocity if I only have a single position measurement from right now? I can't. This gets at the idea of observability, which is a dual notion to controllability. Usually one y at one time step is not enough to reconstruct the full state — very often the measurement is some partial set of states or some function of the state, for example only the positions, or only part of the pose. You can basically never get the full state vector in a clean way, so you're always trying to reconstruct it from noisy, partial observations. In the setting where only positions come in from your sensors, the way you get velocity is by using previous positions. The naive way would be to finite-difference the current position against the last one — but finite differencing amplifies noise and does bad things. By running a filter like this and leveraging the dynamics model, you can do much better than naive finite differencing.
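A quick back-of-the-envelope illustration of why naive finite differencing is bad (my numbers, not from the lecture): if the position measurements carry independent noise of variance $\sigma^2$, the finite-difference velocity estimate

$$\hat{v}_k = \frac{y_k - y_{k-1}}{h}$$

carries noise of variance $2\sigma^2/h^2$, which blows up as the time step $h$ gets small. The filter instead blends the dynamics prediction with each new measurement, so it never has to divide sensor noise by a small $h$.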
All right, here we go. The first of the two steps is the prediction step: use the dynamics model to propagate the distribution forward one time step — no measurements yet. In our notation, we first predict the state estimate (the mean of the distribution) at k+1 given measurements up to time k. Expanding the definition, I literally take the expectation and plug the dynamics in: x̂_{k+1|k} is the expectation of A x_k + B u_k + w_k — remember, we have stochastic dynamics with the noise w in there — all conditioned on the measurements y_1 through y_k. Stare at that for a second: u is not a random variable, and the noise w is assumed zero-mean Gaussian, so its expectation is zero. All this does is apply the dynamics to our last best estimate: x̂_{k+1|k} = A x̂_{k|k} + B u_k.

That's the prediction step for the state. We also need the prediction for the covariance, Σ_{k+1|k}. Same idea — take the definition and plug the dynamics in; it's plug and chug, not super hard. The definition is the expectation of the outer product of (x_{k+1} − x̂_{k+1|k}) with itself. (From here on, when an inner or outer product repeats the same factor, I'll write the first half and "···" for the second half, to save myself lots of terribleness.) Plugging the dynamics into x_{k+1} and the prediction we just derived into x̂_{k+1|k}, I get the expectation of (A x_k + B u_k + w_k − A x̂_{k|k} − B u_k)(···)ᵀ. First thing to note: the control term shows up identically with the x's and the x̂, so the B u_k terms cancel right out — it's not a random variable. Next, expectation is linear, so I can split this into two terms and pull the A's out. The first term is A E[(x_k − x̂_{k|k})(···)ᵀ] Aᵀ; the second, with the noise showing up on both sides, is E[w_k w_kᵀ]. What about the cross terms? As we discussed last time, x_k and w_k are uncorrelated, so the cross terms are zero by definition. The idea is: if I'm at time step k with whatever state I have, the noise sample comes in after that — it enters in the dynamics propagating me to the next time step, drawn fresh from a Gaussian at that time. So the current state doesn't depend on the current noise sample. Of the two surviving terms, the second is W, the process-noise covariance, and the first expectation is literally the definition of the state covariance from the previous time step, Σ_{k|k}. Putting it all together gives what I like to call a matrix sandwich: Σ_{k+1|k} = A Σ_{k|k} Aᵀ + W.
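The prediction step, collected in one place:

$$\hat{x}_{k+1|k} = A\,\hat{x}_{k|k} + B u_k,
\qquad
\Sigma_{k+1|k} = A\,\Sigma_{k|k}\,A^{T} + W$$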
What this says is: take the covariance at time k and shove it through the dynamics — that's the sandwich with A — then add the process-noise term, which just makes the covariance bigger. That matrix sandwich shows up all the time; it's literally how you propagate a covariance matrix through any linear transformation. Think of it as a change of coordinates for a matrix: if you have a matrix written in one coordinate system and a linear change of coordinates, you smoosh it on both sides. Here it falls out of the expectation math, but it's a generally useful pattern.

So that's the state and covariance prediction. If I have a Gaussian at time k given all my measurements so far, I can predict where the state and covariance will be at time k+1: run the mean through the dynamics, run the covariance through the dynamics, and add a process-noise term to inflate it a little.

The next step is the measurement update. The idea here is that we'll define a couple of things and take some shortcuts — I'm going to assume a linear update, and then we'll figure out the details. Right out of the gate, I want to define something called the innovation. Has anyone heard of this before? It's common lingo in the filtering literature. We'll use the letter z for it. The innovation z_{k+1} is the difference between the actual measurement — the y_{k+1} you actually get from your sensors at time k+1 — and the predicted measurement, which is the predicted state run through the measurement model. Remember, we're assuming the measurement is C times x plus a measurement-noise term, so the predicted measurement is C x̂_{k+1|k}. In some sense this is your surprise — honestly, "surprise" is the word I'd use for it. It's how far off your prediction was from what you actually saw; people call it the innovation, but the idea is that it's the error in your prediction — how surprised you were by the measurement you got. The reason we bother with it: this is the error that gets fed back in the Kalman filter. A Kalman filter is linear feedback on an error — one way to think about it is that it's trying to control your state estimate, and it does that by feedback on the measurements. The innovation is the thing that actually gets fed back: the error between your predicted measurement and the real measurement you actually got.
Writing z out in more detail: the actual measurement y is C times the true state plus the measurement-noise term v we talked about before, and we subtract the expected measurement, so z_{k+1} = C x_{k+1} + v_{k+1} − C x̂_{k+1|k}.

The next thing we need, unsurprisingly, is the innovation covariance — I need the covariance of z. This thing is Gaussian. We've kind of been assuming this throughout, but in case anyone hasn't heard it before: Gaussians stay Gaussian under linear/affine transformations, which is what makes this whole derivation work. Push a Gaussian through linear dynamics or a linear measurement model and it stays Gaussian.

Q: Is the C matrix given? Does it update in time?
A: C here is our measurement model — we're assuming it's linear, and it's the function that maps the state x into the measurements. I should have said this up front: we're assuming the standard LQG setup from last time, x_{k+1} = A x_k + B u_k and y_k = C x_k, plus noise on everything, and we're writing everything LTI, so A, B, and C don't change in time. If they are time-varying, you can carry indices on the C's, A's, and B's through the entire derivation and it all still works the same way; I'm skipping that to save myself writing little subscripts you can't read anyway, because my handwriting is bad.

Q: Why pass the state through the measurement model? Why not take the measurement directly, call it ŷ, and subtract?
A: The y is the thing you get from your sensors — there's no x̂ in it. The C x̂ term is the prediction; if you want to call that ŷ, that's fine by me, and z is exactly the difference between the actual measurement and that predicted measurement. The reason we're setting it up this way is that I'm about to use it to derive the covariance result, so bear with me for one second.

All right, the next move is the innovation covariance, which is actually quite useful in filters in general. Call it S_{k+1} = E[z_{k+1} z_{k+1}ᵀ], just as we're used to thinking about covariances. Expanding with the definition of z_{k+1} plugged in: the expectation of (C x_{k+1} + v_{k+1} − C x̂_{k+1|k})(···)ᵀ. Now we play our game of things being uncorrelated again and say, same as before, that v_{k+1} and x_{k+1} are uncorrelated — the state at the current time and the sensor noise are uncorrelated. That is definitely not true in general; it's not hard to cook up a scenario where your state is correlated with your measurement noise. Say you have a sensor whose noise grows with temperature, or a camera whose noise depends on the ambient light: then your robot's pose can influence the amount of noise you get. IMUs, actually, generally do have noise that grows with temperature.
So if I'm moving between a hot room and a cold room, you can imagine scenarios where this assumption is not true. There are ways to deal with that, but here we make the standard assumption that they're uncorrelated: wherever I am right now, the sensor is effectively drawing a fresh random sample from a Gaussian.

With that assumption, expand as before: pull the C's out and split into two terms. The first is the x stuff, C E[(x_{k+1} − x̂_{k+1|k})(···)ᵀ] Cᵀ — and if you stare at it, the thing inside is something we've already derived: the predicted state covariance Σ_{k+1|k} from the last step. The second term is just the sensor-noise covariance V. Punch line: the innovation covariance is the predicted state covariance smooshed through the measurement model, plus the measurement-noise covariance — S_{k+1} = C Σ_{k+1|k} Cᵀ + V.

The reason we care about this is that the innovation is the error signal we feed back into the estimator. And one place where the innovation covariance is really useful in practical filter design: it's a really, really good way to do outlier rejection. Remember, the innovation is surprise, and now I have a covariance on my surprise. If a measurement comes in three sigma away from my prediction — or more, say five or six sigma out — the odds of it being legit are pretty low. So maybe the most standard way to do outlier rejection in a filter is to compare the innovation against the innovation covariance: put a bound on it, and if the innovation is more than, say, three sigma away from what you expect, throw the measurement out and call it an outlier. Fun fact — these are useful things.

All right, so that's the innovation. Now that we have this stuff defined, we can write down the measurement update for the state: based on the linearity assumption, it's just linear feedback on the innovation. The updated estimate x̂_{k+1|k+1} is the predicted one, x̂_{k+1|k}, plus a feedback gain matrix times the innovation. We'll call the gain L instead of K, because we're using K for LQR and we will be mixing the two together. That said, L actually has historical precedent: L is for Luenberger, another giant of linear dynamical systems theory, who was the first to really formulate state estimators in state space this way. So the update is x̂_{k+1|k+1} = x̂_{k+1|k} + L_{k+1} z_{k+1}.
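Collecting the measurement-side quantities in one place (the gating test at the end is written the way I'd actually code it — the specific threshold is a design choice, not something from the board):

$$z_{k+1} = y_{k+1} - C\hat{x}_{k+1|k},\qquad
S_{k+1} = C\,\Sigma_{k+1|k}\,C^{T} + V,\qquad
\hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + L_{k+1} z_{k+1}$$

The outlier test rejects $y_{k+1}$ when the squared Mahalanobis distance $z_{k+1}^{T} S_{k+1}^{-1} z_{k+1}$ exceeds a threshold — e.g., the value corresponding to "three sigma" for the measurement dimension.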
In our context you can write down any L you want. If you just pick one — old-school pole placement, say — and you're not optimizing it per se, the result is called a Luenberger observer, just as you can have any kind of linear feedback controller that isn't LQR. If you optimize L and pick the optimal gain in the sense we're about to define, it's called the Kalman gain, and the estimator is now a Kalman filter.

So now we know the form of the state update; the big question is what L is supposed to be. We've assumed it's linear feedback on the innovation, but we need a bit more work to get the actual expression for L. The first step in getting there is to write down the covariance update, because we also need Σ_{k+1|k+1}. To get it, go back to the definition and plug things in: the expectation of (x_{k+1} − x̂_{k+1|k+1})(···)ᵀ. Substitute the state-update equation and then the definition of z from above — it's slightly messy, but it's all stuff we've already written down: the expectation of (x_{k+1} − x̂_{k+1|k} − L_{k+1}(C x_{k+1} + v_{k+1} − C x̂_{k+1|k}))(···)ᵀ. Then there's another round of assuming things are uncorrelated — once again, v_{k+1} and x_{k+1} are uncorrelated, which we already used — and after expanding and massaging a little (it's not hard; it's the obvious stuff if you've followed the rest), what it works out to be is (I − LC) Σ_{k+1|k} (I − LC)ᵀ + L V Lᵀ. It's the matrix sandwich of our predicted covariance with (I − LC), plus the measurement noise similarly sandwiched through the Kalman gain.

That's the covariance-update equation. Has anyone seen this form before in their Kalman-filter explorations? Kind of, maybe? We had a little sidebar discussion a couple of classes ago with some folks who were messing around with the LQR stuff, about how we went through the LQR derivation a few different ways and came up with different expressions for the Riccati equation. The same thing is true in Kalman-filter land. This form of the covariance update has a name — it's called the Joseph form, fun fact — and I'm showing you this one because it's the one you should actually use in practice. There are at least two or three other ways to write it, all exactly identical in infinite-precision arithmetic, but with various numerical shortcomings. This one is the nicest in a computer with floating point because it's symmetric: both terms are symmetric, so the result is symmetric, and I'm adding the two together. Other versions, as they're written, are not symmetric, so floating-point round-off can hand you a non-symmetric covariance, which is bad. A lot of them also have minus signs in them, and under floating-point round-off that can give you an indefinite result — a covariance with negative eigenvalues — which is also a big no-no.
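The Joseph-form covariance update, written out:

$$\Sigma_{k+1|k+1} = (I - L_{k+1}C)\,\Sigma_{k+1|k}\,(I - L_{k+1}C)^{T} + L_{k+1} V L_{k+1}^{T}$$

Both terms are symmetric and positive semidefinite for any $L_{k+1}$, which is exactly the numerical-robustness argument above.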
Covariances have to be at least positive semidefinite, and the Joseph form is basically guaranteed to give you a symmetric positive-semidefinite matrix by construction, because of the way it's written. So you should use this one, essentially. You'll see other versions written down in books and papers, which is fine, but when you go code up your Kalman filter, do it this way.

Q: Is there a survey paper with all these lessons learned — the floating-point precision stuff, the hardware limitations?
A: There are almost certainly papers that cover pieces of this, but a lot of what I'm telling you now is street-fighting skill: tips, tricks, and practical knowledge, more than theory or academic writing. These are the hard-won lessons you pick up from people who've shipped these things on real deployed systems and gone through the pain. Some of it you'll definitely find written down, but a whole bunch of it is street knowledge that isn't common in books. Specifically for Kalman filters, there's a guy named Tucker McClure — a consultant who basically does Kalman filters for a living — whose website is anuncommonlab.com. He has a really, really awesome three-part blog-post series of practical tips and tricks for Kalman filters; it's one of the best resources I know of for this kind of stuff. I'll send it out on Piazza later — send it to your friends; it's really good.

Q: What if you have a D term, like y = Cx + Du in state space?
A: You can do it that way. Practically speaking I've literally never encountered a system where that was a thing, which is why I don't bother, but it doesn't change the derivation much: it shows up like the Bu term — in the mean update but not the covariance update. If you've followed all this, throwing the Du in is no big deal, and you can find it in books too.

Okay. So that's the covariance update, that's the Joseph form, and that's the street-fighting knowledge about keeping covariances symmetric and positive semidefinite. The other way to fix the numerical issues — the legit way, which the Tucker McClure blog also goes into, if you really want a bulletproof, real-deal Kalman filter on an embedded system — is to write it in the so-called square-root form. Rather than keeping track of Σ in the filter, you keep track of the square root of Σ, a.k.a. its Cholesky factor. The reason is that the actual covariance is then always the square of that factor, and a squared matrix is guaranteed symmetric positive semidefinite. It's a slick trick that guarantees positive definiteness under floating-point round-off; it's also more numerically stable and has other nice conditioning wins. The blog posts cover all of that — they're great.
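Here's a minimal sketch of the square-root idea for just the prediction step, assuming numpy; the function name and the QR-based update are my illustration of the trick, not code from the course. The point is that you only ever store a factor F with Σ = F Fᵀ, so the represented covariance can never go indefinite.

```python
import numpy as np

def sqrt_predict(F, A, W_sqrt):
    # F: lower-triangular factor with Sigma = F @ F.T
    # W_sqrt: any square root of the process noise, W = W_sqrt @ W_sqrt.T
    # Stack so that M.T @ M = A Sigma A.T + W; a QR factorization of M
    # then yields the new factor without ever forming Sigma itself.
    M = np.vstack([F.T @ A.T, W_sqrt.T])
    R = np.linalg.qr(M, mode="r")
    return R.T  # new (lower-triangular) factor of the predicted covariance
```

The measurement update can be done in the same factored style; the blog posts walk through the full square-root filter.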
Okay, I'm getting really sidetracked now, but here's a great historical anecdote. The Kalman filter is obviously Rudolf Kalman, early '60s. The extended Kalman filter is sometimes called the Kalman–Bucy filter — no, sorry, the Kalman–Schmidt filter; Schmidt is the name attached to it. The story: during the Apollo program, only a few years after the original Kalman paper — I think it was around 1967 — Schmidt was an engineer working at NASA Ames on guidance and navigation for Apollo. They had this seemingly intractable problem of how to navigate Apollo. The legend is that he went home over the weekend, came across the Kalman paper — which had come out fairly recently, and this is pre-internet — had the aha moment, invented the extended, nonlinear version for the Apollo navigation problem, and showed up at work on Monday with the answer. And they flew it. That's how they flew to the moon. Kind of an insane, legendary story. Anyway: Kalman–Schmidt.

All right, we've got to finish: the Kalman gain — how you actually find L. What we're going to do is use the MMSE idea. Mathematically, that means minimizing the least-squares objective: the expectation of the inner product of the estimation error with itself. We learned the trace trick at the beginning of today, so that's equal to the trace of our updated covariance, Σ_{k+1|k+1}. So the thing we're minimizing is the trace of that covariance as a function of L — and Σ_{k+1|k+1} is literally the Joseph-form expression we just derived. Minimize the trace of that expression over L; that's what we're doing. Quick question: what kind of function of L is that? It's quadratic — and since Σ is symmetric positive definite, it's a convex quadratic function of L. That's beautiful; we know exactly what to do with that. Take the derivative of the expression with respect to L, set it equal to zero, and solve — that's the whole game, and it gives you the Kalman gain. I'm going to skip the gory details, because what we're actually doing is differentiating a matrix-valued expression with respect to another matrix, and technically that derivative is a fourth-order tensor, which is super gnarly. You can do it — specifically with the vec/Kronecker tricks we talked about when we did DDP. The legit way is to use those tricks to massage this into shape, or you can write the tensor down, but it's a little messy.
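For the record, here's the skipped algebra, done with standard matrix-calculus identities rather than the full fourth-order tensor (you can check it with the Kronecker machinery if you want):

$$\frac{\partial}{\partial L}\operatorname{tr}\!\Big[(I-LC)\,\Sigma_{k+1|k}\,(I-LC)^{T} + LVL^{T}\Big]
= -2\,(I-LC)\,\Sigma_{k+1|k}\,C^{T} + 2\,LV = 0$$

$$\Longrightarrow\; L_{k+1}\big(C\,\Sigma_{k+1|k}\,C^{T} + V\big) = \Sigma_{k+1|k}\,C^{T}
\;\Longrightarrow\; L_{k+1} = \Sigma_{k+1|k}\,C^{T} S_{k+1}^{-1}$$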
So, skipping the algebra — though it wouldn't hurt you to go do it yourself — set the partial derivative of tr(Σ_{k+1|k+1}), again the Joseph-form expression, with respect to L_{k+1} equal to zero and solve for L_{k+1}. The answer you find is L_{k+1} = Σ_{k+1|k} Cᵀ S_{k+1}⁻¹, where S_{k+1} is the innovation covariance we just derived. That's the punch line; it's a little messy to get there but not really that bad — just a lot of algebra. By the way, if you actually want to know this stuff, what I'd highly recommend — and what I still do occasionally — is to start from scratch: Gaussians, linear dynamics, linear measurement model, and try to derive the whole thing. I try to do this about once a year just to stay sharp and make sure I still remember how. Being able to derive LQR or the Kalman filter from the basic building blocks is the kind of thing you should be able to do if you really want to understand this stuff.

Okay, to summarize the whole thing:
1. Start with the Gaussian prior — x̂ at time zero and Σ at time zero — plus the system specification: A, B, C and the noise covariances, all that good stuff.
2. Prediction step: shove the mean through the linear dynamics, and shove the covariance through the dynamics (the matrix sandwich) with the process noise added on.
3. Calculate the innovation and the innovation covariance. Remember the meaning: surprise — the difference between the measurement we actually got and the measurement we predicted from our predicted state and the measurement model. (Honestly, I want to change the name; I think "innovation" is weird and "surprise" is way better.) Its covariance is the predicted state covariance shoved through the measurement model, plus the measurement-noise covariance.
4. Calculate the Kalman gain: L_{k+1} = Σ_{k+1|k} Cᵀ S_{k+1}⁻¹.
5. Measurement update: linear feedback on the innovation, like we said, for the mean, and the Joseph-form update for Σ. (You may see other versions of the Σ update; they're not as good — do it this way.)
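All together, one full cycle of the filter:

$$\begin{aligned}
\textbf{predict:}\quad & \hat{x}_{k+1|k} = A\hat{x}_{k|k} + Bu_k, & \Sigma_{k+1|k} &= A\Sigma_{k|k}A^{T} + W\\[2pt]
\textbf{innovate:}\quad & z_{k+1} = y_{k+1} - C\hat{x}_{k+1|k}, & S_{k+1} &= C\Sigma_{k+1|k}C^{T} + V\\[2pt]
\textbf{gain:}\quad & L_{k+1} = \Sigma_{k+1|k}C^{T}S_{k+1}^{-1} & &\\[2pt]
\textbf{update:}\quad & \hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + L_{k+1}z_{k+1}, & \Sigma_{k+1|k+1} &= (I-L_{k+1}C)\Sigma_{k+1|k}(I-L_{k+1}C)^{T} + L_{k+1}VL_{k+1}^{T}
\end{aligned}$$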
Then, finally, you just repeat: now that we've bootstrapped the recursion, you go back to step 2 and keep going. That's the recursive thing. Now I'll show you the example code — any questions while I put it up?

Q: You said x̂_k depends on the whole measurement history, but didn't you just show that it depends on your measurement at that time and your last state?
A: Well — you have a state estimate at time k, you predict to time k+1, you take a measurement, and you fuse those together.
Q: But the expectation is the state given your measurements from 1 to k. That seems slightly contradictory.
A: I see — you're saying it looks like it only depends on the measurement at time k+1 and not the previous ones. It depends on the previous ones by virtue of their being folded into the previous state estimate. If I've run this thing for k time steps, I have already done k measurement updates on all those previous measurements. The idea is that at each time step I'm including one new measurement, but I'm doing that on a previous state estimate that already has every earlier measurement folded into it.

Has anyone heard of recursive least squares? What this really is is a fancy version of recursive least squares. If you know linear least squares — Ax = b, minimize the sum of squared errors — you can derive a recursion that updates the least-squares estimate one row at a time. The idea: if I've solved a least-squares problem and I get one new measurement — one new row of A and one new entry of b — it turns out I can efficiently update my previous least-squares estimate by solving one little problem with that new row, instead of re-solving the entire problem. That's called recursive least squares — a nice, simple idea that's fairly easy to grok and that you can find on Wikipedia. This whole Kalman filter is a very fancied-up version of that, with dynamics models and such added, folding all the prior information into the Gaussian at every time step. I don't know if that helped.
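For reference, the recursive least squares update I'm alluding to looks like this: with the current estimate x̂ and P = (AᵀA)⁻¹, one new row aᵀ and measurement b give the rank-one (Sherman–Morrison) update

$$K = \frac{P a}{1 + a^{T} P a},\qquad
\hat{x} \leftarrow \hat{x} + K\,(b - a^{T}\hat{x}),\qquad
P \leftarrow P - K a^{T} P$$

which already has the gain-times-innovation shape of the Kalman measurement update.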
Q: At what point can you forget previous measurements if you don't need them anymore — for cost savings, say?
A: The beauty of the recursive idea is that there's no growing cost, so you don't need to forget. At each time step the algorithm does the same thing: one measurement update on one new measurement. If you were doing a batch thing — going back to the least-squares example, where every new measurement means stacking everything up and solving a giant problem — then yes, you'd want to forget and maintain a fixed horizon. But in this context the expense of the update never changes: every update is a single measurement, and all the prior information lives in the Gaussian. The cool part is that the per-step complexity of the algorithm is independent of the horizon length. If you do want to forget, there are straightforward ways to introduce what's called a forgetting factor, but you really don't need to. By the way, those forgetting-factor ideas are all closely related to the covariances in here: you can tune the covariances to get the same effect. High process noise acts very much like a forgetting factor — it says "I don't trust the propagation very much," which down-weights the predictions and up-weights the newer measurements.

Okay, let me run through the code real quick — it will be posted. This is a very, very simple implementation on a very dumb system: the double integrator, the sliding brick in 1D, which we've seen many times. I'm driving it with a white-noise drive signal, so it's basically a random walk, and I've got white noise on the position measurements — position measurements only, no velocity measurements. That's A, B, and C over there, plus setup stuff. I'm putting an LQR controller on it that tries to drive us to the origin, so we're doing the whole closed-loop LQG thing: I hook the Kalman filter up to the LQR controller, with a very straightforward LQR cost. I start it, say, a meter to the right at zero velocity, with the controller trying to drive it to the origin, and I initialize the filter with a very bad initial guess — that the thing is at the origin, which is wrong — and an identity covariance. Down here I wrote out the first time step in detail so you can see it's literally the steps we just derived; note that in the propagation I'm drawing a sample from a Gaussian and adding it into the dynamics — that's the stochastic dynamics. Then we just loop it and see what it looks like.

In the plot, blue is ground truth and orange is the estimate. I put a lot of noise on this — I'm being pretty mean — so it's quite noisy, but you can see the estimate tracking the truth, and the punch line is that it actually works in spite of the noise: the mean converges to the origin. You can run this a bunch and try it yourselves.
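Since the notebook will be posted anyway, here's a minimal self-contained sketch of the same demo in Python/numpy — double integrator, position-only measurements, certainty-equivalent LQR on the Kalman estimate, deliberately bad initial guess. The step size, noise levels, and costs here are made up; the posted code is the reference.

```python
import numpy as np

np.random.seed(0)
h = 0.1                                  # time step
A = np.array([[1.0, h], [0.0, 1.0]])     # double integrator (position, velocity)
B = np.array([[0.5 * h**2], [h]])
C = np.array([[1.0, 0.0]])               # position measurements only
W = 1e-3 * np.eye(2)                     # process-noise covariance (made up)
V = np.array([[1e-2]])                   # measurement-noise covariance (made up)

def dlqr(A, B, Q, R, iters=500):
    # solve the discrete Riccati equation by fixed-point iteration
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

K = dlqr(A, B, np.eye(2), np.array([[0.1]]))

x = np.array([1.0, 0.0])                 # true state: 1 m right, at rest
xhat, Sig = np.zeros(2), np.eye(2)       # deliberately bad initial guess
for k in range(200):
    u = -K @ xhat                        # certainty-equivalent LQR on the estimate
    # simulate the real (noisy) system
    x = A @ x + B @ u + np.random.multivariate_normal(np.zeros(2), W)
    y = C @ x + np.random.multivariate_normal(np.zeros(1), V)
    # predict
    xhat = A @ xhat + B @ u
    Sig = A @ Sig @ A.T + W
    # innovation and its covariance
    z = y - C @ xhat
    S = C @ Sig @ C.T + V
    # Kalman gain and Joseph-form update
    L = Sig @ C.T @ np.linalg.inv(S)
    xhat = xhat + L @ z
    I_LC = np.eye(2) - L @ C
    Sig = I_LC @ Sig @ I_LC.T + L @ V @ L.T

print(x, xhat)                           # the estimate should track the true state
```

If you log Sig and L inside the loop and plot them, you'll see the steady-state convergence discussed next.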
One thing I did want to point out that's kind of fun: if I plot the covariance entries, what's happening? (The same thing shows up, by the way, if I plot the Kalman gain.) They're obviously getting smaller, which they should be, but in particular they're converging toward steady-state values. That is neat. Where have we seen this before? Infinite-horizon LQR. Basically, because this is an LTI system, then similarly to the LQR context (it turns out there's a duality here, which we're going to talk about in one second), the covariance on the state and the Kalman gain all converge to steady-state values after a while. In the LTI context, where everything is time-invariant (the dynamics, the noise covariances, and so on), the filter converges to a steady state, and you can play the same game as before: if you really have an LTI system, you can just run with a constant Kalman gain, the converged steady-state value, very similar to the infinite-horizon LQR thing we talked about.

Is the covariance converging to zero? No, it's nonzero; it's just a lot smaller than where we started, because I started with a bad initial estimate. The point is that after you run this for a while, you essentially converge to some steady-state covariance, which is the best you can do. One way to think about this: the filter is trying to squish the covariance down as aggressively as possible, but the system dynamics (remember, at each time step we're adding process and measurement noise) are trying to blow it back up. Those two things fight each other, and if you run this long enough they reach an equilibrium, and you get this steady-state covariance value. That's the time-invariant case; in the general case, time-varying or nonlinear, the covariance can wiggle around and do whatever. In the LTI case, though, you get this steady-state thing that's very reminiscent of what we saw with LQR. So yeah, this will be posted and you guys can check it out. The last thing to talk about is connecting the dots on that duality story, but first, a couple of closing notes.

First one: what happens if your stuff is nonlinear? How do we apply this to nonlinear systems? You've probably seen this before, and probably implemented something like it. The first thing to do is what's called the extended Kalman filter, where you basically just linearize at each time step about your current estimate and proceed. This has many issues; if anyone has ever done one of these, you know it's very finicky and requires tuning. But it does work, and it goes back to Apollo: this stuff literally flew to the Moon. Because it can be quite finicky in a lot of scenarios, there are many, many other generalizations to the nonlinear setting, many of which you've probably heard of: all kinds of variations on the EKF, iterated EKFs, the unscented Kalman filter and various other sample-based versions of this idea, and then particle filters, which are Monte Carlo sampled versions of the same idea. Lots and lots of versions. (A bare-bones EKF sketch follows.)
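Here's a bare-bones sketch of the EKF step just described: propagate through the nonlinear models, but compute the covariance and gain from Jacobians evaluated at the current estimate. The function name and signature are mine; f, g, and their Jacobians df_dx, dg_dx stand in for whatever your system provides:

```python
import numpy as np

def ekf_step(x_hat, Sigma, u, y, f, g, df_dx, dg_dx, W, V):
    """One extended Kalman filter step: nonlinear propagation, with
    the covariance and gain computed from linearizations about the
    current estimate (this linearize-as-you-go is the whole trick)."""
    # predict: nonlinear dynamics, covariance via the dynamics Jacobian
    A = df_dx(x_hat, u)
    x_pred = f(x_hat, u)
    Sigma_pred = A @ Sigma @ A.T + W
    # update: linearize the measurement model at the prediction
    C = dg_dx(x_pred)
    L = Sigma_pred @ C.T @ np.linalg.inv(C @ Sigma_pred @ C.T + V)
    x_new = x_pred + L @ (y - g(x_pred))
    Sigma_new = (np.eye(len(x_new)) - L @ C) @ Sigma_pred
    return x_new, Sigma_new
```

All the finicky parts (where to linearize, how to tune W and V) live outside this function, which is exactly why EKFs take tuning in practice.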
Cool. Okay, and now the last topic, which is really cool: this duality idea. What I'm going to show you right now is that, not just in the linear case but in the general nonlinear case, this MMSE estimator is actually exactly equivalent to an optimal control problem, which is kind of cool. And if you can show that, then it sort of follows that in the linear case there's some connection to LQR. So here's what this looks like: the MMSE problem is equivalent to the following optimal control problem. I'm going to do my usual thing, and what you'll see is that we're going to minimize over a state trajectory and also minimize over a noise trajectory. It turns out the process noise terms, the w's, take on the same role as the control inputs, which is kind of funky.

Here's how this looks. The cost function is going to look not that different from what we're used to seeing. We've got our measurements, the y's, and our measurement model, a general nonlinear measurement model we'll call g, sitting inside a quadratic cost. Then we've got what would be our control cost, except here it's on the process noise. And then we've got our dynamics constraints.

So let's talk about what this all is. The first term, the one with the measurement model in it, is basically your state cost: in the case where your measurement model is linear, g(x) turns into Cx, and if you expand that out it just becomes a quadratic cost on the states. The second term is basically your control cost; it looks just like uᵀRu, except the w's, the process noise at each time step, become your controls. And the weight matrices, the equivalents of Q and R from LQR, are V⁻¹ and W⁻¹: the inverse noise covariances. If anyone has looked at a log-likelihood function before, this should look very familiar. The quadratic cost in here is the log-likelihood of a Gaussian; that's the way I think about it. This quadratic form is exactly what shows up in the exponent of the Gaussian density, (x − μ)ᵀΣ⁻¹(x − μ), remember, from when we wrote that down.
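Written out, here's my reconstruction of the problem from the description above (the time indexing and the placement of the sums are my guesses; V and W are the measurement- and process-noise covariances):

$$\min_{x_{1:N},\,w_{1:N-1}} \;\; \sum_{k=1}^{N} \big(y_k - g(x_k)\big)^{\top} V^{-1} \big(y_k - g(x_k)\big) \;+\; \sum_{k=1}^{N-1} w_k^{\top} W^{-1} w_k$$

$$\text{subject to}\quad x_{k+1} = f(x_k) + w_k, \qquad k = 1, \dots, N-1$$

The first sum is the "state cost" (measurement residuals weighted by V⁻¹), the second is the "control cost" on the process noise (weighted by W⁻¹), and the dynamics constraint is where the w's enter as the controls.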
Okay, so some notes on this. If f(x) = Ax and g(x) = Cx, this whole thing is an LQR problem, and if you go solve that LQR problem with dynamic programming, like we know how, you'll basically get a Kalman filter out of it. That's how you show this weird equivalence, this duality between the Kalman filter and LQR: via this definition. But the interesting thing is that this is general; the correspondence holds up in nonlinear settings too. Nonlinear f, nonlinear g: the duality is still there. I can write down a control problem that is exactly equivalent to the state estimation problem.

The cool thing now is that this opens us up algorithmically to doing a lot of interesting things, and to other, weirder generalizations of Kalman filters for nonlinear problems. For example, if I were to go solve this problem in an MPC-style way, that's a thing people do: it's called a receding-horizon state estimator, or sometimes a backward-smoothing extended Kalman filter. The idea is that I'm solving a full nonlinear least-squares problem over some window into the past: instead of MPC looking forward over a second or two, this is me looking backward over a second or two of previous measurement history, solving that as a nonlinear least-squares problem. And then the tail: just like in LQR, where I maintain a cost-to-go approximation for everything beyond my optimization horizon, here the horizon extends into the past, and the tail is represented by a Gaussian prior. Anything older than the batch history in my window gets summarized as a Gaussian, à la the EKF, but I keep the full nonlinearity in some window and solve that full problem. There are lots of cool things you can do with this; it's kind of cool.

So that's the big idea around duality: there's an equivalent control problem for the state estimation problem. And with that, we are done. Thanks, guys.