Transcript for:
Control Theory: Introduction to LQR

okay, so let us begin. Today we are going to discuss the next topic, so let me quickly place it in the progression of topics we have studied in control theory. Our first topic was how to transfer a system into state space: we knew ordinary differential equations, and the first thing we learned was how to put them into state space form, with its matrices. Secondly, we asked how to check the stability of systems in state space form, and that is through eigenvalues. After that we said that we can choose a control law that stabilizes the system: we had a system in state space form, whose stability we can check through eigenvalues, and we chose a control law such that the closed-loop system (the system where we substitute the control law into the dynamics) is stable, meaning it has eigenvalues with negative real parts. That was the first three lectures. After that we spent some time with transfer functions, then with discrete systems. Now it is time for the next logical step from those first three: to give you a viable, practical tool that you can actually use to design a control law that makes the system stable. In the practical sessions you already learned pole placement; pole placement is a viable tool for finding a control law that makes the system stable, and we discussed its limitations there, in the practical sessions rather than in the lectures. In the lectures we will consider a different approach, and that is LQR. LQR is such a popular tool for designing control laws for linear systems that it is basically part of the common vocabulary for people who work in this field. In practice, of course, you can argue about what is actually used more often:
someone would argue that PID tuning, following manuals on how to tune a single-input single-output PID, is the most important practical tool for control design; that is outside the scope of this course, but you should know it exists. Pole placement is the more classical, you could say more straightforward, approach to designing a control law, and someone could argue that it is the more popular one. But we are not trying to decide which one is more popular or more influential; that is not very important for us. Just so you know, though: what you are studying is one of the pillars of control law design, so it is not just "something"; it is of extreme importance for control theory. All right. The key aspect of what we are going to do here is to bring in, for the first time, an additional concept, and that concept is optimality. Before this we talked about control in terms of only one criterion, and that criterion was stability. So far we only wanted the system to be stable. For example, if the system has eigenvalues −10 and −10, we were happy, because it is stable. If the system has eigenvalues −0.001 and −0.001 (one divided by a thousand, with a negative sign), we were still happy, because it is still stable. Of course you understand that the second system decays to zero much more slowly than the first, so in fact the second system performs worse, if what we mean by those words is how fast the control error decays to zero. So the two systems behave differently; you can measure that performance somehow and say that the first system is better than the other. Once you formalize this measurement in terms of some mathematics, you can talk about optimality: the optimal system is the one that behaves in the best possible way.
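To make the difference concrete, here is a small sketch (my own illustrative script, using the two eigenvalues from the example above) comparing how fast the two stable scalar systems decay:

```python
import numpy as np

# Two stable scalar systems x' = lam * x. Both have a negative eigenvalue,
# so both are "stable", but they decay at wildly different rates.
t = 1.0    # look at the state after one second
x0 = 1.0   # same initial condition for both systems

x_fast = x0 * np.exp(-10.0 * t)    # eigenvalue -10: essentially gone
x_slow = x0 * np.exp(-0.001 * t)   # eigenvalue -0.001: barely moved
```

Both systems are stable, yet by any reasonable performance measure the first is far better; that gap is exactly what the notion of optimality is meant to capture.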
Given your circumstances, given your limitations and the things you care about, there will be some optimal way for your system to behave, and this is a much stricter notion than stability: there can be infinitely many ways to stabilize a system, but there will usually be only one way to make it optimal. So it is much more strict, and that is what we are going to talk about here. All right, let's begin. First we will try to understand where the notion of optimality comes from, how we get it. This is not something you will use directly when you apply LQR, but it gives you a way of thinking which I have found in my practice to be extremely useful, so I recommend you try to understand what we are doing here. First, let us define the dynamics as ẋ = f(x, u), the standard state-space dynamics; in this case nonlinear, since we do not assume anything about linearity of the system yet; with initial conditions x(0) = x₀. So: dynamics, initial conditions. Great. Now we will define a control policy, u = π(x, t). Here we suddenly introduce two things: a new term, "control policy", and a new letter, π. The reason is just to avoid some confusion. I could write u = u(x, t); that is a typical abuse of notation that we use all the time. More correctly, though, a function is one thing and its output is something else; the function and its output are not usually the same object. So if we write u = u(x), saying that u is a function of x, the abuse of notation is that the output of the function and the function itself are denoted by the same letter.
Usually this does not matter, because you have no chance of confusing the two. The function itself can be, for example, a cosine; the output of the function for a given x can be, say, 0.7. Two different things: u with an equals sign in one place, u = 0.7 in the other. Now, the same confusion can happen here, and we would really like to distinguish between the output u and the policy π. Why do we say "control policy"? Again, to avoid the same confusion. Classically we call u the control input, or just the control, and we call the rule that defines it the control law. We use these terms interchangeably so often that it becomes dangerous in terms of mistaking one for the other. Part of the reason I introduce "control policy" is also so that you are not thrown off when you encounter the term next time, say in a reinforcement learning course, where "control policy" is exactly the term they use. It does not matter: control policy and control law are the same thing; it is just a way of referring to π, the function by which you compute the control. All right, so we have a dynamical system and a control policy. What would be an example of choosing different control policies in our familiar setting? For example, u = −K₁x is control policy number one, and u = −K₂x is control policy number two; as long as K₁ ≠ K₂, these are two different control policies. So choosing different gains gives you different control policies. Note also that a control policy is allowed to depend on time, which would be typical for time-varying systems, for example for tracking control.
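As a sketch of how different gains define different control policies, here is a tiny simulation (the scalar plant ẋ = x + u and the gains K1, K2 are my own illustrative choices, not from the lecture):

```python
# A scalar plant x' = x + u (unstable in open loop, eigenvalue +1), and two
# linear control policies pi_1(x) = -K1*x and pi_2(x) = -K2*x. Both policies
# stabilize the plant (closed-loop eigenvalues 1 - K1 = -1 and 1 - K2 = -10),
# but they are genuinely different policies with different behavior.
def simulate(K, x0=1.0, dt=1e-3, T=2.0):
    """Forward-Euler simulation of the closed loop x' = x - K*x."""
    x = x0
    for _ in range(int(T / dt)):
        u = -K * x          # evaluate the control policy
        x += dt * (x + u)   # step the plant forward
    return x

K1, K2 = 2.0, 11.0
x_end_1 = simulate(K1)   # decays roughly like exp(-t)
x_end_2 = simulate(K2)   # decays roughly like exp(-10*t), much faster
```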
There is nothing surprising about the t. But as long as we are talking about LTI, linear time-invariant, systems, and what we are trying to solve is stabilizing the origin, so x = 0 is the solution we are trying to stabilize, we will usually have simply π(x). If you want to review the other options, I suggest starting with the lecture on tracking control and thinking about how this would look if we tried to stabilize a solution which is not the origin. Okay. Now let us introduce the next ingredient, and that is a cost function; moreover, an additive cost function, not just any cost. What do I mean by additive? Additive refers to this integral here. If you remember calculus, an integral is the limit of a sum: think of the integral as the area under the graph of a function; you slice this area with vertical lines, you increase the number of slices to infinity, and the sum of those thin slices, in the limit, becomes the integral. So the integral is basically a sum: in the discrete-time case the cost would literally be a sum, and in the continuous-time case it is an integral. So we write the cost as J(x₀, π) = ∫₀^∞ c(x, u) dt: here we have the policy π, and here we have the initial condition x₀, so the cost depends on the initial condition and the control policy, and what it integrates is a cost c which is a function of x and u. Now let us try to understand what is going on here. First of all, c is what we call the instantaneous cost: we look at the pair (x, u) and decide how much we want to punish it. Let's say our ideal case is x = 0 and u = 0; then we might punish them for being
nonzero: for example, c(x, u) = x² + u² would be an example of such a cost, but it can be anything else. All we ask is that it is instantaneous, so it depends only on the current x and the current u. And here we have the initial condition x(0). Now notice that we have the initial condition here instead of x(t). Why? Because if you know the control policy and you know the initial condition, you can solve the Cauchy problem to find x(t). So the cost does not actually depend on x(t) as a separate argument: x(t) itself depends on the initial condition and the control policy. Once you have a control policy, you already know what x(t) will be, so the cost depends only on the initial condition and the control policy. Which makes sense: you can say, my cost depends on where I start and how I drive along the road. Correct. Now let us talk about the optimal cost, J*; the star denotes optimality for this lecture. This is the optimal, that is the lowest possible, cost, and J* will depend only on x(0).
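Here is a numerical sketch of this additive cost: integrating an instantaneous cost x² + u² along the closed-loop trajectory (the scalar plant, the policy, and the finite horizon standing in for infinity are all my own illustrative choices):

```python
# Numerically evaluate J(x0, pi) = integral of (x^2 + u^2) dt for the
# scalar plant x' = x + u under the policy u = -K*x. For this closed loop
# x(t) = x0 * exp((1 - K) * t), so the exact infinite-horizon cost is
# (1 + K^2) * x0^2 / (2 * (K - 1)).
def cost(K, x0=1.0, dt=1e-4, T=20.0):
    x, J = x0, 0.0
    for _ in range(int(T / dt)):
        u = -K * x
        J += dt * (x * x + u * u)   # accumulate the instantaneous cost
        x += dt * (x + u)           # step the dynamics forward
    return J

J_k2 = cost(2.0, x0=1.0)   # exact value: (1 + 4) / (2 * 1) = 2.5
```

Changing either the gain K (the policy) or x0 (the initial condition) changes J, which is exactly the dependence J(x₀, π) described above.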
What is it equal to? It is equal to the infimum over all control policies, the minimum over all possible control policies, of the original cost: J*(x₀) = inf over π of J(x₀, π). Notice what happens here: we take all possible costs from a single initial position, under all possible control policies, and we find the smallest possible cost among the different control policies. So we find the control policy that delivers the smallest possible cost, and that is the optimal cost for x₀. Given all possible policies but a single initial condition, this is the optimal one. And the control policy that achieves it, π*(x, t), is what we call the optimal control policy. Just to reiterate: we have a cost which is additive, an integral of the instantaneous cost from zero to infinity, and it depends on the initial condition and the control policy; and we have an optimal cost which no longer depends on the control policy, because out of all possible policies we have chosen the one that delivers the minimum of the cost, so it depends only on the initial condition. Okay. Now, with this, what we can formulate is the Hamilton–Jacobi–Bellman equation, HJB for short, a famous equation. Versions of this equation are used in reinforcement learning; versions of it are used in optimal control; it is an important piece of mathematics. It has direct connections with dynamic programming, the discrete-time version of this, and with Bellman's principle of optimality, which is used, I believe, in reinforcement learning. So do not feel like this is useless: if you are going to study one of those fields later for your advancement, you might find it useful to understand how it works in this context. So let us
write down the equation. What it is, is a minimum over u: the minimum over u of the instantaneous cost plus the partial derivative of the optimal cost with respect to x times the dynamics, and this whole thing, this min over u, is equal to zero: min over u of [c(x, u) + (∂J*/∂x) f(x, u)] = 0. You may find this random; that was my impression when I first saw it. You say: okay, the instantaneous cost, for some reason we care about it even though it is instantaneous, and why? Then the partial derivative of the optimal cost times the dynamics; it all looks very strange. But if you think about it, this whole expression looks a lot like the time derivative of the cost function; in fact, you can say it is the time derivative of the optimal cost. Let us see why. Let us take the time derivative of J*, and here, instead of x₀, let me just write x. Using the chain rule, the time derivative of J* is (dJ*/dx)(dx/dt), and dx/dt is of course f(x, u). So the chain rule gives us exactly this component: (∂J*/∂x) f(x, u), the partial derivative with respect to x times ẋ. You can interpret the bracket as the full time derivative of the optimal cost along the trajectory, and the min over u you can interpret as coming directly from the fact that we are talking about the optimal cost, which is the infimum over the policies: we are not working with a generic J, we work with the infimum over the whole class. And we said that this whole expression is equal to zero.
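Written out in the notation above (instantaneous cost c, optimal cost J*), the HJB equation and the chain-rule identity behind its second term are:

```latex
\min_{u}\Bigl[\, c(x,u) \;+\; \frac{\partial J^{*}}{\partial x}\, f(x,u) \,\Bigr] \;=\; 0,
\qquad
\frac{d}{dt}\,J^{*}\bigl(x(t)\bigr)
\;=\; \frac{\partial J^{*}}{\partial x}\,\frac{dx}{dt}
\;=\; \frac{\partial J^{*}}{\partial x}\, f(x,u).
```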
Now, yet another way to think about it is to say that the cost is going to get bigger as soon as you stray away from optimality. The optimal cost is always the smallest possible; whichever way you go, you go uphill. So you can think of it as this relation: whenever you are at the bottom of a hill, anything you do takes you uphill, and your derivative at that point is zero. That is the intuition, maybe, for the zero on the right-hand side. This is not a rigorous argument, just intuition: at least for me, when I first saw it, I found it difficult to wrap my head around this strange combination, so I am trying to give you a way to feel comfortable with these things. I am not giving you the derivation here, and unfortunately the derivation of this equation is actually difficult to find in textbooks, at least in the ones that I checked. All right. As long as this condition holds, the control policy is optimal; the control policy u that you found is optimal. And this is true for nonlinear systems, not just linear ones. So for any control policy that you find, if you want to check whether it is optimal with respect to your cost, just plug it into this equation; if it holds, then great, you found the correct controller. Of course, the problem is that it will be difficult to actually check in practice, because of this min over u, but strictly speaking you could do that. So this is the equation, and this is the condition for optimality. Now we can use it to find the control law, like this: if this condition holds, then the optimal control policy can be found as the argmin over u, π*(x) = argmin over u of [c(x, u) + (∂J*/∂x) f(x, u)]. Fair enough. So far it just looks interesting, but it is
unclear whether it has practical significance, so let's try to get to practice. For our practical case we are very humble: we use LTI systems in continuous time, ẋ = Ax + Bu. Let us use a quadratic cost, meaning our instantaneous cost will be quadratic: c(x, u) = xᵀQx + uᵀRu. Here xᵀQx is a quadratic form in x, with Q as the weight matrix, and uᵀRu is the same but in terms of u. Q and R have to be positive definite for this to be meaningful, because if this number could be negative, there would be some x and u that actually make the cost go down, which does not make a lot of sense: the system would just plunge in that direction. So if we want to stabilize the origin, we want Q and R to be positive definite. In fact, Q can be positive semi-definite: some eigenvalues of Q can be equal to zero, if there are states, or combinations of states, whose going to zero is of no consequence to us. However, R always has to be positive definite, and this makes sense: if we allowed R to be positive semi-definite, there would be some control components which we do not punish the system for using at all, and the system would say, okay, let me use infinitely large control in this direction, so that I achieve some marginal gain in some other direction, because I am not punished for using as much control as I want. This makes no engineering sense, so R has to be positive definite, not semi-definite; and you will see later that it also plays a role in the derivation. So this is the cost.
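A small numerical sketch of this quadratic instantaneous cost (the weight matrices are my own illustrative choices):

```python
import numpy as np

# Instantaneous quadratic cost c(x, u) = x^T Q x + u^T R u.
# With diagonal weights this is a weighted sum of squares:
# q1*x1^2 + q2*x2^2 + r1*u1^2.
Q = np.diag([10.0, 1.0])   # punish the first state ten times harder
R = np.diag([0.1])         # R must be positive definite: no "free" control

def instantaneous_cost(x, u):
    return float(x @ Q @ x + u @ R @ u)

c = instantaneous_cost(np.array([1.0, 2.0]), np.array([3.0]))
# 10*1 + 1*4 + 0.1*9 = 14.9
```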
Its interpretation, as I already alluded, is like this: I punish x for being away from the origin, but I punish it differently in different directions. Q is a positive definite quadratic form, so it defines in which direction the punishment is high and in which direction the punishment is lower. You can say that xᵀQx = const is the equation of an ellipsoid, and there will be a principal direction in which the punishment is the highest and another principal direction in which it is the lowest. You can also do an eigendecomposition of Q to identify those directions and values, and that will also tell you where the punishment is the highest, and so on. But if you choose Q to be diagonal, for example, then the coefficients on the diagonal directly tell you how each component of x is punished: if Q is diagonal, the cost becomes q₁x₁² + q₂x₂² and so on and so forth. So there is nothing too strange here. The same holds for R: if R is diagonal, you just get r₁u₁² + r₂u₂², etc. Okay, now let us substitute all of this goodness into HJB. We substitute, and this is what we get: this is our instantaneous cost, this here is our partial derivative of the optimal cost with respect to x, and this here is our f(x), that is, ẋ. So this is what we get, and ultimately this is what we wanted to arrive at: HJB, but for linear systems. Now we just need one more component to be able to solve this equation, and that component is this part here; so far it looks mysterious and we do not know what to do with it. All right, let's make the next step. We have a theorem, which I am just giving to you as a theorem, that says: for LTI systems with quadratic cost, the optimal cost has the form J*(x) = xᵀSx, so the optimal cost is a quadratic form itself, where S is a positive definite matrix. So the optimal cost is a quadratic function, and that is great, because it allows us to easily
do the rest of the derivations. For example, what is the partial derivative of this function with respect to x, times ẋ? Let me write it out; we are talking about partial derivatives even though I do not have a partial symbol on my keyboard, so I apologize. By the product rule, the derivative of xᵀSx along the trajectory is ẋᵀSx + xᵀSẋ. A better way to say what we are doing here is that this is a derivative with respect to a vector field: we have a scalar function of x, but x itself changes with respect to time according to the vector field, according to the function f. So f is a vector field here, and the derivative with respect to the vector field is the partial derivative with respect to x times the vector field. That is a better way to put it than whatever is written here, I guess. Yes, correct. And what I used is just the derivative of a product: the derivative of aᵀb is ȧᵀb + aᵀḃ, the same here. Now, this part here is f(x), and in our case that is Ax + Bu; this part here is also f(x), again Ax + Bu. So when we substitute, what we get is our familiar instantaneous cost as before, plus the new components that we just discussed: xᵀS(Ax + Bu) + (Ax + Bu)ᵀSx. That's it; that's what we have. Nothing too difficult happens here, but here it is very easy to make a mistake with transposes and the like, so
doing it this way, via ẋ, is what I recommend, because it makes it much easier not to make a mistake. One way to make a mistake here is to just compute dJ/dx on its own and then multiply it by ẋ. If you try to compute dJ/dx that way, what you will find is two vectors, one vertical and one horizontal, and you will sum them, I guess pretending that they are both vertical, and you will get something like 2Sx, or 2xᵀS: you will have forgotten one of the two components and picked up a factor of 2 in front of the other. Your quadratic form might still come out correct, but the matrix will no longer be symmetric. So it is easy to make a mistake by trying to apply that kind of thinking. Instead of treating the whole thing as a single entity, a derivative with respect to a vector field, you treat it as separate Jacobians, the Jacobian with respect to x and so on, as separate entities that you multiply with each other; and if in your derivation you end up with a factor of 2 and a forgotten component, that is probably how you did it. All right, do you have questions about this so far? Mathematically, this is the hairiest part. No? Okay, good. If you have questions, interrupt me at any time; we have enough time in the lecture for that, and the number of slides is quite small, so do not hesitate. Okay. So this is the situation: we now have our HJB in this form, and the rest is just logic. So let us do the algebra. First of all, we open the brackets, and we can collect the quadratic terms in x: this is a quadratic term in x, here is one component, here is another, and here is another
component. So those are the quadratic forms in x. Then we have the mixed components, the components that depend on both u and x, and those are here. Quite simple. The next step is to find the partial derivative of the equation with respect to u and set it to zero. We know that the optimal u delivers the minimum of this expression, so the optimal u lies at the bottom, and we can take the derivative with respect to u and know that it is equal to zero: anything that lies at the bottom of a locally convex shape, at the minimum, has zero derivative there; the condition of optimality includes the derivative being zero. All right, let's do it. Let's find the derivative of that expression with respect to u. What happens is this: the uᵀRu term becomes uᵀR plus (Ru)ᵀ; here, by the way, we are about to do exactly what I was warning you against above, we are going to sum vertical and horizontal vectors, quite interesting. And the mixed terms: this one yields xᵀSB, and that one yields BᵀSx. But let me go just a step deeper into this whole business of taking derivatives of these terms. The derivative of this one is absolutely clear: it is just xᵀSB. But what is the derivative of that one? That is actually not so clear. It is tempting to say
that the derivative of this term is just BᵀSx, and that seems right, but as stated it is not correct. Why? Because u here is transposed; there, it is not transposed. In fact, the correct way to take a derivative of a function whose argument appears transposed is to transpose the function. Note that we are talking about partial derivatives here; remember, before, we were talking about the derivative with respect to a vector field, which is a different beast. How can you justify this for yourself? You can just look it up in Wikipedia or in a calculus textbook, and you will find a table that tells you how to take a derivative with respect to anything. But there is a very simple way to think about it. This whole thing here is a scalar, just a single number, a scalar-valued function; and the transpose of a scalar is still itself, so transposing this particular function does not change anything. So transpose it first: transposing uᵀBᵀSx leads us to the function xᵀSBu (using the fact that S is symmetric; to open the brackets under a transpose, you reverse the order and transpose every factor, so BᵀSx becomes xᵀSB). Now the derivative of this function with respect to u is clear: it is xᵀSB. So you can think about it in two ways. One: if you are going to take a derivative with respect to a variable which is transposed, transpose the outcome. The other one is that, since
the expression is a scalar, you are allowed to transpose it without changing anything, and then, after you transpose it, you take the derivative, which you know how to do. You can clearly see that the partial derivative computed one way and the partial derivative computed the other way are the same. The same happens with the uᵀRu term: the derivative with respect to the first u gives you uᵀR, and the derivative with respect to the second u gives you (Ru)ᵀ, which, once you open the brackets (R is symmetric), is again uᵀR; so you get 2uᵀR, and from the mixed terms you get 2xᵀSB. So now you understand how these derivatives are taken. This is one of the places where it is easy to make mistakes, and it involves two different ways of thinking about derivatives: there we thought of derivatives with respect to a vector field; here we think of derivatives of scalars, which can be transposed. Now, this is the result, and you can clearly see we can get rid of the factors of two; in fact, if we had chosen the cost function to have a one-half somewhere, we would have gotten rid of those factors in the derivation itself, but they cancel anyway. The first important step is to transpose the whole equation back, because u and x are transposed and it is nicer to have them plain: that gives Ru + BᵀSx = 0. Next you transfer this term to the right-hand side, where it gets a minus sign, and then you multiply both sides by R⁻¹. So basically you just express u from this equation.
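Collecting these steps in one place: the u-dependent part of the HJB, its derivative set to zero, and the resulting control law are

```latex
\frac{\partial}{\partial u}\Bigl[\, u^{\top} R\, u \;+\; 2\, x^{\top} S B\, u \,\Bigr]
\;=\; 2\,R\,u \;+\; 2\,B^{\top} S\, x \;=\; 0
\quad\Longrightarrow\quad
u \;=\; -\,R^{-1} B^{\top} S\, x .
```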
The only thing to remember is that the expression was transposed along the way, so you have to transpose things back; and this is how you get the controller: u = −R⁻¹BᵀSx. Now, this control law is optimal; you could already use it. The only downside is that you do not know what S is. R you chose yourself, it is part of your cost function; B is your control matrix, it comes from the system; but S we do not know: it defines the optimal cost, and since you do not know the optimal cost, you have to find it. Okay. Now, this is a linear control law, so we can instantly recognize a control gain here and rewrite it as u = −Kx, where K = R⁻¹BᵀS. People sometimes like to give this an interpretation. One interpretation, for example, is that the control law takes x, transforms it in accordance with the optimal cost (through S), then transforms it in accordance with the control matrix (through Bᵀ, the control authority you have), and finally takes into account the control cost (through R⁻¹). So it is a series of linear transformations that somehow take into account what your final cost looks like, your control authority, and the cost of your actions. I am not sure this makes it clearer, to me at least, but if you want, you can spend time thinking about the algebraic implications. This is the standard formula; this would be your standard LQR control law, and it is a proportional control law. Notice: if you had a second-order system, say a spring-damper, x would include position and velocity, so your control law would be based on position and velocity: linear with respect to position, linear with respect to velocity. For second-order systems, that is what we call PD control. So this is a proof that for a linear second-order system PD control is optimal; we have, for free, proved that for second-order linear systems PD control is optimal.
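In practice you rarely derive K by hand; here is a minimal sketch of computing the LQR gain numerically with scipy (the double-integrator plant and the unit weights are my own illustrative choices):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# A second-order plant (a double integrator: state = position, velocity).
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)          # punish position and velocity errors equally
R = np.array([[1.0]])  # punish control effort

# Solve the Riccati equation for S, then form the LQR gain K = R^{-1} B^T S.
S = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ S)

# u = -K x = -(k_p * position + k_d * velocity): a PD control law,
# exactly as claimed above for second-order systems.
k_p, k_d = K[0, 0], K[0, 1]
```

For this plant and these weights the gain works out to k_p = 1, k_d = √3, and the closed loop A − BK is stable.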
For second-order linear systems, PD control is optimal. Thank you — all right, that is nice to hear. Another thing: when people talk about PID, when they say PID is good and so on, you can see that some of the justification comes from the fact that the PD component is already optimal for a linear system, and PID control is usually used for motors, which are indeed linear second-order systems — sometimes first order, sometimes second, sometimes third, there are variations — but if we are talking about second-order systems, then you can already see this proves the optimality of the PD part of the controller. As for the I component, just for your information, you can motivate its necessity by considering something called a disturbance observer, which we will study maybe a little later, in the last lectures of the course; it takes into account external disturbances, model errors, and so on. So there is a connection between this and PID in terms of theoretical proofs. Okay. Now, this control law is called the linear quadratic regulator, LQR. If you use this control law, you are using LQR, the linear quadratic regulator. Now, how do we find S? This is all fun, but before we know S we cannot use LQR; we need to find it. What we can do is go back to this equation, back to this equation, and substitute the control law that we found into here, here, and here. So we found the control law, u given by minus R inverse B transpose S x. Here we will get x transpose S B R inverse times R times R inverse B transpose S x; here it will again bring a minus sign, so we have minus R inverse B transpose S x; and here it will be x transpose S B R inverse, and so on. After all of these substitutions it will look bigger; let's look at it.
Let me highlight where we substituted the controller: here's the first instance, here's the second instance, here is, I believe, the third instance, and here, I believe, the last instance. So we substituted our control law, and we got a few extra minus signs, because the controller has a negative sign; this is what you get. Now let me simplify. There is something to simplify here: R inverse times R is the identity, for example. And notice now why we wanted R to be positive definite rather than positive semi-definite: if it were semi-definite it could have zero eigenvalues, so it could fail to be invertible, but with R positive definite it is always invertible, so we always have the R inverse that allows us to find the controller. This wouldn't even be possible without R being invertible. All right, so now we have this. Let us notice that we have three identical expressions: one expression here, x transpose S B R inverse B transpose S x; another one here with a minus sign, x transpose S B R inverse B transpose S x; and another one here, x transpose S B R inverse B transpose S x. Three identical expressions, so when we sum them we'll have only one left, with a minus sign, because the one with a plus sign and one of the ones with a minus sign are identical, so they cancel each other. Everything that remains is a quadratic form with respect to x — almost all of it already was — so we pull x out of the brackets, and that's what we get: Q plus S A plus A transpose S minus S B R inverse B transpose S. And this holds for all x only if this expression is equal to zero — so it is not a question of it having a zero eigenvalue or two, it is a question of this whole matrix being identically equal to zero. Just to reiterate: the minimum over u now goes away, because we substituted the optimal control law, but the whole thing still has to be identically equal to zero.
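Collecting the surviving terms, the matrix condition just derived can be written out as (same symbols as in the lecture):

```latex
% Holds for all x, so the matrix in the quadratic form must vanish:
\[
  Q + S A + A^\top S - S B R^{-1} B^\top S = 0 .
\]
```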
So this term stays, and this term stays, okay. What we obtained here is called the algebraic Riccati equation. It is one of the keystones for a number of areas, in particular for control, for optimal control; it is also used, or at least used to be important, in convex optimization. You can think of it as simply an important mathematical object. So this is the Riccati equation. What we can notice here is that the algebraic Riccati equation is quadratic with respect to S: there are linear components here and here, but also a quadratic component here. That prevents us from solving it using linear algebra alone, so in fact this equation is usually solved numerically; we don't usually solve it by hand, and there exist very robust algorithms for it. So this algebraic Riccati equation is what we solve to get S, and once we have solved it, we substitute the result into our control law here; we find K as R inverse B transpose S, and that's it. Now, how do we do this in practice? It is, you know, painfully simple. In practice, if you use MATLAB, you call the function lqr(A, B, Q, R) — this is for linear systems, so this is your state matrix, your control matrix, your state cost, and your control cost — and it gives you your optimal gain K, which is equal to R inverse B transpose S; your S; and the eigenvalues of your closed-loop system. If you are using Python, what you can do is call SciPy's solve_continuous_are — ARE stands for algebraic Riccati equation. As you can see, it again takes the state matrix, control matrix, state cost, and control cost, and it returns the solution of the Riccati equation, so the matrix S of the quadratic form of the optimal cost. Notice that here you then have to substitute S yourself to get K.
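As a sketch of the Python route just mentioned: solve_continuous_are returns only S, and the gain K = R⁻¹BᵀS is computed by hand. The double-integrator system below is my own illustrative choice, not one from the lecture:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative system: a double integrator, x = [position, velocity]
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)  # state cost (identity, as a reasonable default)
R = np.eye(1)  # control cost

# SciPy solves the algebraic Riccati equation and returns S ...
S = solve_continuous_are(A, B, Q, R)

# ... but the gain K = R^{-1} B^T S is on you to compute
K = np.linalg.solve(R, B.T @ S)

# Sanity check: the closed-loop matrix A - B K must be stable
eigs = np.linalg.eigvals(A - B @ K)
print(K)                  # one gain on position, one on velocity
print(np.max(eigs.real))  # negative: the closed loop is stable
```

For this system the gain acts on position and velocity, i.e. it is exactly the PD controller discussed earlier.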
To get your K, you have to compute R inverse times B transpose times S yourself. So in Python you don't get K, you get S, and you have to find K on your own; just remember to do it — it's easily forgettable. Now, if you want more robotics-oriented software, there is Drake. A few years ago it was one of the more exciting libraries out there; we have more competition now, I guess, but it still remains one of the more impressive robotics toolkits. It is developed at MIT by Russ Tedrake and his students. In the releases we worked with, they have a function LinearQuadraticRegulator, which takes, again, the state matrix, the control matrix, the state cost, and the control cost, and returns the optimal gain and the cost-to-go, so the quadratic form of the optimal cost. Now, what do we see here? What we see is that in practice you just call a single function — say this one, or, if you are using Python, say you call this SciPy function — you get your solution to the Riccati equation, so you get your S, then you find your K, and you're done. So it is kind of the same as using place, except it gives you the optimal cost along the way. I'll discuss the plus sides and minus sides on the next slide, but yes, it's very simple — as simple as calling place in MATLAB or Python; both have place as well. Okay, so LQR in practice comes down to using software that computes a solution to the Riccati equation. But this whole charade that we did, all of this from here to here, was not wasted — let me, before we continue, quickly explain. It is useful outside of LQR; it is an important piece of mathematics, and it is true for nonlinear systems as well, not just for linear ones. If your system becomes nonlinear but you can still solve the Hamilton-Jacobi-Bellman equation, great, you are in luck. So it is important to keep this in mind.
If you only ever work with this one ready-made equation, you are instantly limiting yourself to the systems for which it is available — in this case, LTI systems. But you can easily imagine that maybe you'll have LTV systems, maybe you will have systems with some small nonlinearity, or anything else; there can be a number of options. And if you can derive this equation from the HJB equation yourself, then you'll be able to stay flexible: you'll be able to keep working with Riccati-type equations even when you encounter something that is not covered in the textbook you have open at your internship. So it is important to know HJB, just to be able to derive the equations for slightly different systems. In fact, if I add a plus c here, or if I want to do trajectory tracking — say I want the cost to penalize not x but x minus x star — all of this is possible with the same kind of derivation; you can just do it. Now, this derivation is also important partly because, you know, it is easy, nothing too difficult about it, and it keeps your skill at taking derivatives from rusting; when you go on to other things, like the Kalman filter and so on, knowing how this is done here will be useful there, since the derivations are not that different. So I would recommend you practice it. It is also important simply because Python expects you to know it: they don't give you K, they give you S, so you need to remember the remaining step, and to recognize that this is the Riccati equation. If you want to understand what exactly you are solving, you need to remember this equation. It might be difficult to memorize, but if you know the derivation you will be able to re-derive it very quickly; and even if you don't remember the whole derivation, if you just remember its outline — for example, if you remember this part, the fact that you need to cancel R against R inverse, so R inverse ends up in the middle — then you only need to place the B and the B transpose correctly.
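The x minus x star remark can be sketched in code: with the same gain, you regulate toward a set-point by feeding in the error instead of the state. The system and the target values below are hypothetical, for illustration only:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Same double-integrator sketch; drive it to a set-point x* instead of 0
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
S = solve_continuous_are(A, B, np.eye(2), np.eye(1))
K = B.T @ S  # R = I here, so R^{-1} drops out

x_star = np.array([2.0, 0.0])  # hypothetical target: position 2, at rest
x = np.array([0.0, 0.0])       # current state

u = -K @ (x - x_star)          # penalize the error, not the raw state
print(u)                       # a positive force, pushing toward x_star
```

Note this works directly because x_star with zero input is an equilibrium of the double integrator; for a general set-point a feedforward term would also be needed.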
The rest you can recover just by thinking about dimensions, so it is not difficult. And if you want to memorize it, knowing the derivation will also let you easily remember the minus sign here. So I strongly recommend you study this as much as you can. Now, what is the comparison between this and our previous tool, pole placement? Just a few words. Pole placement has its own advantages: it allows you to design exactly how fast the control error decays to zero — you can decide how fast you go down to zero, because you essentially place the eigenvalues of the system — and you can design control without oscillations, for example; you can decide whether your response has oscillations or not, you can say: no oscillations. That is the plus side. The minus side is that it may easily require unreasonably high control gains. You may ask: please make the eigenvalue equal to minus 2 — and you need a control gain of 10,000 for that, so in practice it is useless. You'll say: minus 2 doesn't work, the gain is 10,000, let me ask for minus 1; and I'll tell you: okay, now it is 1,000 — still not acceptable, and so on. What I am trying to say is that it is very easy with pole placement to ask for something unreasonable, because you don't look at how the dynamics wants to behave, and you don't care how it works; you simply say: make it work this way. Let's say you have a bus and you ask it to behave like a sports car: accelerate this fast, decelerate this fast, and so on. That is a metaphor, not an exact analogy, but that is just not how you design a control system for a bus; it is not a good way to do it, because you would overtax your motors and so on.
So in fact it is very easy to go wrong: the whole idea of pole placement is that you don't really care about the dynamics beforehand, you just force the system to behave the way you want it to behave, and you can imagine this is not good. Now, with LQR it is very easy to produce good gains: if you just choose Q equal to the identity and R equal to the identity, most likely you'll produce something reasonable. LQR doesn't force anything on your system, and as long as your cost is reasonable, you will find the optimal control policy with respect to that cost, and it should be reasonable. Of course, here you can instantly see one downside. We are talking about optimality, right? We are talking about an optimal control policy, but optimal with respect to the cost — the Q and R matrices — and we don't really know how to choose them. It would be nice if someone told us what the cost should be, but no one is going to tell us, so we have to choose it ourselves; the right cost is not something you can look up, you just have to come up with it. And often the performance we really care about comes in the form of eigenvalues, not cost. So in fact pole placement deals more directly with what we care about, the eigenvalues, while LQR steps away and deals with a cost which we don't care about as much. That is a downside: for example, it can easily produce slowly decaying control error, or oscillations; if you use it naively, it is easy to get performance which is not very good. So both have their advantages: pole placement gives you exactly what you ask for, but you can easily ask for something bad; LQR usually gives you something good, but it often does not directly give you what you really care about, so performance can be jeopardized. All right, I think this is all I want to say about LQR. Do you have questions? No? Okay, that's good.
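The gain-magnitude contrast between the two tools is easy to see numerically; the system and the demanded pole locations below are my own illustrative choices:

```python
import numpy as np
from scipy.signal import place_poles
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [0.0, 0.0]])  # double integrator again
B = np.array([[0.0], [1.0]])

# Pole placement: demanding very fast poles costs huge gains
fast = place_poles(A, B, [-50.0, -60.0])
print(fast.gain_matrix)  # gains in the thousands

# LQR with identity costs: modest gains, still stable
S = solve_continuous_are(A, B, np.eye(2), np.eye(1))
K = B.T @ S  # R = I, so R^{-1} drops out
print(K)     # gains of order one
```

For a single-input system the placed gain is unique; here asking for poles at -50 and -60 yields gains of roughly [3000, 110], versus order-one gains from LQR.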
Today, I believe, you will have a similar seminar with a bit of a study of the discrete case of LQR, so that will be covered there; make sure you attend. All right, we'll see each other next week. Good luck, bye!