Optimization Problem Classes Overview

so today uh we'll we're going to uh finish this Whirlwind tour through problems um different types of problems and formulations and things like that and some named problem classes uh so last time I think we we quit right around here on on quasi convex optimization so that's when you minimize a quasi convex objective um and uh here you have to be I mean you should be a little bit worried because in fact uh the basic you know probably the most important fact about convex problems is that you can have a local minimum you can't have a local minimum that's not a a global minimum right but for quasi convex optimization that's false you you can have that right so you have to be a little bit more careful um not that much careful it's not that hard to solve um so the way you do this is uh if a function is quasi convex it means its Su sets are convex and so what we introduce is a function whose zero subl set is the T subl set of your function um and uh so what so so this and this function has to be convex right so F and how you construct f t there's completely generic constructions of it they're not useful but you know when people ask you how do you come up with that 5T you you can always do it often it just is suggested by the by the problem we looked at an example last time that was this um I mean here's an interesting one it's the ratio of of a non- negative over a positive function uh the numerator is convex and the denominator is concave right so uh this function here it is definitely not convex necessarily but it is quasi convex and to check that what you do is you look at the Su set of this of P you know when is it true that P of X ID Q ofx is less than or equal to T if T is negative uh then this is simple uh right because then the suble set is empty and that's convex so that's cool so we really only have to worry about t positive so if T is positive then P of X over Q ofx is less than T um well this is true without the T positive that's true if an only a p of x minus t q ofx is uh is less than or equal to zero okay now now the critical part this is a convex function of X provided T is positive that's because Q is concave right so so you're you're just using one of these simple rules in your head that's concave multiplied by T positive still concave minus now it's convex and then it's a sum right so this is the example um and what this allows you to do is really cool it says that if I wanted to check does there exist an X for which F0 of X is less than T I can check it by solving a convex feasibility problem right so that's the that's kind of the idea here which is really cool and then of course what you could do is you can just do bisection right you just say well uh can you find an X where F0 of X is less than 1.5 and if the answer is yes then you go cool and by the way you now have a feasible point with an objective value of 1.5 or better um then you'd say all right how about one or how about 0.5 or about minus one this kind of thing so I mean not minus one in this case but still um so that anyway that's how that's just bisection and that that actually uh Works um so you know the algorithm is actually just what I said I start with uh an upper and a lower bound on the uh optimal value of the objective I I just query I ask uh what I do is I I take the midpoint of that interval and I ask is there a is there a I solve a convex visibility problem uh to find out if these inequalities hold if they do then that tells you that P star is less than or equal to that value right it means it means you're above and by the way now I update my upper bound I pull it down to this halfway mark if it goes the other way around I take my lower bound and I pull it up to the halfway mark and what this tells you is that your let's see your interval of ignorance about the optimal value right that's p stars in some interval that interval shrinks by a factor of two every iteration right so or if you like you could even say this you could say that you obtain one bit of information about P star every time you solve one of these convex feasibility problems right and by one bit I mean that you know it it's uh you actually reduce your uncertainty exactly by a factor of two okay so so that's it and this is you know this is kind of standard and it works and so on so that's that's quasi convex optimization okay what we're going to do now is we're going to talk about a whole bunch of different problems some of them actually have named so there's named problem classes um I'm just trying to think you know it's maybe much less important these days that you know these um but things like this is just basic and there'll be people who don't know other stuff and they'll know these so you just you just have to know these uh these names right so uh linear program is a famous one that's P um and we we already encountered this a couple weeks ago it's it's minimize an affine function subject to a set of apine inequality constraints here and a and an equality constraint or another way to say it is you know the D doesn't matter here if I remove D this becomes an equivalent problem it's not the same it's equivalent uh it's an equivalent problem and so it's basically says minimize a linear function over a polyhedron so that's that's actually another way to State what a what a linear program is okay so um so actually there's this you know the modern history starts in around the 40s for linear programming but in fact I think Fier wrote a paper about linear programming so that tells you it's not it's not totally new um it became a whole lot more interesting when you could actually compute stuff so that would be after the age of computers so okay um so a picture looks like this right here's your feasible set a polyhedron each of these faces is a row a single linear inequality right so that's what this is and so these are the feasible points and there's a vector C it points this direction these These are the level these Dash lines show you the level curves of the objective which is cpose X right obviously it's going to be orthogonal they're orthogonal to C and what it says is you know this might be 3 2 1 zero okay like and so on I mean this there'd be minus one minus two and so on right so and so your job is to basically slide in in this thing go this direction as far as you possibly can and that would be like this point here right so um I mean this is very I mean this picture is not a bad one to have in your mind because it explains what an L what an LP is you know obviously we are not interested in solving LPS with two variables because you can do it with your eyeball okay so what we're interested in is solving LPS with 10,000 variables and 30,000 constraints okay people do that all like right now I guarantee you a lot of those are being done right now it's used for all sorts of things like scheduling supply chain stuff it's just used everywhere okay so like in my opinion everyone should know about it okay this is linear programming so um actually the first example is historical and I think that's it's kind of cool because it's well it's just part respecting history anyway so this is from maybe from n maybe 1940s or 50s and it's the diet problem um so I think it you know allegedly you know was supposed to be you know from the military like how do you how do you um how do you construct a diet that will keep soldiers alive uh and as cheap as possible or something like that right so I mean whatever so it's a perfectly good problem here it is uh so you're going to you have a bunch of foods and you're going to choose some quantities these are going to be uh non- Negative they can't be negative um and what we'll have is we have a bunch of foods so food J when you get one unit of it doesn't matter what a unit is but one unit of it um is going to have an amount AI J of nutrient ey so we care about a bunch of nutrients right like your you know your maybe 10 or 20 or five or something like that various nutrients um and a healthy diet is going to require nutrient I in at least a quantity bi so that that's your sort of let's say that's your minimum daily advised blah blah whatever it is right okay so then immediately well not quite immediately but I'll explain something here so the to find the cheapest healthy diet you minimize that's the cost of the food stuffs um and then you could even annotate you can imagine annotating the constraints which would be really good so if I annotated that that's not a con idate that as that's the cost total cost this says the amounts have to be non they have to be they they can't be negative right um and each line of this you could annotate and say this line says that's the minimum amount of protein in a diet right that's the something like that right um now technically speaking if you look at this and then compare it to what we defined an LP to be like this um it's not the same right um but here we're going to start doing things where once you're you know once I think at this point in the class and moving forward um you can do a couple of quick transformations of the problem without even mentioning it so anyone would look at that and say that's an LP doesn't have exactly the right form there's a few silly things like for example here uh you know the inequalities are they go the wrong way in fact both of them go the wrong way so I'll just do this once and then never again so what's going to happen is you have you know ax bigger than b and also X is bigger than zero and then we're going to write we're going to end up writing that let me get my notation down here so the notation is GX less than H so we're going to somehow write this as GX less than H right um anyway anybody got some suggestions minus sign yeah minus sign I like it okay so we'll start start by doing this we'll have like minus a so this thing is equivalent to minus a x is less than or equal to minus B and this one is equivalent to this that zero is the same as minus 0 everybody cool on that that was step one then step two is you just stack right so you just do this and that's less than uh okay B and zero okay so so here's G and here's H okay so everybody got that so yeah we'll never do that again I'll never do that again let's put it that way and the truth is you don't have to either because you know modern languages which you'll be using shortly uh to do this will do all the stuff for you okay so so you don't have to but you know in the old days or every actually every now and then you might meet a colleague you might be in an internship or something like that find someone who's using linear programming or something like that and they would they'd say you know how long did it take you to do and they would say oh it's huge pain took me like weeks and weeks and weeks uh because they had to convert their problem to they had to do this kind of stuff and keep track of everything right and then to debug that is not fun and so on so we refer to that I guess kind of derisively as Matrix stuffing so right there a question considered an identical ah okay uh good question these are actually equivalent yeah because yeah yeah actually I I think actually that doesn't even fit our general you know our general form was very restricted it was like you know fi is less than or equal to zero and stuff so U but for sure even if I mean it's yeah when you do these small Transformations which keep everything equivalent uh there the two thing the thing you transform something or reduce something to to use the computer science term um is not the same problem it is equivalent yeah so okay um all right so back to oh so that's our that's our uh here's our here's our our healthiest diet thing um what have I had an upper limit on a nutrient I mean can I can we handle it yeah it's kind of something similar like the answer is sure that that that would be we could certainly do that um what if there were like Supply availability issues and some of these Goods the some of these foods are only available up to some current limit of supply of course you could handle that so once you start figuring out how to do this you'll you can see that it's there it's very expressive there's a lot of stuff you could do with it okay so so this is a this is just a very famous historical example I think it's probably one of the first ones that people ever uh you know uh talked about okay next up for fun is peace wise linear minimization so this is a piece-wise linear function right expressed as the maximum of a bunch of affine functions so each each one of those has a graph that it's like a plane and then you just intersect them and that's your peie wise linear function right and then you want to minimize it right um and so the way this would work is if I walk up to you and I said is that an LP you would have to say no because I mean to check if something is an LP or a linear program the first thing you do is you check the object is the objective linear or apine it's not right I mean if m is more than one it's it's not apine right okay so um but in fact this easily converts to an LP um and you do that by the epigraph form so you just introduce a new variable T here and then these constraints here basically say that XT is in the epigraph of that function or you could just think of it this way it just says that t is an upper bound on the max of these things this says Ma so this would typically this would actually this problem here says minimize the upper bound on the max but also mess with the X's right right and so you can quickly argue you you could easily it's not I mean someone has to do it but you could show that these are actually equivalent um actually quite specifically if you solve that that's an LP that's an honest LP right there that that is a linear objective um and these are like linear constraints okay fine if you wanted to put the T on the other side right but that's that's sort of in in that thing and I'm not going to show that type of thing everybody got this so it's kind of uh it's kind of cool and this already is actually kind of useful um just to the ability and by the way most people let's see you're actually doing this kind of stuff if you came up with this and someone said well cool but how do I solve that people just look at you and go like uh it's an LP so people would expect you to know that okay or cool people would anyway other people would I don't God God only knows what they would do uh I do know some of the things they would do okay okay so we'll look at another example again of linear programming actually I think it's it's actually our first glance you know Glimpse at this uh this issue of finding a point that's actually it's our second find a point that's Cent that's that's deep inside a set like a center I think we talked about we already found we already encountered it once it was the yield Center it says here's your here's your region of stuff that works dial up the machines to get a Target so that your yield is maximized so that was one notion of the center um this is another one is a chbby CHF Center so here's my uh here's my polyhedrin described by uh linear inequalities and that your job uh is is this uh you want to find the largest inscribed ball inscribed means it's inside the polyhedrin and you want basically you want it to you want to find the largest ball that fits inside the polyhedrin everybody got it so that that's that's the question um let me just mention one thing right here cuz I it's probably worth mentioning um if someone comes up to you and says you know I mean if we're talking about linear programming right then here are the things first of all a part of your brain should light up right um on the geometric side you should see pictures of like flat function you know apine functions maybe you should also see pictures of Peace wise affine functions right those are those those are the things that should light up when I say linear programming to you in your brain right we know this we actually do take students put them MRI machines and check these things actually we don't but but we we could if we did then every time after that I could actually honestly say we did this but so so well I may even think I'll think about it um okay um all right so what I was saying is that the minute you get to a torm a a rather different part of your brain should light up right because when I say torm to you here's visually here's what should be showing up things that are round balls ukian distances right which should not be stored anywhere near the part of your brain that lights up when I say linear program which should be peace wise linear things flat things apine things that kind of everybody following this so honestly when you're listening and you see that and you just see this ball here you should say I there's no way that something involving a quadratic is going to end up as a linear program I'm just I mean OB viously it does it it's right here on the slide but the point is I'm just I'm just saying it's a little bit surprising right so okay all right so how do you solve that how do you find the how do you find the ball the largest ball that fits inside a polyhedrin well here's what you do um what you do the first thing you ask is when when is it true that a ball is in a half space because then to say a ball is in a polyhedrin is to say it's in all of these half spaces because the polyhedron is the intersection of the half spaces okay so when is a ball in a half space and the answer is kind of interesting we parameterize the ball by its Center and its radius that's a vector and a not a positive scalar okay okay so here it is um when is it true what you want to do is you want to maximize this over the ball and you want that maximum to be less than bi uh but that we can solve if I ask you to maximize an inner product over a ball it's it's easy we can do that exactly right because it's first of all this the Koshi Schwarz inqu basically the Koshi Schwarz inequality is tight that that's what happens right and so it turns out this this thing that's the maximum value of this thing over a ball of of this oh sorry uh yeah that's right that's the maximum value of this over the ball and it's equal to AI transpose XC that's this term plus and then over here we have ai transpose U but U is less than R and so by Koshi Schwarz that's definitely that's an upper bound but in fact it's equal if I take you the right way right so in fact U would be AI appropriately scaled okay so uh it says that you get that b is a subset of P if and only if this holds for each eye so that's this constraint here now when you look at this uh again you see a two Norm here and so there should be like a at least a yellow light flashing somewhere like this is not going to be an LP because it's got a two Norm it'd be the same if there was an exp or a sign or a cosine you would say doesn't sound like LP to me okay but you just got to you just got to be very careful here and take a look at this uh the variables are R and XC and the objective is what is linear that's cool and then we look at these constraints again the two Norm is causing this you know either discomfort or yellow flashing light right which is warning warning may not be need not be LP ahead but um but that's a a AI is given data it describes a face of the polyhedrin so that's just a positive number what kind of what kind of constraint is that in the variables which are XC and R it's just linear okay so I mean maybe I made too big a deal out of it anyway so by the way practical conclusion is you can solve this problem even at enormous scales okay so so there you go um by the way is the chbby CHF center of a polyhedrin is it unique is it see some people saying no I come what's an example a rectangle would have a bun I like it okay so here's A's a rectangle right this is going to come out terribly but okay that's a if that were inside that would that would have been a chabby CHF that would have been a chabby CHF a solution to the tabf problem but so is this okay so right so so that anyway so it's it's not um by the way this is already kind of uh quite quite useful here right because it's it's already it I mean this already has like just immediate applications right so okay um next up uh I think we're in around 1950 or something like that so uh here here we are this is uh yeah so part of this is also historical but it's not bad to know these names right so um here's one so linear fractional program that looks like this um you minimize a linear fractional function that's a ratio of two uh of two apine functions where actually by convention we assume the domain is where the denominator is positive okay that's that's what that is um and then s just over a polyhedrin that that's all okay now it's a quasi convex problem and we just talked about how to solve them so you can always solve a linear fraction Problem by doing bisection where you're calling an LP solver in each step right so we we just that's part of the generic thing um but what's cool here and this was figured out maybe also in maybe the 50s or early 60s I I don't remember exactly um it turns out you can solve this problem as a single LP and I'm not going to I won't get into the details but it's it's something where you can look in the book look here it's not totally obvious how to do this um you yeah you introduce some new variables I I think I won't go into it and you reparameterize and in fact it has to do with this perspective transformation I'm not going to say anything so by the way you should not there's no way a person can look at that with this and then say look at this and go oh yeah no I I totally see it so this an example of a problem transformation that's not obvious okay and I think I'm I'm not going to take the time to go into it okay so the others have all been obvious right like you know you can take an inequality that's a bigger than or equal to multiply everything by minus one and now it's less than or equal to right these this one is not actually there's a whole bunch that are not which we'll get to okay um now a generalization of that is a so-called uh that people call generalized linear fractional problem that's a maximum of a bunch of these These are actually quasi convex and the maximum of a bunch of Quasi convex functions is quasi convex so that's quasi convex and and in fact for this problem the only way to solve it is in fact by bisection as far as I know I mean not that it matters right but okay and here's a very famous example of that's from Von noyman and it's the it's the it's the Von noyman model of a growing economy right so let me explain what it is we have two sets of variables um X and X+ and each of those they're non- negative and they they represent um hey wait a minute do we have X hey shouldn't I have X bigger than or equal to zero here yeah we should or or maybe we oh maybe maybe it's it's absorbed in the denominator of this sorry okay I confused myself momentarily my confusion is over now though so everything's cool okay um here it is uh okay so X+ is the uh economic activity across n sectors of an economy X is the current economic activity across n sectors in an economy right um x i plus over XI is literal the growth in that sector of the economy so you would say I don't know you know steel manufacturing went up 6% it means that this number is 1.06 if it's 093 it means that it actually contracted 7% everybody got it okay so here's what we want we want to figure out um what would be the right economic activity levels to maximize the minimum growth in sectors across the economy that's that's this right okay um okay here are the constraints here's how X and X+ are are constrained um they're constrained this way we have two matrices we have a number of of of goods um that are consumed uh and you know that are consumed and also uh produced right by by economic activity right so one unit of economic activity in sector I might consume a bunch of things actually it's going to have positive and negative right because if it if it's consuming it'll you know it'll have all sorts of stuff that it's maybe consuming but it's also producing stuff you know like manufacturing automobiles you have it consumes steel and you know various other things and you know that it produces cars like just as an example okay so what this says is uh so a x a when you multiply The Matrix a by by the vector x what you get is the vector of you get what is produced this year that that that's a set of goods that were produced this year it would be so and the model here is very simple that whatever what if this is year quarters or months it doesn't matter the idea is that what's produced this year we kind of Imagine is not immediately fed into uh various but it's almost as if well we we do this in discreete time so what's produced now is kind of held in escrow for a year and then on January 1st uh everything starts up again and you can you use those and so what this says is that's the that's the vector of resources consumed by economic activity next year and it says that is less than or equal to what you produced this year and there's tons of variations on this right you could you know we could we could do all sorts of variations on this we could have for example we could do this for 10 periods and have a running um you know a stockpile of stuff right uh so okay everybody but this is the basic one make sense okay and and this is just immediately a generalized linear fractional problem a few things here I'm maximizing A Min whereas this thing should say minimize minimize a maximum right but you just flip the sign and everything's cool okay everybody got it so and by the way things like this these are these are not obvious things right I guarantee you with 10 10 sectors I you could not figure this out by hand I mean they're just absolutely out of the question right so I mean but of course we can solve a problem like that with 10,000 sectors like with no problem at all like you could let me put it that way right so everybody got this so it's kind of cool um okay next up H we're probably early 1950s is the so-called quadratic program this is Al this is like linear programming this is actually a problem class that you need to know about just because you know it's it's a thing people talk about it I guarantee when you go various places I don't care the field someone will say how do you do that and someone will say it's a QP or something like that right so so quadratic program is this uh the constraints are still uh linear just like an LP uh so we have a poly a polyhedron is your feasible set but the objective is a convex quadratic okay so that that's the idea and here the only thing that changes is here's your polyhedrin which is your your this thing and the suble sets remember in an LP they're just H the the not subl sets well Su L sets are are sub half spaces the level sets were hyperplanes now the level sets are actually the surfaces of ellipsoids I mean if p is positive definite if it's not then it's the surface of a I don't know whatever I don't know what people call degenerate ellipsoid or something like that okay so and this is the picture right you you want to go downhill here to the minimum um and uh you know while staying in here and here you'll find out it's right here here and actually you can see our optimality condition here that when you're optimal the the negative gradient which is pointing you towards uh you know towards the unconstrained minimum I mean roughly here um is actually an outward normal of the feasible set right and by the way if that were not the case I could wiggle X and do a little bit better right everybody got so this is a good picture um I mean what it doesn't what this doesn't tell you is that I mean and the point here it's a you should have this picture in your mind for sure um but I guess the main point here is that we can solve QPS with 50 you can solve QPS with 50,000 variables and it just it's nothing it's just nothing it can be done so reliably that you can actually have this embedded in control systems and have this done 100 times a second with a failure rate of zero okay just okay so that's that's yeah so everyone see the like the Falcon 9 first stages land right that's running that's running multiple QPS all the time one is running at a thousand times a second and it does not fail okay so so people I people use I'd say this just QP alone is running probably I would say more than half of all quantitative hedge funds just run QPS period period right so what that means is there are many trillions of dollars uh trading right now on on just a QP it's being solved so okay so I'm just saying it's it's it's not bad to know about okay uh let's look at some examples well here's here's a really simple one without constraints it it you would get things like just lease squares right so yeah and you know we we know how to solve that and that's just linear algebra but here's what's cool is you can add linear constraints now so I mean and this is uh I mean you can add linear constraints and it's that's that's and actually the you can add linear constraints but the critical part of that is to say if you had linear constraints the problem is still entirely tractable okay so and I guarantee you there are Fields where they do not know that okay every every year people I people say like oh yeah you know we're fitting these parameters and we do these very complicated experiments or you know long comput ations but you know the parameters like have some of the parameters have to be in ranges like they have some of them have to be non- negative and then they'll invent some unbelievably complicated terrible method for like iteratively doing stuff and you're like the right answer is dude that's a QP like just solve it like just be quiet like it's done kind of thing right um okay so um yeah I mean and there's lots of interesting I mean even just this one there's whole so oh here's a fun one um to do least squares where I insist I'm going to insist that the entries of X are non-decreasing everybody got that so X1 is less than or equal to X2 is less than or equal to X3 and so on that by the way would be a really good model of like wear and tear and damage in the system so if a jet engine's running then we'll just assume that you know if I pull it apart and go in there and like start measuring all my critical clearances I'm not going to after it runs another 10,000 hours of flight I'm not going to find that it's in better shape than it was before right there so that would be so that that's by the way a whole field in statistic I think it's called isotonic regression or something like that there's whole books written on it like so it's which I find kind of humorous right because because it's because here's how we it it's so I mean for us it's just really simple it's like excuse me it's a QP like that's what it is so um and so what it means is you don't you do you should not and you do not need to talk about how to solve it I mean you can have an intelligent conversation about it if you want to do it at super big scale if you want to do it and and embed it in some safety critical thing so that it's its failure rate has to be zero right then we can have a sensible conversation but you can't other other than that it's silly that's for you starting next week it's uh believe one line of python Two Fine two right so it's nothing okay everybody got this so I mean it's already these are already like these are useful things right here uh so um here's one let's let's do linear program with random cost and this is actually the um It's actually kind of interesting so that's a huge um that that that's a huge area it's it's called robustness right and what it says is it acknowledges the fact so here's a okay let's step back and I'll give you the big picture on on robustness when people use optimization they you know you you you're assuming like your your your constraints and your objective and all that they're sort of like quote accurate to like eight significant figures that's kind of what you're doing and you're modeling everything by floats and things like that right so so it's not you know it shouldn't take much if you've done anything in engineering or anything actually in any kind of real field uh you would know that of course they don't know the coefficients right like in our diet problem um every number in you know like the costs they you know it's because you ask somebody or or you check some Market or something but maybe the costs have changed in the last two day you know you know what I'm saying right um if I have a model of a mechanical system maybe I'm manufacturing it really well but you don't you don't know the young you don't know the third digit of the Young's modulus that's ridiculous um it's absurd in fact right in finance I guess my friends say uh about their models they said we're happy if we get the sign right right uh strongly suggesting that I mean everyone knows the third digit is like just a complete joke right the second digit is actually kind of a joke too right so they're they're worried about the leading sign right so okay anyway so there's lots of ways that people handle this we're going to come back and talk about this in great detail later it's actually the entirety of like regularization it comes up in in statistics and model fitting there comes up everywhere right um okay so this will be our first foray into it um so what we're going to do is we're just going to say that uh the cost is random so when you so the way this this problem is going to go like this um it says when you choose an x uh we don't even speak to you if x does not is not feasible so it has to sa x equals B GX less than H so that's let's say I construct a diet right and then the problem is now the C I'd say how much is it going to cost me and they're like I don't know it's a random variable you go oh okay and so you know you could do things like just minimize the mean on average I mean if you're going to do this every 10 seconds for like the next year you know the argument would be everything would average out and that'd be cool um but you might also care about the uh the for example the variance and you by the way this can get way fancier and we'll do that later so here this is a very traditional thing is this is that's the cost that's the mean cost and then gamma is a positive number that's called the risk aversion parameter Sigma is the covariance of C right and so this is the variance of the cost induced by your choice of X okay um so that's that's the idea and then this people would call the risk adjusted cost that's what that would be called right like say and you'd probably quote it and I mean it's probably honestly you'd probably take the square a more sensible thing would be to take a square root here but this is fine right because then they're on the same units right but anyway so that's what this is this is called risk adjusted cost um oh and you in principle pull okay so anyway when you do this that's that's a QP right because that's a that's a convex quadratic function oh provided gamma's positive right um if gamma is negative that's interesting um in fact what what do you if someone's doing if gam if you if you want to minimize a risk adjusted cost with a gamma negative that's called risk seeking um and uh okay let just a couple of things about solving about that problem a it's a really stupid thing to do because you're saying I mean unless it's like some thrill seeker you can imagine something where you're like yeah I'd like the same mean but I'd like the variance to be way bigger it just doesn't make it so number one it's a stupid thing to do number two let's talk about the computational complexity if someone says no no no no no you don't understand I really I'm I'm designing I'm doing some entertainment thing it has it has to be there has to be some Thrills in there so I want gamma to be minus .1 um is that a QP no it's not because this becomes no longer it's no longer convex right so anyway so this actually occurs a lot where you you have something where there's a box with the the things that these are the problems that are easy to solve these are the problems that are hard to solve these are the problems that are sensible to solve and these are the problems that are really stupid to solve and it doesn't split up 25 25 25 25 all the time I mean it's actually quite it's very usual that the problem you should be solving is actually the one that's solvable and then the other the other problem the other types of problems are ones that a you should not be solving and B that's just fine because you can't do it anyway so any it's just weird and that's just how it works out okay um oh this leads us to another one yeah you don't hear so much about this anymore but you still do is qcqp um and so this is where wow you know instead of having linear inequalities you now have convex quadratic okay so this is a that's a so-called qcqp and you you'll still hear some people talk about that okay now we get to something that's kind of modern meaning it's from the last 25 years okay and it looks just like like qcqp or something like that but it's not um and it's it's also like an incredible it's an innocent looking problem it's uh truly un believable what problems can be reduced to this okay so let's take a look at it it's you minimize a linear function that's cool that's like linear programming you have linear uh equality constraints um and then you have a bunch of constraints that by the way if this was just if I made A and B zero this would be zero this would be a general LP so what that says is whatever second order cone programming is it generalizes LP for sure because any LP I can solve with socp right and it just looks like this and so it's a it's a torm by the way this is torm and it is not torm squared right so that's one of the things you have to fight um that because of our collective training and and hundreds of years of math before us whenever you see a two Norm you square it right we all know this right everybody does that and you don't and you know there might be some good reasons right like now you get this analytical solution I don't know this kind of now it's differentiable I don't know so but the but mostly it's because it's because it's historical that's that's really why it is why you square so anyway very important point this is not squared okay um okay so that's a second order cone program and uh why because basically this constraint here says that the image uh under an apine mapping is in the second order cone that's what it says right because second order cone is all things of the form you know torm of U is less than v u is a vector v is a scaler and that's exactly what this is this says the torm of that is less than that and that's exactly what this is right so that's it and and this this includes like qcqp LP actually weird secret it actually includes probably 95% of any problem you would ever want to solve from a 100 different fields okay so you will be you will start using using like CVX Pi you'll write down like weird functions like quadratic over Lin and all sorts of other crazy stuff really complicated geometric means all sorts of weird stuff they will be reduced to ancp and solved okay so this is kind of the uh yeah this is the idea so and this is you know like I said this is kind of relatively new um it's also I think for a broad uh group of of of problems it's it's it's almost like it it this is the this is kind of the low-level uh language this is like the three byte code of convex optimization roughly for a reasonable fraction of it right that what it is is you write a problem a high form in a high level form it gets compiled into a second order cone program and then it gets solved and what what's nice about that Arrangement is it means that people all over the world right now are working on solvers for second order cone programs why because a much larger group of people is actually solving problems that they don't even know can be reduced to second order cone programs but then so it all works out very well okay so okay um Let's do an example we'll do a robust linear programming um and here's the way it's going to work um so this time before we said the objective is is UN is is uncertain now we're going to now we're going to make the uh inequalities uncertain right um and what we're going to do is is uh we will we'll look at what happens we're going to look at just what happens when AI is uncertain right so and you know look in our diet problem that says oh how much you know how how many nutrients do you get uh per gram of this food stuff right and the answer is I don't know depends on the shipment like you know we yanked uh we yanked a actually you know in June they're slightly different you know and whatever and by the way Friday's batch is different from Thursdays I mean not grossly but there'll be they could be it could easily be considerable right so anyway so you do that by the way then you fit an ellipsoid to those you and then then what you're doing is you're saying please construct me a diet that satisfies you know these minimum like Health requirements and you'd say oh but I want this diet that should work no matter what gets delivered to us uh W with different amounts of nutrients per food stuff everybody following this that's that's the robust solution to that right so okay so that looks like this um okay um and another one is the stochastic model in the stochastic model you'd say oh yeah you know what the AIS are random they're just they're they're gaussian or whatever some they have some distribution and you'd say yeah I want the probability that the constraint holds to be greater than some number you know typical numbers of ADA would be 0.9 that's pretty loose 9599 um it could be 999 999 or something like that but and that's okay as long as you understand what you're doing when you solve that problem um yeah if someone said yeah no no no we do we we we we make sure it works you know for like you know that you know that the probability is bigger than 99.9% of course the question there is there's no one in any field that I'm aware of who actually knows the distribution of something the taale of their distribution that well this is just sorry this is just a weird rant but I'm just I'm doing it anyway way like there's no such thing right if you go if you go to anywhere and and someone says oh you know no we want we want the probability to be more than 99.9% it's hilarious to me right it's just ridiculous U okay so okay um however it's actually still a perfectly good problem to solve because when someone looks at your code and it says the probability should be bigger than 99.9% you're like wow you must really have a good handle on the tals of that distribution and then here's the the correct answer you go yeah no we have no we have terrible models at the tail distribution and you go but your probabil and you go that just means like really please I want this to hold that's that's what it means it doesn't mean anything more than that so okay everybody got that so well that was weird that was a weird rant okay so fine uh okay so let's do the deterministic one that's going to turn into ancp how do you do that well we write the ellipsoids as a center plus here uh a matrix uh a linear mapping of the unit ball so that's my so P tells me like the ellipsoid where is it big which axis is it big in and stuff like that okay so that by the way that's one of several representation of ellipsoid that's what we're going to use here okay so we do this um and here if you want to know uh we really have to work out what is the maximum of this expression over that ellipsoid and then you would just end up with this right and now again we very you know we sit down um we actually with chabby CHF Center we encountered something with a torm but the argument of it was a con was data and therefore it was a constant and it was just a red flag and you're you're yellow flashing light in your brain didn't matter here though we have to be careful so you look at the problem that's linear that's cool um and this objective is indeed uh an an a second order cone uh problem a second order cone constru people say so constraint here right um actually it's super interesting right like if you just said you just said you know what I don't care about this uncertainty stuff I don't believe it besides this stuff is only off by plus orus 5% I don't care then what you do is you just replace the uh a with AI bar which is kind of the the middle value right so you just do that if you do that you get this problem here that's an LP but then you'd say you'd say yeah but you know there's uncertainty so I would like a little bit of margin in my in in my inequality just to cover me when a Wiggles around right what's interesting is this tells you exactly how much margin to use right so what this says is wherever p is Big you should avoid having X in that direction it makes sense in fact you could even you could literally flag that number and call that the that that is the margin and the cool part is it's a function of X it's a function of your choice everybody got this so okay so that is a uh that's a an example so a a uh you know a ro robust LP with you know ellipsoidal uncertainty is in fact ancp okay um we can also do the same thing with a statistical model and it works just fine um weirdly you also end up with ancp so here here I say no you know what I don't know I took enough statistics and stuff so AI is gaussian with a mean a bar that's a a vector and a covariance matrix right so that means that this linear function of it that's also gaussian right because linear function of a gaussian vector is gussian um and it's got a mean a a bar transpose X and it's got a variance X transpose Sigma X and that means you know the probability that it means this is just a that's just a gaussian so a scal of gaussian the probability scal of gaussian is less than a number is equal to this right you you know you subtract the mean and then you divide by the standard deviation you get an n01 variable and then you uh then you look at the CDF so that's that's the CDF of a of a of a gan okay everybody following this um yeah by the way it's not looking super good here because these are replaced with uh the the these now you do know some things about that for example that's a log concave function the Gin CDF right so but doesn't sound like this is going to end up as a second order cone program right because you know it just doesn't sound quadratic or something like that so but it is and the reason is pretty straightforward um you what we want to do is we want to say that this thing is bigger than or equal to Ada right uh Ada is 095 there you go okay but this fi is monotone right so this says this expression is bigger than fi inverse of ADA everybody got that okay now F inverse of ADA it's going to it's going to be uh it's going to be positive right because Ada is like 085 and stuff oh actually if Ada is less than 50% vaa is less than 50% uh F inverse of beta is negative right if I have a standard gaussian and I say what's the inverse CDF of3 I don't know what it is but it's like minus a half or something roughly I just made that up but still everybody got this right so so provided that your required probability that the constraints hold is more than 50% um then then you can actually multiply out this thing this thing bigger than equal to F inverse of ADA and you multiply it out like this and lo behold it's ancp oh actually it's only ancp if adaa if this thing is is bigger than a half because otherwise this is negative and then you got the wrong you're you're sitting on the wrong side of the inequality there everybody see that so now if someone came up to you and said please help me solve this robust problem and you go what is it you go well the thing is I don't know the constraints and they they vary and I'd say well how with what probability would you like your constraints to hold and if they said something like oh 15% would be good for me like you would say go away I mean you'd check or what but You' send them away and you could also and as they're leaving right you can shout out it says it says not only is that stupid but you can't even solve it that's what I would say to that person that's as they're leaving okay everybody got I mean you there are weird cases where you might have risk-seeking things uh the closest I've ever come to hearing somebody describe risk-seeking something was okay not socially positive but it was somebody who was designing missile uh missile controllers and they said Well if you know I if you have high probability of of making a kill do it but we'll take 10% because otherwise it's you already launched it it's on its way and it's not coming home and you're not going to use it again I don't know if everybody got that so and I I think I even quasi accept that as a as an actual application but okay if if not a socially positive one but anyway okay that's okay next step in our tour um so uh next topic is geometric programming um so yeah you're seeing a lot of stuff this is very cool this goes back into the 1940s and actually weirdly it persisted well let me explain what it is um so it's it's a class of optimization problems where we are going to apply it is absolutely not a convex problem there's no doubt about it we'll look at it in a minute it's very weird um but it's not a convex problem but what we're going to do is we're going to make a a transformation uh we're going to change the variables and change a few other things that's all and now it's it's going to become convex under this change of variables now you can solve it it scales to gigantic problems and all sorts of other stuff okay so that's this is an example of that we're going to see others in statistics we're going to see them we're going to see them in other areas too right so um okay so and it's got its own weird language um oh what the other thing I was going to say was this was an entire field people talked about it people wrote books about it from the' 60s all sorts of stuff it wasn't until like around 198 80s that people actually realized it was convex oh sorry it's not convex can be reduced to an equivalent convex problem and therefore scaled right it means in fact you don't even have to say anything at that point you just stop and you say it's done okay so although some of my friends who were in Moscow in the 60s said well we all knew that like uh so I'm hardly surprised in the west they didn't know that anyway so everybody got this okay so here it is and it's got its own secret language which you should never use these words in the wrong situation because you will sound like an idiot unless unless other people know the background and things like that so here's what it is um if I have a product these are positive variables and I raise them to various Powers but the powers are just real numbers they can - 1.3 plus 2.6 - 3.5 that everybody got that right so uh so it's a product by the way I think people talk about this as a scaling law in a lot of engineering Fields right like you know how does this go with like Reynolds number and all that kind of so it's a scaling law so I think some people are you're familiar with things that look like that okay so um anyway so in this field of geometric programming um they decided to call this a monomial okay However unfortunately there is a definition of monomial in math which has been used since like 1820 or something like that and it is absolutely standard it's like Min there is no doubt what a monomial is a monomial is one and only one thing it is a product of variables with integer exponents and the coefficient in front can be anything positive or negative everybody got that so minus 3 X1 cubed * x^2 * X5 that is that's a monomial that you can say outside this room but only in the context of geometric programming would you call this a monomial so so just just just a heads up because if you say if you use you know that's a monomial and there's anybody around who doesn't know about geometric programming they'll think you're an idiot okay so okay and this is even worse a POS nomial like and I'm sorry we didn't make this up this is this is simply the name assigned to it in the 1960s it's a sum of monomials okay that's called opposing and I think it's supposed to be like positive and monomial forget the fact that their use of monomial is not the absolutely standard use right so anyway whatever that's poomi um and then a geometric program looks like this it says you have positive variables that's implicit um and you minimize a poomi sub subject to yeah and don't say poomi out make if you say that make sure everybody around you within any hearing distance no knows about GP and knows that you know about GP otherwise don't say that word uh because I mean you could try it just walk over to the math department and see what happens but anyway um okay and then you and you have equality constraints right so you know this is not remotely convex I mean these are these are highly This is highly nonlinear right I mean here's an example of this you know X1 to the 1.3 time X2 to the minus. 3 is equal to 1 well that's not that is not Aline last time I check it's not even okay everybody got that okay that's a that that's a GP and books were written about this in the 60s people solved them by hand I'm not kidding um when they were easy enough it's got kind of weird but then here's what was discovered in the West in the 80s and supposedly in Moscow everybody knew this in the 60s so there it is um what you do is you CH you change variables they're positive variables so I'm going to work with the logs I call those y right that's a change of variables so I'm going to have a Yi um and then when I have a monomial what does that convert to well it's the log um oh sorry yeah let's see what it is a um a uh if you have F right this transforms to these are exponentials actually to the exponential of an apine function right if I take the log of a monomial I get an apine function okay um then a poomi is going to transform into a log sum exp of an apine function and they're convex right so that means the geometric program over here which is not remotely convex is actually exactly equivalent to this problem and this problem is convex because look at this um this is log sum X of an apine function log sum X is convex and convex of apine is convex so are all these that's linear aine whatever what you want to do everybody got this so it's actually pretty cool it's even cooler because every time when you look at a practical field where people use geometric programming you will find out that they already either either implicitly or not figured out you should be working with logs okay I'll give you an example A lot of times there's power control for Wireless Systems or actually for cables right you you and actually so Engineers there they use decb to talk about Powers well guess what that's the log right um if you go to Circuit design and you'd say well tell me about my library what sizes do these come in and they go oh yeah you can get that uh you know that whatever that nandate you can get it in the size5 size 1 1.4 2 you know 4 8 16 sound familiar uh what it means is they already knew in their hearts that that that basically what was what it it wasn't you should not work with the size you should work with the log of the size right because that's what those numbers are so yeah is it always the same K in like the objective and the inequality uh let me think uh uh it it let me think let me think about that sorry uh it is let me it may not be actually let me think I don't think it is you're right because these are different um yeah there's some there's something yeah the K doesn't have to be the same no wait a minute no it does not have to be so yeah it doesn't have to be it's not part of the definition of like it wouldn't matter so okay so that's GP um I'll I'll go in one examp I'm not going to go through the details I mean actually the real truth is I couldn't even defend the details um but I will tell you a funny story about it uh so this is on a book I I found this in a book on um optimization for engineering design and this example was held up as an exam they said oh here's a problem that is not convex and that was literally Lally the point of the example that's a true statement but you do the right sneaky change of variables and it is convex so um and so the problem is this is you you know you you're going to design this Cal lever beam and you're very uh what what you can do is uh you you can change the length that's this thing that you could see here but also the width which you can't see that's that's how wide it is uh over here um and um so that's the uh that that that that that's what this is um and then we're going to put a a force at the tip and of course I mean you don't have to know any mechanical engineering to know it's going to deflect right and if you if you make yourself a you know a giant thick thing with this is steel or something like that it's going to be very stiff right because then when you push you put a lot of newtons on it and it's barely going to budge right or I could make it sort of this flimsy small thing and I press it and it's going to really Bend right so that's that's kind of that mean and again you don't have to know any mechanical engineering to know this I mean I I don't know any so that's fine so it's fine but that's just that's the rough idea right um and so your design problem is something like you know minimize the total weight subject to I mean these are just uh you know this could even be manufacturing constraints right that you just there just limits to these um you you could have an upper and that's a geometric constraint um you can have an upper bound on the stress in each segment that that's a reasonable model of of you know failure um and uper and then you want an upper bound and deflection of the beam at the end right so what you're saying is if I load this with 10 kilon Newtons I don't want it I don't want it to Sag more than 3 cmers right that would be your spec okay everybody got it and by the way problems like this but simpler were already being solved in the early 1950s in Aerospace they were doing exact and they were using linear programming and mostly linear programming in Aerospace right pretty cool right because you're you're you know if you design a bridge that people are going to drive over for a hundred years you build a lot of margin into it and stuff like that if you're doing Aerospace stuff you can't over you can't engineer stuff 5x over right because it just weighs too much and congratulations now you have no payload right so there it's delicate you you want to you want you want things to be as absolutely as light as possible with uh you know but still actually satisfy all the requirements that is actually going to work right which means that it can't it's got to be stiff enough it's got to handle all sorts of dynamic loads all that kind of stuff so anyway so that's the that's that's the IDE so stuff like this is done all the time in fact there's a whole field it doesn't matter I'll talk about that later okay so that's it okay so uh here it is in English and then you know in words to work out what these things are I'll just I'm going to just talk about a few of the things and then not I don't want to focus on it because it's not that critical to us but here's the total weight uh the variables remember are W and H and I would like to to ask you what kind of function is that I mean aside from it being a poomi which you should not never say in public but what kind of function is this of w and again pretend you're in public so you can use words that you're allowed to say in public what kind of function is that SC it's what SC produ I missed that dot product it's a DOT product right um it's an inner product between W and H is it convex in W and H no it's a quadratic form right it's the quadratic form with Matrix zero like I I zero right so that that is not this is not convex right so you know already we are not solving a convex problem or yeah well when we get around to solving this we'll do this change of variables and it will become a convex problem okay um okay so the these various ratios are monomials and and the the maximum stress is a monomial so those are fine to to handle the vertical deflection comes from again I couldn't defend this but this is uh I mean I suppose it's just like Elementary you know baby physics and mechanics or whatever but it's just you you start at the tip and you go back and this just a recursion that tells you this what's kind of cool about this is you look at uh various things uh here um but what you'll see is that each each of these right if you start these you can actually by applying the rules oh I should say there's a full other set you can transcribe all the rules you know for convex functions and they will they will have analoges for like pooms and monomials and I give you some example right let's let's just go back I mean um okay okay here we go I want to do this um talk talk to me about the product of monomials what is that product of two monomials no sorry two I have two monomial functions again not in the math sense in the G geometric programming sense I have two monomial functions and I take the product what is it it's a monomial it's a monomial right uh what's the sum of two monomials a POS right okay what's the product of two posals all right what's a POS what what is a poomi divided by a monomial it's a it's aomi it's fine uh what's aomi divided by a poomi right the answer is nothing we don't know okay so I'm just saying there's a bunch of these rules that that you're going to that you're going to use um and down here I don't want to go into the details but you know you just argue that you know a sum of posals is poomi or whatever that kind of thing and and what you'll end up finding is that these things are just poomi functions they're quite complicated but they are poomi functions right anyway then you you reassemble everything to look like this uh and there it is in in this standard GP form right um by the way um you don't have to do this right um because what's clear is that you can actually write these out in some object Orient in some domain specific language and then something would do all of this stuff for you and the truth is you shouldn't even look at this any more than you should look if you actually the way you should look at like after you compile something why would you look at the bite code or something like that right I mean you could a very small number of people have to because that's what they do but the rest of us should never look at that you know what I'm saying and I think the same is kind of true for this right so okay um I think I'll I'll skip this one this is just some Advanced uh topics and I'll talk about a couple more uh things uh just actually two two more main topics um okay oh uh to to finish up the other one um geometric programming is your first example of a wide class of non-convex problems that can be transformed to equivalent convex problems and therefore solved right so that's that's the main point about it okay so it's actually we're going to see plenty of others right we'll see that we'll see it also happening in statistics and it happens in control and other areas where um in fact I'll even tell you a story I I used to go around and give talks this is a long long time ago um and I would say look here's a problem in control oh it's convex here's another problem in control oh look it's convex and I said I had a slide that was like I said here's a here's an example of a problem and I said this I don't think it's convex right so I anyway so I went around and gave that talk to several thousand people I was at Berkeley I gave the talk colleague of mine came up to me afterwards and he said oh you see that problem that you just said was probably nonc convex I was like yeah and he said can I show you something I was like sure anyway he had some completely sick demented change of variables like instead of this positive definite Matrix we're going to work in the inverse and instead of these we're going to work in the inverse of this times that and I was like Andy why would anyone do that and he goes watch this two lines later it's con and I step away and I'm like oh no oh no I've been telling people all over the world that it was not then we looked at my slide and it was really awesome it says it says a problem it's not going V and there was a little footnote and it said probably so I was safe then by the way after that I told that story every time so I said so yeah so basically you should be very careful because this this anyway so okay fine back to this one now we're back to honest convex problems okay so uh okay so all we do here is the constraints we're going to have Vector constraints instead of uh instead of just SK you know scalar in inequalities right um you know the uh a very famous example of that is called a conic form problem or a cone form problem right and that looks like this you minimize a linear function subject to FX plus G is less than or equal to zero but that's with the respect to some cone K and ax equals B what's cool is if K is the non- negative this is just a pedantic way to write down an LP right but the cool part is K could be other cones and then you get like if I make k a product of second order cones this iscp but I can have other things too and a very famous one is uh so-called semi-definite programming so that's something you will see this is probably it's not bad to have heard it people talk about it in actually a whole bunch of fields now right so in theoretical computer science people blabber about I people in physics use the I so it's it's it's become mainstream in the last 20 25 years um here it is you minimize a linear function subject to equality constraint linear equality looks F and then you have a single inequality which looks like this in fact it it looks like a linear inequality if these were scalers and that was ordinary less than or equal to this would be a single linear inequality it' be a silly LP but here these are matri symmetric matrices and this says this linear combination this apine function which is a symmetric Matrix has to be negative semi-definite okay so that's a that's called a semi-definite program um and uh oh this thing is called an LMI a linear Matrix inequality it's like a linear inequality except it's a matrix inequality so okay um and you know there's things that you do you don't it this says you only have one but you could just diagonally stack them like this um yeah yeah oh any any suggestion how would I how would I make how would I represent an LP as an sdp any any idea you have multiple I have 10 linear inequalities on xcalar how do I write yeah dial yeah you make you make these matrices all diagonal and that's diagonal and then it's silly because a DI a diagonal matrix is less than or equal to zero right I mean if it only if all the entries are right so so this extends LP you would say right so okay this is a this is a semi-definite program um okay and as I said you can easily represent these things it also extends sdp um and that sorry s socp and that's also interesting not obvious at all uh and in fact that goes through something called the sure complement which is O should put that on one of the next coming homeworks yeah okay sorry okay okay so sure compliment it it's this uh this says it has to do with a block Matrix um by the way this is not a everyone here has seen sure compliment it's just people didn't use that name right so you know if you're in statistics or probability I mean either you are or you took those classes I hope then this is conditioning on a gan but it's tons of other stuff in mechan electrical engineering it means you take a an electrical circuit you terminate a bunch of the ports like you short those and you open circuit those and then you ask what what does the rest of it look like this is all the all these things are sure complements right so here it is it says that a matrix I I mean I I'm not going to go into it now but this inequality which notice is like nonlinear it's got this two Norm is equivalent to this thing and this is that's a matrix which is apine in X right and the the argument is this thing is positive semi-definite if and only if you know that's bigger than or equal to zero and sort of this thing minus that times the inverse of that times that um is um is uh is going to be less than or equal to u a zero right so that or bigger than equal to zero that here right so that's so I won't go into that detail but it's um it's interesting um so I I'll do an examp here's an example um you know and these are like we're already in the domain of like these are not obvious problems right so here's one I have a bunch of symmetric matrices I'm going to form an apine combination of them uh no I mean I'm going to form this an apine function with these coefficients right and I'd like you to choose X1 through xn to minimize the maximum igen value of that Matrix okay now we we already discussed this you know that's convex right but that doesn't mean it's obvious how to solve that problem it turns out it's just an sdp and it just it literally looks like that it's really dumb this is by the way that's an epigraph for the maximum IG value right that the maximum IG value is less than something if and only if that Matrix is less than equal to times T the identity right um okay so that's that's the idea but by the way how many people here have encountered or heard of sdp somewhere else just for fun you did cool where what context theoretical computer science ah there we go awesome right uh did anyone know that you could solve them except for theoretically oh not really okay yeah I thought so okay that's fine that's cool totally cool I have lots of friends who do that there was another one who saw it stat what in statistics in statistics okay cool yeah okay good so so it's getting I mean it's obviously not totally mainstream or half the class would have raised their hand but you know it's it's it's getting there it's not like it's not crazy stuff anymore um okay oh here's an interesting one like um minimize the uh this is a two Norm so that's the induced two Norm it's the maximum singular value right of of a matrix and it says uh please like minimize that over some apine set and this too you do the same you do the same thing you express this as a uh as a an LMI and then then everything works out and the conclusion is we can just solve these problems now we just we just solve them uh so it's actually it's kind of interesting and cool um yeah so okay and I think what we'll do is we're going to quit here um we have one more uh one more very important topic on optimization uh next week and there next week we're going to do a duality which is actually super interesting it's still theoretical but it's actually super practical

Transcript for:Optimization Problem Classes Overview

Transcript for:
Optimization Problem Classes Overview