Transcript for:
Overview of CS 221: Artificial Intelligence Course

All right. Let's get started. Please try to have a seat if you can find one, and let's get the show on the road. So welcome, everyone, to CS 221. This is Artificial Intelligence. And if you're new to Stanford, welcome to Stanford. So first, let's do some introductions. I'm Percy. I'm going to be one of your instructors, teaching this class with Dorsa over there. Dorsa, do you want to say hi? Hi, I'm Dorsa. I work on robot interaction. Super excited about this new class and what we've got today. Great. So we're going to be trading off throughout the quarter. And we also have a wonderful teaching team. These are your CAs. If all the CAs could stand up, I'll give each person an opportunity to say three words about what they're interested in. Let's start with the head CA. Hello, I'm a PhD student, and I'm interested in natural language processing. Hi, I'm a second-year master's student, and I'm interested in machine learning and data mining. Hi, I'm a second-year master's student, and I'm interested in machine learning and natural language processing. Hi everyone, I'm a first-year master's student, and I'm interested in computer vision. Hello, I'm a first-year master's student, and I'm interested in reinforcement learning. Hi everyone, I'm a first-year master's student, and I'm interested in machine learning and differential privacy. Hi, I'm a postgraduate student, and I'm interested in reinforcement learning and AI education. Hi everyone, I'm Kendra. Are there any TAs in the back? Well, they're all on the slide. Okay. So as you can see, we have a very diverse team, and when you're thinking about final projects later in the quarter, you can tap into this incredible resource. So, three quick announcements. There's going to be a section every week, which will cover both review topics and advanced topics. This Thursday there's going to be an overview, so if you're rusty on Python or rusty on probability, come to this and we'll get you up to speed. The first homework is out; it's posted on the website. It's due next Tuesday at 11 p.m. So remember the time; that matters. All submissions will be done on Gradescope. There's going to be a Gradescope course code that will be posted on Piazza, so look out for that later. Okay. So now let's begin. When I first started teaching this class seven years ago, I used to have to motivate why AI was important and why, if you study it, you'll have a lot of impact in the world. But I feel like I don't really need to do that now; it's kind of inescapable. You pick up the news in the morning and you hear something about AI. And indeed, we've seen a lot of success stories: AIs that can play Jeopardy, Go, Dota 2, even poker, all these kinds of games, at superhuman levels of performance. AI can also read documents and answer questions, do speech recognition, face recognition, even medical imaging. And for all of these tasks, you read about how successful these technologies have been.
And then if you take a look outside the technical circles, there are a lot of people in policy trying to ask what is going on with AI. And you hear these very broad claims about how transformative AI will be to the future of work and society and so on, some even bordering on pretty catastrophic consequences. So what's going to happen in the future? No one knows. But it is fair to say that AI will be transformative. But how did we get here? To answer that, I want to take a step back to the summer of 1956. The place was Dartmouth College. John McCarthy, who was then at MIT and who later founded the Stanford AI Lab, organized a workshop at Dartmouth College with some of the best and brightest minds of the time: Marvin Minsky, Claude Shannon, and so on. And they had this not-so-modest goal, the conjecture that every aspect of learning or any other feature of intelligence can be so precisely described that a machine can simulate it. So they were after the big question of how you solve AI. Now, they didn't make that much progress over the summer, but a lot of programs and interesting artifacts came out of that time. There were programs that could play checkers or prove theorems, sometimes even better than what the human proof would look like. And there was a lot of optimism. People were really, really excited, and you can see these quotes by all these excited people who proclaimed that AI would be solved in a matter of years. But we know that didn't really happen. And there's this kind of folklore example. People were trying to do machine translation. So you take an English sentence like "the spirit is willing but the flesh is weak," you translate it into Russian, which was the language of choice for the US government at that time, and then you translate it back into English. And this is what you get: "the vodka is good but the meat is rotten." So the government didn't think that was too funny, so they cut off the funding, and that became the first AI winter, a period where AI research was not very active and not very well funded. So what went wrong here? These were really smart people; they just got maybe a little ahead of themselves. Two problems. One is that the compute was simply not there: it was millions or even billions of times less than what we have right now. And also, the way they formulated the problems intrinsically relied on a kind of exponential search, and no matter how much compute you have, you're never going to win that race. They also had limited information. And this is maybe a more subtle point: if I gave you infinite compute and asked you to translate, I don't think you would be able to figure it out, because it's not a computation problem. You need to learn the language, and you need to experience all the subtleties of language, to be able to translate. But on the other hand, even though AI wasn't solved, a lot of interesting contributions to computer science came out of it.
Lisp had a lot of the ideas that underlie many of the high-level programming languages we have: garbage collection; time-sharing, allowing multiple people to use one computer at the same time, which is something we take for granted. And also this paradigm of separating what you want to compute, which is modeling, from how you do it, which is inference, which we'll get to a little bit later. Okay. So, people forget quickly, and in the '70s and '80s there was a renewed generation of people getting excited about AI again. And this time it was all about knowledge, right? Knowledge is power. A lot of expert systems were created, and the idea is that if you could encode experts' knowledge about the world, then you could do amazing things. At the time, the knowledge was generally encoded as a set of rules. A lot of programs were written, and you notice that the scope is much narrower now. The goal isn't to solve all of AI, but to really focus on some choice problems like diagnosing diseases or converting customer orders into parts. And this was the first time that AI, I think, had a real impact on industry. People were actually able to make useful products out of this. And knowledge actually was a key ingredient in curbing the exponential growth that people were worried about. But of course, it didn't last long. Knowledge as deterministic rules was simply not rich enough to capture all the nuances of the world. It required a lot of manual effort to maintain. And again, a pattern of over-promising and under-delivering, which seems to plague AI, led to the collapse of the field and the second AI winter. Okay. So that's not the end of the story either. But actually, it's not really the beginning either. So I'm going to step back further in time, to 1943. What happened in 1943? There was a neuroscientist, McCulloch, and a logician, Pitts, who were marveling at how the human brain is able to do all these complicated things, and they wanted to formulate a theory about how this could all happen. So they developed a theory of artificial neural networks. And you can think of this as the roots of deep learning, in some sense. What was interesting is that they looked at neurons and logic, two things that you might not necessarily associate with each other, and showed how they were connected mathematically. And a lot of the early work on artificial neural networks was about studying them from a mathematical perspective, because at that time the compute wasn't there; you couldn't really train any models. And then in 1969, something interesting happened. There's this book by Minsky and Papert called Perceptrons. This book did a lot of mathematical analysis, and one of its results, of many, was showing that linear classifiers couldn't solve the XOR problem. Another way to think about that problem: given two inputs, can you tell whether they're the same or different? It shouldn't be a hard problem, but linear classifiers can't do it.
And for some reason, which I don't quite understand, it killed off neural net research, even though the result said nothing about what a deeper network could do. But it's often cited that this book swung things from people being interested in neural networks to the field of AI becoming very symbolic and logic-driven. But there was always this minority group who were really invested in and believed in the power of neural networks, and they thought it was just a matter of time. So in the '80s, there was renewed interest. People discovered, or rediscovered, the backpropagation algorithm, a generic algorithm that could train these multi-layer neural networks, because a single layer, remember, was insufficient to do a lot of things. And one of the early success stories was Yann LeCun in 1989, who applied a convolutional neural network and was able to recognize handwritten digits. This actually got deployed by the USPS, and it was reading zip codes. So this was great. But it wasn't until this decade that this area of neural networks really took off, under the moniker of deep learning. AlexNet in 2012 was a huge transformation: it showed gains on the ImageNet benchmark and overnight transformed the computer vision community. Then AlphaGo, as many of you know, and many others, and the rest is history. Okay. So there are these two intellectual traditions. The name AI has always been associated with John McCarthy's logical vision; that's kind of where it started. But as you can see, there's also this neuroscience-inspired tradition of AI. And the two had some deep philosophical differences and over the decades fought with each other quite a bit. But I want to pause for a moment and think about whether there are actually deeper connections here. Remember McCulloch and Pitts: they were studying artificial neural networks, but the connection was to logic, right? So even from the very beginning, there is this synergy that some people can often overlook. And if you take a look at AlphaGo: if you think about the game of Go, or many games, you can write down the rules of Go in logic in just a few lines, so it's a mathematically well-defined logic puzzle in some sense. But somehow, the power of neural networks allows you to develop models that actually play Go really, really well. This is one of the deep mysteries, and I think a kind of open, standing challenge, in AI. As with any story, this isn't the full picture. And I want to point out on this slide that AI has drawn from a lot of different fields. Many of the techniques that we're going to look at came from elsewhere: maximum likelihood came from statistics, games came from economics, optimization and gradient descent came from the '50s, completely unrelated to AI; these techniques were developed in a different context. And so AI is kind of like New York City: it's a melting pot where a lot of these techniques get unified and applied to interesting problems.
And that's what makes it, I think, really interesting: the new avenues that are opened up by unique combinations of existing techniques. Okay. So that was a really, really brief history of how we got here. Now, I want to pause for a moment and think about what the goal is. What are AI people trying to do? There are two ways to think about this, and sometimes the conflation of the two causes a lot of confusion. I like to think about it as AI as agents and AI as tools. The first view asks the scientific question of how we can create, or recreate, intelligence. The second asks how we can use this technology to benefit society. These two are obviously very related and have a lot of technical overlap, but philosophically they're kind of different. So let me explain this a little bit. The idea with AI agents is, I think, a lot of what gets associated with AI, and the portrayal in science fiction certainly encourages this view. We're human beings, and what you do is you look in the mirror and say, wow, that's a really smart person, and you think, okay, what can humans do that is so amazing? Well, they can see and perceive the world, recognize objects. They can grasp cups and drink water and not spill it. They can communicate using language, as I'm doing to you right now. We know facts about the world: declarative knowledge, such as what the capital of France is, and procedural knowledge, like how to ride a bike. We can reason with this knowledge, and maybe ride a bike to the capital of France. And then, really importantly, we're not born with all of this. We're born with basically none of these capabilities, but we're born with the capacity and potential to acquire them over time through experience. And learning seems to be the critical ingredient that drives a lot of the success in AI today. With human intelligence, too, it's clear that learning plays such a central role in getting us to the level we're operating at. Each of these areas has spawned entire subfields, and the people in them are wondering how you can make artificial systems that have the language, or the motor, or the perceptual capabilities that humans have. But are we there yet? I would like to think that we are very far. If you look at where machines are having success, it's all on a narrow set of tasks with millions or billions of examples; you crunch a lot of computation and you can really optimize any one task you come up with. Whereas humans operate in a very different regime. They don't necessarily do any one thing well, but they have such a diverse set of experiences, can solve a diverse set of tasks, and learn each individual task from very few examples. And it's still a grand challenge how you can build systems with the level of capability that humans have. So the other view is AI as tools. Basically, you would say, okay, well, it's kind of cool to think about how we can recreate intelligence, but we don't really care about making more things like humans.
We already have a way of doing that; it's called babies. So what we'd really like to do instead is not make something that's like a human, but make systems that help humans, because after all, we're humans. I guess it's a little bit selfish, but we're in charge right now. And under this view, a lot of the success stories in AI are really different from the things that you'd expect this humanoid robot to come into your house and be able to do. For example, here's a project from Stefano Ermon's group. There's a lot of poverty in the world, and part of the problem is just understanding what's going on. They had this idea of using computer vision on satellite imagery to predict things like GDP. This is obviously not a task that our ancestors in Africa were getting really good at, but nonetheless it uses convolutional neural networks, a technique that was inspired by the brain, and that's kind of interesting. You can also have another application for saving energy, by trying to figure out when to cool data centers. As AI is being deployed in more mission-critical situations, such as self-driving cars or authentication, a few new issues come up. For example, there's this phenomenon called adversarial examples, where you can take these cool-looking glasses, put them on your face, and fool a state-of-the-art face recognition system into thinking that you're actually someone else. Or you can put these stickers on stop signs and get a state-of-the-art system to think it's a speed limit sign. These are clearly big problems if we think about the widespread deployment of AI. There are also, less catastrophically but still pretty upsetting, the biases that many of you have probably read about in the news. For example, if you take Malay, a language that doesn't distinguish in its written form between "he" and "she," and you stick a sentence into Google Translate, you see "she works as a nurse" but "he works as a programmer," which is encoding certain societal biases into the actual models. And one important point I want to bring up is how machine learning works today: society is generating a lot of data, we're training on this data, trying to fit it and mimic what it's doing, and then using the predictions. What could possibly go wrong, right? So certainly a lot of people have been thinking about how these biases creep in, and it's an open, active area of research. Something a little more sensitive is that these systems are being deployed to all these people whether they want it or not, and this actually touches on people's livelihoods; it impacts people's lives in a serious way. So a company called Northpointe developed software called COMPAS that tries to predict criminal risk, essentially how risky someone is. And ProPublica, this organization, realized, whoa, whoa, whoa.
You have a system that, given that an individual didn't re-offend, is actually twice as likely to classify blacks incorrectly as non-blacks. This seems pretty problematic. And then Northpointe comes back and says, actually, we think we're being fair: given a risk score of seven, we were fair because 60% of whites re-offended and 60% of blacks re-offended. The point here is that there's actually no solution to this in some sense, sadly. People have formulated different notions of fairness and equality in how you predict across different groups, and you can have different notions of fairness which all seem reasonable from first principles, but mathematically they can be incompatible with each other. So this is, again, an open area of research where we're trying to figure out, as a society, how to deal with the fact that machine learning might be used in these kinds of critical situations. Okay. So, summary so far: there's an agents view, where we're trying to dream and think about how you get these capabilities that humans have, like learning from very few examples, into machines, maybe opening up a different set of technical capabilities. But at the same time, we really need to be thinking about how these AI systems are affecting the real world, and things like security and biases and fairness all show up. It's also interesting to note that a lot of the challenges in deploying AI systems don't necessarily have to do with humans at all. I mean, humans are incredibly biased, but that doesn't mean we want to build systems that mimic humans and inherit all the flaws that humans have. Okay. Any questions about this? Maybe pause for a moment. So let's go on. What I want to do next is give an overview of the different topics in the course. And the way to think about all of this is that in AI we're trying to solve really complex problems. The real world is really complicated. But at the end of the day, we want to produce some software, or maybe some hardware, that actually runs and does stuff, right? And so there's a very considerable gap between these things. So how do you even approach something like self-driving cars or diagnosing diseases? You probably shouldn't just go sit down at a terminal and start typing, because then there's no overarching structure. What this class is going to do is give you one example of a structure, which will hopefully help you approach hard problems and think about how to solve them in a more principled way. This is a paradigm that I call the modeling-inference-learning paradigm. The idea here is that there are three pillars, which I'll explain in a bit, and we can focus on each one of these things in turn. So the first pillar is modeling. What is modeling? Modeling is taking the real world, which is really complicated, and building a model of it. So what is a model? A model is a simplification that is mathematically precise, so that you can do something with it on a computer. One thing that's necessarily true is that modeling has to simplify things and throw away information.
So part of the art is to figure out what information to pay attention to and what information to throw away. This is going to be important, for example, when you work on your final projects: you have a real-world problem, you can't model everything, and you have to figure out judiciously how to manage your resources. So here's an example. If you want to build a system that can find the best way to get from point A to point B in a city, you can formulate the model as a graph, where nodes are points in the city and edges represent the ability to go between these points, with some sort of cost on the edges. Okay. So now, once you have your model, you can do inference. And what inference means is asking questions about your model. So here's a model; you can ask, for example, what is the shortest path from this point to this point? And because now you're in model land, it's a mathematically well-defined problem, and it's within the realm of developing algorithms to solve that problem. Most of inference is being able to do these computations really efficiently. And finally, learning addresses the question: where did this model come from? In any realistic setting, the model might have a lot of parameters, maybe millions of parameters, and if it's going to be faithful to the real world, how do you get all this information in there? Manually encoding this information turns out not to be a good idea; this is, in some sense, what AI from the '80s was trying to do. So the learning paradigm is as follows. What we're going to do is specify a model without parameters; think about it as a skeleton. In this case, we have a graph, but we don't know what the edge weights are. And now we have some data. Maybe we have data of the form: people tried to go from x to y, and it took 10 minutes, or an hour, and so on. And then from this data, we can learn to fit the parameters of the model. We can assign costs to the edges that are representative of what the data is telling us. So in this way, we can write down a model without parameters, feed in the data, apply a generic learning algorithm, and get a model with parameters. And now we can go back and do inference and ask questions about it. Okay. So this is the paradigm, and I want to really emphasize that learning, as I've presented it, is really not about any one particular algorithm like nearest neighbors or neural networks. It's really a philosophy of how you go about approaching problems: by defining a model, and then not having to specify all the details but filling them in later. Okay. So here is the plan for the course. We're going to go from low-level intelligence to high-level intelligence, and this is the intelligence of the models that we're going to be talking about. First, we're going to talk about machine learning. Like I alluded to earlier, machine learning is going to be an important building block that can be applied to any of the models that we develop. The central tenet in machine learning is that you go from data to a model. It's the main driver of a lot of the successes in AI because it allows you to, in software engineering terms, move the complexity from code to data.
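To make the shortest-path example above concrete, here is a minimal sketch of the modeling, inference, and learning split. The toy graph, the observed trips, and all the names are invented for illustration; this is not the course's code.

```python
# Modeling: simplify the city to a graph. Nodes are points; each edge
# carries a cost (say, minutes of travel). All numbers are made up.
import heapq
from collections import defaultdict

graph = {
    'A': {'B': 5, 'C': 2},
    'B': {'D': 4},
    'C': {'B': 1, 'D': 7},
    'D': {},
}

# Inference: ask a question about the model, e.g. the shortest path from
# a start node to an end node (here, Dijkstra's algorithm).
def shortest_path(graph, start, end):
    frontier = [(0, start, [start])]          # (cost so far, node, path)
    visited = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == end:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, edge_cost in graph[node].items():
            heapq.heappush(frontier, (cost + edge_cost, neighbor, path + [neighbor]))
    return float('inf'), None

# Learning: suppose the edge costs were unknown (a model without parameters),
# but we observed trips along edges and how long they took. A very simple
# "learning algorithm" is to average the observed durations for each edge.
observations = [('A', 'C', 2.1), ('A', 'C', 1.9), ('C', 'B', 1.0), ('B', 'D', 4.2)]

def fit_edge_costs(observations):
    totals, counts = defaultdict(float), defaultdict(int)
    for u, v, minutes in observations:
        totals[(u, v)] += minutes
        counts[(u, v)] += 1
    return {edge: totals[edge] / counts[edge] for edge in totals}

print(shortest_path(graph, 'A', 'D'))   # (7, ['A', 'C', 'B', 'D'])
print(fit_edge_costs(observations))     # averaged cost per observed edge
```

The learned costs could then be written back into the graph and the same shortest-path inference re-run, which is exactly the back-and-forth between learning and inference described above.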
Rather than having a million lines of code, which is unmanageable, you have a lot of data, which is collected in a more natural way, and a smaller amount of code that can operate on this data. And this paradigm has been incredibly powerful. One thing to think about with machine learning is that it requires a leap of faith. You can go through the mechanics of downloading some machine learning code and training a model, but fundamentally it's about generalization: you have your data, you fit a model, but you don't care about how it performs on that data; you care about how it performs on new experiences. And that leap of faith is something that, I think, gives machine learning its power, but it's also, at first glance, perhaps a little magical. It turns out you can actually formalize a lot of this using probability theory and statistics, but that's a topic for another time. Okay. So after we talk about machine learning, we're going to go back and talk about the simplest of models: the reflex model. Here's a quiz: okay, what is this animal? Okay, zebra. How did you get it so fast? Well, it's kind of a reflex, right? Your human visual system is so good at doing these things without thinking. And so reflex models are models which just require a fixed set of computations. Examples are linear classifiers and deep neural networks, and most of these are the models that people in machine learning use; "model" is almost synonymous with "reflex" in machine learning. An important point is that there's no foresight: you get your input, bam, bam, bam, and here's your output, okay? That's great because it's fast. But some problems require a little bit more than that. So for example, here's another problem. Okay, quick: white to move. Where should you go? There are probably a few of you who are chess geniuses, but for the rest of us, I have no idea; I don't even know, wait, who's moving again? So in these kinds of situations we need something a little more powerful than a reflex. We need agents that can plan and think ahead. The idea behind state-based models is that we model the world as a set of states, which capture any given situation, like a position in a game, and actions that take us between states, which correspond to things you can do in the game. A lot of game applications fall in this category, as do robotics, motion planning, and navigation. Also, some things that you might not think of as planning as such, like generation in natural language or generating an image, can be cast this way as well. There are three types of state-based models, each of which we'll cover over a few weeks. Search problems are the classic case: you control everything, so you're just trying to find the optimal path. Then there are cases where there's randomness: for example, if you're trying to go from point A to point B, maybe there's traffic that you don't know about, or in a game there might be dice which are rolled. And there's a third category, adversarial games, which are cases where you're playing against an opponent who's actively trying to destroy you.
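As an illustration of the state-based framing, here is a small sketch of what a search problem might look like in code. The toy "walk or take the tram" problem and all the names are invented for illustration; this is not the course's actual interface.

```python
# A state-based (search) model: states, actions with costs, and a separate
# inference procedure that finds the minimum-cost path. Toy example only.

class TransportationProblem:
    """Get from block 1 to block n: 'walk' moves +1 (cost 1),
    'tram' doubles your position (cost 2)."""
    def __init__(self, n):
        self.n = n
    def start_state(self):
        return 1
    def is_end(self, state):
        return state == self.n
    def successors_and_costs(self, state):
        # Each entry is (action, next state, cost of taking that action).
        result = []
        if state + 1 <= self.n:
            result.append(('walk', state + 1, 1))
        if 2 * state <= self.n:
            result.append(('tram', 2 * state, 2))
        return result

# Inference: a simple recursive search for the minimum total cost.
# (Exponential, but fine for a toy problem; smarter algorithms come later.)
def min_cost(problem, state):
    if problem.is_end(state):
        return 0
    return min(cost + min_cost(problem, next_state)
               for action, next_state, cost in problem.successors_and_costs(state))

problem = TransportationProblem(10)
print(min_cost(problem, problem.start_state()))   # minimum cost from block 1 to 10
```

The point of the separation is that the model (states, actions, costs) stays the same while you swap in better and better inference algorithms.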
So what are you going to do about it? One of the games that we're going to be talking about, when we talk about games, is Pac-Man, and one of the assignments is actually building a Pac-Man agent such as this. So while you're looking at this, think about what the states are and what the actions are, and how you would go about devising a strategy for Pac-Man to eat all the dots and avoid all the ghosts. That's something to look forward to. There's also going to be a competition, so we'll see who ends up on top. Okay. So state-based models are very powerful; they allow you to have foresight. But some problems are not most naturally cast as state-based models. For example, how many of you play Sudoku, or have played it? The goal of Sudoku is to fill in the blanks with numbers so that every row, column, and 3-by-3 sub-block has the digits 1 through 9, so it's a bunch of constraints. And there's no sense in which you have to fill them in in a certain order, whereas the order in which you move in chess or something is pretty important. So these types of problems are captured by variable-based models, where you think about a solution to the problem as an assignment to individual variables subject to some constraints. Constraint satisfaction problems, which we'll spend a week on, have hard constraints; for example, a person can't be in two places at once. There are also Bayesian networks, which we'll talk about, which are variable-based models with soft dependencies. For example, if you're trying to track a car over time, these variables represent the positions of the car, and these E's represent the sensor readings of the car's position at each time step. And inference looks like trying to figure out where the car was, given all these noisy sensor readings. That's also going to be another assignment later. Okay. So finally, now we get to high-level intelligence. What is high-level intelligence here? I put logic here, for a reason that will become clear. Yeah, is there a question? Can you explain why it's not a state-based model? So the question is, why is the Sudoku problem not a state-based model? You can actually formulate it as a state-based model, by just thinking about the sequence of assignments. But it turns out that you can formulate it in a more natural way as a variable-based model, which allows you to take advantage of more efficient algorithms to solve it. Think about these models as different tools; an analogy is a programming language. Yes, you could write everything in C++, but sometimes writing some things in Python or SQL might be easier. Yeah. How do you categorize a state-based problem where you have both an adversarial element and an element of randomness? So the question is, how do you categorize state-based models where there's both randomness and an adversary? We're also going to talk about those. I would classify them as adversarial, but there's also a random component you have to deal with; games like backgammon. Yeah, question. I was trying to go from discrete states to, like, continuous spaces.
So it feels like we have problems that are quantifiable via discrete state spaces, and then the farther right we get, the more we need continuous ones. So the question is about whether some of these models are continuous and some of them are discrete. I don't necessarily think of it that way; a lot of the reflex models actually can work in continuous spaces, for example images. Actually, it's almost a little bit of the opposite: the logic-based models are in some sense more discrete, but you can also have continuous elements in there as well. In this class, we're mostly going to focus on discrete objects because they're just simpler to work with. Okay. So what is this logic business? The motivation here is: suppose you wanted a little companion who could help you do things, let's say. You'd like to be able to tell it some information, and then later ask some questions and have the system be able to reply to you. So how would you go about doing this? One way you could think about it is building a system that you can actually talk to using natural language. Okay. So I'm actually going to show you a little demo, which is going to come up in the last assignment, on logic, and, well, let's see what you think of it. This is going to be a system that is based on logic; I'm going to tell the system a bunch of things, and I'm going to ask some questions. I want you all to follow along and see if you can play the role of the agent. Okay. So I'm going to teach it a few things, like: Alice is a student. Okay, so it says it learned something. Now let's quiz it: is Alice a student? Okay, good, so that worked. Is Bob a student? What should the answer be? I don't know; who's Bob? Okay. So now let's do: students are people. Alice is not a person. It says, I don't buy that. Okay, so it's doing some reasoning, right? It's using logic; it's not just storing what I say. Okay. So now let's do: Alice is from Phoenix. Phoenix is a hot city; I know because I live there. Cities are places, and if it is snowing, then it is cold. Okay? Got it? So, is it snowing? I don't know. So how about this: if a person is from a hot place and it is cold, then she is not happy. Okay? True, right? I guess those of you who have spent all your life in California would maybe appreciate this. Okay, so is it snowing now? How many of you say yes, it's snowing? How many of you say no? Don't know? Okay, let's see. Oh, too late. How about if I say: Alice is happy. Okay, so is it snowing now? No; it should be no. Okay, so you guys were able to do this. So this is an example of an interaction which, if you think about it, is very, very different from what you would see in a typical ML system, where you have to show it millions of examples of one particular thing and then it can do one task. This is much more of an open-ended set of interactions. I don't want to say the experiences are super rich, but they're definitely diverse. I teach it:
I just give it one statement, I say it once, and all of a sudden it has all the ramifications and consequences of that built in; it understands at a deeper level. Of course, this is based on a logic system, so it is brittle, but this is just a proof of concept to give you a taste of what I mean when I say logic. So these systems need to be able to digest heterogeneous information and reason deeply with that information, and we'll see how logic systems can do that. Okay. So that completes the tour of the topics of this class. Now I want to spend a little bit of time on course logistics. All the details are online, so I'm not going to be complete in my coverage, but I just want to give you a general sense of what's going on. Okay. So what are we trying to do in this course? The prerequisites are programming, discrete math, and probability. So you need to be able to code, and you need to be able to do some math and some basic proofs. These are the classes that are required, or at least recommended; if you have some equivalent experience, that's fine too. And what should you hope to get out of this course? On one hand, the course is meant to give you a set of tools. Using the modeling-inference-learning paradigm, it gives you a set of tools and a way of thinking about problems that will hopefully be really useful for you when you go out into the world and try to solve real-world problems. And also, as a side product, I want all of you to be more proficient at math and programming, because those are the core elements that enable you to do interesting things in AI. A lot of the AI you read about is very flashy, but really the foundations are still just math and programming, in some sense. Okay. So the coursework is homeworks, an exam, and a project. That's what you have to do. There are eight homeworks. Each homework is a mix of written and programming problems centered on a particular application, covering one particular type of model, essentially. Like I mentioned before, there's a competition for extra credit, and there are also some extra-credit problems in the homeworks. When you submit code, we have an autograder that runs on all the test cases, but you get feedback on only a subset. It's like in machine learning: you have a train set and you have a test set, so don't train on your test set. Okay. The exam tests your ability to use the knowledge that you learn to solve new problems. I think it's worth taking a look at an exam, because this surprises people: the exam is a little bit different from the types of problems that you see on the homework; they're more about problem-solving. The exam isn't going to be multiple choice, like, when was Perceptrons published, or something like that. It's going to be: here's a real-life problem; how do you model it, and how do you come up with a solution? It's all written, and it's closed book, except you get one page of notes.
And this is a great opportunity to actually review all the material and really learn the content of the class. The project, I think, is a really good opportunity to take all the things we've been talking about in the class, find something you really care about, and try to apply it. You'll work in groups of three, and I really recommend finding a group early. And as I emphasized, it's your responsibility to find a good group. Don't come to us one week before the project deadline and say, oh, my group members ditched me or something. Really try to nail this down. Use Piazza, or your other social networks, to find a good group. Throughout the quarter, there are going to be milestones for the project, to prevent you from procrastinating until the very end. There's going to be a proposal, where you try to brainstorm some ideas, a progress report, and a poster session, which is actually a whole week before the final report is due. The project is very open, which can be really liberating but also might be a little daunting. We will hopefully give you a lot of structure in terms of how you define your task, how you implement different baselines and oracles, which I'll explain later, how you evaluate, and how you analyze what you've done. Each project group will be assigned a CA mentor to help you through the process, and you're always welcome to come to my office hours, or Dorsa's, or any of the CAs' to get additional help, either brainstorming or figuring out what the next step is. Some policies: all assignments will be submitted on Gradescope. There are seven total late days; you can use at most two per assignment, and after that there's no credit. We're going to use Piazza for all communications, so don't email us directly; leave a post on Piazza. I encourage you to make it public if it's not sensitive, but if it's personal, then obviously make it private. And try to help each other; we'll actually award some extra credit for students who help answer other students' questions. All the details are on the course website. Okay. So one last thing, and it's really important, and that's the honor code. You've probably heard this if you've been at Stanford; if you haven't, then I want to really make this clear. I encourage you all to collaborate and discuss together, but when it comes to actually doing the homeworks, you have to write up your homework and code it independently. So you shouldn't be looking at someone else's write-up, you shouldn't be looking at their code, and you definitely shouldn't be copying code off of GitHub; that should hopefully be obvious. And maybe less obvious: please do not post your homework assignments on GitHub. I know you're probably proud of the fact that your Pac-Man agent is doing really well, but please don't post it on GitHub, because then that's going to be an honor code violation. When debugging together, it's fine as long as you're looking at input-output behavior. So you can say to your partner, hey, I put this input into my test case and I'm getting 3; what are you getting?
That's fine, but remember, don't look at each other's code. And to enforce this, we're going to be running MOSS, which is a software program that looks for code duplication, to make sure that the rules are being followed. And changing one variable name isn't going to get around it, so, anyway, enough said. Just don't do it. Okay. Any questions about this? I want to make sure this is clear, because it's important; or about any of the logistics. Yeah. Does the GitHub thing apply to the final project? The final project you can put on GitHub. Yeah, private GitHub repos are fine. Yeah, question in the back. Is it necessary to have a group, or can you do a solo project? The question is, can you do a solo project? You can do a solo project, you can do a project with two people, you can do a project with three. I would encourage you to try to work in groups of three, because you'll be able to do more as a group, and it's definitely not the case that if you do a solo project we'll be expecting one-third of the work. Okay. Anything else? All right. Okay. So in the final section, I want to actually delve into some technical details. One thing we're going to focus on right now is the inference and learning components of this course, and I'm going to talk about how you can approach these through the lens of optimization. This might be a review for some of you, but hopefully it's a good way to get everyone on the same page. Okay. So what is optimization? There are two flavors of optimization that we care about. There's discrete optimization, where you're trying to find the best discrete object; for example, the path p that minimizes the cost of that path. We're going to talk about one algorithmic tool based on dynamic programming, which is a very powerful way of solving these complex optimization problems. The key property here is that the set of paths is huge, and you can't just try all of them, compute the cost, and choose the best one; you're going to have to do something clever. The second flavor of optimization is continuous optimization. Formally, this is just finding the best vector of real numbers that minimizes some objective function. A typical place this shows up is in learning, where you define an objective function like the training error, and you're trying to find the weight vector w, and this notation just means it's a list of d real numbers, that minimizes the training error. And we're going to show that gradient descent is an easy and surprisingly effective way of solving these continuous optimization problems. Okay. So to introduce these two ideas, I'm going to look at two problems and try to work through them. This might also be a good way to think about how you might approach homework problems, so I'm going to talk through this in a bit more detail. Okay. So the first problem is computing edit distance. This might not look like an AI problem, but a lot of AI problems have this as a building block, if you want to do some sort of matching between two words or two biological sequences.
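As a minimal sketch of the continuous-optimization side just mentioned, here is what gradient descent on a toy training-error objective might look like. The data points, the step size, and the number of iterations are all invented for illustration; this is not the course's code.

```python
# Toy continuous optimization: fit y ≈ w * x by minimizing the training error
#   F(w) = sum over (x, y) of (w * x - y)^2
# using gradient descent. All numbers are made up.

points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]   # training data (x, y)

def training_error(w):
    return sum((w * x - y) ** 2 for x, y in points)

def gradient(w):
    # Derivative of F with respect to w: sum of 2 * (w * x - y) * x.
    return sum(2 * (w * x - y) * x for x, y in points)

w = 0.0            # initial guess
step_size = 0.01   # how far to move against the gradient each iteration
for t in range(100):
    w = w - step_size * gradient(w)

print(w, training_error(w))   # w ends up near 2, the slope that fits the data
```

The same recipe, computing the gradient of the training error and taking a small step against it, carries over to the higher-dimensional weight vectors used in real models.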
So the input is two strings. I'm going to start writing over here on the board just to work this out. You're given two strings, s and t; for example, "a cat" and "the cats". And you want to find the minimum number of edits needed to transform s into t. By edits, I mean you can insert a character, like inserting an s; you can delete a character, like deleting this a; and you can substitute one character for another, like replacing this a with a t, okay? So here are some examples. What's the edit distance of cat and cat? It's 0; you don't have to do anything. Cat and dog is 3. Cat and at is 1: you delete or insert the c. Cat and cats is 1. And "a cat" and "the cats" is 4, okay? So the challenge here is that there are quite a large number of ways to insert and delete. If you have a string that's very long, there are just way too many possibilities to try out. Okay, so how do we go about coming up with a solution? Any ideas? Yeah. What about saying that a deletion can be considered a substitution, or vice versa, by using something like an empty character? Yeah, yeah. So let's try to simplify the problem a bit, building off what was said. The general principle, let me just write the general principle, is to reduce the problem to a simpler problem, because then it's hopefully easier to solve, and you can keep on doing that until you get something that's trivial. Okay. So there are maybe two observations we can make. One is that, well, we're technically saying we can insert into s, right? But if we insert into s, it makes the problem larger in some sense, and that's not good; that's not reducing the problem. But whenever we insert into s, we probably want to insert things which are in t; we want to cancel something out. So there's no reason we would insert a k there; we probably want to insert an s, in which case that s matches the one in t, and then we've reduced the problem. So we can actually think about inserting into s as equivalent to deleting from t. Okay, does that make sense? All right. Another observation we can make is that we can start inserting anywhere: we can start inserting here, and then jump over here and do this. But this just introduces a lot of ways of doing it which all result in the same answer. So why don't we start more systematically at one end, and then just proceed and try to chisel off the problem, let's say from the end. Okay, so: start at the end. So now we have this problem; I'm going to draw the problem in a little box here. Let's start at the end. Yeah, question. What was the reasoning that we used to reach that principle, start at the end? The question is, why are we starting at the end? The idea is that if you start at the end, then you have a more systematic and consistent way of reducing the problem, so you don't have to think about all the permutations of where I can delete and substitute. Why is it more systematic to go from the right to the left than from the left to the right? We can also do left to right.
The end or the start are both fine; I just picked the end. Yeah. How do we know starting at one end can give us the optimal strategy? The question is, how do we know that starting at one end can give you the optimal strategy? If you wanted to prove this rigorously, it takes some work, but I'll just try to give you an intuitive answer. Suppose you didn't start at the end, and you just made a sequence of edits: I insert here, I delete here, and then I go over here, and I do all those operations to s. I could equivalently have sorted those operations by where they happen and then just proceeded from one end to the other, and I would arrive at the exact same answer. So without loss of generality, I can start at the end. Any other questions? Okay. Yeah. Instead of doing this, wouldn't a more natural approach be trying to recognize some patterns between the two strings s and t, some sort of pattern that appears in both of them? So the question is, maybe you can recognize some patterns: oh, there's "cat" in both, maybe those should be lined up. I guess these examples were chosen so that these patterns exist, but we want to solve the problem for cases where the pattern might not be obvious. We want it to work for all strings; maybe there is no pattern, and we would still want an efficient algorithm. Yeah. Can we just use dynamic programming? We go one by one; at each step, either we're doing a substitution, or it's the same character, or we have to insert; and then we keep going, and we remember each pair of strings that we've seen at some point, so that if we've already calculated it, we don't have to do it again. Yeah, yeah, that's it. Great idea. Let's do dynamic programming; that's what I'm trying to build up to. Okay. So dynamic programming is a general technique that essentially allows you to express a more complicated problem in terms of simpler problems. So let's start with this problem. If we start at the end, and the two last letters match, then we can just immediately delete the two of them, and the answer is the same; we get some free rides there. But when they differ, now we have many options. What could we do? Well, we could substitute, okay? We can change the t to an s, and then the last letters match and we can drop them, so that leaves "a ca" and "the cat". So I can substitute. What else can I do? Someone say something I can do. I can insert. Insert where? Into s. You can insert an s, right? But that's the same as deleting from t, so you can basically just delete this s, which leaves "a cat" and "the cat", where I deleted the s from t. Let's call this one insertion; it's technically an insertion. And then finally, what can I do? I can also remove the t at the end of s, which leaves "a ca" and "the cats". Okay, so this is deletion. And right now you're probably looking at this like, well, obviously you should do this one. But in general it's hard to tell, right? If I just gave you some arbitrary strings, who knows what the right answer is. So in general, how do you pick? Yeah.
The second one, the t is supposed to be "the cats"? Yeah, you mean this one? So here I inserted an s, but then because there are two s's, I just cancel them out, and that's what's left. So you can think about this as really deleting from t. But in the original problem, we said that we're trying to transform s into t. Yeah, and because of this equivalence, I'm kind of reframing the problem a little bit. Okay. So which one should I choose? Yeah. What about the substitution the other way? The substitution the other way, meaning changing the s to a t? Sorry, there are too many s's and t's here, which is a bit unfortunate. Why didn't we replace the last s with a t; how did we eliminate that? You can think about that as being equivalent: if you identify two letters that you want to make the same, then you can replace the one with the other, or the other way around. Officially, we've been framing it as only editing s, which is the reason it's asymmetric. Okay. So which one of these: door A, door B, or door C? Yeah. Would you look for similarity between s and t at every step, like you did here, because there's "cat" in both of them? So you could try to look inside, but remember, these strings might be really complicated, so we want a simple, mechanized procedure to decide. What about the next letter? Yeah, let's pretend you can't see inside of them, in each of the different cases. Okay. So let's keep going. I'm not going to draw everything, but you can also try to break each of these down; maybe there are three actions here and three actions here. And at the end of the day, you hopefully have a problem that's simple enough, where s equals t or something, and then you're done. But then how do I know which one to pick? Suppose I've solved these; suppose someone just told you, okay, I know this cost, I know this cost, I know this cost. What should you do? Yeah, you should take the minimum, right? Remember, we want to minimize the edit distance. So there are three things you can do; each of them has the cost of doing that action, which is one, since every edit has the same cost, plus the cost of whatever remains. And so we're just going to take the minimum of those. Yeah. How do we know that's the optimal number of edits to take? So I was trying to argue that going right to left is without loss of generality, because if you went left to right, or in some other order, you could also replay the edits in sorted order. But if a letter needed an insertion near the left end, and you go from the left, doesn't it look as if everything is shifted over by one? Oh, that will work. Okay. Yeah, I think it works. Okay. So let's try to code this up and see if we can make this program work. Okay, so I'm going to do edit distance. Can everyone see this? I'm going to define a function that takes two strings, and then I'm going to define a recurrence. Recurrence is, I guess, one word I haven't really used, but this is really the way you should think about dynamic programs and this idea of taking a complex problem and breaking it down. It's going to show up in search problems, MDPs, and games.
So let's define a recurrence as follows. Remember, at any point in time I have a subproblem, and since I'm going right to left, I'm only considering the first m letters of s and the first n letters of t. Okay? So recurse(m, n) is going to return the minimum edit distance between two things: the first m letters of s and the first n letters of t. I'm gonna post this online so you don't have to try to copy it down. Okay. So suppose I define this function. If I had it, what should I return? Recurse of what? m and n are integers, so I'm going to return recurse of the length of s and the length of t. Okay? That's the initial call. All right. So now I need to fill out this function. Let's consider a bunch of cases. Here are some easy cases. Suppose that m is 0, so I'm comparing an empty string with something that has n letters. What should the cost of that be? Right, it should be n. And symmetrically, if n is 0, then the result should be m. And now we come to the case we considered first, which is when the ends match. So if s of m minus 1, and this is zero-based indexing, which is why there's a minus 1, equals t of n minus 1, then what should I do? I reduce this to a subproblem: I recurse on m minus 1 and n minus 1. Okay. And now comes the fun case which we looked at. In this case the last letters don't match, so I'm gonna have to do some sort of edit; I can't just skip it. Yeah. Would it be worth doing a full comparison of s and t first? There's probably a way you could make this more efficient; I'm just gonna try to get the basic thing in there. Okay. So, substitution. What's the cost of a substitution? I pay one to do the substitution, but as a reward I get to reduce the problem to m minus 1 and n minus 1, right? I lop off a letter from s and I lop off a letter from t. What else can I do? I can delete. That also costs one, and when I delete, I delete from s, so it's m minus 1 and n stays the same. And then the insertion is m and n minus 1, right? Because remember, insertion into s is deletion from t; that's why it's n minus 1. Okay? And then the result is just gonna be the minimum of all these things. Return result. Okay. And then how do I call this function? On "a cat" and "the cats". Let me print out the answer and see if it works. Okay, it prints out 4. Therefore, I conclude it works. If you were doing this yourself, you would probably want to test it some more, but in the interest of time, I'll move on. So let me just recap. I'm computing the edit distance between two strings, and we define a recurrence that works on subproblems, where a subproblem is the first m letters of s and the first n letters of t. The reason I'm using integers instead of strings is to avoid string copying; it's an implementation detail and doesn't really matter. So, base cases: you want to reduce your problem to a case that's trivial to solve. And then we have the case where the last letters match, and the case where they don't match and you have to pay some cost; I don't know which action to take, so I take the minimum over all of them. And then I call it by just calling recurse.
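Here is a minimal sketch of the recursive version being typed out here; the function and variable names are my reconstruction of the live-coded program, not necessarily the exact code from the lecture.

```python
def edit_distance(s, t):
    # recurse(m, n): minimum edit distance between the first m letters of s
    # and the first n letters of t
    def recurse(m, n):
        if m == 0:                      # s is exhausted: insert the remaining n letters
            return n
        if n == 0:                      # t is exhausted: delete the remaining m letters
            return m
        if s[m - 1] == t[n - 1]:        # last letters match: free ride
            return recurse(m - 1, n - 1)
        sub_cost = 1 + recurse(m - 1, n - 1)   # substitute the last letter of s
        del_cost = 1 + recurse(m - 1, n)       # delete the last letter of s
        ins_cost = 1 + recurse(m, n - 1)       # insert into s, i.e. delete the last letter of t
        return min(sub_cost, del_cost, ins_cost)

    return recurse(len(s), len(t))

print(edit_distance('a cat', 'the cats'))  # prints 4, as in the lecture
```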
Okay. So this is great, right? So now I have a working thing. Let's try another test case. I'm gonna make the input longer: if I do times 10, that replicates the string 10 times, so now it's a longer string. Okay, so now I'm gonna run it. Maybe I shouldn't wait for this. Is there a base case? There is a base case. I think it works, but what's wrong with this code? Yes, it's very slow. Why is it slow? Yeah, right. So I'm recursing, and every call recurses three times, so you get this exponential blowup. So how do you solve this problem? Yeah, I think I heard the word memoize, and memoization plus a recurrence is, I guess, another way to think about dynamic programming. So I'm gonna show you a way to do it which is pretty non-invasive. And generally I recommend that people get the slow version working first and then try to make it faster; don't try to be too slick all at once. Okay. So I'm gonna make this cache, and I'm gonna say, if (m, n) is in the cache, then return whatever's in the cache. The cache is just a dictionary mapping the key, which identifies the subproblem I'm interested in solving, to the result, which is the answer that I computed. So if I already computed it, I don't need to compute it again; I just return it. And then at the end, if I did have to compute it, I put it in the cache. Okay? So three or four lines, I guess. Yeah. Yeah, that's a great point: this should be outside of the recurse function. Glad you guys are paying attention. Otherwise, yeah, it would do basically nothing. Any other mistakes? Yeah. There are some function decorators that implement memoization for you; in this class, are you okay if we use those, or would you rather we make our own? You can use the decorator; you can be fancy if you want, but I think this is pretty transparent and easy for learning purposes. Okay. So let's run this. Now it runs instantaneously, as opposed to, well, I actually don't know how long it would have taken otherwise. And as a sanity check, 40 is probably the right answer, because 4 was the original answer, multiplied by 10. Okay. Any other questions about this? So this is an example of basic dynamic programming: you solve a problem by formulating a recurrence that expresses a complicated problem in terms of smaller problems. And like I said before, this is gonna show up over and over again in this class. Yeah. So the question is, why does this reduce redundancy? Maybe I can show it pictorially. Let's say you have a problem here, and it gets reduced, and I'm just making an arbitrary diagram here, so this problem gets reduced to these two, and this problem gets reduced to these two, and so on. If you didn't have memoization, you would be paying for the number of paths, because every path has to be computed from scratch. Whereas if you do memoization, you pay in the number of nodes, and a lot of those are shared. Like here, once you compute this value, no matter whether you're coming from here or from here, you're using the same value.
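A minimal sketch of the memoized version described above, assuming the same recurse structure as before (again my reconstruction rather than the exact lecture code):

```python
def edit_distance(s, t):
    cache = {}  # maps (m, n) to the edit distance of the first m letters of s and first n of t

    def recurse(m, n):
        if (m, n) in cache:             # subproblem already solved: reuse the answer
            return cache[(m, n)]
        if m == 0:
            result = n
        elif n == 0:
            result = m
        elif s[m - 1] == t[n - 1]:      # last letters match: no edit needed
            result = recurse(m - 1, n - 1)
        else:
            result = 1 + min(recurse(m - 1, n - 1),   # substitution
                             recurse(m - 1, n),       # deletion from s
                             recurse(m, n - 1))       # insertion into s (deletion from t)
        cache[(m, n)] = result
        return result

    return recurse(len(s), len(t))

print(edit_distance('a cat' * 10, 'the cats' * 10))  # now fast; prints 40
```

Note that the cache lives in the enclosing edit_distance function, outside recurse, which is exactly the fix the student pointed out: if it were created inside recurse, it would be reset on every call and do basically nothing.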
Okay. So let's move on. The second problem we're gonna talk about has to do with continuous optimization, and the motivating question is how you do regression, which is kind of the bread and butter of machine learning. So here we go: regression. Okay. So imagine you get some points. I give you a point, (2, 4), and then I give you another point, say (4, 2). These are data points, and you want to, let's say, predict housing price from square footage, or predict a health score from blood pressure and some other measurements, something like that. This is pretty common in machine learning. And the question is, how do you fit a line? I'm gonna consider the case where the line has to go through the origin, just for simplicity. So you might wanna find a good fit. I mean, two points is maybe a little bit degenerate, but that's the simple example we're gonna work with. In general, you have lots of points and you wanna fit the line that is closest to the points. Okay. So how do you do this? There's a principle called least squares which says: if you give me a line, which in this case is given by a slope w, I'm going to tell you how bad it is. And badness is measured by looking at all the training points and looking at these distances. So here I have some particular point, x i. If I multiply it by w, I get, not the y-intercept, but the y-value here; that's my prediction. The real value was y i, which is up here. And if I look at the difference, I want that difference to be 0, right? So in least squares, I square this difference and I say I want it to be as small as possible. Now, that's only for one point, so I'm going to sum over all the points. Let's suppose I have n points; that gives a function that I'm gonna call f of w, which basically says, for a given weight vector, which here is just a slope, here is a number that characterizes how bad a fit this is, where 0 means I fit everything perfectly, and large numbers mean I fit poorly. All right, so that's regression.
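Written out (with (x_i, y_i) denoting the n training points, matching the description above), the least squares objective is:

$$ f(w) \;=\; \sum_{i=1}^{n} \bigl(w\,x_i - y_i\bigr)^2 $$

For the two example points, this is $f(w) = (2w - 4)^2 + (4w - 2)^2$.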
So how do I solve a regression problem? How do I optimize this? Can you do it in your head? If I actually had these two points, what should w be? Okay, it doesn't matter; we'll compute it. So how do we go about doing this? One principle, which is maybe another general takeaway, is to abstract away the details. This was also true with the dynamic program. Sometimes, if you're too close to the board and you're thinking, oh man, these points are here and I need to fit this line, how do I do that, you get a little bit stuck. So why don't we think about this f as just some function; I don't really care what it is. And let's plot this function. So this is a different plot now: this axis is the weight w, and this is f of w. Always label your axes. And let's say this function looks like this, which means that for this slope I pay this amount, for this slope I pay that amount, and so on. And what do I wanna do? I wanna minimize f of w, which means I want to find the w which has the least value of f of w, right? Question? Derivative. Okay, so you take the derivative. So what does the derivative give you? It tells you where to move, right? So if you look over here, well, in general you might not be able to get to the minimum directly. In this particular case you actually can, because you can solve it in closed form, but I'm gonna try to be more general. So if you start here, the derivative tells you that the function is decreasing if you move to the right, so you should move to the right. Whereas if you end up over here, the derivative says the function is decreasing if you move to the left, so you should move to the left, right? So what I'm gonna introduce is this algorithm called gradient descent. It's a very simple algorithm. It basically says: start at some place, compute the derivative, and just follow your nose. If the derivative is negative, go this way; now you're at a new point, you compute the derivative again, you descend, and you compute it again. Maybe the derivative says keep going this way, maybe you overshoot and come back, and hopefully you end up at the minimum, okay? So let's see what this looks like in code. Gradient descent is one of the simplest algorithms, but it really underlies essentially all the algorithms that people use in machine learning. So let's define the points; we have two points here. And I'm gonna define some functions. So, f of w: what is this function? At this point it's basically converting math into Python. I'm going to sum over all the points: for every (x, y), what the model predicts is w times x, and I subtract y; if I square that, that's the error I get on that point, and if I sum over all of these errors, I get my objective function, okay? Array of, yeah, you can put an array here if you want, but it doesn't matter; it's actually fine. Okay. So now I need to compute the derivative. How do you compute the derivative? If your calculus is a little bit rusty, you might want to brush up on it. So what's the derivative? Remember, we're taking the derivative with respect to w; there are a lot of symbols here, so always remember what you're taking the derivative with respect to. Okay, the derivative of a sum is the sum of the derivatives, so now I need to take the derivative of each term. And what's the derivative of that? Well, it's something squared, so you bring the 2 down and multiply by the derivative of the inside. And what's the derivative of the inside? It should be x, because y is a constant and the derivative of w times x with respect to w is x. Okay, so that's it.
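In symbols, the derivative being computed here is:

$$ f'(w) \;=\; \sum_{i=1}^{n} 2\,\bigl(w\,x_i - y_i\bigr)\,x_i, $$

which for the two example points works out to $4(2w - 4) + 8(4w - 2) = 40w - 32$. Setting that to zero gives $w = 0.8$, consistent with the value the demo converges to below.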
Okay, so now let's do gradient descent. Let's initialize with w equals 0, and then I'm going to just iterate 100 times. Normally, you would set some sort of stopping condition, but let's just keep it simple for now. Okay, so at every iteration, I have a w, and I can compute the value of the function and also take the gradient, or the derivative; gradient just means derivative in higher dimensions, which we'll want later. Okay. And then what do I do? I take w and I subtract the gradient. Remember, the gradient tells me where the function is increasing, so I wanna move in the opposite direction. And eta is just going to be a step size to keep things under control; we'll talk more about it next time. Okay. So now I want to print out what's going on here: the iteration, the function value, and the current w. All right, so let's run it. And you can see that we first start out with w equals 0, then it moves to about 0.3, and then to 0.7999999, so it looks like it's converging to 0.8, and meanwhile the function value is going down from 20 to 7.2. So the correct answer here is 0.8, which happens to be the optimal answer. Okay. So that's it. Next time we're gonna start on the machine learning lecture.
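For reference, here is a minimal sketch of the whole demo as described above. The names and the step size eta = 0.01 are my assumptions; the transcript doesn't show the exact code or the value of eta.

```python
points = [(2, 4), (4, 2)]   # the two example data points

def f(w):
    # least squares objective: sum of squared errors for slope w
    return sum((w * x - y) ** 2 for x, y in points)

def df(w):
    # derivative of f with respect to w
    return sum(2 * (w * x - y) * x for x, y in points)

eta = 0.01   # step size (assumed value; not stated in the lecture)
w = 0.0      # initialize the slope
for t in range(100):
    value = f(w)
    gradient = df(w)
    w = w - eta * gradient          # move opposite to the gradient
    print(f'iteration {t}: w = {w}, f(w) = {value}')

# With these points, w converges to 0.8 and f(w) decreases from 20 toward 7.2.
```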