[MUSIC] BRIAN YU: All right. Welcome, everyone, to an Introduction
to Artificial Intelligence with Python. My name is Brian Yu. And in this class, we'll explore
some of the ideas, and techniques, and algorithms that are at the
foundation of artificial intelligence. Now, artificial intelligence covers a
wide variety of types of techniques. Anytime you see a
computer do something that appears to be intelligent
or rational in some way, like recognizing
someone's face in a photo, or being able to play a
game better than people can, or being able to understand human
language when we talk to our phones and they understand what we mean
and are able to respond back to us, these are all examples of AI,
or artificial intelligence. And in this class we'll explore some of
the ideas that make that AI possible. So we'll begin our
conversations with search. The problem of, we
have an AI and we would like the AI to be able to search for
solutions to some kind of problem, no matter what that problem might be. Whether it's trying to get driving
directions from point A to point B, or trying to figure
out how to play a game, given a tic-tac-toe game, for
example, figuring out what move it ought to make. After that, we'll take
a look at knowledge. Ideally, we want our AI to
be able to know information, to be able to represent
that information, and more importantly, to be able to
draw inferences from that information. To be able to use the information it
knows and draw additional conclusions. So we'll talk about how AI can be
programmed in order to do just that. Then we'll explore the
topic of uncertainty. Talking about ideas of, what happens
if a computer isn't sure about a fact but maybe is only sure
with a certain probability? So we'll talk about some of
the ideas behind probability and how computers can begin
to deal with uncertain events in order to be a little bit more
intelligent in that sense, as well. After that, we'll turn our
attention to optimization. Problems of when the computer is trying
to optimize for some sort of goal, especially in a situation
where there might be multiple ways that a
computer might solve a problem, but we're looking for a better
way or, potentially, the best way if that's at all possible. Then we'll take a look at machine
learning, or learning more generally. In looking at how when
we have access to data our computers can be programmed to be
quite intelligent by learning from data and learning from experience, being
able to perform a task better and better based on greater access to data. So your email, for example,
where your email inbox somehow knows which of your emails are good
emails and which emails are spam. These are all examples
of computers being able to learn from past
experiences and past data. We'll take a look, too, at how
computers are able to draw inspiration from human intelligence, looking
at the structure of the human brain and how neural networks can be a
computer analog to that sort of idea. And how, by taking advantage of a
certain type of structure of a computer program, we can write
neural networks that are able to perform tasks
very, very effectively. And then finally, we'll turn
our attention to language. Not programming languages, but human
languages that we speak every day. And taking a look at
the challenges that come about as a computer tries to
understand natural language and how it is some of
the natural language processing that occurs in
modern artificial intelligence can actually work. But today we will begin our
conversation with search. This problem of trying
to figure out what to do when we have some sort of
situation that the computer is in, some sort of environment that
an agent is in, so to speak. And we would like for that agent to
be able to somehow look for a solution to that problem. Now, these problems can come in any
number of different types of formats. One example, for instance, might
be something like this classic 15 puzzle with the sliding
tiles that you might have seen, where you're trying to
slide the tiles in order to make sure that all the numbers line up in order. This is an example of what you
might call a search problem. The 15 puzzle begins in an
initially mixed up state and we need some way of finding
moves to make in order to return the puzzle to its solved state. But there are similar problems
that you can frame in other ways. Trying to find your way
through a maze, for example, is another example of a search problem. You begin in one place, you have some
goal of where you're trying to get to, and you need to figure out the correct
sequence of actions that will take you from that initial state to the goal. And while this is a little
bit abstract, anytime we talk about maze
solving in this class, you can translate it to something
a little more real world, something like driving directions. If you ever wonder how Google
Maps is able to figure out what is the best way for you to
get from point A to point B and what turns to make, at what time,
depending on traffic, for example. It's often some sort
of search algorithm. You have an AI that is trying
to get from an initial position to some sort of goal by taking
some sequence of actions. So we'll start our
conversations today by thinking about these types of
search problems and what goes in to solving a search problem
like this in order for an AI to be able to find a good solution. In order to do so, though,
we're going to need to introduce a little bit of
terminology, some of which I've already used. But the first term we'll need
to think about is an agent. An agent is just some entity
that perceives its environment, it somehow is able to
perceive the things around it, and act on that environment in some way. So in the case of the
driving directions, your agent might be some
representation of a car that is trying to figure out what
actions to take in order to arrive at a destination. In the case of the 15 puzzle
with the sliding tiles, the agent might be the AI or the person
that is trying to solve that puzzle, trying to figure out what tiles to
move in order to get to that solution. Next, we introduce the idea of a state. A state is just some configuration
of the agent in its environment. So in the 15 puzzle, for example, any
state might be any one of these three for example. A state is just some
configuration of the tiles. Each of these states is
different and is going to require a slightly
different solution. A different sequence of actions will
be needed in each one of these in order to get from this initial
state to the goal, which is where we're trying to get. The initial state then. What is that? The initial state is just the
state where the agent begins. It is one such state where
we're going to start from and this is going to be the starting
point for our search algorithm, so to speak. We're going to begin
with this initial state and then start to reason about it, to
think about what actions might we apply to that initial state in order to
figure out how to get from the beginning to the end, from the initial position
to whatever our goal happens to be. And how do we make our way from
that initial position to the goal? Well ultimately, it's
via taking actions. Actions are just choices that
we can make in any given state. And in AI, we're always going to try
to formalize these ideas a little bit more precisely such that we could
program them a little bit more mathematically, so to speak. So this will be a recurring theme and
we can more precisely define actions as a function. We're going to effectively
define a function called actions that takes an input S, where S is going
to be some state that exists inside of our environment, and actions of S
is going to take the state as input and return as output the
set of all actions that can be executed in that state. And so it's possible that some actions
are only valid in certain states and not in other states. And we'll see examples
of that soon, too. So in the case of the
15 puzzle, for example, there are generally going to
be four possible actions that we can do most of the time. We can slide a tile to the
right, slide a tile to the left, slide a tile up, or slide
a tile down, for example. And those are going to be the
actions that are available to us.
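As a very rough sketch of what those actions might look like in Python (purely for illustration, assuming a 15-puzzle state is stored as a 4x4 list of lists with 0 marking the empty square; this is not the course's actual code), an actions function might just check which directions the empty square can move:

```python
# Hypothetical sketch: the state is a 4x4 list of lists with 0 marking
# the empty square, and an action names the direction the empty square moves.
def actions(state):
    """Return the set of actions that can be executed in this state."""
    # Find the row and column of the empty square.
    row, col = next(
        (r, c) for r in range(4) for c in range(4) if state[r][c] == 0
    )
    possible = set()
    if row > 0:
        possible.add("up")
    if row < 3:
        possible.add("down")
    if col > 0:
        possible.add("left")
    if col < 3:
        possible.add("right")
    return possible
```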
So somehow our AI, our program, needs some encoding of the state, which is often going
to be in some numerical format, and some encoding of these actions. But it also needs some encoding of
the relationship between these things, how do the states and actions
relate to one another? And in order to do that,
we'll introduce to our AI a transition model, which will
be a description of what state we get after we perform some
available action in some other state. And again, we can be a little
bit more precise about this, define this transition model a
little bit more formally, again, as a function. The function is going to be
a function called result, that this time takes two inputs. Input number one is S, some state. And input number two is A, some action. And the output of this
result function is going to give us the state that we
get after we perform action A in state S. So let's take a look at an example
to see more precisely what this actually means. Here's an example of a state
of the 15 puzzle, for example. And here's an example of an action,
sliding a tile to the right. What happens if we pass these as
inputs to the result function? Again, the result function takes this
board, this state, as its first input. And it takes an action
as a second input. And of course, here, I'm
describing things visually so that you can see visually what the
state is and what the action is. In a computer, you might
represent one of these actions as just some number that
represents the action. Or if you're familiar
with enums that allow you to enumerate multiple possibilities,
it might be something like that. And the state might just be represented
as an array, or two dimensional array, of all of these numbers that exist. But here we're going to show it
visually just so you can see it. When we take this state and this action,
pass it into the result function, the output is a new state. The state we get after we take a tile
and slide it to the right, and this is the state we get as a result. If we had a different action and
a different state, for example, and passed that into
the result function, we'd get a different answer altogether. So the result function
needs to take care of figuring out how to take a state and
take an action and get what results. And this is going to be
our transition model that describes how it is that states and
actions are related to each other.
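Sticking with that same purely illustrative 15-puzzle representation, a result function might look roughly like this (again a sketch, not the course's code; here an action names the direction the empty square moves, which simplifies the idea of sliding a tile):

```python
import copy

# Hypothetical sketch using the same 4x4 list-of-lists state as above.
def result(state, action):
    """Return the state we get after performing `action` in `state`."""
    # Assumes `action` is one of the actions returned by actions(state).
    row, col = next(
        (r, c) for r in range(4) for c in range(4) if state[r][c] == 0
    )
    deltas = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    dr, dc = deltas[action]
    new_state = copy.deepcopy(state)
    # Swap the empty square with the tile it moves toward.
    new_state[row][col], new_state[row + dr][col + dc] = (
        new_state[row + dr][col + dc],
        new_state[row][col],
    )
    return new_state
```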
If we take this transition model and think about it more generally across the entire problem, we can
form what we might call a state space, the set of all of the states we
can get from the initial state via any sequence of actions, by
taking zero or one or two or more actions in addition to that,
so we could draw a diagram that looks something like this. Where every state is represented
here by a game board. And there are arrows that connect
every state to every other state we can get to from that state. And the state space is much larger
than what you see just here. This is just a sample of what the
state space might actually look like. And, in general, across
many search problems, whether they're this particular
15 puzzle or driving directions or something else, the state space
is going to look something like this. We have individual states and
arrows that are connecting them. And oftentimes, just
for simplicity, we'll simplify our representation
of this entire thing as a graph, some sequence of nodes
and edges that connect nodes. But you can think of this more abstract
representation as the exact same idea. Each of these little
circles, or nodes, is going to represent one of the
states inside of our problem. And the arrows here
represent the actions that we can take in
any particular state, taking us from one particular state
to another state, for example. All right. So now we have this idea of nodes
that are representing these states, actions that can take us
from one state to another, and a transition model
that defines what happens after we take a particular action. So the next step we
need to figure out is how we know when the AI is
done solving the problem. The AI needs some way to
know when it gets to the goal, that it's found the goal. So the next thing we'll need to encode
into our artificial intelligence is a goal test, some way to determine
whether a given state is a goal state. In the case of something like driving
directions, it might be pretty easy. If you're in a state that
corresponds to whatever the user typed in as their
intended destination, well, then you know you're in a goal state. In the 15 puzzle, it might
be checking the numbers to make sure they're
all in ascending order. But the AI needs some way to encode whether or not any state it happens to be in is a goal.
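As hedged sketches only (using the same assumed state representations as before, not anything from the course's files), those two goal tests might look like this:

```python
def driving_goal_test(state, destination):
    # Driving directions: are we literally at the destination the user typed in?
    return state == destination


def puzzle_goal_test(state):
    # 15 puzzle: read the board row by row and check that the tiles are in
    # ascending order, with the empty square (0) in the final position.
    tiles = [n for row in state for n in row]
    return tiles == list(range(1, 16)) + [0]
```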
And some problems might have one goal, like a maze where you have one initial
position and one ending position and that's the goal. In other more complex
problems, you might imagine that there are multiple possible
goals, that there are multiple ways to solve a problem. And we might not care which
one the computer finds as long as it does find a particular goal. However, sometimes a computer doesn't
just care about finding a goal, but finding a goal well,
or one with a low cost. And it's for that reason
that the last piece of terminology that we use to
define these search problems is something called a path cost. You might imagine that in the
case of driving directions, it would be pretty annoying if I
said I wanted directions from point A to point B, and the route that Google
Maps gave me was a long route with lots of detours that were unnecessary,
that took longer than it should have for me to get to that destination. And it's for that reason that when
we're formulating search problems, we'll often give every path some sort of
numerical cost, some number telling us how expensive it is to take
this particular option. And then tell our AI that instead
of just finding a solution, some way of getting from the
initial state to the goal, we'd really like to find one that
minimizes this path cost, that is less expensive, or takes
less time, or minimizes some other numerical value. We can represent this graphically, if
we take a look at this graph again. And imagine that each of these
arrows, each of these actions that we can take from one
state to another state, has some sort of number
associated with it, that number being the path cost
of this particular action, where the cost of some actions might be more expensive than the cost of other actions, for example. Although this will only happen
in some sorts of problems. In other problems we
can simplify the diagram and just assume that the cost of
any particular action is the same. And this is probably the case
in something like the 15 puzzle, for example, where it doesn't
really make a difference whether I'm moving right or moving left. The only thing that matters
is the total number of steps that I have to take to get from point
A to point B. And each of those steps is of equal cost. We can just assume it's
some constant cost, like one. And so this now forms
the basis for what we might consider to be a search problem. A search problem has some sort of
initial state, some place where we begin, some sort of
action that we can take or multiple actions that we
can take in any given state, and it has a transition
model, some way of defining what happens when we go from
one state and take one action, what state do we end
up with as a result. In addition to that, we need some
goal test to know whether or not we've reached a goal. And then we need a path cost function
that tells us for any particular path, by following some sequence of
actions, how expensive is that path. What is its cost in
terms of money, or time, or some other resource that we are
trying to minimize our usage of. The goal, ultimately, is to find a
solution, where a solution in this case is just some sequence of actions that
will take us from the initial state to the goal state. And, ideally, we'd like to find not just
any solution, but the optimal solution, which is a solution that has
the lowest path cost among all of the possible solutions. And in some cases, there might
be multiple optimal solutions, but an optimal solution
just means that there is no way that we could have done better
in terms of finding that solution. So now we've defined the problem.
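Putting those pieces together, one way you might package a search problem in Python is as a small class with one method per ingredient. This is just a sketch for intuition, with hypothetical names; it is not how the course's own code is structured:

```python
# A hypothetical way to bundle the ingredients of a search problem together.
class SearchProblem:
    def __init__(self, initial_state):
        self.initial_state = initial_state

    def actions(self, state):
        """Return the set of actions that can be executed in `state`."""
        raise NotImplementedError

    def result(self, state, action):
        """Return the state we get after performing `action` in `state`."""
        raise NotImplementedError

    def goal_test(self, state):
        """Return True if `state` is a goal state."""
        raise NotImplementedError

    def step_cost(self, state, action):
        """Return the cost of taking `action` in `state` (often just 1)."""
        return 1
```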
And now we need to begin to figure out how it is that we're going to solve
this kind of search problem. And in order to do so,
you'll probably imagine that our computer is going to need
to represent a whole bunch of data about this particular problem. We need to represent data about
where we are in the problem. And we might need to be considering
multiple different options at once. And oftentimes when we're trying to
package a whole bunch of data related to a state together, we'll
do so using a data structure that we're going to call a node. A node is a data structure
that is just going to keep track of a variety
of different values, and specifically in the
case of a search problem, it's going to keep track of
these four values in particular. Every node is going to keep track of
a state, the state we're currently on. And every node is also going
to keep track of a parent. A parent being the state
before us, or the node that we used in order to
get to this current state. And this is going to be
relevant because eventually, once we reach the goal node,
once we get to the end, we want to know what sequence of actions
we used in order to get to that goal. And the way we'll know that
is by looking at these parents to keep track of what led us to the
goal, and what led us to that state, and what led us to the state
before that, so on and so forth, backtracking our way to
the beginning so that we know the entire sequence of
actions we needed in order to get from the beginning to the end. The node is also going to keep track
of what action we took in order to get from the parent to the current state. And the node is also going
to keep track of a path cost. In other words, it's going to
keep track of the number that represents how long it took to get
from the initial state to the state that we currently happen to be at. And we'll see why this is relevant
as we start to talk about some of the optimizations that we can make
in terms of these search problems more generally.
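In Python, such a node can be a very small class. Here is a minimal sketch that keeps all four values; the node class we'll see later in the maze-solving code looks almost the same, except that it leaves out the path cost:

```python
# A minimal sketch of the node data structure described above.
class Node:
    def __init__(self, state, parent=None, action=None, path_cost=0):
        self.state = state          # the configuration this node represents
        self.parent = parent        # the node we used to get to this node
        self.action = action        # the action taken to get here from the parent
        self.path_cost = path_cost  # cost of the path from the initial state
```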
So this is the data structure that we're going to use in order to solve the problem. And now let's talk about
the approach, how might we actually begin to solve the problem? Well, as you might imagine,
what we're going to do is we're going to start
at one particular state and we're just going
to explore from there. The intuition is that
from a given state, we have multiple options
that we could take, and we're going to
explore those options. And once we explore those options,
we'll find that more options than that are going to make themselves available. And we're going to consider
all of the available options to be stored inside of a single data
structure that we'll call the frontier. The frontier is going to
represent all of the things that we could explore next, that
we haven't yet explored or visited. So in our approach, we're
going to begin this search algorithm by starting with a frontier
that just contains one state. The frontier is going to contain the
initial state because at the beginning, that's the only state we know about. That is the only state that exists. And then our search algorithm is
effectively going to follow a loop. We're going to repeat some
process again and again and again. The first thing we're going to
do is if the frontier is empty, then there's no solution. And we can report that there
is no way to get to the goal. And that's certainly possible. There are certain types of
problems that an AI might try to explore and realize that there
is no way to solve that problem. And that's useful information
for humans to know, as well. So if ever the frontier is empty, that
means there's nothing left to explore, and we haven't yet found a
solution so there is no solution. There's nothing left to explore. Otherwise what we'll do is we'll
remove a node from the frontier. So right now at the
beginning, the frontier just contains one node
representing the initial state. But over time, the frontier might grow. It might contain multiple states. And so here we're just going to remove
a single node from that frontier. If that node happens to be a
goal, then we found a solution. So we remove a node from the frontier
and ask ourselves, is this the goal? And we do that by applying the goal
test that we talked about earlier, asking if we're at the destination or
asking if all the numbers of the 15 puzzle happen to be in order. So if the node contains the
goal, we found a solution. Great. We're done. And otherwise, what we'll need to
do is we'll need to expand the node. And this is a term in
artificial intelligence. To expand the node just means to look
at all of the neighbors of that node. In other words, consider
all of the possible actions that I could take from the state
that this node is representing and what nodes could
I get to from there. We're going to take all of
those nodes, the next nodes that I can get to from this
current one I'm looking at, and add those to the frontier. And then we'll repeat this process. So at a very high level, the idea
is we start with a frontier that contains the initial state. And we're constantly removing
a node from the frontier, looking at where we can get to next,
and adding those nodes to the frontier, repeating this process over
and over until either we remove a node from the frontier and it
contains a goal, meaning we've solved the problem. Or we run into a situation where the
frontier is empty, at which point we're left with no solution.
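As a rough sketch (not the course's actual code), that loop might look like this in Python, reusing the Node class and the hypothetical SearchProblem methods from above. For now the frontier is just a plain list, and which node gets removed is left deliberately simple; as we'll see, that choice matters a great deal:

```python
# A sketch of the approach described so far.
def search(problem):
    frontier = [Node(problem.initial_state)]
    while True:
        if not frontier:
            return None                    # frontier empty: no solution
        node = frontier.pop()              # remove a node from the frontier
        if problem.goal_test(node.state):
            return node                    # follow .parent links to recover the path
        for action in problem.actions(node.state):   # expand the node
            child = Node(
                state=problem.result(node.state, action),
                parent=node,
                action=action,
                path_cost=node.path_cost + problem.step_cost(node.state, action),
            )
            frontier.append(child)
```

This mirrors the steps above exactly; keep it in mind, because we're about to see why it needs one more refinement.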
So let's actually try and take the pseudocode, put it into practice by taking a look at
an example of a sample search problem. So right here I have a sample graph. A is connected to B via this action,
B is connected to nodes C and D, C is connected to E, and D is connected to F. And what I'd like to do is have my AI find a path from A to E.
We want to get from this initial state to this goal state. So how are we going to do that? Well, we're going to start
with the frontier that contains the initial state. This is going to represent our frontier. So our frontier, initially, will
just contain A, that initial state where we're going to begin. And now we'll repeat this process. If the frontier is empty, no solution. That's not a problem because
the frontier is not empty. So we'll remove a node from the
frontier as the one to consider next. There is only one node in the frontier. So we'll go ahead and
remove it from the frontier. But now A, this initial node, this is
the node we're currently considering. We follow the next step. We ask ourselves, is this node the goal? No, it's not. A is not the goal. E is the goal. So we don't return the solution. So instead, we go to this
last step, expand the node and add the resulting
nodes to the frontier. What does that mean? Well, it means take this state A and
consider where we could get to next. And after A what we
could get to next is only B. So that's what we get
when we expand A. We find B. And we add B to the frontier. And now B is in the frontier
and we repeat the process again. We say, all right. The frontier is not empty. So let's remove B from the frontier. B is now the node that
we're considering. We ask ourselves, is B the goal? No, it's not. So we go ahead and expand B and add
its resulting nodes to the frontier. What happens when we expand B? In other words, what nodes
can we get to from B? Well, we can get to C
and D. So we'll go ahead and add C and D to the frontier. And now we have two nodes
in the frontier, C and D. And we repeat the process again. We remove a node from the
frontier, for now we'll do so arbitrarily just by picking C. We'll see why later how choosing which
node you remove from the frontier is actually quite an important
part of the algorithm. But for now I'll arbitrarily
remove C, say it's not the goal, so we'll add E, the next
one to the frontier. Then let's say I remove
E from the frontier. And now I'm currently looking at
state E. Is that a goal state? It is because I'm trying
to find a path from A to E. So I would return the goal. And that, now, would be
the solution, that I'm now able to return the solution
and I found a path from A to E. So this is the general idea, the general
approach of this search algorithm, to follow these steps constantly
removing nodes from the frontier until we're able to find a solution. So the next question
you might reasonably ask is, what could go wrong here? What are the potential problems
with an approach like this? And here's one example of a problem that
could arise from this sort of approach. Imagine this same graph, same
as before, with one change. The change being, now, instead
of just an arrow from A to B, we also have an arrow from B to A,
meaning we can go in both directions. And this is true in
something like the 15 puzzle where when I slide a
tile to the right, I could then slide a tile to the left
to get back to the original position. I could go back and
forth between A and B. And that's what these double arrows
symbolize, the idea that from one state I can get to another
and then I can get back. And that's true in many search problems. What's going to happen if I try
to apply the same approach now? Well, I'll begin with A, same as before. And I'll remove A from the frontier. And then I'll consider where I can get
to from A. And after A, the only place I can get choice B so B
goes into the frontier. Then I'll say, all right. Let's take a look at B. That's the
only thing left in the frontier. Where can I get to from B? Before it was just C and D, but
now because of that reverse arrow, I can get to A or C or D. So all
three A, C, and D. All of those now go into the frontier. They are places I can get to from B.
And now I remove one from the frontier, and, you know, maybe I'm
unlucky and maybe I pick A. And now I'm looking at A again. And I consider where can I get to from
A. And from A, well I can get to B. And now we start to see the
problem, that if I'm not careful, I go from A to B and then
back to A and then to B again. And I could be going in this
infinite loop where I never make any progress because I'm constantly
just going back and forth between two states that I've already seen. So what is the solution to this? We need some way to
deal with this problem. And the way that we can
deal with this problem is by somehow keeping track of
what we've already explored. And the logic is going to be, well,
if we've already explored the state, there's no reason to go back to it. Once we've explored a
state, don't go back to it, don't bother adding it to the frontier. There's no need to. So here is going to be
our revised approach, a better way to approach
this sort of search problem. And it's going to look very similar
just with a couple of modifications. We'll start with a frontier
that contains the initial state. Same as before. But now we'll start with
another data structure, which would just be a set of
nodes that we've already explored. So what are the states we've explored? Initially, it's empty. We have an empty explored set. And now we repeat. If the frontier is empty, no solution. Same as before. We remove a node from
the frontier, we check to see if it's a goal
state, return the solution. None of this is any different so far. But now, what we're going
to do is we're going to add the node to the explored state. So if it happens to be the case that
we remove a node from the frontier and it's not the goal, we'll
add it to the explored set so that we know we've
already explored it. We don't need to go back to it again
if it happens to come up later. And then the final
step, we expand the node and we add the resulting
nodes to the frontier. But before we just always added the
resulting nodes to the frontier, we're going to be a little
cleverer about it this time. We're only going to add
the nodes to the frontier if they aren't already in
the frontier and if they aren't already in the explored set. So we'll check both the
frontier and the explored set, make sure that the node isn't
already in one of those two, and so long as it isn't, then we'll
go ahead and add to the frontier but not otherwise. And so that revised
approach is ultimately what's going to help
make sure that we don't go back and forth between two nodes. Now the one point that I've
kind of glossed over here so far is this step here,
removing a node from the frontier. Before I just chose arbitrarily, like
let's just remove a node and that's it. But it turns out it's
actually quite important how we decide to structure
our frontier, how we add them, and how we remove our nodes. The frontier is a data structure. And we need to make a
choice about in what order are we going to be removing elements? And one of the simplest data structures
for adding and removing elements is something called a stack. And a stack is a data structure that
is a last in, first out data type. Which means the last thing
that I add to the frontier is going to be the first thing
that I remove from the frontier. So the most recent thing to go into the
stack, or the frontier in this case, is going to be the node that I explore. So let's see what happens if I
apply this stack based approach to something like this problem,
finding a path from A to E. What's going to happen? Well, again we'll start with
A. And we'll say, all right. Let's go ahead and look at A first. And then, notice this time, we've
added A to the explored set. A is something we've now explored,
we have this data structure that's keeping track. We then say from A we can
get to B. And all right. From B what can we do? Well from B, we can explore
B and get to both C and D. So we added C and then D. So
now when we explore a node, we're going to treat the frontier
as a stack, last in, first out. D was the last one to come in so
we'll go ahead and explore that next. And say, all right, where
can we get to from D? Well we can get to F. And so, all right. We'll put F into the frontier. And now because the
frontier is a stack, F is the most recent thing
that's gone in the stack. So F is what we'll explore next. We'll explore F and say, all right. Where can we get you from F? Well, we can't get anywhere so
nothing gets added to the frontier. So now what was the new most
recent thing added to the frontier? Well, it's now C, the only
thing left in the frontier. We'll explore that from which we can
say, all right, from C we can get to E. So E goes into the frontier. And then we say, all right. Let's look at E and E
is now the solution. And now we've solved the problem. So when we treat the
frontier like a stack, a last in, first out data
structure, that's the result we get. We go from A to B to D to F, and then
we sort of backed up and went down to C and then E. And it's important
to get a visual sense for how this algorithm is working. We went very deep in
this search tree, so to speak, all the way until the
bottom where we hit a dead end. And then we effectively backed
up and explored this other route that we didn't try before. And it's this going very
deep in the search tree idea, this way the algorithm ends up
working when we use a stack, that we call this version of the
algorithm depth first search. Depth first search is
the search algorithm where we always explore the
deepest node in the frontier. We keep going deeper and
deeper through our search tree. And then if we hit a dead end, we back
up and we try something else instead. But depth first search is just
one of the possible search options that we could use. It turns out that there is
another algorithm called breadth first search, which
behaves very similarly to depth first search with one difference. Instead of always exploring the
deepest node in the search tree the way the depth first search
does, breadth first search is always going to explore the
shallowest node in the frontier. So what does that mean? Well, it means that instead
of using a stack, which depth first search, or DFS, used
where the most recent item added to the frontier is the one we'll explore
next, in breadth first search, or BFS, will instead use a queue where a
queue is a first in, first out data type, where the very first thing we
add to the frontier is the first one we'll explore. And they effectively
form a line or a queue, where the earlier you arrive in the
frontier, the earlier you get explored. So what would that mean for the same
exact problem finding a path from A to E? Well we start with A, same as before. Then we'll go ahead and have explored
A, and say, where can we get to from A? Well, from A we can get
to B. Same as before. From B, same as before. We can get to C and D so C and
D get added to the frontier. This time, though, we
added C to the frontier before D so we'll explore C first. So C gets explored. And from C, where can we get to? Well, we can get to E. So E gets added to the frontier. But because D was explored
before E, we'll look at D next. So we'll explore D and say,
where can we get to from D? We can get to F. And only
then will we say, all right. Now we can get to E. And so what
breadth first search, or BFS, did is we started here,
we looked at both C and D, and then we looked at
E. Effectively we're looking at things one away
from the initial state, then two away from the initial state. And only then, things that are
three away from the initial state. Unlike depth first search, which
just went as deep as possible into the search tree until
it hit a dead end and then, ultimately, had to back up. So these now are two
different search algorithms that we could apply in order
to try and solve a problem. And let's take a look at
how these would actually work in practice with something
like maze solving, for example. So here's an example of a maze. These empty cells represent
places where our agent can move. These darkened gray
cells represent walls that the agent can't pass through. And, ultimately, our
agent, our AI, is going to try to find a way
to get from position A to position B via some sequence of
actions, where those actions are left, right, up, and down. What will depth first
search do in this case? Well depth first search
will just follow one path. If it reaches a fork in the road where
it has multiple different options, depth first search is just, in
this case, going to choose one. There isn't a real preference. But it's going to keep following
one until it hits a dead end. And when it hits a dead
end, depth first search effectively goes back to
the last decision point and tries the other path. Fully exhausting this
entire path and when it realizes that, OK,
the goal is not here, then it turns its
attention to this path. It goes as deep as possible. When it hits a dead end, it backs
up and then tries this other path, keeps going as deep as possible
down one particular path, and when it realizes that that's
a dead end, then it'll back up. And then ultimately find
its way to the goal. And maybe you got lucky and maybe you
made a different choice earlier on, but ultimately this is how depth
first search is going to work. It's going to keep following
until it hits a dead end. And when it hits a dead end, it backs
up and looks for a different solution. And so one thing you
might reasonably ask is, is this algorithm
always going to work? Will it always actually find a way to
get from the initial state to the goal? And it turns out that
as long as our maze is finite, as long as there are
that finitely many spaces where we can travel, then yes. Depth first search is going to find
a solution because eventually it will just explore everything. If the maze happens to be infinite and
there's an infinite state space, which does exist in certain types of problems,
then it's a slightly different story. But as long as our maze
has finitely many squares, we're going to find a solution. The next question, though,
that we want to ask is, is it going to be a good solution? Is it the optimal
solution that we can find? And the answer there is not necessarily. And let's take a look
at an example of that. In this maze, for example, we're again
trying to find our way from A to B. And you notice here there are
multiple possible solutions. We could go this way, or
we could go up in order to make our way from A
to B. Now if we're lucky, depth first search will choose this way
and get to B. But there's no reason, necessarily, why depth
first search would choose between going up or going to the right. It's sort of an arbitrary
decision point because both are going to be added to the frontier. And ultimately, if we get
unlucky, depth first search might choose to explore this
path first because it's just a random choice at this point. It will explore, explore,
explore, and it'll eventually find the goal, this particular
path, when in actuality there was a better path. There was a more optimal
solution that used fewer steps, assuming we're measuring the cost of a
solution based on the number of steps that we need to take. So depth first search, if
we're unlucky, might end up not finding the best solution when
a better solution is available. So if that's DFS, depth first search. How does BFS, or breadth
first search, compare? How would it work in this
particular situation? Well the algorithm is going to
look very different visually in terms of how BFS explores. Because BFS looks at
shallower nodes first, the idea is going to be BFS will
first look at all of the nodes that are one away from the initial state. Look here and look here, for example. Just at the two nodes that are
immediately next to this initial state. Then it will explore
nodes that are two away, looking at the state and
that state, for example. Then it will explore nodes that are
three away, this state and that state. Whereas depth first search just
picked one path and kept following it, breadth first search
on the other hand, is taking the option of exploring
all of the possible paths kind of at the same time,
bouncing back between them, looking deeper and deeper
at each one, but making sure to explore the shallower
ones or the ones that are closer to the initial state earlier. So we'll keep following this pattern,
looking at things that are four away, looking at things that
are five away, looking at things that are six away, until
eventually we make our way to the goal. And in this case, it's true we
had to explore some states that ultimately didn't lead us anywhere. But the path that we found to
the goal was the optimal path. This is the shortest way that
we could get to the goal. And so, what might happen
then in a larger maze? Well let's take a look
at something like this and how breadth first
search is going to behave. Well, breadth first
search, again, will just keep following the states until
it reaches a decision point. It could go either left or right. And while DFS just picked
one and kept following that until it hit a dead end, BFS on
the other hand, will explore both. It'll say, look at this
node, then this node, and I'll look at this node, then
that node, so on and so forth. And when it hits a decision point here,
rather than picking one, left or right, and exploring that path, it will again
explore both alternating between them, going deeper and deeper. Will explore here, and then maybe
here and here, and then keep going. Explore here and slowly make
our way, you can visually see further and further out. Once we get to this
decision point, we'll explore both up and down
until, ultimately, we make our way to the goal. And what you'll notice is,
yes, breadth first search did find our way from A to B by
following this particular path. But it needed to explore a lot
of states in order to do so. And so we see some trade
here between DFS and BFS. That in DFS there may be some cases
where there is some memory savings, as compared to a breadth
first approach where breadth first search, in this case,
had to explore a lot of states. But maybe that won't always be the case. So now let's actually turn
our attention to some code. And look at the code
that we could actually write in order to implement something
like depth first search or breadth first search in the context
of solving a maze, for example. So I'll go ahead and
go into my terminal. And what I have here inside of
maze.py is an implementation of this same idea of maze solving. I've defined a class called
node that in this case is keeping track of the state,
the parent, in other words the state before the
state, and the action. In this case, we're not
keeping track of the path cost because we can calculate the
cost of the path at the end after we found our way from
the initial state to the goal. In addition to this, I've defined
a class called a stack frontier. And if you're unfamiliar with a
class, a class is a way for me to define a way to
generate objects in Python. It refers to an idea of object oriented
programming where the idea here is that I would like to
create an object that is able to store all of my frontier data. And I would like to have
functions, otherwise known as methods on that object, that I
can use to manipulate the object. And so what's going on here,
if you're unfamiliar with the syntax, is I have a function that
initially creates a frontier that I'm going to
represent using a list. And initially my frontier is
represented by the empty list. There's nothing in my
frontier to begin with. I have an add function that
adds something to the frontier, as by appending it to
the end of the list. I have a function that checks if the
frontier contains a particular state. I have an empty function that
checks if the frontier is empty. If the frontier is empty, that just
means the length of the frontier is zero. And then I have a function for
removing something from the frontier. I can't remove something from the
frontier if the frontier is empty. So I check for that first. But otherwise, if the
frontier isn't empty, recall that I'm implementing
this frontier as a stack, a last in, first out data structure. Which means the last thing
I add to the frontier, in other words, the
last thing in the list, is the item that I should
remove from this frontier. So what you'll see here is I have
removed the last item of a list. And if you index into a
Python list with negative one, that gets you the last item in the list. Since zero is the first
item, negative one kind of wraps around and gets
you to the last item in the list. So we save that node, calling it node, and we update the
frontier here on line 28 to say, go ahead and remove that node that
you just removed from the frontier. And then we return the node as a
result. So this class here effectively implements the idea of a frontier. It gives me a way to add
something to a frontier and a way to remove something from
the frontier as a stack. I've also, just for good measure,
implemented an alternative version of the same thing called a queue frontier. Which, in parentheses you'll see here,
it inherits from a stack frontier, meaning it's going to do all the same
things that the stack frontier did, except the way we remove
a node from the frontier is going to be slightly different. Instead of removing from the end of
the list the way we would in a stack, we're instead going to remove
from the beginning of the list. self.frontier[0] will get me
the first node in the frontier, the first one that was added. And that is going to be the one
that we return in the case of a queue.
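Put together, the two classes being described look roughly like this. This is a close paraphrase of what's being shown rather than the file verbatim, so small details may differ:

```python
class StackFrontier:
    def __init__(self):
        self.frontier = []

    def add(self, node):
        self.frontier.append(node)

    def contains_state(self, state):
        return any(node.state == state for node in self.frontier)

    def empty(self):
        return len(self.frontier) == 0

    def remove(self):
        if self.empty():
            raise Exception("empty frontier")
        node = self.frontier[-1]          # last in, first out
        self.frontier = self.frontier[:-1]
        return node


class QueueFrontier(StackFrontier):
    def remove(self):
        if self.empty():
            raise Exception("empty frontier")
        node = self.frontier[0]           # first in, first out
        self.frontier = self.frontier[1:]
        return node
```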
Under here I have a definition of a class called Maze. This is going to handle the process
of taking a maze-like text file and figuring out how to solve it. So we'll take as input a text
file that looks something like this, for example, where we see
hash marks that are here representing walls and I have the character A
representing the starting position, and the character B representing
the ending position. And you can take a look at the code
for parsing this text file right now. That's the less interesting part. The more interesting part
is this solve function here, where the solve function
is going to figure out how to actually get
from point A to point B. And here we see an implementation
of the exact same idea we saw from a moment ago. We're going to keep
track of how many states we've explored just so we
can report that data later. But I start with a node that
represents just the start state. And I start with a frontier that
in this case is a stack frontier. And given that I'm treating
my frontier as a stack, you might imagine that the algorithm I'm
using here is now depth first search. Because depth first search or DFS
uses a stack as its data structure. And initially, this frontier is just
going to contain the start state. We initialize an explored
set that initially is empty. There's nothing we've explored so far. And now here's our loop, that notion
of repeating something again and again. First, we check if the frontier is empty
by calling that empty function that we saw the implementation of a moment ago. And if the frontier
is indeed empty, we'll go ahead and raise an exception,
or a Python error, to say, sorry. There is no solution to this problem. Otherwise, we'll go ahead and
remove a node from the frontier, as by calling frontier.remove and update
the number of states we've explored. Because now we've explored
one additional state so we say self.num_explored plus equals
one, adding one to the number of states we've explored. Once we remove a node
from the frontier, recall that the next step is to see whether
or not it's the goal, the goal test. And in the case of the maze,
the goal is pretty easy. I check to see whether the state
of the node is equal to the goal. Initially when I set
up the maze, I set up this value called goal which
is the property of the maze so I can just check to see if
the node is actually the goal. And if it is the goal,
then what I want to do is backtrack my way towards
figuring out what actions I took in order to get to this goal. And how do I do that? We'll recall that every
node stores its parent-- the node that came before it that
we used to get to this node-- and also the action used
in order to get there. So I can create this
loop where I'm constantly just looking at the parent
of every node and keeping track, for all of the parents, what
action I took to get from the parent to this. So this loop is going to keep repeating
this process of looking through all of the parent nodes until we get
back to the initial state, which has no parent, where node.parent
is going to be equal to none. As I do so, I'm going to be
building up the list of all of the actions that I'm following
and the list of all of the cells that are part of the solution. But I'll reverse them
because when I build it up going from the goal
back to the initial state, I'm building the sequence of actions
from the goal to the initial state, but I want to reverse them in order
to get the sequence of actions from the initial state to the goal. And that is, ultimately,
going to be the solution.
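Pulled out on its own, that backtracking step looks roughly like this (a sketch, assuming each node stores .parent and .action as described):

```python
# Walk the parent pointers from the goal node back to the start, then
# reverse, so the actions read from the initial state to the goal.
def reconstruct(node):
    actions = []
    cells = []
    while node.parent is not None:
        actions.append(node.action)
        cells.append(node.state)
        node = node.parent
    actions.reverse()
    cells.reverse()
    return actions, cells
```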
So all of that happens if the current state is equal to the goal. And otherwise, if it's
not the goal, well, then I'll go ahead and add this
state to the explored set to say, I've explored this state now. No need to go back to it if I
come across it in the future. And then, this logic
here implements the idea of adding neighbors to the frontier. I'm saying, look at all of my neighbors. And I implemented a function called
neighbors that you can take a look at. And for each of those
neighbors, I'm going to check, is the state already in the frontier? Is the state already
in the explored set? And if it's not in either of those, then
I'll go ahead and add this new child node-- this new node-- to the frontier.
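In isolation, that check looks roughly like this (a sketch with a hypothetical helper name; in the real file this logic lives inline inside solve):

```python
# Add a child node only if its state is brand new: not already waiting
# in the frontier and not already in the explored set.
def expand_into(frontier, explored, node, neighbors):
    """`neighbors` is an iterable of (action, state) pairs reachable from node."""
    for action, state in neighbors:
        if not frontier.contains_state(state) and state not in explored:
            frontier.add(Node(state=state, parent=node, action=action))
```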
So there's a fair amount of syntax here, but the key here is not to understand all
the nuances of the syntax, though feel free to take a closer
look at this file on your own to get a sense for how it is working. But the key is to see how this is an
implementation of the same pseudocode, the same idea that we were describing
a moment ago on the screen when we were looking at the steps that
we might follow in order to solve this kind of search problem. So now let's actually
see this in action. I'll go ahead and run maze.py
on maze1.txt, for example. And what we'll see is here we have a
printout of what the maze initially looked like. And then here, down below,
is after we've solved it. We had to explore 11 states in order to
do it, and we found a path from A to B. And in this program, I just happened
to generate a graphical representation of this, as well-- so I can open up maze.png, which
is generated by this program-- that shows you where, in the
darker color here, the wall is. Red is the initial
state, green is the goal, and yellow is the path
that was followed. We found a path from the
initial state to the goal. But now let's take a look
at a more sophisticated maze to see what might happen instead. Let's look now at maze2.txt, where
now here we have a much larger maze. Again, we're trying to find our
way from point A to point B, but now you'll imagine that depth-first
search might not be so lucky. It might not get the
goal on the first try. It might have to follow
one path then backtrack and explore something
else a little bit later. So let's try this. Run python maze.py on maze2.txt,
this time trying on this other maze. And now depth-first search
is able to find a solution. Here, as indicated by the stars,
is a way to get from A to B. And we can represent this
visually by opening up this maze. Here's what that maze looks like. And highlighted in yellow, is the path
that was found from the initial state to the goal. But how many states do we have to
explore before we found that path? Well, recall that, in my program, I was
keeping track of the number of states that we've explored so far. And so I can go back to the terminal
and see that, all right, in order to solve this problem, we had
to explore 399 different states. And in fact, if I make one small
modification to the program and tell the program at the
end when we output this image, I added an argument
called "show explored". And if I set "show
explored" equal to true and rerun this program, python maze.py,
by running it on maze2, and then I open the maze, what you'll
see here is, highlighted in red, are all of the states that had to be
explored to get from the initial state to the goal. Depth-First Search, or DFS, didn't
find its way to the goal right away. It made a choice to first
explore this direction. And when it explored
this direction, it had to follow every conceivable
path, all the way to the very end, even
this long and winding one, in order to realize that, you
know what, that's a dead end. And instead, the program
needed to backtrack. After going this direction, it
must have gone this direction. It got lucky here by just
not choosing this path. But it got unlucky here, exploring this
direction, exploring a bunch of states that it didn't need
to and then, likewise, exploring all of this
top part of the graph when it probably didn't
need to do that either. So all in all, depth-first
search here really not performing optimally, or probably
exploring more states than it needs to. It finds an optimal solution,
the best path to the goal, but the number of states needed
to explore in order to do so, the number of steps I had to
take, that was much higher. So let's compare. How would Breadth-First Search, or BFS,
do on this exact same maze instead? And in order to do so,
it's a very easy change. The algorithm for DFS and BFS
is identical with the exception of what data structure we use
to represent the frontier. That in DFS I used a stack frontier-- last in, first out-- whereas in BFS, I'm going to
use a queue frontier-- first in, first out, where the first
thing I add to the frontier is the first thing that I remove.
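In other words, the entire switch between the two algorithms comes down to roughly one line inside solve:

```python
frontier = StackFrontier()    # depth-first search
# frontier = QueueFrontier()  # swap in this line instead for breadth-first search
```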
So I'll go back to the terminal, rerun this program on the same maze, and now you'll see that the number of
states we had to explore was only 77, as compared to almost 400 when
we used depth-first search. And we can see exactly why. We can see what happened if we open
up maze.png now and take a look. Again, yellow highlight is the solution
that breadth-first search found, which, incidentally, is the same
solution that depth-first search found. They're both finding the best solution,
but notice all the white unexplored cells. There were far fewer states
that needed to be explored in order to make our way to the goal
because breadth-first search operates a little more shallowly. It's exploring things that
are close to the initial state without exploring things
that are further away. So if the goal is not too far
away, then breadth-first search can actually behave quite
effectively on a maze that looks a little something like this. Now, in this case, both BFS and DFS
ended up finding the same solution, but that won't always be the case. And in fact, let's take a look at one
more example, for instance, maze3.txt. In maze3.txt, notice that
here there are multiple ways that you could get from A to B. It's a relatively small maze,
but let's look at what happens. If I use-- and I'll go ahead
and turn off "show explored" so we just see the solution. If I use BFS, breadth-first
search, to solve maze3.txt, well, then we find a solution. And if I open up the maze, here's
the solution that we found. It is the optimal one. With just four steps, we can
get from the initial state to what the goal happens to be. But what happens if we try to use,
depth-first search, or DFS, instead? Well, again, I'll go back up to my queue
frontier, where queue frontier means that we're using breadth-first search. And I'll change it to a stack
frontier, which means that now we'll be using depth-first search. I'll rerun python maze.py. And now you'll see that
we find a solution, but it is not the optimal solution. This, instead, is what
our algorithm finds. And maybe depth-first search
would have found this solution. It's possible, but it's not
guaranteed, that if we just happen to be unlucky, if we choose
this state instead of that state, then depth-first search might
find a longer route to get from the initial state to the goal. So we do see some trade-offs here
where depth-first search might not find the optimal solution. So at that point, it seems like
breadth-first search is pretty good. Is that the best we can do, where it's
going to find us the optimal solution and we don't have to worry
about situations where we might end up finding a longer path to
the solution than what actually exists? Well, not necessarily. Recall that larger maze, where the goal is far away
from the initial state-- and we might have to take lots of steps
in order to get from the initial state to the goal-- what ended up happening, is that
this algorithm, BFS, ended up exploring basically the entire graph,
having to go through the entire maze in order to find its way from the
initial state to the goal state. What we'd ultimately
like is for our algorithm to be a little bit more intelligent. And now what would it
mean for our algorithm to be a little bit more
intelligent, in this case? Well, let's look back to where
breadth-first search might have been able to make
a different decision and consider human intuition
in this process, as well. Like, what might a human do when solving
this maze that is different than what BFS ultimately chose to do? Well, the very first
decision point that BFS made was right here, when it
made five steps and ended up in a position where it
had a fork in the road. It could either go left
or it could go right. In these initial couple of
steps, there was no choice. There was only one action that could
be taken from each of those states. And so the search algorithm
did the only thing that any search
algorithm could do, which is keep following that
state after the next state. But this decision point is where
things get a little bit interesting. Depth-first search, that very first
search algorithm we looked at, chose to say, let's pick one
path and exhaust that path, see if anything that way has
the goal, and if not, then let's try the other way. Breadth-first search took the
alternative approach of saying, you know what? Let's explore things that are shallow,
close to us first, look left and right, then back left and back
right, so on and so forth, alternating between our options in
the hopes of finding something nearby. But ultimately, what might a human do
if confronted with a situation like this of go left or go right? Well, a human might visually
see that, all right, I'm trying to get to state B, which
is way up there, and going right just feels like it's closer to the goal. Like, it feels like
going right should be better than going left
because I'm making progress towards getting to that goal. Now, of course, there are a couple
of assumptions that I'm making here. I'm making the assumption
that we can represent this grid as, like, a
two-dimensional grid, where I know the
coordinates of everything. I know that A is in coordinate 0,0,
and B is in some other coordinate pair. And I know what coordinate I'm at now,
so I can calculate that, yeah, going this way, that is closer to the goal. And that might be a reasonable
assumption for some types of search problems but maybe not in others. But for now, we'll go
ahead and assume that I know what my current coordinate pair is, and I know the x,y coordinate of the goal that I'm trying to get to. And in this situation,
I'd like an algorithm that is a little bit more
intelligent and somehow knows that I should be making
progress towards the goal, and this is probably the way
to do that because, in a maze, moving in the coordinate
direction of the goal is usually, though not
always, a good thing. And so here we draw a distinction
between two different types of search algorithms-- uninformed
search and informed search. Uninformed search algorithms
are algorithms like DFS and BFS, the two algorithms
that we just looked at, which are search strategies that don't
use any problem-specific knowledge to be able to solve the problem. DFS and BFS didn't really
care about the structure of the maze, or anything about the way a maze works, in order to solve the problem. They just look at the actions available and choose from those actions, and it doesn't matter whether it's a maze or some other problem. The solution, or the way that
it tries to solve the problem, is really fundamentally
going to be the same. What we're going to
take a look at now is an improvement upon uninformed search. We're going to take a
look at informed search. Informed search algorithms are going
to be search strategies that use knowledge specific to the problem
to be able to better find a solution. And in the case of a maze,
this problem-specific knowledge is something like, if I'm in a square that
closer to the goal, that is better than being in a square
that is geographically further away. And this is something we can only
know by thinking about this problem and reasoning about what knowledge
might be helpful for our AI agent to know a little something about. There are a number of different
types of informed search. Specifically, first, we're going
to look at a particular type of search algorithm called
greedy best-first search. Greedy Best-First Search,
often abbreviated GBFS, is a search algorithm that, instead
of expanding the deepest node, like DFS, or the
shallowest node, like BFS, this algorithm is always
going to expand the node that it thinks is closest to the goal. Now, the search algorithm isn't going to
know for sure whether it is the closest thing to the goal, because if we
knew what was closest to the goal all the time, then we would
already have a solution. Like, the knowledge of
what is close to the goal, we could just follow those steps in
order to get from the initial position to the solution. But if we don't know the solution--
meaning we don't know exactly what's closest to the goal-- instead, we can use
an estimate of what's closest to the goal, otherwise
known as a heuristic-- just some way of estimating whether
or not we're close to the goal. And we'll do so using a heuristic
function, conventionally called h(n), that takes a state as input and
returns our estimate of how close we are to the goal. So what might this
heuristic function actually look like in the case of
a maze-solving algorithm? Where we're trying to solve a maze,
what does a heuristic look like? Well, the heuristic needs to answer
a question, like between these two cells, C and D, which one is better? Which one would I rather be in if I'm
trying to find my way to the goal? Well, any human could probably look
at this and tell you, you know what? D looks like it's better. Even if the maze is convoluted and
you haven't thought about all the walls, D is probably better. And why is D better? Well, because if you ignore the
wall-- let's just pretend the walls don't exist for a moment and
relax the problem, so to speak-- D, just in terms of coordinate
pairs, is closer to this goal. It's fewer steps that I would
need to take to get to the goal, as compared to C, even
if you ignore the walls. If you just know the x,y coordinate of
C, and the x,y coordinate of the goal, and likewise, you know
the x,y coordinate of D, you can calculate that D, just
geographically, ignoring the walls, looks like it's better. And so this is the heuristic
function that we're going to use, and it's something called the
Manhattan distance, one specific type of heuristic, where the heuristic is, how many squares vertically and horizontally, so not allowing myself to go diagonally, just either up or right or left or down. How many steps do I need to take to get
from each of these cells to the goal? Well, as it turns out, D is much closer. There are fewer steps. It only needs to take six steps
in order to get to that goal. Again, here we're ignoring the walls. We've relaxed the problem a little bit. We're just concerned
with, if you do the math, subtract the x values
from each other and the y values from each other, what is our
estimate of how far we are away? We can estimate that D is
closer to the goal than C is.
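In code, that estimate is only a few lines. Here is a minimal sketch of a Manhattan distance heuristic, assuming each state is an (x, y) coordinate pair and that we know the goal's coordinates:

    def manhattan_distance(state, goal):
        """Estimated number of up/down/left/right steps to the goal, ignoring any walls."""
        (x1, y1) = state
        (x2, y2) = goal
        return abs(x1 - x2) + abs(y1 - y2)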
And so now we have an approach. We have a way of picking which node to remove from the frontier. And at each stage in
our algorithm, we're going to remove a node from the frontier and explore the node that has the smallest value for this heuristic function, the node with the smallest Manhattan distance to the goal.
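One way to sketch that selection rule in code is a frontier whose remove method always picks the node whose state has the smallest heuristic value. This sketch assumes the manhattan_distance helper from above and nodes that store their state:

    class GreedyFrontier():
        """Frontier for greedy best-first search: remove the node that looks closest to the goal."""
        def __init__(self, goal):
            self.goal = goal
            self.frontier = []

        def add(self, node):
            self.frontier.append(node)

        def empty(self):
            return len(self.frontier) == 0

        def remove(self):
            if self.empty():
                raise Exception("empty frontier")
            # Pick the node with the smallest h(n), our estimate of the distance to the goal.
            node = min(self.frontier, key=lambda n: manhattan_distance(n.state, self.goal))
            self.frontier.remove(node)
            return node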
And so what would this actually look like? Well, let me first label
this graph, label this maze, with a number representing
the value of this heuristic function, the value of the Manhattan
distance from any of these cells. So from this cell, for example,
we're one away from the goal. From this cell, we're two away from the goal. Three away, four away. Here we're five away, because we have
to go one to the right and then four up. From somewhere like here,
the Manhattan distance is 2. We're only two squares
away from the goal, geographically, even
though in practice we're
path, but we don't know that yet. The heuristic is just
some easy way to estimate how far we are away from the goal. And maybe our heuristic
is overly optimistic. It thinks that, yeah,
we're only two steps away, when in practice, when you consider
the walls, it might be more steps. So the important thing here is that
the heuristic isn't a guarantee of how many steps it's going to take. It is estimating. It's an attempt at
trying to approximate. And it does seem generally the case
that the squares that look closer to the goal have smaller values
for the heuristic function than squares that are further away. So now, using greedy best-first search,
what might this algorithm actually do? Well, again, for these first five
steps, there's not much of a choice. We started at this initial state,
A. And we say, all right. We have to explore these five states. But now we have a decision point. Now we have a choice between
going left and going right. And before, DFS and BFS would
just pick arbitrarily because it just depends on the order you throw
these two nodes into the frontier-- and we didn't specify what order you put
them into the frontier, only the order you take them out. Here we can look at 13 and
11 and say that, all right, this square is a distance
of 11 away from the goal, according to our heuristic,
according to our estimate. And this one we estimate to
be 13 away from the goal. So between those two options,
between these two choices, I'd rather have the 11. I'd rather be 11 steps away from
the goal, so I'll go to the right. We're able to make an informed decision
because we know a little something more about this problem. So then we keep following 10, 9, 8, until we get to the two sevens. We don't really have much of a way to choose between those. So then we do just have to
make an arbitrary choice. And you know what? Maybe we choose wrong. But that's OK because now we can still
say, all right, let's try this seven. We say seven, six. We have to make this choice
even though it increases the value of the heuristic function. But now we have another decision
point between six and eight. And between those two-- and really, we're also considering
the 13, but that's much higher. Between six, eight,
and 13, well, the six is the smallest value, so
we'd rather take the six. We're able to make an informed decision
that going this way to the right is probably better than going that way. So we turn this way. We go to five. And now we find a decision
point where we'll actually make a decision that we
might not want to make, but there's unfortunately not
too much of a way around this. We see four and six. Four looks closer to the goal, right? It's going up, and the
goal is further up. So we end up taking that route, which
ultimately leads us to a dead end. But that's OK because we can still
say, all right, now let's try the six, and now follow this route that will
ultimately lead us to the goal. And so this now is how greedy
best-first search might try to approach this
problem, by saying whenever we have a decision between multiple
nodes that we could explore, let's explore the node that
has the smallest value of h(n), this heuristic function that is
estimating how far I have to go. And it just so happens
that, in this case, we end up doing better, in terms of the
number of states we needed to explore, than BFS needed to. BFS explored all of this
section and all of that section. But we were able to eliminate
that by taking advantage of this heuristic, this
knowledge about how close we are to the goal or some
estimate of that idea. So this seems much better. So wouldn't we always
prefer an algorithm like this over an algorithm
like breadth-first search? Well, maybe. One thing to take into
consideration is that we need to come up with a good heuristic. How good the heuristic is is going
to affect how good this algorithm is. And coming up with a good heuristic
can oftentimes be challenging. But the other thing
to consider is to ask the question, just as we did
with the prior two algorithms, is this algorithm optimal? Will it always find the shortest path
from the initial state to the goal? And to answer that question, let's take
a look at this example for a moment. Take a look at this example. Again, we're trying to get
from A to B, and again, I've labeled each of the cells
with their Manhattan distance from the goal, the number of
squares up and to the right you would need to travel in order
to get from that square to the goal. And let's think about, would
greedy best-first search that always picks the smallest number
end up finding the optimal solution? What is the shortest solution,
and would this algorithm find it? And the important thing to realize is
that right here is the decision point. We're estimated to be
12 away from the goal. And we have two choices. We can go to the left, which we
estimate to be 13 away from the goal, or we can go up, where we estimate
it to be 11 away from the goal. And between those two, greedy
best-first search is going to say, the 11 looks better than the 13. And in doing so, greedy
best-first search will end up finding
this path to the goal. But it turns out this
path is not optimal. There is a way to get to
the goal using fewer steps. And it's actually this way, this way
that ultimately involved fewer steps, even though it meant at this
moment choosing the worst option between the two-- or what we
estimated to be the worst option, based on the heuristic. And so this is what we mean
by this is a greedy algorithm. It's making the best decision, locally. At this decision point,
it looks like it's better to go here than it is to go to the 13. But in the big picture, it's
not necessarily optimal. It might find a solution
when in actuality there was a better solution available. So we would like some way
to solve this problem. We like the idea of
this heuristic, of being able to estimate the path, the
distance between us and the goal, and that helps us to be able
to make better decisions and to eliminate having to search
through entire parts of the state space. But we would like to modify the
algorithm so that we can achieve optimality, so that it can be optimal. And what is the way to do this? What is the intuition here? Well, let's take a look at this problem. In this initial problem,
greedy best-first search found this solution
here, this long path. And the reason why it wasn't great is
because, yes, the heuristic numbers went down pretty low, but later on,
and they started to build back up. They built back 8, 9, 10, 11-- all
the way up to 12, in this case. And so how might we go about
trying to improve this algorithm? Well, one thing that we
might realize is that, if we go all the way through this
algorithm, through this path, and we end up going to the 12, and we've
had to take this many steps-- like, who knows how many steps that
is-- just to get to this 12, we could have also, as an alternative,
taken much fewer steps, just six steps, and ended up at this 13 here. And yes, 13 is more than 12, so
it looks like it's not as good, but it required far fewer steps. Right? It only took six steps to get to
this 13 versus many more steps to get to this 12. And while greedy best-first search
says, oh, well, 12 is better than 13 so pick the 12, we might
more intelligently say, I'd rather be somewhere
that heuristically looks like it takes slightly longer
if I can get there much more quickly. And we're going to encode
that idea, this general idea, into a more formal algorithm
known as A star search. A star search is going
to solve this problem by, instead of just
considering the heuristic, also considering how long it took
us to get to any particular state. So the distinction is greedy
best-first search, if I am in a state right now, the only
thing I care about is what is the estimated distance,
the heuristic value, between me and the goal. Whereas A star search will
take into consideration two pieces of information. It'll take into consideration, how
far do I estimate I am from the goal, but also how far did I have to
travel in order to get here? Because that is relevant, too. So this algorithm will search by
expanding the node with the lowest value of g(n) plus h(n). h(n) is that same heuristic that we were
talking about a moment ago that's going to vary based on the problem, but
g(n) is going to be the cost to reach the node-- how many steps I had to take, in this
case, to get to my current position.
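In code, the only change from the greedy sketch earlier is what we minimize: the steps taken so far plus the heuristic, rather than the heuristic alone. Here is one way that might look, assuming each node also records how many steps it took to reach it (a hypothetical node.cost attribute):

    class AStarFrontier(GreedyFrontier):
        """Frontier for A* search: remove the node with the smallest g(n) + h(n)."""
        def remove(self):
            if self.empty():
                raise Exception("empty frontier")
            # g(n) is node.cost, the steps taken so far; h(n) is the Manhattan distance estimate.
            node = min(self.frontier,
                       key=lambda n: n.cost + manhattan_distance(n.state, self.goal))
            self.frontier.remove(node)
            return node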
So what does that search algorithm look like in practice? Well, let's take a look. Again, we've got the same maze. And again, I've labeled them
with their Manhattan distance. This value is the h(n)
value, the heuristic estimate of how far each of these
squares is away from the goal. But now, as we begin
to explore states, we care not just about this
heuristic value but also about g(n), the number of steps I
had to take in order to get there. And I care about summing
those two numbers together. So what does that look like? On this very first step,
I have taken one step. And now I am estimated to be
16 steps away from the goal. So the total value here is 17. Then I take one more step. I've now taken two steps. And I estimate myself to
be 15 away from the goal-- again, a total value of 17. Now I've taken three steps. And I'm estimated to be 14 away
from the goal, so on and so forth. Four steps, an estimate of 13. Five steps, estimate of 12. And now, here's a decision point. I could either be six steps away
from the goal with a heuristic of 13 for a total of 19, or I
could be six steps away from the goal with a heuristic of 11
with an estimate of 17 for the total. So between 19 and 17,
I'd rather take the 17-- the 6 plus 11. So so far, no different
than what we saw before. We're still taking this option
because it appears to be better. And I keep taking this option
because it appears to be better. But it's right about here that
things get a little bit different. Now I could be 15 steps away from the
goal with an estimated distance of 6. So 15 plus 6, total value of 21. Alternatively, I could be six
steps away from the goal-- because this was five steps
away, so this is six steps away-- with an estimated distance of 13. So 6 plus 13-- that's 19. So here we would evaluate
g(n) plus h(n) to be 19-- 6 plus 13-- whereas here, we
would be 15 plus 6, or 21. And so the intuition is,
19 less than 21, pick here. But the idea is ultimately I'd rather be
having taken fewer steps to get to a 13 than having taken 15
steps and be at a six because it means I've had to take
more steps in order to get there. Maybe there's a better path this way. So instead we'll explore this route. Now if we go one more--
this is seven steps plus 14, is 21, so between those
two it's sort of a toss up. We might end up exploring
that one anyways. But after that, as the numbers along that path start to get bigger and the heuristic values along this path start to get smaller, you'll find that we'll actually
keep exploring down this path. And you can do the math to see
that at every decision point, A star search is going to make a choice
based on the sum of how many steps it took me to get to my
current position and then how far I estimate I am from the goal. So while we did have to
explore some of these states, the ultimate solution we found
was, in fact, an optimal solution. It did find us the quickest possible
way to get from the initial state to the goal. And it turns out that A* is an
optimal search algorithm under certain conditions. So the conditions are h of n, my
heuristic, needs to be admissible. What does it mean for a
heuristic to be admissible? Well, a heuristic is admissible if
it never overestimates the true cost. h of n always needs to
either get it exactly right in terms of how far away I am,
or it needs to underestimate. So we saw an example from before where
the heuristic value was much smaller than the actual cost it would take. That's totally fine. But the heuristic value
should never overestimate. It should never think that I'm further
away from the goal than I actually am. And meanwhile, to make a
stronger statement, h of n also needs to be consistent. And what does it mean
for it to be consistent? Mathematically, it means
that for every node, which we'll call n, and successor, the node
after me, that I'll call n prime, where it takes a cost of c to make
that step, the heuristic value of n needs to be less than or
equal to the heuristic value of n prime plus the cost. So it's a lot of math, but
in words, what it ultimately means is that if I am here
at this state right now, the heuristic value from me to the goal
shouldn't be more than the heuristic value of my successor, the next place
I could go to, plus however much it would cost me to just make that
step, from one step to the next step. And so this is just making sure that
my heuristic is consistent between all of these steps that I might take.
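To summarize the two conditions on the heuristic in symbols:

    Admissible: h(n) <= the true cost from n to the goal, for every node n
    Consistent: h(n) <= h(n') + c, for every node n and every successor n' reached with step cost c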
So as long as this is true, then A* search is going to find me an optimal solution. And this is where much of the challenge
of solving these search problems can sometimes come in, that A* search
is an algorithm that is known, and you could write
the code fairly easily. But it's choosing the heuristic that
can be the interesting challenge. The better the heuristic
is, the better I'll be able to solve the problem, and the
fewer states that I'll have to explore. And I need to make sure
that the heuristic satisfies these particular constraints. So all in all, these are some of
the examples of search algorithms that might work. And certainly, there are
many more than just this. A*, for example, does have a tendency
to use quite a bit of memory, so there are alternative approaches to
A* that ultimately use less memory than this version of A* happens to use. And there are other search algorithms
that are optimized for other cases as well. But now, so far, we've only been
looking at search algorithms where there's one agent. I am trying to find a
solution to a problem. I am trying to navigate
my way through a maze. I am trying to solve a 15 puzzle. I am trying to find driving
directions from point A to point B. Sometimes in search
situations, though, we'll enter an adversarial
situation where I am an agent trying to make
intelligent decisions, and there is someone else who is
fighting against me, so to speak, that has an opposite objective,
someone where I am trying to succeed, someone else that wants me to fail. And this is most popular in something
like a game, a game like tic-tac-toe, where we've got this
3-by-3 grid, and X and O take turns either writing an X or
an O in any one of these squares. And the goal is to get three X's
in a row, if you're the X player, or three O's in a row,
if you're the O player. And computers have gotten quite good at
playing games, tic-tac-toe very easily, but even more complex games. And so you might imagine, what does
an intelligent decision in a game look like? So maybe X makes an initial move
in the middle, and O plays up here. What does an intelligent
move for X now become? Where should you move if you were X? And it turns out there are
a couple of possibilities. But if an AI is playing
this game optimally, then the AI might play somewhere
like the upper right, where in this situation, O has
the opposite objective of X. X is trying to win the game, to
get three in a row diagonally here, and O is trying to stop that
objective, the opposite of X's objective. And so O is going to place
here, to try to block. But now, X has a pretty clever move. X can make a move, like this where
now X has two possible ways that X can win the game. X could win the game by getting
three in a row across here, or X could win the game by getting
three in a row vertically this way. So it doesn't matter where
O makes their next move. O could play here, for example, blocking
the three in a row horizontally, but then X is going to win the game by
getting a three in a row vertically. And so there's a fair
amount of reasoning that's going on here in order for the
computer to be able to solve a problem. And it's similar in spirit to the
problems we've looked at so far. There are actions, there's some
sort of state of the board, and some transition from
one action to the next, but it's different in the sense that
this is now not just a classical search problem, but an adversarial search
problem, that I am the X player, trying to find the best
moves to make, but I know that there is some adversary
that is trying to stop me. So we need some sort of algorithm
to deal with these adversarial type of search situations. And the algorithm we're
going to take a look at is an algorithm called
Minimax, which works very well for these deterministic
games, where there are two players. It can work for other types of games as
well, but we'll look right now at games where I make a move, that my opponent
makes a move, and I am trying to win, and my opponent is trying to win, also. Or in other words, my opponent
is trying to get me to lose. And so what do we need in order
to make this algorithm work? Well, anytime we try and translate
this human concept, of playing a game, winning, and losing,
to a computer, we want to translate it in terms that
the computer can understand. And ultimately, the computer
really just understands numbers. And so we want some way of
translating a game of X's and O's on a grid to something numerical,
something the computer can understand. The computer doesn't normally
understand notions of win or lose, but it does understand the
concept of bigger and smaller. And so what we might do is, we
might take each of the possible ways that a tic-tac-toe game can unfold
and assign a value, or a utility, to each one of those possible ways. And in a tic-tac-toe game,
and in many types of games, there are three possible outcomes. The outcomes are, O wins,
X wins, or nobody wins. So player one wins, player
two wins, or nobody wins. And for now, let's go ahead and
assign each of these possible outcomes a different value. We'll say O winning-- that'll have a value of negative 1. Nobody winning-- that'll
have a value of 0. And X winning-- that
will have a value of 1. So we've just assigned numbers to
each of these three possible outcomes. And now, we have two players. We have the X player and the O player. And we're going to go ahead and
call the X player the max player. And we'll call the O
player the min player. And the reason why is because
in the Minimax algorithm, the max player, which in this case is
X, is aiming to maximize the score. These are the possible options for
the score, negative 1, 0, and 1. X wants to maximize the score,
meaning if at all possible, X would like this situation
where X wins the game. And we give it a score of 1. But if this isn't possible, if X
needs to choose between these two options, negative 1 meaning O
winning, or 0 meaning nobody winning, X would rather that nobody wins, score
of 0, than a score of negative 1, O winning. So this notion of winning
and losing and tying has been reduced mathematically to
just this idea of, try and maximize the score. The X player always wants
the score to be bigger. And on the flip side, the
min player, in this case, O, is aiming to minimize the score. The O player wants the score
to be as small as possible. So now we've taken this game of
X's and O's and winning and losing and turned it into something
mathematical, something where X is trying to maximize the score,
O is trying to minimize the score. Let's now look at all
of the parts of the game that we need in order
to encode it in an AI so that an AI can play
a game like tic-tac-toe. So the game is going to
need a couple of things. We'll need some sort of initial
state, that we'll in this case call S0, which is how the game begins,
like an empty tic-tac-toe board, for example. We'll also need a
function called player, where the player function is going to
take as input a state, here represented by S. And the output of the
player function is going to be, which player's turn is it? We need to be able to give a
tic-tac-toe board to the computer, run it through a function, and that
function tells us whose turn it is. We'll need some notion of
actions that we can take. We'll see examples of
that in just a moment. We need some notion of a
transition model-- same as before. If I have a state, and
I take an action, I need to know what results
as a consequence of it. I need some way of knowing
when the game is over. So this is equivalent to
kind of like a goal test, but I need some terminal
test, some way to check to see if a state is a terminal
state, where a terminal state means the game is over. In the classic game of tic-tac-toe, a
terminal state means either someone has gotten three in a row, or all of the
squares of the tic-tac-toe board are filled. Either of those conditions
make it a terminal state. In a game of chess, it
might be something like, when there is checkmate, or if
checkmate is no longer possible, that becomes a terminal state. And then finally we'll need a utility
function, a function that takes a state and gives us a numerical value for that
terminal state, some way of saying, if X wins the game,
that has a value of 1. If O has won the game, that
has the value of negative 1. If nobody has won the game,
that has a value of 0. So let's take a look at
each of these in turn. The initial state, we can just represent
in tic-tac-toe as the empty game board. This is where we begin. It's the place from which
we begin this search. And again, I'll be representing
these things visually. But you can imagine
this really just being an array, or a two-dimensional array,
of all of these possible squares. Then we need the player function
that, again, takes a state and tells us whose turn it is. Assuming X makes the first move,
if I have an empty game board, then my player function
is going to return X. And if I have a game board where X has made a move, then my player function is going to return O. The player function
takes a tic-tac-toe game board and tells us whose turn it is. Next up, we'll consider
the actions function. The actions function, much like it
did in classical search, takes a state and gives us the set of
all of the possible actions we can take in that state. So let's imagine it's O's turn to move
in a game board that looks like this. What happens when we pass it
into the actions function? So the actions function takes
this state of the game as input, and the output is a set of
possible actions. It's a set of-- I could move in the upper left, or
I could move in the bottom middle. Those are the two possible
action choices that I have when I begin in this particular state. Now, just as before, when
we have states and actions, we need some sort of
transition model to tell us, when we take this action in the state,
what is the new state that we get? And here, we define that using
the result function that takes a state as input, as well as an action. And when we apply the result
function to this state, saying that let's let O move in this
upper left corner, the new state we get is this resulting state, where
O is in the upper-left corner.
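To make this concrete, here is one way the first few of these functions might be sketched in Python, assuming the board is represented as a 3-by-3 list of lists whose cells hold X, O, or None (an actual implementation could certainly differ):

    import copy

    X, O, EMPTY = "X", "O", None

    def initial_state():
        """The empty tic-tac-toe board that the game starts from."""
        return [[EMPTY] * 3 for _ in range(3)]

    def player(board):
        """Whose turn is it? X moves first, so X moves whenever the counts are equal."""
        x_count = sum(row.count(X) for row in board)
        o_count = sum(row.count(O) for row in board)
        return X if x_count == o_count else O

    def actions(board):
        """The set of (row, column) pairs where a move can still be made."""
        return {(i, j) for i in range(3) for j in range(3) if board[i][j] == EMPTY}

    def result(board, action):
        """The new board that results from taking this action, leaving the original untouched."""
        i, j = action
        new_board = copy.deepcopy(board)
        new_board[i][j] = player(board)
        return new_board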
And now, this seems obvious to someone who knows how to play tic-tac-toe. Of course, you play in
the upper left corner-- that's the board you get. But all of this information
needs to be encoded into the AI. The AI doesn't know
how to play tic-tac-toe until you tell the AI how the
rules of tic-tac-toe work. And this result function, defined here, allows us to tell the AI
how this game actually works and how actions actually
affect the outcome of the game. So the AI needs to know
how the game works. The AI also needs to know
when the game is over. We do that by defining a function
called terminal that takes as input a state S, such that if we take
a game that is not yet over, pass it into the terminal
function, the output is false. The game is not over. But if we take a game that is
over, because X has gotten three in a row along that diagonal, pass
that into the terminal function, then the output is going to be true,
because the game now is, in fact, over. And finally, we've told
the AI how the game works in terms of what moves can be made and
what happens when you make those moves. We've told the AI when the game is over. Now we need to tell the AI what the
value of each of those states is. And we do that by defining this utility
function, that takes a state, S, and tells us the score or
the utility of that state. So again, we said that if X wins the
game, that utility is a value of 1, whereas if O wins the game, then
the utility of that is negative 1. And the AI needs to know, for
each of these terminal states where the game is over, what
is the utility of that state? So I can give you a game board like
this, where the game is, in fact, over, and I ask the AI to tell me what the
value of that state is, it could do so. The value of the state is 1.
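Continuing the same sketch as before, the terminal test and the utility function might look something like this, with a small winner helper that checks every row, column, and diagonal:

    def winner(board):
        """Return X or O if that player has three in a row, and None otherwise."""
        lines = list(board)                                              # the three rows
        lines += [[board[i][j] for i in range(3)] for j in range(3)]     # the three columns
        lines += [[board[i][i] for i in range(3)],                       # the two diagonals
                  [board[i][2 - i] for i in range(3)]]
        for line in lines:
            if line[0] is not None and line.count(line[0]) == 3:
                return line[0]
        return None

    def terminal(board):
        """The game is over if someone has won or if every square is filled."""
        return winner(board) is not None or all(cell is not None for row in board for cell in row)

    def utility(board):
        """1 if X has won, -1 if O has won, 0 otherwise; only meaningful for terminal boards."""
        w = winner(board)
        return 1 if w == X else -1 if w == O else 0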
Where things get interesting, though, is if the game is not yet over. Let's imagine a game board like this. We're in the middle of the game. It's O's turn to make a move. So how do we know it's
O's turn to make a move? We can calculate that,
using the player function. We can say, player of
S, pass in the state. O is the answer, so we
know it's O's turn to move. And now, what is the value of this
board, and what action should O take? Well that's going to depend. We have to do some calculation here. And this is where the Minimax
algorithm really comes in. Recall that X is trying to
maximize the score, which means that O is trying to minimize the score. O would like to minimize the total value
that we get at the end of the game. And because this game isn't
over yet, we don't really know just yet what the
value of this game board is. We have to do some calculation
in order to figure that out. So how do we do that
kind of calculation? Well, in order to do so,
we're going to consider, just as we might in a
classical search situation, what actions could happen next, and
what states will that take us to? And it turns out that
in this position, there are only two open squares, which means
there are only two open places where O can make a move. O could either make a
move in the upper left, or O can make a move
in the bottom middle. And Minimax doesn't know right out
of the box which of those moves is going to be better, so
it's going to consider both. But now we run into the same situation. Now I have two more game boards,
neither of which is over. What happens next? And now it's in this
sense that Minimax is what we'll call a recursive algorithm. It's going to now repeat the
exact same process, although now considering it from the
opposite perspective. It's as if I am now going to put
myself-- if I am the O player, I'm going to put myself in my opponent's
shoes, my opponent as the X player, and consider, what would my opponent
do if they were in this position? What would my opponent do, the X
player, if they were in that position? And what would then happen? Well, the other player,
my opponent, the X player, is trying to maximize the
score, whereas I am trying to minimize the score as the O player. So X is trying to find the maximum
possible value that they can get. And so what's going to happen? Well, from this board position,
X only has one choice. X is going to play here, and
they're going to get three in a row. And we know that that board, X winning-- that has a value of 1. If X wins the game, the value
of that game board is 1. And so from this position, if this
state can only ever lead to this state, it's the only possible option,
and this state has a value of 1, then the maximum possible value that the
X player can get from this game board is also 1 from here. The only place we can get is
to a game with the value of 1, so this game board
also has a value of 1. Now we consider this one over here. What's going to happen now? Well, X needs to make a move. The only move X can make is in the
upper left, so X will go there. And in this game, no one wins the game. Nobody has three in a row. So the value of that game board is 0. Nobody's won. And so again, by the same logic, if
from this board position, the only place we can get to is a board
where the value is 0, then this state must
also have a value of 0. And now here comes the choice part,
the idea of trying to minimize. I, as the O player, now know
that if I make this choice, moving in the upper left, that is going
to result in a game with a value of 1, assuming everyone plays optimally. And if I instead play
in the lower middle, choose this fork in the road, that
is going to result in a game board with a value of 0. I have two options. I have a 1 and a 0 to choose
from, and I need to pick. And as the min player, I
would rather choose the option with the minimum value. So whenever a player
has multiple choices, the min player will choose the
option with the smallest value. The max player will choose the
option with the largest value. Between the 1 and the
0, the 0 is smaller, meaning I'd rather tie the
game than lose the game. And so this game board, we'll
say, also has a value of 0, because if I am playing optimally,
I will pick this fork in the road. I'll place my O here to
block X's three in a row. X will move in the upper left,
and the game will be over, and no one will have won the game. So this is now the logic of Minimax,
to consider all of the possible options that I can take, all of the
actions that I can take, and then to put myself
in my opponent's shoes. I decide what move I'm going to
make now by considering what move my opponent will make on the next turn. And to do that, I consider what move
I would make on the turn after that, so on and so forth, until I get all
the way down to the end of the game, to one of these so-called
terminal states. In fact, this very
decision point, where I am trying to decide as the O player
what move to make, might have just been a part of the
logic that the X player, my opponent, was using the move before me. This might be part of
some larger tree where X is trying to make a
move in this situation and needs to pick between
three different options in order to make a decision
about what should happen. And the further and further away
we are from the end of the game, the deeper this tree has to go,
because every level in this tree is going to correspond to one move,
one move or action that I take, one move or action that my opponent
takes, in order to decide what happens. And in fact, it turns out that if
I am the X player in this position, and I recursively do the logic
and see I have a choice-- three choices, in fact, one of which
leads to a value of 0, if I play here, and if everyone plays optimally,
the game will be a tie. If I play here, then O is going to
win, and I'll lose, playing optimally. Or here, where I, the
X player, can win-- well, between a score of
0 and negative 1 and 1, I'd rather pick the
board with a value of 1, because that's the
maximum value I can get. And so this board would also
have a maximum value of 1. And so this tree can
get very, very deep, especially as the game starts
to have more and more moves. And this logic works not
just for tic-tac-toe, but any of these sorts of games where I
make a move, my opponent makes a move, and ultimately, we have
these adversarial objectives. And we can simplify the diagram
into a diagram that looks like this. This is a more abstract
version of the Minimax tree, where these are each states, but I'm
no longer representing them as exactly like tic-tac-toe boards. This is just representing some generic
game that might be tic-tac-toe, might be some other game altogether. Any of these green arrows
that are pointing up-- that represents a maximizing state. I would like the score
to be as big as possible. And any of these red
arrows pointing down-- those are minimizing states, where
the player is the min player, and they are trying to make
the score as small as possible. So if you imagine in this situation, I
am the maximizing player, this player here, and I have three choices-- one choice gives me a score of 5,
one choice gives me a score of 3, and one choice gives me a score of 9. Well, then, between those
three choices, my best option is to choose this 9 over here, the
score that is the highest of my three options. And so I can give this
state a value of 9, because among my three
options, that is the best choice that I have available to me. So that's my decision now. You imagine it's like one move
away from the end of the game. But then you could also
ask a reasonable question. What might my opponent do two moves
away from the end of the game? My opponent is the minimizing player. They are trying to make the
score as small as possible. Imagine what would have happened if
they had to pick which choice to make. One choice leads us to this state,
where I, the maximizing player, am going to opt for 9, the
biggest score that I can get. And one leads to this state,
where I, the maximizing player, would choose 8, which is then
the largest score that I can get. Now, the minimizing player, forced
to choose between a 9 or an 8, is going to choose the smallest possible
score, which in this case is an 8. And that is, then, how
this process would unfold. But the minimizing player,
in this case, considers both of their options, and
then all of the options that would happen as a result of that. So this now is a general picture of
what the Minimax algorithm looks like. Let's now try to formalize it
using a little bit of pseudocode. So what exactly is happening
in the Minimax algorithm? Well, given a state, S, we
need to decide what should happen. The max player-- if it's
the max player's turn, then max is going to pick an action,
A, in actions of S. Recall that actions is a function
that takes a state and gives me back all of the
possible actions that I can take. It tells me all of the
moves that are possible. The max player is going
to specifically pick an action, A, in the set
of actions that gives me the highest value of min value of result
of S and A. So what does that mean? Well, it means that I want to
make the option that gives me the highest score of
all of the actions, A. But what score is that going to have? To calculate that, I need to know
what my opponent, the min player, is going to do if they try to minimize
the value of the state that results. So we say, what state results
after I take this action, and what happens when
the min player tries to minimize the value of that state? I consider that for all
of my possible options. And after I've considered that
for all of my possible options, I pick the action, A, that
has the highest value. Likewise, the min player is going
to do the same thing, but backwards. They're also going to consider, what
are all of the possible actions they can take if it's their turn? And they're going to pick the
action, A, that has the smallest possible value of all the options. And the way they know what the smallest
possible value of all the options is, is by considering what the
max player is going to do, by saying, what's the result of applying
this action to the current state, and then, what would the
max player try to do? What value would the max player
calculate for that particular state? So everyone makes their decision
based on trying to estimate what the other person would do. And now we need to turn
our attention to these two functions, maxValue and minValue. How do you actually calculate
the value of a state if you're trying to maximize
its value, and how do you calculate the value of a state if
you're trying to minimize the value? If you can do that, then we
have an entire implementation of this Minimax algorithm. So let's try it. Let's try and implement
this maxValue function that takes a state and returns
as output the value of that state if I'm trying to maximize
the value of the state. Well, the first thing I can check
for is to see if the game is over, because if the game is over-- in other words, if the
state is a terminal state-- then this is easy. I already have this utility
function that tells me what the value of the board is. If the game is over, I
just check, did X win? Did O win? Is that a tie? And the utility function just knows
what the value of the state is. What's trickier is if
the game isn't over, because then I need to do this
recursive reasoning about thinking, what is my opponent going
to do on the next move? Then I want to calculate
the value of this state, and I want the value of the
state to be as high as possible. And I'll keep track of that
value in a variable called v. And if I want the value
to be as high as possible, I need to give v an initial value. And initially, I'll just go ahead
and set it to be as low as possible, because I don't know what
options are available to me yet. So initially, I'll set v equal
to negative infinity, which seems a little bit
strange, but the idea here is, I want the value initially
to be as low as possible, because as I consider
my actions, I'm always going to try and do better than v.
And if I set v to negative infinity, I know I can always do better than that. So now I consider my actions. And this is going to
be some kind of loop, where for every action
in actions of state-- recall, actions is a
function that takes my state and gives me all the possible
actions that I can use in that state. So for each one of those actions,
I want to compare it to v and say, all right, v is going to be equal to
the maximum of v and this expression. So what is this expression? Well, first it is, get the result
of taking the action and the state, and then get the min value of that. In other words, let's
say, I want to find out from that state what is the
best that the min player can do, because they are going to
try and minimize the score. So whatever the resulting score
is of the min value of that state, compare it to my current best value,
and just pick the maximum of those two, because I am trying
to maximize the value. In short, what these three
lines of code are doing are going through all of my possible
actions and asking the question, how do I maximize the score, given
what my opponent is going to try to do? After this entire loop,
I can just return v, and that is now the value
of that particular state. And for the min player, it's the exact
opposite of this, the same logic, just backwards. To calculate the minimum
value of a state, first we check if it's a terminal state. If it is, we return its utility. Otherwise, we're going to now try
to minimize the value of the state, given all of my possible actions. So I need an initial value
for v, the value of the state. And initially, I'll set it to
infinity, because I know I can always get something less than infinity. So by starting with v equals infinity,
I make sure that the very first action I find-- that will be less than this value of v. And then I do the same thing-- loop over all of my possible
actions, and for each of the results that we could get when
the max player makes their decision, let's take the minimum of that
and the current value of v. So after all is said and done, I get the smallest possible value of v, which I then return back to the user.
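Translating that pseudocode into Python, and reusing the hypothetical tic-tac-toe helpers sketched earlier (player, actions, result, terminal, utility), the whole algorithm might look something like this:

    import math

    def minimax(board):
        """Return an optimal action for whichever player's turn it is."""
        if terminal(board):
            return None
        if player(board) == X:
            # Max player: pick the action whose resulting state has the highest min-value.
            return max(actions(board), key=lambda a: min_value(result(board, a)))
        else:
            # Min player: pick the action whose resulting state has the lowest max-value.
            return min(actions(board), key=lambda a: max_value(result(board, a)))

    def max_value(board):
        """The best score the max player can guarantee from this state."""
        if terminal(board):
            return utility(board)
        v = -math.inf
        for action in actions(board):
            v = max(v, min_value(result(board, action)))
        return v

    def min_value(board):
        """The best (smallest) score the min player can guarantee from this state."""
        if terminal(board):
            return utility(board)
        v = math.inf
        for action in actions(board):
            v = min(v, max_value(result(board, action)))
        return v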
So that, in effect, is the pseudocode for Minimax. That is how we take a game and
figure out what the best move to make is by recursively using these
maxValue and minValue functions, where maxValue calls minValue,
minValue calls maxValue, back and forth, all the way until we reach
a terminal state, at which point our algorithm can simply return the
utility of that particular state. What you might imagine
is that this is going to start to be a long process,
especially as games start to get more complex, as we start to add
more moves and more possible options and games that might
last quite a bit longer. So the next question to ask is, what
sort of optimizations can we make here? How can we do better in
order to use less space or take less time to be able
to solve this kind of problem? And we'll take a look at a
couple of possible optimizations. But for one, we'll take
a look at this example. Again, we're turning to these
up arrows and down arrows. Let's imagine that I now am the
max player, this green arrow. I am trying to make the
score as high as possible. And this is an easy game,
where there are just two moves. I make a move, one of
these three options, and then my opponent makes a
move, one of these three options, based on what move I make. And as a result, we get some value. Let's look at the order in
which I do these calculations and figure out if there are any
optimizations I might be able to make to this calculation process. I'm going to have to look at
these states one at a time. So let's say I start here on
the left and say, all right, now I'm going to consider, what will the
min player, my opponent, try to do here? Well, the min player is going to look
at all three of their possible actions and look at their value, because
these are terminal states. They're the end of the game. And so they'll see, all right, this
node is a value of 4, value of 8, value of 5. And the min player is going
to say, well, all right. Between these three
options, 4, 8, and 5, I'll take the smallest
one. I'll take the 4. So this state now has a value of 4. Then I, as the max player, say,
all right, if I take this action, it will have a value of 4. That's the best that I
can do, because min player is going to try and minimize my score. So now, what if I take this option? We'll explore this next. And now I explore what the min player
would do if I choose this action. And the min player is going to say,
all right, what are the three options? The min player has options
between 9, 3, and 7, and so 3 is the smallest among 9, 3, and 7. So we'll go ahead and say
this state has a value of 3. So now I, as the max player-- I have now explored two
of my three options. I know that one of my options will
guarantee me a score of 4, at least, and one of my options will
guarantee me a score of 3. And now I consider my third option
and say, all right, what happens here? Same exact logic-- the
min player is going to look at these three
states, 2, 4, and 6, say the minimum possible option is
2, so the min player wants the two. Now I, as the max player, have
calculated all of the information by looking two layers deep, by
looking at all of these nodes. And I can now say, between the 4,
the 3, and the 2, you know what? I'd rather take the
4, because if I choose this option, if my
opponent plays optimally, they will try and get me to the
4, but that's the best I can do. I can't guarantee a
higher score, because if I pick either of these two options, I
might get a 3, or I might get a 2. And it's true that down
here is a 9, and that's the highest score of any of the scores. So I might be tempted
to say, you know what? Maybe I should take this option,
because I might get the 9. But if the min player is
playing intelligently, if they're making the best
moves at each possible option they have when they get to make
a choice, I'll be left with a 3, whereas I could have done better, playing optimally, and guaranteed that I would get the 4. So that doesn't affect
the logic that I would use as a Minimax player trying to
maximize my score from that node there. But it turns out, that took
quite a bit of computation for me to figure that out. I had to reason through all of these
nodes in order to draw this conclusion. And this is for a pretty simple
game, where I have three choices, my opponent has three choices,
and then the game's over. So what I'd like to do is come up
with some way to optimize this. Maybe I don't need to do all of
this calculation to still reach the conclusion that, you know what? This action to the left-- that's the best that I could do. Let's go ahead and try again and
try and be a little more intelligent about how I go about doing this. So first, I start the exact same way. I don't know what to
do initially, so I just have to consider one of the options and
consider what the min player might do. Min has three options, 4, 8, and 5. And between those three options,
min says, 4 is the best they can do, because they want to try
to minimize the score. Now, I, the max player, will
consider my second option, making this move here and considering
what my opponent would do in response. What will the min player do? Well, the min player is going to, from
that state, look at their options. And I would say, all right. 9 is an option, 3 is an option. And if I am doing the math
from this initial state, doing all this calculation,
when I see a 3, that should immediately
be a red flag for me, because when I see a 3
down here at this state, I know that the value of this
state is going to be at most 3. It's going to be 3 or
something less than 3, even though I haven't yet looked at
this last action or even further actions if there were more actions
that could be taken here. How do I know that? Well, I know that the min player is
going to try to minimize my score. And if they see a 3, the only way
this could be something other than a 3 is if this remaining thing that I
haven't yet looked at is less than 3, which means there is no way for this
value to be anything more than 3, because the min player
can already guarantee a 3, and they are trying
to minimize my score. So what does that tell me? Well, it tells me that
if I choose this action, my score is going to be 3, or maybe
even less than 3, if I'm unlucky. But I already know that this
action will guarantee me a 4. And so given that I know that this
action guarantees me a score of 4, and this action means I
can't do better than 3, if I'm trying to maximize
my options, there is no need for me to
consider this triangle here. There is no value, no
number that could go here, that would change my mind
between these two options. I'm always going to opt for
this path that gets me a 4, as opposed to this path, where
the best I can do is a 3, if my opponent plays optimally. And this is going to be true for all of
the future states that I look at, too. But if I look over here, at what the
min player might do over here, if I see that this state is a 2, I
know that this state is at most a 2, because the only way this value
could be something other than 2 is if one of these remaining
states is less than a 2, and so the min player
would opt for that instead. So even without looking
at these remaining states, I, as the maximizing player, can know
that choosing this path to the left is going to be better than choosing
either of those two paths to the right, because this one can't be better than
3, this one can't be better than 2, and so 4 in this case is
the best that I can do. And I can say now that this
state has a value of 4. So in order to do this
type of calculation, I was doing a little bit more
bookkeeping, keeping track of things, keeping track all the time of,
what is the best that I can do, what is the worst that I can do, and
for each of these states, saying, all right, well, if I already
know that I can get a 4, then if the best I can
do at this state is a 3, no reason for me to consider it. I can effectively prune this leaf
and anything below it from the tree. And it's for that reason this
approach, this optimization to Minimax, is called alpha-beta pruning. Alpha and beta stand
for these two values that you'll have to keep track
of, the best you can do so far and the worst you can do so far. And pruning is the idea of, if I
have a big, long, deep search tree, I might be able to search it
more efficiently if I don't need to search through everything,
if I can remove some of the nodes to try and optimize the way that I
look through this entire search space. So alpha-beta pruning can
definitely save us a lot of time as we go about the search process by
making our searches more efficient.
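One way to sketch that bookkeeping in code is to thread the best value found so far for each player, conventionally called alpha and beta, through the max_value and min_value functions from the earlier sketch, and stop looping over actions as soon as a branch can no longer matter:

    import math

    def max_value(board, alpha=-math.inf, beta=math.inf):
        if terminal(board):
            return utility(board)
        v = -math.inf
        for action in actions(board):
            v = max(v, min_value(result(board, action), alpha, beta))
            alpha = max(alpha, v)
            if alpha >= beta:
                break  # The min player above would never let us reach this branch; prune it.
        return v

    def min_value(board, alpha=-math.inf, beta=math.inf):
        if terminal(board):
            return utility(board)
        v = math.inf
        for action in actions(board):
            v = min(v, max_value(result(board, action), alpha, beta))
            beta = min(beta, v)
            if beta <= alpha:
                break  # The max player above already has something at least this good; prune the rest.
        return v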
But even then, it's still not great as games get more complex. Tic-tac-toe, fortunately,
is a relatively simple game, and we might reasonably
ask a question like, how many total possible
tic-tac-toe games are there? You can think about it. You can try and estimate, how many
moves are there at any given point? How many moves long can the game last? It turns out there are about 255,000
possible tic-tac-toe games that can be played. But compare that to a more
complex game, something like a game of chess, for example-- far more pieces, far more moves,
games that last much longer. How many total possible
chess games could there be? It turns out that after
just four moves each, four moves by the white player,
four moves by the black player, that there are 288
billion possible chess games that can result from that
situation, after just four moves each. And going even further. If you look at entire chess games and
how many possible chess games there could be as a result there,
there are more than 10 to the 29,000 possible chess
games, far more chess games than could ever be considered. And this is a pretty big problem for the
Minimax algorithm, because the Minimax algorithm starts with an initial state,
considers all the possible actions and all the possible actions
after that, all the way until we get to the end of the game. And that's going to be a
problem if the computer is going to need to look through
this many states, which is far more than any computer could ever
do in any reasonable amount of time. So what do we do in order
to solve this problem? Instead of looking through
all these states, which is totally intractable for a computer,
we need some better approach. And it turns out that better approach
generally takes the form of something called depth-limited Minimax. Where normally Minimax
is depth-unlimited-- we just keep going, layer
after layer, move after move, until we get to the end of the game-- depth-limited Minimax is instead
going to say, you know what? After a certain number
of moves-- maybe I'll look 10 moves ahead, maybe I'll look
12 moves ahead, but after that point, I'm going to stop and not
consider additional moves that might come after that,
just because it would be computationally intractable to
consider all of those possible options. But what do we do after we
get 10 or 12 moves deep, and we arrive at a situation
where the game's not over? Minimax still needs a way to assign a
score to that game board or game state to figure out what its
current value is, which is easy to do if the
game is over, but not so easy to do if the game is not yet over. So in order to do that, we need
to add one additional feature to depth-limited Minimax
called an evaluation function, which is just some
function that is going to estimate the expected utility
of a game from a given state.
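As a rough sketch of how the depth limit and the evaluation function fit together, depth-limited Minimax in Python might look something like this. Again, terminal, utility, actions, and result are hypothetical game-specific helpers, and evaluate is whatever evaluation function we come up with for the game:

def minimax(state, depth, maximizing=True):
    # Assumed helpers: terminal(state), utility(state), actions(state),
    # result(state, action), and an evaluation function evaluate(state).
    if terminal(state):
        return utility(state)
    if depth == 0:
        # Out of depth: fall back on an estimate of the expected utility.
        return evaluate(state)
    if maximizing:
        return max(minimax(result(state, action), depth - 1, False)
                   for action in actions(state))
    else:
        return min(minimax(result(state, action), depth - 1, True)
                   for action in actions(state))

The only real change from ordinary Minimax is that depth counter: when it hits 0, instead of searching any deeper, we just ask the evaluation function for its best guess.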
So in a game like chess, if you imagine that a game value of 1 means white wins, negative 1 means
black wins, 0 means it's a draw, then you might imagine that a score of
0.8 means white is very likely to win, though certainly not guaranteed. And you would have an evaluation
function that estimates how good the game state happens to be. And depending on how good
that evaluation function is, that is ultimately what's going
to constrain how good the AI is. The better the AI is
at estimating how good or how bad any particular game
state is, the better the AI is going to be able to play that game. If the evaluation function
is worse and not as good at estimating what the
expected utility is, then it's going to be
a whole lot harder. And you can imagine trying to come
up with these evaluation functions. In chess, for example, you might
write an evaluation function based on how many pieces
you have, as compared to how many pieces your
opponent has, because each one has a value in your evaluation function. It probably needs to
be a little bit more complicated than that to consider
other possible situations that might arise as well.
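Still, just as a starting point, a bare-bones material-count evaluation function in Python might look something like the sketch below. The representation is made up for the example-- I'm assuming the state is just a list of the pieces left on the board, each tagged with its color-- and the piece values are a common convention rather than the only reasonable choice:

# Rough material values, measured in pawns.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def evaluate(pieces):
    # pieces is assumed to look like ["wK", "wR", "bK", "bP"],
    # one entry per piece on the board ("w" = white, "b" = black).
    score = 0
    for piece in pieces:
        color, kind = piece[0], piece[1]
        value = PIECE_VALUES.get(kind, 0)  # kings aren't counted here
        score += value if color == "w" else -value
    return score

print(evaluate(["wK", "wR", "bK", "bP"]))  # 4: white is up a rook for a pawn

Positive scores favor white and negative scores favor black, on whatever scale the search happens to use, and a real evaluation function would layer in things like piece position, king safety, and pawn structure on top of raw material.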
And there are many other variants on Minimax that add additional features in
order to help it perform better under these larger and more
computationally intractable situations, where we couldn't possibly
explore all of the possible moves, so we need to figure out
how to use evaluation functions and other techniques to be
able to play these games, ultimately, better. But this was a look at this kind
of adversarial search, these search problems where we have
situations where I am trying to play against some sort of opponent. And these search problems
show up all over the place throughout artificial intelligence. We've been talking a lot today about
more classical search problems, like trying to find directions
from one location to another. But anytime an AI is faced with
trying to make a decision like, what do I do now in order to
do something that is rational, or do something that is intelligent,
or trying to play a game, like figuring out what move to
make, these sorts of algorithms can really come in handy. It turns out that for tic-tac-toe,
the solution is pretty simple, because it's a small game. XKCD has famously put
together a webcomic that will tell you exactly
the optimal move to make, no matter what
your opponent happens to do. This type of thing is
not really possible for a much larger game
like checkers or chess, for example, where it's
totally computationally intractable for most computers
to be able to explore all the possible states. So we really need our AIs to be
far more intelligent about how they go about trying to
deal with these problems and how they go about
taking this environment that they find themselves
in and ultimately searching for one of these solutions. So this, then, was a look at
search and artificial intelligence. Next time we'll take
a look at knowledge, thinking about how it is that our AIs
are able to know information, reason about that information, and draw
conclusions, all in our look at AI and the principles behind it. We'll see you next time.