Transcript for:
Lecture Notes: Game Playing Algorithms and Minimax Algorithm

Computers are exceptionally good at playing games. Whether it's a simple game like Tic-Tac-Toe, or a complex game like chess, computers are excellent at taking the current position in the game, referred to as the game state, and figuring out what the best next move is. But how do computers do it? One of the most important algorithms is the minimax algorithm, a general-purpose algorithm for game-playing that determines the optimal decision to make in an adversarial game, that is to say, a game where you and your opponent have opposite goals: your goal is for you to win, and your opponent's goal is for them to win. What does it mean to play optimally? Optimal play is about ensuring the best outcome for yourself even in the worst-case scenario: the worst-case scenario in this case being that your opponent is playing optimally too. In other words, making the best move isn't just about playing a move that potentially allows you to win. It's about playing a move that does the best even if your opponent makes the best possible moves to try to prevent you from achieving your goal. To understand the algorithm, how it works, and the challenges we run into when using the algorithm in practice, let's start by setting up some fundamentals. For these examples, we'll look at the simple game of Tic-Tac-Toe, but know that the same principles can apply to other games too. What ultimately matters when trying to play a game optimally is the outcome: what happens when the game is over? There are a lot of different states that the game could be in, but some of them, known as terminal states, are states where the game is over: for Tic-Tac-Toe, that's when X has gotten three in a row, when O has gotten three in a row, or when every square has been filled in and it's a tie. So that our algorithm can compare these terminal states, we'll assign a numerical value to each terminal state.
When X wins, we'll give that a value of 1, when O wins we'll give that a value of -1, and when it's a tie we'll give that terminal state a value of 0. As a result of assigning values to each terminal state, we can give each player a mathematical goal. The X player, who we'll start calling the MAX player, has the goal of maximizing the score: ideally, the X player wants to win (which has a score of 1), but if that's not possible then a tie (with a score of 0) is better than losing (with a score of -1). Meanwhile, the O player, who we'll start calling the MIN player, has the goal of minimizing the score: they want O to win, but if that's not possible then a tie is better than X winning. These values give each player a way to rank the possible outcomes. But importantly, we can also use these values to help us decide what choices to make while the game is still ongoing. Take this game state, for example, where it's the O player's turn to make a move. It's not a terminal state, so we can't just look directly at the game's outcome to determine the value: the game's not over yet. But remember that O is the MIN player. In other words, O wants the value to be as small as possible. So to determine the value of a game state where it's the MIN player's turn, we can consider all of the possible actions that the player can take in the game state, and look at what board will result from taking those actions. In this case, the player has two possible moves, each of which leads to a different game state. But in both of these game states, the game's still not over. It's the MAX player's turn, so we'd want to consider what their best move is. But now there's only one possible choice left for the MAX player to bring us to the end of the game, so we can calculate what the outcome is after that last move. In one case, X wins the game: that has a value of 1. In the other case, the game is a tie: that has a value of 0.
And now that we know the value of these terminal states, we can also use that as the value of the states that immediately lead to those states: since these game states each lead to only a single possible outcome, we can use the value of that outcome as the value of these game states. So now we return to the original question: what's the value of this game state? To calculate that, remember the goal of the MIN player. The MIN player wants the value to be as small as possible. So we consider all of the actions we can take in the state and all of the corresponding resulting game states, and we find the one with the minimum value. That action is our optimal action, and that value is the value of the game state. We can try to formalize that idea into a more precisely-defined algorithm for Minimax. Our algorithm will rely on certain information about the game that we need to know. First, we need to be able to determine whether a game state is a terminal state, meaning the game is over. So we'll have a function Terminal that will tell us that. If it's a terminal state, we need to be able to get the Value of the terminal state. The value should be something that the MAX player wants to maximize and the MIN player wants to minimize. We also need to be able to determine whose turn it is in any given game state. So we'll have a function called Player that takes a game state and tells us whether it's MAX or MIN's turn. In a particular game state, we need to know what actions are available to the player. So Actions will be a function that takes a game state and gives us all of the possible actions we can take in that state. And finally, we'll also need a function Result that takes a state and an action and tells us what the new state of the game will be after taking that action. With those pieces in mind, let's put together the Minimax algorithm. The algorithm will take any game state s and calculate the value of that state.
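As a sketch, here is what those five game functions might look like for Tic-Tac-Toe. This is a hypothetical implementation, not the only way to do it: a state is represented as a tuple of nine cells holding "X", "O", or None, and all of the names are illustrative.

```python
# Illustrative Tic-Tac-Toe game interface: the five functions Minimax relies on.
# A state is a tuple of 9 cells (indices 0-8, row by row): "X", "O", or None.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(state):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if state[a] is not None and state[a] == state[b] == state[c]:
            return state[a]
    return None

def terminal(state):
    """The game is over when someone has won or every square is filled."""
    return winner(state) is not None or all(cell is not None for cell in state)

def value(state):
    """Value of a terminal state: 1 if X won, -1 if O won, 0 for a tie."""
    w = winner(state)
    return 1 if w == "X" else -1 if w == "O" else 0

def player(state):
    """X (the MAX player) moves first, so it's X's turn when counts are equal."""
    return "X" if state.count("X") == state.count("O") else "O"

def actions(state):
    """All empty squares are legal moves."""
    return [i for i, cell in enumerate(state) if cell is None]

def result(state, action):
    """The new state after the current player marks the given square."""
    new = list(state)
    new[action] = player(state)
    return tuple(new)
```

With these pieces defined, the Minimax algorithm itself only ever needs to call them; it doesn't care whether the game underneath is Tic-Tac-Toe or something else.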
If the state is a terminal state, then the game is over and we can just calculate the result directly, for example, by seeing who won or whether the game ended in a tie. Otherwise, the game's not over yet, so we can check to see whose turn it is: the MAX player or the MIN player. If it's the MAX player's turn, remember that our goal is to make the value of the game as large as possible. The way we'll do that is by checking all of the possible actions we can take, and figuring out which one will lead to the greatest value even if the MIN player is playing optimally. Before we've considered any actions, we'll start by setting the value to the smallest value possible (negative infinity), so that any action we find will lead to a value better than it. Then, we'll loop through all of the actions. For each action we can take, we'll figure out what the resulting game state is, and then calculate the Minimax value of that new game state. If it's greater than the best value we've found so far, it becomes the new value. The result of this loop is that the value variable will be equal to the largest value we can get considering all possible actions, so that's the value we'll return. Now, let's consider the same situation if it's the MIN player's turn. The approach will be the same, but now we want the value to be as small as possible. So we'll start by setting the value to the largest value possible (infinity), so that any action we find will lead to a value lower than it. We'll loop through all actions, calculating the Minimax value of the resulting game state, and update the value if it's smaller than the best value we've found so far. And then at the end, we'll return the value. And that's it: this algorithm can take any game state and determine its value. 
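The procedure just described can be sketched in code like this. It's a minimal sketch that assumes the five game functions from the text are supplied by the caller, so it works for any game that provides them.

```python
import math

def minimax(state, terminal, value, player, actions, result):
    """Return the minimax value of `state`, given the five game functions."""
    if terminal(state):
        return value(state)                 # game over: score it directly
    if player(state) == "MAX":
        v = -math.inf                       # worst case for MAX: any move beats it
        for action in actions(state):
            v = max(v, minimax(result(state, action),
                               terminal, value, player, actions, result))
        return v
    else:
        v = math.inf                        # worst case for MIN: any move beats it
        for action in actions(state):
            v = min(v, minimax(result(state, action),
                               terminal, value, player, actions, result))
        return v
```

For example, on a toy game tree where an integer stands for a terminal state (its own value) and a non-terminal state is a pair of whose turn it is and a list of successor states, minimax(("MAX", [("MIN", [3, 5]), ("MIN", [2, 9])]), ...) works out to 3: MIN would hold the first branch to 3 and the second to 2, and MAX prefers 3.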
If we want to know what the best action to take is, we can use this algorithm to calculate what the value of each possible next game state is and pick the action that will lead to the highest or lowest value, depending on whether we're the MAX or MIN player. Notice that the algorithm is recursive. For any non-terminal game state, we consider all of the possible next moves, and for each of the resulting game states, we use the Minimax algorithm recursively to determine the value of that new game state. In doing so, that could result in recursively calling Minimax again and again until we reach a terminal state where the game is over. This algorithm is optimal: correctly implemented, it will make the best choices and it will never lose a game of Tic-Tac-Toe. In theory, the same could be true for other games too: but we quickly run into challenges when using this algorithm on more complicated games. Tic-Tac-Toe is a relatively small game. There are only a maximum of nine possible actions you can take in any given game state, and that number goes down over time. And a game can only last up to nine moves total before the game will be over, so a computer can relatively quickly use Minimax to explore all possible Tic-Tac-Toe games (building what's known as a game tree) and determine the best choice. Other games can go much longer, and have many more choices: in a game of chess, for example, a player might have dozens of legal moves in any given position, and games often last more than 50 moves for each player, sometimes more than 100. Given unlimited time and processing power, a computer could theoretically run the Minimax algorithm on the game of chess to determine the optimal move to make in any position. But practically speaking, there are far too many possibilities for that to be a reasonable option. So, to make the algorithm more efficient, we have to make some optimizations.
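The move-selection step can be sketched as follows, using a toy game-tree representation (an integer is a terminal value; otherwise a state is a pair of whose turn it is and a list of successor states; the representation is purely illustrative).

```python
def minimax_value(state):
    """Minimax value of a toy game tree: an int is a terminal state's value."""
    if isinstance(state, int):
        return state
    turn, children = state
    values = [minimax_value(child) for child in children]
    return max(values) if turn == "MAX" else min(values)

def best_action(state):
    """Index of the successor whose value is best for the player to move:
    the largest value for MAX, the smallest for MIN."""
    turn, children = state
    values = [minimax_value(child) for child in children]
    target = max(values) if turn == "MAX" else min(values)
    return values.index(target)
```

For instance, on the tree ("MAX", [("MIN", [3, 5]), ("MIN", [2, 9])]), the two successors have values 3 and 2, so best_action picks the first one.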
One optimization we can make is to eliminate exploring certain game states if we know they won't be relevant for determining the best move. Consider, for example, the game state we were looking at before. We had explored one possible move we could make that results in the game being a tie. Then, we explored a second possible move we could make. In exploring that, we found that one move for our opponent allows our opponent to win the game. But once we've discovered that, there's actually no point in going on to analyze any other move the opponent could make in the position: we clearly never want to take this action that lets the opponent win if we have a different action that guarantees we can at least tie the game. And that's true no matter what the opponent's other moves might lead to. It doesn't even matter if potentially there's a way for us to win. If our opponent is also playing optimally, they'll never choose the path that lets us win if they have a way to win instead, so the only thing that matters is that after this action, the opponent has an option which leads to a worse outcome for us than some other action we've already explored. As a result, we can skip searching the rest of this part of the game tree. And any time we run into similar situations where we know parts of the game tree won't matter, we can skip analyzing those game states too. This process, known as alpha-beta pruning, makes our Minimax algorithm more efficient without affecting the correctness of the algorithm: we're still going to end up making the optimal choices because we're only eliminating parts of the game tree that we know with certainty won't matter. This helps a little bit, but it still potentially leaves a lot of game states to be analyzed, especially in more complicated games.
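One common way to implement this idea is sketched below, again on a toy tree (an integer is a terminal value; otherwise a state is a pair of player and children; illustrative only). Alpha tracks the best value MAX can already guarantee elsewhere, beta the best value MIN can already guarantee, and whenever alpha meets or exceeds beta, the remaining children can't change the final decision, so they're skipped.

```python
import math

def alphabeta(state, alpha=-math.inf, beta=math.inf):
    """Minimax value of a toy game tree, with alpha-beta pruning."""
    if isinstance(state, int):
        return state                        # terminal: exact value
    turn, children = state
    if turn == "MAX":
        v = -math.inf
        for child in children:
            v = max(v, alphabeta(child, alpha, beta))
            alpha = max(alpha, v)
            if alpha >= beta:               # MIN already has a better option elsewhere,
                break                       # so skip the remaining children
        return v
    else:
        v = math.inf
        for child in children:
            v = min(v, alphabeta(child, alpha, beta))
            beta = min(beta, v)
            if alpha >= beta:               # MAX already has a better option elsewhere
                break
        return v
```

The pruned search returns exactly the same value plain Minimax would; it only avoids visiting branches that provably can't matter.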
So another approach is this: instead of determining the best move by calculating all of the possible ways the game could proceed until the end of the game, we'll limit ourselves to a particular depth: maybe we'll only explore 3 moves ahead, or 5 moves ahead, or 10 moves ahead, or more, depending on how much time we have and how fast our computer is. Once we reach that depth, though, the game won't necessarily be over. So we can't directly calculate the value of these game states: we don't know what the value is. Instead, we'll need some way to estimate the value of the game, using something known as an evaluation function. The evaluation function's job is to estimate what it thinks the value of any game state is, so that we can use its result to make decisions about what action we should take. Because we're now estimating value instead of fully calculating it, our algorithm is no longer optimal: it might not always make the best decision, but it might be able to make a good decision in significantly less time. How good a decision it makes depends on how good an evaluation function we're using. In a game like chess, a very simple evaluation function might be to add up the value of all of the pieces for each player and use that to determine who's better in the game. That might work for many positions, but there are also many positions where the side with more valuable pieces might not be winning. So better evaluation functions are needed to get the computer to perform better. Coming up with a good evaluation function can be tricky, so another approach is to have our computers learn on their own how to evaluate game positions, using techniques from machine learning.
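Depth-limited search might be sketched like this, on the same toy tree representation (an integer is a terminal value; otherwise a state is a pair of player and children). Here evaluate is a stand-in for a real heuristic such as a chess material count; the one used below is purely illustrative.

```python
def depth_limited_minimax(state, depth, evaluate):
    """Minimax value of a toy game tree, searching at most `depth` plies;
    non-terminal states at the cutoff are scored by `evaluate` instead."""
    if isinstance(state, int):
        return state                        # terminal: exact value
    if depth == 0:
        return evaluate(state)              # cutoff: estimate instead of searching on
    turn, children = state
    values = [depth_limited_minimax(child, depth - 1, evaluate)
              for child in children]
    return max(values) if turn == "MAX" else min(values)
```

With a large enough depth this reproduces full Minimax exactly; with a small depth, the quality of its decisions rests entirely on the quality of the evaluation function, as the text notes.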
Over time, we've developed more and more techniques for getting computers to more efficiently find good moves to play in games, but Minimax remains an important part of many of these game-playing agents: with its ability to explore the game tree, consider the available actions and how the opponent might respond, and decide on the best move to play.