Transcript for:
Exploring Monte Carlo Tree Search Techniques

Hi, I'm Tommy Thompson, and this is an AI 101 video on Monte Carlo tree search. In this video I'm going to give a brief overview of the Monte Carlo tree search, or MCTS, algorithm: how it operates in the general sense, as well as why it's now utilized in a variety of problem spaces, such as video game AI opponents as well as expert computer players of board and video games.

MCTS is a heuristic-driven search algorithm that's a combination of classic tree search implementations alongside machine learning principles of reinforcement learning. Tree search algorithms connect the potential states of a given problem by the actions that will achieve them. We can then write algorithms that search within this space in order to make intelligent decisions that yield a desired outcome, by exploring the tree to find a desired goal state. The simplest versions of these are the uninformed algorithms, such as breadth-first and depth-first search, which follow a very strict and specific ordering of states to visit, while more intelligent algorithms such as A* use metrics such as the cost of an action, as well as a heuristic that estimates the value of a state in solving the problem.

Meanwhile, reinforcement learning is a branch of machine learning algorithms that learns good strategies for the problem it's trying to solve. It aims to achieve an optimal strategy by repeatedly using the best actions it has found in that problem. However, there's always the possibility that the current best action is, in actuality, not optimal. As such, it will continue to evaluate alternatives periodically during the learning phase by executing them instead of the perceived optimum. This is known in reinforcement learning as the exploration/exploitation trade-off: it will exploit the actions and strategies it has found to be the best, but must also continue to explore the local space of alternative decisions and see whether they could replace the current best. This is a difficult process to resolve, given that sometimes we need to really explore a series of decisions to discover that an action that might look bad now will actually prove to be a really good idea somewhere down the line. The perceived value of these actions and their subsequent states is encoded into the state space itself, with the value of a state being continually updated and improved the more the algorithm tries the options available to it and then learns whether it was a good decision or not.

MCTS essentially merges these two practices together. It uses a tree model of the world that it searches for the best possible path, by playing out specific subsections of the tree, calculating how effective they were, and then updating the value of states within it based on the outcome. By doing this hundreds, if not thousands, of times, it can establish the best path to take through the tree and give the best answer to a given problem. This repeated evaluation in order to determine the value of a given state or strategy is derived from the Monte Carlo method: a type of algorithm that repeatedly samples a problem space randomly in order to obtain a more accurate understanding of the best answers within it.

MCTS carries many similarities to an existing algorithm for tree search called minimax, which also evaluates all the options of the tree in order to make a decision. The problem with minimax is when we're in large problem spaces, where a large number of actions an agent can execute increases the number of states on the next layer down in the tree that the algorithm has to explore. This is a property known as the branching factor.
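To give a rough sense of the scale involved, here's a quick back-of-the-envelope sketch in Python (the branching factors below are commonly cited ballpark figures, not numbers from the video): the number of states an exhaustive search has to consider grows roughly as b to the power of the search depth.

```python
# Back-of-the-envelope look at how the branching factor b inflates exhaustive
# search: roughly b**depth states must be considered to look `depth` moves ahead.
# The branching factors are commonly cited approximations, not figures from the video.
for game, b in [("chess", 35), ("Go", 250)]:
    for depth in (2, 4, 6):
        print(f"{game} (b ~ {b}): about {b**depth:,} states at depth {depth}")
```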
If the branching factor is high, then algorithms like minimax don't scale well at all, given they need to evaluate every option quite thoroughly, with some adapted versions, such as alpha-beta pruning, a requirement to explore large and complex problems.

This is where MCTS proves its value: it only searches a couple of layers deep into the tree, prioritizes which parts of the tree to explore, simulates the outcome rather than exhaustively expanding the search space, and can be limited in how many evaluations it makes. This helps with problems with a very high branching factor, as it will isolate which parts of the tree it should explore. The individual evaluations rely on the use of a playout, where the algorithm effectively plays the game from a given starting point all the way to the end by making random decisions, a concept derived from the Monte Carlo method.

Now, exploring the future states of a game can be difficult to do depending on the game itself. Board games are often easier to encode and allow for time to run the evaluations; however, large-scale real-time video games make it even more difficult. As such, MCTS uses what's called a forward model: an abstract approximation of the game logic that allows it to consider the outcome of playing action X in state Y resulting in outcome Z. It runs the playout to a terminal state and then records the result, which is then used to update the value of a given state in the tree. Once it's done running the evaluations, it simply selects the action that had the best rollout score.

The smart part comes in how each rollout is used to update the tree, and this relies on four key steps: selection, expansion, simulation and backpropagation. Selection takes the current state of the tree and selects decisions down that tree to a future state at a fixed depth. Expansion then moves one step down to expose a new state in the tree; provided the state we reach didn't end the game, either as a win or a loss, or exceed the tree depth, we're then going to simulate the value of that state. Simulation is the random playout phase that plays a game of completely random decisions from this point until it reaches either a terminal state, where it wins or loses, or a simulation cap is reached; it then gives back a result of how well it performed as a score. This is passed to the backpropagation phase. In backpropagation, we update the perceived value of a given state: not just the state we ran the rollout from, but every state that led to it in the tree, so any score, be it positive or negative, propagates back up the tree to the starting point.

Through these four phases we can take decisions to a fixed point in the tree, simulate their outcome and propagate back the perceived value of them. Now, doing this once isn't enough: you have to do it hundreds, even thousands, of times, and balance which playouts to make, which I'll come back to in a second. But once the playout limit is reached, it's done, and it takes the action leading to the best-value state. Different MCTS algorithms balance the playouts such that they shift focus to different parts of the tree periodically, to ensure there are no better solutions to be found that they didn't otherwise spot. The original implementation of this balancing act is UCT, or Upper Confidence Bound 1 applied to trees. UCT chooses which node to visit next based on the equation shown on screen: the left-hand side of the equation ensures we retain some aspect of history in the decision-making, and this is the exploitation part of the algorithm, while the remaining terms are the exploration component, meaning it will periodically encourage exploring other parts of the space as the number of evaluations increases.
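For reference, the standard UCB1 score UCT assigns to a child node i is w_i / n_i + c * sqrt(ln(N) / n_i), where w_i is the child's accumulated reward, n_i its visit count, N the parent's visit count and c an exploration constant: the first term is the exploitation part, the square-root term the exploration part. To make the four phases concrete, here's a minimal Python sketch (not code from the video); the tiny "take-away" game used as its forward model is a hypothetical stand-in, in which players alternately remove one to three counters and whoever takes the last counter wins.

```python
import math
import random

class Node:
    """One state in the search tree."""
    def __init__(self, state, player, parent=None, action=None):
        self.state = state        # counters left in the pile (the forward-model state)
        self.player = player      # player to move at this node (1 or 2)
        self.parent = parent
        self.action = action      # the action that led here
        self.children = []
        self.untried = [a for a in (1, 2, 3) if a <= state]
        self.visits = 0
        self.wins = 0.0           # reward from the view of the player who just moved

def uct_select(node, c=1.41):
    # Selection: UCB1 applied to trees -- exploitation (wins / visits)
    # plus exploration (c * sqrt(ln(parent visits) / child visits)).
    return max(node.children,
               key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(state, player):
    # Simulation: play uniformly random moves until the pile is empty;
    # the player who takes the last counter wins.
    while True:
        state -= random.choice([a for a in (1, 2, 3) if a <= state])
        if state == 0:
            return player
        player = 3 - player

def mcts(root_state, iterations=2000):
    root = Node(root_state, player=1)
    for _ in range(iterations):
        node = root
        # 1. Selection: walk down while nodes are fully expanded and non-terminal.
        while not node.untried and node.children:
            node = uct_select(node)
        # 2. Expansion: add one unexplored child, provided the game isn't over here.
        if node.untried:
            action = node.untried.pop()
            child = Node(node.state - action, 3 - node.player, parent=node, action=action)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new state to a terminal state.
        winner = 3 - node.player if node.state == 0 else rollout(node.state, node.player)
        # 4. Backpropagation: push the result back up the tree to the root.
        while node is not None:
            node.visits += 1
            if winner == 3 - node.player:   # credit the player who moved into this node
                node.wins += 1
            node = node.parent
    # Once the playout budget is spent, pick the most-visited action at the root.
    return max(root.children, key=lambda ch: ch.visits).action

print(mcts(10))   # should settle on taking 2, leaving a multiple of four for the opponent
```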
What makes the system even more powerful is that it's what we call an anytime algorithm, meaning it will always give an answer regardless of how many playouts we let it make. So in a context like a game, where CPU and memory resources are pretty tight, if it needs to stop evaluating the game at a moment's notice it will still give the best answer it could in the time it had. Despite this, giving it a massive amount of CPU resource won't necessarily result in godlike AI, given the knowledge accrued from repeatedly running playouts may eventually level off.

MCTS is proving very popular in areas of general intelligence and expert play, most notably as part of the action selection process in the expert Go player AlphaGo by Google DeepMind, but also in the video game industry. To date, the most notable examples of its adoption include the sadly cancelled Fable Legends alongside the Total War franchise, a point I discuss at length in parts three and four of my AI of Total War series. But it's also being used in research in areas such as Ms. Pac-Man, Magic: The Gathering, and also general intelligence as part of the General Video Game AI competition.

This has been the AI 101 on Monte Carlo tree search. I'm Tommy Thompson, and this AI 101 video, like all my other case studies, videos, tutorials and the like, is supported thanks to my sponsors over at patreon.com/ai_and_games. Thanks to these awesome people, they provide support that allows me to make more of these videos for you all out there. Thanks for watching, and I'll see you again soon.