Transcript for:
Training an AI to Beat Usain Bolt's Record

It's 2009 and Usain Bolt is about to run 100 meters in only 9.58 seconds. A world record unlikely to be broken ever again. Until today. This is an artificial intelligence, a ragdoll controlled completely by an AI. Let's check if it works. At the moment our AI is untrained and naive. So let's fix it. This is a test running track measuring precisely 100 meters. Our AI is going to be training here for thousands of hours and episodes.

But before we do that, let's get to know our AI. Our ragdoll weighs 70 kilos and is six foot tall. Inside its head is something called a neural network. Think of it as a mathematical formula which roughly mimics how the brain functions. To train this neural network, we give it inputs, which it then processes into outputs. Since we are training our AI to run, we're going to give it complete control over all of its joints. The Euler angles of each joint are extracted and arranged into a vector, which is then fed into the neural network as input. Since we have a total of 16 joints, we will need a decently large network. Besides the input and output, there are hidden layers. The number of hidden layers and nodes is what determines the IQ of the AI. 256 should be enough. Now, this may seem like a lot of brain cells, but in reality it's only about 10% of the neurons in a jellyfish, which in turn has only about 0.00001% of the neurons in a human.

Now that I've explained how dumb our AI is, let's talk about rewards. Since we are using a reinforcement learning algorithm, we need to define a reward function. This incentivizes the AI to behave a certain way. Bolt's average speed during the record-breaking 100-meter race was about 10.4 meters per second. To beat this, the AI needs to average at least 11 meters per second. So we set this as the target velocity and reward the AI for how closely it matches this speed. As the current velocity of the AI approaches 11, the reward approaches 1. As the AI deviates away, the reward approaches 0. This should encourage our AI to run fast. To make sure the AI runs in a straight line, a small reward will be given for facing forward. Now that we have defined the reward function, there is only one thing left to do. Let's get to training.

So what has our AI learned? Let's see if it can now beat Usain Bolt. Usain Bolt absolutely zips away while our AI face-plants onto the ground, struggling to find meaning in life. So what's wrong? The AI seems to have developed a unique stride, very similar to a zombie's. There is one dominant leg while the other barely drags along. This is a severe bottleneck preventing the AI from running faster, and it needs to be fixed.

So how exactly do we force our AI to use both of its legs? Well, randomization. We will now use a special cubic training environment. The ragdoll's orientation will be randomized every single episode, and a cube will be randomly spawned in the area, which the AI has to face to earn rewards. This means the AI has to face the cube and maintain a target velocity of three meters per second. The lower velocity will ensure more stable training. This environment will force the AI to twist and turn in various directions and, hopefully, use both of its legs.

Our AI has successfully learned to use both of its legs. If we now place it back into the previous environment, we can see that it's now walking properly. Since it has learned to use both of its legs, it won't forget, and we can safely bump the target velocity back up to 11.
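To make the setup described above concrete, here is a minimal Python sketch of the observation vector and policy network. The two-hidden-layer shape, the tanh activations, and the random initialization are all assumptions; the transcript only specifies 16 joints, Euler-angle inputs, and hidden layers of 256 nodes.

```python
import numpy as np

NUM_JOINTS = 16            # the ragdoll's 16 controllable joints
OBS_SIZE = NUM_JOINTS * 3  # three Euler angles (x, y, z) per joint

def build_observation(joint_euler_angles):
    """Flatten 16 (x, y, z) Euler-angle triples into one input vector."""
    obs = np.asarray(joint_euler_angles, dtype=np.float32).reshape(-1)
    assert obs.shape == (OBS_SIZE,)
    return obs

# Two hidden layers of 256 nodes each (the layer count is an assumption).
rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (OBS_SIZE, 256))
W2 = rng.normal(0.0, 0.1, (256, 256))
W3 = rng.normal(0.0, 0.1, (256, NUM_JOINTS * 3))  # one output per joint axis

def policy(obs):
    """Map joint angles to joint actions in [-1, 1]."""
    h1 = np.tanh(obs @ W1)
    h2 = np.tanh(h1 @ W2)
    return np.tanh(h2 @ W3)
```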
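The reward function can be sketched like this. The Gaussian fall-off and the 0.1 weighting on the facing bonus are assumed values; the transcript only says the reward approaches 1 as the speed approaches 11 m/s and 0 as it deviates, plus a small reward for facing forward.

```python
import numpy as np

TARGET_SPEED = 11.0  # m/s, just above Bolt's ~10.4 m/s average

def velocity_reward(speed, target=TARGET_SPEED):
    """1.0 when the speed matches the target, decaying toward 0 as it deviates."""
    return float(np.exp(-0.5 * (speed - target) ** 2))

def facing_reward(forward_dir, track_dir, weight=0.1):
    """Small bonus for facing down the track (the weight is an assumed value)."""
    cos = np.dot(forward_dir, track_dir) / (
        np.linalg.norm(forward_dir) * np.linalg.norm(track_dir))
    return weight * max(0.0, cos)

def step_reward(speed, forward_dir, track_dir):
    return velocity_reward(speed) + facing_reward(forward_dir, track_dir)
```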
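And here is one way the randomized cube environment's episode reset might look. The spawn radius and the uniform sampling are illustrative assumptions; the transcript only says the ragdoll's orientation is randomized and a cube is spawned at a random spot each episode, with a 3 m/s target speed.

```python
import numpy as np

rng = np.random.default_rng()

def reset_episode(arena_radius=20.0):
    """Randomize the ragdoll's facing and spawn a target cube nearby."""
    ragdoll_yaw = rng.uniform(0.0, 360.0)      # random starting orientation
    angle = rng.uniform(0.0, 2.0 * np.pi)      # random direction to the cube
    distance = rng.uniform(5.0, arena_radius)  # random distance to the cube
    cube_position = np.array([distance * np.cos(angle), 0.0,
                              distance * np.sin(angle)])
    target_speed = 3.0  # lower target speed for more stable training
    return ragdoll_yaw, cube_position, target_speed
```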
However, before we resume training, a few changes need to be made. There is now going to be a penalty for falling over. Usain Bolt is six foot five while our AI is only six foot, so it's only fair to make our AI slightly taller. There is one last change to be made, and it's in the reward function itself. We are now going to penalize the AI for simply existing. Yes, you heard that right. A small penalty every single step.

Imagine you are in a real-life maze and you have to escape. There is no time limit, and you can find the escape route at your own pace. Sounds pretty simple. But now imagine the same maze with a serial killer right behind you. You will be forced to run faster and find the escape route quickly and efficiently. This is the idea behind the small penalty. It will force the AI to run fast and use its limbs efficiently.

So, in summary: our AI is now taller, there is a penalty for falling over, and also a penalty for existing. Let's get to training and see what our AI learns this time.
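Folding the two new penalties into the reward gives something like the sketch below. The magnitudes are assumptions; the transcript only specifies a penalty for falling over and a small penalty every step.

```python
def shaped_reward(base_reward, fell_over,
                  existence_penalty=-0.01, fall_penalty=-1.0):
    """Add the per-step 'existence' penalty and the fall penalty
    to the velocity/facing reward (magnitudes are assumed values)."""
    reward = base_reward + existence_penalty  # the 'serial killer' time pressure
    if fell_over:
        reward += fall_penalty                # discourage face-planting
    return reward
```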