It's 2009,
and Usain Bolt is about to run 100 meters in just 9.58 seconds. A world record unlikely to ever be broken... until today. This is an artificial intelligence: a
ragdoll controlled entirely by an AI. Let's check if it works. At the moment
our AI is untrained and naive. So let's fix it. This is a test running track measuring
precisely 100 meters. Our AI is going to be training here
for thousands of hours and episodes. But before we do
that, let's get to know our AI. Our ragdoll weighs
70 kilos and is six foot tall. Inside its head
is something called a neural network. Think of it as some mathematical formula
which roughly mimics how the brain functions. To train this neural network, we give it inputs,
which it then processes into outputs. Since we are training our AI to run, we're going to give it complete control over all of its joints. The Euler angles of each joint are extracted and arranged into a vector, which is then fed into the neural network as inputs. Since we have a total of 16 joints, we will need a decently large network. Besides the input and output layers, there are hidden layers. The number of hidden layers and nodes is what determines the IQ of the AI. 256 nodes should be enough. Now, this may seem like a lot of brain cells, but in reality it's only about 10% of the neurons in a jellyfish, which in turn has only about 0.00001% of the neurons in a human.
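To make that wiring concrete, here is a minimal sketch of the network. The narration only tells us the inputs are the Euler angles of 16 joints and that hidden layers have 256 nodes, so everything else here is an assumption: the framework (PyTorch), the two-hidden-layer depth, and the outputs being one action value per joint angle.

```python
import torch
import torch.nn as nn

NUM_JOINTS = 16            # from the narration
OBS_SIZE = NUM_JOINTS * 3  # 3 Euler angles (x, y, z) per joint
ACT_SIZE = NUM_JOINTS * 3  # assumption: one action per joint angle
HIDDEN = 256               # "256 nodes should be enough"

# Assumption: two hidden layers; the narration never says how many.
policy = nn.Sequential(
    nn.Linear(OBS_SIZE, HIDDEN),
    nn.ReLU(),
    nn.Linear(HIDDEN, HIDDEN),
    nn.ReLU(),
    nn.Linear(HIDDEN, ACT_SIZE),
    nn.Tanh(),  # keep joint actions bounded in [-1, 1]
)

def observe(joint_euler_angles):
    """Flatten 16 (x, y, z) Euler-angle triples into one input vector."""
    return torch.tensor(joint_euler_angles, dtype=torch.float32).flatten()
```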
Now that I've explained how dumb our AI is, let's talk about rewards. Since we are using a reinforcement learning algorithm, we need to define a reward function. This incentivizes the AI to behave a certain way. Bolt's average speed during the record-breaking 100-meter race was about 10.4 meters per second. To beat this, the AI needs to average at least 11 meters per second. So we set this as the target velocity and reward the AI for how closely it matches this speed. As the current velocity of the AI approaches 11, the reward approaches 1. As the AI deviates away, the reward approaches zero. This should encourage our AI to run fast. To make sure the AI runs in a straight line, a small reward will be given for facing forward.
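The narration describes the shape of this reward but not its exact formula, so here is one plausible sketch: a Gaussian-style kernel that equals 1 when the measured speed hits the 11 m/s target and decays toward 0 as it deviates, plus a small alignment bonus for facing forward. The kernel width and the 0.1 weighting are assumptions, not values from the video.

```python
import numpy as np

TARGET_SPEED = 11.0  # m/s, just above Bolt's ~10.4 m/s average

def reward(velocity, facing_dir, forward=np.array([0.0, 0.0, 1.0])):
    """velocity: 3-D velocity vector; facing_dir: unit vector the agent faces."""
    speed = np.linalg.norm(velocity)
    # Peaks at 1 when speed == TARGET_SPEED, falls toward 0 as it deviates.
    # The width (the 8.0 denominator) is an assumption.
    speed_reward = np.exp(-((speed - TARGET_SPEED) ** 2) / 8.0)
    # Small bonus for facing forward: clipped dot product of unit vectors.
    facing_reward = 0.1 * max(0.0, np.dot(facing_dir, forward))
    return speed_reward + facing_reward
```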
Now that we have defined the reward function, there is only one thing left to be done. Let's get to training. So what has our AI learned? Let's see if it can now beat Usain Bolt. Usain Bolt absolutely zips away while our AI face-plants onto the ground,
struggling to find meaning in life. So what's wrong? The AI seems to have developed
a unique stride, very similar to a zombie. There is one dominant leg
while the other barely drags along. This is a severe bottleneck preventing the AI from running faster
and it needs to be fixed. So how exactly do we force our
AI to use both of its legs? Well, randomization. We will now use a special cubic training environment. The ragdoll's orientation will be randomized every single episode. A cube will be randomly spawned around the area, and the AI has to face towards it to earn rewards. This means the AI has to face the cube and maintain a target velocity of three meters per second. The lower velocity will ensure more stable training. This environment will force the AI to twist and turn in various directions and hopefully use both of its legs.
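An episode in that cubic environment might be set up roughly like the sketch below. Only the randomized orientation, the random cube, and the 3 m/s target come from the narration; the spawn ranges and the way the facing term gates the speed reward are assumptions.

```python
import numpy as np

CUBE_TARGET_SPEED = 3.0  # m/s, lowered for more stable training

def reset_episode(rng):
    """Randomize the ragdoll's facing and spawn a cube somewhere nearby."""
    ragdoll_yaw = rng.uniform(0.0, 2 * np.pi)    # new orientation every episode
    cube_pos = rng.uniform(-10.0, 10.0, size=2)  # assumption: a ±10 m area
    return ragdoll_yaw, cube_pos

def cube_reward(velocity, facing_dir, ragdoll_pos, cube_pos):
    """Reward only accrues while roughly facing the cube at the target speed."""
    to_cube = cube_pos - ragdoll_pos
    to_cube = to_cube / (np.linalg.norm(to_cube) + 1e-8)
    facing = max(0.0, np.dot(facing_dir, to_cube))
    speed = np.linalg.norm(velocity)
    speed_match = np.exp(-((speed - CUBE_TARGET_SPEED) ** 2) / 2.0)
    return facing * speed_match
```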
Our AI has successfully learned to use both of its legs. If we now place our AI back into the previous environment, we can see that it's now walking properly. Since it has now learned to use both of its legs, it won't forget, and we can safely bump the target velocity back up to 11. However, before we resume training, a few changes need to be made. There is now going to be a penalty
for falling over. Usain Bolt is six foot five while our
AI is only six foot. So it's only fair to make our
AI slightly taller. There is one last change to be made
and this is in the reward function itself. We are now going to penalize the
AI for simply existing. Yes, you heard that right: a small penalty every single step. Imagine you are in a real-life maze
and you have to escape. There is no time limit and you can find
the escape route at your own pace. Sounds pretty simple, but now imagine the same maze
with a serial killer right behind you. You will be forced to run faster and find
the escape route quickly and efficiently. This is the idea behind the small penalty. It will force the AI to run fast
and use its limbs efficiently. So, in summary: our AI is now taller, there is a penalty for falling over, and also a penalty for existing.
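Putting those changes together, the updated reward might look like this sketch, reusing the reward() function from earlier. The narration only says that a fall penalty and a small per-step penalty exist; the magnitudes below are assumptions.

```python
EXISTENCE_PENALTY = 0.01  # assumption: small cost paid every step
FALL_PENALTY = 1.0        # assumption: one-time cost for falling over

def updated_reward(velocity, facing_dir, has_fallen):
    r = reward(velocity, facing_dir)  # the speed + facing reward from before
    r -= EXISTENCE_PENALTY            # "a small penalty every single step"
    if has_fallen:
        r -= FALL_PENALTY             # discourage face-planting
    return r
```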
Let's get to training and see what our AI learns this time.