Transcript for: The Most Famous Problem in Game Theory: The Prisoner's Dilemma
This is a video about
the most famous problem in game theory. Problems of this sort pop up
everywhere, from nations locked in conflict to roommates doing the dishes. Even game shows have been
based around this concept. Figuring out the best strategy
can mean the difference between life and death, war and peace, flourishing and the
destruction of the planet. And in the mechanics of this game, we may find the very source of one of the most unexpected phenomena
in nature: cooperation. (upbeat music) On the 3rd of September, 1949, an American weather monitoring plane collected air samples over Japan. In those samples, they found
traces of radioactive material. The Navy quickly collected and tested rainwater
samples from their ships and bases all over the world. They also detected small amounts of cerium-141 and yttrium-91. But these isotopes have half-lives of one or two months, so they must have been produced recently, and the only place they
could have come from was a nuclear explosion. But the US hadn't performed
any tests that year, so the only possible conclusion was that the Soviet Union had figured out how to make a nuclear bomb. This was the news the
Americans had been dreading. Their military supremacy, achieved through the Manhattan Project, was quickly fading. - This makes the problem of Western Europe and the United States far more
serious than it was before and perhaps makes the
imminence of war greater. - Some thought their
best course of action was to launch an unprovoked nuclear
strike against the Soviets while they were still ahead; in the words of Navy Secretary Matthews, to become "aggressors for peace". John von Neumann, the founder of game theory, said, "If you say why
not bomb them tomorrow, I say, why not bomb them today? If you say today at five o'clock, I say why not at one o'clock?" Something needed to be done
about nuclear weapons, and fast. But what? In 1950, the RAND Corporation, a US-based think tank, was studying this question. And as part of this research,
they turned to game theory. That same year, two mathematicians
at RAND had invented a new game, one which, unbeknownst to them at the time, closely resembled the US-Soviet conflict. This game is now known as
the prisoner's dilemma. So let's play a game. A banker with a chest full
of gold coins invites you and another player to
play against each other. You each get two choices. You can cooperate or you can defect. If you both cooperate,
you each get three coins. If one of you cooperates, but the other defects, then the one who defected gets five coins and the other gets nothing. And if you both defect,
then you each get a coin. The goal of the game is simple: to get as many coins as you can. So what would you do? Suppose your opponent cooperates, then you could also
cooperate and get three coins or you could defect and
get five coins, instead. So you are better off defecting, but what if your opponent
defects, instead? Well, you could cooperate and get no coins or you could defect and
at least get one coin. So no matter what your opponent does, your best option is always to defect. Now, if your opponent is also rational, they will reach the same conclusion and therefore also defect. As a result, when you both act rationally, you both end up in the
suboptimal situation, getting one coin each when you could have gotten three instead.
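To make the payoffs concrete, here is a minimal sketch in Python (my own illustration, not anything from the video) of the banker's game and the dominance argument above:

```python
# The banker's payoff table described above. "C" = cooperate, "D" = defect;
# PAYOFF[(me, them)] is the number of coins I receive.
PAYOFF = {
    ("C", "C"): 3,  # both cooperate: three coins each
    ("C", "D"): 0,  # I cooperate, they defect: I get nothing
    ("D", "C"): 5,  # I defect, they cooperate: I get five coins
    ("D", "D"): 1,  # both defect: one coin each
}

# Whatever the opponent plays, defecting earns strictly more:
for them in ("C", "D"):
    best = max(("C", "D"), key=lambda me: PAYOFF[(me, them)])
    print(f"If my opponent plays {them}, my best response is {best}")
# Both lines print "D", so defection is the dominant move, even though
# mutual cooperation (3, 3) beats mutual defection (1, 1) for both players.
```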
In the case of the US and Soviet Union, this led both countries to develop huge nuclear arsenals of tens of thousands of
nuclear weapons each, more than enough to destroy
each other many times over. But since both countries had
nukes, neither could use them. And both countries spent around $10 trillion
developing these weapons. Both would've been better
off if they had cooperated and agreed not to develop
this technology further. But since they both acted
in their own best interest, they ended up in a situation
where everyone was worse off. The prisoner's dilemma is one of the most famous games in game theory. Thousands and thousands of
papers have been published on versions of this game. In part, that's because
it pops up everywhere. (bright music) Impalas living between African woodlands and savannahs are prone to
catching ticks, which can lead to infectious diseases,
paralysis, even death. So it's important for
impalas to remove ticks and they do this by grooming, but they can't reach all
the spots on their bodies and therefore they need
another impala to groom them. Now, grooming someone
else comes at a cost. It costs saliva, electrolytes,
time and attention, all vital resources
under the hot African sun where a predator could
strike at any moment. So for the other impala,
it would be best not to pay this cost, but then again, it too will need help grooming. So all impalas face a choice: should they groom each other or not? In other words, should
they cooperate or defect? Well, if they only interact once, then the rational solution
is always to defect. That other impala is never
gonna help you, so why bother? But the thing about a lot of problems is that they're not a single
prisoner's dilemma. Impalas see each other day after day and the same situation keeps
happening over and over again. So that changes the problem because instead of playing the
prisoner's dilemma just once, you're now playing it many, many times. And if I defect now, then
my opponent will know that I defected, and they can use that against me in the future. So what is the best strategy
in this repeated game? That is what Robert Axelrod, a political scientist, wanted to find out. So in 1980 he decided to
hold a computer tournament. - He invited some of the
world's leading game theorists from many different subjects to submit computer programs
to submit computer programs that would play each other. - Axelrod called these
programs strategies. Each strategy would face off
against every other strategy and against a copy of itself, and each matchup would go for 200 rounds. That's important, and
we'll come back to it. Now, Axelrod used points instead of coins, but the payoffs were the same. The goal of the tournament
was to win as many points as possible and in the end, the whole tournament was
repeated five times over to ensure the success was
robust and not just a fluke. Axelrod gave an example
of a simple strategy. It would start each game by
cooperating and only defect after its opponent had
defected twice in a row. In total, Axelrod received 14 strategies, and he added a 15th called Random, which just randomly cooperates or defects 50% of the time. All strategies were loaded
onto a single computer where they faced off against each other. One of the strategies was called Friedman. It starts off by cooperating, but if its opponent defects just once, it will keep defecting for
the remainder of the game. Another strategy was called Joss. It also starts by cooperating, but then it just copies what the other player
did on the last move. Then around 10% of the time,
Joss gets sneaky and defects. There was also a rather elaborate strategy called Graaskamp. This strategy works the same as Joss, but instead of defecting
probabilistically, Graaskamp defects in the 50th round to try and probe the
strategy of its opponent and see if it can take
advantage of any weaknesses. The most elaborate
strategy was Name Withheld, with 77 lines of code. After all the games were played,
the results were tallied up and the leaderboard established. - The crazy thing was that the simplest
program ended up winning, a program that came to
be called Tit for Tat. - [Derek] Tit for Tat
starts by cooperating and then it copies exactly what its opponent did in the last move. So it would follow
cooperation with cooperation and defection with defection, but only once. If its opponent goes back to cooperating, so does Tit for Tat.
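As a rough sketch of how such strategies can be written (a Python illustration under my own assumptions, not Axelrod's original code), each strategy can be a function that sees both players' histories and returns the next move:

```python
import random

PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(mine, theirs):
    # Cooperate first, then copy the opponent's previous move.
    return "C" if not theirs else theirs[-1]

def friedman(mine, theirs):
    # Grim trigger: cooperate until the opponent defects once, then always defect.
    return "D" if "D" in theirs else "C"

def joss(mine, theirs):
    # Tit for Tat, but sneak in an unprovoked defection about 10% of the time.
    if theirs and theirs[-1] == "D":
        return "D"
    return "D" if random.random() < 0.10 else "C"

def play_match(s1, s2, rounds=200):
    h1, h2, score1, score2 = [], [], 0, 0
    for _ in range(rounds):
        a, b = s1(h1, h2), s2(h2, h1)
        h1.append(a)
        h2.append(b)
        score1 += PAYOFF[(a, b)]
        score2 += PAYOFF[(b, a)]
    return score1, score2

# Tit for Tat vs Friedman cooperates all the way: 600 points each.
# Against Joss, one sneaky defection triggers an echo of retaliations.
print(play_match(tit_for_tat, friedman))
print(play_match(tit_for_tat, joss))
```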
When Tit for Tat played against Friedman, both started by cooperating, and they kept cooperating, both ending up with perfect
scores for complete cooperation. When Tit for Tat played against Joss, they too started by cooperating, but then on the sixth move, Joss defected. This sparked a series of
back and forth defections, a sort of echo effect. - Okay, so now you've got
this alternating thing which will remind you
of some of the politics of the world today where we
have to do something to you because of what you did to us. And then when this weird program throws in a second unprovoked
defection, now it's really bad because now both programs are
gonna defect on each other for the rest of the game. And that's also like some of the things that we're seeing in politics today and in international relations. - As a result of these
mutual retaliations, both Tit for Tat and Joss did poorly. But because Tit for Tat
managed to cooperate with enough other strategies, it still won the tournament. - As we are being joined... - Hey my God, there's Professor Axelrod. - Hey, there's Steven Strogatz. - Whoa, what a treat this is. - And I imagine initially it'd be sort of like computer chess where you need a pretty
complicated program to play a sophisticated game. But in fact it was not like that at all. It was the simplest
strategy that did the best. So I analyzed how that happened. - Axelrod found that all the
best performing strategies, including Tit for Tat,
shared four qualities. First, they were all nice, which just means they are
not the first to defect. So Tit for Tat is a nice strategy; it can defect, but only in retaliation. The opposite of nice is nasty. That's a strategy that defects first. So Joss is nasty. Out of the 15 strategies in the tournament, eight were nice and seven nasty. The top eight strategies were all nice and even the worst
performing nice strategy still far outscored the
best performing nasty one. The second important
quality was being forgiving. A forgiving strategy is
one that can retaliate, but it doesn't hold a grudge. So Tit for Tat is a forgiving strategy. It retaliates when its opponent defects, but it doesn't let defections from before the last round
influence its current decisions. Friedman, on the other hand, is maximally unforgiving. - After the first defection from the opponent, it would just defect for the rest of the game. Okay, that's it. No mercy. And that might feel good to do, but it doesn't end up working
out well in the long run. - This conclusion that it pays to be nice and forgiving came as
a shock to the experts. Many had tried to be tricky and create subtle nasty
strategies to beat their opponent and eke out an advantage, but they all failed. Instead, in this tournament, nice guys finished first. Now, Tit for Tat is quite forgiving, but it's possible to
be even more forgiving. Axelrod's sample strategy only defects after its opponent
defected twice in a row. It was called Tit for Two Tats. Now that might sound overly generous, but when Axelrod ran the numbers, he found that if anyone had submitted the sample strategy, they would've won the tournament. - I mean it's so clever, there's so many layers to this story. After Axelrod published his analysis of what happened or circulated it among these game theorists, he said, now that we all know what
worked well, let's try again. - So he announced a second tournament where everything would be the same except for one change: the number of rounds per game. See, in the first tournament, each game lasted precisely 200 rounds. And that is important because if you know when the last round is, then there's no reason to
cooperate in that round. So you're better off defecting. Of course your opponent
should reason the same and so they should also
defect in the last round. But if you both anticipate
defection in the last round, then there's no reason for
you to cooperate in the second to last round or the round
before that, or before that and so on all the way
to the very first round. - And so in Axelrod's tournament, it was a very important thing that the players didn't know exactly how long they were gonna be playing. They knew on average
it would be 200 rounds, but there was a random number generator that prevented them from
knowing with certainty. - Yeah, if you're not sure when it ends, then you have to kind of keep cooperating 'cause it might keep going
and you might need them on your side.
That's right.
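One simple way to make the horizon unpredictable (a sketch under my own assumptions; the video doesn't spell out Axelrod's exact mechanism) is to let each game continue with a fixed probability per round:

```python
import random

# If each round is followed by another with probability w, the number of
# rounds is geometrically distributed with mean 1 / (1 - w). Choosing
# w = 0.995 gives an average of 200 rounds, but no player can tell in
# advance which round will be the last.
w = 0.995

def match_length():
    rounds = 1
    while random.random() < w:
        rounds += 1
    return rounds

lengths = [match_length() for _ in range(10_000)]
print(sum(lengths) / len(lengths))  # close to 200 on average
```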
- For this second tournament, Axelrod received 62 entries and again added Random. The contestants had gotten the results and analysis from the first tournament and could use this information
to their advantage. This created two camps. Some thought that clearly being nice and forgiving were winning qualities. So they submitted nice
and forgiving strategies. One even submitted Tit for Two Tats. The second camp anticipated
that others would be nice and extra forgiving and therefore they
submitted nasty strategies to try to take advantage of
those that were extra forgiving. One such strategy was called Tester. It would defect on the first move to see how its opponent reacted. If it retaliated, Tester would apologize and play Tit for Tat for the remainder of the game. If it didn't retaliate, Tester would defect every
other move after that. But again, being nasty didn't pay. - And once again, Tit for
Tat was the most effective. - Nice strategies again did much better. In the top 15, only one was not nice. Similarly, in the bottom
15, only one was not nasty. After the second tournament, Axelrod identified the other qualities that distinguished the
better performing strategies. The third is being retaliatory, which means if your opponent defects, strike back immediately,
don't be a pushover. Always Cooperate is a total pushover. And so it's very easy
to take advantage of. Tit for Tat, on the other hand, is very hard to take advantage of. The last quality that Axelrod
identified is being clear. - Programs that were too opaque, that were too similar to a random program, you couldn't figure them out because they were so complicated. It was very hard to establish
any pattern of trust with a program like that because you couldn't figure
out what it was doing. Not you, I mean the other programs it was playing couldn't figure it out, and so they would end up more or less defaulting to thinking every turn is like the last time I'm gonna see you. So I might as well defect. What to me is mind-blowing about this is that these four principles, being nice, forgiving, provokable and clear, are a lot like the morality that has evolved around the world that is often summarized as "an eye for an eye". It's not Christianity, by the way. It's not the turn-the-other-cheek philosophy, it's some older philosophy. - What's interesting is that while Tit for Two Tats would've
won the first tournament, it only came 24th in
the second tournament. This highlights an important fact: in the repeated prisoner's dilemma, there is no single best strategy. The strategy that performs
best always depends on the other strategies
it's interacting with. For example, if you put Tit
for Tat in an environment with only the ultimate
bullies of Always Defect, then Tit for Tat comes in last. - I wanted to see whether, for example, the Tit for Tat did well because it did well with really stupid rules that didn't do well at all themselves, that basically it took advantage of people. - So he ran a simulation where successful strategies
in one generation would see their numbers
grow and unsuccessful ones would see their numbers drop. In this simulation, the worst performing
strategies quickly shrink and go extinct, while the
top performing strategies become more common. Harrington, the only nasty
strategy in the top 15, first grew quickly, but then as the strategies it
was preying on went extinct, Harrington's numbers also quickly dropped. This shows a main benefit
of this simulation because it tests how well a strategy does with other successful strategies. After a thousand generations, the proportions are mostly stable and only nice strategies survive. Again, Tit for Tat comes out on top, representing 14.5% of
the total population. Now this process may sound
similar to evolution, but there is a subtle difference, which is that in this case
there are no mutations. So it's actually an ecological simulation.
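A sketch of that kind of ecological simulation (with illustrative numbers and a tiny strategy set of my own choosing, not Axelrod's exact setup) might look like this:

```python
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(mine, theirs):
    return "C" if not theirs else theirs[-1]

def always_defect(mine, theirs):
    return "D"

def always_cooperate(mine, theirs):
    return "C"

def match_score(s1, s2, rounds=200):
    # Total points s1 earns against s2 over one match.
    h1, h2, score = [], [], 0
    for _ in range(rounds):
        a, b = s1(h1, h2), s2(h2, h1)
        h1.append(a)
        h2.append(b)
        score += PAYOFF[(a, b)]
    return score

strategies = [tit_for_tat, always_defect, always_cooperate]
shares = [1 / 3, 1 / 3, 1 / 3]  # initial population proportions
scores = [[match_score(s, t) for t in strategies] for s in strategies]

for generation in range(1000):
    # Each strategy's fitness is its average score against the current mix;
    # its share of the next generation grows in proportion to that fitness.
    fitness = [sum(s * p for s, p in zip(row, shares)) for row in scores]
    total = sum(f * p for f, p in zip(fitness, shares))
    shares = [p * f / total for p, f in zip(shares, fitness)]

print([round(p, 3) for p in shares])
# Always Defect feeds on Always Cooperate at first, but once its prey
# shrinks away, the Tit for Tat players come to dominate the population.
```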
But what if the world you started in was different? - Imagine a world that is a
really nasty place to live, more or less populated with
players that always defect, except there's a little
cluster of tit-for-tat players that live in some kind of nucleus and they get to play with each other a lot because they're
geographically sequestered. They will start building
up a lot of points, and also because that
translates into offspring, they'll start to take over the population. So in fact, Axelrod showed
that a little island of cooperation can emerge and spread and eventually will take over the world, which is fantastic. How can cooperation emerge in a population of players who are self-interested? Who are not trying to be good
because they're good-hearted. You don't have to be altruistic. You could be looking out
for number one for yourself and your own interests. And yet cooperation can still emerge. (bright music) - Some argue that this
could explain how we went from a world full of
completely selfish organisms where every organism only
cared about themselves to one where cooperation
emerged and flourished. From impalas grooming each other to fish cleaning sharks. Many life forms experience
conflicts similar to the prisoner's dilemma, but because they don't interact just once, both can be better off by cooperating. And this doesn't require trust
or conscious thought either because the strategy
could be encoded in DNA, as long as it performs better
than the other strategies, it can take over a population. Axelrod's insights were applied to areas like evolutionary biology and international conflicts, but there was one aspect that his original
tournaments didn't cover. What happens if there is a little bit of random error in the game? Some noise in the system. For example, one player
tries to cooperate, but it comes across as a defection. Little errors like this happen in the real world all the time. Like in 1983, the Soviet satellite-based
early warning system detected the launch of an
intercontinental ballistic missile from the US, but the US
hadn't launched anything. The Soviet system had confused
sunlight reflecting off high altitude clouds
with a ballistic missile. Thankfully, Stanislav Petrov, the Soviet officer on
duty, dismissed the alarm. But this example shows the
potential costs of a signal error and the importance of
studying the effects of noise on those strategies. - The word game sounds
like it's a children's game or, you know, there's something of a misnomer maybe in calling it game theory, because these are life and death matters, obviously. And as you mentioned, this came up in the Cold War. I mean, it could actually be life and death of the whole planet; we could annihilate human civilization. So these are not games in
any kind of trivial sense, it's just the term that is used by mathematicians and theorists. - When Tit for Tat plays against itself in a noisy environment, both start off by cooperating, but if a single cooperation
is perceived as a defection, then the other Tit for Tat retaliates and it sets off a chain of
alternating retaliations. And if another cooperation
is perceived as a defection, then the rest of the game is
constant mutual defection. Therefore, in the long run,
both would only get a third of the points they would get in a perfect environment: one point per round instead of three. Tit for Tat goes from performing very well to performing poorly. So how do you solve this? Well, you need a reliable way to break out of these echo effects. And one way to do this is
by playing Tit for Tat, but with around 10% more forgiveness. So instead of retaliating
after every defection, you only retaliate around
nine out of every 10 times. This helps you break out of those echoes while still being retaliatory enough to not be taken advantage of.
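Here is a hedged sketch of that idea, generous Tit for Tat in a noisy match (the 10% generosity follows the discussion above; the 1% noise level is my own illustrative assumption):

```python
import random

PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
NOISE = 0.01        # chance an intended move comes across as its opposite
GENEROSITY = 0.10   # chance of forgiving a defection instead of retaliating

def generous_tit_for_tat(mine, theirs):
    if theirs and theirs[-1] == "D" and random.random() > GENEROSITY:
        return "D"  # retaliate about nine times out of ten
    return "C"

def noisy(move):
    # With probability NOISE, the move is flipped in transmission.
    return ("D" if move == "C" else "C") if random.random() < NOISE else move

def noisy_match(s1, s2, rounds=200):
    h1, h2, score1, score2 = [], [], 0, 0
    for _ in range(rounds):
        a = noisy(s1(h1, h2))
        b = noisy(s2(h2, h1))
        h1.append(a)  # each side only ever sees the noisy moves
        h2.append(b)
        score1 += PAYOFF[(a, b)]
        score2 += PAYOFF[(b, a)]
    return score1, score2

print(noisy_match(generous_tit_for_tat, generous_tit_for_tat))
# Plain Tit for Tat would get stuck echoing retaliations after a flipped
# move; the occasional forgiveness breaks the cycle and cooperation resumes.
```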
- And so we also ran the tournament with noise and generosity, and that did quite well. My favorite example is Tit
for Tat does really well, but it could never do better than the player it's playing with. - I mean, think about it, by design, all they can do is lose or draw. And yet when the results of all
interactions are tallied up, they come out ahead of
all other strategies. Similarly, Always Defect
can never lose a game. It can only draw or win, but overall, it performs extremely poorly. This highlights a common misconception because for many people when
they think about winning, they think they need to
beat the other person. In games like chess or poker, this is true since one person's gain is necessarily another person's loss, so these games are zero sum. But most of life is not zero sum. To win, you don't need to get your reward from the other player. Instead, you can get it from the banker. Only in real life, the
banker is the world. It is literally everything around you. It is just up to us to find
those win-win situations, and then work together
to unlock those rewards. Cooperation pays even among rivals. From 1950 to 1986, the US and Soviet Union
had trouble cooperating and both kept developing nukes. But then from the late '80s onwards, they started reducing
their nuclear stockpiles. They too had learned
how to resolve conflict. Rather than making an agreement to abolish all nuclear arms at once and essentially turning it into
a single prisoner's dilemma, they would disarm slowly, a
small number of nukes each year and then they'd check each other to see that they had both cooperated and then repeat the year after, and the year after that. All along, checking to
ensure mutual cooperation. In the more than 40 years
since Axelrod's tournaments, researchers have continued to study which strategies perform best in a variety of environments. In doing so, they varied
everything from payoff structures to strategies to errors and more. Some even allowed the strategies to mutate. While Tit for Tat or generous Tit for Tat doesn't always come out on top, Axelrod's main takeaways still
hold: be nice, forgiving, but don't be a pushover. - Can I ask you, why did Anatol Rapoport submit Tit for Tat? - Well, the reason was
because I asked him to. (both laugh) And he wrote saying, yeah, I'm willing to do that, but I just wanna be clear that I'm not sure that this is really such a good idea. He was a peace researcher, and I think his own inclinations were to be much more forgiving and
maybe not be so provokable. - What I find fascinating is
that one of the main things that sets life apart
from non-living things is that life gets to make decisions. We get to make choices. Choices that don't only change our future, but also the future of
those we interact with. You see, in the short term,
it is often the environment that shapes the player, that
determines who does well. But in the long run, it is the players that shape the environment. So let's play a game, the game of life, and make your choices wisely because their impact may
reach further than you think. Using the right strategy matters, but figuring out the
best strategy isn't easy. It requires critical thinking and innovative solutions
like Axelrod's tournaments. If you are looking for an easy way to build your problem solving skills, then check out this
video's sponsor, Brilliant. Brilliant will help make
you a better thinker in everything from math and data science to programming, technology, you name it. You can start right now for free, straight from your device, the device you're watching this on. All you need to do is
set your learning goal and Brilliant will design
the perfect path for you, equipping you with all the
tools you need to reach it. Did you like today's
insights from game theory? Then Brilliant's brand new course, Introduction to Probability, is the perfect next step. Introduction to
Probability is your gateway to mastering the tools of
chance, risk, and prediction. You'll learn how to construct and analyze models of
real world situations from electrons and business
decisions all the way to the 2023 Women's World Cup. Whether you are a budding statistician or you just wanna learn
about randomness and chance, this course will equip you
with the skills you need to make decisions in uncertain situations. You'll even take a page
out of Axelrod's playbook and learn how to build
computer simulations to put your strategies to the test. Beyond probability, Brilliant
has a massive library of content covering everything
from math to data science and programming to technology. What I love about Brilliant is
that each lesson is hands-on, so you'll build real intuition. And the best part is that you can learn with Brilliant on the go. To try everything Brilliant
has to offer for free for a full 30 days, visit
brilliant.org/veritasium and the first 200 of you to sign up will get 20% off Brilliant's
annual premium subscription. So I want to thank Brilliant
for sponsoring this video and I want to thank you for watching.