Exploring the Collatz Conjecture

This is the most dangerous problem in mathematics. One that young mathematicians are warned not to waste their time on. It's a simple conjecture that not even the world's best mathematicians have been able to solve.

Paul Erdős, a famous mathematician, said, mathematics is not yet ripe enough for such questions. Here's how it works. Pick a number, any number. Seven?

Good choice. Okay, we're gonna apply two rules. If the number is odd, we multiply by 3 and add 1. So 3 times 7 is 21, plus 1 is 22. If the number is even, we divide by 2. So 22 divided by 2 is 11. Now we keep applying these two rules.

11 is odd, so we multiply by 3, 33, and add 1, 34. Even, divide by 2, 17. Odd. Multiply by 3, 51, add 1, 52. Even. Divide by 2, 26. Still even. Divide by 2, 13. Odd.

So we multiply by 3, 39, add 1, and that's 40, which is even. So we divide by 2, 20. Divide by 2, 10. Divide by 2, 5. Odd. Multiply by 3, 15. Add 1, 16. Divide by 2, that's 8. And then 4, 2, and 1. Now 1 is odd, so we multiply by 3 and add 1, which equals 4. But 4 goes to 2 goes to 1, so we're in a loop, and the lowest number is 1. Now the conjecture is this.

Every positive integer, if you apply these rules, will eventually end up in the 4-2-1 loop. This is commonly called the Collatz conjecture after German mathematician Lothar Collatz, who may have come up with it in the 1930s. But the problem has many origin stories and many names. It's also known as the Ulam conjecture, Kakutani's problem, Thwaites conjecture, Haas's algorithm, the Syracuse problem, and simply 3n plus 1. Why is 3x plus 1 so famous? Among professional mathematicians, maybe it's not famous but infamous, in the sense that if someone actually admits in public that they're working on it, then there's something wrong with them.

The numbers you get by applying 3x plus 1 are called hailstone numbers. Because they go up and down like hailstones in a thundercloud, but eventually they all fall down to one. Or at least we think they do. You can think of the numbers as representing the height above the ground in meters. So a number like 26 would start 26 meters above the ground, and if you apply 3x plus 1, it rises up as high as 40 meters, and in total it takes 10 steps to get to 1. So 10 is called its total stopping time.

But, Take the very next number, 27, and it bounces around all over the place. In fact, it climbs all the way up to 9,232. As an altitude, that is higher than Mount Everest, before it too falls back to the ground. In total, it takes 111 steps for 27 to get down to 1 and end up in the 4-2-1 loop.

The path... that different numbers take vary so widely, even numbers right next to each other. So how do you even start to make progress on this problem? Well, honestly, mathematicians struggled.

People just decided that this was something invented by the Soviets to slow down US science and it was doing a good job at it because everybody's sitting there twiddling their thumbs and making no progress on this trivial thing that you can tell school children. Jeffrey Ligarius is the world authority on 3x plus one. The first time I met him, I was a senior in college, and he pulled me aside and he said, don't do this.

Don't work on this problem. If you want to have a career, do not start spending time writing about this or publishing any papers about this. Do real math for a while to establish yourself. Alex Kontorovich didn't listen.

He and Yakov Sinai looked at the paths of the numbers. Were there any patterns? Well, obviously all of them end up at 1, but what about the paths they take to get there?

The pattern is randomness. Here is the sequence of a large number chosen at random. The graph peaks, and then drops so low that you can't really see what's happening at this scale.

But if you take the logarithm, you find this wiggly graph with a downward trend. It looks kinda like the stock market on a bad day. And this is no coincidence. Both are examples of geometric Brownian motion.

That means if you take the log and remove the linear trend, the fluctuations are random. It's like flipping a coin each step. If the coin is heads, the line goes up.

Tails, it goes down. 3x plus 1 is just like the random wiggles of the stock market. Over long enough periods, the stock market tends to trend upwards, while 3x plus 1 trends down.

Another way to analyze 3x plus 1 is by looking at the leading digit of each number in a sequence. Here are the hailstone numbers starting with 3 as the seed. And we can count up how many numbers start with a 1, how many start with a 2, how many start with a 3, and so on to make a histogram. We can...

do the same thing for the sequence that starts with four, that's a short one, and for the sequences that start with five, six, and seven. Again, for each sequence, we're just counting up how many numbers start with each digit, one through nine, and adding that to our histogram. If you keep doing this for more and more numbers, eventually the histogram settles into a stable pattern.

For the first billion sequences, you'll find one. is by far the most common leading digit. 30% of all numbers start with 1. Around 17.5% start with 2. 12% start with 3. And the frequency decreases for higher digits. Fewer than 5% of all the numbers start with 9. Now this pattern is not unique to 3x plus 1. It actually comes up everywhere.

From the populations of countries, to the value of companies, All the physical constants and the Fibonacci numbers, just to name a few. The distribution is known as Benford's Law, and it is even used to detect fraud. If all the numbers on your income tax forms obey Benford's Law, then, well, you're probably being honest. If not, you may be hiding something.

In elections, Benford's Law can be used to spot irregularities, though you have to apply it correctly. Benford's Law works best when the numbers involved span several orders of magnitude. as they do for 3x plus 1. But, Benford's Law can't tell us whether all numbers will end up in the 4-2-1 loop or not.

For that, we need a different sort of analysis. At first glance, it seems strange that when you apply 3x plus 1, all numbers should end up at 1. I mean, consider that there are the same number of odd and even numbers, but odd numbers are more than tripled, while even numbers are only cut in half. Therefore, it seems like every sequence on average should grow, not shrink. But here's the catch. Every time you multiply an odd number by 3 and then add 1, it will always become an even number.

And that means the next step is to divide by 2. So odd numbers are not actually tripled by 3x plus 1, they're increased by a factor of about 3 over 2. I'm neglecting the plus 1 because it's insignificant for large numbers. And three halves is actually the most an odd number can grow in one step. Think of the path from one odd number in a sequence to the next odd number. After multiplying by three and adding one, you have an even number.

And 50% of the time, dividing by two brings you back to an odd number. But a quarter of the time, you can divide by four before you get to the next odd number. So for a quarter of numbers, the next one in the sequence will be three fourths of its initial value. An eighth of the time you can divide by 8 before getting to the next odd number, and a sixteenth of the time you can divide by 16, and so on. If you take the geometric mean, you find on average to get from one odd number to the next one, you multiply by 3 over 4, which is less than 1. So statistically speaking, 3x plus 1 sequences are more likely to shrink than grow.

Take 341 for example. Multiply by 3 and add 1, you get 1024, which you can divide by 2, and then divide by 2 again, and again, and again, and again, 10 times in total until you're down to 1. One way to visualize these paths of numbers in 3x plus 1 is simply to show how each number connects to the next one in its sequence. This is called a directed graph.

It looks like a tree, or a series of little streams that flow into each other. If the conjecture is true, It means that every single number is connected to this graph. Every tiny stream all the way out to infinity eventually flows into the massive river of 4, 2, 1. Some mathematicians have modified this visualization by rotating the graph at each number. Anticlockwise if it's an odd number and clockwise if it's even. And then you end up with a structure that looks like a coral or seaweed.

And by adjusting the degree of rotation for odd and even numbers, you can create these beautiful organic looking shapes. Now there are two ways The conjecture could be false. There could be a number somewhere, some seed, that starts a sequence of numbers that grows to infinity. For whatever reason, it doesn't obey the same numerical gravity as all of the other numbers. Another possibility is there exists a sequence of numbers that forms a closed loop.

All the numbers in this loop would be unconnected to the main graph. But thus far, no loop or sequence that shoots off to infinity has been found. And not for lack of trying, mathematicians have tested by brute force every single number up to 2 to the 68th. That's 295 quintillion 147 quadrillion 905 trillion 179 billion 352 million 825 thousand 856 numbers.

We know for certain that every single one of those numbers eventually comes back down to one. We have tested nearly 300 quintillion numbers and none of them. disproves the conjecture. In fact, given this information, mathematicians calculate that any loop other than 4, 2, 1 must be at least a hundred and eighty six billion numbers long.

So it seems pretty likely that the conjecture is true, but this doesn't prove it. One way mathematicians have attempted to prove it is by making a scatter plot with all the seed numbers on the x-axis and a number from each seeds sequence on the y-axis. Now if you can show that in every 3x plus 1 sequence, there is a number that is smaller than the original seed, well, then you have proven the Collatz conjecture. Because whatever number you pick, you know it will at some point get smaller, and that smaller number as a seed also gets smaller, and so on down to 1. Meaning the only way any sequence can end is in the 4-2-1 loop. This has not yet been shown, but In 1976, Rijo Terras was able to show that almost all callout sequences reach a point below their initial value.

In 1979, this limit was reduced with almost all numbers going to less than x to the power of 0.869, and then in 1994, it was further lowered to less than x to the power of 0.7925. In this case, the term almost all numbers has a technical mathematical definition. It means that as the numbers you're looking at go to infinity, the fraction that end up under the curve goes to 1. Then, in 2019, one of the world's greatest living mathematicians, Terry Tao, was able to show 3x plus 1 obeys even stricter criteria. He showed almost all numbers will end up smaller than any arbitrary function f, so long as that function goes to infinity as x goes to infinity. But the function can rise as slowly as you like.

So log x is an example, or log log x works too, or log log log log x. What this means is for almost all numbers, you can guarantee that there is an arbitrarily small number somewhere in its sequence. In a public talk he gave in 2020, Terry Tao said, this is about as close as one can get to the Collatz conjecture without actually solving it. This is an impressive result, but it's still not a proof.

So why can't we prove the conjecture true? Could it be because it's not true? I mean, everyone is trying to prove it true, which means almost no one is looking for counter-examples. It happened to me just two years ago, where there was something I was trying to prove that I was trying for three years to prove.

And... I couldn't get it to work. And then I found a counterexample. And then I realized what the correct statement should have been.

And then a month later, I proved the correct statement. Maybe we should be spending more energy looking for counterexamples than we're currently spending. I mean, remember how the number 27 grows all the way to 9,232?

Here is a plot of seed numbers up to 10,000, with the largest number reached for each seed plotted on the y-axis. The y-axis stops at 100,000, but... Not all numbers can be shown at this scale.

The seed 9,663, for example, climbs as high as 27 million. And as yet, no one has proven why a number couldn't just shoot off to infinity. And it would take only one to disprove the conjecture. Or, some set of numbers could be part of a loop, not connected to the main graph.

As far as we know, there is only one loop, 4, 2, 1. But something strange happens if you include negative numbers. Applying the same 3x plus 1 rules as before, there is not one loop, not two loops, but three independent loops of numbers. And they start at low values like minus 17 and minus 5. Why should there be... Disconnected loops on the negative side of the number line, but not on the positive side.

Now, one of the most convincing pieces of evidence supporting the conjecture is Terry Tao's proof that almost all numbers have a number in their sequence that is arbitrarily small. But proving that almost all numbers obey this criteria isn't the same thing as proving that all numbers do. How many numbers between 1 and 100 are perfect squares? The answer is 10. So... 10% of numbers up to 100 are perfect squares.

How many numbers between 1 and 1,000 are perfect squares? The answer is 31. So only 3.1% of the numbers up to 1,000 are perfect squares. And the higher you go, the smaller this percentage becomes, such that in the limit, you could say almost all numbers are not perfect squares.

The fraction of numbers that are not perfect squares goes to 1 as x goes to infinity. And yet we know there are an infinite number of perfect squares and we know exactly where they are. Now we've tested by brute force all numbers up to 2 to the 68 and they all obey the Collatz conjecture. And you might be thinking that, well, we should have found a counterexample by this point. But on the scale of all numbers, 2 to the 68 is nothing.

I mean the Pauliou conjecture proposed in 1919 by George Pauliou asserted that the majority of natural numbers up to any given number have an odd number of prime factors. The conjecture was eventually proven false by C. Brian Hasselgrove in 1958 when he identified a counterexample.

What's remarkable is the value of this counterexample was 1.845 times 10 to the 361. That's some 10 to the 340 times bigger than all the numbers checked for 3x plus 1. One way to think about 3x plus 1 is as though it's a simple program run on a Turing machine. The seed number is the input. to the machine.

So in this picture, 2 to the 68 is simply an input tape 68 squares long. You can think of them as a string of ones and zeros or black and white squares. Saying that the machine has transformed every input up to this 68 square tape down to one should not give you a lot of confidence that it will do so for all inputs.

In fact, it's fairly simple to calculate a number that shows any arbitrary behavior you like, so long as it is finite. If you want a number that increases by 3 over 2 five consecutive times, you can calculate that number. If you want a number that climbs by 3 over 2 ten times in a row, or a hundred times, or a thousand times, you can easily calculate those numbers. But after the finite section you specify, you have no more control, and every number that has ever been tested always falls to one.

If there is a counterexample, It's virtually impossible that someone's going to guess it. And the space of all possibilities is too big to search exhaustively by brute force. Two to the thousand is not a searchable space. So if we're going to find it, we have to find it by some intelligent process and not by guess and check. I had been on team 3x plus one for 20 years, and then this point of view, I realized that like, what do we really know?

Do we what? It's very hard to prove a theorem that's false. And so, could it be that everyone's struggling to prove this thing because it's not actually true? And 2 to the 60 is not a lot of evidence.

And even the statistical version is maybe true, and not evidence for the non-existence of a divergent trajectory somewhere in the 3x plus 1 sequence. Of course, there is another option, and that is that we'll never know. That the problem is undecidable. In 1987, John Conway created a generalization of 3x plus 1. It was a mathematical machine that he called Fractran, and he was able to show this machine is Turing complete, which means it can do anything a modern computer can do. But it also means that it's subject to the halting problem, a chance that the machine never stops running and so doesn't give you an output.

And this does not prove that 3x plus 1 is also subject to the halting problem, but it is possible that given what we know, we may never prove the Collatz conjecture true. or false. You're gonna be taught in school that we know a bunch of stuff and they're it's a little it's a lie. They're all lies. Here's this stupid little problem.

Come on. Really we can't solve this? Really? You know it just shows math is hard.

If anything it shows that all the things that we can solve are miracles. We have no right to have solutions to all these other problems. For my whole life, I've thought of numbers as these incredibly regular things, full of patterns, symmetry, and repetition.

But what I'm realizing only now is just how peculiar numbers really are. You can see this most clearly in the coral representation. From a simple mathematical operation comes something intricate, organic-looking, and thus far intractable to us.

Do all numbers connect to this structure, or is there some... Unique filament, a spindly little thread that doesn't connect to any of this, that runs off to infinity. And why is it so hard to tell?

I think that's why Paul Erdos said, mathematics is not yet ripe enough for such questions. What I love about 3x plus 1 is it's a problem almost anyone can understand and play around with. And actually trying to figure things out for yourself is the best way to learn.

Which is why I subscribed to Brilliant, the sponsor of this video. Now recently, Brilliant have upped their interactivity. For example, here is a great lesson on the Pythagorean theorem. So you don't just remember the formula, but you really understand what it means. Now Brilliant is a website and an app designed to get you thinking deeply by engaging you in problem solving.

It's one thing to read through a textbook and think that you get it, but it's quite another to play with interactives and actually test yourself as you go. And Brilliant curates the experiences so they get more and more challenging over time. There's always a helpful tip or explanation that takes your understanding to the next level.

I'd highly recommend their course on mathematical fundamentals. which now has even more interactivity and it has topics that are relevant to all areas of STEM. And algorithm fundamentals for anyone interested in coding.

It's great to have a resource like this to inspire you to learn something new every single day. I try to start off my day by challenging myself with Brilliant, and so if you'd like to join me and a community of 8 million other learners, go to brilliant.org veritasium. I will put that link down in the description. So I want to thank Brilliant for sponsoring this video, and I want to thank you for watching.

Transcript for:Exploring the Collatz Conjecture

Transcript for:
Exploring the Collatz Conjecture