Ever wonder if we could
run computers backwards? Today we'll be looking at the most
famous problem in computer science, P vs NP, from this unusual angle. By the end of this video, you'll
understand why the P vs NP question is ultimately about
whether we can invert algorithms. First, we'll meet satisfiability, or
SAT for short, a kind of superhero among algorithmic problems. We know literally thousands of
equivalent formulations of it, and we still have no idea how to solve it efficiently. Next, we'll break a famous cipher
by building a multiplication circuit and using SAT to run it backwards. Finally, we'll see how the idea
of inverting algorithms connects to the most important question
in computer science, P versus NP. Our story starts with the satisfiability
problem, also known as SAT. This is an incredibly powerful problem. We'll see that, if we could
solve it efficiently, we could use it to reverse any algorithm. Let's see what the problem is about. We're given a bunch of boolean
variables, meaning each can be set to either true or false. But since I'm a programmer,
I'll write 1 and 0 instead. We're also given a bunch of logical
constraints on those variables that have to be satisfied. In the general SAT problem, a
constraint can be an arbitrarily complex logical formula like this. But for simplicity, we'll only
consider the so-called CNF SAT. In this variant, each constraint
is a bunch of variables or their negations that are connected by OR. So, for example, this constraint
says that either b must be 1, or c must be 0, or d must be 0. The task is to set the values
of the variables to satisfy all of the constraints. This specific choice works because
if we plug these values into our constraints and simplify, every constraint
evaluates to 1, meaning it's satisfied.
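If you like code, here's a tiny sketch in Python of what this looks like; the instance below is a toy one we made up, not the one on screen.

```python
# A toy CNF instance (ours, not the one from the video): each
# constraint is an OR of variables or their negations, and we need
# every constraint to evaluate to 1 (true).
a, b, c, d = 1, 1, 0, 0      # one candidate assignment

constraints = [
    b or not c or not d,     # the example constraint from above
    a or c,
    not a or not c,
]
print(all(constraints))      # True: this assignment satisfies all three
```

Given a large satisfiability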
problem, solving it, or figuring out that there is no solution, is a
really hard computational problem. But let's make the following
thought experiment. Let's say we do have a
really fast SAT solver. Would such an algorithm have
any interesting implications? The answer is that it would
absolutely destroy any intuition that we have about our universe. To start with the tip of the iceberg,
let's see how we could use this fast SAT solver to break perhaps
the most famous cipher of all, RSA. The security of the RSA cipher rests
on the assumption that factoring, writing an input composite number
as a product of two smaller integers, is a really hard computational problem.
but if we had a really fast SAT solver, we could suddenly solve factoring. Let's see why. First, let's talk about the inverse
problem to factoring, multiplication. Let's say you want to multiply two really
large numbers, like three and five. We can reduce this problem to SAT. We're going to utilize the
long multiplication algorithm you know from school. You've learned it for base 10
numbers, but we'll use it in base 2. So let's convert the numbers 3 and
5 to binary and run the algorithm. We first compute the intermediate
products and then sum them up.
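Here's that idea as a short Python sketch (our own minimal version): each set bit of the multiplier selects a shifted copy of the multiplicand, and those shifted copies are exactly the intermediate products.

```python
# Binary long multiplication, mirroring the circuit we're about to
# build: pick out the intermediate products, then sum them up.
def long_multiply(a: int, b: int, bits: int = 4) -> int:
    total = 0
    for i in range(bits):
        if (b >> i) & 1:          # is the i-th bit of the multiplier set?
            total += a << i       # add the shifted intermediate product
    return total

print(long_multiply(3, 5))        # 15: 0011 * 0101 = 1111 in binary
```

To convert this algorithm to a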
SAT instance, we'll think about it as a so-called circuit. A circuit is simply a bunch of wires
that can propagate a signal and gates that apply simple logical functions. You can fix the value of
each input wire to 1 or 0. Then the signal propagates through the
wires and the gates and you can read the output off of the wires at the end. Coming back to our multiplication
algorithm, this is how you represent it as a circuit. You can see that there are two sets of
input wires that hold two 4-bit numbers. We can now encode the numbers 3 and 5
as inputs to the circuit and let it run. The first part of the circuit uses AND
gates to compute the intermediate products. You can see how the values of the gates
correspond to the intermediate values in the long multiplication algorithm. The second part of the
circuit sums those results up. Each row of adder gates adds the next
intermediate result to the total. Finally, we can look at the
output wires to get the result and find that 3 times 5 equals... 15! This special plus gate in the
circuit is just a shorthand to make the circuit more readable. The gate performs a
simple addition in binary. This is how you can implement it
using elementary logical gates.
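If you're curious what's inside, here's one adder cell as a Python sketch; we're using the textbook full-adder construction, which may be wired slightly differently than the gates drawn on screen.

```python
# A standard full adder (our sketch): XOR computes the sum bit,
# AND/OR compute the carry bit.
def full_adder(a, b, carry_in):
    s = a ^ b ^ carry_in                        # sum bit
    carry = (a & b) | (carry_in & (a ^ b))      # carry-out bit
    return s, carry

print(full_adder(1, 1, 0))   # (0, 1): 1 + 1 = 10 in binary
```

You can pause to work out how the other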
gates and the overall circuit work, but the exact details are beside the point. What matters is that we can
convert the long multiplication algorithm to a circuit. Now, let's see how we can convert
a circuit to a SAT instance. Let's take a simpler example first. We create a variable for
each wire in this circuit. Next, consider any gate like this one that says
that the value of f is equal to c AND e. We can encode the logic of this gate
with the following three constraints. Let's start with the first one. On the one hand, if either c or e
are set to 0, the constraint is going to be satisfied because not 0 is 1. On the other hand, if both c and e are set
to 1, the constraint becomes 0 or 0 or f. So the only way to satisfy
it is to set f to 1. If you think about it, this constraint
encodes the fact that if both of the inputs to the AND gate are
1, the gate will also return 1. Similarly, the other two constraints
encode the logic of the AND gate when one of the inputs is 0. We can do the same thing for all gates.
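If you want to double-check that, here's a tiny brute-force verification in Python (our own, not from the video): the three constraints hold exactly when f equals c AND e.

```python
from itertools import product

# Check all 8 assignments: the clauses (not c or not e or f),
# (c or not f), (e or not f) are satisfied exactly when f == c AND e.
for c, e, f in product([0, 1], repeat=3):
    clauses = (not c or not e or f) and (c or not f) and (e or not f)
    assert bool(clauses) == (f == (c & e))
print("The three constraints encode f = c AND e.")
```

And by taking all these constraints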
together, we've managed to represent the logic of the
original circuit as a SAT problem. Let's do the same for our
multiplication circuit. In this case, we need around 80
variables and 400 constraints. What happens if we feed
this into a SAT solver? Well, the solver ensures that all
the values on the wires correspond to the circuit being run on some input. Remember that SAT problems
can have multiple solutions. Our constraints don't specify any
particular input to the algorithm, so there are many solutions, each
one corresponding to the algorithm being run on some two numbers. The solver can return any one of these. Let's narrow this down by
adding constraints that specify the input to the algorithm. To multiply, say, 5 and 3, we first
convert those numbers into binary and based on that we put additional
constraints on the input wires that just say that wire x1 is 0,
x2 is 1, and so on up to wire x8. If we run our SAT solver now
with these new constraints, only one solution is valid: the one that corresponds to running
the multiplication circuit on the input numbers we specified. If we then look at the variables that
encode the output wires, we can find the value of the product, in this case 15.
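Here's the same pipeline in miniature, assuming the python-sat package (pip install python-sat); we encode just one AND gate, with variables 1 = c, 2 = e, 3 = f, fix its inputs with extra unit constraints, and read the output off the solution.

```python
from pysat.solvers import Glucose3

# One AND gate as CNF, then "run" it forwards by fixing the inputs.
solver = Glucose3()
solver.add_clause([-1, -2, 3])   # not c or not e or f
solver.add_clause([1, -3])       # c or not f
solver.add_clause([2, -3])       # e or not f

solver.add_clause([1])           # fix input c = 1
solver.add_clause([2])           # fix input e = 1
solver.solve()
print(solver.get_model())        # [1, 2, 3]: the output wire f is 1
```

Okay, maybe multiplying numbers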
using SAT is a bit underwhelming, but the real fun starts when we realize
we can also do the opposite thing. Instead of adding constraints that specify
a particular input to the algorithm, we can also fix a certain output. For example, if we want to force the
output of the circuit to be 15, we can do it in the exact same way as with the
input. We just fix the values of the output wires instead of the input ones.
solutions to the SAT problem to only the ones where the output is 15. In other words, we're solving
the question: if we know the output of the circuit is 15,
what does its input have to be? That is, we're solving factoring.
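In miniature, with the same single-AND-gate instance and the python-sat package as before: this time we fix only the output and let the solver reconstruct the inputs.

```python
from pysat.solvers import Glucose3

# The same gate, run "backwards": fix the output, recover the inputs.
solver = Glucose3()
for clause in [[-1, -2, 3], [1, -3], [2, -3]]:
    solver.add_clause(clause)

solver.add_clause([3])           # fix output f = 1
solver.solve()
print(solver.get_model())        # [1, 2, 3]: inputs c and e must be 1
```

There might be multiple possible inputs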
leading to the same output, and the SAT solver will just choose any one of these. For example, in this case,
we found that 15 equals... 1 times 15. Well, that's not very helpful, is it? When we're factoring, we also
need to specify that both of the numbers must be bigger than 1. We can ensure that's the case by adding
a few extra gates to the circuit. It would be a bit too messy to show that
here, so here they are in isolation.
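One simple way to do it (our encoding; the gates in the video may differ): for a 4-bit number with bits x1 through x4, x1 being the least significant, "this number is bigger than 1" is the single constraint x2 or x3 or x4, saying some higher bit is set.

```python
# Quick check over all 4-bit values that "x > 1" is exactly the
# clause (x2 or x3 or x4).
for x in range(16):
    x2, x3, x4 = (x >> 1) & 1, (x >> 2) & 1, (x >> 3) & 1
    assert (x > 1) == bool(x2 or x3 or x4)
print("x > 1 holds exactly when the clause (x2 or x3 or x4) does.")
```

Alright, we run the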
circuit backwards again, and there we go, we've discovered
that 15 equals 5 times 3. Brilliant. And if we try to do the same thing for
the number 13, the SAT solver simply tells us that our instance is not satisfiable,
which is because 13 is a prime number. Can we use our trick on
larger numbers to break RSA? That would require factoring
numbers of size about 10^500. Using our trick, we would get
instances of satisfiability with about 50 million variables. That's way too much for
existing SAT solvers, so we didn't break RSA, fortunately. But the point is that if we did have
a solver that could run in, say, linear or quadratic time, we would
be able to use it to break RSA. So an efficient SAT solver would mean
breaking RSA, which is, of course, already a massive consequence. But if you think about it, the
power goes far beyond that. If you look back at how we converted
factoring to SAT, the only special property of that problem was that you can use a
small circuit to multiply two numbers. SAT then enabled us to, in a sense,
run the multiplication circuit backwards and solve factoring. Of course, you can't literally run
a circuit backwards; that's why the satisfiability problem is so hard. It's a bit more precise to think
of this as inverting functions. If you think about the original circuit
as computing some function, our trick allows us to represent the inverse
of that function as a SAT problem. If there are multiple inputs leading
to the same output like here, our trick leaves it to the SAT solver to choose one. At this point, it's really important
to understand that any algorithm can be represented as a circuit. We wrote a proof of that in our linked
blog post, but if you don't want to read it, just look closely at your CPU. Ultimately, it's just a
bunch of wires and gates. So, whenever we have an algorithm,
we can think of it as a circuit and then use SAT to invert it. So, instead of answering the question
"if I run the algorithm on a given input, what output do I get?", we
can encode the question "if I want a certain output, what input do I
need to feed into the algorithm?" This is an incredibly powerful
trick, so let's see some more examples of how it can be used. For example, take the 3-coloring problem
where you are given a graph and you're supposed to color each node in one
of three colors such that every two neighboring nodes have different colors. So, this would be a valid coloring, but
if we change the color of this vertex, it's no longer valid because now it
shares a color with one of its neighbors. When we have a proposed solution, it's
easy to check whether it's correct. We just iterate over all the edges,
and for each one, we make sure that the two sides have different colors.
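That checker is genuinely tiny; here it is as a Python sketch (our version, with made-up names).

```python
# The 3-coloring checker: reject if any edge has the same color
# on both sides. color maps each node to 0, 1, or 2.
def is_valid_coloring(edges, color):
    return all(color[u] != color[v] for u, v in edges)

triangle = [(0, 1), (1, 2), (0, 2)]
print(is_valid_coloring(triangle, {0: 0, 1: 1, 2: 2}))  # True
print(is_valid_coloring(triangle, {0: 0, 1: 0, 2: 2}))  # False
```

So again, we can think of this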
checking algorithm as a circuit. Oof, the circuit looks even messier
than the multiplication one. Let's at least fade some of these
wires out so that we can see it better. I won't go into the details here, but
the basic idea is that we have some gates for every node and for every
edge, and together, these check that the coloring is valid on all edges. Then, there's this single output wire
coming out of this messy circuit. It tells us whether all of the gates
output 1, in other words, whether the input is a valid coloring or not. When we run the circuit on this faulty
coloring, we see that the output is 0. It's because this NAND gate is not
happy about these two vertices that are adjacent and have the same color. Then, we encode it into a SAT
problem and add a constraint saying that the output must be 1. This way, we've managed to
invert the checking algorithm. The SAT instance now asks to
find a set of inputs that the coloring checker will accept.
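To make that concrete, here's a hedged sketch that skips the circuit and encodes 3-coloring into SAT directly, with a standard textbook encoding and the python-sat package as before; the variable for "node n has color c" gets index 3n + c + 1.

```python
from itertools import product
from pysat.solvers import Glucose3

# Direct 3-coloring encoding (our sketch, not the video's circuit).
nodes, colors = [0, 1, 2], [0, 1, 2]
edges = [(0, 1), (1, 2), (0, 2)]                      # a triangle
var = lambda n, c: 3 * n + c + 1                      # "node n has color c"

solver = Glucose3()
for n in nodes:
    solver.add_clause([var(n, c) for c in colors])    # each node gets a color
for (u, v), c in product(edges, colors):
    solver.add_clause([-var(u, c), -var(v, c)])       # neighbors never share c

if solver.solve():
    model = set(solver.get_model())
    print({n: next(c for c in colors if var(n, c) in model) for n in nodes})
```

In simpler words, we've encoded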
the problem of solving 3-coloring. In general, think of any problem like
coloring where you can quickly check whether a proposed solution is correct
or not, using a verification algorithm. We can use our trick to encode the
inverse function as a SAT problem. And this inverse function maps the
output YES to a valid solution. This gives us an incredibly
general way of solving problems. We can now finally connect
everything to P versus NP. The theoretical definition of a
fast algorithm is that its time complexity is a polynomial function. The class of all problems where
you can verify proposed solutions in polynomial time is called NP. It contains most of the important computer
science problems, including coloring, factoring, and even much simpler
ones like multiplying two numbers. The trick of converting
checking algorithms to SAT works in polynomial time. This means that a SAT solver
that also runs in polynomial time would imply a polynomial-time
solution for any problem in NP. But here's a fun fact: the satisfiability
problem itself is in the class NP. A problem is in NP if there
is a polynomial-time algorithm for checking solutions to it. And for SAT, checking if a
solution is valid is easy. You just iterate over all the constraints
and verify that all of them are satisfied. So you can view satisfiability
as the hardest problem in NP, because once you've solved SAT, you've
solved everything in this class. Formally, we say that a problem
is NP-complete if it's both in NP and any other NP problem can be
reduced to it in polynomial time. So using our trick of converting
algorithms to SAT, we can prove that satisfiability is NP-complete. This is one of the most important
results of theoretical computer science, and it was proven more than 50 years
ago by Steve Cook, and independently by Leonid Levin, who you may remember
from one of our earlier videos. And it gets weirder. Soon after Cook proved that SAT is
NP-complete, another computer scientist named Richard Karp published a famous
paper in which he basically said: "Hey, I looked around at about 20 other problems
in NP, like the coloring problem or the traveling salesman problem, and all
of these problems are NP-complete too." He proved this using so-called reductions.
the factoring problem to satisfiability, you can start from SAT and convert it
to other problems like graph coloring. Soon after Karp, researchers identified
thousands and thousands of additional NP-complete problems covering all
areas of computer science, like graph theory, optimization, machine learning,
bioinformatics, and even playing games. Remember, NP-completeness means that if
you have a polynomial algorithm for any NP-complete problem, you have one for
all other NP-complete problems as well. So in some sense, it's debatable whether
NP-complete problems are even different. They're all just different formulations
of a single problem: satisfiability. All of this raises the question:
is there a polynomial-time algorithm for satisfiability? This is the famous P versus NP problem. At this point in time, the best SAT
solvers are, in the worst case, not much faster than just trying all possible
assignments, which takes exponential time. You can solve small instances of SAT
using this brute force approach, but already in the 60s, researchers pointed
out that you can't solve big instances of SAT if you just try and try and try.
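Here's that brute-force baseline as a Python sketch: checking one assignment is the easy, polynomial part; the loop over all 2^n assignments is what blows up.

```python
from itertools import product

# Brute-force SAT: clauses are lists of signed integers, so
# [2, -3, -4] means (b or not c or not d).
def brute_force_sat(num_vars, clauses):
    for bits in product([False, True], repeat=num_vars):
        assignment = dict(enumerate(bits, start=1))
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return assignment
    return None  # nothing worked: the instance is unsatisfiable

print(brute_force_sat(3, [[1, 2], [-1, 3], [-2, -3]]))
```

And that makes a lot of sense, because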
this whole video explains how a fast algorithm for satisfiability would
have so many wild consequences. We have this intuition that
there is a huge difference between verifying a potential
solution and solving the problem. Equivalently, we think that if there
is a fast algorithm implementing some function, there doesn't necessarily have
to be one for the function's inverse. A polynomial-time SAT solver would
shatter this intuition, and it would mean fast algorithms for literally
thousands of NP-complete problems. It would also imply that
cryptography is impossible. That's not just because we would
break RSA, as we mentioned. Think also of hash functions. Such functions are often called
one-way functions, because they're literally designed to be easy to
compute, but very difficult to invert. You can think of these
functions as blenders. Will it blend? That is the question. They perform complicated operations on
the input bits, making it difficult to reverse them if you only have the output. It's analogous to the notion of entropy
and the second law of thermodynamics. It seems that it's easier to make a
mess than put the pieces back together. Unfortunately, the difference between
physics and computer science is that we can't simply declare P does not equal NP as
a law of nature and claim our Nobel Prize. As far as we know, fast SAT solvers could
be mathematically possible, and so we might be able to invert any messy algorithm. We just think it's very unlikely. But in that case, why don't
we have a mathematical proof that P does not equal NP? The big issue here is that while we
have a lot of tricks for coming up with fancy new algorithms to solve
problems, we don't really have good ways of proving that a certain problem
can't be solved with a fast algorithm. So we suspect that P does not
equal NP, but also that proving it is really, really hard. I think this is a good intuition for what
the P versus NP problem is really about. The heart of the problem is
whether we can invert algorithms. This is of course closely connected to
the standard way of framing the question, namely whether we can efficiently
solve problems if we're able to efficiently verify proposed solutions. That's because inverting a
checking algorithm enables us to solve the original problem. I hope you find this
intuition helpful too. If you have a question,
just leave us a comment. Also, check out our blog post where
we correct some of the inaccuracies that we made here for the sake of
clarity, and we also explain some things that we considered too crazy
even for our polylog standards. That includes how you can explain the
success of deep learning by viewing backpropagation as an algorithm
that can kind of invert circuits. If you enjoy our videos,
please like and subscribe. Ah, that's so cheesy, isn't it? If you enjoy our videos, like
and subscribe, and consider supporting us on Patreon. I'll see you next time.