Monday, April 23, 2007

P vs NP & NP-completeness

Michael Sipser of MIT gave a very clear - and perhaps too elementary - colloquium about P vs NP and NP-completeness. Let me make a short introduction to these questions.

P and NP problems

Use a computer to multiply two numbers, 7x13 = 91, or some longer numbers. It will be done "quickly". For large enough input with "N" bytes, the required number of operations (time) will go like a power law, "N^a", where "a" is an exponent. These are the P (polynomial) problems: they are quickly solvable.

Then there are possibly harder problems that may be more time-consuming but if you have a solution, you can quickly prove that it satisfies the conditions. This class of problems is called NP (non-deterministically polynomial) and includes these seemingly difficult problems:
  • find the best schedule for students
  • find the factors whose product gives you a large number with "N" digits
  • find the clique of size "M" (a number comparable to "N"); a clique is a subgraph where all vertices are connected with each other
  • compute the permanent of a "N x N" matrix (determinant without the minus signs)
  • protein folding - minimize the energy by folding a protein
  • find a proof of a true theorem, a proof that has "M" characters
  • find a vacuum with the smallest positive cosmological constant in some toy models of the landscape with "exp(N)" vacua
and many others. (About two entries were added by me.) Once you find a solution, you can check that it is a solution in polynomial time, "N^a". But apparently, the only way to find such a solution is to search through a significant part of possibilities which takes time longer than any power law, namely time such as "exp(N)" or even "N! ~ (N/e)^N".

NP problems are "quickly verifiable". You can easily see that P problems are a subset of NP problems: if a problem is quickly solvable, it must also be quickly verifiable. But is it a proper subset? Can't they be identical sets?


When was this science invented? It was around the mid 1960s, independently by Russian and American authors. When you have a Russian and an American who invent the same thing at the same time, you're usually not getting the whole story. If I tell you that it was all invented 10 years earlier, what was the citizenship of the inventor? Yes, you're right. It had to be a Czech. In fact, he was not just Czech: he was a German Czech at the IAS. ;-)

I claim that the description "German Czech" matches the U.S. standards - it is constructed just like e.g. "African American" - but no sane Czech person with a possible exception of your humble correspondent would ever seriously call Kurt Gödel a "Czech". ;-)

Recent analyses of John von Neumann's archive showed that in 1956, Kurt Gödel sent a letter to (ailing) von Neumann (both IAS Princeton) asking how much time "phi(N)" one needs to find a mathematical proof of a valid assertion, a proof that is known to contain "N" characters. It could be a power law, he speculated. This fact if true would have profound consequences for mathematics. He has also sketched other problems where similar questions can be asked. Gödel was clearly interested in the issue of finding proofs but his comments were more general and the other guys just repeated it 10 years later.


The most important later development was NP-completeness. There exists a large and important subclass of NP problems that are exactly as difficult as each other. This NP-complete class contains many granddaddies - but the clique problem is one of them. This really means that a difficult, NP-complete problem can be converted to a clique problem (or one of many other granddaddies that are dual to each other).

The set of NP problems also contains problems that are not proven to be NP-complete such as the factorization problem. A factorization homework can be transformed into a clique problem which means that factorization problems can't be more difficult than the clique problem: but they can be easier although there exists no proof whether they're easier in either way.

Clique problem

Imagine that you have a group of scientists (vertices of a graph) with the information which pairs have shared a paper (links in this graph). Your task is to find the largest subgraph (subgroup of scientists) in which everyone wrote a paper with everyone else.

In paleoclimatology, it is clearly a P problem. The fast solution to identify the largest clique is to find Michael Mann in the graph, list all of his co-authors, sort them by a number of papers co-authored with Mann, and take the upper "M" of this list that still form a clique, if you want to get the largest clique in the graph.

This algorithm was discovered by Dr. Edward Wegman, a statistician who was asked by by the U.S. lawmakers to investigate similar problems connected with that particular bizarre clique generating a remarkable large number of flawed papers, especially about broken hockey sticks.

P=NP or not P=NP

However, the real question is whether this problem is quickly solvable for random large graphs outside paleoclimatology. If there is a solution that avoids the brute force search, then P=NP. Because such an identity would allow one to crack all possible codes, solve scheduling problems, identify cliques, calculate permanents, and even find proofs of true mathematical statements automatically and quickly, it is generally expected that P=NP is not true.

However, there are examples where similar intuition is wrong, such as the primarily test. One can quickly test whether a number with "N" digits is a prime by using Fermat's little theorem. It takes much less time than testing all possible factors up to those with "N/2" digits. The Lucas-Lehmer primality test is a similar example of this dramatic speed-up although it only works for Mersenne numbers.

NP-completeness means that if you find a universal fast (power law time) solution of one of those hard but easily testable problems, you can prove that you have solved all of them. Not all NP problems are NP-complete, but many of them - such as the clique problem - are.

Sipser continued with some additional questions whether we can check that someone knows a solution to a problem such as the isomorphism of two graphs. These are toy models of some concepts that are important in contemporary cryptography - such as the methods for someone to prove that she knows a password without revealing what it is.

Finally, he also mentioned how different levels of complexity that require memory or time scaling as various, generally different functions of "N" can be related to each other, suggesting that these relations could have something to do with the relations of space and time in physics. Well, I would guess that probably not but everyone is free to try to find this relation with physics, of course. ;-)

And that's the memo.

No comments:

Post a Comment