Wednesday, April 22, 2015

Lack of surprises in discrete mathematics?

Surprises exist, are bound to become important, and therefore cannot be neglected

On his blog, in the announcement called "Is There Something Mysterious About Math?", Scott Aaronson of MIT has announced a longer essay on the "Ideas.Aeon.CO" server
Q: Is there something mysterious about mathematics?

A: Why isn’t it more mysterious?

That text is sort of provocative.

First, Aaronson points out that it isn't sufficiently fulfilling to fight a straw man because he is filled with the air. However, your humble correspondent provides him with a more attractive straw man – one that is filled with flesh and blood. As you know, many people just love to constantly fight straw men with flesh like myself – because such straw men are actually right and many people simply love to be wrong and to be proud about it.

The first general insight I am credited with is the following principle:
Discrete math is just a disorganized mess of random statements.
It was taken somewhere from my remarks about \(P\neq NP\) or \(P=NP\) and compactified. Note that I have repeatedly made the statements that "discrete math" doesn't admit any partial evidence. If you have found neither a proof nor a disproof of a statement, you should remain about 50-50 open-minded about the validity of the claim.

One must be careful about what I mean by "discrete math", however. Roughly speaking, it must be a part of mathematics with no links to the continuous structures that are important in physics; and we must discuss questions such as \(P=NP\) which are unique and can't be viewed as the 106th variation of problems that have been solved.

Scott Aaronson, an electrical engineer, starts by the question whether mathematics is mysterious. Well, there are lots of unsolved problems. They are usually propositions that may be right or wrong and we don't have any rigorous proof in one way or another. However, in most cases, we actually do have some opinion "what is the likely answer" – it's the answer that avoids the shocking surprises – and we just don't possess a rigorous proof that the shocking surprise is indeed non-existent.

This is a fair summary I agree with – Aaronson's overall picture isn't stupid, at least not for a plumber at MIT.

So Fermat's Last Theorem should have been believed to be "probably right" – because it's been proven for many exponents and in similar situations and it would have been surprising if there were suddenly a (necessarily very large) exponent which would provide us with a counterexample. Later, the theorem was proven. The equal percentage of digits \(0,1,\dots 9\) in the decadic form of \(\pi\) hasn't been rigorously proven but it seems to hold for quite some time, it seems to be a matter of common sense because \(\pi\) is a "random" sequence of digits, and an unequal representation of the digits would look like a giant conspiracy.

Similarly, there exist partial proofs of the Riemann Hypothesis. I am confident that it's true although like others, I haven't found a rigorous proof yet. Too many roots seem to lie on the critical line. Roots away from the line would mean some non-randomness or clumping of primes and it doesn't seem to be there. If it were there, it would probably have manifested itself a long time ago but it didn't.

However, there are mathematical propositions where one should be about 50-50 open-minded. Before some first sensible insights were made about the monstrous moonshine, the appearance of the number \(196,883+1\) at two very different corners of mathematics could have been a coincidence, or it could have had a reason. It is such a big number that the "coincidence" would require a random event whose probability is something like \(N/196,883\) where \(N\) is the number of "similarly fundamental" places in mathematics that produce an integer constant.

Of course, we've known for decades that it is no coincidence. The number appears at two very different places (an expansion of the elementary \(j\)-function of a complex variable, and a representation of the monster group) and a string-theoretical CFT proves why it is so. This insight also shapes our expectations about similar "moonshines". Of course, experts have generally believed that a similar explanation exists for similar coincidences, and some of the new ones have been found, too.

But I think it's fair to say that \(P=NP\) is an example of a mathematical proposition that "doesn't have any true precedent" which means that it is the unique problem "in its own class" and there is no "partial evidence". We are effectively asking about the existence of one algorithm that has a certain speed and that solves something that is effectively one particular problem (because proofs that this problem is complexity-wise equivalent to many others are known), let's say the travelling salesman problem. Such an algorithm may exist (in which case it may be found in 2350, earlier, later, or never) and it doesn't need to exist. So just like Walter Wagner said about the statement "LHC will destroy Earth" :-), the probability has to be 50-50.

Just to be sure that the smiley doesn't get lost, I think that Walter Wagner revealed his idiocy because there exists overwhelming evidence – evidence at many different levels – that the LHC can't destroy the Earth. But there is no analogous evidence relevant for the question whether the algorithm above exists.

Try to compare the existence of the fast algorithm for the travelling salesman problem with Fermat's Last Theorem, for example. A good analogy could be that this hypothetical "fast algorithm" would be similar to a "freak exponent" for which there exists a counterexample to Fermat's Last Theorem. But with this analogy in mind, the situations are not analogous at all. We have a proof of Fermat's Last Theorem and even before we had it, we could eliminate all the possible candidates for the "freak exponents" up to some very large number.

But we just don't have the analogous evidence in the \(P=NP\) case. Any future candidate algorithm that may solve the travelling salesman problem is a pretty special, clever structure. We know some algorithms how to solve this problem. All of them are "slow", in agreement with \(P\neq NP\). But have we looked at sufficiently many candidate algorithms to eliminate the possibility that there exists a more clever, parameterically faster algorithm? I don't think so. Because all of the algorithms we know are basically the same – they are all about the brute force – maybe we should say that we have only tested one algorithm for the travelling salesman problem, and this candidate was "slow". It's like having verified one exponent for which Fermat's Last Theorem holds. It's not a significant amount of evidence (especially because the exponents \(1,2\) allow infinitely many "counterexamples", so you could even argue that by the heuristic "induction", counterexamples should exist for all exponents \(3,4,5\) just like they exist for the exponents \(1,2\); exactly the converse holds). Before Fermat tried to intensely look for solutions to his equations with exponents greater than two, he should have believed that the probability of his "theorem's" statement was comparable to 50%, and that's exactly the situation in which \(P=NP\) is today.

OK, I have discussed this problem many times. Many wise experts in Aaronson's field agree with me, many other, almost equally wise ones prefer their \(P\neq NP\) group think. Instead, I want to focus on a new twist in Aaronson's essay – which may be summarized e.g. by his last sentence
But I feel confident in saying that, yes, there is something mysterious about math, and the main thing that’s mysterious is why there isn’t even more mystery than there is.
I think that I agree with this conclusion – but only if the "measure" (which decides whether the amount of mystery is high or not) is chosen in a certain way that isn't terribly useful for the mathematics research. And I also think that the very general philosophy that Aaronson had to adopt before he came to the conclusion in the quote confirms my principle about the desirable open-mindedness concerning problems in "discrete math". Why?

Take Aaronson's comments about the Riemann Hypothesis. He says that the alignment of the nontrivial zeroes on the \(i(s-1/2)\in\RR\) axis is a "mysterious pattern" but it's needed for an even more "mysterious pattern" – non-randomness and hierarchical clumping in the distribution of primes – to be absent.

Well, I think that the Riemann Hypothesis is probably true which also means that the \(i(s-1/2)\in\RR\) pattern cannot be mysterious to me as long as I am rational. And if the Riemann Hypothesis is true, there is nothing mysterious about it. I may easily rationalize it. We can prove that all the nontrivial zeroes of the zeta-function are symmetrically distributed with respect to the critical axis. It means that if there were zeroes that violate the Riemann Hypothesis, they would have to come in pairs.

The imaginary part of \(s\) of a zero of the zeta-function is a "random real number". But if there were zeroes away from the critical axis, there would be at least one pair of zeroes whose imaginary parts are equal (and their real parts are also basically the same – the distances from the real axis coincide). This equality (or these equalities) would be "non-mysterious" from the functional viewpoint: we may prove the symmetry formula for the zeta-function. But in the perspective involving number theory (the distribution of primes), the equality of the imaginary (and real) parts of two zeroes would look like the equality between two random real numbers which seems infinitely unlikely.

For those reasons, it seems comprehensible to me that all the nontrivial roots may prefer to stay on the critical axis.

But I didn't really want to end up with this conclusion. My point is that if you don't think rationally or creatively enough, the alignment of the zeroes on the critical axis may look like a miraculous pattern to you. The expectation that such an alignment along the critical axis shouldn't exist may be based on "vague analogies" with other problems that you know.

As our understanding of the mathematical structures deepen, we learn that some problems that were thought to be analogous differ in important ways, and may lead to the opposite answers; while other problems that were thought to be unrelated are actually deeply interconnected if not equivalent. We are learning. But unless we have a rigorous proof, it is always possible that there are "important exceptions" in the classes of structures and problems that we consider "analogous".

For certain reasons, I think that these problems are analogous to the problem of ethnic minorities. Let me explain why. Take some territory, e.g. Czechoslovakia of the 1930s to avoid "currently hot" disputes of the present. Most people in Czechoslovakia preferred to speak Czech or Slovak. But you shouldn't overgeneralize: there was a rather well-defined, not too fractal territory, the Sudetenland, where most people preferred to speak German.

Your idea that "everyone speaks Czech" would be too inaccurate and if you were learning about the country, you would have learned about the German-speaking minority. But if you were learning about it even more than that, you would have figured out that "the Sudetenland speaks German" is inaccurate as well because there are still Czechs there – who are a minority within the population of the Sudetenland itself.

And so on.

My point is that in mathematics, there are subclasses of problems and structures that are "exceptions", and within those smaller classes, you find "exceptions of exceptions", and this chain may continue – in the most general case, it probably continues indefinitely. When you land at a not too well-defined but "not quite randomly chosen" point of the Sudetenland, your chances that you first meet a Czech speaker are about the same as the chances for a German speaker. You may run into a guy who is an "exception" in Czechoslovakia, or an "exception in the Sudetenland" i.e. an "exception for an exception in Czechoslovakia", and so on.

If you were given a truly "random" person in the Sudetenland of the 1930s, according to the uniform egalitarian distribution, it would make sense to say that "German" is more likely than "Czech". But there was no random generator with a uniform distribution. There could have been a good reason why you landed there. You may have been ordered by your Czech allies to deal with the Henlein Nazis, for example. In a similar way, \(P=NP\) was in no way chosen as a "truly random proposition" in a large class whose most members are known to be false propositions. \(P=NP\) is a special problem, one without any true precedents. To clump it with other problems is likely to be misleading.

I could speak for hours or days about the "exceptions" and "surprises" that people had to recognize in mathematics when they were learning things at an ever deeper level.

A simple example. Take \(\cos x\). It is a complicated, transcendental function of a sort. \(\cos x\) is almost certainly transcendental for a rational \(x=p/q\) if \(x\neq 0\). If the function can't be calculated in some simple way, using fractions (or square roots etc.), it probably means that the probability is "infinitesimal" that \(\cos x\) will be a simple number.

But you could expect the same for \(\cos(\pi p/q)\), too. Only for \(x\) which is a multiple of \(\pi/2\), the cosine is either \(0\) or \(\pm 1\). But you may learn that \(\cos(\pi/4)=1/\sqrt{2}\). You view it as a single exception but then you learn about \(\cos \pi/3=1/2\) and in fact, infinitely many values of \(x\) in the period which are rational multiples of \(\pi\) may be expressed by similar simple square-root-based formulae.

These are simple things with a geometric interpretation and everyone who understands high school mathematics will find them understandable. Even though \(\cos(\pi p/q)\) may look like random numbers that have no reason to be of the form \(p_2/q_2+\sqrt{p_3/q_3}\), infinitely many such values of cosine exist. In this sense, the values of \(\cos x\) for "simple" values of \(x\) aren't quite "random" numbers.

But there exist many much more abstract surprises or examples when experimental mathematics fails. Take\[

v=\sin (\pi\cdot \exp (\pi\cdot \sqrt{163}))

\] from the blog post Is Nature natural? (and Some numerology). Now we have a sine of a multiple of \(\pi\) but the coefficient isn't rational. Instead, it's the exponential of some crazy \(\pi\) multiple of a square root, one of \(163\), to make things worse.

It looks like some absolutely random gibberish. The exponential – the coefficient in the sine – may be proven not to be rational. So you expect the number \(v\) to be of order one. Your calculator says\[

v\approx 0.00000\,00000\,02356

\] which is so close to zero that you will be tempted to believe that it is a numerical error and the expression really is zero. It is not, however. The probability that a sine – a "random" number between \(-1\) and \(+1\) – is so close to zero is comparable to the value \(v\) itself, about one part in a trillion. But while the expression for \(v\) was complicated, you won't quite find a "trillion" of similarly simple expressions. So it's surprising that we could get so close to zero.

The \(j\)-function – the same function that enters the \(196,883\)-based moonshine – may be used to explain why the number \(v\) is so small, i.e. why the coefficient given by the exponential is "almost" integer.

Once you learn about these cool patterns in the \(j\)-function, you realize (at least you should realize) that \(196,883\) isn't quite a random number between \(100,000\) and \(1,000,000\); that's a lesson from the monstrous moonshine. This number is much more tricky and is capable of deceiving you. Similarly, \(\sqrt{163}\) isn't a random square root, either.

There is a heuristic explanation why \(\exp(\pi\sqrt{163})\) is nearly an integer. We know another number of this sort,\[


\] which is an exact integer. If you replace \(163\) by \(-1\), the square root is \(i\), the imaginary unit, and the exponential of \(\pi i\) gives you minus one, according to the formula that Feynman once called the prettiest identity in mathematics. With the number \(163\) replacing \(-1\), it no longer works "exactly", but you get "close" to an integer.

Much of mathematics – and its progress – is all about similar "exceptional structures and patterns" that were not previously known. Because there's something special about them, it makes sense for mathematicians to spend much more time by studying them. After some time, within the mathematical literature, these exceptional structures won't look so exceptional, after all. They may even conquer a majority of the literature.

The "equality" of all the numbers etc. is broken because "something special" holds for special numbers or special functions or special mathematical structures of many kinds. And it is this "something special" that makes these "exceptions" particularly attractive for mathematics.

We may look at \(P=NP\) from the viewpoint of the following analogy. Imagine that "the travelling salesman problem" is compared to the number of \(163\) because it is, for example, the \(163\)rd most important problem in computer science. It sounds about right. The question is whether the fastest algorithm for this problem is polynomial. This question may be analogous to the question whether the time (expressed in billions of years)\[

v= \sin[\pi\exp(\pi\sqrt{163})]

\] is shorter than a human life. Well, it's in billions of years and this is a random \(\O(1)\) multiple, so it can't be shorter, can it? However, it is shorter. With my units, it would only take hours to solve the travelling salesman problem because the problem isn't too random – much like the number \(163\), it may be described in a rather concise way – and lots of "special extra relationships and approximations" exist for small numbers including those similar to \(163\).

There may very well exist a polynomial algorithm for the travelling salesman problem, and to be "nearly certain" that there isn't one is a similar kind of an unsubstantiated faith as the faith that numbers such as \(\sin[\pi\exp(\pi\sqrt{163})]\) can never be insanely small. Well, they can be, but it may often need some degree of depth, progress, and modern expertise to uncover all the patterns. When we get smarter, we may very well find \(P=NP\) as natural as \(v\ll 1\).

Most of the numbers and mathematical structures that mathematicians write about in their papers are "special" – they can be described by finite, and usually short enough, sequences of words or symbols. And their being "special" means that they can be subject to many special relationships, patterns, and laws. It is simply stupid (and a sign of a lazy Luddite) to assume otherwise – that no similar relationships or exceptional numbers or exceptional mathematical structures or fast algorithms or surprises in general will be found in the future.

The whole history of mathematics is the history of our increasingly refined knowledge of numbers and mathematical structures – and every refinement is about a "surprise" of a sort!

I wrote that people who believe that there are almost no surprises in mathematics are lazy – and Luddites who oppose progress – because they don't want to find these surprises (even though they sometimes get huge salaries pretty much for looking for interesting things). But I think that people like Aaronson can't really be familiar with sufficiently deep mathematics and its history, either, because they would know that surprises occur all the time and the history of mathematics is pretty much composed of them because a surprise is usually a more important development than a non-surprise!

Partial evidence that such surprising structures don't exist is only conceivable once we solve many similar problems that really seem to be in the same "class" and that seem to be equally "random" members in the class. This is, pretty much by definition, never the case in the "discrete math" as quasi-defined at the beginning.


  1. It's not accident that mathematics almost exclusively concerns itself with computable numbers because it's hard to work with something you can only abstractly describe in the negative. The computable numbers are an infinite class but infinitesimally small class of numbers relative to uncomputable numbers but are "luckily" quite good at describing reality and the laws of physics. God is not malicious or do we force God to not to be malicious? I go with the former in most cases.

  2. Dear Bill, the role of the computable numbers and structures is even greater than you suggest, I think. In measure theory - and therefore physics of continuous variables - it makes sense to assume that the non-computable real numbers are approximately "equally important" or "equally likely" as the computable ones. After all, that's what every continuous probability distribution implies. And in physics, it's guaranteed that we never measure things exactly (because of limited apparatuses; as well as deeper principles such as the uncertainty principle), so there are distributions that contain uncalculable numbers and only a measure-zero subset of these intervals are calculable.

    But whenever we talk about specific numbers, they have to be calculable, otherwise we would need an infinite amount of information to "pinpoint" the number. The specific real numbers in mathematical papers are therefore "always special", and there is always a risk that they have some "special properties" that would be "infinitely unlikely" for a random - uncomputable - real number. Such special properties - "surprises" - are bound to exist for these "calculable" real numbers (and well-defined more complicated mathematical structures). The question isn't "whether" surprises exist for those; the question is when and which.

  3. Hi Lubos,

    You wrote "cosx is almost certainly transcendental for a rational..." Was that sarcasm? I think cosx is transcendental for all algebraic numbers different than 0, so for nonzero rational numbers too, this being a consequence of Lindemann–Weierstrass theorem.

  4. Ok, so you agree with the statement, don't you?

  5. Different NP complete problems are not all the same problem. They all have the same worst case complexity, but only up to a polynomial function. So to claim that they are "morally" all the same you must accept that any polynomial problem is trivial (which I don't believe you accept). It is incoherent to claim both that all NP complete problems are the same, and that there are any non-trivial problems in P.

  6. I do agree and got also the point with "almost."

  7. I am an architect who is interested in the possible sub-quantum particle FORM as information medium and building blocks in nature.

    I also focus on the possible dynamical FORM transformation aspects in micro- and macro physics as a base for dark matter black holes, the Higgs field vacuum and the Big bang.

    I believe that our complex world can not be described by equations alone.
    WHY? Because we don't know why the universe is as it is.
    An example: “Finetuning”: Why are the "fundamental constants" constant?
    My suggestion: because the sub-quantum FORM of particles and the Higgs field vacuum lattice have a certain form and play a game with us..

    Even Gerard t'Hooft told me that according to his knowledge there is no known math description possible for simple form changing objects like a convertible ring shape as shown below.

  8. Dear Lubos, I don't want to argue with you. This is my last comment.

    Here what Omnes says at page 340:

    "Finally, an interpretation must make it clear how it describes
    facts. Bohr (1958) spoke of phenomena as being
    described in terms of classical physics. Landau and
    Lifshitz (1958a) formulated quantum mechanics in terms
    of a separate classical physics. Heisenberg (1958) and
    others (London and Bauer, 1939; Peierls, 1985) stressed
    the central role of an external, essentially classical observer.
    Accordingly, the Copenhagen interpretation does
    not really describe phenomena in the language of the
    theory; it only puts them side by side with theory. The
    question of consistency cannot even be stated in these

    I agree with these statements. You have always said every aspect of quantum mechanics understood in 1920s . The only thing I say is that, assumptions of authors quoted above make perfect sense in ordinary situations, but they may not make sense in the case of cosmology. You need to use full consistent histories. My question was always the following, can we make sense of assumption of external classical observer in the case of cosmology ?

    He ten says:

    "If, on the contrary, one assumes that there is only one
    physics, which must then be quantum physics, the theory
    must account for the existence of the factual data entering
    in its experimental verification. This means that, if
    one assumes the universality of quantum mechanics, the
    theory must be consistent enough to encompass the conditions
    of its own experimental verification."

    Again I agree with these statement. My point is that one didn't really need consistent histories before. You have said before that consistent histories was correct but it was against shut up and calculate approach, we really didn't need it. Do we need now ?

    I am sure that classical physics will never return back and the usual use of quantum mechanics will be valid always in ordinary lab experiments.

  9. Of course there are no non-NP problems in P, since P is a subset of NP (If had a proof that it is a strict subset, it might be too long for this comment:-), but I know what you meant to say. I thought I remembered you (in an earlier post) questioning the identification of P with "feasible" problems, which is a valid point, but incompatible with the argument that all NP complete problems are the "same". However, looking back at the old posts, I couldn't find it. So I may have been mistaken.

    Problems have properties other than worst case complexity, and those can vary from one NP complete problem to another. For example some NP complete problems have practical algorithms that solve them approximately (for various definitions of both "approximately" and "practical"), and others do not (unless P=NP, in which case they can all be solved exactly).

    Note that there are known algorithms (e.g. for SAT) that are within a constant of optimal. So proving P=NP isn't a mater of finding a polynomial algorithm, we already know one if one exists. The problem is to determine if a known algorithm runs in polynomial time. The algorithms (Levin/Hutter search) have been known from the beginning, but they are not used because the constants are a bit too large (an understatement. They are not known exactly, but probably bigger than any numbers used in physics).

  10. Isn't the last paragraph actual partial evidence that P=NP is likely to hold?

    It was exactly the "apparent need for large constants" (which look unnatural) to make these algorithms slow in practice that was used by Aaronson and others as an argument that P=NP should better not hold because everyone would be like Mozart then, and all this stuff.

    But if the number of required operations is a gazillion times N^gazillion, then it's polynomial but still very slow, right? And you seem to say that the best algorithms actually seem to scale like this.

  11. > it makes sense to assume that the non-computable real numbers are approximately "equally important" or "equally likely" as the computable ones. After all, that's what every continuous probability distribution implies.


  12. Could you please be more specific about which word you don't understand?

  13. Valid until 43th decimal !

  14. "Isn't the last paragraph actual partial evidence that P=NP is likely to hold?"

    I don't see how it is evidence either way. We don't know if the algorithms are polynomial or not, but they have huge constants regardless. There are could be other "best" algorithms with much smaller constants. There almost certainly are, since the known algorithms work by running all algorithms in parallel until one of them gets a correct result.

  15. I'm not sure what you're saying. It sound like you're saying that a random number is equally likely to be computable as non-computable, which is certainly not true.

  16. I wonder if the principle of indifference in the case of a experiment having just two possible outcomes, about which there is no partial evidence, has ever been vindicated in nature.

  17. What I am meaning by this statement is that both numbers are assumed to exist in interval of real numbers, and probability distributions described by a continuous function rho(x) imply that "x" may be computable or not computable. If rho(x) is constant for "x" in an interval, all the numbers - computable and uncomputable - in this interval are equally likely to be the value of x.

  18. Well, Richard, knowledge is subjective. Not knowing anything about a question X means to consider Yes and No equally likely, 50-50.

    Everything else is only right if there is "some" knowledge, some theory - perhaps an unreliable or approximate one or based on analogies - that is relevant for the question.

    And these theories are tested by empirical tests. So a theory that makes more robust and correct predictions is better than a theory that makes vague 50-50 predictions. But if there is no such more accurate theory, 50-50 is clearly the most successful "mini-theory" to predict the unknown questions.

  19. You realize that your uniform 50-50 assumption is inconsistent with Bayesian probability? According to Bayes, additional information must affect the probability. I think you claim to be a Bayesian, but based on this and other discussions you seem to hold to a nonmathematical theory of probability.

  20. I am not an idiot so I realize - and have pointed out millions of times - that extra evidence leads to changes of Bayesian probabilities. And if *you* were not an idiot, you would know that I must know it.

    But you also misunderstand the whole point here, and the point is that there is *no* additional evidence for/against similar unprecedented mathematical claims (like P=NP) than their proof or disproof.

  21. You really don't understand Bayes theorem, and are substituting misguided intuition for calculation. But since you are being so obnoxious about it, I will not go to the effort to explain for your readers. You are hereby banned.

  22. Thanks for the reply. Understandably, in the case of equally likely outcomes one might infer equal probabilities for any one of them them. What I do not get is how you make that inference (however useful it may be) in the absence of knowledge indicating there are a number of outcomes that are equally likely.

  23. Dear Richard, Bayesian inference depends on priors. The P=1/N priors are the most natural ones if there are N clearly isolated options, but there is nothing special about them. It's important that all qualitatively different options/hypotheses get nonzero - and more or less comparable - priors.

    The more evidence is accumulated, the more safely the dependence on these priors evaporates.

  24. Aaronson is a clever silly, which is what you need to be to be at a prestigious university. The most telling thing about his career is the time he asked a doctor for a chemical castration so he wouldn't commit sexual microaggressions.