Sunday, November 28, 2021

"Very high probabilities that theorems are true" are fallacious

David B. sent me a recent article by Mussardo and LeClair that uses the Mertens function, i.e. the cumulative sum of the Möbius function, to produce a statistical argument in favor of the Riemann Hypothesis. It's interesting but at the end, I believe that we already have lots of these incomplete "statistical arguments in favor of the Riemann Hypothesis". They have actually existed for well over a century; most mathematicians have believed that the hypothesis was true and the "circumstantial evidence" was always an important reason for that belief.
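To make it concrete what such statistical arguments quantify, here is a minimal sketch of mine (just an illustration, not the Mussardo-LeClair analysis itself): it computes the Mertens function \(M(x)=\sum_{n\leq x}\mu(n)\) with a simple Möbius sieve and prints the ratio \(M(x)/\sqrt{x}\), because the Riemann Hypothesis is equivalent to the statement that \(M(x)\) grows more slowly than \(x^{1/2+\epsilon}\) for every positive \(\epsilon\).

```python
# A minimal sketch: the Mertens function M(x) via a Mobius sieve, compared to sqrt(x).
import math

def mobius_sieve(n):
    """Return a list mu[0..n] with mu[k] equal to the Mobius function of k."""
    mu = [1] * (n + 1)
    is_prime = [True] * (n + 1)
    for p in range(2, n + 1):
        if is_prime[p]:
            for k in range(2 * p, n + 1, p):
                is_prime[k] = False        # cross out proper multiples of p
            for k in range(p, n + 1, p):
                mu[k] *= -1                # one more prime factor
            for k in range(p * p, n + 1, p * p):
                mu[k] = 0                  # divisible by a square: mu vanishes
    return mu

N = 10**6
mu = mobius_sieve(N)
M = 0
for x in range(1, N + 1):
    M += mu[x]
    if x in (10**3, 10**4, 10**5, 10**6):
        # the ratio stays of order one for these x, which is exactly the kind
        # of "circumstantial evidence" that the statistical arguments quantify
        print(x, M, M / math.sqrt(x))
```

Of course, checking a few values of that ratio is precisely the kind of incomplete circumstantial evidence discussed below, not a proof of anything.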



Yes, this article hugely overlaps with "\(P=NP\) is conceivable; there is no partial evidence in purely discrete mathematics" (2014). See other blog posts that mention probabilities and Riemann or \(P\), \(NP\), proofs.

Fine, I have spent hundreds of hours with the Riemann Hypothesis in my life. Almost all my attacks on the problem naturally involved thinking of "physical systems" in which a function related to the Riemann zeta function was a partition sum or something similar. Some of them were just quantum mechanical problems of particles on some graphs (with line intervals whose lengths were related to primes), some of them had creation and annihilation operators for primes, and the most sophisticated ones involved the tachyon minimum in string field theory or \(p\)-adic string theory.



An overwhelming majority of my attempts assumed the Hilbert-Pólya conjecture, which is a vague Ansatz for a proof. It says that there is a physics problem, a quantum Hamiltonian, whose spectrum basically and demonstrably coincides with the imaginary parts of the nontrivial zeroes of the zeta function. The final step of the proof of the Riemann Hypothesis is conjectured to be the proof of the Hermiticity of this operator.
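In a formula (this is just the standard way to write the Ansatz, not a new result): one looks for a Hermitian operator \(\hat H\) such that \[ \zeta\left(\frac{1}{2}+iE_n\right)=0 \quad\text{precisely for}\quad \hat H\,\psi_n = E_n\,\psi_n, \] and the Hermiticity of \(\hat H\) would force every \(E_n\) to be real, i.e. every nontrivial zero \(s_n=\frac{1}{2}+iE_n\) to sit on the critical line \(\Re(s)=\frac{1}{2}\).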



Although I have repeatedly thought that I was hours away from a complete proof, I no longer believe it has ever been the case. The main problem is that I believe that the whole program is wishful thinking and the actual beef is still missing. Why? Well, it is surely true that a Hermitian operator has a real spectrum. But the problem is that aside from the normalizable eigenstates, typical operators which represent the Riemann zeta problem may also have quasinormal modes which are not normalizable eigenstates, i.e. "wave functions in \(L^2\)", because of some "exponential growth somewhere at infinity", but they still do produce zeroes or poles in the scattering amplitudes extended to the complex plane. Despite their being non-normalizable, these "wave functions" or formal eigenstates would still lead to a violation of the Riemann Hypothesis. My research on quasinormal modes has reminded me that from the viewpoint of the calculus, the quasinormal modes simply do look as real as the normal ones, and the discrimination between the two groups is largely equivalent to deciding the validity of statements such as the Riemann Hypothesis. If you can't separate the solutions into normal and quasinormal ones, you will probably be unable to prove or disprove the Riemann Hypothesis, either.
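To illustrate the distinction I have in mind (a textbook-style example, not any specific candidate for the Riemann operator): for a Schrödinger-like operator with a localized potential, a normal mode is a square-integrable eigenfunction, while a quasinormal (resonant) mode obeys a purely outgoing boundary condition, \[ \psi(x)\sim e^{+ikx}\quad (x\to +\infty),\qquad \Im k<0, \] so \(|\psi(x)|\) grows exponentially at infinity and the mode is not in \(L^2\), yet the corresponding complex energy still shows up as a pole of the analytically continued scattering amplitude. Translated to a would-be Hilbert-Pólya operator, such formal eigenvalues with a nonzero imaginary part would correspond to zeroes off the critical line.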

The Hilbert-Pólya program makes it sound (and perhaps, this claim is the main beef of the program!) as if "finding the right Hamiltonian" were the overwhelming majority of a proof of the Riemann Hypothesis but I think that it is not really hard to define such operators. The missing part of the proof is the proof of the non-existence of the quasinormal modes away from the real axis (away from real eigenvalues). And because I think that neither Hilbert, Pólya, nor their many followers have found any sketch of a proof implying that "no quasinormal modes exist", the program hasn't really come closer to a complete proof of the Riemann Hypothesis. And for this reason, while I expect the physics methods to be superuseful in analyzing all such problems, I am somewhat open-minded about the very validity of the Riemann Hypothesis. The physics wisdom would probably be very useful even in the case that nontrivial roots away from the critical line actually exist, i.e. that the Riemann Hypothesis is actually false!

But that is not what I promised to discuss in the headline. I wanted to discuss the problems with any quantification of a "probability that a mathematical proposition is true". When this reasoning is applied to unique theorems, the probability calculus is always fishy.

Why is it so? The reason is very simple... the reason is that mathematics isn't a natural science; mathematics and physics aren't the same thing! What is the relevant difference and why does it matter? Well, in physics and natural sciences, probabilistic arguments are not only legitimate. They actually cover all real empirical evidence in favor of the non-trivial, non-tautological claims in physics (and natural science). We simply need the empirical data and the empirical data always have some probability of deviating from the precise mean value, because of the random (Poisson) character of events, because of noise and inaccuracy of the experimental apparatuses, and perhaps for a few more reasons.

What happened when the Higgs boson was "discovered" in July 2012 (and I was sure it would be there from late 2011)? Well, the histogram of the possible "candidate decaying particle" describing some decay events at the LHC was developing an increasingly pronounced bump near \(125\GeV\), and the probability that such a bump would arise by chance, assuming that no Higgs of the mass around \(125\GeV\) existed, was generally decreasing. When it dropped to "one part per million" or so, the particle physicists had enough self-confidence to announce the discovery.
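A toy version of that kind of calculation (with completely made-up numbers and the SciPy library assumed to be available; it is nothing like the actual LHC statistical machinery) asks how often a Poisson-distributed background alone would fluctuate up to at least the observed count in the interesting mass bin:

```python
# Toy bump significance: how often would the background alone produce
# at least the observed number of events in one mass bin?
from scipy.stats import poisson, norm

background = 100.0   # expected background events in the bin (made-up number)
observed = 160       # events actually counted in the bin (made-up number)

# P(X >= observed) for X ~ Poisson(background)
p_value = poisson.sf(observed - 1, background)

# translate the p-value to the usual one-sided "number of sigmas"
significance = norm.isf(p_value)

print(f"p-value = {p_value:.2e}, significance = {significance:.1f} sigma")
```

The "one part per million" threshold mentioned above corresponds to roughly five sigma in this one-sided language.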

But we need to understand: how is it possible that physicists may calculate the probabilities that a Higgs-like bump emerges by chance, and similar probabilities? What are the assumptions behind such calculations (and indeed, they are needed whenever we deduce something from the raw experimental data)? A necessary condition is our assumption that Nature obeys a theory that makes it possible to calculate the probabilities of various outcomes. And if the same experiment is repeated many times, the repetitions of the experiment are almost precisely independent of each other. So if a Higgs-like event is seen too many times, the probability that it is a coincidence drops to zero, basically exponentially. Natural science depends on the "reproducibility of experiments". While there may be random fluctuations or noise that prevents a clear outcome of a single experiment, the probability that a patternless (null) hypothesis will be capable of producing a pattern that is characteristic of a more elaborate theory (one with the Higgs in this case) may be made arbitrarily small simply by repeating the same process many times!

The independence of the outcomes of random processes in two labs at different places, or even one lab at two different moments, boils down to a basic property of spacetime, some kind of locality. In a relativistic theory, the locality in space may be precise and it implies that the outcomes in two spatially separated regions are strictly independent of each other. If they are time-like separated, like repeated experiments in the same lab always are, the precise independence cannot hold. But we still have good reasons to believe that the LHC is basically "reset" before every collision and the result of the previous collision doesn't tangibly influence the odds for the outcome of a new, repeated collision. Because the spacetime has many quasi-independent regions, the evidence from each may add up to the total evidence, the probabilities may multiply, and with a sufficient number of repetitions showing a growing pattern, the certainty that the null hypothesis has been falsified may become arbitrarily strong.
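In a formula, just to make the last sentence explicit: if the null hypothesis assigns the probability \(p_i\) to the suspicious pattern's appearance in the \(i\)-th of \(N\) independent runs, the probability that the pattern appears in all of them is \[ P_{\rm null}=\prod_{i=1}^{N} p_i = p^N \quad (\text{if all } p_i=p), \] so even a modest per-run probability such as \(p=0.1\) collapses to \(10^{-6}\) after six repetitions; but this exponential suppression is exactly as trustworthy as the assumed independence of the runs.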

On the other hand, mathematics doesn't work in any spacetime and it doesn't allow any truly independent repetitions of experiments!

Try to view questions in mathematics (questions about the validity of various propositions) as physics questions, i.e. questions analogous to the question "whether an observable \(L\) in some region obeys some condition". The problem is that all of mathematics is really connected with itself. You cannot ever assume that two propositions (that have been proven neither to be equivalent nor to be the negations of each other) are "spatially separated" from each other so that the two answers could be used as two truly independent pieces of evidence in favor of a theorem (or against it), so that the probabilities could be multiplied (to get a really small number really soon).

The reproducibility of the LHC collisions seems almost exactly true. If the LHC has only been producing three charged lepton species, the electron, the muon, and the tau, it is implausible that it will suddenly produce five charged lepton species (at the same energy) or that the muon mass will dramatically change. We have quite some evidence for the statement that "the laws of physics aren't changing, or are at most changing incredibly slowly". Also, we have quite some direct and indirect evidence in favor of the independence of especially spatially separated regions (that independence, locality, follows from relativistic considerations of causality). Much of the evidence is direct: we have just observed lots of pairs of repetitions of an experiment whose outcomes were predicted by a theory but uncorrelated with each other. But the relativity-like principles are even stronger arguments against some "dependence" between the random generators employed by Nature in two regions of space (or spacetime).



However, all of this locality is pretty much completely absent in mathematics. Mathematics may look like a huge landscape of concepts, structures, their relationships, and propositions about all the previous things. However, the mathematical landscape is not "huge" at all when it comes to the independence of the regions and subfields and especially of propositions with different numbers (or other "values" of other structures) inserted. In fact, mathematics as a field may be said to be "perfectly globalized". Constructions and propositions in one subfield can never be legitimately assumed (and can never be proven!) to be "statistically independent" of those in another subfield. In particular, complex analysis has totally tight, intimate relationships with number theory. Theoretical physicists love the connections between different subfields of mathematics.

In this sense, the whole field of mathematics is a tiny, "Planckian" seed of spacetime where everything may be related to everything else, where arbitrarily differently looking propositions may actually be found equivalent.

The psychological problem leading to the fallacious thinking about the "probabilities in mathematics" boils down to the fact that the "landscape of mathematical propositions" really looks "huge". Why? Because when we are learning it, we usually need to read (or write) large books that are located at different places of the physical space, or that are read (or written) at different moments of time (assuming a single mathematics student or researcher). In their real lives, mathematicians are mapping some propositions and their classes into regions of the physical spacetime, and that makes the subfields of mathematics or sets of mathematical propositions look "independent" of other subfields or sets of propositions. The picture above even shows a 2D "map" of the mathematical landscape. We separate mathematical subfields into articles, books, and chapters; into different university courses; or onto different floors of mathematics departments, to pick a few examples of this segregation.

But this perspective is an artifact of the way we organize our learning of mathematics. It is obvious that this independence is generally an illusion, something that only results from a combination of the mathematical beef and psychology, not from mathematics itself. After all, if we are very strict, there exists no independence in mathematics at all. Mathematics admits proofs of all (or almost all) propositions, or their negations, and all the propositions that have been proven to be true (which may be the negations of some original propositions) may be said to be exactly equivalent to each other in the purely logical sense! You may open a Wikipedia page to learn that
In logic and mathematics, statements \(p\) and \(q\) are said to be logically equivalent if they are provable from each other under a set of axioms, or have the same truth value in every model.
Once you agree about the basic axioms of number theory, the definition of primes etc., the proposition \(2+2=4\) is logically equivalent to the statement that there are infinitely many primes! They are logically equivalent because all pairs of truths are logically equivalent; and all pairs of falsehoods are logically equivalent, too! Obviously, this very narrow "logical equivalence", which only looks at the \(2\times 2\) truth table deciding whether "yes/no" matches "yes/no"... is not what we normally mean when we say that "two mathematical propositions are equivalent to each other". What do we normally mean by that?

We mean that the proof of one may be basically directly translated to the proof of the other, by a dictionary that maps the objects on one side to the objects on the other side and vice versa. We really want the whole structure of the logic of proofs, and perhaps everything that may be done with the objects, to be "copied" in the two places. In this much more demanding, narrow sense, the equivalences are much less widespread. For example, the prime number theorem may be equivalently formulated in terms of the prime counting function (whose density of primes, roughly \(1/\ln x\), is slowly decreasing); or in terms of the Riemann zeta function (the zeta function has no roots on the line \(s=1+it\)).
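Written explicitly (these are standard statements, quoted here just to make the "translation" visible), the equivalence is \[ \pi(x)\sim \frac{x}{\ln x}\quad (x\to\infty) \qquad\Longleftrightarrow\qquad \zeta(1+it)\neq 0 \ \text{ for every real } t\neq 0, \] and the dictionary between the two sides is essentially the explicit formula that expresses \(\pi(x)\) (or its Chebyshev cousins) through the zeroes of \(\zeta(s)\).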

But an important point is that it is only the "logical equivalence" that admits a truly rigorous (but also nearly trivial) mathematical definition. The narrower equivalence of two mathematical propositions "whose logic really has to match" depends on the degree to which the two "translations" may be considered "translations of each other". It is ultimately a subjective, psychological issue. Or perhaps even a matter of linguistics or another soft, "social" science: the equivalence of two "monologues about mathematics" is as debatable as the question "whether someone translated a paragraph from a European language to an African one correctly". Of course, linguists' opinions may differ because much of the question boils down to the "equivalence of different phrases in two languages", which is rarely clearcut (both in linguistics and in the comparisons of two mathematical subfields). And just as we cannot totally rigorously establish whether two propositions are equivalent to each other in the narrower sense, we can't ever establish that they are totally independent of one another. In fact, it is obvious that there are "shades" of equivalence-or-independence (these two adjectives describe pairs of propositions and they go in opposite directions; the equivalence-and-antiequivalence resembles parallel lines while the independence resembles orthogonal lines) and the degree of the "shade" can never be calculated by a quantitatively precise formula. These shades depend on lots of arbitrary choices so they cannot possibly have implications for the validity of mathematical theorems which are objectively either "true" or "false". I don't say that all judgements about the equivalence in mathematics are equally worthless; some may be smarter than others (hopefully the paid experts can produce smarter ones than the laymen). But they also depend on some context and conventions, and none of them can ever be an ultimate, precise, rigorous conclusion about the independence or moral equivalence of two statements.

Of course, I have written similar things about similar topics many times. For example, there is a 2012 blog post explaining that experimental mathematics may fail. If you understand my arguments, you will be capable of seeing that the beef of that blog post is also "close to being equivalent" to the beef of this present blog post although the focus and chosen examples are different. This similarity between two blog posts is actually another example of the "fuzziness of the notion of equivalence" of two different argumentations in mathematics. Because many of the words "sound" different, I am pretty sure that most laymen can't even determine that these two blog posts discuss nearly equivalent philosophical or conceptual issues.

At any rate, I should repeat the examples from that 2012 blog post and some others. For example, we know that \[ v=\sin (\pi\cdot \exp (\pi\cdot \sqrt{163})), \qquad |v| \approx 2.356 \times 10^{-12}. \] A rather simple expression with a sine, two pis, one square root, and one simple integer \(163\) is capable of producing a number that is nonzero but whose absolute value is very small, about two trillionths. You could statistically analyze the probability that the sine (which is between minus one and one) of a random real expression is this close to zero (but nonzero). You could conclude that it is very unlikely that a simple expression like that produces a result that is just "two trillionths". But it happens. And it actually isn't a coincidence. You may expand the \(j\)-function in the complex plane and see why exactly \(163\) is capable of making \[ \exp(\pi \sqrt{163}) \] very close to an integer, which is enough to make the previous sine close to zero. Once you understand these expansions of the \(j\)-function, you know that the "statistical model for the distribution of those sines with similarly looking random arguments inside" is just deeply wrong. (You must have known that the statistical model just couldn't have been "precisely" right because the values of the aforementioned sines with the arguments between 1 and 163 are not 163 numbers produced by a real number random generator; they are totally specific constants that are always the same and therefore "completely deterministic"!) The statistical model overlooks all the patterns implied by the wisdom of the \(j\)-function and the related mathematical wisdom. You could claim that the very existence of the \(j\)-function wisdom is some rare miracle which may be assumed to be non-existent.
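Anyone may verify the numerical claim directly; here is a minimal sketch that uses the mpmath library (assumed to be installed) at 60 significant digits:

```python
# High-precision check that exp(pi*sqrt(163)) is extremely close to an
# integer, which is what makes the sine of pi times that number tiny.
from mpmath import mp, exp, pi, sqrt, sin, nstr

mp.dps = 60  # work with 60 significant digits

x = exp(pi * sqrt(163))
print(nstr(x, 35))           # 262537412640768743.99999999999925...
print(nstr(sin(pi * x), 5))  # a number of magnitude roughly 2.4e-12
```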

Except that with these claims about a "rare miracle", you would be on very thin ice. You have no real evidence to justify such a proposition. In fact, those who understand mathematics well know that the \(j\)-function wisdom is not "very rare" at all. Such patterns are omnipresent in mathematics. In some sense, all of valuable mathematics is made out of such things because the "truly trivial things" leading you to the naive expectations should be considered tiny portions of the "body of mathematics". In the palaces of mathematics, the trivial things are left to janitors. The real mathematicians working in those palaces should spend their time with the nontrivial things that the janitors are not sufficiently clever to do! If you assume that it can't matter whether some number in an expression is 163 or something else, then you pretty much believe that "all numbers are created equal" and this hardcore version of the left-wing ideology may indeed be considered the most spectacular proof of your mathematical illiteracy. Even more so than people or nations, numbers are not equal to each other and they generally lead to very different outcomes.

I have written many blog posts sketching the "truly surprising" things in mathematics such as very "unnatural" small or large numbers that nevertheless appear as answers to seemingly simple and natural questions; very short theorems that require a shockingly long proof; wrong propositions whose first counterexamples involve extremely high integers; and many such things. All those things exist – and surely cannot be legitimately assumed to be absent – everywhere in mathematics. We already know lots of them. There exists absolutely no reason to think that we have already learned a "majority" of such surprises in some reasonable measure (and again, there almost certainly cannot exist any objectively and permanently perfect measure of the "amount of mathematics" because the skills of janitors and mathematicians keep on evolving). So the number of such patterns and surprises and unnaturally high integers or real numbers that appear as the "minimal answers" to seemingly natural and simple questions... is probably huge. It is, strictly speaking, infinite. With some vague definition of the equivalence, many of them may be said to be "moral copies of each other" but as I said, this notion of "near equivalence" is unavoidably fuzzy, too.

That is why in many previous posts, I emphasized that it was important to acknowledge that we don't really know whether \(P\neq NP\). The pressure to force people to assume that \(P\neq NP\) is mathematically unjustifiable, because of all the interconnectedness of mathematics and the surprises discussed above, and the pressure to force scholars to believe that \(P\neq NP\) is on par with the dictatorship of the dogmas of the Catholic Church, the Nazi Party, the Communist Party, or the Woke Psychopaths. It is just wrong to impose groupthink on people.

While I do think that the Riemann Hypothesis is "very likely to be true", and I could possibly get above 99% with my estimated probability, I do realize that the unlimited drift towards 100%, like when you have 99.9999999%, is fallacious. A necessary condition for producing such very high probabilities of theorems, including the beloved ones like the Riemann Hypothesis, is some sort of "independence of various arguments" (often arguments based on properties of many numbers that are substituted into the same proposition as parameters), but this independence is simply a wrong assumption in mathematics. The landscape of mathematical propositions isn't a spacetime with strictly independent, spacelike-separated regions, and that is why all arguments claiming that the probability of totally unproven unique mathematical propositions can be pushed arbitrarily close to 100% are demagogic; and they involve dogmas and circular reasoning to promote such dogmas. A rational, honest person should avoid them and all good mathematicians should be rational, honest people.

And that's the memo.
