Originally written with a different audience in mind
Arguments involving the internal consistency of a theory or a system of ideas have assumed increased prominence in modern mathematical and theoretical physics as well as in other branches of human thought that were or are being inspired by physics.
It is possible to underestimate these arguments; and it is possible to overestimate them as well. And indeed, numerous thinkers err in both ways.
The goals of this essay are to explain the basic logic and assumptions behind these arguments, to present several examples, to show that these arguments are often equivalent to logical steps that may be described without the word “consistency”, to sketch a probabilistic argument suggesting that theories passing a consistency check are more likely, and to clarify both basic fallacies.
The first basic fallacy is to dismiss arguments rooted in consistency altogether; we will try to see that reasoning based on consistency has often been useful, successful, and demonstrably valid. The second basic fallacy, discussed at the very end of this essay, is the assumption that the internal consistency of a system of ideas proves (i.e. is sufficient for) the validity of such a system.
Consistency and proof by contradiction
At the level of mathematical logic, the character of arguments involving consistency is rather trivial. If proposition P implies proposition non(Q) i.e. if it can be proven that P → ¬ Q, we may say that P and Q are inconsistent.
Equivalently, the proposition “P and Q are inconsistent” may be expressed as Q → ¬ P i.e. it is possible to prove non(P) assuming Q. The equivalence of the two implications holds and is known as “transposition”; the equivalence is a valid rule to exchange the antecedent with the consequent of the conditional statement and to negate both propositions at the same moment. Both propositions are equivalent to ¬ (P ∧ Q), too.
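The equivalence of the three formulations can be checked mechanically by enumerating all truth assignments. The following is only a minimal sketch; the helper names `implies` and `three_forms` are illustrative choices, not notation used elsewhere in this essay.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material implication: a -> b is false only when a is true and b is false."""
    return (not a) or b


def three_forms(p: bool, q: bool) -> tuple[bool, bool, bool]:
    """Truth values of P -> not Q, Q -> not P, and not (P and Q)."""
    return implies(p, not q), implies(q, not p), not (p and q)


# Check all four assignments of (P, Q): the three forms must always coincide.
all_agree = all(
    len(set(three_forms(p, q))) == 1
    for p, q in product([False, True], repeat=2)
)
print(all_agree)
```

The only assignment on which the three expressions are false is the one where P and Q are both true, which is exactly what "P and Q are inconsistent" is meant to exclude.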
All of mathematics and physics depends on mathematical logic which is why these elementary logical rules and principles are omnipresent in exact sciences. For example, every proof by contradiction in mathematics is nothing else than a proof exploiting the logical consistency (more precisely, inconsistency) of a set of assumptions.
To see a simple example, let us prove that there are infinitely many primes. The proof by contradiction shows the internal inconsistency of the usual assumptions about integers, the usual definition of primes, and the (false) assumption that there are finitely many primes. The proof proceeds as follows: If there were finitely many primes, one could calculate their product and add one. The resulting sum (“the sum”) would be greater than any of the (finitely many) primes so it couldn't be one of the entries on the original (finite) list of primes. However, the sum leaves the remainder one when divided by any prime on the list, so it is not divisible by any prime at all. According to the definition of a prime, the sum would therefore be a prime as well because it is only divisible by one and by itself.
That's a contradiction: The sum is a prime and it is not a prime at the same moment. It follows that the original set of assumptions is logically inconsistent. At least one of the assumptions has to be false. By inspection, it is not hard to see that the false assumption is the assumption that the number of primes is finite.
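The construction used in the proof can be tried on any concrete finite list of primes. The sketch below is illustrative only (the function names `smallest_prime_factor` and `euclid_witness` are made up for this example): it forms the product plus one and returns a prime factor of that number, which can never be on the list.

```python
from math import prod


def smallest_prime_factor(n: int) -> int:
    """Return the smallest prime factor of n (for n >= 2) by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime


def euclid_witness(primes: list[int]) -> int:
    """Given a finite list of primes, return a prime that is not on the list."""
    n = prod(primes) + 1
    # n leaves remainder 1 when divided by each listed prime,
    # so none of the listed primes divides n.
    assert all(n % p == 1 for p in primes)
    return smallest_prime_factor(n)


print(euclid_witness([2, 3, 5, 7, 11, 13]))  # prints 59
```

For a real, necessarily incomplete list, the product plus one need not be prime itself: here 30031 = 59 × 509. What matters is that its prime factors are missing from the list, so no finite list can be complete.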
May all mathematical descriptions fail?
It is perhaps necessary to briefly address two possible general objections against arguments using consistency – or against any argument involving the language of mathematics: 1) the world doesn't have to be consistent; 2) the world may admit no mathematical description at all which would apparently imply that every mathematical argument is invalid, too.
The first suggestion is impossible because a well-defined Yes/No question in Nature can't have both contradictory answers. Individual human beings may be confused and hold beliefs that are inconsistent. But it is always possible to verify and correct the system of propositions and to see that some assumptions are untrue or some steps in the proof of contradiction are invalid. In this way, we reach a system of propositions that avoids demonstrable contradictions. Such a “theory” is superior to an inconsistent one, to say the least.
As far as the general criticism of mathematics is concerned, we must realize that the language of mathematics (and mathematical logic in particular) is used in physics and in natural sciences because it is the most accurate language that avoids confusions and contradictions. Aristotle pioneered mathematical logic of the Western style and the subsequent improvements have simply eliminated all sources of confusion. But in principle, every language used to study Nature that is sufficiently refined and that is promoted to a sufficiently controllable form may be seen to be equivalent to mathematical logic.
On the other hand, the identification of the “reality around us” with mathematical objects similar to numbers and functions doesn't have to be counted as an independent, questionable assumption. Instead, it may be described as a general result of the scientific method. A priori, only propositions about our direct observations or perceptions (i.e. “we saw a full moon 28 days after the previous one”) may be considered “valid axioms”. Numbers and other mathematical objects have appeared in our theories describing Nature simply because all non-mathematical attempts to explain the whole body of our observations have been falsified or at least found significantly incomplete. In other words, numbers, functions, and other mathematical structures entered physics because they seem necessary for a viable explanation of the logical relationship between (propositions about) our direct perceptions or observations.
The extra assumptions commonly used in physics
Previously, we reviewed a proof of the infinite size of the set of primes. The same proof may be formulated in many ways, using various words. And we don't mean just the translation from one human language to another; or a translation from “plain English” to the language of mathematical symbols. After all, the phrase “proof by contradiction” doesn't contain the word “consistency” at all. It is not hard to modify the wording of the proof so that one avoids the word “contradiction”, too.
Given the ubiquity of proofs by contradiction in mathematics, one might be surprised that physicists' arguments that boil down to logical consistency may be questioned at all. After all, the dismissal of the requirement of logical consistency is tantamount to the denial of mathematical logic as a whole. And rational reasoning in mathematics, sciences, or other fields would be impossible without mathematical logic incorporated into the foundations.
However, there is something special about the consistency arguments in physics, something that goes beyond the generic proofs by contradiction in mathematics. What are the special new ingredients?
The first ingredient may be the empirical data. The compatibility (or consistency) of a hypothesis with the empirical data is the key criterion that decides about the fate of the hypothesis according to the scientific method. If the observed data disagree with predictions of a hypothesis, the hypothesis is falsified. But in this essay, we don't want to talk about the empirical data. Because the references to the internal consistency are characteristic steps used by theorists, not experimenters, we want to focus on the internal consistency i.e. consistency of assumptions that are parts of a theory. What are the extra ingredients we routinely encounter in physics?
One of them involves a special class of the physical assumptions that often belong to the “list of axioms” whose consistency or inconsistency is being analyzed. The other ingredient is the physical nature of the contradictions that are being arrived at in the process of the proof.
To be more specific, the important physical assumptions that may result in contradictions (inconsistency) with others are symmetries (or, almost equivalently, the independence of the predictions on some transformations or changes of conventions used to observe the system or to talk about it); the fact that probabilities must be numbers between 0 and 1; the fact that the sum of probabilities of all mutually excluding options equals 1. The typical last step of the contradiction in such physics proofs is a calculation of the value of a continuous physical quantity that should be equal to two different numbers according to two different arguments.
Examples of consistency arguments before string theory
Since the beginning of the 20th century, a common theme in theoretical physics has been the “reconciliation of two pre-existing foundations” that results in the justification of a rather specific new theory. This sort of reasoning is an example of a consistency argument. Einstein's special theory of relativity; the uncertainty principle underlying quantum mechanics; Einstein's general theory of relativity; quantum field theory including its general predictions such as the existence of antimatter; and string theory may all be viewed as results of such “reconciliations”.
In each case and in many others, one may deduce a rather specific new theory out of some previous, less accurate or less complete theories or assumptions, because the older assumptions are compatible but just barely so. If they were completely incompatible, they couldn't simultaneously hold at all. On the other hand, if they were easily compatible, they wouldn't have much to do with each other and we wouldn't learn anything out of their combination (we don't learn much from the fact that the leaves are green and that the Sun is round). Modern physics often falls in the middle. It almost looks like we may derive that the assumptions are incompatible except that there is always a loophole. The assumptions are compatible but only if the loophole is exploited. Because we know that the assumptions hold, their consistency implies that all the propositions we summarized as the “loophole” – along with all of their implications – are true, too.
The special theory of relativity may be derived from two postulates: 1) the equal form of physical laws for all inertial observers (those that are in a state of uniform motion with respect to each other), and 2) the constancy of the speed of light (independently of the speed of the source of light as well as the observer). These two postulates look nearly inconsistent because the first postulate used to be believed to imply that relative speeds simply add up, i.e. that the combined speed is equal to \(u+v\). However, the second postulate requires that the relative speed of light shouldn't be \(c+v\); it should still be just \(c\). Einstein realized that these two postulates are compatible (and mechanics and electromagnetism+optics are consistent) if a particular loophole is exploited. The loophole involves the insight that the simultaneity of two events depends on the observer, too. Galilean transformations are replaced by the Lorentz transformations and the composition formula for two speeds becomes \((u+v) / (1 + uv / c^2)\). All the other consequences also follow. When the speed of objects increases, their lengths contract, their time slows down, and their masses increase; the speed of light is the maximum allowed speed of material objects or information; and so on.
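A short numerical sketch shows how the composition formula \((u+v) / (1 + uv / c^2)\) reconciles the two postulates. The function name `compose_velocities` and the sample speeds are illustrative assumptions; the formula applies to velocities along the same line.

```python
C = 299_792_458.0  # speed of light in m/s


def compose_velocities(u: float, v: float, c: float = C) -> float:
    """Relativistic composition of two collinear velocities."""
    return (u + v) / (1.0 + u * v / (c * c))


# Everyday speeds: the result is practically the Galilean sum u + v.
slow = compose_velocities(30.0, 20.0)

# Light emitted by a source moving at half the speed of light still moves at c.
light = compose_velocities(C, 0.5 * C)

# Two speeds below c never compose to a speed above c.
fast = compose_velocities(0.9 * C, 0.9 * C)

print(slow, light / C, fast / C)
```

At low speeds the correction term \(uv/c^2\) is negligible, which is why the simple addition \(u+v\) looked like an exact law for so long; when one of the speeds equals \(c\), the formula returns \(c\) for any value of the other speed.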
Similarly, the Heisenberg uncertainty principle of quantum mechanics is the loophole needed to reconcile the facts that particles sometimes behave as pointlike objects and sometimes act as waves (e.g. in the double slit experiment). Mathematically, the uncertainty principle is translated to a nonzero value of the “commutator”, \(xp - px = i\hbar\), and more or less all the exotic (as well as mundane) phenomena associated with quantum mechanics follow from this fact.
The general theory of relativity (found by Einstein in 1915) follows from the consistent reconciliation of the insights of the special theory of relativity (1905) with the existence of the gravitational force (as previously understood by Newton). In Newton's theory, the gravitational influence seems to change immediately when the source of the gravitational field changes. However, the special theory of relativity prohibits such an instantaneous “action at a distance” (which would operate faster than the speed of light). The two principles (gravity, special relativity) are only consistent because of a loophole: the gravitational signals also propagate (only) at the speed of light, which is possible because they are disturbances in the spacetime geometry itself. They cannot be disturbances or waves in any other field or medium because the “equivalence principle”, i.e. the fact that all objects accelerate in a gravitational field at the same rate (in the vacuum), would be contradicted.
There are several other very important examples and hundreds of less important examples. Quantum field theory results from the reconciliation of the special theory of relativity with quantum mechanics. Particles with wave-like properties have to be “created” and “annihilated” by local quantum fields for the rules of special relativity to be obeyed. However, one may see that such local quantum fields are able to create and annihilate antiparticles (particles of antimatter), too.
Finally, string theory seems to be the only known logically consistent reconciliation of the rules of quantum field theory (arrived at in the previous paragraph) and those of the general theory of relativity (Einstein's modern theory of gravity). In this case, we don't possess any “complete and direct proof” that string theory is an inevitable implication of the two assumptions. This failure partially boils down to the fact that we don't have any “truly universal” definition of string theory, something that would allow us to decide which laws of Nature are string theory and which laws of Nature aren't (research from the mid-1990s has demonstrated that “the theory” we called “string theory” is actually more than just a “theory of strings” and their splitting/joining interactions; it inevitably includes many other building blocks such as D-branes and other processes).
However, in practice, we are always able to decide whether a new proposed theory generalizing the general theory of relativity as well as quantum field theory is or isn't string theory. And whenever a candidate for a consistent theory of quantum gravity comes close enough to string theory, we may show that it is actually necessary for the constraints of string theory to be obeyed exactly and the resulting theory must be one of the numerous (seemingly inequivalent but, in the end, exactly equivalent) descriptions of string theory.
Examples of consistency arguments in string theory
We have seen that “consistency arguments” exploded in theoretical physics about 100 years ago. They became even more omnipresent with the rise of string theory. Detailed string theory examples are arguably too technical for a philosophical essay. Nevertheless, I want to sketch and distinguish two ways in which references to consistency are used as arguments in the context of string theory.
First, consistency is often being exploited in a derivation of some particular technical result that apparently didn't have anything to do with consistency. Second, the apparent consistency of string theory and all of its insights and aspects is a rational reason to strengthen our belief that string theory as a whole is correct or at least unique and worth our (or physicists') time.
Concerning the first point, it is interesting to note that even Einstein's equations of the general theory of relativity (which say that the Ricci tensor vanishes, if we restrict our analysis to the vacuum) may be derived from consistency within string theory. A priori, it is possible to study the vibrations and motion of one-dimensional strings on any spacetime geometry. The results should be independent of our parametrization of the curve spanned by the string in the spacetime. In general, however, we find out that the results depend on the parametrization. Such a dependence is sometimes referred to as a “Weyl anomaly” or “conformal anomaly”. (In general, “anomalies” are symmetry-breaking terms in the equations that may be made inevitable by the rules of quantum mechanics even though they would seem obviously absent according to the rules of classical physics.) If this dependence were there (and if the results for cross sections etc. included the corresponding term that would be nonzero), the physical theory would make ambiguous, mutually contradictory predictions. It would be an inconsistency. (More precisely, there would be new “degrees of freedom” that could create waves on the string and processes involving these new “degrees of freedom” would sometimes be predicted to occur with negative probabilities.)
It turns out that there are many potential terms in the “Weyl anomaly” and they are proportional to the values of the Ricci tensor at each spacetime point. The condition that the propagation of strings is logically consistent – i.e. that it produces unique, not self-contradictory, results – requires the Ricci tensor to vanish at each spacetime point. (More generally, we may derive all the right low-energy field equations including the source terms and other terms.) The field equations may be derived in several ways but the consistency of the propagation of other strings on a given background is one of them.
Surprisingly enough, one may also derive certain nearly “philosophical” conclusions that would look like “arbitrary assumptions” in all simpler theories. In particular, all approximate theories including quantum field theory may assume a certain number of spacetime dimensions. Theories with different numbers of spacetime dimensions describe “completely separate worlds” that cannot be related or compared. The number of spacetime dimensions seems to be one of the first assumptions we have to make and there is apparently no way (except for observations) to decide that one assumption is correct or wrong.
However, in superstring theory, the “Weyl anomaly” also contains terms proportional to \(D - 10\). Their vanishing implies that the spacetime has to have ten dimensions in total (this is not incompatible with our basic observations if six of them end up being compactified). It is just one of the prominent examples of the ability of string theory to pick the right values and choices that previously seemed to be “up to us”. String theory is a sufficiently complex theory whose “ill siblings” would have many potential or real inconsistencies. String theory manages to avoid all of these problems. A welcome side effect of this “surprising consistency” is the theory's ability to calculate quantities (such as the total number of dimensions) that previously looked “obviously incalculable”.
This leads me to the second aspect of consistency in string theory. The surprising consistency and cancellation of all conceivable inconsistencies and “anomalies” may be used as an argument in favor of the validity of the theory as a whole. Given the fact that the predictions of string theory only differ from the predictions of the approximate theories (quantum field theories) in extreme environments that will probably never be achieved by experiments, at least not directly and not in the foreseeable future, we must ask: How may such reasoning be justified?
Even though one cannot “directly test” the new phenomena predicted by string theory, it is important to notice that even the “more mundane phenomena” that are predicted to occur similarly by quantum field theory and by string theory may increase the probability that string theory is valid. The reason is a difference between quantum field theory and string theory. In quantum field theory, the individual particle species and their interactions are more or less added to the theory one by one. So their qualitative and quantitative properties may be adjusted to achieve the internal consistency and the compatibility with the experiments.
In string theory, the diverse particle species (Higgs boson, leptons, quarks, gauge bosons, graviton) may be deduced as states of vibration of the same object, the fundamental string, and the same applies to different interactions. It means that there is a lot of potential for contradictions. String theory could have been incompatible with the existence of the Higgs boson or the graviton or the lepton or the gauge boson or the electromagnetic interaction or something else. Or it could have produced one of the infinitely many a priori possible inconsistencies or anomalies. But with a very small number of discrete choices, all the potential internal inconsistencies and inconsistencies between string theory and basic qualitative empirical observations are absent. These virtues are nothing else than “tests that string theory has passed” and by Bayes' theorem, the probability that a hypothesis is correct is increasing once it passes some tests that could have a priori failed. (We know how it works in the case of detailed empirical tests. But even tests of the internal mathematical rigidity or qualitative features of the list of particles and interactions are non-trivial tests that could have failed but they didn't. And that's why the probability goes up.)
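The Bayesian step can be made concrete with a toy calculation. The numbers below are purely hypothetical and are not estimates for string theory or any other theory; the sketch only shows that passing a test raises the probability of a hypothesis when a wrong hypothesis would probably have failed it, and leaves it unchanged when a wrong hypothesis would have passed equally easily.

```python
def posterior(prior: float, p_pass_if_true: float, p_pass_if_false: float) -> float:
    """Bayes' theorem: probability of hypothesis H given that it passed a test."""
    evidence = p_pass_if_true * prior + p_pass_if_false * (1.0 - prior)
    return p_pass_if_true * prior / evidence


prior = 0.10  # hypothetical starting credence in the hypothesis

# Non-trivial test: a wrong theory would pass it only 5% of the time.
after_hard_test = posterior(prior, 1.0, 0.05)

# Trivial test: a wrong theory would pass it just as surely as a correct one.
after_trivial_test = posterior(prior, 1.0, 1.0)

print(round(after_hard_test, 3), round(after_trivial_test, 3))
```

With these made-up inputs, the non-trivial test lifts the probability from 0.10 to roughly 0.69, while the trivial test leaves it at 0.10. The size of the update is controlled by the ratio of the two pass probabilities, which is why only tests that "could have failed" count.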
Finally, string theory boasts two closely related general features: the resilience in extreme conditions and the dual descriptions of the same equations (the so-called “dualities”).
When we adjust the parameters of an incompletely defined theory to extreme values (a huge value of the coupling constant, a tiny size or a nearly singular shape of the compactification manifold, very high energies, a huge number of colors of quarks etc.), we expect to end up with a mysterious situation where predictions are impossible – i.e. either undetermined or inconsistent. But even though no complete definition of string theory is known, this doesn't happen. Instead, consistency considerations imply that there is always a unique way to predict what happens beyond a critical point or in extreme conditions. This property of string theory means that “a whole theory that is capable of answering arbitrarily extreme questions” almost certainly exists even if we don't know the most universal definition of the theory yet.
This virtue is sometimes described by the slogan that “physicists are discovering, not inventing, string theory”. Let me mention the following analogy. We call two people, a maritime explorer and a writer of fiction, and ask them to talk about a new continent they are just visiting. Is it possible to decide which of them is “making things up” and which of them is seeing a real new continent? A good strategy exists: We ask them to walk in various directions and describe what they are seeing. After some time, we will notice that the writer doesn't know what to say or her testimony will become internally inconsistent. String theory is analogous to the actual new continent being discovered by the explorer. He may have only seen parts of it but it is clear that wherever he goes, his testimonies continue to make sense. They are well-defined and internally consistent. Even if we decide that we never want to move to the new continent, it may be a good idea to listen to the explorer because there are not too many continents (and there is only one theory beyond quantum field theory with an analogous degree of consistency, string theory).
The other general virtue of string theory is the existence of “dualities”. A duality is an equivalence between two (or several) sets of equations i.e. laws of physics but it must be an equivalence that is “very hard to see”. For example, the renaming of all fields from \(F\) to \(2F\) wouldn't be counted as a duality. Since the 1990s, it has become clear that string theory offers us a huge number of such dualities. We may describe two physical systems that look very different when it comes to some qualitative features and the words that seem most relevant for a description of the events in the (hypothetical or real) world. But when the physical implications of the two theories are analyzed in detail, we find out that for every measurement in one world, there is a measurement in the other world and the results always exactly agree. So despite the differing ways to think and talk about the events, there is actually no way to distinguish these two worlds; they are one world, they are one theory. One of the languages or descriptions may be more natural in a certain situation than the other but qualitatively speaking, both languages or descriptions are equally legitimate.
The existence of dualities increases the probability that “string theory is as real as a continent” because the different dual, equivalent descriptions of the same physics are analogous to photographs taken by completely different explorers from different perspectives. A pair (or collection) of two-dimensional photographs that are consistent with the hypothesis that they are actual photographs of the same three-dimensional object dramatically increases the probability that the three-dimensional object is real and the photographs were not Photoshopped.
Again, by itself, this feature doesn't guarantee that string theory is the right description of Nature around us (direct or indirect comparisons of the predictions with the empirical data are needed for that). However, what it does (nearly) guarantee is that string theory isn't just a collection of ideas that someone has made up, like the fictitious continents invented by the writers of fiction novels. Instead, it is a theory whose properties are objectively given, not determined by human choices or conventions, and because the theory seems capable of describing and predicting all phenomena in Nature, it deserves the physicists' attention. It deserves much more attention than the “theories” that are being invented by people who are adding one assumption on top of another – “theories” analogous to the fictitious continents – and the theories sometimes presented as “alternatives to string theory” belong to this disappointing camp.
Internally consistent theories may be false
Most of the essay has argued that the arguments referring to consistency are valid and important, and that theories that have passed some (or many) consistency checks should be taken (more) seriously because they are probably teaching us something new, something that we previously didn't know.
However, it is also possible to overestimate the power of the consistency. The most fallacious abuse of the power of consistency (or the language that includes this word) is to pick a particular theory and to claim that it must be correct because every new insight or observation seems consistent with the theory while overlooking the existence of some competing theories that are equally (or more) consistent.
The arguments involving consistency are often used in a twisted and illogical manner, as a rationalization of an opinion, a theory, or an ideology that a given person decided to believe at the very beginning. That's why it is important to emphasize that the consistency checks that a theory passes for its probability to increase must be sufficiently non-trivial – in the sense that a (large) majority of competing, superficially similar but different theories (or every competing theory) fails the tests. There must exist a sufficient potential for (i.e. prior probability of) inconsistencies and the given promising theory must “surprisingly” manage to avoid all these inconsistencies. And whenever the original theory is modified (or made more complicated) in some way and whenever the modification is known to be logically unnecessary (i.e. whenever the modification is not just a correction of an error that people would previously make), some of the passed tests become trivial because some features of the theory were “fitted” in order to pass the test. Only when the same theory passes other, inequivalent tests (without additional modifications of the theory), the success may be considered an argument that increases the probability that the theory is correct.
Logically flawed excuses that use the word “consistency” are so widespread and numerous that some of the proponents of these invalid explanations could feel discriminated against if they were not mentioned. That is why I will mention no examples. However, when these logical traps, fallacies, and circular reasoning are avoided, arguments referring to the logical consistency are very powerful and important. And their power – especially in theoretical physics but perhaps in other disciplines, too – is likely to keep on increasing in the future.