Thursday, April 26, 2018

Volumes of higher-dimensional balls from Gaussians: the coolness and conceptual implications

Bill Zajc has agreed with me that the derivation of the volume of an \(N\)-dimensional ball\[

V_N (R) = \frac{\pi^{N/2} } { \zav{\frac N2}! } R^N

\] where \(X!\) is the generalized factorial i.e. \(X! = X\cdot (X-1)!\), \(0!=1\), \((-1/2)!=\sqrt{\pi}\), is something that simply has to impress a kid who has a mathematical heart, especially the kind of a mathematical heart that is relevant for theoretical physics.

I will try to discuss the derivation, including some comments on why it's so cool and what one learns.

First, at some moment, the kid learns that the exponential with the base \(e\approx 2.718\) is the most natural function needed to write powers. Along with its inverse, the natural logarithm, it may be used to write down a general power:\[

x^y = \exp(y \log x).

\] That's cool. On top of that, \(\exp(x)\) is its own derivative. We may differentiate and integrate \(\exp(x)\) easily, as in the joke about the functions that walk down the street and are terrified to death when the derivative emerges in front of them. Only one function seems courageous. "Aren't you terrified of me?" the derivative asks. "No, I am the exponential."

The argument of the exponential may be anything but there's the "second" most natural function based on the exponential, the Gaussian\[

f(x) = \exp(-x^2).

\] Such Gaussians, perhaps with \(-x^2/x_0^2\) written as the exponent (and \(x\) may also be shifted by an additive shift), are cool for various reasons.

First, they are the most widespread idealized probability distributions because of the central limit theorem. The omnipresence guaranteed by this theorem is the reason why the "Gaussian" distribution is usually called the "normal distribution".

Second, the Gaussian is also the wave function for the ground state of the harmonic oscillator in quantum mechanics. It's not unrelated to the previous paragraph because the probability distributions for both \(x\) and \(p\) are Gaussian in that ground state, too. And this ground state wave function is also exactly the type of wave function that minimizes the product of uncertainties \(\Delta x \cdot \Delta p\) – the product saturates the lower bound from the uncertainty principle, namely \(\hbar / 2 \).

However, it's harder to deal with the Gaussian function than with \(\exp(x)\). In particular, we cannot write down the indefinite integral of \(\exp(-x^2)\) in a closed form. In fact, it may be proven that the function whose derivative is \(\exp(-x^2)\) cannot be written in terms of elementary functions. Nevertheless, it turns out that we may calculate the definite integral over the real axis,\[

\int_{-\infty}^{+\infty} \exp(-x^2) dx = \sqrt{\pi}

\] It's the square root of pi, the same pi that is the ratio of the circumference of a circle to its diameter. This claim is no approximation. It's really the same \(\pi\) that appears in the integral of the Gaussian. How is it possible? What does the Gaussian or its integral have to do with the circle? Every curious kid has to ask this question at the beginning. How does the Gaussian function "know" about the circle, or how does the "circle" become important for questions about the seemingly non-circular Gaussian function?

The proof is a "minimum subset" of the derivation of the volume of the higher-dimensional balls.

The funny fact is that we can calculate the integral above if we calculate its second power, i.e. its square:\[

\left ( \int_{-\infty}^{+\infty} \exp(-x^2) dx \right)^2 = \dots

\] Well, the second power may be written by writing the two identical factors next to each other, with "times" in between. But to be able to manipulate the expression, it's useful to directly rename \(x\) to \(y\) in the second factor:\[

\dots =\int_{-\infty}^{+\infty} \exp(-x^2) dx \cdot \int_{-\infty}^{+\infty} \exp(-y^2) dy = \dots

\] What is surprising – and the first deep lesson – is that we're seemingly making the problem harder but we're actually getting closer to the proof, anyway. Imagine that someone asks you to lift 100 pounds to the top of some furniture with your left arm. I still couldn't quite do it because of my bike accident last Saturday. OK, someone could tell you: lift 200 pounds and then return 100 pounds back. You would be laughing: What sort of stupidity is that? If it's hard to lift 100 pounds, it must be even more difficult to lift 200 pounds, right?

Well, this simple counting is often wrong in mathematics. Calculating things that look "less elementary" often ends up being more doable. OK, how do we calculate the product of integrals over \(x\) and \(y\) above? We first realize that the integral over \(x\) – which is a "coefficient" in front of the integral over \(y\) – may be included in the second integral. That effectively merges the two one-dimensional integrals into a single two-dimensional one:\[

\dots = \int \int dx\,dy \exp(-x^2-y^2) =\dots

\] I have suppressed the limits – they always go from \(-\infty\) to \(+\infty\) – and I have also written the product of two exponentials as the exponential of the sum. If you got lost, then it means that you would probably get lost somewhere later, anyway, so it makes no sense to spend too much time teaching these trivialities – unless my pessimistic comments about your prospects motivated you to prove the triviality, after all. (Note that this method of teaching mathematics, Motl's method, is very different from Hejný's method in this respect.)

OK, we're almost done. The two-dimensional integral is one over a two-dimensional plane and by the Pythagorean theorem, \(-x^2-y^2 = -r^2\) is simply (minus) the squared radial coordinate from the polar coordinates. We may switch the measure to the polar coordinates as well, \(dx\,dy = dr\,r\,d\phi\), to get \[

\dots = \int_{0}^{2\pi} d\phi \int_{0}^{\infty} dr \, r\,\exp(-r^2) = \dots

\] That's great because the integrand doesn't really depend on the angular coordinate \(\phi\). The integral over \(\phi\) simply gives us \(2\pi\), the circumference of the unit circle. That's how the circle gets in. And the switch really paid off because another benefit of the polar coordinates was the extra \(r\) in the area element, from the Jacobian.

So we're no longer looking for the "impossible" indefinite integral of \(\exp(-r^2)\). Instead, we're trying to integrate \(r\exp(-r^2)\). And although the latter looks "longer" and therefore "harder", we can no longer be shocked that it's actually "easier" to integrate the second function. The indefinite integral of \(r\exp(-r^2)\) is simply \(-\exp(-r^2)/2\). Just try to differentiate the latter: you get the exponential back, times the derivative of the argument which is \(-2r\), and \((-1/2)\cdot (-2r)\) is \(r\) which is what we needed.

That's why the two-dimensional integral equals\[

\dots = 2\pi \cdot \left[ -\frac 12 \exp(-r^2) \right]_0^{\infty}= 2\pi[0-(-1/2)] = \pi

\] We got it. The two-dimensional integral equals \(\pi\), half the circumference of the unit circle! We may now return to the one-dimensional Gaussian integral. It's the square root of \(\pi\). Well, it could be \(\pm \sqrt{\pi}\) but we may eliminate the negative answer because the integral is self-evidently positive. Done!
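
For readers who like to see numbers, here is a minimal Python sketch of a sanity check. The helper name `midpoint_integral`, the cutoff at 10 and the number of steps are my illustrative choices, not a part of the proof. It approximates the one-dimensional Gaussian integral and the radial integral by the simple midpoint rule:

```python
import math

def midpoint_integral(f, a, b, steps=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# The tails beyond |x| = 10 are of order exp(-100), so a finite cutoff is enough.
CUTOFF = 10.0

# One-dimensional Gaussian integral; should be close to sqrt(pi).
gauss_1d = midpoint_integral(lambda x: math.exp(-x * x), -CUTOFF, CUTOFF)

# Radial part of the two-dimensional integral; should be close to 1/2.
radial = midpoint_integral(lambda r: r * math.exp(-r * r), 0.0, CUTOFF)

# Angular factor 2*pi times the radial integral; should be close to pi.
gauss_2d = 2 * math.pi * radial

print(gauss_1d, math.sqrt(math.pi))
print(gauss_2d, gauss_1d ** 2, math.pi)
```

The three numbers on the second line should agree up to the small error of the numerical integration, which is the statement that the square of the one-dimensional integral equals the two-dimensional one.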

It works. What have we learned conceptually? We have learned that by making things superficially harder, by combining the elementary things into combinations, by lifting 200 pounds instead of 100 pounds, we may sometimes actually make progress. The apparent "extra difficulty" actually cancels against some difficulty that was there to start with, which couldn't happen if we insisted on the "straightforward minimal work". And when the difficulties cancel, the problem may be doable, calculable, or provable!

Equivalently, we have learned that useful and important calculations are not always "results of a self-evidently useful, straightforward or mechanical algorithm". Sometimes, one needs to play with things. Something that may look like a waste of time to a layman may be a very clever thing to do – which actually leads to the solution of difficult problems. Similar clever tricks exist in many other calculations – which is why \(\pi\), the number defined from the circumference of a circle, appears in many previously unexpected places in mathematics.

For example, the probability that two random integers are relatively prime, is \(6/\pi^2\), close to 60 percent. Because that probability is the product of probabilities \(1-1/p^2\) over all primes – which is the probability of the negation of the proposition that both random integers are multiples of \(p\) – and because the zeta-function may be written using this Euler product, we may prove that the probability is equal to \(1 / \zeta(2)\). And \(\zeta(2) = 1+1/2^2+1/3^2+1/4^2+\dots =\pi^2 / 6\) may be calculated – and shown to include the same circle-based \(\pi\) – by calculating the norm of a periodic, locally linear or polynomial, function using two methods that are related by the Fourier series to each other. Because Fourier series deal with the natural periodicity \(2\pi\), you will get the \(\pi\)'s somewhere, and they're why the \(\pi\) appears in \(\zeta(2)\) as well as the probability of numbers' being relatively prime.
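
Both claims are easy to probe numerically. The following Python sketch is only an illustration under my own assumptions: "random integers" are replaced by all pairs from a finite range \(1,\dots,n\) (the statement about the probability is really about the limit \(n\to\infty\)), and the function names `coprime_fraction` and `zeta2_partial` are made up for this example:

```python
import math

def coprime_fraction(n):
    """Fraction of pairs (a, b) with 1 <= a, b <= n that are relatively prime."""
    coprime = sum(1 for a in range(1, n + 1)
                    for b in range(1, n + 1)
                    if math.gcd(a, b) == 1)
    return coprime / (n * n)

def zeta2_partial(terms):
    """Partial sum 1 + 1/2^2 + ... + 1/terms^2 of the series for zeta(2)."""
    return sum(1.0 / (k * k) for k in range(1, terms + 1))

fraction = coprime_fraction(500)
zeta2 = zeta2_partial(100_000)

# Finite-range fraction of coprime pairs versus the limiting value 6/pi^2.
print(fraction, 6 / math.pi ** 2)
# Partial sum of the series versus pi^2/6.
print(zeta2, math.pi ** 2 / 6)
```

The partial sum approaches \(\pi^2/6\) rather slowly, with the neglected tail being roughly one over the number of terms, so don't expect more than a few matching digits.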

Now, we have seen how to calculate the integral of the Gaussian. Can it help us to calculate the volume of the \(N\)-dimensional ball? You bet. Just for fun, let's calculate the integral\[

\int d^N x \,\exp(-r^2 / 2) = \zav{\sqrt{2\pi}}^N

\] Here, I used the argument \(-r^2/2\) and not just \(-r^2\) but that doesn't change anything substantial about the calculation. You can relate the two integrals simply by rescaling \(r\) by the factor of \(\sqrt{2}\) in one direction (a substitution) – which is also why the extra power of \(\sqrt{2}\) has appeared in the result. You're supposed to check the identity above – and I think it's sensible for the people who play with such things to more or less memorize this formula even in this form. It's not "mandatory" but if a sane person meaningfully plays with such things for some time, he will at least temporarily memorize these basic facts, anyway.
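
For the record, the check takes one line. The labels \(x_i\) and \(u_i\) are just my names for the \(N\) Cartesian coordinates and their rescaled versions, \(x_i = \sqrt{2}\,u_i\), so that \(r^2 = x_1^2+\dots+x_N^2\). Using the one-dimensional result \(\int \exp(-u^2)\,du = \sqrt{\pi}\), one gets\[

\int d^N x \,\exp(-r^2 / 2) = \prod_{i=1}^N \int_{-\infty}^{+\infty} dx_i\, \exp(-x_i^2/2) = \prod_{i=1}^N \sqrt{2} \int_{-\infty}^{+\infty} du_i\, \exp(-u_i^2) = \left( \sqrt{2\pi} \right)^N

\] Each of the \(N\) directions contributes one factor of \(\sqrt{2}\) from the substitution and one factor of \(\sqrt{\pi}\) from the Gaussian integral.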

Now, we can pick the spherical coordinates – the \(N\)-dimensional generalization of the polar coordinates – in this situation, too. The volume form may be written as\[

d^N x = S_N(r) dr = S_N r^{N-1} dr

\] where \(S_N(r)\) is the hypersurface of the \(N\)-dimensional ball, i.e. the hyperarea of the \((N-1)\)-sphere of radius \(r\); apologies if you would prefer to call it \(S_{N-1}\). When the argument \(r\) is omitted, it's assumed that \(r=1\), and the scaling with \(r\) clearly generates the power \(r^{N-1}\). So we can calculate the \(N\)-dimensional Gaussian either as the power of the one-dimensional Gaussian (left hand side); or in the (hyper)spherical coordinates. These two results are equal i.e.\[

\zav{\sqrt{2\pi}}^N = S_N \int_{0}^\infty dr\, r^{N-1} \exp(-r^2/2).

\] This is an identity, a fact that we proved to be correct just like \(2+2=4\). And it's an identity where everything seems explicit except for \(S_N\). So we should be able to extract \(S_N\) from it. And yes, we can:\[

S_N = \frac{ \zav{\sqrt{2\pi}}^N }{ \int_{0}^\infty dr\, r^{N-1} \exp(-r^2/2) }

\] In the denominator, we may recognize a famous integral after the substitution \(t=r^2/2\) i.e. \(r=\sqrt{2t}\), i.e. \(dr = dt / \sqrt{2t}\):\[

S_N = \frac{ \zav{\sqrt{2\pi}}^N }{ \int_{0}^\infty dt\, (2t)^{(N-2)/2} \exp(-t) }

\] OK, up to the factor \(2^{N/2}\) which cancels against the numerator and \(2^{-1}\) which doesn't, the denominator has become an Euler integral for the Gamma function, a generalized factorial:\[

P! = \int_{0}^\infty dt\, t^P \exp(-t)

\] In our denominator, the relevant factorial is \([(N-2)/2]!\).
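
Spelled out, with \((2t)^{(N-2)/2} = 2^{N/2-1}\, t^{N/2-1}\), the denominator equals \(2^{N/2-1}\cdot (N/2-1)!\), so that\[

S_N = \frac{ (2\pi)^{N/2} }{ 2^{N/2-1} \left( \frac N2 - 1 \right)! } = \frac{ 2\pi^{N/2} }{ \left( \frac N2 - 1 \right)! }

\] As a sanity check, \(N=2\) gives \(S_2 = 2\pi / 0! = 2\pi\), the circumference of the unit circle, and \(N=3\) gives \(S_3 = 2\pi^{3/2} / (1/2)! = 4\pi\) because \((1/2)! = \frac 12 \cdot (-1/2)! = \sqrt{\pi}/2\).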

You may check that the general integral in the right hand side of Euler's formula above matches everything you expect from the factorial. For \(P=0\), it gives you \(0!=1\). And if you increase \(P\) by one, you may calculate the integral by parts. One term disappears and the other one reduces \(P!\) to \(P\cdot (P-1)!\) where \((P-1)!\) is expressed by the same Euler integral, too.
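
If you prefer a numerical check, here is a minimal Python sketch. The function name `euler_factorial`, the substitution \(t=u^2\) (which I chose only to tame the behavior near \(t=0\) for \(P=-1/2\)), the cutoff and the number of steps are assumptions of this example, and it only covers \(P\geq -1/2\):

```python
import math

def euler_factorial(p, u_max=12.0, steps=50_000):
    """Approximate P! = integral_0^inf t^P exp(-t) dt for P >= -1/2.

    The substitution t = u^2 turns the integral into
    2 * integral_0^inf u^(2P+1) exp(-u^2) du,
    which is evaluated with the midpoint rule on [0, u_max].
    """
    h = u_max / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += 2.0 * u ** (2 * p + 1) * math.exp(-u * u)
    return h * total

for p in (0, 1, 2, 3, 4):
    # Compare with the ordinary factorial for non-negative integers.
    print(p, euler_factorial(p), math.factorial(p))

# Half-integer values: (-1/2)! should be sqrt(pi), and (1/2)! half of that.
print(euler_factorial(-0.5), math.sqrt(math.pi))
print(euler_factorial(0.5), 0.5 * euler_factorial(-0.5))
```

Note that for \(P=-1/2\), the substituted integrand is just \(2\exp(-u^2)\), so the value \((-1/2)!=\sqrt{\pi}\) is the Gaussian integral in disguise.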

If you combine the powers of \(2\) and \(\pi\), and if you realize that \(S_N(r)\) may be obtained as the derivative of \(V_N(r)\) which also means \(S_N = N\cdot V_N\) for the unit ball (the \(N=3\) special case is \(4\pi = 3\cdot 4\pi / 3\)), then you may verify the originally claimed formula, which I write for the unit ball's volume here:\[

V_N = \frac{S_N} N = \frac{\pi^{N/2}}{N\cdot \frac 12 \cdot (N/2-1)!} = \frac{\pi^{N/2} } { \zav{\frac N2}! }

\] Many people know the derivation very well, others should go through all the trivial operations not shown "explicitly" above, at least once. But even if you don't go through these operations, you should trust me that the result is true and all steps not explicitly discussed are trivial.
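
The final formula is also easy to evaluate on a computer. In this minimal Python sketch, the function names `ball_volume` and `sphere_area` are mine, and the generalized factorial \(X!\) is computed as `math.gamma(X + 1)` from the standard library:

```python
import math

def ball_volume(n, radius=1.0):
    """Volume of the n-dimensional ball: pi^(n/2) / (n/2)! * R^n."""
    # The generalized factorial X! is evaluated as Gamma(X + 1).
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * radius ** n

def sphere_area(n, radius=1.0):
    """Hyperarea S_N(r) of the boundary, i.e. dV_N/dr = N * V_N(r) / r."""
    return n * ball_volume(n, radius) / radius

for n in range(1, 7):
    print(n, ball_volume(n), sphere_area(n))
```

For \(N=1,2,3\), the volumes of the unit ball come out as \(2\), \(\pi\), and \(4\pi/3\), i.e. the length of the interval \((-1,+1)\), the area of the unit disk, and the volume of the ordinary unit ball, while the corresponding \(S_2\) and \(S_3\) are \(2\pi\) and \(4\pi\).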

What have we learned conceptually? We have learned that sometimes, when it's easier to lift 200 pounds instead of 100 pounds, it may be a good idea to lift \(N\) hundred pounds, too. ;-) The higher-dimensional Gaussian has a spherical symmetry so we could have expected that it knows something about the higher-dimensional balls and spheres, especially if the integral of the Gaussian over the whole space may be calculated as a power of something that we know. The higher-dimensional Gaussian is the best-suited "rotationally invariant function" for that purpose because a rotationally symmetric function depends only on \(r^2\), and exponentiating \(-r^2\) is the simplest way to make the integral convergent.

We have also learned Euler's integral for the generalized factorial. That integral may be proven by induction because it confirms the recursion formula for the factorial which is clear if you try to calculate the integral by parts. Also, Euler's integral gives you \(0!=1\) and \((-1/2)! = \sqrt{\pi}\), as you may verify.

Patterns in mathematics are often clever – and cleverly connected with each other, and with various particular combinations of other things. Such a reorganization of the wisdom about "what is easy" and "what is difficult" emerges in many successful calculations. When you understand a clever derivation like that, you should adjust your expectations that were proven to be inadequate. At the beginning, you could have expected that the one-dimensional Gaussian must be "easier" to integrate than the higher-dimensional ones. But once you go through the derivation and it works happily, you must be able to appreciate that your expectations were wrong. The two-dimensional Gaussian was easier to integrate.

More generally, expectations like that may sometimes be wrong. It's often easier to integrate a function that looks "harder" than to integrate a function that looks "easier". Even more generally, it's often easier to calculate, solve, or prove something that looks "harder" than something that looks "straightforward".

And the simplest calculations and proofs that actually exist may often involve tools that you didn't expect to be useful at all. To calculate the one-dimensional Gaussian integral, we got something that depends on \(\pi\) i.e. on a circle. The addition of the \(t^P\) factor to a simple integral of \(\exp(-t)\) has the effect of including a factor that is equal to \(P!\). So power-law insertions are helpful to write factorials using integrals. (Perturbative string theory amplitudes exploit this fact everywhere.)

And yes, geometry may not only be rewritten as algebra. Geometric relationships, shapes, and their combinations may be rethought as algebraic methods and tricks. The previous sentences contain quite some abstract, general lessons. But a person who works with mathematics – especially the kind of mathematics that is found in theoretical physics – sees this theme repeat itself many times, in increasingly sophisticated patterns.

However, there was some moment when each of us said "Eureka" for the first time. From the kindergarten years, we could do some "straightforward" things in mathematics where everything matched some expectations if we had any. But there was a moment in which our expectations were demonstrably wrong – and we could see that the truth is very precise, totally provable, and far more interesting than the expectations. It was a moment when we learned that mathematics or Mother Nature don't have to be necessarily smarter than what we can become, but they're surely smarter than the dumb and naive babies that we were born as.

This first lesson that one's built-in and straightforward common sense expectations may be wrong – and that doesn't mean the end of reason, on the contrary, it means the beginning of intelligent reason – is the same lesson or a similar lesson that one needs in natural sciences, too. A physicist must undergo the experience of having a theory that looks extremely logical and likable – but cold hard evidence shows the theory to be wrong and the theory is ideally replaced with one that looks even more logical.

One must learn that we have our limitations – and one must learn that they may be pushed and we may learn to be smarter. Once we understand that this trend is possible at all, we may repeat it many times and get rather far.

I am afraid that the classes of mathematics, especially according to those postmodern methods where the pupils are "protected against the unknown" in a specific version of the safe spaces, simply don't teach any of that. These pupils don't learn that common sense may fail, that there are often clever things that work for unexpected reasons and that only work because several pieces of the puzzle exactly match what is needed. The kids are reinforced in their view that they were basically born as perfect beings who already know everything that is worth knowing and what is left is just some practice. This opinion is completely wrong.

I am sure that most kids aren't excited by these calculations and clever proofs. After all, many of these calculations and proofs are dead ends – you only need the proof once and on top of that, even the result is usually useless for practical enough applications. But this fact shouldn't prevent other kids from being exposed to this stuff because love trumps hatred – you know, when a bright kid understands some of these deep things and is ready to build many new floors on this skyscraper of knowledge, it's much more important than the fact that another kid doesn't have a clue and dislikes mathematics!

What's primarily wrong about the decisions "how to design a good system to teach mathematics" is that the kids who hate mathematics are considered more important than those who love it – and that's just wrong, wrong, wrong. It's wrong for the intellectual happiness of the kids who love it and it's wrong for the intellectual and technological future of the nation, nations, and mankind. Unfortunately, this preference for the kid who hates mathematics is what has dominated the teaching of mathematics in Czechia and elsewhere for decades and that's why the education has substantially deteriorated.
