Wednesday, July 13, 2011

Regularization and renormalization

Eighty years ago, people encountered the first divergences - infinities - in the calculations based on quantum field theory: I chose Julian Schwinger as the top representative of these developments. It was a somewhat technical development but in some sense, it was as big a revolution and a source of new puzzles as relativity or quantum mechanics themselves.

Albert Einstein was a pioneer of relativity and actually made major contributions to early quantum mechanics, too. (And he also encouraged people to think about entanglement etc.) However, he didn't really accept quantum mechanics because he never understood it. Paul Dirac, a big shot from a newer generation, understood relativity as well as quantum mechanics (he was a major co-father of it) but he never accepted renormalization. Of course, this theme gets repeated all the time. Richard Feynman was a big shot from another, younger generation who understood and accepted relativity, quantum mechanics as well as renormalization but he was already too old to understand and accept string theory.

I want to discuss the divergences and renormalization in quantum field theory in some detail.

A historical summary: is renormalization legitimate?

Over the years, people learned to deal with the infinities and extract the final answers from their calculations that could be compared to experiments. It was very clear by the early 1950s that a well-defined universal machinery existed but many people opposed it - and some people oppose it even today. However, the agreement of the calculations with the observations is the best proof that the methods are correct and essential for all of quantum field theory.

In the 1970s, people's hostility toward renormalization was tamed - or should have been tamed - by the discovery of the Renormalization Group by Ken Wilson and friends.

In this framework, every quantum field theory may be viewed as an "effective description" that only holds for distances \( L > L_{min} \) or \( L \gg L_{min} \) and for energies \( E < E_{max} \) or \( E \ll E_{max} \). The physical phenomena may be related to some "critical behavior" and these phenomena are labeled by some parameters. One may get the same low-energy phenomena if one lowers \( E_{max} \) but changes the values of the coupling constants and masses \( g,e,m,\dots \) correspondingly. The modification of the couplings and masses that accompanies a small change of the typical energy scale \( E_{max} \), and that is needed to preserve all the low-energy physical phenomena, is what is referred to as the Renormalization Group flow.

It's called a "group" because the operation of changing \( E_{max} \) by the factor \( k \) is a (multiplicative) "semigroup" or "monoid" and physicists have simplified the word to a group because they're not used to contrived, pompously sounding terms such as "semigroups" and "monoids".
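To make the flow and its composition rule concrete, here is a minimal Python sketch. It assumes the standard one-loop running of the fine-structure constant in QED with a single Dirac fermion of unit charge, \( d(1/\alpha)/d\ln E = -2/(3\pi) \). The function name `run_inverse_alpha` and the rescaling factors 10, 50 and 500 are purely illustrative choices, not anything taken from the text above.

```python
import math

ALPHA_AT_ME = 1 / 137.036   # fine-structure constant near the electron-mass scale
B = 2 / (3 * math.pi)       # one-loop coefficient, single Dirac fermion (assumption)

def run_inverse_alpha(inv_alpha, k):
    """Return 1/alpha after the energy scale is multiplied by the factor k (one loop)."""
    return inv_alpha - B * math.log(k)

inv0 = 1 / ALPHA_AT_ME

# Rescaling by 10 and then by 50 must agree with a single rescaling by 500:
# the flow composes multiplicatively in the scale factor k.
two_steps = run_inverse_alpha(run_inverse_alpha(inv0, 10.0), 50.0)
one_step = run_inverse_alpha(inv0, 500.0)

print("1/alpha after two steps:", two_steps)
print("1/alpha after one step: ", one_step)
```

In this one-loop toy the flow can be run in both directions; in a genuine coarse-graining, lowering \( E_{max} \) discards information, which is why only the semigroup structure survives in general.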

Is quantum field theory a quantum mechanical theory?

The answer is a resounding Yes. So whenever I was speaking about the universal postulates of quantum mechanics, they fully apply to quantum field theories, too.

There is a well-defined Hilbert space and there obviously exist operators. States are evolving according to a linear Schrödinger's equation. Squared absolute values of amplitudes are interpreted as probabilities. Nature remembers all correlations - called entanglement in the general quantum context - and the right "interpretation" of the measurement is identical in quantum field theory as it is in quantum mechanics. After all, non-relativistic quantum mechanics may be identified as a limit of a quantum field theory. A part of the Hilbert space of a QFT - which will stand for Quantum Field Theory - may be identified with the Hilbert space of a non-relativistic QM theory and the Hamiltonians agree in the limit \( 1/c \to 0 \), too. Decoherence is as real as it has always been, much like everything else linked to the foundations of QM.

What's different or surprising is that the "normal operators" we might expect to be well-behaved in quantum field theory, namely the fields, are not quite well-behaved. Relative to the well-defined states of the Hilbert space - those that can be identified with the "ordinary" quantum mechanical states - such field operators may have infinite matrix elements and the Hamiltonian may be expressed as a functional of the field operators that involves divergent constants. It's very important to realize that only things that can be measured - e.g. the cross sections and the amplitudes directly leading to them - are required to be finite and well-behaved. Everything else may be subtle.

Let me emphasize that the "divergent" character of the basic field operators is yet another complication different from the fact that these operators are not strictly speaking operators but "operator distributions" (whose commutators involve Dirac's delta-functions which are also distributions). Instead, we will be talking about the divergence that comes from the factors such as \( 1 + e^2 \log (\Lambda / m) \) where \( \Lambda \) is a huge energy scale.

QED: a prototype of a renormalizable QFT

By QED or Quantum Electrodynamics, I will mean a theory of the electromagnetic field \( A_\mu \) and a single Dirac field \( \Psi \) describing the electrons and positrons. Of course, the term QED may be used for theories that also contain other charged fields (and the corresponding particles) but let's be simple at the beginning.

The action of a QFT is always written as
\[ S_{QFT} = \int d^d x\, {\mathcal L}_{QFT} \]
where \( d \) is the spacetime dimension or \( d-1 \) is the number of spatial dimensions. We will usually set \( d=4 \) which, at least naively, most closely resembles the Universe we live in. However, quantum field theories with \( d<4 \) as well as \( d>4 \) should be and are considered.

When it comes to divergences, it is quite generally true that for ever larger values of \( d \), one encounters ever more severe short-distance or ultraviolet (UV) divergences while for small values of \( d \), the UV divergences become more peaceful, less severe, or disappear, and on the contrary, we are seeing ever strengthening long-distance or infrared (IR) divergences. The UV and IR divergences have very different interpretations and cures, as we will discuss at the end.

QED Lagrangian

The Lagrangian of QED is inherited from classical electrodynamics. It is, in a certain normalization of the fields,
\[ {\mathcal L}_{QED} = -\frac{1}{4e^2} F_{\mu\nu} F^{\mu\nu} + \bar\Psi (i\gamma^\mu D_\mu - m )\Psi \]
This is the whole story. The full, real, accurately verified QED is given by this Lagrangian: any "additional terms" you may ever see in the "full" QED Lagrangian only appear because of field and parameter redefinitions.

Here, \( F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu \) is the field strength - the intensity of electric and magnetic fields - as calculated from \( A_\mu \), the 4-vector potential, which is treated as the elementary field variable of the electromagnetic field. \( D_\mu \) is the covariant derivative which for \( U(1) \) gauge groups is simply \( \partial_\mu + i A_\mu \) where the coefficient in front of \( A_\mu \) was set to one because I renormalized the field \( A_\mu \) as is obvious from the normalization factor \( -1/4e^2 \) in the first term of the Lagrangian (the Maxwell term or the kinetic term for the photons).

Of course, if there were other fields with different values of the electric charge, I would have to put the corresponding charge (in units of the positron's charge) next to the \( A_\mu \).

The field \( \Psi \) is the Dirac spinor field for the electrons and positrons. I expect the reader to be familiar with - or to ignore - the intricacies of the spinorial representations of the Lorentz group because they're pretty independent from renormalization.

If you divide the Lagrangian into the quadratic and other terms, the only "other" term is the term schematically of the form \( \bar \Psi A \Psi \) which produces the cubic vertex in Feynman diagrams - two external electron/positron (straight) lines and one external (wiggly) photon line. The remaining quadratic terms may be treated as a solvable free field theory and they produce the propagators for the (straight) electrons and positrons (electrons with negative energies moving backwards in time) and for the (wiggly) photons.


Note that the simple QED Lagrangian has two parameters, the electromagnetic coupling \( e \) and the electron mass \( m \). However, one must immediately realize that the parameters in the Lagrangian don't have to obey the naive expectations.

First of all, \( m \) in the Lagrangian is not necessarily the physical mass of the electron we can measure. You must understand that to create and measure a real electron is a complicated process - potentially involving corrections from loop diagrams with virtual particle-antiparticle pairs and other things - so what we measure as the electron mass isn't necessarily the same thing as the simplest parameter that is most directly inserted into the simplest form of the Lagrangian. I have used the symbol \( m \) just for the sake of simplicity. The parameter in the Lagrangian should use a different symbol such as \( m_0 \) as soon as you start to use \( m \) for the actual measured value of the mass.

Analogous comments apply to \( e \) where they are even more urgent: \( e \) is not exactly the charge that you have to substitute into Coulomb's law to get the right electrostatic force.

Second of all, \( m \) and \( e \) don't really have to be finite at all. In fact, they have to be infinite in the right way so that the physical masses and couplings - which are the sums of \( m \) or \( e \) as well as many corrections - end up being finite. Because the corrections are "infinite", it's clear that the initial values of \( m \) and \( e \) in the Lagrangian have to be infinite as well, for the observable sum to be finite.

The photon mass has to be zero because a nonzero value would violate gauge invariance. There is no consistent version of pure QED that has a massive photon. However, you may have to include a photon mass term in the Lagrangian and insist that the masslessness - and gauge invariance - is restored when the counterterms are added. In fact, you have to consider such photon mass terms - and manually cancel them - if you use a regularization technique that produces corrections to the photon mass. Many modern regularization techniques automatically preserve the gauge invariance, i.e. the masslessness of the photon, so this problem is avoided.

Gauge invariance acts as
\[ \begin{align}
A_\mu &\to A_\mu - \partial_\mu \lambda \\
\Psi &\to \Psi \exp (i\lambda)
\end{align} \]
where \( \lambda = \lambda (x_\mu) \) is a parameter of the \( U(1) \) transformation. There is no factor of \( 1/e \) in the first line because \( e \) has been absorbed into the normalization of \( A_\mu \), so that \( D_\mu = \partial_\mu + i A_\mu \), and the sign is fixed by requiring that \( D_\mu \Psi \) transforms in the same way as \( \Psi \). The original Lagrangian is invariant under this transformation. The photon field has negative-norm, sick time-like polarizations that have to be removed by this gauge invariance. For this reason, we also need some formalism to guarantee that only the physical photon polarizations remain. So gauge-fixing terms have to be added and there's a whole technology here. Much like in the case of the spinor indices, these technicalities are largely independent from the basic logic of renormalization.

Splitting the constants: counterterms

The loop diagrams produce divergences - things that behave like
\[ \int d^4 p \frac{1}{p^4} \]
at large values of the loop momentum \( p \). The particular integral above is dimensionless - so the corresponding divergence has to be logarithmic. From one viewpoint, the logarithmic divergences are the "softest" ones: the power law divergences are "more" divergent.
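As a quick numerical illustration of the difference between the two kinds of growth, the following Python sketch integrates the radial part of \( \int d^4p / p^n \) between a lower bound \( m \) and a cutoff \( \Lambda \). The lower bound, the function name `radial_integral` and the step count are assumptions made only for this example. For \( n=4 \) every decade of \( \Lambda \) adds the same amount, which is the hallmark of a logarithm, while for \( n=2 \) the result grows like \( \Lambda^2 \).

```python
import math

def radial_integral(power, cutoff, m=1.0, steps=20000):
    """Midpoint-rule estimate of  int d^4p / p^power  over  m < |p| < cutoff."""
    t_max = math.log(cutoff / m)
    h = t_max / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h               # integrate in t = ln(p/m)
        p = m * math.exp(t)
        total += p ** (4 - power) * h   # p^3 dp / p^power = p^(4-power) dt
    return 2 * math.pi ** 2 * total     # 2*pi^2 is the area of the unit 3-sphere

log_small = radial_integral(4, 1e2)
log_large = radial_integral(4, 1e4)
quad_small = radial_integral(2, 1e2)
quad_large = radial_integral(2, 1e4)

print("n=4: increase when the cutoff grows 100x:", log_large - log_small)
print("n=2: ratio when the cutoff grows 100x:   ", quad_large / quad_small)
```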

From another viewpoint, the logarithmic divergences are the "most real ones": the power-law divergences may be viewed just as representations of finite numbers. This conclusion becomes explicit if we use dimensional regularization. The finiteness of the power-law divergences is analogous to the finiteness of \( 1+2+3+\dots \). The logarithmic divergence is "really infinite" much like the value \( \zeta(+1) \) of the Riemann zeta function. Recall that \( s=+1 \) is the location of the only pole of the zeta function.

Because the loop diagrams for processes as simple as the repulsion of two very slow electrons include divergent terms such as the term above, it's clear that for the final Coulomb force to be finite, the tree-level contribution has to be "infinite" in the right way for the infinities to cancel. So we may write
\[ \frac{1}{e^2} = \frac{1}{e_{\rm finite}^2} + \Delta E. \]
This is not a standard notation but I wanted to explicitly indicate that \( 1/e^2 \) fails to be finite and we divided it into a finite and an infinite part. The infinite part is \( \Delta E \). All the infinite parts are written as separate terms in the Lagrangian and, even though some of them are quadratic, they are uniformly treated as interaction terms (if they were used to define the propagators, there would be no real separation of them from the finite parts).

In a similar way, one divides
\[ m = m_{\rm finite} + \delta m. \]
That's not the last thing we have to do. If we use the fields \( A_\mu \) and \( \Psi \) from the simple Lagrangian to create and annihilate particles in the ways we would expect in field theory, we find out that the normalization of the resulting 1-particle states is also infinite - it includes an expansion in \( e \) with divergent coefficients in front of the higher powers of \( e \). So one also wants to rewrite
\[ \Psi = Z_\Psi \Psi_{\rm finite} \]
and similarly for the electromagnetic field. Note that all those things are just changes of variables. We're not really changing the original Lagrangian in any way. We're just changing variables in such a way that the new ones are "finite" because we're used to working with finite variables. However, a field redefinition is not a modification of physics. It's just a bookkeeping trick.

The infinite parts of \( e \), \( m \), and other quantities may be considered "large" by a layman or a beginner - much bigger (and more important) than the finite parts. But this is a totally incorrect interpretation. In reality, the infinite parts of these parameters are meant to cancel loop diagrams and loop diagrams come equipped with the factor of \( e^2 \) or even higher powers of the coupling - and these are small numbers.

The suppression by genuine, physical, truly small parameters such as \( e \) - which is related to the small fine-structure constant, \( \alpha = 1/137.036 \) - is the only suppression that allows a physicist to say that something is smaller than something else. The "divergent coefficients" in the "infinite part" of \( e \) and other parameters are fake or unphysical. They surely shouldn't lead you to believe that the infinite parts are more important or relevant than the finite parts. They're less relevant and smaller. You may say that the reason is that everything that is "infinitely powerful about them" will get canceled.
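To get a feeling for how small the "infinite" part really is, one may replace the infinity by the largest cutoff anyone would seriously consider. The Python sketch below does this under several assumptions that are not part of the text above: the one-loop estimate \( \Delta E \approx -\log(\Lambda/m)/(6\pi^2) \) for a single electron loop, the convention \( e^2 = 4\pi\alpha \), and the Planck energy as the cutoff \( \Lambda \).

```python
import math

ALPHA = 1 / 137.036          # fine-structure constant
M_ELECTRON_GEV = 0.511e-3    # electron mass in GeV
PLANCK_GEV = 1.22e19         # Planck energy in GeV, used as a very large cutoff (assumption)

inv_e2_finite = 1 / (4 * math.pi * ALPHA)          # 1/e^2 with e^2 = 4*pi*alpha
log_factor = math.log(PLANCK_GEV / M_ELECTRON_GEV)
delta_E = -log_factor / (6 * math.pi ** 2)         # one-loop estimate (assumption)

ratio = abs(delta_E) / inv_e2_finite
print("finite part 1/e^2:      ", inv_e2_finite)
print("cutoff-dependent part:  ", delta_E)
print("relative size:          ", ratio)
```

Even with the cutoff pushed about 22 orders of magnitude above the electron mass, the cutoff-dependent part comes out below a tenth of the finite part, because the logarithm is multiplied by a loop factor.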


The loop diagrams include divergent integrals. The first thing you have to realize is that saying that something is "infinity" and that's the end of story is not good enough. It won't allow you to subtract the infinities to get finite, observable results that may be compared to the experiments.

In fact, you need to preserve your resolution when it comes to the finite part. If you fail to do so, the cancellation of the infinities will look like
\[ \sigma = \infty - \infty = {\rm bullshit} \]
or, using a somewhat more technical term, the result is an "indeterminate form". If you tell your experimental friend that he should be able to measure the cross section equal to an indeterminate form, your friend will probably tell you that you are something that is not expressed in an excessively technical language.

So you actually have to have a crisp formula that is more accurate than just \( \infty \) if you evaluate the divergent integrals. There exist various methods to achieve this. An archaic method is called the Pauli-Villars regularization: it is a strategy to cancel all the divergences by including unphysical new electron-like particles with the wrong sign of the kinetic term (so that there may be cancellations) but much higher masses (so that you don't affect low-energy physics). In this way, the divergent part may be cancelled and one has to study the independence of the theory from the high masses of the spurious, newly added particles.

A brute cutoff means that the loop integrals are integrated not up to \( |p_\mu| = \infty \) but only up to some upper bound \( \Lambda \) in the Euclidean spacetime (after the Wick rotation). This brute cutoff has the disadvantage that we also generate contributions to terms that are prohibited by symmetries - such as the photon mass.

A much more modern machinery is dimensional regularization, which assumes that the spacetime dimension isn't just \( d=4 \) but a general complex continuous number \( d \), and all formulae are analytically continued wherever you want. The divergence reappears if you set \( d=4 \) but if you are just a little bit away from four, \( d = 4-2\epsilon \), then you already get finite results with parts like \( 1/\epsilon \) that diverge in the \( \epsilon \to 0 \) limit. The main advantage of dimensional regularization is that it protects various symmetries such as the electromagnetic gauge invariance - i.e. the masslessness of the photon.

Quite typically, the factor \( 1/\epsilon \) which diverges in the physically relevant limit plays the same role as \( \log (\Lambda/m) \) in the brute cutoff scenarios etc.
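The correspondence can be checked on a simple example. The Python sketch below uses the standard closed forms of the Euclidean integral \( \int d^dp/(2\pi)^d \, (p^2+m^2)^{-2} \), once with a brute cutoff in \( d=4 \) and once in \( d=4-2\epsilon \) dimensions. The choice of this particular integral, the function names, the masses and the regulator values are illustrative assumptions. Each version blows up as its regulator is removed, but the difference between two masses, which is the kind of quantity that can be physical, comes out the same in both schemes.

```python
import math

def bubble_cutoff(m, cutoff):
    """int d^4p/(2 pi)^4 1/(p^2+m^2)^2 restricted to |p| < cutoff (closed form)."""
    x = cutoff ** 2 / m ** 2
    return (math.log(1 + x) - x / (1 + x)) / (16 * math.pi ** 2)

def bubble_dimreg(m, eps):
    """The same integral in d = 4 - 2*eps dimensions (closed form, units with mu = 1)."""
    return math.gamma(eps) * (4 * math.pi) ** (eps - 2) * (m * m) ** (-eps)

m1, m2 = 1.0, 3.0   # two arbitrary masses (assumption)

diff_cutoff = bubble_cutoff(m1, 1e6) - bubble_cutoff(m2, 1e6)
diff_dimreg = bubble_dimreg(m1, 1e-6) - bubble_dimreg(m2, 1e-6)
expected = math.log(m2 ** 2 / m1 ** 2) / (16 * math.pi ** 2)

print("single integral, cutoff 1e6: ", bubble_cutoff(m1, 1e6))
print("single integral, eps = 1e-6: ", bubble_dimreg(m1, 1e-6))
print("difference (cutoff):         ", diff_cutoff)
print("difference (dim. reg.):      ", diff_dimreg)
print("log(m2^2/m1^2) / (16 pi^2):  ", expected)
```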


Now the most general amplitudes are expressed as functions of \( e,m,Z,\dots \) - where the dependence on \( e \) is usually encoded via Taylor expansion. What you have to do is to adjust the right values of these parameters - a finite number of parameters - so that your theory agrees with the experiments.

You need the right values of the normalization constants of the fields that I called \( Z \) - there is one per every field in general - so that the field operators create properly normalized one-particle states. And you also need to set the right "bare" values of the mass and coupling, \( e,m \).

To set the right mass, you need to create a one-particle electron state and check that its physically measurable value of the mass agrees with the experimental value. As I said, the bare value in the Lagrangian will generally be different - and divergent.

To adjust the right value of the electromagnetic coupling \( e \), you need to impose the condition that one measurable quantity - such as the force between two static electrons at some distance from one another - agrees with the finite, experimental value. Again, this will tell you that \( e \) in the Lagrangian has to be something specific, something like
\[ e = e_{\rm exp} + e_{\rm exp}^3 \left ( \frac{K_1}{\epsilon} + K_2 \right) + \dots \]
What I want to emphasize with this formula is that the subleading terms have formally divergent coefficients such as \( 1 / \epsilon \) but they are suppressed by higher powers of \( e \) or \( e_{\rm exp} \) - it doesn't matter which one you use.

Now, the magic feature of renormalizable theories is that once you set the right values of the 2-5 bare parameters of QED, to get the right normalization of fields, the right coupling, and the right mass, any prediction you can make will produce finite answers!

You may compute the scattering of 7 electrons and 2011 positrons at some random energies. Calculate the amplitude at a 5-loop level. Add up lots of diagrams. They will have lots of divergences. The haters of seemingly divergent integrals will give up after the first one appears and they will end up with an indeterminate form - also known as bullshit - but you will never give up. You will carefully add all the diagrams and subtract all the divergences of the type \( 1/\epsilon \) - which may be multiplied by hugely complicated functions.

If your belief and accuracy are strong enough, you will ultimately see that all the divergences cancel for any process! A finite number of adjustments is enough to produce a totally well-behaved theory that makes physically meaningful predictions for any phenomena. It's the magic of renormalizability. As sketched at the beginning, it may be "explained" by the Renormalization Group.

Renormalizable theories can be extrapolated to high energies

The main feature of renormalizable theories is that their physics may be extrapolated to and remains well-behaved at arbitrarily high energy scales - or at least up to energy scales that are exponentially higher - e.g. \( \exp (1/e^2) \) times - than the typical scales in your theory. Because of that, any contributions of new particles or, more generally, new physics at these extremely high energy scales \( \Lambda \) are suppressed by \( 1 / \Lambda^k \) where \( k \) is a positive exponent.
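How high is "exponentially higher" in QED? A rough Python estimate, assuming only the one-loop running with a single electron loop (so that \( 1/\alpha \) decreases by \( 2/(3\pi) \) per unit of \( \log E \)), locates the scale at which the one-loop \( 1/\alpha \) would reach zero. The variable names are illustrative and the number is only as good as the one-loop approximation.

```python
import math

ALPHA = 1 / 137.036   # fine-structure constant near the electron-mass scale

# One loop: 1/alpha(E) = 1/alpha(m) - (2/(3*pi)) * ln(E/m).
# It reaches zero when ln(E/m) = 3*pi/(2*alpha), which equals 6*pi^2/e^2 for e^2 = 4*pi*alpha.
log_ratio = 3 * math.pi / (2 * ALPHA)
orders_of_magnitude = log_ratio / math.log(10)

print("ln(E_pole / m):            ", log_ratio)
print("orders of magnitude above m:", orders_of_magnitude)
```

The exponent is a numerical constant divided by \( e^2 \), which is the sense in which the safe range of energies is of the \( \exp(1/e^2) \) type.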

The infinite intermediate results are needed because these "infinities" arise from the extrapolation of the low-energy physics to the insanely high energies. However, they have no impact on the low-energy observables. That's why they're not a source of problems even in the limit where you treat them as strictly infinite numbers - e.g. when you set \( \epsilon = 0 \) in the dimensional regularization.

When a theory is not renormalizable, you will need to add infinitely many counterterms - add arbitrarily high powers of the fields and the spacetime derivatives to the Lagrangian, assume that all of them have arbitrary coefficients, and divide the coefficients into finite and infinite parts. The infinite parts may be cancelled but the finite parts of these parameters - infinitely many new parameters - may be arbitrary. The latter feature is the real problem with non-renormalizable theories. The problem is not something "infinite" or something that "looks infinite". Infinities may always be subtracted in some way. The real problem is that what's left has infinitely many unknown and undetermined finite parameters.

You first need to measure infinitely many things to make your theory, including its parameters, well-defined - and only afterwards may you predict. You will obviously never get to the second step. ;-) Consequently, non-renormalizable theories only make sense for the questions in which it is enough to truncate the operators to a finite subset - and it's only possible at low energies where the super-complicated operators are suppressed by very high powers of \( 1 / \Lambda \).

UV vs IR divergences

Well, I have really discussed the UV divergences so far. They are related to your attempts to extrapolate the theory to arbitrarily high energies and momenta - as evidenced by the fact that you tried to integrate over \( p \) to infinity. Those infinities may be subtracted and the only thing that matters is whether the ultimate theory is renormalizable - i.e. whether it has just a finite number of terms in the Lagrangian so that you don't produce any new terms that have to be added by the calculation of loops.

The infrared divergences seemingly look "similar". The frequency of electromagnetic waves may be lower or higher than the visible frequencies and you might think that there is a "symmetry" in between those two realms. Well, string theory and quantum gravity could even tell you that there's something right about your expectation - the UV/IR mixing. But I don't want to get into these advanced things.

There's no UV/IR mixing in QFTs because very small distances are philosophically very different from very big ones. Correspondingly, the interpretation of the IR divergences is also very different. If your theory is well-defined at arbitrarily short distances - e.g. if it is renormalizable which makes it really good - then it is defined at arbitrary (longer) distance scales as well.

What's the reason? Well, the reason is that long distances are made out of shorter ones (but not vice versa!) so physics at longer distances may be determined from physics at shorter distances. This statement sounds like a tautology - if you know about 1-centimeter realms, you know everything about 1-meter realms as well because 1 meter is one hundred "known" centimeters ordered along a line. It is a tautology but it is also known as reductionism and many philosophical kibitzers find it controversial.

At any rate, it's not controversial in physics. The result is that whatever your theory - well-defined at short distances - tells you about long distances is true (if you calculated it right). You must just listen carefully to what the theory wants to tell you.

Quite generally, we say that if you experience IR divergences, i.e. those related to divergences in integrals that appear for \( p \to 0 \), it's because you have asked a wrong question. The textbook example is the emission of soft photons.

If you study the collision of 2 particles, they exert forces on each other which cause acceleration, and acceleration of charged objects makes them emit electromagnetic radiation. If you ask how many photons will be emitted, you will find out that the total number of emitted photons is infinite. However, almost all of them will have supertiny energies so that the total energy is pretty small - it has to be small (and surely finite) because of energy conservation.
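The contrast between the infinite number of photons and their finite total energy follows from the soft-photon spectrum, which behaves like \( dN/d\omega = A/\omega \) at low frequencies. The Python sketch below assumes exactly this spectrum between \( \omega_{min} \) and \( \omega_{max} \); the strength `A` and the frequency range are hypothetical numbers chosen only for illustration.

```python
import math

A = 0.01          # hypothetical strength, of order alpha times a kinematic factor
OMEGA_MAX = 1.0   # upper end of the soft spectrum, arbitrary units

def photon_number(omega_min):
    """Number of photons: integral of A/omega from omega_min to OMEGA_MAX."""
    return A * math.log(OMEGA_MAX / omega_min)

def radiated_energy(omega_min):
    """Energy carried: integral of omega * (A/omega) over the same range."""
    return A * (OMEGA_MAX - omega_min)

for omega_min in (1e-3, 1e-30, 1e-300):
    print(omega_min, photon_number(omega_min), radiated_energy(omega_min))
```

The photon number keeps growing without bound as \( \omega_{min} \to 0 \), although only logarithmically, while the energy never exceeds \( A\,\omega_{max} \).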

So if you calculate the scattering of two charged particles with a well-defined final state, you will get a finite tree-level (classical) term: tree-level terms are always finite because there are no integrals over loop momenta because there are no loops. This is related to the fact that all the divergences are quantum effects and disappear in the \( \hbar \to 0 \) limit, which is yet another way to describe the truncation to the tree-level diagrams.

However, the one-loop graph will already display an IR divergence - aside from the UV divergences that are cured by the renormalization techniques described in the earlier bulk of this article. The IR divergences are really a signal that what you're doing is to expand a cross section that is zero for the particular process. It's not interesting at all if it is zero - it's a process that strictly speaking can't happen. You can't guarantee that no photons will be produced in the scattering at all: the probability is zero.

Curing IR divergences

What you should be doing instead is to calculate the "inclusive" cross section where you allow the final charged particles to have approximately the momenta you had a moment ago, but you also allow the production of arbitrary soft (very low-energy) photons whose energy is below \( E_{min} \), the minimum energy that your real (or gedanken) devices may detect. This process, allowing any extra soft photons - and summing over all their possible numbers and momenta - has a finite probability (finite cross section) and indeed, you will find that the terms that you added (the cross section allowing extra soft photons) exactly cancel their divergent portions against the IR divergences in the loop diagrams that you had before.
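The cancellation can be mimicked by a toy model, sketched in Python below. It assumes that soft photons are emitted independently (Poisson statistics) with the spectrum \( dN/d\omega = A/\omega \) and that an artificial infrared regulator \( \lambda \) cuts the spectrum off from below. The names `exclusive` and `inclusive`, the strength `A` and the energies are hypothetical, and real QED coefficients depend on the kinematics, so this shows only the structure of the argument.

```python
import math

A = 0.05       # hypothetical emission strength
E = 1.0        # hard energy scale of the collision, arbitrary units
E_MIN = 1e-3   # softest photon the detector can still see

def exclusive(ir_cutoff):
    """Probability that no photon at all is emitted between ir_cutoff and E."""
    return math.exp(-A * math.log(E / ir_cutoff))

def inclusive(ir_cutoff, max_photons=100):
    """Sum over n = 0, 1, 2, ... undetectable photons with energy below E_MIN."""
    n_soft = A * math.log(E_MIN / ir_cutoff)   # mean number of undetectable photons
    total = sum(n_soft ** n / math.factorial(n) for n in range(max_photons + 1))
    return exclusive(ir_cutoff) * total

for ir_cutoff in (1e-10, 1e-50, 1e-200):
    print(ir_cutoff, exclusive(ir_cutoff), inclusive(ir_cutoff))
```

As the regulator is removed, the probability with strictly no photons tends to zero, while the inclusive probability stays at \( (E_{min}/E)^A \) and does not care about \( \lambda \) at all.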

You must think for a little while what we're doing and why it's necessary - and how it translates to detailed formulae. There are a lot of things to learn. But the main message is that the IR divergences can never tell you that a physical theory is inconsistent: they just tell you that the IR, long-distance physics may be different than you expected (or than you wanted to see). Production of soft photons in QED and confinement in QCD (and marginally also in conformal theories) are two examples of infrared effects you have to be careful about. But they will never allow you to throw away a theory as an inconsistent one.

Recall that the UV divergences allow you to divide theories into renormalizable and nonrenormalizable ones - the renormalizable ones only force you to choose a finite number of parameters and cancel their corresponding infinite parts by counterterms.

All those comments above of course come with lots of technicalities if you try to include spinors, gauge fields, chiral interactions, nonperturbative effects, and many other things. The blog entry was meant to be just a superficial sketch attempting to outline the big picture - the kind of a sketch I probably wanted to see when I was a college freshman etc.

Oh, you just noticed that the clicky hacky (that's the anglicized Czech term for scribble) has transformed into beautiful TeX equations right now - and you spent a few hours reading the clicky hacky. In that case, you are encouraged to return to the beginning and reread this text with the nice LaTeX expressions. :-)

And that's the memo.


  1. UV and IR divergences can be cured with zeta regularization :) see my paper for partial results (although in general even the harmonic series has a regularization equal to the Euler-Mascheroni constant)

    this is valid even for 2-loop or more integrals.
    the idea is to replace (using the Euler-Maclaurin sum formula) a divergent integral by a divergent series of the form 1+1^{m}+2^{m}+......... which is zeta regularizable, and for the harmonic series an analogous regularization suggests that 1+1/2+1/3+..... is equal to the Euler-Mascheroni constant.

  2. at-home-ktahn aka father's apt, Nov 27, 2014, 11:38:00 PM

    yooo Lumo
    I am actually at home for Thanks Giving aka LA. Just glanced through your latest post on renormalization. Epic! I have taken a short break from chasing infinities. Today I am just drinking juice and chatting with the old man.