Thursday, October 20, 2011

Why is there an action and what it isn't

This is an informal continuation of the text
Why is there energy and what it isn't?
One month ago, I discussed the notion of energy and the Hamiltonian. There is a concept similar to the Hamiltonian in mechanics (and the rest of physics) which is called the Lagrangian. To make things funny, the Lagrangian is an essential player in the principle of stationary action whose well-defined form is known as Hamilton's principle. William Rowan Hamilton has clearly contributed to both ways of looking at mechanics (and the rest of physics).

In mechanics, the Hamiltonian \(H\) and the Lagrangian \(L\) are seemingly very similar objects. They may be constructed out of the kinetic energy \(T\) and the potential energy \(V\) as follows:
\(H = T+V, \qquad L = T-V \)
They differ just by the sign! However, you shouldn't overinterpret this similarity. What's important about the Hamiltonian and the Lagrangian are the principles in which they are featured; these principles are very different and employ the Lagrangian and the Hamiltonian in very different ways. In the article about energy, you should have understood that the energy is defined as the quantity that is conserved whenever the laws of physics (of the type we know) are invariant under translations in time. Particular formulae for energy such as \(H=T+V\) in mechanics just describe particular examples of physical systems and their dynamical rules.

In the previous article, we have seen that the energy – or the Hamiltonian – is sufficient for formulating all the differential equations that govern the evolution of any realistic physical system in time. In the quantum context, the evolution of other observable quantities is governed by the commutators with the Hamiltonian.

There is a very analogous statement involving the Lagrangian. What is it? First of all, the Lagrangian is not "quite fundamental" in this principle: the Lagrangian is the integrand in an integral called the action, \(S\), over the time variable. So the action is typically an integral of a "quantity local in time over time" and this quantity is nothing else than the Lagrangian \(L\). So I have converted the question "what is the defining property of the Lagrangian" to a related question, "what is the defining property of the action".

The answer is the "principle of least action" or, somewhat more generally, the "principle of stationary action". It says:
For a fixed pair of initial and final states (in which we specify just the coordinates, not the velocities or the momenta), every classical system in physics must choose a history \(x_i(t)\) for which the action \( S[x_i(t)] \) is stationary. It means that if \(x_i(t)\) is changed by \(\epsilon\cdot \delta x_i(t)\) for a small \(\epsilon\) and a function of time \(\delta x_i(t)\) that vanishes at the initial as well as final moment, the action is only allowed to change by terms scaling like \(\epsilon^2\), not terms scaling like \(\epsilon\).
In other words, the action must be at a local maximum, a local minimum, or a saddle point for the allowed history. This allows one to select the whole history as soon as we are told the initial and final configurations. When we do so, we may derive the Euler-Lagrange equations, the differential equations that govern the evolution of a physical system in time.
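The \(\epsilon\) versus \(\epsilon^2\) scaling can be checked numerically. What follows is only a minimal sketch under stated assumptions: a particle in a uniform gravitational field with the potential energy \(mgq\), a crude discretization of the action, and the variation \(\delta q(t)=\sin(\pi t/T)\). The numerical values and the function names are arbitrary choices made for illustration.

```python
import math

MASS, G, T, N = 1.0, 9.8, 1.0, 1000  # assumed toy values: mass, gravity, duration, steps
DT = T / N

def action(path):
    """Discretized action: sum of (m v^2 / 2 - m g q) * dt over the time steps."""
    total = 0.0
    for k in range(N):
        v = (path[k + 1] - path[k]) / DT
        q_mid = 0.5 * (path[k] + path[k + 1])
        total += (0.5 * MASS * v * v - MASS * G * q_mid) * DT
    return total

def change(path, eps):
    """S[q + eps * delta_q] - S[q] for delta_q(t) = sin(pi t / T), zero at both ends."""
    varied = [q + eps * math.sin(math.pi * k / N) for k, q in enumerate(path)]
    return action(varied) - action(path)

times = [k * DT for k in range(N + 1)]
classical = [0.5 * G * t * (T - t) for t in times]  # parabola solving q'' = -g
straight = [0.0 for _ in times]                      # same endpoints, not a solution

# Shrink eps by a factor of ten and see how much the change of the action shrinks.
ratio_classical = change(classical, 0.1) / change(classical, 0.01)
ratio_straight = change(straight, 0.1) / change(straight, 0.01)
print(ratio_classical, ratio_straight)
```

For the classical parabola, making \(\epsilon\) ten times smaller should reduce the change of the action about a hundred times, as expected for a term scaling like \(\epsilon^2\). For the path that doesn't solve the equations of motion, the change should shrink only about ten times because the term linear in \(\epsilon\) survives.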

These equations turn out to be equivalent to Newton's equations, Maxwell's equations, Einstein's equations, the Klein-Gordon equation, the Dirac equation, or whatever other equation describes the evolution of your physical system. All these things may be derived from an action much like they could be extracted from the Hamiltonian.

Deriving the Euler-Lagrange equations

It may be helpful to derive the equations of motion from the action at least in the simplest case of mechanics. Imagine that you have a system described by the coordinate \(q(t)\) and the Lagrangian
\[ S = \int_{t_0}^{t_1}L\,{\mathrm d}t,\quad L = \frac{mv^2}{2}-V(q), \quad v\equiv \frac{{\rm d}q(t)}{{\rm d}t} \equiv \dot q \] The condition that this action is stationary reads
\[ 0 = \delta S = \int_{t_0}^{t_1} \left[ mv \delta v - V'(q) \delta q \right] {\mathrm d}t \] where I needed to be able to differentiate the second power (I hope you know why the factor of one-half disappeared) and a composite function \(V(q)\); just to be sure, \( V'(q) \equiv {\rm d}V/{\rm d}q \). The first term may be integrated by parts
\[ \int_{t_0}^{t_1} m\dot q\, \delta \dot q\,{\mathrm d}t = \int_{t_0}^{t_1} {\mathrm d}t \left( \frac{\mathrm d}{{\mathrm d}t}(m\dot q\, \delta q) - m\frac{{\mathrm d}^2 q}{{\mathrm d}t^2}\delta q \right) \] The first term, the total derivative, integrates to the difference of \(mv\delta q\) between the final and initial state; because we required \(\delta q = 0\) for both of these states, it drops out. (In field theory, we require similar vanishing of the variation at the infinitely distant, asymptotic region of the whole spacetime.)

The remaining terms tell us that
\[ 0 = \delta S = \int_{t_0}^{t_1} {\mathrm d}t \delta q \left( - m\frac{{\mathrm d}^2 q}{{\mathrm d}t^2} - V'(q) \right). \] Because this must hold for any variation of the trajectory \(\delta q(t)\), we see that for each moment in between \(t_0\) and \(t_1\), we must have
\[ m\frac{{\mathrm d}^2 q(t)}{{\mathrm d}t^2} = -V'[q(t)]. \] The mass multiplied by the acceleration is equal to the force: that's nothing else than the usual \(F=ma\) equation due to Newton applied to the situation defined with a potential energy.
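The same steps go through for a more general Lagrangian. As a sketch, assume only that \(L(q,\dot q,t)\) depends on the coordinate, its first time derivative, and possibly the time itself. The variation followed by the same integration by parts (the boundary term drops out because \(\delta q\) vanishes at \(t_0\) and \(t_1\)) gives
\[ \delta S = \int_{t_0}^{t_1} {\mathrm d}t \left( \frac{\partial L}{\partial q}\delta q + \frac{\partial L}{\partial \dot q}\delta \dot q \right) = \int_{t_0}^{t_1} {\mathrm d}t\, \delta q \left( \frac{\partial L}{\partial q} - \frac{\mathrm d}{{\mathrm d}t}\frac{\partial L}{\partial \dot q} \right) \] so the stationarity for every \(\delta q(t)\) requires the Euler-Lagrange equation
\[ \frac{\mathrm d}{{\mathrm d}t}\frac{\partial L}{\partial \dot q} = \frac{\partial L}{\partial q}. \] For \(L = m\dot q^2/2 - V(q)\), the left-hand side is \(m\ddot q\) and the right-hand side is \(-V'(q)\), so the general formula reproduces Newton's equation derived above.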

In classical field theory, it's natural that the action isn't just an integral over time; it becomes an integral over spacetime. Equivalently, the Lagrangian is the integral of the Lagrangian density \({\mathcal L}\) over the space (but not time):
\[ S = \int \,{\rm d}t\, L = \int{\rm d}^4 x \,{\mathcal L},\quad L\equiv \int{\rm d}^3 x\,{\mathcal L}. \] We were able to derive the equations of motion from the action. In general, the number of independent variations generalizing \(\delta q\) from our example is equal to the number of independent degrees of freedom. For each of them, we may derive some differential equation in time. So you see that everything is ultimately determined by the action.
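As an illustration, take a single scalar field \(\phi\) (an assumption made just for this sketch) whose Lagrangian density depends only on \(\phi\) and its first derivatives. The same integration by parts, now performed in all four spacetime directions, gives
\[ \partial_\mu \frac{\partial {\mathcal L}}{\partial(\partial_\mu\phi)} = \frac{\partial {\mathcal L}}{\partial \phi}. \] For example, \({\mathcal L} = \partial_\mu\phi\,\partial^\mu\phi/2 - m^2\phi^2/2\), written in the mostly-minus signature, leads to the Klein-Gordon equation \(\partial_\mu\partial^\mu\phi + m^2\phi = 0\).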

The Lagrangian density for Maxwell's electromagnetism equals \(-F^{\mu\nu}F_{\mu\nu}/4\); the Lagrangian density for Einstein's general relativity is given by the Ricci scalar curvature, \(R/16\pi G\) (multiplied by \(\sqrt{-g}\) from the integration measure), and so on. When several systems exist independently, you just add their actions (which would result in several non-interacting systems: the simple additivity of the action guarantees locality and/or clustering property) and cleverly modify the terms (e.g. by replacing partial derivatives by covariant derivatives) so that the natural interactions are incorporated.

It's also useful to mention the case of a non-field-theoretical relativistic particle. Its action is equal to \(-m\int {\rm d}s\) and is proportional to the proper length of the world line in the spacetime (which is nicely independent of the reference frame). Correspondingly, the "Nambu-Goto" action for a relativistic string – a starting point for string theory – is the proper area of the two-dimensional world sheet (the history picked by the propagating string in the spacetime): the coefficient is known as the string tension and it is de facto the linear mass density of the string.
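To see how the particle's action is related to the mechanical Lagrangian used earlier, one may expand it for small velocities. A sketch in units with \(c=1\), with the world line parameterized by the time \(t\) of a chosen inertial frame:
\[ -m\int {\rm d}s = -m\int {\rm d}t\,\sqrt{1-v^2} \approx \int {\rm d}t \left( -m + \frac{mv^2}{2} + \dots \right). \] The constant \(-m\) doesn't affect the equations of motion and the next term is the usual kinetic energy of a free particle.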

Higher-dimensional generalizations for branes are straightforward but they're not as well-behaved as the action for the string, especially at the quantum level.

In all these cases, the actions directly encode everything we can say about the dynamics. First of all, the actions are integrals of local quantities, the Lagrangian (or the Lagrangian density). For example, there are no bilocal terms that would depend on something like \(q(t_2) q(t_3)\), the product of coordinates or fields at different places of time or spacetime. This automatically implies that the derived equations of motion will be local in time or local in spacetime, too. The condition \(\delta(S_A+S_B)=0\), assuming that \(S_A,S_B\) only depend on two disjoint sets of degrees of freedom, is equivalent to \(\delta S_A=\delta S_B=0\).

Also, the actions are "scalar invariants" that remain unchanged under all the desired symmetries – translations in time and space, spatial rotations, gauge symmetries, Lorentz transformations, supersymmetry, sometimes parity, flavor symmetries, and so on. The action is a master formula that knows about everything.

A brief history

Centuries ago, people understood that objects at equilibrium want to minimize their energy: a little ball in a bowl ultimately wants to sit at the lowest point because that's where the potential energy is minimized. The concept of the action is a generalization of such insights to the case of time-dependent systems. We still want to minimize something but we want to be told not only what is the single preferred location in space (which was the issue in the static case); we want to know what is the preferred location or configuration of the system at every moment of time.

There was another historical precedent: Snell's law of refraction (how the direction of a light ray changes when it switches from one medium to another) may be derived from Fermat's principle of least time: the light ray actually tries to get from one point to another in the minimum possible time, assuming that you realize that the medium whose index of refraction of \(n\) slows down the light to the speed \(c/n\) where \(c\) is the speed of light in the vacuum.
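The link between the least time and Snell's law is easy to check numerically. The following is a minimal sketch, assuming a flat interface along the line \(y=0\), two made-up indices of refraction, and a simple ternary search for the crossing point; all the names and numbers are illustrative.

```python
import math

N1, N2 = 1.0, 1.5          # assumed indices of refraction (roughly air and glass)
H1, H2, D = 1.0, 1.0, 2.0  # assumed geometry: source at (0, H1), target at (D, -H2)

def travel_time(x):
    """Travel time, in units with c = 1, when the ray crosses the interface at (x, 0)."""
    return N1 * math.hypot(x, H1) + N2 * math.hypot(D - x, H2)

def fastest_crossing(lo=0.0, hi=D, iterations=200):
    """Ternary search for the minimum; valid because travel_time is convex in x."""
    for _ in range(iterations):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if travel_time(a) < travel_time(b):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)

x_best = fastest_crossing()
snell_left = N1 * x_best / math.hypot(x_best, H1)             # n1 * sin(theta1)
snell_right = N2 * (D - x_best) / math.hypot(D - x_best, H2)  # n2 * sin(theta2)
print(x_best, snell_left, snell_right)
```

The two printed products \(n\sin\theta\) should agree up to the numerical precision of the search, which is the content of Snell's law.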

Fermat's principle is very similar to the principle of least action but the derivation of Fermat's principle from the principle of least action is less straightforward than you might think.

Action in quantum mechanics

In the case of energy, we saw that it was upgraded to an operator in quantum mechanics, the Hamiltonian, and such an operator had to exist whenever the wave function is evolving in a unitary way according to a universal law. In fact, we have seen that the key mathematical operations in quantum mechanics – such as the commutators of the Hamiltonian with other operators – were more natural and simpler than their classical counterparts – such as the Poisson brackets.

Analogously, the action has its place in quantum mechanics, too. And we may say that relatively to classical physics, its usage becomes simpler and more natural as well. Paul Dirac has made some preliminary attempts but this discovery – discovery of the relevance of the action (and therefore Lagrangians) in quantum mechanics – was mostly made by Richard Feynman.

In classical physics, the systems are following one particular history which minimizes the action. Feynman realized that the corresponding statement is that quantum mechanical systems are allowed to follow all conceivable histories. If we want to figure out the probability that an initial state evolves into a final state, the probability is the squared absolute value of a probability amplitude – a usual rule in quantum mechanics.

More nontrivially, the probability amplitude is the infinite-dimensional integral over all histories, including (and especially) those that don't satisfy the classical equations of motion – with the integrand given by the action:
\[ {\mathcal A}_{i\to f} = \int {\mathcal D}\phi\, \exp(i S(\phi) / \hbar). \] This formula makes the linearity of quantum mechanics manifest: we're just adding contributions from different histories.

It also makes the classical limit given by the least action, \(\delta S = 0 \), obvious. Why? In the classical limit, Planck's constant \(\hbar\to 0\) is tiny so the exponent in the phase of the Feynman path integral is a quickly varying number. The exponential is therefore pretty much a random phase (a random point on the unit circle in the complex plane) and the contributions of nearby histories almost cancel. The only region of the integral where this cancellation breaks down is the vicinity of the histories with \(\delta S = 0\) because the phase \(\exp(iS/\hbar)\) is nearly constant in those regions and the nearby histories constructively add up. That's why classical histories (plus or minus some quantum noise) are the greatest contributors to the transition probabilities.
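The cancellation may be seen in a one-dimensional toy model. The sketch below is not a path integral: it assumes a single variable \(x\) in place of the whole history, the toy "action" \(S(x)=x^2\) with its stationary point at \(x=0\), and a small made-up value of \(\hbar\). It compares the contributions of two equally wide windows of \(x\).

```python
import cmath

HBAR = 0.01  # assumed small "Planck constant" of the toy model

def window_integral(center, width=1.0, steps=20000):
    """Midpoint-rule integral of exp(i x^2 / HBAR) over a window of the given width."""
    dx = width / steps
    start = center - width / 2
    total = 0j
    for k in range(steps):
        x = start + (k + 0.5) * dx
        total += cmath.exp(1j * x * x / HBAR) * dx
    return total

near = abs(window_integral(0.0))  # window containing the stationary point x = 0
far = abs(window_integral(2.0))   # equally wide window without a stationary point
print(near, far)
```

The window around the stationary point should contribute far more than the distant one, where the rapidly rotating phases nearly cancel.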

I also find it important to explain how the locality or clustering property or independence of two isolated systems is guaranteed by Feynman's formulae. In classical cases, I said that \(\delta(S_A+S_B)=0\) was equivalent to \(\delta S_A=\delta S_B=0\) if the actions \(S_A,S_B\) for the subsystems \(A,B\) only depended on disjoint sets of degrees of freedom. How does this observation elevate to the case of quantum mechanics? Well, the probability amplitude for such a composite system is equal to
\[ {\mathcal A}_{i\to f} = \int {\mathcal D}\phi\, \exp\left[i [S_A(\phi_A)+S_B(\phi_B)] / \hbar\right] \] but because the integration \(\int{\mathcal D}\phi\) factorizes to \(\int {\mathcal D}\phi_A{\mathcal D}\phi_B \) and because the exponential of a sum is the product of the individual exponentials, this is equal to
\[ \begin{aligned} {\mathcal A}_{i\to f} &= \int {\mathcal D}\phi_A\, \exp\left[i S_A(\phi_A) / \hbar\right] \times \\ &\times \int {\mathcal D}\phi_B\, \exp\left[i S_B(\phi_B) / \hbar\right]. \end{aligned} \] The probability amplitudes for the transitions are simply products of the probability amplitudes for the systems \(A\) and \(B\) and because the total probability is the squared absolute value of the probability amplitude, the same thing holds for the probability: the probability of the transition is just the product of the probability calculated for \(A\) and one for \(B\), just like you expect for probabilities of independent events or outcomes. That's why it's right to write down \(\exp(iS/\hbar)\) in Feynman's formula and not, for example, \(\exp(iS^2/\hbar^2)\) or another phase, assuming that the action is additive (e.g. an integral over spacetime in which two independent regions simply add up).
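The factorization can be illustrated by a finite toy version of the sum over histories. In the sketch below, each subsystem is assumed to have just a handful of "histories" with made-up values of the action, \(\hbar\) is set to one, and the function names are hypothetical. The sum over all pairs of histories is compared with the product of the two separate sums.

```python
import cmath

def feynman_phase(s):
    """The phase exp(i S / hbar) in units with hbar = 1."""
    return cmath.exp(1j * s)

def squared_phase(s):
    """A hypothetical alternative weight, exp(i S^2), used only for contrast."""
    return cmath.exp(1j * s * s)

def joint_amplitude(actions_a, actions_b, weight):
    """Sum over all pairs of histories of the composite system A+B."""
    return sum(weight(sa + sb) for sa in actions_a for sb in actions_b)

def single_amplitude(actions, weight):
    """Sum over the histories of one subsystem."""
    return sum(weight(s) for s in actions)

# Toy "histories": made-up actions for a few histories of each subsystem.
S_A = [0.0, 0.7, 1.3, 2.9]
S_B = [0.2, 1.1, 2.5]

for weight in (feynman_phase, squared_phase):
    joint = joint_amplitude(S_A, S_B, weight)
    product = single_amplitude(S_A, weight) * single_amplitude(S_B, weight)
    print(weight.__name__, abs(joint - product))
```

With the weight \(\exp(iS)\), the difference vanishes up to rounding errors. With \(\exp(iS^2)\), the cross term \(2S_AS_B\) in the exponent generically spoils the factorization.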

Actionless quantum systems

The concept of the action became especially potent in quantum field theory where it has led to a very convenient toolkit for calculating things. It's been more convenient than the Hamiltonian approach because the symmetries may be kept manifest. This advantage of the Lagrangian-Feynman approach grew even larger in theories with gauge symmetries because one must add things like the Faddeev-Popov ghosts and those things would be very cumbersome (but possible) in the Hamiltonian approach.

However, you should realize that the approach based on the action is more special than the Hamiltonian or operator approach. The reason is that the Feynman approach to quantum mechanics still assumes that there are some "classical histories" described by fields such as \(q_i(t)\) or \(\Phi_i(x,y,z,t)\). Of course, Feynman doesn't say that these observables behave just like in classical physics; quite on the contrary, the bulk of his approach to quantum theories is to clarify the new rules for dealing with these objects. However, it's true that there have to exist continuous quantities generalizing the "coordinates".

In general, quantum mechanical systems without continuous observables of this sort may exist. When you look at such a system, the description in terms of actions becomes impossible. If \(q(t)\) were only allowed to take discrete values, you couldn't really integrate over it: we may only integrate over continuous objects. In practice, this is not a limitation: for example, pretty much all sensible quantum field theories in 3+1 dimensions we know may be defined out of continuous fields and an action.

However, there are "discrete-like" quantum systems that may only be defined in the operator approach; and there are even "seemingly continuous" field theories such as the \((2,0)\) theory in 5+1 dimensions where the operator approach is more fundamental and where you may have trouble finding a description using Lagrangians. No action that would fully define such theories is known and we know it can't have certain simple forms but it's also true that there is no proof that no action-like description of this particular theory exists.

Minima vs maxima

The derivation of the equation of motion of course used the condition that the action was stationary: the first functional derivatives had to be zero. Stationary points may be minima or maxima and it didn't really affect the equations. In reality, the solutions to the classical equations of motion in sufficiently simple and well-behaved systems are almost always the minima of the action: think of a free particle whose action only contains the integral of the kinetic energy. Obviously, the straight line (which solves the equations of motion) minimizes the average kinetic energy.

However, maxima may occur as well: those lead to the same equations of motion but are somewhat analogous to unstable equilibrium in the case of minimization or maximization of potential energy. In the case of the action, you must realize that we're looking for stationary points of the action which is a function of infinitely many variables – the values of \(q_i(t)\) at all times. So the stationary points may be minima with respect to some directions and maxima with respect to other directions in this infinite-dimensional space.

It's pretty much a rule that the action is minimized with respect to almost all directions – and there are only a finite number of exceptions: most typically, at most one exception exists. So the action is either a complete local minimum or a saddle point that is "almost universally a minimum" with one exception i.e. one direction with respect to which the action is a maximum.
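A simple system where both cases appear is the harmonic oscillator. The sketch below is a toy check with assumed parameters (unit mass and frequency, a crude discretization, hypothetical function names). It looks at the classical history \(q(t)=0\) connecting \(q=0\) to \(q=0\) and adds variations of the form \(\epsilon\sin(k\pi t/T)\) for two different durations \(T\).

```python
import math

MASS, OMEGA, STEPS = 1.0, 1.0, 2000  # assumed toy values

def action(path, duration):
    """Discretized oscillator action: sum of (m v^2/2 - m omega^2 q^2/2) * dt."""
    dt = duration / STEPS
    total = 0.0
    for k in range(STEPS):
        v = (path[k + 1] - path[k]) / dt
        q_mid = 0.5 * (path[k] + path[k + 1])
        total += (0.5 * MASS * v * v - 0.5 * MASS * OMEGA**2 * q_mid**2) * dt
    return total

def bump(mode, eps=0.1):
    """Variation eps * sin(mode * pi * t / T), which vanishes at both endpoints."""
    return [eps * math.sin(mode * math.pi * k / STEPS) for k in range(STEPS + 1)]

# The classical path q(t) = 0 has a vanishing action, so action(bump(...))
# is directly the change of the action caused by the variation.
short_mode1 = action(bump(1), duration=2.0)  # T < pi / omega
long_mode1 = action(bump(1), duration=4.0)   # pi / omega < T < 2 pi / omega
long_mode2 = action(bump(2), duration=4.0)
print(short_mode1, long_mode1, long_mode2)
```

For the short duration, the variation increases the action, as appropriate for a minimum. For the longer duration, the lowest mode lowers the action while the next one still raises it, so the same classical history has become a saddle point with one "maximum-like" direction.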

Local minima vs global minima

Another issue is the question whether the action must be at a global minimum or whether local minima are enough (if many of them exist, and this may happen, indeed). The equations of motion of course only depended on the stationarity of the action which defines a local minimum or a local maximum (first derivatives must vanish: this condition knows nothing about the other potential distant minima). However, in quantum mechanics, we may see that the vicinities of all the histories that are local minima do contribute to the path integral significantly: the constructive interference works for all local minima.

However, if the global minimum contributes a nonzero term to your amplitude, it's pretty much the rule that its contribution exceeds the contribution of other local minima that are not a global minimum.

Nevertheless, as we discussed in the article about cosmic catastrophes, there are important situations (especially quantum tunneling and vacuum decay) in which the global minimum of the action contributes nothing to an interesting process (because this process is topologically nontrivial while the global minimum is topologically trivial; or because this process violates an accidental conservation law that is respected by the global-minimum-based amplitudes). In that case, one may find a kink or an instanton or whatever that shows that the probability of some previously forbidden processes (quantum tunneling, vacuum decay) is nonzero.

You may also combine instantons with the insight that the action may also be maximized and not just minimized with respect to one direction in the space of possible variations: such generalizations of instantons are known as sphalerons.


The action, i.e. the integral of the Lagrangian, became the most beautiful object that encodes everything we know about the dynamics of any classical system and any quantum system with a natural classical limit. It makes the symmetries and locality manifest; however, as I didn't discuss, the unitarity (and the positivity of probabilities) may often be obscured by the Lagrangian approach while it is made manifest by the operator approach.

Classically, the defining property of the action is the principle of least (or stationary) action: the histories that a given system actually chooses minimize (or extremize) the action. A reason why such a description of classical systems in Nature is possible is the quantum generalization of the Lagrangian formalism: Feynman's path integral.

In any quantum mechanical system with well-defined classical-like histories, linearity guarantees that the transition amplitudes must be expressible as the integral over all trajectories. The integrand has to be a pure phase, because of the unitarity (equivalent to the reality of the Hamiltonian), and the exponent giving the phase is called the action. A trivial derivation of the classical limit shows that this action is extremized in the classical limit.

So it must be possible to describe every classical limit of a physical system observed in a quantum world – such as ours – by the equations that are equivalent to \(\delta S = 0\) for a cleverly chosen action \(S\). Once again, it's because the accurate implementation of the formula for the probability amplitudes has to be given by Feynman's integral over histories; the exponent has to be pure imaginary; and up to the \(i/\hbar\) normalization, we may call it the "action" and easily prove that it's stationary around the dominant (classically allowed) histories.

The importance of the action becomes super cereal in quantum field theory as well as almost all known descriptions of string/M-theory because almost all known descriptions of string/M-theory are based on some quantum mechanical model or quantum field theory, either one in spacetime, world volume, boundary of the spacetime, or another auxiliary spacetime-like space.

Male chromosomes still discriminated against in Solvay

A participant of the centennial 2011 Solvay Conference has made an observation on genetics:
Seems ratio of x to y chromosomes hasn't changed in 100 years since first Solvay conference in 1911...
What Lisa Randall Sklodowski may have failed to do was to calculate the actual ratio. The purely male Y chromosomes only constitute about 49% of the X/Y chromosomes in Solvay (1911 as well as 2011) while the X chromosomes have 51% or so. As you can see, the Y chromosomes are still being discriminated against by the X chromosomes. The inhuman imperialism is so brutal and it has penetrated so deeply into the fabric of the society that our society allows and even encourages the production of XX carriers where the X chromosomes overwhelm the Y chromosomes by the shameful 2-to-0 ratio (the same score as FC Barcelona dared to choose in its match against FC Viktoria Pilsen last night), the so-called women. The democratic, politically correct carriers of XY chromosomes, the so-called men, are sometimes misinterpreted as the politically incorrect sex while the carriers of YY chromosomes are banned altogether.

The Solvay conference has always been rather close to the democratic 50-to-50 composition of the X and Y chromosomes but traces of the dominance of the X chromosomes and the suppression of the Y chromosomes may still be seen at every step. It's not shocking that the complaint that the X chromosomes' dictatorship is not perfect was raised by someone who completely suppresses the Y chromosomes. ;-)
