The Conceptual Framework

of Quantum Field Theory

Anthony Duncan

Great Clarendon Street, Oxford, OX2 6DP,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries
© Anthony Duncan 2012
The moral rights of the author have been asserted
First Edition published in 2012
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above
You must not circulate this work in any other form
and you must impose this same condition on any acquirer
British Library Cataloguing in Publication Data
Data available
Library of Congress Cataloging in Publication Data
Data available
ISBN 978–0–19–957326–4
Printed and bound by
CPI Group (UK) Ltd, Croydon, CR0 4YY
Links to third party websites are provided by Oxford in good faith and
for information only. Oxford disclaims any responsibility for the materials
contained in any third party website referenced in this work.
Preface

In the roughly six decades since modern quantum field theory came of age with the
introduction in the late 1940s of covariant field theory, supplemented by renormalization
ideas, there has been a steady stream of expository texts aimed at introducing
each new generation of physicists to the concepts and techniques of this central area of
modern theoretical physics. Each decade has produced one or more “classics”, attuned
to the background, needs, and interests of students wishing to acquire a proficiency
in the subject adequate for the beginning researcher at the time. In the 1950s the
seminal text of Jauch and Rohrlich, Theory of Photons and Electrons, provided the
first systematic textbook treatment of the Feynman diagram technique for quantum
electrodynamics, while more or less simultaneously the first field-theoretic attacks on
the strong interactions were presented in the two-volume Mesons and Fields of Bethe,
de Hoffmann, and Schweber. The 1960s saw the appearance of the massive treatise
by Schweber, Introduction to Relativistic Quantum Field Theory, which addressed
in much greater detail formal aspects of the theory, including the LSZ asymptotic
formalism and the Wightman axiomatic approach. The dominant text of the late 1960s
was undoubtedly the two-volume text of Bjorken and Drell, Relativistic Quantum
Mechanics and Relativistic Quantum Fields, which combined a thorough introduction
to Feynman graph technology (in volume 1) with a more formal introduction to
Lagrangian field theory (in volume 2). In the 1970s the emergence of non-abelian
gauge theories as the overwhelmingly favored candidates for a successful field-theoretic
description of weak and strong interactions coincided with the emergence of functional
(path-integral) methods as the appropriate technical tool for quantization of gauge
theories. In due course, these methods received full treatment with the appearance
in 1980 of Itzykson and Zuber’s encyclopedic Quantum Field Theory. In a similar
way, the surge to prominence of supersymmetric field theories throughout the 1980s
necessitated a full account of supersymmetry, which is the sole subject of the third
volume of Weinberg’s comprehensive three-volume The Quantum Theory of Fields,
the first edition of which appeared in 1995.
With such a selection of classic expository treatises (not to mention many other
fine texts not listed above—with apologies to authors of same!) one may well doubt
the need for yet another introductory treatment of quantum field theory. Nevertheless,
in the course of teaching the subject to graduate students (typically, second year) over
the last 25 years, I have been struck by the number of occasions on which important
conceptual issues are raised by questions in the classroom which require a careful
explanation not to be found in any of the readily available textbooks on quantum
field theory. To give just a small sample of the sort of questions one encounters in the
classroom setting: “Of the plethora of quantum fields introduced to describe Nature at
subatomic scales, why do so few (basically, only electromagnetism and gravity) have
classical macroscopic correlates?”; “If there are many possible quantum fields available
to ‘represent’ a given particle, can, or in what sense does, quantum field theory
prescribe a unique all-time dynamics?”; “If the interaction picture does not exist,
as implied by Haag’s theorem, why (or in what sense) are the formulas derived in
this picture for the S-matrix still valid?”; “Are there non-perturbative phenomena
amenable to treatment using perturbative (i.e., graph-theoretical) methods?”; and so
on. None of these questions require an answer if one’s attitude in learning quantum
field theory amounts to a purely pragmatic desire to “start with a Lagrangian and
compute a process to two loops”. However, if the aim is to arrive at a truly deep
and satisfying comprehension of the most powerful, beautiful, and effective theoretical
edifice ever constructed in the physical sciences, the pedagogical approach taken by
the instructor has to be quite a bit different from that adopted in the “classics”
enumerated above.
In the present work, an attempt is made to provide an introduction to quantum
field theory emphasizing conceptual issues frequently neglected in more “utilitarian”
treatments of the subject. The book is divided into four parts, entitled respectively,
“Origins”, “Dynamics”, “Symmetries”, and “Scales”. Although the emphasis is
conceptual—the aim is to build the theory up systematically from some clearly stated
foundational concepts—and therefore to a large extent anti-historical, I have included
two historical chapters in the “Origins” section which trace the evolution of the modern
theory from the earliest “penumbra” of quantum-field-theoretical phenomena detected
by Planck and Einstein in the early years of the twentieth century to the emergence,
in the late 1940s, of the recognizable structure of modern quantum field theory, in the
form of quantum electrodynamics. The reader anxious to proceed with the business
of logically developing the framework of modern field theory is at liberty to skim, or
even entirely omit, this historical introduction.
The three remaining sections of the book follow a step-by-step reconstruction of
this framework beginning with just a few basic assumptions: relativistic invariance,
the basic principles of quantum mechanics, and the prohibition of physical action at a
distance embodied in the clustering principle. The way in which these physical ingredients
combine to engender some of the most dramatic results of relativistic quantum
field theory is outlined qualitatively in Chapter 3, which also contains a summary of
the topics treated in later chapters. Subsequent chapters in the “Dynamics” section
of the book lay out the basic structure of quantum field theory arising from the
sequential insertion of quantum-mechanical, relativistic, and locality constraints. The
rather extended treatment of free fields allows us to discuss important conceptual
issues (e.g., the classical limit of field theory) in greater depth than usually found in
the standard texts. Some applications of perturbation theory to some simple theories
and processes are discussed in Chapter 7, after the construction of covariant fields for
general spin has been explained. A deeper discussion of interacting field theories is
initiated in Chapters 9, 10, and 11, where we treat first general features shared by all
interacting theories (Chapter 9) and then aspects amenable to formal perturbation
expansions (Chapter 10). The “Dynamics” section concludes with a discussion of
“non-perturbative” aspects of field theory—a rather imprecise methodological term
encompassing a wide variety of very different physical processes. In Chapter 11 we
attempt to clarify the extent to which certain features of field theory are “intrinsically”
non-perturbative, requiring methods complementary to the graphical expansions made
famous by Feynman.
In the “Symmetries” section we explore the many important ways in which
symmetry principles influence both our understanding and our use of quantum
field theory. Of course, at the heart of relativistic quantum field theory lies an
inescapable symmetry of critical importance: Lorentz-invariance, which, together
with translational symmetry in space and time, makes up the larger symmetry of the
Poincaré group. The centrality of this symmetry explains the dominance of Lagrangian
methods in field theory, even though from a physical standpoint the Hamiltonian
would appear (as is typically the case in non-relativistic quantum theory) to hold
pride of place. The role played by Lorentz-invariance in restricting the dynamics of a
field theory is the main topic of Chapter 12, which also includes an introduction to the
extension of the Poincaré algebra to the graded superalgebra of supersymmetric field
theory. Discrete spacetime symmetries, and the famous twin theorems of axiomatic
field theory—the Spin-Statistics and TCP theorems—are the subject of Chapter
13. The discussion of global symmetries, exact and approximate, in Chapter 14
leads naturally into the very important topics of spontaneous symmetry-breaking
and the Goldstone theorem. The “Symmetries” section of the book closes with a
treatment of local gauge symmetries in Chapter 15, which imply remarkable new
features not present in theories where the only symmetries are global (i.e., involve a
finite-dimensional algebra of spacetime-independent transformations).
With the final section of the book, entitled “Scales”, we come to perhaps the
most characteristic conceptual feature of quantum field theory: the scale separation
property exhibited by theories defined by an effective local Lagrangian. Given that
essentially all of the information obtained from scattering experiments at accelerators
concerns asymptotic transitions (i.e., the infinite time evolution of an appropriately
prepared quantum state, terminated by a detection measurement) it is critically
important for theoretical progress that the probabilities of such transitions not depend
in a sensitive way on interaction details at much smaller distances than those presently
accessible in accelerator experiments (roughly, the inverse of the center-of-mass energy
of the collision process). The insensitivity of field theory amplitudes to our inescapable
ignorance of the nature of the interactions at very short distances (or equivalently,
high momentum) is therefore of central importance if we are to infer reliably an
underlying microdynamics from the limited phenomenology available at any given
time. Remarkably, in this respect quantum field theories are far kinder to us than their
classical (particle or field) counterparts, where non-linearities almost always introduce
chaotic behavior which effectively precludes the possibility of accurate predictions
of state evolution over long time periods. The technical foundations needed for
examining these issues are taken up in Chapter 16, which contains an account of
regularization, power-counting, effective Lagrangians, and the renormalization group.
Applications to the proof of perturbative renormalizability, and a discussion of the
“triviality” phenomenon (the absence of a non-trivial continuum limit), follow in
Chapter 17. Chapters 18 and 19 then explore important features of the behavior
of quantum field theories at short distance (e.g., the operator product expansion
and factorization) and long distance (in particular, the complications in defining the
correct physical state space in unbroken abelian and non-abelian gauge field theories).
To the beginning student, quantum field theory all too often takes on the appearance
of a multi-headed Hydra, with many intertwined parts, the understanding of any
one of which seems to require a prior understanding of the rest of the frightening
anatomy of the whole beast. The motivation for the present work was the author’s
desire to provide an introduction to modern quantum field theory in which this rich
and complex structure is seen to arise naturally from a few basic conceptual inputs, in
contrast to the more typical approach in which Lagrangian field theory is presented as
a theoretical fait accompli and then subsequently shown to have the desired physical
features.
Much (perhaps most) of the attitude towards quantum field theory expressed in this
book is the result of innumerable conversations, over four decades, with colleagues and
students. For laying the foundations of my knowledge of field theory I wish especially
to thank my predoctoral and post-doctoral mentors (Steven Weinberg and Al Mueller,
respectively). In the case of the present work I am extremely grateful to Estia Eichten,
Michel Janssen, Adam Leibovich, Max Niedermaier, Sergio Pernice, and Ralph Roskies
for reading extensive parts of the manuscript, and for many useful comments and
suggestions. Any remaining solecisms of style or content are, of course, entirely the
responsibility of the author.
Contents

1 Origins I: From the arrow of time to the first quantum field
1.1 Quantum prehistory: crises in classical physics
1.2 Early work on cavity radiation
1.3 Planck’s route to the quantization of energy
1.4 First inklings of field quantization: Einstein and energy fluctuations
1.5 The first true quantum field: Jordan and energy fluctuations
2 Origins II: Gestation and birth of interacting field theory: from Dirac to Shelter Island
2.1 Introducing interactions: Dirac and the beginnings of quantum electrodynamics
2.2 Completing the formalism for free fields: Jordan, Klein, Wigner, Pauli, and Heisenberg
2.3 Problems with interacting fields: infinite seas, divergent integrals, and renormalization
3 Dynamics I: The physical ingredients of quantum field theory: dynamics, symmetries, scales
4 Dynamics II: Quantum mechanical preliminaries
4.1 The canonical (operator) framework
4.2 The functional (path-integral) framework
4.3 Scattering theory
4.4 Problems
5 Dynamics III: Relativistic quantum mechanics
5.1 The Lorentz and Poincaré groups
5.2 Relativistic multi-particle states (without spin)
5.3 Relativistic multi-particle states (general spin)
5.4 How not to construct a relativistic quantum theory
5.5 A simple condition for Lorentz-invariant scattering
5.6 Problems
6 Dynamics IV: Aspects of locality: clustering, microcausality, and analyticity
6.1 Clustering and the smoothness of scattering amplitudes
6.2 Hamiltonians leading to clustering theories
6.3 Constructing clustering Hamiltonians: second quantization
6.4 Constructing a relativistic, clustering theory
6.5 Local fields, non-localizable particles!
6.6 From microcausality to analyticity
6.7 Problems
7 Dynamics V: Construction of local covariant fields
7.1 Constructing local, Lorentz-invariant Hamiltonians
7.2 Finite-dimensional representations of the homogeneous Lorentz group
7.3 Local covariant fields for massive particles of any spin: the Spin-Statistics theorem
7.4 Local covariant fields for spin-1/2 (spinor fields)
7.5 Local covariant fields for spin-1 (vector fields)
7.6 Some simple theories and processes
7.7 Problems
8 Dynamics VI: The classical limit of quantum fields
8.1 Complementarity issues for quantum fields
8.2 When is a quantum field “classical”?
8.3 Coherent states of a quantum field
8.4 Signs, stability, symmetry-breaking
8.5 Problems
9 Dynamics VII: Interacting fields: general aspects
9.1 Field theory in Heisenberg representation: heuristics
9.2 Field theory in Heisenberg representation: axiomatics
9.3 Asymptotic formalism I: the Haag–Ruelle scattering theory
9.4 Asymptotic formalism II: the Lehmann–Symanzik–Zimmermann (LSZ) theory
9.5 Spectral properties of field theory
9.6 General aspects of the particle–field connection
9.7 Problems
10 Dynamics VIII: Interacting fields: perturbative aspects
10.1 Perturbation theory in interaction picture and Wick’s theorem
10.2 Feynman graphs and Feynman rules
10.3 Path-integral formulation of field theory
10.4 Graphical concepts: N-particle irreducibility
10.5 How to stop worrying about Haag’s theorem
10.6 Problems
11 Dynamics IX: Interacting fields: non-perturbative aspects
11.1 On the (non-)convergence of perturbation theory
11.2 “Perturbatively non-perturbative” processes: threshold bound states
11.3 “Essentially non-perturbative” processes: non-Borel-summability in field theory
11.4 Problems
12 Symmetries I: Continuous spacetime symmetry: why we need Lagrangians in field theory
12.1 The problem with derivatively coupled theories: seagulls, Schwinger terms, and T* products
12.2 Canonical formalism in quantum field theory
12.3 General condition for Lorentz-invariant field theory
12.4 Noether’s theorem, the stress-energy tensor, and all that stuff
12.5 Applications of Noether’s theorem
12.6 Beyond Poincaré: supersymmetry and superfields
12.7 Problems
13 Symmetries II: Discrete spacetime symmetries
13.1 Parity properties of a general local covariant field
13.2 Charge-conjugation properties of a general local covariant field
13.3 Time-reversal properties of a general local covariant field
13.4 The TCP and Spin-Statistics theorems
13.5 Problems
14 Symmetries III: Global symmetries in field theory
14.1 Exact global symmetries are rare!
14.2 Spontaneous breaking of global symmetries: the Goldstone theorem
14.3 Spontaneous breaking of global symmetries: dynamical aspects
14.4 Problems
15 Symmetries IV: Local symmetries in field theory
15.1 Gauge symmetry: an example in particle mechanics
15.2 Constrained Hamiltonian systems
15.3 Abelian gauge theory as a constrained Hamiltonian system
15.4 Non-abelian gauge theory: construction and functional integral formulation
15.5 Explicit quantum-breaking of global symmetries: anomalies
15.6 Spontaneous symmetry-breaking in theories with a local gauge symmetry
15.7 Problems
16 Scales I: Scale sensitivity of field theory amplitudes and effective field theories
16.1 Scale separation as a precondition for theoretical science
16.2 General structure of local effective Lagrangians
16.3 Scaling properties of effective Lagrangians: relevant, marginal, and irrelevant operators
16.4 The renormalization group
16.5 Regularization methods in field theory
16.6 Effective field theories: a compendium
16.7 Problems
17 Scales II: Perturbatively renormalizable field theories
17.1 Weinberg’s power-counting theorem and the divergence structure of Feynman integrals
17.2 Counterterms, subtractions, and perturbative renormalizability
17.3 Renormalization and symmetry
17.4 Renormalization group approach to renormalizability
17.5 Problems
18 Scales III: Short-distance structure of quantum field theory
18.1 Local composite operators in field theory
18.2 Factorizable structure of field theory amplitudes: the operator product expansion
18.3 Renormalization group equations for renormalized amplitudes
18.4 Problems
19 Scales IV: Long-distance structure of quantum field theory
19.1 The infrared catastrophe in unbroken abelian gauge theory
19.2 The Bloch–Nordsieck resolution
19.3 Unbroken non-abelian gauge theory: confinement
19.4 How confinement works: three-dimensional gauge theory
19.5 Problems
Appendix A The functional calculus
Appendix B Rates and cross-sections
Appendix C Majorana spinor algebra
References
Index
1 Origins I: From the arrow of time to the first quantum field

1.1 Quantum prehistory: crises in classical physics


The first indications of serious inadequacies in the framework of classical physics—
deficiencies which eventually could only be resolved by the introduction of quantum-
theoretical concepts and methods—can already be found in Maxwell’s discussion of
the anomalously low specific heat of gases in the mid-1870s. Nevertheless, the birth of
quantum theory as such, and in particular the clear identification of a new fundamental
constant of Nature characteristic of quantum phenomena, is usually located in the
year 1900 with Planck’s invention (Planck, 1900b) (“derivation” would perhaps be too
charitable a term) of a novel formula for the distribution of energy over frequencies
in thermal radiation (namely, electromagnetic radiation in the interior of a sealed
enclosure which has been allowed to come to thermal equilibrium with the walls of the
enclosure, themselves maintained at a fixed temperature T). The problem of thermal
cavity (or “blackbody”) radiation would in modern terms be regarded as one of thermal
quantum field theory, so we have the strange situation that the first historical impetus
to the discovery of quantum principles actually lay in a problem of quantum field
theory, which is generally supposed to be a much later invention arising from a fusion of
quantum, relativistic, and locality principles. In fact, Planck’s papers of 1900 make no
reference to quantization of the electromagnetic field-energy (a concept which Planck
would continue to resist strenuously until the mid-1920s): the energy quantization
principle for Planck is strictly a statement about the distribution of energy among
the idealized material oscillators constituting the walls of the enclosure, and general
principles of thermodynamics are then brought to bear to fix the electromagnetic
energy distribution which must necessarily obtain once equilibrium between these
oscillators and the interior radiation has been achieved. Only five years later, in his
remarkable paper entitled “A heuristic point of view concerning the creation and
conversion of light” (Einstein, 1905a), Einstein was to extend boldly and explicitly
the idea of energy quantization to the electromagnetic field itself, introducing the idea
of “light quanta” (in modern language, photons). From a conceptual (if not technical)
point of view, this paper therefore marks the true birth of quantum field theory.
In order to understand the origins of quantum theory in the problem of blackbody
radiation, which to physicists of Planck’s generation must have appeared
quintessentially classical (merging as it did well-established principles of Maxwellian
electromagnetic theory and thermodynamics), we need to push our time horizon back
a quarter century or so and survey the overall situation in classical physics at the start
of the final quarter-century of the 1800s. The three great edifices of classical physics—
Newtonian mechanics (amplified and deepened, of course, by the contributions of
Laplace, Lagrange, Hamilton, and many others), electromagnetic theory, only recently
completed by Maxwell (in Philosophical Transactions, vol. 155, 1865), and, somewhat
later, put in a form recognizable to the modern student of the subject by Hertz,
and thermodynamics, which reached essential conceptual completeness at the hands
of Clausius, also around 1865—stood as precise descriptions of natural phenomena,
each apparently unassailable in its natural domain of applicability. In a sense, further
progress keeping strictly within the limits of each of these disciplines had become
difficult or impossible. However, precisely at this time, natural phenomena requiring
the simultaneous application of more than one of these formal structures began to
demand the attention of physicists. It is possible to trace the origins of the core
disciplines of twentieth-century physics—quantum theory, statistical mechanics, and
relativity—to developments at the interfaces of the three basic classical frameworks.
We can summarize these developments very briefly as follows:

1. Developments at the interface of thermodynamics and electromagnetic theory.
The discovery and classification of solar spectral lines by Fraunhofer, and the
development of spectroscopy as an analytical tool by Bunsen and Kirchhoff in
the 1850s, led naturally to an investigation of the radiation emitted by hot
bodies, and thus to the study of the relation between thermodynamics and
electromagnetic phenomena. It was immediately recognized by Kirchhoff that
the intensity and frequency of the radiation emitted by a perfectly absorbing
body had a fundamental significance. Various arguments—some of a rigorous
thermodynamic nature, others of an heuristic character—led, by 1896, to a
widely accepted form for this “blackbody distribution” (the Wien Law). The
experimental failure of this “law” was the final stimulus which led Planck to
(reluctantly) advance the quantum hypothesis in his seminal papers (Planck,
1900b) in Annalen der Physik, 1900.
2. Developments at the interface of mechanics and thermodynamics.
Attempts to reconcile classical mechanics, regarded as the underlying dynamical
description of all phenomena, with the formal principles of thermodynamics led
to the development of kinetic theory by Clausius, Maxwell, and Boltzmann. The
more general framework developed by the last of these has since come to be
called statistical mechanics. Boltzmann was the first to understand clearly the
role of statistical and probabilistic considerations in reconciling mechanics with
heat theory (as thermodynamics was called at the time). However, his views were
highly controversial at the time, especially with regard to his claim of a statistical
origin for the phenomenon of irreversibility in thermal physics. Although he would
prove to be absolutely right on this point, Boltzmann’s methods proved incapable
of explaining the observed specific heats of gases: this difficulty, early (before
1875) recognized by Maxwell (Maxwell, 1875) as a serious anomaly in classical
theory, would only finally be removed by the application of quantum ideas in the
1920s, fully fifty years after the problem was first recognized.
3. Developments at the interface of electromagnetic theory and mechanics.
The historical and conceptual preeminence of classical mechanics implied a special
status which led naturally to an attempt to interpret all natural phenomena
in mechanical terms. In particular, the attempt to weld electromagnetic theory
with mechanics (begun by Maxwell himself) by main force led by the second half
of the nineteenth century to the development of a profusion of aether theories of
increasing complexity and artificiality. The failure to produce any direct evidence
for the existence of an aether in optical experiments (by measuring relative motion
of the aether and the optical apparatus of choice) grew from a mere annoyance
into an outright crisis with the null result of the experiments of Michelson (1881)
and Michelson and Morley (1887). The entire class of complicated and messy
dynamical aether theories concocted to surmount this impasse was demolished
with one stroke by Einstein (Einstein, 1905b) in 1905, by accepting the kinematical
structure natural to electromagnetic phenomena as generally valid in the
mechanical sphere also.

In our outline of the conceptual origins of quantum mechanics and quantum field
theory, the first item above holds pride of place: firstly, because by common consent
quantum theory begins with Planck’s introduction of a new universal constant of
Nature in his blackbody distribution formula of 1900, and secondly, because, as we shall
see later, the first explicitly quantum field-theoretic calculation, Jordan’s derivation of
the mean-square energy fluctuations in a subvolume of a cavity containing (a one-
dimensional version of) electromagnetic radiation in the final section of the Drei-
Männer-Arbeit (1925) of Born, Heisenberg, and Jordan (Born et al., 1926), came
directly out of an attempt to reproduce a remarkable result of Einstein dating from
1909 (Einstein, 1909b,a) in which the apparent paradox of simultaneous wave and
particle behavior of light was first exposed with full clarity. The essence of quantum
field theory is to provide a unified dynamical framework in which these apparently disparate
behaviors can coexist in a conceptually consistent fashion. The electromagnetic
radiation contained in a cavity at thermal equilibrium therefore plays a central role
in the conceptual origins both of quantum theory generally and quantum field theory
in particular. For this reason we shall retell in this chapter the history of thermal
radiation in some detail, paying particular attention to those aspects important for
understanding the conceptual origins of quantum field theory. The story will lead
us continuously from the early arguments surrounding the role of the Second Law
of Thermodynamics (and the “arrow of time” it implies) in blackbody radiation, to
the appearance of the first truly quantum-field-theoretical analysis of electromagnetic
radiation. We begin in the next section by describing some important milestones on
the way to the understanding of blackbody radiation as it stood in 1900 when Planck
took the first steps along the road to modern quantum theory.

1.2 Early work on cavity radiation


The fact that heated bodies glow with a color and intensity varying with their temperature
must surely have been apparent in prehistoric times (at least since the discovery
of fire!). The precise nature of thermal radiation became the subject of intense study
in the nineteenth century, and as we shall see, led directly to the discovery of the
quantum principle. Wedgwood, the porcelain manufacturer, observed in the 1790s
that heated bodies all became red at the same temperature. The Scottish physicist
Balfour Stewart noted (in 1858) that a block of rock salt at 100 °C strongly absorbs
the radiation emitted by a similar block at the same temperature, and suggested the
rule of equality of radiating and absorbing power of bodies for rays of any given type
(i.e., wavelength). About a year later (independently of Stewart) Kirchhoff put all
of this phenomenology into a comprehensible framework by the use of very general
thermodynamic arguments (Kirchhoff, 1859, 1860).
The arguments given by Kirchhoff established the universal character of blackbody,
or “cavity”, radiation—the radiation filling the interior of a hollow material cavity, the
walls of which are maintained at a fixed temperature T. It is important to understand
at the outset the role of the cavity in these arguments. After heating the cavity to
the desired temperature T, a fixed amount of radiant energy fills the interior. To
examine the nature of this radiation we are at liberty to drill a very small hole in
the walls, as the small amount of radiant energy emerging will not sensibly disturb
the established equilibrium. Note that from the point of view of the external world,
the punctured cavity is essentially “black”, in the sense that any radiation entering
through the pinhole will have to scatter around in the interior of the cavity for a
very long time before having an opportunity to escape. Such a cavity is therefore
(effectively) a perfect absorber, or “black body”.1 The essence of the problem is that
radiation in the cavity is forced to interact with the walls (or contents, if any) of the
cavity until thermal equilibrium is reached.
Kirchhoff showed that the energy (per unit volume, and per wavelength interval) of
the cavity radiation was uniform and isotropic throughout the interior. The arguments
he gave were subsequently simplified by Pringsheim. The latter observed that by
inserting a reflecting plane surface at some point inside the cavity, the equality of
radiation in opposite directions follows, as otherwise the unequal radiation pressure
exerted on the two faces would allow the spontaneous conversion of heat to work,
thereby violating the Second Law of Thermodynamics. The existence of a pressure
exerted by radiation reflected from a surface is crucial to these arguments: the precise
relation (to be discussed further below) between this pressure and the energy density of
the radiation had been derived earlier by Maxwell (Maxwell, 1873). Similar arguments
employing two mirrors can be used to prove the isotropy and homogeneity of the
radiation in the cavity. The existence of filters selectively absorbing and transmitting
radiation of different wavelengths means that these statements hold separately within
each interval of wavelength. Thus, we can define a function φ(λ, T ) such that φ(λ, T )Δλ
is the radiant energy per unit volume between wavelengths λ, λ + Δλ anywhere in the
cavity.

1 The use of the term “blackbody”, though historically predominant, frequently confuses the beginning
student, who wonders quite naturally how a truly black body can radiate! In the forthcoming discussion we
prefer the use of the term “cavity (or thermal) radiation”. The German terminology, “Normal Spektrum”,
is not particularly illuminating either.
Next, one easily sees that φ(λ, T ) is the same (universal!) function for any cavity,
irrespective of size, shape, or material constitution. This can be established by
connecting two cavities at the same temperature by a thin tube allowing radiation
to pass in either direction. Unequal radiation densities would result in unequal fluxes
down the tube, and hence in a spontaneous flow of heat between two systems at
the same temperature. Again, by placing filters which pass only a limited range of
wavelengths in the tube, this equality is found to hold in each wavelength interval
λ, λ + Δλ.
The universal function φ(λ, T ) appears in the description of radiation emitted from
any hot surface in the following way. At thermal equilibrium the radiation impinging
on the surface must be balanced by the total radiation leaving, as otherwise the surface
would grow progressively cooler or hotter. Suppose the surface (say, the interior wall
of our cavity discussed above) absorbs a fraction Aλ of the incident radiation flux eλ
(at wavelength λ). It is clear that the radiation emitted (as opposed to reflected), or
the “emissive power” of the surface Eλ must just equal Aλ eλ , or equivalently, the ratio
Eλ /Aλ of emissive power to absorption coefficient is a universal function (essentially
our old friend φ) of wavelength and temperature. This result, of course, lies at the
core of Stewart’s observations mentioned above. It also explains why thermos flasks
are silvered (making Aλ small) in order to get them to radiate less.
The total energy density of cavity radiation (at all wavelengths) is evidently just
the integral of Kirchhoff’s function φ:

$$\rho(T) = \int_0^\infty \phi(\lambda, T)\,d\lambda \tag{1.1}$$

In 1879 Josef Stefan proposed on the basis of some preliminary experiments the
form

$$\rho(T) = aT^4 \tag{1.2}$$

with $a$ a universal constant. In 1884 Boltzmann derived this formula thermodynamically,
essentially by the following argument. Consider a spherical cavity of radius $r$.
Maxwell had shown that radiation of energy density $\rho$ exerts a radiation pressure
$\frac{1}{3}\rho$. Thus the internal energy is $U = \frac{4}{3}\pi r^3\rho$ and the heat absorbed in an infinitesimal
reversible expansion is given by the First Law of Thermodynamics as$^{2}$

$$\text{đ}Q = dU + \text{đ}W = d\left(\tfrac{4}{3}\pi r^3\rho\right) + \tfrac{1}{3}\,4\pi r^2\rho\,dr \tag{1.3}$$

with the corresponding entropy change

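$$dS = \frac{\text{đ}Q}{T} = \frac{4}{3}\pi r^3\,\frac{d\rho}{T} + \frac{16}{3}\pi r^2\rho\,\frac{dr}{T} = \frac{4}{3}\pi r^3\,\frac{d\rho}{dT}\frac{dT}{T} + \frac{16}{3}\pi r^2\rho\,\frac{dr}{T} \tag{1.4}$$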

2 We remind the reader that U and S are state functions, unlike work W and heat Q: only the changes
in the latter are meaningful, whence the difference in notation in the associated differentials đW, đQ.
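Equating $\frac{\partial}{\partial T}\frac{\partial S}{\partial r}$ to $\frac{\partial}{\partial r}\frac{\partial S}{\partial T}$ we find

$$\frac{d\rho}{dT} = \frac{4}{T}\rho \tag{1.5}$$

which immediately implies the $T^4$ law stated above, and now known as the
Stefan–Boltzmann Law.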
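The step from (1.4) to (1.5) can be written out explicitly. The following is a sketch that uses only the coefficients of $dT$ and $dr$ read off from (1.4), treating $\rho$ as a function of $T$ alone:

```latex
\begin{align*}
\frac{\partial S}{\partial T} &= \frac{4}{3}\pi r^3\,\frac{1}{T}\frac{d\rho}{dT}, &
\frac{\partial S}{\partial r} &= \frac{16}{3}\pi r^2\,\frac{\rho}{T}, \\[4pt]
\frac{\partial}{\partial r}\frac{\partial S}{\partial T} &= 4\pi r^2\,\frac{1}{T}\frac{d\rho}{dT}, &
\frac{\partial}{\partial T}\frac{\partial S}{\partial r} &=
  \frac{16}{3}\pi r^2\left(\frac{1}{T}\frac{d\rho}{dT} - \frac{\rho}{T^2}\right).
\end{align*}
% Equating the two mixed derivatives and cancelling 4\pi r^2 / T:
\begin{align*}
\frac{d\rho}{dT} = \frac{4}{3}\frac{d\rho}{dT} - \frac{4}{3}\frac{\rho}{T}
\quad\Longrightarrow\quad
\frac{d\rho}{dT} = \frac{4\rho}{T}
\quad\Longrightarrow\quad
\ln\rho = 4\ln T + \text{const}
\quad\Longrightarrow\quad
\rho = aT^4 .
\end{align*}
```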
Serious attempts to measure experimentally the intensity and spectral composition
of blackbody radiation began with Langley, the American astronomer, who invented
the bolometer, a device for measuring the intensity of radiation in the infrared, using
the principle of temperature-dependent resistance of a thin filament placed in the path
of the rays after they were refracted through a rock-salt prism. These measurements
(1886) extended up to wavelengths of about 5 μm. He found “a real though slight
progression of the point of maximum heat towards the shorter wave-lengths as the
temperature rises” and the asymmetric form of the maximum of φ(λ, T ), steeper on
the shorter wavelength side. By 1895 the greatly improved measurements of Paschen
established the rule that the wavelength λm of maximum intensity was inversely
proportional to the temperature T . Paschen’s measurements led him to propose, in
1896, the form
$$\phi(\lambda, T) = B\lambda^{-C}\exp\left(-\frac{A}{\lambda T}\right) \tag{1.6}$$
with the constant C somewhere in the range of 5–6.
In 1893 Wien derived, using purely thermodynamic arguments, an important
constraint on the Kirchhoff function φ(λ, T ). Consider once again the spherical cavity
used above in the derivation of the Stefan–Boltzmann Law. Imagine that the sphere
undergoes slow, adiabatic compression, where the radius contracts steadily at speed v
(v << c). At every reflection from this inwardly contracting sphere light of wavelength
λ suffers a Doppler shift to wavelength λ(1 − 2v/c). During a contraction by Δr,
occurring in time Δr/v, the light undergoes cΔr/2vr reflections across a diameter of
the sphere (it can be shown that light not incident perpendicular to the walls suffers a
smaller Doppler shift each time, but is reflected correspondingly more frequently: the
net result is the same). The result is a total blue shift to wavelength
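$$\left(1 - \frac{2v}{c}\right)^{\frac{c\Delta r}{2vr}}\lambda \sim \left(1 - \frac{\Delta r}{r}\right)\lambda$$
so the wavelength is shifted by $\frac{\Delta\lambda}{\lambda} = -\frac{\Delta r}{r}$ in this adiabatic compression. As there is
no heat transfer $dS = 0$ and (see Eq. (1.4) above)
$$\frac{4}{3}\pi r^3\,\frac{4}{T}\rho\,\frac{dT}{T} = -\frac{16}{3}\pi r^2\rho\,\frac{dr}{T}
\quad\Rightarrow\quad \frac{dr}{r} = -\frac{dT}{T}$$
so $r \propto \frac{1}{T}$.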
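The compounding of many small Doppler shifts into a net factor $(1 - \Delta r/r)$ can be checked numerically. The following is a minimal Python sketch, not part of the original argument; the function name `compounded_shift` and the sample values of $v/c$ and $\Delta r/r$ are illustrative assumptions.

```python
import math

def compounded_shift(v_over_c: float, dr_over_r: float) -> float:
    """Ratio (final wavelength)/(initial wavelength) after a contraction by dr.

    Each reflection multiplies the wavelength by (1 - 2v/c), and there are
    c*dr/(2*v*r) reflections across a diameter during the contraction.
    """
    per_reflection = 1.0 - 2.0 * v_over_c
    n_reflections = dr_over_r / (2.0 * v_over_c)
    return per_reflection ** n_reflections

v_over_c = 1.0e-6     # hypothetical slow wall speed, v << c
dr_over_r = 1.0e-3    # hypothetical small fractional contraction

exact = compounded_shift(v_over_c, dr_over_r)
first_order = 1.0 - dr_over_r          # the estimate quoted in the text

print(f"compounded factor : {exact:.9f}")
print(f"1 - dr/r          : {first_order:.9f}")
```

The two numbers differ only at second order in $\Delta r/r$, and the wall speed $v$ drops out of the result, as the argument requires.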
The essence of the thermodynamic argument given by Wien lies in the observation
that the adiabatic process described here gives at every stage cavity radiation in
equilibrium at the new temperature T inversely proportional to r. If the result were
otherwise, producing more or less radiation in some wavelength band than appropriate
for blackbody radiation at the new temperature, introduction of filters absorbing
in this range immediately allows the generation of temperature differences and a
consequent violation of the Second Law.
In the time interval Δt of the adiabatic compression, the radiant energy
originally in the wavelength band (λ, λ + Δλ), namely
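$$\frac{4}{3}\pi r^3\,\phi(\lambda, r)\,\Delta\lambda \tag{1.7}$$
is shifted down to the interval $\left(\lambda(1 - \frac{v\Delta t}{r}),\ (\lambda + \Delta\lambda)(1 - \frac{v\Delta t}{r})\right)$, and increased by the
amount of adiabatic work done against the radiation pressure, which is
$$(4\pi r^2)(v\Delta t)\left(\tfrac{1}{3}\phi(\lambda, r)\,\Delta\lambda\right) \tag{1.8}$$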
Thus
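$$\frac{4}{3}\pi (r - v\Delta t)^3\,\phi\!\left(\lambda\left(1 - \frac{v\Delta t}{r}\right),\ r - v\Delta t\right)\Delta\lambda\left(1 - \frac{v\Delta t}{r}\right)
= \frac{4}{3}\pi r^3\,\phi(\lambda, r)\,\Delta\lambda + \left(\frac{4}{3}\pi r^2\right)(v\Delta t)\,\phi(\lambda, r)\,\Delta\lambda \tag{1.9}$$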
Expanding to first order in Δt, one finds the equation
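$$\lambda\frac{\partial\phi}{\partial\lambda} + r\frac{\partial\phi}{\partial r} = -5\phi$$
or
$$\left(\lambda\frac{\partial}{\partial\lambda} + r\frac{\partial}{\partial r}\right)(\lambda^5\phi) = 0$$
so that

$$\lambda^5\phi = f(\lambda/r) = f(\lambda T)$$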
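The first-order expansion can be spelled out as follows. This is a sketch in which the shorthand $\delta \equiv v\Delta t/r$ is introduced here for convenience (it is not a symbol used in the text), and the energy balance is divided through by the common factor $\frac{4}{3}\pi r^3\Delta\lambda$:

```latex
% Left-hand side of the energy balance, with r - v\Delta t = r(1-\delta):
\begin{align*}
(1-\delta)^3\,\phi\big(\lambda(1-\delta),\, r(1-\delta)\big)\,(1-\delta)
 &= (1 - 4\delta)\left[\phi - \delta\left(\lambda\frac{\partial\phi}{\partial\lambda}
      + r\frac{\partial\phi}{\partial r}\right)\right] + O(\delta^2) \\
 &= \phi - \delta\left[4\phi + \lambda\frac{\partial\phi}{\partial\lambda}
      + r\frac{\partial\phi}{\partial r}\right] + O(\delta^2).
\end{align*}
% Right-hand side: original energy plus the work done, (4/3)\pi r^2 (v\Delta t)\phi\Delta\lambda:
\begin{align*}
\phi + \delta\,\phi .
\end{align*}
% Matching the terms linear in \delta:
\begin{align*}
-4\phi - \lambda\frac{\partial\phi}{\partial\lambda} - r\frac{\partial\phi}{\partial r} = \phi
\quad\Longrightarrow\quad
\lambda\frac{\partial\phi}{\partial\lambda} + r\frac{\partial\phi}{\partial r} = -5\phi .
\end{align*}
% The rewritten form follows from the product rule:
\begin{align*}
\left(\lambda\frac{\partial}{\partial\lambda} + r\frac{\partial}{\partial r}\right)(\lambda^5\phi)
 = 5\lambda^5\phi + \lambda^5\left(\lambda\frac{\partial\phi}{\partial\lambda}
   + r\frac{\partial\phi}{\partial r}\right) = 0 .
\end{align*}
```

The operator $\lambda\partial_\lambda + r\partial_r$ annihilates any function of the ratio $\lambda/r$ alone, which is why $\lambda^5\phi$ can depend only on that combination.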

Wien phrased this result slightly differently. Since


$$\frac{1}{T^5}\phi = \frac{1}{(\lambda T)^5}f(\lambda T)$$
it follows that if $\lambda_1 T_1 = \lambda_2 T_2$, then
$$\frac{1}{T_1^5}\phi(\lambda_1, T_1) = \frac{1}{T_2^5}\phi(\lambda_2, T_2)$$

which Wien called the “$T^5$” law, but is now commonly called the Wien Displacement
Law. If we know the radiation function φ(λ1 , T1 ) at temperature T1 , the law allows us
to “displace” it into the appropriate curve for any other temperature T2 , as
$$\phi(\lambda, T_2) = \left(\frac{T_2}{T_1}\right)^5 \phi\!\left(\lambda\frac{T_2}{T_1},\ T_1\right)$$
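As an illustration of how the law "displaces" a known curve, the sketch below applies it to the Paschen form (1.6) with $C = 5$. It is a minimal Python example under stated assumptions: the constants `A` and `B`, the temperatures, and the helper names `phi_wien` and `displace` are all hypothetical choices made for the demonstration.

```python
import math

A = 0.0144   # hypothetical constant in exp(-A/(lam*T)), in metre-kelvin
B = 1.0      # hypothetical overall normalisation

def phi_wien(lam: float, T: float) -> float:
    """Form (1.6) with C = 5: B * lam**-5 * exp(-A/(lam*T))."""
    return B * lam**-5 * math.exp(-A / (lam * T))

def displace(phi_at_T1, T1: float, T2: float):
    """Build the curve at T2 from a curve known only at T1."""
    scale = T2 / T1
    return lambda lam: scale**5 * phi_at_T1(lam * scale)

T1, T2 = 1000.0, 1500.0
curve_T2 = displace(lambda lam: phi_wien(lam, T1), T1, T2)

lam = 2.0e-6                      # sample wavelength in metres
direct = phi_wien(lam, T2)        # evaluated directly at T2
displaced = curve_T2(lam)         # obtained from the T1 curve alone
print(direct, displaced)
```

Any function of the form $\lambda^{-5} f(\lambda T)$ would pass the same check; the exponential is used only because it is the form discussed in the text.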
It follows immediately from the Wien Displacement Law that the total energy density

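$$\begin{aligned}
\rho(T) &= \int d\lambda\,\phi(\lambda, T) \\
&= \int d\lambda\,\frac{1}{\lambda^5}f(\lambda T) \\
&= T^4\int_0^\infty dx\,\frac{1}{x^5}f(x) \equiv aT^4
\end{aligned} \tag{1.10}$$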

satisfies the Stefan–Boltzmann Law (provided, of course, that the integral converges:
a condition by no means to be taken for granted, as we shall see). Another imme-
diate corollary is the result later verified experimentally by Paschen (but suggested
previously by several workers in this field, notably H. F. Weber) that the maximum in
wavelength displaces inversely with the temperature. Finally, the Wien Displacement
Law immediately fixes the value of the constant C in the form proposed by Paschen
(see (1.6)) to be 5 exactly. Independently of Paschen’s work, Wien in 1896 arrived
at the form (1.6) on the basis of an ad hoc assumption concerning the emission of
radiation by molecules distributed according to a Maxwellian velocity distribution.
This form, which is now a complete specification of the Kirchhoff function φ, was
called the Wien Distribution Law, and was to play, with its “corrected” version, the
Planck Distribution Law, a critical role in the evolution of attempts to understand
quantization of the electromagnetic field.
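Two of the corollaries above, the $T^4$ law and the inverse displacement of the maximum, can be checked numerically for the Wien form. The following Python sketch is illustrative only: the constants `A`, `B`, the integration limits, and the grid sizes are assumptions chosen so that the integral converges comfortably. For this particular form the integral is $6B(T/A)^4$ and the maximum sits at $\lambda_m T = A/5$.

```python
import math

A, B = 0.0144, 1.0   # hypothetical constants of the Wien form (1.6), C = 5

def phi_wien(lam: float, T: float) -> float:
    return B * lam**-5 * math.exp(-A / (lam * T))

def total_density(T: float, lam_min=1e-7, lam_max=1e-2, n=4000) -> float:
    """Trapezoidal integral of phi over wavelength on a fixed logarithmic grid."""
    t0, t1 = math.log(lam_min), math.log(lam_max)
    dt = (t1 - t0) / n
    total = 0.0
    for i in range(n + 1):
        lam = math.exp(t0 + i * dt)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * phi_wien(lam, T) * lam * dt   # d(lam) = lam * dt
    return total

def peak_wavelength(T: float, lam_min=1e-7, lam_max=1e-2, n=20000) -> float:
    """Grid search for the wavelength at which phi is largest."""
    grid = [lam_min * (lam_max / lam_min) ** (i / n) for i in range(n + 1)]
    return max(grid, key=lambda lam: phi_wien(lam, T))

for T in (1000.0, 2000.0, 3000.0):
    print(T, total_density(T) / T**4, peak_wavelength(T) * T)
```

Both printed columns should be nearly independent of temperature: the first is the constant $a$ of (1.10), the second the product $\lambda_m T$.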

1.3 Planck’s route to the quantization of energy


Max Planck, the father of quantum theory, was born in Kiel, Germany, in 1858, and
attended the Gymnasium (high school) and University in Munich before going to
Berlin for his doctoral degree, where he had classes from Helmholtz and Kirchhoff.
His doctoral thesis concerned the application of thermodynamics (à la Clausius) to
problems of “Evaporation, Melting, and Sublimation”. Planck was fascinated by the
extraordinary scope, power and (apparent) infallibility of the energy conservation
principle—or First Law of Thermodynamics—and accorded to the Second Law of
Thermodynamics (with its concomitant “arrow of time”) an equal degree of validity. He
was therefore convinced that the Second Law could not rest on the purely mechanical
foundations of Boltzmannian gas theory, in which the reversibility objections of
Loschmidt and Zermelo (the latter a Planck assistant in Berlin) would necessarily lead
to spontaneous processes (albeit rare) where the entropy decreased.3 The consideration
of the paradox of Maxwell’s demon also led Planck to the rather peculiar conclusion
that the Second Law could never be valid in a system comprised of discrete particles.
Instead, Planck began to investigate (in 1897) the possibility that irreversible
thermal phenomena could somehow be traced back to irreversible processes in a
continuous medium—in particular, the electromagnetic field. The archetypal process
considered by Planck was the apparently irreversible conversion of plane radiation

3 For a beautiful retelling of this remarkable period in the development of statistical heat theory, see the
biography by Martin Klein of Paul Ehrenfest (Klein, 1970).
incident on a charged oscillator into outgoing spherical waves. This subject was
explored with great thoroughness in a series of five papers in the Berliner Berichte
(1897–99) entitled “Über irreversible Strahlungsvorgänge” (“On irreversible radiation
processes”) (Planck, 1900a). The subject of absorption and re-emission of electromag-
netic radiation from an oscillator led Planck naturally into the subject of thermal
cavity radiation. Here the oscillators constitute the material of the walls of the cavity,
absorbing and re-emitting the radiation in the interior. The universal character of the
thermal radiation discussed above allowed Planck the freedom of making a very simple
model of the constituent particles of the walls (essentially charged simple harmonic
oscillators), as the spectral distribution of the cavity radiation would have to be
independent of the specific material constitution of the cavity once equilibrium is
reached.
The irreversibility that Planck relies upon in his radiation studies can be seen
clearly in the damped oscillator equation that he derived as a prelude to his studies
of the coupled field-oscillator problem:

$$m\frac{d^2x}{dt^2} + kx - \frac{2e^2}{3c^3}\frac{d^3x}{dt^3} = eE\cos(2\pi\nu t) \tag{1.11}$$
The third (“radiation damping”) term has three time-derivatives and evidently
changes sign under time-reversal. It arises because the damping force times the velocity
must give the power lost to radiation, which is proportional to the acceleration of the
charged particle squared. The average of the third term above times the velocity
$\frac{dx}{dt}$ over a cycle of the periodic system is easily seen to be the same as the average
power radiated, by a single integration by parts. Planck was particularly impressed
by the fact that the irreversibility in this system arises without any recourse to non-
conservative processes, in which ordered energy is lost (as in friction or air resistance)
to disordered heat. Instead, the energy appears to flow irreversibly from an ordered
source (an incoming plane wave incident on the oscillator) to an equally ordered form:
outgoing spherical radiation.
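The integration by parts mentioned above can be sketched as follows, writing $\tau = 1/\nu$ for the period (a symbol introduced here for convenience) and using dots for time derivatives:

```latex
\begin{align*}
\left\langle -\frac{2e^2}{3c^3}\,\dddot{x}\,\dot{x} \right\rangle
 &= -\frac{2e^2}{3c^3}\,\frac{1}{\tau}\int_0^{\tau} \dddot{x}\,\dot{x}\;dt \\
 &= -\frac{2e^2}{3c^3}\,\frac{1}{\tau}\Big[\ddot{x}\,\dot{x}\Big]_0^{\tau}
    + \frac{2e^2}{3c^3}\,\frac{1}{\tau}\int_0^{\tau} \ddot{x}^{\,2}\;dt \\
 &= \frac{2e^2}{3c^3}\,\big\langle \ddot{x}^{\,2} \big\rangle .
\end{align*}
% The boundary term vanishes because the motion is periodic with period \tau.
```

The right-hand side is the cycle average of $\frac{2e^2}{3c^3}\ddot{x}^2$, the power radiated by a charge $e$ with acceleration $\ddot{x}$ (the Larmor formula in Gaussian units).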
In 1898 Boltzmann succeeded in convincing Planck (in a paper entitled “On
the supposedly(!) irreversible radiation processes” (Boltzmann, 1898)) that the hope
of deriving irreversible phenomena from electromagnetic theory without additional
statistical assumptions was bound to fail, as Maxwell’s equations are just as invariant
under time-reversal as those of classical mechanics. In fact (Boltzmann claimed), in
the course of a careful derivation of radiation damping one is forced to apply boundary
conditions to the fields which amount to a field analog of the assumption of molecular
disorder implicit in the Boltzmann approach to gas theory.4 Planck admitted this
promptly and abandoned the attempt at a “microscopic” explanation of irreversibility
based on electrodynamics.
In the fifth of his papers on irreversible radiation processes (Planck, 1899), Planck
derived a crucial formula relating the distribution function for cavity radiation to the
average energy of his fictional oscillators (at equilibrium). Before stating this formula,

4 See (Klein, 1970), (Kuhn, 1978) for masterful expositions of the remarkable developments at the
interface of mechanics and heat theory summarized all too briefly above.
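a slight change in notation will be convenient. Let ρ(ν, T )dν be the energy/unit volume
of cavity radiation in the frequency interval (ν, ν + dν), where $\nu\lambda = c$, $|d\nu| = \frac{c}{\lambda^2}d\lambda$, so
that
$$\rho(\nu, T)\,d\nu = \rho(\nu, T)\,\frac{c}{\lambda^2}\,d\lambda = \phi(\lambda, T)\,d\lambda$$
$$\rho(\nu, T) = \frac{\lambda^2}{c}\,\phi(\lambda, T) = \frac{c}{\nu^2}\,\phi\!\left(\frac{c}{\nu},\ T\right) \tag{1.12}$$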
In terms of ρ(ν, T ), the $T^5$ law takes the form
$$\rho(\nu, T) = \nu^3 f\!\left(\frac{\nu}{T}\right) \tag{1.13}$$
and the Wien Distribution Law is

$$\rho(\nu, T) = \alpha\nu^3\exp(-\beta\nu/T) \tag{1.14}$$

where the new constants α, β are related to those appearing in the Paschen result
(1.6) by $\alpha = \frac{B}{c^4}$, $\beta = \frac{A}{c}$ (recall that the constant C was fixed previously by purely
thermodynamic reasoning to be 5). The equation derived by Planck, obtained by
equating at equilibrium the energy absorbed and emitted by the oscillator, stated
simply
$$E(\nu_0, T) = \frac{c^3}{8\pi\nu_0^2}\,\rho(\nu_0, T) \tag{1.15}$$

It relates the average energy of an oscillator of natural frequency $\nu_0 = \frac{1}{2\pi}\sqrt{k/m}$ to
the blackbody distribution function. That such a relation must exist is physically
clear. Planck showed that if the left-hand side exceeded the right, energy would flow
from the oscillators to the electromagnetic field, while if the intensity of radiation at
ν0 became large enough that the right-hand side exceeded the left the oscillators
would tend to absorb energy from the field. The importance of this equation (a
full derivation of which we must unfortunately forego, in the interests of brevity) in
Planck’s intellectual journey can scarcely be overemphasized: it allowed him to restrict
the application of energy quantization to the material oscillators alone (left-hand side
of (1.15)), while relying on the equilibrium condition to transfer the resultant average
distribution of energy by main force, as it were, to the continuous electromagnetic
radiation (right-hand side) in the interior of the cavity. Planck would continue to
insist on the continuous, purely classical character of electromagnetic radiation for the
next 25 years.
In his final paper on irreversible radiation processes (see (Planck, 1900a)), Planck
gave a “derivation” of the Wien Law based on purely thermodynamic arguments
together with the crucial formula (1.15) above. This was done by making a plausible
assumption for the form of the entropy S of the oscillator as a function of energy E,
using for the inverse temperature $T^{-1} = \frac{\partial S}{\partial E}$, and solving for E as a function of T .
Planck showed that his assumption for S(E) implied that the entropy of the whole
system (oscillators plus radiation) would necessarily increase in time, in agreement
with the Second Law of Thermodynamics. He was also under the (as it later turned
out, erroneous) impression that this was the only possible choice for S(E) consistent
with the Second Law. Consequently, at this point Planck was quite convinced that
he had finally managed a complete derivation of the blackbody spectrum from pure
thermodynamics (even if he had now to agree with Boltzmann that the Second Law
had a statistical rather than absolute significance, even in radiation phenomena).
On the afternoon of Sunday, 7 October 1900, Planck was visited at home by an
experimental colleague from the Physikalisch-Technische Reichsanstalt (the Physical-Technical
Imperial Institute, or PTR), H. Rubens. He learnt from Rubens that recent
experiments at the PTR had established incontrovertible deviations from the Wien
Distribution Law on the infrared (low-frequency) side. In particular, the intensity was
roughly proportional to temperature in this regime, instead of the saturation at high
temperatures implied by the Wien Law (1.14). Planck realized that a more general
form for the oscillator entropy S(E) would in turn allow the derivation of a modified
distribution law

$$\rho(\nu, T) = \alpha\nu^3\,\frac{1}{\exp(\beta\nu/T) - 1} \tag{1.16}$$

which clearly reproduces the Wien Law at higher frequencies, but behaves like

$$\rho(\nu, T) \sim \alpha\nu^3\,\frac{1}{(\beta\nu/T)} \sim \frac{\alpha}{\beta}\,T\nu^2 \tag{1.17}$$

in the infrared (small ν), showing the desired linear behavior with T . This interpolating
formula, which Planck appears to have constructed in the few hours following the visit
of Rubens, was checked within the next week and a half and found to match exactly
the experimental data.
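The two limiting behaviors of the interpolating formula can be checked with a few lines of Python. This is a minimal sketch in arbitrary units; setting $\alpha = \beta = 1$ and the sample frequencies are assumptions made purely for illustration.

```python
import math

ALPHA, BETA = 1.0, 1.0   # arbitrary units; only the ratio beta*nu/T matters here

def rho_wien(nu: float, T: float) -> float:
    """Wien Distribution Law (1.14)."""
    return ALPHA * nu**3 * math.exp(-BETA * nu / T)

def rho_planck(nu: float, T: float) -> float:
    """Planck's interpolating formula (1.16); expm1(x) = exp(x) - 1."""
    return ALPHA * nu**3 / math.expm1(BETA * nu / T)

def rho_infrared(nu: float, T: float) -> float:
    """Small-frequency limit (1.17), linear in T."""
    return (ALPHA / BETA) * T * nu**2

T = 1.0
high_ratio = rho_planck(20.0, T) / rho_wien(20.0, T)        # beta*nu/T = 20
low_ratio = rho_planck(1.0e-3, T) / rho_infrared(1.0e-3, T)  # beta*nu/T = 0.001
doubling = rho_planck(1.0e-3, 2.0 * T) / rho_planck(1.0e-3, T)

print(high_ratio, low_ratio, doubling)
```

Both ratios are close to 1, and doubling the temperature at low frequency roughly doubles the intensity, which is the linear behavior reported by Rubens.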
Planck was perfectly aware that his interpolating formula was nothing more than
an enlightened guess at this stage, and he began right away to search for a proper
understanding of the formula (1.16). His strategy was precisely the inverse of the one
he had followed heretofore. He used (1.15) to obtain the average oscillator energy,
assuming the validity of the Planck distribution (1.16):


$$E(\nu, T) = \frac{h\nu}{\exp(\beta\nu/T) - 1} \tag{1.18}$$

where $h \equiv \frac{\alpha c^3}{8\pi}$. He then reconstructed the corresponding expression for oscillator
entropy as a function of energy, using the thermodynamic relation T dS = dE valid
for a reversible transformation involving transfer of heat but no external work. Here,
oscillators of a fixed natural frequency ν (called $\nu_0$ above) are considered. Solving
(1.18) for 1/T as a function of E:

$$\frac{1}{T} = \frac{1}{\beta\nu}\ln\left(1 + \frac{h\nu}{E}\right) \tag{1.19}$$
and integrating, one obtains,



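$$\begin{aligned}
S &= \int \frac{1}{T(E)}\,dE \\
&= \frac{1}{\beta\nu}\int \left(\ln(E + h\nu) - \ln(E)\right)dE \\
&= \frac{h}{\beta}\left\{\left(1 + \frac{E}{h\nu}\right)\ln\left(1 + \frac{E}{h\nu}\right) - \frac{E}{h\nu}\ln\left(\frac{E}{h\nu}\right)\right\}
\end{aligned} \tag{1.20}$$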
apart from an irrelevant integration constant. The problem now shifted to finding a
“fundamental” explanation for this last expression.
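That (1.20) is indeed the integral of (1.19) can be confirmed by differentiating it numerically. The sketch below is an illustrative Python check; the numerical values assigned to $h\nu$ and $\beta\nu$, and the function names, are arbitrary assumptions.

```python
import math

H_NU = 2.0      # hypothetical value of h*nu (arbitrary energy units)
BETA_NU = 0.5   # hypothetical value of beta*nu (arbitrary temperature units)

def inverse_temperature(E: float) -> float:
    """Equation (1.19): 1/T as a function of the mean oscillator energy E."""
    return math.log(1.0 + H_NU / E) / BETA_NU

def entropy(E: float) -> float:
    """Equation (1.20), with the integration constant dropped."""
    x = E / H_NU
    return (H_NU / BETA_NU) * ((1.0 + x) * math.log(1.0 + x) - x * math.log(x))

def dS_dE(E: float, step: float = 1.0e-6) -> float:
    """Central-difference estimate of the derivative of the entropy."""
    return (entropy(E + step) - entropy(E - step)) / (2.0 * step)

for E in (0.1, 0.7, 5.0):
    print(E, dS_dE(E), inverse_temperature(E))
```

The prefactor $h/\beta$ of (1.20) appears in the code as `H_NU / BETA_NU`, since the frequency cancels in the ratio.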
By this point in late 1900, Planck had been converted to Boltzmann’s statistical
approach, and he now adopted the techniques used by the latter for gas theory in
an attempt to establish (1.20) by microstatistical reasoning. Thus, the entropy was
to be determined by taking the logarithm of the number of available microscopic
states consistent with the stated macroscopic parameters, S = k ln(W ) (here k is
Boltzmann’s constant). Like Boltzmann, Planck introduced a finite-energy unit $\epsilon$ to
facilitate the counting. The total energy $E_N$ shared by N oscillators was a (large!)
integer P number of these units, $E_N = P\epsilon$. Planck then “counted” W by simply
computing the number of ways in which P units of energy could be distributed among
the N oscillators. The combinatorial formula needed for this can be derived rapidly
using a characteristically elegant trick due to Ehrenfest. Write out a string of P energy
units $\epsilon$, with dividers to indicate how many units belong to the first, second, etc.,
oscillator:

$$\epsilon\epsilon\,|\,\epsilon\,|\,\epsilon\epsilon\epsilon\,\ldots\,|\,\epsilon$$

There are P of the $\epsilon$ symbols and N − 1 dividers. First assume that all these symbols
are distinguishable. There are then (P + N − 1)! ways of ordering them. As the
dividers and energy units are (separately) indistinguishable, we have overcounted by a
factor (N − 1)!P !. Thus the desired result (using Stirling’s approximation to evaluate
the factorials of large numbers) is

$$S_N = k\ln\left(\frac{(P + N - 1)!}{P!\,(N - 1)!}\right) \sim k\left((N + P)\ln(N + P) - P\ln(P) - N\ln(N)\right)$$

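The average entropy of each oscillator is $S = \frac{1}{N}S_N$ while the average energy of a single
oscillator is $E = \frac{1}{N}E_N = \frac{P}{N}\epsilon$. Consequently

$$S = k\left\{\left(1 + \frac{E}{\epsilon}\right)\ln\left(1 + \frac{E}{\epsilon}\right) - \frac{E}{\epsilon}\ln\left(\frac{E}{\epsilon}\right)\right\} \tag{1.21}$$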
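The counting argument and the Stirling estimate can both be checked directly. The following Python sketch is illustrative: it enumerates occupation numbers by brute force for small P and N, and compares the exact logarithm of W with the large-number formula for a hypothetical choice of $N = 10^6$ oscillators. All function names are assumptions made for this example, and entropies are expressed in units of k.

```python
import math
from itertools import product

def count_by_enumeration(P: int, N: int) -> int:
    """Number of ways to give N oscillators whole numbers of units summing to P."""
    return sum(1 for occ in product(range(P + 1), repeat=N) if sum(occ) == P)

def count_by_formula(P: int, N: int) -> int:
    """(P + N - 1)! / (P! (N - 1)!), the dividers-and-units result."""
    return math.comb(P + N - 1, P)

def entropy_exact(P: int, N: int) -> float:
    """ln W per oscillator, using log-gamma to avoid huge factorials."""
    return (math.lgamma(P + N) - math.lgamma(P + 1) - math.lgamma(N)) / N

def entropy_stirling(P: int, N: int) -> float:
    """Per-oscillator entropy in the form (1.21), with x = E/epsilon = P/N."""
    x = P / N
    return (1.0 + x) * math.log(1.0 + x) - x * math.log(x)

print(count_by_enumeration(5, 3), count_by_formula(5, 3))
print(entropy_exact(3_000_000, 1_000_000), entropy_stirling(3_000_000, 1_000_000))
```

The brute-force count is only feasible for small numbers, which is exactly why the closed formula and Stirling's approximation are needed for macroscopic N and P.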



This is exactly the relation (1.20), provided we identify $\epsilon = h\nu$, $h/\beta = k$. In other words,
the derivation of the new distribution formula forced Planck to keep the energy units
$\epsilon$ finite, even at the end of the calculation. Setting $\epsilon$ to zero here, as Boltzmann had done
at the end of his gas theory calculations, would be equivalent to setting the constant
h to zero, which would lead to an incorrect distribution law (the Rayleigh–Jeans Law,
to be discussed further below). Apparently, the oscillators in the walls of a cavity
were only allowed to have energies in integer multiples of the basic energy “quantum”
$\epsilon = h\nu$! The arguments outlined above were presented in Planck’s paper in Annalen der
Physik 4 (1901), p. 553, “Über das Gesetz der Energieverteilung im Normalspectrum”
(“On the law of energy distribution for the normal [i.e., blackbody] spectrum”). The
famous Planck’s constant h appears here for the first time. In modern notation, the
blackbody distribution thus takes the form

$$\rho(\nu, T) = \frac{8\pi\nu^2}{c^3}\,\frac{h\nu}{\exp(h\nu/kT) - 1} \tag{1.22}$$

From the experimental fits, Planck determined $h = 6.55 \times 10^{-27}$ erg sec, and k (Boltzmann’s
constant) $= 1.346 \times 10^{-16}$ ergs/degree. The latter value allowed Planck to
obtain the first decently accurate value for Avogadro’s number N = R/k (where R
is the gas constant).
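The arithmetic behind the estimate of Avogadro's number is a one-line division. The Python sketch below uses Planck's value of k as quoted above; the gas constant and the comparison values are modern figures supplied here as assumptions (they are not taken from the text, and Planck would have used the value of R available to him).

```python
K_PLANCK_1901 = 1.346e-16   # erg/degree, Planck's value as quoted in the text
H_PLANCK_1901 = 6.55e-27    # erg sec, Planck's value as quoted in the text

R_GAS = 8.314e7             # erg/(mol K): modern gas constant (assumption)
N_AVOGADRO_MODERN = 6.022e23    # per mole: modern value, for comparison only
H_MODERN = 6.626e-27            # erg sec: modern value, for comparison only

n_avogadro_planck = R_GAS / K_PLANCK_1901
relative_error_N = n_avogadro_planck / N_AVOGADRO_MODERN - 1.0
relative_error_h = H_PLANCK_1901 / H_MODERN - 1.0

print(f"N = R/k        : {n_avogadro_planck:.3e} per mole")
print(f"deviation in N : {relative_error_N:+.1%}")
print(f"deviation in h : {relative_error_h:+.1%}")
```

On these assumptions both constants come out within a few percent of their modern values.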
It is a strange historical irony that Planck’s modification of Wien’s Law,
motivated by the pressure of the Kurlbaum–Rubens experimental results, was actually
a move towards a “more classical” result: as Einstein was to emphasize in his epochal
1905 paper (Einstein, 1905a), in which the revolutionary idea of field quantization
was introduced, the Wien Law is in a sense an extreme manifestation of the quantal
properties of light. The deviations observed from this law in the infrared by Kurlbaum
and Rubens are harbingers of the reappearance of the classical wave-like aspects of
electromagnetic phenomena. To understand this we must realize that despite Planck’s
heroic efforts to obtain a rigorous and unique classical result for the distribution func-
tion of cavity radiation throughout the 1890s, leading up to the quantum-theoretically
correct Planck distribution (1.22), the first derivation of the blackbody distribution
based on a consistent and full application of classical principles is actually due to Lord
Rayleigh. In a short (two-page) paper published in 1900 (Rayleigh, 1900) Rayleigh
derived the correct classical form of the distribution function from the classical
equipartition theorem applied directly to the electromagnetic modes in the cavity.
Consider a cubical $L \times L \times L$ box containing electromagnetic radiation in the form of
standing waves. A typical standing wave mode takes the form
$$\sin\left(\frac{n_1\pi x}{L}\right)\sin\left(\frac{n_2\pi y}{L}\right)\sin\left(\frac{n_3\pi z}{L}\right)$$
where the associated frequency is $\nu = \frac{c}{2L}|\vec{n}|$ and $\vec{n}$ is the vector with (positive) integer
Cartesian components $(n_1, n_2, n_3)$. The number of such modes in the shell $(|\vec{n}|, |\vec{n}| + d|\vec{n}|)$
(octant of positive components only! Rayleigh’s omission of this restriction was later
corrected by Jeans, see below) is evidently

$$\frac{1}{8}\,4\pi|\vec{n}|^2\,d|\vec{n}| = 4\pi\frac{L^3}{c^3}\,\nu^2\,d\nu$$
and each of these modes receives a total of $2kT$ at equilibrium by the equipartition
principle (namely, $\frac{1}{2}kT$ each into electric and magnetic field energy, and each of two
polarization modes). Thus the energy per unit volume in the field in the frequency
interval (ν, ν + dν) is
$$\rho(\nu, T)\,d\nu = \frac{1}{L^3}\,2kT\left(4\pi\frac{L^3}{c^3}\,\nu^2\right)d\nu = 8\pi\frac{\nu^2}{c^3}\,kT\,d\nu \tag{1.23}$$
—a result which has since become known as the Rayleigh–Jeans Law. (The error
mentioned above of an overall factor of eight made by Rayleigh in his original paper was
subsequently corrected by Jeans. As Pais points out in his biography of Einstein (Pais,
1982), the correction was made also in Einstein’s 1905 paper on the light quantum, so
the result should perhaps more properly be called the Rayleigh–Jeans–Einstein Law.)
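The mode count in the positive octant can be checked by brute force. The following Python sketch counts integer triples in a shell and compares the result with the continuum estimate $\frac{1}{8}4\pi|\vec{n}|^2 d|\vec{n}|$; the shell radii, box size, and function names are illustrative assumptions. Because a lattice count misses part of the volume near the coordinate planes, agreement is expected only to within a few percent at these modest radii.

```python
import math

def count_modes(n_lo: float, n_hi: float) -> int:
    """Count triples of positive integers with n_lo <= |n| < n_hi."""
    top = int(math.ceil(n_hi))
    lo2, hi2 = n_lo * n_lo, n_hi * n_hi
    count = 0
    for n1 in range(1, top + 1):
        for n2 in range(1, top + 1):
            for n3 in range(1, top + 1):
                if lo2 <= n1 * n1 + n2 * n2 + n3 * n3 < hi2:
                    count += 1
    return count

def shell_estimate(n_mid: float, dn: float) -> float:
    """One octant of a spherical shell: (1/8) * 4*pi*n^2 * dn."""
    return 0.125 * 4.0 * math.pi * n_mid**2 * dn

def frequency_estimate(nu: float, dnu: float, L: float, c: float) -> float:
    """The same count expressed in frequency: 4*pi*(L^3/c^3)*nu^2*dnu."""
    return 4.0 * math.pi * L**3 / c**3 * nu**2 * dnu

n_lo, n_hi = 50.0, 60.0            # hypothetical shell in mode-number space
counted = count_modes(n_lo, n_hi)
estimated = shell_estimate(0.5 * (n_lo + n_hi), n_hi - n_lo)

L, c = 1.0, 2.998e10               # hypothetical 1 cm box, c in cm/s
nu_mid = c / (2.0 * L) * 0.5 * (n_lo + n_hi)
dnu = c / (2.0 * L) * (n_hi - n_lo)
print(counted, estimated, frequency_estimate(nu_mid, dnu, L, c))
```

Dropping the factor $\frac{1}{8}$ would overcount by exactly the factor of eight discussed above.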
Rayleigh was perfectly aware that this result could not be correct: the total energy
contained in the cavity radiation, when integrated over all frequencies, would then be
infinite! Instead, he assumed that it was correct only for the “graver modes” (i.e., lower
frequencies) and that the distribution was modified for some as yet unknown reason
at higher frequencies (Rayleigh simply inserted an exponential suppression factor at
high frequencies, and the resultant formula was in fact his final result). In any event
the simple linear dependence on temperature in the Rayleigh–Jeans Law flies in the
face of experience: a bar of steel at room temperature (300 K, say) does not emit
radiation at one-tenth the blinding intensity of a bar at 3000 K ! The infinite amount
of energy present in the classical radiation field under equipartition would later (1911)
be referred to by Ehrenfest (Ehrenfest, 1911) as the “ultraviolet catastrophe”.
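The steel-bar comparison can be made quantitative with the Planck form (1.22). The sketch below is illustrative Python: the constants are modern CGS values supplied as assumptions, and the choice of $5 \times 10^{14}$ Hz as a representative visible frequency is likewise an assumption.

```python
import math

H = 6.626e-27    # erg sec  (modern value, assumption)
K = 1.381e-16    # erg/K    (modern value, assumption)
C = 2.998e10     # cm/sec   (modern value, assumption)

def rho_rayleigh_jeans(nu: float, T: float) -> float:
    """Equation (1.23): 8*pi*nu^2/c^3 * kT."""
    return 8.0 * math.pi * nu**2 / C**3 * K * T

def rho_planck(nu: float, T: float) -> float:
    """Equation (1.22): 8*pi*nu^2/c^3 * h*nu / (exp(h*nu/kT) - 1)."""
    return 8.0 * math.pi * nu**2 / C**3 * H * nu / math.expm1(H * nu / (K * T))

nu_visible = 5.0e14   # Hz, a representative visible frequency
ratio_rj = rho_rayleigh_jeans(nu_visible, 3000.0) / rho_rayleigh_jeans(nu_visible, 300.0)
ratio_planck = rho_planck(nu_visible, 3000.0) / rho_planck(nu_visible, 300.0)

nu_radio = 1.0e9      # Hz, where h*nu << kT at room temperature
low_freq_agreement = rho_planck(nu_radio, 300.0) / rho_rayleigh_jeans(nu_radio, 300.0)

print(f"Rayleigh-Jeans ratio (3000 K / 300 K): {ratio_rj:.1f}")
print(f"Planck ratio         (3000 K / 300 K): {ratio_planck:.2e}")
print(f"Planck / Rayleigh-Jeans at 1 GHz     : {low_freq_agreement:.6f}")
```

The classical formula predicts only a tenfold difference in visible intensity between the two bars, whereas the Planck formula gives a difference of some thirty orders of magnitude; at low frequencies the two formulas coincide.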
Of course, if Planck had finished his Boltzmannian calculation of the average
oscillator energy by taking the energy units $\epsilon$ to zero, as Boltzmann had done
previously in his discussion of gas theory, he would have arrived precisely at Rayleigh’s
result (though he does not seem to have been aware of Rayleigh’s work during the
critical period leading up to the 1901 paper), as the Rayleigh–Jeans Law is simply the
h → 0 limit of the Planck distribution. That he did not do so is probably due to a
combination of reasons:
1. He does not seem to have regarded equipartition as a fundamental guiding
principle to the same extent as other physicists of a more “mechanist” bent.
2. Planck attacked the problem from the point of view of the behavior of the
oscillators at thermal equilibrium, rather than by directly considering the modes
of the electromagnetic field itself, which would have led much more quickly to
the (wrong!) classical result.
3. The result obtained by setting the energy units to zero would not have agreed
with the Wien Law, with which Planck had started and which he knew to be
empirically correct at higher frequencies.

1.4 First inklings of field quantization: Einstein and energy fluctuations

Although Planck succeeded in obtaining an absolutely correct expression for the
equilibrium thermal frequency distribution of electromagnetic radiation in a cavity,
there is absolutely no indication that he supposed any sort of energy quantization to
hold for the electromagnetic field itself. Instead, the (at this point frankly magical)
effect of the energy quantization imposed on the material oscillators receiving from
and transferring energy to the radiation in the interior was forcibly transferred to
the electromagnetic field via the equilibrium formula (1.15). The field itself, Planck
was to insist for almost another full quarter century, was a continuous, fully classical
entity regulated by Maxwell’s equations. The situation was to change dramatically
with Einstein’s remarkable 1905 paper, “On a heuristic point of view concerning
the creation and conversion of light” (Einstein, 1905a). Although this paper is now
commonly referred to as the “photoelectric paper”, Einstein spends much more time
in it on an analysis of the volume-dependence of blackbody radiation (pp. 92–102)
than on the brief discussion (pp. 104–105) of the photoelectric effect.
After pointing out that a strictly classical analysis must necessarily lead to the
Rayleigh–Jeans result (1.23), with its inescapable concomitant ultraviolet catastrophe,
Einstein goes on to analyse cavity radiation in the high-frequency domain, drawing
some extraordinarily non-classical conclusions from the quintessentially “classical” (at
least from an historical point of view) Wien Law. Einstein’s approach in this paper is
radically different from Planck’s. He focusses first and foremost on the thermodynamic
and statistical properties of the electromagnetic radiation in the interior of the cavity.
Taking a cavity of volume V0 and considering only the electromagnetic radiation in
the frequency interval (ν, ν + dν), the energy E of such radiation in the high-frequency
domain where Wien’s Law (1.14) holds is given by
$$E = \frac{8\pi h\nu^3}{c^3}\,V_0\,e^{-\frac{h\nu}{kT}}\,d\nu \tag{1.24}$$
Solving this equation for $\frac{1}{T}$ and repeating the integration procedure of (1.20) to obtain
an expression for the entropy S of the electromagnetic radiation in this frequency
interval, one finds (the 0 subscript indicates that the radiation in the entire cavity of
volume V0 is being considered—we shall shortly consider radiation in a subcavity)
$$S_0 = -\frac{kE}{h\nu}\left\{\ln\left(\frac{E}{\frac{8\pi h\nu^3 d\nu}{c^3}\,V_0}\right) - 1\right\} \tag{1.25}$$
The same amount of radiation confined to a smaller volume V would lead to an entropy
S with exactly the same form as (1.25) but with V0 replaced with V . Accordingly, the
difference in entropy for the two situations is
$$S - S_0 = \frac{kE}{h\nu}\ln\left(\frac{V}{V_0}\right) = k\ln\left(\frac{V}{V_0}\right)^{E/h\nu} \tag{1.26}$$
The fundamental Boltzmannian association of entropy with the probability W of the
associated microstates of the system, S = k ln W , then leads to the conclusion that
$$W = \left(\frac{V}{V_0}\right)^{E/h\nu} \tag{1.27}$$
i.e., that the probability of an energy fluctuation leading to a concentration of all the
electromagnetic radiation in the frequency interval (ν, ν + dν) in the subvolume V
of the full cavity V0 takes exactly the form which we would expect if that radiation
consisted of $\frac{E}{h\nu}$ “mutually independent energy quanta” (each of energy hν) moving
freely throughout the cavity, in complete analogy to the behavior of molecules in a
gas. This is as far from the classical picture of electromagnetic radiation as extended
waves subject to mutual (destructive and constructive) interference as it is possible to
get. The result (1.26)—extraordinarily simple, but profoundly baffling, from a classical
point of view—clearly had a deep impact on Einstein’s thinking. He was to hold firmly
to the concept of energy (and later momentum) quantization of the electromagnetic
field over the next 20 years—a period of time in which the majority of physicists
were firmly on Planck’s side and resistant to any notion of quantization of the sacred
classical Maxwellian fields.
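
The step from the entropy formula (1.25) to the probability (1.27) can be checked numerically. The following Python sketch is only an illustration: the function name `wien_entropy` and all sample values (frequency, bandwidth, volumes, number of quanta) are assumptions chosen for the example, not quantities taken from the text. It evaluates (1.25) for two volumes and compares the resulting W with the probability that N = E/hν independent particles are all found in the subvolume.

```python
import math

H = 6.62607015e-34   # Planck constant (J s)
K = 1.380649e-23     # Boltzmann constant (J/K)
C = 2.99792458e8     # speed of light (m/s)

def wien_entropy(E, V, nu, dnu):
    """Entropy (1.25) of Wien-regime radiation of energy E in volume V."""
    A = 8 * math.pi * H * nu**3 * V * dnu / C**3
    return -(K * E / (H * nu)) * (math.log(E / A) - 1.0)

# Illustrative values (assumptions, not from the text).
nu, dnu = 6.0e14, 1.0e10      # frequency and bandwidth (Hz)
V0, V = 1.0e-3, 2.5e-4        # full cavity and subvolume (m^3)
N = 20                        # number of quanta, so that E = N h nu
E = N * H * nu

# Entropy difference (1.26), then W from S = k ln W as in (1.27).
delta_S = wien_entropy(E, V, nu, dnu) - wien_entropy(E, V0, nu, dnu)
W_from_entropy = math.exp(delta_S / K)

# Probability that N independent particles all sit in the subvolume V.
W_particles = (V / V0) ** N

print(W_from_entropy, W_particles)
```

The two printed numbers should agree to rounding error, which is the content of Einstein's observation: the entropy of Wien radiation depends on volume exactly as that of an ideal gas of E/hν particles.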
The centrality of blackbody radiation to Einstein’s thinking about the nature of
the electromagnetic field is clear once one reflects on the number of occasions on
which he would return to the subject: to take the most prominent cases, in 1909 in
two papers (Einstein, 1909b,a) (one entitled “On the present status of the radiation
problem”, the other “On the development of our conceptions on the nature and
constitution of radiation”) in which energy fluctuations were once more used as a
diagnostic for exposing the underlying properties of radiation, and in 1917, in the
famous “A-B coefficients” paper (Einstein, 1916, 1917), of critical importance in
the later development of dispersion theory by Kramers, and thereafter in the 1925
development of matrix mechanics at the hands of Heisenberg, Born, and Jordan.5 Here
we briefly review Einstein’s results of 1909, which proved to be a critical inspiration
for Jordan’s introduction in 1925, in the last section of the “Three-Man” paper of
Born, Heisenberg, and Jordan (Born et al., 1926), of the first true quantum field.
In returning to the problem of energy fluctuations in cavity radiation, Einstein
decided to relax the simplifying assumption of high-frequency (or low-density) radia-
tion described by the Wien Law, and to enquire into the implications of the full Planck
distribution (1.16), valid at all densities and frequencies, for the fluctuation properties
of thermal radiation. In this case, instead of considering the highly non-Gaussian
process whereby a fluctuation would concentrate 100% of the radiation energy in
a given interval (ν, ν + dν) in a subvolume V (giving the result (1.27), later to be
called “Einstein’s first fluctuation theorem” by Jordan) Einstein decided to calculate
the mean-square energy fluctuation of the energy in this interval in the subvolume
V . The formula for such mean-square fluctuations is a standard result of statistical
mechanics:

$$\langle(\Delta E)^2\rangle = kT^2\,\frac{d\langle E\rangle}{dT} \tag{1.28}$$

where T is the temperature and ⟨E⟩ the mean energy, which in this case is clearly just
V ρ(ν, T )dν. We can distinguish three interesting choices for the energy distribution
ρ(ν, T ) and corresponding mean-square energy fluctuation. We shall distinguish the
results obtained for the mean-square energy fluctuation in each case by a subscript
indicating the assumed form for the universal Kirchhoff distribution function ρ(ν, T ):
“RJ” for the completely classical Rayleigh–Jeans form, “W” for the Wien Law, and
“P” for the final result of Planck. In the case of the Rayleigh–Jeans Law valid at low
frequencies,

5 For a thorough study of the role played by dispersion theory in the birth of modern quantum mechanics,
see the two-part paper by M. Janssen and the present author (Duncan and Janssen, 2007a,b).

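$$\rho_{RJ} = \frac{8\pi}{c^3}\,\nu^2 kT \tag{1.29}$$

$$\Rightarrow\quad \langle(\Delta E)^2\rangle_{RJ} = \frac{c^3}{8\pi\nu^2}\,\frac{\langle E\rangle_{RJ}^2}{V\,d\nu} \tag{1.30}$$
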
while the Wien case, valid at high frequencies, yields
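$$\rho_{W} = \frac{8\pi h}{c^3}\,\nu^3 e^{-h\nu/kT} \tag{1.31}$$

$$\Rightarrow\quad \langle(\Delta E)^2\rangle_{W} = h\nu\,\langle E\rangle_{W} \tag{1.32}$$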

and, using the Planck distribution formula valid at all frequencies, one obtains instead
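$$\rho_{P} = \frac{8\pi h}{c^3}\,\frac{\nu^3}{e^{h\nu/kT} - 1} \tag{1.33}$$

$$\Rightarrow\quad \langle(\Delta E)^2\rangle_{P} = \frac{c^3}{8\pi\nu^2}\,\frac{\langle E\rangle_{P}^2}{V\,d\nu} + h\nu\,\langle E\rangle_{P} \tag{1.34}$$
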
The energy fluctuation result (1.30) for the classical Rayleigh–Jeans regime would
be reproduced by Lorentz (Lorentz, 1916) a few years later independently of thermo-
dynamic considerations by considering the fluctuations of energy of electromagnetic
radiation due to constructive and destructive interference in a subvolume (with an
assumption of random phases of the component waves). The fact that the mean-square
energy fluctuation comes out to be proportional to the square of the mean energy can
be considered to be the characteristic feature of classical waves in this context.
The linear energy dependence of the Wien result for the squared energy fluctuation
is on the other hand immediately suggestive of quantization, as we would expect a
fluctuation of √N from N particles of energy hν to lead to a mean-square energy
fluctuation (√N hν)² = hν · N hν = hνE. The full Planck result (1.34), however,
leads to a mean-square fluctuation which appears to be the purely additive result
of wave and particle contributions.6 In 1909 Einstein was to interpret this remarkable
result (later dubbed by Jordan “Einstein’s second fluctuation theorem”, to distinguish
it from (1.27), the “first fluctuation theorem”) as evidence for two statistically—and
therefore structurally—independent causes for energy fluctuation, and insisted in a
lecture at the Salzburg Naturforscherversammlung (1909) that “the next phase of
the development of theoretical physics will bring us a theory of light that can be
interpreted as a kind of fusion of the wave and emission (i.e., particle) theories”. In
the next section we shall see that Pascual Jordan, in his introduction of the first
quantum field, was to establish conclusively that two separate physical mechanisms
for energy fluctuation are not necessary: rather, once the kinematic demands of the
new quantum theory are properly implemented in the description of the modes of the
electromagnetic field, the result (1.34) emerges precisely and naturally from a unified
dynamical framework.
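
The three fluctuation formulas (1.30), (1.32), and (1.34) all follow from inserting the corresponding distribution into (1.28). The Python sketch below is a numerical cross-check under stated assumptions, not a derivation: the function names and the sample values of ν, T, V, and dν are chosen purely for illustration, and the temperature derivative in (1.28) is approximated by a central difference.

```python
import math

H = 6.62607015e-34   # Planck constant (J s)
K = 1.380649e-23     # Boltzmann constant (J/K)
C = 2.99792458e8     # speed of light (m/s)

def rho_rj(nu, T):
    """Rayleigh-Jeans density (1.29)."""
    return 8 * math.pi * nu**2 * K * T / C**3

def rho_wien(nu, T):
    """Wien density (1.31)."""
    return 8 * math.pi * H * nu**3 / C**3 * math.exp(-H * nu / (K * T))

def rho_planck(nu, T):
    """Planck density (1.33)."""
    return 8 * math.pi * H * nu**3 / C**3 / math.expm1(H * nu / (K * T))

def mean_energy(rho, nu, T, V, dnu):
    """Mean energy V rho(nu, T) dnu in the interval (nu, nu + dnu)."""
    return V * rho(nu, T) * dnu

def fluctuation(rho, nu, T, V, dnu, rel_step=1e-5):
    """Mean-square fluctuation k T^2 d<E>/dT of (1.28), via a central difference."""
    dT = rel_step * T
    dE_dT = (mean_energy(rho, nu, T + dT, V, dnu)
             - mean_energy(rho, nu, T - dT, V, dnu)) / (2 * dT)
    return K * T**2 * dE_dT

# Illustrative values (assumptions): frequency, temperature, volume, bandwidth.
nu, T, V, dnu = 1.0e14, 1000.0, 1.0e-6, 1.0e9
modes = 8 * math.pi * nu**2 * V * dnu / C**3   # number of modes in (nu, nu + dnu)

E_rj = mean_energy(rho_rj, nu, T, V, dnu)
E_w = mean_energy(rho_wien, nu, T, V, dnu)
E_p = mean_energy(rho_planck, nu, T, V, dnu)

# Pairs: (value from (1.28), closed form from (1.30), (1.32), (1.34)).
results = {
    "RJ": (fluctuation(rho_rj, nu, T, V, dnu), E_rj**2 / modes),
    "W": (fluctuation(rho_wien, nu, T, V, dnu), H * nu * E_w),
    "P": (fluctuation(rho_planck, nu, T, V, dnu), E_p**2 / modes + H * nu * E_p),
}
for label, (from_128, closed_form) in results.items():
    print(label, from_128, closed_form)

# Ratio of the wave term to the particle term in the Planck case.
wave_to_particle = (E_p**2 / modes) / (H * nu * E_p)
print("wave/particle ratio:", wave_to_particle)
```

With these sample values hν/kT is close to 4.8, so the run sits in the Wien regime: the ratio of the wave term to the particle term equals the mean occupation number 1/(e^{hν/kT} − 1), which is small there.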

6 Note, however, that in the Wien regime of large ν, the first term on the right-hand side of (1.34) is
exponentially smaller than the second, agreeing with Einstein’s 1905 assertion of purely particle behavior
in this limit.

1.5 The first true quantum field: Jordan and energy fluctuations
The evolution of understanding of quantum physics in the twenty years between
Einstein’s introduction of light quanta and the development of the modern for-
mal structure of quantum mechanics initiated by Heisenberg’s famous Umdeutung
(=“Reinterpretation”) paper of 1925 (Heisenberg, 1925) is a fascinating and com-
plex story of frequent frustration punctuated by occasional leaps of understanding
leading to yet further frustration. The final stages of this development, a fusion of
correspondence principle arguments with Einstein’s radiation theory (the “A and
B coefficients”) of 1917 leading to the quantum dispersion theory of Kramers, and
thence to Heisenberg’s reinterpretation of the kinematics of electrons in terms of
non-commuting quantities, have been described many times (and in great detail
in some recent work of M. Janssen and the present author (Duncan and Janssen,
2007a,b)). Here, we shall assume that the reader is familiar with the basic principles
of quantum mechanics and focus our attention on the developments directly related
to the quantization of fields—specifically, of the electromagnetic field, as this was the
classical field of immediate phenomenological importance at the time, given the need
to understand the interactions of atomic systems with light quanta.
The immediate successor to Heisenberg’s Umdeutung paper—the work in which
Heisenberg, without any reliance on the mathematics of matrix algebra, introduced
the basic ideas of matrix mechanics—is the “Two-Man” paper of Born and Jordan
“On Quantum Mechanics” (Born and Jordan, 1925), written in the late summer of
1925 as a formal amplification of the Heisenberg approach. In this paper, Born and
Jordan derive the commutation relations of coordinates and momenta that we now
regard as the fundamental defining characteristic of quantum phenomena. The first
part of the paper clarifies, using explicit matrix methods, many of the “magical”
results obtained by Heisenberg, but it is in the fourth chapter (entitled “Observations
on electrodynamics”) that the subject of quantization of fields is raised for the first
time in the context of the new mechanics. The way in which this idea is introduced is
of some relevance to the explicit calculations to follow in the “Three-Man” paper of
Born, Heisenberg, and Jordan, and deserves an extensive quote:
A cavity with electromagnetic oscillations constitutes a system of infinitely many degrees of
freedom. Nevertheless, the basic principles developed in the preceding sections, which admittedly
only concern systems of a single degree of freedom, are sufficient to handle this case as well, given
that it goes over to a system of uncoupled oscillators once analyzed in terms of eigenmodes. There
is hardly any possible doubt, how such a system is to be treated (our emphasis). In particular, the
circumstance that the basic equations of electromagnetism are linear is of importance, for it then
follows that the virtual oscillators (eigenmodes) are harmonic, and it is precisely for harmonic
oscillators, in contradistinction to other systems, that the validity of energy conservation is
independent of the quantum condition.

In other words, after analysing (essentially by Fourier transformation) the electromagnetic
field in terms of eigenmodes of specific frequencies, one is led to an
infinite set of uncoupled harmonic oscillators, each of which can then be subjected
to quantization (as a system of a single degree of freedom) by the methods outlined
earlier. Moreover, for such harmonic systems, it turns out that energy conservation
can be established without explicit use of the quantum condition [q, p] = iℏ: although
non-commuting quantities appear in the expression for the energy H, they do not
have to be interchanged (requiring the use of the commutation relation of p and q)
in the proof of energy conservation Ḣ = 0. The rest of chapter 4 of the Born–Jordan
paper7 is then dedicated to a demonstration of energy and momentum conservation in
linear electrodynamics along these lines. As the results do not at any point involve the
quantum condition (or Planck’s constant!) one is left with the impression that if this
is indeed a calculation in quantum field theory (which in some sense it surely is), it is
a peculiarly stillborn product, lacking any really characteristically quantum features.
This defect would be dramatically, and spectacularly, remedied in the final section of
the follow-up paper (the famous Drei-Männer-Arbeit of Born, Heisenberg, and Jordan
(Born et al., 1926)), where a truly quantum-field-theoretic calculation is employed to
resolve the conundrum of the wave–particle duality of light first raised by Einstein’s
results of 1909 discussed in the previous section.
The Drei-Männer-Arbeit of Born, Heisenberg, and Jordan (entitled “On Quantum
Mechanics: II”) contains a thorough discussion of the fundamental principles of
matrix mechanics: the kinematical quantities of classical mechanics are subjected to
a reinterpretation in which they are replaced by non-commuting matrices (necessarily
infinite) subject to the quantum condition specifying the commutator of canonically
conjugate coordinates and momenta. The final chapter 4 of this work, devoted to
physical applications, consists of three sections. In the first section the commutation
relations for angular momentum are derived and selection rules for angular momentum
discussed. The second section contains a short (and given the lack of understanding of
electron spin at this time, not very productive) discussion of the Zeeman effect. In the
final section of this long paper, entitled “Coupled harmonic resonators: Statistics of
wavefields”, the authors come to grips with the problem of energy fluctuations in the
electromagnetic field. In particular, the aim is to show that Einstein’s baffling result
(1.34) giving the mean-square energy fluctuation in a subvolume of cavity radiation as
a sum of wave and particle terms is a natural and inescapable consequence of subjecting
the harmonic eigenmodes of the electromagnetic field to the same kinematic shift
employed in the quantization of a single harmonic oscillator: i.e., the corresponding
momentum p and coordinate q variables are replaced by matrices satisfying [q, p] = iℏ.
In contrast to the procedure of Planck, in which only the material oscillators in the wall
were quantized systems, all physical entities, matter and radiation in equal measure,
were now to be subjected to a quantization treatment.
It is commonly agreed that the material in chapter 4, section 3 of the Drei-Männer-
Arbeit (henceforth referred to as the “3M paper”) is entirely the work of Jordan:
indeed, his two co-authors were later, for differing reasons, to disavow their involvement
and even to criticize the validity of the result (see (Duncan and Janssen, 2008) for
a detailed discussion of the historical background, as well as a careful mathematical
reconstruction of the argument). With one exception, to be pointed out below, the
criticisms are unwarranted: Jordan’s calculation is technically correct and provides a

7 It is best to draw the veil of charity over the attempts made at the very end to derive Heisenberg’s
connection between matrix elements of the electron coordinate and the transition amplitude for atomic
transitions. This connection would first be properly elucidated in Dirac’s seminal work of 1927, to be
discussed in Chapter 2.

true insight into the physics of wave–particle duality. We will present a brief summary
of Jordan’s argument in the following (for more details, see the work cited above).
The problem of interpretation of “Einstein’s second fluctuation theorem” (1.34)
(as Jordan termed Einstein’s 1909 result) involving wave and particle terms had
been addressed by Ehrenfest just prior to the 3M paper in a paper (Ehrenfest, 1925)
which is of interest to us only in one aspect: Ehrenfest introduces a one-dimensional
model of cavity radiation which leads to a considerable technical simplification in the
calculation of energy fluctuations. Unfortunately (for Ehrenfest), the paper precedes
the Umdeutung paper of Heisenberg, so the non-commutativity of the eigenmode
variables is unrecognized, and incorrect results necessarily follow in the quantum case.
Still, the model of Ehrenfest was exactly the technical tool Jordan used to carry
through a correct, post-Umdeutung calculation of the energy fluctuations in cavity
radiation. The model of Ehrenfest which Jordan uses imagines a string of length l,
fixed at both ends, and of constant elasticity and constant mass density. This is simply
a one-dimensional analog of an electromagnetic field where the fixing of the string at
the ends corresponds to an electric field component forced to vanish at the conducting
sides of a box. The displacement of the string at location x (with 0 ≤ x ≤ l) and time
t is denoted u(x, t). The wave equation for the string (the analog of the free Maxwell
equations for this simple model) is then

$$\frac{\partial^2 u}{\partial t^2} - \frac{\partial^2 u}{\partial x^2} = 0 \tag{1.35}$$
Note that the velocity of propagation is set to unity here. The boundary conditions
u(0, t) = u(l, t) = 0 for all times t express that the string is fixed at both ends. The
general solution of this problem can be written as a Fourier series

$$u(x,t) = \sum_{k=1}^{\infty} q_k(t)\sin(\omega_k x), \qquad \omega_k \equiv \frac{k\pi}{l} \tag{1.36}$$

$$q_k(t) = a_k\cos(\omega_k t + \varphi_k). \tag{1.37}$$

The classical Hamiltonian for a one-dimensional string is the well-known expression

$$H = \frac{1}{2}\int_0^l dx\,\left(\dot{u}^2 + u_x^2\right) \tag{1.38}$$

where the dot indicates a time-derivative and the subscript x a partial derivative with
respect to x. The terms u̇² and u_x² are the analogs of the densities of the electric and
the magnetic field energy, respectively, in this simple model of blackbody radiation.
Inserting (1.36) for u(x, t) in (1.38), one finds

$$H = \frac{1}{2}\int_0^l dx \sum_{j,k=1}^{\infty}\Big(\dot{q}_j(t)\dot{q}_k(t)\sin(\omega_j x)\sin(\omega_k x) + \omega_j\omega_k\, q_j(t)q_k(t)\cos(\omega_j x)\cos(\omega_k x)\Big) \tag{1.39}$$


The functions {sin (ωk x)}k in (1.36) are orthogonal on the interval (0, l), i.e.,

$$\int_0^l dx\,\sin(\omega_j x)\sin(\omega_k x) = \frac{l}{2}\,\delta_{jk} \tag{1.40}$$

The same is true for the functions {cos (ωk x)}k . It follows that the integral in (1.39)
only gives contributions for j = k. The double sum thus turns into the single sum:

$$H = \sum_{j=1}^{\infty}\frac{l}{4}\left(\dot{q}_j^2(t) + \omega_j^2 q_j^2(t)\right) = \sum_{j=1}^{\infty} H_j \tag{1.41}$$

This expression shows that the vibrating string can be replaced by an infinite
number of uncoupled oscillators, one for every mode of the string, just as described
in the extended quote from chapter 4 of the Born–Jordan paper given previously.
Moreover, the distribution of the energy over the frequencies of these oscillators
is constant in time. Since there is no coupling between the oscillators, there is no
mechanism for transferring energy from one mode to another. The spatial distribution
of the energy in a given frequency range over the length of the string, however, varies
in time. In analogy to Einstein’s considerations of 1909, Jordan now sets out to study
the fluctuations of the energy in a narrow frequency interval (ω, ω + Δω) in a small
segment of the string, namely the region 0 ≤ x ≤ a, a ≪ l. The total energy in that
frequency range will be constant but the fraction located in that small segment will
fluctuate. Jordan derived an expression for the mean-square energy fluctuation of this
energy, first in classical theory, then in matrix mechanics. Here we shall abbreviate the
discussion by going directly to the quantum mechanical case. Accordingly, quantities
like q(t), q̇(t) in the forthcoming equations must be considered to be non-commuting
matrices. However, the time-development of these matrices involves exactly the usual
periodic functions as in the classical case.
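
The decoupling in (1.41) rests on the orthogonality relation (1.40), which holds on the full interval (0, l) but not on a shorter segment (0, a); that failure is what allows the energy in the segment to fluctuate. A minimal Python sketch of this point follows. The function name `overlap`, the use of composite Simpson quadrature, and the sample values of l, a, j, and k are all assumptions made for illustration.

```python
import math

def overlap(j, k, upper, l, steps=20000):
    """Composite Simpson estimate of the integral of sin(w_j x) sin(w_k x) on (0, upper)."""
    w_j, w_k = j * math.pi / l, k * math.pi / l
    h = upper / steps                      # steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * math.sin(w_j * x) * math.sin(w_k * x)
    return total * h / 3

l, a = 1.0, 0.1          # string length and short segment (illustrative values)
j, k = 7, 8              # two neighbouring modes

full_same = overlap(j, j, l, l)    # should be close to l/2, cf. (1.40)
full_cross = overlap(j, k, l, l)   # should be close to 0, cf. (1.40)
part_cross = overlap(j, k, a, l)   # does not vanish on the segment (0, a)
print(full_same, full_cross, part_cross)
```

The non-vanishing cross overlap on (0, a) is the origin of the mixed terms that carry the energy fluctuation in the segment.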
Changing the upper boundary of the integral in (1.39) from l to a (a ≪ l) and
restricting the sums over j to correspond to a narrow angular frequency range (ω, ω +
Δω) (i.e., ω < j(π/l) < ω + Δω and ω < k(π/l) < ω + Δω), we find the instantaneous
energy in that frequency range in a small segment (0, a) ⊂ (0, l) of the string, here
denoted E(a,ω):

$$E_{(a,\omega)}(t) = \frac{1}{2}\int_0^a dx \sum_{j,k}\Big(\dot{q}_j(t)\dot{q}_k(t)\sin(\omega_j x)\sin(\omega_k x) + \omega_j\omega_k\, q_j(t)q_k(t)\cos(\omega_j x)\cos(\omega_k x)\Big) \tag{1.42}$$

The functions {sin (ωk x)}k and the functions {cos (ωk x)}k are not orthogonal on
the interval (0, a), so both terms with j = k and terms with j ≠ k will contribute to
the instantaneous energy E(a,ω) (t) in (1.42). First consider the (j = k) terms. On the
assumption that a is large enough for the integrals over sin² (ωj x) and cos² (ωj x) to
range over many periods corresponding to ωj , these terms are given by

$$E^{(j=k)}_{(a,\omega)}(t) \approx \frac{a}{4}\sum_j\left(\dot{q}_j^2(t) + \omega_j^2 q_j^2(t)\right) = \frac{a}{l}\sum_j H_j(t). \tag{1.43}$$

Since we are dealing with a system of uncoupled oscillators, the energy of the individual
oscillators is constant, even at the quantum level (recall the emphasis on this point in
the Born–Jordan paper). Since all terms Hj (t) are constant, $E^{(j=k)}_{(a,\omega)}(t)$ is constant too
and equal to its time average:

$$E^{(j=k)}_{(a,\omega)}(t) = \overline{E^{(j=k)}_{(a,\omega)}(t)}. \tag{1.44}$$

Since the time averages $\overline{\dot{q}_j(t)\dot{q}_k(t)}$ and $\overline{q_j(t)q_k(t)}$ vanish for j ≠ k, the (j ≠ k) terms
in (1.42) do not contribute to its time average:

$$\overline{E^{(j\neq k)}_{(a,\omega)}(t)} = 0. \tag{1.45}$$

The time average of (1.42) is thus given by the (j = k) terms:


$$\overline{E_{(a,\omega)}(t)} = E^{(j=k)}_{(a,\omega)}(t). \tag{1.46}$$

From (1.46) it follows that the (j ≠ k) terms in (1.42) give the instantaneous deviation
ΔE(a,ω) (t) of the energy in this frequency range in the segment (0, a) of the string
from its mean (time average) value:

$$\Delta E_{(a,\omega)}(t) \equiv E_{(a,\omega)}(t) - \overline{E_{(a,\omega)}(t)} = E^{(j\neq k)}_{(a,\omega)}(t). \tag{1.47}$$

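We now integrate the (j ≠ k) terms in (1.42) to find ΔE(a,ω) . From now on, we suppress
the explicit display of the time-dependence of ΔE(a,ω) , qj and q̇j .

$$\begin{aligned}
\Delta E_{(a,\omega)} &= \frac{1}{4}\int_0^a dx \sum_{j\neq k}\Big(\dot{q}_j\dot{q}_k\,\big[\cos((\omega_j-\omega_k)x) - \cos((\omega_j+\omega_k)x)\big] \\
&\qquad\qquad + \omega_j\omega_k\, q_j q_k\,\big[\cos((\omega_j-\omega_k)x) + \cos((\omega_j+\omega_k)x)\big]\Big) \\
&= \frac{1}{4}\sum_{j\neq k}\bigg(\dot{q}_j\dot{q}_k\left[\frac{\sin((\omega_j-\omega_k)a)}{\omega_j-\omega_k} - \frac{\sin((\omega_j+\omega_k)a)}{\omega_j+\omega_k}\right] \\
&\qquad\qquad + \omega_j\omega_k\, q_j q_k\left[\frac{\sin((\omega_j-\omega_k)a)}{\omega_j-\omega_k} + \frac{\sin((\omega_j+\omega_k)a)}{\omega_j+\omega_k}\right]\bigg).
\end{aligned} \tag{1.48}$$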

Defining the expressions within square brackets as (cf. 3M paper, ch. 4, Eq. (45 ))

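$$\begin{aligned}
K_{jk} &\equiv \frac{\sin((\omega_j-\omega_k)a)}{\omega_j-\omega_k} - \frac{\sin((\omega_j+\omega_k)a)}{\omega_j+\omega_k}, \\
K'_{jk} &\equiv \frac{\sin((\omega_j-\omega_k)a)}{\omega_j-\omega_k} + \frac{\sin((\omega_j+\omega_k)a)}{\omega_j+\omega_k},
\end{aligned} \tag{1.49}$$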

we can write this as (cf. ch. 4, Eq. (45)):

$$\Delta E_{(a,\omega)} = \frac{1}{4}\sum_{j\neq k}\left(\dot{q}_j\dot{q}_k\, K_{jk} + \omega_j\omega_k\, q_j q_k\, K'_{jk}\right). \tag{1.50}$$

Recalling that we are working in a small frequency interval, ωj − ωk ≪ ωj + ωk , it
turns out that it is permissible to neglect the second term in the expressions for $K_{jk}$
and $K'_{jk}$ above, so we set $K_{jk} = K'_{jk}$ and obtain

$$\Delta E_{(a,\omega)} = \frac{1}{4}\sum_{j\neq k} K_{jk}\left(\dot{q}_j\dot{q}_k + \omega_j\omega_k\, q_j q_k\right) \tag{1.51}$$

where the coefficients Kjk now mean

$$K_{jk} \equiv \frac{\sin((\omega_j-\omega_k)a)}{\omega_j-\omega_k} \tag{1.52}$$
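
The size of the term dropped in passing from (1.49) to (1.52) can be estimated numerically. The Python sketch below is an illustration under assumed values: the lengths l and a, the band of mode numbers, and the helper names are hypothetical choices, picked so that a is much smaller than l while still spanning many wavelengths.

```python
import math

l, a = 1000.0, 10.0                 # string and segment lengths, a << l (illustrative)
band = range(20000, 20011)          # mode numbers in a narrow band; a * omega is about 628

def omega(j):
    """Mode frequency omega_j = j pi / l, as in (1.36)."""
    return j * math.pi / l

def difference_term(j, k):
    """First term of (1.49): sin((w_j - w_k) a) / (w_j - w_k)."""
    d = omega(j) - omega(k)
    return math.sin(d * a) / d

def sum_term(j, k):
    """Second term of (1.49): sin((w_j + w_k) a) / (w_j + w_k)."""
    s = omega(j) + omega(k)
    return math.sin(s * a) / s

pairs = [(j, k) for j in band for k in band if j != k]
smallest_kept = min(abs(difference_term(j, k)) for j, k in pairs)
largest_dropped = max(abs(sum_term(j, k)) for j, k in pairs)
print(smallest_kept, largest_dropped, largest_dropped / smallest_kept)
```

For this band the retained term is of order a, while the neglected term is bounded by 1/(ωj + ωk), so their ratio is of order 1/(aω).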

From (1.41) above it is apparent that the individual oscillators are formally identical
to point particles of mass m = l/2. The subsequent calculations can be considerably
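simplified by introducing the now familiar8 raising and lowering operators a†j (t), aj (t)

$$\begin{aligned}
a_j(t) &= \sqrt{\frac{l\omega_j}{4\hbar}}\; q_j(t) + i\,\frac{1}{\sqrt{l\hbar\omega_j}}\; p_j(t) = a_j(0)\,e^{-i\omega_j t} \\
a_j^\dagger(t) &= \sqrt{\frac{l\omega_j}{4\hbar}}\; q_j(t) - i\,\frac{1}{\sqrt{l\hbar\omega_j}}\; p_j(t) = a_j^\dagger(0)\,e^{i\omega_j t}
\end{aligned} \tag{1.53}$$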
satisfying

[aj (t), a†k (t)] = δjk (1.54)

In fact, the calculations of the 3M paper involve only the pj and qj matrices, and their
commutation relation, and are mathematically perfectly equivalent to results obtained
with the linear combinations defined in (1.53). The introduction of operators which
raise or lower the excitation level of the individual eigenmodes will become central
in our later development of the modern formalism of quantum field theory. To the
extent that the excitation levels {nj } are identified (as they clearly are in the 3M
paper) with the number of light quanta (i.e., photons, in modern terminology) with
frequency ωj , operators raising and lowering these levels are clearly identifiable as
the particle creation and destruction operators of modern field theory. Later, in our
systematic development of field theory, they will turn out to be the technical tool
ideally suited to the introduction of physically sensible local interactions as well as
dealing effortlessly with the statistics of properly symmetrized multi-particle states.
Here they are introduced simply in order to allow us to write the expression for the
energy fluctuation in a maximally compact fashion. We remind the reader (Baym,

8 See (Baym, 1990) for a discussion of this now standard method for solving the quantized harmonic
oscillator.

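1990) that the effect of these operators for a single mode (the jth, say) on an eigenstate
|nj⟩ of Hj is (at time 0)

$$a_j(0)\,|n_j\rangle = \sqrt{n_j}\,|n_j - 1\rangle \tag{1.55}$$

$$a_j^\dagger(0)\,|n_j\rangle = \sqrt{n_j + 1}\,|n_j + 1\rangle \tag{1.56}$$

$$a_j^\dagger(0)\,a_j(0)\,|n_j\rangle = n_j\,|n_j\rangle \tag{1.57}$$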

In terms of the aj and a†j operators the instantaneous energy fluctuation takes the
very simple form:

$$\Delta E_{(a,\omega)} = \frac{\hbar}{l}\sum_{j\neq k} K_{jk}\sqrt{\omega_j\omega_k}\; a_j^\dagger(t)\,a_k(t) \tag{1.58}$$

In other words, the operator (or “matrix”, in the language of the 3M paper) represent-
ing energy fluctuations is simply a sum of terms each of which takes a single photon
in a given energy level and transfers it to a different energy level. What could be more
natural?
The question now arises concerning in which state to evaluate the squared energy
fluctuation $\Delta E^2_{(a,\omega)}$. The 1909 calculations of Einstein refer to cavity radiation in
thermal equilibrium at a specified temperature T —i.e., to the evaluation of the mean-
square fluctuation in a canonical thermal ensemble of states—but a careful perusal
of the Jordan calculations of the 3M paper shows that the temperature never enters!
Instead, Jordan calculates the quantum dispersion of the energy in a single, pure state
of the field |{nl}⟩, characterized by specifying all excitation levels nl , l = 1, 2, 3, ... of
the field (recall (1.41)):

$$H\,|\{n_l\}\rangle = \sum_j \left(n_j + \tfrac{1}{2}\right)\hbar\omega_j\,|\{n_l\}\rangle \tag{1.59}$$

The expectation value of $\Delta E^2_{(a,\omega)}$ in this eigenstate of the full energy operator H is
necessarily time-independent, so the time-averaging is moot. One has simply

$$\langle\{n_l\}|\,\Delta E^2_{(a,\omega)}\,|\{n_l\}\rangle = \frac{\hbar^2}{l^2}\sum_{j\neq k,\; j'\neq k'} K_{jk}K_{j'k'}\sqrt{\omega_j\omega_k\omega_{j'}\omega_{k'}}\;\langle\{n_l\}|\,a_j^\dagger a_k a_{j'}^\dagger a_{k'}\,|\{n_l\}\rangle \tag{1.60}$$

The diagonal matrix element in (1.60) only receives non-vanishing contributions when
the indices j, k, j′, k′ satisfy j = k′ ≠ k = j′, i.e., when the photon destroyed in mode
k′ (by $a_{k'}$) and the photon created into the different mode j′ (by $a^\dagger_{j'}$) are replaced in
mode j = k′ (by $a^\dagger_j$) and removed in mode k = j′ (by $a_k$). Thus we obtain
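
$$\begin{aligned}
\langle\{n_l\}|\,\Delta E^2_{(a,\omega)}\,|\{n_l\}\rangle &= \frac{\hbar^2}{l^2}\sum_{j\neq k} K_{jk}^2\,\omega_j\omega_k\,\langle\{n_l\}|\,a_j^\dagger a_k a_k^\dagger a_j\,|\{n_l\}\rangle &\quad (1.61) \\
&= \frac{\hbar^2}{l^2}\sum_{j\neq k} K_{jk}^2\,\omega_j\omega_k\,\langle\{n_l\}|\,a_j^\dagger a_j\,(a_k^\dagger a_k + 1)\,|\{n_l\}\rangle &\quad (1.62) \\
&= \frac{\hbar^2}{l^2}\sum_{j\neq k} K_{jk}^2\, n_j(n_k + 1)\,\omega_j\omega_k &\quad (1.63)
\end{aligned}$$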

where in going from (1.61) to (1.62) we have used the commutation relation (1.54),
and used the fact that the operator a†j aj has eigenvalue nj (i.e., the excitation level)
for the jth mode. At this point a modern derivation of the thermal fluctuation would
immediately perform a canonical ensemble average (by multiplying by the Boltzmann
weight $e^{-\beta\sum_j n_j\hbar\omega_j}$ and summing over excitation levels to obtain the weighted average
of the quantity in (1.63)). As the double sum is over non-identical indices j ≠ k, the
thermal averages factorize and the result is to replace the fixed occupation number nj
(resp. nk ) by the Planck mean occupation number of that mode $\bar{n}_j = 1/(e^{\beta\hbar\omega_j} - 1)$ (resp.
n̄k ). This is manifestly a smooth function of the discrete index j, a condition required
by the next step Jordan takes in simplifying (1.63): namely, the (implicit) assumption
that the dependence of the summand is sufficiently smooth that we can, with negligible
error, replace the double sum with a double frequency integral. It should be emphasized
once again that Jordan does not perform a thermal average in the 3M paper: rather,
his derivation must be considered as valid for the pure state quantum dispersion,
assuming that the given pure state has a photon occupation number dependence which
is adequately smooth with respect to the variation of the discrete mode index over the
finite interval under consideration to allow the replacement of sums by integrals. The
further simplification of (1.63) therefore proceeds by the replacements

$$\sum_j \;\to\; \frac{l}{\pi}\int d\omega \tag{1.64}$$

$$\sum_k \;\to\; \frac{l}{\pi}\int d\omega' \tag{1.65}$$

where ω = jπ/l, ω′ = kπ/l.
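
The pure-state result (1.63) can be verified directly by letting the fluctuation operator (1.58) act twice on a state of definite occupation numbers. The Python sketch below is a small illustration, not Jordan's own procedure: it represents a state as a dictionary from occupation tuples to amplitudes, uses units with ℏ = 1, takes the operators at time 0 (the phases cancel in the diagonal expectation value), and uses four hypothetical modes with arbitrarily chosen occupation numbers. All names and values are assumptions.

```python
import math
from collections import defaultdict

HBAR = 1.0                      # units with hbar = 1 (assumption for illustration)
l, a = 50.0, 5.0                # string and segment lengths (illustrative)
modes = [100, 101, 102, 103]    # mode numbers j, with omega_j = j * pi / l
occupation = (3, 0, 2, 5)       # occupation numbers n_j of the pure state

omega = [j * math.pi / l for j in modes]

def K(p, q):
    """Coefficient (1.52) for distinct mode positions p and q."""
    d = omega[p] - omega[q]
    return math.sin(d * a) / d

def lower(state, q):
    """Apply the lowering operator of mode q, cf. (1.55)."""
    out = defaultdict(float)
    for occ, amp in state.items():
        if occ[q] > 0:
            new = occ[:q] + (occ[q] - 1,) + occ[q + 1:]
            out[new] += amp * math.sqrt(occ[q])
    return out

def raise_(state, p):
    """Apply the raising operator of mode p, cf. (1.56)."""
    out = defaultdict(float)
    for occ, amp in state.items():
        new = occ[:p] + (occ[p] + 1,) + occ[p + 1:]
        out[new] += amp * math.sqrt(occ[p] + 1)
    return out

def apply_delta_E(state):
    """Apply the fluctuation operator (1.58), a sum over pairs of distinct modes."""
    out = defaultdict(float)
    size = len(modes)
    for p in range(size):
        for q in range(size):
            if p == q:
                continue
            coeff = (HBAR / l) * K(p, q) * math.sqrt(omega[p] * omega[q])
            for occ, amp in raise_(lower(state, q), p).items():
                out[occ] += coeff * amp
    return out

# Diagonal matrix element of the squared operator, as in (1.60).
start = {occupation: 1.0}
dispersion_operator = apply_delta_E(apply_delta_E(start)).get(occupation, 0.0)

# Closed form (1.63).
dispersion_formula = (HBAR / l) ** 2 * sum(
    K(p, q) ** 2 * occupation[p] * (occupation[q] + 1) * omega[p] * omega[q]
    for p in range(len(modes)) for q in range(len(modes)) if p != q)

print(dispersion_operator, dispersion_formula)
```

The empty mode in the example (occupation 0) still contributes through the factor (n_k + 1): a photon can be moved into it and back, even though none can be taken out of it first.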

Resuming our calculation, we first note that if a is very large compared to the
wavelengths associated with the frequencies in the narrow range (ω, ω + Δω), we
can set9

$$\int d\omega'\, f(\omega')\,\frac{\sin^2((\omega-\omega')a)}{(\omega-\omega')^2} = \int d\omega'\, f(\omega')\,\pi a\,\delta(\omega-\omega') = \pi a f(\omega) \tag{1.66}$$

9 Here we use the fact that the integrand becomes highly peaked for a → ∞, with total weight determined
by the definite integral $\int_{-\infty}^{+\infty} \frac{\sin^2 x}{x^2}\,dx = \pi$.

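where δ(x) is the Dirac δ-function and f (x) is an arbitrary function. As pointed out
previously, the sine function in (1.66) is the dominant part of the $K_{jk}^2$ factor in (1.63)
in a narrow frequency interval, so we finally obtain, with the indicated translations
from sums to integrals,

$$\begin{aligned}
\langle\{n_l\}|\,\Delta E^2_{(a,\omega)}\,|\{n_l\}\rangle &= \frac{\hbar^2}{\pi^2}\int d\omega \int d\omega'\, \pi a\,\delta(\omega-\omega')\, n(\omega)\,(n(\omega') + 1)\,\omega\omega' \\
&= \frac{a}{\pi}\int_{\omega}^{\omega+\Delta\omega}\left(n(\omega)^2 + n(\omega)\right)\hbar^2\omega^2\, d\omega \\
&\simeq \frac{a}{\pi}\left(n(\omega)^2 + n(\omega)\right)\hbar^2\omega^2\,\Delta\omega
\end{aligned} \tag{1.67}$$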
We once again (and for the last time!) re-emphasize that the smooth variation of the
occupation numbers nj is a precondition for the validity of the result (1.67). This
smoothness is, of course, guaranteed once the thermal average is performed to replace
fixed occupation numbers nj by their thermal averages n̄j , as the latter are just the
smooth Planck function $1/(e^{\beta\hbar\omega_j} - 1)$.
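
The combined effect of the replacements (1.64)–(1.65) and the δ-function approximation (1.66) is that the sum of $K_{jk}^2$ over modes k ≠ j is approximately (l/π)·πa = la. The Python sketch below checks this for one set of assumed values; the lengths l and a and the window half-width M are illustrative choices, and a finite window is used in place of an infinite sum.

```python
import math

l, a = 1000.0, 10.0      # string length and segment length, a << l (illustrative)
M = 5000                 # half-width, in mode number, of the window summed over

# Sum of K_jk^2 from (1.52) over k != j, using omega_j - omega_k = m * pi / l.
# The terms for +m and -m are equal, hence the factor 2.
mode_sum = 0.0
for m in range(1, M + 1):
    d = m * math.pi / l
    mode_sum += 2.0 * (math.sin(d * a) / d) ** 2

# Estimate obtained from (1.65) and (1.66) with f = 1.
integral_estimate = (l / math.pi) * math.pi * a

print(mode_sum, integral_estimate, mode_sum / integral_estimate)
```

The ratio comes out slightly below 1. The shortfall reflects the excluded k = j term and the finite window, both of which become negligible when a ≪ l and the window is wide compared with the peak width π/a.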
The result (1.67) is just Einstein’s second fluctuation theorem (à la Jordan) in
slightly disguised form. Writing all quantities as functions of the cyclic frequency ν
rather than angular frequency ω, and using an overbar as a shorthand for the diagonal
expectation values in Eqs. (1.60)–(1.67), we find:

$$\overline{\Delta E^2_{(a,\nu)}} = 2a\Delta\nu\left[(n(\nu)h\nu)^2 + (n(\nu)h\nu)\,h\nu\right] \tag{1.68}$$

We now introduce the excitation energy—the difference between the total energy and
the zero-point energy. Jordan and his co-authors call this the “thermal energy” (p. 377,
p. 384). Although the intuition behind it is clear, this terminology is misleading. The
term “thermal energy” suggests that the authors consider a thermal ensemble of energy
eigenstates—what we would call a mixed state—while as has been made clear in the
preceding, in fact they are dealing with individual energy eigenstates, i.e., pure states.
The term “excitation energy” is therefore preferable in this context. The excitation
energy E(ν) in the narrow frequency range (ν, ν + Δν) in the entire string in the state
{nν } is

E(ν) = N (ν)(n(ν)hν) = 2lΔν(n(ν)hν) (1.69)

where we used that N (ν) = 2lΔν is the number of modes between ν and ν + Δν for
our one-dimensional string. On average there will be a fraction a/l of this energy in
the small segment (0, a) of the string:

$$E_{(a,\nu)} = \frac{a}{l}\,E(\nu) = 2a\Delta\nu\,(n(\nu)h\nu) \tag{1.70}$$
l

Substituting E(a,ν) /2aΔν for n(ν)hν in (1.68), we arrive at the final result of this
section of the Dreimännerarbeit (ch. 4, Eq. (55)):

$$\overline{\Delta E^2_{(a,\nu)}} = \frac{E_{(a,\nu)}^2}{2a\Delta\nu} + h\nu\,E_{(a,\nu)} \tag{1.71}$$

—precisely the analog, for a one-dimensional system of waves (with unit wave speed),
of the Einstein result (1.34) (recall that in three dimensions, the number of electro-
magnetic modes in volume V in a narrow frequency interval is just 8πν²V dν/c³).
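
The change of variables leading from (1.67) to (1.71) is pure algebra and can be confirmed with arbitrary numbers. In the Python sketch below the function names and the sample values of a, ν, Δν, and n are assumptions made for illustration; the wave speed is unity as in (1.35), so only the consistency of the two expressions is being tested.

```python
import math

H = 6.62607015e-34          # Planck constant
HBAR = H / (2 * math.pi)    # reduced Planck constant

def fluctuation_omega(n, omega, d_omega, a):
    """Right-hand side of (1.67), in terms of the angular frequency."""
    return (a / math.pi) * (n**2 + n) * HBAR**2 * omega**2 * d_omega

def fluctuation_einstein(E_seg, nu, d_nu, a):
    """Right-hand side of (1.71): wave term plus particle term."""
    return E_seg**2 / (2 * a * d_nu) + H * nu * E_seg

# Illustrative values (assumptions): segment length, frequency, bandwidth, occupation.
a, nu, d_nu, n = 0.5, 3.0e3, 10.0, 2.7

omega, d_omega = 2 * math.pi * nu, 2 * math.pi * d_nu
E_seg = 2 * a * d_nu * n * H * nu          # mean energy in the segment, (1.70)

lhs = fluctuation_omega(n, omega, d_omega, a)
rhs = fluctuation_einstein(E_seg, nu, d_nu, a)
print(lhs, rhs)
```

The n² part of (1.67) becomes the wave term and the part linear in n becomes the particle term, so both arise from the single factor n(n + 1) in (1.63).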
Jordan’s derivation of Einstein’s peculiar “hybrid” formula for the energy fluc-
tuations in cavity radiation is used in the final paragraph of the 3M paper as
further, as it were independent (of the dispersion considerations that had originally
motivated Heisenberg) evidence for the validity of the matrix mechanical procedure
of maintaining the form of the classical dynamical equations while reinterpreting as
non-commuting matrix quantities the kinematical ingredients of these equations. But
Jordan himself regarded the result as having far greater significance, insofar as it
pointed the way to a general procedure for extending the principles of quantum
mechanics to field systems with infinitely many degrees of freedom, including, as
Jordan was to show a few years later, systems of particles satisfying Fermi statistics.
Jordan was later (in 1962, in a comment to van der Waerden) to refer to his fluctuation
calculation in the 3M paper as “almost the most important contribution I ever made
to quantum mechanics.”
The negative reaction of Jordan’s contemporaries—including his co-authors on the
3M paper!—to the fluctuation calculation is of interest in its own right. Heisenberg
seems to have been worried about potential divergences, and later (in 1930) was to
publish a paper (Heisenberg, 1931) showing that the mean-square fluctuation in a
subvolume of the cavity is in fact infinite if one considers the electromagnetic energy
integrated over all frequencies. Although mathematically correct, this calculation is
irrelevant as a criticism of Jordan’s result in the 3M paper, which explicitly considers
the energy fluctuations in a finite frequency interval, which are perfectly finite, as we
have seen. It is also irrelevant from a physical point of view: the isolation of energy
in a subvolume requires the introduction of enclosing physical filters of small but
necessarily finite thickness, and the structure of these filters will eventually be resolved
if we consider photons of arbitrarily high frequency and small wavelength. In fact,
Heisenberg found that if the walls of the enclosing subvolume are smeared out (for
example, if we replace the θ-function θ(a − x) implementing the restriction of the
range of the integral (1.42) by a smooth function interpolating between 0 and 1),
the integral over frequencies can be extended to infinity with a finite result for the
mean-square energy fluctuation. Born was initially less vocal in opposition to Jordan’s
result, but later (in 1939), in exile in Edinburgh, published in collaboration with his
assistant Klaus Fuchs (later famous after being discovered as a Soviet spy!) a paper
(Born and Fuchs, 1939a,b) essentially retracting the entire calculation. The retraction
had itself in short order to be retracted when serious technical errors were discovered
by Pauli’s assistant Markus Fierz.10

10 For a detailed discussion of the reception of the Jordan fluctuation calculation, see the previously cited
work of Duncan and Janssen (Duncan and Janssen, 2008).

In a letter to Jordan (1926), Einstein complained that Jordan’s methods, while
perfectly adequate in reproducing the second fluctuation theorem of 1909 (no mean
feat in itself), must somehow fail in explaining the (at first sight simpler) result of 1905,
for the quasi-molecular behavior of radiation in the Wien regime. It is difficult to infer
at this remove what specific difficulty Einstein is referring to here: perhaps he was
somehow under the impression that the zero-point energy should be included in the
energy of an individual photon (making it $\tfrac{3}{2}h\nu$ instead of $h\nu$), which would then make
the result (1.27) incomprehensible. It is certainly true that from a technical standpoint,
using the methods of the 3M paper, the calculation of large non-Gaussian fluctuations
of the kind considered in the first fluctuation theorem, where all the radiation energy is
concentrated in the subvolume, is highly non-trivial. Such a calculation would require
the evaluation of all (not just the quadratic) moments of the energy fluctuation. In
principle, one could (with considerable effort!) explicitly calculate all higher moments
and conclude that the behavior of light in the Wien regime does indeed mimic perfectly
the statistical behavior of a gas of point-like particles—in particular in the extreme case
of a fluctuation localizing all the energy in a subvolume—but in fact the technically
superior route to this conclusion is simply to employ the modern methods of thermal
statistical quantum field theory to calculate the expression for the entropy of the field,
and then to proceed exactly as Einstein did in 1905!
To summarize the situation as the epochal year 1925 drew to a close, we have a
clear example at last of a true quantum-field-theoretical calculation giving a mean-
ingful resolution to the long-standing paradox of the wave–particle duality of light.
This resolution can be achieved at the level of the non-interacting electromagnetic
field: although true thermal equilibrium requires interactions of the light with the
material container (the “heat bath” at temperature T ), the quantum dispersion of a
pure state of free photons in fact displays the same dual form for the mean-square
energy fluctuation as the thermal case (with the caveats emphasized previously on
smoothness of the occupation number distribution). This is the quantum analog of
the fact that the classical fluctuation result (1.30) was also found by Lorentz (Lorentz,
1916) to hold rather generally (i.e., with no appeal to thermal concepts) provided a
“reasonable” random phase assumption was made with regard to the classical modes
of the electromagnetic field. Later, when we return to the issue of the classical limit of
quantum fields, we shall see that this analogy is not accidental, but based on a deep
principle of number–phase complementarity for fields.
While the fluctuation calculations of the final section of the Drei-Männer-Arbeit
bring to a perfectly satisfactory conclusion the twenty-year controversy over wave–
particle duality in electromagnetic phenomena initiated by Einstein’s 1905 paper on
light, they mark only the very beginning of the development of quantum field theory
as we understand it today. In particular, two very important tasks required, and were
soon to receive, the attention of physicists. First, there was the matter of interactions
of light with matter. Indeed, the role of dispersion theory as the midwife to the new
matrix mechanics of Heisenberg, Born, and Jordan indicates the primary role that
the interaction between atomic systems (i.e., bound electrons in atoms) and light
played in the thinking of physicists in the late 1920s. The 1927 paper of Dirac on
the interactions between electrons and the quantized electromagnetic field was to
begin the tortuous journey to a physically consistent quantum electrodynamics which
would finally emerge by mid-century. The second task, also beginning in 1927, was
taken up with great intensity and focus by Jordan and collaborators: the extension of
the notion of field quantization to the treatment of matter fields, in particular fields
with elementary excitations of fermionic character, which in consequence could never
possess a classical counterpart analogous to the electromagnetic field of Maxwell. Our
review of the historical evolution of quantum field theoretical concepts continues in
the next chapter with a discussion of these developments.
2
Origins II: Gestation and birth of
interacting field theory: from Dirac
to Shelter Island

The convoluted evolution of quantum field theory in the period from the emergence
of modern quantum mechanics in the late 1920s until the completion of the fully
covariant and renormalizable quantum electrodynamics of the early 1950s seems at first
sight a reprise of the extended birth pangs of quantum mechanics itself in the period
from Planck’s quantization of the distribution of energy among thermally equilibrated
oscillators in 1900 to the development of matrix mechanics by Heisenberg in 1925. In
both cases, and in contrast to the development of special and general relativity by
Einstein, progress (often slow and halting) was due to the efforts of many physicists
working along several lines of enquiry, with each new insight often opening up new
questions and new difficulties.
However, at least in hindsight, it is apparent that there is a considerable difference
in the intellectual background of the two efforts. The physicists struggling with the
development of quantum mechanics in the first quarter of the twentieth century were
faced with the need to construct an entirely novel, and in many respects completely
counter-intuitive, type of physical theory, in which many of the basic concepts of classi-
cal physics seemed no longer applicable. Indeed, adherence to these concepts frequently
obstructed rather than assisted the understanding of the complex of microscopic
phenomena steadily being uncovered on the experimental front. As with relativity,
and even more radically, the new theory seemed to demand the demolition of some of
the most deeply held presuppositions of classical physics, and it was totally unclear,
almost to the very end (i.e., the Heisenberg–Schrödinger revolution of 1925–26), what
nexus of consistently interrelated concepts could replace them.
By contrast, at least from 1930 on, the physical requirements and conceptual struc-
ture needed for an adequate quantum theory of fields were fairly clear: the quantum
mechanical substructure needed to follow a clear set of by now well-established rules,
and the resultant theory should obviously respect the precepts of special relativity,
yielding transition probabilities indifferent to one’s choice of inertial frame. In a way,
the fact that the quantum mechanical and relativistic foundations needed for an
adequate physical theory were completely clear by the early 1930s made the apparent
inconsistencies and frequently infinite results obtained in early calculations in quantum
electrodynamics even more frustrating for the leading theorists in the field, who (as
in the case of Heisenberg’s willingness to introduce a discretization of space and a
universal length unit) often felt the need for radical modifications at a fundamental
level—modifications which we now understand to have been quite unnecessary.
A comprehensive (and comprehensible) account of the history of quantum field
theory from the Dreimännerarbeit of 1925 to Dyson’s renormalization analysis of 1949,
effectively completing the formal framework of perturbative quantum electrodynamics
(QED), would require an entire treatise.1 In this chapter, constraints of space will
limit us to a highly selective account of some of the major breakthroughs along the
way. Many important and interesting contributions made to the early development
of QED will be passed over in silence. The papers discussed are primarily those
which had a definitive impact in (a) the development of the formal structure of
quantum field theory, and (b) uncovering and (partially) resolving the conceptual
difficulties occasioned by the lack of explicit relativistic covariance and by the
appearance of ultraviolet divergencies in early calculations of quantum electrodynamic
processes.

2.1 Introducing interactions: Dirac and the beginnings of quantum electrodynamics
In early 1927 Dirac took the first steps (Dirac, 1927b) towards a formulation of the
theory of interaction of charged particles (specifically, electrons) with a quantized
electromagnetic field which would eventually lead to the fully covariant, and renormal-
izable, quantum electrodynamics of the late 1940s: a formalism whose extraordinary
quantitative successes established once and for all that local relativistic field theories
are an essential ingredient in any precise description of the microworld, at least as far
as electrodynamic effects are concerned. The motivation for Dirac’s work was clear: the
basis of the quantum dispersion theory which led to Heisenberg’s founding paper on
matrix mechanics, and indeed the interpretation of the matrices appearing in the latter
as connected to the transition amplitudes for electrons jumping between distinct bound
states in the atom, with the concomitant absorption or emission of photons, still rested
on a completely ad hoc set of assumptions concerning the nature of the interactions
of electrons in atoms with the electromagnetic field. The latter had indeed been
subjected, in Jordan’s calculations of energy fluctuations described in the preceding
chapter, to the quantization procedure, but only in the approximation in which
photons remain free particles, oblivious to the presence of matter. The establishment of
the relation between matrix elements of the electron’s coordinate operator in various
atomic states, and the probabilities for transitions between these states (a relation
simply posited by fiat by the founders of matrix mechanics), required an explicit theory
of the interaction of an atomic system with a quantized electromagnetic field, and it
was this theory which Dirac set out to develop in his seminal work. As a byproduct,
the famous Einstein formulas for the A and B coefficients describing spontaneous and
induced radiation—derived originally with the help of thermodynamic arguments—
appear almost effortlessly once the appropriate formalism is in place.

1 Indeed, the subject has been tackled already, in the excellent book by Schweber (Schweber, 1994),
which the reader is encouraged to consult for more extensive details on many of the developments discussed
below. See also the book by Miller (Miller, 1994), containing a number of the original papers in translation.

The problem addressed by Dirac in (Dirac, 1927b) was that of determining the
“perturbation of an assembly satisfying the Einstein–Bose statistics” due to the
interaction of the assembly with an atomic system (in fact, with an electron in such
a system). The terminology here seems somewhat strange from a modern point of
view: Dirac refers frequently to the set of independent modes of the non-interacting
electromagnetic field (say, quantized in a box of finite volume so that the allowed
modes form a discrete set, as in the Jordan calculation in the 3M paper) as an
“assembly of independent systems”. He then sets about to write a Hamiltonian for
the electromagnetic field in terms of these mode variables—one sufficiently general
to accommodate both the free electromagnetic field and the possible perturbation
which might be induced by inserting an atomic system (described with the usual
non-relativistic quantum formalism) into the field.
At this point Dirac introduces two technical devices which would become central
features of quantum field theory: firstly, the use of a canonical transformation from the
(pr , qr ) type variables (such as those employed by Jordan in describing the individual
harmonic oscillator systems for the rth field mode; cf. Section 1.5) to amplitude ($\sqrt{N_r}$)
and phase (θr ) variables, and thence to destruction br and creation b†r operators which
lower and raise, respectively, by one the number of photons associated with the rth
mode. The use of such variables actually goes back to a paper by London (London,
1926),2 where the harmonic oscillator spectrum (and eigenfunctions) are derived using
this technique.
Secondly, he employs for the first time the interaction picture of time development
wherein the operators of the theory evolve via time-dependent unitary transformations
due only to the free part of the Hamiltonian. The Hamiltonian introduced by Dirac
for his Einstein–Bose “assembly” (i.e., the electromagnetic field) takes the form

$$H = \sum_{r,s} b_r^\dagger H_{rs} b_s, \qquad H_{rs} = W_r \delta_{rs} + v_{rs} \tag{2.1}$$

where Wr is the energy of a free photon in the rth mode (namely, Wr = hνr ), and
the vrs form a matrix representing the effect of a perturbation (for example, due to
the presence of an atomic electron in some atomic state) on the electromagnetic field.
The operators br are defined in terms of the aforesaid conjugate amplitude and angle
variables3

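$$b_r = e^{-i\theta_r/\hbar}\sqrt{N_r}, \qquad [N_r, \theta_s] = -i\hbar\,\delta_{rs} \tag{2.2}$$

whence one may use the representation $\theta_r = i\hbar\,\partial/\partial N_r$, and consequently
$e^{-i\theta_r/\hbar} = e^{\partial/\partial N_r}$. Thus, $e^{\partial/\partial N_r} f(N_r) = f(N_r + 1)\,e^{\partial/\partial N_r}$, and we find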

2 See (Duncan and Janssen, 2009, p. 358) for a discussion of this important but not well known paper.
3 The proper definition of a well-defined phase operator involves some subtle mathematical problems
which at that time were not appreciated by Dirac, and which will be addressed in detail in Section 8.2.
Here, we note simply that Dirac’s results can be obtained without relying on the ill-defined phase operators
θr , but purely in terms of the algebra of creation b†r and annihilation br operators, which are perfectly well
defined.
 
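$$b_r = e^{-i\theta_r/\hbar}\sqrt{N_r} = \sqrt{N_r + 1}\,e^{-i\theta_r/\hbar} \tag{2.3}$$
$$b_r^\dagger = \sqrt{N_r}\,e^{i\theta_r/\hbar} = e^{i\theta_r/\hbar}\sqrt{N_r + 1} \tag{2.4}$$
$$N_r = b_r^\dagger b_r \tag{2.5}$$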

with the positive semidefinite operators Nr taking eigenvalues equal to zero or a
positive integer value, clearly to be interpreted as the number of photons in mode
r. In terms of the creation and destruction operators, the Hamiltonian (2.1) takes the
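form
$$H = \sum_{r,s} H_{rs}\sqrt{N_r}\,e^{i\theta_r/\hbar}\sqrt{N_s + 1}\,e^{-i\theta_s/\hbar} = \sum_{r,s} H_{rs}\sqrt{N_r}\sqrt{N_s + 1 - \delta_{rs}}\;e^{i(\theta_r - \theta_s)/\hbar} \tag{2.6}$$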
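The operator relations (2.3)–(2.5) can be checked numerically in the occupation number basis without any reference to the phase operator. The following is a minimal sketch, assuming a single mode truncated at a hypothetical cutoff `DIM` (the truncation, the helper names, and the cutoff value are ours, not Dirac's). It builds the matrix of the destruction operator, forms N = b†b, and evaluates the diagonal of the commutator of b with b†, which equals one except in the last entry, an artifact of the cutoff.

```python
import math

def destruction_matrix(dim):
    """Matrix of b in the basis |0>, ..., |dim-1>, with <n-1|b|n> = sqrt(n)."""
    b = [[0.0] * dim for _ in range(dim)]
    for n in range(1, dim):
        b[n - 1][n] = math.sqrt(n)
    return b

def transpose(m):
    """Transpose of a real matrix (equal to its hermitian conjugate here)."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

DIM = 6  # hypothetical cutoff: occupation numbers 0..5 only

b = destruction_matrix(DIM)
b_dag = transpose(b)            # creation operator
number_op = matmul(b_dag, b)    # N = b† b, eq. (2.5)
bb_dag = matmul(b, b_dag)

# Diagonal of N: the occupation numbers 0, 1, 2, ...
number_diag = [number_op[n][n] for n in range(DIM)]
# Diagonal of b b† - b† b: equals 1, except for the last (truncated) level.
commutator_diag = [bb_dag[n][n] - number_op[n][n] for n in range(DIM)]
```

In the untruncated space the commutator is exactly the unit operator; the single anomalous entry moves to higher occupation number as the cutoff is raised.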
A particular pure state $|\psi; t\rangle$ at time t (we shall use Dirac notation here, even
though it did not yet exist in the modern form in the paper under discussion!) of
the electromagnetic field may be identified by giving the “wave function” of the field
in occupation number representation, i.e., by specifying the set of amplitudes

$$\psi(n_1, n_2, \ldots, n_r, \ldots; t) \equiv \langle n_1, n_2, \ldots|\psi; t\rangle \tag{2.7}$$

(where nr is the integer eigenvalue of Nr ), and satisfies the time-dependent Schrödinger
equation

$$i\hbar\frac{\partial}{\partial t}\psi(n_1, n_2, \ldots, n_r, \ldots; t) = \langle n_1, n_2, \ldots|H|\psi; t\rangle = \sum_{r,s} H_{rs}\sqrt{n_r}\sqrt{n_s + 1 - \delta_{rs}}\;\psi(n_1, \ldots, n_r - 1, \ldots, n_s + 1, \ldots; t) \tag{2.8}$$

The bilinear (in creation and destruction operators) form (2.1) chosen by Dirac for
the Hamiltonian of the electromagnetic field means, of course, that the number of
photons is necessarily conserved: perhaps the most characteristic feature of quantum
field theories—particle creation and/or annihilation—is completely missing in this first
attempt at an interacting theory of photons! Indeed, each destruction operator bs is
accompanied by a creation operator b†r , so the time evolution of the system under
the Hamiltonian (2.1) amounts simply to the continual transitioning of photons from
one mode to another. This proves to be something of an embarrassment when Dirac
addresses the basic object of the paper: the calculation of probabilities for the emission
or absorption of photons by an electron (in a bound state of an atom). This difficulty
is finessed by assuming that the “disappearance” of a photon in a pure absorption
process really corresponds to the transition of the photon from a finite-energy (and
hence detectable) state to a zero energy (and hence undetectable) state, which is
possible, of course, for a massless particle with zero momentum. Indeed, Dirac assumes
the existence of an omnipresent “sea” of infinitely many zero-energy photons,4 with
the addition or subtraction of a single (or any finite number) of zero-energy photons
a physically unobservable event.
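The conservation of photon number under a bilinear Hamiltonian of the form (2.1) is easy to see in a small numerical experiment. The sketch below is illustrative only: the three-mode matrix `H_MATRIX`, the initial state, and the function names are hypothetical choices of ours. States are stored as dictionaries from occupation number tuples to amplitudes, and each term b†r bs moves one photon from mode s to mode r with the usual square-root factors. For contrast, a single creation operator (a term linear in the operators) is applied as well, and it changes the total photon number by one.

```python
import math

def apply_bilinear(state, h):
    """Apply H = sum_{r,s} h[r][s] b†_r b_s to a state {occupations: amplitude}."""
    out = {}
    modes = range(len(h))
    for occ, amp in state.items():
        for r in modes:
            for s in modes:
                if h[r][s] == 0 or occ[s] == 0:
                    continue  # b_s annihilates a state with no photon in mode s
                new = list(occ)
                new[s] -= 1                                # b_s removes a photon from mode s
                factor = math.sqrt(occ[s]) * math.sqrt(new[r] + 1)
                new[r] += 1                                # b†_r adds a photon to mode r
                key = tuple(new)
                out[key] = out.get(key, 0.0) + h[r][s] * factor * amp
    return out

def apply_creation(state, r):
    """Apply a single creation operator b†_r (a term linear in the operators)."""
    out = {}
    for occ, amp in state.items():
        new = list(occ)
        new[r] += 1
        key = tuple(new)
        out[key] = out.get(key, 0.0) + math.sqrt(occ[r] + 1) * amp
    return out

def photon_numbers(state):
    """Set of total photon numbers carried by the nonzero components of a state."""
    return {sum(occ) for occ, amp in state.items() if amp != 0}

# Hypothetical three-mode example: diagonal entries play the role of W_r,
# off-diagonal entries the role of the perturbation v_rs.
H_MATRIX = [[1.0, 0.2, 0.0],
            [0.2, 2.0, 0.3],
            [0.0, 0.3, 3.0]]
initial = {(2, 1, 0): 1.0}

once = apply_bilinear(initial, H_MATRIX)
twice = apply_bilinear(once, H_MATRIX)
after_emission = apply_creation(initial, 2)
```

Every component of `once` and `twice` carries three photons, as in the initial state, whereas `after_emission` carries four.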
In hindsight, this assumption is completely unnecessary: indeed, one could easily
have added terms linear in the creation and destruction operators directly to the
right-hand side of (2.1) (symmetrically, in order to preserve hermiticity of the
Hamiltonian), in which case the desired emission and absorption events would have
appeared directly, with no need to resort to the device of an invisible sea of photons.
The speculation is irresistible that the notion of an invisible sea of particles in their
ground (lowest available energy) state which later re-emerges in Dirac’s hole theory
of positrons can be traced back to exactly this aspect of his first stab at quantum
electrodynamics. Unfortunately, the hole theory in the context of charged massive
particles (electrons and positrons) turned out to be a red herring of sinister vitality
in the 1930s (and even into the 1940s), leading to a huge amount of fruitless (and
complicated) formalism, as we shall see below, before the transition was made to a
formulation in terms of a Fock space approach in which particles and antiparticles
appear in a completely symmetric way.
Dirac circumvents the absence of pure emission or absorption terms in the proposed
Hamiltonian (2.1) by a formal device which corresponds in modern terms to assuming
that the vacuum is in a coherent state (cf. Section 8.2) of zero-energy photons, so
that (for the destruction operator b0 of a zero-energy photon) the photon states of the
system satisfy5

$$b_0|\psi\rangle = \alpha|\psi\rangle, \qquad \langle\psi|b_0^\dagger = \alpha^*\langle\psi| \tag{2.9}$$

with α a complex number with modulus going to infinity if (as Dirac assumes) there
are infinitely many zero-energy photons present. Dirac assumes that the terms in H
in (2.1) with r ≠ 0, s = 0 (pure emission events) then lead (via vanishingly small vr0
coefficients) to finite matrix elements by setting the limit (for infinitely many zero-
energy photons)

$$v_{r0}\,\alpha \to v_r = \text{finite} \tag{2.10}$$

which then implies a correspondingly finite amplitude $v_{0s}\,\alpha^* = v_s^*$ for a pure absorption
event, in which a finite-energy photon in mode s is transferred to the zero-energy
reservoir. The net effect of these shenanigans is the appearance of a term (necessarily
hermitian, given the hermiticity of the original interaction matrix vrs ) linear in the

4 In modern terminology, Dirac was imagining a Bose–Einstein condensate of zero-energy photons. While
this idea is physically incorrect in the present context, Dirac’s intuition of the physical imperceptibility of
very-low-energy photons is a critical component in a proper understanding of the problem of infrared
divergences in scattering amplitudes in quantum electrodynamics, as we shall see in Section 19.2.
5 Strictly speaking, this is not what Dirac does. He assumes a definite number $N_0$ of zero-energy photons,
in which case the matrix elements of the Hamiltonian between initial and final states would necessarily
vanish! We are “fixing up” his argument here to yield the desired result, which he could, of course,
have obtained directly by assuming from the outset an hermitian term linear in creation and destruction
operators. Dirac was perfectly aware that the desired interaction Hamiltonian was necessarily linear in the
electromagnetic vector potential and hence in the amplitude variable $\sqrt{N_r}$, thence also in the creation–
destruction variables, as he shows in the final Section 7 of his paper (see below).

creation and destruction operators, and capable of initiating the desired pure emission
and absorption events:

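$$H = H_0 + H_{\rm lin} + H_{\rm scat} \tag{2.11}$$
$$H_0 = \sum_r W_r\, b_r^\dagger b_r = \sum_r W_r N_r \tag{2.12}$$
$$H_{\rm lin} = \sum_{r \neq 0} (v_r b_r^\dagger + v_r^* b_r) \tag{2.13}$$
$$H_{\rm scat} = \sum_{r,s \neq 0} v_{rs}\, b_r^\dagger b_s \tag{2.14}$$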
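The coherent-state property (2.9) that underlies this "fixed-up" version of Dirac's argument can be illustrated numerically. The sketch below assumes the standard number-basis amplitudes of a coherent state, proportional to α^n/√(n!), truncated at a hypothetical cutoff `DIM`; the value of `ALPHA` is likewise chosen only for illustration. It checks that the destruction operator acts as multiplication by α, and that the mean photon number equals |α|², so that a modulus of α going to infinity does indeed correspond to infinitely many photons.

```python
import math

def coherent_amplitudes(alpha, dim):
    """Amplitudes c_n = exp(-|alpha|^2 / 2) * alpha^n / sqrt(n!) for n = 0..dim-1."""
    amps = [math.exp(-abs(alpha) ** 2 / 2)]
    for n in range(1, dim):
        amps.append(amps[-1] * alpha / math.sqrt(n))
    return amps

def apply_destruction(amps):
    """(b psi)_n = sqrt(n+1) psi_{n+1}; the top level of the truncated space maps to 0."""
    dim = len(amps)
    return [math.sqrt(n + 1) * amps[n + 1] if n + 1 < dim else 0.0
            for n in range(dim)]

ALPHA = 1.2 + 0.5j   # hypothetical eigenvalue, chosen only for illustration
DIM = 40             # hypothetical cutoff, large enough that the tail is negligible

psi = coherent_amplitudes(ALPHA, DIM)
b_psi = apply_destruction(psi)

# Largest deviation from the eigenvalue equation b|psi> = alpha|psi>.
residual = max(abs(x - ALPHA * y) for x, y in zip(b_psi, psi))
# Expectation value of the number operator in the (normalized) state.
mean_photon_number = sum(n * abs(c) ** 2 for n, c in enumerate(psi))
```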

For an interaction term of linear type, the Einstein results for the A and B coeffi-
cients describing spontaneous emission and absorption (Einstein, 1916, 1917) follow
immediately once we re-express Hlin in terms of occupation number/phase variables:
$$H_{\rm lin} = \sum_r \left(v_r\, e^{i\theta_r/\hbar}\sqrt{N_r + 1} + v_r^*\, e^{-i\theta_r/\hbar}\sqrt{N_r}\right) \tag{2.15}$$

Recall (cf. (2.3, 2.4)) that the operators $e^{i\theta_r/\hbar}$ (resp. $e^{-i\theta_r/\hbar}$) increase (resp.
decrease) the associated mode number eigenvalue nr by one and hence effect the emission
(resp. absorption) of a photon in the rth mode; the corresponding probabilities are
therefore proportional to the initial-state occupation numbers nr + 1 (nr resp.). Note
that the same factor $|v_r|^2$ appears in both probabilities (in Einstein’s language, this
is the equality of the coefficients for absorption and induced emission): it involves an
atomic state matrix element which Dirac will identify once the interaction Hamiltonian
is fully specified in terms of electron and electromagnetic field quantities. In any event,
with the usual mode counting of classical plane wave modes in a box of volume V ,
giving $\frac{V}{c^3}\nu^2\,d\nu\,d\Omega$ modes of a given polarization into solid angle dΩ in the frequency
interval dν, one finds for the total energy of the radiation in this interval (assuming
the occupation numbers nr smoothly varying),
$$n_r \cdot h\nu_r \cdot \frac{V}{c^3}\,\nu_r^2\,d\nu_r\,d\Omega \equiv \frac{V}{c}\,I(\nu_r)\,d\nu_r\,d\Omega \;\Rightarrow\; n_r = \frac{c^2}{h\nu_r^3}\,I(\nu_r) \tag{2.16}$$
where the specific intensity of the ambient radiation (in the initial state) I(νr )
corresponds to the radiative flux in the particular frequency interval and solid angle
(with flux defined as usual as the energy density times the speed of light). Thus, if
the absorption rate is proportional to nr and hence to I(νr ), the emission rate is
correspondingly proportional to $I(\nu_r) + \frac{h\nu_r^3}{c^2}$, with the first and second terms corresponding
respectively to induced and spontaneous emission (the latter present even
in the absence of ambient radiation). These are Einstein’s laws for the emission and
absorption of radiation, in the form presented by Dirac.
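As a numerical illustration of these laws, the sketch below evaluates the emission and absorption weights nr + 1 and nr, converts between occupation number and specific intensity using (2.16), and then repeats Einstein's thermodynamic balance argument. The last step is not spelled out in the text and rests on an assumption of ours: a two-level atom with equal statistical weights and Boltzmann populations, for which equating the emission and absorption rates reproduces the Planck occupation number. The function names and the sample frequency and intensity are hypothetical.

```python
import math

H_PLANCK = 6.62607015e-34   # Planck constant, J s
C_LIGHT = 2.99792458e8      # speed of light, m / s

def emission_weight(n):
    """Squared matrix element of b† on |n>: n (induced) + 1 (spontaneous)."""
    return n + 1

def absorption_weight(n):
    """Squared matrix element of b on |n>: n."""
    return n

def occupation_from_intensity(intensity, nu):
    """Eq. (2.16): n = c^2 I / (h nu^3)."""
    return C_LIGHT ** 2 * intensity / (H_PLANCK * nu ** 3)

def equilibrium_occupation(x):
    """Solve exp(-x) * (n + 1) = n for n, with x = h nu / (k T).

    Assumes a two-level atom whose upper and lower populations stand in the
    Boltzmann ratio exp(-x), so that emission and absorption rates balance.
    """
    boltzmann = math.exp(-x)
    return boltzmann / (1.0 - boltzmann)

def planck_occupation(x):
    """Planck occupation number 1 / (exp(x) - 1)."""
    return 1.0 / math.expm1(x)

nu = 5.0e14          # hypothetical frequency, Hz
intensity = 1.0e-6   # hypothetical specific intensity, SI units

n = occupation_from_intensity(intensity, nu)
rate_ratio = emission_weight(n) / absorption_weight(n)
intensity_ratio = (intensity + H_PLANCK * nu ** 3 / C_LIGHT ** 2) / intensity
```

The two ratios agree, which is just the statement that an emission rate proportional to nr + 1 is proportional to I(νr) + hνr³/c² once (2.16) is used.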
The remaining task facing Dirac was to go beyond the results of Einstein by
providing a complete route to a calculation of the absolute (rather than relative)
absorption and emission (spontaneous and induced) rates for specified transitions of
an electron between distinct atomic stationary states, thereby putting on a firm basis
(at last!) Heisenberg’s basic, but as yet purely hypothetical, intuition in his seminal
Umdeutung paper connecting the transition amplitudes for radiative processes in
atoms to the matrix elements of the electron’s coordinate operator. Dirac accomplishes
this, in the final section of (Dirac, 1927b), by a correspondence principle argument
identical in spirit to those used by Kramers, Born, and Heisenberg in the dispersion
theory precursors to matrix mechanics. We shall briefly reprise the argument here, in
modern notation.
We begin with the Fourier expansion for the classical vector potential in radiation
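(i.e., Coulomb) gauge, $\vec\nabla\cdot\vec A = 0$, in a finite box of volume V ,
$$\vec A(\vec r, t) = \frac{1}{\sqrt{V}}\sum_{\vec k,\lambda}\frac{1}{\sqrt{2\omega_{\vec k}}}\left(a_{\vec k,\lambda}\,\vec\epsilon_{\vec k,\lambda}\,e^{i\vec k\cdot\vec r - i\omega_{\vec k} t} + \text{c.c.}\right) \tag{2.17}$$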

where we are using natural units (so ℏ = c = 1), the polarization index λ takes two
values, with a transverse polarization vector $\vec k\cdot\vec\epsilon_{\vec k,\lambda} = 0$, and the energy associated
with the discrete wavevector $\vec k$ is, for massless photons, $\omega_{\vec k} = |\vec k|$, and “c.c.” means
simply the complex conjugate, as $\vec A$ is a real c-number field. In Dirac’s notation,
the combination of the discrete momentum $\vec k$ and polarization index λ is denoted
by a single mode index such as r, as previously. With the scalar potential zero, as
appropriate for a free Coulomb gauge field, the electric field is $\vec E = -\partial\vec A/\partial t$ and the
magnetic field is $\vec B = \vec\nabla\times\vec A$, whence the total energy of the free electromagnetic field
in the box is, after some simple algebra,
$$H_0 = \frac{1}{2}\int_V (\vec E^2 + \vec B^2)\,d^3r = \sum_{\vec k,\lambda}\omega_{\vec k}\,a^*_{\vec k,\lambda}\,a_{\vec k,\lambda} \tag{2.18}$$

The interpretation of the amplitudes $a_{\vec k,\lambda}$ is now clear: the absolute square is just the
number of photons in the mode $\vec k$, λ, i.e., in Dirac’s notation, after quantization, just
the Nr operator. Dirac now completes the quantization of the electromagnetic field
by replacing the classical coefficient $a_{\vec k,\lambda}$ by the destruction operator $b_r = b_{\vec k,\lambda}$, and
its complex conjugate $a^*_{\vec k,\lambda}$ by the hermitian conjugate operator $b^\dagger_{\vec k,\lambda}$. The classical
interaction energy between a particle of charge e and mass m in the presence of the
vector potential is then (Dirac ignores the term quadratic in the vector field arising
from the expansion of the gauged kinetic term $\frac{1}{2m}(\vec p - e\vec A)^2$, keeping only the terms
linear in the vector potential, which give the leading contribution in perturbation
theory in e)
$$H_{\rm int} = -\frac{e}{m}\,\vec p\cdot\vec A(\vec r, t) \tag{2.19}$$
where now $\vec p$ and $\vec r$ refer to the momentum and coordinate of the electron interacting
with the electromagnetic field. Inserting (2.17) into (2.19) gives an operator precisely
of the form Hlin in (2.15), but with the coefficient vr∗ of the destruction operator br now
identified, up to unimportant constants, as $\vec\epsilon_{\vec k,\lambda}\cdot\vec p\,e^{-i\vec k\cdot\vec r}$ for the mode r corresponding
to wavevector $\vec k$ and polarization λ. In the electric dipole approximation the wavelength
$k^{-1}$ of the photon is assumed much larger than atomic dimensions, and the exponential
can be replaced by unity. The matrix element of this interaction operator between
distinct initial and final atomic states $|i\rangle$, $|f\rangle$ thus involves matrix elements of the
component of the electron momentum in the direction of the photon polarization,
$\langle f|\,\vec\epsilon_{\vec k,\lambda}\cdot\vec p\,|i\rangle$, multiplied by the dependence on the photon occupation numbers of the
electromagnetic field found earlier by considering matrix elements of the br and b†r
photon destruction and creation operators.6 Dirac shows that these results agree in
detail with the matrix-mechanical expressions for the Einstein A and B coefficients.
Dirac was perfectly aware that his fully quantum-mechanical discussion of the
interaction of electrons with the electromagnetic field suffered from a serious short-
coming: the continuing treatment of the electrons as non-relativistic particles, which in
particular made a proper derivation of the relativistic fine-structure effects of electrons
in atomic bound states impossible. Initial attempts to produce a fully relativistic
wave equation proceeded by replacing the non-relativistic (free-particle) Schrödinger
equation
$$i\hbar\frac{\partial}{\partial t}\psi(\vec r, t) = -\frac{\hbar^2}{2m}\vec\nabla^2\psi(\vec r, t) \tag{2.20}$$
implementing (via the standard associations $H \to i\hbar\,\partial/\partial t$, $\vec p \to \frac{\hbar}{i}\vec\nabla$) the non-relativistic
energy-momentum relation $H = \vec p^{\,2}/2m$, with the fully relativistic Klein–Gordon equation
(we take c=1)7
$$\left\{\hbar^2\left(\frac{\partial^2}{\partial t^2} - \vec\nabla^2\right) + m^2\right\}\psi(\vec r, t) = 0 \tag{2.21}$$
incorporating the correct relativistic relation $H^2 = \vec p^{\,2} + m^2$. These attempts having
failed in the archetypal test case of the hydrogen atom spectrum, Dirac tackled the
problem of the coupling of a relativistic electron to the electromagnetic radiation in a
seminal paper (Dirac, 1928) which settled once and for all the kinematical aspects of
the relativistic treatment of spin-½ particles. The incorporation of an electromagnetic
coupling in (2.21), by the usual minimal coupling replacement $\vec p \to \vec p - e\vec A$, $H \to H - eA_0$
(with $\vec A$, $A_0$ the vector and scalar potentials respectively) led to a wave-equation
for the electron with two immediate, and serious, drawbacks:
1. The presence of a second-order derivative in time ran counter to the fundamental
quantum-mechanical principle that the state of a quantum system at any time
is determined solely by knowledge of its state at any given earlier time (unlike
the situation in the configuration space formulation of classical mechanics, say,
where both coordinates and velocities needed to be specified at an initial time to
allow the subsequent time evolution to be computed). All of the highly successful
quantum transformation theory developed up to this point, primarily by Dirac

6 The matrix elements of the electron momentum operator can be converted to matrix elements of the
coordinate operator by a simple commutator trick; see (Baym, 1990), chapter 13.
7 We note for historical accuracy, that Schrödinger had actually tried the relativistic Klein–Gordon
equation first in his treatment of the hydrogen atom, only to discover to his dismay that it led to incorrect
fine-structure predictions.

(Dirac, 1927a) and Jordan (Jordan, 1927a,b), was predicated on this assumption,
which clearly required the dynamical evolution equation of a quantum system to
be first order in time.
2. Secondly, the second-order time-derivative had the highly unpleasant ancillary
effect of introducing solutions with both positive and negative energy (with
the latter, in the minimally coupled case, corresponding to particles of opposite
charge to those of positive energy). This is due simply to the fact that the equation
(2.21) determines only the square of the energy, but not its sign.
Dirac showed that it was possible to solve the first of these problems, and obtain a
fully relativistic equation first order in the time-derivative, with the highly desirable
byproduct of providing a natural explanation of the hitherto mysterious double-
valuedness (rather quaintly termed “duplexity” by Dirac) of electron states, due to the
intrinsic spin. However, as Dirac frankly admitted, the resultant equation, involving a
four-component wavefunction (in other words, twice the desired “duplexity” account-
ing for spin), was still plagued with the unwanted appearance of negative-energy,
opposite-charge solutions. The desired relativistic equation for the free particle case,
if linear in the time-derivative (hence in the energy operator $p_0 \equiv H = i\hbar\,\partial/\partial t$), must
necessarily be linear also in the spatial momentum operators $\vec p = -i\hbar\vec\nabla$, and Dirac
showed that the simplest possibility, unique up to obvious similarity transformations,
was obtained by setting
$$\{\gamma^\mu p_\mu - m\}\psi(\vec r, t) = 0 \tag{2.22}$$
where the γμ , μ = 0, 1, 2, 3 were algebraic objects satisfying the anticommutation
algebra8

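$$\{\gamma^\mu, \gamma^\nu\} = 2g^{\mu\nu} \tag{2.23}$$

where $g^{\mu\nu}$ is the flat-space metric tensor, with $g^{00} = -g^{ii} = 1$, $g^{\mu\nu} = 0$ for μ ≠ ν. The
solutions to (2.22) necessarily also satisfied the Klein–Gordon equation (2.21), as
$$\begin{aligned}
0 &= (\gamma^\nu p_\nu + m)(\gamma^\mu p_\mu - m)\psi = (\gamma^\nu\gamma^\mu p_\nu p_\mu - m^2)\psi \\
  &= \left(\tfrac{1}{2}\{\gamma^\mu, \gamma^\nu\}\,p_\mu p_\nu - m^2\right)\psi \\
  &= (g^{\mu\nu} p_\mu p_\nu - m^2)\psi = (p_0^2 - \vec p^{\,2} - m^2)\psi
\end{aligned} \tag{2.24}$$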
which is identical with (2.21). A straightforward algebraic analysis led Dirac to the
conclusion that the simplest realization of the Clifford algebra (2.23) was in terms
of a set of four 4x4 matrices γμ , so that the wavefunction ψ necessarily had to
carry an internal discrete index taking four values. This certainly gave room for
the two-fold “duplexity” due to electron spin, and Dirac soon realized that the
additional surplus “duplexity” corresponded exactly to the unwelcome reappearance
in the theory of negative-energy solutions—basically because the γ0 matrix associated

8 Our notation vis-à-vis the Dirac equation and algebra differs slightly from Dirac’s, in accord with
modern usage.

with the energy term necessarily produced two +1 and two −1 eigenvalues on
diagonalization.
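The algebraic statements above are easy to check numerically. The following is a minimal sketch, assuming the Dirac (standard) representation of the γ matrices, which is only one of many equivalent choices and is not singled out by the text. It verifies the anticommutation algebra (2.23), the factorization used in (2.24), and the two +1 and two −1 eigenvalues of γ0. All helper names (matmul, lincomb, slash, and so on) are hypothetical, introduced here for illustration.

```python
# Assumption: Dirac (standard) representation, chosen only for illustration.
def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lincomb(ca, a, cb, b):
    """Return ca*a + cb*b for equally sized matrices."""
    return [[ca * x + cb * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def kron(a, b):
    """Kronecker (tensor) product."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

I2 = [[1, 0], [0, 1]]
I4 = kron(I2, I2)
SIGMA = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]

# gamma^0 = diag(1, 1, -1, -1); gamma^i = [[0, sigma_i], [-sigma_i, 0]]
GAMMA = [kron([[1, 0], [0, -1]], I2)] + [kron([[0, 1], [-1, 0]], s) for s in SIGMA]
METRIC = [1, -1, -1, -1]  # diagonal of the metric: g00 = -gii = 1

def clifford_holds():
    """Check {gamma^mu, gamma^nu} = 2 g^{mu nu} for all index pairs."""
    for m in range(4):
        for n in range(4):
            anti = lincomb(1, matmul(GAMMA[m], GAMMA[n]), 1, matmul(GAMMA[n], GAMMA[m]))
            target = lincomb(2 * METRIC[m] if m == n else 0, I4, 0, I4)
            if not close(anti, target):
                return False
    return True

def slash(p):
    """gamma^mu p_mu, with p = (p_0, p_1, p_2, p_3) the covariant components."""
    out = lincomb(0, I4, 0, I4)
    for m in range(4):
        out = lincomb(1, out, p[m], GAMMA[m])
    return out

def factorization_holds(p, mass):
    """Check (gamma.p + m)(gamma.p - m) = (p_0^2 - |p|^2 - m^2) * identity."""
    plus = lincomb(1, slash(p), mass, I4)
    minus = lincomb(1, slash(p), -mass, I4)
    invariant = sum(METRIC[m] * p[m] ** 2 for m in range(4)) - mass ** 2
    return close(matmul(plus, minus), lincomb(invariant, I4, 0, I4))

print(clifford_holds(), factorization_holds([2.0, 0.3, -0.5, 1.1], 0.7))
print([GAMMA[0][i][i] for i in range(4)])  # gamma^0 is already diagonal here
```

In this representation γ0 is diagonal, so its eigenvalues can be read off directly; in any other representation related by a similarity transformation the same spectrum results.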
The successful application of the Dirac equation (with the coupling of the electron
to the electromagnetic field accomplished via the usual minimal coupling procedure of
replacing pμ → pμ − (e/c)Aμ in the free particle equation (2.22)) to the relativistic
fine structure of the hydrogen atom, and the fact that the new theory automatically yielded
the correct, and previously utterly mysterious, gyromagnetic ratio of 2 for the electron,
led to the immediate acceptance of the Dirac equation as the correct foundation for a
fully relativistic treatment of the interaction of electrons and photons. However, the
conceptual wave-mechanical framework in which the equation was conceived and born
was to prove a stubborn hindrance to the early acceptance of a fully second-quantized
formalism for matter fields such as the electron.
The irony is that Dirac, having pioneered the application of a second-quantized
formalism (with creation and annihilation operators at the center) for dealing with
the electromagnetic field, continued to insist on the use of first-quantization ideas for
the electron. In particular, multi-electron states would be described relativistically in
the same way as one was now accustomed to treat them in non-relativistic contexts,
such as in multi-electron atoms, using wavefunctions defined on a 3N -dimensional
coordinate space (for N electrons) and with interactions handled approximately via
Hartree–Fock mean field techniques.
The absence of disastrous instabilities incurred by the unavoidable transitions
between positive- and negative-energy states once the Dirac electron was coupled to
the electromagnetic field was legislated by fiat, by the assertion that the physical
“vacuum” (i.e., no-particle) state actually consisted in having all negative-energy
electron states filled, so that the normal positive electron states were simply (by
the Pauli exclusion principle) denied the opportunity of dropping down to any of
the infinitely many otherwise available negative-energy states (with the release of
electromagnetic gamma-radiation of energy ≥ 2mc2 ). The absence of a negative-energy
electron was then interpreted as the presence of a positive-energy particle of positive
charge, interpreted first by Weyl and Dirac as the proton, but as difficulties arose with
this proposal, as a (so far) unseen particle of equal mass and opposite charge to the
electron.
This approach was soon confirmed by Anderson’s experimental discovery of the
positron in 1932. However, the idea of an unobservable “filled sea” of negative-energy
electrons—in some sense the conceptual progeny of Dirac’s earlier idea, discussed
above, of a sea of zero-energy photons—would prove to be an extremely persistent
distraction during the 1930s, leading to a vast amount of ultimately unprofitable,
and extremely complicated, formalism, before the whole rickety framework9 could be
thrown overboard and replaced by a conceptually clean and technically efficient Fock

9 Among many other artificialities, the negative-sea idea required the introduction of mathematically
murky subtractions in the operators representing the total energy and charge of the system in order to
implement the absence of energy and charge in the vacuum state—the dubious character of the necessary
subtractions being considerably amplified by the assumption that the infinitely many occupied negative-
energy electron states were interacting and the corresponding multi-particle wavefunction therefore had
to be treated by extremely unconvincing Hartree–Fock approximation techniques incorporating, at least
roughly, the interactions of the occupied negative-energy electrons.

space (i.e., occupation number) formalism in which positive-energy electron states and
“negative-energy hole” states (i.e., positive-energy positron states) could be treated in
a completely symmetric way. This required that the ideas of second-quantization, fully
accepted for radiation after Dirac’s 1927 paper, should be applied with equal force to
matter (i.e., fermionic) fields. The unquestionable leader in this point of view, as we
shall now see, was Pascual Jordan.

2.2 Completing the formalism for free fields: Jordan, Klein, Wigner, Pauli, and Heisenberg
In contrast to Dirac, Jordan was an early champion (by which we mean already
in the period 1926–27 immediately following the breakthroughs of Heisenberg and
Schrödinger) of the notion that wave–particle duality extended to a coherence in the
mathematical formalisms used to describe radiation (specifically, the electromagnetic
field) and matter (which seems to denote for Jordan and co-workers the aggregate
behavior of massive particles of either bosonic or fermionic type). As we saw in
Chapter 1, Jordan was the first to realize that the introduction of a quantum field
for electromagnetism, in which the classical fields as c-number functions of space and
time were replaced by q-number spacetime functions built from independent modes
quantized as quantum harmonic oscillators, led to an explanation of the mysterious
Einstein formula for the energy fluctuations in blackbody radiation, where the result
involved a sum of wave and particle terms.
Jordan adopted very early the point of view that a similar approach, in which
a single q-number function of space and time would incorporate all the physics of
a system of arbitrarily many (massive) particles, should be applied universally. The
mathematical advantage of such an approach was very clear (at least to Jordan!):
the treatment of a many (N ) particle system in terms of Schrödinger wavefunctions
involving the abstract multi-dimensional coordinate space of N coordinate three-
vectors (as well as the time t, of course) would be replaced by a dynamics specified by a
single q-number field φ( r, t) defined on the physical spacetime. The promise of such an
approach, from a physical point of view, was twofold: (i) the treatment of interactions
between matter and electromagnetism should be possible in a more natural way if both
systems were subjected to a similar quantization technique, and (ii) the appearance of
functions of spacetime (r, t), in lieu of the abstract (3N+1)-dimensional configuration
space of many-body Schrödinger theory, suggested that the requirements of relativistic
invariance could be more readily imposed on the theory. In the words of Jordan and
Klein, “. . . this point of view seems especially suited to an attack on the relativistic
many-body problem, because it describes matter and radiation in a mathematically
equivalent way, namely through partial differential equations” (Jordan and Klein,
1927, p. 752).
In fact, in the paper by Jordan and Klein where a unified field-theoretic philosophy
is first spelled out in detail, the matter particles are still treated non-relativistically
(and are bosonic in nature). The problem addressed by Jordan and Klein amounts
to a transcription into field-theoretic terms of the basic problem of atomic physics:
the quantum mechanics of a system of non-relativistic charged particles (of charge e
and mass μ) interacting via mutual electrostatic forces, and perhaps also subject to

applied external fields. At this point, however, the incorporation of Fermi statistics in
a second-quantized formalism was not yet fully understood (a deficit soon to be erased
in the paper of Jordan and Wigner discussed below), so the particles are assumed to
obey bosonic statistics. By writing the Hamiltonian operator of such a system (we
assume that no external fields are present to simplify slightly the formalism) in the
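form

$$
H = \frac{\hbar^2}{2\mu}\int \vec\nabla\phi^\dagger(\vec r,t)\cdot\vec\nabla\phi(\vec r,t)\,d^3r
  + \frac{e^2}{2}\int \frac{:\phi^\dagger(\vec r,t)\,\phi(\vec r,t)\,\phi^\dagger(\vec r\,',t)\,\phi(\vec r\,',t):}{|\vec r-\vec r\,'|}\,d^3r\,d^3r'
\tag{2.25}
$$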
with the field operator expanded in discrete plane wave modes (say, by quantizing the
system in a box of volume V)

$$
\phi(\vec r,t) = \frac{1}{\sqrt{V}}\sum_{\vec k} e^{i\vec k\cdot\vec r}\,b_{\vec k}(t)
\tag{2.26}
$$

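and where the : ... : notation appearing in (2.25) embodies the instruction10 to move
all conjugate operators b†k to the left of the bk when (2.26) is inserted into (2.25),
giving

$$
H = \sum_{\vec k} E(k)\,b^\dagger_{\vec k}b_{\vec k}
  + \frac{e^2}{2}\sum_{\vec k,\vec k',\vec q,\vec q'} A(\vec k'\vec q'|\vec k\vec q)\,
    b^\dagger_{\vec k}b^\dagger_{\vec q}b_{\vec k'}b_{\vec q'}
\tag{2.27}
$$

$$
A(\vec k'\vec q'|\vec k\vec q) \equiv \frac{1}{V^2}\int
  \frac{e^{-i(\vec k-\vec k')\cdot\vec r-i(\vec q-\vec q')\cdot\vec r\,'}}{|\vec r-\vec r\,'|}\,d^3r\,d^3r'
\tag{2.28}
$$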

In analogy to Dirac’s treatment of the electromagnetic field, the Hamiltonian
equations for the classical Fourier amplitudes bk(t), b∗k(t) are shown by Jordan and
Klein to imply that these quantities form classically canonical pairs, whence the
quantization of the system requires the commutator Ansatz for their quantum analogs:

$$
[b_{\vec k}(t), b^\dagger_{\vec k'}(t)] = \delta_{\vec k\vec k'}
\tag{2.29}
$$

$$
[b_{\vec k}(t), b_{\vec k'}(t)] = 0 = [b^\dagger_{\vec k}(t), b^\dagger_{\vec k'}(t)]
\tag{2.30}
$$

The transition to the occupation number basis, and the interpretation of the bs and
b† s as destruction and creation operators, is then made exactly as in Dirac’s work, via
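the transcription (see (2.2, 2.3, 2.4, 2.5))

$$
b_{\vec k} \to e^{-\frac{i}{\hbar}\theta_{\vec k}}\,N_{\vec k}^{1/2}, \qquad
b^\dagger_{\vec k} \to N_{\vec k}^{1/2}\,e^{+\frac{i}{\hbar}\theta_{\vec k}}
\tag{2.31}
$$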

10 This instruction, in modern terminology the “normal-ordering” of the Hamiltonian operator, is essential
to remove infinite contributions to the energy arising from the Coulomb self-energy of the individual
particles, leaving only the electrostatic interaction energy of charged particle pairs (see (Jordan and Klein,
1927), Section 3). In the tangled history of 1930s field theory, the normal-ordering instruction was frequently
referred to as the “Klein–Jordan trick”.

The algebra (2.29, 2.30) implies that the Hamiltonian equation of motion of the field
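operator φ(r, t), expressed in coordinate space, amounts to the field equations

$$
i\hbar\frac{\partial}{\partial t}\phi(\vec r,t) = -\frac{\hbar^2}{2\mu}\Delta\phi(\vec r,t) + eV(\vec r,t)\,\phi(\vec r,t)
\tag{2.32}
$$

$$
\Delta V(\vec r,t) = -4\pi e\,\phi^\dagger\phi(\vec r,t)
\tag{2.33}
$$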

One sees immediately that (2.32) is formally identical with the non-relativistic (time-
dependent) Schrödinger equation for a single particle of mass μ and charge e interact-
ing with a Coulomb potential V , if we interpret φ( r, t) as a c-number wavefunction. Eq.
(2.33) is then simply the Poisson equation giving the electrostatic potential in terms
of the particle probability density |φ(r, t)|². Jordan and Klein refer to the process
of replacing the c-number coordinate space Schrödinger wavefunction by a q-number
field operator as a “quantization of de Broglie waves”. This terminology would soon be
replaced by the denotation second quantization,11 expressing the fact that the original
c-number (or “first-quantized”) equations of wave mechanics would lead to a correct
treatment of multi-particle systems, including the proper treatment of the statistics
(i.e., bosonic or fermionic), by the simple expedient of quantizing (i.e., replacing by
operators satisfying suitable commutation relations) the expansion amplitudes of the
single-particle wavefunctions.
The form of the Hamiltonian (2.27), when expressed in terms of creation and
destruction operators, makes it clear that the theory (like Dirac’s first version of
the Hamiltonian for quantum electrodynamics) conserves particle number: each term
contains an equal number of bs and b† s. Thus the time-dependent Schrödinger equation
acts independently in each sector of the state space corresponding to a given fixed
number of particles. Introducing a Schrödinger wavefunction for a given total number
of particles N (in analogy to (2.7)), Jordan and Klein show after a simple calculation
that the Hamiltonian (2.27) generates precisely the appropriate non-relativistic time-
dependent Schrödinger equation (analogous to (2.8)) for the given N -particle system.
Moreover, the Bose–Einstein symmetry of the multi-particle wavefunction is manifest
throughout.
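The number-conservation argument can be illustrated with a small numerical sketch. It assumes two bosonic modes, each truncated at a maximum occupation N_MAX (a numerical convenience, not part of the physics), and uses arbitrary illustrative coefficients in place of the Coulomb matrix elements A of (2.28). Any Hamiltonian whose terms contain equal numbers of bs and b†s commutes with the total number operator, whereas a term linear in b and b† does not. All names in the block are hypothetical.

```python
import math
from functools import reduce

def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lincomb(ca, a, cb, b):
    """Return ca*a + cb*b for equally sized matrices."""
    return [[ca * x + cb * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def kron(a, b):
    """Kronecker (tensor) product."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def dagger(a):
    """Hermitian conjugate."""
    return [[a[r][c].conjugate() for r in range(len(a))] for c in range(len(a[0]))]

def is_zero(a, tol=1e-9):
    return all(abs(x) < tol for row in a for x in row)

def commutator(a, b):
    return lincomb(1, matmul(a, b), -1, matmul(b, a))

def product(*ops):
    return reduce(matmul, ops)

N_MAX = 3          # assumed per-mode occupation cutoff
DIM = N_MAX + 1
# <row| b |col> = sqrt(col) when row = col - 1
lower = [[math.sqrt(col) if col == row + 1 else 0.0 for col in range(DIM)]
         for row in range(DIM)]
one = [[1.0 if row == col else 0.0 for col in range(DIM)] for row in range(DIM)]

b1, b2 = kron(lower, one), kron(one, lower)
b1d, b2d = dagger(b1), dagger(b2)

number = lincomb(1, product(b1d, b1), 1, product(b2d, b2))
kinetic = lincomb(1.0, product(b1d, b1), 2.5, product(b2d, b2))
scatter = product(b1d, b1d, b2, b2)  # two quanta moved from mode 2 to mode 1
interaction = lincomb(0.3, lincomb(1, scatter, 1, dagger(scatter)),
                      0.7, product(b1d, b2d, b1, b2))
hamiltonian = lincomb(1, kinetic, 1, interaction)

source = lincomb(1, b1, 1, b1d)  # unequal numbers of b and b-dagger per term

print(is_zero(commutator(hamiltonian, number)))  # number-conserving case
print(is_zero(commutator(source, number)))       # number-changing case
```

The truncation spoils the commutator [b, b†] = 1 in the highest occupation state, but it does not affect the check performed here, since every term of the Hamiltonian still maps states of fixed total occupation into states of the same total occupation (or to zero).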
The extension of the second-quantization procedure to systems of particles obeying
fermionic statistics followed within a few months of the Jordan–Klein paper, in an
article of Jordan and E. Wigner entitled “On the Pauli Exclusion Principle” (Jordan
and Wigner, 1928). The non-relativistic Schrödinger wave mechanical formalism for a
system of N fermions, wherein the non-relativistic Schrödinger Hamiltonian acts on
fully antisymmetric wavefunctions in the N -particle coordinate space, was shown to be
equivalent to a second-quantized version in which the Hamiltonian appears in the form
(2.27)12 but with creation and destruction operators a†κ , aκ subject to anticommutation
relations13

11 In an interview with Thomas Kuhn in June 1963 for the Archive for the History of Quantum Physics
(session 3, p. 9), Jordan claims to have been the first to employ this terminology. See (Duncan and Janssen,
2008, p. 642).
12 See (Jordan and Wigner, 1928), Eqs. (66a), (66b).
13 The anticommutator of two operators A and B is defined as {A, B} ≡ AB + BA.

{aκ , a†λ } = δκλ (2.34)


{aκ , aλ } = 0 = {a†κ , a†λ } (2.35)

with the indices κ, λ labeling a complete set of single-particle states (which may
require both a continuous as well as discrete specification, with a corresponding
interpretation of the Kronecker δ as a Dirac δ-function). The vanishing of the square of
the creation operator a†κ (implied by (2.35), setting κ = λ) immediately incorporates
the Pauli exclusion principle denying the possibility of multiply occupied fermionic
states. However, the treatment of Jordan and Wigner is still entirely within the
framework of non-relativistic physics: their paper appeared almost simultaneously with
Dirac’s relativistic equation for the electron, the correct second-quantized formulation
of which, as we shall see, was as yet still several years in the future.
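A concrete matrix realization makes the content of (2.34) and (2.35) tangible. The sketch below assumes a finite number of modes and uses the standard construction, now usually called the Jordan–Wigner transformation, in which each operator is a string of diagonal sign matrices followed by a 2x2 lowering matrix. The mode count, basis ordering, and function names are illustrative choices, not taken from the text.

```python
from functools import reduce

def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lincomb(ca, a, cb, b):
    """Return ca*a + cb*b for equally sized matrices."""
    return [[ca * x + cb * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def kron(a, b):
    """Kronecker (tensor) product."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def dagger(a):
    """Hermitian conjugate."""
    return [[a[r][c].conjugate() for r in range(len(a))] for c in range(len(a[0]))]

def is_zero(a, tol=1e-12):
    return all(abs(x) < tol for row in a for x in row)

def anticommutator(a, b):
    return lincomb(1, matmul(a, b), 1, matmul(b, a))

I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]       # sign factor: -1 on an occupied mode
LOWER = [[0, 1], [0, 0]]    # maps the occupied state (0, 1) to the empty state (1, 0)

def fermion_operators(n_modes):
    """Destruction operators a_0 ... a_{n-1}: Z on earlier modes, LOWER on mode k."""
    return [reduce(kron, [Z] * k + [LOWER] + [I2] * (n_modes - k - 1))
            for k in range(n_modes)]

def anticommutation_relations_hold(n_modes):
    """Check {a_k, a_l^dagger} = delta_kl and {a_k, a_l} = 0."""
    ops = fermion_operators(n_modes)
    dim = 2 ** n_modes
    identity = [[1 if r == c else 0 for c in range(dim)] for r in range(dim)]
    for k in range(n_modes):
        for l in range(n_modes):
            expected = lincomb(1 if k == l else 0, identity, 0, identity)
            mixed = anticommutator(ops[k], dagger(ops[l]))
            if not is_zero(lincomb(1, mixed, -1, expected)):
                return False
            if not is_zero(anticommutator(ops[k], ops[l])):
                return False
    return True

print(anticommutation_relations_hold(3))
# Pauli exclusion: the square of every creation operator vanishes.
print(all(is_zero(matmul(dagger(a), dagger(a))) for a in fermion_operators(3)))
```

The strings of sign matrices are what distinguish this construction from a simple tensor product of independent two-level systems: without them, operators belonging to different modes would commute rather than anticommute.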
The formal developments outlined so far, while of critical importance in establish-
ing the role of second quantization in enforcing the wave–particle connection while
maintaining the correct symmetry of multi-particle states, left the issue of relativistic
invariance essentially untouched. This shortcoming was remedied by the seminal paper
of Heisenberg and Pauli (Heisenberg and Pauli, 1929), which put in place the formalism
of Lagrangian field theory, still (eighty years later) at the core of modern field theory.
Heisenberg and Pauli begin with the canonical formalism of classical field theory,
where an action functional defined as the spacetime integral of the Lagrangian gives
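rise to field equations via a variational principle:

$$
\delta\int \mathcal{L}(\phi_\alpha,\vec\nabla\phi_\alpha,\dot\phi_\alpha)\,d^4x = 0
\;\Rightarrow\;
\frac{\partial\mathcal{L}}{\partial\phi_\alpha}
  = \frac{\partial}{\partial x^\mu}\,
    \frac{\partial\mathcal{L}}{\partial\left(\frac{\partial\phi_\alpha}{\partial x^\mu}\right)}
\tag{2.36}
$$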

Here φα (xμ ) are a set of spacetime fields labeled by a discrete index α, and the
covariant form of the Euler–Lagrange equation on the right-hand side of (2.36) suggests
that, at least classically, relativistic invariance of the field equations will follow once
the Lagrangian L is chosen to be a Lorentz scalar functional. The transition to a
Hamiltonian framework is made by the standard procedure: one introduces canonically
conjugate “momentum” fields14 πα = ∂L/∂φ̇α and sets the Hamiltonian density equal to
the Legendre transform of the Lagrangian density,

$$
\mathcal{H}(\phi_\alpha,\vec\nabla\phi_\alpha,\pi_\alpha) \equiv \sum_\alpha \pi_\alpha\dot\phi_\alpha - \mathcal{L}
\tag{2.37}
$$

Exactly as in classical point mechanics, one is then easily able to verify that, in the
absence of explicit time-dependence, the spatial integral of the density H is temporally
constant, and hence deserves the appellation “total energy of the system”. The fields
φα and πα at all spatial points but on the same time-slice are subjected to quantization
by the usual replacement of the classical Poisson bracket by commutators

14 Some notational warnings: Heisenberg and Pauli use Qα (resp. Pα) for φα (resp. πα) to emphasize
the analogy to Q, P variables in classical mechanics: the analogy is made even more complete by a lattice
formulation wherein the spatial continuum is discretized so that the x dependence also becomes discrete.


$$
[\pi_\alpha(\vec x,t), \phi_\beta(\vec y,t)] = \frac{\hbar}{i}\,\delta_{\alpha\beta}\,\delta^3(\vec x-\vec y)
\tag{2.38}
$$

with the desirable consequence that the Hamiltonian operator H, defined as the
(conserved) spatial integral of the Hamiltonian density, H ≡ ∫ ℋ d³x, generates the
time evolution of any dynamical variable F(πα, φα) constructed from the fields and
their conjugates on a time-slice:

$$
\frac{\partial F}{\partial t} = \frac{i}{\hbar}[H, F]
\tag{2.39}
$$
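As a minimal worked illustration of the canonical procedure just described, consider a single real scalar field of mass m, in units with ħ = c = 1. This example, and the choice of Lagrangian, are ours and are assumed purely for illustration; they are not taken from the paper of Heisenberg and Pauli. We also assume that the field commutes with itself and with its spatial gradient at equal times.

```latex
\begin{aligned}
\mathcal{L} &= \tfrac{1}{2}\dot\phi^2 - \tfrac{1}{2}(\vec\nabla\phi)^2 - \tfrac{1}{2}m^2\phi^2
  \quad\Rightarrow\quad \ddot\phi - \vec\nabla^2\phi + m^2\phi = 0, \\
\pi &= \frac{\partial\mathcal{L}}{\partial\dot\phi} = \dot\phi, \\
\mathcal{H} &= \pi\dot\phi - \mathcal{L}
  = \tfrac{1}{2}\pi^2 + \tfrac{1}{2}(\vec\nabla\phi)^2 + \tfrac{1}{2}m^2\phi^2, \\
i[H,\phi(\vec y,t)] &= \frac{i}{2}\int [\pi(\vec x,t)^2,\phi(\vec y,t)]\,d^3x
  = \frac{i}{2}\int (-2i)\,\pi(\vec x,t)\,\delta^3(\vec x-\vec y)\,d^3x
  = \pi(\vec y,t) = \dot\phi(\vec y,t).
\end{aligned}
```

The last line uses only the equal-time commutator of π with φ, with the value −iδ³(x − y), and reproduces the defining relation between the field and its conjugate momentum, as the general evolution equation requires.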

The article of Heisenberg and Pauli runs for 61 pages in the Zeitschrift für Physik,
and we can do no more here than to summarize a few of the critical contributions it
makes to the conceptual development of quantum field theory:

1. The relativistic invariance of the action (i.e., choice of a scalar Lagrange func-
tion) is shown to lead to relativistically invariant commutation relations, in the
sense that the vanishing of the equal-time commutators of fields and conjugate
(momentum) fields in one inertial frame implies the vanishing at equal time in
another inertial frame (vis-à-vis the time coordinate of the new frame). From
this follows the vanishing of commutators of local observables built from the fields and their
conjugates at pairs of points with strictly space-like separation (as such point
pairs can always be brought to equal time by a suitable Lorentz transformation).
This is the first clear statement of the microcausality principle which lies at the
heart of relativistic field theory.15
2. The canonical formalism was applied to the electromagnetic field, starting from
the classically well-known Lagrangian function

$$
\mathcal{L} = -\frac{1}{4}F_{\mu\nu}(x)F^{\mu\nu}(x), \qquad
F_{\mu\nu} \equiv \partial_\mu A_\nu(x) - \partial_\nu A_\mu(x)
\tag{2.40}
$$

where we use the convenient abbreviation ∂μ ≡ ∂/∂x^μ for a spacetime-derivative.
As Heisenberg and Pauli point out, the canonical program runs into immedi-
ate difficulties as the scalar potential A0 (time component of the four-vector
potential) has a vanishing conjugate momentum, π0 ≡ ∂L/∂ Ȧ0 = F00 = 0, which
immediately derails the imposition of canonical commutation relations of the
form (2.38). The canonical treatment of systems with a local gauge symmetry,
leading to a degeneracy in the Legendre transform connecting the Lagrangian and
Hamiltonian, is now well understood (cf. Chapter 15), but in 1929 the best that
Heisenberg and Pauli could do was to suggest the addition of a term (ε/2)(∂μ Aμ)²
to the Lagrangian, breaking the gauge symmetry Aμ → Aμ + ∂μ Λ (where Λ is
an arbitrary scalar function), but allowing the definition of a sensible conjugate

15 The earlier derivation of field commutation relations for the electromagnetic field by Jordan and Pauli
(Jordan and Pauli, 1928) had considered only the free electromagnetic field, and the commutators were
evaluated at arbitrary spacetime separations, so the specific aspect of space-like commutativity was not
emphasized.

momentum field π0 = εȦ0 for the scalar potential A0.16 Heisenberg and Pauli
then suggest that the parameter ε be set to zero at the end of all calculations to
recover the correct gauge-invariant results.
3. The Lagrangian formalism is developed for the Dirac relativistic electron equa-
tion. In modern notation, one writes an action (spacetime integral of the
Lagrangian),

$$
S_{\rm Dirac} = \int \mathcal{L}_{\rm Dirac}\,d^4x
  = \int \bar\psi(x)\,(i\gamma^\mu\partial_\mu - m)\,\psi(x)\,d^4x, \qquad
\bar\psi \equiv \psi^\dagger\gamma_0
\tag{2.41}
$$

which, by varying with respect to the field ψ̄ (assumed independent of ψ), yields
the desired Dirac equation (iγ^μ∂_μ − m)ψ(x) = 0. The requirement of
Fermi–Dirac statistics then led Heisenberg and Pauli to the choice of equal-time
anticommutation relations to replace the commutators (2.38),

{ψα ( x, t), ψβ† ( y , t)} = δαβ δ 3 ( x − y ) (2.42)

which follows formally by identifying the conjugate field to ψ as π ≡ ∂LDirac/∂ψ̇ = iψ̄γ0
and replacing commutators by anticommutators. The identification
of the electromagnetic charge current of the electron as j^μ(x) =
eψ̄(x)γ μ ψ(x) is also made in this paper, as well as the assertion that the coupling
between the electron and electromagnetic field is accomplished by including the
term jμ Aμ in the Lagrangian. The necessity for the choice of fermionic statistics
(specifically, anticommutators) for the electron is regarded by Heisenberg and
Pauli as fundamentally mysterious: “a satisfying explanation for Nature’s pref-
erence for the second possibility [i.e., anticommutators] can not be given”. The
Spin-Statistics theorem, establishing this necessity on the basis of the general
tenets of relativistic invariance, quantum theory, and microcausality, is still a
decade in the future (Pauli, 1940).
4. General formulas are adduced for the energy-momentum tensor Tμν giving via
spatial integrals the total four-momentum operators P0 = H (total energy), P
(total spatial momentum) of the system, namely Pμ = ∫ Tμ0 d³x, the conservation
of which is assured by the divergenceless property ∂ν T^μν = 0. Specific forms are
given for Tμν for both electromagnetism and the Dirac theory.
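The step from the local divergence condition in item 4 to the conservation of the total four-momentum can be spelled out in one line. The sketch assumes that the fields, and hence the components T^{μi}, fall off rapidly enough at spatial infinity for the surface term to vanish.

```latex
\frac{dP^\mu}{dt}
  = \int \partial_0 T^{\mu 0}\,d^3x
  = -\int \partial_i T^{\mu i}\,d^3x
  = 0
```

The second equality is the divergence condition written out in time and space components, and the third is Gauss's theorem applied to a surface at spatial infinity.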
To these unquestionable triumphs we must also add the difficulties and failures
which emerge in the course of applying the Heisenberg–Pauli formalism for inter-
acting electrons and photons to some fundamental questions. Of particular historical
interest is their second-order calculation of the self energy of the electron (i.e., the
correction to the rest mass of an electron of second order in the charge e when the

16 In two extremely influential papers in 1929 and 1930, Enrico Fermi (Fermi, 1929, 1930) was able
to develop a consistent Hamiltonian quantum electrodynamics in the covariant Lorentz gauge ∂μ Aμ = 0,
and to establish its equivalence to the transverse gauge ∇·A = 0 used by Dirac. In a second paper on
the quantization of wave fields (Heisenberg and Pauli, 1930), Heisenberg and Pauli showed that Fermi’s
covariant gauge results corresponded to a choice of unity for their parameter ε, and that the gauge condition
∂μ Aμ = 0 should be interpreted as a constraint on the allowed states, not as an operator identity (a
“q-Zahlrelation”).

interaction with the electromagnetic field is turned on). The calculation yields (Eq.
(115) in (Heisenberg and Pauli, 1929)) a linear divergence identical with the classical
Coulomb self-energy result: namely, e²/|x − x| = ∞. This divergent energy (together with
the ubiquitous vacuum energy arising from the sum of ½hν zero-point energies
from each mode of the electromagnetic field) would plague attempts to arrive at a
mathematically consistent quantum electrodynamics throughout the 1930s and early
1940s. As only electron states are considered (the interpretation of the negative-energy
solutions in terms of positrons, with a correct treatment of the latter, is yet to come),
the Heisenberg–Pauli quantum electrodynamics conserves electron number, and the
additive infinity in the electron energy can therefore be dropped “as one is only
interested in energy differences”. The development of positron theory, and its proper
application by Weisskopf to the electron self-energy problem in 1939, would lead to a
softening of the divergence from linear to logarithmic, but the extreme discomfort felt
by physicists working on early quantum electrodynamics in the presence of numerous
divergent expressions would only be assuaged in the late 1940s with the development
of a covariant renormalization procedure.

2.3 Problems with interacting fields: infinite seas, divergent integrals, and renormalization
The history of quantum electrodynamics throughout the 1930s and into the mid-
1940s is replete with false starts, conceptual confusions, and the frequent appearance
of increasingly radical suggestions for the abandonment of “sacred” principles in
a desperate attempt to stay afloat in a rising tide of physical and mathematical
inconsistencies. The main problems were of two distinct, but related, types:
1. The persistence of the “negative sea” hole theory of Dirac, wherein positron
states were interpreted as holes in an infinite background of negative-energy
Dirac electrons. Conceptually, this theory was the unfortunate consequence of a
stubborn persistence of the many-body configuration-space thinking, fully appro-
priate in the case of non-relativistic multi-electron systems such as atoms, in the
treatment of relativistic fermionic matter on the part of Dirac and his followers,
despite the fact that Dirac had pioneered the treatment of the electromagnetic
field in terms of a second-quantized formalism involving photon creation and
destruction operators. Despite the resistance of prominent theorists (especially
Heisenberg and Pauli), who properly objected to the highly ad hoc way in which
the physical effects of an infinite background of charged particles were disposed
of, many of the detailed calculations of quantum electrodynamic processes in the
1930s were carried out on the basis of (or at the least, paying lip service to) the
Dirac hole theory.17
2. The appearance of ultraviolet divergences (i.e., at the high-momentum or short-
distance end of integrals appearing in the calculation of various quantum elec-
trodynamic quantities) which seemed unavoidable without introducing funda-
mental distortions of underlying physical principles (such as the introduction of

17 For a convenient compendium of many of the important papers, see (Miller, 1994).

a “universal length” cutoff (Heisenberg, 1938)) led to increasing dissatisfaction
with the theoretical frameworks available (whether hole theory or fully second-
quantized). The zero-point (vacuum) energy in the electromagnetic field already
visible in Jordan’s fluctuation calculations in the Dreimännerarbeit was soon
joined by divergences in the electron self energy (i.e., the corrections to the energy
of an isolated electron due to its interactions with the electromagnetic field),
and by a divergent screening correction to the electric charge on an isolated
electron due to the unavoidable presence in the vacuum of virtual electron–
positron pairs, essentially corresponding to an infinite dielectric constant of the
vacuum.

The way out of the impasse created by the fundamentally untenable notion of a
negative-energy sea of electrons was shown very clearly in a paper by V. A. Fock in
1933 (Fock, 1933), and somewhat less clearly (as the basic idea is submerged in a large
quantity of speculation on other unrelated topics) in an almost simultaneous article
by Furry and Oppenheimer (Furry and Oppenheimer, 1934), although it must be
admitted that the basic lessons of both papers seem to have been pretty much ignored
by the theoretical community, which for the most part went right on calculating in
terms of electrons and holes.
We shall describe briefly the ideas of Fock here, as this paper can be regarded
as the seminal work responsible for the term “Fock space”, which provides the basic
kinematical scaffolding for modern (operator) formulations of relativistic field theory.
Fock proposed that instead of treating electrons as the primary objects of the theory
and positrons as derived concepts (i.e., holes in a negative-energy sea of electrons), the
latter should appear in the theory along with electrons in a completely symmetrical
way. By this time, the experimentally well-established phenomenological symmetry
between the two types of particle (identical mass, opposite charge) certainly made
this proposal a plausible one. Thus, the Hamiltonian of the free theory (i.e., with
electromagnetic interactions switched off) was assumed to take the form18

$$
H_0 = \sum_\sigma \int d^3p\,E(p)\left(b^\dagger(\vec p,\sigma)\,b(\vec p,\sigma)
  + d^\dagger(\vec p,\sigma)\,d(\vec p,\sigma)\right) + \text{infinite constant}
\tag{2.43}
$$

where the creation and destruction operators for electrons (b† , b) and positrons (d† , d),
as already clear from the work of Jordan and Wigner, must obey anticommutation
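relations to enforce Fermi–Dirac statistics:

$$
\{b(\vec p,\sigma), b^\dagger(\vec p\,',\sigma')\} = \delta_{\sigma\sigma'}\,\delta^3(\vec p-\vec p\,')
\tag{2.44}
$$

$$
\{d(\vec p,\sigma), d^\dagger(\vec p\,',\sigma')\} = \delta_{\sigma\sigma'}\,\delta^3(\vec p-\vec p\,')
\tag{2.45}
$$

$$
\{b(\vec p,\sigma), b(\vec p\,',\sigma')\} = \{d(\vec p,\sigma), d(\vec p\,',\sigma')\} = 0
\tag{2.46}
$$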

18 We have taken the liberty of introducing modern notation here: in Fock’s paper, the momentum-spin
pair p, σ is denoted by the single variable q, the electron (resp. positron) destruction operators b(p, σ)
(resp. d(p, σ)) are denoted φ(q, 1) (resp. φ(q, 2)), and the relativistic energy E(p) = √(p² + m²) is given as
a matrix element of the single-particle Dirac Hamiltonian.

The infinite constant appearing in the Hamiltonian in (2.43) is a consequence of
Fock having started with a Hamiltonian built from an electron field containing both
positive and negative electron states, and then having defined the positron destruction
operator d( p, σ) as the creation operator for a negative-energy electron state. Once
the Hamiltonian is obtained in the symmetrical form (2.43), this infinite-energy term
(basically the energy of the filled sea of negative-energy electrons in the Dirac hole
theory) is immediately discarded by Fock as physically irrelevant.
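The origin of the discarded constant can be seen already for a single pair of modes. The following sketch assumes one positive-energy mode b and one negative-energy mode c of the same |E|, represented by 4x4 matrices; the energy value, the matrix representation, and all names are hypothetical choices for illustration. Defining the positron destruction operator as d = c† turns E(b†b − c†c) into E(b†b + d†d) − E, with one constant −E per negative-energy mode, which adds up to the infinite constant when summed over all modes.

```python
def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lincomb(ca, a, cb, b):
    """Return ca*a + cb*b for equally sized matrices."""
    return [[ca * x + cb * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def kron(a, b):
    """Kronecker (tensor) product."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def dagger(a):
    """Hermitian conjugate."""
    return [[a[r][c].conjugate() for r in range(len(a))] for c in range(len(a[0]))]

def is_zero(a, tol=1e-12):
    return all(abs(x) < tol for row in a for x in row)

def number(op):
    """Occupation-number operator op^dagger op."""
    return matmul(dagger(op), op)

I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]
LOWER = [[0, 1], [0, 0]]  # maps the occupied state (0, 1) to the empty state (1, 0)

ENERGY = 1.3              # assumed |E| of both modes, arbitrary units
b = kron(LOWER, I2)       # positive-energy electron mode
c = kron(Z, LOWER)        # negative-energy electron mode (anticommutes with b)
d = dagger(c)             # positron destruction = creation of a negative-energy electron
IDENTITY = kron(I2, I2)

h_sea = lincomb(ENERGY, number(b), -ENERGY, number(c))
h_symmetric = lincomb(ENERGY, number(b), ENERGY, number(d))
h_fock = lincomb(1, h_symmetric, -ENERGY, IDENTITY)

print(is_zero(lincomb(1, h_sea, -1, h_fock)))  # the two forms agree

# Basis index = 2*n_b + n_c, so index 1 is "no electron, negative-energy mode filled".
FILLED_SEA = 1
print(h_sea[FILLED_SEA][FILLED_SEA], h_symmetric[FILLED_SEA][FILLED_SEA])
```

In the state with the negative-energy mode filled, the original form assigns the energy −E, while the symmetric operator E(b†b + d†d) assigns zero: this state is the vacuum of the electron and positron number operators.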
The form of the remaining operator is completely transparent: the energy of a
system of a finite number of free electrons and positrons is given simply by multiplying
the possible energies by the number operators b† b, d† d for electrons and positrons,
with the energy of the vacuum state (no electrons or positrons) automatically zero. In
particular, no primacy of place is given to either electrons or positrons in this approach,
both types of particle appearing on an absolutely equal footing in the formalism. With
the benefit of hindsight we now know, of course, that this was exactly the right attitude
to adopt: the apparent preference of Nature for electrons over positrons in the world
around us being a purely historical accident occasioned by the presence of a tiny
CP violation (namely, a breaking of the particle–antiparticle symmetry) in the early
Universe.
The free Hamiltonian H0 in (2.43) preserves separately the number of electrons
and positrons in any state. However, as Fock points out, the addition of interaction
terms involving odd operators (i.e., those in which electron and positron creation
or destruction operators appear multiplied, corresponding in the Dirac theory to
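transitions between positive- and negative-energy electron states) such as

$$
H_u = \sum_{\sigma\sigma'}\int d^3p\,d^3p'\,
  \left(U(\vec p,\sigma;\vec p\,',\sigma')\,b^\dagger(\vec p,\sigma)\,d^\dagger(\vec p\,',\sigma') + \text{h.c.}\right)
\tag{2.47}
$$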

(here h.c. denotes “hermitian conjugate”) while conserving electric charge (as electrons
and positrons are created or destroyed in tandem by the operator Hu ), necessarily
results in a theory in which the number of particles (either electrons or positrons)
is not conserved. The importance of these prescient remarks would soon become
clear when it became apparent that the gauge-invariant treatment of the coupling
to electromagnetism involved a four-vector charge current j μ (x) containing exactly
odd terms of this sort (in addition to even terms conserving separately electron and
positron number).
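Fock's observation about odd operators can be checked in the smallest possible setting. The sketch below assumes a single electron mode b and a single positron mode d, with an arbitrary complex coupling U standing in for the function U(p, σ; p′, σ′) of (2.47); the matrices and names are hypothetical. The pair term commutes with the charge operator b†b − d†d (in units of the electron charge) but not with the total particle number b†b + d†d.

```python
def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lincomb(ca, a, cb, b):
    """Return ca*a + cb*b for equally sized matrices."""
    return [[ca * x + cb * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def kron(a, b):
    """Kronecker (tensor) product."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def dagger(a):
    """Hermitian conjugate."""
    return [[a[r][c].conjugate() for r in range(len(a))] for c in range(len(a[0]))]

def is_zero(a, tol=1e-12):
    return all(abs(x) < tol for row in a for x in row)

def commutator(a, b):
    return lincomb(1, matmul(a, b), -1, matmul(b, a))

I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]
LOWER = [[0, 1], [0, 0]]

b = kron(LOWER, I2)   # electron destruction operator
d = kron(Z, LOWER)    # positron destruction operator (anticommutes with b)

electrons = matmul(dagger(b), b)
positrons = matmul(dagger(d), d)
charge = lincomb(1, electrons, -1, positrons)       # electrons minus positrons
particle_number = lincomb(1, electrons, 1, positrons)

U = 0.4 + 0.2j                                      # assumed coupling strength
pair_creation = matmul(dagger(b), dagger(d))
h_u = lincomb(U, pair_creation, U.conjugate(), dagger(pair_creation))

print(is_zero(commutator(h_u, charge)))           # charge is conserved
print(is_zero(commutator(h_u, particle_number)))  # particle number is not
```

The same bookkeeping applies mode by mode to the full operator (2.47): each term creates or destroys one electron together with one positron, so the difference of the two numbers is unchanged while their sum changes by two.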
In retrospect, the advantages of a charge symmetric quantum electrodynamics
should certainly have become completely manifest after the appearance of the paper
of Pauli and Weisskopf in 1934 (Pauli and Weisskopf, 1934), in which a fully
gauge-invariant theory of massive charged scalar (i.e., spinless) particles coupled
to electromagnetism was written down classically and then subjected to canonical
quantization à la Heisenberg and Pauli. Thus, temporarily switching off the coupling to
the electromagnetic field (and setting $\hbar = c = 1$), one starts with a free Hamiltonian for
the scalar field ψ (where for c-number fields the † means simply complex conjugation)

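$$H_0 = \int \left\{ \frac{\partial\psi^\dagger}{\partial t}\frac{\partial\psi}{\partial t} + \vec\nabla\psi^\dagger\cdot\vec\nabla\psi + m^2\psi^\dagger\psi \right\} d^3x \tag{2.48}$$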
and then (via the usual identification of conjugate momentum fields) introduces
quantization via the equal-time commutation relations

$$[\pi(\vec x,t),\psi(\vec x\,',t)] = -i\delta^3(\vec x-\vec x\,'), \qquad \pi \equiv \frac{\partial\psi^\dagger}{\partial t} \tag{2.49}$$
This theory, with fields satisfying a Klein–Gordon equation (which follows from
(2.48) and (2.49)) with classical solutions of both negative and positive energy,
provided Pauli and Weisskopf with a clear analog of the problems in electron theory
which led Dirac to the desperate expedient of a negative-energy sea: with the crucial
difference that the absence of an exclusion principle made the notion of viewing the
physical vacuum as a state with all negative-energy states filled (each one infinitely
many times, as we are dealing with bosons here!) even more manifestly absurd than
in the fermionic case.
Fortunately, Pauli and Weisskopf were able to show that the quantized version
of the theory possessed a perfectly sensible interpretation, provided the particles and
antiparticles of the theory were put on a completely equal footing (as in the work of
Fock, which, however, is not referenced in (Pauli and Weisskopf, 1934)). A Fourier
expansion of the scalar field $\psi(\vec r, 0)$ (at time zero), incorporating the commutation
relations (2.49), and working in a box of volume V so that the allowed momenta are
discrete, gives (with some slight modifications of notation to accommodate modern
taste)
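$$\psi(\vec r,0) = \frac{i}{\sqrt V}\sum_{\vec k}\frac{1}{\sqrt{2E(k)}}\left(a(\vec k)e^{i\vec k\cdot\vec r} - b^\dagger(-\vec k)e^{-i\vec k\cdot\vec r}\right) \tag{2.50}$$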

where $a(\vec k)$ (resp. $b(\vec k)$) are now interpreted as destruction operators for a particle
of spatial momentum $\vec k$ (resp. an antiparticle of momentum $-\vec k$) and the hermitian
conjugate operators are the corresponding creation operators. The Hamiltonian (2.48)
then becomes (cf. Fock’s (2.43))

$$H_0 = \sum_{\vec k} E(k)\left(a^\dagger(\vec k)a(\vec k) + b^\dagger(\vec k)b(\vec k) + 1\right) \tag{2.51}$$

with the infinite term $\sum_{\vec k} E(k)\cdot 1$ interpreted as a vacuum zero-point energy which
“can be deleted in all applications”. Once this is done, the vacuum, with zero energy, is
simply the state $|0\rangle$ annihilated by $H_0$ via $a(\vec k)|0\rangle = b(\vec k)|0\rangle = 0$ for all $\vec k$: the noisome
negative-energy sea is simply banished from the theory. The divergenceless four-vector
$J_\mu(x) \equiv i\left(\frac{\partial\psi^\dagger}{\partial x^\mu}\psi - \psi^\dagger\frac{\partial\psi}{\partial x^\mu}\right)$ (with $\partial^\mu J_\mu = 0$ following from the Klein–Gordon equations
of the free theory) then leads in the usual way to a conserved charge operator
 
$$Q \equiv \int J_0(\vec r,t)\,d^3r = \sum_{\vec k}\left(a^\dagger(\vec k)a(\vec k) - b^\dagger(\vec k)b(\vec k)\right) \tag{2.52}$$

making clear the identification of the a and b operators as destruction operators for particles
of opposite charge (but identical mass: the energy function $E(k) = \sqrt{k^2 + m^2}$ in
(2.51) involves the same mass m throughout). The coupling to the electromagnetic field
(via the usual minimal coupling replacement $\partial_\mu\psi \to (\partial_\mu - ieA_\mu)\psi$ in the Lagrangian,
where $A_\mu$ is the electromagnetic four-vector potential) can be carried through in
a straightforward way. One arrives at an interacting theory in which (a) charge
conservation (and the vanishing of the four-divergence $\partial^\mu J_\mu$) is still exact, and (b)
new photon-mediated pair-creation and annihilation processes appear in the theory,
exactly of the sort expected in the Dirac hole theory from transitions between positive-
and negative-energy electron states (but, finally, without the need for an invisible
infinite background of charged particles!).
In hindsight, the advantages of a second-quantized formalism in which electrons
and positrons are treated symmetrically seem so compelling that it is difficult to
understand the persistence of the hole-theory perspective years after the works of
Fock and Pauli–Weisskopf discussed above. Nevertheless, the hole-theory point of view
remained prominent even up to the late 1940s, and the troublesome charge and mass
divergences which would undermine the confidence of many of the early practitioners
of quantum electrodynamics first made their appearance in the context of calculations
performed on the basis of a vacuum consisting of an invisible filled sea of negative-
energy electrons. By 1930, divergent field-theoretic quantities had already made their
appearance in the form of the zero-point energy of the electromagnetic field and the
infinite sea of negative-energy electrons, as well as in the linear divergence in the self-
energy of the electron encountered by Heisenberg and Pauli. In his presentation at
the 1933 Solvay conference (Dirac, 1933), Dirac pointed out that the alteration in the
charge density of the background sea of filled electron states induced by the insertion
of a test charge could be interpreted as a polarizability of the vacuum, leading to
an effective screening of the bare test charge by a factor $\left(1 - \frac{2\alpha}{3\pi}\ln\frac{\Lambda}{mc}\right)$,19 where α is
the fine-structure constant and Λ is a momentum cutoff which Dirac assumed should
correspond to the inverse electron Compton wavelength, above which the theory was
presumably unreliable. A perturbative calculation involving an intermediate state in
which a negative electron changes state—necessarily to a positive-energy state, as all
other negative-energy states are filled—corresponds in the Fock point of view to the
appearance of a virtual electron–positron pair, so that the screening can alternatively
be viewed as due to the preferential orientation of these virtual dipoles with respect
to the applied field, much as in the classical theory of polarization. Dirac made clear
that the “observed” charges measured on electrically charged particles necessarily
differed, as a result of this polarization of the vacuum, from the “true” charges carried
by these particles. This observation clearly contains the germ of the idea of charge
renormalization, and more generally the realization that physically observed proper-
ties may—indeed must—contain built-in modifications as a consequence of radiative
interaction effects, necessarily complicating the interpretation of the “true” (or, in
modern terminology, “bare”) parameters appearing in the fundamental Hamiltonian
of the theory.
Further calculations of vacuum polarization in the mid-1930s, by Furry and
Oppenheimer (Furry and Oppenheimer, 1934), Peierls (Peierls, 1934), and Weisskopf
(Weisskopf, 1936), confirmed the presence of a logarithmic ultraviolet divergence in the

19 The order α correction appears to differ from the correct value by a factor of 2, but the reason for this
is unclear.

charge screening factor. However, it was generally accepted (perhaps “hoped” would
be more accurate here) that the screening of the “true” charges would operate in a
universal and field-independent way, and could therefore be consistently absorbed once
and for all into a uniform redefinition of electric charge. Of course, this maneuver had
the inevitable consequence of making the “true” charges appearing in the Hamiltonian
cutoff dependent (a situation which persists to the present in local quantum field
theories), and the unconscious presupposition that these “true” charges were somehow
physically meaningful could only be satisfied by the expedient, considered desperate
at the time, but now understood (cf. Chapter 16) to be an ineluctable feature of
any realistic field theory, of assuming an actual breakdown of the theory at some
high momentum, which would then cut off the divergent integrals and allow these
underlying charges to take finite values.
Another classic example of the dominance of the hole-theory language, even when
the results were equivalent to those obtainable via a second-quantized formalism with
only positive-energy electrons and positrons, is Weisskopf’s own calculation of the
divergent self-energy of the electron in quantum electrodynamics in 1939 (Weisskopf,
1939), which is phrased throughout in hole-theory language, despite the fact that the
subtractions performed to remove the unpleasant—and clearly unobserved—attributes
of the negative-energy sea precisely correspond to the rewriting in terms of electron
and positron operators suggested by Fock, suggesting that the lessons of second
quantization have, at least subliminally, been absorbed. The second-order (in the
electron charge e) correction to the energy of an electron at rest (with momentum
0 and spin σ) arises from a Coulomb self-energy term, corresponding to the diagonal
matrix element of the Coulomb energy in first order,

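$$\Delta E_{\rm Coul} = \langle \vec 0\sigma|H_{\rm coul}|\vec 0\sigma\rangle \tag{2.53}$$

$$H_{\rm coul} = \frac{1}{2}\int \frac{\rho(\vec r)\rho(\vec r\,')}{4\pi|\vec r-\vec r\,'|}\,d^3r\,d^3r' \tag{2.54}$$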

where $\rho(\vec r) \equiv J_0(\vec r)$ is the charge density operator (zeroth component of the four-vector
current $J_\mu$), and a transverse part $\Delta E_{\rm tr}$ coming from the appearance (to second order)
of the interaction Hamiltonian $H_{\rm tr}$ for the coupling of physical transverse photons to
the electron charge current $\vec J$ (cf. Section 15.3),

$$H_{\rm tr} = \int \vec J\cdot\vec A\,d^3r \tag{2.55}$$

From (2.53, 2.54) follows directly



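$$\Delta E_{\rm Coul} = \frac{1}{2}\int \frac{\tilde G(\vec\xi)}{4\pi|\vec\xi|}\,d^3\xi \tag{2.56}$$

$$\tilde G(\vec\xi) \equiv \int \langle \vec 0\sigma|\rho(\vec r-\tfrac{1}{2}\vec\xi)\,\rho(\vec r+\tfrac{1}{2}\vec\xi)|\vec 0\sigma\rangle\,d^3r \tag{2.57}$$
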
The charge density operator is given in terms of the quantized field for the electron
$\psi(\vec r)$ (for the present calculation of the electrostatic self energy, only the field at time
zero is needed) by $\rho(\vec r) = e\psi^\dagger(\vec r)\psi(\vec r)$. For practitioners of Dirac hole theory, this field
was written as a single sum (over discrete momentum modes, with box normalization,
where the single index q contains a spatial momentum q, a spin index σ and a
discrete energy sign index to distinguish between positive-energy and negative-energy
modes)

$$\psi(\vec r) = \sum_q \phi_q(\vec r)\,a_q \tag{2.58}$$

involving only destruction operators for either positive or negative-energy electrons.
Here, the c-number coefficient functions $\phi_q(\vec r)$ are single-particle solutions to the
free-particle Dirac equation. In second-quantized language, introducing destruction
operators for electrons $a_q \to b_{\vec q\sigma}$ (for the positive-energy modes in (2.58)) and positrons
$a_q \to d^\dagger_{\vec q\sigma}$ (for the negative-energy modes, destruction of one of which in the filled Dirac
sea corresponding to positron, i.e., hole, creation), the field is written instead

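$$\psi(\vec r) = \sum_{\vec q\sigma}\left(u_{\vec q\sigma}(\vec r)\,b_{\vec q\sigma} + v_{\vec q\sigma}(\vec r)\,d^\dagger_{\vec q\sigma}\right) \tag{2.59}$$
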

with the wavefunctions φq relabeled as u (resp. v) for positive (resp. negative) energy
solutions. The expression $\rho(\vec r) = e\psi^\dagger(\vec r)\psi(\vec r)$, taken literally, of course, contains an
infinite background charge in the vacuum due to the negative-energy sea. If we insert
(2.59) in this formula, and reorder the charge density via the “Klein–Jordan trick”
of normal-ordering, whereby all destruction operators are moved to the right of all
creation operators (with a change of sign for each interchange required, as we are
dealing with fermions), one finds
$$\rho(\vec r) = {:}\rho(\vec r){:} + \sum_{\vec q\sigma} e\,v^\dagger_{\vec q\sigma}v_{\vec q\sigma} = {:}\rho(\vec r){:} + \sum_{\vec q\sigma} e \tag{2.60}$$

with the divergent second term on the right-hand side the sum of the (negative)
charges for each electron in a filled negative-energy state. This term arises from
reordering terms of the form $d_{\vec q\sigma}d^\dagger_{\vec q\,'\sigma'} = \delta_{\vec q\vec q\,'}\delta_{\sigma\sigma'} - d^\dagger_{\vec q\,'\sigma'}d_{\vec q\sigma}$ appearing in $\rho(\vec r)$, using
the anticommutation relations (2.45). By contrast, the normal-ordered charge density
${:}\rho(\vec r){:}$ vanishes as physically required in the vacuum state, as the destruction (resp.
creation) operators are deployed on the right (resp. left) side of the expression,
and therefore encounter immediately the vacuum state $|0\rangle$ (resp. $\langle 0|$), giving zero.
The subtractions performed by Weisskopf amount to the replacement of the charge-density
operator $\rho(\vec r)$ in (2.54) by its normal-ordered version ${:}\rho(\vec r){:}$. When this
normal-ordered expression is used in the evaluation of the charge–charge correlation
function $\tilde G(\vec\xi)$ defined in (2.57), one finds, returning to infinite volume and continuous
momenta,20

$$\tilde G(\vec\xi) = e^2\int\frac{d^3q}{(2\pi)^3}\,\frac{m}{E(q)}\,e^{i\vec q\cdot\vec\xi} \tag{2.61}$$

20 The calculation is considerably simplified by using Wick expansion techniques described in Chapter
10; one also needs the appropriate normalization properties of the Dirac spinor functions uqσ , vqσ , defined
and discussed in Chapter 7. See Chapter 10, Problem 5.

On the other hand, if the positron contributions are ignored, one finds a contribution
to $\tilde G(\vec\xi)$ proportional to $\delta^3(\vec\xi)$ (due to the appearance of an extra factor of $\frac{E(q)}{m}$ in the
integral), which leads, when substituted into (2.56), to the classic linear divergence
in the Coulomb self-energy, as found previously by Heisenberg and Pauli. Inserting
(2.61) into (2.56), one finds instead the logarithmically divergent integral21

$$\Delta E_{\rm Coul} = \frac{e^2}{2}\int\frac{d^3q}{(2\pi)^3}\,\frac{m}{E(q)}\int\frac{e^{i\vec q\cdot\vec\xi}}{4\pi|\vec\xi|}\,d^3\xi = \frac{e^2}{2}\int\frac{d^3q}{(2\pi)^3}\,\frac{m}{E(q)\,q^2} \tag{2.62}$$

Weisskopf was also able to show in his 1939 paper (correcting an earlier error pointed
out by Furry in which he had found a quadratic divergence) that the other, transverse
contribution to the electron self-energy was likewise given in terms of a logarithmically
divergent integral, and that logarithmic divergences of this kind persisted to all higher
orders of perturbation theory (in the electron charge).
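The difference between the logarithmic divergence of (2.62) and the linear divergence obtained when the factor m/E(q) is absent can be checked numerically. The following is a minimal sketch, assuming a sharp cutoff at |q| = Λ and units in which m = 1; the function names are hypothetical. After the angular integration, the integral of d³q/(E(q)q²) reduces to 4π times the radial integral of dq/E(q), whose closed form is asinh(Λ/m), so it grows like ln(Λ/m) for Λ much larger than m.

```python
import math


def radial_integral(cutoff, mass, n=200_000):
    """Simpson's rule for the integral of dq / sqrt(q^2 + m^2) from 0 to cutoff."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of intervals
    h = cutoff / n

    def integrand(q):
        return 1.0 / math.sqrt(q * q + mass * mass)

    total = integrand(0.0) + integrand(cutoff)
    for i in range(1, n):
        weight = 4 if i % 2 else 2
        total += weight * integrand(i * h)
    return total * h / 3.0


def relativistic_integral(cutoff, mass):
    """Integral of d^3q / (E(q) q^2) up to the cutoff.

    The angular integration gives 4*pi and the q^2 of the measure
    cancels the explicit 1/q^2.
    """
    return 4.0 * math.pi * radial_integral(cutoff, mass)


def linear_integral(cutoff, mass):
    """Same integral with E(q) replaced by the constant m: grows linearly."""
    return 4.0 * math.pi * cutoff / mass


if __name__ == "__main__":
    for cutoff in (1e2, 1e3, 1e4):
        print(cutoff,
              relativistic_integral(cutoff, 1.0),
              4.0 * math.pi * math.log(cutoff),
              linear_integral(cutoff, 1.0))
```

Doubling the cutoff adds a fixed amount (4π ln 2) to the relativistic integral but doubles the linear one, which is the sense in which the inclusion of the positron contributions softens the divergence.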
The lack of manifest covariance in Weisskopf’s calculation,22 performed only for
an electron at rest, with the electromagnetic field in radiation gauge, concealed the
crucial fact that the lowest-order correction to the self-energy of an electron in motion,
with momentum $\vec p \neq 0$, would take the form $\delta E(p) \sim \frac{1}{2E(p)}\delta m^2$, with $\delta m^2$ a divergent
shift in the squared rest-mass, corresponding to the change $E(p) = \sqrt{\vec p^{\,2} + m^2} \to \sqrt{\vec p^{\,2} + m^2 + \delta m^2}$. In other words, the disturbing ultraviolet divergences appearing
in the electron self-energy were really divergences in the (Lorentz-invariant) rest-
mass, and could therefore be removed by a (admittedly divergent) redefinition—or
renormalization—of the “bare” mass m appearing in the defining Hamiltonian of
the theory. As we shall now see, this crucial realization, essential for a consistent
formulation of quantum electrodynamics, would come only after another decade had
passed, with the appearance of a fully covariant formulation, and more importantly, a
transparent calculational scheme vastly simplifying the otherwise onerous higher-order
calculations needed for a full understanding of the theory.
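The statement that a momentum-dependent energy shift of the form δE(p) ∼ δm²/(2E(p)) is equivalent to a shift of the rest mass can be verified directly for a small, finite δm². The sketch below is only an illustration under that assumption (in the actual theory δm² is divergent and must be regulated); the function names are hypothetical and units with c = 1 are used.

```python
import math


def energy(p, m_squared):
    """Relativistic energy E = sqrt(p^2 + m^2) in units with c = 1."""
    return math.sqrt(p * p + m_squared)


def exact_shift(p, m, delta_m2):
    """Exact change in E(p) when m^2 is replaced by m^2 + delta_m2."""
    return energy(p, m * m + delta_m2) - energy(p, m * m)


def first_order_shift(p, m, delta_m2):
    """Leading-order estimate delta_m2 / (2 E(p))."""
    return delta_m2 / (2.0 * energy(p, m * m))


if __name__ == "__main__":
    m, delta_m2 = 1.0, 1e-3
    for p in (0.0, 1.0, 10.0):
        e = energy(p, m * m)
        # E(p) * delta E(p) is the same for every p at leading order,
        # which is what identifies the shift as a change in the invariant mass.
        print(p, exact_shift(p, m, delta_m2), first_order_shift(p, m, delta_m2),
              e * first_order_shift(p, m, delta_m2))
```

The product E(p)·δE(p) equals δm²/2 for every momentum at this order, so a calculation restricted to p = 0 cannot reveal that the divergence sits in a Lorentz-invariant quantity.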
The wartime years 1939–45 brought an almost complete halt to research in funda-
mental issues in physics—such as the issues of consistency and calculability in quantum
electrodynamics—as the discovery of nuclear fission in 1939 redirected the attention
of the leading practitioners of subatomic physics to the urgent question of the military
applicability of the potentially vast (and perhaps accessible) stores of energy in the
nuclei of atoms. One important development in this period was Heisenberg’s introduc-
tion (Heisenberg, 1943a,b, 1944) of the concept of the S-matrix, which attempted to
replace a detailed microscopic prescription of the Hamiltonian dynamics of a quantum
system (with the concomitant appearance of apparently intractable divergences) with
a specification of only the phenomenologically “observable” aspects—in particular,
the unitary scattering (or S-) matrix encoding the amplitudes with which particular

21 Inserting a cutoff at $|\vec q| = \Lambda$ in the integral, with $\Lambda \gg m$, one finds $\int\frac{d^3q}{E(q)q^2} \sim 4\pi\int_m^\Lambda\frac{dq}{q} \sim 4\pi\ln(\Lambda/m)$.
22 In a footnote, Weisskopf admits that the direct calculation of the energy shift for electrons in motion
is complicated by ambiguities in the subtraction of quadratic divergences appearing at various stages of
his calculation. These concerns could, and would only, be put to rest with the development of a manifestly
Lorentz-covariant formulation of QED in the late 1940s.

incoming states resolve to particular outgoing states in a scattering event. This project
would receive an extended (but nonetheless finite) rebirth in the late 1950s and 1960s
in the S-matrix theory approach to strong interactions, as frustration with the inability
of field-theoretic methods to yield useful quantitative descriptions of strong-interaction
processes mounted. But the resolution of the divergence difficulties of quantum electrodynamics
in the late 1940s and early 1950s, together with the new beautiful and
powerful perturbation-theoretic apparatus which (owing to the small value of the
fine-structure constant, α ∼ 1/137) allowed ever more accurate calculation (in agreement
with ever more accurate experiments!) of measured quantities such as the hydrogen
fine-structure and electron magnetic moment, meant that the S-matrix would remain
a useful auxiliary, if not the central, quantity in quantum electrodynamics.
In June 1947, the National Academy of Sciences of the US sponsored a three-
day conference (25 participants, with an emphasis given to the younger generation
of theorists, who dominated the conference numerically, although, as we shall see,
it was the experimental results reported there by Rabi and Lamb that had a really
dramatic effect in stimulating theoretical progress) on “The Foundations of Quantum
Mechanics”, to be held on a small island, Shelter Island, at the tip of Long Island
in New York State. The three rapporteurs chosen to lead the discussions were
V. Weisskopf, J. R. Oppenheimer, and H. A. Kramers. For these physicists (and
in contrast to present usage) the term “foundations of quantum mechanics” meant
primarily, not quantum measurement theory, but rather the accumulated difficulties
and confusions of the preceding two decades in developing consistent quantum field
theories to describe electrodynamics (and to a lesser extent, the strong and weak
interactions). Weisskopf, in his abstract prepared and distributed in advance of the
meeting, was quite explicit about these failures: “Certain well known attempts have
been made in the last fifteen years to overcome a series of fundamental problems.
All these attempts seem to have failed at an early stage.”23 Weisskopf explicitly
mentions the need for obtaining finite results in a “reliable” way in the presence of
divergent contributions to the electron self-energy and to the vacuum polarization.
Kramers, in his abstract, also emphasized the need for a consistent treatment of
divergences in “hole theory”, and mentions in passing that the meson theory of
nuclear forces offered no respite from similar difficulties, but rather “brought new
divergence sorrows”.
The first day of the Shelter Island conference was primarily given over to the new
experimental results of Lamb and Rabi on hydrogen spectroscopy. In particular, the
discovery of an unequivocal deviation from the hydrogen fine structure given by the
Dirac theory, in which states of equal j (electron total, i.e., spin plus orbital, angular
momentum) and neighboring l (orbital angular momentum) were exactly degenerate,
was presented by Lamb in his measurement of the 2S–2P splitting (for $j = \frac{1}{2}$), which
corresponded to an energy of order $\alpha^3$ Rydbergs, in contrast to the Dirac formula for
the hydrogen relativistic fine structure, in which only even powers of the fine-structure
constant appear:

23 See Schweber, op. cit., for a detailed account of the run-up to Shelter Island and the discussions in the
conference itself.

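$$E_{nj} = mc^2\left\{\left(1 + \left(\frac{\alpha}{n - j - \frac{1}{2} + \sqrt{(j+\frac{1}{2})^2 - \alpha^2}}\right)^2\right)^{-1/2} - 1\right\}, \qquad j + \frac{1}{2} \le n, \quad n = 1, 2, 3, \ldots \tag{2.63}$$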

It was clear to all the participants that a reliable calculation of this new “Lamb shift”,
requiring the subtraction of the divergent self-energy corrections for the electron in
two distinct atomic bound states, would be an ideal test of the adequacy of any
proposed quantum electrodynamic theory, inasmuch as the desired finite-energy shifts
would have to be very carefully disentangled from the divergent electron self-energy
contributions which were sure to appear in any higher-order calculation.
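The degeneracy that Lamb's measurement contradicted is visible directly in (2.63): the energy depends only on n and j, so the 2S1/2 and 2P1/2 levels coincide exactly, while levels of different j are split at order α⁴mc². The sketch below evaluates the formula numerically. The numerical values of α and mc² are assumptions supplied here rather than taken from the text, and `dirac_energy` is a hypothetical helper name.

```python
import math

ALPHA = 1.0 / 137.035999  # fine-structure constant (assumed value)
MC2_EV = 510998.95        # electron rest energy in eV (assumed value)


def dirac_energy(n, j, alpha=ALPHA, mc2=MC2_EV):
    """Binding energy E_nj of (2.63), in the same units as mc2."""
    if j + 0.5 > n:
        raise ValueError("the formula requires j + 1/2 <= n")
    k = j + 0.5
    denominator = n - k + math.sqrt(k * k - alpha * alpha)
    x = (alpha / denominator) ** 2
    s = math.sqrt(1.0 + x)
    # (1 + x)^(-1/2) - 1 rewritten as -x / (s * (1 + s)) to avoid
    # the loss of precision in subtracting two nearly equal numbers.
    return mc2 * (-x / (s * (1.0 + s)))


if __name__ == "__main__":
    e_2s_half = dirac_energy(2, 0.5)   # 2S_{1/2}: n = 2, j = 1/2
    e_2p_half = dirac_energy(2, 0.5)   # 2P_{1/2}: same n and j, hence same energy
    e_2p_three_half = dirac_energy(2, 1.5)
    print("2S1/2 - 2P1/2 splitting (eV):", e_2s_half - e_2p_half)
    print("2P3/2 - 2P1/2 splitting (eV):", e_2p_three_half - e_2p_half)
    print("leading-order estimate (eV): ", MC2_EV * ALPHA ** 4 / 32.0)
```

The orbital angular momentum l never enters the function, so any nonzero 2S–2P splitting at fixed j has to come from physics outside the Dirac formula, which is why the Lamb shift served as such a clean test.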
On the second day of the conference, Kramers gave a very important talk in which
the essential conceptual content of mass renormalization was very clearly laid out,
albeit in the context of a purely classical theory of a non-relativistic electron interacting
with the electromagnetic field. Kramers emphasized—and Bethe’s calculation of the
Lamb shift just two days later, on the train home, showed that his arguments fell on
fertile ground—that the measured mass of the electron should be regarded as already
containing the divergent self-energy contributions, and that calculations should be
reorganized to express the desired physical observables in terms of this physical mass,
rather than the “intrinsic” or “bare” mass appearing in the Hamiltonian. During the
conference it was realized that the weak logarithmic divergence in the electron self-
energy would in fact cancel in the calculation of the energy difference ΔE between
the 2S1/2 and 2P1/2 states (as the electron in both states receives the same self-
energy correction)—a point emphasized by Weisskopf in his report on the divergence
difficulties of hole theory. On the final day of the conference, Feynman presented his
spacetime (in modern language, “path-integral”) approach to quantum mechanics,
which would lead within a year to his reformulation of quantum electrodynamics in
terms of Feynman diagrams and a set of explicitly relativistically covariant calcula-
tional rules.
The calculations by Bethe of the Lamb shift (immediately following the Shelter
Island conference) were performed for a non-relativistic electron, for which the self-
energy corrections are linearly rather than logarithmically divergent (as the momentum
integral in (2.62) becomes linearly divergent in the non-relativistic limit when we
replace $E(q) \to mc^2$), with the result that the energy shift ΔE calculated by Bethe
still contained a logarithmic divergence. Given that the correct relativistic treatment
converts the linearly divergent behavior of the integral in (2.62) to logarithmic once
q > mc, Bethe simply introduced a cutoff in his logarithmically divergent integral at
q ∼ mc, obtaining a finite result which agreed very well with Lamb’s measurements
(1040 MHz for the associated frequency, as compared to the observed 1000 MHz).
But the need for a relativistically correct calculation was urgently felt by all the
theorists now engaged in the hunt for a fully consistent quantum electrodynamics.
Bethe’s conversations with Feynman at Cornell in the next few months provided
a strong impetus for the latter’s development (Feynman, 1949a,b) of a manifestly
relativistic classical Lagrangian formalism, extended to quantum electrodynamics by
the sum-over-histories (path-integral) approach that Feynman had already developed
for ordinary quantum mechanics.

By the time of the Pocono conference in April 1948, relativistically invariant
formulations of the theory (operator rather than path-integral based) had been
independently developed by Julian Schwinger (who applied his methods to the calcu-
lation of the order α correction to the magnetic moment of the electron (Schwinger,
1948a,b)) and by Sin-itiro Tomonaga in Japan, who had already in 1943 produced
a relativistically invariant formulation of field theory, applicable to QED, and only
belatedly published in the West in 1946 (Tomonaga, 1946). Schwinger’s presentation
of his interaction-picture calculations at Pocono was widely regarded as a tour de force
of computational power and elegance, while Feynman’s much more intuitive spacetime
diagrams met with considerable suspicion. Within a few years, however (and especially
after the contributions of Dyson (Dyson, 1949), involving a graph-theoretic analysis
of the divergence structure of the theory to all orders), the theoretical community
had wholeheartedly embraced the Feynman approach, which came to be regarded
(much in the same way as Schrödinger’s wave mechanics superseded by sheer intuitive
transparency, as well as computational efficacy, the matrix mechanical approach) as a
far more usable technology in practical calculations than the highly formal Schwinger
approach.24
The full story of the final transition to a relativistically invariant quantum elec-
trodynamics, in which all divergences are absorbed into redefinitions of the physical
constants defining the theory, has received an excellent and comprehensive treatment
in the previously cited work of Schweber (Schweber, 1994), so we will leave our
historical account at this point. In the constructive (and ahistorical!) rebuilding of
quantum field theory which follows in the rest of the book, the need to maintain
Lorentz-invariance will be a central part of our development of the theory. The
other critical physical input—locality, or the absence of action at a distance (in the
relativistic sense, as transmission of physical effects over space-like separations)—will
also be inserted at a very early stage, and with it the particle–antiparticle symmetry
which, given the persistence of hole theory up to the late 1940s, required almost two
decades to be fully appreciated in the early history of quantum electrodynamics.

24 For excellent treatments of the genesis and later spread of the use of the Feynman diagrammatic
approach, see (Wüthrich, 2010) and (Kaiser, 2005).
3
Dynamics I: The physical ingredients
of quantum field theory: dynamics,
symmetries, scales

In the preceding chapters we have presented an all too brief review of some of the
critical episodes in the historical evolution of modern quantum field theory, up to
the point where renormalized covariant quantum field theory, epitomized by the
astonishing quantitative successes of quantum electrodynamics beginning in the late
1940s and continuing to the present day, reached a state of technical (if not conceptual)
completion. While this historical account is remarkably fascinating in its own right,
it runs somewhat at cross-purposes to the account of field theory which is the major
motivation of this book: namely, to present local quantum field theory as the natural,
and in a certain sense, almost inevitable framework arising from the application of
a few basic principles which lie at the very core of modern physical science. These
principles fall into three basic categories: those involved in the specification of the
dynamics of the sought-for microphysical theory, those concerned with the specification
of the symmetries of the theory, and finally, those principles having to do with the
behavior of the theory at different distance (or energy/momentum) scales. The rest
of the book is therefore organized with a view to exploring how different conceptual
strands in each of these three areas are woven together to produce the fabric of modern
quantum field theory. In contrast to the procedure followed in the first two chapters,
our approach for the rest of the book will be resolutely antihistorical: we shall introduce
the basic principles from which relativistic quantum field theory can be constructed
with little or no attention to the role played (explicitly or implicitly) by the invocation
of such principles in the actual historical record. In particular, the order in which topics
are discussed will have in general no connection to the actual historical sequence of
events discussed in the “Origins” section of the book. In this chapter the main themes
will be introduced, as far as possible, in a non-technical and qualitative (but, we hope,
illuminating) manner. The technicalities will ensue in proper course in the ensuing
chapters!
Local relativistic quantum field theory is based on three basic principles which
in combination lead to a powerful and elegant formalism which appears to allow a
remarkably accurate description (the so-called “Standard Model”) of at least three
of the four fundamental forces in Nature: the strong, weak, and electromagnetic
interactions.

1. Quantum mechanics: in a nutshell, the notion of linear superposition of amplitudes,
the probability interpretation of these amplitudes (squared), and unitary
evolution of the quantum state to implement the dynamics of the theory.
2. Special relativity: the symmetry of Lorentz-invariance. While many other sym-
metries play a crucial role in the quantum field theories of present importance, it
is the fundamental symmetry of invariance under the homogeneous Lorentz group
(more completely, under its inhomogeneous extension, the Poincaré group) which
gives quantum field theory many of its most characteristic features.
3. Clustering: insensitivity of local processes to the distant environment. Here, the
issue of the behavior of the theory at different distance scales (specifically, at long
distances) becomes the crucial constraining factor, leading, in combination with
the first two principles, to the characteristic features of relativistic quantum field
theory. The clustering property as such is not specific to relativistic theories: we
shall see that in the restricted context of such theories it is intimately linked to
(but not synonymous with!) a special property of “locality” (or “microcausal-
ity”), which will ensure, among other things, the “Einstein causality” of the
theory: namely, the absence of faster-than-light propagation of physical signals
or effects.

Items 1 and 2 above are, of course, the basic ingredients of modern physics. One
often encounters the assertion that quantum field theory arises from the marriage of
these two. In fact, the addition of special relativity to quantum mechanics leads to
no remarkably novel physics. Later we shall see that it is quite easy to write down
scattering amplitudes which fulfill both the requirements of unitarity and Lorentz
invariance. In a sense, such theories are just as unconstrained as non-relativistic
quantum theory prior to the addition of the principle of special relativity: e.g.,
Hamiltonians can be written in terms of essentially arbitrary covariant functions of
momenta, much as we are allowed to invent potential energy functions with abandon
in elementary non-relativistic quantum theory.
The characteristic phenomena of relativistic field theory only appear once we insist
on the third principle: clustering, i.e., the factorization of the S-matrix1 containing
the scattering amplitude for an arbitrary process as the product of two independent
amplitudes in the event of two spatially far separated scattering subprocesses. This
principle, which seems intuitively obvious, is surely a precondition for the success of
experimental science. It relieves us of the obligation to specify completely the state of
the entire world outside the laboratory prior to a correct interpretation of the results
of an experiment. However, the inclusion of item 3 greatly increases the complexity
of the resultant formalism, and means that it is no longer possible to write exactly
S-matrices satisfying all the desired properties in spacetimes of more than one space and
one time dimension. Rather, we must resort to various approximative schemes. This is
the bad part. On the other hand, the inclusion of the clustering requirement means,
as we shall see, that the construction of an appropriate Hamiltonian (dynamics) is

1 A precise definition of this object will be provided later, in Chapter 4.



now far more constrained. Arbitrary interaction potentials are no longer allowed: the
potential between far-separated electric charges is forced to be $1/r$ and not $r^{-3.5}$, etc.
Moreover, we are led ineluctably to the formalism of local quantum field theories, with
two immediate and unavoidable consequences:2
(a) an explanation of the existence of antimatter, with each particle having an
antiparticle of exactly equal mass and opposite additive quantum numbers,3 and
(b) the Spin-Statistics theorem, which clarifies one of the great mysteries of non-
relativistic quantum theory: the contrasting symmetry properties of the wavefunctions
of particles of integer (bosonic) versus half-integer (fermionic) spin.
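In schematic form, and in a notation assumed here purely for illustration (it is not the notation of the text), the clustering requirement introduced above can be written as follows. Let $\alpha \to \alpha'$ and $\beta \to \beta'$ be two scattering subprocesses separated by a spatial displacement $\vec{a}$. Then

\[
S_{\alpha'\beta',\,\alpha\beta} \;\longrightarrow\; S_{\alpha'\alpha}\, S_{\beta'\beta}
\qquad \text{as } |\vec{a}| \to \infty .
\]

The amplitude for the combined process reduces to the product of the amplitudes for the two subprocesses, so that probabilities measured in one laboratory do not depend on what happens in a far-distant one.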
A simple and intuitive picture of the emergence of antimatter as a natural conse-
quence of the basic physical ingredients of local field theory goes back to the work of
Feynman (Feynman, 1949b) on quantum electrodynamics in the late 1940s. The results
cited in item (a) above are special cases of the more general TCP theorem valid in
any local relativistic quantum field theory: the invariance of scattering amplitudes
under simultaneous interchange of particles with antiparticles (the “C” operation),
spatial reflection (or parity, the “P” operation), and time reversal (the “T” operation).
A beautifully simple argument to illustrate property (a) has been given by Weinberg
(Weinberg, 1972), although, as pointed out above, the underlying ideas were first
elucidated by Feynman. Consider a process such as that illustrated in Fig. 3.1, where
a positive pion ($\pi^+$) emitted by a proton (P) at spacetime point x travels to a neutron
(N) and is absorbed at spacetime point y. The idea of locality here amounts to the
statement that the neutron and proton interact via local emission and absorption
events of a third intermediary particle. On the one hand, the mutual indeterminacy

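[Figure: two diagrams of the exchange between P and N at spacetime points x and y, related by a boost, with a time axis; a π+ is exchanged in one frame and a π− in the other.]

Fig. 3.1 Frame-dependence of a simple exchange process.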

2 The primary character of these results is emphasized, and rigorous proofs given, in the seminal work
of Streater and Wightman (Streater and Wightman, 1978).
3 Recently, the charge-to-mass ratio of the antiproton and proton was measured by Penning trap
techniques to be equal to within about 1 part in $10^{12}$!

of position and velocity in quantum mechanics allows for the possibility that the
spacetime points x and y are actually space-like separated (“tunneling outside the
classical light-cone”, as it were).4
But if the interval between the emission and absorption is space-like, relativity tells
us that it is possible to find an inertial frame in which $y^0 < x^0$, i.e., spacetime point
y precedes spacetime point x, so the same event appears in this new frame as in the
figure on the right. An observer in the new frame will naturally interpret this as the
emission of a particle (at y) from the neutron, turning it into a proton. Such a particle
must be negatively charged, if we are to maintain charge conservation, but with the
same mass (as its kinematics is identical to that of the original $\pi^+$, with a spatially and
temporally reversed path). This particle is just the $\pi^-$, the antiparticle for the original
positively charged pion. This example contains in a nutshell the intimate association
between spatiotemporal reflection and particle–antiparticle interchange characteristic
of local theories and exemplified in the TCP theorem.
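The frame-dependence of the time ordering can be checked numerically. The following is a minimal sketch, assuming one space and one time dimension, units with c = 1, and purely hypothetical coordinates for the emission and absorption events (none of these numbers come from the text). It boosts a space-like separated pair of events and confirms that the order of emission and absorption is reversed while the invariant interval is unchanged.

```python
import math

def boost(event, v):
    """Lorentz-boost an event (t, x) along the x axis with velocity v (c = 1)."""
    t, x = event
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return (gamma * (t - v * x), gamma * (x - v * t))

def interval_squared(a, b):
    """Invariant interval (dt)^2 - (dx)^2 between two events; negative means space-like."""
    dt = b[0] - a[0]
    dx = b[1] - a[1]
    return dt * dt - dx * dx

# Hypothetical coordinates (illustrative only).
emission = (0.0, 0.0)    # point x: the proton emits the pion
absorption = (1.0, 2.0)  # point y: the neutron absorbs it; dt < dx, so space-like

# Any boost velocity larger than dt/dx = 0.5 reverses the time ordering.
v = 0.8
emission_boosted = boost(emission, v)
absorption_boosted = boost(absorption, v)

print("interval^2 (original frame):", interval_squared(emission, absorption))
print("interval^2 (boosted frame): ", interval_squared(emission_boosted, absorption_boosted))
print("y precedes x in boosted frame:", absorption_boosted[0] < emission_boosted[0])
```

For a time-like separated pair (dt > dx) no admissible boost velocity, |v| < 1, can reverse the ordering, which is why the reinterpretation in terms of an antiparticle is needed only for the space-like case.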
Insofar as the characteristic features of relativistic field theory require at a min-
imum the implementation of unitary quantum dynamics, Lorentz symmetry, and
locality, our exploration of the conceptual framework of field theory must begin with
a detailed examination of these physical ingredients. This will allow us to build up the
technical framework appropriate to the task of weaving together the desired physical
properties into a unified and consistent dynamical theory. This will be our object in
this second section of the book (entitled “Dynamics”, Chapters 3–11), where we shall
concentrate on the most general features shared by essentially all relativistic local
quantum field theories (which we henceforth denote “LQFTs”).
The necessary input from quantum theory will be reviewed in Chapter 4, which
will also contain a brief review of those results from quantum scattering theory needed
for later development of the theory. Chapter 5 describes the kinematics of relativistic
quantum mechanics, which incorporates the requirements of Lorentz symmetry (but
not yet the clustering principle), leading to an enormous class of interacting theories
almost all of which display bizarre and completely unphysical long-distance behavior.
A natural way to restrict the form of the interactions—by introducing local fields—
is introduced here and shown to incorporate the requirements of Lorentz-invariance
(though, as yet, with no proof of the desired clustering properties). The restriction
to physically sensible theories compatible with the clustering principle is effected
in Chapter 6, which shows how the huge class of quantum theories incorporating
special relativity can be systematically “pruned” to yield theories which display

4 The reader may be momentarily disturbed by the apparent superluminal transmission of influence by
the exchanged pion, which would seem to run counter to the requirement that physical signals/effects can
only be transmitted at most at the speed of light (“Einstein causality”). Here, as in the EPR paradox, it is
important to keep in mind that quantum theory is fundamentally a theory of the statistics of microscopic
processes, and that the formalism can (and does!) contain apparently non-local features on an event by
event basis, provided only that these features do not result in a measurable transmission of statistically
measurable properties at faster than light speed. In the quantum information community, this is referred to
as the “no-signalling” property of quantum mechanics. We shall see later, in Chapter 9, that measurements
performed in two space-like separated domains of spacetime are guaranteed to yield statistically independent
results, as a rigorous consequence of microcausality: i.e., the property of space-like commutativity of local
field operators used to construct the hermitian operators embodying the said measurements.

sensible long-distance behavior, purged of bizarre action-at-a-distance effects.5 This
chapter also explores the connection between the concepts of clustering and locality
(or microcausality), which are intimately related but not synonymous, the connection
between locality and the smoothness (i.e., analyticity) properties of amplitudes in field
theory, and the limitations of the localization concept with regard to particle states (as
opposed to fields).
The construction of local covariant fields which incorporate in a natural way the
requirements of Lorentz-invariance is the topic in Chapter 7, where we see that the
mysterious plethora of ad hoc field equations (Klein–Gordon, Dirac, Maxwell–Proca,
etc.) encountered in many texts arise inescapably from a straightforward analysis of
the unitary representations of the Poincaré group. Here we shall see that such fields
provide a convenient set of ingredients for the construction of local Hamiltonian energy
densities, describing particles of arbitrary mass and spin, with the important special
cases of massive particles with spin 0, 1/2, and 1 worked out in detail (the peculiarities
of the massless case are also discussed).
In Chapter 8 the question of the classical limit of quantum fields is examined:
we discuss the mutual measurability of field observables, and the types of states of
the field for which quasi-classical behavior is recovered. Here also, we discuss the
energetic stability of quantum field theories, and encounter the related phenomenon
of spontaneous symmetry-breaking for the first (but not the last!) time.
In Chapter 9 we come to grips for the first time with the intricacies of interacting
field theories: our emphasis again will be on very general aspects common to all
LQFTs. The basic concept here is that of the interpolating Heisenberg field in terms
of which the dynamics of the theory is specified, but which may be connected in a
variety of ways6 to the actual physical particle states. At this point a characteristic
(and for beginning students, frequently baffling) feature of LQFTs becomes apparent:
namely, the absence of any preferred, one-to-one connection between particles and
fields. The discussion of field theory in the Heisenberg picture is first carried out in
a “heuristic” fashion, ignoring some important mathematical fine points, and then
from a rigorous axiomatic point of view, starting with the Wightman axioms (spectral
and field) (Wightman, 1956), and proceeding, via the Haag–Ruelle formulation of
scattering theory, to the asymptotic formalism of Lehmann, Symanzik, and Zimmer-
mann (Lehmann et al., 1955). The latter is treated in some detail, as it is central to
subsequent discussions in this chapter of the nature of the state space of field theory
(as it depends on the presence of stable or unstable, elementary or composite particles
in the theory). In Chapter 9 we also discuss the spectral properties of field theory, and
the connection between the internal dynamics as specified by the interpolating fields
and the phenomenological content of the theory as encapsulated in the asymptotic
particle states and the S-matrix.

5 By “action-at-a-distance” effects, we do not refer here to the psychologically unsettling effects involving
non-local transitions in entangled wavefunctions, commonly referred to as the “EPR paradox”, but to
physically observable non-local phenomena: namely, those leading to superluminal transmission of physical
signals. See the preceding footnote.
6 For example, in the case of confinement, discussed in Chapter 19, the theory contains fields which do
not correspond to finite-energy particle states at all!

Chapter 10 provides an introduction to perturbative aspects of interacting field
theory: namely, the techniques appropriate for studying those aspects of LQFTs which
emerge from a formal asymptotic expansion in some parameter (typically, a coupling
constant) of the theory, both from an operatorial as well as a path-integral point of
view. Topological aspects of the graphical expansion of field theory amplitudes are
discussed in some detail, as well as the psychologically disturbing (but ultimately
irrelevant) Haag’s theorem. Some of the material here is, of course, to be found in
essentially all introductory texts on quantum field theory, but is reviewed in order
to lay the basis for important conceptual discussions later in the book. In particular,
the technology of perturbative expansions of field theory becomes essential in the
discussion in Chapter 11 of “non-perturbative” aspects of interacting field theory—
those phenomena which are not appropriately described by a finite number of terms
in the asymptotic expansion of a field theory amplitude in powers of coupling (or
interaction) strength. Such expansions are always asymptotic only, corresponding to
divergent series lacking a well-defined sum to all orders. One can therefore make
a useful distinction between phenomena which (a) require the summation of an
infinite number of terms extracted from a perturbative expansion, but (b) in which
the extracted terms form a convergent series, with a well-defined sum, which
represents in some precise sense the leading contribution to the desired amplitude
(the “perturbatively non-perturbative” processes discussed in Section 11.2), and pro-
cesses which require information beyond that available in even an infinite number
of terms in the perturbative expansion (the “essentially non-perturbative” processes
of Section 11.3). Non-relativistic threshold bound states in gauge theories provide
an example of the former, while quark confinement in four-dimensional quantum
chromodynamics is an example of the latter.
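The behavior of an asymptotic (divergent) expansion can be illustrated with a standard toy example that is not taken from the text: the ordinary integral $\int_0^\infty e^{-t}/(1+gt)\,dt$, whose formal expansion in powers of $g$ is $\sum_n (-1)^n n!\, g^n$. This is only an analogy for a field theory amplitude, and the value $g = 0.1$, the integration cutoff, and the function names below are assumptions chosen for illustration. The sketch compares partial sums with a numerical evaluation of the integral.

```python
import math

def partial_sum(g, order):
    """Partial sum of the formal series sum_{n=0}^{order} (-1)^n n! g^n."""
    return sum((-1) ** n * math.factorial(n) * g ** n for n in range(order + 1))

def exact_value(g, upper=50.0, steps=20000):
    """Simpson's rule for the integral of exp(-t)/(1 + g t) over [0, upper].

    The tail beyond `upper` is of order exp(-upper) and is neglected;
    `steps` must be even.
    """
    h = upper / steps
    def f(t):
        return math.exp(-t) / (1.0 + g * t)
    total = f(0.0) + f(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(k * h)
    return total * h / 3.0

g = 0.1  # illustrative "coupling"
exact = exact_value(g)
errors = [abs(partial_sum(g, n) - exact) for n in range(31)]
best_order = min(range(31), key=lambda n: errors[n])

print("numerical value of the integral:", exact)
print("order with smallest error:", best_order, "error:", errors[best_order])
print("error at order 30:", errors[30])
```

The error first decreases as terms are added, reaches a minimum at an order of roughly $1/g$, and then grows without bound: a finite number of terms gives an excellent approximation even though the series has no sum.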
The central role of symmetry and invariance principles in quantum theory gener-
ally, and in quantum field theory in particular, is now considered self-evident. One
tends to forget nowadays that the use of symmetry ideas, fully exploiting the natural
mathematical framework of group theory in order to express these ideas, was in the
early days of quantum mechanics (the late 1920s and 1930s) quite controversial. Many
atomic theorists of this period viewed the introduction of group-theoretical ideas as
a “Group-plague” (“Gruppenpest” in German): an excessively abstract formalism
quite irrelevant to the extraction of useful information about atomic spectra—an
attitude which pervades, for example, (Condon and Shortley, 1935). And indeed, to
the extent that a quantum dynamical system can be fully specified and then solved
(by either analytic or numerical methods), say by a complete diagonalization of the
underlying Hamiltonian and direct computation of all needed matrix elements of
relevant physical observables, symmetry arguments and group-theoretical reasoning
are, strictly speaking, unnecessary. Needless to say, this is rarely the case in non-
relativistic quantum theory, and essentially never the case for relativistic field theory.
Eugene Wigner, one of the pioneers in developing group theoretical methods in
quantum theory, has emphasized (Wigner, 1979a) the special efficacy of such methods
in quantum (as opposed to classical) physics. The increased complexity of the quantum
mechanical state space for even a point particle (an infinite-dimensional complex
Hilbert space) over the classical phase-space (a six-dimensional real space) together
with the linear structure of the quantum theory (allowing linear superposition of
quantum states) means that symmetry arguments can play a much more significant
role in the resolution of the dynamics in a quantum mechanical problem than in a
comparable classical problem.7 For example, the invariance of the Hamiltonian under
a symmetry operation means that it must commute with the operator generating the
transformation. Such a commutation property already implies a partial diagonalization
of the Hamiltonian, as matrix elements connecting states with different eigenvalues
of the symmetry generator must vanish. In some cases (e.g., the O(4) symmetry of
the hydrogen atom, or in completely integrable quantum systems) the symmetry is
sufficiently large to allow a complete resolution of the spectrum, or the dynamics. The
third section of this book (entitled “Symmetries”) will therefore examine some of the
important ways in which symmetry considerations are woven into the fabric of modern
quantum field theory.
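The partial diagonalization mentioned above follows in one line. As a sketch, assume the symmetry generator is a hermitian operator $Q$ (our notation) with eigenstates $Q|a\rangle = q_a |a\rangle$ and $Q|b\rangle = q_b |b\rangle$. If $[H, Q] = 0$, then

\[
0 = \langle a | [H, Q] | b \rangle
  = \langle a | H Q | b \rangle - \langle a | Q H | b \rangle
  = (q_b - q_a)\, \langle a | H | b \rangle ,
\]

so that $\langle a | H | b \rangle = 0$ whenever $q_a \neq q_b$. The Hamiltonian is therefore block-diagonal, with one block for each distinct eigenvalue of $Q$, and only the individual blocks remain to be diagonalized.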
There is an important distinction between spacetime symmetries, involving sym-
metry transformations affecting exclusively the universal underlying spatiotemporal
framework of field theories, and internal symmetries in which the symmetry transfor-
mations act in specific non-geometrical ways on the assorted fields in the theory.8 By
far the most important example of the first type of symmetry is Poincaré invariance—
item 2 in the discussion at the beginning of this Chapter—whereby the physical content
of special relativity is injected into relativistic field theory. The extension of this
symmetry to supersymmetry (SUSY), wherein the Poincaré group is enlarged to a
graded extension and spacetime to an enlarged “superspace” containing conventional
space and time as well as a Grassmannian component, should probably be included in
this category, purely on the basis of the extremely powerful formal analogy between
operations carried out in normal spacetime and the extended superspace of SUSY.
In Chapter 12, devoted to continuous spacetime symmetry, we develop the canonical
formalism of Lagrangian field theory as the natural solution to the problem of gener-
ating, in as painless a process as possible, Hamiltonian energy densities that lead to a
quantum field theory with fully Lorentz-invariant dynamics. The general connection
between symmetries and conservation laws, expressed in the form most natural to
field theory (Noether’s theorem) is also given here, together with its application to
the case of Poincaré symmetry, conformal symmetry, and global internal symmetries.
Chapter 12 concludes with an introduction to the extension of Poincaré symmetry to
the super-Poincaré algebra of supersymmetry.
Discrete spacetime symmetries (reflection or parity symmetry P, and time-reversal
symmetry T) are treated in Chapter 13, together with charge-conjugation
symmetry C (a symmetry under interchange of particles and antiparticles): despite
the “internal” appearance of the latter, the fact that we are dealing throughout with
local theories immediately introduces an intimate and unbreakable connection with
the P and T symmetries, making it natural to treat the C symmetry on the same

7 For example, the application of a symmetry transformation to a possible classical phase-space trajectory
will, of course, yield another possible classical trajectory, but does not directly assist in the explicit solution
of either: on the other hand, the fact that symmetries imply conservation laws and hence invariants of the
motion is clearly of great utility in resolving the dynamics in many important classical problems.
8 The terminology here has evolved over time: Wigner (Wigner, 1979b) speaks in the first case of
“classical” or “geometric” symmetries, and in the second, of “dynamical” or “non-geometric” symmetries.

footing with these. Our treatment of discrete spacetime symmetries concludes with
proofs of the TCP and Spin-Statistics theorems, using techniques of axiomatic field
theory introduced in Chapter 9.
Although discrete internal symmetries have played some role in constructing
models of elementary particle interactions beyond the Standard Model, by far the
most important internal symmetries have turned out to be the continuous ones
corresponding to transformations which form compact, finite-dimensional Lie groups.
Internal symmetries may either be “global”, where the dynamics is invariant under
an application of the same symmetry transformation to the field quantities at all
spacetime points, or “local”, in which the invariance persists even for spacetime
dependent transformations. Evidently, every local (or “gauge”) symmetry contains
ipso facto a global subsymmetry. However, the presence of a local symmetry has
extraordinarily deep ramifications for the dynamics of the theory displaying such a
symmetry, far beyond the comparatively simple implications of global symmetry.
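As the simplest illustration of this distinction, consider a complex field $\phi(x)$ and a U(1) phase symmetry (an example chosen here for concreteness):

\[
\text{global:}\quad \phi(x) \to e^{i\alpha}\,\phi(x), \qquad
\text{local:}\quad \phi(x) \to e^{i\alpha(x)}\,\phi(x) .
\]

Restricting $\alpha(x)$ to a constant recovers the global transformation, which is the sense in which every local symmetry contains a global subsymmetry. The converse fails because derivatives do not transform simply under the local transformation:

\[
\partial_\mu \phi(x) \to e^{i\alpha(x)} \left( \partial_\mu + i\,\partial_\mu \alpha(x) \right) \phi(x) ,
\]

so that a theory with derivative (kinetic) terms can be locally invariant only if additional structure is present to absorb the extra term.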
In Chapter 14 the role of global symmetries in LQFT is examined. We shall see that
exact global symmetries are rare, indeed, if one takes gravitational effects into account,
probably non-existent! Nevertheless, approximate global symmetries play an enor-
mously important role in modern field theory. The appearance of massless Goldstone
particles once an exact global symmetry is spontaneously broken is of enormous impor-
tance in modern field theory, and a proof of the Goldstone theorem embodying this
phenomenon is given in Section 14.2. Dynamical aspects of spontaneous symmetry-
breaking (SSB) are examined in Section 14.3, where we see that the essence of SSB
resides in the energetics of the theory in the infrared (i.e., at long distances).
The additional rich structure introduced when an LQFT displays a local gauge
symmetry is studied in Chapter 15, where we show how such symmetries require
a generalization of the canonical Lagrangian/Hamiltonian formalism discussed in
Section 12.3 in order to handle the constraints entailed by the presence
of local symmetries. The concept of a local symmetry is introduced in Section 15.1
with a simple example from classical mechanics, the lessons of which are extrapolated
to a wide class of constrained Hamiltonian systems in Section 15.2, where we intro-
duce the Dirac constrained Hamiltonian theory, and the Faddeev–deWitt functional
quantization method for such systems. The quantization of gauge theories using this
functional (path-integral) method is then explained, first using abelian gauge theory
in Section 15.3, where the technical complications are minimal. The extension to
non-abelian gauge theories is performed, again using path-integral methods (which
in this case are vastly more efficient than the canonical operator approach) applied
to the constrained Hamiltonian in Section 15.4, leading to the Feynman rules for
general (unbroken) non-abelian gauge theories. The existence of quantum anomalies
in the chiral currents of internal global symmetries is explored in Section 15.5, where
we see that the classical current conservation implied by Noether’s theorem may be
violated by quantum effects, yielding a non-vanishing divergence of the Noether current
explicitly proportional to Planck’s constant. The peculiar features of spontaneous
symmetry breaking in the presence of local (as opposed to global) gauge symmetry are
the subject of Section 15.6, where we explain the famous “Higgs phenomenon” in the
context of the electroweak sector of the Standard Model, and outline the derivation
of the Feynman rules for a general spontaneously broken local gauge theory.

The class of theories obtained from the three basic requirements discussed at the
beginning of this chapter turns out to display a very important feature: that of scale
separation, which will here be vaguely defined as the weak coupling of physics at
widely varying distance scales. This property, and its consequences, will be the central
theme of the fourth, and final, section of this book, entitled “Scales”. Much of the
confusion over troublesome “infinities” which plagued the development of interacting
field theory in the 1930s and 1940s, as described in Chapter 2, derived from a failure
to appreciate this characteristic property of LQFTs.
Unlike the situation in classically chaotic systems, where small perturbations at
very short distance scales can propagate rapidly up to much longer scales (the famous
“butterfly in China leading to a hurricane in the Atlantic” effect), LQFTs can be
“tailored” to accurately reflect the physics in some given range of length scales even if
we are completely ignorant of the “true microphysics” which obtains at much shorter
distances. This property is as indispensable to the theoretical success of field theory
as the cluster decomposition property is for the practicability of experimental science.
Our unavoidable ignorance—in a direct empirical sense—of the behavior of matter
at distance scales much smaller than the reciprocal of the highest experimentally
attainable particle momenta would be disastrous if we were dealing with theories in
which complicated (and unknown!) details of the interactions at very short distances
propagated up to the much longer scales presently accessible.
For example, there is no doubt that quantum gravity effects will drastically alter
the structure of spacetime on distance scales corresponding to the inverse of the Planck
mass, i.e., at distances below about $10^{-34}$ cm.9 Nevertheless, quantum electrodynamics
correctly predicts the anomalous magnetic moment of the electron to an astonishing
nine significant figures, all in terms of integrals extending in principle up to infinite
energy (or, in coordinate space, down to zero distance). Evidently, this remarkably
accurate result means that the long-distance behavior of quantum electrodynamic
systems must be insensitive to the detailed structure of the interactions at such very
short distances.
Obviously, theories in which unknown short-distance structure infects the behavior
of amplitudes at much longer scales would be as intractable from the point of view
of theoretical predictability in quantum physics as chaotic systems are in the classical
arena. So scale separation in the sense of the isolation of very short-distance physics
(from the behavior at accessible scales) is as crucial to the formulation of successful
theories as the isolation of long-distance effects entailed by clustering (item 3 above)
was for the correct interpretation of experimental results. In Chapter 16 we discuss
various aspects of scale separation: the critical role it plays in leading to quantitative
predictions at accessible energy scales, the introduction of regularization techniques
to quantify and simplify the study of scale sensitivity, the relevance of power counting
methods in LQFTs, the extremely important concept of effective Lagrangians, and
the classification of operators into relevant, marginal, and irrelevant on the basis of
their scaling behavior. At this stage, the point of view first introduced by Wilson

9 In theories with extra dimensions, the effective distance scale at which quantum gravity effects become
significant can in fact be much larger than this value.

in the 1970s, whereby the physics of a system is described in terms of an “effective
Lagrangian” which incorporates, via a set of “renormalization group” equations, the
behavior over a strictly limited range of distance (or energy) scales, moves to the
center of our discussion of quantum field theory.
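The classification of operators by their scaling behavior can be sketched with naive (classical) power counting. The example below assumes a single scalar field in d spacetime dimensions, with mass dimension (d - 2)/2, and operators built from a given number of fields and derivatives. It ignores quantum corrections to the scaling dimensions, and the function names are ours, introduced only for illustration.

```python
from fractions import Fraction

def scalar_field_dimension(d):
    """Classical mass dimension of a scalar field in d spacetime dimensions."""
    return Fraction(d - 2, 2)

def operator_dimension(n_fields, n_derivatives, d=4):
    """Classical mass dimension of an operator with the given field and derivative content."""
    return n_fields * scalar_field_dimension(d) + n_derivatives

def classify(n_fields, n_derivatives, d=4):
    """Naive classification: compare the operator dimension with the spacetime dimension d."""
    delta = operator_dimension(n_fields, n_derivatives, d)
    if delta < d:
        return "relevant"
    if delta == d:
        return "marginal"
    return "irrelevant"

# Illustrative operators in d = 4: (label, number of fields, number of derivatives).
examples = [
    ("phi^2", 2, 0),
    ("(d phi)^2", 2, 2),
    ("phi^4", 4, 0),
    ("phi^6", 6, 0),
]
for label, n_fields, n_derivatives in examples:
    print(label, operator_dimension(n_fields, n_derivatives), classify(n_fields, n_derivatives))
```

In this counting the coupling multiplying an operator of dimension delta carries mass dimension d - delta, so an irrelevant operator is suppressed by inverse powers of the high-energy scale, which is one way to see why its effects fade at long distances.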
Although, as just indicated, a general LQFT is only “designed” to represent
microphysics in a limited range of length scales (typically, only down to a lower
limit in distance, or up to a finite cutoff energy), there is a small subclass of local
field theories in which the insensitivity to short-distance structure can be pushed
up to very-high-energy scales indeed—in some cases, to infinity! Such theories are
usually called “renormalizable” quantum field theories. This subclass can further be
subdivided into “weakly” and “strongly” renormalizable theories (this is my language,
not to be found in standard texts!). In weakly renormalizable theories, the insensitivity
to short-distance structure of the interacting theory at arbitrarily small distances is
valid only within the context of perturbation theory (an asymptotic expansion in the
interaction strength in the theory), but fails when the full, non-perturbatively defined
theory is considered. A famous example is the self-coupled $\lambda\phi^4$ interacting scalar field
theory, to which we shall later return many times. In strongly renormalizable theories,
the insensitivity to short-distance structure at arbitrarily short scales is valid even
non-perturbatively: the effective field theory in such cases could in principle (ignoring
inevitable quantum gravitational effects!) be regarded as a correct microphysics down
to arbitrarily short distances, without any inconsistencies appearing in the quantum
amplitudes. There appears to be only a single known example of a strongly renormal-
izable theory in this sense (in 3+1 spacetime dimensions): non-abelian gauge theory.
This should not particularly worry us: as mentioned above, quantum gravity effects
necessarily obliterate the Minkowski structure of spacetime assumed (item 2) in the
whole construction of LQFTs anyway, once we reach distances on the order of the
Planck length, and a completely new type of theory must emerge at that point. It is
best to think of renormalizable theories (both kinds) just as LQFTs with a particularly
weak (logarithmic) coupling between low and high energies (or distances). In fact, it
can be shown (cf. Section 17.4) that the low-energy “residue” of an arbitrary LQFT
is necessarily a renormalizable (possibly free) field theory. More than anything
else, this accounts for the historically central role played by renormalizable theories
since the development of quantum electrodynamics in the late 1940s.
The proper technical instrument for the understanding of the scale separation
features of LQFTs is called the renormalization group. This concept has found wide
applications not only in elementary particle theory, but in the modern theory of
critical phenomena10 in condensed-matter physics, where the importance of scale
separation can be seen directly from the existence of universal scaling laws for the long-
distance behavior of correlation functions independent of fine details of the microscopic
interactions in the system. In Section 16.4 we introduce the renormalization group in
its most general form, appropriate for discussing its implications for LQFTs viewed as
Wilsonian effective theories. In Chapter 17 the technical tools needed for the analysis
of the perturbative renormalizability of a specific LQFT are introduced, and a proof

10 The central role of the renormalization group in the understanding of second-order phase transitions
was first set forth in the seminal work of K. Wilson (Wilson, 1971; Wilson and Kogut, 1974).

of cutoff-insensitivity is given, both using traditional graphical methods (i.e., the
subtraction formalism of BPHZ (Zimmermann, 1969)), and from the point of view
of effective Lagrangian theory.
The final two chapters of the book introduce the reader to some important
aspects of the short-distance (Chapter 18) and long-distance (Chapter 19) structure
of quantum field theory. In Chapter 18 we explain one of the most fertile (from
the point of view of phenomenological impact) manifestations of scale separation in
field theory: the Wilson operator product expansion (OPE) which provides a precise
characterization of the short-distance asymptotics of field theory amplitudes in terms
of factorized products of “short-” and “long-”distance terms. The useful application
of the OPE in particular processes depends on the presence and structure of mass
singularities in the relevant amplitudes—a topic which we address in Section 18.2.
The role of the renormalization group in studying high-energy (or short-distance)
behavior is outlined in Section 18.3.
Aspects of the long-distance behavior of field theory are studied in Chapter 19.
In a theory with only massive fields, this behavior is essentially trivial: clustering
is exponentially rapid, with the inverse of the smallest mass providing a length
scale over which spatially separated processes decouple. The situation for theories
with massless fields is radically different. Here, we need to distinguish between two
important cases: when the massless field interpolates for a physical particle, and
alternatively, when massless fields are present in the underlying Lagrangian dynamics
but do not interpolate for physical particles. The former case corresponds to quantum
electrodynamics, where we have, as far as we know, an exactly massless photon (and
photon field). Indeed, the photon is the only massless particle for which we have any
empirical evidence. The specific problematic issues arising with respect to introducing
massless fields, charged particle states, and a well-defined S-matrix in this situation are
explored in Sections 19.1 and 19.2. The second case mentioned above, where massless
fields exist in the theory, but do not interpolate for physical particles, corresponds to
quantum chromodynamics (QCD) and the physical phenomenon of color confinement.
The massless gluon fields of this theory, as well as the massive quark fields, specify the
Lagrangian dynamics of the theory but do not interpolate for finite-energy asymptotic
states. This extraordinary behavior—apart from superconductivity, perhaps the most
amazing and counter-intuitive phenomenon to emerge in twentieth-century physics—is
explored in general terms in Section 19.3, where we introduce the basic concepts and
techniques of lattice gauge theory, and in more detail in a toy model where the physical
mechanism of confinement can be clearly exhibited using semiclassical arguments:
namely, three-dimensional gauge theory.
We close this chapter with a comment on the role of LQFTs in the context of the
focus in recent years on superstring theories as providing a possible framework for
an ultimate microphysics (or “Theory of Everything”). In the last twenty years the
attempt to develop a consistent quantum theory of gravity has led to the introduction
of string theories which incorporate two additional physical principles which have
recently attained central importance: supersymmetry (a symmetry between bosonic
and fermionic particles), and duality (a symmetry connecting the weak and strong
coupling sectors of the theory). The dynamics of such theories is fundamentally
different from that of local quantum field theories, but to the extent to which the
quantum amplitudes in a string theory display unitarity, Poincaré invariance, and
cluster decomposition (as everyone certainly expects), we can rest assured that these
amplitudes correspond at long distance scales to those derivable from some effective
LQFT. At the present time the LQFT believed to describe microphysics up to energies
of about one hundred GeV is called the “Standard Model”, and appears to describe
almost all (neutrino masses and the existence of a massive, stable dark-matter particle
are intriguing exceptions!) the known features of strong, weak, and electromagnetic
interactions in the experimentally accessible range. The dream of a TOE (“Theory
of Everything”)—a consistent microtheory including quantum gravity, and therefore
capable of accurately describing physics at (and beyond!) the Planck scale, and also
yielding the observed Standard Model at low energies—is the motivating force for the
study of superstrings and their descendants (M-theory, p-branes, etc.). However, our
ability to build up local field theory from just a few basic principles, which seem likely
to be conserved11 in any future theory, suggests that local quantum field theories will
continue to provide an indispensable conceptual framework for understanding the vast
majority of accessible microphysical processes.

11 From this point of view LQFTs may be the analog in physics of the “conserved core processes” in
Kirschner and Gerhart’s theory of facilitated biological evolution (Kirschner and Gerhart, 2005).
4 Dynamics II: Quantum mechanical preliminaries

At one level, quantum field theories can be regarded as a very special subclass of all
quantum theories: theories based on a kinematical structure consisting of a state space
which is typically an infinite-dimensional complex Hilbert space, and a dynamical
structure in which time-evolution is effected by a deterministic unitary transformation
of the state vectors determined by the linear operator representing the energy of the
system (the “Hamiltonian”).1
In addition, this theoretical scaffolding needs to be supplemented with the stochastic
postulate of quantum mechanical measurement theory: “detection” of a state
$|\Psi\rangle$ (via interaction with a suitable macroscopic measurement apparatus) given a
previously prepared state $|\Phi\rangle$ occurs with probability given by the absolute square of
the inner product of the (suitably normalized) state vectors: $|\langle\Psi|\Phi\rangle|^2$. From the vast
variety of possible quantum theories (distinguished by the structure of the Hilbert
space representing the particular system under study, as well as by the variety of
possible physically sensible Hamiltonians, measurable physical quantities, etc.) our
object in this book is to select the minuscule subset of theories in which the relativistic
invariance of special relativity is implemented exactly, and in which physical processes
localized in space-like separated regions are strictly independent (i.e., no faster-than-
light transmission of physically measurable effects). Our task in this chapter is to
review and assemble just those parts of the basic underlying quantum-mechanical
structure which will be critical in realizing these relativity and locality constraints in
the following chapters. This will also serve as a convenient opportunity to introduce the
reader to the particular notational idiosyncrasies of the author. We will begin with
a review of the basic operator formalism underlying standard quantum mechanics,
paying particular attention to dynamics (time evolution) and symmetries. Then we
turn to the reformulation of quantum dynamics as a sum over histories (the “path-
integral” approach) due to Feynman and Dirac, which has turned out to be of enormous
conceptual and technical utility in quantum field theory. Finally, we review those
aspects of quantum scattering theory which will be central in teasing out the intricate
physical content of field theory.

1 This text assumes that the reader is familiar with the basic formalism and technical apparatus of
non-relativistic quantum mechanics, at an advanced undergraduate or beginning graduate level.
Conventions and notation used throughout generally coincide with those of Gordon Baym’s excellent text
“Lectures in Quantum Mechanics” (Baym, 1990).
4.1 The canonical (operator) framework


The modern formulation of quantum mechanics evolved over a period of roughly
four years as a conceptual clarification and completion of the seminal papers of
Heisenberg (June 1925) and Schrödinger (January 1926). In Heisenberg’s formulation,
which after formal amplification by Born and Jordan came to be called “matrix
mechanics”, the physical observables of classical mechanics (position, momentum,
angular momentum, energy, and so on) were replaced by time-dependent matrices,
and the normal algebraic operations whereby these quantities (real valued numerical
functions of time in classical physics) are manipulated in classical theory were replaced
by the corresponding matrix operations. The concept of a physical state as a vector in
a Hilbert space is at most highly implicit in the founding papers of matrix mechanics,
but gradually emerged in the subsequent years as the transformation theory of
Dirac and Jordan took hold and was put on a rigorous mathematical basis by von
Neumann. In particular, the wave-mechanical approach of Schrödinger (which was
the original inspiration for the Dirac–Jordan transformation theory) made it natural
to associate the dynamical development of a quantum system with the evolution of
its state vector (or equivalently, the associated wavefunction) rather than with the
physical observables, as in the Heisenberg–Born–Jordan approach. The dual character
of quantum theory (involving both states and observables, which play different but
complementary roles) makes a certain fluidity in the representation of the dynamics
inevitable, as we shall now see.

4.1.1 Quantum dynamics: the Heisenberg, Schrödinger, and Dirac (interaction) pictures

In this section we are concerned only with describing the deterministic evolution of
quantum systems isolated from the macroworld—in particular the stochastic modifi-
cations of the state arising from “measurements”, i.e., interactions of the microsystem
with a macroscopic apparatus capable of registering macroscopically distinguishable
effects depending on the interaction with the microsystem, are not included in the
description of the time-evolution of the system. As in Heisenberg’s original matrix
mechanics, we may assign the entire time-development of the quantum system to
the operators corresponding to the physical observables, with the quantum state
of the system fixed once and for all, say by specifying the state at time t = 0
(since the evolution is deterministic, the specification of the state at any time suffices
to determine it at all other times, a situation with which Laplace would have been very
happy!). The time-evolution of an operator O associated with an arbitrary physical
observable is then given by

$$O_H(t) = e^{iHt/\hbar}\, O_H(0)\, e^{-iHt/\hbar} \tag{4.1}$$

where $H$ is the self-adjoint operator representing the energy of the system: the
“Hamiltonian”. The Planck constant $\hbar = h/2\pi$ will henceforth be set to unity (for the rest
of this book, with a few exceptions, natural units will hold sway: $\hbar = c = 1$). The
subscript H appearing in (4.1) reminds us that we are in the “Heisenberg picture” of
time development. In this picture, the quantum state of a particular system is a fixed
vector $|\alpha\rangle$ in the Hilbert space appropriate for the system in question.2 Taking the
time-derivative of the finite time-evolution (4.1) yields the commutation property

$$\frac{\partial O_H(t)}{\partial t} = i[H, O_H(t)] \tag{4.2}$$

To put some meat on these rather abstract bones, consider a spinless point particle
of mass m, described by non-relativistic kinematics, and moving on a one-dimensional
line (say, the x-axis). In this case the Hilbert space of states is just the linear space of
complex, Lebesgue square-integrable functions,

$$|\alpha\rangle \to \psi_\alpha(x), \qquad \int_{-\infty}^{+\infty} |\psi_\alpha(x)|^2\, dx < \infty \tag{4.3}$$

which is commonly denoted $L^2(\mathbb{R})$ (the $\mathbb{R}$ refers to the functions $\psi_\alpha$ being defined on
the entire real axis: if our particle were constrained to move in the interval $a < x < b$,
we would denote the corresponding Hilbert space $L^2(a, b)$). Of course, our linear
space needs an inner product to be a Hilbert space, so if $|\beta\rangle$ is another state vector,
representing the square-integrable function $\psi_\beta(x)$,

$$\langle\beta|\alpha\rangle = \int_{-\infty}^{+\infty} \psi_\beta(x)^*\, \psi_\alpha(x)\, dx \tag{4.4}$$

The Hilbert space $L^2(\mathbb{R})$ is separable (i.e., is spanned by a countable basis of
orthonormal square-integrable functions), so if $|n\rangle \to \psi_n(x)$, $n = 1, 2, 3, \ldots$ is such a
basis, each physical observable, represented in the theory by a self-adjoint operator
$O_H(t)$, can be completely specified3 at time $t$ by giving its matrix, i.e., the set of
numbers

$$O_{H\,nm}(t) = \langle n|O_H(t)|m\rangle, \qquad n, m = 1, 2, 3, \ldots \tag{4.5}$$

For the special case of systems such as the harmonic (or anharmonic) oscillators studied
in Heisenberg’s original work, the energy eigenstates of the Hamiltonian

$$H|n\rangle = E_n|n\rangle \tag{4.6}$$

form such a complete orthonormal basis (in other words, the spectrum of the Hamil-
tonian is completely discrete), and the matrix elements of any physical observable
O in this basis have a purely oscillatory time-dependence determined by the energy
differences between the states:

$$\langle n|O_H(t)|m\rangle = \langle n|e^{iHt}\, O_H(0)\, e^{-iHt}|m\rangle = e^{i(E_n - E_m)t}\, \langle n|O_H(0)|m\rangle \tag{4.7}$$

2 We will use the Dirac bra-ket notation throughout this book: see Baym, op. cit.
3 Contrary to assertions in many texts, this is true even for operators with a partially or fully continuous
spectrum: matrix mechanics is not restricted to situations where the spectrum is fully discrete! Of course,
in many, indeed most, cases the coordinate space representation of wave mechanics is technically more
convenient.
The discovery of matrix mechanics by Heisenberg was occasioned by his recognition of
the appearance in (4.7) of the appropriate time-dependence for emitted radiation in
atomic transitions: an electron in an atom transitioning from a state $|n\rangle$ to a state $|m\rangle$
emits electromagnetic radiation with frequency given by the Bohr condition
$\nu_{n\to m} = (E_n - E_m)/h$ (rather than at the frequencies given by Fourier analyzing the classical
motion in Bohr orbits, as predicted by the old quantum theory).
In the Schrödinger approach to quantum mechanics, the dynamical evolution of a
quantum system is incorporated in the state vector, which now evolves according to

$$|\alpha; t\rangle_S = e^{-iHt}\, |\alpha; 0\rangle_S \tag{4.8}$$

where the subscript $S$ indicates that we are in the Schrödinger picture. In this picture,
physical observables are represented by time-independent self-adjoint operators $O_S$. By
convention, states and observables in the Heisenberg and Schrödinger picture coincide
at time $t = 0$:

$$|\alpha; 0\rangle_S = |\alpha\rangle \tag{4.9}$$

$$O_S = O_H(0) \tag{4.10}$$

The expectation value of a Heisenberg observable in a Heisenberg state coincides with
the corresponding expectation value in the Schrödinger picture:

$${}_S\langle\alpha; t|O_S|\alpha; t\rangle_S = \langle\alpha|e^{iHt}\, O_S\, e^{-iHt}|\alpha\rangle = \langle\alpha|O_H(t)|\alpha\rangle \tag{4.11}$$
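The equivalence of the two pictures is easy to check numerically in a finite-dimensional toy model. The following Python sketch is an illustration added here, not part of the original development: the dimension, the random seed, and the helper names `random_hermitian` and `evolution_operator` are assumptions chosen for the example. It verifies the equality of expectation values (4.11) and the purely oscillatory energy-basis matrix elements (4.7) for a randomly chosen Hermitian $H$ and observable, with $\hbar = 1$.

```python
import numpy as np

def random_hermitian(dim, rng):
    """Return a random dim x dim Hermitian matrix (illustrative helper)."""
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (a + a.conj().T) / 2

def evolution_operator(hamiltonian, t):
    """Return exp(-i H t) from the spectral decomposition of H (hbar = 1)."""
    energies, vecs = np.linalg.eigh(hamiltonian)
    return (vecs * np.exp(-1j * energies * t)) @ vecs.conj().T

rng = np.random.default_rng(0)
dim, t = 5, 0.7
H = random_hermitian(dim, rng)      # toy Hamiltonian
O_S = random_hermitian(dim, rng)    # Schroedinger-picture observable
alpha = rng.normal(size=dim) + 1j * rng.normal(size=dim)
alpha /= np.linalg.norm(alpha)      # normalized state at t = 0

U = evolution_operator(H, t)
alpha_t = U @ alpha                 # Schroedinger state, as in (4.8)
O_H = U.conj().T @ O_S @ U          # Heisenberg operator, as in (4.1)

schrodinger_value = np.vdot(alpha_t, O_S @ alpha_t)
heisenberg_value = np.vdot(alpha, O_H @ alpha)

# Matrix elements in the energy eigenbasis, to compare with (4.7).
energies, vecs = np.linalg.eigh(H)
O0_energy = vecs.conj().T @ O_S @ vecs
Ot_energy = vecs.conj().T @ O_H @ vecs
phases = np.exp(1j * (energies[:, None] - energies[None, :]) * t)
```

The two expectation values agree to machine precision, and `Ot_energy` equals `phases * O0_energy` element by element, which is the content of (4.7).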

In both the Heisenberg and Schrödinger pictures, the time-evolution of the system
is treated exactly, i.e., with the full energy operator H of the system. However, it
is frequently the case—for quantum field theories, almost always the case—that the
exact dynamics is too complicated for an analytic solution to be available. A standard
tactic is then to split the full Hamiltonian $H$ into “free” ($H_0$) and “interaction” ($V$)
parts

$$H = H_0 + V \tag{4.12}$$

There are obviously an infinite number of ways in which such a split can be done,
but the split is only useful if (a) H0 generates an analytically simple dynamics, and
(b) the effects of V represent a quantitatively small “perturbation” on the evolution
induced by H0 . Then one can hope to obtain useful results by expanding the desired
physical quantities in powers of the “small” interaction V . In order to facilitate such an
expansion, Dirac introduced a third version of quantum-mechanical time-development,
which is now universally referred to as the “interaction picture”. In the interaction
picture, states and observables share the burden of carrying the time development of
the system. In particular, operators retain the time-development characteristic of the
Heisenberg picture, but only the free part H0 of the Hamiltonian is used:

$$O_{ip}(t) = e^{iH_0 t}\, O_S\, e^{-iH_0 t} \tag{4.13}$$

while the states evolve unitarily according to

$$|\alpha; t\rangle_{ip} = e^{iH_0 t}\, e^{-iHt}\, |\alpha\rangle \tag{4.14}$$

Once again, these choices ensure that expectation values of an observable are the same
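as those computed in (say) the Heisenberg picture:

\begin{align}
{}_{ip}\langle\alpha; t|O_{ip}(t)|\alpha; t\rangle_{ip} &= \langle\alpha|e^{iHt} e^{-iH_0 t}\, e^{iH_0 t} O_S e^{-iH_0 t}\, e^{iH_0 t} e^{-iHt}|\alpha\rangle \notag \\
&= \langle\alpha|e^{iHt}\, O_S\, e^{-iHt}|\alpha\rangle \notag \\
&= \langle\alpha|O_H(t)|\alpha\rangle \tag{4.15}
\end{align}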

From (4.14) it follows that time evolution within the interaction picture (say from
time $t_0$ to a later time $t$) is accomplished by the unitary operators

$$U(t, t_0) \equiv e^{iH_0 t}\, e^{-iH(t - t_0)}\, e^{-iH_0 t_0} \tag{4.16}$$

$$|\alpha; t\rangle_{ip} = U(t, t_0)\, |\alpha; t_0\rangle_{ip} \tag{4.17}$$

The unitary operator (4.16) also gives directly the transformation of the operators
from interaction to Heisenberg picture:

$$O_H(t) = U^\dagger(t, 0)\, O_{ip}(t)\, U(t, 0) \tag{4.18}$$

We will now derive a formal expression for $U(t, t_0)$ as an expansion in powers of
the interaction part of the Hamiltonian $V$. Here is as good a place as any to remark
(and we will return to this issue on several occasions later in the book) that this
expansion is in general (and in fact, in any interesting physical case) not a convergent
Taylor expansion, but at best an asymptotic expansion.4 Such an expansion is useful
only in those situations in which the contributions of the initial few terms in the series
to the physical quantity of interest decrease sufficiently rapidly to give a sufficiently
accurate estimate of the exact answer. For the time being we will ignore this issue and
show how to develop a formal expansion for the interaction-picture time-development
operator $U(t, t_0)$. First, observe that

\begin{align}
\frac{d}{dt} U(t, t_0) &= e^{iH_0 t}\, (iH_0 - iH)\, e^{-iH(t - t_0)}\, e^{-iH_0 t_0} \tag{4.19} \\
&= -iV_{ip}(t)\, U(t, t_0) \tag{4.20}
\end{align}

where

$$V_{ip}(t) \equiv e^{iH_0 t}\, V\, e^{-iH_0 t} \tag{4.21}$$

is simply the interaction part of the Hamiltonian transformed to the interaction
picture, and $U$ obviously satisfies the initial condition $U(t_0, t_0) = 1$. The differential
equation (4.20), together with the boundary condition $U(t_0, t_0) = 1$, is clearly
equivalent to the integral equation

4 For an introduction to asymptotic expansions, the short treatise of Erdelyi, “Asymptotic Expansions”
(Dover, 1956) is very useful.
$$U(t, t_0) = 1 - i\int_{t_0}^{t} V_{ip}(t_1)\, U(t_1, t_0)\, dt_1 \tag{4.22}$$

which can be straightforwardly iterated (by reinserting the right-hand side in the
integral) to yield the formal expansion

$$U(t, t_0) = \sum_{n=0}^{\infty} (-i)^n \int_{t_0}^{t} dt_1 \int_{t_0}^{t_1} dt_2 \cdots \int_{t_0}^{t_{n-1}} dt_n\, V_{ip}(t_1) V_{ip}(t_2) \cdots V_{ip}(t_n) \tag{4.23}$$

In fact, if we just differentiate (4.23) once with respect to $t$, the $t_1$ integral is
removed in each term, $t_1$ is replaced by $t$ in the rest, and an extra factor of $-iV_{ip}(t)$
appears in front of the original series. So the desired differential equation is satisfied.
The first (n = 0) term in the series is interpreted as 1, and all subsequent terms vanish
(the integrals collapse) for t = t0 , so the initial condition is also satisfied.
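For concreteness, the first two steps of the iteration of (4.22) can be written out explicitly. This worked step is added for illustration and uses only the definitions above:

\begin{align}
U(t, t_0) &= 1 - i\int_{t_0}^{t} dt_1\, V_{ip}(t_1) \left[ 1 - i\int_{t_0}^{t_1} dt_2\, V_{ip}(t_2)\, U(t_2, t_0) \right] \notag \\
&= 1 - i\int_{t_0}^{t} dt_1\, V_{ip}(t_1) + (-i)^2 \int_{t_0}^{t} dt_1 \int_{t_0}^{t_1} dt_2\, V_{ip}(t_1) V_{ip}(t_2)\, U(t_2, t_0) \notag
\end{align}

Setting $U(t_2, t_0) \to 1$ in the last term reproduces the terms of (4.23) through $n = 2$; substituting (4.22) once more for $U(t_2, t_0)$ generates the $n = 3$ term, and so on.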
The formula (4.23) is so important that we pause here to comment on its structure.
The operator Vip (t) depends in general on time in a highly non-trivial way (although
$V$, the original interaction part of the Hamiltonian, does not). This would not be so
if $H_0$ and $V$, the free and interacting parts of the Hamiltonian, commuted, but this is
(alas!) never the case in any interesting situation. Consequently, the order in which
the Vip operators appear in (4.23) is crucial, since Vip at one time will not in general
commute with Vip at another time. Note that the operators are arranged in order
of time, with the earliest operator at the right end and the latest at the left. The
temptation is irresistible to interpret the n-th term in (4.23) as corresponding to n
sequential interactions induced by Vip , with free propagation of the system in the
intervening time intervals.
Consider the $n = 2$ term in the series (4.23). The integration region is $t_0 \le t_2 < t_1 \le t$
(see Fig. 4.1). If we introduce a time-ordering symbol $T$ which reorders a product
of interaction-picture operators in a decreasing time sequence (left to right)

\begin{align}
T(V_{ip}(t_1) V_{ip}(t_2)) &\equiv V_{ip}(t_1) V_{ip}(t_2), \quad t_1 > t_2 \tag{4.24} \\
&\equiv V_{ip}(t_2) V_{ip}(t_1), \quad t_2 > t_1 \tag{4.25}
\end{align}

we can expand the region of integration so that both $t_1$ and $t_2$ run from $t_0$ to $t$. In
fact,

$$\int_{t_0}^{t} dt_1 \int_{t_0}^{t_1} dt_2\, V_{ip}(t_1) V_{ip}(t_2) = \frac{1}{2} \int_{t_0}^{t} dt_1 \int_{t_0}^{t} dt_2\, T(V_{ip}(t_1) V_{ip}(t_2)) \tag{4.26}$$

The factor of $\frac{1}{2}$ just compensates for the inclusion of the upper triangular region in
the figure, which contributes equally (by the reordering action of the $T$ symbol) to the
lower. This can obviously be generalized to the $n$th term in the series. We simply allow
all the integrations to go from $t_0$ to $t$, and compensate with a factor $\frac{1}{n!}$. A $T$ symbol
must be included to ensure that the operators are always in the proper time-sequence,
no matter what sector of the multi-dimensional integration region we happen to be
in. In other words,

[Fig. 4.1 Integration region for second-order term: the triangle $t_0 \le t_2 < t_1 \le t$ in the $(t_1, t_2)$ plane.]

$$U(t, t_0) = \sum_{n=0}^{\infty} \frac{(-i)^n}{n!} \int_{t_0}^{t} dt_1\, dt_2 \cdots dt_n\, T\{V_{ip}(t_1) \cdots V_{ip}(t_n)\} \tag{4.27}$$

This formula will play a central role in our discussion of scattering theory, both for
non-relativistic quantum systems later in this chapter, and for relativistic quantum
field theories, where it will afford us a simple criterion for understanding how Lorentz-
invariance can be guaranteed in the simplest field theories (cf. Chapter 5, Section 5).
The resemblance of (4.27) to an exponential series suggests the convenient notation:

$$U(t, t_0) = T\left\{ \exp\left( -i\int_{t_0}^{t} V_{ip}(\tau)\, d\tau \right) \right\} \tag{4.28}$$
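The content of (4.16), (4.23) and (4.28) can be made tangible with a small numerical experiment. The Python sketch below is an illustration under stated assumptions, not a construction from the text: the two-level model ($H_0 = \sigma_z/2$, $V = 0.3\,\sigma_x$), the time interval, the step count, and the helper names `expm_hermitian` and `V_ip` are all chosen for the example. It compares the exact operator (4.16) with (i) a time-ordered product of short-time factors, with later times standing to the left, and (ii) the Dyson series (4.23) truncated after the first and second order.

```python
import numpy as np

def expm_hermitian(h, s):
    """Return exp(-i s h) for a Hermitian matrix h."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(-1j * s * w)) @ v.conj().T

sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)
H0 = 0.5 * sigma_z          # "free" part (assumed toy model)
V = 0.3 * sigma_x           # "interaction" part (assumed toy model)
H = H0 + V
t0, t = 0.2, 1.5

def V_ip(s):
    """Interaction-picture potential (4.21): e^{i H0 s} V e^{-i H0 s}."""
    u0 = expm_hermitian(H0, s)          # e^{-i H0 s}
    return u0.conj().T @ V @ u0

# Exact interaction-picture evolution operator, definition (4.16).
U_exact = (expm_hermitian(H0, t).conj().T
           @ expm_hermitian(H, t - t0)
           @ expm_hermitian(H0, t0))

n_steps = 2000
dt = (t - t0) / n_steps
grid = t0 + (np.arange(n_steps) + 0.5) * dt   # midpoints of the time slices

# (i) Time-ordered exponential (4.28) as an ordered product of short steps.
U_ordered = np.eye(2, dtype=complex)
for s in grid:
    U_ordered = expm_hermitian(V_ip(s), dt) @ U_ordered   # later times on the left

# (ii) Dyson series (4.23) truncated at first and second order.
first = np.zeros((2, 2), dtype=complex)    # running integral of V_ip(t2) up to t1
second = np.zeros((2, 2), dtype=complex)   # nested integral over t0 <= t2 < t1 <= t
for s in grid:
    v = V_ip(s)
    second += (v @ first) * dt
    first += v * dt
U_first = np.eye(2) - 1j * first
U_second = U_first + (-1j) ** 2 * second

err_ordered = np.linalg.norm(U_ordered - U_exact)
err_first = np.linalg.norm(U_first - U_exact)
err_second = np.linalg.norm(U_second - U_exact)
```

For this weak coupling the ordered product reproduces `U_exact` up to discretization error, while the truncated series improves when the second-order term is included. Note that the truncations are not unitary; only the full series (or the ordered product) is.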

4.1.2 Propagators and kernels in quantum mechanics


As we saw in the preceding section, in the Heisenberg picture of quantum-mechanical
time evolution, all of the dynamics takes place at the operator level, with state vectors
fixed in time, and conventionally chosen to agree with the time-dependent state vectors
of the Schrödinger picture at time $t = 0$: thus, the variable $t$ appears in the Dirac
ket $|\alpha; t\rangle_S$ for a Schrödinger state, but not in the corresponding Heisenberg state
$|\alpha\rangle \equiv |\alpha; 0\rangle_S$. However, it is frequently more convenient to specify the Heisenberg
state of the system as a condition on the system at a general time. For example, we
may consider a non-relativistic particle moving in one dimension which is known to
be exactly localized at position $q_i$ at time $t_i$, and denote the corresponding Heisenberg
state by $|q_i, t_i\rangle$, with

$$q_H(t_i)\, |q_i, t_i\rangle = q_i\, |q_i, t_i\rangle \tag{4.29}$$

where $q_H(t)$ is the position operator for the particle in Heisenberg representation. The
connection of such states to the conventionally defined Heisenberg states (specified at
$t = 0$) follows immediately from (4.1):

$$|q_i, t_i\rangle = e^{iHt_i}\, |q_i\rangle \tag{4.30}$$

We distinguish between the implicit time-dependence of states $|q_i, t_i\rangle$ due to an initial
condition and the explicit time-dependence of the states in Schrödinger picture $|q_i; t\rangle_S$
by using a comma in the former and a semicolon in the latter case to separate the
time variable. The states (4.30) allow a simple and direct definition for the propagator
of a particle $K(q_f, t_f; q_i, t_i)$ as the amplitude for detecting the particle at position $q_f$
at time $t_f$ given that the particle was previously localized at $q_i$ at time $t_i$:

$$K(q_f, t_f; q_i, t_i) = \langle q_f, t_f|q_i, t_i\rangle = \langle q_f|e^{-iH(t_f - t_i)}|q_i\rangle \tag{4.31}$$

This result is clearly time-translation-invariant: the amplitude depends only on the elapsed
time $T = t_f - t_i$, so we may as well consider simply $K(q_f, T; q_i, 0)$, with no loss of
generality. The simple harmonic oscillator provides a concrete example: the Hamiltonian is5

$$H_{sho} = \frac{\mathbf{p}^2}{2m} + \frac{1}{2} m\omega^2 \mathbf{q}^2 \tag{4.32}$$
so the propagator satisfies the differential equation

$$i\frac{\partial}{\partial T} K(q_f, T; q_i, 0) = -\frac{1}{2m} \frac{\partial^2 K(q_f, T; q_i, 0)}{\partial q_f^2} + \frac{1}{2} m\omega^2 q_f^2\, K(q_f, T; q_i, 0) \tag{4.33}$$

with the initial condition

$$K(q_f, T; q_i, 0) \to \delta(q_f - q_i), \qquad T \to 0 \tag{4.34}$$

The solution in this case is well known:

$$K(q_f, T; q_i, 0) = \left( \frac{m\omega}{2\pi i \sin\omega T} \right)^{1/2} \exp\left( \frac{i m\omega}{2\sin\omega T} \left( (q_i^2 + q_f^2)\cos\omega T - 2 q_i q_f \right) \right) \tag{4.35}$$
The verification of (4.33) is a matter of some straightforward, if tedious, algebra. The
zero time limit, yielding the δ-function normalization of the position eigenstates, is a
more subtle matter, given the oscillatory behavior of the exponential in (4.35). This
is the first indication of technical difficulties which will persist (in highly amplified
degree!) in quantum field theory. Circumventing these problems leads to the imaginary-
time formulation, which we now discuss briefly.
The propagator K(qf , T ; qi , 0) in (4.35) is evidently an analytic function of the time
variable T (with essential singularities at T = nπ/ω). We may therefore analytically
continue it by the replacement T → −iT , leading6 to the Euclidean (or imaginary

5 We employ the standard device of bold-face notation to distinguish operators from c-numbers.
6 This rotation by 90 degrees in the complex plane of the time variable is called a “Wick rotation”.
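time) propagator $K_E(q_f, T; q_i, 0)$ defined as

\begin{align}
K_E(q_f, T; q_i, 0) &= \langle q_f|e^{-H_{sho} T}|q_i\rangle \notag \\
&= \left( \frac{m\omega}{2\pi \sinh\omega T} \right)^{1/2} \exp\left( -\frac{m\omega}{2\sinh\omega T} \left( (q_i^2 + q_f^2)\cosh\omega T - 2 q_i q_f \right) \right) \tag{4.36}
\end{align}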
The desired zero time limit is now easily demonstrated: as the time variable $T$ only
appears in the combination $\omega T$, in this limit $K_E$ for the harmonic oscillator coincides
with the Euclidean propagator for a free particle ($\omega = 0$):

\begin{align}
K_E(q_f, T; q_i, 0) &\to K_E^{(0)} \equiv \left( \frac{m}{2\pi T} \right)^{1/2} \exp\left( -\frac{m}{2T} (q_f - q_i)^2 \right), \quad \omega \to 0 \tag{4.37} \\
&\to \delta(q_f - q_i), \quad T \to 0 \tag{4.38}
\end{align}

where the δ-function limit is now apparent in the increasingly peaked Gaussians
(normalized to unity) appearing on the right-hand side of (4.37).
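The closed form (4.36) can be cross-checked against the spectral representation $\sum_n \psi_n(q_f)\psi_n(q_i)\, e^{-E_n T}$ with $E_n = \omega(n + \frac{1}{2})$, which follows from inserting energy eigenstates into $\langle q_f|e^{-H_{sho}T}|q_i\rangle$. The Python sketch below is an added illustration: the parameter values and the function names are assumptions, and the eigenfunctions are generated by the standard three-term recurrence for normalized Hermite functions. It also checks the free-particle limit (4.37) at very small $\omega$.

```python
import numpy as np

def oscillator_eigenfunctions(q, n_max, m, omega):
    """Values psi_0(q), ..., psi_{n_max-1}(q) of the normalized oscillator eigenfunctions."""
    x = np.sqrt(m * omega) * q
    psi = np.zeros(n_max)
    psi[0] = (m * omega / np.pi) ** 0.25 * np.exp(-0.5 * x * x)
    if n_max > 1:
        psi[1] = np.sqrt(2.0) * x * psi[0]
    for n in range(1, n_max - 1):
        # Recurrence for normalized Hermite functions.
        psi[n + 1] = (np.sqrt(2.0 / (n + 1)) * x * psi[n]
                      - np.sqrt(n / (n + 1)) * psi[n - 1])
    return psi

def euclidean_kernel(qf, qi, T, m, omega):
    """Closed form (4.36) for the Euclidean oscillator propagator."""
    s, c = np.sinh(omega * T), np.cosh(omega * T)
    prefactor = np.sqrt(m * omega / (2 * np.pi * s))
    return prefactor * np.exp(-m * omega / (2 * s) * ((qi**2 + qf**2) * c - 2 * qi * qf))

def spectral_kernel(qf, qi, T, m, omega, n_max=60):
    """Truncated sum over energy eigenstates of psi_n(qf) psi_n(qi) exp(-E_n T)."""
    n = np.arange(n_max)
    weights = np.exp(-omega * (n + 0.5) * T)
    return np.sum(oscillator_eigenfunctions(qf, n_max, m, omega)
                  * oscillator_eigenfunctions(qi, n_max, m, omega) * weights)

def free_kernel(qf, qi, T, m):
    """Free-particle Euclidean propagator, the right-hand side of (4.37)."""
    return np.sqrt(m / (2 * np.pi * T)) * np.exp(-m * (qf - qi) ** 2 / (2 * T))

m, omega, T, qf, qi = 1.0, 1.3, 0.8, 0.4, -0.7
closed_form = euclidean_kernel(qf, qi, T, m, omega)
spectral_sum = spectral_kernel(qf, qi, T, m, omega)
small_omega = euclidean_kernel(qf, qi, T, m, 1e-4)
free_value = free_kernel(qf, qi, T, m)
```

In imaginary time the spectral sum converges exponentially fast, which is one practical reason the Euclidean formulation is better behaved than its oscillatory real-time counterpart.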
The Euclidean propagators $K_E$ are sometimes referred to as “heat kernels”. Indeed,
the free-particle propagator $K_E^{(0)}$ satisfies the one-dimensional diffusion equation

$$\frac{\partial^2 K_E(q_f, T; q_i, 0)}{\partial q_f^2} = 2m\, \frac{\partial K_E(q_f, T; q_i, 0)}{\partial T} \tag{4.39}$$

corresponding to diffusion in one dimension with diffusion constant $\kappa = \frac{1}{2m}$. Later in
this chapter we will show how mathematically well-defined integral representations of
such heat kernels (referred to generically as “Feynman–Kac” formulae) lead to the
path integral formulation of quantum theory.
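A minimal finite-difference check of the diffusion equation (4.39) for the free kernel of (4.37) can be sketched as follows. This is an added illustration; the sample point, the mass, and the step size `h` are arbitrary assumptions, and the kernel is written as a function of the separation $q = q_f - q_i$.

```python
import numpy as np

def free_kernel(q, T, m):
    """Free Euclidean propagator of (4.37) as a function of q = q_f - q_i."""
    return np.sqrt(m / (2 * np.pi * T)) * np.exp(-m * q**2 / (2 * T))

m, T, q, h = 1.7, 0.9, 0.35, 1e-3

# Central differences for the two sides of (4.39).
d2_dq2 = (free_kernel(q + h, T, m) - 2 * free_kernel(q, T, m)
          + free_kernel(q - h, T, m)) / h**2
d_dT = (free_kernel(q, T + h, m) - free_kernel(q, T - h, m)) / (2 * h)
residual = d2_dq2 - 2 * m * d_dT

# The kernel is a Gaussian normalized to unity, as stated below (4.38).
grid = np.linspace(-20.0, 20.0, 40001)
norm = np.sum(free_kernel(grid, T, m)) * (grid[1] - grid[0])
```

The residual vanishes up to the expected finite-difference error, and the unit normalization is what allows the increasingly narrow Gaussians to approach $\delta(q_f - q_i)$ as $T \to 0$.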

4.1.3 Quantum symmetries


The probability that a measurement of some observable O performed on a quantum
system prepared in a specified (pure) quantum state $|\Psi\rangle$ yields a specified value (or
range of values) is given by the absolute square of a probability amplitude which can
typically be expressed as the Hilbert space inner product of the state $|\Psi\rangle$ and the
eigenstate $|\Phi\rangle$ of $O$ corresponding to the specified value:

$$P = |\langle\Phi|\Psi\rangle|^2 = |(|\Phi\rangle, |\Psi\rangle)|^2 \tag{4.40}$$

where we have introduced an alternative notation $(|\Phi\rangle, |\Psi\rangle)$ for the complex inner
product of two Hilbert space vectors which will be useful in the following. Translational
invariance of the laws of physics is a fundamental symmetry which has survived from
the times of Galileo and Newton, and we are certainly entitled to expect that if the
entire apparatus that prepared the system in the state $|\Psi\rangle$ and the detection apparatus
which on interaction with the system will project it onto the eigenstate $|\Phi\rangle$ are both
translated by the same fixed spatial vector $\vec{a}$, the measurement probability $P$ in (4.40)
should be unchanged. The translation of a physical system by displacement $\vec{a}$ is of
course effected by the unitary operator

$$U_{trans}(\vec{a}) = e^{-i\vec{a}\cdot\vec{p}} \tag{4.41}$$

where $\vec{p}$ is the total 3-momentum vector for the system. Similarly, a rotation of
a physical system around the direction of the vector $\vec{\alpha}$ by an angle given by the
magnitude $|\vec{\alpha}|$ is implemented on the Hilbert space of states by the unitary operator
$U_{rot}(\vec{\alpha}) = e^{-i\vec{\alpha}\cdot\vec{J}}$, where $\vec{J}$ is the total angular momentum of the system.7 Returning
to the translation case, it is no surprise that if we replace

$$|\Phi\rangle \to U_{trans}(\vec{a})|\Phi\rangle, \qquad |\Psi\rangle \to U_{trans}(\vec{a})|\Psi\rangle \tag{4.42}$$

then (by the unitarity of $U_{trans}$)

$$P \to |(U_{trans}(\vec{a})|\Phi\rangle, U_{trans}(\vec{a})|\Psi\rangle)|^2 = |(|\Phi\rangle, U_{trans}^\dagger(\vec{a})\, U_{trans}(\vec{a})|\Psi\rangle)|^2 = |\langle\Phi|\Psi\rangle|^2 = P \tag{4.43}$$
In general, the symmetries of physics can be expressed mathematically as groups of
transformations (e.g., in the case just above, the succession of two translations $\vec{a}$, $\vec{b}$
is equivalent to the combined translation $\vec{a} + \vec{b}$, the translation $-\vec{a}$ is the inverse of
the translation $\vec{a}$, etc.). A particular element $g$ of such a symmetry transformation
group will be associated with some Hilbert space operator $S(g)$ (just as translation
by $\vec{a}$ above was associated with the unitary operator $e^{-i\vec{a}\cdot\vec{p}}$). And the statement that
physics is invariant under such a group of transformations amounts to the requirement

$$|(S(g)|\Phi\rangle, S(g)|\Psi\rangle)|^2 = |(|\Phi\rangle, |\Psi\rangle)|^2 \tag{4.44}$$

for arbitrary states $|\Phi\rangle$, $|\Psi\rangle$ and arbitrary group elements $g$. This requirement clearly
holds if the symmetry group is implemented by unitary operators S(g), such as the
operator Utrans (a) discussed above.
In fact, there is another option, as Wigner was the first to demonstrate, in his
famous unitarity–antiunitarity theorem (Wigner, 1959). The other option—indeed the
only other possibility compatible with (4.44)—is that S(g) be an antiunitary operator.
An operator $T$ is antiunitary if, for some complete orthonormal basis $\{|n\rangle\}$ of the state
space (which we shall assume here to be separable, i.e., to allow a denumerable basis),

$$(T|n\rangle, T|m\rangle) = \delta_{nm} \tag{4.45}$$

$$T \sum_n a_n |n\rangle = \sum_n a_n^*\, T|n\rangle \tag{4.46}$$

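The property (4.46) indicates that $T$ is an antilinear operator. The symmetry requirement
(4.44) follows immediately:

$$|\Phi\rangle = \sum_n a_n |n\rangle, \qquad |\Psi\rangle = \sum_m b_m |m\rangle \tag{4.47}$$

\begin{align}
|(T|\Phi\rangle, T|\Psi\rangle)| &= \Big| \Big( \sum_n a_n^*\, T|n\rangle, \sum_m b_m^*\, T|m\rangle \Big) \Big| \notag \\
&= \Big| \sum_{n,m} a_n b_m^*\, (T|n\rangle, T|m\rangle) \Big| \tag{4.48}
\end{align}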

7 See Baym, op. cit., Chapter 17.


Using (4.45), this becomes

\begin{align}
|(T|\Phi\rangle, T|\Psi\rangle)| &= \Big| \sum_n a_n b_n^* \Big| \notag \\
&= \Big| \sum_n a_n^* b_n \Big| \notag \\
&= |(|\Phi\rangle, |\Psi\rangle)| \tag{4.49}
\end{align}

A symmetry group cannot consist purely of antiunitary operators, for the simple reason
that the product of two antilinear operators must be linear. Indeed, the only case of
physical interest in which the antiunitary option is required is for the discrete group
consisting of (i) the identity and (ii) the time-reversal operation t → −t. That time
reversal should entail a complex conjugation is plausible once we consider that the
time-dependence of quantum states in the energy basis involves the factor $e^{-iEt}$ with
the energy eigenvalue $E$ real. For a classical particle the time-reversal operation is
easily described in phase-space as the mapping taking $\vec{q}(t)$, $\vec{p}(t)$ to

$$\vec{q}_{tr}(t) = \vec{q}(-t) \tag{4.50}$$

$$\vec{p}_{tr}(t) = -\vec{p}(-t) \tag{4.51}$$

where the subscript “tr” denotes the time-reversed trajectory. In quantum mechanics,
the corresponding mapping is realized by an antiunitary operator T (the need for
the “anti” will be shortly apparent) with the Heisenberg operators (we omit the “H”
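subscript to avoid clogging the notation) transforming like

\begin{align}
\vec{q}_{tr}(t) &= T\, \vec{q}(t)\, T^{-1} \tag{4.52} \\
&= T\, e^{iHt}\, \vec{q}(0)\, e^{-iHt}\, T^{-1} \tag{4.53} \\
&= T e^{iHt} T^{-1}\, \vec{q}_{tr}(0)\, T e^{-iHt} T^{-1} \tag{4.54} \\
&= \vec{q}(-t) = e^{-iHt}\, \vec{q}_{tr}(0)\, e^{iHt} \tag{4.55}
\end{align}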

from which we conclude that the time-reversal operator T must satisfy

$$T(iHt)T^{-1} = -iHt \;\Rightarrow\; T\, iH = -iH\, T \tag{4.56}$$

Since $T$ has to commute with $H$ (for example, if the Hamiltonian is quadratic
in momenta), it must anticommute with $i$: in other words, $T$ contains a complex
conjugation and must be antilinear. Let us define a complex conjugation operator $K$
as an antilinear operator performing complex conjugation on the components of a
state vector in a preferred basis $|n\rangle$: thus,

$$K|n\rangle = |n\rangle \;\Rightarrow\; K\Big( \sum_n a_n |n\rangle \Big) = \sum_n a_n^*\, K|n\rangle = \sum_n a_n^* |n\rangle \tag{4.57}$$

It can be readily shown (see, for example, (Messiah, 1966), Chapter XV) that the
most general antilinear operator satisfying (4.49) takes the form

$$T = UK \tag{4.58}$$

where $U$ is a conventional (linear) unitary operator, and $K$ is the complex conjugation
operator (in a specified basis) described above. Typically, the basis chosen is the
coordinate space basis in which the $\vec{q}$ operators are diagonal, and for the spin degrees
of freedom (if any), the basis in which $S_3$ is diagonal and real (and $S_1$ and $S_2$ are
purely real and purely imaginary matrices, respectively). Note that the orbital angular
momentum $\vec{q} \times \vec{p}$ reverses sign under time-reversal, so we must require the same of
spin angular momentum:

$$T \vec{S}\, T^{-1} = -\vec{S} \tag{4.59}$$

which can be achieved, for a particle of spin $j$, by choosing

$$T = e^{-i\pi S_2} K = Y^{(j)} K \tag{4.60}$$

clearly of the form (4.58). The complex conjugation operator $K$ takes $S_1 \to S_1$, $S_2 \to -S_2$,
$S_3 \to S_3$, and then the subsequent rotation by $\pi$ around the y-axis reverses the
sign of $S_1$ and $S_3$, yielding the desired result (4.59). Explicitly, the matrix of $Y^{(j)}$ in
the standard spin representation is given by

$$Y^{(j)}_{mm'} = (-1)^{j+m}\, \delta_{m,-m'} \quad \left( = -i\sigma_2 \text{ for spin-}\tfrac{1}{2} \right) \tag{4.61}$$
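The statements (4.59) to (4.61) can be verified directly for low spins. The Python sketch below is an added illustration: the function names are hypothetical, and the spin matrices are built in the basis $m = j, j-1, \ldots, -j$ with the usual real, positive matrix elements of the raising operator, which is assumed here to be what the text calls the standard spin representation. Since $T = Y^{(j)}K$ acts on component vectors as $\psi \to Y^{(j)}\psi^*$, the transformed operator $T O T^{-1}$ has matrix $Y^{(j)} O^* Y^{(j)\dagger}$.

```python
import numpy as np

def spin_matrices(spin):
    """Return (S1, S2, S3) in the basis m = spin, spin - 1, ..., -spin."""
    dim = int(round(2 * spin)) + 1
    m = spin - np.arange(dim)
    s_plus = np.zeros((dim, dim), dtype=complex)
    for k in range(1, dim):
        # <m+1| S_+ |m> = sqrt(j(j+1) - m(m+1)) with m = m[k]
        s_plus[k - 1, k] = np.sqrt(spin * (spin + 1) - m[k] * (m[k] + 1))
    s_minus = s_plus.conj().T
    s1 = (s_plus + s_minus) / 2            # purely real
    s2 = (s_plus - s_minus) / (2 * 1j)     # purely imaginary
    s3 = np.diag(m).astype(complex)        # diagonal and real
    return s1, s2, s3

def y_matrix(spin):
    """Y^(j) = exp(-i pi S2), from the eigen-decomposition of S2."""
    _, s2, _ = spin_matrices(spin)
    w, v = np.linalg.eigh(s2)
    return (v * np.exp(-1j * np.pi * w)) @ v.conj().T

def y_from_formula(spin):
    """Matrix with entries (-1)^(j+m) delta_{m,-m'}, as in (4.61)."""
    dim = int(round(2 * spin)) + 1
    m = spin - np.arange(dim)
    y = np.zeros((dim, dim), dtype=complex)
    for a in range(dim):
        y[a, dim - 1 - a] = (-1.0) ** int(round(spin + m[a]))
    return y

def time_reversed(op, spin):
    """Matrix of T op T^{-1} for T = Y K: complex-conjugate, then rotate with Y."""
    y = y_matrix(spin)
    return y @ op.conj() @ y.conj().T

def spin_is_reversed(spin):
    """Check T S_a T^{-1} = -S_a for a = 1, 2, 3, i.e. (4.59)."""
    return all(np.allclose(time_reversed(s, spin), -s) for s in spin_matrices(spin))

minus_i_sigma2 = np.array([[0, -1], [1, 0]], dtype=complex)
```

For spins 1/2, 1 and 3/2, `spin_is_reversed` returns `True` and `y_matrix` agrees with `y_from_formula`; for spin 1/2 the matrix reduces to $-i\sigma_2$.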
The discussion of symmetries so far has been essentially at a kinematic level:
unitary operators U (g) representing a particular group element g of a symmetry group
G can be defined on the Hilbert space of states of a quantum mechanical particle quite
independently of whether the dynamics (i.e., the time evolution) of the system respects
the symmetry. We shall say that the “dynamics respects the symmetry”8 if for any
initial and final states $|\Psi\rangle$ and $|\Phi\rangle$, any time lapse $T$, and any group element $g$

$$(U(g)|\Phi\rangle, e^{-iHT}\, U(g)|\Psi\rangle) = (|\Phi\rangle, e^{-iHT}|\Psi\rangle) \tag{4.62}$$

In other words, the amplitude that the initial state $|\Psi\rangle$ will be found to have evolved
to the state $|\Phi\rangle$ after time $T$ is equal to the corresponding amplitude for the symmetry
rotated states $U(g)|\Psi\rangle$ and $U(g)|\Phi\rangle$. Now suppose that $G$ is a finite-dimensional
linear Lie group: namely, a group of matrices parameterizable by a finite set of group
parameters $\omega^\alpha$ and finite-dimensional generator matrices $T_\alpha$,

$$g = \exp(-i\omega^\alpha T_\alpha) \tag{4.63}$$

If the Lie group in question is unitary or orthogonal (e.g., the rotation group) then the
group parameters $\omega^\alpha$ can be chosen to be real and the generators $T_\alpha$ to be hermitian

8 This definition of a “dynamical symmetry” differs from the usage introduced by Wigner (cf. Chapter
3), where the term was reserved for symmetries of a non-geometrical character.
matrices. For non-unitary groups, such as the homogeneous Lorentz group, discussed in
greater detail in the following chapter, some of the generators must be non-hermitian
(if we follow usual convention and continue to parameterize the group in terms of
real parameters $\omega^\alpha$). In either case, the discussion of Wigner’s theorem above makes
clear that individual group operations $g$ must be represented on the Hilbert space of
quantum states by unitary operators (putting aside the special case of time reversal
for the moment) $U(g)$, with

$$U(g) = \exp(-i\omega^\alpha J_\alpha) \tag{4.64}$$

where the $J_\alpha$ are self-adjoint operators in the quantum state space.

As $|\Phi\rangle$ and $|\Psi\rangle$ are arbitrary states in (4.62) we conclude that at the operator level

$$U^\dagger(g)\, e^{-iHT}\, U(g) = e^{-iHT} \tag{4.65}$$

or, taking the derivative with respect to $T$ at $T = 0$,

$$U^\dagger(g) H U(g) = H \;\Rightarrow\; [U(g), H] = 0 \tag{4.66}$$

The commutativity of the Hamiltonian with arbitrary group operations $U(g)$ then
implies, for symmetry operations infinitesimally close to the identity, $g = 1 - i\omega^\alpha T_\alpha$,
$U(g) = 1 - i\omega^\alpha J_\alpha$, that

$$[J_\alpha, H] = 0 \tag{4.67}$$

By a standard quantum mechanical argument, the generators Jα of any dynamical


symmetry therefore represent conserved observables of the theory. The dual character
of (4.67)—simultaneously expressing the invariance of the Hamiltonian under the
infinitesimal group transformations generated by the Jα (a symmetry requirement)
and the time-independence of physical observables associated with the Jα (a con-
servation principle)—will receive an elegant transcription in Noether’s theorem (cf.
Chapter 12) for the conserved currents associated with continuous symmetries in
quantum field theory.
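The “standard quantum mechanical argument” can be spelled out in one line. The following is a sketch in natural units ($\hbar = 1$), assuming the $J_\alpha$ carry no explicit time dependence; the evolved state $|\Psi(t)\rangle = e^{-iHt}|\Psi\rangle$ is notation introduced here for illustration.

```latex
\frac{d}{dt}\,\langle\Psi(t)|J_\alpha|\Psi(t)\rangle
  = \langle\Psi|\, e^{iHt}\; i\,[H, J_\alpha]\; e^{-iHt}\,|\Psi\rangle
  = 0
  \qquad \text{by (4.67)} .
```

The expectation value of each generator is therefore the same at all times, for every initial state.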
At this point it will be convenient to give some concrete examples of the imple-
mentation of classical symmetries in a quantum theory. As we shall see, in some cases
subtleties arise which obstruct a straightforward transferral of a classical symmetry
to a unitary symmetry of the corresponding quantum system, a situation which
will become even more prevalent in quantum field theory, with important physical
consequences.

Example 1: Rotational symmetry


We begin with an elementary case, a (non-relativistic) spinless point particle in three
dimensions subject to a rotationally invariant potential $V(r)$, $r = |\vec{r}|$. The Hamiltonian is

$$H = \frac{1}{2m}\,\vec{p}^{\,2} + V(r) \tag{4.68}$$

and clearly satisfies

$$[\vec{J}, H] = 0 \tag{4.69}$$

where $\vec{J}$ is the (orbital) angular momentum three-vector. Corresponding to a three-dimensional
rotation $R(\vec{\alpha})$ by the angle $|\vec{\alpha}|$ around the direction of $\vec{\alpha}$ (which we
can realize as a 3x3 real orthogonal matrix) is the unitary representative
$U_{\rm rot}(\vec{\alpha}) = e^{-i\vec{\alpha}\cdot\vec{J}}$, acting on states in Hilbert space. The general requirement of dynamical
rotational invariance (4.62) implies in particular that the propagator
$K(\vec{r}_f, T; \vec{r}_i, 0) = \langle\vec{r}_f|e^{-iHT}|\vec{r}_i\rangle$ for detecting the particle at $\vec{r}_f$ at time $T$ if localized initially at time zero
at $\vec{r}_i$ should satisfy, for any fixed rotation $R(\vec{\alpha})$,

$$\langle R(\vec{\alpha})\vec{r}_f|e^{-iHT}|R(\vec{\alpha})\vec{r}_i\rangle = \langle\vec{r}_f|e^{-iHT}|\vec{r}_i\rangle \tag{4.70}$$

the physical significance of which is obvious.


Example 2: Canonical symmetry.
The canonical symmetry of classical mechanics9 asserts the possibility of equivalent
representations of the dynamical Hamiltonian evolution of a classical system in phase-
space in terms of alternative choices of coordinate and momentum variables. Here we
consider a particle moving in one space dimension, with a Hamiltonian dynamics
determined by a function H(p, q) of momentum (coordinate) variables p (q). An
equivalent description is obtained by choosing a different canonical pair P, Q related
to p, q by a generating function F (q, P )10 as follows:
$$Q = \frac{\partial F}{\partial P} \tag{4.71}$$

$$p = \frac{\partial F}{\partial q} \tag{4.72}$$
In principle, solution of the equation pair (4.71, 4.72) allows us to express the
initial canonical pair p, q in terms of the new pair P, Q, or vice versa. The prob-
lem of identifying unitary representatives of the classical canonical transformation
(p, q) → (P, Q) in the Hilbert space of quantum states for our particle was first solved
by Jordan (Jordan, 1926), in a paper which played a crucial role in the development

9 See Goldstein (Goldstein, 2002), Chapter 9 for a review of the essential properties of canonical
transformations.
10 This is a generating function of the second type, $F_2$, in the notation of Goldstein, op. cit. We only
consider time-independent generating functions here: the new Hamiltonian is then equal to the old one,
re-expressed in terms of the new canonical variables.

of quantum transformation theory in the late 1920s. Jordan showed that the operator
$U_{\rm can}$ implementing this transformation for the quantum kinematic variables, for the
special case of classical generating functions of the form

$$F(q, P) = \sum_n f_n(q)\, g_n(P) \tag{4.73}$$

takes the form (again temporarily reintroducing Planck’s constant)

$$U_{\rm can}(P, Q) = C \exp\Big(\frac{i}{\hbar}\Big\{-(Q, P) + \sum_n (f_n(Q), g_n(P))\Big\}\Big) \tag{4.74}$$

with $C$ an arbitrary constant. With this form Jordan could show (cf. Problem 2 at
the end of this chapter)

$$q = U_{\rm can}^{-1}\, Q\, U_{\rm can} \tag{4.75}$$

$$p = U_{\rm can}^{-1}\, P\, U_{\rm can} \tag{4.76}$$

In the formula (4.74) the round bracket expressions (Q, P ) and (fn (Q), gn (P )) imply a
specific ordering of the non-commuting operators Q and P : one is instructed to order
all Qs to the left of all P s in the formally expanded exponential in (4.74). Although
a formal demonstration of (4.74, 4.75, 4.76) is straightforward, the actual existence
of Ucan is not guaranteed, and in general the operator obtained in this fashion is not
even unitary! We will shortly provide an example of the problems that can arise in
this connection—but first, a “nice” canonical transformation where all the desired
properties of a quantum symmetry obtain in an unproblematic way. We consider the
generating function
$$F(q, P) = \frac{1}{2d}\,(bP^2 + 2qP - cq^2) \tag{4.77}$$
where a, b, c, d are real constants satisfying ad − bc = 1. The result is a linear canonical
transformation

Q = aq + bp (4.78)
P = cq + dp (4.79)
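That the generating function (4.77) reproduces the linear map (4.78, 4.79) can be checked directly from (4.71, 4.72). The short derivation below is a sketch that uses only the constraint $ad - bc = 1$.

```latex
p = \frac{\partial F}{\partial q} = \frac{1}{d}\,(P - cq)
  \;\;\Rightarrow\;\; P = cq + dp , \\[4pt]
Q = \frac{\partial F}{\partial P} = \frac{1}{d}\,(bP + q)
  = \frac{1}{d}\,\big(b\,(cq + dp) + q\big)
  = \frac{1 + bc}{d}\, q + bp
  = aq + bp .
```

The last step uses $1 + bc = ad$, so the unit-determinant condition is exactly what makes the coefficient of $q$ come out as $a$.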

In this case the general formula for the symmetry representative Ucan (4.74) gives

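$$U_{\rm can} = \exp\Big(\frac{i}{\hbar}\Big\{\frac{b}{2d}P^2 - \frac{c}{2d}Q^2 + \Big(\frac{1}{d} - 1\Big)(Q, P)\Big\}\Big) \tag{4.80}$$

We now derive an explicit formula for this operator as an integral kernel $U(Q, Q')$
acting on coordinate wavefunctions $\psi(Q)$ (so that the operator $P$ in (4.80) becomes
$\frac{\hbar}{i}\frac{\partial}{\partial Q}$):

$$U_{\rm can}\psi(Q) = \int U(Q, Q')\,\psi(Q')\,dQ' \tag{4.81}$$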

A short exercise in Fourier transformation (cf. Problem 2) then shows

$$U(Q, Q') = \sqrt{\frac{1}{2\pi\hbar b}}\,\exp\Big\{-\frac{i}{2b\hbar}\,(aQ^2 - 2QQ' + dQ'^2)\Big\} \tag{4.82}$$

where the arbitrary constant $C$ in (4.74) is chosen to ensure unitarity of the resultant
kernel:

$$\int U(Q, Q')\,U^*(Q'', Q')\,dQ' = \delta(Q - Q'') \tag{4.83}$$

The special case $a = d = 0$, $b = -c = 1$, corresponding to a canonical transformation
interchanging the roles of coordinate and momentum variables leads, of course, to the
Fourier kernel $U = \sqrt{\frac{1}{2\pi\hbar}}\, e^{\frac{i}{\hbar}QQ'}$, which is of great importance in the development of
transformation theory.
As mentioned above, it is important to realize that the transferral of a non-linear
classical canonical transformation to the quantum arena as a unitary symmetry group
is far from automatic: indeed, it is the exception rather than the rule (Anderson, 1994).
Consider, for example, another class of canonical transformations of great importance
classically: the point transformations

$$q \to Q = f(q) \tag{4.84}$$

$$p \to P = \frac{1}{f'(q)}\,p \tag{4.85}$$

corresponding to the generating function

$$F(q, P) = f(q)P \tag{4.86}$$

and the Jordan operator

$$U_{\rm can} = C \exp\Big(\frac{i}{\hbar}\,(f(Q) - Q, P)\Big) \tag{4.87}$$

We assume that $f(q)$ is (a) monotone increasing, and (b) invertible. Unfortunately,
irrespective of the choice of $C$, $U_{\rm can}$ fails to be norm-preserving for general non-linear
choices of the reparameterization function $f(q)$, as

$$U_{\rm can}\psi(Q) = C\psi(f(Q)) \tag{4.88}$$

and in general we certainly do not have $\int|\psi(Q)|^2 dQ = |C|^2\int|\psi(f(Q))|^2 dQ$! As Jordan
showed, the non-unitarity of the symmetry representative in this case can be traced
back to the non-hermiticity of the new momentum variable $P$ in (4.85). The problem
can be fixed by adding a quantum correction to the generating function (4.86)

$$F(q, P) = f(q)P + \frac{\hbar}{2i}\ln|f'(q)| \tag{4.89}$$

The relation between the new and old momentum variables is now

$$p = \frac{1}{2}\,(f'(q)P + Pf'(q)) \tag{4.90}$$

consistent with hermiticity of both $p$ and $P$; the symmetry representative becomes

$$U_{\rm can} = |f'(Q)|^{1/2}\exp\Big(\frac{i}{\hbar}\,(f(Q) - Q, P)\Big) \tag{4.91}$$

generating the norm-preserving action

$$U_{\rm can}\psi(Q) = |f'(Q)|^{1/2}\,\psi(f(Q)) \tag{4.92}$$
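The difference between the naive action (4.88) and the corrected action (4.92) is easy to see numerically. The Python sketch below is illustrative only: it assumes NumPy, picks the reparameterization $f(q) = q + q^3$ and a Gaussian test wavefunction (neither choice comes from the text), and compares the three norms by simple trapezoidal quadrature.

```python
import numpy as np

def integrate(y, x):
    """Trapezoidal rule on the grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

def f(q):
    # Illustrative reparameterization: monotone increasing and invertible on the real line.
    return q + q**3

def f_prime(q):
    return 1 + 3 * q**2

def psi(q):
    # Normalized Gaussian test wavefunction.
    return np.pi**-0.25 * np.exp(-q**2 / 2)

Q = np.linspace(-4.0, 4.0, 40001)

norm_original = integrate(np.abs(psi(Q))**2, Q)
# Naive Jordan operator (4.88) with C = 1: the norm is not preserved.
norm_naive = integrate(np.abs(psi(f(Q)))**2, Q)
# Corrected operator (4.92): the factor |f'(Q)| is the Jacobian of u = f(Q).
norm_corrected = integrate(f_prime(Q) * np.abs(psi(f(Q)))**2, Q)
```

With the $|f'(Q)|^{1/2}$ prefactor the integrand becomes $|\psi(u)|^2\,du$ under the substitution $u = f(Q)$, which is why `norm_corrected` agrees with `norm_original` while `norm_naive` falls short of it for this choice of $f$.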

The presence of a quantum obstruction (typically signalled by the appearance of a


term proportional to Planck’s constant) in a classical symmetry is generally referred
to as an anomaly. Some further examples of quantum-mechanical anomalies in the
implementation of classical canonical transformations are discussed in (Swanson,
1993). In Chapter 15 we shall see that quantum anomalies appear quite commonly
in quantum field theories when we attempt to implement certain classical symmetries
(such as dilatation and axial symmetry).
Before leaving the subject of the role of symmetries in quantum theory we should
briefly discuss an important restriction (originally pointed out by Wick, Wightman,
and Wigner (Wick et al., 1952)) on the fundamental superposition principle of quan-
tum mechanics, arising in connection with certain important physical symmetries.
An unrestricted application of the superposition principle would imply that a linear
combination of any two physical states $\alpha|\psi_1\rangle + \beta|\psi_2\rangle$, $\alpha, \beta \neq 0$, must necessarily also
correspond to a physically realizable state. In particular, the relative phase of the
two components of the state (i.e., of the complex numbers $\alpha$ and $\beta$) must be
physically meaningful, measurable for example by some suitable interference
experiment, in contrast to the overall phase (common phase of $\alpha$ and $\beta$) which
disappears from all expectation values. Wick et al., in the aforesaid reference, point
out that certain linear combinations are in fact not permitted, as the resultant
state displays unphysical phase correlations. For example, a state represented by the
linear combination $\alpha|B\rangle + \beta|F\rangle$ of a state $|B\rangle$ with integral total angular momentum
and one $|F\rangle$ with half-integral total angular momentum (say, a state with an odd
number of spin-$\frac{1}{2}$ particles) will be converted to the state $\alpha|B\rangle - \beta|F\rangle$ under a spatial
rotation by 360 degrees around any axis, which clearly corresponds to no physical
change to the state. Accordingly, the relative phase of the “bosonic” and “fermionic”
components of the state is unobservable. Combinations of this kind are said to be
excluded as a consequence of a superselection principle: the physical Hilbert space
of the theory is constructed as a direct sum of distinct superselection sectors. In the
example just given there are two such sectors: the Hilbert space HB of all states with
integral angular momentum, and the space HF of all states with half-integral total
angular momentum. Linear combinations of states in distinct superselection sectors are
forbidden. The exact conservation of angular momentum in the theory ensures that
such a prohibition, asserted at any initial time, will be respected by the dynamics
of the theory, as states evolve independently within their superselection sectors, with

transitions between sectors forbidden by a conservation principle. Exact electric charge


conservation is similarly associated with a superselection rule: superposition of states
with different total electric charge is proscribed, as the relative phase will be altered
by an (unobservable) gauge transformation. Finally, we observe that in situations
where superselection rules are operative, the hermitian operators corresponding to
physically realizable measurements must have vanishing matrix elements between
states in different superselection sectors.
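The bosonic/fermionic example can be made concrete in a three-dimensional toy space. The Python sketch below assumes NumPy; the basis (one spin-0 state plus a spin-$\frac{1}{2}$ doublet), the coefficients, and the two test operators are all hypothetical choices made for illustration.

```python
import numpy as np

# Basis: index 0 is a "bosonic" spin-0 state |B>; indices 1, 2 form a spin-1/2 doublet.
s3_eigenvalues = np.array([0.0, 0.5, -0.5])

def rotate(state, theta):
    """Rotation by theta about the 3-axis, exp(-i theta S3), with S3 diagonal."""
    return np.exp(-1j * theta * s3_eigenvalues) * state

def expectation(op, psi):
    return np.vdot(psi, op @ psi)

alpha, beta = 0.6, 0.8
ket_b = np.array([1, 0, 0], dtype=complex)
ket_f = np.array([0, 1, 0], dtype=complex)

state = alpha * ket_b + beta * ket_f
rotated = rotate(state, 2 * np.pi)      # equals alpha|B> - beta|F>

# A hermitian operator with no matrix elements between the two sectors ...
block_diagonal_op = np.array([[1.0, 0.0, 0.0],
                              [0.0, 2.0, 0.3],
                              [0.0, 0.3, -1.0]])
# ... and a hypothetical one connecting them, which the superselection rule excludes.
mixing_op = np.array([[0.0, 1.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [0.0, 0.0, 0.0]])
```

The block-diagonal operator has the same expectation value in `state` and `rotated`, whereas the sector-mixing operator changes sign under the 360 degree rotation. Since that rotation is physically trivial, only operators of the first kind can represent measurements.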

4.2 The functional (path-integral) framework


The operator formulation of quantum mechanics employed in the preceding sections
took shape in the late 1920s, within a few years of Heisenberg’s 1925 breakthrough,
and was put on a mathematically firm foundation by von Neumann (Von Neumann,
1996), who developed in the course of his formalization of quantum mechanics much
of the essential machinery of modern functional analysis (e.g., the spectral theory
of unbounded self-adjoint operators in Hilbert space). It therefore came as quite a
surprise to many physicists when a completely different formulation of quantum theory
in terms of integrals over infinite-dimensional function spaces emerged from work of
Feynman (Feynman, 1948) and Dirac (Dirac, 1945). This “path-integral” approach
was initially viewed with great suspicion by more mathematically inclined physicists,
despite the clear intuitive power and formal elegance of this approach. Nevertheless,
just as the initially disturbing odor of the “improper” Dirac δ-function was dispelled
by the development of a mathematically rigorous theory of distributions, the Euclidean
(imaginary-time) version of the Feynman path integral has been completely absorbed
into a rigorous theory of conditional Wiener measures, which has become an absolutely
indispensable technical tool in the arsenal of the constructive quantum field theorist.11
We will begin our discussion of this approach by returning to our favorite toy in this
chapter, the simple harmonic oscillator, where the availability of transparent analytic
expressions will allow us to confront and resolve the troubling convergence issues which
are unavoidable in any treatment of the path-integral method.

4.2.1 Path-integral formulation for the simple harmonic oscillator


We return to the Euclidean kernel for the simple harmonic oscillator,

$$K_E(q_f, t_f; q_i, t_i) = \langle q_f|e^{-H_{\rm sho}(t_f - t_i)}|q_i\rangle \tag{4.93}$$

defined in (4.36), but with the initial and final (imaginary) times shifted to general
values, so that the elapsed time $T = t_f - t_i$. We divide the finite interval $(t_i, t_f)$ into
$N$ subintervals of size $\tau = T/N$ by introducing $N - 1$ intermediate times

$$t_n = t_i + n\tau, \quad n = 1, 2, 3, \ldots, N - 1 \tag{4.94}$$

with $t_0 \equiv t_i$, $t_N \equiv t_f$. Likewise, $N - 1$ new coordinate variables $q_1, q_2, \ldots, q_{N-1}$ (with
$q_0 \equiv q_i$, $q_N \equiv q_f$) are introduced, corresponding to the position of our particle at

11 For a rigorous introduction to conditional Wiener measures, including appropriate convergence proofs
for potential theory, see Chapter 3 of (Glimm and Jaffe, 1987).

these intermediate times. Inserting a complete set of intermediate states, the finite-time
heat kernel (4.93) can be written as a multiple convolution of the $N$ kernels
which accomplish the time-evolution of the system over the temporal subintervals
$(t_n, t_{n+1})$, $n = 0, 1, 2, \ldots, N - 1$:

$$K_E(q_f, t_f; q_i, t_i) = \int \prod_{n=1}^{N-1} dq_n\, \langle q_f|e^{-\tau H_{\rm sho}}|q_{N-1}\rangle\langle q_{N-1}|e^{-\tau H_{\rm sho}}|q_{N-2}\rangle \cdots \langle q_2|e^{-\tau H_{\rm sho}}|q_1\rangle\langle q_1|e^{-\tau H_{\rm sho}}|q_i\rangle \tag{4.95}$$

One easily sees, from the exact result (4.36), that in the limit of large $N$ ($T$ fixed,
$\tau \to 0$), the individual kernel factors in (4.95) become

$$\langle q_{n+1}|e^{-\tau H_{\rm sho}}|q_n\rangle = \Big(\frac{m}{2\pi\tau}\Big)^{1/2}\exp\Big\{-\tau\,\frac{m}{2}\frac{(q_{n+1} - q_n)^2}{\tau^2} - \tau\,\frac{m\omega^2}{2}\frac{q_{n+1}^2 + q_n^2}{2}\Big\}\,(1 + O(\omega^2\tau^2)) \tag{4.96}$$

Inserting this result in the expression (4.95) for the finite-time evolution kernel, we
find

$$K_E(q_f, t_f; q_i, t_i) = \lim_{N\to\infty}\int \prod_{n=1}^{N-1} dq_n\, e^{-S_E} \tag{4.97}$$

$$S_E \equiv \tau\sum_{n=0}^{N-1}\Big(\frac{m}{2}\Big(\frac{q_{n+1} - q_n}{\tau}\Big)^2 + \frac{1}{2}m\omega^2\,\frac{q_{n+1}^2 + q_n^2}{2}\Big) \tag{4.98}$$

The limit on the right-hand side of (4.97), of course, just yields our Euclidean
propagator (4.36). Formally, identifying $q_n = q(t_n)$, the corresponding limit of the
exponent in (4.98) yields the Euclidean action

$$S_E = \int_{t_i}^{t_f}\Big\{\frac{1}{2}m\dot{q}^2(t) + \frac{1}{2}m\omega^2 q^2(t)\Big\}\,dt \tag{4.99}$$

and the above-mentioned limit can therefore be interpreted as an integral of $e^{-S_E}$ over
all “paths” $q(t)$ subject to the boundary conditions $q(t_i) = q_i$, $q(t_f) = q_f$. In fact, it can
be shown that this limit defines a countably additive measure (called a conditional
Wiener measure to indicate the boundary conditions) over the space of continuous
functions $q(t)$ defined on the interval $(t_i, t_f)$ (see (Glimm and Jaffe, 1987)), whence
the full weight of Lebesgue integration theory can be brought to bear to give a rigorous
meaning to this “functional integral” over the space of continuous functions. We shall
use the notation $Dq(t)$ to indicate the measure defining this integral, as follows:

$$K_E(q_f, t_f; q_i, t_i) = \int_{q(t_i) = q_i,\; q(t_f) = q_f} e^{-S_E}\, Dq(t) \tag{4.100}$$
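The time-slicing construction (4.95) to (4.98) can be tried out numerically by replacing each intermediate $dq_n$ integral with a sum over a coordinate grid, so that the $N$-fold convolution becomes a matrix power. The Python sketch below assumes NumPy; the grid, the parameter values, and the closed-form (Mehler) expression used for comparison are choices made here, with the closed form assumed to coincide with the exact result referred to as (4.36).

```python
import numpy as np

m, omega = 1.0, 1.0          # illustrative units
T, N = 1.0, 20               # elapsed Euclidean time and number of time slices
tau = T / N

q = np.linspace(-6.0, 6.0, 241)     # grid standing in for each dq_n integration
dq = q[1] - q[0]

# Short-time kernel (4.96), with the O(omega^2 tau^2) correction dropped.
qa, qb = np.meshgrid(q, q, indexing="ij")
short = np.sqrt(m / (2 * np.pi * tau)) * np.exp(
    -0.5 * m * (qa - qb) ** 2 / tau
    - tau * 0.5 * m * omega**2 * (qa**2 + qb**2) / 2
)

# N kernels joined by N - 1 intermediate integrals: a matrix power with N - 1 factors of dq.
kernel = np.linalg.matrix_power(short * dq, N) / dq

def exact_kernel(qf, qi, t):
    """Closed-form Euclidean oscillator kernel (Mehler formula)."""
    s, c = np.sinh(omega * t), np.cosh(omega * t)
    return np.sqrt(m * omega / (2 * np.pi * s)) * np.exp(
        -m * omega * ((qf**2 + qi**2) * c - 2 * qf * qi) / (2 * s)
    )

i_f, i_i = 130, 114          # grid indices of the end points q_f and q_i
approx = kernel[i_f, i_i]
exact = exact_kernel(q[i_f], q[i_i], T)
```

Increasing `N` at fixed `T` brings `approx` closer to `exact`, provided the grid spacing stays well below the width $\sqrt{\tau/m}$ of the short-time Gaussian.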

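The connection of $S_E$ to the conventional action $S$, defined as the real time integral
of the Lagrangian

$$S = \int_{t_i}^{t_f}\Big\{\frac{1}{2}m\dot{q}^2(t) - \frac{1}{2}m\omega^2 q^2(t)\Big\}\,dt \tag{4.101}$$

can be seen if we make the reverse analytic continuation back to real time

$$t \to e^{i(\frac{\pi}{2} - \epsilon)}\,t = (i + \epsilon)\,t \tag{4.102}$$

$$\tau = \frac{t}{N} \to (i + \epsilon)\,\tau \tag{4.103}$$

where the rotation is by an angle $\frac{\pi}{2} - \epsilon$ in the complex plane, with $\epsilon$ a positive
infinitesimal quantity, to be set to zero after the integrations in (4.97) are done. The
need for this maneuver is apparent if we examine the discretized Euclidean action
(4.98) after the continuation (4.103):

$$\begin{aligned}
-S_E \to{}& i\tau\sum_{n=0}^{N-1}\Big(\frac{m}{2}\Big(\frac{q_{n+1} - q_n}{\tau}\Big)^2 - \frac{1}{2}m\omega^2\,\frac{q_{n+1}^2 + q_n^2}{2}\Big) \\
& - \epsilon\tau\sum_{n=0}^{N-1}\Big(\frac{m}{2}\Big(\frac{q_{n+1} - q_n}{\tau}\Big)^2 + \frac{1}{2}m\omega^2\,\frac{q_{n+1}^2 + q_n^2}{2}\Big)
\end{aligned} \tag{4.104}$$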

As long as $\epsilon$ is retained as a small positive quantity, the integrals in (4.97) remain
absolutely convergent in virtue of the negative real contribution of the second line of
(4.104). If we set $\epsilon$ to zero prematurely, the integrals become undamped oscillatory
ones and therefore ill-defined. Nevertheless, the real-time path-integral representation
of the propagator is often written ignoring the converging factor involving $\epsilon$, in which
case we recognize the first line of (4.104) as giving (in the continuum limit $N \to \infty$)
just $i/\hbar$ times the real-time action (time integral of the Lagrangian) $S = \int L\,dt$:

$$K(q_f, t_f; q_i, t_i) = \int e^{\frac{i}{\hbar}\int L\,dt}\, Dq(t) \tag{4.105}$$

$$L = \frac{1}{2}m\dot{q}^2(t) - \frac{1}{2}m\omega^2 q^2(t) \tag{4.106}$$

We again emphasize that (4.105) must be interpreted as containing a hidden regularizing
“$\epsilon$” term in the exponent in order to be meaningful. Note that we have temporarily
reintroduced Planck’s constant $\hbar$ in (4.105), abandoning the choice of natural units
($\hbar = 1$) for a moment. The classical limit $\hbar \to 0$ clearly results in strong damping
except where the phase of the exponent is stationary, which is the extremal action
principle of classical mechanics selecting the classical path $q_{\rm cl}(t)$:

$$\frac{\delta}{\delta q(t)}\int L\,dt\,\Big|_{q = q_{\rm cl}} = 0 \tag{4.107}$$

The variational principles of classical mechanics (specifically, Hamilton’s principle)


can therefore be regarded as simply the result of applying the stationary-phase

approximation to the path-integral representation of the quantum mechanical ampli-


tudes of the system.
An important byproduct of the functional integral representation (4.100) of the
kernel (4.93) is obtained by taking the trace, as follows:

$$\mathrm{Tr}(e^{-H_{\rm sho}(t_f - t_i)}) = \int dQ\,\langle Q|e^{-H_{\rm sho}(t_f - t_i)}|Q\rangle \tag{4.108}$$

$$= \int dQ \int_{q(t_i) = q(t_f) = Q} e^{-S_E}\, Dq(t) \tag{4.109}$$

$$= \int_{q(t_i) = q(t_f)} e^{-S_E}\, Dq(t) \tag{4.110}$$

In the last line the integration over all paths which begin and end at a given coordinate
$Q$, followed by an integration over all $Q$, has been replaced by a functional integral
over all periodic paths satisfying $q(t_i) = q(t_f)$. Choosing $t_i = -\beta/2$, $t_f = +\beta/2$, we see
that this functional integral actually provides a simple reformulation of the finite-temperature
partition function $Z \equiv \mathrm{Tr}(e^{-\beta H_{\rm sho}})$ of the simple harmonic oscillator.
Thus the path integral also provides, in its Euclidean version, an alternative tool
for computations in quantum statistical mechanics as well as for real-time quantum
mechanics.
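The periodic-path representation (4.110) can be checked against the textbook partition function $Z = 1/(2\sinh(\beta\omega/2))$ of the oscillator. In the Python sketch below (NumPy assumed; grid and parameter values are illustrative), closing the chain of $N$ short-time kernels on itself turns the sum over periodic paths into a matrix trace.

```python
import numpy as np

m, omega = 1.0, 1.0          # illustrative units
beta, N = 2.0, 40            # inverse temperature and number of imaginary-time slices
tau = beta / N

q = np.linspace(-6.0, 6.0, 241)
dq = q[1] - q[0]

# Short-time Euclidean kernel of (4.96) on the grid.
qa, qb = np.meshgrid(q, q, indexing="ij")
short = np.sqrt(m / (2 * np.pi * tau)) * np.exp(
    -0.5 * m * (qa - qb) ** 2 / tau
    - tau * 0.5 * m * omega**2 * (qa**2 + qb**2) / 2
)

# Periodic paths q(t_i) = q(t_f): N kernels and N integrations, i.e. a trace.
Z_path = float(np.trace(np.linalg.matrix_power(short * dq, N)))

# Operator result: sum over n of exp(-beta * omega * (n + 1/2)).
Z_exact = 1.0 / (2.0 * np.sinh(beta * omega / 2))
```

The two numbers agree up to the discretization error in $\tau$, which shrinks as `N` is increased at fixed `beta`.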

4.2.2 Path-integral formulation of quantum mechanics: Hamiltonian version

We shall now abandon the simple harmonic oscillator in favor of a more general
discussion, where we consider an integral representation for the propagation of a
particle of mass $m$ in one dimension under the influence of a general potential $V(q)$.
Thus, the Hamiltonian is now

$$H = \frac{1}{2m}p^2 + V(q) = H_0 + V \tag{4.111}$$

Under rather loose conditions on the potential $V(q)$ (e.g., it is sufficient that $V$ be a
polynomial in $q$, bounded below so as to guarantee a “bottom” to the energy spectrum)
it is possible to show (Glimm and Jaffe, 1987) that for infinitesimal time intervals
$\tau = (t_f - t_i)/N$, $e^{-H\tau} \simeq e^{-H_0\tau}e^{-V\tau}$, in the sense that

$$\langle q_f|e^{-H(t_f - t_i)}|q_i\rangle = \langle q_f|(e^{-H\tau})^N|q_i\rangle = \lim_{N\to\infty}\langle q_f|(e^{-H_0\tau}e^{-V\tau})^N|q_i\rangle \tag{4.112}$$

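As before, we introduce $N - 1$ intermediate completeness sums for the kernel in
(4.112):

$$K_E(q_f, t_f; q_i, t_i) = \lim_{N\to\infty}\int \prod_{n=1}^{N-1} dq_n\, \langle q_f|e^{-H_0\tau}e^{-V\tau}|q_{N-1}\rangle\langle q_{N-1}|e^{-H_0\tau}e^{-V\tau}|q_{N-2}\rangle \cdots \langle q_{n+1}|e^{-H_0\tau}e^{-V\tau}|q_n\rangle \cdots \langle q_1|e^{-H_0\tau}e^{-V\tau}|q_i\rangle \tag{4.113}$$

Once again we focus attention on the individual matrix elements representing the
propagation amplitude over the infinitesimal intervals $\tau$:

$$\begin{aligned}
\langle q_{n+1}|e^{-H_0\tau}e^{-V(q)\tau}|q_n\rangle &= e^{-V(q_n)\tau}\,\langle q_{n+1}|e^{-H_0\tau}|q_n\rangle \\
&= e^{-V(q_n)\tau}\,\langle q_{n+1}|e^{-\frac{1}{2m}p^2\tau}|q_n\rangle \\
&= e^{-V(q_n)\tau}\int\frac{dp_n}{2\pi}\,\langle q_{n+1}|p_n\rangle\langle p_n|e^{-\frac{1}{2m}p^2\tau}|q_n\rangle \\
&= e^{-V(q_n)\tau}\int\frac{dp_n}{2\pi}\, e^{ip_n(q_{n+1} - q_n) - \frac{1}{2m}p_n^2\tau}
\end{aligned} \tag{4.114}$$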

Notice that each time interval is now associated with an auxiliary momentum variable
$p_n$, $n = 0, 1, \ldots, N - 1$. Inserting the result (4.114) into the integral representation
(4.113), we find

$$K_E(q_f, t_f; q_i, t_i) = \lim_{N\to\infty}\int \prod_{n=1}^{N-1} dq_n \prod_{n=0}^{N-1}\frac{dp_n}{2\pi}\; e^{\,i\sum_{n=0}^{N-1}p_n(q_{n+1} - q_n) - \sum_{n=0}^{N-1}H(p_n, q_n)\tau} \tag{4.115}$$

where $H(p_n, q_n) = \frac{1}{2m}p_n^2 + V(q_n)$ is the c-number valued classical energy associated
with momentum pn and coordinate qn . Although the integrals in (4.115) involve an
oscillatory factor, the second part of the exponent is negative and real, resulting
in absolute convergence of the integrals (provided, as mentioned above, that the
potential V (q) is bounded below!). As in the special case of the simple harmonic
oscillator, the limit defines a conditional Wiener measure over continuous phase-space
paths ({q(t), p(t)}, ti < t < tf ) with the boundary condition q(ti ) = qi , q(tf ) = qf and
p(ti ), p(tf ) unrestricted (all momentum integrals in (4.115) range from −∞ to +∞).
Of course, the Gaussian integral over the intermediate momentum $p_n$ in (4.114)
can easily be evaluated to leave us with

$$\int\frac{dp_n}{2\pi}\, e^{ip_n(q_{n+1} - q_n) - \frac{1}{2m}p_n^2\tau} = \Big(\frac{m}{2\pi\tau}\Big)^{1/2} e^{-\frac{1}{2}m\frac{(q_{n+1} - q_n)^2}{\tau}} \tag{4.116}$$

which we immediately recognize as the first factor in (4.96), the infinitesimal time
propagator for the simple harmonic oscillator. The product over all time intervals of
the remaining factor in (4.96) gives (with $V(q) = \frac{1}{2}m\omega^2q^2$ for the simple harmonic
oscillator)

$$\prod_{n=0}^{N-1} e^{-\tau\frac{m\omega^2}{2}\frac{q_{n+1}^2 + q_n^2}{2}} = e^{-\tau\sum_{n=0}^{N-1}V(q_n) - \frac{m\omega^2}{4}(q_f^2 - q_i^2)\tau} \to e^{-\tau\sum_{n=0}^{N-1}V(q_n)}, \quad \tau \to 0 \tag{4.117}$$

corresponding to the product of the potential terms: i.e., the first factor (outside the
integral) in (4.114). This establishes the equivalence of our new integral representation
(4.115), involving the Hamiltonian of the system and an integration over phase-space
paths in coordinate and momentum, with the results obtained previously (for the
harmonic oscillator) in which the exponent involved the Lagrangian (a function of
coordinates and velocities) and an integration over paths in coordinate space solely.
The Wiener measure defined by the limit in (4.115) will be indicated by a notation
analogous to that used previously in the Lagrangian formulation (4.100): namely,

$$K_E(q_f, t_f; q_i, t_i) = \int DpDq\; e^{\int_{t_i}^{t_f}(ip(t)\dot{q}(t) - H(p(t), q(t)))\,dt} \tag{4.118}$$

Implicit in this expression are (i) the boundary conditions $q(t_i) = q_i$, $q(t_f) = q_f$, and
(ii) the $\frac{1}{2\pi}$ factors in the measure for the momentum integrations, visible in (4.115).
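The Gaussian momentum integral (4.116) is evaluated by completing the square. In the sketch below $\Delta \equiv q_{n+1} - q_n$ is shorthand introduced here, and the shift of the integration contour into the complex $p$ plane is assumed to be harmless, as it is for a Gaussian integrand.

```latex
-\frac{\tau}{2m}\,p^2 + i p \Delta
  = -\frac{\tau}{2m}\Big(p - \frac{i m \Delta}{\tau}\Big)^2 - \frac{m \Delta^2}{2\tau} , \\[4pt]
\int_{-\infty}^{+\infty} \frac{dp}{2\pi}\; e^{-\frac{\tau}{2m}p^2 + i p \Delta}
  = \frac{1}{2\pi}\sqrt{\frac{2\pi m}{\tau}}\; e^{-\frac{m \Delta^2}{2\tau}}
  = \Big(\frac{m}{2\pi\tau}\Big)^{1/2} e^{-\frac{1}{2} m \frac{\Delta^2}{\tau}} .
```

The $\frac{1}{2\pi}$ in the momentum measure is what turns the Gaussian normalization $\sqrt{2\pi m/\tau}$ into the prefactor of the short-time kernel (4.96).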
A Hamiltonian path integral representation for the real-time propagation amplitude
$K(q_f, t_f; q_i, t_i)$ can be recovered from the Euclidean version (4.115) by the
analytic continuation discussed above in the Lagrangian case: namely, we rotate
$\tau \to (i + \epsilon)\tau$, obtaining

$$K(q_f, t_f; q_i, t_i) = \lim_{N\to\infty}\int \prod_{n=1}^{N-1} dq_n \prod_{n=0}^{N-1}\frac{dp_n}{2\pi}\; e^{\,i\sum_{n=0}^{N-1}(p_n(q_{n+1} - q_n) - (1 - i\epsilon)H(p_n, q_n)\tau)} \tag{4.119}$$

The integrals here are oscillatory except for the real factors $e^{-\epsilon H(p_n, q_n)\tau}$, which ensure
absolute convergence provided $H$ is bounded below (and grows for large $q_n$, $p_n$). Again,
one typically uses an abbreviated notation for this real-time version of the Hamiltonian
path integral:

$$K(q_f, t_f; q_i, t_i) = \int DpDq\; e^{\,i\int_{t_i}^{t_f}(p(t)\dot{q}(t) - H(p(t), q(t)))\,dt} \tag{4.120}$$

but it must be understood that this integral is given meaning by a hidden $i\epsilon$ factor,
as in (4.119). If Planck’s constant is once more made explicit (as in (4.105)) and
the stationary phase approximation applied to the classical limit $\hbar \to 0$, we find the
modified Hamilton’s principle (Leech, 1965) as the condition selecting the classical
path in phase space:

$$\delta\int_{t_i}^{t_f}(p(t)\dot{q}(t) - H(p(t), q(t)))\,dt = 0 \tag{4.121}$$

4.2.3 Time-ordered products and operator ordering in the path-integral method

The path-integral formulations introduced in the preceding two sections provided integral
representations for a particular quantum-mechanical quantity: the propagation
amplitude for a particle localized at a specified position at some initial time (i.e., the
state $|q_i, t_i\rangle$) to be detected at another position at some later time (corresponding to
the state $|q_f, t_f\rangle$). This overlap amplitude can be generalized to allow the consideration

of matrix elements between these initial and final states of products of Heisenberg
operators. It turns out that the path-integral method is ideally suited for the representation
of such matrix elements, but only if the corresponding operator product is
time-ordered. As a simple example, consider the propagation of the system from initial
time $t_i$ to final time $t_f$ with two intermediate times $t_1$, $t_2$ specified, and $t_1 > t_2$. The
product $q_H(t_1)q_H(t_2)$ is then time-ordered (later operator to the left), and

$$\begin{aligned}
\langle q_f, t_f|q_H(t_1)q_H(t_2)|q_i, t_i\rangle &= \langle q_f|e^{-iHt_f}e^{iHt_1}\,q\,e^{-iHt_1}e^{iHt_2}\,q\,e^{-iHt_2}e^{iHt_i}|q_i\rangle \\
&= \langle q_f|e^{-iH(t_f - t_1)}\,q\,e^{-iH(t_1 - t_2)}\,q\,e^{-iH(t_2 - t_i)}|q_i\rangle
\end{aligned} \tag{4.122}$$

We now repeat the steps leading from (4.112) to (4.113), dividing the time interval
$(t_i, t_f)$ into $N$ subintervals. For very large $N$, we may assume that the times $t_1$, $t_2$ are
arbitrarily close to the discrete times $t_{n_1}$, $t_{n_2}$. The only modifications to the previous
calculation are therefore the appearance of the position operator $q$ before the states
$|q_{n_1}\rangle$ and $|q_{n_2}\rangle$ in the matrix elements (4.114), leading to the additional c-number
factor $q_{n_1}q_{n_2}$ in the full functional integral. In the continuum limit, this additional
factor is just $q(t_1)q(t_2)$, so we obtain, in analogy to (4.120)

$$\langle q_f, t_f|q_H(t_1)q_H(t_2)|q_i, t_i\rangle = \int DpDq\; q(t_1)q(t_2)\, e^{\,i\int_{t_i}^{t_f}(p(t)\dot{q}(t) - H(p(t), q(t)))\,dt} \tag{4.123}$$

It is important to realize that the order of the products $q(t_1)q(t_2)$ inside the path
integral (4.123) is irrelevant, as at this stage we are dealing with c-number real-valued
functions which multiply commutatively. However, the above argument shows that the
path integral automatically computes the matrix element of the time-ordered product
of the corresponding Heisenberg operators. In other words, irrespective of the time
order of $t_1$ and $t_2$, we have

$$\langle q_f, t_f|T(q_H(t_1)q_H(t_2))|q_i, t_i\rangle = \int DpDq\; q(t_1)q(t_2)\, e^{\,i\int_{t_i}^{t_f}(p(t)\dot{q}(t) - H(p(t), q(t)))\,dt} \tag{4.124}$$

The arguments of the preceding section leading to the result (4.118) for the imaginary
time kernel may be repeated with the insertion of imaginary time Heisenberg operators
(which evolve according to $q_H(t) = e^{Ht}q_H(0)e^{-Ht}$): unsurprisingly, the corresponding
matrix element (which we distinguish with the subscript $E$) has the path-integral
representation

$$\langle q_f, t_f|T(q_H(t_1)q_H(t_2))|q_i, t_i\rangle_E = \int DpDq\; q(t_1)q(t_2)\, e^{\int_{t_i}^{t_f}(ip(t)\dot{q}(t) - H(p(t), q(t)))\,dt} \tag{4.125}$$

It is, of course, straightforward to repeat the above argument to establish that the
insertion of arbitrary multi-nomials in both the coordinate q(t) and momentum p(t)
values at distinct times in the integral (4.124) results in an integral representation
for the matrix element of the corresponding time-ordered products of the Heisenberg

coordinate qH (t) and momentum pH (t) operators. The same will be true were terms
in the exponent of (4.124) involving products of p(t) and q(t) at different times to be
present. Such terms do not appear in (4.124) as it stands, and one may wonder why
they ever would! However, the fact that the path-integral formulation involves only
commuting c-number functions—either on coordinate or on phase-space—leads to the
following puzzle, which perhaps has already occurred to the reader. If the Hamiltonian
H(p, q) contains terms with both coordinate and momentum operators, with some
specified ordering (chosen, of course, to maintain the self-adjoint property of H), how
can this be reflected in the path integral where different orders of multiplication of
the c-number valued p(t) and q(t) seem manifestly equivalent? For example, if at the
operator level the Hamiltonian contained a term (λ a real constant)

$$V_1 \equiv \lambda\, p q^2 p \tag{4.126}$$

the corresponding c-number term in the exponent of (4.124) would be indistinguishable
from the path integral corresponding to a term

$$V_2 \equiv \frac{\lambda}{2}\,(p^2q^2 + q^2p^2) \tag{4.127}$$

despite the fact that (temporarily abandoning natural units to restore Planck’s
constant)

$$V_1 = V_2 + \lambda\hbar^2 \tag{4.128}$$

so that the two Hamiltonians definitely lead to different quantum dynamics. The
solution to this quandary is to realize that in situations like this the apparently
innocent continuum limit (4.119) develops ambiguities which must be resolved by
temporarily separating the times of the coordinates and momenta to indicate the
desired ordering. Thus propagators for a Hamiltonian involving the term V1 above
would be generated by a path integral in which the c-number Hamiltonian H(p(t), q(t))
in (4.120) contains a term

$$\lambda\, p(t + \delta)\, q(t)^2\, p(t - \delta) \tag{4.129}$$

where δ is a small time-interval which is only set to zero after the N → ∞ limit in
(4.119) is carried out. Likewise, if we desire the propagator in a theory containing the
term V2 above, we need to regularize the path integral by adding a term
$$\frac{\lambda}{2}\,(p(t + \delta)^2 q(t)^2 + p(t)^2 q(t + \delta)^2) \tag{4.130}$$
to H(p(t), q(t)) in (4.120), with the limit δ → 0 performed only after the discrete
time-limit defining the path integral is performed (i.e., τ → 0).
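The relation (4.128) between the two orderings follows from the canonical commutator alone. A sketch of the computation, assuming $[q, p] = i\hbar$ (so that $[p, q^2] = -2i\hbar q$):

```latex
p q^2 p = p^2 q^2 - p\,[p, q^2] = p^2 q^2 + 2 i \hbar\, p q , \\[4pt]
p q^2 p = q^2 p^2 - [q^2, p]\, p = q^2 p^2 - 2 i \hbar\, q p , \\[4pt]
\Rightarrow\quad
p q^2 p = \tfrac{1}{2}\,(p^2 q^2 + q^2 p^2) + i \hbar\,[p, q]
        = \tfrac{1}{2}\,(p^2 q^2 + q^2 p^2) + \hbar^2 .
```

Multiplying by $\lambda$ gives $V_1 = V_2 + \lambda\hbar^2$: the two orderings differ by a constant of order $\hbar^2$, invisible to the c-number integrand unless the time-splitting prescription is imposed.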

4.2.4 Ground-state expectation values from path integrals


One of the most important applications of the path-integral method lies in the
evaluation of the ground-state expectation value of observables. Essentially all

non-perturbative formulations of quantum field theory begin with the study of such
expectation values of (products of) the field operators. This technique also lies at
the heart of modern approaches to the numerical computation of the spectrum of
theories such as quantum chromodynamics (the field theory describing the strong
interactions of quarks and gluons), where perturbative methods fail. We begin with
the result (4.124) for the real-time expectation value of the product of two Heisenberg
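coordinate operators:

$$\begin{aligned}
\langle q_f, +t|T(q_H(t_1)q_H(t_2))|q_i, -t\rangle &= \int DpDq\; q(t_1)q(t_2)\, e^{\,i\int_{-t}^{+t}(p(t)\dot{q}(t) - H(p(t), q(t)))\,dt} \\
&= \langle q_f, 0|e^{-iHt(1 - i\epsilon)}\, T(q_H(t_1)q_H(t_2))\, e^{-iHt(1 - i\epsilon)}|q_i, 0\rangle
\end{aligned} \tag{4.131}$$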

Note that we have chosen to evolve the system over the symmetric time-interval from
$-t$ to $+t$, and that the $1 - i\epsilon$ factor needed to make the path integral well-defined (see
(4.119)) is explicitly displayed in the corresponding matrix element. We now assume
that our quantum system has a unique ground state $|0\rangle$ of energy $E_0$, separated in
energy from the first excited state (or states) by a finite gap $E_1 - E_0$. If we insert a
complete set of energy eigenstates $1 = \sum_n |n\rangle\langle n|$ (where $H|n\rangle = E_n|n\rangle$), we find

$$\begin{aligned}
e^{-iHt(1 - i\epsilon)}|q_i, 0\rangle &= \sum_n e^{-iE_nt(1 - i\epsilon)}\,\langle n|q_i, 0\rangle\,|n\rangle \\
&\to e^{-iE_0t(1 - i\epsilon)}\,\langle 0|q_i, 0\rangle\,|0\rangle + O(e^{-\epsilon(E_1 - E_0)t}), \quad t \to \infty \\
\langle q_f, 0|e^{-iHt(1 - i\epsilon)} &\to \langle 0|\,\langle q_f, 0|0\rangle\, e^{-iE_0t(1 - i\epsilon)} + O(e^{-\epsilon(E_1 - E_0)t}), \quad t \to \infty
\end{aligned} \tag{4.132}$$

In other words, the infinite time limit of the path integral acts as a “low-pass filter”,
effectively selecting out the ground-state component of the initial and final states. If
we divide the matrix element in (4.131) by the same quantity without the T-product
(i.e., by $\langle q_f, t|q_i, -t\rangle$) and take the large time limit, the exponential time factors and
overlap factors $\langle 0|q_i, 0\rangle$ and $\langle q_f, 0|0\rangle$ cancel, leaving

$$\lim_{t\to\infty}\frac{\langle q_f, +t|T(q_H(t_1)q_H(t_2))|q_i, -t\rangle}{\langle q_f, t|q_i, -t\rangle} = \frac{\langle 0|T(q_H(t_1)q_H(t_2))|0\rangle}{\langle 0|0\rangle} = \frac{\int DpDq\; q(t_1)q(t_2)\, e^{\,i\int_{-\infty}^{+\infty}(p(t)\dot{q}(t) - H(p(t), q(t)))\,dt}}{\int DpDq\; e^{\,i\int_{-\infty}^{+\infty}(p(t)\dot{q}(t) - H(p(t), q(t)))\,dt}} \tag{4.133}$$

This result generalizes in an obvious way to time-ordered products of n q_H(t)
operators, which suggests the introduction of the following generating functional:

$$ Z[j] \equiv \int \mathcal{D}p\,\mathcal{D}q\; e^{i\int_{-\infty}^{+\infty}(p(t)\dot q(t)-H(p(t),q(t))-j(t)q(t))\,dt} \tag{4.134} $$

allowing us to generate arbitrary ground-state expectations of time-ordered coordinate
operators by functional differentiation¹² with respect to the c-number source
function j(t):

$$ \frac{\langle 0|T(q_H(t_1)\ldots q_H(t_n))|0\rangle}{\langle 0|0\rangle} = \frac{1}{Z[j]}\, i^n\, \frac{\delta^n Z[j]}{\delta j(t_1)\ldots\delta j(t_n)}\bigg|_{j=0} \tag{4.135} $$
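
As a hedged numerical illustration of the “low-pass filter” argument behind (4.132) and (4.133), the following Python sketch applies the damped evolution operator exp(−iHt(1 − iε)) to a vector in a five-dimensional toy Hilbert space and tracks its normalized overlap with the ground state. The Hamiltonian (spectrum 0, 1, 2, 3, 4 in a randomly rotated basis), the starting vector standing in for |q_i, 0⟩, and the value ε = 0.1 are all assumptions chosen for illustration; only the form of the operator is taken from the text.

```python
import numpy as np

def damped_evolution(H, psi, t, eps):
    """Apply exp(-i H t (1 - i eps)) to psi, working in the eigenbasis of H."""
    energies, vecs = np.linalg.eigh(H)
    coeffs = vecs.conj().T @ psi
    factors = np.exp(-1j * energies * t * (1.0 - 1j * eps))
    return vecs @ (factors * coeffs)

def ground_state_overlap(H, psi, t, eps):
    """|<0|psi(t)>| after normalising the damped, evolved state."""
    _, vecs = np.linalg.eigh(H)
    ground = vecs[:, 0]                      # eigh sorts eigenvalues in ascending order
    evolved = damped_evolution(H, psi, t, eps)
    evolved = evolved / np.linalg.norm(evolved)
    return abs(np.vdot(ground, evolved))

# Hypothetical toy Hamiltonian: gapped spectrum 0..4 in a randomly rotated basis.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
H = Q @ np.diag([0.0, 1.0, 2.0, 3.0, 4.0]) @ Q.T

psi0 = np.ones(5) / np.sqrt(5.0)             # stand-in for |q_i, 0>
overlaps = [ground_state_overlap(H, psi0, t, eps=0.1) for t in (0.0, 20.0, 200.0)]
```

In this toy model each excited component is suppressed by exp(−ε(E_n − E_0)t) relative to the ground state, so the normalized overlap can only grow with t and tends to unity.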

4.2.5 Quantum symmetries: path-integral aspects


The reformulation of operator quantum theory in the quasiclassical c-number frame-
work of the path integral suggests that the transferral of classical symmetries to
their quantum analogs should be particularly straightforward using path integral
techniques. And in many cases of interest, this is exactly what we find. For the case of
three-dimensional rotation symmetry discussed in Section 4.1.3, for example, where a
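particle moves in a rotationally invariant potential, subject to the Lagrangian

$$ L(\dot{\vec r},\vec r) = \frac{1}{2}m\,\frac{d\vec r}{dt}\cdot\frac{d\vec r}{dt} - V(\vec r(t)) \tag{4.136} $$

the Lagrangian path-integral representation of the propagator (4.70) takes the form

$$ \langle \vec r_f|e^{-iHT}|\vec r_i\rangle = \int \exp\Big\{\frac{i}{\hbar}\int_0^T L(\dot{\vec r},\vec r)\,dt\Big\}\,\mathcal{D}\vec r, \qquad \vec r(0)=\vec r_i,\quad \vec r(T)=\vec r_f \tag{4.137} $$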

with V(r) a rotationally invariant potential, V(R(α)r) = V(r). The dynamical rotational
invariance expressed by (4.70) follows immediately by making a change of
integration variable in the functional integral r(t) → R(α)r(t) (rendered precise by
discretization, as in (4.104)). All that is required is (a) the invariance property of
the Lagrangian, L(R(α)ṙ, R(α)r) = L(ṙ, r), and (b) the invariance of the functional
measure, which follows from the unimodular property of the rotation matrices,
det(R(α)) = 1.
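
The two ingredients (a) and (b) can be checked numerically for a single time slice. The sketch below is only an illustration under stated assumptions: the rotation is built from the Rodrigues formula, the central potential V = |r|²/2 is a hypothetical choice, and the sample vectors are arbitrary. It confirms that the Jacobian of r → R(α)r is unity and that the Lagrangian (4.136) is unchanged.

```python
import numpy as np

def rotation_matrix(alpha):
    """Rotation by angle |alpha| about the axis alpha/|alpha| (Rodrigues formula)."""
    alpha = np.asarray(alpha, dtype=float)
    angle = np.linalg.norm(alpha)
    if angle == 0.0:
        return np.eye(3)
    n = alpha / angle
    K = np.array([[0.0, -n[2], n[1]],
                  [n[2], 0.0, -n[0]],
                  [-n[1], n[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def lagrangian(r_dot, r, m=1.0):
    """L = (m/2) r_dot.r_dot - V(|r|), with the illustrative choice V = |r|^2 / 2."""
    return 0.5 * m * np.dot(r_dot, r_dot) - 0.5 * np.dot(r, r)

R = rotation_matrix([0.3, -1.1, 0.7])        # arbitrary rotation vector alpha
r = np.array([0.4, 1.5, -2.0])               # arbitrary position on one time slice
r_dot = np.array([-0.9, 0.2, 0.6])           # arbitrary velocity

jacobian = np.linalg.det(R)                  # measure factor for d^3 r -> d^3 (R r)
delta_L = lagrangian(R @ r_dot, R @ r) - lagrangian(r_dot, r)
```

In the discretized path integral the same Jacobian appears once per time slice, so a unit determinant on each slice leaves the full measure invariant.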
For the canonical symmetries discussed previously, in which new coordinates and
momenta are introduced which are in general non-linear functions of the old ones (cf.
Section 4.1.3), the realization of the canonical symmetry at the quantum level can
involve some subtle issues. Classically, for generating functions lacking explicit time-
dependence, the new Hamiltonian (expressed in the new variables) is algebraically
equal to the old Hamiltonian. In the quantum case there may be additional “anomalous”
terms of order ℏ (or higher powers of ℏ). In the path-integral formalism these
terms appear as (a) a consequence of a non-trivial Jacobian in the change of variables
qn , pn in the (discretized) Hamiltonian version of the path integral (4.119), and (b)
time reordering of coordinate and momentum variables when a discretized version of
the continuous classical contact transformation is implemented (recall our discussion
above of operator ordering in the path-integral context). We shall return later in the
book to the important issue of quantum anomalies in classical symmetries in quantum
field theory: the reader interested in a more thorough discussion of these issues in
non-relativistic quantum mechanics is referred to the work of Swanson (Swanson, 1993).

12 For a review of the basic elements of functional calculus, see Appendix A.

4.3 Scattering theory


The importance of scattering theory in the development of quantum field theory is
to some extent a technological accident, stemming from the development of particle
accelerators in this century as the primary experimental tool of subatomic physics. The
primary source of phenomenological information concerning the behavior of matter at
very short distance scales remains the observation of collision processes of elementary
particles in high-energy accelerators. A typical such process may be described qual-
itatively in the following terms. A short time (in macroscopic units, but effectively
at time t = −∞ in time units appropriate for elementary particle interactions) before
the collision, some number (typically two) of particles are travelling freely towards
each other. A short time after the collision (effectively, at t = +∞) the state of the
system may be resolved into a complicated linear combination of states, each of which
corresponds to some definite number of particles with various momenta and quantum
numbers. The coefficient of each such state is a complex amplitude, the absolute square of
which is just the probability that this state was the end result of the collision. Our
ultimate objective is to gain an understanding of the underlying dynamics giving rise
to these experimentally measurable amplitudes.
Let us be a little more specific. The notation |α⟩_in will refer to the state described
above, where α is a shorthand for the complete specification of the momenta, spins,
etc., of the incoming particles. For example, if we were colliding two spinless particles,
α would simply specify (k1 , k2 ), the two momenta of the incoming particles.
Strictly speaking, the state in which the incoming particles are initially localized,
far apart, and moving towards one another must be obtained by constructing wave-
packets. This is done in the usual way: one must fold the momentum eigenstates
(which are infinitely extended plane waves) with a smooth (e.g., Gaussian) function
of momentum to produce a wavefunction of finite extent in coordinate space. In other
words, the physical “in-state” which is really prepared in an accelerator is actually

$$ \int g(\alpha)\,|\alpha\rangle_{\rm in}\,d\alpha \tag{4.138} $$

where g(α) is the folding function. One may also define “out-states” correspondingly
as physical states in which the system goes over to a definite number of free outgoing
particles after the collision. These are not the states prepared in any conceivable
accelerator, but they are the states measured by the detectors after the collision has
taken place. The amplitude that a given incoming state |α⟩_in will then be found to be
in the state |β⟩_out by a detector measurement after the collision is just the overlap

$$ {}_{\rm out}\langle\beta|\alpha\rangle_{\rm in} \equiv S_{\beta\alpha} \tag{4.139} $$

The quantity S_βα is called the S-matrix and is of fundamental importance. It is
convenient to think of it as a matrix, although in general the indices α, β may contain
continuous (e.g., momenta of the incoming and outgoing particles) as well as discrete
(e.g., spin and isospin quantum numbers) variables. A column of this matrix lists
all the amplitudes for possible final states β arising from a given initial state α. The
probability interpretation of quantum mechanics requires that the sum of the absolute
squares of these amplitudes must be unity (something has to happen!). This is just the
property of a unitary matrix. The unitarity of the S-matrix, which we shall see below
follows from the hermiticity of the Hamiltonian, is one of the fundamental constraints
which we will have to keep in mind when building quantum field theories.

4.3.1 Convergence notions for states in Hilbert space


It will be useful to present a more detailed account of these admittedly rather vague
concepts in a familiar context: non-relativistic potential scattering of a particle in
three dimensions. Thus, the dynamics of our particle of mass m is determined by a
Hamiltonian

$$ H = H_0 + V = \frac{p^2}{2m} + V(\vec r) \tag{4.140} $$
In the qualitative discussion of scattering theory given above, the concepts of free
particles localized far apart and moving towards or away from each other play a
prominent role, as does the notion of the infinite time limits (both at t → −∞ and
at t → +∞) of the quantum state of the scattered particle(s). The latter notion is
particularly subtle, and we must pause here to remind the reader¹³ of the differing
notions of convergence which can apply to vectors in a Hilbert space. Recall that a
Hilbert space with a complex inner product is ipso facto a normed space, where a
“norm” (or “length”) of a state vector |Ψ⟩ can be defined as

$$ \|\,|\Psi\rangle\,\| \equiv \sqrt{\langle\Psi|\Psi\rangle} \tag{4.141} $$

In order to avoid the unsightly concatenation of vertical bars in (4.141), we shall
temporarily abandon Dirac notation and denote state vectors by capital Greek letters,
inner products by round brackets, and the norm by a double bar as above:

$$
\begin{aligned}
|\Psi\rangle &\to \Psi \\
\langle\Phi|\Psi\rangle &\to (\Phi,\Psi) \\
\|\,|\Psi\rangle\,\| &\to \|\Psi\|
\end{aligned}
\tag{4.142}
$$

In the Schrödinger picture we have time-dependent states Ψ(t), and the question arises
whether it makes sense to consider infinite time limits (“far past” or “far future”) of
such states. In general, a sequence of states Ψn is said to converge weakly to Υ if

$$ \lim_{n\to\infty}(\Phi,\Psi_n) = (\Phi,\Upsilon) \tag{4.143} $$

for any fixed Hilbert space vector Φ. In particular, a sequence of states has a weak
limit if each component of these states in a complete orthonormal basis converges
separately to a finite limit. In this case we write simply

$$ \Psi_n \to \Upsilon \tag{4.144} $$

13 For a more detailed introduction to the relevant concepts from functional analysis, see (Newton, 1966),
Chapter 6.

The reason for the appellation “weak” becomes apparent when we consider that the
sequence of states (1, 0, 0, 0, . . .), (0, 1, 0, 0, . . .), (0, 0, 1, 0, . . .), etc., specified by listing
their components in a denumerable orthonormal basis, converges weakly to zero,¹⁴ even
though the norm of each state in the sequence is unity! Another equally off-putting
case is given by the wavefunction for a localized wave-packet, given in coordinate
space by

$$ \Psi(\vec r,t) = \langle\vec r|\Psi,t\rangle = \int g(\vec p)\, e^{i\vec p\cdot\vec r - i\frac{p^2}{2m}t}\, d^3p \tag{4.145} $$

which also converges weakly to zero at large (negative or positive) times, due to the
famous spreading of the wave-packet, which implies that the overlap of Ψ(t) with
any (normalizable, and hence essentially localized) state must vanish at large times.
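
The decay of such overlaps can be seen in a small numerical sketch. The following Python fragment is a one-dimensional analogue of (4.145), evaluated in momentum space, where free evolution is just a phase; by Parseval's theorem the momentum-space inner product equals the coordinate-space one. The Gaussian profile, the values m = 1, p_0 = 1, σ = 0.5, the momentum grid, and the choice of the fixed state as the packet at t = 0 are all assumptions made for illustration.

```python
import numpy as np

# Illustrative parameters (assumptions, not taken from the text).
m, p0, sigma = 1.0, 1.0, 0.5
p = np.linspace(-9.0, 11.0, 40001)           # momentum grid, wide enough for the Gaussian
dp = p[1] - p[0]

g = np.exp(-(p - p0) ** 2 / (4.0 * sigma ** 2))
g = g / np.sqrt(np.sum(np.abs(g) ** 2) * dp)  # unit norm on the grid

def packet(t):
    """Momentum-space wavefunction g(p) exp(-i p^2 t / 2m) at time t."""
    return g * np.exp(-1j * p ** 2 * t / (2.0 * m))

def inner(phi, psi):
    """Discretised inner product (phi, psi)."""
    return np.sum(np.conj(phi) * psi) * dp

phi = packet(0.0)                             # a fixed, localized state
times = [0.0, 10.0, 100.0, 400.0]
overlaps = [abs(inner(phi, packet(t))) for t in times]
norms = [np.sqrt(inner(packet(t), packet(t)).real) for t in times]
```

The overlaps fall off with time while every norm stays at unity, which is the hallmark of weak (but not strong) convergence to zero.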
Our intuitive feeling of “convergence” conforms more closely to a stronger requirement
than that implied by (4.143). We say that the sequence of states Ψn converges strongly
to the state vector Υ if the norm of the difference vectors converges to zero:

$$ \lim_{n\to\infty}\|\Psi_n - \Upsilon\| = 0 \tag{4.146} $$

written concisely with a double-arrow as

$$ \lim_{n\to\infty}\Psi_n \Rightarrow \Upsilon \tag{4.147} $$

With this definition, neither of the examples of weak convergence given in the
preceding paragraph survives (as the individual states in the sequence have fixed norm
and can clearly not converge in the strong sense to zero!). Correspondingly, weak (resp.
strong) convergence of a sequence of operators On to a limit operator O can be defined
by requiring that for every fixed state Φ, On Φ → OΦ (resp. On Φ ⇒ OΦ).
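
The basis-vector example can be made concrete with a short Python sketch. It is an illustration only: the infinite basis is truncated to N = 2000 components, and the fixed vector Φ with components c_n = 1/n is a hypothetical square-summable choice. The overlaps (Φ, Ψ_n) = c_n shrink towards zero, as weak convergence requires, while the norm of Ψ_n − 0 stays at unity, so the strong criterion (4.146) fails.

```python
import numpy as np

N = 2000                                      # truncation of the infinite basis (illustration only)
c = 1.0 / np.arange(1, N + 1)                 # components c_n = 1/n of a fixed square-summable Phi

def basis_vector(n, dim=N):
    """The state with a single unit component in slot n (zero-based)."""
    e = np.zeros(dim)
    e[n] = 1.0
    return e

indices = [0, 9, 99, 999]
overlaps = [float(np.dot(c, basis_vector(n))) for n in indices]    # (Phi, Psi_n) = c_n
norms = [float(np.linalg.norm(basis_vector(n))) for n in indices]  # ||Psi_n - 0||
```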

4.3.2 In- and out-states in potential scattering


Keeping the preceding mathematical niceties in mind, we now return to the somewhat
slippery problem of specifying precisely the initial conditions of a scattering event.
The limit t → −∞ implied so far in the specification of an “in-state” should not be
taken literally: in that limit any particle described by a wave-packet of finite width
at finite time will be spread over all of space with vanishing probability density at
any given point, which hardly corresponds to the origin of electrons or protons used
to form the beams in present-day high-energy accelerators, which are extracted from
perfectly sensible localized states in originally neutral atoms. Nevertheless, we may
retain the useful mathematical fiction of “infinite past (or future)” by noting that
free and interacting solutions of the time-dependent Schrödinger equation can be
found which approach each other arbitrarily closely in the “strong” sense in the limit
t → −∞, and that this unique association of free with interacting states allows us to
define a mapping from an arbitrary (finite norm) free state Ψ(t) evolving according
to H_0 to an interacting state Ψ_in(t) evolving according to the full Hamiltonian H,
provided that the interaction potential V is sufficiently short-ranged. To see how to do
this, we work for the moment in coordinate representation and imagine that we are
provided with a normalizable solution Ψ(r, t) of the free time-dependent Schrödinger
equation

$$ -\frac{1}{2m}\Delta\Psi(\vec r,t) = i\,\frac{\partial\Psi(\vec r,t)}{\partial t} \tag{4.148} $$

14 Indeed, the overlap of these vectors with any finite norm vector with components c_n gives just c_n → 0,
n → ∞, as the c_n must go to zero if Σ_n |c_n|² < ∞.
For definiteness, such a solution is given by (4.145), with g( p) a smearing function
peaked at some 3-momentum p0 and some width Δp, such that the resultant wave-
packet is localized in the neighborhood of the spatial origin at time t = 0. For any given
large negative time T, we now define an interacting solution Ψ^(T)(r, t) associated with
this freely evolving state as the solution of the interacting Schrödinger equation

$$ -\frac{1}{2m}\Delta\Psi^{(T)}(\vec r,t) + V(\vec r)\,\Psi^{(T)}(\vec r,t) = i\,\frac{\partial\Psi^{(T)}(\vec r,t)}{\partial t} \tag{4.149} $$

subject to the boundary condition

$$ \Psi^{(T)}(\vec r,T) = \Psi(\vec r,T) \tag{4.150} $$

It is intuitively plausible that for potentials which fall off sufficiently rapidly at
large distances from the origin (where we imagine the interaction potential to be
concentrated) the effect of the potential should be increasingly negligible as T goes
to −∞, as the center of the wave-packet recedes from the center of the potential.
The rigorous proof of this hypothesis clearly requires a careful consideration of the
spreading of the wave-packet in relation to the falloff of the potential. As shown by
Brenig and Haag (Brenig and Haag, 1963), it suffices that the potential V(r) falls off
faster than r^(−(1+ε)), ε > 0. Assuming this to be the case, it can then be shown that
the interacting and free solutions approach one another strongly in the sense that the
norm of the difference of free and interacting solutions is uniformly bounded in the
far past. Thus, defining the shift χ^(T)(t) in the state due to the potential

$$ \chi^{(T)}(t) \equiv \Psi^{(T)}(t) - \Psi(t), \qquad \chi^{(T)}(T) = 0 \tag{4.151} $$

we expect that in the far past the influence of a localized potential must vanish for
wave-packets then localized far from the center of the potential, i.e.,

$$ \|\chi^{(T)}(t)\| < F(T), \quad t < T, \quad \text{with } F(T)\to 0,\ T\to-\infty \tag{4.152} $$

and that in this limit Ψ^(T)(0) (i.e., the interacting solution matched in the far past
to a specified free wave-packet, run forward to time zero) converges to a well-defined
Heisenberg state (recall that the various representations are defined to coincide at
time zero):

$$ \Psi^{(T)}(0) \Rightarrow \Psi_{\rm in}, \quad T\to-\infty \tag{4.153} $$


Of course, as emphasized in the preceding section, the limit T → −∞ corresponds
physically to a very short time on any macroscopic time-scale.¹⁵
Note that since Ψ^(T)(t) = e^(−iHt) Ψ^(T)(0), and Ψ(t) = e^(−iH_0 t) Ψ(0), the boundary
condition (4.150) implies

$$ \Psi^{(T)}(0) = e^{iHT}e^{-iH_0T}\,\Psi(0) = U(0,T)\,\Psi(0) \tag{4.154} $$

where U (t, t0 ) is the time-development operator in the interaction picture (see (4.16)).
It then follows from (4.153) that U (0, T ) has a strong limit when T is taken to the
infinite past:

U (0, T ) ⇒ Ω− , T → −∞ (4.155)

where the Møller wave operator Ω− maps the free state Ψ onto the in-state Ψin
associated with it by the procedure described above:

Ψin = Ω− Ψ (4.156)

An exactly analogous procedure, this time taking the limit T → +∞, can be used
to associate any freely evolving wave-packet solution Ψ with an interacting state
converging strongly to it in the far future: the associated Heisenberg state is then
called Ψout :

Ψout = Ω+ Ψ (4.157)
U (0, T ) ⇒ Ω+ , T → +∞ (4.158)

In the case that the potential V admits bound states (a not infrequent situation!),
the above discussion conceals some subtleties which we shall mention briefly here. In
this situation the scattering eigenstates of the full Hamiltonian H are not complete,
as the bound state(s) are missing. Indeed, the Møller wave operators are in this case
norm-preserving maps from the full Hilbert space H (= L²(R³)) spanned by the free
solutions Ψ to the subspace H_scat spanned by the interacting scattering states Ψ_in (or
Ψ_out). Indeed, for a bound state of energy E_b, normalized eigenstate Ψ_b, consider the
overlap matrix element

$$ (U(0,T)\Psi,\Psi_b) = e^{-iE_bT}\,(e^{-iH_0T}\Psi,\Psi_b) \tag{4.159} $$

Recall that Ψ here represents a localized wave-packet (e.g., the state given in (4.145))
so that e−iH0 T Ψ will spread in the limit T → −∞ to a state with pointwise vanishing
probability density, and hence vanishing overlap with any stationary, localized bound-
state wavefunction Ψb . Thus, the right-hand side of (4.159) vanishes in the limit
T → −∞. As Ω− is the strong limit of U(0, T) as T → −∞, it follows that this Møller
operator maps an arbitrary free state onto the proper subspace of states orthogonal
to the discrete bound states of V. As U(0, T) is norm-preserving at any finite T, its
strong limit Ω− is also norm-preserving, but not onto, and hence not unitary.¹⁶

15 We also note here that the Coulomb potential fails to satisfy the falloff condition posited above, and
indeed, non-relativistic Coulomb scattering exhibits a number of subtleties, which, however, will not concern
us further here. Related field-theoretic subtleties in defining the scattering matrix for theories with massless
particles will be considered explicitly in Section 19.2.
Returning to the Møller operators defined by the limits (4.155, 4.158), it is clear
that the strong limit T → ±∞ is unaffected by any fixed finite time-shift t; accordingly

Ω± = eiHt Ω± e−iH0 t (4.160)

for all t, whence, taking t infinitesimal

HΩ± = Ω± H0 (4.161)

We are finally equipped with the necessary tools to introduce the central concept
of scattering theory: the S-matrix, which we discussed qualitatively in the preceding
section. We recall that the Hilbert space L²(R³) is separable, so the free solutions Ψ
may be expanded in a discrete orthonormal basis of finite-norm states Ψ_α (α a discrete
index)

$$ \Psi = \sum_\alpha (\Psi_\alpha,\Psi)\,\Psi_\alpha \tag{4.162} $$

In particular, any localized wave-packet describing either the state prepared by an
accelerator or the state detected after a collision can be so expanded. Each free basis
state is associated with an in- or out-state (Ψα,in or Ψα,out ) via the Møller operators
introduced above. The quantum-mechanical amplitude that a state prepared in the
far past to match the behavior of the free state Ψα will evolve into the detected state
Ψβ in the far future is then given by the overlap

(Ψβ,out , Ψα,in ) ≡ Sβα = (Ω+ Ψβ , Ω− Ψα ) = (Ψβ , Ω+† Ω− Ψα ) (4.163)

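Using (4.155, 4.158), this becomes

$$
\begin{aligned}
S_{\beta\alpha} &= \lim_{T'\to\infty,\,T\to-\infty}(\Psi_\beta,\, U(0,T')^\dagger\, U(0,T)\,\Psi_\alpha) \\
&= \lim_{T'\to\infty,\,T\to-\infty}(\Psi_\beta,\, U(T',T)\,\Psi_\alpha) \equiv (\Psi_\beta,\, U(+\infty,-\infty)\,\Psi_\alpha)
\end{aligned}
\tag{4.164}
$$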

where the limits in the last two lines are strong, which together with the unitarity
of U (t, t0 ) at finite t, t0 and the completeness of the free states Ψα ensures that the
infinite discrete matrix S is unitary in the standard way (even in the presence of bound
states).17 In practice, it is more convenient for obvious reasons to use continuum-
normalized states: we go over to wave-packets of arbitrarily well-defined momentum,
for example, in which limit the S-matrix amplitudes remain well defined (again, with
the proviso of suitably localized interaction potentials). Then the discrete index α
above is replaced by a specification of the momentum (and if present, spins) of the
incoming particle. It should be emphasized that the matrix S defined above is to be
thought of as the matrix of an operator S acting in the full Hilbert space spanned
by freely-evolving wave-packets. Energy conservation requires commutation of S with
H_0, the free Hamiltonian. Indeed, Ψ_α and Ψ_β can be chosen to be free wave-packets
of arbitrarily well-defined H_0 eigenvalue, in which case we certainly require S_βα to
vanish if E_β ≠ E_α, and this is ensured by the intertwining property (4.161):

$$ H_0 S = H_0\,\Omega^{+\dagger}\Omega^{-} = \Omega^{+\dagger}H\,\Omega^{-} = \Omega^{+\dagger}\Omega^{-}H_0 = S H_0 \tag{4.165} $$

16 As a simple example of a norm-preserving but non-unitary operator, consider the operator represented
by the infinite discrete matrix O_nm = δ_{n,m+1}, which maps an arbitrary vector in ℓ², the Hilbert space
of square-summable infinite complex sequences, into a shifted vector of equal norm but one with no first
component.
17 The unitarity of the S-matrix defined as an overlap of in- and out-states even in the presence of unitary
defects in various interaction-picture operators will (fortunately) persist in quantum field theory, where the
defect will become total, given Haag’s theorem for the non-existence of the interaction picture, discussed
below in Chapter 10.
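
The shift operator of footnote 16, O_nm = δ_{n,m+1}, gives a quick way to see how an operator can preserve norms without being unitary, which is the situation of the Møller operators when bound states are present. The sketch below is an illustration restricted to finitely supported sequences, represented as Python lists; the helper names are hypothetical.

```python
def shift(v):
    """O: (v_1, v_2, ...) -> (0, v_1, v_2, ...), i.e. (O v)_n = v_{n-1}."""
    return [0.0] + list(v)

def shift_adjoint(v):
    """O^dagger: (v_1, v_2, v_3, ...) -> (v_2, v_3, ...), dropping the first component."""
    return list(v[1:])

def norm_sq(v):
    """Squared l^2 norm of a finitely supported sequence."""
    return sum(abs(x) ** 2 for x in v)

v = [1.0, -2.0, 0.5]                          # arbitrary test sequence
e1 = [1.0, 0.0, 0.0]                          # the first basis vector

isometry_ok = norm_sq(shift(v)) == norm_sq(v)             # O preserves the norm
left_inverse_ok = shift_adjoint(shift(v)) == v            # O^dagger O = 1
projected = shift(shift_adjoint(e1))                      # O O^dagger annihilates e1
```

Here O†O = 1 but OO† is only the projector onto sequences with vanishing first component, the analogue of the projector onto scattering states.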

4.3.3 Time-independent scattering theory


The derivation of many important theorems of scattering theory—in particular, the
generalized optical theorem, which we prove below—is facilitated by another important
result, the Lippmann–Schwinger equation, encapsulating the relation between free
states and the interacting states matched to them in the far past or future. We derive
this equation here in the case of in-states (the analogous result for out-states following
from a completely parallel argument). From the integral equation (4.22) it follows that

U (t, −∞) = U (t, 0)Ω− = eiH0 t e−iHt Ω− = eiH0 t Ω− e−iH0 t (4.166)

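satisfies

$$ U(t,-\infty) = 1 - i\int_{-\infty}^{t} V_{ip}(t_1)\,U(t_1,-\infty)\,dt_1 \tag{4.167} $$

whence, inserting (4.166),

$$ e^{iH_0t}\,\Omega^{-}e^{-iH_0t} = 1 - i\int_{-\infty}^{t} V_{ip}(t_1)\,e^{iH_0t_1}\,\Omega^{-}e^{-iH_0t_1}\,dt_1 \tag{4.168} $$

or equivalently

\begin{align}
\Omega^{-} &= 1 - i\int_{-\infty}^{t} e^{-iH_0t}\,V_{ip}(t_1)\,e^{iH_0t_1}\,\Omega^{-}e^{-iH_0(t_1-t)}\,dt_1 \tag{4.169}\\
&= 1 - i\int_{-\infty}^{t} e^{iH_0(t_1-t)}\,V\,\Omega^{-}e^{-iH_0(t_1-t)}\,dt_1 \tag{4.170}\\
&= 1 - i\int_{-\infty}^{0} e^{iH_0t_1}\,V\,\Omega^{-}e^{-iH_0t_1}\,dt_1 \tag{4.171}
\end{align}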

We now take the matrix element of this result between two free wave-packet solutions
Ψβ and Ψα which are arbitrarily close to energy eigenstates with energy Eβ , Eα . The
free time-development factors e^(±iH_0 t_1) mean that the centroids of these packets are
moved very far from our presumably localized potential V at large negative times t_1,
so we may at no cost insert an adiabatic switching factor e^(εt_1) (ε > 0) multiplying the
potential: for very small ε (ε will be taken to zero at the very end), switching off the
interaction potential at very large negative times will have no effect if the wave-packets
are still very far from the potential center. We then obtain

\begin{align}
(\Psi_\beta,\Omega^{-}\Psi_\alpha) &= (\Psi_\beta,\Psi_\alpha) - i\int_{-\infty}^{0} e^{i(E_\beta-E_\alpha-i\epsilon)t_1}\,dt_1\,(\Psi_\beta,V\Omega^{-}\Psi_\alpha) \tag{4.172}\\
&= (\Psi_\beta,\Psi_\alpha) - \frac{1}{E_\beta-E_\alpha-i\epsilon}\,(\Psi_\beta,V\Omega^{-}\Psi_\alpha) \tag{4.173}\\
&= (\Psi_\beta,\Psi_\alpha) + \Big(\Psi_\beta,\frac{1}{E_\alpha-H_0+i\epsilon}\,V\Omega^{-}\Psi_\alpha\Big) \tag{4.174}
\end{align}
As the Ψ_β can be chosen to run over a complete basis of the Hilbert space, we
may remove it, obtaining the desired Lippmann–Schwinger equation, relating free to
interacting scattering states:

$$ \Psi_{\alpha,{\rm in}} = \Psi_\alpha + \frac{1}{E_\alpha-H_0+i\epsilon}\,V\,\Psi_{\alpha,{\rm in}} \tag{4.175} $$

In a similar fashion one may derive the Lippmann–Schwinger equation for out-states:

$$ \Psi_{\alpha,{\rm out}} = \Psi_\alpha + \frac{1}{E_\alpha-H_0-i\epsilon}\,V\,\Psi_{\alpha,{\rm out}} \tag{4.176} $$

Note that by multiplying both sides of (4.175) by E_α − H_0 (at which point the iε
becomes irrelevant) we find

(Eα − H0 )Ψα,in = V Ψα,in ⇒ HΨα,in = Eα Ψα,in (4.177)

so that the interacting scattering state has the same energy relative to the full
Hamiltonian H as the free state Ψα which matches it in the far past has relative
to H0 .
By carrying the time evolution in (4.166) all the way forward to t = +∞ we of
course obtain the S-matrix element Sβα :

Sβα = lim (Ψβ , U (t, −∞)Ψα ) (4.178)


t→+∞
 +∞
= (Ψβ , Ψα ) − i (Ψβ , eiH0 t1 V Ω− e−iH0 t1 Ψα )dt1 (4.179)
−∞

= (Ψβ , Ψα ) − 2πiδ(Eβ − Eα )(Ψβ , V Ω− Ψα ) (4.180)

Defining the T-matrix element Tβα

Tβα ≡ (Ψβ , V Ω− Ψα ) = (Ψβ , V Ψα,in ) (4.181)

this becomes, if we choose the Ψα from an orthonormal set,

Sβα = δβα − 2πiδ(Eβ − Eα )Tβα (4.182)

At this point it will be convenient to return to Dirac notation for states and matrix
elements, as the fine points of convergence that infest time-dependent scattering theory
have already been discussed adequately for our purposes. The Lippmann–Schwinger
equations (4.175, 4.176) will henceforth be written

$$ |\alpha\rangle_{\rm in} = |\alpha\rangle + \frac{1}{E_\alpha-H_0+i\epsilon}\,V|\alpha\rangle_{\rm in}, \qquad |\alpha\rangle_{\rm out} = |\alpha\rangle + \frac{1}{E_\alpha-H_0-i\epsilon}\,V|\alpha\rangle_{\rm out} \tag{4.183} $$
We will also abandon our previous insistence on normalizable (wave-packet) states
and allow the label α to denote continuum orthonormalized states of well-defined
momentum and energy (possibly containing a discrete spin index as well). The usual
completeness relations will then involve integrals (as well as spin sums), denoted
formally ∫dα . . . Similarly, the notation δ_αβ will denote a product of continuous
δ-functions (in the momentum variables) and discrete Kronecker δs (for any discrete
spin indices).
We now turn to the derivation of some formal scattering theorems of great
importance. By taking the adjoint of the Lippmann–Schwinger (LS) equation for an
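in-state, we find

\begin{align}
{}_{\rm in}\langle\beta| &= \langle\beta| + {}_{\rm in}\langle\beta|V\frac{1}{E_\beta-H_0-i\epsilon} \notag\\
\Rightarrow\ {}_{\rm in}\langle\beta|V|\alpha\rangle_{\rm in} &= {}_{\rm in}\langle\beta|V|\alpha\rangle + {}_{\rm in}\langle\beta|V\frac{1}{E_\alpha-H_0+i\epsilon}V|\alpha\rangle_{\rm in} \notag\\
&= T^{*}_{\alpha\beta} + {}_{\rm in}\langle\beta|V\frac{1}{E_\alpha-H_0+i\epsilon}V|\alpha\rangle_{\rm in} \tag{4.184}\\
&= \langle\beta|V|\alpha\rangle_{\rm in} + {}_{\rm in}\langle\beta|V\frac{1}{E_\beta-H_0-i\epsilon}V|\alpha\rangle_{\rm in} \notag\\
&= T_{\beta\alpha} + {}_{\rm in}\langle\beta|V\frac{1}{E_\beta-H_0-i\epsilon}V|\alpha\rangle_{\rm in} \tag{4.185}
\end{align}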

In (4.184) we have used the LS equation on the right, and the definition (4.181) of
the T-matrix; in (4.185) one uses the LS equation on the left. Subtract the right-hand
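sides of (4.184) and (4.185) to obtain

$$ T_{\beta\alpha} - T^{*}_{\alpha\beta} = {}_{\rm in}\langle\beta|V\Big\{\frac{1}{E_\alpha-H_0+i\epsilon} - \frac{1}{E_\beta-H_0-i\epsilon}\Big\}V|\alpha\rangle_{\rm in} \tag{4.186} $$

If we insert a complete set of free eigenstates of H_0, with ∫dγ |γ⟩⟨γ| = 1, between
the two factors of V in (4.186), we find

$$ T_{\beta\alpha} - T^{*}_{\alpha\beta} = \int d\gamma\; {}_{\rm in}\langle\beta|V|\gamma\rangle\langle\gamma|V|\alpha\rangle_{\rm in}\,\Big\{\frac{1}{E_\alpha-E_\gamma+i\epsilon} - \frac{1}{E_\beta-E_\gamma-i\epsilon}\Big\} \tag{4.187} $$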
At this point we remind the reader of a famous identity: observe that

$$ \frac{1}{x+i\epsilon} = \frac{x}{x^2+\epsilon^2} - i\,\frac{\epsilon}{x^2+\epsilon^2} \tag{4.188} $$

The function ε/(x² + ε²) is a highly peaked function of x (for ε small) which integrates
to π: in other words, it is just πδ(x). The odd function x/(x² + ε²) is a regularized form
of 1/x which goes by the name of “principal-part of 1/x”, or simply P(1/x). Now by
energy conservation, E_β = E_α and (4.187) becomes the so-called “Generalized Optical
theorem”:

$$ T_{\beta\alpha} - T^{*}_{\alpha\beta} = -2i\pi\int d\gamma\; T^{*}_{\gamma\beta}\,\delta(E_\gamma-E_\alpha)\,T_{\gamma\alpha} \tag{4.189} $$
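
The identity (4.188) and the claim that ε/(x² + ε²) acts as πδ(x) can be checked numerically. The following Python sketch is an illustration under stated assumptions: ε = 0.01, a uniform grid on [−50, 50], and a Gaussian test function exp(−x²) are arbitrary choices, and simple Riemann sums stand in for the integrals.

```python
import numpy as np

def split_propagator(x, eps):
    """Real and imaginary parts of 1/(x + i eps), as on the right-hand side of (4.188)."""
    denom = x ** 2 + eps ** 2
    return x / denom, -eps / denom

eps = 1e-2
x = np.linspace(-50.0, 50.0, 2_000_001)       # symmetric grid containing x = 0
dx = x[1] - x[0]
real_part, imag_part = split_propagator(x, eps)

# (4.188) itself, checked pointwise on the grid.
max_identity_error = np.max(np.abs(1.0 / (x + 1j * eps) - (real_part + 1j * imag_part)))

# eps/(x^2 + eps^2) integrates to (nearly) pi on a wide interval.
lorentzian_area = np.sum(-imag_part) * dx

# Acting on a smooth test function f with f(0) = 1, it behaves like pi * delta(x).
test_fn = np.exp(-x ** 2)
delta_action = np.sum(-imag_part * test_fn) * dx / np.pi

# The principal-part piece is odd, so it integrates to zero against an even f.
principal_part = np.sum(real_part * test_fn) * dx
```

For finite ε the value of delta_action differs from f(0) = 1 by a correction of order ε, which disappears in the limit ε → 0 taken in the text.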

A particularly important special case of the above result occurs for forward scattering:
for example, when α, β refer to two-particle states with identical momenta, i.e., the
elastic scattering amplitude in the limit of zero exchanged momentum. Then (4.189)
becomes

$$ {\rm Im}(T_{\alpha\alpha}) = -\pi\int d\gamma\;\delta(E_\gamma-E_\alpha)\,|T_{\gamma\alpha}|^2 \tag{4.190} $$

Evidently, the right-hand side of this relation describes the integrated probability (or
total cross-section) for the given initial state |α⟩_in to evolve into an arbitrary final
state |γ⟩_out (naturally, of the same energy, hence the δ-function). The optical theorem
relates this to the imaginary part of the forward scattering amplitude T_αα. The
term “optical” derives from the special case where the scattering is that of photons off
neutral atoms in a medium, in which case the forward scattering amplitude is related
to the index of refraction and the integrated scattering cross-section to the absorption.
The Generalized Optical theorem (G.O.T.) is really nothing but the previously
advertised unitarity of the S-matrix, somewhat disguised, as the following brief
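computation shows:

$$ (S^\dagger S)_{\beta\alpha} = \int d\gamma\,(S^\dagger)_{\beta\gamma}\,S_{\gamma\alpha} = \int d\gamma\, S^{*}_{\gamma\beta}\,S_{\gamma\alpha} \tag{4.191} $$

Inserting (4.182), this becomes

$$
\begin{aligned}
(S^\dagger S)_{\beta\alpha} &= \int d\gamma\,\big(\delta_{\gamma\beta} + 2i\pi\,\delta(E_\gamma-E_\beta)\,T^{*}_{\gamma\beta}\big)\big(\delta_{\gamma\alpha} - 2i\pi\,\delta(E_\gamma-E_\alpha)\,T_{\gamma\alpha}\big) \\
&= \delta_{\beta\alpha} + 2i\pi\,\delta(E_\alpha-E_\beta)\,T^{*}_{\alpha\beta} - 2i\pi\,\delta(E_\beta-E_\alpha)\,T_{\beta\alpha} \\
&\quad - 2i\pi\,\delta(E_\beta-E_\alpha)\cdot 2i\pi\int d\gamma\; T^{*}_{\gamma\beta}\,\delta(E_\gamma-E_\alpha)\,T_{\gamma\alpha} \\
&= \delta_{\beta\alpha}
\end{aligned}
\tag{4.192}
$$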

where all terms save the final Kronecker δ cancel in the penultimate line, courtesy of
the Generalized Optical theorem (4.189).
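
The link between the Generalized Optical theorem and unitarity can be illustrated in a finite-dimensional toy model in which all states sit on a single energy shell, so that the energy δ-functions drop out and S = 1 − 2πiT is an ordinary matrix. The construction below is an assumption made purely for illustration and is not taken from the text: T is built from a randomly chosen Hermitian matrix K as T = K(1 + iπK)⁻¹, which guarantees the matrix form of (4.189).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4                                         # number of on-shell channels (arbitrary)
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
K = 0.1 * (A + A.conj().T)                    # hypothetical Hermitian matrix used to build T
identity = np.eye(n)

T = K @ np.linalg.inv(identity + 1j * np.pi * K)
S = identity - 2j * np.pi * T                 # matrix analogue of (4.182) on one energy shell

# Matrix form of (4.189): T - T^dagger = -2 pi i T^dagger T.
optical_lhs = T - T.conj().T
optical_rhs = -2j * np.pi * (T.conj().T @ T)

# Forward case (4.190): Im T_aa = -pi * sum_g |T_ga|^2 (sum over rows for each column).
forward_lhs = np.imag(np.diag(T))
forward_rhs = -np.pi * np.sum(np.abs(T) ** 2, axis=0)

# Unitarity (4.192): S^dagger S = 1.
unitarity_defect = np.max(np.abs(S.conj().T @ S - identity))
```

With this choice S equals (1 − iπK)(1 + iπK)⁻¹, a Cayley transform of a Hermitian matrix, so unitarity and the optical relation hold together, as in the computation (4.192).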
Of course, the above results still leave us a fair way from an actual scattering
experiment. In particular, we need to be able to convert information about S-matrix
elements into a precise statement of how many particles of given momentum and
type will emerge per unit time when a given target is placed in a particle beam of a
given intensity. Also of interest are the cases in which the Ψ_α,in state in (4.164) is a
one-particle state, while the Ψ_β,out state may contain two, three, or more particles,
corresponding to the decay of an unstable particle. The relevant formulas connecting
corresponding to the decay of an unstable particle. The relevant formulas connecting
the S-matrix elements to the desired phenomenological cross-sections and rates are
derived in Appendix B.

4.4 Problems
1. For the evolution operator U (t, t0 ) (4.16), verify the semigroup property

U (t2 , t0 ) = U (t2 , t1 )U (t1 , t0 )

2. (a) Show that the Jordan operator Ucan given in (4.74) effects the appropriate
similarity transformation between old and new canonical coordinates (4.75,
4.76).
(b) Show that the operator (4.80) implementing linear canonical transformations
is (up to a multiplicative constant) equivalent to the unitary kernel (4.81).
(Hint: apply the operator (4.80) to the coordinate space wavefunction written
as a Fourier transform of the momentum-space wavefunction.)
3. The object of the following exercise is to build intuition for the very important
concepts of in/out states and the S-matrix, in a simple example where every-
thing can be worked out explicitly. The model being considered is that of a
one-dimensional repulsive δ-function potential, V (x) = gδ(x), g > 0 (see Baym
(Baym, 1990), p.113). Scattering experiments are performed by firing particles
of mass m and momentum k (ℏ is set to unity throughout!) in from the left or
the right. By energy conservation, this results in outgoing particles (either to
the left or right) with the same magnitude of momentum k. Use a normalization
where the free particle plane wave moving left to right is given by a wavefunction
⟨x|k⟩ = e^(ikx).
(a) Show that in coordinate space the Lippmann–Schwinger equation for right-moving
in-states takes the form (k > 0)

$$ \Psi^{(+)}_{\rm in}(x;k) \equiv \langle x|k\rangle_{\rm in} = e^{ikx} + \int dy\, G_{\rm in}(x,y)\,V(y)\,\Psi^{(+)}_{\rm in}(y;k) \tag{4.193} $$

where the Lippmann–Schwinger in-state kernel is

$$ G_{\rm in}(x,y) \equiv \langle x|\frac{1}{\frac{k^2}{2m}-H_0+i\epsilon}|y\rangle, \qquad H_0 \equiv \frac{P^2}{2m} $$

(b) Show that the kernel G_in takes the explicit form

$$ G_{\rm in}(x,y) = -\frac{im}{k}\,e^{ik|x-y|} $$

(c) Use the results of (a) and (b) to obtain a completely explicit expression
for Ψ^(+)_in(x; k). Interpret the various terms in your result (for x < 0, x > 0
separately) in terms of the conventional transmission and reflection coefficients
for one-dimensional scattering.
(d) Repeat (a–c) for left-moving in-states: i.e., compute Ψ^(−)_in(x; k).
(e) Repeat (a–d) for left and right-moving out-states: i.e., compute Ψ^(±)_out(x; k).
(f) Suppressing the momentum (magnitude) k, assumed fixed throughout,
we may label the in/out states as simply |±⟩_in,out, leading to a 2 × 2
S-matrix with elements S_++, S_+−, . . . etc. Calculate this 2 × 2 matrix (hint:
it is convenient to use Ψ^(+)_in = S_++ Ψ^(+)_out + S_−+ Ψ^(−)_out, etc.) and verify that it
is unitary.
4. Use the Lippmann–Schwinger equation to calculate Ψ_in(x) for an incoming particle
of momentum k (k > 0) scattering in one dimension off the double δ-potential
V (x) = −g(δ(x − a) + δ(x + a))

5. (a) For the case of a repulsive δ-function potential (g > 0):

$$ V(x) = g\,\delta(x) $$

show that the scattering states by themselves form a complete set (there is
no bound-state in this case!).
(b) Now suppose that the δ-function potential in part (a) is attractive (g < 0).
There is now a single normalizable bound-state energy eigenstate Ψ_b(x).
Show that the completeness relation now requires inclusion of the bound-state:
i.e.,

$$ \int_0^\infty \big\{\Psi^{(+)}_{\rm in}(x;k)\,\Psi^{(+)*}_{\rm in}(y;k) + \Psi^{(-)}_{\rm in}(x;k)\,\Psi^{(-)*}_{\rm in}(y;k)\big\}\,\frac{dk}{2\pi} + \Psi_b(x)\,\Psi_b^{*}(y) = \delta(x-y) \tag{4.194} $$
5 Dynamics III: Relativistic quantum mechanics

In this chapter we shall begin to explore the implications of the basic underlying
symmetry of relativistic quantum field theory: the Poincaré group incorporating
the symmetry of such theories under Lorentz transformations and translations in
spacetime. This is, of course, the second of the three fundamental physical ingredients
of quantum field theory discussed at length in Chapter 3 (the other two being
quantum theory and the locality principle). Minkowski’s introduction in 1908 of a
four-dimensional space, in which the Lorentz transformations of special relativity
could be interpreted as rotations preserving an indefinite metric, does not expand
the physical content of relativity, but it vastly improves our ability to visualize the
physical structure of the theory, and just as importantly, greatly simplifies the search
for theories satisfying the constraints of relativistic invariance.
Although we assume the reader to be familiar with the basic tenets of special
relativity, as formulated in spacetime concepts, we begin this chapter with a brief
review of the Lorentz and Poincaré groups, which provide the kinematic underpinnings
for the description of particle states in field theory. This will also provide a convenient
opportunity for introducing the reader to the notational conventions that will prevail
in the rest of this book.

5.1 The Lorentz and Poincaré groups


In special relativity we characterize events as points in a four-dimensional spacetime
specified by four coordinates which identify the event by readings of clocks synchronized
in and meter sticks attached to a given inertial frame $S$. Typically, we use
Cartesian coordinates $x^1, x^2, x^3$ for the spatial location ($x, y, z$ are more familiar,
but are unsuited to the general spacetime notation we shall use), and $x^0 = t$ for the
time. The event as a whole is associated with the spacetime contravariant four-vector¹
$x^\mu$, $\mu = 0, 1, 2, 3$. The same physical event as viewed by meter sticks and clocks in another
inertial frame $S'$ will be identified by coordinates $x'^\mu$. The linear operations relating the
old and new spacetime four-vectors are called homogeneous Lorentz transformations
(together constituting the homogeneous Lorentz group, or HLG):

$$x'^\mu = \Lambda^\mu{}_\nu\, x^\nu \tag{5.1}$$

1 Note that the physical energy and three spatial momentum components are conventionally taken as
the 0,1,2,3 components respectively of a contravariant vector $k^\mu$.

The scalar product left invariant by a Lorentz transformation is most compactly
expressed by introducing a covariant four-vector $x_\mu$. Indices are lowered by $x_0 = x^0$,
$x_1 = -x^1$, $x_2 = -x^2$, $x_3 = -x^3$ or equivalently, by using the Minkowski metric
tensor $g_{\mu\nu}$,

$$x_\mu = g_{\mu\nu}\, x^\nu$$

The components of $g_{\mu\nu}$ are $g_{00} = 1$, $g_{11} = g_{22} = g_{33} = -1$, with all off-diagonal
elements zero. The Lorentz transformation (5.1) preserves the relativistic scalar product
of any two four-vectors:

$$x' \cdot y' \equiv x'^\mu y'_\mu = \Lambda^\mu{}_\nu \Lambda_{\mu\rho}\, x^\nu y^\rho = x_\rho y^\rho \tag{5.2}$$

for all $x^\rho$, $y^\rho$. Differentiating first with respect to $y^\rho$ and then with respect to $x^\nu$, one
finds

$$\Lambda^\mu{}_\nu \Lambda_{\mu\rho} = g_{\nu\rho} \tag{5.3}$$

$4 \times 4$ matrices satisfying (5.3) are said to lie in the fundamental representation of the
Lorentz group. By raising the index $\rho$ we may rewrite this as

$$\Lambda^\mu{}_\nu \Lambda_\mu{}^\rho = \delta_\nu{}^\rho \tag{5.4}$$

which implies

$$(\Lambda^{-1})^\nu{}_\mu = \Lambda_\mu{}^\nu \tag{5.5}$$

Because of the signs involved in raising and lowering indices, it should be noted that
the Λ matrices are not orthogonal in general (this will hold only for the subgroup of
purely spatial rotations).
Condition (5.3) can be written as the matrix equation

$$\Lambda^T g \Lambda = g \tag{5.6}$$

from which we find, by taking the determinant of both sides, that $\det(\Lambda)^2 = 1$, i.e.,
$\det(\Lambda) = \pm 1$. The set of Lorentz transformations with $\det(\Lambda) = +1$ forms the “proper
Lorentz transformations” (obviously a subgroup). The class of transformations considered
can be refined further by considering only those transformations corresponding to
physically realizable changes of inertial frame. Such “orthochronous” transformations
leave the sign of the time component unchanged and are characterized by $\Lambda^0{}_0 > 0$.
The combined application of the proper and orthochronous requirements leads us
to the restricted Lorentz group, which contains all (and only) physically accessible²
Lorentz transformations: namely, those corresponding to physically accessible changes
of inertial frame. The unimodularity of the proper Lorentz transformations implies
the invariance of four-dimensional spacetime integrals under change of variable corresponding
to a Lorentz transform (as the Jacobian $\det(\Lambda)$ is unity):

2 We exclude “looking in the mirror” (parity transformations with $\Lambda^0{}_0 = 1$, $\det(\Lambda) = -1$) from the set
of physically accessible changes of frame.
 
$$\int d^4x' \ldots = \int d^4x \ldots \tag{5.7}$$

for $x' = \Lambda x$.
The kinematic discussion above focussed on descriptions of coordinates of events:
in particle physics, the description of scattering processes is almost always exclusively
in terms of energies and momenta of the scattering particles.³ Using natural units,
$\hbar = c = 1$, an isolated stable particle of mass $m$ and well-defined spatial momentum
$\vec{k}$ has energy $E(k) = \sqrt{\vec{k}^2 + m^2}$. This “mass-shell condition” relating the relativistic
energy and momentum is more simply written $k^2 \equiv k \cdot k = m^2$. We shall frequently
encounter integrals over the possible four-momenta of a stable particle subject to (i)
the mass-shell condition, and (ii) positivity of the energy. Under proper orthochronous
transformations, the invariance of the relativistic product (and the sign of the energy)
therefore implies the (not at all obvious!) invariance of the measure $d^3k/E(k)$:

$$\int d^4k\, \theta(k_0)\, \delta(k^2 - m^2)\, f(k_0, \vec{k}) = \int \frac{d^3k}{2E(k)}\, f(E(k), \vec{k}) \tag{5.8}$$

a result which we shall employ on numerous occasions.
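The invariance of $d^3k/E(k)$ is equivalent to the statement that the Jacobian of the map from $\vec{k}$ to the spatial part of $\Lambda k$ (with $k^0$ held on shell) equals $E(\Lambda k)/E(k)$. The following is a minimal numerical sketch of that check, not a proof. The mass value, the test momentum, the finite-difference step, and the helper names (`energy`, `boosted_momentum`, `jacobian_det`) are all illustrative assumptions, and the boost matrix uses the layout of (5.10).

```python
import numpy as np

m = 1.0  # particle mass (arbitrary illustrative value, natural units)

def energy(k):
    """On-shell energy E(k) = sqrt(k^2 + m^2) for a spatial momentum k."""
    return np.sqrt(k @ k + m**2)

def boost_z(omega):
    """Boost with rapidity omega along the z-axis, laid out as in (5.10)."""
    ch, sh = np.cosh(omega), np.sinh(omega)
    return np.array([[ch, 0, 0, -sh],
                     [0, 1, 0, 0],
                     [0, 0, 1, 0],
                     [-sh, 0, 0, ch]], dtype=float)

def boosted_momentum(L, k):
    """Spatial part of Lambda k, with k^0 set to the on-shell value E(k)."""
    k4 = np.concatenate(([energy(k)], k))
    return (L @ k4)[1:]

def jacobian_det(L, k, h=1e-6):
    """Determinant of d(Lambda k)_i / d k_j by central finite differences."""
    J = np.empty((3, 3))
    for j in range(3):
        dk = np.zeros(3)
        dk[j] = h
        J[:, j] = (boosted_momentum(L, k + dk) - boosted_momentum(L, k - dk)) / (2 * h)
    return np.linalg.det(J)

L = boost_z(0.7)
k = np.array([0.3, -1.2, 0.5])
lhs = jacobian_det(L, k)                          # |d^3(Lambda k) / d^3 k|
rhs = energy(boosted_momentum(L, k)) / energy(k)  # E(Lambda k) / E(k)
print(lhs, rhs)
```

The two printed numbers should agree to within the finite-difference error, which is the content of $d^3(\Lambda k)/E(\Lambda k) = d^3k/E(k)$ at the chosen point.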
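We list here some special Lorentz transformations of particular interest:
1. Rotation by $\theta$ around the z-axis
$$\Lambda^\mu{}_\nu = R(\theta)^\mu{}_\nu = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos(\theta) & \sin(\theta) & 0 \\ 0 & -\sin(\theta) & \cos(\theta) & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \tag{5.9}$$
2. Boost with rapidity $\omega$ along z-axis
$$\Lambda^\mu{}_\nu = B(\omega)^\mu{}_\nu = \begin{pmatrix} \cosh(\omega) & 0 & 0 & -\sinh(\omega) \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -\sinh(\omega) & 0 & 0 & \cosh(\omega) \end{pmatrix} \tag{5.10}$$
where the rapidity is related to the velocity of the boost by
$$\cosh(\omega) = \frac{1}{\sqrt{1 - v^2}} = \gamma, \qquad \sinh(\omega) = \frac{v}{\sqrt{1 - v^2}} = v\gamma \tag{5.11}$$
Note that the boost matrix $B(\omega)$ (in contrast to the rotation matrix $R(\theta)$) is not
orthogonal.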
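The properties claimed for these two matrices can be confirmed numerically. The sketch below builds $R(\theta)$ and $B(\omega)$ exactly as laid out in (5.9) and (5.10) and tests condition (5.6), the determinant, the sign of $\Lambda^0{}_0$, and orthogonality. The function names and the sample values of $\theta$ and $v$ are illustrative assumptions.

```python
import numpy as np

# Minkowski metric with the signature used in the text: g00 = 1, g11 = g22 = g33 = -1.
g = np.diag([1.0, -1.0, -1.0, -1.0])

def rotation_z(theta):
    """Rotation by theta around the z-axis, laid out as in (5.9)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0, 0],
                     [0, c, s, 0],
                     [0, -s, c, 0],
                     [0, 0, 0, 1]], dtype=float)

def boost_z(omega):
    """Boost with rapidity omega along the z-axis, laid out as in (5.10)."""
    ch, sh = np.cosh(omega), np.sinh(omega)
    return np.array([[ch, 0, 0, -sh],
                     [0, 1, 0, 0],
                     [0, 0, 1, 0],
                     [-sh, 0, 0, ch]], dtype=float)

def is_restricted_lorentz(L, tol=1e-12):
    """True if L satisfies (5.6), has det = +1, and has L[0, 0] > 0."""
    preserves_metric = np.allclose(L.T @ g @ L, g, atol=tol)
    proper = np.isclose(np.linalg.det(L), 1.0, atol=tol)
    orthochronous = L[0, 0] > 0
    return bool(preserves_metric and proper and orthochronous)

def is_orthogonal(L, tol=1e-12):
    """True if L^T L is the 4x4 identity."""
    return bool(np.allclose(L.T @ L, np.eye(4), atol=tol))

v = 0.6                # boost velocity in units with c = 1
omega = np.arctanh(v)  # rapidity, so that cosh(omega) = gamma as in (5.11)
R, B = rotation_z(0.3), boost_z(omega)

print("R: restricted =", is_restricted_lorentz(R), " orthogonal =", is_orthogonal(R))
print("B: restricted =", is_restricted_lorentz(B), " orthogonal =", is_orthogonal(B))
print("cosh(omega) equals gamma:", bool(np.isclose(np.cosh(omega), 1 / np.sqrt(1 - v**2))))
```

Both matrices pass the restricted-Lorentz test, while only the rotation passes the orthogonality test. The parity matrix diag(1, −1, −1, −1) fails `is_restricted_lorentz` because its determinant is −1.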
The restricted Poincaré group (sometimes referred to as the inhomogeneous
(restricted) Lorentz group) is the set of transformations consisting of the combined

3 Oscillation experiments, such as in the neutral kaon system, are a notable exception.

effect of restricted Lorentz transformations (i.e., those $\Lambda$ satisfying $\det(\Lambda) = +1$, $\Lambda^0{}_0 > 0$)
and spacetime displacements. Unless otherwise explicitly stated, we shall assume
henceforth that the restriction to proper, orthochronous Lorentz transformations
applies in our further discussions of both the homogeneous Lorentz group and its
inhomogeneous extension, the Poincaré group. In defining the latter, by convention,
the Lorentz transformation is performed first, followed by the displacement. Thus an
element of the Poincaré group is specified by a pair $(\Lambda, a)$, with action

$$x^\mu \to x'^\mu = \Lambda^\mu{}_\nu\, x^\nu + a^\mu \tag{5.12}$$

The composition law for the Poincaré group follows immediately:

$$(\Lambda_1, a_1) \cdot (\Lambda_2, a_2) = (\Lambda_1 \Lambda_2,\; a_1 + \Lambda_1 a_2) \tag{5.13}$$
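The composition law can be checked by letting two Poincaré elements act in succession on a sample event and comparing with the action of the composed pair. This is a minimal sketch: the helper names (`act`, `compose`, `inverse`), the sample transformations and the sample event are illustrative assumptions, and the inverse formula $(\Lambda^{-1}, -\Lambda^{-1}a)$ is the one that follows from (5.13).

```python
import numpy as np

def rotation_z(theta):
    """Rotation by theta around the z-axis, laid out as in (5.9)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0, 0],
                     [0, c, s, 0],
                     [0, -s, c, 0],
                     [0, 0, 0, 1]], dtype=float)

def boost_z(omega):
    """Boost with rapidity omega along the z-axis, laid out as in (5.10)."""
    ch, sh = np.cosh(omega), np.sinh(omega)
    return np.array([[ch, 0, 0, -sh],
                     [0, 1, 0, 0],
                     [0, 0, 1, 0],
                     [-sh, 0, 0, ch]], dtype=float)

def act(element, x):
    """Action (5.12): Lorentz transformation first, then the displacement."""
    L, a = element
    return L @ x + a

def compose(e1, e2):
    """(L1, a1).(L2, a2) as in (5.13): e2 acts first, then e1."""
    (L1, a1), (L2, a2) = e1, e2
    return (L1 @ L2, a1 + L1 @ a2)

def inverse(element):
    """Inverse element (L^-1, -L^-1 a), obtained by solving (5.13) for the identity."""
    L, a = element
    L_inv = np.linalg.inv(L)
    return (L_inv, -L_inv @ a)

e1 = (boost_z(0.4), np.array([1.0, 0.0, 2.0, -1.0]))
e2 = (rotation_z(1.1), np.array([0.5, -3.0, 0.0, 0.25]))
x = np.array([2.0, 1.0, -1.0, 0.5])

lhs = act(compose(e1, e2), x)   # composed element acting once
rhs = act(e1, act(e2, x))       # e2 first, then e1
print(np.allclose(lhs, rhs))
```

The same helpers also make it easy to confirm that `compose(e1, inverse(e1))` is the identity element $(1, 0)$.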

In accordance with the Wigner unitarity–antiunitarity theorem discussed in Chapter 3,
the Poincaré symmetry of a relativistic quantum theory is realized by a unitary
representation (recall that for a continuous group the antiunitary option is disallowed)
associating the unitary operator $U(\Lambda, a)$ with the Poincaré element $(\Lambda, a)$, with the
algebra (following from (5.13))

$$U(\Lambda_1, a_1)\, U(\Lambda_2, a_2) = U(\Lambda_1 \Lambda_2,\; a_1 + \Lambda_1 a_2) \tag{5.14}$$

For the special case of the subgroup of proper Lorentz transformations ($a = 0$), the
element $\Lambda$ is represented on the state space by the unitary operator $U(\Lambda)$. We now
turn to the properties of the $U(\Lambda)$, beginning with their action on states with spinless
particles.

5.2 Relativistic multi-particle states (without spin)


Consider a state of a single stable massive particle (for simplicity, spinless, and ignoring
all internal quantum numbers) with four-momentum $k^\mu$. There is always a frame
(unique up to rotations) in which the spatial components of the four-momentum
vanish. Define $k^0 \equiv m$ in this frame as the mass. In any other frame obtained by
a proper orthochronous transformation from this one, the zeroth component will
be given by $E(k) = \sqrt{\vec{k}^2 + m^2}$. Thus the state of our spinless particle is completely
characterized by giving just the spatial momentum, and may be written simply $|\vec{k}\rangle$.
There are two normalization conventions commonly used for states in relativistic
quantum theory. The covariant normalization convention defines

$$\langle \vec{k}' | \vec{k} \rangle = 2E(k)\, \delta^3(\vec{k} - \vec{k}') \tag{5.15}$$

The “covariant” appellation is justified by the following invariance property:⁴

$$2E(k)\, \delta^3(\vec{k} - \vec{k}') = 2E(\Lambda k)\, \delta^3(\Lambda\vec{k} - \Lambda\vec{k}') \tag{5.16}$$

4 The spatial vector $\Lambda\vec{k}$ is to be interpreted as the spatial part of the four-vector $\Lambda k$, where the zeroth
component $k_0$ is set equal to the on-mass-shell value $E(k)$.

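To prove this, integrate both sides of (5.16) with a smooth test function $f(\vec{k})/2E(k)$:

$$\int d^3k\, \frac{f(\vec{k})}{2E(k)}\, 2E(k)\, \delta^3(\vec{k} - \vec{k}') = f(\vec{k}') \tag{5.17}$$

is obtained on the left side, while the result on the right is (using (5.8) to pass to a
four-dimensional integral)

$$\begin{aligned}
\int d^3k\, \frac{f(\vec{k})}{2E(k)}\, 2E(\Lambda k)\, \delta^3(\Lambda\vec{k} - \Lambda\vec{k}')
&= \int d^4k\, \theta(k_0)\, \delta(k^2 - m^2)\, 2E(\Lambda k)\, \delta^3(\Lambda\vec{k} - \Lambda\vec{k}')\, f(\vec{k}) \\
&= \int d^4k\, \theta(k_0)\, \delta(k^2 - m^2)\, 2E(k)\, \delta^3(\vec{k} - \Lambda\vec{k}')\, f(\Lambda^{-1}\vec{k}) \\
&= \int d^3k\, \delta^3(\vec{k} - \Lambda\vec{k}')\, f(\Lambda^{-1}\vec{k}) = f(\vec{k}')
\end{aligned} \tag{5.18}$$

where on the second line we have made a change of variable $k \to \Lambda^{-1} k$. This ensures
the invariance of the Hilbert-space inner product under Lorentz transformations:

$$\langle \vec{k} | \vec{k}' \rangle = \langle \Lambda\vec{k} | \Lambda\vec{k}' \rangle \tag{5.19}$$

With non-covariant normalization, we simply drop the factor of $2E(k)$ on the right-hand
side of (5.15).
A state $|\vec{k}\rangle$ will look like $|\Lambda\vec{k}\rangle$ to a boosted or rotated observer. The Lorentz group
is realized on the space of states by operators $U(\Lambda)$ defined to effect precisely this
change:

$$U(\Lambda)|\vec{k}\rangle \equiv |\Lambda\vec{k}\rangle \tag{5.20}$$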

In particle physics one is accustomed to the phenomena of particle creation (and
annihilation): processes occur in which the number of particles of a given type changes,
making it essential that we formulate the theory in a multi-particle state space, or
Fock space, constructed mathematically as the infinite direct sum of Hilbert space
sectors containing 0 (the “vacuum state”, $|0\rangle$), 1 (the single particle states $|\vec{k}\rangle$ discussed
above), 2 particles ($|\vec{k}_1, \vec{k}_2\rangle$), and so on. The inner product is defined naturally on the
basis of multi-particle states of well defined momentum by asserting (i) orthogonality
of sectors with differing number of particles, and (ii) for particles of the same type,
Bose or Fermi symmetry depending on the spin.⁵ For the $N$-particle sector, the latter
requirement (using non-covariant normalization) amounts to the condition

$$\langle \vec{k}'_1, \ldots, \vec{k}'_N | \vec{k}_1, \ldots, \vec{k}_N \rangle = \sum_P (\pm)^P\, \delta^3(\vec{k}_1 - \vec{k}'_{P(1)}) \cdots \delta^3(\vec{k}_N - \vec{k}'_{P(N)}) \tag{5.21}$$

Here $\sum_P$ denotes a sum over all permutations $P$ of the integers $1, 2, \ldots, N$, with
$(\pm)^P$ inserting a minus sign for fermions if the permutation $P$ is odd. Of course,

5 The Spin-Statistics theorem asserting the necessity of Bose statistics for particles of integer spin, and
Fermi statistics for particles of half-integral spin, is one of the seminal results of local field theory: we shall
discuss it in Chapters 7 and 13.

for the spinless bosonic particles under consideration in this section, the positive sign
is to be taken in the symmetrization. Note that the completeness sums acquire extra
combinatoric factors for identical particles: for example, the decomposition of the
identity in the multi-particle Fock space of a single boson takes the form

$$1 = \sum_N \frac{1}{N!} \int d^3k_1\, d^3k_2 \cdots d^3k_N\; |\vec{k}_1, \vec{k}_2, \ldots, \vec{k}_N\rangle \langle \vec{k}_1, \vec{k}_2, \ldots, \vec{k}_N| \tag{5.22}$$

as the reader may readily verify by squaring the above expression. Rather surprisingly,
the very “big” Fock space obtained by the construction outlined here is nevertheless,
like $L^2$, a separable Hilbert space: it is spanned by a denumerable orthonormal
basis.⁶
The operator implementing a Lorentz transformation $\Lambda$ on a multi-particle state
takes the unsurprising form:

$$U(\Lambda)|\vec{k}_1, \vec{k}_2, \ldots\rangle \equiv |\Lambda\vec{k}_1, \Lambda\vec{k}_2, \ldots\rangle \tag{5.23}$$

With the covariant normalization of states defined above these operators are in fact
unitary. For example, on single-particle states,

$$\begin{aligned}
(U(\Lambda)|\vec{k}'\rangle,\, U(\Lambda)|\vec{k}\rangle) &= \langle \vec{k}' | U^\dagger(\Lambda) U(\Lambda) | \vec{k} \rangle \\
&= \langle \Lambda\vec{k}' | \Lambda\vec{k} \rangle \\
&= 2E(\Lambda k)\, \delta^3(\Lambda\vec{k}' - \Lambda\vec{k}) \\
&= 2E(k)\, \delta^3(\vec{k} - \vec{k}') \\
&= \langle \vec{k}' | \vec{k} \rangle
\end{aligned} \tag{5.24}$$

and similarly for multi-particle states.


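In practice it is generally more convenient to use non-covariantly normalized states,
which are simply $1/\sqrt{2E(k)}$ (for each particle in the state) times the old covariantly
normalized states. Thus, the non-covariant normalization convention is simply

$$\langle \vec{k}' | \vec{k} \rangle = \delta^3(\vec{k} - \vec{k}') \tag{5.25}$$

The modification in the Lorentz transformation rule is trivial:

$$U(\Lambda)|\vec{k}\rangle = \sqrt{\frac{E(\Lambda k)}{E(k)}}\; |\Lambda\vec{k}\rangle \tag{5.26}$$

and correspondingly, for a multi-particle state:

$$U(\Lambda)|\vec{k}_1, \vec{k}_2, \ldots, \vec{k}_N\rangle = \prod_{i=1}^{N} \sqrt{\frac{E(\Lambda k_i)}{E(k_i)}}\; |\Lambda\vec{k}_1, \Lambda\vec{k}_2, \ldots, \Lambda\vec{k}_N\rangle \tag{5.27}$$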

6 For a clear discussion of this frequently misunderstood feature, see (Streater and Wightman, 1978).

In either case, we shall often abbreviate the Lorentz action on a multi-particle state
as $U(\Lambda)|\alpha\rangle = |\Lambda\alpha\rangle$.
We can view the multi-particle Fock space described above as a basis of eigenstates
of a free Hamiltonian $H_0$ corresponding to having all self-interactions of our putative
stable, spinless particle switched off. The energy of a multi-particle state is then simply

$$H_0 |\vec{k}_1, \vec{k}_2, \ldots, \vec{k}_N\rangle = \sum_{i=1}^{N} E(k_i)\, |\vec{k}_1, \vec{k}_2, \ldots, \vec{k}_N\rangle \tag{5.28}$$

and the momentum of such a state is clearly

$$\vec{P}^{(0)} |\vec{k}_1, \vec{k}_2, \ldots, \vec{k}_N\rangle = \sum_{i=1}^{N} \vec{k}_i\, |\vec{k}_1, \vec{k}_2, \ldots, \vec{k}_N\rangle \tag{5.29}$$

We can consider $H_0$ and $\vec{P}^{(0)}$ as the time and spatial components respectively
of an energy-momentum four-vector operator $P^{(0)\mu}$ for free (hence the superscript
$(0)$ notation) particles. As we know from general quantum theory, these operators
generate infinitesimal translations in time and space. Taking the matrix element of
the Lorentz transformed energy-momentum operator between covariantly normalized
single particle states:

$$\begin{aligned}
\langle \vec{k}' | U^\dagger(\Lambda) P^{(0)\mu} U(\Lambda) | \vec{k} \rangle &= (\Lambda k)^\mu \cdot 2E(\Lambda k)\, \delta^3(\Lambda\vec{k}' - \Lambda\vec{k}) \\
&= \Lambda^\mu{}_\nu k^\nu \cdot 2E(k)\, \delta^3(\vec{k}' - \vec{k}) \\
&= \Lambda^\mu{}_\nu\, \langle \vec{k}' | P^{(0)\nu} | \vec{k} \rangle
\end{aligned} \tag{5.30}$$

with a similar result for the matrix element in a general multi-particle state. Thus the
four operators ($P^{(0)0}$, $P^{(0)i}$, $i = 1, 2, 3$) transform as expected under Lorentz transformations

$$U^\dagger(\Lambda)\, P^{(0)\mu}\, U(\Lambda) = \Lambda^\mu{}_\nu\, P^{(0)\nu} \tag{5.31}$$

Finally, recalling that a general element $(\Lambda, a)$ of the Poincaré group is defined as a
Lorentz transformation $\Lambda$ followed by a translation by the four-vector $a^\mu$, we see that
the unitary representation of $(\Lambda, a)$ in the free Fock space is given by

$$U(\Lambda, a) = e^{iP^{(0)} \cdot a}\, U(\Lambda) \tag{5.32}$$

A few lines of algebra, employing the transformation property (5.31), confirm that
these operators do indeed furnish a unitary representation of the full Poincaré group
(cf. (5.13))

$$U(\Lambda_1, a_1)\, U(\Lambda_2, a_2) = U(\Lambda_1 \Lambda_2,\; a_1 + \Lambda_1 a_2) \tag{5.33}$$

5.3 Relativistic multi-particle states (general spin)


The analysis of the transformation behavior of single particle states in relativistic
quantum mechanics for general mass and spin was pioneered by Wigner in a classic
paper (Wigner, 1939). Mathematically, this amounts to a classification of the irreducible
unitary representations of the Poincaré group, which may even be viewed as
furnishing a definition of a particle. Note that there is no claim to elementarity (absence
of internal structure) for such an object: any quantum state which is an eigenstate of
the squared mass operator $P_\mu P^\mu$ and (for massive particles) of the angular momentum
operators $\vec{J}^2$, $J_z$ in the inertial frame where it is at rest, corresponds to a single particle
state associated with a specific irreducible representation of the Poincaré group. Thus,
from this point of view, the proton (a complicated bound state of quarks and gluons)
is just as much a particle as the (as far as we know) structureless electron.⁷ Wigner’s
technique for exposing the structure of these representations can be summarized as
follows. One first removes the “continuous” part of the specification of the state (i.e.,
the spatial momentum) by performing a Lorentz boost which (for massive particles)
reduces the particle to rest. The remaining discrete (“spin”) degrees of freedom are
then exposed by analysing the properties of the “little group”: i.e., the subgroup of the
full Poincaré group which leaves the particle in the standard state. For massive
particles the standard state is the state at rest, which means that the little group is
simply the three-dimensional rotation group. The analysis for massless particles is
somewhat more subtle, as the choice of standard state (which cannot be at rest) leads
to a little group which is not semisimple.⁸ We shall concentrate on massive particles
for most of this section, returning briefly to the massless case at the end. There is some
phenomenological justification for this: we know of many massive but presumably only
two exactly massless particles (the photon, and hypothetically, the graviton). The
interactions of photons are subject to an abelian gauge symmetry (cf. Chapter 15) which
renders the massless limit smooth, so that in fact it is operationally impossible to
distinguish between the interactions of a photon of exactly zero mass and one with mass
equal to $10^{-100}$ eV, for example. Accordingly, for this one case of a (possibly) massless
particle, we might as well provisionally take the photon to have non-zero mass and pass
to the zero mass limit at the end of the calculations.
Any massive particle state can be viewed from a frame in which it is at rest, so
there is no loss of generality in considering first only states with $\vec{p} = 0$, any other
one-particle state being obtainable by applying $U(L(\vec{k}))$ where $L(\vec{k})$ boosts from
the frame where the particle is at rest to the one where the momentum is $\vec{k}$. Of
course, $L(\vec{k})$ is not unique, as any Lorentz transformation from rest to a non-zero
momentum $\vec{k}$ can be preceded by an arbitrary rotation, or followed by an arbitrary
rotation around the direction of $\vec{k}$ without altering the final momentum. This will
lead to the freedom to define alternative basis sets for single (or multi-) particle states
with spin.
The subgroup of the homogeneous Lorentz group (HLG) leaving a state with zero
spatial momentum at zero momentum is the three-dimensional rotation group, with
generators $J_x$, $J_y$, $J_z$. The irreducible representations of this group are characterized
by the eigenvalues of $\vec{J}^2$ ($= j(j + 1)$), with the states within a given representation

7 Of course, in string theory even supposedly point-like quarks and leptons acquire a one-dimensional
string structure in ten dimensions, or a membrane structure in eleven-dimensional M-theory.
8 The reader may recall that a semisimple group is one with no invariant abelian subgroups.

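distinguished by the eigenvalue of $J_z$ ($\equiv \sigma$). We shall customarily omit the specification
of $j$, and write a one-particle state of a particle at rest as simply $|\vec{0}, \sigma\rangle$, with

$$\begin{aligned}
J_z |\vec{0}, \sigma\rangle &= \sigma\, |\vec{0}, \sigma\rangle \\
(J_x \pm iJ_y) |\vec{0}, \sigma\rangle &= \sqrt{(j \mp \sigma)(j \pm \sigma + 1)}\; |\vec{0}, \sigma \pm 1\rangle \\
\vec{J}^2 |\vec{0}, \sigma\rangle &= j(j + 1)\, |\vec{0}, \sigma\rangle
\end{aligned} \tag{5.34}$$

Of course, $j$ is conventionally referred to as the spin of the particle in question, and
must take either integer or half-integer values by the general representation theory of
the rotation group.⁹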
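The relations (5.34) fix the matrices of $J_x$, $J_y$, $J_z$ in the $(2j+1)$-dimensional rest-frame basis completely. As a minimal sketch, the code below constructs them from the raising-operator matrix elements and confirms the rotation-group commutator $[J_x, J_y] = iJ_z$ together with $\vec{J}^2 = j(j+1)$. The function name `spin_matrices` and the basis ordering $\sigma = j, j-1, \ldots, -j$ are illustrative choices, not conventions taken from the text.

```python
import numpy as np

def spin_matrices(j):
    """Return (Jx, Jy, Jz, Jplus) for spin j in the basis sigma = j, j-1, ..., -j."""
    dim = int(round(2 * j)) + 1
    sigmas = j - np.arange(dim)              # J_z eigenvalue for each basis index
    Jz = np.diag(sigmas).astype(complex)
    Jplus = np.zeros((dim, dim), dtype=complex)
    for col in range(1, dim):
        s = sigmas[col]
        # (Jx + iJy)|sigma> = sqrt((j - sigma)(j + sigma + 1)) |sigma + 1>, as in (5.34);
        # |sigma + 1> sits one index earlier in this ordering.
        Jplus[col - 1, col] = np.sqrt((j - s) * (j + s + 1))
    Jminus = Jplus.conj().T
    Jx = (Jplus + Jminus) / 2
    Jy = (Jplus - Jminus) / (2 * 1j)
    return Jx, Jy, Jz, Jplus

for spin in (0.5, 1.0, 1.5, 2.0):
    Jx, Jy, Jz, _ = spin_matrices(spin)
    commutator_ok = np.allclose(Jx @ Jy - Jy @ Jx, 1j * Jz)
    J2 = Jx @ Jx + Jy @ Jy + Jz @ Jz
    casimir_ok = np.allclose(J2, spin * (spin + 1) * np.eye(len(Jz)))
    print(spin, commutator_ok, casimir_ok)
```

For $j = 1/2$ the construction reproduces one half of the Pauli matrices.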
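Let $R(\hat{k})$ be the rotation operator taking the z-axis into the $\hat{k}$ direction via a
rotation around the axis perpendicular to both. Further, define $B(|\vec{k}|)$ (cf. (5.10)) as
the boost from rest to the state with momentum $|\vec{k}|$ in the z-direction. It is immediately
obvious that we arrive at a state with momentum $\vec{k}$ by applying either of the following
Lorentz transformations

$$L(\vec{k}) \equiv R(\hat{k})\, B(|\vec{k}|)\, R^{-1}(\hat{k}) \tag{5.35}$$

$$L(\vec{k}) \equiv R(\hat{k})\, B(|\vec{k}|) \tag{5.36}$$

to the state at rest $|\vec{0}, \sigma\rangle$. These two transformations give two possible definitions for
general one-particle spin-$j$ states: including the $\sqrt{m/E(k)}$ factor for non-covariant
normalization (cf. (5.26)), we may either use states defined, with $L(\vec{k})$ taken from (5.35), as

$$|\vec{k}, \sigma\rangle \equiv \sqrt{\frac{m}{E(k)}}\; U(L(\vec{k}))\, |\vec{0}, \sigma\rangle \tag{5.37}$$

or, with $L(\vec{k})$ taken from (5.36), as

$$|\vec{k}, \lambda\rangle \equiv \sqrt{\frac{m}{E(k)}}\; U(L(\vec{k}))\, |\vec{0}, \lambda\rangle \tag{5.38}$$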

The states defined by (5.37) are conventionally referred to as “spin” states: as in
ordinary non-relativistic quantum theory, the discrete spin index $\sigma$ refers to the
component of angular momentum (of the particle in its rest frame) along a standard
axis (typically, the z-axis). States defined as in (5.38) are called “helicity” states, as
the $\lambda$ quantum number actually refers to the projection of the angular momentum in
the direction of motion of the particle:

9 Throughout this section we shall make heavy use of the machinery of angular momentum and the
rotation group: for a review, see Chapters 15 and 17 of (Baym, 1990).

$$\begin{aligned}
\vec{J} \cdot \hat{k}\, |\vec{k}, \lambda\rangle &= \sqrt{\frac{m}{E(k)}}\; U(R(\hat{k}))\, U^\dagger(R(\hat{k}))\, \vec{J} \cdot \hat{k}\, U(R(\hat{k}))\, U(B(|\vec{k}|))\, |\vec{0}, \lambda\rangle \\
&= \sqrt{\frac{m}{E(k)}}\; U(R(\hat{k}))\, J_z\, U(B(|\vec{k}|))\, |\vec{0}, \lambda\rangle \\
&= \sqrt{\frac{m}{E(k)}}\; U(R(\hat{k}))\, U(B(|\vec{k}|))\, J_z\, |\vec{0}, \lambda\rangle \\
&= \lambda\, |\vec{k}, \lambda\rangle
\end{aligned} \tag{5.39}$$

where in the third line we have used the fact that rotations around the z axis (affecting
only the transverse x and y coordinates) commute with boosts along the z axis.
Of course, both the $|\vec{k}, \sigma\rangle$, $\sigma = -j, -j + 1, \ldots, +j$ and the $|\vec{k}, \lambda\rangle$, $\lambda = -j, \ldots, +j$ states
form complete sets and can be expressed as linear combinations of one another
(specifically: $|\vec{k}, \lambda\rangle = \sum_\sigma D^j_{\sigma\lambda}(R(\hat{k}))\, |\vec{k}, \sigma\rangle$). The helicity states $|\vec{k}, \lambda\rangle$ are of particular
utility in dealing with massless or highly energetic particles, as we shall see below when
we address the massless case explicitly. To summarize our options, we may either label
the discrete spin state of our particle by the $J_z$ eigenvalue of a comoving observer (as
in (5.37)), or by the eigenvalue of the component of angular momentum $\vec{J} \cdot \hat{k}$ along
the direction of motion of the particle (as in (5.38)).
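The two choices (5.35) and (5.36) can be compared concretely at the level of $4 \times 4$ matrices. In the sketch below both transformations carry the rest four-momentum $(m, 0, 0, 0)$ to $(E(k), \vec{k})$, and they differ only by the rotation $R^{-1}(\hat{k})$ applied first to the rest state. Two assumptions are made and should be kept in mind: the boost is written in its active form (with $+\sinh\omega$ entries, so that it really takes a particle from rest to momentum $|\vec{k}|$ along $+z$), and $R(\hat{k})$ is built with the Rodrigues formula as an active rotation of vectors. The helper names and sample momentum are illustrative; the construction assumes $\vec{k} \neq 0$ and $\hat{k}$ not antiparallel to the z-axis.

```python
import numpy as np

m = 1.0  # particle mass (illustrative value)

def active_boost_z(kmag):
    """Boost taking a particle at rest to momentum kmag along +z (sinh(omega) = kmag/m)."""
    E = np.sqrt(kmag**2 + m**2)
    ch, sh = E / m, kmag / m
    return np.array([[ch, 0, 0, sh],
                     [0, 1, 0, 0],
                     [0, 0, 1, 0],
                     [sh, 0, 0, ch]], dtype=float)

def rotation_to(khat):
    """4x4 rotation taking the z-axis into khat about the axis perpendicular to both."""
    z = np.array([0.0, 0.0, 1.0])
    axis = np.cross(z, khat)
    s, c = np.linalg.norm(axis), z @ khat      # sine and cosine of the rotation angle
    R3 = np.eye(3)
    if s > 1e-14:                              # khat parallel to z needs no rotation
        n = axis / s
        K = np.array([[0, -n[2], n[1]],
                      [n[2], 0, -n[0]],
                      [-n[1], n[0], 0]])
        R3 = np.eye(3) + s * K + (1 - c) * (K @ K)   # Rodrigues formula
    R = np.eye(4)
    R[1:, 1:] = R3
    return R

def spin_boost(k):
    """L(k) of (5.35): R B R^{-1}, using R^{-1} = R^T for a rotation."""
    kmag = np.linalg.norm(k)
    R = rotation_to(k / kmag)
    return R @ active_boost_z(kmag) @ R.T

def helicity_boost(k):
    """L(k) of (5.36): R B."""
    kmag = np.linalg.norm(k)
    return rotation_to(k / kmag) @ active_boost_z(kmag)

k = np.array([0.4, -0.3, 1.2])
rest = np.array([m, 0.0, 0.0, 0.0])
target = np.concatenate(([np.sqrt(k @ k + m**2)], k))

print(np.allclose(spin_boost(k) @ rest, target))
print(np.allclose(helicity_boost(k) @ rest, target))
# The two choices differ by a rotation acting first on the rest frame:
difference = np.linalg.inv(helicity_boost(k)) @ spin_boost(k)
print(np.allclose(difference, rotation_to(k / np.linalg.norm(k)).T))
```

Since a rotation leaves the rest four-momentum unchanged, the final momentum is the same in both cases; only the labeling of the spin degrees of freedom differs.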
Now we turn to the Lorentz transformation properties of these states. Evidently

$$\begin{aligned}
U(\Lambda)|\vec{k}, \sigma\rangle &= \sqrt{\frac{m}{E(k)}}\; U(\Lambda)\, U(L(k))\, |\vec{0}, \sigma\rangle = \sqrt{\frac{m}{E(k)}}\; U(\Lambda L(k))\, |\vec{0}, \sigma\rangle \\
&= \sqrt{\frac{m}{E(k)}}\; U(L(\Lambda k))\, U(L^{-1}(\Lambda k)\, \Lambda\, L(k))\, |\vec{0}, \sigma\rangle \\
&\equiv \sqrt{\frac{m}{E(k)}}\; U(L(\Lambda k))\, U(W(\Lambda, k))\, |\vec{0}, \sigma\rangle
\end{aligned} \tag{5.40}$$

The Lorentz transformation $W(\Lambda, k) \equiv L^{-1}(\Lambda k)\, \Lambda\, L(k)$ takes a particle at rest
first to momentum $k$, then to momentum $\Lambda k$, and then finally
back to rest, and therefore must be a pure rotation: it is called the Wigner rotation
(the complete unitary representation theory of the Poincaré group was first developed
by Eugene Wigner in the previously mentioned seminal paper in 1939 (Wigner, 1939)).
The action of a pure rotation on a state of a spin-$j$ particle at rest is given¹⁰ by the
rotation matrices $D^j_{\sigma'\sigma}$. So we have

10 See Baym (Baym, 1990), Chapter 17. Our notation vis-à-vis rotation group matters agrees with this
reference.

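$$\begin{aligned}
U(\Lambda)|\vec{k}, \sigma\rangle &= \sqrt{\frac{m}{E(k)}}\; U(L(\Lambda k)) \sum_{\sigma'} D^j_{\sigma'\sigma}(W(\Lambda, k))\, |\vec{0}, \sigma'\rangle \\
&= \sqrt{\frac{E(\Lambda k)}{E(k)}}\; \sum_{\sigma'} D^j_{\sigma'\sigma}(W(\Lambda, k))\, |\Lambda\vec{k}, \sigma'\rangle
\end{aligned} \tag{5.41}$$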
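That the combination $L^{-1}(\Lambda k)\, \Lambda\, L(k)$ appearing in (5.40) is a pure rotation can be seen numerically. The sketch below takes $L(k)$ to be the rotation-free boost from rest to momentum $\vec{k}$, written in a standard closed form; we assume this plays the role of (5.35), with the boost understood in its active sense. The mass, the sample momentum, the choice of $\Lambda$, and the helper names are illustrative assumptions.

```python
import numpy as np

m = 1.0  # particle mass (illustrative value)

def on_shell(k):
    """Four-momentum (E(k), k) for spatial momentum k."""
    k = np.asarray(k, dtype=float)
    return np.concatenate(([np.sqrt(k @ k + m**2)], k))

def standard_boost(k):
    """Pure boost L(k) taking the rest four-momentum (m, 0, 0, 0) to (E(k), k)."""
    k = np.asarray(k, dtype=float)
    E = np.sqrt(k @ k + m**2)
    L = np.eye(4)
    L[0, 0] = E / m
    L[0, 1:] = L[1:, 0] = k / m
    L[1:, 1:] += np.outer(k, k) / (m * (E + m))
    return L

def wigner_rotation(Lam, k):
    """W(Lambda, k) = L^{-1}(Lambda k) Lambda L(k)."""
    k_new = (Lam @ on_shell(k))[1:]
    return np.linalg.inv(standard_boost(k_new)) @ Lam @ standard_boost(k)

k = np.array([0.0, 0.0, 1.5])                 # particle moving along z
Lam = standard_boost([0.8, 0.0, 0.0])         # a boost along x used as the transformation
W = wigner_rotation(Lam, k)
R3 = W[1:, 1:]                                # spatial block

print(np.allclose(W[0], [1, 0, 0, 0]), np.allclose(W[:, 0], [1, 0, 0, 0]))
print(np.allclose(R3 @ R3.T, np.eye(3)), np.isclose(np.linalg.det(R3), 1.0))
print(np.allclose(R3, np.eye(3)))             # False here: the rotation is non-trivial
```

The time row and column of `W` are those of the identity and the spatial block is a proper rotation matrix. For a boost $\Lambda$ collinear with $\vec{k}$ the same code returns the identity, since successive boosts along one axis compose to a boost.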

For $j = 0$ we have trivially $D^j_{\sigma'\sigma} = \delta_{\sigma' 0}\, \delta_{\sigma 0}$, and we regain the transformation law (5.26)
for spinless particle states. A straightforward calculation, employing the unitarity of
the $D^j$ matrices and $U(\Lambda)$ operators, confirms the non-covariant normalization of these
states:

$$\langle \vec{k}', \sigma' | \vec{k}, \sigma \rangle = \delta_{\sigma'\sigma}\, \delta^3(\vec{k}' - \vec{k}) \tag{5.42}$$

Finally, we note that the extension of the unitary representation $U(\Lambda)$ for the
homogeneous Lorentz group to the full Poincaré group including translations follows
exactly the same lines as in the preceding section for spinless particles: in particular,
the spin indices are unaffected by the application of the energy-momentum operator
$P^{(0)\mu}$.
The above discussion for massive particles undergoes significant modifications for
massless particles. As far as we know, the only exactly massless particles in Nature
are the photon (spin j = 1) and the (hypothetical) graviton (spin j = 2). In either
case, the one particle states are labeled by a discrete helicity index which takes only
two possible values, +j, −j: the intermediate values −j + 1, . . . , j − 1 which would be
present for a massive particle are missing. The reason for this, as we shall now see,
is that the little group for massless particles is quite different from that for massive
particles (the rotation group O(3), as discussed above).
First, let us revert to covariant normalization of states, to avoid inconvenient
factors of $\sqrt{E(k)}$ in the transformation formulas. We cannot choose the state of the
particle at rest as our standard state for massless particles; instead, our standard
state will be defined as the particle with a standard momentum (and energy) $+\mu$ in
the z-direction, i.e., with four-momentum $k_0 = (\mu, 0, 0, \mu)$. The boost operator $B(k)$
will now be defined as the boost in the z-direction with rapidity $\omega = \ln(k/\mu)$, i.e., it takes
the standard state with four-momentum $k_0 = (\mu, 0, 0, \mu)$ to four-momentum $(k, 0, 0, k)$.
The rotation $R(\hat{k})$ then takes this to a general light-like state with four-momentum
$(|\vec{k}|, \vec{k})$. Thus, if we label the (as yet unspecified) spin quantum number(s) needed
to specify fully the one-particle state $\tau$, a general massless particle state can be
defined as

$$|\vec{k}, \tau\rangle \equiv U(L(\vec{k}))\, |k_0, \tau\rangle \tag{5.43}$$

$$L(\vec{k}) \equiv R(\hat{k})\, B(|\vec{k}|) \tag{5.44}$$

Note that we must use the helicity type transformation of (5.36) here, rather than
the spin type transformation (5.35), as our standard state has non-zero momentum.
Following exactly the steps performed leading to (5.40), this time in covariant
normalization (so the square-root factors are absent), we find the general Lorentz
transformation law for such states to be

$$U(\Lambda)|\vec{k}, \tau\rangle = U(L(\Lambda k))\, U(W(\Lambda, k))\, |k_0, \tau\rangle \tag{5.45}$$

where the Lorentz transformation

$$W(\Lambda, k) \equiv B^{-1}(|\Lambda\vec{k}|)\, R^{-1}(\widehat{\Lambda k})\, \Lambda\, R(\hat{k})\, B(|\vec{k}|)$$

must now be an element of the little group for massless particle states: namely, it must
be a Lorentz transformation leaving the standard vector $k_0 = (\mu, 0, 0, \mu)$ invariant. As
in the massive case, this condition specifies a three-parameter subgroup of the homogeneous
Lorentz group. However, this group, as we shall now see, is not the conventional
O(3) rotation group (obviously, as rotations around the x and y directions manifestly
do not leave the standard vector unchanged), with its attendant fully understood
finite-dimensional representations labeled by $j$, $\sigma$ quantum numbers. Instead, it turns out to
be a non-compact, non-semisimple group: namely, the Euclidean group of rotations and
translations in two dimensions, with a completely different representation structure to
the O(3) rotation group.
The little group of Lorentz transformations leaving our standard light-like four-vector
$k_0 = (\mu, 0, 0, \mu)$ invariant clearly still retains a remnant of the full rotation
group: namely, rotations $R(\theta)$ by an angle $\theta$ around the z-axis, as given in (5.9).
Defining a two-dimensional vector $\vec{\xi} \equiv (\xi_1, \xi_2)$, the reader may easily verify that the
two-parameter set of Lorentz transformations defined by

$$T(\vec{\xi})^\mu{}_\nu = \begin{pmatrix} 1 + \frac{1}{2}\vec{\xi}^2 & \xi_1 & \xi_2 & -\frac{1}{2}\vec{\xi}^2 \\ \xi_1 & 1 & 0 & -\xi_1 \\ \xi_2 & 0 & 1 & -\xi_2 \\ \frac{1}{2}\vec{\xi}^2 & \xi_1 & \xi_2 & 1 - \frac{1}{2}\vec{\xi}^2 \end{pmatrix} \tag{5.46}$$

also leaves $k_0$ invariant. The transformations $T(\vec{\xi})$ form an abelian subgroup, as one
easily sees that

$$T(\vec{\xi})\, T(\vec{\chi}) = T(\vec{\xi} + \vec{\chi}) \tag{5.47}$$

which is therefore structurally identical to the group of translations in the x-y plane.
Indeed, the two-vector $\vec{\xi}$ transforms as expected under a rotation around the z-axis:

$$R(\theta)\, T(\vec{\xi})\, R^{-1}(\theta) = T(R_\theta \vec{\xi}) \tag{5.48}$$

where $R_\theta \vec{\xi} \equiv (\cos(\theta)\xi_1 + \sin(\theta)\xi_2,\; -\sin(\theta)\xi_1 + \cos(\theta)\xi_2)$. Moreover, (5.48) implies
that these two-dimensional translations form an invariant abelian subgroup of the
full three-parameter Euclidean group of two-dimensional translations and rotations
$W(\vec{\xi}, \theta) \equiv T(\vec{\xi}) R(\theta)$ which comprise the little group for a massless particle:

$$W(\vec{\xi}, \theta)\, T(\vec{\chi})\, W^{-1}(\vec{\xi}, \theta) = T(R_\theta \vec{\chi}) \tag{5.49}$$

so, as advertised previously, our little group is definitely not semisimple, nor compact
(as the translations are unbounded).
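The properties (5.46) to (5.49) lend themselves to a direct numerical check. The sketch below builds $T(\vec{\xi})$ from (5.46) and $R(\theta)$ from (5.9), and tests invariance of $k_0$, the Lorentz condition (5.6), the abelian composition law, and the two conjugation identities. The helper names, the value of $\mu$ and the sample parameters are illustrative assumptions.

```python
import numpy as np

g = np.diag([1.0, -1.0, -1.0, -1.0])   # Minkowski metric
mu = 2.0                               # standard momentum scale (illustrative value)
k0 = np.array([mu, 0.0, 0.0, mu])      # standard light-like four-momentum

def T(xi):
    """Little-group 'translation' T(xi), laid out as in (5.46)."""
    x1, x2 = xi
    h = 0.5 * (x1**2 + x2**2)
    return np.array([[1 + h, x1, x2, -h],
                     [x1, 1, 0, -x1],
                     [x2, 0, 1, -x2],
                     [h, x1, x2, 1 - h]], dtype=float)

def R(theta):
    """Rotation by theta around the z-axis, laid out as in (5.9)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0, 0],
                     [0, c, s, 0],
                     [0, -s, c, 0],
                     [0, 0, 0, 1]], dtype=float)

def rotate2(theta, xi):
    """The two-vector R_theta xi defined below (5.48)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([c * xi[0] + s * xi[1], -s * xi[0] + c * xi[1]])

xi, chi, theta = np.array([0.7, -1.3]), np.array([-0.2, 0.9]), 0.8
W = T(xi) @ R(theta)                   # general little-group element W(xi, theta)

checks = {
    "T leaves k0 invariant": np.allclose(T(xi) @ k0, k0),
    "R leaves k0 invariant": np.allclose(R(theta) @ k0, k0),
    "T is a Lorentz transformation": np.allclose(T(xi).T @ g @ T(xi), g),
    "(5.47)": np.allclose(T(xi) @ T(chi), T(xi + chi)),
    "(5.48)": np.allclose(R(theta) @ T(xi) @ np.linalg.inv(R(theta)), T(rotate2(theta, xi))),
    "(5.49)": np.allclose(W @ T(chi) @ np.linalg.inv(W), T(rotate2(theta, chi))),
}
for name, ok in checks.items():
    print(name, ok)
```

Because the entries of $T(\vec{\xi})$ grow without bound as $|\vec{\xi}|$ increases, the same construction also illustrates the non-compactness noted above.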

What is the representation structure of this Euclidean group? The unitary representative
of the rotation part $R(\theta)$ is just $U(R(\theta)) = e^{i\theta J_z}$ so if we label the
eigenvalue of the hermitian generator $J_z$ by $\lambda$, we must include this “helicity”
value (the component of angular momentum along the direction of motion) in the
specification of quantum numbers $\tau$ which give a full labeling of our standard state
$|k_0, \tau\rangle$. In principle, the hermitian generators of the translation part of the little
group could also take on non-vanishing “momentum” eigenvalues $\vec{\pi} = (\pi_1, \pi_2)$, with
$U(T(\vec{\xi}))|k_0, \tau\rangle = e^{i\vec{\pi} \cdot \vec{\xi}}\, |k_0, \tau\rangle$, in which case the full set of quantum numbers $\tau$ would
be labeled by the three parameter set $(\vec{\pi}, \lambda)$. The one massless particle with which we
have (extensive!) experience, the photon, definitely has non-zero helicity states, but
there is no empirical evidence for any further degrees of freedom corresponding to the
$\vec{\pi}$ variables, which we must therefore set to zero. Accordingly, we shall assume that
one-particle massless states in general are fully specified by a momentum and single
helicity variable, with a standard state $|k_0, \lambda\rangle$ satisfying

$$J_z |k_0, \lambda\rangle = \lambda\, |k_0, \lambda\rangle \tag{5.50}$$

The only remnant of the rotation group here is the one-dimensional abelian subgroup
of rotations around the z-axis, for the standard state, or more generally, around the
direction of motion of the massless particle. In particular, we are missing the raising
and lowering operators constructed from the other two generators ($J_x$, $J_y$) which would
normally allow us to (a) move stepwise from helicity $\lambda$ to $\lambda \pm 1$, thereby filling out a full
$(2j + 1)$-dimensional O(3) representation of spin $j$ states, and (b) establish the integral
or half-integral quantization of the maximal helicity. In fact, for massless particles the
helicity is actually a Lorentz-invariant: it is exactly the same number for the state
$|\vec{k}, \lambda\rangle = U(L(\vec{k}))|k_0, \lambda\rangle$ obtained by boosting the standard state $|k_0, \lambda\rangle$ for a particle
moving in the z-direction with the same helicity, as we see by the calculation leading
to (5.39). Physically, massless particles of different helicity are, from the standpoint
of the proper Lorentz group (i.e., absent improper parity transformations), completely
distinct and unrelated objects. If we include parity transformations, of course, which
reverse momentum but leave angular momentum unchanged, the helicity changes sign,
so if a massless particle participates in parity conserving interactions (as the photon
certainly does), then if the particle exists with helicity $+\lambda$ it must also exist with
helicity $-\lambda$. But the representation theory of the little group clearly says nothing
about intermediate helicity values (such as 0 for the photon, or +1, 0, and −1 for the
spin-2 graviton).
What about the quantization condition for maximal helicity (≡ spin of the par-
ticle), given the absence of the full O(3) rotation group? At first, one might suppose
that as a rotation by angle θ around the momentum vector of our helicity λ state |k, λ
generates a phase factor eiλθ , a rotation by 2π must return us to the original state,
ensuring integer quantization of helicity. Of course, the doubly connected structure of
the rotation group is the critical property saving us from this conclusion, and reinstat-
ing the possibility of the spinor type half-integral representations. Recall that whereas
any continuous one-dimensional path R(α), 0 ≤ α ≤ 1 through the group manifold of
O(3) in which the starting element R(0) is the identity but the final rotation R(1) is
by an odd multiple of 2π (around the z-axis, say) cannot be continuously shrunk to a
How not to construct a relativistic quantum theory 121

point (the identity) in the group manifold (i.e., to the path R(α) = R(0) = 1, α ≤ 1):
the rotation group in three dimensions is not simply connected. On the other hand,
paths starting at the identity and ending at rotations involving a multiple of 4π can
be so shrunk.11 The corresponding unitary representatives U (R(α)) must generate
continuously varying phases applied to our massless one-particle state, which then
implies that
1 3
ei(4πn)λ = 1 ⇒ λ = 0, ± , ±1, ± , . . . (5.51)
2 2
In other words, as we expect, only integer or half-integer values for the helicity of
a massless particle are allowed. This is an example of a topological, rather than an
algebraic, quantization of a quantum number.
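
The topological statement can be illustrated numerically. The following is a minimal sketch, not taken from the text: it assumes the standard spin-1/2 representative exp(−iθσ_z/2) of a rotation about the z-axis (the function names are hypothetical) and checks that a 2π rotation gives −1 while a 4π rotation gives the identity, so that the condition (5.51) admits half-integer as well as integer helicities.

```python
import numpy as np

def spin_half_rotation(theta):
    """Spin-1/2 representative exp(-i*theta*sigma_z/2) of a rotation by theta about z."""
    return np.diag([np.exp(-0.5j * theta), np.exp(0.5j * theta)])

def helicity_phase(lam, theta):
    """Phase exp(i*lam*theta) picked up by a helicity-lam state under the rotation."""
    return np.exp(1j * lam * theta)

# Candidate helicities that satisfy exp(i*4*pi*lam) = 1 (the n = 1 case of (5.51)).
allowed = [lam for lam in (0, 0.5, 1, 1.5, 2)
           if np.isclose(helicity_phase(lam, 4 * np.pi), 1)]
# Values such as 1/4 or 1/3 fail the same test.
rejected = [lam for lam in (0.25, 1 / 3)
            if np.isclose(helicity_phase(lam, 4 * np.pi), 1)]
```

Only the 4π closure is imposed here, which is why the spinor representation, with its sign change under a 2π rotation, survives.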

5.4 How not to construct a relativistic quantum theory


Having specified the kinematic structure of the multi-particle Hilbert space for rela-
tivistic particles, one may set about the task of constructing a sensible “relativistic
quantum mechanics”: at the very least we should expect to be able to construct,
at least in principle, theories in which a unitary S-matrix satisfies the requirements
of special relativity. Our general discussion of quantum symmetries in Section 4.1.3
(and in particular, the Wigner unitarity–antiunitarity theorem discussed there) requires
equality of the S-matrix amplitudes (and not just their absolute squares, giving the
probabilities of the corresponding scattering) for a process in which a prepared state
$|\alpha\rangle_{in}$ evolves to a detected state $|\beta\rangle_{out}$ in two inertial frames related by a Lorentz
transformation $\Lambda$. Specifically, we must have

\[ {}_{out}\langle\beta|\alpha\rangle_{in} = S_{\beta\alpha} = S_{\Lambda\beta,\Lambda\alpha} \tag{5.52} \]

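which ensures the inability of an observer to detect absolute inertial motion by
performing scattering experiments. Writing out the matrix elements above more
explicitly

\[ S_{\beta\alpha} = \langle\beta|S|\alpha\rangle = \langle\Lambda\beta|S|\Lambda\alpha\rangle = \langle\beta|U^\dagger(\Lambda)\, S\, U(\Lambda)|\alpha\rangle \tag{5.53} \]

for all states $|\beta\rangle$, $|\alpha\rangle$. Consequently, we must have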

\[ S = U^\dagger(\Lambda)\, S\, U(\Lambda) \tag{5.54} \]

for an arbitrary Lorentz transformation $\Lambda$. Equivalently, $S$ commutes with the
representatives $U(\Lambda)$ of Lorentz transformations on the quantum-state space:

\[ [U(\Lambda), S] = 0 \tag{5.55} \]

11 The reader may verify this immediately by the magical “twisted belt” experiment: a belt held flat at
both ends, but with a single twist by 360 degrees in the middle cannot be flattened by moving only the
right end around (while held flat), whereas if subjected to a double twist of two full rotations, the belt
“unwinds” easily if we move its right end appropriately, keeping the left end fixed.
We remind the reader (cf. Section 4.3) that the S-matrix is defined as a unitary
operator in the basis of free particle states: the behavior of these states under the
action of U (Λ) is precisely the content of the preceding sections of this chapter.
The requirement of Poincaré invariance adds invariance under spacetime translations
to the above (i.e., we require commutativity of S with the larger class of unitary
symmetry operators U (Λ, a)). This, of course, implies commutativity of S with the
infinitesimal generators of spacetime translations, namely the (free) energy-momentum
operator $P^{(0)\mu}$:

\[ [P^{(0)\mu}, S] = 0 \tag{5.56} \]

There is, in fact, a trivial way to ensure both unitarity of S and the invariance
requirements (5.55, 5.56). Let us write the full Hamiltonian of the theory as

\[ H = H_0 + V \tag{5.57} \]

where the free part $H_0$ is defined by (5.28). Recall that the S-matrix is
obtained as the infinite time limit of the interaction-picture time-evolution operator:
$S = \lim_{T\to\infty} U(T, -T)$. Moreover, for general times $t$, $t_0$, this operator is constructed
(cf. (4.27)) as a sum of products of the interaction operator $V_{ip}(t)$ (i.e., the interaction
part of the Hamiltonian, in the interaction picture):

\[ V_{ip}(t) \equiv e^{iH_0 t}\, V\, e^{-iH_0 t} \tag{5.58} \]
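
As a small finite-dimensional illustration of (5.58), the sketch below builds the interaction-picture operator for a diagonal free Hamiltonian. The matrices and energies are hypothetical toy inputs, not taken from the text. It shows that an interaction connecting only states of equal free energy is time-independent in the interaction picture, whereas a generic one is not.

```python
import numpy as np

def interaction_picture(V, energies, t):
    """V_ip(t) = exp(i H0 t) V exp(-i H0 t) for a diagonal H0 = diag(energies)."""
    phase = np.exp(1j * np.asarray(energies, dtype=float) * t)
    # Matrix element (a, b) picks up the phase exp(i (E_a - E_b) t).
    return phase[:, None] * V * phase.conj()[None, :]

# Toy interaction that only connects the two degenerate levels: [H0, V] = 0.
energies_conserving = [1.0, 1.0, 2.0]
V_conserving = np.array([[0.0, 1.0, 0.0],
                         [1.0, 0.0, 0.0],
                         [0.0, 0.0, 0.5]])

# Toy interaction connecting levels of different free energy: [H0, V] != 0.
energies_generic = [1.0, 2.0]
V_generic = np.array([[0.0, 1.0],
                      [1.0, 0.0]])
```

Evaluating `interaction_picture(V_conserving, energies_conserving, t)` returns `V_conserving` for every `t`, which is the situation that becomes problematic later in this section.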

Thus, the S-matrix involves a sum of products of the operators $H_0$ and $V$. Energy-
momentum conservation (5.56) is therefore ensured if$^{12}$

\[ [\vec{P}^{(0)}, V] = 0 \tag{5.59} \]

What about Lorentz-invariance? It would certainly follow if we could arrange

\[ [U(\Lambda), V_{ip}(t)] = 0 \tag{5.60} \]

for all t. In fact, a very simple Ansatz would seem to do the trick. Consider the
operator (for the rest of this section we use covariant normalization throughout, and
drop vector symbols for three-momenta in the states, assumed to be multi-particle
states of a single spinless particle):
\[ V = \int \prod_{i=1}^{2} \frac{d^3k'_i}{2E(k'_i)} \frac{d^3k_i}{2E(k_i)}\, \delta^4(k'_1 + k'_2 - k_1 - k_2)\, h(k'_1, k'_2, k_1, k_2)\, |k'_1 k'_2\rangle\langle k_1 k_2| \tag{5.61} \]

The physical meaning of this interaction operator is clear: an incoming two-particle
state with momenta $k_1, k_2$ is converted to an outgoing state $k'_1, k'_2$ with amplitude
$h(k'_1, k'_2, k_1, k_2)$ subject to the constraint of energy-momentum conservation,
implemented by the four-dimensional δ-function.

12 Energy conservation is automatic given the time-invariance of H; cf. (4.182).


If $h(k'_1, k'_2, k_1, k_2)^* = h(k_1, k_2, k'_1, k'_2)$, it is easy to see that $V$ is hermitian, so
unitarity is guaranteed. The δ-function in (5.61) guarantees that energy is conserved
by the interaction, so $V$ commutes with $H_0$ and $V_{ip}(t) = V$. Lorentz-invariance is
established by checking

\[ U(\Lambda)\, V\, U^\dagger(\Lambda) = V \tag{5.62} \]

With covariant normalization

\[ U(\Lambda)|k'_1 k'_2\rangle = |\Lambda k'_1, \Lambda k'_2\rangle \tag{5.63} \]

\[ \langle k_1, k_2|U^\dagger(\Lambda) = \langle\Lambda k_1, \Lambda k_2| \tag{5.64} \]

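so

\[ U(\Lambda)\, V\, U^\dagger(\Lambda) = \int \prod_i \frac{d^3k'_i}{2E(k'_i)} \frac{d^3k_i}{2E(k_i)}\, \delta^4(k'_1 + k'_2 - k_1 - k_2)\, h(k'_1, ..)\, |\Lambda k'_1, \Lambda k'_2\rangle\langle\Lambda k_1, \Lambda k_2| \tag{5.65} \]

Changing variables from $p_i$ to $\Lambda p_i$ and using the invariance of $\frac{d^3p}{2E(p)}$ and the four-dimensional
δ-function, we find

\[ U(\Lambda)\, V\, U^\dagger(\Lambda) = \int \prod_i \frac{d^3k'_i}{2E(k'_i)} \frac{d^3k_i}{2E(k_i)}\, \delta^4(k'_1 + k'_2 - k_1 - k_2)\, h(\Lambda^{-1}k'_1, \Lambda^{-1}k'_2, \Lambda^{-1}k_1, \Lambda^{-1}k_2)\, |k'_1, k'_2\rangle\langle k_1, k_2| \tag{5.66} \]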

so invariance of $V$ under arbitrary Lorentz transformations (or equivalently,
commutativity of $V$ with $U(\Lambda)$) follows provided

\[ h(\Lambda^{-1}k'_1, \Lambda^{-1}k'_2, \Lambda^{-1}k_1, \Lambda^{-1}k_2) = h(k'_1, k'_2, k_1, k_2) \tag{5.67} \]

for general Lorentz transformations $\Lambda$. This is trivial to arrange: we simply make $h$ an
arbitrary function of the Lorentz-invariant scalar products $k_1 \cdot k_2$, etc. Obviously there
is a great deal of freedom here, rather as in non-relativistic quantum mechanics, where
one is at liberty to construct potential functions with only very loose constraints. The
above procedure for generating Lorentz-invariant 2-2 particle scattering can clearly be
imitated for N-N particle scattering, so we can generalize the above Ansatz to

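\[ V = \sum_{N=2}^{\infty} \int \prod_{i=1}^{N} \frac{d^3k'_i}{2E(k'_i)} \frac{d^3k_i}{2E(k_i)}\, \delta^4\Big(\sum_i k'_i - \sum_i k_i\Big)\, h^{(N)}(k'_i, k_i)\, |k'_1 k'_2 .. k'_N\rangle\langle k_1 k_2 .. k_N| \tag{5.68} \]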
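
The invariance condition (5.67) is easy to check numerically for the 2-2 case. The sketch below is illustrative only: the particular amplitude `h`, the unit mass, and the sample momenta are hypothetical choices, and the metric signature (+,−,−,−) is assumed. It builds `h` from Lorentz-invariant dot products and verifies that it is unchanged when all four momenta are boosted along the z-axis.

```python
import numpy as np

METRIC = np.diag([1.0, -1.0, -1.0, -1.0])  # assumed signature (+,-,-,-)

def on_shell(k3, mass=1.0):
    """Four-momentum (E, kx, ky, kz) with E = sqrt(m^2 + |k|^2)."""
    k3 = np.asarray(k3, dtype=float)
    return np.concatenate(([np.sqrt(mass**2 + k3 @ k3)], k3))

def boost_z(omega):
    """Boost with rapidity omega along the z-axis, acting on (t, x, y, z)."""
    L = np.eye(4)
    L[0, 0] = L[3, 3] = np.cosh(omega)
    L[0, 3] = L[3, 0] = np.sinh(omega)
    return L

def dot(a, b):
    """Minkowski scalar product a . b."""
    return a @ METRIC @ b

def h(k1p, k2p, k1, k2):
    """Hypothetical real amplitude depending only on invariant dot products."""
    s = dot(k1p + k2p, k1 + k2)      # symmetric under exchange of in and out momenta
    t = dot(k1p - k1, k1p - k1)      # invariant momentum transfer squared
    return np.exp(-0.1 * s) / (1.0 + t**2)

# Sample momenta (arbitrary illustrative values).
k1, k2 = on_shell([0.3, -0.2, 0.5]), on_shell([-0.1, 0.4, 0.2])
k1p, k2p = on_shell([0.2, 0.1, 0.6]), on_shell([0.0, 0.1, 0.1])
L = boost_z(0.8)
h_original = h(k1p, k2p, k1, k2)
h_boosted = h(L @ k1p, L @ k2p, L @ k1, L @ k2)
```

Because this `h` is real and symmetric under the exchange of primed and unprimed momenta, it also satisfies the hermiticity condition stated before (5.62).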
Alas, this seemingly trivial method of generating a profusion of theories with Lorentz-
invariant interactions (and number-conserving Lorentz-invariant scattering for any
number of particles) possesses some fatal flaws:
1. The “convenient” property that the free and interaction parts of the Hamiltonian
commute, [H0 , V ] = 0, induced by the need to complete the 3-momentum con-
servation δ-function in (5.68) to a four-dimensional energy-momentum δ-function
invariant under general Lorentz transformations, is in fact a disaster. The resultant
time-independence of the interaction in interaction picture means that the
infinite time limit giving the S-matrix (4.164) is divergent. One could obtain a
finite result by adiabatically switching off the interaction $V \to e^{-\epsilon|t|} V$, but the
resultant theory would violate time-translation invariance, and the results would
in any case be sensitive to the switch-off rate $\epsilon$.$^{13}$
2. Even if we ignore the problem of defining a sensible asymptotic limit for the
scattering, the theory defined by (5.68) conceals another potentially fatal disease:
the presence of “spooky action-at-a-distance effects”.$^{14}$ The interaction operator
$V$ commutes with Lorentz transformations for any choice of the functions
$h^{(N)}$, provided that these functions are constructed from Lorentz-invariant
dot-products of the four-momenta. However, the physical requirement that a localized
particle (or particles) very far separated from a set of scattering particles should
not affect the interactions of the latter means that there must in fact be intricate
relations between the $h^{(N)}$ for different values of $N$. This requirement is generally
referred to as the clustering principle, and we shall discuss it in great detail
in the following chapter. Here, it suffices to note that the scattering of two
particles in a universe emptied of all other matter would be determined by the
$h^{(2)}$ functions, but that the introduction of a third particle localized very far away
(and propagating therefore freely) would mean that the scattering amplitude
for the first two is now determined by the $h^{(3)}$ function, which must therefore
be related to $h^{(2)}$. If we had set $h^{(3)}$ to zero, for example, the introduction
of a third particle localized arbitrarily far from two other localized interacting
particles would have instantly caused these two to cease interacting: very clearly
a “spooky action-at-a-distance” effect! In fact, the cluster decomposition principle
vetoing such effects implies an infinite set of recursive relations between the
coefficient functions $h^{(N)}$, as well as specifying certain smoothness properties
for them, as we shall see in Chapter 6. The dyadic notation used to specify
the interaction $V$ in (5.68) turns out to be extremely inefficient in expressing
these constraints: instead, we shall soon see that the introduction of creation
and annihilation operators is just the technical tool needed to facilitate the
construction of clustering interaction theories.
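
The first flaw can be made concrete with a one-line calculation. Since $[H_0, V] = 0$ implies $V_{ip}(t) = V$, the evolution operator acting on an eigenstate of $V$ with eigenvalue $v$ is a pure phase. The sketch below is an illustration under that simplifying assumption; the eigenvalue `v` and the function names are hypothetical. It evaluates the phase for a sharp cutoff at $\pm T$ and for the adiabatic switching $e^{-\epsilon|t|}$, whose time integral is $2/\epsilon$.

```python
import numpy as np

def phase_sharp_cutoff(v, T):
    """exp(-i * v * integral_{-T}^{T} dt) = exp(-2 i v T): no limit as T grows."""
    return np.exp(-2j * v * T)

def switching_integral(eps, half_width=400.0, steps=800001):
    """Riemann-sum estimate of the integral of exp(-eps*|t|) over all t (exactly 2/eps)."""
    t = np.linspace(-half_width, half_width, steps)
    dt = t[1] - t[0]
    return np.sum(np.exp(-eps * np.abs(t))) * dt

def phase_adiabatic(v, eps):
    """exp(-i * v * 2/eps): finite, but dependent on the switch-off rate eps."""
    return np.exp(-2j * v / eps)

v = 0.3  # hypothetical eigenvalue of V
phases_vs_T = [phase_sharp_cutoff(v, T) for T in (10.0, 11.0)]
phases_vs_eps = [phase_adiabatic(v, eps) for eps in (0.1, 0.05)]
```

The phase keeps rotating as `T` increases, and the regulated result changes when `eps` is halved, which is the sensitivity to the switch-off rate described in item 1.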
The two pathologies outlined above are really different symptoms of the same
underlying disease: with an interaction Hamiltonian of the form (5.68), our particles
obstinately refuse to stop interacting, either after separating (as naturally must occur
in quantum theory due to wave-packet spreading—cf. also Section 9.3, where the

13 We shall later encounter “persistent interactions” in local quantum field theory which also result
in divergent contributions to the infinite-time propagation of the system encapsulated by the S-matrix,
although these effects can be removed by appropriate choice of the interaction operator V . However, for
interaction operators V of the general class specified by (5.68), it is impossible to avoid a divergent, and
hence physically meaningless, S-matrix.
14 Einstein’s original use of the term “spukhafte Fernwirkung” referred to the peculiar (from a clas-
sical standpoint) statistical correlations of entangled quantum states, as in the famous EPR effect.
These correlations, while perhaps psychologically disturbing, do not lead to the physically unacceptable
action-at-a-distance effect of the kind discussed here (which we might perhaps call an “entsetzliche
Fernwirkung”!).
rigorous Haag–Ruelle theory of scattering asymptotics is introduced) over a long
time period, or when separated spatially by a large distance from one another (the
unacceptable “action-at-a-distance” effects). We have evidently failed to construct an
appropriately local, either in space or in time, theory of interactions.

5.5 A simple condition for Lorentz-invariant scattering


The preceding section exposed some of the difficulties one faces in attempting to
construct, directly from a particle point of view, a physically sensible relativistic
quantum mechanics—a theory, in other words, leading to a unitary S-matrix subject
to the constraints of special relativity. The concept of a field—a dynamical entity
allowing us to associate physical energy and momentum to domains in spacetime
rather than directly to multi-particle states—was nowhere in sight. But from the time
of Maxwell’s great work on electromagnetism, the central role of the field concept in
implementing the idea of local transmission of physical effects (eschewing “action-at-
a-distance” effects) has been clear. In this section we shall show that the introduction
of the field idea, and specifically, the notion of a local field, commuting with itself at
space-like separations, allows a rather simple implementation of the requirements of
Lorentz-invariance. The extent to which this approach also provides a solution to the
requirements of the clustering principle (and if so, a unique solution) will be discussed
in the next chapter.
Let us suppose that the interaction energy operator $V$ in (5.57) can be written as
the spatial integral of an interaction energy density operator $H_{int}(\vec{x})$:

\[ V = \int d^3x\, H_{int}(\vec{x}) \tag{5.69} \]

Correspondingly, the interaction operator in interaction picture $V_{ip}(t)$ will be given
by a spatial integral of a spacetime field $H_{int}(\vec{x}, t)$ ($\equiv H_{int}(x)$: as usual, coordinate
vectors without three-vector symbols are to be taken to be spacetime four-vectors):

\[ V_{ip}(t) = \int d^3x\, H_{int}(\vec{x}, t) \tag{5.70} \]

\[ H_{int}(\vec{x}, t) = e^{iH_0 t}\, H_{int}(\vec{x})\, e^{-iH_0 t} \tag{5.71} \]

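Accordingly, the formal expansion (4.27) for the S-matrix $S = U(+\infty, -\infty)$ becomes

\[ S = \sum_{n=0}^{\infty} \frac{(-i)^n}{n!} \int_{-\infty}^{+\infty} dt_1\, dt_2 .. dt_n\, T\{V_{ip}(t_1) \ldots V_{ip}(t_n)\} \tag{5.72} \]

\[ = \sum_{n=0}^{\infty} \frac{(-i)^n}{n!} \int d^4x_1\, d^4x_2 .. d^4x_n\, T\{H_{int}(x_1) .. H_{int}(x_n)\} \tag{5.73} \]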

Ignore for the time being the presence of a time-ordering symbol in (5.72). Then the
desired Lorentz-invariance of the S-matrix

\[ S = U(\Lambda)\, S\, U^\dagger(\Lambda) \tag{5.74} \]


would follow directly if we could ensure that

\[ U(\Lambda)\, H_{int}(x)\, U^\dagger(\Lambda) = H_{int}(\Lambda x) \tag{5.75} \]

since the change $x_i \to \Lambda x_i$ could be erased by the change of variable (with unit
Jacobian) $x_i \to \Lambda^{-1} x_i$.$^{15}$
Unfortunately, such a change of variable can in general change the ordering in time
of the various interaction operators, which will not in general commute at different
times (this would require $[H_0, V] = 0$ and we have already seen in the preceding
section why this is untenable). When can this happen? Only if two of the spacetime
arguments, $x_i$ and $x_j$ say, differ by a space-like interval, $(x_i - x_j)^2 < 0$. There would
then exist Lorentz transformations Λ for which the time-ordering symbol will reverse
the order of the corresponding interaction Hamiltonians. Unless we insist that these
operators commute in this situation, the argument leading to (5.74) will break down.
We may clearly avoid this outcome by insisting on commutativity of interaction energy
densities at space-like separations. This result is, of course, intuitively plausible, as it
asserts the non-interference of measurements of (interaction) energy performed at
space-like separations. The hand-waving argument presented above suggests that the
two conditions

\[ U(\Lambda)\, H_{int}(x)\, U^\dagger(\Lambda) = H_{int}(\Lambda x) \tag{5.76} \]

\[ [H_{int}(x), H_{int}(y)] = 0, \quad (x - y)^2 < 0 \tag{5.77} \]

might be sufficient to ensure the Lorentz-invariance of the S-matrix (5.72). We shall
refer to a field satisfying (5.76) as a Lorentz scalar field, while fields satisfying the
space-like commutativity requirement (5.77) will be termed local fields.
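
The kinematic fact behind the locality condition (5.77), namely that only space-like separated points can have their time order reversed by a Lorentz transformation, can be checked with a short numerical sketch. The function names, the sample separations, and the restriction to boosts along the z-axis are illustrative assumptions; units with c = 1 and signature (+,−,−,−) are used.

```python
import numpy as np

def interval_squared(dx):
    """(dt)^2 - |dx|^2 for a separation four-vector dx = (dt, dx, dy, dz)."""
    return dx[0]**2 - np.dot(dx[1:], dx[1:])

def boosted_time_difference(dx, velocity):
    """Time component of dx after a boost with velocity v (|v| < 1) along z."""
    gamma = 1.0 / np.sqrt(1.0 - velocity**2)
    return gamma * (dx[0] - velocity * dx[3])

def order_can_flip(dx, velocities=np.linspace(-0.999, 0.999, 1999)):
    """True if some boost in the scanned range reverses the sign of the time difference."""
    signs = {float(np.sign(boosted_time_difference(dx, v))) for v in velocities}
    return 1.0 in signs and -1.0 in signs

spacelike = np.array([1.0, 0.0, 0.0, 2.0])  # (x - y)^2 = -3 < 0
timelike = np.array([2.0, 0.0, 0.0, 1.0])   # (x - y)^2 = +3 > 0
```

For the space-like example any boost velocity above 1/2 reverses the time order, so the two factors in the time-ordered product are interchanged; for the time-like example no boost does so.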
The admittedly heuristic argument given above turns out to require some careful
fine-tuning due to subtleties in the behavior of the operator products in (5.73) at point
coincidences in the multi-dimensional integral: i.e., when $x_i = x_j$ for $i \neq j$. Let us focus
our attention on the $n = 2$ term in the expansion of the S-matrix (5.73), involving the
time-ordered product of two interaction energy density operators. Recall the definition
of the time-ordered product in a particular inertial frame (and hence, with a definite
choice of time variable)

\[ T\{H_{int}(\vec{x}_1, t_1) H_{int}(\vec{x}_2, t_2)\} = \theta(t_1 - t_2)\, H_{int}(\vec{x}_1, t_1) H_{int}(\vec{x}_2, t_2) + \theta(t_2 - t_1)\, H_{int}(\vec{x}_2, t_2) H_{int}(\vec{x}_1, t_1) \tag{5.78} \]

or, introducing the time-like unit vector $n^\mu = g^{0\mu} = (1, 0, 0, 0)$,

\[ \tau(x_1, x_2; n) \equiv T\{H_{int}(x_1) H_{int}(x_2)\} = \theta(n \cdot (x_1 - x_2))\, H_{int}(x_1) H_{int}(x_2) + \theta(n \cdot (x_2 - x_1))\, H_{int}(x_2) H_{int}(x_1) \tag{5.79} \]

15 The order in which the similarity transformation is performed, with the $U(\Lambda)$ operator on the left
and the $U^\dagger(\Lambda)$ on the right, is dictated by the need for two successive transformations to follow the group
composition law: $U(\Lambda_2) U(\Lambda_1) = U(\Lambda_2 \Lambda_1)$.
The failure of the above product to be covariant under Lorentz transformations,
i.e., to satisfy $U(\Lambda)\tau(x_1, x_2; n)U^\dagger(\Lambda) = \tau(\Lambda x_1, \Lambda x_2; n)$, is clearly due to the explicit
presence of the time-like frame-dependent vector $n^\mu$ in the θ-functions. An infinitesimal
variation of the time-like vector $n^\mu$ lies in the space-like hypersurface orthogonal to
$n$, on to which the operator $\Pi^{\mu\nu} \equiv g^{\mu\nu} - n^\mu n^\nu$ projects (as $\Pi^{\mu\nu} n_\nu = 0$). The
frame-dependence of the T-product $\tau(x_1, x_2)$ therefore involves the projected derivative

\[ \Pi^{\mu\nu} \frac{\partial}{\partial n^\nu} \tau(x_1, x_2; n) = \Pi^{\mu\nu} (x_1 - x_2)_\nu\, \delta(n \cdot (x_1 - x_2))\, [H_{int}(x_1), H_{int}(x_2)] \tag{5.80} \]
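
The two properties of the projector used in this argument, $\Pi^{\mu\nu} n_\nu = 0$ and idempotency of the mixed-index form $\Pi^\mu{}_\nu$, can be confirmed numerically. This is a minimal sketch assuming signature (+,−,−,−); the helper name `projector` and the boosted choice of $n$ are illustrative.

```python
import numpy as np

G = np.diag([1.0, -1.0, -1.0, -1.0])  # metric; numerically the same with upper or lower indices

def projector(n_up):
    """Pi^{mu nu} = g^{mu nu} - n^mu n^nu for a unit time-like vector n^mu."""
    return G - np.outer(n_up, n_up)

# The rest-frame vector n = (1,0,0,0), and a boosted unit time-like vector.
n_rest = np.array([1.0, 0.0, 0.0, 0.0])
omega = 0.6
n_boosted = np.array([np.cosh(omega), 0.0, 0.0, np.sinh(omega)])

Pi_upper = projector(n_boosted)          # Pi^{mu nu}
n_lower = G @ n_boosted                  # n_nu
Pi_mixed = Pi_upper @ G                  # Pi^mu_nu, the form that composes as a matrix
```

Idempotency of `Pi_mixed` is what allows the product of two projectors to collapse to a single one in the step leading to (5.82).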

In our original frame, $n^\mu = (1, 0, 0, 0)$, the δ-function sets the times of spacetime
points $x_1$, $x_2$ equal, so the commutator of the interaction energy densities in (5.80)
is an equal-time commutator (ETC). The points $x_1$ and $x_2$ must therefore either be
coincident or space-like separated, $(x_1 - x_2)^2 < 0$. Locality (5.77) implies the vanishing
of the commutator in the latter case, but we still have the possibility of δ-function-type
singularities in the coincidence limit at equal time $x_1^0 = x_2^0 = t$, including possible
terms with spatial derivatives of the coincidence δ-function $\delta^3(\vec{x}_1 - \vec{x}_2)$,

\[ [H_{int}(\vec{x}_1, t), H_{int}(\vec{x}_2, t)] = C(\vec{x}_1, t)\, \delta^3(\vec{x}_1 - \vec{x}_2) + D_\rho(\vec{x}_1, t)\, \Pi^{\rho\sigma} \frac{\partial}{\partial x_1^\sigma} \delta^3(\vec{x}_1 - \vec{x}_2) + \ldots \tag{5.81} \]

where the ellipsis indicates the possible presence of higher (spatial) derivatives. The
fact that the commutator in (5.81) involves the same field operator Hint twice
actually eliminates the first term on the right-hand side as the commutator must
be antisymmetric under the exchange of x1 and x2 (in other words, in this case,
C(x, t) = 0). The second (and if present, higher) terms on the right-hand side of (5.81),
involving derivatives of δ-functions in an equal-time commutator, are called Schwinger
terms. If no such terms are present, the commutator is called “ultralocal”. Inserting
(5.81) in the result (5.80) we find

\[ \Pi^{\mu\nu} \frac{\partial}{\partial n^\nu} \tau(x_1, x_2; n) = \Pi^{\mu\nu} (x_1 - x_2)_\nu\, \Pi^{\rho\sigma} D_\rho(x_1) \frac{\partial}{\partial x_1^\sigma} \delta^4(x_1 - x_2) + .. = -\Pi^{\mu\rho} D_\rho(x_1)\, \delta^4(x_1 - x_2) + .. \tag{5.82} \]

(where we have used $(x_1 - x_2)_\nu\, \delta^4(x_1 - x_2) = 0$). Note that the preceding equations
are to be interpreted in the usual distributional sense: as equalities holding after
integration with appropriately smooth, rapidly decreasing c-number functions. This
allows us to remove the derivative from the δ-function in the Schwinger term and apply
it to $D_\rho(x_1)(x_1 - x_2)_\nu$, obtaining the final result (5.82). We see that if $D_\rho(x) \neq 0$
the time-ordered product $\tau(x_1, x_2; n)$ contributing to the second-order term in the
S-matrix is indeed frame-dependent (thereby ruining the Lorentz-invariance of the
S-matrix) in the presence of Schwinger terms in the equal-time commutator of $H_{int}$.
This is not an empty possibility: we shall see in Chapter 12 that this is exactly
the situation in derivatively-coupled field theories, for example.16 For the time being
we shall satisfy ourselves with the requirement of “ultralocality” in the equal-time
commutators of Hint : namely, the absence of any derivative terms in the general form of
the ETC (5.81). With this proviso, it can then be shown that the qualitative argument
given previously for the Lorentz-invariance of the S-matrix (5.73) is indeed correct.
Having satisfied the requirements of invariance under the homogeneous portion of
the Poincaré group (i.e., the homogeneous Lorentz group) by choosing the interaction
energy density to be an ultralocal Lorentz scalar field, what can we say about the
inhomogeneous part: namely, invariance under translations in space and time? The
corresponding conservation laws require the S-matrix to commute with the free state
energy ($P_0^{(0)} = H_0$) and momentum ($P_i^{(0)}$, $i = 1, 2, 3$) operators. The fact that our free
and interaction operators $H_0$ and $V$ are time-independent ensures energy conservation
(cf. (4.182)); but what about spatial momentum conservation? Recall that this is
assured once we have $[P_i^{(0)}, V] = 0$ (5.59). It turns out that we get this for free once
$H_{int}$ is chosen to be a Lorentz scalar field satisfying (5.76). First, we note that by the
usual property of interaction-picture operators,

\[ i[H_0, H_{int}(\vec{x}, t)] = i[P_0^{(0)}, H_{int}(\vec{x}, t)] = \frac{\partial}{\partial t} H_{int}(\vec{x}, t) \tag{5.83} \]

We saw previously (cf. 5.31) that the energy-momentum four-vector operator $P_\mu^{(0)}$
transforms in the expected way under Lorentz transformations

\[ U^\dagger(\Lambda)\, P_\mu^{(0)}\, U(\Lambda) = \Lambda_\mu{}^\nu P_\nu^{(0)} \tag{5.84} \]

For example, for a boost along the z-axis, choosing $\Lambda$ to be the Lorentz transformation
given by (5.10), and writing $U(\Lambda) = U(\omega)$,

\[ U^\dagger(\omega) H_0 U(\omega) = U^\dagger(\omega) P_0^{(0)} U(\omega) = \Lambda_0{}^\nu P_\nu^{(0)} = \cosh(\omega) H_0 + \sinh(\omega) P_3^{(0)} \tag{5.85} \]

Thus

\[ iU^\dagger(\omega)[H_0, H_{int}(\vec{x}, t)]U(\omega) = i[U^\dagger(\omega) H_0 U(\omega), U^\dagger(\omega) H_{int}(\vec{x}, t) U(\omega)] = i[\cosh(\omega) H_0 + \sinh(\omega) P_3^{(0)}, H_{int}(\Lambda^{-1}x)] \tag{5.86} \]

16 For such theories it is possible (cf. Section 12.1) to concoct an additional non-covariant term in the
interaction density which cancels everywhere the effect of the Schwinger term—a so-called “covariantizing
seagull”. The Lagrangian formalism developed in Chapter 12 provides an automatic and foolproof procedure
for generating the correct form of such covariantizing terms, if needed.
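On the other hand,

\[ iU^\dagger(\omega)[H_0, H_{int}(\vec{x}, t)]U(\omega) = \frac{\partial}{\partial t} H_{int}(\Lambda^{-1}x) \]
\[ = \frac{\partial}{\partial t} H_{int}(x^1, x^2, \cosh(\omega)x^3 + \sinh(\omega)t, \cosh(\omega)t + \sinh(\omega)x^3) \]
\[ = \Big(\sinh(\omega)\frac{\partial}{\partial x'^3} + \cosh(\omega)\frac{\partial}{\partial t'}\Big) H_{int}(\vec{x}', t'), \quad x' \equiv \Lambda^{-1}x \tag{5.87} \]

Comparing coefficients of sinh(ω) in (5.86) and (5.87), we find

\[ i[P_3^{(0)}, H_{int}(\Lambda^{-1}x)] = i[P_3^{(0)}, H_{int}(x')] = \frac{\partial}{\partial x'^3} H_{int}(x') \tag{5.88} \]

Evidently, for any of the spatial components

\[ i[P_i^{(0)}, H_{int}(x)] = \frac{\partial}{\partial x^i} H_{int}(x) \tag{5.89} \]

which implies

\[ i[P_i^{(0)}, V] = i\Big[P_i^{(0)}, \int d^3x\, H_{int}(\vec{x})\Big] = \int d^3x\, \frac{\partial}{\partial x^i} H_{int}(\vec{x}) = 0 \tag{5.90} \]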

exactly the condition for momentum conservation. We shall henceforth consider the
definition of a general Lorentz scalar field A(x) to include the transformation property
(5.76) under the HLG as well as the commutation relations with the energy-momentum
four-vector

\[ i[P_\mu^{(0)}, A(x)] = \frac{\partial}{\partial x^\mu} A(x) \tag{5.91} \]

From (5.91), a standard application of the Baker–Campbell–Hausdorff formula

\[ e^P Q e^{-P} = Q + [P, Q] + \frac{1}{2!}[P, [P, Q]] + \frac{1}{3!}[P, [P, [P, Q]]] + \ldots \tag{5.92} \]

leads to the very important translation property for scalar fields in the interaction
picture

\[ e^{iP_\mu^{(0)} a^\mu} A(x)\, e^{-iP_\mu^{(0)} a^\mu} = A(x + a) \tag{5.93} \]

for any fixed displacement four-vector $a^\mu$. We can combine the transformation requirements
under the HLG (5.76) with the translation property (5.93) to obtain the
transformation of our scalar field under a general element $(\Lambda, a)$ of the Poincaré group,
with unitary representative $U(\Lambda, a) = e^{iP^{(0)} \cdot a} U(\Lambda)$:

\[ U(\Lambda, a)\, A(x)\, U^\dagger(\Lambda, a) = A(\Lambda x + a) \tag{5.94} \]
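
The step from (5.91) to (5.93) amounts to summing a Taylor series: each nested commutator with $iP^{(0)} \cdot a$ acts on the field as one more derivative $a \cdot \partial$. The sketch below illustrates this in a deliberately reduced setting, a c-number polynomial in a single coordinate standing in for the field, which is an assumption made purely for illustration; the function and variable names are hypothetical.

```python
from math import factorial
from numpy.polynomial import Polynomial

def translate_by_commutator_series(field, a):
    """Sum over n of (a^n / n!) d^n(field)/dx^n; the series terminates for a polynomial."""
    shifted = Polynomial([0.0])
    term = field
    for n in range(field.degree() + 1):
        # The n-th nested commutator in (5.92) contributes the n-th derivative.
        shifted = shifted + term * (a**n / factorial(n))
        term = term.deriv()
    return shifted

# Hypothetical stand-in for A(x): 1 - 2x + 0.5x^2 + 3x^3.
example_field = Polynomial([1.0, -2.0, 0.5, 3.0])
shifted_field = translate_by_commutator_series(example_field, 0.7)
```

Evaluating `shifted_field(x)` reproduces `example_field(x + 0.7)`, the one-dimensional analogue of $A(x) \to A(x + a)$.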


5.6 Problems
1. Calculate explicitly (as a 4×4 matrix) the commutator $[\Lambda_1, \Lambda_2]$ of a boost $\Lambda_1$ with
rapidity $\omega_1$ along the x axis with a boost $\Lambda_2$ with rapidity $\omega_2$ along the y axis. If
the rapidities $\omega_1$, $\omega_2$ are infinitesimal, what type of transformation does $[\Lambda_1, \Lambda_2]$
induce?
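2. (a) Show the following connection between helicity and spin states:

\[ |\vec{p}, \lambda\rangle = \sum_\sigma D^j_{\sigma\lambda}(R(\hat{p}))\, |\vec{p}, \sigma\rangle \]

(b) Prove that helicity states transform as follows under Lorentz transformations:

\[ U(\Lambda)|\vec{p}, \lambda\rangle = \sqrt{\frac{E(\Lambda p)}{E(p)}} \sum_{\lambda'} D^j_{\lambda'\lambda}(W(\Lambda, p))\, |\Lambda\vec{p}, \lambda'\rangle \]

where

\[ W(\Lambda, p) \equiv B^{-1}(|\Lambda\vec{p}|)\, R^{-1}(\widehat{\Lambda p})\, \Lambda\, R(\hat{p})\, B(|\vec{p}|) \]

(c) From (b), show that

\[ U(\Lambda)\, a^\dagger(\vec{p}, \lambda)\, U^\dagger(\Lambda) = \sqrt{\frac{E(\Lambda p)}{E(p)}} \sum_{\lambda'} D^j_{\lambda'\lambda}(W(\Lambda, p))\, a^\dagger(\Lambda\vec{p}, \lambda') \]
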


3. Verify that the massless particle little group displacement transformations T (ξ)
in (5.46) are indeed Lorentz transformations. Also, verify the group composition
rule (5.47) and the transformation law (5.48).
4. The transformation property of scalar fields under spatial rotations follows from
the infinitesimal version of the transformation property:

\[ U(\Lambda)\, A(x)\, U^\dagger(\Lambda) = A(\Lambda x) \]

Choosing $\Lambda$ to be the infinitesimal rotation $(R_i x)_j = x_j + \epsilon_{ijk} x_k\, \delta\theta$, show that
the scalar field $A(x)$ has the following commutation relation with the angular
momentum components $J_i$, $i = 1, 2, 3$:

\[ [J_i, A(x)] = i\epsilon_{ijk} x_k \partial_j A(x) \tag{5.95} \]

5. Show that a product H(x) = A(x)B(x)C(x).. of local, Lorentz scalar fields A(x),
B(x), C(x),.. is itself a local, Lorentz scalar field: i.e., verify

\[ U(\Lambda)\, H(x)\, U^\dagger(\Lambda) = H(\Lambda x) \]

\[ [H(x), H(y)] = 0, \quad (x - y)^2 < 0 \]

\[ i[P_\mu^{(0)}, H(x)] = \frac{\partial}{\partial x^\mu} H(x) \]
6. A Lorentz vector field $V^\mu(x)$ is defined as a field transforming as follows:

\[ U(\Lambda)\, V^\mu(x)\, U^\dagger(\Lambda) = (\Lambda^{-1})^\mu{}_\nu V^\nu(\Lambda x) \]

(a) Show that if $A(x)$ is a Lorentz scalar field, then $V^\nu(x) \equiv \partial^\nu A(x)$ is a Lorentz
vector field.
(b) If $V^\nu(x)$, $W^\nu(x)$ are Lorentz vector fields, show that $A(x) \equiv V_\nu W^\nu(x)$ is a
Lorentz scalar field.
(c) Verify that the group composition law works out properly for the product of
two successive Lorentz transformations $\Lambda_1$, $\Lambda_2$ of the vector field $V^\nu(x)$.
6
Dynamics IV: Aspects of locality:
clustering, microcausality, and
analyticity

In the preceding chapter we discussed two possible approaches to constructing a


relativistically invariant theory of particle scattering. Our first attempt—a frontal
assault in which one directly writes down for each scattering sector (i.e., with a
specified number of incoming and outgoing particles) a manifestly Lorentz-invariant
interaction operator containing momentum-dependent Lorentz scalar amplitudes—
led to disaster. The resultant theory led to particle interactions which could not be
confined to finite regions of spacetime, as all our experience with subatomic phenomena
has led us to conclude that they are. In particular, the constraints of cluster
decomposition—which we now define as the necessity for scattering amplitudes of
groups of far separated particles to factorize in such a way that the probability of the
overall process is a product of the probability of scattering in the separate groups—do
not appear to be satisfied in any natural way in such a framework. Our second attempt,
in which the interaction Hamiltonian is written as a spatial integral of a local, Lorentz
(ultra-)scalar field, accomplishes the primary goal of producing a Lorentz-invariant
set of scattering amplitudes, but we are as yet unsure as to its compliance with the
clustering principle. Our goal in this chapter is to put this latter requirement into
a precise mathematical framework, called second quantization, so that the process of
identifying clustering relativistic scattering theories can be simplified and even to some
degree automated.
Before introducing the technical tools needed to accomplish this task, it may be
best to outline in a completely qualitative way the essential conceptual elements that
will be woven together by writing our theories in the language of second quantization.
When one considers the nature of interactions in the subatomic world, the following
two intuitions seem so completely natural that we can hardly imagine them to be
violated in a physically sensible theory:
1. Intuition A (the clustering principle):1 the scattering amplitude for two groups
of particles contained (and localized, in the wave-packet sense) in distinct finite
spatial regions, with the two regions separated by a distance R, should, in the

1 The cluster decomposition principle for the S-matrix seems to have been first articulated by Wichmann
and Crichton (Wichmann and Crichton, 1963): the proof that factorization of scattering probabilities
extends to the scattering amplitudes was given by Taylor (Taylor, 1966).
limit R → ∞, approach the product of the independent scattering amplitudes


for the particles in each region to scatter among themselves. As we shall see
shortly, this intuition can be formulated as a completely precise condition which
a physically sensible multi-particle S-matrix must necessarily obey.
2. Intuition B: the quantum-mechanical amplitude for particle scattering can be
constructed as a sum of terms in which particular particles can scatter not
at all, once, twice, or indeed arbitrarily many times. If no interaction occurs
between two particles, their initial and final spatial momenta must be precisely
unchanged. Likewise, if no interaction occurs between two groups of particles, we
expect the total momentum of each group to be exactly preserved. On the other
hand, if even a single interaction, however weak, occurs between two particles,
we expect the probability for their individual spatial momenta to be exactly
preserved to be zero: the interaction will presumably end up transferring some
non-zero, albeit very small, momentum between the two particles (all this with
an obvious generalization to two groups of particles).
The amazing thing, as we shall see in the very next section, is that these two intuitions
are in fact one and the same! By the magic of Fourier transformation, the spatial
considerations of intuition A turn out to be precisely equivalent to the momentum
considerations of intuition B. This realization will allow us to rephrase the requirement
of clustering in a mathematically transparent and readily implementable way.
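
The Fourier-transform connection between the two intuitions can be previewed with a toy calculation. The sketch below is an illustration under simplifying assumptions (one spatial dimension, a Gaussian weight, hypothetical function names): it compares a smooth momentum-space weight, whose overlap with $e^{ik\Delta}$ dies away rapidly as the separation Δ grows, with a δ-function weight, whose overlap has unit modulus for every Δ.

```python
import numpy as np

def packet_overlap(separation, width=1.0):
    """|integral dk g(k) exp(i k Delta)| for a Gaussian g(k) of the given width."""
    k = np.linspace(-12.0 * width, 12.0 * width, 20001)
    dk = k[1] - k[0]
    g = np.exp(-k**2 / (2.0 * width**2))
    return abs(np.sum(g * np.exp(1j * k * separation)) * dk)

def delta_overlap(separation, k0=0.5):
    """|integral dk delta(k - k0) exp(i k Delta)|, which does not decay with Delta."""
    return abs(np.exp(1j * k0 * separation))

smooth_near, smooth_far = packet_overlap(0.0), packet_overlap(8.0)
sharp_near, sharp_far = delta_overlap(0.0), delta_overlap(8.0)
```

Smoothness in momentum thus corresponds to decay at large spatial separation, while an extra momentum δ-function corresponds to a contribution that persists at arbitrary separation.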

6.1 Clustering and the smoothness of scattering amplitudes


We now come to the constraints placed on the S-matrix by the requirement of cluster
decomposition. To express the idea of factorization of far-separated processes in a
simple but precise way, it is convenient to introduce the concept of connected parts of
the S-matrix. This allows us to separate out the parts of the S-matrix corresponding to
a non-trivial interaction among some subset of particles from those parts corresponding
to the trivial situation where the particles pass through without interaction. The
connected part $S^c$ of $S$ may be defined by induction on the number of particles in the
initial and final states as follows. If both initial and final states contain only a single
stable particle, no interactions occur and in this subspace the S-matrix is the identity,
or (with non-covariant normalization of the states)

\[ S_{\vec{k}', \vec{k}} = \delta^3(\vec{k}' - \vec{k}) \equiv S^c_{\vec{k}', \vec{k}} \tag{6.1} \]

which we may express pictorially as in Fig. 6.1.


For 2-2 scattering, the connected part is defined by the equation

\[ S_{k'_1 k'_2, k_1 k_2} = S^c_{k'_1, k_1} S^c_{k'_2, k_2} + S^c_{k'_1, k_2} S^c_{k'_2, k_1} + S^c_{k'_1 k'_2, k_1 k_2} \tag{6.2} \]

which can be expressed graphically as indicated in Fig. 6.2. Note that in the present
discussion we assume that the incoming particles are identical (hence, indistinguish-
able) Bose particles.
The three-particle connected part is similarly defined by subtracting off those parts
of the full 3-3 scattering amplitude in which some proper subset of the particles pass
through unaffected by interactions. Again, this can be pictorially represented as shown
Fig. 6.1 S-matrix in the one-particle sector.

Fig. 6.2 Decomposition of 2-2 S-matrix amplitude into connected parts.

in Fig. 6.3 (here, “perms” refers to the appropriate set of terms with initial and
final particle momenta permuted to ensure Bose symmetry of the amplitude). As
the connected 2-2 amplitude has already been defined in (6.2), the fully connected
3-3 amplitude $S^c_{k'_1 k'_2 k'_3, k_1 k_2 k_3}$ is defined inductively as the full 3-3 amplitude minus
the contributions from situations in which some proper subset of particles interacts
separately from the others.
Since $S^c_{k'_1 k'_2, k_1 k_2}$, $S^c_{k'_1 k'_2 k'_3, k_1 k_2 k_3}$, etc., all come from the $\delta(E_\alpha - E_\beta) T_{\beta\alpha}$ (i.e., interaction)
part of $S$ (recall (4.182)), and $T$ is assumed to conserve total spatial momentum,
we expect these connected parts all to have a full four-dimensional δ-function of
four-momentum conservation: $\delta^4(P' - P)$, with $P$, $P'$ the total initial and final
four-momenta for the connected subprocess. For clustering to hold, as we shall now see,
it is crucial that this be the only δ-function in the connected parts of the S-matrix.
Of course, the disconnected parts may have many more, as energy-momentum must
Fig. 6.3 Decomposition of 3-3 S-matrix amplitude into connected parts.

be conserved separately in each disconnected subprocess. To see the necessity for the
above assertion, consider an N→N process (N particles in, N out) where $N = n_1 + n_2$,
with $n_1$ particles scattering far from the other $n_2$. Cluster decomposition requires
that the S-matrix factor into a product of S-matrices for $n_1 \to n_1$ and $n_2 \to n_2$
scattering separately. (We are assuming here for simplicity of notation only that the
scatterings conserve the number of particles, which is certainly not the case in general
in relativistic quantum theory.) The general expansion of the S-matrix in terms of
connected parts means that we can write the graphical representation for the full
N→N process as in Fig. 6.4. For the S-matrix to cluster (in other words, for the overall
process to factorize into a product of independent $n_1 \to n_1$ and $n_2 \to n_2$ scatterings
when the two sets of particles are spatially far separated) the extra terms containing
connected parts with both q's and k's must vanish when we form wave-packets for
the incoming and outgoing particles and then move all q-type particles far from all
k-type ones.
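The separation can be achieved by introducing a large three-vector $\vec{\Delta}$ and positioning
one subset of particles around $-\vec{\Delta}$ and the other around $\vec{\Delta}$. The wave-packets are
constructed in the usual way: replace² the plane wave $e^{i\vec{k}\cdot\vec{x}}$ by $\int d^3\tilde{k}\, g(\tilde{k}; k)\, e^{i\tilde{k}\cdot(x-[\Delta+\xi])}$,
i.e., a wave-packet of momentum centered around k (if g is strongly peaked there) and
peaked in coordinate space at Δ + ξ. A typical connected transition amplitude for
such a set of wave-packets will thus take the form:

$$
\int d^3\tilde{k}_1\, d^3\tilde{k}_2 .. d^3\tilde{q}_{n_2}\, d^3\tilde{k}'_1 .. d^3\tilde{q}'_{n_2}\;
g(\tilde{k}_1; k_1) .. g(\tilde{q}'_{n_2}; q'_{n_2})
$$
$$
\cdot\; e^{-i\sum_j \tilde{k}_j\cdot(\Delta+\xi_j) - i\sum_j \tilde{q}_j\cdot(-\Delta+\eta_j) + i\sum_j \tilde{k}'_j\cdot(\Delta+\xi'_j) + i\sum_j \tilde{q}'_j\cdot(-\Delta+\eta'_j)}\;
S^c_{\tilde{k}'_1 .. \tilde{q}'_{n_2},\,\tilde{k}_1 .. \tilde{q}_{n_2}}
\tag{6.3}
$$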

2 For the rest of this section we drop three-vector notation to avoid overcrowding the formulas—but
re-emphasize to the reader that the requirements of clustering are exclusively spatial ones!

Fig. 6.4 Decomposition of a general S-matrix amplitude into separated subprocesses.

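The Δ-dependence in the exponent of (6.3) is just

$$
e^{i\Delta\cdot\left(\sum_j \tilde{k}'_j - \sum_j \tilde{q}'_j - \sum_j \tilde{k}_j + \sum_j \tilde{q}_j\right)}
\tag{6.4}
$$

On the other hand, momentum conservation says

$$
\sum_j \tilde{k}'_j + \sum_j \tilde{q}'_j = \sum_j \tilde{k}_j + \sum_j \tilde{q}_j
\;\;\Rightarrow\;\;
\sum_j \tilde{k}'_j - \sum_j \tilde{k}_j = \sum_j \tilde{q}_j - \sum_j \tilde{q}'_j
\tag{6.5}
$$

so the Δ-dependence may also be written $e^{2i\Delta\cdot\left(\sum_j \tilde{q}_j - \sum_j \tilde{q}'_j\right)}$. If the connected part $S^c$ in
(6.3) contained a partial δ-function such as $\delta(\sum_j \tilde{q}_j - \sum_j \tilde{q}'_j)$, the amplitude would lose
Δ-dependence and $S^c$ could not vanish in the limit Δ → ∞. In fact, we must demand
that $S^c$ contain only a single overall δ-function of energy-momentum conservation,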

times a sufficiently smooth function of momenta (where “sufficiently smooth” ensures
that the Fourier transform falls sufficiently rapidly in coordinate space).
To summarize our main result from this section: the requirement of cluster
factorization of the S-matrix for far-separated processes amounts to a momentum
smoothness condition on the connected parts of the multi-particle scattering ampli-
tudes. In particular, these connected parts must contain at most a single (three-
dimensional) δ-function ensuring spatial momentum conservation, multiplied by a
reasonably smooth momentum-dependent amplitude. The behavior of the Fourier
transform of the latter in coordinate space determines the detailed spatial falloff (power
versus exponential, for example) of the corrections to factorization for far-separated
processes.
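As a simple illustration of how smoothness in momentum controls the falloff in Δ, consider a one-dimensional toy version of the Δ-dependent factor derived above, in which the smooth part of the connected amplitude is modeled by a Gaussian of width σ in the momentum transfer q. The Gaussian profile and the symbols q and σ are assumptions made only for this sketch; they do not appear in the text.

```latex
% Toy model: smooth (Gaussian) momentum dependence
\int_{-\infty}^{\infty} dq\; e^{2i\Delta q}\, e^{-q^2/(2\sigma^2)}
  = \sqrt{2\pi}\,\sigma\; e^{-2\sigma^2\Delta^2}
  \;\longrightarrow\; 0 \quad (\Delta\to\infty)

% Contrast: a partial delta-function removes the Delta-dependence entirely
\int_{-\infty}^{\infty} dq\; e^{2i\Delta q}\,\delta(q) = 1
  \quad \text{for all } \Delta
```

In this toy case the correction to factorization falls off as a Gaussian in the separation; a less smooth momentum dependence would give a slower (for example power-law) falloff, while a δ-function gives none at all.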
Before proceeding to a derivation of the constraints imposed on the dynamics of the
theory (specifically, on the form of the Hamiltonian) by clustering, it will be convenient
to introduce a technical tool which will greatly simplify the combinatoric aspects of the
argument, and which will in any event be needed later in our systematic study of
the clustering properties of local field theory in the axiomatic Wightman framework
in Chapter 9, and of perturbative quantum field theory in Chapter 10. As above,
we will reduce inessential algebraic complications to a minimum by dealing with the
scattering amplitudes of a single spinless boson, with Bose-symmetric multi-particle
free states $|k_1, k_2, \ldots k_N\rangle$ specified entirely by listing the momenta of the particles (in
any order). We can then associate any operator S acting in the Fock space of these
states with a generating functional $\mathcal{S}(j^*, j)$, where j(k) is a complex source function³
of momentum, and

$$
\mathcal{S}(j^*, j) \equiv \sum_{M,N} \int \frac{d^3k'_1 \ldots d^3k'_M\; d^3k_1 \ldots d^3k_N}{M!\,N!}\;
j^*(k'_1) \ldots j^*(k'_M)\; S_{k'_1..k'_M,\,k_1..k_N}\; j(k_1) \ldots j(k_N)
\tag{6.6}
$$

where $S_{k'_1..k'_M,\,k_1..k_N} = \langle k'_1..k'_M|S|k_1..k_N\rangle$.
The association is unique, as we may extract
any multi-particle amplitude by an appropriate multiple functional derivative:⁴

$$
S_{k'_1..k'_M,\,k_1..k_N} = \left.\frac{\delta^{M+N}}{\delta j^*(k'_1)..\delta j^*(k'_M)\,\delta j(k_1)..\delta j(k_N)}\,
\mathcal{S}(j^*, j)\right|_{j=j^*=0}
\tag{6.7}
$$

The utility of the generating functional defined in this way follows from the ease with
which it allows us to extract the connected amplitudes. Indeed, if we define a connected
functional $\mathcal{S}^c(j^*, j)$ as the logarithm of the full generating functional $\mathcal{S}$ defined above,
so that

$$
\mathcal{S}(j^*, j) \equiv \exp\left(\mathcal{S}^c(j^*, j)\right)
\tag{6.8}
$$

one easily sees that (for the case where S is the scattering operator) the amplitudes
encoded in $\mathcal{S}^c$ are exactly the connected amplitudes defined above. For example,

3 For fermions, a similar procedure can be used, but with source functions which take values in an
anticommuting Grassmann field. We shall return to this later in the book when we discuss the path-integral
formulation of quantum field theory.
4 For a review of functional calculus, see Appendix A.

(6.2) corresponds to the result obtained if we assume only particle-number-conserving
interactions, in which case only the M = N = 1 and M = N = 2 terms of (6.6) need be
included in the exponent, and the functional derivative $\frac{\delta^4}{\delta j^*(k'_1)\,\delta j^*(k'_2)\,\delta j(k_1)\,\delta j(k_2)}$ applied
to $\mathcal{S}(j^*, j)$ (with j and j* then set to zero) then yields (6.2) after a short calculation.
The exponential relation here between full and connected amplitudes is the precise
analog of the corresponding relation in statistical mechanics between the partition
function and the extensive (i.e., linear in the system volume) free energy of the system.
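
The short calculation mentioned above can be sketched as follows. This is a minimal outline assuming a number-conserving theory in which only the 1-1 and 2-2 connected amplitudes are non-zero; the shorthand $\mathcal{S}^c_1$, $\mathcal{S}^c_2$ for the two terms of the exponent is introduced here for convenience and is not notation from the text.

```latex
% Exponent: only the M=N=1 and M=N=2 terms of the connected functional
\mathcal{S}^c = \mathcal{S}^c_1 + \mathcal{S}^c_2, \qquad
\mathcal{S}^c_1 = \int d^3k'\,d^3k\; j^*(k')\,S^c_{k',k}\,j(k), \qquad
\mathcal{S}^c_2 = \int \frac{d^3k'_1 d^3k'_2\, d^3k_1 d^3k_2}{2!\,2!}\;
   j^*(k'_1)j^*(k'_2)\,S^c_{k'_1k'_2,k_1k_2}\,j(k_1)j(k_2)

% Terms of exp(S^c) with exactly two j^* and two j
\exp(\mathcal{S}^c)\Big|_{(j^*)^2 j^2}
   = \tfrac{1}{2}\left(\mathcal{S}^c_1\right)^2 + \mathcal{S}^c_2

% Apply \delta^4/\delta j^*(k'_1)\delta j^*(k'_2)\delta j(k_1)\delta j(k_2), then set j=j^*=0
S_{k'_1k'_2,k_1k_2}
   = S^c_{k'_1,k_1}\,S^c_{k'_2,k_2} + S^c_{k'_1,k_2}\,S^c_{k'_2,k_1}
   + S^c_{k'_1k'_2,k_1k_2}
```

Each of the two product terms arises twice when the derivatives are distributed over the two factors of $\mathcal{S}^c_1$, which cancels the factor 1/2 from the exponential series; the 1/(2!2!) in $\mathcal{S}^c_2$ is cancelled in the same way by the Bose symmetry of the connected 2-2 amplitude.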

6.2 Hamiltonians leading to clustering theories


Matrix elements of the Hamiltonian also display a connectedness structure. We shall
now see that the desired cluster decomposition property of the S-matrix is inherited
from a corresponding characteristic of the underlying (interaction) Hamiltonian. To
begin the discussion, imagine first a non-relativistic quantum-mechanical system where
the particles interact by a two-body potential. The 3-3 matrix element of the potential
operator will be

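$$
\langle k'_1 k'_2 k'_3|V|k_1 k_2 k_3\rangle = \langle k'_1 k'_2 k'_3|V_{12} + V_{23} + V_{31}|k_1 k_2 k_3\rangle
\tag{6.9}
$$

For example, the system being considered might be the three electrons of a lithium
atom, and V above the total electrostatic interaction energy of the electrons. Explicitly,

$$
\begin{aligned}
\langle k'_1 k'_2 k'_3|V_{12}|k_1 k_2 k_3\rangle
&= \int d^3x_1\, d^3x_2\, d^3x_3\; V_{12}(x_1 - x_2)\, e^{i[(k_1-k'_1)\cdot x_1 + (k_2-k'_2)\cdot x_2 + (k_3-k'_3)\cdot x_3]} \\
&= (2\pi)^6\, \delta^3(k'_3 - k_3)\, \delta^3(k'_1 + k'_2 - k_1 - k_2) \int d^3x_-\; V_{12}(x_-)\, e^{i(k_1-k'_1)\cdot x_-}
\end{aligned}
\tag{6.10}
$$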

which may be expressed graphically as shown in Fig. 6.5.

Fig. 6.5 Decomposition of a 3-3 matrix element of V.



Note that the remaining integral in (6.10) defines a smooth function of $k_1 - k'_1$
provided $V_{12}$ goes to zero sufficiently rapidly for $x_- \to \infty$. In general, the multi-particle
matrix elements of H decompose as follows:

$$
H_{\vec{k}',\vec{k}} = \langle k'|H|k\rangle \equiv E(k)\,\delta^3(k' - k) \equiv H^c_{\vec{k}',\vec{k}}
\tag{6.11}
$$

In the two-body sector,

$$
\begin{aligned}
H_{\vec{k}'_1\vec{k}'_2,\,\vec{k}_1\vec{k}_2} \equiv\;
& H^c_{\vec{k}'_1,\vec{k}_1}\,\delta^3(k'_2 - k_2) + H^c_{\vec{k}'_1,\vec{k}_2}\,\delta^3(k'_2 - k_1) + H^c_{\vec{k}'_2,\vec{k}_1}\,\delta^3(k'_1 - k_2) \\
& + H^c_{\vec{k}'_2,\vec{k}_2}\,\delta^3(k'_1 - k_1) + H^c_{\vec{k}'_1\vec{k}'_2,\,\vec{k}_1\vec{k}_2}
\end{aligned}
\tag{6.12}
$$

which defines the 2-2 connected piece $H^c_{k'_1 k'_2,\,k_1 k_2}$. Graphically, (6.12) may be represented
as shown in Fig. 6.6. Similarly, in the three-body sector we have a decomposition

$$
\begin{aligned}
H_{\vec{k}'_1\vec{k}'_2\vec{k}'_3,\,\vec{k}_1\vec{k}_2\vec{k}_3} \equiv\;
& H^c_{\vec{k}'_1,\vec{k}_1}\,\delta^3(k'_2 - k_2)\,\delta^3(k'_3 - k_3) + \text{perms.} \\
& + H^c_{\vec{k}'_1\vec{k}'_2,\,\vec{k}_1\vec{k}_2}\,\delta^3(k'_3 - k_3) + \text{perms.} \\
& + H^c_{\vec{k}'_1\vec{k}'_2\vec{k}'_3,\,\vec{k}_1\vec{k}_2\vec{k}_3}
\end{aligned}
\tag{6.13}
$$

Note that in the simple non-relativistic example mentioned above, with only two-
body forces present, the connected part of the Hamiltonian in the one-particle sector
is given just by the matrix elements of the free Hamiltonian H0 (see (6.11)), while the
connected 3-3 part $H^c_{k'_1 k'_2 k'_3,\,k_1 k_2 k_3}$ in (6.13) is actually zero. In more general situations
in many-body theory, there are intrinsically three (or higher) body forces. Intuitively,
this corresponds to a situation where a single interaction alters the momenta of all

Fig. 6.6 Decomposition of 2-2 matrix element of H.



three (or more) participating particles. In relativistic quantum field theory, particle
number is no longer conserved in interactions, and the existence of persistent inter-
actions (cf. Chapter 10) can lead to interaction contributions even in the one-particle
sector (depending on exactly how the full Hamiltonian is split into free and interacting
parts).
To summarize the foregoing discussion for non-relativistic scattering, we expect
that the connected N-N part of the Hamiltonian should not contain any partial
δ-functions conserving momentum of any subset M < N of the N interacting particles.
This intuitive expectation turns out to be precisely the requirement that the resultant
S-matrix possess the desired cluster-decomposition properties. More formally, the con-
nected part of the Hamiltonian has matrix elements obtained by Fourier-transforming
some multi-particle potential-energy function


Hkc k ..,k1 k2 = d3 x1 d3 x2 ..d3 x1 d3 x2 ..V (x1 , x2 , .., x1 ..)
1 2

   
e−ik1 ·x1 e−ik2 ·x2 ..eik1 ·x1 eik2 ·x2 .. (6.14)

where the potential energy function V should only depend on differences of coordinates
(translation invariance). The invariance of V under an equal shift of all coordinates
leads directly to momentum conservation: H c must contain an overall conservation
δ-function, namely δ3 (k1 + k2 + .. − k1 − k2 − ..). The only way to have additional
δ-functions conserving subsets of momenta is to have V constant when some subset
of coordinates is moved en-bloc far away from some other subset. We can prevent this
by insisting that V is a smooth5 function of differences of coordinates falling to zero
when any two coordinates separate to infinity with the others fixed.
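
The step from translation invariance of V to the overall δ-function in (6.14) can be made explicit. The following is a sketch in a condensed notation that is an assumption of this example, not of the text: the n coordinates $x'_1, .., x_1, ..$ are relabeled $y_1, .., y_n$, with conjugate momenta $p_a = -k'_a$ for outgoing and $p_a = +k_a$ for incoming particles, and $z_a \equiv y_a - y_n$.

```latex
% Translation invariance: V depends only on coordinate differences
V(y_1,..,y_n) = V(y_1 - y_n,..,y_{n-1}-y_n,\,0) \equiv v(z_1,..,z_{n-1})

% Change variables y_a = y_n + z_a (a < n) in (6.14); the y_n integral factors out
H^c = \int d^3y_n\; e^{i(\sum_{a=1}^{n} p_a)\cdot y_n}
      \int \prod_{a<n} d^3z_a\; v(z_1,..,z_{n-1})\; e^{i\sum_{a<n} p_a\cdot z_a}
    = (2\pi)^3\,\delta^3\Big(\sum_{a=1}^{n} p_a\Big)\;\tilde v(p_1,..,p_{n-1})

% With p_a = -k'_a (outgoing), +k_a (incoming):
\delta^3\Big(\sum_a p_a\Big) = \delta^3(k'_1 + k'_2 + .. - k_1 - k_2 - ..)
```

If v falls to zero whenever any two coordinates separate, the remaining Fourier transform $\tilde v$ has no further δ-function singularities. If instead v stayed constant as a whole subset of coordinates was moved away rigidly, the same argument applied to that subset would produce a second, partial δ-function.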
The basic theorem specifying the smoothness properties needed in the con-
nected Hamiltonian amplitudes to guarantee the clustering property for the S-matrix
was proven by Weinberg (Weinberg, 1964b), in a classic study of multi-particle
scattering:

Theorem 6.1 If $H^c_{k'_1 k'_2..,\,k_1 k_2..} = \delta^3(\sum k' - \sum k)$ times a smooth function of the
momenta k, k′, then $S^c_{k'_1 k'_2..,\,k_1 k_2..} = \delta^3(\sum k' - \sum k)$ times a smooth function.

We shall derive this result for the same system used in our discussions in this
and the preceding section: namely, for multi-particle scattering of a single species of
spinless boson. However, there will be no restriction to non-relativistic scattering. In
particular, the scattering amplitudes need not conserve particle number. To facilitate
the otherwise rather tricky combinatorics, we shall use the generating functional
technique described in the preceding section. As in the case of S-matrix amplitudes,
the introduction of associated functionals

5 Here “smooth” is being used in a somewhat loose sense: we are certainly not requiring analyticity
in the momentum variables, for example. Rather, here and henceforth in our discussion of clustering, the
reader should interpret the property “smooth” as simply implying the absence of singularities of δ-function
strength.

$$
\mathcal{H}(j^*, j) \equiv \sum_{M,N} \int \frac{d^3k'_1 \ldots d^3k'_M\; d^3k_1 \ldots d^3k_N}{M!\,N!}\;
j^*(k'_1) \ldots j^*(k'_M)\; H_{k'_1..k'_M,\,k_1..k_N}\; j(k_1) \ldots j(k_N)
\tag{6.15}
$$

$$
\mathcal{H}^c(j^*, j) \equiv \sum_{M,N} \int \frac{d^3k'_1 \ldots d^3k'_M\; d^3k_1 \ldots d^3k_N}{M!\,N!}\;
j^*(k'_1) \ldots j^*(k'_M)\; H^c_{k'_1..k'_M,\,k_1..k_N}\; j(k_1) \ldots j(k_N)
\tag{6.16}
$$

allows the inductive sequence defining connected parts of the Hamiltonian to be solved
very simply. Namely, defining

$$
\mathcal{F}(j^*, j) \equiv \int d^3k\; j^*(k)\, j(k)
\tag{6.17}
$$

one finds simply

$$
\mathcal{H}(j^*, j) = \mathcal{H}^c(j^*, j)\, \exp\left(\mathcal{F}(j^*, j)\right)
\tag{6.18}
$$

For example, the reader may easily check (take again number conserving theories with
M = N for simplicity) that applying the functional derivative $\frac{\delta^4}{\delta j^*(k'_1)\,\delta j^*(k'_2)\,\delta j(k_1)\,\delta j(k_2)}$
to (6.18) (and then setting j = j* = 0) leads directly to the connectedness structure
(6.12) for the two-body sector (as illustrated in Fig. 6.6).
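
A sketch of this check, assuming a number-conserving Hamiltonian so that only the 1-1 and 2-2 terms of (6.16) contribute; the shorthand $\mathcal{H}^c_1$, $\mathcal{H}^c_2$ for those two terms is introduced here for illustration only.

```latex
% Terms of H^c exp(F) with exactly two j^* and two j
\mathcal{H}^c\, e^{\mathcal{F}}\Big|_{(j^*)^2 j^2}
   = \mathcal{H}^c_1\,\mathcal{F} + \mathcal{H}^c_2, \qquad
\mathcal{H}^c_1 = \int d^3p'\,d^3p\; j^*(p')\,H^c_{p',p}\,j(p), \qquad
\mathcal{F} = \int d^3q\; j^*(q)\,j(q)

% Derivatives acting on H^c_1 F: one j^* and one j derivative must hit each factor
\frac{\delta^4\,(\mathcal{H}^c_1\mathcal{F})}
     {\delta j^*(k'_1)\,\delta j^*(k'_2)\,\delta j(k_1)\,\delta j(k_2)}
   = H^c_{k'_1,k_1}\,\delta^3(k'_2-k_2) + H^c_{k'_1,k_2}\,\delta^3(k'_2-k_1)
   + H^c_{k'_2,k_1}\,\delta^3(k'_1-k_2) + H^c_{k'_2,k_2}\,\delta^3(k'_1-k_1)

% Derivatives acting on H^c_2 (Bose-symmetric, so the 1/(2!2!) cancels)
\frac{\delta^4\,\mathcal{H}^c_2}
     {\delta j^*(k'_1)\,\delta j^*(k'_2)\,\delta j(k_1)\,\delta j(k_2)}
   = H^c_{k'_1k'_2,\,k_1k_2}
```

The sum of the two results reproduces the five terms of (6.12).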
We are interested in the clustering properties of the S-matrix, which the reader
will recall from Section 4.3 is defined as the set of multi-particle matrix elements of
the infinite time limit U (+∞, −∞) of the finite time evolution operator U (t, t0 ) in the
interaction picture, satisfying (4.20):

$$
i\,\frac{\partial}{\partial t}\, U(t, t_0) = V_{ip}(t)\, U(t, t_0)
\tag{6.19}
$$

It will suffice to establish the desired smoothness property for the matrix elements of
U (t, t0 ) at finite times t, t0 as it will then carry over trivially to the large time limit.
We need just one further technical tool to carry through the argument. It is easy to
see that the functional associated with the product V U of two linear operators V and
U is given by

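$$
(\mathcal{VU})(j^*, j) = L\, \mathcal{V}(j^*, J)\, \mathcal{U}(J^*, j)\big|_{J=J^*=0}
\tag{6.20}
$$

where $\mathcal{V}$ (resp. $\mathcal{U}$) are the functionals associated with the Fock space operators V
(resp. U), and L is the linking operator defined as

$$
L \equiv \exp\left(\int d^3k\; \frac{\delta^2}{\delta J(k)\,\delta J^*(k)}\right)
\tag{6.21}
$$
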
The role of the exponential in (6.21) is to produce (on expansion) a series of terms,
each of which ties together the momentum of the final-state particles in the matrix
element of U with the initial-state particles in the matrix element of V (for some
specific intermediate state in the operator product V U ). There is no substitute here

for the reader writing out a few simple examples to verify this result (see Problem 1
at the end of this chapter).
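
One of the simplest such examples can be sketched as follows, under the assumption (made here only for illustration) that V and U both have non-vanishing matrix elements in the one-particle sector alone, so that each functional consists of a single term bilinear in the sources.

```latex
% One-particle-sector functionals (assumed form for this example)
\mathcal{V}(j^*,J) = \int d^3k'\,d^3q\; j^*(k')\,V_{k',q}\,J(q), \qquad
\mathcal{U}(J^*,j) = \int d^3q'\,d^3k\; J^*(q')\,U_{q',k}\,j(k)

% Expand L = 1 + \int d^3p\, \delta^2/\delta J(p)\delta J^*(p) + ...
% Zeroth order: the product V U vanishes at J = J^* = 0.
% Second and higher orders: vanish, since there is only one J and one J^*.
% First order:
\int d^3p\; \frac{\delta\mathcal{V}(j^*,J)}{\delta J(p)}\,
            \frac{\delta\mathcal{U}(J^*,j)}{\delta J^*(p)}
   = \int d^3k'\,d^3k\; j^*(k')\Big[\int d^3p\; V_{k',p}\,U_{p,k}\Big] j(k)

% i.e. the functional of the operator product, with
(VU)_{k',k} = \int d^3p\; V_{k',p}\,U_{p,k}
```

The bracket is just the insertion of a complete set of one-particle intermediate states between V and U, which is what the single link produced by L represents.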
Suppressing the passive initial time t0 , which plays no role in the following, we shall
denote the generating functional for the matrix elements of U (t, t0 ) by U(j ∗ , j; t), the
functional for the matrix elements of the interaction operator Vip (t) by V(j ∗ , j; t),
and the corresponding connected quantities as usual by attaching a superscript “c”.
Then the equation of motion (6.19) translates to a corresponding time-development
equation for the associated functionals:

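$$
\begin{aligned}
i\,\frac{\partial}{\partial t}\,\mathcal{U}(j^*, j; t)
&= i\,\frac{\partial}{\partial t} \exp\left(\mathcal{U}^c(j^*, j; t)\right) \\
&= i\,\frac{\partial\, \mathcal{U}^c(j^*, j; t)}{\partial t}\, \exp\left(\mathcal{U}^c(j^*, j; t)\right) \\
&= (\mathcal{VU})(j^*, j; t) \\
&= L\, \mathcal{V}(j^*, J; t)\, \mathcal{U}(J^*, j; t)\big|_{J=J^*=0} \\
&= L\left\{ e^{\mathcal{F}(j^*, J)}\, \mathcal{V}^c(j^*, J; t)\, e^{\mathcal{U}^c(J^*, j; t)} \right\}\Big|_{J=J^*=0}
\end{aligned}
\tag{6.22}
$$

Note that the operators

$$
A \equiv \int d^3k\; \frac{\delta^2}{\delta J(k)\,\delta J^*(k)}, \qquad
B \equiv \mathcal{F}(j^*, J) = \int d^3k\; j^*(k)\, J(k)
\tag{6.23}
$$

have the commutator

$$
[A, B] = \int d^3k\; j^*(k)\, \frac{\delta}{\delta J^*(k)}
\tag{6.24}
$$

which (a) commutes with A and B (remember that the functional derivatives with
respect to J and J* act independently), and (b) acts as the generator of translations
on functionals of J*:

$$
\exp\left(\int d^3k\; j^*(k)\, \frac{\delta}{\delta J^*(k)}\right) G(J^*) = G(J^* + j^*)
\tag{6.25}
$$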
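
Both statements can be checked in a few lines; the following sketch acts on an arbitrary test functional G(J, J*), which is an auxiliary object introduced only for this verification.

```latex
% Commutator (6.24): B depends on J but not on J^*
A\,(B\,G) = \int d^3k\; \frac{\delta}{\delta J(k)}
            \Big[ B\,\frac{\delta G}{\delta J^*(k)} \Big]
          = \int d^3k\; j^*(k)\,\frac{\delta G}{\delta J^*(k)} + B\,(A\,G)
\;\;\Rightarrow\;\;
[A,B]\,G = \int d^3k\; j^*(k)\,\frac{\delta G}{\delta J^*(k)}

% Translation property (6.25): functional Taylor series in j^*
G(J^*+j^*) = \sum_{n=0}^{\infty}\frac{1}{n!}
   \Big(\int d^3k\; j^*(k)\frac{\delta}{\delta J^*(k)}\Big)^{n} G(J^*)
```

The first line uses $\delta B/\delta J(k) = j^*(k)$; the second is the functional Taylor expansion, valid for functionals G that are analytic in their argument.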

From (a), we have the Baker–Campbell formula $e^A e^B = e^B e^A e^{[A,B]}$, so

$$
L\, e^{\mathcal{F}(j^*, J)} = e^{\mathcal{F}(j^*, J)}\, L\, \exp\left(\int d^3k\; j^*(k)\, \frac{\delta}{\delta J^*(k)}\right)
\tag{6.26}
$$

Once the factor $e^{\mathcal{F}(j^*, J)}$ is to the left of the linking operator (containing derivatives
with respect to J) the source J can be set to zero, leaving only the last two factors
on the right-hand side of (6.26). Inserting this result on the final line of (6.22),

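$$
\begin{aligned}
i\,\frac{\partial\, \mathcal{U}^c(j^*, j; t)}{\partial t}\, \exp\left(\mathcal{U}^c(j^*, j; t)\right)
&= L \exp\left(\int d^3k\; j^*(k)\, \frac{\delta}{\delta J^*(k)}\right)
   \left\{ \mathcal{V}^c(j^*, J; t)\, e^{\mathcal{U}^c(J^*, j; t)} \right\}\Big|_{J=J^*=0} \\
&= L\left\{ \mathcal{V}^c(j^*, J; t)\, e^{\mathcal{U}^c(J^* + j^*, j; t)} \right\}\Big|_{J=J^*=0}
\end{aligned}
\tag{6.27}
$$

whence

$$
i\,\frac{\partial\, \mathcal{U}^c(j^*, j; t)}{\partial t}
= L\left\{ \mathcal{V}^c(j^*, J; t)\, e^{\mathcal{U}^c(J^* + j^*, j; t) - \mathcal{U}^c(j^*, j; t)} \right\}\Big|_{J=J^*=0}
\tag{6.28}
$$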

The essential result we desire is already contained, in disguised form, in Eq. (6.28).
On the right-hand side, we find an exponential which if expanded gives a sum of
terms corresponding to a product of r (say) terms of the form $\mathcal{U}^c(J^* + j^*, j; t) - \mathcal{U}^c(j^*, j; t)$,
each of which would vanish when we evaluate the final expression at J* = 0,
as indicated in the formula. The only way to avoid this is if each such term receives at
least one derivative $\frac{\delta}{\delta J^*}$ from the linking operator on the left, which will then attach the
corresponding connected $\mathcal{U}^c$ factor to the connected Hamiltonian term $\mathcal{V}^c(j^*, J; t)$. In
other words, the only terms which survive involve r connected factors of the evolution
amplitude $U^c$ connected to a single factor of the connected interaction Hamiltonian
$V^c$, as indicated schematically in Fig. 6.7, for the special case of r = 3.
Now suppose that at a given time U c has only a single overall δ-function of
momentum conservation. By assumption, so does V c . It is apparent that each term
on the right of Fig. 6.7 is completely connected, so that after integration over internal
momenta it can have only a single overall δ-function. Thus, $\frac{\partial}{\partial t} U^c$ has only a single
δ-function, which implies that $U^c$ retains this property for all time. At time t = t0
however, U(t0, t0) = 1, so initially the only non-vanishing connected piece of U is the
one-particle matrix element $\langle k'|U^c|k\rangle = \delta^3(k' - k)$, which indeed has but a single δ-
function. Apart from this single overall δ-function then, we have (by assumption) only
smooth functions of momenta, which remain smooth when combined and integrated
via the linking operator. This establishes the desired result, as stated in theorem 6.1.
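
As a consistency check on (6.28), one can evaluate its right-hand side at the initial time, where the connected functional reduces to the one-particle term just described. The following sketch uses only (6.17), (6.21) and (6.28); the intermediate manipulations are filled in here and are not spelled out in the text.

```latex
% At t = t_0 the connected functional of U is the one-particle term
\mathcal{U}^c(j^*,j;t_0) = \int d^3k\; j^*(k)\,j(k) = \mathcal{F}(j^*,j)
\;\;\Rightarrow\;\;
\mathcal{U}^c(J^*+j^*,j;t_0) - \mathcal{U}^c(j^*,j;t_0) = \mathcal{F}(J^*,j)

% Each \delta/\delta J^*(k) acting on e^{F(J^*,j)} brings down j(k),
% so L acts as a shift J -> J + j on V^c
L\,\{\mathcal{V}^c(j^*,J;t_0)\,e^{\mathcal{F}(J^*,j)}\}
   = \mathcal{V}^c(j^*,J+j;t_0)\,e^{\mathcal{F}(J^*,j)}

% Setting J = J^* = 0 in (6.28):
i\,\frac{\partial\,\mathcal{U}^c(j^*,j;t)}{\partial t}\Big|_{t=t_0}
   = \mathcal{V}^c(j^*,j;t_0)
```

This is the expected first-order statement: for a short time after t0, the connected part of the evolution operator is generated by the connected part of the interaction, so it starts out with the same single δ-function and the same smoothness.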

Fig. 6.7 Time development of connected $U^c$ amplitude.



The importance of the above result is clear: we now have a precise criterion
for choosing interaction Hamiltonians which lead to properly clustering S-matrices.
In particular, the connected part of the matrix elements of Vip should contain no
dangerous δ-functions (in the terminology of (Weinberg, 1964b)). It turns out that the
extraction of the connected part of the Hamiltonian by either the inductive scheme
described above, or by functional methods, is rather inconvenient in general. Instead,
a technical device known as “second quantization”6 allows us to display the connected
parts of H with great ease. This device, involving the introduction of the famous
“creation” and “annihilation” operators, will also have an important added bonus:
it will be trivial to ensure the proper symmetry (bosonic or fermionic) of the states.
But the real advantage of the creation–annihilation technology is the ease with which
clustering can be incorporated in the theory.
Much of the preceding discussion of clustering applies quite generally to scattering
processes in non-relativistic quantum mechanics, as well as to scattering in relativistic
quantum field theory. Certainly, we do not expect locality, in the sense of space-
like commutativity of field operators, to play any role in the non-relativistic case.
However, we shall soon see that this much more special property, sometimes termed
microcausality, dovetails effortlessly with the clustering requirement once we add the
condition of Lorentz-invariance of scattering amplitudes. The natural structure of local
quantum field theory emerges inexorably once these basic concepts are fused.

6.3 Constructing clustering Hamiltonians: second quantization


We have arrived at a point where the mathematical preconditions for a clustering
theory have been brought into a direct association with the structure of the interaction
Hamiltonian of the theory: we must ensure that all connected matrix elements of
this Hamiltonian are appropriately smooth, apart from the inevitable δ-function of
spatial momentum conservation implied by the translation invariance of the theory. In
order to do this, one more indispensable technical tool—second quantization—will be
required. We shall introduce a class of operators, called “creation” and “annihilation”
operators, which will vastly simplify the search for clustering theories. Using these
operators will also allow us to incorporate trivially the correct (i.e., fermionic or
bosonic) symmetry in multi-particle states. In the forthcoming discussion, the upper
sign, e.g., as in ± or ∓, will always be taken to refer to bosons, the lower sign to
fermions.
On a general multi-particle state $|k_1, k_2, \ldots, k_N\rangle$,⁷ define operators a(k), a†(k) by the
following action:

6 The terminology “second quantization” arose in the late 1920s to describe the process, pioneered by
Jordan and Dirac, of replacing the c-number wavefunctions for multi-particle systems by a single spacetime-
dependent q-number operator field: it was soon realized that the creation and annihilation algebra provided
an extremely convenient operator basis for constructing such fields. See Section 2.2.
7 Again, in the interests of avoiding notational overload, the three-vector arrow indications on spatial
momenta are omitted in what follows. Whether a momentum label refers to a spatial, or four momentum,
should (we hope) be obvious from context.


$$
a(k)|k_1, k_2, ..k_N\rangle \equiv \sum_{r=1}^{N} (\pm)^{r-1}\, \delta^3(k - k_r)\, |k_1, ..k_{r-1}, k_{r+1}, ..k_N\rangle
\tag{6.29}
$$

$$
a^\dagger(k)|k_1, k_2, ..k_N\rangle \equiv |k, k_1, k_2, ..k_N\rangle
\tag{6.30}
$$

Intuitively, a(k) attempts to remove a particle of momentum k (for simplicity of
notation, three-vector symbols are omitted) while a†(k) adds a particle of momentum
k. We shall use non-covariant normalization throughout. It will also be assumed that
all particles in the state are identical: non-identical particles simply act independently.
However, for identical particles we have to be careful to build in the right Bose or Fermi
symmetry. This appears in the normalization formula as follows:

$$
\langle k'_1, ..k'_N|k_1, ..k_N\rangle = \sum_P (\pm)^P\, \delta^3(k_1 - k'_{P(1)}) ... \delta^3(k_N - k'_{P(N)})
\tag{6.31}
$$

Here $\sum_P$ denotes a sum over all permutations P of the integers 1, 2, ...N, with $(\pm)^P$
inserting a minus sign for fermions if the permutation P is odd.
As suggested by the notation, a† is really the hermitian conjugate of a, as the
following computation shows:

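$$
\begin{aligned}
\langle q_2, ..q_N|a(q_1)|k_1, k_2, ..k_N\rangle
&= \sum_{r=1}^{N} (\pm)^{r-1}\, \delta^3(k_r - q_1) \sum_P (\pm)^P\, \delta^3(k_1 - q_{P(2)}) .. \delta^3(k_{r-1} - q_{P(r)})\, \delta^3(k_{r+1} - q_{P(r+1)}) .. \\
&= \sum_P (\pm)^P\, \delta^3(k_1 - q_{P(1)})\, \delta^3(k_2 - q_{P(2)}) .. \delta^3(k_N - q_{P(N)}) \\
&= \langle q_1, q_2, ..q_N|k_1, k_2, ..k_N\rangle \\
&= \langle k_1, k_2, ..k_N|q_1, q_2, ..q_N\rangle^* \\
&= \langle k_1, k_2, ..k_N|a^\dagger(q_1)|q_2, ..q_N\rangle^*
\end{aligned}
\tag{6.32}
$$

This implies that if $|0\rangle$ is the state with no particles (the “vacuum”) and $\langle\psi|$ an
arbitrary bra state

$$
\langle\psi|a(k)|0\rangle = (a^\dagger(k)|\psi\rangle, |0\rangle) = \langle k, \psi|0\rangle = 0
\tag{6.33}
$$

as the bra state $\langle k, \psi|$ must contain at least one particle. Since $\langle\psi|$ is arbitrary, it
follows that the annihilation operators a(k) all annihilate the vacuum, $a(k)|0\rangle = 0$.
(Note that we shall always normalize the vacuum to unity, $\langle 0|0\rangle = 1$.)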
The commutator properties of the creation and annihilation operators will be
crucial. Thus, observe

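$$
\begin{aligned}
a^\dagger(k')\, a(k)\, |k_1 ..k_N\rangle
&= a^\dagger(k') \sum_{r=1}^{N} (\pm)^{r-1}\, \delta^3(k - k_r)\, |k_1 k_2 ..k_{r-1} k_{r+1} ..k_N\rangle \\
&= \sum_{r=1}^{N} \delta^3(k - k_r)\, |k_1 ..k_{r-1}\, k'\, k_{r+1} ..k_N\rangle
\end{aligned}
\tag{6.34}
$$

whereas

$$
\begin{aligned}
a(k)\, a^\dagger(k')\, |k_1 ..k_N\rangle &= a(k)\, |k' k_1 ..k_N\rangle \\
&= \delta^3(k - k')\, |k_1 ..k_N\rangle + \sum_{r=1}^{N} (\pm)^{r}\, \delta^3(k - k_r)\, |k' k_1 ..k_{r-1} k_{r+1} ...\rangle \\
&= \delta^3(k - k')\, |k_1 ..k_N\rangle \pm \sum_{r=1}^{N} \delta^3(k - k_r)\, |k_1 ..k_{r-1}\, k'\, k_{r+1} ..\rangle
\end{aligned}
\tag{6.35}
$$

Hence

$$
\left(a(k)\, a^\dagger(k') \mp a^\dagger(k')\, a(k)\right) |k_1 ..k_N\rangle = \delta^3(k - k')\, |k_1 ..k_N\rangle
\tag{6.36}
$$

More concisely, we have derived, for bosons (resp. fermions), the fundamental
commutation (resp. anticommutation) relations

$$
[a(k), a^\dagger(k')]_\mp = \delta^3(k - k')
\tag{6.37}
$$

By similar computations, it is trivial to show that creation and annihilation operators
commute (resp. anticommute) among themselves:

$$
[a(k), a(k')]_\mp = [a^\dagger(k), a^\dagger(k')]_\mp = 0
\tag{6.38}
$$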

For the case of fermions, this implies a† (k)a† (k) + a† (k)a† (k) = 2a† (k)a† (k) = 0, i.e.,
the Pauli exclusion principle forbidding the addition of two identical fermionic particles
to any state.
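
A quick application of (6.34), offered here as an illustrative sketch rather than as part of the original argument: setting k′ = k and integrating over k gives an operator that counts particles, and weighting the integrand with the single-particle energy E(k) gives an operator diagonal on the multi-particle states with the total energy as eigenvalue. The symbol $\hat{N}$ is introduced only for this example.

```latex
% Number operator from (6.34) with k' = k, integrated over k
\hat{N} \equiv \int d^3k\; a^\dagger(k)\,a(k), \qquad
\hat{N}\,|k_1..k_N\rangle
   = \sum_{r=1}^{N}\int d^3k\;\delta^3(k-k_r)\,|k_1..k_{r-1}\,k\,k_{r+1}..k_N\rangle
   = N\,|k_1..k_N\rangle

% Same computation with a weight E(k)
\int d^3k\; E(k)\,a^\dagger(k)\,a(k)\,|k_1..k_N\rangle
   = \Big(\sum_{r=1}^{N} E(k_r)\Big)\,|k_1..k_N\rangle
```

Both results hold for bosons and fermions alike, since the sign factors $(\pm)^{r-1}$ cancel in (6.34).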
Next we derive the behavior of the creation and annihilation operators under
Lorentz transformations. Recall that with non-covariant normalization of the states,
 
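$$
U(\Lambda)|k, k_1, ...\rangle = \sqrt{\frac{E(\Lambda k)}{E(k)}}\, \prod_i \sqrt{\frac{E(\Lambda k_i)}{E(k_i)}}\; |\Lambda k, \Lambda k_1, ...\rangle
\tag{6.39}
$$

This may alternatively be written

$$
\begin{aligned}
U(\Lambda)\, a^\dagger(k)\, |k_1 ..\rangle
&= \sqrt{\frac{E(\Lambda k)}{E(k)}}\; a^\dagger(\Lambda k)\, \prod_i \sqrt{\frac{E(\Lambda k_i)}{E(k_i)}}\; |\Lambda k_1, ..\rangle \\
&= \sqrt{\frac{E(\Lambda k)}{E(k)}}\; a^\dagger(\Lambda k)\, U(\Lambda)\, |k_1, ..\rangle
\end{aligned}
\tag{6.40}
$$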

As the ket in (6.40) is arbitrary, we have the operator relation

$$
U(\Lambda)\, a^\dagger(k) = \sqrt{\frac{E(\Lambda k)}{E(k)}}\; a^\dagger(\Lambda k)\, U(\Lambda)
\tag{6.41}
$$

Multiplying (6.41) on the right by U†(Λ) and using unitarity of the U(Λ),

$$
U(\Lambda)\, a^\dagger(k)\, U^\dagger(\Lambda) = \sqrt{\frac{E(\Lambda k)}{E(k)}}\; a^\dagger(\Lambda k)
\tag{6.42}
$$

with an exactly similar equation (by hermitian conjugation) for a(k).
Remembering that the states $|k_1, ..\rangle$ are eigenstates of the free Hamiltonian
$H_0 = P^{(0)}_0$ and the free momentum operator $P^{(0)}_i$, i = 1, 2, 3,

$$
P^{(0)}_\mu |k, k_1, ..\rangle = (k_\mu + k_{1\mu} + ..)\, |k, k_1, ..\rangle
\tag{6.43}
$$

we have

$$
P^{(0)}_\mu\, a^\dagger(k)\, |k_1, ..\rangle = k_\mu\, a^\dagger(k)\, |k_1, ..\rangle + a^\dagger(k)\, P^{(0)}_\mu\, |k_1, ..\rangle
\tag{6.44}
$$

Rearranging (6.44), and removing the arbitrary ket,

$$
[P^{(0)}_\mu, a^\dagger(k)] = k_\mu\, a^\dagger(k)
\tag{6.45}
$$

By taking the hermitian conjugate of (6.45),

$$
[P^{(0)}_\mu, a(k)] = -k_\mu\, a(k)
\tag{6.46}
$$

We emphasized above that the creation–annihilation formalism provides a convenient,
as it were automated, mechanism for ensuring that the states in the theory are
properly symmetrized (or, for fermions, antisymmetrized). As long as the fundamental
commutation relations (6.37, 6.38) hold, the multi-particle states obtained by applying
any number of creation operators to the vacuum will have the right symmetry under
particle exchange. Even more important, however, is the fact that the creation–
annihilation operator formalism affords a compact notation for writing multi-particle
matrix elements in a way which exposes the connectedness structure with great clarity.
To see this, first observe that an arbitrary operator H acting in the Fock space of multi-
particle states may be expanded as a series in multinomials of a, a†:

 
$$
H = \sum_{M,M'} \frac{1}{M!\,M'!} \int d^3k'_1 ..d^3k'_{M'}\; d^3k_1 ..d^3k_M\;
h_{M'M}(k'_1, ..k'_{M'}, k_1, ..k_M)\; a^\dagger(k'_1) ..a^\dagger(k'_{M'})\, a(k_1) ..a(k_M)
\tag{6.47}
$$

The coefficient functions $h_{M'M}$ encode the complete set of multi-particle matrix
elements of the Fock space operator H. If $h_{M'M} = 0$ for all M, M′ ≠ 0, we call H a
“c-number”: in other words, it commutes with all operators in the theory. Otherwise,
H is a “q-number”. The entire operatorial structure of the theory can thereby be
encoded by particle creation and annihilation operators.
The proof of the above assertion proceeds by induction. First, determine $h_{11}$ by
computing the matrix elements of H between two single-particle states:

$$
\begin{aligned}
\langle k'|H|k\rangle &= \int d^3k'_1\, d^3k_1\; h_{11}(k'_1, k_1)\, \langle k'|a^\dagger(k'_1)\, a(k_1)|k\rangle \\
&= \int d^3k'_1\, d^3k_1\; h_{11}(k'_1, k_1)\, \delta^3(k'_1 - k')\, \delta^3(k - k_1) \\
&= h_{11}(k', k)
\end{aligned}
\tag{6.48}
$$

Similarly, $\langle k'_1 k'_2|H|k_1 k_2\rangle = h_{22}(k'_1, k'_2, k_1, k_2)$ + terms involving $h_{11}\,\delta^3(..)$. Having
determined $h_{11}$ in the preceding step, this fixes $h_{22}$, and so on. We may now state the
critical result which validates the importance of an expansion in terms of creation and
annihilation operators:
Theorem 6.2

$$
\langle k'_1, k'_2, ..k'_{N'}|H^c|k_1, k_2, ..k_N\rangle = h_{N'N}(k'_1, ..k'_{N'}, k_1, ..k_N)
$$

This remarkable theorem shows that the expansion (6.47) directly yields the connected
matrix elements of the Hamiltonian in terms of the expansion functions $h_{N'N}$. The
proof is simple. Consider a general (N, N′) matrix element of H, expressed graphically
as shown in Fig. 6.8. The first terms on the right-hand side indicate possible
disconnected contributions: in particular, each term here must contain at least two
δ-functions (of course, some of the terms, for example the first term on the right-hand
side, only appear if N = N′). The last term on the right is the fully connected piece,
with only a single overall δ-function of momentum conservation. The terms in the
expansion $H = \sum_{MM'} \ldots$ with M < N or M′ < N′ do not contain enough creation or
annihilation operators to affect all the particles in the final and initial states: thus such
terms contribute only to the disconnected part. The terms with M > N or M′ > N′
give a vanishing contribution, by attempting to destroy more particles than are present
in the initial state or create more than are present in the final state. So the only part
of H to contribute to the fully connected matrix element is $h_{N'N}$! Q.E.D.
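
For the two-particle sector the statement can be verified by hand. The sketch below assumes bosons, an operator H with only the $h_{11}$ and $h_{22}$ terms of (6.47) present, and a coefficient $h_{22}$ that is symmetric under exchange of its two primed and of its two unprimed arguments; these restrictions are made here for brevity.

```latex
% h_{11} term: use (6.34), then the normalization (6.31) for N = 2
\langle k'_1k'_2|\int d^3p'\,d^3p\; h_{11}(p',p)\,a^\dagger(p')\,a(p)\,|k_1k_2\rangle
 = h_{11}(k'_1,k_1)\,\delta^3(k'_2-k_2) + h_{11}(k'_1,k_2)\,\delta^3(k'_2-k_1)
 + h_{11}(k'_2,k_1)\,\delta^3(k'_1-k_2) + h_{11}(k'_2,k_2)\,\delta^3(k'_1-k_1)

% h_{22} term: both annihilation (creation) operators must act on the ket (bra)
\langle k'_1k'_2|\frac{1}{2!\,2!}\int d^3p'_1\, d^3p'_2\, d^3p_1\, d^3p_2\;
   h_{22}(p'_1,p'_2,p_1,p_2)\,a^\dagger(p'_1)\,a^\dagger(p'_2)\,a(p_1)\,a(p_2)\,|k_1k_2\rangle
 = h_{22}(k'_1,k'_2,k_1,k_2)
```

Comparison with (6.12) identifies $H^c_{k',k} = h_{11}(k',k)$ and $H^c_{k'_1k'_2,k_1k_2} = h_{22}(k'_1,k'_2,k_1,k_2)$, as Theorem 6.2 asserts: the $h_{11}$ contributions all carry an extra δ-function and are therefore disconnected.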
The theorem of the preceding section requiring $H^c$ to contain at most a single
delta-function may now be applied directly to the coefficient functions $h_{N'N}$ to obtain
the following immediate corollary of theorem 6.2:

Fig. 6.8 Connectedness structure of Hamiltonian matrix elements.

Corollary 6.3 $h_{N'N}(k'_1, k'_2, ..\, k_1, k_2 ..) = \delta^3(k'_1 + k'_2 + .. - k_1 - k_2 - ..) \cdot f(k'_1, k'_2, ..)$
⟹ clustering property of the S-matrix, provided f is a smooth function of momenta.

6.4 Constructing a relativistic, clustering theory


Although the general expansion (6.47) of a Hamiltonian operator in terms of creation
and annihilation operators simplifies the task of enforcing proper clustering properties
of our theory, the requirements of Lorentz-invariance of the resultant interactions are
clearly far from obviously satisfied in such a framework: the ubiquitous spatial momen-
tum integrals make the behavior of the theory under general Lorentz transformations
quite obscure. On the other hand, in Section 5.5 we saw that relativistic invariance
follows immediately if the interaction Hamiltonian is constructed as the spatial
integral of an ultralocal scalar field. But do such theories satisfy the desired cluster
decomposition properties? In this section we shall see that interaction Hamiltonians
built as polynomials out of a basic set of canonical local fields linear in creation
and annihilation operators do indeed have the right smoothness in momentum space
to guarantee clustering of the resultant S-matrix. We shall assume throughout that
we are dealing with spinless bosons, as the complications of spin are an unnecessary
distraction from our immediate task, which is to understand the intimate way in which
relativistic and locality considerations intertwine in the construction of quantum field
theory.
Recall the lesson of Section 5.5: we wish to construct an interaction energy density
$H_{int}(x)$ which is a local⁸ scalar field, i.e., with the properties

$$
i[P^{(0)}_\mu, H_{int}(x)] = \frac{\partial}{\partial x^\mu} H_{int}(x)
\tag{6.49}
$$

8 Strictly speaking, ultralocal, with no spacetime-derivatives appearing in the commutator contact terms
(cf. the discussion in Section 5.5).

$$
U(\Lambda)\, H_{int}(x)\, U^\dagger(\Lambda) = H_{int}(\Lambda x)
\tag{6.50}
$$

$$
[H_{int}(x_1), H_{int}(x_2)] = 0, \qquad (x_1 - x_2)^2 < 0
\tag{6.51}
$$

We also know that Hint may be written as an expansion (6.47) in destruction and creation
operators a, a†. Let us therefore try to construct an object from a single a (or
a† ) which satisfies the above conditions. Then we can rely on the fact that local scalar
fields form an algebraic “ring”: the product of a set of local scalar fields is again a
local scalar field. To prove this, suppose A(x), B(x), C(x), . . . comprise a set of local
scalar fields. We want to show that A(x)B(x)C(x) . . . satisfies (6.49, 6.50, 6.51).

1. To check (6.49), observe that

$$ \begin{aligned}
i[P_\mu^{(0)}, A(x)B(x)C(x)\cdots] &= i[P_\mu^{(0)}, A(x)]\,B(x)C(x)\cdots + A(x)\,i[P_\mu^{(0)}, B(x)]\,C(x)\cdots + \cdots \\
&= \frac{\partial A(x)}{\partial x^\mu}B(x)C(x)\cdots + A(x)\frac{\partial B(x)}{\partial x^\mu}C(x)\cdots + \cdots \\
&= \frac{\partial}{\partial x^\mu}\big(A(x)B(x)C(x)\cdots\big)
\end{aligned} $$

We remind the reader that all operators considered here are in the interaction
picture: taking $\mu = 0$ we see that $P_0^{(0)} = H_0$, the free Hamiltonian, generates the
time development for all fields in the theory.
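2. From unitarity of $U(\Lambda)$, $U^\dagger(\Lambda)U(\Lambda) = 1$:

$$ \begin{aligned}
U(\Lambda)A(x)B(x)C(x)\cdots U^\dagger(\Lambda) &= U(\Lambda)A(x)U^\dagger(\Lambda)\,U(\Lambda)B(x)U^\dagger(\Lambda)\cdots \\
&= A(\Lambda x)B(\Lambda x)C(\Lambda x)\cdots
\end{aligned} $$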

3. If the points $x_1$, $x_2$ are space-like separated, then each of the operators $A(x_1)$,
$B(x_1)$, $C(x_1)$, ... commutes with each of $A(x_2)$, $B(x_2)$, $C(x_2)$, ..., by (6.51), whence

$$ [A(x_1)B(x_1)C(x_1)\cdots,\; A(x_2)B(x_2)C(x_2)\cdots] = 0 $$

For the time being imagine that we are dealing with a single spinless boson, with
$a(\vec k)$ the destruction operator for a particle of spatial momentum $\vec k$.⁹ The most general
operator linear in the destruction operator must take the form:¹⁰

$$ \phi^{(+)}(x) = \int d^3k\, f(x; \vec k)\, a(\vec k) \tag{6.52} $$

9 In this section we restore the arrows to distinguish spatial from four-momenta.


10 The “plus” superscript anticipates the fact, shortly to emerge, that the time-dependence of this operator
involves positive frequencies, which by convention means that it is a linear combination of plane waves of
the form $e^{+i(\vec k\cdot\vec r - \omega t)}$, $\omega > 0$.

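First, impose the translation condition (6.49):

$$ \begin{aligned}
i[P_\mu^{(0)}, \phi^{(+)}(x)] &= i\int d^3k\, f(x; \vec k)\,[P_\mu^{(0)}, a(\vec k)] \\
&= -i\int d^3k\, k_\mu f(x; \vec k)\, a(\vec k) \\
&= \int d^3k\, \partial_\mu f(x; \vec k)\, a(\vec k)
\end{aligned} \tag{6.53} $$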

which implies

$$ \partial_\mu f(x; \vec k) = -ik_\mu f(x; \vec k) \tag{6.54} $$

Solving this differential equation, we find

$$ f(x; \vec k) = f(\vec k)\, e^{-ik\cdot x} \tag{6.55} $$

Note that the $k$ in the exponential factor in (6.55) is the four-vector momentum, with
$k^0 \equiv E(k) \equiv \sqrt{\vec k^2 + m^2}$: such a four-vector momentum is said to be “on-mass-shell”.
The result is that $f(x; \vec k)$, and therefore $\phi^{(+)}(x)$, necessarily satisfy the Klein–Gordon
equation (Klein, 1926; Gordon, 1926)

$$ (\Box + m^2)\phi^{(+)}(x) = 0 \tag{6.56} $$

It is apparent from the preceding argument that the appearance of a relativistically
invariant wave equation for our field is a direct consequence of the imposition of the
spacetime translation property.
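The step from (6.55) to (6.56) can be spelled out in a few lines. The following is a short worked check, assuming the metric signature $(+,-,-,-)$ implied by the space-like condition $(x_1 - x_2)^2 < 0$ in (6.51), and reading $\Box$ as $\partial^\mu\partial_\mu$:

```latex
\begin{aligned}
\partial_\mu\, e^{-ik\cdot x} &= -ik_\mu\, e^{-ik\cdot x}, \\
\Box\, e^{-ik\cdot x} \equiv \partial^\mu\partial_\mu\, e^{-ik\cdot x}
  &= -k^\mu k_\mu\, e^{-ik\cdot x}
   = -\big(E(k)^2 - \vec k^{\,2}\big)\, e^{-ik\cdot x}
   = -m^2\, e^{-ik\cdot x}, \\
\Rightarrow\quad (\Box + m^2)\, f(\vec k)\, e^{-ik\cdot x} &= 0 .
\end{aligned}
```

Since every plane wave in the superposition (6.52) is annihilated separately, the same holds for $\phi^{(+)}(x)$ itself.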
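Next we must impose the Lorentz transformation property (6.50) which defines a
scalar field:

$$ \begin{aligned}
U(\Lambda)\phi^{(+)}(x)U^\dagger(\Lambda) &= \int d^3k\, f(\vec k)\, e^{-ik\cdot x}\sqrt{\frac{E(\Lambda k)}{E(k)}}\; a(\Lambda k) \\
&= \int \frac{d^3k}{E(k)}\, f(\vec k)\, e^{-ik\cdot x}\sqrt{E(\Lambda k)E(k)}\; a(\Lambda k) \\
&= \int \frac{d^3k}{E(k)}\, f(\vec k)\, e^{-i(\Lambda k\cdot\Lambda x)}\sqrt{E(\Lambda k)E(k)}\; a(\Lambda k)
\end{aligned} \tag{6.57} $$

On the other hand,

$$ \begin{aligned}
U(\Lambda)\phi^{(+)}(x)U^\dagger(\Lambda) &= \phi^{(+)}(\Lambda x) \\
&= \int \frac{d^3k}{E(k)}\, E(k) f(\vec k)\, e^{-ik\cdot\Lambda x}\, a(\vec k) \\
&= \int \frac{d^3k}{E(k)}\, E(\Lambda k) f(\Lambda k)\, e^{-i(\Lambda k\cdot\Lambda x)}\, a(\Lambda k)
\end{aligned} \tag{6.58} $$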

Comparing the right-hand sides of (6.57) and (6.58), we conclude that

$$ f(\Lambda k)E(\Lambda k) = f(\vec k)\sqrt{E(\Lambda k)E(k)} \tag{6.59} $$

which implies that $f(\vec k) \propto \frac{1}{\sqrt{E(k)}}$. The conventional normalization is to take

$$ f(\vec k) = \frac{1}{(2\pi)^{3/2}}\frac{1}{\sqrt{2E(k)}} \tag{6.60} $$

so that we obtain finally the desired result (unique, up to an overall constant!):

$$ \phi^{(+)}(x) = \frac{1}{(2\pi)^{3/2}}\int \frac{d^3k}{\sqrt{2E(k)}}\, e^{-ik\cdot x}\, a(\vec k) \tag{6.61} $$
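The two ingredients of this argument, the invariance of the measure $d^3k/E(k)$ used in (6.58) and the condition (6.59) on $f$, can be checked numerically. The following is a minimal sketch, assuming natural units and a boost along the z-axis; the function names and the sample values of the momentum, mass and boost velocity are illustrative choices, not taken from the text.

```python
import math

def energy(kx, ky, kz, m):
    """On-shell energy E(k) = sqrt(k^2 + m^2) in natural units."""
    return math.sqrt(kx * kx + ky * ky + kz * kz + m * m)

def boost_z(kx, ky, kz, m, beta):
    """Spatial momentum after a boost with velocity beta along z."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return kx, ky, gamma * (kz + beta * energy(kx, ky, kz, m))

def f_trial(kx, ky, kz, m):
    """Trial coefficient function f(k) = 1/sqrt(E(k)), up to a constant."""
    return 1.0 / math.sqrt(energy(kx, ky, kz, m))

def residual_6_59(k, m, beta, func):
    """Left side minus right side of (6.59) for a given trial function."""
    kb = boost_z(*k, m, beta)
    e, eb = energy(*k, m), energy(*kb, m)
    return func(*kb, m) * eb - func(*k, m) * math.sqrt(eb * e)

def jacobian_ratio(k, m, beta, h=1e-6):
    """(d k'_z / d k_z) divided by E(Lambda k)/E(k).

    Only k_z changes under a z-boost, so a value of 1 means d^3k/E(k)
    is invariant. The derivative is a central finite difference."""
    kx, ky, kz = k
    up = boost_z(kx, ky, kz + h, m, beta)[2]
    down = boost_z(kx, ky, kz - h, m, beta)[2]
    kb = boost_z(kx, ky, kz, m, beta)
    return ((up - down) / (2.0 * h)) / (energy(*kb, m) / energy(kx, ky, kz, m))

k, m, beta = (0.3, -0.4, 1.2), 1.0, 0.6
res_good = residual_6_59(k, m, beta, f_trial)
# A constant f is not a solution of (6.59):
res_bad = residual_6_59(k, m, beta, lambda kx, ky, kz, mass: 1.0)
ratio = jacobian_ratio(k, m, beta)
print(res_good, res_bad, ratio)
```

The first residual vanishes to rounding error, the second does not, and the Jacobian ratio is 1 to the accuracy of the finite difference.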

$\phi^{(+)}(x)$ and its hermitian conjugate $\phi^{(+)\dagger}$ (given by an analogous formula involving
the creation operator $a^\dagger$) will be the basic ingredients out of which we shall construct
our (necessarily hermitian) interaction Hamiltonians.
We must still address the question of locality, (6.51). For bosons we have the commutation
relations $[a(\vec k), a(\vec k')] = [a^\dagger(\vec k), a^\dagger(\vec k')] = 0$ among creation and destruction
operators separately, so automatically

$$ [\phi^{(+)}(x), \phi^{(+)}(y)] = 0 = [\phi^{(+)\dagger}(x), \phi^{(+)\dagger}(y)] \tag{6.62} $$

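On the other hand, the interaction Hamiltonian must be hermitian, and must therefore
be built out of both $\phi^{(+)}$ and $\phi^{(+)\dagger}$. Thus, we must also check the commutation relation

$$ \begin{aligned}
[\phi^{(+)}(x), \phi^{(+)\dagger}(y)] &= \frac{1}{(2\pi)^3}\int \frac{d^3k\, d^3k'}{\sqrt{2E(k)}\sqrt{2E(k')}}\, e^{-i(k\cdot x - k'\cdot y)}\, \delta^3(\vec k - \vec k') \\
&= \frac{1}{(2\pi)^3}\int \frac{d^3k}{2E(k)}\, e^{-ik\cdot(x-y)} \\
&\equiv \Delta_+(x - y; m)
\end{aligned} \tag{6.63} $$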

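Here $\Delta_+$ is a c-number Lorentz-invariant function of $x - y$ (and the mass $m$, through
$E(k) = \sqrt{\vec k^2 + m^2}$) as

$$ \begin{aligned}
\Delta_+(\Lambda x - \Lambda y; m) &= [\phi^{(+)}(\Lambda x), \phi^{(+)\dagger}(\Lambda y)] \\
&= U(\Lambda)[\phi^{(+)}(x), \phi^{(+)\dagger}(y)]U^\dagger(\Lambda) \\
&= U(\Lambda)\Delta_+(x - y; m)U^\dagger(\Lambda) \\
&= \Delta_+(x - y; m)
\end{aligned} $$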

where the last line follows because Δ+ is a c-number (no creation or annihilation
operators), and therefore commutes with all operators in the theory, in particular with
the U (Λ). The frame-independence of this invariant function means that for z ≡ x − y
space-like we can evaluate it by choosing a frame in which the time component of $z$
vanishes, with $|\vec z| = \sqrt{-z_\mu z^\mu}$. Then

$$ \Delta_+(z; m) = \frac{1}{(2\pi)^3}\int_0^\infty \frac{4\pi k^2\, dk}{2E(k)}\, \frac{\sin(k|\vec z|)}{k|\vec z|} \tag{6.64} $$

where we have used the result that the average over directions of $\vec k$ of $e^{i\vec k\cdot\vec z}$ is
$\frac{\sin(k|\vec z|)}{k|\vec z|}$.
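The angular average quoted for (6.64) follows from a one-line integral. As a brief worked step, with $u \equiv \cos\theta$ the cosine of the angle between $\vec k$ and $\vec z$ (a variable introduced here for illustration):

```latex
\frac{1}{4\pi}\int d\Omega_{\vec k}\; e^{i\vec k\cdot\vec z}
  = \frac{1}{2}\int_{-1}^{1} du\; e^{ik|\vec z|u}
  = \frac{e^{ik|\vec z|} - e^{-ik|\vec z|}}{2ik|\vec z|}
  = \frac{\sin(k|\vec z|)}{k|\vec z|} .
```

The factor $4\pi k^2\,dk$ in (6.64) is then the remaining radial measure of $d^3k$.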
Changing variables in the radial momentum integral to $\beta \equiv k/m$,

$$ \begin{aligned}
\Delta_+(z; m) &= \frac{m^2}{4\pi^2}\int_0^\infty \frac{\beta\, d\beta}{\sqrt{\beta^2 + 1}}\, \frac{\sin(\beta m|\vec z|)}{m|\vec z|} \\
&= \frac{m^2}{4\pi^2}\, \frac{1}{m\sqrt{-z^2}}\, K_1(m\sqrt{-z^2})
\end{aligned} \tag{6.65} $$

where $K_1$ is a modified Bessel function. In particular, the commutator (6.63) does
not vanish when the separation $z = x - y$ is space-like. Instead, the commutator falls
off exponentially with a scale determined by the Compton wavelength $m^{-1}$ of the
associated particle (see Fig. 6.9):¹¹

Fig. 6.9 Behavior of the invariant function $\Delta_+$, plotted against $z^2$ across the space-like
and time-like regions (real part shown for $z^2 > 0$).

11 In the time-like region with $z^2 > 0$, $\Delta_+$ can be shown to take the form

$$ \Delta_+(z; m) = \frac{m}{8\pi\sqrt{z^2}}\big(N_1(m\sqrt{z^2}) \pm iJ_1(m\sqrt{z^2})\big), \quad \operatorname{sgn}(z^0) = \pm $$

which displays oscillatory behavior, in contrast to the exponential decrease of $K_1$ in the space-like region
(Fig. 6.9).

$$ \Delta_+(z; m) \sim \frac{m^2}{4\sqrt{2}}\,(\pi m\sqrt{-z^2})^{-3/2}\, e^{-m\sqrt{-z^2}}, \quad -z^2 \to \infty \tag{6.66} $$
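The exact space-like form (6.65) and its asymptotic limit (6.66) can be compared numerically. The sketch below is illustrative only: it evaluates $K_1$ from the standard integral representation $K_1(x) = \int_0^\infty e^{-x\cosh t}\cosh t\,dt$ with a simple trapezoid rule (a choice made here to avoid external libraries), works in natural units, and uses hypothetical function names.

```python
import math

def bessel_k1(x, t_max=10.0, steps=20000):
    """Modified Bessel function K_1(x) for x > 0 via the trapezoid rule on
    K_1(x) = int_0^inf exp(-x cosh t) cosh t dt.

    The integrand is negligible at t_max, so that endpoint is dropped."""
    h = t_max / steps
    total = 0.5 * math.exp(-x)  # half-weight endpoint at t = 0
    for i in range(1, steps):
        t = i * h
        total += math.exp(-x * math.cosh(t)) * math.cosh(t)
    return total * h

def delta_plus_spacelike(r, m):
    """Exact form (6.65), with r = sqrt(-z^2) the space-like separation."""
    return m * m / (4.0 * math.pi ** 2) * bessel_k1(m * r) / (m * r)

def delta_plus_asymptotic(r, m):
    """Large-separation form (6.66)."""
    x = m * r
    return m * m / (4.0 * math.sqrt(2.0)) * (math.pi * x) ** -1.5 * math.exp(-x)

for r in (1.0, 5.0, 10.0, 20.0):
    exact = delta_plus_spacelike(r, 1.0)
    print(r, exact, exact / delta_plus_asymptotic(r, 1.0))
```

The printed ratio approaches 1 as the separation grows to many Compton wavelengths, while at $m\sqrt{-z^2} \sim 1$ the asymptotic form is only a rough guide.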
Unfortunately for the Lorentz-invariance of the theory, exponential falloff is not
good enough here: a non-zero commutator will result in non-Lorentz-invariant con-
tributions to the S-matrix, as the argument following (5.75) in Section 5.5 makes
clear. An important example of such a failure is provided by scalar number-conserving
interaction Hamiltonians. Consider a theory in which the interaction Hamiltonian has
an expansion of the form (6.47) with the only non-zero term having $M = M' = 2$. The
perturbative expansion of the S-matrix then yields amplitudes for processes which can
arise from a succession of 2-2 particle scatterings. Writing the interaction Hamiltonian
as a spatial integral of an hermitian scalar density, the interaction energy density in
the simplest such theory must then take the form

$$ \mathcal{H}_{\rm int}(x) = \lambda\,(\phi^{(+)\dagger}(x))^2(\phi^{(+)}(x))^2 \tag{6.67} $$

where λ is a real “coupling” constant parameterizing the strength of the interaction.


In this theory the number of particles is strictly conserved, as the interaction operator
(6.67) exactly commutes with the number operator

$$ N \equiv \int a^\dagger(\vec k)a(\vec k)\, d^3k \tag{6.68} $$

Unfortunately, a number conserving interaction of this type (although itself a Lorentz


scalar field) violates locality, and hence the Lorentz-invariance of the S-matrix. A
straightforward calculation of the commutator of Hint at two space-like separated
points gives

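$$ \begin{aligned}
[\mathcal{H}_{\rm int}(x), \mathcal{H}_{\rm int}(y)] = {}& 2\lambda^2\Delta_+(x - y; m) \\
& \times \{(\phi^{(+)\dagger}(x))^2\,(\phi^{(+)}(x)\phi^{(+)\dagger}(y) + \phi^{(+)\dagger}(y)\phi^{(+)}(x))\,(\phi^{(+)}(y))^2\} - (x \leftrightarrow y)
\end{aligned} \tag{6.69} $$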

Neither the c-number function $\Delta_+(x - y; m)$ nor the associated operator (cubic in
$a$'s and $a^\dagger$'s) vanishes for $x - y$ space-like, so the terms of second and higher order
in the S-matrix expansion (5.73), which contain time-ordering instructions sensitive
to frame for space-like separated points, will necessarily contain Lorentz-invariance
violating terms.
The above example shows that we are likely to encounter violations of Lorentz-
invariance if we construct the interaction Hamiltonian carelessly out of arbitrary
combinations of φ(+) and φ(+)† . Physically, Lorentz-invariance really requires the
inclusion of both destruction and creation parts in the same local field, as we saw
in the description of pion exchange in Chapter 3 (recall the discussion of the process
pictured in Fig. 3.1). There the existence of a process of $\pi^+$ exchange between nucleons
initiated and terminated by local interaction events forced us to include a particle of
opposite quantum numbers and equal mass: the $\pi^-$, in the example shown. In the
general situation the particle and antiparticle are distinct (e.g., have opposite values
for any conserved additive quantum number, such as electric charge), so we really
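have separate and independent destruction $a^c(\vec k)$ and creation $a^{c\dagger}(\vec k)$ operators for the
antiparticle, satisfying (for bosons)

$$ [a^c(\vec k), a^{c\dagger}(\vec k')] = \delta^3(\vec k - \vec k') \tag{6.70} $$

$$ [a^c(\vec k), a(\vec k')] = [a^c(\vec k), a^\dagger(\vec k')] = 0 \tag{6.71} $$

$$ [a^c(\vec k), a^c(\vec k')] = 0 \tag{6.72} $$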
etc. Although the heuristic argument given in Chapter 3 implies equality of particle
and antiparticle masses, we temporarily allow the antiparticle to have an independent
mass $m_c$. Now destruction of a particle and creation of the antiparticle at a spacetime
point $x$ must be treated on the same footing (as they both describe the same physical
event in different frames), which suggests that we write a single field (which we shall
call a canonical scalar field) containing both terms:

$$ \phi(x) = \frac{1}{(2\pi)^{3/2}}\int \frac{d^3k}{\sqrt{2E(k)}}\,\big(a(\vec k)e^{-ik\cdot x} + a^{c\dagger}(\vec k)e^{ik\cdot x}\big) \tag{6.73} $$

$$ \equiv \phi^{(+)}(x) + \phi^{(-)}(x) \tag{6.74} $$

We note here in passing that $\phi(x)$, like its positive frequency part $\phi^{(+)}(x)$, automatically
satisfies the Klein–Gordon equation (6.56): $(\Box + m^2)\phi(x) = 0$, as only on-mass-shell
four-momenta occur in the plane-wave exponentials $e^{\pm ik\cdot x}$.
The basic commutation relations (6.70–6.72) immediately imply

$$ [\phi(x), \phi(y)] = [\phi(x)^\dagger, \phi(y)^\dagger] = 0 \tag{6.75} $$

for all $x$, $y$ (i.e., not necessarily space-like separated). The necessity to include both
$\phi$ and $\phi^\dagger$ in an hermitian Hamiltonian requires that we also check for locality in the
commutator

$$ \begin{aligned}
[\phi(x), \phi^\dagger(y)] &= \frac{1}{(2\pi)^3}\int \frac{d^3k}{2E(k)}\,\big(e^{-ik\cdot(x-y)} - e^{ik\cdot(x-y)}\big) \\
&= \Delta_+(x - y; m) - \Delta_+(y - x; m_c)
\end{aligned} \tag{6.76} $$

If the particle and antiparticle masses $m$, $m_c$ differ, this commutator will still fail to
vanish even for $x - y$ space-like. On the other hand, exact equality of particle and
antiparticle masses $m = m_c$ implies (since $\Delta_+(z; m)$ in (6.65) is even in $z$ for $z$
space-like) that the full invariant function

$$ \Delta(z; m) \equiv \Delta_+(z; m) - \Delta_+(-z; m) \tag{6.77} $$

vanishes for space-like separation $z$, and we have the desired locality
 
$$ [\phi(x), \phi^\dagger(y)] = 0 \quad \text{for } (x - y)^2 < 0 \tag{6.78} $$
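The role of the mass equality in (6.76) to (6.78) can be made concrete with a small numerical sketch. It assumes a space-like separation of invariant length $r = \sqrt{-z^2}$, natural units, the closed form (6.65) for $\Delta_+$, and the evenness of $\Delta_+$ in $z$ for space-like $z$; the helper names and the sample masses are hypothetical, and $K_1$ is again evaluated by a plain trapezoid rule on its integral representation.

```python
import math

def bessel_k1(x, t_max=10.0, steps=20000):
    """K_1(x) = int_0^inf exp(-x cosh t) cosh t dt, trapezoid rule (x > 0)."""
    h = t_max / steps
    total = 0.5 * math.exp(-x)  # half-weight endpoint at t = 0
    for i in range(1, steps):
        t = i * h
        total += math.exp(-x * math.cosh(t)) * math.cosh(t)
    return total * h

def delta_plus(r, m):
    """Delta_+ at space-like separation r = sqrt(-z^2), from (6.65)."""
    return m / (4.0 * math.pi ** 2 * r) * bessel_k1(m * r)

def field_commutator(r, m, m_c):
    """[phi(x), phi^dagger(y)] of (6.76) for space-like x - y.

    Uses Delta_+(-z; m_c) = Delta_+(z; m_c) for space-like z."""
    return delta_plus(r, m) - delta_plus(r, m_c)

equal = field_commutator(2.0, 1.0, 1.0)    # m = m_c: commutator vanishes
unequal = field_commutator(2.0, 1.0, 1.1)  # m != m_c: locality fails
print(equal, unequal)
```

With equal masses the two terms cancel identically; with a 10% mass mismatch a non-zero commutator survives at two Compton wavelengths.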

Of course, as a consequence of (6.62), we already have [φ(x), φ(y)] = [φ† (x), φ† (y)] = 0
for all x, y. The fact that strict locality of our fields implies an exact equality of
particle and antiparticle masses is a consequence of the famous “TCP theorem” of local
quantum field theory, which we shall discuss further in Section 13.4. For the proton
and antiproton, the equality has been established experimentally to nine significant
figures.
With the local fields φ(x) and φ† (x) in hand we may multiply them freely to obtain
an hermitian scalar interaction Hamiltonian density guaranteed to have the right
locality and Lorentz transformation properties, thereby ensuring Lorentz-invariance
of the scattering amplitudes of the theory.¹² A famous example is “phi-4” theory, with
interaction

$$ \mathcal{H}_{\rm int}(x) = \lambda\,\phi^\dagger(x)^2\phi(x)^2 \tag{6.79} $$

Note the difference with our previous number-conserving interaction Hamiltonian


(6.67): if we break up each of the fields in (6.79) into creation and annihilation
parts, we find terms with unequal numbers of creation and annihilation operators,
so the interaction induces processes in which the total number of particles in the
state changes, which is surely one of the most characteristic features of relativistic
quantum field theories. However, one easily sees that any additive quantum number
corresponding to an operator of the following form, counting the difference between
the number of particles and antiparticles,

$$ Q = e\int \big(a^\dagger(\vec k)a(\vec k) - a^{c\dagger}(\vec k)a^c(\vec k)\big)\, d^3k \tag{6.80} $$

commutes with Hint (x), and is thus exactly conserved by the dynamics of this theory
(see also Problem 3).
In certain cases a particle may have identical quantum numbers to the antiparticle
(e.g., photons, $\pi^0$, $\rho^0$, ..): in other words, $a = a^c$ and the canonical local field $\phi$
representing this particle is hermitian, $\phi(x) = \phi^\dagger(x)$. Such particles (and their associated
canonical field) are called “self-conjugate”. A “real” version of the phi-4 theory (6.79)
describing the self-interactions of a self-conjugate neutral boson would therefore take
the form

$$ \mathcal{H}_{\rm int}(x) = \lambda\,\phi(x)^4 \tag{6.81} $$

With this lengthy digression into the requirements of locality (as a way to ensure
Lorentz-invariance of the S-matrix) concluded, we may finally return to examine
the constraints due to cluster decomposition. Recall that clustering required the
interaction part of the Hamiltonian to take the form

12 Of course, we must still ensure ultralocality of the equal-time commutators, as discussed in Section
5.5. In particular, derivatively-coupled theories will in general produce Schwinger terms in the commutator,
requiring additional non-covariant “seagull” terms in the interaction to correct the resultant defects in
Lorentz-invariance. We shall see in Chapter 12 how the Lagrangian formalism automatically solves the
problem of generating the appropriate seagull terms.

 
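$$ \begin{aligned}
V &= \sum_{M,M'}\frac{1}{M!M'!}\int h_{M,M'}(\vec k'_1, ..\vec k'_{M'}, \vec k_1, ..\vec k_M)\;
 a^\dagger(\vec k'_1)..a^\dagger(\vec k'_{M'})\,a(\vec k_1)..a(\vec k_M)\,\prod_i\frac{d^3k_i}{\sqrt{2E(k_i)}}\prod_i\frac{d^3k'_i}{\sqrt{2E(k'_i)}} \\
&= \sum_{M,M'}\frac{1}{M!M'!}\int \delta^3(\vec k'_1 + \vec k'_2 + ..\vec k'_{M'} - \vec k_1 - ..\vec k_M)\;
 f_{MM'}(\vec k'_1..\vec k_M)\,a^\dagger(\vec k'_1)....a(\vec k_M)\,\prod_i\frac{d^3k_i}{\sqrt{2E(k_i)}}\prod_i\frac{d^3k'_i}{\sqrt{2E(k'_i)}}
\end{aligned} \tag{6.82} $$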

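with $f_{MM'}$ a smooth function of momenta. From (6.82) it is apparent that our
interaction Hamiltonian is automatically the spatial integral of an energy density
(introduced as a pure hypothesis in Section 5.5): $V = \int d^3x\,\mathcal{H}_{\rm int}(\vec x)$, where

$$ \mathcal{H}_{\rm int}(\vec x) = \sum_{MM'}\frac{1}{M!M'!}\int \frac{d^3k'_1}{\sqrt{2E(k'_1)}}..\frac{d^3k_M}{\sqrt{2E(k_M)}}
 \cdot f_{MM'}(\vec k'_1, .., \vec k_M)\, e^{-i\vec k'_1\cdot\vec x}a^\dagger(\vec k'_1)..e^{i\vec k_M\cdot\vec x}a(\vec k_M) \tag{6.83} $$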

From the fact that $a(\vec k)$ ($a^\dagger(\vec k)$) removes (resp. adds) a particle of energy $E(k)$ from
any state on which it acts, it follows that

$$ e^{iH_0t}\,a^\dagger(\vec k)\,e^{-iH_0t} = e^{iE(k)t}\,a^\dagger(\vec k) $$

$$ e^{iH_0t}\,a(\vec k)\,e^{-iH_0t} = e^{-iE(k)t}\,a(\vec k) \tag{6.84} $$
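The phase relation (6.84) can be seen explicitly in a single-mode toy model. The sketch below is an illustration under stated assumptions: one momentum mode only, a Fock space truncated at a small occupation number, and $H_0 = E\,a^\dagger a$ for that mode; the numerical values of `E`, `t` and the truncation `N` are arbitrary.

```python
import cmath
import math

N = 6    # Fock-space truncation for a single mode (illustrative assumption)
E = 1.7  # single-particle energy E(k) of that mode (arbitrary value)
t = 0.4  # time (arbitrary value)

# a|n> = sqrt(n)|n-1>, so the only non-zero entries are a[n-1][n] = sqrt(n).
a = [[0j] * N for _ in range(N)]
for n in range(1, N):
    a[n - 1][n] = complex(math.sqrt(n))

# H0 = E * (number operator) is diagonal, so exp(i H0 t) is a diagonal phase.
phase = [cmath.exp(1j * E * n * t) for n in range(N)]

# (e^{i H0 t} a e^{-i H0 t})_{ij} = phase_i * a_ij * conj(phase_j)
evolved = [[phase[i] * a[i][j] * phase[j].conjugate() for j in range(N)]
           for i in range(N)]

# Right-hand side of (6.84): e^{-i E t} a
expected = [[cmath.exp(-1j * E * t) * a[i][j] for j in range(N)]
            for i in range(N)]

max_diff = max(abs(evolved[i][j] - expected[i][j])
               for i in range(N) for j in range(N))
print(max_diff)
```

Because $a$ connects only states whose occupation numbers differ by one, every non-zero matrix element picks up the same phase $e^{-iE t}$, which is the content of (6.84).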

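Thus, transforming (6.83) to the interaction picture, one obtains

$$ \mathcal{H}_{\rm int}(\vec x, t) = \sum_{MM'}\frac{1}{M!M'!}\int \frac{d^3k'_1}{\sqrt{2E(k'_1)}}..\frac{d^3k_M}{\sqrt{2E(k_M)}}\;
 f_{MM'}(\vec k'_1, .., \vec k_M)\, e^{ik'_1\cdot x}a^\dagger(\vec k'_1)..e^{-ik_M\cdot x}a(\vec k_M) \tag{6.85} $$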

where now the exponential factors contain four-vector dot-products. It is now straightforward
to show that the scalar-field transformation property
$U(\Lambda)\mathcal{H}_{\rm int}(x)U^\dagger(\Lambda) = \mathcal{H}_{\rm int}(\Lambda x)$ requires that

$$ f_{MM'}(\Lambda k'_1, ..\Lambda k_M) = f_{MM'}(k'_1, ..k_M) \tag{6.86} $$

Thus the function $f_{MM'}$ should be cooked up from invariant dot-products of the
various four-vector momenta $k_1, ..k'_{M'}$ available.

Clustering requires that the functions $f_{MM'}$ should be smooth functions of
momenta. If we make the very strong assumption that $f_{MM'}$ are analytic, and
therefore expandable in a series around zero momentum, then $\mathcal{H}_{\rm int}$ must be a
series in the scalar fields $\phi^{(+)}, \phi^{(-)}, \phi^{(+)\dagger}, \phi^{(-)\dagger}$ with four-derivatives, if any, coupled
to an overall scalar (all four-vector indices contracted). Building $\mathcal{H}_{\rm int}$ out of the
combinations $\phi = \phi^{(+)} + \phi^{(-)}$ and $\phi^\dagger = \phi^{(+)\dagger} + \phi^{(-)\dagger}$ ensures that the locality
property $[\mathcal{H}_{\rm int}(x), \mathcal{H}_{\rm int}(y)] = 0$, $(x - y)$ space-like, will hold, giving us a Lorentz-invariant
theory.
There is one tricky point here which has perhaps already occurred to the reader:
namely, the form (6.83) in which we wrote the interaction Hamiltonian. In this
expression, all creation operators appear to the left of all destruction operators. This is
called the normal-ordered product. Given any product of fields, ABC.., in which each
field A, B, C.. is a linear combination of creation and annihilation parts, we form the
normal-ordered product $:ABC..:$ by multiplying out the field product in the normal
way and then moving all $a^\dagger$'s to the left and all $a$'s to the right in each of the resultant
terms, ignoring any resultant commutator terms! For example (for a self-conjugate
field):

$$ \begin{aligned}
:\phi(x)\phi(y): &= \;:(\phi^{(+)}(x) + \phi^{(-)}(x))(\phi^{(+)}(y) + \phi^{(-)}(y)): \\
&\equiv \phi^{(+)}(x)\phi^{(+)}(y) + \phi^{(-)}(y)\phi^{(+)}(x) + \phi^{(-)}(x)\phi^{(+)}(y) + \phi^{(-)}(x)\phi^{(-)}(y)
\end{aligned} $$

This differs from the ordinary product $\phi(x)\phi(y)$ by a commutator

$$ [\phi^{(+)}(x), \phi^{(-)}(y)] = \Delta_+(x - y; m) \tag{6.87} $$

which is, of course, a c-number. For field reorderings at the same spacetime point
we encounter a divergent c-number Δ+ (0; m) (the momentum integral in (6.64) is
quadratically divergent for z = 0)—our first encounter with the ubiquitous ultraviolet
delicacies of local field theory. We shall see how to deal with the sensitivity of the
theory to high-momentum contributions in the fourth section of this book, but for
the time being we may simply imagine inserting a cutoff on divergent momentum
integrals at some very high value, reflecting our inevitable ignorance of very-short-
distance physics. Thus, the rearrangement implied by normal ordering within a single
interaction term does not affect the locality properties of the interaction Hamiltonian
density, as a normal-ordered product can be rearranged into a linear combination of
ordinary powers of the local field. For example (see Problem 4):

$$ :\phi(x)^4: \;= \phi(x)^4 + A\,\phi(x)^2 + B \tag{6.88} $$

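with $A$, $B$ c-number constants related to $\Delta_+(0; m)$. Accordingly, we are free to take

$$ \begin{aligned}
\mathcal{H}_{\rm int}(x) &= \frac{\lambda}{4!}:\phi(x)^4: \\
&\equiv \frac{\lambda}{4!}\big(\phi^{(+)4} + 4\phi^{(-)}\phi^{(+)3} + 6\phi^{(-)2}\phi^{(+)2} + 4\phi^{(-)3}\phi^{(+)} + \phi^{(-)4}\big)
\end{aligned} $$

which is, despite the reorderings, still a local field in virtue of (6.88). The free
Hamiltonian, $H_0 = \int d^3k\,E(k)a^\dagger(\vec k)a(\vec k)$, is also given (see Problem 5) by the integral
of a spatial energy density, itself a sum of normal-ordered products

$$ H_0 = \int d^3x\,\mathcal{H}_0(\vec x, t) = \int d^3x\,:\frac{\dot\phi^2}{2} + \frac{1}{2}|\vec\nabla\phi|^2 + \frac{1}{2}m^2\phi^2: \tag{6.89} $$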

where the divergent c-number commutator term by which the normal-ordered expres-
sion differs from the corresponding expression without normal ordering is just the
infamous “zero-point” energy of a free scalar field theory, which we have already
encountered in Chapter 1 in Jordan’s seminal calculation of field energy fluctuations.
The dependence on the time t at which the field operators in (6.89) are taken is
spurious—not surprisingly, as H0 is clearly time-independent in the interaction picture.
In summary, we have a theory with complete Hamiltonian

$$ H = \int d^3x\,:\frac{1}{2}\dot\phi^2 + \frac{1}{2}|\vec\nabla\phi|^2 + \frac{1}{2}m^2\phi^2 + \frac{\lambda}{4!}\phi^4: \tag{6.90} $$
where all the field operators are taken at t = 0, say.
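The structure of the reordering identity (6.88) can be checked in a single-mode toy model, where the divergent constant $\Delta_+(0; m)$ is replaced by the finite commutator $[a, a^\dagger] = 1$. The sketch below is only an analogue, not the field-theoretic calculation of Problem 4: it assumes a toy "field" $\phi = a + a^\dagger$, a truncated Fock space, and plain nested lists as matrices. In this toy model the constants are $A = -6$ and $B = 3$.

```python
import math

N = 12  # truncation of the single-mode Fock space (illustrative assumption)

def matmul(A, B):
    """Product of two N x N matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def power(A, p):
    """p-th matrix power, with power(A, 0) the identity."""
    result = [[float(i == j) for j in range(N)] for i in range(N)]
    for _ in range(p):
        result = matmul(result, A)
    return result

def combine(terms):
    """Linear combination sum_i c_i * M_i of (coefficient, matrix) pairs."""
    return [[sum(c * M[i][j] for c, M in terms) for j in range(N)]
            for i in range(N)]

a = [[0.0] * N for _ in range(N)]
for n in range(1, N):
    a[n - 1][n] = math.sqrt(n)                        # a|n> = sqrt(n)|n-1>
ad = [[a[j][i] for j in range(N)] for i in range(N)]  # a^dagger (transpose)
phi = combine([(1.0, a), (1.0, ad)])                  # toy field a + a^dagger
identity = power(a, 0)

# :phi^4: = sum_k C(4,k) (a^dagger)^k a^(4-k), creation operators to the left
normal_phi4 = combine([(math.comb(4, k), matmul(power(ad, k), power(a, 4 - k)))
                       for k in range(5)])

# Toy analogue of (6.88): :phi^4: = phi^4 - 6 phi^2 + 3
rhs = combine([(1.0, power(phi, 4)), (-6.0, power(phi, 2)), (3.0, identity)])

# Columns j <= N - 5 never reach the truncation edge, so they are exact.
max_diff = max(abs(normal_phi4[i][j] - rhs[i][j])
               for i in range(N) for j in range(N - 4))
print(max_diff)
```

The binomial coefficients 1, 4, 6, 4, 1 in `normal_phi4` are the same ones that appear in the expansion of $:\phi(x)^4:$ into $\phi^{(\pm)}$ above; the comparison is restricted to columns unaffected by the truncation.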
Let us return briefly to the very strong analyticity constraint assumed above
for $f_{MM'}$. The Taylor series for $f_{MM'}$ in powers of momenta translates to higher
derivatives of the fields in coordinate space. The immediate problem induced by such
derivatives—failure of ultralocality—is not, in fact, fatal: as we shall see in Chapter
12, the Lagrangian formalism enables the construction of Lorentz invariant theories
even for interaction Hamiltonians with arbitrarily many spacetime-derivatives. The
justification for this assumption really lies in the basic feature needed to insulate
us from our ignorance of physics at very short distance scales—the scale separation
property—discussed in Chapter 3 (and in much greater detail in Chapter 16), and in
the further property of renormalizability possessed by a small subclass of local quantum
field theories, in which the low-momentum physics can, at least perturbatively (and in
a few special cases exactly, i.e., even non-perturbatively), be completely isolated from
the behavior of the theory at arbitrarily high momentum. In fact, the requirement of
renormalizability will constrain the suitable range of theories in any more than one
spacetime dimension to polynomial interactions in the fields: the sum over $M$, $M'$ must
terminate! In addition, for theories of spinless particles in four spacetime dimensions,
the coefficient function $f_{MM'}$ must actually be momentum-independent. The necessity
for this will be examined in great detail in the fourth section of this book when we
consider the physical origin and role of renormalizability. We shall also see then that
more general effective field theories, with arbitrarily high derivatives, and powers of the
field, which are the natural end results of the imposition only of Lorentz-invariance and
clustering requirements, lie inevitably at the microscopic “core” of any local quantum
field theory—even the perturbatively renormalizable ones.

6.5 Local fields, non-localizable particles!


The commutativity of local fields at space-like separated points, commonly referred
to as the property of microcausality, is a natural implementation at the quantum
level of our macroscopic intuition that propagation of physical effects at superluminal
speeds should be prohibited (which we may term macrocausality). We have seen that
it is quite straightforward to construct fields which exactly satisfy microcausality, and
commute even at arbitrarily small space-like separations. All that is required is that
the positive frequency (or destruction) part of the field be balanced symmetrically by
a negative frequency (creation) part involving a particle of equal mass and opposite
additive quantum numbers. More generally, if v1 , v2 are two compact regions of
spacetime such that for all $x \in v_1$, $y \in v_2$ the separation $x - y$ is space-like, and
$f_1(x)$, ($f_2(x)$) are c-number functions with support in $v_1$, (resp. $v_2$), then the smeared
fields $\phi_{f_i} \equiv \int d^4x\,f_i(x)\phi(x)$, $i = 1, 2$ satisfy ($\phi(x)$ a self-conjugate, hence hermitian,
scalar field)

$$ [\phi_{f_1}, \phi_{f_2}] = 0 \tag{6.91} $$

and are thus, in the general sense of quantum measurement theory, mutually compatible
hermitian (hence, measurable) observables of the theory: states can be constructed
in which $\phi_{f_1}$ and $\phi_{f_2}$ take simultaneously sharp values.¹³
The exact localizability of relativistic quantum fields does not, however, extend
to the quantal manifestations of these fields: the particles whose interactions have
motivated the introduction of the field concept in the first place! Recall the situation
in non-relativistic quantum theory, where a massive particle like an electron can be
assigned a wavefunction ψ(x, t) which at some time t0 exactly vanishes outside an
arbitrarily small bounded spatial region v1 , thereby localizing the particle exactly
inside the given region, at least at some instant of time.¹⁴ More generally, for a
non-relativistic many particle quantum system, a number density operator $N(\vec x, t)$ can
be defined such that the operators $N_v \equiv \int_v d^3x\,N(\vec x, t_0)$ defined for arbitrary spatial
regions at some instant $t_0$ exactly commute with each other for non-overlapping spatial
regions. Thus, it makes perfect sense in such a theory to speak of a definite number
of particles in a precisely well defined spatial volume at a given time.
All of this falls apart in relativistic field theory. Let us illustrate the basic issues in
the simplest case, that of a massive spinless boson described by a self-conjugate scalar
field φ(x) (as given by (6.73) with ac = a). We first note that the failure of strict
localizability for relativistic particles is a completely kinematic issue: interactions of
the field are not relevant here. The number operator counting the total number of
particles in a state can be written as a spatial integral of a number density N (x, t)
(see Problem 6)

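$$ N \equiv \int d^3k\,a^\dagger(\vec k)a(\vec k) \tag{6.92} $$

$$ = \int d^3x\,N(\vec x, t), \tag{6.93} $$

$$ N(\vec x, t) = i\phi^{(-)}(\vec x, t)\overleftrightarrow{\frac{\partial}{\partial t}}\phi^{(+)}(\vec x, t) \tag{6.94} $$

where the antisymmetric time-derivative symbol in (6.94) is defined as

$$ A(t)\overleftrightarrow{\frac{\partial}{\partial t}}B(t) \equiv A(t)\frac{\partial B(t)}{\partial t} - \frac{\partial A(t)}{\partial t}B(t) = A(t)\dot B(t) - \dot A(t)B(t) \tag{6.95} $$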

13 We shall see how to do this explicitly in Chapter 8 when we discuss coherent states of a quantum field.
14 Of course, the instantaneous spreading of the wave-packet allowed in non-relativistic theory will produce
a non-vanishing wavefunction outside of $v_1$ for $t > t_0$.

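In this relativistic theory, the equal-time commutator of the number density operator
at spatially distinct points $\vec x \neq \vec y$ does not vanish. We shall need the basic commutators
(Problem 7)

$$ [\phi^{(+)}(\vec x, t), \phi^{(-)}(\vec y, t)] = \Delta_+(\vec x - \vec y, 0; m) \tag{6.96} $$

$$ [\dot\phi^{(+)}(\vec x, t), \dot\phi^{(-)}(\vec y, t)] \equiv \tilde\Delta_+(\vec x - \vec y, 0; m) \tag{6.97} $$

$$ [\phi^{(+)}(\vec x, t), \dot\phi^{(-)}(\vec y, t)] = \frac{i}{2}\delta^3(\vec x - \vec y) = 0 \quad (\vec x \neq \vec y) \tag{6.98} $$

$$ [\phi^{(-)}(\vec x, t), \dot\phi^{(+)}(\vec y, t)] = \frac{i}{2}\delta^3(\vec x - \vec y) = 0 \quad (\vec x \neq \vec y) \tag{6.99} $$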
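where, reintroducing the speed of light $c$ in the energy-momentum dispersion formula
$E(k) = \sqrt{m^2c^4 + k^2c^2}$ to facilitate a non-relativistic limit,

$$ \Delta_+(\vec x, 0; m) = \int \frac{d^3k}{(2\pi)^3}\,\frac{1}{2\sqrt{m^2c^4 + k^2c^2}}\,e^{i\vec k\cdot\vec x} \tag{6.100} $$

and

$$ \tilde\Delta_+(\vec x, 0; m) = \int \frac{d^3k}{(2\pi)^3}\,\frac{1}{2}\sqrt{m^2c^4 + k^2c^2}\;e^{i\vec k\cdot\vec x} \tag{6.101} $$

$$ = (m^2c^4 - c^2\vec\nabla^2)\Delta_+ \tag{6.102} $$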

Note that although the relativistic function $\Delta_+$ is not zero, but rather falls exponentially
(recall (6.66)), implying the same behavior for the function $\tilde\Delta_+$, by (6.102), in
the formal non-relativistic limit $c \to \infty$, we may expand

$$ \Delta_+(\vec x, 0; m) \sim \frac{1}{mc^2}\Big(\delta^3(\vec x) + \frac{1}{2m^2c^2}\vec\nabla^2\delta^3(\vec x) + ..\Big) \tag{6.103} $$

thereby recovering local behavior for all relevant commutators. However, in the
relativistic case, both $\Delta_+(\vec x, 0; m)$ and $\tilde\Delta_+(\vec x, 0; m)$ have a dominant asymptotic
exponential falloff $\sim e^{-m|\vec x|}$.
With these ingredients, a short calculation yields

$$ \begin{aligned}
[N(\vec x, t), N(\vec y, t)] = {}& \Delta_+(\vec x - \vec y, 0; m)\big(\dot\phi^{(-)}(\vec x, t)\dot\phi^{(+)}(\vec y, t) - \dot\phi^{(-)}(\vec y, t)\dot\phi^{(+)}(\vec x, t)\big) \\
& + \tilde\Delta_+(\vec x - \vec y, 0; m)\big(\phi^{(-)}(\vec x, t)\phi^{(+)}(\vec y, t) - \phi^{(-)}(\vec y, t)\phi^{(+)}(\vec x, t)\big)
\end{aligned} \tag{6.104} $$

Accordingly, measurements of the number of particles $N_{v_1} \equiv \int_{v_1} d^3x\,N(\vec x, t)$,
$N_{v_2} \equiv \int_{v_2} d^3x\,N(\vec x, t)$ in two spatially non-overlapping volumes $v_1$, $v_2$ will mutually interfere,
with the level of interference falling exponentially as the separation (smallest distance
between points in $v_1$ and $v_2$) is increased, on the scale of the Compton wavelength
$m^{-1}$ of the particle. In the non-relativistic limit, after appropriately absorbing the
leading $\frac{1}{mc^2}$ factor in (6.103) into the normalization of the operators, the commutator
(6.104) vanishes as expected for all $\vec x \in v_1 \neq \vec y \in v_2$, ensuring $[N_{v_1}, N_{v_2}] = 0$ for
non-overlapping volumes $v_1$, $v_2$.

The difficulty in defining localized states for relativistic particles can be seen with
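even greater clarity if we examine the behavior of various field observables for a
one-particle state $|\psi\rangle$, defined by specifying a momentum wavefunction $\psi(\vec k)$:

$$ |\psi\rangle \equiv \int d^3k\,\psi(\vec k)a^\dagger(\vec k)|0\rangle = \int d^3k\,\psi(\vec k)|\vec k\rangle, \quad \langle\psi|\psi\rangle = \int d^3k\,|\psi(\vec k)|^2 = 1 \tag{6.105} $$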

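We now consider the expectation value of various field observables in this one-particle
state at some fixed time, say $t = 0$. For example, the expectation value of the number
density is (suppressing the time-variable in the fields, as we are at $t = 0$)

$$ \begin{aligned}
\langle\psi|N(\vec x, 0)|\psi\rangle &= i\int d^3k\,d^3k'\,\psi^*(\vec k')\psi(\vec k)\,\langle\vec k'|\phi^{(-)}(\vec x)\dot\phi^{(+)}(\vec x) - \dot\phi^{(-)}(\vec x)\phi^{(+)}(\vec x)|\vec k\rangle \\
&= \frac{1}{2(2\pi)^3}\int d^3k\,d^3k'\,\psi^*(\vec k')\psi(\vec k)\,e^{i(\vec k - \vec k')\cdot\vec x}\Big\{\sqrt{\frac{E(k)}{E(k')}} + \sqrt{\frac{E(k')}{E(k)}}\Big\} \\
&= {\rm Re}\big(\chi^*(\vec x)\tilde\chi(\vec x)\big)
\end{aligned} \tag{6.106} $$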

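involving two distinct coordinate space wavefunctions

$$ \chi(\vec x) \equiv \frac{1}{(2\pi)^{3/2}}\int d^3k\,\frac{1}{\sqrt{E(k)}}\,\psi(\vec k)e^{i\vec k\cdot\vec x} \tag{6.107} $$

$$ \tilde\chi(\vec x) \equiv \frac{1}{(2\pi)^{3/2}}\int d^3k\,\sqrt{E(k)}\,\psi(\vec k)e^{i\vec k\cdot\vec x} \tag{6.108} $$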

In the non-relativistic limit (formally, $c \to \infty$) these wavefunctions revert, up to
normalization, to the usual non-relativistic coordinate space wavefunction of a massive
particle:

$$ \chi(\vec x) \to \frac{1}{\sqrt{m}\,c}\,\psi(\vec x), \quad \psi(\vec x) = \frac{1}{(2\pi)^{3/2}}\int d^3k\,\psi(\vec k)e^{i\vec k\cdot\vec x} \tag{6.109} $$

$$ \tilde\chi(\vec x) \to \sqrt{m}\,c\,\psi(\vec x) \tag{6.110} $$

and the expectation value of the number density (6.106) reduces to the conventional
probability density of non-relativistic quantum mechanics $|\psi(\vec x)|^2$, as first proposed
by Max Born in 1926.
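The reduction of (6.106) to $|\psi(\vec x)|^2$ rests on the bracketed energy factor approaching a constant as $c \to \infty$. The sketch below, a minimal illustration with arbitrary sample momenta and hypothetical function names, evaluates half of that bracket, $\tfrac{1}{2}\big(\sqrt{E(k)/E(k')} + \sqrt{E(k')/E(k)}\big)$, using the dispersion relation $E(k) = \sqrt{m^2c^4 + k^2c^2}$ for increasing values of $c$.

```python
import math

def energy(k, m, c):
    """Relativistic dispersion E(k) = sqrt(m^2 c^4 + k^2 c^2)."""
    return math.sqrt(m * m * c ** 4 + k * k * c * c)

def density_weight(k, kp, m, c):
    """Half the bracket in (6.106): (sqrt(E/E') + sqrt(E'/E)) / 2."""
    e, ep = energy(k, m, c), energy(kp, m, c)
    return 0.5 * (math.sqrt(e / ep) + math.sqrt(ep / e))

k, kp, m = 0.5, 2.0, 1.0  # two momentum magnitudes and a mass (arbitrary values)
weights = {c: density_weight(k, kp, m, c) for c in (1.0, 10.0, 100.0)}
for c, w in weights.items():
    print(c, w)
```

Once the weight is 1 for all momentum pairs, the double integral in (6.106) factorizes into the product of $\psi^*(\vec x)$ and $\psi(\vec x)$ as defined in (6.109), which is the Born density. For $c = 1$ the weight differs from 1 by several percent at these momenta, which is the momentum-space origin of the non-locality discussed here.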
In the relativistic case, the non-commutativity of number operators for non-
overlapping regions means, however, that even if we choose ψ(k) in (6.105) so that
the expectation value of N vanishes exactly outside some compact spatial volume
v1 , this does not mean that the state is an eigenstate, with zero eigenvalue, of the
number operator for another non-overlapping spatial volume v2 . In fact, such a state
will even have a non-zero-energy distribution outside the volume v1 . This is easily
seen by examining the energy density, given by the operator (6.89) (recall that we are
interested only in a free system here, and that we are reinstating explicit factors of
the velocity of light to facilitate the non-relativistic limit, while keeping natural units
$\hbar = 1$ for Planck’s constant)

$$ \mathcal{H}_0(\vec x, 0) = \frac{1}{2}:\big(\dot\phi(\vec x, 0)^2 + c^2|\vec\nabla\phi(\vec x, 0)|^2 + m^2c^4\phi(\vec x, 0)^2\big): \tag{6.111} $$

A short calculation (see Problem 8) gives for the expectation value of the energy
density in the state (6.105)

$$ \langle\psi|\mathcal{H}_0(\vec x, 0)|\psi\rangle = \frac{1}{2}\big(|\tilde\chi(\vec x)|^2 + c^2\vec\nabla\chi^*(\vec x)\cdot\vec\nabla\chi(\vec x) + m^2c^4|\chi(\vec x)|^2\big) \tag{6.112} $$
2
We can localize our one-particle state exactly with respect to the number density
operator either by choosing $\psi(\vec k) \propto \sqrt{E(k)}$ (in which case $\chi(\vec x) \propto \delta^3(\vec x)$ but $\tilde\chi$ is not a
point distribution, but rather our old friend $\tilde\Delta_+$ from (6.101)) or by choosing
$\psi(\vec k) \propto \frac{1}{\sqrt{E(k)}}$ (in which case $\tilde\chi(\vec x) \propto \delta^3(\vec x)$, and $\chi$ reduces to $\Delta_+$), but in either case the energy
density involves terms which do not vanish for $\vec x \neq 0$, but rather fall off exponentially
away from the origin at a rate determined once again by the Compton wavelength of
the particle, specifically (apart from power prefactors) like $e^{-2m|\vec x|}$. The non-relativistic
limit, using (6.109, 6.110), is just as expected:

$$ \langle\psi|\mathcal{H}_0(\vec x, 0)|\psi\rangle \to mc^2|\psi(\vec x)|^2 + \frac{1}{2m}|\vec\nabla\psi(\vec x)|^2 \tag{6.113} $$

which, on integration over space, gives the rest energy $mc^2$ plus the expectation value
of the non-relativistic kinetic energy operator $-\frac{1}{2m}|\vec\nabla|^2$.
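The limit (6.113) can be verified term by term from (6.112). The worked step below assumes the non-relativistic limits of (6.109, 6.110) are read as $\chi \to \psi/(\sqrt{m}\,c)$ and $\tilde\chi \to \sqrt{m}\,c\,\psi$, which is the reading under which the number density (6.106) reduces to $|\psi|^2$:

```latex
\begin{aligned}
\tfrac{1}{2}|\tilde\chi|^2 &\;\to\; \tfrac{1}{2}\,mc^2\,|\psi|^2, \\
\tfrac{1}{2}\,c^2\,\vec\nabla\chi^*\cdot\vec\nabla\chi &\;\to\; \tfrac{1}{2}\,\frac{c^2}{mc^2}\,|\vec\nabla\psi|^2 = \frac{1}{2m}|\vec\nabla\psi|^2, \\
\tfrac{1}{2}\,m^2c^4\,|\chi|^2 &\;\to\; \tfrac{1}{2}\,\frac{m^2c^4}{mc^2}\,|\psi|^2 = \tfrac{1}{2}\,mc^2\,|\psi|^2, \\
\text{sum:}\quad \langle\psi|\mathcal{H}_0(\vec x,0)|\psi\rangle &\;\to\; mc^2|\psi|^2 + \frac{1}{2m}|\vec\nabla\psi|^2 .
\end{aligned}
```

The rest energy thus receives equal contributions from the $\dot\phi^2$ and $m^2c^4\phi^2$ terms of the energy density, while the gradient term alone supplies the kinetic energy.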
The peculiar resistance of relativistic particles described by local fields¹⁵ to
localization of their physical attributes is really a manifestation of a deep
complementarity principle at play between particle and field aspects in relativistic quantum
field theories. Indeed, the fluctuations of the energy of blackbody radiation in a
subvolume of a cavity even for states in which the number of photons of each
mode (in the full system) is completely definite was the critical piece of information
used by Jordan to carry through the first real calculation in quantum field theory,
as we saw in Chapter 1. We also recall from Heisenberg’s original “gamma-ray
microscope” argument for the uncertainty principle that we can often gain some
physical understanding of complementary quantities in quantum physics by thought
experiments in which physical processes are invoked to effect a “measurement” of a
given quantity. From the field point of view, attempts to localize physical attributes
of a relativistic particle (energy, number of quanta) in a finite volume of order the
Compton wavelength necessarily entail interaction with other fields with momentum
(and hence, relativistically, energy) components on the order of the inverse Compton
wavelength. Such interactions can produce additional “virtual” particle–antiparticle

15 One of the most remarkable examples of the “fuzzy” character of localization in quantum field theories
is the Reeh–Schlieder theorem (see (Reeh and Schlieder, 1961); also (Streater and Wightman, 1978), theorem
4-2) of axiomatic quantum field theory, which asserts that an arbitrary physical state can be approximated
arbitrarily well by applying polynomial functions of field operators localized in any finite open region of
spacetime to the vacuum: even a region arbitrarily far separated from the “location” of the particles in the
given state!

pairs, vitiating the desired localization of the original particle.¹⁶ Perhaps the
confusing disparity between exactly localizable fields and our stubbornly “fuzzy” particle
states is best understood in the following terms. The locality principle operating
at the field level is really an implementation of the point-like nature of the particle
interactions: we construct interaction Hamiltonians by multiplying the relevant fields
at exactly the same spacetime point. The structureless character of an elementary
particle is, from this point of view, a statement about the way it interacts with
other elementary particles (or, in the case of purely self-coupled theories, with itself),
and not a statement about our ability to localize the physical characteristics (energy,
momentum, charge, etc.) of the associated particle at a dimensionless spatial point,
which, as the preceding discussion shows, is intrinsically impossible in a relativistic
theory.

6.6 From microcausality to analyticity


As we have seen, the construction of relativistic interactions using local fields can
be motivated by invoking the spatial clustering principle for relativistic scattering
amplitudes, which in turn leads to a smoothness requirement on matrix elements of
the interaction Hamiltonian in momentum space. In this section we shall, in a sense,
complete the circle by exhibiting a deep connection between the microcausality of the
underlying local fields and analyticity properties of amplitudes constructed from such
fields.
The analyticity of S-matrix amplitudes is so stringent a constraint that it was
at one time17 believed (in conjunction with Lorentz-invariance, unitarity, and the
crossing symmetry property of amplitudes, which we shall discuss later in Chapter 7)
to provide an adequate basis for a complete theory of strong interactions, with no need
for the explicit introduction of local fields. While this point of view has essentially
been abandoned, the dispersion relations for amplitudes derivable on the basis of
their analyticity properties still form a very important part of the armory of the field
theorist.
The connection between causality and analyticity was first emphasized by Kramers
in the Como conference of 1927 (Kramers, 1927).18 Perhaps the simplest physical
example, which nevertheless contains all essential features, is one familiar to electrical
engineers: a causal linear device is one in which an output signal O(t) (t is the time
variable) is related linearly to an input signal I(t) by a causal transfer function T ,
$$ O(t) = \int_{-\infty}^{+\infty} T(t - t')\, I(t')\, dt' \tag{6.114} $$

16 We shall later return to the underlying particle–field complementarity (in the form of the number-phase
mutual uncertainty principle) at work here in Chapter 8, when we examine the classical limit of quantum
field theory.
17 The S-matrix approach to strong interactions, pioneered in the late 1950s and 1960s by Chew,
Mandelstam, Regge, and many others, while leading to many important and lasting results, faded once
the efficacy of quantum chromodynamics in addressing a much wider variety of strong dynamics processes
became apparent in the 1970s and 1980s.
18 See the paper of Toll (Toll, 1956), for a review and careful discussion of the logical foundations.

where causality requires $T(t - t') = 0$ for $t < t'$: the output at any time $t$ can only depend
on prior values of the input signal at time $t'$. The Fourier transform of the transfer
function is given by

$$ \tilde{T}(\omega) = \int_{-\infty}^{+\infty} e^{i\omega t}\, T(t)\, dt = \int_{0}^{+\infty} e^{i\omega t}\, T(t)\, dt \tag{6.115} $$

The restriction of the time variable t to positive values in the above integral means that
T̃ (ω) may be analytically continued from real values of ω to the upper-half-plane of the
complex frequency plane, Im(ω) > 0, as the integral acquires an additional convergence
factor $e^{-\mathrm{Im}(\omega)t}$ as we move off the real ω axis into the upper half plane of ω. In fact,
for square-integrable transfer functions, the connection goes both ways, as has been
shown by Titchmarsh (Titchmarsh, 1948): upper-half-plane analytic functions square
integrable along any line parallel to the real axis are inevitably the Fourier transforms
of causal transfer functions (i.e., functions vanishing for negative time). This upper-
half-plane analyticity of T̃ (ω) allows the derivation of important dispersion relations,
relating the real and imaginary parts of T̃ (ω) for real values of ω. Such dispersion
relations are of enormous phenomenological importance in high-energy physics: they
played a central role in the S-matrix approach to strong interaction physics which
dominated particle physics in the late 1950s and through the 1960s.
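
The analytic continuation described above can be checked numerically on a simple case. The sketch below assumes an illustrative causal transfer function $T(t) = e^{-\gamma t}$ for $t \ge 0$ (chosen here for convenience; it does not appear in the text), whose transform (6.115) is $1/(\gamma - i\omega)$, with its only singularity at $\omega = -i\gamma$ in the lower half-plane. The names `transfer`, `fourier_transform`, and `exact` are hypothetical helpers.

```python
import cmath
import math

GAMMA = 1.0  # illustrative decay rate (an assumption, not taken from the text)

def transfer(t, gamma=GAMMA):
    """Causal transfer function: vanishes identically for t < 0."""
    return math.exp(-gamma * t) if t >= 0 else 0.0

def fourier_transform(omega, t_max=40.0, steps=40000, gamma=GAMMA):
    """Midpoint-rule estimate of the integral of e^{i omega t} T(t) over t in [0, t_max]."""
    dt = t_max / steps
    total = 0j
    for n in range(steps):
        t = (n + 0.5) * dt
        total += cmath.exp(1j * omega * t) * transfer(t, gamma) * dt
    return total

def exact(omega, gamma=GAMMA):
    """Closed form 1/(gamma - i omega); the only pole sits at omega = -i gamma."""
    return 1.0 / (gamma - 1j * omega)

# A point off the real axis, in the upper half-plane: the extra factor
# e^{-Im(omega) t} only improves the convergence of the integral.
omega = 1.0 + 0.5j
print(abs(fourier_transform(omega) - exact(omega)))
```

The same comparison works for real ω; moving into the lower half-plane toward $\omega = -i\gamma$ is where the integral ceases to converge for this example.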
The connection between analyticity and causality is also familiar in ordinary
quantum scattering theory. Form a wave-packet of incoming waves by smearing plane
waves in the usual way:

$$ \psi_{\rm in}(z,t) = \int d\omega\, g(\omega)\, e^{i\omega(\frac{z}{c} - t)} $$

The scattering amplitude f (ω) then gives the outgoing spherical scattered wave as
(see Fig. 6.10)

$$ \psi_{\rm scat}(r,t) = \int d\omega\, f(\omega)\, g(\omega)\, e^{i\omega(\frac{r}{c} - t)} $$

[Fig. 6.10 Scattering from a localized target: an incoming wave $\psi_{\rm in}(z,t) = g(\omega)e^{i\omega(\frac{z}{c}-t)}$ and the outgoing scattered wave $\psi_{\rm scat}(r,t) = f(\omega)g(\omega)e^{i\omega(\frac{r}{c}-t)}$.]

Now choose g(ω) so the incoming packet arrives at the scattering center at t = 0:
$$ \psi_{\rm in}(0,t) = 0, \quad t < 0, \qquad \text{with} \quad g(\omega) = \int_{0}^{+\infty} \frac{dt}{2\pi}\, \psi_{\rm in}(0,t)\, e^{i\omega t} $$
g(ω) certainly exists for real ω, a fortiori for Im(ω) > 0, where the integral has an
additional real exponential convergence. In fact, this implies (by reasoning exactly
analogous to that presented previously for causal signals) analyticity of g(ω) in the
entire upper-half-plane. But causality requires that there be no scattered wave ahead
of the incident one! In other words,
$$ \psi_{\rm scat}(r,t) = 0, \qquad t - \frac{r}{c} < 0 $$
so by the same argument, f (ω)g(ω) is upper-half-plane analytic. This implies that the
scattering amplitude f (ω) cannot have singularities in the upper-half-plane.
In field theory, a simple, and physically important, illustration of the connection
between microcausality and analyticity can be found in the process of forward scat-
tering, in which a massless particle (e.g., a photon) scatters with zero momentum
exchange off a target (e.g., a neutral atom, or a proton). The complications of spin
are completely irrelevant to the argument we shall present, so we shall discuss this
process for a massless spinless boson described by a field φ(x). The target particle is
assumed to be stable with respect to emission of the φ-particle. We also assume that the
interaction between our scalar “photon” and the target can be treated perturbatively
to second order in an interaction Hamiltonian of form

Hint (x) = eJ(x)φ(x) (6.116)

where φ(x) is the canonical scalar field for the “photon” in the interaction picture,
and hence given explicitly by (6.73) (with $a = a^c$, as we assume a self-conjugate
boson).
The fields describing the internal dynamics of the target are all contained in the
“current”19 J(x) which evolves dynamically according to a Hamiltonian H0 which,
despite the suggestive subscript, is “free” only in the sense that the field φ is absent,
while all other aspects of the internal dynamics of the target are treated exactly. For
example, if the target is a proton, J(x) would be constructed from the appropriate
quark fields and would evolve dynamically with the full Hamiltonian of quantum
chromodynamics, the gauge theory assumed to describe strong dynamics. We need only
assume that J(x) is a local scalar field with the usual translation property (cf (5.93))
$$ J(x) = e^{iP^{(0)}_\mu x^\mu}\, J(0)\, e^{-iP^{(0)}_\mu x^\mu} \tag{6.117} $$

19 The terminology is justified by the form of the interaction in the physical case of electron–proton scat-
tering, where Hint (x) = eJμ (x)Aμ (x), with Jμ (x) the electromagnetic current for the strongly interacting
fields, and Aμ (x) is the vector potential field mediating the photon. Here, for simplicity, we assume that
the current is a Lorentz scalar field.

and microcausality

$$ [J(x), J(y)] = 0, \qquad (x - y)^2 < 0 \tag{6.118} $$

We note again that the energy-momentum operators $P^{(0)}_\mu$ in (6.117), despite the
superscript, contain the full dynamics of the target system, but no interactions for
the “photon”.
The scattering S-matrix amplitude of a scalar photon of momentum q from a target
particle of momentum k to second order in the interaction (6.116) is, by (5.72),

$$ S^{(2)}(k',q';k,q) = \frac{(-ie)^2}{2} \int d^4x\, d^4y\, \langle k',q'|T\{J(x)\phi(x)J(y)\phi(y)\}|k,q\rangle \tag{6.119} $$

or, writing out the T-product explicitly,

$$ S^{(2)}(k',q';k,q) = -\frac{e^2}{2} \int d^4x\, d^4y\, \{\theta(x^0 - y^0)\, \langle k'|J(x)J(y)|k\rangle\, \langle q'|\phi(x)\phi(y)|q\rangle + x \leftrightarrow y\} \tag{6.120} $$
Here the field φ is free and commutes with the current J, as the dynamics of the
latter by definition lacks all reference to φ. This allows the factorization of the matrix
element indicated in (6.120). From the explicit expression for φ in terms of creation
and annihilation operators, we find
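$$ \langle q'|\phi(x)\phi(y)|q\rangle = \frac{1}{(2\pi)^3}\,\frac{1}{2E(q)}\,\left(e^{iq'\cdot x - iq\cdot y} + x \leftrightarrow y\right) \tag{6.121} $$

whence

$$ S^{(2)}(k',q';k,q) = \frac{-e^2}{2(2\pi)^3}\,\frac{1}{2E(q)} \int d^4x\, d^4y\, \left(e^{iq'\cdot x - iq\cdot y} + x \leftrightarrow y\right) \langle k'|T\{J(x)J(y)\}|k\rangle \tag{6.122} $$

Changing spacetime variables from x, y to $Z \equiv \frac{x+y}{2}$, $z \equiv x - y$, and using the translation property

$$ T\{J(Z + \tfrac{z}{2})J(Z - \tfrac{z}{2})\} = e^{iP^{(0)}\cdot Z}\, T\{J(\tfrac{z}{2})J(-\tfrac{z}{2})\}\, e^{-iP^{(0)}\cdot Z} \tag{6.123} $$

this becomes

$$ \begin{aligned} S^{(2)}(k',q';k,q) &= \frac{-e^2}{2(2\pi)^3}\,\frac{1}{2E(q)} \int d^4z\, d^4Z\, \left(e^{i(q'+q)\cdot z/2} + e^{-i(q'+q)\cdot z/2}\right) e^{i(q'+k'-q-k)\cdot Z}\, \langle k'|T\{J(\tfrac{z}{2})J(-\tfrac{z}{2})\}|k\rangle \\ &= \frac{-e^2}{(2\pi)^3}\,\frac{1}{2E(q)}\,(2\pi)^4\, \delta^4(k' + q' - k - q)\, T(k',q';k,q) \end{aligned} \tag{6.124} $$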

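where the invariant amplitude $T(k',q';k,q)$ is defined as

$$ \begin{aligned} T(k',q';k,q) &\equiv \frac{1}{2} \int d^4z\, \left(e^{i(q'+q)\cdot z/2} + z \to -z\right) \langle k'|T\{J(\tfrac{z}{2})J(-\tfrac{z}{2})\}|k\rangle \\ &= \int d^4z\, e^{i(q+q')\cdot z/2}\, \langle k'|T\{J(\tfrac{z}{2})J(-\tfrac{z}{2})\}|k\rangle \end{aligned} \tag{6.125} $$

with the last line following from the symmetry of the T-product: $T\{J(\tfrac{z}{2})J(-\tfrac{z}{2})\} = T\{J(-\tfrac{z}{2})J(\tfrac{z}{2})\}$. In the forward scattering limit $q' \to q$, $k' \to k$, we find

$$ T(k,q;k,q) = \int d^4z\, e^{iq\cdot z}\, \langle k|T\{J(\tfrac{z}{2})J(-\tfrac{z}{2})\}|k\rangle = \int d^4z\, e^{iq\cdot z}\, \langle k|T\{J(z)J(0)\}|k\rangle \tag{6.126} $$

where the translation of the T-product by $\tfrac{z}{2}$ (with no phase factor) is allowed by
equality of the initial- and final-state momenta. Next, we note that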

$$ \begin{aligned} T\{J(z)J(0)\} &= \theta(z^0)J(z)J(0) + \theta(-z^0)J(0)J(z) \\ &= \theta(z^0)[J(z), J(0)] + (\theta(z^0) + \theta(-z^0))J(0)J(z) \\ &= \theta(z^0)[J(z), J(0)] + J(0)J(z) \end{aligned} \tag{6.127} $$

and that the second term on the right-hand side of (6.127) makes no contribu-
tion to the forward scattering amplitude. This follows from the following brief
computation:
 
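$$ \begin{aligned} \int d^4z\, e^{iq\cdot z}\, \langle k|J(0)J(z)|k\rangle &= \sum_n \int d^4z\, e^{iq\cdot z}\, \langle k|J(0)|n\rangle \langle n|J(z)|k\rangle \\ &= \sum_n \int d^4z\, e^{i(q + P_n - k)\cdot z}\, \langle k|J(0)|n\rangle \langle n|J(0)|k\rangle \\ &= \sum_n (2\pi)^4\, \delta^4(q + P_n - k)\, \langle k|J(0)|n\rangle \langle n|J(0)|k\rangle \end{aligned} \tag{6.128} $$

where we have inserted a complete set $|n\rangle$ of eigenstates of the target system energy-
momentum $P^{(0)}$, $P^{(0)\mu}|n\rangle = P_n^\mu|n\rangle$, and used the translation property (6.117). By
assumption, the stability of the target system $|k\rangle$ to emission of a “photon” implies
that there are no states $|n\rangle$ with momentum $k - q$ and non-vanishing matrix element
$\langle n|J(0)|k\rangle$.20 Thus, the forward scattering amplitude $T(k,q;k,q)$ can be written as
the Fourier transform of a retarded commutator:

$$ T(k,q;k,q) = \int d^4z\, e^{iq\cdot z}\, \theta(z^0)\, \langle k|[J(z), J(0)]|k\rangle \tag{6.129} $$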

The commutator appearing in (6.129) is the necessary ingredient for invoking


microcausality: together with the θ-function, the vanishing of the commutator for

20 For example, our target system might be an atom in its ground state, or the proton.

space-like z ensures that the spacetime integral over z is restricted to the forward
light-cone, i.e., coordinates z with $z^0 > 0$, $z^2 \geq 0$. Our massless “photon” has energy-
momentum $q^\mu = (\omega, \omega\hat{q})$, with $\hat{q}$ a unit spatial vector, so (6.129) can be written
(suppressing the dependence on the target momentum k and photon direction $\hat{q}$)

$$ T(\omega) = \int d^4z\, e^{i\omega(z^0 - \hat{q}\cdot\vec{z})}\, \theta(z^0)\, \langle k|[J(z), J(0)]|k\rangle \tag{6.130} $$

In the forward light-cone for z we necessarily have that $z^0 - \hat{q}\cdot\vec{z} \geq 0$, so the Fourier
transform in (6.130) has exactly the same property as $\tilde{T}(\omega)$ in (6.115): it may be
smoothly analytically continued to the upper-half-plane of the frequency variable ω,
as $\mathrm{Im}(\omega) > 0$ implies an exponential suppression factor $e^{-\mathrm{Im}(\omega)(z^0 - \hat{q}\cdot\vec{z})}$.
That the connection between microcausality (space-like commutativity) and ana-
lyticity runs very deep in quantum field theory has been demonstrated by rigorous
proofs of very powerful analyticity theorems for the vacuum-expectation-values of local
fields, based only on very general assumptions on the spectral and field properties of
the theory. These assumptions, which have come to be called the Wightman axioms,
and their connection to the analyticity properties of field-theory amplitudes, will be
explained in our discussion of the axiomatic framework of field theory in the Heisenberg
picture in Section 9.2. At that point we shall be in a position to introduce, and prove,
the Ruelle Clustering theorem, which exhibits the clustering principle of local quantum
field theory in its most general and powerful form.

6.7 Problems
1. Verify that the linking operator L defined in (6.21) correctly constructs the
contribution to the 2-2 matrix elements of the product V U of two Fock space
operators V and U arising from two-(bosonic)particle intermediate states. You
will only need the M = N = 2 terms in the expansions of the V(j ∗ , j) and U(j ∗ , j)
functionals. Also, recall the form of the completeness relation for the Fock space
of a single boson, (5.22).
2. Consider a bosonic theory with Hamiltonian

$$ H = \int d^3k_1'\, d^3k_1\, h_{11}(k_1', k_1)\, a^\dagger(k_1')a(k_1) + \frac{1}{4} \int d^3k_1'\, d^3k_2'\, d^3k_1\, d^3k_2\, h_{22}(k_1', k_2', k_1, k_2)\, a^\dagger(k_1')a^\dagger(k_2')a(k_1)a(k_2) $$

Calculate the 3-3 matrix element $\langle q_1' q_2' q_3'|H|q_1 q_2 q_3\rangle$ explicitly in terms of the
functions $h_{11}$, $h_{22}$. Express your result graphically to display the connectedness
structure of this matrix element.
3. A conserved four-vector field (called a “current”) $J^\mu(x)$ can be defined as follows
for a free complex scalar field φ(x):

$$ J^\mu(x) = ie : (\phi^\dagger(x)\partial^\mu\phi(x) - (\partial^\mu\phi^\dagger(x))\phi(x)) : \tag{6.131} $$

If we attribute an electric charge e (resp. −e) to the particles (resp. antiparticles)
of the theory, show that $J^\mu$ may be regarded as the electromagnetic current for
this field by demonstrating
(a) current conservation (use the Klein–Gordon equation for φ):

$$ \partial_\mu J^\mu(x) = 0 $$

(b) that the charge Q defined as

$$ Q \equiv \int d^3x\, J^0(\vec{x}, t) $$

counts electric charge (defined as e times the difference in the number of particles
and antiparticles), and is time-independent.
4. Show that the normal-ordered product of four self-conjugate scalar fields at the
same spacetime point can be written in terms of even powers of the field (and is
therefore itself local); i.e., show
: φ(x)4 := φ(x)4 + Aφ(x)2 + B
where A, B are c-numbers (independent of x).
5. The following exercise shows that the free Hamiltonian H0 can be written as an
integral of a density constructed from the local self-conjugate field φ.
(a) Show that

$$ H_0 = \int d^3k\, E(k)\, a^\dagger(k)a(k) $$

i.e., show that an arbitrary free state $|k_1, k_2, \ldots k_n\rangle$ has the appropriate eigenvalue
relative to this operator.
(b) Next, show that

$$ \begin{aligned} \frac{1}{2}\int d^3x\, \{(\frac{\partial\phi}{\partial t})^2 + |\vec{\nabla}\phi|^2 + m^2\phi^2\} &= \int d^3k\, E(k)\,(a^\dagger(k)a(k) + a(k)a^\dagger(k))/2 \\ &= H_0 + \frac{1}{2}\delta^3(0) \int d^3k\, E(k) \end{aligned} $$

Note that the singular zero-point energy is removed by normal-ordering $H_0$, as
in (6.89).
6. Verify the expressions for the number operator (6.92, 6.93, 6.94).
7. Verify the commutator results listed in Eqs. (6.96–6.99).
8. Show that the expectation value of the free scalar energy density in the state
(6.105) is as given in (6.112).
9. Let $g_k(x) = \frac{1}{\sqrt{(2\pi)^3 2E(k)}}\, e^{-ik\cdot x}$ (k and x are four-vectors, with $k_0 = E(k)$). Show
that the destruction operator a(k) may be reconstructed from a self-conjugate
scalar field by

$$ a(k) = i \int d^3x\, (g_k^*(\vec{x}, t)\, \partial_0\phi(\vec{x}, t) - \phi(\vec{x}, t)\, \partial_0 g_k^*(\vec{x}, t)) \tag{6.132} $$
7
Dynamics V: Construction of local
covariant fields

In many of the standard texts on quantum field theory, the introduction of fields
representing particles of low spin (zero, 1/2, or one; there is no direct phenomenological
evidence for elementary particles of any higher spin) is a fairly ad hoc matter. Relativistic
wave equations are introduced and shown to have “nice” covariance properties.
A Lagrangian formalism is then constructed for which these equations are just the
Euler–Lagrange equations of the theory, corresponding to the extremal condition
on the classical action. Finally, a canonical quantization procedure is carried out:
conjugate momentum fields are introduced, and the resultant Hamiltonian is shown
to be the appropriate energy operator for particles of the desired mass and spin.
In this chapter we shall eschew this ad hoc methodology in favor of a more
direct, constructive approach. The relativistic wave equations satisfied by the covariant
fields representing particles of low spin are shown to be automatic consequences
of the representation theory of the Poincaré group, which can be used to write a
completely general expression for the fields transforming according to an arbitrary
finite-dimensional representation of the Lorentz group and representing particles of
arbitrary mass and spin.
The advantage of this approach is twofold. In the first place, it will become apparent
in a completely natural way that there is an inevitable, but completely classifiable,
fluidity in the association of fields to particles: many different covariant fields may be
used with equal validity to represent a particle of a given spin, although there is usually
a “best” (i.e., most convenient) choice. Secondly, the formalism allows us to solve the
problem of constructing covariant fields in one fell swoop: the special cases of spin
zero, 1/2, or 1 then follow from the general result simply by inserting j = 0, 1/2, 1 into the
master formula. In particular, the Spin-Statistics theorem associating particles with
integral (resp. half-integral) spin with fields of bosonic (resp. fermionic) type (at the
free field level) will emerge naturally in the framework of this formalism.

7.1 Constructing local, Lorentz-invariant Hamiltonians


We saw previously in Chapter 5 that one simple way to achieve unitarity, Lorentz
invariance and locality of a theory of interacting particles was to express the interaction
Hamiltonian as the spatial integral of a Lorentz scalar local field. In Chapter 6 we saw
that theories of interacting spinless particles are readily constructed by taking the
Hamiltonian density as a polynomial in canonical scalar fields which are linear in
creation and annihilation operators of the particles in question. Unfortunately, such

simple scalar canonical fields can never produce particles of non-zero spin acting on
the vacuum. From the defining transformation law for a scalar field

U (Λ)φ(x)U † (Λ) = φ(Λx) (7.1)

one finds, choosing Λ to be the infinitesimal rotation by δθ around the ith axis,
$(R_i x)_j = x_j + \epsilon_{ijk}x_k\,\delta\theta$, and $U(\Lambda) = e^{-i\delta\theta J_i}$,

$$ [J_i, \phi(x)] = i\epsilon_{ijk}\, x_k \partial_j \phi(x) \tag{7.2} $$

Recall that the spin of a particle is simply the residual angular momentum it possesses
when at rest: i.e., at zero linear momentum. Any state, formed by φ acting on the
vacuum with zero spatial momentum (which we achieve by integrating over 3-space to
project out the creation operator at zero momentum), must then have zero angular
momentum as well, as we discover by a simple integration by parts:

$$ \begin{aligned} J_i \int d^3x\, \phi(x)|0\rangle &= i \int d^3x\, \epsilon_{ijk}\, x_k \partial_j \phi(x)|0\rangle \\ &= -i \int d^3x\, \phi(x)\, \epsilon_{ijk}\, \partial_j x_k |0\rangle \\ &= 0 \end{aligned} \tag{7.3} $$

It is clear that more general fields are needed to describe particles of non-vanishing
spin. Nevertheless, it will be important to be able to combine such fields to again
produce an interaction Hamiltonian density which is itself a Lorentz scalar field,
satisfying (7.1) above. The solution is to construct fields that transform according
to definite finite-dimensional representations of the homogeneous Lorentz group. For
example, two vector fields $A^\mu(x)$, $B^\mu(x)$ transforming like

$$ U(\Lambda)(A, B)^\mu(x)U^\dagger(\Lambda) = (\Lambda^{-1})^\mu{}_\nu\, (A, B)^\nu(\Lambda x) \tag{7.4} $$

can be coupled together to make a scalar field $C(x) \equiv A^\mu(x)B_\mu(x)$, which is easily seen
to satisfy (7.1). Moreover, C(x) is local (commutes with itself at space-like separation)
if A, B are. The original four-fermion theory of the weak interactions, dating from the
1930s, involved a weak interaction Hamiltonian of precisely this form.
A general covariant field will be a set of field operators transforming according to
a general finite-dimensional representation of the homogeneous Lorentz group realized
by the finite-dimensional matrices $M_{nm}(\Lambda)$:

$$ U(\Lambda)\phi_n(x)U^\dagger(\Lambda) = M_{nm}(\Lambda^{-1})\, \phi_m(\Lambda x) \tag{7.5} $$

Lorentz scalar fields can be constructed from such covariant fields by coupling them
together with invariant tensors $t_{n_1 n_2 \ldots}$ of the HLG; namely

$$ H(x) = t_{n_1 n_2 \ldots}\, \phi_{n_1}\phi_{n_2}\ldots \tag{7.6} $$

is a Lorentz scalar field if

$$ t_{n_1 n_2 \ldots}\, M_{n_1 m_1}(\Lambda^{-1}) M_{n_2 m_2}(\Lambda^{-1})\ldots = t_{m_1 m_2 \ldots}, \quad \text{all } \Lambda \tag{7.7} $$

Of course, the simplest example of such a tensor is the two index Minkowski-space
metric tensor $g_{\mu\nu}$, employed above to construct a scalar field C as the invariant dot-
product of two vector fields $A^\mu$, $B^\nu$.
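
The invariance of the metric tensor can be illustrated numerically. The following is a minimal sketch, assuming the signature (+, −, −, −) and a boost along the x axis with an arbitrarily chosen rapidity; the helper names `boost_x`, `apply`, `dot`, and `metric_is_invariant` are hypothetical and introduced only for this illustration.

```python
import math

# Minkowski metric, assumed signature (+, -, -, -)
G = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

def boost_x(rapidity):
    """Lorentz boost along the x axis, as a matrix Lambda^mu_nu."""
    ch, sh = math.cosh(rapidity), math.sinh(rapidity)
    return [[ch, sh, 0, 0], [sh, ch, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def apply(lam, v):
    """Transform the components of a four-vector: v^mu -> Lambda^mu_nu v^nu."""
    return [sum(lam[m][n] * v[n] for n in range(4)) for m in range(4)]

def dot(a, b):
    """Invariant product g_{mu nu} a^mu b^nu."""
    return sum(G[m][n] * a[m] * b[n] for m in range(4) for n in range(4))

def metric_is_invariant(lam, tol=1e-12):
    """Check Lambda^mu_rho g_{mu nu} Lambda^nu_sigma = g_{rho sigma}."""
    return all(
        abs(sum(lam[m][r] * G[m][n] * lam[n][s]
                for m in range(4) for n in range(4)) - G[r][s]) < tol
        for r in range(4) for s in range(4)
    )

lam = boost_x(0.7)              # arbitrary illustrative rapidity
a = [1.0, 2.0, 3.0, 4.0]        # arbitrary illustrative four-vectors
b = [0.5, -1.0, 2.0, 0.25]
print(metric_is_invariant(lam), dot(a, b), dot(apply(lam, a), apply(lam, b)))
```

The two printed dot products agree up to rounding, which is the component-level statement that $A^\mu B_\mu$ is a scalar.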
In addition we shall require translation invariance just as for scalar fields (cf(5.91)):

$$ [P^{(0)}_\mu, \phi_n(x)] = -i\partial_\mu \phi_n(x) \tag{7.8} $$

and locality

$$ [\phi_n(x), \phi_m(y)] = [\phi_n(x), \phi^\dagger_m(y)] = 0, \qquad (x - y)^2 < 0 \tag{7.9} $$

7.2 Finite-dimensional representations of the homogeneous Lorentz group

In this section we shall develop the basic theory of finite-dimensional representations of
the homogeneous Lorentz group (HLG), which will lead us to a complete classification
of local covariant fields. Consider an infinitesimal Lorentz transformation

$$ \Lambda^\mu{}_\nu = g^\mu{}_\nu + \Omega^\mu{}_\nu + O(\Omega^2) \tag{7.10} $$

where $\Omega^\mu{}_\nu$ is a matrix of infinitesimally small parameters (“rotation angles”). In fact,
this matrix is forced to be antisymmetric:

$$ \Lambda^\mu{}_\nu \Lambda_{\mu\rho} = (g^\mu{}_\nu + \Omega^\mu{}_\nu)(g_{\mu\rho} + \Omega_{\mu\rho}) + O(\Omega^2) = g_{\nu\rho} \tag{7.11} $$

$$ \Longrightarrow\ \Omega_{\rho\nu} + \Omega_{\nu\rho} = 0 \tag{7.12} $$

which implies that there are six independent parameters in the specification of an
arbitrary infinitesimal Lorentz transformation (three angles and three boosts). In a
general finite-dimensional representation, Λ is represented by the matrix

$$ M_{nm}(\Lambda) = \delta_{nm} + \frac{i}{2}\Omega_{\mu\nu}(J^{\mu\nu})_{nm} + O(\Omega^2) \tag{7.13} $$

where the six independent matrices $J^{12}, J^{23}, J^{31}, J^{01}, J^{02}, J^{03}$ are the generators of
rotations (around the z, x, and y axes, respectively) and boosts (along the x, y, and z
axes, respectively). Now consider some definite fixed $\bar\Lambda = 1 + \bar\Omega$ which we subject to
a similarity transformation with a general M(Λ):

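$$ M(\Lambda)M(\bar\Lambda)M(\Lambda^{-1}) = M(\Lambda\bar\Lambda\Lambda^{-1}) \tag{7.14} $$
$$ = M(1 + \Lambda\bar\Omega\Lambda^{-1}) \tag{7.15} $$
$$ = 1 + \frac{i}{2}(\Lambda\bar\Omega\Lambda^{-1})_{\rho\sigma}J^{\rho\sigma} \tag{7.16} $$
$$ = 1 + \frac{i}{2}\bar\Omega_{\mu\nu}\Lambda_\rho{}^\mu \Lambda_\sigma{}^\nu J^{\rho\sigma} \tag{7.17} $$

On the other hand,

$$ M(\Lambda)M(\bar\Lambda)M^{-1}(\Lambda) = 1 + \frac{i}{2}M(\Lambda)\,\bar\Omega_{\mu\nu}J^{\mu\nu}\, M^{-1}(\Lambda) \tag{7.18} $$

Comparing (7.17) and (7.18), we see that the generators $J^{\mu\nu}$ transform as second-rank
contravariant tensors under the Lorentz group:

$$ M(\Lambda)J^{\mu\nu}M^{-1}(\Lambda) = \Lambda_\rho{}^\mu \Lambda_\sigma{}^\nu J^{\rho\sigma} \tag{7.19} $$

Next, starting from (7.19), choose Λ itself infinitesimal, $\Lambda^\mu{}_\nu = g^\mu{}_\nu + \omega^\mu{}_\nu$, so that keeping
terms of O(ω), we obtain

$$ \begin{aligned} \frac{i}{2}[\omega_{\rho\sigma}J^{\rho\sigma}, J^{\mu\nu}] &= (\omega_\rho{}^\mu g_\sigma{}^\nu + g_\rho{}^\mu \omega_\sigma{}^\nu)J^{\rho\sigma} \\ &= \omega_{\rho\sigma}(g^{\sigma\mu}J^{\rho\nu} - g^{\rho\nu}J^{\mu\sigma}) \\ &= \frac{1}{2}\omega_{\rho\sigma}(g^{\sigma\mu}J^{\rho\nu} + g^{\sigma\nu}J^{\mu\rho} - g^{\rho\nu}J^{\mu\sigma} - g^{\rho\mu}J^{\sigma\nu}) \end{aligned} \tag{7.20} $$

from which follows immediately the full Lie algebra of the HLG:

$$ [J^{\mu\nu}, J^{\rho\sigma}] = i(g^{\mu\sigma}J^{\rho\nu} + g^{\nu\sigma}J^{\mu\rho} - g^{\rho\mu}J^{\sigma\nu} - g^{\rho\nu}J^{\mu\sigma}) \tag{7.21} $$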

If μ, ν are both spatial indices, we are dealing with the rotation subgroup of the HLG,
and the corresponding generators are therefore just the familiar angular momentum
operators:

$$ J_1 \equiv J^{32} = J_{32} \tag{7.22} $$
$$ J_2 \equiv J^{13} = J_{13} \tag{7.23} $$
$$ J_3 \equiv J^{21} = J_{21} \tag{7.24} $$

which satisfy, as an immediate consequence of (7.21), the usual SU(2) algebra

$$ [J_i, J_j] = i\epsilon_{ijk}J_k \tag{7.25} $$

The physical content of the algebra is made more transparent by defining boost
generators

$$ K_1 \equiv J_{10} \tag{7.26} $$
$$ K_2 \equiv J_{20} \tag{7.27} $$
$$ K_3 \equiv J_{30} \tag{7.28} $$

which transform as three-vectors under the rotation group

$$ [J_i, K_j] = i\epsilon_{ijk}K_k \tag{7.29} $$

while satisfying the following commutation relation among themselves:

$$ [K_i, K_j] = -i\epsilon_{ijk}J_k \tag{7.30} $$

The structure of the HLG is further clarified by considering the linear combinations

$$ A_i \equiv \frac{1}{2}(J_i - iK_i), \qquad B_i \equiv \frac{1}{2}(J_i + iK_i) \tag{7.31} $$

whereupon one finds

$$ [A_i, A_j] = i\epsilon_{ijk}A_k, \qquad [B_i, B_j] = i\epsilon_{ijk}B_k \tag{7.32} $$

together with a complete decoupling of the A and B parts of the algebra:

$$ [A_i, B_j] = 0 \tag{7.33} $$

Evidently, the HLG can be regarded as the direct product of two “angular momentum
groups”! A note of caution is necessary here: we parenthesize “angular momentum
groups” because the generators Ai , Bi are not hermitian, so that the groups they
generate are not, strictly speaking, the usual unitary SU(2) group, but rather a
complexified version. Nevertheless, the resolution of the full HLG algebra into two
commuting subgroups tremendously simplifies (in fact, effectively solves) the problem
of classifying all the finite-dimensional representations of HLG.
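
The commutation relations (7.21), (7.25), (7.29), (7.30), (7.32), and (7.33) can be verified explicitly in a concrete representation. The sketch below uses the four-vector representation, for which we assume the generators $(J^{\mu\nu})^\alpha{}_\beta = -i(g^{\alpha\mu}\delta^\nu_\beta - g^{\alpha\nu}\delta^\mu_\beta)$ with metric signature (+, −, −, −); this explicit form and all helper names are assumptions made for the illustration rather than expressions taken from the text.

```python
from itertools import product

G = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]  # g^{mu nu}

def gen(mu, nu):
    """Assumed vector-representation generator (J^{mu nu})^a_b."""
    return [[-1j * (G[a][mu] * (nu == b) - G[a][nu] * (mu == b))
             for b in range(4)] for a in range(4)]

def mul(x, y):
    return [[sum(x[a][c] * y[c][b] for c in range(4)) for b in range(4)]
            for a in range(4)]

def comb(*terms):
    """Linear combination of 4x4 matrices from (coefficient, matrix) pairs."""
    return [[sum(c * m[a][b] for c, m in terms) for b in range(4)]
            for a in range(4)]

def comm(x, y):
    return comb((1, mul(x, y)), (-1, mul(y, x)))

def close(x, y, tol=1e-12):
    return all(abs(x[a][b] - y[a][b]) < tol for a in range(4) for b in range(4))

def eps(i, j, k):
    """Levi-Civita symbol for indices drawn from {1, 2, 3}."""
    return (i - j) * (j - k) * (k - i) / 2

IDX = (1, 2, 3)
J = {1: gen(3, 2), 2: gen(1, 3), 3: gen(2, 1)}         # (7.22)-(7.24)
K = {i: comb((-1, gen(i, 0))) for i in IDX}            # K_i = J_{i0} = -J^{i0}
A = {i: comb((0.5, J[i]), (-0.5j, K[i])) for i in IDX}  # (7.31)
B = {i: comb((0.5, J[i]), (0.5j, K[i])) for i in IDX}

def lorentz_algebra_holds():
    """Check (7.21) for every choice of the four indices."""
    for mu, nu, rho, sig in product(range(4), repeat=4):
        rhs = comb((1j * G[mu][sig], gen(rho, nu)), (1j * G[nu][sig], gen(mu, rho)),
                   (-1j * G[rho][mu], gen(sig, nu)), (-1j * G[rho][nu], gen(mu, sig)))
        if not close(comm(gen(mu, nu), gen(rho, sig)), rhs):
            return False
    return True

def su2_like(x, y, z, sign=1):
    """Check [x_i, y_j] = sign * i * eps_ijk z_k for all i, j."""
    return all(close(comm(x[i], y[j]),
                     comb(*[(sign * 1j * eps(i, j, k), z[k]) for k in IDX]))
               for i in IDX for j in IDX)

def a_and_b_decouple():
    zero = comb((0, J[1]))
    return all(close(comm(A[i], B[j]), zero) for i in IDX for j in IDX)

print(lorentz_algebra_holds(), su2_like(J, J, J), su2_like(J, K, K),
      su2_like(K, K, J, sign=-1), su2_like(A, A, A), su2_like(B, B, B),
      a_and_b_decouple())
```

In this representation the $A_i$ and $B_i$ are visibly non-hermitian, since the boost generators $K_i$ are antihermitian matrices, in line with the note of caution above.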
General representations of the HLG can be labeled by a spin pair (A, B), where
A, B are integers or half-integers. Within a representation (A, B), a state is labeled
by (a, b), where as usual a = −A, −A + 1, ..., +A, b = −B, −B + 1, ..., +B. Since the
usual angular momentum $\vec{J} = \vec{A} + \vec{B}$, we can only describe particles of spin j by fields
(A, B) where A and B can be coupled together to make angular momentum j, i.e.,

$$ |A - B| \leq j \leq |A + B| \tag{7.34} $$

On the other hand, any field satisfying this constraint can be used to represent a
particle of the given spin! This is a central feature of field theory which appears at
this point clearly for the first time: there is no unique correspondence between particles
and fields. A given particle can be represented by a variety of covariant fields, and (as
we shall see later) a given field can also represent many particle states.
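
The multiplicity of fields available for a given spin is easy to enumerate. The sketch below lists the pairs (A, B) up to a chosen cutoff that satisfy (7.34); it also imposes the standard angular-momentum addition rule that j differs from A + B by an integer, which is an assumption added here beyond the inequality itself. The function name and the cutoff parameter are hypothetical.

```python
from fractions import Fraction

def allowed_representations(j, max_label=Fraction(3, 2)):
    """Pairs (A, B), with A, B in {0, 1/2, ..., max_label}, able to carry spin j.

    Conditions: |A - B| <= j <= A + B (Eq. (7.34)), and A + B - j an integer
    (the usual rule for adding angular momenta, assumed here).
    """
    j = Fraction(j)
    labels = [Fraction(n, 2) for n in range(int(2 * Fraction(max_label)) + 1)]
    result = []
    for a in labels:
        for b in labels:
            in_range = abs(a - b) <= j <= a + b
            integer_step = (a + b - j).denominator == 1
            if in_range and integer_step:
                result.append((a, b))
    return result

# Spin 1/2 with labels up to 1: the two chiral spinors plus two higher options.
for a, b in allowed_representations(Fraction(1, 2), max_label=1):
    print(f"({a}, {b})")
```

For j = 1 with labels up to 1/2, the only entry is (1/2, 1/2), the four-vector representation.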
The construction of scalar quantities under the HLG is analogous to the problem
of coupling non-zero angular momenta to net zero spin in the theory of the rotation
group—except that we have here to worry about two “rotation groups”! We shall
soon see that the behavior of these representations under (i) complex conjugation,
and (ii) spatial inversion (parity) play a particularly important role in understanding
how to accomplish this. First, we make some comments about the effect of complex
conjugation. This becomes an issue for spinorial (half-integral) representations: the
fundamental (spin-1/2) representation, for example, is complex but pseudoreal.
In other words, the 2-spinor representation of SU(2) is isomorphic to its complex
conjugate. In the case of the HLG, with its doubled SU(2) structure, conjugation has
the additional effect of interchanging the A and B quantum numbers. To see this,
consider the M(Λ) representation matrix for the (1/2, 0) representation, with Λ a finite
HLG element with finite parameters $\Omega_{\mu\nu}$ (cf. Eq. (7.13) for the infinitesimal case):

$$ M^{(\frac{1}{2},0)}(\Lambda) = e^{\frac{i}{2}\Omega_{\mu\nu}J^{\mu\nu}} \tag{7.35} $$

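Defining angle and boost vectors by $\vec\phi = (\Omega_{23}, \Omega_{31}, \Omega_{12})$, $\vec\xi = (\Omega_{10}, \Omega_{20}, \Omega_{30})$, and
recalling that $J^{i0} = -K_i = -\frac{i}{2}\sigma_i$ (i = 1, 2, 3), $J^{ij} = -\epsilon_{ijk}J_k = -\frac{1}{2}\epsilon_{ijk}\sigma_k$ for the (1/2, 0)
representation, with $\sigma_i$ the conventional Pauli σ matrices, this becomes

$$ M^{(\frac{1}{2},0)}(\Lambda) = e^{\frac{1}{2}\vec\xi\cdot\vec\sigma - \frac{i}{2}\vec\phi\cdot\vec\sigma} \tag{7.36} $$

Likewise, the corresponding finite transformation matrix for the (0, 1/2) representation
is

$$ M^{(0,\frac{1}{2})}(\Lambda) = e^{-\frac{1}{2}\vec\xi\cdot\vec\sigma - \frac{i}{2}\vec\phi\cdot\vec\sigma} \tag{7.37} $$

Thus if a (1/2, 0) spinor $\chi_\alpha$, α = 1, 2 transforms under a Lorentz transformation as

$$ \chi_\alpha \to (e^{\frac{1}{2}\vec\xi\cdot\vec\sigma - \frac{i}{2}\vec\phi\cdot\vec\sigma})_{\alpha\beta}\, \chi_\beta \tag{7.38} $$

the conjugate spinor will transform as

$$ \chi^*_\alpha \to (e^{\frac{1}{2}\vec\xi\cdot\vec\sigma^* + \frac{i}{2}\vec\phi\cdot\vec\sigma^*})_{\alpha\beta}\, \chi^*_\beta \tag{7.39} $$

The pseudoreality property of the spinor representation alluded to above amounts
to the following conjugation property of the Pauli matrices: defining the spinor
conjugation matrix

$$ C_s \equiv i\sigma_2 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \tag{7.40} $$

we have

$$ C_s \vec\sigma C_s^{-1} = -\vec\sigma^*, \qquad C_s \vec\sigma^* C_s^{-1} = -\vec\sigma \tag{7.41} $$

whence the conjugation transformation (7.39) becomes

$$ \begin{aligned} (C_s\chi^*)_\alpha &\to (C_s\, e^{\frac{1}{2}\vec\xi\cdot\vec\sigma^* + \frac{i}{2}\vec\phi\cdot\vec\sigma^*}\, C_s^{-1})_{\alpha\beta}\, (C_s\chi^*)_\beta \\ &= (e^{-\frac{1}{2}\vec\xi\cdot\vec\sigma - \frac{i}{2}\vec\phi\cdot\vec\sigma})_{\alpha\beta}\, (C_s\chi^*)_\beta \end{aligned} \tag{7.42} $$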
Comparing this result with (7.37), we see that the conjugate spinor $C_s\chi^*$ transforms
appropriately for a (0, 1/2) representation: conjugation has reversed the roles of the A
and B quantum numbers. This is hardly surprising, given Eqs. (7.31), as the $\vec{J}$ (resp.
$\vec{K}$) generators are represented by hermitian (resp. antihermitian) matrices.
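
The conjugation property (7.41) and its consequence (7.42) can be confirmed numerically. The sketch below is a minimal check using 2×2 matrices and a truncated Taylor series for the matrix exponential; the particular values of $\vec\xi$ and $\vec\phi$ are arbitrary illustrative choices, and the helper names are hypothetical.

```python
SIGMA = {
    1: [[0, 1], [1, 0]],
    2: [[0, -1j], [1j, 0]],
    3: [[1, 0], [0, -1]],
}
C_S = [[0, 1], [-1, 0]]       # C_s = i sigma_2, Eq. (7.40)
C_S_INV = [[0, -1], [1, 0]]

def mul(x, y):
    return [[sum(x[a][c] * y[c][b] for c in range(2)) for b in range(2)]
            for a in range(2)]

def conj(x):
    return [[complex(v).conjugate() for v in row] for row in x]

def comb(*terms):
    """Linear combination of 2x2 matrices from (coefficient, matrix) pairs."""
    return [[sum(c * m[a][b] for c, m in terms) for b in range(2)]
            for a in range(2)]

def close(x, y, tol=1e-10):
    return all(abs(x[a][b] - y[a][b]) < tol for a in range(2) for b in range(2))

def expm(x, terms=40):
    """Matrix exponential by truncated Taylor series (adequate for small norms)."""
    result = [[1, 0], [0, 1]]
    term = [[1, 0], [0, 1]]
    for n in range(1, terms):
        term = comb((1.0 / n, mul(term, x)))
        result = comb((1, result), (1, term))
    return result

def spinor_matrix(xi, phi, boost_sign):
    """exp(boost_sign * xi.sigma/2 - i phi.sigma/2): +1 is (7.36), -1 is (7.37)."""
    exponent = comb(*[(boost_sign * 0.5 * xi[i - 1] - 0.5j * phi[i - 1], SIGMA[i])
                      for i in (1, 2, 3)])
    return expm(exponent)

def pauli_conjugation_holds():
    """Check C_s sigma_i C_s^{-1} = -sigma_i^* for i = 1, 2, 3 (Eq. (7.41))."""
    return all(close(mul(mul(C_S, SIGMA[i]), C_S_INV), comb((-1, conj(SIGMA[i]))))
               for i in (1, 2, 3))

def conjugate_transforms_as_zero_half(xi, phi):
    """Check C_s (M^{(1/2,0)})^* C_s^{-1} = M^{(0,1/2)} (Eq. (7.42))."""
    conjugated = mul(mul(C_S, conj(spinor_matrix(xi, phi, +1))), C_S_INV)
    return close(conjugated, spinor_matrix(xi, phi, -1))

xi, phi = (0.3, -0.2, 0.5), (0.7, 0.1, -0.4)   # arbitrary illustrative parameters
print(pauli_conjugation_holds(), conjugate_transforms_as_zero_half(xi, phi))
```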
Next, we consider the effects of a parity transformation: namely, an improper
Lorentz transformation (i.e., with determinant equal to negative unity) reversing the
sign of the three spatial coordinates while leaving the time coordinate unchanged.
Evidently, the angular momentum generators $\vec{J}$, with two spatial indices (see (7.22)),
are even under a parity transformation, while the boost generators $\vec{K}$, with a single
spatial index (see (7.26)), are odd. In common parlance, angular momentum $\vec{J}$ is an
axial vector, the boost vector $\vec{K}$ a polar vector. A glance at the definitions (7.31) shows
that the parity transformation has the effect of interchanging the A and B labels
of a given irreducible representation (A, B) of the HLG. If the interactions of our
particles exactly conserve parity (as in the strong and electromagnetic interactions),
then the Hamiltonian cannot change under a parity transformation, so that if we
employ fields transforming under a representation (A, B), with $A \neq B$ (note: such
fields are termed chiral), then the Hamiltonian must also contain, symmetrically,
fields transforming according to the representation (B, A). Indeed, for fermions, the
Spin-Statistics theorem (discussed below) implies j half-integral, whence, by (7.34),
we necessarily have $A \neq B$. Parity may be preserved in this case either by employing
conjugate representations as discussed above (leading to Majorana fermions, cf. Section
7.4.1), or by the use of reducible representations of the HLG, such as (A, B) ⊕ (B, A)
(Dirac fermions, Section 7.4.2).

7.3 Local covariant fields for massive particles of any spin: the
Spin-Statistics theorem
We now turn to the task of explicitly constructing local canonical covariant fields,
linear in creation and annihilation operators, for (massive) particles of any spin. This
section will correspondingly be algebraically somewhat more dense than most, but
the end results will more than merit the effort expended: just a few pages will suffice
to establish the general form of covariant fields of any spin, from which with very
little further effort flows the whole panoply of relativistic wave equations (Klein–
Gordon, Dirac, Maxwell–Proca, etc.) typically introduced in a more or less ad hoc
fashion in earlier generations of field theory texts.1 Moreover, a profound consequence
of relativistic field theory—the Spin-Statistics connection—will emerge naturally as
part of the construction.
The most general canonical field linear in creation and annihilation operators can
be written
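$$ \phi_n(x) = \sum_\sigma \int \frac{d^3k}{(2\pi)^{3/2}\sqrt{2E(k)}}\, (u_n(\vec{k}, \sigma)\, a(\vec{k}, \sigma)\, e^{-ik\cdot x} + v_n(\vec{k}, \sigma)\, a^{c\dagger}(\vec{k}, \sigma)\, e^{ik\cdot x}) \tag{7.43} $$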

Our task is to determine the coefficient functions un (k, σ), vn (k, σ) in order to satisfy
the requirements of Poincaré covariance and locality, namely (7.5, 7.8, 7.9). First, note

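$$ \begin{aligned} U(\Lambda)\phi_n(x)U^\dagger(\Lambda) = \sum_{\sigma\sigma'} \int \frac{d^3k}{(2\pi)^{3/2}\sqrt{2E(k)}} \sqrt{\frac{E(\Lambda k)}{E(k)}}\, \{ & u_n(\vec{k}, \sigma)\, e^{-ik\cdot x}\, D^j_{\sigma\sigma'}(W^{-1}(\Lambda, k))\, a(\Lambda k, \sigma') \\ & + v_n(\vec{k}, \sigma)\, e^{ik\cdot x}\, D^j_{\sigma'\sigma}(W(\Lambda, k))\, a^{c\dagger}(\Lambda k, \sigma') \} \end{aligned} \tag{7.44} $$

On the other hand, this must equal, by (7.5),

$$ \begin{aligned} \sum_{m, \sigma'} M_{nm}(\Lambda^{-1}) \int \frac{d^3k}{(2\pi)^{3/2}\sqrt{2E(k)}} \sqrt{\frac{E(\Lambda k)}{E(k)}}\, \{ & u_m(\Lambda k, \sigma')\, e^{-ik\cdot x}\, a(\Lambda k, \sigma') \\ & + v_m(\Lambda k, \sigma')\, e^{ik\cdot x}\, a^{c\dagger}(\Lambda k, \sigma') \} \end{aligned} $$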

1 The construction of covariant fields for any spin in a unified way employing the representation theory
of the Poincaré group was first carried out in a seminal paper by Weinberg (Weinberg, 1964a).

where we have made a change of variable k → Λk in the momentum integration.


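Comparing coefficients of $a(\Lambda k, \sigma')$, $a^{c\dagger}(\Lambda k, \sigma')$ we find

$$ \sum_\sigma D^j_{\sigma\sigma'}(W^{-1}(\Lambda, k))\, u_n(\vec{k}, \sigma) = \sum_m M_{nm}(\Lambda^{-1})\, u_m(\Lambda k, \sigma') \tag{7.45} $$

$$ \sum_\sigma D^j_{\sigma'\sigma}(W(\Lambda, k))\, v_n(\vec{k}, \sigma) = \sum_m M_{nm}(\Lambda^{-1})\, v_m(\Lambda k, \sigma') \tag{7.46} $$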

The un (k, σ), vn (k, σ) should be regarded as connection coefficients between the finite-
dimensional (n index) field representations of the HLG and the infinite-dimensional
unitary Fock-space representation of the single-particle states (labeled by k and
σ). The constraints (7.45) and (7.46) will turn out to uniquely determine these
coefficients and hence the desired covariant field operators for any spin, up to an
obvious normalization and phase freedom. They also imply, as we shall see, that
the covariant field operators necessarily satisfy certain partial differential equations
in coordinate space (commonly called “relativistic wave equations”): in conventional
presentations of field theory these equations arise (somewhat magically) as Euler–
Lagrange equations of relativistically invariant actions. Here the constraints appear
naturally as a consequence of connecting the covariance of the field operator to the
underlying unitary Fock-space structure. We shall now show how these constraints
may be explicitly solved for arbitrary spin j.
Recall the definition of the Wigner rotation: for general Lorentz transformation Λ,

W (Λ, k) = L−1 (Λk)ΛL(k) (7.47)

Suppose k = 0. Then, if Λ = R, a rotation, W(R, 0) = R and we obtain from (7.45)
\[
\sum_\sigma D^j_{\sigma\sigma'}(R^{-1})\, u_n(0,\sigma) = \sum_m M_{nm}(R^{-1})\, u_m(0,\sigma') \tag{7.48}
\]

so that a unitary rotation on the field (m) index can be transferred to a unitary spin-j
rotation on the particle-spin index (σ). In an irreducible representation, this constraint
will determine un (0, σ) up to overall normalization.
Similarly, for general k, choosing Λ = L^{-1}(k) (so that Λk = 0 and ΛL(k) = 1),
\[
W(\Lambda,k) = L^{-1}(\Lambda k) = L^{-1}(0) = 1 \tag{7.49}
\]
so
\[
u_n(k,\sigma) = \sum_m M_{nm}(L(k))\, u_m(0,\sigma) \tag{7.50}
\]

so the full coefficient function un (k, σ) is determined by a boost once we have used
(7.48) to fix the un s at zero momentum.
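The two special cases of the Wigner rotation used above can be checked numerically. The following is a minimal sketch, assuming the metric signature (+,−,−,−) and taking the standard boost L(k) to be the pure boost that carries the rest momentum (m, 0, 0, 0) to (E(k), k); the function names and the sample momenta are illustrative choices, not taken from the text.

```python
import numpy as np

def standard_boost(k, m):
    """L(k): pure boost taking the rest four-momentum (m, 0, 0, 0) to (E(k), k)."""
    k = np.asarray(k, dtype=float)
    E = np.sqrt(m**2 + k @ k)
    L = np.eye(4)
    L[0, 0] = E / m
    L[0, 1:] = k / m
    L[1:, 0] = k / m
    L[1:, 1:] += np.outer(k, k) / (m * (E + m))
    return L

def rotation_z(phi):
    """Rotation by angle phi about the z-axis, as a 4x4 Lorentz matrix."""
    R = np.eye(4)
    R[1, 1] = R[2, 2] = np.cos(phi)
    R[1, 2], R[2, 1] = -np.sin(phi), np.sin(phi)
    return R

def wigner_rotation(Lam, k, m):
    """W(Lambda, k) = L^{-1}(Lambda k) Lambda L(k), eq. (7.47)."""
    k = np.asarray(k, dtype=float)
    k4 = np.concatenate(([np.sqrt(m**2 + k @ k)], k))
    Lam_k = Lam @ k4
    return np.linalg.inv(standard_boost(Lam_k[1:], m)) @ Lam @ standard_boost(k, m)

m = 1.0
k = np.array([0.3, -0.2, 0.5])          # illustrative momentum
R = rotation_z(0.7)

W_rest = wigner_rotation(R, np.zeros(3), m)                               # W(R, 0)
W_unboost = wigner_rotation(np.linalg.inv(standard_boost(k, m)), k, m)    # Lambda = L^{-1}(k)
W_generic = wigner_rotation(standard_boost([0.4, 0.1, -0.6], m) @ R, k, m)
```

Here `W_rest` should reproduce R, `W_unboost` the identity, and `W_generic` should be a pure rotation (it leaves the time direction fixed), which is the property that lets it act through the spin-j matrices D^j.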
For the v_n coefficient functions, using (7.46), and the property of rotation matrices
\[
D^j_{\sigma'\sigma}(W(\Lambda,k)) = (-)^{\sigma'-\sigma}\, D^j_{-\sigma,-\sigma'}(W^{-1}(\Lambda,k)) \tag{7.51}
\]
Local covariant fields for massive particles of any spin: the Spin-Statistics theorem 179

one finds
\[
\sum_\sigma D^j_{\sigma\sigma'}(R)\,(-)^{j+\sigma}\, v_n(0,-\sigma) = \sum_m M_{nm}(R)\,(-)^{j+\sigma'}\, v_m(0,-\sigma') \tag{7.52}
\]
which implies (−)^{j+σ} v_n(0, −σ) = ξ u_n(0, σ) (ξ a so far arbitrary constant) as both
satisfy (7.48), while the v_n s at non-zero momentum are again obtained by a boost:
\[
v_n(k,\sigma) = \sum_m M_{nm}(L(k))\, v_m(0,\sigma) \tag{7.53}
\]
so that finally
\[
v_n(k,\sigma) = \xi\,(-)^{j-\sigma}\, u_n(k,-\sigma) \tag{7.54}
\]

We now turn to the task of explicitly solving the constraints (7.48, 7.50). Here
we shall need the results of the preceding section, in which the finite-dimensional
representations of the HLG (i.e., the structure of the representation matrices Mnm (Λ))
were classified. We shall use the (AB; ab) notation described above to identify specific
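representations of the HLG and components within a representation. Thus
\[
\phi_n \to \phi^{(AB)}_{ab}, \qquad u_n \to u^{(AB)}_{ab} \tag{7.55}
\]
With this notation, the constraint (7.48) now reads
\[
\sum_{\sigma'} D^j_{\sigma'\sigma}(R)\, u_{ab}(0,\sigma') = \sum_{a'b'} (e^{-i\vec A\cdot\vec\alpha})_{aa'}\,(e^{-i\vec B\cdot\vec\alpha})_{bb'}\, u_{a'b'}(0,\sigma) \tag{7.56}
\]
\[
= \sum_{a'b'} D^A_{aa'}(R)\, D^B_{bb'}(R)\, u_{a'b'}(0,\sigma) \tag{7.57}
\]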
where R is the rotation e^{−iJ·α}. Now suppose that |Aa, Bb⟩ denotes a set of eigenstates
of A², A_3, B², B_3 and one constructs the state
\[
|\sigma\rangle \equiv \sum_{ab} u_{ab}(0,\sigma)\,|Aa,Bb\rangle \tag{7.58}
\]
Then (7.57) just says that the |σ⟩ states transform as an irreducible representation of
the rotations induced by the angular momentum operator J = A + B, corresponding
to eigenvalue J² = j(j + 1). Proof: under a simultaneous rotation e^{−iA·α} e^{−iB·α}
\[
|Aa,Bb\rangle \to \sum_{a'b'} D^A_{a'a}(\vec\alpha)\, D^B_{b'b}(\vec\alpha)\,|Aa',Bb'\rangle \tag{7.59}
\]

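so the above constraint amounts to
\[
e^{-i\vec J\cdot\vec\alpha}|\sigma\rangle = \sum_{ab} u_{ab}(0,\sigma)\, e^{-i\vec A\cdot\vec\alpha} e^{-i\vec B\cdot\vec\alpha}\,|Aa,Bb\rangle \tag{7.60}
\]
\[
= \sum_{ab,a'b'} u_{ab}(0,\sigma)\, D^A_{a'a}(\vec\alpha)\, D^B_{b'b}(\vec\alpha)\,|Aa',Bb'\rangle \tag{7.61}
\]
\[
= \sum_{\sigma',a'b'} D^j_{\sigma'\sigma}(\vec\alpha)\, u_{a'b'}(0,\sigma')\,|Aa',Bb'\rangle \tag{7.62}
\]
\[
= \sum_{\sigma'} D^j_{\sigma'\sigma}(\vec\alpha)\,|\sigma'\rangle \tag{7.63}
\]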

i.e., the |σ⟩ span a spin j irreducible representation of the rotation group. The
coefficients which perform the desired coupling (7.58) are just the familiar
Clebsch–Gordan coefficients (unique up to a phase), so we now know the coefficient functions
at zero momentum:
\[
u^{AB}_{ab}(0,\sigma) = \langle A\,B\,a\,b\,|\,j\,\sigma\rangle \tag{7.64}
\]
which (by (7.54)) implies
\[
v^{AB}_{ab}(0,\sigma) = \xi^{AB}\,(-1)^{j-\sigma}\,\langle A\,B\,a\,b\,|\,j\,{-\sigma}\rangle \tag{7.65}
\]
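The simplest nontrivial instance of the constraint (7.57) can be checked numerically: for A = B = ½ coupled to j = 0, the rotation matrix D^0 is unity, so the Clebsch–Gordan array must be invariant under a simultaneous spin-½ rotation of both indices. The sketch below assumes the standard phase convention ⟨½ ½ ½ −½|0 0⟩ = 1/√2; the helper names and the rotation vector are illustrative.

```python
import numpy as np

# Pauli matrices sigma_1, sigma_2, sigma_3
SIGMA = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)

def spin_half_rotation(alpha):
    """D^{1/2}(R) = exp(-i alpha.sigma/2), written in closed form."""
    alpha = np.asarray(alpha, dtype=float)
    angle = np.linalg.norm(alpha)
    if angle == 0.0:
        return np.eye(2, dtype=complex)
    n_dot_sigma = np.einsum("i,ijk->jk", alpha / angle, SIGMA)
    return np.cos(angle / 2) * np.eye(2) - 1j * np.sin(angle / 2) * n_dot_sigma

def rotated(u, alpha):
    """Right-hand side of (7.57): sum over a', b' of D_{aa'} D_{bb'} u_{a'b'}."""
    D = spin_half_rotation(alpha)
    return D @ u @ D.T

# u_{ab}(0) = <1/2 1/2 a b | 0 0>; rows a = +1/2, -1/2 and columns b = +1/2, -1/2
u_singlet = np.array([[0, 1], [-1, 0]], dtype=complex) / np.sqrt(2)
# One component of the j = 1 multiplet, which is NOT rotation invariant by itself
u_triplet_top = np.array([[1, 0], [0, 0]], dtype=complex)

alpha = [0.3, -1.1, 0.7]
singlet_invariant = np.allclose(rotated(u_singlet, alpha), u_singlet)
triplet_invariant = np.allclose(rotated(u_triplet_top, alpha), u_triplet_top)
```

The singlet array is invariant because D ε Dᵀ = det(D) ε with det(D) = 1, while a single j = 1 component mixes with the other members of its multiplet, as (7.57) requires.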

From (7.50), once the coefficient functions are known at zero momentum, they can
be boosted to any non-zero momentum. For this, we need the general expression for
the boost L_{nm}(L(k)) in the (AB) representation. An infinitesimal boost in the ith
direction is realized by the J_{i0} = K_i generator. For example, if k is in the z-direction,
and θ is the rapidity angle of the boost,
\[
L^\rho{}_\sigma(k) \equiv B^\rho{}_\sigma(\theta) =
\begin{pmatrix}
\cosh\theta & 0 & 0 & \sinh\theta \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
\sinh\theta & 0 & 0 & \cosh\theta
\end{pmatrix}
\]
One easily sees that B(θ) = e^{θM}, where M is the matrix
\[
M^\rho{}_\sigma =
\begin{pmatrix}
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0
\end{pmatrix}
\]
which is equal to (in the four-vector representation) i(J_{30})^ρ_σ = iK_3. (In the four-vector
representation, the explicit formula for the generators is
(J_{μν})^ρ_σ = i(g_ν^ρ g_{μσ} − g_μ^ρ g_{νσ});
some boring algebra establishes that these matrices satisfy the Lie algebra (7.21).) It
follows that z-boosts in a general (AB) representation are realized by
\[
L_{nm}(B(\theta)) = (e^{i\theta K_3})_{nm} \tag{7.66}
\]
\[
= (e^{i\theta(i)(A_3 - B_3)})_{nm} \tag{7.67}
\]
\[
= (e^{-\theta A_3}\, e^{+\theta B_3})_{nm} \tag{7.68}
\]
with cosh θ = E(k)/m. For a boost in a general direction we just use k̂·A, k̂·B instead
of A_3, B_3, obtaining finally the general solution for the u coefficient function in an
arbitrary representation of the HLG and for arbitrary spin particles:
\[
u_{ab}(k,\sigma) = \sum_{a'b'} (e^{-\theta\hat k\cdot\vec A})_{aa'}\,(e^{+\theta\hat k\cdot\vec B})_{bb'}\,\langle A\,B\,a'\,b'\,|\,j\,\sigma\rangle \tag{7.69}
\]

with (using (7.54)) an obvious corresponding equation for vab (k, σ).
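The rapidity parametrization used above is easy to confirm numerically in the four-vector representation. The sketch below exponentiates the generator matrix M with a plain Taylor series (the helper `expm_series` is an illustrative stand-in, not a library routine) and compares the result with the closed-form boost B(θ); the mass and momentum values are arbitrary.

```python
import numpy as np

def expm_series(X, terms=60):
    """Matrix exponential by its Taylor series (adequate for small matrices)."""
    result = np.eye(X.shape[0], dtype=X.dtype)
    term = np.eye(X.shape[0], dtype=X.dtype)
    for n in range(1, terms):
        term = term @ X / n
        result = result + term
    return result

# Generator of z-boosts in the four-vector representation, M = i K_3
M = np.zeros((4, 4))
M[0, 3] = M[3, 0] = 1.0

def z_boost(theta):
    """Closed form B(theta) of the z-boost with rapidity theta."""
    B = np.eye(4)
    B[0, 0] = B[3, 3] = np.cosh(theta)
    B[0, 3] = B[3, 0] = np.sinh(theta)
    return B

m, kz = 1.0, 0.75                      # illustrative mass and z-momentum
E = np.hypot(m, kz)                    # E(k) = sqrt(m^2 + k^2)
theta = np.arccosh(E / m)              # cosh(theta) = E(k)/m
rest = np.array([m, 0.0, 0.0, 0.0])
boosted = expm_series(theta * M) @ rest    # should equal (E, 0, 0, kz)
```

Because rapidities add under collinear boosts, `z_boost(a) @ z_boost(b)` agrees with `z_boost(a + b)`, which is what makes the exponential form (7.66) natural.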
Now that we know how to build covariant fields, transforming according to definite
representations of the HLG, it remains to be seen whether we can also arrange for
microcausality. For interaction Hamiltonians built purely from scalar fields, we saw
that Lorentz-invariance of the S-matrix hinged on the local commutativity property:
\[
[H_{int}(x), H_{int}(y)]_- = 0, \qquad (x-y)^2 < 0 \tag{7.70}
\]
which could be arranged if H_int was built as a polynomial of local fields:
\[
[\phi(x), \phi^\dagger(y)]_- = 0, \qquad (x-y)^2 < 0 \tag{7.71}
\]

For fermions, [φ(x), φ† (y)]− is not even a c-number, but something quadratic in
creation and annihilation operators, so it certainly cannot vanish identically in the
space-like region. However [φ(x), φ† (y)]+ ≡ {φ(x), φ† (y)} is a c-number, so it is at least
possible for the anticommutator of fermionic fields to vanish in the space-like region.
Therefore, recalling that the commutator of products of an even number of fields
can always be rewritten as a sum of terms involving only anticommutators (for exam-
ple, [AB, CD]− = A{B, C}D − AC{B, D} + {A, C}DB − C{A, D}B), it follows that
local commutativity of Hint can be assured simply by building it out of an even number
of fermionic fields, and insisting on space-like anticommutativity for the elementary
fermionic fields.
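The operator identity quoted above holds for any associative algebra, so it can be spot-checked with random matrices standing in for the operators A, B, C, D. This is only a sanity check of the algebra, not a statement about field operators themselves; the matrix size and random seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C, D = (rng.normal(size=(4, 4)) for _ in range(4))

def comm(X, Y):
    """Commutator [X, Y]_- ."""
    return X @ Y - Y @ X

def anti(X, Y):
    """Anticommutator {X, Y}."""
    return X @ Y + Y @ X

lhs = comm(A @ B, C @ D)
rhs = (A @ anti(B, C) @ D - A @ C @ anti(B, D)
       + anti(A, C) @ D @ B - C @ anti(A, D) @ B)
identity_holds = np.allclose(lhs, rhs)
```

Every term on the right-hand side contains one anticommutator of elementary factors, so if all such anticommutators vanish at space-like separation, so does the commutator of the bilinears.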
The general structure of a commutator (or anticommutator—as usual, upper signs
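refer to bosons, lower to fermions) of two covariant fields is
\[
[\phi^{A_1B_1}_{a_1b_1}(x), \phi^{A_2B_2\dagger}_{a_2b_2}(y)]_\mp
= \sum_{\sigma_1\sigma_2} \int \frac{d^3k_1\, d^3k_2}{(2\pi)^3\sqrt{2E(k_1)\,2E(k_2)}}
\Big[ u^{A_1B_1}_{a_1b_1}(k_1,\sigma_1)\, e^{-ik_1\cdot x}\, a(k_1,\sigma_1)
+ v^{A_1B_1}_{a_1b_1}(k_1,\sigma_1)\, e^{ik_1\cdot x}\, a^{c\dagger}(k_1,\sigma_1),\;
u^{A_2B_2}_{a_2b_2}(k_2,\sigma_2)^*\, e^{ik_2\cdot y}\, a^\dagger(k_2,\sigma_2)
+ v^{A_2B_2}_{a_2b_2}(k_2,\sigma_2)^*\, e^{-ik_2\cdot y}\, a^c(k_2,\sigma_2) \Big]_\mp
\]
\[
= \int \frac{d^3k}{(2\pi)^3\, 2E(k)} \Big\{ \sum_\sigma u^{A_1B_1}_{a_1b_1}(k,\sigma)\, u^{A_2B_2}_{a_2b_2}(k,\sigma)^*\, e^{-ik\cdot(x-y)}
\mp \sum_\sigma v^{A_1B_1}_{a_1b_1}(k,\sigma)\, v^{A_2B_2}_{a_2b_2}(k,\sigma)^*\, e^{ik\cdot(x-y)} \Big\} \tag{7.72}
\]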
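Define
\[
N^{A_1B_1A_2B_2}_{a_1b_1a_2b_2}(k) \equiv \sum_\sigma u^{A_1B_1}_{a_1b_1}(k,\sigma)\, u^{A_2B_2}_{a_2b_2}(k,\sigma)^*
\]
\[
= \sum_{a_1'b_1'a_2'b_2'} (e^{-\theta\hat k\cdot\vec A})_{a_1a_1'}\,(e^{\theta\hat k\cdot\vec B})_{b_1b_1'}\,(e^{-\theta\hat k\cdot\vec A})^*_{a_2a_2'}\,(e^{\theta\hat k\cdot\vec B})^*_{b_2b_2'}
\cdot \sum_\sigma \langle A_1B_1a_1'b_1'|j\sigma\rangle\,\langle A_2B_2a_2'b_2'|j\sigma\rangle \tag{7.73}
\]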

where cosh(θ) = E(k)/m, sinh(θ) = |k|/m, e^θ = (E + |k|)/m = (k^0 + |k|)/m. As a function of k^μ,
N(k) has a definite parity under k^μ → −k^μ. This is easiest to see by choosing k̂ along
the z axis. Then k̂·A = A_3, and recalling that the small a (resp. b) indices refer to
eigenvalues of A_3 (resp. B_3) we easily find in this case
\[
N^{A_1B_1A_2B_2}_{a_1b_1a_2b_2}(k) = e^{\theta(b_1+b_2-a_1-a_2)} \sum_\sigma \langle A_1B_1a_1b_1|j\sigma\rangle\,\langle A_2B_2a_2b_2|j\sigma\rangle \tag{7.74}
\]
while
\[
N^{A_1B_1A_2B_2}_{a_1b_1a_2b_2}(-k) = e^{\theta'(a_1+a_2-b_1-b_2)} \sum_\sigma \langle A_1B_1a_1b_1|j\sigma\rangle\,\langle A_2B_2a_2b_2|j\sigma\rangle \tag{7.75}
\]
where e^{θ′} = (−k^0 + |k|)/m = −e^{−θ}, and the interchange of as and bs results from the change
k̂ → −k̂ in (7.73). We thus obtain the reflection property
\[
N^{A_1B_1A_2B_2}_{a_1b_1a_2b_2}(-k) = (-1)^{(a_1+a_2-b_1-b_2)}\, N^{A_1B_1A_2B_2}_{a_1b_1a_2b_2}(k)
\]
\[
= (-1)^{(2\sigma-2b_1-2b_2)}\, N^{A_1B_1A_2B_2}_{a_1b_1a_2b_2}(k)
\]
\[
= (-1)^{(2j-2B_1-2B_2)}\, N^{A_1B_1A_2B_2}_{a_1b_1a_2b_2}(k) \tag{7.76}
\]

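Since v^{A_1B_1}_{a_1b_1}(k, σ) = ξ^{A_1B_1}(−1)^{j−σ} u^{A_1B_1}_{a_1b_1}(k, −σ), we have
\[
\sum_\sigma v^{A_1B_1}_{a_1b_1}(k,\sigma)\, v^{A_2B_2}_{a_2b_2}(k,\sigma)^*
= \xi^{A_1B_1}\xi^{A_2B_2*} \sum_\sigma u^{A_1B_1}_{a_1b_1}(k,-\sigma)\, u^{A_2B_2}_{a_2b_2}(k,-\sigma)^*
= \xi^{A_1B_1}\xi^{A_2B_2*}\, N^{A_1B_1A_2B_2}_{a_1b_1a_2b_2}(k) \tag{7.77}
\]
Using (7.73) and (7.77) in (7.72), we obtain
\[
[\phi^{A_1B_1}_{a_1b_1}(x), \phi^{A_2B_2\dagger}_{a_2b_2}(y)]_\mp = \int \frac{d^3k}{(2\pi)^3\, 2E(k)}\, N^{A_1B_1A_2B_2}_{a_1b_1a_2b_2}(k)
\left( e^{-ik\cdot(x-y)} \mp \xi^{A_1B_1}\xi^{A_2B_2*}\, e^{+ik\cdot(x-y)} \right) \tag{7.78}
\]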
We can now use the parity property (7.76) to write this result as
\[
[\phi^{A_1B_1}_{a_1b_1}(x), \phi^{A_2B_2\dagger}_{a_2b_2}(y)]_\mp = \int \frac{d^3k}{(2\pi)^3\, 2E(k)} \Big( N^{A_1B_1A_2B_2}_{a_1b_1a_2b_2}(k)\, e^{-ik\cdot(x-y)}
\mp (-1)^{2B_1+2B_2+2j}\, N^{A_1B_1A_2B_2}_{a_1b_1a_2b_2}(-k)\, \xi^{A_1B_1}\xi^{A_2B_2*}\, e^{+ik\cdot(x-y)} \Big)
\]
\[
= N^{A_1B_1A_2B_2}_{a_1b_1a_2b_2}\!\left(i\frac{\partial}{\partial x}\right) F(x-y) \tag{7.79}
\]
where
\[
F(x-y) \equiv \int \frac{d^3k}{(2\pi)^3\, 2E(k)} \left( e^{-ik\cdot(x-y)} \mp (-1)^{2B_1}\xi^{A_1B_1}(-1)^{2B_2}\xi^{A_2B_2*}(-1)^{2j}\, e^{+ik\cdot(x-y)} \right) \tag{7.80}
\]
If we compare this result with (6.76) we see immediately that space-like commutativity
(resp. anticommutativity) is assured if and only if
\[
\pm(-1)^{2B_1}\xi^{A_1B_1}(-1)^{2B_2}\xi^{A_2B_2*}(-1)^{2j} = 1 \tag{7.81}
\]
Take first (A_1, B_1) = (A_2, B_2) and recall that ξ^{AB} is a pure phase. Then we conclude
\[
(-1)^{2j} = \pm 1 \tag{7.82}
\]

This is the celebrated Pauli–Lüders Spin-Statistics theorem. Bosonic commutation


relations, at least for free fields, require integral spin, fermionic anticommutation rela-
tions half-integral spin—a result completely incomprehensible within the framework of
non-relativistic quantum theory. Now inserting (−1)^{2j} = ±1 into (7.80), we find that
for any pair of covariant fields (A_1, B_1), (A_2, B_2) appearing in the theory we must have
\[
(-1)^{2B_1}\xi^{A_1B_1}(-1)^{2B_2}\xi^{A_2B_2*} = 1 \tag{7.83}
\]
which is solved by setting all ξ^{AB} = (−1)^{2B} ξ, with ξ = e^{iθ} a universal phase factor. In
fact, an overall universal phase in all destruction parts is unobservable, so with no loss
of generality we can set ξ = 1 (see footnote 2). Referring back to (7.54), we thus obtain a completely
general expression for a local covariant field of any spin and Lorentz representation:
\[
\phi^{AB}_{ab}(x) = \sum_\sigma \int \frac{d^3k}{(2\pi)^{3/2}\sqrt{2E(k)}} \Big( u^{AB}_{ab}(k,\sigma)\, e^{-ik\cdot x}\, a(k,\sigma)
+ (-)^{2B}(-)^{j-\sigma}\, u^{AB}_{ab}(k,-\sigma)\, e^{ik\cdot x}\, a^{c\dagger}(k,\sigma) \Big) \tag{7.84}
\]
with u^{AB}_{ab}(k, σ) given by (7.69).
For a spinless particle, j = 0, the simplest covariant field satisfying (7.34) is clearly
obtained by taking A = B = 0, whence the u coefficient function in (7.84) becomes
unity (by (7.64) and (7.69)), and we recover the canonical scalar field (6.73) of Chapter
6, satisfying (automatically) the Klein–Gordon equation (□ + m²)φ(x) = 0. The only
other cases of real importance in the Standard Model of elementary particle physics

2 In certain cases, e.g., the Majorana field discussed below, the choice ξ = −1 will be more convenient.

are for j = ½, 1. These two important special cases are therefore given separate and
detailed attention in the two following sections.

7.4 Local covariant fields for spin-½ (spinor fields)


From the constraint (7.34) we see immediately that the simplest covariant fields
capable of describing spin-½ particles evidently transform according to the (½,0)
and (0,½) representations of the HLG. Our simplest option is clearly to use a single
two-component field χ_α(x), α = 1, 2 transforming, say, according to the (½,0)
representation of the HLG (we shall see below that we could just as well use the (0,½)
representation). In addition, we assume that our particle is massive: the special issues
that arise for particles with zero mass and non-zero spin are deferred to Section 7.6.
This approach leads to the Majorana field (Majorana, 1937), and as we shall see
shortly, involves self-conjugate fermions, with the particle indistinguishable from the
antiparticle. Consequently, such fields cannot be used to describe particles endowed
with a non-vanishing conserved additive quantum number (such as electric charge).
The only known electrically neutral elementary spin-½ particles are neutrinos, and it
is indeed possible that neutrinos exist that are described by such a Majorana field.
All other (charged) spin-½ particles are described by four-component Dirac fields
ψ_α(x), α = 1, 2, 3, 4 transforming according to reducible representations (½,0)⊕(0,½)
of the HLG. We will first discuss the construction and properties of the Majorana
field,3 and then proceed to the Dirac case.

7.4.1 The Majorana field


Choosing A = ½, B = 0, our general covariant field φ^{AB}_{ab} becomes a two-component field
χ_α, with χ_1 = φ^{½0}_{½0}, χ_2 = φ^{½0}_{−½0}. The zero-momentum u-spinors are trivial
Clebsch–Gordan coefficients
\[
u(0,\tfrac12) = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad
u(0,-\tfrac12) = \begin{pmatrix} 0 \\ 1 \end{pmatrix}
\]
With choice of phase ξ^{½0} = −1, the zero momentum v-spinor given by (7.65) is easily
seen to be
\[
v(0,\sigma) = i\sigma_2\, u(0,\sigma) = C_s\, u(0,\sigma) \tag{7.85}
\]
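The equality of the two forms of the zero-momentum v-spinor can be verified directly. In this sketch the dictionary `u0` holds the two unit spinors indexed by σ = ±½, and the function names are illustrative labels for the two sides of the relation: the first evaluates ξ(−1)^{j−σ} u(0, −σ) with ξ = −1 and j = ½, the second applies C_s = iσ_2.

```python
import numpy as np

sigma2 = np.array([[0, -1j], [1j, 0]])
C_s = 1j * sigma2                       # C_s = i sigma_2 = [[0, 1], [-1, 0]]

# Zero-momentum u-spinors, indexed by the spin projection sigma
u0 = {+0.5: np.array([1, 0]), -0.5: np.array([0, 1])}

xi, j = -1, 0.5                         # phase choice and spin used in the text

def v0_from_7_65(s):
    """xi (-1)^(j - sigma) u(0, -sigma), the general zero-momentum relation."""
    return xi * (-1) ** int(round(j - s)) * u0[-s]

def v0_from_7_85(s):
    """C_s u(0, sigma), the compact form quoted in (7.85)."""
    return C_s @ u0[s]

forms_agree = all(np.allclose(v0_from_7_65(s), v0_from_7_85(s)) for s in (+0.5, -0.5))
```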

The HLG generators are A = ½σ, B = 0, J = ½σ, K = (i/2)σ, with σ the conventional
Pauli matrices. Thus the finite-momentum spinors are given (cf. (7.69)) by
\[
u(k,\sigma) = e^{-\theta\hat k\cdot\vec\sigma/2}\, u(0,\sigma), \qquad
v(k,\sigma) = e^{-\theta\hat k\cdot\vec\sigma/2}\, C_s\, u(0,\sigma) \tag{7.86}
\]

3 Section 7.4.1 may be omitted on a first reading of the book.


where the rapidity of the boost is given by cosh θ = E(k)/m. Up to normalization, the
unique covariant field of type (½,0) is given by (7.84). It is conventional for spinor
fields of non-zero mass to include an additional √m in the definition of the field. We
thus obtain
\[
\chi(x) = \sum_\sigma \int \frac{d^3k}{(2\pi)^{3/2}} \sqrt{\frac{m}{2E(k)}}\, \Big( e^{-\theta\hat k\cdot\vec\sigma/2}\, u(0,\sigma)\, b(k,\sigma)\, e^{-ik\cdot x}
+ e^{-\theta\hat k\cdot\vec\sigma/2}\, C_s\, u(0,\sigma)\, b^\dagger(k,\sigma)\, e^{ik\cdot x} \Big) \tag{7.87}
\]

Note that the antiparticle creation operator b† is just the hermitian adjoint of the
particle annihilation operator: our field is self-conjugate. The spinor index (on χ and
u(0, σ)) has been suppressed: the reader is invited to visualize the left- and right-
hand sides of (7.87) as a column two-vector of field operators. Next we introduce
the augmented Pauli matrices σ_μ, μ = 0, 1, 2, 3, where σ_0 ≡ 1 and the σ_i, i = 1, 2, 3, are
the usual Pauli matrices. Thus, σ^μ ∂_μ = ∂_0 − σ·∇, and a straightforward calculation,
using the conjugation properties (7.41), leads to the Majorana equation
\[
i\sigma^\mu\partial_\mu \chi(x) = -m\, C_s\, \chi^* \tag{7.88}
\]

Here, and throughout this section, we use the asterisk of complex conjugation to
indicate both normal complex conjugation (of numbers) and hermitian conjugation
of operators, while the † symbol is reserved for the combination of conjugation (both
types) and transposition of 2-spinors or 2x2 matrices.
The free Hamiltonian H_0 for our particle is given by the usual expression in terms
of creation and annihilation operators:
\[
H_0 = \sum_\sigma \int d^3k\, E(k)\, b^\dagger(k,\sigma)\, b(k,\sigma) \tag{7.89}
\]
It is not difficult to show that this operator can be written as a spatial integral of a
free energy density
\[
H_0 = \int d^3x\, : \chi^\dagger i\partial_0 \chi + \frac{m}{2}\left( \chi^\dagger C_s \chi^* + \chi^T C_s \chi \right) : \tag{7.90}
\]

Note that the normal-ordering symbol :....: is defined for fermions by moving all
creation operators to the left of all annihilation operators (as for bosons), but with an
extra minus sign for each transposition. Use of the Majorana equation (7.88) allows
us to rewrite this in the more usual form (see Problem 1):
\[
H_0 = \int d^3x\, : \chi^\dagger\, i\vec\sigma\cdot\vec\nabla \chi + \frac{m}{2}\left( \chi^T C_s \chi - \chi^\dagger C_s \chi^* \right) : \tag{7.91}
\]

Note that both the kinetic (spatial derivative) term and the mass term in this
expression are hermitian operators (in particular, χ† Cs χ∗ = −(χT Cs χ)† ).

The mass term in (7.91) is actually the integral of a Lorentz scalar field. Recall
that a general covariant field transforms under the HLG as

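\[
U(\Lambda)\phi_\alpha(x)U^\dagger(\Lambda) = M_{\alpha\beta}(\Lambda^{-1})\, \phi_\beta(\Lambda x) \tag{7.92}
\]
In this case the representation matrix is (recall (7.36)) M(Λ^{-1}) = e^{−½ξ·σ + (i/2)φ·σ}, which
satisfies the quasi-orthogonality property
\[
M^T(\Lambda^{-1})\, C_s\, M(\Lambda^{-1}) = C_s \tag{7.93}
\]
Accordingly
\[
U(\Lambda)\chi_\alpha(x)(C_s)_{\alpha\beta}\chi_\beta(x)U^\dagger(\Lambda) = \chi_{\alpha'}(\Lambda x)\,(M^T(\Lambda^{-1})\, C_s\, M(\Lambda^{-1}))_{\alpha'\beta'}\,\chi_{\beta'}(\Lambda x)
= \chi^T(\Lambda x)\, C_s\, \chi(\Lambda x) \tag{7.94}
\]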

so the bilinear S(x) = χT (x)Cs χ(x) is indeed a Lorentz scalar field. Such a bilinear
can therefore be used to construct Lorentz scalar interaction densities of Yukawa form,
coupling the Majorana fermion to a self-conjugate spinless scalar φ, for example:

Hint (x) = λYuk φ(x)(χT (x)Cs χ(x) + h.c.) = λYuk φ(χT Cs χ − χ† Cs χ∗ ) (7.95)
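The quasi-orthogonality property (7.93), on which the scalar character of χᵀC_sχ rests, can be checked numerically for an arbitrary combination of boost and rotation parameters. This is a sketch only: `expm_series` is an illustrative Taylor-series exponential, and the parameter vectors `xi` and `phi` are arbitrary sample values.

```python
import numpy as np

SIGMA = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)
C_s = np.array([[0, 1], [-1, 0]], dtype=complex)     # C_s = i sigma_2

def expm_series(X, terms=60):
    """Matrix exponential by its Taylor series (adequate for small matrices)."""
    result = np.eye(X.shape[0], dtype=complex)
    term = np.eye(X.shape[0], dtype=complex)
    for n in range(1, terms):
        term = term @ X / n
        result = result + term
    return result

def M_inverse(xi, phi):
    """M(Lambda^{-1}) = exp(-xi.sigma/2 + i phi.sigma/2) in the (1/2,0) representation."""
    coeffs = -0.5 * np.asarray(xi, dtype=float) + 0.5j * np.asarray(phi, dtype=float)
    return expm_series(np.einsum("i,ijk->jk", coeffs, SIGMA))

M = M_inverse(xi=[0.4, -0.3, 0.8], phi=[1.1, 0.2, -0.5])
quasi_orthogonal = np.allclose(M.T @ C_s @ M, C_s)
```

The identity is equivalent to det M = 1: for any 2×2 matrix, Mᵀ C_s M = det(M) C_s, and the exponent here is traceless.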

What if our (½,0) field is not self-conjugate? Then it is easy to see that the
mass term in the Hamiltonian must be constructed as a bilinear in χ and χ* (to
reproduce the required number operators b†b, b^{c†}b^c in H_0), but then the resultant
field is not a Lorentz scalar (specifically, the scalar property fails under boosts: see
Problem 2). This still leaves the option of a massless spin-½ non-self-conjugate fermion
transforming according to (½,0). Such fermions are called left-handed Weyl fermions:4
we return to them in subsection 7.4.5, when we address the massless case and introduce
the two-component Weyl field (Weyl, 1929). On the other hand, massive
non-self-conjugate spin-½ particles are easily treated by using a reducible representation
of the HLG containing both (½,0) and (0,½) components, as first introduced by
Dirac (Dirac, 1928) in the late 1920s. We now turn to a study of such reducible
fields.

7.4.2 The Dirac field


Another option for massive spin-½ particles (indeed, by far the most common in
the Standard Model describing known elementary particle interactions up to the
several hundred GeV scale) is to build a field transforming according to a reducible
representation containing both (½,0) and (0,½) representations. The resultant reducible
representation may be denoted (½,0) ⊕ (0,½) and is evidently four-dimensional, and
we may conveniently display the corresponding field as a column 4-spinor ψ^{AB}_{ab}
(generally called a “Dirac 4-spinor”, or “bispinor”), as follows:
\[
\psi = \begin{pmatrix}
\psi^{\frac12 0}_{\frac12 0} \\[2pt]
\psi^{\frac12 0}_{-\frac12 0} \\[2pt]
\psi^{0\frac12}_{0\frac12} \\[2pt]
\psi^{0\frac12}_{0-\frac12}
\end{pmatrix}
\]

4 Correspondingly, two-component massless fields transforming according to the (0,½) representation of
HLG are right-handed Weyl fields.

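where the “A” (resp. “B”) generators of HLG act only on the top (resp. bottom) two
components of the 4-spinor:
\[
\vec A = \begin{pmatrix} \tfrac12\vec\sigma & 0 \\ 0 & 0 \end{pmatrix}, \qquad
\vec B = \begin{pmatrix} 0 & 0 \\ 0 & \tfrac12\vec\sigma \end{pmatrix}
\]
which then gives (recall (7.31)) the following 4x4 matrices for the generators of
rotations and boosts:
\[
\vec J = \begin{pmatrix} \tfrac12\vec\sigma & 0 \\ 0 & \tfrac12\vec\sigma \end{pmatrix} \tag{7.96}
\]
\[
\vec K = \begin{pmatrix} \tfrac{i}{2}\vec\sigma & 0 \\ 0 & -\tfrac{i}{2}\vec\sigma \end{pmatrix} \tag{7.97}
\]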

Our next task is to construct the connection coefficients u(k, σ), v(k, σ) which
determine the canonical field operator for this representation. As usual, we start at
zero momentum: recalling the trivial Clebsch–Gordan values ⟨½ 0 ½ 0|½ ½⟩ = 1 etc., one
finds
\[
u(0,\tfrac12) = \begin{pmatrix} \tfrac{1}{\sqrt2} \\ 0 \\ \tfrac{1}{\sqrt2} \\ 0 \end{pmatrix}, \qquad
u(0,-\tfrac12) = \begin{pmatrix} 0 \\ \tfrac{1}{\sqrt2} \\ 0 \\ \tfrac{1}{\sqrt2} \end{pmatrix}
\]
where the additional √2 factors are conventional to normalize the 4-spinors. Recall
(cf. (7.83)) that in general we have v_{ab}(0, σ) = ξ(−1)^{2B}(−1)^{j−σ} u_{ab}(0, −σ). For Dirac
spinors it is conventional to choose the arbitrary phase ξ = −1, giving
\[
v(0,\tfrac12) = \begin{pmatrix} 0 \\ -\tfrac{1}{\sqrt2} \\ 0 \\ \tfrac{1}{\sqrt2} \end{pmatrix}, \qquad
v(0,-\tfrac12) = \begin{pmatrix} \tfrac{1}{\sqrt2} \\ 0 \\ -\tfrac{1}{\sqrt2} \\ 0 \end{pmatrix}
\]

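The spinor coefficient functions at non-zero momentum are obtained from these by a
boost:
\[
u(k,\sigma) = e^{-\theta\hat k\cdot\vec A}\, e^{\theta\hat k\cdot\vec B}\, u(0,\sigma)
= \begin{pmatrix} e^{-\theta\hat k\cdot\vec\sigma/2} & 0 \\ 0 & e^{\theta\hat k\cdot\vec\sigma/2} \end{pmatrix} u(0,\sigma) \tag{7.98}
\]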

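Defining the matrix (4x4, but written in 2 × 2 blocks!)
\[
\beta \equiv \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
\]
one easily verifies that βu(0, σ) = u(0, σ), whence B(k)u(k, σ) = u(k, σ) where
\[
B(k) \equiv \begin{pmatrix} 0 & e^{-\theta\hat k\cdot\vec\sigma} \\ e^{\theta\hat k\cdot\vec\sigma} & 0 \end{pmatrix}
\]
Expanding the exponential, one finds
\[
e^{\theta\hat k\cdot\vec\sigma} = \frac{E(k)}{m} + \frac{\vec k\cdot\vec\sigma}{m} \tag{7.99}
\]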
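We may now establish contact with the conventional Dirac formalism by introducing
the Dirac matrices:
\[
\gamma_0 = \beta = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad
\gamma_i = \begin{pmatrix} 0 & -\sigma_i \\ \sigma_i & 0 \end{pmatrix} \tag{7.100}
\]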

in terms of which B(k) = k^μ γ_μ /m and the u-spinor then satisfies
\[
(k^\mu\gamma_\mu - m)\, u(k,\sigma) = (\not{k} - m)\, u(k,\sigma) = 0 \tag{7.101}
\]
(Note: the abbreviation k̸ ≡ k^μ γ_μ is ubiquitous in Dirac theory.) Using βv(0, σ) =
−v(0, σ) an exactly analogous argument reveals that the v-spinors satisfy
\[
(\not{k} + m)\, v(k,\sigma) = 0 \tag{7.102}
\]
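The momentum-space Dirac equations for the boosted spinors can be confirmed numerically. The sketch below builds the lower-index γ matrices of (7.100), boosts the zero-momentum spinors with the block-diagonal matrix of (7.98), and forms k̸ = k^μ γ_μ; the momentum and mass values and all variable names are illustrative assumptions.

```python
import numpy as np

I2, Z2 = np.eye(2), np.zeros((2, 2))
SIGMA = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

# Lower-index Dirac matrices of (7.100), written in 2x2 blocks
gamma0 = np.block([[Z2, I2], [I2, Z2]]).astype(complex)
gamma_i = [np.block([[Z2, -s], [s, Z2]]) for s in SIGMA]

m = 1.0
k = np.array([0.3, -0.4, 1.2])                  # illustrative momentum
E = np.sqrt(m**2 + k @ k)
theta = np.arccosh(E / m)                        # rapidity, cosh(theta) = E/m
k_hat_sigma = sum(c * s for c, s in zip(k / np.linalg.norm(k), SIGMA))

def half_boost(sign):
    """exp(sign * theta * k_hat.sigma / 2), using (k_hat.sigma)^2 = 1."""
    return np.cosh(theta / 2) * I2 + sign * np.sinh(theta / 2) * k_hat_sigma

S = np.block([[half_boost(-1), Z2], [Z2, half_boost(+1)]])   # boost matrix of (7.98)

u0 = {+0.5: np.array([1, 0, 1, 0]) / np.sqrt(2), -0.5: np.array([0, 1, 0, 1]) / np.sqrt(2)}
v0 = {+0.5: np.array([0, -1, 0, 1]) / np.sqrt(2), -0.5: np.array([1, 0, -1, 0]) / np.sqrt(2)}
u = {s: S @ u0[s] for s in u0}
v = {s: S @ v0[s] for s in v0}

k_slash = E * gamma0 + sum(c * g for c, g in zip(k, gamma_i))   # k^mu gamma_mu
```

With these definitions `(k_slash - m) @ u[s]` and `(k_slash + m) @ v[s]` should both vanish to numerical precision, and `k_slash @ k_slash` should equal m² times the identity.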

If we now recall the general expression for a covariant field, we find that the Dirac
field ψ(x), given by
\[
\psi(x) = \sum_\sigma \int \frac{d^3k}{(2\pi)^{3/2}} \sqrt{\frac{m}{E(k)}}\, \left( u(k,\sigma)\, e^{-ik\cdot x}\, b(k,\sigma) + v(k,\sigma)\, e^{ik\cdot x}\, d^\dagger(k,\sigma) \right) \tag{7.103}
\]
satisfies the famous Dirac equation
\[
\left( i\gamma^\mu \frac{\partial}{\partial x^\mu} - m \right)\psi(x) = 0 \tag{7.104}
\]
in virtue of (7.101–7.102). The normalization factor of √(m/E(k)) instead of 1/√(2E(k)) is
conventional for massive Dirac fields (for massless fields, one customarily returns to
the previous normalization, for obvious reasons: see subsection 7.4.5 below), as is the
notation b, d (instead of a, a^c) for the particle and antiparticle destruction operators.

7.4.3 Diracology
The Dirac matrices introduced in (7.100) are readily seen to anticommute with each
other; together with the fact that γ_0² = 1, γ_i² = −1, we conclude that
\[
\{\gamma_\mu, \gamma_\nu\} = 2g_{\mu\nu} \tag{7.105}
\]
where the 4x4 identity matrix is understood on the right-hand side of (7.105),
multiplying the metric element 2g_{μν}. The antiparticle coefficient functions v(k, σ) are
conventionally defined as
\[
v(k,\sigma) = \gamma_5\,(-1)^{\frac12+\sigma}\, u(k,-\sigma) \tag{7.106}
\]

where the matrix
\[
\gamma_5 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \tag{7.107}
\]
incorporates the (−)^{2B} factor in (7.84). In fact, γ_5 = −iγ^0 γ^1 γ^2 γ^3 and anticommutes
with each of the γ^μ:
\[
\{\gamma_5, \gamma^\mu\} = 0 \tag{7.108}
\]

(NB: spacetime indices are raised and lowered on the γ-matrices in the usual way: the
spatial ones get a minus sign).
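The algebraic statements of this subsection are easy to verify with explicit matrices. The sketch below assumes the metric g = diag(1, −1, −1, −1) and the lower-index matrices of (7.100); the list names are illustrative.

```python
import numpy as np
from itertools import product

I2, Z2 = np.eye(2), np.zeros((2, 2))
SIGMA = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

# Lower-index gamma matrices of (7.100)
gamma_lower = [np.block([[Z2, I2], [I2, Z2]]).astype(complex)]
gamma_lower += [np.block([[Z2, -s], [s, Z2]]) for s in SIGMA]

g = np.diag([1.0, -1.0, -1.0, -1.0])                       # metric tensor
# Raising the index: spatial matrices pick up a minus sign
gamma_upper = [g[mu, mu] * gamma_lower[mu] for mu in range(4)]

gamma5 = -1j * gamma_upper[0] @ gamma_upper[1] @ gamma_upper[2] @ gamma_upper[3]

def anticommutator(X, Y):
    return X @ Y + Y @ X

clifford_ok = all(
    np.allclose(anticommutator(gamma_lower[mu], gamma_lower[nu]), 2 * g[mu, nu] * np.eye(4))
    for mu, nu in product(range(4), repeat=2)
)
gamma5_anticommutes = all(
    np.allclose(anticommutator(gamma5, gamma_upper[mu]), 0.0) for mu in range(4)
)
```

The product −iγ^0γ^1γ^2γ^3 evaluates to diag(1, 1, −1, −1), i.e. the block form (7.107), which is +1 on the (½,0) components and −1 on the (0,½) components.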
In almost any calculation involving Dirac fields, we encounter spin sums analogous
to (7.73):
\[
N(k)_{nm} \equiv \sum_\sigma u_n(k,\sigma)\, u^*_m(k,\sigma) \tag{7.109}
\]
At zero momentum, from the explicit result for the zero-momentum spinors derived
above we have
\[
N(0)_{nm} \equiv \sum_\sigma u_n(0,\sigma)\, u^*_m(0,\sigma) = \frac12 (1 + \gamma_0)_{nm} \tag{7.110}
\]
Applying the boost matrix in (7.98) on the left and the right of N(0), we find
\[
N(k) = \frac12 B(k)\gamma_0 + \frac12 \gamma_0 = \frac12\left( \frac{\not{k}}{m} + 1 \right)\gamma_0 \tag{7.111}
\]
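Another very useful concept in Dirac theory is the “Dirac adjoint” of a 4-spinor.
Namely:
\[
\bar u_n \equiv u^*_m \gamma^0_{mn}, \qquad \bar\psi_n \equiv \psi^\dagger_m \gamma^0_{mn} \tag{7.112}
\]
etc. The spin sum (7.109) usually is needed in the equivalent form
\[
\sum_\sigma u_n(k,\sigma)\, \bar u_m(k,\sigma) = \frac12\left( \frac{\not{k}}{m} + 1 \right)_{nm} \tag{7.113}
\]
which follows from (7.111) as γ_0² = 1. For the antiparticle spinors, the result is similar,
with an obvious change of sign:
\[
\sum_\sigma v_n(k,\sigma)\, \bar v_m(k,\sigma) = \frac12\left( \frac{\not{k}}{m} - 1 \right)_{nm} \tag{7.114}
\]
Finally, there are the overlap normalization properties such as
\[
\bar v(k,\sigma)\gamma_0\, u(-k,\sigma') = v^\dagger(k,\sigma)\, u(-k,\sigma') \tag{7.115}
\]
\[
= v^\dagger(0,\sigma)\, u(0,\sigma') = 0 \tag{7.116}
\]
where the second line follows from the boost equation (7.98):
\[
u(-k,\sigma') = \begin{pmatrix} e^{\theta\hat k\cdot\vec\sigma/2} & 0 \\ 0 & e^{-\theta\hat k\cdot\vec\sigma/2} \end{pmatrix} u(0,\sigma') \tag{7.117}
\]
\[
v(k,\sigma) = \begin{pmatrix} e^{-\theta\hat k\cdot\vec\sigma/2} & 0 \\ 0 & e^{\theta\hat k\cdot\vec\sigma/2} \end{pmatrix} v(0,\sigma) \tag{7.118}
\]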

Similarly, one easily verifies
\[
\bar u(k,\sigma)\gamma_0\, u(k,\sigma') = \frac{E(k)}{m}\,\delta_{\sigma\sigma'} = \bar v(k,\sigma)\gamma_0\, v(k,\sigma') \tag{7.119}
\]
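The spin sums and overlap relations of this subsection can be checked together in a few lines. The helper `dirac_spinors` below is a hypothetical convenience function that packages the boost (7.98) and the zero-momentum spinors quoted earlier; the momentum and mass are arbitrary sample values.

```python
import numpy as np

I2, Z2 = np.eye(2), np.zeros((2, 2))
SIGMA = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
gamma0 = np.block([[Z2, I2], [I2, Z2]]).astype(complex)
gamma_i = [np.block([[Z2, -s], [s, Z2]]) for s in SIGMA]

def dirac_spinors(k, m):
    """Return E, k-slash and lists [u(k,+1/2), u(k,-1/2)], [v(k,+1/2), v(k,-1/2)]."""
    k = np.asarray(k, dtype=float)
    E = np.sqrt(m**2 + k @ k)
    half = np.arccosh(E / m) / 2
    n_sigma = sum(c * s for c, s in zip(k / np.linalg.norm(k), SIGMA))
    S = np.block([[np.cosh(half) * I2 - np.sinh(half) * n_sigma, Z2],
                  [Z2, np.cosh(half) * I2 + np.sinh(half) * n_sigma]])
    u0 = np.array([[1, 0, 1, 0], [0, 1, 0, 1]]) / np.sqrt(2)
    v0 = np.array([[0, -1, 0, 1], [1, 0, -1, 0]]) / np.sqrt(2)
    k_slash = E * gamma0 + sum(c * g for c, g in zip(k, gamma_i))
    return E, k_slash, [S @ x for x in u0], [S @ x for x in v0]

m, k = 1.0, np.array([0.3, -0.4, 1.2])
E, k_slash, u, v = dirac_spinors(k, m)
_, _, u_reflected, _ = dirac_spinors(-k, m)          # u(-k, sigma)

# sum over spins of u ubar and v vbar, with ubar = u^dagger gamma^0
sum_u = sum(np.outer(x, x.conj()) @ gamma0 for x in u)
sum_v = sum(np.outer(x, x.conj()) @ gamma0 for x in v)

# Overlap matrices: u^dagger(k) u(k), v^dagger(k) v(k), and v^dagger(k) u(-k)
uu = np.array([[a.conj() @ b for b in u] for a in u])
vv = np.array([[a.conj() @ b for b in v] for a in v])
vu = np.array([[a.conj() @ b for b in u_reflected] for a in v])
```

The cross overlap `vu` vanishes because the boost factors in v†(k) and u(−k) cancel exactly, leaving the orthogonal zero-momentum spinors; this is the property that removes the b d and b†d† terms from the free Hamiltonian.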
With the help of (7.115–7.119) one may easily establish the field-theoretic formula for
the free Dirac Hamiltonian:
\[
H_0 = \sum_\sigma \int d^3k\, E(k)\, \left( b^\dagger(k,\sigma)\, b(k,\sigma) + d^\dagger(k,\sigma)\, d(k,\sigma) \right) \tag{7.120}
\]
or, in terms of the Dirac field,
\[
H_0 = \int d^3x\, : \bar\psi(x)\,( i\vec\gamma\cdot\vec\nabla + m )\,\psi(x) : \tag{7.121}
\]
Similarly, one finds the following expression for the charge operator (for a Dirac particle
of charge e; see Problem 4):
\[
Q = \sum_\sigma \int d^3k\, \left( e\, b^\dagger(k,\sigma)\, b(k,\sigma) - e\, d^\dagger(k,\sigma)\, d(k,\sigma) \right) \tag{7.122}
\]
\[
= \int d^3x\, e : \bar\psi(x)\gamma^0\psi(x) : \tag{7.123}
\]

The four-component bispinor notation has achieved such predominance in dealing
with spin-½ fields that it is even commonly used for Majorana fields, which, as we saw
in the preceding section, are strictly speaking two-component objects. However, we
can continue to use the extremely convenient (and familiar!) apparatus of the Dirac
algebra even in the Majorana case by defining a four-component field5
\[
\psi_M = \begin{pmatrix} \chi \\ -C_s\chi^* \end{pmatrix}
\]
with χ(x) the two-component field of Section 7.4.1, satisfying the Majorana equation
(7.88) and its conjugate:6
\[
i(\partial_0 - \vec\sigma\cdot\vec\nabla)\chi = -m\, C_s\, \chi^* \tag{7.124}
\]
\[
-i(\partial_0 - \vec\sigma^*\cdot\vec\nabla)\chi^* = -m\, C_s\, \chi \tag{7.125}
\]
whence one easily verifies that the four-component field ψ_M satisfies the Dirac equation
(7.104):
\[
\left( i\gamma^\mu \frac{\partial}{\partial x^\mu} - m \right)\psi_M(x) = 0 \tag{7.126}
\]
∂xμ

5 Alternatively, one can take the (0,½) field as the starting point and write
\[
\psi_M = \begin{pmatrix} C_s\chi^* \\ \chi \end{pmatrix}
\]

6 We temporarily return to the convention of Section 7.4.1, where an asterisk is used to represent both
complex conjugation and hermitian conjugation.
Likewise, the free Hamiltonian (7.91) for the Majorana field takes exactly the same
form as the Dirac Hamiltonian (7.121), with an extra factor of ½ to compensate for
the doubling of the Majorana field in the four-component notation:
\[
H_0 = \frac12 \int d^3x\, : \bar\psi_M(x)\,( i\vec\gamma\cdot\vec\nabla + m )\,\psi_M(x) : \tag{7.127}
\]
Indeed, one can think of the Dirac field as a 4-spinor composed of two independent
Majorana fields
\[
\psi_M = \begin{pmatrix} \chi_1 \\ -C_s\chi_2^* \end{pmatrix}
\]
This is a point of view which becomes extremely useful in supersymmetric theories (cf.
Section 12.6), where the Majorana field serves as a basic “unit” for constructing spin-½
fields of all types.

7.4.4 Lorentz transformation properties


The Lorentz group is a non-compact group, with three hermitian generators (the J
rotation generators) and three anti-hermitian generators (the K boost generators).
From the explicit expressions for J, K in the Dirac case (see Eqs. (7.96, 7.97)), we see
that the γ_0 matrix can be used to effect a conjugation:
\[
\vec J^{\,\dagger} = \vec J = \gamma_0 \vec J \gamma_0 \tag{7.128}
\]
\[
\vec K^\dagger = -\vec K = \gamma_0 \vec K \gamma_0 \tag{7.129}
\]
Recall that the matrix representing an element of HLG can be expressed as an
exponential:
\[
M(\Lambda) = e^{-i(\Omega_{10}K_1 + \Omega_{20}K_2 + \Omega_{30}K_3 + \Omega_{12}J_3 + \Omega_{23}J_1 + \Omega_{31}J_2)} \tag{7.130}
\]
Using (7.128–7.129), it therefore follows that
\[
M^\dagger(\Lambda) = \gamma_0\, M^{-1}(\Lambda)\, \gamma_0 \tag{7.131}
\]
Next, note that the matrix element N(k)_{nm} = Σ_σ u_n(k, σ) u*_m(k, σ) of the spin-sum
matrix introduced above is a spinor dot-product of u_n and u_m, and therefore invariant
under simultaneous rotations with the rotation matrix D^j_{σσ′}. From (7.45), one
finds
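\[
\sum_{\sigma, m_1 m_2} M_{n_1m_1}(\Lambda^{-1})\, M^*_{n_2m_2}(\Lambda^{-1})\, u_{m_1}(\Lambda k,\sigma)\, u^*_{m_2}(\Lambda k,\sigma)
= (M(\Lambda^{-1})\, N(\Lambda k)\, M^\dagger(\Lambda^{-1}))_{n_1n_2}
= \sum_\sigma u_{n_1}(k,\sigma)\, u^*_{n_2}(k,\sigma) = N(k)_{n_1n_2} \tag{7.132}
\]
Using the conjugation property (7.131) in (7.132), we obtain
\[
M(\Lambda^{-1})\, N(\Lambda k)\, \gamma_0\, M(\Lambda)\, \gamma_0 = N(k) \tag{7.133}
\]
Inserting the explicit expression (7.111),
\[
M(\Lambda^{-1})\,(\Lambda^\mu{}_\nu k^\nu \gamma_\mu + m)\,\gamma_0\gamma_0\, M(\Lambda)\,\gamma_0 = (\not{k} + m)\,\gamma_0 \tag{7.134}
\]
from which
\[
\Lambda^\mu{}_\nu k^\nu \gamma_\mu = M(\Lambda)\,\not{k}\, M^{-1}(\Lambda) \tag{7.135}
\]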

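Differentiating (7.135) with respect to k^ν, we finally obtain the desired transformation
property of the Dirac γ matrices under finite HLG transformations:
\[
M(\Lambda)\,\gamma_\nu\, M^{-1}(\Lambda) = \Lambda^\mu{}_\nu\, \gamma_\mu \tag{7.136}
\]
\[
M(\Lambda)\,\gamma^\nu\, M^{-1}(\Lambda) = \Lambda_\mu{}^\nu\, \gamma^\mu \tag{7.137}
\]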
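For a boost along the z-axis the transformation law of the γ matrices and the conjugation property (7.131) can be confirmed with explicit matrices. In this sketch the spinor boost is taken to be the block-diagonal matrix of (7.98) with k̂ along z, and the vector boost is the matrix B(θ) of Section 7.3; the rapidity value and function names are illustrative.

```python
import numpy as np

I2, Z2 = np.eye(2), np.zeros((2, 2))
SIGMA = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
gamma_lower = [np.block([[Z2, I2], [I2, Z2]]).astype(complex)]
gamma_lower += [np.block([[Z2, -s], [s, Z2]]) for s in SIGMA]

def vector_z_boost(theta):
    """Lambda = B(theta): z-boost in the four-vector representation."""
    B = np.eye(4)
    B[0, 0] = B[3, 3] = np.cosh(theta)
    B[0, 3] = B[3, 0] = np.sinh(theta)
    return B

def spinor_z_boost(theta):
    """M(B(theta)) = diag(exp(-theta sigma_3/2), exp(+theta sigma_3/2))."""
    exponents = 0.5 * theta * np.array([-1.0, 1.0, 1.0, -1.0])
    return np.diag(np.exp(exponents)).astype(complex)

theta = 0.9
Lam, M = vector_z_boost(theta), spinor_z_boost(theta)
M_inv = np.linalg.inv(M)

# Left- and right-hand sides of (7.136) for each nu
lhs = [M @ gamma_lower[nu] @ M_inv for nu in range(4)]
rhs = [sum(Lam[mu, nu] * gamma_lower[mu] for mu in range(4)) for nu in range(4)]
vector_law_ok = all(np.allclose(a, b) for a, b in zip(lhs, rhs))

# Conjugation property (7.131)
conjugation_ok = np.allclose(M.conj().T, gamma_lower[0] @ M_inv @ gamma_lower[0])
```

Note that M is hermitian rather than unitary for a boost, which is why γ_0 is needed in (7.131) to convert M† into M⁻¹.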

As expected, the four γ matrices transform as a four-vector under the HLG. The
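matrix γ_5 introduced in (7.107) transforms as a pseudoscalar:
\[
\gamma_5 \equiv -i\gamma^0\gamma^1\gamma^2\gamma^3 = -\frac{i}{4!}\,\epsilon_{\mu\nu\rho\sigma}\,\gamma^\mu\gamma^\nu\gamma^\rho\gamma^\sigma
\]
\[
M(\Lambda)\,\gamma_5\, M^{-1}(\Lambda) = -\frac{i}{4!}\,\epsilon_{\mu\nu\rho\sigma}\,\Lambda_{\mu'}{}^{\mu}\Lambda_{\nu'}{}^{\nu}\Lambda_{\rho'}{}^{\rho}\Lambda_{\sigma'}{}^{\sigma}\,\gamma^{\mu'}\gamma^{\nu'}\gamma^{\rho'}\gamma^{\sigma'}
\]
\[
= -\frac{i}{4!}\,\epsilon_{\mu'\nu'\rho'\sigma'}\,\det(\Lambda)\,\gamma^{\mu'}\gamma^{\nu'}\gamma^{\rho'}\gamma^{\sigma'}
= \det(\Lambda)\,\gamma_5 \tag{7.138}
\]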

From the general transformation property of a covariant field (7.5)

U (Λ)ψn (x)U † (Λ) = Mnm (Λ−1 )ψm (Λx) (7.139)

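follows the corresponding transformation for the adjoint field
\[
U(\Lambda)\psi^\dagger_n(x)U^\dagger(\Lambda) = M^*_{nm}(\Lambda^{-1})\,\psi^\dagger_m(\Lambda x) \tag{7.140}
\]
Adding a γ_0 to convert the regular adjoint to the Dirac adjoint:
\[
\begin{aligned}
U(\Lambda)\bar\psi_n(x)U^\dagger(\Lambda) &= M^*_{mm'}(\Lambda^{-1})\,\psi^\dagger_{m'}(\Lambda x)\,(\gamma_0)_{mn} \\
&= (M^\dagger(\Lambda^{-1})\gamma_0)_{m'n}\,\psi^\dagger_{m'}(\Lambda x) \\
&= (\gamma_0 M(\Lambda))_{m'n}\,\psi^\dagger_{m'}(\Lambda x) \\
&= M_{mn}(\Lambda)\,\bar\psi_m(\Lambda x)
\end{aligned} \tag{7.141}
\]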

where we have used the conjugation property (7.131) between the second and third
lines. From (7.139) and (7.141) follow directly

U (Λ)ψ̄n (x)ψn (x)U † (Λ) = ψ̄n (Λx)ψn (Λx) ≡ ψ̄(Λx)ψ(Λx) (7.142)

Accordingly, the field S(x) ≡ ψ̄(x)ψ(x) is a scalar field. On the other hand, using
(7.138),

U (Λ)ψ̄(x)γ5 ψ(x)U † (Λ) = det(Λ)ψ̄(Λx)γ5 ψ(Λx) (7.143)

we see that there is an extra minus sign for transformations including a spatial
reflection, so that P (x) ≡ ψ̄γ5 ψ is a pseudoscalar field. Not surprisingly, we can
construct a vector field employing the Dirac matrices in the obvious way:

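\[
U(\Lambda)\bar\psi_n(x)(\gamma_\mu)_{nn'}\psi_{n'}(x)U^\dagger(\Lambda) = M_{mn}(\Lambda)\,(\gamma_\mu)_{nn'}\, M_{n'm'}(\Lambda^{-1})\,\bar\psi_m(\Lambda x)\,\psi_{m'}(\Lambda x)
= \Lambda^\rho{}_\mu\, \bar\psi_m(\Lambda x)\,(\gamma_\rho)_{mm'}\,\psi_{m'}(\Lambda x) \tag{7.144}
\]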

Omitting the matrix indices for clarity, this is

U (Λ)ψ̄(x)γμ ψ(x)U † (Λ) = Λρ μ ψ̄(Λx)γρ ψ(Λx) (7.145)

Thus, Vμ (x) ≡ ψ̄γμ ψ(x) is indeed a vector field. The corresponding result for the four-
vector field Aμ (x) ≡ ψ̄γ5 γμ ψ contains an extra det(Λ) factor (which would be –1 if
the transformation Λ is improper, containing a spatial reflection), so we conclude that
Aμ is an axial vector field.
As the Dirac field has four components, it is apparent that there must be sixteen
independent bilinears constructible as linear combinations of ψ̄n (x)ψm (x). So far, the
S, P, Vμ , and Aμ fields provide us with 1+1+4+4=10 independent operators. The six
remaining independent operators form a second-rank antisymmetric Lorentz tensor
\[
T_{\mu\nu}(x) \equiv \bar\psi(x)\,\frac{i}{4}[\gamma_\mu, \gamma_\nu]\,\psi(x) \tag{7.146}
\]
Of course, fields with non-trivial Lorentz transformation properties, such as Vμ , Aμ ,
and Tμν can still be contracted in the standard way to produce Lorentz scalar fields
suitable for use in an interaction Hamiltonian: e.g., Vμ V μ , Tμν T μν etc. Combinations
like Vμ Aμ are pseudoscalar fields and will lead to parity violation if included in the
interaction Hamiltonian density. Precisely this occurs in the effective Fermi theory
of the weak interactions, as we shall see. Another important type of interaction is

the Yukawa interaction between a scalar field φ(x)(or pseudoscalar π(x)) and a Dirac
field ψ(x):

Hint = λψ̄(x)ψ(x)φ(x) (7.147)


Hint = λψ̄(x)γ5 ψ(x)π(x) (7.148)

Note that both (7.147) and (7.148) are parity conserving interactions (as Hint is even
under parity in both cases)!

7.4.5 Massless spin-½ fields: the Weyl field


For many years the neutrinos associated with the weak interactions were believed to
be massless spin-½ fermions. The situation changed with the discovery of neutrino
oscillations, beginning with the solar neutrino deficit experiment of Davis in the
1970s and 1980s. Although oscillation experiments typically measure mass differences
between neutrino species, and the possibility still exists that one of the neutrino species
may be exactly massless, there is no natural reason for this to be so, and the working
assumption in the field is that all three generations of neutrino are in fact massive,
albeit with very small (in some cases, considerably smaller than an electron-volt)
masses. Nevertheless, for much of weak interaction phenomenology, the effects of the
neutrino masses are completely negligible, so it is quite useful to consider the massless
limit for a non-self-conjugate Dirac field (we know that neutrinos and antineutrinos
are distinct particles). If we return to the Dirac four-component field of (7.103) we see
right away that we had better transfer a factor of √(2m) into the normalization of the
u and v spinors before taking the massless limit. Accordingly, instead of (7.113) and
(7.114), our new spinors are normalized by
\[
\sum_\sigma u_n(k,\sigma)\,\bar u_m(k,\sigma) = (\not{k} + m)_{nm} \to \not{k}_{nm}, \quad m = 0
\]
\[
\sum_\sigma v_n(k,\sigma)\,\bar v_m(k,\sigma) = (\not{k} - m)_{nm} \to \not{k}_{nm}, \quad m = 0 \tag{7.149}
\]
while the Dirac field now takes the form (with E(k) = |k|)
\[
\psi(x) = \sum_\sigma \int \frac{d^3k}{(2\pi)^{3/2}\sqrt{2E(k)}}\, \left( u(k,\sigma)\, e^{-ik\cdot x}\, b(k,\sigma) + v(k,\sigma)\, e^{ik\cdot x}\, d^\dagger(k,\sigma) \right) \tag{7.150}
\]

There is one further subtlety of which we must be cognizant in taking the massless limit
of such a field. The discussion of massless particle states in Section 5.3 made it clear
that such states always appear with a definite value of helicity—i.e., of their angular
momentum resolved along the direction of motion of the particle—rather than along
an arbitrary z-axis, as in the spin states employed in the discussion of the massive
Dirac field above. The relation between spin states |k, σ⟩ and helicity states |k, λ⟩ is
very simple (recall Problem 2(a) of Chapter 5):

$$ |k,\lambda\rangle = \sum_\sigma D^{1/2}_{\sigma\lambda}(R(\hat k))\,|k,\sigma\rangle $$

where R(k̂) is the rotation from the z-axis into the direction of momentum k̂ (with a
similar relation for the antiparticle states). There is a corresponding relation between
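the creation operators

$$ b^\dagger(k,\lambda) = \sum_\sigma D^{1/2}_{\sigma\lambda}(R(\hat k))\,b^\dagger(k,\sigma) $$

$$ d^\dagger(k,\lambda) = \sum_\sigma D^{1/2}_{\sigma\lambda}(R(\hat k))\,d^\dagger(k,\sigma) $$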

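So, if we define helicity spinors

$$ u(k,\lambda) = \sum_\sigma D^{1/2}_{\sigma\lambda}(R(\hat k))\,u(k,\sigma) \tag{7.151} $$

$$ v(k,\lambda) = \sum_\sigma D^{1/2\,*}_{\sigma\lambda}(R(\hat k))\,v(k,\sigma) \tag{7.152} $$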

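the Dirac field (7.150) can be rewritten entirely in terms of helicity spinors and
creation–annihilation operators:

$$ \psi(x) = \int \frac{d^3k}{(2\pi)^{3/2}}\,\frac{1}{\sqrt{2E(k)}} \sum_\lambda \left( u(k,\lambda)\,e^{-ik\cdot x}\,b(k,\lambda) + v(k,\lambda)\,e^{ik\cdot x}\,d^\dagger(k,\lambda) \right) \tag{7.153} $$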

Recall that the Dirac field transforms according to the reducible representation
(½,0)⊕(0,½), with the upper (resp. lower) two components transforming as (½,0) (resp.
(0,½)). If we label the upper two components ψ_L and the lower two ψ_R, then it is easy
to see that in the limit m → 0 the Dirac equation (7.104) decouples into separate
equations for these two chiral fields:⁷

$$ \sigma^\mu \partial_\mu \psi_L(x) = 0 \tag{7.154} $$

$$ \bar\sigma^\mu \partial_\mu \psi_R(x) = 0 \tag{7.155} $$

where σ_μ = (1, σ_i), σ̄_μ = (1, −σ_i). These are called the “left-handed” and “right-handed”
Weyl equations,⁸ respectively (Weyl, 1929). The first equation (for the (½,0)
field) coincides in form, not surprisingly, with the previously discussed Majorana
equation (7.88). Here, however, our field is not assumed to be self-conjugate: our
particle is allowed to carry a non-zero additive conserved quantum number, for
example, opposite in sign to that of the antiparticle (e.g., lepton number in the

⁷ Recall from the discussion in Section 7.2 that half-integral spin fields must be described by an (A, B)
representation with A ≠ B. In the spin-½ case, the chiral character (left- or right-handed) of the Weyl field
is directly correlated with the eigenvalue of γ5; cf. (7.107).
⁸ These equations, although evidently Lorentz-covariant, were rejected by Pauli in his famous Handbuch
article (Pauli, 1933), p. 226, for violating invariance under spatial reflections (parity), a shortcoming which
was later transformed into a virtue when parity non-conservation was discovered in the 1950s.

original form of weak interaction theory, with massless neutrinos). The change in sign
of the spin matrices between the equations for ψL and ψR suggests that they describe
(massless) particles of opposite helicity. We shall now demonstrate this explicitly.
Return for a moment to the massive case, in the original spin representation. We
shall concentrate on the upper two components, so there will be a “L” subscript
everywhere. From (7.98), the finite momentum spinor is obtained by a boost from the
zero-momentum state:

$$ u_L(k,\sigma) = \sqrt{2m}\;e^{-\theta\,\hat k\cdot\vec\sigma/2}\,u_L(0,\sigma) \tag{7.156} $$

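with cosh θ = E(k)/m, whence

$$ e^{-\theta\,\hat k\cdot\vec\sigma/2} = \cosh\frac{\theta}{2} - \sinh\frac{\theta}{2}\;\hat k\cdot\vec\sigma = \frac{1}{\sqrt{2}}\left( \sqrt{\frac{E(k)}{m}+1} - \sqrt{\frac{E(k)}{m}-1}\;\hat k\cdot\vec\sigma \right) \tag{7.157} $$

Inserting this result in (7.156) we obtain

$$ u_L(k,\sigma) = \left( \sqrt{E(k)+m} - \sqrt{E(k)-m}\;\hat k\cdot\vec\sigma \right) u_L(0,\sigma) \tag{7.158} $$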

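where the zero-momentum spinor in spin representation is given as discussed previously
in terms of simple Clebsch–Gordan coefficients, namely u_L(0, σ)_n = (1/√2) δ_{nσ}, if
we interpret the Kronecker δ as giving one for n = 1, σ = +½ and for n = 2, σ = −½. The
corresponding result for the helicity representation spinor follows from (7.151):

$$ \begin{aligned}
u_L(k,\lambda)_n &= \sum_\sigma D^{1/2}_{\sigma\lambda}(R(\hat k))\,u_L(k,\sigma)_n \\
&= \frac{1}{\sqrt{2}} \sum_\sigma D^{1/2}_{\sigma\lambda}(R(\hat k)) \left( \sqrt{E(k)+m} - \sqrt{E(k)-m}\;\hat k\cdot\vec\sigma \right)_{n\sigma} \\
&= \frac{1}{\sqrt{2}} \left\{ \left( \sqrt{E(k)+m} - \sqrt{E(k)-m}\;\hat k\cdot\vec\sigma \right) D^{1/2}(R(\hat k)) \right\}_{n\lambda}
\end{aligned} $$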

whence, by commuting the rotation matrix through the helicity matrix k̂ · σ,

$$ u_L(k,\lambda)_n = \frac{1}{\sqrt{2}} \left\{ D^{1/2}(R(\hat k)) \left( \sqrt{E(k)+m} - \sqrt{E(k)-m}\;\sigma_3 \right) \right\}_{n\lambda} \tag{7.159} $$

again with the understanding that spin or helicity values +½ (resp. −½) need to be
translated appropriately into row or column indices 1 (resp. 2) when evaluating matrix
elements of the rotation or spin matrices. Here we have used the definition of R(k̂) as
the rotation taking the z-axis into the k̂ direction, whence

$$ D^{1/2}(R(\hat k))^{-1}\;\hat k\cdot\vec\sigma\;D^{1/2}(R(\hat k)) = \sigma_3 \tag{7.160} $$

In particular, the positive helicity λ = +½ spinor is given by

$$ u_L(k,+\tfrac{1}{2})_n = \frac{1}{\sqrt{2}}\,D^{1/2}_{n1}(R(\hat k)) \left( \sqrt{E(k)+m} - \sqrt{E(k)-m} \right) \tag{7.161} $$

and clearly vanishes in the zero-mass (or high-energy) limit. On the other hand, the
negative helicity (or “left-handed”) spinor survives in this limit (whence our choice of
the subscript “L” in labeling the upper two components of the Dirac 4-spinor):

$$ u_L(k,-\tfrac{1}{2})_n = \frac{1}{\sqrt{2}}\,D^{1/2}_{n2}(R(\hat k)) \left( \sqrt{E(k)+m} + \sqrt{E(k)-m} \right) \to \sqrt{2E(k)}\;D^{1/2}_{n2}(R(\hat k)), \quad m \to 0 \tag{7.162} $$

In other words, the chirality of the field (defined as the eigenvalue of γ5 ) and the
helicity of the particle (the eigenvalue of k̂ · σ ) have become linked in the high-energy,
or zero-mass limit.
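
The way the two helicity components behave as the mass is switched off can be made concrete with a few numbers. The following is a minimal sketch, assuming only the scalar factors that multiply the rotation-matrix columns in (7.161) and (7.162); the function name `helicity_amplitudes` is illustrative and not part of the text.

```python
import math

def helicity_amplitudes(k, m):
    """Scalar factors in (7.161) and (7.162), including the overall 1/sqrt(2).

    Returns (plus, minus) for helicity +1/2 and -1/2, with E = sqrt(k^2 + m^2).
    """
    E = math.hypot(k, m)
    root_sum = math.sqrt(E + m)
    root_diff = math.sqrt(max(E - m, 0.0))  # guard against rounding just below zero
    plus = (root_sum - root_diff) / math.sqrt(2.0)
    minus = (root_sum + root_diff) / math.sqrt(2.0)
    return plus, minus

k = 1.0
for m in (1.0, 0.1, 0.001, 0.0):
    plus, minus = helicity_amplitudes(k, m)
    print(f"m = {m:6.3f}   helicity +1/2: {plus:.6f}   helicity -1/2: {minus:.6f}")
```

At fixed momentum the positive-helicity factor falls off like m/√(2E), while the negative-helicity factor tends to √(2E). Their product is exactly m, which is a compact way of seeing that one of them must vanish in the massless limit.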
The helicity interpretation of these spinors is again a consequence of (7.160), which
implies that the first column of the 2×2 rotation matrix D^{1/2}(R(k̂)) is an eigenvector of
the helicity operator k̂ · σ with eigenvalue +1, while the second column appearing in
(7.162) is an eigenvector with eigenvalue –1. A similar examination of the properties
of the v spinors associated with the antiparticle states shows that in the massless limit
only the positive helicity state survives. Thus our left-handed Weyl field ψL describes
a massless particle with negative helicity, but whose antiparticle is necessarily positive
helicity. The right-handed Weyl field ψR correspondingly describes a positive helicity
particle, paired with a negative helicity antiparticle. As we shall see later, only the
ψL -type neutrino fields participate in weak interactions in the Standard Model of
elementary particle interactions, even though the particles themselves now appear to
have non-zero mass—suggesting that we are dealing with Dirac fermions after all!
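
The eigenvector statement that follows from (7.160) is easy to check numerically. The sketch below assumes one common parametrization of the rotation, R(k̂) = R_z(φ)R_y(θ) for k̂ with polar angles (θ, φ); this convention is an assumption made for illustration and may differ from the one used earlier in the book by a rotation about the z-axis, which would not change the result. All function names are hypothetical.

```python
import cmath
import math

def rotation_matrix(theta, phi):
    """Spin-1/2 matrix for R = R_z(phi) R_y(theta), which takes the z-axis into k-hat."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    em, ep = cmath.exp(-0.5j * phi), cmath.exp(0.5j * phi)
    return [[em * c, -em * s],
            [ep * s, ep * c]]

def helicity_operator(theta, phi):
    """k-hat . sigma for k-hat = (sin(theta)cos(phi), sin(theta)sin(phi), cos(theta))."""
    return [[math.cos(theta), math.sin(theta) * cmath.exp(-1j * phi)],
            [math.sin(theta) * cmath.exp(1j * phi), -math.cos(theta)]]

def eigen_residual(theta, phi, col, eigenvalue):
    """Largest component of (k-hat . sigma) v - eigenvalue * v, with v column `col` of D."""
    d = rotation_matrix(theta, phi)
    h = helicity_operator(theta, phi)
    v = [d[0][col], d[1][col]]
    hv = [h[i][0] * v[0] + h[i][1] * v[1] for i in range(2)]
    return max(abs(hv[i] - eigenvalue * v[i]) for i in range(2))

theta, phi = 0.7, 2.1
print("first column, eigenvalue +1:", eigen_residual(theta, phi, 0, +1))
print("second column, eigenvalue -1:", eigen_residual(theta, phi, 1, -1))
```

Both residuals come out at the level of rounding error, so the second column, the one that survives in (7.162), is indeed the negative-helicity eigenvector.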

7.5 Local covariant fields for spin-1 (vector fields)


Next, let us construct a suitable covariant field for a particle of spin j = 1. As for
spin-½, we begin with the massive case, and return later to the special features that
apply for an exactly massless spin-1 particle. The simplest symmetric choice of HLG
representation (A, B), with A = B, compatible with j = 1, is to take A = B = ½. This
four-dimensional representation turns out to be none other (see Problem 6) than the
defining fundamental representation of the Lorentz group, with representation vectors
labeled by a spacetime index μ. We shall use the notation W^μ(x) for a massive vector
field. The condition (7.48) then amounts to

$$ \sum_\sigma D^{(1)}_{\sigma\sigma'}(R)\,u^\mu(0,\sigma) = R^\mu{}_\nu\,u^\nu(0,\sigma') \tag{7.163} $$

where M(Λ)^μ_ν = Λ^μ_ν is simply R^μ_ν for a pure rotation R. For μ = 0 this reduces to

$$ \sum_\sigma D^{(1)}_{\sigma\sigma'}(R)\,u^0(0,\sigma) = u^0(0,\sigma') \tag{7.164} $$

which implies that u^0(0, σ) = 0. As (7.50) implies that the coefficient functions u^μ(k, σ)
at non-zero momentum are obtained from those at zero momentum by a boost

$$ u^\mu(k,\sigma) = L^\mu{}_\nu(k)\,u^\nu(0,\sigma) \tag{7.165} $$

we must have the transversality condition

$$ k_\mu u^\mu(k,\sigma) = 0 \tag{7.166} $$

as the covariant dot-product vanishes in the rest frame. From (7.54) it follows that
the v coefficients are also transverse. Recalling that k_0 ≡ E(k) ≡ √(|k|² + m²) in the
expression for the vector field

$$ W^\mu(x) = \sum_\sigma \int \frac{d^3k}{(2\pi)^{3/2}\sqrt{2E}} \left( u^\mu(k,\sigma)\,a(k,\sigma)\,e^{-ik\cdot x} + v^\mu(k,\sigma)\,a^{c\dagger}(k,\sigma)\,e^{ik\cdot x} \right) \tag{7.167} $$

we have

$$ (\partial^\rho\partial_\rho + m^2)\,W^\mu(x) = 0 \tag{7.168} $$

$$ \partial_\mu W^\mu(x) = 0 \tag{7.169} $$

where the vanishing divergence of W^μ follows directly from the transversality property
(7.166). The field equations (7.168), (7.169) can be summarized succinctly by defining
the tensor field

$$ F^{\mu\nu}(x) \equiv \partial^\mu W^\nu(x) - \partial^\nu W^\mu(x) \tag{7.170} $$

The single field equation

$$ \partial_\mu F^{\mu\nu} + m^2 W^\nu = 0 \tag{7.171} $$

referred to as the Maxwell–Proca equation (Proca, 1936), is easily seen to yield both
the constraints (7.168), (7.169) which build in the correct transformation properties
of the covariant field W^μ under Lorentz transformations. In the special case of a
self-conjugate vector field, W^μ is hermitian and the antiparticle piece is just the hermitian
adjoint of the particle piece, so (7.167) simplifies to

$$ Z^\mu(x) = \sum_\sigma \int \frac{d^3k}{(2\pi)^{3/2}\sqrt{2E}} \left( \epsilon^\mu(k,\sigma)\,a(k,\sigma)\,e^{-ik\cdot x} + \epsilon^\mu(k,\sigma)^*\,a^\dagger(k,\sigma)\,e^{ik\cdot x} \right) \tag{7.172} $$

where we have (borrowing from electroweak theory) used the letter Z for a neutral
(i.e., self-conjugate) vector boson, and (from quantum electrodynamics) the more
conventional notation ε^μ for the corresponding polarization vector.
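
The claim that the single equation (7.171) contains both (7.168) and (7.169) can be illustrated on a plane wave W^ν = ε^ν e^{−ik·x}. Acting on such a wave, ∂_μ becomes −ik_μ, so the left-hand side of (7.171) reduces to (m² − k²)ε^ν + k^ν(k·ε) times the plane-wave factor. The sketch below evaluates this residual; the function names and the sample vectors are illustrative assumptions, with metric signature (+,−,−,−).

```python
import math

METRIC = (1.0, -1.0, -1.0, -1.0)  # signature (+,-,-,-)

def dot(a, b):
    """Minkowski product a.b (no complex conjugation)."""
    return sum(g * x * y for g, x, y in zip(METRIC, a, b))

def proca_residual(k, eps, m):
    """Components of d_mu F^{mu nu} + m^2 W^nu for W^nu = eps^nu exp(-ik.x),
    with the common plane-wave factor stripped off."""
    k2, k_eps = dot(k, k), dot(k, eps)
    return [(m * m - k2) * e + k_eps * kv for e, kv in zip(eps, k)]

m, kz = 1.0, 2.0
E = math.hypot(kz, m)
k = (E, 0.0, 0.0, kz)

eps_transverse = (-kz / m, 0.0, 0.0, -E / m)   # on shell and k.eps = 0
eps_timelike = (1.0, 0.0, 0.0, 0.0)            # on shell but k.eps != 0

print(proca_residual(k, eps_transverse, m))  # all components vanish
print(proca_residual(k, eps_timelike, m))    # non-zero: not a solution
```

Contracting the residual with k_ν leaves m²(k·ε), so for m ≠ 0 a solution must satisfy k·ε = 0, and what remains of the residual is then the mass-shell condition k² = m².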
Once again, as in the case of spin-½ particles, the massless limit exhibits special
features. We can proceed as before by examining the behavior of the massive field
discussed above in the limit m → 0. The discussion of massless particle representations
in Section 5.3 again indicates that the helicity representation provides the appropriate
description in this limit. From Eqs. (7.64, 7.65) the zero-momentum polarization
vectors for a massive j = 1 particle (with ξ_{AB} = (−1)^{2B}) are easily seen to be

$$ \epsilon^\mu(0,\sigma=0) = \begin{pmatrix} 0 \\ 0 \\ 0 \\ -1 \end{pmatrix}, \qquad \epsilon^\mu(0,\sigma=\pm1) = \begin{pmatrix} 0 \\ \pm\frac{1}{\sqrt{2}} \\ \frac{i}{\sqrt{2}} \\ 0 \end{pmatrix} $$

with the polarization vectors v^μ(0, σ) for the antiparticle term given by the conjugates
ε^{μ*}(0, σ) as in (7.172).
The corresponding vectors for a particle moving in the positive z-direction (so that
the spin σ and helicity λ specifications are equivalent) with momentum k = kẑ are

$$ \epsilon^\mu(k,\lambda=0) = \begin{pmatrix} -\frac{k}{m} \\ 0 \\ 0 \\ -\frac{E(k)}{m} \end{pmatrix}, \qquad \epsilon^\mu(k,\lambda=\pm1) = \begin{pmatrix} 0 \\ \pm\frac{1}{\sqrt{2}} \\ \frac{i}{\sqrt{2}} \\ 0 \end{pmatrix} $$

Note that the maximal helicity states λ = ±1 are unaffected by the boost. On the
other hand, the zero-helicity mode is singular in the massless limit (or, for fixed mass,
in the high-energy limit). This seems quite at variance with the situation for massless
spin-½ Weyl fields, where, for example, the (½,0) left-handed field gave a well-defined
left-handed spinor in the massless limit, while the right-handed spinor vanished as the
mass was taken to zero (cf. (7.161)). Of course, if we start with an exactly massless
j = 1 particle from the outset, the considerations of Section 5.3 assure us that helicity
λ = +1 and λ = −1 states transform separately and irreducibly under the proper
HLG, and in particular do not mix with a zero-helicity mode, which can be eliminated
completely from the theory. On the other hand, if we wish to have parity conserving
interactions, as is certainly the case for the photon, then both λ = +1 and λ = −1
states must appear.
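
The behavior of the three polarization vectors under the boost (7.165) can be checked directly. This is a minimal sketch, assuming a boost along z and the rest-frame vectors quoted above; the helper names are illustrative only.

```python
import math

REST = {
    0: (0.0, 0.0, 0.0, -1.0),
    +1: (0.0, 1 / math.sqrt(2), 1j / math.sqrt(2), 0.0),
    -1: (0.0, -1 / math.sqrt(2), 1j / math.sqrt(2), 0.0),
}

def boost_z(vec, k, m):
    """Apply the boost taking the rest momentum (m,0,0,0) into (E,0,0,k)."""
    E = math.hypot(k, m)
    gamma, gamma_beta = E / m, k / m
    v0, v1, v2, v3 = vec
    return (gamma * v0 + gamma_beta * v3, v1, v2, gamma_beta * v0 + gamma * v3)

def minkowski(a, b):
    """Minkowski product with signature (+,-,-,-), no complex conjugation."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

def polarization(k, m, helicity):
    """Polarization vector for momentum k along z and the given helicity."""
    return boost_z(REST[helicity], k, m)

m = 1.0
for k in (1.0, 10.0, 100.0):
    eps0 = polarization(k, m, 0)
    momentum = (math.hypot(k, m), 0.0, 0.0, k)
    print(k, eps0, "k.eps =", minkowski(momentum, eps0))
```

The λ = ±1 vectors come back unchanged, the λ = 0 vector stays transverse to k^μ with ε·ε = −1, and its components grow like k/m, which is the singular behavior referred to in the text.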
Still, the singularity of the massless limit for the zero-helicity mode is somewhat
unsettling: it would seem to imply that we could distinguish between an exactly massless
photon and one with a mass of 10^{−80} eV, for example. In fact, the interactions of
a photon with charged particles enjoy a gauge symmetry which ensures the decoupling
of the zero-helicity mode and restores the smoothness of the massless limit, as well
as softening the high-energy behavior of the theory, rendering it renormalizable. We
shall return to these issues (of gauge symmetry and renormalizability) in much greater
detail in the “Symmetries and Scales” sections of the book, but it may be useful here
to provide a brief explanation of the role of gauge symmetry in taming the massless
limit for the unwanted λ = 0 mode.
Suppose the interactions of our spin-1 field Z^μ (we use the Z notation as our field
is still massive) are insensitive to altering Z^μ by longitudinal (i.e., gradient) fields of
the form ∂^μΛ(x), for arbitrary Λ(x). Choose Λ(x) to be the hermitian field

$$ \Lambda(x) = \frac{i}{m} \sum_\lambda \int \frac{d^3k}{(2\pi)^{3/2}\sqrt{2E}} \left( a(k,\lambda)\,e^{-ik\cdot x} - a^\dagger(k,\lambda)\,e^{ik\cdot x} \right) \tag{7.173} $$

The replacement Z^μ → Z^μ + ∂^μΛ (referred to as a gauge transformation of the vector
field Z^μ) then clearly amounts to a shift of polarization vector ε^μ(k, λ) → ε^μ(k, λ) + k^μ/m.
This corresponds to an alteration of the zero-helicity polarization vector (choosing
the direction of k along the z-axis) to

$$ \epsilon^\mu(k,\lambda=0) = \begin{pmatrix} \frac{E(k)-k}{m} \\ 0 \\ 0 \\ -\frac{E(k)-k}{m} \end{pmatrix} $$

But, in the massless limit m → 0, k ≡ |k| fixed, (E(k) − k)/m → m/(2k) → 0, so we see that the
interactions of the zero-helicity mode do indeed disappear as promised in the massless
limit. We are therefore at liberty to define a massless spin-1 field, for which we shall now
adopt the standard notation A^μ(x), entirely in terms of the two transverse polarization
vectors ε^μ(k, λ = ±1). For a general direction of motion k̂ of our massless particle,
these vectors are given by choosing real unit vectors ε₁, ε₂ with ε₁ × ε₂ = k̂, whence
the spatial parts of the positive (resp. negative) helicity polarization vectors become
ε(k, +1) = (1/√2)(ε₁ + iε₂) (resp. ε(k, −1) = −ε(k, +1)*). The zeroth (time) component
of these polarization vectors is itself zero, so this construction leads to a purely spatial
field

$$ \vec A(x) = \sum_{\lambda=\pm1} \int \frac{d^3k}{(2\pi)^{3/2}\sqrt{2E}} \left( \vec\epsilon(k,\lambda)\,a(k,\lambda)\,e^{-ik\cdot x} + \vec\epsilon(k,\lambda)^*\,a^\dagger(k,\lambda)\,e^{ik\cdot x} \right) \tag{7.174} $$

satisfying the radiation (or Coulomb) gauge condition

$$ \vec\nabla\cdot\vec A(x) = 0 \tag{7.175} $$

The appearance of a purely spatial field in a (hopefully!) Lorentz-invariant theory may
seem puzzling at first sight, but recall from classical electromagnetic theory that an
arbitrary four-vector potential A^μ(x) satisfying the source-free Maxwell equations can
be put into radiation gauge A⁰ = 0, ∇ · A = 0 by precisely a gauge transformation of
the sort A^μ → A^μ + ∂^μΛ.
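
The decoupling of the zero-helicity mode discussed above rests on a simple piece of arithmetic: adding k^μ/m to the longitudinal polarization vector turns components of order k/m into components of order m/(2k). A minimal sketch, with momentum along z and a hypothetical helper name:

```python
import math

def shifted_longitudinal(k, m):
    """Zero-helicity polarization vector after the shift eps -> eps + k/m (k along z)."""
    E = math.hypot(k, m)
    eps = (-k / m, 0.0, 0.0, -E / m)   # boosted lambda = 0 vector
    kvec = (E, 0.0, 0.0, k)            # four-momentum
    return tuple(e + kv / m for e, kv in zip(eps, kvec))

k = 1.0
for m in (1.0, 0.1, 0.01, 0.001):
    shifted = shifted_longitudinal(k, m)
    print(f"m = {m:6.3f}  unshifted size ~ {k / m:8.1f}  shifted time component = {shifted[0]:.6e}")
```

The shifted vector has the form ((E − k)/m, 0, 0, −(E − k)/m), and its entries shrink linearly with m at fixed k, in line with the limit quoted in the text.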
From (7.174), the recovery of the usual form for the free Hamiltonian as a sum of
electric and magnetic field energies is straightforward (see Problem 9):

$$ H_0 = \sum_{\lambda=\pm1} \int d^3k\,E(k)\,a^\dagger(k,\lambda)\,a(k,\lambda) = \frac{1}{2} \int d^3x\,:(\vec E^{\,2} + \vec B^{\,2}): \tag{7.176} $$

where E ≡ −∂A/∂t, B ≡ ∇ × A.
Instead of developing the concept of covariant fields starting from the somewhat
ad hoc assumption of relativistic wave equations such as (7.168, 7.169, 7.171), the
systematic construction followed in the preceding sections shows clearly that the
form of the field equations, and the corresponding free Hamiltonian, satisfied by
local covariant fields is actually specified uniquely by the underlying particle state
transformation properties, once we insist on choosing the simplest representation of
the HLG (for each spin) that will do the job.

7.6 Some simple theories and processes


At this stage, to give the reader a more concrete idea of how the covariant fields
constructed in the preceding sections can be put to use, we describe some simple theories
of interacting spinless and spin-½ particles, where the calculation of elementary
scattering and decay processes can be carried out to low orders in perturbation theory
without invoking the full technology of covariant perturbation theory, Wick’s theorem,
etc. (which we shall return to later in the book). Why not spin-1 particles as well? We
shall see later, in Section 12.1, that our strategy so far for generating Lorentz-invariant
theories, by writing the interaction Hamiltonian as an integral over an ultralocal
scalar density, fails for theories involving particles of spin-1 or higher, or for theories
involving interactions with spacetime-derivatives: the resultant interaction Hamiltoni-
ans typically contain non-covariant Schwinger terms in their space-like commutators
(cf. discussion in Section 5.5). For such theories, the Lagrangian formalism developed
later in Chapter 12 is ideally suited to ensuring that our theory indeed yields Lorentz-
invariant scattering amplitudes: this will follow automatically simply by constructing
the Lagrangian in a Lorentz-invariant way.
Restricting ourselves, therefore, to theories with only scalar or Dirac particles, we
shall consider some simple processes in theories defined by the following three simple
interaction Hamiltonians:

1. (Theory A) For coupling constant λ real, and φ(x) = φ†(x) (a self-conjugate
scalar field of mass M), assume an interaction Hamiltonian density

$$ H^{(A)}_{\rm int}(x) = \frac{\lambda}{4!}\,\phi(x)^4 \tag{7.177} $$

2. (Theory B) With λ, φ(x) as in theory A above, and with ψ(x) a non-self-conjugate
scalar field of mass m

$$ H^{(B)}_{\rm int}(x) = \lambda\,\psi^\dagger(x)\psi(x)\phi(x) \tag{7.178} $$

3. (Theory C) With λ, φ(x) as in theory B, but with ψ(x) a spin-½ Dirac field of
mass m, subject to a Yukawa interaction

$$ H^{(C)}_{\rm int}(x) = \lambda\,\bar\psi(x)\psi(x)\phi(x) \tag{7.179} $$

We shall be calculating some simple decay and scattering processes in these
theories, so we will need the results on rates and cross-sections from Appendix B. To
employ these results we must first extract the non-singular part T_βα of the T-matrix
element of the process via

$$ S_{\beta\alpha} = \delta_{\beta\alpha} - 2\pi i\,\delta^4(P_\alpha - P_\beta)\,T_{\beta\alpha} \tag{7.180} $$

The desired phenomenological quantities are then given by (see Appendix B for a
derivation of these essential results, specifically (B.17) and (B.27)):
1. The differential decay rate of a one-particle state α to final states β:

$$ d\Gamma(\alpha\to\beta) = 2\pi\,\delta^4(P_\alpha - P_\beta)\,|T_{\beta\alpha}|^2\,d\beta \tag{7.181} $$

where the final-state phase-space factor is just

$$ d\beta = \sum_{\rm spins} \prod_i d^3k_i \tag{7.182} $$

with k_i the final state momenta. Of course, when considering spinless particles,
the sum over final-state spins will be irrelevant.
2. The differential cross-section for a two-particle state α to scatter into final
states β:

$$ d\sigma(\alpha\to\beta) = \frac{(2\pi)^4}{v_\alpha}\,\delta^4(P_\alpha - P_\beta)\,|T_{\beta\alpha}|^2\,d\beta \tag{7.183} $$

where the relative velocity in the initial state v_α is given by

$$ v_\alpha = \frac{\sqrt{(k_1\cdot k_2)^2 - m_1^2 m_2^2}}{E_1 E_2} \tag{7.184} $$

and the final-state phase-space is as above.
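
To get a feel for the relative velocity (7.184), it helps to evaluate it in two familiar frames. The sketch below takes on-shell four-momenta as (E, k_x, k_y, k_z) tuples in natural units and reads the masses off from k·k; the function names are illustrative assumptions.

```python
import math

def minkowski(a, b):
    """Minkowski product with signature (+,-,-,-)."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

def relative_velocity(k1, k2):
    """v_alpha of (7.184) for two on-shell four-momenta."""
    m1_sq = minkowski(k1, k1)
    m2_sq = minkowski(k2, k2)
    k1k2 = minkowski(k1, k2)
    return math.sqrt(k1k2 ** 2 - m1_sq * m2_sq) / (k1[0] * k2[0])

# Center-of-mass frame: equal masses 1, momenta +-0.75 along z, E = 1.25.
print(relative_velocity((1.25, 0.0, 0.0, 0.75), (1.25, 0.0, 0.0, -0.75)))

# Fixed-target frame: particle 2 at rest.
print(relative_velocity((1.25, 0.0, 0.0, 0.75), (1.0, 0.0, 0.0, 0.0)))
```

In the center-of-mass frame the result is 2|k|/E, the sum of the two speeds, and with one particle at rest it reduces to the speed |k|/E of the other, which is what one expects of a flux factor for collinear beams.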
The perturbative expansion of the S-matrix

$$ S = \sum_{n=0}^{\infty} \frac{(-i)^n}{n!} \int d^4x_1 \cdots d^4x_n\; T\{H_{\rm int}(x_1) \cdots H_{\rm int}(x_n)\} \tag{7.185} $$

will be used to extract the T-matrix element for some simple processes in the theories
specified above. We will evaluate the lowest-order non-trivial contribution to S for
each of the following processes:
1. φ − φ scattering in theory A (first order in λ).
2. φ decay in theory B (first order in λ).
3. ψ − ψ^c scattering in theory B (second order in λ).
4. ψ − ψ^c scattering in theory C (second order in λ).

7.6.1 2-2 scattering in λφ⁴ theory


Non-trivial 2-2 scattering in theory A already occurs at first order in λ. We have
deliberately chosen not to normal-order the interaction Hamiltonian (cf. the discussion
in Section 6.4) in order to display certain features of the scattering amplitude which
are characteristic of relativistic field theory, in contradistinction to non-relativistic
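scattering. For the elastic scattering process k₁ + k₂ → k₁′ + k₂′ we have

$$ S_{k_1'k_2',\,k_1k_2} = \frac{-i\lambda}{4!} \int d^4x\; \langle k_1'k_2'|\phi(x)^4|k_1k_2\rangle + O(\lambda^2) \tag{7.186} $$

with

$$ \phi(x) = \int Dp\,\left( a(p)\,e^{-ip\cdot x} + a^\dagger(p)\,e^{ip\cdot x} \right) \tag{7.187} $$

$$ Dp \equiv \frac{d^3p}{(2\pi)^{3/2}\sqrt{2E(p)}} \tag{7.188} $$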

In order to change the incoming state |k₁k₂⟩ to the outgoing one |k₁′k₂′⟩ (with different
momenta), we must choose the annihilation part of two of the φ(x) fields to get rid
of the incoming particles and the creation part of the remaining two to produce the
outgoing ones. A typical matrix element appearing in (7.186) might, for example, be

$$ \langle k_1'k_2'|\,a^\dagger(p_1)\,a(p_2)\,a^\dagger(p_3)\,a(p_4)\,|k_1k_2\rangle \tag{7.189} $$

If we rearrange the product of creation and annihilation operators to produce the
normal-ordered product, we find an additional term:

$$ a^\dagger(p_1)a(p_2)a^\dagger(p_3)a(p_4) = a^\dagger(p_1)a^\dagger(p_3)a(p_2)a(p_4) + \delta^3(\vec p_2 - \vec p_3)\,a^\dagger(p_1)a(p_4) \tag{7.190} $$

which clearly leads to a disconnected term, as

$$ \langle k_1'k_2'|\,a^\dagger(p_1)a(p_4)\,|k_1k_2\rangle = \delta^3(\vec k_1' - \vec p_1)\,\delta^3(\vec k_1 - \vec p_4)\,\delta^3(\vec k_2' - \vec k_2) + \text{permutations} \tag{7.191} $$

corresponding to disconnected terms of the structure displayed in Fig. 6.6, in which one
of the particles passes through the process completely unaffected by the interaction,
while the other particle suffers a self-interaction induced by the interaction. Such
persistent self-interactions, present even for an isolated particle, are typically absent
in non-relativistic scattering, but are an intrinsic feature of relativistic field theory. As
we shall see in our more detailed discussion of perturbation theory in Chapter 10, and
in even more detail in Part 4 of the book (on “Scales”), they result in renormalizations
of the attributes of single-particle states: specifically, of the mass and normalization
of the one particle states of the theory. For now, we ignore such effects, concentrating
on the fully connected contributions to the scattering exemplified by the right-most
diagram in Fig. 6.6.

Fully connected contributions to the scattering arise from keeping the normal-ordered
piece from each of the six possible ways in which we can pick two creation
and two annihilation terms from the product of four fields in (7.186). Relabeling the
momentum integration variables, we find six equivalent contributions. For example,
we can take the term

$$ \int d^4x \int \prod_{i=1}^{4} Dp_i\; \langle k_1'k_2'|\,a^\dagger(p_1)e^{ip_1\cdot x}\,a^\dagger(p_2)e^{ip_2\cdot x}\,a(p_3)e^{-ip_3\cdot x}\,a(p_4)e^{-ip_4\cdot x}\,|k_1k_2\rangle \tag{7.192} $$

and then multiply by 6. In (7.192) the annihilation operator a(p₄) can remove either the
particle k₁, in which case a(p₃) must remove k₂, or the other way around. Likewise,
there are two possibilities for creating the final state. Recalling that a(p₄)|k₁ ...⟩ =
δ³(p₄ − k₁)|...⟩, ⟨k₁′ ...|a†(p₁) = δ³(p₁ − k₁′)⟨...|, we see that the only interesting parts
of (7.192) evaluate to

$$ \int d^4x \int \prod_{i=1}^{4} Dp_i\, \left( \delta^3(\vec p_1 - \vec k_1')\,\delta^3(\vec p_2 - \vec k_2') + p_1 \leftrightarrow p_2 \right) \left( \delta^3(\vec p_3 - \vec k_1)\,\delta^3(\vec p_4 - \vec k_2) + p_3 \leftrightarrow p_4 \right) e^{i(p_1+p_2-p_3-p_4)\cdot x} \tag{7.193} $$

$$ = (2\pi)^4 \int \prod_{i=1}^{4} Dp_i\; \delta^4(p_1+p_2-p_3-p_4) \left( \delta^3(\vec p_1 - \vec k_1')\,\delta^3(\vec p_2 - \vec k_2') + p_1 \leftrightarrow p_2 \right) \left( \delta^3(\vec p_3 - \vec k_1)\,\delta^3(\vec p_4 - \vec k_2) + p_3 \leftrightarrow p_4 \right) \tag{7.194} $$

$$ = \frac{4}{(2\pi)^6 \sqrt{2E_1 \cdot 2E_2 \cdot 2E_1' \cdot 2E_2'}}\; (2\pi)^4\,\delta^4(k_1' + k_2' - k_1 - k_2) \tag{7.195} $$

To summarize, the connected S-matrix scattering amplitude (putting in the factor
of 6 discussed above) becomes

$$ S^c_{k_1'k_2',\,k_1k_2} = \frac{-i\lambda}{(2\pi)^6 \sqrt{2E_1 \cdot 2E_2 \cdot 2E_1' \cdot 2E_2'}}\; (2\pi)^4\,\delta^4(k_1' + k_2' - k_1 - k_2) + O(\lambda^2) \tag{7.196} $$

Extracting the T-matrix element with (7.180):

$$ T^c_{k_1'k_2',\,k_1k_2} = \frac{\lambda}{(2\pi)^3 \sqrt{2E_1 \cdot 2E_2 \cdot 2E_1' \cdot 2E_2'}} + O(\lambda^2) \tag{7.197} $$

so the differential cross-section is (from (7.183))

$$ d\sigma = \frac{\lambda^2}{64\pi^2}\, \frac{1}{\sqrt{(k_1\cdot k_2)^2 - M^4}}\; \delta^4(k_1' + k_2' - k_1 - k_2)\, \frac{d^3k_1'\,d^3k_2'}{E_1' E_2'} \tag{7.198} $$

At this point we have to be a little more specific about the kinematic conditions for the
scattering. Let us then assume we are performing the experiment in a collider, in the
center-of-mass frame of the two incoming particles. Thus the four-vector momenta
take the form k₁ = (E, k), k₂ = (E, −k), k₁′ = (E₁′, k₁′), k₂′ = (E₂′, k₂′). One finds
(k₁ · k₂)² − M⁴ = 4|k|²E², while the energy-momentum conservation δ-function is
δ⁴(..) = δ(E₁′ + E₂′ − 2E) δ³(k₁′ + k₂′). Inserting these into (7.198), and doing the k₂′
integration using the δ-function, we find

$$ d\sigma = \frac{\lambda^2}{64\pi^2}\, \frac{1}{2E|\vec k|}\, \frac{|\vec k_1'|^2}{(E_1')^2}\; \delta(2E_1' - 2E)\; d|\vec k_1'|\, d\Omega_{\hat k_1'} \tag{7.199} $$

If we have detectors deployed with angular resolution, we should leave the integral
over solid angles undone; on the other hand, the detector will “blip” once the particle
enters the opening angle of the detector, irrespective of the magnitude of momentum
|k₁′|, so we should integrate over this variable. This gives the final result, a differential
cross-section

$$ \frac{d\sigma}{d\Omega_{\hat k_1'}} = \frac{\lambda^2}{256\pi^2 E^2} + O(\lambda^3) = \frac{\lambda^2}{256\pi^2 (\vec k^{\,2} + M^2)} + O(\lambda^3) \tag{7.200} $$

Note that the scattering (to lowest order) is isotropic (“s-wave”): independent of the
angle between k₁ and k₁′.
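
Two bookkeeping steps in this calculation are easy to get wrong: the combinatorial factor that cancels the 1/4! in (7.177), and the Jacobian from the energy δ-function in (7.199). The sketch below spells out both, assuming the center-of-mass kinematics used above; the function names are hypothetical.

```python
import math
from itertools import combinations

def count_assignments():
    """Number of equivalent connected terms: pick which 2 of the 4 fields create
    (6 ways), then pair the 2 annihilators with the incoming particles (2 ways)
    and the 2 creators with the outgoing particles (2 ways)."""
    field_choices = len(list(combinations(range(4), 2)))
    return field_choices * 2 * 2

def dsigma_domega(lam, k, M):
    """Radial integral of (7.199) in the center-of-mass frame, lowest order in lam."""
    E = math.hypot(k, M)
    prefactor = lam ** 2 / (64 * math.pi ** 2) / (2 * E * k)
    # delta(2E' - 2E) = delta(|k'| - k) / |d(2E')/d|k'|| = delta(|k'| - k) * E / (2k)
    jacobian = E / (2 * k)
    radial = (k ** 2 / E ** 2) * jacobian
    return prefactor * radial

print(count_assignments(), math.factorial(4))
print(dsigma_domega(0.5, 2.0, 1.0), 0.5 ** 2 / (256 * math.pi ** 2 * 5.0))
```

The count of 24 matches 4!, which is why the amplitude (7.196) carries a bare −iλ, and the radial integral reproduces λ²/(256π²E²) with no residual dependence on the scattering angle.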

7.6.2 φ decay in theory B


In theory B we have two distinct spinless particles, the φ particle with mass M,
and a non-self-conjugate ψ particle with mass m. We now assume that M > 2m. We
shall see that the interaction (7.178) induces a decay process φ → ψ + ψ^c. Denote
the momentum of the initial φ particle by k, with k₁ (resp. k₂) the momenta of the
outgoing ψ (resp. ψ^c). We need to calculate

$$ S_{k_1k_2,\,k} = -i\lambda \int d^4x\; \langle k_1k_2|\,\psi^\dagger(x)\psi(x)\phi(x)\,|k\rangle + O(\lambda^2) \tag{7.201} $$

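The relevant pieces of the various field operators are clearly

$$ \phi(x) \to \int Dp\; a(p)\,e^{-ip\cdot x} $$

$$ \psi(x) \to \int Dp_2\; a^{c\dagger}(p_2)\,e^{ip_2\cdot x} $$

$$ \psi^\dagger(x) \to \int Dp_1\; a^\dagger(p_1)\,e^{ip_1\cdot x} $$

Inserting these forms in (7.201) and performing the spacetime and momentum integrals,
we obtain

$$ S_{k_1k_2,\,k} = \frac{-i\lambda}{(2\pi)^{9/2} \sqrt{2E \cdot 2E_1 \cdot 2E_2}}\; (2\pi)^4\,\delta^4(k_1 + k_2 - k) + O(\lambda^2) \tag{7.202} $$

Stripping off the δ-function, this gives the T-matrix element

$$ T_{k_1k_2,\,k} = \frac{\lambda}{(2\pi)^{3/2} \sqrt{2E \cdot 2E_1 \cdot 2E_2}} + O(\lambda^2) \tag{7.203} $$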
The general decay formula (7.181) can now be applied to get the differential decay
rate:

$$ d\Gamma = 2\pi\,\delta^4(k_1 + k_2 - k)\, \frac{\lambda^2}{(2\pi)^3}\, \frac{1}{2E \cdot 2E_1 \cdot 2E_2}\; d^3k_1\, d^3k_2 + O(\lambda^3) \tag{7.204} $$

Once again we have reached the point where a choice of frame needs to be made.
Clearly, it is easiest to pick the rest frame for the decaying particle (simple time-dilation
arguments give the decay rate in other frames). So let us take k = (M, 0),
k₁ = (E₁, k₁), k₂ = (E₂, k₂). One then finds (integrating out k₂, and neglecting higher
orders)

$$ d\Gamma = \frac{\lambda^2}{(2\pi)^2}\, \frac{1}{2M(2E_1)^2}\; \delta(2E_1 - M)\, |\vec k_1|^2\, d|\vec k_1|\, d\Omega_{\hat k_1} \tag{7.205} $$

As in the calculation of the φ − φ scattering, we leave the angular integral undone to
get a differential decay rate per unit solid angle, and perform the integration over the
radial momentum component; a short calculation gives

$$ \frac{d\Gamma}{d\Omega_{\hat k_1}} = \frac{\lambda^2}{64\pi^2}\, \frac{\sqrt{M^2 - 4m^2}}{M^2} \tag{7.206} $$

which, of course, makes sense only if M > 2m, as assumed initially (otherwise the
energy δ-function constraint in the k₁ integral is never satisfied, and we simply get zero:
the φ particle is stable). Note that the above calculation gives the rate for detecting
ψ particles at any angle (in this theory, the particle and antiparticle are in principle
distinguishable; of course, the ψ^c simply comes out in the opposite direction to the ψ
in the rest frame of the φ). The decay is isotropic, so the total decay rate (#decays/sec
in all directions) is just, to second order in λ,

$$ \Gamma_{\rm tot} = \frac{\lambda^2}{16\pi}\, \frac{\sqrt{M^2 - 4m^2}}{M^2} \tag{7.207} $$
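
The "short calculation" leading from (7.205) to (7.206) and (7.207) consists of one δ-function Jacobian and some rest-frame kinematics. A minimal sketch in natural units follows, with hypothetical function names; it returns zero below threshold, in line with the stability remark above.

```python
import math

def dgamma_domega(lam, M, m):
    """Radial integral of (7.205) in the rest frame of the decaying phi."""
    if M <= 2 * m:
        return 0.0  # the energy delta function is never satisfied
    E1 = M / 2
    k1 = math.sqrt(E1 ** 2 - m ** 2)
    prefactor = lam ** 2 / (2 * math.pi) ** 2 / (2 * M * (2 * E1) ** 2)
    # delta(2 E1 - M) = delta(|k1| - k*) / |d(2 E1)/d|k1|| = delta(|k1| - k*) * E1 / (2 k*)
    jacobian = E1 / (2 * k1)
    return prefactor * k1 ** 2 * jacobian

def total_width(lam, M, m):
    """Isotropic decay: multiply the rate per unit solid angle by 4 pi."""
    return 4 * math.pi * dgamma_domega(lam, M, m)

lam, M, m = 1.0, 3.0, 1.0
print(dgamma_domega(lam, M, m), lam ** 2 * math.sqrt(M ** 2 - 4 * m ** 2) / (64 * math.pi ** 2 * M ** 2))
print(total_width(lam, M, m), lam ** 2 * math.sqrt(M ** 2 - 4 * m ** 2) / (16 * math.pi * M ** 2))
```

The printed pairs agree, which confirms the square-root threshold factor √(M² − 4m²) in both (7.206) and (7.207).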

7.6.3 ψ − ψ^c scattering in theory B


Referring to the form of the interaction Hamiltonian (7.178) in theory B, we see that
the destruction of two incoming ψ particles and the creation of two outgoing ones
requires at least four ψ-type fields, i.e., the interaction Hamiltonian must appear to
at least second order. Moreover, as there are no φ particles in either the initial or final
state, only terms of even order in H_int are relevant to this process:

$$ S_{k_1'k_2',\,k_1k_2} = -\frac{\lambda^2}{2} \int d^4x_1\, d^4x_2\; \langle k_1'k_2'|\,T\{\psi^\dagger(x_1)\psi(x_1)\phi(x_1)\,\psi^\dagger(x_2)\psi(x_2)\phi(x_2)\}\,|k_1k_2\rangle + O(\lambda^4) \tag{7.208} $$

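The matrix element in the joint Fock space of multi-particle ψ and φ states can be
factorized in an obvious way:

$$ \begin{aligned}
\langle k_1'k_2'|\,T\{\ldots\}\,|k_1k_2\rangle = \{\, &\theta(t_1 - t_2)\, \langle 0|\phi(x_1)\phi(x_2)|0\rangle\, \langle k_1'k_2'|\,\psi^\dagger(x_1)\psi(x_1)\psi^\dagger(x_2)\psi(x_2)\,|k_1k_2\rangle \\
+\, &\theta(t_2 - t_1)\, \langle 0|\phi(x_2)\phi(x_1)|0\rangle\, \langle k_1'k_2'|\,\psi^\dagger(x_2)\psi(x_2)\psi^\dagger(x_1)\psi(x_1)\,|k_1k_2\rangle \,\}
\end{aligned} \tag{7.209} $$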

As in the calculation of φ − φ scattering, we are only interested in the connected
part of the ψ matrix element, in which every field operator associates with one of the
incoming or outgoing particles. Only ψ fields contain annihilation parts for ψ particles
(or creation parts for ψ^c), so these fields associate with the particles with momenta
k₁ (resp. k₂′). Likewise, the ψ† fields take care of the particles with momenta k₂, k₁′.
The result clearly contains four terms:

$$ \begin{aligned}
\langle k_1'k_2'|\,\psi^\dagger(x_1)\psi(x_1)\psi^\dagger(x_2)\psi(x_2)\,|k_1k_2\rangle = {\rm PSF} \cdot \big( &e^{-i(k_1+k_2)\cdot x_2 + i(k_1'+k_2')\cdot x_1} + e^{i(k_1'-k_1)\cdot x_2 + i(k_2'-k_2)\cdot x_1} \\
+\, &e^{i(k_2'-k_2)\cdot x_2 + i(k_1'-k_1)\cdot x_1} + e^{i(k_1'+k_2')\cdot x_2 - i(k_1+k_2)\cdot x_1} \big)
\end{aligned} \tag{7.210} $$

where “PSF” is an elegant notation for the phase-space factor

$$ {\rm PSF} \equiv \frac{1}{(2\pi)^6 \sqrt{2E_1 \cdot 2E_2 \cdot 2E_1' \cdot 2E_2'}} \tag{7.211} $$

Note that the result (7.210) is symmetric under the interchange x₁ ↔ x₂, so that the
two matrix elements involving the ψ fields in (7.209) are in fact equal, and can be
taken out as a common factor. The matrix elements involving the φ field can then be
recombined into a single time-ordered product, as follows:

$$ S_{k_1'k_2',\,k_1k_2} = -\frac{\lambda^2}{2}\,({\rm PSF}) \int d^4x_1\, d^4x_2\; \langle 0|T(\phi(x_1)\phi(x_2))|0\rangle \times \{ e^{-i(k_1+k_2)\cdot x_2 + i(k_1'+k_2')\cdot x_1} + \ldots \} + O(\lambda^4) \tag{7.212} $$

The vacuum expectation value (vev) of the time-ordered product of two fields
appears in almost every perturbative calculation in field theory: it has been dubbed
the Feynman propagator (see Fig. 7.1). Conventionally, a factor of i is included in the
definition, as follows:

$$ i\Delta_F(x_1, x_2) \equiv \langle 0|T(\phi(x_1)\phi(x_2))|0\rangle \tag{7.213} $$

$$ = \theta(t_1 - t_2) \int \frac{d^3k}{(2\pi)^3\, 2E(k)}\, e^{+ik\cdot(x_2 - x_1)} + \theta(t_2 - t_1) \int \frac{d^3k}{(2\pi)^3\, 2E(k)}\, e^{-ik\cdot(x_2 - x_1)} \equiv i\Delta_F(x_1 - x_2) \tag{7.214} $$

(Strictly speaking, this is the propagator for the φ field which should be distinguished
from a similar object for the ψ field, not needed in this calculation.) Note that the
Feynman propagator contains both time orderings t₁ ≤ t₂ and t₁ ≥ t₂; as we should
expect from the discussion in Chapter 3 (see Fig. 3.1), a frame-independent description
of particle exchange must necessarily include a symmetrical treatment of the emission
and absorption events for the exchanged particle.

Fig. 7.1 Contours for the Feynman propagator (complex k⁰ plane, with poles at −E(k) + iε and E(k) − iε, and return contours for x⁰ < 0 and x⁰ > 0).

The coordinate space version of the propagator given above is less useful in practice
than the Fourier transform, as we are typically more interested in the behavior of
amplitudes in momentum space. In fact, the Fourier transform is remarkably simple:

$$ \Delta_F(x) = \int \frac{d^4k}{(2\pi)^4}\, \frac{e^{-ik\cdot x}}{k_\mu k^\mu - M^2 + i\epsilon} \tag{7.215} $$

where ε is a positive infinitesimal quantity. To see this, first note that

$$ k^2 - M^2 + i\epsilon = (k^0)^2 - E(k)^2 + i\epsilon = (k^0 - E(k) + i\epsilon)(k^0 + E(k) - i\epsilon) \tag{7.216} $$

Accordingly, if x⁰ > 0 (resp. x⁰ < 0), the integrand of (7.215) vanishes exponentially
fast in the negative (resp. positive) imaginary direction in the complex plane of the
k⁰ integration variable, and we can use Cauchy’s theorem to close the contour in the
lower (resp. upper) half-plane, in which case we pick up the residue of the pole at
k⁰ = E(k) − iε (resp. k⁰ = −E(k) + iε). The result of the k⁰ integration is then

$$ \Delta_F(x) = -i \int \frac{d^3k}{(2\pi)^3\, 2E(k)}\, e^{-ik\cdot x}, \quad x^0 > 0 \tag{7.217} $$

$$ \Delta_F(x) = -i \int \frac{d^3k}{(2\pi)^3\, 2E(k)}\, e^{ik\cdot x}, \quad x^0 < 0 \tag{7.218} $$

agreeing with (7.214).
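
The contour argument can be tested by brute force: integrate over k⁰ along the real axis with a small but finite pole displacement and compare with the residue formula. The sketch below assumes a finite displacement η in place of the infinitesimal ε (so the poles sit at ±(E − iη)), a simple trapezoid rule, and a finite cutoff; the function names and numerical parameters are illustrative choices, not taken from the text.

```python
import cmath
import math

def k0_integral(t, E, eta, cutoff=200.0, step=0.01):
    """Trapezoid estimate of
    (1/2pi) * integral dk0 exp(-i k0 t) / ((k0 - E + i eta)(k0 + E - i eta))."""
    n = int(round(2 * cutoff / step))
    total = 0j
    for j in range(n + 1):
        k0 = -cutoff + j * step
        weight = 0.5 if j in (0, n) else 1.0
        denom = (k0 - E + 1j * eta) * (k0 + E - 1j * eta)
        total += weight * cmath.exp(-1j * k0 * t) / denom
    return total * step / (2 * math.pi)

def residue_result(t, E, eta):
    """Closed form from closing the contour: -i exp(-i p |t|) / (2p), p = E - i eta."""
    p = E - 1j * eta
    return -1j * cmath.exp(-1j * p * abs(t)) / (2 * p)

numeric = k0_integral(1.0, 1.0, 0.2)
exact = residue_result(1.0, 1.0, 0.2)
print(numeric, exact, abs(numeric - exact))
```

For either sign of t the numerical value agrees with −i e^{−ip|t|}/(2p) to within the truncation error of the cutoff, and as η → 0 this becomes the factor −i e^{−iE|t|}/(2E) that appears under the d³k integrals in (7.217) and (7.218).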


The Feynman propagator is the Green function for an extremely important differential
operator, the Klein–Gordon operator □ + M² (cf. (6.56)):

$$ (\Box + M^2)\,\Delta_F(x) = -\delta^4(x) \tag{7.219} $$

It is apparent from the Fourier representation (7.215) that the Feynman propagator is
a Lorentz-invariant function of its spacetime argument x, despite its definition (7.214)
involving θ-functions of a frame-dependent time. The resolution of this apparent
paradox becomes clear if we recall the locality of the φ(x) field: under a Lorentz
transformation Λ altering the time-ordering of φ(x1 ) and φ(x2 ), the space-like separa-
tion of x1 and x2 ensures that the order of field operators is irrelevant, as the field is
local. This is just a special case of the argument given in Section 5.5 for the Lorentz-
invariance of an S-matrix constructed perturbatively in terms of the time-ordered
products of local scalar interaction Hamiltonians.
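Returning to the calculation of the S-matrix element for elastic ψ − ψ^c scattering
in (7.212), we can now write (at second order in λ)

$$ \begin{aligned}
S_{k_1'k_2',\,k_1k_2} &= -i\,\frac{\lambda^2}{2}\,({\rm PSF}) \int d^4x_1\, d^4x_2 \int \frac{d^4k}{(2\pi)^4}\, \frac{1}{k^2 - M^2 + i\epsilon} \\
&\qquad \times\, e^{ik\cdot(x_2 - x_1)} \left( e^{-i(k_1+k_2)\cdot x_2 + i(k_1'+k_2')\cdot x_1} + e^{-i(k_1-k_1')\cdot x_2 + i(k_2'-k_2)\cdot x_1} + x_1 \leftrightarrow x_2 \right) \\
&= -i\lambda^2\,({\rm PSF}) \int \frac{d^4k}{(2\pi)^4}\, \frac{(2\pi)^8}{k^2 - M^2 + i\epsilon}\, \{ \delta^4(k - k_1 - k_2)\,\delta^4(k - k_1' - k_2') + \delta^4(k - k_1 + k_1')\,\delta^4(k + k_2 - k_2') \} \\
&= -i\lambda^2\,({\rm PSF})\,(2\pi)^4\,\delta^4(k_1 + k_2 - k_1' - k_2') \left\{ \frac{1}{(k_1 + k_2)^2 - M^2 + i\epsilon} + \frac{1}{(k_1 - k_1')^2 - M^2 + i\epsilon} \right\}
\end{aligned} \tag{7.220} $$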

We see that the final amplitude is the sum of two distinct pieces, depending
respectively on the square of the total incoming four-momentum, $s \equiv (k_1 + k_2)^2$ (equal
to four times the square of the particle energy in the CM frame) and on the square of
the four-momentum transferred from the $\psi$ to $\psi^c$, $t \equiv (k_1 - k_1')^2$. In the CM frame,
the second term will, of course, lead to an angular dependence of the differential
cross-section. The two terms have a simple graphical interpretation (see Fig. 7.2: note
that the direction of the arrows for the $\psi$ particles indicates “charge”, rather than
momentum, flow, if we associate positive charge with the $\psi$ and negative charge with
the antiparticle $\psi^c$). The result for the S-matrix amplitude can be read off immediately
from the following simple Feynman rules:

1. $-i\lambda$ at each vertex.
2. $i\Delta_F(p) \equiv \frac{i}{p^2 - M^2 + i\epsilon}$ for each internal $\phi$ line carrying momentum $p$.
3. Apply four-momentum conservation at each vertex.

Fig. 7.2 Feynman graphs for $\psi$–$\psi^c$ scattering.

4. Each external line has an associated factor of $\frac{1}{(2\pi)^{3/2}\sqrt{2E(k)}}$: the product for all
external lines yields the phase-space factor called PSF above (see discussion of
Lorentz-invariance below!).
5. There is an overall energy-momentum conservation factor $(2\pi)^4\delta^4(P_\alpha - P_\beta)$.
A real derivation of these rules, valid to all orders of perturbation theory, will be given
later in Chapter 10, once we have Wick’s theorem at our disposal.
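
As a rough numerical companion to (7.220), the sketch below evaluates the invariants s and t and the bracketed sum of the two propagator terms for equal-mass elastic scattering in the CM frame. The metric signature (+, -, -, -), the function names, and the sample values of m, M, p and θ are assumptions made for illustration. The $i\epsilon$ is dropped, so the code is only meaningful away from the pole at $s = M^2$, and the overall factor $-i\lambda^2(\mathrm{PSF})(2\pi)^4\delta^4$ is omitted.

```python
import math

def mdot(a, b):
    """Minkowski product with signature (+, -, -, -)."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

def combine(a, b, sign=1.0):
    """Component-wise a + sign * b for four-vectors stored as tuples."""
    return tuple(x + sign * y for x, y in zip(a, b))

def cm_momenta(m, p, theta):
    """On-shell momenta k1, k2 (incoming) and k1p, k2p (outgoing) in the CM frame."""
    E = math.sqrt(p * p + m * m)
    k1 = (E, 0.0, 0.0, p)
    k2 = (E, 0.0, 0.0, -p)
    k1p = (E, p * math.sin(theta), 0.0, p * math.cos(theta))
    k2p = (E, -p * math.sin(theta), 0.0, -p * math.cos(theta))
    return k1, k2, k1p, k2p

def reduced_amplitude(m, M, p, theta):
    """Return (s, t, 1/(s - M^2) + 1/(t - M^2)); the i*epsilon is dropped (assumes s != M^2)."""
    k1, k2, k1p, _ = cm_momenta(m, p, theta)
    total = combine(k1, k2)
    transfer = combine(k1, k1p, -1.0)
    s = mdot(total, total)
    t = mdot(transfer, transfer)
    return s, t, 1.0 / (s - M * M) + 1.0 / (t - M * M)

# Illustrative values only: m = 1, M = 3, |p| = 1, scattering angle 90 degrees.
s, t, bracket = reduced_amplitude(1.0, 3.0, 1.0, math.pi / 2)
print("s =", s, " t =", t, " bracket =", bracket)
```

Only the t-dependent term changes with θ, which is the angular dependence of the cross-section mentioned above.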
One final comment on the Lorentz-invariance of our results. With the exception
of the phase-space factor associated with the external lines (see (7.211)), our final
result for the S-matrix element (7.220) is manifestly a Lorentz-invariant function of
the momenta. The presence of the non-Lorentz-invariant phase-space factor is due
to our use of non-covariantly normalized particle states: if we return to covariantly
normalized states, the “PSF” factor disappears (as there is now an extra factor of
$(2\pi)^{3/2}\sqrt{2E(k)}$ for each particle in the initial or final state, exactly cancelling the
PSF), and we once again have the critical invariance property which we set out to
ensure many pages ago: namely, for 2-2 scattering

$$
\begin{aligned}
{}_{\rm cov}\langle k_1'k_2'|S|k_1k_2\rangle_{\rm cov} &= {}_{\rm cov}\langle k_1'k_2'|U^\dagger(\Lambda)SU(\Lambda)|k_1k_2\rangle_{\rm cov} \\
&= {}_{\rm cov}\langle \Lambda k_1'\,\Lambda k_2'|S|\Lambda k_1\,\Lambda k_2\rangle_{\rm cov}
\end{aligned}
$$

7.6.4 $\psi$–$\psi^c$ scattering in Theory C


We now turn to a theory with massive Dirac particles interacting via a Yukawa
interaction term, (7.179). The Dirac fields appearing in the interaction are

$$\psi(x) = \sum_\sigma\int Dp\,\left(u(p,\sigma)\,b(p,\sigma)\,e^{-ip\cdot x} + v(p,\sigma)\,d^\dagger(p,\sigma)\,e^{ip\cdot x}\right) \tag{7.221}$$

$$\bar\psi(x) = \sum_\sigma\int Dp\,\left(\bar u(p,\sigma)\,b^\dagger(p,\sigma)\,e^{ip\cdot x} + \bar v(p,\sigma)\,d(p,\sigma)\,e^{-ip\cdot x}\right) \tag{7.222}$$

with

$$Dp \equiv \frac{1}{(2\pi)^{3/2}}\sqrt{\frac{m}{E(p)}}\;d^3p \tag{7.223}$$

We shall consider the scattering of an incoming $\psi$ particle (with momentum and spin
$p_1, \sigma_1$) on an anti-$\psi$ particle (momentum and spin $p_2, \sigma_2$). The outgoing particles
are similarly labeled, with primes added. As before, we work to second order in the
interaction (7.179). Following steps exactly analogous to those leading to the term
(7.210) in theory B, we encounter a matrix element of the form

$$
\begin{aligned}
&\langle p_1'\sigma_1', p_2'\sigma_2'|\bar\psi(x_1)\psi(x_1)\bar\psi(x_2)\psi(x_2)|p_1\sigma_1, p_2\sigma_2\rangle \\
&\quad= \mathrm{PSF}\,\{\bar u(p_1'\sigma_1')\,v(p_2'\sigma_2')\,\bar v(p_2\sigma_2)\,u(p_1\sigma_1)\,(e^{i(p_1'+p_2')\cdot x_1 - i(p_1+p_2)\cdot x_2} + x_1\leftrightarrow x_2) \\
&\qquad - \bar u(p_1'\sigma_1')\,u(p_1\sigma_1)\,\bar v(p_2\sigma_2)\,v(p_2'\sigma_2')\,(e^{i(p_1'-p_1)\cdot x_1 + i(p_2'-p_2)\cdot x_2} + x_1\leftrightarrow x_2)\}
\end{aligned}
\tag{7.224}
$$

where the phase-space factor now takes the form

$$\mathrm{PSF} = \frac{m^2}{(2\pi)^6\sqrt{E(p_1)E(p_2)E(p_1')E(p_2')}} \tag{7.225}$$

Note the appearance of a minus sign in two of the four terms, due to the need for
an odd number of transpositions of fermionic creation and destruction operators in
the fields in order to destroy the incoming particles and create the final-state ones.
As we shall see shortly when we discuss the crossing symmetry of these amplitudes,
the minus sign is an indication of a generalized notion of Fermi antisymmetrization
of amplitudes, applicable in local relativistic theories to exchange of particles between
the initial and final states (and not only, as in non-relativistic quantum mechanics,
to exchange of identical particles in either the initial or final state). Nevertheless, the
entire expression is symmetric under x1 ↔ x2 , just as in the bosonic case of Theory
B. This allows us, as there, to factor the fermionic matrix element entirely from the
vacuum expectation of time-ordered φ fields, so we recover, as before, a Feynman
propagator iΔF (x1 − x2 ) for the φ field. After substituting the Fourier transform
expression (7.215) and performing the spacetime integrals over x1 and x2 , we obtain
the final result for the second-order scattering amplitude:

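$$
\begin{aligned}
S_{p_1'\sigma_1'p_2'\sigma_2',\,p_1\sigma_1p_2\sigma_2} &= i\lambda^2(\mathrm{PSF})(2\pi)^4\delta^4(p_1 + p_2 - p_1' - p_2') \\
&\quad\times\left\{\frac{\bar u(p_1'\sigma_1')\,u(p_1\sigma_1)\,\bar v(p_2\sigma_2)\,v(p_2'\sigma_2')}{(p_1 - p_1')^2 - M^2 + i\epsilon}
 - \frac{\bar u(p_1'\sigma_1')\,v(p_2'\sigma_2')\,\bar v(p_2\sigma_2)\,u(p_1\sigma_1)}{(p_1 + p_2)^2 - M^2 + i\epsilon}\right\}
\end{aligned}
\tag{7.226}
$$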

Apart from the minus sign, the result (7.226) is very similar to the result (7.220), with
a contribution from scalar exchange interfering with one from particle–antiparticle
annihilation, as indicated in Fig. 7.3. Our previous list of Feynman rules needs only
the following obvious additions for spin-1/2 Dirac particles:

Fig. 7.3 Feynman graphs for $\psi$–$\psi^c$ scattering in Theory C.

1. Drawing particle lines with the arrow pointing upward (i.e., from the
initial to the final state), each final-state particle is associated with a
factor $\frac{1}{(2\pi)^{3/2}}\sqrt{\frac{m}{E(p')}}\,\bar u(p'\sigma')$, and each initial-state particle with a factor
$\frac{1}{(2\pi)^{3/2}}\sqrt{\frac{m}{E(p)}}\,u(p\sigma)$.
2. Adopting the convention that antiparticle lines should be drawn with the arrow
pointing downward (i.e., from final to initial state), each initial-state antiparticle
is associated with a factor $\frac{1}{(2\pi)^{3/2}}\sqrt{\frac{m}{E(p)}}\,\bar v(p\sigma)$, and each final-state antiparticle
with a factor $\frac{1}{(2\pi)^{3/2}}\sqrt{\frac{m}{E(p')}}\,v(p'\sigma')$.

In other processes in Theory C, such as φ-ψ scattering (see Problem 12), the
Feynman–Dirac propagator iSF (x1 − x2 ), involving the time-ordered product of a ψ
with a ψ̄ field (see Problem 6), will make its appearance. After all Fourier transforms
are performed, this leads, as expected, to a factor of iSF (p) for every internal fermion
line carrying four-momentum p.

7.6.5 Crossing symmetry


The fact that the multi-particle scattering amplitudes of identical bosons (or fermions)
are symmetric (resp. antisymmetric) under exchange of any two momenta in either
the initial or the final state is a basic and irreducible postulate of non-relativistic
quantum mechanics: it cannot be derived from any more fundamental concepts in
the non-relativistic arena, as both relativistic invariance and the locality requirement
are absent. In particular, particles and antiparticles are distinct and totally unrelated
objects from a non-relativistic point of view. In local field theory, on the other hand, we
have seen that particles and antiparticles are intimately related by their symmetrical
appearance in the local fields used to construct the fundamental interaction processes
of the theory. This leads to a generalization of the Bose–Fermi exchange symmetry
corresponding to exchange of initial-state particles with final-state antiparticles which
very strongly constrains the structure of a relativistic S-matrix. This generalization, for
reasons that will shortly become apparent, is called crossing symmetry. We are not yet
in a position to give a general proof of crossing symmetry, valid beyond perturbation
theory (i.e., for the exact amplitudes of the theory): this will come later, in Chapter 9,
when we have in hand the master (“LSZ”) formula derived by Lehmann, Symanzik,
and Zimmermann (Lehmann et al., 1955) for arbitrary scattering amplitudes in a
relativistic field theory. At that point, the relation between the crossing symmetry of
the scattering amplitudes and the Bose (or Fermi) character of the underlying local
fields will become immediate and manifest. Here, we shall content ourselves with a
few simple examples based on the toy theories of the preceding sections.
A simple example of crossing symmetry can be seen by examining the second-order
S-matrix amplitude for elastic $\psi$–$\psi$ scattering in Theory B. Consider the process

$$\psi(p_1) + \psi(p_2) \to \psi(p_1') + \psi(p_2') \tag{7.227}$$

(we are using momenta labeled by “p”s rather than “k”s to avoid confusion in the
crossing rules given below). A calculation along similar lines to that carried out above
for $\psi$–$\psi^c$ scattering yields an amplitude for this process proportional to

$$\frac{1}{(p_1 - p_1')^2 - M^2 + i\epsilon} + \frac{1}{(p_1 - p_2')^2 - M^2 + i\epsilon} \tag{7.228}$$

corresponding to the Feynman graphs shown in Fig. 7.4. One easily sees that this
amplitude transforms to the corresponding result for $\psi$–$\psi^c$ scattering in (7.220)
with the simple replacements:

$$p_1 \to k_1, \qquad p_1' \to k_1', \qquad p_2 \to -k_2', \qquad p_2' \to -k_2$$

i.e., by twisting the antiparticle lines around and changing the sign of their momenta.
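
The crossing replacements can be checked numerically with a short sketch. It assumes the signature (+, -, -, -) and CM-frame kinematics for the physical $\psi$–$\psi^c$ momenta; all function names are illustrative. Starting from on-shell $k$ momenta, it builds the crossed $p$ momenta and confirms that the two denominators of (7.228) turn into $t = (k_1 - k_1')^2$ and $s = (k_1 + k_2)^2$ of (7.220). Note that the crossed $p_2$ and $p_2'$ carry negative energy, so the crossed configuration is not itself a physical $\psi$–$\psi$ scattering event.

```python
import math

def mdot(a, b):
    """Minkowski product with signature (+, -, -, -)."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

def combine(a, b, sign=1.0):
    """Component-wise a + sign * b for four-vectors stored as tuples."""
    return tuple(x + sign * y for x, y in zip(a, b))

def negate(a):
    """Reverse the sign of every component of a four-vector."""
    return tuple(-x for x in a)

def crossed_invariants(m, p, theta):
    """Compare the invariants of (7.228), after crossing, with s and t of (7.220)."""
    E = math.sqrt(p * p + m * m)
    k1 = (E, 0.0, 0.0, p)
    k2 = (E, 0.0, 0.0, -p)
    k1p = (E, p * math.sin(theta), 0.0, p * math.cos(theta))
    k2p = (E, -p * math.sin(theta), 0.0, -p * math.cos(theta))

    # Crossing replacements: p1 -> k1, p1' -> k1', p2 -> -k2', p2' -> -k2.
    p1, p1p, p2, p2p = k1, k1p, negate(k2p), negate(k2)

    first = mdot(combine(p1, p1p, -1.0), combine(p1, p1p, -1.0))   # (p1 - p1')^2
    second = mdot(combine(p1, p2p, -1.0), combine(p1, p2p, -1.0))  # (p1 - p2')^2
    t = mdot(combine(k1, k1p, -1.0), combine(k1, k1p, -1.0))
    s = mdot(combine(k1, k2), combine(k1, k2))
    conserved = combine(combine(p1, p2), combine(p1p, p2p), -1.0)  # p1 + p2 - p1' - p2'
    return first, second, t, s, conserved

first, second, t, s, conserved = crossed_invariants(1.0, 1.0, 0.7)
print("(p1 - p1')^2 =", first, " t =", t)
print("(p1 - p2')^2 =", second, " s =", s)
```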

Fig. 7.4 Feynman graphs for $\psi$–$\psi$ scattering in Theory B.


Similar crossing rules apply in multi-fermion scattering amplitudes, with the
expected supplemental rules for dealing with the external line spinors: $u$ spinors
are replaced by $v$ spinors when an initial-state particle is crossed to a final-state
antiparticle, etc. The appearance of the negative sign in (7.226) for $\psi$-anti-$\psi$ scattering
in Theory C is a necessary consequence of the Fermi antisymmetry of the $\psi$-$\psi$
scattering amplitude under exchange of initial- (or final-)state particle momenta, once
we subject that amplitude to the crossing transformation which converts it to a $\psi$-$\psi^c$
scattering amplitude.

7.7 Problems
1. Verify the expression given in the text for the free Hamiltonian of a massive
Majorana field (given in (7.87)):

$$H_0 = \int d^3x\, :\left(\chi^\dagger\, i\vec\sigma\cdot\vec\nabla\chi + \frac{m}{2}\left(\chi^T C_s\chi - \chi^\dagger C_s\chi^*\right)\right):$$

Namely, show that the above expression reduces to the desired energy summa-
tion formula in terms of creation and annihilation operators:

$$H_0 = \sum_\sigma\int d^3k\, E(k)\, b^\dagger(k,\sigma)\,b(k,\sigma)$$

Remember to include an extra minus sign whenever a pair of creation or
annihilation operators must be transposed in order to effect the normal ordering
(i.e., when moving $b^\dagger$s to the left of $b$s). Hint: it is easiest to begin with the form
(7.90); one also needs the result $e^{-\theta\hat k\cdot\vec\sigma} = \frac{1}{m}(E(k) - \vec k\cdot\vec\sigma)$ for $\cosh\theta = \frac{E(k)}{m}$, and
the conjugation property Csσ Cs = σ .
2. Show that for a non-self-conjugate $(\frac12, 0)$ field $\chi_\alpha$, the bilinear $\chi^*_\alpha\chi_\alpha$ does not
transform as a scalar under general HLG transformations (see (7.38)).
3. Show that the free Hamiltonian for a Dirac particle of mass $m$ is given by

$$
\begin{aligned}
H_0 &= \sum_\sigma\int d^3k\, E(k)\left(b^\dagger(k,\sigma)\,b(k,\sigma) + d^\dagger(k,\sigma)\,d(k,\sigma)\right) \\
&= \int d^3x\, :\bar\psi(x)\,(i\vec\gamma\cdot\vec\nabla + m)\,\psi(x):
\end{aligned}
$$

As for the Majorana case, remember that for fermions the normal ordering
includes an extra minus sign for each transposition of fermion operators, by
definition.
4. Convince yourself that $J^\mu(x) \equiv e :\bar\psi(x)\gamma^\mu\psi(x):$ is the quantized version of the
conventional electric current four-vector for a charged Dirac particle described
by a free Dirac field $\psi(x)$, by showing
(a) current conservation (use the Dirac equation for $\psi$):

$$\partial_\mu J^\mu(x) = 0$$

(b) that the charge $Q$ defined as

$$Q \equiv \int d^3x\, J_0(\vec x, t)$$

counts electric charge (defined as $e$ times the difference in the number of particles
and antiparticles), and is time-independent.
5. Prove the following equal-time anticommutator relation for the Dirac field:

$$\{\psi_n(\vec x, t), \psi^\dagger_m(\vec y, t)\} = \delta^3(\vec x - \vec y)\,\delta_{nm}$$

6. Use the result (7.113) for the spin sums for a Dirac particle

$$\sum_\sigma u(p,\sigma)\,\bar u(p,\sigma) = (\slashed{p} + m)/2m$$

(and the corresponding result (7.114) for the $v$ spin functions) to derive the
momentum-space formula for the Dirac propagator, defined as follows (note the
minus sign!):

$$
\begin{aligned}
S_F(x - y)_{mn} &\equiv \langle 0|T(\psi_m(x)\bar\psi_n(y))|0\rangle \\
&\equiv \theta(x^0 - y^0)\,\langle 0|\psi_m(x)\bar\psi_n(y)|0\rangle - \theta(y^0 - x^0)\,\langle 0|\bar\psi_n(y)\psi_m(x)|0\rangle
\end{aligned}
$$

Show that the Fourier transform of $S_F$ is

$$S_F(p) = i\,\frac{\slashed{p} + m}{p^2 - m^2 + i\epsilon}$$

7. The relation between the four-vector notation and the (A=1/2, B=1/2) notation
for the fundamental representation of the Lorentz group is given by

$$
\begin{aligned}
v^{(\frac12\frac12)}_{\frac12,\frac12} &= \frac{1}{\sqrt2}\,(v^1 - iv^2) \\
v^{(\frac12\frac12)}_{\frac12,-\frac12} &= \frac{1}{\sqrt2}\,(v^0 - v^3) \\
v^{(\frac12\frac12)}_{-\frac12,\frac12} &= -\frac{1}{\sqrt2}\,(v^0 + v^3) \\
v^{(\frac12\frac12)}_{-\frac12,-\frac12} &= -\frac{1}{\sqrt2}\,(v^1 + iv^2)
\end{aligned}
$$

Recalling that $J_3 = A_3 + B_3$, $K_3 = i(A_3 - B_3)$, where both $\vec A$ and $\vec B$ are rep-
resented by $\frac12\vec\sigma$ (one-half the Pauli matrices) for the (1/2,1/2) representation,
show that the action of $A_3 + B_3$ on the left-hand sides is equivalent to the action
of $J_3$ on the right-hand sides above, and likewise for the boost generator in the
z direction.
8. Construct the spin functions $u^\mu(k,\sigma)$, $v^\mu(k,\sigma)$, in four-vector notation, for a
$j = 1$ massive boson, starting from the spin functions $u^{(\frac12\frac12)}_{\pm\frac12,\pm\frac12}$ etc., in (AB)
notation. Do this in the following steps:
(a) First, construct the four-vectors $u^\mu(0,\sigma)$, $v^\mu(0,\sigma)$ for the particle at rest,
using (7.64, 7.54), with $\xi_{AB} = (-1)^{2B}$, and the translation dictionary
supplied in Problem 7. Verify that $v^\mu(0,\sigma) = (u^\mu(0,\sigma))^*$.
(b) Show that the polarization vectors derived in (a) satisfy

$$\sum_\sigma u^\mu(0,\sigma)\,u^{\nu*}(0,\sigma) = -g^{\mu\nu} + g^{0\mu}g^{0\nu}$$

(Consider the cases of $\mu, \nu$ both spatial, then one time, one spatial, etc.)
(c) Now, show that for non-zero momentum,

$$\sum_\sigma u^\mu(k,\sigma)\,u^{\nu*}(k,\sigma) = -\left(g^{\mu\nu} - \frac{k^\mu k^\nu}{m^2}\right)$$

Use the boost operator $L^\mu{}_\nu(k)$, recalling that $L^\mu{}_0 = k^\mu/m$, and that $L^\mu{}_\nu(k)$
is a Lorentz transformation.
(d) The polarization vectors in helicity representation are

$$u^\mu(k,\lambda) = \sum_\sigma D^j_{\sigma\lambda}(R(\hat k))\,u^\mu(k,\sigma)$$

Show that they satisfy an identical equation to part (c).


9. Verify the expression (7.176) for the free-photon Hamiltonian, starting from
(7.174) for the massless spin-1 field in radiation gauge. (Note the following
properties for the radiation gauge polarization vectors: $\sum_\lambda \epsilon_i(k,\lambda)\,\epsilon_j(k,\lambda)^* =
\delta_{ij} - \hat k_i\hat k_j$, $\sum_{i=1}^3 \epsilon_i(k,\lambda)\,\epsilon_i(k,\lambda')^* = \delta_{\lambda\lambda'}$.)
10. In Theory B of Section 7.6, with interaction Hamiltonian $H_{\rm int} = \lambda\psi^\dagger\psi\phi$, where
$\phi$ is a self-conjugate field of mass $M$ and $\psi$ a non-self-conjugate field of mass $m$,
calculate the connected S-matrix element for elastic $\psi$–$\phi$ scattering to lowest
(i.e., second) order in $\lambda$:

$$\psi(p) + \phi(k) \to \psi(p') + \phi(k')$$

Note that in this case the vacuum expectation value of the time-ordered product
of the $\psi$ field occurs, in the form

$$
\begin{aligned}
\langle 0|T\{\psi(x_1)\psi^\dagger(x_2)\}|0\rangle &\equiv i\Delta_F^{(\psi)}(x_1 - x_2) \\
&= i\int\frac{d^4q}{(2\pi)^4}\,\frac{e^{-iq\cdot(x_1 - x_2)}}{q^2 - m^2 + i\epsilon}
\end{aligned}
$$

Interpret your result graphically.


11. Again, in Theory B of Section 7.6, calculate the lowest-order connected S-matrix
element for the annihilation process

$$\psi(p_1) + \psi^c(p_2) \to \phi(k_1) + \phi(k_2)$$

Interpret your result graphically.
12. Calculate the second-order scattering amplitude for $\phi$–$\psi$ scattering in Theory
C: i.e.,

$$\psi(p,\sigma) + \phi(k) \to \psi(p',\sigma') + \phi(k')$$

You should find again that the amplitude is the sum of two terms: discuss its
crossing symmetry under the transposition of the bosonic particle momenta.
13. The (as yet unseen) Higgs particle couples to leptons (and quarks) via our
Theory C; i.e., a standard Yukawa interaction term $H_{\rm int} = \lambda\bar\psi\psi\phi$, where the $\phi$
Higgs field represents a spinless self-conjugate particle of mass $M$ and the lepton
Dirac field $\psi$ has mass $m < M/2$. Calculate the total decay rate of
the Higgs to a lepton-antilepton pair, to lowest order in $\lambda$. Spin sums such as
(7.113) and (7.114) will be useful.
8
Dynamics VI: The classical limit
of quantum fields

The precise way in which underlying microphysical processes, governed by the laws
of quantum mechanics, merge into a phenomenal realm describable by classical laws
has been a source of intense discussion and controversy from the very earliest days of
quantum theory. At its core, this subject leads inexorably to quantum measurement
theory, a subject long regarded by physicists of a more practical bent as an intellectual
black hole, quite capable of permanently absorbing any physicist careless enough to
stray within its event horizon. Nevertheless, much can be understood of the way
in which quantum phenomena merge into and “mimic” classical physics without
making a definite commitment to the ultimate role or character of measurement
processes in quantum theory. Typically, one approaches this topic by identifying a set
of “complementarity” relations between quantities which have a precise meaning in
classical physics but cannot be simultaneously “sharp” once the theory is quantized.
The archetype of such relations is, of course, the Heisenberg uncertainty principle
relating the dispersion (or roughly, the “uncertainty”) in the position and momentum
observables for a non-relativistic point particle. Such relations follow directly, by
a straightforward exercise in linear algebra, from the non-commuting character of
the associated quantum-mechanical operators. From the complementarity point of
view, the classical limit amounts to a regime in which the dispersion of the relevant
observables (in the states of interest) becomes much smaller than their mean values,
although, of course, still restricted by the appropriate quantum-mechanical uncertainty
principle. Our task in this chapter will be to identify and examine those states in a
relativistic quantum field theory for which the field observables of interest take on an
essentially classical character.

8.1 Complementarity issues for quantum fields


As already seen in Chapter 1 in Jordan’s treatment of a one-dimensional quantized
massless field (his toy model for quantized electrodynamics), a quantum field theory
is structurally a system of infinitely many quantized degrees of freedom. If we restrict
the system to a finite-sized box, outside of which the fields vanish, then each classical
eigenmode on quantization becomes (for the free field) an independent quantum-
mechanical degree of freedom, describable either in terms of the familiar p and q
operators, with [p, q] = −i, or equivalently in terms of their non-hermitian linear
combinations, the destruction and creation operators a, a† , with commutation relation
[a, a† ] = 1 (independently for each mode). As these box eigenmodes are essentially
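momentum eigenstates, this description obscures the spacetime aspects of the field.
From a spatiotemporal point of view, the conjugate variables analogous to the “q”s and
“p”s of non-relativistic quantum theory are, for a spinless field $\phi$, the field variables
$\phi(\vec x, t)$ and their time-derivatives $\pi(\vec x, t) \equiv \frac{\partial\phi(\vec x, t)}{\partial t}$ at each spatial point $\vec x$ at some fixed
time $t$. Locality ensures that the $\phi(\vec x, t)$ commute among themselves (i.e., for any
$\vec x, \vec y$, $[\phi(\vec x, t), \phi(\vec y, t)] = 0$), as do the $\pi(\vec x, t)$, while (with $x^0 = y^0 = t$, and the measure
$Dk \equiv \frac{d^3k}{(2\pi)^{3/2}\sqrt{2E(k)}}$)

$$
\begin{aligned}
[\pi(\vec x, t), \phi(\vec y, t)] &= \int Dk\,Dq\,\left[-iE(k)\,a(k)\,e^{-ik\cdot x} + iE(k)\,a^\dagger(k)\,e^{ik\cdot x},\; a(q)\,e^{-iq\cdot y} + a^\dagger(q)\,e^{iq\cdot y}\right] \\
&= \int\frac{d^3k}{(2\pi)^3\,2E(k)}\left(-iE(k)\,e^{i\vec k\cdot(\vec x - \vec y)} - iE(k)\,e^{-i\vec k\cdot(\vec x - \vec y)}\right) \\
&= -i\int\frac{d^3k}{(2\pi)^3}\,e^{i\vec k\cdot(\vec x - \vec y)} = -i\delta^3(\vec x - \vec y)
\end{aligned}
\tag{8.1}
$$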

Note that the analogy to the fundamental $[p, q]$ commutation relation is somewhat
obscured here by the use of natural units in which $\hbar$ is set to unity: restoring the $\hbar$, it
appears on the right-hand side of (8.1) as expected.
The next step, in analogy to the well-known procedure in non-relativistic quantum
mechanics, would naturally be a derivation of an inequality for the product of the
dispersion of the two non-commuting field operators, analogous to $\Delta p\cdot\Delta x \geq \hbar/2$
in particle quantum mechanics. At this point we encounter an embarrassment: the
dispersions of the field operators $\phi(x)$, or $\pi(x)$, defined at a single spacetime point,
are typically infinite! In the simplest possible Fock-space state (the vacuum), for
example, one finds that $\langle 0|\phi(x)^2|0\rangle = \infty$. Mathematically, the reason for this is that
the field operator $\phi(x)$ is in fact not well-defined on the Hilbert space of normalizable
Fock-space states: the unit norm state $|0\rangle$ is taken into an infinite-norm state $\phi(x)|0\rangle$
by the action of the local field $\phi(x)$, whence the divergence of the vacuum expectation
value $\langle 0|\phi(x)^2|0\rangle = (\phi(x)|0\rangle, \phi(x)|0\rangle)$. Local fields such as $\phi(x)$ and $\pi(x)$ should instead
be regarded as operator-valued distributions: they yield well-defined$^1$ operators only
after smearing with sufficiently smooth c-number test functions (cf. the discussion of
localization in Section 6.5). At the very least, we must construct spatially smeared
operators: e.g., we might replace the field at the origin by

$$\bar\phi = \frac{1}{(2\pi a^2)^{3/2}}\int d^3x\; e^{-\frac{\vec x^{\,2}}{2a^2}}\,\phi(\vec x, t) \tag{8.2}$$
The normalization factor is chosen so that the smeared object is identical to the
original for constant fields. The square of this operator has a perfectly finite vacuum
expectation value: in other words, φ̄ maps the vacuum state to a normalizable state

1 The smeared field operators will still be unbounded operators, as are typically the position and
momentum operators in ordinary quantum mechanics: hence, their domain will be a proper subset of
the full Hilbert space, but at least it will not be empty! A systematic discussion of the use of smeared
operators in field theory will be an indispensable part of our introduction to axiomatic quantum field
theory in Chapter 9.
in the Hilbert space (in mathematical lingo, the vacuum state lies in the domain of
the operator $\bar\phi$). A short calculation shows that

$$\langle 0|\bar\phi^2|0\rangle = \frac{1}{4\pi^2}\int_0^\infty \frac{k^2}{\sqrt{k^2 + m^2}}\,e^{-a^2k^2}\,dk \sim \frac{1}{8\pi^2a^2}, \quad a \ll \frac{1}{m} \tag{8.3}$$

which shows clearly the re-emergence of the quadratic divergence in the vacuum
expectation value in the local limit $a \to 0$. We see that it is not even possible to define
a dispersion $\Delta\phi(x)$ for the local field $\phi(x)$, while for the smeared field the dispersion
is perfectly finite: $\Delta\bar\phi \equiv \sqrt{\langle 0|\bar\phi^2|0\rangle - \langle 0|\bar\phi|0\rangle^2} = \sqrt{\langle 0|\bar\phi^2|0\rangle}$. Physically, the restriction
to smeared fields is also perfectly reasonable, as we can hardly expect any conceivable
measurement apparatus to have infinitely fine resolution, either in space or in time.
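
The asymptotic estimate in (8.3) can be checked with a small numerical sketch. It assumes the substitution $u = ak$, which turns the integral into $\frac{1}{4\pi^2a^2}\int_0^\infty \frac{u^2}{\sqrt{u^2 + a^2m^2}}\,e^{-u^2}\,du$, and uses a plain Simpson rule with a cutoff at $u = 10$; the function names, step count and cutoff are illustrative choices rather than anything prescribed by the text.

```python
import math

def smeared_vev(a, m, n_steps=20000, u_max=10.0):
    """Simpson-rule estimate of <0|phibar^2|0> in (8.3), using the substitution u = a*k."""
    eps = a * m

    def integrand(u):
        if u == 0.0:
            return 0.0  # the integrand vanishes at u = 0 (also avoids 0/0 when eps = 0)
        return u * u / math.sqrt(u * u + eps * eps) * math.exp(-u * u)

    h = u_max / n_steps  # n_steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(u_max)
    for i in range(1, n_steps):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    integral = total * h / 3.0
    return integral / (4.0 * math.pi ** 2 * a * a)

def small_a_estimate(a):
    """The asymptotic form 1/(8 pi^2 a^2) quoted in (8.3) for a << 1/m."""
    return 1.0 / (8.0 * math.pi ** 2 * a * a)

for a in (1.0, 0.1, 0.01):
    ratio = smeared_vev(a, 1.0) / small_a_estimate(a)
    print(f"a*m = {a:5.2f}: exact / asymptotic = {ratio:.4f}")
```

The ratio approaches one as $am \to 0$, while the value itself grows like $1/a^2$, which is the quadratic divergence of the local limit.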
The preceding discussion focussed on a spin-0 (hence bosonic) field. What about
spin-1/2 fields, which, by the Spin-Statistics theorem, necessarily describe fermionic
particles? We recall from the discussion of blackbody radiation in Chapter 1 that the
classical “Rayleigh–Jeans” regime corresponds to the limit in which the denominator
of the Planck distribution (1.22) vanishes, corresponding to a large occupation number
in modes of any given frequency. We shall see in the next section that the classical limit
for a quantum field necessarily requires such large occupation numbers. For a fermionic
field, the corresponding particle quanta are restricted by the exclusion principle to an
occupancy of either zero or one for each distinct quantum mode, so a classical limit for
such fermionic fields is simply impossible. Of course, there are many particle states of
such fields which exhibit classical behavior (the $\sim 10^{26}$ electrons, protons and neutrons
bound together into a spherical billiard ball certainly behave perfectly classically in
many contexts), but the classical physics appropriate in this circumstance is particle
mechanics, rather than classical field theory. In this chapter we are concerned with
the approach to the latter, so fermionic fields will no longer be considered.
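
The contrast between bosonic and fermionic occupation numbers can be made concrete with a minimal sketch. It assumes the usual Bose–Einstein form $1/(e^x - 1)$ for the mean occupation behind the Planck distribution, with $x = \omega/k_BT$ in units where $\hbar = 1$, and, purely for comparison, the Fermi–Dirac form $1/(e^x + 1)$ at zero chemical potential; the latter is an illustrative assumption and does not appear in the text.

```python
import math

def bose_occupation(x):
    """Mean occupation 1/(e^x - 1) of a bosonic mode, with x = omega / (k_B T)."""
    return 1.0 / math.expm1(x)

def fermi_occupation(x):
    """Mean occupation 1/(e^x + 1) of a fermionic mode (zero chemical potential assumed)."""
    return 1.0 / (math.exp(x) + 1.0)

for x in (1.0, 0.1, 0.01):
    print(f"x = {x:5.2f}: bosons {bose_occupation(x):9.3f}, "
          f"Rayleigh-Jeans 1/x {1.0 / x:8.3f}, fermions {fermi_occupation(x):.3f}")
```

As $x \to 0$ the bosonic occupation grows like $1/x$, whereas the fermionic one never exceeds one half in this setting, in line with the exclusion-principle argument above.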
Returning, then, to bosonic fields, the only classical fields of importance are the
electromagnetic and gravitational fields. As conventional quantum field theories of
gravity are at best effective theories valid only at low energies, we shall concentrate
on the electromagnetic field, which in any case deserves special consideration given its
unique role in the gestation and birth of modern quantum mechanics and field theory.
This leads us to the consideration of complementarity issues for the field $\vec A(x)$ (7.174)
introduced in Section 7.5 for the description of massless spin-1 particles. We recall that
this field undergoes a non-trivial modification under gauge transformations, leaving
the physics invariant (classically, it is the spatial part of the four-vector potential). The
associated physical (hence gauge-invariant) fields are the electric field and magnetic
fields: in the radiation (or Coulomb) gauge in which we constructed $\vec A(x)$, these take
the form $\vec E = \frac{\partial\vec A}{\partial t}$ and $\vec B = \vec\nabla\times\vec A$ (or $B_i = \epsilon_{ijk}\partial_j A_k$, $i, j, k = 1, 2, 3$). We can think of
the electric field, involving a time-derivative, as the conjugate momentum field to the
magnetic field, which involves only space derivatives, and hence only combinations of
field values on a given time-slice. The commutation relation analogous to the result
(8.1) for a canonical scalar field is (see Problem 1)

$$[E_i(\vec x, t), B_j(\vec y, t)] = i\,\epsilon_{ijk}\,\partial_k\,\delta^3(\vec x - \vec y) \tag{8.4}$$


The reader may easily verify that the corresponding equal-time commutators among
electric and magnetic fields vanish separately.
Equal time commutation relations of the type (8.1) and (8.4) will later (cf.
Chapter 12) play a central role in the development of a Lagrangian formalism for
field theory, of enormous utility in the incorporation of spacetime and local gauge
symmetries, which necessarily take a complicated and obscure form in the Hamiltonian
approach we have followed so far. Here we note that the complementary role played
by electric and magnetic fields leads to physical consequences of enormous importance
in modern condensed matter physics and field theory. In the BCS theory of super-
conductivity, for example, the appearance of a condensate of charged Cooper pairs in
the superconducting state implies an essentially infinite dispersion in the local electric
fields, requiring the dispersion of the local magnetic field to vanish in the interior of
the superconductor: the famous Meissner effect. A dual Meissner effect, in which the
chromoelectric field is forced to zero (in this case as a result of large chromomagnetic
fluctuations) outside of thin tubes carrying the conserved flux required by Gauss’s
Law, is believed to lie at the core of color confinement in quantum chromodynamics
(cf. Chapter 19). The non-zero zero-point energy of the electromagnetic field ($\frac12\omega$
for each quantized mode of the field) can be viewed as a direct consequence of the
non-existence of states in which both electric $\vec E$ and magnetic $\vec B$ fields have vanishing
expectation values and dispersion, allowing zero energy ($\propto \vec E^2 + \vec B^2$). And so on . . .
In the next section we shall examine closely the conditions for “classical” behavior
of a quantum field. This will turn out to be more easily accomplished in momentum
space—i.e., by examining the multi-particle states for individual quantized modes of
the field, rather than in terms of the spatiotemporally defined fields appearing in the
commutators above. Before doing that, a brief historical digression is in order. As we
recall from the historical account in Chapters 1 and 2, the birth of quantum field
theory was accompanied by much uncertainty about the validity of a straightforward
extension of quantum-mechanical principles to systems with infinitely many degrees
of freedom. An early critique which caused a great deal of unease appeared in a
paper of Landau and Peierls, appearing in 1930 and published the following year
(see (Landau and Peierls, 1983)), in which the arbitrarily precise measurability of
even individual components of the quantized electric or magnetic fields employing
a charged point particle was denied (for fields smeared over a temporal extent $\Delta t$,
Landau and Peierls claimed an intrinsic dispersion $\Delta E\,\Delta B \gtrsim \frac{1}{(\Delta t)^2}$). The basic
difficulty was that the acceleration of the test particle in the presence of the field to be
measured would lead to an uncontrollable radiation emission and concomitant energy-
momentum loss. In a famous (and famously unread$^2$) paper, Bohr and Rosenfeld
(Bohr and Rosenfeld, 1983) subjected the measurability of the field components of the
free electromagnetic field to a typically exhaustive examination, and concluded that
the uncertainty relations holding among appropriately smeared averages of various
components of the electric and magnetic field operators are precisely consistent with

2 As Pais relates in his biography of Bohr (Pais, 1991): “It [the Bohr–Rosenfeld paper] has been read by
very very few of the aficionados . . . As a friend of Bohr’s and mine once said to me: ‘It is a very good paper
that one does not have to read. You just have to know that it exists.’ ”
the $\Delta p\,\Delta x \geq \frac{\hbar}{2}$ constraint holding for the dispersion of the momentum and position
of an extended test body, sufficiently massive to allow the response of the test body
to the field to lead to negligible accelerations, thereby minimizing the uncontrollable
radiation emissions that had bothered Landau and Peierls.
The Bohr–Rosenfeld paper played an important historical role in reassuring the
quantum community that the quantization of systems with infinitely many degrees of
freedom did not lead to conceptual inconsistencies or phenomenologically pathological
results, but that quantum field theories should indeed be viewed simply as quantum
systems of a new type, where the constraints on measurability bear just the same
relation to the underlying formal operator framework as in non-relativistic point-
particle quantum mechanics. From a practical point of view, the enormous body of
phenomenological information—the entire field of quantum optics—gathered over the
three-quarters of a century that have elapsed since the Bohr–Rosenfeld paper has
indeed allowed measurements of the quantum properties of light with unparalleled
precision. However, these measurements typically address the behavior of states in
which a limited number of quantized modes of the electromagnetic field are excited:
in other words, we are concerned with the field in momentum rather than coordinate
space. In the next section we shall see that the transition to classical behavior (as well
as the non-classical deviations therefrom which are the bread and butter of quantum
optics) is most easily studied in the occupation number space of these modes.

8.2 When is a quantum field “classical”?


A spatiotemporally-defined classical field is mathematically a c-number function (or
collection of functions, if we are dealing with the components of vector or tensor
fields) with sufficient mathematical regularity to allow the evaluation of a Fourier
transform to wavevector-frequency (k, ω) space. In other words, such a field can be
viewed as a linear combination of modes of well-defined wavevector and frequency. For
the electromagnetic field in a cavity, the field is more conveniently analysed in terms of
standing wave modes, which involve linear combinations of oppositely traveling waves.
In either case, the essential feature of a classical field is our ability to simultaneously
specify the amplitude and phase of each mode of the field. The quantization of the
field introduces an inescapable complementarity between these two properties. This
is best illustrated first at the level of an individual mode: later (in Section 8.4) we
shall see how to construct states of the field with arbitrary spatial dependence. It is
important to realize at the outset that the restriction to a single mode is by no means
physically unreasonable. Very early in the development of laser technology, ingenious
mode-selection methods were devised (Smith, 1972) to ensure that lasing in a cavity
produced multiple occupation of a single quantum mode. For example, introduction of
a Fabry–Pérot interferometer with a suitable geometry into the cavity can ensure that
only a single frequency mode fits into the positive gain portion of the laser gain profile.
In such cases we may assume that the state of the electromagnetic field consists of zero
photon number for all but a single mode. As indicated previously, such a mode will
typically be a standing wave mode, but the essential features are the same if we simply
consider the occupied mode to have well-defined photon momentum k, polarization
λ, and energy E(k) = ω. In a cubical box of volume V , the integral over continuous
wavevectors (i.e., momenta) becomes a discrete sum in the usual way, so that we
can speak sensibly of a specific individual mode. Instead of imposing the physical
constraints of an actual rectangular laser cavity (electric field vanishing at the plane
boundaries, for example), we shall simply assume our cubical box of volume V to be a
three-torus topologically, so that the fields satisfy simple periodic boundary conditions
in each of the three Cartesian coordinates. The transition from continuous to discrete
momenta, and from creation–annihilation operators with δ distribution commutators
to ones with Kronecker δs, is then made via

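$$\int d^3k \to \frac{(2\pi)^3}{V}\sum_{\vec{k}}, \qquad (k_x, k_y, k_z) = \frac{2\pi}{L}(n_x, n_y, n_z)$$

$$\sqrt{\frac{(2\pi)^3}{V}}\; a(\vec{k},\lambda) \to a_{\vec{k},\lambda}$$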
where we use subscripts (rather than functional dependence) to indicate discretely
defined objects—for example, the creation–annihilation operators, which now satisfy

$$[a_{\vec{k},\lambda},\, a^\dagger_{\vec{k}',\lambda'}] = \delta_{\lambda\lambda'}\,\delta_{\vec{k}\vec{k}'} \tag{8.5}$$

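With these translations, and the assumption that only one mode is occupied, we can
effectively truncate the quantized spin-1 field $\vec{A}$ given in (7.174) to

$$\vec{A}(x) = \frac{1}{\sqrt{2E(k)V}}\left(a_{\vec{k},\lambda}\,\vec{\epsilon}_{\vec{k},\lambda}\,e^{-ik\cdot x} + a^\dagger_{\vec{k},\lambda}\,\vec{\epsilon}^{\,*}_{\vec{k},\lambda}\,e^{ik\cdot x}\right) \tag{8.6}$$

Note that we have also assumed a plane-polarized mode, so that
$\vec{\epsilon}_{\vec{k},\lambda} = \vec{\epsilon}^{\,*}_{\vec{k},\lambda} = \hat{x}$, for
example. The corresponding electric field operator is

$$\vec{E}(x) = i\sqrt{\frac{E(k)}{2V}}\left(a_{\vec{k},\lambda}\,\vec{\epsilon}_{\vec{k},\lambda}\,e^{-ik\cdot x} - a^\dagger_{\vec{k},\lambda}\,\vec{\epsilon}^{\,*}_{\vec{k},\lambda}\,e^{ik\cdot x}\right) \tag{8.7}$$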

To simplify the notation, we shall henceforth drop the subscripts $\vec{k}, \lambda$ indicating the
specific mode under consideration, and write simply

$$\vec{E}(\vec{x},t) = iC\left(a\,e^{i\vec{k}\cdot\vec{x}-i\omega t} - a^\dagger e^{-i\vec{k}\cdot\vec{x}+i\omega t}\right) \tag{8.8}$$

Classically, the operators $a$ (resp. $a^\dagger$) correspond to complex numbers $\alpha = Ae^{i\theta}$ (resp.
$\alpha^*$), giving a monochromatic wave of well-defined amplitude and phase:

$$\vec{E}(\vec{x},t) = -2CA\,\sin(\vec{k}\cdot\vec{x}-\omega t+\theta) \tag{8.9}$$

If our state space were finite-dimensional, we could apply the polar decomposition
theorem for finite-dimensional operators, which asserts that we can find a semipositive-
definite hermitian operator $N$ and unitary operator $E$, with

$$a = E\sqrt{N}, \qquad a^\dagger = \sqrt{N}\,E^\dagger \tag{8.10}$$

$$N = a^\dagger a \tag{8.11}$$
and (using [a, a† ] = 1)

EN − N E = E (8.12)

If we further identify the unitary operator as the complex exponential of an hermitian
phase operator $\Phi$, $E = e^{i\Phi}$, then we see that the classical amplitude and phase concepts
$A, \theta$ correspond to eigenvalues of the non-commuting operators $\sqrt{N}, \Phi$ respectively.
There is evidently a complementarity between the concepts of amplitude and phase
at the quantum mechanical level. If one assumes a commutation relation

[N , Φ] = i (8.13)

then the commutation relation (8.12), with $E = e^{i\Phi}$, indeed follows directly using the
usual multiple-commutator formulas. In analogy to $[q, p] = i\hbar$ leading to the
uncertainty relation $\Delta q\,\Delta p \ge \hbar/2$, one then concludes that $\Delta N\,\Delta\Phi \ge \frac{1}{2}$, and it is immediately
apparent that a classical limit, with both amplitude and phase defined to high relative
precision, necessarily requires the number operator $N$ to take on large values, with
$\Delta N \ll \bar{N}$ (but $\Delta N \gg 1$, thus still allowing $\Delta\Phi \ll 2\pi$).
While this conclusion is basically correct, the argument leading to it is completely
fallacious. For example, if we take the expectation value of the purported commutation
relation (8.13) in the vacuum state $|0\rangle$ for our mode, we obtain

$$\langle 0|N\Phi - \Phi N|0\rangle = 0 = i \tag{8.14}$$

There are two sources of trouble here. First, the polar decomposition theorem does not
extend in general to the existence of polar pairs N , E in infinite-dimensional spaces.
Secondly, the transition from a unitary E to a phase operator Φ clearly cannot lead to
the desired commutation relation (8.13), as the latter is clearly inconsistent. In fact,
the definition and construction of a well-defined phase operator in quantum mechanics
is a notoriously slippery subject—which is hardly surprising, given that even classically
the phase variable is not uniquely defined, but given only modulo 2π.
We shall circumvent the first problem by dealing with a truncated system, where
only states with a maximum occupation number $N$ are considered. It will become
apparent in the next section that the coherent states that most mimic classical
behavior have very rapidly decreasing components for large occupation numbers $N$,
with the probability of detecting more than $N$ quanta falling more rapidly than
$(e\bar{n}/N)^N$ once the occupancy $N$ exceeds the expectation
value $\bar{n} \equiv \langle N\rangle$ of the number operator. Such states also have exponentially small ($\sim e^{-\bar{n}}$)
probability of occupancy of the vacuum state $|0\rangle$. Thus, we expect to make only
very small errors by simply truncating the state space at some very high, but finite,
occupation number $N$. The second problem, defining a phase operator, can be avoided
simply by observing that the specification of the phase $\theta$ modulo $2\pi$ amounts to
specifying the real numbers $\cos\theta$ and $\sin\theta$, or equivalently, the complex number $e^{i\theta}$,
corresponding to our unitary operator $E$ above. There will be no difficulty obtaining a
well-defined unitary $E$ (or its hermitian “real” and “imaginary” parts $C \equiv \frac{1}{2}(E + E^\dagger)$
and $S \equiv \frac{1}{2i}(E - E^\dagger)$) once we have truncated the space.
In keeping with our demand that only a maximum number $N$ of photons can
occupy any given mode of the electromagnetic field, we define new creation operators
by demanding that

$$a^\dagger|n\rangle = \sqrt{n+1}\,|n+1\rangle, \quad n < N, \qquad a^\dagger|N\rangle = 0 \tag{8.15}$$

The destruction operator $a$ is then given by the adjoint of $a^\dagger$. The algebra is realized
by $(N+1)\times(N+1)$ matrices, which are easily seen to give the modified commutator

$$[a, a^\dagger]_{nm} = \delta_{nm} - (N+1)\,\delta_{nN}\,\delta_{mN} \tag{8.16}$$

or, in operator language,

$$[a, a^\dagger] = 1 - (N+1)\,|N\rangle\langle N| \tag{8.17}$$

The restriction to states with extremely (factorially) small components in the $|N\rangle$ (or
higher) mode means that the effect of the last term on the right, with a projection
operator onto the highest mode, is effectively negligible. It is, of course, required,
because the trace of the commutator of two finite-dimensional matrices must vanish.
Note that we still have the usual number operator $N = a^\dagger a$ with matrix elements
$N_{nm} = n\,\delta_{nm}$, $0 \le n \le N$.
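The truncated algebra (8.15)–(8.17) is easy to check numerically. The following is a minimal sketch in plain Python, not part of the original text: the helper names (`truncated_ladder`, `matmul`, `commutator`) and the cutoff value `N = 6` are illustrative assumptions. It builds the $(N+1)\times(N+1)$ matrices and compares the commutator with the right-hand side of (8.16), including the vanishing trace.

```python
import math

def truncated_ladder(N):
    """Return (a, a_dag) as (N+1)x(N+1) real matrices in the basis |0>, ..., |N>."""
    dim = N + 1
    a_dag = [[0.0] * dim for _ in range(dim)]
    for n in range(N):
        # a_dag|n> = sqrt(n+1)|n+1> for n < N; the column for |N> stays zero, as in (8.15)
        a_dag[n + 1][n] = math.sqrt(n + 1)
    # the matrices are real, so the adjoint is simply the transpose
    a = [[a_dag[col][row] for col in range(dim)] for row in range(dim)]
    return a, a_dag

def matmul(A, B):
    dim = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(dim)) for j in range(dim)]
            for i in range(dim)]

def commutator(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    dim = len(A)
    return [[AB[i][j] - BA[i][j] for j in range(dim)] for i in range(dim)]

N = 6  # illustrative cutoff (hypothetical value); any N >= 1 works
a, a_dag = truncated_ladder(N)
comm = commutator(a, a_dag)

# right-hand side of (8.16): delta_nm - (N+1) delta_nN delta_mN
expected = [[(1.0 if n == m else 0.0) - ((N + 1) if n == m == N else 0.0)
             for m in range(N + 1)] for n in range(N + 1)]
max_dev = max(abs(comm[n][m] - expected[n][m])
              for n in range(N + 1) for m in range(N + 1))
trace = sum(comm[n][n] for n in range(N + 1))
print("max deviation from (8.16):", max_dev)
print("trace of the commutator:", trace)
```

Nested lists are used instead of a linear-algebra library only to keep the sketch dependency-free; for large cutoffs a matrix library would be the natural choice.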
In this finite-dimensional space the application of the polar decomposition theorem
is unproblematic, and we find that

$$a = E\sqrt{N}, \qquad a^\dagger = \sqrt{N}\,E^\dagger \tag{8.18}$$

where $E$ is the unitary operator corresponding to anticyclic permutation of the states
$|0\rangle, |1\rangle, |2\rangle, \ldots, |N\rangle$:

$$E = \sum_{n=0}^{N-1} |n\rangle\langle n+1| + |N\rangle\langle 0| \tag{8.19}$$

$E$ is basically the lowering operator $a$ without the square-root factors, except for the
action on the vacuum, which, as mentioned above, has exponentially small occupancy
in the quasi-classical states (see Section 8.3 below) in which we shall be interested. $E$
is the quantum analog of the complex classical phase $e^{i\theta}$. It is a normal operator (i.e.,
being unitary, it commutes with its adjoint), so its hermitian and antihermitian parts
$C \equiv \frac{1}{2}(E + E^\dagger)$ and $S \equiv \frac{1}{2i}(E - E^\dagger)$ commute

$$[C, S] = 0 \tag{8.20}$$

The commutator of the phase operator $E$ with the number operator $N$ is

$$[E, N] = E - (N+1)\,|N\rangle\langle 0| \approx E \tag{8.21}$$

where once again we have neglected operators such as $|N\rangle\langle 0|$, which simply exchange
the very small amplitudes for our state to have either 0 or $N$ photons. Together with
$[E^\dagger, N] \approx -E^\dagger$, this then yields

$$[C, N] = iS, \qquad [S, N] = -iC \tag{8.22}$$
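As a sanity check on (8.19)–(8.21), the sketch below (plain Python, with hypothetical helper names and an arbitrary small cutoff `N = 5`) constructs the permutation matrix $E$, confirms that it is unitary and normal, and verifies the exact commutator $[E, N] = E - (N+1)|N\rangle\langle 0|$ before the small corner term is dropped.

```python
def phase_operator(N):
    """Matrix of E = sum_{n<N} |n><n+1| + |N><0| in the basis |0>, ..., |N>, as in (8.19)."""
    dim = N + 1
    E = [[0.0] * dim for _ in range(dim)]
    for n in range(N):
        E[n][n + 1] = 1.0   # |n><n+1|
    E[N][0] = 1.0           # |N><0|
    return E

def matmul(A, B):
    dim = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(dim)) for j in range(dim)]
            for i in range(dim)]

def transpose(A):
    dim = len(A)
    return [[A[j][i] for j in range(dim)] for i in range(dim)]

def max_abs_diff(A, B):
    dim = len(A)
    return max(abs(A[i][j] - B[i][j]) for i in range(dim) for j in range(dim))

N = 5  # illustrative cutoff (hypothetical value)
dim = N + 1
E = phase_operator(N)
E_dag = transpose(E)                 # E is real, so adjoint = transpose
identity = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
number = [[float(i) if i == j else 0.0 for j in range(dim)] for i in range(dim)]

unitarity_dev = max_abs_diff(matmul(E, E_dag), identity)
# E normal (E E^dag = E^dag E) is equivalent to [C, S] = 0, eq. (8.20)
normality_dev = max_abs_diff(matmul(E, E_dag), matmul(E_dag, E))

EN, NE = matmul(E, number), matmul(number, E)
comm = [[EN[i][j] - NE[i][j] for j in range(dim)] for i in range(dim)]
# exact right-hand side of (8.21): E - (N+1)|N><0|
exact = [[E[i][j] - ((N + 1) if (i == N and j == 0) else 0.0)
          for j in range(dim)] for i in range(dim)]
comm_dev = max_abs_diff(comm, exact)
print(unitarity_dev, normality_dev, comm_dev)
```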



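For arbitrary hermitian operators $\mathbf{X}, \mathbf{Y}, \mathbf{Z}$³ satisfying $[\mathbf{X}, \mathbf{Y}] = i\mathbf{Z}$, and defining
$X = \langle\psi|\mathbf{X}|\psi\rangle$ for a unit-normalized state $|\psi\rangle$ (and likewise for $Y$ and $Z$), the fluctuation
operators $\Delta\mathbf{X} \equiv \mathbf{X} - X$, $\Delta\mathbf{Y} \equiv \mathbf{Y} - Y$ satisfy $[\Delta\mathbf{X}, \Delta\mathbf{Y}] = i\mathbf{Z}$, whence, for
arbitrary real $\lambda$,

$$0 \le \big((\Delta\mathbf{X} + i\lambda\,\Delta\mathbf{Y})|\psi\rangle,\; (\Delta\mathbf{X} + i\lambda\,\Delta\mathbf{Y})|\psi\rangle\big) \tag{8.23}$$

$$\Rightarrow\; 0 \le \langle\psi|(\Delta\mathbf{X})^2|\psi\rangle + i\lambda\,\langle\psi|[\Delta\mathbf{X}, \Delta\mathbf{Y}]|\psi\rangle + \lambda^2\,\langle\psi|(\Delta\mathbf{Y})^2|\psi\rangle$$

$$\Rightarrow\; 0 \le \langle(\Delta\mathbf{Y})^2\rangle\,\lambda^2 - Z\lambda + \langle(\Delta\mathbf{X})^2\rangle, \quad \forall\lambda \tag{8.24}$$

The final inequality implies that the discriminant of the quadratic equation for $\lambda$ must
be negative or zero, i.e.,

$$Z^2 - 4\,\langle(\Delta\mathbf{X})^2\rangle\,\langle(\Delta\mathbf{Y})^2\rangle \le 0 \tag{8.25}$$

Writing $\sqrt{\langle(\Delta\mathbf{X})^2\rangle} \to \Delta X$ for simplicity (and likewise for $Y$), we arrive at the
uncertainty relation

$$\Delta X \cdot \Delta Y \ge \frac{1}{2}|Z| \tag{8.26}$$

Applying this general result to the commutation relations (8.22), we find the desired
number-phase uncertainty relations

$$\Delta N \cdot \Delta C \ge \frac{1}{2}|S| \tag{8.27}$$

$$\Delta N \cdot \Delta S \ge \frac{1}{2}|C| \tag{8.28}$$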
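For a state of well-defined photon occupation number $|n\rangle$,

$$\langle n|\mathbf{C}|n\rangle = \frac{1}{2}\big(\langle n|\mathbf{E}|n\rangle + \langle n|\mathbf{E}^\dagger|n\rangle\big) = 0 \tag{8.29}$$

$$\langle n|\mathbf{C}^2|n\rangle = \frac{1}{4}\langle n|\mathbf{E}^2 + 2 + (\mathbf{E}^\dagger)^2|n\rangle = \frac{1}{2} \tag{8.30}$$

with a similar result for $\mathbf{S}$. Evidently, the phase uncertainties are $\Delta C = \Delta S = \frac{1}{\sqrt{2}}$,
corresponding to a phase spread uniformly over the unit circle $0 \le \theta < 2\pi$. Note
that in this case both $\Delta N$ and the expectation values $C$, $S$ are zero, so the uncertainty inequality is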
saturated trivially. Such states are “maximally quantal”, having the least possible
precise specification of phase. The states of interest in the classical limit—the so-
called “coherent states” studied in the following section—provide a more interesting
saturation of the uncertainty relations (8.27,8.28), in which both the number and
phase operators have non-vanishing dispersion, with the product taking its minimum
possible value, but with nevertheless an arbitrarily small fractional uncertainty in the
number (i.e., amplitude) and phase expectation values.
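The Fock-state results (8.29)–(8.30) can be reproduced with a few lines of code. The sketch below is an illustration under stated assumptions, not the author's procedure: it represents a state by its list of components in the truncated basis, lets $E$ act as the cyclic shift implied by (8.19), and uses hypothetical helper names and the arbitrary values `N = 10`, `n = 3`.

```python
import math

def apply_E(psi):
    """(E psi)_m = psi_{m+1} cyclically, from E = sum |n><n+1| + |N><0|."""
    dim = len(psi)
    return [psi[(m + 1) % dim] for m in range(dim)]

def apply_E_dag(psi):
    """(E^dag psi)_m = psi_{m-1} cyclically."""
    dim = len(psi)
    return [psi[(m - 1) % dim] for m in range(dim)]

def apply_C(psi):
    up, down = apply_E(psi), apply_E_dag(psi)
    return [0.5 * (u + d) for u, d in zip(up, down)]

def apply_S(psi):
    up, down = apply_E(psi), apply_E_dag(psi)
    return [(u - d) / 2j for u, d in zip(up, down)]

def apply_N(psi):
    return [m * amp for m, amp in enumerate(psi)]

def expect(op, psi):
    """<psi|op|psi> for a unit-normalized state; real for hermitian op."""
    return sum(amp.conjugate() * out for amp, out in zip(psi, op(psi))).real

def spread(op, psi):
    mean = expect(op, psi)
    mean_sq = expect(lambda v: op(op(v)), psi)
    return math.sqrt(max(mean_sq - mean ** 2, 0.0))

N, n = 10, 3                       # illustrative cutoff and occupation number
fock = [0j] * (N + 1)
fock[n] = 1 + 0j

delta_N = spread(apply_N, fock)
delta_C = spread(apply_C, fock)
delta_S = spread(apply_S, fock)
mean_C = expect(apply_C, fock)
mean_S = expect(apply_S, fock)
print(delta_N, delta_C, delta_S, mean_C, mean_S)
```

Both sides of (8.27) and (8.28) vanish for this state, which is the trivial saturation described in the text.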

3 In the following, in an attempt to circumvent a confusing notational overload, we have introduced a
bold-face notation for the operators to distinguish them from their c-number expectation values.
The answer to the question posed by the heading of this Section—namely, when
is a quantum field classical?—is therefore quite straightforward. The behavior of such
a field is classical only in the context of states with high occupation number for
individual modes of the field. It is therefore necessary that the field (a) be bosonic in
character, (b) be associated with stable particles, in order that a classical configuration,
once established, be maintained over macroscopic time-scales, and (c) have a sensible
non-interacting limit, so that the previous discussion, which assumed many-particle,
but non-interacting, particle states, is indeed applicable.
In the Standard Model, the only bosonic fields associated with stable elementary
particles⁴ are the gauge-vector bosons mediating electromagnetic (the photon) and
strong (the gluons of quantum chromodynamics) interactions. We shall later see that
the non-abelian gauge dynamics of the gluons of the strong interactions results in a
qualitative alteration of the state space once interactions are turned on: namely, the
phenomenon of color confinement (cf. Sections 19.3 and 19.4) in which the physical
particle states of the theory are related to the underlying local (quark and gluon)
fields in an extremely complicated way, so condition (c) is violated in this case. We
are left (apart from gravity) with the quantized electromagnetic field as the single
phenomenologically relevant example of a relativistic quantum field with a sensible
classical limit. It is therefore hardly surprising, given the primary role played by
correspondence principle arguments in the historical evolution of quantum mechanics
and quantum field theory, that the unravelling of the quantum properties of light was
at the core of the most critical developments in this history, as we have already seen
in Chapters 1 and 2.

8.3 Coherent states of a quantum field


We have seen in the preceding section that the states in which a (bosonic) quantum
field has approximately classical behavior, in the sense of having expectations of the
amplitude (or number) and phase operators with small fractional quantum dispersion,
necessarily involve linear combinations of high occupancy multi-particle states. Among
such states there is a special subclass—called coherent states—which exhibit, in a cer-
tain sense, ultraclassical behavior. Such states will be defined by the requirement that
the number-phase uncertainty relations (8.27,8.28) are saturated: i.e., the inequality
becomes an equality, so that the mutual uncertainty is as small as it can possibly
be. Such states are of more than theoretical interest, as they are in fact the states
produced in tuned laser cavities for modes selected for high gain.
The properties of these distinguished coherent states are best obtained by starting
with the inequality (8.23), but with the operators $\mathbf{X}, \mathbf{Y}$ chosen as the hermitian and
antihermitian parts of the destruction operator for the mode in question:

$$\mathbf{X} = \frac{1}{2}(a + a^\dagger) \tag{8.31}$$

$$\mathbf{Y} = \frac{1}{2i}(a - a^\dagger) \tag{8.32}$$

4 The Standard Model spin-0 Higgs is, of course, elementary but unstable: see Chapter 7, Problem 12.
with the commutator $Z = -i[\mathbf{X}, \mathbf{Y}] = \frac{1}{2}$. We shall soon see that the states which
saturate the mutual uncertainty relation for the hermitian operators $\mathbf{X}, \mathbf{Y}$ (called
“quadrature” operators in the quantum optics literature) do the same for the number-
phase uncertainty relations. The inequality (8.24) becomes an equality when the
discriminant vanishes and $\lambda = \frac{Z}{2(\Delta Y)^2}$. If we further require that the dispersions in
$\mathbf{X}$ and $\mathbf{Y}$ are equal,⁵ then $(\Delta X)^2 = (\Delta Y)^2 = \frac{1}{2}Z = \frac{1}{4}$, so $\lambda = 1$, and the state $|\psi\rangle$
must satisfy

$$0 = (\Delta\mathbf{X} + i\,\Delta\mathbf{Y})|\psi\rangle \;\Rightarrow\; (\mathbf{X} + i\mathbf{Y})|\psi\rangle = (X + iY)|\psi\rangle \equiv \alpha|\psi\rangle \tag{8.33}$$

where the eigenvalue $\alpha$ is in general complex. Of course, $\mathbf{X} + i\mathbf{Y} = a$, so the coherent
states are simply the eigenstates of the destruction operator $a$. Expressed as linear
combinations of multi-particle Fock states $|n\rangle$

$$|\psi\rangle = \sum_n c_n|n\rangle \tag{8.34}$$

the eigenvalue equation $a|\psi\rangle = \alpha|\psi\rangle$ implies the recursion relation

$$c_{n+1} = \frac{\alpha}{\sqrt{n+1}}\,c_n \tag{8.35}$$

with the unique solution ($c_0 \equiv C$)

$$c_n = C\,\frac{\alpha^n}{\sqrt{n!}} \tag{8.36}$$

The requirement that $|\psi\rangle$ be unit normalized then implies $C = e^{-\frac{1}{2}|\alpha|^2}$, so

$$|\psi\rangle = e^{-\frac{1}{2}|\alpha|^2}\sum_{n=0}^{\infty}\frac{\alpha^n}{\sqrt{n!}}\,|n\rangle \tag{8.37}$$

Henceforth, as is usual in quantum theory, we shall relabel the coherent state $|\psi\rangle$ by
its eigenvalue with respect to the destruction operator $a$, thus $|\psi\rangle \to |\alpha\rangle$.
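A short numerical illustration of (8.35)–(8.37) follows. It is only a sketch: the value of `alpha`, the cutoff `n_max` and the function name `coherent_coefficients` are assumptions made for the example. The code evaluates the closed form (8.36), then checks normalization, the recursion (8.35) (equivalently $a|\alpha\rangle = \alpha|\alpha\rangle$ component by component), and the mean occupation number $|\alpha|^2$.

```python
import cmath
import math

def coherent_coefficients(alpha, n_max):
    """c_n = exp(-|alpha|^2 / 2) alpha^n / sqrt(n!) for n = 0..n_max, cf. (8.36)-(8.37)."""
    prefactor = math.exp(-0.5 * abs(alpha) ** 2)
    return [prefactor * alpha ** n / math.sqrt(math.factorial(n))
            for n in range(n_max + 1)]

alpha = 2.0 * cmath.exp(0.3j)   # hypothetical eigenvalue: amplitude A = 2, phase theta = 0.3
n_max = 60                      # far above |alpha|^2 = 4, so the neglected tail is negligible
c = coherent_coefficients(alpha, n_max)

norm = sum(abs(cn) ** 2 for cn in c)
mean_number = sum(n * abs(cn) ** 2 for n, cn in enumerate(c))
# component n of a|psi> is sqrt(n+1) c_{n+1}; it should equal alpha c_n
residual = max(abs(math.sqrt(n + 1) * c[n + 1] - alpha * c[n]) for n in range(n_max))
print(norm, mean_number, residual)
```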
The interpretation of the complex number $\alpha$ becomes apparent if we return to the
expression for the quantized single-mode electric field (8.8). The expectation values
of the destruction $a$ and creation $a^\dagger$ operators in the coherent state $|\alpha\rangle$ are $\alpha$ and
$\alpha^*$ respectively, so writing $\alpha = Ae^{i\theta}$, the expectation value of the electric field in the
coherent state is

$$\langle\alpha|\vec{E}(\vec{x},t)|\alpha\rangle = -2CA\,\sin(\vec{k}\cdot\vec{x}-\omega t+\theta) \tag{8.38}$$

i.e., exactly the classical monochromatic wave form (8.9). Thus the norm A and phase
θ of the complex eigenvalue α encode the amplitude and phase of the corresponding
classical field. We shall shortly see how to extend this result to construct coherent

5 Minimum uncertainty states in which $\Delta X \neq \Delta Y$ are also of considerable interest in quantum optics:
they are the so-called “squeezed” states.
states with an arbitrary preassigned spatiotemporal behavior for the field expectation
value, even in a fully interacting theory!
Note that we have so far not imposed a mode cutoff at some large occupation
number $N$ as in the previous section. Our next task, then, is to verify the assertions
made concerning the negligible effects of such a cutoff in the classical regime. First,
note that the expectation value of the number operator $N$ in the state $|\alpha\rangle$ is

$$\bar{N} = \sum_n n\,|c_n|^2 = e^{-|\alpha|^2}\sum_n n\,\frac{|\alpha|^{2n}}{n!} = e^{-|\alpha|^2}\,|\alpha|^2\,\frac{\partial}{\partial|\alpha|^2}\,e^{|\alpha|^2} = |\alpha|^2 \tag{8.39}$$

Thus, the probability $P(n)$ of finding exactly $n$ photons in our coherent state
characterized by an average photon number $\bar{N}$ is precisely the Poisson distribution
$P(n) = e^{-\bar{N}}\,\bar{N}^n/n!$. Moreover, the classical limit requires that we choose $|\alpha|^2 = \bar{N} \gg 1$
in order to achieve simultaneously a small fractional dispersion in the classical ampli-
tude (related to $\bar{N}$) and in the phase (which requires $\Delta N \gg 1$). In this limit the
Poisson distribution becomes approximately Gaussian,

$$P(n) \sim \frac{1}{\sqrt{2\pi}\,\Delta N}\; e^{-\frac{1}{2}\frac{(n-\bar{N})^2}{(\Delta N)^2}} \tag{8.40}$$

with $\Delta N = (\bar{N})^{1/2} \ll \bar{N}$.
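To see how quickly the Poisson distribution approaches the Gaussian (8.40), one can compare the two directly. The sketch below is illustrative only: the mean photon number `nbar = 400` and the summation range are arbitrary choices, and the Poisson weights are computed through logarithms (`math.lgamma`) to avoid overflow.

```python
import math

def poisson(n, nbar):
    """P(n) = exp(-nbar) nbar^n / n!, evaluated via logarithms."""
    return math.exp(-nbar + n * math.log(nbar) - math.lgamma(n + 1))

def gaussian(n, nbar):
    """Right-hand side of (8.40) with Delta N = sqrt(nbar)."""
    delta_n = math.sqrt(nbar)
    return math.exp(-0.5 * (n - nbar) ** 2 / delta_n ** 2) / (math.sqrt(2 * math.pi) * delta_n)

nbar = 400.0            # hypothetical mean photon number, so Delta N = 20
ns = range(0, 1201)     # covers 40 standard deviations above the mean

mean = sum(n * poisson(n, nbar) for n in ns)
variance = sum((n - mean) ** 2 * poisson(n, nbar) for n in ns)
peak = gaussian(nbar, nbar)
max_abs_diff = max(abs(poisson(n, nbar) - gaussian(n, nbar)) for n in ns)

print("mean:", mean, "variance:", variance)
print("largest |Poisson - Gaussian| relative to the peak:", max_abs_diff / peak)
```

The residual difference is dominated by the skewness of the Poisson distribution, which falls off like $1/\sqrt{\bar{N}}$ relative to the peak height.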


Introducing a mode cutoff at $N \gg \bar{N} \gg 1$, we find that the components of our
coherent state both at the bottom of the spectrum

$$\langle 0|\alpha\rangle = c_0 = e^{-|\alpha|^2/2} = e^{-\bar{N}/2} \ll 1 \tag{8.41}$$

and at the top, using Stirling’s formula for the factorial,

$$|\langle N|\alpha\rangle| \sim \frac{1}{(2\pi N)^{1/4}}\; e^{-\bar{N}/2}\left(\frac{e\bar{N}}{N}\right)^{N/2} \ll 1 \tag{8.42}$$

are negligibly small, allowing us to use freely the phase operators $E$, $C$, and $S$
introduced in the previous section without worrying about the mutilation of the basic
commutation relations (8.17) and (8.21) induced by our mode cutoff.
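With the mode cutoff in place, the unitary operator $E$ defined in (8.19) has
eigenstates of the form

$$|\phi\rangle = \frac{1}{\sqrt{N+1}}\sum_{n=0}^{N} e^{in\phi}\,|n\rangle \tag{8.43}$$

with $E|\phi\rangle = e^{i\phi}|\phi\rangle$, provided the eigenphases $\phi$ take the discrete values
$\phi = \frac{2\pi m}{N+1}$, $m = 0, 1, 2, \ldots, N$. The corresponding eigenvalues of the hermitian (resp.
antihermitian) parts $C$ (resp. $S$) of $E$ are, of course, $\cos\phi$ (resp. $\sin\phi$). The amplitude
for measuring a phase $\phi$ in a coherent state $|\alpha\rangle$ is therefore

$$\langle\phi|\alpha\rangle = \frac{e^{-|\alpha|^2/2}}{\sqrt{N+1}}\sum_{n=0}^{N} e^{-in\phi}\,\frac{\alpha^n}{\sqrt{n!}} = \frac{e^{-|\alpha|^2/2}}{\sqrt{N+1}}\sum_{n=0}^{N} e^{-in(\phi-\theta)}\,\frac{|\alpha|^n}{\sqrt{n!}} \tag{8.44}$$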
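where we recall that $\theta$ is defined to be the phase of $\alpha$: $\alpha = |\alpha|e^{i\theta}$. In the classical
regime of interest, the factor $e^{-|\alpha|^2/2}\,\frac{|\alpha|^n}{\sqrt{n!}} = \sqrt{P(n)}$ becomes approximately Gaussian:

$$e^{-|\alpha|^2/2}\,\frac{|\alpha|^n}{\sqrt{n!}} \sim \frac{1}{(2\pi)^{1/4}\sqrt{\Delta N}}\; e^{-\frac{1}{4}\frac{(n-\bar{N})^2}{(\Delta N)^2}} \tag{8.45}$$

The phase amplitude (8.44) can therefore be evaluated in the limit of large $N$ as a
Fourier sum approximated by a Gaussian integral

$$\langle\phi|\alpha\rangle \propto \int e^{-in(\phi-\theta)}\, e^{-\frac{1}{4}\frac{(n-\bar{N})^2}{(\Delta N)^2}}\, dn \;\propto\; e^{-i\bar{N}(\phi-\theta)}\, e^{-(\Delta N)^2(\phi-\theta)^2} \tag{8.46}$$

and the distribution in phase $P(\phi)$ of our coherent state is Gaussian

$$P(\phi) \equiv |\langle\phi|\alpha\rangle|^2 \propto e^{-2(\Delta N)^2(\phi-\theta)^2} \tag{8.47}$$

corresponding to a half-width in phase of $\Delta\phi = \frac{1}{2\Delta N}$. The corresponding dispersion
in $C = \cos\phi$ is evidently $\Delta C = |\sin\phi|\,\Delta\phi = \frac{|S|}{2\Delta N}$, whence

$$\Delta N \cdot \Delta C = \frac{1}{2}|S| \tag{8.48}$$

and we see that the number-phase uncertainty relation (8.27) is indeed saturated for
the coherent state $|\alpha\rangle$, as expected.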
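The phase distribution can also be obtained without the Gaussian approximation by evaluating the sum (8.44) directly on the discrete eigenphases. The following sketch does this in plain Python; the parameter values ($\bar{N} = 100$, $\theta = 1$, cutoff $N = 400$) and all function names are assumptions chosen for illustration. It compares the resulting phase width with $1/(2\Delta N)$ and the product $\Delta N\cdot\Delta C$ with $\frac{1}{2}|S|$.

```python
import cmath
import math

def coherent_amplitudes(alpha_abs, theta, N):
    """c_n for n = 0..N, with the modulus computed through logarithms to avoid overflow."""
    amps = []
    for n in range(N + 1):
        log_mod = -0.5 * alpha_abs ** 2 + n * math.log(alpha_abs) - 0.5 * math.lgamma(n + 1)
        amps.append(math.exp(log_mod) * cmath.exp(1j * n * theta))
    return amps

def phase_distribution(c):
    """Return [(phi_m, |<phi_m|alpha>|^2)] on the eigenphases phi_m = 2 pi m / (N+1), cf. (8.43)-(8.44)."""
    dim = len(c)
    result = []
    for m in range(dim):
        phi = 2 * math.pi * m / dim
        amp = sum(cmath.exp(-1j * n * phi) * cn for n, cn in enumerate(c)) / math.sqrt(dim)
        result.append((phi, abs(amp) ** 2))
    return result

alpha_abs, theta, N = 10.0, 1.0, 400      # hypothetical values: mean number 100, cutoff 400
c = coherent_amplitudes(alpha_abs, theta, N)
dist = phase_distribution(c)
total = sum(p for _, p in dist)

def wrapped(phi):
    """Deviation phi - theta mapped into [-pi, pi)."""
    return (phi - theta + math.pi) % (2 * math.pi) - math.pi

mean_dev = sum(p * wrapped(phi) for phi, p in dist)
delta_phi = math.sqrt(sum(p * wrapped(phi) ** 2 for phi, p in dist) - mean_dev ** 2)

mean_n = sum(n * abs(cn) ** 2 for n, cn in enumerate(c))
delta_N = math.sqrt(sum((n - mean_n) ** 2 * abs(cn) ** 2 for n, cn in enumerate(c)))

mean_C = sum(p * math.cos(phi) for phi, p in dist)
mean_S = sum(p * math.sin(phi) for phi, p in dist)
delta_C = math.sqrt(sum(p * math.cos(phi) ** 2 for phi, p in dist) - mean_C ** 2)
ratio = delta_N * delta_C / (0.5 * abs(mean_S))

print("phase width:", delta_phi, "vs 1/(2 Delta N):", 1 / (2 * delta_N))
print("Delta N * Delta C divided by |S|/2:", ratio)
```

A ratio close to one indicates that (8.27) is saturated up to corrections that shrink as $\bar{N}$ grows.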
The coherent state formalism is even more powerful than the preceding con-
siderations may lead us to expect, enabling the explicit construction of states in
fully interacting theories with an exactly prescribed spatial dependence for the field
observables of the theory. To illustrate this point we shall consider a self-interacting
scalar field $\phi$, with dynamics specified by the $\phi^4$-type interaction discussed earlier (cf.
Chapter 6, (6.90)): namely, with a field energy density given by the normal-ordered
operator

$$H(x) = {:}\,\frac{1}{2}\dot{\phi}(x)^2 + \frac{1}{2}|\vec{\nabla}\phi(x)|^2 + \frac{1}{2}m^2\phi(x)^2 + \frac{\lambda}{4!}\phi(x)^4\,{:} \tag{8.49}$$

and self-conjugate local scalar field

$$\phi(x) = \frac{1}{(2\pi)^{3/2}}\int\frac{d^3k}{\sqrt{2E(k)}}\left(a(k)\,e^{-ik\cdot x} + a^\dagger(k)\,e^{ik\cdot x}\right) \equiv \phi^{(+)}(x) + \phi^{(-)}(x) \tag{8.50}$$

We shall soon be focussing on the full Hamiltonian $H$, which is conserved, so we
will evaluate all operators with the time taken to be zero in (8.49): thus $\phi(x) = \phi(\vec{x}, 0)$.
We would like to construct coherent states in which the field operator φ(x, 0) has a
prescribed static spatial dependence f (x) for its expectation, with functionals of φ,
such as the energy density above, having expectation values given as the corresponding
functional of the prescribed c-number function f (x). The single-mode coherent states
for photons discussed previously give us a clue for how to achieve this: they are
eigenstates of the annihilation operator a for the given mode. Similarly, we expect
general coherent states of a scalar quantum field φ to be eigenstates of the positive
frequency part $\phi^{(+)}$ of the field, built entirely from destruction operators $a(q)$. To
construct such states, introduce the operator

$$S = \exp\left(\frac{1}{2}\int d^3q\,(2\pi)^{3/2}\sqrt{2E(q)}\,\tilde{f}(q)\,a^\dagger(q)\right) \tag{8.51}$$

where $\tilde{f}(q)$ is for the present an unspecified c-number function of spatial momentum
$\vec{q}$ (as usual, we simplify the notation by omitting arrows for spatial vectors, e.g., $\vec{q}$,
where the context is clear). A straightforward calculation using the Baker–Campbell–
Hausdorff⁶ expansion formula gives

$$S^{-1}a(k)\,S = a(k) + \frac{1}{2}(2\pi)^{3/2}\sqrt{2E(k)}\,\tilde{f}(k) \tag{8.52}$$

and hence

$$S^{-1}\phi^{(+)}(\vec{x},0)\,S = \phi^{(+)}(\vec{x},0) + \frac{1}{2}f(\vec{x}) \tag{8.53}$$

where $f(\vec{x})$ is the Fourier transform of $\tilde{f}$ in (8.51). Thus, if we define the coherent
state

$$|f\rangle \equiv S|0\rangle \tag{8.54}$$

we see (since $\phi^{(+)}|0\rangle = 0$) that our coherent state is indeed an eigenstate of the positive
frequency part of the field

$$\phi^{(+)}(\vec{x},0)\,|f\rangle = \frac{1}{2}f(\vec{x})\,|f\rangle$$

$$\langle f|\,\phi^{(-)}(\vec{x},0) = \langle f|\,\frac{1}{2}f(\vec{x}) \tag{8.55}$$

where we have taken $f$ to be real. Thus, as promised, we have constructed a state $|f\rangle$
with a field expectation value exactly equal to the preassigned function $f(\vec{x})$:

$$\langle f|\phi(\vec{x},0)|f\rangle = f(\vec{x}) \tag{8.56}$$

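From the definition of the normal product it now follows that

$$\frac{\langle f|:\phi^2(\vec{x},0):|f\rangle}{\langle f|f\rangle} = f(\vec{x})^2$$

$$\frac{\langle f|:\phi^4(\vec{x},0):|f\rangle}{\langle f|f\rangle} = f(\vec{x})^4$$

$$\frac{\langle f|:|\vec{\nabla}\phi(\vec{x},0)|^2:|f\rangle}{\langle f|f\rangle} = |\vec{\nabla}f(\vec{x})|^2 \tag{8.57}$$

6 Namely, $e^{-A}Be^{A} = B - [A,B] + \frac{1}{2!}[A,[A,B]] - \frac{1}{3!}[A,[A,[A,B]]] + \ldots$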
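For readers who want the intermediate step behind (8.52), the expansion quoted in footnote 6 terminates after the first commutator. The short derivation below is a sketch; the shorthand symbols $c(q)$ and $A$ are introduced here for convenience and do not appear in the text, and the continuum commutator $[a(k), a^\dagger(q)] = \delta^3(k-q)$ is assumed.

```latex
% Shorthand (introduced only for this sketch):
%   c(q) \equiv (2\pi)^{3/2}\sqrt{2E(q)}\,\tilde f(q), \qquad
%   A \equiv \tfrac12 \int d^3q\; c(q)\, a^\dagger(q), \qquad S = e^{A}
\begin{align*}
[A, a(k)] &= \tfrac12 \int d^3q\; c(q)\,[a^\dagger(q), a(k)]
           = -\tfrac12 \int d^3q\; c(q)\,\delta^3(q-k)
           = -\tfrac12\, c(k), \\
[A, [A, a(k)]] &= 0 \qquad \text{(the first commutator is a c-number)}, \\
S^{-1} a(k)\, S &= a(k) - [A, a(k)]
           = a(k) + \tfrac12 (2\pi)^{3/2}\sqrt{2E(k)}\,\tilde f(k).
\end{align*}
```

Inserting this shift into the positive-frequency part of (8.50) at $t = 0$ cancels the factors $(2\pi)^{3/2}\sqrt{2E(k)}$ and leaves $\frac{1}{2}\int d^3k\,\tilde{f}(k)\,e^{i\vec{k}\cdot\vec{x}} = \frac{1}{2}f(\vec{x})$, which is (8.53).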
The time-derivative term in the free Hamiltonian is a little trickier, although, as we
are dealing with coherent states with a static expectation value, the final result is,
unsurprisingly, zero. By computations analogous to (8.52, 8.53) one finds

$$S^{-1}\pi^{(+)}(\vec{x},0)\,S = \pi^{(+)}(\vec{x},0) - \frac{i}{2}f_E(\vec{x}) \tag{8.58}$$

where $\pi^{(+)}(\vec{x},0)$ is the positive frequency part of $\dot{\phi}(\vec{x},0)$, and

$$f_E(\vec{x}) \equiv \int d^3k\,E(k)\,\tilde{f}(k)\,e^{i\vec{k}\cdot\vec{x}} \tag{8.59}$$

is real if $f(\vec{x})$ is. Then

$$\pi^{(+)}(\vec{x},0)\,|f\rangle = -\frac{i}{2}f_E(\vec{x})\,|f\rangle, \qquad \langle f|\,\pi^{(-)}(\vec{x},0) = \langle f|\,\frac{i}{2}f_E(\vec{x}) \tag{8.60}$$

from which it follows that

$$\langle f|:\dot{\phi}^2:|f\rangle = \langle f|\big(\pi^{(+)2} + 2\pi^{(-)}\pi^{(+)} + \pi^{(-)2}\big)|f\rangle = 0 \tag{8.61}$$
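The cancellation in (8.61) is a one-line piece of arithmetic once the eigenvalue relations (8.60) are inserted term by term. Written out as a worked step (nothing here goes beyond (8.60)):

```latex
\begin{align*}
\langle f|\pi^{(+)2}|f\rangle &= \left(-\tfrac{i}{2} f_E\right)^2 \langle f|f\rangle
                               = -\tfrac14 f_E^2\, \langle f|f\rangle, \\
2\,\langle f|\pi^{(-)}\pi^{(+)}|f\rangle &= 2\left(\tfrac{i}{2} f_E\right)\left(-\tfrac{i}{2} f_E\right)\langle f|f\rangle
                               = +\tfrac12 f_E^2\, \langle f|f\rangle, \\
\langle f|\pi^{(-)2}|f\rangle &= \left(\tfrac{i}{2} f_E\right)^2 \langle f|f\rangle
                               = -\tfrac14 f_E^2\, \langle f|f\rangle, \\
\text{sum} &= \left(-\tfrac14 + \tfrac12 - \tfrac14\right) f_E^2\, \langle f|f\rangle = 0 .
\end{align*}
```

Normal ordering matters here: it is what places every $\pi^{(-)}$ to the left of every $\pi^{(+)}$, so that each factor acts directly on the state on which it is diagonal.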

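Thus the expectation value of the full Hamiltonian density in the state $|f\rangle$ is

$$\frac{\langle f|H(\vec{x},0)|f\rangle}{\langle f|f\rangle} = \frac{1}{2}|\vec{\nabla}f(\vec{x})|^2 + \frac{1}{2}m^2 f(\vec{x})^2 + \frac{\lambda}{4!}f(\vec{x})^4 \tag{8.62}$$

As the expectation value of the full Hamiltonian, obtained as the spatial integral of
(8.62), is time-independent, it is given, at any time, by the exact functional

$$\frac{\langle f|H|f\rangle}{\langle f|f\rangle} = \int d^3x\left(\frac{1}{2}|\vec{\nabla}f(\vec{x})|^2 + \frac{1}{2}m^2 f(\vec{x})^2 + \frac{\lambda}{4!}f(\vec{x})^4\right) \tag{8.63}$$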

It is rather remarkable that the coherent-state formalism allows us to construct a state
in a fully interacting quantum field theory with an exactly prescribable expectation
value for the energy density. Strictly speaking, (8.63) contains disguised divergences
(the famous ultraviolet divergences of local field theory), due to the fact that the
coefficients $m^2$, $\lambda$ (the so-called bare mass and coupling) of the quadratic and quartic
terms in the expression (8.49) for the full Hamiltonian are only well-defined if the
theory is cut off at short distance (or large momenta). The need for such cutoffs in
interacting field theories will be exhaustively discussed later in the book, beginning in
Chapter 10. A well defined version of the above arguments would therefore require a
regularization analogous to that performed in our previous examination of the single-
mode case: namely, a cutoff not just of mode occupation number but also of the allowed
momenta k, with |k| < Λ, where Λ is some very-high-energy/momentum scale, far
beyond those of accessible phenomena. Likewise, assuming f˜(k) of compact support
in momentum space (say, f˜(k) = 0, |k| > Λ), the preceding arguments carry through
with no essential modifications.
8.4 Signs, stability, symmetry-breaking


One further important physical constraint on a local quantum field theory, not
mentioned previously, is readily addressed with the tools developed in this chapter:
the necessity that the spectrum of the theory be bounded below, or, in other words,
the existence of a state (or states) of minimum energy. Otherwise, the system is
unstable, with any finite-energy initial state decaying endlessly to lower-energy states
with the emission of infinitely many particles. Such a requirement imposes constraints
on the signs of couplings appearing in the Hamiltonian, as we can easily see when
employing the coherent state formalism of the preceding section. For example, stability
requires that the coupling λ in (8.63) be positive. If λ < 0, then by choosing f
spatially constant in some region of fixed volume and arbitrarily large in magnitude
we can produce a state of arbitrarily negative energy, so the spectrum is unbounded
below.
It may be thought that the positive sign of the coefficient of $f^2$ in (8.63) (i.e., of $\phi^2$
in (8.49)) is also sacred: after all, what would a negative squared mass (or imaginary
mass) mean? In fact, the theory with Hamiltonian

$$H = \int d^3x\; {:}\,\frac{1}{2}\dot{\phi}^2 + \frac{1}{2}|\vec{\nabla}\phi|^2 - \frac{1}{2}m^2\phi^2 + \frac{\lambda}{4!}\phi^4\,{:} \tag{8.64}$$

is perfectly sensible, although the physical interpretation will involve a new concept:
spontaneous symmetry-breaking. If we plot the expectation value of the Hamiltonian
energy density $\langle H\rangle/V$ for a system quantized in a box of finite volume $V$, for the
coherent state $|f\rangle$ (with $f$ spatially constant over the box) we find a double well
shape, symmetric around the zero-field point (see Figure 8.1).
It is apparent that there are two distinct states of minimum energy, with

$$\langle\phi\rangle = f_0 = \pm\sqrt{\frac{6}{\lambda}}\,m \tag{8.65}$$
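The location of the minima, and the instability for negative quartic coupling discussed above, can be confirmed with a simple scan of the energy density for spatially constant $f$ (for which the gradient term in (8.63) vanishes). This is a minimal sketch: the function name `energy_density`, the `mass_sign` switch, and the values `m = 1.0`, `lam = 0.6` are illustrative assumptions.

```python
import math

def energy_density(f, m, lam, mass_sign=-1.0):
    """<H>/V for constant f: mass_sign=+1 is the potential of (8.63), -1 that of (8.64)."""
    return mass_sign * 0.5 * m ** 2 * f ** 2 + lam / 24.0 * f ** 4

m, lam = 1.0, 0.6                      # hypothetical parameter values
f0 = m * math.sqrt(6.0 / lam)          # predicted minimum, eq. (8.65)

# brute-force scan of the double well on a grid of spacing 0.001
grid = [i * 1e-3 for i in range(-6000, 6001)]
f_best = min(grid, key=lambda f: energy_density(f, m, lam))
depth = energy_density(f0, m, lam)     # analytically -3 m^4 / (2 lam)

# with the conventional mass term but lam < 0, large fields give arbitrarily negative energy
unstable_value = energy_density(100.0, m, -lam, mass_sign=+1.0)

print("scan minimum at |f| =", abs(f_best), " predicted f0 =", f0)
print("energy density at the minimum:", depth, " at f = 0:", energy_density(0.0, m, lam))
print("energy density at f = 100 for negative coupling:", unstable_value)
```

The scan returns only one of the two degenerate minima; the other is its mirror image under $f \to -f$.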
[Fig. 8.1 Energy density for various coherent states with spontaneous symmetry-breaking. Vertical axis: $\langle H\rangle/V$; horizontal axis: $f$, with the point $f_0$ marked.]

On the other hand, the state with vanishing $f$ (i.e., the conventional Fock-space
vacuum $|0\rangle$, cf. (8.54)) clearly has larger energy than either of these: it is an excited
state! In such a system, a low-energy state (in other words, a state with a small number
of particles with finite energy) may be regarded as a perturbation of the situation in
which the field takes the constant value $f_0$, or the constant value $-f_0$, over all space,
with the symmetry of the Hamiltonian (8.64) guaranteeing equivalent physics in either
case. In order to write the field operator in terms of creation and annihilation operators
with respect to a new, truly minimum-energy, vacuum, we must first subtract off the
vacuum-expectation-value (VEV) of the original field variable, (8.65). (Clearly, the
canonical field (8.50) automatically has zero VEV with respect to the Fock vacuum
$|0\rangle$ defined by $a(k)|0\rangle = 0$, $\forall k$.) Thus, we introduce a “shifted” field $\hat{\phi}$:

$$\phi(x) = \hat{\phi}(x) + f_0 = \hat{\phi}(x) + \sqrt{\frac{6}{\lambda}}\,m \tag{8.66}$$

Rewriting the Hamiltonian (8.64) in terms of the shifted field, one finds

$$H = \int d^3x\; {:}\,\frac{1}{2}\dot{\hat{\phi}}^2 + \frac{1}{2}|\vec{\nabla}\hat{\phi}|^2 + m^2\hat{\phi}^2 + \sqrt{\frac{\lambda}{6}}\,m\,\hat{\phi}^3 + \frac{\lambda}{4!}\hat{\phi}^4\,{:} \tag{8.67}$$

(An additive constant in the energy density, irrelevant for our present discussion,
but of profound importance in modern theories of cosmological inflation, has been
discarded.)
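The coefficients in (8.67) can be checked numerically by comparing the original potential, evaluated at the shifted argument, with the shifted polynomial. The sketch below uses hypothetical function names and arbitrary parameter values; it tests the identity $V(f_0 + x) - V(f_0) = m^2x^2 + \sqrt{\lambda/6}\,m\,x^3 + \frac{\lambda}{4!}x^4$ at a few sample points.

```python
import math

def potential(phi, m, lam):
    """Non-derivative part of (8.64): -m^2 phi^2 / 2 + lam phi^4 / 4!."""
    return -0.5 * m ** 2 * phi ** 2 + lam / 24.0 * phi ** 4

def shifted_potential(x, m, lam):
    """Non-derivative part of (8.67), with the additive constant dropped."""
    return m ** 2 * x ** 2 + math.sqrt(lam / 6.0) * m * x ** 3 + lam / 24.0 * x ** 4

m, lam = 1.3, 0.7                       # hypothetical parameter values
f0 = m * math.sqrt(6.0 / lam)           # vacuum expectation value, eq. (8.65)

samples = [-2.0, -0.5, 0.0, 0.3, 1.7]   # arbitrary values of the shifted field
max_dev = max(abs(potential(f0 + x, m, lam) - potential(f0, m, lam)
                  - shifted_potential(x, m, lam)) for x in samples)
discarded_constant = potential(f0, m, lam)   # equals -3 m^4 / (2 lam)
print("largest mismatch:", max_dev)
print("discarded constant:", discarded_constant)
```

No linear term appears in the shifted polynomial because $f_0$ is a stationary point of the potential.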
Note that a physically sensible positive sign has reappeared in front of the quadratic
mass term. Now, however, the theory contains a cubic as well as quartic interac-
tion term. The original φ → −φ symmetry of (8.64), which would have guaranteed
conservation of evenness or oddness of the number of particles in any scattering
process, has been broken. The breaking is really due to an asymmetrical choice of
an off-centered minimum of a symmetric potential energy curve. In other words,
the symmetry is broken by the ground state, not by the underlying dynamics. If
we had chosen random values for the coefficients of the $\hat{\phi}^2$, $\hat{\phi}^3$, $\hat{\phi}^4$ terms in (8.67),
the symmetry-breaking would have been explicit in the dynamics: indeed, in this
case, there would be a unique lowest-energy state (vacuum), and no non-trivial
symmetry operation (such as φ → −φ) connecting distinct degenerate vacuum states.
Only for the special case where these couplings and masses are related as indicated
in (8.67) can the theory be regarded as possessing an underlying symmetry broken
only by the choice of a single asymmetric vacuum—from a set of degenerate ones
connected by the symmetry—dictated by historical circumstance, just as the direction
of spontaneous magnetization in a ferromagnet lowered below its Curie tempera-
ture will depend on the presence of small external magnetic fields to resolve the
rotational ambiguity. Symmetry-breaking of this type is referred to as “spontaneous
symmetry-breaking”.
The symmetry of the theory defined by the Hamiltonian (8.64) is a discrete one:
φ → −φ. If the symmetry undergoing spontaneous breakdown is continuous, as in
the case of rotational symmetry mentioned above in the context of ferromagnetism,
there is a remarkable new feature, first noticed by Goldstone (Goldstone, 1961): the
appearance of an exactly massless scalar particle, or “Goldstone boson”. Imagine a
theory with three scalar particles of identical mass $m$. (For example, the neutral and
charged pions $\pi^0$, $\pi^\pm$ form such a triplet, if we ignore the slight differences in mass of
the neutral and charged types due to the weak and electromagnetic interactions.) The
free Hamiltonian for such a system may be written thus:

$$H_0 = \int d^3x\; {:}\,\frac{1}{2}\dot{\vec{\phi}}^{\,2} + \frac{1}{2}|\vec{\nabla}\vec{\phi}|^2 + \frac{1}{2}m^2\vec{\phi}^{\,2}\,{:} \tag{8.68}$$
Here the three scalar fields have been written as the three “components” $\phi_1, \phi_2, \phi_3$ of a
three-dimensional field vector $\vec{\phi}$. This Hamiltonian possesses a continuous symmetry of
rotations in field space, whereby $\phi_i \to R_{ij}\phi_j$, with $R_{ij}$ a $3\times 3$ orthogonal matrix, i.e.,
an element of the fundamental representation of the group O(3), the rotation group
in three dimensions. Note that the symmetry is a “global” one: exactly the same
rotation is applied to the field vector at all spacetime points (otherwise, the spatial
gradient or time-derivative terms in the Hamiltonian would not be left invariant).
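Such a symmetry is commonly referred to as an “isospin” symmetry. One may write
down interactions which also respect the isospin symmetry of the free Hamiltonian,
e.g., $H_{\rm int} = \frac{\lambda}{4!}(\vec{\phi}^{\,2})^2$, which is clearly also invariant under rotations in field space. Now
suppose that the full Hamiltonian is actually

$$H = \int d^3x\; {:}\,\frac{1}{2}\dot{\vec{\phi}}^{\,2} + \frac{1}{2}|\vec{\nabla}\vec{\phi}|^2 - \frac{1}{2}m^2\vec{\phi}^{\,2} + \frac{\lambda}{4!}(\vec{\phi}^{\,2})^2\,{:}$$

$$\equiv \int d^3x\; {:}\,\frac{1}{2}\dot{\vec{\phi}}^{\,2} + \frac{1}{2}|\vec{\nabla}\vec{\phi}|^2 + P(\vec{\phi})\,{:} \tag{8.69}$$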
Once again, the state of minimum energy does not correspond to zero vacuum
expectation value of the fields $\vec{\phi}$. Rather, there is a whole family of minimum-energy
states corresponding to the various field orientations with fixed magnitude

$$|\vec{\phi}| = \sqrt{\frac{6}{\lambda}}\,m \tag{8.70}$$
λ
A physical Fock space is constructed by imagining (and here the analogy to ferromag-
netism becomes essentially complete) that some fluctuation has “tickled” the system
into choosing a definite direction in field space for $\langle 0|\vec{\phi}|0\rangle$. By the rotational invariance
of (8.69), we may as well call this direction the “3” direction, and rewrite the theory
in terms of a shifted field:

$$\phi_i(x) = \sqrt{\frac{6}{\lambda}}\,m\,\delta_{i3} + \hat{\phi}_i(x) \tag{8.71}$$
λ
As in the theory (8.64), the physical vacuum state will be one in which the vacuum
expectation value of the shifted field $\hat{\phi}$ vanishes. With this substitution we find that
the “potential energy” part $P(\vec{\phi})$ of $H$ becomes

$$P \to m^2\hat{\phi}_3^2 + \sqrt{\frac{\lambda}{6}}\,m\,\hat{\phi}_3\,\hat{\phi}_i\hat{\phi}_i + \frac{\lambda}{4!}(\hat{\phi}_i\hat{\phi}_i)^2 \tag{8.72}$$

Note the appearance once again of a cubic term which violates the rotational symmetry
(because the third direction in field space is singled out via $\hat{\phi}_3$). But in contrast to the
discrete case, the theory now contains only a single massive particle, corresponding to
$\hat{\phi}_3$: there is no mass term for $\hat{\phi}_1$, $\hat{\phi}_2$! These two directions correspond to the “flat”
directions in field space near the point $\phi_i = \sqrt{\frac{6}{\lambda}}\,m\,\delta_{i3}$. Note that the Hamiltonian after
the shift is still invariant under rotations around the 3-axis (which leave $\hat{\phi}_3$ fixed),
so the symmetry-breaking has reduced the symmetry group of the theory from O(3)
(rotations around the 1, 2, or 3 axes in field space) to O(2) (rotations only around the 3
axis, represented by $2\times 2$ matrices mixing the 1 and 2 coordinates). The two generators
corresponding to the lost symmetries (rotations around the 1 and 2 directions) may
be associated with the two massless Goldstone bosons appearing in (8.72).
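The mass pattern read off from (8.72), one massive and two massless directions, can be checked by computing the matrix of second derivatives of $P(\vec{\phi})$ at the chosen vacuum. The sketch below uses a central finite-difference Hessian in plain Python; the function names, the step size `h`, and the parameter values are assumptions for illustration. With the quadratic term written as $\frac{1}{2}M^2_{ij}\hat{\phi}_i\hat{\phi}_j$, the coefficient $m^2\hat{\phi}_3^2$ in (8.72) corresponds to $M^2_{33} = 2m^2$.

```python
import math

def potential(phi, m, lam):
    """P(phi) from (8.69) for a three-component field: -m^2 phi.phi / 2 + lam (phi.phi)^2 / 4!."""
    phi_sq = sum(component * component for component in phi)
    return -0.5 * m ** 2 * phi_sq + lam / 24.0 * phi_sq ** 2

def hessian(func, point, h=1e-4):
    """Central finite-difference matrix of second derivatives of func at point."""
    dim = len(point)
    H = [[0.0] * dim for _ in range(dim)]
    for i in range(dim):
        for j in range(dim):
            def shifted(si, sj):
                p = list(point)
                p[i] += si * h
                p[j] += sj * h      # for i == j the two shifts add up to a step of 2h
                return func(p)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * h * h)
    return H

m, lam = 1.0, 0.6                          # hypothetical parameter values
v = m * math.sqrt(6.0 / lam)               # magnitude of the vacuum field, eq. (8.70)
vacuum = [0.0, 0.0, v]                     # vacuum chosen along the "3" direction, eq. (8.71)

M2 = hessian(lambda p: potential(p, m, lam), vacuum)
for row in M2:
    print(["%.6f" % entry for entry in row])
```

The two vanishing diagonal entries are the flat directions associated with the Goldstone modes; as the text notes, identifying them with physical masses is a lowest-order statement.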
The appearance of a massless scalar for every broken generator of a continuous
global symmetry is a very general property of local quantum field theories. Note that
our arguments in this section are based on an identification of the physical masses of
particles with the coefficients of quadratic terms in the Hamiltonian, which is only,
strictly speaking, valid to lowest order in perturbation theory in the interactions: we
shall see later that the interactions will in general renormalize the physical masses.
Nevertheless, the masslessness of the modes induced by spontaneous breaking of global
continuous symmetries turns out to be exact to all orders of perturbation theory, as
we shall show when we return to the subject in Section 14.3.⁷ Some more examples
of spontaneous symmetry-breaking in theories with continuous global symmetries are
given in Problems 4, 5, and 6 at the end of this chapter.
One may wonder why the Goldstone theorem has assumed such central importance
in modern particle physics: after all, there do not appear to be any massless spinless
particles in Nature! There are two reasons:
(1) There are spontaneously broken continuous global symmetries which would be
exact except for the presence of small explicit terms breaking the symmetry
in the Hamiltonian. In such a case (see Problem 4) one gets scalar particles
of small mass, called “pseudo-Goldstone bosons”. This is the case for pions
in the strong interactions, which are the pseudo-Goldstone bosons of broken
approximate chiral symmetry.
(2) The spontaneous symmetry-breaking may occur in a theory where the contin-
uous symmetry is a local gauge symmetry, and in this case each Goldstone
boson appearing due to a broken generator of the symmetry group is in a sense
“absorbed” into the gauge field of the corresponding gauge particle, making the
latter massive (the so-called “Higgs” mechanism, discussed in detail in Section
15.6, which explains the mass splitting between the photon and the massive W
and Z bosons of modern electroweak theory).
We shall return later, in the “Symmetries” section of the book (cf. Chapter 14), to a
much more detailed examination of the physics of degenerate vacua and spontaneous
symmetry-breaking, both for global and local symmetries of the underlying theory.
For the time being, the preceding discussion provides a suggestive example of the
power and utility of the coherent state formalism in connecting certain macroscopic
properties of a quantum field theory with the microscopic dynamics.

7 The first general proof of this was given by Goldstone, Salam, and Weinberg, Phys. Rev. 127 (1962),
965.

8.5 Problems
1. Starting with the vector field $\vec A$ for a massless spin-1 particle given in (7.174),
with the electric field defined as $\vec E = \frac{\partial\vec A}{\partial t}$ and the magnetic field $\vec B = \vec\nabla\times\vec A$ (or
$B_i = \epsilon_{ijk}\partial_j A_k$, i, j, k = 1, 2, 3), derive the commutation relation:

$$[E_i(\vec x,t),\,B_j(\vec y,t)] = i\,\epsilon_{ijk}\,\partial_k\,\delta^3(\vec x-\vec y)$$

2. Show that in the state (8.37), the probability of detecting more than N quanta
falls more rapidly than $\left(\frac{e\bar n}{N}\right)^N$ for $N > \bar n \equiv \langle N\rangle$ once we consider states with
occupancy N exceeding the expectation value n̄ of the number operator.
3. Show that the states (8.43) are eigenstates of the operator (8.19), with eigenvalues
$e^{i\phi}$, $\phi = \frac{2\pi m}{N+1}$, m = 0, 1, 2, ..., N.
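4. Let σ and $\vec\pi$ (the vector symbol over π refers to an internal “flavor” index, not to
ordinary space!) be a set of four self-conjugate scalar fields, interacting via the
Hamiltonian

$$H = \int d^3x\,\Big\{: \frac12\Big(\frac{\partial\sigma}{\partial t}\Big)^2 + \frac12|\vec\nabla\sigma|^2 + \frac12\sum_{i=1}^{3}\Big(\Big(\frac{\partial\pi_i}{\partial t}\Big)^2 + |\vec\nabla\pi_i|^2\Big) - \frac12\mu^2(\sigma^2+\vec\pi^2) + \lambda(\sigma^2+\vec\pi^2)^2 :\Big\}$$
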
(a) Show that the ground state of this system has non-vanishing expectation
value for one of the fields. (It is conventional to call this the σ field—why
is this purely a matter of convention?). Show that this corresponds to a
breaking of the global symmetry group (what is it?) of H. Show that there
are massless particles (how many?) in the theory.
(b) What happens to the spectrum if a term kσ (k a real constant) is added
to H?
(c) If μ = 80 MeV, λ = 0.04, k = 0, calculate the difference in energy density in
J/m³ between the false vacuum with $\sigma = \vec\pi = 0$ and one of the coherent
states minimizing the energy.
5. In the theory of the preceding Problem, show that the σ particle is unstable and
calculate its lifetime (inverse of the total decay rate) to lowest order in λ.
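6. Let $\vec\chi_1$ and $\vec\chi_2$ be two three-vectors of self-conjugate scalar fields interacting via
a Hamiltonian density with polynomial part

$$P(\vec\chi_1,\vec\chi_2) = -\frac12\mu^2(\vec\chi_1^{\,2}+\vec\chi_2^{\,2}) + \lambda_1(\vec\chi_1^{\,2}+\vec\chi_2^{\,2})^2 + \lambda_2\,\vec\chi_1\cdot\vec\chi_2\,(\vec\chi_1^{\,2}+\vec\chi_2^{\,2})$$

where λ1, λ2 > 0, and λ1 > λ2/2.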
(a) What is the global symmetry group of the Hamiltonian?
(b) Find the configuration that minimizes the polynomial P and show that it
breaks the global symmetry (i.e., find $\langle\vec\chi_1\rangle$ and $\langle\vec\chi_2\rangle$). What is the residual
global symmetry in this case after spontaneous symmetry-breaking? How
many generators of the original symmetry group were broken?
(c) Identify (as linear combinations of the original fields) the massless and the
massive fields after spontaneous symmetry-breaking, and find the masses of
those with non-zero mass.
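
For the unit conversion asked for in Problem 4(c), a small helper may be useful. The sketch below assumes the tree-level (coherent-state) energy density −½μ²s + λs² with s = σ² + π⃗², whose minimum lies at s = μ²/(4λ), and uses ħc ≈ 197.327 MeV fm to pass from natural units to SI. The function names are hypothetical, and normal-ordering or loop corrections are not considered.

```python
HBAR_C_MEV_FM = 197.3269804      # hbar * c in MeV * fm
MEV_IN_JOULE = 1.602176634e-13   # 1 MeV expressed in joules
FM_IN_METRE = 1.0e-15            # 1 fm expressed in metres

def vacuum_energy_gap_mev4(mu, lam):
    """Energy density of the symmetric point minus that of the minimum, in MeV^4.

    With V(s) = -mu^2 s / 2 + lam s^2 and s = sigma^2 + pi^2, the minimum is at
    s = mu^2 / (4 lam), where V = -mu^4 / (16 lam); V vanishes at s = 0.
    """
    return mu ** 4 / (16.0 * lam)

def mev4_to_joule_per_m3(density_mev4):
    """Convert an energy density from natural units (MeV^4) to J/m^3."""
    mev_per_fm3 = density_mev4 / HBAR_C_MEV_FM ** 3
    return mev_per_fm3 * MEV_IN_JOULE / FM_IN_METRE ** 3

gap = vacuum_energy_gap_mev4(mu=80.0, lam=0.04)
print(gap, "MeV^4")
print(mev4_to_joule_per_m3(gap), "J/m^3")
```

Under these assumptions the result is of order 10³³ J/m³.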
9
Dynamics VII: Interacting fields: general aspects

Our attempts to build a framework incorporating the essential features of quantum
theory, special relativity, and the cluster principle have been motivated by an exam-
ination of the properties of the resultant scattering matrix, as given by the formal
perturbative expansion (5.73). In particular, the entire discussion has taken place
in the context of the interaction-picture representation of the dynamics, in which a
computationally convenient division of the full Hamiltonian is made, and the dynamics
of the field operators of the theory assigned to one portion (the “free” Hamiltonian),
with the complicated dynamics induced by interactions assigned entirely to the states.
All of the conclusions reached heretofore must therefore be regarded as limited to
those theories in which a perturbative expansion is both mathematically sensible and
physically appropriate. On the other hand, one is faced with the unpleasant fact that
in the case of the strong interactions (to give a particularly glaring example), for all
but a very special class of processes (in certain asymptotic kinematic regimes), the
perturbative expansion of the theory is useless—not simply for reasons of quantitative
inadequacy, but because the qualitative implications of perturbation theory completely
mislead us as to the physical structure of the theory.
We shall therefore turn our attention to a more general formulation of local
quantum field theory, in which the results are divorced to the maximum degree possible
from the special features and assumptions of perturbation theory. If we insist on a
unitary dynamics, in which no attempt is made to divide the Hamiltonian H into “free”
and “interaction” parts, we then have, as discussed in Chapter 4, basically two choices
for the description of the dynamics of the theory: the Heisenberg picture, in which the
entire time-dependence induced by H is transferred to the field operators of the theory,
with the state of the system fixed once and for all by appropriate boundary conditions;
or, the Schrödinger picture, in which the states carry the full time-dependence of
H, while the field operators are time-independent. As we saw in Section 4.3, the
discussion of scattering theory, even for ordinary non-relativistic potential scattering
theory, is particularly transparent in the former framework, with the Heisenberg
representation $|\alpha\rangle_{in}$, $|\beta\rangle_{out}$ states incorporating naturally the boundary conditions
of a typical scattering process. In the case of relativistic quantum field theory, the
Heisenberg representation is also preferred for reasons of manifest Lorentz covariance:
we shall see that the Heisenberg field operators, while carrying the time development
specified in a particular inertial frame, nevertheless remain local covariant fields (while
the states are fixed), in contradistinction to the Schrödinger picture, where a definite
choice of inertial frame is necessary in order to specify the relevant time variable for
the state dynamics, and the field operators depend only on the spatial variables, to the
detriment of manifest Lorentz covariance of the theory. Our next task will therefore
be to reformulate local quantum field theory in the Heisenberg picture, avoiding as
far as possible any reference to perturbative expansions of the quantities discussed.
This is a long chapter, and we beg the reader’s patience insofar as the treat-
ment requires perhaps a somewhat higher level of mathematical sophistication than
previously necessary. The arguments have been spelled out in great detail to avoid
confusions, and the required additional mathematics (mainly distribution theory) has
been explained just enough to make the proofs comprehensible. However, the effort
is justified by the central importance of the topics discussed, which encompass the
essential conceptual content of interacting local quantum field theories, expressed, in
the absence of an explicit “from the ground up” mathematical construction of four-
dimensional field theories, in as precise a form as mathematical physicists have been
able to achieve to the present date. In particular, a clear understanding of the variety
of connections between particles and fields, discussed at length in Section 9.6, is really
not possible without the insight into the nature of the interpolating field given by the
asymptotic formalism developed in Sections 9.3 and 9.4.

9.1 Field theory in Heisenberg representation: heuristics


The transformation of a general quantum-mechanical operator from interaction picture
to Heisenberg picture is given formally by (4.18) from Chapter 4:

$$O_H(t) = U^\dagger(t,0)\,O_{ip}(t)\,U(t,0) \qquad (9.1)$$

with $U(t,0) = e^{iH_0t}e^{-iHt}$. In the case of a quantum-mechanical system with a finite
number of degrees of freedom, there are no hidden subtleties here: the operator
U (t, 0) is properly unitary, acting within a single Hilbert space spanned by either
the complete set of eigenstates of the free Hamiltonian H0 or the full Hamiltonian H.
We shall see later (cf. the discussion of Haag’s theorem, Section 10.5) that none of
these nice properties obtain for a quantum field theory with infinitely many degrees
of freedom. Thus, strictly speaking, the manipulations that follow in this section are
only valid if the field theory is fully regularized—by which we mean that both infrared
(finite-volume V ) and ultraviolet (short-distance a, or high-momentum Λ) cutoffs are
imposed, reducing the number of allowed momentum modes to a finite number ($\sim V/a^3$).
In fact, we have already noted (cf. Section 8.3, final paragraph) that such cutoffs are
necessary to have a well-defined expression for the Hamiltonian in terms of the fields.
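
As a concrete illustration of this mode counting, consider a cubic lattice of spacing a in a periodic box of side L = Na. This is an assumed, particularly simple regularization; the text does not commit to a specific one, and the function names are illustrative. The allowed momentum components are 2πn/L restricted to one Brillouin zone, so the number of modes is N³ = V/a³.

```python
import math

def allowed_momenta(n_sites, spacing):
    """Momentum components 2*pi*n/L on a periodic lattice with L = n_sites * spacing,
    restricted to the Brillouin zone -pi/spacing <= k < pi/spacing."""
    length = n_sites * spacing
    first = -(n_sites // 2)
    return [2.0 * math.pi * n / length for n in range(first, first + n_sites)]

def count_modes(n_sites, spacing):
    """Count momentum vectors (kx, ky, kz) by explicit enumeration."""
    ks = allowed_momenta(n_sites, spacing)
    return sum(1 for _kx in ks for _ky in ks for _kz in ks)

n_sites, spacing = 6, 0.5
volume = (n_sites * spacing) ** 3
# The enumerated count and V / a^3 should agree.
print(count_modes(n_sites, spacing), volume / spacing ** 3)
```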
The attentive reader will, of course, complain that such cutoffs, while perhaps
restoring quantum-mechanical sanity, will clearly do violence to the two other sacred
ingredients which we have taken such pains to implement: Lorentz-invariance and
locality. And it is certainly far from obvious that the cutoffs can be removed at the
end of our calculations in a way that fully restores these desiderata. These are all
important issues, to which we shall return on several occasions later in the book. But
for the time being we shall throw caution to the winds and proceed as though the
conventional manipulations of non-relativistic quantum theory make sense in the field
theory case as well. A rigorous scattering theory, constructed without recourse to a
mathematically dubious interaction picture, will be the subject of Sections 9.3 and
9.4. In the meantime we shall assume that the interaction picture makes sense. Also,
we shall restrict the discussion to the case of a single, self-conjugate massive scalar
field with polynomial self-interactions, as the complications entailed by non-zero spin
are completely irrelevant to the physical issues of importance here.
We therefore define a Heisenberg field operator $\phi_H(\vec x,t)$ in terms of the corresponding
interaction-picture field via (9.1) as

$$\phi_H(\vec x,t) \equiv U^\dagger(t,0)\,\phi(\vec x,t)\,U(t,0) \qquad (9.2)$$

As U(t, 0) is by assumption unitary, the locality (by construction) of the free
interaction-picture field implies the vanishing of the equal-time commutators of φH at
distinct spatial points:

$$[\phi_H(\vec x,t),\phi_H(\vec y,t)] = U^\dagger(t,0)\,[\phi(\vec x,t),\phi(\vec y,t)]\,U(t,0) = 0,\qquad \vec x\neq\vec y \qquad (9.3)$$

The extension of this equal-time result to full microcausality (i.e., space-like commuta-
tivity) requires examination of the Lorentz transformation properties of φH , to which
we now turn.
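
A short check may help fix ideas here. Assuming, as in the interaction-picture setup of Chapter 4, that the interaction-picture field evolves with the free Hamiltonian, $\phi(\vec x,t) = e^{iH_0t}\phi(\vec x,0)e^{-iH_0t}$, and using $U(t,0) = e^{iH_0t}e^{-iHt}$, the definition (9.2) reduces to the usual Heisenberg time evolution generated by the full H:

```latex
\begin{aligned}
\phi_H(\vec x,t) &= U^\dagger(t,0)\,\phi(\vec x,t)\,U(t,0) \\
 &= e^{iHt}e^{-iH_0t}\,\bigl(e^{iH_0t}\,\phi(\vec x,0)\,e^{-iH_0t}\bigr)\,e^{iH_0t}e^{-iHt} \\
 &= e^{iHt}\,\phi(\vec x,0)\,e^{-iHt}.
\end{aligned}
```

In particular the two pictures coincide at t = 0, and the equal-time commutator in (9.3) is simply the t = 0 commutator conjugated by $e^{iHt}$.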
We recall that the construction of the interaction part V of the Hamiltonian
as the spatial integral of a density implies the momentum conservation property
(5.90), $[\vec P^{(0)}, V] = [\vec P^{(0)}, H_0] = 0$, where $\vec P^{(0)}$ is the spatial momentum operator on
the Fock space of eigenstates of H0 (the so-called “bare” states of field theory).
As the interaction does not alter spatial momentum, we can drop the (0) superscript,
as the same operator will measure the momentum of all states after interactions are
switched on. In particular, the commutator of this spatial momentum operator with
the Heisenberg field defined in (9.2) is

$$i[P_i,\phi_H(\vec x,t)] = U^\dagger(t,0)\,i[P_i,\phi(\vec x,t)]\,U(t,0) = U^\dagger(t,0)\,\frac{\partial}{\partial x^i}\phi(\vec x,t)\,U(t,0) = \frac{\partial}{\partial x^i}\phi_H(\vec x,t) \qquad (9.4)$$

as the spatial momentum operator Pi commutes with H, H0, hence with U(t, 0). Of
course, the energy component of the four-vector energy-momentum operator which
generates the time-evolution of the Heisenberg field is the full Hamiltonian H:

$$i[P_0,\phi_H(\vec x,t)] = i[H,\phi_H(\vec x,t)] = \frac{\partial}{\partial t}\phi_H(\vec x,t) \qquad (9.5)$$

The finite field translation property (5.93) for the interaction-picture field φ therefore
takes the obvious analogous form for the Heisenberg picture field:

$$e^{iP_\mu a^\mu}\,\phi_H(x)\,e^{-iP_\mu a^\mu} = \phi_H(x+a) \qquad (9.6)$$

with $P_\mu = (H, P_i^{(0)}) = (H, P_i)$.
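
The step from the infinitesimal relations (9.4) and (9.5) to the finite translation (9.6) can be spelled out as follows. This is a sketch under the assumption that the two commutators combine into $i[P_\mu,\phi_H(x)] = \partial\phi_H(x)/\partial x^\mu$; the auxiliary operator $F(s;x)$ is introduced here only for the purpose of the argument.

```latex
\begin{aligned}
F(s;x) &\equiv e^{isP_\mu a^\mu}\,\phi_H(x)\,e^{-isP_\mu a^\mu}, \qquad F(0;x) = \phi_H(x),\\
\frac{\partial F}{\partial s}
 &= e^{isP\cdot a}\; i\,[P_\mu a^\mu,\phi_H(x)]\; e^{-isP\cdot a}
  = a^\mu\, e^{isP\cdot a}\,\frac{\partial\phi_H(x)}{\partial x^\mu}\,e^{-isP\cdot a}
  = a^\mu\,\frac{\partial F}{\partial x^\mu},\\
\Rightarrow\quad F(s;x) &= \phi_H(x+sa), \qquad F(1;x) = \phi_H(x+a).
\end{aligned}
```

The last line uses the fact that $\phi_H(x+sa)$ obeys the same first-order equation in s with the same initial condition at s = 0.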
Our previous discussion of scattering theory in Heisenberg representation (cf.
Section 4.3) was based on the specification of the Heisenberg state of a scattering
system in terms of the behavior of the system in the far past (the “in-states” $|\alpha\rangle_{in}$) or
the far future (the “out-states” $|\beta\rangle_{out}$). The former correspond to a set of far separated
incoming particles such as those prepared in the beams of a high-energy collider, the
latter to the set of outgoing detected particles, in interesting cases, following a collision.
In the case of non-relativistic potential scattering theory, and assuming the absence
of bound states of the incoming or outgoing particles, either the in- or the out-states
can be shown to provide a complete set of eigenstates of the full Hamiltonian H. In
other words, the Hilbert space of the theory $\mathcal H$ can be identified with either the space
$\mathcal H_{in}$ spanned by the $|\alpha\rangle_{in}$ or the space $\mathcal H_{out}$ spanned by the $|\beta\rangle_{out}$ states. We have
no option but to assume that the corresponding property remains valid in relativistic
field theory, where a basis for the in (resp. out) states is provided by the (continuum
normalized) multi-particle states $|k_1,k_2,\ldots,k_N\rangle_{in}$ (resp. $|k'_1,k'_2,\ldots,k'_M\rangle_{out}$) with

$$P^\mu\,|k_1,k_2,\ldots,k_N\rangle_{in} = \sum_{n=1}^{N}k_n^\mu\,|k_1,k_2,\ldots,k_N\rangle_{in} \qquad (9.7)$$

$$P^\mu\,|k'_1,k'_2,\ldots,k'_M\rangle_{out} = \sum_{n=1}^{M}k_n'^{\mu}\,|k'_1,k'_2,\ldots,k'_M\rangle_{out} \qquad (9.8)$$

Here again we emphasize that we are taking for simplicity the theory of a single stable
spinless particle (with no bound states), so the states are fully specified by listing the
momenta of the particles.
The principle of asymptotic completeness asserts that the Hilbert spaces $\mathcal H_{in}$, $\mathcal H_{out}$
are identical, and can be identified with the full Hilbert space $\mathcal H$ of the theory. The
physical reasoning behind this assumption is in fact very straightforward. Imagine
an arbitrary finite-energy state of the system at, say, time t = 0, corresponding to
the moment of collision of some arbitrary set of formerly separated particles. The
state of the system near t = 0 is, of course, extremely complicated: all we know (in
a massive theory with short range interactions) is that the energy and momentum
of the field(s) is concentrated in a small spatial region around the interaction vertex.
It is physically clear that any such state will eventually evolve (absent bound states)
into a linear combination of Fock states consisting of sets of a finite number (≤ E/m,
where E is the total energy and m the mass of the field quantum) of outgoing stable
particles receding to infinity, with, of course, the total energy and momentum of the
system conserved at every stage. In other words, since the entire history of a system
is encapsulated (in Heisenberg representation) by its specification at some arbitrary
time (in this case, the far future), an arbitrary Heisenberg state of the system must
be resolvable into a linear combination of multi-particle out-states: the $|\beta\rangle_{out}$ span
the physical Hilbert space. Note that in a theory with unstable particles, these do
not form part of the asymptotic Hilbert space: only the stable particles persisting
at late time (including stable bound states, if such exist) are to be included in the
list of $|\beta\rangle_{out}$. We shall return later in this chapter to a more detailed discussion of
the nature of the Hilbert space in relativistic field theory. For the time being we
note that the completeness of the in-states is assured given that of the out-states
by the TCP theorem of local field theory (cf. Section 13.4): the joint operation of
charge conjugation (particle–antiparticle interchange), parity and time reversal must
leave the physics invariant, whence the completeness of the in-states defined in the
far past follows unproblematically from the corresponding property of the out-states,
argued heuristically above. Although the property of asymptotic completeness seems
physically almost unassailable, the ability to explicitly and rigorously establish this
property for even the most technically controllable interacting field theories is quite
another matter, as we shall see later.
The states of our system carry, of course, a representation of the HLG (homoge-
neous Lorentz group), implemented by unitary operators UH (Λ) defined to have the
action (on covariantly normalized states)

$$U_H(\Lambda)\,|k_1,k_2,\ldots,k_N\rangle_{in} \equiv |\Lambda k_1,\Lambda k_2,\ldots,\Lambda k_N\rangle_{in} \qquad (9.9)$$

whence, from (9.7), we must have

$$U_H^\dagger(\Lambda)\,P^\mu\,U_H(\Lambda) = \Lambda^\mu{}_\nu\,P^\nu \qquad (9.10)$$

Choosing a countable basis¹ $|\alpha\rangle_{in}$ for $\mathcal H_{in}$, the discrete matrix $S_{\beta\alpha}$ is unitary (cf.
Section 4.3.1), and Lorentz-invariance of the theory, ${}_{out}\langle\Lambda\beta|\Lambda\alpha\rangle_{in} = {}_{out}\langle\beta|\alpha\rangle_{in}$, then
implies (see Problem 1) the corresponding transformation property on the (once again,
continuum-normalized) out-states:

$$U_H(\Lambda)\,|k_1,k_2,\ldots,k_N\rangle_{out} = |\Lambda k_1,\Lambda k_2,\ldots,\Lambda k_N\rangle_{out} \qquad (9.11)$$

Note the subscript H on UH(Λ): these are not the same unitary operators as the U(Λ)
introduced earlier (cf. (5.23)) implementing Lorentz transformations on the bare
multi-particle states $|\alpha\rangle$ which form a basis of eigenstates of the free Hamiltonian H0, and
which satisfy

$$U^\dagger(\Lambda)\,P^{(0)\mu}\,U(\Lambda) = \Lambda^\mu{}_\nu\,P^{(0)\nu} \qquad (9.12)$$

As discussed above, since for the scalar theory under discussion, $P_i^{(0)} = P_i$, while
$P_0^{(0)} = H_0 \neq P_0 = H$, it is apparent that the U(Λ) and UH(Λ) operators are different.
We now return to the question of the Lorentz covariance properties of the Heisenberg
field φH defined by unitary transformation of the interaction-picture field φ in
(9.2). The latter is by construction a local, Lorentz scalar field, but the Heisenberg
field is obtained by a unitary transformation involving a specific choice of time
variable and inertial frame, so the issues of space-like commutativity and scalar
transformation property of φH under the HLG are not immediately obvious. To
proceed further, we shall again rely on formal arguments assuming the existence of
the interaction picture: in other words, we shall proceed as though our field theory is
fully regularized (i.e., is cutoff in the infrared and the ultraviolet, leaving only a finite
number of quantum mechanical degrees of freedom), so that the interaction-picture
time-development operators operate in the same space as the Heisenberg states of
the theory. After obtaining the relevant result linking matrix elements of interaction
and Heisenberg picture operators, we shall return to the issue of removal of the cutoff
(essential, of course, for restoring the locality and Lorentz transformation properties of
the theory).

1 The Fock space of field theory, as a countable direct sum of finite tensor products of one-particle spaces,
is, somewhat surprisingly, a separable Hilbert space; see (Streater and Wightman, 1978).

The formula we need, of fundamental importance in both relativistic field theory
and many-body theory, is due to Gell-Mann and Low (Gell-Mann and Low, 1951). We
begin with the unitary Møller wave operators Ω∓ = U(0, ∓∞), connecting bare states
$|\alpha\rangle$ (eigenstates of H0) to the corresponding in- and out-states (cf. Section 4.3.2):

$$|\alpha\rangle_{in} = U(0,-\infty)\,|\alpha\rangle,\qquad |\alpha\rangle_{out} = U(0,+\infty)\,|\alpha\rangle \qquad (9.13)$$

Now consider a general matrix element between arbitrary in- and out-states of a time-
ordered product of m Heisenberg fields

$${}_{out}\langle\beta|T\{\phi_H(x_1)\phi_H(x_2)\cdots\phi_H(x_m)\}|\alpha\rangle_{in} = {}_{out}\langle\beta|\phi_H(y_1)\phi_H(y_2)\cdots\phi_H(y_m)|\alpha\rangle_{in} \qquad (9.14)$$

where the set $(y_1, y_2, \ldots, y_m)$ of spacetime coordinates are obtained by subjecting the
original set $(x_1, x_2, \ldots, x_m)$ to a permutation in order to effect the desired time-ordering
$t_1\equiv y_1^0 > t_2\equiv y_2^0 > \ldots > t_m\equiv y_m^0$. Using (9.2, 9.13) and the semigroup property of
the $U(t,t_0)$ (cf. Problem 1 in Chapter 4), we find

$${}_{out}\langle\beta|\phi_H(y_1)\cdots\phi_H(y_m)|\alpha\rangle_{in} = \langle\beta|U(+\infty,t_1)\,\phi(y_1)\,U(t_1,t_2)\,\phi(y_2)\cdots\phi(y_m)\,U(t_m,-\infty)|\alpha\rangle \qquad (9.15)$$

Note that the left-hand side of (9.15) involves purely Heisenberg states and operators,
while the right-hand side contains only interaction-picture states and operators.
Further progress requires that we limit ourselves to the formal perturbative expansion
of the theory (the reasons for which will become apparent below):

$$\tau(y_1,\ldots,y_m) \equiv \sum_{n=0}^{\infty}\frac{(-i)^n}{n!}\int\langle\beta|T\{\phi(y_1)\phi(y_2)\cdots\phi(y_m)\,\mathcal H_{int}(z_1)\mathcal H_{int}(z_2)\cdots\mathcal H_{int}(z_n)\}|\alpha\rangle\,d^4z_1\,d^4z_2\cdots d^4z_n \qquad (9.16)$$

As the integrations over the spacetime coordinates $z_1, z_2, \ldots, z_n$ in (9.16) are performed,
the time-ordering symbol will redistribute the n $\mathcal H_{int}$ operators among the already
time-ordered field operators φ(y1), φ(y2), etc. The full integration can evidently be
subdivided into subregions in which $n_0$ of the $\mathcal H_{int}$ operators occur at times later
than $t_1\equiv y_1^0$, $n_1$ occur between times $t_1$ and $t_2$, and so on, with $n_m$ interactions prior
to the earliest field time $t_m$, and $n = n_0+n_1+\ldots+n_m$. There are $\frac{n!}{n_0!\,n_1!\cdots n_m!}$
equivalent ways of selecting the particular $\mathcal H_{int}$ operators to be placed in these temporal
intervals. Accordingly, our m-point function $\tau(y_1,\ldots,y_m)$ (commonly referred to as a
“Feynman m-point function”) in (9.16) may be re-expressed (relabeling the $z_1,\ldots,z_n$
coordinates)

$$\begin{aligned}
\tau(y_1,\ldots,y_m) = {} & \sum_{n=0}^{\infty}\frac{(-i)^n}{n!}\sum_{\substack{n_0,n_1,\ldots,n_m\\ n_0+n_1+\cdots+n_m=n}}\frac{n!}{n_0!\,n_1!\cdots n_m!}\\
& \cdot\int\langle\beta|T\{\mathcal H_{int}(z_{0,1})\cdots\mathcal H_{int}(z_{0,n_0})\}\,\phi(y_1)\,T\{\mathcal H_{int}(z_{1,1})\cdots\mathcal H_{int}(z_{1,n_1})\}\,\phi(y_2)\cdots\\
& \qquad\cdots\phi(y_m)\,T\{\mathcal H_{int}(z_{m,1})\cdots\mathcal H_{int}(z_{m,n_m})\}|\alpha\rangle\\
& \cdot\prod_j\theta(z^0_{0,j}-t_1)\,\theta(t_1-z^0_{1,j})\cdots\theta(t_m-z^0_{m,j})\prod_{i,j}d^4z_{i,j}
\end{aligned} \qquad (9.17)$$
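
The counting factor n!/(n0! n1! ... nm!) used above is the multinomial coefficient: the number of ways of distributing n labelled interaction operators over m + 1 time intervals with prescribed occupancies. A minimal sketch (function names are illustrative) compares the closed formula with a brute-force enumeration for small n:

```python
from itertools import product
from math import factorial

def multinomial(counts):
    """n! / (n_0! n_1! ... n_m!) with n = sum(counts)."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result

def count_assignments(counts):
    """Brute force: assign each of n labelled operators to one of the intervals
    and keep the assignments whose occupancies match `counts`."""
    n = sum(counts)
    intervals = range(len(counts))
    total = 0
    for assignment in product(intervals, repeat=n):
        if all(assignment.count(i) == counts[i] for i in intervals):
            total += 1
    return total

occupancies = (2, 1, 1)   # n_0 = 2, n_1 = 1, n_2 = 1, so n = 4 and m = 2
print(multinomial(occupancies), count_assignments(occupancies))
```

Because this factor cancels the 1/n! in (9.16) up to the individual 1/n_i!, each temporal interval ends up with its own factor of (−i)^{n_i}/n_i!, which is what allows the resummation into time-evolution operators.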

The sums over the individual n0, n1, etc. T-products of interaction operators can now
be reassembled into the corresponding interaction-picture time-evolution operators.
For example,

$$\sum_{n_0}\frac{(-i)^{n_0}}{n_0!}\int\prod_i\theta(z^0_{0,i}-t_1)\,T\{\mathcal H_{int}(z_{0,1})\cdots\mathcal H_{int}(z_{0,n_0})\}\prod_j d^4z_{0,j} = U(+\infty,t_1) \qquad (9.18)$$

and similarly for the sums involving time-ordered operators sandwiched in the other
temporal regions between the φ(yi) operators. In the end we recover exactly

$$\tau(y_1,y_2,\ldots,y_m) = \langle\beta|U(+\infty,t_1)\,\phi(y_1)\,U(t_1,t_2)\,\phi(y_2)\cdots\phi(y_m)\,U(t_m,-\infty)|\alpha\rangle \qquad (9.19)$$

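i.e., the right-hand side of (9.15). In other words, with the provisos given earlier
vis-à-vis regularization of the theory, we have the Gell-Mann–Low formula

$${}_{out}\langle\beta|\phi_H(y_1)\phi_H(y_2)\cdots\phi_H(y_m)|\alpha\rangle_{in} = \sum_{n=0}^{\infty}\frac{(-i)^n}{n!}\int\langle\beta|T\{\phi(y_1)\phi(y_2)\cdots\phi(y_m)\,\mathcal H_{int}(z_1)\mathcal H_{int}(z_2)\cdots\mathcal H_{int}(z_n)\}|\alpha\rangle\,d^4z_1\,d^4z_2\cdots d^4z_n \qquad (9.20)$$

or, restoring the original non-time-ordered variables $x_1, x_2, \ldots, x_m$:

$${}_{out}\langle\beta|T\{\phi_H(x_1)\phi_H(x_2)\cdots\phi_H(x_m)\}|\alpha\rangle_{in} = \sum_{n=0}^{\infty}\frac{(-i)^n}{n!}\int\langle\beta|T\{\phi(x_1)\phi(x_2)\cdots\phi(x_m)\,\mathcal H_{int}(z_1)\mathcal H_{int}(z_2)\cdots\mathcal H_{int}(z_n)\}|\alpha\rangle\,d^4z_1\,d^4z_2\cdots d^4z_n \qquad (9.21)$$

Note that as a special case, taking m = 0 (no φH fields), we recover our previous
perturbative expression (5.73) for the S-matrix $S_{\beta\alpha}\equiv{}_{out}\langle\beta|\alpha\rangle_{in}$.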
With this result in hand we can return to the question of the Lorentz
transformation properties of the Heisenberg field φH (x). Of course, the presence
of regularizing cutoffs reducing our field theory to a finite number of degrees of
freedom explicitly breaks the invariance under the continuous HLG (for example, if
we formulate the theory on a discrete spacetime lattice, the usual prelude to attempts
at a rigorous construction of the continuum limit of a relativistic field theory).
We shall see in Part 4 of this book, with our treatment of covariant renormalized
perturbation theory, that the restoration of Lorentz-invariance can indeed be proven
rigorously in a number of four-dimensional field theories (the so-called “perturbatively
renormalizable” theories), but only in the context of the formal perturbative expansion
of the theory in a suitably chosen cutoff-independent coupling parameter(s). The
rigorous demonstration of the existence of Heisenberg fields in continuum spacetime
with the desired locality properties and behavior under the Poincaré group has only
been possible in a limited class of theories in two or three spacetime dimensions
((Simon, 1974),(Glimm and Jaffe, 1987)).
We shall return later to the question of what is rigorously known about the
existence in the continuum of four-dimensional interacting field theories: here, we
stay within the confines of perturbation theory, and assume the validity of the
perturbative Gell-Mann–Low formula (9.21) absent ultraviolet and infrared cutoffs,
with the full HLG implemented both at the interaction-picture level (via U(Λ)) and
for the Heisenberg fields and states (via UH(Λ)). For the special case m = 1, (9.21)
reads

$${}_{out}\langle\beta|\phi_H(x)|\alpha\rangle_{in} = \sum_{n=0}^{\infty}\frac{(-i)^n}{n!}\int\langle\beta|T\{\phi(x)\,\mathcal H_{int}(z_1)\mathcal H_{int}(z_2)\cdots\mathcal H_{int}(z_n)\}|\alpha\rangle\,d^4z_1\,d^4z_2\cdots d^4z_n \qquad (9.22)$$

We recall that the interaction-picture fields satisfy (cf. (5.76))

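$$U^\dagger(\Lambda)\,\mathcal H_{int}(x)\,U(\Lambda) = \mathcal H_{int}(\Lambda^{-1}x),\qquad U^\dagger(\Lambda)\,\phi(x)\,U(\Lambda) = \phi(\Lambda^{-1}x) \qquad (9.23)$$

We wish to establish the corresponding Lorentz scalar field transformation property
for the fully interacting Heisenberg field φH(x), viz.

$$U_H^\dagger(\Lambda)\,\phi_H(x)\,U_H(\Lambda) = \phi_H(\Lambda^{-1}x) \qquad (9.24)$$

Let us begin with (9.22), for arbitrary in $|\alpha\rangle_{in}$ and out $|\beta\rangle_{out}$ states (recall that
these separately form complete sets, by the principle of asymptotic completeness).
Introducing an arbitrary fixed Lorentz transformation Λ

$$\begin{aligned}
{}_{out}\langle\beta|U_H^\dagger(\Lambda)\,\phi_H(x)\,U_H(\Lambda)|\alpha\rangle_{in}
 &= {}_{out}\langle\Lambda\beta|\phi_H(x)|\Lambda\alpha\rangle_{in}\\
 &= \sum_{n=0}^{\infty}\frac{(-i)^n}{n!}\int\langle\Lambda\beta|T\{\phi(x)\,\mathcal H_{int}(z_1)\cdots\mathcal H_{int}(z_n)\}|\Lambda\alpha\rangle\,d^4z_1\cdots d^4z_n\\
 &= \sum_{n=0}^{\infty}\frac{(-i)^n}{n!}\int\langle\beta|U^\dagger(\Lambda)\,T\{\phi(x)\,\mathcal H_{int}(z_1)\cdots\mathcal H_{int}(z_n)\}\,U(\Lambda)|\alpha\rangle\,d^4z_1\cdots d^4z_n
\end{aligned} \qquad (9.25)$$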

By the arguments of Section 5.5, the T-product of ultralocal scalar fields (and we
are here assuming that Hint (x) is ultralocal) is Lorentz-covariant: i.e., the similarity
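transformation by U(Λ) may be taken inside the T-product,² giving

$$\begin{aligned}
{}_{out}\langle\beta|U_H^\dagger(\Lambda)\,\phi_H(x)\,U_H(\Lambda)|\alpha\rangle_{in}
 &= \sum_{n=0}^{\infty}\frac{(-i)^n}{n!}\int\langle\beta|T\{U^\dagger(\Lambda)\,\phi(x)\,\mathcal H_{int}(z_1)\cdots\mathcal H_{int}(z_n)\,U(\Lambda)\}|\alpha\rangle\,d^4z_1\cdots d^4z_n\\
 &= \sum_{n=0}^{\infty}\frac{(-i)^n}{n!}\int\langle\beta|T\{\phi(\Lambda^{-1}x)\,\mathcal H_{int}(\Lambda^{-1}z_1)\cdots\mathcal H_{int}(\Lambda^{-1}z_n)\}|\alpha\rangle\,d^4z_1\cdots d^4z_n
\end{aligned} \qquad (9.26)$$

Changing integration variables from $z_i$ to $w_i\equiv\Lambda^{-1}z_i$, and recalling that the Λ are
unimodular (det(Λ) = 1),

$$\begin{aligned}
{}_{out}\langle\beta|U_H^\dagger(\Lambda)\,\phi_H(x)\,U_H(\Lambda)|\alpha\rangle_{in}
 &= \sum_{n=0}^{\infty}\frac{(-i)^n}{n!}\int\langle\beta|T\{\phi(\Lambda^{-1}x)\,\mathcal H_{int}(w_1)\cdots\mathcal H_{int}(w_n)\}|\alpha\rangle\,d^4w_1\cdots d^4w_n\\
 &= {}_{out}\langle\beta|\phi_H(\Lambda^{-1}x)|\alpha\rangle_{in},\qquad \forall\,\alpha,\beta,\Lambda
\end{aligned} \qquad (9.27)$$

2 The reader will recall that the basic reason for this is that the only cases in which Λ can interchange
the time-ordering of two fields involve space-like separations where the fields already commute, by locality.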

As α, β run over arbitrary members of complete sets, this establishes (at least in the
context of perturbation theory!) the desired operator scalar field property (9.24) for the
fully interacting Heisenberg field φH . The extension of the equal-time commutativity
property (9.3) to full space-like commutativity, $[\phi_H(x),\phi_H(y)] = 0$ for $(x-y)^2 < 0$, is
now a straightforward exercise (see Problem 2).
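
Two kinematical facts used in this argument can be illustrated numerically: a Lorentz transformation has unit determinant, so the integration measure is unchanged, and it can reverse the time order of two events only when their separation is space-like. The sketch below uses a boost along the x axis with an arbitrarily chosen velocity; the function names and sample separations are illustrative, and the metric signature (+, −, −, −) is assumed.

```python
import math

def boost_x(beta):
    """Lorentz boost along x with velocity beta (units with c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return [[gamma, -gamma * beta, 0.0, 0.0],
            [-gamma * beta, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def determinant(matrix):
    """Determinant by Laplace expansion along the first row."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0.0
    for col, entry in enumerate(matrix[0]):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * entry * determinant(minor)
    return total

def transform(matrix, vector):
    """Apply a 4x4 matrix to a four-vector (t, x, y, z)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def interval(vector):
    """Minkowski square t^2 - x^2 - y^2 - z^2."""
    t, x, y, z = vector
    return t * t - x * x - y * y - z * z

lam = boost_x(0.9)
timelike = [1.0, 0.5, 0.0, 0.0]    # time-like separation, positive time component
spacelike = [0.5, 1.0, 0.0, 0.0]   # space-like separation, positive time component

print(determinant(lam))                        # unit determinant up to rounding
print(transform(lam, timelike)[0] > 0)         # time order is preserved
print(transform(lam, spacelike)[0] > 0)        # time order is reversed by this boost
```

For the space-like pair the boost flips the sign of the time difference, which is harmless inside the T-product only because the fields commute at space-like separation.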
The Heisenberg field φH (x) plays a fundamental role in field theory for a very sim-
ple reason: knowledge of its matrix elements is tantamount to a complete description
of the dynamics of the theory. In particular—and this is the primary topic of this
chapter—it contains all the information needed to reconstruct both the Heisenberg
in-states $|\alpha\rangle_{in}$ and the out-states $|\beta\rangle_{out}$, and therefore, the exact scattering matrix
$S_{\beta\alpha} = {}_{out}\langle\beta|\alpha\rangle_{in}$ of the theory. In order to see how this comes about, it is convenient
to introduce some auxiliary fields which incorporate the kinematical structure of the
asymptotic states. We shall do this explicitly for the in-states, with the understanding
that the entire procedure can be carried through, mutatis mutandis, for the out-states
by the simple device of replacing “in” with “out” everywhere. We first note that
creation and annihilation operators can be defined on the multi-particle Fock space
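$\mathcal H_{in}$ exactly as for the bare states $|k_1,\ldots,k_N\rangle$ in (6.29, 6.30)

$$a_{in}(k)\,|k_1,k_2,\ldots,k_N\rangle_{in} \equiv \sum_{r=1}^{N}(\pm)^{r-1}\,\delta^3(\vec k-\vec k_r)\,|k_1,\ldots,k_{r-1},k_{r+1},\ldots,k_N\rangle_{in} \qquad (9.28)$$

$$a^\dagger_{in}(k)\,|k_1,k_2,\ldots,k_N\rangle_{in} \equiv |k,k_1,k_2,\ldots,k_N\rangle_{in} \qquad (9.29)$$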

We shall be concerned for the time being with the simplest case, a bosonic spinless
particle, so only positive signs need be taken in (9.28). A free field φin (x) can then
be defined in complete analogy to the interaction-picture field φ(x) in (6.73) (with
$a = a^c$, as we are also restricting ourselves to a self-conjugate scalar field) by setting

$$\phi_{in}(x) = \frac{1}{(2\pi)^{3/2}}\int\frac{d^3k}{\sqrt{2E(k)}}\left(a_{in}(k)\,e^{-ik\cdot x} + a^\dagger_{in}(k)\,e^{ik\cdot x}\right) \qquad (9.30)$$

We shall see later that the mass m appearing in the interaction-picture field φ (and in
the coefficient of φ2 in the interaction Hamiltonian density) and denoting the energy of
single-particle eigenstates of H0 at zero momentum is by no means guaranteed to agree
with the actual physical mass $m_{ph}$: i.e., the energy eigenvalue of the full Hamiltonian
H on a single particle in (or out) state with zero spatial momentum. On the other
hand, the energy function E(k) appearing in (9.30) is given by $\sqrt{\vec k^2+m_{ph}^2}$, and the
four-momenta appearing in the complex exponentials in the integral are on-mass-shell
for the physical mass, $k\cdot k = m_{ph}^2$. This implies that the in-field φin satisfies the free
Klein–Gordon equation, but with the physical mass:

$$(\Box + m_{ph}^2)\,\phi_{in}(x) = 0 \qquad (9.31)$$
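
The statement follows mode by mode from the expansion (9.30). As a brief sketch, assuming the metric signature (+, −, −, −) implied by $k\cdot k = m_{ph}^2$, so that $\Box = \partial_t^2 - \vec\nabla^2$:

```latex
\begin{aligned}
\Box\,e^{\mp ik\cdot x} &= (\mp i)^2\,k_\mu k^\mu\,e^{\mp ik\cdot x} = -\,(k\cdot k)\,e^{\mp ik\cdot x},\\
(\Box + m_{ph}^2)\,e^{\mp ik\cdot x} &= \bigl(m_{ph}^2 - k\cdot k\bigr)\,e^{\mp ik\cdot x} = 0
 \qquad\text{for}\qquad k\cdot k = m_{ph}^2 .
\end{aligned}
```

Since every exponential in the momentum integral is annihilated separately, so is the full field, independently of the operator coefficients.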

The reader may easily verify that the spacetime translation of φin is implemented
by the full energy-momentum vector Pμ , of which the in-states in (9.28, 9.29) are
eigenstates:

$$e^{iP_\mu a^\mu}\,\phi_{in}(x)\,e^{-iP_\mu a^\mu} = \phi_{in}(x+a) \qquad (9.32)$$

So the in-field φin , despite satisfying a free field (i.e., covariant linear) equation,
is most definitely a Heisenberg picture operator. As indicated earlier, a Heisenberg
out-field φout (x) may be introduced in an exactly analogous fashion, starting from
creation and destruction operators for the out-states: it will likewise satisfy the free
Klein–Gordon equation (9.31) (replacing, of course, “in” with “out”). The Heisen-
berg field φH , defined by transformation from the free interaction-picture field φ,
on the other hand, certainly does not satisfy a free Klein–Gordon equation, as is
apparent from (9.22) giving its matrix elements: only the first, n = 0 term in the
perturbative expansion of these matrix elements, satisfies the Klein–Gordon equation
(albeit with the mass m associated with the free Hamiltonian H0 ), while higher terms
involving insertions of the interaction Hamiltonian have a much more complicated
behavior.
It is apparent that knowledge of the φin and φout fields allows us to reconstruct the
complete set of in- and out-states, and hence their overlap, the S-matrix. This follows
from the formula (6.132) (Problem 8 in Chapter 6), appropriately generalized for the
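in- (or out-)field case:

$$a_{in}(k) = i\int d^3x\,\left\{ g_k^*(\vec{x},t)\,\overleftrightarrow{\frac{\partial}{\partial t}}\,\phi_{in}(\vec{x},t) \right\} \tag{9.33}$$

$$a^\dagger_{in}(k) = -i\int d^3x\,\left\{ g_k(\vec{x},t)\,\overleftrightarrow{\frac{\partial}{\partial t}}\,\phi_{in}(\vec{x},t) \right\} \tag{9.34}$$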

with $g_k(x) = \frac{1}{(2\pi)^{3/2}\sqrt{2E(k)}}\,e^{-ik\cdot x}$ ($k$ an on-mass-shell four-vector, with $k^0 = \sqrt{\vec{k}^2 + m_{ph}^2}$), and with obviously similar formulas for $a_{out}$, $a^\dagger_{out}$ (by replacing “in”
with “out” everywhere). The double derivative $\overleftrightarrow{\partial_t}$ is defined as in (6.95).
The central importance of the single interacting Heisenberg field φH lies in its
significance as an interpolating operator for the basic quanta of the field: i.e., the
stable particles whose multi-particle states form the complete asymptotic Hilbert space
(either “in” or “out”) of the theory. By this we mean that φH in some sense “behaves
like” φin in the far past, for t → −∞, and like φout in the far future, for t → +∞.
If the behavior is sufficiently “nice”, the discussion of the previous paragraph makes
it completely plausible that a complete construction of the in- and out-states of the
theory, and hence of their overlap, the all-important S-matrix, should be possible
from a knowledge of the Heisenberg field φH alone. We conclude this section by giving
a purely formal3 argument for this interpolating property of φH : in the forthcoming
sections the whole asymptotic formalism will be rederived in a mathematically rigorous
framework, with absolutely no reference to perturbation theory or the interaction
picture. For now, we shall proceed as though the interaction picture, and the associated
interaction-picture operators, all make perfect sense.4 For the free interaction-picture
field φ, the analog of (9.34) is

$$a^\dagger(k) = -i\int d^3x\,\left\{ g_k(\vec{x},t)\,\overleftrightarrow{\frac{\partial}{\partial t}}\,\phi(\vec{x},t) \right\} \tag{9.35}$$
First, observe that the creation operator on the left-hand side is time-independent: the
reader should verify, by a short calculation involving an integration by parts, that this
is a direct consequence of the fact that φ(x, t) satisfies the Klein–Gordon equation.
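
The time-independence claimed here can be illustrated with a minimal numerical sketch, under simplifying assumptions that are not in the text: a classical real solution of the Klein–Gordon equation in 1+1 dimensions on a periodic box stands in for the operator field, normalization factors are dropped, and the mode amplitudes in `MODES` are hypothetical values chosen only for illustration. When the mass used in the probe function g_k matches the mass of the field, the projected amplitude is constant in time; with a mismatched mass it is not.

```python
import cmath
import math


def energy(k, m):
    """Relativistic energy E(k) = sqrt(k^2 + m^2) in 1+1 dimensions."""
    return math.sqrt(k * k + m * m)


def projected_amplitude(t, k, m_field, m_probe, modes, box=2 * math.pi, n_grid=64):
    """Evaluate -i * integral dx { g_k * d_t(phi) - d_t(g_k) * phi } at time t.

    phi is a real classical Klein-Gordon solution of mass m_field on a periodic
    box; g_k = exp(-i(E t - k x)) is built with mass m_probe. Normalization
    factors are dropped, so only the time dependence is meaningful here.
    """
    e_k = energy(k, m_probe)
    dx = box / n_grid
    total = 0j
    for j in range(n_grid):
        x = j * dx
        phi = 0.0
        dphi = 0.0
        for q, alpha in modes.items():
            e_q = energy(q, m_field)
            wave = alpha * cmath.exp(-1j * (e_q * t - q * x))
            phi += 2.0 * wave.real                   # wave plus its complex conjugate
            dphi += 2.0 * (-1j * e_q * wave).real    # time derivative of the same sum
        g = cmath.exp(-1j * (e_k * t - k * x))
        dg = -1j * e_k * g
        total += (g * dphi - dg * phi) * dx
    return -1j * total


# Hypothetical mode amplitudes (integer momenta fit the box of length 2*pi).
MODES = {1: 0.5 + 0.2j, -1: 0.1 - 0.4j, 2: 0.3 - 0.1j}

if __name__ == "__main__":
    for t in (0.0, 0.8, 1.7):
        matched = projected_amplitude(t, 1, 1.0, 1.0, MODES)
        mismatched = projected_amplitude(t, 1, 1.0, 1.5, MODES)
        print(t, matched, mismatched)
```

In the matched case the two-sided derivative projects out the single coefficient conjugate to the mode k, which is the classical counterpart of the statement that a†(k) in (9.35) carries no time dependence.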
The operator defined from the Heisenberg field φH in complete analogy to (9.35),

$$a^\dagger_H(k;t) \equiv -i\int d^3x\,\left\{ g_k(\vec{x},t)\,\overleftrightarrow{\frac{\partial}{\partial t}}\,\phi_H(\vec{x},t) \right\} \tag{9.36}$$
will, by contrast, be time-dependent (and in a very complicated way!). Next, note
that the relationship between the creation–destruction operators and the respective
fields is linear. This means that we can imagine smearing a† (k) (by smearing the pure
exponential gk ) with a normalizable momentum-space wavefunction peaked around
the central momentum k, so that a† (k) corresponds to a well-defined operator in the
Hilbert space (in particular, acting on normalizable n-particle states, it produces a
normalizable n+1-particle state). We shall assume without further comment that we
are dealing with such smeared wave-packet type states in this section; in the following

3 This expression is typically used in theoretical physics, as indeed here, as a euphemism for “mathe-
matically incorrect, but nevertheless suggestive”.
4 Although the steps that follow are mathematically unjustified, the result is in fact correct within
the context of renormalized perturbation theory, if we choose a split between the free and interacting
Hamiltonians which eliminates, order by order in perturbation theory, persistent interactions modifying the
mass and normalization of the single-particle states. In particular, we shall assume that the mass term in
H0 is taken to be the physical mass mph .

sections the need for smearing (both fields and states) in any careful mathematical
treatment of field theory will be met head on and explicitly.
Observe that the unitary rotation effecting the transformation from interaction to
the Heisenberg picture

φH (x, t) = U † (t, 0)φ(x, t)U (t, 0) (9.37)

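applies as well to the time-derivative of φH , by using the equations of motion for the
time-development operator U (t, 0), namely $\frac{\partial}{\partial t}U(t,0) = -iV_{ip}(t)U(t,0)$, $\frac{\partial}{\partial t}U^\dagger(t,0) = +iU^\dagger(t,0)V_{ip}(t)$,

$$\frac{\partial}{\partial t}\phi_H(\vec{x},t) = U^\dagger(t,0)\left(\frac{\partial\phi(\vec{x},t)}{\partial t} + i[V_{ip}(t),\phi(\vec{x},t)]\right)U(t,0) \tag{9.38}$$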

The commutator term vanishes if $V_{ip}(t) = \int d^3y\,\mathcal{H}_{int}(\vec{y},t)$, provided the interaction
Hamiltonian density is an ultralocal scalar field, as discussed in Chapter 5:

[Hint (y , t), φ(x, t)] = 0 ⇒ [Vip (t), φ(x, t)] = 0 (9.39)

so we have, defining (cf. Section 8.1) πH (x, t) ≡ φ̇H (x, t), π(x, t) ≡ φ̇(x, t),

πH (x, t) = U † (t, 0)π(x, t)U (t, 0) (9.40)

Thus, both the Heisenberg field and its time-derivative are obtained as unitary trans-
forms of the corresponding interaction-picture free fields. This implies a corresponding
unitary relation between the operators a†H (k; t) and a† (k) built from these fields via
(9.35) and (9.36).
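In our earlier discussion of scattering theory in Chapter 4 (cf. (4.155, 4.156)), the
in-state $|\alpha\rangle_{in}$ arises as a strong limit from the corresponding bare state $|\alpha\rangle$ via

$$|\alpha\rangle_{in} = \lim_{t\to-\infty} U^\dagger(t,0)|\alpha\rangle \tag{9.41}$$

which suggests that the following limits obtain for the action of the interacting operator
a†H (k; t) on an arbitrary normalizable in-state $|\alpha\rangle_{in}$:5

$$\begin{aligned}
\lim_{t\to-\infty} a^\dagger_H(k;t)|\alpha\rangle_{in} &= \lim_{t\to-\infty} U^\dagger(t,0)\,a^\dagger(k)\,U(t,0)|\alpha\rangle_{in} && (9.42)\\
&= \lim_{t\to-\infty} U^\dagger(t,0)|k,\alpha\rangle && (9.43)\\
&= |k,\alpha\rangle_{in} && (9.44)\\
&= a^\dagger_{in}(k)|\alpha\rangle_{in} && (9.45)
\end{aligned}$$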

with a similar result for the action of the operators ain (k), aH (k; t). As the Heisenberg
field φH and the in-field φin are constructed in a precisely analogous fashion from their
corresponding creation and destruction parts, this line of argument suggests that the

5 Were a(k) a bounded operator, this, together with the unitarity (and hence automatic boundedness) of
U and U † would justify going from (9.42) to (9.43). As we shall see below, it is not, so again this argument
is suggestive, not rigorous. A mathematically correct theory of scattering will be given in Section 9.3.

Heisenberg field approximates the free in-field φin in the far past. A completely anal-
ogous argument leads to the corresponding result for the far future: for t → +∞, the
action of φH is equivalent to that of the free-field φout . In other words, the Heisenberg
field φH interpolates between the free fields φin , φout describing far-separated free
particles in the distant past and similar sets of far separated free particles in the far
future: for this reason, the Heisenberg field is sometimes referred to as an interpolating
field for the particle in question. Roughly speaking, we may write

φH (x, t) → φin (x, t), t → −∞


φH (x, t) → φout (x, t), t → +∞ (9.46)

although we must caution the reader that we are here glossing over important
mathematical subtleties, which will be fully dealt with in our treatment of Haag–
Ruelle scattering theory in Section 9.3.
We conclude this section by noting an important connection between the S-matrix,
defined previously as the set of overlap amplitudes between in- and out-states, and
the unitary operator implementing the mapping between in- and out-creation and
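annihilation operators. Defining a unitary operator S by

$$S|\alpha\rangle_{out} = |\alpha\rangle_{in}, \qquad {}_{out}\langle\beta| = {}_{in}\langle\beta|S \tag{9.47}$$

we see that our scattering amplitudes $S_{\beta\alpha} \equiv {}_{out}\langle\beta|\alpha\rangle_{in}$ are just the matrix elements
of S in the in-basis:

$${}_{out}\langle\beta|\alpha\rangle_{in} = S_{\beta\alpha} = {}_{in}\langle\beta|S|\alpha\rangle_{in} \tag{9.48}$$

With $|\alpha\rangle_{out}$ an arbitrary out-state (these, by asymptotic completeness, form a complete set), we have

$$S a^\dagger_{out}(k)|\alpha\rangle_{out} = S|k,\alpha\rangle_{out} = |k,\alpha\rangle_{in} = a^\dagger_{in}(k)|\alpha\rangle_{in} = a^\dagger_{in}(k)S|\alpha\rangle_{out} \tag{9.49}$$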

which implies the intertwining identity

Sa†out (k) = a†in (k)S (9.50)

or, using the unitary property of S,

a†in (k) = Sa†out (k)S †


ain (k) = Saout (k)S † (9.51)

This implies that the in- and out-fields (cf. (9.30), together with the analogous
expression for the out-field) are similarly related by a unitary rotation implemented
by the S-operator:

φin (x) = Sφout (x)S † (9.52)

It should be emphasized that the results (9.47–9.52) rely only on asymptotic com-
pleteness, and make absolutely no reference to either the interaction picture or the
perturbative expansion thereof.

9.2 Field theory in Heisenberg representation: axiomatics


In the preceding section we have examined some features of the description of quantum
field theory in the Heisenberg representation of the dynamics (in which the field
operators carry the full dynamical evolution) from an heuristic point of view. Given the
fact that our arguments rested on a formal transition from the interaction picture to
the Heisenberg picture, and the unpleasant fact (now generally referred to as “Haag’s
theorem”; cf. Section 10.5) that the two pictures are unitarily inequivalent (i.e., cannot
be related by a well-defined unitary transformation) in any continuum field theory with
non-trivial perturbations,6 the reader may well wonder whether any of our conclusions
can be considered valid for the four-dimensional field theories of phenomenological
interest. In the forthcoming sections of this chapter, and until further notice, we shall
abandon all references to perturbation theory or the interaction picture: in particular,
there will be no need to adopt an artificial split of the Hamiltonian of the theory into
“free” (H0 ) and “interaction” (V ) parts. Instead, we shall adopt, following Wightman,
a minimal set of assumptions (or “axioms”) which incorporate the essential features
of a relativistic quantum field theory.
These axioms may conveniently be divided into three categories: (i) axioms spec-
ifying the structure of the underlying (Hilbert) space of states of the system, (ii)
axioms specifying the primitive properties of the local field(s) of the theory, and (iii)
axioms which establish a physical particle–field duality: i.e., a connection between the
dynamical (Heisenberg) field(s) of the theory and the asymptotic particle states which
are the systems prepared and detected in actual high-energy experiments. The task
of establishing the consistency of these assumptions, in effect by a mathematically
rigorous construction of well-defined operators with the desired properties in an
explicit Hilbert space, is the task of constructive quantum field theory, and will not
be addressed in this book—partly because this project has only succeeded fully in a
small class of field theories defined in less than four spacetime dimensions. However,
we shall see that with a very natural set of assumptions delineating the properties
of the interacting fields of our theory, a complete scattering theory can be erected
which completely avoids the mathematical irregularities and pitfalls associated with
perturbation theory or the interaction picture. This rigorous scattering theory is
primarily the work of Haag and Ruelle, and we shall present an overview of their
results in the following section.
As we are no longer concerned with a free Hamiltonian H0 , or interaction term
V ≡ H − H0 , it will be convenient to drop the “H” subscript (for “Heisenberg repre-
sentation”) in what follows, as we shall be referring throughout to Heisenberg fields
(and states). Thus the Heisenberg field will be simply φ(x) (which, in our previous
notation referred to the interaction-picture field). Also, the only particle mass which
appears will be the actual physical mass mph of the given (stable) particle, which we
shall therefore denote simply as m. Finally, to strip away all inessential complications,
we shall initially consider a theory containing a single, stable, massive, spinless, and
self-interacting particle, with no bound states. A more general discussion, taking into

6 Here, “non-trivial perturbation” can be as innocent as a shift in the mass of a free field, as we shall
see in Section 10.5.

account essentially all possibilities (in particular, the situations encountered in the
Standard Model of modern particle physics), will follow in Section 9.6. We shall state
the axioms of our theory, divided into the three groups outlined above, with each
axiom followed by (sometimes extensive!) explanatory comments.
We begin with the State axioms, specifying features of the state space of the
theory:
1. Axiom Ia: The state space H of the system is a separable Hilbert space. It carries
a unitary representation U (Λ, a) (Λ an element of the HLG, a a coordinate four-
vector) of the proper inhomogeneous Lorentz group (i.e., the Poincaré group).
Thus, for all $|\alpha\rangle \in H$, $|\alpha\rangle \to U(\Lambda, a)|\alpha\rangle \in H$, with the U (Λ, a) satisfying the
Poincaré group composition rule U (Λ1 , a1 )U (Λ2 , a2 ) = U (Λ1 Λ2 , a1 + Λ1 a2 ) (cf. 5.14).
Comments: Our Hilbert space H is a countable direct sum of multi-particle
spaces corresponding to a definite number of particles. The multi-particle space
corresponding to a fixed finite number of particles is a finite tensor product of
separable L2 spaces, each with a countable basis, and is therefore itself separable.
The separability of H follows trivially. The reader is free to visualize H as the
space of in-states Hin (or out-states, Hout ) described in the preceding section,
with the action of the U (Λ, a) given by eiP ·a UH (Λ) with P μ , UH (Λ) as defined
in (9.7, 9.9).
2. Axiom Ib: The infinitesimal generators Pμ of the translation subgroup T (a) =
U (1, a) of the Poincaré group have a spectrum pμ restricted to the forward light-
cone, p0 ≥ 0, p2 ≥ 0.
Comments: In accordance with our intuition of asymptotic completeness—
that all Heisenberg states of the system correspond to field disturbances which
eventually resolve into a finite number of well-separated stable particles of finite
energy, and with individual four-momenta on or within the forward light-cone—
the total energy-momentum pμ of any state of the system must be resolvable into
a sum of four-vectors, each of which is within the forward light-cone, and must
therefore itself lie in this region.
3. Axiom Ic: There is a unique state $|0\rangle$ (up to a unimodular phase, of course),
called the “vacuum”, with the isolated eigenvalue pμ = 0 of Pμ . It is unit
normalized, $\langle 0|0\rangle = 1$.
Comments: In particular, we assume the absence of spontaneous symmetry-
breaking, as discussed in Section 8.4. The need to accommodate simultaneously
a discretely normalized vacuum and continuously normalized multi-particle states
in an infinite-volume interacting theory will be seen later to be intimately
connected with the difficulties associated with the interaction picture and the
famous “Haag’s theorem” (Section 10.5).
4. Axiom Id: The theory has a mass gap: the squared-mass operator P 2 = Pμ P μ
has an isolated eigenvalue m2 > 0, and the spectrum of P 2 is empty between 0
and m2 . The subspace H1 of H corresponding to the eigenvalue m2 carries an
irreducible spin-0 representation of the HLG. These are the single-particle states
of the theory. The remaining spectrum of P 2 is continuous, and begins at (2m)2 .
Comments: This axiom specifically excludes quantum electrodynamics with a
massless photon (the only known massless particle): indeed, the structure of the
state space and the asymptotic dynamics, and in particular the definition of
Fig. 9.1 The spectrum of the energy-momentum operator P μ (P 0 vertical axis, arbitrary
spatial component P i horizontal axis, m = 2). The dashed lines enclose the support region
for the function f˜(1) (p) introduced in Section 9.3, with a = 1/2, b = 2.

a sensible S-matrix, is extremely subtle in the presence of an exactly massless


particle (cf. Chapter 19). Fortunately, it is possible to give the photon a tiny
mass without altering any critical features of the theory: the massless limit can
then be performed smoothly after all calculation of rates, cross-sections, etc., are
performed with the appropriate instrumental constraints. Thus, the assumption
of a mass gap is not unduly constraining. In the two-particle subspace (say,
for $|p_1, p_2\rangle_{in}$), the squared mass operator gives $(p_1 + p_2)^2 = 2m^2 + 2p_1 \cdot p_2$, with
$p_1 \cdot p_2 \geq m^2$, so the spectrum of $P^2$ in this subspace is $[(2m)^2, \infty)$. Overall, the
spectrum of $P^2$ is therefore $\{0, m^2, [(2m)^2, \infty)\}$. The spectrum of the four-vector
operator P μ is indicated in Fig. 9.1: it consists of the origin (the vacuum state,
indicated by the central point), the one-particle mass hyperboloid p2 = m2 , and
the continuum, the shaded region contained within and above the hyperboloid
p2 = 4m2 . We assume no bound states, e.g., one-particle mass hyperboloids at
p2 = 4m2 − E.
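
The two-particle threshold quoted in the comments to Axiom Id can be checked with a short numerical sketch. The function names, the sampling range, and the choice m = 2 (borrowed from Fig. 9.1) are assumptions made for illustration only. The sketch samples pairs of on-shell four-momenta and confirms that the invariant squared mass never falls below (2m)², with equality when the two momenta coincide.

```python
import math
import random


def on_shell(spatial, m):
    """Return the four-momentum (E, px, py, pz) with E = sqrt(|p|^2 + m^2)."""
    e = math.sqrt(m * m + sum(c * c for c in spatial))
    return (e, *spatial)


def minkowski_dot(a, b):
    """Scalar product with metric signature (+, -, -, -)."""
    return a[0] * b[0] - sum(x * y for x, y in zip(a[1:], b[1:]))


def invariant_mass_squared(p1, p2):
    """(p1 + p2)^2 for two four-momenta."""
    total = tuple(x + y for x, y in zip(p1, p2))
    return minkowski_dot(total, total)


def smallest_sampled_value(m=2.0, samples=10000, seed=0):
    """Smallest (p1 + p2)^2 found among randomly sampled on-shell pairs."""
    rng = random.Random(seed)
    smallest = float("inf")
    for _ in range(samples):
        p1 = on_shell([rng.uniform(-5.0, 5.0) for _ in range(3)], m)
        p2 = on_shell([rng.uniform(-5.0, 5.0) for _ in range(3)], m)
        smallest = min(smallest, invariant_mass_squared(p1, p2))
    return smallest


if __name__ == "__main__":
    m = 2.0
    print("threshold (2m)^2 =", (2 * m) ** 2)
    print("smallest sampled (p1 + p2)^2 =", smallest_sampled_value(m))
```

The lower bound is approached only when the relative momentum of the pair goes to zero, which is why the continuum in Fig. 9.1 begins on the hyperboloid p² = 4m².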

We turn next to the assumptions concerning the existence and properties of a


Heisenberg field incorporating the complete interacting dynamics of our theory: these
will be termed the Field axioms of the theory.

1. Axiom IIa: An operator-valued (tempered) distribution φ(x) exists such that
for any Schwartz test function f (x),7 the smeared field

$$\phi_f \equiv \int f(x)\,\phi(x)\,d^4x \tag{9.53}$$

is an unbounded operator defined on a dense subset D ⊂ H. Moreover, φf D ⊂ D,
allowing the definition of arbitrary (finite) products of smeared fields.
Comments: The assumed properties are motivated by the equivalent statements
for a free scalar field, for which (for real f (x))

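$$\phi_f = \int d^3p\,\left(\tilde{f}^*(\vec{p})\,a(\vec{p}) + \tilde{f}(\vec{p})\,a^\dagger(\vec{p})\right) \tag{9.54}$$

where

$$\tilde{f}(\vec{p}) \equiv \frac{1}{(2\pi)^{3/2}\sqrt{2E(p)}}\int f(x)\,e^{iE(p)x^0 - i\vec{p}\cdot\vec{x}}\,d^4x \tag{9.55}$$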

Standard theorems ensure that the Fourier transform function f˜ is also a function
(of the spatial momentum vector p) of Schwartz type. A dense subset of H can be
obtained by considering all normalizable n-particle states of the form

$$|g_1, g_2, ..., g_n\rangle = \int d^3q_1\,d^3q_2\,..\,d^3q_n\; g_1(\vec{q}_1)..g_n(\vec{q}_n)\,|\vec{q}_1, \vec{q}_2, ..\vec{q}_n\rangle \tag{9.56}$$

where $\int d^3q\,|g_i(\vec{q})|^2 < \infty$, i = 1, 2, ..n (for example, the gi may be chosen from a
complete basis of L2 ). The reader may easily verify (see Problem 3) that the state
$\phi_f|g_1, g_2, ..g_n\rangle$ has finite norm. It is also easy to see that a further application of a
second smeared field, φf1 say, still produces a finite norm state (thus, φf D ⊂ D).
That the smeared fields are unbounded operators is also unsurprising if we recall
the discussion of Section 8.3, where normalized coherent states were constructed
with arbitrary (and therefore, arbitrarily large) expectation values of the field.
Note that smearing of the original local field φ(x) is unnecessary if we are only
concerned with matrix elements $\langle\beta|\phi(x)|\alpha\rangle$ with $|\alpha\rangle, |\beta\rangle \in D$, which are perfectly
well-defined functions of x. However, we must frequently consider the states (in
the Hilbert space) obtained by sequential application of field operators, in which
case the smeared (or, in the language of Haag, “almost local”) operators φf are
unavoidable. Finally, we note that in keeping with our restriction to a system of
a single self-conjugate particle, the hermiticity of φ(x) translates to the following
property for the smeared fields (for f (x) in general complex):

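$$(|\beta\rangle, \phi_f|\alpha\rangle) = (\phi_{f^*}|\beta\rangle, |\alpha\rangle), \quad \forall\,|\alpha\rangle, |\beta\rangle \in D \tag{9.57}$$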

Axiom IIa implies the existence of the vacuum-expectation-value (VEV) of
products of smeared fields, which can be written as overlaps of the famous

7 The space of Schwartz test functions consists of C ∞ (i.e., infinitely many times continuously differentiable)
functions of fast decrease (i.e., falling faster than any power as the spacetime coordinates go to ∞).

Wightman distributions (Wightman, 1956) (somewhat loosely, the terminology
“Wightman functions” is, in fact, more commonly used) W (x1 , x2 , ..., xn ) with
Schwartz test functions:

$$\begin{aligned}
\langle 0|\phi_{f_1}...\phi_{f_n}|0\rangle &= \int f_1(x_1)..f_n(x_n)\,\langle 0|\phi(x_1)..\phi(x_n)|0\rangle\,d^4x_1..d^4x_n \\
&= \int f_1(x_1)..f_n(x_n)\,W(x_1, ..., x_n)\,d^4x_1..d^4x_n
\end{aligned} \tag{9.58}$$

Again, the situation with free fields motivates the assumption that the distri-
butions W (x1 , x2 , ...xn ) are tempered: namely, continuous linear functionals on
the space of fast-falling Schwartz test functions in the 4n-dimensional space of
the combined spacetime coordinates x1 , x2 , ..xn . We remind the reader8 that
a tempered distribution W (z) (here z is a vector in the combined coordinate
spacetime of all the test functions in (9.58)) can always be written as a finite
derivative of a polynomially bounded continuous function:

$$W(z) = D^m F(z) \tag{9.59}$$

$$|F(z)| < C\,(1 + |z|_E^2)^p \tag{9.60}$$

Here $D^m$ is a generic notation for m derivatives with respect to spacetime
coordinates (components of z), the norm in (9.60) is Euclidean, $|z|_E^2 \equiv \sum_{i,\mu}(x_i^\mu)^2$,
and the constant C and power p depend, of course, on the particular Wightman
distribution W (z) under consideration. A simple example of a tempered distribution
is the Dirac δ distribution, which can be written $\delta(x) = \frac{1}{2}\frac{d^2}{dx^2}|x|$, clearly
satisfying (9.59, 9.60).


2. Axiom IIb: Under the unitary representation of the Poincaré group U (Λ, a)
introduced in Axiom Ia, the smeared fields transform as

U (Λ, a)φf U † (Λ, a) = φfΛ,a , fΛ,a (x) ≡ f (Λ−1 (x − a)) (9.61)

Comments: Our standard transformation law on unsmeared fields (cf. (5.94))

U (Λ, a)φ(x)U † (Λ, a) = φ(Λx + a) (9.62)

immediately leads to (9.61) after multiplying both sides by f (x), integrating over
the four-coordinate x, and appropriate changes of variable in the integration. The
reader may also verify that U (Λ, a)φf = φfΛ,a U (Λ, a), following directly from
(9.61), implies that the domain D is Poincaré-invariant, namely U (Λ, a)D = D,
as fΛ,a is of Schwartz type if f is. The infinitesimal generators of translations are
just the energy-momentum operators whose spectral properties are delineated in
the first set of axioms above: thus, U (1, a) = eiP ·a , while the general Poincaré
group element can be expressed U (Λ, a) = U (1, a)U (Λ, 0) = U (1, a)U (Λ), with

8 For a review of the essential facts concerning tempered distributions, see (Streater and Wightman,
1978).

U (Λ) the unitary operators representing the HLG (just the UH (Λ) of the preceding Section).
The transformation law for a general (unsmeared) covariant field, given as a
set of component fields φn (x) transforming according to a finite-dimensional
representation M (Λ) of the HLG, is a straightforward generalization of (9.62)
(cf. (7.5)):

U (Λ, a)φn (x)U † (Λ, a) = Mnm (Λ−1 )φm (Λx + a) (9.63)

For Lorentz transformations infinitesimally close to the identity, $\Lambda^\mu{}_\nu = g^\mu{}_\nu + \omega^\mu{}_\nu$,
we recall that $M_{nm}(\Lambda) = \delta_{nm} + \frac{i}{2}\omega_{\mu\nu}J^{\mu\nu} + O(\omega^2)$, corresponding to a unitary
realization U (Λ) likewise infinitesimally close to the identity operator in the
Hilbert space of the theory:

$$U(\Lambda) = 1 + \frac{i}{2}\,\omega_{\mu\nu}M^{\mu\nu} + O(\omega^2) \tag{9.64}$$
where the Mμν are a set of six independent self-adjoint operators which are the
infinitesimal generators of HLG on the state space, in effect the representatives of
the finite-dimensional J μν matrices, and hence satisfying the same commutator
algebra (7.21)

[Mμν , Mρσ ] = i(g μσ Mρν + g νσ Mμρ − g ρμ Mσν − g ρν Mμσ ) (9.65)

The Poincaré group composition rule implies U (Λ)U (1, a)U † (Λ) = U (1, Λa),
whence, using U (1, a) = eiP ·a and taking a infinitesimal, we obtain

U (Λ)P ρ U † (Λ) = Λσρ P σ (9.66)

Taking Λσρ = gσρ + ωσρ in (9.66), a short calculation gives the commutation
relations between the generators of translations (the energy-momentum four-
vector) and those of the HLG:

[P ρ , Mμν ] = i(g ρν P μ − g ρμ P ν ) (9.67)

The full Lie algebra of the Poincaré group is completed by observing that the
subgroup of translations is abelian, so the generators P μ commute with each
other:

[P ρ , P σ ] = 0 (9.68)

3. Axiom IIc: Let f1 , f2 be Schwartz functions of compact support: thus, if f1
vanishes outside a compact region v1 of spacetime, and f2 vanishes outside of
the compact region v2 , and if x1 − x2 is space-like for all x1 ∈ v1 , x2 ∈ v2 , then

[φf1 , φf2 ] = 0 (9.69)

Comments: If the fields in question are fermionic, then, of course, the commu-
tator in (9.69) should be replaced by an anticommutator. This requirement is an
obvious consequence of our previous assumption of space-like commutativity at
the level of unsmeared fields, e.g., (7.9). The restriction to compact supports for

the smearing fields is not strictly necessary: it suffices that the supports of f1
and f2 are non-overlapping and mutually space-like.
4. Axiom IId: The set of states obtained by applying arbitrary polynomials in the
smeared fields φf (with all possible Schwartz functions f ) to the vacuum state $|0\rangle$
(cf. Axiom Ic) is dense in the Hilbert space H. This axiom is sometimes expressed
as the “cyclicity of the vacuum”.
Comments: In other words, an arbitrary physical state can be approximated
arbitrarily well by linear combinations of states of the form $\phi_{f_1}\phi_{f_2}....\phi_{f_n}|0\rangle$
(including, of course, the case n = 0, i.e., the vacuum itself). This so-called
Cyclicity axiom incorporates our desire that the complete dynamical information
of the theory is contained in the single field φ. Of course, in theories containing
several different types of particles (which are not bound states of each other,
for example), all such independent fields must be allowed to act on the vacuum
to produce the full Hilbert space of the theory. For the time being though, as
emphasized above, we wish to focus on the simplest possible case, that of a single
spinless particle associated with a single scalar field. The reader may easily verify
that for free fields, a dense set of normed states of the form (9.56) can indeed be
obtained by applying smeared (free) fields to the vacuum. Axiom IId is sometimes
formulated in a different but equivalent way: as the statement that the set of
(appropriately smeared) fields φf are irreducible—or in other words, that the only
bounded operator that commutes with all the φf is a multiple of the identity.
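
The Poincaré composition rule of Axiom Ia, and the consequence U(Λ)U(1, a)U†(Λ) = U(1, Λa) used in the comments to Axiom IIb, can be checked on the level of the spacetime action x → Λx + a. The following is a minimal sketch under stated assumptions: it works in 1+1 dimensions with pure boosts, it tests the group law on coordinates rather than on Hilbert-space operators, and all function names are hypothetical.

```python
import math

IDENTITY = ((1.0, 0.0), (0.0, 1.0))


def boost(rapidity):
    """Lorentz boost in 1+1 dimensions acting on (t, x)."""
    c, s = math.cosh(rapidity), math.sinh(rapidity)
    return ((c, s), (s, c))


def mat_vec(m, v):
    return tuple(sum(m[i][j] * v[j] for j in range(2)) for i in range(2))


def mat_mul(a, b):
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def act(element, x):
    """Apply the Poincare element (L, a) to the point x: x -> L x + a."""
    lam, a = element
    y = mat_vec(lam, x)
    return (y[0] + a[0], y[1] + a[1])


def compose(first, second):
    """Product (L1, a1)(L2, a2) = (L1 L2, a1 + L1 a2)."""
    lam1, a1 = first
    lam2, a2 = second
    shifted = mat_vec(lam1, a2)
    return (mat_mul(lam1, lam2), (a1[0] + shifted[0], a1[1] + shifted[1]))


def conjugate_translation(rapidity, a):
    """Return (L, 0)(1, a)(L, 0)^(-1) for a boost L of the given rapidity."""
    lam = (boost(rapidity), (0.0, 0.0))
    lam_inverse = (boost(-rapidity), (0.0, 0.0))
    return compose(compose(lam, (IDENTITY, a)), lam_inverse)


if __name__ == "__main__":
    g1 = (boost(0.3), (1.0, -2.0))
    g2 = (boost(-0.7), (0.5, 0.25))
    x = (0.4, 1.1)
    print(act(g1, act(g2, x)), act(compose(g1, g2), x))
    print(conjugate_translation(0.3, (1.0, 2.0)), mat_vec(boost(0.3), (1.0, 2.0)))
```

Conjugating a translation by a Lorentz transformation yields the translation by the transformed vector, which is the finite form of the statement (9.66) that the generators P transform as a four-vector.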

Before stating the axioms of the third category—those necessary to complete the
connection between particles and fields—we shall discuss two fundamental results
which already follow from the axioms previously stated: specifically, from a combi-
nation of the spectral and locality properties of the theory. The first result concerns
the analyticity properties of the Wightman distributions: indeed, we have already
seen in Section 6.6 that a close connection exists between locality and analyticity of
amplitudes in field theory. The second result is the Ruelle Clustering theorem, which is
the precise correlate, in field-theoretic terms, of the clustering property of the S-matrix
which was one of our primary motivations in developing the formalism of local field
theory (cf. Chapter 6). The exact connection will become apparent after our discussion
of Haag–Ruelle scattering theory in the next Section.
The analyticity domain of the Wightman functions follows directly from the trans-
lation property IIb, together with the spectral properties of the energy-momentum
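operator P expressed in Axioms Ib and Ic. Thus, writing $\phi(x_i) = e^{iP\cdot x_i}\phi(0)e^{-iP\cdot x_i}$,
we may write (defining $\xi_i \equiv x_i - x_{i+1}$)

$$\begin{aligned}
W(x_1, x_2, ...x_n) &= \langle 0|\phi(0)\,e^{-iP\cdot(x_1 - x_2)}\,\phi(0)\,e^{-iP\cdot(x_2 - x_3)}....e^{-iP\cdot(x_{n-1} - x_n)}\,\phi(0)|0\rangle \\
&= \langle 0|\phi(0)\,e^{-iP\cdot\xi_1}\,\phi(0)\,e^{-iP\cdot\xi_2}\,\phi(0)...e^{-iP\cdot\xi_{n-1}}\,\phi(0)|0\rangle \equiv \mathcal{W}(\xi_1, ..\xi_{n-1})
\end{aligned} \tag{9.70}$$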

By assumption, $\mathcal{W}(\xi_1, ..\xi_{n-1})$ is a tempered distribution, with a well-defined Fourier
transform $\tilde{\mathcal{W}}(k_1, ..., k_{n-1})$:

$$\mathcal{W}(\xi_1, ..\xi_{n-1}) = \int \tilde{\mathcal{W}}(k_1, ..., k_{n-1})\,e^{-i\sum_{i=1}^{n-1}k_i\cdot\xi_i}\,\prod_{i=1}^{n-1}\frac{d^4k_i}{(2\pi)^4} \tag{9.71}$$

Inserting (9.70) in the inverse Fourier formula for W̃ and performing the ξi integrals,
we find

$$\tilde{\mathcal{W}}(k_1, .., k_{n-1}) = (2\pi)^{4(n-1)}\,\langle 0|\phi(0)\,\delta^4(P - k_1)\,\phi(0)\,\delta^4(P - k_2)....\delta^4(P - k_{n-1})\,\phi(0)|0\rangle \tag{9.72}$$
The spectrum of the energy-momentum operator P is restricted to the forward light-
cone ki2 ≥ 0, ki0 > 0, so W̃(k1 , .., kn−1 ) vanishes unless all ki are in the forward cone.
The analytical continuation of (9.70) to complex values of the ξi

ζi = ξi − iηi (9.73)

with all ηi positive time-like (ηi2 > 0) (but the ξi arbitrary real) therefore leads
to the already well-defined integral (9.71) acquiring an additional real exponential
damping factor (as ki · ηi > 0 for all i). As long as we stay away from coincident point
singularities therefore, we see that the Wightman functions are analytic in this multi-
dimensional complex domain, called the forward tube Tn−1 . An extremely important
theorem of axiomatic field theory, due to Hall and Wightman (Hall and Wightman,
1957), shows that a further analytic continuation is possible, by the simple device
of extending the Lorentz-invariance property for real coordinates and real Lorentz
transformations Λ

W (ξ1 , ..., ξn−1 ) = W (Λξ1 , ..., Λξn−1 ) (9.74)

to complex Lorentz transformations (i.e., complex 4x4 matrices Λ with det(Λ) = 1 and
$\Lambda g \Lambda^T = g$), and defining the analytic extension to the extended tube $T'_{n-1}$ consisting
of points of the form (Λζ1 , ..., Λζn−1 ), $(\zeta_1, ..., \zeta_{n-1}) \in T_{n-1}$ by

$$W(\Lambda\zeta_1, ..., \Lambda\zeta_{n-1}) = W(\zeta_1, ..., \zeta_{n-1}), \quad (\Lambda\zeta_1, ..., \Lambda\zeta_{n-1}) \in T'_{n-1} \tag{9.75}$$

The Hall–Wightman theorem gives an explicit characterization of the Wightman
functions in this extended domain: they are simply analytic functions of the complex
scalar dot-products ζi · ζj over the complex domain spanned by these dot-products.
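
The damping argument behind analyticity in the forward tube rests on the inequality k · η > 0 for k in the forward light-cone (the spectral support) and η strictly inside it. A minimal numerical sketch follows, with the caveat that the sampling ranges and function names are assumptions chosen for illustration. It evaluates the modulus of exp(−ik · (ξ − iη)) with complex arithmetic and confirms that it never exceeds 1, whatever the real part ξ.

```python
import cmath
import math
import random


def minkowski_dot(a, b):
    """Scalar product with metric signature (+, -, -, -)."""
    return a[0] * b[0] - sum(x * y for x, y in zip(a[1:], b[1:]))


def forward_cone_vector(rng, margin):
    """Random four-vector whose time component exceeds |spatial part| by at least margin."""
    spatial = [rng.uniform(-3.0, 3.0) for _ in range(3)]
    length = math.sqrt(sum(c * c for c in spatial))
    return (length + margin + rng.uniform(0.0, 3.0), *spatial)


def exponential_modulus(k, xi, eta):
    """|exp(-i k.(xi - i eta))| evaluated with complex arithmetic."""
    zeta = [x - 1j * e for x, e in zip(xi, eta)]
    k_dot_zeta = k[0] * zeta[0] - sum(a * b for a, b in zip(k[1:], zeta[1:]))
    return abs(cmath.exp(-1j * k_dot_zeta))


def largest_sampled_modulus(samples=5000, seed=1):
    """Largest modulus found over random k, eta in the forward cone and real xi."""
    rng = random.Random(seed)
    largest = 0.0
    for _ in range(samples):
        k = forward_cone_vector(rng, 0.0)       # closed forward cone (spectrum of P)
        eta = forward_cone_vector(rng, 0.01)    # strictly inside the forward cone
        xi = [rng.uniform(-10.0, 10.0) for _ in range(4)]
        largest = max(largest, exponential_modulus(k, xi, eta))
    return largest


if __name__ == "__main__":
    print("largest |exp(-i k.zeta)| sampled:", largest_sampled_modulus())
```

Since the modulus equals exp(−k · η), the integrand of (9.71) only gains a decaying factor when the ξi are continued into the forward tube.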
The analyticity of the Wightman functions in the extended tube is an extremely
powerful constraint: for example, we can conclude immediately that for all $(\zeta_i) \in T'_{n-1}$,

W (ζ1 , ..., ζn−1 ) = W (−ζ1 , ..., −ζn−1 ) (9.76)

This is obvious given that, by the Hall–Wightman theorem, the Wightman functions
depend only on the dot-products ζi · ζj , but can also be seen by the fact that we can
connect the unit Lorentz transformation Λ = 1 to the spacetime reflection Λ = −1
(with det(Λ) = +1!) by a continuous analytic path Λ(θ), θ : 0 → π in the complex
Lorentz group, by taking
$$\Lambda^\mu{}_\nu(\theta) = \begin{pmatrix}
\cos(\theta) & 0 & 0 & -i\sin(\theta) \\
0 & \cos(\theta) & \sin(\theta) & 0 \\
0 & -\sin(\theta) & \cos(\theta) & 0 \\
-i\sin(\theta) & 0 & 0 & \cos(\theta)
\end{pmatrix} = e^{i\theta(J_3 - iK_3)} = e^{2i\theta A_3} \tag{9.77}$$

The reader should check that this is indeed a Lorentz transformation (i.e., satisfies
det(Λ) = +1, ΛT gΛ = g)! It is complex insofar as the boost angle (coefficient of iK3 in
the exponent) is imaginary. The result (9.76), involving a reversal of both the temporal
(T) and spatial (P) coordinates, will turn out to be a crucial part of the demonstration
of the TCP theorem in the very general framework of axiomatic field theory given
in Section 13.4. An obvious generalization of (9.76) to the Wightman function for a
product φA1 B1 (x1 )φA2 B2 (x2 ) · · · φAn Bn (xn ) of fields in arbitrary representations of the
HLG follows directly from (9.77):

$$W^{A_1B_1,A_2B_2,..,A_nB_n}(\zeta_1, ..., \zeta_{n-1}) = (-1)^{\sum_i 2A_i}\,W^{A_1B_1,A_2B_2,..,A_nB_n}(-\zeta_1, ..., -\zeta_{n-1}) \tag{9.78}$$
by analytic continuation from θ = 0 to θ = π in (9.77), and recalling that the
3-component of an angular momentum Ai3 differs from the Ai value by an integer.9
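
The check of (9.77) left to the reader can be carried out numerically. The sketch below, whose function names are hypothetical, builds the matrix Λ(θ) as printed in (9.77) and verifies along the path that det(Λ) = +1 and ΛᵀgΛ = g (with the plain transpose, not the Hermitian conjugate, as appropriate for the complex Lorentz group), and that the endpoints are the identity at θ = 0 and the full spacetime reflection at θ = π.

```python
import math

METRIC = (1, -1, -1, -1)  # diagonal of g, signature (+, -, -, -)


def complex_lorentz(theta):
    """The matrix of (9.77): rotation in the 1-2 plane, imaginary boost along 3."""
    c, s = math.cos(theta), math.sin(theta)
    return [
        [c, 0, 0, -1j * s],
        [0, c, s, 0],
        [0, -s, c, 0],
        [-1j * s, 0, 0, c],
    ]


def preserves_metric(lam, tol=1e-12):
    """Check Lambda^T g Lambda = g entry by entry (no complex conjugation)."""
    for i in range(4):
        for j in range(4):
            entry = sum(lam[k][i] * METRIC[k] * lam[k][j] for k in range(4))
            target = METRIC[i] if i == j else 0
            if abs(entry - target) > tol:
                return False
    return True


def determinant(m):
    """Determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for col in range(len(m)):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * m[0][col] * determinant(minor)
    return total


def is_multiple_of_identity(lam, factor, tol=1e-12):
    """True if lam equals factor times the 4x4 identity within tol."""
    return all(
        abs(lam[i][j] - (factor if i == j else 0)) < tol
        for i in range(4)
        for j in range(4)
    )


if __name__ == "__main__":
    for theta in (0.0, 0.5, math.pi / 2, 2.5, math.pi):
        lam = complex_lorentz(theta)
        print(theta, preserves_metric(lam), determinant(lam))
```

Because ΛᵀgΛ = g holds for every θ, all dot-products ζi · ζj are unchanged along the path, consistent with the Hall–Wightman characterization of the Wightman functions.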
Using local commutativity (Axiom IIc), it can be also shown that the analyticity
domains in the difference variables ξ1 , .., ξn−1 for different permutations of the original
arguments x1 , . . . xn can be connected to obtain analyticity in a permuted extended
tube (see (Streater and Wightman, 1978) for further details). A particular case of great
importance is the imaginary time continuation of the Wightman functions, yielding
the Euclidean Schwinger functions of the theory

S(x1 , x2 , ...xn ) ≡ W ((x1 , −ix41 ), (x2 , −ix42 ), ..., (xn , −ix4n )) (9.79)

where the xα i , α = 1, 2, 3, 4 are real. Unlike the Wightman functions, the Schwinger
functions are permutation symmetric in their arguments (the Wightman functions
involve field operators which do not commute at time-like separation). This can
be shown directly from the axioms, but will become essentially trivial in the path-
integral formalism, so we shall postpone further discussion of the Schwinger functions
to Section 10.3, where we shall explore the properties of functional integrals in field
theory.
We turn now to the issue of clustering—in particular, the result originally obtained
by Ruelle (Ruelle, 1962) on the large distance asymptotics of the Wightman functions.
In order to state this result, we need to introduce the concept of a smeared field
localized around a point x, $\phi_f(x)$. Our original definition, $\phi_f \equiv \int f(x)\phi(x)\,d^4x$, with
f (x) a function falling faster than any power as x moves away from the origin x = 0,
should be regarded as producing a smeared field localized in a very small, but finite,
region near x = 0. A corresponding field (or product of fields; see below) localized

9 A different choice of complex continuation from Λ = 1 to Λ = −1 could be made, resulting in the factor
$(-1)^{\sum_i 2B_i}$. This is, however, identical to $(-1)^{\sum_i 2A_i}$, as the spin $j_i$ associated with each field differs from
$A_i + B_i$ by an integer, and the vacuum expectation value of a product of fields can only differ from zero if
the spins they carry can couple to zero.

around an arbitrary coordinate x (in Haag’s language (Haag, 1992), an almost local
field) can be obtained by the standard process of translation:

$$\phi_f(x) \equiv e^{iP\cdot x}\,\phi_f\,e^{-iP\cdot x} = \int f(y-x)\,\phi(y)\,d^4y \tag{9.80}$$

In fact, we can define more general types of smeared fields as smeared polynomials in
the basic field φ(x). For example, taking $f(x_1,x_2)$ to be a Schwartz function of rapid
decrease in the pair of coordinates $(x_1,x_2)$, we can define a smeared bilocal operator,
localized again in the neighborhood of the origin, by

$$\phi_f \equiv \int f(x_1,x_2)\,\phi(x_1)\phi(x_2)\,d^4x_1\,d^4x_2 \tag{9.81}$$

with a corresponding almost local field $\phi_f(x)$ localized in the neighborhood of an
arbitrary coordinate point x by translation, as in (9.80). Everything that follows applies
with equal validity to almost local operators of this more general type.
It is somewhat unintuitive, but nevertheless true, that the Fourier transform of a
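smeared field, $\tilde\phi_f(p)$,

$$\tilde\phi_f(p) \equiv \int \phi_f(x)\,e^{-ip\cdot x}\,d^4x \tag{9.82}$$

carries a precise four-momentum: it alters the momentum of any state on which it acts
by precisely p. Indeed, if $|\alpha\rangle$, $|\beta\rangle$ are states of well-defined four-momentum $P_\alpha$, $P_\beta$, then
evidently

$$\begin{aligned}
\langle\beta|\int \phi_f(x)\,e^{-ip\cdot x}\,d^4x\,|\alpha\rangle &= \int e^{-ip\cdot x}\,\langle\beta|e^{iP\cdot x}\phi_f e^{-iP\cdot x}|\alpha\rangle\,d^4x \\
&= \int e^{+i(P_\beta-P_\alpha-p)\cdot x}\,\langle\beta|\phi_f|\alpha\rangle\,d^4x \;\propto\; \delta^4(P_\beta-(P_\alpha+p))
\end{aligned} \tag{9.83}$$

Thus, $\langle\beta|\tilde\phi_f(p)|\alpha\rangle$ is non-zero only if $P_\beta = P_\alpha + p$.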
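The momentum-shift property (9.83) can be illustrated in a finite toy setting. The following is only a sketch under stated assumptions: a one-dimensional periodic lattice with `N_SITES` sites replaces spacetime, momenta are the discrete values 2πn/N, and `toy_operator` is an arbitrary made-up matrix in the momentum basis. None of these names come from the text.

```python
import cmath
import math

N_SITES = 6  # hypothetical lattice size; momenta are k_n = 2*pi*n/N_SITES


def momentum(n):
    """Discrete momentum labelled by the integer n."""
    return 2.0 * math.pi * n / N_SITES


def toy_operator(beta, alpha):
    """Arbitrary, fully populated matrix element <beta|O|alpha> (illustrative only)."""
    return complex(1.0 + beta, 0.5 * alpha - 1.0)


def translated(beta, alpha, x):
    """<beta| e^{iPx} O e^{-iPx} |alpha> between momentum eigenstates."""
    phase = cmath.exp(1j * (momentum(beta) - momentum(alpha)) * x)
    return phase * toy_operator(beta, alpha)


def fourier_component(beta, alpha, m):
    """<beta| sum_x O(x) e^{-i p_m x} |alpha>, the lattice analog of (9.82)."""
    return sum(translated(beta, alpha, x) * cmath.exp(-1j * momentum(m) * x)
               for x in range(N_SITES))


def transfers_momentum(m, tol=1e-9):
    """True if the p_m component only connects states with k_beta = k_alpha + p_m (mod 2*pi)."""
    for beta in range(N_SITES):
        for alpha in range(N_SITES):
            allowed = (beta - alpha - m) % N_SITES == 0
            value = fourier_component(beta, alpha, m)
            if not allowed and abs(value) > tol:
                return False
            if allowed and abs(value - N_SITES * toy_operator(beta, alpha)) > tol:
                return False
    return True
```

On the lattice the delta function of (9.83) becomes a Kronecker delta modulo 2π, which is why the check uses the remainder modulo `N_SITES`.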


As a preliminary to the statement and (partial) proof of the Ruelle Clustering
theorem, we first show that the locality postulate (Axiom IIc) implies that the
commutator of two smeared fields falls faster than any power as the localization points
are separated in a space-like direction. We shall need this result only for the vacuum-
expectation-value of the commutator. Thus, let $\vec{a}$ be a spatial three-vector, with the
corresponding four-vector $a \equiv (0,\vec{a})$. With $f_1$, $f_2$ fast-decreasing Schwartz functions as
usual, we shall show that for any power N, there exists a constant C such that

$$|\langle 0|[\phi_{f_1}(-a),\phi_{f_2}(+a)]|0\rangle| < C|\vec{a}|^{-N}, \quad |\vec{a}| \to \infty \tag{9.84}$$

This result depends on the fast decrease property of the smearing functions f1 , f2 as
well as the tempered distribution character of the Wightman distributions. We shall
explain the proof in somewhat greater detail than usual in order to give the reader
some idea of the flavor of the reasoning in axiomatic quantum field theory. First, note
that

$$\langle 0|[\phi_{f_1}(-a),\phi_{f_2}(+a)]|0\rangle = \int f_1(x_1)f_2(x_2)\,\langle 0|[\phi(x_1-a),\phi(x_2+a)]|0\rangle\,d^4x_1\,d^4x_2 \tag{9.85}$$
The estimates we shall need rely on introducing a completely Euclidean metric on the
multiple coordinate space of all the fields under consideration. Thus, we shall define
eight-vectors $z = (x_1,x_2)$, $\alpha = (a,-a)$, with the Euclidean norm $|z|_E^2 = \sum_\mu\big((x_1^\mu)^2 + (x_2^\mu)^2\big)$,
$|\alpha|_E^2 = 2|\vec{a}|^2$. The product of smearing functions $f_1$, $f_2$ is likewise a fast falling
function of z: F (z) ≡ f1 (x1 )f2 (x2 ). The VEV of the commutator is the difference of
two Wightman distributions, W (x1 − a, x2 + a) − W (x2 + a, x1 − a), which we shall
write as a single tempered distribution of the eight-dimensional variable z − α:

$$\langle 0|[\phi_{f_1}(-a),\phi_{f_2}(+a)]|0\rangle = \int F(z)\,W(z-\alpha)\,d^8z \tag{9.86}$$

The tempered nature of W(z) implies (cf. (9.59), (9.60))

$$W(z-\alpha) = D^m P(z-\alpha), \quad |P(z-\alpha)| < C_1(1+|z-\alpha|_E^2)^p < C_1(1+|z|_E^2)^p(1+|\alpha|_E^2)^p \tag{9.87}$$

for some power p and some constant $C_1$, with $D^m$ consisting of m coordinate derivatives.
Substituting (9.87) into (9.86), and performing m integrations by parts,

$$|\langle 0|[\phi_{f_1}(-a),\phi_{f_2}(+a)]|0\rangle| = \Big|\int P(z-\alpha)\,D^mF(z)\,d^8z\Big| \tag{9.88}$$

A little geometric reasoning shows that if $|z|_E < |\vec{a}|$, the spacetime points $x_1 - a$ and
$x_2 + a$ for the local fields appearing in the commutator on the right-hand side of (9.85)
must be space-like separated, so that the integrand vanishes unless $|z|_E \ge |\vec{a}|$: accordingly, the
polynomially-bounded function P, and hence the integrand as a whole, must vanish
except for $|z|_E > |\vec{a}|$:

$$|\langle 0|[\phi_{f_1}(-a),\phi_{f_2}(+a)]|0\rangle| = \Big|\int_{|z|_E>|\vec{a}|} P(z-\alpha)\,D^mF(z)\,d^8z\Big| \tag{9.89}$$

The fast decrease property of the smearing functions implies that for any power N
there exists a constant C2 such that
$$|D^mF(z)| < C_2\,|z|_E^{-(N+2p+9)}\,(1+|z|_E^2)^{-p} \tag{9.90}$$

where p is the power already appearing in (9.87). Inserting (9.90) in (9.89), and using
$d^8z = S_8|z|_E^7\,d|z|_E$, with $S_8$ the surface area of the eight-dimensional unit sphere,

$$\begin{aligned}
|\langle 0|[\phi_{f_1}(-a),\phi_{f_2}(+a)]|0\rangle| &< C_1C_2S_8\,(1+|\alpha|_E^2)^p\int_{|z|_E>|\vec{a}|}|z|_E^{-N-2p-2}\,d|z|_E \\
&= \frac{C_1C_2S_8}{N+2p+1}\,(1+2|\vec{a}|^2)^p\,|\vec{a}|^{-N-2p-1}
\end{aligned} \tag{9.91}$$
which establishes the desired behavior for large |a| as promised in (9.84).
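The radial integral used in the last step of (9.91) is elementary, and a quick numerical check may be reassuring. The sketch below is illustrative only: the function names, the midpoint rule, and the choice of replacing the product C1·C2·S8 by a generic `prefactor` are assumptions, not part of the text.

```python
def tail_integral_exact(a, s):
    """Closed form of the integral of r**(-s) from r = a to infinity, for s > 1."""
    return a ** (1.0 - s) / (s - 1.0)


def tail_integral_numeric(a, s, steps=20000):
    """Midpoint rule after substituting r = a / u, which maps the range to 0 < u <= 1."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += u ** (s - 2.0)
    return a ** (1.0 - s) * total * h


def commutator_bound(a, n_power, p, prefactor=1.0):
    """Right-hand side of (9.91), with C1*C2*S8 replaced by `prefactor`."""
    s = n_power + 2 * p + 2  # exponent of |z|_E in the radial integrand
    return prefactor * (1.0 + 2.0 * a * a) ** p * tail_integral_exact(a, s)
```

For large |a| the bound behaves like |a|^(−N−1), so multiplying it by |a|^N still gives a decreasing function, consistent with (9.84).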
We now turn to a discussion of the clustering properties of our field theory. In
analogy to the discussion of connected parts of the S-matrix in Sections 6.1 and 6.2,

a set of connected$^{10}$ Wightman distributions $W^c(x_1,x_2,\ldots,x_n)$ can be defined inductively,
starting from the full distributions $W(x_1,\ldots,x_n) = \langle 0|\phi(x_1)\phi(x_2)\cdots\phi(x_n)|0\rangle$,
as follows:

$$W^c(x_1) = W(x_1) \tag{9.92}$$

$$W^c(x_1,x_2) = W(x_1,x_2) - W^c(x_1)W^c(x_2) \tag{9.93}$$

$$W^c(x_1,x_2,x_3) = W(x_1,x_2,x_3) - \big(W^c(x_1)W^c(x_2,x_3) + \text{perms}\big) - W^c(x_1)W^c(x_2)W^c(x_3) \tag{9.94}$$

and so on. The inversion of this set of equations,

W (x1 ) = W c (x1 )
W (x1 , x2 ) = W c (x1 )W c (x2 ) + W c (x1 , x2 ), etc. (9.95)

shows us that the full amplitude corresponding to the vacuum expectation value of
a product of fields has a cluster decomposition into a sum of terms in which the
fields are partitioned in all possible ways into clusters, with the fields in each cluster
appearing inside a connected matrix element. Connected versions of the vacuum-
expectation-values of the corresponding smeared (“almost local”) operators will be
defined similarly, thus

$$\langle 0|\phi_f(x_1)\phi_f(x_2)|0\rangle_c \equiv \langle 0|\phi_f(x_1)\phi_f(x_2)|0\rangle - \langle 0|\phi_f(x_1)|0\rangle\langle 0|\phi_f(x_2)|0\rangle \tag{9.96}$$

and so on. The Ruelle Clustering theorem amounts to the intuitively plausible assertion
that the connected expectation values so defined decrease rapidly when the points of
localization of the fields are taken far apart spatially:
Theorem 9.1 For a set of n spacetime coordinates $x_1, x_2, \ldots, x_n$ at equal time, $x_i = (t,\vec{x}_i)$,
with spatial diameter $d \equiv \max_{i,j}|\vec{x}_i - \vec{x}_j|$, for any positive N there exists a
constant $C_N$ such that for large d

$$|\langle 0|\phi_{f_1}(x_1)\phi_{f_2}(x_2)\cdots\phi_{f_n}(x_n)|0\rangle_c| < C_N\,d^{-N} \tag{9.97}$$
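The inductive definition (9.92)–(9.94) of the connected parts can be sketched as a recursion over set partitions. The code below is a toy illustration only, assuming permutation-symmetric (commuting) quantities as footnote 10 requires; `toy_full` is a made-up stand-in for the full functions, built from two classical ±1 coins, in which labels 1 and 2 are perfectly correlated and label 3 is independent of them.

```python
from itertools import combinations


def set_partitions(items):
    """Yield every partition of the tuple `items` (distinct labels) into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    # Choose which of the remaining labels share a block with `first`.
    for k in range(len(rest) + 1):
        for others in combinations(rest, k):
            block = (first,) + others
            remaining = tuple(x for x in rest if x not in others)
            for tail in set_partitions(remaining):
                yield [block] + tail


def connected(full, points):
    """Connected part W^c(points), obtained by subtracting all non-trivial cluster products."""
    points = tuple(points)
    total = full(points)
    for partition in set_partitions(points):
        if len(partition) == 1:
            continue  # the single-block term is W^c itself
        term = 1.0
        for block in partition:
            term *= connected(full, block)
        total -= term
    return total


def toy_full(points):
    """Toy full function: labels 1, 2 share a coin s (E[s] = 0.5); label 3 uses an independent coin u (E[u] = 0.2)."""
    n_s = sum(1 for q in points if q in (1, 2))
    n_u = sum(1 for q in points if q == 3)
    moment_s = 1.0 if n_s % 2 == 0 else 0.5
    moment_u = 1.0 if n_u % 2 == 0 else 0.2
    return moment_s * moment_u
```

In this toy model the connected part is non-zero only within the correlated pair; any connected function that includes the independent label vanishes, which is the classical analog of the factorization that clustering expresses.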

It will be an immediate consequence of the beautiful scattering theory due to Haag


and Ruelle—to be discussed in the next section—that the clustering properties posited
as a phenomenological necessity for the S-matrix in Section 6.1 follow transparently
from the aforestated Clustering theorem for the fields. We shall not prove the Ruelle

10 In much of the axiomatic quantum field theory literature, the term “truncated” is used instead of
“connected” for the distributions defined in (9.92–9.94). In the interests of honesty, we should admit here
that the clustering expansion of (9.92, etc.) is, strictly speaking, well-defined only for n-point functions
that are permutation-symmetric on their arguments. We shall shortly be restricting ourselves to the special
case where the spacetime coordinates x1 , . . . , xn are far space-like separated, in which limit the necessary
symmetry of the Wightman functions is restored.

Clustering theorem in its full generality here,11 but indicate (following the extremely
streamlined discussion of Haag, (Haag, 1992)) how the argument goes for the case
n = 2.
A crucial ingredient in the argument leading to clustering is the assumption made
in Axiom Id: the theory has a mass gap, with the minimum energy of states orthogonal
to the vacuum equal to m > 0. Choose positive numbers a, b such that 0 < a < b < m.
Define the function F+ (p0 ) on the energy variable p0 (for a four-vector p) as
$$F_+(p_0) \equiv \frac{e^{-K/(p_0-a)^2}}{e^{-K/(p_0-a)^2}+e^{-K/(p_0-b)^2}}, \quad a < p_0 < b \tag{9.98}$$

$$\equiv 0, \quad p_0 \le a \tag{9.99}$$

$$\equiv 1, \quad p_0 \ge b \tag{9.100}$$

with K an arbitrary real positive constant. Note that by definition 0 ≤ F+ ≤ 1, that


F+ (p0 ) = 0 for p0 < a and F+ (p0 ) = 1 for p0 > b, and that (see Problem 4) F+ is
infinitely many times continuously differentiable with respect to the energy variable
p0 for all p0 . Next, define

F− (p0 ) ≡ F+ (−p0 ) (9.101)

which is also C ∞ , but vanishes for p0 > −a. Finally, we define

F0 (p0 ) ≡ 1 − F+ (p0 ) − F− (p0 ) (9.102)

which clearly vanishes for |p0 | > b. The three functions F+ , F0 and F− form a partition
of unity, and are illustrated in Fig. 9.2 for a specific choice of parameters. It is easily

Fig. 9.2 $C^\infty$ partition of unity for a = 0.3, b = 0.6, m = 1.0 (K = 0.01). [Figure: curves F−(p0), F0(p0) and F+(p0) plotted against p0.]

11 For an accessible, but careful and rigorous proof, see Jost (Jost, 1965), Chapter VI, especially
pp. 126–130.

seen that, like F+ , the functions F0 and F− are also C ∞ (infinitely many times
continuously differentiable).
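The three cutoff functions (9.98)–(9.102) are easy to tabulate numerically. The following is a minimal sketch using the parameter values quoted for Fig. 9.2 as defaults; the function names are illustrative choices, not notation from the text.

```python
import math


def f_plus(p0, a=0.3, b=0.6, k=0.01):
    """F_+ of (9.98)-(9.100): 0 below a, 1 above b, smooth interpolation in between."""
    if p0 <= a:
        return 0.0
    if p0 >= b:
        return 1.0
    rising = math.exp(-k / (p0 - a) ** 2)
    falling = math.exp(-k / (p0 - b) ** 2)
    return rising / (rising + falling)


def f_minus(p0, a=0.3, b=0.6, k=0.01):
    """F_- of (9.101), the mirror image of F_+."""
    return f_plus(-p0, a, b, k)


def f_zero(p0, a=0.3, b=0.6, k=0.01):
    """F_0 of (9.102), the remainder of the partition of unity."""
    return 1.0 - f_plus(p0, a, b, k) - f_minus(p0, a, b, k)
```

Because a > 0, the supports of F+ and F− never overlap, so F0 equals 1 for |p0| ≤ a and vanishes for |p0| ≥ b.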
We now consider the vacuum-expectation-value of a pair φf1 (x1 ), φf2 (x2 ) of almost
local operators localized around spacetime points x1 , x2 at equal time (the n = 2 case
of the Clustering theorem). Any such operator can be split into three parts as follows,
utilizing the energy functions F±,0 introduced above:

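$$\begin{aligned}
\phi_f(x) &= \phi_f^{(+)}(x) + \phi_f^{(0)}(x) + \phi_f^{(-)}(x) \\
\phi_f^{(+)}(x) &\equiv \int F_+(p_0)\,\tilde\phi_f(p)\,e^{ip\cdot x}\,\frac{d^4p}{(2\pi)^4} = \int F_+(p_0)\,\tilde f(p)\,\phi(y)\,e^{ip\cdot(x-y)}\,d^4y\,\frac{d^4p}{(2\pi)^4} \\
\phi_f^{(0)}(x) &\equiv \int F_0(p_0)\,\tilde\phi_f(p)\,e^{ip\cdot x}\,\frac{d^4p}{(2\pi)^4} = \int F_0(p_0)\,\tilde f(p)\,\phi(y)\,e^{ip\cdot(x-y)}\,d^4y\,\frac{d^4p}{(2\pi)^4} \\
\phi_f^{(-)}(x) &\equiv \int F_-(p_0)\,\tilde\phi_f(p)\,e^{ip\cdot x}\,\frac{d^4p}{(2\pi)^4} = \int F_-(p_0)\,\tilde f(p)\,\phi(y)\,e^{ip\cdot(x-y)}\,d^4y\,\frac{d^4p}{(2\pi)^4}
\end{aligned} \tag{9.103}$$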

with $\tilde f(p)$ the Fourier transform of the Schwartz smearing function f(x). Note that
F±,0 (p0 )f˜(p) is C ∞ if f˜(p) is, so the smearing of these three operators in coordinate
space is with fast-decreasing functions: i.e., they are almost local if φf (x) is. By the
momentum shift argument of (9.83), and the support in energy of the functions F±,0 ,
we know the following:
1. The operators of form $\phi_f^{(+)}(x)$ increase the energy of any state they act on by at
least a > 0.
2. The operators of form $\phi_f^{(0)}(x)$ change the energy of any state they act on by at
most b < m.
3. The operators of form $\phi_f^{(-)}(x)$ decrease the energy of any state they act on by at
least a > 0.

The State axioms (specifically, the spectral assumptions Ib,Ic, and Id) then imply

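$$\langle 0|\phi_f^{(+)}(x)|\alpha\rangle = 0, \;\; \forall\,|\alpha\rangle \;\Rightarrow\; \langle 0|\phi_f^{(+)}(x) = 0 \tag{9.104}$$

$$\phi_f^{(0)}(x)|0\rangle = C|0\rangle, \quad C = \langle 0|\phi_f^{(0)}(x)|0\rangle \tag{9.105}$$

$$\phi_f^{(-)}(x)|0\rangle = 0 \tag{9.106}$$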

Accordingly,
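$$\begin{aligned}
\langle 0|\phi_{f_1}(x_1)\phi_{f_2}(x_2)|0\rangle &= \langle 0|\big(\phi_{f_1}^{(0)}(x_1)+\phi_{f_1}^{(-)}(x_1)\big)\big(\phi_{f_2}^{(+)}(x_2)+\phi_{f_2}^{(0)}(x_2)\big)|0\rangle \\
&= \langle 0|\phi_{f_1}^{(0)}(x_1)\phi_{f_2}^{(0)}(x_2)|0\rangle + \langle 0|[\phi_{f_1}^{(-)}(x_1),\phi_{f_2}^{(+)}(x_2)]|0\rangle \\
&\quad + \langle 0|[\phi_{f_1}^{(0)}(x_1),\phi_{f_2}^{(+)}(x_2)]|0\rangle + \langle 0|[\phi_{f_1}^{(-)}(x_1),\phi_{f_2}^{(0)}(x_2)]|0\rangle
\end{aligned} \tag{9.107}$$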

By (9.84), the three commutators on the right-hand side of (9.107) fall faster than any
power of d ≡ |x1 − x2 |, while, by (9.105),
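$$\begin{aligned}
\langle 0|\phi_{f_1}^{(0)}(x_1)\phi_{f_2}^{(0)}(x_2)|0\rangle &= \langle 0|\phi_{f_1}^{(0)}(x_1)|0\rangle\langle 0|\phi_{f_2}^{(0)}(x_2)|0\rangle \\
&= \langle 0|\phi_{f_1}(x_1)|0\rangle\langle 0|\phi_{f_2}(x_2)|0\rangle
\end{aligned} \tag{9.108}$$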

Thus

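$$\begin{aligned}
\langle 0|\phi_{f_1}(x_1)\phi_{f_2}(x_2)|0\rangle_c &\equiv \langle 0|\phi_{f_1}(x_1)\phi_{f_2}(x_2)|0\rangle - \langle 0|\phi_{f_1}(x_1)|0\rangle\langle 0|\phi_{f_2}(x_2)|0\rangle, \\
|\langle 0|\phi_{f_1}(x_1)\phi_{f_2}(x_2)|0\rangle_c| &< C_N\,d^{-N}, \quad d \to \infty
\end{aligned} \tag{9.109}$$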

which is just the n = 2 case of Theorem 9.1. The extension to higher values of n
(for which we refer the reader to the above-cited book of Jost (Jost, 1965)) requires
an inductive argument relying on the fact that as the spatial diameter d of the
set of n points becomes large, we may always find two subsets of points separated
by an arbitrarily large spatial distance. It is also important to realize that the
above argument is equally valid for the more general class of composite almost local
operators: e.g., of the form (9.81). Indeed, we shall employ the Clustering theorem in
just such a circumstance in the next section, when it is combined with the Haag–Ruelle
scattering theory to establish the clustering properties of the S-matrix which served
as a crucial motivation in our first stabs at constructing local field theory.
We can now state our final set of axioms, those that connect the particle(s) and
field(s) of the theory. We may call them the particle–field duality axioms. They are
essential in the development of any comprehensible theory of particle scattering in the
context of field theory.

1. Axiom IIIa: For some one-particle state $|\alpha\rangle = \int g(\vec{k})|\vec{k}\rangle\,d^3k$ ($g(\vec{k}) \in L^2$) with
discrete eigenvalue $m^2$ of the squared-mass operator (cf. Axiom Id), the smeared
field $\phi_f(x)$ has a non-vanishing matrix element from this single-particle state to
the vacuum, $\langle 0|\phi_f(x)|\alpha\rangle \neq 0$.
Comments: If this situation holds, we call φf (x) an interpolating Heisenberg
field for the given particle.
2. Axiom IIIb: (Asymptotic completeness.) The Hilbert spaces Hin (resp. Hout )
corresponding to multi-particle states of far-separated, freely moving stable
particles in the far past (resp. far future) are unitarily equivalent, and may be
identified with the full Hilbert space H of the system (which, from the cyclicity
axiom IId, can be regarded as the space generated by application of the smeared
fields to the vacuum). Thus, this axiom again connects particle concepts (the
asymptotic in- and out-states) with a space H defined in terms of the action of
the basic field(s) of the theory.
Comments: As discussed in the preceding section, this assumption is almost
unavoidable physically, as it incorporates a vast amount of phenomenological
experience of particle interactions. On the other hand, from a brutally utilitarian
point of view, it may be thought to be partially unnecessary: if there are physical
states which do not correspond to the states produced and detected in high-energy
accelerators, who needs to know, given that essentially all of our insights
into the nature of fundamental microphysical interactions are obtained from
such accelerator experiments? The assumed unitarity of the S-matrix would
only then require Hin = Hout , with both of these asymptotically defined spaces
being (perhaps) proper subsets of the full Hilbert space H. Indeed, the Haag–
Ruelle scattering theory of the next section can only establish the existence of
the asymptotic spaces as such subsets. Moreover, even in the few cases where
we have maximum mathematical control: e.g., the explicitly constructed field
theories corresponding to polynomially self-coupled scalar fields in two spacetime
dimensions, where the validity and consistency of the axioms of type I and II can
be explicitly checked by construction of the Hilbert space and operators of the
theory, the validity of Axiom IIIb remains, in the words of Jaffe and Glimm
((Glimm and Jaffe, 1987), p. 275), “a very deep (and open) mathematical ques-
tion”. Our attitude for the remainder of this book, in the absence of conclusive
evidence to the contrary, will simply be to assume the validity of asymptotic
completeness, and to treat the asymptotic spaces Hin,out as equivalent to the full
physical Hilbert space of the field theory.
The axiomatic framework outlined in this section is the basis for famous proofs
(Streater and Wightman, 1978) of the Spin-Statistics and TCP theorems which estab-
lish rigorously these fundamental properties of local field theory (already discussed
qualitatively in Chapter 3) in a very general context. We have already discussed
(in Section 7.3) the Spin-Statistics theorem, albeit in the context of free interaction-
picture fields. We shall return to both the Spin-Statistics and TCP theorems later, in
the “Symmetries” section of the book (cf. Section 13.4). Another deep and beautiful
result of the axiomatic approach—the Wightman reconstruction theorem ((Streater
and Wightman, 1978), Section 3-4)—allows us to recover all essential features of the
particle-oriented Fock-space formulation of field theory starting only with a set of
Wightman functions satisfying the above axioms. Here, the cyclicity of the vacuum
embodied in Axiom IId is critical, ensuring that arbitrary states of the Hilbert space
of the system can be approximated to arbitrary accuracy by applying polynomials of
the smeared field to the vacuum. An arbitrary matrix element of the smeared field
between physical states can consequently be approximated by linear combinations
of Wightman functions (integrated with smearing functions) to arbitrary accuracy.
The appropriate transformation properties of the states and field operator under the
Poincaré group are also explicitly demonstrated in the process of the reconstruction.
We shall be performing a similar reconstruction shortly using the scattering theory
of Haag and Ruelle. In this approach an explicit formula for the asymptotic in- and
out-states of the theory is given in terms of limits of appropriately smeared Heisenberg
fields acting on the vacuum. To the extent that we accept asymptotic completeness
(Axiom IIIb), the construction of the asymptotic in- or out-spaces is tantamount to
recovering the full physical Hilbert space of the theory.

9.3 Asymptotic formalism I: the Haag–Ruelle scattering theory


In this section we shall establish, in a rigorous way, the promised connection between
the underlying Heisenberg field φ(x) and the asymptotic states of the theory: in
particular, we shall show, following the seminal work of Haag and Ruelle, how

to construct the asymptotic Hilbert spaces Hin,out as appropriate limits involving


products of the Heisenberg field applied to the vacuum. The only conceptual input
will be the axioms of the preceding Section. The treatment is restricted for sim-
plicity to a theory of massive, self-interacting spinless particles, with no bound
states in the theory. Our main result will be the fundamental Asymptotic theorem
of Haag.
We begin by showing that a special type of smeared field can be constructed with
the properties that (a) acting on the vacuum it produces single-particle states, and
single-particle states only, and (b) these states are time-independent. The smearing is
carried out in two stages. First, starting with the original local field φ(x), assumed to
be an interpolating field for the stable particle of the theory, we construct a field φ1 (x)
with a special type of smearing, with the property that it produces only one-particle
states when acting on the vacuum. Next, this field is further smeared in such a way that
the resultant field φ1,g produces a time-independent (one-particle) state with a definite
momentum-space wavefunction when acting on the vacuum. The first field, φ1 (x) is
defined exactly as in (9.80), but with a function $f^{(1)}(x)$ chosen as the Fourier transform
of a function $\tilde f^{(1)}(p)$ with support in the region $am^2 < p^2 < bm^2$ with 0 < a < 1,
1 < b < 4 (the numbers “1” and “4” appearing here are purely convenient choices,
as will be apparent shortly). The function $\tilde f^{(1)}(p)$ will be as usual $C^\infty$ and of fast
decrease for large spatial momentum $\vec{p}$, although we shall not allow it to vanish
anywhere on the one-particle mass shell $p_0 = \sqrt{\vec{p}^{\,2} + m^2}$, as we shall want to construct
wave-packets for particles centered around arbitrary momenta. An example of a
possible region of support for f˜(1) (p) is shown in Fig. 9.1 as the region between
the dotted lines. Note that this region includes the one-particle mass hyperboloid,
but not the vacuum or any multi-particle states.12 Consequently, the field φ1 (x) can
only produce—and by Axiom IIIa, will produce!—one-particle states when acting
on the vacuum, by the argument leading to (9.83). The field φ1 (x) is therefore an
almost local field in the sense described in the preceding section, with the usual
translation properties, $e^{iP\cdot a}\phi_1(x)e^{-iP\cdot a} = \phi_1(x+a)$ (with its infinitesimal version
$i[P_\mu,\phi_1(x)] = \partial\phi_1/\partial x^\mu$). Acting on the vacuum, it can at best produce a one-particle state,

and by Axiom IIIa we will henceforth assume that it is an interpolating field for the
spinless particle of the theory, with a non-vanishing vacuum to single-particle matrix
element. The latter is in fact determined up to a single overall normalization constant
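of the field, by Lorentz-invariance. Using covariantly normalized one-particle states,
${}_{\rm cov}\langle\vec{k}'|\vec{k}\rangle_{\rm cov} = 2E(k)\,\delta^3(\vec{k}'-\vec{k})$,

$$\begin{aligned}
{}_{\rm cov}\langle\vec{k}|\phi_1(x)|0\rangle &= \int f^{(1)}(y-x)\,{}_{\rm cov}\langle\vec{k}|\phi(y)|0\rangle\,d^4y \\
&= \int f^{(1)}(y-x)\,{}_{\rm cov}\langle\vec{k}|e^{iP\cdot y}\phi(0)e^{-iP\cdot y}|0\rangle\,d^4y \\
&= \int f^{(1)}(y-x)\,e^{ik\cdot y}\,{}_{\rm cov}\langle\vec{k}|\phi(0)|0\rangle\,d^4y \\
&= {}_{\rm cov}\langle\vec{k}|\phi(0)|0\rangle\,\tilde f^{(1)}(\vec{k})\,e^{ik\cdot x}
\end{aligned} \tag{9.110}$$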

12 Also, bound states of two particles, with p2 = 4m2 − E, are specifically excluded.

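The Fourier transform $\tilde f^{(1)}(\vec{k})$ is written as a function of the spatial momentum $\vec{k}$ only,
as it is the on-mass-shell value (at $k_0 = E(k) = \sqrt{\vec{k}^2 + m^2}$) which is relevant here. By
Lorentz-invariance,

$${}_{\rm cov}\langle\vec{k}|\phi(0)|0\rangle = {}_{\rm cov}\langle\vec{k}|U^\dagger(\Lambda)\phi(0)U(\Lambda)|0\rangle = {}_{\rm cov}\langle\Lambda\vec{k}|\phi(0)|0\rangle \tag{9.111}$$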

Choosing Λ to be the boost which takes momentum $\vec{k}$ to zero, we obtain the matrix
element ${}_{\rm cov}\langle\vec{0}|\phi(0)|0\rangle$. This matrix element (again, assumed non-zero by Axiom IIIa) is
thus a constant dependent on the normalization of the basic Heisenberg field φ(x). For
convenience, we may choose the normalization here to agree with that of a free field,
for which ${}_{\rm cov}\langle\vec{k}|\phi(0)|0\rangle = \frac{1}{(2\pi)^{3/2}}$, but it must be remembered that the normalization
of the field is often conventionally fixed by other requirements, such as commutation
relations, which will lead to a different normalization (more on this at the end of
this section, when we derive the asymptotic condition). Thus, we may simply take,
switching back to non-covariantly normalized states (recall $|\vec{k}\rangle_{\rm cov} = \sqrt{2E(k)}\,|\vec{k}\rangle$, cf.
(5.15)),

$$\langle\vec{k}|\phi_1(x)|0\rangle = \frac{1}{(2\pi)^{3/2}\sqrt{2E(k)}}\,\tilde f^{(1)}(\vec{k})\,e^{ik\cdot x} \tag{9.112}$$


We also note at this point that ∂t φ1 (x) is an almost local field if φ1 (x) is, as the time-
derivative of the fast-decreasing C ∞ smearing function f (1) is still fast decreasing.
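Next, let $g(\vec{x},t)$ be a positive-energy solution of the Klein–Gordon equation:

$$g(\vec{x},t) = \int\frac{d^3p}{2E(p)}\,\tilde g(\vec{p})\,e^{i(\vec{p}\cdot\vec{x}-E(p)t)}, \quad E(p) \equiv \sqrt{\vec{p}^{\,2}+m^2} \tag{9.113}$$

The momentum wavefunction $\tilde g(\vec{p})$ will be chosen to be $C^\infty$ and rapidly decreasing
(faster than any power) for large $\vec{p}$. Also, the time-derivative of $g(\vec{x},t)$ has the same
properties, as $E(p)\tilde g(\vec{p})$ is also $C^\infty$ and rapidly decreasing for large $\vec{p}$. Our final smeared
field $\phi_{1,g}(t)$ is now defined as

$$\phi_{1,g}(t) \equiv -i\int d^3x\,\Big\{g(\vec{x},t)\,\overset{\leftrightarrow}{\frac{\partial}{\partial t}}\,\phi_1(\vec{x},t)\Big\} \tag{9.114}$$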

Both terms in (9.114) therefore correspond to the spatial smearing of an almost local
field with a single particle wavefunction solution of the Klein–Gordon equation. The
admittedly somewhat clumsy subscript “1” is maintained as a reminder that this field
has been engineered to produce only one-particle states when it acts on the vacuum:
we shall later define a similar field without this restriction, and will need to be able
to distinguish the two. Note the similarity of (9.114) to the creation operator defined
in (9.36) of Section 9.1. It is easy to see that the state obtained by applying this field
to the vacuum is time-independent:

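$$\frac{\partial}{\partial t}\phi_{1,g}(t)|0\rangle = -i\int d^3x\,\Big\{g(\vec{x},t)\frac{\partial^2}{\partial t^2}\phi_1(\vec{x},t) - \phi_1(\vec{x},t)\frac{\partial^2}{\partial t^2}g(\vec{x},t)\Big\}|0\rangle \tag{9.115}$$

$$= -i\int d^3x\,\Big\{g(\vec{x},t)\frac{\partial^2}{\partial t^2}\phi_1(\vec{x},t) - \phi_1(\vec{x},t)(\vec{\nabla}^2-m^2)g(\vec{x},t)\Big\}|0\rangle \tag{9.116}$$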

In going from (9.115) to (9.116) we have used the fact that the single particle wavefunction
$g(\vec{x},t)$ satisfies the Klein–Gordon equation $\big(\frac{\partial^2}{\partial t^2} - \vec{\nabla}^2 + m^2\big)g = 0$. Transferring
the spatial gradients by an integration by parts (using the fast decrease of g in x-space),
we find

$$\frac{\partial}{\partial t}\phi_{1,g}(t)|0\rangle = -i\int d^3x\,g(\vec{x},t)\,(\Box + m^2)\,\phi_1(\vec{x},t)|0\rangle \tag{9.117}$$

As the energy-momentum operator $P_\mu$ annihilates the vacuum

$$P_\mu P^\mu\,\phi_1(\vec{x},t)|0\rangle = -\Box\,\phi_1(\vec{x},t)|0\rangle \tag{9.118}$$

so we see that (9.117) involves the operator $P_\mu P^\mu - m^2$ acting on the one-particle
state (of mass m) $\phi_1|0\rangle$. The result is clearly zero, establishing the time-independence
of $\phi_{1,g}(t)|0\rangle$:

$$\frac{\partial}{\partial t}\phi_{1,g}(t)|0\rangle = 0 \tag{9.119}$$
A little thought reveals that this property is no longer maintained for multiple
applications of the field φ1,g (at the same time t) to the vacuum. However, the
resultant time-dependent multi-particle states will be shown below to have a well-
defined (strong) limit for t → ±∞. This is, in fact, the central result of the Haag–
Ruelle approach to scattering.
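The momentum wavefunction of the one-particle state $\phi_{1,g}(t)|0\rangle = \phi_{1,g}(0)|0\rangle$,
defined as the overlap of this state with the non-covariantly (continuum) normalized
state $|\vec{k}\rangle$, follows straightforwardly from (9.112, 9.114):

$$\begin{aligned}
\psi_{1,g}(\vec{k}) \equiv \langle\vec{k}|\phi_{1,g}(t)|0\rangle &= -i\int\tilde g(\vec{p})\,\Big(e^{-ip\cdot x}\,\overset{\leftrightarrow}{\frac{\partial}{\partial t}}\,e^{ik\cdot x}\Big)\,\frac{\tilde f^{(1)}(\vec{k})}{(2\pi)^{3/2}\sqrt{2E(k)}}\,d^3x\,\frac{d^3p}{2E(p)} \\
&= (2\pi)^{3/2}\,\frac{\tilde g(\vec{k})\,\tilde f^{(1)}(\vec{k})}{\sqrt{2E(k)}}
\end{aligned} \tag{9.120}$$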

Any desired (fast-decreasing) momentum-space wavefunction of our single-particle
state can evidently be obtained as a product of appropriately chosen factors $\tilde g(\vec{k})$
and $\tilde f^{(1)}(\vec{k})$.
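The steps leading to the result (9.120) can be written out explicitly. The following is a sketch under two assumptions made here for definiteness: the metric signature is (+,−,−,−), so that $k\cdot x = E(k)t - \vec{k}\cdot\vec{x}$ and the plane wave in (9.113) is $e^{-ip\cdot x}$, and the two-sided derivative means $A\overset{\leftrightarrow}{\partial_t}B \equiv A\,\partial_tB - (\partial_tA)\,B$. The matrix element of $\phi_1$ is taken from (9.112).

```latex
\begin{align*}
\langle\vec{k}|\phi_{1,g}(t)|0\rangle
  &= -i\int d^3x\int\frac{d^3p}{2E(p)}\,\tilde g(\vec{p})\,
     \Big(e^{-ip\cdot x}\overset{\leftrightarrow}{\partial_t}e^{ik\cdot x}\Big)\,
     \frac{\tilde f^{(1)}(\vec{k})}{(2\pi)^{3/2}\sqrt{2E(k)}} \\
e^{-ip\cdot x}\overset{\leftrightarrow}{\partial_t}e^{ik\cdot x}
  &= i\big(E(k)+E(p)\big)\,e^{i(k-p)\cdot x} \\
\int d^3x\,e^{i(k-p)\cdot x}
  &= (2\pi)^3\,\delta^3(\vec{p}-\vec{k})\,e^{i(E(k)-E(p))t} \\
\Rightarrow\quad \langle\vec{k}|\phi_{1,g}(t)|0\rangle
  &= \frac{2E(k)}{2E(k)}\,\frac{(2\pi)^3}{(2\pi)^{3/2}\sqrt{2E(k)}}\,
     \tilde g(\vec{k})\,\tilde f^{(1)}(\vec{k})
   = (2\pi)^{3/2}\,\frac{\tilde g(\vec{k})\,\tilde f^{(1)}(\vec{k})}{\sqrt{2E(k)}}
\end{align*}
```

The delta function sets $E(p) = E(k)$, so the residual time-dependent phase is unity, in agreement with the time-independence established in (9.119).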
The scattering theory we shall develop will involve the study of the limits of states
constructed from application to the vacuum of fields of the type φ1,g (t) in the limits
t → ±∞. This will require an understanding of the large time asymptotic behavior of
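the single-particle wavefunctions $g(\vec{x},t)$. Define a velocity vector $\vec{v}$ in the obvious way,
$\vec{v} \equiv \vec{x}/t$, so that

$$g(\vec{x},t) = \int\frac{d^3p}{2E(p)}\,\tilde g(\vec{p})\,e^{it(\vec{p}\cdot\vec{v}-E(p))} \tag{9.121}$$

For fixed $\vec{v}$, large t, this integral is dominated$^{13}$ by a stationary phase point at

$$\frac{\partial}{\partial\vec{p}}\big(\vec{p}\cdot\vec{v}-E(p)\big) = 0 \;\Rightarrow\; \vec{v} = \frac{\vec{p}}{E(p)} \;\Rightarrow\; \vec{p} = m\gamma\vec{v}, \quad \gamma \equiv \frac{1}{\sqrt{1-v^2}} \tag{9.122}$$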
Expanding the integrand around the stationary phase point through quadratic terms
one finds

$$\vec{p}\cdot\vec{v} - E(p) \sim -m/\gamma - \frac{1}{2}\sum_{i,j=1}^{3}(p_i - m\gamma v_i)\,M_{ij}\,(p_j - m\gamma v_j) \tag{9.123}$$

with $M_{ij} = \frac{1}{m\gamma}(\delta_{ij} - v_iv_j)$ a symmetric 3×3 matrix with eigenvalues $\frac{1}{m\gamma}$, $\frac{1}{m\gamma}$ and $\frac{1}{m\gamma^3}$,
and hence determinant $\frac{1}{m^3\gamma^5}$. The Gaussian integration around the stationary phase
point gives us the desired asymptotic behavior at large t:

$$g(\vec{x} = \vec{v}t, t) \sim C|t|^{-3/2}\,e^{-imt/\gamma}\,\big(\gamma^{3/2}\,\tilde g(m\gamma\vec{v}) + O(1/t)\big) \tag{9.124}$$

with C an irrelevant constant containing the mass m, π, etc. What if the momentum-
space wavefunction $\tilde g(\vec{p})$ vanishes at (and in some neighborhood of) the stationary
phase point $\vec{p} = m\gamma\vec{v}$? For example, $\tilde g$ may have compact support in momentum space,
and simply vanish in some neighborhood of $m\gamma\vec{v}$, in which case not only the leading,
but also all higher-order terms in the stationary phase expansion, will vanish. In this
case it can be shown that $g(\vec{v}t,t)$ vanishes for large t faster than any power of t.$^{14}$
We shall, however, only need the weaker result encapsulated in (9.124). Finally, we
note that for $\vec{x}$ (or $\vec{v}$) pointed along the direction of the particle’s motion, the $t^{-3/2}$
falloff is just the expected spreading of the wave-packet due to the non-zero spread in
momentum space, which leads to the particle being delocalized over a region of linear
dimension ∝ t, with $|g|^2t^3 \sim$ constant at large times.
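The stationary-phase data in (9.122)–(9.123) can be checked numerically. The sketch below compares a finite-difference Hessian of E(p) at the stationary point with the matrix M of the text; the helper names, the step size `h`, and the sample values of m and v used to exercise it are all illustrative assumptions.

```python
import math


def energy(p, m):
    """Relativistic energy E(p) = sqrt(p^2 + m^2) for a three-momentum p."""
    return math.sqrt(sum(c * c for c in p) + m * m)


def numerical_hessian(p, m, h=1e-4):
    """Central-difference Hessian of E(p) at the three-momentum p."""
    hess = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            def shifted(si, sj):
                q = list(p)
                q[i] += si * h
                q[j] += sj * h
                return energy(q, m)
            hess[i][j] = (shifted(1, 1) - shifted(1, -1)
                          - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * h * h)
    return hess


def analytic_m(v, m):
    """The matrix M_ij = (delta_ij - v_i v_j) / (m * gamma) quoted after (9.123)."""
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
    return [[((1.0 if i == j else 0.0) - v[i] * v[j]) / (m * gamma)
             for j in range(3)] for i in range(3)]


def det3(a):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
```

Since the quadratic term in (9.123) is minus one half of the Hessian of E(p), agreement of the two matrices confirms both the form of M and the determinant that enters the Gaussian integral behind (9.124).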
Before statement and proof of the Haag Asymptotic theorem, we shall need two
important preliminary results. The first is a fairly direct consequence of the Ruelle
Clustering theorem discussed in the preceding section. The essential physical content
of the description of a physical system in terms of in/out-states is the intuition that
for large times, past or future, the particles become physically isolated and cease to
interact significantly with one another. In the field context, this turns out to be exactly
equivalent to the fast decrease of connected VEVs of almost local fields at large spatial
separation: the large-distance falloff of field expectation values is converted to a large
time falloff via the asymptotic kinematics of the single particle wavefunctions given in
(9.124).
First, let us temporarily use the notation φ1,g (t) to generically denote any field
obtained by spatially smearing an almost local field φ(x, t) with a single particle

13 The observation that such wave-packets obey the correct relativistic kinematics was the primary
motivation for de Broglie’s introduction of the wave hypothesis for particles, and as such the seminal
development initiating the path to Schrödinger’s wave mechanics.
14 For a tight proof of this result, originally due to Ruelle, see the above-cited book of Jost (Jost, 1965),
Chapter 6.

wavefunction $g(\vec{x},t)$:

$$\phi_{1,g}(t) = \int\phi(\vec{x},t)\,g(\vec{x},t)\,d^3x \tag{9.125}$$

We note that our previously defined field $\phi_{1,g}(t)$ in (9.114) is just the sum of two
such fields, with the almost local field being either $\phi_1$ or $\partial_t\phi_1$, and the single-particle
wavefunction either $g(\vec{x},t)$ (defined in (9.113)) or its time-derivative $\partial g(\vec{x},t)/\partial t$. Our first
preliminary result states
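Lemma 9.2 For large times, t → ±∞,

$$M_{m,n}(t) \equiv \langle 0|\phi^\dagger_{1,g'_1}(t)\cdots\phi^\dagger_{1,g'_m}(t)\,\phi_{1,g_1}(t)\cdots\phi_{1,g_n}(t)|0\rangle \longrightarrow
\begin{cases}
O(|t|^{-3/2}), & m \neq n \\
\sum_{\text{pairs } P}\prod_{p\in P}\langle 0|\phi^\dagger_{1,g'_{i_p}}(t)\,\phi_{1,g_{j_p}}(t)|0\rangle + O(t^{-3}), & m = n
\end{cases} \tag{9.126}$$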

To establish this result, we begin by noting that the amplitude Mm,n (t) has a cluster
expansion as a sum of terms in which the m + n fields are distributed into Nc separate
clusters, with the mr φ†1,g (t) and nr φ1,g (t) fields in the rth cluster inside a separate
connected VEV of the form (9.97). The rth cluster will give a contribution of the form

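$$\begin{aligned}
&\int\langle 0|\phi_1(\vec{x}_1,t)\cdots\phi_1(\vec{x}_{m_r+n_r},t)|0\rangle_c\,G_1(\vec{x}_1,t)\cdots G_{m_r+n_r}(\vec{x}_{m_r+n_r},t)\,d^3x_1\cdots d^3x_{m_r+n_r} \\
&= \int\langle 0|\phi_1(\vec{x}_1,t)\,\phi_1(\vec{x}_1+\vec{\xi}_2,t)\cdots\phi_1(\vec{x}_1+\vec{\xi}_{m_r+n_r},t)|0\rangle_c \\
&\qquad\cdot\,G_1(\vec{x}_1,t)\cdots G_{m_r+n_r}(\vec{x}_1+\vec{\xi}_{m_r+n_r},t)\,d^3x_1\,d^3\xi_2\cdots d^3\xi_{m_r+n_r} \\
&= \int d^3x_1\,G_1(\vec{x}_1,t)\int\langle 0|\phi_1(\vec{0},t)\,\phi_1(\vec{\xi}_2,t)\cdots\phi_1(\vec{\xi}_{m_r+n_r},t)|0\rangle_c \\
&\qquad\cdot\,G_2(\vec{x}_1+\vec{\xi}_2,t)\cdots G_{m_r+n_r}(\vec{x}_1+\vec{\xi}_{m_r+n_r},t)\,d^3\xi_2\cdots d^3\xi_{m_r+n_r}
\end{aligned} \tag{9.127}$$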

Here we use the generic notation $G_i(\vec{x},t)$ to denote either a positive-energy wavefunction
$g(\vec{x},t)$ (appearing in the $\phi_{1,g}$ fields) or a negative-energy wavefunction $g^*(\vec{x},t)$
(appearing in the $\phi^\dagger_{1,g}$ fields). Note that the asymptotic behavior at large time of the $g^*$
wavefunctions is given directly by complex conjugating (9.124), and involves the same
$|t|^{-3/2}$ falloff as the g functions. In passing from the first to the second line we have
shifted integration variables by defining $\vec{x}_n \equiv \vec{x}_1 + \vec{\xi}_n$, $n = 2,\ldots,m_r+n_r$. The last line
is obtained using the translation property of the $\phi_1$ fields.
At this point we invoke the Ruelle Clustering theorem, which asserts that the
connected vacuum expectation value appearing in the penultimate line of (9.127) is a
fast decreasing function of the ξ variables. In the asymptotic limit of large t then, each
such variable can be regarded as restricted to a finite range. If we change to velocity
space for the $\vec{x}_1 \equiv \vec{v}_1t$ variable, we note that the $G_n$ functions become asymptotically,
using (9.124),

$$|G_n(t\vec{v}_1+\vec{\xi}_n,t)| \sim |t|^{-3/2}\,\gamma_1^{3/2}\,|\tilde G_n(m\gamma_1\vec{v}_1)|, \quad t \to \pm\infty \tag{9.128}$$

where $\gamma_1 \equiv (1-v_1^2)^{-1/2}$. After changing variables $\int d^3x_1 \to t^3\int_{|\vec{v}_1|<1}d^3v_1$, we obtain,
after the $\vec\xi$ integrals are performed, a factor of $t^3$ from the $\vec x_1$ integral, and a factor
of $t^{-3/2}$ from each of the $m_r+n_r$ wavefunctions. Note that the integral over velocity
space $\vec v_1$ is over the unit ball, with the momentum-space wavefunctions $\tilde G(m\gamma_1\vec v_1)$
decreasing rapidly (in particular, faster than any power of $\gamma_1$) as the boundary
is reached. All the integrals are therefore well-defined, allowing us to replace the
wavefunctions by their asymptotic values and giving the overall asymptotic behavior
for the $r$th cluster

$$
\left|\int \langle 0|\phi_1(\vec x_1,t)\cdots\phi_1(\vec x_{m_r+n_r},t)|0\rangle_c \prod_{i=1}^{m_r+n_r} G_i(\vec x_i,t)\, d^3x_1\cdots d^3x_{m_r+n_r}\right| \sim |t|^{3-\frac{3}{2}(m_r+n_r)}
\tag{9.129}
$$
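As a rough numerical illustration of the velocity-space change of variables, the sketch below maps a speed inside the unit ball to the momentum $m\gamma v$ at which the momentum-space wavefunction is evaluated. The Gaussian `toy_wavefunction` is an assumed stand-in for a fast-decreasing $\tilde G$, and all function names and numerical values are illustrative choices rather than anything fixed by the text.

```python
import math

def lorentz_gamma(speed):
    """gamma = (1 - v^2)^(-1/2) for 0 <= v < 1 (units with c = 1)."""
    if not 0.0 <= speed < 1.0:
        raise ValueError("speed must lie in [0, 1)")
    return 1.0 / math.sqrt(1.0 - speed * speed)

def momentum_from_speed(mass, speed):
    """Magnitude of the momentum m * gamma * v sampled at speed v."""
    return mass * lorentz_gamma(speed) * speed

def toy_wavefunction(momentum, width=1.0):
    """Assumed Gaussian stand-in for a fast-decreasing momentum-space wavefunction."""
    return math.exp(-(momentum / width) ** 2)

# The momentum grows without bound as the speed approaches the edge of the
# unit ball, so the toy wavefunction is strongly suppressed there.
for speed in (0.5, 0.9, 0.99, 0.999):
    p = momentum_from_speed(1.0, speed)
    print(speed, p, toy_wavefunction(p))
```

The boundary $|\vec v_1| \to 1$ corresponds to infinite momentum, which is why the velocity integral over the unit ball causes no trouble for fast-decreasing wavefunctions.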
Multiplying this behavior for all $N_c$ clusters, with $\sum_{r=1}^{N_c} m_r \equiv m$ and $\sum_{r=1}^{N_c} n_r \equiv n$,
we find that the contribution to $M_{m,n}(t)$ from $N_c$ clusters with $N = m+n$ total
fields behaves asymptotically like $t^{3N_c-\frac{3}{2}N}$. As $\langle 0|\phi_1|0\rangle = 0$ ($\phi_1|0\rangle$ is a single-particle
state, hence orthogonal to the vacuum), all clusters must have at least two fields. If
any cluster has three (or more) fields, the total number of fields must satisfy $N \ge 2(N_c-1)+3$,
which implies $3N_c-\frac{3}{2}N \le -\frac{3}{2}$: i.e., a contribution vanishing at least as fast as $t^{-3/2}$ as $t\to\pm\infty$. Evidently,
the only way to obtain a non-vanishing result at large time is to have all clusters
contain exactly two fields, in which case $N = 2N_c$ and the power falloff is eliminated.
Moreover, the only pairings that survive at large time involve a $\phi^\dagger_{1,g}$ field paired with
a $\phi_{1,g}$ field. If we take instead two $\phi_{1,g}$ fields:

$$
\begin{aligned}
&\int \langle 0|\phi_1(\vec x_1,t)\phi_1(\vec x_2,t)|0\rangle_c\, g_1(\vec x_1,t)g_2(\vec x_2,t)\, d^3x_1\, d^3x_2 \\
&\quad= \int \Delta(\vec x_1-\vec x_2)\,\tilde g_1(\vec p_1)\tilde g_2(\vec p_2)\, e^{i(\vec p_1\cdot\vec x_1-E(p_1)t+\vec p_2\cdot\vec x_2-E(p_2)t)}\,\frac{d^3p_1}{2E(p_1)}\frac{d^3p_2}{2E(p_2)}\, d^3x_1\, d^3x_2
\end{aligned}
\tag{9.130}
$$

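Here $\Delta(\vec x_1-\vec x_2) \equiv \langle 0|\phi_1(\vec x_1,t)\phi_1(\vec x_2,t)|0\rangle_c$ is fast decreasing for large $|\vec x_1-\vec x_2|$ by the
Ruelle Clustering theorem, so that its Fourier transform $\tilde\Delta(\vec p)$ is smooth ($C^\infty$).
Changing to center-of-mass variables $\vec X \equiv \frac{\vec x_1+\vec x_2}{2}$, $\vec x \equiv \vec x_1-\vec x_2$, and performing the
spatial integrals, one finds

$$
\begin{aligned}
\langle 0|\phi_{1,g_1}(t)\phi_{1,g_2}(t)|0\rangle_c &= \int \langle 0|\phi_1(\vec x_1,t)\phi_1(\vec x_2,t)|0\rangle_c\, g_1(\vec x_1,t)g_2(\vec x_2,t)\, d^3x_1\, d^3x_2 \\
&= (2\pi)^3\int \tilde g_1(\vec p_1)\tilde g_2(\vec p_2)\, e^{-i(E(p_1)+E(p_2))t}\,\tilde\Delta\!\left(\frac{\vec p_1-\vec p_2}{2}\right)\delta^3(\vec p_1+\vec p_2)\,\frac{d^3p_1}{2E(p_1)}\frac{d^3p_2}{2E(p_2)} \\
&= (2\pi)^3\int \tilde g_1(\vec p_1)\tilde g_2(-\vec p_1)\, e^{-2iE(p_1)t}\,\tilde\Delta(\vec p_1)\,\frac{d^3p_1}{4E(p_1)^2}
\end{aligned}
\tag{9.131}
$$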
The smooth momentum dependence of all factors in the integral then implies the fast
decrease (faster than any power) of (9.131) as $t\to\pm\infty$. A similar result obtains for a
cluster consisting of two $\phi^\dagger_{1,g}$ fields. On the other hand, if we take a pairing of a $\phi^\dagger_{1,g}$
with a $\phi_{1,g}$ field, the complex time exponentials cancel, and the result is non-vanishing,
and time-independent. Finally, we note that if $m = n$, allowing a complete pairing, as
in the second line of Lemma 9.2, the remainder term must involve at least two clusters
with three (or more) fields, and hence a falloff of $t^{-3}$ at large time. This concludes the
demonstration of Lemma 9.2.
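The power counting used in this demonstration can be checked mechanically. The following is a minimal sketch, assuming only the rule stated above that a cluster of $k$ fields contributes a factor $|t|^{3-\frac{3}{2}k}$; the function name `cluster_exponent` is a hypothetical helper introduced for illustration.

```python
from fractions import Fraction

def cluster_exponent(cluster_sizes):
    """Exponent p in t**p for a partition of the N fields into clusters.

    Each cluster of size k contributes 3 - (3/2) k, so the total is
    3 * N_c - (3/2) * N, with N_c clusters and N fields in all.
    """
    if any(size < 2 for size in cluster_sizes):
        # A one-field cluster is <0|phi_1|0>, which vanishes.
        raise ValueError("every cluster must contain at least two fields")
    n_clusters = len(cluster_sizes)
    n_fields = sum(cluster_sizes)
    return 3 * n_clusters - Fraction(3, 2) * n_fields

print(cluster_exponent([2, 2, 2]))  # all pairs: no power falloff
print(cluster_exponent([2, 2, 3]))  # one triple among pairs
print(cluster_exponent([3, 3]))     # two triples, as in the m = n remainder
```

Only the all-pairs partition gives exponent zero; a single triple costs $t^{-3/2}$, and two triples cost $t^{-3}$, matching the remainder quoted for the case $m = n$.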
Our second preliminary result concerns the symmetry under permutation of the
states obtained by applying the φ1,g fields to the vacuum. We state it as the following
Lemma.
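Lemma 9.3 Define the time-dependent state $|\Psi,t\rangle$ as follows:

$$
|\Psi,t\rangle \equiv \phi_{1,g_1}(t)\phi_{1,g_2}(t)\cdots\phi_{1,g_m}(t)|0\rangle
\tag{9.132}
$$

With $P$ an arbitrary permutation of the sequence $1, 2, \ldots, m$, define

$$
|\Psi',t\rangle \equiv \phi_{1,g_{P(1)}}(t)\phi_{1,g_{P(2)}}(t)\cdots\phi_{1,g_{P(m)}}(t)|0\rangle
\tag{9.133}
$$

Then for large time $t$, the distance between these two state vectors has the asymptotic
behavior

$$
\sqrt{\big(|\Psi,t\rangle-|\Psi',t\rangle,\; |\Psi,t\rangle-|\Psi',t\rangle\big)} \sim \frac{1}{|t|^{3/2}}
\tag{9.134}
$$

This result follows as an immediate consequence of Lemma 9.2, as the squared distance
between the states is

$$
\begin{aligned}
\big(|\Psi,t\rangle-|\Psi',t\rangle,\; |\Psi,t\rangle-|\Psi',t\rangle\big) = {}& \langle\Psi,t|\Psi,t\rangle + \langle\Psi',t|\Psi',t\rangle \\
& - \langle\Psi,t|\Psi',t\rangle - \langle\Psi',t|\Psi,t\rangle
\end{aligned}
\tag{9.135}
$$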

Each of the inner products appearing on the right-hand side of (9.135) is an amplitude
of the form $M_{m,m}(t)$, so by Lemma 9.2 approaches asymptotically a sum of $m$ cluster
pairs which is symmetric under permutation of any two fields. Thus the leading terms
cancel, leaving a remainder of order $t^{-3}$, whence Lemma 9.3.
The proof of the Haag Asymptotic theorem follows very quickly from these results.
First, the theorem itself.
Theorem 9.4 The time-dependent state vector

$$
|\Psi,t\rangle \equiv \phi_{1,g_1}(t)\phi_{1,g_2}(t)\cdots\phi_{1,g_n}(t)|0\rangle
\tag{9.136}
$$

converges strongly in the limit $t\to-\infty$ to the $n$-particle in-state

$$
|\Psi\rangle_{\mathrm{in}} = |g_1,g_2,..,g_n\rangle_{\mathrm{in}} \equiv \int \psi_{1,g_1}(\vec k_1)\cdots\psi_{1,g_n}(\vec k_n)\,|\vec k_1,..,\vec k_n\rangle_{\mathrm{in}}\, d^3k_1\cdots d^3k_n
\tag{9.137}
$$

with momentum wavefunctions $\psi_{1,g_1}(\vec k),..\psi_{1,g_n}(\vec k)$ (defined in (9.120)); thus, the
states in (9.137) have the inner product structure corresponding to the continuum
non-covariant normalization of (5.21) (with plus signs everywhere as we are
considering only bosons here). All of the above holds with the replacements $t\to+\infty$
and “in”→“out” everywhere.
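The strong convergence is easily established by taking the time-derivative of $|\Psi,t\rangle$:

$$
\begin{aligned}
\frac{\partial}{\partial t}|\Psi,t\rangle = {}& \Big(\frac{\partial}{\partial t}\phi_{1,g_1}(t)\Big)\phi_{1,g_2}(t)..\phi_{1,g_n}(t)|0\rangle + \phi_{1,g_1}(t)\Big(\frac{\partial}{\partial t}\phi_{1,g_2}(t)\Big)..\phi_{1,g_n}(t)|0\rangle \\
& + \ldots + \phi_{1,g_1}(t)\phi_{1,g_2}(t)..\Big(\frac{\partial}{\partial t}\phi_{1,g_n}(t)\Big)|0\rangle
\end{aligned}
\tag{9.138}
$$

The final term on the right-hand side of (9.138) vanishes by (9.119) as the time-derivative
of $\phi_{1,g_n}$ acts directly on the vacuum state. However, all the other $(n-1)$
terms correspond to permutations of a similar term in which the field with the time-derivative
is moved to the extreme right, and therefore, by Lemma 9.3, have norm
of order $|t|^{-3/2}$. But

$$
\Big\|\frac{\partial}{\partial t}|\Psi,t\rangle\Big\| < \frac{C}{|t|^{3/2}} \;\Rightarrow\; \big\|\,|\Psi,t\rangle-|\Psi,t'\rangle\,\big\| < \frac{2C}{|T|^{1/2}}, \qquad t, t' > T.
$$

Thus the sequence of states $|\Psi,t\rangle$ for large $t$ is a Cauchy sequence in the Hilbert space
and must converge in norm to a limit vector $|\Psi\rangle_{\mathrm{in}}$.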
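The step from the bound on the time-derivative to the Cauchy property is just the integral $\int_T^\infty C s^{-3/2}\,ds = 2C/T^{1/2}$. The sketch below checks this numerically for illustrative values; the constant $C = 1$, the times, and the function names are assumptions made for the example, and positive times are used for simplicity.

```python
def rate_bound(constant, time):
    """Upper bound C / |t|^(3/2) on the norm of the time-derivative of the state."""
    return constant / abs(time) ** 1.5

def accumulated_change(constant, t_start, t_end, steps=100_000):
    """Midpoint-rule integral of the rate bound between two times of the same sign."""
    width = (t_end - t_start) / steps
    total = 0.0
    for i in range(steps):
        total += rate_bound(constant, t_start + (i + 0.5) * width)
    return abs(total * width)

def cauchy_bound(constant, cutoff):
    """Closed-form tail integral of C s^(-3/2) from |T| to infinity: 2C / |T|^(1/2)."""
    return 2.0 * constant / abs(cutoff) ** 0.5

change = accumulated_change(1.0, 100.0, 10_000.0)
print(change, cauchy_bound(1.0, 100.0))
```

However far apart the two times are taken beyond the cutoff, the accumulated change stays below `cauchy_bound`, which itself shrinks as the cutoff grows.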
The second part of the theorem, establishing that the in- (or out-)states defined
as such limits have the appropriate inner-product structure to define a Fock space
of independent many-particle states, follows directly from Lemma 9.3, as the inner
product ${}_{\mathrm{in}}\langle g'_1 g'_2 \ldots g'_n|g_1 g_2 \ldots g_n\rangle_{\mathrm{in}}$ is simply the limit for $t\to-\infty$ of the amplitude
$M_{n,n}(t)$ defined in Lemma 9.2, the result of which is a symmetric sum of overlaps
of single-particle states (see Problem 5).
The physical interpretation of the states resulting from the limiting processes of
Theorem 9.4 is fairly clear. The smeared fields correspond to wave-packets which
overlap less and less as the state is run either forwards or backwards in time. The
asymptotic convergence can be greatly improved (see footnote 15), from the $t^{-3/2}$ behavior used
above, to faster than any power of the time, if the momentum wavefunctions of
the particles have non-overlapping support in momentum space: thus $\tilde g_i(\vec p) \neq 0$ only if
$\tilde g_j(\vec p) = 0$ for $i \neq j$. In this situation, the particle velocities are “pointed” in different
directions and the separation at large time is ensured, without recourse to wave-packet
spreading. In particular, in this case, the reader may easily verify (Problem 6) that
the coordinate space overlap of the single-particle wavefunctions of different particles
remains exactly zero at all times. Thus the states constructed by the Haag procedure
satisfy our intuitive picture of widely separated free particles in either the far past or
the far future. We also note here without proof that the in- and out-states as defined
in Theorem 9.4 can be shown to have the correct transformation properties under the
Poincaré operators U (Λ, a).
It is extremely important to realize that the construction of asymptotic multi-particle
states by the limiting procedure of Theorem 9.4 remains perfectly valid
if the underlying field $\phi(x)$ is itself a more general type of almost local field, such
as the bilocal operator of (9.81), provided only that Axiom IIIa holds: namely, that
the one particle state of the stable particle whose in- and out-states we wish to
construct has a non-vanishing vacuum to single-particle matrix element of this field,
$\langle\vec k|\phi(x)|0\rangle \neq 0$. For example, the field $\phi(x)$ may have to be constructed from products
of the “elementary” fields appearing in the Hamiltonian (or, more commonly, the
Lagrangian, cf. Chapter 12) specifying the dynamics of the theory, if it corresponds
to a stable bound state of the theory. We shall return to a detailed discussion of these
issues in Section 9.6.

15 For further details, the reader is encouraged to consult the technical literature: e.g., the above-cited
book of Jost (Jost, 1965).

The Haag–Ruelle approach to scattering is ideally suited for understanding the
emergence of the clustering property of the S-matrix from the underlying field-theoretic
behavior. Consider the process depicted in Fig. 6.4, in which n1 + n2 particles scatter,
and for simplicity of notation we have assumed that the scattering processes are
number-conserving. Moreover, the wave-packets are constructed in such a way that
$n_1$ of the particles (with initial and final wavefunctions $g_i$, $g'_i$, $i = 1,..n_1$) are localized,
both before and after their interaction, around the spacetime point $(0,-\vec\Delta)$, and the
other $n_2$ particles (with wavefunctions $h_i$, $h'_i$, $i = 1,2,..n_2$) around the point $(0,+\vec\Delta)$.
The desired clustering property of the S-matrix amounts to the statement that when
$|\vec\Delta|$ is much greater than the size of the spacetime regions over which the two sets
of particles interact, the full S-matrix amplitude should factorize into a product
of independent scattering amplitudes for the “g” and “h” particles. Now, by the
Haag Asymptotic theorem, the amplitude for the full process (an $n_1+n_2 \to n_1+n_2$
scattering amplitude) is given by

$$
\begin{aligned}
S_{n_1+n_2\to n_1+n_2} = \lim_{T\to\infty} {}& \langle 0|\phi^\dagger_{1,g'_1}(+T)..\phi^\dagger_{1,g'_{n_1}}(+T)\,\phi^\dagger_{1,h'_1}(+T)..\phi^\dagger_{1,h'_{n_2}}(+T) \\
& \cdot\, \phi_{1,g_1}(-T)..\phi_{1,g_{n_1}}(-T)\,\phi_{1,h_1}(-T)..\phi_{1,h_{n_2}}(-T)|0\rangle
\end{aligned}
\tag{9.139}
$$
The limit T → ∞ is, of course, a mathematical formality: in a typical high-energy
scattering experiment the particles interact only in a spacetime region of microscopic
dimensions. So the limit is very rapidly attained already when T is some very small
value (e.g., $10^{-23}$ seconds for a typical strong interaction scattering event). We shall
therefore fix $T$ at some very small but finite value, at which point the S-matrix
amplitude (9.139) has achieved its limit value to any preassigned level of precision,
and enquire about the behavior of the combined scattering amplitude when the two
groups of particles are separated by distance $2|\vec\Delta| \gg T$. In this limit the commutator
of any almost local field of “g” type appearing in (9.139) with a field of “h” type falls
faster than any inverse power of $|\vec\Delta|$, so we may rearrange (9.139) (for fixed $T$) as
follows:

$$
\begin{aligned}
S_{n_1+n_2\to n_1+n_2} = {}& \langle 0|\phi^\dagger_{1,g'_1}(+T)..\phi^\dagger_{1,g'_{n_1}}(+T)\,\phi_{1,g_1}(-T)..\phi_{1,g_{n_1}}(-T) \\
& \cdot\, \phi^\dagger_{1,h'_1}(+T)..\phi^\dagger_{1,h'_{n_2}}(+T)\,\phi_{1,h_1}(-T)..\phi_{1,h_{n_2}}(-T)|0\rangle + o(|\vec\Delta|^{-N}) \\
\equiv {}& \langle 0|\Phi_g(0,-\vec\Delta)\,\Phi_h(0,+\vec\Delta)|0\rangle + o(|\vec\Delta|^{-N})
\end{aligned}
\tag{9.140}
$$
Note that the two groups of fields, those involving “g” wavefunctions and those
involving “h” wavefunctions, can be combined into the single operators $\Phi_g(0,-\vec\Delta)$ and
$\Phi_h(0,+\vec\Delta)$ which are almost local operators localized around the indicated spacetime
points. For example, the product field $\Phi_g(0,\vec\Delta)$ can be viewed as the smearing of the
multi-local product $\phi(x_1)\phi(x_2)\ldots\phi(x_{2n_1})$ with a smearing function of Schwartz type
falling fast for $|\vec x_i-\vec\Delta|$ much larger than all other distance scales in the problem.
Now, by definition,

$$
\begin{aligned}
\langle 0|\Phi_g(0,-\vec\Delta)\,\Phi_h(0,+\vec\Delta)|0\rangle = {}& \langle 0|\Phi_g(0,-\vec\Delta)|0\rangle\langle 0|\Phi_h(0,+\vec\Delta)|0\rangle \\
& + \langle 0|\Phi_g(0,-\vec\Delta)\,\Phi_h(0,+\vec\Delta)|0\rangle_c
\end{aligned}
\tag{9.141}
$$

The connected term on the right-hand side of (9.141), by the Ruelle Clustering theorem
9.1, falls faster than any inverse power of the cluster separation $|\vec\Delta|$, whence the
desired factorization of the S-matrix amplitude for the combined process into S-matrix
amplitudes representing the separate scattering of “g”-type and “h”-type particles.
We conclude this section by employing the Haag–Ruelle theory to derive the long
promised direct connection between the interpolating Heisenberg field φ(x) and the
free in (resp. out) fields φin (x) (resp. φout (x)) defined in Section 9.1. This connection,
usually referred to as the Asymptotic Condition, was already “derived” heuristically
by manipulations involving interaction-picture operators (see (9.45)). Here we shall see
that the precise result we need, which serves as the starting point for the extremely
important scattering theory formalism of Lehmann, Symanzik, and Zimmermann to
which we turn in the next section, follows from the Haag–Ruelle theory (and hence,
from the axioms of Section 9.2) without any reference to an interaction picture
or perturbation theory. We begin by defining a smeared field $\phi_g(t)$, analogous to
our $\phi_{1,g}(t)$ fields, except that the initial smearing function $f^{(1)}(x)$ is now taken to
be a general Schwartz function $f(x)$ of fast decrease, with four-dimensional Fourier
transform $\tilde f(k)$ which is not restricted to a region of support sandwiching the one-particle
mass hyperboloid as previously. Eventually, in fact, we may even allow $f(x)$ to
approach a δ-function (i.e., take $\tilde f(k)$ constant). For the time being though, our new
field $\phi_g(t)$ will be obtained by smearing the almost local field $\phi_f(x)$, defined exactly as
in (9.80), with a positive-energy single particle wavefunction $g(\vec x,t)$ as in (9.113). We
note that $\phi_g(t)|0\rangle$ is not any more a single particle state, nor is it time-independent.
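However, as far as the preconditions for Lemma 9.2 are concerned, $\phi_g(t)$ is just as good
as our previous $\phi_{1,g}(t)$ field. We may therefore conclude that, picking for definiteness
the limit for large negative time $t\to-\infty$,

$$
\begin{aligned}
&\langle 0|\phi^\dagger_{1,g'_1}(t)\ldots\phi^\dagger_{1,g'_m}(t)\,\phi_g(t)\,\phi_{1,g_1}(t)\ldots\phi_{1,g_n}(t)|0\rangle \\
&\quad\to \sum_{i=1}^{m} \langle 0|\phi^\dagger_{1,g'_i}(t)\phi_g(t)|0\rangle_c\, \langle 0|\phi^\dagger_{1,g'_1}(t)..\widehat{\phi^\dagger_{1,g'_i}(t)}\ldots\phi^\dagger_{1,g'_m}(t)\,\phi_{1,g_1}(t)\ldots\phi_{1,g_n}(t)|0\rangle \\
&\qquad + \sum_{j=1}^{n} \langle 0|\phi_g(t)\phi_{1,g_j}(t)|0\rangle_c\, \langle 0|\phi^\dagger_{1,g'_1}(t)\ldots\phi^\dagger_{1,g'_m}(t)\,\phi_{1,g_1}(t)..\widehat{\phi_{1,g_j}(t)}\ldots\phi_{1,g_n}(t)|0\rangle, \\
&\qquad (t\to-\infty)
\end{aligned}
\tag{9.142}
$$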

with remainder terms of relative order $|t|^{-3/2}$. Fields omitted from the second vacuum
expectation value on each line (and coupled to the far past field $\phi_g(t)$) are indicated
by the hat notation. The vanishing of $\langle 0|\phi_{1,g}(t)|0\rangle$ was previously assured by the fact
that $\phi_{1,g}(t)|0\rangle$ is a one-particle state. This is no longer true for the new field $\phi_g$:
here we must explicitly assume the vanishing of the VEV of $\phi(x)$. For example, we
may assume that our basic interpolating field transforms non-trivially under some
symmetry unbroken by the vacuum (e.g., there is no spontaneous symmetry-breaking
along the lines discussed in Section 8.4: if there is such a symmetry-breaking, we must
shift the field as described there to remove its vacuum expectation value). Thus, the
dominant terms at large time involve only clusters with pairings of the $\phi_g(t)$ field with
either a $\phi^\dagger_{1,g'_i}(t)$ field or a $\phi_{1,g_j}(t)$ field, which are then omitted from the rest of the
amplitude, as indicated by the notation $[\phi^\dagger_{1,g'_i}(t)]$ or $[\phi_{1,g_j}(t)]$. We have reassembled
the clusters not containing the special field $\phi_g(t)$ into full amplitudes (i.e., without the
“c” subscript). Furthermore, we may also eliminate the connected requirement on the
two-field amplitudes appearing in (9.142), as $\langle 0|\phi^\dagger_{1,g'_i}(t)|0\rangle = 0$, $\langle 0|\phi_{1,g_j}(t)|0\rangle = 0$,

$$
\langle 0|\phi^\dagger_{1,g'_i}(t)\phi_g(t)|0\rangle_c = \langle 0|\phi^\dagger_{1,g'_i}(t)\phi_g(t)|0\rangle
\tag{9.143}
$$

$$
\langle 0|\phi_g(t)\phi_{1,g_j}(t)|0\rangle_c = \langle 0|\phi_g(t)\phi_{1,g_j}(t)|0\rangle
\tag{9.144}
$$

However, recalling that $\phi_{1,g}$ fields connect only to one-particle states, $\phi_{1,g_j}(t)|0\rangle$ is
simply the time-independent state $|g_j\rangle_{\mathrm{in}} = \int\psi_{1,g_j}(\vec k)|\vec k\rangle_{\mathrm{in}}\, d^3k$. Likewise, $\langle 0|\phi^\dagger_{1,g'_i}(t) =
\int\psi^*_{1,g'_i}(\vec k)\;{}_{\mathrm{in}}\langle\vec k|\, d^3k$. On the other hand, the vacuum to one-particle matrix elements of
$\phi_g(t)$ are determined up to normalization by Lorentz-invariance, as $\phi_g$ is obtained by
smearing a local scalar field $\phi(x)$, assumed to be an interpolating field for the particle
in question, so that, exactly as in (9.112), but replacing $f^{(1)} \to f$,

$$
{}_{\mathrm{in}}\langle\vec k|\phi_f(x)|0\rangle = \frac{Z^{1/2}}{(2\pi)^{3/2}\sqrt{2E(k)}}\,\tilde f(k)\, e^{ik\cdot x}
\tag{9.145}
$$

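where we have allowed an arbitrary unfixed normalization factor, conventionally called
$Z^{1/2}$, in case the normalization of the basic local field $\phi(x)$ is fixed by some independent
(non-linear) constraint. The four-momentum $k$ appearing in (9.145) is on mass shell for
the particle in $|\vec k\rangle_{\mathrm{in}}$, i.e., $k_0 = E(k) = \sqrt{\vec k^2+m^2}$, $e^{ik\cdot x} = e^{i(E(k)t-\vec k\cdot\vec x)}$. The subsequent
smearing of $\phi_f$ with the single-particle positive-energy solution $g$ to yield $\phi_g(t)$ implies

$$
\begin{aligned}
\langle 0|\phi^\dagger_{1,g'_i}(t)\phi_g(t)|0\rangle &= \int\psi^*_{1,g'_i}(\vec k)\;{}_{\mathrm{in}}\langle\vec k|\phi_g(t)|0\rangle\, d^3k \\
&= -i\int\psi^*_{1,g'_i}(\vec k)\Big\{g(\vec x,t)\,\frac{\overleftrightarrow{\partial}}{\partial t}\;{}_{\mathrm{in}}\langle\vec k|\phi_f(\vec x,t)|0\rangle\Big\}\, d^3x\, d^3k \\
&= -i\int\psi^*_{1,g'_i}(\vec k)\,\tilde g(\vec p)\Big\{e^{i(\vec p\cdot\vec x-E(p)t)}\,\frac{\overleftrightarrow{\partial}}{\partial t}\;{}_{\mathrm{in}}\langle\vec k|\phi_f(\vec x,t)|0\rangle\Big\}\,\frac{d^3p}{2E(p)}\, d^3x\, d^3k
\end{aligned}
\tag{9.146}
$$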

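Next we need the orthogonality properties of the solutions of the Klein–Gordon
equation (both positive and negative energy) which follow from the integrals

$$
\int e^{i(\vec p\cdot\vec x-E(p)t)}\,\frac{\overleftrightarrow{\partial}}{\partial t}\, e^{i(E(k)t-\vec k\cdot\vec x)}\, d^3x = 2i(2\pi)^3E(k)\,\delta^3(\vec p-\vec k)
\tag{9.147}
$$

$$
\int e^{i(\vec p\cdot\vec x-E(p)t)}\,\frac{\overleftrightarrow{\partial}}{\partial t}\, e^{-i(E(k)t-\vec k\cdot\vec x)}\, d^3x = 0
\tag{9.148}
$$

The second integral vanishes, as it is proportional to $(E(k)-E(p))\,\delta^3(\vec p+\vec k)$. Inserting
(9.147) in (9.146) we obtain

$$
\langle 0|\phi^\dagger_{1,g'_i}(t)\phi_g(t)|0\rangle = Z^{1/2}\int\psi^*_{1,g'_i}(\vec k)\,\psi_g(\vec k)\, d^3k
\tag{9.149}
$$

with $\psi_g(\vec k)$ defined analogously to $\psi_{1,g}(\vec k)$ as in (9.120), but with $\tilde f(k)$ replacing $\tilde f^{(1)}(k)$,

$$
\psi_g(\vec k) = (2\pi)^{3/2}\,\frac{\tilde g(\vec k)\tilde f(k)}{\sqrt{2E(k)}}
\tag{9.150}
$$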

On the other hand, $\langle 0|\phi_g(t)\phi_{1,g_j}(t)|0\rangle = \langle 0|\phi_g(t)|g_j\rangle_{\mathrm{in}}$ involves the integral in (9.148)
and vanishes identically.
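The time-derivative structure behind (9.147) and (9.148) can be checked pointwise: for two exponentials $f$ and $g$, the combination $f\,\partial_t g - (\partial_t f)\,g$ equals $i(E(k)+E(p))fg$ when both carry the phases of (9.147), and $-i(E(k)-E(p))fg$ for the pairing of (9.148). The sketch below verifies this with central differences in one spatial dimension; the reduction to one dimension, the numerical values, and all names are assumptions made for illustration, with $k = -p$ chosen so that the second combination sits on the support of $\delta^3(\vec p+\vec k)$.

```python
import cmath
import math

def energy(momentum, mass):
    """Relativistic energy E = sqrt(p^2 + m^2)."""
    return math.sqrt(momentum * momentum + mass * mass)

def two_sided_derivative(f, g, t, step=1e-5):
    """f (dg/dt) - (df/dt) g at time t, using central differences."""
    df = (f(t + step) - f(t - step)) / (2 * step)
    dg = (g(t + step) - g(t - step)) / (2 * step)
    return f(t) * dg - df * g(t)

# Illustrative one-dimensional values (assumed, not from the text).
mass, p, k, x, t = 1.0, 0.7, -0.7, 0.3, 0.2

f = lambda s: cmath.exp(1j * (p * x - energy(p, mass) * s))
g_same = lambda s: cmath.exp(1j * (energy(k, mass) * s - k * x))    # as in (9.147)
g_mixed = lambda s: cmath.exp(-1j * (energy(k, mass) * s - k * x))  # as in (9.148)

same_sign = two_sided_derivative(f, g_same, t)
expected_same = 1j * (energy(k, mass) + energy(p, mass)) * f(t) * g_same(t)
mixed_sign = two_sided_derivative(f, g_mixed, t)

print(abs(same_sign - expected_same), abs(mixed_sign))
```

With equal energies the first factor reduces to $2iE(k)$, the coefficient appearing in (9.147), while the second combination vanishes, as stated for (9.148).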
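Returning once again to (9.142), we see that the second line vanishes identically,
so by applying Theorem 9.4 to the left-hand side, and to the multi-particle amplitudes
multiplying $\langle 0|\phi^\dagger_{1,g'_i}(t)\phi_g(t)|0\rangle$, we obtain

$$
\begin{aligned}
{}_{\mathrm{in}}\langle g'_1,..,g'_m|\phi_g(t)|g_1,..,g_n\rangle_{\mathrm{in}} \to {}& Z^{1/2}\sum_{i=1}^{m}\int\psi^*_{1,g'_i}(\vec k)\,\psi_g(\vec k)\, d^3k \\
& \cdot\;{}_{\mathrm{in}}\langle g'_1,..[g'_i]..,g'_m|g_1,g_2,..g_n\rangle_{\mathrm{in}}, \qquad t\to-\infty
\end{aligned}
\tag{9.151}
$$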

We now consider a smeared field $\phi_{\mathrm{in},g}(t)$ defined in complete analogy to $\phi_g(t)$ (with
the same smearing functions), but starting from the free local field $\phi_{\mathrm{in}}(x)$ defined in
(9.30) rather than the interacting field $\phi(x)$. Recall that both $\phi_{\mathrm{in}}(x)$ and $\phi(x)$ are
Heisenberg fields, evolving with the dynamics specified by the full Hamiltonian. The
contribution to $\phi_{\mathrm{in},g}(t)$ from the destruction operator in $\phi_{\mathrm{in}}$ is found to vanish using
(9.148), and the creation term becomes time-independent:

$$
\phi_{\mathrm{in},g}(t) = \int\psi_g(\vec k)\, a^\dagger_{\mathrm{in}}(\vec k)\, d^3k
\tag{9.152}
$$

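Sandwiching this result between the bra- and ket-states of (9.151), and recalling that
creation operators acting to the left destroy particles,

$$
\begin{aligned}
{}_{\mathrm{in}}\langle g'_1,..,g'_m|\phi_{\mathrm{in},g}(t)|g_1,..,g_n\rangle_{\mathrm{in}} = {}& \int\psi^*_{1,g'_1}(\vec k_1)\psi^*_{1,g'_2}(\vec k_2)\cdots\psi^*_{1,g'_m}(\vec k_m)\,\psi_g(\vec k) \\
& \cdot\;{}_{\mathrm{in}}\langle\vec k_1,\vec k_2,..\vec k_m|a^\dagger_{\mathrm{in}}(\vec k)|g_1,...,g_n\rangle_{\mathrm{in}}\, d^3k_1..d^3k_m\, d^3k \\
= {}& \sum_{i=1}^{m}\int\psi^*_{1,g'_1}(\vec k_1)\cdots\psi^*_{1,g'_m}(\vec k_m)\,\psi_g(\vec k)\,\delta^3(\vec k-\vec k_i) \\
& \cdot\;{}_{\mathrm{in}}\langle\vec k_1..[\vec k_i]..\vec k_m|g_1,...g_n\rangle_{\mathrm{in}}\, d^3k_1...d^3k_m\, d^3k \\
= {}& \sum_{i=1}^{m}\int\psi^*_{1,g'_i}(\vec k)\,\psi_g(\vec k)\, d^3k\;\cdot\;{}_{\mathrm{in}}\langle g'_1,..[g'_i]..,g'_m|g_1,g_2,..g_n\rangle_{\mathrm{in}}
\end{aligned}
\tag{9.153}
$$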

which is precisely the same as the limiting behavior on the right-hand side of (9.151),
up to the normalization factor of $Z^{1/2}$. The bra and ket in-states in (9.151) and (9.153)
run over a dense subset of the Hilbert space $H_{\mathrm{in}}$ (indeed, choosing the $\psi_{1,g}(\vec k)$ from
a countable basis of $L^2(\mathbb{R}^3)$, they run over a countable basis of $H_{\mathrm{in}}$), so, provided
Axiom IIIb (asymptotic completeness) holds, and we are allowed to identify $H_{\mathrm{in}}$ with
the full Hilbert space $H$ of the theory, the stated equality in the limit amounts to weak
convergence (i.e., matrix element by matrix element) of the smeared interpolating
field $\phi_g(t)$ to the smeared (and time-independent) free in-field $\phi_{\mathrm{in},g}$ as $t\to-\infty$. All
of the above holds, of course, in the far future limit $t\to+\infty$, with “in” replaced by
“out” everywhere. We also note that as we shall be employing the asymptotic limit of
the Heisenberg field $\phi(x)$ only in matrix elements between (normalizable) states, the
initial smearing of $\phi(x)$ is unnecessary (see the comments following Axiom IIa in the
preceding section): we may set $\tilde f(k) = 1$, $\phi_f(x) = \phi(x)$ in (9.145).
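The weak equivalence of $\phi_g(t)$ and $\phi_{\mathrm{in},g}(t)$ (resp. $\phi_{\mathrm{out},g}(t)$) at large negative (resp.
positive) times, in other words, the limiting behavior just established, with complete
mathematical rigor

$$
{}_{\mathrm{in}}\langle\beta|\phi_g(t)|\alpha\rangle_{\mathrm{in}} \to Z^{1/2}\;{}_{\mathrm{in}}\langle\beta|\phi_{\mathrm{in},g}(t)|\alpha\rangle_{\mathrm{in}}, \qquad t\to-\infty
\tag{9.154}
$$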

with the corresponding result for the far future limit, is commonly referred to as the
Asymptotic Condition, and will be the critical starting point for our treatment of the
scattering theory of Lehmann, Symanzik, and Zimmermann (LSZ) in the following
section. It replaces our previous heuristic result (9.46). The Asymptotic Condition
assures us that all of the information contained in the in- and out-states of the theory,
and in particular in their overlap, the S-matrix, is already implicit in the behavior
of the interpolating Heisenberg field(s) of the theory. The LSZ theory, and in particular
the explicit link it provides between the S-matrix and vacuum expectation values of
the associated interpolating fields, is of absolutely central importance in modern field
theory. Indeed, the formula it gives us for the S-matrix in terms of expectation values
of time-ordered Heisenberg fields will be of much greater practical utility, both within
the confines of perturbation theory and beyond, than expressions of the type (9.139)
obtained directly from the Haag–Ruelle approach.

9.4 Asymptotic formalism II: the Lehmann–Symanzik–Zimmermann (LSZ) theory
Our discussion of the Haag–Ruelle theory has allowed us to establish, on the basis
of the very general axioms listed in Section 9.2, a precise connection between a local
Heisenberg field φ(x) and the asymptotic (in- and out-)multi-particle states of the
theory, provided that the given Heisenberg field has a non-vanishing matrix element
from the vacuum to the single-particle state of the particle in question:

$$
{}_{\mathrm{in}}\langle\vec k|\phi(x)|0\rangle = \frac{Z^{1/2}}{(2\pi)^{3/2}\sqrt{2E(k)}}\, e^{ik\cdot x}, \qquad Z \neq 0
\tag{9.155}
$$

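We have been able to verify (cf. (9.154)), without any recourse to the interaction picture
or perturbation theory, that for arbitrary normalizable in-states $|\alpha\rangle_{\mathrm{in}}$, $|\beta\rangle_{\mathrm{in}}$, with
$g(\vec x,t)$ a positive-energy solution of the Klein–Gordon equation, as in (9.113),

$$
\begin{aligned}
&{}_{\mathrm{in}}\langle\beta|\,{-i}\int d^3x\,\Big\{g(\vec x,t)\,\frac{\overleftrightarrow{\partial}}{\partial t}\,\phi(\vec x,t)\Big\}|\alpha\rangle_{\mathrm{in}} \\
&\quad\to Z^{1/2}\;{}_{\mathrm{in}}\langle\beta|\,{-i}\int d^3x\,\Big\{g(\vec x,t)\,\frac{\overleftrightarrow{\partial}}{\partial t}\,\phi_{\mathrm{in}}(\vec x,t)\Big\}|\alpha\rangle_{\mathrm{in}}, \qquad t\to-\infty
\end{aligned}
\tag{9.156}
$$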

The basic reason for this limiting behavior is that the smeared Heisenberg field on
the left, sandwiched between in-states which correspond physically in the far past to
states with widely separated free particles, samples a localized region of spacetime
which is effectively the vacuum, and when appropriately folded with a positive-energy
solution of the Klein–Gordon equation, acts like a free field in creating an additional
free particle in that region. Although the Haag Asymptotic theorem provides an
explicit formula for the S-matrix in terms of large time limits of appropriately
smeared Wightman distributions, it turns out that the matrix elements specified
by the theorem are only computable in a rather cumbersome way in perturbation
theory, so while this result is of great conceptual value, it is of rather limited practical
utility.16 In this section we shall derive alternative expressions for the S-matrix
which are particularly suitable for perturbative evaluation, while still allowing the
application of non-perturbative methods in those situations where perturbation theory
is invalid.
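Comparing (9.34) and (9.113, 9.120), we see that in the limit of plane wave solutions
of well-defined momentum $\vec k$, $\tilde g(\vec p) = \frac{\sqrt{2E(p)}}{(2\pi)^{3/2}}\,\delta^3(\vec p-\vec k)$, $\psi_g(\vec p) = \delta^3(\vec p-\vec k)$ and the
smeared in-field operator on the right-hand side of (9.156) becomes simply the creation
operator $a^\dagger_{\mathrm{in}}(\vec k)$ for a particle of well-defined momentum. With a realistic particle
wavefunction with some dispersion in momentum, and momentum-space wavefunction
$\psi_g(\vec p)$, we may denote the corresponding creation operator $a^\dagger_{\mathrm{in},g} = \int\psi_g(\vec p)\, a^\dagger_{\mathrm{in}}(\vec p)\, d^3p$, so

$$
-i\int d^3x\; g(\vec x,t)\,\frac{\overleftrightarrow{\partial}}{\partial t}\;{}_{\mathrm{in}}\langle\beta|\phi(\vec x,t)|\alpha\rangle_{\mathrm{in}} \to Z^{1/2}\;{}_{\mathrm{in}}\langle\beta|a^\dagger_{\mathrm{in},g}|\alpha\rangle_{\mathrm{in}}, \qquad t\to-\infty
\tag{9.157}
$$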
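The plane-wave limit quoted above can be sanity-checked against (9.150) with $\tilde f(k) = 1$: the factor $(2\pi)^{3/2}/\sqrt{2E}$ multiplying $\tilde g$ must cancel the weight $\sqrt{2E}/(2\pi)^{3/2}$ in front of the δ-function, leaving $\psi_g(\vec p) = \delta^3(\vec p-\vec k)$ with unit weight. The sketch below checks this cancellation numerically; the function names and sample momenta are hypothetical choices for illustration.

```python
import math

def energy(momentum, mass):
    """Relativistic energy E = sqrt(p^2 + m^2)."""
    return math.sqrt(momentum ** 2 + mass ** 2)

def plane_wave_g_tilde_weight(momentum, mass):
    """Coefficient of delta^3(p - k) in g-tilde in the plane-wave limit."""
    return math.sqrt(2.0 * energy(momentum, mass)) / (2.0 * math.pi) ** 1.5

def psi_g_prefactor(momentum, mass):
    """Factor multiplying g-tilde in (9.150) once f-tilde is set to 1."""
    return (2.0 * math.pi) ** 1.5 / math.sqrt(2.0 * energy(momentum, mass))

def psi_g_weight(momentum, mass):
    """Coefficient of delta^3(p - k) in psi_g; it should equal 1."""
    return psi_g_prefactor(momentum, mass) * plane_wave_g_tilde_weight(momentum, mass)

for momentum in (0.0, 0.3, 5.0):
    print(momentum, psi_g_weight(momentum, 1.0))
```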

16 Specifically, the matrix elements in an expression like (9.139) involve non-time-ordered fields, due to
the f (1) (x) smearing of the original local fields. The graphical techniques of perturbation theory are, on
the other hand, tailor-made for time-ordered products. This is clear both in the functional framework (cf.
Section 4.2), in which functional integrals naturally yield such time-ordered operator matrix elements, or
from the Gell–Mann–Low theorem proved in Section 9.1.
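As $|\alpha\rangle_{\mathrm{in}}$, $|\beta\rangle_{\mathrm{in}}$ are arbitrary, we may take the conjugate of (9.157) to obtain

$$
i\int d^3x\; g^*(\vec x,t)\,\frac{\overleftrightarrow{\partial}}{\partial t}\;{}_{\mathrm{in}}\langle\beta|\phi(\vec x,t)|\alpha\rangle_{\mathrm{in}} \to Z^{1/2}\;{}_{\mathrm{in}}\langle\beta|a_{\mathrm{in},g}|\alpha\rangle_{\mathrm{in}}, \qquad t\to-\infty
\tag{9.158}
$$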

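We now observe that, assuming asymptotic completeness (Axiom IIIb), the Hilbert
spaces $H_{\mathrm{in}}$ and $H_{\mathrm{out}}$ coincide with each other (and, although we do not need it
here, with the full physical Hilbert space $H$ of the theory). In other words, any
(normalizable) $|\beta\rangle_{\mathrm{in}}$ state is also an element of $H_{\mathrm{out}}$. Accordingly, the ${}_{\mathrm{in}}\langle\beta|$ bra states
in (9.157, 9.158) may be replaced by arbitrary out-states:

$$
-i\int d^3x\; g(\vec x,t)\,\frac{\overleftrightarrow{\partial}}{\partial t}\;{}_{\mathrm{out}}\langle\beta|\phi(\vec x,t)|\alpha\rangle_{\mathrm{in}} \to Z^{1/2}\;{}_{\mathrm{out}}\langle\beta|a^\dagger_{\mathrm{in},g}|\alpha\rangle_{\mathrm{in}}, \qquad t\to-\infty
\tag{9.159}
$$

$$
i\int d^3x\; g^*(\vec x,t)\,\frac{\overleftrightarrow{\partial}}{\partial t}\;{}_{\mathrm{out}}\langle\beta|\phi(\vec x,t)|\alpha\rangle_{\mathrm{in}} \to Z^{1/2}\;{}_{\mathrm{out}}\langle\beta|a_{\mathrm{in},g}|\alpha\rangle_{\mathrm{in}}, \qquad t\to-\infty
\tag{9.160}
$$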

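Precisely analogous arguments imply, in the far future limit $t\to+\infty$,

$$
-i\int d^3x\; g(\vec x,t)\,\frac{\overleftrightarrow{\partial}}{\partial t}\;{}_{\mathrm{out}}\langle\beta|\phi(\vec x,t)|\alpha\rangle_{\mathrm{in}} \to Z^{1/2}\;{}_{\mathrm{out}}\langle\beta|a^\dagger_{\mathrm{out},g}|\alpha\rangle_{\mathrm{in}}, \qquad t\to\infty
\tag{9.161}
$$

$$
i\int d^3x\; g^*(\vec x,t)\,\frac{\overleftrightarrow{\partial}}{\partial t}\;{}_{\mathrm{out}}\langle\beta|\phi(\vec x,t)|\alpha\rangle_{\mathrm{in}} \to Z^{1/2}\;{}_{\mathrm{out}}\langle\beta|a_{\mathrm{out},g}|\alpha\rangle_{\mathrm{in}}, \qquad t\to\infty
\tag{9.162}
$$

with $a_{\mathrm{out},g}$, $a^\dagger_{\mathrm{out},g}$ defined in complete analogy to the corresponding in operators, but
starting from the free field $\phi_{\mathrm{out}}(x)$. The asymptotic conditions (9.159–9.162) will be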
the starting points for our derivation of the famous (and indispensable) LSZ reduction
formulas for the S-matrix.
We begin with the S-matrix element for the scattering of $n$ incoming scalar
particles, described by momentum-space wavefunctions $\psi_{g_1}(\vec k),..\psi_{g_n}(\vec k)$, into $m$ outgoing
particles, with wavefunctions $\psi_{g'_1}(\vec k),..\psi_{g'_m}(\vec k)$. These wavefunctions are assumed
to have disjoint support in momentum space: in particular, no incoming particle
wavefunction has non-vanishing overlap with an outgoing particle wavefunction, as
we wish to exclude uninteresting disconnected contributions to the S-matrix in which
a particle passes through without interaction. After deriving the reduction formula,
we shall take the limit in which the particle wave-packets approach plane waves (i.e.,
the $\psi(\vec k)$ approach δ-functions), to make contact with the LSZ formulas as usually
stated in field-theory textbooks. Thus

$$
S_{g'_1..g'_m,\,g_1..g_n} = {}_{\mathrm{out}}\langle g'_1,...,g'_m|g_1,...,g_n\rangle_{\mathrm{in}} = {}_{\mathrm{out}}\langle g'_1,...,g'_m|a^\dagger_{\mathrm{in},g_1}|g_2,...,g_n\rangle_{\mathrm{in}}
\tag{9.163}
$$

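Next, we note that by our assumption of non-overlapping wavefunctions,

$$
a_{\mathrm{out},g_1}|g'_1,...,g'_m\rangle_{\mathrm{out}} = 0
\tag{9.164}
$$

as the application of the destruction operator leads to a sum of terms involving overlap
integrals $\int g_1^*(\vec k)\, g'_i(\vec k)\, d^3k$, which all vanish by assumption. We therefore have

$$
\begin{aligned}
S_{g'_1..g'_m,\,g_1..g_n} &= {}_{\mathrm{out}}\langle g'_1,...,g'_m|g_1,...,g_n\rangle_{\mathrm{in}} \\
&= {}_{\mathrm{out}}\langle g'_1,...,g'_m|(a^\dagger_{\mathrm{in},g_1}-a^\dagger_{\mathrm{out},g_1})|g_2,...,g_n\rangle_{\mathrm{in}}
\end{aligned}
\tag{9.165}
$$

as the $a^\dagger_{\mathrm{out},g_1}$ operator acting to the left as a destruction operator gives zero by the
preceding argument. Matrix elements of $a^\dagger_{\mathrm{in},g_1}$, $a^\dagger_{\mathrm{out},g_1}$ are given as the asymptotic
limits (9.159) and (9.161), so

$$
\begin{aligned}
S_{g'_1..g'_m,\,g_1..g_n} &= iZ^{-1/2}\Big(\lim_{t\to+\infty}-\lim_{t\to-\infty}\Big)\int d^3x\; g_1(\vec x,t)\,\frac{\overleftrightarrow{\partial}}{\partial t}\;{}_{\mathrm{out}}\langle g'_1,..,g'_m|\phi(\vec x,t)|g_2,..,g_n\rangle_{\mathrm{in}} \\
&= iZ^{-1/2}\int d^3x\int_{-\infty}^{+\infty}dt\,\frac{\partial}{\partial t}\Big\{g_1(\vec x,t)\,\frac{\overleftrightarrow{\partial}}{\partial t}\;{}_{\mathrm{out}}\langle g'_1,...,g'_m|\phi(\vec x,t)|g_2,...,g_n\rangle_{\mathrm{in}}\Big\}
\end{aligned}
\tag{9.166}
$$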

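The time-derivative inside the integrand (9.166) can be rewritten recalling that the
wavefunction $g_1(\vec x,t)$ is a solution of the Klein–Gordon equation (cf. (9.113)):

$$
\begin{aligned}
\frac{\partial}{\partial t}\Big(g_1(\vec x,t)\,\frac{\overleftrightarrow{\partial}}{\partial t}\,....\Big) &= g_1(\vec x,t)\,\frac{\partial^2}{\partial t^2}\,.... - \frac{\partial^2 g_1(\vec x,t)}{\partial t^2}\,.... \\
&= g_1(\vec x,t)\,\frac{\partial^2}{\partial t^2}\,.... - (\vec\nabla^2-m^2)\,g_1(\vec x,t)\,....
\end{aligned}
\tag{9.167}
$$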
Inserting (9.167) in (9.166), and integrating by parts to transfer the spatial gradients
from the wavefunction $g_1$ (the fast spatial decrease of which ensures the absence of
surface terms; see footnote 17) to the matrix element, we obtain

$$
S_{g'_1..g'_m,\,g_1..g_n} = iZ^{-1/2}\int d^4x\; g_1(\vec x,t)\,(\Box_x+m^2)\;{}_{\mathrm{out}}\langle g'_1,...,g'_m|\phi(x)|g_2,...,g_n\rangle_{\mathrm{in}}
\tag{9.168}
$$

where $\Box_x \equiv \frac{\partial}{\partial x^\mu}\frac{\partial}{\partial x_\mu}$.
We note that in (9.168), the number of particles in the incoming
state has been reduced by one, and been replaced by an appropriately smeared
Heisenberg field operator sandwiched between the (remaining) incoming and outgoing
states. A result of this type is called a “LSZ reduction formula”. The notion that a
smeared Heisenberg field can be used to create (or destroy) in- or outgoing particles
should hardly be surprising, given the Haag Asymptotic theorem of the preceding
section, but we note the important difference here that the S-matrix element is given
in terms of an integral of a matrix element of such a field over all spacetime, and in
particular over all time, rather than as a limit for large time.

17 Recall that the matrix elements of $\phi(x)$ are tempered distributions (i.e., finite derivatives of a
polynomially bounded continuous function), while the $C^\infty$ function $g_1(\vec x,t)$ decreases faster than any power
of $|\vec x|$ at any given $t$.

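The process of “reducing” particles from the incoming or outgoing state can be
continued, as follows. We focus our attention next on an outgoing particle, say the one
with wavefunction $\psi_{g'_1}$. Begin with the matrix element under the integral in (9.168):

$$
\begin{aligned}
&{}_{\mathrm{out}}\langle g'_1,...,g'_m|\phi(x)|g_2,...,g_n\rangle_{\mathrm{in}} = {}_{\mathrm{out}}\langle g'_2,...,g'_m|a_{\mathrm{out},g'_1}\phi(x)|g_2,...,g_n\rangle_{\mathrm{in}} \\
&\quad= iZ^{-1/2}\lim_{t'\to+\infty}\int d^3x'\; g'^*_1(\vec x',t')\,\frac{\overleftrightarrow{\partial}}{\partial t'}\;{}_{\mathrm{out}}\langle g'_2,..,g'_m|\phi(x')\phi(x)|g_2,..,g_n\rangle_{\mathrm{in}}
\end{aligned}
\tag{9.169}
$$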

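As the spacetime point $x$ is fixed in (9.169), the product of Heisenberg fields appearing
in the matrix element is automatically time-ordered in the stated limit, so we may
write

$$
\begin{aligned}
&{}_{\mathrm{out}}\langle g'_1,...,g'_m|\phi(x)|g_2,...,g_n\rangle_{\mathrm{in}} \\
&\quad= iZ^{-1/2}\lim_{t'\to+\infty}\int d^3x'\; g'^*_1(\vec x',t')\,\frac{\overleftrightarrow{\partial}}{\partial t'}\;{}_{\mathrm{out}}\langle g'_2,..,g'_m|T(\phi(x')\phi(x))|g_2,..,g_n\rangle_{\mathrm{in}}
\end{aligned}
\tag{9.170}
$$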

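If the far-future time limit in (9.170) is replaced by one in the far past, so that
$t'\to-\infty$, we note that the time-ordering would imply

$$
\begin{aligned}
&iZ^{-1/2}\lim_{t'\to-\infty}\int d^3x'\; g'^*_1(\vec x',t')\,\frac{\overleftrightarrow{\partial}}{\partial t'}\;{}_{\mathrm{out}}\langle g'_2,..,g'_m|T(\phi(x')\phi(x))|g_2,..,g_n\rangle_{\mathrm{in}} \\
&\quad= iZ^{-1/2}\lim_{t'\to-\infty}\int d^3x'\; g'^*_1(\vec x',t')\,\frac{\overleftrightarrow{\partial}}{\partial t'}\;{}_{\mathrm{out}}\langle g'_2,..,g'_m|\phi(x)\phi(x')|g_2,..,g_n\rangle_{\mathrm{in}} \\
&\quad= {}_{\mathrm{out}}\langle g'_2,..,g'_m|\phi(x)\,a_{\mathrm{in},g'_1}|g_2,..,g_n\rangle_{\mathrm{in}} = 0
\end{aligned}
\tag{9.171}
$$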

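using the asymptotic condition (9.160), and the fact that the in-state particle wavefunctions
are non-overlapping with $\psi_{g'_1}$. The expression in (9.170) may therefore be
replaced by one in which the limits at $t'\to+\infty$ and $t'\to-\infty$ are subtracted, leading
to an integral over $t'$ of the time-derivative, just as in (9.166):

$$
\begin{aligned}
&{}_{\mathrm{out}}\langle g'_1,...,g'_m|\phi(x)|g_2,...,g_n\rangle_{\mathrm{in}} \\
&\quad= iZ^{-1/2}\int d^3x'\int_{-\infty}^{+\infty}dt'\,\frac{\partial}{\partial t'}\Big\{g'^*_1(\vec x',t')\,\frac{\overleftrightarrow{\partial}}{\partial t'}\;{}_{\mathrm{out}}\langle g'_2,..,g'_m|T(\phi(x')\phi(x))|g_2,..,g_n\rangle_{\mathrm{in}}\Big\}
\end{aligned}
\tag{9.172}
$$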

Once again, using the fact that $g'_1(x')$ is a solution of the Klein–Gordon equation, and
integrating by parts, one may convert this to the form

$$
\begin{aligned}
&{}_{\mathrm{out}}\langle g'_1,...,g'_m|\phi(x)|g_2,...,g_n\rangle_{\mathrm{in}} \\
&\quad= iZ^{-1/2}\int d^4x'\; g'^*_1(\vec x',t')\,(\Box_{x'}+m^2)\;{}_{\mathrm{out}}\langle g'_2,..,g'_m|T(\phi(x')\phi(x))|g_2,..,g_n\rangle_{\mathrm{in}}
\end{aligned}
\tag{9.173}
$$

Inserting (9.173) into (9.168), we obtain a result in which two particles (one incoming,
the other outgoing) have been “reduced out” of the original $n \to m$ amplitude:

$$
\begin{aligned}
S_{g'_1..g'_m,\,g_1..g_n} = {}& (iZ^{-1/2})^2\int d^4x\, d^4x'\; g_1(\vec x,t)\, g'^*_1(\vec x',t')\,(\Box_x+m^2)(\Box_{x'}+m^2) \\
& \cdot\;{}_{\mathrm{out}}\langle g'_2,..,g'_m|T(\phi(x')\phi(x))|g_2,..,g_n\rangle_{\mathrm{in}}
\end{aligned}
\tag{9.174}
$$

This process may evidently be continued (and we encourage the reader to carry it at
least one step further; see Problem 7), removing all the incoming and outgoing particles
from the initial and final states, and leading to the final LSZ reduction formula, giving
the multi-particle S-matrix element in terms of an integral involving the vacuum-expectation-value
of the time-ordered-product of $n+m$ Heisenberg interpolating fields
(the $n+m$ point Feynman amplitude) for the particle undergoing scattering:

$$
\begin{aligned}
S_{g'_1..g'_m,\,g_1..g_n} = {}& (iZ^{-1/2})^{m+n}\int\prod_{i=1}^{n}\prod_{j=1}^{m} g_i(x_i)\, g'^*_j(x'_j)\,(\Box_{x_i}+m^2)(\Box_{x'_j}+m^2) \\
& \cdot\;{}_{\mathrm{out}}\langle 0|T(\phi(x'_1)..\phi(x'_m)\phi(x_1)..\phi(x_n))|0\rangle_{\mathrm{in}}\; d^4x_i\, d^4x'_j
\end{aligned}
\tag{9.175}
$$

It is conventional to go over to the limit in which our wave-packets approximate plane wave
solutions of well-defined momentum $k_1,..k_n, k'_1,..k'_m$. Thus, the single particle
wavefunctions appearing in (9.175) are replaced by pure exponentials
$g_k(x) = \frac{1}{\sqrt{(2\pi)^3 2E(k)}}\,e^{-ik\cdot x}$, and the LSZ formula gives the S-matrix element as a
Fourier transform of the distribution obtained by applying Klein–Gordon operators
$K_x \equiv \Box_x + m^2$ to the Feynman amplitude for $n + m$ fields:
 
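\[
S_{k'_1..k'_m,\,k_1..k_n} = (iZ^{-1/2})^{m+n} \int \prod_{i=1}^{n}\prod_{j=1}^{m}
\frac{1}{(2\pi)^{3/2}\sqrt{2E(k_i)}}\,\frac{1}{(2\pi)^{3/2}\sqrt{2E(k'_j)}}\; e^{+ik'_j\cdot x'_j - ik_i\cdot x_i}
\cdot K_{x_i}K_{x'_j}\; {}_{\rm out}\langle 0|T(\phi(x'_1)..\phi(x'_m)\phi(x_1)..\phi(x_n))|0\rangle_{\rm in}\; d^4x_i\, d^4x'_j \tag{9.176}
\]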

It will be convenient to define an intermediate quantity from which the S-matrix ampli-
tude can be extracted via (9.176). Leaving out for the time being the normalization
factors and Klein–Gordon operators, we define the Feynman Green functions in both
coordinate and momentum space in the obvious way:

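\[
G(x'_1,....x_n) \equiv {}_{\rm out}\langle 0|T(\phi(x'_1)..\phi(x'_m)\phi(x_1)..\phi(x_n))|0\rangle_{\rm in} \tag{9.177}
\]

\[
\tilde G(k'_1,....k_n) \equiv \int e^{+i\sum_j k'_j\cdot x'_j - i\sum_i k_i\cdot x_i}\, G(x'_1,....x_n)\, d^4x'_1 \ldots d^4x_n \tag{9.178}
\]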

Note that the momenta appearing in (9.178) may be arbitrary four-vectors, not
necessarily satisfying the on-mass-shell condition $k_i\cdot k_i = k'_j\cdot k'_j = m^2$. In other words,
the LSZ formula provides us with a natural off-mass-shell extension of S-matrix
elements. If we integrate by parts over the spacetime coordinates $x_i, x'_j$ in (9.176)
we may write the S-matrix element as

\[
S_{k'_1..k'_m,\,k_1..k_n} = \prod_{i=1}^{n}\prod_{j=1}^{m}
\frac{-iZ^{-1/2}(k_i^2-m^2)}{(2\pi)^{3/2}\sqrt{2E(k_i)}}\;
\frac{-iZ^{-1/2}(k_j'^2-m^2)}{(2\pi)^{3/2}\sqrt{2E(k'_j)}}\; \tilde G(k'_1,....k_n) \tag{9.179}
\]

We see that if the on-mass-shell S-matrix element is to be finite and non-vanishing,
the momentum-space Green function $\tilde G(k'_1,....k_n)$ must contain a simple pole in the
off-shellness variable $k^2 - m^2$ for each incoming and outgoing particle. The residue of
the term containing a single such pole in each external variable, modified appropriately
by the normalization factors in (9.179), then gives the on-mass-shell physical S-matrix
for the scattering process. The appearance of simple poles for each external particle
will be clarified in the context of perturbation theory in Chapter 10, when we see
that they are associated graphically with the external legs of the Feynman diagrams
corresponding to the Green function $\tilde G(k'_1,....k_n)$.
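
The integration by parts leading from (9.176) to (9.179) can be made explicit for a single external leg. The following is only a sketch: it assumes that the surface terms may be dropped, and it suppresses all coordinates other than the one being integrated. For an incoming leg,
\[
\int d^4x\; e^{-ik\cdot x}\,(\Box_x + m^2)\,G(\ldots,x,\ldots)
= \int d^4x\; \big[(\Box_x + m^2)\,e^{-ik\cdot x}\big]\,G(\ldots,x,\ldots)
= -(k^2 - m^2)\int d^4x\; e^{-ik\cdot x}\,G(\ldots,x,\ldots)
\]
since $\Box_x e^{-ik\cdot x} = -k^2 e^{-ik\cdot x}$. Multiplying by the prefactor $iZ^{-1/2}$ gives the factor $-iZ^{-1/2}(k^2 - m^2)$ appearing in (9.179). The same factor arises for an outgoing leg, where $e^{+ik'\cdot x'}$ replaces $e^{-ik\cdot x}$. Each such factor vanishes on the mass shell, so it must be compensated by a pole $1/(k^2 - m^2)$ in $\tilde G$ if the limit is to be finite and non-zero.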
There are a number of important points to be made in connection with the
interpretation and use of the LSZ reduction formalism.

1. The asymptotic conditions used to derive the formula hold, by the Haag–
Ruelle theory, for any almost local field φ(x) with a non-vanishing vacuum to
single particle matrix element (9.155). In particular, they hold for almost local
composite fields (i.e., multi-local combinations of the local fields appearing in
the Hamiltonian defining the dynamics of the theory, as in (9.81)) with such
a non-vanishing matrix element. Such fields must be used, as we shall see in
more detail in the next section, if the particle in question is a bound state.
Even if the particle corresponds to an elementary local field in the theory, there
is no unique interpolating field giving the correct S-matrix for its scattering!
For example, if $\phi(x)$ is a local interpolating field for the particle in question,
with $\langle k|\phi(x)|0\rangle \neq 0$, then for general values of $a, b, c, ..$ we certainly would expect
that $\phi'(x) = a\phi(x) + b\phi(x)^2 + c\phi(x)^3 ..$ would also have a non-vanishing vacuum
to single particle matrix element, and the LSZ formula will hold equally well
using this field instead of $\phi(x)$. Of course, the Green function $G(x'_1,....x_n)$
(and the normalization constant Z) will clearly be different with different fields:
only the multiple pole residue of the on-mass-shell limit of its Fourier transform
is guaranteed to be independent of the choice of field, as it gives the presum-
ably unique physical S-matrix amplitude for the scattering of a specific stable
particle.
2. The existence of simple poles in the off-shellness variables $k^2 - m^2$ for all external
(incoming and outgoing) particles is a rigorous consequence of the Haag–Ruelle/LSZ
theory, and depends critically on the assumed mass gap in the theory.
In a theory such as QED, with a strictly massless photon, this result no longer
holds. In fact, the singularities of charged-particle Green functions in the on-shell
limit are softer than simple poles, and connected S-matrix elements for specified
finite numbers of such incoming and outgoing particles vanish, as we shall see
in Section 19.1. The problem is that, with a strictly massless particle, it takes
essentially no energy to produce any number of extra very-low-energy particles,
so that the probability of finding a strictly finite number in any process where
a physical interaction has occurred is zero. Of course, in actual experiments the
detector resolution is finite, and ultra-soft photons are undetectable. Giving the
photon a very small mass (smaller than the detector resolution) restores sanity:
a non-vanishing S-matrix, and sensible cross-sections, rates, etc. We shall return
to this subject in Chapter 19.
3. If the fields φ(x) appearing in (9.176) are ultralocal (cf. Section 5.5), the T-
product defines a Lorentz scalar Green function, and the Lorentz invariance of
the resulting S-matrix is manifest (recall that the non-covariant energy square-
root factors are associated with our choice of non-covariantly normalized states).
However, as just discussed, it is perfectly possible to use fields which are almost
local (e.g., composite fields) but not strictly local, in which case the off-shell Green
functions, both in coordinate and momentum space, are not Lorentz-invariant.
However, the Lorentz-invariance of the S-matrix, which follows rigorously from
the Haag–Ruelle theory, assures us that this property still holds in the on-shell
limit (i.e., for the residue of the multi-pole term). This situation, in which a
symmetry of the theory is only recovered in the on-shell limit, is actually quite
common in field theory, as we shall see later in Part 3 of the book when we study
symmetries in field theory in detail.
4. The extension to particles and fields with non-vanishing spin is straightforward.
One begins from the generalization of (9.33, 9.34), which reconstruct the destruc-
tion and creation operators from the relevant free covariant (in- or out-)field, and
applies the asymptotic condition precisely as above. An example for spin-$\frac{1}{2}$ Dirac
fields is given in Problem 8.
5. The crossing symmetry of S-matrix amplitudes discussed in a few simple examples
in Section 7.6 is seen to be an almost trivial consequence of the basic LSZ formula
(9.176). In the case of the self-conjugate scalar field for which this formula
applies, particles and antiparticles are identical, so the statement that initial-
state particles (resp. antiparticles) can be exchanged with final-state antiparti-
cles (resp. particles) by the simple expedient of inserting a minus sign in the
corresponding four-momentum follows from (a) the symmetry of the T-product
under exchange of the spacetime coordinates of the fields, and (b) the form of
 
the Fourier transform, which contains a factor $e^{+ik'_j\cdot x'_j}$ for final-state particles
(or antiparticles) and a factor $e^{-ik_i\cdot x_i}$ for initial-state particles (or antiparticles).
In the event that we are dealing with non-self-conjugate fields, with distinct
particles and antiparticles, the need to interchange particles and antiparticles
when we cross from initial to final states is a simple consequence of the fact that
the in- and out-fields contain positive frequency parts corresponding to particle
destruction operators and negative-frequency parts corresponding to antiparticle
creation operators. If the reader retraces the derivation of the LSZ formula in
such a case, starting with the obvious generalization of the basic asymptotic
conditions (9.159–9.162) for the complex field case (replacing, for example, $a^{\dagger}_{{\rm in},g}$
in (9.159) with $a^{c\dagger}_{{\rm in},g}$), the general form of the crossing rule for the S-matrix will
become immediately evident.
6. We note that the reduction formula (9.176), containing the Green function
 
${}_{\rm out}\langle 0|T(\phi(x'_1)..\phi(x'_m)\phi(x_1)..\phi(x_n))|0\rangle_{\rm in}$, is precisely in a form amenable to
perturbative treatment via the Gell–Mann–Low formula (9.21), taking $|\alpha\rangle, |\beta\rangle$ to be the
vacuum state. This convenient form explains why the LSZ, rather than the Haag–
Ruelle, approach has dominated the treatment of scattering processes in field
theory. It will be the starting point for our treatment of covariant perturbation
theory in Chapter 10.
7. Obviously, any calculation of the S-matrix amplitude using (9.176) must include
a knowledge of the normalization constant Z, appearing in (9.155). This constant
is conventionally, and somewhat misleadingly, referred to as the “wavefunction
renormalization constant” for the particle, although it is clear from the way
we have introduced it that it is more properly associated with the choice of
interpolating field. We shall see in Section 9.5 how to extract it from the
behavior of the two-point Feynman amplitude G(x1 , x2 )—commonly called the
“full Feynman propagator” of the theory.
8. The translation property of the Green function

\[
G(x'_1,....,x_n) = G(x'_1 - a,....,x_n - a) \tag{9.180}
\]

for any fixed four-vector displacement a, valid for both local and almost-local
fields, implies (see Problem 9) energy-momentum conservation: namely

\[
S_{k'_1..k'_m,\,k_1..k_n} \propto \delta^4(k'_1 + .. + k'_m - k_1 - ... - k_n) \tag{9.181}
\]
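
A sketch of how (9.181) follows from (9.180) may be helpful here; this is the content of Problem 9, so the reader may prefer to attempt it first. The shorthand is introduced only for this sketch: write $y_r$ ($r = 1,\ldots,n+m$) collectively for the coordinates $x'_j, x_i$, and $q_r$ for the associated momenta, with $q_r = k'_j$ for outgoing and $q_r = -k_i$ for incoming particles. Then (9.178) reads $\tilde G = \int e^{i\sum_r q_r\cdot y_r}\,G(y_1,\ldots,y_{n+m})\prod_r d^4y_r$. Shifting every integration variable by the same displacement, $y_r \to y_r + a$, and using (9.180),
\[
\tilde G(q) = e^{\,i a\cdot\sum_r q_r}\int e^{i\sum_r q_r\cdot y_r}\,G(y_1 + a,\ldots,y_{n+m} + a)\prod_r d^4y_r
= e^{\,i a\cdot\sum_r q_r}\;\tilde G(q)
\]
for arbitrary $a$. Hence $\tilde G$ can be non-vanishing only where $\sum_r q_r = k'_1 + .. + k'_m - k_1 - ... - k_n = 0$. Equivalently, $G$ depends only on coordinate differences, and the integral over the one remaining overall position produces the four-dimensional δ-function. By (9.179) the same support property is inherited by the S-matrix element.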

9.5 Spectral properties of field theory


We have already seen that an enormous amount can be learned by starting from
some very general assumptions concerning the nature of the particle states and
interacting fields of a local quantum field theory. In particular, one object of direct
phenomenological interest, the scattering matrix, has been related directly to the
Fourier transform of a Feynman Green function, defined as the vacuum expectation
value of a time-ordered product of Heisenberg fields. Of course, further progress
requires that we develop methods to calculate these Green functions. One obvious
option is perturbation theory, which we study in detail in the next Chapter. But
the usefulness of perturbation theory is contingent on the existence of a split of
the Hamiltonian of the theory into a solvable “free” part which isolates the “large”
parts of the time evolution, in such a way that the resultant asymptotic expansion in
the “interaction” part of the Hamiltonian produces an (initially) rapidly convergent
series of approximants to the desired quantity, such as the S-matrix. In quantum
electrodynamics this program has been brilliantly successful, leading to some of the
most accurate predictions of quantitative science.
However, in the strong interactions the appearance of an effectively large interac-
tion at some point in any hadronic process usually means that perturbation theory
is at best only qualitatively useful. In such cases, we either have to give up on the
project of a complete analytic calculation of the Green functions of the theory and rely
on general properties of the theory (e.g., the axioms of Section 9.2) to put constraints
on their behavior which translate into phenomenologically testable properties of the
theory; or we resort, as in the case of lattice field theory, to a direct, but necessarily
approximate, numerical calculation of the Green functions, making no reference to
perturbation theory. In this section we describe some exact non-perturbative results
along the lines of the first approach mentioned above, adhering mainly to the
simplest possible case: the two-point Wightman W (x1 , x2 ) or Feynman G(x1 , x2 )
functions for a Heisenberg field interpolating for a stable self-conjugate massive spinless
particle (self-interacting, with no bound states). Here, the spectral axioms play a
central role.
Our first task is to derive a general representation for the Fourier transform of the
Wightman two-point function $W(x_1,x_2) = \langle 0|\phi(x_1)\phi(x_2)|0\rangle$. The existence of a
well-defined Fourier transform follows from the fact that $W(x_1,x_2) = W(x,0)$, $x \equiv x_1 - x_2$
is a tempered distribution. The structure of this representation will be determined
simply by Lorentz-invariance and unitarity. In this context, the latter property means
the existence of a complete orthonormal basis $|\alpha\rangle$ in our positive norm Hilbert space,
where the states are chosen to be eigenstates of the energy-momentum four-vector $P$,
$P^\mu|\alpha\rangle = P^\mu_\alpha|\alpha\rangle$. Inserting such a complete set between the two fields, we obtain

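\[
\begin{aligned}
W(x_1 - x_2) &= \sum_\alpha \langle 0|\phi(x_1)|\alpha\rangle\langle\alpha|\phi(x_2)|0\rangle \\
&= \sum_\alpha \langle 0|e^{iP\cdot x_1}\phi(0)e^{-iP\cdot x_1}|\alpha\rangle\langle\alpha|e^{iP\cdot x_2}\phi(0)e^{-iP\cdot x_2}|0\rangle \\
&= \sum_\alpha e^{-iP_\alpha\cdot x}\,\langle 0|\phi(0)|\alpha\rangle\langle\alpha|\phi(0)|0\rangle \\
&= \int \Big\{\sum_\alpha |\langle 0|\phi(0)|\alpha\rangle|^2\,\delta^4(p - P_\alpha)\Big\}\,e^{-ip\cdot x}\,d^4p
\end{aligned} \tag{9.182}
\]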

where we assume that the integral over four-momentum can be interchanged with the
sum over states (see below). Next, note that the function in brackets in (9.182),

\[
f(p) \equiv \sum_\alpha |\langle 0|\phi(0)|\alpha\rangle|^2\,\delta^4(p - P_\alpha) \tag{9.183}
\]

has, by the spectral axioms of our theory (cf. Section 9.2, Axioms Ib, Ic, Id), support
only for $p^2 \geq 0$, and if we assume $\langle 0|\phi(0)|0\rangle = 0$, only for $p^2 \geq m^2$, where $m$ is the
single-particle mass. In fact, the support of $f(p)$ is restricted to the one-particle mass
hyperboloid $p^2 = m^2$ and the multi-particle continuum starting at $p^2 = (2m)^2$ (or, if
there is a symmetry $\phi \to -\phi$, at $p^2 = (3m)^2$). Moreover,

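\[
\begin{aligned}
f(p) &= \sum_\alpha |\langle 0|U^\dagger(\Lambda)U(\Lambda)\phi(0)U^\dagger(\Lambda)U(\Lambda)|\alpha\rangle|^2\,\delta^4(p - P_\alpha) \\
&= \sum_\alpha |\langle 0|\phi(0)|\Lambda\alpha\rangle|^2\,\delta^4(p - P_\alpha) \\
&= \sum_\alpha |\langle 0|\phi(0)|\alpha\rangle|^2\,\delta^4(p - \Lambda^{-1}P_\alpha) \\
&= \sum_\alpha |\langle 0|\phi(0)|\alpha\rangle|^2\,\delta^4(\Lambda p - P_\alpha) = f(\Lambda p)
\end{aligned} \tag{9.184}
\]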

Accordingly, $f(p)$ is a Lorentz-invariant function of $p$, which vanishes for $p_0 < 0$. We
may therefore write
\[
\sum_\alpha |\langle 0|\phi(0)|\alpha\rangle|^2\,\delta^4(p - P_\alpha) = \frac{1}{(2\pi)^3}\,\theta(p_0)\rho(p^2) \tag{9.185}
\]
where the spectral function $\rho(p^2)$ is positive (or zero) with support on the spectrum of
the squared mass operator $P^2$ as indicated above, and the normalization factor $\frac{1}{(2\pi)^3}$ is
chosen for later convenience. We finally obtain, writing $\rho(p^2) = \int \delta(p^2 - \mu^2)\rho(\mu^2)\,d\mu^2$,

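\[
\begin{aligned}
W(x) &= \frac{1}{(2\pi)^3}\int \theta(p_0)\rho(p^2)\,e^{-ip\cdot x}\,d^4p \\
&= \int_0^\infty \rho(\mu^2)\,W_0(x;\mu)\,d\mu^2
\end{aligned} \tag{9.186}
\]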

where
 
\[
W_0(x;\mu) \equiv \int \frac{d^4p}{(2\pi)^3}\,\theta(p_0)\,\delta(p^2 - \mu^2)\,e^{-ip\cdot x}
= \frac{1}{(2\pi)^3}\int \frac{d^3p}{2E(p)}\,e^{-ip\cdot x} = \Delta_+(x;\mu) \tag{9.187}
\]
Here Δ+ (x; μ) is the invariant function arising from the two-point function of a
free, canonically normalized scalar field of mass μ (cf. Chapter 6, (6.63)). This
remarkable result—that the Wightman two-point function of an arbitrary scalar
interacting Heisenberg field can be written as the positively weighted average of the
corresponding free field Wightman functions for fields of varying mass, with a positive
weight-function containing all the non-trivial interaction physics of the theory—is
called the Källén–Lehmann representation of the two-point function. We note that it
implies the vanishing of the VEV of the space-like commutator, $\langle 0|[\phi(x_1),\phi(x_2)]|0\rangle = 0$
for $(x_1 - x_2)^2 < 0$, as the invariant function $\Delta_+(x;\mu)$ is symmetric at space-like points,
$\Delta_+(x;\mu) = \Delta_+(-x;\mu)$, $x^2 < 0$, even though we have not assumed locality of our field.
Vanishing of the matrix element of the space-like commutator between arbitrary states
would require locality (Axiom IIc, Section 9.2).
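
The step in (9.187) from the four-dimensional to the three-dimensional integral rests on a standard property of the δ-function, which we spell out as a reminder. With $E \equiv \sqrt{\vec{p}^{\,2} + \mu^2}$,
\[
\delta(p^2 - \mu^2) = \delta\big((p_0 - E)(p_0 + E)\big) = \frac{1}{2E}\big[\delta(p_0 - E) + \delta(p_0 + E)\big]
\quad\Longrightarrow\quad
\theta(p_0)\,\delta(p^2 - \mu^2) = \frac{1}{2E}\,\delta(p_0 - E)
\]
so that the integration over $p_0$ simply sets $p_0 = E$ and supplies the factor $1/2E(p)$. Read from right to left with $\mu \to m$, the same identity converts a factor $\delta(p_0 - E(p))\theta(p_0)/2E(p)$ into $\theta(p_0)\delta(p^2 - m^2)$.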
The spectral representation (9.186) implies that the Fourier transform W̃ (p) of
W (x) is basically the invariant spectral function θ(p0 )ρ(p2 ): as W (x) is a well-defined
tempered distribution, by the basic axioms of Section 9.2, its Fourier transform is
likewise well-defined, and the defining sum for the spectral function (9.184) must
therefore be convergent. We may therefore expect that the interchange of integration
and summation performed above is in this case quite legal. As we shall see below, this
is not necessarily the case for the spectral representation of other two-point functions.
Although it would require an exact solution of the interacting field theory to
calculate the full spectral function ρ(p2 ), the contribution from one-particle states
is calculable up to a normalization constant from (9.155). Thus

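\[
\begin{aligned}
\frac{1}{(2\pi)^3}\,\theta(p_0)\,\rho_{\rm 1part}(p^2) &= \int d^3k\; |\langle 0|\phi(0)|k\rangle_{\rm in}|^2\,\delta^4(p - k)
= \frac{Z}{(2\pi)^3\,2E(p)}\,\delta(p_0 - E(p))\,\theta(p_0) \\
\Rightarrow\quad \rho_{\rm 1part}(p^2) &= Z\,\delta\big((p_0 - E(p))(p_0 + E(p))\big) = Z\,\delta(p^2 - m^2)
\end{aligned} \tag{9.188}
\]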

A spectral representation for the Feynman two-point Green function of the theory
$G(x_1,x_2) = \langle 0|T(\phi(x_1)\phi(x_2))|0\rangle$, sometimes called the full propagator, can be derived
following the pattern for $W(x_1,x_2)$. In fact, as

\[
G(x_1,x_2) = \theta(t_1 - t_2)\,W(x_1,x_2) + \theta(t_2 - t_1)\,W(x_2,x_1) \tag{9.189}
\]

and the spectral representation for W (x1 , x2 ) gives a linear superposition of free field
Wightman functions for mass μ weighted by ρ(μ2 ), one finds the obvious result (again,
defining x = x1 − x2 ):

\[
G(x) = \int \rho(\mu^2)\,G_0(x;\mu)\,d\mu^2 \tag{9.190}
\]

where the corresponding free-field time-ordered Green function G0 (x; μ) for a particle
of mass μ is just i times the free Feynman propagator ΔF (x; μ) introduced in Section
7.6 (cf. (7.215)). The Fourier transform is accordingly a weighted average of the
momentum-space free Feynman propagator:

\[
-i\tilde G(p) \equiv \hat\Delta_F(p^2) = \int \rho(\mu^2)\,\frac{1}{p^2 - \mu^2 + i\epsilon}\,d\mu^2 \tag{9.191}
\]

The one-particle contribution to the full propagator, which we now denote Δ̂F (p2 ),
can be isolated and displayed explicitly, using (9.188):
\[
\hat\Delta_F(p^2) = \frac{Z}{p^2 - m^2 + i\epsilon} + \int_{M^2_{\rm multi}}^{\infty} \rho(\mu^2)\,\frac{1}{p^2 - \mu^2 + i\epsilon}\,d\mu^2 \tag{9.192}
\]
where $M^2_{\rm multi}$ is the lowest squared-mass threshold for multi-particle states (i.e., $4m^2$
if $\langle 0|\phi(0)|k_1,k_2\rangle_{\rm in} \neq 0$, $9m^2$ if the first non-vanishing multi-particle matrix element
occurs for three particle states, $\langle 0|\phi(0)|k_1,k_2,k_3\rangle_{\rm in}$, and so on). This result gives us
the promised interpretation of the normalization constant Z appearing in the LSZ
reduction formulas: it is simply the residue of the momentum-space full propagator
of the interpolating field at the single-particle pole. The same procedures, whether
perturbative or non-perturbative, which are applied to the calculation of the $n + m$
point Green function $\tilde G(k'_1,..,k'_m,k_1,..k_n)$ in the LSZ formula can be used to calculate
the two-point function and extract the required constant $Z$.
An important difference in the spectral representations of the two-point Wightman
and Feynman functions is immediately apparent in (9.191): the representation for the
Fourier transform of the time-ordered two-point function contains an integral, the
convergence of which evidently requires that
\[
\int_{M^2_{\rm multi}}^{\infty} \frac{\rho(\mu^2)}{\mu^2}\,d\mu^2 < \infty \tag{9.193}
\]

Unfortunately, the axioms introduced in Section 9.2 are not adequate to ensure
the existence of this integral in all cases. Basically, the problem is that although
$\theta(x_1^0 - x_2^0)$ and $\langle 0|\phi(x_1)\phi(x_2)|0\rangle$ are separately fine distributions (with well-defined
Fourier transforms), their product, occurring in the definition of the time-ordered
Green function, is not necessarily a well-defined distribution, with an unambiguous
Fourier transform. A classic example is given by the product of the θ and δ-functions,
θ(x)δ(x) =?, which is well-defined (namely, zero) on the subset of test functions
vanishing at x = 0, but when extended to a distribution on the full Schwartz space
necessarily involves an undetermined constant, θ(x)δ(x) = Cδ(x) (we can think of
the constant C as our (arbitrary) choice for the “value” of the step function θ(x) at
x = 0). We can also regard convergence problems in the final integral result (9.191) as
due to an unjustified interchange of summation and integration in the process of the
“derivation”, as mentioned above.
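
The dependence of the constant C on the way in which the two distributions are regularized can be seen in a small numerical experiment. The sketch below is an illustration under assumed choices that are not taken from the text: a Gaussian of width `eps` stands in for the δ-function, an error-function step for θ, and a hypothetical parameter `shift` displaces the centre of the Gaussian relative to the step, in units of `eps`.

```python
import math

def smeared_theta(x, eps):
    """Smooth step which tends to theta(x) as eps -> 0 (an assumed regularization)."""
    return 0.5 * (1.0 + math.erf(x / eps))

def smeared_delta(x, eps):
    """Gaussian of unit area which tends to delta(x) as eps -> 0."""
    return math.exp(-(x / eps) ** 2) / (eps * math.sqrt(math.pi))

def contact_constant(shift, eps, n=4000):
    """Midpoint-rule estimate of C = int theta_eps(x) * delta_eps(x - shift*eps) dx."""
    lo = (shift - 10.0) * eps   # integrate over +-10 widths around the Gaussian
    hi = (shift + 10.0) * eps
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        total += smeared_theta(x, eps) * smeared_delta(x - shift * eps, eps)
    return total * h

if __name__ == "__main__":
    # For each shift, C is insensitive to eps but changes with the shift.
    for shift in (-1.0, 0.0, 1.0):
        values = [contact_constant(shift, eps) for eps in (0.1, 0.01, 0.001)]
        print(shift, values)
```

For each fixed `shift` the estimate does not change as `eps` is reduced, so the limit exists, but its value depends on `shift`, which disappears from both factors separately as `eps` tends to zero. In this sense the product is fixed only up to the choice of C.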
What do we actually know about the asymptotic behavior of the spectral function
ρ(μ2 ) at large μ2 ? This behavior will clearly hinge on (i) the specific field(s) appearing
in the time-ordered product (e.g., elementary versus composite), and (ii) the details
of the dynamics (i.e., interactions) in the theory, which determine the multi-particle
matrix elements in the sum (9.185) defining the spectral function. For the specific
example under consideration here, our two-point function involves an elementary
hermitian spinless field, and the dynamics is assumed to be specified by an interaction
Hamiltonian leading to a perturbatively renormalizable theory,$^{18}$ such as $H_{\rm int} = \frac{\lambda}{4!}\phi^4$
(or more generally, $H_{\rm int} = \frac{\lambda_3}{3!}\phi^3 + \frac{\lambda_4}{4!}\phi^4$). In such theories, as we shall see later in
our study (cf. Chapter 18) of the scaling properties of local field theories, the use of
renormalization group techniques allows us to derive the asymptotic behavior both of
$\hat\Delta_F(p^2)$ and $\rho(\mu^2)$, to any finite order of perturbation theory.$^{19}$ The result is that in
each order of perturbation theory, the spectral function falls like $\frac{1}{\mu^2}\,\times$ powers of $\ln\mu^2$,
ensuring the convergence of the spectral representation. For the super-renormalizable
$\phi^3$ theory, the falloff is even faster (Barton, 1965): $\rho(\mu^2) \sim \frac{1}{\mu^4}$ (times logarithms).
Thus, in these cases, at least within the context of perturbation theory, the Lehmann
representation (9.192) is on a quite firm footing. Note that this representation implies
that the momentum-space full Feynman propagator Δ̂F (p2 ) can be regarded as the
limit of an analytic function $F(w)$ as $w \to p^2 + i\epsilon$: i.e., as the complex variable $w$
approaches the real axis from above, with
\[
F(w) = \frac{Z}{w - m^2} + \int_{M^2_{\rm multi}}^{\infty} \rho(w')\,\frac{1}{w - w'}\,dw' \tag{9.194}
\]

18 The definition and study of renormalizable field theories will be one of our primary objects in Part
4 of this book. For the time being, the reader is invited to think of such theories as ones in which a
well-defined continuum limit exists at the perturbative level: namely, there is a well-defined asymptotic
expansion of Feynman functions of the theory in powers of suitably defined coupling parameter(s), with the
contributions at each order specified in terms of a finite number of parameters.
19 There is a large amount of circumstantial evidence—though as yet no complete proof—that renormal-
izable self-interacting scalar field theories in four spacetime dimensions do not possess a non-trivial—i.e.,
interacting—continuum limit, even though the perturbative expansion is order-by-order well-defined. In
other words, there is no set of Wightman functions satisfying all axioms of Section 9.2 whose asymptotic
expansion in a suitably defined coupling constant agrees with the renormalized perturbative expansion of a
φ4 theory. Such theories, as we shall explain in Part 4 of the book, still have a perfectly sensible interpretation
as effective field theories. For super-renormalizable theories, such as φ3 theory in four dimensions (alas, with
a spectrum unbounded below; cf. Section 8.4), or φ4 theory in two or three spacetime dimensions, the
continuum limit exists. Asymptotically free theories such as QCD, based on a non-abelian gauge group, are
also thought to have a well-defined continuum limit beyond perturbation theory.

With $\rho(w')$ falling at least as fast as $1/w'$, this representation implies that $F(w)$ is a
real analytic function in the complex plane of $w$, with a simple pole at $w = m^2$, and
cuts on the positive real axis beginning at $w = M^2_{\rm multi}$. There are multiple cuts because
the spectral function $\rho(w')$ is the sum of $n$-particle contributions $\rho_n(p^2)$ which switch
on at progressively higher values of $w'$: at $w' = (2m)^2$ for the two-particle states $|\alpha\rangle$
in (9.185), at $w' = (3m)^2$ for the three-particle states, and so on. Using the familiar
identity
\[
\frac{1}{w - w' \pm i\epsilon} = P\,\frac{1}{w - w'} \mp i\pi\,\delta(w - w') \tag{9.195}
\]
the discontinuity of $F(w)$ across the cut for positive real $w = p^2$ is given by
\[
F(p^2 + i\epsilon) - F(p^2 - i\epsilon) = -2i\pi\rho(p^2) = -2i\pi\sum_{n=2}^{\infty}\rho_n(p^2) \tag{9.196}
\]

clearly indicating the presence of distinct branch points at $p^2 = n^2m^2$. As the
discontinuity of $F$ across the cut is (for a real analytic function) equal to $2i\,{\rm Im}(F)$ (where by
convention the imaginary part is taken on the upper lip of the cut), the representation
(9.194) can be rewritten as a dispersion relation
\[
F(w) = \frac{Z}{w - m^2} + \frac{1}{\pi}\int_{M^2_{\rm multi}}^{\infty} \frac{{\rm Im}(F(w'))}{w' - w}\,dw' \tag{9.197}
\]

allowing the reconstruction of the full analytic function F (w) anywhere in the complex
w-plane from knowledge of its residue at the single-particle pole and its discontinuity
along the cut(s) on the positive real axis.
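
The content of (9.194)–(9.196) can be checked numerically on a toy example. The sketch below is purely illustrative: the spectral density `model_rho` (zero below the two-particle threshold, falling like $1/w'$ above it), the values of `Z`, `m` and `c`, and the function names are all assumptions made for this example and do not correspond to any particular field theory.

```python
import math

def model_rho(w_prime, m=1.0, c=0.3):
    """Assumed toy spectral density: zero below (2m)^2, falling like 1/w' above."""
    threshold = 4.0 * m * m
    if w_prime <= threshold:
        return 0.0
    return c * math.sqrt(1.0 - threshold / w_prime) / w_prime

def lehmann_f(w, Z=0.8, m=1.0, c=0.3, n=100_000):
    """F(w) of (9.194) for the toy density, evaluated by the midpoint rule.

    The substitution w' = 4m^2/u maps the cut onto 0 < u <= 1, where
    rho(w') dw' / (w - w') becomes c*sqrt(1 - u) du / (u*w - 4m^2).
    """
    threshold = 4.0 * m * m
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        total += c * math.sqrt(1.0 - u) / (u * w - threshold)
    return Z / (w - m * m) + total * h

if __name__ == "__main__":
    # Residue at the single-particle pole: (w - m^2) F(w) tends to Z.
    w = 1.0 + 1e-6
    print("residue estimate:", (w - 1.0) * lehmann_f(w))
    # Discontinuity across the cut at p^2 = 10, to be compared with -2*pi*rho(p^2).
    s, eps = 10.0, 0.02
    disc = lehmann_f(complex(s, eps)) - lehmann_f(complex(s, -eps))
    print("Im of discontinuity:", disc.imag)
    print("-2*pi*rho(p^2):     ", -2.0 * math.pi * model_rho(s))
```

With a small but finite `eps` the computed discontinuity agrees with $-2\pi i\rho(p^2)$ only up to corrections of relative order `eps`, including a small tail from the single-particle pole. Its real part vanishes, as it must for a real analytic function.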
One further consequence of the Lehmann representation (9.192) is worth com-
menting on at this point. The positivity of the spectral function (which goes back,
of course, to the underlying positivity of the metric of our Hilbert space) clearly
implies that the 1/p2 behavior of the free propagator at large p2 cannot be damped
(to a more rapid decrease) by interactions, as the contribution of the integral is non-
negative. At one point, attempts were made to construct a renormalizable theory of
quantum gravity by introducing higher derivative terms in the Lagrangian with the
effect of damping the high-momentum behavior of the graviton propagator in order to
eliminate the proliferation of ultraviolet divergences in higher orders of perturbation
theory that plague the conventional Einstein–Hilbert theory. The Lehmann spectral
representation shows that such a damping can be possible only in the presence of
negative metric states in the theory.
The derivation of the spectral representation given above depended, apparently,
only on a few very basic properties of the Heisenberg field φ(x) appearing in the
time-ordered product: specifically, hermiticity and the appropriate transformation
properties of the fields under the Poincaré group, together, of course, with the com-
pleteness sum appropriate to a positive metric Hilbert space. However, as we indicated
previously, the derivation also involves interchanges of summation and integration
which are potential sources of disaster. In this case, disaster means a non-convergent
spectral representation. In such a circumstance, the resultant dispersion relation needs
a subtraction, resulting in the appearance of one (or more) undetermined arbitrary
constants not already fixed by the properties of the individual fields. The appearance
of such additional parameters is a consequence of the potential ambiguities in the
product of two distributions mentioned previously.
Before giving a specific example it will be useful to consider an heuristic argument
leading to a specification of the form these ambiguities can be expected to take
in the momentum-space Feynman functions under consideration. We are concerned
with a sum (corresponding to the two time orderings) of products of θ-functions
of the form $\theta(x_1^0 - x_2^0)$ with vacuum expectation values of the product of two local
operators $\langle 0|O_1(x_1)O_2(x_2)|0\rangle$ ($O_1$ and $O_2$ may or may not be the same operator).
The only ambiguities that can occur involve taking the times $x_1^0$ and $x_2^0$ coincident
(otherwise, we are dealing with well-defined Wightman distributions of the fields),
and also spatial coincidence of the fields $\vec{x}_1 \to \vec{x}_2$, as otherwise (at equal time) a small
boost could be used to separate the times while (by locality) leaving the VEV of
the two fields essentially unchanged. Therefore (cf. our discussion in Section 5.5 of
the need for ultralocal interaction Hamiltonians) we may expect the appearance in
general of contact type singularities when the spacetime points x1 and x2 coincide. In
coordinate space we may expect that these correspond to the δ-function δ(x1 − x2 )
or spacetime-derivatives thereof; in momentum space, such contact terms correspond
to constants or polynomials in the four-momentum variable. If the Green function in
question is a Lorentz scalar (such as our scalar propagator $\hat\Delta_F(p^2)$ above), we therefore
expect ambiguities of the form $C_0 + C_1p^2 + ....$ A spectral representation of the form
(9.192), with $\hat\Delta_F(p^2) \to 0$, $p^2 \to \infty$, is clearly not possible in this case. Instead, the
dispersion relation must be written in a subtracted form which restores the necessary
convergence.
A full discussion of the role of dispersion relations in quantum field theory would
require more space than is available here: instead, we give a simple example to make the
issues more concrete. The cross-section for annihilation of an electron and positron into
arbitrary hadronic final states (the so-called inclusive $e^+e^-$ annihilation process)
played a pivotal role in the history of perturbative quantum chromodynamics (QCD):
it represents the archetypal process in which perturbative calculations can be shown
to yield reliable results at high energy in a strongly interacting theory such as QCD
which displays the property of “asymptotic freedom” (roughly speaking, a damping
of interaction strength at short distances or large momenta/energy). This cross-section
turns out to be related to the spectral density for the two-point function
of the hadronic electromagnetic current $J^\mu_{\rm em,had}(x)$, which in momentum space takes
the form
\[
i\int \langle 0|T\{J^\mu_{\rm em,had}(x)\,J^\nu_{\rm em,had}(0)\}|0\rangle\,e^{iq\cdot x}\,d^4x = (g^{\mu\nu}q^2 - q^\mu q^\nu)\,\omega(q^2) \tag{9.198}
\]
We shall see later, in Chapter 15, that the current $J^\mu_{\rm em,had}(x)$ is a composite field,
involving terms quadratic in quark fields. The transverse tensor on the right-hand side
simply expresses the conservation of the current $\partial_\mu J^\mu_{\rm em,had}(x) = 0$, so the interesting
physics is contained in the scalar function $\omega(q^2)$. A dispersion relation for $\omega(q^2)$ can
be “derived” along the same lines as the Lehmann representation (9.191) for the
two-point function of elementary scalar fields (insertion of a complete set of states,
interchange of summation and integration, etc.), but in this case the result involves
a divergent integral, as the spectral density (imaginary part of ω(q 2 )) is found not to
vanish for large q 2 , but rather to go to a constant (in higher orders of perturbation
theory, times logarithms). On the other hand, if we subtract the value of $\omega(q^2)$ at zero
momentum (say; the subtraction can be made at any fixed momentum point), the
resultant spectral representation involves a more rapidly convergent integral, so that
a convergent dispersion relation can be written for this subtracted quantity. Using
the notation introduced in (9.197)—namely, $\omega(q^2) \to F(w)$—and dropping the
single-particle pole term,$^{20}$ we have the once-subtracted dispersion relation

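\[
F(w) - F(0) = \frac{1}{\pi}\int_{M^2_{\rm multi}}^{\infty} {\rm Im}(F(w'))\Big(\frac{1}{w' - w} - \frac{1}{w'}\Big)\,dw' \tag{9.199}
\]

or, equivalently,
\[
F(w) = F(0) + \frac{w}{\pi}\int_{M^2_{\rm multi}}^{\infty} \frac{{\rm Im}(F(w'))}{w'(w' - w)}\,dw' \tag{9.200}
\]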

The constant F (0) = ω(q 2 = 0) appearing on the right-hand side indicates the appear-
ance of an ambiguity in the definition of the T-product of two currents, even though the
currents themselves are individually perfectly well-defined. In this case the ambiguity
involves just the constant $C_0$ discussed previously (i.e., $C_1, C_2, .. = 0$), and a single
subtraction produces sufficient inverse powers of the integration variable $w'$ to ensure
convergence of the spectral integral.$^{21}$
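
The effect of the subtraction on convergence can be made explicit with a rough estimate. Suppose, as stated above for the lowest order, that ${\rm Im}(F(w'))$ tends to a constant at large $w'$; we call this constant $\kappa$ and cut the integrals off at $w' = \Lambda^2$, both being labels introduced only for this estimate. Then for fixed $w$ the large-$w'$ region contributes
\[
\frac{1}{\pi}\int^{\Lambda^2} \frac{{\rm Im}(F(w'))}{w' - w}\,dw' \;\simeq\; \frac{\kappa}{\pi}\,\ln\Lambda^2 \;\to\; \infty ,
\qquad
\frac{w}{\pi}\int^{\Lambda^2} \frac{{\rm Im}(F(w'))}{w'(w' - w)}\,dw' \;\simeq\; {\rm const} - \frac{\kappa\,w}{\pi\,\Lambda^2} \;\to\; {\rm finite}
\]
as $\Lambda^2 \to \infty$. The unsubtracted integral diverges logarithmically, while the extra factor of $1/w'$ gained in (9.200) renders the subtracted one convergent. Additional powers of $\ln w'$ in ${\rm Im}(F(w'))$ do not alter this conclusion, since $\int^{\infty} \ln^p(w')\,dw'/w'^2$ converges for any fixed power $p$.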
The appearance of ambiguities in the time-ordered products of composite oper-
ators like the electromagnetic current discussed above, with the concomitant need
for subtractions in the associated dispersion relation, may well provoke feelings of
unease in the attentive reader. Our development of the LSZ reduction formalism in
Section 9.4 was motivated by the desire to achieve a computationally convenient
representation of general S-matrix amplitudes: the completely rigorous (and well-
defined!) representation (9.139) following from the Haag Asymptotic Theorem 9.4,
involving the vacuum expectation value of ordinary (i.e., not time-ordered) products of
smeared field operators, is extremely cumbersome to implement either in perturbation
theory or with available non-perturbative techniques. On the other hand, the LSZ
formula (9.175) can be developed straightforwardly in perturbation theory, as we
shall explain in detail in the next chapter, using the Gell–Mann–Low formula (9.21).
Moreover, ground-state (i.e., vacuum) expectation values of time-ordered products of
Heisenberg operators may be readily transcribed into a path-integral formulation (cf.

20 The current $J^\mu_{\rm em,had}(x)$ is assumed to contain Heisenberg fields with only strong interactions (the
electromagnetic interactions are switched off in these fields), so that there is no pole corresponding to a
stable particle (e.g., the photon) for which the current interpolates. The rho resonance appears as a pole of
$\omega$, but on the unphysical second sheet, as the rho is unstable.
21 Again, as in the case of the scalar-field two-point function, renormalization group techniques allow us
to derive the relevant asymptotic behavior, in the case of an asymptotically free theory like QCD, to all
orders of perturbation theory.

Section 4.2), which we shall see is of enormous importance in both perturbative and
non-perturbative approaches to field theory.
Unfortunately, we now realize that the time-ordered products appearing in LSZ-
type formulas, once developed perturbatively à la Gell–Mann–Low (thus introducing
composite interaction Hamiltonian operators into the time-ordered products), are
likely to contain undefined ambiguities! The ambiguities, of course, arise from the
multiplication of distributions and are manifested as short-distance singularities in
the resultant products, or, in momentum space, as the familiar ultraviolet divergences
appearing in perturbative loop integrals for the Feynman functions of the theory. If the
field theory is regulated at short distance, say by replacing continuous spacetime by a
discrete spacetime lattice (thereby effectively introducing a high-momentum ultraviolet
cutoff in the theory), the ambiguities are eliminated. Of course, to recover the full
(continuous) Poincaré invariance of the theory, the spacing of the lattice points must
eventually be taken to zero, or equivalently, the ultraviolet cutoff taken to infinity, and
the question then arises as to the existence of a well-defined and unambiguous limit
for the S-matrix amplitudes when this is done. From a physical point of view, the
sensitivity of a field theory to the insertion of an ultraviolet (UV) cutoff is equivalent
to the question of the sensitivity of the low-energy (low means momenta much smaller
than the UV cutoff) predictions of the theory to our inevitable ignorance of new
physics at much higher momenta (i.e., much smaller distance scales). The study of the
sensitivity of local field theories at low energies to alterations in their short-distance
structure will be the primary focus of Chapters 16 and 17 of this book. For the present,
the reader should be reassured that for the class of field theories called “perturbatively
renormalizable theories”, the ambiguities appearing in the continuum limit (UV cutoff
going to infinity) in the Feynman Green functions appearing in the LSZ formula can
be shown to be completely absorbable in a finite set of low-energy parameters (masses
and couplings) which uniquely determine (order by order, for all orders of perturbation
theory) the S-matrix amplitudes of the theory, up to terms which fall as a power
(usually, at least quadratic) of the low-energy mass and momentum scales divided by
the UV cutoff. The latter correction terms are not unique, but depend on the precise
details of the regularization (i.e., how the UV cutoff is introduced), reflecting the
aforesaid ambiguities present in the underlying time-ordered products.
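The content of this statement about perturbatively renormalizable theories can be summarized schematically as follows. This is a sketch only: the symbols $A$, $A_{\rm ren}$, $g_{\rm ph}$ and $\Lambda$ are labels introduced here for illustration, and the quadratic fall-off of the correction is the "usual" case mentioned above (possible logarithmic factors are suppressed).

```latex
% A: any S-matrix amplitude at external momenta p_i, computed with UV cutoff \Lambda
% and re-expressed in terms of physical (low-energy) masses m_ph and couplings g_ph.
A(p_i; \Lambda) = A_{\rm ren}(p_i; m_{\rm ph}, g_{\rm ph})
   + O\!\left(\frac{p^2}{\Lambda^2}, \frac{m_{\rm ph}^2}{\Lambda^2}\right) ,
\qquad
\lim_{\Lambda \to \infty} A(p_i; \Lambda) = A_{\rm ren}(p_i; m_{\rm ph}, g_{\rm ph}) .
% A_ren is regularization independent, order by order in perturbation theory;
% the O(1/\Lambda^2) remainder depends on how the cutoff is introduced.
```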

9.6 General aspects of the particle–field connection


In most of our discussion of interacting fields in this chapter so far, we have chosen to
illustrate the general features for a particular class of field theory, involving a single
stable spinless particle, whose self-interactions do not induce the formation of bound
states. In this section, all such restrictive assumptions will be relaxed, as we wish to
present a rather general discussion of various aspects of the particle–field connection in
local relativistic quantum field theories. Our discussion of the Haag–Ruelle and LSZ
theories already gives us the essential conceptual basis for this discussion. Some of
the statements made here—for example, concerning color confinement in non-abelian
gauge theories—will have to be taken on trust for the time being: a fuller discussion
awaits in Part 4 of the book. This “borrowing” is justified by the desire to make the
discussion of particle–field duality as complete as possible, by indicating the wonderful
variety of ways in which the concepts of particle and field are linked in the panoply of
field theories which are known to be relevant in high-energy physics.
We shall start at the particle end, by noting two important classifications which
can be applied to particles. By “particle”, we mean simply a state which to some
appropriately high degree of approximation can be regarded as an eigenstate of
the squared-mass operator $P^2$ and the spin (i.e., $\vec{J}^2$, $J_z$ in the frame where $\vec{P} = 0$).
Particles can therefore be associated with irreducible representations of the HLG, as
discussed in Sections 5.2 and 5.3. Beginning with this rather vague specification, one
finds that the zoo of particles encountered in the Particle Data Book may be broken
down into subcategories on the basis of the following two fundamental classifications:

1. Stable particles versus unstable particles (or “resonances”).


2. Elementary particles versus composite particles.

It should be said at the outset that neither of these distinctions is to be regarded as
absolute, for reasons that will shortly become clear.
We begin with the issue of stability. The physical Hilbert space of the theory
actually only contains the stable particles—those which survive to asymptotically
large times in the future, or can survive after preparation in the far past to reach an
interaction event at finite times. This is the content of the asymptotic completeness
principle discussed at length in the preceding sections. Some examples may be useful
here to indicate what this really implies, which can be quite surprising at first glance.22
As a consequence of weak and electromagnetic interactions, the only stable hadron is
the proton (and its antiparticle, the antiproton). Accordingly, the asymptotic in- and
out-space for hadrons consists entirely of multi-particle proton–antiproton states!
What, then, is a neutral pion—which as we know is unstable to decay electro-
magnetically to two photons? It may be regarded as a resonant state in the two-
particle scattering channel of photons. Likewise, the neutron (unstable to β-decay)
is a resonance in the three-particle (proton, electron, anti-electron–neutrino) channel!
We shall explain more precisely below in field-theoretic terms what such a resonance
means, but here the point to emphasize is that an unstable particle in the theory should
be thought of as a transitory physical event representable only in a rather complicated
way in terms of physical state vectors which can only involve stable particles. If we were
to switch off the weak and electromagnetic interactions, then the strong interaction
physical Hilbert space would consist of the multi-particle Fock space of in- (or out-)
states of all stable particles, which would now include (in addition to the proton) the
neutron and pion triplet. The rho particle, on the other hand, would still be regarded
as a messy sort of temporary two-pion state: a resonance in the two-pion scattering
channel. But there would be no single- or multi-rho states in the asymptotic Hilbert
(in- or out-)space of the theory. As it is, in the Standard Model, the peculiar fact is
that the physical Fock space of the theory, as defined in Sections 9.2, 9.3, and 9.4,
contains single- (and multi-)particle states of the hydrogen-atom ground state (mass
938783014(80) eV, spin 0), but no (single- or multi-)neutron states! From the point

22 We shall remain within the context of Standard Model physics in our examples, to avoid too many ifs,
ands, or buts! Thus, the possibility of “exotic” processes such as proton decay is ignored.

of view of the asymptotic formalism of field theory, a stable composite particle is on
just the same footing, in being present in the in- and out-Fock spaces of the theory,
as a stable elementary particle of the theory.
Because of the wide difference in scales of various interactions, it may, of course,
be perfectly sensible to view an unstable particle, with an average lifetime much larger
than the other time-scales of interest in the processes under study (the neutron mean-
life is 15 minutes—essentially infinite on the time-scale of subatomic processes!) as
stable, and include it in the asymptotic states of the theory. In order to preserve
unitarity, and avoid double-counting, the most honest way to do this is as indicated
above: one switches off the guilty, destabilizing interaction responsible for decay. In
the resultant theory, asymptotic states of the newly stable particle appear perfectly
legally in the physical Hilbert space, and can be treated à la LSZ, as described in
Section 9.4. The stable/unstable dichotomy is therefore essentially an issue of isolating
the important time-scales in the process of interest.
Secondly, there is the question of elementarity. A natural definition of an ele-
mentary, as opposed to composite, particle would be negative in nature, and imply
the absence of a detectible substructure. The presence of the qualifier “detectible”
suggests the obvious caveat that future probing at much smaller distance scales (or
higher momenta/energies) than presently accessible might well reveal a substructure
where none is presently evident. Thus, there was in the first decades of the twentieth
century no reason to suppose that the proton should not be regarded as elementary,
as the quark-gluon substructure did not become evident until many years later. And
there remains the possibility that all of the presently identified, and apparently “point-
like”, elementary particles of the conventional Standard Model (quarks, leptons, gauge
bosons and Higgs bosons) might display, at sufficiently small distance scales (on the
order of the Planck length), a “stringy” substructure. Nevertheless, at the scales
presently accessible, it makes perfectly good sense to draw a line between the particles
with an empirically evident composite character and those which are effectively “point-
like”, and of course, once isolated from other particles, characterized by mass and spin
alone. A little later, as we describe the particle–field connection in detail, a very simple
and precise definition of an elementary particle will be given in terms of the nature of
the dynamics obeyed by the associated interpolating field.
To summarize briefly the above discussion, once specific temporal and spatial
scales are identified, we may imagine sorting all known particle states into four
categories, by specifying stable/unstable and elementary/composite. In the Standard
Model, all of these categories contain exemplars. Now, although the discussion of
Heisenberg field theory in the present chapter has so far assumed for simplicity that
our Heisenberg field φ(x) was the interpolating field for a stable, elementary spinless
particle, the association of local fields with particles in the other three categories (i.e.,
unstable elementary, stable composite, and unstable composite) is perfectly possible.
For example, the scalar field associated with the Higgs particle in the Standard Model
is a perfectly sensible local field appearing in the Standard Model Lagrangian which
specifies the full dynamics of the theory, even though, by the discussion above, it
does not strictly speaking act as interpolating field for an asymptotic (stable) particle
state of the theory. The spectral representation for the two-point function of this field
will have (among others) a cut starting at $4m_e^2$ (where $m_e$ is the electron mass) as
one of the possible decay products of the Higgs consists of an electron–positron pair,
but there will be no single particle pole, as the Higgs itself is unstable. Instead, the
Higgs resonance is revealed as a pole below the real axis on the second Riemann
sheet at a much higher energy (at the present time, > 115 GeV).23 In the case of
stable composite particles, e.g., the proton, interpolating local (or almost local, cf.
Section 9.2) fields can be constructed which go right into the Haag–Ruelle or LSZ
formulas to determine the S-matrix for proton interactions. The reader will recall that
the Haag Asymptotic theorem in particular was perfectly valid if the underlying field
involved an appropriately smeared product of local fields: in the case of the proton,
taking quantum chromodynamics as the underlying dynamical theory, the appropriate
interpolating field involves the product of three quark fields (two up and one down)
coupled to zero color.
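The remark above that an unstable particle such as the Higgs appears as a pole below the real axis on the second Riemann sheet, rather than as a single-particle pole on the real axis, can be illustrated with the standard Breit–Wigner approximation. This is a sketch under the assumption of a narrow resonance of mass $M$ and width $\Gamma \ll M$; the symbols $M$, $\Gamma$, $s$ and $\Delta(s)$ are introduced here for illustration, and overall normalization and sign conventions are suppressed.

```latex
% Near the resonance the two-point function behaves approximately as
\Delta(s) \approx \frac{1}{s - M^2 + i M \Gamma} , \qquad s = p^2 ,
% so the pole sits at complex s, displaced below the real axis:
s_{\rm pole} = M^2 - i M \Gamma .
% For real s this gives the familiar resonance peak, of width ~ M\Gamma in s:
|\Delta(s)|^2 \approx \frac{1}{(s - M^2)^2 + M^2 \Gamma^2} .
```

In the limit $\Gamma \to 0$ the pole migrates onto the real axis and the particle becomes a genuine asymptotic state, which is the situation obtained by switching off the destabilizing interaction.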
It is now possible to give a more precise definition of the concept of “elementary”,
for either fields or particles. We shall define as elementary any field appearing in the
fundamental Hamiltonian (or Lagrangian) specifying the exact dynamics of the theory
at the distance scales in question. It is assumed that such a Hamiltonian can be given
in an explicit analytic form: it can be written down, without approximation, on a
finite piece of paper! Any particle interpolated for by such a field can be rightly called
“elementary”: the “point-like” character of the particle is reflected in the fact that
its exact interacting dynamics has a precise finite expression in terms of products of
local fields at a single spacetime point. With this terminology, the quark, lepton, Higgs
and gauge boson fields of the Standard Model are elementary. Correspondingly, the
leptons, Higgs and electroweak gauge bosons of the Standard Model are elementary
particles as well.
What about the quarks and gluons, which we have omitted from our elementary
particle list? Quantum chromodynamics (QCD), which we shall discuss in detail later
in the book, provides a particularly stark example of the perils of the naive dictum “for
every field, a particle”. In confining theories such as QCD, the dynamics is specified via
a Lagrangian involving quark and gluon fields which do not interpolate for any of the
physical particles (“hadrons”) in the theory.24 In fact, these fields strictly speaking are
not defined on the physical Hilbert space at all! However, multi-local combinations of
the quark and gluon fields, appropriately coupled to be invariant under the local gauge
symmetry of the theory, form well-defined almost local operators which do interpolate
for physical particles, and are well defined on the physical state space. Thus, the
product of three quark fields provides us with an interpolating field for the proton,
the product of a quark field and its conjugate a pion field, and so on.
Although we have decided, in agreement with the definition proposed at the
beginning of this section, to withhold the appellation “particle” from quarks and

23 For an excellent discussion of the role and properties of unstable particles in a field-theory context,
see (Brown, 1992), Section 6.3. An elementary discussion in standard quantum theory, including the
interpretation of resonances as poles on the second Riemann sheet, can be found in (Baym, 1990), Chapter 4.
24 Remarkably, there are even difficulties in the description of charged particle states in quantum
electrodynamics, due to the masslessness of the photon: thus, the electron does not possess a conventional
set of asymptotic states à la Haag–Ruelle theory—in the language of Schroer, it is an “infraparticle”. More
on this in Sections 19.1 and 19.2.

gluons, and speak only of quark and gluon fields, it is certainly true that perturbative
calculations in QCD can be performed treating the omnipresent quarks and gluons as
particles in just the same way that electrons and photons are so treated in quantum
electrodynamics. Again, this is a question of the relevant spatiotemporal scales of
the process: in a sense which can be made mathematically precise, QCD possesses
a property of asymptotic freedom which renders the interaction arbitrarily weak at
progressively smaller spatiotemporal scales (or equivalently, higher-energy scales),
so that perturbative calculations become correspondingly more accurate. How these
calculations can be connected to an actual S-matrix amplitude, in which necessarily
the field energy must be allowed to dissipate over a (relatively) large spatial and
temporal region, resulting in a hadronization of the underlying quark and gluon degrees
of freedom, requires an understanding of an important scale separation property of
renormalizable field theories, called “factorization”, which will be an important topic
of investigation in Part 4 of this book.
In summary, we once again emphasize the absence of any sacred one-to-one connec-
tion between particles and fields. A given particle (stable or unstable, elementary or
composite) may be “represented” by many different local or almost local fields: if the
particle is stable, and therefore represented in the asymptotic in- and out-states of the
theory, any field with a non-vanishing vacuum to single-particle matrix element serves
as an appropriate interpolating field for it. On the other hand, it may be convenient,
and in the case of gauge theories indispensable, to introduce fields which interpolate
for none of the physical particles of the theory, but in which products of such fields
do interpolate for these particles. In such confining theories (cf. Section 19.3) we
may loosely speak of the physical particle states as (stable or unstable) bound states
of the underlying quark and gluon “particles”, even though there is no attainable
physical circumstance in which these latter can be realized as isolated entities with
the characteristics expected of a particle: well-defined mass, spin, energy-momentum,
and (in the case of QCD) a definite color quantum number derived from the putative
interpolating field.
We conclude this section with a discussion of a subject which naturally arises when
one thinks carefully about the nature of the particle–field connection in quantum
field theory: the uniqueness (or otherwise) of the dynamical evolution specified by
the theory, given the enormous fluidity of field representations available for the same
particle, which nevertheless yield precisely the same S-matrix for scattering when
all is said and done. At the pure particle level, the time development of the theory
is certainly uniquely specified, in an almost trivial sense, once we assume asymptotic
completeness. The multi-particle in-states $|k_1, k_2, \ldots, k_N\rangle_{\rm in}$, for example, form by
assumption a complete set, and the time-evolution of these states is trivially unique,
as they are eigenstates of the full Hamiltonian $H$ (with eigenvalue $E = \sum_{i=1}^{N} E(k_i)$).
However, this obviously begs the question of what such states “look like” at any times
subsequent to the far past. In particular, knowledge of the particular combination
of outgoing sets of freely receding particles that a given in-state represents in the
far future requires that we have at our disposal the complicated connection between
the in- and out-fields ($\phi_{\rm in}$ and $\phi_{\rm out}$) of the theory, which then allows construction
of the S-matrix, yielding the desired scattering amplitudes. As we have seen, such a
connection is automatically afforded by the Heisenberg interpolating field φ(x) of the
theory, which converges (weakly) to $\phi_{\rm in}$ (resp. $\phi_{\rm out}$) in the far past (resp. far future).
But the interpolating field is subject to a considerable freedom of choice. There are
clearly an infinity of possible fields which interpolate between any two specified in-
and out-fields: we need only ensure that the given Heisenberg field has a non-vanishing
vacuum to single particle matrix element for the particle in question. Still, there is
an intuitive feeling that a proper theory should, at least in principle, uniquely specify
the physical situation not just at asymptotic times, but at finite intermediate times
as well: for example, at the time $t \approx 0$ at which a scattering event takes place.
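Both halves of the preceding argument can be put into formulas. The sketch below first writes out the trivial time evolution of an in-state, and then gives one illustrative family of alternative interpolating fields. The field $\phi'$ and the parameter $\lambda$ are hypothetical, introduced only for this example, and the metric convention in which $\Box\, e^{-ik\cdot x} = -k^2 e^{-ik\cdot x}$ is assumed.

```latex
% In-states are eigenstates of the full Hamiltonian H, so they evolve by a pure phase:
e^{-iHt}\,|k_1, \ldots, k_N\rangle_{\rm in}
   = \exp\!\Big(-i t \sum_{i=1}^{N} E(k_i)\Big)\,|k_1, \ldots, k_N\rangle_{\rm in} .

% A hypothetical alternative interpolating field built from \phi:
\phi'(x) = \phi(x) + \lambda\, \Box \phi(x) .
% Its vacuum to single-particle matrix element, using k^2 = m_ph^2 on shell:
\langle 0|\phi'(x)|k\rangle = (1 - \lambda m_{\rm ph}^2)\,\langle 0|\phi(x)|k\rangle ,
% which is non-vanishing for any \lambda \neq 1/m_ph^2.
```

Every such $\phi'$ has different off-shell Green functions from $\phi$, yet satisfies the one condition required of an interpolating field, which is the freedom of choice referred to above.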
The conceptual difficulty here primarily springs from the need to specify more
clearly what one means by the phrase “physical situation” above. In standard quantum
mechanics, the specification of the physical state at some time would require a
measurement of a complete set of compatible observables. In field theory this is
enlarged to the notion of local measurements, exploring properties of the system in
bounded (microscopic) regions of spacetime. In other words, we would need to be
able to measure matrix elements of local (or almost local) operators in or between
specified physical states. To make this more concrete, let us take a famous example,
of enormous phenomenological importance: the hadronic electromagnetic current
$J^\mu_{\rm em,had}(x)$ discussed in Section 9.5. A measurement of a general matrix element
(typically called a “form factor”) ${}_{\rm out}\langle\beta|J^\mu_{\rm em,had}(x)|\alpha\rangle_{\rm in}$ for general spacetime points $x$ is
actually possible given the fact that the current couples linearly and gauge-invariantly
to the photon field (cf. Chapter 15), which in turn is coupled in an accurately
computable (via perturbation theory) way to leptons. The momentum transferred
from scattered leptons (electrons, say: see Fig. 9.3) varies over the entire space-like
domain, giving directly the Fourier transform (to momentum variable k) with respect
to x of this matrix element. Unlike the situation with the S-matrix, this corresponds

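[Figure: incoming and outgoing electron lines ($e^-$) attached to a hadronic vertex labeled ${}_{\rm out}\langle\beta|\tilde{J}^\mu_{\rm em,had}(k)|\alpha\rangle_{\rm in}$.]

Fig. 9.3 Electron scattering off a hadronic state, allowing extraction of the space-like form
factor.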

to off-mass-shell (for the photon) information. It is precisely the availability of such
off-mass-shell information (with momentum variable $k$ not restricted to the light-cone
$k^2 = 0$) that of course allows us to infer direct local behavior (in $x$ space) of the matrix
element. For example, if $|\alpha\rangle$ and $|\beta\rangle$ are single proton states (the in- and out-labels
then become equivalent), the measurement of $J^0_{\rm em,had}$ gives us a direct “peek” at the
instantaneous charge density distribution inside a proton, at any given time. The LSZ
formalism makes it clear that off-shell information of this sort is exquisitely sensitive to
the precise choice of interpolating fields, so we are invited to enquire as to what extent
we really “know” the local operator in question—in this case, the electromagnetic
current for hadrons. Let us address this question in a simplified version of reality,
in which only up and down quarks are present, which interact only with gluons and
photons (i.e., weak interactions switched off). The asymptotic space of the theory
contains, in addition to photons, protons, neutrons (which are now stable, absent weak
decays) and a stable pion triplet. A unique hadronic electromagnetic charge operator
can be written down trivially employing the asymptotic fields of the charged hadrons
of the theory: namely, the proton in-field $p_{\rm in}(x)$ and the charged pion field $\pi_{\rm in}(x)$.
These are free fields, so the desired charge operator on the asymptotic states (and
therefore, by asymptotic completeness, on all states) is (cf. Problem 3 in Chapter 6,
and Problem 4 in Chapter 7):

$$Q_{\rm had} = \int d^3x \left( e : \bar{p}_{\rm in}(x)\gamma^0 p_{\rm in}(x) : + ie : \pi^\dagger_{\rm in}(x) \overleftrightarrow{\frac{\partial}{\partial x^0}} \pi_{\rm in}(x) : \right) \qquad (9.201)$$

which is, of course, time-independent, and must therefore equal $\int d^3x\, J^0_{\rm em,had}(\vec{x}, t)$ for
any t. However, we have the problem that in the presence of the extremely complicated
interactions of QCD which end up binding the elementary quark and gluon degrees of
freedom into the observed hadrons, any attempt to specify the full current operator
$J^\mu_{\rm em,had}(x)$ directly in terms of the free asymptotic fields of the theory must lead to an
extremely complicated expression—certainly not one we are in any position to write
down!25 Instead, we must take recourse in our previous definition of elementarity: the
elementary fields of the theory are the unique (up to trivial redefinitions afforded
by the symmetries of the theory) interpolating fields in terms of which an exact
dynamics (e.g., Lagrangian or Hamiltonian) can be written down in some closed,
finite expression. In other words, once we write down the full Lagrangian of QCD (as
we will in Chapter 15) in terms of quark and gluon fields, the exact expression for the
hadronic electromagnetic current follows immediately in terms of the interpolating up
and down quark fields (which are the only elementary electrically charged hadronic
objects in our truncated toy world)

$$J^\mu_{\rm em,had}(x) = \frac{2}{3} e : \bar{u}(x)\gamma^\mu u(x) : - \frac{1}{3} e : \bar{d}(x)\gamma^\mu d(x) : \qquad (9.202)$$

and the physically directly measurable local matrix elements of this object are then
determined (in principle!) uniquely by the fact that the dynamics of the quark

25 Of course, the completeness of the asymptotic states assures that such an expression is in principle
possible.

fields (and gluon fields through which they interact) is specified by a Hamilto-
nian/Lagrangian of definite form. Of course, the connection between the elementary
quark and gluon fields in (9.202) and the asymptotic nucleon and pion fields appearing
in (9.201) is exceedingly complicated, involving the intricacies of confinement in a
strongly coupled theory. Nevertheless, great progress in direct calculation of form
factors starting from the QCD Lagrangian has been possible using the techniques
of lattice gauge theory at low energy, or perturbative QCD, at high energy (cf.
Chapters 18 and 19).
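A simple consistency check ties the quark-level current (9.202) to the hadron-level charge operator (9.201): the quark charges must add up to the observed hadron charges. The arithmetic below is a sketch that assumes the usual valence-quark assignments for the nucleons and pions, which are not spelled out in the text at this point.

```latex
% Charges read off from the coefficients in (9.202); antiquarks carry the opposite charge:
Q_u = +\tfrac{2}{3}e , \qquad Q_d = -\tfrac{1}{3}e , \qquad Q_{\bar{q}} = -Q_q .

% Assumed valence content and the resulting total charge:
p = uud :            \quad \tfrac{2}{3}e + \tfrac{2}{3}e - \tfrac{1}{3}e = e
n = udd :            \quad \tfrac{2}{3}e - \tfrac{1}{3}e - \tfrac{1}{3}e = 0
\pi^+ = u\bar{d} :   \quad \tfrac{2}{3}e + \tfrac{1}{3}e = e
\pi^- = d\bar{u} :   \quad -\tfrac{1}{3}e - \tfrac{2}{3}e = -e
```

This matches the structure of (9.201), in which only the proton and the charged pion fields contribute, while the neutron and the neutral pion carry no charge.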
The above example makes clear that local quantum field theory, although conceptu-
ally stimulated, as we saw in Chapters 5 and 6, by the requirements of S-matrix theory
(in particular, the desire to construct Hamiltonians which lead to Lorentz-invariant
and clustering S-matrices), specifies, in certain cases essentially uniquely, details of
the dynamics which go beyond a pure S-matrix philosophy, which treats scattering
amplitudes connecting the behavior of systems long before and long after interactions
occur, but remains agnostic with regard to what happens “in between”. Indeed, given
a precise specification of the dynamics in terms of elementary fields, local observables
can be constructed, and in some cases even measured, which give us a window into the
behavior of the theory in finite regions of spacetime—behavior which should certainly
be a central component of the conceptual content of any self-respecting field theory.26

9.7 Problems
1. Show, using the Lorentz-invariance of the S-matrix, that the transformation
property (9.9) under the HLG of the in-states of the theory transfers to the
corresponding out-state transformation property (9.11).
2. Show that the scalar field transformation property (9.24) of the Heisenberg field
allows us to extend the equal-time commutativity property (9.3) to full space-
like commutativity:

$$[\phi_H(x), \phi_H(y)] = 0, \qquad (x - y)^2 < 0$$

3. Show that the full Heisenberg field can be reconstructed from the time-dependent
creation and destruction operators $a^\dagger_H(k; t)$, $a_H(k; t)$ defined by (9.36)
via

$$\phi_H(x) = \int d^3k \left( g_k(x) a_H(k; t) + g_k^*(x) a^\dagger_H(k; t) \right) \qquad (9.203)$$

where $g_k(x) \equiv \frac{1}{\sqrt{(2\pi)^3 2E(k)}} e^{-ik\cdot x}$. (A precisely analogous formula relates $\phi_{\rm in}(x)$
(resp. $\phi_{\rm out}(x)$) to the in (resp. out) creation and destruction operators
$a^\dagger_{\rm in}(k)$, $a_{\rm in}(k)$ (resp. $a^\dagger_{\rm out}(k)$, $a_{\rm out}(k)$).)
4. With $|g_1, g_2, \ldots, g_n\rangle$ defined as in (9.56), show that $\phi_f |g_1, g_2, \ldots, g_n\rangle$ is a state of
finite norm, with $\phi_f$ a free scalar field smeared with a Schwartz-type function.

26 For a discussion of the role of almost local operators in devising thought experiments involving particle
detectors which act as probes of particles in arbitrary localized regions of spacetime, see Section II.4.3 of
(Haag, 1992).

5. Show that the function F+ (p0 ) (and hence, trivially, F0 (p0 ) and F− (p0 )) defined
in (9.98, 9.99, 9.100) is infinitely many times continuously differentiable for arbitrary
real p0 . (Note that the differentiability is trivial except at the two singular
boundary points p0 = a, b.)
6. Show that the sum of overlap integrals for the inner product of two in-state
vectors obtained from Lemma 9.3 in the large time limit agrees with the inner
product following from the Fock space metric (5.21) on continuum-normalized
states, given the definition (9.137).
7. Show that if two momentum-space wavefunctions are non-overlapping,
$g_i^*(\vec{p})\, g_j(\vec{p}) = 0$, the coordinate space overlap of the corresponding single-particle
wavefunctions $g_{i,j}(\vec{x}, t)$ (cf. (9.113)) vanishes for all time.
8. The asymptotic conditions (9.157, 9.158) can be used to derive an important
explicit relation (called the Yang–Feldman equation) between the interacting
Heisenberg field φH (x) and the asymptotic free in-field φin (x).
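(a) First, show that with $a^\dagger_H(k; t)$ and $g_k(x)$ as in Problem 3, for arbitrary
states $|\alpha\rangle_{\rm in}$, $|\beta\rangle_{\rm in}$,

$${}_{\rm in}\langle\beta|a^\dagger_H(k; t)|\alpha\rangle_{\rm in} = \sqrt{Z}\; {}_{\rm in}\langle\beta|a^\dagger_{\rm in}(k)|\alpha\rangle_{\rm in} - i \int_{-\infty}^{t} dt' \int d^3x'\, g_k(x') (\Box_{x'} + m_{\rm ph}^2)\, {}_{\rm in}\langle\beta|\phi_H(x')|\alpha\rangle_{\rm in}$$

(Hint: as in the proof of the reduction formulas, a function at time $t$ is
expressed as the value at time $-\infty$ plus the integral of the time-derivative
from $-\infty$ to $t$).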
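(b) Using the result of Problem 3 above, and the formula obtained in part (a)
(with its obvious analog for matrix elements of $a_H(k; t)$), show that

$${}_{\rm in}\langle\beta|\phi_H(x)|\alpha\rangle_{\rm in} = \sqrt{Z}\; {}_{\rm in}\langle\beta|\phi_{\rm in}(x)|\alpha\rangle_{\rm in} + i \int d^4x'\, \Delta_R(x - x') (\Box_{x'} + m_{\rm ph}^2)\, {}_{\rm in}\langle\beta|\phi_H(x')|\alpha\rangle_{\rm in}$$

$$\Delta_R(x - x') \equiv \theta(t - t')\left(\Delta_+(x - x'; m) - \Delta_+(x' - x; m)\right)$$

with $\Delta_+$ the fundamental invariant function defined in (6.63). As the states
$|\alpha\rangle_{\rm in}$, $|\beta\rangle_{\rm in}$ form a complete set (asymptotic completeness), we have the
operator identity (Yang–Feldman equation)

$$\phi_H(x) = \sqrt{Z}\, \phi_{\rm in}(x) + i \int d^4x'\, \Delta_R(x - x') (\Box_{x'} + m_{\rm ph}^2)\, \phi_H(x') \qquad (9.204)$$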

9. Starting with the formula (9.174), in which one incoming particle and one
outgoing particle have been reduced out of the asymptotic states, carry the LSZ
process one step further by reducing out the incoming particle with wavefunction
$g_2$ to obtain an expression with the time-ordered product of three Heisenberg
fields.
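10. Derive the following reduction formula for reducing out a single incoming Dirac
particle:

$$S_{k'_1\sigma'_1 .. k'_m\sigma'_m,\; k_1\sigma_1 .. k_n\sigma_n} = \frac{i}{Z^{1/2}} \int {}_{\rm out}\langle k'_1\sigma'_1 .. k'_m\sigma'_m|\bar{\psi}(x_1)|k_2\sigma_2 .. k_n\sigma_n\rangle_{\rm in}\, (i\overleftarrow{\partial\!\!\!/}_{x_1} + m) \qquad (9.205)$$

$$\cdot\; \frac{1}{(2\pi)^{3/2}} \sqrt{\frac{m}{E(k_1)}}\; u(k_1, \sigma_1)\, e^{-ik_1\cdot x_1}\, d^4x_1 \qquad (9.206)$$

You should first derive the following Fourier transform formula for the creation
operator for a Dirac particle in the in-state, in terms of the Dirac in-field defined
with normalization as in (7.104):

$$b^\dagger_{\rm in}(k, \sigma) = \int \psi^\dagger_{\rm in}(x)\, \frac{1}{(2\pi)^{3/2}} \sqrt{\frac{m}{E(k)}}\; u(k, \sigma)\, e^{-ik\cdot x}\, d^3x$$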
11. Derive the translation property

$$G(x_1, \ldots, x_n) = G(x_1 - a, \ldots, x_n - a)$$

for the n-point Green functions, and use it to establish the energy-momentum
conservation property for the S-matrix (as given by LSZ (9.176)):

$$S_{k'_1 .. k'_m,\; k_1 .. k_n} \propto \delta^4(k'_1 + .. + k'_m - k_1 - \ldots - k_n)$$
10
Dynamics VIII: Interacting fields:
perturbative aspects

Local quantum field theory, as incorporated in the currently accepted framework
of the Standard Model of elementary particle interactions, has provided a quantitative
description of microphysical processes down to distance scales on the order
of $10^{-18}$ meters, backed by an enormous quantity of empirical evidence supplied by
experiments carried out over a huge range of energy scales. In some cases, the level
of quantitative agreement is astonishing: the precision allowed in the calculation of
certain quantum electrodynamic processes (such as the anomalous magnetic moment
g−2 of the electron) using relativistic field theory is unmatched in any other area
of physical science. Such precision is only possible for weakly coupled field theories
such as quantum electrodynamics (QED), where a natural expansion parameter, the
fine-structure constant $\alpha \simeq 1/137$, is sufficiently small that the asymptotic expansion
in powers of $\alpha$ provided by perturbation theory gives a rapidly converging sequence of
approximants to the desired physical quantity. Calculations of g−2 have been carried
out through order $\alpha^4$, leading to theoretical predictions valid to ten significant figures.
However, even in theories such as quantum chromodynamics (QCD), where the
strength of the interaction would seem to vitiate the validity of a perturbation
approach, the property of asymptotic freedom, which we shall discuss in detail in Part
4, allows in many cases the application of perturbative techniques in the derivation of
the high-energy asymptotic behavior of many interesting strong interaction processes.
Finally, even when not directly applicable, perturbation theory, as expressed in
the very intuitive graphical formulation introduced by Feynman in the late 1940s,
provides many important insights into the structure of local quantum field theory
which turn out to be crucial in advancing the development of more general non-
perturbative techniques. We shall therefore be devoting this chapter to an account of
the perturbative aspects of field theory. The technical background, both operator and
functional (path integral), necessary for the development of amplitudes in graphical
expansions will be our main subject.
Perturbation theory for time-dependent processes (such as scattering) in non-
relativistic quantum theory is usually formulated in terms of an interaction picture in
which the Hamiltonian is split into free and interacting parts (see Section 4.1.1), and
the generally complicated effects of the interaction restricted to the time development
of the states, while the operators evolve with the free dynamics. Unfortunately, as we
have mentioned on several occasions previously, and will discuss in more detail in the
final Section of this chapter, in a quantum field theory with infinitely many degrees of
freedom, the interaction picture typically does not exist—a result usually referred to
as Haag’s theorem. The problem can be removed by a full regularization of the theory
in which both short- and long-distance cutoffs are introduced, leading to a theory
with a finite (indeed, arbitrarily large!) number of independent quantum-mechanical
degrees of freedom. Unfortunately, such regularizations inevitably result in a (one
hopes, temporary) loss of the full Poincaré symmetry of the theory, and the task then
remains to establish the return of this symmetry as the regularization is removed.
In fact, any application of the usual formal “theorems” (such as the Gell–Mann–Low
theorem of Section 9.1, (9.21)) of interaction-picture perturbative expansions in
field theory necessarily requires the insertion of a regularization in order to obtain
unambiguous results, due to unavoidable ambiguities in time-ordered Feynman Green
functions involving composite operators (such as interaction Hamiltonian densities),
as discussed at the end of Section 9.5. On the other hand, the LSZ formula (9.176)
derived in Section 9.4 is completely independent of any reasoning relying on interaction-
picture arguments. Of course, the problem remains of actually calculating the Feynman
n-point amplitudes contained in the LSZ formula (i.e., the VEV of the time-ordered
products of n Heisenberg fields) in order to obtain the desired S-matrix elements.
These Green functions can only be obtained analytically in a handful of toy field
theories in 1+1 spacetime dimensions: in any realistic case, we necessarily must have
recourse to approximative methods. These are basically of two kinds:

1. Perturbative evaluation of the desired amplitudes in an asymptotic expansion
in a (finite) number of parameters defined in terms of physically measurable
amplitudes of the theory. We imagine doing this in a fully regularized version
of the theory so that well-defined results are obtained order by order in the
perturbative expansion. The question of the sensitivity of the results so obtained
to the presence of cutoffs (in particular, a short-distance or “UV” cutoff) will be
the primary object of study in Part 4 of the book.
2. Non-perturbative approximative schemes may be employed. By far the most
successful of these approaches—not necessarily with respect to the quantitative
accuracy of the results obtained, but insofar as the approximations employed
are systematically improvable—is lattice quantum field theory. The field theory
is fully regularized on a (Euclidean) spacetime lattice, and the corresponding
n-point functions (called Schwinger functions in the Euclidean case) evaluated
numerically from a path-integral representation. In many cases the information
so obtained in Euclidean space can be transferred directly to information about
the actual Minkowskian physics of the theory (e.g., spectrum, matrix elements
of various local operators, etc.).

In this chapter we concentrate on developing the basic techniques required to
implement the first item above. The required perturbative expansions will be seen to
have a natural interpretation in terms of graphical objects (Feynman graphs), with
simple rules (Feynman rules) allowing the evaluation of the amplitudes in terms of
elementary algebraic expressions associated with each graphical element (line, vertex,
etc.). This will be done first using operatorial methods: the matrix elements of
Heisenberg picture operators needed for the LSZ formula will be expanded using the
Gell–Mann–Low theorem, and the resultant expressions evaluated using a technical
tool known universally as Wick’s theorem. Later, we shall see how the resultant
graphical objects also arise naturally in a path-integral formulation of the field theory.
Finally, as promised, we discuss the significance of Haag’s theorem in an attempt
to assuage the reader’s natural anxiety concerning the validity of results obtained in
perturbation theory via interaction-picture methods.

10.1 Perturbation theory in interaction picture and Wick’s theorem
Our primary object of interest in the following will be the S-matrix, which in a theory
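of self-interacting scalar fields is given by the LSZ formula derived in Section 9.4:

$$\begin{aligned}
S_{k'_1..k'_m,\,k_1..k_n} &= (iZ^{-1/2})^{m+n}\int\prod_{i=1}^{n}\prod_{j=1}^{m}\frac{1}{(2\pi)^{3/2}\sqrt{2E(k_i)}}\,\frac{1}{(2\pi)^{3/2}\sqrt{2E(k'_j)}}\,e^{+ik'_j\cdot x'_j-ik_i\cdot x_i} \\
&\quad\cdot K_{x_i}K_{x'_j}\,{}_{\rm out}\langle 0|T(\phi_H(x'_1)..\phi_H(x'_m)\phi_H(x_1)..\phi_H(x_n))|0\rangle_{\rm in}\,d^4x_i\,d^4x'_j
\end{aligned} \tag{10.1}$$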

The VEV of the time-ordered product of m + n fields appearing here (in common
parlance, the m + n-point Green function of the theory) has a formal perturbative
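expansion via the Gell–Mann–Low theorem (9.21),

$$\begin{aligned}
{}_{\rm out}\langle 0|T\{\phi_H(y_1)\phi_H(y_2)..\phi_H(y_m)\}|0\rangle_{\rm in} &= \sum_{p=0}^{\infty}\frac{(-i)^p}{p!}\int\langle 0|T\{\phi(y_1)\phi(y_2)..\phi(y_m) \\
&\quad\cdot H_{\rm int}(z_1)H_{\rm int}(z_2)..H_{\rm int}(z_p)\}|0\rangle\,d^4z_1d^4z_2..d^4z_p
\end{aligned} \tag{10.2}$$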

which provides the needed formal expansion of the Green function appearing in
the LSZ formula in powers of the interaction. We once again emphasize that in a
continuum field theory with no short-distance cutoff, the T-products appearing on
the right-hand side of (10.2) contain ambiguities, so a suitable regularization (e.g.,
on a spacetime lattice) is implied to make the individual terms in the perturbative
expansion meaningful. All the fields appearing on the right are, of course, free fields,
as they are in the interaction picture.
Our task in this section is to derive an important technical result, called Wick’s
theorem, which will facilitate the computation of these T-products of free fields.
Although, for the purposes of the LSZ formula, we clearly only need the VEV of the
T-products, we shall derive the more general result contained in Wick’s theorem, giving
the T-products as a sum of terms involving products of normal-ordered products of the
fields and c-number two-point Green functions (i.e., free Feynman propagators). As
a normal-ordered product of fields vanishes when it encounters the vacuum either on
the left (bra-state) or right (ket-state), only terms involving products of Feynman
propagators (and no normal-ordered products) will actually survive in the VEV
appearing on the right of (10.2). Nevertheless, the more general operator result derived
below is important in other contexts, and worth the small additional effort required.
The reader should also note that we have reverted to the notation used prior to Section
9.2, wherein unsubscripted fields, such as φ(x), are free interaction-picture fields (as in
Chapters 7 and 8), while Heisenberg fields are explicitly distinguished by a subscript
“H”, as in φH (x).
Wick’s theorem is usually proved by an induction procedure, starting with the
result for two fields—the simplest non-trivial case. A short calculation, using the fact
that the positive (destruction) and negative (creation) frequency parts of the free field
operator φ(x) have a c-number commutator, shows that the difference between the
time-ordered and normal-ordered product of two fields is itself a c-number. The T-
product of φ(x1 ) and φ(x2 ) is symmetric in its arguments, so with no loss of generality
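we may assume $x_1^0 > x_2^0$, whence

$$\begin{aligned}
&T(\phi(x_1)\phi(x_2)) - {:}\phi(x_1)\phi(x_2){:} \\
&\quad= \phi(x_1)\phi(x_2) - {:}(\phi^{(+)}(x_1)+\phi^{(-)}(x_1))(\phi^{(+)}(x_2)+\phi^{(-)}(x_2)){:} \\
&\quad= (\phi^{(+)}(x_1)+\phi^{(-)}(x_1))(\phi^{(+)}(x_2)+\phi^{(-)}(x_2)) \\
&\qquad- (\phi^{(+)}(x_1)\phi^{(+)}(x_2)+\phi^{(-)}(x_1)\phi^{(+)}(x_2)+\phi^{(-)}(x_2)\phi^{(+)}(x_1)+\phi^{(-)}(x_1)\phi^{(-)}(x_2)) \\
&\quad= [\phi^{(+)}(x_1),\phi^{(-)}(x_2)] = \text{c-number}
\end{aligned} \tag{10.3}$$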

On the other hand, a c-number is equal to its VEV, and by definition the normal-
ordered product vanishes when sandwiched between vacuum states, so taking the VEV
of the operator difference above we find

$$T(\phi(x_1)\phi(x_2)) - {:}\phi(x_1)\phi(x_2){:} = \langle 0|T(\phi(x_1)\phi(x_2))|0\rangle \tag{10.4}$$

or

$$\begin{aligned}
T(\phi(x_1)\phi(x_2)) &= {:}\phi(x_1)\phi(x_2){:} + \langle 0|T(\phi(x_1)\phi(x_2))|0\rangle \\
&= {:}\phi(x_1)\phi(x_2){:} + i\Delta_F(x_1,x_2)
\end{aligned} \tag{10.5}$$

giving the desired resolution of the time-ordered product in terms of normal-ordered
products and c-number functions: in this case, simply the Feynman propagator (times
a factor of i; see (7.214)). We shall not pursue the inductive proof further here,1
but rather provide an “all-at-once” proof of Wick’s theorem for scalar fields using
functional methods that closely parallel the way in which the theorem emerges in the
path-integral formulation of field theory (discussed below), as a combinatoric property
of the (functional) derivatives of a Gaussian functional. The proof shows that Wick’s
theorem is basically a manifestation of the Baker–Campbell–Hausdorff (BCH) formula,
which is frequently found to be lurking in the background when operator reordering
issues arise. We remind the reader that if two operators A and B have a commutator
[A, B] which itself commutes with both A and B, then
$$e^{A+B} = e^{-\frac{1}{2}[A,B]}\,e^{A}e^{B} \tag{10.6}$$
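
As a quick sanity check of (10.6), the identity can be verified exactly in the simplest setting where its hypothesis holds: 3×3 strictly upper-triangular matrices, whose commutator commutes with both factors. The sketch below is purely illustrative; the helper names (`exp_nilpotent`, `bch_holds`) and the choice of matrices are assumptions made for this example and do not come from the text.

```python
from fractions import Fraction

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(X, Y):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(X, Y)]

def scale(c, X):
    return [[c * x for x in row] for row in X]

def commutator(X, Y):
    return add(matmul(X, Y), scale(-1, matmul(Y, X)))

def exp_nilpotent(N):
    """exp(N) for a strictly upper-triangular 3x3 matrix: N^3 = 0, so the
    exponential series terminates after the quadratic term."""
    identity = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]
    return add(add(identity, N), scale(Fraction(1, 2), matmul(N, N)))

def bch_holds(a, b):
    """Check e^{A+B} == e^{-[A,B]/2} e^A e^B for A = a*E12, B = b*E23."""
    A = [[0, a, 0], [0, 0, 0], [0, 0, 0]]
    B = [[0, 0, 0], [0, 0, b], [0, 0, 0]]
    C = commutator(A, B)  # proportional to E13, which commutes with A and B
    lhs = exp_nilpotent(add(A, B))
    rhs = matmul(matmul(exp_nilpotent(scale(Fraction(-1, 2), C)),
                        exp_nilpotent(A)), exp_nilpotent(B))
    return lhs == rhs

print(bch_holds(Fraction(3), Fraction(5)))
```

Exact rational arithmetic is used so that the comparison is an equality rather than a floating-point tolerance test.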

1 See Bjorken and Drell (Bjorken and Drell, 1965), for the standard operatorial proof.
We now consider a very simple interaction Hamiltonian, linear in the canonical scalar
field, so that

$$V_{ip}(t) = \int H_{\rm int}(\vec{x},t)\,d^3x = \int j(\vec{x},t)\phi(\vec{x},t)\,d^3x \tag{10.7}$$

where j(x, t) is an unspecified real c-number source function, with respect to which
we shall later wish to perform functional derivatives. This hermitian interaction
Hamiltonian determines a unitary evolution operator U (t, t0 ) in the usual way
(cf. (4.28)):

$$U(t,t_0) = T\Big\{\exp\Big(-i\int_{t_0}^{t}V_{ip}(\tau)d\tau\Big)\Big\} \tag{10.8}$$

with

$$\frac{\partial U(t,t_0)}{\partial t} = -i\Big(\int j(\vec{x},t)\phi(\vec{x},t)\,d^3x\Big)U(t,t_0) \tag{10.9}$$

The expansion (4.27) shows that time-ordered products of arbitrarily many φ(x) fields
can be obtained from knowledge of U (t, t0 ) by differentiating it with respect to the
source functions j(x). Next, note that if we define a new (unitary) operator E(t, t0 )
by omitting the time-ordering

$$E(t,t_0) \equiv \exp\Big(-i\int_{t_0}^{t}V_{ip}(\tau)d\tau\Big) = e^{-i\int j(x)\phi(x)d^4x} = e^{-i\int j(x)(\phi^{(+)}(x)+\phi^{(-)}(x))d^4x} \tag{10.10}$$

the connection to a normal-ordered quantity is immediate using the BCH formula
(10.6):

$$\begin{aligned}
{:}E(t,t_0){:} &= {:}e^{-i\int j(x)(\phi^{(+)}(x)+\phi^{(-)}(x))d^4x}{:} = e^{-i\int j(x)\phi^{(-)}(x)d^4x}\,e^{-i\int j(x)\phi^{(+)}(x)d^4x} \\
&= e^{-\frac{1}{2}\int j(x_1)j(x_2)[\phi^{(-)}(x_1),\phi^{(+)}(x_2)]d^4x_1d^4x_2}\,E(t,t_0)
\end{aligned} \tag{10.11}$$

where the time-integrals associated with the spacetime coordinates x1 , x2 are implicitly
assumed to go from t0 to t. Our objective of finding a connection between the time-
ordered and normal-ordered field products is therefore accomplished if we can find a
simple relation between U (t, t0 ) and E(t, t0 ). We do this by studying the time-evolution
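equation satisfied by E(t, t0 ):

$$\frac{\partial E(t,t_0)}{\partial t} = \lim_{\Delta t\to 0}\frac{1}{\Delta t}\Big(e^{-i\Delta t\,V_{ip}(t)-i\int_{t_0}^{t}V_{ip}(\tau)d\tau} - e^{-i\int_{t_0}^{t}V_{ip}(\tau)d\tau}\Big) \tag{10.12}$$

Now, define $A \equiv -i\Delta t\,V_{ip}(t)$, $B \equiv -i\int_{t_0}^{t}V_{ip}(\tau)d\tau$, and apply (10.6), expanded to first
order in $\Delta t$, so that

$$e^{A+B} = \Big(1 - \frac{1}{2}[A,B] + A + O((\Delta t)^2)\Big)e^{B} \tag{10.13}$$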
Inserting this in (10.12) gives the desired first time-derivative of E(t, t0 ):

$$\frac{\partial E(t,t_0)}{\partial t} = \Big(-iV_{ip}(t) + \frac{1}{2}\int_{t_0}^{t}[V_{ip}(t),V_{ip}(\tau)]d\tau\Big)E(t,t_0) \tag{10.14}$$

It now follows directly that the operator

$$E'(t,t_0) \equiv e^{-\frac{1}{2}\int_{t_0}^{t}d\tau_1\int_{t_0}^{\tau_1}d\tau_2[V_{ip}(\tau_1),V_{ip}(\tau_2)]}\,E(t,t_0) \tag{10.15}$$

satisfies (see Problem 1) the same first-order differential equation as U (t, t0 )

$$\frac{\partial E'(t,t_0)}{\partial t} = -iV_{ip}(t)E'(t,t_0) \tag{10.16}$$

and the same initial condition, $E'(t_0,t_0) = U(t_0,t_0) = 1$, whence

$$\begin{aligned}
U(t,t_0) = E'(t,t_0) &= e^{-\frac{1}{2}\int_{t_0}^{t}d\tau_1\int_{t_0}^{\tau_1}d\tau_2[V_{ip}(\tau_1),V_{ip}(\tau_2)]}\,E(t,t_0) \\
&= e^{-\frac{1}{2}\int_{t_0}^{t}d\tau_1\int_{t_0}^{t}d\tau_2\,\theta(\tau_1-\tau_2)[V_{ip}(\tau_1),V_{ip}(\tau_2)]}\,E(t,t_0)
\end{aligned} \tag{10.17}$$

Now let t → +∞ and t0 → −∞, so that in effect we are studying the functionals
U (+∞, −∞) = S[j], : E(+∞, −∞) : = E[j], with

$$S[j] = T\Big\{\exp -i\int j(x)\phi(x)d^4x\Big\} \tag{10.18}$$

$$E[j] = {:}\exp -i\int j(x)\phi(x)d^4x{:} \tag{10.19}$$

and the integrals extend over all spacetime. Of course, S[j] is just the S-matrix (more
precisely, the S-operator whose matrix elements constitute the S-matrix) for the system
with interaction Hamiltonian (10.7). Using the normal-ordering result (10.11) relating
E to : E :, we find

$$S[j] = e^{\frac{1}{2}\int d^4x_1d^4x_2\,j(x_1)j(x_2)\{[\phi^{(-)}(x_1),\phi^{(+)}(x_2)]-\theta(x_1^0-x_2^0)[\phi(x_1),\phi(x_2)]\}}\,E[j] \tag{10.20}$$

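The exponent in (10.20) can be considerably simplified. As it is a c-number (involving
only commutators of the free field φ(x)), it is equal to its VEV:

$$\begin{aligned}
&\langle 0|[\phi^{(-)}(x_1),\phi^{(+)}(x_2)] - \theta(x_1^0-x_2^0)[\phi(x_1),\phi(x_2)]|0\rangle \\
&\quad= \langle 0|-\phi^{(+)}(x_2)\phi^{(-)}(x_1) - \theta(x_1^0-x_2^0)[\phi(x_1),\phi(x_2)]|0\rangle \\
&\quad= \langle 0|-\phi(x_2)\phi(x_1) - \theta(x_1^0-x_2^0)[\phi(x_1),\phi(x_2)]|0\rangle \\
&\quad= \langle 0|-(\theta(x_1^0-x_2^0)+\theta(x_2^0-x_1^0))\phi(x_2)\phi(x_1) \\
&\qquad- \theta(x_1^0-x_2^0)(\phi(x_1)\phi(x_2)-\phi(x_2)\phi(x_1))|0\rangle \\
&\quad= -\langle 0|\theta(x_1^0-x_2^0)\phi(x_1)\phi(x_2)+\theta(x_2^0-x_1^0)\phi(x_2)\phi(x_1)|0\rangle \\
&\quad= -\langle 0|T(\phi(x_1)\phi(x_2))|0\rangle = -i\Delta_F(x_1,x_2)
\end{aligned} \tag{10.21}$$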
Inserting this result in (10.20) we find the desired final result (Wick’s theorem in
functional notation)

$$T\Big\{\exp -i\int j(x)\phi(x)d^4x\Big\} = {:}\exp -i\int j(x)\phi(x)d^4x{:}\;e^{-\frac{1}{2}\int j(x_1)\overbracket{\phi(x_1)\phi(x_2)}j(x_2)d^4x_1d^4x_2} \tag{10.22}$$

where we have introduced the concept of the “contraction of the fields φ(x1 ) and
φ(x2 )”, written $\overbracket{\phi(x_1)\phi(x_2)}$ and defined in this case simply as the Feynman two-point
function of the fields in question:

$$\overbracket{\phi(x_1)\phi(x_2)} \equiv \langle 0|T(\phi(x_1)\phi(x_2))|0\rangle = i\Delta_F(x_1,x_2) \tag{10.23}$$

The time-ordered and normal-ordered products of fields are recovered simply by taking
the desired number of functional derivatives of (10.22) with respect to the c-number
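source function j(x). Thus, the explicit expansion

$$T\Big\{\exp -i\int j(x)\phi(x)d^4x\Big\} = \sum_{n=0}^{\infty}\frac{(-i)^n}{n!}\int d^4x_1d^4x_2..d^4x_n\,j(x_1)..j(x_n)\,T\{\phi(x_1)..\phi(x_n)\} \tag{10.24}$$

implies

$$T\{\phi(x_1)\phi(x_2)..\phi(x_n)\} = i^n\frac{\delta^n}{\delta j(x_1)\delta j(x_2)..\delta j(x_n)}T\Big\{\exp -i\int j(x)\phi(x)d^4x\Big\}\Big|_{j=0} \tag{10.25}$$

and similarly

$${:}\phi(x_1)\phi(x_2)..\phi(x_n){:} = i^n\frac{\delta^n}{\delta j(x_1)\delta j(x_2)..\delta j(x_n)}\,{:}\exp -i\int j(x)\phi(x)d^4x{:}\,\Big|_{j=0} \tag{10.26}$$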
The special case n = 2 derived above, (10.5), emerges immediately by taking the
second functional derivative of (10.22) with respect to j(x1 ), j(x2 ) (and setting j = 0).
In general, the application of functional derivatives to (10.22) clearly results in an
expansion giving the T-product of n fields as a sum of terms where normal products
of all possible subsets of the fields are multiplied by products of contractions (i.e.,
Feynman propagators) of the remaining fields. The reader is strongly encouraged to
verify this explicitly for the case n = 4 (see Problem 2).
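
The combinatorial content of this statement can be made concrete with a short enumeration. The following is a minimal sketch, not a proof: it lists, for n bosonic fields, every way of leaving a subset of fields inside the normal-ordered product and contracting the rest in pairs. The function names (`partial_pairings`, `wick_terms`) and the printed notation `C(i,j)` for a contraction of φ(x_i) with φ(x_j) are assumptions introduced for this illustration.

```python
def partial_pairings(items):
    """Yield (contractions, uncontracted) for every way of pairing off
    a subset of `items`, leaving the remainder uncontracted."""
    if not items:
        yield [], []
        return
    first, rest = items[0], items[1:]
    # Case 1: `first` stays inside the normal-ordered product.
    for pairs, free in partial_pairings(rest):
        yield pairs, [first] + free
    # Case 2: `first` is contracted with one of the later fields.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for pairs, free in partial_pairings(remaining):
            yield [(first, partner)] + pairs, free

def wick_terms(n):
    """Return the terms of the Wick expansion of T{phi(x1)..phi(xn)} as strings."""
    terms = []
    for pairs, free in partial_pairings(list(range(1, n + 1))):
        normal = (":" + "".join(f"phi(x{i})" for i in free) + ":") if free else ""
        props = "".join(f"C({i},{j})" for i, j in pairs)
        terms.append((normal + " " + props).strip() or "1")
    return terms

for term in wick_terms(4):
    print(term)
```

For n = 4 the enumeration produces one term with no contraction, six with a single contraction, and three fully contracted terms; only the last three survive in the VEV.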
The preceding formulation of Wick’s theorem connects extremely naturally with
the functional (path-integral) formulation of field theory which we shall discuss in
Section 10.3 (indeed, we have chosen to derive Wick’s theorem in the functional
language for precisely this reason), and can be extended to fermionic fields by using
the Grassmann algebra technology to be discussed in Section 10.3.2 (Evans et al.,
1998). For fermions, the two-point Feynman Green functions are necessarily defined
with a minus sign in the anti-time-ordered part (cf. Chapter 7, Problems 5 and 11);
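for example, for a Dirac field

$$\begin{aligned}
S_F(x_1-x_2)_{mn} &= \langle 0|T(\psi_m(x_1)\bar\psi_n(x_2))|0\rangle \equiv \overbracket{\psi_m(x_1)\bar\psi_n(x_2)} \\
&\equiv \theta(x_1^0-x_2^0)\langle 0|\psi_m(x_1)\bar\psi_n(x_2)|0\rangle - \theta(x_2^0-x_1^0)\langle 0|\bar\psi_n(x_2)\psi_m(x_1)|0\rangle
\end{aligned}$$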
while the corresponding two-point contractions for two ψ or two ψ̄ fields vanish:

$$\overbracket{\psi_m(x_1)\psi_n(x_2)} = \overbracket{\bar\psi_m(x_1)\bar\psi_n(x_2)} = 0 \tag{10.27}$$

The normal-ordered product of two fermionic fields likewise contains additional minus
signs when creation or annihilation parts of the fields are interchanged to effect the
normal-ordering. With these changes, and introducing2 c-number sources η(x), η̄(x)
(instead of the commuting j(x) above) which anticommute for arbitrary spacetime
points x (e.g., {η(x), η(y)} = 0, ∀x, y), one again recovers the basic result (10.22).3
The result for time-ordered products of fermionic fields, i.e., once the anticommuting
sources are removed by functional differentiation, is as above—namely, an expansion
containing all possible subsets of fields under the normal product, multiplied by all
possible contractions of the remaining fields—with the proviso that an extra minus
sign must be included in each term where fermionic fields on the right-hand side (both
in the normal-ordered parts as well as in the contractions) appear in an order which
is an odd permutation of the order in which they appear in the time-ordered product
on the left (as a result of the difference in sign obtained by reordering the attached
sources on the left versus the right).
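
The sign rule just described amounts to computing the parity of a permutation, which the following sketch illustrates. It assumes that all fields involved are fermionic and that each field is labelled by its position (0, 1, 2, ...) in the time-ordered product on the left; the function names and this labelling convention are assumptions made for the example.

```python
def permutation_sign(order):
    """Return +1 for an even permutation and -1 for an odd one,
    determined by counting inversions."""
    inversions = sum(
        1
        for i in range(len(order))
        for j in range(i + 1, len(order))
        if order[i] > order[j]
    )
    return -1 if inversions % 2 else 1

def fermionic_term_sign(contractions, uncontracted=()):
    """Sign of one term in the fermionic Wick expansion.

    The term is taken to be written as the normal-ordered (uncontracted)
    fields followed by the contraction pairs, each field labelled by its
    position in the original time-ordered product."""
    order = list(uncontracted) + [f for pair in contractions for f in pair]
    return permutation_sign(order)

# T(psi(x1) psibar(x2) psi(x3) psibar(x4)), fields labelled 0, 1, 2, 3.
print(fermionic_term_sign([(0, 1), (2, 3)]))  # fields appear in the original order
print(fermionic_term_sign([(0, 3), (2, 1)]))  # order 0, 3, 2, 1 is an odd permutation
```

In this example the two non-vanishing complete contractions enter with opposite signs, which is the familiar relative minus sign between the two terms of the free fermionic four-point function.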
Another important generalization of Wick’s theorem, more transparently obtained
in an operatorial proof, states that in the expansion of time-ordered products involving
(under the time-ordering symbol) already normal-ordered products of fields, contrac-
tions of the fields within each such normal-ordered product are omitted in the Wick
expansion. The interested reader is referred to the standard texts, e.g., (Bjorken and
Drell, 1965), for a complete operatorial proof of this extension.

10.2 Feynman graphs and Feynman rules


We return now to our original objective: the derivation of a perturbative expansion
for the S-matrix elements as given by the LSZ formula (10.1), where the T-product
of Heisenberg fields is expanded perturbatively via the formal expansion (10.2). At
p’th order of this expansion, the relevant VEV of interaction-picture fields involves
the time-ordered product

$$\langle 0|T\{\phi(x'_1)..\phi(x'_m)\phi(x_1)..\phi(x_n)H_{\rm int}(z_1)...H_{\rm int}(z_p)\}|0\rangle \tag{10.28}$$

The calculation is conveniently divided into two stages: (a) a Wick expansion of the
T-product appearing in (10.28), yielding the coordinate space Feynman rules of the
theory, and (b) application of the Klein–Gordon operators and evaluation of the final
(Fourier) integrals over $x_i$, $x'_j$ in (10.1), yielding the momentum-space Feynman rules
of the theory.

2 A detailed description of the properties of such fermionic c-number functions is deferred to Section
10.3.2, when we consider fermionic functional integration.
3 The desired generalization of Wick’s theorem, where bosonic and/or fermionic fields appear under the
time-ordered product, can also be accomplished by using an inductive operatorial proof. See, for example,
Bjorken and Drell, Relativistic Quantum Fields, Section 17.4 (Bjorken and Drell, 1965).
For the moment, assume that we are dealing with a $\phi^4$ theory (specifically, Theory
A of Section 7.6, with $H_{\rm int}(x) = \frac{\lambda}{4!}\phi(x)^4$) of self-interacting massive scalar particles.4
Thus the matrix element in (10.28) contains the T-product of N = m + n + 4p free
scalar fields, at spacetime points which we may relabel temporarily $y_1, y_2, ...y_N$ (with
some of the $y_i$ repeated four times). Wick’s theorem then gives the VEV in (10.28) as
a sum of terms of the form

$$\overbracket{\phi(y_1)\phi(y_2)}...\overbracket{\phi(y_{p+1})\phi(y_{p+2})}...\overbracket{\phi(y_{N-1})\phi(y_N)},\quad N\ \text{even, 0 otherwise} \tag{10.29}$$

Here the scalar fields $\phi(y_1), ...\phi(y_N)$ are a permutation of the fields occurring in the
particular term of interest extracted from the product

$$\phi(x'_1)..\phi(x'_m)\phi(x_1)..\phi(x_n)H_{\rm int}(z_1)..H_{\rm int}(z_p)$$

so the spacetime coordinates $y_1, y_2, ..y_N$ are selected from the $x_i$, i = 1, 2, ..n, $x'_j$,
j = 1, 2, ..m, $z_k$, k = 1, 2, ..p. Recall from (10.1) that the spacetime coordinates $x_i$ are
associated with the n incoming particles, the coordinates $x'_j$ with the m outgoing
particles, and the $z_k$ coordinates with the spacetime points at which interactions occur.
Thus, the contractions occurring in (10.29) are of three kinds:
1. Contractions between an external spacetime point (i.e., one of the $x_i$ or $x'_j$) and
an interaction point $z_k$.
2. Contractions between two interaction points zk .
3. Contractions between two external spacetime points.
We may dispose immediately of the last case, in which external particles are connected
directly rather than through the intermediacy of an interaction, as it actually leads
to a vanishing disconnected contribution.5 Taking the external points to be $x_1$, $x'_1$ for
simplicity, the integrals over $x_1$, $x'_1$ factorize from the rest of the expression in (10.1)

$$\begin{aligned}
&\int e^{ik'_1\cdot x'_1-ik_1\cdot x_1}K_{x_1}K_{x'_1}\overbracket{\phi(x'_1)\phi(x_1)}\,d^4x_1d^4x'_1 \\
&\quad= \int e^{ik'_1\cdot x'_1-ik_1\cdot x_1}K_{x_1}K_{x'_1}\,i\Delta_F(x'_1-x_1)\,d^4x_1d^4x'_1 \\
&\quad= i\int e^{ik'_1\cdot x'_1-ik_1\cdot x_1}K_{x_1}\delta^4(x'_1-x_1)\,d^4x_1d^4x'_1 \\
&\quad= i(k_1^2-m_{ph}^2)\int e^{i(k'_1-k_1)\cdot x_1}d^4x_1 \to 0,\quad k_1^2\to m_{ph}^2
\end{aligned} \tag{10.30}$$

4 We shall see later that a sensible perturbation theory—one in which the amplitudes are expanded in
terms of physically accessible low-energy parameters—requires a split of free and interacting Hamiltonians in
which quadratic terms, called counterterms, are also transferred from the free to the interaction Hamiltonian.
Our theory may also contain φ3 terms, of course. The discussion given here is readily generalized to include
the corresponding additional graphs in which two or three field lines connect at a spacetime point. Below,
we consider the case of two distinct interacting scalar fields φ, ψ (Theory B from Section 7.6) in some detail.
5 The reader will recall that although disconnected contributions in which a particle passes through
without interacting with the others are certainly present in general S-matrix amplitudes, we explicitly
removed them in the process of deriving the LSZ formula, so we should not expect to see them re-emerging
here!
and vanish once the on-mass-shell limit is taken for the external momentum k1 (note
that we have integrated the Klein–Gordon operator by parts onto the exponential in
the step leading to (10.30)).
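
The three kinds of contraction listed above can be counted explicitly in the simplest case. The sketch below takes the $\phi^4$ theory at first order (p = 1) with four external points, so that N = 8 fields are contracted, enumerates all complete pairings, and sorts each contraction by kind. The labelling of fields as `(point, copy)` tuples and the function names are assumptions made for this illustration.

```python
def complete_pairings(items):
    """Yield every way of grouping `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        for pairs in complete_pairings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + pairs

def kind(pair):
    """Classify a contraction by how many of its two fields sit at external points."""
    n_external = sum(1 for point, _ in pair if point.startswith("x"))
    return {2: "external-external", 1: "external-vertex", 0: "vertex-vertex"}[n_external]

# Four external fields plus the four fields of a single phi^4 vertex at z1.
fields = [("x1'", 0), ("x2'", 0), ("x1", 0), ("x2", 0)] + [("z1", c) for c in range(4)]

all_pairings = list(complete_pairings(fields))
connected = [p for p in all_pairings
             if all(kind(pair) == "external-vertex" for pair in p)]

print(len(all_pairings), len(connected))  # 105 24
```

Of the 7!! = 105 complete pairings, 24 = 4! attach every external point to the vertex. These 24 pairings all give the same product of propagators, which is the standard observation behind the cancellation of the 1/4! in the coupling $\lambda/4!$.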
The first two types of contraction described above form the building blocks of a
graphical representation of the perturbative amplitudes of the theory first introduced
by Feynman in his seminal work (Feynman, 1949a) on quantum electrodynamics in the
late 1940s. Contractions between fields at an external point and an interaction point
(or vertex) are referred to as the external legs of the graph. Contractions between
two interaction vertices are the internal lines of the graph. For a simple illustration,
consider Theory B in Section 7.6, with interaction Hamiltonian

Hint (x) = λψ † (x)ψ(x)φ(x) (10.31)

with φ a self-conjugate, ψ a complex scalar field. In this theory, the only non-zero
contractions are $\overbracket{\phi(x_1)\phi(x_2)} = i\Delta_F(x_1-x_2;M)$ and $\overbracket{\psi(x_1)\psi^\dagger(x_2)} = i\Delta_F(x_1-x_2;m)$.
In Section 7.6.3 we computed the second-order (n=2) contribution to $\psi-\psi^c$ scattering
in a theory with this interaction Hamiltonian. In this order the relevant LSZ formula
contains the T-product $\langle 0|T\{\psi_H^\dagger(x'_1)\psi_H(x'_2)\psi_H(x_1)\psi_H^\dagger(x_2)\}|0\rangle$, the expansion of which
to second order employing the Gell–Mann–Low formula (10.2) contains the following
T-product of ten interaction-picture fields:

$$\langle 0|T\{\psi^\dagger(x'_1)\psi(x'_2)\psi(x_1)\psi^\dagger(x_2)\psi^\dagger(z_1)\psi(z_1)\phi(z_1)\psi^\dagger(z_2)\psi(z_2)\phi(z_2)\}|0\rangle \tag{10.32}$$

In the application of Wick’s theorem to this expression, we recall that only connected
contributions, in which any spacetime point can be connected with any other by a
continuous sequence of contractions, are phenomenologically relevant. It will soon
become clear that these are precisely the contributions which give rise (after the
Fourier transformation effected by integrating over the external particle exponential
factors) to a single overall δ-function of four-momentum conservation (cf. Chapter 6).
In the particular case here, there are just four possible connected contractions, which
correspond to two topologically distinct graphs, duplicated by the symmetry z1 ↔ z2 ,
which simply produces a factor of 2 when the integrals over z1 , z2 are performed. Thus,
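the relevant part of the Wick expansion of (10.32) is

$$\begin{aligned}
&\{\overbracket{\psi(x_1)\psi^\dagger(z_1)}\;\overbracket{\psi(z_1)\psi^\dagger(x_2)}\;\overbracket{\psi(x'_2)\psi^\dagger(z_2)}\;\overbracket{\psi(z_2)\psi^\dagger(x'_1)} \\
&\quad+ \overbracket{\psi(x_1)\psi^\dagger(z_1)}\;\overbracket{\psi(z_2)\psi^\dagger(x_2)}\;\overbracket{\psi(x'_2)\psi^\dagger(z_2)}\;\overbracket{\psi(z_1)\psi^\dagger(x'_1)}\}\;\overbracket{\phi(z_1)\phi(z_2)} + (z_1\leftrightarrow z_2) \\
&= \{i\Delta_F(x_1-z_1;m)\cdot i\Delta_F(z_1-x_2;m)\cdot i\Delta_F(x'_2-z_2;m)\cdot i\Delta_F(z_2-x'_1;m) \\
&\quad+ i\Delta_F(x_1-z_1;m)\cdot i\Delta_F(z_2-x_2;m)\cdot i\Delta_F(x'_2-z_2;m)\cdot i\Delta_F(z_1-x'_1;m)\} \\
&\quad\times i\Delta_F(z_1-z_2;M) + (z_1\leftrightarrow z_2) \\
&\equiv G_{\rm scatt}(x'_1,x'_2,x_1,x_2,z_1,z_2)
\end{aligned} \tag{10.33}$$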

[Fig. 10.1 Coordinate space contractions for $\psi-\psi^c$ scattering in second order.]

[Fig. 10.2 Coordinate space contractions for $\psi-\psi^c$ scattering in fourth order, containing a
closed loop.]

The contractions appearing here can be represented graphically as indicated in
Fig. 10.1 (note the similarity to Fig. 7.2). At this point the reader may well be
wondering about our frequently expressed cautions concerning the inevitability of ill-defined
results in calculations performed ab initio in the continuum: our T-products of
fields appearing in the fundamental LSZ formula (10.1), which need only the further
application of some spacetime-derivatives (in the Klein–Gordon operators) and Fourier
integrals to yield the desired scattering amplitude, seem by Wick’s theorem to dissolve
into sums of at first sight perfectly well-defined products of free propagators (i.e.,
the Feynman functions $\Delta_F(x)$). These propagators are indeed, as distributions, well
defined. For example, their Fourier transforms exist, also as well-defined distributions
(namely, the familiar $1/(q^2-m^2+i\epsilon)$ factors). The problem arises from the fact
that products of well-defined distributions are not necessarily well-defined, although in
particular cases they may well be so. Problems with the multiplication of distributions
typically arise when the distributions being multiplied have coincident singularities.
In the situation here discussed, (10.33), no two Feynman propagators have the same
coordinate space arguments, so this situation does not arise. The fourth-order graph
shown in Fig. 10.2, on the other hand, evidently contains the square of the propagator

$$\Delta_F(z_2-z_3;m)\Delta_F(z_3-z_2;m) = \Delta_F^2(z_2-z_3;m) \tag{10.34}$$

and this object is not a well-defined distribution! How do we know this? From the
fundamental theorem guaranteeing the existence of the Fourier transform of any
decent tempered distribution. If we attempt to compute this Fourier transform for
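the indicated squared propagator, we find (writing z for $z_2-z_3$)

$$\begin{aligned}
\Pi(q) &\equiv \int\Delta_F^2(z;m)e^{iq\cdot z}d^4z \\
&= \int\frac{e^{-ik_1\cdot z}}{k_1^2-m^2+i\epsilon}\,\frac{e^{-ik_2\cdot z}}{k_2^2-m^2+i\epsilon}\,\frac{d^4k_1}{(2\pi)^4}\frac{d^4k_2}{(2\pi)^4}\,e^{iq\cdot z}d^4z \\
&= \int\frac{1}{k_1^2-m^2+i\epsilon}\,\frac{1}{k_2^2-m^2+i\epsilon}\,(2\pi)^4\delta^4(q-k_1-k_2)\,\frac{d^4k_1}{(2\pi)^4}\frac{d^4k_2}{(2\pi)^4} \\
&= \int\frac{1}{k_1^2-m^2+i\epsilon}\,\frac{1}{(q-k_1)^2-m^2+i\epsilon}\,\frac{d^4k_1}{(2\pi)^4} = \infty!
\end{aligned} \tag{10.35}$$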

The infinity arises when we integrate (as, in a continuum theory with no short-distance
or high-momentum cutoff, we must) over all of four-dimensional momentum
space: we then discover that the integral behaves, for $k_1 \gg q$, like $\int d^4k_1/k_1^4$, and
is therefore logarithmically divergent. The (infinite) ambiguity is in this case of a very
simple form: it is removed by a single subtraction at an arbitrary value of q, and
therefore amounts to a single overall additive constant in the definition of the product
distribution. For example, if we subtract at q = 0, we obtain

$$\Pi(q)-\Pi(0) = \int\frac{2q\cdot k_1-q^2}{(k_1^2-m^2+i\epsilon)^2((q-k_1)^2-m^2+i\epsilon)}\,\frac{d^4k_1}{(2\pi)^4} \tag{10.36}$$

which is perfectly finite, as the integrand now behaves like $1/k_1^5$ at large values of $k_1$.
We shall shortly see that the integrated momentum k1 appears in graphs containing
loops as in Fig. 10.2: accordingly, such momenta are usually called “loop momenta”.
Going back to coordinate space (by an inverse Fourier transform), we see that the
(additive) ambiguity in the square of our Feynman propagator amounts to the Fourier
transform of an undetermined constant, i.e., to a four-dimensional δ-function δ 4 (z),
corresponding as expected to the short-distance limit in which the spacetime vertices z2
and z3 coincide. On the other hand, if we work—as we shall henceforth in this chapter
assume we are doing—in a regularized version of the theory6 with a suitably chosen
high-momentum cutoff, the divergence in (10.35) is removed and our amplitudes will
be well-defined at all stages. Soon, we shall see that ultraviolet divergences arising
from ill-defined products of coordinate-space distributions are associated with graphs
containing loops, leading to unbounded integrations in momentum space.
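
The difference between the logarithmic divergence of (10.35) and the finiteness of the subtracted expression (10.36) can be illustrated with a toy radial integral. The sketch below is only an analogue under stated assumptions: it uses a Euclidean-style radial measure $k^3dk$ with a sharp cutoff, and integrands chosen to mimic the large-momentum power counting ($1/k^4$ versus $1/k^5$), not the actual Minkowski-space integrals of the text. All function names are hypothetical.

```python
import math

def log_divergent(cutoff, m=1.0):
    """Closed form of int_0^cutoff k^3 dk / (k^2 + m^2)^2  (1/k^4 power counting)."""
    u = cutoff**2 + m**2
    return 0.5 * (math.log(u / m**2) + m**2 / u - 1.0)

def convergent(cutoff, m=1.0):
    """Closed form of int_0^cutoff k^3 dk / (k^2 + m^2)^(5/2)  (1/k^5 power counting)."""
    u = cutoff**2 + m**2
    return 2.0 / (3.0 * m) - u**-0.5 + m**2 * u**-1.5 / 3.0

def midpoint(f, upper, steps=200_000):
    """Midpoint-rule quadrature on [0, upper], used only to cross-check the closed forms."""
    h = upper / steps
    return h * sum(f((i + 0.5) * h) for i in range(steps))

for cutoff in (1e1, 1e2, 1e3, 1e4):
    print(f"cutoff={cutoff:8.0f}  1/k^4 case: {log_divergent(cutoff):8.4f}"
          f"  1/k^5 case: {convergent(cutoff):8.4f}")
```

Each factor of ten in the cutoff adds roughly ln 10 to the first column, while the second column settles towards 2/(3m): the same qualitative behaviour as the unsubtracted and subtracted loop integrals.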
Returning to our initial task, the evaluation of the nth order perturbative con-
tributions to the S-matrix element in (10.1), we see that the final result is obtained

6 A detailed account of suitable regularizations is postponed to Chapter 16, where we begin the discussion
of the sensitivity of field-theory amplitudes to short-distance cutoffs in the theory.
by applying Klein–Gordon operators $\Box + m_{ph}^2$ (recall: $m_{ph}$ is the actual physical mass
of the particle) to the outer vertex on each of the external legs (for both incoming
and outgoing particles). Equivalently, we may integrate the derivatives by parts onto
the plane-wave exponentials, obtaining factors of $m_{ph}^2 - k_i^2$ (resp. $m_{ph}^2 - k_i'^2$) for each
of the incoming (resp. outgoing) particles in the process. These, together with the
indicated phase-space factors involving the square-root of the particle energies, then
multiply the Fourier transform with respect to the external momenta of the coordinate
space product of contractions (i.e., free propagators) arising from Wick’s theorem as
discussed above. In our explicit example (10.33), the reader may easily
verify (see Problem 3) that the Fourier transform of the indicated contractions, integrated
also over the interaction vertex points $z_1$, $z_2$ as required by (10.2), yields (with a factor
of two due to the $z_1 \leftrightarrow z_2$ symmetry)

$$\begin{aligned}
&\int e^{ik'_1\cdot x'_1+ik'_2\cdot x'_2-ik_1\cdot x_1-ik_2\cdot x_2}\,G_{\rm scatt}(x'_1,x'_2,x_1,x_2,z_1,z_2)\prod_{i=1,2}d^4x_i\,d^4x'_i\,d^4z_i \\
&\quad= 2\times i\tilde\Delta_F(k'_1;m)\cdot i\tilde\Delta_F(k'_2;m)\cdot i\tilde\Delta_F(k_1;m)\cdot i\tilde\Delta_F(k_2;m) \\
&\qquad\times(i\tilde\Delta_F(k_1+k_2;M)+i\tilde\Delta_F(k_1-k'_1;M))\times(2\pi)^4\delta^4(k'_1+k'_2-k_1-k_2)
\end{aligned} \tag{10.37}$$

in terms of the momentum-space Feynman propagators $\tilde\Delta_F(k; m) = \frac{1}{k^2 - m^2 + i\epsilon}$. In the
process of performing the integrations over the locations of the interaction vertices
zi , four-dimensional δ-functions implementing energy-momentum conservation at each
vertex are generated. After integrating over the momenta carried by each propagator,
we are left in this case (as the graph is connected: cf. Chapter 6) with a single overall δ-
function enforcing energy-momentum conservation for the entire process. The products
of Feynman propagators appearing in (10.37) have a clear graphical interpretation: the
relevant graphs are in fact just those displayed in Fig. 7.2, in our original discussion of
this scattering process in Chapter 7. We may summarize the ingredients of the above
calculation in a general way valid for the perturbative calculation of arbitrary S-matrix
amplitudes of our theory (with interaction Hamiltonian Hint (x) = λψ† (x)ψ(x)φ(x)),
as a set of Feynman rules associating specific algebraic expressions with each of the
graphical elements:
1. At each 4-vertex, a factor −iλ, and a four-momentum conservation factor which
ensures that the sum of incoming momenta to the vertex equals the sum of
outgoing momenta: namely, $(2\pi)^4\,\delta^4(\Sigma)$ (where Σ is a shorthand notation for the
difference in the total incoming and outgoing four-momenta at the vertex).
2. For each line carrying momentum q, a factor $i\tilde\Delta_F(q) = \frac{i}{q^2 - m^2 + i\epsilon}$.
3. Integrate over all internal momenta qi : i.e., those associated with the internal
lines of the diagram, connecting two interaction vertices. The external lines corresponding
to propagators beginning at one of the external points $x_i$, $x_i'$ are fixed
at the corresponding external particle momenta. In general there will be more
internal momenta present than δ-functions available to fix them, leaving some
number L of remaining four-momentum integrals. If we think (as a consequence
of rule 1) of the four-momentum as a conserved “fluid” flowing through the graph,
it is immediately apparent that the number L of such remaining integrals must
correspond to the number of independent closed loops around which an arbitrary
amount of four-momentum can flow without vitiating energy-momentum conservation
at any interaction vertex on the loop. Evidently, ultraviolet divergences
can only arise if there are one or more closed loops (L ≥ 1) in the graph. Thus,
the diagram in Fig. 10.2 has L =1 and is referred to as a “one-loop diagram”:
as we saw earlier, it has a logarithmic ultraviolet divergence. We shall soon see
that the organization of the Feynman graphs of the theory by loop number has
a deep physical significance: it amounts to an ordering in increasing powers of
Planck’s constant ħ. Graphs such as Fig. 10.1, with zero loops, are commonly
called “tree diagrams” and correspond in some sense to the classical content of
the theory. Later (Section 10.4) we shall see that these tree diagrams encode a
formal perturbative solution of the classical field equations of the theory.
4. For each topologically distinct graph, apply a combinatoric factor taking into
account the number of ways this contribution arises from the Wick contractions
in (10.2). Although it is possible to write down an explicit formula for this factor
in any specific field theory, in the author’s experience it is generally easier to
retreat to an undisclosed location and then examine the Wick contractions to
identify the necessary factor.
5. For each external line corresponding to a particle of momentum $k_i$, a factor
$\frac{-i}{Z^{1/2}}\,(k_i^2 - m_{\mathrm{ph}}^2)\,\frac{1}{(2\pi)^{3/2}\sqrt{2E(k_i)}}$. When the momentum $k_i$ is placed on-shell (i.e.,
$k_i^2$ is set equal to $m_{\mathrm{ph}}^2$), the factor $(k_i^2 - m_{\mathrm{ph}}^2)$ vanishes, so the remaining part of
the amplitude must provide a simple pole (i.e., be proportional to $1/(k_i^2 - m_{\mathrm{ph}}^2)$)
in the on-shell limit for our perturbative expansion of the LSZ formula to yield
sensible results. How this can be ensured will be addressed shortly. The normal-
ization constant Z is extracted, as per our discussion of the Lehmann spectral
representation in Section 9.5, from the (perturbatively computed) residue of the
single-particle pole of the full Feynman propagator (i.e., the Fourier transform of
the two-point Feynman Green function of the Heisenberg field).
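
The loop counting described in rule 3 can be made concrete with a short sketch. It relies on the standard counting argument for a connected graph: each internal line contributes one momentum integral, each vertex one δ-function, and one combination of the δ-functions survives as the overall conservation factor, so L = I − V + 1. The function name and the two sample graphs below are illustrative assumptions, not taken from the figures of the text.

```python
def loop_number(internal_lines: int, vertices: int, components: int = 1) -> int:
    """Number of unconstrained four-momentum integrals in a Feynman graph.

    Each internal line carries one integration variable; each vertex supplies
    one delta function; one delta function per connected component is left
    over as overall energy-momentum conservation.
    """
    return internal_lines - vertices + components


# Hypothetical tree graph: two vertices joined by a single exchanged line.
tree = loop_number(internal_lines=1, vertices=2)

# Hypothetical bubble graph: two vertices joined by two internal lines.
bubble = loop_number(internal_lines=2, vertices=2)

print(f"tree graph:   L = {tree}")
print(f"bubble graph: L = {bubble}")
```

With these inputs the tree graph has no remaining integrals, while the bubble leaves exactly one four-momentum integral, which is where an ultraviolet divergence can first appear.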

The extension of these rules to other field theories, containing fields of non-zero
spin (e.g., Dirac or vector fields), and other types of (polynomial) self-interactions,
is straightforward. Internal lines for a Dirac particle connecting vertices at y1 and y2
become propagators iSF (y1 − y2 ), and so on. There is one additional rule, perhaps
not immediately obvious from the preceding, concerning the contraction of fermionic
fields in a closed loop: in such cases, an additional minus sign must be inserted in the
amplitude, as a consequence of fermionic statistics. Consider, for example, Theory C
of Section 7.6, with interaction Hamiltonian

$$\mathcal{H}^{(C)}_{\mathrm{int}}(x) = \lambda\,\bar\psi_a(x)\,\psi_a(x)\,\phi(x) \qquad (10.38)$$

where ψ(x) is a massive Dirac field (with the Dirac index here denoted a =1,2,3,4) and
φ(x) a self-conjugate scalar field. Among the various Wick contractions of a second-
order contribution to the S-matrix expansion (10.1) with this interaction will occur
terms corresponding to the graph illustrated in Fig. 10.2, where now the bold lines
represent fermions and the wiggly lines the scalar particle. In this case we have fermion
fields at two separate vertices (at spacetime points z2 and z3 ) fully contracted so that

the two Dirac propagators form a closed loop:

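$$T(\bar\psi_{a_1}(z_2)\,\psi_{a_1}(z_2)\,\bar\psi_{a_2}(z_3)\,\psi_{a_2}(z_3)\ldots) = -\underbrace{\psi_{a_1}(z_2)\,\bar\psi_{a_2}(z_3)}\;\underbrace{\psi_{a_2}(z_3)\,\bar\psi_{a_1}(z_2)} \qquad (10.39)$$

$$= -\mathrm{Tr}\{iS_F(z_2 - z_3)\cdot iS_F(z_3 - z_2)\} \qquad (10.40)$$

(each underbrace marks a Wick contraction of the two enclosed fields).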

The minus sign arises because the reordering of the fields between the left and
right-hand side of (10.39) involves an odd permutation, and hence, by the fermionic
extension of Wick’s theorem discussed in the previous Section, an additional minus
sign. In graphical calculations this requirement is summarized in the simple rule: closed
fermion loops get an extra minus sign.7
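
The sign in (10.39) can be checked mechanically. The sketch below computes the parity of the permutation that carries the original field ordering into the ordering in which contracted partners are adjacent; the string labels for the fields and the helper name `permutation_sign` are illustrative assumptions.

```python
def permutation_sign(perm):
    """Sign of a permutation given as a list of 0-based images.

    A cycle of even length is an odd permutation, so each one flips the sign.
    """
    sign = 1
    seen = [False] * len(perm)
    for start in range(len(perm)):
        if seen[start]:
            continue
        length = 0
        j = start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        if length % 2 == 0:
            sign = -sign
    return sign


# Field ordering inside the time-ordered product on the left of (10.39).
original = ["psibar_a1(z2)", "psi_a1(z2)", "psibar_a2(z3)", "psi_a2(z3)"]
# Ordering with contracted pairs adjacent, as on the right of (10.39).
reordered = ["psi_a1(z2)", "psibar_a2(z3)", "psi_a2(z3)", "psibar_a1(z2)"]

perm = [original.index(field) for field in reordered]
loop_sign = permutation_sign(perm)
print("permutation:", perm, "sign:", loop_sign)
```

The reordering is a single cycle of length four, hence odd, which reproduces the extra minus sign attached to the closed fermion loop.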
There are several non-trivial issues which arise in interpreting the results of a
perturbative calculation of S-matrix amplitudes along the lines discussed above, some
of which have already been touched on above. Basically, obstructions to obtaining
well-defined results at each given order of perturbation theory may arise in either the
ultraviolet (short distance) or infrared (long time) domains. Even then, when a well-defined
regularized result has been obtained order by order, it must be kept in mind
that the resultant (infinite) series is never a convergent Taylor expansion, but only, at
best, an asymptotic expansion of the exact S-matrix amplitude. Let us examine these
three sets of problems in more detail.
We have already touched on the difficulties at short distance—the famous “ultra-
violet divergences” of perturbative quantum field theory—which go back to the ambi-
guities and/or divergences which arise when distributions with coincident spacetime
singularities are multiplied. These divergences are most easily seen in momentum
space, in the Fourier transforms of the products of coordinate space Green functions,
as ultraviolet divergences of loop integrals, such as (10.35), due to insufficiently
rapid falloff of the product of momentum-space propagators at large momentum.
In order to give meaning to the amplitudes we must introduce some appropriate
regularization of the loop integrals appearing in the amplitudes: namely, a large
momentum (or short distance) cutoff rendering all amplitudes finite and well defined in
all orders of perturbation theory. Here, “appropriate” means a regularization doing the
least violence manageable to the important underlying symmetries (such as Poincaré
invariance) of the theory, in such a way that eventually removal of the cutoff can be
accomplished with the restoration of these symmetries. Equivalently, we need to show
that the sensitivity of the Feynman amplitudes of the theory to the presence of the
short-distance cutoff, which can be viewed as an expression of our ignorance of as
yet unexplored new physics at very short distance scales, is for all practical purposes
negligible. These issues of scale sensitivity of field theory—and the whole machinery
of renormalization theory needed to address them adequately—will be the primary
topic of discussion in Part 4 of this book. For the time being we will simply assume,
when deriving or using perturbative results in the interaction picture, that the field

7 Of course, exactly as discussed previously for the scalar theory of Fig. 10.2, the multiplication of
two Dirac propagators with the same singularity (at z2 → z3 ) results in an undefined result, visible as a
divergence in the Fourier transform due to UV contributions to the corresponding loop integral. In this case
the divergence is even more severe, resulting in the appearance of two arbitrary constants. More on all this
in Chapters 16 and 17.

theory has been suitably regularized and that only well-defined expressions occur in
the evaluation of the Gell–Mann–Low expansion (10.2).
Another set of problems for our perturbative analysis arises from the large-distance
(or large-time) regime: specifically from the presence in a relativistic field theory
of persistent interactions which continue to affect the propagation of particles even
when (long before or long after a scattering interaction) they are well-separated from
one another and moving freely. Firstly, persistent interactions are present even in
the absence of scattering particles, i.e., in the evolution of the vacuum state. Such
“disconnected vacuum bubbles” appear with equal probability at all spacetime points
and lead to a spacetime volume dependent phase in the overlap of the in- and out-vacua
of the theory. Secondly, persistent interactions can occur as “radiative corrections”
on any of the incoming or outgoing particle lines in a scattering process. In Fig.
10.3 we display a Feynman diagram in λφ4 theory in which both types of persistent
interactions are present. We shall discuss the proper treatment of the external line
radiative corrections below. The role of the vacuum fluctuations leads to one of
the most interesting issues in modern cosmology: the cosmological constant, and its
“abnormally low” value (the so-called “cosmological constant problem”). The fact
that the energy of the discrete vacuum state is shifted by interactions is a perfectly
normal situation in any quantum-mechanical theory. The actual level of the ground-
state energy is not physically relevant in flat-space quantum field theory as it is an
unobservable quantity, but in the presence of gravity the absolute level of energies
(and energy-densities: specifically, the components of the energy-momentum tensor)
is clearly relevant. Staying with the flat-space theories which are our focus in this
book, it is easy to see that the net effect of the vacuum bubbles, defined as the set of
graphs arising from the m = 0 special case of the Gell–Mann–Low theorem (10.2),

 
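$${}_{\mathrm{out}}\langle 0|0\rangle_{\mathrm{in}} = \sum_{p=0}^{\infty} \frac{(-i)^p}{p!} \int \langle 0|T\{\mathcal{H}_{\mathrm{int}}(z_1)\mathcal{H}_{\mathrm{int}}(z_2)\cdots\mathcal{H}_{\mathrm{int}}(z_p)\}|0\rangle\, d^4z_1\, d^4z_2\cdots d^4z_p \qquad (10.41)$$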

is to introduce an overlap phase factor (see Problem 4 for an explicit example)

$${}_{\mathrm{out}}\langle 0|0\rangle_{\mathrm{in}} = \langle 0|U(+T/2, -T/2)|0\rangle = e^{-iTV\delta E} \qquad (10.42)$$

which is singular in the limit where spatial volume V and temporal extent T go to
infinity. Here δE is the interaction-induced shift in the vacuum spatial energy-density

[Figure: diagram with labels “correction to incoming propagator” and “‘free-floating’ vacuum fluctuation”]

Fig. 10.3 A disconnected Feynman diagram in λφ⁴ theory displaying persistent interactions.

(which in fact typically contains ultraviolet divergences). The presence of the spacetime
volume factor V T is to be expected, as the disconnected vacuum bubbles, such as
that shown in Fig. 10.3, are free to float over the entire spacetime volume, so the
evaluation of the Feynman diagram necessarily produces a factor of V T . In fact, this
divergence is the only singularity of an infrared nature in a theory of massive particles,
and may be removed by the simple device of taking only connected contributions to
S-matrix elements, which automatically dispenses with the noisome vacuum fluctua-
tions. Effectively, this amounts to using the same vacuum state (either in or out) on
both sides of (10.1), or equivalently, to dividing a general S-matrix element ${}_{\mathrm{out}}\langle\beta|\alpha\rangle_{\mathrm{in}}$
(as given by the LSZ formula, say) by the vacuum phase ${}_{\mathrm{out}}\langle 0|0\rangle_{\mathrm{in}}$.
Returning now to the second class of persistent interactions, those giving rise to
radiative corrections on the external legs of the diagrams, we recall from the discussion
of scattering theory in Section 9.4 that the generation of the appropriate in- and out-
states whose overlap gives the desired S-matrix element is formally accomplished by
taking the on-(physical)mass-shell limit for the four-momentum associated with each
external particle, $k_i^2 \to m_{\mathrm{ph}}^2$. In addition, the momentum-space (m + n)-point Green
function (where m, n are the number of outgoing or incoming particles, respectively)
appearing in the LSZ formula (cf. (10.1)) must be multiplied by factors $k_i^2 - m_{\mathrm{ph}}^2$
for each external particle, so that the on-mass-shell limit gives a well-defined finite
result if and only if this Green function has simple poles in each of the off-shellness
variables $k_i^2 - m_{\mathrm{ph}}^2$, with the final S-matrix amplitude given as the residue of these
poles. A careless execution of perturbation theory will result in amplitudes which,
at any fixed order of perturbation theory, fail to have the required pole behavior!
In particular, taking φ4 scalar theory as an explicit example, if we begin with the
Hamiltonian

$$H = \int d^3x\; :\!\left(\tfrac{1}{2}\dot\phi^2 + \tfrac{1}{2}|\vec\nabla\phi|^2 + \tfrac{1}{2}m^2\phi^2 + \tfrac{\lambda}{4!}\phi^4\right)\!: \qquad (10.43)$$

and define an interaction picture via the “obvious” separation H = H0 + V with


 
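$$H_0 = \int d^3x\; :\!\left(\tfrac{1}{2}\dot\phi^2 + \tfrac{1}{2}|\vec\nabla\phi|^2 + \tfrac{1}{2}m^2\phi^2\right)\!:\,, \qquad V = \int d^3x\; :\!\tfrac{\lambda}{4!}\phi^4\!: \qquad (10.44)$$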

the free momentum-space Feynman propagator $\tilde\Delta_F(k)$ will clearly have a pole at
$k^2 = m^2$, where m is the so-called “bare mass”, corresponding to the coefficient of $\phi^2$
in the Hamiltonian, but not to the actual physical mass $m_{\mathrm{ph}}$, which differs from m as
a consequence of the interaction term. This is hardly unexpected: the energy of states
in quantum mechanics (in this case, the single particle at rest) is typically shifted
from the unperturbed value once interactions are switched on. We can restore mass
stability in our perturbation theory by splitting the full Hamiltonian in a different
way, thereby forcing the free part H0 to yield a single particle state with the correct
physical mass, by defining

$$m^2 = m_{\mathrm{ph}}^2 + \delta m^2 \qquad (10.45)$$



and transferring the $\delta m^2\phi^2$ “mass counterterm” into the interaction part of the
Hamiltonian, so that we now have

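$$H_0 = \int d^3x\; :\!\left(\tfrac{1}{2}\dot\phi^2 + \tfrac{1}{2}|\vec\nabla\phi|^2 + \tfrac{1}{2}m_{\mathrm{ph}}^2\phi^2\right)\!: \qquad (10.46)$$

$$V = \int d^3x\; :\!\left(\tfrac{1}{2}\delta m^2\phi^2 + \tfrac{\lambda}{4!}\phi^4\right)\!: \qquad (10.47)$$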
with the coefficient $\delta m^2$ adjusted order by order in the expansion in the “bare
coupling” λ to ensure that the pole of the full Heisenberg propagator remains at the
required physical value (namely, at $k^2 = m_{\mathrm{ph}}^2$).⁸ The resultant perturbative amplitudes
are said to be “on-shell renormalized”. It is also possible (and frequently convenient)
to employ a more general class of perturbative splits, in which the poles of the free
propagator are not at the physical mass, but at some “intermediate mass” (coinciding
neither with the bare mass in the Hamiltonian nor the physical particle mass): this
requires the reorganization of the full Green functions into a factorized product of
full propagators on the external legs (with poles at the correct physical mass, by the
Lehmann representation of Section 9.5) and “amputated” Green functions which can
be computed perturbatively with intermediate mass renormalization. How this can be
accomplished will be described below in Section 10.4, where the relevant topological
concepts are introduced.
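
The effect of the mass counterterm on the pole position can be illustrated with a small numerical sketch. It is a deliberately stripped-down toy: only chains of δm² insertions on a single propagator are summed, the loop self-energy graphs are omitted, and the factors of i (which cancel pairwise between vertex and propagator) and the iε are dropped. The function names and numerical values are illustrative assumptions.

```python
def free_propagator(k2: float, mass2: float) -> float:
    """Propagator 1/(k^2 - mass^2), with factors of i and the i*epsilon dropped."""
    return 1.0 / (k2 - mass2)


def with_mass_insertions(k2: float, m_ph2: float, delta_m2: float, order: int) -> float:
    """Sum of chains with 0, 1, ..., `order` insertions of the delta_m2 counterterm.

    The n-insertion term is delta_m2**n / (k2 - m_ph2)**(n + 1): at any fixed
    order it has a higher-order pole at k2 = m_ph2, not a shifted simple pole.
    """
    d = free_propagator(k2, m_ph2)
    return sum(d * (delta_m2 * d) ** n for n in range(order + 1))


m_ph2, delta_m2 = 1.0, 0.1   # assumed toy values
k2 = 3.0                     # off-shell point where the geometric series converges

resummed = with_mass_insertions(k2, m_ph2, delta_m2, order=40)
shifted = free_propagator(k2, m_ph2 + delta_m2)   # simple pole at m^2 = m_ph^2 + delta_m2
print(resummed, shifted)
```

In this toy the infinite chain of counterterm insertions moves the pole from the physical mass to the bare mass of (10.45); in the full theory the loop self-energy insertions, omitted here, shift it back, which is what fixing δm² order by order accomplishes.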
Finally, there remains the “inconvenient truth” that a perturbative expansion of
field-theoretic amplitudes as a formal expansion in some suitably defined coupling
constant(s) of the theory at best provides an asymptotic expansion to the exact
amplitude, not a convergent (Taylor) expansion capable of yielding results of arbitrary
accuracy (at least in principle) by pushing the calculation to sufficiently high orders.
This should not be surprising if we recall that the same is true even in very simple
(non-field-theoretic) models in non-relativistic quantum theory. The discrete energy
eigenvalues En (λ) of the one-dimensional anharmonic oscillator Hamiltonian,

$$H = \frac{p^2}{2m} + \frac{1}{2}m\omega^2x^2 + \lambda x^4 \qquad (10.48)$$
when developed in an expansion in powers of the anharmonicity λ, produce an asymptotic
expansion $E_n(\lambda) \simeq \sum_p C_p^{(n)}\lambda^p$ in which the coefficients $C_p^{(n)}$ grow factorially with
p, so that the series in fact has zero radius of convergence. The lack of analyticity in
the λ variable is hardly surprising, if we consider the dramatically different behavior
of the theory if λ is taken negative (real), however small: the Hamiltonian then lacks
a ground state, as the λx4 part of the potential energy eventually becomes arbitrarily
negative for large enough x. The Rayleigh–Schrödinger perturbation expansion is still
useful of course, treated as an asymptotic expansion, provided λ is sufficiently small,
allowing an accurate estimate of the energy to be obtained by summing the first few
terms of the series (before the summands begin to increase). In certain cases (the

8 We remind the reader that we are assuming a fully regularized theory: in particular, all UV divergences
have been taken care of, so the perturbative expansion of the Green functions yields finite well-defined results
everywhere.

so-called “Borel summable” expansions; cf. Section 11.3) the information contained in
the perturbative coefficients in fact suffices to determine the exact amplitudes, but this
is not the case in most field theories of interest (in particular, in the gauge theories
which underlie the Standard Model of particle physics), and we must face the fact
that such theories contain qualitatively important “non-perturbative physics”, going
beyond any information which can be gleaned from a purely perturbative approach.
We will return to the issue of the non-convergence of perturbation theory, and the
extent to which “truly non-perturbative” information can be accessed by alternative
methods, in Chapter 11.
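
The behavior of an asymptotic series under truncation can be seen in a toy model simpler than the anharmonic oscillator itself. The sketch below uses the ordinary integral Z(λ) = (2π)^(-1/2) ∫ exp(−x²/2 − λx⁴) dx, an assumed zero-dimensional stand-in, whose formal expansion in λ has coefficients (−1)^p (4p−1)!!/p! that grow factorially. The value λ = 0.01 and all function names are illustrative choices.

```python
import math


def exact_z(lam, half_width=12.0, n=20001):
    """Trapezoid-rule value of (2*pi)^(-1/2) * integral of exp(-x^2/2 - lam*x^4)."""
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for i in range(n):
        x = -half_width + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * math.exp(-0.5 * x * x - lam * x ** 4)
    return total * h / math.sqrt(2.0 * math.pi)


def series_term(p, lam):
    """p-th term of the formal expansion: (-lam)^p * (4p-1)!! / p!."""
    double_factorial = 1
    for odd in range(1, 4 * p, 2):
        double_factorial *= odd
    return (-lam) ** p * double_factorial / math.factorial(p)


lam = 0.01
exact = exact_z(lam)

partial, errors = 0.0, []
for p in range(26):
    partial += series_term(p, lam)
    errors.append(abs(partial - exact))

best_order = min(range(len(errors)), key=errors.__getitem__)
print("best truncation order:", best_order, "error:", errors[best_order])
print("error after 25 terms:", errors[-1])
```

The error first decreases, reaches a minimum at a finite order, and then grows without bound as further terms are added, which is the sense in which only the first few terms of such a series are useful.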

10.3 Path-integral formulation of field theory


Although it is perfectly possible to derive essentially all important field-theoretic
results in an operator-based formalism, there are many instances in which a func-
tional formalism yields equivalent results with a fraction of the effort required by an
operatorial calculation. Moreover, a precise specification of the interacting dynamics
of a quantum field theory, valid beyond the formal expansions of perturbation theory,
is typically achieved only by formulating the theory on a finite spacetime lattice, at
which point the functional integral formalism provides an unambiguous specification
of the amplitudes of the theory, even in situations in which perturbation theory is
useless. The only successful “first principles” calculations of the spectrum of a strongly
interacting theory (in particular, of quantum chromodynamics (QCD), the field theory
of strongly interacting particles) are those which have been done in this way, by
stochastic estimates of the latticized path integral of the theory. In this section we
shall build up the basic elements of the path-integral formalism for local quantum field
theory, with our discussion of path integrals for quantum mechanical point particles
in Section 4.2 serving as a natural starting point.

10.3.1 Path integrals for bosonic fields


In our review of basic quantum mechanical formalism in Chapter 4 (Section 4.2), we
saw that ground-state expectation values of (time-ordered) Heisenberg representation
coordinate operators describing a non-relativistic particle could be re-expressed in a
functional, or “path-integral”, framework. For example, we found (cf. (4.134, 4.135))
a functional integral expression for a generating functional Z[j] whose functional
derivatives yield the Feynman Green functions (i.e., ground-state expectation value of
time-ordered Heisenberg operators) of the theory:
$$Z[j] \equiv \int Dp\,Dq\; e^{\,i\int_{-\infty}^{+\infty}\left(p(t)\dot q(t) - H(p(t),q(t)) - j(t)q(t)\right)dt} \qquad (10.49)$$

The analogous quantity for a free field theory, a generating functional for the vacuum
expectation value of time-ordered free field operators, has already been introduced
and discussed in Section 10.1 from an operatorial standpoint. Thus (introducing the
subscript 0 to indicate a free field theory, and taking a free massive scalar field to be
specific) we expect that the functional

$$Z_0[j] \equiv \langle 0|S[j]|0\rangle = \langle 0|T\{\exp\left(-i\int j(x)\phi(x)\,d^4x\right)\}|0\rangle \qquad (10.50)$$

plays a role analogous to the Z[j] in (10.49). We emphasize at this point that an overall
multiplicative constant in Z0 [j] is irrelevant when the normalized n-point functions
(compare (4.135)) are computed, as these involve derivatives of Z0 divided by Z0 .
We seek a functional integral representation for this object analogous to (10.49):
such a representation clearly involves canonical “momenta” complementary to the
coordinate degrees of freedom, which in this case are evidently the field operators at
any spatial point (on a given time-slice). The appropriate complementary quantities
were discussed in our treatment of the classical limit in Section 8.1: for our simple
scalar theory, the field operator and its time-derivative form quantum-mechanically
conjugate variables:
$$[\pi(\vec x, t), \phi(\vec y, t)] = -i\delta^3(\vec x - \vec y), \qquad \pi(\vec x, t) \equiv \frac{\partial\phi(\vec x, t)}{\partial t} \qquad (10.51)$$
We shall proceed by formally imitating the representation (10.49), with the appropriate
modifications for field theory, and then checking explicitly to ensure that the resultant
functional indeed yields the desired Feynman functions of the free field theory.
The generalization of the path-integral formula to include interactions will then be
straightforward.
Essentially, the only modification needed to convert (10.49) into a field theory
formula is to include a spatial integral, as our “coordinates” and “momenta” are now
the values of the field φ(x, t) and its time-derivative π(x, t) at all spatial points (at
any given time). This immediately yields the path integral
$$Z_0[j] = \int D\phi\,D\pi\; e^{\,i\int\left(\pi(x)\dot\phi(x) - \mathcal{H}_0(\pi(x),\phi(x)) - j(x)\phi(x)\right)d^4x} \qquad (10.52)$$

where, as expected, the time integrals of (10.49) have been augmented to spacetime
integrals. As discussed in Section 4.2.4, expressions of this kind are purely formal in
nature—they must be given a precise meaning by the following maneuvers:
1. An appropriate regularization of the spacetime continuum, so that we are dealing
with a multiple integral over a well-defined discrete set of (many!) integration
variables. An obvious way to do this is simply to imagine defining the theory on
a finite discrete spacetime lattice: in other words, in addition to discretizing the
time variable as was done in Section 4.2.1, we also replace the spatial integrals
by sums over a finite spatial lattice.9 Space and time-derivatives are, of course,
replaced by appropriate difference quantities. The need for a short-distance cutoff
to obtain well-defined results is, of course, familiar by now from our discussion
of the difficulties that can arise from multiplying distributions in perturbation
theory; here we imagine also a long-distance cutoff (our spacetime lattice is
of finite extent) so that the functional integral becomes a multi- but finite-
dimensional one. At an appropriate point, once the path integrals have been

9 We have previously referred to a field theory regularized in this way as a fully regularized theory.

performed, the continuum (i.e., zero lattice spacing) limit can be taken to obtain
the desired continuum results, and the spatial volume can then be allowed to go
to infinity.
2. Even after the functional integral is converted by the aforesaid discretization into
a conventional multi-dimensional integral, a further regularization, of a different
kind, is required to obtain well-defined results. Again, as discussed in Chapter
4, the integral (10.52) as it stands is not absolutely convergent, as the integrand
involves a complex undamped exponential. Just as in the quantum-mechanical
case, we shall see that the inclusion of an appropriate $i\epsilon$ factor is needed to ensure
absolute convergence of the integral in all directions in field space.

Does the formula (10.52) reproduce the correct Green functions, even for our very
simple case of a free (massive) scalar field? The Hamiltonian density in this case is
given by (6.89)10

$$\mathcal{H}_0 = \frac{1}{2}\pi(x)^2 + \frac{1}{2}|\vec\nabla\phi(x)|^2 + \frac{1}{2}m^2\phi(x)^2 \qquad (10.53)$$

We shall assume that spacetime has been appropriately discretized and gradients and
time-derivatives realized in such a way that the shift-invariance of the path integral
is preserved. To avoid unnecessarily complicating the notation, however, we shall
continue to use continuum notation, and to write the functional integral measure as
Dφ (rather than $\prod_i d\phi(x_i)$, for example, where $x_i$ indicates the discretized spacetime
lattice points). By shift-invariance, we mean simply
$$\int D\phi\, F[\phi + \chi] = \int D\phi\, F[\phi] \qquad (10.54)$$
$$\int D\pi\, F[\pi + \chi] = \int D\pi\, F[\pi] \qquad (10.55)$$

for any fixed (c-number) function χ(x), and functional F , assuming that the integral
exists (is absolutely convergent) in the first place. Inserting (10.53) into (10.52), we
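find that the dependence of the integrand on the momentum field π(x) is Gaussian:
$$e^{\,i\int\left(\pi(x)\dot\phi(x) - \frac{1}{2}\pi(x)^2\right)d^4x} = e^{\,\frac{i}{2}\int\dot\phi(x)^2 d^4x}\cdot e^{-\frac{i}{2}\int\left(\pi(x) - \dot\phi(x)\right)^2 d^4x} \qquad (10.56)$$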

If we perform the functional integral over the π(x) field first, holding the φ(x) field
fixed, then the shift invariance property assumed for the Dπ integral implies (taking
$\chi = -\dot\phi$ in (10.55)) that the latter decouples completely as a multiplicative factor
$C \equiv \int D\pi\, e^{-\frac{i}{2}\int\pi(x)^2 d^4x}$, leaving only the integral over the “coordinate” quantities,
i.e., the field φ(x):

10 The normal-ordering, here omitted, amounts to a shift of the Hamiltonian by a fixed c-number: this
affects the path integral by a multiplicative factor, which evidently cancels in formulas like (4.135) giving
the n-point Green functions of the theory.
 
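$$Z_0[j] = C\int D\phi\; e^{\,i\int\left(\frac{1}{2}\dot\phi(x)^2 - \frac{1}{2}|\vec\nabla\phi(x)|^2 - \frac{1}{2}m^2\phi(x)^2 - j(x)\phi(x)\right)d^4x} \qquad (10.57)$$

$$\equiv C\int D\phi\; e^{\,i\int\left(\mathcal{L}_0(\dot\phi,\vec\nabla\phi,\phi) - j\phi\right)d^4x}, \qquad \mathcal{L}_0 = \frac{1}{2}\partial_\mu\phi\,\partial^\mu\phi - \frac{1}{2}m^2\phi^2 \qquad (10.58)$$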

The result of integrating out the field momentum variables is a path integral only over
the field coordinate variables φ(x), with the integrand containing a function (the “free
field Lagrangian”) $\mathcal{L}_0(\dot\phi, \vec\nabla\phi, \phi)$ only of the field and its space and time-derivatives,
in complete analogy to (4.105) where the mechanical Lagrangian of the quantized
point particle makes a similar appearance. The appearance of the Lorentz-invariant
scalar combination $\partial_\mu\phi(x)\partial^\mu\phi(x) = \dot\phi(x)^2 - |\vec\nabla\phi(x)|^2$ at this stage is certainly an
encouraging development, and we shall see later in Chapter 12, when we develop
the canonical formalism for field theory, that it is precisely the Poincaré invariant
properties of the action (defined as the spacetime integral of the Lagrangian) that
render the Lagrangian such a useful, even indispensable, tool in field theory.
Of course, we still have to face the fact that the integral (10.57) contains an
oscillating integrand of absolute value unity, and is therefore certainly not absolutely
convergent (much less uniquely specified). The integral can be given a definite meaning
by introducing an appropriately signed small imaginary part in front of every term
(quadratic in the field) in the Lagrangian which becomes unbounded in the course of
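the integration. Thus, we replace the real Lagrangian function appearing in (10.57) by

$$\mathcal{L}_{0\delta}(\dot\phi, \vec\nabla\phi, \phi) = \frac{1}{2}e^{i\delta}\dot\phi^2 - \frac{1}{2}e^{-i\delta}\left(|\vec\nabla\phi(x)|^2 + m^2\phi^2\right) \qquad (10.59)$$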

where δ is a small positive quantity to be taken to zero at the end of the calculations.
This corresponds, as the reader may easily verify, to just the “rotation” of the
time variable $t \to e^{-i\delta}t$ needed to obtain a well-defined real-time functional integral
(cf. Section 4.2.1). The real part of $i\mathcal{L}_{0\delta}$ is readily seen to be negative definite, so
the integrand is exponentially damped¹¹ in all regions where either the field or its
spacetime-derivatives become large. We can now explicitly verify that the resultant
path integral is well-defined by evaluating it, using once again the shift property
(10.54). Define the differential operator

$$K \equiv e^{i\delta}\frac{\partial^2}{\partial t^2} - e^{-i\delta}\left(\vec\nabla^2 - m^2\right) \qquad (10.60)$$

The inverse of this operator (or kernel) is a Green function G(x, y) defined by

$$K\,G(x, y) = \delta^4(x - y) \qquad (10.61)$$

11 Of course, we should include a similar convergence factor in the integral over π(x) performed previously,
which should strictly speaking be taken to be the absolutely convergent integral $\int D\pi\, e^{-\frac{i}{2}e^{-i\delta}\int\pi(x)^2 d^4x}$.
See the discussion below for the interacting case, where all appropriate convergence factors are inserted ab
initio.

where the operator K acts on the spacetime coordinate x. Writing G(x, y) as a Fourier
transform

$$G(x, y) = \int\frac{d^4k}{(2\pi)^4}\,\tilde G(k)\,e^{-ik\cdot(x-y)} \qquad (10.62)$$

and substituting this and (10.60) in (10.61), one readily finds

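$$\begin{aligned}
\tilde G(k) &= -\frac{1}{e^{i\delta}k_0^2 - e^{-i\delta}(\vec k^2 + m^2)} \\
&= -e^{-i\delta}\,\frac{1}{k_0^2 - e^{-2i\delta}(\vec k^2 + m^2)} \\
&\to -\frac{1}{k^2 - m^2 + i\epsilon(\vec k^2 + m^2)}, \qquad \epsilon \equiv 2\delta,\ \delta \to 0
\end{aligned} \qquad (10.63)$$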

If the limit ε → 0 is taken after all integrals are performed, the (positive!) non-covariant
factor $\vec k^2 + m^2$ is irrelevant and we obtain for G(x, y) a well-defined covariant
distribution.¹² The Green function G(x, y), or equivalently, the inverse of the operator
K, is therefore, up to a sign, just our old friend the free Feynman propagator:

$$G(x, y) = -\int\frac{e^{-ik\cdot(x-y)}}{k^2 - m^2 + i\epsilon}\,\frac{d^4k}{(2\pi)^4} = -\Delta_F(x - y) \qquad (10.64)$$

Note that the presence of a slightly non-zero δ (hence ε) is essential so that the poles of
the integrand at $k_0 = \pm\sqrt{\vec k^2 + m^2}$ be avoided (by displacement to $\pm\sqrt{\vec k^2 + m^2} \mp i\epsilon$),

once we remove the regulators and return to an infinite continuous spacetime, as our
use of continuum notation suggests we have done. Were δ zero, the energy integral
(over k0 ) in the continuum theory would run directly along the real axis through these
poles, and the result of the integration would be ill-defined. We shall now see that the
evaluation of the path integral (10.57) involves exactly the Green function G(x, y), so
that the original lack of definition in the functional integral can be traced precisely to
the need to generate a well-defined distribution for the two-point function of the field.
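
The displacement of the poles can be checked numerically from the first line of (10.63). Setting the denominator to zero gives k0 = ±e^{−iδ}√(k⃗² + m²), so the positive-energy pole moves below the real axis and the negative-energy pole above it. The sketch below evaluates this for assumed sample values of δ, k⃗² and m²; the function names are illustrative.

```python
import cmath
import math


def propagator_poles(k_vec_sq, m_sq, delta):
    """Roots in k0 of e^{i delta} k0^2 - e^{-i delta} (k_vec^2 + m^2) = 0."""
    omega = math.sqrt(k_vec_sq + m_sq)
    root = cmath.exp(-1j * delta) * omega
    return root, -root


def g_tilde(k0, k_vec_sq, m_sq, delta):
    """Momentum-space Green function from the first line of (10.63)."""
    return -1.0 / (cmath.exp(1j * delta) * k0 ** 2
                   - cmath.exp(-1j * delta) * (k_vec_sq + m_sq))


k_vec_sq, m_sq, delta = 0.75, 0.25, 1e-3   # assumed sample values
omega = math.sqrt(k_vec_sq + m_sq)

pole_plus, pole_minus = propagator_poles(k_vec_sq, m_sq, delta)
on_axis = abs(g_tilde(omega, k_vec_sq, m_sq, delta))

print("poles:", pole_plus, pole_minus)
print("|G| on the real axis at k0 = omega:", on_axis)
```

For any δ > 0 the value on the real axis is finite, of size 1/(2ω² sin δ), and it blows up only as δ → 0, which is the ill-defined situation described in the text.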
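To complete the evaluation of (10.57), we observe that, using integration by parts,
$$\int\left(\mathcal{L}_{0\delta}(\dot\phi, \vec\nabla\phi, \phi) - j\phi\right)d^4x = -\int\left(\frac{1}{2}\phi(x)K\phi(x) + j(x)\phi(x)\right)d^4x \qquad (10.65)$$

$$= -\int\left\{\frac{1}{2}\left(\phi(x) + K^{-1}j(x)\right)K\left(\phi(x) + K^{-1}j(x)\right) - \frac{1}{2}j(x)K^{-1}j(x)\right\}d^4x \qquad (10.66)$$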

12 We shall see, however, in Chapter 17, that the momentum dependence of the $i\epsilon$ term is crucial
in guaranteeing the absolute convergence of Minkowski-space Feynman integrals, needed to establish
rigorously the efficacy of the subtraction procedures used to renormalize the Minkowski-space amplitudes
of a perturbatively renormalizable theory.

The integral over φ in (10.58) can now be shifted using (10.54) to eliminate the
dependence on the source function j, giving a (convergent!) constant factor
$$C' = \int D\phi\; e^{-\frac{i}{2}\int\phi(x)K\phi(x)\,d^4x} \qquad (10.67)$$

multiplying a term Gaussian in the source function j(x):

$$ Z_0[j] = CC'\, e^{\frac{i}{2}\int j(x)K^{-1}j(x)\,d^4x} = CC'\, e^{-\frac{i}{2}\int j(x)\Delta_F(x-y)j(y)\,d^4x\,d^4y} \tag{10.68} $$

The reader will easily confirm that this result is precisely Wick’s theorem for the
Feynman n-point Green functions in (10.22, 10.23), previously derived by operatorial
methods (note that the normal-ordered exponential on the right-hand side of (10.22)
simply becomes unity once the vacuum expectation value is taken: once expanded, only
the first term in the expansion survives). As indicated above, the overall multiplicative
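constant CC′ is irrelevant once the functional derivatives needed to extract the n-point
Green function are taken,

$$ \langle 0|T\{\phi(x_1)\phi(x_2)\cdots\phi(x_n)\}|0\rangle = i^n\, \frac{1}{Z_0[j]} \left.\frac{\delta^n Z_0[j]}{\delta j(x_1)\,\delta j(x_2)\cdots\delta j(x_n)}\right|_{j=0} \tag{10.69} $$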

as the generating functional Z0 [j] appears in both the numerator and the denominator.
We may therefore take simply, for the generating functional of Green functions of our
free scalar field theory,

$$ Z_0[j] = e^{-\frac{i}{2}\int j(x)\Delta_F(x-y)j(y)\,d^4x\,d^4y} \tag{10.70} $$
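The statement that a Gaussian generating functional encodes Wick's theorem can be checked in a finite-dimensional setting. The sketch below is an illustration under stated assumptions, not the text's construction: it uses a Euclidean-signature analogue Z(j) = exp(½ jᵀGj) of (10.70) with all factors of i dropped, a hypothetical four-site periodic lattice with unit spacing and m² = 1, and helper names invented here. It compares the fourth mixed derivative of Z at j = 0, taken by central finite differences, with the sum over the three pairings.

```python
import itertools
import numpy as np


def lattice_propagator(n_sites=4, m2=1.0):
    """Inverse of a periodic 1D lattice operator -Laplacian + m^2 (unit spacing)."""
    K = (2.0 + m2) * np.eye(n_sites)
    for i in range(n_sites):
        K[i, (i + 1) % n_sites] -= 1.0
        K[i, (i - 1) % n_sites] -= 1.0
    return np.linalg.inv(K)


def generating_function(j, G):
    """Euclidean analogue of (10.70): Z(j) = exp(j.G.j / 2)."""
    return np.exp(0.5 * j @ G @ j)


def four_point_by_derivatives(G, h=1e-2):
    """d^4 Z / dj_0 dj_1 dj_2 dj_3 at j = 0, by a 16-point central difference."""
    total = 0.0
    for signs in itertools.product((1.0, -1.0), repeat=4):
        total += np.prod(signs) * generating_function(h * np.array(signs), G)
    return total / (2.0 * h) ** 4


def four_point_by_wick(G):
    """Sum over the three ways of pairing the four field insertions."""
    return G[0, 1] * G[2, 3] + G[0, 2] * G[1, 3] + G[0, 3] * G[1, 2]


if __name__ == "__main__":
    G = lattice_propagator()
    print(four_point_by_derivatives(G), four_point_by_wick(G))
```

The two numbers agree up to the O(h²) truncation error of the finite-difference stencil, which is the finite-dimensional content of the pairing rule.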

We note that the removal of the implicit lattice-regularization of spacetime is
completely unproblematic in this free field theory: we simply reinterpret the above integrals
as fully continuous, confident that the resultant distribution ΔF (x) (and its Fourier
transform) are perfectly well-defined as a consequence of the “epsilonic” regularization
of the path integral. The removal of the spacetime regularization of the theory will turn
out to be a far more intricate matter once interactions are included—not surprisingly,
given the discussion in Section 10.2 of the distributional singularities with which
perturbative expansions in a non-Gaussian (higher than quadratic) interaction term
are infected. A full resolution of this problem is the central subject of renormalization
theory, and will be dealt with in Part 4 of the book.
At least formally, the inclusion of interactions in the path-integral formalism is
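perfectly straightforward: one simply replaces the free Hamiltonian density
H0 in (10.52) with the full Hamiltonian density H. For example, in a λφ4 theory we
would have

$$ Z[j] = \int \mathcal{D}\phi\,\mathcal{D}\pi\; e^{i\int (\pi(x)\dot\phi(x) - \mathcal{H}_\delta(\pi(x),\phi(x)) - j(x)\phi(x))\,d^4x} \tag{10.71} $$

$$ \mathcal{H}_\delta = e^{-i\delta}\left\{ \tfrac{1}{2}\pi(x)^2 + \tfrac{1}{2}|\vec\nabla\phi(x)|^2 + \tfrac{1}{2}m(a)^2\phi(x)^2 + \tfrac{1}{4!}\lambda(a)\phi(x)^4 \right\} \tag{10.72} $$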

with the full Feynman Green functions of the interacting theory given by functional
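derivatives of the functional Z[j]:

$$ \langle 0|T\{\phi_H(x_1)\phi_H(x_2)\cdots\phi_H(x_n)\}|0\rangle = i^n\, \frac{1}{Z[j]} \left.\frac{\delta^n Z[j]}{\delta j(x_1)\,\delta j(x_2)\cdots\delta j(x_n)}\right|_{j=0} \tag{10.73} $$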

The regularization needed to produce a well-defined path integral is indicated
implicitly by the presence of the spacetime lattice spacing a in the dependence of the bare
mass m(a) and bare coupling λ(a) on the short-distance cutoff: unlike the case for
the free theory, the continuum limit of an interacting field theory requires that these
coefficients be given the appropriate dependence on a in order that the low-energy
amplitudes of the theory approach well-defined limits (cf. Chapter 16). How to do
this will be the subject of Chapters 16 and 17: for the time being, we suppose that all
functional integrals are defined on a bounded spacetime lattice (although for notational
simplicity we continue to use a continuous notation $\int d^4x \ldots$ for the implicitly discrete
sums over a spacetime lattice), and therefore amount to finitely multi-dimensional
normal (i.e., Stieltjes) integrals. The further regularization needed to produce an
absolutely convergent integral is effected by the introduction of the overall complex
factor $e^{-i\delta}$ multiplying the Hamiltonian density. The integral over the momentum
field π(x) can now be performed exactly as previously in the free case (it is absolutely
convergent for δ small and positive), yielding

$$
\begin{aligned}
Z[j] &= C\int \mathcal{D}\phi\; e^{i\int \left(\frac{1}{2}e^{i\delta}\dot\phi(x)^2 - \frac{1}{2}e^{-i\delta}|\vec\nabla\phi(x)|^2 - \frac{1}{2}e^{-i\delta}m(a)^2\phi(x)^2 - \frac{\lambda(a)}{4!}e^{-i\delta}\phi(x)^4 - j(x)\phi(x)\right)d^4x} \\
&\equiv C\int \mathcal{D}\phi\; e^{i\int (\mathcal{L}_\delta(\dot\phi,\vec\nabla\phi,\phi) - j\phi)\,d^4x}
\end{aligned}
\tag{10.74}
$$

Note that the factors of $e^{+i\delta}$ and $e^{-i\delta}$ appear in precisely the right places to guarantee
the absolute convergence of the remaining integral over φ(x): not surprisingly, as
the original integral, before the π(x) field was integrated out, was clearly absolutely
convergent (with the absolute value of the integrand falling exponentially in the large
field regime).
If we temporarily restore Planck’s constant in the Lagrangian form of the path
integral (it divides the action: namely, the spacetime integral of the Lagrangian),

$$ Z[j] = C\int \mathcal{D}\phi\; e^{i\int \left(\frac{1}{\hbar}\mathcal{L}_\delta(\dot\phi,\vec\nabla\phi,\phi) - j\phi\right)d^4x} \tag{10.75} $$

one immediately sees that 1/ħ multiplies the Klein–Gordon operator K in the quadratic
part of the action, and therefore the free propagator ΔF (x), essentially the inverse
of K, should be regarded as proportional to a factor of ħ. The interaction vertices of
the theory, associated with the higher than quadratic part of Lδ , should each carry a
factor of 1/ħ. These factors of ħ will be of importance in Section 10.4 in sorting out the
“classical” (i.e., lowest order in ħ) from “quantum” contributions to the amplitudes of
the theory. The latter will turn out to be associated with the presence of closed loops
in the graphs.
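The counting of powers of Planck's constant can be made explicit for a connected graph. This is a sketch using the standard topological identity for connected graphs; the labels I, V and L (numbers of internal propagators, vertices and independent closed loops) are introduced here for illustration, and the factors attached to external lines, which are the same for every graph contributing to a given amplitude, are set aside.

```latex
\hbar^{\,I}\,\hbar^{-V} \;=\; \hbar^{\,I-V} \;=\; \hbar^{\,L-1},
\qquad L = I - V + 1 .
```

Tree graphs (L = 0) therefore carry the lowest power of ħ, and each additional closed loop costs one further power.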

The reader may recall that in our treatment of path integrals for a quantum
mechanical particle in Chapter 4, we described an alternative technique for eliminating
the noisome oscillations in the Minkowski space (i.e., real-time) formulation. In this
approach one analytically continues the amplitudes to imaginary time, basically by the
“Wick rotation” t → −it. Equivalently, the path integral is derived ab initio for matrix
elements of the bounded hermitian operator $e^{-Ht}$, rather than the usual unitary real-time
development operator $e^{-iHt}$. One then obtains, in the quantum-mechanics case,
path integrals such as

$$ K_E(q_f, t_f; q_i, t_i) = \int e^{-\int_{t_i}^{t_f}\left(\frac{1}{2}m\dot q^2(t) + \frac{1}{2}m\omega^2 q^2(t)\right)dt}\,\mathcal{D}q(t) = \int e^{-S_E(\dot q, q)}\,\mathcal{D}q(t) \tag{10.76} $$

which involves a purely real integrand which is exponentially damped for large q(t)
and q̇(t), and therefore yields a well-defined absolutely convergent (multi-dimensional)
integral once the usual discretization of the time interval ti ≤ t ≤ tf is carried out. In
the field-theory case we observe that taking δ = π/2 in (10.71,10.72) (which clearly
leads to a convergent path integral) is equivalent to an analytical continuation to
imaginary time t → −it of the Minkowski path integral: we simply reinterpret the
$e^{-i\delta}$ factor in Hδ as part of the time variable in $d^4x = d\vec{x}\,dt$. Note that the term
$\pi(x)\dot\phi(x)\,d^4x = \pi(x)\,d\phi(x)\,d\vec{x}$ is unchanged in this continuation. We then arrive
at the Euclidean generating functional

$$ Z_E[j] = \int \mathcal{D}\phi\,\mathcal{D}\pi\; e^{\int (i\pi(x)\dot\phi(x) + j(x)\phi(x))\,d^4x \;-\; H[\pi,\phi]} \tag{10.77} $$

$$ H[\pi, \phi] \equiv \int \left\{ \tfrac{1}{2}\pi(x)^2 + \tfrac{1}{2}|\vec\nabla\phi(x)|^2 + \tfrac{1}{2}m(a)^2\phi(x)^2 + \tfrac{1}{4!}\lambda(a)\phi(x)^4 \right\} d^4x \tag{10.78} $$

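Integrating out the π field in a (by now) familiar fashion, we obtain, up to an irrelevant
multiplicative constant,

$$ Z_E[j] = \int \mathcal{D}\phi\; e^{-S_E[\phi] + \int j(x)\phi(x)\,d^4x} \tag{10.79} $$

$$ S_E[\phi] \equiv \int \left\{ \tfrac{1}{2}\dot\phi(x)^2 + \tfrac{1}{2}|\vec\nabla\phi(x)|^2 + \tfrac{1}{2}m(a)^2\phi(x)^2 + \tfrac{1}{4!}\lambda(a)\phi(x)^4 \right\} d^4x \tag{10.80} $$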

which can be regarded as the “Euclidean Lagrangian” version of the functional integral
for this field theory (cf. (4.99, 4.100)). The positive real Euclidean Action functional
SE [φ] appears here in analogy to the corresponding quantity SE (q̇, q) for the quantum-
mechanical particle appearing in (10.76). The functional derivatives of ZE [j] yield
the Euclidean Schwinger functions S(x1 , x2 , ..., xn ) of the theory, which we briefly
discussed in Section 9.2:

$$ S(x_1, x_2, \ldots, x_n) = \frac{1}{Z_E[j]} \left.\frac{\delta^n Z_E[j]}{\delta j(x_1)\,\delta j(x_2)\cdots\delta j(x_n)}\right|_{j=0} \tag{10.81} $$

They are clearly permutation symmetric in the arguments x1 , x2 , . . . xn , as they
correspond to the path integral

$$ S(x_1, x_2, \ldots, x_n) = \frac{1}{Z_E[0]}\int \mathcal{D}\phi\; \phi(x_1)\phi(x_2)\cdots\phi(x_n)\, e^{-S_E[\phi]} \equiv \langle \phi(x_1)\phi(x_2)\cdots\phi(x_n) \rangle \tag{10.82} $$

involving the weighted average of the product of c-number functions
φ(x1 )φ(x2 )...φ(xn ).
One sometimes encounters the assertion that the Minkowski formulation of the
path integral, with a complex oscillating integrand, is intrinsically ill-defined, and that
the Minkowski amplitudes which it generates should instead be regarded as obtained
by analytic continuation from the Euclidean Green functions obtainable from the
clearly well-defined (once spacetime is appropriately discretized) Euclidean version
(10.79). This is simply incorrect: as we have emphasized above, once the regularizing
$e^{\pm i\delta\ldots}$ factors are included, the regularized Minkowski path integral yields perfectly
definite and unambiguous results. However, it is certainly true that non-perturbative
approaches to field theory involving a direct numerical attack on the evaluation of the
path integral, as in lattice QCD for example, are necessarily restricted to the Euclidean
version of the path integral. Such approaches typically proceed by a stochastic Monte
Carlo estimation of the multi-dimensional lattice-regularized integral, using the real
positive e−SE factor as a Boltzmann probability weight for generating field configura-
tions. This method is not practicable in the Minkowski domain: even if we separate the
negative-definite real part of the action (due to iδ terms) off as a Boltzmann weight, the
residual wildly oscillatory part of the integrand leads to an intolerably small signal-
to-noise ratio in the Monte Carlo estimates (the infamous “sign problem”). This is
unfortunate, as the approximate evaluation of Euclidean amplitudes can be converted
directly to Minkowski space information only in a very limited number of cases (e.g.,
in computations of the mass spectrum, and in evaluation of certain matrix elements of
local operators). The problem, of course, goes back to the fundamental difficulty that
a finite number of approximate numerical estimates of an analytic function is almost
never sufficiently restrictive to allow a reasonably accurate analytic continuation over
a substantial range in the complex plane, and in particular over the π/2 Wick rotation
needed to recover Minkowski-space amplitudes from Euclidean results. There is a
glimmer of hope in a stochastic approach to the direct simulation of the field-theoretic
Minkowski path integral based on a complex extension of the Langevin equation,
originally proposed by Parisi (Parisi, 1983) and Klauder (Klauder, 1984), but the
application of this method to date has been more a matter of art than science, as
the simulations not infrequently fail to converge to unambiguous results, and in some
cases are known to converge, but to the wrong answer!13
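The use of the Euclidean weight as a Boltzmann probability can be illustrated in the simplest possible setting. The following is a toy sketch, not a lattice field theory code: it assumes a single field variable ("one lattice site") with a hypothetical action S_E(φ) = ½m²φ² + λφ⁴/4!, uses the Metropolis algorithm as one common choice of stochastic sampler, and checks the estimate of ⟨φ²⟩ against direct quadrature. All function names and parameter values are invented for illustration.

```python
import math
import random


def euclidean_action(phi, m2=1.0, lam=1.0):
    """Toy one-site Euclidean action S_E = m^2 phi^2 / 2 + lam phi^4 / 4!."""
    return 0.5 * m2 * phi**2 + lam * phi**4 / 24.0


def metropolis_phi2(n_steps=200_000, step=1.5, burn_in=1_000, seed=0):
    """Estimate <phi^2> by sampling phi with probability weight exp(-S_E)."""
    rng = random.Random(seed)
    phi = 0.0
    s_old = euclidean_action(phi)
    total = 0.0
    for n in range(n_steps + burn_in):
        trial = phi + rng.uniform(-step, step)
        s_new = euclidean_action(trial)
        # Accept with probability min(1, exp(-(S_new - S_old))).
        if s_new <= s_old or rng.random() < math.exp(s_old - s_new):
            phi, s_old = trial, s_new
        if n >= burn_in:
            total += phi * phi
    return total / n_steps


def quadrature_phi2(cutoff=10.0, n=20_001):
    """Reference value of <phi^2> from a direct sum over a fine grid."""
    h = 2.0 * cutoff / (n - 1)
    num = den = 0.0
    for i in range(n):
        x = -cutoff + i * h
        w = math.exp(-euclidean_action(x))
        num += x * x * w
        den += w
    return num / den


if __name__ == "__main__":
    print(metropolis_phi2(), quadrature_phi2())
```

The sampler works only because exp(−S_E) is real and positive. With a complex oscillating weight the acceptance step has no probabilistic meaning, which is the origin of the sign problem mentioned above.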
Returning to our original Minkowski formulation (10.74), an immediate advantage
of the path-integral approach over operator-based formulations of field theory becomes
apparent if we recall the Gell–Mann–Low theorem (9.21), which gives the perturbative
interaction-picture expansion for the full Feynman Green functions of the theory. This

13 For a careful recent study of the mathematical status of complex Langevin methods, see (Aarts et al.,
2010).
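
result is basically a triviality in the functional formalism. If we write the Lagrangian
for a general self-coupled scalar theory as the difference of a free quadratic part L0 14
and an interaction polynomial Hint , as in (10.74), the path integral for the m-point
Green function of the theory takes the form

$$
\begin{aligned}
&\int \mathcal{D}\phi\; \phi(x_1)\cdots\phi(x_m)\, e^{i\int (\mathcal{L}_0(\phi,\partial_\mu\phi(z)) - \mathcal{H}_{\rm int}(\phi(z)))\,d^4z} \\
&\quad = \int \mathcal{D}\phi\; \phi(x_1)\cdots\phi(x_m) \sum_{n=0}^{+\infty} \frac{(-i)^n}{n!} \int \mathcal{H}_{\rm int}(z_1)\cdots\mathcal{H}_{\rm int}(z_n)\,d^4z_1\cdots d^4z_n\; e^{i\int \mathcal{L}_0\,d^4z} \\
&\quad \sim \sum_{n=0}^{+\infty} \frac{(-i)^n}{n!} \int d^4z_1\cdots d^4z_n \int \mathcal{D}\phi\; \phi(x_1)\cdots\phi(x_m)\,\mathcal{H}_{\rm int}(z_1)\cdots\mathcal{H}_{\rm int}(z_n)\; e^{i\int \mathcal{L}_0\,d^4z}
\end{aligned}
\tag{10.83}
$$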

If the interaction part Hint is associated with a perturbative parameter (such as
the coupling constant λ0 in the φ4 theory considered previously), then the final line
amounts to a perturbative expansion, with the nth order result giving a functional
integral which clearly reproduces the interaction-picture free field Green function found
on the right-hand side of the Gell–Mann–Low formula (9.21), where the initial/final
states |α⟩, |β⟩ are the vacuum. The second line in (10.83) follows rigorously from
the first, as the exponential expansion is perfectly convergent for fixed values of the
field φ(x) (on the regularizing spacetime lattice). However, the unbounded character
of the integration Dφ over the field values renders the subsequent interchange of
summation and integration invalid, in the sense that the resultant perturbative series
is no longer convergent, but can at best be regarded as an asymptotic expansion for
the full amplitudes in powers of the coupling constant parameterizing the size of the
interaction term. The non-convergence resulting from the exchange of an infinite sum
with an unbounded integral is already familiar in very simple one-dimensional cases:
the reader is encouraged to verify it explicitly in a simple non-Gaussian integral in
Problem 6. We shall have much more to say about this issue in the following chapter,
in the context of non-perturbative approaches to field theory.
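The non-convergence produced by exchanging the sum with an unbounded integral is easy to see numerically in one dimension. The sketch below assumes a particular toy integral, I(λ) = (2π)^{-1/2} ∫ exp(−x²/2 − λx⁴/4!) dx, which need not coincide with the integral of Problem 6; the function names and the value λ = 0.1 are likewise choices made here. Expanding the quartic exponential and integrating term by term against the Gaussian gives the coefficients (−λ/24)ⁿ (4n−1)!!/n!.

```python
import math
from fractions import Fraction


def exact_integral(lam, cutoff=12.0, n=200_001):
    """Direct numerical value of I(lam) by summation on a fine grid."""
    h = 2.0 * cutoff / (n - 1)
    total = 0.0
    for i in range(n):
        x = -cutoff + i * h
        total += math.exp(-0.5 * x * x - lam * x**4 / 24.0)
    return total * h / math.sqrt(2.0 * math.pi)


def series_term(order, lam):
    """n-th term of the term-by-term integrated expansion in lam."""
    # Gaussian moment <x^(4n)> = (4n-1)!! = (4n)! / (4^n (2n)!), divided by n!.
    coefficient = Fraction(
        math.factorial(4 * order),
        4**order * math.factorial(2 * order) * math.factorial(order),
    )
    return float(coefficient) * (-lam / 24.0) ** order


def partial_sum(max_order, lam):
    """Perturbative series truncated after the given order."""
    return sum(series_term(n, lam) for n in range(max_order + 1))


if __name__ == "__main__":
    lam = 0.1
    exact = exact_integral(lam)
    for order in (2, 5, 10, 15, 30, 60):
        print(order, partial_sum(order, lam) - exact)
```

For λ = 0.1 the terms shrink until roughly order 15 and then grow without bound: low-order truncations are excellent approximations, while high-order ones are useless, which is the hallmark of an asymptotic rather than convergent series.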
The preceding discussion for bosonic scalar (spin-0) fields can be generalized
in a straightforward fashion for bosonic fields of non-zero spin: specifically, spin-1
vector fields. Sensible interacting field theories of spin-1 particles typically possess
important additional (beyond Poincaré) local gauge symmetries (resulting in subtleties
in the path-integral formulation) which are much more conveniently discussed in the
Lagrangian framework which we shall construct in Chapter 12, so we shall postpone
further discussion of bosonic path integrals for higher spin fields until that point.
Before going on to the generalization of the path-integral method to fermionic
theories, a few remaining points concerning the structure of Gaussian bosonic path
integrals should be addressed, especially as the contrast they display to the corresponding

14 In order to avoid overburdening the notation, we henceforth omit $e^{i\delta}$ factors, and the δ subscript on
the Lagrangian functional.

behavior for the fermionic functional integrals discussed below is of great
importance in numerical approaches to non-perturbative quantum field theory. Let
us take another look at the basic (source-free) Gaussian path integral for a free self-conjugate
scalar field, in the Euclidean formulation, as given by

$$ Z_0[0] = \int \mathcal{D}\phi\; e^{-\frac{1}{2}\int \phi(x)K_E\phi(x)\,d^4x} \tag{10.84} $$

where the Euclidean version of the Klein–Gordon operator is

$$ K_E = -\Box + m^2, \qquad \Box \equiv \sum_{i=1}^{4}\frac{\partial^2}{\partial x_i^2} \tag{10.85} $$

The implicit regularization of this path integral on a finite Euclidean spacetime lattice
(with N points) implies that the operator KE is actually replaced by a real, symmetric,
positive-definite N × N matrix Kij , so our integral actually reads

$$ Z_0[0] = \int \prod_i d\phi_i\; e^{-\frac{1}{2}\sum_{i,j}\phi_i K_{ij}\phi_j} \tag{10.86} $$

where the field variables φi representing the value of φ(x) at the lattice point xi are real
(and range from −∞ to +∞). Let Oij be the orthogonal matrix (of unit determinant)
which diagonalizes Kij :

$$ K_{ij} = \sum_k \lambda_k O_{ik}O_{jk} \;\Rightarrow\; \sum_{i,j}\phi_i K_{ij}\phi_j = \sum_k \lambda_k \hat\phi_k^2 \tag{10.87} $$

where we have introduced new field variables $\hat\phi_k = \sum_i \phi_i O_{ik}$ related to the original
ones by a unit Jacobian. The eigenvalues λk are all positive: otherwise our Euclidean
path integral would be divergent! Changing integration variables to the φ̂k then,

$$ Z_0[0] = \int \prod_{k=1}^{N} d\hat\phi_k\; e^{-\frac{1}{2}\sum_{k=1}^{N}\lambda_k\hat\phi_k^2} = \frac{(2\pi)^{N/2}}{\prod_{k=1}^{N}\sqrt{\lambda_k}} = (2\pi)^{N/2}\det(K)^{-\frac{1}{2}} \tag{10.88} $$
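The closed form (10.88) can be confirmed by brute force on a very small lattice. The sketch below rests on assumptions made here for illustration: a hypothetical three-site periodic lattice in one dimension with unit spacing and m² = 1 stands in for the N-point spacetime lattice, and the multi-dimensional integral is approximated by a plain sum over a uniform grid, which is extremely accurate for a smooth, rapidly decaying integrand.

```python
import numpy as np


def lattice_klein_gordon(n_sites=3, m2=1.0):
    """Periodic 1D lattice version of K_E = -Laplacian + m^2 (unit spacing)."""
    K = (2.0 + m2) * np.eye(n_sites)
    for i in range(n_sites):
        K[i, (i + 1) % n_sites] -= 1.0
        K[i, (i - 1) % n_sites] -= 1.0
    return K


def gaussian_integral_by_quadrature(K, cutoff=8.0, n_grid=81):
    """Sum exp(-phi.K.phi / 2) over a uniform grid in every field variable."""
    n = K.shape[0]
    axis = np.linspace(-cutoff, cutoff, n_grid)
    h = axis[1] - axis[0]
    grids = np.meshgrid(*([axis] * n), indexing="ij")
    phi = np.stack([g.ravel() for g in grids])  # shape (n, n_grid**n)
    quad_form = np.einsum("ip,ij,jp->p", phi, K, phi)
    return np.exp(-0.5 * quad_form).sum() * h**n


def gaussian_integral_closed_form(K):
    """(2 pi)^(N/2) det(K)^(-1/2), valid only for positive eigenvalues."""
    eigenvalues = np.linalg.eigvalsh(K)
    if np.any(eigenvalues <= 0):
        raise ValueError("K must be positive definite for the integral to converge")
    return (2.0 * np.pi) ** (len(eigenvalues) / 2) / np.sqrt(np.prod(eigenvalues))


if __name__ == "__main__":
    K = lattice_klein_gordon()
    print(gaussian_integral_by_quadrature(K), gaussian_integral_closed_form(K))
```

The explicit eigenvalue check mirrors the remark in the text that a non-positive eigenvalue would make the Euclidean integral diverge.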

Of course, this source-free integral is a field and source-independent constant (indeed,
just the constant C′ in (10.67)), which we have argued is irrelevant to the calculation of
the Green functions of the theory via the functional formula (10.69), as it cancels in the
numerator and the denominator. However, in the fermionic case we shall encounter
cases of Gaussian functional integrals in which the corresponding operator (to K)
contains further bosonic fields, so the determinant is field-dependent and must be
retained as a non-trivial component of the full path integral of the theory. The
Gaussian integrals occurring in the fermionic case are typically over complex (i.e., non-
self-conjugate) fields, so an even closer analogy is obtained by studying the Gaussian
path integral for a complex scalar field ψ(x), which can always be rewritten in terms
of two (equal-mass) real (self-conjugate) scalar fields φ1 , φ2 by

$$ \psi(x) = \frac{1}{\sqrt{2}}(\phi_1(x) + i\phi_2(x)) \tag{10.89} $$

The Gaussian integral corresponding to (10.84) for the doublet of fields φ1 , φ2 can be
rewritten as an integral over the single complex field ψ (and its conjugate field ψ∗ ,
treated as an independent variable)

$$
\begin{aligned}
\int \mathcal{D}\phi_1\,\mathcal{D}\phi_2\; e^{-\frac{1}{2}\sum_{i=1}^{2}\int \phi_i(x)K_E\phi_i(x)\,d^4x} &= \int \mathcal{D}\psi\,\mathcal{D}\psi^*\; e^{-\int \psi^*(x)K_E\psi(x)\,d^4x} \\
&\to (2\pi)^N \det(K)^{-1}
\end{aligned}
\tag{10.90}
$$

where the final line gives the explicit evaluation of the regularized path integral, in
this case yielding the inverse determinant of the Euclidean Klein–Gordon operator KE ,
without the square-root (due to the doubling of the degrees of freedom in the case of
a complex scalar field). We shall shortly see that the entire content of the distinction
between Bose–Einstein and Fermi–Dirac statistics for fields in the functional approach
lies in the power of the determinant that appears in the basic Gaussian integral, which
is negative for bosonic fields but positive for fermionic ones. This at first sight minor
distinction leads, in fact, to an enormous increase in difficulty of numerical evaluation
of the path integral for fermionic as opposed to bosonic fields.
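The first equality in (10.90) rests on a one-line identity, sketched here under the assumption (as in (10.86)) that the lattice operator $K_E$ is real and symmetric:

```latex
\int \psi^{*} K_E\, \psi \, d^4x
 = \frac{1}{2}\int (\phi_1 - i\phi_2)\, K_E\, (\phi_1 + i\phi_2)\, d^4x
 = \frac{1}{2}\sum_{i=1}^{2} \int \phi_i K_E \phi_i \, d^4x
   + \frac{i}{2}\int \left(\phi_1 K_E \phi_2 - \phi_2 K_E \phi_1\right) d^4x
```

The last term vanishes for symmetric $K_E$. Each of the two real fields then contributes a factor $(2\pi)^{N/2}\det(K)^{-1/2}$ as in (10.88), and the product gives $(2\pi)^N \det(K)^{-1}$.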

10.3.2 Path integrals for fermionic fields


The equal-time anticommutation relation for Dirac fields (cf. Problem 5 in Chapter 7)

$$ \{\psi_n(\vec{x}, t),\, i\psi^\dagger_m(\vec{y}, t)\} = i\delta^3(\vec{x} - \vec{y})\,\delta_{nm} \tag{10.91} $$

suggests (by analogy to the corresponding result (8.1) for scalar fields) that the
conjugate “momentum” field to ψ(x) is π(x) = iψ†(x). If we formally imitate the
Hamiltonian path integral for scalar fields (10.52), where now the free Hamiltonian
density H0 is that appropriate for the Dirac field (Problem 3 in Chapter 7),

$$ \mathcal{H}_0 = i\bar\psi\,\vec\gamma\cdot\vec\nabla\psi + m\bar\psi\psi \tag{10.92} $$

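we find, introducing c-number source functions η, η̄ whose functional derivatives will
(one hopes!) produce the desired time-ordered Feynman Green functions of products
of ψ̄ and ψ, the following expression for the generating functional Z0 [η, η̄] of the free
Dirac theory:

$$ Z_0[\eta, \bar\eta] = \int \mathcal{D}\psi\,\mathcal{D}\psi^*\; e^{i\int (i\psi^\dagger\dot\psi - i\bar\psi\,\vec\gamma\cdot\vec\nabla\psi - m\bar\psi\psi - \bar\eta\psi - \bar\psi\eta)\,d^4x} \tag{10.93} $$

$$ = \int \mathcal{D}\psi\,\mathcal{D}\bar\psi\; e^{i\int (\bar\psi(i\not{\partial} - m)\psi - \bar\eta\psi - \bar\psi\eta)\,d^4x} \tag{10.94} $$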

where operator adjoints ψ† have been reinterpreted as c-number complex conjugates
ψ∗ . As usual, such an expression is assumed to be regularized on a finite spacetime
lattice: the functional integrals over ψ(x) and ψ†(x), viewed as independent variables,
are then finite in number, corresponding to a discrete finite set of lattice spacetime
coordinates xi , i = 1, 2, . . . N . Note that overall multiplicative factors, such as those
induced by dropping the i in the definition π(x) = iψ†(x), are ignored just as in the
bosonic case. In the second line the replacement of ψ† with ψ̄ = ψ†γ0 in the functional
measure is completely innocent, as det(γ0 ) = 1. Note that the action (the non-source
part of the exponent in (10.94)) has already assumed a Lorentz-invariant form: for
fermionic theories the Lagrangian is obtained directly by including the π ψ̇ term with
the Hamiltonian, without the need for an integration over the conjugate momentum
π field. Again, the reason for this will become clear in our discussion of the canonical
formalism in Chapter 12.
Defining the differential operator

$$ D \equiv i\gamma^\mu\partial_\mu - m = i\not{\partial} - m \tag{10.95} $$

in analogy to the Klein–Gordon operator K introduced previously for the scalar field,
we find that the functional Z0 can be formally evaluated by the same procedure of
completion of the square used in the bosonic case:

$$ Z_0[\eta, \bar\eta] = \int \mathcal{D}\psi\,\mathcal{D}\bar\psi\; e^{i\int \{(\bar\psi - \bar\eta D^{-1})\,D\,(\psi - D^{-1}\eta) - \bar\eta D^{-1}\eta\}\,d^4x} \tag{10.96} $$

$$ = C\, e^{-i\int \bar\eta D^{-1}\eta\,d^4x} \tag{10.97} $$

where we have assumed that the integrals over ψ, ψ̄ are shift-invariant, as in the bosonic
case. This already looks quite promising, as the Feynman–Dirac propagator defined
by iSF (x − y) = ⟨0|T (ψ(x)ψ̄(y))|0⟩ (cf. Problem 5 in Chapter 7)

$$ S_F(x - y) = \int \frac{\not{p} + m}{p^2 - m^2 + i\epsilon}\, e^{-ip\cdot(x-y)}\,\frac{d^4p}{(2\pi)^4} = \int \frac{e^{-ip\cdot(x-y)}}{\not{p} - m + i\epsilon}\,\frac{d^4p}{(2\pi)^4} \tag{10.98} $$

is, in fact, a Green function for the operator D:

$$ D\,S_F(x - y) = (i\not{\partial} - m)\,S_F(x - y) = \delta^4(x - y) \tag{10.99} $$

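so that we may write (10.97), in analogy to (10.70), as

$$ Z_0[\eta, \bar\eta] = C\, e^{-i\int \bar\eta(x)S_F(x-y)\eta(y)\,d^4x\,d^4y} \tag{10.100} $$

with (α, β are Dirac indices, running 1,2,3,4)

$$ \frac{i^2}{Z_0} \left.\frac{\delta^2 Z_0}{\delta\eta_\beta(y)\,\delta\bar\eta_\alpha(x)}\right|_{\eta=\bar\eta=0} = i(S_F(x - y))_{\alpha\beta} = \langle 0|T(\psi_\alpha(x)\bar\psi_\beta(y))|0\rangle \tag{10.101} $$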

Unfortunately, plausible as it seems, the above argument conceals some deep flaws
in the analogistic reasoning used to arrive at (10.100). In particular, the crucial minus
sign embedded in the definition of the fermionic T-product

$$ T(\psi_\alpha(x)\bar\psi_\beta(y)) \equiv \theta(x^0 - y^0)\,\psi_\alpha(x)\bar\psi_\beta(y) - \theta(y^0 - x^0)\,\bar\psi_\beta(y)\psi_\alpha(x) \tag{10.102} $$

is not reproduced by the double functional derivative in (10.101), if the field sources
η, η̄ take values in a commutative field (such as the complex numbers). The change
of sign when fermion fields are reordered implies a similar change of sign when the
functional derivatives are correspondingly reordered: in other words, we must have

$$ \frac{\delta^2}{\delta\eta_\beta(y)\,\delta\bar\eta_\alpha(x)} = -\frac{\delta^2}{\delta\bar\eta_\alpha(x)\,\delta\eta_\beta(y)} \tag{10.103} $$

This problem can (indeed, must) be fixed by insisting that the fermionic source
functions η, η̄ take values in a Grassmann algebra, any two elements of which
anticommute with each other. This then implies that the c-number fermionic fields ψ, ψ̄
appearing in (10.94) must also be anticommuting Grassmann numbers. Otherwise,
when the source terms in the exponent are expanded, only linear terms would survive,
and we would conclude that Green functions involving more than one ψ (or ψ̄)
field vanish! For example, if ψ(x), ψ̄(x) are conventional complex-valued c-number
functions, commuting with the Grassmann source functions,

$$ \left(\int \bar\psi(x)\eta(x)\,d^4x\right)^2 = \int \bar\psi(x)\bar\psi(y)\,\eta(x)\eta(y)\,d^4x\,d^4y = 0 \tag{10.104} $$

as the product η(x)η(y) is antisymmetric under exchange of x and y, while the product
ψ̄(x)ψ̄(y) is symmetric. This is avoided if we also require ψ̄(x) (and likewise ψ(x)) to
take values in a Grassmann algebra. As indicated above, we have implicitly defined
the theory on a spacetime lattice, so that the fields (and sources) are defined on a
finite discrete set of spacetime coordinates xi , i = 1, 2, ...N . Our Grassmann algebra
consists of the set of all multi-nomials (with complex coefficients) generated by the
4N Grassmann numbers ψ(xi ), ψ̄(xi ), η(xi ), η̄(xi ). Denoting these generically by χi ,
the only properties attributed to these numbers are

χi χj = −χj χi , χi χi = −χi χi = 0 (10.105)

As the square of any given Grassmann number vanishes, the possible multi-nomials
involve each of the 4N independent Grassmann numbers at most once. The linear
space of allowed multi-nomials is therefore $2^{4N}$-dimensional, consisting of the
one-dimensional space with no Grassmanns (i.e., the complex numbers C), the 4N-dimensional space
spanned by the Grassmann generators appearing singly, the 4N(4N − 1)/2-dimensional space spanned
by products of two distinct Grassmanns χi χj , i < j, and so on.
Of course, in order to pursue this strategy we must next ensure that we under-
stand the meaning of functional derivatives (i.e., in our discretized system, partial
differentiation with respect to a given Grassmann element χi ) and functional (path)
integrals (in the case of Z0 above, multi-dimensional integration over the ψ(xi ), ψ̄(xi )
holding η(xi ), η̄(xi ) fixed). The definition of the derivative is obvious once we recall
that functions F (χi ) of the χi can at most depend linearly on any given χj . Any term
containing χj can therefore be rearranged so that the single factor of χj is moved to
the extreme left (with a concomitant factor of –1 if the rearrangement involves an odd
permutation of Grassmann numbers), giving

$$ F(\chi_i) = A(\chi_i, i \neq j) + \chi_j\, B(\chi_i, i \neq j) \tag{10.106} $$

The partial derivative with respect to χj is then defined in the natural way as

$$ \frac{\partial F}{\partial\chi_j} \equiv B(\chi_i, i \neq j) \tag{10.107} $$

The reader is invited to check that, with this definition, the desired antisymmetry of
the second derivatives expressed in (10.103) indeed obtains: namely

$$ \frac{\partial^2 F}{\partial\chi_i\,\partial\chi_j} = -\frac{\partial^2 F}{\partial\chi_j\,\partial\chi_i} \tag{10.108} $$

While differentiation with respect to Grassmann variables is superficially very
similar (modulo the occasional minus sign) to ordinary differentiation with respect to
commuting variables, the meaning of integration over Grassmann variables turns out
to be completely different from that of conventional Stieltjes (or Lebesgue) integration.
One formal property of the integral over ψ and ψ̄ used in the heuristic argument leading
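to (10.100) was the shift invariance requirement

$$ \int \mathcal{D}\psi\, F[\psi - \chi] = \int \mathcal{D}\psi\, F[\psi] \tag{10.109} $$

$$ \int \mathcal{D}\bar\psi\, F[\bar\psi - \bar\chi] = \int \mathcal{D}\bar\psi\, F[\bar\psi] \tag{10.110} $$

For a single Grassmann variable this implies

$$ \int d\psi_i\,(\psi_i - \chi_i) = \int d\psi_i\,\psi_i \;\Rightarrow\; \left(\int d\psi_i\right)\chi_i = 0 \;\Rightarrow\; \int d\psi_i = 0 \tag{10.111} $$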

where we have also assumed that a constant Grassmann number can be factored out
of the integral. Thus any term lacking a particular Grassmann variable ψi vanishes
when integrated over ψi . As any term containing the Grassmann variable ψi more than
once vanishes, the only potentially non-vanishing integral is that of the variable ψi
appearing linearly, which we can normalize to any value we please (as overall constants
in the generating functional Z are irrelevant, as emphasized repeatedly above). It is
conventional to define

$$ \int d\chi_i\,\chi_i = 1 \tag{10.112} $$

for any Grassmann variable χi : i.e., for any of the field variables ψi , ψ̄i appearing in
the regularized path integral.
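For a single Grassmann variable the two rules (10.111) and (10.112) already fix every integral. As a minimal worked case (a and b denote arbitrary complex coefficients introduced here for illustration), the most general function is $F(\chi) = a + b\chi$, so that

```latex
\int d\chi\, (a + b\chi)
  = a \int d\chi + b \int d\chi\, \chi
  = b
  = \frac{\partial F}{\partial \chi}
```

In other words, integrating over a Grassmann variable extracts the same coefficient as differentiating with respect to it, in the sense of (10.106) and (10.107).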
With this interpretation of the fermionic integrations in (10.94) we may justify
the result for the free generating function (10.100), at least at the level of lattice-
regularized fields. At this level, in contrast to the situation for bosonic fields, there are
no convergence problems due to oscillating integrands: the expansion of the exponential
terminates at a finite order, with terms containing each ψi and ψ̄i variable once and
only once, and the value of the path integral (as a function of the remaining source
quantities ηi , η̄i ) is just the coefficient of $\prod_{i=1}^{N}\bar\psi_i\psi_i$ in this expanded quantity.15 So
where does the requirement for an iε in the denominator of the Fourier transformed
Feynman–Dirac propagator come from? In the bosonic case, this small imaginary
displacement arose naturally from the requirement of regularizing the oscillating
Minkowski integrand, as the integration range for the field variables was infinite,
leading to a failure of absolute convergence of the (finitely) multi-dimensional integral.
Here the finite-dimensional fermionic integral gives a perfectly finite result provided
that the 4N × 4N matrix (4 Dirac, N spacetime degrees of freedom) representing the
Dirac operator D on our finite spacetime lattice is invertible. The discrete energies
and momenta allowed on such a lattice mean that the singularity at $k_0 = \sqrt{\vec{k}^2 + m^2}$
encountered by a continuous energy integral over k0 is “missed” when the integral is
converted to a finite sum. In the fermionic case, the need for inclusion of an iε prescription
for avoiding this singularity only appears once the lattice spacing is taken to
zero and the sum goes over to a continuous integration along the real energy axis. The
need for including the iε in the correct way at this point (i.e., as a negative imaginary
part in the mass; see (10.98)) is dictated by our desire to recover the correct causal
(i.e., time-ordered) propagator in agreement with the operator version of the theory.
For the interacting field theories of primary interest in modern particle physics—
the gauge field theories of the Standard Model—the dependence of the Hamiltonian
(or Lagrangian) on the Fermi fields is always quadratic, so we may in general write
the fermionic part of the full functional integral (which may, of course, also contain
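further integrations over bosonic fields),

$$ \int \mathcal{D}\bar\psi\,\mathcal{D}\psi\; e^{i\int (\bar\psi(x)D(\phi)\psi(x) - \bar\eta(x)\psi(x) - \bar\psi(x)\eta(x))\,d^4x} \;\to\; \int \prod_{i=1}^{4N} d\bar\psi_i\,d\psi_i\; e^{i(\bar\psi_i D(\phi)_{ij}\psi_j - \bar\eta_i\psi_i - \bar\psi_i\eta_i)} \tag{10.113} $$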

where the notation D(φ) indicates the possible dependence of the differential operator
in the quadratic part on a generic set of bosonic fields φ, and the arrow a suitable
discretization of the continuum theory in which the fermion fields are placed on
a spacetime lattice. The indices i, j run from 1 to 4N , where N is the number
of spacetime lattice points and the 4 comes from the discrete Dirac index. After
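completion of the square, exactly as previously for the free theory, this becomes

$$ e^{-i\bar\eta_i D(\phi)^{-1}_{ij}\eta_j} \int \prod_{i=1}^{4N} d\bar\psi_i\,d\psi_i\; e^{i\bar\psi_i D(\phi)_{ij}\psi_j} = e^{-i\bar\eta_i D(\phi)^{-1}_{ij}\eta_j}\,\det[D(\phi)] \tag{10.114} $$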

The integral of the (exponential of the) source-free quadratic fermion action gives
simply the determinant of the 4N × 4N matrix of the discretized operator D(φ)! The
reason for this is quite simple: in order to obtain a non-vanishing contribution, the
rules for Grassmann integration described above imply that only the term in which
the expansion of the exponential contains each ψi and each ψ̄i once and only once
contributes. The coefficient of such terms involves the multiplication of 4N matrix

15 Alternatively, we see that the value of the multi-dimensional Grassmann integral is also given by taking
a derivative of the integrand with respect to each and every one of the ψi and ψ̄i : for Grassmann quantities,
integration = differentiation!

elements Dij with each row and column of the matrix appearing exactly once, and
with a sign indicating the sign of the permutation needed to place the ψi in the same
order as the ψ̄i . This is exactly the definition of the determinant of the matrix Dij .
Note that the result is precisely the inverse of that obtained for the integration of
Gaussian complex bosonic path integrals (cf. (10.90)). As indicated previously, the
path integrals for the interacting gauge field theories contained in the Standard Model
are at most quadratic in the elementary spin-1/2 fields (leptons and quarks) of the
theory, so that in principle the fermionic integrations can all be performed yielding
determinantal functionals of the remaining bosonic fields, leaving only bosonic path
integrals to be performed. In the case of lattice QCD, by far the greatest part of the
numerical difficulties encountered with stochastic estimations of the resultant bosonic
path integrals derives from the evaluation of the fermionic determinant associated
with integrating out the quark fields.
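The determinant rule can be verified directly by implementing a small Grassmann algebra. The sketch below is an illustration under assumptions made here, not the text's construction: elements are stored as dictionaries from sorted tuples of generator indices to coefficients, the Berezin integral is the left derivative of (10.106) and (10.107) normalized as in (10.112), nested integrals act innermost-first, and the exponent is written as −ψ̄Mψ with a generic real matrix M, so that the constant factors of i and convention-dependent signs in (10.114) are absorbed. All function names are invented.

```python
import math
import numpy as np


def gmul(f, g):
    """Product of two Grassmann elements, using chi_i chi_j = -chi_j chi_i."""
    out = {}
    for a, ca in f.items():
        for b, cb in g.items():
            if set(a) & set(b):  # chi_i chi_i = 0
                continue
            seq = a + b
            inversions = sum(seq[i] > seq[j]
                             for i in range(len(seq))
                             for j in range(i + 1, len(seq)))
            key = tuple(sorted(seq))
            out[key] = out.get(key, 0.0) + (-1) ** inversions * ca * cb
    return out


def gadd(f, g, scale=1.0):
    """Return f + scale * g."""
    out = dict(f)
    for key, c in g.items():
        out[key] = out.get(key, 0.0) + scale * c
    return out


def gexp(f, max_power):
    """exp(f) for an even element without constant term; the series terminates."""
    result, power = {(): 1.0}, {(): 1.0}
    for n in range(1, max_power + 1):
        power = gmul(power, f)
        result = gadd(result, power, 1.0 / math.factorial(n))
    return result


def berezin(f, k):
    """Integrate over generator k: move chi_k to the far left, then strip it."""
    out = {}
    for key, c in f.items():
        if k in key:
            p = key.index(k)
            rest = key[:p] + key[p + 1:]
            out[rest] = out.get(rest, 0.0) + (-1) ** p * c
    return out


def fermion_gaussian(M):
    """Integral of exp(-psibar.M.psi) over all psibar_i, psi_i."""
    n = M.shape[0]
    bar = lambda i: 2 * i      # generator index of psibar_i
    psi = lambda i: 2 * i + 1  # generator index of psi_i
    action = {}
    for i in range(n):
        for j in range(n):
            term = gmul({(bar(i),): 1.0}, {(psi(j),): 1.0})
            action = gadd(action, term, -M[i, j])
    integrand = gexp(action, n)
    for i in range(n):  # d(psibar_i) d(psi_i): the psi_i integral acts first
        integrand = berezin(integrand, psi(i))
        integrand = berezin(integrand, bar(i))
    return integrand.get((), 0.0)


if __name__ == "__main__":
    M = np.random.default_rng(0).normal(size=(3, 3))
    print(fermion_gaussian(M), np.linalg.det(M))
```

The result is the determinant itself, in contrast with the inverse determinant of the complex bosonic case (10.90). The cost of storing the algebra grows like 2 to the number of generators, so this direct approach is only practical for very small matrices.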
There is one further important property of Grassmann integrals which we shall
need to take into account in our discussion of anomalies in Chapter 15. The peculiar
property of fermionic path integrals, whereby the fermion determinant appears to a
positive power (as in (10.114)) when a Gaussian integral is performed, has a correlate in
the behavior of the Grassmann integral under a change of fermionic variables. Suppose
we wish to change variables from a set of Grassmann quantities ψi , i = 1, 2, ..N to
ψ′i = Cij ψj , where the Cij are bosonic in character (i.e., complex scalars). The
requirement that the new variables satisfy

$$ 1 = \int \prod_{i=1}^{N} d\psi'_i \prod_{i=1}^{N} \psi'_i = \int \prod_{i=1}^{N} d\psi'_i\; C_{1i_1}\psi_{i_1}\, C_{2i_2}\psi_{i_2} \cdots C_{Ni_N}\psi_{i_N} \tag{10.115} $$

implies that the Jacobian J of the change of variables, with $\prod_{i=1}^{N} d\psi'_i = J \prod_{i=1}^{N} d\psi_i$,
is given by

$$ J = \det(C)^{-1} \tag{10.116} $$

instead of by det(C), as would be the case for a multi-dimensional bosonic integral.
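The inverse Jacobian is already visible for a single variable. As a minimal worked case (the complex number c is introduced here for illustration), take ψ′ = cψ and impose the normalization (10.112) on both the old and the new variable:

```latex
1 = \int d\psi'\, \psi' = c \int d\psi'\, \psi ,
\qquad
\int d\psi\, \psi = 1
\quad\Longrightarrow\quad
d\psi' = c^{-1}\, d\psi ,
```

to be contrasted with $dx' = c\,dx$ for an ordinary commuting variable $x' = cx$.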

10.4 Graphical concepts: N -particle irreducibility


Our earlier discussion of the diagrammatic representation of field-theory amplitudes
pointed out two potential sources16 of singular behavior in the order-by-order imple-
mentation of perturbation theory. First, there is the infrared (more precisely, long time)
problem of persistent interactions, which lead, in a careless application of perturbation
theory, to an incorrect pole structure in external momentum of the n-point functions
of the theory, making the extraction of sensible S-matrix elements via the LSZ formula
impossible. Secondly, even when this first problem has been satisfactorily addressed,
there remains the problem of ultraviolet divergences associated with (in coordinate
space) the lack of definition of products of Feynman propagator distributions, or
equivalently (in momentum space) loop integrals which diverge as the ultraviolet
cutoff (associated, say, with the short-distance lattice-regularization of the theory) is

16 The third difficulty, or “inconvenient truth”, identified in Section 10.2, the intrinsic non-convergence
of perturbative expansions, will be addressed in detail in the subsequent chapter.

removed. The resolution of these two issues is greatly facilitated by the introduction of
graphical concepts which allow us to reorganize the Feynman diagrams of the theory
in an intuitively powerful way. In the first case, the relevant concept is that of an
amputated diagram; in the second, that of proper, or one-particle irreducible, vertices.
The graphical representation of the Green functions of an interacting field theory,
as described in Section 10.2, suggests that important features of these amplitudes can
be directly visualized and correlated with corresponding properties of the Feynman
graphs which represent these amplitudes at any given order of perturbation theory.
The most basic such property is that of connectedness, first discussed in the context
of S-matrix elements in Section 6.1. This is a special case of the more general concept
of “N -particle irreducibility”. A graph (or set of graphs) contributing to a given n-
point function, or (via the LSZ formula) S-matrix element, is said to be N -particle
irreducible if it remains connected when any N internal lines of the graph are cut.
Thus, the connected diagrams are 0-particle irreducible: they contain the subset of
one-particle irreducible (or 1PI for short) diagrams which remain connected if only
a single internal line is cut. The 1PI diagrams in turn contain as a subset the 2PI
diagrams which remain connected even when two internal lines are cut, and so on.
The physical significance of the connected contributions to S-matrix elements, and
to the Green functions of the theory was already discussed at length in Sections 6.1
and 9.3. The physical interpretation of the 1PI and higher irreducible graphs will
be discussed in due course below. Our task here is to connect these concepts to the
functional approach to field theory introduced in the preceding section.
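The definition of N-particle irreducibility given above can be checked mechanically on small graphs. The following is only a minimal sketch: the function names and the representation of a graph as a list of vertex labels plus a list of internal lines (pairs of vertices) are illustrative assumptions, and external legs are simply omitted since they are never cut.

```python
from itertools import combinations


def is_connected(vertices, lines):
    """Return True if the undirected multigraph (vertices, lines) is connected."""
    vertices = list(vertices)
    if not vertices:
        return True
    neighbours = {v: set() for v in vertices}
    for a, b in lines:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = {vertices[0]}
    stack = [vertices[0]]
    while stack:  # depth-first search from the first vertex
        v = stack.pop()
        for w in neighbours[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)


def is_n_particle_irreducible(vertices, internal_lines, n):
    """True if the graph stays connected when any n internal lines are cut.

    If the graph has fewer than n internal lines there is nothing to cut,
    and the condition reduces to plain connectedness.
    """
    if not is_connected(vertices, internal_lines):
        return False
    for cut in combinations(range(len(internal_lines)), n):
        remaining = [e for k, e in enumerate(internal_lines) if k not in cut]
        if not is_connected(vertices, remaining):
            return False
    return True


# Illustrative graphs (internal lines only):
tree = ([0, 1], [(0, 1)])                    # two vertices, one line
bubble = ([0, 1], [(0, 1), (0, 1)])          # two vertices, two lines
sunset = ([0, 1], [(0, 1), (0, 1), (0, 1)])  # two vertices, three lines
```

With these inputs the tree graph is connected but not 1PI, the bubble is 1PI but not 2PI, and the sunset graph is 2PI, in line with the nesting of subsets described in the text.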
The functional derivatives of Z[j] (divided by Z[0]) yield the full set of contri-
butions to the Green functions of the theory—both connected and disconnected.17
Exactly as for the generating functional of S-matrix elements, where the connection
between full and connected S-matrix amplitudes is extremely simple when expressed
in terms of generating functionals, as discussed in Section 6.1,

$$
S(j^*, j) \equiv \exp\left(S^{c}(j^*, j)\right) \tag{10.117}
$$


the generating functional W [j] whose functional derivatives yield the connected Green
functions of the theory is given by

$$
Z[j] = \exp W[j], \qquad W[j] \equiv \ln Z[j] \tag{10.118}
$$


where, for our usual example of a self-interacting real scalar field, Z[j] is given (cf.
(10.74)) by the Minkowski functional integral

$$
Z[j] = \int \mathcal{D}\phi \; e^{-i\int \left(\frac{1}{2}\phi(x)K\phi(x) + P(\phi)\right)d^4x \,+\, \int j(x)\phi(x)\,d^4x} \tag{10.119}
$$

with P(φ) a polynomial (higher than quadratic) in the fields, the individual terms of
which induce the three-point, four-point, etc. interaction vertices of the theory. As a
concrete example we shall imagine in the following that both trilinear and quadrilinear
interactions are present: we may take, for example, $P(\phi) = \frac{\lambda_3}{3!}\phi^3 + \frac{\lambda_4}{4!}\phi^4$. Thus, the

17 However, as discussed previously, the division by Z[0] eliminates the disconnected vacuum fluctuations
accompanying any scattering process.

graphs of the theory contain elementary vertices from which either three or four lines
emerge. The usual factors of $e^{i\delta}$ needed for convergence of the Minkowski path integral
are suppressed here to avoid overburdening the notation.
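The derivatives of (10.118), in analogy to the functional derivatives of Z[j] for the
full Green functions (see (10.73)), lead directly to the cluster decomposition recursion
formulas (cf. (9.92, 9.93, 9.94)) relating connected to full amplitudes

$$
\frac{i\,\delta W}{\delta j(x_1)} = \frac{i}{Z}\frac{\delta Z}{\delta j(x_1)} \tag{10.120}
$$

$$
\frac{i^2\,\delta^2 W}{\delta j(x_1)\delta j(x_2)} = \frac{i^2}{Z}\frac{\delta^2 Z}{\delta j(x_1)\delta j(x_2)} - \frac{i\,\delta W}{\delta j(x_1)}\frac{i\,\delta W}{\delta j(x_2)} \tag{10.121}
$$

$$
\begin{aligned}
\frac{i^3\,\delta^3 W}{\delta j(x_1)\delta j(x_2)\delta j(x_3)} ={}& \frac{i^3}{Z}\frac{\delta^3 Z}{\delta j(x_1)\delta j(x_2)\delta j(x_3)} - \left(\frac{i\,\delta W}{\delta j(x_1)}\frac{i^2\,\delta^2 W}{\delta j(x_2)\delta j(x_3)} + \text{perms}\right) \\
& - \frac{i\,\delta W}{\delta j(x_1)}\frac{i\,\delta W}{\delta j(x_2)}\frac{i\,\delta W}{\delta j(x_3)}
\end{aligned}
\tag{10.122}
$$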

and so on. After the functional derivatives are performed, the desired n-point functions
are obtained by setting the source j = 0. These formulas display directly the removal
of disconnected contributions from the full set of graphs generated by the generating
functional Z[j]. We shall henceforth only consider connected amplitudes, relieving us
of the obligation to further complicate the notation by a sub(or super)script “c”, to
distinguish the connected Green functions generated by W [j] from those (containing
all graphs) generated by Z[j].
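A zero-dimensional analogue may help make the recursion (10.120)-(10.122) concrete. If the "field" is a single random variable x, the normalized Z(j) becomes the moment generating function and W = ln Z the cumulant generating function, so the relations between full and connected amplitudes reduce to the familiar moment-cumulant formulas (without the factors of i). The sketch below is an illustration only; the function names and the choice of a Bernoulli distribution as the test case are assumptions not taken from the text.

```python
from fractions import Fraction


def moments(distribution, order):
    """Moments <x^n>, n = 1..order, of a discrete distribution {value: probability}."""
    return [sum(p * x**n for x, p in distribution.items())
            for n in range(1, order + 1)]


def connected_from_full(m1, m2, m3):
    """Zero-dimensional analogue of (10.120)-(10.122).

    The three permutations in (10.122) become identical when there is only
    one "spacetime point", hence the factor 3.
    """
    c1 = m1
    c2 = m2 - c1 * c1
    c3 = m3 - 3 * c1 * c2 - c1**3
    return c1, c2, c3


# Test case: Bernoulli variable with probability p of taking the value 1.
p = Fraction(1, 3)
bernoulli = {0: 1 - p, 1: p}
m1, m2, m3 = moments(bernoulli, 3)
c1, c2, c3 = connected_from_full(m1, m2, m3)
```

For this test case the connected parts should reproduce the standard Bernoulli cumulants p, p(1 − p) and p(1 − p)(1 − 2p), which is a quick consistency check on the signs and combinatoric factors.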
Before deriving the general form of the generating functional for the one-particle
irreducible diagrams of the theory, we shall illustrate the basic idea with some
examples, which will help to bring home the critical importance of these new graphical
concepts in exposing exactly the pole structure in the external momenta needed to
make the LSZ formula for S-matrix elements work. Recall from our discussion of
persistent interactions in Section 10.2 that the appearance of the correct set of poles
in the external particle momenta is by no means automatic in perturbation theory.
The heuristic account that follows is intended to reveal in a visually intuitive way
the appropriate reorganization of perturbation theory which will allow the direct and
unambiguous application of the LSZ formula to the extraction of S-matrix elements.
Consider the set of all connected contributions to the four-point Green function in
the theory with generating functional (10.119). All such graphs can be displayed in
the generic form exhibited in Fig. 10.4: the blobs labeled Δ̂F on the four external lines
represent all possible contributions to the full interacting two-point function (Feynman
propagator): they contain all possible self-interactions of the external particles coming
into and receding from the central collision process. The central blob, labeled $G^{(4)}_{\rm amp}$, is
the “amputated” connected four-point function of the theory, and represents the part
of the process where the initial and final particles can no longer be considered to be
moving freely and independently of one another. Thus, the spacetime points labeled
$z_1, z_2, z'_1, z'_2$ represent interaction vertices, indeed the first interactions (as we move
in towards the central collision process from the external points $x_1, x_2, x'_1, x'_2$) which

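Fig. 10.4 General structure of connected four-point function $G^{(4)}(x_1, x_2, x'_1, x'_2)$. [Diagram: the external points are joined by full propagators, e.g. Δ̂F(x1 − z1), to the central blob $G^{(4)}_{\rm amp}$ at the vertices $z_1, z_2, z'_1, z'_2$.]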

can no longer be regarded as inducing persistent self-interactions of isolated, freely
moving particles. We shall shortly see how to construct a generating functional which
automatically yields amputated Green functions. But even before doing this, it is easy
to see how the introduction of amputated graphs allows us to solve the problem of
persistent interactions in the context of the LSZ formula mentioned above.
The decomposition of the four-point function illustrated in Fig. 10.4 (into exter-
nal legs carrying self-interactions of the external particles and a central legless, or
“amputated”, interaction region) amounts to writing, in coordinate space

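$$
\begin{aligned}
G^{(4)}(x_1, x_2, x'_1, x'_2) = \int & \hat\Delta_F(x_1 - z_1)\,\hat\Delta_F(x_2 - z_2)\,\hat\Delta_F(x'_1 - z'_1)\,\hat\Delta_F(x'_2 - z'_2) \\
& \cdot\, G^{(4)}_{\rm amp}(z_1, z_2, z'_1, z'_2)\; d^4z_1\, d^4z_2\, d^4z'_1\, d^4z'_2
\end{aligned}
\tag{10.123}
$$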

Fourier transforming the full propagators



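$$
\hat\Delta_F(x) = \int \frac{d^4q}{(2\pi)^4}\, e^{iq\cdot x}\, \hat\Delta_F(q^2) \tag{10.124}
$$

and the full Green function $G^{(4)}$ (cf. (9.178))

$$
\tilde G^{(4)}(k_1, k_2, k'_1, k'_2) \equiv \int e^{+i\sum_j k'_j\cdot x'_j \,-\, i\sum_i k_i\cdot x_i}\; G^{(4)}(x_1, x_2, x'_1, x'_2)\; d^4x_1\, d^4x_2\, d^4x'_1\, d^4x'_2 \tag{10.125}
$$
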
we find, as expected, that the convolutions in coordinate space become algebraic
products in momentum space

$$
\tilde G^{(4)}(k_1, k_2, k'_1, k'_2) = \hat\Delta_F(k_1^2)\,\hat\Delta_F(k_2^2)\,\hat\Delta_F(k'^2_1)\,\hat\Delta_F(k'^2_2)\; \tilde G^{(4)}_{\rm amp}(k_1, k_2, k'_1, k'_2) \tag{10.126}
$$

Fig. 10.5 Feynman graphs contributing to the full Feynman propagator Δ̂F in φ⁴ theory. [Diagram: the full propagator blob written as a sum of the free propagator and successive self-energy corrections.]

The 2-2 scattering amplitude in this theory is given by the LSZ formula (9.179): we
must multiply the momentum-space four-point function $\tilde G^{(4)}$ by external leg factors
of $\frac{-iZ^{-1/2}(k^2 - m_{\rm ph}^2)}{(2\pi)^{3/2}\sqrt{2E(k)}}$ for every external momentum k, and then take the on-mass-shell
limit $k^2 \to m_{\rm ph}^2$. By the Lehmann representation (9.192), each of the external leg full
propagators in (10.126) produces in this limit a simple pole, with residue Z, at exactly
the (squared) physical mass $m_{\rm ph}^2$ (which may not be the same as the squared masses
employed in the free propagators from which the diagrams are constructed!), so that
the factors of $(k^2 - m_{\rm ph}^2)$ for each initial or final-state particle are cancelled, and we
obtain for the 2-2 S-matrix element:

$$
S_{k'_1, k'_2, k_1, k_2} = \prod_{i=1}^{2} \frac{-iZ^{1/2}}{(2\pi)^{3/2}\sqrt{2E(k_i)}}\; \prod_{j=1}^{2} \frac{-iZ^{1/2}}{(2\pi)^{3/2}\sqrt{2E(k'_j)}}\; \tilde G^{(4)}_{\rm amp}(k_1, k_2, k'_1, k'_2) \tag{10.127}
$$

with the amputated four-point function $\tilde G^{(4)}_{\rm amp}$ evaluated at on-mass-shell momenta
(i.e., $k_i^2 = k'^2_i = m_{\rm ph}^2$). The potentially singular behavior associated with the on-mass-
shell limit (or equivalently, the infinite time propagation of self-interacting particles
into and out of the process) has been taken care of in (10.127): we may continue
on with the evaluation of $\tilde G^{(4)}_{\rm amp}(k_1, k_2, k'_1, k'_2)$ in perturbation theory, confident that
the pole residue corresponding to the desired S-matrix element has been properly
extracted. In particular, there is no need to use an on-mass-shell renormalization
scheme of the sort described in Section 10.2, in which the pole of the free propagator
is shifted to the physical value order by order in perturbation theory by the choice of
suitable counterterms. The mass appearing in the free propagators constituting the
perturbative expansion of the amputated Green function $\tilde G^{(4)}_{\rm amp}$ may be conveniently
chosen at some “intermediate” value, depending on the particular renormalization
scheme employed, as we shall see later in our detailed discussion of renormalization
theory in Chapter 17.
Before going on to a discussion of proper, or “one-particle-irreducible” (1PI), Green
functions, a short digression on some elementary graphical counting rules will equip
us with some results which facilitate the reorganization of perturbation theory implied
by the introduction of the 1PI condition. Recall that a tree diagram is defined as a
connected graph which is rendered disconnected by cutting any single internal line. Let
such a graph have I internal lines and V vertices. We imagine a “pruning” process
whereby vertices are removed one by one by cutting a single internal line, starting at
the outermost “branches” of the tree. Each such removal leaves the quantity I − V + 1

unchanged, as a single vertex and a single internal line have been removed. Eventually
we arrive at the remnant core of the graph, with a single vertex, and no remaining
internal lines, for which the quantity I − V + 1 is zero. We conclude that for any tree
graph, I − V + 1 = 0.
More generally, a connected graph may contain L independent loops, resulting in
momentum space in L independent four-momenta integrations corresponding to the
free flow of momentum around each loop. We again consider the quantity I − V + 1,
this time reducing the number of loops one at a time by cutting a single internal
line in each loop, without altering the total number of vertices, and leaving the graph
connected at each stage. In this case we reduce the number of loops L, and the quantity
I − V + 1, by one at each cut, until we arrive at a connected graph with no loops, i.e.,
a tree graph, with I − V + 1 = 0. This establishes that the number of loops L in our
initial graph is given precisely by the combination I − V + 1. The reader is invited to
verify this rule by drawing a few simple graphs.
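The counting rule L = I − V + 1 can also be verified by machine rather than by drawing. The sketch below counts independent loops directly, using a union-find structure: every internal line that joins two vertices which are already connected closes exactly one new loop. The function name and the encoding of graphs as vertex counts plus lists of vertex pairs are illustrative assumptions.

```python
def count_loops(num_vertices, internal_lines):
    """Count independent loops of a graph given as pairs of vertex indices."""
    parent = list(range(num_vertices))

    def find(v):
        # Follow parent pointers to the representative, compressing the path.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    loops = 0
    for a, b in internal_lines:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            loops += 1            # line closes a loop
        else:
            parent[root_a] = root_b  # line merges two components
    return loops


# (number of vertices V, internal lines) for a few connected graphs
examples = {
    "chain (tree)": (3, [(0, 1), (1, 2)]),
    "bubble": (2, [(0, 1), (0, 1)]),
    "sunset": (2, [(0, 1), (0, 1), (0, 1)]),
    "figure-eight": (1, [(0, 0), (0, 0)]),
    "box": (4, [(0, 1), (1, 2), (2, 3), (3, 0)]),
}

for name, (V, lines) in examples.items():
    I = len(lines)
    print(name, "L =", count_loops(V, lines), " I - V + 1 =", I - V + 1)
```

For each of these connected graphs the directly counted number of loops agrees with I − V + 1; for a disconnected graph the formula would have to be modified, which is why the text restricts attention to connected graphs.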
The study of the short-distance (or ultraviolet) singularity structure of field theory
amplitudes is greatly facilitated by the introduction of the concept of one-particle-
irreducible diagrams, and in particular, by the use of a functional, analogous to W [j]
for connected graphs, which automatically generates such 1PI diagrams. In addition,
the graphs produced will be amputated: in other words, they are subsets of the graphs
describing the connected amputated functions $G^{(n)}_{\rm amp}$ discussed previously, where those
diagrams which can be disconnected by cutting a single internal line are discarded. It
is intuitively obvious that the full set of amputated connected diagrams (in coordinate
space) can be reconstituted from the 1PI diagrams by convolving products of the latter
with full Feynman propagators Δ̂F connecting the separate 1PI pieces.
For example, in Fig. 10.6, we see that a class of one-particle-reducible contributions
to the connected four-point function has the structure of two three-point 1PI graphs
$G^{(3)}_{\rm amp}$ connected by a single full Feynman propagator. Note that $G^{(3)}_{\rm amp}$ is automatically
1PI (why?). In momentum space the full graph decomposes algebraically into a product
of the momentum-space amplitudes for the two $G^{(3)}_{\rm amp}$ functions times the momentum-
space Feynman propagator Δ̂F(p²), where p is the definite four-momentum (fixed by

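Fig. 10.6 A one-particle-reducible contribution to the connected four-point function. [Diagram: two blobs $G^{(3)}_{\rm amp}$ joined by a single full propagator Δ̂F, with a full propagator Δ̂F on each of the four external legs.]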



energy-momentum conservation) passing between the two separate 1PI pieces. Thus
the singularity structure of the full connected amplitudes—in momentum space, the
dependence on the ultraviolet cutoff of the loop integrals in any given graph—is
decomposable into independent pieces: namely, the proper (or 1PI) subgraphs into
which it can be decomposed.18 The study of the short-distance sensitivity of general
amplitudes in field theory can therefore be reduced to a study of the cutoff dependence
of the proper, or 1PI, Green functions of the theory.
In addition to simplifying the study of ultraviolet behavior, the introduction of
the 1PI concept leads to two further important insights into the physics of local field
theories. First, we shall see that the generating functional for 1PI graphs (sometimes
referred to as the “effective action”) also has a direct energetic interpretation which
plays a critical role in the analysis of spontaneous symmetry-breaking, to which we
shall return in Part 3. In fact, it plays a role in field theory analogous to that played
by the free energy in thermodynamics. Secondly, if we reintroduce Planck’s constant ħ
as an explicit signature of quantum effects, an expansion of the effective action in
powers of ħ is found to (a) yield the classical action as the zeroth order term, and (b)
correspond precisely to a reorganization of the perturbation theory according to the
number of loops.
We turn now to the task of constructing a functional for the 1PI graphs of the
theory. To avoid annoying factors of i we shall work in the Euclidean formulation.
Furthermore, for the reasons adduced immediately above, Planck’s constant ħ will
be reintroduced both in the path integral and in the definition of the generating
functionals. Thus, the exponents appearing in the path integral acquire an explicit
factor of 1/ħ, which can be traced back to the reintroduction of ħ in the Hamiltonian
evolution, $e^{-iHt} \to e^{-iHt/\hbar}$, or, in the imaginary time formulation, $e^{-Ht} \to e^{-Ht/\hbar}$.
Our Euclidean path integral now reads

$$
Z[j] = \int \mathcal{D}\phi\; e^{-\frac{1}{\hbar}\int \left(\frac{1}{2}\phi(x)K\phi(x) + P(\phi(x)) - j(x)\phi(x)\right)d^4x}
     = \int \mathcal{D}\phi\; e^{-\frac{1}{\hbar}\left(S[\phi] - \int j(x)\phi(x)\,d^4x\right)}
\tag{10.128}
$$

where now K = −□ + m² is the Euclidean Klein–Gordon operator, which is self-adjoint
and positive-definite, and the coordinate integrations are over a Euclidean
four-space. Of course, we expect to be able, at least in principle, to return to the
physical Minkowski-space amplitudes by the process of Wick rotation. The generating
functional for the connected diagrams is now defined as

$$
W[j] = \hbar \ln Z[j] = \sum_n \frac{1}{n!} \int G^{(n)}(x_1, x_2, \ldots, x_n)\, j(x_1) j(x_2) \cdots j(x_n)\; d^4x_1\, d^4x_2 \cdots d^4x_n \tag{10.129}
$$

where we shall for simplicity adopt the same notation $G^{(n)}$ for the Euclidean n-point
Green functions (previously denoted S(x1, ..., xn), in (10.81)) of the theory as that

18 In coordinate space it is clear that the ultraviolet singularities, which arise from multiplication of
Feynman propagators (which are distributions) at coincident vertices, must be localized within the 1PI
pieces, as the propagators connecting separate 1PI parts of the diagram appear independently.

used previously for Minkowski-space (Feynman) amplitudes, and remember that we
are dealing throughout with connected amplitudes.
Note the additional overall factor of ħ included in the definition of W[j]: it is there
to ensure the existence of a well-defined classical limit for ħ → 0. Indeed, we note
that in any particular graphical contribution to $G^{(n)}$, each free propagator $\Delta_F = K^{-1}$
is now accompanied by a factor ħ, each vertex (term in P(φ)) by a factor 1/ħ, and
each external line by a factor 1/ħ (as each source function expanded down in the
path integral is accompanied by a factor 1/ħ). Thus, a particular graphical contribution
to $G^{(n)}$ comes with ħ raised to a power equal to the total number of lines, n + I
(external plus internal), minus the number of vertices V, minus the number of source
functions n, and plus one (from the overall factor of ħ in (10.129)). This gives exactly
I − V + 1, which we saw previously is just the number of loops in the graph!
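As a brief worked illustration of this counting (the two sample graphs are chosen here for concreteness and are not taken from the text), the powers of ħ can be collected as follows for a four-point function in φ⁴ theory:

```latex
% Power of \hbar carried by a graph with n external lines,
% I internal lines and V vertices:
\hbar^{\,n+I}\;\hbar^{-V}\;\hbar^{-n}\;\hbar^{+1} \;=\; \hbar^{\,I-V+1} \;=\; \hbar^{L}

% Single four-point vertex (tree level):  n = 4,\ I = 0,\ V = 1
%   \hbar^{4+0-1-4+1} = \hbar^{0}
% One-loop "bubble" with two vertices:    n = 4,\ I = 2,\ V = 2
%   \hbar^{4+2-2-4+1} = \hbar^{1}
```

The external-line factors cancel identically between propagators and sources, so only the internal structure of the graph, through I − V + 1, determines the order in ħ.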
In particular, the tree (loopless) graphs correspond to the leading contribution, of
order ħ⁰, in the classical limit ħ → 0. At least within the context of perturbation
theory, quantum effects in a quantum field theory amplitude can be directly associated
with the presence of loops in the corresponding graphs. From the path-integral
expression (10.128) it is apparent that an expansion in ħ amounts to a saddle-point
expansion of the integral, in which the leading term corresponds to finding a minimum
of the exponent in field space. This minimum occurs at the extremal point of the
classical action S[φ], at φ(x) = φcl (x) where

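$$
\frac{\delta S}{\delta\phi(x)} = \frac{\delta}{\delta\phi(x)} \int \left(\frac{1}{2}\phi(y)K\phi(y) + P(\phi(y)) - j(y)\phi(y)\right) d^4y \,\bigg|_{\phi=\phi_{cl}} = 0 \tag{10.130}
$$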

which amounts to

$$
K\phi_{cl}(x) + P'(\phi_{cl}(x)) = j(x) \tag{10.131}
$$

In the source-free limit (j = 0), this equation is simply the non-linear classical field
equation corresponding to the classical least action principle (in Euclidean space, of
course). In the presence of a source, it determines the classical field φcl implicitly as a
functional of the external source j. The leading saddle-point approximation to Z[j] is
given by simply evaluating the exponential at its extremal point:

$$
Z_{cl}[j] = e^{-\frac{1}{\hbar}\int \left(\frac{1}{2}\phi_{cl}(x)K\phi_{cl}(x) + P(\phi_{cl}(x)) - j(x)\phi_{cl}(x)\right) d^4x} \tag{10.132}
$$

whence we obtain, by (10.129), the classical limit of the generating functional W for
connected amplitudes

$$
W_{cl}[j] = -\int \left(\frac{1}{2}\phi_{cl}(x)K\phi_{cl}(x) + P(\phi_{cl}(x)) - j(x)\phi_{cl}(x)\right) d^4x \tag{10.133}
$$

where φcl is supposed to be determined as a functional of j via (10.131). Notice that
ħ has disappeared at this point, as we are dealing with completely classical quantities.
Note also that by using the (functional) chain rule for differentiation of a functional
that depends on j(x) both explicitly and implicitly through φcl (x),
 
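$$
\frac{\delta}{\delta j(x)} = \int d^4y\, \frac{\delta\phi_{cl}(y)}{\delta j(x)} \frac{\delta}{\delta\phi_{cl}(y)} + \frac{\delta}{\delta j(x)}\bigg|_{\phi_{cl}} \tag{10.134}
$$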

we find that

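$$
\begin{aligned}
\frac{\delta W_{cl}[j]}{\delta j(x)} &= \int d^4y\, \frac{\delta\phi_{cl}(y)}{\delta j(x)} \left(-K\phi_{cl}(y) - P'(\phi_{cl}(y)) + j(y)\right) + \phi_{cl}(x) \\
&= \phi_{cl}(x)
\end{aligned}
\tag{10.135}
$$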

where the term in brackets in the integral has vanished in virtue of the field equation
(10.131).
The classical field equation (10.131) can be solved iteratively in increasing powers
of the source function j. Let us temporarily restrict the polynomial P(φ) to correspond
to φ⁴ theory: we take $P(\phi) = \frac{\lambda}{4!}\phi^4$, $P'(\phi) = \frac{\lambda}{3!}\phi^3$. Next, we rewrite (10.131) as follows:

$$
\phi_{cl}(x) = K^{-1}j(x) - \frac{\lambda}{3!} K^{-1}(\phi_{cl}^3)(x) \tag{10.136}
$$

or more explicitly, using the Green function for K, which is just the Euclidean
propagator $\Delta_E(x)$ (with Fourier transform $\frac{1}{k^2+m^2}$, so that $K\Delta_E(x) = \delta^4(x)$),

$$
\phi_{cl}(x) = \int \Delta_E(x - x_1)\, j(x_1)\, d^4x_1 - \frac{\lambda}{3!} \int \Delta_E(x - z)\, \phi_{cl}(z)^3\, d^4z \tag{10.137}
$$

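Reinserting the left-hand side result for φcl on the right, we find, through order j³,

$$
\begin{aligned}
\phi_{cl}(x) ={}& \int \Delta_E(x - x_1)\, j(x_1)\, d^4x_1 - \frac{\lambda}{3!} \int \Delta_E(x - z) \\
& \cdot\, \Delta_E(z - x_1) j(x_1)\, \Delta_E(z - x_2) j(x_2)\, \Delta_E(z - x_3) j(x_3)\; d^4x_1\, d^4x_2\, d^4x_3\, d^4z + O(j^5)
\end{aligned}
\tag{10.138}
$$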

Ignoring signs, coupling constants, and combinatoric factors, these terms can be
graphically represented as indicated in Fig. 10.7: it is apparent that one has generated
a set of tree graphs with a preferred external point (the argument x of the classical
field φcl (x)), and source functions j(xi ) attached to all other external points of the
graph. The third term in Fig. 10.7 shows also one of the terms arising at order j 5 (of
second order in the bare coupling λ, as there are two interaction vertices, at z1 and
z2 ), which we obtain by iterating (10.138) one more time.
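The iterative solution can be tried out numerically in a zero-dimensional toy version of the theory, in which the field is a single number, K is a positive constant and the propagator is simply 1/K. This is only a sketch under those assumptions (the function names and parameter values are illustrative): the fixed-point iteration of the analogue of (10.136) is compared with the explicit tree-level series through order j⁵.

```python
def solve_classical_field(j, K=1.0, lam=1.0, iterations=50):
    """Iterate phi = j/K - (lam/3!) * phi**3 / K, starting from phi = 0.

    Zero-dimensional analogue of (10.136); converges for small enough j.
    """
    phi = 0.0
    for _ in range(iterations):
        phi = j / K - (lam / 6.0) * phi**3 / K
    return phi


def tree_series(j, K=1.0, lam=1.0):
    """Tree-level expansion of the same solution through order j**5.

    The j and j**3 terms are the analogues of the two terms in (10.138);
    the j**5 term comes from iterating once more, and is second order in lam.
    """
    a = 1.0 / K          # zero-dimensional "propagator"
    c = lam / 6.0        # lam / 3!
    return a * j - c * a**4 * j**3 + 3 * c**2 * a**7 * j**5


j = 0.1
phi_iterated = solve_classical_field(j)
phi_series = tree_series(j)
print(phi_iterated, phi_series)
```

For small sources the two numbers agree up to a remainder of order j⁷, and the iterated value satisfies the toy field equation Kφ + (λ/3!)φ³ = j to machine precision.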
If we insert this iterative solution for φcl into (10.133), we obtain the connected
generating functional in the classical limit as an explicit functional of the source
function j(x), expanded formally in increasing powers of j(x):

$$
\begin{aligned}
W_{cl}[j] ={}& \frac{1}{2} \int j(x_1)\, \Delta_E(x_1 - x_2)\, j(x_2)\; d^4x_1\, d^4x_2 \\
& - \frac{\lambda}{4!} \int \left( \prod_{i=1}^{4} \int \Delta_E(z - x_i)\, j(x_i)\, d^4x_i \right) d^4z + O(j^6)
\end{aligned}
\tag{10.139}
$$

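Fig. 10.7 Formal expansion of the classical field φcl (x) in powers of j. [Diagram: tree graphs built from propagators ΔE with root x: a single line ending on j(x1); a graph with one vertex z1 and sources j(x1), j(x2), j(x3); a graph with two vertices z1, z2 and sources j(x1), ..., j(x5); plus further terms.]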

with the graphical representation indicated in Fig. 10.8 (again ignoring combinatoric
factors, signs, etc., and with one of the O(j 6 ) terms not shown explicitly in (10.139)
indicated graphically). As expected, only connected tree graphs are present, some of
which are, however, clearly one-particle-reducible (i.e., become disconnected when
a single internal line is cut). At the level of tree graphs, of course, the only 1PI
graphs involve at most a single interaction vertex: as soon as more than one vertex is

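Fig. 10.8 Formal expansion of Wcl [j] in powers of j. [Diagram: tree graphs built from propagators ΔE: a single line joining j(x1) and j(x2); a graph with one vertex z1 and sources j(x1), ..., j(x4); a graph with two vertices z1, z2 and sources j(x1), ..., j(x6); plus further terms.]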



present, there must be an internal line connecting two vertices (the graph as a whole is
connected!), the removal of which will disconnect the diagram (as no loops are present).
Thus any generating functional for 1PI diagrams can, at the classical (leading order
in ħ) level, only contain a finite set of terms corresponding to the interaction vertices
of the theory (the monomial terms in P(φ)).
The key to finding such a functional turns out to be the Legendre transform: we
re-express the information contained in the functional W[j] as a functional Γ[φ] of
the derivatives $\frac{\delta W[j]}{\delta j(x)} \equiv \phi(x)$, in such a way that no information is lost in trading in
the source j for the field φ, or vice versa. The field φ(x) obtained by functionally
differentiating W[j] with respect to the source j(x) is sometimes (confusingly) called
the “classical field”, although properly speaking that term should be applied to its ħ → 0
limit $\phi_{cl}(x) = \frac{\delta W_{cl}[j]}{\delta j(x)}$ (cf. (10.135)). Referring to the path-integral representation for
W[j] = ħ ln Z[j], and recalling that derivatives of Z[j] generate vacuum-expectation-
values of the corresponding Heisenberg fields (here, continued to Euclidean space), we
see that $\phi(x) = \frac{\hbar}{Z[j]}\frac{\delta Z[j]}{\delta j(x)}$ is just the normalized VEV of the Heisenberg field φH (x)
in the presence of the source term. It can therefore be viewed as a c-number source
function, like j(x), and in that sense is “classical”.19 As usual, the Legendre transform
allowing us to incorporate the complete information in W[j] in a recoverable way20 as
a functional of the derivative $\phi(x) = \frac{\delta W[j]}{\delta j(x)}$ is given by

$$
\Gamma[\phi] \equiv -W[j] + \int j(x)\phi(x)\, d^4x \tag{10.140}
$$

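where, of course, the dependence on j(x) on the right-hand side must be eliminated
in favor of φ(x) by inverting the equation $\phi(x) = \frac{\delta W[j]}{\delta j(x)}$. The Legendre transformation
is defined in order to preserve the information encoded in W[j]. Using the chain rule
(namely, (10.134), with the roles of j and φ interchanged), we find

$$
\begin{aligned}
\frac{\delta\Gamma[\phi]}{\delta\phi(x)} &= j(x) + \int d^4y\, \frac{\delta j(y)}{\delta\phi(x)} \frac{\delta}{\delta j(y)} \left( -W[j] + \int j(z)\phi(z)\, d^4z \right)\bigg|_{\phi} \\
&= j(x) + \int d^4y\, \frac{\delta j(y)}{\delta\phi(x)} \left( -\frac{\delta W[j]}{\delta j(y)} + \phi(y) \right) = j(x)
\end{aligned}
\tag{10.141}
$$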

so that $W[j] = -\Gamma[\phi] + \int j(x)\phi(x)\, d^4x$ allows reconstruction of W[j] from Γ[φ] once φ
is re-expressed in terms of j via (10.141). Moreover, we see that the two-point functions

$$
\frac{\delta^2 W[j]}{\delta j(x)\,\delta j(y)} = \frac{\delta\phi(y)}{\delta j(x)} \tag{10.142}
$$

19 We shall follow the usual confusing practice of using the same symbol φ(x) to refer to (a) the quantum
(operator) free field, (b) the c-number field integrated over in the path-integral formalism, and (c) the
independent field variable for the Legendre transform Γ[φ] of W [j].
20 See (Callen, 1960), Section 5.2, for a lucid geometrical introduction to Legendre transforms.

and

$$
\frac{\delta^2 \Gamma[\phi]}{\delta\phi(x)\,\delta\phi(y)} = \frac{\delta j(x)}{\delta\phi(y)} \tag{10.143}
$$

are functional inverses of one another. In particular, their discretized versions corre-
spond to matrices which are inverses of one another.
It should be noted here that the assumption of invertibility (i.e., solvability of
$\frac{\delta W[j]}{\delta j} = \phi$ for j in terms of φ, or of $\frac{\delta\Gamma[\phi]}{\delta\phi} = j$ for φ in terms of j) is not
automatically assured for arbitrary functionals. In the case of the Legendre transformation
connecting Lagrangians to Hamiltonians in mechanics (or field theory), for example,
there are important cases (for field theories, in situations involving local
gauge symmetries) where the form of the Lagrangian is such that it is not
possible to solve uniquely for the velocities (or time-derivatives of the fields) in terms of
the momenta defined as $p = \frac{\partial L}{\partial \dot q}$ (or, for field theory, $\pi \equiv \frac{\delta L}{\delta \dot\phi}$), so the Hamiltonian is not
defined uniquely by a Legendre transformation of the Lagrangian. The modifications
needed in the canonical formalism in the presence of such eventualities will be
described in Chapter 15. Even for a function W(J) of a single variable J, it is apparent
that the equation W′(J) = φ can be solved uniquely for J in terms of φ only if W′(J)
is a monotonic function of J: i.e., if the original function W(J) is convex. The required
convexity of W[j] (and Γ[φ]) in the field theory case can in fact be established quite
generally (beyond perturbation theory), as we shall see when we once again take up
the study of the effective action in the study of spontaneous symmetry-breaking in
Chapter 14. For our perturbative purposes in this Chapter, it will suffice to show that
the invertibility is valid order by order, to all orders, in a formal expansion in the
source j(x) or the classical field φ(x).
We return now to a study of the properties of the effective action Γ[φ]. As a
first step, we may enquire into the meaning of the classical limit, for ħ → 0, of this
object. The classical limit is obtained by setting W[j] = Wcl [j], so that, by (10.135),
φ(x) = φcl (x), and we find immediately from (10.133):

$$
\Gamma_{cl}[\phi] = \int \left(\frac{1}{2}\phi(x)K\phi(x) + P(\phi(x))\right) d^4x = S[\phi] \tag{10.144}
$$

We see that in the classical limit, the generating functional Γ[φ] coincides with the
classical action S[φ] of the theory. For this reason, the full (i.e., ħ ≠ 0) Γ[φ] is generally
referred to as the “effective action” of the field theory, which in some sense generalizes
the classical action to include quantum effects. If in the usual way we regard Γ[φ] as
a generating functional of n-point vertex functions $\Gamma^{(n)}(x_1, x_2, \ldots, x_n)$ of the theory,

$$
\Gamma^{(n)}(x_1, x_2, \ldots, x_n) = \frac{\delta^n \Gamma[\phi]}{\delta\phi(x_1)\,\delta\phi(x_2) \cdots \delta\phi(x_n)}\bigg|_{\phi=0} \tag{10.145}
$$

we see that in the classical limit (for the specific example $P(\phi) = \frac{\lambda}{4!}\phi^4$), the only
surviving vertex functions are for n = 2 and n = 4:

$$
\Gamma^{(2)}_{cl}(x_1, x_2) = K_{x_1}\, \delta^4(x_1 - x_2) \tag{10.146}
$$

$$
\Gamma^{(4)}_{cl}(x_1, x_2, x_3, x_4) = \lambda\, \delta^4(x_1 - x_2)\, \delta^4(x_2 - x_3)\, \delta^4(x_3 - x_4) \tag{10.147}
$$

Note that all the one-particle-reducible terms present in the connected tree graphs
generated by Wcl [j] have disappeared: instead we have only a two-point vertex given
by the Klein–Gordon operator K (or equivalently, the inverse propagator $\Delta_E^{-1}$), and a
four-point fully amputated vertex corresponding to a single interaction point: in other
words, the only amputated one-particle-irreducible graphs possible at the tree level.
The appearance of the inverse propagator in $\Gamma^{(2)}_{cl}$ can be regarded as a result of the
amputation process whereby a factor of the inverse propagator $\Delta_E^{-1}$ is applied for each
external point, leading in the case of the two-point function to $\Delta_E^{-1} \cdot \Delta_E \cdot \Delta_E^{-1} = \Delta_E^{-1}$.
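The statement that the Legendre transform of the classical connected functional returns the classical action can be checked numerically in a zero-dimensional toy model, where the field is a single number and K is a positive constant. The sketch below makes those assumptions explicitly (all function names and parameter values are illustrative): it solves the analogue of (10.131) for φcl(j), builds Wcl(j) as in (10.133), inverts dWcl/dj = φ by bisection, and forms Γ = −W + jφ as in (10.140).

```python
def action(phi, K=1.0, lam=1.0):
    """Zero-dimensional action S(phi) = K*phi^2/2 + lam*phi^4/4!."""
    return 0.5 * K * phi**2 + lam * phi**4 / 24.0


def classical_field(j, K=1.0, lam=1.0):
    """Solve K*phi + (lam/3!)*phi^3 = j by Newton iteration (cf. (10.131))."""
    phi = j / K
    for _ in range(100):
        residual = K * phi + lam * phi**3 / 6.0 - j
        phi -= residual / (K + lam * phi**2 / 2.0)
    return phi


def w_classical(j, K=1.0, lam=1.0):
    """W_cl(j) = -(S(phi_cl) - j*phi_cl), the analogue of (10.133)."""
    phi = classical_field(j, K, lam)
    return -(action(phi, K, lam) - j * phi)


def dw_dj(j, K=1.0, lam=1.0, h=1e-5):
    """Central-difference derivative of W_cl with respect to the source."""
    return (w_classical(j + h, K, lam) - w_classical(j - h, K, lam)) / (2 * h)


def gamma_classical(phi, K=1.0, lam=1.0):
    """Legendre transform (10.140): find j with dW/dj = phi, return -W + j*phi."""
    lo, hi = -100.0, 100.0      # bracket for the source; dW/dj is increasing in j
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if dw_dj(mid, K, lam) < phi:
            lo = mid
        else:
            hi = mid
    j = 0.5 * (lo + hi)
    return -w_classical(j, K, lam) + j * phi


phi = 0.7
print(gamma_classical(phi), action(phi))
```

The two printed numbers agree to within the accuracy of the finite-difference derivative, the toy counterpart of (10.144); the same script also confirms that dWcl/dj reproduces φcl, as in (10.135).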

To go beyond the classical limit (i.e., the tree diagrams of the theory) we need
to look a little more carefully at the structure of the effective action functional Γ[φ].
A convenient way to do this is to start with the functional W[j], as an expansion in
powers of the source j, and carry out the Legendre transformation to Γ[φ] order by
order in this formal expansion. We shall now revert to the usual procedure followed
through the rest of the book, and reinstate natural units ħ = 1, keeping in mind that
an expansion in explicit powers of Planck’s constant amounts to nothing more than a
reorganization of the perturbation theory according to the number of loops. Moreover,
to keep the resultant formulas from expanding to intolerable lengths on the printed
page, we shall assume that spacetime has been discretized and work in a purely discrete
framework, in which spacetime points x are replaced by indices i (i = 1, 2, ..., N, where
the spacetime lattice $x_i$ has N points), and sources and fields are localized by attaching
the relevant index: $J_i \equiv j(x_i)$ or $\phi_j \equiv \phi(x_j)$, for example. Integrals over spacetime will
be replaced by summations over (typically) repeated indices, and operators K (resp.
Green functions $\Delta_E(x, y)$, $G^{(3)}(x, y, z)$, etc.) by appropriately multi-indexed objects
$K_{ij}$ (resp. $\Delta_{ij}$, $G^{(3)}_{ijk}$, etc.). Connected contributions are readily identified as terms which
cannot be algebraically factored into two or more parts involving non-overlapping sets
of summed indices. We also allow for polynomial interactions P(φ) including both
even and odd powers of the field φ, so that amplitudes $G^{(n)}$ are in general non-
vanishing, even for odd n. By definition, W[j] is the generating functional of the
n-point connected amplitudes $G^{(n)}$, so in this discrete notation we have simply
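$$
W[J] = \frac{1}{2} J_i \hat\Delta_{ij} J_j + \frac{1}{3!} G^{(3)}_{ijk} J_i J_j J_k + \frac{1}{4!} G^{(4)}_{ijkl} J_i J_j J_k J_l + \ldots \tag{10.148}
$$

whence

$$
\phi_i = \frac{\partial W}{\partial J_i} = \hat\Delta_{ij} J_j + \frac{1}{2} G^{(3)}_{ijk} J_j J_k + \frac{1}{3!} G^{(4)}_{ijkl} J_j J_k J_l + O(J^4) \tag{10.149}
$$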

Note that here Δ̂ij is the discrete form of the full Euclidean propagator Δ̂E , as W [j]
generates the complete interacting connected Green functions of the theory, and K̂ will
now be defined as the (discrete) inverse of this full two-point function (which of course,
in the free field limit, reduces to a discretized version of the Euclidean Klein–Gordon
operator $K = -\Box + m^2$).

The last relation can now be inverted, order by order in increasing powers of φ
(which is of order J, by (10.149)), to yield Ji as a function of φi . Through terms of
order φ3 we find (see Problem 9), using the shorthand (K̂φ)i ≡ K̂ij φj ,

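$$\begin{aligned}
J_i ={}& (\hat{K}\phi)_i - \frac{1}{2} \hat{K}_{ij} G^{(3)}_{jkl} (\hat{K}\phi)_k (\hat{K}\phi)_l \\
& + \frac{1}{2} \hat{K}_{ij_1} G^{(3)}_{j_1 k_1 l_1} (\hat{K}\phi)_{l_1} \hat{K}_{k_1 k_2} G^{(3)}_{k_2 l_2 j_2} (\hat{K}\phi)_{l_2} (\hat{K}\phi)_{j_2} \\
& - \frac{1}{3!} \hat{K}_{ij} G^{(4)}_{jklm} (\hat{K}\phi)_k (\hat{K}\phi)_l (\hat{K}\phi)_m + O(\phi^4)
\end{aligned} \tag{10.150}$$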
Reinserting this result in the expansion for W [J], (10.148), we find after some
straightforward algebra (Problem 10), the effective action through terms of order φ4 :

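$$\begin{aligned}
\Gamma[\phi] ={}& -W[J(\phi)] + J_i \phi_i \\
={}& \frac{1}{2} \phi_i \hat{K}_{ij} \phi_j - \frac{1}{3!} G^{(3)}_{ijk} (\hat{K}\phi)_i (\hat{K}\phi)_j (\hat{K}\phi)_k \\
& - \frac{1}{4!} \Big\{ G^{(4)}_{ijkl} (\hat{K}\phi)_i (\hat{K}\phi)_j (\hat{K}\phi)_k (\hat{K}\phi)_l \\
& \qquad - 3\, G^{(3)}_{i_1 j_1 k_1} (\hat{K}\phi)_{j_1} (\hat{K}\phi)_{k_1} \hat{K}_{i_1 i_2} G^{(3)}_{i_2 j_2 k_2} (\hat{K}\phi)_{j_2} (\hat{K}\phi)_{k_2} \Big\} + O(\phi^5)
\end{aligned} \tag{10.151}$$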
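As a quick sanity check on (10.150) and (10.151), the Legendre transformation can be carried out symbolically for a single lattice site (N = 1), where all indices drop out. The following is only a sketch under that assumption: the symbol names (`Delta`, `g3`, `g4`) and the helper `truncate` are illustrative choices, not notation from the text.

```python
import sympy as sp

# Single-site (N = 1) stand-ins: Delta for the full propagator,
# g3 and g4 for the connected amplitudes G^(3) and G^(4).
phi, J, Delta, g3, g4 = sp.symbols("phi J Delta g3 g4")
K = 1 / Delta  # inverse full propagator


def truncate(expr, order):
    """Keep only the terms of degree <= order in phi."""
    expanded = sp.expand(expr)
    return sum(expanded.coeff(phi, n) * phi**n for n in range(order + 1))


# W[J] through fourth order in the source, as in (10.148).
W = Delta * J**2 / 2 + g3 * J**3 / 6 + g4 * J**4 / 24

# J as a function of phi through third order, as in (10.150).
u = K * phi
J_of_phi = u - K * g3 * u**2 / 2 + K**2 * g3**2 * u**3 / 2 - K * g4 * u**3 / 6

# Consistency of the inversion: dW/dJ at J(phi) must return phi + O(phi^4).
inversion_residual = truncate(sp.diff(W, J).subs(J, J_of_phi) - phi, 3)

# Gamma = -W[J(phi)] + J*phi truncated at fourth order, compared with (10.151).
gamma = truncate(J_of_phi * phi - W.subs(J, J_of_phi), 4)
gamma_expected = (K * phi**2 / 2 - g3 * u**3 / 6
                  - (g4 * u**4 - 3 * g3**2 * K * u**4) / 24)

print(sp.simplify(inversion_residual))      # vanishes if (10.150) holds
print(sp.simplify(gamma - gamma_expected))  # vanishes if (10.151) holds
```

The single-site case cannot test the index structure, but it does confirm the signs and the combinatoric factors 1/2, 1/3!, 1/4! and 3.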

We observe that the individual terms appearing in Γ[φ] are again connected, consisting
of the connected G(n) amplitudes singly, or tied together by K̂ operators, which
remove a single internal (full) propagator when two separate G(n) are connected
(so as to avoid doubling the connecting Feynman propagator, as the G(n) are not
amputated). Moreover, the inverse (full) propagators K̂ attached to all external legs
(i.e., to factors of the external field φ) ensure the removal of all external legs, so
that Γ[φ] generates amputated diagrams—exactly the amplitudes leading to a smooth
perturbative evaluation of the LSZ formula, as discussed previously. If we define a
general n-point proper vertex Γ(n) as the n-th derivative at zero field of Γ[φ] (with a
minus included to take care of the sign change in going from W to Γ):

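$$\Gamma^{(n)}_{i_1 i_2 \ldots i_n} \equiv -\left.\frac{\partial^n \Gamma[\phi]}{\partial\phi_{i_1} \partial\phi_{i_2} \cdots \partial\phi_{i_n}}\right|_{\phi=0} \tag{10.152}$$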

we see that the two-point proper vertex is just $\Gamma^{(2)}_{ij} = -\hat{K}_{ij}$, i.e., (minus) the inverse
full propagator. Similarly, $\Gamma^{(3)}_{ijk}$, the three-point proper vertex, is just the amputated
three-point connected Green function

$$\Gamma^{(3)}_{ijk} = \hat{K}_{ii'} \hat{K}_{jj'} \hat{K}_{kk'} G^{(3)}_{i'j'k'} \tag{10.153}$$

which, for obvious reasons, is automatically 1PI. At order φ4 a new feature appears,
as the proper vertex is given by a combination of terms, specifically, the full connected
four-point Green function G(4) , with the four external legs amputated (by the K̂
factors), minus three other terms which are clearly one-particle-reducible in character:

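$$\begin{aligned}
\Gamma^{(4)}_{ijkl} ={}& \hat{K}_{ii'} \hat{K}_{jj'} \hat{K}_{kk'} \hat{K}_{ll'} G^{(4)}_{i'j'k'l'} - \hat{K}_{ii'} \hat{K}_{kk'} G^{(3)}_{mi'k'} \hat{K}_{mn} G^{(3)}_{nj'l'} \hat{K}_{jj'} \hat{K}_{ll'} \\
& - \hat{K}_{ii'} \hat{K}_{jj'} G^{(3)}_{mi'j'} \hat{K}_{mn} G^{(3)}_{nk'l'} \hat{K}_{kk'} \hat{K}_{ll'} - \hat{K}_{ii'} \hat{K}_{ll'} G^{(3)}_{mi'l'} \hat{K}_{mn} G^{(3)}_{nj'k'} \hat{K}_{jj'} \hat{K}_{kk'}
\end{aligned} \tag{10.154}$$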

A glance at Fig. 10.9, which shows a decomposition of the amputated G(4) amplitude
into parts built from 1PI amplitudes (connected with full propagators), clarifies the
meaning of the subtracted terms in (10.154): they serve to precisely remove from the
full connected four-point amplitude its one-particle-reducible pieces. In other words,
the first term on the right-hand side of Fig. 10.9, consisting of the 1PI four-point
contributions to the full connected four-point amplitude, is identical to $\Gamma^{(4)}_{ijkl}$. So,
at least through vertices with four external legs, the effective action constructed by
Legendre transformation does indeed seem to generate the amputated one-particle-
irreducible diagrams of the theory, and only such diagrams. Of course, it is hardly
obvious from the above that this remains true for n-point amplitudes with arbitrarily
large n.
A simple proof that the Legendre transform Γ[φ] indeed generates all the one-
particle-irreducible, and only the one-particle-irreducible, graphs for arbitrary pow-
ers of φ can be given using a trick described by Zinn–Justin (Zinn–Justin, 1989).
The individual contributions on the right-hand side of (10.154) are recognized as
corresponding to connected graphs in the sense that any external index (like i) can
be connected to any other (like l) by a continuous sequence of indices connected by
the multi-index connected Green functions K̂, G(3) , G(4) , etc. For example, taking the
second term on the right-hand side of (10.154), corresponding to the second graph on
the right-hand side in Fig. 10.8, the index i can be connected to l by the sequence
$i \to i' \to m \to n \to l' \to l$, passing sequentially through $\hat{K}_{ii'}$, $G^{(3)}_{mi'k'}$, $\hat{K}_{mn}$, $G^{(3)}_{nj'l'}$, and
$\hat{K}_{ll'}$. We can investigate the effect of cutting internal lines on the graphs generated
by Γ[φ] by introducing a separable perturbation of the Klein–Gordon operator in the
quadratic part of the action, as follows:

Fig. 10.9 Decomposition of $G^{(4)}_{\mathrm{amp}\,ijkl}$ in terms of 1PI vertices: the 1PI four-point vertex plus three terms in which two 1PI three-point vertices are joined by a single full propagator, one for each pairing of the external legs; crossbars indicate that the external legs are “amputated”.
$$S_\epsilon[\phi] = S[\phi] + \frac{\epsilon}{2} \sum_{i,j} \phi_i \phi_j = S[\phi] + \frac{\epsilon}{2} \Big(\sum_i \phi_i\Big)^2 \tag{10.155}$$

The quadratic part of the discretized action is now

$$\frac{1}{2} \sum_{i,j} \phi_i (K_{ij} + \epsilon M_{ij}) \phi_j, \qquad M_{ij} = 1, \ \forall i, j \tag{10.156}$$

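with a free propagator $\Delta_\epsilon$ given by the inverse of $K + \epsilon M$:

$$(\Delta_\epsilon)_{ij} = \Delta_{ij} - \epsilon\, v_i v_j + O(\epsilon^2), \qquad v_i \equiv \sum_j \Delta_{ij} \tag{10.157}$$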
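The first-order formula (10.157) is easy to check numerically. The sketch below assumes a random symmetric positive-definite matrix as a stand-in for the discretized K (the lattice size, the seed, and the values of ε are arbitrary illustrative choices) and compares the exact inverse of K + εM with Δ − ε v vᵀ. If the formula is right, the residual should shrink like ε².

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6                                   # number of lattice points (illustrative)
A = rng.normal(size=(N, N))
K = A @ A.T + N * np.eye(N)             # symmetric positive-definite stand-in for K
M = np.ones((N, N))                     # M_ij = 1 for all i, j, as in (10.156)

Delta = np.linalg.inv(K)                # unperturbed propagator
v = Delta.sum(axis=1)                   # v_i = sum_j Delta_ij


def first_order_error(eps):
    """Largest entry of |(K + eps*M)^{-1} - (Delta - eps * v v^T)|."""
    exact = np.linalg.inv(K + eps * M)
    approx = Delta - eps * np.outer(v, v)
    return np.abs(exact - approx).max()


err_small = first_order_error(1e-3)
err_large = first_order_error(1e-2)
# A ratio near 100 for a tenfold change in eps signals an O(eps^2) remainder.
print(err_small, err_large, err_large / err_small)
```

Because εM is a rank-one perturbation, the exact inverse is also available in closed form from the Sherman–Morrison identity, Δ − ε v vᵀ/(1 + ε Σᵢ vᵢ), which reduces to (10.157) at first order in ε.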

The contributions of first order in the perturbing parameter $\epsilon$ evidently correspond to
the replacement of a single free propagator $\Delta_{ij}$ by the separable piece $v_i v_j$ for every
possible internal line,21 and would therefore necessarily contain disconnected graphs
unless the initial amplitude was itself 1PI. For non-zero $\epsilon$, the discretized generating
functional Z[J] becomes, to first order in $\epsilon$,

$$Z_\epsilon[J] = \int \prod_i d\phi_i \, e^{-S[\phi] - \frac{\epsilon}{2} (\sum_i \phi_i)^2 + J_i \phi_i} = \Big(1 - \frac{\epsilon}{2} \sum_{i,j} \frac{\partial^2}{\partial J_i \partial J_j}\Big) Z[J] \tag{10.158}$$

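and its logarithm $W_\epsilon[J]$ is given to the same order by

$$\begin{aligned}
W_\epsilon[J] &= W[J] - \frac{\epsilon}{2} \frac{1}{Z[J]} \sum_{i,j} \frac{\partial^2 Z[J]}{\partial J_i \partial J_j} \\
&= W[J] - \frac{\epsilon}{2} \sum_{i,j} \Big( \frac{\partial^2 W[J]}{\partial J_i \partial J_j} + \frac{\partial W[J]}{\partial J_i} \frac{\partial W[J]}{\partial J_j} \Big)
\end{aligned} \tag{10.159}$$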

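Recall here that in going over from the external source $J_i$ to the classical field $\phi_i$, we
have $\partial W[J]/\partial J_i = \phi_i$, while $\partial^2 W[J]/\partial J_i \partial J_j$ is the inverse of the matrix $\partial^2 \Gamma[\phi]/\partial\phi_i \partial\phi_j$ (cf. (10.142) and
(10.143)). Now, the chain rule formula

$$\left.\frac{\partial}{\partial\epsilon}\right|_{\phi_i} = \left.\frac{\partial}{\partial\epsilon}\right|_{J_i} + \left.\frac{\partial J_i}{\partial\epsilon}\right|_{\phi_i} \frac{\partial}{\partial J_i} \tag{10.160}$$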

applied to the Legendre relation for the perturbed effective action $\Gamma_\epsilon[\phi]$

$$\Gamma_\epsilon[\phi] + W_\epsilon[J] - J_j \phi_j = 0 \tag{10.161}$$

21 As Γ[φ] generates amputated graphs, the only lines present are internal lines!
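gives

$$\begin{aligned}
0 &= \left.\frac{\partial \Gamma_\epsilon[\phi]}{\partial\epsilon}\right|_{\phi_i} + \left.\frac{\partial W_\epsilon[J]}{\partial\epsilon}\right|_{J_i} + \left.\frac{\partial J_i}{\partial\epsilon}\right|_{\phi_i} \frac{\partial}{\partial J_i} \big( W_\epsilon[J] - J_j \phi_j \big) \\
&= \left.\frac{\partial \Gamma_\epsilon[\phi]}{\partial\epsilon}\right|_{\phi_i} + \left.\frac{\partial W_\epsilon[J]}{\partial\epsilon}\right|_{J_i} + \left.\frac{\partial J_i}{\partial\epsilon}\right|_{\phi_i} (\phi_i - \phi_i) = \left.\frac{\partial \Gamma_\epsilon[\phi]}{\partial\epsilon}\right|_{\phi_i} + \left.\frac{\partial W_\epsilon[J]}{\partial\epsilon}\right|_{J_i}
\end{aligned}$$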

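so that the first-order shift in the effective action $\Gamma_\epsilon[\phi]$ is just minus that in $W_\epsilon[J]$,
whence, from (10.159),

$$\Gamma_\epsilon[\phi] = \Gamma[\phi] + \frac{\epsilon}{2} \Big(\sum_i \phi_i\Big)^2 + \frac{\epsilon}{2} \sum_{i,j} \Big( \frac{\partial^2 \Gamma[\phi]}{\partial\phi_i \partial\phi_j} \Big)^{-1} \tag{10.162}$$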

using the fact that the second derivative matrices of Γ[φ] and W [J] are inverses.
The second term on the right-hand side of (10.162) is just the perturbation originally
introduced in the action (10.155), which must, of course, appear in the $O(\hbar^0)$ (classical)
part of the full effective action Γ [φ]. It generates the expected disconnected part of
the two-point function (inverse full propagator). The third term contains the effect of
disconnecting a single internal line in all the higher-order vertices, and a little thought
shows that it consists entirely of connected diagrams. Indeed, as a function of the $J_i$
sources, it expands into obviously connected graphs, as it is just $\sum_{i,j} \partial^2 W/\partial J_i \partial J_j$. But when
each of the $J_i = \partial\Gamma[\phi]/\partial\phi_i$ is inserted as a function of the $\phi_i$ into the latter expression, it
simply expands the previous connected graph (from W [J]) by a connected extension,
so even when expanded as a function of the φi only connected diagrams are obtained.
Note that the combinatoric factors relating different 1PI contributions are preserved
in Γ[φ]: the process of going from W to Γ involves (as we see from (10.154), and in Fig.
10.9) simply (a) amputating external lines, and (b) removing en bloc any one-particle-
reducible graphs. This concludes the proof that the graphs generated by Γ[φ] remain
connected even when any single internal line is cut, and hence must correspond exactly
to the amputated 1PI graphs appearing in the connected n-point functions generated
by W [j].
Historically, the concept of one-particle-irreducibility was first introduced by Dyson
(Dyson, 1949) in the context of the three-point function (electron–electron–photon
vertex) in quantum electrodynamics, where the 1PI graphs contributing to this three-
point function were referred to as the “proper vertex part”. As we shall see later
in Part 4, the systematic discussion of renormalizability pioneered by Dyson makes
critical use of the fact that the ultraviolet sensitivity of the theory (in other words,
the divergence structure of the loop integrals appearing in a general graph) can be
fully analysed in terms of 1PI diagrams, as loop integrals in separate 1PI pieces of a
larger graph are algebraically decoupled from one another.
A more systematic treatment of n-particle irreducibility was carried out by
Symanzik a decade later (Symanzik, 1960), emphasizing the fact that the study of
the singularity structure of amplitudes, as a function of the external momenta, was
intimately related to the ability to decompose the contributing graphs by cutting
one, two, or more internal lines. For example, the only amputated graphs capable of

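Fig. 10.10 A one-particle-reducible contribution to the 6-point function $G^{(6)}(p_1, p_2, \ldots, p_6)$: two 1PI four-point vertices, with external momenta $p_1, p_2, p_3$ and $p_4, p_5, p_6$, joined by a single internal line carrying the propagator $1/((p_1 + p_2 + p_3)^2 - m^2 + i\epsilon)$.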


producing single-particle pole singularities of the form $1/((\sum p_i)^2 - m^2)$, where the
summed momenta $p_i$ represent some non-trivial subset of the (appropriately signed)
external momenta of the graph, involve one-particle-reducible diagrams such as the
one shown in Fig. 10.10 in $\phi^4$ theory (we are back in Minkowski space, hence the
$i\epsilon$ in the propagator). The 1PI four-point vertices on either side of the central line
cannot contain any single-particle poles, which arise only when single lines connect
separate connected parts of the graph. Similarly, the ability to decompose a graph
into two pieces by cutting two internal lines, as in Fig. 10.11, implies a branch-point
structure in the variable $t \equiv (p_1 - p_2)^2$ when the two-particle threshold is reached
(at $t = 4m^2$).22 Correspondingly, such two-particle thresholds are absent from 2PI
(or two-particle-irreducible) amplitudes, which can only be disconnected by cutting
at least three internal lines of the graphs contributing to such amplitudes, which are
commonly referred to as “Bethe–Salpeter kernels”. They play, as we shall see in the
next chapter, an important role in understanding the physics of threshold bound

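Fig. 10.11 A two-particle-reducible contribution to the four-point function $G^{(4)}(p_1, p_2, p_3, p_4)$: two 2PI kernels, with external momenta $p_1, p_2$ on one and $p_3, p_4$ on the other, joined by two internal lines.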

22 The reader may recall that exactly such branch-points appeared in our discussion of the Lehmann
representation for the full propagator in momentum space in Section 9.5, with new cuts appearing precisely
at squared-momentum values $w \equiv p^2$ at which new intermediate multi-particle states become kinematically
possible.

states in quantum field theory. Finally, with the recognition of the importance of
spontaneous symmetry breaking in quantum field theory in the 1960s, the central role
of the effective action in understanding the energetics of broken symmetry became
clear, starting with the seminal paper of Jona-Lasinio (Jona-Lasinio, 1964)—a topic
to which we return in Chapter 14.

10.5 How to stop worrying about Haag’s theorem


We have already indicated on numerous occasions, without providing specific justi-
fication, that there are difficulties in the implementation of an interaction picture
in the case of continuum field theories, which can be circumvented by a temporary
full regularization of the theory (i.e., by introduction of both large-distance (IR) and
small-distance (UV) cutoffs) which reduces the number of independent dynamical
variables to a finite number—for example, the fields, and their time-derivatives (which
play the role of conjugate momenta)—on a finite number of spacetime points. The
price one pays is, of course, the loss (one hopes, temporary) of the full continuous
Poincaré symmetry of the theory. The unpleasant fact (now commonly referred to
as “Haag’s theorem” (Haag, 1955)) that the formulation of an interaction picture
of time-development in an infinite-volume continuum field theory is mathematically
untenable has confused many generations of students of quantum field theory, who
look quite naturally to the spectacular successes of renormalized perturbation theory
in quantum electrodynamics (all based on interaction-picture formulas) and ask,
“Where’s the problem?” In this section we shall attempt to allay the understandable
fear that perturbative methods, and the Feynman graph approach described earlier
in this chapter, are founded on mathematical quicksand, and therefore in some sense
unreliable despite their obvious empirical success over the years.
We begin with a very simple example, found already in Haag’s seminal paper
(Haag, 1955), which nevertheless has all the essential features of the general case, but
allows explicit calculation and a concrete display of the source of the difficulty. We shall
start with a free scalar field of mass m1 and introduce a perturbation of an extremely
innocent kind—a shift to a free scalar field of mass m2 . Thus the Hamiltonian is
(cf. (6.89)):

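$$H = H_0 + V \tag{10.163}$$

$$H_0 = \int d^3x\, \mathcal{H}_0(\mathbf{x}, t) = \int d^3x : \Big\{ \frac{\dot{\phi}^2}{2} + \frac{1}{2} |\vec{\nabla}\phi|^2 + \frac{1}{2} m_1^2 \phi^2 \Big\} : \tag{10.164}$$

$$V = \int d^3x\, \mathcal{H}_{\mathrm{int}}(\mathbf{x}, t) = \int d^3x\, \frac{1}{2} \delta m^2 : \phi^2 :, \qquad \delta m^2 \equiv m_2^2 - m_1^2 \tag{10.165}$$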
Thus, our unperturbed field is a free field φ1 (x) of mass m1 , while our full “interacting”
Heisenberg field is likewise a free scalar field φ2 (x), but of mass m2 . We should hardly
expect to encounter difficulties with the interaction picture in so simple a case, but as
we shall soon see, appearances in these matters are definitely deceiving!
On the basis of our treatment of the interaction picture in Section 4.3, we would
expect that the ground states of the two Hamiltonians $H_0$ (i.e., the vacuum $|0\rangle_1$ for
scalar particles of mass $m_1$) and $H$ (the vacuum $|0\rangle_2$ for particles of mass $m_2$) would be
straightforwardly related by a unitary operator. For example, choosing the in-vacuum
as the ground state $|0\rangle_2$ for $H$, we have (cf. (4.156))

$$|0\rangle_2 = \Omega^{-} |0\rangle_1 \tag{10.166}$$

with $\Omega^{-}$ the unitary Møller wave operator defined in Section 4.3. Of course, $|0\rangle_1$ is
the Fock vacuum with respect to the destruction operators obtained in the usual way
from $\phi_1(x)$:

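$$a_1(\mathbf{k}) |0\rangle_1 = 0, \quad \forall \mathbf{k} \tag{10.167}$$

$$\begin{aligned}
a_1(\mathbf{k}) &= \frac{i}{\sqrt{(2\pi)^3\, 2E_1(k)}} \int d^3x\, e^{ik\cdot x}\, \frac{\overleftrightarrow{\partial}}{\partial t}\, \phi_1(\mathbf{x}, t) \\
&= \frac{i}{\sqrt{(2\pi)^3\, 2E_1(k)}} \int d^3x\, e^{-i\mathbf{k}\cdot\mathbf{x}} \big\{ \dot{\phi}_1(\mathbf{x}, 0) - iE_1(k)\, \phi_1(\mathbf{x}, 0) \big\}
\end{aligned} \tag{10.168}$$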

The formula for the destruction operator in terms of the field is time-independent,
so we have chosen to set t = 0 in the last line. We recall from Chapter 4 that, by
convention, the various pictures of time-development in quantum theory (interaction,
Schrödinger, Heisenberg) are presumed to coincide at time t = 0. In particular, our
interaction-picture field φ1 (x, t) and “Heisenberg” field φ2 (x, t) must coincide at t = 0,
as must (cf. (9.40)) their respective time-derivatives:

φ1 (x, 0) = φ2 (x, 0) (10.169)


φ̇1 (x, 0) = φ̇2 (x, 0) (10.170)

where our “Heisenberg” field φ2 (x) is expressed in the usual way in terms of the
destruction and creation operators a2 (k), a†2 (k) appropriate for particles of mass m2 :

$$\phi_2(x) = \int \frac{d^3q}{\sqrt{2E_2(q)(2\pi)^3}} \big( a_2(\mathbf{q}) e^{-iq\cdot x} + a_2^\dagger(\mathbf{q}) e^{iq\cdot x} \big) \tag{10.171}$$

Using the identity relations (10.169, 10.170), inserting (10.171) into (10.168), and
carrying out the x and q integrations, we find the connection between the creation–
destruction operators for the particles of mass m1 and m2 :

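$$a_1(\mathbf{k}) = \alpha(k)\, a_2(\mathbf{k}) + \beta(k)\, a_2^\dagger(-\mathbf{k}) \tag{10.172}$$

$$\alpha(k) = \frac{1}{2} \Big( \sqrt{\frac{E_1(k)}{E_2(k)}} + \sqrt{\frac{E_2(k)}{E_1(k)}} \Big) \tag{10.173}$$

$$\beta(k) = \frac{1}{2} \Big( \sqrt{\frac{E_1(k)}{E_2(k)}} - \sqrt{\frac{E_2(k)}{E_1(k)}} \Big) \tag{10.174}$$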

Note that the commutation algebra of the creation–destruction operators is preserved
by this transformation (commonly referred to as a Bogoliubov transformation), as a
consequence of $\alpha(k)^2 - \beta(k)^2 = 1$; in particular,

$$[a_1(\mathbf{k}), a_1^\dagger(\mathbf{k}')] = [a_2(\mathbf{k}), a_2^\dagger(\mathbf{k}')] = \delta^3(\mathbf{k} - \mathbf{k}') \tag{10.175}$$
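The algebraic content of (10.172)–(10.175) can be spot-checked numerically. The following sketch assumes the coefficients α(k) and β(k) of (10.173, 10.174) with arbitrary illustrative masses; the function names are hypothetical. It verifies that α² − β² = 1 over a range of momenta, and that β(k) approaches (m₁² − m₂²)/(4k²) at large k, as follows from expanding the energies for k much larger than both masses.

```python
import numpy as np


def energy(k, m):
    """Relativistic energy E(k) = sqrt(k^2 + m^2)."""
    return np.sqrt(k**2 + m**2)


def bogoliubov(k, m1, m2):
    """Coefficients alpha(k), beta(k) of (10.173) and (10.174)."""
    r = np.sqrt(energy(k, m1) / energy(k, m2))
    return 0.5 * (r + 1.0 / r), 0.5 * (r - 1.0 / r)


m1, m2 = 1.0, 2.0                      # illustrative masses
k = np.linspace(0.0, 50.0, 501)
alpha, beta = bogoliubov(k, m1, m2)

# Canonical commutators are preserved when alpha^2 - beta^2 = 1.
canonical_defect = np.abs(alpha**2 - beta**2 - 1.0).max()

# Compare beta at the largest momentum with its leading large-k form.
large_k_ratio = beta[-1] / ((m1**2 - m2**2) / (4.0 * k[-1] ** 2))

print(canonical_defect)   # should be at rounding level
print(large_k_ratio)      # should approach 1 as k grows
```

For equal masses the same function returns β = 0 and α = 1, so the transformation reduces to the identity.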

In a system with a finite number of degrees of freedom (for example, if the system were
quantized on a discrete spacetime lattice with a finite number of points, with discrete
values for the spatial momenta k), standard results going back to von Neumann ensure
the existence of a well-defined unitary transformation relating the a1 (k) and a2 (k). If
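the vacua $|0\rangle_1$, $|0\rangle_2$ (satisfying respectively $a_1(\mathbf{k})|0\rangle_1 = 0$, $a_2(\mathbf{k})|0\rangle_2 = 0$, $\forall \mathbf{k}$) are indeed
related by a proper unitary transformation, as in (10.166), we should expect to be able
to expand the $|0\rangle_1$ vacuum in terms of Fock states for particle 2, and obtain thereby
a unit-normalized state:

$$|0\rangle_1 = g_0 |0\rangle_2 + \int d^3k_1\, g_1(\mathbf{k}_1) |\mathbf{k}_1\rangle_2 + \frac{1}{2} \int d^3k_1\, d^3k_2\, g_2(\mathbf{k}_1, \mathbf{k}_2) |\mathbf{k}_1, \mathbf{k}_2\rangle_2 + \ldots \tag{10.176}$$

with $g_0 = {}_2\langle 0|0\rangle_1$ and

$${}_1\langle 0|0\rangle_1 = 1 = |g_0|^2 + \int d^3k_1\, |g_1(\mathbf{k}_1)|^2 + \frac{1}{2} \int d^3k_1\, d^3k_2\, |g_2(\mathbf{k}_1, \mathbf{k}_2)|^2 + \ldots \tag{10.177}$$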
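On the other hand, from (10.172), we know that

$$\alpha(k)\, a_2(\mathbf{k}) |0\rangle_1 = -\beta(k)\, a_2^\dagger(-\mathbf{k}) |0\rangle_1 \tag{10.178}$$

If we successively take the overlap of (10.178) with ${}_2\langle 0|$, ${}_2\langle \mathbf{k}_1|$, ${}_2\langle \mathbf{k}_1 \mathbf{k}_2|$, ..., we find easily,
using the expansion (10.176),

$$g_1(\mathbf{k}) = 0 \tag{10.179}$$

$$g_2(\mathbf{k}_1, \mathbf{k}_2) = -\frac{\beta(k)}{\alpha(k)}\, \delta^3(\mathbf{k}_1 + \mathbf{k}_2)\, {}_2\langle 0|0\rangle_1 \tag{10.180}$$

$$g_3(\mathbf{k}_1, \mathbf{k}_2, \mathbf{k}_3) = 0, \ \text{and so on} \tag{10.181}$$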

Inserting the result (10.180) into (10.177) however, the integral diverges, as it involves
the square of a δ-function. As $\beta(k) \neq 0$ if $m_1 \neq m_2$, the finite norm of $|0\rangle_1$ therefore
requires the overlap ${}_2\langle 0|0\rangle_1 = g_0$ to vanish. The recursion implied by (10.178) then
leads to the conclusion that $g_2 = g_4 = g_6 = g_{2n} = 0$. As the states with odd numbers
of particles do not appear, we have arrived at an evident contradiction: ${}_1\langle 0|0\rangle_1 = 0$! In
fact, the argument extends to arbitrary multi-particle states of particle 1, which all
have vanishing overlap with those of particle 2. It is apparent that the Fock spaces
of scalar particles of differing mass cannot be consistently incorporated within the
same separable Hilbert space, allowing the desired unitary transformation between
corresponding states. This result is evidently a direct consequence of the continuum
normalization of the two-particle states, which in turn can be traced back to the fact
that we have quantized our system in an infinite spatial box. Our problems in this
particular example arise from the fact that we cannot obtain a properly normalized
discrete vacuum state by mixing another such state with multi-particle states built
from pairs of continuum-normalized opposite-momentum particles.

We may attempt to circumvent the above difficulties and re-establish a consistent


treatment of both the “free” (particle 1) and “interacting” (particle 2) Fock spaces
within a single Hilbert space by quantizing the system in a cubical box of finite
spatial volume V = L3 (with periodic boundary conditions), so that the allowed spatial
momenta are now discrete, $k_i = 2\pi n_i/L$, $n_i$ integer, $i = 1, 2, 3$, and the corresponding
multi-particle states discretely normalized, like the vacuum state from which they arise by
application of creation operators. To distinguish discrete from continuous momenta,
we shall indicate the former by subscripts (rather than arguments) in the following.
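Thus, the scalar field $\phi_1$ now takes the form

$$\phi_1(x) = V^{-1/2} \sum_{\mathbf{k}} \frac{1}{\sqrt{2E_{1k}}} \big( a_{1\mathbf{k}} e^{-ik\cdot x} + a_{1\mathbf{k}}^\dagger e^{ik\cdot x} \big) \tag{10.182}$$

with a similar equation for $\phi_2$. The infinite volume limit is obtained from (10.182) with
the usual transcription $\frac{1}{V} \sum_{\mathbf{k}} \to \int \frac{d^3k}{(2\pi)^3}$ and the normalization $a_{1\mathbf{k}} \to \frac{(2\pi)^{3/2}}{V^{1/2}} a_1(\mathbf{k})$ (so
that $[a_{1\mathbf{k}}, a_{1\mathbf{k}'}^\dagger] = \delta_{\mathbf{k}\mathbf{k}'}$). The discrete analog of (10.172–10.174) is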

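$$a_{1\mathbf{k}} = \alpha_k\, a_{2\mathbf{k}} + \beta_k\, a_{2\,-\mathbf{k}}^\dagger \tag{10.183}$$

$$\alpha_k = \frac{1}{2} \Big( \sqrt{\frac{E_{1k}}{E_{2k}}} + \sqrt{\frac{E_{2k}}{E_{1k}}} \Big) \tag{10.184}$$

$$\beta_k = \frac{1}{2} \Big( \sqrt{\frac{E_{1k}}{E_{2k}}} - \sqrt{\frac{E_{2k}}{E_{1k}}} \Big) \tag{10.185}$$

with $E_{ik} = \sqrt{k^2 + m_i^2}$ as before, but with k discrete. Recognizing that the expansion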
(10.176) involves only pairs of opposite momentum particles k, −k (which can appear
multiply for each discrete k), we divide the set of all pairs into a set of independent
pairs k, −k with k in the right hemisphere kx > 0 in order to avoid double-counting.
The Bogoliubov transformation can be carried out separately on each such pair as the
creation–annihilation operators for distinct k commute. Focussing our attention on a
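specific discrete k, we can expand the $|0\rangle_1$ vacuum as follows:

$$|0\rangle_1 = \sum_N c_N |N\rangle_2, \qquad |N\rangle_2 \equiv \frac{1}{N!} \big( a_{2\mathbf{k}}^\dagger a_{2\,-\mathbf{k}}^\dagger \big)^N |0\rangle_2, \qquad {}_2\langle N|N\rangle_2 = 1 \tag{10.186}$$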

The requirement $a_{1\mathbf{k}} |0\rangle_1 = 0$, using (10.183) then leads directly to the simple recursion
relation (see Problem 11):

$$c_n = -\frac{\beta_k}{\alpha_k}\, c_{n-1} \ \Rightarrow\ c_N = \Big( -\frac{\beta_k}{\alpha_k} \Big)^N c_0 \tag{10.187}$$

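The normalization condition ${}_1\langle 0|0\rangle_1 = 1 = \sum_N |c_N|^2 = |c_0|^2 \frac{1}{1 - \beta_k^2/\alpha_k^2}$ thus gives

$$|c_0|^2 = \frac{\alpha_k^2 - \beta_k^2}{\alpha_k^2} = \frac{1}{\alpha_k^2} = \frac{4E_{1k} E_{2k}}{(E_{1k} + E_{2k})^2} \tag{10.188}$$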
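For a single pair of modes the normalization sum leading to (10.188) is a geometric series, which the following sketch evaluates numerically. The mode energies are arbitrary illustrative values, and the truncation `n_max` is an assumption made only to keep the sum finite.

```python
import numpy as np

E1, E2 = 1.3, 2.9                        # illustrative mode energies E_1k, E_2k
r = np.sqrt(E1 / E2)
alpha = 0.5 * (r + 1.0 / r)              # (10.184)
beta = 0.5 * (r - 1.0 / r)               # (10.185)

c0_sq = 4.0 * E1 * E2 / (E1 + E2) ** 2   # |c_0|^2 from (10.188)
n_max = 200                              # truncation of the pair-number sum
N = np.arange(n_max + 1)
cN_sq = c0_sq * ((beta / alpha) ** 2) ** N   # |c_N|^2 from the recursion (10.187)

norm = cN_sq.sum()
print(norm)                              # should equal 1 up to rounding and truncation
# The expansion parameter can be written directly in terms of the energies.
print(abs(beta / alpha), abs(E1 - E2) / (E1 + E2))
```

Since |β/α| = |E₁ − E₂|/(E₁ + E₂) < 1 for any positive energies, the series always converges for a single pair; the trouble discussed in the text arises only from the product over infinitely many pairs.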

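The overlap between the two vacua ${}_2\langle 0|0\rangle_1$ is then given, up to an irrelevant phase, by
the product23 of $c_0$ factors of the form (10.188) for all independent pairs, labeled by
k values in the right hemisphere:

$${}_2\langle 0|0\rangle_1 = \prod_{\mathbf{k}} \frac{2\sqrt{E_{1k} E_{2k}}}{E_{1k} + E_{2k}} = \exp \sum_{\mathbf{k}} \ln \frac{2\sqrt{E_{1k} E_{2k}}}{E_{1k} + E_{2k}} \tag{10.189}$$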

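In the large volume limit, $\sum_{\mathbf{k}} \to \frac{V}{2(2\pi)^3} \int d^3k$ (the extra factor of 1/2 arises from the
restriction of k to the right hemisphere) and this becomes

$${}_2\langle 0|0\rangle_1 \simeq \exp \Big\{ -\frac{V}{2(2\pi)^3} \int d^3k\, \ln \Big\{ \frac{E_1(k) + E_2(k)}{2\sqrt{E_1(k) E_2(k)}} \Big\} \Big\} \tag{10.190}$$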

For large k (i.e., $k \gg m_1, m_2$) one easily verifies that

$$\ln \Big\{ \frac{E_1(k) + E_2(k)}{2\sqrt{E_1(k) E_2(k)}} \Big\} \sim \ln \Big\{ 1 + \frac{(m_1^2 - m_2^2)^2}{32k^4} + O(1/k^6) \Big\} \sim \frac{(m_1^2 - m_2^2)^2}{32k^4} \tag{10.191}$$

so that the momentum integral in (10.190) is ultraviolet convergent for spatial dimen-
sions three or less (it is manifestly convergent for small k). However, the presence in
the exponent of a volume factor multiplying this finite integral (in three dimensions)
shows immediately the vanishing of the overlap of the two vacua in the infinite volume
limit found earlier. In the usual terminology employed by axiomatic field theorists,
the Fock spaces built on these two vacua are said to be “unitarily inequivalent”
spaces. It should be noted that the ultraviolet dependence of the integral appearing
in expressions for the vacuum overlap is extremely theory (and spacetime dimension)
dependent: for example, an analogous computation for two free spin-1/2 fields of different
mass gives a momentum integral divergent for spatial dimensions greater than one
(rather than three, in the scalar case). The important point is that, irrespective
of the ultraviolet behavior, the overlap between normalized vacua for free fields of
differing mass necessarily vanishes in the infinite volume limit as a consequence of the
geometrical/kinematical mismatch in Hilbert space between discretely normalized and
continuum normalized states.
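The volume dependence of the overlap can be made concrete with a small numerical experiment. The sketch below assumes a cubic box with periodic boundary conditions, illustrative masses, and a sharp momentum cutoff `k_cut` (an assumption introduced only to make the mode sum finite; the summand falls off like 1/k⁴, so the result depends only weakly on it). It sums the logarithm of the single-pair factor in (10.189) over all modes, halving the total in place of the hemisphere restriction, and compares two box sizes.

```python
import numpy as np


def log_vacuum_overlap(L, m1, m2, k_cut):
    """ln of the vacuum overlap for a periodic box of side L, modes |k_i| <= k_cut."""
    n_max = int(round(k_cut * L / (2 * np.pi)))
    k1d = 2 * np.pi * np.arange(-n_max, n_max + 1) / L
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k_sq = kx**2 + ky**2 + kz**2
    E1 = np.sqrt(k_sq + m1**2)
    E2 = np.sqrt(k_sq + m2**2)
    # Each (k, -k) pair contributes ln(2 sqrt(E1 E2) / (E1 + E2)); summing over
    # all k and halving counts every pair once.
    return 0.5 * np.sum(np.log(2 * np.sqrt(E1 * E2) / (E1 + E2)))


m1, m2, k_cut = 1.0, 1.5, 4 * np.pi      # illustrative masses and cutoff
log_small = log_vacuum_overlap(8.0, m1, m2, k_cut)
log_large = log_vacuum_overlap(16.0, m1, m2, k_cut)

# Doubling L multiplies the volume by 8, so the ratio should be close to 8.
print(log_small, log_large, log_large / log_small)
```

The logarithm of the overlap is negative and grows in magnitude in proportion to V, so the overlap itself decays exponentially with the volume, in line with (10.190).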
The above example has been generalized to a much broader statement concerning
the non-existence of the interaction picture in essentially all cases in which field
theory Hamiltonians H0 (“free”) and H (“interacting”), defined in an infinite-volume
continuum spacetime supporting the full Poincaré group of spacetime invariances,
differ non-trivially (i.e., by the integral of some local operator density) from each
other. We shall state below (though not prove fully) the modern version of this “Haag’s
theorem”: it is an inescapable consequence of the spectral and field axioms (Ia-d, IIa-d)
of Section 9.2.
Before going on to discuss a more general proof, let us emphasize what Haag’s
theorem does not say about interacting field theories. In particular, there is no difficulty

23 The vacuum of our theory may be regarded as the direct product of vacua for each momentum mode
separately, so that overlaps of the form ${}_2\langle 0|0\rangle_1$ factorize as indicated.

whatsoever in establishing a well-defined unitary relation between the in- and out-
states of an interacting field theory: the overlaps ${}_{\mathrm{out}}\langle \beta|\alpha\rangle_{\mathrm{in}} = S_{\beta\alpha}$ are taken between
states living in spaces spanned by a complete basis of eigenstates of the same Hamil-
tonian operator H. Indeed, the Haag–Ruelle and LSZ scattering theories developed in
Sections 9.3 and 9.4 lead to a perfectly well-defined, and unitary, S-matrix, on the basis
of exactly the same axiomatic framework which can be used to establish the validity of
Haag’s theorem. The LSZ formula, for example, gives a rigorous connection between
well-defined Green functions (time-ordered products of the full Heisenberg fields) and
this unitary S-matrix. Direct non-perturbative evaluation of the Green functions of
the theory (say, by lattice field theory methods) therefore completely circumvents any
difficulty with the non-existence of the interaction picture, as the latter is simply not
employed at any point.
Of course, in many cases the only sensible approach to the evaluation of the Green
functions is via perturbation theory, which is inescapably rooted in an interaction-
picture formalism, as we discussed earlier in this chapter in Sections 10.1 and
10.2. If we now return to our initial example, with unperturbed and perturbation
Hamiltonians given in (10.164, 10.165), we discover with some surprise that the
formal execution of the perturbative evaluation in interaction-picture of the n-point
Green functions of particle 2 (described by the “full” Hamiltonian H) leads to
no particular difficulties. For example, using the Gell–Mann–Low theorem (9.21),
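we have the formal perturbative expansion for the two-point Feynman function
${}_2\langle 0|T(\phi_2(x)\phi_2(y))|0\rangle_2 = i\Delta_F^{(2)}(x - y)$ of the “interacting” field $\phi_2$ in terms of the
propagators of the “free” field $\phi_1$:

$$\begin{aligned}
{}_2\langle 0|T(\phi_2(x)\phi_2(y))|0\rangle_2 ={}& {}_1\langle 0|T(\phi_1(x)\phi_1(y))|0\rangle_1 \\
& - i \frac{\delta m^2}{2} \int {}_1\langle 0|T(\phi_1(x)\phi_1(y) :\phi_1(z_1)^2:)|0\rangle_1\, d^4z_1 \\
& + \frac{1}{2} \Big( \frac{-i\delta m^2}{2} \Big)^2 \int {}_1\langle 0|T(\phi_1(x)\phi_1(y) :\phi_1(z_1)^2: :\phi_1(z_2)^2:)|0\rangle_1\, d^4z_1\, d^4z_2 \\
& + \ldots
\end{aligned} \tag{10.192}$$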

Expanding the vacuum-expectations inside the integrals using the Wick theorem
(recall that fields inside a normal product symbol are not contracted), one finds
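$$\begin{aligned}
i\Delta_F^{(2)}(x - y) ={}& i\Delta_F^{(1)}(x - y) \\
& - i \frac{\delta m^2}{2} (2) \int i\Delta_F^{(1)}(x - z_1)\, i\Delta_F^{(1)}(z_1 - y)\, d^4z_1 \\
& + \frac{1}{2} \Big( \frac{-i\delta m^2}{2} \Big)^2 (8) \int i\Delta_F^{(1)}(x - z_1)\, i\Delta_F^{(1)}(z_1 - z_2)\, i\Delta_F^{(1)}(z_2 - y)\, d^4z_1\, d^4z_2 \\
& + \ldots
\end{aligned} \tag{10.193}$$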

The factors (2) and (8) in the second and third lines are combinatoric factors
reflecting the number of equivalent ways by which the fields can be contracted to
give the displayed product of propagators. On transforming to momentum space, the

convolutions over spacetime arguments become simple products, and the perturbative
series reveals itself as a simple geometric expansion:
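$$\begin{aligned}
\Delta_F^{(2)}(k) &= \Delta_F^{(1)}(k) + \delta m^2 \big(\Delta_F^{(1)}(k)\big)^2 + (\delta m^2)^2 \big(\Delta_F^{(1)}(k)\big)^3 + \ldots \\
&= \frac{1}{k^2 - m_1^2 + i\epsilon} + \delta m^2 \Big( \frac{1}{k^2 - m_1^2 + i\epsilon} \Big)^2 + (\delta m^2)^2 \Big( \frac{1}{k^2 - m_1^2 + i\epsilon} \Big)^3 + \ldots \\
&= \frac{1}{k^2 - m_1^2 - \delta m^2 + i\epsilon} \\
&= \frac{1}{k^2 - m_2^2 + i\epsilon}
\end{aligned} \tag{10.194}$$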

—exactly as expected. And this despite the use of an interaction-picture expansion


which, as we have seen earlier, is mathematically inadmissible, given that we have
used fields defined in infinite-volume space throughout!
The origin of our unexpected good fortune is not hard to uncover if we repeat the
above calculation with our fields quantized in a box of finite volume V , as in (10.182).
In this case the interaction picture is well-defined as the overlap between the “free”
and “interacting” vacua is non-zero, and we may use the Gell–Mann–Low formula with
no qualms. The Feynman propagators are now given, as a straightforward calculation
along the lines of the argument leading from (7.217–7.218) and (7.214–7.215) reveals,
by discrete sums (rather than integrals) over the spatial momenta. For example, for
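particle 1:

$$\begin{aligned}
i\Delta_F^{(1)}(x_1 - x_2) &= \frac{1}{V} \sum_{\mathbf{k}} \frac{1}{2E_1(k)} \big( \theta(t_1 - t_2)\, e^{ik\cdot(x_2 - x_1)} + \theta(t_2 - t_1)\, e^{-ik\cdot(x_2 - x_1)} \big) \\
&= i\, \frac{1}{V} \sum_{\mathbf{k}} \int \frac{dk_0}{2\pi}\, \frac{e^{-ik\cdot(x_1 - x_2)}}{k^2 - m_1^2 + i\epsilon}
\end{aligned} \tag{10.195}$$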

Here the time component k0 is still continuous (as the time dimension is still
unbounded), but the spatial components of k are discrete as appropriate for periodic
modes in our finite box. We may now repeat essentially verbatim the steps leading
from (10.192) to (10.194) to obtain for the “full” propagator,

$$\Delta_F^{(2)}(x_1 - x_2) = \frac{1}{V} \sum_{\mathbf{k}} \int \frac{dk_0}{2\pi}\, \frac{e^{-ik\cdot(x_1 - x_2)}}{k^2 - m_2^2 + i\epsilon} \tag{10.196}$$

—precisely the expression for the free propagator of a particle of mass m2 , also
quantized in a box of volume V . The desired infinite volume result for the “inter-
acting” theory can then be obtained trivially by taking the V → ∞ limit in (10.196),
thereby returning to the conventional result (7.215). The recovery of the full Poincaré
invariance of the infinite-volume continuum theory, with continuous and unbounded
four-momenta, is perfectly straightforward in this case, as we are dealing with theories
without non-trivial interactions: in particular, the execution of the perturbation expan-
sion does not lead to ultraviolet-divergent loop integrals. Also, there are no divergent

phases in the evolution of the vacuum states (either for particle 1 or particle 2) due to
disconnected vacuum graphs. Nevertheless, we emphasize once again that, from the
point of view of Haag’s theorem, the interaction picture for our toy system in infinite
volume is just as pathological as in cases in which non-trivial interactions (involving
higher than quadratic terms in the fields) are present.
The situations in which the interaction picture is typically employed, to compute
the scattering amplitudes of an interacting field theory say, involve perturbations of
the free theory of a much more complicated nature than the innocent shift of mass
discussed above, and we should hardly be surprised if the negative conclusions reached
above concerning the existence of proper unitary transformations relating the Fock
states of the free and interacting field remain in force in these more physically relevant
circumstances. The proof of the generalized Haag theorem is usually accomplished
in two steps. One first establishes that the unitary equivalence of two irreducible
field operators (recall that from Axiom IId of Section 9.2, this implies that the only
operator commuting with all fields is a multiple of the identity) implies the equality of
their equal-time Wightman functions (VEVs of field products). The second, and more
difficult, step uses analyticity properties of the Wightman functions following from the
spectral and field axioms of Section 9.2 (and embodied in the Hall–Wightman theorem)
to extend this equality to Wightman functions for arbitrary spacetime arguments. We
shall outline the proof of the first part here, and refer the interested reader to the
literature for the more technically challenging second step. An alternative proof, due
to Jost and Schroer, and involving only the two-point function, will be relegated to
an exercise for the reader (see Problem 13).
Let us suppose that we have two scalar24 fields φ1 (x, t) and φ2 (x, t), with associated
canonical momentum fields π1 (x, t) ≡ φ̇1 (x, t), π2 (x, t) ≡ φ̇2 (x, t), and related at any
given time t by a well-defined unitary operator V (t) defined in a single Hilbert space
accommodating both fields (and equal to the interaction-picture operator U (t, 0) in our
previous notation: the change is occasioned by the desire to avoid confusion with the
U (R, a) unitary representatives of the Euclidean group introduced below):

φ2 (x, t) = V † (t)φ1 (x, t)V (t) (10.197)


π2 (x, t) = V † (t)π1 (x, t)V (t) (10.198)

The Euclidean subgroup of the Poincaré group, consisting of spatial rotations R and
translations a is realized on the Hilbert space by unitary operators Ui (R, a), i = 1, 2
which have the usual action on our local fields (cf. (5.94)):

Ui (R, a)φi (x, t)Ui† (R, a) = φi (Rx + a, t) (10.199)


Ui (R, a)πi (x, t)Ui† (R, a) = πi (Rx + a, t) (10.200)

If we think of φ1 (resp. φ2 ) as our free interaction-picture field φ (resp. Heisenberg field


φH ), then U1 (resp. U2 ) are the inhomogeneous analogs of the unitary representatives

24 The generalization to fields transforming under non-trivial representations of the Lorentz group is
unproblematic.

U (Λ) (resp. UH (Λ)) introduced in Section 9.1. Note that we have at this stage already
committed ourselves to continuum-normalized multi-particle states, by insisting on
the invariance of the theory under the continuous Euclidean group. Finally, we shall
assume that there is a unique invariant state (vacuum) for each set of Euclidean group
representatives:

$U_i(R, a)|0\rangle_i = |0\rangle_i, \quad {}_i\langle 0|0\rangle_i = 1, \quad i = 1, 2$ (10.201)

From (10.197, 10.199) it follows that

U1† (R, a)V (t)U2 (R, a)V † (t)φ1 (x, t) = U1† (R, a)V (t)U2 (R, a)φ2 (x, t)V † (t)
= U1† (R, a)V (t)φ2 (Rx + a, t)U2 (R, a)V † (t)
= U1† (R, a)φ1 (Rx + a, t)V (t)U2 (R, a)V † (t)
= φ1 (x, t)U1† (R, a)V (t)U2 (R, a)V † (t) (10.202)

so that the operator U1†(R, a)V(t)U2(R, a)V†(t) commutes with all fields φ1(x, t)
on time-slice t. An exactly similar sequence of manipulations (using (10.198, 10.200))
establishes that this commutativity holds also with the π1(x, t) operators. The creation
and annihilation operators appropriate for free field φ1 can be reconstructed from the
φ1(x, t) and π1(x, t), so commutativity of U1†(R, a)V(t)U2(R, a)V†(t) is thus established
with all such operators, which implies that it must act as a multiple of the identity
in the Fock space of field φ1 (this is the irreducibility property, here invoked for fields
and their conjugate momenta on a single time-slice). Thus

U1† (R, a)V (t)U2 (R, a)V † (t) = c(R, a) (10.203)

with c(R, a) a unimodular c-number. Hence

U2 (R, a) = c(R, a)V † (t)U1 (R, a)V (t) (10.204)

The fact that U1,2 (R, a) form (infinite-dimensional) representations of the Euclidean
group (cf. (5.14)):

Ui (R1 , a1 )Ui (R2 , a2 ) = Ui (R1 R2 , a1 + R1a2 ) (10.205)

implies that the c(R, a) must likewise form a one-dimensional representation of the
Euclidean group:

c(R1 , a1 )c(R2 , a2 ) = c(R1 R2 , a1 + R1a2 ) (10.206)

Some simple group-theoretic reasoning (see Problem 12) leads to the conclusion that
the only such representation is the trivial one, c(R, a) = 1, whence

U2 (R, a) = V † (t)U1 (R, a)V (t) (10.207)
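The group-theoretic step can be sketched as follows. This is a hedged outline of one possible route, assuming that c(R, a) is unimodular and continuous in its arguments; it is not necessarily the argument intended in Problem 12.

```latex
% Step 1: pure translations. Setting R_1 = R_2 = 1 in (10.206) gives
c(1, a_1)\,c(1, a_2) = c(1, a_1 + a_2)
\;\Longrightarrow\; c(1, a) = e^{i p \cdot a}
\quad \text{for some fixed real three-vector } p .

% Step 2: conjugating a translation by a rotation. Applying (10.206) twice,
c(R, 0)\,c(1, a)\,c(R^{-1}, 0) = c(R, Ra)\,c(R^{-1}, 0) = c(1, Ra) ,
% while the left-hand side equals c(1, a), since c-numbers commute
% and c(R, 0)\,c(R^{-1}, 0) = c(1, 0) = 1. Hence
e^{i p \cdot a} = e^{i p \cdot Ra} \;\; \text{for all } R, a
\;\Longrightarrow\; p = 0 .

% Step 3: the general element factorizes as (1, a)(R, 0) = (R, a), so
c(R, a) = c(1, a)\,c(R, 0) = c(R, 0) = 1 ,
% using the triviality of the one-dimensional representation
% of the rotation group.
```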

Finally, the uniqueness of the vacuum states implies, using (10.207),

$U_2(R, a)V^\dagger(t)|0\rangle_1 = V^\dagger(t)U_1(R, a)|0\rangle_1 = V^\dagger(t)|0\rangle_1$ (10.208)

so that, up to an irrelevant unimodular phase, $V^\dagger(t)|0\rangle_1$ is the vacuum for field φ2 :

$|0\rangle_2 = V^\dagger(t)|0\rangle_1$ (10.209)
The equality of equal-time vacuum-expectation-values for the two fields now follows
trivially:

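${}_1\langle 0|\phi_1(x_1, t)\phi_1(x_2, t)\cdots\phi_1(x_n, t)|0\rangle_1$
$= {}_1\langle 0|V(t)V^\dagger(t)\phi_1(x_1, t)V(t)V^\dagger(t)\phi_1(x_2, t)\cdots V(t)V^\dagger(t)\phi_1(x_n, t)V(t)V^\dagger(t)|0\rangle_1$
$= {}_2\langle 0|\phi_2(x_1, t)\phi_2(x_2, t)\cdots\phi_2(x_n, t)|0\rangle_2$ (10.210)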
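The algebra behind (10.210) can be checked in a finite-dimensional toy setting. The sketch below is only an illustration under stated assumptions: the 2×2 Hermitian matrices stand in for the fields φ1 at three spatial points, the unitary V and the helper names (matmul, dagger, vev) are hypothetical choices, and nothing here captures the infinite-dimensional subtleties on which Haag's theorem actually rests. In finite dimensions unitary equivalence is unremarkable.

```python
import cmath
import math

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Hermitian conjugate of a nested-list matrix."""
    return [[a[j][i].conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

# A 2x2 unitary standing in for V(t); theta and chi are arbitrary.
theta, chi = 0.7, 0.3
V = [[complex(math.cos(theta)), -cmath.exp(1j * chi) * math.sin(theta)],
     [cmath.exp(-1j * chi) * math.sin(theta), complex(math.cos(theta))]]

# Hermitian stand-ins for phi_1(x_1, t), phi_1(x_2, t), phi_1(x_3, t).
phi1 = [
    [[1 + 0j, 2 - 1j], [2 + 1j, -0.5 + 0j]],
    [[0.3 + 0j, 1j], [-1j, 2 + 0j]],
    [[-1 + 0j, 0.5 + 0j], [0.5 + 0j, 0.7 + 0j]],
]

# (10.197): phi_2 = V^dagger phi_1 V, applied to each operator.
phi2 = [matmul(dagger(V), matmul(p, V)) for p in phi1]

# (10.209): |0>_2 = V^dagger |0>_1, with |0>_1 a column vector.
vac1 = [[1 + 0j], [0j]]
vac2 = matmul(dagger(V), vac1)

def vev(vac, ops):
    """<vac| ops[0] ops[1] ... ops[-1] |vac> for a list of matrices."""
    state = vac
    for op in reversed(ops):
        state = matmul(op, state)
    return matmul(dagger(vac), state)[0][0]

vev_1 = vev(vac1, phi1)
vev_2 = vev(vac2, phi2)
```

The two expectation values agree to rounding error because every inserted factor V V† is the identity, which is exactly the manipulation used in (10.210).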
Recall that our fields φ1 , φ2 are supposed to represent free and fully interacting
fields, respectively, so this result is already astonishing, as we should certainly not
expect, even at equal time, the free-field vacuum-expectation-values to coincide with
the corresponding very complicated interacting ones. The final nail in the coffin of
the interaction picture is inserted by the realization that the very strong analyticity
constraints on the spacetime Wightman functions (cf. Section 9.2) allow the equality
expressed in (10.210) to be extended to arbitrary values of the spacetime coordinates
of the fields. Note that these analyticity properties follow from the full panoply of
Wightman axioms (of type I and II) discussed in Section 9.2: in particular, locality, full
Poincaré (not just Euclidean group) invariance, and the usual spectral properties. The
insertion of θ-functions leads to a similar conclusion for the Feynman (time-ordered)
Green functions of fields φ1 and φ2 , from which we conclude (via LSZ) that the S-
matrix of the interacting field φ2 is equal to that of the free field φ1 : namely, unity.
Thus, non-trivial interactions are excluded once we make the evidently overly strong
assumption of well-defined (“proper”) unitary equivalence of the representations for
the two fields. The interested reader is encouraged to follow the more detailed accounts
of this second step in the argument leading to Haag’s theorem, involving an application
of the fundamental Hall–Wightman theorem (Hall and Wightman, 1957) (cf. also
Section 9.2) on analytic domains of Wightman functions, in (Barton, 1963) and
(Greenberg, 1959).
A slightly different route (Streater and Wightman, 1978) to Haag’s theorem utilizes
the two-point function only, the analyticity properties of which are essentially trivial,
as we have seen in Section 9.5. Taking n = 2 in (10.210), and with φ1 (x) a canonically
normalized free scalar field of mass m, we have for the “interacting field” φ2

${}_2\langle 0|\phi_2(x_1, t)\phi_2(x_2, t)|0\rangle_2 = \Delta_+(x_1 - x_2, 0; m)$ (10.211)
where Δ+ is the invariant function of (6.63). For x1 , x2 any pair of space-like separated
points, the corresponding times t1 , t2 can be brought to equality by an appropriate
Lorentz boost, so from the Lorentz-invariance properties of Δ+ we conclude that the
two-point Wightman function of the φ2 field must coincide with that of the free field
for space-like separations of x1 − x2 . The equality can be analytically extended to the
time-like domain: for example, we need only appeal to the Lehmann representation for
the two-point function derived in Section 9.5, where the spectral function is already
fully determined by knowledge of the two-point function in the space-like region.

Finally, one utilizes a theorem of Jost and Schroer (Jost, 1961) (see also Problem 13),
wherein it is shown that any field whose two-point function coincides with that of a
free field must itself be a free field (evidently, of the same mass).
The non-existence of the interaction picture for any Poincaré invariant local field
theory with essentially any non-trivial split (other than a trivial c-number one) into
free and interacting parts of the Hamiltonian is, of course, an unpleasant fact of
life given the enormous utility of perturbative Feynman graph technology in modern
particle physics. The attitude of the present author to this circumstance has already
been outlined above, in the discussion of the perturbative expansion of the two-point
function in the toy model of a scalar mass shift. The interaction-picture formalism
can be reinstated with complete mathematical rigor by a full regularization of the
field theory, in which both spatial infrared (i.e., finite volume) and ultraviolet (i.e.,
finite lattice spacing) cutoffs are introduced. The resultant theory, at the price of loss
of Poincaré invariance, is now a quantum-mechanical system with a finite number of
independent degrees of freedom, and the interaction picture makes perfect sense. The
problem is now transferred to the issue of regaining sensible (in particular, Poincaré
invariant!) results in the limit when these cutoffs are removed, after the perturbative
expansion of the n-point functions needed for evaluation of the S-matrix has been
performed. Note that the perturbative contributions obtained at each finite order of
perturbation theory are completely well-defined in this cutoff theory (although, as
emphasized previously, the expansion is only an asymptotic one, with the sum of
perturbative contributions diverging because of factorial growth of the coefficients, as
we shall see in the next chapter).
We consider first the behavior of the cutoff perturbative amplitudes as the spatial
volume of the system is allowed to go to infinity. From the discussion in Section 10.2
of persistent interactions, we know that in a theory of massive particles, the only
volume singularity of the n-point Feynman functions appears in the phase ${}_{\mathrm{out}}\langle 0|0\rangle_{\mathrm{in}}$
accumulated in the vacuum due to the vacuum energy density shift induced by inter-
actions. This phase is removed by the simple expedient of considering only connected
contributions to the S-matrix: it would in any event disappear subsequently once the
S-matrix amplitudes are squared to determine the probability of scattering processes.
The infinite volume limit is perfectly smooth in the remaining connected amplitudes,
as the appearance of momentum integrals extending down to zero momentum is
unproblematic in a theory with massive particles due to the absence of infrared
divergences (on this matter, cf. Chapter 19).
It is important to realize that contrary to assertions one sometimes encounters
in discussions of Haag’s theorem, the vacuum fluctuations encountered in interaction
picture, corresponding to the interaction-induced shift in the ground-state energy of
the theory (and present even when the theory is fully regulated), are not the root
cause of the non-existence of the interaction picture. Indeed, Haag’s theorem applies
in full force to supersymmetric field theories (see Section 12.6) in which the vacuum
energy fluctuations cancel identically between bosonic and fermionic contributions.
Nevertheless, the interacting n-point functions of these theories most certainly differ
from their free limits, guaranteeing the non-existence of the interaction picture by
the arguments given above. Indeed, there are typically mass shifts (equal for bosonic

particles and their fermionic superpartners of course) induced by the interactions,25 so


we should certainly expect, in analogy to our toy model, a vanishing overlap between
the free (or “bare”) and interacting vacua of these theories.
The final step in restoring the Poincaré-invariant status of the theory (as well
as the unitarity of the S-matrix, which is typically destroyed26 by the omission of
high-momentum states incurred by an ultraviolet cutoff) requires a removal of the
short-distance cutoff, for example, in the case of a lattice regularization, by taking the
lattice spacing a to zero. At this point the usual ultraviolet singularities of perturbative
field theory, occasioned by high-momentum divergences in the loop integrals appearing
in the Feynman graph expansion, come into play. It should be emphasized that
quite apart from the difficulties engendered by Haag’s theorem, the presence of an
ultraviolet regularization is a precondition for even defining a complete dynamical
structure for interacting local quantum field theories in 3+1-dimensional spacetime:
the Hamiltonian density defining the dynamics of such theories in terms of local
fields necessarily involves coefficients (the “bare” masses m(a), couplings λ(a), etc.)
which typically are singular (or in some cases, as in asymptotically free theories, go
to zero) in the continuum limit a → 0. The dynamics of the interacting theory is
reconstructed by a limiting process—renormalization—in which the sensitivity of the
cutoff amplitudes to the short-distance physics at the scale of a lattice spacing is shown
to be negligible (in the sense that the continuum amplitudes are typically corrected by
terms of power order $(m_{\mathrm{ph}} a)^2$, $(pa)^2$, with p a typical momentum in the process), once
the amplitudes are reparameterized in terms of couplings and masses determined by
physical measurements at low energy (or distances much larger than the short-distance
cutoff). The violations of Poincaré invariance and unitarity due to the presence of a
short-distance cutoff are likewise negligible in such cases. Field theories for which power
insensitivity to the ultraviolet cutoff can be established order by order in perturbation
theory (and to all orders thereof) are termed “renormalizable”: their study, and more
generally the study of the sensitivity of field theory amplitudes to physics at widely
different scales, will be the main object of our attention in Part 4 of this book. The field
theories of the Standard Model of particle physics, and its supersymmetric extensions,
are all of this type. In the opinion of this author, therefore, the proper response to
Haag’s theorem is simply a frank admission that the same regularizations needed to
make proper mathematical sense of the dynamics of an interacting field theory at each
stage of a perturbative calculation will do double duty in restoring the applicability of
the interaction picture at intermediate stages of the calculation. The restoration of the
complete panoply of desirable invariances and properties of a continuum (flat space)
field theory must then be studied using the elegant technologies which have been
devised for exploring the sensitivity of field theories to modifications of the spacetime
structure either at short or large distances, and to which we turn in the final four
chapters of the book.

25 The non-renormalization theorems of the superpotential ensure the absence of renormalization in the
mass terms in the Lagrangian, but there are typically shifts in the poles of the propagator due to non-trivial
wavefunction renormalizations. See Section 12.6 for an explanation of these arcane terms.
26 An important exception to this occurs with Hamiltonian lattice formulations of field theory, provided,
of course, that the lattice Hamiltonian is constructed to be properly hermitian.

10.6 Problems
1. Verify that the operator E′(t, t0 ), defined in (10.15), satisfies the same first-order
equation (10.16) and initial condition as E(t, t0 ), whence E′(t, t0 ) = E(t, t0 ).
2. Determine the Wick expansion of T (φ(x1 )φ(x2 )φ(x3 )φ(x4 )) by taking the fourth
functional derivative of (10.22) with respect to j(x1 ), j(x2 ), j(x3 ), j(x4 ) and
setting the sources to zero.
3. Perform the indicated spacetime integrations in (10.37) to obtain the
momentum-space expression for 2-2 scattering in the theory with interaction
(10.31).
4. The object of this exercise is to work out the lowest-order perturbative contri-
butions to the vacuum energy density shift δE (see (10.42)) induced by a λφ4
interaction.
(a) First assume that the interaction is not normal-ordered, $H_{\mathrm{int}} = \frac{\lambda}{4!}\phi^4$. There
is then a contribution to the vacuum-to-vacuum amplitude of first order in
λ. Show that after an overall integral over spacetime (interpreted as V · T ,
spatial volume times temporal extent) is extracted, the energy density shift
is found to be

$\delta E = \frac{\lambda}{8}(i\Delta_F(0))^2 = -\frac{\lambda}{8}\left(\int \frac{1}{k^2 - m^2 + i\epsilon}\,\frac{d^4k}{(2\pi)^4}\right)^2$ (10.212)

Show that by Wick rotating the momentum integral to Euclidean space a
real result, albeit ultraviolet-divergent in the absence of a high-momentum
cutoff, is obtained. Inserting a large-momentum cutoff Λ, show that δE is of
order $\Lambda^4$ as Λ → ∞.
(b) Now suppose the interaction to be normal-ordered, $H_{\mathrm{int}} = \frac{\lambda}{4!}:\phi^4:$. Show
that the lowest-order contribution to δE is now of order $\lambda^2$, corresponding
to the disconnected vacuum graph of Fig. 10.3. Find an expression for δE in
terms of a three-loop Minkowski integral, and verify, by Wick rotation, that
the (again, ultraviolet-divergent) result is real, and again of order $\Lambda^4$ when
the loop integrals are cut off at a large momentum Λ.
5. The object of this exercise is to rederive the logarithmically divergent Coulomb
self-energy of an electron originally found by Weisskopf in 1939 (cf. Section 2.3,
Equations (2.53–2.62)). One first needs the charge density correlation function
in a single electron state, $\tilde{G}(\vec{\xi})$, from which the desired self-energy correction (for
an electron at rest) follows immediately via (2.56). We will use non-covariantly
(continuum) normalised electron states, with $\langle \vec{k}\sigma|\vec{k}'\sigma'\rangle = \delta_{\sigma\sigma'}\delta^3(\vec{k} - \vec{k}')$. Thus,
we need to calculate (for a fixed spin choice σ)

$\int \langle \vec{k}\sigma| :e\psi^\dagger\psi(\vec{r} - \vec{\xi}/2, 0):\; :e\psi^\dagger\psi(\vec{r} + \vec{\xi}/2, 0): |\vec{0}\sigma\rangle\, d^3r = \tilde{G}(\vec{\xi})\,\delta^3(\vec{k})$ (10.213)

where the fields are at time zero. The calculation is greatly simplified by shifting
the first charge density (at $\vec{r} - \vec{\xi}/2$) to a slightly positive time ε, whereupon
the product of charge densities can be taken to be time-ordered. The Wick
expansion now yields a sum of two terms (of the form $iS_F :\psi\psi^\dagger:$, with $S_F$
the Feynman propagator for a Dirac particle), which can be reduced (using the
Fourier formula for $S_F$) to a three-dimensional momentum integral for $\tilde{G}(\vec{\xi})$ by
explicitly performing the energy integral (after which the time shift ε can be set
to zero). Show that one obtains the result (2.61) quoted previously,

$\tilde{G}(\vec{\xi}) = e^2 \int \frac{d^3q}{(2\pi)^3}\,\frac{m}{E(q)}\,e^{i\vec{q}\cdot\vec{\xi}}$ (10.214)

leading to a logarithmically divergent Coulomb self-energy.
6. (a) Consider the one-dimensional integral

$Z(\lambda, m) \equiv \int_{-\infty}^{+\infty} e^{-\frac{1}{2}m^2x^2 - \frac{\lambda}{4}x^4}\,dx \sim \sum_n c_n\lambda^n$ (10.215)

where the sum in powers of λ is obtained by expanding the exponential
inside the integral. Show that the coefficients $c_n$ are given by

$c_n = (-1)^n\sqrt{2}\,\frac{1}{(m^2)^{2n+1/2}}\,\frac{\Gamma(2n + \frac{1}{2})}{\Gamma(n + 1)}$ (10.216)

Evidently, the $c_n$ increase factorially with n, so that the series in powers of
λ has zero radius of convergence, and must be regarded as an asymptotic
expansion only (cf. Section 11.1).
(b) Instead of an expansion in powers of λ, we may rescale the integration
variable $y \equiv \lambda^{1/4}x$ and expand Z(λ, m) in powers of $m^2/\sqrt{\lambda}$. Show that one then
obtains a convergent Taylor series, with coefficients $d_n$ decreasing factorially
with n:

$Z(\lambda, m) = \lambda^{-1/4}\sum_{n=0}^{\infty} d_n\frac{1}{\lambda^{n/2}}, \quad d_n = (-1)^n\frac{1}{\sqrt{2}}\,m^{2n}\,\frac{\Gamma(\frac{n}{2} + \frac{1}{4})}{\Gamma(n + 1)}$ (10.217)

Note that this result shows that Z(λ, m) is a real analytic function of λ,
with a cut of standard fractional power type along the negative real axis,
and behaving for large |λ| like $|\lambda|^{-1/4}$.
7. Let φ(x) be a free real scalar field with Feynman propagator ΔF (x). Show that
$\langle 0|T(e^{i\phi(x)}e^{-i\phi(0)})|0\rangle = e^{i(\Delta_F(x) - \Delta_F(0))}$ by
(a) Expanding out the operator exponentials and using Wick’s theorem, and
(b) by using path-integral methods (i.e., evaluate the path integral for Z0 (j)
with j(z) = δ(z) − δ(z − x)).
8. Calculate the four-point function of four free Dirac fermion fields
$\langle 0|T(\psi(x)\psi(y)\bar{\psi}(z)\bar{\psi}(w))|0\rangle$
by taking four functional derivatives with respect to Grassmann sources
η(x), η̄(x) of Z0 [η, η̄], the generating functional for the free fermion field theory.
Check that the relative signs for the terms you obtain agree with Wick’s theorem.
9. Verify the expansion for the sources Ji in terms of the φi in (10.150).

10. Verify the result (10.151) for the effective potential through terms of order φ4 .
11. Verify the recursion relation (10.187) giving the amplitudes for multiple pairs
of the quanta of field φ2 in the vacuum of field φ1 , where φ1 (resp. φ2 ) are free
scalar fields of mass m1 (resp. m2 ).
12. Starting with the representation equation (10.206), show
(a) That for R = 1, the only solutions are $c(1, \vec{a}) = e^{i\vec{p}\cdot\vec{a}}$, with $\vec{p}$ some fixed
three-vector.
(b) Using the fact that the one-dimensional (zero angular momentum) repre-
sentation of the rotation group is trivial, c(R, 0) = 1, and the result of part
(a), show that for any R, a, c(R, a) = 1.
13. Suppose that the two-point Wightman function of a Heisenberg field φH is
known to coincide with that for a canonically normalized free field of mass m

${}_{\mathrm{in}}\langle 0|\phi_H(x)\phi_H(y)|0\rangle_{\mathrm{in}} = \Delta_+(x - y; m)$ (10.218)

Show that the Heisenberg field φH must satisfy the Klein–Gordon equation

$K_x\phi_H(x) = (\Box_x + m^2)\phi_H(x) = 0$ (10.219)
(Hint: apply the Klein–Gordon operators Kx Ky to (10.218).) Now, note that by
the Yang–Feldman equation (9.204) derived in Chapter 9, Problem 8, using the
asymptotic condition, the Heisenberg field must be identical up to normalization
to the free in-field:

φH (x) = Z φin (x) (10.220)
delivering the result of Jost and Schroer once again, this time by appeal to the
asymptotic theory developed in Sections 9.3 and 9.4.
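As a numerical illustration of Problem 6(b), the following minimal sketch compares the strong-coupling series (10.217) with a direct quadrature of the integral (10.215). The function names, the Simpson integrator, the integration window and the truncation order n_max are all assumptions made for this example; it is a plausibility check, not a derivation.

```python
import math

def z_numeric(lam, m=1.0, half_width=12.0, steps=200_000):
    """Composite Simpson estimate of the integral in (10.215)."""
    h = 2.0 * half_width / steps  # steps must be even for Simpson's rule

    def f(x):
        return math.exp(-0.5 * m * m * x * x - 0.25 * lam * x ** 4)

    total = f(-half_width) + f(half_width)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(-half_width + k * h)
    return total * h / 3.0

def d(n, m=1.0):
    """Coefficient d_n of (10.217), evaluated through log-gamma."""
    log_mag = (n * math.log(m * m) - 0.5 * math.log(2.0)
               + math.lgamma(0.5 * n + 0.25) - math.lgamma(n + 1))
    return (-1) ** n * math.exp(log_mag)

def z_strong(lam, m=1.0, n_max=60):
    """Partial sum of the convergent expansion in powers of m^2/sqrt(lam)."""
    series = sum(d(n, m) * lam ** (-0.5 * n) for n in range(n_max + 1))
    return lam ** -0.25 * series

difference_at_unit_coupling = abs(z_strong(1.0) - z_numeric(1.0))
difference_at_weak_coupling = abs(z_strong(0.05) - z_numeric(0.05))
```

Because the $d_n$ fall off factorially, the same truncated sum remains usable even at small λ, where the expansion parameter $m^2/\sqrt{\lambda}$ is large; only more terms are needed before the factorial suppression takes over.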
11
Dynamics IX: Interacting fields:
non-perturbative aspects

The extraordinary successes, beginning in the early 1950s, of perturbative quantum


electrodynamics (QED) in predicting with astonishing accuracy the measured hydro-
gen fine structure and anomalous magnetic moment of the electron (and muon) played
an important role in the acceptance of local quantum field theory as an appropriate
framework for the description of subatomic phenomena. However, this success was
almost immediately tempered by the realization that the meson field theories devised
to provide a fundamental description of the dynamics of the strong interactions
were completely unable to yield comparable quantitative precision, and the resultant
frustration would lead to an abandonment, for more than a decade, of Lagrangian
field theory in strong interaction physics, in favor of a purely S-matrix approach in
which one attempted to constrain the form of strong interaction amplitudes on the
basis of very general principles of unitarity, analyticity, and crossing symmetry, with
the hope that a sufficiently clever exploitation of these properties might eventually
allow a unique determination of the desired amplitudes, in terms of a finite number
of measurable strong interaction parameters. This project would eventually collapse
in the mid-1970s with the discovery of an appropriate field theory description of the
strong interactions—quantum chromodynamics (QCD)—and the realization that the
previous difficulties with field theory approaches to the strong interactions derived
from (a) the use of an incorrect fundamental field theory, involving pions and nucleons,
instead of the truly elementary quark and gluon underlying degrees of freedom, and
(b) the lack of sufficiently powerful non-perturbative techniques needed to extract even
the most basic qualitative features of a strongly coupled field theory.
The power and intuitive transparency of the perturbative, graph-based approach
to quantum field theory explored in the preceding chapter is undeniable, and we
shall see later that this framework allows us to extract quantitative results even in
circumstances which at first sight would seem to involve strongly coupled theories
where a perturbative expansion could hardly be valid. However, it is equally impor-
tant to distinguish the cases in which perturbation theory is intrinsically incapable
of capturing even the most elementary qualitative aspects of the physics. Broadly
speaking, we can identify three categories of problems from the point of view of the
role played by perturbative calculation:

1. Most obviously, we have the classic processes in which a small coupling (in the
case of QED, or more generally, the standard model of electroweak interactions,

this would be the fine structure constant α ∼ 1/137) allows extremely accurate
calculations of a given process simply by evaluating and summing the perturba-
tive contributions up to a finite loop order. In the case of QED this has been
done up to the four-loop order ($\alpha^4$) for quantities such as the electron anomalous
magnetic moment.
2. Next, we have situations in which a physical quantity, albeit in a weakly coupled
theory, necessarily involves an infinite number of interactions between the con-
stituent particles, and hence an infinite number of Feynman graphs. Bound states
such as the hydrogen atom in a weakly coupled theory such as QED clearly fall
into this category, as the permanent association of the proton and electron clearly
requires that they exchange photons over an infinite time span, in contrast to the
situation in unbound electron–proton scattering, where the scattering amplitudes
can be perturbatively evaluated, with exchange of many photons suppressed
by higher powers of α (uncompensated by kinematic enhancements due to the
bound-state threshold, as we shall see below). Of course, the calculation of all
Feynman diagrams contributing to a process is beyond our calculational powers,
and it will soon become clear that even if that were possible, the resultant series
is in fact a divergent asymptotic expansion, and cannot therefore be summed
directly to yield a meaningful answer! Instead, we shall see that for a certain
class of bound-state problems, the kinematic region important for the permanent
binding of the constituent particles identifies a dominant component of the
(infinite) set of perturbative amplitudes which can be convergently summed,
and which represent the leading contributions to the bound-state properties (in
an expansion in the available weak coupling). For lack of a better term, we may
refer to such situations as “perturbatively non-perturbative” processes in field
theory.
3. Finally, there are those physical processes in which the relevant coupling strength
is large, so that an asymptotic expansion, even if formally available to high order,
is simply useless in extracting quantitative (and in many cases, even qualitative)
features of the physics. Quark confinement and chiral symmetry-breaking in
QCD are archetypal examples of this type. We may (again, for lack of a better
term) refer to these cases as the “essentially non-perturbative” processes in field
theory. The Feynman graph approach is of little if any utility here: instead,
numerical approaches in which the Euclidean functional integral of the discretized
theory is evaluated directly by statistical Monte Carlo techniques (as in lattice
gauge theory) provide the most fruitful line of attack. Indeed, the use of such
methods has allowed us to obtain, starting from the QCD Lagrangian, and with
accuracy now approaching in many cases the level of a few percent, many detailed
predictions of hadron spectrum and structure.

In this chapter, after explaining the nature of the (unavoidable) divergence in per-
turbation theory, we shall give some examples of the various types of non-perturbative
phenomena encountered in the field theories of importance in the Standard Model
of particle physics. The physics of weakly coupled threshold bound states will be
explained, as it is crucial for understanding the classic successes of QED in the
hydrogen atom spectrum, for example. The limitations of perturbation theory in

strongly coupled theories, and the extent to which perturbation theory can even
in principle be regarded as determining the exact amplitudes of the theory, will be
explained, as well as the role played by Borel-summability (or its absence) of the
perturbative series. Conventional wisdom holds that perturbative information by itself
is virtually useless in non-Borel-summable theories, but we shall see that in at least one
iconic case (the anharmonic “double-well” oscillator) purely perturbative information
can be “massaged” to obtain a rigorously convergent sequence of approximants to the
amplitudes of a non-Borel theory. We conclude the chapter with some brief remarks
on numerical approaches to non-perturbative field theory.

11.1 On the (non-)convergence of perturbation theory


The fact that formal expansions of field theory amplitudes in powers of the coupling
constant(s) of the theory are generally at best asymptotic, rather than convergent
Taylor expansions, can be readily understood on the basis of a simple physical argu-
ment first given by Dyson (Dyson, 1952) in the context of quantum electrodynamics.
If the amplitudes of the theory were in fact described by analytic functions Z(λ) of
the coupling constant (which we here generically denote λ), with λ = 0 a point of
analyticity, thereby allowing a convergent Taylor expansion in powers of λ, with a
finite radius of convergence, we should clearly expect the physics of the theory to
change smoothly if the coupling λ is moved continuously from a small positive to a
small negative value. In fact, such a change typically introduces dramatic instabilities
which completely alter the spectral properties of the theory. For self-interacting φ4
scalar theories, with a Hamiltonian given in d spacetime dimensions by

$H = \int d^{d-1}x\,\{\frac{1}{2}\dot{\phi}^2 + \frac{1}{2}|\vec{\nabla}\phi|^2 + \frac{1}{2}m^2\phi^2 + \frac{\lambda}{4}\phi^4\}$ (11.1)
2 2 2 4

we have already seen in Section 8.4 that the spectrum of the theory becomes
unbounded below if the sign of the coupling λ becomes negative. For this theory, the
functional integral representation of the Euclidean generating functional (or vacuum
to vacuum amplitude, setting the source function to zero)1

$Z(\lambda, m) = \int D\phi\; e^{-\int d^dx\,\{\frac{1}{2}\nabla\phi\cdot\nabla\phi + \frac{1}{2}m^2\phi^2 + \frac{\lambda}{4}\phi^4\}}$ (11.2)

clearly diverges if we allow the real part of λ to become negative. Even for the d = 0-
dimensional case, where the integral (11.2) degenerates to a one-dimensional integral
(as the field is defined at a single point, where we may denote its value x, and there
is no gradient term)

$Z(\lambda, m) \equiv \int_{-\infty}^{+\infty} e^{-\frac{1}{2}m^2x^2 - \frac{\lambda}{4}x^4}\,dx$ (11.3)

1 Here, the ∇ operator is the Euclidean d-gradient. As usual, we assume that the functional integral is
made well-defined—regularized—by an appropriate discretization, both in the infrared and the ultraviolet:
e.g., on a finite lattice.

simple arguments show (cf. Problem 6 in Chapter 10) that the function Z(λ, m) is
analytic in the complex plane of λ with a cut on the negative real axis extending up
to the origin, so that a formal expansion around λ = 0 cannot converge. Instead, if
we expand the “interaction” x4 term inside the integral, we arrive at an asymptotic
expansion2

$Z(\lambda, m) \sim \sum_n c_n\lambda^n, \quad c_n = (-1)^n\sqrt{2}\,\frac{1}{(m^2)^{2n+1/2}}\,\frac{\Gamma(2n + \frac{1}{2})}{\Gamma(n + 1)}$ (11.4)

with oscillating coefficients cn which increase factorially in magnitude at large orders


n. Note that the divergent series encountered here has absolutely nothing to do with
the ultraviolet divergences encountered in continuum field theories for d ≥ 2, but occurs
even if the theory is fully regularized, as we assume throughout this section, in the
ultraviolet (and, if necessary, in the infrared as well). The same cut-plane analyticity
holds for the regularized functional integral (11.2) in any spacetime dimension, as we
may analytically continue the integral to Re(λ) < 0 by simultaneously rotating the
phase of the coupling $\lambda \to |\lambda|e^{i\theta}$ with the global phase of the field integration variable
$\phi(x) \to |\phi(x)|e^{-i\theta/4}$ so that the damping induced by the quartic term is maintained.
The result is that the integral defines an analytic function of λ for |θ| < π, with
different (complex-conjugate) results obtained on the negative real axis cut depending
on whether the continuation is performed clockwise or anticlockwise around the origin.
It turns out, as we shall soon see, that the large-order divergence of perturbation
theory is intimately related to the behavior of the (imaginary) discontinuity for
negative coupling. Physically, this is due to the connection between the imaginary
part developed by the energies of quantum mechanical states when the system becomes
unstable and the lifetime of these states, so once again we see that there is a deep
connection between the large-order behavior of perturbation theory and stability
issues.
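The asymptotic character of (11.4) is easy to see numerically in the d = 0 case. The sketch below is a hedged illustration: the function names, the Simpson integrator, and the sample values λ = 0.05, m = 1 are assumptions made for the example. It compares partial sums of (11.4) with a direct quadrature of (11.3).

```python
import math

def z_exact(lam, m=1.0, half_width=12.0, steps=200_000):
    """Composite Simpson estimate of the integral (11.3)."""
    h = 2.0 * half_width / steps  # steps must be even for Simpson's rule

    def f(x):
        return math.exp(-0.5 * m * m * x * x - 0.25 * lam * x ** 4)

    total = f(-half_width) + f(half_width)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(-half_width + k * h)
    return total * h / 3.0

def c(n, m=1.0):
    """Coefficient c_n of (11.4), evaluated through log-gamma."""
    log_mag = (0.5 * math.log(2.0) + math.lgamma(2 * n + 0.5)
               - math.lgamma(n + 1) - (2 * n + 0.5) * math.log(m * m))
    return (-1) ** n * math.exp(log_mag)

def partial_sums(lam, n_max, m=1.0):
    """Partial sums S_N = sum_{n<=N} c_n lam^n for N = 0..n_max."""
    sums, running = [], 0.0
    for n in range(n_max + 1):
        running += c(n, m) * lam ** n
        sums.append(running)
    return sums

lam = 0.05
exact = z_exact(lam)
errors = [abs(s - exact) for s in partial_sums(lam, 30)]
best_error = min(errors)    # reached after only a handful of terms
late_error = errors[30]     # far past the optimal truncation order
```

For this coupling the ratio of successive terms is roughly 4nλ, so the terms shrink only for the first few orders; the truncation error then stops improving and grows without bound, in line with the definition of an asymptotic series recalled in footnote 2.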
The divergence of the perturbation theory for quantum electrodynamic quantities
expanded in powers of the fine-structure constant $\alpha = e^2/\hbar c$ was first shown by Dyson
using exactly such a stability argument (Dyson, 1952). One may once again consider
the zero-source generating functional Z(α) of the theory (i.e., the vacuum-to-vacuum
amplitude absent external fields), the formal expansion of which involves the sum
of vacuum diagrams, each of which must contain an even number of vertices (as all
photon lines are internal), so that we have a formal expansion

$Z(\alpha) \sim \sum_n c_n\alpha^n, \quad \alpha \equiv \frac{e^2}{\hbar c}$ (11.5)

If we analytically continue the theory to negative α, the vacuum becomes unstable to
the production of electron–positron pairs, by the following simple energetic argument.
As a result of the negative sign of $\alpha \propto e^2$, like charges now attract and unlike charges
repel, so that we can concoct a state of negative energy (into which the vacuum can


2 The reader is reminded that the series $\sum_n c_n \lambda^n$ is said to be asymptotic to a function $f(\lambda)$ if
$|f(\lambda) - \sum_{n=0}^{N} c_n \lambda^n| = O(\lambda^{N+1})$ as λ → 0 for any fixed N.

Fig. 11.1 Electron–positron pair assembly leading to energetic instability (for α < 0). [Figure: a group of N electrons and a group of N positrons, each of spatial extent r.]

therefore decay) by creating N electron–positron pairs and arranging them spatially
(see Fig. 11.1) with the electrons and positrons separated into two
groups of spatial dimension r, a distance d ≫ r apart. The energetic cost of this
configuration, if the electrons and positrons are relativistic with momentum $p \sim \hbar/r$,
is of order $+N\hbar c/r$ in kinetic energy, and of order $-N^2 e^2/r$ for electrostatic potential
energy (ignoring the repulsive contribution between the two groups if d ≫ r), so we
have instability once
$$N^2\frac{e^2}{r} > N\frac{\hbar c}{r} \;\Rightarrow\; N > \frac{\hbar c}{e^2} \sim \frac{1}{\alpha} \tag{11.6}$$
On the other hand, an asymptotic expansion of the form (11.5), with factorially
growing coefficients $c_n \sim n! \sim n^n$, will consist of a series of terms which decrease until
$\alpha n \sim O(1)$ (i.e., n ∼ N ∼ 1/α) and then increase beyond that point.³ This is exactly
the behavior expected once graphs involving the generation of intermediate states
with N electron–positron pairs (and therefore N powers of α) are included in the
computed vacuum amplitude, inducing via (11.6) an energetic instability for negative
α. Of course, the fact that the series gives increasingly accurate approximants until we
reach on the order of 137 (1/α) loops means that the formal divergence of perturbation
theory in pure quantum electrodynamics is of absolutely no practical consequence:
we are dealing with the first type of theory discussed in the introduction, where an
asymptotic expansion is perfectly adequate to the task of extracting results of an
accuracy (in the case of QED) far beyond our capacity to empirically verify them.
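
The turnover described above can be made concrete with a few lines of Python. This is only a sketch: the model terms n!·αⁿ stand in for the true QED coefficients (which are not computed here), and the helper names `log_term` and `smallest_term_order` are illustrative rather than taken from the text.

```python
import math

def log_term(n: int, alpha: float) -> float:
    """Natural log of the model term n! * alpha**n (logs avoid overflow)."""
    return math.lgamma(n + 1) + n * math.log(alpha)

def smallest_term_order(alpha: float, n_max: int = 1000) -> int:
    """Order n at which the model term n! * alpha**n is smallest."""
    return min(range(n_max + 1), key=lambda n: log_term(n, alpha))

alpha = 1.0 / 137.0
n_star = smallest_term_order(alpha)

# Successive terms shrink while (n + 1) * alpha < 1 and grow afterwards,
# so the smallest term sits at n of order 1/alpha.
print("order of smallest term:", n_star)
print("log10 of smallest term:", log_term(n_star, alpha) / math.log(10.0))
```

Working with logarithms via `math.lgamma` keeps the factorials from overflowing; the ratio of successive model terms is (n + 1)α, which is the whole content of the estimate n ∼ 1/α.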
It turns out that for spacetime dimensions less than four, the large-order behavior
of perturbation theory for scalar theories is essentially semiclassical in nature, and

3 The heuristic argument given here, while correct for scalar electrodynamics—i.e., for theories with
spinless charged “electrons” and “positrons”—ignores Fermi exchange effects which suppress configurations
with fermions in highly overlapping states. More careful arguments, first given by Parisi (Parisi, 1977), lead
to perturbative coefficients in QED rising like $\sqrt{n!}$. In the path-integral approach described below, the
dominant behavior arises from saddle-points in a combined effective action arising from the free photon
contribution and the determinant obtained by integrating out the electron field: see (Ioffe et al., 2010),
Section 5.8.

is determined by exactly the instability considerations indicated qualitatively above.
In particular, as we shall see in our study of scale sensitivity in Part 4 of the book,
scalar $\phi^4$ theory in d = 2 or 3 spacetime dimensions is super-renormalizable:
the ultraviolet behavior becomes progressively softer (i.e., less divergent) as we go
to higher orders of perturbation theory. As a result, the large-order behavior of
the renormalized and cutoff theories is essentially the same. In four dimensions the
renormalizability (rather than super-renormalizability) of the theory, with logarithmic
ultraviolet dependence of the cutoff amplitudes, results in a more complicated situation
once the theory is rewritten (renormalized) in terms of cutoff-independent physical
quantities, as the strength of the ultraviolet sensitivity remains unchanged as we go to
higher orders of perturbation theory. For d ≤ 3, a seminal analysis initiated by Lipatov
(Lipatov, 1977) shows that the leading behavior at large orders of perturbation theory
is determined by classical extrema of the Euclidean action appearing in the functional
integral (11.2). Considerations of space limit us to merely an outline of the argument
here: for further details the reader is referred to the excellent discussion of Zinn–Justin
((Zinn–Justin, 1989), especially Chapters 33 and 37).
For simplicity we begin with the case d = 1 (with the single spacetime dimension
treated as time): relabeling the field φ(x) → q(t), (11.2) becomes (taking m = 1),
$$Z(\lambda) = \int \mathcal{D}q(t)\, e^{-\int dt\,\{\frac{1}{2}\dot q(t)^2 + \frac{1}{2}q(t)^2 + \frac{\lambda}{4}q(t)^4\}} \equiv \int \mathcal{D}q(t)\, e^{-S_E(q)} \tag{11.7}$$
which is exactly the Euclidean functional integral appropriate for an anharmonic
oscillator defined by the Hamiltonian $H = \frac{p^2}{2} + \frac{1}{2}q^2 + \frac{1}{4}\lambda q^4$ (with m = ω = 1 for
simplicity: cf. Equations (4.100), (4.99), and (4.108–4.110)). If we take the functional
integral to run over periodic functions satisfying q(−β/2) = q(+β/2), this functional
integral gives, as discussed in Section 4.2.1, the finite-temperature partition function
$$Z(\lambda, \beta) = \mathrm{Tr}\, e^{-\beta H} = \sum_{n=0}^{\infty} e^{-\beta E_n(\lambda)} \sim e^{-\beta E_0(\lambda)}, \qquad \beta \to \infty \tag{11.8}$$

so that for large β (low temperature) the ground-state energy dominates. The
analog of the generating functional of connected graphs W becomes in this limit
W ≡ ln Z ∼ −βE0 (λ)—in other words, just the ground-state energy (up to a factor
of −β). The instability of the system as we analytically continue from positive real
λ to negative λ is manifested, as usual in quantum mechanics, in the appearance
of an imaginary part in the analytically continued energy eigenvalue E0 (λ): indeed,
just as in the zero-dimensional toy integral case discussed previously, the imaginary
part is simply the signal of a cut appearing along the negative real axis in the
complex plane of λ, with $Z(\lambda = -|\lambda| + i\epsilon) = Z^*(\lambda = -|\lambda| - i\epsilon)$ as Z(λ) (dropping
the at present uninteresting β dependence) is real-analytic (real for positive real λ).
As for the one-dimensional integral, the value of Z on the top and bottom lips of
the cut can be computed by rotating the phase of the “field” variable q(t) in tandem
with the phase of the coupling so as to preserve a negative real part (and hence
convergence) in the exponent $-S_E$ appearing in (11.7). Thus, we let $\lambda \to e^{i\phi}|\lambda|$,
$q(t) \to q(t)\,e^{-i\phi(\frac{1}{4} - \delta/\pi)}$ (with δ small positive) and arrive after rotating φ → ±π at

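the well-defined functional integrals
$$Z(\lambda = -|\lambda| \pm i\epsilon) = \int \mathcal{D}q(t)\, e^{-\int dt\,\{\frac{1}{2}\dot q_\theta(t)^2 + \frac{1}{2}q_\theta(t)^2 + \frac{\lambda}{4}q_\theta(t)^4\}} \tag{11.9}$$
where $q_\theta(t) = e^{i\theta}q(t)$, $\theta = \mp(\frac{\pi}{4} - \delta)$. The discontinuity of Z across the cut on the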
negative axis is then given by subtracting the two integrals (11.9) for the two signs
of θ. Lipatov realised that the resultant difference of path integrals can be deformed
further to pass through saddle points (extrema of the Euclidean action SE (q)) which
dominate the result for small negative Re(λ). The extremum of the action corresponds
to trajectories of our particle qcl (t) satisfying the equation

$$\left.\frac{\delta S_E(q)}{\delta q(t)}\right|_{q=q_{cl}} = -\ddot q_{cl}(t) + q_{cl}(t) + \lambda q_{cl}(t)^3 = 0 \tag{11.10}$$
with the contribution of the saddle-point to Z proportional to $e^{-S_{cl}}$, $S_{cl} \equiv S_E(q_{cl})$.
Physically, this is just Newton’s Second Law for a unit-mass classical particle moving
in the potential $V(q) = -\frac{1}{2}q^2 - \frac{\lambda}{4}q^4$, pictured in Fig. 11.2.
The equation (11.10) has an obvious time-independent solution (recall that this is
for λ real negative), corresponding to the particle sitting motionless at a local minimum
$q = \pm\sqrt{-1/\lambda}$:
$$q_{cl}(t) = \pm\sqrt{-\frac{1}{\lambda}}, \qquad S_{cl} = \int_{-\beta/2}^{\beta/2}\Big(\frac{1}{2}\dot q_{cl}(t)^2 + \frac{1}{2}q_{cl}(t)^2 + \frac{\lambda}{4}q_{cl}(t)^4\Big)dt = -\beta/(4\lambda) \tag{11.11}$$

Fig. 11.2 $V(q) = -\frac{1}{2}q^2 - \frac{\lambda}{4}q^4$, λ = −1.0. [Plot of V(q) against q, with the labelled points C, O, D and the minima A, B.]



with a contribution $e^{-S_{cl}}$ to Z which (for negative real λ) vanishes exponentially in
the large β limit. In order to obtain a finite contribution in the large β limit, we must
find a saddle-point solution such that the integral in (11.11) remains finite for β → ∞,
which clearly requires that (a) $\dot q_{cl}(t) \to 0$ for large t, and (b) $\frac{1}{2}q_{cl}(t)^2 + \frac{\lambda}{4}q_{cl}(t)^4 \to 0 \Rightarrow q_{cl}(t) \to 0$
for large t. The Newtonian interpretation of (11.10) means that this solution
corresponds to the particle with zero total energy beginning at rest at the origin at
time t = −β/2 → −∞ and rolling down the potential well, reaching $q = \pm\sqrt{-2/\lambda}$ at
some (arbitrary) finite time $t_0$, $-\beta/2 < t_0 < \beta/2$, then returning asymptotically, as
t → +∞, to the origin. There are evidently two independent saddle-point solutions,
depending on whether the particle rolls to the left (going from the origin O through
the minimum at A to point C and then back) or to the right (O to D through the
minimum at B and back) in Fig. 11.2. The reader may easily verify that the following
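explicit solution satisfies Newton’s equation (11.10):
$$q_{cl}(t) = \pm\sqrt{-\frac{2}{\lambda}}\;\frac{1}{\cosh(t - t_0)} \tag{11.12}$$
with the finite (in the β → ∞ limit) Euclidean action
$$S_{cl} = \int_{-\infty}^{+\infty}\Big(\frac{1}{2}\dot q_{cl}(t)^2 + \frac{1}{2}q_{cl}(t)^2 + \frac{\lambda}{4}q_{cl}(t)^4\Big)dt = -\frac{4}{3\lambda} \tag{11.13}$$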
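
As a numerical sanity check of (11.12) and (11.13), the following Python sketch evaluates the residual of the equation of motion (11.10) by finite differences and integrates the Euclidean Lagrangian with Simpson's rule. The coupling value λ = −0.5, the step sizes, and the function names are illustrative assumptions, not part of the text.

```python
import math

lam = -0.5  # illustrative negative coupling (assumption)

def q_cl(t, t0=0.0):
    """Instanton profile (11.12), taking the + sign."""
    return math.sqrt(-2.0 / lam) / math.cosh(t - t0)

def residual(t, h=1e-3):
    """Left side of (11.10), -q'' + q + lam*q**3, via a central difference."""
    q_dd = (q_cl(t + h) - 2.0 * q_cl(t) + q_cl(t - h)) / h**2
    return -q_dd + q_cl(t) + lam * q_cl(t) ** 3

def lagrangian(t, h=1e-4):
    """Euclidean Lagrangian (1/2)q'^2 + (1/2)q^2 + (lam/4)q^4 on the instanton."""
    q = q_cl(t)
    q_dot = (q_cl(t + h) - q_cl(t - h)) / (2.0 * h)
    return 0.5 * q_dot**2 + 0.5 * q**2 + 0.25 * lam * q**4

def action(t_max=20.0, n=40000):
    """Composite Simpson rule over [-t_max, t_max] (n must be even)."""
    h = 2.0 * t_max / n
    total = lagrangian(-t_max) + lagrangian(t_max)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * lagrangian(-t_max + i * h)
    return total * h / 3.0

max_residual = max(abs(residual(0.1 * k)) for k in range(-50, 51))
s_cl = action()
print("max |residual of (11.10)|:", max_residual)
print("numerical action:", s_cl, " expected -4/(3*lam):", -4.0 / (3.0 * lam))
```

The integration range is cut off at |t| = 20, where the integrand has already decayed like e^{−2|t|}, so the truncation error is far below the quadrature error.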
The contribution of this saddle-point to Z is clearly proportional to $e^{-S_{cl}} = e^{4/(3\lambda)}$, which
is exponentially small as $\lambda \to 0^-$. Finite action solutions of the Euclidean equations
of motion such as (11.12) have been dubbed “pseudoparticles”, or “instantons” in the
literature. It is clear that Z(λ) contains an essential singularity at the origin, a feature
which persists quite generally in interacting field theories. A complete evaluation
of the leading-order contribution to Im(Z(λ)) requires a careful evaluation of the
Gaussian fluctuations around the saddle-point defined by (11.12): in addition to an
obvious factor of β, due to the fact that the time $t_0$ at which the particle reaches its
maximum displacement from the origin can be chosen anywhere between −β/2 and
+β/2 (edge effects are unimportant for β large), the resultant Gaussian integral yields⁴
a contribution of the form $K e^{-\beta/2}\sqrt{-1/\lambda}$, with K an uninteresting numerical constant.
To summarize, the leading contribution to the discontinuity of Z(λ) for small negative
λ and large β takes the form (with corrections of relative order λ and $e^{-\beta}$)
$$\mathrm{Im}(Z(\lambda = -|\lambda| + i\epsilon)) \sim -K\beta\, e^{-\beta/2}\sqrt{-\frac{1}{\lambda}}\; e^{4/(3\lambda)} \tag{11.14}$$
On the other hand, from (11.8), the imaginary part of the partition function at large
β and small coupling (where $\mathrm{Re}(E_0(\lambda)) \to \frac{1}{2}$, the zero-point energy for the harmonic
oscillator) must behave like
$$\mathrm{Im}(Z) \sim -\beta\,\mathrm{Im}(E_0(\lambda))\,e^{-\beta\,\mathrm{Re}(E_0(\lambda))} \sim -\beta e^{-\beta/2}\,\mathrm{Im}(E_0(\lambda)) \tag{11.15}$$

4 See (Zinn–Justin, 1989), op. cit., for further details.



Comparing (11.15) and (11.14), we conclude that
$$\mathrm{Im}(E_0(\lambda)) \sim K\sqrt{-\frac{1}{\lambda}}\; e^{4/(3\lambda)}\,(1 + O(\lambda)), \qquad \lambda \to 0^- \tag{11.16}$$
The imaginary part of the energy displayed in (11.16) is directly connected to the
tunneling rate for a zero-energy particle initially localized around the origin to
escape through the barrier formed by the potential $\frac{1}{2}q^2 - \frac{|\lambda|}{4}q^4$ appearing in our
original functional integral (11.7), as one may verify with a simple WKB calculation
(see Problem 1). The connection of the instability generated by such tunneling to
the large-order behavior of the coefficients appearing in the Rayleigh–Schrödinger
perturbation theory for the ground-state energy $E_0(\lambda) \sim \sum_n c_n \lambda^n$ is established by
use of analyticity, which allows us to connect the behavior of $E_0(\lambda)$ for positive real
λ to the discontinuity across the cut on the negative real axis.
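
As a quick consistency check of the exponent in (11.16) (a sketch only, not the full solution of Problem 1), one can evaluate the standard WKB barrier-penetration exponent for a zero-energy particle of unit mass, with ħ = 1, in the potential $V(q) = \frac{1}{2}q^2 - \frac{|\lambda|}{4}q^4$. The outer turning point $q_t = \sqrt{2/|\lambda|}$ and the substitution $s = |\lambda|q^2/2$ are the only ingredients:

```latex
2\int_0^{q_t}\sqrt{2V(q)}\,dq
  = 2\int_0^{\sqrt{2/|\lambda|}} q\,\sqrt{1-\tfrac{|\lambda|}{2}q^2}\;dq
  = \frac{2}{|\lambda|}\int_0^1\sqrt{1-s}\;ds
  = \frac{4}{3|\lambda|}
  = -\frac{4}{3\lambda}\qquad(\lambda<0)
```

The resulting suppression factor $e^{-4/(3|\lambda|)}$ coincides with the instanton factor $e^{4/(3\lambda)}$ for negative λ; the prefactor is not fixed by this estimate.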
First, note that for large coupling λ, the energy $E_0(\lambda)$ has the asymptotic behavior
$\lambda^{1/3}$, by a simple scaling argument (see Problem 2). This means that the function
$f(\lambda) \equiv (E_0(\lambda) - \frac{1}{2})/\lambda$ is (a) analytic in the complex plane of λ, cut along the negative
real axis, and (b) behaves for large |λ| like $\lambda^{-2/3}$. Accordingly, f(λ) satisfies the Cauchy
formula
$$f(\lambda) = \frac{1}{2\pi i}\oint_C \frac{f(\lambda')}{\lambda' - \lambda}\,d\lambda' \tag{11.17}$$
where C is the contour indicated in Fig. 11.3. Re-expressing f(λ) in terms of $E_0(\lambda)$, and expanding
the contour C to infinite size, whereupon the curved parts go to zero while the two
straight portions along the negative real axis combine to give the imaginary part of
$E_0$, we obtain

Fig. 11.3 Contour integral for dispersion relation in complex λ plane. [Figure: the contour C in the λ′ plane, with the point λ marked.]



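$$E_0(\lambda) = \frac{1}{2} + \frac{\lambda}{\pi}\int_{-\infty}^{0}\frac{\mathrm{Im}E_0(\lambda')}{(\lambda' - \lambda)\lambda'}\,d\lambda' \tag{11.18}$$
Expanding the denominator factor $(\lambda' - \lambda)^{-1}$ in powers of λ inside the integral yields
the desired asymptotic expansion and an explicit formula for the leading behavior at
large order n of the coefficients $c_n$ as an integral over the cut discontinuity of $E_0$:
$$c_n = \frac{1}{\pi}\int_{-\infty}^{0}\frac{\mathrm{Im}E_0(\lambda)}{\lambda^{n+1}}\,d\lambda \sim K(-1)^{n+1}\Big(\frac{3}{4}\Big)^{n+\frac{1}{2}}\Gamma\Big(n + \frac{1}{2}\Big)(1 + O(1/n)) \tag{11.19}$$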
where the corrections of order 1/n arise from the corrections of relative order λ to the
leading behavior of the discontinuity for small negative λ. The important features to
note here are first, that the coefficients rise factorially with order, as anticipated by
our intuitive arguments, and second, that they alternate in sign. The latter property
will move to center stage in Section 11.3, when we discuss the Borel summability (or
absence thereof) of perturbative expansions in field theory.
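
The step from the cut integral to the Gamma function in (11.19) can be checked numerically. The Python sketch below inserts the leading form (11.16) for Im E₀, substitutes u = −1/λ, and compares a Simpson-rule estimate with the closed form. The overall constant (K together with the factor 1/π) is set to one, and the function names and grid parameters are illustrative assumptions.

```python
import math

def closed_form(n):
    """(-1)^(n+1) (3/4)^(n+1/2) Gamma(n+1/2): right side of (11.19), constant set to 1."""
    return (-1) ** (n + 1) * 0.75 ** (n + 0.5) * math.gamma(n + 0.5)

def cut_integral(n, u_max=120.0, steps=120000):
    """Integral of sqrt(-1/lam) * exp(4/(3 lam)) / lam**(n+1) over lam in (-inf, 0).

    With u = -1/lam this becomes (-1)^(n+1) * int_0^inf u^(n-1/2) exp(-4u/3) du,
    estimated here with the composite Simpson rule (steps must be even).
    """
    h = u_max / steps
    f = lambda u: u ** (n - 0.5) * math.exp(-4.0 * u / 3.0)
    total = f(0.0) + f(u_max)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return (-1) ** (n + 1) * total * h / 3.0

for n in (2, 5, 10):
    print(n, cut_integral(n), closed_form(n))
```

Successive closed-form values alternate in sign and their magnitudes grow by a factor (3/4)(n + 1/2) per order, which is the factorial growth referred to in the text.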
For spacetime dimensions d = 2 or 3, we move into the realm of field theory proper,
but the large-order analysis proceeds along much the same lines as for the anharmonic
oscillator: the dominant contribution to the discontinuity in the partition function
(11.2) when analytically continued to negative (small) λ is given by the saddle-point
contribution to the functional integral arising from finite-action solutions (as before,
called “instantons”) of the classical Euclidean field equation describing the extrema
of the action:

$$-\Delta\phi_{cl}(x) + m^2\phi_{cl}(x) + \lambda\phi_{cl}^3(x) = 0 \tag{11.20}$$
where Δ is the Laplacian in d dimensions. The extremal value of the action $S_{cl}$ at a
solution of (11.20) can be simplified, using (11.20) to eliminate the derivative terms:
$$S_{cl} = \int d^dx\,\Big\{-\frac{1}{2}\phi_{cl}(x)\Delta\phi_{cl}(x) + \frac{1}{2}m^2\phi_{cl}^2(x) + \frac{1}{4}\lambda\phi_{cl}^4(x)\Big\} = -\frac{\lambda}{4}\int d^dx\,\phi_{cl}^4(x) > 0 \tag{11.21}$$
The leading contribution is proportional to $e^{-S_{cl}}$, so we are really looking for the
finite-action solution with the minimum value for $S_{cl}$. From (11.21) it is clear that finite
action requires that $\phi_{cl}(x) \to 0$, $x \to \infty$. It can further be shown (see (Zinn–Justin,
1989), op. cit.) that the minimal action solutions correspond to spherical symmetry,
$\phi_{cl}(x - x_0) = \frac{m}{\sqrt{-\lambda}}\,u(r)$, $r \equiv m|x - x_0|$, where the instanton solution is centered at an
arbitrary (Euclidean) spacetime point $x_0$ (exactly analogous to the time $t_0$ appearing
in the instanton solution (11.12) for d = 1). The rescaled dimensionless function u(r)
satisfies the ordinary non-linear differential equation
$$\frac{d^2u(r)}{dr^2} = -\frac{d-1}{r}\frac{du(r)}{dr} - \frac{d}{du}\Big(-\frac{1}{2}u^2(r) + \frac{1}{4}u^4(r)\Big) \tag{11.22}$$
with the boundary condition u(r) → 0, r → ∞. In the previously discussed case of the
anharmonic oscillator (d = 1) the radial coordinate r corresponded to the time, and
the equation (11.22) had a ready-made mechanical interpretation in terms of Newton’s
Law for a unit mass particle in a potential $V(u) = -\frac{1}{2}u^2 + \frac{1}{4}u^4$. This remains the case

here if we interpret r as a time coordinate, for dimensions d = 2, 3, with the additional
feature that a retarding frictional force term (proportional to velocity, and increasing
with dimension) is present. The result is that the boundary condition u(r) → 0 as
“time” r → ∞ is achieved (without an “overshoot” to negative u) only if we start our
fictional particle at time r = 0 at $|u(0)| > \sqrt{2}$ (the value for the anharmonic oscillator
case, d = 1, must be increased to overcome the additional frictional resistance: so we
must “start” our particle to the right of point D, or the left of point C, in Fig. 11.2)
at a discretely determined value, the minimal such value of |u(0)| determining, as a
detailed analysis confirms, the minimal action $S_{cl}$. A simple numerical ODE solver for
(11.22) yields the desired results for d = 1, 2, or 3, displayed in Fig. 11.4 (only the
d = 1 case allows an analytic solution: namely, $u(r) = \sqrt{2}/\cosh(r)$, cf. Eq. (11.12)).
The corresponding classical action is given by inserting the result obtained from the
solution of (11.22) into (11.21), which gives
$$S_{cl} = -\frac{\lambda}{4}\int d^dx\,\phi_{cl}^4(x) = -\frac{1}{\lambda}\,m^{4-d}A_d, \qquad A_d = \frac{\pi^{d/2}}{2\Gamma(d/2)}\int_0^\infty r^{d-1}u^4(r)\,dr \tag{11.23}$$
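
The kind of "simple numerical ODE solver" mentioned for (11.22) can be sketched as a shooting method. The Python below is one possible realization under stated assumptions, not the procedure actually used for Fig. 11.4: it integrates (11.22) with a fixed-step fourth-order Runge–Kutta scheme and bisects on the starting value u(0), classifying each trial as an overshoot (u crosses zero) or an undershoot (u′ turns positive first). The step size, bracket [1.1, 10] and function names are illustrative choices.

```python
import math

def shoot(u0, d, h=0.005, r_max=25.0):
    """Integrate u'' = -(d-1)/r u' + u - u^3 from u(0)=u0, u'(0)=0.

    Returns +1 if u crosses zero (overshoot) and -1 if u' becomes
    positive first (undershoot: the particle falls back into the well).
    """
    def accel(r, u, v):
        return -(d - 1) / r * v + u - u**3

    # Series start avoids the 1/r term at r = 0: u''(0) = (u0 - u0^3)/d.
    a0 = (u0 - u0**3) / d
    r, u, v = h, u0 + 0.5 * a0 * h * h, a0 * h
    while r < r_max:
        k1u, k1v = v, accel(r, u, v)
        k2u, k2v = v + 0.5 * h * k1v, accel(r + 0.5 * h, u + 0.5 * h * k1u, v + 0.5 * h * k1v)
        k3u, k3v = v + 0.5 * h * k2v, accel(r + 0.5 * h, u + 0.5 * h * k2u, v + 0.5 * h * k2v)
        k4u, k4v = v + h * k3v, accel(r + h, u + h * k3u, v + h * k3v)
        u += h * (k1u + 2.0 * k2u + 2.0 * k3u + k4u) / 6.0
        v += h * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        r += h
        if u < 0.0:
            return +1
        if v > 0.0:
            return -1
    return +1  # undecided within r_max; counted as overshoot (assumption)

def instanton_start(d, lo=1.1, hi=10.0, iters=40):
    """Bisect on u(0) for the smallest start value that just reaches u = 0."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if shoot(mid, d) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

starts = {d: instanton_start(d) for d in (1, 2, 3)}
for d, u0 in starts.items():
    print(f"d = {d}: u(0) ~ {u0:.5f}")
print("analytic d = 1 value sqrt(2):", math.sqrt(2.0))
```

For d = 1 the bisection should reproduce the analytic value √2 quoted in the text, which provides a check on the solver before trusting the d = 2 and d = 3 values, where the friction term forces a larger starting point.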

The physical interpretation of these classical solutions is a natural extension of
that given earlier for the “rollover” solution (11.12) in the anharmonic oscillator
case: $\phi_{cl}(x)$ represents a spatiotemporal picture of the dominant (i.e., most likely)
tunneling behavior of a field configuration initially centered around the
minimum-energy configuration for positive λ (i.e., zero field) through the energy barrier that
appears once the coupling is continued to a negative real value.
Although we have focussed on the large-order behavior of the source-free generating
function Z in (11.2)—in other words, as we saw in the preceding chapter, on the
vacuum graphs of the theory—the saddle-point analysis above extends in a straight-
forward way to the functional integral giving the Euclidean Schwinger functions of the

Fig. 11.4 Instanton solutions for $\phi^4$-theory in dimensions d = 1, 2, 3. [Plot of u(r) against r for d = 1 (anharmonic oscillator), d = 2 and d = 3.]



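theory (cf. (10.82)):
$$S(x_1, x_2, \ldots, x_n; \lambda) = \frac{1}{Z(\lambda, m)}\int \mathcal{D}\phi\;\phi(x_1)\phi(x_2)\cdots\phi(x_n)\,e^{-\int d^dx\,\{\frac{1}{2}\nabla\phi\cdot\nabla\phi + \frac{1}{2}m^2\phi^2 + \frac{\lambda}{4}\phi^4\}} \tag{11.24}$$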
The discontinuity across the cut for negative λ, which determines the large-order
behavior by the dispersion analysis we have described, is again obtained by expanding
around the instanton saddle-points corresponding to instantons φcl (x − x0 ) centered
around arbitrary spacetime points x0 , with a prefactor determined by the Gaussian
integration around the saddle-point (see (Zinn–Justin, 1989) for details):
$$\mathrm{Im}\,S(x_1, x_2, \ldots, x_n; \lambda = -|\lambda| + i\epsilon) \sim K_d\,\frac{e^{A_d/\lambda}}{(-\lambda)^{d/2}}\int d^dx_0 \prod_{i=1}^{n}\phi_{cl}(x_i - x_0) \tag{11.25}$$

Just as in the perturbative expansion (11.19) for the anharmonic ground-state energy,
the presence of the essential singularity induced by $e^{A_d/\lambda}$ will lead to factorially growing
contributions (with oscillating sign) at large order, so the perturbation theory (even
when ultraviolet cutoffs are in place) is divergent.
With the technology we have described in this section in place, there is a great
temptation to draw the conclusion that the dominant behavior at large orders of per-
turbation theory somehow ought to determine the dominant physical behavior of the
corresponding amplitudes. This temptation must be strenuously resisted, for (at least)
two reasons. First, one must bear in mind that even if a unique resummation procedure
were available to convert the information contained in the perturbative coefficients (to
all orders) into well-defined convergent approximants to the exact field-theoretic
amplitudes, there is absolutely no guarantee that the dominant portions of the large-order
perturbative coefficients actually translate into the dominant parts of the amplitudes
in the physical regime of interest. In fact, we shall see in the next section that bound
states in field theory provide an immediate counterexample to any such claim.
Secondly, the analysis performed above was entirely carried out in the context of
the Euclidean functional integral: actual physical amplitudes need to be obtained from
those calculated in Euclidean space by an analytic continuation to Minkowski space.
Unfortunately, it is well known that asymptotic estimates of analytic functions cannot
in general be analytically continued: in other words, it is not true that the analytic
continuation of an asymptotic series yields in general a correct asymptotic expansion
for the analytically continued function. Of course, this would be possible were the
expansions in question convergent Taylor series, but we have just seen that this is
essentially never the case in a non-trivial interacting field theory. These obstacles
have unfortunately severely limited the extraction of quantitatively useful physical
information from a large and elegant body of work on instanton solutions, especially in
quantum chromodynamics, where the tunneling processes described by instantons are
almost certainly connected in a deep way to the chiral symmetries (and the breaking
thereof) of the theory. A fortunate exception is in the theory of critical phenomena,
in which the Borel resummation (cf. Section 11.3) of the large-order behavior of $\phi^4$
theory in d = 3 dimensions has been used to extract highly accurate results (often

to four or five significant figures) on critical exponents for second-order transitions in
models in the same universality class as scalar field theory (LeGuillou and Zinn-Justin,
1980).

11.2 “Perturbatively non-perturbative” processes: threshold bound states
The fact that the perturbative expansion for field-theory amplitudes is divergent
means that the complete set of Feynman graphs contributing to a process (even if
we could calculate them all exactly, an impossible task in practice) does not define an
exact amplitude for the process. In an asymptotic series $\sum_k c_k g^k$, the property that
successive terms only decrease in magnitude for a finite number of orders (say, N)
before starting to increase again, means that the series is only practically useful if the
coupling constant g is “small”, which here should be taken to mean that the smallest
term in the series $c_N g^N$ is much smaller than the sum of the series $\sum_{k=0}^{N} c_k g^k$ up to
that point.
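
The practical meaning of "small" coupling can be illustrated on a one-dimensional integral of the kind used as a toy model for perturbation theory. In the Python sketch below, the particular normalization of the integral, the coupling g = 0.05 and all function names are assumptions made for illustration. The coefficients follow from the Gaussian moments ⟨x^{4k}⟩ = (4k)!/(4^k (2k)!), and the partial sums are compared with a direct numerical evaluation.

```python
import math

def toy_integral(g, x_max=12.0, steps=24000):
    """Simpson estimate of (2 pi)^(-1/2) * int exp(-x^2/2 - g x^4/4) dx."""
    h = 2.0 * x_max / steps
    f = lambda x: math.exp(-0.5 * x * x - 0.25 * g * x**4)
    total = f(-x_max) + f(x_max)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(-x_max + i * h)
    return total * h / 3.0 / math.sqrt(2.0 * math.pi)

def coefficient(k):
    """c_k = (-1)^k (4k)! / (16^k k! (2k)!), from expanding exp(-g x^4/4)."""
    return (-1) ** k * math.factorial(4 * k) / (
        16**k * math.factorial(k) * math.factorial(2 * k))

g = 0.05
exact = toy_integral(g)

errors = []
partial = 0.0
for k in range(16):
    partial += coefficient(k) * g**k
    errors.append(abs(partial - exact))  # error after including order k

best_order = min(range(len(errors)), key=errors.__getitem__)
print("best truncation order:", best_order)
print("error at best order:", errors[best_order])
print("error at order 15:", errors[-1])
```

The error first falls and then grows without bound: truncating near the smallest term is the best the series can do, and adding further orders makes the approximation worse.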
On the other hand, certain physical processes necessarily require the inclusion of an
infinite number of interactions. The formation of a stable bound state, for example, in
which the constituent particles by definition continue to interact over an infinite time
period, is clearly an example of such a situation. Clearly, we cannot simply imagine
literally summing the infinite set of Feynman graphs contributing to the continuing
interaction of the constituents: as we have seen in the preceding section, this would give
an infinite result! Nevertheless, we shall see that for a certain class of bound states,
defined by the appearance of non-relativistic threshold singularities, the dominant
features of the bound-state formation, and the properties of the resultant bound state,
are determined by a summable subset of graphs which predominate in the kinematical
region appropriate for these systems. Moreover, in the event that the basic binding
interaction is weak, the perturbative amplitudes can be reorganized into a series of
terms, each containing a tower of infinitely many contributions, which determine to
successively higher accuracy (i.e., higher order in the weak coupling) the properties of
the resultant bound state. For lack of a better term, we may refer to such processes
as “perturbatively non-perturbative”.
Consider a theory with two fields A(x), B(x) (for kinematic simplicity, we assume
these fields to be equal mass in the following) interacting in such a way that a stable
single particle state |P⟩ (we omit spin labels, if present, for notational simplicity), of
mass $M_B$, arises, with
$$\langle 0|T(A(x)B(y))|P\rangle \neq 0 \tag{11.26}$$
We then say that the state |P⟩ is a bound state of the particles interpolated for by A
and B. In the terminology of Section 9.2, the bilocal operator
$$C_x(X) = T\big(A(X + \tfrac{x}{2})\,B(X - \tfrac{x}{2})\big) \tag{11.27}$$
(suitably smeared over x) acts as an almost local field (centered at the point X) which
interpolates for the bound-state particle. The matrix element


$$\Phi_P(x) \equiv (2\pi)^{3/2}\sqrt{2E(P)}\;\langle 0|T(A(\tfrac{x}{2})B(-\tfrac{x}{2}))|P\rangle, \qquad E(P) \equiv \sqrt{\vec P^{\,2} + M_B^2} \tag{11.28}$$
is called the “Bethe–Salpeter wavefunction” of the bound state. The energy square-root
factor for the bound state is included for convenience as our bound-state ket is
non-covariantly normalized. As we shall see, the wavefunction plays a role analogous to that of the
Schrödinger wavefunction in non-relativistic quantum theory: in particular, the bound-state
mass is determined by an eigenvalue equation involving this function.⁵ The
Källén–Lehmann representation (cf. Section 9.5) tells us that the existence of a single
particle asymptotic state of mass $M_B$ implies a pole of the form $1/(P^2 - M_B^2)$ in the
Feynman two-point function of any Heisenberg field that interpolates for the bound-state
particle (see (9.192)). The pole arises in the usual way from the contribution of
single-particle intermediate states to the two-point function

$$G(X - Y) \equiv \langle 0|T(C_x(X)C_y^\dagger(Y))|0\rangle \tag{11.29}$$
where we have temporarily suppressed the relative coordinates x, y which are held
finite and fixed while the pole in the Fourier transform of G
$$\tilde G(P) = \int \langle 0|T(C_x(X)C_y^\dagger(0))|0\rangle\, e^{iP\cdot X}\,d^dX \tag{11.30}$$
is evaluated. We have left the spacetime dimension d unspecified here, as we shall
consider examples in various spacetime dimensions below.⁶ The single particle
contribution to the spectral function $\rho_{1\,part}$ (see (9.188)) is
$$\frac{1}{(2\pi)^3}\,\theta(P_0)\,\rho_{1\,part}(P^2) = \int d^{d-1}k\,\langle 0|C_x(0)|k\rangle\langle k|C_y^\dagger(0)|0\rangle\,\delta^4(P - k)$$
$$= \frac{1}{(2\pi)^3}\,\frac{1}{2E(P)}\,\Phi_P(x)\Phi_P^*(y)\,\delta(P_0 - E(P))\,\theta(P_0)$$
$$\Rightarrow\; \rho_{1\,part}(P^2) = \Phi_P(x)\Phi_P^*(y)\,\delta(P^2 - M_B^2) \tag{11.31}$$
which then gives rise (see (9.191, 9.192)) to the expected single-particle pole in the
bound-state propagator
$$-i\tilde G(P) = -i\int e^{iP\cdot X}\,\langle 0|T\{A(X + x/2)B(X - x/2)A^\dagger(y/2)B^\dagger(-y/2)\}|0\rangle\,d^dX$$
$$\to \frac{\Phi_P(x)\Phi_P^*(y)}{P^2 - M_B^2 + i\epsilon}, \qquad P^2 \to M_B^2 \tag{11.32}$$

5 However, one must be careful, in the relativistic field theory case, not to attach the usual probabilistic
interpretation to this function: recall the difficulties entailed in attempts to define a position operator in
field theory, discussed in detail in Section 6.5.
6 The mathematically fastidious reader may imagine smooth smearing functions of rapid decrease in x
and y attached to the equations that follow, so that we are really dealing with almost local operators Cf (X)
in the sense of Section 9.2.

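Fig. 11.5 Bound-state pole contribution to the scattering amplitude in a binding channel. [Diagram: as $P^2 \to M_B^2$, $G^{(4)}$ with external momenta P/2 ± q and P/2 ± p factorizes into $\Phi_P(q)$, the pole factor $i/(P^2 - M_B^2 + i\epsilon)$, and $\Phi_P^*(p)$.]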

At this point it will be convenient to go over completely to momentum space, by
Fourier transforming the remaining coordinate space relative variables x, y. We recall
that the Feynman four-point function representing the scattering amplitude for elastic
A-B scattering, with the external momenta chosen as in Fig. 11.5, is given by
$$\tilde G^{(4)}(\tfrac{P}{2} + q, \tfrac{P}{2} - q, \tfrac{P}{2} + p, \tfrac{P}{2} - p)$$
$$= \int e^{i(\frac{P}{2}+q)\cdot x_1 + i(\frac{P}{2}-q)\cdot x_2 - i(\frac{P}{2}+p)\cdot y_1}\,\langle 0|T\{A(x_1)B(x_2)A^\dagger(y_1)B^\dagger(0)\}|0\rangle\,d^dx_1\,d^dx_2\,d^dy_1$$
$$= \int e^{iP\cdot X + iq\cdot x - ip\cdot y}\,\langle 0|T\{A(X + x/2)B(X - x/2)A^\dagger(y/2)B^\dagger(-y/2)\}|0\rangle\,d^dX\,d^dx\,d^dy \tag{11.33}$$
where we have omitted a spacetime integral over the fourth field (which deletes
the uninteresting $(2\pi)^4\delta^4(\sum P)$ energy-momentum conservation factor from the
amplitude) in the second line, and made the change of variables $x = x_1 - x_2$,
$X = \frac{x_1 + x_2 - y_1}{2}$, $y = y_1$ in the last line. Comparing (11.33) with (11.32), we see that the 2-2
scattering amplitude of our A and B particles will display a simple pole as the total
incoming momentum P is taken onto the mass shell for the bound state $P^2 \to M_B^2$:
$$\tilde G^{(4)}(\tfrac{P}{2} + q, \tfrac{P}{2} - q, \tfrac{P}{2} + p, \tfrac{P}{2} - p) \to i\,\frac{\Phi_P(q)\Phi_P^*(p)}{P^2 - M_B^2 + i\epsilon}, \qquad P^2 \to M_B^2 \tag{11.34}$$

This result may be given a convenient graphical expression, as in Fig. 11.5.


The existence of bound states which are potentially amenable to a systematically
improvable analysis based on perturbation theory depends on the presence of a

sufficiently weak coupling in which the bound-state properties can be expanded,
yielding increasingly accurate results, at least up to some finite order of the expansion.
On the other hand, it seems at first sight strange that a bound state would form at
all in the presence of an arbitrarily weak coupling, as one would expect that the
Feynman graphs representing increasing numbers of interactions between the binding
constituents would be suppressed by higher powers of a small quantity. Indeed, a
bound state can only form for arbitrarily weak coupling if the strength of each
successive interaction is enhanced kinematically by a correspondingly large factor,
so that amplitudes containing any number (indeed, an infinite number) of interactions
contribute comparably to the maintenance of the identity of the bound state over
an infinite time period. We shall see that the necessary enhancement arises from
the presence of threshold singularities, arising from kinematic regions in which the
constituent particles of the bound state are very close to being on mass shell. Moreover,
the set of graphs which provide the strongest threshold enhancement at any given
order of perturbation theory amount to a summable subset of the complete set of
Feynman diagrams of the theory: indeed, they have the mathematical structure of a
convergent geometric expansion, rather than a divergent asymptotic one. This fact
is of tremendous historical importance, given the critical role played by precision
calculations of bound state effects, such as the Lamb shift, in the development of
quantum electrodynamics—or, for that matter, of quantum mechanics, as in the case
of the hydrogen atom.
In Section 10.4 we saw that the singularities displayed by Feynman amplitudes in
field theory are closely related to the connectedness structure of the amplitudes—in
particular, to the way in which the amplitude is built up from p-irreducible amplitudes
defined by connectivity after p internal lines are cut. Expressing the full amplitude in
terms of p-irreducible vertices allows us to isolate the threshold singularities associated
with intermediate states involving p (and exactly p) particles. For the present case,
where the bound state arises from two constituents (the quanta of the A and B fields),
the relevant vertex is the two-particle irreducible (2PI) scattering amplitude KP (q, p)
(or kernel), the iteration of which, with two particle intermediate states connecting
successive kernels, generates the full 2-2 amplitude, which we now slightly relabel and
rescale to simplify the notation as

$$\tilde G_P(q, p) \equiv -\frac{1}{(2\pi)^d}\,\tilde G^{(4)}(\tfrac{P}{2} + q, \tfrac{P}{2} - q, \tfrac{P}{2} + p, \tfrac{P}{2} - p) \tag{11.35}$$

The iteration is depicted graphically in Fig. 11.6, where the crossbars on the outgoing
(top) propagators indicate that they have been amputated, with the result that the
disconnected part simply becomes a δ-function equating the initial and final relative
momenta p and q. Note that two-particle irreducibility in this context is defined as
the property that the graph remains connected when a single A line and a single B
line are cut: in other words, we imagine cutting the graphs in Fig. 11.6 horizontally (in
the so-called “s-channel” for the scattering amplitude). Particle lines bearing an arrow
in Fig. 11.6 should be regarded as full propagators (including self-energy corrections,
etc.) for the A and B fields (see below).

Fig. 11.6 2-2 scattering amplitude expressed as an iteration of two-particle irreducible segments. [Diagram: $\tilde G_P(q,p) = \delta^d(q - p) + K_P + K_P K_P + \ldots$, with outgoing momenta P/2 ± q and incoming momenta P/2 ± p.]

The series of 2PI kernels in Fig. 11.6 can be re-expressed as an integral equation
for $\tilde G_P^{(4)}$ with kernel $K_P$ and inhomogeneous part $I(q, p) = \delta^d(q - p)$
$$\hat\Delta_F^{-1}(\tfrac{P}{2} + q)\,\hat\Delta_F^{-1}(\tfrac{P}{2} - q)\,\tilde G_P(q, p) = I(q, p) + \int K_P(q, k)\,\tilde G_P(k, p)\,\frac{d^dk}{(2\pi)^d} \tag{11.36}$$

as depicted in Fig. 11.7. Here, $\hat\Delta_F(\frac{P}{2} + p)$ (resp. $\hat\Delta_F(\frac{P}{2} - p)$) represent full propagators
for the A (resp. B) fields (i.e., the Feynman two-point functions for the fully interacting
Heisenberg fields; cf. Section 9.5). To avoid overburdening the notation, we use the
same symbol for both propagators, even though the fields A and B may be distinct:
which propagator is meant will be clear from the context. The iteration of (11.36)
clearly generates the succession of 2PI segments indicated in Fig. 11.6.
Note that the kernel KP (q, k) is defined to be amputated with respect to both
incoming and outgoing legs (with momenta P/2 ± k and P/2 ± q), in order to avoid
doubling the internal propagators. The relevance of this representation of the 2-2
amplitude is that the kernel KP (q, k) does not contain a single particle bound state
pole, inasmuch as two-particle intermediate states are absent by definition in the
graphical expansion of KP . Thus, if we take the on-mass-shell limit P² → M_B² for
the bound state in (11.36), only the G̃P factors contain the pole term arising from
(11.34), so that, identifying the residues of this pole on both sides of (11.36), we find
the famous Bethe–Salpeter equation (Salpeter and Bethe, 1951):

$$
\hat{\Delta}_F^{-1}\!\left(\frac{P}{2}+q\right)\Phi_P(q)\,\hat{\Delta}_F^{-1}\!\left(\frac{P}{2}-q\right) = \int K_P(q,k)\,\Phi_P(k)\,\frac{d^dk}{(2\pi)^d} \tag{11.37}
$$

This result has the obvious graphical depiction indicated in Fig. 11.8. An alternative
version, in which the Bethe–Salpeter wavefunction is itself amputated: i.e.,

$$
\Psi_P(q) \equiv \hat{\Delta}_F^{-1}\!\left(\frac{P}{2}+q\right)\Phi_P(q)\,\hat{\Delta}_F^{-1}\!\left(\frac{P}{2}-q\right) \tag{11.38}
$$

leads to a slightly different Bethe–Salpeter equation for ΨP (q):

$$
\Psi_P(q) = \int K_P(q,k)\,\hat{\Delta}_F\!\left(\frac{P}{2}+k\right)\Psi_P(k)\,\hat{\Delta}_F\!\left(\frac{P}{2}-k\right)\frac{d^dk}{(2\pi)^d} \tag{11.39}
$$

Fig. 11.7 Integral equation satisfied by the 2-2 scattering amplitude.

Fig. 11.8 Bethe–Salpeter equation determining the bound state wavefunction ΦP (q).

The simplest example of a bound state arising in a weakly coupled system, and
amenable to perturbative analysis, is found in our old friend, the scalar φ⁴ theory,
in two or three spacetime dimensions. Thus, we imagine a self-interacting non-self-
conjugate massive scalar field φ with interaction Hamiltonian density

$$
\mathcal{H}_{\rm int}(z) = \frac{\lambda}{4}\,{:}\,(\phi^\dagger(z)\phi(z))^2\,{:} \tag{11.40}
$$

and free momentum-space propagator ΔF (p) = 1/(p² − m² + iε). The normal ordering means
that graphs in which a scalar line leaves and returns to the same vertex are excluded
(cf. Section 10.1), so any loop integral contains at least two scalar propagators and is
ultraviolet convergent in d = 2 or 3 (as ∫ d^dk/k⁴ < ∞ as far as the large momentum
contribution is concerned, if d < 4). The theory is therefore ultraviolet finite ab
initio, although, of course, there will be finite corrections which convert the bare
mass m appearing in the free propagator to the physical mass mph of our scalar
particle. These renormalization effects are not particularly relevant to the physics
of the bound-state formation which is our primary interest here, and will not be
emphasized in the following, although in a real calculation of bound-state properties
one would need to re-express the final results in terms of the measurable physical
mass. In any case, for the theory in question, we now take A(x) = φ(x), B(x) = φ†(x),
so we are studying the possibility of particle–antiparticle binding in the φ–φ^c
channel.
The lowest-order contributions, through order λ², to the kernel KP are indicated
graphically in Fig. 11.9; analytically, one finds that the following terms correctly
generate the 2-2 scattering amplitude G̃P through O(λ²) when (11.36) is
iterated:

$$
K_P(q,p) = i\left(\lambda - \lambda^2 F((p-q)^2) - \frac{1}{2}\lambda^2 F((p+q)^2)\right) + O(\lambda^3), \tag{11.41}
$$

where we have defined

$$
F(k^2) \equiv -i\int \Delta_F(l)\,\Delta_F(k-l)\,\frac{d^dl}{(2\pi)^d}
= -i\int \frac{1}{(l^2-m^2+i\epsilon)((k-l)^2-m^2+i\epsilon)}\,\frac{d^dl}{(2\pi)^d} \tag{11.42}
$$

Fig. 11.9 Low-order contributions to the 2PI kernel KP (q, p) (arrows indicate charge flow: up
for particles, down for antiparticles).

The loop integral defining F(k²) is perfectly finite in dimensions d = 2 or 3 (see
Problem 3), and one obtains, in the region 0 < k² < 4m²,

$$
F(k^2) = \frac{1}{\pi}\,\frac{1}{\sqrt{k^2(4m^2-k^2)}}\,\arctan\sqrt{\frac{k^2}{4m^2-k^2}}, \quad d=2, \tag{11.43}
$$

$$
F(k^2) = \frac{1}{4\pi}\,\frac{1}{\sqrt{k^2}}\,\ln\left(\frac{\sqrt{k^2}+2m}{\sqrt{4m^2-k^2}}\right), \quad d=3 \tag{11.44}
$$
The value of F(k²) for general k² can be obtained by analytic continuation of these
formulas: in particular, one finds that F(k²) is a real analytic function of k² with a cut
on the positive real axis for 4m² ≤ k² of square-root (resp. logarithmic) type in d = 2
(resp. 3), and no other singularities (the apparent square-root branch point at k² = 0
is spurious: only even powers of √k² appear in the Taylor expansion around k = 0).
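
The closed forms (11.43) and (11.44) can be checked numerically. The sketch below compares them with a one-dimensional Feynman-parameter representation of the bubble integral. That representation is not given in the text: it is assumed here, obtained by the standard Wick rotation and Feynman parameterization of (11.42), and the function names are purely illustrative.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def F_closed(k2, m, d):
    """Closed forms (11.43) (d = 2) and (11.44) (d = 3), valid for 0 < k2 < 4 m^2."""
    k = math.sqrt(k2)
    root = math.sqrt(4 * m * m - k2)
    if d == 2:
        return math.atan(k / root) / (math.pi * k * root)
    if d == 3:
        return math.log((k + 2 * m) / root) / (4 * math.pi * k)
    raise ValueError("only d = 2 or d = 3 are handled")

def F_feynman(k2, m, d):
    """Assumed Feynman-parameter form (not from the text):
    d = 2: (1/4pi) * int_0^1 dx / Delta(x)
    d = 3: (1/8pi) * int_0^1 dx / sqrt(Delta(x))
    with Delta(x) = m^2 - x (1 - x) k^2."""
    delta = lambda x: m * m - x * (1 - x) * k2
    if d == 2:
        return simpson(lambda x: 1 / delta(x), 0.0, 1.0) / (4 * math.pi)
    if d == 3:
        return simpson(lambda x: 1 / math.sqrt(delta(x)), 0.0, 1.0) / (8 * math.pi)
    raise ValueError("only d = 2 or d = 3 are handled")

if __name__ == "__main__":
    for d in (2, 3):
        for k2 in (0.5, 1.0, 2.5):
            print(d, k2, F_closed(k2, 1.0, d), F_feynman(k2, 1.0, d))
```

The two evaluations should agree to quadrature accuracy throughout 0 < k² < 4m², although the simple quadrature used here loses accuracy close to the threshold k² = 4m², where the parameter integrand becomes sharply peaked.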
We now search for the appropriate conditions for a bound-state pole to develop
in the 2-2 scattering amplitude, where for convenience we work in the rest frame
of the bound state and set P⁰ = √(P²) = 2m − κ²/m at the bound-state pole. Thus,
the binding energy Ebind = κ²/m of the bound state is parameterized in terms of the
variable κ, which is a measure of the distance from the bound-state pole to the
two-(free-)particle threshold at P⁰ = 2m. The iteration of the leading-order kernel (just
the constant iλ) clearly generates a tower of bubble graphs indicated in Fig. 11.10(a).
These graphs form a geometric series; excluding the trivial disconnected contribution
to G̃P (q, p), one has the sum (n is the number of loops)

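$$
\begin{aligned}
\tilde{G}_P^{(0)}(q,p) &= \frac{i\lambda}{(2\pi)^d}\sum_{n=0}^{\infty}\left(-\lambda F(P^2)\right)^n \Delta_F\!\left(\frac{P}{2}+q\right)\Delta_F\!\left(\frac{P}{2}-q\right)\Delta_F\!\left(\frac{P}{2}+p\right)\Delta_F\!\left(\frac{P}{2}-p\right) \\
&= \frac{i\lambda}{(2\pi)^d}\,\frac{1}{1+\lambda F(P^2)}\,\Delta_F\!\left(\frac{P}{2}+q\right)\Delta_F\!\left(\frac{P}{2}-q\right)\Delta_F\!\left(\frac{P}{2}+p\right)\Delta_F\!\left(\frac{P}{2}-p\right)
\end{aligned} \tag{11.45}
$$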

The superscript (0) here indicates that this is the leading contribution to a reorganized
set of perturbative contributions to G̃P : we shall soon see that this tower of graphs
determines the leading-order properties of the bound state for weak coupling. For a
true bound state to be present at weak coupling (small λ) the value of the bubble
integral F(P²) must increase correspondingly at the bound-state pole to allow
contributions of arbitrary order to remain comparable, thereby keeping the constituents
bound for an infinite time. If the bound state is to be present at arbitrarily weak
coupling, this means that F must become singular: this can only happen if the
bound-state momentum P = (2m − κ²/m, 0) approaches the two-particle threshold, i.e., as
κ/m → 0, when the bubble integral has the asymptotic behavior

$$
F(P^2) \sim \frac{1}{8m\kappa} + O(1), \quad \kappa/m \to 0, \quad d=2, \tag{11.46}
$$

$$
F(P^2) \sim \frac{1}{8\pi m}\ln\left(\frac{2m}{\kappa}\right) + O\!\left(\kappa^2\ln\left(\frac{2m}{\kappa}\right)\right), \quad \kappa/m \to 0, \quad d=3 \tag{11.47}
$$

Fig. 11.10 (a) 2-2 amplitude from iteration of leading-order kernel; (b) subleading tower from
a single insertion of a higher-order kernel.

In other words, the bound state must become non-relativistic, with binding energy
much smaller than the rest energy of the system. Since F is positive, we see also
that the existence of a pole in (11.45) requires that the coupling λ be negative,
corresponding to an attractive local point interaction between the constituent scalars.
Of course, this would lead if taken literally to a theory with a spectrum unbounded
below (cf. Section 8.4), but in d = 2 or 3 we are free to add a φ⁶ interaction with
an arbitrarily weak coupling to restore spectral sanity, without sensibly altering the
bound-state properties (or the renormalizability of the theory), so we shall ignore this
difficulty and proceed henceforth with negative λ.
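Referring to (11.45) we see that the summed bubble graphs of Fig. 11.10(a) have
a pole at |λ|F(P²) = 1, which for small λ, using the asymptotic forms (11.46, 11.47),
occurs when

$$
\kappa \sim \frac{|\lambda|}{8m}, \quad E_{\rm bind} \sim \frac{\lambda^2}{64m^3}, \quad d=2 \tag{11.48}
$$

$$
\kappa \sim 2m\exp\left(-\frac{8\pi m}{|\lambda|}\right), \quad E_{\rm bind} \sim 4m\exp\left(-\frac{16\pi m}{|\lambda|}\right), \quad d=3 \tag{11.49}
$$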
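
As an illustration of how the pole condition fixes κ, the following sketch solves |λ|F(P²) = 1 in d = 2 by bisection, using the exact bubble integral (11.43) rather than its asymptotic form, and compares the result with the weak-coupling estimate (11.48). The function names, the bracketing interval, and the sample values of λ and m are assumptions chosen for illustration only.

```python
import math

def F2_at_kappa(kappa, m):
    """Bubble integral (11.43) in d = 2 evaluated at P0 = 2m - kappa^2/m.
    Uses 4m^2 - P^2 = kappa^2 (4 - kappa^2/m^2) to avoid cancellation near threshold."""
    k = 2 * m - kappa * kappa / m                      # sqrt(P^2) in the rest frame
    root = kappa * math.sqrt(4 - (kappa / m) ** 2)     # sqrt(4m^2 - P^2)
    return math.atan(k / root) / (math.pi * k * root)

def kappa_at_pole(lam, m):
    """Solve |lam| * F(P^2) = 1 for kappa by bisection on (0, m).
    Assumes weak coupling, so that the condition is not yet met at kappa = m."""
    g = abs(lam)
    f = lambda kappa: g * F2_at_kappa(kappa, m) - 1.0
    lo, hi = 1e-9 * m, m      # f(lo) > 0 since F diverges at threshold; f(hi) < 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    lam, m = -0.1, 1.0
    kappa = kappa_at_pole(lam, m)
    print("kappa (exact pole condition):", kappa)
    print("kappa (11.48 estimate)      :", abs(lam) / (8 * m))
    print("E_bind = kappa^2 / m        :", kappa ** 2 / m)
```

For small |λ| the exact solution lies slightly below |λ|/(8m), the difference being of relative order |λ|/m², as expected from the O(1) correction term in (11.46).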

In one space and one time dimension (d = 2), the local (λ/4)(φ†φ)² interaction generates a
δ-function potential V(x) = gδ(x) in the non-relativistic limit, with the dimensionless
(in natural units) coupling given by g = λ/(4m²) (note that the coupling λ has dimension
mass squared in d = 2). The reader may easily verify that such a potential in one
spatial dimension does indeed lead (for λ < 0) in a system of reduced mass m/2
to a single bound state with the stated binding energy. The threshold singularity
in two-space, one-time dimensions (d = 3) is much weaker—only logarithmic rather
than linear—with the result that the binding energy vanishes exponentially for small
coupling, with an essential singularity in the dependence of binding energy on coupling.
This is actually the situation that arises for Cooper pairs in the BCS theory of super-
conductivity, where an arbitrarily weak phonon-induced attractive coupling results
in an exponentially small binding (and energy gap) in three spatial dimensions, but
with the system effectively reduced to the two-dimensional Fermi surface of available
electron states (see (Ziman, 1964), p. 330).
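
The d = 2 check left to the reader above can be sketched as follows, assuming units with ħ = 1, reduced mass μ = m/2, and the trial wavefunction ψ(x) = e^{−κ|x|} (notation introduced here for illustration). Away from the origin the Schrödinger equation fixes the energy, and integrating it across x = 0 fixes κ from the jump in ψ′:

$$
-\frac{1}{2\mu}\psi''(x) + g\,\delta(x)\,\psi(x) = -E_{\rm bind}\,\psi(x)
\;\;\Rightarrow\;\;
E_{\rm bind} = \frac{\kappa^2}{2\mu} = \frac{\kappa^2}{m},
\qquad
-\frac{1}{2\mu}\left[\psi'(0^+)-\psi'(0^-)\right] + g\,\psi(0) = 0
\;\;\Rightarrow\;\;
\kappa = -\mu g = \frac{m|g|}{2}
$$

With g = λ/(4m²) and λ < 0 this gives κ = |λ|/(8m) and E_bind = λ²/(64m³), in agreement with (11.48).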
The asymptotic behavior indicated in (11.46, 11.47) is readily understood with a
simple power-counting argument. The region of the loop integral responsible for the
dominant contribution at weak coupling to the one-loop bubble integral F(P²) (with
F defined in (11.42)) corresponds to the non-relativistic scaling l ∼ κ, l0 ∼ κ² (the
latter following if we perform the l0 integration picking up the pole at
l0 = κ²/(2m) − m + √(l² + m²) − iε, with l ∼ κ ≪ m). Thus in d = 2 dimensions each of the
denominators in the loop integral is of order κ², while the d²l phase-space is of order κ³,
leading to the overall 1/κ threshold singularity for small κ. In d = 3 dimensions, the
power-counting leads to κ⁰, corresponding to a logarithmic dependence on κ when the spatial
integral over l is performed. Each insertion of a higher-order piece of the 2PI kernel
KP in the sequence of bubble graphs, such as the diagram indicated in Fig. 11.10(b),
reduces the strength of the threshold divergence at any given order in powers of λ.
For example, in d = 2, the order λ⁷ graph indicated in Fig. 11.10(b) produces only
a 1/κ⁵ threshold singularity, one less power of 1/κ than the corresponding λ⁷ graph
in the leading tower of bubble graphs shown in Fig. 11.10(a). The reader may easily
verify that adding terms of divergent structure λ^(n+1)/κ^(n−1) to the geometric series in
(11.45) (in contrast to the leading series, with terms of order λ^(n+1)/κ^n), corresponds
to an order λ² contribution to the value of κ at the pole, and hence a higher order
(by one power of λ) contribution to the binding energy of the bound state (i.e., in
the d = 2 case, a contribution of order λ³ to Ebind in (11.48)). In fact, the inclusion
of successively more complicated kernel contributions is necessary to compute the
bound-state properties to successively higher accuracy, with the result that Ebind (for
example) becomes a divergent asymptotic expansion in λ.
The factorial divergence of perturbation theory in φ⁴ theory discussed in the
preceding section has not, of course, been eliminated: we must expect the total number
of graphs contributing to the 2-2 amplitude G̃P to grow factorially with the power n of
λ. But the remarkable simplification allowed by the existence of threshold singularities
which ensure the persistence of the binding at arbitrarily weak coupling leads to the
(highly fortunate) result that to first approximation the properties of a non-relativistic
threshold bound state are determined by a tiny subset of Feynman graphs (the bubble
diagrams of Fig. 11.10(a)) which form (in this scalar binding case) the simplest of all
summable series: a geometric expansion! The factorial growth with order of the full
set of graphs contributing to the scattering amplitude translates, when the graphs
are reordered into towers on the basis of the strength of the threshold singularities
they exhibit, into an infinite asymptotic expansion for the ground-state properties in
powers of the weak coupling λ.
Threshold singularities of power strength, and therefore qualitatively similar to the
d = 2 self-coupled scalar situation, reappear in massless gauge theories (both abelian
and non-abelian varieties) in 3+1 spacetime dimensions. Some low-order contributions
to the kernel in the case of an “onium” bound state in quantum electrodynamics (e.g.,
positronium, the bound state of an electron and positron) are shown in Fig. 11.11. In
this case, our field A(x) is the electron Dirac field ψ(x) and B(x) is ψ̄(x). As we have
seen, threshold bound states are intrinsically non-relativistic in the weak coupling
region where perturbative resummation is useful, and in gauge theories this singles
out a particular gauge—Coulomb (or “radiation”) gauge—as particularly useful in
isolating the graphs with the strongest threshold singularities (Duncan, 1976). In this
case the leading properties of onium bound states are determined by iteration of the
2PI kernel corresponding to exchange of a single Coulomb photon (Fig. 11.11(a)), with
the two additional powers of κ per loop arising from the two extra space dimensions
cancelled by the 1/κ² behavior of the momentum-space Coulomb propagator 1/l² (for
l ∼ κ, where, of course, we need the exchanged photon to be massless). The iteration
of this kernel leads to a series of ladder graphs (see Fig. 11.12), with the property that
each additional loop brings an extra factor of electron charge squared (and therefore α,
the fine-structure constant), as well as an additional 1/κ power threshold singularity.
Just as in φ⁴ theory in d = 2, we consequently expect a pole to develop for κ ∼ αm,
giving a binding energy of order α²m.
We shall discuss the Feynman rules for gauge theories in Chapter 15: for present
purposes, we need only know that in Coulomb gauge the A0 propagator is instantaneous,
corresponding to a factor i/l² where l is the spatial momentum carried by
the Coulomb line, and attaches with a factor −ieγ0 at the charged fermion (i.e.,
electron or positron) line. The result is that the leading set of threshold singularities
is generated by iteration of the (fully amputated) kernel indicated in Fig. 11.11(a):
namely, KP (q, k) ∼ (ie²/|q − k|²)(γ0)(γ0). Note that the kernel has four Dirac subscripts (not
shown): the corresponding Dirac index dependence is given by the direct product of
two γ0 matrices. The full fermion propagator (in accordance with our notation from
Chapter 7) is now written ŜF (P/2 + k) for the electron line and ŜF (−P/2 + k) for the
corresponding positron line (which is an electron propagator pointing downward, hence
with momentum reversed): in ladder approximation these full propagators become just
the free ones. Moreover, the leading threshold singularities are generated in the
non-relativistic kinematic domain where k0 ∼ O(κ²), k ∼ O(κ), and we are as usual in the
frame where P = (2m − κ²/m, 0), so that we can replace

$$
\begin{aligned}
\hat{S}_F\!\left(\frac{P}{2}+k\right) &\to \frac{\frac{1}{2}\not\!P+\not\!k+m}{\left(\frac{P}{2}+k\right)^2-m^2+i\epsilon} \to P_+\,\frac{1}{k_0-\frac{\mathbf{k}^2+\kappa^2}{2m}+i\epsilon}, \qquad P_+ \equiv \frac{1+\gamma_0}{2} \\
\hat{S}_F\!\left(-\frac{P}{2}+k\right) &\to \frac{-\frac{1}{2}\not\!P+\not\!k+m}{\left(-\frac{P}{2}+k\right)^2-m^2+i\epsilon} \to -P_-\,\frac{1}{k_0+\frac{\mathbf{k}^2+\kappa^2}{2m}-i\epsilon}, \qquad P_- \equiv \frac{1-\gamma_0}{2}
\end{aligned} \tag{11.50}
$$

Fig. 11.11 (a) Coulomb exchange kernel; (b) transverse photon exchange kernel; (c, d) kernels
contributing to Lamb shift.

Fig. 11.12 Tower of ladder graphs generating the leading threshold singularities (of order
α^(n+1)/κ^n) in a massless gauge theory. Arrows denote charge flow; the momentum flow is upwards
on both fermion lines.

Note that the appearance of the P+ (resp. P−) projection operator on the electron
(resp. positron) line explains why transverse photon exchange between the lines is
absent in the leading non-relativistic approximation: a transverse photon vertex comes
with a spatial γ matrix γ^i, and the projection operators on either side will then cause
the amplitude to vanish, as P+ γ^i P+ = P− γ^i P− = 0. In the extreme limit of infinitely
massive fermions, m → ∞, the propagators lose even their dependence on the spatial
momentum k, which corresponds in coordinate space to the static limit in which
SF (x) ∝ δ³(x). This limit will be of interest to us in Chapter 19 in our discussion
of quark confinement. The Bethe–Salpeter equation (in the version (11.39)) for this
system evidently takes the form, making the substitutions relevant in the leading
threshold approximation,

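$$
\begin{aligned}
\Psi_P(q) &= \int K_P(q,k)\,S_F\!\left(\frac{P}{2}+k\right)\Psi_P(k)\,S_F\!\left(-\frac{P}{2}+k\right)\frac{d^4k}{(2\pi)^4} \\
&= -i\int \frac{e^2}{|\mathbf{q}-\mathbf{k}|^2}\,\gamma_0 P_+\,\frac{1}{k_0-\frac{\mathbf{k}^2+\kappa^2}{2m}+i\epsilon}\,\Psi_P(k)\,\frac{1}{k_0+\frac{\mathbf{k}^2+\kappa^2}{2m}-i\epsilon}\,P_-\gamma_0\,\frac{d^4k}{(2\pi)^4} \\
&= \int \frac{e^2}{|\mathbf{q}-\mathbf{k}|^2}\,\frac{m}{\mathbf{k}^2+\kappa^2}\,P_+\Psi_P(\mathbf{k})P_-\,\frac{d^3k}{(2\pi)^3}
\end{aligned} \tag{11.51}
$$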

In passing from the second to the third line, we have used γ0 P+ = P+, P− γ0 = −P−
(the relative minus sign ensuring the attractive coupling between the electron and
its antiparticle), and performed the integration over k0 to pick up the pole at
k0 = −(k² + κ²)/(2m) + iε. Note that the absence of q0 in the leading order Coulomb kernel
implies that ΨP (q) (in the leading approximation) is really a function only of spatial
momentum, ΨP (q), so that the only k0 dependence in the integral is that displayed
explicitly in the fermion propagators. From (11.51) we conclude immediately that (in
the ladder approximation) (a) ΨP (q) is in fact only a function of the spatial vector q,
and (b) that the Dirac structure of ΨP (q) must be (in the representation (7.100)), as
a consequence of the projection operators P± in (11.51),

$$
\Psi_P(\mathbf{q}) \propto \begin{pmatrix} A(\mathbf{q}) & -A(\mathbf{q}) \\ A(\mathbf{q}) & -A(\mathbf{q}) \end{pmatrix} \tag{11.52}
$$
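
As a check on the step from the second to the third line of (11.51), the k0 integration can be written out explicitly. The abbreviation E_k ≡ (k² + κ²)/(2m) is introduced here only for this sketch. Closing the contour in the upper half-plane picks up the pole at k0 = −E_k + iε:

$$
\int_{-\infty}^{+\infty}\frac{dk_0}{2\pi}\,\frac{1}{(k_0-E_k+i\epsilon)(k_0+E_k-i\epsilon)}
= \frac{2\pi i}{2\pi}\cdot\frac{1}{-2E_k} = -\frac{i}{2E_k},
\qquad
(-i)\,(\gamma_0P_+)\,(P_-\gamma_0)\to (-i)(P_+)(-P_-) = i\,P_+\,(\cdots)\,P_-
$$

so the overall factor multiplying e²/|q − k|² is i · (−i/(2E_k)) = 1/(2E_k) = m/(k² + κ²), as in the third line of (11.51).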

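Writing A(q) = (q² + κ²)φ(q), we find that (11.51) reduces to

$$
\frac{\mathbf{q}^2+\kappa^2}{m}\,\phi(\mathbf{q}) = \int \frac{e^2}{|\mathbf{q}-\mathbf{k}|^2}\,\phi(\mathbf{k})\,\frac{d^3k}{(2\pi)^3} \tag{11.53}
$$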

or in coordinate space (with φ(r) the Fourier transform of φ(q)),

$$
-\frac{1}{m}\nabla^2\phi(\mathbf{r}) - \frac{e^2}{4\pi r}\,\phi(\mathbf{r}) = -\frac{\kappa^2}{m}\,\phi(\mathbf{r}) \tag{11.54}
$$

which is exactly the non-relativistic Schrödinger equation for two equal mass m particles
binding via an attractive Coulomb potential V(r) = −e²/(4πr) to form a bound state with
binding energy −κ²/m.
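
The scaling κ ∼ αm can be made concrete for the ground state of (11.54). The following is a sketch under the assumption of an s-wave trial solution φ(r) = e^{−κr}, with α ≡ e²/(4π); the trial form is introduced here for illustration and is not taken from the text.

$$
\nabla^2 e^{-\kappa r} = \left(\kappa^2 - \frac{2\kappa}{r}\right)e^{-\kappa r}
\;\;\Rightarrow\;\;
-\frac{1}{m}\left(\kappa^2-\frac{2\kappa}{r}\right) - \frac{\alpha}{r} = -\frac{\kappa^2}{m}
\;\;\Rightarrow\;\;
\kappa = \frac{\alpha m}{2},
\qquad
\frac{\kappa^2}{m} = \frac{\alpha^2 m}{4}
$$

For positronium (m ≈ 0.511 MeV, α ≈ 1/137) this gives a ground-state binding of about 6.8 eV, half the hydrogen value, as expected for a reduced mass of m/2.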
Just as in the case of self-coupled scalar theories, the higher-order contributions
to two-particle irreducible kernels (such as the graphs displayed in Fig. 11.11(b,c,d))
reduce the strength of the threshold singularities at any given order of perturbation
theory, and consequently result in higher-order (in α) shifts in the binding energy (and
other bound-state properties). For example, a single insertion of transverse photon
exchange (graph in Fig. 11.11(b)) in a ladder of Coulomb exchanges will produce (for
the same power of α) two extra factors of κ, as a spatial momentum term (of order κ)
must be used (instead of the P± in (11.50)) in the numerator of a Dirac propagator
adjacent to the transverse photon vertex (with a vertex factor −ieγ^i) on both the
electron and positron line, to avoid getting P± γ^i P± = 0. As κ ∼ αm at the bound state
pole, this results in a correction to the binding energy suppressed by α² relative to the
leading term: i.e., a contribution of order α⁴. This is in fact the magnetic hyperfine
splitting.⁷ For positronium, in which the electron and positron magnetic moments
are comparable, the hyperfine splitting is of the same order as the relativistic O(α⁴)
corrections which arise from considering only Coulomb ladder graphs, but with fully
relativistic propagators (instead of the non-relativistic limits (11.50)) on the electron
and positron lines.⁸ More complicated kernels, such as emission and absorption of a
photon from an electron, with multiple Coulomb interactions in the interim (e.g., Fig.
11.11(c,d)), lead to O(α⁵) shifts: in the case of the hydrogen atom, this is the famous
Lamb shift. A systematic analysis of the reorganization of the scattering amplitude
into towers of graphs of given threshold behavior for both abelian and non-abelian
gauge theories, allowing the systematic extraction of the various contributions to the
binding energy of onium states in both types of theory, has been given in (Duncan,
1976).
Of course, not all bound states in quantum theory appear as a consequence of the
kinematic enhancement of an intrinsically weak coupling near a threshold, allowing
the systematic “sorting” of perturbative contributions into summable towers of graphs
as described above. In non-relativistic quantum mechanics, for example, an attractive
Yukawa potential V(r) = −g² e^(−mcr/ħ)/r in three space dimensions will only produce a
bound state once the coupling strength g exceeds a minimum critical value. The field-
theoretic analog of this problem involves the exchange of a massive boson (mass m)
between heavy fermions (e.g., pions exchanged between nucleons in the meson field
theories of the 1950s). The fact that all particles are massive means that there are
no infrared threshold singularities to effectively enhance the coupling by moving the
bound-state pole close to the two-particle threshold, and the formation of the bound
state is an essentially non-perturbative process: there is no systematic way to extract
a dominant summable subset of graphs contributing to the bound-state properties.
The situation is even more dramatically non-perturbative in the case of quantum
chromodynamics, where the constituent binding particles (quarks and gluons) do not
even exist as asymptotic states in the full theory (in contrast to perturbation theory),
due to color confinement. The process by which quarks bind to form the low-mass
hadrons necessarily involves a large effective coupling of order unity, and perturbative
calculations give us no useful information in this regime.

7 For the explicit calculation, see (Duncan, 1976), Appendix.


8 Of course, in the case of the hydrogen atom, taking the proton mass to infinity, so that the nucleus
simply serves as a source of electrostatic Coulomb energy, the summation of ladder diagrams with a fully
relativistic electron propagator leads to a Bethe–Salpeter equation which is just the Dirac equation in
the presence of a Coulomb potential, with bound-state energy given by (2.63). See (Weinberg, 1995a),
Chapter 14.

11.3 “Essentially non-perturbative” processes: non-Borel-summability in field theory
The “inconvenient truth” that perturbative expansions in field theory (and in quantum
mechanics, more generally) are at best asymptotic expansions rather than convergent
Taylor expansions leads to an important question of principle: namely, to what extent
does the information contained in a perturbative (i.e., graphical Feynman) expansion
of the amplitudes of the theory fully specify, even in principle, the exact dynamics of
the theory? Were the perturbation theory a convergent expansion, with a finite radius
of convergence in the (complex) coupling constant plane, the answer to this question
would be immediate and affirmative, as standard theorems of complex analysis would
assure the possibility of reconstruction, at least in principle, of the full amplitudes
for arbitrary values of the coupling by analytic continuation starting with the power
series around the point of vanishing coupling. It turns out that in certain cases, the
perturbative information (i.e., the coefficients of the divergent asymptotic expansion
of an amplitude) does indeed suffice for a reconstruction of the full amplitude, so
that in some sense the graphical expansion is “the whole story” for such theories. On
the other hand, in most four-dimensional field theories of phenomenological interest,
and in particular in the gauge theories which form the backbone of the Standard
Model of elementary particle physics, this is unfortunately not the case. In addition
to the question of principle raised here, there is the practical matter that in a strongly
coupled theory, with a coupling constant of order unity, an asymptotic expansion,
the partial summands of which begin to grow essentially immediately (at low orders
of perturbation theory), is just unable to supply us with any quantitatively useful
information with regard to the exact amplitudes of the theory.
As usual in discussions of (non-)convergence of perturbation theory, the toy
integral (11.3) corresponding to 0-dimensional φ⁴ theory provides a convenient starting
point:

$$
Z(\lambda,m) \equiv \int_{-\infty}^{+\infty} e^{-\frac{1}{2}m^2x^2-\frac{\lambda}{4}x^4}\,dx \tag{11.55}
$$

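We can rewrite this quantity as an integral transform as follows:

$$
Z(\lambda,m) = \int_0^{\infty} t^{-1/2}e^{-t}B(\lambda t)\,dt, \tag{11.56}
$$

$$
B(\lambda t) \equiv t^{1/2}\int_{-\infty}^{+\infty}\delta\!\left(\frac{1}{2}m^2x^2+\frac{\lambda}{4}x^4-t\right)dx \tag{11.57}
$$

$$
\phantom{B(\lambda t)} = \sqrt{\lambda t}\int_{-\infty}^{+\infty}\delta\!\left(\frac{m^2}{2}y^2+\frac{1}{4}y^4-\lambda t\right)dy \tag{11.58}
$$

Evaluating the last integral by locating the roots of the quadratic function in the
δ-function, one finds (setting z = λt),

$$
B(z) = 2\sqrt{\frac{z}{\left(\sqrt{m^4+4z}-m^2\right)\left(m^4+4z\right)}} \tag{11.59}
$$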

from which we conclude that the function B appearing in (11.56) is an analytic function
of its argument at the origin, with a convergent Taylor series with radius of convergence
m⁴/4 (due to the square-root branch point appearing at z = −m⁴/4). The integral
transformation in (11.56) (called a Borel transform) has therefore accomplished the
impressive feat of taming the divergence of the asymptotic expansion of Z(λ, m) in
powers of λ, essentially by dividing each coefficient of this asymptotic expansion by a
factorial which reduces the rate of growth of the coefficients from factorial to power
(and therefore to a series with a finite radius of convergence):

$$
B(z) = \sum_{n=0}^{\infty} d_n z^n \;\;\Rightarrow\;\; Z(\lambda) \equiv \int_0^{\infty} t^{-1/2}e^{-t}B(\lambda t)\,dt \sim \sum_{n=0}^{\infty} c_n\lambda^n \tag{11.60}
$$

$$
d_n = \frac{c_n}{\Gamma(n+\frac{1}{2})} \tag{11.61}
$$

If we substitute the explicit result (11.4) for the cn into (11.61) we find the dominant
asymptotic behavior dn ∼ (−1)^n (4/m⁴)^n, leading to the radius of convergence quoted
above (using the ratio test). Note that the appearance of an oscillating sign factor in
the coefficients ensures that the singularity of B(z) occurs on the negative real axis
for z: the Borel transform is well-defined (by analytic continuation of its power series
around z = 0) and non-singular on the entire positive real axis where the integral
(11.56) reconstructing Z(λ, m) runs. In such cases one refers to the original divergent
asymptotic expansion as Borel summable: the full partition function is recoverable in
such cases from a knowledge of the perturbative expansion coefficients, which after
division by a factorial, yield Taylor coefficients with power behavior at large order,
and define a non-singular function B(z) (for positive real z) which can at least in
principle be used to reconstruct the desired Z(λ, m) for arbitrary coupling λ.
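
The reconstruction of Z(λ, m) from its Borel transform can be illustrated numerically for the toy integral. The sketch below evaluates (11.55) by direct quadrature and compares it with the Borel integral (11.56), using B(z) from (11.59). Two choices are assumptions made here for numerical convenience and do not appear in the text: B(z) is rewritten in the algebraically equivalent form √((m² + √(m⁴ + 4z))/(m⁴ + 4z)), which is manifestly regular at z = 0, and the substitution t = s² is used to remove the t^(−1/2) endpoint singularity.

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def Z_direct(lam, m, L=12.0):
    """Toy partition function (11.55) by direct quadrature on [-L, L].
    The cutoff L is an assumption; the integrand is negligible beyond it for m of order 1."""
    return simpson(lambda x: math.exp(-0.5 * m * m * x * x - 0.25 * lam * x ** 4), -L, L)

def borel_B(z, m):
    """Borel transform (11.59), in a form equivalent to it for z >= 0:
    2 sqrt(z / ((s - m^2) s^2)) = sqrt((m^2 + s) / s^2), with s = sqrt(m^4 + 4z)."""
    s = math.sqrt(m ** 4 + 4 * z)
    return math.sqrt((m * m + s) / (s * s))

def Z_borel(lam, m, S=8.0):
    """Borel integral (11.56) after the substitution t = s^2:
    Z = 2 * int_0^inf exp(-s^2) B(lam s^2) ds, truncated at s = S."""
    return 2.0 * simpson(lambda s: math.exp(-s * s) * borel_B(lam * s * s, m), 0.0, S)

if __name__ == "__main__":
    for lam in (0.0, 0.1, 1.0, 10.0):
        print(lam, Z_direct(lam, 1.0), Z_borel(lam, 1.0))
```

The two columns should agree for any λ ≥ 0, including values of order unity or larger where the partial sums of the asymptotic series in λ are of no quantitative use.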
The extension of the above ideas to dimensions d ≥ 1 (i.e., the anharmonic oscillator
for d = 1, fully regularized φ⁴ theories in d ≥ 2) with a positive sign mass term is
quite straightforward. The Euclidean action of the discretized theory may be written

$$
S(\phi_i) = \frac{1}{2}\sum_{i,j}\phi_i K_{ij}\phi_j + \frac{\lambda}{4}\sum_i \phi_i^4 \tag{11.62}
$$

where the quadratic form K is positive-definite. A Borel transform is then obtained
by introducing a δ-function as previously. One finds (with N the number of spacetime
points) after appropriately rescaling the fields,

$$
B(z) = z^{-N/2}\int \prod_i d\psi_i\;\delta\!\left(\frac{1}{2}\sum_{i,j}\psi_i K_{ij}\psi_j + \frac{1}{4}\sum_i \psi_i^4 - z\right) \tag{11.63}
$$

The appropriate analyticity of B(z) and hence the property of Borel summability has
been rigorously established for such theories, although for considerations of space we
shall not attempt to provide a proof here.⁹

9 For discussion and further references, see (Glimm and Jaffe, 1987), Section 23.2.

Unfortunately, the highly desirable feature of Borel-summability turns out to be
rather fragile. For example, if we simply change the sign of the mass term, the Borel
transform develops a singularity on the positive real axis, vitiating the reconstruction
of the partition function via (11.56) (as the integral runs directly over a non-integrable
singularity). We recall from Chapter 8 that a negative mass term in the Hamiltonian
(or Euclidean action) of a φ⁴ field theory simply amounts to a situation in which the
system develops degenerate ground states (see Fig. 8.1). For d = 1 (the anharmonic
oscillator case), the sign change leads to a double-well potential, with degenerate
minima of the classical potential leading to the well-known tunneling phenomena and a
unique symmetric ground-state wavefunction. Returning once again to our toy integral,
if we start with an action function with a negative squared-mass term (and shifted by
a constant to set the minimum action at zero)

$$
Z(\lambda,m) = \int e^{-S(x)}\,dx, \qquad S(x) = \frac{m^4}{4\lambda} - \frac{m^2}{2}x^2 + \frac{\lambda}{4}x^4 \tag{11.64}
$$

the Borel transform B(z) becomes

$$
B(z) = 2\sqrt{z}\int_0^{\infty}\delta(z-\tilde{S}(y))\,dy, \qquad \tilde{S}(y) \equiv \frac{1}{4}(m^2-y^2)^2 \tag{11.65}
$$

and one finds that the contribution to the y integral from the root of the δ-function at
y = √(m² − 2√z) leads to a term in B(z) with a simple pole on the positive real axis
at z = m⁴/4:

$$
B(z) \sim \frac{1}{\sqrt{m^2-2\sqrt{z}}} = \sqrt{\frac{m^2+2\sqrt{z}}{m^4-4z}} \tag{11.66}
$$

The singularity at positive real values is a consequence of the fact that the coefficients
in a power series expansion (in this case, in powers of √z), while only growing at a
power rate (rather than factorially) now lack the (−1)^n oscillating sign factor present
in the Borel-summable case, which resulted in a singularity of B(z) for negative real
z, safely away from the contour of integration of the Borel transform (11.56). For
d ≥ 1 (quantum mechanics or field theory), the appearance of a singularity of the
Borel transform on the positive real axis is associated with the presence of tunneling
phenomena for physical (i.e., positive) values of the coupling λ: recall that, for the
Borel summable cases discussed in Section 11.1, energetic instabilities exemplified by
the instanton solutions responsible for the leading large-order behavior only occurred
once we had analytically continued the coupling λ to negative real values. With a
negative squared-mass term, tunneling between distinct local minima of the potential
energy function already occurs for physical (i.e., positive) values of λ. The presence
of instanton solutions (extrema of the Euclidean action) for physical values of the
gauge coupling in non-abelian gauge theories like QCD means that such theories must
also necessarily develop Borel singularities on the positive axis, and are therefore not
Borel-summable.
The discussion up to this point has implicitly assumed that the exact amplitudes
of our theory are expressed in terms of a well-defined path-integral representation: i.e.,
the theory is fully regularized in both the ultraviolet and the infrared to reduce the
number of degrees of freedom (effectively) to a finite level. In Part 4 of the book we
shall examine the process of reorganizing weak-coupling perturbation theory in order
to eliminate the dependence of the amplitudes on these cutoffs in favor of renormalized
amplitudes defined in terms of physically accessible low-energy parameters of the the-
ory. In particular, the formal expansion of the amplitudes of the theory in powers of the
bare coupling parameter(s) appearing in the cutoff theory is replaced by an expansion
in powers of cutoff-independent renormalized coupling(s). This reorganization of the
perturbation series can result in the appearance of new singularities of the Borel
transform on the positive real axis, called “UV renormalons”, which once again vitiate
the reconstruction of the non-perturbative amplitude from perturbative information.
Even if the ultraviolet cutoff is maintained, in the infinite volume limit for massless
field theories, infrared divergences can appear which similarly induce positive real
singularities of the Borel transform (in this case, called “IR renormalons”), again
destroying the Borel summability of the theory. The lesson from all of this is clear:
the property of Borel summability is an extremely fragile one, and one which we can
hardly ever expect to be present in interesting relativistic field theories.
The failure of the Borel resummation technique suggests that the question for-
mulated earlier in this section—whether or not the information encoded in a formal
perturbative expansion contains sufficient information to reconstruct the exact gener-
ating function(al) of the theory—should be answered in the negative in all such cases.
This conclusion is unwarranted, as we shall now see. The Borel transform is only one of
a variety of reconstruction techniques which attempt to connect perturbative compu-
tations with the exact amplitudes of theories defined by path integrals. In particular,
the path integral for scalar field theories, with either sign of the mass term, may
be reconstructed by use of methods which go under the generic name of “optimized
perturbation theory”. The particular form of optimized perturbation theory which we
shall describe here is sometimes referred to as the “linear δ expansion”. The basic
idea is to construct a series of approximants to the path integral which only require
perturbative calculations (defined here as path integrals with Gaussian exponents only:
hence analytically computable), but nevertheless can be shown to converge rigorously
to the exact answer. The basic idea is to interpolate between a “tunable” Gaussian
approximation and the exact action by introducing an auxiliary interpolating variable
δ, 0 ≤ δ ≤ 1, and a variational parameter μ—effectively a variable bare mass.
Thus, one writes, for a theory with Euclidean action S(φ, m, λ), an interpolating
action

$$S_\delta = \delta\,S(\phi, m, \lambda) + (1-\delta)\,S_0(\phi, m, \lambda; \mu) \tag{11.67}$$

where S0 is quadratic in the field variable. The partition function can then be formally
expanded in the δ variable
 
$$Z = \int D\phi\, e^{-\delta S-(1-\delta)S_0} \sim \sum_{n=0}^{\infty} c_n(m,\lambda;\mu)\,\delta^n \tag{11.68}$$

where the evaluation of the cn involves the usual perturbative manipulations: i.e.,
integrals with a Gaussian action S0 . The correct theory is, of course, recovered in the
limit δ = 1, and an Nth-order approximant to the theory at δ = 1 is clearly given by

$$Z_N \equiv \sum_{n=0}^{N} c_n(m,\lambda;\mu) \tag{11.69}$$

If we hold the parameter μ (the dependence on which, of course, disappears in Z at
δ = 1) fixed, the sequence of approximants is factorially divergent in the usual way.
On the other hand, if we change μ at each increased order N by a principle of minimal
sensitivity (PMS) (Stevenson, 1981)—namely, the requirement that the partial sum
ZN be extremal with respect to μ—then a miracle occurs: the sequence ZN (μN ),
with the μN determined by the aforesaid PMS condition, can be rigorously shown to
converge to the exact partition function Z(δ = 1)! In particular, we can show that the
remainder term at order N , defined by

$$R_N \equiv Z - Z_N(\mu_N) \tag{11.70}$$

goes exponentially to zero for large N . We shall provide the full proof here only for the
case of the quartic toy integral: the full argument for d = 1 (the anharmonic oscillator,
either single or double-well) can be found in (Duncan and Jones, 1993).
We begin with a useful identity which isolates the N th approximant ZN defined
in (11.69) (see Problem 6)

$$Z_N = \int D\phi \oint_C \frac{dz}{2\pi i}\,\frac{1}{z^{N+1}}\,\frac{1-z^{N+1}}{1-z}\,e^{-zS-(1-z)S_0} \tag{11.71}$$

where C is a small circular contour enclosing the origin (and excluding z = 1). The
condition of minimal sensitivity requires that we extremize ZN with respect to the
variational parameter μ, the dependence on which is entirely contained in the “free”
action S0 (cf. (11.67)):

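$$\frac{\partial Z_N}{\partial\mu} = 0 \;\Rightarrow\; \int D\phi \oint_C \frac{dz}{2\pi i}\,\frac{\partial S_0}{\partial\mu}\,\frac{1}{z^{N+1}}\,e^{-zS-(1-z)S_0} = 0$$

$$\Rightarrow\; 0 = \int D\phi\,\frac{\partial S_0}{\partial\mu}\,(S-S_0)^N\,e^{-S_0} \tag{11.72}$$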

We shall also need the following identity, which will facilitate the evaluation of the
remainder term (see Problem 7):

$$e^{-f} - \sum_{n=0}^{N}\frac{(-f)^n}{n!} = \frac{1}{N!}\,e^{-f}\int_0^{|f|} e^{\xi\,\mathrm{sign}(f)}\,\xi^N\,d\xi \tag{11.73}$$

valid for odd N . The large order asymptotics discussed henceforth will implicitly
assume that we are dealing with odd orders only. Using this identity, it follows
immediately that the remainder term RN at order N (odd) can be written as the
sum of two terms:

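$$R_N = A_N + B_N \tag{11.74}$$

$$A_N \equiv \frac{1}{N!}\int D\phi\,\theta(S-S_0)\,e^{-S}\int_0^{S-S_0} e^{\xi}\,\xi^N\,d\xi \tag{11.75}$$

$$B_N \equiv \frac{1}{N!}\int D\phi\,\theta(S_0-S)\,e^{-S}\int_0^{S_0-S} e^{-\xi}\,\xi^N\,d\xi \tag{11.76}$$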

For large (resp. small) fields φ, we have S > S0 (resp. S < S0 ), so we can refer to AN
(resp. BN ) as the strong (resp. weak) field contributions to the remainder term.
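
The remainder identity (11.73) that underlies this split can be spot-checked numerically. The following is a minimal sketch, not part of the original argument: the helper names (`simpson`, `lhs`, `rhs`) and the sample values of f and N are illustrative assumptions, and a numerical check of this kind is no substitute for the proof requested in Problem 7.

```python
from math import exp, factorial


def simpson(func, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def lhs(f, N):
    """exp(-f) minus its Taylor polynomial of degree N (left side of (11.73))."""
    return exp(-f) - sum((-f) ** n / factorial(n) for n in range(N + 1))


def rhs(f, N):
    """Integral form of the remainder (right side of (11.73))."""
    sign = 1.0 if f >= 0 else -1.0
    integral = simpson(lambda xi: exp(sign * xi) * xi ** N, 0.0, abs(f))
    return exp(-f) * integral / factorial(N)


if __name__ == "__main__":
    # Illustrative values only: both signs of f, odd and even N.
    for f in (1.3, -0.7):
        for N in (1, 3, 4):
            print(f, N, lhs(f, N), rhs(f, N))
```

For odd N the two sides agree to the accuracy of the quadrature for either sign of f; for even N they differ by an overall sign, which is why the large-order analysis is restricted to odd orders.
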
In the zero-dimensional case, there is a single spacetime point, the single field
variable φ is called x, and our free and full actions become

$$S_0 = m^2x^2 + \lambda\mu x^2, \qquad S = m^2x^2 + \lambda x^4 \tag{11.77}$$

where inessential factors of 1/2 and 1/4 in our original quartic toy integral (11.55) have
been dropped. The PMS condition (11.72) determining $\mu_N$ becomes in this case

$$\int_0^{\infty} x^2\,(x^4-\mu_N x^2)^N\,e^{-(m^2+\lambda\mu_N)x^2}\,dx = 0 \tag{11.78}$$

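or, changing to the rescaled variable $u \equiv x^2/\mu_N$,

$$0 = \int_0^{\infty} u^{N+\frac{1}{2}}\,(u-1)^N\,e^{-N\alpha_N u}\,du, \qquad \alpha_N \equiv \frac{1}{N}\left(m^2\mu_N + \lambda\mu_N^2\right) \tag{11.79}$$

$$\phantom{0} = \int_0^{\infty} \sqrt{u}\;\mathrm{sign}(u-1)\,e^{-NS_N(u)}\,du, \qquad S_N(u) \equiv \alpha_N u - \ln|u(u-1)| \tag{11.80}$$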


For large N, the integral in (11.80) is dominated by two saddle points at which $S_N'(u) = 0$,
one at $u = u_> > 1$ contributing with a positive sign, the other at $u = u_<$, $0 < u_< < 1$,
contributing with a negative sign. One readily finds

$$u_> = \frac{1}{2} + \frac{1}{\alpha_N}\left(1 + \sqrt{1+\frac{\alpha_N^2}{4}}\right) \tag{11.81}$$

$$u_< = \frac{1}{2} + \frac{1}{\alpha_N}\left(1 - \sqrt{1+\frac{\alpha_N^2}{4}}\right) \tag{11.82}$$
As the two contributions must cancel to satisfy the PMS condition, the values of the
effective action function SN (u) at these two points must agree in the large N limit:

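$$S_N(u_>) = S_N(u_<) \;\Rightarrow\; 2\sqrt{1+\frac{\alpha_N^2}{4}} = \ln\frac{\sqrt{1+\alpha_N^2/4}+1}{\sqrt{1+\alpha_N^2/4}-1} \;\Rightarrow\; \alpha_N = 1.325487\ldots \equiv \alpha_0 \tag{11.83}$$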
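so that (recalling that $\alpha_N \equiv \frac{1}{N}(m^2\mu_N + \lambda\mu_N^2)$) the desired PMS scaling of $\mu_N$ becomes
asymptotically

$$\mu_N \sim \left(\frac{\alpha_0}{\lambda}\right)^{1/2} N^{1/2} - \frac{m^2}{2\lambda} + O(N^{-1/2}) \tag{11.84}$$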
We see already at this point—and this is crucial for the effectiveness of optimized
perturbation theory in handling both the Borel ($m^2 > 0$) and non-Borel ($m^2 < 0$) cases
with equal facility—that the dependence of the asymptotic behavior of the variational
parameter μN on the sign of the mass term is subdominant: in either case, we must
scale $\mu_N \propto N^{1/2}$ at large N. We shall now see that in either case, with this scaling,
the remainder term RN goes exponentially to zero, so the ZN (for odd N ) provide a
rapidly convergent sequence of approximants to the exact integral Z.
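
The constant α0 of (11.83), and the decay rate that it implies for the remainder, can be reproduced with a few lines of code. This is a sketch under stated assumptions: the function names and the bisection bracket [0.1, 5] are choices made here for illustration, and the quantity called `decay_rate` is simply minus the exponent 1 + ln α0 − S_N(u_>) evaluated at the saddle point (11.81).

```python
from math import log, sqrt


def saddle_mismatch(alpha):
    """S_N(u_>) - S_N(u_<) as a function of alpha, read off from (11.83)."""
    s = sqrt(1 + alpha ** 2 / 4)
    return 2 * s - log((s + 1) / (s - 1))


def bisect(func, lo, hi, steps=100):
    """Plain bisection; assumes func(lo) < 0 < func(hi)."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def effective_action(u, alpha):
    """S_N(u) = alpha*u - ln|u(u-1)| from (11.80)."""
    return alpha * u - log(abs(u * (u - 1)))


# Bracket chosen by hand (an assumption): the mismatch is negative at 0.1, positive at 5.
alpha0 = bisect(saddle_mismatch, 0.1, 5.0)
u_plus = 0.5 + (1 + sqrt(1 + alpha0 ** 2 / 4)) / alpha0   # saddle point (11.81)
decay_rate = -(1 + log(alpha0) - effective_action(u_plus, alpha0))

if __name__ == "__main__":
    print("alpha0 =", alpha0)
    print("decay rate =", decay_rate)
```
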
We saw previously that the remainder term RN can be written as the sum of
strong-field and weak-field contributions AN and BN , defined in (11.75,11.76). For
the quartic integral, $S - S_0 = \lambda(x^4 - \mu_N x^2) > 0$ when $x^2 > \mu_N$, and the expression
for the strong-field contribution becomes

$$A_N = \frac{1}{N!}\int_{\sqrt{\mu_N}}^{\infty} dx\,e^{-(m^2x^2+\lambda x^4)}\int_0^{\lambda(x^4-\mu_N x^2)} e^{\xi}\,\xi^N\,d\xi \tag{11.85}$$

Defining $x^2 = \mu_N u$ and rescaling the ξ integral by

$$\xi = \lambda(x^4 - \mu_N x^2)\,\sigma = \lambda\mu_N^2\,u(u-1)\,\sigma, \qquad 0 < \sigma < 1 \tag{11.86}$$

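we find

$$A_N = \frac{(\lambda\mu_N^2)^{N+1}}{2N!}\sqrt{\mu_N}\int_1^{\infty} du\,e^{-m^2\mu_N u-\lambda\mu_N^2u^2}\,u^{N+\frac{1}{2}}\,(u-1)^{N+1}\int_0^1 \sigma^N e^{\lambda\mu_N^2u(u-1)\sigma}\,d\sigma \tag{11.87}$$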
Given that u > 1 in the integral above, the σ integral satisfies the obvious inequality

$$\int_0^1 \sigma^N e^{\lambda\mu_N^2u(u-1)\sigma}\,d\sigma < e^{\lambda\mu_N^2u(u-1)} \tag{11.88}$$

which, when inserted in (11.87), gives

$$A_N < \frac{(\lambda\mu_N^2)^{N+1}}{2N!}\sqrt{\mu_N}\int_1^{\infty} du\,u^{\frac{1}{2}}\,(u-1)\,e^{-(m^2\mu_N+\lambda\mu_N^2)u+N\ln(u(u-1))} \tag{11.89}$$

Inserting the PMS scaling behavior found earlier in (11.79, 11.80, 11.83), this becomes

$$A_N < \frac{(\lambda\mu_N^2)^{N+1}}{2N!}\sqrt{\mu_N}\int_1^{\infty} du\,u^{\frac{1}{2}}\,(u-1)\,e^{-NS_N(u)} \tag{11.90}$$

and we see that the u integral is dominated by exactly the same saddle-point at u = u>
found earlier in implementing the PMS condition. In other words, at large (odd) N ,
the u integral gives a contribution proportional to $\frac{1}{\sqrt{N}}e^{-NS_N(u_>)}$, with $u_>$ given in
(11.81) (with $\alpha_N = \alpha_0 = 1.325\ldots$). Inserting this result and using Stirling’s formula for
the factorial, a short calculation gives the desired asymptotic behavior

$$A_N \sim C\,N^{\frac{1}{4}}\,e^{N(1+\ln\alpha_0-S_N(u_>))} = C\,N^{\frac{1}{4}}\,e^{-0.6627\ldots N} \tag{11.91}$$

irrespective of the sign of $m^2$. A similar calculation for the weak-field contribution $B_N$
gives exactly the same asymptotic behavior (up to a constant)—not surprisingly, given
that we earlier saw that the PMS scaling is tantamount to equality of the effective
actions at the corresponding saddle-points, SN (u< ) = SN (u> ). The conclusion is that
the sequence of approximants ZN obtained by carrying out the δ expansion to order
N (a process involving only Gaussian integrals and therefore graphically equivalent to
conventional perturbation theory) and then evaluating the result at the PMS value for
the variational parameter μN , converges to the exact answer with exponential rapidity,
whether or not the conventional asymptotic expansion of the theory in powers of the
coupling λ is Borel summable.
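
The convergence claimed here can be watched directly for the quartic toy integral. The sketch below is illustrative rather than the author's procedure: the function names, the use of exact rational arithmetic, the bisection on the PMS condition (11.78), the integration over the whole real line, and the sample values m² = λ = 1 are all assumptions made for the example. Gaussian moments are evaluated in closed form, so only "perturbative" ingredients enter the approximants Z_N.

```python
from fractions import Fraction
from math import comb, exp, factorial, pi, sqrt


def gauss_moment(k, a):
    """<x^(2k)> for the weight exp(-a x^2): (2k)! / (4^k k! a^k), exact."""
    return Fraction(factorial(2 * k), 4 ** k * factorial(k)) / a ** k


def pms_function(N, mu, m2, lam):
    """Proportional to the integral in (11.78); its zero defines mu_N."""
    m2, lam = Fraction(m2), Fraction(lam)
    a = m2 + lam * mu
    return sum(comb(N, j) * (-mu) ** (N - j) * gauss_moment(N + j + 1, a)
               for j in range(N + 1))


def pms_mu(N, m2, lam, steps=50):
    """Bisection for the PMS value mu_N (odd N only), in rational arithmetic."""
    if N % 2 == 0:
        raise ValueError("the PMS analysis in the text assumes odd N")
    m2, lam = Fraction(m2), Fraction(lam)
    lo = max(Fraction(0), -m2 / lam) + Fraction(1, 1000)  # keeps m2 + lam*mu > 0
    hi = lo + 1
    while pms_function(N, hi, m2, lam) > 0:
        hi *= 2
    for _ in range(steps):
        mid = (lo + hi) / 2
        if pms_function(N, mid, m2, lam) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def approximant(N, mu, m2, lam):
    """Z_N of (11.69) for S0 and S of (11.77), integrated over the real line."""
    m2, lam = Fraction(m2), Fraction(lam)
    a = m2 + lam * mu
    total = Fraction(0)
    for n in range(N + 1):
        inner = sum(comb(n, j) * (-mu) ** (n - j) * gauss_moment(n + j, a)
                    for j in range(n + 1))
        total += (-lam) ** n / factorial(n) * inner
    return sqrt(pi / float(a)) * float(total)


def exact_z(m2, lam, half_width=6.0, n=6000):
    """Trapezoid-rule value of the integral of exp(-m2 x^2 - lam x^4)."""
    m2, lam = float(m2), float(lam)
    h = 2 * half_width / n
    return h * sum(exp(-m2 * x * x - lam * x ** 4)
                   for x in (-half_width + i * h for i in range(n + 1)))


if __name__ == "__main__":
    m2, lam = Fraction(1), Fraction(1)   # illustrative parameter choice
    z = exact_z(m2, lam)
    for N in (1, 3, 5, 7, 9):
        mu = pms_mu(N, m2, lam)
        zN = approximant(N, mu, m2, lam)
        print(N, float(mu), zN, abs(zN - z))
```

Running the loop prints, for each odd order, the PMS value of μ_N, the approximant, and its distance from the numerically integrated Z. Rational arithmetic is used because the binomial expansion of (x⁴ − μx²)ⁿ involves large alternating terms; the price is that m² and λ must be given as integers or `Fraction` values.
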
The optimized δ expansion can also be shown to lead to convergent results in
dimensions d = 1 (anharmonic oscillator) and d ≥ 2 ($\phi^4$ field theory), again irrespective
of the sign of the mass term (Duncan and Jones, 1993). For the anharmonic
oscillator, the finite (Euclidean) time partition function is given by the anharmonic
version of (4.110):

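$$\mathrm{Tr}(e^{-\beta H_{\rm anhar}}) = \int dQ\,\langle Q|e^{-\beta H_{\rm anhar}}|Q\rangle, \qquad H_{\rm anhar} = \frac{1}{2}(p^2 + m^2q^2) + \lambda q^4 \tag{11.92}$$

$$\phantom{\mathrm{Tr}(e^{-\beta H_{\rm anhar}})} = \int_{q(\beta)=q(0)} e^{-S(q)}\,Dq(t) \tag{11.93}$$

$$S(q) = \int_0^{\beta}\left\{\frac{1}{2}(\dot q^2 + m^2q^2) + \lambda q^4\right\}dt \tag{11.94}$$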

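which can be subjected to an optimized δ expansion by choosing

$$S_0 = \int_0^{\beta}\left\{\frac{1}{2}\dot q^2 + \frac{1}{2}(m^2 + 2\mu\lambda)q^2\right\}dt \tag{11.95}$$

$$S = \int_0^{\beta}\left\{\frac{1}{2}\dot q^2 + \frac{1}{2}m^2q^2 + \lambda q^4\right\}dt \tag{11.96}$$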

In this case, the PMS scaling turns out to be $\mu_N \sim N^{2/3}$. The strong-field contribution
to the remainder $A_N$ dies exponentially as in the quartic integral case (i.e., like
$e^{-\mathrm{const}\cdot N}$), while the weak-field contribution goes like

$$B_N < \frac{\lambda\beta\mu_N^2}{4\sqrt{2\pi N}}\,e^{-\frac{1}{\lambda\beta}(N/\mu_N)^2} \sim N^{5/6}\,e^{-CN^{2/3}/(\lambda\beta)} \tag{11.97}$$
In the case of the anharmonic oscillator, techniques are available for the calculation
of the δ expansion perturbation theory coefficients to high order (e.g., N ∼ 75), and
these convergence results can therefore be checked explicitly. In the field-theory case,
of course, higher loop calculations become simply impractical, so one has to hope for
convergence at moderate values of N (less than 5, say).
Note that the convergence of the optimized approximants in (11.97) is lost at
large Euclidean time extent β—a problem which is, of course, exacerbated in higher
dimension where β becomes βV , with V the spatial volume. This means that with the
particular interpolation chosen here (a variable bare mass), the optimized perturbation
theory is not really useful in the field theory context, even though in principle it implies
that exact results are reconstructible from “perturbative information” in the finite-
volume theory for either sign of the mass term. One might hope to eliminate the volume
dependence by attempting an optimized expansion for the connected amplitudes of the
theory—by studying an optimized expansion of ln Z rather than Z, for example—but
the convergence proof with the PMS optimized interpolation approach as described
above breaks down in this case (Duncan and Jones, 1993). Of course, this procedure
relied on a very specific choice for the interpolation between free and full actions (using
a variational bare mass term, in particular), and it may very well be possible that a
more ingenious interpolation scheme would allow a convergent reorganization of the
perturbative expansion for connected amplitudes even in non-Borel field theories.
The just-described examples of the δ expansion indicate that the question posed
at the beginning of this section—is the full content of field theory already present
in perturbatively computable amplitudes (perhaps in a highly disguised form!)?—
cannot be answered definitively in the negative, even for non-Borel-summable theories.
However, as a practical matter, we must admit that for “essentially non-perturbative”
processes in a strongly coupled non-Borel-summable field theory, those in which no
summable subset of perturbation theoretic contributions can be shown to incorporate
the dominant contribution to the desired amplitudes, a description in terms of Feyn-
man diagrams yields at best a crude qualitative (and very possibly misleading) picture
of the underlying physics. Very little can be learned, for example, about the physics
of quark confinement by studying Feynman graphs of interacting quarks and gluons,
however complicated.
In cases like these, where perturbation theory completely fails us, how can we hope
to make progress in making reliable, quantitatively accurate, predictions in a relativistic
field theory? If we give up on the most ambitious goal—explicitly calculating the full
amplitudes of the theory from a finite (and small) set of experimentally determinable
masses and couplings—much can be achieved simply by exploiting general structural
features which we expect our strongly coupled field theory to possess. In the 1950s and
1960s, for example, the failure of local field theory models to provide a quantitative
description of strong interaction processes (as they could hardly do prior to the
discovery of quantum chromodynamics (QCD), and the development of appropriate
non-perturbative techniques for dealing with QCD) led many theorists to adopt a
highly positivistic approach, in which one attempted to constrain strong interaction
scattering amplitudes (incorporated in the S-matrix of the theory), as the only directly
measurable objects, on the basis of a set of “sacred” principles, primarily Lorentz-
invariance, unitarity, crossing invariance, and a principle of maximal analyticity, which
asserted the analyticity of scattering amplitudes as functions of the complexified
kinematical variables except at points where singularities were necessitated by the
appearance of thresholds. The clever application of dispersion relations which could
be derived on the basis of these fundamental assumptions led to many important and
experimentally verifiable predictions in strong interaction physics, despite the fact
that the correct underlying local field theory had yet to be identified. Typically, in
dispersion theory one derives relations between amplitudes: the complete calculation
of a specific amplitude from first principles cannot, of course, be expected in the
absence of a specific microtheory, although at the time there were hopes that a
self-consistent “bootstrap” program for hadronic amplitudes would suffice to “almost
uniquely” determine the S-matrix for hadronic scattering.
The development of current algebra in the late 1960s provides another example
of the profitable exploitation of general symmetry assumptions to derive important
relations between amplitudes, this time on the basis of an assumed commutator algebra
of the currents of the theory associated with chiral symmetry. The results of current
algebra follow purely from the assumed current commutators, and are compatible
with a variety of underlying field-theoretic models (or “effective Lagrangians”) which
share the same current algebra (cf. Section 16.6), so the verification of current algebra
predictions for low-energy multipion scattering (say) brings us no closer to a unique
underlying dynamics than the results of the S-matrix approach. Consequently, if, as we
now believe, the dynamics of the strong interactions is just as precisely defined by an
underlying local quantum field theory (quantum chromodynamics) as the interactions
of electrons and photons were found to be by quantum electrodynamics in the 1950s, a
full test of such a theory must necessarily include a sufficiently accurate determination
of enough of the phenomenological content of the theory to allow us to conclude that
the specific quantum field theory chosen is indeed the correct one.
Fortunately, the last 30 years have seen the development of powerful new numerical
techniques for reliably extracting much of the non-perturbative content of strongly
coupled field theories. These techniques, which go under the general heading of “lattice
field theory”, mimic at a numerical level the rigorous construction of a continuum field
theory (in those cases where a construction is possible; see (Glimm and Jaffe, 1987)),
starting with a full regularization of the theory on a finite spacetime lattice (with
lattice spacing a and spacetime volume V ) and then taking the continuum limit a → 0
and the infinite volume limit V → ∞, sometimes referred to as the “thermodynamic
limit” (in that order). In the case of four-dimensional massless Yang–Mills theories
(coupled to Nf fermionic quarks, provided the number of quark types Nf does not
exceed a critical number; cf. Chapter 15) this limit is believed to yield well-defined
Green functions—in particular, a set of Wightman functions for the local operators of
the theory with zero color which satisfy the Wightman axioms, and a theory with a
non-zero mass gap in the spectrum. A rigorous proof of this assertion is likely to be
extremely difficult: it is one of the seven Millennium Prize Problems announced by the
Clay Mathematics Institute in May 2000!
Nevertheless, assuming the existence of the continuum and thermodynamic limits,
a sequence of approximants to the exact Euclidean Schwinger functions of the theory
can be obtained by evaluating the corresponding functional integrals numerically
(typically, by Monte Carlo simulation methods) on a finite hypercubical $\mathcal{L} \times \mathcal{L} \times \mathcal{L} \times \mathcal{L}$
spacetime lattice (with $\mathcal{L} = La$, $L$ integer, $a$ the lattice spacing), and then increasing
L as the lattice spacing is appropriately scaled towards zero. The statistical errors
incurred in such a numerical approach can be determined by standard statistical
techniques. The systematic errors are of two kinds: short-distance errors due to the
finite lattice spacing a, and long-distance, due to the finite extent of the lattice
both spatially and temporally. The former turn out to be simply of a power nature
$a^p$, with the power $p$ depending on the particular observable being measured. For
a theory with a mass gap $m$, the finite volume corrections fall exponentially, at
least as fast as $e^{-m\mathcal{L}}$, as the physical size $\mathcal{L}$ of the lattice is increased. In any event,
the existence of a continuum limit ensures that the approximants to the desired
Schwinger functions systematically approach the correct non-perturbative results,
unlike the situation with partial summands of the formal perturbative expansion for
these functions. In particular, using the methods of lattice gauge theory (described in
greater detail in Sections 19.3 and 19.4), it has been possible to (a) verify the presence
of a linearly rising potential at long distances (and Coulombic behavior at short
distances) between static color sources (quark confinement), as illustrated in a typical
quenched (i.e., pure gauge theory) calculation in Fig. 11.13 (from (Duncan et al.,
1995)), and (b) compute the spectrum of the low-lying hadrons from first principles
(i.e., starting from the QCD Lagrangian) to within a few percent and verify agree-
ment with the observed particle masses. (For a summary of some recent results, see
(Kuramashi, 2008).)
Despite the enormous progress that has been made in obtaining quantitatively reli-
able non-perturbative information with the methods of lattice field theory (especially
in the case of lattice QCD), the restrictions imposed in this approach to numerical
estimates of the Euclidean path integral lead to some serious drawbacks. There are
two main areas where lattice field theory leaves much to be desired:

[Plot: V(R) (GeV), vertical axis from 0 to 2.5, against R (fermi), horizontal axis from 0 to 1.2]

Fig. 11.13 Static quark–antiquark potential (pure gauge theory) from lattice simulation.

1. Unlike the situation in conventional multi-loop perturbation theory, where exact
analytic results are frequently obtainable, and if not, numerical evaluation of the
Feynman integrals is usually possible to essentially arbitrary accuracy, the use
of Monte Carlo techniques means that one is necessarily dealing with statistical
errors at each stage of a lattice calculation (i.e., with a given lattice and choice
of bare masses and couplings), in addition to the inevitable systematic errors
incurred by a finite lattice spacing and spatiotemporal volume. The statistical
errors (usually containing subtle correlations which must be carefully understood)
add an inescapable level of uncertainty to the final results. Moreover, they become
uncontrollably large in several instances of great physical importance and interest,
such as in the study of field theories at finite temperature and chemical potential,
or in Minkowski space. This leads us to the second important deficiency of lattice
field theory.
2. The behavior of the Euclidean Schwinger functions can be directly related to
many quantities of central physical importance. For example, the lowest mass in
each channel of well-defined conserved quantum numbers can be extracted from
the Euclidean large-distance behavior of the appropriate two-point function via
the Källén–Lehmann representation (cf. Section 9.5). It is also possible to obtain
certain matrix elements of local operators from the asymptotic behavior of the
Euclidean Green functions of the theory. However, many features of high-energy
scattering in QCD, for example, necessarily involve intrinsically Minkowski-space
amplitudes which cannot be recovered from their only approximately known (by
numerical estimation) Euclidean analytical continuations. On the other hand,
although the functional integral can be formulated directly in Minkowski space
as an absolutely convergent integral (cf. Section 10.3), the resultant multi-
dimensional integrand undergoes essentially independent phase fluctuations at
each lattice site, leading to an intractably low signal to noise ratio if one applies
standard Monte Carlo sampling techniques: hence the blowup of statistical errors
referred to above.10 This is the infamous “sign problem”, which is the bane of the
numerical simulation approach in many important problems in condensed-matter
physics as well as in relativistic field theory.
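
The loss of signal described in the second item can be illustrated with a deliberately crude toy model. Everything in this sketch is an assumption made for illustration: each of `volume` lattice sites is given an independent Gaussian phase of width `sigma`, and the "observable" is the real part of the product of the site phase factors. It is not a simulation of any field theory, only a picture of how independent phase fluctuations suppress the signal exponentially in the volume while the statistical noise stays of order one.

```python
import random
from math import cos, exp, sqrt


def exact_mean(volume, sigma):
    """Exact average of cos(sum of phases) for independent Gaussian site phases."""
    return exp(-0.5 * volume * sigma ** 2)


def noise_to_signal(volume, sigma, samples):
    """Relative statistical error of a plain Monte Carlo average over `samples` draws."""
    total_var = volume * sigma ** 2
    variance = 0.5 * (1 + exp(-2 * total_var)) - exp(-total_var)
    return sqrt(variance / samples) / exact_mean(volume, sigma)


def mc_estimate(volume, sigma, samples, seed=0):
    """Monte Carlo average of the real part of the product of site phase factors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        phase = sum(rng.gauss(0.0, sigma) for _ in range(volume))
        total += cos(phase)
    return total / samples


if __name__ == "__main__":
    # Illustrative numbers only.
    for volume in (1, 10, 100):
        print(volume, exact_mean(volume, 0.5), noise_to_signal(volume, 0.5, 10 ** 6))
```

In this toy model the number of samples needed to hold the relative error fixed grows like the exponential of the volume, which is the qualitative content of the sign problem.
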

In summary, it is clear that we are still far from having a comprehensive and universally
applicable strategy—a “magic bullet”, as it were—for dealing with strongly-coupled
field theories. For the time being we must instead make do with a patchwork of
techniques which provide complementary (but far from complete) information about
the physics of such theories.

10 See, however, footnote 13 of Chapter 10 for a potentially useful Langevin simulation approach to
complex actions.

11.4 Problems
1. The instability of the ground state for an anharmonic oscillator with negative
λ (i.e., $V(q) = \frac{1}{2}q^2 - \frac{|\lambda|}{4}q^4$) can be studied by the standard WKB formula. The
tunneling amplitude for a particle of zero energy to tunnel from the origin $q = 0$ to
the other side of the barrier (at $q_t = \pm\sqrt{2/|\lambda|}$) is proportional to $e^{-\int_0^{q_t}\sqrt{2V(q)}\,dq}$.
Evaluate the integral and compare with the exponential term in (11.16) (note:
the imaginary part of the energy is related to the decay rate, i.e. the square of
the tunneling amplitude).
2. In the Hamiltonian for the anharmonic oscillator ($m = \hbar = 1$)

$$H = -\frac{1}{2}\frac{d^2}{dq^2} + \frac{1}{2}q^2 + \frac{\lambda}{4}q^4 \tag{11.98}$$

show that a rescaling $x \equiv \lambda^{1/6}q$ leads to a new Hamiltonian

$$H' = \lambda^{-1/3}H = -\frac{1}{2}\frac{d^2}{dx^2} + \frac{1}{4}x^4 + \frac{1}{2}\lambda^{-2/3}x^2 \tag{11.99}$$

The last term is a Kato perturbation (see (Kato, 1995)) of the first two,
so the expansion of $\lambda^{-1/3}E(\lambda)$ in powers of $\lambda^{-2/3}$ is analytic:
$E(\lambda) = \lambda^{1/3}\sum_{n=0}^{\infty} a_n\lambda^{-2n/3}$ is a convergent series.
3. Verify the one-loop results (11.43,11.44) for the scalar loop integrals in space-time
dimensions 2 and 3, respectively (the identity (16.67) is useful).
4. Consider fermion–antifermion ($\psi$–$\psi^c$) scattering via exchange of a massive
spinless boson φ, as in Theory C of Section 7.6, in spacetime dimension d = 4.
(a) Show that the one-loop graph (fourth order in the Yukawa coupling λ) arising
from two successive exchanges of a φ (i.e., the graph displayed in Fig. 11.12
with n = 1) is infrared finite in the on-threshold limit p, q → 0. In this theory
a bound state cannot form at weak coupling by infrared enhancement of the
coupling strength: instead, the coupling itself must become large to encourage
the persistent rescattering needed for bound-state formation.
(b) Repeat the steps leading to (11.54) in this theory (i.e., study the Bethe–
Salpeter equation in the ladder approximation, treating the fermion propa-
gators non-relativistically) to show that the resultant Schrödinger equation
contains a Yukawa potential with range 1/M , where M is the mass of the
φ. Of course, in this theory, even if a bound state exists, the ladder graphs
do not play a preferred role in the formation of the bound state, unlike the
situation for threshold bound states. The formation of the bound state in
this case is an essentially non-perturbative phenomenon.
5. The effect of higher-order kernels in shifting the mass of a threshold bound state
can be calculated perturbatively by the following procedure. We shall imagine
inserting a single higher-order kernel (as, for example, in Fig. 11.10(b)) in the
graphs for the 2-2 scattering amplitude.
(a) Show that the first-order change in the amplitude G̃P (q, p) resulting from a
shift ΔKP (q, p) in the kernel KP (q, p) is given by

$$\Delta\tilde G_P(q,p) = \int\frac{d^dq'\,d^dq''}{(2\pi)^{d}}\,\tilde G_P(q,q')\,\Delta K_P(q',q'')\,\tilde G_P(q'',p) \tag{11.100}$$
(b) By taking P to the bound-state pole and extracting the pole-term on both
sides of (11.100) using (11.34), show that the first-order shift in the (squared)
bound-state mass induced by ΔKP (q, p) is

$$\Delta M_B^2 = -i\int\frac{d^dq'\,d^dq''}{(2\pi)^{2d}}\,\Phi_P^*(q')\,\Delta K_P(q',q'')\,\Phi_P(q'') \tag{11.101}$$
This formula is the field-theory analog of the familiar expression for
the energy shift in non-degenerate first-order perturbation theory in non-
relativistic quantum mechanics.
6. Verify the contour-integral identity (11.71) for the partial summand of the
asymptotic expansion of the partition function of a general scalar field theory.
7. Verify the identity (11.73) needed for the estimation of the remainder term in an
asymptotic expansion.
12
Symmetries I: Continuous spacetime
symmetry: why we need Lagrangians
in field theory

For most beginning students of quantum field theory, an early surprise is in store
when they encounter, for the first time since facing unpleasant problems in classical
mechanics, typically involving absurdly complicated devices requiring the insertion of
peculiar constraints, the notion of a Lagrangian as the fundamental object specifying
the dynamical behavior of the theory. Certainly, such a creature plays little or no role
in non-relativistic quantum theory, where the Hamiltonian, the explicit determinant
of time evolution of the system, reigns supreme. Our first objective in this chapter
is to understand the peculiar, and indispensable, utility of the Lagrangian approach
to dynamics in relativistic quantum field theories. Our emphasis initially will be to
underscore the facility with which a Lagrangian approach incorporates the desired—
in fact, indispensable—spacetime symmetries of a relativistic field theory. In later
chapters we shall see that the Lagrangian is an equally useful object in simplifying the
treatment of local gauge symmetries, which are in some sense an amalgam of internal
and spacetime symmetry.

12.1 The problem with derivatively coupled theories: seagulls, Schwinger terms, and T* products
In our first attempts to construct a Lorentz-invariant theory of scattering in Section
5.5, we saw that the simple expedient of choosing the interaction Hamiltonian density
Hint (x) of the theory to be a Lorentz scalar field seemed to lead us directly to a
Lorentz-invariant S-matrix, provided Hint (x) was also local (i.e., commuting with itself
at space-like separations). Unfortunately, as the discussion in Section 5.5 revealed,
mere locality was not quite enough, as the behavior of the commutators of Hint (x)
as the spacetime points of the commuting operators approach coincidence leads to
delicate singularity issues which can destroy the desired Lorentz-invariance. Let us
illustrate the difficulty with a simple example: a theory of a fermionic spin-1/2 field ψ
interacting with a pseudoscalar1 field φ via a derivative coupling:

$$H_{\rm int}(x) = g\,\bar\psi(x)\gamma^\mu\gamma_5\psi(x)\,\partial_\mu\phi(x) \tag{12.1}$$

1 The assignment of parity quantum numbers to local fields will be explained in Section 13.1.

which is an interaction sometimes used to describe the effective low-energy (parity-conserving)
strong-interaction coupling of nucleons to pions. Note that the interaction
Hamiltonian density in (12.1) is clearly a Lorentz scalar field, as it is obtained by
contracting an axial-vector field with the gradient of a pseudoscalar. From Wick’s
theorem, we see that the 2-2 scattering amplitude of the ψ particles contains the
following contribution to second order in the coupling g (we suppress the spin indices
for the fermions for simplicity):


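$$\frac{(-ig)^2}{2}\int d^4z_1\,d^4z_2\,\langle p_1' p_2'|:\bar\psi\gamma^\mu\gamma_5\psi(z_1)\,\bar\psi\gamma^\nu\gamma_5\psi(z_2):|p_1 p_2\rangle\cdot\langle 0|T(\partial_\mu\phi(z_1)\partial_\nu\phi(z_2))|0\rangle \tag{12.2}$$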
So far, on the surface, we seem to be dealing with a perfectly covariant expression, with
the Lorentz indices μ, ν properly contracted, but of course trouble lurks potentially in
the time-ordered product, where an explicit choice of inertial frame is presupposed.
Without the derivatives on the scalar field, this T-product is just the free scalar
propagator iΔF (z1 − z2 ), which we saw in Chapter 7 is perfectly Lorentz-invariant
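(as a function of the spacetime separation $z_1 - z_2$). In this case, we find

$$\frac{\partial}{\partial z_2^\nu}\langle 0|T(\phi(z_1)\phi(z_2))|0\rangle = \langle 0|T(\phi(z_1)\partial_\nu\phi(z_2))|0\rangle + \delta_{\nu 0}\,\delta(z_1^0 - z_2^0)\,[\phi(z_2),\phi(z_1)] \tag{12.3}$$

$$\phantom{\frac{\partial}{\partial z_2^\nu}\langle 0|T(\phi(z_1)\phi(z_2))|0\rangle} = \langle 0|T(\phi(z_1)\partial_\nu\phi(z_2))|0\rangle \tag{12.4}$$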

where the commutator appearing in (12.3) (in virtue of time-derivatives acting on the
θ-functions defining the T-product) vanishes by locality of φ(z). Inserting the second
derivative, however, we find

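$$\langle 0|T^*(\partial_\mu\phi(z_1)\partial_\nu\phi(z_2))|0\rangle \equiv \frac{\partial}{\partial z_1^\mu}\frac{\partial}{\partial z_2^\nu}\langle 0|T(\phi(z_1)\phi(z_2))|0\rangle \tag{12.5}$$

$$= \langle 0|T(\partial_\mu\phi(z_1)\partial_\nu\phi(z_2))|0\rangle + \delta_{\mu 0}\,\delta(z_1^0 - z_2^0)\,[\phi(z_1),\partial_\nu\phi(z_2)] \tag{12.6}$$

$$= \langle 0|T(\partial_\mu\phi(z_1)\partial_\nu\phi(z_2))|0\rangle + i\,\delta_{\mu 0}\,\delta_{\nu 0}\,\delta^4(z_1 - z_2) \tag{12.7}$$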

where in going from (12.6) to (12.7) we have used the equal-time commutator (8.1)
of φ with φ̇. Now the T∗ -product defined in (12.5) is itself perfectly covariant, as it is
simply the second spacetime-derivative of the Lorentz-invariant scalar propagator:

$$\langle 0|T^*(\partial_\mu\phi(z_1)\partial_\nu\phi(z_2))|0\rangle = i\int\frac{k_\mu k_\nu}{k^2 - m^2 + i\epsilon}\,e^{-ik\cdot(z_1-z_2)}\,\frac{d^4k}{(2\pi)^4} \tag{12.8}$$

Accordingly, the T-product appearing in the second-order amplitude (12.2) contains
a non-covariant contact term (sometimes called a Schwinger term)

$$\langle 0|T(\partial_\mu\phi(z_1)\partial_\nu\phi(z_2))|0\rangle = \langle 0|T^*(\partial_\mu\phi(z_1)\partial_\nu\phi(z_2))|0\rangle - i\,\delta_{\mu 0}\,\delta_{\nu 0}\,\delta^4(z_1 - z_2) \tag{12.9}$$
which in turn means that the 2-2 scattering amplitude (12.2) contains, in addition to
a perfectly covariant contribution (from the T ∗ -product), the non-covariant piece

$$i\,\frac{g^2}{2}\int d^4z\,\langle p_1' p_2'|:\bar\psi\gamma_0\gamma_5\psi(z)\,\bar\psi\gamma_0\gamma_5\psi(z):|p_1 p_2\rangle \tag{12.10}$$

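The origin of the contact term in (12.6) and (12.7) can be spelled out step by step. The following is a sketch of the intermediate algebra, assuming only the definition of the T-product through θ-functions of the time difference and the equal-time commutators of the free scalar field (φ with itself vanishing, φ with its time derivative given by (8.1)):

```latex
\begin{align*}
T\big(\phi(z_1)\,\partial_\nu\phi(z_2)\big)
  &= \theta(z_1^0-z_2^0)\,\phi(z_1)\,\partial_\nu\phi(z_2)
   + \theta(z_2^0-z_1^0)\,\partial_\nu\phi(z_2)\,\phi(z_1), \\
\frac{\partial}{\partial z_1^\mu}\,T\big(\phi(z_1)\,\partial_\nu\phi(z_2)\big)
  &= T\big(\partial_\mu\phi(z_1)\,\partial_\nu\phi(z_2)\big)
   + \delta_{\mu 0}\,\delta(z_1^0-z_2^0)\,\big[\phi(z_1),\partial_\nu\phi(z_2)\big], \\
\delta(z_1^0-z_2^0)\,\big[\phi(z_1),\partial_\nu\phi(z_2)\big]
  &= i\,\delta_{\nu 0}\,\delta^4(z_1-z_2).
\end{align*}
```

The second line arises because only the time derivative acts on the θ-functions, which is where the frame-dependent factor δ_{μ0} comes from. The third line uses the fact that at equal times the commutator of φ with a spatial gradient of φ vanishes, so that only ν = 0 survives.
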
Referring back to Section 5.5, the reader may easily verify (see Problem 1) that
the difficulty here was already identified in the general expression (5.81), where we
showed that the appearance of spatial derivatives of a δ-function in the equal-time
commutator of a non-ultralocal interaction Hamiltonian density spelled potential
disaster for the Lorentz covariance of the theory. In this case, the cure is easy to
find, as the non-covariant piece is itself local, and can be cancelled by augmenting
the interaction Hamiltonian in (12.1) by the four-fermion operator in (12.10), with an
opposite sign:

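\begin{equation}
\mathcal{H}_{\rm int}(x) = g\,\bar\psi\gamma^\mu\gamma_5\psi\,\partial_\mu\phi(x) + \frac{1}{2}g^2(\bar\psi\gamma_0\gamma_5\psi)^2(x) \tag{12.11}
\end{equation}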
The contribution to first order of the second term in (12.11) to the 2-2 scattering
amplitude is easily seen to exactly cancel the undesired non-covariant piece (12.10).
Moreover, this new term in the interaction Hamiltonian—dubbed the “seagull” vertex
in the original literature—appears in one-to-one association with every internal scalar
line in the Feynman graphs of the theory, serving to cancel the non-covariant Schwinger
term in the scalar propagator wherever the latter chooses to pop up. The appearance
of a non-covariant term in the Hamiltonian should not cause alarm: the energy density
is itself not a covariant object (as in the free Hamiltonian density, cf. (6.89)).
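
The cancellation can be checked with a short bookkeeping sketch. It assumes the standard Dyson expansion $S = T\exp(-i\int d^4x\,\mathcal{H}_{\rm int}(x))$ (the amplitude (12.2) itself is not reproduced here), and the labels $S^{(2)}_{\rm contact}$ and $S^{(1)}_{\rm seagull}$ are introduced purely for illustration:

```latex
\begin{align*}
% second order: factor (-i)^2/2! from the Dyson series, g^2 from two vertices,
% and the contact term -i\delta_{\mu 0}\delta_{\nu 0}\delta^4(z_1-z_2) from (12.9)
S^{(2)}_{\rm contact} &= \frac{(-i)^2}{2!}\,g^2\,(-i)\int d^4z\,
   \langle p_1'p_2'|:\!(\bar\psi\gamma_0\gamma_5\psi)^2(z)\!:|p_1p_2\rangle
 = +\frac{i g^2}{2}\int d^4z\,\langle p_1'p_2'|:\!(\bar\psi\gamma_0\gamma_5\psi)^2(z)\!:|p_1p_2\rangle \\
% first order: factor (-i) times the second term of (12.11)
S^{(1)}_{\rm seagull} &= (-i)\,\frac{g^2}{2}\int d^4z\,
   \langle p_1'p_2'|:\!(\bar\psi\gamma_0\gamma_5\psi)^2(z)\!:|p_1p_2\rangle \\
S^{(2)}_{\rm contact} + S^{(1)}_{\rm seagull} &= 0
\end{align*}
```

The first line reproduces the coefficient $+ig^2/2$ of (12.10), which the seagull vertex removes at one lower order in the Dyson series.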
In general, when non-covariant terms appear in a theory (via an interaction
Hamiltonian failing to be an ultralocal scalar field), it is a non-trivial task to guess the
appropriate seagull terms needed to restore Lorentz-invariance of the theory. In certain
cases (e.g., gauge theories with certain choices of gauge) the required terms may even
be non-local! We clearly need an effective means of assuring Lorentz-invariance of the
theory ab initio—which, we shall soon see, is exactly what the canonical formalism,
in its Lagrangian version, is guaranteed to supply.

12.2 Canonical formalism in quantum field theory


The dynamics of quantum theories generally (and quantum field theories in particular)
is typically specified in terms of a Hamiltonian, the quantization of which requires
that Planck’s constant be inserted in the theory via a set of commutation relations
between conjugate variables (“coordinates” and “momenta”). The specification of an
explicit dynamics for a quantum field theory has been achieved so far in this book
by constructing a free Hamiltonian for the desired particles, in terms of interaction-
picture fields, and then attempting to construct a suitable interaction Hamiltonian
(as the integral of a local interaction density) in terms of these free interaction-
picture fields. For example, for the derivative coupled theory discussed above, and
including the covariantizing seagull term, the total Hamiltonian density for the theory
would be

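\begin{align}
\mathcal{H} &= \mathcal{H}_0 + \mathcal{H}_{\rm int} \nonumber\\
&= \frac{1}{2}\dot\phi^2 + \frac{1}{2}|\vec\nabla\phi|^2 + \frac{1}{2}m^2\phi^2 + \bar\psi(i\vec\gamma\cdot\vec\nabla + M)\psi \nonumber\\
&\quad + g\,\bar\psi\gamma^\mu\gamma_5\psi\,\partial_\mu\phi + \frac{1}{2}g^2(\bar\psi\gamma^0\gamma_5\psi)^2 \tag{12.12}\\
&= \frac{1}{2}\dot\phi^2 + \frac{1}{2}|\vec\nabla\phi|^2 + \frac{1}{2}m^2\phi^2 + \bar\psi(i\vec\gamma\cdot\vec\nabla + M)\psi \nonumber\\
&\quad + g\,\bar\psi\gamma^0\gamma_5\psi\,\dot\phi - g\,\bar\psi\vec\gamma\gamma_5\psi\cdot\vec\nabla\phi + \frac{1}{2}g^2(\bar\psi\gamma^0\gamma_5\psi)^2 \tag{12.13}
\end{align}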

In the interaction picture, the identification of conjugate field variables is trivial, as the
equal-time commutation relations of the free field operators are exactly computable.
For scalar fields, we have

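\begin{align}
[\phi(\vec y,t),\pi^\phi(\vec x,t)] &= i\delta^3(\vec x-\vec y), \qquad \pi^\phi(\vec x,t)\equiv\dot\phi(\vec x,t) \tag{12.14}\\
[\phi(\vec y,t),\phi(\vec x,t)] &= [\pi^\phi(\vec y,t),\pi^\phi(\vec x,t)] = 0 \tag{12.15}
\end{align}

For Dirac fields, we have anticommutators (cf. Chapter 7, Problem 5):

\begin{align}
\{\psi_n(\vec y,t),\pi^\psi_m(\vec x,t)\} &= i\delta_{mn}\delta^3(\vec x-\vec y), \qquad \pi^\psi(\vec x,t)\equiv i\psi^\dagger(\vec x,t) = i\bar\psi(\vec x,t)\gamma^0 \tag{12.16}\\
\{\psi_n(\vec y,t),\psi_m(\vec x,t)\} &= \{\pi^\psi_n(\vec y,t),\pi^\psi_m(\vec x,t)\} = 0 \tag{12.17}
\end{align}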

The development of the canonical Hamiltonian formalism in field theory, as in classical
mechanics, requires that we re-express the Hamiltonian of the theory in terms of the
conjugate field pairs $\phi, \pi^\phi$ and $\psi, \pi^\psi$. In the field-theory context, this replacement
allows us to move trivially between different representations of the time-development
of the theory: in particular, to obtain an expression for the Hamiltonian in the
Heisenberg picture, which is the natural environment for discussing the exact dynamics
of the theory, inasmuch as no artificial splits (motivated purely by calculational
convenience) are made between “free” and “interaction” terms. Indeed, if we recall that
the interaction-picture and Heisenberg fields are connected by the unitary operator
$U(t,0) \equiv \Omega(t) = e^{iH_0 t}e^{-iHt}$ (cf. (9.1)), we see that the Heisenberg fields and field
momenta defined by

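\begin{align}
\phi_H(\vec x,t) &= \Omega^\dagger(t)\phi(\vec x,t)\Omega(t), \qquad \pi^\phi_H(\vec x,t) = \Omega^\dagger(t)\pi^\phi(\vec x,t)\Omega(t) \tag{12.18}\\
\psi_H(\vec x,t) &= \Omega^\dagger(t)\psi(\vec x,t)\Omega(t), \qquad \pi^\psi_H(\vec x,t) = \Omega^\dagger(t)\pi^\psi(\vec x,t)\Omega(t) \tag{12.19}
\end{align}

satisfy (in virtue of $\Omega^\dagger(t)\Omega(t) = 1$) exactly the same equal-time (anti)commutation
relations as the original interaction-picture fields:

\begin{align}
[\phi_H(\vec y,t),\pi^\phi_H(\vec x,t)] &= i\delta^3(\vec x-\vec y) \nonumber\\
\{\psi_H(\vec y,t),\pi^\psi_H(\vec x,t)\} &= i\delta^3(\vec x-\vec y) \tag{12.20}
\end{align}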

where we have dropped the Dirac indices on the fermionic fields to avoid notational
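overload. This means that if we replace φ̇ by $\pi^\phi$ and ψ̄ by $-i\pi^\psi\gamma^0$ in (12.13),

\begin{align}
\mathcal{H} &= \frac{1}{2}(\pi^\phi)^2 + \frac{1}{2}|\vec\nabla\phi|^2 + \frac{1}{2}m^2\phi^2 - i\pi^\psi\gamma^0(i\vec\gamma\cdot\vec\nabla + M)\psi \nonumber\\
&\quad - ig\,\pi^\psi\gamma_5\psi\,\pi^\phi + ig\,\pi^\psi\gamma^0\vec\gamma\gamma_5\psi\cdot\vec\nabla\phi - \frac{1}{2}g^2(\pi^\psi\gamma_5\psi)^2 \tag{12.21}
\end{align}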

and recall that the total Hamiltonian $H = \int\mathcal{H}(\vec x,t)\,d^3x$ is a constant of the dynamics,
we can re-express the total Hamiltonian density immediately in terms of Heisenberg
fields simply by subscripting all the fields with H:

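\begin{align}
\mathcal{H} &= \frac{1}{2}(\pi^\phi_H)^2 + \frac{1}{2}|\vec\nabla\phi_H|^2 + \frac{1}{2}m^2\phi_H^2 - i\pi^\psi_H\gamma^0(i\vec\gamma\cdot\vec\nabla + M)\psi_H \nonumber\\
&\quad - ig\,\pi^\psi_H\gamma_5\psi_H\,\pi^\phi_H + ig\,\pi^\psi_H\gamma^0\vec\gamma\gamma_5\psi_H\cdot\vec\nabla\phi_H - \frac{1}{2}g^2(\pi^\psi_H\gamma_5\psi_H)^2 \tag{12.22}
\end{align}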
The fact that the full Hamiltonian of the theory has been expressed in terms of pairs of
fields satisfying canonical commutation (or anticommutation) relations (12.20) means,
as we shall soon see, that the dynamical equations of the theory can be rewritten
as differential functional equations, the field equivalent of the first-order (in time)
Hamiltonian equations of classical mechanics. In the next section we shall see that a
simple Legendre transformation of these functional equations will lead us to a very
simple criterion for ensuring the eventual Lorentz-invariance of our field theory. This is
a particularly pressing objective, given that H in (12.22) displays no vestige whatsoever
of the underlying Lorentz-invariance of the theory! Nor indeed should it: as indicated
previously, H is a spatial energy density, clearly a frame-dependent, non-Lorentz-
invariant object.
The first step in the derivation of the Hamiltonian field equations is a generalization
of a familiar identity in ordinary quantum mechanics: the fact that the commutation of
the momentum with a function of the coordinate operator is equivalent to a derivative
of the latter. We shall begin with the case of bosonic fields, satisfying commutation,
rather than anticommutation, relations. For example, from (12.20) one finds (for n
integer)
\begin{equation}
[\phi_H^n(\vec y,t),\pi^\phi_H(\vec x,t)] = in\,\phi_H^{n-1}(\vec y,t)\,\delta^3(\vec x-\vec y) \tag{12.23}
\end{equation}

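and

\begin{equation}
[\vec\nabla\phi_H(\vec y,t)\cdot\vec\nabla\phi_H(\vec y,t),\,\pi^\phi_H(\vec x,t)] = 2i\,\vec\nabla\phi_H(\vec y,t)\cdot\vec\nabla_y\,\delta^3(\vec x-\vec y) \tag{12.24}
\end{equation}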
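
Both identities follow from the Leibniz rule for commutators, $[AB,C] = A[B,C] + [A,C]B$, together with the fact that the basic commutator (12.20) is a c-number. A minimal sketch of the steps (the index $j$ labelling spatial components is introduced here for illustration):

```latex
\begin{align*}
[\phi_H^n(\vec y,t),\pi^\phi_H(\vec x,t)]
 &= \sum_{k=0}^{n-1}\phi_H^{k}(\vec y,t)\,[\phi_H(\vec y,t),\pi^\phi_H(\vec x,t)]\,\phi_H^{\,n-1-k}(\vec y,t) \\
 % each of the n terms contributes i\delta^3, which commutes with the fields
 &= i n\,\phi_H^{\,n-1}(\vec y,t)\,\delta^3(\vec x-\vec y) \\[4pt]
[\partial^y_j\phi_H(\vec y,t),\pi^\phi_H(\vec x,t)]
 &= \partial^y_j\,[\phi_H(\vec y,t),\pi^\phi_H(\vec x,t)] = i\,\partial^y_j\delta^3(\vec x-\vec y) \\
[\partial^y_j\phi_H\,\partial^y_j\phi_H,\;\pi^\phi_H(\vec x,t)]
 &= 2\,\partial^y_j\phi_H(\vec y,t)\;i\,\partial^y_j\delta^3(\vec x-\vec y)
  = 2i\,\vec\nabla\phi_H(\vec y,t)\cdot\vec\nabla_y\,\delta^3(\vec x-\vec y)
\end{align*}
```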

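One easily generalizes these simple cases to establish that for any polynomial function
$F$ of $\phi_H$, $\pi^\phi_H$, $\vec\nabla\phi_H$, $\vec\nabla\pi^\phi_H$,

\begin{equation}
[F(\phi_H(\vec y,t),\pi^\phi_H(\vec y,t),\vec\nabla\phi_H(\vec y,t),\vec\nabla\pi^\phi_H(\vec y,t)),\,\pi^\phi_H(\vec x,t)] = i\left(\frac{\partial F}{\partial\phi_H} + \frac{\partial F}{\partial\vec\nabla\phi_H}\cdot\vec\nabla_y\right)\delta^3(\vec x-\vec y) \tag{12.25}
\end{equation}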

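A similar argument, interchanging the roles of the field $\phi_H$ and its conjugate momentum
$\pi^\phi_H$, gives

\begin{equation}
[F(\phi_H(\vec y,t),\pi^\phi_H(\vec y,t),\vec\nabla\phi_H(\vec y,t),\vec\nabla\pi^\phi_H(\vec y,t)),\,\phi_H(\vec x,t)] = -i\left(\frac{\partial F}{\partial\pi^\phi_H} + \frac{\partial F}{\partial\vec\nabla\pi^\phi_H}\cdot\vec\nabla_y\right)\delta^3(\vec x-\vec y) \tag{12.26}
\end{equation}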

In particular, taking for our function $F$ the total Hamiltonian density $\mathcal{H}$ of a scalar
theory, with the total Hamiltonian $H$ given as a (time-independent) functional of the
fields

\begin{equation}
H = \int\mathcal{H}(\phi_H(\vec y,t),\pi^\phi_H(\vec y,t),\vec\nabla\phi_H(\vec y,t),\vec\nabla\pi^\phi_H(\vec y,t))\,d^3y \tag{12.27}
\end{equation}

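the results (12.25, 12.26) show that commutation of a field with the full Hamiltonian
can be re-expressed as a functional derivative² with respect to the canonically
conjugate field:

\begin{align}
[H,\pi^\phi_H(\vec x,t)] &= i\frac{\delta H}{\delta\phi_H(\vec x,t)} = i\left(\frac{\partial\mathcal{H}}{\partial\phi_H(\vec x,t)} - \vec\nabla_x\cdot\frac{\partial\mathcal{H}}{\partial\vec\nabla\phi_H(\vec x,t)}\right) \tag{12.28}\\
[H,\phi_H(\vec x,t)] &= -i\frac{\delta H}{\delta\pi^\phi_H(\vec x,t)} = -i\left(\frac{\partial\mathcal{H}}{\partial\pi^\phi_H(\vec x,t)} - \vec\nabla_x\cdot\frac{\partial\mathcal{H}}{\partial\vec\nabla\pi^\phi_H(\vec x,t)}\right) \tag{12.29}
\end{align}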
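
The relative minus sign in front of the gradient term comes from a single integration by parts. A sketch of the step from (12.25) to (12.28), assuming (as usual) that boundary terms at spatial infinity can be dropped:

```latex
\begin{align*}
[H,\pi^\phi_H(\vec x,t)]
 &= \int d^3y\,[\mathcal{H}(\vec y,t),\pi^\phi_H(\vec x,t)] \\
 &= i\int d^3y\left(\frac{\partial\mathcal{H}}{\partial\phi_H(\vec y,t)}
      + \frac{\partial\mathcal{H}}{\partial\vec\nabla\phi_H(\vec y,t)}\cdot\vec\nabla_y\right)\delta^3(\vec x-\vec y) \\
 % integrate the gradient off the delta function; the surface term is dropped
 &= i\int d^3y\left(\frac{\partial\mathcal{H}}{\partial\phi_H(\vec y,t)}
      - \vec\nabla_y\cdot\frac{\partial\mathcal{H}}{\partial\vec\nabla\phi_H(\vec y,t)}\right)\delta^3(\vec x-\vec y) \\
 &= i\left(\frac{\partial\mathcal{H}}{\partial\phi_H(\vec x,t)}
      - \vec\nabla_x\cdot\frac{\partial\mathcal{H}}{\partial\vec\nabla\phi_H(\vec x,t)}\right)
\end{align*}
```

The same manipulation applied to (12.26) gives (12.29).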

On the other hand, we know that in Heisenberg representation, commutation with
the full Hamiltonian simply acts as a time-derivative: $[H, A_H(\vec x,t)] = -i\dot A_H(\vec x,t)$ for
any Heisenberg field $A_H$. Thus (12.28, 12.29) amount to exact field analogs of the
Hamiltonian equations $\partial H/\partial q = -\dot p$, $\partial H/\partial p = \dot q$ of classical mechanics:

\begin{equation}
\frac{\delta H}{\delta\phi_H(x)} = -\dot\pi^\phi_H(x), \qquad \frac{\delta H}{\delta\pi^\phi_H(x)} = \dot\phi_H(x) \tag{12.30}
\end{equation}

where we have returned to spacetime coordinate notation ((x, t) → x).
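
As a minimal worked instance of (12.30), consider the free scalar part of the Hamiltonian density alone. This is an illustrative special case, with the fermions and the coupling $g$ switched off, and with $\Box \equiv \partial_t^2 - \vec\nabla^2$:

```latex
\begin{align*}
\mathcal{H}_0^{\rm scalar} &= \tfrac{1}{2}(\pi^\phi_H)^2 + \tfrac{1}{2}|\vec\nabla\phi_H|^2 + \tfrac{1}{2}m^2\phi_H^2 \\
\frac{\delta H}{\delta\pi^\phi_H(x)} &= \pi^\phi_H(x) = \dot\phi_H(x) \\
\frac{\delta H}{\delta\phi_H(x)} &= m^2\phi_H(x) - \vec\nabla^2\phi_H(x) = -\dot\pi^\phi_H(x) \\
% eliminate the momentum using the first equation
\Rightarrow\quad \ddot\phi_H - \vec\nabla^2\phi_H + m^2\phi_H &= (\Box + m^2)\,\phi_H(x) = 0
\end{align*}
```

The pair of first-order Hamiltonian equations thus reproduces the Klein-Gordon equation for the free Heisenberg field.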


Completely analogous arguments lead to similar conclusions for the fermionic
fields of the theory. The only new features here are (a) that the functionals of fields
considered contain an even number of fermionic fields in each term, and (b) that the
functional derivatives with respect to the fermionic fields $\psi_H$ and $\pi^\psi_H$ must be defined³
to insert a minus sign at any point during the execution of the product rule where
the derivative operation is interchanged with a fermionic field, in order that the result
agree with that obtained by operator commutation. Thus we again find

2 In evaluating the functional derivatives in this formula, the full Hamiltonian H is assumed to be written
as a spatial integral over the fields on time-slice t. As H is conserved in time, we are of course free to choose
any time-slice on which to express the Hamiltonian as a spatial integral.
3 Cf. the discussion of Grassmann functional derivatives in Section 10.3.2.

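\begin{equation}
\frac{\delta H}{\delta\psi_H(x)} = -\dot\pi^\psi_H(x), \qquad \frac{\delta H}{\delta\pi^\psi_H(x)} = \dot\psi_H(x) \tag{12.31}
\end{equation}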
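Our toy model with Hamiltonian density (12.22) furnishes an immediate and
convenient example: for the scalar field, we find (suppressing the spacetime coordinate,
and writing $\pi^\psi_H$ in terms of the original $\bar\psi_H$)

\begin{align}
\frac{\delta H}{\delta\phi_H(x)} &= m^2\phi_H - \vec\nabla\cdot(\vec\nabla\phi_H - g\,\bar\psi_H\vec\gamma\gamma_5\psi_H) = -\dot\pi^\phi_H \tag{12.32}\\
\frac{\delta H}{\delta\pi^\phi_H(x)} &= \pi^\phi_H + g\,\bar\psi_H\gamma^0\gamma_5\psi_H = \dot\phi_H \tag{12.33}
\end{align}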
Note the presence of a fermionic term (indicated here in bold face) in the scalar
conjugate momentum field $\pi^\phi_H = \dot\phi_H - \boldsymbol{g\bar\psi_H\gamma^0\gamma_5\psi_H}$: this will very soon play a crucial
role in cancelling the non-covariant effects of the seagull term. Combining these
equations (by eliminating $\pi^\phi_H$), we arrive at the Lorentz-covariant Heisenberg field
equation for the scalar field $\phi_H$:

\begin{equation}
(\Box + m^2)\phi_H(x) = g\,\partial_\mu(\bar\psi_H\gamma^\mu\gamma_5\psi_H(x)) \tag{12.34}
\end{equation}

Note that the quartic (in the $\psi_H$ field) seagull term in the Hamiltonian plays no role
so far: the scalar Heisenberg field equation would still be Lorentz-covariant without
this term. The situation is quite different for the fermionic equation of motion. The
second of Eqs. (12.31) applied to (12.22) gives directly⁴

\begin{equation}
-i\gamma^0(i\vec\gamma\cdot\vec\nabla + M)\psi_H - ig\gamma_5\psi_H\,\pi^\phi_H + ig\gamma^0\vec\gamma\gamma_5\psi_H\cdot\vec\nabla\phi_H - \boldsymbol{g^2(\pi^\psi_H\gamma_5\psi_H)\gamma_5\psi_H} = \dot\psi_H \tag{12.35}
\end{equation}
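where the term arising from the quartic seagull contribution is highlighted in boldface
type. Multiplying both sides by $i\gamma^0$ and inserting (12.33) to eliminate the scalar
momentum field $\pi^\phi_H$, we find

\begin{align}
(i\vec\gamma\cdot\vec\nabla + M)\psi_H + g\gamma^0\gamma_5\psi_H\,\dot\phi_H - g\vec\gamma\gamma_5\psi_H\cdot\vec\nabla\phi_H &= i\gamma^0\dot\psi_H \nonumber\\
\Rightarrow\quad (i\gamma^\mu\partial_\mu - M)\psi_H &= g\gamma^\mu\gamma_5\psi_H\,\partial_\mu\phi_H \tag{12.36}
\end{align}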

so that the Heisenberg field equation for the Dirac field $\psi_H$ is also manifestly Lorentz-covariant.
It is precisely at this point that we see that the contribution from the
seagull term in the interaction Hamiltonian has cancelled the non-covariant term
introduced by the fermionic component of the scalar momentum field $\pi^\phi_H$. In other
words, the strange necessity for a four-fermion seagull interaction in order to preserve
Lorentz-invariant S-matrix amplitudes for fermion scattering is directly correlated
with the construction of a Hamiltonian leading to Lorentz-covariant field equations
for the fermionic Heisenberg field of the theory. In this toy theory it was not too
difficult to guess the type of extra non-covariant term needed in the interaction

4 The reader may verify that the first Hamiltonian equation simply produces an equation for $\psi^\dagger_H$
equivalent to the adjoint of the equation obtained from the second equation; see Problem 3.

density to restore the Lorentz-invariance of the theory, but in more complicated
models this becomes essentially impossible, and we need a fool-proof algorithm for
assuring from the very beginning that our theory possesses the full Lorentz-invariance
that we demand. The Lagrangian formalism is exactly the tool needed, as we shall
now see.
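
The cancellation between the seagull term and the fermionic part of the scalar momentum can be displayed term by term. The following is a sketch at the level of the manipulations used above: it multiplies (12.35) by $i\gamma^0$, uses $\pi^\psi_H = i\bar\psi_H\gamma^0$ and (12.33), and treats the bilinear $\bar\psi_H\gamma^0\gamma_5\psi_H$ as commuting with $\psi_H$, so operator-ordering subtleties are ignored (an assumption of this sketch):

```latex
\begin{align*}
% i\gamma^0 \times (12.35), using (\gamma^0)^2 = 1
i\gamma^0\dot\psi_H
 &= (i\vec\gamma\cdot\vec\nabla + M)\psi_H
  + g\gamma^0\gamma_5\psi_H\,\pi^\phi_H
  - g\vec\gamma\gamma_5\psi_H\cdot\vec\nabla\phi_H
  - i g^2(\pi^\psi_H\gamma_5\psi_H)\,\gamma^0\gamma_5\psi_H \\[4pt]
% momentum term, with \pi^\phi_H = \dot\phi_H - g\bar\psi_H\gamma^0\gamma_5\psi_H
g\gamma^0\gamma_5\psi_H\,\pi^\phi_H
 &= g\gamma^0\gamma_5\psi_H\,\dot\phi_H
  - g^2\,\gamma^0\gamma_5\psi_H\,(\bar\psi_H\gamma^0\gamma_5\psi_H) \\
% seagull term, with \pi^\psi_H\gamma_5\psi_H = i\bar\psi_H\gamma^0\gamma_5\psi_H
- i g^2(\pi^\psi_H\gamma_5\psi_H)\,\gamma^0\gamma_5\psi_H
 &= +\,g^2(\bar\psi_H\gamma^0\gamma_5\psi_H)\,\gamma^0\gamma_5\psi_H
\end{align*}
```

The two $g^2$ terms are equal and opposite, which leaves exactly the first line of (12.36).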

12.3 General condition for Lorentz-invariant field theory


Any formalism in which the Lorentz-invariance of a field theory is to be manifest from
the very outset must clearly be one in which time and space are treated in a symmetric
fashion, consistent with the requirements of special relativity. This is clearly not the
case in the Hamiltonian formalism, but as we shall now see, the symmetry between
space and time can be restored by the simple expedient of a Legendre transformation,
taking us from the Hamiltonian to a Lagrangian encoding of the dynamics of the
theory. We shall discuss the scalar field case only—the argument goes through in
similar fashion in the fermionic case. Also, we shall henceforth drop the H subscript,
as it will be assumed for the remainder of the chapter that we are dealing entirely with
fields in Heisenberg representation. To further streamline the algebra, let us introduce,
motivated by (12.25, 12.26), the total Euler derivative of a function $F(\phi, \vec\nabla\phi, \ldots)$
with respect to a field φ (the ellipsis denotes a possible dependence of $F$ on other
independent fields):

\begin{equation}
\frac{dF}{d\phi} \equiv \frac{\partial F}{\partial\phi} - \vec\nabla\cdot\frac{\partial F}{\partial\vec\nabla\phi} \tag{12.37}
\end{equation}

In other words, the Euler derivative acting on the Hamiltonian density is equivalent
to the functional derivative acting on the spatially integrated Hamiltonian density (or
Hamiltonian). The Hamiltonian equations (12.28, 12.29) can thus be written as a pair
of partial differential equations for the Hamiltonian density $\mathcal{H} = \mathcal{H}(\pi^\phi, \vec\nabla\phi, \phi)$, which
we assume here can be written in such a way that the momentum fields $\pi^\phi$ appear
without spatial derivatives, as is generally the case for theories of interacting spin-0
and spin-½ particles.⁵ Namely,

\begin{align}
\left.\frac{d\mathcal{H}}{d\phi}\right|_{\pi^\phi} &= -\dot\pi^\phi, \tag{12.38}\\
\left.\frac{d\mathcal{H}}{d\pi^\phi}\right|_{\phi} &= \dot\phi \tag{12.39}
\end{align}

The restoration of spacetime symmetry clearly requires that we reintroduce the time-
derivative φ̇ in favor of the canonical momentum $\pi^\phi$, inasmuch as spatial gradients
of the field φ are already in evidence. This can be done without loss of dynamical
information by the use of a Legendre transformation: one introduces a Lagrange density

5 The canonical treatment of theories of spin-1 fields involving a local gauge symmetry introduces further
subtleties which we shall defer to Chapter 15.

\begin{equation}
\mathcal{L}(\dot\phi,\vec\nabla\phi,\phi) \equiv \pi^\phi\dot\phi - \mathcal{H}(\pi^\phi,\vec\nabla\pi^\phi,\phi,\vec\nabla\phi) \tag{12.40}
\end{equation}

where $\pi^\phi$ is to be expressed in terms of φ̇ (and possibly φ and $\vec\nabla\phi$ as well) by solving
the second Hamiltonian equation

\begin{equation}
\left.\frac{d\mathcal{H}}{d\pi^\phi}\right|_{\phi} = \dot\phi \tag{12.41}
\end{equation}

Why the process of Legendre transformation should not only succeed in reintroducing
time-derivatives, but do so in just such a way as to manifest directly the Lorentz-
invariance of the theory is not obvious a priori (although we are about to demonstrate
this explicitly starting with the Hamiltonian equations of motion): the underlying
reason for the crucial role played by the Legendre transform in connecting a Lorentz-
invariant formulation of the theory with the energy operator of the theory will become
clear in the next section, when we discuss the Action Principle and Noether’s theorem.
Returning to (12.40), we find for the Euler derivative of the Lagrangian with respect
to φ (holding φ̇ fixed),

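\begin{align}
\frac{d\mathcal{L}}{d\phi} &= \frac{d\pi^\phi}{d\phi}\dot\phi - \frac{d\mathcal{H}}{d\pi^\phi}\frac{d\pi^\phi}{d\phi} - \left.\frac{d\mathcal{H}}{d\phi}\right|_{\pi^\phi} \tag{12.42}\\
&= -\left.\frac{d\mathcal{H}}{d\phi}\right|_{\pi^\phi} \tag{12.43}
\end{align}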

using (12.41) to cancel the first two terms in (12.42).


On the other hand,

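\begin{equation}
\left.\frac{\partial\mathcal{L}}{\partial\dot\phi}\right|_{\phi} = \pi^\phi + \frac{\partial\pi^\phi}{\partial\dot\phi}\dot\phi - \frac{d\mathcal{H}}{d\pi^\phi}\frac{\partial\pi^\phi}{\partial\dot\phi} = \pi^\phi \tag{12.44}
\end{equation}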

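where we have used (12.41). Taking a time-derivative, we obtain

\begin{equation}
\frac{d}{dt}\left(\left.\frac{\partial\mathcal{L}}{\partial\dot\phi}\right|_{\phi}\right) = \dot\pi^\phi = -\left.\frac{d\mathcal{H}}{d\phi}\right|_{\pi^\phi} \tag{12.45}
\end{equation}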

where in (12.45) we have used the first Hamiltonian equation (12.38). Comparing
(12.43) and (12.45), we conclude that

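\begin{align}
\frac{d}{dt}\frac{\partial\mathcal{L}}{\partial\dot\phi} &= \frac{d\mathcal{L}}{d\phi} = \frac{\partial\mathcal{L}}{\partial\phi} - \vec\nabla\cdot\frac{\partial\mathcal{L}}{\partial\vec\nabla\phi} \tag{12.46}\\
\Rightarrow\quad \partial_\mu\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)} &= \frac{\partial\mathcal{L}}{\partial\phi} \tag{12.47}
\end{align}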

The manifestly Lorentz-covariant equation (12.47) is the celebrated Euler–Lagrange
equation of the theory. It is, of course, equivalent to the Heisenberg field equation for
the field φ (i.e., for our toy theory, (12.34)), as it encodes the full dynamical information
contained in the Hamiltonian equations of motion. It is also immediately clear that
these field equations will themselves be automatically Lorentz-covariant provided the
Lagrange density is chosen to be a Lorentz-scalar function of φ and $\partial_\mu\phi$.
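
The simplest illustration of the whole chain is the free scalar field. The sketch below starts from the free scalar Hamiltonian density (the scalar part of (12.22) with $g = 0$, chosen here as an illustrative special case), performs the Legendre transformation (12.40), and applies (12.47):

```latex
\begin{align*}
\mathcal{H} &= \tfrac{1}{2}(\pi^\phi)^2 + \tfrac{1}{2}|\vec\nabla\phi|^2 + \tfrac{1}{2}m^2\phi^2,
 \qquad \dot\phi = \frac{d\mathcal{H}}{d\pi^\phi} = \pi^\phi \\
\mathcal{L} &= \pi^\phi\dot\phi - \mathcal{H}
  = \tfrac{1}{2}\dot\phi^2 - \tfrac{1}{2}|\vec\nabla\phi|^2 - \tfrac{1}{2}m^2\phi^2
  = \tfrac{1}{2}\left(\partial_\mu\phi\,\partial^\mu\phi - m^2\phi^2\right) \\
\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)} &= \partial^\mu\phi,
 \qquad \frac{\partial\mathcal{L}}{\partial\phi} = -m^2\phi
 \quad\Rightarrow\quad (\Box + m^2)\phi = 0
\end{align*}
```

The frame-dependent combination $\tfrac{1}{2}\dot\phi^2 + \tfrac{1}{2}|\vec\nabla\phi|^2$ in $\mathcal{H}$ is traded for the Lorentz-scalar $\tfrac{1}{2}\partial_\mu\phi\,\partial^\mu\phi$ in $\mathcal{L}$.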
At this point we should take note of a potential difficulty in carrying out the process
described here: the existence (and reversibility) of the Legendre transform depends on
certain convexity properties of the function to be transformed. Otherwise put, the
expression giving the momentum variable $\pi^\phi$ in terms of the time-derivative φ̇
must be a monotone function of the latter, to ensure that we can solve uniquely for φ̇
in terms of $\pi^\phi$ (and vice versa). In contrast to the case with the Legendre transform
defining the effective action Γ(φ) (cf. Section 10.4), where the required convexity is
assured by fundamental positivity properties of field theory (cf. Section 14.3), there
are many cases in which the transition from Lagrangian to Hamiltonian fails as a
result of the fact that constraints, or local gauge symmetries, are present, giving rise
to a singular transformation between field- (time-)derivatives and the field momenta.
A general theory of constrained Hamiltonian systems has been developed (beginning
with Dirac, (Dirac, 1964)) to handle such situations: we shall defer further discussion
of these issues to Chapter 15, when we shall have to face head-on the peculiar features
of the canonical formalism in theories with local gauge symmetry.
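
A standard example of such a singular transformation, given here only as an illustration (the free Maxwell Lagrangian is not discussed in this section, and sign conventions vary between texts), is the vanishing of the momentum conjugate to the time component of a gauge field:

```latex
\begin{align*}
\mathcal{L} &= -\tfrac{1}{4}F_{\mu\nu}F^{\mu\nu}, \qquad F_{\mu\nu} \equiv \partial_\mu A_\nu - \partial_\nu A_\mu \\
\pi^\mu &\equiv \frac{\partial\mathcal{L}}{\partial(\partial_0 A_\mu)} = -F^{0\mu}
 \quad\Rightarrow\quad \pi^0 = -F^{00} = 0
\end{align*}
```

Since $\pi^0$ vanishes identically by the antisymmetry of $F^{\mu\nu}$, the relation between $\dot A_0$ and $\pi^0$ cannot be inverted, and the naive Legendre transformation fails.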
For the fermion fields, the Legendre transform takes an algebraically trivial form.
Fermionic Lagrangians (or Hamiltonians) are linear in spacetime-derivatives of the
Fermi field, with the derivatives appearing only in the free part

\begin{equation}
\mathcal{L}_0 = \bar\psi(i\gamma^0\partial_0 - i\vec\gamma\cdot\vec\nabla - M)\psi \tag{12.48}
\end{equation}

Thus, we always have $\pi^\psi = i\bar\psi\gamma^0$, and the Legendre transform (starting with the
Hamiltonian, say) simply amounts to inserting $\pi^\psi\dot\psi$ into the (negative) Hamiltonian
density $-\bar\psi(i\vec\gamma\cdot\vec\nabla + M)\psi$ (thereby covariantizing it), and replacing $\pi^\psi \to i\bar\psi\gamma^0$
throughout (in both free and interaction terms).
Once again, our toy derivative-coupled theory, defined by the Hamiltonian (12.22),
provides a convenient explicit example. The Lagrangian is easily constructed: we have
(now including the fermionic contributions)

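\begin{equation}
\mathcal{L} = \pi^\phi\dot\phi + \pi^\psi\dot\psi - \mathcal{H} \tag{12.49}
\end{equation}

where we must eliminate the momentum fields by putting

\begin{align}
\pi^\phi &= \dot\phi - g\,\bar\psi\gamma^0\gamma_5\psi \tag{12.50}\\
\pi^\psi &= i\psi^\dagger = i\bar\psi\gamma^0 \tag{12.51}
\end{align}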

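A little algebra then reveals the Lagrangian density as a manifestly Lorentz-scalar
object

\begin{align}
\mathcal{L} &= \frac{1}{2}(\partial_\mu\phi\,\partial^\mu\phi - m^2\phi^2) + \bar\psi(i\partial\!\!\!/ - M)\psi - g\,\bar\psi\gamma^\mu\gamma_5\psi\,\partial_\mu\phi \tag{12.52}\\
&= \mathcal{L}_0 + \mathcal{L}_{\rm int} \tag{12.53}\\
\mathcal{L}_0 &\equiv \frac{1}{2}(\partial_\mu\phi\,\partial^\mu\phi - m^2\phi^2) + \bar\psi(i\partial\!\!\!/ - M)\psi \tag{12.54}\\
\mathcal{L}_{\rm int} &\equiv -g\,\bar\psi\gamma^\mu\gamma_5\psi\,\partial_\mu\phi \tag{12.55}
\end{align}

with an interaction term $\mathcal{L}_{\rm int}$ which is clearly a Lorentz scalar, and just the negative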
of the interaction Hamiltonian density (12.1) with which we began the chapter, in an
attempt to construct a derivatively-coupled theory of pions and nucleons. The free
Lagrangian L0 is a sum of scalar and Dirac field contributions which the reader may
easily verify lead directly (by reversing the Legendre transformation, to return to a
Hamiltonian) to the usual free scalar and Dirac Hamiltonians incorporated in (12.22).
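
The "little algebra" leading to (12.52) can be spelled out as follows. This is a sketch using the shorthand $J^0 \equiv \bar\psi\gamma^0\gamma_5\psi$ (introduced here for illustration), with (12.22) rewritten in terms of $\bar\psi$ via (12.51) and the H subscripts dropped:

```latex
\begin{align*}
\mathcal{H} &= \tfrac{1}{2}(\pi^\phi)^2 + gJ^0\pi^\phi + \tfrac{1}{2}g^2(J^0)^2
   + \tfrac{1}{2}|\vec\nabla\phi|^2 + \tfrac{1}{2}m^2\phi^2
   + \bar\psi(i\vec\gamma\cdot\vec\nabla + M)\psi - g\,\bar\psi\vec\gamma\gamma_5\psi\cdot\vec\nabla\phi \\
% the first three terms form a perfect square, and \pi^\phi + gJ^0 = \dot\phi by (12.50)
\tfrac{1}{2}(\pi^\phi)^2 + gJ^0\pi^\phi + \tfrac{1}{2}g^2(J^0)^2 &= \tfrac{1}{2}(\pi^\phi + gJ^0)^2 = \tfrac{1}{2}\dot\phi^2 \\
\pi^\phi\dot\phi &= \dot\phi^2 - gJ^0\dot\phi, \qquad \pi^\psi\dot\psi = i\bar\psi\gamma^0\partial_0\psi \\
\mathcal{L} = \pi^\phi\dot\phi + \pi^\psi\dot\psi - \mathcal{H}
 &= \tfrac{1}{2}\dot\phi^2 - \tfrac{1}{2}|\vec\nabla\phi|^2 - \tfrac{1}{2}m^2\phi^2
   + \bar\psi(i\gamma^0\partial_0 - i\vec\gamma\cdot\vec\nabla - M)\psi \\
 &\quad - g\left(\bar\psi\gamma^0\gamma_5\psi\,\dot\phi - \bar\psi\vec\gamma\gamma_5\psi\cdot\vec\nabla\phi\right) \\[4pt]
% cross-check: Euler-Lagrange equation (12.47) applied to (12.52)
\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)} &= \partial^\mu\phi - g\,\bar\psi\gamma^\mu\gamma_5\psi,
 \qquad \frac{\partial\mathcal{L}}{\partial\phi} = -m^2\phi
 \quad\Rightarrow\quad (\Box + m^2)\phi = g\,\partial_\mu(\bar\psi\gamma^\mu\gamma_5\psi)
\end{align*}
```

The quartic seagull term is absorbed entirely into the perfect square, which is why no trace of it survives in $\mathcal{L}$; the last line recovers the scalar field equation (12.34).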
We see now clearly that our original criterion for Lorentz-invariance, introduced
in Section 5.5—namely, to construct interaction Hamiltonians as scalar densities
built from products of the underlying covariant fields—was not too far from the
mark, except that the correct prescription in general requires that we choose the
Lagrangian to be a Lorentz scalar. In scalar theories without derivative coupling,
one has $\mathcal{H}_{\rm int} = -\mathcal{L}_{\rm int}$, so the two prescriptions in fact coincide. In the present case,
the presence of time-derivatives of the φ field in the interaction results in an extra
non-scalar contribution—the quartic seagull term!—to the Hamiltonian interaction
density (cf. (12.13)). Precisely such a term is needed, as we saw in Section 12.1,
to restore the Lorentz-invariance of the amplitudes of the theory, by cancelling the
non-covariant Schwinger terms which appear in propagators of the gradient field ∂μ φ
appearing in the interaction. Of course, in practice it is far easier to start with a Lorentz
scalar Lagrangian and generate the correct Hamiltonian (including any non-covariant
seagull interactions, if necessary) by an algebraically trivial⁶ Legendre transformation
than to try to guess the form of the interaction Hamiltonian needed to absorb non-
covariant terms in the propagators. In the following section we shall give a much more
general discussion of the Lagrangian formalism, in which it will become clear that it
provides the natural framework for incorporating and expressing the symmetries of the
theory (including symmetries beyond those directly associated with Lorentz/Poincaré
invariance).
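
The mechanism by which a derivative coupling generates a seagull term can be seen already in a one-degree-of-freedom analog. In the sketch below, $q$ stands in for the scalar field, and $J$ is a hypothetical external quantity independent of $\dot q$, standing in for $\bar\psi\gamma^0\gamma_5\psi$; neither symbol appears in the text above:

```latex
\begin{align*}
L &= \tfrac{1}{2}\dot q^2 - \tfrac{1}{2}m^2q^2 - g\,J\,\dot q \\
p &\equiv \frac{\partial L}{\partial\dot q} = \dot q - gJ
 \quad\Rightarrow\quad \dot q = p + gJ \\
H &= p\,\dot q - L
   = \tfrac{1}{2}(p + gJ)^2 + \tfrac{1}{2}m^2q^2 \\
  &= \underbrace{\tfrac{1}{2}p^2 + \tfrac{1}{2}m^2q^2}_{H_0}
   + \underbrace{g\,J\,p + \tfrac{1}{2}g^2J^2}_{H_{\rm int}}
\end{align*}
```

The interaction Hamiltonian is not simply $-L_{\rm int}$ with $\dot q \to p$: the shift of the momentum by $gJ$ produces the extra quadratic term $\tfrac{1}{2}g^2J^2$, the analog of the four-fermion seagull in (12.13).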
The preceding discussion has focussed on the operator formulation of field theory
(in particular, on the Heisenberg field equations of the theory). A completely parallel
discussion of the relation between Hamiltonian and Lagrangian formulations can be
given using the path-integral formulation. In this approach, the relevance of a Lorentz-
invariant Lagrangian to the appearance of fully Lorentz-invariant amplitudes can be
seen in a much more direct fashion, as it allows us to circumvent completely the
appearance of non-covariant Schwinger terms and seagull vertices, and demonstrate
directly a set of Feynman rules with no non-covariant elements. We shall see that under
fairly general circumstances, the Legendre transformation connecting a field-theoretic
Hamiltonian and Lagrangian is exactly equivalent to a functional Fourier transform.
We begin with the bosonic case. Suppose that our Hamiltonian density is a function
of N bosonic fields φn , n = 1, 2, . . . N , where the φn may be individually scalar fields,

6 The asserted triviality is, however, absent in the presence of local gauge symmetries, where the canonical
procedure becomes quite delicate, as we shall see in Chapter 15.

or components of fields transforming under more complicated representations of the
HLG (e.g., vector fields). We shall assume that the full Hamiltonian density is at
most quadratic in the conjugate momentum fields $\pi_n$, n = 1, 2, ..., N (we drop the φ
superscript on π, as only bosonic fields are present). Thus

\begin{equation}
\mathcal{H} = \frac{1}{2}\pi_nK_{nm}\pi_m + L_n\pi_n + V(\phi_n,\vec\nabla\phi_n) \tag{12.56}
\end{equation}
Here, the quadratic form $K_{nm}$ (and the $L_n$) may be integro-differential operators: our
only condition is that $K_{nm}$ be symmetric and invertible. It may also depend on fields in
the theory other than the $\phi_n$ themselves (in which case we have no ordering problems
to concern us), as may the $L_n$.⁷ These conditions ensure that we can solve for $\pi_n$ in
terms of $\dot\phi_n$ using (12.41), a prerequisite for performing the Legendre transformation:

\begin{equation}
\dot\phi_n = K_{nm}\pi_m + L_n \quad\Rightarrow\quad \pi_n = K^{-1}_{nm}(\dot\phi_m - L_m) \tag{12.57}
\end{equation}

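The Lagrangian density is now obtained directly:

\begin{equation}
\mathcal{L} = \pi_n\dot\phi_n - \mathcal{H} = \frac{1}{2}(\dot\phi_n - L_n)K^{-1}_{nm}(\dot\phi_m - L_m) - V(\phi_n,\vec\nabla\phi_n) \tag{12.58}
\end{equation}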
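If we now instead consider the Minkowski space path integral (10.52) (with the full
Hamiltonian (12.56) rather than a free one), the Gaussian functional integral over
the momentum fields $\pi_n$ can be easily evaluated by the usual trick of completing the
square, and we find

\begin{align}
\int\prod_n\mathcal{D}\pi_n\,e^{i\int(\pi_n(x)\dot\phi_n(x) - \mathcal{H})\,d^4x} &= {\rm Const}\cdot e^{i\int\left(\frac{1}{2}(\dot\phi_n - L_n)K^{-1}_{nm}(\dot\phi_m - L_m) - V(\phi_n,\vec\nabla\phi_n)\right)d^4x} \nonumber\\
&= {\rm Const}\cdot e^{i\int\mathcal{L}(\dot\phi_n,\phi_n,\vec\nabla\phi_n)\,d^4x} \tag{12.59}
\end{align}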

so, as promised, the functional Fourier transform induced by the integration over
momentum fields in the path integral has effected precisely the algebraic Legendre
transformation from the Hamiltonian to the Lagrangian. The full generating functional
of the theory is then obtained by a further functional integration over the $\phi_n$, including,
as usual, for convenience, source functions to allow us to generate the n-point functions
of the theory by functional differentiation,

\begin{equation}
Z[j] = \int\prod_n\mathcal{D}\phi_n\,e^{i\int(\mathcal{L}(\dot\phi_n,\phi_n,\vec\nabla\phi_n) - j_n(x)\phi_n(x))\,d^4x} \equiv \int\prod_n\mathcal{D}\phi_n\,e^{\frac{i}{\hbar}I[\phi_n,\partial_\mu\phi_n] - i\int j_n(x)\phi_n(x)\,d^4x} \tag{12.60}
\end{equation}
where we have reinserted the usually invisible factor of Planck’s constant in the final
expression, for reasons shortly to become apparent. For theories where the Lagrange

7 The form of Hamiltonian density assumed here takes care, for example, of the situations encountered
in the quantization of massive or massless abelian gauge vector fields coupled to scalar or fermionic matter
fields (see Problems 4 and 5)—in the massless abelian case, under the proviso that the Hamiltonian has
been evaluated in a “physical” gauge in which all the gauge freedom has been removed. The more subtle
aspects of the canonical quantization procedure, which emerge in theories with a local gauge symmetry,
will be discussed in detail in Chapter 15.

density L ends up being a Lorentz scalar therefore, the Feynman rules (vertices and
propagators) generated by this path integral will clearly lead to Green functions (and
eventually, via LSZ, to S-matrix amplitudes) behaving appropriately under Lorentz
transformation. In practice, as emphasized previously, we begin by specifying the
dynamics of the theory in terms of a Lorentz-invariant action I (= spacetime integral
of the Lagrangian density). The classical Principle of Least Action amounts, as is
apparent from the functional integral (12.60), to a stationary phase approximation
in which the integral is dominated, in the limit of very small ħ, by fields $\phi_n^{\rm cl}(x)$
which lead to extremal values of the action integral I. We shall see in the next
section that these fields are precisely those satisfying the Euler–Lagrange equations.
From the point of view of quantum field theory, the discussion here shows that the
computation of Green functions and scattering amplitudes can in fact proceed entirely
at the Lagrangian level, using the representation (12.60), with no need to refer to the
Hamiltonian of the theory (which is frequently a much more complicated object than
the Lagrangian, especially, as we shall see later, in gauge theories).
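
The "completing the square" step behind (12.59) can be sketched explicitly. The shorthand $b_n \equiv \dot\phi_n - L_n$ and the shifted variable $\tilde\pi_n$ are introduced here for illustration, and the sketch assumes that $K_{nm}$ does not depend on the fields being integrated over, so that the remaining Gaussian integral is a field-independent constant:

```latex
\begin{align*}
\pi_n\dot\phi_n - \mathcal{H}
 &= \pi_n b_n - \tfrac{1}{2}\pi_nK_{nm}\pi_m - V(\phi_n,\vec\nabla\phi_n) \\
 % K symmetric, with \tilde\pi_n \equiv \pi_n - K^{-1}_{nm}b_m
 &= -\tfrac{1}{2}\tilde\pi_nK_{nm}\tilde\pi_m + \tfrac{1}{2}b_nK^{-1}_{nm}b_m - V(\phi_n,\vec\nabla\phi_n) \\
\int\prod_n\mathcal{D}\pi_n\,e^{i\int(\pi_n\dot\phi_n - \mathcal{H})\,d^4x}
 &= \underbrace{\int\prod_n\mathcal{D}\tilde\pi_n\,e^{-\frac{i}{2}\int\tilde\pi_nK_{nm}\tilde\pi_m\,d^4x}}_{\rm Const}
    \;\cdot\; e^{i\int\left(\frac{1}{2}b_nK^{-1}_{nm}b_m - V\right)d^4x}
\end{align*}
```

The shift $\pi_n \to \tilde\pi_n$ leaves the functional measure unchanged, and the exponent that survives is precisely the Lagrangian (12.58), evaluated at the stationary point $\pi_n = K^{-1}_{nm}b_m$ of the integrand, i.e. at the solution (12.57).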
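In the fermionic case, the situation is even simpler. We saw earlier that the
fermionic Lagrangian $\mathcal{L}(\psi,\partial_\mu\psi)$ is algebraically identical to $\pi^\psi\dot\psi - \mathcal{H}(\pi^\psi,\psi,\vec\nabla\psi)$ (taking
just a single Fermi field for simplicity). Thus the transition between the two
formulations in the path integral context does not even require us to perform a
functional integral: we simply replace $\mathcal{D}\pi^\psi \to \mathcal{D}\bar\psi$ and $\pi^\psi \to i\bar\psi\gamma^0$ to convert the
Hamiltonian functional integral into the Lagrangian one:

\begin{equation}
\int\mathcal{D}\pi^\psi\mathcal{D}\psi\,e^{i\int(\pi^\psi\dot\psi - \mathcal{H}(\pi^\psi,\psi,\vec\nabla\psi))\,d^4x} \to \int\mathcal{D}\bar\psi\mathcal{D}\psi\,e^{i\int\mathcal{L}(\psi,\partial_\mu\psi)\,d^4x} \tag{12.61}
\end{equation}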

Again, we need only demand that $\mathcal{L}$ be constructed in a Lorentz-invariant way
out of covariantly transforming Dirac fields to ensure the Lorentz-invariance of the
amplitudes generated by this functional representation.

12.4 Noether’s theorem, the stress-energy tensor, and all that stuff
The Euler–Lagrange equation (12.47) may be regarded as the differential expression
of a global Action Principle. In the classical context it incorporates the Principle of
Least Action familiar from classical mechanics: the classical field evolves dynamically
in such a way as to extremize the associated (c-number) action. Let us begin with the
assumption that our action can be written as the integral of a Lagrange density depending on
fields $\phi_n$ and their (at most) first spacetime-derivatives $\partial_\mu\phi_n$. As the action involves a
spacetime integration, and our fields may be assumed to vanish at infinity (or alternatively,
to satisfy periodic boundary conditions imposed at very far distances), we may
freely integrate by parts (neglecting boundary terms) to redistribute derivatives, so
that for example, we may replace $\phi_n(x)\Box\phi_n(x) \to -\partial_\mu\phi_n(x)\partial^\mu\phi_n(x)$. An assumption
of this kind is in fact physically required for sensible four-dimensional field theories.
Higher derivative terms (such as $(\Box\phi)^2$) in the kinetic (quadratic in fields) part of the
Lagrangian can be shown to imply a failure of spectral positivity of the field theory:
clearly, the inverse Feynman propagator $\Delta_F^{-1}(p)$ of such a field will contain a term
proportional to $p^4$, so that the Feynman propagator of the theory falls faster than

$1/p^2$ at large p, which we saw in Section 9.5 is incompatible with the Källén–Lehmann
spectral representation of the two-point function of a local field theory formulated on
a positive-definite Hilbert space.⁸ On the other hand, any interaction (higher than
quadratic) terms in the Lagrangian involving more than single derivatives of the fields
turn out (in four spacetime dimensions) to violate perturbative renormalizability, as
we shall see in Part 4 of the book. Indeed, the renormalizable gauge field theories of the
Standard Model all satisfy our basic assumption and involve Lagrangians which can
be written in the form $\mathcal{L}(\phi_n,\partial_\mu\phi_n)$, where n is an index labeling the independent fields
of the theory. The extremal condition for an action obtained from such a Lagrangian
is therefore (again relying on the freedom to integrate by parts)

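\begin{align}
0 = \delta I &= \int\delta\mathcal{L}\,d^4x \nonumber\\
&= \int\left(\frac{\partial\mathcal{L}}{\partial\phi_n(x)}\delta\phi_n(x) + \frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n(x))}\partial_\mu\delta\phi_n(x)\right)d^4x \nonumber\\
&= \int\left(\frac{\partial\mathcal{L}}{\partial\phi_n(x)} - \partial_\mu\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n(x))}\right)\delta\phi_n(x)\,d^4x, \qquad \forall\,\delta\phi_n(x) \tag{12.62}
\end{align}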

As the first-order variation of the action must vanish for arbitrary local variations
δφn (x) of the independent fields of the theory, the Euler–Lagrange equations follow
directly:
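\begin{equation}
\partial_\mu\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n(x))} = \frac{\partial\mathcal{L}}{\partial\phi_n(x)} \tag{12.63}
\end{equation}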

We have already seen that the covariant form of this expression ensures that the
dynamical equations of the theory, as encapsulated in these Euler–Lagrange equations,
will be rendered compatible with the demands of special relativity by the simple
device of choosing the Lagrange density to be a Lorentz scalar constructed from the
underlying fields φn .
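
The only non-trivial step in passing from the second to the third line of (12.62) is the integration by parts, which can be sketched as follows (assuming, as stated above, that the fields and their variations vanish at infinity or satisfy periodic boundary conditions):

```latex
\begin{align*}
\int\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\,\partial_\mu\delta\phi_n\,d^4x
 &= \int\partial_\mu\!\left(\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\,\delta\phi_n\right)d^4x
  - \int\left(\partial_\mu\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\right)\delta\phi_n\,d^4x \\
 % the total divergence integrates to a boundary term, which is dropped
 &= -\int\left(\partial_\mu\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\right)\delta\phi_n\,d^4x
\end{align*}
```

Because the surviving integrand multiplies an arbitrary $\delta\phi_n(x)$, its coefficient must vanish pointwise, which is the content of (12.63).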
Our task for the remainder of this section is to display the ubiquitous role played
by the action formulation in the study of symmetries, employing as our basic tool
the beautiful result of Emmy Noether, dating from 1918 (translation of original
paper in (Noether, 1971)), that connects the symmetries of a theory defined in terms
of an action functional with conserved currents expressing the exact conservation
laws implied by the dynamics of the theory. The Noether theorem allows for the
discussion of the symmetries of the theory—whether spacetime related or “internal”—
in a completely unified way, and therefore simplifies enormously the task of reading
off from a given action the symmetries of the theory, or conversely, the construction
of actions representing theories with desired conservation laws.
Noether’s theorem predates by several years the introduction of quantized fields
(indeed, of quantum mechanics itself, in its post-Heisenberg–Schrödinger form), and
concerns the symmetry and conservation properties of classical field theories, such

8 For an explicit demonstration of the failure of positivity in the context of canonical quantization of
higher-derivative theories, see (Bernard and Duncan, 1975).

as Maxwellian electromagnetism or general relativity (the properties of the latter
with regard to energy-momentum conservation having been the immediate motivation
for Noether’s studies). Indeed, the Noether theorem as such is essentially a classical
statement: while the conservation laws implied by the theorem are rigorously valid
at the classical level, there are several instances in which the corresponding result
fails for the quantized field theory, in which case one says that the symmetry in
question displays a “quantum anomaly”. We shall return to the question of anomalies
in Chapter 15: for the present, it suffices to observe that the manipulations that
follow in our classical derivation of Noether currents pay no attention whatsoever
to the subtleties that necessarily arise when one deals with non-linear functionals
of quantum fields, involving potentially ill-defined products of the fields at identical
spacetime points. As a matter of fact, the examples given later in this section of
Noether symmetries are (with the sole exception of dilatation symmetry) anomaly-
free: the conservation laws derived here classically can be rigorously shown to survive
(in the form of the so-called “Ward identities” of the field theory) at the quantum
level, with conserved currents defined in terms of appropriately renormalized operator
products. The demonstration of this statement, however, requires the more detailed
familiarity with renormalization theory which will be developed in Part 4 of the book.
Returning to classical field theory, we start by considering an action functional IΩ
based on a specified Lagrangian L, depending on a set of fields φn , integrated over an
arbitrary domain Ω of spacetime:

$$I_\Omega \equiv \int_\Omega \mathcal{L}(\phi_n, \partial_\mu\phi_n)\,d^4x \tag{12.64}$$

Suppose that IΩ is invariant under a particular simultaneous infinitesimal transformation
of the spacetime coordinates and the fields,

$$x^\mu \to x'^\mu = x^\mu + \delta x^\mu \tag{12.65}$$
$$\phi_n(x) \to \phi'_n(x') = \phi_n(x) + \delta\phi_n(x) \tag{12.66}$$

As a concrete example, we recall that under an infinitesimal Lorentz transformation
$\Lambda^\mu{}_\nu = g^\mu{}_\nu + \omega^\mu{}_\nu$, $\omega^{\mu\nu} = -\omega^{\nu\mu}$, and for a covariant field $\phi_n(x)$ in a definite matrix
representation $M_{nm}$ of the HLG,

$$x^\mu \to x'^\mu = x^\mu + \omega^{\mu\nu}x_\nu \tag{12.67}$$
$$\phi_n(x) \to \phi'_n(x') = M_{nm}(\Lambda)\phi_m(x) \tag{12.68}$$

For example, if φ(x) is a scalar field, transforming like $\phi(x) \to \phi'(x') = \phi(x)$, its
four-gradient vector field transforms like

$$\partial_\mu\phi(x) \to \frac{\partial}{\partial x'^\mu}\phi'(x') = \frac{\partial x^\nu}{\partial x'^\mu}\frac{\partial}{\partial x^\nu}\phi(x) = \Lambda_\mu{}^\nu\,\partial_\nu\phi(x) \tag{12.69}$$

in accordance with (12.68). Of course, in making the Lorentz transformation in (12.64),
we must also transform the domain of integration Ω to Ω′, where x′ ∈ Ω′ if and only if
x ∈ Ω, thereby ensuring the invariance of IΩ under Lorentz transformations provided
L is constructed as a Lorentz scalar composite of the component fields φn .

Leaving aside temporarily the special case of Lorentz transformations, we see that
invariance of the action under (12.65, 12.66) amounts to

$$\delta I_\Omega = \int_{\Omega'}\mathcal{L}(\phi'_n(x'), \partial'_\mu\phi'_n(x'))\,d^4x' - \int_\Omega\mathcal{L}(\phi_n(x), \partial_\mu\phi_n(x))\,d^4x = 0 \tag{12.70}$$

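The Jacobian of the infinitesimal coordinate transformation is (using
$\det(1+M) = e^{\mathrm{Tr}\,\mathrm{Ln}(1+M)} = 1 + \mathrm{Tr}(M) + O(M^2)$ for M an infinitesimal matrix)

$$\det\Big(\frac{\partial x'^\mu}{\partial x^\nu}\Big) = 1 + \frac{\partial}{\partial x^\mu}\delta x^\mu \tag{12.71}$$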
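so, changing variables from x′ back to x in the first integral, and neglecting second-order
infinitesimals,

$$\begin{aligned}
\delta I_\Omega &= \int_\Omega\Big[\mathcal{L}(\phi_n+\delta\phi_n, \partial_\mu\phi_n+\delta\partial_\mu\phi_n)\Big(1+\frac{\partial}{\partial x^\mu}\delta x^\mu\Big) - \mathcal{L}(\phi_n, \partial_\mu\phi_n)\Big]d^4x \\
&= \int_\Omega\Big[\frac{\partial\mathcal{L}}{\partial\phi_n}\delta\phi_n + \frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\delta(\partial_\mu\phi_n) + \mathcal{L}\frac{\partial}{\partial x^\mu}(\delta x^\mu)\Big]d^4x, \quad \forall\,\Omega \\
&\Rightarrow\ \frac{\partial\mathcal{L}}{\partial\phi_n}\delta\phi_n + \frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\delta(\partial_\mu\phi_n) + \mathcal{L}\frac{\partial}{\partial x^\mu}(\delta x^\mu) = 0
\end{aligned} \tag{12.72}$$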

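There is a subtlety in the middle term of the final expression (12.72), arising from the
fact that the variation δ and spacetime-derivative ∂μ do not in general commute:

$$\begin{aligned}
\partial'_\mu\phi'_n(x') &= \frac{\partial}{\partial x'^\mu}\phi'_n(x') \\
&= \frac{\partial}{\partial x'^\mu}(\phi_n(x)+\delta\phi_n(x)) \\
&= \frac{\partial x^\nu}{\partial x'^\mu}\frac{\partial}{\partial x^\nu}(\phi_n(x)+\delta\phi_n(x))
\end{aligned}$$

so we find

$$\begin{aligned}
\partial'_\mu\phi'_n(x') &= \Big(g^\nu{}_\mu - \frac{\partial\delta x^\nu}{\partial x^\mu}\Big)(\partial_\nu\phi_n(x)+\partial_\nu\delta\phi_n(x)) \\
&= \partial_\mu\phi_n(x) + \partial_\mu\delta\phi_n(x) - \frac{\partial\delta x^\nu}{\partial x^\mu}\partial_\nu\phi_n(x) \\
\Rightarrow\ \delta(\partial_\mu\phi_n(x)) &= \partial_\mu\delta\phi_n(x) - \frac{\partial\delta x^\nu}{\partial x^\mu}\partial_\nu\phi_n(x)
\end{aligned} \tag{12.73}$$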
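Inserting $\delta(\partial_\mu\phi_n)$ from (12.73) in the invariance condition (12.72) we obtain

$$\frac{\partial\mathcal{L}}{\partial\phi_n}\delta\phi_n + \frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\Big(\partial_\mu\delta\phi_n - \partial_\nu\phi_n\frac{\partial\delta x^\nu}{\partial x^\mu}\Big) + \mathcal{L}\frac{\partial}{\partial x^\mu}(\delta x^\mu) = 0 \tag{12.74}$$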
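One final rearrangement of this identity leads to a convenient form for the statement
of Noether’s theorem. Define the intrinsic change in φn as $\delta^*\phi_n \equiv \delta\phi_n - \delta x^\mu\frac{\partial\phi_n}{\partial x^\mu}$: this
is the change in the field other than that due to the shift in coordinates. Multiplying
out the factors and cancelling terms, a little algebra shows

$$\begin{aligned}
&\delta^*\phi_n\Big\{\frac{\partial\mathcal{L}}{\partial\phi_n} - \partial_\nu\frac{\partial\mathcal{L}}{\partial(\partial_\nu\phi_n)}\Big\} + \partial_\mu\Big\{\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\delta\phi_n + \Big[g^{\mu\nu}\mathcal{L} - \frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\partial^\nu\phi_n\Big]\delta x_\nu\Big\} \\
&\quad= \frac{\partial\mathcal{L}}{\partial\phi_n}\delta\phi_n + \frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\Big(\partial_\mu\delta\phi_n - \partial_\nu\phi_n\frac{\partial\delta x^\nu}{\partial x^\mu}\Big) + \mathcal{L}\frac{\partial}{\partial x^\mu}(\delta x^\mu) = 0
\end{aligned} \tag{12.75}$$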
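The algebra behind (12.75) can be organized as four separate expansions whose sum reproduces the left-hand side of (12.74). What follows is only a sketch of one way to arrange the cancellation; the shorthand $\Pi^\mu_n \equiv \partial\mathcal{L}/\partial(\partial_\mu\phi_n)$ is introduced here for brevity and is not notation used in the text.

```latex
% Shorthand (an assumption of this sketch): \Pi^\mu_n \equiv \partial\mathcal{L}/\partial(\partial_\mu\phi_n)
\begin{align*}
\partial_\mu(\Pi^\mu_n\,\delta\phi_n)
  &= (\partial_\mu\Pi^\mu_n)\,\delta\phi_n + \Pi^\mu_n\,\partial_\mu\delta\phi_n, \\
\partial_\mu(\mathcal{L}\,\delta x^\mu)
  &= \Big(\frac{\partial\mathcal{L}}{\partial\phi_n}\partial_\mu\phi_n
     + \Pi^\nu_n\,\partial_\mu\partial_\nu\phi_n\Big)\delta x^\mu
     + \mathcal{L}\,\partial_\mu\delta x^\mu, \\
-\partial_\mu(\Pi^\mu_n\,\partial_\nu\phi_n\,\delta x^\nu)
  &= -(\partial_\mu\Pi^\mu_n)\,\partial_\nu\phi_n\,\delta x^\nu
     - \Pi^\mu_n\,\partial_\mu\partial_\nu\phi_n\,\delta x^\nu
     - \Pi^\mu_n\,\partial_\nu\phi_n\,\partial_\mu\delta x^\nu, \\
\delta^*\phi_n\Big(\frac{\partial\mathcal{L}}{\partial\phi_n} - \partial_\nu\Pi^\nu_n\Big)
  &= \frac{\partial\mathcal{L}}{\partial\phi_n}\delta\phi_n
     - (\partial_\nu\Pi^\nu_n)\,\delta\phi_n
     - \frac{\partial\mathcal{L}}{\partial\phi_n}\partial_\mu\phi_n\,\delta x^\mu
     + (\partial_\nu\Pi^\nu_n)\,\partial_\mu\phi_n\,\delta x^\mu .
\end{align*}
% Adding the four lines, every term containing \partial\Pi and every term with an
% undifferentiated \delta x cancels pairwise, leaving
%   (\partial\mathcal{L}/\partial\phi_n)\,\delta\phi_n
%   + \Pi^\mu_n(\partial_\mu\delta\phi_n - \partial_\nu\phi_n\,\partial_\mu\delta x^\nu)
%   + \mathcal{L}\,\partial_\mu\delta x^\mu ,
% which is the left-hand side of (12.74).
```

The second and third lines together are the divergence of the square-bracket term multiplying δxν in (12.75).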

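For fields satisfying the Euler–Lagrange equations of motion, the term proportional
to $\delta^*\phi_n$ vanishes, and we are left with the desired Noether theorem:

$$\partial_\mu J^\mu(x) = 0, \qquad J^\mu \equiv \Big[\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\partial^\nu\phi_n - g^{\mu\nu}\mathcal{L}\Big]\delta x_\nu - \frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\delta\phi_n \tag{12.76}$$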

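In other words, the four fields constituting $J^\mu$ form a divergenceless vector, or
conserved Noether current, the zeroth component of which gives the density for a
conserved charge, by the familiar maneuver:

$$\partial_\mu J^\mu = 0 \;\Rightarrow\; \frac{\partial}{\partial t}J^0 = -\vec\nabla\cdot\vec J \;\Rightarrow\; \frac{\partial}{\partial t}\int J^0(\vec x,t)\,d^3x = 0 \tag{12.77}$$

$$\text{i.e.}\quad \dot Q(t) = 0, \qquad Q(t) \equiv \int J^0(\vec x,t)\,d^3x \tag{12.78}$$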
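The last step in (12.77) rests on Gauss's theorem together with an assumption not spelled out above, namely that the spatial current falls off fast enough at spatial infinity for the surface term to vanish. A sketch, with V a large spatial volume bounded by the surface ∂V (both symbols introduced here purely for illustration):

```latex
% V and \partial V are illustrative symbols, not notation from the text.
\begin{align*}
\frac{d}{dt}\int_V J^0(\vec x,t)\,d^3x
  = -\int_V \vec\nabla\cdot\vec J(\vec x,t)\,d^3x
  = -\oint_{\partial V}\vec J\cdot d\vec S
  \;\longrightarrow\; 0
  \quad\text{as } V\to\text{all of space}.
\end{align*}
```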

To summarize: invariances of the action under infinitesimal variations taking the form
(12.65, 12.66) stand in one-to-one correspondence with conserved currents J μ , each of
which in turn gives rise to a conserved charge Q, preserved under the dynamics entailed
by the Euler–Lagrange equations of the theory. After quantization, the latter equations
simply embody the dynamics of the Heisenberg fields of the theory, so we may at least
hope that quantized currents formed by simply replacing the classical fields from
which the Noether currents are built with the corresponding Heisenberg fields will
also provide conserved objects at the quantum level. This is by no means guaranteed
a priori: as we mentioned earlier, the necessity for careful definition of the composite
operators appearing in the currents, due to ordering difficulties and/or short-distance
singularities, may interfere with the implicit smoothness properties assumed in the
purely classical derivation of the Noether theorem, leading to a “quantum anomaly”,
or violation of a classically conserved quantity at the quantum level. The examples
given below (apart from dilatation symmetry) will not, however, be infected with the
anomaly disease, to which we return in Chapter 15.
We shall shortly see that the combination of fields multiplying δxν in (12.76) plays
a fundamental role for relativistically invariant theories. The free μ, ν indices suggest
that we define a second-rank “energy-momentum” tensor T μν , the physical significance
of which will shortly emerge, as

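$$T^{\mu\nu} \equiv \frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\partial^\nu\phi_n - g^{\mu\nu}\mathcal{L} \tag{12.79}$$

In terms of $T^{\mu\nu}$, the Noether current $J^\mu$ takes the form

$$J^\mu = T^{\mu\nu}\delta x_\nu - \frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\delta\phi_n \tag{12.80}$$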
There is a useful generalization of the version of Noether’s theorem given
above, which will be important in our discussion of supersymmetry in Section 12.6. Let
us suppose that we are dealing with a global symmetry with δxν = 0, but that the
Lagrangian density is only invariant up to a spacetime divergence under an infinitesi-
mal symmetry transformation, δL = ∂μ K μ , so that the action integral IΩ is invariant
if the domain of integration Ω is all of spacetime, but not in general for arbitrary
finite domains Ω, due to boundary terms arising from the four-dimensional version
of Gauss’s theorem. The variation of the Lagrangian density under an infinitesimal
symmetry transformation φn → φn + δφn is
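$$\begin{aligned}
\delta\mathcal{L} &= \frac{\partial\mathcal{L}}{\partial\phi_n}\delta\phi_n + \frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\delta(\partial_\mu\phi_n) \\
&= \partial_\mu\Big(\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\Big)\delta\phi_n + \frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\partial_\mu\delta\phi_n \\
&= -\partial_\mu J^\mu_{\rm Noeth}
\end{aligned} \tag{12.81}$$

where we have used the Euler–Lagrange equations of motion in the second line, and
introduced the notation $J^\mu_{\rm Noeth}$ to indicate the conventional Noether current of (12.80)
(with $\delta x^\nu = 0$). But by assumption, the variation of the Lagrangian can be written as
a divergence, $\delta\mathcal{L} = \partial_\mu K^\mu$, so the current that is actually conserved in this case is

$$J^\mu = J^\mu_{\rm Noeth} + K^\mu \tag{12.82}$$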
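A simple illustration of (12.82) is obtained by re-expressing a spacetime translation as a pure field transformation with $\delta x^\nu = 0$. The sketch below assumes a constant infinitesimal vector $a^\nu$ (a symbol introduced here, not in the text) and a Lagrangian density with no explicit x-dependence; the conserved current that results is just the energy-momentum tensor (12.79) contracted with $a_\nu$.

```latex
% a^\nu is a constant infinitesimal vector introduced for this illustration.
\begin{align*}
\delta\phi_n &= a^\nu\partial_\nu\phi_n, \qquad \delta x^\nu = 0, \\
\delta\mathcal{L} &= a^\nu\partial_\nu\mathcal{L} = \partial_\mu(a^\mu\mathcal{L})
  \;\Rightarrow\; K^\mu = a^\mu\mathcal{L}, \\
J^\mu_{\rm Noeth} &= -\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\,a^\nu\partial_\nu\phi_n, \\
J^\mu &= J^\mu_{\rm Noeth} + K^\mu
  = -a_\nu\Big[\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\partial^\nu\phi_n
     - g^{\mu\nu}\mathcal{L}\Big]
  = -a_\nu T^{\mu\nu}.
\end{align*}
```

Here the Lagrangian density is not invariant, but changes by a total divergence, and the extra term $K^\mu$ supplies exactly the $g^{\mu\nu}\mathcal{L}$ piece of the energy-momentum tensor.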

12.5 Applications of Noether’s theorem


Now for some examples of Noether symmetries of central importance in relativistic
field theory. We treat first the purely spacetime symmetries—those associated with the
invariance of the theory under the Poincaré group composed of spacetime translations
and homogeneous Lorentz transformations. Then we examine the case of internal
symmetries, in which δxν = 0, and only the fields undergo a transformation. The
symmetries considered in this section all involve a Noether current of the standard
form (12.80): we shall see an example of the generalized version (12.82) in the next
section, when we consider global supersymmetry.
Example 1: Spacetime translation invariance.
As long as the Lagrangian density of the theory derives its dependence on the
spacetime coordinate x entirely through the fields, with no other explicit x-dependence
(e.g., in the coefficients multiplying the fields), the action functional IΩ of (12.64) is
clearly invariant under fixed translations (infinitesimal or finite) of x:

$$x^\mu \to x'^\mu = x^\mu + \epsilon\,g^{\mu\sigma}, \qquad \sigma = 0, 1, 2, 3 \tag{12.83}$$
$$\delta\phi_n(x) = 0 \tag{12.84}$$

Indeed, this is simply the statement of invariance of the spacetime integral under
the change of variables embodied in (12.83). The quantity ε is an arbitrary positive
infinitesimal constant, which will be divided out of the definition of the current at the
end. There are evidently four independent symmetries, corresponding to time (σ = 0)
and space (σ = 1, 2, 3) translations, with associated currents (cf. (12.80))

$$J^{\mu\sigma} = T^{\mu\sigma}, \qquad \sigma = 0, 1, 2, 3 \tag{12.85}$$

The reason for attaching the name “energy-momentum tensor” to T μν is now apparent.
Relabeling the conserved charge associated with J μσ as Pσ (rather than Qσ , say), we
have from (12.79),

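$$P^\sigma = \int d^3x\,\Big\{\frac{\partial\mathcal{L}}{\partial\dot\phi_n}\partial^\sigma\phi_n - g^{0\sigma}\mathcal{L}\Big\} \tag{12.86}$$

and in particular, for the time-component (recall that the conjugate momentum fields
$\pi_n \equiv \frac{\partial\mathcal{L}}{\partial\dot\phi_n}$)

$$P^0 = \int d^3x\,\{\pi_n\dot\phi_n - \mathcal{L}\} = \int d^3x\,\mathcal{H}(\vec x,t) = H \tag{12.87}$$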

the corresponding charge is simply the full Hamiltonian, given as the Legendre trans-
form of the Lagrangian. The fundamental role played by the Legendre transform
in connecting the Lagrangian and Hamiltonian forms is therefore seen to emerge
inescapably from the Noether treatment of the time-translational symmetry of the
theory. For the spatial components of the four-vector $P^\sigma$ we find

$$\vec P = \int d^3x\,\pi_n(\vec x,t)\,\vec\nabla\phi_n(\vec x,t) \tag{12.88}$$

The interpretation of this spatial vector as the spatial momentum can be seen if we
take the case of purely bosonic fields, and impose the standard equal-time commutator
relations

$$[\pi_n(\vec x,t), \phi_m(\vec y,t)] = -i\,\delta_{nm}\,\delta^3(\vec x-\vec y) \tag{12.89}$$

whereupon one finds

$$[\vec P, \phi_m(\vec y,t)] = \int d^3x\,[\pi_n(\vec x,t), \phi_m(\vec y,t)]\,\vec\nabla\phi_n(\vec x,t) = -i\vec\nabla\phi_m(\vec y,t) \tag{12.90}$$

so that P generates spatial translations of the fields of the theory, as expected of the
spatial momentum operator. The Noether charges Pσ given in (12.86) are therefore
nothing but our old friend the energy-momentum four-vector of the theory, with
the “columns” of the energy-momentum tensor T μν giving the associated conserved
currents (thus, ∂μ T μν = 0).
The physical interpretation of the energy-momentum tensor defined in (12.79) is
actually quite subtle. From the point of view of “flat space” field theory (formulated
in Minkowski space), the only relevant property of the tensor density $T^{\mu\nu}(x)$ is that
the spatial integrals $\int T^{0\sigma}d^3x$ reproduce the conserved energy-momentum four-vector
components Pσ which implement spacetime translations on the Heisenberg fields of the
theory. The whole axiomatic formulation of interacting field theory à la Wightman,
for example, only relies on the existence of the conserved generators of the Poincaré
group, not on the presence of a set of “charge” densities which can be integrated to
give these operators. Thus, nothing is altered from the point of view of Minkowski
field theory if we “redistribute” the energy and momentum density on any time-slice
as long as the spatial integral preserves the total energy and momentum of the field as
given by Pσ . For example, we can certainly alter the “canonical” energy-momentum
tensor (12.79) (to which we now add a “c” subscript to indicate its special origin in
the canonical Noether procedure) by adding a “superpotential” term:
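$$T_c^{\mu\nu} \equiv \frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}\partial^\nu\phi_n - g^{\mu\nu}\mathcal{L} \;\to\; T^{\mu\nu} = T_c^{\mu\nu} + \partial_\lambda S^{\lambda\mu\nu}, \qquad S^{\lambda\mu\nu} = -S^{\mu\lambda\nu} \tag{12.91}$$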

as the divergence of the added term (on the μ index) is automatically zero due to
the antisymmetry property of the superpotential, so that the modified tensor leads to
exactly the same energy-momentum vector as the canonical version.
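Both properties of the superpotential term used here follow in one line each. The sketch assumes that $S^{\lambda\mu\nu}$ falls off fast enough at spatial infinity for surface terms to be dropped, an assumption not stated explicitly in the text.

```latex
\begin{align*}
% Divergence on the \mu index: symmetric derivatives contracted with an
% antisymmetric pair of indices vanish identically.
\partial_\mu\partial_\lambda S^{\lambda\mu\nu}
  &= \tfrac12\,\partial_\mu\partial_\lambda\big(S^{\lambda\mu\nu} + S^{\mu\lambda\nu}\big) = 0, \\
% Contribution to the charge: S^{00\nu} = 0 by antisymmetry, so only a
% spatial divergence survives, which integrates to a surface term.
\int d^3x\,\partial_\lambda S^{\lambda 0\nu}
  &= \int d^3x\,\partial_i S^{i0\nu} = 0 .
\end{align*}
```

The first line shows that conservation is untouched, the second that $\int T^{0\nu}d^3x = \int T_c^{0\nu}d^3x$.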
The actual local distribution of energy and momentum only acquires physical
significance if there are fields in the theory which couple directly, and locally, to the
energy-momentum tensor. In fact, once we include gravitational effects along the lines
of general relativity, such a field appears immediately in the form of the now dynamical
spacetime metric gμν (x). Once a generally covariant action functional is constructed
for the particular matter fields in the background metric gμν (x), the variation of the
action with respect to the metric is precisely the energy-momentum tensor T μν (x) of
these fields. In particular, in the weak field limit for the gravitational field, where
we expand gμν (x) = ημν + hμν (x), and now ημν = diag(1, −1, −1, −1) is the fixed
Minkowski metric (which we have heretofore simply called gμν !), hμν (x) acts as an
interpolating field for gravitons, and the term hμν (x)Tμν (x) of first-order in the metric
deviation is the appropriate interaction Lagrangian density if we wish to compute S-
matrix amplitudes for processes involving one graviton and multiple matter particles.
The specific choice of the spatial dependence of Tμν (x) becomes physically
significant in this case. In particular, the Tμν tensor obtained by metric variation of
the generally covariant matter action is manifestly a symmetric tensor, Tμν = Tνμ , a property
not guaranteed by the Noether expression (12.79), and indeed there are cases
where the need to construct a generally covariant action necessitates the addition
of a superpotential term to the canonical tensor Tcμν in order to obtain a properly
symmetric tensor. For scalar field theory (e.g., λφ4 theory), the canonical tensor Tcμν
is already symmetric,

Tcμν = ∂ μ φ∂ ν φ − g μν L (12.92)

but leads to ultraviolet divergences when used as the current coupled to gravitons in
single graviton-multi-scalar scattering amplitudes, as shown by Callan, Coleman, and
Jackiw (Callan et al., 1970). The situation is remedied by adding a term $-\frac{1}{12}R(x)\phi^2(x)$
to the generally covariant Lagrangian density, where R(x) is the curvature scalar (so
that the added term vanishes in flat space), thereby modifying the energy-momentum
tensor (which is the metric variation of the Lagrange density) by a superpotential
term, and leading to a “new, improved” energy-momentum tensor:9

$$T^{\mu\nu}_{\rm impr}(x) = T_c^{\mu\nu}(x) - \frac{1}{6}(\partial^\mu\partial^\nu - g^{\mu\nu}\Box)(\phi^2(x)) \tag{12.93}$$
We shall see shortly that this improved tensor also leads to simplified forms for the
currents associated with dilatation and conformal symmetry.
Example 2: Invariance under the homogeneous Lorentz group.
In this case the invariance obtains once we have constructed an action as the spacetime
integral of a scalar Lagrangian density, built out of properly contracted covariant fields.
The infinitesimal transformations are those given in (12.67, 12.68), repeated here for
convenience:

$$x^\mu \to x'^\mu = x^\mu + \omega^{\mu\nu}x_\nu \tag{12.94}$$
$$\phi_n(x) \to \phi'_n(x') = M_{nm}(\Lambda)\phi_m(x) \tag{12.95}$$

By definition (cf. Section 7.2) the representatives Mnm (Λ) of a Lorentz transformation
Λ in a particular (finite-dimensional) representation (which may be a direct sum of
irreducible representations) are given in terms of the generators (J μν )nm and the
infinitesimal rotation angles and boost rapidities ωμν by
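$$M_{nm}(\Lambda) = \delta_{nm} + \frac{i}{2}\omega_{\mu\nu}(J^{\mu\nu})_{nm} + O(\omega^2), \qquad \omega_{\mu\nu} = -\omega_{\nu\mu} \tag{12.96}$$

so

$$\delta\phi_n(x) \equiv \phi'_n(x') - \phi_n(x) = \frac{i}{2}\omega_{\mu\nu}(J^{\mu\nu})_{nm}\phi_m(x) \tag{12.97}$$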
There are six independent choices for the ωμν corresponding to rotations around or
boosts along the three spatial axes. Let us pick a specific one by choosing a pair κ, λ
with 0 ≤ κ < λ ≤ 3, and setting

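$$\omega_{\mu\nu} = g^\kappa{}_\mu g^\lambda{}_\nu - g^\kappa{}_\nu g^\lambda{}_\mu, \qquad \delta x_\nu = g^\kappa{}_\nu x^\lambda - g^\lambda{}_\nu x^\kappa \tag{12.98}$$

and the corresponding conserved currents are

$$\begin{aligned}
\mathcal{M}^{\mu\kappa\lambda} &= (g^\kappa{}_\nu x^\lambda - g^\lambda{}_\nu x^\kappa)T^{\mu\nu} - i\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}(J^{\kappa\lambda})_{nm}\phi_m \\
&= x^\lambda T^{\mu\kappa} - x^\kappa T^{\mu\lambda} - i\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}(J^{\kappa\lambda})_{nm}\phi_m
\end{aligned} \tag{12.99}$$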

Each conserved current in turn leads to a conserved charge, obtained by spatially


integrating its time component: to avoid unnecessary proliferation of notation, we

9 As shown by (Freedman and Weinberg, 1974), the coefficient 1/6 is further modified at the two-loop
level, and beyond, by renormalization effects.

shall use the same letter M to denote both the current and its charge:

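$$\mathcal{M}^{\kappa\lambda} \equiv \int d^3x\,\mathcal{M}^{0\kappa\lambda}, \qquad \dot{\mathcal{M}}^{\kappa\lambda} = 0 \tag{12.100}$$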

The interpretation of the various terms appearing in (12.99) becomes clearer if we


examine the special case of spatial rotations, by picking (κ, λ) = (i, j), 1 ≤ i < j ≤ 3.
The associated charge is then seen to be

$$\mathcal{M}^{ij} = x^jP^i - x^iP^j - i\pi_n(J^{ij})_{nm}\phi_m \tag{12.101}$$

the first two terms of which clearly give the orbital angular momentum of the system,
whereas the last term, present only for fields transforming non-trivially under the
Lorentz group, must correspond to spin angular momentum. That the total charge
Mij indeed corresponds to the total angular momentum operator follows from the
fact that it generates, by commutation with the field, and employing the usual equal-
time commutation relations, the correct infinitesimal variation:

$$[\mathcal{M}^{ij}, \phi_n(\vec x,t)] = i(x^i\partial^j - x^j\partial^i)\phi_n(\vec x,t) - (J^{ij})_{nm}\phi_m(\vec x,t) \tag{12.102}$$

A similar result can be obtained for the commutation with the boost operator M0i —
an exercise we leave for the reader (see Problem 6). The result (12.102) indicates that
our Noether charges are indeed the correct generators of the HLG in the state space of
the quantum field theory. Indeed, for a general Lorentz transformation Λ the covariant
field transformation law is (cf. (7.5))

$$U(\Lambda)\phi_n(x)U^\dagger(\Lambda) = M_{nm}(\Lambda^{-1})\phi_m(\Lambda x) \tag{12.103}$$

Note that we are working entirely in Heisenberg representation here, so that these
U (Λ)s are really the UH (Λ) of Section 9.1, acting on the fully interacting Heisen-
berg fields of the theory, and in the state space spanned by eigenstates of the full
Hamiltonian: we are omitting the H subscript for simplicity of notation. If we take
Λ infinitesimally close to the identity, Λμν = g μν + ω μν , the corresponding unitary
operators are expressed in terms of the Hilbert space generators Mμν of infinitesimal
Lorentz transformations (see (9.64)), for which we use the same notation as the Noether
charges found above, as they will shortly be seen to be identical:

$$U(\Lambda) = 1 + \frac{i}{2}\omega_{\mu\nu}\mathcal{M}^{\mu\nu} + O(\omega^2) \tag{12.104}$$

Inserting (12.104) into (12.103), we find, on expanding every term to first order in ω,
the commutation relation for the generators $\mathcal{M}^{\mu\nu}$ with our field φn :

$$[\mathcal{M}^{\mu\nu}, \phi_n(x)] = i(x^\mu\partial^\nu - x^\nu\partial^\mu)\phi_n(x) - (J^{\mu\nu})_{nm}\phi_m(x) \tag{12.105}$$

which agrees with (12.102) if we set (μ, ν) → (i, j). One may also compute the
commutators of the various Noether charges with each other, using again the equal-
time commutators of the theory. In this way, the verification of the full Poincaré
algebra, as given in (9.65, 9.67, 9.68), can be carried out explicitly starting from the
expressions for the Noether charges given above.
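The first-order expansion that takes (12.103) and (12.104) into (12.105) is short enough to display. This is a sketch keeping only terms linear in ω; it assumes hermitian generators (so that $U^\dagger = 1 - \frac{i}{2}\omega_{\mu\nu}\mathcal{M}^{\mu\nu}$) and uses $(\Lambda x)^\rho = x^\rho + \omega^{\rho\sigma}x_\sigma$ together with $M(\Lambda^{-1}) = 1 - \frac{i}{2}\omega_{\mu\nu}J^{\mu\nu}$ from (12.96).

```latex
\begin{align*}
U(\Lambda)\phi_n(x)U^\dagger(\Lambda)
  &= \phi_n(x) + \frac{i}{2}\omega_{\mu\nu}\,[\mathcal{M}^{\mu\nu}, \phi_n(x)], \\
M_{nm}(\Lambda^{-1})\phi_m(\Lambda x)
  &= \phi_n(x) - \frac{i}{2}\omega_{\mu\nu}(J^{\mu\nu})_{nm}\phi_m(x)
     + \omega_{\rho\sigma}x^\sigma\partial^\rho\phi_n(x) \\
  &= \phi_n(x) - \frac{i}{2}\omega_{\mu\nu}(J^{\mu\nu})_{nm}\phi_m(x)
     - \frac12\,\omega_{\mu\nu}(x^\mu\partial^\nu - x^\nu\partial^\mu)\phi_n(x).
\end{align*}
% Equating the coefficients of the antisymmetric \omega_{\mu\nu} and multiplying
% by 2/i gives
%   [\mathcal{M}^{\mu\nu}, \phi_n] = i(x^\mu\partial^\nu - x^\nu\partial^\mu)\phi_n
%                                    - (J^{\mu\nu})_{nm}\phi_m ,
% which is (12.105).
```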

Example 3: Dilatation and conformal symmetry.


The Poincaré group consisting of the homogeneous Lorentz transformations (with six
independent real parameters, associated with the generators Mμν ), together with the
(four-parameter) group of spacetime translations (with generators P μ ), can actually be
enlarged to a fifteen-parameter group, the conformal group, by including a dilatation
symmetry (with generator D) and a four-parameter set of conformal transformations
(with generators Kμ ). The effect of finite dilatation and conformal transformations on
the spacetime coordinates is10

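$$x^\mu \to x'^\mu \equiv e^{-\rho}x^\mu \qquad \text{dilatation} \tag{12.106}$$

$$x^\mu \to x'^\mu \equiv \frac{x^\mu - c^\mu x^2}{1 - 2c\cdot x + c^2x^2} \qquad \text{conformal} \tag{12.107}$$

which, for infinitesimal transformations (ρ and $c^\mu$ infinitesimal), become

$$x^\mu \to x^\mu - \rho x^\mu + O(\rho^2) \tag{12.108}$$

$$x^\mu \to x^\mu + c_\sigma(2x^\mu x^\sigma - g^{\sigma\mu}x^2) + O(c^2) \tag{12.109}$$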

The four-momentum generators $P^\mu$ must transform like $-i\,\partial/\partial x_\mu$ under dilatations,
i.e., with a factor $e^{+\rho}$, so under a finite dilatation generated by $e^{i\rho D}$

$$e^{i\rho D}P^\mu e^{-i\rho D} = e^{\rho}P^\mu \;\Rightarrow\; e^{i\rho D}P^2e^{-i\rho D} = e^{2\rho}P^2 \tag{12.110}$$

so that in any theory with exact dilatation symmetry (so that $e^{-i\rho D}|\psi\rangle$ is a physical
state if $|\psi\rangle$ is), the mass spectrum of particles must either be exactly zero or continuous:
clearly not the world we live in! Even for massless theories, we shall see that the
classical Noether dilatation (and conformal) currents and charges are in general
broken by quantum effects (anomalies) once interactions are present, so the formal
existence of the conformal extension of the Poincaré group may seem at first sight to
be a matter of purely formal interest.11 Nevertheless, the nature of the breaking of
the conformal group in interacting field theories is now completely understood, and
has deep and important connections to the renormalization group properties of such
theories which we shall study in detail in Part 4 of the book. Accordingly, we shall
give a brief description of the dilatation current and its connection to the trace of
the energy-momentum tensor, starting again from the general Noether prescription
for construction of a conserved current for a classical symmetry of the action.
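The statement about the mass spectrum made above follows directly from (12.110). A sketch, assuming $|\psi\rangle$ is a state of definite invariant mass m, so that $P^2|\psi\rangle = m^2|\psi\rangle$:

```latex
\begin{align*}
P^2\,e^{-i\rho D}|\psi\rangle
  = e^{-i\rho D}\big(e^{i\rho D}P^2e^{-i\rho D}\big)|\psi\rangle
  = e^{2\rho}\,m^2\,e^{-i\rho D}|\psi\rangle .
\end{align*}
```

Since ρ is an arbitrary real number, a single physical state with $m^2 > 0$ implies physical states with every positive value $e^{2\rho}m^2$ of the squared mass; only m = 0 is left fixed by the rescaling.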
First, we need to establish appropriate transformation rules for the fields of the
theory, which we shall take for simplicity to be self-conjugate spin-zero scalars. The

10 The rather strange (and highly non-linear!) expression (12.107) for the conformal transformation can
be understood once we re-express it as a sequence: coordinate inversion I, translation T, coordinate
inversion I, where $Ix^\mu \equiv x^\mu/x^2$, $Tx^\mu = x^\mu - c^\mu$. It follows immediately that the conformal transformations
form an abelian subgroup of the full conformal group.
11 An important exception arises in the case of two-dimensional conformal field theories: it turns out that
there is a rich plethora of such non-trivial field theories displaying exact conformal invariance, which have
been the subject of intensive study in the last thirty years. The situation with exactly conformally invariant
theories in four dimensions is murkier; cf. Section 15.5.
classical Noether action for a massless $\phi^4$ theory,

$$I = \int\Big(\frac{1}{2}\partial_\mu\phi(x)\partial^\mu\phi(x) - \frac{\lambda}{4!}\phi(x)^4\Big)d^4x \tag{12.111}$$

is easily seen to be invariant under the dilatation transformation of the fields

$$\phi'(x') = e^{d\rho}\phi(x), \qquad x' = e^{-\rho}x \tag{12.112}$$

provided we choose the real number d (called the “scale dimension” of the field φ)
equal to unity. For example,

$$\int\phi'(x')^4\,d^4x' = \int e^{4d\rho}\phi(x)^4\,e^{-4\rho}d^4x = \int\phi(x)^4\,d^4x \quad\text{if } d = 1 \tag{12.113}$$

On the other hand, this invariance is destroyed the moment we include terms in the
Lagrangian with dimensionful coefficients, such as a mass term $\frac{1}{2}m^2\phi^2$ or interaction
terms other than $\phi^4$, i.e., $\lambda^{(n)}\phi^n$, $n \neq 4$.
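The same counting can be carried out for the kinetic term and for a mass term. The following is a sketch using only (12.112), from which $\partial x^\nu/\partial x'^\mu = e^{\rho}g^\nu{}_\mu$:

```latex
\begin{align*}
\partial'_\mu\phi'(x')
  &= \frac{\partial x^\nu}{\partial x'^\mu}\,\partial_\nu\big(e^{d\rho}\phi(x)\big)
   = e^{(d+1)\rho}\,\partial_\mu\phi(x), \\
\int\partial'_\mu\phi'\,\partial'^\mu\phi'\,d^4x'
  &= e^{2(d+1)\rho - 4\rho}\int\partial_\mu\phi\,\partial^\mu\phi\,d^4x
   = \int\partial_\mu\phi\,\partial^\mu\phi\,d^4x \quad\text{if } d = 1, \\
\int m^2\phi'(x')^2\,d^4x'
  &= e^{(2d-4)\rho}\int m^2\phi(x)^2\,d^4x
   = e^{-2\rho}\int m^2\phi(x)^2\,d^4x \quad\text{for } d = 1 .
\end{align*}
```

The kinetic and quartic terms thus fix d = 1 consistently, while a mass term picks up a factor $e^{-2\rho}$ and breaks the symmetry.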
Referring back to the specification of a Noether symmetry in terms of the infinites-
imal transformations of coordinates (12.65) and fields (12.66), our general expression
for the associated Noether current (12.80) gives, for the present case of dilatation
symmetry, (taking ρ infinitesimal, and dividing by −ρ to obtain the conventional
normalization)

$$J^\mu_{\rm dil,c} = T_c^{\mu\nu}x_\nu + d\,\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)}\phi = T_c^{\mu\nu}x_\nu + (\partial^\mu\phi)\phi \tag{12.114}$$

The subscript “c” in the dilatation current and the energy-momentum tensor indicate
that we are using the canonical energy-momentum tensor, as described previously,
without the “improvement” necessary once we couple quantized matter fields to
gravitation. For the theory defined by action (12.111), the canonical energy-momentum
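tensor is just

$$T_c^{\mu\nu} = \partial^\mu\phi\,\partial^\nu\phi - \frac{1}{2}g^{\mu\nu}\partial_\alpha\phi\,\partial^\alpha\phi + g^{\mu\nu}\frac{\lambda}{4!}\phi^4 \tag{12.115}$$

and the conservation of the dilatation current at the classical level follows immediately
(using $\partial_\mu T_c^{\mu\nu} = 0$, and the classical field equation $\Box\phi + \frac{\lambda}{3!}\phi^3 = 0$)

$$\partial_\mu J^\mu_{\rm dil,c} = T_c{}^\mu{}_\mu + \partial_\mu(\phi\,\partial^\mu\phi) = -\partial_\mu\phi\,\partial^\mu\phi + 4\frac{\lambda}{4!}\phi^4 + \partial_\mu\phi\,\partial^\mu\phi + \phi\Box\phi = 0 \tag{12.116}$$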

As usual, Noether currents may be modified by the addition of superpotential terms
which do not alter their conservation property (as the superpotential terms are by
definition automatically divergence-free), and in the case of the dilatation current,
unlike the energy-momentum tensor which couples to gravitons, there are no physical
fields coupled to the current $J^\mu_{\rm dil,c}$, so we are free to construct an “improved” dilatation
current by defining

$$\begin{aligned}
J^\mu_{\rm dil,impr} &\equiv J^\mu_{\rm dil,c} + \frac{1}{6}\partial_\sigma(x^\mu\partial^\sigma - x^\sigma\partial^\mu)(\phi^2) \\
&= T_c^{\mu\nu}x_\nu + (\partial^\mu\phi)\phi + \frac{1}{6}\partial_\sigma(x^\mu\partial^\sigma - x^\sigma\partial^\mu)(\phi^2) \\
&= T^{\mu\nu}_{\rm impr}x_\nu
\end{aligned} \tag{12.117}$$

where $T^{\mu\nu}_{\rm impr}$ is the improved energy-momentum tensor (12.93), and conservation of
the dilatation current now reduces simply to the tracelessness of this tensor

$$\partial_\mu J^\mu_{\rm dil,impr} = T_{\rm impr}{}^\mu{}_\mu = 0 \tag{12.118}$$
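The last equality in (12.117) can be checked directly. A sketch, using only $\partial_\sigma x^\mu = g_\sigma{}^\mu$, $\partial_\sigma x^\sigma = 4$ and $(\partial^\mu\phi)\phi = \frac{1}{2}\partial^\mu(\phi^2)$:

```latex
\begin{align*}
\partial_\sigma(x^\mu\partial^\sigma - x^\sigma\partial^\mu)(\phi^2)
  &= \partial^\mu(\phi^2) + x^\mu\Box(\phi^2) - 4\,\partial^\mu(\phi^2)
     - x_\nu\partial^\nu\partial^\mu(\phi^2) \\
  &= -3\,\partial^\mu(\phi^2) - x_\nu(\partial^\mu\partial^\nu - g^{\mu\nu}\Box)(\phi^2), \\
(\partial^\mu\phi)\phi + \frac16\,\partial_\sigma(x^\mu\partial^\sigma - x^\sigma\partial^\mu)(\phi^2)
  &= -\frac16\,x_\nu(\partial^\mu\partial^\nu - g^{\mu\nu}\Box)(\phi^2).
\end{align*}
% Adding T_c^{\mu\nu}x_\nu to the last line gives T_{\rm impr}^{\mu\nu}x_\nu,
% with T_{\rm impr}^{\mu\nu} as defined in (12.93).
```

The added term is itself divergence-free, since $\partial_\mu\partial_\sigma$ is symmetric while $x^\mu\partial^\sigma - x^\sigma\partial^\mu$ is antisymmetric in μ and σ.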

The reader may easily verify that the inclusion of a mass term $\frac{1}{2}m^2\phi^2$ in the Lagrangian
results in a non-vanishing divergence of the current, and trace of the energy-momentum
tensor:

$$\partial_\mu J^\mu_{\rm dil,impr} = T_{\rm impr}{}^\mu{}_\mu = m^2\phi^2 \tag{12.119}$$

These results, while classically valid, turn out to be incorrect once we quantize our field
theory: there are additional terms, proportional to Planck’s constant, which appear on
the right-hand side of both (12.118) and (12.119). They provide our first example of the
famous quantum anomalies (in the present case, the “trace anomaly”) of interacting
quantum field theory, which we shall discuss in detail in Chapter 15. The important
lesson which we need to take away from the present discussion is that the extension
of the Poincaré group to the larger conformal group cannot be carried through in
interacting quantum field theories (in four dimensions12 ). This is in contrast to the
supersymmetric extension of the Poincaré group which we shall discuss below, where
it turns out to be perfectly possible to construct a wide class of interacting field
theories with an exact global supersymmetry which extends the conventional Poincaré
symmetry of relativistic field theory.

Example 4: Abelian internal symmetries (phase transformations and charge conservation).
Our remaining examples involve invariances of the action under transformations which
leave the spacetime coordinates invariant, δxμ = 0, but alter the fields, typically by
a spacetime-independent linear rearrangement: these are the so-called global internal
symmetries of the theory. Suppose that our (for the time being, classical) Lagrangian
L is invariant under simultaneous changes of phase of a set of complex fields φn (x)
according to

$$\phi_n(x) \to e^{i\omega q_n}\phi_n(x), \qquad \phi^*_n(x) \to e^{-i\omega q_n}\phi^*_n(x) \tag{12.120}$$

12 It turns out that in two dimensions, interacting conformal field theories can be constructed. Also, there
appears to be a very special class of supersymmetric field theories in four dimensions that possess exact
conformal invariance, even though they are interacting.

The action functional IΩ in (12.64) will then be invariant (for ω infinitesimal) under
the variations,

$$\delta\phi_n(x) = i\omega q_n\phi_n(x), \qquad \delta\phi^*_n(x) = -i\omega q_n\phi^*_n(x), \qquad \delta x^\mu = 0 \tag{12.121}$$

The set of transformations of the type (12.120) clearly forms a commutative (abelian)
group. Now L, being real (classically, or hermitian once quantized), must contain
both φn and φ∗n , so the associated Noether current (12.76) can be written

$$J^\mu = \sum_n\Big\{\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}(-iq_n\phi_n) + \frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi^*_n)}(iq_n\phi^*_n)\Big\} \tag{12.122}$$

corresponding to a conserved charge Q given by

$$Q = -i\int d^3x\sum_n q_n\{\pi_n(\vec x,t)\phi_n(\vec x,t) - \pi^*_n(\vec x,t)\phi^*_n(\vec x,t)\} \tag{12.123}$$

The charge density J 0 has a simple equal-time commutation relation with the fields
of the theory:

$$[J^0(\vec y,t), \phi_n(\vec x,t)] = -q_n\,\delta^3(\vec x-\vec y)\,\phi_n(\vec y,t) \tag{12.124}$$

The physical interpretation of this conserved quantity for the quantized theory is very
simple: the field φn may be considered as interpolating for a particle of “charge” qn .
Each term in the Lagrangian must contain a product of fields for which the phase
factor $e^{i\omega\sum(\pm q_n)} = 1$, so that the interaction terms in the Lagrangian lead to graphs
where the charge inserted by the incoming lines exactly balances that removed by
the outgoing lines. Depending on the particular Lagrangian under consideration, the
charge Q may represent electric charge, baryon number, lepton number, strangeness,
or indeed any globally conserved quantum number, depending on the particular set of
phase transformations chosen. Of course, as in the case of the spacetime symmetries
discussed previously, the Noether charge Q also serves as the infinitesimal generator
of the transformation (12.121), as we discover by integrating (12.124) over y :

$$[Q, \phi_n(x)] = -q_n\phi_n(x) \tag{12.125}$$

in accordance with the interpretation that φn contains a destruction operator for a
particle with charge qn (and a creation operator for the antiparticle with charge −qn ).
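As a minimal illustration of (12.122), consider a single free complex scalar field of charge q. The Lagrangian below is an assumed example chosen for simplicity, not one taken from the text:

```latex
% Illustrative example: one free complex scalar field \phi of charge q and mass m.
\begin{align*}
\mathcal{L} &= \partial_\mu\phi^*\partial^\mu\phi - m^2\phi^*\phi, \\
\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)} &= \partial^\mu\phi^*, \qquad
\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi^*)} = \partial^\mu\phi, \\
J^\mu &= iq\,(\phi^*\partial^\mu\phi - \phi\,\partial^\mu\phi^*), \\
\partial_\mu J^\mu &= iq\,(\phi^*\Box\phi - \phi\,\Box\phi^*) = 0
  \quad\text{using } \Box\phi = -m^2\phi,\ \Box\phi^* = -m^2\phi^* .
\end{align*}
```

The cross terms $\partial_\mu\phi^*\partial^\mu\phi$ cancel identically in the divergence, and the remaining terms cancel only on use of the field equations, as expected of a Noether current.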

Example 5: Non-abelian internal symmetries.


A further generalization of the type of internal symmetries discussed in the preceding
example arises when the (still spacetime-independent!) transformation of the fields
involves mixing of different field components by a matrix transformation. Typically,
the matrices involved must be unitary (for complex fields) or orthogonal (for real
fields), to maintain the invariance of the kinetic part of the Lagrangian (which, for
example, for scalar fields takes the form $\sum_n\partial^\mu\phi^*_n\partial_\mu\phi_n$, in the complex case). Thus
consider a theory with an action invariant under $\delta x^\mu = 0$ for the coordinates and a
symmetry transformation

$$\phi_n(x) \to \phi'_n(x) = (e^{i\omega_\alpha t_\alpha})_{nm}\phi_m(x) \equiv M_{nm}(\omega_\alpha)\phi_m(x) \tag{12.126}$$

where the tα are a set of infinitesimal generators for some group of linear transfor-
mations. For real fields we shall assume that the tα are hermitian pure imaginary, so
that the resultant matrix transformation eiωα tα of the fields is a real orthogonal one,
while if the fields are complex, the generators are hermitian and the matrix unitary.
In the example to be considered shortly, the only complex fields are Dirac fields, and
the Lagrangian only contains derivatives of the φn s, not of the φ∗n s (i.e., the ψ̄s), so
the Noether current Jαμ associated with the αth generator takes the same form for
both real and complex fields

$$J^\mu_\alpha(x) = -i\sum_n\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n(x))}(t_\alpha)_{nm}\phi_m(x) \tag{12.127}$$

In analogy with (12.124) we have the equal-time commutation of the charge density
with the fields

$$[J^0_\alpha(\vec y,t), \phi_n(\vec x,t)] = -\delta^3(\vec x-\vec y)\,(t_\alpha)_{nm}\phi_m(\vec y,t) \tag{12.128}$$

The isospin-invariant effective meson field theory of the 1950s provides a suitable
example: we assume that pions and nucleons interact via a (in this case, non-derivative
coupled) Yukawa interaction, with basic fields of the theory taken as a nucleon doublet
N (x) = (p(x), n(x)) (where p(x) and n(x) are Dirac fields for the proton and neutron,
assumed to have identical mass M ) and a triplet of real pion fields πα (x), α = 1, 2, 3
(with $\pi_3$ interpolating for the neutral pion, and $\frac{1}{\sqrt{2}}(\pi_1 + i\pi_2)$ for the positively charged
pion, all assumed to have identical mass $m_\pi$). The Lagrangian for the full system (with
obvious implicit summations over internal indices) is then taken to be

$$\mathcal{L} = \bar N(i\partial\!\!\!/ - M)N + \frac{1}{2}(\partial_\mu\vec\pi\cdot\partial^\mu\vec\pi - m_\pi^2\,\vec\pi\cdot\vec\pi) - ig\bar N\gamma_5\vec\tau N\cdot\vec\pi \tag{12.129}$$
2
where the $i$ in the interaction term is there for hermiticity (for $g$ real). The 2x2 matrices
$\vec{\tau}$ are one-half the usual Pauli matrices, $\tau_\alpha = \frac{1}{2}\sigma_\alpha$, $\alpha = 1, 2, 3$, so the matrix group in
question is just SU(2). The reader may easily verify, using the Lie algebra of SU(2),
$[\tau_\alpha, \tau_\beta] = i\epsilon_{\alpha\beta\gamma}\tau_\gamma$, that the Lagrangian (12.129) is invariant under the following set of
global infinitesimal transformations ($\vec{\omega}$ are spacetime-independent)

$$N(x) \to (1 + i\vec{\omega}\cdot\vec{\tau})N(x) \tag{12.130}$$
$$\bar{N}(x) \to \bar{N}(x)(1 - i\vec{\omega}\cdot\vec{\tau}) \tag{12.131}$$
$$\vec{\pi}(x) \to \vec{\pi}(x) + \vec{\pi}(x)\times\vec{\omega} \tag{12.132}$$

from which one may read off directly the vector of conserved Noether currents (12.76)
for this theory

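$$\vec{J}^\mu = \bar{N}\gamma^\mu\vec{\tau}N + \vec{\pi}\times\partial^\mu\vec{\pi} \tag{12.133}$$

and a corresponding set of conserved “isospin charges” $\vec{I}$

$$\vec{I} = \int d^3x\,\{N^\dagger\vec{\tau}N(\vec{x},t) + \vec{\pi}\times\dot{\vec{\pi}}(\vec{x},t)\} \tag{12.134}$$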

The (approximate) conservation of the isospin quantum numbers $\vec{I}^2$, $I_3$ in strong
interaction processes leads to many powerful and valuable selection rules in strong
interaction physics: the reader is referred to (Sakurai, 1964) for an exhaustive discus-
sion of these important results.
As in the case of the spacetime Poincaré symmetries, the commutator relations of
the Noether charges for internal symmetries replicate the Lie algebra of the underlying
symmetry group. For example, using the equal-time commutation relations for the pion
fields and the corresponding equal-time anticommutation relations for the nucleon
fields, one may easily demonstrate that the charges in (12.134) satisfy

$$[I_\alpha, I_\beta] = i\epsilon_{\alpha\beta\gamma}I_\gamma \tag{12.135}$$

exactly the Lie algebra of the rotation group, allowing us to take over the entire
machinery of angular momentum in the discussion of isospin symmetry and conser-
vation. The proof of this result for a general global internal symmetry (for bosonic
fields) is deferred to the exercises at the end of this chapter (see Problem 7).
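
The Lie algebra quoted for the isospin generators can be checked numerically in the two-dimensional representation. The following is only a minimal sketch: it assumes the standard Pauli matrices and uses illustrative helper names (`matmul`, `commutator`, `epsilon`, `su2_algebra_holds`) that are not taken from the text. It confirms that $\tau_\alpha = \frac{1}{2}\sigma_\alpha$ satisfies $[\tau_\alpha, \tau_\beta] = i\epsilon_{\alpha\beta\gamma}\tau_\gamma$, whereas the unnormalized Pauli matrices do not.

```python
# Pauli matrices as nested lists of complex numbers.
SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
# Isospin generators tau_a = sigma_a / 2, as in the text.
TAU = [[[entry / 2 for entry in row] for row in s] for s in SIGMA]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def commutator(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]


def epsilon(a, b, c):
    """Levi-Civita symbol for indices drawn from 0, 1, 2."""
    return (a - b) * (b - c) * (c - a) // 2


def su2_algebra_holds(gens, tol=1e-12):
    """Check [g_a, g_b] = i * eps_abc * g_c entry by entry."""
    for a in range(3):
        for b in range(3):
            lhs = commutator(gens[a], gens[b])
            for i in range(2):
                for j in range(2):
                    rhs = sum(1j * epsilon(a, b, c) * gens[c][i][j]
                              for c in range(3))
                    if abs(lhs[i][j] - rhs) > tol:
                        return False
    return True


if __name__ == "__main__":
    print("tau satisfies the algebra:", su2_algebra_holds(TAU))
    print("sigma satisfies the algebra:", su2_algebra_holds(SIGMA))
```

The factor of one-half matters: the Pauli matrices themselves obey $[\sigma_\alpha, \sigma_\beta] = 2i\epsilon_{\alpha\beta\gamma}\sigma_\gamma$, so only the rescaled $\tau_\alpha$ reproduce the normalization used in (12.135).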
Our discussion of Noether’s theorem so far has been carried out for the most
part in a classical context: issues of operator ordering, regularization of operator
products, and so on, have been resolutely ignored. At first sight, it may seem possible
to circumvent these issues by resorting to a path-integral approach, and indeed, it is
both important and enlightening to understand the precise way in which invariance
and conservation intertwine in the context of the functional integral quantization
of field theory. Inasmuch as the functional integral involves an action built from c-
number fields, we might at first expect that our discussion to this point will carry
over fairly directly to quantum field theory realized via path-integral concepts. In
fact, we shall later see in our discussion of anomalous currents in Chapter 15 that
subtleties arising from operator regularization cannot simply be dodged in a functional
formalism: rather, they reappear in an unexpected location (specifically, in the case
of Noether’s theorem, in the definition of the functional measure).
We shall conclude this section by giving a brief description of the functional version
of Noether’s theorem for the case of non-anomalous symmetries, where the aforesaid
subtleties do not enter. The end result (the functional analog of (12.76)) will be a set of
identities—the so-called Ward–Takahashi identities—satisfied by the Feynman Green
functions of the theory. Let us start with a theory of N fields φn (x), n = 1, 2, ..., N with
a global internal symmetry (12.126), with Lagrangian L(φn , ∂μ φn ). The fields may in
fact be bosonic or fermionic, but here we shall assume for simplicity only bosonic
fields, to avoid having to keep careful track of minus signs arising from interchange of
Grassmann fields or sources. The symmetry parameters ωα in (12.126) are spacetime
constants, as we are dealing with a global symmetry of the theory, but if the Lagrangian
can be written (as the notation L(φn , ∂μ φn ) implicitly suggests) so that the fields
appear with at most a single spacetime-derivative, it is easily seen to be invariant if
the $\omega_\alpha$ are allowed to be spacetime functions, in the following sense:

$$\mathcal{L}(M_{nm}(\omega_\alpha(x))\phi_m,\, M_{nm}(\omega_\alpha(x))\partial_\mu\phi_m) = \mathcal{L}(\phi_n, \partial_\mu\phi_n), \quad \forall\,\omega_\alpha(x) \tag{12.136}$$
Note that the (now spacetime-dependent) gauge parameters ωα (x) are not differenti-
ated on the left-hand side of (12.136), so that the invariance of the Lagrangian density
under global transformations will ensure the stronger invariance property given by
this equality. The generating functional of Feynman Green functions for this theory is
$$Z[j_n] = \int\mathcal{D}\phi_n\, e^{i\int\{\mathcal{L}(\phi_n,\partial_\mu\phi_n) - j_n(x)\phi_n(x)\}d^4x} \tag{12.137}$$

We now wish to examine the result of a change of the functional variables of
integration (i.e., the fields $\phi_n(x)$) given precisely by the internal symmetry (12.126),
but where the gauge parameters ωα (x) are now allowed to be spacetime-dependent
functions. If the matrix transformation Mnm (ωα ) is a unitary one (or orthogonal, for
real fields), then we should expect the Jacobian of the functional change of variables
to be a product of unity over all spacetime points, and hence itself equal to unity. We
should warn the reader at this point that it is precisely this—on the surface, quite
innocent—assumption which fails in the case of the anomalous currents which we shall
meet in Section 15.5. However, proceeding on the basis of a unit functional Jacobian,
we conclude that Z[j] must also be equal to
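$$Z[j_n] = \int\mathcal{D}\phi_n\, e^{i\int\{\mathcal{L}(M_{nm}(\omega_\alpha(x))\phi_m,\,\partial_\mu M_{nm}(\omega_\alpha(x))\phi_m) - j_n(x)M_{nm}(\omega_\alpha(x))\phi_m(x)\}d^4x} \tag{12.138}$$

for arbitrary $\omega_\alpha(x)$, and in particular for infinitesimal $\omega_\alpha$, where we may replace
$M_{nm} = \delta_{nm} + i\omega_\alpha(x)(t_\alpha)_{nm} + O(\omega^2)$. For infinitesimal $\omega_\alpha$, as a consequence of
(12.136), we may write the Lagrangian density appearing in the exponent of (12.138)
to first order in $\omega$, with $\phi'_n(x) = \phi_n(x) + i\omega_\alpha(x)(t_\alpha)_{nm}\phi_m(x)$, as

$$\begin{aligned}
\mathcal{L}(\phi'_n, \partial_\mu\phi'_n) &= \mathcal{L}(\phi_n, \partial_\mu\phi_n) + i\,\partial_\mu\omega_\alpha(x)\,\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi_n)}(t_\alpha)_{nm}\phi_m(x) + O(\omega^2) \\
&= \mathcal{L}(\phi_n, \partial_\mu\phi_n) - \partial_\mu\omega_\alpha(x)\,J^\mu_\alpha(x) + O(\omega^2)
\end{aligned} \tag{12.139}$$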
where we see that the Noether current Jαμ of the global symmetry, (12.127), has re-
emerged as the coefficient of the spacetime variation of the gauge parameters. Of
course, the fact that the term involving the Noether current vanishes if the gauge
parameters are constant is simply a restatement of the assumed exact global internal
symmetry of our Lagrangian. Subtracting the two equivalent expressions (12.137) and
(12.138) for Z[jn ], we find, to first order in ωα (x),
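$$\int\mathcal{D}\phi_n \int d^4x\,\{-i(\partial_\mu\omega_\alpha(x))J^\mu_\alpha(x) + \omega_\alpha(x)j_n(x)(t_\alpha)_{nm}\phi_m(x)\}\; e^{i\int\{\mathcal{L} - j_n\phi_n\}d^4x} = 0 \tag{12.140}$$

As this identity must hold for arbitrary $\omega_\alpha(x)$, we may integrate the first term by parts,
functionally differentiate with respect to $\omega_\alpha$ and obtain

$$\int\mathcal{D}\phi_n\,\{i\partial_\mu J^\mu_\alpha(x) + j_n(x)(t_\alpha)_{nm}\phi_m(x)\}\; e^{i\int\{\mathcal{L} - j_n\phi_n\}d^4x} = 0 \tag{12.141}$$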

This result is effectively Noether’s theorem in functional form. The Green functions
of the theory are obtained by functionally differentiating with respect to the sources
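$j_n(x)$ (cf. (10.73)): if we apply $i^p\,\frac{\delta^p}{\delta j_{n_1}(y_1)\cdots\delta j_{n_p}(y_p)}$ to (12.141) and then set the sources
$j_n$ to zero, we obtain the Ward–Takahashi identities (in coordinate space) associated
with the internal symmetry (12.126):

$$\frac{\partial}{\partial x^\mu}\langle 0|T(J^\mu_\alpha(x)\phi_{n_1}(y_1)\cdots\phi_{n_p}(y_p))|0\rangle
= -\sum_{r=1}^{p}\delta^4(x - y_r)(t_\alpha)_{n_r m}\langle 0|T(\phi_{n_1}(y_1)\cdots\phi_{n_{r-1}}(y_{r-1})\,\phi_m(x)\,\phi_{n_{r+1}}(y_{r+1})\cdots\phi_{n_p}(y_p))|0\rangle \tag{12.142}$$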

The equivalence of this expression to the operator statement of current conservation,
$\partial_\mu J^\mu_\alpha = 0$, may not be immediately obvious to the reader. One sees, however, that
there is no term on the right-hand side containing the four-divergence of the current
inside the time-ordered product: just a sequence of “contact” terms in which the
spacetime argument of the current is set equal in turn to the locations of each of the
field operators.
In fact, the Ward–Takahashi identity (12.142) may be rederived in operator
language via a short calculation (see Problem 8) using current conservation and the
equal-time commutation relations (12.128) of the charge densities $J^0_\alpha$ with the $\phi_n$ fields.
One may also regard (12.142) as the off-shell expression of current conservation. The
operator statement $\partial_\mu J^\mu_\alpha = 0$ is, of course, equivalent to the condition $\langle\beta|\partial_\mu J^\mu_\alpha|\alpha\rangle = 0$
for arbitrary multi-particle states $|\alpha\rangle$, $|\beta\rangle$. Such states may be generated starting from
the T-product on the left-hand side of (12.142) via the LSZ formula: by reducing out
each of the $p$ fields to produce the desired initial and final-state particles. One does
this (cf. Section 9.4) by Fourier transforming the matrix element (applying the factor
$e^{\sum_r \pm ip_r\cdot y_r}$ and integrating over $y_r$), multiplying by factors $p_r^2 - m^2$ for each
external-state particle, and then taking the on-mass-shell limit $p_r^2 \to m^2$. When this procedure
is followed, we find that in each term on the right-hand side, the δ-function eliminates
the necessary external propagator of the $\phi_{n_r}(y_r)$ field, needed to remove the vanishing
factor $p_r^2 - m^2$ in the on-mass-shell limit. Accordingly, the left-hand side (giving the
desired matrix element $\langle\beta|\partial_\mu J^\mu_\alpha(x)|\alpha\rangle$) is also zero after the LSZ on-shell projection
is performed.

12.6 Beyond Poincaré: supersymmetry and superfields


The immediacy and clarity of the enormous phenomenological support for the sym-
metries of the Poincaré group, consisting of Lorentz transformations (specifically, the
proper, orthochronous subgroup corresponding to physically realizable transforma-
tions) and spacetime translations, force us to incorporate these symmetries at the
very foundations of the quantum field theories with which we attempt to describe
the physics of the microworld. In particular, it was a basic requirement that every
satisfactory field theory contain a set of conserved generators (self-adjoint operators
in the Hilbert space of the theory) $M^{\mu\nu}$, $P^\rho$ satisfying the algebra of the Poincaré
group (cf. (9.65–9.67)):

$$[M^{\mu\nu}, M^{\rho\sigma}] = i(g^{\mu\sigma}M^{\rho\nu} + g^{\nu\sigma}M^{\mu\rho} - g^{\rho\mu}M^{\sigma\nu} - g^{\rho\nu}M^{\mu\sigma}) \tag{12.143}$$
$$[P^\rho, M^{\mu\nu}] = i(g^{\rho\nu}P^\mu - g^{\rho\mu}P^\nu) \tag{12.144}$$
$$[P^\rho, P^\sigma] = 0 \tag{12.145}$$

For the first fifty years of quantum field theory, until the mid-1970s, the Poincaré
symmetries were thought to represent the maximal set of spacetime symmetries: in
other words, it was implicitly assumed that one could not expand the algebra (12.143–
12.145) consistently by introducing further generators with non-trivial commutation
relations with the Mμν , P ρ . This prejudice was reinforced by the famous “no-go”
theorem of Coleman and Mandula (Coleman and Mandula, 1967), which showed that
the Poincaré algebra was indeed maximal in this sense, on the basis of assumptions
which seemed unexceptionable at the time. An important implicit assumption was that
the generators of the algebra were bosonic in character: acting on bosonic states, they
produced bosonic states, and on fermionic ones, fermionic states. The relaxation of this
last assumption is the critical step in allowing the existence of supersymmetry (SUSY)
algebras which expand the original Poincaré algebra stated above. The simplest
possible extension of the Poincaré algebra turns out13 to involve the introduction
of a Majorana 4-spinor set of generators $Q_\alpha$ ($\alpha = 1, 2, 3, 4$)

$$\begin{pmatrix} C_s Q^* \\ Q \end{pmatrix} \tag{12.146}$$

where $Q_a$, $a = 1, 2$ is a $(0, \frac{1}{2})$ 2-spinor and $C_s$ the 2-spinor conjugation matrix (7.40)
(see Section 7.4.3, especially footnote 5). The (anti)commutation relations of the new
spinorial generators are

$$\{Q_\alpha, \bar{Q}_\beta\} = 2(\gamma_\mu P^\mu)_{\alpha\beta} \tag{12.147}$$
$$[P^\mu, Q_\alpha] = [P^\mu, \bar{Q}_\alpha] = 0 \tag{12.148}$$

In effect, the $Q_\alpha$ generators can be thought of as providing us with a “square root” of
the energy-momentum four-vector! In addition, there are the commutation relations
of the Qα with the Mμν , which simply express the fact that the Majorana 4-spinor
transforms appropriately under the HLG: we shall not use these further, and do not
give them explicitly here. Altogether, the algebra (12.147, 12.148) taken together with
the Poincaré algebra expressed by (12.143, 12.144, 12.145) (and the commutator of the
Qα with the Mμν ) constitute a graded Lie algebra (in this case, the super-Poincaré
algebra)—one involving both generators of bosonic and fermionic type, and in which
commutators involving one (resp. two) bosonic generators lead to a linear combination
of generators of fermionic (resp. bosonic) type, while the anticommutators of the
fermionic generators are expressible as a linear combination of bosonic generators.

13 The most general form of the supersymmetry algebra was first derived by Haag, Lopuszanski, and
Sohnius, (Haag et al., 1975): see below.

Now, it is hardly obvious that the assortment of commutation and anticommutation
rules obeyed by the generators Mμν , P ρ , Qα , as given above, are even mathematically
consistent. Their consistency can be established by giving an explicit realization,
and we shall do so by considering the simplest possible field theory exhibiting the
supersymmetry algebra given here. Consider a theory consisting of a free massless
complex scalar field φ together with a free massless Majorana field ψ (corresponding to
a self-conjugate massless fermion). There are two massless bosonic degrees of freedom
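(we can write the complex field $\phi = \frac{1}{\sqrt{2}}(A + iB)$, where $A$ and $B$ are independent real
scalar fields), and likewise two massless fermionic degrees of freedom (for the two spin
states of the massless fermion). The Lagrangian is (see (7.127) for the origin of the
one-half in the fermionic part)14

$$\mathcal{L} = \partial_\mu\phi^*\partial^\mu\phi + \frac{i}{2}\bar{\psi}\slashed{\partial}\psi \tag{12.149}$$

with the standard Noether expressions for the energy-momentum operators

$$\begin{aligned}
P^0 &= \int d^3x\,\{\dot{\phi}^*\dot{\phi} + \vec{\nabla}\phi^*\cdot\vec{\nabla}\phi + \frac{i}{2}\bar{\psi}\vec{\gamma}\cdot\vec{\nabla}\psi\} \\
\vec{P} &= \int d^3x\,\{\dot{\phi}^*\vec{\nabla}\phi + \dot{\phi}\vec{\nabla}\phi^* + \frac{i}{2}\psi^\dagger\vec{\nabla}\psi\}
\end{aligned} \tag{12.150}$$
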
The fields satisfy the equal-time (anti)commutation relations:

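$$[\dot{\phi}(\vec{x},t), \phi^*(\vec{y},t)] = [\dot{\phi}^*(\vec{x},t), \phi(\vec{y},t)] = -i\delta^3(\vec{x}-\vec{y}) \tag{12.151}$$
$$\{\psi_\alpha(\vec{x},t), \bar{\psi}_\beta(\vec{y},t)\} = (\gamma^0)_{\alpha\beta}\,\delta^3(\vec{x}-\vec{y}) \tag{12.152}$$
$$\{\psi_\alpha(\vec{x},t), \psi_\beta(\vec{y},t)\} = (i\gamma^2)_{\alpha\beta}\,\delta^3(\vec{x}-\vec{y}) \tag{12.153}$$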

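The unusual form of the anticommutation relation (12.153) arises because of the
Majorana property of $\psi$ (namely, $\psi^* = i\gamma^2\psi$, implying the equivalence of (12.152)
and (12.153)). The equations of motion $\Box\phi = 0$, $\slashed{\partial}\psi = 0$ imply conservation (i.e., zero
divergence) of the fermionic current

$$J^\mu = \sqrt{2}\,\{(\slashed{\partial}\phi)\gamma^\mu\psi_R + (\slashed{\partial}\phi^*)\gamma^\mu\psi_L\}, \quad \partial_\mu J^\mu = 0 \tag{12.154}$$

where $\psi_L = P_L\psi = \frac{1+\gamma_5}{2}\psi$, $\psi_R = P_R\psi = \frac{1-\gamma_5}{2}\psi$ are the upper and lower 2-spinor
components of $\psi$ respectively. The current $J^\mu$ gives rise in the usual way to an
associated conserved charge $Q_\alpha$

$$Q_\alpha = \sqrt{2}\int d^3x\,(\slashed{\partial}\phi(\vec{x},t)\gamma^0 P_R + \slashed{\partial}\phi^*(\vec{x},t)\gamma^0 P_L)_{\alpha\beta}\,\psi_\beta(\vec{x},t) \tag{12.155}$$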

14 Notational alert: In SUSY, it is conventional to use the ∗ symbol for both complex conjugation (of
complex and Grassmann numbers), as well as hermitian conjugation (of operators). We shall adhere to this
policy throughout this section.
These charges can be regarded as infinitesimal generators of a symmetry of the theory,
as follows. In order to obtain the variation in the fields from a commutator (in both
the bosonic and fermionic case) we introduce an infinitesimal Majorana 4-spinor ξα
whose components are constant Grassmann numbers, commuting with the scalar field
φ but anticommuting with the fermionic field ψ. As ξ is infinitesimal, we work only
to first order in it, and define the variation of the fields under an infinitesimal SUSY
transformation as15
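$$\delta_\xi\phi(x) \equiv \frac{1}{\sqrt{2}}[\bar{\xi}Q, \phi(x)], \quad \delta_\xi\psi \equiv \frac{1}{\sqrt{2}}[\bar{\xi}Q, \psi(x)] \tag{12.156}$$

Using the (anti)commutation relations given above, one finds explicitly

$$\begin{aligned}
\delta_\xi\phi(x) &= -i\bar{\xi}P_L\psi(x) \\
\delta_\xi\phi^*(x) &= -i\bar{\xi}P_R\psi(x) \\
\delta_\xi\psi(x) &= -(\slashed{\partial}\phi(x)P_R + \slashed{\partial}\phi^*(x)P_L)\xi \\
\delta_\xi\bar{\psi}(x) &= \bar{\xi}(\slashed{\partial}\phi(x)P_L + \slashed{\partial}\phi^*(x)P_R)
\end{aligned} \tag{12.157}$$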
Notice that the supersymmetry transformation has interchanged bosonic and fermionic
fields, so that we must expect a very specific balance between the bosonic and fermionic
fields appearing in the Lagrangian if the theory is actually to be invariant under such a
transformation. In particular, the first-order (in ξ) variation of the Lagrangian density
(12.149) under these variations of the fields is found (see Problem 9) to be a pure
divergence
$$\delta_\xi\mathcal{L} = -\frac{i}{2}\partial_\mu\{\bar{\xi}\gamma^\mu(\slashed{\partial}\phi(x)P_R + \slashed{\partial}\phi^*(x)P_L)\psi(x)\} \tag{12.158}$$
The total action of the theory, defined as the integral over all spacetime of L (with
all fields assumed to vanish at infinity), is therefore invariant under the infinitesimal
SUSY transformations generated by the Qα . However, as discussed in the preceding
Section, the construction of the Noether current in this case requires an additional
term (cf. (12.82), also Problem 10). A further straightforward calculation shows (see
Problem 11) that the generators Qα given in (12.155) satisfy precisely the remarkable
anticommutation relations (12.147). The fact that Qα commutes with P μ follows from
the conservation of Qα (for μ = 0) and the fact that the Qα is the spatial integral of
a density (for μ spatial).
The preceding discussion would amount to little more than a mathematical curios-
ity were it not possible to realize the super-Poincaré algebra in an interacting theory.
In fact, the generalization to non-zero mass and non-trivial interactions turns out to
be quite straightforward, once the appropriate technical machinery is in place. Apart
from some algebraic wizardry involving Grassmann Majorana spinors (see Appendix
C), the important simplifying device turns out to be the introduction of superspace,
an abstract extension of the four spacetime “bosonic” dimensions by a further four
dimensions with Grassmannian (anticommuting) coordinates. The use of superspace
allows us to interpret the supersymmetry transformation of the fields introduced above
in a completely ad hoc fashion in an intuitively natural and geometrically motivated
way, to the extent that this is possible working in a space with anticommuting
coordinates! The remaining subsections of this Chapter constitute an all too brief
introduction to the essential ideas of supersymmetry. For further details, the reader is
referred to any of the many excellent texts on supersymmetry (for example, (Weinberg,
1995b)).

15 We remove the annoying $\sqrt{2}$ here, inserted for later convenience in normalizing the charges.

12.6.1 The homogeneous Lorentz group and SL(2,C)


The homogeneous Lorentz group (HLG), which we first introduced as the group with
fundamental representation consisting of 4x4 matrices satisfying

$$\Lambda^\mu{}_\nu\,\Lambda_\mu{}^\rho = \delta_\nu^\rho \tag{12.159}$$

is locally isomorphic to the group SL(2,C) of unimodular (i.e., unit determinant)
complex 2x2 matrices. Of course, SU(2) is contained in SL(2,C), so we know SL(2,C)
is at least large enough to contain the spatial rotations. A general element of SL(2,C)
can be written


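$$\lambda \equiv \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$$

where $\alpha, \beta, \gamma, \delta$ are complex numbers satisfying $\alpha\delta - \beta\gamma = 1$. Let $\sigma_\mu$, $\mu = 0, 1, 2, 3$ be
the extended set of 2x2 σ-matrices, where $\sigma_0 \equiv 1$ and $\sigma_i$, $i = 1, 2, 3$ are the usual Pauli
matrices. From any four-vector $p^\mu$ we can then construct an hermitian 2x2 matrix

$$P \equiv p^\mu\sigma_\mu, \quad \det(P) = p^\mu p_\mu = p^2 \tag{12.160}$$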

Under the similarity transformation

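$$P \to P' \equiv \lambda P\lambda^\dagger \tag{12.161}$$

and clearly $\det(P') = \det(P)$ so the four-vector $p'$ corresponding to $P'$ is a Lorentz
transform of $p$, with

$$\lambda(p^\mu\sigma_\mu)\lambda^\dagger = \Lambda^\nu{}_\mu\,\sigma_\nu\,p^\mu \tag{12.162}$$
$$\phantom{\lambda(p^\mu\sigma_\mu)\lambda^\dagger} = p'^\nu\sigma_\nu \tag{12.163}$$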
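
The correspondence between SL(2,C) and Lorentz transformations can be illustrated numerically. The sketch below is not from the text: it assumes the convention $P = p^0 + \vec{p}\cdot\vec{\sigma}$ implied by (12.160), and the function names (`vector_to_matrix`, `matrix_to_vector`, `random_sl2c`, `lorentz_transform`) are illustrative only. It builds a unit-determinant complex matrix $\lambda$, forms $\lambda P\lambda^\dagger$, and reads off the transformed four-vector, whose Minkowski square should agree with that of the original vector.

```python
import random


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Hermitian conjugate of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def vector_to_matrix(p):
    """P = p^mu sigma_mu with sigma_0 = 1 and sigma_i the Pauli matrices."""
    p0, p1, p2, p3 = p
    return [[complex(p0 + p3), p1 - 1j * p2],
            [p1 + 1j * p2, complex(p0 - p3)]]


def matrix_to_vector(m):
    """Invert vector_to_matrix for a hermitian 2x2 matrix."""
    p0 = (m[0][0] + m[1][1]).real / 2
    p3 = (m[0][0] - m[1][1]).real / 2
    p1 = (m[0][1] + m[1][0]).real / 2
    p2 = (m[1][0] - m[0][1]).imag / 2
    return (p0, p1, p2, p3)


def minkowski_square(p):
    """p^mu p_mu with signature (+, -, -, -)."""
    return p[0] ** 2 - p[1] ** 2 - p[2] ** 2 - p[3] ** 2


def random_sl2c(rng):
    """Random complex 2x2 matrix with unit determinant (d fixed by a, b, c)."""
    a = complex(rng.uniform(0.5, 1.5), rng.uniform(-1, 1))
    b = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
    c = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
    d = (1 + b * c) / a
    return [[a, b], [c, d]]


def lorentz_transform(p, lam):
    """Four-vector p' defined by P' = lam P lam^dagger."""
    return matrix_to_vector(
        matmul(matmul(lam, vector_to_matrix(p)), dagger(lam)))


if __name__ == "__main__":
    lam = random_sl2c(random.Random(0))
    p = (2.0, 0.3, -0.5, 1.1)
    q = lorentz_transform(p, lam)
    print("p^2 before:", minkowski_square(p))
    print("p^2 after: ", minkowski_square(q))
```

Because $\det(\lambda P\lambda^\dagger) = |\det\lambda|^2\det P$, the invariance of $p^2$ relies only on the unit-determinant condition, which is why the sketch fixes the fourth entry of $\lambda$ from the other three.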

The group SL(2,C) is six-dimensional: four complex numbers contain eight (real)
degrees of freedom, but the two constraints setting the real part of αδ − βγ to 1 and
the imaginary part to zero reduce the dimensionality to 6, the correct number for the
HLG. Infinitesimally, we can write
$$\lambda = 1 + \left(\frac{i}{4}\epsilon_{ijk}\omega^{ij} + \frac{1}{2}\omega^{0k}\right)\sigma_k \tag{12.164}$$

corresponding to

$$\Lambda^\mu{}_\nu = g^\mu{}_\nu + \omega^\mu{}_\nu \tag{12.165}$$

in the fundamental representation of HLG.


The fundamental representation of SL(2,C) is a complex 2-spinor Qa , a = 1, 2:


$$Q = \begin{pmatrix} Q_1 \\ Q_2 \end{pmatrix}$$

which transforms under SL(2,C) as

Q → λQ (12.166)

Suppose we manage to find a pair of operators Qa on the state space, with the
transformation under HLG

$$U^\dagger(\Lambda)\,Q_a\,U(\Lambda) = \lambda_{ab}Q_b \tag{12.167}$$

Taking $\Lambda$, $\lambda$ infinitesimal, this becomes

$$-\frac{i}{2}\omega_{\mu\nu}[J^{\mu\nu}, Q_a] = \frac{i}{4}\epsilon_{ijk}\omega^{ij}(\sigma_k)_{ab}Q_b + \frac{1}{2}\omega^{0k}(\sigma_k)_{ab}Q_b \tag{12.168}$$

implying the commutation relations (rotations)

$$[J_{ij}, Q_a] = -\frac{1}{2}\epsilon_{ijk}(\sigma_k)_{ab}Q_b \tag{12.169}$$

and (boosts)

$$[J_{k0}, Q_a] = -\frac{i}{2}(\sigma_k)_{ab}Q_b \tag{12.170}$$

In the $\vec{J}$, $\vec{K}$ notation introduced in Section 7.2, with $J_1 = J_{32}$, $K_1 = J_{10}$, etc., things
are even simpler:

$$[J_i, Q_a] = \frac{1}{2}(\sigma_i)_{ab}Q_b \tag{12.171}$$
$$[K_i, Q_a] = -\frac{i}{2}(\sigma_i)_{ab}Q_b \tag{12.172}$$

Finally, recalling the (A,B) notation (cf. Section 7.2) where we define generators
$\vec{A} \equiv \frac{1}{2}(\vec{J} - i\vec{K})$, $\vec{B} \equiv \frac{1}{2}(\vec{J} + i\vec{K})$, these become

$$[A_i, Q_a] = 0 \tag{12.173}$$
$$[B_i, Q_a] = \frac{1}{2}(\sigma_i)_{ab}Q_b \tag{12.174}$$

so the 2-spinor $Q_a$ corresponds to what we previously (in Chapter 7) called the $(0, \frac{1}{2})$
representation of the HLG. The reader will recall that such a spinor is conventionally
written as the lower half of a Dirac 4-spinor.
An important notational convention: In SUSY, it is conventional to use the ∗
symbol for both complex conjugation (of complex and Grassmann numbers), as well as
hermitian conjugation (of operators)—the dagger symbol is reserved for column vectors
and matrices of operators. Moreover, the ∗ symbol applied to products of Grassmann
objects is defined to reverse the order, in analogy to the property of hermitian adjoints
of operators.
The conjugation matrix Cs ≡ iσ2 has the property

$$C_s\vec{\sigma}^* = -\vec{\sigma}\,C_s \tag{12.175}$$

Accordingly,

$$[J_i, (C_s)_{ab}Q^*_b] = -(C_s)_{ab}[J_i, Q_b]^* \tag{12.176}$$
$$\phantom{[J_i, (C_s)_{ab}Q^*_b]} = -\frac{1}{2}(C_s)_{ab}(\sigma_i^*)_{bc}Q^*_c \tag{12.177}$$
$$\phantom{[J_i, (C_s)_{ab}Q^*_b]} = \frac{1}{2}(\sigma_i)_{ab}(C_s)_{bc}Q^*_c \tag{12.178}$$

and

$$[K_i, (C_s)_{ab}Q^*_b] = -(C_s)_{ab}[K_i, Q_b]^* \tag{12.179}$$
$$\phantom{[K_i, (C_s)_{ab}Q^*_b]} = -\frac{i}{2}(C_s)_{ab}(\sigma_i^*)_{bc}Q^*_c \tag{12.180}$$
$$\phantom{[K_i, (C_s)_{ab}Q^*_b]} = \frac{i}{2}(\sigma_i)_{ab}(C_s)_{bc}Q^*_c \tag{12.181}$$

so $(C_s)_{ab}Q^*_b$ commutes with $B_i$, i.e., is in the $(\frac{1}{2}, 0)$ representation of the HLG. This
means that we can construct a 4-spinor $Q_\alpha$, $\alpha = 1, 2, 3, 4$ (for which we confusingly
use the same letter Q, as the 2-spinors will shortly disappear) with the $(\frac{1}{2}, 0) \oplus (0, \frac{1}{2})$
transformation properties of the Dirac 4-spinor by taking $Q_a$, $a = 1, 2$ as the lower two
components of the 4-spinor and $(C_s)_{ab}Q^*_b$, $a = 1, 2$ as the upper two components. If
the components of such a spinor are anticommuting c-numbers, or fermionic fields, we
call such an object a Grassmann Majorana spinor.
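
The assignment of the two 2-spinors to the $(0, \frac{1}{2})$ and $(\frac{1}{2}, 0)$ representations is a matter of simple arithmetic on the coefficients of $\sigma_i$ in the commutators. As a rough sketch (the function name `ab_coefficients` is purely illustrative), the snippet below takes the coefficients of $\sigma_i$ in $[J_i, \cdot]$ and $[K_i, \cdot]$ and returns those in $[A_i, \cdot]$ and $[B_i, \cdot]$, using $\vec{A} = \frac{1}{2}(\vec{J} - i\vec{K})$ and $\vec{B} = \frac{1}{2}(\vec{J} + i\vec{K})$.

```python
def ab_coefficients(j_coeff, k_coeff):
    """Given [J_i, X] = j_coeff * sigma_i X and [K_i, X] = k_coeff * sigma_i X,
    return the coefficients of sigma_i X in [A_i, X] and [B_i, X],
    where A = (J - iK)/2 and B = (J + iK)/2."""
    a_coeff = 0.5 * (j_coeff - 1j * k_coeff)
    b_coeff = 0.5 * (j_coeff + 1j * k_coeff)
    return a_coeff, b_coeff


if __name__ == "__main__":
    # Q_a: [J, Q] = (1/2) sigma Q and [K, Q] = -(i/2) sigma Q.
    print("Q      :", ab_coefficients(0.5, -0.5j))
    # C_s Q^*: [J, .] = (1/2) sigma (.) and [K, .] = +(i/2) sigma (.).
    print("C_s Q^*:", ab_coefficients(0.5, 0.5j))
```

For $Q_a$ the $A$ coefficient vanishes and the $B$ coefficient is one-half, while for $C_s Q^*$ the roles are exchanged, which is the content of the statement that the two objects sit in conjugate representations.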
Note that the unimodularity of SL(2,C) implies that $\lambda^T C_s\lambda = C_s$ whence

$$(\lambda Q)_a(C_s)_{ab}(\lambda Q)_b = Q_a(C_s)_{ab}Q_b \tag{12.182}$$

so $QC_sQ$ is a scalar: the $C_s$-matrix can be used to couple two $(\frac{1}{2}, 0)$ reps (or two $(0, \frac{1}{2})$
reps) to a Lorentz scalar.
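
The two properties of the conjugation matrix used in this subsection can be verified directly. The following is a sketch under stated assumptions: the spinor components are taken to be ordinary complex numbers rather than operators or Grassmann variables, so two distinct spinors are used in the bilinear (for commuting components $Q^T C_s Q$ vanishes identically), and all helper names are illustrative. It checks $C_s\sigma_i^* = -\sigma_i C_s$ for each Pauli matrix, and the general identity $\lambda^T C_s\lambda = \det(\lambda)\,C_s$, which reduces to $C_s$ for unimodular $\lambda$.

```python
import random

SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
# C_s = i * sigma_2
C_S = [[0, 1], [-1, 0]]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def conj(a):
    """Entrywise complex conjugate."""
    return [[complex(x).conjugate() for x in row] for row in a]


def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(2) for j in range(2))


def conjugation_property_holds():
    """Check C_s sigma_i^* = -sigma_i C_s for i = 1, 2, 3."""
    for s in SIGMA:
        lhs = matmul(C_S, conj(s))
        rhs = [[-x for x in row] for row in matmul(s, C_S)]
        if not close(lhs, rhs):
            return False
    return True


def random_sl2c(rng):
    """Random complex 2x2 matrix with unit determinant."""
    a = complex(rng.uniform(0.5, 1.5), rng.uniform(-1, 1))
    b = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
    c = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
    return [[a, b], [c, (1 + b * c) / a]]


def bilinear(q, r):
    """q_a (C_s)_ab r_b for two 2-component spinors."""
    return sum(q[a] * C_S[a][b] * r[b] for a in range(2) for b in range(2))


def act(lam, q):
    """Spinor transformation q -> lam q."""
    return [sum(lam[a][b] * q[b] for b in range(2)) for a in range(2)]


if __name__ == "__main__":
    lam = random_sl2c(random.Random(1))
    print("C_s sigma^* = -sigma C_s:", conjugation_property_holds())
    print("lam^T C_s lam = C_s:",
          close(matmul(matmul(transpose(lam), C_S), lam), C_S))
```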
In the forthcoming sections we shall be needing a number of simple algebraic
properties of Grassmann Majorana spinors, which are defined and studied in Appendix
C. The relevant results are gathered there, and we strongly recommend that the reader
spend a few minutes at this point in gaining some familiarity with the essential prop-
erties of these objects, the basic ingredients from which we construct supersymmetric
theories.

12.6.2 The supersymmetry algebra


The translation and Lorentz generators which together constitute the Lie algebra of
the Poincaré group satisfy the commutator algebra stated in (9.65-9.67). Under the
HLG, the energy-momentum operator $P^\mu$ transforms as a four-vector, i.e., as a $(\frac{1}{2}, \frac{1}{2})$
representation, while the antisymmetric second rank tensor $M^{\mu\nu}$ is a member of the
reducible representation (1,0)⊕(0,1). Note that the simplest nontrivial representations
of the HLG, the spinorial $(\frac{1}{2}, 0)$ and $(0, \frac{1}{2})$, do not appear in the Poincaré algebra, a
deficiency (if we wish to regard it as such) which was eliminated by the introduction
of supersymmetry.
In 1975, Haag, Lopuszanski, and Sohnius (Haag et al., 1975) derived the most
general extension of the Poincaré algebra, in which Grassmannian spinorial generators,
which can be thought of roughly as “square roots of $P_\mu$”, are introduced. If
$Q_a \in (0, \frac{1}{2})$, then we saw that $C_s Q^* \in (\frac{1}{2}, 0)$. Recalling that the four-vector momentum
$P_\mu$ transforms in the $(\frac{1}{2}, \frac{1}{2})$ representation of the HLG, it is at least a possibility that
the anticommutator of $Q$ and $Q^*$ can be proportional to $P_\mu$:

$$\{Q_a, Q^*_b\} = 2\sigma^\mu_{ab}P_\mu \tag{12.183}$$
$$[P_\mu, Q_a] = [P_\mu, Q^*_a] = 0 \tag{12.184}$$
$$\{Q_a, Q_b\} = \{Q^*_a, Q^*_b\} = 0 \tag{12.185}$$

In the above, the indices $a, b$ run over the values (1,2): $Q_a$, $Q^*_a$ are 2-spinors. In fact, the
Haag–Lopuszanski–Sohnius theorem allows for an even more general algebra with $N$
independent Grassmannian generators, $Q_{ar}$, $a = 1, 2$, $r = 1, 2, \ldots, N$, with an algebra

$$\{Q_{ar}, Q^*_{bs}\} = 2\delta_{rs}\sigma^\mu_{ab}P_\mu \tag{12.186}$$
$$\{Q_{ar}, Q_{bs}\} = (C_s)_{ab}Z_{rs} \tag{12.187}$$

where the Zrs commute with everything and are called “central charges”. This
algebra is called “N-extended” supersymmetry and leads to consistent field theories for
1 ≤ N ≤ 8. The case of N =1 (“simple supersymmetry”) is of the greatest phenomeno-
logical importance, and is the only one we shall consider in our brief introduction to
SUSY.16
The N=1 SUSY algebra, Eqs. (12.183, 12.184, 12.185), is more frequently written
in a four-component Dirac notation: as described previously, we group the generators
Qa , Q∗a into a single Majorana 4-spinor Qα , α = 1, 2, 3, 4 (see (12.146)) and

$$\bar{Q}_\beta \to (Q^{*T}, -Q^T C_s) \tag{12.188}$$

The basic SUSY anticommutation relations, Eqs. (12.183, 12.185), can now be
expressed as a single equation:


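$$\{Q_\alpha, \bar{Q}_\beta\} = \begin{pmatrix} 0 & -C_s\{Q^*, Q\}C_s \\ \{Q, Q^*\} & 0 \end{pmatrix}_{\alpha\beta}
= \begin{pmatrix} 0 & -2C_s(\sigma_\mu P^\mu)^T C_s \\ 2\sigma_\mu P^\mu & 0 \end{pmatrix}_{\alpha\beta} \tag{12.189}$$

16 The derivation of Eqs. (12.186, 12.187) is given in full in Weinberg (Weinberg, 1995b), Chapter 2.

Recalling $C_s = i\sigma_2$ and $C_s\vec{\sigma}^T C_s = \vec{\sigma}$, $C_s\sigma_0 C_s = -\sigma_0$,

$$\{Q_\alpha, \bar{Q}_\beta\} = 2P^0\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}_{\alpha\beta} + 2P^i\begin{pmatrix} 0 & -\sigma_i \\ \sigma_i & 0 \end{pmatrix}_{\alpha\beta} \tag{12.190}$$

giving finally

$$\{Q_\alpha, \bar{Q}_\beta\} = 2(\gamma_\mu P^\mu)_{\alpha\beta} \tag{12.191}$$

together with

$$[P^\mu, Q_\alpha] = [P^\mu, \bar{Q}_\alpha] = 0 \tag{12.192}$$

so we have recovered the supersymmetry algebra of the fermionic generators $Q_\alpha$
introduced in an ad hoc fashion at the beginning of the section.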
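
The passage from the block form (12.189) to the Dirac form (12.191) can be checked with explicit matrices. The sketch below is only illustrative: it replaces the operator $P^\mu$ by a numerical four-vector `p`, reads the lower-index matrices $\gamma_\mu$ off the blocks displayed in (12.190), and uses helper names (`block`, `from_two_spinor_blocks`, `two_gamma_dot_p`, `clifford_holds`) that are not from the text. It confirms that the two forms agree entry by entry and that the matrices so defined obey $\{\gamma_\mu, \gamma_\nu\} = 2g_{\mu\nu}$ with signature $(+,-,-,-)$.

```python
SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
ZERO = [[0, 0], [0, 0]]
ONE = [[1, 0], [0, 1]]
C_S = [[0, 1], [-1, 0]]          # C_s = i * sigma_2
METRIC = [1, -1, -1, -1]         # diagonal of g_{mu nu}


def matmul(a, b):
    """Product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def scale(c, a):
    return [[c * x for x in row] for row in a]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def block(ul, ur, ll, lr):
    """Assemble a 4x4 matrix from four 2x2 blocks."""
    return [ul[0] + ur[0], ul[1] + ur[1], ll[0] + lr[0], ll[1] + lr[1]]


# gamma_mu (lower index), read off from the blocks in (12.190).
GAMMA = [block(ZERO, ONE, ONE, ZERO)] + [
    block(ZERO, scale(-1, s), s, ZERO) for s in SIGMA]


def sigma_dot_p(p):
    """sigma_mu p^mu = p^0 + p^i sigma_i."""
    total = scale(p[0], ONE)
    for i in range(3):
        total = add(total, scale(p[i + 1], SIGMA[i]))
    return total


def from_two_spinor_blocks(p):
    """Right-hand side of (12.189) with P^mu replaced by the numbers p."""
    sp = sigma_dot_p(p)
    upper_right = scale(-2, matmul(matmul(C_S, transpose(sp)), C_S))
    return block(ZERO, upper_right, scale(2, sp), ZERO)


def two_gamma_dot_p(p):
    """2 * gamma_mu p^mu, the right-hand side of (12.191)."""
    total = [[0] * 4 for _ in range(4)]
    for mu in range(4):
        total = add(total, scale(2 * p[mu], GAMMA[mu]))
    return total


def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol
               for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def clifford_holds():
    """Check {gamma_mu, gamma_nu} = 2 g_{mu nu} for all index pairs."""
    identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
    for mu in range(4):
        for nu in range(4):
            anti = add(matmul(GAMMA[mu], GAMMA[nu]),
                       matmul(GAMMA[nu], GAMMA[mu]))
            g = METRIC[mu] if mu == nu else 0
            if not close(anti, scale(2 * g, identity)):
                return False
    return True


if __name__ == "__main__":
    p = (1.5, 0.2, -0.7, 0.4)
    print("(12.189) equals (12.191):",
          close(from_two_spinor_blocks(p), two_gamma_dot_p(p)))
    print("Clifford algebra holds:", clifford_holds())
```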
The Majorana bispinor $Q_\alpha \in (\frac{1}{2}, 0) \oplus (0, \frac{1}{2})$ under HLG, so acting on any one-particle
state of spin $j$ and momentum $p^\mu$ we must get

$$Q_\alpha|\vec{p}, j\rangle \to |\vec{p}, j \pm \tfrac{1}{2}\rangle \tag{12.193}$$

In particular, if $\vec{p} = 0$ (particle of mass $m$ at rest, $H|\vec{p} = 0, j\rangle = m|\vec{p} = 0, j\rangle$), Eq.
(12.192) implies

$$HQ_\alpha|\vec{p} = 0, j\rangle = mQ_\alpha|\vec{p} = 0, j\rangle = Q_\alpha H|\vec{p} = 0, j\rangle \tag{12.194}$$

implying the existence of mass degenerate pairs of particles (“superpartners”) with
spins differing by $\frac{1}{2}$, and hence of opposite statistics (fermions mass degenerate with
bosons), assuming the validity of the Spin-Statistics theorem. The toy theory studied
earlier of a free massless (complex) scalar boson and a free massless spin-$\frac{1}{2}$ Majorana
particle is merely the simplest possible example of such a situation. We must now face
the task of devising an efficient machinery for constructing theories with interacting
particles and non-zero mass and with an action invariant under the super-Poincaré
group generated by the Qα together with the usual Poincaré generators P μ and Mμν .

12.6.3 Superfields
The simplest way to construct field theories with supersymmetric invariant
Lagrangians is to introduce a Grassmannian extension of ordinary spacetime:

$$(x^\mu) \to (x^\mu, \theta_\alpha) \quad \text{(superspace)} \tag{12.195}$$

and to think of the SUSY generators $Q_\alpha$ as the Grassmann analogs of $P_\mu$, generating
translations in “θ-space”, as $P_\mu$ does in $x^\mu$ space. Fields are now viewed as functions
both of $x^\mu$ and $\theta_\alpha$, and a general field can be expanded as a polynomial in the $\theta_\alpha$ of
degree four or less. Schematically:

$$S(x, \theta) = \sum_n S_n(x)\,\theta^n, \quad n \le 4 \tag{12.196}$$

If the superfield S has overall bosonic character, then the coefficient fields Sn will be
bosonic for n even and fermionic for n odd. If (as we shall take here) the leading term
(no θ’s) is a scalar bosonic field, we call S(x, θ) a “scalar superfield”. So a superfield is
a handy way of grouping together bosonic and fermionic fields in a “supermultiplet”.
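
The termination of the superfield expansion at fourth order, and the equal split between bosonic and fermionic components, follow from elementary Grassmann combinatorics. The sketch below is an illustration only, not the book's construction: monomials in four anticommuting generators are represented as increasing tuples of indices 0 to 3, and the names `multiply`, `product_of_generators`, `monomials_by_degree`, `even_odd_counts` and `all_quintic_products_vanish` are hypothetical.

```python
from itertools import combinations, product

N_THETA = 4  # number of Grassmann coordinates theta_alpha


def multiply(m1, m2):
    """Multiply two monomials given as increasing tuples of generator indices.

    Returns (sign, monomial); sign is 0 when a generator is repeated,
    since theta_alpha squared vanishes.
    """
    if set(m1) & set(m2):
        return 0, ()
    seq = list(m1 + m2)
    sign = 1
    # Bubble sort: every swap of adjacent generators contributes a factor -1.
    for i in range(len(seq)):
        for j in range(len(seq) - 1 - i):
            if seq[j] > seq[j + 1]:
                seq[j], seq[j + 1] = seq[j + 1], seq[j]
                sign = -sign
    return sign, tuple(seq)


def product_of_generators(indices):
    """Ordered product theta_{i1} theta_{i2} ... as (sign, monomial)."""
    sign, mono = 1, ()
    for i in indices:
        s, mono = multiply(mono, (i,))
        sign *= s
        if sign == 0:
            return 0, ()
    return sign, mono


def monomials_by_degree(n=N_THETA):
    """Independent monomials of each degree 0..n."""
    return [list(combinations(range(n), d)) for d in range(n + 1)]


def even_odd_counts(n=N_THETA):
    """Numbers of monomials of even and of odd degree."""
    counts = [len(m) for m in monomials_by_degree(n)]
    return sum(counts[0::2]), sum(counts[1::2])


def all_quintic_products_vanish(n=N_THETA):
    """Every product of n + 1 generators must repeat one, hence vanish."""
    return all(product_of_generators(idx)[0] == 0
               for idx in product(range(n), repeat=n + 1))


if __name__ == "__main__":
    print("monomials per degree:",
          [len(m) for m in monomials_by_degree()])
    print("even/odd counts:", even_odd_counts())
    print("theta_1 theta_0 =", product_of_generators((1, 0)))
    print("degree-5 products vanish:", all_quintic_products_vanish())
```

With four generators there are sixteen independent monomials, eight of even and eight of odd degree, so a superfield of definite overall statistics carries equal numbers of bosonic and fermionic coefficient functions before any constraints are imposed.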
We need the analog of the bosonic spacetime (infinitesimal) translation property

$$[\epsilon^\mu P_\mu, \phi(x)] = -i\epsilon^\mu\partial_\mu\phi(x) = -i(\phi(x + \epsilon) - \phi(x)) \tag{12.197}$$

This property connects the Hilbert space four-momentum operators $P_\mu$ to a differential
operator $-i\partial_\mu$ acting on the fields. In the same way, we seek differential operators
(now involving Grassmann derivatives in superspace) which will be equivalent to
commutation with the Hilbert space operators Qα . We must ensure, of course, that
the differential operators we construct satisfy the SUSY algebra (12.191–12.192).
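From the Majorana identity (C.5) (Appendix C), we know that $\theta_\gamma = -\bar{\theta}_\alpha(\gamma_5\epsilon)_{\alpha\gamma}$,
with $\epsilon$ the 4x4 matrix diag($C_s$, $C_s$), so the following operator (of overall fermionic
character)

$$K_\alpha \equiv (\gamma_5\epsilon)_{\alpha\gamma}\frac{\partial}{\partial\theta_\gamma} - i\gamma^\mu_{\alpha\gamma}\theta_\gamma\partial_\mu \tag{12.198}$$

can equivalently be written

$$K_\alpha = -\frac{\partial}{\partial\bar{\theta}_\alpha} - i(\gamma^\mu\theta)_\alpha\partial_\mu \tag{12.199}$$

The Dirac adjoint is

$$\bar{K}_\beta = K_\gamma(\gamma_5\epsilon)_{\gamma\beta} \tag{12.200}$$
$$\phantom{\bar{K}_\beta} = (\gamma_5\epsilon)_{\gamma\delta}\frac{\partial}{\partial\theta_\delta}(\gamma_5\epsilon)_{\gamma\beta} - i\gamma^\mu_{\gamma\delta}\theta_\delta(\gamma_5\epsilon)_{\gamma\beta}\partial_\mu \tag{12.201}$$
$$\phantom{\bar{K}_\beta} = \frac{\partial}{\partial\theta_\beta} + i(\gamma_5\epsilon\gamma^\mu)_{\beta\gamma}\theta_\gamma\partial_\mu \tag{12.202}$$

The anticommutator algebra of the $K$ and $\bar{K}$ is now easily computed:

$$\{K_\alpha, \bar{K}_\beta\} = \left\{(\gamma_5\epsilon)_{\alpha\gamma}\frac{\partial}{\partial\theta_\gamma},\; i(\gamma_5\epsilon\gamma^\mu)_{\beta\delta}\theta_\delta\partial_\mu\right\} + \left\{-i\gamma^\mu_{\alpha\gamma}\theta_\gamma\partial_\mu,\; \frac{\partial}{\partial\theta_\beta}\right\} \tag{12.203}$$
$$\phantom{\{K_\alpha, \bar{K}_\beta\}} = -i(\gamma_5\epsilon\gamma^\mu\gamma_5\epsilon)_{\beta\alpha}\partial_\mu - i\gamma^\mu_{\alpha\beta}\partial_\mu \tag{12.204}$$
$$\phantom{\{K_\alpha, \bar{K}_\beta\}} = -2i\gamma^\mu_{\alpha\beta}\partial_\mu \tag{12.205}$$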
This establishes that the Kα have the same algebra as the Qα , in the sense that, given
a superfield with the property

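$$[Q_\alpha, S(x, \theta)] = -iK_\alpha S(x, \theta) \tag{12.206}$$

(from which follows $[\bar{Q}_\beta, S(x, \theta)] = [Q_\gamma(\gamma_5\epsilon)_{\gamma\beta}, S] = -i\bar{K}_\beta S$), then the SUSY algebra
(12.191) is properly realized via

$$\begin{aligned}
[\{Q_\alpha, \bar{Q}_\beta\}, S(x, \theta)] &= Q_\alpha[\bar{Q}_\beta, S] + [Q_\alpha, S]\bar{Q}_\beta + \bar{Q}_\beta[Q_\alpha, S] + [\bar{Q}_\beta, S]Q_\alpha \\
&= -iQ_\alpha(\bar{K}_\beta S) - i(K_\alpha S)\bar{Q}_\beta - i\bar{Q}_\beta(K_\alpha S) - i(\bar{K}_\beta S)Q_\alpha \\
&= i\bar{K}_\beta[Q_\alpha, S] + iK_\alpha[\bar{Q}_\beta, S] \\
&= \{\bar{K}_\beta, K_\alpha\}S \\
&= -2i\gamma^\mu_{\alpha\beta}\partial_\mu S \\
&= 2\gamma^\mu_{\alpha\beta}[P_\mu, S]
\end{aligned} \tag{12.207}$$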

A glance at (12.198) and (12.202) shows that {Kα , Kβ } = 0 = {K̄α , K̄β }. Moreover, a
covariant (superspace) derivative can be defined as follows, by a simple change of sign
from (12.198):


$$D_\alpha \equiv (\gamma_5\epsilon)_{\alpha\gamma}\frac{\partial}{\partial\theta_\gamma} + i\gamma^\mu_{\alpha\gamma}\theta_\gamma\partial_\mu \tag{12.208}$$

The change of sign relative to the definition of the $K_\alpha$ implies the anticommutation
relations

$$\{K_\alpha, D_\beta\} = \{K_\alpha, \bar{D}_\beta\} = 0 \tag{12.209}$$
$$\{D_\alpha, \bar{D}_\beta\} = +2i\gamma^\mu_{\alpha\beta}\partial_\mu \tag{12.210}$$

Covariant superspace derivatives are useful, as they allow us to construct new
superfields by applying SUSY-invariant constraints to the basic scalar superfield. For
example, if we impose the constraint

$$D_\alpha S(x, \theta) = 0 \tag{12.211}$$

on a superfield $S$, then this constraint will be preserved under an infinitesimal SUSY
transformation generated by (12.206), as a consequence of (12.209). This will be
important later in the construction of the so-called “chiral superfields”.

12.6.4 Transformation properties of components of scalar superfields


The rather schematic expansion of a general scalar superfield given in (12.196) can be
made more explicit. Expanding S(x, θ) to fourth (the maximum) order in the θα ,

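$$\begin{aligned}
S(x, \theta) = {}& C(x) - i\bar{\theta}\gamma_5\omega(x) - \frac{i}{2}(\bar{\theta}\gamma_5\theta)M(x) - \frac{1}{2}(\bar{\theta}\theta)N(x) - \frac{1}{2}\bar{\theta}\gamma_5\gamma_\mu\theta\,V^\mu(x) \\
& - i(\bar{\theta}\gamma_5\theta)\,\bar{\theta}\Big(\lambda(x) - \frac{i}{2}\slashed{\partial}\omega(x)\Big) - \frac{1}{4}(\bar{\theta}\gamma_5\theta)^2\Big(D(x) - \frac{1}{2}\Box C(x)\Big)
\end{aligned} \tag{12.212}$$
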

where C(x), D(x), M (x), N (x) and V μ (x) are bosonic fields while the fields λ(x) and
ω(x) are fermionic (Majorana) 4-spinors. The peculiar combinations used to define the
D(x) and λ(x) terms are chosen to simplify the transformation laws for these fields, as
we shall soon see. In analogy to (12.197), an infinitesimal SUSY transformation of S
is generated by an infinitesimal Grassmann “translation” in superspace ξα as follows:

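$$\begin{aligned}
\delta S = \bar{\xi}_\alpha K_\alpha S = {}& \bar{\xi}\Big(-\frac{\partial}{\partial\bar{\theta}} - i\gamma^\mu\theta\,\partial_\mu\Big)S(x, \theta) \\
= {}& \bar{\xi}\Big\{i\gamma_5\omega + i\gamma_5\theta M + \theta N + \gamma_5\gamma_\mu\theta V^\mu + 2i(\gamma_5\theta)\,\bar{\theta}\Big(\lambda - \frac{i}{2}\slashed{\partial}\omega\Big) \\
& \quad + i(\bar{\theta}\gamma_5\theta)\Big(\lambda - \frac{i}{2}\slashed{\partial}\omega\Big) + \gamma_5\theta\,(\bar{\theta}\gamma_5\theta)\Big(D - \frac{1}{2}\Box C\Big)\Big\} \\
& - i(\bar{\xi}\gamma^\mu\theta)\Big\{\partial_\mu C - i\bar{\theta}\gamma_5\partial_\mu\omega - \frac{i}{2}(\bar{\theta}\gamma_5\theta)\partial_\mu M \\
& \quad - \frac{1}{2}\bar{\theta}\theta\,\partial_\mu N - \frac{1}{2}\bar{\theta}\gamma_5\gamma_\nu\theta\,\partial_\mu V^\nu - i(\bar{\theta}\gamma_5\theta)\,\bar{\theta}\Big(\partial_\mu\lambda - \frac{i}{2}\partial_\mu\slashed{\partial}\omega\Big)\Big\}
\end{aligned} \tag{12.213}$$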
The term with no θs corresponds to the change in the C(x) field, so we immediately
can read off

\[
\delta C(x) = i\bar\xi\gamma_5\omega(x)
\tag{12.214}
\]

Notice that the SUSY transformation has turned a bosonic scalar field C into a
fermionic spinor field ω. Next, terms with a single θ:

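\[
\begin{aligned}
-i\bar\theta\gamma_5\delta\omega(x) &= \bar\xi\big(-i\slashed{\partial}C(x) + i\gamma_5 M(x) + N(x) + \gamma_5\slashed{V}(x)\big)\theta \\
&= \bar\theta\big(i\slashed{\partial}C(x) + i\gamma_5 M(x) + N(x) + \gamma_5\slashed{V}(x)\big)\xi
\end{aligned}
\tag{12.215}
\]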

whence

δω(x) = (−γ5 ∂/C(x) − M (x) + iγ5 N (x) + iV/(x))ξ (12.216)

To deal with the terms with two θs we will need a Fierz rearrangement theorem—
(C.26) from Appendix C:

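\[
\theta_\alpha\bar\theta_\beta = -\frac{1}{4}\delta_{\alpha\beta}\,\bar\theta\theta + \frac{1}{4}(\gamma_5\gamma_\mu)_{\alpha\beta}\,\bar\theta\gamma_5\gamma^\mu\theta - \frac{1}{4}(\gamma_5)_{\alpha\beta}\,\bar\theta\gamma_5\theta
\tag{12.217}
\]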
We can now rearrange the terms with two θs in (12.213) as follows:

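\[
\begin{aligned}
& 2i(\bar\xi\gamma_5\theta)\,\bar\theta\Big(\lambda - \frac{i}{2}\slashed{\partial}\omega\Big) + i(\bar\theta\gamma_5\theta)\,\bar\xi\Big(\lambda - \frac{i}{2}\slashed{\partial}\omega\Big) - (\bar\xi\gamma^\mu\theta)\,\bar\theta\gamma_5\partial_\mu\omega \\
&= 2i\Big(-\frac{1}{4}\Big)\bar\xi\gamma_5\Big(\lambda - \frac{i}{2}\slashed{\partial}\omega\Big)\bar\theta\theta + 2i\Big(\frac{1}{4}\Big)\bar\xi\gamma_5\gamma_5\gamma_\mu\Big(\lambda - \frac{i}{2}\slashed{\partial}\omega\Big)\bar\theta\gamma_5\gamma^\mu\theta \\
&\quad + 2i\Big(-\frac{1}{4}\Big)\bar\xi\gamma_5\gamma_5\Big(\lambda - \frac{i}{2}\slashed{\partial}\omega\Big)\bar\theta\gamma_5\theta + i\bar\xi\Big(\lambda - \frac{i}{2}\slashed{\partial}\omega\Big)\bar\theta\gamma_5\theta \\
&\quad + \frac{1}{4}\bar\xi\gamma^\mu\gamma_5\partial_\mu\omega\,\bar\theta\theta - \frac{1}{4}\bar\xi\gamma^\mu\gamma_5\gamma_\nu\gamma_5\partial_\mu\omega\,(\bar\theta\gamma_5\gamma^\nu\theta) + \frac{1}{4}\bar\xi\gamma^\mu\gamma_5\gamma_5\partial_\mu\omega\,\bar\theta\gamma_5\theta \\
&= -\frac{i}{2}\bar\theta\theta\,\bar\xi\gamma_5(\lambda - i\slashed{\partial}\omega) \\
&\quad + i\bar\theta\gamma_5\theta\Big\{-\frac{1}{2}\bar\xi\Big(\lambda - \frac{i}{2}\slashed{\partial}\omega\Big) + \bar\xi\Big(\lambda - \frac{i}{2}\slashed{\partial}\omega\Big) - \frac{i}{4}\bar\xi\slashed{\partial}\omega\Big\} \\
&\quad + i\bar\theta\gamma_5\gamma^\mu\theta\Big\{\frac{1}{2}\bar\xi\gamma_\mu\lambda - \frac{i}{4}\bar\xi\gamma_\mu\slashed{\partial}\omega + \frac{i}{4}\bar\xi\gamma^\nu\gamma_5\gamma_\mu\gamma_5\partial_\nu\omega\Big\}
\end{aligned}
\]
\[
= -\frac{i}{2}\bar\theta\theta\,\bar\xi\gamma_5(\lambda - i\slashed{\partial}\omega) \qquad \Big(\to -\frac{1}{2}\bar\theta\theta\,\delta N\Big)
\tag{12.218}
\]
\[
+ \frac{i}{2}\bar\theta\gamma_5\theta\,\bar\xi(\lambda - i\slashed{\partial}\omega) \qquad \Big(\to -\frac{i}{2}\bar\theta\gamma_5\theta\,\delta M\Big)
\tag{12.219}
\]
\[
+ \frac{i}{2}\bar\theta\gamma_5\gamma^\mu\theta\Big\{\bar\xi\gamma_\mu\lambda - \frac{i}{2}\bar\xi(\gamma^\nu\gamma_\mu + \gamma_\mu\gamma^\nu)\partial_\nu\omega\Big\} \qquad \Big(\to -\frac{1}{2}\bar\theta\gamma_5\gamma_\mu\theta\,\delta V^\mu\Big)
\tag{12.220}
\]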

(12.218–12.220) lead immediately to the desired transformation rules

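\[
\delta M(x) = -\bar\xi\big(\lambda(x) - i\slashed{\partial}\omega(x)\big)
\tag{12.221}
\]
\[
\delta N(x) = i\bar\xi\gamma_5\big(\lambda(x) - i\slashed{\partial}\omega(x)\big)
\tag{12.222}
\]
\[
\delta V^\mu(x) = -i\bar\xi\gamma^\mu\lambda(x) - \bar\xi\partial^\mu\omega(x)
\tag{12.223}
\]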

The terms with three θs in (12.213) give $\delta(\lambda(x) - \frac{i}{2}\slashed{\partial}\omega(x))$. Using (C.12, C.13),
and identities (1) and (2) from Appendix C, we have

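\[
(\bar\xi\gamma_5\theta)(\bar\theta\gamma_5\theta) = (\bar\theta\gamma_5\theta)(\bar\theta\gamma_5\xi)
\tag{12.224}
\]
\[
(\bar\xi\gamma^\mu\theta)(\bar\theta\gamma_5\theta) = -(\bar\theta\gamma_5\theta)(\bar\theta\gamma^\mu\xi)
\tag{12.225}
\]
\[
(\bar\xi\gamma^\mu\theta)(\bar\theta\theta) = -(\bar\xi\gamma^\mu\gamma_5\theta)(\bar\theta\gamma_5\theta) = (\bar\theta\gamma_5\theta)(\bar\theta\gamma_5\gamma^\mu\xi)
\tag{12.226}
\]
\[
(\bar\xi\gamma^\mu\theta)(\bar\theta\gamma_5\gamma_\nu\theta) = -(\bar\xi\gamma^\mu\gamma_\nu\theta)(\bar\theta\gamma_5\theta) = -(\bar\theta\gamma_5\theta)(\bar\theta\gamma_\nu\gamma^\mu\xi)
\tag{12.227}
\]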

Using these identities, we see that the terms with three θs in (12.213) can be rewritten

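\[
(\bar\theta\gamma_5\theta)\,\bar\theta\Big\{\Big(D - \frac{1}{2}\Box C\Big)\gamma_5\xi + \frac{1}{2}\gamma^\mu\xi\,\partial_\mu M + \frac{i}{2}\gamma_5\gamma^\mu\xi\,\partial_\mu N - \frac{i}{2}\partial_\mu\slashed{V}\gamma^\mu\xi\Big\}
\tag{12.228}
\]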
so that we can read off
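\[
\delta\Big(\lambda - \frac{i}{2}\slashed{\partial}\omega\Big) = \Big\{\frac{i}{2}\slashed{\partial}M - \frac{1}{2}\gamma_5\slashed{\partial}N + \frac{1}{2}\partial_\mu\slashed{V}\gamma^\mu + i\Big(D - \frac{1}{2}\Box C\Big)\gamma_5\Big\}\xi
\tag{12.229}
\]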

which, together with the transformation law (12.216) for ω gives the desired SUSY
transformation of the λ field
\[
\delta\lambda(x) = \Big(\frac{1}{2}[\partial_\mu\slashed{V}(x), \gamma^\mu] + i\gamma_5 D(x)\Big)\xi
\tag{12.230}
\]
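The step from (12.229) to (12.230) can be written out explicitly. The following is a sketch that uses only (12.216), (12.229), the identity $\slashed{\partial}\slashed{\partial} = \Box$, the anticommutation of $\gamma_5$ with $\gamma^\mu$, and the assumption that ξ is a constant (global) Grassmann parameter:

```latex
\begin{aligned}
\delta\lambda &= \delta\Big(\lambda - \tfrac{i}{2}\slashed{\partial}\omega\Big) + \tfrac{i}{2}\slashed{\partial}\,\delta\omega, \\
\tfrac{i}{2}\slashed{\partial}\,\delta\omega
 &= \tfrac{i}{2}\slashed{\partial}\big(-\gamma_5\slashed{\partial}C - M + i\gamma_5 N + i\slashed{V}\big)\xi \\
 &= \Big(\tfrac{i}{2}\gamma_5\Box C - \tfrac{i}{2}\slashed{\partial}M + \tfrac{1}{2}\gamma_5\slashed{\partial}N - \tfrac{1}{2}\gamma^\mu\partial_\mu\slashed{V}\Big)\xi .
\end{aligned}
% Adding (12.229): the C, M and N terms cancel pairwise, leaving
\delta\lambda = \Big(\tfrac{1}{2}\big(\partial_\mu\slashed{V}\gamma^\mu - \gamma^\mu\partial_\mu\slashed{V}\big) + i\gamma_5 D\Big)\xi
             = \Big(\tfrac{1}{2}[\partial_\mu\slashed{V},\gamma^\mu] + i\gamma_5 D\Big)\xi .
```

The cancellation of the C, M and N terms is what the "peculiar combinations" in (12.212) were chosen to achieve.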

Finally, the (very important!) transformation law for the D field is obtained by
looking at the term in δS with four θs (see (C.20) from Appendix C):

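\[
\delta S \sim -(\bar\xi\gamma^\mu\theta)\,\bar\theta\gamma_5\theta\;\bar\theta\Big(\partial_\mu\lambda - \frac{i}{2}\partial_\mu\slashed{\partial}\omega\Big)
\tag{12.231}
\]
\[
= -(\bar\xi\gamma^\mu)_\alpha(\bar\theta\gamma_5\theta)\,\theta_\alpha\bar\theta_\beta\Big(\partial_\mu\lambda_\beta - \frac{i}{2}\partial_\mu(\slashed{\partial}\omega)_\beta\Big)
\tag{12.232}
\]
\[
= \frac{1}{4}(\bar\theta\gamma_5\theta)^2\,\bar\xi\gamma^\mu\gamma_5\Big(\partial_\mu\lambda - \frac{i}{2}\partial_\mu\slashed{\partial}\omega\Big)
\tag{12.233}
\]
\[
= -\frac{1}{4}(\bar\theta\gamma_5\theta)^2\,\delta\Big(D - \frac{1}{2}\Box C\Big)
\tag{12.234}
\]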
Combining this with the transformation rule (12.214) for the C(x) field, we obtain

\[
\delta D(x) = \bar\xi\gamma_5\slashed{\partial}\lambda(x)
\tag{12.235}
\]
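The combination of (12.233), (12.234) and (12.214) that produces (12.235) can be sketched in two lines. The only inputs beyond those equations are $\gamma^\mu\gamma_5 = -\gamma_5\gamma^\mu$ and $\slashed{\partial}\slashed{\partial} = \Box$:

```latex
\begin{aligned}
\delta\Big(D - \tfrac{1}{2}\Box C\Big)
 &= -\bar\xi\gamma^\mu\gamma_5\Big(\partial_\mu\lambda - \tfrac{i}{2}\partial_\mu\slashed{\partial}\omega\Big)
  = \bar\xi\gamma_5\slashed{\partial}\lambda - \tfrac{i}{2}\bar\xi\gamma_5\Box\omega, \\
\tfrac{1}{2}\Box\,\delta C &= \tfrac{i}{2}\bar\xi\gamma_5\Box\omega, \\
\delta D &= \delta\Big(D - \tfrac{1}{2}\Box C\Big) + \tfrac{1}{2}\Box\,\delta C = \bar\xi\gamma_5\slashed{\partial}\lambda .
\end{aligned}
```

The ω-dependence drops out, so the variation of D involves only λ and is manifestly a spacetime derivative.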

In other words, the D term of any scalar superfield transforms as a total spacetime-
derivative under an infinitesimal SUSY transformation, so if K is any such field, or
product of such fields, a SUSY invariant action can be obtained simply by taking the
D part of K as the Lagrangian density:

\[
I = \int d^4x\,[K]_D
\tag{12.236}
\]

A simpler way to see this is to realize that the above action is really the integral of K
over all of superspace

\[
I = \int d^4x\,d\theta_\alpha\,K(x,\theta)
\tag{12.237}
\]

and K transforms into a mixture of spacetime and θ derivatives under an infinitesimal
SUSY transformation (see the second line of (12.213)). We will see soon that the kinetic
term of the simplest SUSY models arises from just this type of SUSY invariant.

12.6.5 Chiral superfields


The real scalar superfield S described above is not an irreducible representation of
the extended Poincaré–SUSY algebra. It can basically be split into two chiral halves
which are irreducible (a rough analogy is the second-rank tensor F μν which is really
(1,0)+(0,1) under the HLG). Recall from (12.208–12.211) that the covariant derivative
Dα transforms superfields into superfields. If we consider separately the upper two
(“left-handed”) and lower two (“right-handed”) components of Dα :

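\[
D_{L\alpha} \equiv \Big(\frac{1 + \gamma_5}{2}\Big)_{\alpha\beta} D_\beta
\tag{12.238}
\]
\[
D_{R\alpha} \equiv \Big(\frac{1 - \gamma_5}{2}\Big)_{\alpha\beta} D_\beta
\tag{12.239}
\]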

then the constraint

DRα Φ(x, θ) = 0 (12.240)

defines a left-chiral superfield Φ which remains left-chiral under a SUSY transformation
(12.213) (as a consequence of (12.209)). Likewise

DLα Φ̃(x, θ) = 0 (12.241)

defines a right-chiral field Φ̃. From the definition



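\[
D_\alpha \equiv (\gamma_5\epsilon)_{\alpha\gamma}\frac{\partial}{\partial\theta_\gamma} + i\gamma^\mu_{\alpha\gamma}\theta_\gamma\partial_\mu
\tag{12.242}
\]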
we can easily extract the left- and right-handed parts:

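\[
D_{L\alpha} = \epsilon_{\alpha\gamma}\frac{\partial}{\partial\theta_{L\gamma}} + i(\gamma^\mu\theta_R)_\alpha\partial_\mu
\tag{12.243}
\]
\[
D_{R\alpha} = -\epsilon_{\alpha\gamma}\frac{\partial}{\partial\theta_{R\gamma}} + i(\gamma^\mu\theta_L)_\alpha\partial_\mu
\tag{12.244}
\]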
The easiest way to solve the constraints (12.240, 12.241) is to define chiral coordi-
nates in superspace

\[
x^\mu_\pm \equiv x^\mu \mp i\theta_R^T\epsilon\gamma^\mu\theta_L
\tag{12.245}
\]

These are cooked up so that x+ vanishes under a right derivative, x− under a left
derivative:

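\[
\begin{aligned}
D_{R\alpha}x^\mu_+ &= \Big(-\epsilon_{\alpha\beta}\frac{\partial}{\partial\theta_{R\beta}} + i(\gamma^\nu\theta_L)_\alpha\partial_\nu\Big)\big(x^\mu - i\theta_{R\gamma}(\epsilon\gamma^\mu)_{\gamma\delta}\theta_{L\delta}\big) \\
&= i(\epsilon^2\gamma^\mu)_{\alpha\delta}\theta_{L\delta} + i(\gamma^\mu\theta_L)_\alpha = 0
\end{aligned}
\tag{12.246}
\]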

and

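\[
\begin{aligned}
D_{L\alpha}x^\mu_- &= \Big(\epsilon_{\alpha\beta}\frac{\partial}{\partial\theta_{L\beta}} + i(\gamma^\nu\theta_R)_\alpha\partial_\nu\Big)\big(x^\mu + i\theta_{R\gamma}(\epsilon\gamma^\mu)_{\gamma\delta}\theta_{L\delta}\big) \\
&= -i\epsilon_{\alpha\beta}\theta_{R\gamma}(\epsilon\gamma^\mu)_{\gamma\beta} + i(\gamma^\mu\theta_R)_\alpha \\
&= i\theta_{R\gamma}(\epsilon\gamma^\mu\epsilon)_{\gamma\alpha} + i(\gamma^\mu\theta_R)_\alpha \\
&= -i(\gamma^\mu\theta_R)_\alpha + i(\gamma^\mu\theta_R)_\alpha = 0
\end{aligned}
\tag{12.247}
\]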

where in the last step we have used the transposition property $\epsilon\gamma^\mu\epsilon = -\gamma^{\mu T}$.
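The two matrix facts used in (12.246) and (12.247), namely that ε squares to −1 and that conjugating $\gamma^\mu$ with ε gives minus its transpose, can be checked numerically. The following is a minimal sketch under stated assumptions: it uses a chiral representation with $\gamma^0$ off-diagonal in 2×2 blocks and $\gamma^i$ built from the Pauli matrices, and ε = diag(e, e) with e = iσ₂. These explicit matrices and all helper names are illustrative choices, not conventions fixed by the text.

```python
# Assumed representation (illustrative only): 4x4 matrices in 2x2 blocks,
# gamma^0 = [[0, 1], [1, 0]], gamma^i = [[0, sigma_i], [-sigma_i, 0]],
# epsilon = diag(e, e) with e = i*sigma_2.

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def block(tl, tr, bl, br):
    """Assemble a 4x4 matrix from four 2x2 blocks."""
    top = [tl[i] + tr[i] for i in range(2)]
    bottom = [bl[i] + br[i] for i in range(2)]
    return top + bottom

ZERO = [[0, 0], [0, 0]]
ONE = [[1, 0], [0, 1]]
SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
E = [[0, 1], [-1, 0]]  # i * sigma_2

EPS = block(E, ZERO, ZERO, E)
GAMMA = [block(ZERO, ONE, ONE, ZERO)] + [
    block(ZERO, s, scale(-1, s), ZERO) for s in SIGMA
]

def transposition_property_holds():
    """True if eps * gamma^mu * eps == -(gamma^mu)^T for mu = 0..3."""
    for g in GAMMA:
        lhs = matmul(matmul(EPS, g), EPS)
        rhs = scale(-1, transpose(g))
        if lhs != rhs:
            return False
    return True

def eps_squared():
    return matmul(EPS, EPS)
```

All entries are small integers or integer multiples of i, so the comparisons are exact and no floating-point tolerance is needed.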


The condition $D_R\Phi = 0$ means (since $D_R$ does not contain $\partial/\partial\theta_L$) that the left-chiral
field Φ can be written as a function of $x^\mu_+$ and $\theta_L$:
\[
\Phi(x,\theta) = \phi(x_+) - \sqrt{2}\,\theta_L^T\epsilon\psi_L(x_+) + \mathcal{F}(x_+)\,\theta_L^T\epsilon\theta_L
\tag{12.248}
\]

where the expansion must terminate at the term quadratic in θL (which has only
two independent components!). Note that φ, $\mathcal{F}$ are complex scalar fields (two real
degrees of freedom each) and ψL is a complex spinor doublet, so Φ contains in all
eight real degrees of freedom. The signs, square-roots of 2, etc., are all conventional
normalizations chosen to make the final Lagrangian look decent. Likewise, the most
general right-chiral field can be expanded
\[
\tilde\Phi(x,\theta) = \tilde\phi(x_-) + \sqrt{2}\,\theta_R^T\epsilon\psi_R(x_-) - \tilde{\mathcal{F}}(x_-)\,\theta_R^T\epsilon\theta_R
\tag{12.249}
\]

If we expand the left-chiral field Φ around the bosonic xμ coordinate, we find that
it can be expressed as follows in terms of conventional spacetime fields:
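\[
\begin{aligned}
\Phi(x,\theta) = {} & \phi(x) - i\theta_R^T\epsilon\gamma^\mu\theta_L\,\partial_\mu\phi(x) + \frac{1}{2}(-i)^2\,\theta_R^T\epsilon\gamma^\mu\theta_L\;\theta_R^T\epsilon\gamma^\nu\theta_L\,\partial_\mu\partial_\nu\phi(x) \\
& - \sqrt{2}\,\theta_L^T\epsilon\psi_L(x) + i\sqrt{2}\,\theta_L^T\epsilon\,\partial_\mu\psi_L(x)\;\theta_R^T\epsilon\gamma^\mu\theta_L + \mathcal{F}(x)\,\theta_L^T\epsilon\theta_L
\end{aligned}
\tag{12.250}
\]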

In order to compare the fields in (12.250) more easily with our original component
fields for the full scalar superfield S, we will need the following identities:

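\[
\theta_R^T\epsilon\gamma^\mu\theta_L = \bar\theta\gamma_5\gamma^\mu\,\frac{1+\gamma_5}{2}\,\theta = \frac{1}{2}\bar\theta\gamma_5\gamma^\mu\theta
\tag{12.251}
\]
\[
\theta_R^T\epsilon\gamma^\mu\theta_L\;\theta_R^T\epsilon\gamma^\nu\theta_L = -\frac{1}{4}g^{\mu\nu}(\bar\theta\gamma_5\theta)^2
\tag{12.252}
\]
\[
\theta_L^T\epsilon\psi_L = \bar\theta\gamma_5\psi_L = \bar\theta\psi_L
\tag{12.253}
\]
\[
\theta_R^T\epsilon\gamma^\mu\theta_L\;\theta_L^T\epsilon\,\partial_\mu\psi_L = -\frac{1}{2}\bar\theta\gamma_5\theta\,\bar\theta\gamma_5\slashed{\partial}\psi_L = \frac{1}{2}\bar\theta\gamma_5\theta\,\bar\theta\slashed{\partial}\psi_L
\tag{12.254}
\]
\[
\theta_L^T\epsilon\theta_L = \bar\theta\gamma_5\,\frac{1+\gamma_5}{2}\,\theta = \bar\theta\,\frac{1+\gamma_5}{2}\,\theta
\tag{12.255}
\]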
Inserting these results in (12.250) we obtain

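\[
\begin{aligned}
\Phi(x,\theta) = {} & \phi(x) - \frac{i}{2}\bar\theta\gamma_5\gamma^\mu\theta\,\partial_\mu\phi(x) - \sqrt{2}\,\bar\theta\psi_L(x) + \bar\theta\,\frac{1+\gamma_5}{2}\,\theta\,\mathcal{F}(x) \\
& + \frac{i}{\sqrt{2}}\bar\theta\gamma_5\theta\,\bar\theta\slashed{\partial}\psi_L(x) + \frac{1}{8}(\bar\theta\gamma_5\theta)^2\,\Box\phi(x)
\end{aligned}
\tag{12.256}
\]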

Similarly, for the right-chiral field


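\[
\begin{aligned}
\tilde\Phi(x,\theta) = {} & \tilde\phi(x) + \frac{i}{2}\bar\theta\gamma_5\gamma^\mu\theta\,\partial_\mu\tilde\phi(x) - \sqrt{2}\,\bar\theta\psi_R(x) + \bar\theta\,\frac{1-\gamma_5}{2}\,\theta\,\tilde{\mathcal{F}}(x) \\
& - \frac{i}{\sqrt{2}}\bar\theta\gamma_5\theta\,\bar\theta\slashed{\partial}\psi_R(x) + \frac{1}{8}(\bar\theta\gamma_5\theta)^2\,\Box\tilde\phi(x)
\end{aligned}
\tag{12.257}
\]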

An extremely important special case occurs when Φ and Φ̃ are conjugates of each
other. This will be the case if the bosonic fields are related in the obvious way, φ̃ =
φ∗ , F̃ = F ∗ , and the upper components of ψL are related to the lower components
of ψR in the usual charge-conjugation way familiar from the 4-spinor version of a
Majorana field:
\[
\psi_R = \begin{pmatrix} 0 \\ 0 \\ \psi_1 \\ \psi_2 \end{pmatrix}
\]

and

\[
\psi_L = \begin{pmatrix} \psi_2^* \\ -\psi_1^* \\ 0 \\ 0 \end{pmatrix}
\]

Powers of a left-chiral field Φ are clearly left-chiral (i.e., satisfy (12.240)); likewise,
powers of right-chiral fields are right-chiral (satisfy (12.241)). However a product like
ΦΦ̃ is not chiral, although it is still a scalar superfield (of the S-type). If Φ̃ = Φ∗ as
discussed above, it is also hermitian. An hermitian Lagrangian can also be obtained
by taking the “real part” of a chiral field (or power of chiral fields), so consider

\[
S_c(x,\theta) \equiv \frac{1}{\sqrt{2}}(\Phi + \Phi^*)
\tag{12.258}
\]

Decomposing the complex bosonic fields φ, F in the usual way

\[
\phi = \frac{A + iB}{\sqrt{2}}
\tag{12.259}
\]
\[
\mathcal{F} = \frac{F - iG}{\sqrt{2}}
\tag{12.260}
\]

and with ψL , ψR the upper and lower components of a single Majorana fermion field
ψ as indicated above, we see that taking the real part of the chiral field Φ yields a
constrained real superfield with components

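\[
\begin{aligned}
S_c = {} & A(x) - \bar\theta\psi(x) + \frac{1}{2}\bar\theta\gamma_5\gamma^\mu\theta\,\partial_\mu B(x) + \frac{1}{2}\bar\theta\theta\,F(x) - \frac{i}{2}\bar\theta\gamma_5\theta\,G(x) \\
& - \frac{i}{2}\bar\theta\gamma_5\theta\,\bar\theta\gamma_5\slashed{\partial}\psi(x) + \frac{1}{8}(\bar\theta\gamma_5\theta)^2\,\Box A(x)
\end{aligned}
\tag{12.261}
\]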

Comparing with the component expression for the general scalar superfield (12.212)

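\[
\begin{aligned}
S(x,\theta) = {} & C(x) - i\bar\theta\gamma_5\omega(x) - \frac{i}{2}(\bar\theta\gamma_5\theta)M(x) - \frac{1}{2}(\bar\theta\theta)N(x) - \frac{1}{2}\bar\theta\gamma_5\gamma_\mu\theta\,V^\mu(x) \\
& - i(\bar\theta\gamma_5\theta)\,\bar\theta\Big(\lambda(x) - \frac{i}{2}\slashed{\partial}\omega(x)\Big) - \frac{1}{4}(\bar\theta\gamma_5\theta)^2\Big(D(x) - \frac{1}{2}\Box C(x)\Big)
\end{aligned}
\tag{12.262}
\]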

we see that the real part of our chiral field is just a real scalar superfield with the
identifications

λ = D = 0 (12.263)
C→A (12.264)
M →G (12.265)
N → −F (12.266)
Vμ → −∂μ B (12.267)
ω → −iγ5 ψ (12.268)

so that the $\mathcal{F}$ term corresponds to $-\frac{N + iM}{\sqrt{2}}$ in the old notation. Recalling the SUSY
transformation rules

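\[
\delta M(x) = -\bar\xi\big(\lambda(x) - i\slashed{\partial}\omega(x)\big)
\tag{12.269}
\]
\[
\delta N(x) = i\bar\xi\gamma_5\big(\lambda(x) - i\slashed{\partial}\omega(x)\big)
\tag{12.270}
\]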

we see that once λ = 0, the variation of the F term is a total spacetime-derivative, so
the integral over spacetime of such a term yields a SUSY-invariant action:

\[
I_f = \int d^4x\,[f(\Phi) + f(\Phi^*)]_F
\tag{12.271}
\]

We will see shortly that this is the term responsible for non-trivial interactions in the
simplest SUSY models.

12.6.6 SUSY Lagrangians


So far we have seen that SUSY-invariant actions are obtainable either as the D-term
of a general scalar (“S-type”) superfield or as the F-term of a chiral superfield. Thus,
we can write
 
\[
I = \frac{1}{2}\int d^4x\,[K]_D + \int d^4x\,\big([f(\Phi)]_F + [f(\Phi^*)]_F\big)
\tag{12.272}
\]

where K is a full (S-type) superfield, Φ a left-chiral superfield. Φ begins with a scalar
field φ with dimension of mass. In powers of mass, the relation $x^\mu_+ = x^\mu - i\theta_R^T\epsilon\gamma^\mu\theta_L$
implies that θ has dimension −1/2. Anticipating here our discussion of renormalizability
in Chapter 17, we note that perturbatively renormalizable theories (in four spacetime
dimensions) necessarily contain only interactions of mass dimension less than or equal
to 4. However, the dimension of the D-term is 2+dim(K) which implies that K is
at most quadratic in Φ for renormalizability. Hence the K-term corresponds to the
kinetic part of the Lagrangian. The dimension of the F-term of f (Φ) is 1+dim(f (Φ)),
implying that f must be a polynomial of degree 3 or less. In general we wish to allow
for a multiplet of superfields Φn under some internal symmetry group, and the K-term
can be some general (hermitian) quadratic form

\[
K(\Phi_n, \Phi^*_n) = \sum_{n,m}\Phi^*_n\,g_{nm}\,\Phi_m, \qquad g^*_{nm} = g_{mn}
\tag{12.273}
\]
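The dimension counting that restricts K and f can be spelled out as a short worked calculation. This is a sketch in units of mass, writing [X] for the mass dimension of X and using only the statements made above ([φ] = 1, [x] = −1, and the bound of 4 on the dimension of terms in the Lagrangian density):

```latex
\begin{aligned}
x^\mu_+ - x^\mu \sim \theta\theta \;&\Rightarrow\; 2[\theta] = -1 \;\Rightarrow\; [\theta] = -\tfrac{1}{2}, \\
K \supset (\theta)^4\,[K]_D \;&\Rightarrow\; [\,[K]_D\,] = [K] - 4[\theta] = [K] + 2 \le 4 \;\Rightarrow\; [K] \le 2, \\
f(\Phi) \supset (\theta_L)^2\,[f]_F \;&\Rightarrow\; [\,[f]_F\,] = [f] - 2[\theta] = [f] + 1 \le 4 \;\Rightarrow\; [f] \le 3 .
\end{aligned}
% With [\Phi] = [\phi] = 1: K is at most quadratic and f at most cubic in \Phi.
```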

For simplicity, let us consider the construction of a SUSY-invariant Lagrangian for
just a single chiral field Φ. The kinetic term is given by extracting the D (four θ)
component of $\frac{1}{2}K = \frac{1}{2}\Phi^*\Phi = C(x) + \ldots - \frac{1}{4}(\bar\theta\gamma_5\theta)^2(D(x) - \frac{1}{2}\Box C(x))$. Obviously,
$C(x) = \frac{1}{2}\phi^*\phi$, and to get D we need the term in Φ∗Φ quartic in θs. Here Φ∗ is just
the expression (12.257) with tildes replaced by conjugates (φ̃ → φ∗ , etc.). Assembling
the terms in Φ∗ Φ quartic in θs, we find
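\[
\begin{aligned}
[\Phi^*\Phi]_{\theta^4} = {} & \frac{1}{8}(\bar\theta\gamma_5\theta)^2(\phi^*\Box\phi + \phi\Box\phi^*) + \frac{1}{4}\bar\theta\gamma_5\gamma^\mu\theta\,\bar\theta\gamma_5\gamma^\nu\theta\,\partial_\mu\phi^*\partial_\nu\phi \\
& - i\bar\theta\psi_R\,\bar\theta\gamma_5\theta\,\bar\theta\slashed{\partial}\psi_L + i\bar\theta\psi_L\,\bar\theta\gamma_5\theta\,\bar\theta\slashed{\partial}\psi_R + \bar\theta\,\frac{1-\gamma_5}{2}\,\theta\;\bar\theta\,\frac{1+\gamma_5}{2}\,\theta\;\mathcal{F}^*\mathcal{F}
\end{aligned}
\tag{12.274}
\]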
The following identities, easily verifiable with the machinery described in Appendix
C, come in handy at this point:
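\[
\bar\theta\gamma_5\theta\,\bar\theta\psi_R\,\bar\theta\slashed{\partial}\psi_L = \frac{1}{4}(\bar\theta\gamma_5\theta)^2\,\bar\psi_R\slashed{\partial}\psi_L
\tag{12.275}
\]
\[
\bar\theta\gamma_5\theta\,\bar\theta\psi_L\,\bar\theta\slashed{\partial}\psi_R = -\frac{1}{4}(\bar\theta\gamma_5\theta)^2\,\bar\psi_L\slashed{\partial}\psi_R
\tag{12.276}
\]
\[
\bar\theta(1-\gamma_5)\theta\;\bar\theta(1+\gamma_5)\theta = -2(\bar\theta\gamma_5\theta)^2
\tag{12.277}
\]
\[
\bar\theta\gamma_5\gamma^\mu\theta\;\bar\theta\gamma_5\gamma^\nu\theta = -g^{\mu\nu}(\bar\theta\gamma_5\theta)^2
\tag{12.278}
\]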

Using these in (12.274) we find

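\[
\begin{aligned}
\frac{1}{2}[\Phi^*\Phi]_{\theta^4} &= \frac{1}{8}(\bar\theta\gamma_5\theta)^2\Big\{\frac{1}{2}(\phi^*\Box\phi + \phi\Box\phi^*) - \partial^\mu\phi^*\partial_\mu\phi - 2\mathcal{F}^*\mathcal{F} - i(\bar\psi_R\slashed{\partial}\psi_L + \bar\psi_L\slashed{\partial}\psi_R)\Big\} \\
&= \frac{1}{8}(\bar\theta\gamma_5\theta)^2\Big\{\frac{1}{2}(\phi^*\Box\phi + \phi\Box\phi^*) - \partial^\mu\phi^*\partial_\mu\phi - 2\mathcal{F}^*\mathcal{F} - i\bar\psi\slashed{\partial}\psi\Big\}
\end{aligned}
\tag{12.279}
\]
which should be compared with $-\frac{1}{4}(\bar\theta\gamma_5\theta)^2(D(x) - \frac{1}{2}\Box C(x))$, with $C(x) = \frac{1}{2}\phi^*\phi$. This
gives the desired D-term:
\[
D = \partial^\mu\phi^*\partial_\mu\phi + \mathcal{F}^*\mathcal{F} + \frac{i}{2}\bar\psi\slashed{\partial}\psi
\tag{12.280}
\]
which is exactly the free Lagrangian (12.149) studied earlier for a massless complex
spin-0 scalar φ and a massless spin-1/2 Majorana field ψ, together with an auxiliary
complex scalar $\mathcal{F}$ with (at this stage) no interesting dynamics.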
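The comparison that yields (12.280) can be written out term by term. The following sketch divides out the common factor $(\bar\theta\gamma_5\theta)^2$ and uses only (12.279), the identification $C = \frac{1}{2}\phi^*\phi$, and the product rule for the d'Alembertian:

```latex
\begin{aligned}
\Box(\phi^*\phi) &= \phi^*\Box\phi + \phi\Box\phi^* + 2\,\partial^\mu\phi^*\partial_\mu\phi, \\
-\tfrac{1}{4}\Big(D - \tfrac{1}{2}\Box C\Big)
 &= -\tfrac{1}{4}D + \tfrac{1}{16}(\phi^*\Box\phi + \phi\Box\phi^*) + \tfrac{1}{8}\partial^\mu\phi^*\partial_\mu\phi, \\
\tfrac{1}{8}\Big\{\tfrac{1}{2}(\phi^*\Box\phi + \phi\Box\phi^*) - \partial^\mu\phi^*\partial_\mu\phi - 2\mathcal{F}^*\mathcal{F} - i\bar\psi\slashed{\partial}\psi\Big\}
 &= \tfrac{1}{16}(\phi^*\Box\phi + \phi\Box\phi^*) - \tfrac{1}{8}\partial^\mu\phi^*\partial_\mu\phi - \tfrac{1}{4}\mathcal{F}^*\mathcal{F} - \tfrac{i}{8}\bar\psi\slashed{\partial}\psi .
\end{aligned}
% Equating the two right-hand sides, the \Box terms cancel and
D = \partial^\mu\phi^*\partial_\mu\phi + \mathcal{F}^*\mathcal{F} + \tfrac{i}{2}\bar\psi\slashed{\partial}\psi .
```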
The other term in the general action (12.272) comes from the F-term of a polynomial
f in Φ, typically called the superpotential (and at most cubic for renormalizability).
Recall that the component field decomposition for a general chiral field reads
$\Phi = \phi(x_+) - \sqrt{2}\,\theta_L^T\epsilon\psi_L(x_+) + \mathcal{F}(x)\,\theta_L^T\epsilon\theta_L$, so the F-term corresponds to the term
quadratic in θL :

\[
[f(\Phi)]_{\theta_L^2} = \frac{\partial^2 f(\phi(x))}{\partial\phi^2}\,\theta_L^T\epsilon\psi_L(x)\;\theta_L^T\epsilon\psi_L(x) + \frac{\partial f(\phi)}{\partial\phi}\,\mathcal{F}(x)\,\theta_L^T\epsilon\theta_L
\tag{12.281}
\]

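Recall the form of a Majorana spinor $\theta^T = (\theta_2^*, -\theta_1^*, \theta_1, \theta_2)$, from which we obtain
\[
\theta_L^T\epsilon\theta_L = 2\theta_1^*\theta_2^*
\]
\[
\theta_L^T\epsilon\psi_L = \theta_1^*\psi_2^* - \theta_2^*\psi_1^*
\]
\[
(\theta_L^T\epsilon\psi_L)^2 = 2\theta_1^*\theta_2^*\psi_2^*\psi_1^* = -\frac{1}{2}\,\theta_L^T\epsilon\theta_L\,(\bar\psi)_L\psi_L
\]
allowing us to read off the coefficient of $\theta_L^T\epsilon\theta_L$ in (12.281):

\[
[f(\Phi)]_F = \mathcal{F}(x)\,\frac{\partial f}{\partial\phi}(\phi(x)) - \frac{1}{2}\frac{\partial^2 f}{\partial\phi^2}(\phi(x))\,\bar\psi_L\psi_L
\tag{12.282}
\]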
Adding the hermitian adjoint, the total Lagrange density corresponding to the action
(12.272) thus becomes
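\[
\begin{aligned}
\mathcal{L} = {} & \partial_\mu\phi^*\partial^\mu\phi + \mathcal{F}^*\mathcal{F} + \frac{i}{2}\bar\psi\slashed{\partial}\psi + \mathcal{F}\,\frac{\partial f}{\partial\phi} + \mathcal{F}^*\Big(\frac{\partial f}{\partial\phi}\Big)^* \\
& - \frac{1}{2}\frac{\partial^2 f}{\partial\phi^2}\,\bar\psi_L\psi_L - \frac{1}{2}\Big(\frac{\partial^2 f}{\partial\phi^2}\Big)^*(\bar\psi_L\psi_L)^*
\end{aligned}
\tag{12.283}
\]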
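The Euler–Lagrange equation allows us to eliminate the non-dynamical field $\mathcal{F}$
\[
\frac{\partial\mathcal{L}}{\partial\mathcal{F}} = 0 \;\Rightarrow\; \mathcal{F}(x) = -\Big(\frac{\partial f(\phi)}{\partial\phi}\Big)^*
\tag{12.284}
\]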
whereupon the Lagrangian becomes

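\[
\mathcal{L} = \partial_\mu\phi^*\partial^\mu\phi + \frac{i}{2}\bar\psi\slashed{\partial}\psi - P(\phi) - \frac{1}{2}\frac{\partial^2 f}{\partial\phi^2}\,\bar\psi_L\psi_L - \frac{1}{2}\Big(\frac{\partial^2 f}{\partial\phi^2}\Big)^*(\bar\psi_L\psi_L)^*
\tag{12.285}
\]
with $P(\phi) \equiv \left|\frac{\partial f(\phi)}{\partial\phi}\right|^2$.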
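The elimination of the auxiliary field in (12.284) and (12.285) can be illustrated numerically at a single spacetime point, where the auxiliary field and ∂f/∂φ are just complex numbers. The sketch below is illustrative only: the function names are hypothetical, and it checks just the algebraic statement that the three auxiliary-field terms of (12.283) collapse to −|∂f/∂φ|² on the solution (12.284).

```python
def auxiliary_terms(F, fp):
    """Auxiliary-field terms of (12.283): F*F + F f' + F* (f')*.

    F  -- value of the complex auxiliary field at one point
    fp -- value of df/dphi at the same point
    The sum is real by construction; .real discards rounding noise.
    """
    total = F.conjugate() * F + F * fp + F.conjugate() * fp.conjugate()
    return total.real

def on_shell_F(fp):
    """Solution (12.284) of the auxiliary-field equation of motion."""
    return -fp.conjugate()

def eliminated_value(fp):
    """Value of the auxiliary terms after inserting (12.284)."""
    return auxiliary_terms(on_shell_F(fp), fp)
```

Shifting the auxiliary field away from the solution by a complex amount d changes the terms by |d|² with no linear piece, which is the statement that (12.284) is the stationary point.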
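In general, there will be a stationary point of f (φ) where $\frac{\partial f}{\partial\phi} = 0$, putting the polynomial
P (φ) at its absolute minimum. In this case, SUSY is an unbroken global symmetry:
\[
\langle 0|[\bar\xi_L Q, \psi_L]|0\rangle \propto \langle 0|\mathcal{F}|0\rangle \propto \langle 0|\frac{\partial f}{\partial\phi}|0\rangle = 0
\tag{12.286}
\]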
implying that the generators Q annihilate the vacuum. There are ways to evade this
however (e.g., see (Weinberg, 1995b), Section 26.5, for a discussion of O’Raifeartaigh
breaking). One can show that if SUSY is unbroken at the lowest order, it will remain
unbroken to all orders of perturbation theory. Obviously, given the notable absence of
superpartners of equal mass to the known elementary particles, broken supersymmetry
is clearly the norm, if indeed supersymmetry is present in Nature at all.

Finally, note that since the fundamental SUSY algebra (12.191) implies that the
Hamiltonian P0 can be constructed as a product of Q and Q̄ operators, the exact
vacuum energy must vanish if SUSY is unbroken (i.e., the disconnected vacuum
energy graphs which determine the shift in vacuum energy due to interactions must
cancel identically between bosonic and fermionic loop contributions to all orders of
perturbation theory).
We conclude our abbreviated survey of supersymmetry by giving a simple explicit
example: historically, the first four-dimensional field theory in which supersymme-
try was demonstrated, the Wess–Zumino model (1974), obtained by taking for the
superpotential

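\[
f(\phi) = \frac{1}{2}m\phi^2 + \frac{\sqrt{2}}{3}\lambda\phi^3
\tag{12.287}
\]
\[
\frac{\partial f}{\partial\phi} = m\phi + \sqrt{2}\,\lambda\phi^2
\tag{12.288}
\]
\[
\frac{\partial^2 f}{\partial\phi^2} = m + 2\sqrt{2}\,\lambda\phi
\tag{12.289}
\]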

We can re-express the complex scalar $\phi = \frac{1}{\sqrt{2}}(A + iB)$, where A, B are real (i.e.,
hermitian) scalar fields. Then

\[
P(\phi) = \left|\frac{\partial f}{\partial\phi}\right|^2 = \frac{1}{2}m^2(A^2 + B^2) + m\lambda A(A^2 + B^2) + \frac{\lambda^2}{2}(A^2 + B^2)^2
\tag{12.290}
\]

while the Yukawa interaction terms become

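\[
\begin{aligned}
-\frac{1}{2}\frac{\partial^2 f}{\partial\phi^2}\,\bar\psi_L\psi_L - \frac{1}{2}\Big(\frac{\partial^2 f}{\partial\phi^2}\Big)^*(\bar\psi_L\psi_L)^*
 &= -\frac{1}{2}m(\bar\psi_L\psi_L + \bar\psi_R\psi_R) - \sqrt{2}\,\lambda\,\frac{A + iB}{\sqrt{2}}\,\bar\psi_L\psi_L - \sqrt{2}\,\lambda\,\frac{A - iB}{\sqrt{2}}\,\bar\psi_R\psi_R \\
 &= -\frac{1}{2}m\bar\psi\psi - \lambda A\bar\psi\psi - i\lambda B\bar\psi\gamma_5\psi
\end{aligned}
\tag{12.291}
\]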

The complete Lagrangian for this theory is thus

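\[
\begin{aligned}
\mathcal{L} = {} & \frac{1}{2}(\partial_\mu A)^2 + \frac{1}{2}(\partial_\mu B)^2 - \frac{1}{2}m^2(A^2 + B^2) + \frac{i}{2}\bar\psi\slashed{\partial}\psi - \frac{m}{2}\bar\psi\psi \\
& - \lambda A\bar\psi\psi - i\lambda B\bar\psi\gamma_5\psi - m\lambda A(A^2 + B^2) - \frac{\lambda^2}{2}(A^2 + B^2)^2
\end{aligned}
\tag{12.292}
\]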
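The reduction of P(φ) to the real fields A and B in (12.290), which supplies the scalar mass and self-interaction terms of (12.292), can be cross-checked numerically. This is a minimal sketch with hypothetical function names: one function evaluates |∂f/∂φ|² directly from (12.288), the other evaluates the right-hand side of (12.290), and the two should agree for any real A, B, m, λ.

```python
import math

def potential_from_superpotential(A, B, m, lam):
    """|df/dphi|^2 with df/dphi from (12.288) and phi = (A + iB)/sqrt(2)."""
    phi = complex(A, B) / math.sqrt(2)
    df = m * phi + math.sqrt(2) * lam * phi ** 2
    return abs(df) ** 2

def potential_in_components(A, B, m, lam):
    """Right-hand side of (12.290) in terms of the real fields A and B."""
    r2 = A * A + B * B
    return 0.5 * m * m * r2 + m * lam * A * r2 + 0.5 * lam * lam * r2 * r2
```

The quadratic term shows that A and B both have mass m, matching the fermion mass term in (12.292).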

The overall 1/2 in the fermion kinetic term is the appropriate normalization for a self-
conjugate Majorana field (recall the similar factor of 1/2 when we go from complex
to real scalar fields). Given that, we see that the scalar and spin-1/2 fields correspond
to particles of equal bare mass m.17 Of course, if we set m = λ = 0, and return to
the complex field $\phi = \frac{1}{\sqrt{2}}(A + iB)$, we recover the simple Lagrangian (12.149) for free
massless scalars and a Majorana fermion which served as our first explicit example of
a theory exhibiting the super-Poincaré algebra.

12.7 Problems
1. Show that the interaction Hamiltonian density (12.1) is non-ultralocal: i.e., its
equal-time commutator $[\mathcal{H}_{int}(\vec x, t), \mathcal{H}_{int}(\vec y, t)]$ contains a gradient of a δ-function
(cf. (5.81)).
2. In the derivatively-coupled theory defined by interaction Hamiltonian density
(12.11), the interaction part V of the full Hamiltonian is given by

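\[
V = \int\Big\{g\,\bar\psi\gamma^\mu\gamma_5\psi\,\partial_\mu\phi(\vec y, 0) + \frac{1}{2}g^2\big(\bar\psi\gamma^0\gamma_5\psi(\vec y, 0)\big)^2\Big\}\,d^3y
\tag{12.293}
\]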
with all the (unsubscripted) fields in interaction picture and taken at time t = 0.
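(a) Show that
\[
\begin{aligned}
\frac{\partial}{\partial t}\phi_H(\vec x, t) &= e^{iHt}e^{-iH_0t}\,\dot\phi(\vec x, t)\,e^{iH_0t}e^{-iHt} + e^{iHt}\,i[V, \phi(\vec x, 0)]\,e^{-iHt} \\
&= \pi^\phi_H(\vec x, t) + e^{iHt}\,i[V, \phi(\vec x, 0)]\,e^{-iHt}
\end{aligned}
\tag{12.294}
\]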

(b) Use the equal-time commutation relations (12.14) to evaluate the commutator
in part (a), thereby recovering (12.33):
\[
\frac{\partial}{\partial t}\phi_H(\vec x, t) = \pi^\phi_H(\vec x, t) + g\,\bar\psi_H\gamma^0\gamma_5\psi_H(\vec x, t)
\tag{12.295}
\]

3. Show that the field equation for ψH (or ψ̄H ) obtained by applying the first of
the Hamiltonian equations (12.31) to the Hamiltonian (12.22) is equivalent, after
taking its adjoint, to the Dirac Heisenberg field equation (12.36) for ψH .
4. The Hamiltonian density for a massive neutral vector field, described by a Lorentz
vector field Zμ , coupled to a four-vector source field J μ , which involves fields other
than Zμ (for example, we might have J μ = eψ̄γ μ ψ, with ψ a Dirac field) is given
by

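\[
\mathcal{H} = \mathcal{H}_0 + \mathcal{H}_{int}
\tag{12.296}
\]
\[
\mathcal{H}_0 = \frac{1}{2}\Big\{\vec\pi^2 + |\vec\nabla\times\vec Z|^2 + m^2\vec Z^2 + \frac{1}{m^2}(\vec\nabla\cdot\vec\pi)^2\Big\}
\tag{12.297}
\]
\[
\mathcal{H}_{int} = -\frac{1}{m^2}J^0\,\vec\nabla\cdot\vec\pi - \vec J\cdot\vec Z + \frac{1}{2m^2}(J^0)^2
\tag{12.298}
\]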

17 Again anticipating later discussions of renormalizability, it can be shown that the underlying SUSY
symmetry ensures that counterterms are induced by radiative corrections only in the D-terms, not in the
F-terms. Accordingly, there are no mass or coupling renormalizations in this theory: the bare m, λ can be
chosen at their fixed physical values! However, as the kinetic part of the Lagrangian derives from the D-term
of (12.272), there is a non-trivial wavefunction (field) renormalization: in other words, Z ≠ 1.

where $\pi_i$ is the momentum field conjugate to $Z_i$ (i = 1, 2, 3). From Hamilton’s
equation, we easily find $\pi_i = K^{-1}_{ij}(\dot Z_j - \frac{1}{m^2}\partial_j J^0)$, where $K_{ij} \equiv \delta_{ij} - \frac{1}{m^2}\partial_i\partial_j$.
(a) Perform the Legendre transform to arrive at a Lagrangian density given as a
function of $\vec Z$, $\dot{\vec Z}$, $J^0$, and $\vec J$. The result is non-local, and horribly non-covariant
in appearance (sanity will be restored in part (d))! Note that the Lagrange
density may always be thought of as defining an action via a spacetime integral
(e.g., in the Lagrangian form of the functional integral (12.60)), so you are
always permitted to rearrange derivatives in each term by free use of integration
by parts.
(b) Next, show that the result obtained in (a) is completely equivalent to that
obtained starting with the manifestly local and Lorentz-invariant Lagrangian

\[
\mathcal{L} = -\frac{1}{4}F_{\mu\nu}F^{\mu\nu} + \frac{m^2}{2}Z_\mu Z^\mu - J^\mu Z_\mu, \qquad F_{\mu\nu} \equiv \partial_\mu Z_\nu - \partial_\nu Z_\mu
\tag{12.299}
\]
Proceed as follows. First, show that Z 0 is a dependent field: namely one that
can be expressed, via the Euler–Lagrange equations of motion, uniquely and
entirely in terms of the other canonical fields and their conjugate momenta at
the same time. Do this by writing down the equation of motion for Z 0 and
showing that it reduces to
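\[
Z^0 = \frac{1}{m^2}(J^0 - \vec\nabla\cdot\vec\pi), \qquad \pi_i = \frac{\partial\mathcal{L}}{\partial\dot Z_i} = \dot Z_i - \partial_i Z^0
\tag{12.300}
\]
Show that (12.300) implies the formula $\pi_i = K^{-1}_{ij}(\dot Z_j - \frac{1}{m^2}\partial_j J^0)$ obtained
previously in the Hamiltonian framework. Finally, eliminate $Z^0$ completely from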
the Lagrangian (12.299), and show that the resultant expression agrees with
that found in part (a).
(c) Now carry out the canonical procedure in the usual direction, by starting with
the Lagrangian, and eliminating $\dot{\vec Z}$ in favor of $\vec\pi$ in $\mathcal{H} = \vec\pi\cdot\dot{\vec Z} - \mathcal{L}$, to check that
the original Hamiltonian (12.296–12.298) is recovered.
(d) In the functional integral approach, the fact that Z 0 is a dependent field
manifests itself in the Gaussian dependence of the Lagrangian density on Z 0
(and its spatial derivatives). Show that if we explicitly integrate out Z 0 in the
path integral
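\[
\int\mathcal{D}Z^\mu\,e^{\,i\int\left(-\frac{1}{4}F_{\mu\nu}F^{\mu\nu} + \frac{1}{2}m^2 Z_\mu Z^\mu - J_\mu Z^\mu\right)d^4x} \;\to\; \int\mathcal{D}\vec Z\,e^{\,i\int\mathcal{L}(\vec Z, \dot{\vec Z}, J^0, \vec J)\,d^4x}
\tag{12.301}
\]
the resultant Lagrangian $\mathcal{L}$ is exactly the expression found in part (a). (Hint:
note the identity $K^{-1}_{ij} = \delta_{ij} + \frac{1}{-\Delta + m^2}\partial_i\partial_j$.) Its disgustingly non-covariant (and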

non-local) appearance is seen to arise from the fact that the Legendre transform
from the Hamiltonian side naturally produces a Lagrangian density with depen-
dent fields eliminated—which we see clearly in this instance serves to disguise
the underlying Lorentz-invariance of the theory. The same situation arises when
redundant fields, associated with local gauge symmetries, are present, as we

shall see in Chapter 15. Of course, the sensible way to ensure Lorentz-invariance
is to start with a Lorentz-invariant Lagrangian, from which we go (if needed)
to the Hamiltonian.
5. The Hamiltonian density for a massless gauge vector field Aμ , coupled to a
conserved current J μ (which, as in Problem 4, we assume to depend on a separate
set of fields), is given in the axial gauge A3 = 0 by

H = H0 + Hint (12.302)
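\[
\mathcal{H}_0 = \frac{1}{2}\pi_i\Big(\delta_{ij} + \frac{\partial_i\partial_j}{\partial_3^2}\Big)\pi_j + \frac{1}{2}|\vec\nabla\times\vec A|^2
\tag{12.303}
\]
\[
\mathcal{H}_{int} = \partial_i\pi_i\,\frac{1}{\partial_3^2}\,J^0 - \frac{1}{2}J^0\,\frac{1}{\partial_3^2}\,J^0 - J_i A_i
\tag{12.304}
\]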
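where the spatial indices i, j run over the values 1, 2.
(a) Perform the Legendre transformation to obtain the Lagrange density
\[
\mathcal{L} = \frac{1}{2}\dot A_i^2 + \frac{1}{2}(\partial_i\dot A_i - J^0)\,\frac{1}{\Delta}\,(\partial_j\dot A_j - J^0) - \frac{1}{2}|\vec\nabla\times\vec A|^2 + J_i A_i
\tag{12.305}
\]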
(b) Starting instead with the Lagrangian
\[
\mathcal{L}_{QED} = -\frac{1}{4}F_{\mu\nu}F^{\mu\nu} - J^\mu A_\mu, \qquad F_{\mu\nu} \equiv \partial_\mu A_\nu - \partial_\nu A_\mu
\tag{12.306}
\]
show that the equation of motion for A0 implies that it is a dependent field:
\[
\Delta A^0 = \partial_i\dot A_i - J^0 \;\Rightarrow\; A^0 = \frac{1}{\Delta}(\partial_i\dot A_i - J^0) = \frac{1}{\partial_3^2}(\partial_i\pi_i - J^0)
\tag{12.307}
\]

Setting A3 = 0 (as we are allowed to do via a suitable gauge-transformation Aμ →
Aμ − ∂μ Λ: we are anticipating here the detailed discussion of gauge symmetry and
gauge transformations that will follow in Chapter 15), and then eliminating A0 via
(12.307), show that one recovers the result (12.305) found in part (a).
6. Show that the boost generators M 0i , i = 1, 2, 3, as given by the Noether formula,
generate the correct transformation of a set of fields φn lying in a definite
representation of the HLG (possibly reducible) by computing the commutator
$[M^{0i}, \phi_n(\vec x, t)]$. You will find (12.25) and (12.26) useful: also, to avoid subtleties
associated with dependent fields or constraints (cf. Problems 4 and 5), assume
that the Hamiltonian density $\mathcal{H}(\pi_n, \phi_n, \vec\nabla\phi_n)$ does not depend on gradients of the
conjugate momentum field (i.e., on the $\vec\nabla\pi_n$).
7. The charges of an internal symmetry duplicate the Lie algebra of the original matrix
transformation group on the fields. Let tα be the matrix generators of some Lie
group (in some representation), with structure constants fαβγ :

[tα , tβ ] = ifαβγ tγ (12.308)


The corresponding Noether charges are $Q_\alpha = \int d^3x\,J^0_\alpha(\vec x, t) = -i\int d^3x\,\pi_n(t_\alpha\phi)_n$
(cf. (12.127)). Verify, using the canonical equal-time commutation relations of the

fields φn and their conjugate momenta πn , the commutation relations for the charge
densities

\[
[J^0_\alpha(\vec x, t), J^0_\beta(\vec y, t)] = if_{\alpha\beta\gamma}\,\delta^3(\vec x - \vec y)\,J^0_\gamma(\vec x, t)
\tag{12.309}
\]

from which it follows that the Qα satisfy the Lie algebra

[Qα , Qβ ] = ifαβγ Qγ (12.310)

8. Rederive the Ward–Takahashi identity (12.142) using operator methods. One may
assume without loss of generality that the y1 , y2 , ..yp are already time-ordered. One
can then write the T-product as a sum of terms with explicit θ-functions enforcing
the time-ordering of the current relative to the φn fields, thereby facilitating the
application of the spacetime-derivative. The contact terms arise from the μ = 0
derivative, which result in a series of equal-time commutators, at which point
(12.128) may be employed.
9. Verify that the variation in the Lagrangian density (12.149) induced by the
infinitesimal SUSY transformations listed in (12.157) is a space-time divergence,
as given in (12.158). The Grassmann identity (C.12) from Appendix C will be
useful here.
10. Here, we shall check that the SUSY current (12.154) is indeed the appropriate
conserved current of the form (12.82), in a situation in which there is a non-trivial
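variation in the Lagrangian density (12.149) (given by a spacetime divergence).
From the SUSY variations (again ignoring the noisome $\sqrt{2}$ normalization) of the
fields given in (12.157), and $\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)} = \partial^\mu\phi^*$, $\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi^*)} = \partial^\mu\phi$, and $\frac{\partial\mathcal{L}}{\partial(\partial_\mu\psi)} = \frac{i}{2}\bar\psi\gamma^\mu$,
show that the conventional Noether current takes the form (including the Grass-
mann infinitesimal ξ)

\[
J^\mu_{Noeth} = i(\partial^\mu\phi^*)\,\bar\xi P_L\psi + i(\partial^\mu\phi)\,\bar\xi P_R\psi + \frac{i}{2}\bar\psi\gamma^\mu(\slashed{\partial}\phi)P_R\xi + \frac{i}{2}\bar\psi\gamma^\mu(\slashed{\partial}\phi^*)P_L\xi
\tag{12.311}
\]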
Show that the correct conserved current J μ in (12.154) is obtained from this by
adding in the K μ correction term arising from (12.158). Some of the Grassmann
identities from Appendix C (specifically, (C.12) and (C.13)) will be useful in
interchanging the order ψ̄ · ·ξ → ξ¯ · ·ψ.
11. In this exercise we shall verify explicitly the anticommutation algebra of the
supersymmetry generators for the theory given by Lagrangian (12.149), as given in
(12.147). The fundamental equal-time (anti)commutation relations of the theory
(12.151,12.152,12.153) will be used: note that if φa , φb (resp. ψ, ψ̄) are bosonic
(resp. fermionic) fields, then the anticommutator {φa ψ, ψ̄φb } can be rearranged
into φa {ψ, ψ̄}φb − ψ̄[φa , φb ]ψ. A short calculation then shows that the anticom-
mutator {Qα , Q̄β } can be written as the sum of a bosonic and fermionic part,
where for example

\[
\{Q_\alpha, \bar Q_\beta\}_{bos} = 2\int d^3x\,\big(\slashed{\partial}\phi\,\gamma^0 P_R\,\slashed{\partial}\phi^* + \slashed{\partial}\phi^*\,\gamma^0 P_L\,\slashed{\partial}\phi\big)_{\alpha\beta}
\tag{12.312}
\]

The energy-momentum four-vector $P^\mu$ is given, from (12.147) by

\[
P^\mu = \frac{1}{8}\gamma^\mu_{\beta\alpha}\{Q_\alpha, \bar Q_\beta\}
\tag{12.313}
\]
Using (12.313), show that the Hamiltonian $P^0$ and spatial momentum $\vec P$ obtained
from the SUSY anticommutator agree precisely with the expected Noether
expressions (12.150). (Again, see Appendix C for useful Majorana identities.)
13
Symmetries II: Discrete spacetime
symmetries

The invariance of a quantum field theory under the physically realizable transfor-
mations embodied in the Poincaré group—the proper orthochronous elements of the
homogeneous Lorentz group corresponding to a realizable change of inertial frame,
together with spacetime translations—has been built into the foundations of the
theory from the very beginning, and many decades of experimental investigation
at the subatomic level (where effects of gravitation can safely be neglected) have
confirmed that this symmetry, if broken, can only fail at an extremely subtle and
quantitatively minute level. Formally, the Lorentz group admits an obvious extension
if we allow the discrete operations of space and time reflection, corresponding to the
improper (because det(Λ) = −1) Lorentz transformations1 ΛP = diag(1, −1, −1, −1)
and ΛT = diag(−1, 1, 1, 1), respectively. These operations generate potential symme-
tries of a quantum field theory, but we must emphasize the term “potential” here, for
the simple reason that only some of the interactions in Nature (specifically, the strong
and electromagnetic) appear to possess an exact invariance under spatial (parity)
and temporal reflection (time reversal). The failure of parity invariance in the weak
interactions, discovered by Lee and Yang in the mid-1950s, came as a great shock at
the time, but we have since come to realize that Nature does not share the human
prejudice that the laws of Nature should take exactly the same form whether expressed
in a left- or right-handed coordinate system.
Although this chapter is entitled “discrete spacetime symmetries”, it turns out that
it is essential to include the discrete symmetry of charge conjugation—the interchange
of particles and antiparticles—in almost the same breath when discussing parity
and time reversal, even though the connection to “spacetime” is less than obvious
for a symmetry involving the swapping of particles for antiparticles. The reason
for this was already indicated in an heuristic fashion in Chapter 3, where we saw
that the assumptions of relativistic invariance, quantum mechanics, and locality of
interactions implied that exchange of a virtual particle is physically indistinguishable
from the spatiotemporally reflected exchange of the corresponding antiparticle. Our
main objective in this chapter will be to show that the combination of these three
discrete operations—parity, time reversal, and charge conjugation—is necessarily an
exact symmetry of any local relativistic quantum field theory, even when (as is the
case in the Standard Model of elementary particles as it presently stands) none of

1 Note that ΛP and ΛT still satisfy ΛT gΛ = g.



the individual operations represents an exact symmetry of the theory. This result—
commonly referred to as the TCP theorem—will be obtained first for the case of
theories with a dynamics specified by a Lagrangian (or Hamiltonian) density, and
secondly, in a more general framework, by appealing to the irreducible fundamental
properties of local field theory as incorporated in the Wightman axioms of Section 9.2.
As a natural concomitant of the axiomatic proof of the TCP theorem, we shall also
indicate how the Spin-Statistics connection (previously discussed in Section 7.3, in
the context of Hamiltonian densities built from polynomials in local fields) also arises
rigorously from the spectral and locality axioms of the Wightman formulation of field
theory, with no commitment needed to a specific Lagrangian/Hamiltonian dynamics.

13.1 Parity properties of a general local covariant field


Consider an improper (determinant –1) Lorentz transformation ΛP defined by the
transformation x → −x, x0 → x0 . The reversal of spatial directions implies that both
x and p change sign under parity, while the orbital angular momentum vector x × p is
invariant, a property which we shall assume applies also to spin angular momentum.
We shall start by defining the parity operation for the “bare” states $|\vec k, \sigma\rangle$ created
by the interaction picture free fields. Define a unitary parity operator P ≡ U (ΛP ) as
follows: on the vacuum, we have simply

\[
P|0\rangle = |0\rangle
\]

while on a single particle state

\[
P|\vec k, \sigma\rangle = \eta^*|-\vec k, \sigma\rangle, \qquad |\eta| = 1
\tag{13.1}
\]

where the complex unimodular number η is called the “intrinsic parity” of the given
particle. Likewise, for a two-particle state

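\[
P|\vec k_1, \sigma_1; \vec k_2, \sigma_2\rangle = \eta_1^*\eta_2^*\,|-\vec k_1, \sigma_1; -\vec k_2, \sigma_2\rangle
\tag{13.2}
\]
\[
\Rightarrow\; P a^\dagger(\vec k_1, \sigma_1)|\vec k_2, \sigma_2\rangle = \eta_1^* a^\dagger(-\vec k_1, \sigma_1)\,\eta_2^*\,|-\vec k_2, \sigma_2\rangle
\tag{13.3}
\]
\[
= \eta_1^* a^\dagger(-\vec k_1, \sigma_1)\,P|\vec k_2, \sigma_2\rangle
\tag{13.4}
\]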

and in general, given the obvious generalization of (13.2) to general multi-particle
states, it is obvious that we will have

Pa† (k, σ) = η ∗ a† (−k, σ)P (13.5)

leading to the following pair of transformation equations for the creation and annihi-
lation operators (note: P is unitary)

\[
P a^\dagger(\vec k, \sigma)P^{-1} = \eta^* a^\dagger(-\vec k, \sigma)
\tag{13.6}
\]
\[
P a(\vec k, \sigma)P^{-1} = \eta\,a(-\vec k, \sigma)
\tag{13.7}
\]

Note that we must in general allow for the charge conjugate antiparticle to have a
different intrinsic parity (more on this later):

$$P a^c(\vec{k},\sigma) P^{-1} = \eta^c\,a^c(-\vec{k},\sigma) \tag{13.8}$$

Of course, an exactly similar procedure can be used to define parity operators Pin
(resp. Pout ) acting on the in- (resp. out-)states of the theory, simply by appending a
subscript “in” or “out” to the state vectors and creation–annihilation operators in the
preceding. The presence of an exact parity symmetry (“parity conservation”) in the
theory then amounts to the statement that a single parity operator Pin = Pout effects
the parity transformation for both sets of asymptotic states. This occurs in the event
that the full Hamiltonian of the theory commutes with the parity operator P defined
on the interaction-picture states as above. Namely, if we can choose the intrinsic parity
quantum numbers η for the participating particles such that [P, H0 ] = [P, V ] = 0, it
follows in the usual way that [P, Vint ] = 0, and hence [P, S] = 0. This in turn implies

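$$S_{\beta\alpha} = \langle\beta|S|\alpha\rangle = \langle\beta|P^{-1}SP|\alpha\rangle = \eta_\beta\eta_\alpha^{*}\,\langle\beta'|S|\alpha'\rangle = \eta_\beta\eta_\alpha^{*}\,S_{\beta'\alpha'} \tag{13.9}$$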

where the prime indicates states with reversed momenta but unchanged spins (see
Fig. 13.1 for an example). Exactly the same result obtains by applying a similarity
operation to the sequence of out annihilation and in creation operators whose vacuum
expectation value expresses the above S-matrix element, and using the identity
of $P_{\rm in}$ and $P_{\rm out}$. As a consequence of the unimodularity of the intrinsic parities,
$|S_{\beta\alpha}| = |S_{\beta'\alpha'}|$, which generally prevents the appearance in the scattering amplitudes
of mixtures of scalar and pseudoscalar functions of spin and momentum (for many
explicit examples, see Chapter 3 of the excellent book of Sakurai (Sakurai, 1964)).
As usual, we build symmetries into the Hamiltonian (or Lagrangian) density H
(resp. L) by constructing fields which have simple transformation properties under the
symmetry group—in this case, the discrete parity transformation. As above, we begin
by considering the effect of the parity operation as defined on a free interaction-picture
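(or in- or out-)field. We determined in Section 7.3 the general form of a free local
covariant field $\phi^{AB}(x)$ transforming according to an irreducible (AB) representation
of the Lorentz group, and restate the final result (7.84) here:

$$\phi^{AB}_{ab}(x) = \int \frac{d^3k}{(2\pi)^{3/2}\sqrt{2E(k)}} \sum_\sigma \Big( u^{AB}_{ab}(\vec{k},\sigma)\,e^{-ik\cdot x}\,a(\vec{k},\sigma) + (-)^{2B}(-)^{j-\sigma}\,u^{AB}_{ab}(\vec{k},-\sigma)\,e^{ik\cdot x}\,a^{c\dagger}(\vec{k},\sigma) \Big) \tag{13.10}$$

From (13.6–13.8) we find immediately for the transformation of this field under P

$$P\phi^{AB}_{ab}(\vec{x},t)P^{-1} = \int \frac{d^3k}{(2\pi)^{3/2}\sqrt{2E}} \sum_\sigma \Big( u^{AB}_{ab}(\vec{k},\sigma)\,e^{-ik\cdot x}\,\eta\,a(-\vec{k},\sigma) + (-)^{2B+j-\sigma}\,u^{AB}_{ab}(\vec{k},-\sigma)\,e^{ik\cdot x}\,\eta^{c*}\,a^{c\dagger}(-\vec{k},\sigma) \Big) \tag{13.11}$$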

[Figure 13.1: four scattering diagrams, labeled by the amplitudes $S_{\beta\alpha}$, $S_{P\beta\,P\alpha}$, $S_{C\beta\,C\alpha}$, and $S_{T\alpha\,T\beta}$.]

Fig. 13.1 A weak interaction process ($e^- + \mu^+ \to \nu_e + \bar{\nu}_\mu$) and its P, C, and T transforms.
The momentum (straight arrow) and spin (squiggly arrow) vectors are to be taken literally: we
have chosen an example with incoming and outgoing particles of definite helicity.

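Recall the explicit formula (7.69) derived earlier for the coefficient functions $u^{AB}_{ab}$:

$$u^{AB}_{ab}(\vec{k},\sigma) = \sum_{a'b'} (e^{-\theta\hat{k}\cdot\vec{A}})_{aa'}\,(e^{+\theta\hat{k}\cdot\vec{B}})_{bb'}\,\langle AB\,a'b'|j\sigma\rangle \tag{13.12}$$

We can reverse the sign of the momentum by inverting the boost rapidity angle θ, and
also use the Clebsch–Gordan identity $\langle AB\,a'b'|j\sigma\rangle = (-)^{A+B-j}\langle BA\,b'a'|j\sigma\rangle$, obtaining

$$u^{AB}_{ab}(-\vec{k},\sigma) = \sum_{a'b'} (e^{\theta\hat{k}\cdot\vec{A}})_{aa'}\,(e^{-\theta\hat{k}\cdot\vec{B}})_{bb'}\,(-)^{A+B-j}\langle BA\,b'a'|j\sigma\rangle = (-)^{A+B-j}\,u^{BA}_{ba}(\vec{k},\sigma) \tag{13.13}$$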
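The Clebsch–Gordan exchange identity used in (13.13) can be spot-checked numerically. The following is a minimal sketch, assuming Racah's closed formula with the Condon–Shortley phase convention (the source does not state which convention Section 7.3 uses); the function names `clebsch_gordan` and `exchange_identity_holds` are illustrative only.

```python
from fractions import Fraction
from math import factorial, sqrt


def clebsch_gordan(j1, m1, j2, m2, J, M):
    """<j1 m1 j2 m2 | J M> from Racah's closed formula (Condon-Shortley phases assumed)."""
    j1, m1, j2, m2, J, M = (Fraction(v) for v in (j1, m1, j2, m2, J, M))
    # Selection rules: projections add up, triangle inequality, integrality, |m| <= j.
    if m1 + m2 != M or not abs(j1 - j2) <= J <= j1 + j2:
        return 0.0
    if (j1 + j2 + J).denominator != 1 or (j1 - m1).denominator != 1 or (j2 - m2).denominator != 1:
        return 0.0
    if abs(m1) > j1 or abs(m2) > j2 or abs(M) > J:
        return 0.0

    def f(q):
        # Every argument passed here is a non-negative integer-valued Fraction.
        return factorial(int(q))

    triangle = (2 * J + 1) * f(J + j1 - j2) * f(J - j1 + j2) * f(j1 + j2 - J) / f(j1 + j2 + J + 1)
    weights = f(J + M) * f(J - M) * f(j1 - m1) * f(j1 + m1) * f(j2 - m2) * f(j2 + m2)
    k_min = max(0, int(j2 - J - m1), int(j1 - J + m2))
    k_max = min(int(j1 + j2 - J), int(j1 - m1), int(j2 + m2))
    total = Fraction(0)
    for k in range(k_min, k_max + 1):
        denom = (f(k) * f(j1 + j2 - J - k) * f(j1 - m1 - k) * f(j2 + m2 - k)
                 * f(J - j2 + m1 + k) * f(J - j1 - m2 + k))
        total += Fraction((-1) ** k, denom)
    return sqrt(triangle) * sqrt(weights) * float(total)


def exchange_identity_holds(A, B, j, tol=1e-12):
    """Check <A a B b | j s> = (-1)^(A+B-j) <B b A a | j s> for all projections a, b."""
    A, B, j = Fraction(A), Fraction(B), Fraction(j)
    sign = -1 if int(A + B - j) % 2 else 1
    for a in (-A + n for n in range(int(2 * A) + 1)):
        for b in (-B + n for n in range(int(2 * B) + 1)):
            lhs = clebsch_gordan(A, a, B, b, j, a + b)
            rhs = sign * clebsch_gordan(B, b, A, a, j, a + b)
            if abs(lhs - rhs) > tol:
                return False
    return True


if __name__ == "__main__":
    half = Fraction(1, 2)
    print(exchange_identity_holds(1, half, Fraction(3, 2)))
    print(exchange_identity_holds(half, half, 0))
```

For $A = B = \tfrac12$, $j = 0$ the sign $(-)^{A+B-j}$ is $-1$, which is the familiar antisymmetry of the singlet combination.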

Changing integration variable $\vec{k} \to -\vec{k}$ in (13.11), and defining the parity inverted
spacetime point $\tilde{x} = (x^0, -\vec{x})$:

$$P\phi^{AB}_{ab}(\vec{x},t)P^{-1} = (-)^{A+B-j}\int \frac{d^3k}{(2\pi)^{3/2}\sqrt{2E}} \sum_\sigma \Big( u^{BA}_{ba}(\vec{k},\sigma)\,e^{-ik\cdot\tilde{x}}\,\eta\,a(\vec{k},\sigma) + (-)^{2B-2A}\,\eta^{c*}\,(-)^{2A+j-\sigma}\,u^{BA}_{ba}(\vec{k},-\sigma)\,e^{ik\cdot\tilde{x}}\,a^{c\dagger}(\vec{k},\sigma) \Big) \tag{13.14}$$

The field appearing in (13.14) must be one of the covariant fields listed in Section
7.3 (cf. (7.84)), if a Hamiltonian constructed from such fields is to stand a chance of
commuting with the parity operator P. Indeed, it is almost the field $\phi^{BA}_{ba}$! For this to
work, however, we must choose

$$\eta^{c*}(-)^{2B-2A} = \eta \;\Rightarrow\; \eta^c = (-)^{2A-2B}\eta^{*} = (-)^{2j}\eta^{*} \tag{13.15}$$

where the last step uses the facts that A + B − j is an integer and 4B is even.
This establishes the well-known theorem that the intrinsic parity $\eta\eta^c$ of a
particle–antiparticle system is $(-)^{2j}$ (hence negative for fermions), a result critical for the
understanding of positronium decay, for example.
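As a quick illustration of the constraint (13.15), the sketch below evaluates the product $\eta\eta^c$ for an arbitrary unimodular η and confirms that it depends only on the spin. The function names are hypothetical, and the choice of phase angle is arbitrary.

```python
import cmath
from fractions import Fraction


def antiparticle_parity(eta, j):
    """eta^c = (-1)^(2j) * conj(eta), i.e. the constraint (13.15)."""
    sign = -1 if int(2 * Fraction(j)) % 2 else 1
    return sign * eta.conjugate()


def pair_intrinsic_parity(eta, j):
    """Product eta * eta^c for a particle-antiparticle pair of spin j."""
    return eta * antiparticle_parity(eta, j)


if __name__ == "__main__":
    eta = cmath.exp(0.7j)  # any unimodular phase will do
    for j in (0, Fraction(1, 2), 1, Fraction(3, 2)):
        print(j, pair_intrinsic_parity(eta, j))
```

The arbitrary phase of η drops out of the product, which is why the pair parity is a convention-independent statement.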
Incorporating the constraint (13.15), we find the desired transformation law of an
arbitrary covariant field under the parity operation:

$$P\phi^{AB}_{ab}(\vec{x},t)P^{-1} = \eta\,(-)^{A+B-j}\,\phi^{BA}_{ba}(-\vec{x},t) \tag{13.16}$$

As a specific example, for a Dirac field, with $A + B = j = \tfrac{1}{2}$, the effect of the parity
transformation evidently reverses the upper and lower components of the Dirac
4-spinor (as well as the spatial coordinate $\vec{x}$, of course), so (cf. (7.107)) we have simply

$$P\psi(\vec{x},t)P^{-1} = \eta\,\gamma_0\,\psi(-\vec{x},t) \tag{13.17}$$

The transformation rule (13.16) for the free interaction-picture fields (or, with
appropriate subscripts, in- or out-fields) applies whether or not the interacting
dynamics of the theory is invariant under parity. Only if the interaction part of the
Hamiltonian commutes with P, however, can we extend (13.16) to the full Heisenberg
fields $\phi^{AB}_{H}(\vec{x},t)$, as in this case P commutes with the transformation operator
$U(t,0) = e^{iH_0t}e^{-iHt}$ effecting the transformation from interaction to Heisenberg picture (cf.
(9.1)).² In this case, parity symmetry is equivalent to the statement that the full
Heisenberg fields satisfy

$$P\phi^{AB}_{H\,ab}(\vec{x},t)P^{-1} = \eta\,(-)^{A+B-j}\,\phi^{BA}_{H\,ba}(-\vec{x},t) \tag{13.18}$$

with $P|0\rangle = |0\rangle$ where $|0\rangle = |0\rangle_{\rm in} = |0\rangle_{\rm out}$ is the vacuum of the full Hamiltonian of the
theory. The transformation property (13.18) transfers immediately to a corresponding
transformation rule for the Wightman functions of the theory, and thence, by the
Haag–Ruelle theory, to the exact S-matrix elements, and we recover the phenomenological
constraints (13.9). In a theory like the weak interactions, on the other hand,
where parity is broken, there does not (indeed cannot) exist a single unitary operator
$P\,(= P_{\rm in} = P_{\rm out})$ effecting the transformation (13.18) on the Heisenberg fields of the
theory. Exactly the same situation holds with respect to the discrete symmetries of
charge conjugation and time reversal which we are about to consider: the reader should
keep in mind that although these operations are conveniently defined in terms of free
fields and states, their extension to the fully interacting Heisenberg fields of the theory
presupposes that the symmetry is in fact unbroken. As we shall see, the only operation
for which this is in fact the case in the real world (assuming the general validity of the
Wightman axioms) is the combined application of time-reversal, charge-conjugation,
and parity (in any order), the TCP operator.

² Of course, we must acknowledge here the usual embarrassment occasioned by Haag’s theorem: in
our appeal to the interaction picture, we presume some regularization which restores the existence of the
interaction picture, which either preserves parity or a sufficiently close simulacrum thereto that the passage
to a continuum limit is smooth vis-à-vis the parity properties of the theory.

13.2 Charge-conjugation properties of a general local covariant field
Many interactions of interest in physics (in particular, the strong, electromagnetic and
gravitational—though, as with parity, not the weak) are invariant under a symmetry
operation which interchanges particles with antiparticles. This operation is called
charge conjugation—somewhat misleadingly, as it can apply in situations where the
particles have zero electrical charge (but some other global quantum number, e.g.,
strangeness, is non-zero, and therefore distinguishes the particle from the antiparticle).
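Define a unitary operator on the space of free multi-particle states as follows ($\zeta$, $\zeta_c$
phases, or “charge conjugation quantum numbers”):

$$\begin{aligned}
C a(\vec{k},\sigma) C^{-1} &= \zeta\,a^{c}(\vec{k},\sigma) \\
C a^{\dagger}(\vec{k},\sigma) C^{-1} &= \zeta^{*}\,a^{c\dagger}(\vec{k},\sigma) \\
C a^{c}(\vec{k},\sigma) C^{-1} &= \zeta_c\,a(\vec{k},\sigma) \\
C a^{c\dagger}(\vec{k},\sigma) C^{-1} &= \zeta_c^{*}\,a^{\dagger}(\vec{k},\sigma)
\end{aligned} \tag{13.19}$$

For example, a non-self-conjugate spinless field would transform under charge
conjugation as follows:

$$C\phi(x)C^{-1} = \int \frac{d^3k}{(2\pi)^{3/2}\sqrt{2E}} \Big( \zeta\,e^{-ik\cdot x}\,a^{c}(\vec{k}) + \zeta_c^{*}\,e^{ik\cdot x}\,a^{\dagger}(\vec{k}) \Big) \tag{13.20}$$

Provided we choose $\zeta = \zeta_c^{*}$, the resultant field is simply related to the hermitian
conjugate of φ (allowing us to construct hermitian, local, and charge conjugation
invariant interactions, by balancing $\phi(x)$ and $\phi^\dagger(x)$ factors in the interaction density):

$$C\phi(x)C^{-1} = \zeta\,\phi^{\dagger}(x) \tag{13.21}$$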

The general analysis for non-zero spin is a bit messy, though straightforward. First,
we shall need the conjugation property for a general rotation matrix Dj (see, for
example, (Messiah, 1966), Appendix C, Eqs. (62, 64)):

$$Y^{(j)} D^{j}(R)\, Y^{(j)\dagger} = D^{j*}(R) \tag{13.22}$$

where $Y^{(j)}_{mm'} \equiv (-1)^{j+m}\delta_{m,-m'}$, and the complex conjugation on the right-hand side
is applied to each matrix element (no transpose!). Taking the rotation R to be
infinitesimal, $D^{j} = 1 - i\,\vec{\epsilon}\cdot\vec{J}$, this implies for the generators

$$Y^{(j)}\,\vec{J}\;Y^{(j)\dagger} = -\vec{J}^{\,*} \tag{13.23}$$

so that we have for the boost type operators

$$(e^{\theta\hat{k}\cdot\vec{J}})^{*} = Y^{(j)}\, e^{-\theta\hat{k}\cdot\vec{J}}\; Y^{(j)\dagger} \tag{13.24}$$
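The generator relation (13.23) is easy to confirm numerically for low spins. The sketch below builds the standard spin-j matrices in the basis ordered m = j, j−1, …, −j (an assumed but conventional basis choice) together with the matrix Y defined below (13.22), and checks $Y J Y^\dagger = -J^{*}$ component by component. All function names are illustrative.

```python
from fractions import Fraction
from math import sqrt


def spin_matrices(j):
    """Return (Jx, Jy, Jz) for spin j as nested lists, basis ordered m = j, j-1, ..., -j."""
    j = Fraction(j)
    dim = int(2 * j) + 1
    ms = [j - r for r in range(dim)]
    jz = [[complex(float(ms[r])) if r == c else 0j for c in range(dim)] for r in range(dim)]
    jplus = [[0j] * dim for _ in range(dim)]
    for c in range(1, dim):
        m = ms[c]  # J+ raises |m> (column c) to |m+1> (row c-1)
        jplus[c - 1][c] = complex(sqrt(float(j * (j + 1) - m * (m + 1))))
    jminus = [[jplus[c][r].conjugate() for c in range(dim)] for r in range(dim)]
    jx = [[(jplus[r][c] + jminus[r][c]) / 2 for c in range(dim)] for r in range(dim)]
    jy = [[(jplus[r][c] - jminus[r][c]) / 2j for c in range(dim)] for r in range(dim)]
    return jx, jy, jz


def y_matrix(j):
    """Y_{m m'} = (-1)^(j+m) delta_{m,-m'} in the same basis."""
    j = Fraction(j)
    dim = int(2 * j) + 1
    y = [[0j] * dim for _ in range(dim)]
    for r in range(dim):
        # m = j - r, so j + m = 2j - r, and m' = -m sits in column dim - 1 - r
        y[r][dim - 1 - r] = complex((-1) ** (int(2 * j) - r))
    return y


def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)] for r in range(n)]


def dagger(a):
    n = len(a)
    return [[a[c][r].conjugate() for c in range(n)] for r in range(n)]


def satisfies_generator_relation(j, tol=1e-12):
    """True if Y J_k Y^dagger = -conj(J_k) holds for k = x, y, z."""
    y = y_matrix(j)
    for gen in spin_matrices(j):
        lhs = matmul(matmul(y, gen), dagger(y))
        n = len(gen)
        for r in range(n):
            for c in range(n):
                if abs(lhs[r][c] + gen[r][c].conjugate()) > tol:
                    return False
    return True


if __name__ == "__main__":
    for twice_j in range(1, 6):
        print(Fraction(twice_j, 2), satisfies_generator_relation(Fraction(twice_j, 2)))
```

For spin ½ the matrix Y reduces, up to sign conventions, to the familiar $i\sigma_2$ that conjugates the Pauli matrices.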


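All of the above, of course, applies to any set of operators satisfying the angular
momentum algebra, and in particular to our A- and B-type operators forming the
HLG Lie algebra. Consequently:

$$(e^{-\theta\hat{k}\cdot\vec{A}})^{*}_{aa'} = (Y^{(A)}\, e^{\theta\hat{k}\cdot\vec{A}}\; Y^{(A)\dagger})_{aa'} = (-1)^{2A+a+a'}\,(e^{\theta\hat{k}\cdot\vec{A}})_{-a,-a'} \tag{13.25}$$

providing the information needed to derive the relevant complex conjugation property:

$$\begin{aligned}
u^{AB*}_{ab}(\vec{k},\sigma) &= \sum_{a'b'} (-)^{2A+a+a'}\,(e^{\theta\hat{k}\cdot\vec{A}})_{-a,-a'}\,(-)^{2B+b+b'}\,(e^{-\theta\hat{k}\cdot\vec{B}})_{-b,-b'}\,\langle AB\,a'b'|j\sigma\rangle \\
&= (-)^{A+B+\sigma}(-)^{A+a}(-)^{B+b} \sum_{a'b'} (e^{-\theta\hat{k}\cdot\vec{B}})_{-b,b'}\,(e^{\theta\hat{k}\cdot\vec{A}})_{-a,a'}\,\langle AB\,{-a'}\,{-b'}|j\sigma\rangle \\
&= (-)^{A+B+\sigma}(-)^{A+a}(-)^{B+b} \sum_{a'b'} (e^{-\theta\hat{k}\cdot\vec{B}})_{-b,b'}\,(e^{\theta\hat{k}\cdot\vec{A}})_{-a,a'}\,\langle BA\,b'a'|j\,{-\sigma}\rangle \\
&= (-)^{A+B-j}(-)^{j+\sigma}(-)^{A+a}(-)^{B+b}\,u^{BA}_{-b,-a}(\vec{k},-\sigma) \\
&= (-)^{A+B-j}\,Y^{(j)}_{\sigma\sigma'}\,Y^{(A)}_{aa'}\,Y^{(B)}_{bb'}\,u^{BA}_{b'a'}(\vec{k},\sigma')
\end{aligned} \tag{13.26}$$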
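The transformation property of the general covariant field (13.10) under the
C-operation now follows straightforwardly:

$$\begin{aligned}
C\phi^{AB}_{ab}(x)C^{-1} &= \int \frac{d^3k}{(2\pi)^{3/2}\sqrt{2E}} \sum_\sigma \Big( u^{AB}_{ab}(\vec{k},\sigma)\,\zeta\,a^{c}(\vec{k},\sigma)\,e^{-ik\cdot x} + (-)^{2B}(-)^{j-\sigma}\,u^{AB}_{ab}(\vec{k},-\sigma)\,\zeta_c^{*}\,a^{\dagger}(\vec{k},\sigma)\,e^{ik\cdot x} \Big) \\
&= (-)^{A+B-j} \sum_{\sigma\sigma'} \sum_{a'b'} \int \frac{d^3k}{(2\pi)^{3/2}\sqrt{2E}}\; Y^{(A)}_{aa'}\,Y^{(B)}_{bb'}\, \Big\{ Y^{(j)}_{\sigma\sigma'}\,u^{BA}_{b'a'}(\vec{k},\sigma')\,\zeta^{*}\,a^{c\dagger}(\vec{k},\sigma)\,e^{ik\cdot x} \\
&\qquad\qquad + (-)^{2B}(-)^{j-\sigma}\,Y^{(j)}_{-\sigma,-\sigma'}\,u^{BA}_{b'a'}(\vec{k},-\sigma')\,\zeta_c\,a(\vec{k},\sigma)\,e^{-ik\cdot x} \Big\}^{\dagger} \\
&= (-)^{A+B-j} \sum_{a'b'} Y^{(A)}_{aa'}\,Y^{(B)}_{bb'} \int \frac{d^3k}{(2\pi)^{3/2}\sqrt{2E}} \sum_\sigma \Big\{ (-)^{2B}\,\zeta_c\,u^{BA}_{b'a'}(\vec{k},\sigma)\,a(\vec{k},\sigma)\,e^{-ik\cdot x} \\
&\qquad\qquad + \zeta^{*}\,(-)^{j+\sigma}\,u^{BA}_{b'a'}(\vec{k},-\sigma)\,a^{c\dagger}(\vec{k},\sigma)\,e^{ik\cdot x} \Big\}^{\dagger}
\end{aligned} \tag{13.27}$$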

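Using the relation $(-)^{j+\sigma} = (-)^{j-\sigma}(-)^{2j} = (-)^{j-\sigma}(-)^{2A+2B}$ (as 2j − 2σ is even, and
A and B must be able to couple to j), this becomes

$$C\phi^{AB}_{ab}(x)C^{-1} = (-)^{A+B-j} \sum_{a'b'} Y^{(A)}_{aa'}\,Y^{(B)}_{bb'} \int \frac{d^3k}{(2\pi)^{3/2}\sqrt{2E}} \sum_\sigma \Big\{ \zeta_c\,(-)^{2B}\,u^{BA}_{b'a'}(\vec{k},\sigma)\,a(\vec{k},\sigma)\,e^{-ik\cdot x} + \zeta^{*}\,(-)^{2B}(-)^{2A}(-)^{j-\sigma}\,u^{BA}_{b'a'}(\vec{k},-\sigma)\,a^{c\dagger}(\vec{k},\sigma)\,e^{ik\cdot x} \Big\}^{\dagger} \tag{13.28}$$

Comparing the field in the last expression with the general result (7.84), we see that
we recover the (hermitian adjoint) of a covariant field if and only if

$$\zeta_c = \zeta^{*} \tag{13.29}$$

in which case the field satisfies the transformation rule

$$C\phi^{AB}_{ab}(x)C^{-1} = \zeta\,(-)^{A-B-j} \sum_{a'b'} Y^{(A)}_{aa'}\,Y^{(B)}_{bb'}\,\phi^{BA\dagger}_{b'a'}(x) \tag{13.30}$$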

As in the case of parity, if $A \neq B$ we must include both $\phi^{AB}$ and $\phi^{BA}$ fields in
the Hamiltonian if the theory is to be invariant under C-conjugation. However, the
combined operation of charge conjugation and parity, CP, does not mix these two
types, so CP-invariant theories are possible in the presence of asymmetric (chiral)
representations:

$$CP\,\phi^{AB}_{ab}(\vec{x},t)\,(CP)^{-1} = \eta\zeta\,(-)^{2(B-j)} \sum_{a'b'} Y^{(A)}_{aa'}\,Y^{(B)}_{bb'}\,\phi^{AB\dagger}_{a'b'}(-\vec{x},t) \tag{13.31}$$

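Again, in analogy to (13.17) for the parity operator, the general result (13.30)
reduces in the Dirac field case ($(\tfrac12, 0) \oplus (0, \tfrac12)$ representation) to

$$C\psi(x)C^{-1} = \zeta\,C\,\psi^{\dagger}(x), \qquad C \equiv i\gamma^0\gamma^2 \tag{13.32}$$

(here the C multiplying $\psi^\dagger$ is the 4×4 matrix defined on the right, not the unitary operator).
Another example: choosing ζ for a j = 1, $(\tfrac12, \tfrac12)$ field. The connection between the
four-vectorial notation $\phi^\mu$ and the $\phi^{AB}_{ab}$ notation is as follows

$$\begin{aligned}
\phi^{\frac12\frac12}_{\frac12,\frac12} &= \frac{1}{\sqrt2}\,(\phi^1 - i\phi^2) \\
\phi^{\frac12\frac12}_{\frac12,-\frac12} &= \frac{1}{\sqrt2}\,(\phi^0 - \phi^3) \\
\phi^{\frac12\frac12}_{-\frac12,\frac12} &= -\frac{1}{\sqrt2}\,(\phi^0 + \phi^3) \\
\phi^{\frac12\frac12}_{-\frac12,-\frac12} &= -\frac{1}{\sqrt2}\,(\phi^1 + i\phi^2)
\end{aligned} \tag{13.33}$$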

If we now choose ζ = ±1, for example, then, using (13.30)

$$C\phi^{\frac12\frac12}_{\frac12,\frac12}C^{-1} = \mp\,\phi^{\frac12\frac12\,\dagger}_{-\frac12,-\frac12} = \pm\frac{1}{\sqrt2}\,(\phi^{1\dagger} - i\phi^{2\dagger}) \tag{13.34}$$

and similarly for the other three components. In terms of the vectorial components,
one easily finds

$$C\phi^{\mu}(x)C^{-1} = \pm\,\phi^{\mu\dagger}(x) \tag{13.35}$$

In the Hamiltonian of quantum electrodynamics, a four-vector field ($\phi^\mu = A^\mu$ =
photon field, in this case self-conjugate) appears multiplying a four-vector current
built from charged particle fields which changes sign under charge conjugation, so to
ensure that the charge-conjugation operator C commutes with the Hamiltonian, we
also assign ζ = −1 to the photon field.
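The step from (13.34) to (13.35) can be checked component by component. The sketch below encodes the dictionary (13.33) as coefficient rows, applies the general rule (13.30) with A = B = ½ and j = 1, and compares the result with what (13.35) would give. It assumes the index placement and signs of (13.33) as read here; the names `COMPONENTS`, `from_general_rule`, and `from_vector_rule` are illustrative.

```python
from fractions import Fraction
from math import sqrt

HALF = Fraction(1, 2)
S = 1 / sqrt(2)

# Rows of (13.33): coefficients of (phi^0, phi^1, phi^2, phi^3) in phi_{ab}.
COMPONENTS = {
    (HALF, HALF): (0, S, -1j * S, 0),
    (HALF, -HALF): (S, 0, 0, -S),
    (-HALF, HALF): (-S, 0, 0, -S),
    (-HALF, -HALF): (0, -S, -1j * S, 0),
}


def sign(n):
    """(-1)**n for an integer-valued number."""
    return -1 if int(n) % 2 else 1


def from_general_rule(a, b, zeta):
    """Coefficients of phi^{mu dagger} in C phi_{ab} C^{-1}, using (13.30) with A = B = 1/2, j = 1."""
    prefactor = zeta * sign(HALF - HALF - 1)
    y_a = sign(HALF + a)  # Y^{(A)}_{a,-a}
    y_b = sign(HALF + b)  # Y^{(B)}_{b,-b}
    # The adjoint field phi^{BA dagger}_{b'a'} appears with b' = -b, a' = -a;
    # taking the adjoint complex-conjugates the coefficients of (13.33).
    adjoint = [complex(c).conjugate() for c in COMPONENTS[(-b, -a)]]
    return [prefactor * y_a * y_b * c for c in adjoint]


def from_vector_rule(a, b, zeta):
    """Same coefficients if C phi^mu C^{-1} = zeta phi^{mu dagger} holds for each mu."""
    return [zeta * complex(c) for c in COMPONENTS[(a, b)]]


def rules_agree(zeta, tol=1e-12):
    return all(
        abs(x - y) <= tol
        for key in COMPONENTS
        for x, y in zip(from_general_rule(*key, zeta), from_vector_rule(*key, zeta))
    )


if __name__ == "__main__":
    print(rules_agree(+1), rules_agree(-1))
```

Both sign choices pass the check, consistent with the text's remark that the sign is fixed only by requiring C to commute with a specific Hamiltonian.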
A comment on the underlying logic of all of this. We are free to define the C and
P operators with any phase constants ζ, η we like (for any given field), but there are
two possible outcomes: either
1. there exists a choice for ζ, η for each participating field such that some interaction
(say the electromagnetic part of the Hamiltonian) then commutes with C (or P),
in which case we assign those values to the fields as the intrinsic parity and
charge-conjugation quantum number of the associated particles; or
2. there is no way to pick ζ, η for all the fields so that C (or P) commutes with the
Hamiltonian for the given interaction. In that case we say that the interaction
breaks C (or P, or both). As pointed out above, this implies that the simple
transformation properties (under C or P) of the interaction-picture fields cannot
be extended to the full Heisenberg fields of the theory.
The point is that by assigning quantum numbers as in (1) above (if possible) we gain
a new operator commuting with the Hamiltonian, and the more of these there are, the
easier it is to solve any quantum theory.
One last caution: it is common to use the letter C to refer to the charge-conjugation
quantum number (ζ above), as well as the unitary Hilbert space operator C imple-
menting charge-conjugation on the states. I have avoided this for obvious reasons,
while using the letter C as a general denotation of the charge-conjugation operation,
but the reader should be aware of this when consulting the standard texts.

13.3 Time-reversal properties of a general local covariant field


The third, and final, discrete spacetime symmetry which remains to be discussed is that
of invariance of the dynamical evolution under reversal of the direction of time flow.
The basic properties of time reversal in quantum mechanics were discussed already
in Section 4.1.3, which the reader may find useful to review at this point. We saw
there that the operation of time reversal in quantum mechanics is implemented by an
antiunitary operator T which reverses the direction of spatial momenta. Thus for a
particle of spin j, the one-particle state transforms under the T operation as follows
(cf. (4.60))

$$T|\vec{k},\sigma\rangle = \tau^{*}\,Y^{(j)}_{\sigma\sigma'}\,|-\vec{k},\sigma'\rangle, \qquad |\tau| = 1 \tag{13.36}$$

As in the case of parity and charge conjugation, the antiunitary character of the
time reversal operator allows for an arbitrary phase factor τ (resp. $\tau^c$), or intrinsic
“time-parity”, for the particle (resp. antiparticle) in question. The transformation
property (13.36) transfers in the usual way to a transformation property of creation
and destruction operators under a similarity transformation with T:

$$T a^{\dagger}(\vec{k},\sigma) T^{-1} = \tau^{*}\,Y^{(j)}_{\sigma\sigma'}\,a^{\dagger}(-\vec{k},\sigma') \tag{13.37}$$

$$T a(\vec{k},\sigma) T^{-1} = \tau\,Y^{(j)}_{\sigma\sigma'}\,a(-\vec{k},\sigma') \tag{13.38}$$

$$T a^{c\dagger}(\vec{k},\sigma) T^{-1} = \tau^{c*}\,Y^{(j)}_{\sigma\sigma'}\,a^{c\dagger}(-\vec{k},\sigma') \tag{13.39}$$

$$T a^{c}(\vec{k},\sigma) T^{-1} = \tau^{c}\,Y^{(j)}_{\sigma\sigma'}\,a^{c}(-\vec{k},\sigma') \tag{13.40}$$

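Recalling that T contains the complex-conjugation operator K (cf. (4.60)), one finds for
the time-reversal transformation of the covariant field (13.10), using (13.38, 13.39),

$$T\phi^{AB}_{ab}(x)T^{-1} = \int \frac{d^3k}{(2\pi)^{3/2}\sqrt{2E(k)}} \sum_\sigma \Big( u^{AB*}_{ab}(\vec{k},\sigma)\,e^{ik\cdot x}\,\tau\,Y^{(j)}_{\sigma\sigma'}\,a(-\vec{k},\sigma') + (-)^{2B}(-)^{j-\sigma}\,u^{AB*}_{ab}(\vec{k},-\sigma)\,e^{-ik\cdot x}\,\tau^{c*}\,Y^{(j)}_{\sigma\sigma'}\,a^{c\dagger}(-\vec{k},\sigma') \Big) \tag{13.41}$$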

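Changing integration variable $\vec{k} \to -\vec{k}$, defining $x_t = (\vec{x}, -t)$, and using the reflection
and conjugation properties (13.13, 13.26) of the $u^{AB}_{ab}(\vec{k},\sigma)$, a few lines of algebra (see
Problem 3) leave us with

$$T\phi^{AB}_{ab}(x)T^{-1} = \tau\,Y^{(A)}_{aa'}\,Y^{(B)}_{bb'} \int \frac{d^3k}{(2\pi)^{3/2}\sqrt{2E(k)}} \sum_\sigma \Big( u^{AB}_{a'b'}(\vec{k},\sigma)\,e^{-ik\cdot x_t}\,a(\vec{k},\sigma) + \frac{\tau^{c*}}{\tau}\,(-)^{2B}(-)^{j-\sigma}\,u^{AB}_{a'b'}(\vec{k},-\sigma)\,e^{ik\cdot x_t}\,a^{c\dagger}(\vec{k},\sigma) \Big) \tag{13.42}$$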
The right-hand side of this expression is clearly a covariant field of type $\phi^{AB}_{a'b'}$, provided
we make the by now usual identification

$$\tau^{c} = \tau^{*} \tag{13.43}$$

whereupon the desired transformation law for $\phi^{AB}_{ab}(x)$ emerges

$$T\phi^{AB}_{ab}(\vec{x},t)T^{-1} = \tau\,Y^{(A)}_{aa'}\,Y^{(B)}_{bb'}\,\phi^{AB}_{a'b'}(\vec{x},-t) = \tau\,(-)^{A+B+a+b}\,\phi^{AB}_{-a,-b}(\vec{x},-t) \tag{13.44}$$
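The second equality in (13.44) is just the statement that the two Y matrices flip both indices and supply a sign. A minimal sketch of that bookkeeping follows, with the overall phase τ and the coordinate dependence left out; the field components are stand-in numbers and the function names are hypothetical.

```python
from fractions import Fraction


def sign(n):
    """(-1)**n for an integer-valued Fraction or int."""
    return -1 if int(n) % 2 else 1


def y_element(j, m, m_prime):
    """Y^{(j)}_{m m'} = (-1)^(j+m) * delta_{m,-m'}."""
    return sign(j + m) if m_prime == -m else 0


def projections(j):
    """Allowed projections -j, -j+1, ..., j."""
    return [-j + n for n in range(int(2 * j) + 1)]


def matrix_form(A, B, field):
    """sum_{a'b'} Y^{(A)}_{aa'} Y^{(B)}_{bb'} field[(a', b')] for every (a, b)."""
    return {
        (a, b): sum(
            y_element(A, a, ap) * y_element(B, b, bp) * field[(ap, bp)]
            for ap in projections(A)
            for bp in projections(B)
        )
        for a in projections(A)
        for b in projections(B)
    }


def closed_form(A, B, field):
    """(-1)^(A+B+a+b) field[(-a, -b)], the right-hand side of (13.44) without tau."""
    return {
        (a, b): sign(A + B + a + b) * field[(-a, -b)]
        for a in projections(A)
        for b in projections(B)
    }


if __name__ == "__main__":
    A, B = Fraction(1), Fraction(1, 2)
    # Arbitrary distinct stand-in values for the components phi_{ab}.
    field = {(a, b): 10 * i + k
             for i, a in enumerate(projections(A))
             for k, b in enumerate(projections(B))}
    print(matrix_form(A, B, field) == closed_form(A, B, field))
```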

13.4 The TCP and Spin-Statistics theorems


Invariance under the combined operation of parity inversion, charge-conjugation, and
time reversal enjoys a special status in local relativistic quantum field theories (see
Fig. 13.2). Independently of the detailed dynamics of such theories, it follows from
only the basic underlying ingredients outlined in Chapter 3: quantum theory,
Lorentz-invariance, and locality (specifically, weak local commutativity, which asserts the
vanishing of the vacuum expectation value of space-like commutators). We shall first
demonstrate this result, the famous “TCP theorem”, in the context of a Lagrangian
field theory, in which the dynamics is specified in terms of fields transforming under
T, C, and P as indicated in the preceding sections, coupled together to form a
Lorentz-scalar Lagrangian density. The more general argument, based on the axioms
of axiomatic field theory discussed in Section 9.2, will then be outlined.

[Figure 13.2: two scattering diagrams, labeled by the amplitudes $S_{\beta\alpha}$ and $S_{\Theta\alpha\,\Theta\beta}$.]

Fig. 13.2 The weak interaction process ($e^- + \mu^+ \to \nu_e + \bar{\nu}_\mu$) of Fig. 13.1 and its Θ = TCP
transform. The TCP theorem ensures equality (up to phase) of the amplitudes for the process
and its TCP transform in any (weakly) local relativistic quantum field theory describing the
interactions responsible for the process.
First note that for a complex scalar field φ (i.e., a field in the A = 0, B = 0
representation of the HLG), undergoing the P, C, and T transformations

$$P\phi(\vec{x},t)P^{-1} = \eta\,\phi(-\vec{x},t) \tag{13.45}$$

$$C\phi(\vec{x},t)C^{-1} = \zeta\,\phi^{\dagger}(\vec{x},t) \tag{13.46}$$

$$T\phi(\vec{x},t)T^{-1} = \tau\,\phi(\vec{x},-t) \tag{13.47}$$

the transformation property under the combined effect of Θ ≡ TCP becomes
remarkably simple:

$$\Theta\phi(\vec{x},t)\Theta^{-1} = (\eta\zeta\tau)^{*}\,\phi^{\dagger}(-\vec{x},-t) \tag{13.48}$$

We are at liberty to choose the phases η, ζ, τ defining the corresponding discrete
symmetries as we please, and we shall henceforth choose

$$\eta\zeta\tau = 1 \tag{13.49}$$

whence (returning to spacetime coordinate notation)

$$\Theta\phi(x)\Theta^{-1} = \phi^{\dagger}(-x), \qquad \Theta\,\partial_\mu\phi(x)\,\Theta^{-1} = -\partial_\mu\phi^{\dagger}(-x) \tag{13.50}$$
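The complex conjugate in (13.48) comes entirely from the antiunitarity of T acting on the phases already accumulated from P and C. The sketch below tracks a single term of the form (coefficient) × φ or φ†, together with the signs of its space and time arguments, through the three operations in the order P, then C, then T. The tuple representation and function names are illustrative assumptions.

```python
import cmath

# A term is (coefficient, daggered, space_sign, time_sign), standing for
# coefficient * phi(space_sign * x, time_sign * t), or its adjoint if daggered.


def parity(term, eta):
    coeff, daggered, xs, ts = term
    phase = eta.conjugate() if daggered else eta
    return (coeff * phase, daggered, -xs, ts)


def charge_conjugation(term, zeta):
    coeff, daggered, xs, ts = term
    phase = zeta.conjugate() if daggered else zeta
    return (coeff * phase, not daggered, xs, ts)


def time_reversal(term, tau):
    coeff, daggered, xs, ts = term
    phase = tau.conjugate() if daggered else tau
    # Antiunitarity: c-number coefficients are complex conjugated.
    return (coeff.conjugate() * phase, daggered, xs, -ts)


def tcp(term, eta, zeta, tau):
    """Theta = T C P acting by conjugation: P is applied first, T last."""
    return time_reversal(charge_conjugation(parity(term, eta), zeta), tau)


if __name__ == "__main__":
    eta, zeta, tau = cmath.exp(0.3j), cmath.exp(1.1j), cmath.exp(-0.4j)
    print(tcp((1 + 0j, False, 1, 1), eta, zeta, tau))
    print((eta * zeta * tau).conjugate())
```

With the phase choice (13.49) the coefficient reduces to 1, reproducing the first relation in (13.50).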

It follows that the action obtained as the spacetime integral of any hermitian,
Lorentz-scalar Lagrangian density $L(\phi, \partial_\mu\phi, \phi^\dagger, \partial_\mu\phi^\dagger)$ is invariant under the combined TCP
operation, which simply interchanges φ and φ† (which must appear symmetrically
if the Lagrangian is to be hermitian, and the dynamics unitary), and switches x →
−x (the so-called “strong reflection” operation), which leaves the spacetime integral
unaltered. Spacetime gradients must appear paired for the Lagrangian to be a scalar,
so the minus signs in field derivatives must also cancel. The hermitian stress-energy
tensor $T_{\mu\nu}(x)$ for such a theory (cf. Section 12.4),

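$$T_{\mu\nu}(x) = \frac{\partial L}{\partial(\partial^{\mu}\phi)}\,\partial_\nu\phi(x) + \frac{\partial L}{\partial(\partial^{\mu}\phi^{\dagger})}\,\partial_\nu\phi^{\dagger}(x) - g_{\mu\nu}\,L \tag{13.51}$$

clearly satisfies the strong reflection property

$$\Theta\,T_{\mu\nu}(x)\,\Theta^{-1} = T_{\mu\nu}(-x) \tag{13.52}$$

Recalling that the Hamiltonian H of the theory is simply the spatial integral of the
density $T_{00}(\vec{x}, 0)$,

$$H = \int d^3x\; T_{00}(\vec{x},0) \;\Rightarrow\; \Theta\,H\,\Theta^{-1} = H \;\Rightarrow\; [\Theta, H] = 0 \tag{13.53}$$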

so the Hamiltonian of our theory is also invariant under the TCP operation.
The above argument is easily generalized to fields $\phi^{AB}_{ab}(x)$ in arbitrary
representations of the HLG: remarkably, we find that any such fields have (up to a sign) exactly
the same transformation property as the scalar field above. Combining (13.16, 13.30,
13.44) (or more directly, (13.31) and (13.44)), a few lines of algebra lead to

$$\Theta\,\phi^{AB}_{ab}(x)\,\Theta^{-1} = (\eta\zeta\tau)^{*}\,(-)^{2A}\,\phi^{AB\dagger}_{ab}(-x) = (-)^{2A}\,\phi^{AB\dagger}_{ab}(-x) \tag{13.54}$$

where we have inserted the phase choice (13.49). The argument for invariance of a
Lagrangian field theory constructed from such fields now follows directly. First, we note
that the four-vector spacetime gradient $\partial_\mu$, which transforms according to the $(\tfrac12, \tfrac12)$
representation of the HLG, when applied to a $\phi^{AB}$-type field multiplet, generates, by
the usual Clebsch–Gordan machinery of angular momentum addition, a combination
of irreducible fields of type $\phi^{A'B'}$ (with $A' = |A \pm \tfrac12|$, $B' = |B \pm \tfrac12|$), which individually
transform under T, C, and P just as we have indicated above. Thus, we may assume
henceforth that the (scalar!) Lagrangian density of our theory is simply a sum of terms
involving products of $\phi^{AB}$ fields in which the independent A and B representation
labels are separately coupled to zero spin. In any such product, the sign factors $(-)^{2A}$
must also multiply to give +1, and we therefore have the strong reflection property
for the scalar Lagrangian density

$$\Theta\,L(x)\,\Theta^{-1} = L^{\dagger}(-x) = L(-x) \tag{13.55}$$

where we have again assumed an hermitian Lagrangian in the last step. The invariance
of the action of the theory, given as the spacetime integral of the Lagrangian density,
now follows as before. A similar argument³ ensures the strong reflection property for
the Hamiltonian density $T_{00}(x)$, and hence the commutativity of the Hamiltonian with
the Θ operator.
The preceding discussion of TCP symmetry is based on a specification of the
dynamics in terms of a Lagrangian (or associated Hamiltonian) built from local fields.
A much more general understanding of the origins of TCP symmetry can be given using
the Wightman axiomatic formalism discussed in Section 9.2, with no commitment
whatsoever to a detailed dynamics of the fields. The basic ingredients which, once
present, ensure the exact TCP invariance of a field theory are (a) the spectral
and Lorentz transformation properties needed to establish the analytic properties of
the Wightman functions embodied in the Hall–Wightman theorem (specifically, the
property (9.76) following directly from that theorem), and (b) a weaker version of
space-like commutativity of the fields (called weak local commutativity, or WLC for
short), in which the vanishing of the relevant commutators (or anticommutators) is
only required in the vacuum expectation value. Without giving detailed proofs, we
shall outline the argument here.
We see, comparing (13.50) and (13.54), that the behavior of a general covariant
field under the TCP operation is effectively identical to that of a scalar field, so to
strip out inessential details and present the essence of the argument most clearly, we
shall assume that we are dealing with a theory of a single real self-interacting scalar
field φ(x). The invariance of the theory (and its vacuum) under TCP implies the
existence of an antiunitary operator Θ satisfying (13.50): this then means that the
n-point Wightman function defined by⁴

$$W(x_1, x_2, \ldots, x_n) \equiv \langle 0|\phi(x_1)\phi(x_2)\cdots\phi(x_n)|0\rangle \tag{13.56}$$

satisfies the constraint

$$W(x_1, x_2, \ldots, x_n) = W(-x_n, -x_{n-1}, \ldots, -x_1) \tag{13.57}$$

which follows immediately from the fact that $\langle 0|\Theta O\Theta^{-1}|0\rangle = \langle 0|O^{\dagger}|0\rangle$ for any operator
O (by antiunitarity of Θ), if we substitute for O the product of field operators
appearing in (13.56). By translation invariance, in terms of the displacements $\xi_i \equiv x_i - x_{i+1}$,
this can then be written

$$W(\xi_1, \xi_2, \ldots, \xi_{n-1}) = W(\xi_{n-1}, \xi_{n-2}, \ldots, \xi_1) \tag{13.58}$$

We recall from Section 9.3 that the Haag–Ruelle scattering theory assures us that the
entire phenomenological content of the theory contained in the S-matrix is uniquely
recoverable from knowledge of the Wightman functions, so the essential physical
content of TCP invariance (discussed in greater detail below) can be considered to be
fully incorporated in the condition (13.58) on the Wightman functions of the theory.

³ In this case, the stress-energy tensor density transforms like a symmetric rank 2 tensor, i.e., a
combination of (0,0) and (1,1) fields. In either case, the sum of A quantum numbers in each term must be
integer, so the product of $(-)^{2A}$ factors again gives unity.
⁴ We shall be dealing with the full Wightman functions of the theory for the remainder of this chapter,
and omit the “H” subscript indicating the omnipresent Heisenberg fields.

Next, we need to review briefly, and then extend somewhat, the analytic properties
of Wightman functions discussed previously in Section 9.2. A set of n − 1 complex
4-coordinates $(\zeta_1, \ldots, \zeta_{n-1})$ is said to lie in the extended tube $T'_{n-1}$ if $\zeta_i = \Lambda z_i$ where
Λ is an arbitrary complex Lorentz transformation and the $z_i$, i = 1, 2, …, n − 1 lie
in the forward tube (i.e., $z_i = \xi_i - i\eta_i$, $\eta_i^2 > 0$, $\eta_i^0 > 0$). The Hall–Wightman theorem
establishes the analyticity of the Wightman functions continued into the whole of the
extended tube, from which the reflection property (9.76) is easily deduced:

$$W(\zeta_1, \zeta_2, \ldots, \zeta_{n-1}) = W(-\zeta_1, -\zeta_2, \ldots, -\zeta_{n-1}) \tag{13.59}$$

Note that this result hangs only on the spectral and Lorentz transformation axioms of
Section 9.2 (specifically Axioms Ia-d, IIa-b): in particular, there has as yet been no
appeal to the locality axiom IIc (space-like commutativity or anticommutativity of the
fields). A weakened version of the locality property is all that is needed to complete
the proof of the TCP theorem, in the form (13.58). Let $(\xi_1, \xi_2, \ldots, \xi_{n-1})$ be a set of
real space-like four-vectors with space-like convex hull, i.e.,

$$\Big( \sum_{i=1}^{n-1} \lambda_i \xi_i \Big)^2 < 0 \quad \forall \lambda_i, \qquad \sum_i \lambda_i = 1, \quad 0 \le \lambda_i \le 1 \tag{13.60}$$

Note that this implies that for any subset S of (1, 2, …, n − 1), $\sum_{i\in S} \xi_i$ is also
space-like, and hence, if $\xi_i = x_i - x_{i+1}$, then for i < j we have $x_i - x_j = (x_i - x_{i+1}) +
(x_{i+1} - x_{i+2}) + \ldots + (x_{j-1} - x_j) = \xi_i + \ldots + \xi_{j-1}$ also space-like, and hence $x_i - x_j$ is
space-like for all $i \neq j$: the set $\xi_i$ is totally space-like. Locality of the theory then
implies that we can rearrange the operators $\phi(x_1)\phi(x_2)\cdots\phi(x_n)$ into the reversed
order $\phi(x_n)\phi(x_{n-1})\cdots\phi(x_1)$, and therefore, for the vacuum expectation value obtain

$$W(\xi_1, \ldots, \xi_{n-1}) = \langle 0|\phi(x_1)\phi(x_2)\cdots\phi(x_n)|0\rangle = \langle 0|\phi(x_n)\phi(x_{n-1})\cdots\phi(x_1)|0\rangle = W(-\xi_{n-1}, \ldots, -\xi_1) \tag{13.61}$$
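The convex-hull condition (13.60) is stronger than requiring each $\xi_i$ to be space-like separately. The sketch below illustrates this for two vectors by scanning the segment between them on a grid; this is a numerical probe under the metric signature (+, −, −, −), not a proof, and the function names are hypothetical.

```python
def minkowski_sq(v):
    """Invariant square of a four-vector with signature (+, -, -, -)."""
    return v[0] ** 2 - v[1] ** 2 - v[2] ** 2 - v[3] ** 2


def hull_is_spacelike(xi1, xi2, steps=1000):
    """Check (lam*xi1 + (1-lam)*xi2)^2 < 0 on a uniform grid of lam in [0, 1]."""
    for s in range(steps + 1):
        lam = s / steps
        combo = [lam * a + (1 - lam) * b for a, b in zip(xi1, xi2)]
        if minkowski_sq(combo) >= 0:
            return False
    return True


if __name__ == "__main__":
    # Both vectors in each pair are individually space-like.
    print(hull_is_spacelike((0.0, 1.0, 0.0, 0.0), (0.5, 1.0, 0.0, 0.0)))
    # Here the midpoint (0.25, 0, 0, 0) is time-like, so (13.60) fails.
    print(hull_is_spacelike((0.0, 1.0, 0.0, 0.0), (0.5, -1.0, 0.0, 0.0)))
```

The second pair shows why the hull condition is needed: the sum of two space-like vectors need not be space-like.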

The condition (13.61) on the Wightman functions is strictly weaker than the full
requirement of space-like commutativity at the operator level, as we are only requiring
the commutativity to be effective at the level of the vacuum expectation value: it is
usually given the name “weak local commutativity” in the literature. Real points
in 4(n − 1)-dimensional space satisfying (13.60) are called Jost points, and have a
remarkable characteristic: they can all be shown to lie in the extended tube $T'_{n-1}$! In
other words, given any set $(\xi_1, \xi_2, \ldots, \xi_{n-1})$ satisfying (13.60), we can find a complex
Lorentz transformation Λ such that $\xi_i = \Lambda z_i$, with $z_i$ in the interior of the (complex)
forward tube. Some explicit examples are given in Problem 4.⁵ Now, the reader will
doubtless recall that two analytic functions of a single complex variable that agree
on any finite neighborhood of the real axis must continue to agree when analytically
continued to the rest of the complex plane. An exactly similar phenomenon in the
case of analytic functions of several complex variables ensures that the equality
$W(\xi_1, \ldots, \xi_{n-1}) = W(-\xi_{n-1}, \ldots, -\xi_1)$ valid for real Jost points (and following from
weak local commutativity) can be extended to points in the entire extended tube $T'_{n-1}$:

$$W(\zeta_1, \zeta_2, \ldots, \zeta_{n-1}) = W(-\zeta_{n-1}, -\zeta_{n-2}, \ldots, -\zeta_1) = W(\zeta_{n-1}, \zeta_{n-2}, \ldots, \zeta_1), \qquad \forall \zeta_i \in T'_{n-1} \tag{13.62}$$

where the final equality obtains in consequence of (13.59). The extended tube, of
course, contains the forward tube, consisting of points $\zeta_i = \xi_i - i\eta_i$ with the $\eta_i$ future
time-like, and the $\xi_i$ arbitrary spacetime points (in particular, not necessarily
space-like!). Taking the boundary limit $\eta_i \to 0$ in (13.62) we replace the $\zeta_i$ with real points
$\xi_i$ everywhere and obtain

$$W(\xi_1, \xi_2, \ldots, \xi_{n-1}) = W(\xi_{n-1}, \xi_{n-2}, \ldots, \xi_1) \tag{13.63}$$

which is precisely the desired TCP theorem, in the form (13.58). The extension of
this result to fields in arbitrary representations of the HLG, and in the fermionic case,
satisfying space-like anticommutativity (in the weak form) can be found in (Streater
and Wightman, 1978) (theorem 4-7).

⁵ For the full proof of this assertion, see (Streater and Wightman, 1978), theorem 2-12.
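The statement that real Jost points lie in the extended tube can be made concrete in the simplest case of a single space-like vector ξ = (0, x, 0, 0). The sketch below uses a boost along the 1-axis with purely imaginary rapidity iα, which is a complex Lorentz transformation, and exhibits a point z in the forward tube with ξ = Λz. This is one illustrative construction under the signature (+, −, −, −), not necessarily the example intended in Problem 4; only the (0, 1) plane is tracked, and the function names are hypothetical.

```python
import math


def complex_boost(alpha):
    """Boost along axis 1 with rapidity i*alpha, acting on the pair (x^0, x^1)."""
    c = math.cos(alpha)        # cosh(i*alpha)
    s = 1j * math.sin(alpha)   # sinh(i*alpha)
    return [[c, s], [s, c]]


def apply(matrix, vec):
    return [sum(matrix[r][k] * vec[k] for k in range(2)) for r in range(2)]


def preimage(x, alpha):
    """Return z = Lambda(alpha)^{-1} xi for xi = (0, x), and eta defined by z = Re z - i*eta."""
    z = apply(complex_boost(-alpha), [0.0, x])
    eta = [-z[0].imag, -z[1].imag]
    return z, eta


def in_forward_cone(eta):
    """Future time-like test restricted to the (0, 1) plane."""
    return eta[0] > 0 and eta[0] ** 2 - eta[1] ** 2 > 0


if __name__ == "__main__":
    x, alpha = 2.0, math.pi / 4
    z, eta = preimage(x, alpha)
    print("z =", z, "eta =", eta, "forward:", in_forward_cone(eta))
    print("Lambda z =", apply(complex_boost(alpha), z))
```

For x > 0 any 0 < α < π works, giving η = (x sin α, 0); for x < 0 one would take α of the opposite sign.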
The immediate phenomenological consequences of TCP symmetry are among the
most precisely tested predictions of quantum field theory. In particular, the fact that
the exact Hamiltonian H of the world commutes with the TCP operator implies an
exact relation between the static properties, such as mass and magnetic moment (if
any), of a particle and its antiparticle. If $|\vec{k} = 0, \sigma\rangle$ is the one particle state of a stable
particle at rest, with $H|\vec{k} = 0, \sigma\rangle = m_{\rm ph}|\vec{k} = 0, \sigma\rangle$ ($m_{\rm ph}$ the physical mass), then TCP
invariance requires that the state $TCP|\vec{k} = 0, \sigma\rangle$ be an eigenstate of H with exactly
the same eigenvalue (i.e., $m_{\rm ph}$). But this state is just the one particle state of the
antiparticle at rest (with spin reversed, which by rotational invariance cannot alter
the energy). Thus the particle and antiparticle masses must be exactly equal, any
deviation implying a violation of TCP invariance. This equality has been tested in the
case of the proton/antiproton to better than one part in $10^{8}$. Even more precise tests
of TCP invariance are available in the neutral kaon system, where the $K^0$ and $\bar{K}^0$
masses⁶ are found to agree to within one part in $10^{18}$!
In similar fashion, TCP symmetry implies that the magnetic moments of particle
and antiparticle are equal in magnitude and opposite in sign. This follows from the
fact that the single-particle state in the presence of a static magnetic field $\vec{B}$ must
have the same energy as its TCP conjugate, which is an antiparticle of reversed spin.
On the other hand, the magnetic field $\vec{B}$ is unchanged under TCP ($\vec{B}$ changes sign
under C and T, as is clear by considering the behavior of steady currents giving rise
to $\vec{B}$, but is unchanged under P, as it is an axial vector). Thus the magnetic moments,
like the spins, must be equal and opposite, although the phenomenological checks in
this case are much coarser than is the case for particle masses.
Consequences of TCP for typical scattering processes such as the weak process
depicted in Fig. 13.2, in which none of the discrete symmetries T, C, or P is separately
conserved, are typically much more difficult to check with comparable precision to the
static properties described above: one typically finds that the TCP transform of a
phenomenologically accessible process is difficult to observe, and in any event,
cross-sections of scattering processes are generally only measurable at a precision level far
inferior to that of static properties such as mass or magnetic moment.

⁶ See (Bloch, 2006) for a recent review of the implications of TCP violation for neutral kaon physics.
We conclude this chapter by revisiting the Spin-Statistics theorem, discussed
already in Section 7.3 from the point of view of Hamiltonians expressed in terms of
free, interaction-picture fields. As in the case of the TCP theorem, the Spin-Statistics
connection can be derived rigorously from general underlying principles of relativistic
quantum field theory, as incorporated in the Wightman axioms—with no commitment
to a detailed dynamics as specified by a Hamiltonian (or Lagrangian). The strategy
is almost identical to the proof of TCP invariance, employing the generalization of
(13.59) to the Wightman function of a product of fields $\phi^{A_1B_1}\phi^{A_2B_2}\cdots\phi^{A_nB_n}$ in
arbitrary representations of the HLG (see Section 9.2, (9.78)):

$$W^{A_1B_1,A_2B_2,\ldots,A_nB_n}(\zeta_1,\ldots,\zeta_{n-1}) = (-1)^{\sum_i 2A_i}\,W^{A_1B_1,A_2B_2,\ldots,A_nB_n}(-\zeta_1,\ldots,-\zeta_{n-1}) \tag{13.64}$$

for all $\zeta_i$ in the extended tube. We shall now assume that the fields $\phi^{A_iB_i}$, $i = 1, 2, \ldots, n$,
appearing in the Wightman function above are either (a) Bose fields, defined as
those with vanishing space-like commutator, or (b) Fermi fields, defined as those with
vanishing space-like anticommutator. Bose fields are, of course, assumed to commute
at space-like separation with any Fermi field. If we now specialize to totally space-like
real Jost points, $\zeta_i = \xi_i$ (which the reader will recall are in the extended tube),
the rearrangement of the fields will give a factor $(-1)^P$, where $P$ is the number
of transpositions of Fermi fields needed to effect the rearrangement. Thus, at the
Jost points

$$W^{A_1B_1,A_2B_2,\ldots,A_nB_n}(\xi_1,\ldots,\xi_{n-1}) = (-1)^{P+2\sum_i A_i}\,W^{A_nB_n,A_{n-1}B_{n-1},\ldots,A_1B_1}(\xi_{n-1},\ldots,\xi_1) \tag{13.65}$$

By analytic continuation to the full extended tube, followed by the return to the real
boundary of the forward tube (exactly as above for TCP), we may therefore conclude
that for arbitrary real four-coordinates $x_i$, $i = 1, 2, \ldots, n$, the Wightman functions (now
written as a function of the full set of $n$ coordinates, rather than the $n - 1$ coordinate
differences) satisfy

$$W^{A_1B_1,\ldots,A_nB_n}(x_1,\ldots,x_n) = (-1)^{P+2\sum_i A_i}\,W^{A_nB_n,\ldots,A_1B_1}(-x_n,\ldots,-x_1) \tag{13.66}$$

For $n = 2$, and taking the field $\phi^{A_2B_2} = (\phi^{A_1B_1})^\dagger$ (so $A_2 = B_1$, $B_2 = A_1$, cf. Section
13.2) this implies, in operator language,

$$\langle 0|\phi^{A_1B_1}(x_1)(\phi^{A_1B_1})^\dagger(x_2)|0\rangle = (-1)^{P+2(A_1+B_1)}\,\langle 0|(\phi^{A_1B_1})^\dagger(-x_2)\phi^{A_1B_1}(-x_1)|0\rangle \tag{13.67}$$
Multiplying both sides of (13.67) by test functions $f(x_1)f^*(x_2)$ and integrating over
$x_1$, $x_2$, and noting that $(-1)^{2(A_1+B_1)} = (-1)^{2j}$, where $j$ is the spin carried by the field
$\phi^{A_1B_1}$, we obtain

$$\int f(x_1)\,\langle 0|\phi^{A_1B_1}(x_1)(\phi^{A_1B_1})^\dagger(x_2)|0\rangle\,f^*(x_2)\,d^4x_1\,d^4x_2 = (-1)^{P+2j}\int f^*(x_2)\,\langle 0|(\phi^{A_1B_1})^\dagger(-x_2)\phi^{A_1B_1}(-x_1)|0\rangle\,f(x_1)\,d^4x_2\,d^4x_1 \tag{13.68}$$

which amounts to

$$\Big\|\int f^*(x)\,(\phi^{A_1B_1}(x))^\dagger|0\rangle\,d^4x\Big\|^2 = (-1)^{P+2j}\,\Big\|\int f(-x)\,\phi^{A_1B_1}(x)|0\rangle\,d^4x\Big\|^2 \tag{13.69}$$

As the squared norms in (13.69) cannot vanish (if the smeared fields give zero acting
on the vacuum, the Wightman functions are zero and we have no theory!), we must
have P + 2j an even integer: in other words, Fermi fields (where P = 1) correspond
to half-integer spin, while Bose fields (P = 0) correspond to integer spin.

13.5 Problems
1. Let $P$ be the unitary operator

$$P = e^{-i\theta\int d^3k\,a^\dagger(\vec{k})\,(a(\vec{k}) + \lambda a(-\vec{k}))} \tag{13.70}$$

where $a(\vec{k})$ is the destruction operator for a spinless, self-conjugate boson. Find
$\theta$ and $\lambda$ (in terms of $\eta$) so that $P$ will be the parity operator for a particle of
intrinsic parity $\eta$:

$$P\,a(\vec{k})\,P^{-1} = \eta\,a(-\vec{k}) \tag{13.71}$$

(Hint: expand the left-hand side in a series of multiple commutators.)


(a) Use the result (13.30) for the charge-conjugation property of a general
covariant field to derive the corresponding result for a Dirac 4-spinor field
$\psi$, as constructed in Section 7.4:

$$\mathcal{C}\psi_n\mathcal{C}^{-1} = \zeta\,C_{nm}\bar\psi_m \tag{13.72}$$

where $C_{nm}$ is a numerical 4x4 matrix. Write out this matrix explicitly.
(b) Show that the matrix $C$ of part (a) has the following properties

$$C = -C^{-1} \tag{13.73}$$
$$C\gamma_\mu C = \gamma_\mu^T \tag{13.74}$$

(c) A four-vector current can be defined as follows

$$j_\mu(x) \equiv \frac{1}{2}[\bar\psi(x), \gamma_\mu\psi(x)] = \frac{1}{2}\sum_{nm}(\gamma_\mu)_{nm}(\bar\psi_n\psi_m - \psi_m\bar\psi_n) \tag{13.75}$$

Use the results of parts (a,b) above to establish that this current is odd under
charge conjugation:

$$\mathcal{C}j_\mu(x)\mathcal{C}^{-1} = -j_\mu(x) \tag{13.76}$$



(d) Show that the expression for $j_\mu(x)$ given in part (c) is equivalent to normal-ordering,
namely $j_\mu(x) = {:}\bar\psi(x)\gamma_\mu\psi(x){:}$ (where we are now dealing with
interaction-picture free fields, of course).
2. Verify, using the reflection and positivity results (13.13,13.26), the steps leading
from (13.41) to (13.42).
3. (a) Show that for a single real space-like vector $\rho$, there exists a complex Lorentz
transformation $\Lambda$ such that $\rho = \Lambda\zeta$, where $\zeta$ is in the forward tube (i.e.,
$\zeta = \xi - i\eta$ where $\eta$ is future time-like, $\eta^2 > 0$, $\eta^0 > 0$). (Hint: choose a coordinate
system in which $\rho^1 = \rho^2 = 0$, $|\rho^0| < \rho^3$. Then consider the transformation
(9.77), for suitable choice of $\theta$.)
(b) Now suppose the two real space-like vectors $\rho_1$, $\rho_2$ have space-like convex
hull. Show that they form a point in the extended tube $T_2$: i.e., there exists
a complex $\Lambda$ such that $\rho_1 = \Lambda\zeta_1$, $\rho_2 = \Lambda\zeta_2$, with $\zeta_1$, $\zeta_2$ in the forward tube.

4. Suppose a set of $n$ real points $(\rho_1, \rho_2, \ldots, \rho_n)$ is in the extended tube $T_n$. Show
that their convex hull (i.e., the set of spacetime points of form $\sum_i\lambda_i\rho_i$, with
$\sum_i\lambda_i = 1$, $0 \le \lambda_i \le 1$) consists entirely of space-like points.
14
Symmetries III: Global symmetries
in field theory

The symmetries discussed in the preceding two chapters have been primarily those
involving the transformation properties of relativistic field theories under the continu-
ous and discrete parts of the homogeneous Lorentz group. In this chapter our focus will
be on internal global symmetries: those symmetries involving spacetime-independent
transformations of the fields which leave the dynamics of the theory invariant (or
almost invariant, in the case of weakly broken global symmetries). We have already
seen some examples of such symmetries in the context of Noether’s theorem in
Section 12.4, where we considered symmetries under global phase transformations
(cf. (12.120)) forming a commutative (abelian) group, or more generally, symmetries
in which the transformation of a multiplet of fields under a linear (non-abelian) matrix
group, as in (12.126), leaves the Lagrangian invariant.
The term “global” here refers to the fact that the same phase or matrix trans-
formation is applied to the fields of the theory at all spacetime points: the far richer
physics that emerges when the symmetry is local, allowing arbitrary dependence of
the transformation on spacetime location, will be the subject of the next chapter.
The formal application of Noether’s theorem in the presence of a global symmetry
leads, as we saw in Section 12.4, to the existence of a conserved current $J^\mu_\alpha(x)$ for each
generator $t_\alpha$ of the global symmetry group, with the conserved charges associated with
each current,

$$Q_\alpha \equiv \int J^0_\alpha(x)\,d^3x \tag{14.1}$$

mimicking the commutation relations of the Lie algebra of the symmetry group:

$$[t_\alpha, t_\beta] = if_{\alpha\beta\gamma}t_\gamma \;\Rightarrow\; [Q_\alpha, Q_\beta] = if_{\alpha\beta\gamma}Q_\gamma \tag{14.2}$$
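
As a concrete illustration of the algebra on the left of (14.2), the following short Python sketch checks the commutation relations numerically in the simplest non-abelian case. The choice of group (SU(2)), the generators $t_a = \sigma_a/2$, the structure constants $f_{abc} = \epsilon_{abc}$, and all function names are assumptions made purely for illustration; none of them is taken from the text.

```python
# Illustrative check of [t_a, t_b] = i f_abc t_c for SU(2) (assumed example),
# with t_a = sigma_a / 2 and f_abc the Levi-Civita symbol.

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def commutator(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]

def levi_civita(a, b, c):
    """Totally antisymmetric symbol for indices taking the values 0, 1, 2."""
    return (a - b) * (b - c) * (c - a) / 2

# Generators t_a = sigma_a / 2 (Pauli matrices divided by two).
T = [
    [[0, 0.5], [0.5, 0]],
    [[0, -0.5j], [0.5j, 0]],
    [[0.5, 0], [0, -0.5]],
]

def algebra_closes(tol=1e-12):
    """Return True if every commutator matches i * f_abc * t_c entry by entry."""
    for a in range(3):
        for b in range(3):
            lhs = commutator(T[a], T[b])
            for i in range(2):
                for j in range(2):
                    rhs = sum(1j * levi_civita(a, b, c) * T[c][i][j]
                              for c in range(3))
                    if abs(lhs[i][j] - rhs) > tol:
                        return False
    return True

print(algebra_closes())
```

The same check applies to any finite-dimensional representation: only the list `T` and the structure constants need to be replaced.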

We emphasize that the application of Noether’s theorem, strictly speaking a classical
result relying on adequate smoothness properties of the (classical) action and fields, to
quantum field theories should be taken with a grain of salt: it is entirely possible for
quantum effects to destroy the classical conservation of the current(s) $J^\mu_\alpha(x)$, leading to
a non-zero divergence of order $\hbar$, $\partial_\mu J^\mu_\alpha(x) = O(\hbar)$, a symmetry violation, or anomaly,
entirely invisible at the classical level. Typically, anomalies arise because of delicacies
in the construction of well-defined composite operators (implementing the desired
symmetry transformations) for the currents $J^\mu_\alpha(x)$ once the theory is quantized. From
the functional integral point of view, anomalies arise when the functional change
of variables needed in the demonstration of Noether’s theorem in the path-integral
formalism generates a non-trivial Jacobian once the functional integral is properly
regularized. We shall postpone a detailed discussion of anomalies to Chapter 15, as
the important cases typically involve interacting local gauge fields, which we introduce
and study there, even though the anomalous symmetry may itself be only a global one.
Putting aside for the time being the possibility of quantum-breaking of a global
symmetry via anomalies, an exact global symmetry (at the Lagrangian level) may
be realized in two distinct ways, depending on the transformation properties of the
ground state (in field theory, the vacuum) of the theory under the symmetry. Let us
suppose that the dynamics is exactly invariant under a symmetry group G, with a Lie
algebra spanned by the set of generators $t_\alpha$, $\alpha = 1, 2, \ldots, n$, with corresponding conserved
charges $Q_\alpha$. Next, let us imagine that the generators $t_\alpha$ are chosen as appropriate linear
combinations so that some subset of the corresponding charges, $Q_\alpha$, $\alpha = 1, 2, \ldots, m \le n$,
annihilates the physical vacuum

$$Q_\alpha|0\rangle = 0 \tag{14.3}$$

In fact, the set of such generators must itself span the Lie algebra of a subgroup $H \subset G$,
as $Q_\alpha|0\rangle = Q_\beta|0\rangle = 0 \Rightarrow [Q_\alpha, Q_\beta]|0\rangle = 0$. In this case, we say that the symmetry group
$H$ is realized in the Wigner–Weyl mode, while the generators $Q_\alpha$, $\alpha = m + 1, \ldots, n$,
under which the vacuum is not invariant correspond to the Nambu–Goldstone mode of
symmetry realization. Since, by assumption, the full symmetry group G is preserved
by the dynamics, the broken generators still commute with the Hamiltonian, and
therefore lead (when exponentiated and applied to any minimum-energy vacuum state)
to new states which are also minimum-energy states: i.e., the vacuum is degenerate.
This is precisely the situation discussed previously in Chapter 8 under the heading
“spontaneous symmetry-breaking”.
By contrast, the surviving Wigner–Weyl symmetry group implies the existence
of degenerate multiplets of particle states, which must span finite-dimensional repre-
sentations of H: in this case the existence of the underlying symmetry is manifestly
visible in the spectrum of the theory and in symmetry constraints on the transition
amplitudes of the theory (as in the case of isospin symmetry, for example).1 For the
broken generators, the non-invariance of the ground state of the theory transfers to
complicated transformation properties of the multi-particle states built on it, and
the existence of an underlying exact dynamical symmetry can be far from obvious
phenomenologically. However, the spontaneous breaking of an exact global symmetry
leaves a remarkable phenomenological residue which can scarcely be missed: the
appearance of an exactly massless particle, called a “Goldstone boson”, for each
broken generator of the original exact global group G. We shall prove this result—the
Goldstone–Salam–Weinberg theorem—in Section 14.2.
In fact, for reasons to be discussed in Section 14.1, exact global symmetries (which
are not associated with a local gauge symmetry) are very rare in Nature: indeed, it

1 Of course, we may have m = 0, in which case H is null, the entire group G is spontaneously broken,
and we have only the Nambu–Goldstone mode of realization, or m = n, in which case H = G, and the
entire symmetry is realized in Wigner–Weyl mode.

is possible that there are none at all! Instead, we find many examples of approximate
global symmetries, in which the breaking of the symmetry is in some (appropriate)
sense small, allowing us to exploit the symmetry by taking it to be exact at zeroth
order, and treating the effects of the symmetry-breaking perturbatively. The absence
of exactly zero-mass Goldstone bosons in Nature also suggests that there are no
dynamically exact (with exactly conserved charges) but spontaneously broken global
symmetries. However, the chiral symmetry of quantum chromodynamics provides a
clear example of a global symmetry which is (a) weakly broken by explicit non-
symmetric terms in the Lagrangian, and (b) spontaneously broken by the vacuum.
In this situation, one finds in place of the exactly zero-mass Goldstone bosons which
would necessarily appear if the small symmetry-breaking terms in the Lagrangian were
turned off, “light” spinless particles—pseudo-Goldstone bosons—associated with each
spontaneously broken generator of the chiral group.2 These light particles are just
the pions, whose squared masses are just 2% of the squared mass of the proton, for
example. The diagnostic techniques available for determining the presence or absence
of spontaneous breaking of a global symmetry will be the subject of Section 14.3:
it turns out that the concept of the effective action introduced as a graph theoretic
concept in Chapter 10 plays an essential role here.

14.1 Exact global symmetries are rare!


Until the advent of Grand Unified Theories (GUTs) in the mid-1970s, the exact
conservation of global quantum numbers such as baryon number (B) and lepton
number (L) was considered an obvious feature of elementary particle interactions.
In the Standard Model of electroweak interactions, for example, the particle content
and gauge dynamics of the theory guarantee invariance of the Lagrangian with respect
to phase transformations assigning quantum numbers $B = \frac{1}{3}$ to quarks, $B = 0$ to
gluons and leptons, $L = 1$ to leptons (electrons, muons, and taus, and their associated
neutrinos), and $L = 0$ to quarks and gluons. The classical Noether analysis of Section
12.4 then assures existence of divergenceless currents $J^\mu_B$ and $J^\mu_L$, with associated
exactly conserved charges (B and L respectively).
Even within the Standard Model, however, quantum effects produce an anomaly
resulting in explicit violation of the conservation of the current $J^\mu_B + J^\mu_L$ corresponding
to the quantum number B + L. We defer discussion and derivation of these anomalies to Section 15.6:
for our present purposes, it suffices to point out that the structure of the anomalous
divergence of the B + L current results in extremely small transition amplitudes
violating B + L conservation, of non-perturbative order $O(e^{-C/\alpha_{wk}})$, where $\alpha_{wk}$ is
the weak fine-structure constant (of similar magnitude to $\alpha_{em} = 1/137$), as originally
pointed out by ’t Hooft. Such violations are so small as to be negligible, for all practical
purposes.
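
To give a feeling for what "non-perturbative order" means here, the following Python sketch compares a factor of the form $e^{-C/\alpha}$ with a fixed power $\alpha^N$ as the coupling is reduced. The constant $C$ is not specified in the text, so the value $C = 1$ used below is a pure placeholder, as are the sample couplings and the function names; the only point illustrated is that the exponential factor falls off faster than any fixed power of $\alpha$ as $\alpha \to 0$.

```python
import math

def nonperturbative_factor(alpha, c=1.0):
    """Return exp(-c / alpha); c = 1.0 is a placeholder, not a value from the text."""
    return math.exp(-c / alpha)

def ratio_to_power(alpha, order, c=1.0):
    """Ratio exp(-c / alpha) / alpha**order for a fixed perturbative order."""
    return nonperturbative_factor(alpha, c) / alpha ** order

# For a fixed order, the ratio shrinks rapidly as the coupling decreases.
for alpha in (0.1, 0.05, 0.01):
    print(alpha, ratio_to_power(alpha, order=5))

# With the magnitude 1/137 quoted in the text (and the placeholder c = 1):
print(nonperturbative_factor(1.0 / 137.0))
```

For a fixed coupling the comparison depends on the order chosen, so the statement is an asymptotic one in $\alpha$, not a term-by-term inequality.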
However, there is considerable circumstantial evidence to suggest that much
larger—indeed, potentially observable—violations of both baryon and lepton number
conservation do indeed occur in Nature. Perhaps the most convincing argument for
such violations comes from modern cosmology. The observed asymmetry in the density

2 For an explicit example of a pseudo-Goldstone boson, see Problem 4(b) in Chapter 8.



of matter with respect to antimatter in the visible Universe requires either that we
introduce the asymmetry in an ad hoc fashion as an initial condition at some early
point in the evolution of the Universe following the Big Bang, while retaining exact
conservation (ignoring the effects of exponentially small anomalous violations), or, as
assumed by essentially all present practitioners of early Universe cosmology, we must
introduce a small breaking of baryon number (say) to account for the fact that the
ratio of baryon to photon number densities $n_B/n_\gamma$ at the epoch of nucleosynthesis is on
the order of $10^{-8}$.
In fact, the structure of local quantum field theories, and in particular theories
based on local gauge interactions, already implies that exact conservation of global
symmetries is extremely fragile, and easily subject to violation on several grounds. This
is in contradistinction to the presumed exact character of the local gauge symmetries
described in the next chapter, both in the Standard Model and in Grand Unified
(or even, superstring) extensions thereof. These symmetries may undergo spontaneous
breaking, but are as far as we know exact symmetries at the dynamical level.
There are several reasons for the apparent fragility of global symmetries in local
quantum field theories. Here is a (partial) list:

1. The dynamics of local field theories, specified in terms of a Lagrangian functional,


for example, is dependent in form and structure on the distance (or energy) scales
over which we wish to provide an accurate description of the transition amplitudes
of the theory. A detailed understanding of the nature of this dependence will
require the concepts embodied in the renormalization group which will be the
central topic in Part 4 of this book. It will be shown there that Lagrangian field
theories, treated so far as fundamental descriptions of the exact microphysics of
some set of interacting particles, should rather be regarded as provisional effective
descriptions of relativistic quantum phenomena, valid (to within a specifiable level
of accuracy) in some range of energy scales. As we raise the energy horizon to
which we expect our theory to hold, new microphysics typically emerges in the
form of new degrees of freedom (fields/particles) with a new effective Lagrangian
expressing the local interactions (valid down to the new lower-distance scale)
of these new fields. The “ultimate” theory, if any, which emerges when this
procedure is continued, say to or beyond the Planck energy scale characteristic
of quantum gravity, may not even be a local quantum field theory at all: in the
view of many particle physicists, it is a theory in which the basic entities may be
one or higher-dimensional objects (strings, membranes, etc.).
To the extent that the B and L (baryon and lepton) global symmetries
of the Standard Model (SM) are “accidental”, arising out of the paucity of
renormalizable terms which can be included in the Standard Model Lagrangian
given the assumed particle/field content of the theory, extensions of the Standard
Model (BSM models, for “beyond the Standard Model”), constrained only by
the requirement that they yield SM physics at sub-TeV energy scales will
typically include Lagrangian interactions which violate the global symmetries
of the Standard Model. The reason for this is simply that these symmetries,
unlike the underlying local gauge symmetries which incorporate the essential
physics of the Standard Model, are not imposed from the outset, or required

by deep physical principles, but simply an accidental consequence of the limited


number of low-mass fields (and gauge-symmetric interactions of such fields)
needed to embody Standard Model physics at the sub-TeV scale. A concrete
example is provided by Grand Unified Theories such as the SU(5) extension
of the Standard Model, where explicit B and L violating terms automatically
appear due to the enlarged particle content (although the combination B − L
is still preserved). In supersymmetric extensions of the Standard Model, the
proliferation of particle species makes it even more likely that global symmetries
of the SM (even the resistant B − L symmetry) are explicitly violated by new
terms in the supersymmetric Lagrangian of the extended theory. As we shall see in
Part 4, the size of the symmetry violating contributions to transition amplitudes
at a low-energy scale $E$ is typically suppressed by a factor $(E/M)^n$, where $M$ is
the characteristic mass/energy scale of the extended Lagrangian and the global
symmetry violating terms correspond to operators in the low-energy effective
Lagrangian of mass dimension 4 + n.
2. A global symmetry of an effective Lagrangian at a low-energy scale may be
promoted to a local symmetry of an extended Lagrangian describing the physics
at a higher-energy scale, and then undergo spontaneous breaking, so that the
symmetry is realized in the Nambu–Goldstone mode. This is the case, for exam-
ple, with the global B − L symmetry of the Standard Model and its SU(5) Grand
Unified extension, when embedded in an SO(10) Grand Unified gauge theory. The
global B − L charge in the latter theory is (with the standard choice of gauge
and fermion representations) then associated with a local gauge symmetry and a
massive gauge boson (via the Higgs effect; cf. Chapter 15). The phenomenological
consequences are similar to those arising in the case of explicit breaking: the
transition amplitudes of the theory display B − L violating processes at a level
suppressed by inverse powers of the characteristic mass/energy of the new physics.
3. As discussed previously, depending on the precise set of representations (particle
content) of the theory, quantum anomalies can arise which result in violations of
global Noether symmetries in theories with gauge (or gravitational) interactions.
Although the non-conservation is present and manifest in perturbation theory,
the manifestation of the symmetry violation in actual S-matrix elements of the
theory may involve exponentially suppressed (in the inverse gauge coupling) non-
perturbative tunneling processes.
4. Quantum gravity effects seem to lead to a universal mechanism for the breaking
of global symmetries, at a level involving inverse powers of the Planck mass,
the characteristic scale of quantum gravity. Classical “no-hair” theorems assert
the impossibility of associating global quantum numbers with a black hole.
On the other hand, once quantum effects are turned on, black holes become
unstable to decay via Hawking radiation. One can therefore imagine dumping
baryons into a small black hole and having the baryon energy dissipated in
Hawking radiation of photons or leptons. From an effective field theory point
of view, the corresponding process can be represented by non-renormalizable
effective operators with strength proportional to inverse powers of the Planck
mass. Certain field theories coupled to gravity (Rey, 1989) give rise to wormhole
instanton solutions where, as with black holes, global charge can also disappear

through the wormhole. The conventional wisdom (see (Peccei, 1988), Section 2.7)
suggests that gravitational effects inevitably destroy global charge conservation.
The present author is an agnostic on this issue, and would prefer to wait for
a manifestly consistent quantum gravitational description of black holes and/or
wormholes before pronouncing a definitive verdict.
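
The power-law suppression $(E/M)^n$ described in item 1 of the list above (and the analogous inverse powers of the Planck mass in item 4) can be made concrete with a few lines of Python. The numbers used are illustrative assumptions only: a probe energy of 1 GeV, a heavy scale of $10^{16}$ GeV of the kind often associated with grand unification, and a Planck mass of about $1.22 \times 10^{19}$ GeV. The function name is hypothetical.

```python
def suppression(energy_gev, heavy_scale_gev, n):
    """Return (E/M)**n, the generic suppression of an effective operator
    of mass dimension 4 + n at energy E when the new physics sits at scale M."""
    if heavy_scale_gev <= 0:
        raise ValueError("heavy scale must be positive")
    return (energy_gev / heavy_scale_gev) ** n

E = 1.0              # assumed low-energy scale, in GeV
M_GUT = 1.0e16       # assumed heavy scale, in GeV (illustrative)
M_PLANCK = 1.22e19   # Planck mass, in GeV (approximate)

for n in (1, 2):
    print("n =", n,
          "heavy-scale suppression:", suppression(E, M_GUT, n),
          "Planck suppression:", suppression(E, M_PLANCK, n))
```

Operators of mass dimension six correspond to $n = 2$, so with these assumed scales the amplitude is reduced by roughly thirty-two orders of magnitude, which is why such violations are so hard to observe at accessible energies.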

14.2 Spontaneous breaking of global symmetries: the Goldstone theorem
In Section 8.4 we studied the effect of the spontaneous symmetry-breaking in a scalar
field theory with a triplet φ of equal mass scalar fields, in which the global symmetry
group G = O(3) of the Hamiltonian (or Lagrangian) is broken by the vacuum of
the theory to a subgroup H = O(2), leading to the appearance of two massless
scalar particles corresponding to the two broken generators of G. The appearance
of massless particles as the consequence of vacuum-breaking of an exact continuous
global symmetry of the theory is a very general consequence of basic covariance and
spectral properties of field theory: the theorem establishing this result goes back to a
seminal paper of Goldstone, Salam, and Weinberg (Goldstone et al., 1962). Later work
by Kastler, Robinson, and Swieca established rigorously, using methods of axiomatic
field theory (cf. Section 9.2), that in a theory with a mass gap (in which the lowest-mass
single-particle state corresponds to non-zero mass), satisfying the usual spectral and
Poincaré invariance axioms, spontaneous symmetry-breaking is impossible: an exact
global symmetry of the Lagrangian must also be preserved by the vacuum. The Ruelle
Clustering theorem of Section 9.2, in the stronger form available for theories with a
mass gap, plays a critical role in establishing the spatial asymptotics needed in the
proof of Kastler et al. (Kastler et al., 1966). In this section we will describe the Gold-
stone theorem and outline (in a non-rigorous fashion) its proof from various angles,
referring the reader to the aforementioned paper for a completely rigorous treatment.
Suppose that we have a (set of) four-vector conserved currents $J^\mu_\alpha(x)$, $\partial_\mu J^\mu_\alpha = 0$, in
a theory with a mass gap. Assuming the fields of the theory to transform covariantly
under the HLG, the current(s) must satisfy

$$U^\dagger(\Lambda)J^\mu_\alpha(x)U(\Lambda) = \Lambda^\mu{}_\nu J^\nu_\alpha(\Lambda^{-1}x) \tag{14.4}$$

where $\Lambda$ is an arbitrary Lorentz transformation. The vanishing of the four-divergence
of $J^\mu_\alpha$ ensures in the usual way that the associated charge

$$Q_\alpha(t) \equiv \int J^0_\alpha(\vec{x}, t)\,d^3x \tag{14.5}$$

is time-independent (and in fact, clearly commutes with the full energy-momentum
four-vector $P^\mu$ of the theory). What is less obvious is that it also guarantees that
$Q_\alpha(t) = Q_\alpha$ is invariant under the HLG: $U^\dagger(\Lambda)Q_\alpha U(\Lambda) = Q_\alpha$. This is best seen by
first assuming $\Lambda$ to be an infinitesimal Lorentz transformation: the general result for
finite $\Lambda$s then follows by exponentiation of the infinitesimal case in the usual way.
Thus, let $\Lambda^\mu{}_\nu = g^\mu{}_\nu + \omega^\mu{}_\nu$ with $\omega^{\mu\nu} = -\omega^{\nu\mu}$ and $\omega$ infinitesimal. To first order in $\omega$,
we easily find, using (14.4) and (14.5),

$$U^\dagger(\Lambda)Q_\alpha U(\Lambda) = Q_\alpha + \int\left(\omega^0{}_\nu J^\nu_\alpha(x) - \omega^\mu{}_\rho x^\rho\partial_\mu J^0_\alpha(x)\right)d^3x + O(\omega^2) \tag{14.6}$$

The $\rho = 0$ contribution to the second term in the integral is proportional to a total
spatial derivative and hence vanishes: $\int x^0\partial_i J^0_\alpha(x)\,d^3x = 0$, as does the contribution
from spatial indices $\rho = i$, $\mu = j$, proportional to $\omega^j{}_i\int x^i\partial_j J^0_\alpha(x)\,d^3x \propto \delta_{ij}\omega^{ij} = 0$, by
integration by parts. Finally, the contribution from $\rho = i$, $\mu = 0$ becomes, using current
conservation,

$$\begin{aligned}
-\omega^0{}_i\int x^i\partial_0 J^0_\alpha(x)\,d^3x &= \omega^0{}_i\int x^i\partial_j J^j_\alpha(x)\,d^3x \\
&= -\int\omega^0{}_i J^i_\alpha(x)\,d^3x \\
&= -\int\omega^0{}_\nu J^\nu_\alpha(x)\,d^3x
\end{aligned} \tag{14.7}$$

cancelling the first term in the integral in (14.6) and giving the desired result

$$U^\dagger(\Lambda)Q_\alpha U(\Lambda) = Q_\alpha \;\Rightarrow\; [U(\Lambda), Q_\alpha] = 0 \tag{14.8}$$

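If we now define a (set of) states $|\alpha\rangle \equiv Q_\alpha|0\rangle$, obtained by applying the charges $Q_\alpha$
to the Lorentz-invariant physical vacuum $|0\rangle$, $U(\Lambda)|0\rangle = |0\rangle$, of the theory, we conclude
that since $U(\Lambda)|\alpha\rangle = |\alpha\rangle$,

$$\langle\alpha|J^\mu_\alpha(x)|0\rangle = \langle\alpha|U^\dagger(\Lambda)J^\mu_\alpha(x)U(\Lambda)|0\rangle = \Lambda^\mu{}_\nu\langle\alpha|J^\nu_\alpha(\Lambda^{-1}x)|0\rangle \tag{14.9}$$

But the matrix elements in (14.9) can clearly be translated to $x = 0$, as the states
$|\alpha\rangle$ have zero energy-momentum (as mentioned above, $[P^\mu, Q_\alpha] = 0$), allowing us to
conclude

$$\langle\alpha|J^\mu_\alpha(0)|0\rangle = \langle\alpha|J^\mu_\alpha(x)|0\rangle = 0 \tag{14.10}$$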
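
The vanishing in (14.10) rests on a simple fact: once the matrix element has been translated to $x = 0$, (14.9) says that the four numbers $v^\mu \equiv \langle\alpha|J^\mu_\alpha(0)|0\rangle$ satisfy $v^\mu = \Lambda^\mu{}_\nu v^\nu$ for every Lorentz transformation, and the only such four-vector is zero. The following Python sketch checks the key step numerically for a boost along a single axis $k$: on the $(v^0, v^k)$ block the matrix $\Lambda - 1$ has non-zero determinant for any non-zero rapidity, so $v^0 = v^k = 0$, and repeating the argument for $k = 1, 2, 3$ removes every component. The function names and sample rapidities are illustrative assumptions.

```python
import math

def boost_block(rapidity):
    """2x2 block of a Lorentz boost acting on the components (v^0, v^k)."""
    c, s = math.cosh(rapidity), math.sinh(rapidity)
    return [[c, s], [s, c]]

def invariance_determinant(rapidity):
    """det(Lambda - 1) on the (v^0, v^k) block.

    A non-zero value means that (Lambda - 1) v = 0 forces v^0 = v^k = 0."""
    (a, b), (c, d) = boost_block(rapidity)
    return (a - 1.0) * (d - 1.0) - b * c

for eta in (0.1, 0.7, 2.0):
    # Analytically the determinant equals 2 - 2*cosh(eta), zero only at eta = 0.
    print(eta, invariance_determinant(eta), 2.0 - 2.0 * math.cosh(eta))
```

The sign convention chosen for the off-diagonal boost entries does not matter here, since only their product enters the determinant.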

Integrating (14.10) over all space then yields immediately

$$\int\langle\alpha|J^0_\alpha(x)|0\rangle\,d^3x = \langle\alpha|\alpha\rangle = 0 \tag{14.11}$$

which, by positive-definiteness of the Hilbert space, implies $|\alpha\rangle = Q_\alpha|0\rangle = 0$, so that
the symmetry is in fact preserved by the vacuum state. The necessity for a mass gap
in the preceding argument (first given in (Goldstone et al., 1962)) is not immediately
clear: it turns out that suitable asymptotic spatial convergence of the integrals defining
the $|\alpha\rangle$ states in (14.11) is absent in theories with zero-mass particles. The arguments
of Kastler et al. (Kastler et al., 1966) fill in the required rigorous asymptotic bounds
needed for a proper proof. We should also emphasize here that the argument fails in
any theory with local gauge interactions:3 as we shall see in the next chapter, in such

3 Note that, as we shall see in our discussion of the Higgs phenomenon, it is perfectly possible for a
theory with exact dynamical local gauge symmetry to have no massless particles—one simply arranges for
the symmetry group G to be completely broken by the vacuum state, leading to a theory with a non-zero-

theories, the theory must be quantized either in a non-covariant “physical” gauge, with
a positive-definite Hilbert space, or in a covariant (“Gupta–Bleuler”) gauge in which
the Hilbert space is not positive-definite (even though the negative or zero metric
unphysical modes can be shown to decouple from physical transition amplitudes). As
both explicit covariance of the fields and positive-definiteness of the Hilbert space are
ingredients of the reasoning leading to (14.11), we must conclude that spontaneous
breaking in theories with exact local gauge symmetries has distinctive features not
present in the global case (in particular, we cannot expect a massless Goldstone boson
to appear). In fact, we shall see that the peculiarities (generally dubbed the “Higgs
mechanism”) of theories with both local gauge symmetry and spontaneous symmetry-
breaking, to be discussed in Section 15.7, play a central role in the electroweak sector
of the Standard Model of elementary particle interactions.
The physics underlying the appearance of massless particles when a continuous
global symmetry is broken by the vacuum becomes clearer if we approach the problem
from another angle. Let us assume that our theory admits an exact continuous global
symmetry $G$ (which could be abelian or non-abelian) yielding Noether charges $Q_\alpha$
which generate the appropriate infinitesimal transformations on a finite set of local
(or almost local) fields $\phi_n(x)$ spanning a representation of $G$ with matrix generators
$t_\alpha$ (cf. Section 12.4, Example 5):

$$[Q_\alpha, \phi_n(x)] = -(t_\alpha)_{nm}\phi_m(x) \tag{14.12}$$

The fields $\phi_n(x)$ here may be elementary or composite: we require only that they fill
out a finite-dimensional representation of the global symmetry group $G$. If the vacuum
is invariant under $G$, i.e., $Q_\alpha|0\rangle = 0$, $\forall\alpha$, we must have

$$\langle 0|[Q_\alpha, \phi_n(x)]|0\rangle = 0 \;\Rightarrow\; (t_\alpha)_{nm}\langle 0|\phi_m(x)|0\rangle = (t_\alpha)_{nm}\langle 0|\phi_m(0)|0\rangle = 0,\ \forall\alpha \tag{14.13}$$

On the other hand, if there are generator(s) $t_\alpha$ which do not annihilate the vector of
vacuum expectation values,

$$N_{\alpha n} \equiv (t_\alpha)_{nm}\langle 0|\phi_m(0)|0\rangle \neq 0 \tag{14.14}$$

the symmetry generated by $Q_\alpha$ is spontaneously broken, and there must be a zero-mass
particle in the theory.
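
The criterion (14.14) is easy to apply in the O(3) example recalled at the start of this section. The Python sketch below takes the three generators of rotations in the vector representation, $(t_\alpha)_{nm} = -i\epsilon_{\alpha nm}$, and a vacuum expectation value pointing along the third field direction, and lists the generators for which $N_{\alpha n}$ is non-zero. The direction and normalization of the expectation value, the index convention (0, 1, 2 in place of 1, 2, 3), and the function names are assumptions made for illustration.

```python
def levi_civita(a, b, c):
    """Totally antisymmetric symbol for indices taking the values 0, 1, 2."""
    return (a - b) * (b - c) * (c - a) // 2

def generator(alpha):
    """(t_alpha)_{nm} = -i * epsilon_{alpha n m}: rotation generators acting on a triplet."""
    return [[-1j * levi_civita(alpha, n, m) for m in range(3)] for n in range(3)]

def order_parameter(alpha, vev):
    """N_{alpha n} = (t_alpha)_{nm} <phi_m>, as in (14.14)."""
    t = generator(alpha)
    return [sum(t[n][m] * vev[m] for m in range(3)) for n in range(3)]

def broken_generators(vev, tol=1e-12):
    """Indices alpha of the generators that do not annihilate the vev."""
    return [alpha for alpha in range(3)
            if any(abs(x) > tol for x in order_parameter(alpha, vev))]

vev = [0.0, 0.0, 1.0]   # assumed direction of the vacuum expectation value
print(broken_generators(vev))
```

With this choice two generators fail to annihilate the expectation value and one (the rotation about the direction of the expectation value) survives, matching the counting of two massless particles for O(3) broken to O(2).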
In fact, a simple spectral argument of Källén–Lehmann type (cf. Section 9.5;
see also (Itzykson and Zuber, 1980), p. 520), shows that the conserved current $J^\mu_\alpha$
itself acts as an interpolating field for this “Goldstone particle”. Inserting a complete
set of states $|n\rangle$, $P^\mu|n\rangle = P^\mu_n|n\rangle$, in the vacuum expectation value of the commutator
$\langle 0|[Q_\alpha(t), \phi_n(0)]|0\rangle = N_{\alpha n}$, with the (conserved!) charge chosen at a definite time $t$,
$Q_\alpha(t) = \int J^0_\alpha(\vec{x}, t)\,d^3x$, we find,

mass gap. In such cases, the possibility of evasion of the Goldstone theorem arises from the indicated mutual
incompatibility of manifest covariance and positive-definiteness in the local gauge case.

$$\begin{aligned}
N_{\alpha n} &= \sum_n\int\left\{\langle 0|J^0_\alpha(0)|n\rangle\langle n|\phi_n(0)|0\rangle e^{-iP_n\cdot x} - \langle 0|\phi_n(0)|n\rangle\langle n|J^0_\alpha(0)|0\rangle e^{iP_n\cdot x}\right\}d^3x \\
&= \sum_{n\neq|0\rangle}(2\pi)^3\delta^3(\vec{P}_n)\left\{\langle 0|J^0_\alpha(0)|n\rangle\langle n|\phi_n(0)|0\rangle e^{-iE_nt} - \langle 0|\phi_n(0)|n\rangle\langle n|J^0_\alpha(0)|0\rangle e^{iE_nt}\right\} \\
&\neq 0,\ \forall t
\end{aligned} \tag{14.15}$$

Note that the physical vacuum does not appear in the sum over states $|n\rangle$ in
(14.15), as the two terms in brackets cancel in this case. In fact, the charge $Q_\alpha(t)$
is time-independent by assumption, so differentiating with respect to time, we must
have, for all $t$,

$$\sum_{n\neq|0\rangle}\delta^3(\vec{P}_n)\,E_n\left\{\langle 0|J^0_\alpha(0)|n\rangle\langle n|\phi_n(0)|0\rangle e^{-iE_nt} + \langle 0|\phi_n(0)|n\rangle\langle n|J^0_\alpha(0)|0\rangle e^{iE_nt}\right\} = 0 \tag{14.16}$$

The conditions (14.15, 14.16) together imply that there must be non-vacuum states
$|n\rangle$ for which $\langle 0|J^0_\alpha(0)|n\rangle \neq 0$ and $\langle n|\phi_n(0)|0\rangle \neq 0$, and that the energy $E_n$ of such
states must vanish whenever the spatial momentum $\vec{P}_n$ of the state is zero. Such states can
only be single-particle zero-mass states, and the requirement $\langle 0|J^0_\alpha(0)|n\rangle \neq 0$ is simply the statement
that the current $J^\mu_\alpha$ is an interpolating field for the corresponding Goldstone particle.
If the components of Jαμ are bosonic fields, the particle is a Goldstone boson. However,
note that in supersymmetric theories, we can have global Noether currents of fermionic
type (recall (12.154)), and the corresponding Goldstone mode will be a fermion, if the
symmetry is spontaneously broken.

14.3 Spontaneous breaking of global symmetries: dynamical aspects
The explicit examples of spontaneous symmetry-breaking (SSB) in self-interacting
scalar field theories described in Section 8.4 make the connection between the ener-
getics of the theory—specifically, the properties of the lowest-energy state—and the
presence or absence of SSB clear. In this section we shall discuss some useful diagnostic
tools for determining whether SSB is present in the context of specific Lagrangian field
theories. In contrast to the approach followed in Section 8.4, we shall primarily be using
the (Euclidean) functional integral version of the quantized field theory. The physics of
spontaneous symmetry-breaking, as we shall see, is essentially long-distance physics:
the essential phenomena appear in the limit where the spatial volume of the system
tends to infinity. We shall implicitly assume that a short-distance regularization of
the theory is in place (e.g., a spacetime lattice) throughout the discussion, with the
functional integral defined initially in a spacetime box of volume Ω = V T , where V is
the spatial volume and T the (imaginary) time extent of the system. The energetics
of the system relevant to deciding on the existence of SSB then turns out to be
conveniently encoded in the effective action functional Γ[φ] introduced in Section 10.4,
or rather, on its specialization to constant fields V (φ) ≡ Γ[φ(x) = φ] called the effective
potential. For simplicity, we shall work with a theory of a single real scalar field, with
Euclidean action $S[\phi] = \int \left( \frac{1}{2}(\partial_\mu \phi)^2 + P(\phi) \right) d^4x$, where $P(\phi)$ is a polynomial in φ.

The effective action Γ[φ] was defined in Section 10.4 as the functional Legendre
transform of the generating functional W [j] of connected Green functions:
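\[
Z[j] = \int D\phi\, e^{-\left(S[\phi] - \int j(x)\phi(x)\,d^4x\right)} \tag{14.17}
\]
\[
W[j] = \ln\left(Z[j]/Z[0]\right) \tag{14.18}
\]
\[
\Gamma[\phi] = \int j(x)\phi(x)\,d^4x - W[j], \qquad \phi(x) = \frac{\delta W[j]}{\delta j(x)} \tag{14.19}
\]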
The existence of the Legendre transform as defined in (14.19) presupposes that the
functional relation $\phi(x) = \delta W[j]/\delta j(x)$ can be uniquely inverted to allow us to eliminate the
source function j(x) in favor of the classical field φ(x) in Γ[φ]. Precisely this supposition
becomes problematical in the situations in which spontaneous symmetry-breaking
occurs, as we shall now see: instead, we shall introduce a more general definition
of the Legendre transform, which (a) agrees with (14.19) in the absence of SSB, (b)
is mathematically well-defined even in theories exhibiting SSB, and (c) is physically
more directly related to the underlying energetic effects responsible for inducing SSB.
The basic idea is best exhibited in a simple toy model which we have already
exploited in our discussion of Borel summability in Chapter 11: the zero-dimensional
“field theory” obtained by setting all non-zero momentum modes of the field to zero, so
that the path integral in (14.17) becomes a one-dimensional integral over the constant
value of the field φ, in the presence of a constant source j(x) = j:

\[
Z(j) = \int e^{\Omega(j\phi - P(\phi))}\, d\phi, \qquad \exp\left(\Omega W(j)\right) = Z(j)/Z(0) \tag{14.20}
\]

In the limit of infinite spacetime volume Ω, the integral is dominated by the point or
points at which the maximal value of the exponent is achieved. In the examples to be
considered below there will be a unique such point, and we have simply (for Ω → ∞)

\[
W(j) = \sup_\phi \left(j\phi - P(\phi)\right) - \sup_\phi \left(-P(\phi)\right) = \sup_\phi \left(j\phi - P(\phi)\right) + \inf_\phi P(\phi) \tag{14.21}
\]

For convenience, we shall choose the field polynomial P(φ) so that the second term
on the right vanishes, and we have simply

\[
W(j) = \sup_\phi \left(j\phi - P(\phi)\right) \tag{14.22}
\]

For example, we may take the scalar theory discussed in Section 8.4, with either
\[
P_+(\phi) = \frac{1}{2} m^2 \phi^2 + \frac{\lambda}{4!} \phi^4 \tag{14.23}
\]
with no SSB, or the theory with a negative sign in the mass term,
\[
P_-(\phi) = -\frac{1}{2} m^2 \phi^2 + \frac{\lambda}{4!} \phi^4 + \frac{3 m^4}{2\lambda} = \frac{\lambda}{4!} (\phi^2 - v^2)^2, \qquad v^2 \equiv \frac{6}{\lambda} m^2 \tag{14.24}
\]
Taking first the positive-sign case, one easily sees that for any real j there is a
unique maximum of $j\phi - P(\phi)$ when $\phi = vx$, $v \equiv \sqrt{6/\lambda}\, m$, $x^3 + x - \frac{6j}{\lambda v^3} = 0$, with the
cubic equation having only one real root. The location of the root can be found readily
for small j, and we find that the solution for φ depends analytically on j, with
\[
\phi = \frac{6}{\lambda v^2}\, j - \frac{216}{\lambda^3 v^8}\, j^3 + O(j^5) \tag{14.25}
\]
and
\[
W_+(j) = \frac{3}{\lambda v^2}\, j^2 - \frac{54}{\lambda^3 v^8}\, j^4 + O(j^6) \tag{14.26}
\]
Evidently, in this case, $W_+(j)$ is everywhere differentiable, and the relation $\phi = W_+'(j)$
reduces to the cubic equation above, with a unique 1-1 mapping between φ and j. The
Legendre transform $\Gamma_+(\phi)$ of $W_+(j)$ can therefore be constructed in the usual fashion
(as in (14.19)), and we find
\[
\Gamma_+(\phi) = j\phi - W_+(j) = \frac{1}{2} m^2 \phi^2 + \frac{\lambda}{4!} \phi^4 = P_+(\phi) \tag{14.27}
\]
which is just the classical action in this constant-field model (S(φ) → P+ (φ)). There
are no loop graphs in this model, as there are no non-zero momenta to integrate over,
so this agrees with our discovery in Section 10.4 that the effective action reduces to
the classical Lagrangian at tree level.
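
The statements above for the positive-sign case can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes the illustrative parameter values m = 1, λ = 1.5 (so v = 2), and the helper names `phi_of_j`, `w_plus`, `w_plus_series` and `gamma_plus` are our own. It solves j = P₊′(φ) by bisection, compares W₊(j) with the truncated series (14.26), and confirms that the conventional Legendre transform returns P₊(φ), as in (14.27).

```python
import math

M, LAM = 1.0, 1.5                 # illustrative values (assumed), giving v = 2
V2 = 6.0 * M**2 / LAM             # v^2 = 6 m^2 / lambda

def p_plus(phi):
    """P_+(phi) of (14.23)."""
    return 0.5 * M**2 * phi**2 + LAM * phi**4 / 24.0

def dp_plus(phi):
    """P_+'(phi); strictly increasing, so j = P_+'(phi) has a single real root."""
    return M**2 * phi + LAM * phi**3 / 6.0

def phi_of_j(j):
    """Invert j = P_+'(phi) by bisection."""
    lo, hi = -1.0, 1.0
    while dp_plus(lo) > j:        # widen the bracket until it contains the root
        lo *= 2.0
    while dp_plus(hi) < j:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if dp_plus(mid) < j:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def w_plus(j):
    """W_+(j) = sup_phi (j phi - P_+(phi)), evaluated at the maximizing phi."""
    phi = phi_of_j(j)
    return j * phi - p_plus(phi)

def w_plus_series(j):
    """Small-j expansion (14.26), truncated after the j^4 term."""
    return 3.0 * j**2 / (LAM * V2) - 54.0 * j**4 / (LAM**3 * V2**4)

def gamma_plus(phi):
    """Conventional Legendre transform: Gamma = j phi - W(j) with j = P_+'(phi)."""
    j = dp_plus(phi)
    return j * phi - w_plus(j)

if __name__ == "__main__":
    for j in (0.05, 0.1, 0.2):
        print(j, w_plus(j), w_plus_series(j))
    print(gamma_plus(1.3), p_plus(1.3))
```

The difference between `w_plus` and `w_plus_series` should shrink like j⁶ as j is reduced, consistent with the order of the neglected term.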
The situation is quite different in the case where the mass term has a negative
sign, as the maximum of $j\phi - P_-(\phi)$ occurs at a solution of the cubic equation
$\phi(\phi^2 - v^2) = \frac{6}{\lambda} j$, which has three real solutions for $|j| < \frac{\lambda}{9\sqrt{3}} v^3$ (for example, at j = 0 we find
solutions at φ = 0, ±v). For small positive j, the solution giving the absolute maximum
of $j\phi - P_-(\phi)$ is $\phi = v + \frac{3}{\lambda v^2} j + O(j^2)$, while for small negative j we must choose the
solution at $\phi = -v + \frac{3}{\lambda v^2} j + O(j^2)$. There is clearly a discontinuity in φ as a function
of j as we pass through j = 0, which asserts itself as a cusp (discontinuity in the first
derivative) of $W_-(j)$:
\[
W_-(j) = v|j| + \frac{3}{2\lambda v^2}\, j^2 + \dots \tag{14.28}
\]
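
The way the cusp emerges only in the infinite-volume limit can be seen by evaluating the defining integral (14.20) at finite Ω. The sketch below is illustrative only: the parameter values (λ = 1.5, v = 2), the integration window and the trapezoid quadrature are our assumptions, and the function names are hypothetical. It computes W(j) = (1/Ω) ln(Z(j)/Z(0)) for P₋ and estimates its slope by a central difference.

```python
import math

LAM, V = 1.5, 2.0                 # illustrative values (assumed)

def p_minus(phi):
    """P_-(phi) = (lambda/4!) (phi^2 - v^2)^2 of (14.24)."""
    return LAM * (phi**2 - V**2) ** 2 / 24.0

def log_z(j, omega, half_width=6.0, n=4001):
    """ln Z(j) for (14.20), by the trapezoid rule on [-half_width, half_width].

    The largest exponent is factored out so that exp() cannot overflow.
    """
    h = 2.0 * half_width / (n - 1)
    exps = [omega * (j * phi - p_minus(phi))
            for phi in (-half_width + i * h for i in range(n))]
    top = max(exps)
    weights = [math.exp(e - top) for e in exps]
    total = sum(weights) - 0.5 * (weights[0] + weights[-1])
    return top + math.log(total * h)

def w_minus(j, omega):
    """Finite-volume W(j) = (1/Omega) ln(Z(j)/Z(0))."""
    return (log_z(j, omega) - log_z(0.0, omega)) / omega

def slope(j, omega, dj=1e-4):
    """Central-difference estimate of dW/dj."""
    return (w_minus(j + dj, omega) - w_minus(j - dj, omega)) / (2.0 * dj)

if __name__ == "__main__":
    for omega in (1.0, 10.0, 100.0, 400.0):
        print(omega, slope(0.0, omega), slope(0.05, omega), slope(-0.05, omega))
```

At every finite Ω the slope at j = 0 vanishes by the φ → −φ symmetry, while at small fixed |j| the slope moves toward ±v as Ω grows, which is the finite-volume precursor of the cusp in (14.28).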
The dramatic alteration in shape of W (j) as a result of the change of sign in the
mass term is clearly visible in Fig. 14.1. The failure of the derivative of W− (j) to be
well defined at j = 0 invalidates the usual procedure for constructing the Legendre
transform Γ− (φ) of W− (j). Note that the cusp in W− (j) is a direct consequence of
having taken the infinite volume limit Ω → ∞ in the defining integral (14.20): for finite
Ω the integral defines an infinitely differentiable function of j, and there are absolutely
no difficulties in defining a Legendre transform in the usual fashion. We can obtain
a unique and well-defined infinite-volume Γ− (φ), which, moreover, coincides with the
usual definition in those cases where there are no difficulties with differentiability of
W (j), by defining the Legendre transform in general4 as follows,

4 For a beautiful introduction to the applications of convexity theory in classical thermodynamics, leading
to the “sup” definition of the Legendre transform given here, see the Introduction to (Israel, 1978) by
A. S. Wightman.

[Plot: W+(j) and W−(j) versus j, for −20 ≤ j ≤ 20.]

Fig. 14.1 The infinite volume connected generating functions W± (j) for the case of zero-
dimensional scalar theories P± (φ) and with positive or negative squared mass terms (parameters
m = 1, λ = 1.5, v = 2). Note the cusp of W− (j) at j = 0.

\[
\Gamma(\phi) = \sup_j \left(j\phi - W(j)\right) \tag{14.29}
\]

A finite maximum obtains for arbitrarily large |j| provided W (j) rises at least linearly
with j for large |j|, i.e., W (j) is convex for large j. Recall that a real function W (j)
is convex if

\[
W(\alpha j_1 + (1-\alpha) j_2) \le \alpha W(j_1) + (1-\alpha) W(j_2), \qquad \forall \alpha,\ 0 < \alpha < 1 \tag{14.30}
\]

An exactly analogous definition holds for convex functionals W [j] on a function space
of real functions j(x). In fact, returning temporarily to the full field theory case, the
convexity of the generating functional W [j] (not just at large j) follows directly from
its functional integral representation
  
\[
\exp W[j] = \int \exp\Big(\int j(x)\phi(x)\,d^4x\Big)\, d\mu, \qquad d\mu \equiv \exp(-S[\phi])\,D\phi \Big/ \int \exp(-S[\phi])\,D\phi \tag{14.31}
\]

as an integral over a normalized positive measure dμ, $\int d\mu = 1$. Integrals over such
measures satisfy a Hölder inequality (see (Rudin, 1966), p. 62)
\[
\int f^{\alpha} g^{1-\alpha}\, d\mu \le \Big(\int f\, d\mu\Big)^{\alpha} \Big(\int g\, d\mu\Big)^{1-\alpha}, \qquad 0 < \alpha < 1 \tag{14.32}
\]
Setting $f = \exp(\int j_1(x)\phi(x)\,d^4x)$, $g = \exp(\int j_2(x)\phi(x)\,d^4x)$, we see that (14.32)
immediately implies, taking the logarithm,

\[
W[\alpha j_1 + (1-\alpha) j_2] \le \alpha W[j_1] + (1-\alpha) W[j_2], \qquad 0 < \alpha < 1 \tag{14.33}
\]
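
For the reader who wishes to see the intermediate step, the passage from the Hölder inequality to convexity can be written out as follows. This is only a restatement of the argument above, using the choices of f and g just given and the representation (14.31):

```latex
\begin{align*}
f^{\alpha} g^{1-\alpha}
  &= \exp\Big(\int \big(\alpha j_1(x) + (1-\alpha) j_2(x)\big)\,\phi(x)\,d^4x\Big), \\
\exp W[\alpha j_1 + (1-\alpha) j_2]
  &= \int f^{\alpha} g^{1-\alpha}\, d\mu
   \le \Big(\int f\, d\mu\Big)^{\alpha}\Big(\int g\, d\mu\Big)^{1-\alpha}
   = \big(\exp W[j_1]\big)^{\alpha}\,\big(\exp W[j_2]\big)^{1-\alpha}.
\end{align*}
```

Since the logarithm is monotonic, taking it on both sides gives the convexity inequality for W directly.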



The convexity of W (j) in our toy model (with or without SSB) is apparent from a
glance at Fig. 14.1. It is easy to verify that the property of convexity of W (j) carries
over to Γ(φ), provided that we use the “sup” definition of the latter. Indeed,

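\[
\begin{aligned}
\Gamma(\alpha\phi_1 + (1-\alpha)\phi_2) &\equiv \sup_j \left\{ j(\alpha\phi_1 + (1-\alpha)\phi_2) - W(j) \right\} \\
&= \sup_j \left\{ \alpha\,(j\phi_1 - W(j)) + (1-\alpha)\,(j\phi_2 - W(j)) \right\} \\
&\le \alpha \sup_{j_1} \left\{ j_1\phi_1 - W(j_1) \right\} + (1-\alpha) \sup_{j_2} \left\{ j_2\phi_2 - W(j_2) \right\} \\
&\le \alpha\,\Gamma(\phi_1) + (1-\alpha)\,\Gamma(\phi_2)
\end{aligned}
\tag{14.34}
\]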
The conventional definition (14.19) of the Legendre transform, when it is applica-
ble, is involutive: the Legendre transform of the Legendre transform simply reproduces
the original function. A glance at (14.22) shows that the generating function W (j) in
our toy model is in fact just the Legendre transform, using the “sup” definition, of
the classical action of the model P (φ), so this involutive property would imply that
Γ(φ) should be just our original action function P(φ). However, the above convexity
argument shows that, in the case of a negative mass term, the Legendre transform
$\Gamma_-(\phi)$ cannot reproduce the classical action $P_-(\phi)$, as the latter is clearly not convex!
One can show that $|dW_-(j)/dj| \ge v$ (with the equality holding only at j = 0: see (14.28)
and Fig. 14.1), so that for |φ| < v the maximum in the definition (14.29) is attained
at j = 0, and we find $\Gamma_-(\phi) = 0$, $-v \le \phi \le +v$. For |φ| > v, the relation between j
and φ is invertible and the “sup” definition of $\Gamma_-(\phi)$ reproduces $P_-(\phi)$ precisely (see
Fig. 14.2). Note that the double-well structure of the potential P− (φ) in the broken
symmetry case has been eliminated in Γ− (φ), which is in fact the convex hull (i.e.,
the boundary of the minimal convex set containing) of the set bounded below by the
graph of P− (φ), with a flat section connecting the points on the graph at φ = ±v, where
Γ− (φ) is evidently not differentiable. On the other hand, if we compute the effective
potential at finite values of Ω (see Fig. 14.2), the resultant convex function is smooth
and everywhere differentiable (inheriting these properties from the finite-volume W(j);
see above). We shall shortly see that there is a direct physical interpretation of the
flat (convexity restoring) section of Γ− (φ) in terms of the energetics of a system with
spontaneously broken symmetry and a degenerate vacuum in the infinite volume limit.
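
The convex-hull structure of Γ₋(φ) can be reproduced by applying the two “sup” definitions (14.22) and (14.29) on finite grids. The following is a rough sketch under stated assumptions: the parameter values (λ = 1.5, v = 2), the grid spacings and ranges, and the function names are ours, and the j-range used here only supports values of |φ| up to roughly 3.8.

```python
LAM, V = 1.5, 2.0                                   # illustrative values (assumed)

def p_minus(phi):
    """P_-(phi) = (lambda/4!) (phi^2 - v^2)^2 of (14.24)."""
    return LAM * (phi * phi - V * V) ** 2 / 24.0

# Grids built from integers so that phi = 0, +-2, +-3 and j = 0 are represented exactly.
PHI_GRID = [(i - 400) / 80.0 for i in range(801)]   # phi in [-5, 5], spacing 0.0125
P_GRID = [p_minus(phi) for phi in PHI_GRID]
J_GRID = [(i - 400) / 40.0 for i in range(801)]     # j in [-10, 10], spacing 0.025

def w_minus(j):
    """W_-(j) = sup_phi (j phi - P_-(phi)) of (14.22), restricted to the phi grid."""
    return max(j * phi - p for phi, p in zip(PHI_GRID, P_GRID))

W_GRID = [w_minus(j) for j in J_GRID]

def gamma_minus(phi):
    """Gamma_-(phi) = sup_j (j phi - W_-(j)) of (14.29), restricted to the j grid."""
    return max(j * phi - w for j, w in zip(J_GRID, W_GRID))

if __name__ == "__main__":
    for phi in (0.0, 1.0, 2.0, 2.5, 3.0):
        print(phi, gamma_minus(phi), p_minus(phi))
```

Inside the interval |φ| ≤ v the transform returns the flat value 0 rather than the double-well value of P₋(φ), while outside it the two functions agree to within the grid resolution.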
We now return to field theory proper, but ease the transition by considering first
the case of one-time, zero-space dimensions, in which case our “field theory” amounts
to the quantum mechanics of a one-dimensional anharmonic oscillator, and the path
integral (14.17) defines the Euclidean kernel (cf. Section 4.2.1) over a finite time extent
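$T$, $\mathrm{Tr}\,\exp(-H_{\bar{j}} T)$, for the Hamiltonian
\[
H_{\bar{j}} \equiv \frac{p^2}{2} + \frac{\lambda}{24}(x^2 - v^2)^2 - \bar{j}x = -\frac{1}{2}\frac{d^2}{dx^2} + P(x) - \bar{j}x \equiv H_0 - \bar{j}x \tag{14.35}
\]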
Here the “field” φ has been replaced by the quantum coordinate x, and we have
restricted the source function j(t) to be independent of time, j(t) = j̄ = constant.
Evidently we are dealing with a “double-well” anharmonic oscillator, where we now
also assume that we are in a regime where λ is taken large for v fixed, so that the
two potential basins are separated by a large energy barrier (see Fig. 14.3). In this
situation, the Gaussian wavefunctions $\psi_\pm(x) = \langle x|\pm\rangle$ centered on $x = \pm v$,

[Plot: Γ+(φ), P+(φ), Γ−(φ), P−(φ), and the finite-volume Γ−(φ) at Ω = VT = 1, versus φ for −4 ≤ φ ≤ 4.]

Fig. 14.2 The infinite volume effective potential Γ± (φ) for theories defined by action P± (φ)
(same parameters as in Fig. 14.1). The effective potential Γ− (φ) at finite volume, Ω = 1, for
the symmetry-broken case is also shown.

[Plot: P(x), ψ+(x), and ψ−(x) versus x for −1.5 ≤ x ≤ 1.5.]

Fig. 14.3 Potential energy function P (x) and approximate ground states ψ± (x) for a one-
dimensional anharmonic oscillator (λ=300, m = v = 1).

\[
\psi_\pm(x) \equiv C e^{-\frac{\omega}{2}(x \mp v)^2}, \qquad C = (\omega/\pi)^{1/4}, \qquad \omega \equiv \sqrt{\frac{\lambda v^2}{3}} \tag{14.36}
\]
are approximate degenerate ground-state eigenfunctions of $H_0$, with approximate
eigenvalue ω/2, due to the exponentially suppressed tunneling amplitude for the
particle to transition between the two potential basins. A short calculation shows
that $\gamma \equiv \langle +|H_0|-\rangle \sim O(\omega^2 e^{-\omega v^2})$ for λ (hence ω) large, and fixed v. We may therefore
represent approximately the low-energy sector of this theory by truncating the Hilbert
space to the two-dimensional subspace spanned by the states $|+\rangle$, $|-\rangle$, in which space
the Hamiltonian $H_{\bar{j}}$ takes the matrix form
\[
H_{\bar{j}} = \begin{pmatrix} \frac{\omega}{2} - v\bar{j} & \gamma \\ \gamma & \frac{\omega}{2} + v\bar{j} \end{pmatrix} \tag{14.37}
\]

using the fact that the expectation values of the coordinate x in the highly localized
states represented by wavefunctions $\psi_+(x)$, $\psi_-(x)$ are +v, −v approximately. The
source-free Hamiltonian $H_0$ has, as is well known, a unique non-degenerate ground
state, with (in the approximation (14.37)) the symmetric eigenfunction
$\frac{1}{\sqrt{2}}(\psi_+(x) + \psi_-(x))$, and energy $\frac{\omega}{2} - \gamma$. The antisymmetric state with wavefunction
$\frac{1}{\sqrt{2}}(\psi_+(x) - \psi_-(x))$ lies at an energy 2γ above this ground state, and will be suppressed in the
partition function $Z(\bar{j})$ (for arbitrary $\bar{j}$) if we choose T large enough that $e^{-2\gamma T} \ll 1$,
as we shall henceforth do. Switching on the source $\bar{j}$, the energy of the new ground
state $|0, \bar{j}\rangle$ becomes
\[
E_{0,\bar{j}} = \frac{\omega}{2} - \sqrt{\gamma^2 + v^2 \bar{j}^2} \tag{14.38}
\]
with $Z(\bar{j}) = e^{-T E_{0,\bar{j}}} + O(e^{-2T\gamma})$. The generating function $W(\bar{j})$ becomes, up to
exponentially small corrections,
\[
W(\bar{j}) = \frac{1}{T} \ln Z(\bar{j}) = -E_{0,\bar{j}} \tag{14.39}
\]
The expectation value of the coordinate x (recall that this is the analog of the field
operator φ in our zero-spatial-dimensional model) in the ground state of Hj̄ can be
calculated directly (by diagonalizing Hj̄ ) or by taking the derivative of W (j̄): in either
case one finds
\[
\bar{x} = \frac{dW(\bar{j})}{d\bar{j}} = \frac{v^2 \bar{j}}{\sqrt{\gamma^2 + v^2 \bar{j}^2}} \tag{14.40}
\]
We pointed out previously that there are no difficulties in defining the Legendre
transform in the usual way at finite spatiotemporal volume, which is certainly the
case in our model (for finite T ). And indeed, for small j̄, when our approximations
are valid, we see that there is no problem with differentiability, or with inverting
the relation between j̄ and x̄, so we may define the effective potential Γ(x̄) with the
conventional Legendre transform,

\[
\Gamma(\bar{x}) = \bar{j}\bar{x} - W(\bar{j}) = E_{0,\bar{j}} + \bar{j}\bar{x} = \langle 0, \bar{j}|H_0|0, \bar{j}\rangle \tag{14.41}
\]



This is the promised energetic interpretation of the effective potential Γ(x̄): it is the
expectation of the source-free Hamiltonian H0 in the ground state of the sourced Hamil-
tonian Hj̄ where the source j̄ is chosen to give the value x̄ for the expectation value
of position in that ground state. In our crude approximation, one finds (see Problem
3) that for $|\bar{x}| < v$, $\Gamma(\bar{x})$ is a convex function, with $\Gamma(\bar{x}) = \frac{\omega}{2} - \gamma\sqrt{1 - \bar{x}^2/v^2}$. The
overlap matrix element γ is exponentially small in our model by our choice of a large
quartic coupling λ, but it is automatically small in a true field theory, as the overlap
of the states $|v\rangle$ (respectively, $|-v\rangle$) characterized by having field expectation values
$\langle\phi(x)\rangle = +v$ (respectively −v) is exponentially suppressed in the spatial volume V:
$\langle +v|-v\rangle \sim e^{-KV}$, essentially because this overlap involves the multiplication of order
V tunneling amplitudes connecting the field variable at each spatial point5 between the
two vacuum values +v and −v (see Problem 4 for an explicit example). We see that in
the infinite-volume limit (where γ → 0) in the field theory cases, the effective potential
develops a flat section, as in the zero-dimensional model (see Fig. 14.2), connecting
the two classical minima of the field potential. Of course, in the quantum case there
is an additional zero-point energy (which could be removed by normal-ordering): just
the ubiquitous ω/2 appearing in the preceding formulas.
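
The energetic interpretation can be checked directly in the two-state truncation. The sketch below is an illustration with assumed parameter values and hypothetical function names: it diagonalizes the matrix (14.37) by hand and compares the Legendre-transform expression of (14.41), the expectation value of the source-free matrix in the sourced ground state, and the closed form quoted above. The sign of γ only fixes the relative sign of the two components of the ground state, so the closed form is evaluated with |γ|.

```python
import math

def ground_state(omega, gamma, v, jbar):
    """Lowest eigenvalue and normalized eigenvector (a, b) of the matrix (14.37),
    in the basis (|+>, |->). Requires gamma and jbar not both zero."""
    r = math.hypot(gamma, v * jbar)
    e0 = 0.5 * omega - r                      # (14.38)
    if jbar >= 0.0:
        a, b = r + v * jbar, -gamma           # from the second row of (H - e0) psi = 0
    else:
        a, b = -gamma, r - v * jbar           # from the first row
    norm = math.hypot(a, b)
    return e0, a / norm, b / norm

def observables(omega, gamma, v, jbar):
    """Return (xbar, <H_0>, Legendre-transform value, closed-form value)."""
    e0, a, b = ground_state(omega, gamma, v, jbar)
    xbar = v * (a * a - b * b)                # <x>, using <+|x|+> = +v and <-|x|-> = -v
    h0 = 0.5 * omega + 2.0 * gamma * a * b    # expectation of (14.37) taken at jbar = 0
    legendre = e0 + jbar * xbar               # (14.41)
    closed = 0.5 * omega - abs(gamma) * math.sqrt(max(0.0, 1.0 - (xbar / v) ** 2))
    return xbar, h0, legendre, closed

if __name__ == "__main__":
    for jbar in (-0.05, -0.01, 0.0, 0.01, 0.05):
        print(jbar, observables(10.0, 0.01, 1.0, jbar))
```

As γ is reduced at fixed x̄, all three expressions approach ω/2, which is the flat section of the infinite-volume effective potential.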
Finally, we consider field theory proper, specifically the double-well scalar theory
of Section 8.4, with Euclidean generating functional
 
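\[
Z[j] = \int D\phi\, e^{-\left(S[\phi] - \int j(x)\phi(x)\,d^4x\right)}, \qquad S[\phi] = \int \left( \frac{1}{2}(\partial_\mu \phi)^2 + P_-(\phi) \right) d^4x \tag{14.42}
\]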

with P− (φ) given in (14.24). In the discussion that follows we shall be discussing
features of spontaneous symmetry-breaking at the lowest order of the loop expansion:
the reader will recall from the discussion in Section 10.4 that the effective action at
the leading order of a formal expansion in Planck’s constant ℏ (which, as we showed
there, was equivalent to the perturbative loop expansion) reproduces the classical
action. We now know that in the spontaneously broken case (14.42), the properly
defined (with either the “sup”, or equivalently, the minimum-energy definition as in
(14.41)) effective potential is convex, and at the leading order of ℏ amounts to the
convex hull of the double-well field potential $P_-(\phi)$. There will be higher-order loop
corrections to the quantities discussed below, but they generally do not affect the
qualitative features of global spontaneous symmetry breakdown.6
As previously, we begin by working at finite spacetime volume Ω = V T , and restrict
the source function j(x) to be a spacetime constant, j(x) = j̄. A connected finite-
volume generating function W (j̄) is then defined in the usual way by taking the infinite
Euclidean time limit to project out the lowest-energy state $|0, \bar{j}\rangle$ in the presence of the
source $\bar{j}$, as follows:

5 We imagine throughout that our theory is regularized at short distance, say on a spatial lattice, so
that V can be regarded simply as enumerating the finite number of spatial lattice points.
6 An interesting exception is provided by the Coleman–Weinberg phenomenon (Coleman and Weinberg,
1973), in which a theory without spontaneous breaking at the classical level develops a non-vanishing field
expectation value at the one-loop level due to radiative corrections. The features of vacuum structure,
clustering, etc., discussed below, apply in full force to such theories, once these loop effects are included.

\[
W_V(\bar{j}) \equiv \frac{1}{V} \lim_{T \to \infty} \frac{1}{T} \ln \frac{Z(\bar{j})}{Z(0)} \tag{14.43}
\]

Note that the derivative $\frac{dW_V(\bar{j})}{d\bar{j}} \equiv \bar{\phi} = \langle 0, \bar{j}|\phi(x)|0, \bar{j}\rangle = \langle 0, \bar{j}|\phi(0)|0, \bar{j}\rangle$, as the
expectation value of the field in the ground state of the translationally invariant sourced
Hamiltonian is necessarily itself translation-invariant. The Legendre transform giving
the effective potential is defined as above for the anharmonic oscillator, and as
previously, is just the energy of the source-free Hamiltonian $H_0$ in the state $|0, \bar{j}\rangle$
in which the field has expectation $\bar{\phi}$:

\[
\Gamma_V(\bar{\phi}) = \bar{j}\bar{\phi} - W_V(\bar{j}) = \langle 0, \bar{j}|H_0|0, \bar{j}\rangle \tag{14.44}
\]

For finite spatial volume V, $\Gamma_V(\bar{\phi})$ is convex and smooth, but develops cusps
(discontinuities in the first derivative) and a flat section for $-v < \bar{\phi} < +v$ in the infinite
volume limit. In this limit, the theory has two exactly degenerate minimum-energy
states (in the absence of a source), which we may denote, following our previous
notation, $|+v\rangle$ and $|-v\rangle$, with $\langle \pm v|\phi(x)|\pm v\rangle = \pm v$. As pointed out above, these
states are strictly orthogonal in the infinite volume limit: indeed, the matrix element
of any local operator taken between $|+v\rangle$ and $|-v\rangle$ also vanishes, as the local field
cannot “twist” the scalar field expectation value from −v to +v over all spacetime
points (see Problem 4). As

\[
\langle -v|H_0|-v\rangle = \langle +v|H_0|+v\rangle \equiv E_0 \tag{14.45}
\]

and

\[
\langle -v|H_0|+v\rangle = 0 \tag{14.46}
\]

it follows that for $-v < \bar{\phi} < +v$, taking

\[
|0, \bar{j}\rangle = \alpha|-v\rangle + \beta|+v\rangle, \qquad |\alpha|^2 + |\beta|^2 = 1 \tag{14.47}
\]

with

\[
\bar{\phi} = \langle 0, \bar{j}|\phi(x)|0, \bar{j}\rangle = |\alpha|^2(-v) + |\beta|^2(+v) \tag{14.48}
\]

the infinite volume effective potential is immediately seen to be constant in the range
$-v < \bar{\phi} < +v$:

\[
\Gamma_\infty(\bar{\phi}) = \langle 0, \bar{j}|H_0|0, \bar{j}\rangle = |\alpha|^2 E_0 + |\beta|^2 E_0 = E_0 \tag{14.49}
\]

This phenomenon is a very familiar one in the classical thermodynamics of coexisting
phases of simple fluids (see Wightman’s Introduction in (Israel, 1978)), where we also
find flat sections in the boundaries representing the graphs of various thermodynamic
functions (such as the internal energy U) when given as functions of the other extensive
variables of the system (such as entropy S and volume V). It is apparent from this
discussion that the coherent states $|f\rangle$ constructed in Section 8.3, with
$\langle f|\phi(x, 0)|f\rangle = f(x)$ and with energy reproducing the double-well structure of $P_-(\phi)$ (cf. (8.63)), do
not in fact correspond to true ground states once $\bar{\phi}$ is constrained to take a value in
the range −v to +v: instead the system prefers the appropriate linear combination
(14.47) of the two degenerate vacua of the system.
The mixed states7 $\alpha|-v\rangle + \beta|+v\rangle \equiv |\bar{\phi}\rangle$ defined in (14.47) are perfectly
well-defined normalized states in the Hilbert space of the theory, but, with the exception
of the two extreme cases $|\pm v\rangle$ (|α| = 0 or 1), they are not physically acceptable vacua.
Indeed, a Fock space built on such states will necessarily result in a dramatic failure of
the Ruelle clustering property discussed in Section 9.2, and hence in the basic property
of cluster decomposition which constitutes one of the pillars on which we constructed
the entire framework of local quantum field theory. To see this in a simple example,
consider the connected part of the Wightman two-point function defined with respect
to a mixed vacuum $|\bar{\phi}\rangle$, $-v < \bar{\phi} < v$:

\[
\langle\bar{\phi}|\phi(x_1)\phi(x_2)|\bar{\phi}\rangle_c \equiv \langle\bar{\phi}|\phi(x_1)\phi(x_2)|\bar{\phi}\rangle - \langle\bar{\phi}|\phi(x_1)|\bar{\phi}\rangle \langle\bar{\phi}|\phi(x_2)|\bar{\phi}\rangle \tag{14.50}
\]

In the first term on the right-hand side, we may insert a complete set of states

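\[
\begin{aligned}
\langle\bar{\phi}|\phi(x_1)\phi(x_2)|\bar{\phi}\rangle = {} & \langle\bar{\phi}|\phi(x_1)|-v\rangle \langle -v|\phi(x_2)|\bar{\phi}\rangle + \langle\bar{\phi}|\phi(x_1)|+v\rangle \langle +v|\phi(x_2)|\bar{\phi}\rangle \\
& + \sideset{}{'}\sum_n \langle\bar{\phi}|\phi(x_1)|n\rangle \langle n|\phi(x_2)|\bar{\phi}\rangle
\end{aligned}
\tag{14.51}
\]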

where the primed sum runs over non-vacuum states, beginning with single-particle
states separated (in this theory with a broken discrete symmetry) by a non-zero mass
gap m from the degenerate vacuum states $|\pm v\rangle$. If $x_1$, $x_2$ are spacetime coordinates
separated by a large space-like separation R, i.e., $(x_1 - x_2)^2 = -R^2$, then the primed
sum has a Källén–Lehmann representation (cf. Section 9.5) as a spectral integral over
free two-point functions with mass μ ≥ m, which fall at least as fast as $e^{-mR}$ at large
R. Up to these exponentially falling terms, therefore, and using the vacuum
orthogonality property $\langle \pm v|\phi(x)|\mp v\rangle = 0$, we find that in an infinite volume theory, for
R large

\[
\begin{aligned}
\langle\bar{\phi}|\phi(x_1)\phi(x_2)|\bar{\phi}\rangle &= |\alpha|^2 \langle -v|\phi(x_1)|-v\rangle \langle -v|\phi(x_2)|-v\rangle \\
&\quad + |\beta|^2 \langle +v|\phi(x_1)|+v\rangle \langle +v|\phi(x_2)|+v\rangle + O(e^{-mR}) \\
&= |\alpha|^2 \langle -v|\phi(0)|-v\rangle \langle -v|\phi(0)|-v\rangle \\
&\quad + |\beta|^2 \langle +v|\phi(0)|+v\rangle \langle +v|\phi(0)|+v\rangle + O(e^{-mR}) \\
&= |\alpha|^2 (-v)^2 + |\beta|^2 v^2 + O(e^{-mR}) = v^2 + O(e^{-mR})
\end{aligned}
\tag{14.52}
\]

whereas the second term in (14.50) becomes

\[
\langle\bar{\phi}|\phi(x_1)|\bar{\phi}\rangle \langle\bar{\phi}|\phi(x_2)|\bar{\phi}\rangle = \bar{\phi}^2 \tag{14.53}
\]

7 The terminology “mixed” here being applied to non-extremal vacuum states should not be confused
with the sense of “mixed” as distinguished from “pure” states in statistical physics: the states $|\bar{\phi}\rangle$ are
pure states for all α, in the statistical sense, corresponding to definite rays in the Hilbert space. The
thermodynamic analogy is with systems with coexisting phases: see Wightman, footnote 4, op. cit.

Combining (14.52) and (14.53), we find asymptotically,

\[
\langle\bar{\phi}|\phi(x_1)\phi(x_2)|\bar{\phi}\rangle_c \to v^2 - \bar{\phi}^2 + O(e^{-mR}), \qquad R \to \infty \tag{14.54}
\]

As $v^2 - \bar{\phi}^2 \neq 0$ unless we choose $\bar{\phi} = \pm v$ (i.e., |α| = 1 or 0), we see that clustering
fails except for the two extreme points representing states where the field is globally
oriented with expectation value $\bar{\phi}$ either at +v or at −v. The preference for such states
does not seem to be energetically based, as the states $|\bar{\phi}\rangle$ are degenerate in energy for
all values of $\bar{\phi}$ with $-v \le \bar{\phi} \le +v$.
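
The size of the clustering violation can be made explicit in terms of the mixing coefficients. The short computation below is only a rewriting of (14.48) and (14.54), using the normalization condition in (14.47):

```latex
\begin{align*}
\bar{\phi} &= v\,\big(|\beta|^2 - |\alpha|^2\big), \qquad |\alpha|^2 + |\beta|^2 = 1, \\
v^2 - \bar{\phi}^2
  &= v^2\Big[\big(|\alpha|^2 + |\beta|^2\big)^2 - \big(|\beta|^2 - |\alpha|^2\big)^2\Big]
   = 4\,|\alpha|^2\,|\beta|^2\, v^2 .
\end{align*}
```

The residual long-distance correlation is therefore largest, equal to v², for the symmetric combination with |α|² = |β|² = 1/2, and it vanishes only when one of the two coefficients is zero.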
Indeed, the fact that we live in a Universe consisting of particle states built on
a clustering vacuum has to be understood in “historical” terms—in other words,
as a consequence of the cosmological evolution of the Universe. To take a familiar
analogy from condensed matter physics, the choice of a direction of magnetization of
an initially demagnetized region of ferromagnetic material as it falls below its Curie
temperature Tc will typically depend on the presence of small random external fields,
whose direction then gets “frozen in” as the material settles into a state with a non-zero
expectation value for total electronic magnetic moment as the material cools below Tc .
In our simple model with a discrete symmetry φ → −φ, the choice of either the $|+v\rangle$
or the $|-v\rangle$ state as the physical vacuum would likewise depend on an accident of
external fields which would “tickle” the field into one or the other state (at this early
stage, for energetic reasons), from which the system cannot escape once the volume
of the Universe grows and the temperature cools.
Of course, we can only expect the same orientation of the symmetry-breaking to
obtain over regions which are causally connected when the freeze-out occurs. The
fact that the entire present observable Universe appears to have the same direction
of spontaneous symmetry-breaking for the putative Grand Unified Theory overlying
the Standard Model of gauge interactions is one aspect of the deep and famous
“horizon” problem of Big-Bang cosmology, in principle solved by the inflationary
cosmology developed by Alan Guth and others. The history dependence of the low-
energy dynamics of a system with spontaneous symmetry-breaking has a precise
mathematical correlate in the non-uniformity of the zero-source and infinite-volume
limits for $W_V[j]$: if the infinite-volume limit is taken after the sources are sent to
zero, expectation values of observables necessarily respect the global symmetry (for
example, $\langle\phi\rangle = 0$, corresponding to the symmetric, but non-clustering, ground state
$\frac{1}{\sqrt{2}}(|-v\rangle + |+v\rangle)$), while if the infinite-volume limit is taken first, and then the source
sent to zero from the positive (resp. negative) directions, the expectation values found
from the functional integral refer to the “pre-magnetized” clustering states $|+v\rangle$
(resp. $|-v\rangle$).
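
The non-commuting limits can be illustrated with the two-state formula (14.40), letting the overlap γ play the role of the volume-suppressed tunneling amplitude. The sketch below is a toy illustration only: the form γ = exp(−KV) with K = 1 is a hypothetical stand-in for the field-theory overlap, and the function names are ours.

```python
import math

def xbar(jbar, gamma, v=1.0):
    """Expectation value (14.40) in the ground state of (14.37).
    Requires jbar and gamma not both zero."""
    return v * v * jbar / math.hypot(gamma, v * jbar)

def gamma_of_volume(volume, k=1.0):
    """Hypothetical stand-in gamma ~ exp(-K V) for the volume-suppressed overlap."""
    return math.exp(-k * volume)

# Source sent to zero first, at fixed finite volume: the symmetric value 0 at every volume.
source_first = [xbar(0.0, gamma_of_volume(vol)) for vol in (1.0, 10.0, 100.0)]

# Volume sent to infinity first (gamma -> 0 at fixed jbar), then jbar -> 0 from either side.
volume_first_plus = [xbar(jb, 0.0) for jb in (1e-2, 1e-4, 1e-8)]
volume_first_minus = [xbar(-jb, 0.0) for jb in (1e-2, 1e-4, 1e-8)]

if __name__ == "__main__":
    print(source_first)
    print(volume_first_plus)
    print(volume_first_minus)
```

In this toy setting the first ordering always returns the symmetric value, while the second returns +v or −v according to the direction from which the source is removed.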
The preceding discussion was restricted to a theory displaying a simple discrete
global reflection symmetry, φ → −φ, undergoing spontaneous breaking, which is sig-
nalled (in the infinite volume limit) by the appearance of a two-dimensional vacuum
sector with two degenerate minimum-energy states $|+v\rangle$, $|-v\rangle$. The generalization
of the discussion to a spontaneously broken continuous group is straightforward. In
the second theory discussed in Section 8.4, for example, with a triplet of scalar fields
$\vec{\phi}$ subject to the Hamiltonian (8.69),

\[
H = \int d^3x\, :\! \frac{1}{2}\dot{\vec{\phi}}^{\,2} + \frac{1}{2}|\vec{\nabla}\vec{\phi}|^2 + P(\vec{\phi}) \!:\,, \qquad P(\vec{\phi}) = \frac{\lambda}{24}(\vec{\phi}^{\,2} - v^2)^2, \qquad v^2 = \frac{6}{\lambda} m^2 \tag{14.55}
\]

the global O(3) symmetry of the Hamiltonian, under the transformations $\vec{\phi} \to R\vec{\phi}$,
with R an orthogonal rotation, is spontaneously broken, and the vacuum sector can be
parameterized by the points on the surface of a sphere $|\vec{\phi}| \le v$ in field space. There is
thus a continuous infinity of orthogonal minimum-energy states in the infinite-volume
limit, and the effective potential $\Gamma(\vec{\phi})$, which is again necessarily convex, is constant
within and on the boundary of this sphere. A physically realistic theory, satisfying
the constraints of clustering, again requires that we choose vacua corresponding to
an extreme point, on the surface of the sphere, with $|\vec{\phi}| = v$. Of course, Goldstone’s
theorem (our symmetry is now continuous!) asserts that the field theory constructed
on such a vacuum state necessarily has zero-mass particle states. Denoting a particular
clustering vacuum state by $|\vec{v}\rangle$, where $\vec{v}$ is a three-vector with magnitude $v = \sqrt{6/\lambda}\, m$,
the proof of the Goldstone theorem outlined in the preceding section indicates that the
state $\vec{J}^{\,0}(x)|\vec{v}\rangle$ contains single-particle Goldstone particle states, so that the spatial
integral, giving the effect of the charges $\vec{Q}$ on $|\vec{v}\rangle$, produces a state with zero spatial
momentum (and therefore energy) Goldstone particles. On the other hand, the effect of
the O(3) charges is simply to perform an infinitesimal rotation in field space, so at least
formally, the vacua $|\vec{v}\rangle$ with $|\vec{\phi}| = v$ can be constructed from each other by application
of the finite rotation $e^{i\vec{\omega}\cdot\vec{Q}}$; in other words, by constructing coherent states containing
infinitely many zero-energy Goldstone modes. At infinite volume these states become
orthogonal, and the associated formally unitary operations become improper, much
as the interaction-picture operators in Haag’s theorem (cf. Section 10.5).
Although the definition of an effective action (or the associated effective potential,
for constant fields) in terms of a Legendre transform is very convenient for perturbative
calculations (one simply sums the 1PI graphs), it is not particularly useful in situations
where spontaneous symmetry-breaking occurs at a non-perturbative level, such as in
strongly coupled scalar theories (in calculating upper bounds on the Higgs mass,
for example) or in quantitative studies of chiral symmetry-breaking in quantum
chromodynamics. A more convenient object in these cases, where we must resort to
explicit numerical simulation of the lattice-regularized field theory, is provided by
the constraint effective potential introduced in (O’Raifertaigh et al., 1986). For the
scalar theory (14.42), for example, define a functional UΩ [φ], which we shall call the
constraint effective action, by the functional integral

\[
e^{-U_\Omega[\phi]} \equiv \int D\hat{\phi}\, \delta(\phi(x) - \hat{\phi}(x))\, e^{-S[\hat{\phi}]} \tag{14.56}
\]

where the field theory is defined at finite spacetime volume Ω. Note that the connected
generating functional WΩ [j] is completely reconstructible from the knowledge of UΩ [φ],
by a functional Laplace transformation
 
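\[
W_\Omega[j] = \int D\phi\, e^{-\left(U_\Omega[\phi] - \int j(x)\phi(x)\,d^4x\right)} \tag{14.57}
\]

Restricting ourselves to spacetime constant fields, we can similarly define the
constraint effective potential $U_\Omega(\phi)$:
\[
e^{-U_\Omega(\phi)} \equiv \int D\hat{\phi}\, \delta\Big(\phi - \frac{1}{\Omega}\int \hat{\phi}(x)\,d^4x\Big)\, e^{-S[\hat{\phi}]} \tag{14.58}
\]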
It is trivial to impose the δ-function constraint in (14.58) in a numerical (e.g.,
Monte Carlo) simulation of the lattice-regularized theory: for example, on a spacetime
lattice of Ω = N points, we can just set $\hat{\phi}_N = N\phi - \sum_{i=1}^{N-1} \hat{\phi}_i$, and then simulate the
remaining system of N − 1 field variables by standard statistical sampling methods.
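
A minimal sketch of such a constrained simulation is given below. It is not taken from the cited papers: for brevity it assumes a one-dimensional periodic lattice with unit spacing rather than a four-dimensional one, a plain Metropolis update, and illustrative parameter values, and all function names are hypothetical. Each update proposes a change in one of the first N − 1 variables, resets the last variable so that the lattice average stays equal to the prescribed value, and accepts or rejects on the change in the action.

```python
import math
import random

LAM, V = 1.5, 2.0                        # illustrative values (assumed)

def p_minus(phi):
    """P_-(phi) = (lambda/4!) (phi^2 - v^2)^2 of (14.24)."""
    return LAM * (phi * phi - V * V) ** 2 / 24.0

def action(field):
    """Lattice action on a periodic one-dimensional lattice with unit spacing (assumed)."""
    n = len(field)
    return sum(0.5 * (field[(i + 1) % n] - field[i]) ** 2 + p_minus(field[i])
               for i in range(n))

def constrained_samples(phi_bar, n_sites=16, n_sweeps=200, step=0.5, seed=0):
    """Metropolis sampling of exp(-S) at fixed lattice average phi_bar.

    Sites 0..N-2 are the free variables; the last site is always reset to
    N * phi_bar - sum(other sites), which imposes the delta-function constraint.
    """
    rng = random.Random(seed)
    field = [phi_bar] * n_sites
    s_old = action(field)
    samples = []
    for _ in range(n_sweeps):
        for i in range(n_sites - 1):
            old_i, old_last = field[i], field[-1]
            field[i] = old_i + rng.uniform(-step, step)
            field[-1] = n_sites * phi_bar - sum(field[:-1])
            s_new = action(field)
            if s_new <= s_old or rng.random() < math.exp(s_old - s_new):
                s_old = s_new                          # accept the proposal
            else:
                field[i], field[-1] = old_i, old_last  # reject and restore
        samples.append(list(field))
    return samples

if __name__ == "__main__":
    configs = constrained_samples(0.5)
    print(sum(configs[-1]) / len(configs[-1]))
```

The configurations produced this way all have lattice average equal to the chosen value; extracting the constraint effective potential itself from them requires a further estimator, which is not attempted in this sketch.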
In the case of our zero-dimensional toy theory (14.20), one sees immediately that
U (φ̄) = P (φ̄): the constraint effective potential therefore has the same double-well
structure as the classical field potential in the symmetry-breaking case, and is evidently
not convex, unlike the conventional effective potential Γ(φ̄). In proper field theory
(models with kinetic terms), one finds (O’Raifertaigh et al., 1986) that in the infinite
volume limit, Ω → ∞, UΩ (φ̄) approaches the previously discussed convex function
Γ∞ (φ̄): in particular, the flat regions describing mixed vacua are recovered in this
limit. For an application of the constraint effective potential approach to the problem
of the Higgs boson mass limit in electroweak theory, see (Kuti and Shen, 1988).

14.4 Problems
1. Show that in the toy model defined by the integral (14.20), in the symmetry-
breaking case with potential P− (φ), the infinite volume limit of the effective
potential (with the “sup” definition) Γ− (φ) equals P− (φ) for |φ| > v, while Γ− (φ)
is constant for −v ≤ φ ≤ +v.
2. Calculate the Hamiltonian matrix element $\gamma \equiv \langle +|H_0|-\rangle$ for the Hamiltonian $H_0$
in (14.35) between the Gaussian approximate ground states $\psi_\pm(x)$ given in (14.36).
3. Verify the result $\Gamma(\bar{x}) = \frac{\omega}{2} - \gamma\sqrt{1 - \bar{x}^2/v^2}$ for the effective potential for $|\bar{x}| < v$
in the anharmonic oscillator, using the two-dimensional truncation to the subspace
spanned by $|\pm\rangle$.
4. The coherent states of a scalar field of mass m with different expectation values for
the field become orthogonal in the infinite-volume limit. To see this, we begin by
considering the field quantized at finite volume (at time zero):
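\[
\phi(\vec{x}, 0) = \frac{1}{\sqrt{V}} \sum_{\vec{k}} \frac{1}{\sqrt{2E_{\vec{k}}}} \left( a_{\vec{k}}\, e^{i\vec{k}\cdot\vec{x}} + a^\dagger_{\vec{k}}\, e^{-i\vec{k}\cdot\vec{x}} \right), \qquad [a_{\vec{k}}, a^\dagger_{\vec{k}'}] = \delta_{\vec{k},\vec{k}'} \tag{14.59}
\]

A coherent translationally-invariant (zero-momentum) state $|v\rangle$ with non-vanishing
expectation value $\langle v|\phi|v\rangle = v$ can be constructed by applying an exponential of the
creation operator $a^\dagger_0$ for zero-momentum modes of the field φ:

\[
|v\rangle = K e^{C v a^\dagger_0} |0\rangle \tag{14.60}
\]

with $|0\rangle$ the vacuum with respect to the destruction modes of φ, $a_{\vec{k}}|0\rangle = 0$, $\forall \vec{k}$. K
is a normalization constant to assure that $|v\rangle$ is unit normalized.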
(a) Determine the constants C, K in terms of m, V, v.

(b) Show that $\langle -v|+v\rangle$ vanishes exponentially in the infinite volume limit V → ∞.
(c) Show that $\langle -v|\phi(\vec{x}, 0)^2|+v\rangle$ vanishes exponentially in the infinite-volume limit
V → ∞. Note that this matrix element actually vanishes identically even at
finite volume if we normal order the squared field: why? (See discussion of
coherent states at end of Section 8.3.)
15
Symmetries IV: Local symmetries
in field theory

The preceding three chapters have been devoted to examining the consequences of
two main types of symmetry in quantum field theory: those in which the dynamics
of the theory is invariant under certain transformations on the kinematical spacetime
scaffolding of the theory (we have called these “spacetime symmetries”), and those in
which the symmetry transformations operate globally (i.e., identically at all points in
spacetime) and linearly on the set of independent fields present in the theory (calling
these “internal global symmetries”). If only fields of spin zero and 1/2 are present, these
are in fact the only types of symmetry that are relevant in relativistic field theory. Once
spin-1 fields are present, however, the situation changes radically. The formulation
of renormalizable interacting field theories for spin-1 particles turns out to lead us
inexorably to the introduction of a new type of symmetry—local gauge symmetry—
which represents, in some sense, an amalgam of spacetime and internal symmetry.
Anticipating the discussion of scale dependence of Lagrangian field theories in
Part 4 of the book, we shall see that the survival to low energies of non-trivial
interactions of spin-1 particles guarantees the presence of local gauge invariance in
the dynamics of the theory describing these low-energy processes. Of course, local
gauge invariance was already fully present in the classical electrodynamics perfected
by the great work of Maxwell in the 1860s, and the incorporation of a local gauge
principle in relativistic quantum field theory was implicit in the very earliest works on
quantum electrodynamics.¹ However, a full appreciation of the extraordinarily deep
implications of local gauge symmetry for all the fundamental interactions in Nature
had to await the development of the concepts and techniques of modern quantum field
theory. In this chapter we begin our study of these implications.

15.1 Gauge symmetry: an example in particle mechanics


The basic idea of local gauge symmetry can be illustrated in a purely classical context
with a simple example from point-particle mechanics. We imagine a particle, for
convenience of unit mass, moving in one dimension subject to a potential: for reasons
that will shortly become clear, we will denote the coordinate of motion r (rather
than x, say), and suppose that the motion is restricted to the positive half-line r > 0
(the potential may be chosen to go to positive infinity as r → 0, for example). The

1 The modern “gauge” terminology, however, goes back to the work of Weyl, beginning in 1919.

Lagrangian for this system can be written:

$$L = \frac{1}{2}\dot r^2 - V(r^2) \qquad (15.1)$$

Now suppose that the motion of our particle is observed in a frame of reference
attached to a turntable (situated just below the half-line along which the particle
is moving) on which are inscribed perpendicular axes measuring two coordinates q1
and q2 . The turntable is allowed to execute capricious rotations in the course of
time, but at any time we have $r = \sqrt{q_1^2 + q_2^2}$. If we substitute this relation into the
Lagrangian, we find a new Lagrangian in terms of the q1 , q2 degrees of freedom which
describe the dynamics of the system as observed in the frame of reference affixed to the
turntable:

$$L = \frac{1}{2}\frac{(q_1\dot q_1 + q_2\dot q_2)^2}{q_1^2 + q_2^2} - V(q_1^2 + q_2^2) \qquad (15.2)$$

If we now wish to study the quantum mechanics of such a system, we must construct
a Hamiltonian via a Legendre transformation, and impose canonical commutation
relations on the conjugate momentum-coordinate pairs p1 , q1 and p2 , q2 , where

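$$p_1 = \frac{\partial L}{\partial \dot q_1} = \frac{q_1 (q_1\dot q_1 + q_2\dot q_2)}{q_1^2 + q_2^2} \qquad (15.3)$$
$$p_2 = \frac{\partial L}{\partial \dot q_2} = \frac{q_2 (q_1\dot q_1 + q_2\dot q_2)}{q_1^2 + q_2^2} \qquad (15.4)$$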

It is immediately clear that the Legendre transform does not exist in this case, for the
simple reason that it is impossible to solve uniquely for the velocities q̇1 , q̇2 in terms of
the conjugate momenta p1 , p2 , as the pair of equations (15.3, 15.4) are degenerate. In
fact, we have the identity (or “primary constraint”—one following directly from the
structure of the Lagrangian),

χ(q1 , q2 , p1 , p2 ) ≡ χ(q, p) = q1 p2 − q2 p1 = 0 (15.5)
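
The degeneracy can be checked numerically. The following Python sketch (the function names are illustrative and do not come from the text) evaluates the momenta (15.3, 15.4) at random points, confirms that q1 p2 − q2 p1 vanishes there, and confirms that the matrix of second velocity derivatives of (15.2) has zero determinant, which is the sense in which the velocities cannot be recovered from the momenta.

```python
import random

def momenta(q1, q2, v1, v2):
    """Canonical momenta (15.3)-(15.4) for the Lagrangian (15.2)."""
    radial = (q1 * v1 + q2 * v2) / (q1 ** 2 + q2 ** 2)
    return q1 * radial, q2 * radial

def velocity_hessian(q1, q2):
    """Matrix W_ij = d^2 L / d(qdot_i) d(qdot_j) = q_i q_j / (q1^2 + q2^2)."""
    s = q1 ** 2 + q2 ** 2
    return [[q1 * q1 / s, q1 * q2 / s], [q2 * q1 / s, q2 * q2 / s]]

def check_degeneracy(trials=1000, seed=0):
    """Return the largest |chi| and |det W| found over random sample points."""
    rng = random.Random(seed)
    worst_chi = 0.0
    worst_det = 0.0
    for _ in range(trials):
        q1, q2, v1, v2 = (rng.uniform(-2.0, 2.0) for _ in range(4))
        if q1 * q1 + q2 * q2 < 1e-2:
            continue  # stay away from the excluded point r = 0
        p1, p2 = momenta(q1, q2, v1, v2)
        w = velocity_hessian(q1, q2)
        worst_chi = max(worst_chi, abs(q1 * p2 - q2 * p1))
        worst_det = max(worst_det, abs(w[0][0] * w[1][1] - w[0][1] * w[1][0]))
    return worst_chi, worst_det
```

Two velocity vectors that differ only by a multiple of the tangential direction (−q2, q1) give the same momenta, so the map from velocities to momenta is many-to-one.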

Recalling our discussion of Legendre transforms in Section 14.3, we recognize a


recurrence of the disease already encountered in systems with spontaneous symmetry-
breaking, a lack of strict convexity in the quantity (in this case the Lagrangian)
undergoing the Legendre transform. Unfortunately, the energetically motivated alter-
native “sup” definition of the Legendre transform introduced there to circumvent the
difficulty is of no use here: we must have well-defined expressions for the velocities in
terms of the momenta if we wish to identify conjugate canonical variables as a prelude
to the quantization of the system and calculate a unique Hamiltonian dynamics at
the quantum level. The “flat” regions of the Lagrangian function giving rise to the
breakdown of the standard Legendre transform are easily identified: the Lagrangian
in (15.2) is invariant under the time-dependent “gauge transformations”

q1 (t) → q1 (t) cos (θ(t)) + q2 (t) sin (θ(t))


q2 (t) → −q1 (t) sin (θ(t)) + q2 (t) cos (θ(t)) (15.6)

where θ(t) is an arbitrary differentiable function of time: in the turntable picture


above, it corresponds to twisting the turntable in an arbitrary direction, given by the
angle θ(t), at any given time t. Note that the constraint (15.5) is just the component
of angular momentum in the “3” direction perpendicular to the q1 , q2 axes: it is, in
fact, the generator of the infinitesimal version of the gauge transformations (15.6).
The “gauge-invariance” of the Lagrangian (15.2) amounts simply to the statement
that the true physical coordinate $r = \sqrt{q_1^2 + q_2^2}$ is independent of θ(t), and (15.2) is
simply (15.1) in disguised form.
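
As a minimal numerical sketch of this invariance, the Python fragment below applies (15.6) to the coordinates and, via the product rule, to the velocities, and then re-evaluates (15.2). The potential V(s) = s/2 + 1/s is an assumption chosen only for illustration (it diverges as r → 0, as the text requires), and the function names are hypothetical.

```python
import math

def potential(s):
    """Illustrative potential V(s), with s = q1^2 + q2^2 (an assumption)."""
    return 0.5 * s + 1.0 / s

def lagrangian(q1, q2, v1, v2):
    """Lagrangian (15.2) in the turntable coordinates."""
    s = q1 * q1 + q2 * q2
    return 0.5 * (q1 * v1 + q2 * v2) ** 2 / s - potential(s)

def gauge_transform(q1, q2, v1, v2, theta, theta_dot):
    """Apply (15.6) to the coordinates and its time derivative to the velocities."""
    c, s = math.cos(theta), math.sin(theta)
    new_q1 = q1 * c + q2 * s
    new_q2 = -q1 * s + q2 * c
    # Differentiating (15.6) in time adds terms proportional to theta_dot.
    new_v1 = v1 * c + v2 * s + theta_dot * (-q1 * s + q2 * c)
    new_v2 = -v1 * s + v2 * c + theta_dot * (-q1 * c - q2 * s)
    return new_q1, new_q2, new_v1, new_v2
```

The extra velocity terms are tangential, so they drop out of the combination q1 q̇1 + q2 q̇2 and the value of the Lagrangian is unchanged for any θ and θ̇.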
The invariance property (15.6) means that we are free to rotate the coordinate
pair (q1 (t), q2 (t)) into the “gauge” q2 = 0 (say) by choosing $\theta(t) = \arctan\left(q_2(t)/q_1(t)\right)$, as
the dynamics is independent of θ(t): in other words, by choosing the orientation of
the turntable at any time so that the particle is situated on the q1 axis. In the gauge
q2 (t) = 0, the Lagrangian (15.2) reduces simply to
$$L = \frac{1}{2}\dot q_1^2 - V(q_1^2) \qquad (15.7)$$
which is precisely our original Lagrangian (15.1), with the trivial change of notation
r → q1. For this Lagrangian, of course, there is absolutely no problem with quantization:
we simply set $p_1 = \dot q_1$ and $H = \tfrac{1}{2}p_1^2 + V(q_1^2)$, with $[p_1, q_1] = -i\hbar$.
The Lagrangian (15.2) has an obvious generalization to three dimensions: with q
a three-vector, we choose

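$$L = \frac{1}{2}\frac{(\vec q\cdot\dot{\vec q})^2}{\vec q^{\,2}} - V(\vec q^{\,2}) \qquad (15.8)$$

In this case there are three primary constraints, $\vec q \times \vec p = 0$, following immediately from $\vec p = \vec q\,(\vec q\cdot\dot{\vec q})/\vec q^{\,2}$:

$$\chi_i(\vec q, \vec p) = \epsilon_{ijk}\, q_j p_k = 0 \qquad (15.9)$$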

which are just the angular momentum components Li , generating rotations around
the three spatial axes. In this case the set of gauge transformations

qi (t) → Rij (t)qj (t) (15.10)

with R(t) an orthogonal O(3) rotation clearly form a non-abelian group. Note that the
commutators (or at the classical level, the Poisson brackets) of the primary constraints
in this case form a closed algebra—indeed, just the Lie algebra of the rotation group.
Constraints which close in this way are referred to as “first-class” constraints, and are
always associated with the presence of superfluous “gauge” degrees of freedom which
can be eliminated by an appropriate gauge-fixing procedure. In order to understand
how to do this in a more general way, we must now turn to a brief discussion of the
theory of constrained Hamiltonian systems.

We referred above to the constraint (15.5) as a “primary” constraint—one following


directly from the definition of the momenta and the structure of the Lagrangian. The
requirement that the constraints of the theory survive the time development—in other
words, are consistent with the Euler–Lagrange equations of motion of the theory—may
lead to further constraints, which are then termed “secondary”. The distinction is not
of fundamental importance, as primary and secondary constraints are to some degree
interchangeable. Consider the Lagrangian, depending on three coordinates q0 , q1 , q2 ,

$$L = \frac{1}{2}(\dot q_1^2 + \dot q_2^2) + q_0 (q_1\dot q_2 - q_2\dot q_1) - V(q_1^2 + q_2^2) \qquad (15.11)$$

The absence of any dependence on q̇0 immediately implies the primary constraint

$$p_0 = \frac{\partial L}{\partial \dot q_0} = 0 \qquad (15.12)$$

However, the requirement that this constraint be preserved in the time evolution,

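$$\frac{\partial p_0}{\partial t} = \frac{\partial}{\partial t}\frac{\partial L}{\partial \dot q_0} = \frac{\partial L}{\partial q_0} = 0 \qquad (15.13)$$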

where we have employed the Euler–Lagrange equation for the coordinate q0 , amounts
to the further secondary constraint (setting the non-dynamical q0 = 0)

$$\frac{\partial L}{\partial q_0} = q_1\dot q_2 - q_2\dot q_1 = q_1 p_2 - q_2 p_1 = 0 \qquad (15.14)$$

which is just the primary constraint (15.5) arising from the Lagrangian (15.2). In the
next section it will become apparent that both Lagrangians have identical physical
content as constrained Hamiltonian systems. The important distinction, as we shall
see, is between those constraints whose Poisson brackets (or, in the quantum case,
commutators) with each other vanish once the constraints themselves are imposed
(so-called first-class constraints) and those with non-vanishing Poisson brackets on
the constraint surface (second-class constraints, which we do not consider further
here, as our primary interest lies in the Hamiltonian interpretation of local gauge
symmetries).

15.2 Constrained Hamiltonian systems


The problems encountered in attempting to construct a meaningful Hamiltonian from
singular Lagrangians such as (15.2, 15.8) suggest that a direct canonical interpretation
of such theories in classical phase-space is simply impossible. Dirac was the first
to show, in a masterful analysis presented in his Lectures on Quantum Mechanics
(Dirac, 1964), that this conclusion is unwarranted, and that a well-defined Hamiltonian
formalism can be constructed in the presence of constraints such as (15.5). A full
introduction to the theory of constrained Hamiltonian systems would require far more

space² than we can devote to it here, so we shall restrict ourselves to the elements of
the theory directly relevant to the canonical treatment, and quantization, of theories
with local gauge symmetries.
Let us return to the simple example described in the preceding section, with
Lagrangian (15.2). Ignoring temporarily the inconvenient absence of a unique relation
between velocities and momenta, we see, using (15.3, 15.4), that we can re-express the
Hamiltonian function for this theory, initially given as
$$H = p_1\dot q_1 + p_2\dot q_2 - L = \frac{1}{2}\frac{(q_1\dot q_1 + q_2\dot q_2)^2}{q_1^2 + q_2^2} + V(q_1^2 + q_2^2) \qquad (15.15)$$
in a number of equivalent ways: for example,
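$$H = \frac{1}{2}p_1^2\left(1 + \frac{q_2^2}{q_1^2}\right) + V(q_1^2 + q_2^2) \qquad (15.16)$$
$$= \frac{1}{2}p_2^2\left(1 + \frac{q_1^2}{q_2^2}\right) + V(q_1^2 + q_2^2) \qquad (15.17)$$
$$= \frac{1}{2}(p_1^2 + p_2^2) + V(q_1^2 + q_2^2), \ \ldots \qquad (15.18)$$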
The lack of a unique inversion for the velocities in terms of the momenta manifests
itself in the multiplicity of equivalent expressions for the Hamiltonian in the three lines
above, which are clearly equal once we take the primary constraint χ(q1 , q2 , p1 , p2 ) =
q1 p2 − q2 p1 = 0 into account. In fact, the set of Hamiltonians given by (the “T”
subscript denotes “total Hamiltonian”, including constraints, in Dirac’s language)
$$H_T = \frac{1}{2}(p_1^2 + p_2^2) + V(q_1^2 + q_2^2) - \lambda(t)\,\chi(q_1, q_2, p_1, p_2) \qquad (15.19)$$
with λ(t) an arbitrary function of time (either explicitly, and/or through an arbitrary
function of the coordinates q, p), are all equivalent in this sense. If we derive Hamiltonian
equations of motion $\dot q = \partial H/\partial p$, $\dot p = -\partial H/\partial q$ in the usual way from (15.19), treating
the variables q1, q2, p1, p2 as normal unconstrained variables, we obtain

$$\dot q_1 = p_1 + \lambda q_2, \qquad \dot p_1 = -\frac{\partial V}{\partial q_1} + \lambda p_2$$
$$\dot q_2 = p_2 - \lambda q_1, \qquad \dot p_2 = -\frac{\partial V}{\partial q_2} - \lambda p_1 \qquad (15.20)$$
The interpretation of the arbitrary function λ, which we shall see also plays the role
of a Lagrange multiplier enforcing the constraint, becomes clear if we introduce new
primed coordinates and momenta

$$\vec q(t) = R(t)\,\vec q\,'(t), \qquad \vec p(t) = R(t)\,\vec p\,'(t) \qquad (15.21)$$

2 For a careful and very thorough treatment of the full theory of constrained systems, with emphasis on
gauge theories, see (Henneaux and Teitelboim, 1992).

with R(t) the time-dependent rotation matrix


 
$$R(t) = \begin{pmatrix} \cos\theta(t) & \sin\theta(t) \\ -\sin\theta(t) & \cos\theta(t) \end{pmatrix} \qquad (15.22)$$

transforming us from a stationary frame to the wobbly “turntable” frame of the


previous section, with the angular velocity of the turntable θ̇(t) = λ(t). One then
finds that the primed coordinates and momenta satisfy the Hamiltonian equations
without the additional λ-term in (15.19):

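$$\dot q'_1 = p'_1, \qquad \dot p'_1 = -\frac{\partial V}{\partial q'_1} \qquad (15.23)$$
$$\dot q'_2 = p'_2, \qquad \dot p'_2 = -\frac{\partial V}{\partial q'_2} \qquad (15.24)$$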

In other words, the arbitrariness of the constraint term in (15.19) precisely incorporates
the gauge freedom in the solutions of the underlying one-dimensional problem when
viewed in the floating turntable frame, if we interpret the Lagrange multiplier function
λ(t) as the angular velocity of the turntable θ̇(t) at any given time.
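
This interpretation can be tested numerically. The sketch below integrates (15.20) twice from the same initial point on the constraint surface, once with λ = 0 and once with an arbitrarily chosen λ(t), and then rotates the first solution by θ = ∫λ dt using (15.21, 15.22). The potential V(s) = s/2 + 1/s, the choice λ(t) = 0.5 + sin t, the initial data and the Runge-Kutta integrator are all assumptions made for illustration only.

```python
import math

def dV(s):
    """Derivative V'(s) of the illustrative potential V(s) = s/2 + 1/s."""
    return 0.5 - 1.0 / (s * s)

def rhs(t, y, lam):
    """Right-hand side of the equations of motion (15.20)."""
    q1, q2, p1, p2 = y
    g = 2.0 * dV(q1 * q1 + q2 * q2)   # dV/dq_i = g * q_i
    l = lam(t)
    return (p1 + l * q2, p2 - l * q1, -g * q1 + l * p2, -g * q2 - l * p1)

def rk4(y, lam, t_end, steps):
    """Integrate from t = 0 to t_end with the classical Runge-Kutta scheme."""
    h = t_end / steps
    t = 0.0
    for _ in range(steps):
        k1 = rhs(t, y, lam)
        k2 = rhs(t + h / 2, tuple(a + h / 2 * b for a, b in zip(y, k1)), lam)
        k3 = rhs(t + h / 2, tuple(a + h / 2 * b for a, b in zip(y, k2)), lam)
        k4 = rhs(t + h, tuple(a + h * b for a, b in zip(y, k3)), lam)
        y = tuple(a + h / 6 * (b + 2 * c + 2 * d + e)
                  for a, b, c, d, e in zip(y, k1, k2, k3, k4))
        t += h
    return y

def compare(t_end=2.0, steps=4000):
    """Return the lambda = 0 solution, the lambda(t) solution, and the rotated former."""
    start = (1.5, 0.0, 0.3, 0.0)                  # satisfies chi = q1 p2 - q2 p1 = 0
    lam = lambda t: 0.5 + math.sin(t)             # arbitrary turntable angular velocity
    theta = 0.5 * t_end + 1.0 - math.cos(t_end)   # integral of lam from 0 to t_end
    fixed = rk4(start, lambda t: 0.0, t_end, steps)
    wobbly = rk4(start, lam, t_end, steps)
    c, s = math.cos(theta), math.sin(theta)
    rotated = (c * fixed[0] + s * fixed[1], -s * fixed[0] + c * fixed[1],
               c * fixed[2] + s * fixed[3], -s * fixed[2] + c * fixed[3])
    return fixed, wobbly, rotated
```

Within integration error, the solution with λ(t) switched on coincides with the rotated λ = 0 solution, so the radius √(q1² + q2²) is the same in both runs and the constraint stays satisfied.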
At this point, it is useful to recall that the classical Hamiltonian equations of
the theory, (15.20), can be expressed in a form which is particularly suggestive when
one wishes to make the transition to quantum theory, in terms of the Poisson bracket
{F, G} defined on arbitrary functions F (qi , pi ), G(qi , pi ) on the (unconstrained) phase-
space as follows

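$$\{F, G\} \equiv \frac{\partial F}{\partial q_i}\frac{\partial G}{\partial p_i} - \frac{\partial F}{\partial p_i}\frac{\partial G}{\partial q_i} \qquad (15.25)$$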

Using the Poisson brackets, the dynamical evolution on phase-space (i.e., Eqs. (15.20))
amounts to

q̇i = {qi , HT }, ṗi = {pi , HT } (15.26)

Equivalently, we can say that the total Hamiltonian acts as the generator of infinitesi-
mal time translations: for example, qi (t + δt) = qi (t) + {qi , δt · HT }, etc. The primary
constraint χ(q1 , q2 , p1 , p2 ) = q1 p2 − q2 p1 = 0 is itself left invariant under Hamiltonian
evolution, {χ, HT } = 0: this is physically obvious in our toy model, as the con-
straint is just the angular momentum which is preserved under the two-dimensional
motion of our particle in the central potential V (q12 + q22 ). Thus the constraint, once
applied as an initial condition at time t = 0, will automatically be satisfied at any
later time on trajectories following the Hamiltonian evolution (15.26). Or, in yet
other words, the three-dimensional constraint surface obtained by restricting the
four-dimensional phase-space (q1 , q2 , p1 , p2 ) to points satisfying χ(q1 , q2 , p1 , p2 ) = 0 is
invariant under Hamiltonian evolution. However, this three-dimensional space, as it
is odd-dimensional, cannot act as a proper dynamical phase-space (with an equal
number of “p’s” and “q’s”). In fact, it is clearly still too large, as it contains distinct

points representing physically equivalent states of the system: those related by a gauge
transformation
$$(\vec q, \vec p) \to (\vec q\,' = R\vec q,\ \vec p\,' = R\vec p) \qquad (15.27)$$

where R is a 2×2 rotation matrix (see (15.22)). As the rotation R varies over all
possible rotation angles 0 ≤ θ < 2π, the points $\vec q\,', \vec p\,'$ trace out a one-dimensional
“gauge orbit” of physically equivalent points in phase-space. In the preceding section
we saw that the gauge ambiguity of the system defined by Lagrangian (15.2) could
be eliminated by imposing a “gauge condition” (such as ψ(q, p ) = q2 = 0), at which
point we recover a non-singular Lagrangian (15.7) with perfectly regular canonical
properties. An appropriately chosen gauge condition ψ(q, p ) defines a surface in the
original four-dimensional phase-space of our unconstrained system which intersects
the gauge orbit passing through any given point in phase-space exactly once. The
imposition of such a condition means that the gauge freedom of the unconstrained
system has been completely eliminated: in the turntable model of the previous section,
it means that we have specified unambiguously the orientation of the turntable at every
moment in time.
Note that the constraint function χ(q, p ) acts as the infinitesimal generator of
gauge transformations (i.e., O(2) rotations), as

{q1 , δθ · χ} = −δθ q2 , {q2 , δθ · χ} = +δθ q1 (15.28)


{p1 , δθ · χ} = −δθ p2 , {p2 , δθ · χ} = +δθ p1 (15.29)

Note also that a necessary condition for the gauge freedom to be completely eliminated
is that the Poisson bracket {ψ, χ} of the gauge-fixing function and the constraint be
non-zero: otherwise put, once on the gauge-fixed surface, any gauge transformation,
and in particular any infinitesimal gauge transformation, must move us off that
surface. Simple axial gauges such as ψ = q2 clearly satisfy this requirement, as we see
from (15.28).
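
The brackets quoted above can be verified with a small numerical Poisson bracket. The Python sketch below implements (15.25) by central differences; the helper names, the step size and the potential V(s) = s/2 + 1/s are assumptions introduced for illustration and are not part of the text.

```python
def poisson_bracket(F, G, z, h=1e-4):
    """Poisson bracket (15.25) at the phase-space point z = (q1, q2, p1, p2)."""
    n = len(z) // 2

    def partial(func, k):
        # Central-difference estimate of d(func)/d(z_k).
        up = list(z)
        up[k] += h
        down = list(z)
        down[k] -= h
        return (func(*up) - func(*down)) / (2 * h)

    return sum(partial(F, i) * partial(G, n + i) - partial(F, n + i) * partial(G, i)
               for i in range(n))

def V(s):
    """Illustrative potential, an assumption and not taken from the text."""
    return 0.5 * s + 1.0 / s

def chi(q1, q2, p1, p2):
    """Primary constraint (15.5)."""
    return q1 * p2 - q2 * p1

def psi(q1, q2, p1, p2):
    """Axial gauge condition psi = q2."""
    return q2

def h_total(lam):
    """Total Hamiltonian (15.19) for a fixed value lam of the multiplier."""
    return lambda q1, q2, p1, p2: (0.5 * (p1 ** 2 + p2 ** 2) + V(q1 ** 2 + q2 ** 2)
                                   - lam * chi(q1, q2, p1, p2))
```

At a generic point, `poisson_bracket(psi, chi, z)` returns q1, which is non-zero away from the q2 axis, while `poisson_bracket(chi, h_total(lam), z)` vanishes up to discretization error for any value of `lam`.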
The three-dimensional version of our toy model, (15.8), has three primary first-
class constraints (15.9) whose Poisson brackets are just the Lie algebra of the gauge
group O(3):
$$\{\chi_i, \chi_j\} = \epsilon_{ijk}\,\chi_k \qquad (15.30)$$

The reader will recall that a set of constraints is said to be first-class if their Poisson
brackets vanish once the constraints themselves are imposed, which is certainly the
case if they form a closed Lie algebra as here. The gauge orbits in this model correspond
to spheres of fixed radius $|\vec q|$, $|\vec p|$ for the coordinate and momentum vectors. Again,
a complete gauge-fixing, amounting to selecting a single representative point on each
gauge orbit, is easily achieved by the axial gauge corresponding to imposing, say,
ψ1 = q1 = 0, ψ2 = q2 = 0, which at the Lagrangian level amounts to rotating the q
vector at each time into the z-direction. Note that on the gauge-fixed surface, the
constraint χ3 = q1 p2 − q2 p1 is automatically satisfied: there are only two independent
first-class constraints, χ1 and χ2 which act non-trivially on this surface, and indeed
the non-degeneracy of the determinant

$$\det\{\psi_m, \chi_n\} = q_3^2 \neq 0, \qquad 1 \le m, n \le 2 \qquad (15.31)$$

assures us that no non-zero linear combination of the gauge transformations imple-


mented by χ1 and χ2 can leave us on the gauge surface defined by the gauge conditions
ψi = 0. It is clear that the imposition of the two gauge conditions and two independent
first-class constraints should reduce our originally six-dimensional phase-space to the
two-dimensional phase-space appropriate for describing the underlying “true” one-
dimensional physics of the model. We will now see, following the seminal discussion of
Faddeev (Faddeev, 1969), how this can be accomplished in a very general way at the
Hamiltonian level using the technique of canonical transformations. Our end result
will be the famous Dirac–Faddeev formula giving a well-defined functional integral
quantization of a Hamiltonian system with first-class (gauge) constraints.
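
For the reader who wishes to check the determinant (15.31), the following short computation may help. It assumes only the constraints (15.9), the gauge conditions ψ1 = q1, ψ2 = q2, and the bracket (15.25), for which {q_m, F} = ∂F/∂p_m.

```latex
\begin{align*}
\chi_1 &= q_2 p_3 - q_3 p_2, \qquad \chi_2 = q_3 p_1 - q_1 p_3, \\
\{\psi_m, \chi_n\} &= \frac{\partial \chi_n}{\partial p_m}
  \;\Longrightarrow\;
  \begin{pmatrix} \{\psi_1,\chi_1\} & \{\psi_1,\chi_2\} \\ \{\psi_2,\chi_1\} & \{\psi_2,\chi_2\} \end{pmatrix}
  = \begin{pmatrix} 0 & q_3 \\ -q_3 & 0 \end{pmatrix}, \\
\det\{\psi_m, \chi_n\} &= 0 \cdot 0 - (q_3)(-q_3) = q_3^2 .
\end{align*}
```

The determinant is non-zero wherever q3 ≠ 0, that is, everywhere on the gauge-fixed surface except the excluded point at the origin.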
Let us assume that our constrained Hamiltonian system is initially defined on a
2f -dimensional phase-space with phase-space coordinates (q1 , .., qf , p1 , .., pf ), with (cf.
(15.19) as an example)


$$H_T = h(q_1, \ldots, q_f, p_1, \ldots, p_f) + \sum_{m=1}^{r} \lambda_m \chi_m \qquad (15.32)$$

and that the set of first-class constraints χm (qi , pi ) = χm (q, p), m = 1, . . . , r generate
gauge transformations whose orbits intersect uniquely the submanifold defined by a
set of r gauge conditions ψm (qi , pi ) = 0, m = 1, . . . , r. As we saw earlier, this implies
that the determinant of the Poisson bracket matrix {ψm , χn } be non-vanishing on
the constraint surface. We shall assume that the gauge conditions are chosen to have
vanishing Poisson brackets with each other:

{ψm , ψn } = 0 (15.33)

The commutativity of the gauge-fixing conditions implies that we can find a canonical
transformation to a new set of 2f coordinates and momenta, which we shall label
(Q∗1 , .., Q∗f −r , Q1 , .., Qr , P1∗ , .., Pf∗−r , P1 , .., Pr ), where the ψm play the role of the last
r momenta (which necessarily commute)

Pm ≡ ψm (q, p ), m = 1, .., r (15.34)

We recall that a canonical transformation on phase-space is a change of coordinates


which leaves the Poisson bracket invariant: in particular we must have {Q∗i , Pj∗ } = δij
for the first f − r conjugate pairs and {Qm , Pn } = δmn for the final r pairs of the
gauge constraints with their conjugate coordinates.
For example, if our system is the f = 2-dimensional toy model defined by (15.19),
and we wish to impose the (single) axial gauge condition ψ = αq1 + βq2 = 0, α2 + β 2 =
1, a suitable set of new coordinates would be Q∗1 = βq1 − αq2 , P1∗ = βp1 − αp2 , Q1 =
−αp1 − βp2 , P1 = ψ = αq1 + βq2 . Note that the determinant of the Poisson bracket
matrix of χs and ψs is just the Jacobian of the change of variables from χm to Qn :

$$\det\{\chi_m, P_n\} = \det\left(\frac{\partial \chi_m}{\partial Q_n}\right) \neq 0 \qquad (15.35)$$

which ensures that our first-class constraints, re-expressed in the new variables,
χm (Q∗i , Pi∗ , Qm , Pm ) = 0 (15.36)
can be solved uniquely for the r new coordinates Qm , m = 1, .., r as functions of the
constrained starred variables only (once the Pm = 0 gauge conditions are applied):

Qm = Qm (Q∗i , Pi∗ , Pm = 0) ≡ fm (Q∗i , Pi∗ ) (15.37)


In the example given above, the single first-class constraint q1 p2 − q2 p1 = 0 becomes
$Q_1 Q_1^* + P_1 P_1^* = 0$, which allows us to eliminate the coordinate $Q_1 = -P_1 P_1^*/Q_1^*$, which, of
course, just amounts to Q1 = 0 on the gauge-fixed surface P1 = ψ = 0.
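
The substitution behind this statement can be written out explicitly. The worked step below assumes only the new variables defined above together with α² + β² = 1; it inverts the transformation and inserts the result into the constraint and into the unconstrained part of the Hamiltonian (15.19).

```latex
\begin{align*}
q_1 &= \beta Q_1^* + \alpha P_1, & q_2 &= -\alpha Q_1^* + \beta P_1, \\
p_1 &= \beta P_1^* - \alpha Q_1, & p_2 &= -\alpha P_1^* - \beta Q_1, \\[4pt]
q_1 p_2 - q_2 p_1 &= -\left(Q_1 Q_1^* + P_1 P_1^*\right), \\
\tfrac{1}{2}(p_1^2 + p_2^2) + V(q_1^2 + q_2^2)
  &= \tfrac{1}{2}\left((P_1^*)^2 + Q_1^2\right) + V\!\left((Q_1^*)^2 + P_1^2\right).
\end{align*}
```

Setting P1 = 0 and Q1 = 0 in the last line leaves a Hamiltonian that depends on the starred pair alone.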
Returning to the general situation as given by (15.32), the usual lore on canonical
transformations tells us that the physics of the system is uniquely captured by
employing just the set of 2(f − r) constrained variables (Q∗1 , .., Q∗f −r , P1∗ , .., Pf∗−r )
with a constrained Hamiltonian H ∗ , obtained by expressing the original Hamiltonian
HT in terms of the new variables and then implementing the gauge conditions Pm = 0
and using the constraints χm to eliminate the Qm coordinates as in (15.37):

$$H^*(Q^*_i, P^*_i) = h(q_1, \ldots, q_f, p_1, \ldots, p_f)\Big|_{P_m = 0,\; Q_m = f_m(Q^*_i, P^*_i)} \qquad (15.38)$$
Once again, resorting to our simple toy model as a concrete example, we find that our
original Hamiltonian (15.19) becomes, in terms of the constrained starred variables
Q∗1 , P1∗ , in the gauge ψ = αq1 + βq2 = 0,
$$H^*(Q_1^*, P_1^*) = \frac{1}{2}(P_1^*)^2 + V((Q_1^*)^2) \qquad (15.39)$$
This Hamiltonian has exactly the form we would obtain by the conventional canonical
procedure beginning from the non-singular Lagrangian (15.7), which the reader will
recall was obtained by exploiting the gauge symmetry of the singular Lagrangian
(15.2) to eliminate the gauge freedom in the system ab initio (by setting q2 = 0). The
reader is strongly encouraged to carry through the gauge-fixing procedure in the three-
dimensional version of the model, with Lagrangian (15.8) and an O(3) non-abelian
gauge symmetry: the end result will be exactly the same Hamiltonian, representing a
theory with a single physical degree of freedom.
The fully constrained Hamiltonians in (15.38, 15.39) can be subjected to quantization
in the normal way, by imposing the canonical commutator condition³

$$[Q^*_i, P^*_j] = i\hbar\,\delta_{ij} \qquad (15.40)$$

Equivalently, we may formulate the theory in the path-integral framework by writing


the functional integral for the propagation kernel (cf. (4.120))
$$K(t_f, t_i) = \int \prod_{i=1}^{f-r} DQ^*_i\, DP^*_i \; \exp\left\{\frac{i}{\hbar}\int_{t_i}^{t_f} \left(P^*_i(t)\,\dot Q^*_i(t) - H^*(Q^*_i(t), P^*_i(t))\right) dt\right\} \qquad (15.41)$$

3 As pointed out originally by Dirac, the classical to quantum transition in this context amounts simply
to the replacement $\{\ldots, \ldots\} \to -\frac{i}{\hbar}[\ldots, \ldots]$.

We are now going to do something which at first sight seems very strange indeed: we
wish to write an equivalent functional-integral representation for the kernel K, but
in terms of the full set of unconstrained variables q1 , . . . , qf , p1 , . . . , pf from which we
started. In other words, we wish to restore the physically superfluous gauge degrees
of freedom which we have just expended so much effort to eliminate! For the simple
mechanical examples considered so far, such a maneuver would be completely unnec-
essary and, indeed, pointless, but for the gauge field theories which we are about to
explore it is precisely the unconstrained version of the theory which manifests directly
the critical (from a field-theoretic point of view) locality and Poincaré invariance
properties which we are enjoined to preserve at all costs.
First, we restore the original set of 2f coordinates and momenta at any given
time in the functional integral measure by inserting δ-functions which incorporate the
procedures by which we originally eliminated the 2r coordinates and momenta Qm , Pm
to obtain the constrained Hamiltonian H ∗ :
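$$\prod_{i=1}^{f-r} DQ^*_i\, DP^*_i \;\to\; \prod_{i=1}^{f-r} DQ^*_i\, DP^*_i \prod_{m=1}^{r} DQ_m\, DP_m\, \delta(P_m)\, \delta(Q_m - f_m(Q^*_i, P^*_i))$$
$$= \prod_{i=1}^{f} Dq_i\, Dp_i \prod_{m=1}^{r} \delta(\psi_m)\, \delta(Q_m - f_m(Q^*_i, P^*_i)) \qquad (15.42)$$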


where we have used the fact that the canonical measure $\prod_{i=1}^{f} Dq_i\, Dp_i$ is invariant
under the canonical transformation to the (Q∗1 , .., Q∗f −r , Q1 , .., Qr , P1∗ , .., Pf∗−r ,
P1 , .., Pr ) variables. The δ-functions of the Qm coordinates in (15.42) can be traded
in for δ-functions of the first-class constraints χm at the cost of the Jacobian (15.35):


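$$\prod_{m=1}^{r} \delta(Q_m - f_m(Q^*_i, P^*_i)) = \prod_{m=1}^{r} \delta(\chi_m)\, \frac{\partial(\chi_1, \chi_2, \ldots, \chi_r)}{\partial(Q_1, Q_2, \ldots, Q_r)} = \prod_{m=1}^{r} \delta(\chi_m) \cdot \det\{\chi_m, \psi_n\} \qquad (15.43)$$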
The reader will recall (see, for example, (Goldstein, 2002), Section 9.1) that in Hamilto-
nian systems the combination pi q̇i − H is unchanged under a canonical transformation
up to an additive total time-derivative dF/dt (where F is the generating function of the
canonical transformation), which in the exponent of the path integral will lead to
an overall phase factor $e^{\frac{i}{\hbar}(F_f - F_i)}$, where Fi (resp. Ff ) are the initial (resp. final)
values of the generating function over the time evolution from ti to tf . In the field
theory case, we shall be letting the initial and final times go to −∞ and +∞, where
the fields can be safely switched off, so we may ignore this factor here. Recalling
that the constrained Hamiltonian H ∗ in (15.41) is precisely obtained by subjecting
the unconstrained Hamiltonian h(q1 , .., qf , p1 , .., pf ) to the δ-function constraints in
(15.42), we see that our expression for the propagation kernel in terms of a path
integral over constrained variables can be written
 
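$$K(t_f, t_i) = \int \prod_{i=1}^{f} Dq_i\, Dp_i \prod_{m=1}^{r} \delta(\chi_m)\, \delta(\psi_m)\, \det\{\chi_m, \psi_n\}\; \exp\left\{\frac{i}{\hbar}\int_{t_i}^{t_f} \left(p_i \dot q_i - h(q_i, p_i)\right) dt\right\}$$
$$= \int \prod_{i=1}^{f} Dq_i\, Dp_i \prod_{m=1}^{r} D\lambda_m\, \delta(\psi_m)\, \det\{\chi_m, \psi_n\}\; \exp\left\{\frac{i}{\hbar}\int_{t_i}^{t_f} \left(p_i \dot q_i - h(q_i, p_i) - \lambda_m \chi_m\right) dt\right\}$$
$$= \int \prod_{i=1}^{f} Dq_i\, Dp_i \prod_{m=1}^{r} D\lambda_m\, \delta(\psi_m)\, \det\{\chi_m, \psi_n\}\; \exp\left\{\frac{i}{\hbar}\int_{t_i}^{t_f} \left(p_i \dot q_i - H_T(q_i, p_i)\right) dt\right\} \qquad (15.44)$$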

In the second line we have implemented the first-class constraints χm = 0 by intro-


ducing a set of auxiliary variables λm , m = 1, .., r, the functional integral over which
reproduces the desired δ-functions δ(χm ). These variables play exactly the role of the
Lagrange multipliers we introduced earlier in defining the total Hamiltonian HT in
(15.32), as we see in the final form (15.44), the previously announced Dirac–Faddeev
formula. The Jacobian determinant det{χm , ψn } appearing in this formula has become
known in the field theory literature as the “DeWitt–Faddeev–Popov” determinant,
which we shall henceforth refer to as the “DFP” determinant. We shall now see that
it plays an extremely important role in the functional integral quantization of gauge
field theory.
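
The step from the first to the second line of (15.44) rests on the Fourier representation of the δ-function, applied independently at each time. As a sketch, assuming a time-slicing of the interval into steps of length Δt labeled by t_k (notation introduced here for illustration) and absorbing all constant normalization factors into the measure Dλ_m:

```latex
\begin{align*}
\delta(\chi) &= \int_{-\infty}^{\infty} \frac{d\lambda}{2\pi\hbar}\; e^{-\frac{i}{\hbar}\lambda\chi}, \\
\prod_{k} \delta\big(\chi_m(t_k)\big)
  &\propto \int \prod_{k} d\lambda_m(t_k)\;
     \exp\Big\{-\frac{i}{\hbar} \sum_{k} \Delta t\, \lambda_m(t_k)\, \chi_m(t_k)\Big\}
  \;\longrightarrow\; \int D\lambda_m\; \exp\Big\{-\frac{i}{\hbar}\int_{t_i}^{t_f} \lambda_m \chi_m\, dt\Big\}.
\end{align*}
```

The sign of the exponent matches the term −λ_m χ_m in the second line of (15.44); since λ_m is integrated over all real values, the opposite sign convention would serve equally well.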

15.3 Abelian gauge theory as a constrained Hamiltonian system


Maxwellian electrodynamics provides the classic example of a field theory with a local
gauge symmetry, exemplifying just the features discussed in the previous two sections.
The classical Lagrangian of this field theory
$$\mathcal{L}_{EM} = -\frac{1}{4} F_{\mu\nu} F^{\mu\nu} - J^\mu A_\mu, \qquad F_{\mu\nu} \equiv \partial_\mu A_\nu - \partial_\nu A_\mu \qquad (15.45)$$
where the external current J μ is conserved, ∂μ J μ = 0, has the Euler–Lagrange equa-
tion for the field Aν , ν = 0, 1, 2, 3
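$$\partial_\mu \frac{\partial \mathcal{L}_{EM}}{\partial(\partial_\mu A_\nu)} = \partial_\mu(-F^{\mu\nu}) \qquad (15.46)$$
$$= \frac{\partial \mathcal{L}_{EM}}{\partial A_\nu} = -J^\nu \qquad (15.47)$$
$$\Rightarrow\ \partial_\mu F^{\mu\nu} = J^\nu \qquad (15.48)$$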

whereupon, in (15.48), we recognize Maxwell’s equations in covariant notation. The


action defined by this Lagrangian,

$$I_{EM} = \int \mathcal{L}_{EM}\, d^4x \qquad (15.49)$$

is invariant under the local (i.e., spacetime-dependent) gauge transformation

$$A_\mu(x) \to A^\Lambda_\mu(x) \equiv A_\mu(x) + \partial_\mu \Lambda(x) \qquad (15.50)$$

where Λ(x) is an arbitrary twice-differentiable function (as we wish the field strengths
Fμν to remain well-defined after the gauge transformation). The invariance of the term
in the action involving Jμ is apparent after we use integration by parts to transfer the

spacetime-derivative in the variation due to the gauge transformation to the conserved


current Jμ , ignoring, as usual, boundary terms at spatiotemporal infinity where the
fields may be switched off (or periodic boundary conditions imposed).
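
One ingredient of this invariance, the gauge independence of the field strength Fμν itself, can be illustrated on a lattice, where forward differences commute exactly. The following Python sketch is an illustration under stated assumptions: the lattice spacing, the sample fields A and Λ, and all function names are invented here and are not from the text. It does not test the current term, which additionally requires current conservation.

```python
import math
from itertools import product

H = 0.1  # lattice spacing (arbitrary illustrative value)

def forward_diff(f, mu):
    """Forward lattice derivative of a site function f in direction mu."""
    def df(n):
        step = tuple(c + (1 if k == mu else 0) for k, c in enumerate(n))
        return (f(step) - f(n)) / H
    return df

def field_strength(A):
    """F[mu][nu] = D_mu A_nu - D_nu A_mu for a list A of four site functions."""
    return [[(lambda n, m=mu, v=nu: forward_diff(A[v], m)(n) - forward_diff(A[m], v)(n))
             for nu in range(4)] for mu in range(4)]

def gauge_transform(A, Lam):
    """A_mu -> A_mu + D_mu Lambda, a lattice analog of (15.50)."""
    return [(lambda n, m=mu: A[m](n) + forward_diff(Lam, m)(n)) for mu in range(4)]

# Sample smooth fields evaluated on integer lattice sites n = (n0, n1, n2, n3).
A = [lambda n, k=k: math.sin(H * (n[0] + 2 * n[1] - n[2]) + k) * (1 + H * n[3])
     for k in range(4)]
Lam = lambda n: math.cos(H * (n[0] - n[1] + 3 * n[2] + 2 * n[3]))

def max_change(sites=tuple(product(range(3), repeat=4))):
    """Largest change of any F component under the gauge transformation."""
    F = field_strength(A)
    F2 = field_strength(gauge_transform(A, Lam))
    return max(abs(F[m][v](n) - F2[m][v](n))
               for n in sites for m in range(4) for v in range(4))
```

The change in every component of F is zero up to floating-point rounding, because the two orderings of the second difference of Λ cancel identically.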
The set of transformations (15.50) evidently form an abelian group, as a gauge
transformation induced by gauge-function Λ1 (x) followed by another induced by
Λ2 (x) produces the same result if performed in the opposite order (Λ2 before Λ1 ).
The transformations (15.50) are the field-theoretic analog of the time-dependent
transformations (15.6) which preserved the Lagrangian of our mechanical toy example
(15.2). Indeed, we shall shortly see that the gauge freedom implied by the insensitivity
of the dynamics to the transformations (15.50) corresponds exactly to the variation
induced by a set of first-class constraints, which in this case satisfy an abelian algebra:
their Poisson brackets vanish identically, whether on or off the constraint surface.
The analogy is almost, but not quite, exact: in the case of electrodynamics, first-
class constraints arise as secondary constraints, as the condition that the primary
constraints of the theory be preserved by the dynamics. In fact, the third mechanical
example given in Section 15.1, with Lagrangian (15.11), and secondary first-class
constraints (but otherwise equivalent in physical content to Lagrangian system (15.2)),
is essentially identical in its constraint structure to Maxwell electrodynamics. To see
this, we construct the conjugate momentum fields to the basic four-vector field Aμ in
the usual way (cf. (12.44)):
$$\frac{\partial \mathcal{L}_{EM}}{\partial \dot A_\mu} = -F^{0\mu} \qquad (15.51)$$

As F 00 = 0, we have the primary constraint

$$\Pi_0 = -F^{00} = 0 \qquad (15.52)$$

while for the spatial components of the field Aμ , the conjugate momentum fields are
recognized as the electric field components of Maxwellian electrodynamics:

$$\Pi_i = -F^{0i} = F_{0i} = \partial_0 A_i - \partial_i A_0 \equiv E^i, \qquad i = 1, 2, 3 \qquad (15.53)$$

The vanishing of the momentum Π0 field conjugate to A0 is the automatic consequence


of the absence of time-derivatives of A0 in the Lagrangian: consequently, the Euler–
Lagrange equation for A0 (which is essentially the equation asserting that the primary
constraint Π0 = 0 is maintained in time) amounts to a secondary constraint relating
fields (and/or their conjugate momenta) on a given time-slice:
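$$0 = \dot\Pi_0 = \frac{\partial}{\partial t}\frac{\partial \mathcal{L}_{EM}}{\partial \dot A_0} = -\partial_i \frac{\partial \mathcal{L}_{EM}}{\partial(\partial_i A_0)} + \frac{\partial \mathcal{L}_{EM}}{\partial A_0} = \partial_i E^i - J^0 \qquad (15.54)$$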
which we recognize as Gauss’s Law in classical electromagnetism. Note that this
secondary constraint only depends on, and thereby restricts, the conjugate momentum
fields Πi = E i of the spatial vector potential Ai , in this case by setting the divergence
of the electric field equal to the charge density (which we here take to be an externally
prescribed classical function) at the same time. We now introduce unconstrained Pois-
son brackets on the classical phase-space defined by the conjugate pair (Ai (x), Πi (x))

(at a given time)



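$$\{F[A_i, \Pi_i], G[A_i, \Pi_i]\} \equiv \int \left( \frac{\delta F}{\delta A_i(\vec x)} \frac{\delta G}{\delta \Pi_i(\vec x)} - (F \leftrightarrow G) \right) d^3x \qquad (15.55)$$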

for arbitrary functionals F, G: in particular, the Poisson brackets for the spatial vector
potential and its conjugate (electric) field are simply

{Ai (x), Πj (y )} = δij δ 3 (x − y ) (15.56)

The secondary constraints (plural, as we consider this equation as defining an inde-


pendent constraint at each spatial point separately)

χ(x) ≡ J 0 − ∂i E i = J 0 − ∂i Πi = 0 (15.57)

are in fact first-class, as their Poisson brackets vanish identically:4

{χ(x), χ(y )} = 0 (15.58)



In analogy to (15.28), an infinitesimal linear combination ∫ λ(y)χ(y)d³y of the
first-class constraints acts as the infinitesimal generator of the associated gauge
transformations of the theory,

$$\left\{A_i(\vec{x}), \int \lambda(\vec{y})\chi(\vec{y})\,d^3y\right\} = -\int \lambda(\vec{y})\,\partial_i \delta^3(\vec{x}-\vec{y})\,d^3y = \partial_i \lambda(\vec{x}) \tag{15.59}$$

while the effect of a gauge-transformation on the momentum field variables (i.e., the
electric field) is null:

$$\left\{\Pi^i(\vec{x}), \int \lambda(\vec{y})\chi(\vec{y})\,d^3y\right\} = 0 \tag{15.60}$$

—i.e., the electric field, unlike the vector potential, is gauge-invariant, and therefore
possesses a direct physical meaning. Just as in the mechanical examples of the
preceding sections, we are free to make such a transformation of the canonical fields
independently at different times, so we recognize the gauge invariance generated by
the first-class constraints of this theory as just the local gauge invariance (15.50)—
restricted to the dynamical canonical fields Ai , of course.
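The statement that the transformation generated in (15.59) shifts the potential but leaves gauge-invariant quantities untouched can be checked on a toy lattice. The sketch below is only an illustration under assumptions that are not part of the text: a small periodic two-dimensional lattice, forward differences in place of derivatives, and arbitrarily chosen test fields (the names `forward_diff`, `curl`, `A1`, `A2`, `Lam` are hypothetical). It verifies that the lattice analog of F_12 is unchanged under A_i → A_i + ∂_i Λ even though A_i itself changes.

```python
import math

N = 6  # sites per direction on a periodic 2D lattice (illustrative choice)

def forward_diff(f, axis):
    """Periodic forward difference of a lattice field f[x][y] along the given axis."""
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            if axis == 0:
                out[x][y] = f[(x + 1) % N][y] - f[x][y]
            else:
                out[x][y] = f[x][(y + 1) % N] - f[x][y]
    return out

def curl(A1, A2):
    """Lattice analog of F_12 = d_1 A_2 - d_2 A_1."""
    d1A2 = forward_diff(A2, 0)
    d2A1 = forward_diff(A1, 1)
    return [[d1A2[x][y] - d2A1[x][y] for y in range(N)] for x in range(N)]

def max_abs_diff(f, g):
    """Largest pointwise difference between two lattice fields."""
    return max(abs(f[x][y] - g[x][y]) for x in range(N) for y in range(N))

# Arbitrary test configuration and gauge function (hypothetical values).
A1 = [[math.sin(2 * math.pi * x / N) * math.cos(2 * math.pi * y / N)
       for y in range(N)] for x in range(N)]
A2 = [[math.cos(2 * math.pi * (x + y) / N) for y in range(N)] for x in range(N)]
Lam = [[math.sin(2 * math.pi * (x + 2 * y) / N) for y in range(N)] for x in range(N)]

# Gauge transformation A_i -> A_i + d_i Lambda, the lattice version of (15.59).
dL1, dL2 = forward_diff(Lam, 0), forward_diff(Lam, 1)
A1g = [[A1[x][y] + dL1[x][y] for y in range(N)] for x in range(N)]
A2g = [[A2[x][y] + dL2[x][y] for y in range(N)] for x in range(N)]

shift_in_A = max_abs_diff(A1, A1g)                       # the potential changes
shift_in_F = max_abs_diff(curl(A1, A2), curl(A1g, A2g))  # the field strength does not
print(shift_in_A, shift_in_F)
```

The cancellation is exact on the lattice (up to floating-point rounding) because periodic forward differences along different axes commute, mirroring the commutativity of partial derivatives in the continuum.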
We can now proceed to the construction of an unconstrained total Hamiltonian à
la Dirac, imitating the procedure followed in the preceding section, where our starting

4 The current J μ should be regarded here as either a fixed external field, or built out of matter fields
which have vanishing Poisson brackets with the Aμ fields.

Lagrangian is now (15.45):



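$$\begin{aligned}
H &= \int (\Pi^\mu \dot{A}_\mu - \mathcal{L})\,d^3x \\
&= \int \left(\Pi^i \dot{A}_i - \tfrac{1}{2}F_{0i}^2 + \tfrac{1}{4}F_{jk}^2 + J^0 A_0 + J^i A_i\right) d^3x \\
&= \int \left(\Pi^i(\Pi^i + \partial_i A_0) - \tfrac{1}{2}(\Pi^i)^2 + \tfrac{1}{4}F_{jk}^2 + J^0 A_0 + J^i A_i\right) d^3x \\
&= \int \left(\tfrac{1}{2}(\vec{E}^2 + \vec{B}^2) - \vec{J}\cdot\vec{A} + A_0(J^0 - \vec{\nabla}\cdot\vec{E})\right) d^3x
\end{aligned} \tag{15.61}$$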

We have used electric/magnetic field notation E i = Πi , B i = ½ εijk Fjk in the final line.
Note that the time component of the four-vector potential A0 now plays the role of the
Lagrange multiplier term in the Dirac total Hamiltonian, as it multiplies the Gauss’s
Law first-class constraint:

$$H = H_T = \int (\mathcal{H}_{EM} + A_0(x)\chi(x))\,d^3x \tag{15.62}$$

$$\mathcal{H}_{EM} = \tfrac{1}{2}(\vec{E}^2 + \vec{B}^2) - \vec{J}\cdot\vec{A} \tag{15.63}$$
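For readers who want the intermediate algebra behind the last step of (15.61), the following worked lines may help. They assume only that surface terms vanish (for instance with periodic boundary conditions in a large box) and use the notation E^i = Π^i, B^i = ½ ε_{ijk} F_{jk} introduced in the text.

```latex
% Integration by parts moves the spatial derivative off A_0 (surface term dropped):
\int \Pi^i\,\partial_i A_0\,d^3x = -\int A_0\,\partial_i \Pi^i\,d^3x
                                 = -\int A_0\,\vec{\nabla}\cdot\vec{E}\,d^3x
% The quadratic terms combine as
\Pi^i\Pi^i - \tfrac{1}{2}(\Pi^i)^2 = \tfrac{1}{2}\vec{E}^2,
\qquad
\tfrac{1}{4}F_{jk}^2 = \tfrac{1}{4}\,\epsilon_{jki}\epsilon_{jkl}\,B^i B^l = \tfrac{1}{2}\vec{B}^2
% so that the terms proportional to A_0 collect into the Gauss's Law constraint:
J^0 A_0 - A_0\,\vec{\nabla}\cdot\vec{E} = A_0\,(J^0 - \vec{\nabla}\cdot\vec{E}) = A_0\,\chi
```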

The analogy of canonical electrodynamics to the mechanical example embodied in


the Lagrangian (15.11) should now be clear. The derivation of a fully constrained
Hamiltonian version of this theory in which all superfluous gauge degrees of freedom
have been removed proceeds along the same lines as for the mechanical examples
treated earlier: we must choose an appropriate set of gauge-fixing conditions which
select a unique representative from each set of gauge equivalent field configurations.
The canonical momenta Πi are gauge-invariant, so we are here concerned only with the
gauge freedom in the Ai fields, where (on a given time-slice) A′i (x) is gauge equivalent
to Ai (x) if there exists a function Λ(x) such that A′i (x) = Ai (x) + ∂i Λ(x). A common
gauge condition is that leading to Coulomb gauge, where the field is rendered transverse
(spatially divergence-free) at all times:

$$\psi \equiv \partial^i A_i(\vec{x}, t) = 0 = \vec{\nabla}\cdot\vec{A}(\vec{x}, t) \tag{15.64}$$

Once appropriate boundary conditions are imposed (periodic boundary conditions


with the system confined to a large spatial box, say), this condition evidently selects
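a unique representative on each gauge orbit, as

$$\vec{\nabla}\cdot(\vec{A} + \vec{\nabla}\Lambda) = \vec{\nabla}\cdot\vec{A} = 0 \;\Rightarrow\; \vec{\nabla}^2\Lambda(\vec{x}) = 0 \;\Rightarrow\; \Lambda(\vec{x}) = \text{constant} \tag{15.65}$$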

Alternatively, one could choose an axial gauge, where the gauge transformation is
chosen to remove (say) the third component of the field,

ψ = A3 (x, t) = 0 (15.66)
which bears an obvious similarity to the gauge choices we made earlier in our mechan-
ical examples (cf. (15.7)). Both of these conditions destroy the manifest Lorentz-
invariance of the theory, of course. The Coulomb gauge is at least marginally superior
in that it preserves at least the rotational symmetry of the theory (the O(3) subgroup
of the HLG), and we shall adopt it for the time being in our construction of a fully
constrained Hamiltonian theory.
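The uniqueness argument (15.65) for Coulomb gauge can be illustrated with a lattice regularization. The sketch below assumes a periodic cubic box with `N` sites per side and the nearest-neighbour lattice Laplacian, neither of which is specified in the text; `laplacian_eigenvalue` is a hypothetical helper. It enumerates the plane-wave eigenvalues and confirms that the only solution of ∇²Λ = 0 is the zero-momentum (constant) mode.

```python
import math
from itertools import product

N = 4  # sites per direction in a periodic cubic box (illustrative choice)

def laplacian_eigenvalue(k):
    """Eigenvalue of the nearest-neighbour lattice Laplacian on the plane wave
    exp(2*pi*i*k.x/N), with the lattice spacing set to 1."""
    return -sum(4.0 * math.sin(math.pi * ki / N) ** 2 for ki in k)

# Collect every lattice momentum on which the Laplacian vanishes.
zero_modes = [k for k in product(range(N), repeat=3)
              if abs(laplacian_eigenvalue(k)) < 1e-12]
print(zero_modes)
```

Only the momentum (0, 0, 0) survives, which is the lattice statement that a residual gauge function preserving the Coulomb condition must be constant in space.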
For Coulomb gauge, we may now write down, using (15.44), the functional integral
representation of the generating functional Z giving the vacuum persistence amplitude
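in the presence of the external current J μ (we henceforth set ħ = 1 as we return to
field theory proper):

$$Z_{EM} = \int \mathcal{D}E^i\,\mathcal{D}A_i\,\mathcal{D}A_0\;\delta(\psi)\,\det\{\chi, \psi\}\;e^{i\int (E^i \dot{A}_i - \mathcal{H}_{EM} - A_0\chi)\,d^4x} \tag{15.67}$$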

with the Hamiltonian energy density given in (15.63). The DFP determinant appearing
in this expression involves the Poisson bracket of the Gauss’s Law constraint (15.57)
with the Coulomb gauge constraint (15.64) (at equal times, so we suppress time)
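$$\left\{J^0(\vec{x}) - \frac{\partial}{\partial x^i}E^i(\vec{x}),\; \frac{\partial}{\partial y_j}A_j(\vec{y})\right\} = \frac{\partial}{\partial x^i}\frac{\partial}{\partial y_j}\{A_j(\vec{y}), E^i(\vec{x})\} = \vec{\nabla}^2\delta^3(\vec{x}-\vec{y}) \tag{15.68}$$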
so the corresponding determinant5

$$D_{FP} \equiv \det\{\chi(\vec{x}), \psi(\vec{y})\} = \det(\vec{\nabla}^2\delta^3(\vec{x}-\vec{y})) \tag{15.69}$$

is a field-independent constant, which can be given a well-defined value by a full
regularization of the theory (for example, by introducing IR and UV cutoffs via a
lattice), but is of no physical significance, as we recall from Chapter 10 that overall
multiplicative factors in the functional integral for the vacuum amplitude of a field
theory disappear once we compute the connected Green functions of the theory.
Discarding the determinant factor in (15.67), we see that the dependence of the
exponent on the momentum (electric) fields is quadratic, so these may be integrated
out in the usual fashion by completing the square (and again dropping irrelevant
overall factors), leaving a functional integral over the original vector potential field
variables (A0 , Ai ) = Aμ :

$$\begin{aligned}
Z_{EM} &= \int \mathcal{D}A_\mu\,\mathcal{D}E^i\;\delta(\vec{\nabla}\cdot\vec{A})\;e^{i\int (E^i(\partial_0 A_i - \partial_i A_0) - \frac{1}{2}(\vec{E}^2 + \vec{B}^2) - J^\mu A_\mu)\,d^4x} && (15.70)\\
&= \int \mathcal{D}A_\mu\;\delta(\vec{\nabla}\cdot\vec{A})\;e^{i\int (\frac{1}{2}F_{0i}^2 - \frac{1}{4}F_{jk}^2 - J^\mu A_\mu)\,d^4x} && (15.71)\\
&= \int \mathcal{D}A_\mu\;\delta(\vec{\nabla}\cdot\vec{A})\;e^{i\int (-\frac{1}{4}F_{\mu\nu}F^{\mu\nu} - J^\mu A_\mu)\,d^4x} && (15.72)\\
&= \int \mathcal{D}A_\mu\;\delta(\vec{\nabla}\cdot\vec{A})\;e^{i\int \mathcal{L}_{EM}\,d^4x} && (15.73)
\end{aligned}$$

5 The determinant factor det{χ, ψ} appearing in the functional integral (15.67) over spacetime fields is a
product of the factor DFP given here at each discrete time, once the theory is regularized, for example, on a
spacetime lattice: thus, for the present case of Coulomb gauge, $\det\{\chi, \psi\} = D_{FP}^{N_t}$, where Nt is the number
of points in the time direction. Similarly, the δ-function gauge-fixing constraint δ(ψ) implicitly involves a
product of δ-functions enforcing the constraint at each spacetime point: $\delta(\psi) = \prod_{\vec{x},t}\delta(\vec{\nabla}\cdot\vec{A}(\vec{x}, t))$.


Note that the original, manifestly local, gauge- and Poincaré-invariant action for our
abelian electrodynamics has re-emerged: the only fly in the ointment is the δ-function
enforcing the (non-Lorentz-invariant) Coulomb gauge restriction independently on
each time-slice. The loss of Lorentz symmetry is only apparent: indeed, we may already
anticipate that the theory must, despite appearances, preserve Lorentz symmetry,
as the Euler–Lagrange equations (15.48) of our starting Lagrangian are perfectly
covariant. We shall now demonstrate this highly desirable feature of the constrained
formalism by showing that, despite appearances, the functional integral (15.67) is
independent of the choice of gauge-fixing function ψ, leaving us free to replace the non-
covariant Coulomb (or axial) gauge choices by a perfectly Lorentz-covariant choice—
for example, “Landau (or Lorentz) gauge” ∂μ Aμ = 0.
Recalling (cf. (15.50)) the notation AΛμ ≡ Aμ + ∂μ Λ for the effect of a finite gauge
transformation on the gauge field Aμ , consider the functional Δcoul [A] of Aμ defined
implicitly by

$$\Delta_{coul}[A]\cdot\int \mathcal{D}\Lambda\;\delta(\vec{\nabla}\cdot\vec{A}^\Lambda(\vec{x}, t)) = 1 \tag{15.74}$$

We again remind the reader (see footnote 5) that the δ-function constraint in the
functional integral implies a product of δ-functions at each and every spacetime
point—a statement which can be given a precise meaning by regularizing the theory
on a spacetime lattice. To the extent that the gauge fixing imposed by ψ = 0 picks a
unique gauge field with ψ(AΛ ) = 0 on the orbit passing through an arbitrary field A,
the functional integral over Λ in (15.74) receives its entire contribution from exactly
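one gauge function. In particular, for fields A already on the gauge surface ∇ · A = 0,
this must occur at Λ = 0, and we have

$$\begin{aligned}
\Delta_{coul}[A]^{-1} &= \int \mathcal{D}\Lambda \prod_t \delta(\vec{\nabla}^2\Lambda(\vec{x}, t)) \\
&= \prod_t \int \mathcal{D}\Lambda(\vec{x}, t)\;\delta\!\left(\int M(\vec{x}, \vec{y})\Lambda(\vec{y}, t)\,d^3y\right)
\end{aligned} \tag{15.75}$$

$$M(\vec{x}, \vec{y}) \equiv \vec{\nabla}^2\delta^3(\vec{x}-\vec{y}) \tag{15.76}$$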

where we have indicated explicitly the time-discretization in (15.75). The functional


integral yields immediately (at each discrete time) the inverse determinant6 of the
operator M which we previously called DFP (cf. (15.69)), so we have that our
functional

6 This is clear if we discretize also the spatial dependence of the fields, and recall that for any matrix
Mij , $\int \prod_i d\Lambda_i\,\prod_i \delta(M_{ij}\Lambda_j) = (\det M)^{-1}$.

$$\Delta_{coul}[A] = D_{FP}^{N_t} = \det\{\chi, \psi\} \tag{15.77}$$

is just the DFP determinant appearing in the unconstrained functional integral


(15.67), for fields already in Coulomb gauge (which are, of course, just the fields
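appearing in said functional integral). Note that this functional is itself
gauge-invariant, as, for any gauge function Λ′(x),

$$\Delta_{coul}[A^{\Lambda'}]^{-1} = \int \mathcal{D}\Lambda\;\delta(\vec{\nabla}\cdot\vec{A}^{\Lambda+\Lambda'}(\vec{x}, t)) = \int \mathcal{D}\Lambda\;\delta(\vec{\nabla}\cdot\vec{A}^{\Lambda}(\vec{x}, t)) = \Delta_{coul}[A]^{-1}$$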

where we have used the shift invariance of the functional integral over Λ under Λ →
Λ − Λ′ (for Λ′(x) any fixed function on spacetime). The gauge-invariance is, of course,
a triviality in this particular case (abelian gauge theory in Coulomb gauge), as we
previously saw that the determinant is a field-independent constant. But this will
not be the case once we repeat the procedure for non-abelian theories, so we shall
proceed as though the DFP determinants we encounter are non-trivial functionals
of the gauge field. One may similarly define a covariant DFP functional associated
with the gauge choice ψ = ∂ μ Aμ (x) − f (x) = 0, where f (x) is for the time being a
perfectly arbitrary, but fixed, function of spacetime (thus, Landau gauge corresponds
to taking f (x) = 0 identically):

$$\Delta^f_{cov}[A]\cdot\int \mathcal{D}\Lambda\;\delta(\partial^\mu A^\Lambda_\mu - f) = 1 \tag{15.78}$$

with, as usual, AΛμ ≡ Aμ + ∂μ Λ. By exactly the same argument used above, involving
the shift invariance of the functional integral, we see that Δfcov [A] is a gauge-invariant
functional of A, for any choice of the function f . A less obvious property (but one which
will shortly become important) is that for gauge fields on the gauge surface ∂ μ Aμ (x) =
f (x), Δfcov [A]−1 = ∫ DΛ δ(□Λ), since ∂ μ AΛμ − f = □Λ there; i.e., the DFP determinant
loses its dependence on the arbitrary function f (as well as on the gauge fields themselves).
We are now in a position to demonstrate the claimed independence of the path
integral (15.67) from the specific gauge choice employed. We begin with the Coulomb
gauge version of the path integral, replacing det{χ, ψ} by the gauge-invariant func-
tional Δcoul [A], and then introduce a factor of unity via (15.78). We shall also slightly
generalize our earlier discussion by including in the action external sources Ki (x)
coupled to an arbitrary set of gauge-invariant operators Oi (x). The latter might, in
the case of QED, for example, be the gauge-invariant fields Fμν (x) which possess
non-vanishing vacuum to single-photon matrix elements and whose T-products can
therefore be related to S-matrix elements for multi-photon scattering via the LSZ
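formalism of Chapter 9. Our previous result (15.73) becomes

$$\begin{aligned}
Z_{EM}[K_i] &= \int \mathcal{D}A_\mu\;\delta(\vec{\nabla}\cdot\vec{A})\,\Delta_{coul}[A]\;e^{i\int (\mathcal{L}_{EM} + K_i O_i)\,d^4x} && (15.79)\\
&= \int \mathcal{D}\Lambda\,\mathcal{D}A_\mu\;\Delta^f_{cov}[A]\,\delta(\partial^\mu A^\Lambda_\mu - f)\,\delta(\vec{\nabla}\cdot\vec{A})\,\Delta_{coul}[A]\;e^{i\int (\mathcal{L}_{EM}(A) + K_i O_i(A))\,d^4x} \\
&= \int \mathcal{D}\Lambda\,\mathcal{D}A_\mu\;\Delta^f_{cov}[A^{-\Lambda}]\,\delta(\partial^\mu A_\mu - f)\,\delta(\vec{\nabla}\cdot\vec{A}^{-\Lambda})\,\Delta_{coul}[A^{-\Lambda}]\;e^{i\int (\mathcal{L}_{EM}(A^{-\Lambda}) + K_i O_i(A^{-\Lambda}))\,d^4x} \\
&= \int \mathcal{D}\Lambda\,\mathcal{D}A_\mu\;\Delta^f_{cov}[A]\,\delta(\partial^\mu A_\mu - f)\,\delta(\vec{\nabla}\cdot\vec{A}^{-\Lambda})\,\Delta_{coul}[A]\;e^{i\int (\mathcal{L}_{EM}(A) + K_i O_i(A))\,d^4x} \\
&= \int \mathcal{D}A_\mu\;\Delta^f_{cov}[A]\,\delta(\partial^\mu A_\mu - f)\;e^{i\int (\mathcal{L}_{EM}(A) + K_i O_i(A))\,d^4x} && (15.80)
\end{aligned}$$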

In going from the second to the third equations we have made the functional change
of variable Aμ → AΛ (there is no Jacobian as this amounts to an additive shift of
the integration variables); in the fourth equation we have used the gauge-invariance
of the functionals Δcoul , Δfcov and of the Lagrangian LEM and sourced fields Oi ; and
in the final line we have employed the definition (15.74) to remove all evidence of
the original Coulomb gauge-fixing. Our final result (15.80) is manifestly Lorentz-
covariant, as all spacetime indices are properly contracted. If we set f = 0 we recover
the functional integral for abelian gauge theory in Landau gauge. Note that, as we
pointed out earlier, despite appearances, Δfcov [A] in (15.80) is in fact independent of
f , and we may henceforth write it simply as Δcov [A]. As our starting point (15.79) for
ZEM is clearly independent of the choice of the arbitrary function f , we may multiply
it by an irrelevant constant factor obtained by a functional integral over all functions
f with a damped Gaussian factor (ξ a positive real number)

$$C \equiv \int \mathcal{D}f\;e^{-\frac{i}{2\xi}\int f(x)^2\,d^4x} \tag{15.81}$$

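obtaining

$$\begin{aligned}
Z_{EM}[K_i] &= \int \mathcal{D}f\,\mathcal{D}A_\mu\;\Delta_{cov}[A]\,\delta(\partial^\mu A_\mu - f)\;e^{-\frac{i}{2\xi}\int f(x)^2\,d^4x + i\int (\mathcal{L}_{EM} + K_i O_i)\,d^4x} \\
&= \int \mathcal{D}A_\mu\;\Delta_{cov}[A]\;e^{i\int (\mathcal{L}_{EM}(A) - \frac{1}{2\xi}(\partial^\mu A_\mu)^2 + K_i O_i)\,d^4x}
\end{aligned} \tag{15.82}$$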

The functional integral (15.82) defines the partition function (or vacuum persistence
amplitude) for abelian gauge theory in the so-called covariant ξ-gauges (first intro-
duced for non-abelian theories by ’t Hooft (T’Hooft, 1971)).
We reiterate that in the present case of abelian gauge theory, the DFP determinant
factor Δcov [A] is in fact a field-independent constant, and may be omitted completely:
we retain it here in anticipation of the fact that for non-abelian gauge theories,
it develops a non-trivial structure and must be kept in order to arrive at unitary
amplitudes. These gauges are extremely useful in performing perturbative calculations
in gauge theories (abelian or non-abelian): the disappearance of the arbitrary constant
ξ from all expressions for gauge-invariant quantities at the end of the calculation
provides a very useful check on the intermediate manipulations. That the Green
functions of the theory (vacuum expectation values of time-ordered products of the
Oi fields, obtained by functionally differentiating ZEM [Ki ] with respect to the source
functions Ki (x)) are ξ-independent is clear from the fact that our original expression

(15.79) for ZEM [Ki ] did not contain the parameter ξ at all. It is a straightforward
matter to extract the perturbative Feynman rules in the ξ-gauge for abelian quantum
electrodynamics from (15.82), but we shall defer this task to the more interesting
case of non-abelian gauge theory, where the full power of the deWitt–Fadeev–Popov
approach becomes manifest. The Feynman rules for abelian gauge theory are in any
event obtained trivially from the non-abelian ones by a simple reduction, so we lose no
information by proceeding directly, as we shall shortly do, to the case of non-abelian
gauge field theory.
We may promote our abelian gauge theory to a full-fledged quantum electrody-
namics (QED), in which the external source is now the quantized four-vector current
arising from Dirac fermions (e.g., the electron) of charge e and mass m,

J μ (x) = eψ̄(x)γ μ ψ(x) (15.83)

We must also include the usual kinetic term in the Lagrangian for the fermions, thereby
arriving at the Lagrangian for QED:

$$\mathcal{L}_{QED} = -\tfrac{1}{4}F_{\mu\nu}F^{\mu\nu} + \bar{\psi}(i\not{D} - m)\psi, \qquad D_\mu \equiv \partial_\mu + ieA_\mu \tag{15.84}$$

This Lagrangian is invariant under the local gauge transformations consisting of the
following joint transformations on the Aμ and ψ fields:

$$A_\mu(x) \to A_\mu(x) + \partial_\mu\Lambda(x) \tag{15.85}$$

$$\psi(x) \to e^{-ie\Lambda(x)}\psi(x) \tag{15.86}$$

$$\bar{\psi}(x) \to e^{+ie\Lambda(x)}\bar{\psi}(x) \tag{15.87}$$

The invariance of the Fμν F μν and mass mψ̄ψ terms under these transformations is
obvious. The covariant derivative Dμ preserves the form of the gauge transformation
of the charged field, as under (15.85–15.87),

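$$\begin{aligned}
D_\mu\psi(x) &\to (\partial_\mu + ieA_\mu(x) + ie\,\partial_\mu\Lambda(x))\,e^{-ie\Lambda(x)}\psi(x) \\
&= e^{-ie\Lambda(x)}(\partial_\mu + ieA_\mu(x))\psi(x) = e^{-ie\Lambda(x)}D_\mu\psi(x)
\end{aligned} \tag{15.88}$$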

from which the invariance of the kinetic fermion term ψ̄γ μ Dμ ψ follows immediately.
Note that if we take the commutator of two covariant derivatives acting on the fermion
field ψ, the derivative terms on ψ cancel, and we are left with the field tensor Fμν :

[Dμ , Dν ]ψ(x) = ie(∂μ Aν (x) − ∂ν Aμ (x))ψ(x) = ieFμν (x)ψ(x) (15.89)

As both ψ(x) and [Dμ , Dν ]ψ(x) must transform identically under (15.86), we see that
the commutator must be gauge-invariant—an observation that will simplify our search
below for a non-abelian generalization of the gauge-field kinetic term in (15.84).
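The covariance property (15.88) lends itself to a quick numerical check. The sketch below is a toy under stated assumptions: a single coordinate x stands in for spacetime, the field, potential, and gauge function are arbitrary illustrative choices, and derivatives are approximated by central differences (all function names are hypothetical). It compares how the covariant and the ordinary derivative behave under the joint transformation (15.85)-(15.86).

```python
import cmath
import math

e = 0.7    # coupling (arbitrary illustrative value)
h = 1e-5   # finite-difference step

# Arbitrary smooth test functions of one coordinate (hypothetical choices).
def psi(x):  return (1.0 + 0.5 * x) * cmath.exp(1.3j * x)
def A(x):    return math.sin(x) + 0.2 * x * x
def Lam(x):  return math.cos(2.0 * x)
def dLam(x): return -2.0 * math.sin(2.0 * x)   # exact derivative of Lam

# Gauge-transformed fields, following (15.85) and (15.86).
def psi_g(x): return cmath.exp(-1j * e * Lam(x)) * psi(x)
def A_g(x):   return A(x) + dLam(x)

def deriv(f, x):
    """Central-difference approximation to df/dx."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def cov_deriv(field, potential, x):
    """D = d/dx + i e A, the one-dimensional analog of (15.84)."""
    return deriv(field, x) + 1j * e * potential(x) * field(x)

x0 = 0.4
phase = cmath.exp(-1j * e * Lam(x0))

# D psi picks up only the local phase; the ordinary derivative does not.
covariant_mismatch = abs(cov_deriv(psi_g, A_g, x0) - phase * cov_deriv(psi, A, x0))
ordinary_mismatch = abs(deriv(psi_g, x0) - phase * deriv(psi, x0))
print(covariant_mismatch, ordinary_mismatch)
```

The covariant mismatch vanishes up to discretization error, while the ordinary derivative fails to transform covariantly by an amount e |∂Λ| |ψ|, which is the term the shift of the gauge field is designed to cancel.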
The transformations embodied in (15.85–15.87) form an abelian (commutative)
group, as successive transformations with gauge functions Λ1 (x), Λ2 (x), etc., per-
formed in any order lead to the same final result. The global gauge symmetry obtained
by restricting the gauge functions Λ(x) to spacetime constants clearly amounts to a
U(1) phase transformation on the charged fermion field ψ, so we refer to a theory

characterized by invariance under the transformations (15.85–15.87) as a “U(1) gauge


theory”.
The reader may be puzzled by the fact that in the construction of S-matrix elements
involving (via the LSZ formula) T-products of the gauge and fermion fields associated
with the desired external photons and electrons/positrons, we must introduce sources
for these manifestly non-gauge-invariant fields into the path integral. For example, in
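Coulomb gauge, we clearly need to compute the functional

$$Z_{QED}[\vec{j}, \eta, \bar{\eta}] = \int \mathcal{D}A_\mu\,\mathcal{D}\psi\,\mathcal{D}\bar{\psi}\;\delta(\vec{\nabla}\cdot\vec{A})\;e^{i\int (\mathcal{L}_{QED} + \vec{j}\cdot\vec{A} + \bar{\eta}\psi + \bar{\psi}\eta)\,d^4x} \tag{15.90}$$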

Note that in Coulomb gauge we need only include a source j for the spatial part A 
of the four-vector potential, as this is the part of the gauge field that interpolates for
the asymptotic (transverse) photon states (cf. the discussion at the end of Section
7.5). The source terms in (15.90) are clearly not locally gauge-invariant, so the
reader may well wonder how we can manage the conversion of this functional into
a manifestly Lorentz-covariant form along the lines of the maneuvers leading to the
covariant functional (15.82) above, which required that the exponent in the path
integral be exactly invariant under an arbitrary local gauge transformation of the
fields. In fact, the Feynman Green functions (T-products of gauge and Dirac fields)
are not gauge-invariant as such, nor do they need to be. Instead, we shall see that the
physical information they contain, in the form of on-mass-shell S-matrix elements,
is preserved under local gauge transformations. For example, we recall that the
LSZ formula requires that we subject the generating functional ZQED [j, η, η̄] to the
operation

$$\int d^4x\;e^{ik\cdot x}\,\Box_x\,\vec{\epsilon}^{\,*}(k, \lambda)\cdot\frac{\delta}{\delta\vec{j}(x)}\,Z_{QED} \tag{15.91}$$

where k is an on-mass-shell four-vector for a photon, in order to extract the S-matrix


amplitude for a scattering in which a photon with momentum k and polarization λ
appears in the final state. In going from the third to the fourth line of (15.80), the
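gauge transformation induces a change in the source terms in the exponent,

$$\vec{j}\cdot\vec{A} + \bar{\eta}\psi + \bar{\psi}\eta \;\to\; \vec{j}\cdot(\vec{A} - \vec{\nabla}\Lambda) + e^{ie\Lambda}\bar{\eta}\psi + e^{-ie\Lambda}\bar{\psi}\eta \tag{15.92}$$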

and the shift in (15.91) due to Λ is seen to be proportional to



$$\int d^4x\;e^{ik\cdot x}\,\Box_x\,\vec{\epsilon}^{\,*}(k, \lambda)\cdot\vec{\nabla}\Lambda(x)\cdots \;\propto\; \vec{k}\cdot\vec{\epsilon}^{\,*}(k, \lambda) = 0 \tag{15.93}$$

where we have integrated by parts to transfer the spatial gradient to the complex
exponential, and used the fact that the photon polarization vectors are transverse,
k · ε(k, λ) = 0. One can similarly establish that the Λ-dependence of the fermionic source
terms visible in (15.92) does not affect the result once the on-mass-shell projection is
made for initial- or final-state fermions as required in the LSZ formula (cf. (9.206);
also Problem 1).

15.4 Non-abelian gauge theory: construction and functional integral formulation
The generalization of the idea of local gauge symmetry to a set of non-commuting
transformations, in which the phase transformations on the fermionic fields of the
theory form a non-abelian group, goes back to a seminal paper of Yang and Mills
(Yang and Mills, 1954), in which the attempt is made to convert the global isotopic
spin symmetry of the meson field theories used to describe the strong interactions
in the mid-1950s (cf. Example 5 in Section 12.4) to a local gauge symmetry. The
Lie group considered by Yang and Mills was just SU(2), but here we shall consider
a completely general gauge group of dimension ng , specified by a set of generators
tα , α = 1, 2, . . . , ng , and Lie algebra

[tα , tβ ] = ifαβγ tγ (15.94)

The matrix identity [tα , [tβ , tγ ]] + [tβ , [tγ , tα ]] + [tγ , [tα , tβ ]] = 0 then implies the Jacobi
identity constraint on the structure constants fαβγ of the group

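$$f_{\alpha\delta\epsilon}f_{\beta\gamma\delta} + f_{\gamma\delta\epsilon}f_{\alpha\beta\delta} + f_{\beta\delta\epsilon}f_{\gamma\alpha\delta} = 0 \tag{15.95}$$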
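As a concrete instance of the Jacobi identity (15.95), one can take the simplest non-abelian case. The sketch below assumes the group SU(2) with generators normalized so that the structure constants are the Levi-Civita symbol (an illustrative choice, not the general group of the text; `eps` and `jacobi_sum` are hypothetical helper names) and checks the identity for every combination of free indices.

```python
from itertools import product

def eps(a, b, c):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (a - b) * (b - c) * (c - a) // 2

# Structure constants of SU(2) in the assumed normalization: f_{abc} = eps_{abc}.
f = [[[eps(a, b, c) for c in range(3)] for b in range(3)] for a in range(3)]

def jacobi_sum(a, b, c, e):
    """Left-hand side of (15.95) with the repeated index d summed over."""
    return sum(f[a][d][e] * f[b][c][d]
               + f[c][d][e] * f[a][b][d]
               + f[b][d][e] * f[c][a][d]
               for d in range(3))

# The identity must hold for all 3**4 choices of the free indices.
max_violation = max(abs(jacobi_sum(*idx)) for idx in product(range(3), repeat=4))
print(max_violation)
```

Because the structure constants here are integers, the check is exact rather than approximate.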

By choosing appropriate linear combinations of the generators, we can also arrange


for the structure constants to be totally antisymmetric: i.e., to change sign under
interchange of any two indices (the antisymmetry under exchange of the first two
indices is, of course, guaranteed from (15.94)). We shall take the generator matrices
to act in the fundamental representation of the group, of dimension N , and assume
that the fermions ψn , n = 1, 2, . . . N of the theory also occupy this representation.
Thus, we are considering a theory in which a global internal symmetry of the type
(12.126) of Section 12.4 acting on a multiplet of matter fields ψn (x) is extended to a
local symmetry, with the parameters ωα of the group element replaced by arbitrary
functions Λα (x) of spacetime:

$$\psi_n(x) \to \psi'_n(x) \equiv U_{nm}(x)\psi_m(x), \qquad U \equiv \exp(-ig\Lambda_\alpha(x)t_\alpha) \tag{15.96}$$

For the gauge theories of the standard model, we are dealing with unitary groups:
the generator matrices tα are hermitian, the gauge functions Λα are real, and the finite
group transformation matrices U are therefore unitary. In analogy to the abelian
gauge transformation (15.86), it is conventional to include a factor of the gauge
coupling constant g (analogous to the electric charge e in the QED case) in the
gauge parameters defining U . The dynamical (as opposed to “gauge-kinematical”)
role of the gauge coupling constant will become clear shortly when we construct the
full Lagrangian of the theory. The non-abelian groups associated with the strong and
weak interactions are in addition “special” unitary, satisfying the additional constraint
det(U ) = 1 (corresponding to the generators tα being traceless, as we see immediately
using the identity ln det(U ) = Tr ln (U )). For the purposes of the present discussion, we
may as well restrict ourselves to the special unitary groups SU(N ) in which the matter
fields ψn (x) fill out the fundamental representation of the group. The dimension of
SU(N ) (i.e., the number of linearly independent traceless hermitian N × N generator
matrices tα ) is ng = N 2 − 1. Thus, for the gauge group SU(3), the gauge index α runs

over the values 1,2,. . . ,8. In addition to the fundamental representation of dimension
N , the adjoint representation, of dimension ng , will play a central role in the following.
We remind the reader that a multiplet of real fields Vα (x), α = 1, 2, .., ng transforms
according to the adjoint representation of SU(N ) if, under the gauge transformation
Unm (x) in (15.96),

$$t_\alpha V_\alpha(x) \to U(x)\,t_\alpha V_\alpha(x)\,U^\dagger(x) \equiv t_\alpha V'_\alpha(x) \tag{15.97}$$

Note that the similarity transformation in (15.97) preserves the traceless, hermitian
character of the tα Vα matrix (if Vα are real fields, as we assume), so the fact that
the generators tα of SU(N ) are a complete basis for all N × N traceless hermitian
matrices ensures that the Vα are well-defined by the above procedure.
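The claim that the similarity transformation (15.97) keeps the matrix traceless and hermitian, so that real components V′α can be read off again, can be illustrated for SU(2). The sketch below makes assumptions beyond the text: generators tα = σα/2 built from the Pauli matrices (so Tr(tα tβ) = δαβ/2, a normalization chosen only for this example), an arbitrary group element, and arbitrary real components; all helper names are hypothetical.

```python
import math

# Pauli matrices; the generators are taken as t_a = sigma_a / 2 (assumed normalization).
SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
T = [[[0.5 * SIGMA[a][i][j] for j in range(2)] for i in range(2)] for a in range(3)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def dagger(X):
    return [[X[j][i].conjugate() for j in range(2)] for i in range(2)]

def trace(X):
    return X[0][0] + X[1][1]

def su2_element(theta, n):
    """exp(-i theta n.sigma / 2) for a unit vector n, in closed form."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[(c if i == j else 0.0) - 1j * s * sum(n[a] * SIGMA[a][i][j] for a in range(3))
             for j in range(2)] for i in range(2)]

def pack(V):
    """Build the matrix t_a V_a from components V_a."""
    return [[sum(V[a] * T[a][i][j] for a in range(3)) for j in range(2)] for i in range(2)]

def unpack(M):
    """Recover components via V_a = 2 Tr(t_a M), valid for Tr(t_a t_b) = delta_ab / 2."""
    return [2 * trace(matmul(T[a], M)) for a in range(3)]

V = [0.3, -1.2, 0.8]                     # arbitrary real adjoint components
U = su2_element(1.1, [1/3, 2/3, 2/3])    # arbitrary SU(2) element (unit axis)
M_new = matmul(matmul(U, pack(V)), dagger(U))
V_new = unpack(M_new)

trace_new = abs(trace(M_new))
herm_defect = max(abs(M_new[i][j] - M_new[j][i].conjugate()) for i in range(2) for j in range(2))
imag_part = max(abs(v.imag) for v in V_new)
norm_new = sum(v.real ** 2 for v in V_new)
print(trace_new, herm_defect, imag_part, norm_new)
```

For SU(2) the transformed components are simply a rotated copy of the original three-vector, which is why the squared length is preserved in addition to reality.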
The transformations (15.96) are the field-theoretic analogs of the non-abelian gauge
transformations (15.10) leaving invariant the Lagrangian (15.8) in the mechanical
example of Section 15.1. In that mechanical model, the coordinate vector q(t) of
the point particle moves along a trajectory in three-dimensional space, but only
the radial coordinate r(t) ≡ √(q(t) · q(t)) possesses physical significance: the individual
Cartesian coordinates can “wobble” furiously, as though the entire system is being
viewed from the standpoint of an inebriated experimentalist shaking (rotationally) the
coordinate axes in a random fashion. Exactly the same arbitrariness attaches to the
physical interpretation of the internal symmetry axes in the case of a gauge field
theory.
To take a concrete example, consider the role of the color quantum number
in quantum chromodynamics (QCD)—the local field theory describing the strong
interaction sector of the Standard Model. QCD is a gauge theory exhibiting an exact
invariance under local gauge transformations of the form (15.96), where independent
unitary SU(3) rotations of the three quark fields ψn (x) at arbitrary spacetime points
leave the physics unchanged. If we label the three quarks (fancifully and, of course,
arbitrarily) as “red”, “green”, and “blue”, we see that the attachment of any particular
color label to any particular quark at any given time is a completely arbitrary choice:
the color “axes” may be unitarily rotated in a completely random way during the
dynamical evolution of the system without altering any physical observable. Indeed,
the physical observables—in a gauge field theory, those associated with local or almost
local operators (in the language of Chapter 9)—are precisely those which (in analogy
to mechanical quantities which depend only on the radial coordinate in the “turntable
model” of Section 15.1) are gauge-invariant: that is, they are unchanged under the
transformations (15.96). For the gauge group SU(N ), such gauge-invariant observables,
built from fermionic fields in the fundamental representation (the quark fields of QCD,
for example) include composite operators such as

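$$S(x) \equiv \bar{\psi}_n(x)\psi_n(x) \tag{15.98}$$

$$J^\mu(x) \equiv \bar{\psi}_n(x)\gamma^\mu\psi_n(x) \tag{15.99}$$

$$N(x) \equiv \epsilon_{n_1 n_2 \ldots n_N}\,\psi_{n_1}(x)\psi_{n_2}(x)\cdots\psi_{n_N}(x) \tag{15.100}$$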

to present just a few examples. These constructs are easily seen to be invariant under
ψ(x) → U (x)ψ(x), ψ̄(x) → ψ̄(x)U † (x) for U ∈ SU (N ): we need only take into account

the fact that the action of the gauge matrices U (x) leaves the implicit Dirac indices
in the ψn fields unaltered (or in other words, the U matrices commute with the
γ matrices implementing the Dirac algebra for our spin-½ fields). Local fields such
as S(x), J μ (x), N (x), and so on, are said to be “colorless” or “color neutral”, and
represent the only local observables with unambiguous physical content in a local
gauge field theory. From the axiomatic Wightman point of view discussed in Section
9.2, the Wightman functions (vacuum expectation values of products) of such gauge-
invariant local fields contain the entire physical content of the theory: the vacuum
is cyclic with respect to the algebra generated by all local gauge-invariant fields (cf.
Section 9.2, Axiom IId). In particular, the complete S-matrix of a gauge theory like
QCD with an exact (unbroken) non-abelian gauge symmetry is determined in principle
from a knowledge of such functions, as the asymptotic Fock space of physical states
consists entirely—as we shall see in Chapter 19—of colorless multi-particle states.
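The invariance of the composite operators (15.98)-(15.100) under ψ → Uψ, ψ̄ → ψ̄U† rests on two elementary identities, written out below as a short worked step. Nothing beyond unitarity and the unit determinant of U is assumed.

```latex
% Bilinears: unitarity of U(x) removes the gauge matrices (the same step holds
% with \gamma^\mu inserted, since U commutes with the Dirac matrices)
\bar{\psi}_n\psi_n \;\to\; \bar{\psi}_m (U^\dagger)_{mn} U_{nk}\,\psi_k
                 = \bar{\psi}_m\,\delta_{mk}\,\psi_k = \bar{\psi}_m\psi_m
% Totally antisymmetric product: the epsilon tensor converts N factors of U into det U
\epsilon_{n_1 n_2 \ldots n_N}\,U_{n_1 m_1}U_{n_2 m_2}\cdots U_{n_N m_N}
   = \det(U)\,\epsilon_{m_1 m_2 \ldots m_N}
% so that for U in SU(N), where det U = 1,
N(x) \;\to\; \det(U(x))\,N(x) = N(x)
```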
Just as in the abelian case, the construction of a gauge-covariant derivative for
fermionic matter fields transforming non-trivially under the gauge group requires the
existence of vector fields Aαμ (one for each independent gauge transformation). Such
fields are needed to absorb the term involving a spacetime-derivative of the local gauge
functions Λα (x) in the kinetic part of the matter Lagrangian. It is convenient to “pack”
these fields into the adjoint matrix (dimension N × N )

Aμ ≡ tα Aαμ (15.101)

We then require, from (15.97), that under global (i.e., spacetime-independent) gauge
transformations,

Aμ → U Aμ U † , U ∈ SU (N ) (15.102)

and can now build a covariant derivative in analogy to (15.84)

Dμ = ∂μ + igtα Aαμ = ∂μ + igAμ (15.103)

and demand that Dμ ψ transform identically to ψ for any set of matter fields in the
fundamental representation:

ψ(x) → U (x)ψ(x) ⇒ Dμ ψ(x) → U (x)Dμ ψ(x) (15.104)

The inclusion of an inhomogeneous term in the transformation rule for the gauge field
Aμ under local gauge transformations,

$$A_\mu(x) \to A^U_\mu(x) = U(x)A_\mu(x)U^\dagger(x) + \frac{i}{g}(\partial_\mu U(x))U^\dagger(x) \tag{15.105}$$

is easily seen to do the trick:

Dμ ψ(x) → (∂μ + igU (x)Aμ (x)U † (x) − (∂μ U (x))U † (x))U (x)ψ(x)
= (∂μ U (x))ψ(x) + U (x)∂μ ψ(x) + igU (x)Aμ (x)ψ(x) − (∂μ U (x))ψ(x)
= U (x)(∂μ + igAμ (x))ψ(x) = U (x)Dμ ψ(x) (15.106)

thereby guaranteeing the local gauge-invariance of the fermionic part of the Lagrangian
(as ψ̄ → ψ̄(x)U † (x) under a local gauge transformation)

$$\mathcal{L}_{ferm} = \bar{\psi}(i\not{D} - m)\psi, \qquad D_\mu \equiv \partial_\mu + igA_\mu \tag{15.107}$$

Note that the gauge invariance requires all members of the fermion multiplet to have
the same mass: if m is a mass matrix, then U † mU = m for all U ∈ SU (N ) implies m
a multiple of the identity (by Schur’s lemma).
A gauge-covariantly transforming field tensor suitable for constructing a kinetic
Lagrangian for the adjoint gauge fields Aαμ is constructed along exactly the same
lines as (15.89) for the abelian theory. We note that derivatives of the matter field
cancel in the commutator [Dμ , Dν ]:

[Dμ , Dν ]ψ = ig(∂μ Aν − ∂ν Aμ + ig[Aμ , Aν ])ψ ≡ igFμν ψ (15.108)

As the covariant derivatives (or products thereof) preserve the transformation property
(15.96) of ψ(x), under ψ(x) → U (x)ψ(x),

Fμν (x)ψ(x) → U (x)Fμν (x)ψ(x) ⇒ Fμν (x) → U (x)Fμν (x)U † (x) (15.109)

so the matrix Fμν (x) = tα Fαμν transforms exactly as required for the adjoint rep-
resentation (15.97), and is built from ng = N 2 − 1 antisymmetric tensor fields Fαμν
related to the underlying vector fields Aαμ by (15.108):

Fμν = ∂μ tα Aαν − ∂ν tα Aαμ + ig[tβ Aβμ , tγ Aγν ]


= tα (∂μ Aαν − ∂ν Aαμ − gfαβγ Aβμ Aγν )
⇒ Fαμν = ∂μ Aαν − ∂ν Aαμ − gfαβγ Aβμ Aγν (15.110)
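The passage from the matrix commutator to the structure-constant form in (15.110) can be verified numerically for a specific group. The sketch below assumes SU(2) with generators tα = σα/2, for which fαβγ is the Levi-Civita symbol, and uses arbitrary illustrative values for the coupling and for the gauge-field components at a single point (the names `A_mu`, `A_nu`, `pack`, `eps` are hypothetical). It checks that ig[tβ Aβμ, tγ Aγν] equals −g fαβγ Aβμ Aγν tα.

```python
# Pauli matrices; generators t_a = sigma_a / 2, so that [t_b, t_c] = i eps_{bca} t_a.
SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def eps(a, b, c):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (a - b) * (b - c) * (c - a) // 2

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def pack(V):
    """Build the matrix t_a V_a from components V_a."""
    return [[sum(0.5 * V[a] * SIGMA[a][i][j] for a in range(3)) for j in range(2)]
            for i in range(2)]

g = 0.9                    # coupling (arbitrary illustrative value)
A_mu = [0.4, -0.7, 1.1]    # components A_{beta mu} at one point (hypothetical)
A_nu = [-0.2, 0.5, 0.3]    # components A_{gamma nu} at the same point (hypothetical)

# Left-hand side: i g [A_mu, A_nu] as a 2x2 matrix.
X, Y = pack(A_mu), pack(A_nu)
XY, YX = matmul(X, Y), matmul(Y, X)
lhs = [[1j * g * (XY[i][j] - YX[i][j]) for j in range(2)] for i in range(2)]

# Right-hand side: t_a times the component -g f_{abc} A_{b mu} A_{c nu}.
coeffs = [-g * sum(eps(a, b, c) * A_mu[b] * A_nu[c] for b in range(3) for c in range(3))
          for a in range(3)]
rhs = pack(coeffs)

max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(coeffs, max_diff)
```

The nonvanishing coefficients are the origin of the cubic and quartic self-interactions of the gauge field; for an abelian group the commutator, and hence this term, is absent.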

It is now a trivial matter to construct the non-abelian version of the kinetic gauge
Lagrangian −¼ Fμν F μν in the abelian case: we simply take

$$\mathcal{L}_{gauge} = -\tfrac{1}{4}\mathrm{Tr}(F_{\mu\nu}F^{\mu\nu}) = -\tfrac{1}{4}F_{\alpha\mu\nu}F_\alpha^{\ \mu\nu} \tag{15.111}$$

where it is conventional to normalize the group generators by Tr(tα tβ ) = δαβ . The
invariance of Lgauge under local gauge transformations is now obvious from the
transformation property (15.109) of the non-abelian field tensor. Including the matter
fields, we have arrived at the Yang–Mills Lagrangian

$$\mathcal{L}_{YM} = \mathcal{L}_{gauge} + \mathcal{L}_{ferm} = -\tfrac{1}{4}\mathrm{Tr}(F_{\mu\nu}F^{\mu\nu}) + \sum_a \bar{\psi}_a(i\not{D} - m_a)\psi_a \tag{15.112}$$

where we have made the obvious generalization of allowing for several fermionic
multiplets ψa , of different mass, all transforming according to the fundamental repre-
sentation of the gauge group SU(N ).
We note at this point the characteristic (and remarkable) feature of non-abelian
gauge theories, which dramatically sets them apart from their abelian cousins such
as quantum electrodynamics: even in the absence of matter (scalar or Dirac fields),
the gauge fields themselves form a highly non-trivial interacting field theory, with the

gauge Lagrangian Lgauge containing, in addition to quadratic kinetic terms, interaction


terms that are cubic and quartic in the gauge fields. It is now generally accepted that
the entire vast range of strong interaction phenomenology is a manifestation of an
underlying local physics described by precisely the Lagrangian (15.112), where the
gauge (or “color”) group is SU(3), and the observed hadrons (strongly interacting
particles) are colorless bound states of the eight gauge “gluon fields” Aαμ , α = 1, 2, .., 8
and fermionic “quark field” multiplets ψa , a = 1, 2, . . . 6 (corresponding to the up,
down, strange, charmed, bottom, and top quarks). This Lagrangian defines the theory
we now call quantum chromodynamics (QCD), in analogy to quantum electrodynamics
(QED). Note that the Lagrangian (15.84) is formally identical to (15.112): there, however,
the gauge group U(1) is abelian, with vanishing structure constants fαβγ , and the
group multiplets are simply one-dimensional, with a single gauge field (the photon)
and a single Dirac field associated with every charged spin-½ particle (electron, muon,
quark, etc.).
We have given the explicit form of the local non-abelian gauge transformations for
finite group elements: in other words, the gauge functions Λα (x) in (15.96) are finite
spacetime functions. For transformations near to the identity (Λα → λα infinitesimal),
simple algebra (Problem 2) gives the infinitesimal version of (15.96, 15.105, 15.109):

ψ(x) → (1 − igtα λα (x))ψ(x), ψ̄(x) → ψ̄(x)(1 + igtα λα (x)) (15.113)


Aαμ (x) → Aαμ (x) + ∂μ λα (x) + gfαβγ λβ (x)Aγμ (x) (15.114)
Fαμν (x) → Fαμν (x) + gfαβγ λβ (x)Fγμν (x) (15.115)

The analysis of the dynamics of the classical system defined by the Lagrangian
(15.112) as a constrained Hamiltonian system proceeds in exact analogy to the abelian
case: as there, we find that the fields Aα0 have vanishing conjugate momenta, since

\[
\frac{\partial\mathcal{L}_{\rm YM}}{\partial\dot{A}_{\alpha\mu}} = -F_\alpha^{0\mu} \tag{15.116}
\]

As Fα00 = 0, we have the primary constraints (one for each generator of the group)

Πα0 = −Fα00 = 0 (15.117)

while the equations of motion for the Aα0 amount to secondary constraints (which
guarantee the preservation in time of the primary constraints) which are just the
non-abelian version of Gauss’s Law (see Problem 3):

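\[
0 = \dot\Pi_{\alpha 0} = \frac{\partial}{\partial t}\frac{\partial\mathcal{L}_{\rm YM}}{\partial\dot{A}_{\alpha 0}} = -\partial_i\frac{\partial\mathcal{L}_{\rm YM}}{\partial(\partial_i A_{\alpha 0})} + \frac{\partial\mathcal{L}_{\rm YM}}{\partial A_{\alpha 0}} = D_i E_\alpha^i - J_\alpha^0, \qquad J_\alpha^\mu = g\sum_a \bar\psi_a t_\alpha\gamma^\mu\psi_a \tag{15.118}
\]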
The non-abelian “electric fields” Eαi are in the adjoint representation with the adjoint
covariant derivative D appearing in (15.118) defined as

Di Eαi ≡ ∂i Eαi − gfαβγ Aβi Eγi (15.119)


As in the abelian case, the Eαi are the conjugate momentum fields for the Aαi fields

Πiα = −Fα0i = ∂0 Aαi − ∂i Aα0 − gfαβγ Aβ0 Aγi ≡ Eαi , i = 1, 2, 3 (15.120)

satisfying the classical Poisson bracket relations (on a given time-slice)

{Aαi (x), Πjβ (y )} = δαβ δij δ 3 (x − y ) (15.121)

The secondary constraints (non-abelian Gauss’s Law) of the theory are

χα (x) ≡ Jα0 − Di Eαi = Jα0 − Di Πiα = 0 (15.122)

Note that these constraints generate via Poisson brackets, as in the abelian case, and
as in the mechanical examples of Sections 15.1 and 15.2, the infinitesimal local gauge
transformations (15.113–15.115): for example, for the spatial gauge field (taking the
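Poisson bracket at equal times, and suppressing the time coordinate)

\[
\begin{aligned}
\Big\{A^i_\alpha(\vec{y}), \int d^3x\,\lambda_\beta(\vec{x})\chi_\beta(\vec{x})\Big\} &= -\int d^3x\,\lambda_\beta(\vec{x})\big\{A^i_\alpha(\vec{y}),\ \partial_j E^j_\beta(\vec{x}) - g f_{\beta\gamma\delta}A_{\gamma j}(\vec{x})E^j_\delta(\vec{x})\big\} \\
&= \partial^i\lambda_\alpha(\vec{y}) + g f_{\alpha\beta\gamma}\lambda_\beta(\vec{y})A^i_\gamma(\vec{y})
\end{aligned}
\tag{15.123}
\]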

in agreement with (15.114). It follows that the Poisson bracket algebra of the
constraints among themselves must imitate the Lie algebra of the underlying local gauge
group,7

{χα (x), χβ (y )} = igfαβγ δ 3 (x − y )χγ (x) (15.124)

and therefore that the set of constraints (15.122) are indeed first-class as their Poisson
brackets form a closed algebra. The reader will note the analogy to the constraint
algebra in the mechanical non-abelian example (with gauge group O(3)) studied
earlier, (15.30). Now that the primary and secondary constraints have been identified,
we can proceed to the construction of the total Hamiltonian, following steps analogous
to those leading to (15.61, 15.63). One obtains (see Problem 4), ignoring for the time
being the free fermion kinetic parts (thus, we consider only the parts of the action
involving the gauge field),

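\[
H = H_T = \int \big(\mathcal{H}_{\rm YM} + A_{\alpha 0}(x)\chi_\alpha(x)\big)\,d^3x \tag{15.125}
\]

\[
\mathcal{H}_{\rm YM} = \frac{1}{2}(\vec{E}_\alpha^2 + \vec{B}_\alpha^2) - \vec{J}_\alpha\cdot\vec{A}_\alpha, \qquad B^i_\alpha = \frac{1}{2}\epsilon^{ijk}F_{\alpha jk} \tag{15.126}
\]
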
In order to proceed to a canonical quantization of this theory, we have, as usual, two
choices: either (a) the gauge freedom is eliminated ab initio by imposing a physical
gauge choice—one which reduces the number of degrees of freedom in the theory
in accordance with the freedom implicit in the gauge symmetry to the point where

7 Classical Poisson brackets can be defined also for the fermionic fields, with appropriate attention to
signs: the charge densities Jα 0 have vanishing Poisson brackets with the gauge fields and momenta, and
the result is that they satisfy separately the algebra (15.124): see (12.309) for the corresponding quantum
commutator result.
the remaining coordinates have well-defined conjugate momenta, and a Hamiltonian
can be obtained by the standard Legendre transform procedure—or (b) we treat
the system à la Dirac and Faddeev, with a total Hamiltonian written as in (15.126)
in terms of unconstrained coordinates (i.e., fields) and the constraints and gauge
conditions inserted via Lagrange multipliers, with the appropriate DFP determinant
in the functional integral. We shall follow the latter approach.
In either case, we must make a gauge choice. In principle, one could impose a gauge
condition either on the matter fields or the gauge fields of the theory. For example,
if the gauge group is SU(2) and a doublet of complex scalar fields φn (x), n = 1, 2
is present transforming under the fundamental representation of SU(2), one might
fix the gauge by setting φ1 (x) = φR (x), φ2 (x) = 0, with φR (x) real, which uniquely
determines a local SU(2) transformation which rotates the field doublet into the gauge-
fixed form. Such a “unitary” gauge, which is the natural generalization to field theory
of the gauges considered earlier in our mechanical examples, is actually quite useful
in exposing the physical degrees of freedom of the theory, especially when the gauge
symmetry is spontaneously broken, as we shall see in Section 15.6. However, unitary
gauges turn out to be highly inconvenient for perturbative calculations, as the Green
functions of the theory in such a gauge are in general (even after renormalization of
the Lagrangian parameters) ultraviolet-divergent, with only the on-shell limit (i.e., the
S-matrix) possessing a sensible limit when the UV cutoffs in the theory are removed.
These problems are avoided by choosing a gauge in terms of a condition on the
gauge vector fields of the theory. For example, as for QED, one may employ the gauge
freedom to restrict the spatial components of the gauge fields Aαi by the axial gauge
condition

\[
\hat{n}\cdot\vec{A}_\alpha = 0, \qquad \hat{n}\cdot\hat{n} = -1 \tag{15.127}
\]

where it is conventional to simply choose the space-like unit vector n̂i = δi3 , so the
gauge freedom is employed to move an arbitrary gauge field along a gauge orbit to the
unique point where Aα3 (x) = 0 (for all x) (see Problem 5). In this case, it is easy to
see that the DFP determinant Δaxial [A] is in fact a field-independent constant (as in
the abelian case), and may therefore be omitted from the functional integral. Despite
this simplifying feature, axial gauge is not a very popular choice, as it clearly destroys
manifest Lorentz-invariance: not just boosts, but also (except for rotations around the
n̂-axis) rotational symmetry.
A less objectionable choice of physical gauge is supplied by Coulomb gauge, which
at least retains manifest rotational symmetry, and is also physically desirable in
situations where a non-relativistic limit plays an important physical role (as in bound-
state problems with heavy quarks, for example):

\[
\psi_\alpha \equiv \vec\nabla\cdot\vec{A}_\alpha(x) = 0 \tag{15.128}
\]

with apologies for the need to temporarily appropriate the ψ symbol from the fermionic
fields of the theory, in order to maintain conformity with our previous notation
for the gauge fixing condition. In fact, the gauge choice (15.128) is faulty in one
important respect: it fails to satisfy the important requirement that the gauge orbit,
through an arbitrary configuration, intersects the gauge-fixed surface once, and only
once. Instead, as pointed out by Gribov (Gribov, 1978), gauge orbits passing through
“large fields” can intersect the gauge-fixed surface multiple times (the famous “Gribov
copies”). In particular (cf. the discussion following (15.28, 15.29)), infinitesimal gauge
transformations should definitely move us from a point on the gauge-fixed surface to a
point off the surface. Instead, we find that for fields $\vec{A}_\alpha$ “sufficiently large” (in a sense
soon to become clear), there exist infinitesimal transformations λα which preserve the
Coulomb gauge condition (15.128). Referring to (15.114), we see that this is the case
if (on a given time-slice, suppressing the time variable)

\[
(\Delta\delta_{\alpha\beta} + g f_{\alpha\beta\gamma}\vec{A}_\gamma\cdot\vec\nabla)\lambda_\beta(x) = 0 \tag{15.129}
\]

for some non-trivial λβ (x). It is easy to exhibit examples of this phenomenon. For
example, taking the gauge group to be SU(2), we have fαβγ = εαβγ , and making the
Ansatz (all indices α, β, .., i, j, .. now run over the values 1,2,3):

\[
A^i_\alpha(x) = \epsilon_{\alpha ij}x_j V(r) \tag{15.130}
\]

\[
\lambda_\alpha(x) = x_\alpha R(r), \qquad r \equiv \sqrt{\vec{x}\cdot\vec{x}} \tag{15.131}
\]

we find that the condition (15.129) amounts to


\[
-\frac{1}{2}\Delta(x_\alpha R(r)) + gV(r)(x_\alpha R(r)) = E(x_\alpha R(r)), \qquad E = 0 \tag{15.132}
\]
which is just the three-dimensional Schrödinger equation for a zero-energy l = 1 bound
state of a particle of unit mass in the potential gV (r). Making the potential V (r)
attractive and sufficiently deep (corresponding to a “strong” gauge field $\vec{A}_\alpha$)8 , there
will exist normalizable zero-energy solutions for R(r) (i.e., the gauge transformation
function, which we require to vanish at infinity in accordance with the usual boundary
condition of all vanishing fields there). A simple example is an attractive spherical well
with angular momentum l ≥ 1 (which includes our case of l = 1), where a normalizable
zero-energy solution exists with $R(r) \sim r^{-l-1}$ at large r (Daboul and Nieto, 1994). The
same phenomenon can be shown to occur in the covariant Landau gauge:

∂μ Aμα = 0 (15.133)

Nevertheless, these gauges are perfectly appropriate if our interests are purely pertur-
bative: in particular, if we are content with finding a consistent set of Feynman rules
for generating the formal asymptotic expansion of the Green functions of the theory in
powers of the coupling constant g. The reason for this is simply that the perturbative
calculation of Green functions from the functional integral amounts to a saddle-point
expansion around the Gaussian functional integrand represented by the free part of
the action, and the results obtained to any finite order of a saddle-point expansion
only depend on the structure of the integrand in the infinitesimal neighborhood of the
saddle point. The preceding discussion makes it clear that in this neighborhood (of
infinitesimally small fields) the troublesome Gribov copies are, in fact, absent.
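The SU(2) zero-mode example of (15.130)–(15.132) can be made concrete with a short numerical sketch. It first checks the contraction ε_{αβγ} ε_{γβj} = −2δ_{αj} that turns (15.129) into (15.132) for the Ansatz (15.130, 15.131), and then locates the shallowest l = 1 zero mode for an assumed square well, gV(r) = −V0 for r < a and zero outside (unit mass, ħ = 1). The well shape, the function names, and the bracketing interval are illustrative assumptions and are not taken from the text.

```python
import math


def eps(a, b, c):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (a - b) * (b - c) * (c - a) // 2


def contraction(alpha, j):
    """sum over beta, gamma of eps_{alpha beta gamma} * eps_{gamma beta j}."""
    return sum(eps(alpha, b, c) * eps(c, b, j) for b in range(3) for c in range(3))


def interior_u(x):
    """Regular l = 1 solution inside the well, u = x * j_1(x), with x = k r."""
    return math.sin(x) / x - math.cos(x)


def interior_du(x):
    """Derivative of interior_u with respect to x."""
    return math.cos(x) / x - math.sin(x) / x**2 + math.sin(x)


def mismatch(x):
    """Matching condition at r = a for zero energy.

    Outside the well the decaying l = 1 solution is u ~ 1/r, so continuity
    of the logarithmic derivative requires x * u'(x) + u(x) = 0 at x = k a.
    """
    return x * interior_du(x) + interior_u(x)


def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f changes sign exactly once on [lo, hi]."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def critical_depth(a=1.0):
    """Smallest well depth V0 = k^2 / 2 supporting an l = 1 zero-energy mode."""
    x = bisect(mismatch, 1.0, 4.0)  # assumed bracket containing the first root
    return x * x / (2.0 * a * a)
```

In this idealized model the matching condition reduces to sin(ka) = 0, so the bisection converges to ka = π, i.e. V0 = π²/(2a²). Shallower wells have no normalizable l = 1 zero mode, which is one concrete reading of "sufficiently deep".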
Restricting our attention to perturbation theory therefore, we may construct an
unconstrained functional integral in Coulomb gauge proceeding in analogy to the

8 We recall that arbitrarily weak potentials do not bind in three dimensions, absent long-range Coulomb-
like behavior.
abelian case (cf. (15.69)). In this case, however, the DFP determinant constructed
from the Poisson bracket of the constraints with the Coulomb gauge conditions,

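\[
\begin{aligned}
\{\chi_\alpha(\vec{x}), \psi_\beta(\vec{y})\} &= \{-D_i E^i_\alpha(\vec{x}),\ \partial_j A^j_\beta(\vec{y})\} \\
&= (\Delta\delta_{\alpha\beta} + g f_{\alpha\beta\gamma}\vec{A}_\gamma\cdot\vec\nabla)\,\delta^3(\vec{x} - \vec{y})
\end{aligned}
\tag{15.134}
\]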

corresponds to a non-trivial functional of the gauge field

\[
\Delta_{\rm coul}[\vec{A}_\alpha] = \det(\Delta\delta_{\alpha\beta} + g f_{\alpha\beta\gamma}\vec{A}_\gamma\cdot\vec\nabla) \tag{15.135}
\]

The reader will recognize here the reappearance of the same operator whose zero-mode
eigenfunctions signaled the appearance of the Gribov ambiguity in (15.129). The
non-abelian Dirac–Faddeev functional integral analogous to (15.67) (over gauge-field degrees
of freedom only: the fermions will be inserted later), using the expression
(15.125) for the total Hamiltonian density,9 now becomes (cf. (15.73)):

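\[
Z_{\rm YM} = \int \mathcal{D}A_{\alpha\mu}\,\mathcal{D}E^i_\alpha\,\Delta_{\rm coul}[\vec{A}_\alpha]\,\delta(\vec\nabla\cdot\vec{A}_\alpha)\cdot \exp\Big\{i\int\Big(E^i_\alpha(\partial_0 A_{\alpha i} - \partial_i A_{\alpha 0} - g f_{\alpha\beta\gamma}A_{\beta 0}A_{\gamma i}) - \frac{1}{2}(\vec{E}_\alpha^2 + \vec{B}_\alpha^2) - J^\mu_\alpha A_{\alpha\mu}\Big)d^4x\Big\} \tag{15.136}
\]

\[
= \int \mathcal{D}A_\mu\,\delta(\vec\nabla\cdot\vec{A}_\alpha)\,\Delta_{\rm coul}[\vec{A}_\alpha]\,\exp\Big\{i\int\Big(\frac{1}{2}F_{\alpha 0i}^2 - \frac{1}{4}F_{\alpha jk}^2 - J^\mu_\alpha A_{\alpha\mu}\Big)d^4x\Big\} \tag{15.137}
\]

\[
= \int \mathcal{D}A_\mu\,\delta(\vec\nabla\cdot\vec{A}_\alpha)\,\Delta_{\rm coul}[\vec{A}_\alpha]\,\exp\Big\{i\int\Big(-\frac{1}{4}F_{\alpha\mu\nu}F_\alpha^{\mu\nu} - J^\mu_\alpha A_{\alpha\mu}\Big)d^4x\Big\} \tag{15.138}
\]

\[
= \int \mathcal{D}A_\mu\,\delta(\vec\nabla\cdot\vec{A}_\alpha)\,\Delta_{\rm coul}[\vec{A}_\alpha]\,e^{i\int\mathcal{L}_{\rm YM}\,d^4x} \tag{15.139}
\]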

where LYM is the local and Lorentz scalar Yang–Mills Lagrangian (15.112), minus
the fermion kinetic piece. Of course, manifest Lorentz-invariance is still broken by
the δ-function enforcing the non-covariant Coulomb gauge condition and by the DFP
functional Δcoul which only depends on the spatial components of the gauge-field.
As for abelian gauge theories, a choice of gauge which preserves the manifest
Lorentz symmetry and locality properties of the underlying dynamics is usually
preferable to the non-covariant choices which lead to a straightforward canonical
treatment. The conversion can be made (again, as in QED) by recognizing that the
DFP determinant (15.135) can be written as the inverse of a functional integral which
averages the gauge-fixing δ-function over all gauge transformations (cf. (15.74)). This
“averaging over a group” requires the notion of Hurwitz measure: the integration over
all elements U of a continuous Lie group G is uniquely defined by the two conditions

\[
\int dU\,f(UV) = \int dU\,f(VU) = \int dU\,f(U), \quad V \in G \quad \text{(shift invariance)} \tag{15.140}
\]

9 Recall that the $A_{\alpha 0}$ fields are Lagrange multiplier fields enforcing the χα constraints: the second and
third terms in the expression multiplying $E^i_\alpha$ in the exponent in (15.136) arise from the $D_i E^i_\alpha$ part of χα ,
with an integration by parts on the ∂i part of Di .
and

\[
\int dU = 1 \quad \text{(normalization)} \tag{15.141}
\]

For example, the Lie group SU(2) is defined in the fundamental representation by unit
determinant 2×2 unitary matrices $U = i\vec\sigma\cdot\vec{u} + u_4\cdot 1$ with $\vec{u}^2 + u_4^2 = 1$: in other words,
it is topologically the unit sphere in four dimensions. The Hurwitz measure turns out
in this case to be the obvious choice: $dU = \frac{1}{2\pi^2}d\Omega$, where dΩ is the solid angle
in four dimensions. It is easy to verify the shift invariance condition (see Problem 6)
for this definition. For a local gauge symmetry we have the obvious generalization of
the single Hurwitz integral to a functional integral $\int\mathcal{D}U(x)$ over independent gauge
elements U (x) at each spacetime point.
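The SU(2) statements above lend themselves to a quick numerical illustration. The sketch below stores U = i σ·u + u4·1 as the 4-tuple (u1, u2, u3, u4), checks that the total solid angle in four dimensions is 2π² (so that dU = dΩ/2π² is normalized), and estimates a group average with and without a left shift by a fixed element V. The hyperspherical parametrization, the Monte Carlo sampling scheme, the sample size, and all function names are assumptions made for illustration only.

```python
import math
import random


def s3_area(n=200):
    """Midpoint-rule integral of sin^2(chi) sin(theta) dchi dtheta dphi."""
    h = math.pi / n
    chi_part = sum(math.sin((i + 0.5) * h) ** 2 for i in range(n)) * h
    theta_part = sum(math.sin((i + 0.5) * h) for i in range(n)) * h
    return chi_part * theta_part * 2.0 * math.pi


def su2_mul(p, q):
    """Matrix product of p4 + i sigma.p and q4 + i sigma.q as a 4-tuple."""
    p1, p2, p3, p4 = p
    q1, q2, q3, q4 = q
    return (
        p4 * q1 + q4 * p1 - (p2 * q3 - p3 * q2),
        p4 * q2 + q4 * p2 - (p3 * q1 - p1 * q3),
        p4 * q3 + q4 * p3 - (p1 * q2 - p2 * q1),
        p4 * q4 - (p1 * q1 + p2 * q2 + p3 * q3),
    )


def random_su2(rng):
    """Uniform point on the unit sphere in four dimensions."""
    v = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)


def haar_average(f, shift=None, samples=20000, seed=0):
    """Monte Carlo estimate of the group average of f(U), or of f(V U)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        u = random_su2(rng)
        if shift is not None:
            u = su2_mul(shift, u)
        total += f(u)
    return total / samples
```

For the test function f(U) = u4² = (½ Tr U)², the exact group average is 1/4, and the shifted and unshifted estimates should agree with it within statistical error, as (15.140) requires.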
Now consider the functional $F[\vec{A}]$ defined by the Hurwitz functional integral

\[
F[\vec{A}] \equiv \int \mathcal{D}U\,\delta(\vec\nabla\cdot\vec{A}^U_\alpha) \tag{15.142}
\]

where $\vec{A}^U_\alpha$ is the spatial gauge field after being subjected to the finite local gauge
transformation U (x) (as in (15.105)). We shall now evaluate this functional integral
for gauge fields $\vec{A}_\alpha$ which are (a) sufficiently weak that no Gribov copies exist, and (b)
in Coulomb gauge, i.e., satisfy $\vec\nabla\cdot\vec{A}_\alpha = 0$. It is clear that the δ-function in the integral
is supported exactly at the identity value for the local gauge function $U(x) = e^{igt_\alpha\Lambda_\alpha(x)}$
(as we are already in Coulomb gauge). In the neighborhood of the identity, we may
replace the finite group parameters Λα by their infinitesimal limits λα (x), and obtain
(up to an irrelevant constant factor), using (15.114), and the identity from footnote 6,


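\[
\begin{aligned}
F[\vec{A}] &= \int \mathcal{D}\lambda_\alpha\,\delta\big(\vec\nabla\cdot(\vec{A}_\alpha + \vec\nabla\lambda_\alpha + g f_{\alpha\beta\gamma}\lambda_\beta\vec{A}_\gamma)\big) \\
&= \int \mathcal{D}\lambda_\alpha\,\delta\big((\Delta\delta_{\alpha\beta} + g f_{\alpha\beta\gamma}\vec{A}_\gamma\cdot\vec\nabla)\lambda_\beta\big) \\
&= {\det}^{-1}(\Delta\delta_{\alpha\beta} + g f_{\alpha\beta\gamma}\vec{A}_\gamma\cdot\vec\nabla) = \Delta_{\rm coul}[\vec{A}_\alpha]^{-1}
\end{aligned}
\tag{15.143}
\]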

Thus, we find, in analogy to (15.74) in the abelian case, that the DFP determinant
provides a partition of unity over all gauge transformations

\[
\Delta_{\rm coul}[\vec{A}_\alpha]\int \mathcal{D}U\,\delta(\vec\nabla\cdot\vec{A}^U_\alpha) = 1 \tag{15.144}
\]

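and, by the shift-invariance property of the Hurwitz measure, is trivially gauge-invariant:

\[
\Delta_{\rm coul}[\vec{A}^V_\alpha]^{-1} = \int \mathcal{D}U\,\delta(\vec\nabla\cdot\vec{A}^{VU}_\alpha) = \int \mathcal{D}U\,\delta(\vec\nabla\cdot\vec{A}^U_\alpha) = \Delta_{\rm coul}[\vec{A}_\alpha]^{-1} \tag{15.145}
\]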

We are now in a position to repeat the steps analogous to those leading from
(15.79) to (15.80) in the abelian case, thereby making the transition from the non-
covariant Coulomb gauge to a covariant gauge specified by the generalized Landau
gauge condition

∂ μ Aαμ (x) = fα (x) (15.146)

where the fα (x) are, for the time being, arbitrary but fixed c-number functions. We
shall include the fermion dynamics completely at this point, by including the usual
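integrals over Grassmannian integration variables, and begin with the functional

\[
Z_{\rm YM}[K_i] = \int \mathcal{D}A_{\alpha\mu}\,\mathcal{D}\psi\,\mathcal{D}\bar\psi\,\delta(\vec\nabla\cdot\vec{A}_\alpha)\,\Delta_{\rm coul}[\vec{A}_\alpha]\,e^{i\int(\mathcal{L}_{\rm YM} + K_i(x)O_i(x))d^4x} \tag{15.147}
\]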

where now LYM is the full non-abelian Lagrangian (15.112), including fermion kinetic
terms, and the Oi (x) are an arbitrary set of gauge-invariant operators whose Green
functions we wish to compute, by taking functional derivatives with respect to the asso-
ciated sources Ki (x). Defining, in analogy to (15.78), a covariant DFP functional by

\[
1 = \Delta^f_{\rm cov}[A]\cdot\int \mathcal{D}U\,\delta(\partial^\mu A^U_{\alpha\mu} - f_\alpha) \tag{15.148}
\]

we can easily show (see Problem 7), just as in the abelian case, that for fields in
the generalized Landau gauge (15.146), (a) Δfcov ≡ Δcov is in fact independent of the
arbitrary functions fα (x), (b) is a gauge-invariant functional, Δcov [AV ] = Δcov [A]
for any local gauge transformation V (x), and (c) is the determinant of the covariant
analog of the Coulomb operator (15.135),

\[
\Delta_{\rm cov}[A] = \det(\Box\,\delta_{\alpha\beta} + g f_{\alpha\beta\gamma}\partial^\mu A_{\gamma\mu}) \tag{15.149}
\]

The steps leading from (15.79) to (15.80) can be repeated more or less verbatim, by
inserting the partition of unity (15.148) (with the functional integral over abelian
gauge transformations $\int\mathcal{D}\Lambda\ldots$ now replaced by the corresponding non-abelian
Hurwitz functional integral $\int\mathcal{D}U$), and using a shift of the integration variables
which corresponds to a local (finite) gauge transformation on all the fields (leaving
the Lagrangian LYM and the gauge-invariant operators Oi unchanged)

\[
A_{\alpha\mu}(x) \to A^{U^{-1}}_{\alpha\mu}(x), \qquad \psi \to U^{-1}\psi(x) \tag{15.150}
\]

Note that the shift of variables (15.150) has unit Jacobian. This is obvious for the
fermionic fields, which undergo unitary rotation at each spacetime point by the matrix
U −1 (x) which has unit determinant. The gauge fields in the adjoint representation
transform as follows (cf. (15.105)):

\[
t_\alpha A^{U^{-1}}_{\alpha\mu} = U^{-1}t_\alpha A_{\alpha\mu}U - \frac{i}{g}U^{-1}\partial_\mu U \tag{15.151}
\]

As the second term on the right is an additive shift, and the first corresponds to an
orthogonal rotation of the Aαμ in the α indices, the Jacobian of the transformation
on the gauge fields is also unity. The upshot is that we obtain (after removing the
Coulomb DFP determinant in the form of the partition of unity (15.144)) a manifestly
covariant expression for the same functional ZYM :

\[
Z_{\rm YM}[K_i] = \int \mathcal{D}A_{\alpha\mu}\,\mathcal{D}\psi\,\mathcal{D}\bar\psi\,\delta(\partial^\mu A_{\alpha\mu} - f_\alpha)\,\Delta_{\rm cov}[A]\,e^{i\int(\mathcal{L}_{\rm YM} + K_i(x)O_i(x))d^4x} \tag{15.152}
\]

As our starting point did not contain the functions fα , the functional ZYM must also
be independent of the fα (despite their appearance in the δ-function), and we may
therefore multiply (15.152) by a physically irrelevant constant factor

\[
C \equiv \int \mathcal{D}f_\alpha\,e^{-\frac{i}{2\xi}\int f_\alpha(x)^2 d^4x} \tag{15.153}
\]

and interchange the functional integrals over the gauge and fermion fields with those
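over the fα to obtain the non-abelian analog of (15.82):

\[
Z_{\rm YM} = \int \mathcal{D}A_{\alpha\mu}\,\mathcal{D}\psi\,\mathcal{D}\bar\psi\,\Delta_{\rm cov}[A]\,e^{i\int(\mathcal{L}_{\rm YM} - \frac{1}{2\xi}(\partial^\mu A_{\alpha\mu})^2 + K_i O_i)d^4x} \tag{15.154}
\]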

The presence of the non-trivial (and unknown!) functional Δcov [A] in this formula
makes it unsuitable for practical perturbative calculations. Instead, it is convenient
to re-express this functional in terms of a functional integral representation, where
the determinant in (15.149) is generated by integrating over complex Grassmannian
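“ghost fields” ωα (x), ω̄α (x), using the Gaussian fermionic integral (10.114),

\[
\Delta_{\rm cov}[A] = \det(\Box\,\delta_{\alpha\beta} + g f_{\alpha\beta\gamma}\partial^\mu A_{\gamma\mu}) = \int \mathcal{D}\omega_\alpha\,\mathcal{D}\bar\omega_\alpha\,e^{i\int\bar\omega_\alpha(\Box\,\delta_{\alpha\beta} + g f_{\alpha\beta\gamma}\partial^\mu A_{\gamma\mu})\omega_\beta\,d^4x} \tag{15.155}
\]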
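The mechanism behind (15.155), a Gaussian integral over Grassmann variables producing a determinant rather than an inverse determinant, can be illustrated on a finite matrix. The sketch below is a toy model under stated assumptions: a small numerical matrix M stands in for the (discretized) differential operator, the generators are ordered as ω̄1, ω1, ω̄2, ω2, ..., the Berezin measure is normalized so that the integral of ω̄1ω1ω̄2ω2··· equals one, and the factor of i in the exponent is dropped since it only changes a field-independent constant. All names are hypothetical.

```python
from itertools import permutations


def gmul(x, y):
    """Product of Grassmann-algebra elements stored as {bitmask: coefficient}.

    Bit j of a mask marks the presence of generator j; each monomial is the
    product of its generators in increasing index order.
    """
    out = {}
    for a, ca in x.items():
        for b, cb in y.items():
            if a & b:
                continue  # a repeated generator squares to zero
            # Each generator of b moves left past the higher generators of a.
            swaps = sum(
                bin(a >> (j + 1)).count("1")
                for j in range(b.bit_length())
                if (b >> j) & 1
            )
            key = a | b
            out[key] = out.get(key, 0) + (-1) ** swaps * ca * cb
    return out


def gexp(x, n_generators):
    """exp(x) for an even, nilpotent element x (the power series terminates)."""
    result = {0: 1}
    term = {0: 1}
    for k in range(1, n_generators // 2 + 1):
        term = {m: c / k for m, c in gmul(term, x).items()}
        keys = set(result) | set(term)
        result = {m: result.get(m, 0) + term.get(m, 0) for m in keys}
    return result


def bilinear(matrix):
    """The element sum_ij wbar_i M_ij w_j, with wbar_i -> bit 2i, w_j -> bit 2j+1."""
    n = len(matrix)
    x = {}
    for i in range(n):
        for j in range(n):
            pair = gmul({1 << (2 * i): 1}, {1 << (2 * j + 1): 1})
            for mask, sign in pair.items():
                x[mask] = x.get(mask, 0) + sign * matrix[i][j]
    return x


def berezin_gaussian(matrix):
    """Coefficient of the top monomial of exp(wbar M w): the Berezin integral."""
    n = len(matrix)
    top = (1 << (2 * n)) - 1
    return gexp(bilinear(matrix), 2 * n).get(top, 0)


def det(matrix):
    """Reference determinant from the Leibniz permutation sum."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(
            perm[i] > perm[j] for i in range(n) for j in range(i + 1, n)
        )
        prod = 1
        for i in range(n):
            prod *= matrix[i][perm[i]]
        total += (-1) ** inversions * prod
    return total
```

With these conventions `berezin_gaussian(M)` should reproduce `det(M)` for any small square matrix; a different ordering of the measure would at most change the overall sign.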
We emphasize that the ghost fields are introduced here as a purely technical device:
they evidently do not correspond to physical fields or particles in the theory, and
in particular there are no asymptotic states associated with them. As the theory is
based on an underlying hermitian Hamiltonian, we expect the perturbative S-matrix
constructed from a Fock space of gauge and fermion particles (and no “ghost
particles”) to be unitary on its own.10 Inserting (15.155) in (15.154), we obtain
our final result for the generating functional of a theory of coupled Yang–Mills and
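fermionic matter fields, in a covariant ξ-gauge:

\[
Z_{\rm YM}[K_i] = \int \mathcal{D}A_{\alpha\mu}\,\mathcal{D}\psi\,\mathcal{D}\bar\psi\,\mathcal{D}\omega_\alpha\,\mathcal{D}\bar\omega_\alpha\,e^{i\int(\mathcal{L}_{\rm YM} + \mathcal{L}_{\rm gh} - \frac{1}{2\xi}(\partial^\mu A_{\alpha\mu})^2 + K_i O_i)d^4x} \tag{15.156}
\]

\[
\mathcal{L}_{\rm YM} = -\frac{1}{4}F_{\alpha\mu\nu}F_\alpha^{\mu\nu} + \sum_a \bar\psi_a(i\slashed{D} - m_a)\psi_a \tag{15.157}
\]

\[
\mathcal{L}_{\rm gh} = \bar\omega_\alpha(\Box\,\delta_{\alpha\beta} + g f_{\alpha\beta\gamma}\partial^\mu A_{\gamma\mu})\omega_\beta \tag{15.158}
\]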

The advantage of the ghost field representation of the DFP determinant functional
is apparent in (15.156): the path integral takes the standard form, as an integral

10 By arguments analogous to those given in Section 15.3 for abelian gauge theory, the on-shell S-matrix
can be shown to be the same in a physical gauge (such as Coulomb or axial gauge), where only manifestly
physical gauge degrees of freedom are present, as in the covariant gauges under discussion. It is therefore
clear that perturbative unitarity must hold on a Fock space constructed from the interpolating
fields associated with the gauge and matter fields of the theory, sans ghosts.
over an exponential of a Lagrangian with a polynomial structure in fields and their
derivatives, from which a set of Feynman rules can easily be extracted.
A free Lagrangian is identified by setting the gauge coupling constant to zero, and
the free propagators of the various fields (gauge, ghost, and fermion) can then be
extracted, as in Section 10.3, as the Green functions associated with the differential
operators in the corresponding quadratic parts of the Lagrangian. The ξ-dependent
term in (15.156) will clearly enter in the determination of the gauge field propagator:
a straightforward calculation (see Problem 8) shows that the two-point function for
the free gauge fields is

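\[
\langle 0|T\{A_{\alpha\mu}(x)A_{\beta\nu}(y)\}|0\rangle = -i\delta_{\alpha\beta}\int \frac{\left(g_{\mu\nu} - (1-\xi)\frac{k_\mu k_\nu}{k^2}\right)e^{-ik\cdot(x-y)}}{k^2 + i\epsilon}\,\frac{d^4k}{(2\pi)^4} \tag{15.159}
\]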

The choices ξ = 0 (resp. 1) are referred to as Landau (resp. Feynman) gauge, but as
mentioned previously, it is often convenient to leave the gauge parameter ξ unfixed
at intermediate stages, as its disappearance at the end in physically meaningful
quantities is a powerful check (guaranteed by gauge invariance) on the correctness
of the calculation.
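The tensor structure of (15.159) can be checked against the quadratic part of the gauge action. The sketch below assumes that, at g = 0 and after integration by parts, the gauge-field terms of (15.156) take the form ½ Aαμ K^{μν} Aαν with, in momentum space (∂μ → −ikμ), K^{μν}(k) = −k² g^{μν} + (1 − 1/ξ) k^μ k^ν. It then verifies, at an off-shell momentum, that the tensor D_{νρ}(k) = −(g_{νρ} − (1 − ξ) k_ν k_ρ/k²)/k² is the inverse of K, so that iD reproduces the integrand of (15.159) with the iε omitted. These sign and Fourier conventions, and the function names, are assumptions of the sketch; the check requires ξ ≠ 0, so Landau gauge is reached only as a limit.

```python
METRIC = [1.0, -1.0, -1.0, -1.0]  # diagonal of g_{mu nu} = g^{mu nu}


def k_squared(k):
    """Minkowski square of a four-vector given with upper indices."""
    return sum(METRIC[m] * k[m] ** 2 for m in range(4))


def kinetic_operator(k, xi):
    """K^{mu nu}(k) = -k^2 g^{mu nu} + (1 - 1/xi) k^mu k^nu (upper indices)."""
    k2 = k_squared(k)
    return [
        [
            (-k2 * METRIC[m] if m == n else 0.0) + (1.0 - 1.0 / xi) * k[m] * k[n]
            for n in range(4)
        ]
        for m in range(4)
    ]


def propagator_tensor(k, xi):
    """D_{nu rho}(k) = -(g_{nu rho} - (1 - xi) k_nu k_rho / k^2) / k^2."""
    k2 = k_squared(k)
    k_low = [METRIC[m] * k[m] for m in range(4)]  # lower the index
    return [
        [
            -((METRIC[n] if n == r else 0.0) - (1.0 - xi) * k_low[n] * k_low[r] / k2)
            / k2
            for r in range(4)
        ]
        for n in range(4)
    ]


def contract(upper, lower):
    """(K D)^mu_rho = sum_nu K^{mu nu} D_{nu rho}."""
    return [
        [sum(upper[m][n] * lower[n][r] for n in range(4)) for r in range(4)]
        for m in range(4)
    ]
```

For any momentum with k² ≠ 0 and any ξ ≠ 0, `contract(kinetic_operator(k, xi), propagator_tensor(k, xi))` should be the 4×4 identity up to rounding, for Feynman gauge (ξ = 1) and for general ξ alike.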
The interaction vertices of the theory are associated with the cubic and quartic
terms in the total Lagrangian. Thus, denoting the gauge, ghost and Fermi fields
by A, ω, ψ generically, there are vertices in the graphs of the theory corresponding
(schematically) to A³, A⁴, ψ̄ψA, and ω̄ωA field products (see Fig. 15.1 for the Feynman
rules for the bosonic vertices of the theory; also, Problem 9). In addition, we must
remember that the Grassmann nature of the ghost fields inserts minus signs (analogous
to those for the physical fermion fields of the theory) when ghost fields are exchanged
in Wick products, as well as in closed loops of ghost propagators (cf. (10.40)). As
emphasized above, the ghost propagators only appear as internal lines, as the ghost
fields do not correspond to physical asymptotic particle states.
A deeper understanding of the role of ghost fields in the quantization of local gauge
theories has been provided by the beautiful theory developed by Becchi, Rouet, and
Stora (Becchi et al., 1976), and Tyutin (Iofa and Tyutin, 1976), where the first-class
constraints appearing in a theory with local gauge symmetry are reinterpreted in terms
of an exact global supersymmetry of the full Lagrangian density (e.g., the exponent
in the path-integral expression (15.156), including ghost and gauge-fixing terms). The
BRST theory (like its historical antecedent, the Gupta–Bleuler quantization method)
is necessarily formulated on a state space with indefinite metric, but the existence of
a global supersymmetry turns out to be exactly what is needed for the ghost states,
together with longitudinal polarizations of massless gauge mesons, to decouple from
the positive-definite subspace of physical states. The derivation of the Ward–Takahashi
identities which summarize the content of the local symmetry at the level of the Green
functions of the theory is also considerably simplified in the BRST approach. We shall
not describe this approach further here, but refer the reader to the original papers and
accounts in textbooks devoted specifically to quantization of gauge field theories.11

11 For a readable account, see (Taylor, 1976), Chapter 12. The general graded cohomology BRST theory
of Hamiltonian systems with first-class constraints can be found in (Henneaux and Teitelboim, 1992).
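[Fig. 15.1: vertex diagrams not reproduced; leg labels and vertex factors are listed below.]

(a) Triple gluon vertex, legs (αμ, p1), (βν, p2), (γρ, p3):
\[
-ig f_{\alpha\beta\gamma}\big[g_{\mu\nu}(p_1 - p_2)_\rho + g_{\nu\rho}(p_2 - p_3)_\mu + g_{\rho\mu}(p_3 - p_1)_\nu\big]
\]

(b) Quadruple gluon vertex, legs (αμ, p1), (βν, p2), (γρ, p3), (δσ, p4):
\[
-g^2\big[f_{\alpha\beta\eta}f_{\gamma\delta\eta}(g_{\mu\rho}g_{\nu\sigma} - g_{\mu\sigma}g_{\nu\rho}) + f_{\alpha\gamma\eta}f_{\delta\beta\eta}(g_{\mu\sigma}g_{\nu\rho} - g_{\mu\nu}g_{\rho\sigma}) + f_{\alpha\delta\eta}f_{\beta\gamma\eta}(g_{\mu\nu}g_{\rho\sigma} - g_{\mu\rho}g_{\nu\sigma})\big]
\]

(c) Ghost-ghost-gluon vertex, ghost legs (α, p1), (β, p2), gluon leg (γμ, p3):
\[
ig f_{\alpha\beta\gamma}\,p_{2\mu}
\]

Fig. 15.1 Feynman rules for the (a) triple gluon, (b) quadruple gluon, and (c) ghost-ghost-
gluon vertices in QCD. The arrows indicate direction of momentum flow.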

As in the case of abelian gauge theories, we are frequently interested in calculating


the perturbative S-matrix elements for scattering of the elementary quanta of the
theory (e.g., quarks and gluons in QCD). This requires that sources be introduced
for the non-gauge-invariant fields (the ψ and Aαμ ) which interpolate for these objects
in perturbation theory. The argument that the S-matrix is gauge-independent, and
that we may therefore, for example, use the Feynman rules in an arbitrary ξ-gauge
to compute the relevant scattering amplitudes, proceeds along similar lines to the
discussion for abelian gauge theory at the end of the previous section.
Our functional integral quantization of non-abelian gauge theory has so far been
carried out in Minkowski space. Many aspects of the Yang–Mills theory are more
conveniently studied in the Euclidean space formulation of the theory. For example,
the important role played by vacuum tunneling processes (“instantons”) can only
be exposed clearly in Euclidean space, and non-perturbative simulations of the
lattice-regularized theory are all performed in the Euclidean version. The transition
from Minkowski to Euclidean space is accomplished by the usual Wick analytical
continuation:

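\[
\begin{aligned}
& x^0 \to -ix_4, \quad x^i \to x_i,\ i = 1, 2, 3, \quad \partial_0 \to i\partial_4, \quad \partial_i \to \partial_i \\
& A_{\alpha 0} \to \frac{i}{g}A_{\alpha 4}, \quad A_{\alpha i} \to \frac{1}{g}A_{\alpha i},\ i = 1, 2, 3 \\
& F_{\alpha 0i} \to \frac{i}{g}F_{\alpha 4i},\ i = 1, 2, 3, \quad F_{\alpha ij} \to \frac{1}{g}F_{\alpha ij},\ i, j = 1, 2, 3 \\
& \gamma^0 \to \hat\gamma_4, \quad \gamma^i \to i\hat\gamma_i\ (i = 1, 2, 3), \quad \{\hat\gamma_\mu, \hat\gamma_\nu\} = 2\delta_{\mu\nu}
\end{aligned}
\tag{15.160}
\]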

where we have also rescaled the gauge field by a factor of the inverse coupling
constant. On performing these replacements in the Minkowski functional integral
(15.156) we obtain (ignoring source terms, and taking for simplicity just a single
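Dirac field in the fundamental representation) the Euclidean functional integral:

\[
Z_{{\rm YM},E} = \int \mathcal{D}A_{\alpha\mu}\,\mathcal{D}\psi\,\mathcal{D}\bar\psi\,\mathcal{D}\omega_\alpha\,\mathcal{D}\bar\omega_\alpha\,e^{-\int(\mathcal{L}_{{\rm YM},E} + \mathcal{L}_{{\rm gh},E} + \frac{1}{2\xi g^2}(\partial_\mu A_{\alpha\mu})^2)d^4x} \tag{15.161}
\]

\[
\mathcal{L}_{{\rm YM},E} = \frac{1}{4g^2}(F_{\alpha\mu\nu})^2 - \bar\psi(i\hat{\slashed{D}} - m)\psi \tag{15.162}
\]

\[
F_{\alpha\mu\nu} = \partial_\mu A_{\alpha\nu} - \partial_\nu A_{\alpha\mu} - f_{\alpha\beta\gamma}A_{\beta\mu}A_{\gamma\nu}, \qquad \hat{\slashed{D}} = \hat\gamma_\mu(-i\partial_\mu + t_\alpha A_{\alpha\mu}) \tag{15.163}
\]

\[
\mathcal{L}_{{\rm gh},E} = \bar\omega_\alpha(\delta_{\alpha\beta}\partial_\mu\partial_\mu + f_{\alpha\beta\gamma}\partial_\mu A_{\gamma\mu})\omega_\beta \tag{15.164}
\]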

We note the following important features of this Euclidean functional representation:


1. The gauge field component of the Euclidean action,

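\[
S_{{\rm gauge},E} \equiv \frac{1}{4g^2}F_\xi[A_{\alpha\mu}], \qquad F_\xi[A_{\alpha\mu}] \equiv \int\Big\{(F_{\alpha\mu\nu})^2 + \frac{2}{\xi}(\partial_\mu A_{\alpha\mu})^2\Big\}d^4x \tag{15.165}
\]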
is evidently positive-definite and provides the usual damping factor for large
fields which ensures the existence of the functional integral (provided, as usual,
the theory is fully regularized, say on a spacetime lattice). Perturbation theory
corresponds to performing Gaussian functional integrals over the gauge fields
keeping only the parts of Fαμν linear in Aαμ in Fξ : i.e., functional integrals of
polynomials of the fields with the Gaussian measure $e^{-\frac{1}{4g^2}F^0_\xi}$, with

\[
F^0_\xi[A] = \int\Big\{(\partial_\mu A_{\alpha\nu} - \partial_\nu A_{\alpha\mu})^2 + \frac{2}{\xi}(\partial_\mu A_{\alpha\mu})^2\Big\}d^4x \tag{15.166}
\]
“Large” gauge fields A of the type responsible for Gribov copies (recall the
example (15.130), where the “potential” V (r) must be of order unity) have
a Gaussian action $F^0_\xi[A] \geq C$, where C is a constant of order unity in the weak
coupling limit g → 0, and their contribution to the functional integral is therefore
suppressed by the non-analytic dependence $e^{-C/4g^2}$, which has a vanishing formal
expansion in powers of the coupling, to all orders of perturbation theory. This
argument vindicates our earlier assertion that the Gribov ambiguity is irrelevant
at the level of perturbative expansions.
2. The Dirac operator $\hat{\slashed{D}}$ appearing in the quadratic fermionic action is formally
self-adjoint: the Euclidean γ matrices γ̂μ are hermitian, as are the gauge group
generators tα , the gauge fields Aαμ are real, and the differential operators −i∂μ
are self-adjoint when acting on functions in a suitable function space. In fact, if we
quantize the theory in a box of finite spacetime volume (imposing, say, periodic
boundary conditions), this operator will have a purely discrete spectrum, with
the usual orthogonality properties holding for its eigenfunctions:

\[
\hat{\slashed{D}}\phi_i(x) = \lambda_i\phi_i(x), \quad \lambda_i\ \text{real}, \qquad \int\phi_i^\dagger(x)\phi_j(x)\,d^4x = \delta_{ij} \tag{15.167}
\]

where the eigenfunctions φi (x) carry implicitly discrete Dirac and fundamental
representation gauge group indices (whence the † appearing in the orthonormality
relation). These properties turn out to be crucial in the analysis of the chiral
properties of gauge theory, as we shall see shortly in our discussion of the axial
anomaly. The self-adjointness of $\hat{\slashed{D}}$ is also critical in non-perturbative evaluations
of the Euclidean Green functions in QCD by Monte Carlo simulation of the
lattice-regularized functional integral, as the fermion fields when integrated out
yield the determinant of the Dirac operator $Q \equiv i\hat{\slashed{D}} - m$, which can be shown to
be real, as a consequence of

γ5 Qγ5 = Q† ⇒ det(Q) = det(γ5 Q† γ5 ) = det(Q† ) = det(Q)∗ (15.168)

and even positive, and therefore can be treated as a probability measure in a


stochastic evaluation of the path integral over the gauge fields by importance
sampling methods (see Chapter 7, (Montvay and Münster, 1994)).

15.5 Explicit quantum-breaking of global symmetries: anomalies


The existence of global symmetries which are exactly conserved at the classical level
but broken at the quantum level, corresponding to Noether currents with a divergence
proportional to Planck’s constant, was established in the late 1960s with the discovery
of the famous axial (or chiral) anomaly by Adler (Adler, 1969), and by Bell and Jackiw
(Bell and Jackiw, 1969). In fact, the existence of the chiral anomaly was foreshadowed
in the much earlier work of Schwinger (Schwinger, 1951), where the decay rate of a
pseudoscalar meson to two photons was computed perturbatively and shown to yield
an amplitude which (much later) was realized (by Bell and Jackiw) to be exactly the
source of the anomalously high decay rate of the neutral pion to two photons.
The origin of the failure of the classical Noether theorem due to quantum anomalies
was clarified within the framework of the functional integral formulation by Fujikawa
(Fujikawa, 1980, 1981). We shall discuss Fujikawa’s treatment of the chiral anomaly in
some detail, as it reveals the essential features of the problem, common to the entire
class of quantum-induced symmetry-breaking anomalies in field theory. The reader will
recall from the discussion in Section 7.4.5 that a massless Dirac field ψ decomposes
naturally into the Weyl field components ψL (resp. ψR ) which transform according to
(½, 0) (resp. (0, ½)) representations of the HLG. The kinetic part of the Lagrangian
for such fields, ψ̄γ μ Dμ ψ (with a covariant derivative Dμ appearing if gauge fields
are present coupled to the fermions), lacks terms mixing the left and right parts of
the Dirac field. Consequently, the fermionic Lagrangian possesses a global U(1)×U(1)
symmetry under independent phase rotations of the left- and right-handed parts of

the Dirac bispinor:

$$\psi_L(x) \to e^{i\omega_L}\psi_L(x), \qquad \psi_R(x) \to e^{i\omega_R}\psi_R(x)$$

The “diagonal” subgroup corresponding to ωL = ωR is just a global phase symmetry
of the type discussed in the fourth example of Section 12.5: the associated Noether
current $J^\mu = \frac{\partial\mathcal{L}}{\partial(\partial_\mu\psi)}\psi = \bar\psi\gamma^\mu\psi$ is not anomalous (as we shall see below), and is exactly
conserved (it is usually referred to as a “vector” symmetry, as the associated current
transforms as a spacetime vector, rather than as an axial vector). The chiral subgroup
consisting of the global transformations

$$\psi_L(x) \to e^{i\omega}\psi_L(x), \qquad \psi_R(x) \to e^{-i\omega}\psi_R(x), \qquad \psi(x) \to e^{i\omega\gamma_5}\psi(x) \tag{15.169}$$

also leads to a Noether symmetry and, at the classical level, a conserved axial Noether
current:

$$J_5^\mu = \bar\psi\gamma_5\gamma^\mu\psi \tag{15.170}$$

Of course, if a mass term mψ̄ψ = mψ̄L ψR + mψ̄R ψL is added to the Lagrangian, the
chiral symmetry is lost, and the current (15.170) will develop a non-zero divergence.
The remarkable discovery of Adler, Bell, and Jackiw was that even in the absence of
a mass term quantum effects inevitably produce a non-vanishing divergence of J5μ if
the fermions are coupled to vector gauge fields via an exact local gauge symmetry. We
shall (following Fujikawa) exhibit this result using the functional version of Noether’s
theorem: i.e., by deriving the Ward–Takahashi identities for the chiral current of
fundamental representation fermions coupled to gauge fields (either abelian or non-
abelian). We shall see that an anomalous divergence, proportional to Planck’s constant
—and therefore, explicitly a quantum effect—emerges once the Jacobian arising
from a functional change of variable corresponding to a local chiral transformation
is carefully evaluated. As a preparation for the discussion here, the reader may find
it convenient to briefly review the derivation of the Ward–Takahashi identities for a
non-anomalous current that concludes Section 12.5.
The chiral transformations (15.169) act only on the fermion fields of the theory, so
we need only consider the fermionic part of the gauge theory path integral, with
the gauge fields “frozen” at some fixed, but unspecified, values (which may later
be integrated over after the pure gauge parts of the action are included). We shall
work in Euclidean space, so our starting point is the fermionic functional integral
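(cf. (15.162))

$$Z_{\rm ferm}[\eta,\bar\eta] = \int \mathcal{D}\psi\,\mathcal{D}\bar\psi\; \exp\Big\{\frac{1}{\hbar}\int \mathcal{L}_{\rm ferm}\,d^4x + \int\big(\bar\eta(x)\psi(x)+\bar\psi(x)\eta(x)\big)\,d^4x\Big\} \tag{15.171}$$

$$\mathcal{L}_{\rm ferm} = \bar\psi(x)\,(i\hat{\slashed{D}} - m)\,\psi(x) \tag{15.172}$$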

We have explicitly indicated the factor of Planck’s constant (normally set to unity)
in the exponent of the functional integrand in order to clarify the quantum origins
of the anomaly. We now retrace the procedure followed in Section 12.5 in deriving
the functional form of the Noether theorem for a vector symmetry, by considering
the effect of a functional change of variables induced by an infinitesimal local chiral
transformation:

$$\psi(x) \to \psi'(x) \equiv e^{i\omega(x)\gamma_5}\psi(x) \sim (1 + i\omega(x)\gamma_5)\psi(x) + O(\omega^2)$$

$$\bar\psi(x) \to \bar\psi'(x) \equiv \bar\psi(x)(1 + i\omega(x)\gamma_5) + O(\omega^2) \tag{15.173}$$

Note that the Wick rotation to Euclidean space converts our previous Minkowski space
definition of γ5 to its Euclidean version γ5 = γ̂1 γ̂2 γ̂3 γ̂4 , which turns out to be exactly
the same matrix—i.e., (7.107). We can expand the general Grassmann fields ψ(x), ψ̄(x)
in a complete set of normalized eigenfunctions of the self-adjoint Dirac operator D̂ / (cf.
(15.167)), thereby giving a precise meaning to the functional integral in terms of a
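discrete multi-dimensional Grassmann integral

$$\psi(x) = \sum_i \phi_i(x)\,\psi_i, \qquad \bar\psi(x) = \sum_i \phi_i^\dagger(x)\,\bar\psi_i \tag{15.174}$$

$$\mathcal{D}\psi\,\mathcal{D}\bar\psi \equiv \prod_i d\psi_i\, d\bar\psi_i \tag{15.175}$$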

Note that the eigenfunctions φi (x) contain a hidden (four-dimensional) Dirac index
and a gauge group index corresponding to the fundamental representation of the gauge
group (one-dimensional for the abelian case, such as QED, or N -dimensional for a
SU(N ) non-abelian gauge group). They depend on the c-number gauge fields buried
in the covariant derivative D̂. Under (15.173), the fermionic Lagrangian transforms to

Lferm → Lferm − 2imω(x)ψ̄γ5 ψ(x) − ∂μ ω(x)ψ̄γ̂ μ γ5 ψ(x) (15.176)

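while the source terms in (15.171) transform to

$$\int\big(\bar\eta(x)\psi(x) + \bar\psi(x)\eta(x)\big)d^4x \to \int\big(\bar\eta(x)\psi(x) + \bar\psi(x)\eta(x)\big)d^4x + i\int\omega(x)\big(\bar\eta(x)\gamma_5\psi(x) + \bar\psi(x)\gamma_5\eta(x)\big)d^4x \tag{15.177}$$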

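The fermionic integration variables meanwhile undergo the following transformation
(to first order in ω):

$$\psi'(x) = \sum_i \phi_i(x)\,\psi'_i = (1 + i\omega(x)\gamma_5)\sum_j \phi_j(x)\,\psi_j$$

$$\Rightarrow\; \psi'_i = \sum_j\Big(\int \phi_i^\dagger(x)(1 + i\omega(x)\gamma_5)\phi_j(x)\,d^4x\Big)\psi_j = \sum_j C_{ij}\,\psi_j$$

$$C_{ij} \equiv \int \phi_i^\dagger(x)(1 + i\omega(x)\gamma_5)\phi_j(x)\,d^4x = \delta_{ij} + i\int \phi_i^\dagger(x)\,\omega(x)\gamma_5\,\phi_j(x)\,d^4x$$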

The Dirac conjugate fields ψ̄ produce a similar result, as the sign of the ω term in
(15.173) is the same as for ψ (due to the fact that ψ̄ = ψ † γ̂4 , giving an extra minus
sign when the γ5 is commuted through γ̂4 ):

$$\bar\psi'_i = \sum_j \bar\psi_j\, C_{ji} \tag{15.178}$$

The effect of the change of variable (15.173) is therefore to introduce a Jacobian factor
(cf. (10.116))

$$J = \det(C)^{-2} = e^{-2\,{\rm Tr}\ln(C)} = e^{-2i\int \sum_i \omega(x)\,\phi_i^\dagger(x)\gamma_5\phi_i(x)\,d^4x} \equiv e^{-2i\int \omega(x)\,\mathcal{A}(x)\,d^4x} \tag{15.179}$$

where we again remind the reader that we are working to first order in the gauge
parameter ω(x). Note that if we had been working with the vector symmetry induced
by the phase transformation ψ → eiω ψ (with no γ5 ), there would be no mass term in
the variation of the Lagrangian, as in (15.176), and the functional Jacobian would be
a product of the determinants of matrices C and C̄ given by

$$C_{ij} = \delta_{ij} + i\int \phi_i^\dagger(x)\,\omega(x)\,\phi_j(x)\,d^4x, \qquad \bar C_{ij} = \delta_{ij} - i\int \phi_i^\dagger(x)\,\omega(x)\,\phi_j(x)\,d^4x \tag{15.180}$$

which is simply (to order ω) unity, and we would recover the standard (non-anomalous)
Ward–Takahashi identities (as in Section 12.5). For the chiral current, however, the
functional Jacobian (15.179) is a non-trivial, though gauge-invariant functional of
the gauge fields (as ψ(x) → U (x)ψ(x) induces the change φi (x) → U (x)φi (x), which
leaves Cij unchanged), which we shall shortly evaluate explicitly. Before doing that, let
us assemble the various pieces needed to obtain the chiral Ward–Takahashi identity,
which is simply the statement that the functional Zferm is unchanged by the functional
change of variables (15.173)—provided, of course, that we take into account properly
any non-trivial Jacobians induced by the change. After subjecting Zferm to the change
of variable, we find that the first-order (in ω) change in the integral takes the form

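$$0 = \delta Z_{\rm ferm}[\eta,\bar\eta] = \int d^4x\,\omega(x)\,W[\eta,\bar\eta;x] + O(\omega^2)$$

$$\Rightarrow\; 0 = W[\eta,\bar\eta;x] = \int \mathcal{D}\psi\,\mathcal{D}\bar\psi\,\Big\{ i\big(\bar\eta(x)\gamma_5\psi(x) + \bar\psi(x)\gamma_5\eta(x)\big) - 2i\mathcal{A}(x) + \frac{1}{\hbar}\big(\partial_\mu(\bar\psi\hat\gamma_\mu\gamma_5\psi(x)) - 2im\bar\psi\gamma_5\psi(x)\big)\Big\}\cdot \exp\Big\{\frac{1}{\hbar}\int \mathcal{L}_{\rm ferm}\,d^4x + \int\big(\bar\eta(x)\psi(x)+\bar\psi(x)\eta(x)\big)\,d^4x\Big\} \tag{15.181}$$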

which is the analog of the Noether functional theorem (12.141) for our anomalous
symmetry. The Ward–Takahashi (WT) identities (analogous to (12.142)) are obtained
by differentiating W[η, η̄; x] (=0) with respect to the fermionic sources η(x), η̄(x)
and then setting the sources to zero, whereupon we obtain a set of relations among
the Euclidean n-point (Schwinger) functions of the theory, which are the analytic

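continuations of the multi-fermion Minkowski Green functions (T-products) of the
theory. For example, applying $\frac{\delta^2}{\delta\bar\eta_\alpha(y)\,\delta\eta_\beta(z)}$, we obtain (cf. (12.142)),

$$-i\langle(\gamma_5\psi(x))_\alpha\bar\psi_\beta(z)\rangle\,\delta^4(x-y) + i\langle(\bar\psi(x)\gamma_5)_\beta\psi_\alpha(y)\rangle\,\delta^4(x-z) + 2i\mathcal{A}(x)\langle\psi_\alpha(y)\bar\psi_\beta(z)\rangle - \frac{1}{\hbar}\big\langle\big(\partial_\mu(\bar\psi\hat\gamma_\mu\gamma_5\psi(x)) - 2im\bar\psi\gamma_5\psi(x)\big)\psi_\alpha(y)\bar\psi_\beta(z)\big\rangle = 0 \tag{15.182}$$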


If we analytically continue back to Minkowski space, the Euclidean correlators become


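T-products in the usual way, and we find the Minkowski space WT identity¹²

$$\langle 0|T\{-(\gamma_5\psi(x))_\alpha\bar\psi_\beta(z)\,\delta^4(x-y) + (\bar\psi(x)\gamma_5)_\beta\psi_\alpha(y)\,\delta^4(x-z)\}|0\rangle - \frac{1}{\hbar}\langle 0|\frac{\partial}{\partial x^\mu}T\{\bar\psi\gamma^\mu\gamma_5\psi(x)\,\psi_\alpha(y)\bar\psi_\beta(z)\} - 2im\,T\{\bar\psi\gamma_5\psi(x)\,\psi_\alpha(y)\bar\psi_\beta(z)\}|0\rangle + 2i\mathcal{A}_M(x)\langle 0|T\{\psi_\alpha(y)\bar\psi_\beta(z)\}|0\rangle = 0 \tag{15.183}$$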

where AM is the Minkowski continuation of the functional anomaly (to be determined


below). This result generalizes in the obvious way to arbitrary numbers of ψ and ψ̄
fields, by taking the appropriate higher functional derivatives with respect to η, η̄. As in
the case of the non-anomalous WT identity (12.142), if we reduce out the fermions by
taking the ψα (y)ψ̄β (z) fields on-shell and applying the LSZ formula, the contact terms
disappear and we find that arbitrary matrix elements of the operator combination
$\partial_\mu J_5^\mu(x) - 2im\bar\psi\gamma_5\psi(x) - 2i\hbar\mathcal{A}_M(x)$ vanish, and we therefore conclude that

$$\partial_\mu J_5^\mu(x) = 2im\bar\psi\gamma_5\psi(x) + 2i\hbar\,\mathcal{A}_M(x) \tag{15.184}$$

We see that even in the massless limit, when the mass term on the right-hand side
vanishes, and the chiral symmetry (15.169) becomes an exact Noether symmetry at the
classical level, there is a remaining non-zero contribution coming from the functional
anomaly AM , the quantum origins of which are apparent in the explicit prefactor of
Planck’s constant multiplying the anomaly.
In order to compute the explicit form of the functional anomaly A (back in
Euclidean space), we must recall that although we have already discretized the
spectrum of D̂ / by placing the system in a finite spacetime volume, there is as
yet no short-distance (or high-momentum) cutoff, and the determinant therefore
involves an ill-defined product of infinitely many eigenvalues. We may regularize it
in a gauge-invariant way by observing that the eigenvalues λi are gauge-invariant
(see Problem 10), so that the inclusion of a factor $e^{-\lambda_i^2/\Lambda^2}$ in the trace in (15.179),
where Λ is an ultraviolet cutoff, amounts to a smooth gauge-invariant tempering of the
short-distance modes of the theory, which should be removed after evaluation of the
determinant by letting the cutoff Λ → ∞. Thus, we define the regularized functional

12 The contact terms lose a factor of i as a consequence of the Wick rotation of the four-dimensional
δ-functions, where one coordinate—the time—is rotated by a factor of i.

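anomaly $\mathcal{A}_\Lambda$ as

$$\mathcal{A}_\Lambda \equiv \sum_i \phi_i^\dagger(x)\,\gamma_5\, e^{-\lambda_i^2/\Lambda^2}\phi_i(x) = \sum_i \phi_i^\dagger(x)\,\gamma_5\, e^{-\hat{\slashed{D}}^2/\Lambda^2}\phi_i(x) \tag{15.185}$$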

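By the usual spectral analysis of self-adjoint operators, the discrete completeness sum
is equivalent to one over plane wavefunctions, as for any operator O, writing in Dirac
notation $\phi_i(x) = \langle x|i\rangle$,

$$\sum_i \phi_i^\dagger(x)\,O\,\phi_i(x) = \sum_i \langle i|x\rangle\langle x|O|i\rangle = \langle x|{\rm Tr}(O)|x\rangle = \int \frac{d^4k}{(2\pi)^4}\,\phi_k(x)^\dagger\,{\rm Tr}(O)\,\phi_k(x), \qquad \phi_k(x) = e^{ik\cdot x} \tag{15.186}$$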

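where the trace operation Tr extends over the discrete γ-matrix and internal gauge
indices only. With $O = \gamma_5\, e^{-\hat{\slashed{D}}^2/\Lambda^2}$ we therefore have

$$\mathcal{A}_\Lambda = \int \frac{d^4k}{(2\pi)^4}\,{\rm Tr}\big(\gamma_5\, e^{-ik\cdot x}\, e^{-\hat{\slashed{D}}^2/\Lambda^2}\, e^{ik\cdot x}\big) \tag{15.187}$$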

We may write the discrete trace Tr = trD trG where we explicitly separate the traces
over Dirac (trD ) and fundamental representation gauge (trG ) indices. Now

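$$\hat\gamma_\mu\hat\gamma_\nu = \frac{1}{2}\{\hat\gamma_\mu,\hat\gamma_\nu\} + \frac{1}{2}[\hat\gamma_\mu,\hat\gamma_\nu] = \delta_{\mu\nu} + \frac{1}{2}[\hat\gamma_\mu,\hat\gamma_\nu] \tag{15.188}$$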

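whence (recall that in Euclidean space $\hat{\slashed{D}} = \hat\gamma_\mu D_\mu$ where $D_\mu = -i\partial_\mu + t_\alpha A_{\alpha\mu} = -i\partial_\mu + A_\mu$)

$$\hat{\slashed{D}}^2 = D_\mu D_\mu + \frac{1}{4}[\hat\gamma_\mu,\hat\gamma_\nu]\,[D_\mu, D_\nu] = D_\mu D_\mu - \frac{i}{4}[\hat\gamma_\mu,\hat\gamma_\nu]\,F_{\mu\nu} \tag{15.189}$$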

The factors of e−ik·x . . . eik·x in (15.187) merely serve to translate the covariant
derivative Dμ by the four-vector kμ ,

$$e^{-ik\cdot x}\, D_\mu\, e^{ik\cdot x} = D_\mu + k_\mu \tag{15.190}$$

from which we then see, using (15.189), that


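$$e^{-ik\cdot x}\, e^{-\hat{\slashed{D}}^2/\Lambda^2}\, e^{ik\cdot x} = e^{-e^{-ik\cdot x}\hat{\slashed{D}}^2 e^{ik\cdot x}/\Lambda^2} = e^{-\frac{1}{\Lambda^2}\left((k_\mu + D_\mu)^2 - \frac{i}{4}[\hat\gamma_\mu,\hat\gamma_\nu]F_{\mu\nu}\right)} = e^{-k_\mu k_\mu/\Lambda^2}\cdot e^{-\frac{1}{\Lambda^2}\left(2k_\mu D_\mu + D_\mu D_\mu - \frac{i}{4}[\hat\gamma_\mu,\hat\gamma_\nu]F_{\mu\nu}\right)} \tag{15.191}$$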

At this point we shall need some simple Dirac trace identities, which we leave to the
reader to check (remembering that our Euclidean γ matrices are now all hermitian

and satisfy trD (γ̂μ γ̂ν ) = 4δμν ). Specifically,

trD (γ5 ) = 0 (15.192)


trD (γ5 γ̂μ γ̂ν ) = 0 (15.193)
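$${\rm tr}_D(\gamma_5\hat\gamma_\mu\hat\gamma_\nu\hat\gamma_\rho\hat\gamma_\sigma) = 4\epsilon_{\mu\nu\rho\sigma} \tag{15.194}$$

where $\epsilon_{\mu\nu\rho\sigma}$ is the completely antisymmetric Euclidean four-tensor with $\epsilon_{1234} = 1$. It
follows that when the expression (15.191) is traced with γ5 , the only terms which
survive involve expanding out at least two factors of the term with $\frac{1}{\Lambda^2}[\hat\gamma_\mu,\hat\gamma_\nu]F_{\mu\nu}$ in
the second exponential, in order to obtain the requisite minimum of four γ matrices
needed to provide a non-vanishing trace. Pulling down additional factors, such as
$(k_\mu D_\mu)^2/\Lambda^4$ or $D_\mu^2/\Lambda^2$, leads to integrals over k which (together with the accompanying
inverse powers of Λ) vanish in the infinite cutoff limit, for example:

$$\frac{1}{\Lambda^4}\int \frac{k^2}{\Lambda^4}\, e^{-k^2/\Lambda^2}\,\frac{d^4k}{(2\pi)^4} = \frac{1}{8\pi^2}\frac{1}{\Lambda^2} \to 0, \quad \Lambda \to \infty \tag{15.195}$$

$$\frac{1}{\Lambda^4}\int \frac{1}{\Lambda^2}\, e^{-k^2/\Lambda^2}\,\frac{d^4k}{(2\pi)^4} = \frac{1}{16\pi^2}\frac{1}{\Lambda^2} \to 0, \quad \Lambda \to \infty \tag{15.196}$$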
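The Gaussian momentum integrals quoted in (15.195) and (15.196) can be checked numerically. The following is a minimal sketch, assuming only that a rotation-invariant four-dimensional Euclidean integral reduces to a radial one, ∫ d⁴k/(2π)⁴ f(k²) = (1/8π²) ∫ k³ f(k²) dk; the function names and the Simpson quadrature are illustrative choices, not taken from the text.

```python
import math

def radial_integral(f, upper, steps=20000):
    """Simpson estimate of int d^4k/(2 pi)^4 f(k^2) = (1/(8 pi^2)) int_0^upper k^3 f(k^2) dk."""
    h = upper / steps                      # steps must be even for Simpson's rule
    total = 0.0
    for n in range(steps + 1):
        k = n * h
        w = 1 if n in (0, steps) else (4 if n % 2 else 2)
        total += w * k**3 * f(k * k)
    return total * h / 3 / (8 * math.pi**2)

def check(cutoff):
    """Return the basic Gaussian integral and the left-hand sides of (15.195), (15.196)."""
    L2 = cutoff**2
    gauss = lambda k2: math.exp(-k2 / L2)
    upper = 12 * cutoff                    # integrand is negligible beyond this point
    i0 = radial_integral(gauss, upper)
    i195 = radial_integral(lambda k2: k2 / L2**2 * gauss(k2), upper) / L2**2
    i196 = radial_integral(lambda k2: gauss(k2) / L2, upper) / L2**2
    return i0, i195, i196

cutoff = 2.0
i0, i195, i196 = check(cutoff)
print(i0, cutoff**4 / (16 * math.pi**2))          # basic Gaussian: Lambda^4 / (16 pi^2)
print(i195, 1 / (8 * math.pi**2 * cutoff**2))     # (15.195)
print(i196, 1 / (16 * math.pi**2 * cutoff**2))    # (15.196)
```

Both suppressed terms fall off as 1/Λ², while the basic Gaussian integral grows as Λ⁴ and so compensates exactly two inverse powers of Λ² from the expansion.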
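Therefore, keeping only the term quadratic in $\frac{1}{\Lambda^2}[\hat\gamma_\mu,\hat\gamma_\nu]F_{\mu\nu}$, we find that for large Λ,

$$\mathcal{A}_\Lambda \to \int {\rm Tr}\Big(\gamma_5\,\frac{1}{2}\Big(\frac{i}{4\Lambda^2}\Big)^2[\hat\gamma_\mu,\hat\gamma_\nu][\hat\gamma_\rho,\hat\gamma_\sigma]F_{\mu\nu}F_{\rho\sigma}\Big)\, e^{-k^2/\Lambda^2}\,\frac{d^4k}{(2\pi)^4}$$

$$= -\frac{1}{32\Lambda^4}\,{\rm tr}_D(\gamma_5[\hat\gamma_\mu,\hat\gamma_\nu][\hat\gamma_\rho,\hat\gamma_\sigma])\,{\rm tr}_G(F_{\mu\nu}F_{\rho\sigma})\cdot\frac{\Lambda^4}{16\pi^2}$$

$$= -\frac{1}{16\pi^2}\,{\rm tr}_G(F_{\mu\nu}\tilde F_{\mu\nu}), \qquad \tilde F_{\mu\nu} \equiv \frac{1}{2}\epsilon_{\mu\nu\rho\sigma}F_{\rho\sigma} \tag{15.197}$$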
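The Dirac traces entering (15.192) to (15.197) can be verified by brute force. The sketch below assumes one particular hermitian Euclidean representation, γ̂j = σ2 ⊗ σj (j = 1, 2, 3) and γ̂4 = σ1 ⊗ 1, chosen here for convenience and not taken from the text; the traces themselves do not depend on that choice. It checks trD(γ5) = 0, trD(γ5 γ̂μ γ̂ν) = 0, trD(γ5 γ̂μ γ̂ν γ̂ρ γ̂σ) = 4ε, and the commutator trace trD(γ5 [γ̂μ, γ̂ν][γ̂ρ, γ̂σ]) = 16ε used in the last step of (15.197).

```python
from itertools import product

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def comm(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(len(a))] for i in range(len(a))]

def levi_civita(idx):
    """Sign of the permutation idx of (0, 1, 2, 3); zero if an index repeats."""
    if len(set(idx)) < len(idx):
        return 0
    idx, sign = list(idx), 1
    for i in range(len(idx)):
        while idx[i] != i:
            j = idx[i]
            idx[i], idx[j] = idx[j], idx[i]
            sign = -sign
    return sign

I2 = [[1, 0], [0, 1]]
S1 = [[0, 1], [1, 0]]
S2 = [[0, -1j], [1j, 0]]
S3 = [[1, 0], [0, -1]]
# Assumed representation: hermitian Euclidean gamma matrices with {g_mu, g_nu} = 2 delta
gam = [kron(S2, S1), kron(S2, S2), kron(S2, S3), kron(S1, I2)]
g5 = matmul(matmul(gam[0], gam[1]), matmul(gam[2], gam[3]))   # gamma_5 = g1 g2 g3 g4

def check_traces(tol=1e-12):
    ok = abs(trace(g5)) < tol
    for m, n in product(range(4), repeat=2):
        ok &= abs(trace(matmul(g5, matmul(gam[m], gam[n])))) < tol
    for m, n, r, s in product(range(4), repeat=4):
        eps = levi_civita((m, n, r, s))
        four = matmul(matmul(gam[m], gam[n]), matmul(gam[r], gam[s]))
        cc = matmul(comm(gam[m], gam[n]), comm(gam[r], gam[s]))
        ok &= abs(trace(matmul(g5, four)) - 4 * eps) < tol
        ok &= abs(trace(matmul(g5, cc)) - 16 * eps) < tol
    return ok

print(check_traces())
```

With the commutator trace equal to 16ε, the prefactor −1/32 and the Gaussian factor Λ⁴/16π² combine to −(1/32π²) ε tr(FF), which is the −(1/16π²) tr(F F̃) of (15.197).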
The result, gratifyingly, depends only on the field tensor Fμν , which transforms under
local gauge transformations as Fμν → U (x)Fμν U † (x), so that the group trace in
(15.197) is gauge-invariant, as desired. For the abelian case, we simply obtain (without
the trace) a result proportional to Fμν F̃μν where Fμν is the single field ∂μ Aν − ∂ν Aμ .
If we now rotate back to Minkowski space, a factor of −i appears (in each term of
Fμν F̃μν there are three space and one time indices), so the Minkowski anomaly is

$$\mathcal{A}_M = i\,\frac{1}{16\pi^2}\,{\rm tr}_G(F_{\mu\nu}\tilde F^{\mu\nu}) \tag{15.198}$$

and the axial current divergence (15.184) becomes

$$\partial_\mu J_5^\mu(x) = 2im\bar\psi\gamma_5\psi(x) - \hbar\,\frac{1}{8\pi^2}\,{\rm tr}_G(F_{\mu\nu}\tilde F^{\mu\nu}) \tag{15.199}$$
If we return to the canonical normalization of the gauge fields (by reversing the scaling
$A_\mu \to \frac{1}{g}A_\mu$ in (15.160)), the anomaly term is seen to acquire an explicit factor of
the squared coupling constant, and we obtain the usual form for the axial current
divergence (setting, as usual in this book, ħ = 1):

$$\partial_\mu J_5^\mu(x) = 2im\bar\psi\gamma_5\psi(x) - \frac{g^2}{8\pi^2}\,{\rm tr}_G(F_{\mu\nu}\tilde F^{\mu\nu}) \tag{15.200}$$
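The numerical coefficient in (15.199) and (15.200) results from a short chain of factors, which can be tracked explicitly. The sketch below is bookkeeping only; the variable names are illustrative, and all coefficients are quoted in units of 1/π² (with the Gaussian integral in units of Λ⁴/π²).

```python
# Second-order term of exp[+(i / 4 Lambda^2) [g_mu, g_nu] F_mu_nu] in (15.191)
expansion = 0.5 * (1j / 4) ** 2          # = -1/32

dirac = 16      # tr_D(g5 [g_mu, g_nu][g_rho, g_sigma]) = 16 eps_{mu nu rho sigma}
dual = 2        # eps_{mu nu rho sigma} F_mu_nu F_rho_sigma = 2 F_mu_nu F~_mu_nu
gauss = 1 / 16  # int d^4k/(2 pi)^4 exp(-k^2/Lambda^2) = Lambda^4 / (16 pi^2)

a_euclid = expansion * dirac * dual * gauss   # coefficient of tr_G(F F~) in (15.197)
a_mink = -1j * a_euclid                       # Wick rotation factor -i, giving (15.198)
divergence = 2j * a_mink                      # the 2 i hbar A_M term of (15.184)

print(a_euclid)    # -1/16
print(a_mink)      # +i/16
print(divergence)  # -1/8
```

Every factor is a power of two, so the floating-point arithmetic here is exact.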

We see that the coefficient of the anomaly is an exceedingly simple function of g,
and can indeed be determined by a second-order calculation in perturbation theory,
with no further modifications at higher order, as first demonstrated explicitly using
graph-theoretic methods by Adler and Bardeen (Adler and Bardeen, 1969) (a result
generally referred to as the “Adler-Bardeen non-renormalization theorem”).
The chiral anomaly is of enormous importance in the physics of the Standard
Model. It is at the heart of the current algebra derivation of the neutral pion decay
rate to two photons (Bell and Jackiw, 1969), and of the resolution of the famous
“U(1) problem”, where the absence of a low-mass pseudoscalar isosinglet meson—
despite the apparent prediction of such a particle via Goldstone’s theorem applied
to broken global chiral symmetry in QCD—is directly attributable to the anomalous
breaking of chiral current conservation (’t Hooft, 1976). Later, in Part 4 of the book
where we address issues of renormalizability, we shall see that quantum anomalies also
place severe restrictions on the construction of theories with local gauge symmetry.
Although our derivation of the chiral anomaly above was carried out in a gauge
theory, the gauge particles were coupled to non-anomalous exactly conserved vector
currents, and the anomalous axial current was associated with a global (non-gauged)
symmetry of the theory. It turns out that interacting local gauge field theories require
exact conservation of the associated currents (whose charges generate the symmetry
transformations associated with the global restriction of the gauge group) in order to
be consistent, although the local gauge symmetry may be spontaneously broken by
the ground state of the theory (as we shall discuss in the next section). This requires
that theories such as the electroweak component of the Standard Model, which are
rife with axial currents coupled to the weak bosons of the theory, must satisfy an
anomaly cancellation condition (see Problem 11) whereby the contributions to the
anomaly in any gauged current from the various fermions coupled to that gauge boson
are required to cancel in order to maintain consistency (and renormalizability) of the
theory.
Another quantum anomaly of great importance in modern field theory is the
trace anomaly, which arises in the divergence of the dilatation current, which we
discussed briefly in the context of scalar field theories in Section 12.5. For a local
gauge theory ((15.112) with a single fermion multiplet), the (appropriately improved
(Freedman et al., 1974)) energy-momentum tensor gives rise to a dilatation current,
as in (12.117), with a non-vanishing trace even in the limit of massless fermions. In
this case, in contrast to the axial anomaly, the trace anomaly receives contributions
in all orders of perturbation, and a detailed graph theoretic analysis (using techniques
of renormalization theory which we must defer to Part 4) gives ((Adler et al., 1977),
(Collins et al., 1977)):

$$T^\mu{}_\mu = \frac{\beta(g)}{2g}\,{\rm Tr}(F_{\mu\nu}F^{\mu\nu}) + (1 + \gamma(g))\,m\bar\psi\psi \tag{15.201}$$

Here the functions β(g), γ(g) are well-defined functions of the renormalized coupling
g which are related to the coupling and mass renormalizations of the theory and can
be calculated explicitly order by order in perturbation theory. The β(g) function in
particular—the famous “β function” of the renormalization group—plays a critical
role in understanding the scaling behavior of gauge theories, and will be discussed
in detail later in the book. The intimate connection between the trace anomaly and
the scaling behavior of interacting field theories should come as no surprise when
we recall its origin in our attempt to formulate a Noether current for the classical
dilatation symmetry of a Lagrangian with no dimensionful couplings (such as the
Yang–Mills Lagrangian (15.112) with all fermion masses zero). The trace anomaly
can also be understood at low orders (one loop) from a functional integral point
of view (Fujikawa, 1981)—again, as for the chiral anomaly, the culprit is a non-
trivial functional Jacobian—but it is difficult to obtain a rigorous all-orders result,
as in (15.201), by this technique. Although the field theories of the Standard Model
have non-vanishing β(g) functions and are definitely not conformally (or dilatation)
invariant, there are examples of supersymmetric field theories (N =4 supersymmetric
Yang–Mills is the classic case) where the trace anomalies contributed by the various
fields of the theory cancel and the β function appears to vanish to all orders of
perturbation theory, suggesting an exactly conformally invariant (and even UV finite!)
theory. The cautionary verb “appears” is used here because of the annoying fact that
there is no known ultraviolet regulator which can be used to give a definite meaning
to all the perturbative amplitudes of the theory while preserving both the local gauge
invariance and the global supersymmetry which are essential ingredients in the formal
arguments leading to the asserted conformal invariance.
As a final example of the important role played by quantum anomalies in modern
particle theory, we may mention here the purely gravitational anomalies that arise in
field theories in higher dimensions—in particular, in theories with local supersymmetry
(supergravity theories). In this case, the anomalous current is the energy-momentum
tensor itself! In any generally covariant theory of gravity, the graviton must couple
to a covariantly conserved energy-momentum tensor arising from the matter fields,
and it turns out (Alvarez-Gaumé and Witten, 1983) that the required cancellation of
potential anomalies in the Ward identity expressing this conservation requires very
careful choice of the fermionic representation content of the theory. The observation
that the required gravitational anomaly cancellations corresponded to supergravity
theories (in ten dimensions) which were the low-energy limits of a special class of
superstring theories played a seminal role in the renaissance of string theory in the
mid-1980s.

15.6 Spontaneous symmetry-breaking in theories with a local gauge symmetry

We saw in our study of spontaneous symmetry-breaking in Chapters 8 and 14 that
while the dynamics (as specified, say, by a Lagrangian density) of a theory may possess
an exact global symmetry, the energetics of the system may lead to a ground state
which is not itself invariant under the symmetry transformation. If the symmetry
is a continuous one, the Goldstone theorem then implies the appearance of exactly

massless particles in the spectrum of the theory. In point of fact, spontaneously broken
symmetries are much more prevalent than massless particles in Nature, so there must
clearly exist a mechanism for avoiding the consequences of the Goldstone theorem in
most cases. Sometimes, of course, the global symmetry is only approximate, so the
associated Goldstone modes are merely “light” particles, rather than exactly massless
ones. Such creatures are then referred to as “pseudo-Goldstone” particles.
But in the case of electroweak interactions in the Standard Model, we encounter
a situation in which the spontaneous breakdown is associated with a local symmetry,
corresponding to a Lagrangian which is exactly locally gauge-invariant but in which
the vacuum (ground state) of the theory breaks the associated global charge. It should
be emphasized that the underlying local gauge symmetry is always present, as it
is simply the reflection of a redundancy in the field variables in the un-gauge-fixed
Lagrangian: indeed, a famous theorem due to Elitzur (Elitzur, 1975) assures us that
the vacuum-expectation-value of any non-gauge-invariant quantity always vanishes
in a theory with an exact local symmetry, in the absence of gauge-fixing. Once a
gauge is fixed, however, to remove the redundant degrees of freedom, the remaining
(discrete!) global symmetry may undergo spontaneous symmetry-breaking exactly
along the lines discussed in the previous chapter. The phrase “spontaneous breaking
of local gauge symmetry” is therefore in some sense a misnomer, but a convenient
one, if we think of it as a short circumlocution for “spontaneous breaking of remnant
global symmetry after removal of redundant gauge degrees of freedom by appropriate
gauge-fixing”.
In the presence of local gauge symmetry, the conditions discussed in Section
14.2 for the applicability of the Goldstone theorem are not present, and instead
of producing massless Goldstone particles we move in exactly the opposite direction,
with the emergence of massive vector particles corresponding to gauge fields
with no mass term in the Lagrangian! This remarkable phenomenon, discovered
in the 1960s by Higgs (Higgs, 1964) (and independently, by several other workers)
but already prefigured in the Ginzburg–Landau model of superconductivity
(where the appearance of a photon “mass” underlies the exponential Meissner screening
of the magnetic field in the superconducting medium), is at the core of our
present understanding of the electroweak sector of the Standard Model of elementary
particles. The physical mechanism underlying the Higgs phenomenon can be completely
understood in a simple abelian model (which Higgs himself used to illustrate
the essential idea). We start with the Lagrangian for a complex scalar field φ(x)
coupled gauge-invariantly to a vector field Aν , with polynomial self-coupling P (φ∗ φ)
for the scalar field:
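$$\mathcal{L} = -\frac{1}{4}F_{\nu\rho}F^{\nu\rho} + (\partial_\nu - igA_\nu)\phi^* \cdot (\partial^\nu + igA^\nu)\phi - P(\phi^*\phi) \tag{15.202}$$

which is clearly invariant under the local abelian transformations

$$\phi(x) \to e^{ig\Lambda(x)}\phi(x) \tag{15.203}$$

$$\phi^*(x) \to e^{-ig\Lambda(x)}\phi^*(x) \tag{15.204}$$

$$A_\nu(x) \to A_\nu(x) - \partial_\nu\Lambda(x) \tag{15.205}$$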

If the quadratic term in the scalar potential P (φ∗ φ) is positive,

P (φ∗ φ) = +μ2 φ∗ φ + λ(φ∗ φ)2 (15.206)

the vacuum state occurs for vanishing expectation value (VEV) of the scalar field,
$\langle 0|\phi(x)|0\rangle = 0$, and the gauge symmetry is preserved by the vacuum. The theory then
corresponds quite simply to the scalar quantum electrodynamics of a charged massive
spinless particle coupled to a massless photon. If the quadratic coefficient is negative,
on the other hand,

P (φ∗ φ) = −μ2 φ∗ φ + λ(φ∗ φ)2 (15.207)

the classical energy density is minimized for fields with magnitude $|\phi(x)| = \mu/\sqrt{2\lambda} \equiv v$,
and we must expect the quantum scalar field to acquire a non-vanishing VEV as well,
which to lowest order in the coupling is just the value v. In the absence of a coupling
to the gauge field (i.e., setting g = 0) we would, of course, simply shift the scalar field
by defining φ(x) = v + φ̂(x), and discover on rewriting the Lagrangian in terms of the
shifted field that the real component of the field φ̂R possesses a sensible (positive) non-
zero mass term, while the imaginary part φ̂I has no quadratic part and corresponds
to the massless Goldstone mode.
For g ≠ 0 the result is altogether different. The physical spectrum of the theory
is most easily exposed in this case by employing the full (and exact) local gauge
symmetry of the theory to rotate the complex field to a real value. Thus, writing
$\phi(x) = \frac{1}{\sqrt{2}}(\phi_R(x) + i\phi_I(x))$, where φR , φI are self-conjugate (and with the canonical
normalization of their kinetic terms), the gauge symmetry (15.203) can clearly be used
to set φI (x) = 0 identically. In this “unitary” gauge, the Lagrangian becomes

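$$\mathcal{L} = -\frac{1}{4}F_{\nu\rho}F^{\nu\rho} + \frac{1}{2}(\partial_\nu - igA_\nu)\phi_R \cdot (\partial^\nu + igA^\nu)\phi_R - P\big(\tfrac{1}{2}\phi_R^2\big) \tag{15.208}$$

If we now shift the single remaining field φR by its VEV $v = \mu/\sqrt{\lambda}$ to reflect the
appropriate VEV for the ground state

$$\phi_R(x) \equiv \frac{\mu}{\sqrt{\lambda}} + \psi_R(x) \tag{15.209}$$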

the Lagrangian becomes

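$$\mathcal{L} = -\frac{1}{4}F_{\nu\rho}F^{\nu\rho} + \frac{\mu^2 g^2}{2\lambda}A_\nu A^\nu + \frac{1}{2}(\partial_\nu - igA_\nu)\psi_R \cdot (\partial^\nu + igA^\nu)\psi_R - \mu^2\psi_R^2 + \frac{\mu g^2}{\sqrt{\lambda}}A_\nu A^\nu\psi_R - \mu\sqrt{\lambda}\,\psi_R^3 - \frac{1}{4}\lambda\psi_R^4 \tag{15.210}$$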

We recognize this as the theory of a massive Maxwell–Proca field Aν , with mass
(to lowest order) given by $\mu g/\sqrt{\lambda} = gv \equiv m_A$, and a massive real scalar field ψR , with
mass (again to lowest order) given by $\sqrt{2}\mu$.¹³ The remnant massive physical spin-zero
particle associated with ψR has become known universally as the “Higgs particle”. In
addition to the usual scalar self-couplings, there are vertices in the theory corresponding
to cubic $A_\nu A^\nu\psi_R$ and quartic $A_\nu A^\nu\psi_R^2$ interactions of the massive vector with the
Higgs particle.
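The scalar-potential terms of (15.210) follow from expanding P(φR²/2) = −(μ²/2)φR² + (λ/4)φR⁴ about φR = v + ψR with v = μ/√λ. The sketch below carries out that expansion with exact rational arithmetic. The function name and the sample values μ = 3, √λ = 2 are illustrative choices (λ is taken to be a perfect square only so that v is rational).

```python
from fractions import Fraction
from math import comb

def shifted_potential(mu, sqrt_lam):
    """Coefficients c[n] of psi^n in P(phi_R^2 / 2), with phi_R = v + psi, v = mu / sqrt(lambda)."""
    mu, s = Fraction(mu), Fraction(sqrt_lam)
    lam, v = s * s, mu / s
    coeffs = [Fraction(0)] * 5
    for n in range(3):      # binomial expansion of -(mu^2 / 2) (v + psi)^2
        coeffs[n] += -mu**2 / 2 * comb(2, n) * v ** (2 - n)
    for n in range(5):      # binomial expansion of +(lambda / 4) (v + psi)^4
        coeffs[n] += lam / 4 * comb(4, n) * v ** (4 - n)
    return coeffs

c = shifted_potential(3, 2)          # mu = 3, lambda = 4, v = 3/2
print("linear term   :", c[1])       # vanishes at the minimum
print("psi^2 term    :", c[2])       # mu^2, so the Higgs mass squared is 2 mu^2
print("psi^3 term    :", c[3])       # mu * sqrt(lambda)
print("psi^4 term    :", c[4])       # lambda / 4
```

Since the Lagrangian contains −P, a ψR² coefficient of μ² corresponds to a mass term −(1/2)(2μ²)ψR², that is, a lowest-order scalar mass of √2 μ.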
Note that if we count physical degrees of freedom, there is no discontinuity as we
pass smoothly (by varying μ) from the symmetry-unbroken phase of theory (15.206) to
the symmetry-broken phase (15.207). In the former theory we have a complex massive
field with two independent modes (as the particle and antiparticle are distinct) and a
massless spin-1 vector field, with two independent polarizations; in the latter, a massive
spin-1 field corresponding to a particle with three independent polarizations and a self-
conjugate spin-zero Higgs particle (one degree of freedom). One sometimes hears this
transition described with the rather colorful language: “the massless gauge particle, by
eating the would-be Goldstone boson induced by symmetry-breaking, converts itself
into a massive vector particle!” This is really somewhat misleading: we are in either
the symmetry unbroken phase, with a massless vector and massive complex scalar,
or in the broken phase, where our initial choice of field variables in the Lagrangian
(15.202) really corresponds to a misidentification of the correct, in this case completely
massive, physical degrees of freedom of the theory.
Finally, we should note here, in anticipation of our discussions of renormalizability
in Part 4, that the quantization of this abelian Higgs theory in unitary gauge, as given
by the Lagrangian (15.210), gives rise to ultraviolet singularities in the (off-shell) Green
functions of the theory which cannot be renormalized (i.e., absorbed into redefinitions
of Lagrangian parameters). The potential for problems of this sort becomes apparent
when we recall that the momentum-space propagator of our massive vector field
contains a numerator factor $g^{\mu\nu} - k^\mu k^\nu/m_A^2$ (cf. Chapter 7, Problem 8) which leads
(when divided by the $k^2 - m_A^2$ denominator factor) to a propagator of order unity at
large momentum, with the consequent appearance of uncurable ultraviolet divergences
in the perturbative expansion of the Green functions. We shall see below how this
problem can be cured by a different choice of gauge, which of course does not alter
the S-matrix, but provides a smooth (and ultraviolet convergent) off-shell extension
of the Green functions of the theory.
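The large-momentum behavior described above is easy to see numerically. The sketch below evaluates the 00 component of the unitary-gauge vector propagator along the ray k = s·(2, 0, 0, 1) and compares it with a scalar propagator. It assumes metric signature (+, −, −, −), drops overall factors of i and the iε prescription, and uses illustrative function names and an arbitrary choice of direction.

```python
def proca_propagator(k, m, mu, nu):
    """(g^{mu nu} - k^mu k^nu / m^2) / (k^2 - m^2), signature (+,-,-,-), factors of i omitted."""
    metric = (1.0, -1.0, -1.0, -1.0)
    k2 = sum(metric[a] * k[a] ** 2 for a in range(4))
    g = metric[mu] if mu == nu else 0.0
    return (g - k[mu] * k[nu] / m**2) / (k2 - m**2)

def scalar_propagator(k, m):
    """1 / (k^2 - m^2) for comparison."""
    metric = (1.0, -1.0, -1.0, -1.0)
    k2 = sum(metric[a] * k[a] ** 2 for a in range(4))
    return 1.0 / (k2 - m**2)

m_A = 1.0
for s in (1e1, 1e2, 1e3, 1e4):
    k = (2 * s, 0.0, 0.0, s)          # timelike momentum with k^2 = 3 s^2
    print(f"s = {s:8.0f}   D^00 = {proca_propagator(k, m_A, 0, 0):+.6f}"
          f"   scalar = {scalar_propagator(k, m_A):+.3e}")
```

Along this ray the vector component approaches −4/(3 m_A²) rather than falling off like 1/k², which is the order-unity behavior responsible for the bad ultraviolet power counting in unitary gauge.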
Before proceeding to the general functional quantization (in renormalizable gauge)
of a spontaneously broken local gauge theory, a less trivial example involving a non-
abelian field may help to concretize some of the features which we should expect to
appear in the general case. The theory in question is the Weinberg–Salam model of
leptonic electroweak theory, which forms (with QCD) one of the two legs on which
the modern Standard Model of particle physics stands. Let us take the local gauge
group to be SU(2)×U(1), corresponding to a theory with four gauge mesons, which
we will label $A^\alpha_\nu$, α = 1, 2, 3, or simply $\vec A_\nu$ (associated with the SU(2) group), and $B_\nu$

13 The quantization of this massive vector field can be carried out explicitly along the lines of Problem 4,
Chapter 12. From the point of view of Dirac Hamiltonian theory, the primary constraint Π0 = 0 gives rise to
a secondary constraint (equation of motion for A0 ) which contains the combination $m_A^2 A_0 + \vec\nabla\cdot\vec\Pi$, which
has non-vanishing Poisson brackets with Π0 : i.e., we have a pair of second-class constraints. This is therefore
a theory without the first-class constraints characteristic of a gauge theory—not surprisingly, as we have
eliminated the gauge symmetry by a choice of gauge.
(associated with the abelian U(1) group). The $\vec A_\nu$ gauge fields are coupled with charge
g under SU(2) to a single complex doublet of scalar fields φn , n = 1, 2, which couples
with U(1) charge g′ to the Bν field. Thus, the Lagrangian for the gauge and scalar
fields takes the form

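$$\mathcal{L} = -\frac{1}{4}\vec F_{\nu\rho}\cdot\vec F^{\nu\rho} - \frac{1}{4}G_{\nu\rho}G^{\nu\rho} - P(\phi) + \Big[\Big(\partial_\nu - i\frac{g'}{2}B_\nu - i\frac{g}{2}\vec\tau\cdot\vec A_\nu\Big)\phi\Big]^\dagger\Big(\partial^\nu - i\frac{g'}{2}B^\nu - i\frac{g}{2}\vec\tau\cdot\vec A^\nu\Big)\phi \tag{15.211}$$

$$F_\alpha^{\nu\rho} = \partial^\nu A^\rho_\alpha - \partial^\rho A^\nu_\alpha + g\,\epsilon_{\alpha\beta\gamma}A^\nu_\beta A^\rho_\gamma, \qquad G^{\nu\rho} = \partial^\nu B^\rho - \partial^\rho B^\nu \tag{15.212}$$

$$P(\phi) = -\mu^2\phi^\dagger\phi + \lambda(\phi^\dagger\phi)^2 \tag{15.213}$$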

where $\vec\tau$ are the Pauli matrices, and we have adopted the conventional coupling sign
and normalizations (involving a change of sign relative to (15.202) and a factor of ½).
Again, the physical spectrum is most readily revealed in unitary gauge, so we use the
local SU(2) gauge freedom to rotate the scalar doublet field to eliminate the upper
component and the imaginary part of the lower component, leaving only the real part of
the lower component, which is then shifted to remove (at lowest order of perturbation
theory) the VEV associated with the minimum of P (φ) at $|\phi| = \mu/\sqrt{2\lambda} \equiv \frac{1}{\sqrt{2}}v$:

$$\phi(x) = \begin{pmatrix} 0 \\ \frac{1}{\sqrt{2}}(v + H(x)) \end{pmatrix}$$

The single remaining self-conjugate scalar field H(x) interpolates for the famous, but
as yet undiscovered,14 Higgs particle of the Standard Model. The vacuum expectation
value of the scalar doublet

$$\langle\phi\rangle = \begin{pmatrix} 0 \\ \frac{1}{\sqrt{2}}v \end{pmatrix}$$

generates in the scalar kinetic term in (15.211), as in the Higgs abelian model discussed
previously, a mass term for the four vector bosons of the theory, in this case involving
a squared mass matrix

$$\frac{1}{2}M^2_{\alpha\beta} = \langle\phi^\dagger\rangle\, T_\alpha T_\beta\, \langle\phi\rangle, \qquad \alpha, \beta = 1, 2, 3, Y \tag{15.214}$$

where we have combined the four generators of SU(2)×U(1) in a single notation, with
the Y index referring to the weak hypercharge abelian U (1) subgroup. Thus (using Y
also to indicate the value of the “hypercharge” associated with the U(1) subgroup,

14 As this book goes to press, there are intriguing indications at the Large Hadron Collider at CERN
(Geneva, Switzerland) of a possible Higgs signal at a mass of approximately 125 GeV.

which must be assigned separately to the various field multiplets in the theory)

$$
T_i = \frac{g}{2}\tau_i,\ i = 1, 2, 3, \qquad T_Y = \frac{g'}{2}Y\ \Big(= \frac{g'}{2}\ \text{for}\ \phi\Big), \qquad [T_i, T_j] = i g\,\epsilon_{ijk}T_k, \qquad [T_i, Y] = 0 \tag{15.215}
$$
The vector mass matrix separates into two uncoupled sectors, with the (α, β) = 1, 2
sector giving


$$
\frac{1}{2}M^2 = \frac{1}{8}g^2v^2\begin{pmatrix} 1 & -i \\ i & 1 \end{pmatrix} \tag{15.216}
$$

corresponding to a mass term


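$$
\frac{1}{4}g^2v^2\cdot\frac{1}{\sqrt{2}}(A_{1\nu} - iA_{2\nu})^\dagger\cdot\frac{1}{\sqrt{2}}(A_1^\nu - iA_2^\nu) = M_W^2\,W_\nu^\dagger W^\nu, \qquad M_W^2 = \frac{1}{4}g^2v^2 \tag{15.217}
$$

where we have defined a complex massive vector field $W_\nu = \frac{1}{\sqrt{2}}(A_{1\nu} - iA_{2\nu})$ with mass
$gv/2$. In the 3-Y subspace we have the 2×2 squared mass matrix

$$
\frac{1}{2}M^2 = \frac{1}{8}v^2\begin{pmatrix} g^2 & -gg' \\ -gg' & g'^2 \end{pmatrix} \tag{15.218}
$$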

corresponding to a mass term


$$
\frac{1}{8}v^2\left[(g'B_\nu - gA_{3\nu})^2 + 0\cdot(gB_\nu + g'A_{3\nu})^2\right] = \frac{1}{2}M_Z^2\,Z_\nu Z^\nu \tag{15.219}
$$

where the following field combinations (satisfying conventionally normalized canonical
equal-time commutation relations if the $A_3$ and $B$ fields do) have been defined

$$
Z_\nu \equiv \frac{g'B_\nu - gA_{3\nu}}{\sqrt{g^2 + g'^2}} \tag{15.220}
$$
$$
A_\nu \equiv \frac{gB_\nu + g'A_{3\nu}}{\sqrt{g^2 + g'^2}} \tag{15.221}
$$

where the self-conjugate field $Z_\nu$ has mass $m_Z = \frac{1}{2}v\sqrt{g^2 + g'^2}$, while the $A_\nu$ field is
massless. The existence of a zero mode in the mass matrix (15.214) is clearly associated
with the existence of a linear combination of generators $\frac{1}{2}(\tau_3 + Y) = \frac{1}{g}T_3 + \frac{1}{g'}T_Y$
which annihilates the VEV of the scalar doublet (which has Y = 1):

$$
\frac{1}{2}(\tau_3 + 1)\begin{pmatrix} 0 \\ \frac{1}{\sqrt{2}}v \end{pmatrix} = 0
$$
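The mass spectrum in (15.216)–(15.221) can be spot-checked numerically. The following is a minimal sketch, not part of the original text: it builds the 4×4 matrix $M^2_{\alpha\beta} = 2\langle\phi^\dagger\rangle T_\alpha T_\beta\langle\phi\rangle$ from (15.214)–(15.215) with plain Python lists and evaluates it along the $W$, $Z$ and photon directions. The function names and the numerical values of $g$, $g'$, $v$ used in any call are illustrative assumptions.

```python
import math

# Pauli matrices as nested lists of (complex) numbers
TAU = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
IDENTITY = [[1, 0], [0, 1]]


def scale(c, m):
    """Multiply a 2x2 matrix by a number."""
    return [[c * x for x in row] for row in m]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def expect(vec, m):
    """<vec| m |vec> for a two-component complex vector."""
    return sum(vec[i].conjugate() * m[i][j] * vec[j]
               for i in range(2) for j in range(2))


def vector_mass_matrix(g, gp, v):
    """M^2_{ab} = 2 <phi>^dagger T_a T_b <phi>, a, b = 1, 2, 3, Y (eq. 15.214)."""
    # Generators of eq. (15.215); the scalar doublet has Y = 1.
    gens = [scale(g / 2, t) for t in TAU] + [scale(gp / 2, IDENTITY)]
    vev = [0j, complex(v / math.sqrt(2))]
    return [[2 * expect(vev, matmul(ta, tb)) for tb in gens] for ta in gens]


def quadratic_form(m2, direction):
    """direction^T M^2 direction for a real direction in (A1, A2, A3, B) space."""
    n = len(direction)
    return sum(direction[i] * m2[i][j] * direction[j]
               for i in range(n) for j in range(n))


def spectrum(g, gp, v):
    """Squared masses along the W, Z and photon directions."""
    m2 = vector_mass_matrix(g, gp, v)
    norm = math.hypot(g, gp)
    z_dir = [0, 0, -g / norm, gp / norm]      # Z = (g' B - g A3)/norm, eq. (15.220)
    photon_dir = [0, 0, gp / norm, g / norm]  # A = (g B + g' A3)/norm, eq. (15.221)
    return (quadratic_form(m2, [1, 0, 0, 0]).real,
            quadratic_form(m2, z_dir).real,
            quadratic_form(m2, photon_dir).real)
```

Calling `spectrum(g, gp, v)` with any positive couplings should return $g^2v^2/4$, $(g^2 + g'^2)v^2/4$ and zero, up to rounding, in agreement with (15.217) and (15.219).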

Thus there is an unbroken U(1) subgroup of the original SU(2)×U(1) local gauge
symmetry, which must be associated with a massless gauge particle. This is, of course,
the photon, in the modern electroweak theory. One may easily check that the $W$, $W^\dagger$
and $Z$ fields transform under the generator $\frac{1}{g}T_3 + \frac{1}{g'}T_Y = \frac{1}{2}(\tau_3 + Y) \equiv Q$ as fields of
electric charge –1, +1, and 0 respectively. The discovery in 1983 of a neutral massive
vector Z boson in the weak interactions (in addition to the long-suspected charged
weak carriers W ± ) was a dramatic confirmation that the particular pattern of local
symmetry-breaking described here indeed conforms to reality. Of course, the real value
of such a model lies in its ability to accurately depict the weak interactions of the
fundamental fermions of the theory: the leptons and quarks. We shall briefly describe
the leptonic sector of the electroweak theory here, as proposed in Weinberg’s seminal
paper (Weinberg, 1967), before going on to discuss the functional quantization and
derivation of Feynman rules for a general spontaneously broken local gauge theory.
The electroweak sector of the Standard Model is a chiral gauge theory: that is to say,
left- and right-handed parts of the Dirac fermion fields of the theory (which, the reader
will recall from Chapter 7, fall into separate representations of the proper homogeneous
Lorentz group) are placed in different representations of the gauge group. Thus, if ψ is
a Dirac 4-spinor field, $\psi_L = P_L\psi = \frac{1+\gamma_5}{2}\psi$ and $\psi_R = P_R\psi = \frac{1-\gamma_5}{2}\psi$ are the left-handed
and right-handed 2-spinor components of ψ respectively. This means that ψL and ψR
may be in SU(2) multiplets of different dimensionality, and may be assigned different
weak hypercharge quantum numbers YL and YR . One recovers the conventional V − A
structure of the charged weak currents by placing the left-handed part of the electron
field eL together with the purely left-handed Weyl electron neutrino field in a SU(2)
doublet Le (with weak hypercharge YL =–1),


$$
L_e(x) = \begin{pmatrix} \nu_e(x) \\ e_L(x) \end{pmatrix}
$$

and the right-handed part of the electron field eR in a SU(2) singlet field Re , with weak
hypercharge YR =-2. We may also think of this chiral arrangement as corresponding to
the inclusion of γ5 factors (via chiral projection operators PL , PR ) in the gauge group
generators,
$$
T_i = \frac{g}{2}\tau_i P_L, \qquad T_Y = \frac{g'}{2}(Y_L P_L + Y_R P_R) \tag{15.222}
$$
which, of course, satisfy the commutation relations (15.215). The charge operator then
becomes $Q = \frac{1}{2}(\tau_3 - 1)P_L - P_R$, giving electric charge –1 to both components $e_L$ and
$e_R$ of the electron field, and zero charge to the neutrino, as desired. The fermionic
(leptonic) part of the Lagrangian, with these choices, becomes

$$
\mathcal{L}_{\rm lept} = \bar{L}_e\Big(i\not{\partial} - \frac{1}{2}g'\not{B} + \frac{1}{2}g\,\vec{\tau}\cdot\not{\vec{A}}\Big)L_e + \bar{R}_e\big(i\not{\partial} - g'\not{B}\big)R_e \tag{15.223}
$$
Note that there is so far no mass term for the electron field, as a direct coupling of the
left- and right-handed parts of the electron field would violate the SU(2) symmetry.
When the $\vec{A}_\nu$ and $B_\nu$ fields are rewritten in terms of the physical $W_\nu$, $Z_\nu$, $A_\nu$ fields,
one recovers (see Problem 13), in addition to the long known V − A structure for the
charged weak currents (mediated by the W fields), a new set of neutral weak current
interactions due to the massive Z boson, as well as, of course, conventional quantum
electrodynamics for the interaction of the electron and photon fields. The muon and tau
leptons (with their associated neutrinos) may be included by essentially “xeroxing” the
structure above twice. Masses arise naturally in this model for the charged leptons once
Yukawa interactions, exactly invariant under the local gauge symmetry, are included
between the leptons and the scalar field doublet:

LYuk = −Ge {L̄e φRe + R̄e φ† Le } (15.224)

If we recall that the scalar field φ lies in a SU(2) doublet with weak hypercharge 1,
with Le and Re having weak hypercharges –1 and –2 respectively, we see that the cubic
Yukawa coupling here is invariant under both the SU(2) and U(1) parts of the gauge
group. Moreover, once the field is shifted to extract the vacuum expectation value,
a mass term $-G_e\big(\bar{e}_L\frac{v}{\sqrt{2}}e_R + \bar{e}_R\frac{v}{\sqrt{2}}e_L\big) = -m_e\bar{e}e$, $m_e = G_e\frac{v}{\sqrt{2}}$, emerges automatically
for the electron. Muon and tau masses emerge similarly: they involve completely
independent Yukawa couplings Gμ , Gτ , so we cannot expect any obvious connection
between the charged lepton masses (on the basis of symmetry requirements), although,
of course, the wide disparity (as yet, completely mysterious) of these masses is at the
very least aesthetically disturbing.
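The neutral-current couplings that appear when the $A_3$ and $B$ fields are traded for the photon and $Z$ can be tabulated with a few lines of code. This is a hedged sketch rather than anything taken from the text: it assumes that each chiral fermion with weak isospin component $t_3$ and hypercharge $Y$ couples through $\bar\psi\gamma^\mu(g\,t_3 A_{3\mu} + g'\frac{Y}{2}B_\mu)\psi$, as follows from the generators (15.222), and it inverts (15.220)–(15.221) to express $A_3$ and $B$ in terms of $A$ and $Z$. The names `neutral_couplings` and `LEPTONS` are illustrative.

```python
import math

# (weak isospin t3, weak hypercharge Y) for the electron-generation leptons
LEPTONS = {"nu_e": (0.5, -1), "e_L": (-0.5, -1), "e_R": (0.0, -2)}


def neutral_couplings(t3, y, g, gp):
    """Return (photon coupling, Z coupling) of one chiral fermion.

    Assumed interaction: psibar gamma^mu (g t3 A3_mu + gp (y/2) B_mu) psi.
    """
    norm = math.hypot(g, gp)
    sin_w, cos_w = gp / norm, g / norm
    # Inverting (15.220)-(15.221): A3 = sin_w A - cos_w Z, B = cos_w A + sin_w Z
    photon = g * t3 * sin_w + gp * (y / 2) * cos_w
    z_boson = -g * t3 * cos_w + gp * (y / 2) * sin_w
    return photon, z_boson


def electric_unit(g, gp):
    """e = g g' / sqrt(g^2 + g'^2), the magnitude of the electron charge."""
    return g * gp / math.hypot(g, gp)


def charge_table(g, gp):
    """Photon coupling of each lepton in units of e, i.e. t3 + Y/2."""
    e = electric_unit(g, gp)
    return {name: neutral_couplings(t3, y, g, gp)[0] / e
            for name, (t3, y) in LEPTONS.items()}
```

For any positive `g` and `gp`, `charge_table` should give 0 for the neutrino and −1 for both chiral components of the electron, so the photon coupling is vector-like even though the underlying gauge couplings are chiral.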
The presence of γ5 factors in the fermionic generators of our chiral SU(2)×U(1)
gauge theory should alert us to the possibility of anomalies, and indeed the con-
servation of the Noether gauge currents of the purely leptonic electroweak theory
described here is broken by quantum anomalies, which would render the theory non-
renormalizable, and even prevent the execution of the functional quantization process
to be described below (where we assume the absence of any non-trivial functional
Jacobians in the functional integral). It is an extraordinary—and highly suggestive—
feature of the electroweak theory that the quantum anomalies in the gauge currents
are exactly cancelled once quark fields (in one-to-one correspondence with the lepton
fields) are introduced with the appropriate quantum numbers (see Problem 11).
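The cancellation can be verified by simple arithmetic on the hypercharges. The sketch below is illustrative only: it uses exact rational arithmetic, takes the lepton hypercharges from the text above and the quark hypercharges (1/3 for the left-handed doublet, 4/3 and −2/3 for the right-handed up and down singlets, each in three colors) from the statement of Problem 11, and evaluates the two hypercharge sums that Problem 11 identifies as the potentially anomalous combinations. The data layout and function names are assumptions.

```python
from fractions import Fraction as F

# Each entry is (hypercharge, multiplicity); the multiplicity counts
# SU(2) components times colors.
LEFT_LEPTONS = [(F(-1), 2)]                      # (nu_e, e_L) doublet
RIGHT_LEPTONS = [(F(-2), 1)]                     # e_R singlet
LEFT_QUARKS = [(F(1, 3), 2 * 3)]                 # (u_L, d_L) doublet, 3 colors
RIGHT_QUARKS = [(F(4, 3), 3), (F(-2, 3), 3)]     # u_R and d_R, 3 colors each


def hypercharge_sum(fields, power):
    """Sum of Y**power over all components of the listed fields."""
    return sum(mult * y ** power for y, mult in fields)


def anomaly_coefficients(left, right):
    """Return (sum of Y_L, sum of Y_L^3 minus sum of Y_R^3)."""
    mixed = hypercharge_sum(left, 1)
    cubic = hypercharge_sum(left, 3) - hypercharge_sum(right, 3)
    return mixed, cubic


def leptons_only():
    return anomaly_coefficients(LEFT_LEPTONS, RIGHT_LEPTONS)


def full_generation():
    return anomaly_coefficients(LEFT_LEPTONS + LEFT_QUARKS,
                                RIGHT_LEPTONS + RIGHT_QUARKS)
```

With these assignments `leptons_only()` is non-zero in both entries, while `full_generation()` vanishes identically, which is the cancellation between quarks and leptons referred to above.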
We now turn to the long-promised derivation of the Feynman rules for a sponta-
neously broken gauge theory. We shall emphasize the derivation of the propagators of
the theory, as the possibility of obtaining a renormalizable theory hinges most directly
on the high-momentum behavior of the propagators: in particular, we wish to show
that the disastrous asymptotic behavior (of order $k_\mu k_\nu/k^2$) of the massive vector boson
propagator in a unitary gauge can be removed by a choice of gauge which both (a)
maintains manifest Lorentz covariance, and (b) damps the high-momentum behavior
to the same level as that of a scalar propagator: i.e., $1/k^2$. Complete details of the
derivation of the Feynman rules in broken gauge theories can be found in the classic
articles by Weinberg (Weinberg, 1973) and Abers and Lee (Abers and Lee, 1973).
We begin by slightly altering the notation used in the examples discussed above:
the generator matrices will now not contain factors of the coupling constant, and we
return to our original sign conventions for the coupling(s), as incorporated in (15.113–
15.115). We shall assume that our scalar field multiplets consist of purely real fields
(we can, of course, always decompose a complex scalar field into two real fields by
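writing $\phi = \frac{1}{\sqrt{2}}(\phi_R + i\phi_I)$), with the generator matrices $T_\alpha$ real and antisymmetric,
so that the covariant derivative on the scalar fields reads

$$
D^\mu\phi = (\partial^\mu - gT_\alpha A^\mu_\alpha)\phi, \qquad [T_\alpha, T_\beta] = f_{\alpha\beta\gamma}T_\gamma \tag{15.225}
$$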

Note that, as in electroweak theory, the coupling g may vary from one simple subgroup
of the full local gauge group G to another: to avoid overcomplicating the notation, we
shall avoid indicating this explicitly below. The fermions fill, as usual, complex (but
possibly chiral) representations of G, and we use, as previously, hermitian generators
tα in the fermionic representations, with covariant derivative

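$$
D^\mu\psi = (\partial^\mu + igt_\alpha A^\mu_\alpha)\psi \tag{15.226}
$$

Under the infinitesimal local gauge transformations

$$
\phi(x) \to (1 + gT_\alpha\lambda_\alpha(x))\phi(x) \tag{15.227}
$$
$$
\psi(x) \to (1 - igt_\alpha\lambda_\alpha(x))\psi(x) \tag{15.228}
$$
$$
A^\mu_\alpha(x) \to A^\mu_\alpha(x) + \partial^\mu\lambda_\alpha(x) + gf_{\alpha\beta\gamma}\lambda_\beta(x)A^\mu_\gamma(x) \tag{15.229}
$$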

the Lagrangian
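$$
\mathcal{L} = -\frac{1}{4}F_{\alpha\mu\nu}F_\alpha^{\mu\nu} + \frac{1}{2}(D_\mu\phi)^T D^\mu\phi + \bar\psi(i\not{D} - m)\psi - \bar\psi\,\Gamma_i\psi\,\phi_i - P(\phi) \tag{15.230}
$$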
is invariant, provided the fermion mass (matrix) m commutes with the generators,
[tα , m] = 0, and the Yukawa couplings and the scalar polynomial are appropriately
chosen: namely,

$$
[t_\alpha, \Gamma_i] = -i(T_\alpha)_{ij}\Gamma_j \tag{15.231}
$$
$$
\frac{\partial P}{\partial\phi_i}(T_\alpha)_{ij}\phi_j = 0 \tag{15.232}
$$
We shall now suppose that spontaneous breaking of the gauge symmetry occurs,
induced by the presence of a non-trivial minimum of the scalar potential P (φ), and
the appearance of a non-vanishing vacuum expectation value vi for φi (at lowest order
in the coupling)

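$$
\frac{\partial P}{\partial\phi}\Big|_{\phi_i = v_i \neq 0} = 0 \;\Rightarrow\; \langle 0|\phi_i(x)|0\rangle = v_i \tag{15.233}
$$

The vacuum expectation value will be removed in the usual fashion by defining a
shifted field $\phi'_i \equiv \phi_i - v_i$, so that the action of an infinitesimal gauge transformation
on the scalar and vector fields becomes

$$
\phi(x) \to (1 + gT_\beta\lambda_\beta(x))\phi(x) \;\Rightarrow\; \phi'(x) \to \phi'(x) + gT_\beta\lambda_\beta(x)(v + \phi'(x))
$$
$$
A_{\alpha\mu}(x) \to A_{\alpha\mu}(x) + \partial_\mu\lambda_\alpha(x) + gf_{\alpha\beta\gamma}\lambda_\beta(x)A_{\gamma\mu}(x) \tag{15.234}
$$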

We shall now impose a gauge condition as a joint constraint on the gauge and scalar
fields of the theory. The form of the constraint is at first sight rather peculiar, but will
shortly be seen to give an algebraically convenient set of Feynman rules. We impose
the local gauge condition

$$
\partial^\mu A_{\alpha\mu}(x) - \xi g\langle v, T_\alpha\phi'(x)\rangle = f_\alpha(x), \qquad \langle v, T_\alpha\phi'\rangle \equiv v_i(T_\alpha)_{ij}\phi'_j \tag{15.235}
$$

where ξ is an arbitrary positive real number, and we have introduced an obvious
bracket notation to indicate dot-products with respect to the scalar field indices. The
$f_\alpha(x)$ are arbitrary real functions, so in the absence of symmetry-breaking ($v = 0$)
our gauge choice reduces to the generalized Landau gauge used earlier (cf. (15.146)).
Following steps analogous to those leading to (15.148, 15.149), we introduce the
covariant DFP functional associated with this choice of gauge

$$
1 = \Delta_{\rm cov}[A, \phi']\cdot\int\mathcal{D}U\,\delta\big(\partial^\mu A^U_{\alpha\mu} - \xi g\langle v, T_\alpha(\phi')^U\rangle - f_\alpha\big)
$$
$$
\Delta_{\rm cov}[A, \phi'] = \det\big(\delta_{\alpha\beta}\Box + gf_{\alpha\beta\gamma}\partial^\mu A_{\gamma\mu} - \xi g^2\langle v, T_\alpha T_\beta v\rangle - \xi g^2\langle v, T_\alpha T_\beta\phi'\rangle\big) \tag{15.236}
$$

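which, as previously (15.155, 15.156), can be represented more conveniently by introducing
Grassmannian ghost fields $\omega_\alpha$ with a ghost Lagrangian

$$
\mathcal{L}_{\rm gh} = \bar\omega_\alpha\big(\delta_{\alpha\beta}\Box + gf_{\alpha\beta\gamma}\partial^\mu A_{\gamma\mu} + \xi g^2\langle T_\alpha v, T_\beta v\rangle - \xi g^2\langle v, T_\alpha T_\beta\phi'\rangle\big)\omega_\beta \tag{15.237}
$$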
One notes here the appearance of (a) a ghost mass matrix $\xi g^2\langle T_\alpha v, T_\beta v\rangle$ and, (b)
in addition to the ghost-vector vertex, a ghost-scalar coupling term. Precisely as in
the unbroken case, one may establish that the generating functional of the theory
is independent of the choice of the arbitrary functions fα , which we may therefore
integrate over, with a Gaussian modulating factor as in (15.153), to obtain the path
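integral (minus sources) for our spontaneously broken gauge theory:

$$
Z_{\rm SBGT} = \int\mathcal{D}A_{\alpha\mu}\,\mathcal{D}\psi\,\mathcal{D}\bar\psi\,\mathcal{D}\omega_\alpha\,\mathcal{D}\bar\omega_\alpha\,\mathcal{D}\phi'\; e^{i\int\mathcal{L}_{\rm tot}\,d^4x}
$$
$$
\mathcal{L}_{\rm tot} = -\frac{1}{4}F_{\alpha\mu\nu}F_\alpha^{\mu\nu} - \frac{1}{2\xi}\big(\partial^\mu A_{\alpha\mu} - \xi g\langle v, T_\alpha\phi'\rangle\big)^2 + \frac{1}{2}\big(D_\mu(v + \phi')\big)^2 - P(v + \phi') + \mathcal{L}_{\rm ferm} + \mathcal{L}_{\rm gh} \tag{15.238}
$$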
The utility of the peculiar choice of gauge condition (15.235) now becomes apparent
on examining the scalar field kinetic term,
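$$
\begin{aligned}
\frac{1}{2}\big(D_\mu(v + \phi')\big)^2 &= \frac{1}{2}(D_\mu\phi')^2 + \frac{g^2}{2}\langle T_\alpha v, T_\beta v\rangle A_{\alpha\mu}A^\mu_\beta \\
&\quad - \frac{1}{2}\langle gT_\alpha v, (\partial_\mu - gT_\beta A_{\beta\mu})\phi'\rangle A^\mu_\alpha - \frac{1}{2}\langle(\partial^\mu - gT_\alpha A^\mu_\alpha)\phi', gT_\beta v\rangle A_{\beta\mu} \\
&= \frac{1}{2}\langle D_\mu\phi', D^\mu\phi'\rangle + \frac{1}{2}M^2_{\alpha\beta}A_{\alpha\mu}A^\mu_\beta + g^2A^\mu_\alpha A_{\beta\mu}\langle T_\alpha v, T_\beta\phi'\rangle - gA^\mu_\alpha\langle T_\alpha v, \partial_\mu\phi'\rangle
\end{aligned} \tag{15.239}
$$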

which evidently generates a non-trivial (but ξ-independent!) squared-mass matrix for
the gauge vector bosons (cf. (15.214))

$$
M^2_{\alpha\beta} \equiv g^2\langle T_\alpha v, T_\beta v\rangle \tag{15.240}
$$

The final term in (15.239), mixing the scalar and gauge fields, combines with the cross-
term from the gauge-fixing part of the total Lagrangian to produce a total derivative,
which then vanishes after integration over spacetime (recall that the Tα matrices are
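real antisymmetric):

$$
-\frac{1}{2\xi}\big(-2\xi g\,\partial^\mu A_{\alpha\mu}\langle v, T_\alpha\phi'\rangle\big) - gA^\mu_\alpha\langle T_\alpha v, \partial_\mu\phi'\rangle = \partial^\mu\big(gA_{\alpha\mu}\langle v, T_\alpha\phi'\rangle\big) \tag{15.241}
$$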
The propagators of the theory are associated with the parts of Ltot quadratic in the
various fields, and now that unwanted mixing terms have been eliminated, these can
easily be read off from the quadratic scalar, gauge, and fermion Lagrangians:

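$$
\mathcal{L}^{\rm quad}_{\rm scal} = \frac{1}{2}(\partial_\mu\phi'_i)^2 - \frac{1}{2}\frac{\partial^2P}{\partial\phi_i\partial\phi_j}(\phi = v)\,\phi'_i\phi'_j - \frac{\xi g^2}{2}\big(\langle v, T_\alpha\phi'\rangle\big)^2 \tag{15.242}
$$
$$
\begin{aligned}
\mathcal{L}^{\rm quad}_{\rm gauge} &= -\frac{1}{4}(\partial_\mu A_{\alpha\nu} - \partial_\nu A_{\alpha\mu})^2 - \frac{1}{2\xi}(\partial_\mu A^\mu_\alpha)^2 + \frac{1}{2}M^2_{\alpha\beta}A_{\alpha\mu}A^\mu_\beta \\
&\to \frac{1}{2}A_{\alpha\mu}\Big((\delta_{\alpha\beta}\Box + M^2_{\alpha\beta})g^{\mu\nu} + \delta_{\alpha\beta}\Big(\frac{1}{\xi} - 1\Big)\partial^\mu\partial^\nu\Big)A_{\beta\nu}
\end{aligned} \tag{15.243}
$$
$$
\mathcal{L}^{\rm quad}_{\rm ferm} = \bar\psi(i\not{D} - M_f)\psi, \qquad M_f = m + \Gamma_iv_i \tag{15.244}
$$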

where the arrow in (15.243) refers to a rearrangement of the derivatives via an
integration by parts. The squared-mass matrix for the scalar fields is easily read off
from (15.242):

$$
M^2_{ij} = \frac{\partial^2P}{\partial\phi_i\partial\phi_j}(\phi = v) + \xi g^2(T_\alpha v)_i(T_\alpha v)_j \tag{15.245}
$$

which is evidently dependent on the arbitrary gauge parameter ξ. The physical
interpretation of this disconcerting feature is best seen if we choose a basis of the gauge
group generators $T_\alpha$, α = 1, . . . , N such that the first m generators span the unbroken
subgroup ($T_\alpha v = 0$), while the generators with $T_\alpha v \neq 0$, α = m + 1, . . . , N, correspond to the broken directions (and,
in the limit where the gauge coupling vanishes, to Goldstone bosons). We saw in the
simple models examined earlier that the field redefinitions implicit in the unitary
gauge effectively absorb the Goldstone scalar modes into the longitudinal parts of the
massive gauge fields, so it should not be surprising to find that the “mass-matrix” for
these modes is gauge-dependent, and linked to the behavior of the longitudinal part of
the gauge vector propagator, which we shall shortly see is also ξ-dependent. The first
term on the right-hand side of (15.245), on the other hand, is gauge-independent, and
corresponds to the physical scalar modes. Indeed, this mass matrix contributes exactly
zero in the Goldstone mode directions, as we see by differentiating the invariance
condition (15.232) and setting φ(x) = v,

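$$
\frac{\partial^2P}{\partial\phi_k\partial\phi_i}(T_\alpha)_{ij}\phi_j + \frac{\partial P}{\partial\phi_i}(T_\alpha)_{ik} = 0 \;\Rightarrow\; \frac{\partial^2P}{\partial\phi_k\partial\phi_i}(\phi = v)\,(T_\alpha v)_i = 0 \tag{15.246}
$$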
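The statement (15.246) is easy to test in a toy model. The sketch below is an illustration under explicit assumptions, none of which come from the text: the gauge group is taken to be SO(3) with real antisymmetric generators $(T_\alpha)_{ij} = -\epsilon_{\alpha ij}$, the scalars form a real triplet, and the invariant potential is the hypothetical choice $P = -\frac{1}{2}\mu^2\phi\cdot\phi + \frac{1}{4}\lambda(\phi\cdot\phi)^2$. The Hessian is estimated by central differences, so the residuals vanish only up to discretization error.

```python
import math


def levi_civita(i, j, k):
    """Totally antisymmetric symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


# Real antisymmetric SO(3) generators (assumed toy gauge group)
GENERATORS = [[[-levi_civita(a, i, j) for j in range(3)] for i in range(3)]
              for a in range(3)]


def potential(phi, mu2, lam):
    """Hypothetical invariant potential P = -(mu2/2) phi.phi + (lam/4) (phi.phi)^2."""
    r2 = sum(x * x for x in phi)
    return -0.5 * mu2 * r2 + 0.25 * lam * r2 * r2


def hessian(phi, mu2, lam, h=1e-3):
    """Central-difference estimate of d^2 P / (d phi_i d phi_j) at phi."""
    def shifted(i, si, j, sj):
        p = list(phi)
        p[i] += si * h
        p[j] += sj * h
        return potential(p, mu2, lam)

    n = len(phi)
    return [[(shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
              - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (4 * h * h)
             for j in range(n)] for i in range(n)]


def matvec(matrix, vec):
    return [sum(row[j] * vec[j] for j in range(len(vec))) for row in matrix]


def vacuum(mu2, lam, direction=(1.0, 2.0, 2.0)):
    """A point on the vacuum manifold |phi|^2 = mu2/lam along the given direction."""
    norm = math.sqrt(sum(x * x for x in direction))
    radius = math.sqrt(mu2 / lam)
    return [radius * x / norm for x in direction]


def goldstone_residuals(v, mu2, lam):
    """Largest |component| of Hessian . (T_a v) for each generator, cf. (15.246)."""
    hess = hessian(v, mu2, lam)
    return [max(abs(x) for x in matvec(hess, matvec(t, v))) for t in GENERATORS]
```

In this model the only direction with a non-zero second derivative is the radial one, along $v$ itself, where the Hessian acts as multiplication by $2\mu^2$; the vectors $T_\alpha v$ are orthogonal to $v$ and are annihilated.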
The gauge vector propagator can be read off easily from (15.243): we need the
Green function for the operator $(\Box + M^2)g^{\mu\nu} + (\frac{1}{\xi} - 1)\partial^\mu\partial^\nu$, which the reader will
easily verify corresponds to a Feynman propagator given by

$$
\langle 0|T(A_{\alpha\mu}(x)A_{\beta\nu}(y))|0\rangle = -i\int\frac{d^4k}{(2\pi)^4}\left(\frac{g_{\mu\nu} - (1 - \xi)\dfrac{k_\mu k_\nu}{k^2 - \xi M^2}}{k^2 - M^2 + i\epsilon}\right)_{\alpha\beta}e^{-ik\cdot(x-y)} \tag{15.247}
$$

with the squared-mass matrix M 2 (carrying the α, β indices) given by (15.240). Notice
that the ξ-dependence is entirely in the longitudinal part (proportional to kμ kν ) of the
momentum-space propagator. The poles at k 2 = ξM 2 cannot correspond to physical
particle states as they depend on the arbitrary gauge parameter ξ: indeed, we know
that the S-matrix is independent of ξ and therefore cannot have any such poles in
single-particle cuts of amplitudes. However, these poles occur at exactly the same
mass eigenvalues as those given by the Goldstone mode part of the scalar propagator,
the second term on the right-hand side of (15.245). Indeed, suppose that ρi is an
eigenvector of the latter matrix:

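$$
\xi g^2(T_\alpha v)_i(T_\alpha v)_j\rho_j = \lambda\rho_i \tag{15.248}
$$

Multiplying both sides by $(T_\beta v)_i$, summing over i, and defining $\chi_\alpha \equiv (T_\alpha v)_j\rho_j$, we
find

$$
\xi M^2_{\beta\alpha}\chi_\alpha = \lambda\chi_\beta \tag{15.249}
$$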
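The step from (15.248) to (15.249) can be illustrated numerically. The following sketch is an assumption-laden toy, not the author's construction: it again uses SO(3) generators $(T_\alpha)_{ij} = -\epsilon_{\alpha ij}$ acting on a real triplet with an arbitrary vacuum vector $v$, picks $\rho = T_a v$ as a trial eigenvector of the Goldstone part of the scalar mass matrix, and checks that $\chi_\alpha = (T_\alpha v)_j\rho_j$ is then an eigenvector of $\xi M^2$ with the same eigenvalue.

```python
def levi_civita(i, j, k):
    """Totally antisymmetric symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


# Real antisymmetric SO(3) generators (assumed toy gauge group)
GENERATORS = [[[-levi_civita(a, i, j) for j in range(3)] for i in range(3)]
              for a in range(3)]


def matvec(matrix, vec):
    return [sum(row[j] * vec[j] for j in range(len(vec))) for row in matrix]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def broken_directions(v):
    """The vectors T_a v, one per generator."""
    return [matvec(t, v) for t in GENERATORS]


def gauge_mass_matrix(v, g):
    """M^2_{ab} = g^2 <T_a v, T_b v>, eq. (15.240)."""
    tv = broken_directions(v)
    return [[g * g * dot(a, b) for b in tv] for a in tv]


def goldstone_matrix(v, g, xi):
    """xi g^2 (T_a v)_i (T_a v)_j summed over a: second term of (15.245)."""
    tv = broken_directions(v)
    n = len(v)
    return [[xi * g * g * sum(t[i] * t[j] for t in tv) for j in range(n)]
            for i in range(n)]


def check_pole_matching(v, g, xi, a=0):
    """Return (lambda, residual of (15.248), residual of (15.249)) for rho = T_a v."""
    tv = broken_directions(v)
    rho = tv[a]
    g_rho = matvec(goldstone_matrix(v, g, xi), rho)
    lam = dot(rho, g_rho) / dot(rho, rho)
    res_scalar = max(abs(g_rho[i] - lam * rho[i]) for i in range(len(rho)))
    chi = [dot(t, rho) for t in tv]
    m_chi = matvec(gauge_mass_matrix(v, g), chi)
    res_gauge = max(abs(xi * m_chi[b] - lam * chi[b]) for b in range(len(chi)))
    return lam, res_scalar, res_gauge
```

In this toy model the common eigenvalue is $\xi g^2|v|^2$, so the unphysical scalar mass and the pole of the longitudinal gauge propagator coincide for every value of ξ.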

This means that the unphysical poles in the longitudinal part of the gauge propa-
gators can (indeed, by gauge invariance, must) be cancelled by poles at exactly the
same locations in the scalar propagators. The Feynman ξ = 1 gauge choice gives a
particularly simple momentum-space propagator, proportional to gμν . Of course, as
a check on more complicated higher-order perturbative calculations, it may be useful
to retain the general form to ensure that all ξ-dependent terms cancel in the final
physical result (see Problem 14).
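The claim that (15.247) inverts the quadratic operator in (15.243) can be checked at a single off-shell momentum. This is a minimal sketch under stated assumptions: one mass eigenvalue $M^2$ is treated at a time, the metric is taken as diag(1, −1, −1, −1), the $i\epsilon$ is dropped because the test momentum is real and away from both poles, and in momentum space $\Box \to -k^2$, $\partial^\mu\partial^\nu \to -k^\mu k^\nu$. The function names are illustrative.

```python
METRIC = [1.0, -1.0, -1.0, -1.0]  # diagonal of g_{mu nu} = g^{mu nu}


def minkowski_square(k):
    """k^2 for a momentum given with upper indices."""
    return sum(METRIC[m] * k[m] * k[m] for m in range(4))


def kinetic_operator(k, mass2, xi):
    """Momentum-space (box + M^2) g^{mu nu} + (1/xi - 1) d^mu d^nu, upper indices."""
    k2 = minkowski_square(k)
    return [[(mass2 - k2) * (METRIC[m] if m == n else 0.0)
             - (1.0 / xi - 1.0) * k[m] * k[n]
             for n in range(4)] for m in range(4)]


def propagator_kernel(k, mass2, xi):
    """[g_{mu nu} - (1 - xi) k_mu k_nu / (k^2 - xi M^2)] / (k^2 - M^2), lower indices.

    The overall factor -i and the i*epsilon of (15.247) are omitted.
    """
    k2 = minkowski_square(k)
    k_low = [METRIC[m] * k[m] for m in range(4)]
    return [[((METRIC[m] if m == n else 0.0)
              - (1.0 - xi) * k_low[m] * k_low[n] / (k2 - xi * mass2)) / (k2 - mass2)
             for n in range(4)] for m in range(4)]


def contract(op, prop):
    """Mixed tensor O^{mu nu} D_{nu rho}."""
    return [[sum(op[m][n] * prop[n][r] for n in range(4)) for r in range(4)]
            for m in range(4)]


def inverse_residual(k, mass2, xi):
    """Largest deviation of O.D from minus the identity."""
    product = contract(kinetic_operator(k, mass2, xi), propagator_kernel(k, mass2, xi))
    return max(abs(product[m][r] + (1.0 if m == r else 0.0))
               for m in range(4) for r in range(4))
```

The product comes out as $-\delta^\mu{}_\rho$ for any ξ, so multiplying the kernel by $-i$ gives $i$ times the inverse of the quadratic operator, which is the propagator normalization used in (15.247).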
The large momentum behavior of the gauge vector and scalar propagators is
uniformly $1/k^2$ regardless of the value of the gauge parameter,15 and whether or not
we are in the symmetry broken ($M^2 \neq 0$) or unbroken ($M^2 = 0$) phase of the theory.
This is in complete consonance with the intuition developed from our discussions of
spontaneous symmetry-breaking in Chapter 14, as a phenomenon linked to the large-
distance energetic properties of the theory, but essentially decoupled from the short-
distance, or large momentum, properties of the amplitudes. The soft behavior of the
vector propagators in these renormalizable ξ-gauges plays a crucial role in establishing
the renormalizability of a spontaneously broken gauge theory with massive spin-1
particles, as we shall see in Part 4.
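The difference in ultraviolet behavior between the ξ-gauges and the unitary gauge can be seen by scaling a fixed momentum direction. The sketch below is illustrative and makes several simplifying assumptions: it looks only at the μ = ν = 0 component of the propagator kernel for a momentum $k = (k^0, 0, 0, k^3)$, uses a single mass $M^2$, drops the $i\epsilon$, and compares the kernel at momenta $sk$ and $2sk$. The function names are not from the text.

```python
def rxi_component(k0, k3, mass2, xi):
    """mu = nu = 0 component of the xi-gauge kernel in (15.247), without -i."""
    k2 = k0 * k0 - k3 * k3
    return (1.0 - (1.0 - xi) * k0 * k0 / (k2 - xi * mass2)) / (k2 - mass2)


def unitary_component(k0, k3, mass2):
    """Same component with the unitary-gauge numerator g_{mu nu} - k_mu k_nu / M^2."""
    k2 = k0 * k0 - k3 * k3
    return (1.0 - k0 * k0 / mass2) / (k2 - mass2)


def doubling_ratio(kernel, scale, *params):
    """|D(2 s k)| / |D(s k)| along the fixed direction k = (2, 0, 0, 1)."""
    low = kernel(2.0 * scale, 1.0 * scale, *params)
    high = kernel(4.0 * scale, 2.0 * scale, *params)
    return abs(high / low)
```

At large `scale` the ratio approaches 1/4 for any finite ξ (a $1/k^2$ falloff), while for the unitary-gauge kernel it approaches 1, reflecting a propagator of order unity at large momentum. Taking ξ very large at fixed momentum reproduces the unitary-gauge value, in line with the singular limit described in the footnote.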
The discovery of the renormalizable SU(2)×U(1) gauge field theory for the weak
and electromagnetic interactions in the early 1970s was followed in short order by

15 The limit ξ → ∞ is singular, and we see that we recover the unitary gauge momentum-space propagator
in this limit, with the numerator factor gμν − kμ kν /M 2 characteristic of a massive Maxwell–Proca field,
and with a propagator of order unity, rather than 1/k2 at large momentum.
the proposal of a similar gauge theory, quantum chromodynamics (QCD), as the


underlying dynamical structure for the strong interactions. Earlier indications of a
triplet structure for the fermionic “quark” constituents of hadrons were incorporated
naturally by assuming that the gauge group in this case was SU(3), with the quark
fields transforming in the fundamental three-dimensional representation. There were
no available candidates for massive vector bosons transforming in an octet of the color
group, so the gauge symmetry in this case was presumed to be exact, with the quark
and gluon fields combining to give color-singlet interpolating fields for the physical
hadronic particles. The absence of physical particles associated with the non-color-
singlet fields of the theory was dubbed the “quark confinement hypothesis” (although
a more accurate term would be “color confinement”). This remarkable phenomenon—
in the author’s view, matched only in twentieth-century physics by the exotic phe-
nomena associated with superconductivity and superfluidity—is a consequence of the
strongly coupled character of unbroken four-dimensional Yang–Mills theories in the
infrared (at long distances), which we shall examine in detail in the final chapter of
the book.
The “Standard Model” of modern particle physics is based on a Lagrangian field
theory encompassing the interactions of leptons and quarks via gauge interactions
associated with an exact local SU(3)×SU(2)×U(1) gauge theory, with a spontaneously
broken SU(2) subgroup, in the simplest case by an elementary Higgs doublet field, as
discussed above.16 Remarkably, quark and lepton fields appear in the Standard Model
(see Problem 11) with exactly the right quantum numbers to assure that none of
the gauge currents receive anomalous contributions, via “miraculous” cancellations
between the quark and lepton fields in each generation. We know, however, that this
model cannot be complete, for (at least) two very convincing reasons: (i) the existence
of non-zero neutrino masses, and (ii) the apparent existence of a massive stable weakly
interacting dark-matter particle, which cannot be identified with any of the fields in
the Standard Model. Needless to say, speculative extensions of the Standard Model
abound. The possibility of unifying the three simple groups into a single larger group
(SU(5), SO(10), ...; see Problem 12), broken at a high-energy scale (∼ $10^{15}$ GeV) to
the gauge symmetries of the Standard Model, led early on to the development of
Grand Unified Theories (GUTs), in which quarks and leptons appear in the same
representation of the gauge group, thereby rendering less mysterious the cancellation
of anomalies between seemingly unrelated quark and lepton representations. Another
obvious extension of the Standard Model lies in the direction of supersymmetry: the
minimal supersymmetric extension of the Standard Model (MSSM) is an attempt to
build a supersymmetric theory with the minimal number of additional fields (super-
partners to the conventional Standard Model fields). The breaking of supersymmetry
is expected for reasons connected with the renormalization of the Higgs mass to appear
at the now accessible TeV scale, and the search for supersymmetric partners of the
Standard Model particles with masses in this range is a subject of intense interest at

16 For an excellent and comprehensive introduction to the phenomenology of the Standard Model, see
(Donoghue et al., 1992).
the Large Hadron Collider (LHC), which should reach total energies of 10 or more
TeV in the center-of-mass frame within the next few years.

15.7 Problems
1. The object of this exercise is to verify that the change in fermionic Green functions
induced by a local gauge transformation in an abelian gauge theory does not affect
the on-shell S-matrix amplitudes of the theory, as given by the LSZ formula.
Suppose there is a single incoming (resp. outgoing) fermion carrying momentum $p$
(resp. $p'$). Show that the change in the S-matrix amplitude induced by the gauge
transformation Λ(x) is proportional to

$$
\int d^4x\,d^4x'\,e^{ip'\cdot x' - ip\cdot x}\big(e^{ie(\Lambda(x') - \Lambda(x))} - 1\big)\cdot\bar{u}(p')(\not{p}' - m)\langle 0|T(\psi_H(x')\bar\psi_H(x)\ldots)|0\rangle(\not{p} - m)u(p) \tag{15.250}
$$

where we have suppressed spin labels and the gauge transformation function Λ(x)
is assumed to go to zero rapidly at large x. The change (15.250) will vanish if the
poles in the Fourier transform of the T-product of the form $1/(\not{p}' - m)$, $1/(\not{p} - m)$
are absent. Show that these poles are indeed absent by demonstrating that the
Fourier transform $\tilde{f}(q', q)$ of $f(x, x') = e^{ie(\Lambda(x') - \Lambda(x))} - 1$ takes the form

$$
\tilde{f}(q', q) = D(q')F(q) + \delta^4(q)F'(q') \tag{15.251}
$$

where the functions $F(q)$ (resp. $F'(q')$) are smooth and, in particular, do not
contain δ-functions $\delta^4(q)$ (resp. $\delta^4(q')$). Accordingly, when the Fourier transform of
the T-product is convolved with $\tilde{f}$, the external propagator poles are smeared out
and the external spinors $u(p)$, $\bar{u}(p')$ are annihilated by the LSZ $(\not{p} - m)$, $(\not{p}' - m)$
factors.
2. Verify the infinitesimal form of the non-abelian transformation rules (15.113–
15.115), starting from (15.96, 15.105, 15.109).
3. Check that the Euler–Lagrange equation for Aα0 reduces to the non-abelian
Gauss’s Law (15.118), with the covariant derivative defined in (15.119).
4. Show that the non-abelian Hamiltonian density Πμα Ȧαμ − LYM leads to the total
Hamiltonian (keeping only the gauge fields) given in (15.125).
5. (a) Show that (for fields vanishing sufficiently fast at infinity) the axial gauge
choice Aα3 = 0 can be imposed with a unique choice of non-abelian gauge
transformation.
(b) Show that the Faddeev–Popov gauge functional Δaxial [A] for a non-abelian
gauge theory is constant (i.e., field-independent) in axial gauge A3α = 0.
6. Show that the Hurwitz integral for SU(2) (defined as the solid-angle integral over
the 3-sphere, i.e., the unit sphere in four dimensions) is shift-invariant:

$$
\int dU\,f(UV) = \int dU\,f(U), \qquad U, V \in SU(2) \tag{15.252}
$$
7. Verify the expression (15.149) for the DFP functional in the covariant gauge
∂ μ Aαμ (x) = fα (x).
8. By examining the quadratic (in gauge fields) part of the action in the generating
functional (15.156), show that the gauge field propagator in the covariant ξ-gauge
is a Green function for the operator $\Box g^{\mu\nu} + (\frac{1}{\xi} - 1)\partial^\mu\partial^\nu$, and is given by (15.159).
9. By considering the lowest-order tree expressions for the three- and four-point
functions for gauge vector scattering in momentum space, verify the Feynman
vertex factors given in Fig. 15.1, parts (a) and (b).
10. Show that the eigenvalues $\lambda_i$ of the Euclidean Dirac operator $\hat{D}[A] = \hat\gamma^\mu(-i\partial_\mu + A_\mu)$
are gauge-invariant: namely, show that if $\hat{D}[A]\phi_i(x) = \lambda_i\phi_i(x)$, then

$$
\hat{D}[A^U]U(x)\phi_i(x) = \lambda_iU(x)\phi_i(x), \qquad A^U_\mu = UA_\mu U^\dagger + i(\partial_\mu U)U^\dagger \tag{15.253}
$$

11. The appearance of an anomaly in the currents associated with a local gauge
symmetry would destroy (at the quantum level) the local gauge symmetry of the
theory, with dire consequences for the renormalizability of the theory, as we shall
see in Part 4. In a chiral theory such as the electroweak sector of the Standard
Model, the appearance of γ5 factors in the generators of the local gauge group
(due to the fact that left- and right-handed fermionic fields transform differently
under the gauge group) signals the potential existence of such anomalies. Now
suppose that the generators are decomposed into right- and left-handed parts (cf.
(15.222)), $T_\alpha = t^L_\alpha P_L + t^R_\alpha P_R$, $P_L = \frac{1+\gamma_5}{2}$, $P_R = \frac{1-\gamma_5}{2}$, where now the $t^L_\alpha$, $t^R_\alpha$ do not
contain γ5 factors. It can be shown (see (Donoghue et al., 1992), for example)
that the anomaly in the current $\bar\psi\gamma^\mu T_\alpha\psi$ is proportional to $\epsilon^{\mu\nu\rho\sigma}F_{\beta\mu\nu}F_{\gamma\rho\sigma}$ times a
difference of traces over the left- and right-handed fields:

$$
\mathcal{A}_\alpha \propto \mathrm{Tr}\big(t^L_\alpha\{t^L_\beta, t^L_\gamma\}\big) - \mathrm{Tr}\big(t^R_\alpha\{t^R_\beta, t^R_\gamma\}\big) \tag{15.254}
$$

The cancellation of anomalies in the SU(2)×U(1) electroweak theory occurs via
a magical cancellation between leptons and quarks in each generation. For the
lowest generation, we have in addition to the electron and electron-neutrino fields
discussed above, a left-handed quark doublet $(u_L(x), d_L(x))$ and right-handed
singlets $u_R(x)$, $d_R(x)$ under SU(2), with hypercharge assignments $\frac{1}{3}, \frac{1}{3}, \frac{4}{3}, -\frac{2}{3}$ for
$u_L, d_L, u_R, d_R$ respectively, in order to arrive at the desired electric charges $+\frac{2}{3}$
for the up quark and $-\frac{1}{3}$ for the down quark. The anticommutator of two Pauli
matrices is a multiple of the identity and the trace of a single Pauli matrix vanishes,
so we see immediately that the anomaly arising from three SU(2) indices α, β, γ
vanishes. The choice of one U(1) with two SU(2) indices is easily seen to give
an anomaly proportional to $\sum Y_L$, the sum of hypercharge quantum numbers
for the left-handed fields, while taking all three indices in the U(1) subgroup
gives an anomaly proportional to $\sum Y_L^3 - \sum Y_R^3$. Show that, with the hypercharge
assignments given above, both of these potential anomalies vanish. Remember
that each quark field comes in 3 color versions, corresponding to the quantum
numbers under the SU(3) color gauge group of the strong interactions, so the
quark contributions need to be multiplied by a factor of 3.
12. Consider a broken gauge theory based on the gauge group G= SU(5). The
symmetry-breaking is implemented by coupling the twenty-four-dimensional
adjoint representation of gauge bosons to a real twenty-four-dimensional Higgs
scalar representation, which can be conveniently represented as a traceless hermi-
tian 5x5 matrix Φ, with Φ → U † ΦU giving the action of the group (where U is
a 5x5 unitary matrix of determinant 1). The most general scalar polynomial of
degree ≤ 4 symmetric under G is

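$$
P(\Phi) = a\,\mathrm{Tr}(\Phi^2) + b\,(\mathrm{Tr}(\Phi^2))^2 + c\,\mathrm{Tr}(\Phi^4)
$$

Show that for some range of parameters a, b, c a symmetry-breaking minimum of P
is achieved where the ground-state vacuum expectation value of Φ is (for constant
real α)

$$
\langle 0|\Phi|0\rangle = \begin{pmatrix}
2\alpha & 0 & 0 & 0 & 0 \\
0 & 2\alpha & 0 & 0 & 0 \\
0 & 0 & 2\alpha & 0 & 0 \\
0 & 0 & 0 & -3\alpha & 0 \\
0 & 0 & 0 & 0 & -3\alpha
\end{pmatrix}
$$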

What is the remaining unbroken symmetry group in this case? Find the spectrum
of vector mesons in the symmetry-broken phase.
13. Work out the form of the neutral leptonic current sector (electron generation
only) of the electroweak SU(2)×U(1) theory, by extracting the interaction terms
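in (15.223) containing the photon ($A_\mu$) and Z-boson ($Z_\mu$) fields. It is conventional
to introduce the Weinberg angle $\theta_W$, with $\frac{g'}{g} = \tan\theta_W$, so

$$
A_{3\mu} = \sin(\theta_W)A_\mu - \cos(\theta_W)Z_\mu, \qquad B_\mu = \cos(\theta_W)A_\mu + \sin(\theta_W)Z_\mu \tag{15.255}
$$

Show that the leptonic interactions involving these neutral fields take the form

$$
\begin{aligned}
\mathcal{L}_{\rm lept,neut} &= -e\,\bar{e}(x)\gamma^\mu e(x)A_\mu - e\tan(\theta_W)\,\bar{e}_R(x)\gamma^\mu e_R(x)Z_\mu \\
&\quad - \frac{e}{\sin(2\theta_W)}\bar\nu_e(x)\gamma^\mu\nu_e(x)Z_\mu + e\cot(2\theta_W)\,\bar{e}_L(x)\gamma^\mu e_L(x)Z_\mu
\end{aligned} \tag{15.256}
$$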

14. The presence of neutral weak currents mediated by a massive Z meson necessitates
the choice of SU(2)×U(1) as the electroweak gauge group. The basic QED annihilation
process $e^+ + e^- \to \mu^+ + \mu^-$ now requires, in addition to the usual graph
with an intermediate virtual photon, inclusion of a graph with an intermediate Z
boson.
(a) Assuming the mass of the electron vanishes, show that the tree amplitude for
this process is ξ-independent.
(b) Show that the presence of the Z graph leads to a forward-backward asymmetry
in the process (i.e., terms linear in cos(θ) in the center-of-mass differential
cross-section).
(c) Show that the tree amplitude from the above two graphs is not ξ-independent
if the electron and muon masses are not neglected. Explain what other graph
or graphs have to be taken into account in this case to restore a gauge-invariant
result.
16
Scales I: Scale sensitivity of field theory amplitudes and effective field theories

The history of the natural sciences since the late 1800s (if we temporarily set aside
astronomy as primarily concerned with Nature “in the large”) has to a great extent
involved an attempt to decipher the behavior of matter at ever smaller distance scales.
Qualitative descriptions of biological organisms at the macroscopic level have been
supplanted by an astonishingly detailed understanding of the underlying biochemistry
of life; the complex profusion of chemical phenomena revealed empirically by
nineteenth-century and early-twentieth-century chemists is now understood to follow,
in many cases with the detailed quantitative support of sophisticated algorithms of
quantum chemistry, from a precise mathematical formulation based on Schrödinger’s
equation applied to atoms and molecules; the phenomenology of nuclei (fission, fusion,
radioactivity, etc.) has been reduced to the behavior of more “elementary” constituents
(quarks and gluons) obeying precise dynamical laws; and so on.
It is apparent from these examples that the type of theory or descriptive frame-
work appropriate for the description of the same phenomena at different levels of
“magnification” can vary enormously. Intricate details of the underlying microscopic
dynamics of a physical process may simply be irrelevant to achieving an adequate
“qualitative” understanding of the process as viewed at larger distance scales. Much
of nuclear physics can be understood perfectly well by treating the proton and neutron
as point-like fermions interacting non-relativistically via spin-dependent short-range
potentials, with absolutely no understanding of the underlying non-abelian local gauge
theory giving rise to these particles and their interactions.
A remarkable property of local quantum field theory, not shared by any of the
larger-scale phenomenologies mentioned above (which we now, of course, believe to be
consequences of the underlying field-theoretic phenomena), is that it is structurally
amenable to a precise mathematical description of the way in which the form of the
dynamical laws changes as the phenomena are examined at varying distance-scales.
As we shall shortly see, the representation of the theory at a given distance- (or
energy-)scale will turn out to be specified by an “effective Lagrangian” fixed in terms
of an infinite vector of dimensionless couplings, and the variation of these couplings
with the distance scale determined by a “renormalization group equation” which is
in essence the infinitesimal Lie algebra corresponding to the “renormalization group”
transformations associated with finite changes of the scale at which we examine the
theory. In this chapter we shall begin the task of exploring these remarkable features
of local quantum field theory.

16.1 Scale separation as a precondition for theoretical science


In our discussion of cluster decomposition in chapter 6, we emphasized the critical
importance of the decoupling of amplitudes of far separated processes for the viability
of experimental science. If this decoupling did not occur, the correct interpretation of
the results of experiments carried out in some localized region of spacetime would be
contingent on a specification of the state of the Universe far beyond the boundaries
of the laboratory—a state of knowledge about our surroundings which we clearly can
never possess. This (fortunate) insensitivity to our lack of knowledge of the state of the
world at large distances from our experiments must be mirrored by a corresponding
insensitivity of the phenomena to details of the dynamics at short distances which are,
given the incomplete state of our theoretical knowledge, necessarily unknown at any
given moment in the historical enterprise that constitutes the physical sciences. The
source of this ignorance in elementary particle physics derives from the simple fact
that short-distance details of the dynamics are associated (via the magic of Fourier
transformation) with the behavior of the quantum amplitudes of the theory at high
energy, and at any given point in history we have access to man-made accelerators with
a limited energy range.1 Thus, even the projected maximum center-of-mass energy (14
TeV) of the recently initiated LHC (Large Hadron Collider) in Geneva corresponds to
an ability to probe the structure of elementary processes down to distances of about
$10^{-5}$ fermis (= $10^{-20}$ m). As small as this seems from a human perspective, it is still
roughly $10^{15}$ times larger than the Planck scale of $10^{-20}$ fermis where we know that
(at the very latest) the effects of quantum gravity may no longer be ignored and the
nature of the quantum dynamics of elementary processes must undergo a dramatic,
and as yet completely mysterious, alteration.
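
The distance scales quoted above can be checked with a few lines of arithmetic. The sketch below assumes the order-of-magnitude relation distance ≈ ħc/E with ħc ≈ 0.1973 GeV·fm; the function name `resolved_distance_fm` is hypothetical, and the Planck energy is taken as the round figure of 10^19 GeV used in this chapter.

```python
# Rough conversion between an energy scale and the distance it resolves,
# using distance ~ hbar*c / E (an order-of-magnitude estimate only).
HBAR_C_GEV_FM = 0.1973  # hbar*c in GeV*fm (approximate value, assumed here)


def resolved_distance_fm(energy_gev: float) -> float:
    """Distance in fermis probed at the given energy in GeV."""
    return HBAR_C_GEV_FM / energy_gev


lhc_energy_gev = 14.0e3     # 14 TeV, the figure quoted in the text
planck_energy_gev = 1.0e19  # round Planck-scale figure used in the chapter

d_lhc = resolved_distance_fm(lhc_energy_gev)
d_planck = resolved_distance_fm(planck_energy_gev)
ratio = d_lhc / d_planck

print(f"LHC distance scale:    ~{d_lhc:.1e} fm")
print(f"Planck distance scale: ~{d_planck:.1e} fm")
print(f"ratio:                 ~{ratio:.1e}")
```

The ratio comes out a little below 10^15, which is what "roughly" is covering in the estimate above.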
The last point raised above leads us to an interesting conclusion: the concept of an
exact continuum limit for a local quantum field theory formulated on a flat Minkowski
spacetime background (as we have done throughout our discussion of field theory
to this point) has strictly speaking no correlate in physical reality. The Minkowski
metric of special relativity—which, after all, underlies the most characteristic aspect
of relativistic field theory, the space-like commutativity of local observables—is at best
an emergent phenomenon, an approximate representation of an underlying theory in
which kinematical and dynamical aspects are perhaps inextricably intertwined. Even
the most ambitious attempts to come to grips with quantum gravity—as in M-theory,
for example—seem to presuppose some sort of pre-existing spacetime background in
which the elementary entities of the theory (be they strings, membranes, or what have
you) move and interact.

1 The existence of very-high-energy cosmic rays offers us a tantalizing, but unfortunately very narrow,
window into physics at much higher energies than those accessible in accelerators, as do indirect cosmological
arguments involving the very early Universe, but the vast majority of our detailed information about
subatomic dynamics derives from the much more precise information gleaned from terrestrial accelerator
experiments.

It is perfectly clear, however, that the unknown complexities of a final “Theory
of Everything” (if such exists) cannot really be relevant in describing the physics of
processes at energy scales much lower than those at which quantum gravity effects
become important. This is not merely wishful thinking on our part: the extraordinary
quantitative successes of quantum electrodynamics (QED), as in the anomalous mag-
netic moment of the electron for example (in complete quantitative agreement with
experiment up to nine significant digits), assure us that the decoupling of the effective
low-energy physics represented by a local abelian gauge theory of interacting electrons
and photons from the unknown physics which must inevitably supersede QED at
sufficiently high energy (or short distances) is not only present but extraordinarily
efficacious. We might term this decoupling the “scale separation” property of local
field theory: it lies at the heart of our ability to construct theoretical systems of
finite complexity but adequate accuracy in the description of the “low”-energy physics
which lies within our purview at any given moment of time (where “low” is, of course, a
function of time). Our primary objective in this fourth and final part of the book will be
to understand the origin, and consequences, of this fortunate, and remarkable, property
of local quantum field theory. The adjective “remarkable” is hardly an exaggeration
when we consider that scale separation breaks down completely in the arena of classical
physics involving chaotic phenomena, where very small perturbations at small scales
can rapidly propagate to much larger ones, essentially destroying our capacity to
maintain accurate control over the temporal dynamics of many classical systems over
long time-periods. The fact that quantum field theory is able to avoid this fate goes
back ultimately, of course, to the linear character (and unitary temporal evolution) of
the underlying quantum dynamical framework.

16.2 General structure of local effective Lagrangians


The demands of cluster decomposition—the requirement that quantum scattering
amplitudes for far separated processes factorize in such a way as to ensure the
statistical independence of spatially distant phenomena—were our first constraints
in constructing the framework of local quantum field theory in Chapter 6. We saw
there that the interaction Hamiltonian for any quantum system of self-interacting
spinless particles satisfying the cluster decomposition principle necessarily must take
the form (cf. (6.85))
 
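$$
H_{\rm int}(x) = \sum_{MM'} \frac{1}{M!\,M'!} \int \frac{d^3k'_1}{\sqrt{2E(k'_1)}} \cdots \frac{d^3k_M}{\sqrt{2E(k_M)}}\; f_{MM'}(k'_1, \ldots, k_M)\, e^{ik'_1\cdot x} a^\dagger(k'_1) \cdots e^{-ik_M\cdot x} a(k_M) \tag{16.1}
$$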

where the functions $f_{MM'}(k'_1, \ldots, k_M)$ are smooth functions of momenta, expandable in
joint Taylor expansions in the four-momenta $k'_1, \ldots, k'_{M'}, k_1, \ldots, k_M$. It will be convenient
to rewrite this expression in terms of the positive- and negative-frequency components
of the associated (for simplicity, self-conjugate) scalar field

$$
\phi(x) = \int \frac{d^3k}{(2\pi)^{3/2}\sqrt{2E(k)}} \left( a(\vec{k})\, e^{-ik\cdot x} + a^\dagger(\vec{k})\, e^{ik\cdot x} \right) \equiv \phi^{(+)}(x) + \phi^{(-)}(x) \tag{16.2}
$$

as follows,

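$$
H_{\rm int}(x) = \sum_{MM'} \frac{(2\pi)^{\frac{3}{2}(M+M')}}{M!\,M'!}\, f_{MM'}\Big(-i\frac{\partial}{\partial x'_1}, \ldots, +i\frac{\partial}{\partial x_M}\Big)\, \phi^{(-)}(x'_1) \cdots \phi^{(+)}(x_M) \Big|_{x'_i = x_i = x} \tag{16.3}
$$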
where the spacetime-derivatives in the functions $f_{MM'}$ are converted to the appropriate
momentum dependences if we insert the definition of the scalar field (16.2) in
(16.3), which reproduces (16.1). We see that the interaction Hamiltonian density is necessarily an (infinite)
expansion involving multinomials in the scalar field and all possible spacetime-derivatives
thereof. At this stage, we have not yet inserted the demands of special
relativity. The discussion in Chapter 12 reveals the appropriate further constraints
which the as yet unspecified functions $f_{MM'}$ for this Hamiltonian must satisfy to
yield relativistically invariant scattering amplitudes: they must arise by the standard
canonical procedure whereby the interaction Hamiltonian is derived via Legendre
transformation of a Lorentz scalar Lagrangian density $\mathcal{L}(\phi, \partial_\nu\phi, \partial_\nu\partial_\rho\phi, \ldots)$, where now
we must allow the Lagrangian to contain arbitrarily many derivatives and powers of
the local scalar field $\phi(x) = \phi^{(+)}(x) + \phi^{(-)}(x)$ (with positive and negative frequency
parts of the field paired throughout to satisfy the demands of locality) in order to
obtain the general expansion for the (interaction) Hamiltonian density as indicated in
(16.3). A Lagrangian of this type, containing effectively all possible terms consistent
with cluster decomposition and Lorentz-invariance, is sometimes called a “Wilsonian
effective Lagrangian”,2 in order to reflect the profound contributions made in the
understanding of the scale sensitivity of local theories in the early 1970s by Ken
Wilson (Wilson, 1971).
The reader with prior acquaintance with standard treatments of perturbative quan-
tum field theory may object to the use of a scalar Lagrangian with “non-renormalizable
terms” of higher than (mass) dimension 4 (in four spacetime dimensions) which are
well known to lead to ultraviolet (large momentum) divergences in the loop integrals
of perturbation theory which are not removable via the usual processes of mass,
coupling, and wavefunction renormalization of the bare parameters of the theory. We
shall return to the whole matter of perturbative renormalizability, and to its relation
with the Wilsonian approach, in the next chapter. For now, this objection provides
us with the opportunity to fully realize, and put into effect, the qualitative insights
of the preceding section concerning the inescapable limitations on any local theory
formulated on a flat Minkowski spacetime due to the unavoidable dissolution of this
kinematic structure at very short distances (or large momenta). We therefore admit
frankly that our Lagrangian field theory, with its associated path integral, must be

2 At this point we should alert the reader to a dangerous source of terminological confusion. The use
of the word “effective” in this chapter will be completely restricted to the sense indicated here, where we
imagine writing an exact representation of only part of the physical content of the theory, basically by a
change of variable in the functional integral defining the theory at short distances. The notion of an “effective
action” Γ, as used in Chapter 10 in reference to the generating functional of the one-particle-irreducible
n-point functions of the theory, plays no role here, and to avoid confusion with the aforesaid Γ we shall try
to stick to the phrase “effective Lagrangian”, while avoiding the perfectly natural term “effective action”
for the spacetime integral thereof.

interpreted as a theory describing a field where the large momentum components of
the field are cut off, either in some smooth, but largely arbitrary fashion (reflecting our
ignorance of the ultimate underlying microphysics), or more simply, by a sharp cutoff
which eliminates (sets to zero) Fourier modes of the field above some limiting value.
We shall want to ensure that the cutoff is imposed in such a way as to correspond
to short distances both in the spatial and temporal directions of spacetime, and also
to respect the Lorentz-invariance of the remaining “low-energy” (or “large-distance”)
theory. This is best done in Euclidean space, where the momentum modes of the
theory can be divided cleanly into ultraviolet modes with $|k| = \sqrt{k \cdot k} > \Lambda$ (where $k$ is
a Euclidean four-vector), and infrared modes with $|k| < \Lambda$, with $\Lambda$ a high-energy cutoff
which the reader may assume for the time being to be several orders of magnitude
below the Planck scale $\Lambda_{\rm Pl} \sim 10^{19}$ GeV (to ensure that Minkowski space notions are
still reasonable for the modes below this value) but many orders of magnitude higher
than the energies presently accessible in accelerator experiments. Thus the scalar field
φ(x) appearing in the infinite expansion specifying our Lagrangian should really be
written with a subscript specifying the energy scale up to which it possesses Fourier
modes:
 
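$$
\phi_\Lambda(x) = \int_{|k|<\Lambda} \tilde\phi(k)\, e^{-ik\cdot x}\, \frac{d^4k}{(2\pi)^4} = \int \tilde\phi_\Lambda(k)\, e^{-ik\cdot x}\, \frac{d^4k}{(2\pi)^4}, \qquad \tilde\phi_\Lambda(k) \equiv \theta(\Lambda - |k|)\,\tilde\phi(k) \tag{16.4}
$$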
We emphasize that this cutoff does not destroy Lorentz-invariance, which in Euclidean
space amounts to a four-dimensional O(4) rotation which clearly does not mix the
ultraviolet and infrared modes. Thus we still expect that the cutoff theory should
lead (after analytic continuation back from Euclidean space) to Lorentz-invariant low-
energy amplitudes provided that the Euclidean effective Lagrangian is constructed to
be invariant under Euclidean rotations (which implement the HLG for the Euclidean
version of the theory).
The perturbation theory based on an effective Lagrangian built from the field
$\phi_\Lambda(x)$ will lead to loop integrals in which the propagators $\Delta_\Lambda(k) = \langle\tilde\phi_\Lambda(k)\tilde\phi_\Lambda(-k)\rangle$
automatically vanish for |k| > Λ, so all such integrals (no matter how many the
numerator factors of momentum arising from vertices with multiple derivatives on
the fields) are by fiat ultraviolet finite. Of course, the amplitudes computed in this
way will clearly depend on the scale Λ at which the effective Lagrangian is defined,
and we must still hope that after further examination, this sensitivity to the short-
distance structure of the cutoff theory will disappear when we restrict our attention
to processes at momentum scales much smaller than Λ.
To summarize, our theory of interacting scalar particles is to be regarded as defined
in the usual way (in Euclidean space) by a generating functional $Z_\Lambda$ of Euclidean
Schwinger functions given by the path integral

$$
Z_\Lambda[j] = \int D\phi_\Lambda\, e^{-S_E(\phi_\Lambda) + \int d^4x\, j(x)\phi_\Lambda(x)} \tag{16.5}
$$

where the Euclidean action $S_E$ is given as an integral over a Lagrangian density
containing all possible powers of the field and its derivatives. It is convenient (though
not of great physical significance) to simplify the algebra by imposing a symmetry
under $\phi \to -\phi$ restricting us to even powers of the field,

$$
S_E(\phi_\Lambda) = \int d^4x \Big\{ \frac{1}{2}(\partial_\nu\phi_\Lambda)^2 + \sum_{n\ge 0} a_n \phi_\Lambda^{2+n} + \sum_{n>0} a'_n (\partial_\nu\phi_\Lambda)^2 \phi_\Lambda^{n} + \ldots \Big\} \tag{16.6}
$$

$$
\equiv \int d^4x \Big\{ \frac{1}{2}(\partial_\nu\phi_\Lambda)^2 + \sum_{n\ge 0} a_n O_n + \sum_{n>0} a'_n O'_n + \ldots \Big\} = \int \mathcal{L}_\Lambda(\phi_\Lambda)\, d^4x \tag{16.7}
$$

where the dots represent terms with a total of 4, 6, 8, etc., spacetime-derivatives acting
on the fields (coupled, of course, to an overall Lorentz scalar).
We have used our freedom to rescale the field to set the coefficient of the free
kinetic term $(\partial_\nu\phi_\Lambda)^2$ to be exactly $\frac{1}{2}$. The mass term is now concealed in the term
$a_0\phi_\Lambda^2$, while the coefficient $a_2$ corresponds to the usual dimensionless quartic coupling
constant $\lambda$ in our previous discussions of $\lambda\phi^4$ theory. Now, however, we have an
infinite series of additional interaction terms (note: $n$ is even), corresponding to new
four-point vertices arising from the derivative coupling $(\partial_\nu\phi_\Lambda)^2\phi_\Lambda^2$, six-point vertices
from $\phi_\Lambda^6$, $(\partial_\nu\phi_\Lambda)^2\phi_\Lambda^4$, and so on. Recalling that the action $S_E$ in (16.5) must be
dimensionless, implying mass dimension of 1 (from the kinetic term) for the field
$\phi_\Lambda$, we see that the coupling constants $a_n$ (resp. $a'_n$) must have mass dimension $2-n$
(resp. $-n$). It will be convenient to rescale these couplings in terms of dimensionless
ones by extracting the appropriate powers of the cutoff (itself of mass dimension 1):

$$
\begin{aligned}
a_n &\equiv g_n \Lambda^{2-n}, & n &= 0, 2, 4, \ldots \\
a'_n &\equiv g'_n \Lambda^{-n}, & n &= 2, 4, 6, \ldots
\end{aligned} \tag{16.8}
$$

For the present, we shall be assuming that our theory is weakly coupled—in other words
that the dimensionless couplings $g_n, g'_n, \ldots$ corresponding to interaction terms (i.e.,
those higher than quadratic in the field) are all of order unity, or perhaps somewhat
smaller,3 in which case a formal asymptotic expansion in these variables becomes
quantitatively useful.

16.3 Scaling properties of effective Lagrangians: relevant, marginal, and irrelevant operators
We can begin to expose the physical content of the effective Lagrangian formulation
of our theory by exploring the relative importance of the various terms in the action
as a function of the energy scale of the phenomena under study. A convenient starting
point is provided by the tree (or classical) approximation to the amplitudes of the
theory. We recall from the discussion in Section 10.4 that the amplitudes of a local

3 The concept of “order unity” possesses a somewhat elastic connotation in field theory, as it is not always
obvious what the relevant expansion variable ought to be. The fine-structure constant $\alpha = e^2/4\pi = 1/137$
seems to be two orders of magnitude smaller than “order unity”, but the electric charge e ∼ 0.3 is clearly
much closer to unity. Nevertheless, for many amplitudes in QED, an expansion in powers of α is appropriate,
in the sense that the coefficients of powers of α, at low orders of perturbation theory, are fairly close to 1.

field theory can be expanded formally in powers of Planck’s constant $\hbar$, with the
lowest-order contributions corresponding to the tree (no-loop) graphs of the theory,
the one-loop graphs contributing with one extra power of $\hbar$, the two-loop graphs with
two extra powers, and so on. We can now ask about the perturbative contributions of
the operators $O_n$, $O'_n$ appearing in the general action (16.7) to some n-point function of
the theory, where we assume that the incoming and outgoing momenta of the process
under consideration are all of order $E \ll \Lambda$. The only dimensionful scales present
at the tree graph level are the energy scale of the process E (which permeates the
internal propagators) and the UV cutoff Λ, with the dependence on the latter arising
only from the explicit dependence of the couplings in (16.8) on Λ: in particular, at
tree level there are no loop integrals extending up to and cut off at Λ to introduce
further Λ-dependence. This means that the contribution of a particular operator at
the energy scale E can be estimated by a trivial dimensional argument, essentially by
just counting the mass dimension of the operator (integrated over spacetime), whence
 
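$$
\int d^4x\, O_n \equiv \int d^4x\, \phi_\Lambda^{2+n} \sim E^{n-2} \tag{16.9}
$$

$$
\int d^4x\, O'_n \equiv \int d^4x\, (\partial_\nu\phi_\Lambda)^2 \phi_\Lambda^{n} \sim E^{n}, \ \text{etc.} \tag{16.10}
$$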

Including the coupling constants in (16.8) we see that these operators contribute to
tree amplitudes at the relative order

$$
O_n \to g_n \Big(\frac{E}{\Lambda}\Big)^{n-2}, \quad n = 0, 2, 4, \ldots \tag{16.11}
$$

$$
O'_n \to g'_n \Big(\frac{E}{\Lambda}\Big)^{n}, \quad n = 2, 4, 6, \ldots \tag{16.12}
$$
and so on for the higher operators. This means that for the infrared physics with
which we are concerned, where the energy E of the processes we are studying is much
smaller than the ultraviolet cutoff Λ of the theory, the most important, or relevant,
operator is $O_0 = \phi^2$, the mass operator, whose effects grow quadratically as we lower
the energy. This is hardly surprising if we consider the mass expansion of the free
(Euclidean) propagator

$$
\frac{1}{k^2 + m^2} \sim \frac{1}{k^2} - \frac{m^2}{k^4} + \frac{m^4}{k^6} + \ldots \tag{16.13}
$$

where we see that increasing powers of the mass correspond to larger and larger
contributions in the infrared region $k \ll m$. The quartic coupling operator $O_2$, by
contrast, contributes equally at all energy scales (in the tree amplitudes): it is therefore
termed a “marginal” operator, which we should regard as a technical designation of its
scaling behavior, and not (given its importance in generating non-trivial interactions)
as a demeaning comment on its importance for the theory! Higher-dimension operators
such as $O_4 = \phi^6$ and $O'_2 = (\partial_\nu\phi)^2\phi^2$ (both of mass dimension 6 and contributing at
order $(E/\Lambda)^2$ for $E \ll \Lambda$) contribute at a progressively smaller level to the low-energy
physics, the higher their dimension, and are termed “irrelevant”, from the point of view
of tree amplitude scaling.

Fig. 16.1 Some tree graph contributions to the 2-4 scalar scattering amplitude. (Labels surviving from the figure: $g_2$, $g_2$, $1/E^2$, and $g_4/\Lambda^2$.)

A simple example is given by the tree graphs displayed in Fig. 16.1, representing
contributions to the 2-4 scattering amplitude: the first graph arising from the quartic
coupling $g_2\phi^4$ in second order (and of order $g_2^2/E^2$, where the incoming and outgoing
momenta are of order $E$) while the second graph, arising from the higher-dimension
term $\frac{g_4}{\Lambda^2}\phi^6$ is very small, of order $\frac{E^2}{\Lambda^2}$ relative to the first (assuming all dimensionless
couplings of order unity, or in any event much closer to unity than the ratio $\frac{E^2}{\Lambda^2}$), in
agreement with the scaling deduced previously in (16.11).
The reader should once again guard against attaching the colloquial meaning of
terms such as “irrelevant” to the physics generated by the corresponding operators:
the dimension-six four-fermion operator of Fermi weak interaction theory, responsible
for β-decay, for example, is “irrelevant” from this point of view, but the associated
vast phenomenology of radioactivity is hardly so. A higher-dimension operator may
initiate processes with very low amplitude (hence, rare processes), which may, however,
be of a sufficiently different type from the processes induced by marginal or relevant
operators as to stand out phenomenologically, and even to play an important role
in uncovering details of the physics emerging at shorter distances (as in the case
of the electroweak theory supplanting the Fermi theory of weak interactions). For
reasons that will become clear in the next chapter, the classification into “relevant”,
“marginal”, and “irrelevant” operators (of mass dimension <4, 4, and >4 respectively,
in four spacetime dimensions) is mirrored in the terminology of renormalization
theory by the terms “super-renormalizable”, “strictly renormalizable”, and “non-
renormalizable”, respectively.
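
The dimensional counting behind (16.8), (16.11), and (16.12) can be tabulated mechanically. The sketch below is only an illustration: the function names and the sample ratio E/Λ = 10^-3 are assumptions introduced here, and every dimensionless coupling is set to 1 as a stand-in for "order unity".

```python
def operator_dimension(n: int, derivative: bool = False) -> int:
    """Mass dimension of O_n = phi^(2+n) or O'_n = (d phi)^2 phi^n in 4D."""
    return 4 + n if derivative else 2 + n


def coupling_dimension(n: int, derivative: bool = False) -> int:
    """Mass dimension of a_n or a'_n, fixed by a dimensionless action."""
    return 4 - operator_dimension(n, derivative)


def classify(n: int, derivative: bool = False) -> str:
    """Relevant / marginal / irrelevant according to operator dimension."""
    dim = operator_dimension(n, derivative)
    if dim < 4:
        return "relevant"
    if dim == 4:
        return "marginal"
    return "irrelevant"


def tree_level_weight(n: int, e_over_lambda: float,
                      derivative: bool = False, g: float = 1.0) -> float:
    """Relative tree-level size g * (E/Lambda)^(dim - 4), as in (16.11)-(16.12)."""
    return g * e_over_lambda ** (operator_dimension(n, derivative) - 4)


E_OVER_LAMBDA = 1.0e-3  # hypothetical ratio of process energy to cutoff

for n, derivative in [(0, False), (2, False), (4, False), (2, True)]:
    name = f"O'_{n}" if derivative else f"O_{n}"
    print(f"{name:5s} dim={operator_dimension(n, derivative)} "
          f"coupling dim={coupling_dimension(n, derivative):+d} "
          f"{classify(n, derivative):10s} "
          f"weight={tree_level_weight(n, E_OVER_LAMBDA, derivative):.1e}")
```

With these inputs the mass operator is enhanced by $(\Lambda/E)^2$, the quartic coupling is unsuppressed, and both dimension-6 operators are suppressed by $(E/\Lambda)^2$, matching the tree-level discussion above.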
Fig. 16.2 Some tree and one-loop contributions to the 2-2 scalar scattering amplitude. (Labels surviving from the figure: external momenta $k_1, \ldots, k_4$ and vertex couplings $g_2$, $g_4$.)

When loop effects are included, the situation becomes more complicated, and much
more interesting. If we take 2-2 scattering as a test case, the graphs in Fig. 16.2
illustrate some simple low-order contributions to the process: the lowest-order tree
graph corresponding to the quartic coupling $g_2\phi^4$ (we will drop the $\Lambda$ subscript on
the fields here with the reminder that it simply instructs us to cut off the momenta
on all internal propagators at $|k| = \Lambda$), the three one-loop graphs arising at second
order in $g_2$, and the one-loop graph coming from the first-order contribution of the
dimension 6 “irrelevant” operator $\frac{g_4}{\Lambda^2}\phi^6$. Setting $m^2 = 2a_0 = 2g_0\Lambda^2$, we shall assume
that the momenta in the process and the unperturbed mass $m$ are all much smaller
than the cutoff $\Lambda$. The final graph, arising from contracting two out of the six lines
emerging from the six-point vertex associated with the higher-dimension $\phi^6$ operator,
contains the cutoff one-loop integral
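$$
\int \frac{1}{k^2 + m^2}\, \theta(\Lambda^2 - k^2)\, \frac{d^4k}{(2\pi)^4} = \frac{1}{8\pi^2} \int_0^\Lambda \frac{k^3}{k^2 + m^2}\, dk = \frac{1}{16\pi^2}\Big(\Lambda^2 - m^2 \ln\frac{\Lambda^2}{m^2}\Big) + O\Big(\frac{m^2}{\Lambda^2}\Big) \tag{16.14}
$$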
The loop integral (which in the absence of a cutoff would be quadratically divergent)
therefore produces a large factor proportional to the cutoff squared, which will cancel
the inverse factor of $\Lambda^2$ (in the coupling $\frac{g_4}{\Lambda^2}$) which we have previously used to argue
for the “irrelevance” of the $\phi^6$ operator at low energies. Of course, the result is just a
momentum-independent constant contribution to the amplitude, of exactly the same
form as the tree contribution proportional to $g_2$. In fact, a short calculation (see
Problem 1) gives the following result for the truncated four-point function arising
from the graphs in Fig. 16.2 (normalized to begin with $g_2$)

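$$
\Gamma^{(4)}(k_1, k_2, k_3, k_4) = g_2 - \frac{3}{4\pi^2}\, g_2^2\, \{ I(s, m^2, \Lambda^2) + I(t, m^2, \Lambda^2) + I(u, m^2, \Lambda^2) \} + g_4 \frac{15}{16\pi^2} + O\Big(\frac{m^2, k_i^2}{\Lambda^2}\Big) \tag{16.15}
$$

$$
I(p^2, m^2, \Lambda^2) \equiv \int_0^1 \Big( \ln\Big(\frac{\Lambda^2}{x(1-x)p^2 + m^2}\Big) - 1 \Big)\, dx \tag{16.16}
$$

$$
s \equiv (k_1 + k_2)^2, \quad t \equiv (k_1 - k_3)^2, \quad u \equiv (k_1 - k_4)^2 \tag{16.17}
$$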

We have already encountered the one-loop integral $I$ in Chapter 10 (in Minkowski
space, and with a slightly different notation, cf. (10.35)), where we pointed out that in
a continuum theory it is necessarily ultraviolet-divergent, corresponding in coordinate
space to an ill-defined multiplication of distributions at the same point: here, the
divergence is explicit in the $\ln(\Lambda^2)$ factor, which blows up if we force $\Lambda \to \infty$. As our
theory is cut off, momentum integrals terminate at $\Lambda$, and there are no ultraviolet
divergences at any point. However, we see that the “large” integration range from
the low-energy regime with momenta of order $k_i \sim E \ll \Lambda$ up to the UV cutoff
$\Lambda$ produces loop integrals which can promote the size of the contributions induced
by the “irrelevant” operators (which only correct tree amplitudes by small amounts
proportional to inverse powers of Λ) to values comparable to the marginal or relevant
operators of dimension 4 or less. Indeed, the final term in (16.15) shows that, assuming
that the dimensionless couplings $g_2, g_4, \ldots$ are of order unity4, the momentum modes
of the field between E and Λ, when integrated out in the path integral, produce order
unity modifications (arising from higher-dimension operators) in the effective four-
point coupling strength at the low-energy scale. This occurs simply because the explicit
inverse powers of the cutoff in the coupling factors (16.8) can be cancelled by positive
powers of the cutoff arising from loop-integrals containing vertices corresponding to
these higher-dimension operators.
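
The cancellation of the explicit $1/\Lambda^2$ against the loop factor can be checked numerically. The sketch below integrates the radial form in (16.14) with a simple Simpson rule, compares it with the large-cutoff expression quoted there, and then forms $15 \times (\text{loop})/\Lambda^2$, which should approach the coefficient $15/16\pi^2$ of $g_4$ in (16.15). The helper names and the choice of mass $m = 1$ in arbitrary units are assumptions made for illustration.

```python
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def loop_integral_numeric(cutoff, mass):
    """(1 / 8 pi^2) * integral from 0 to cutoff of k^3 / (k^2 + m^2) dk."""
    radial = simpson(lambda k: k**3 / (k**2 + mass**2), 0.0, cutoff)
    return radial / (8.0 * math.pi**2)


def loop_integral_leading(cutoff, mass):
    """Large-cutoff form quoted in (16.14), dropping the O(m^2/Lambda^2) terms."""
    return (cutoff**2 - mass**2 * math.log(cutoff**2 / mass**2)) / (16.0 * math.pi**2)


MASS = 1.0  # hypothetical mass in arbitrary units
LIMIT = 15.0 / (16.0 * math.pi**2)  # coefficient of g4 in (16.15)

for cutoff in (10.0, 100.0, 1000.0):
    numeric = loop_integral_numeric(cutoff, MASS)
    leading = loop_integral_leading(cutoff, MASS)
    g4_coefficient = 15.0 * numeric / cutoff**2  # the 1/Lambda^2 is cancelled here
    print(f"Lambda={cutoff:7.1f}  numeric={numeric:.6e}  leading={leading:.6e}  "
          f"15*loop/Lambda^2={g4_coefficient:.6f}  (limit {LIMIT:.6f})")
```

As the cutoff grows, the last column settles to a cutoff-independent number of order 0.1: the "irrelevant" $\phi^6$ coupling feeds an unsuppressed contribution into the four-point function.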
The “filtering down” effect from higher- to lower-dimension operators may seem
fairly innocuous for the marginal couplings such as $g_2$, but it implies much more
dramatic consequences for the coefficient of the relevant operators such as the mass
term $g_0\Lambda^2\phi^2$. A classic example is given by the Higgs mass: if the Higgs turns out
to be described by an elementary scalar field, it must have a mass in the range of a
few hundred GeV, to avoid violating the fairly precise agreement obtained between
calculated electroweak radiative corrections and well-measured Standard Model weak
processes. At first, this suggests that we assign a spectacularly small value to the
dimensionless coupling $g_0$, of order $m_{\rm Higgs}^2/\Lambda_{\rm Pl}^2 \sim 10^{-34}$ (using the Planck scale as our
ultraviolet cutoff)! However, even if the dimensionless coupling is set to this value at
some high-energy scale, the effect of integrating out field modes from this scale down to
the energy scale of the electroweak theory will inevitably produce corrections of order
unity, bringing the Higgs mass back up to the range of the UV cutoff, unless the value
of the “bare” coupling $g_0$ at the UV scale is set with extraordinary precision. This
is the famous fine-tuning issue associated with the “hierarchy” problem for scalar
masses, which are not protected from large (i.e., power-like in the cutoff) radiative
corrections, unlike, as we shall see later, fermion masses in gauge theories.
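
The size of the required tuning follows from simple arithmetic. The sketch below is an order-of-magnitude illustration only: it uses the 125 GeV and 10^19 GeV figures that appear in this chapter, and it assumes a radiative shift in $g_0$ of exactly 1 as a stand-in for "order unity"; the variable names are hypothetical.

```python
import math

M_HIGGS_GEV = 125.0        # low-mass Higgs figure quoted in the text
PLANCK_SCALE_GEV = 1.0e19  # Planck scale used as the UV cutoff in the text

# Order-of-magnitude size of the dimensionless mass coupling at low energy.
g0_target = (M_HIGGS_GEV / PLANCK_SCALE_GEV) ** 2

# ASSUMPTION: integrating out the modes between the two scales shifts g0 by an
# amount of order unity; 1.0 is only a placeholder for that order of magnitude.
radiative_shift = 1.0

# The bare coupling must cancel the shift down to g0_target, so it has to be
# specified to roughly this many decimal places.
digits_of_tuning = math.log10(radiative_shift / g0_target)

print(f"g0 needed at low energy  ~ {g0_target:.1e}")
print(f"decimal digits of tuning ~ {digits_of_tuning:.1f}")
```

A cancellation to more than thirty decimal places is what is meant by setting the bare coupling "with extraordinary precision".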
The instability of relevant operators to infection with large radiative corrections
from large energy scales is, of course, worse the lower the dimension of the operator:
consider, for example, the unit operator $O_{-2} \equiv \phi^0 = 1$, of dimension zero, corresponding
to an overall additive constant in the Lagrangian density (and to the zero-point
energy in the Hamiltonian density), which has been ignored by our discussion so far, as
it corresponds to a physically irrelevant additive shift in the energy in flat Minkowski
space. If we imagine coupling our theory to gravity, such a term becomes physically
relevant as a cosmological constant term in the Einstein field equations. On the other
hand, the associated coefficient $a_{-2}$ evidently receives contributions of order $\Lambda^4$ (due
to the associated dimensionless coupling $g_{-2}$ receiving contributions of order unity)
from integrating out field modes from the UV scale $\Lambda$ down to the much lower scale
of astronomical phenomena, which, if we take the UV cutoff as the Planck scale, is
$10^{120}$ times greater than the observed value (if we interpret the presently observed

4 For the purposes of the present discussion, 1/137 is a number of order unity, to be distinguished from
the much tinier ratio of scales, $\sim 10^{-15}$, between, say, the LHC energy $E \sim 10^{4}$ GeV and the Planck energy
$\Lambda \sim 10^{19}$ GeV.

dark-energy effects as arising from a cosmological constant). In comparison to this
“cosmological constant problem”, the fine-tuning required to achieve a suitable low-mass
Higgs (say, of order 125 GeV) is hardly worth mentioning.
This mixing of higher- and lower-dimension operators would seem to complicate
enormously our ability to give a direct quantitative interpretation to the terms in
an effective Lagrangian such as (16.7). The problem would, of course, be greatly
ameliorated if we could ensure that loop contributions to low-energy amplitudes could
not contain integrations over a large momentum range capable of producing large
factors (in the case under discussion, positive powers of $\frac{\Lambda}{E}$): in this case, the order
of magnitude of the contributions of different operators in the effective Lagrangian
could be deduced directly from the size of the coefficient couplings multiplying the
respective operators.
We can see how to achieve this by taking note of a simple property of the Fourier
transform change of functional field variables, which is a linear unitary one, allowing
us to write the path integral (normally given in terms of the coordinate space fields)
in terms of momentum-space modes of the field which can be divided in an obvious
way into successive momentum “shells”. Let us choose an energy scale $\mu \ll \Lambda$, much
smaller than the UV cutoff of the effective theory but still above the energy scale at
which we wish (or are able) to perform experiments, so E < μ. In analogy to (16.4),
we can define “sliced” (or even better, “peeled”) fields:

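$$
\phi_{(\mu,\Lambda)}(x) \equiv \int_{\mu<|k|<\Lambda} \tilde\phi(k)\, e^{-ik\cdot x}\, \frac{d^4k}{(2\pi)^4} \tag{16.18}
$$

$$
\phi_\Lambda(x) = \phi_{(\mu,\Lambda)}(x) + \phi_\mu(x) \tag{16.19}
$$

$$
\int D\phi_\Lambda = \int \prod_{|k|<\Lambda} D\tilde\phi(k) = \int \prod_{|k|<\mu} D\tilde\phi(k) \int \prod_{\mu<|k|<\Lambda} D\tilde\phi(k) = \int D\phi_\mu \int D\phi_{(\mu,\Lambda)} \tag{16.20}
$$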

Of course, in order to generate (by functional differentiation of the generating function
$Z_\Lambda$) the desired n-point scattering amplitudes with momenta of magnitude up to, but
not exceeding, $\mu$, we must include a source function $j(x)$ which needs to contain
momentum Fourier modes only up to, but not exceeding, the scale $\mu$:

$$
j_\mu(x) = \int_{|k|<\mu} \tilde{j}(k)\, e^{-ik\cdot x}\, \frac{d^4k}{(2\pi)^4} \tag{16.21}
$$

As Fourier modes of different momentum are orthogonal (in coordinate space), the
source term in the functional integral (16.5) depends only on the infrared field $\phi_\mu(x)$
(as the overlap of $\phi_{(\mu,\Lambda)}$ and $j_\mu$ vanishes):

$$
\int d^4x\, j_\mu(x)\phi_\Lambda(x) = \int d^4x\, j_\mu(x)\phi_\mu(x) \tag{16.22}
$$
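
The orthogonality statement (16.22) has a simple discrete analogue that can be checked directly. The sketch below is a toy under stated assumptions, not the four-dimensional continuum construction: it uses a one-dimensional periodic lattice of 64 sites, integer mode numbers in place of $|k|$, and hypothetical thresholds `K_MU` and `K_LAMBDA` standing in for $\mu$ and $\Lambda$.

```python
import math
import random

N_SITES = 64   # lattice sites (hypothetical toy size)
K_MU = 5       # highest "infrared" mode number, playing the role of mu
K_LAMBDA = 20  # highest mode number kept at all, playing the role of Lambda

rng = random.Random(0)  # fixed seed so the run is reproducible


def random_modes(k_min, k_max):
    """Random cosine/sine amplitudes for mode numbers k_min..k_max inclusive."""
    return {k: (rng.uniform(-1, 1), rng.uniform(-1, 1))
            for k in range(k_min, k_max + 1)}


def synthesize(modes):
    """Real lattice function built from the given Fourier modes."""
    return [
        sum(a * math.cos(2 * math.pi * k * x / N_SITES)
            + b * math.sin(2 * math.pi * k * x / N_SITES)
            for k, (a, b) in modes.items())
        for x in range(N_SITES)
    ]


def overlap(f, g):
    """Lattice analogue of the integral of f(x) g(x) over x."""
    return sum(fx * gx for fx, gx in zip(f, g))


phi_low = synthesize(random_modes(0, K_MU))                 # analogue of phi_mu
phi_shell = synthesize(random_modes(K_MU + 1, K_LAMBDA))    # analogue of phi_(mu,Lambda)
phi_full = [lo + sh for lo, sh in zip(phi_low, phi_shell)]  # analogue of phi_Lambda
source = synthesize(random_modes(0, K_MU))                  # analogue of j_mu

print("source . shell field:", overlap(source, phi_shell))
print("source . full field :", overlap(source, phi_full))
print("source . low field  :", overlap(source, phi_low))
```

The first overlap vanishes up to rounding error, so the second and third agree: the source couples only to the low modes, which is what allows it to be pulled outside the shell integration.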

If we factor the functional measure in the path integral (16.5) as indicated in (16.20),
we see that the source term can be taken outside the integral over the momentum
shell field $\phi_{(\mu,\Lambda)}$, and the resulting integral used to define a new effective Lagrangian
$\mathcal{L}_\mu(\phi_\mu)$:
   4  4
ZΛ [jμ ] = Dφμ Dφ(μ,Λ) e− d xLΛ (φΛ )+ d xjμ (x)φΛ (x)
   
d4 xjμ (x)φμ (x) d4 xLΛ (φμ +φ(μ,Λ) ))
= Dφμ e Dφ(μ,Λ) e−
  
d4 xLμ (φμ )+ d4 xjμ (x)φμ (x)
≡ Dφμ e− (16.23)

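where

$$\int d^4x\,\mathcal{L}_\mu(\phi_\mu) \equiv -\ln\left(\int \mathcal{D}\phi_{(\mu,\Lambda)}\; e^{-\int d^4x\,\mathcal{L}_\Lambda(\phi_\mu+\phi_{(\mu,\Lambda)})}\right) \tag{16.24}$$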

The new effective Lagrangian, Lμ , can be expanded5 in powers of the infrared field
φμ and its derivatives, just as our original effective Lagrangian defining the theory at
the high scale Λ, but of course, with coefficients which depend on the scale μ:
$$\mathcal{L}_\mu = \frac{1}{2}a'_0(\mu)(\partial_\nu\phi_\mu)^2 + \sum_{n\geq 0} a_n(\mu)O_n(\phi_\mu) + \sum_{n>0} a'_n(\mu)O'_n(\phi_\mu) + \ldots \tag{16.25}$$

Note that the coefficient $a'_0$ (often written Z, the wavefunction renormalization
constant) of the kinetic term (which we were free to choose to be $\tfrac{1}{2}$ at the high scale,
by rescaling the field) now becomes a function of μ as well. The effective running
couplings $a_n(\mu), a'_n(\mu), \ldots$ of course satisfy the boundary condition

$$\begin{aligned}
a'_0(\Lambda) &= 1 \\
a_n(\Lambda) &= a_n = g_n\Lambda^{2-n} \\
a'_n(\Lambda) &= a'_n = g'_n\Lambda^{-n}, \quad n\geq 2
\end{aligned} \tag{16.26}$$
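
The logic of (16.23)-(16.25) can be illustrated with a zero-dimensional toy model, in which two ordinary variables replace the infrared and shell fields. This is only a sketch under assumptions not made in the text: the toy action $S(x,y) = \tfrac12 a x^2 + \tfrac12 b y^2 + \lambda x^2y^2$ and every name below are hypothetical. Integrating out y gives an effective action for x whose expansion contains all even powers of x, with coefficients fixed by the high-scale parameters.

```python
import math

A, B, LAM = 1.0, 4.0, 0.5   # hypothetical parameters of the toy action

def action(x, y):
    """Toy 'high-scale' action: x plays the infrared mode, y the shell mode."""
    return 0.5 * A * x * x + 0.5 * B * y * y + LAM * x * x * y * y

def effective_action_numeric(x, y_max=10.0, steps=20000):
    """Zero-dimensional analog of (16.24): minus the log of the integral over y."""
    h = 2.0 * y_max / steps
    total = sum(math.exp(-action(x, -y_max + (i + 0.5) * h)) for i in range(steps)) * h
    return -math.log(total)

def effective_action_exact(x):
    """Closed form of the Gaussian y-integral."""
    return 0.5 * A * x * x + 0.5 * math.log((B + 2.0 * LAM * x * x) / (2.0 * math.pi))

def effective_action_truncated(x):
    """First terms of the expansion in powers of x, the analog of (16.25)."""
    r = 2.0 * LAM * x * x / B
    return (0.5 * math.log(B / (2.0 * math.pi)) + 0.5 * A * x * x
            + 0.5 * (r - r * r / 2.0 + r ** 3 / 3.0))

for x in (0.0, 0.3, 0.6):
    print(x, effective_action_numeric(x), effective_action_exact(x),
          effective_action_truncated(x))
```

The truncated expansion tracks the exact effective action only for small x, which mirrors the role of the low-momentum expansion in the field-theory case.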

Clearly, we can view a given effective Lagrangian as a point in an infinite-dimensional
space of couplings $g_n, g'_n, \ldots$, and the process of integrating out (partially) the momentum
modes of the field results in a well-defined flow through this space, where, given
a definite starting point at the highest scale Λ, as indicated in (16.26), the structure
of the effective Lagrangian is uniquely specified (corresponding to a unique point
in coupling constant space) at any lower-energy scale μ. The size of the low-energy
effective couplings is now directly associated with the importance of the corresponding
interaction term for low-energy physics, as the large momentum range present in loop
integrals (previously stretching all the way up to Λ) has been eliminated. In the next
section we shall see how to write a precise set of coupled equations—the famous
renormalization group equations—describing this flow.

5 Strictly speaking, the locality of the effective Lagrangian defined by this procedure depends on certain
smoothness properties which are not present with the sharp momentum cutoff envisaged here. In the next
section we shall remedy this difficulty and derive an exact equation for the cutoff dependence of the local
effective Lagrangian which arises once the momentum cutoff is appropriately chosen.

16.4 The renormalization group


The arguments of the preceding section suggest that the low-momentum, or large-
distance physics, of our theory of a self-interacting scalar particle, described at a high
scale by a specified effective Lagrangian, is determined at low energies by an effective
Lagrangian of the same form but with modified couplings associated with each of
the (infinitely many) operators appearing in the expansion of the Lagrangian. Given
definite starting values for the couplings $g_n, g'_n, \ldots$ at the high scale Λ, the process of
integrating out the intermediate modes between μ and Λ should therefore lead to a
new effective Lagrangian at the lower scale with definite couplings $g_n(\mu), g'_n(\mu), \ldots$. We
therefore expect that there should be an (infinite) coupled set of first-order differential
equations describing the flow of the infinite set of couplings: first order, as we expect
a unique solution simply by specifying the initial value of the couplings at the high
scale Λ,

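$$\begin{aligned}
\mu\frac{\partial}{\partial\mu}g_n(\mu) &= \beta_n(g_n(\mu), g'_n(\mu), \ldots) \\
\mu\frac{\partial}{\partial\mu}g'_n(\mu) &= \beta'_n(g_n(\mu), g'_n(\mu), \ldots), \quad \text{etc.}
\end{aligned} \tag{16.27}$$
where the $\beta_n, \beta'_n, \ldots$ are dimensionless functions of the infinite set of dimensionless
couplings $g_n, g'_n, \ldots$. Evidently, the process of successively integrating out momentum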
modes can be viewed in group-theoretical terms: the operation R(μ2 , μ1 ) of integrating
out modes between scales μ2 < μ1 followed by the subsequent process R(μ3 , μ2 ) is evi-
dently equivalent to the single operation R(μ3 , μ1 ), and the set of all such processes is
collectively termed the “renormalization group”. The finite non-linear mappings in the
infinite-dimensional coupling constant space of the theory represented by the abstract
group elements R reduce infinitesimally to the differential form of the Lie generator of
the renormalization group indicated in (16.27), which is called the “renormalization
group equation” of the theory. As mentioned previously, it is the precise mathematical
expression of the change in form of the physics as we examine the theory at different
length (or momentum) scales.
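
The composition property of the operations R can be made concrete with a single-coupling toy flow. The sketch below is an assumption-laden illustration and not the flow of the scalar theory discussed here: the beta function $\beta(g) = b\,g^2$, the coefficient `B_COEFF` and the function names are all hypothetical. It integrates the analog of (16.27) numerically and compares a two-stage flow with a single-stage one.

```python
import math

B_COEFF = 0.1   # hypothetical coefficient in the toy beta function

def beta(g):
    """Toy beta function: mu dg/dmu = B_COEFF * g**2."""
    return B_COEFF * g * g

def flow_exact(mu_to, mu_from, g):
    """Closed-form solution of the toy flow from scale mu_from to scale mu_to."""
    return g / (1.0 - B_COEFF * g * math.log(mu_to / mu_from))

def flow_numeric(mu_to, mu_from, g, steps=2000):
    """Fourth-order Runge-Kutta integration of dg/dt = beta(g), with t = ln(mu)."""
    dt = math.log(mu_to / mu_from) / steps
    for _ in range(steps):
        k1 = beta(g)
        k2 = beta(g + 0.5 * dt * k1)
        k3 = beta(g + 0.5 * dt * k2)
        k4 = beta(g + dt * k3)
        g += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return g

MU1, MU2, MU3 = 100.0, 10.0, 1.0   # descending scales (assumed values)
g_start = 0.5
two_steps = flow_numeric(MU3, MU2, flow_numeric(MU2, MU1, g_start))   # R(mu3, mu2) after R(mu2, mu1)
one_step = flow_numeric(MU3, MU1, g_start)                            # R(mu3, mu1)
print(two_steps, one_step, flow_exact(MU3, MU1, g_start))
```

Because the flow is generated by a first-order equation, the coupling at the lowest scale depends only on the starting value and the endpoints, not on how the interval is subdivided.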
The momentum shell approach described in the preceding section, while physically
intuitive, turns out to be somewhat awkward from an analytical point of view in
deriving the desired Lagrangian flow, as sharp cutoffs in momentum lead to singular
terms when we execute the desired derivatives with respect to scale visible in (16.27).
Instead, we shall (following Polchinski (1984)) use a continuous cutoff, by writing our
general effective Lagrangian as a sum of free and interacting parts, as follows. In the
free part, we introduce a cutoff via a damping function D,

$$\begin{aligned}
S_0[\phi,\Lambda] = \int d^4x\,\mathcal{L}_0(\phi,\Lambda) &= \frac{1}{2}\int d^4x\,\phi(x)(-\Box + m^2)\,D\!\left(-\frac{\Box}{\Lambda^2}\right)\phi(x) \\
&= \frac{1}{2}\int \tilde{\phi}(k)(k^2+m^2)\,D\!\left(\frac{k^2}{\Lambda^2}\right)\tilde{\phi}(-k)\,\frac{d^4k}{(2\pi)^4}
\end{aligned} \tag{16.28}$$
where the function D is essentially unity up to the scale |k| = Λ, then grows exponentially
for |k| > Λ so as to damp the propagator $\frac{1}{D(k^2/\Lambda^2)(k^2+m^2)}$, effectively cutting off
the loop integrals when any internal propagator exceeds momentum Λ. Note that the
field φ in (16.28) is not cut off, but contains all momentum modes and is independent
of the scale at which we are examining the theory, so derivatives with respect to scale
do not affect the fields. (Alternatively, we may simply choose to pick once and for all
a fixed “ultimate” UV cutoff for this field—the Planck scale ΛPl , say—reflecting our
certain knowledge that a representation of the physics in terms of Minkowski space
fields must fail at this point; see below.) The precise form of D is unimportant, but
for definiteness we can take, for example,

D(ρ) = 1 + exp (α(ρ − 1)) (16.29)


with α a large positive constant. Thus, the inverse of the function $D(k^2/\Lambda^2)$ in (16.28)
undergoes a rapid transition from unity for |k| just below Λ to an exponentially small
value for |k| just above Λ. In particular, we can assume that the Λ derivative of D (or its
inverse) is effectively zero for |k| < Λ. The interaction part of the effective Lagrangian
at a given scale Λ, in analogy to (16.7), is given by the usual infinite expansion

$$\mathcal{L}_{\rm int}(\phi,\Lambda) = \sum_{n\geq 0}\left(a_n(\Lambda)O_n + a'_n(\Lambda)O'_n + \ldots\right) \tag{16.30}$$

and contains all possible local field dependent terms: accordingly, the sums begin at
n = 0, including the quadratic mass $O_0 = \phi^2$ and kinetic $O'_0 = (\partial_\nu\phi)^2$ terms, whose
coefficients must be allowed to change as we change the scale. We shall assume, as
previously, that we are only interested in the physics up to some scale μ much lower
than a “top” UV scale $\Lambda_{UV}$ at which the “bare” couplings $g_n, g'_n$ are initially set, with
$a_n = g_n\Lambda_{UV}^{2-n}$, $a'_n = g'_n\Lambda_{UV}^{-n}$ as before (cf. (16.26)). Accordingly, the external source j(x)
introduced to probe field modes associated with the desired scattering amplitudes need
contain only momentum modes up to μ:

j̃μ (k) = 0, |k| > μ (16.31)

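The generating functional describing the physics at any scale Λ is given by the path
integral

$$\begin{aligned}
Z_\Lambda[j_\mu] &= \int \mathcal{D}\phi\; e^{-\int d^4x\,(\mathcal{L}_0(\phi,\Lambda)+\mathcal{L}_{\rm int}(\phi,\Lambda)) + \int d^4x\, j_\mu(x)\phi(x)} \\
&\equiv \int \mathcal{D}\phi\; e^{-S_0[\phi,\Lambda]-S_{\rm int}[\phi,\Lambda] + \int \tilde{j}_\mu(q)\tilde{\phi}(-q)\frac{d^4q}{(2\pi)^4}}
\end{aligned} \tag{16.32}$$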

We now claim that there exists a unique evolution with Λ of the interaction Lagrangian
Lint (φ, Λ) (from Λ = ΛU V down to Λ = μ) such that the low-energy physics is exactly
preserved—in other words, which leaves the generating functional W [jμ ] = ln Z[jμ ]
of the connected low-momentum amplitudes of the theory invariant up to source-
independent terms:


$$\frac{\partial}{\partial\Lambda}W_\Lambda[j_\mu] = \text{independent of } j_\mu \tag{16.33}$$

Thus, when we perform functional derivatives with respect to jμ to extract the con-
nected low-momentum n-point amplitudes of the theory, the Λ dependence disappears
(for any Λ in the range μ < Λ < ΛU V ), as long as we use the effective Lagrangian
Lint (φ, Λ) appropriate for that scale.
The derivation of the renormalization group flow equation for the Lagrangian
$\mathcal{L}_{\rm int}$ is facilitated by a functional integral identity, based on the observation that the
functional integral of a total functional derivative vanishes provided the integrand has
the usual falloff (in our case, exponential) for large values of the field. Namely, we have

$$\int \mathcal{D}\tilde{\phi}\;\frac{\delta}{\delta\tilde{\phi}(k)}\left\{\left(\tilde{\phi}(k)D(k^2/\Lambda^2) + \frac{1}{2}\frac{(2\pi)^4}{k^2+m^2}\frac{\delta}{\delta\tilde{\phi}(-k)}\right) e^{-S_0[\phi,\Lambda]-S_{\rm int}[\phi,\Lambda]+\int\tilde{j}_\mu(q)\tilde{\phi}(-q)\frac{d^4q}{(2\pi)^4}}\right\} = 0 \tag{16.34}$$

This functional identity holds for all values of the momentum k, but for reasons shortly
to become apparent we shall apply it only in the regime |k| > μ, where by assumption
j̃μ (k) = 0. Thus, when working out the functional derivatives in (16.34), we can ignore
any factors of j̃μ (k) (or j̃μ (−k)) that appear. We shall also suppose that our fields are
restricted to a large spacetime box of volume V , so that infrared singular functional
derivatives such as

$$\frac{\delta}{\delta\tilde{\phi}(k)}\tilde{\phi}(k) = \delta^4(0) = \frac{1}{(2\pi)^4}\int d^4x\, e^{i0\cdot x} = \frac{V}{(2\pi)^4} \tag{16.35}$$

are given a definite meaning. (These terms will, in any case, later turn out to be irrel-
evant disconnected vacuum terms.) Carrying out the indicated functional derivatives
in (16.34), and using

$$\frac{\delta S_0}{\delta\tilde{\phi}(-k)} = \frac{k^2+m^2}{(2\pi)^4}\,D(k^2/\Lambda^2)\,\tilde{\phi}(k) \tag{16.36}$$

one finds after a short calculation (see Problem 2) that it can be rewritten as

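$$\begin{aligned}
\frac{1}{2}D\!\left(\frac{k^2}{\Lambda^2}\right)\delta^4(0)\,Z_\Lambda[j_\mu] = \int \mathcal{D}\tilde{\phi}\;\Big\{ &\frac{1}{2}\frac{k^2+m^2}{(2\pi)^4}\,D\!\left(\frac{k^2}{\Lambda^2}\right)^{2}\tilde{\phi}(k)\tilde{\phi}(-k) \\
&+ \frac{1}{2}\frac{(2\pi)^4}{k^2+m^2}\left[\frac{\delta^2 S_{\rm int}}{\delta\tilde{\phi}(k)\,\delta\tilde{\phi}(-k)} - \frac{\delta S_{\rm int}}{\delta\tilde{\phi}(k)}\frac{\delta S_{\rm int}}{\delta\tilde{\phi}(-k)}\right]\Big\}\; e^{-S_0-S_{\rm int}+\int j_\mu\phi\, d^4x}
\end{aligned} \tag{16.37}$$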

Note that the two terms in curly braces in (16.34) are chosen such that a contribution
of the form $\tilde{\phi}(k)D(k^2/\Lambda^2)\frac{\delta S_{\rm int}}{\delta\tilde{\phi}(k)}$ cancels between them, leaving just the terms given
here.
We can now return to our main focus: how to choose the effective Lagrangian
$\mathcal{L}_{\rm int}(\phi,\Lambda)$ at any given scale Λ to ensure that we obtain the same low-momentum
amplitudes, by functionally differentiating the generating functional $Z_\Lambda[j_\mu]$. A glance
at (16.32) shows that the differential variation of this functional with Λ arises from
two sources: the Λ dependence of the propagator via the cutoff function $D(k^2/\Lambda^2)$
embedded in the free Lagrangian $\mathcal{L}_0$, and the Λ-dependence of the “interaction” part
$\mathcal{L}_{\rm int}$ (through the Λ-dependent coupling parameters contained in the latter). In fact,
we clearly have, differentiating (16.32),

$$\Lambda\frac{\partial Z_\Lambda[j_\mu]}{\partial\Lambda} = -\int \mathcal{D}\tilde{\phi}\;\left\{\frac{1}{2}\int (k^2+m^2)\,\tilde{\phi}(k)\tilde{\phi}(-k)\,\Lambda\frac{\partial D(k^2/\Lambda^2)}{\partial\Lambda}\frac{d^4k}{(2\pi)^4} + \Lambda\frac{\partial S_{\rm int}}{\partial\Lambda}\right\} e^{-S_0-S_{\rm int}+\int j_\mu\phi\, d^4x} \tag{16.38}$$

Our choice of cutoff function $D(k^2/\Lambda^2)$ implies (see (16.29)) that derivatives of D
with respect to the scale Λ, if we keep Λ above the infrared scale μ, are exponentially
negligible (of order $e^{-\alpha(1-k^2/\Lambda^2)}$) in the infrared region |k| < μ < Λ. The support of
both sides of the following identity is therefore precisely in the region of validity |k| > μ
of (16.37):

$$\Lambda\frac{\partial D(k^2/\Lambda^2)}{\partial\Lambda} = -D\!\left(\frac{k^2}{\Lambda^2}\right)^{2}\Lambda\frac{\partial D^{-1}}{\partial\Lambda} \tag{16.39}$$
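
Both statements just made, the identity (16.39) and the smallness of the Λ derivative of D in the infrared, can be checked numerically for the specific damping function (16.29). The Python sketch below is illustrative only; the value of α, the sample momenta, the finite-difference step and the function names are assumptions.

```python
import math

ALPHA = 20.0   # hypothetical sharpness parameter alpha in (16.29)

def damping(k, cutoff):
    """Damping function D(rho) = 1 + exp(alpha (rho - 1)) with rho = k^2 / cutoff^2."""
    return 1.0 + math.exp(ALPHA * (k * k / (cutoff * cutoff) - 1.0))

def inverse_damping(k, cutoff):
    """The inverse 1/D, which multiplies the propagator."""
    return 1.0 / damping(k, cutoff)

def log_derivative(func, k, cutoff, rel_step=1e-6):
    """Central finite-difference estimate of cutoff * d func / d cutoff at fixed k."""
    h = rel_step * cutoff
    return cutoff * (func(k, cutoff + h) - func(k, cutoff - h)) / (2.0 * h)

CUTOFF = 1.0
for k in (0.5, 0.9, 1.0, 1.1):
    lhs = log_derivative(damping, k, CUTOFF)                                      # left side of (16.39)
    rhs = -damping(k, CUTOFF) ** 2 * log_derivative(inverse_damping, k, CUTOFF)   # right side of (16.39)
    print(k, lhs, rhs)
```

For momenta well below the cutoff the derivative is tiny, while near |k| = Λ it is of order α, as expected for a smoothed step.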
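Using (16.39), (16.38) can be re-expressed

$$\Lambda\frac{\partial Z_\Lambda[j_\mu]}{\partial\Lambda} = \int \mathcal{D}\tilde{\phi}\;\left\{\frac{1}{2}\int (k^2+m^2)\,D\!\left(\frac{k^2}{\Lambda^2}\right)^{2}\tilde{\phi}(k)\tilde{\phi}(-k)\,\Lambda\frac{\partial D^{-1}}{\partial\Lambda}\frac{d^4k}{(2\pi)^4} - \Lambda\frac{\partial S_{\rm int}}{\partial\Lambda}\right\} e^{-S_0-S_{\rm int}+\int j_\mu\phi\, d^4x} \tag{16.40}$$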

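Comparing (16.37) with (16.40), we see that by setting

$$\Lambda\frac{\partial S_{\rm int}}{\partial\Lambda} = \frac{(2\pi)^4}{2}\int \Lambda\frac{\partial D(k^2/\Lambda^2)^{-1}}{\partial\Lambda}\left\{\frac{\delta S_{\rm int}}{\delta\tilde{\phi}(k)}\frac{\delta S_{\rm int}}{\delta\tilde{\phi}(-k)} - \frac{\delta^2 S_{\rm int}}{\delta\tilde{\phi}(k)\,\delta\tilde{\phi}(-k)}\right\}\frac{d^4k}{k^2+m^2} \tag{16.41}$$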

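and using the identity obtained by integrating both sides of (16.37) with the measure
$\int d^4k\,\Lambda\frac{\partial D^{-1}}{\partial\Lambda}\ldots$, we obtain

$$\Lambda\frac{\partial Z_\Lambda[j_\mu]}{\partial\Lambda} = \frac{1}{2}\delta^4(0)\int \Lambda\frac{\partial\ln(D(k^2/\Lambda^2))}{\partial\Lambda}\,d^4k\cdot Z_\Lambda[j_\mu] \tag{16.42}$$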

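or, equivalently, as desired (see (16.33)),

$$\Lambda\frac{\partial W_\Lambda[j_\mu]}{\partial\Lambda} = \frac{\Lambda}{Z_\Lambda}\frac{\partial Z_\Lambda[j_\mu]}{\partial\Lambda} = \frac{1}{2}\delta^4(0)\int \Lambda\frac{\partial\ln(D(k^2/\Lambda^2))}{\partial\Lambda}\,d^4k = \text{independent of } j_\mu \tag{16.43}$$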
The peculiar right-hand side appearing here, which must be associated with discon-
nected vacuum graphs6 which do not contribute to the connected n-point functions
obtained by differentiating WΛ with respect to the low-momentum source jμ , can easily

6 The reader may find it convenient at this point to review the discussion of disconnected graphs and
vacuum energy in Section 10.2.
be seen to arise from the cutoff-dependence of the zero-point energy associated with
the free Lagrangian L0 (see Problem 3).
The equation (16.41) gives the desired variation in the form of the effective
Lagrangian with the scale at which we probe the physics. Moreover, given that
the starting effective Lagrangian (at the UV cutoff) yields a well-defined convergent
functional integral representation for ZΛ , this renormalization group equation is non-
perturbatively valid, as it is based on exact manipulations of the functional integral.
We note immediately that the space of free Lagrangians (i.e., those Lagrangians
quadratic in the field, but with arbitrarily many spacetime-derivatives) is preserved
under renormalization group transformations, as $\Lambda\frac{\partial S_{\rm int}}{\partial\Lambda}$ is clearly quadratic in the
fields (ignoring physically irrelevant constant terms) if $S_{\rm int}$ is. However, if there are
interactions present (with our φ → −φ symmetry, terms quartic or higher in the fields),
the non-linear functional equation (16.41) produces an infinite-dimensional mixing of
the operators in the general expansion (16.30). The reason for this is simply that this
expansion, re-expressed in terms of momentum-space fields, can be written

$$S_{\rm int} = \sum_L \frac{1}{L!}\int h_L(k_1, k_2, \ldots, k_L; \Lambda)\,\delta^4\Big(\sum_i k_i\Big)\,\tilde{\phi}(k_1)\tilde{\phi}(k_2)\cdots\tilde{\phi}(k_L)\prod_i d^4k_i \tag{16.44}$$
where the functions $h_L$ are scalar functions (under Euclidean rotations of their four-momentum
arguments) expandable in powers of their momentum arguments. Inserting
this form in (16.41) we find that the form of the effective Lagrangian is preserved under
renormalization group transformation

$$\Lambda\frac{\partial S_{\rm int}}{\partial\Lambda} = \sum_L \frac{1}{L!}\int \left(\hat{h}^{(1)}_L(k_1,\ldots,k_L) - \hat{h}^{(2)}_L(k_1,\ldots,k_L)\right)\delta^4\Big(\sum_i k_i\Big)\,\tilde{\phi}(k_1)\cdots\tilde{\phi}(k_L)\prod_i d^4k_i \tag{16.45}$$
where the functions

$$\hat{h}^{(1)}_L(k_1,\ldots,k_L) = \sum_{M+N=L}\frac{L!}{M!\,N!}\int \mathcal{F}(k^2)\,h_{M+1}(k, k_1,\ldots,k_M)\,h_{N+1}(-k, k_{M+1}, k_{M+2},\ldots,k_L)\,d^4k \tag{16.46}$$
$$\hat{h}^{(2)}_L(k_1,\ldots,k_L) = \int \mathcal{F}(k^2)\,h_{L+2}(k, -k, k_1, k_2,\ldots,k_L)\,d^4k \tag{16.47}$$
$$\mathcal{F}(k^2) \equiv \frac{(2\pi)^4}{2}\frac{1}{k^2+m^2}\,\Lambda\frac{\partial D(k^2/\Lambda^2)^{-1}}{\partial\Lambda} \tag{16.48}$$
are Taylor expandable in their momentum arguments, which holds in our case given
our choice of a smooth cutoff function $D(k^2/\Lambda^2)$ in (16.29). This analyticity property
of the regularization used to smear the short-distance behavior of the theory is essential in
maintaining the desired locality properties of our theory: i.e., in ensuring that the
effective Lagrangian at any scale can be expressed as an infinite sum of local terms
(products of the fields and their derivatives at a single spacetime point). The upshot of
this whole discussion is that the cutoff variation of the effective Lagrangian amounts
to an infinite set of non-linear first-order differential equations among the coefficients
of the local operator basis in terms of which we choose to express our cutoff theory,
exactly as expressed in the renormalization group equations (16.27).

Fig. 16.3 Graphical representation of the renormalization group evolution of the effective
Lagrangian.

The physical interpretation of the two terms in (16.46, 16.47) is illustrated in
Fig. 16.3. The graph on the left (corresponding to (16.46)) illustrates the differential
change in a typical effective vertex (in this case, six-point) due to the differential vari-
ation F of the propagator, represented by the thick line, which in this case connects
two vertices as an internal line in a tree graph. The graph on the right (corresponding
to (16.47)) describes the variation in the effective vertex (in this case, a four-point
vertex) due to the differential variation of the cutoff propagator in an internal line in
a loop graph.
We have already seen an explicit example of the effect of the latter term in (16.15),
where the irrelevant term $\frac{g_4}{\Lambda^2}\phi^6$ was shown to lead to an order unity modification of
the marginal four-point vertex (unsuppressed by inverse powers of the high scale) as a
consequence of the large loop integral in the final graph of Fig. 16.2. Let us see how to
reproduce this result from the point of view of our new renormalization group equation.
We shall assume that at the initial high scale Λ, all the dimensionless couplings except
$g_2$ and $g_4$ are negligible, and we shall also ignore terms of order $g_2^2$, relative to $g_2$ and
$g_4$. The evolution of the $g_4$ vertex is determined by terms of order $g_2^2$ (from (16.46))
or $g_6$ (from (16.47)) both of which we shall neglect: we thereby conclude that to the
desired accuracy $\Lambda\frac{\partial}{\partial\Lambda}\frac{g_4}{\Lambda^2}$ is negligible, and we may replace $\frac{g_4(\Lambda)}{\Lambda^2}$ by $\frac{g_4(\mu)}{\mu^2}$ at any lower
scale μ. The evolution of $g_2$ arising from (16.47) is determined by

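$$\begin{aligned}
\Lambda\frac{\partial g_2}{\partial\Lambda} &= -\frac{(2\pi)^4}{2}\frac{30}{(2\pi)^8}\frac{g_4(\Lambda)}{\Lambda^2}\int \Lambda\frac{\partial D(k^2/\Lambda^2)^{-1}}{\partial\Lambda}\frac{d^4k}{k^2+m^2} \\
&\approx -\frac{(2\pi)^4}{2}\frac{g_4(\mu)}{\mu^2}\frac{30}{(2\pi)^8}\int \Lambda\frac{\partial D(k^2/\Lambda^2)^{-1}}{\partial\Lambda}\frac{d^4k}{k^2+m^2}
\end{aligned} \tag{16.49}$$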

Integrating (16.49) from Λ down to the infrared scale μ,

$$g_2(\mu) = g_2(\Lambda) + 15\,\frac{g_4(\mu)}{\mu^2}\int \left(D(k^2/\Lambda^2)^{-1} - D(k^2/\mu^2)^{-1}\right)\frac{1}{k^2+m^2}\frac{d^4k}{(2\pi)^4} \tag{16.50}$$

We may approach the sharp momentum cutoff used in Section 16.3 by choosing a
large value for the parameter α in (16.29), whereupon we may replace $D(k^2/\Lambda^2)^{-1}$ by
a step function θ(Λ − |k|). The integral in (16.50) is then restricted to the momentum
shell μ < |k| < Λ, and we find (replacing $\frac{g_4(\mu)}{\mu^2}$ by $\frac{g_4(\Lambda)}{\Lambda^2}$ as indicated above)

$$g_2(\mu) = g_2(\Lambda) + \frac{15}{16\pi^2}\,g_4(\Lambda) + O\!\left(\frac{m^2, \mu^2}{\Lambda^2}\right) \tag{16.51}$$
which can be seen to agree with the order unity shift in g2 induced by the six-point
vertex obtained earlier in (16.15).
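
The step from (16.50) to (16.51) rests on an elementary momentum-shell integral, which can be verified numerically. The following Python sketch assumes a sharp cutoff (the large-α limit) and uses made-up values for μ, Λ, m and $g_4(\Lambda)$; the function names are hypothetical.

```python
import math

def shell_integral_numeric(mu, lam, m, steps=100000):
    """Midpoint-rule value of the integral of 1/(k^2 + m^2) over mu < |k| < lam,
    with measure d^4k / (2 pi)^4 = k^3 dk / (8 pi^2) after the angular integration."""
    h = (lam - mu) / steps
    total = 0.0
    for i in range(steps):
        k = mu + (i + 0.5) * h
        total += k ** 3 / (k * k + m * m)
    return total * h / (8.0 * math.pi ** 2)

def shell_integral_exact(mu, lam, m):
    """Closed form of the same radial integral."""
    log_term = math.log((lam * lam + m * m) / (mu * mu + m * m))
    return (lam * lam - mu * mu - m * m * log_term) / (16.0 * math.pi ** 2)

MU, LAM, M = 1.0, 100.0, 0.5   # hypothetical scales with m, mu << Lambda
g4 = 1.0                        # hypothetical value of g_4(Lambda)
shift = 15.0 * g4 / LAM ** 2 * shell_integral_numeric(MU, LAM, M)   # g_2(mu) - g_2(Lambda)
print(shift, 15.0 * g4 / (16.0 * math.pi ** 2))
```

The computed shift differs from $\frac{15}{16\pi^2}g_4(\Lambda)$ only by terms suppressed by $\mu^2/\Lambda^2$ and $m^2/\Lambda^2$, in line with the remainder quoted in (16.51).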
The renormalization group flow equation (16.45) gives an exact description of
the appropriate form taken by the dynamics of the theory once phenomenologically
inaccessible short-distance modes of the field are averaged out, but it nevertheless
leaves us with a complicated, and not very practical, end result, as our effective
Lagrangian contains an infinite number of terms. Although by lowering the cutoff
from some very high (and experimentally unreachable) value ΛU V to a value μ
close to experimental energies we have ensured that large loop integrals involving
powers of ratios of ΛU V to the low scale μ have been eliminated, there remains the
obvious difficulty that the calculation of a low-energy amplitude seems to require
the inclusion of contributions from an infinite number of vertices in the low-energy
Lagrangian.
It is a remarkable property of local quantum field theory that for a certain
subset of theories, the sensitivity of the low-energy amplitudes to all but a finite
number of coupling parameters—in particular, those associated with the marginal
and relevant operators in the effective Lagrangian—is reduced to inverse powers of
the high cutoff. From the renormalization group point of view, this occurs because the
renormalization group flow has the property that the point describing the “location”
of the Lagrangian in the infinite-dimensional coupling space of the $g_n(\mu), g'_n(\mu), \ldots$ is
attracted, for $\mu \ll \Lambda_{UV}$, onto a finite-dimensional submanifold (of dimension equal to
the number of marginal and relevant operators), up to corrections of order $\frac{\mu}{\Lambda_{UV}}$ to some
(typically even) power (modulo logarithms of $\frac{\mu}{\Lambda_{UV}}$). As a consequence, up to usually
negligible corrections, we find that for these theories the low-energy amplitudes can be
parameterized by just a finite set of parameters—namely, those needed to locate the
theory on the finite-dimensional attractive submanifold, and which can in principle be
determined by making an equal number of independent experimental measurements.
The insensitivity asserted here is actually demonstrated in a perturbative setting: one
shows that the formal expansion of an arbitrary scattering amplitude in powers of the
marginal and relevant interaction couplings defined at low momentum, holding the
irrelevant couplings fixed at the high cutoff scale $\Lambda_{UV}$, depends on the latter only through
inverse powers of $\Lambda_{UV}$. We then say that the marginal and relevant operators of the
effective theory form a “perturbatively renormalizable set”. Exactly how this works
will be the topic of Section 17.4 in the next Chapter.
Although the physical content of the renormalization group is most easily displayed
using momentum cutoff regularization schemes of the type we have used up to this
point, such schemes have distinct disadvantages from a calculational point of view
once one goes beyond the lowest orders of perturbation theory. Moreover, in theories
with local gauge symmetry, such cutoff schemes turn out to be incompatible with the
local symmetry, with the unwanted result that the renormalization group evolution
necessarily introduces non-gauge-invariant operators which greatly complicate the
renormalization group analysis of the theory, and would in fact not be needed if
the theory could be regularized in a manner compatible with local gauge symmetry.
In the next section we shall discuss alternative approaches to the regularization
of local field theory which allow a more efficient application of the insights of the
renormalization group.

16.5 Regularization methods in field theory


We have so far been discussing the behavior of local field theories under change of scale
in terms of fields and field products whose matrix elements are made well-defined by
building in an explicit cutoff in the momentum-space Fourier transform modes of
the fields. This sort of cutoff has a clear physical connection to our coordinate space
intuition, whereby the ability to probe sensitivity to higher-momentum field modes is
directly correlated to our ability to expose “finer” details of the interactions at ever
shorter distances. Alternatively, we may use a spacetime lattice cutoff, whereby the
continuum theory (should one exist) is approached by taking the spacing a between the
points of our hypercubical lattice to zero, and where (on an infinitely extended lattice)
the Fourier momenta assigned to the fields range continuously over the finite interval
$-\frac{\pi}{a} < k_\mu < \frac{\pi}{a}$. Lattice cutoffs are especially valuable in non-perturbative formulations
of field theory (via the Euclidean functional integral): indeed, rigorous constructive
proofs of the existence of the continuum limit for super-renormalizable theories in less
than four spacetime dimensions make extensive use of this method of provisionally
defining the theory, as a prelude to the proof of existence of the continuum (a → 0)
and “thermodynamic” (V → ∞, where V is the spacetime volume) limits.7
A renormalization group approach based on lattice fields is also possible, in analogy
to the “momentum shell” methods discussed above: one defines progressively “coarser”
fields, with momentum components restricted at each discrete stage to one half the
range of the previous fields, by forming “block averages” of the fields over hypercubical
sublattices (consisting, in four dimensions, of the fields at the sixteen lattice points
equidistant from the points of a dual lattice interspaced with the original one and with
twice the lattice-spacing). This block renormalization technique was pioneered, and
has been widely used, in the study of critical phenomena in spin systems, but as with
the momentum-shell methods of the preceding sections, turns out to be somewhat
clumsy, and beset with undesirable technical drawbacks when we come to the study of
the renormalization group behavior of the four-dimensional field theories (especially
gauge theories) of relevance to the Standard Model of elementary particle physics (and
its potential extensions at higher energy).
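
The effect of block averaging on the momentum content of a lattice field can be seen in a minimal example. The Python sketch below is a deliberately simplified illustration: it works in one dimension with blocks of two neighboring sites, rather than the sixteen-site hypercubical blocks of the four-dimensional construction, and the names `block_average`, `smooth` and `rough` are hypothetical.

```python
import math

def block_average(field):
    """Replace each pair of neighboring sites by its mean (blocks of 2 sites in 1D)."""
    return [0.5 * (field[2 * i] + field[2 * i + 1]) for i in range(len(field) // 2)]

SITES = 16
smooth = [math.cos(2.0 * math.pi * x / SITES) for x in range(SITES)]   # lowest nonzero momentum
rough = [(-1.0) ** x for x in range(SITES)]                            # momentum pi/a, the zone edge

coarse_smooth = block_average(smooth)   # long-wavelength mode survives, slightly reduced
coarse_rough = block_average(rough)     # zone-edge mode is averaged away
sites_per_block_4d = 2 ** 4             # a 2x2x2x2 hypercube holds sixteen sites
print(coarse_smooth, coarse_rough, sites_per_block_4d)
```

The coarse field lives on a lattice with twice the spacing, so its momenta cover half the original range, and the mode at the edge of the original zone is removed entirely.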
In our discussion of effective Lagrangians up to this point, the theory is specified
at any given cutoff scale Λ by a formal expansion containing an infinite number
of scalar operators involving arbitrarily many powers of the field (and derivatives
thereof), multiplied at the same spacetime point. From our discussion of the Wightman

7 The use of the term “thermodynamic” here does not imply any connection to finite-temperature
phenomena: it is a carry-over from the close formal analogy between the Euclidean quantum functional
integral and the canonical partition sums of classical thermodynamics.
formalism, we are already familiar with the notion that local quantum fields should
really be regarded as operator-valued distributions, and that the multiplication of such
distributions can lead to ambiguities (or singularities) in the continuum theory. Thus,
the matrix elements of the operators $O_n$, $O'_n$, etc., defined in Section 16.2 are actually
infinite, if we insist on working in a continuum theory where the UV cutoff is infinite.
Of course, the whole point of the philosophy espoused here is that a cutoff is not only
technically but physically required, and with the momentum cutoff in place (say, by
employing the modified propagator (16.29)), there are no ultraviolet divergences in any
of the loop integrals we encounter, so the operators of the theory have perfectly well-
defined matrix elements (and lead to well-defined perturbative corrections to n-point
functions of the theory when inserted into the graphs for some process). The actual
value of these matrix elements will, of course, depend on the cutoff, so we must keep
in mind that operators such as $O_0(x) \equiv \phi^2(x)$ or $O'_0(x) \equiv (\partial_\nu\phi(x))^2$ have, strictly
speaking, no meaning until we specify an ultraviolet regularization procedure, such
as (in the momentum-shell framework) a value for the UV cutoff Λ, and the specific
form of the cutoff (e.g., the smooth function (16.29)). The renormalization group flow
equation (16.41) of the preceding section expresses the fact that the infinite set of
operators so defined, at any given scale Λ, form a complete set, in the sense that
we need only alter their coefficients in the effective Lagrangian in order to obtain an
exactly equivalent description of the low-energy amplitudes of the theory at any other
scale μ < Λ. We once again remind the reader that this equation is an exact non-
perturbative statement about the amplitudes of the effective field theory, assuming
only that we start with a well-defined functional integral at the high scale: in the
proof of (16.41), we have employed only exact functional integral identities, with no
need to expand the exponent of the functional integral in a perturbative series.
Let us explore in a little more detail the freedom we have to choose different sets of
operator products in an effective field theory without altering the physical content of
the theory. At this point we shall resort to perturbation theory to gain some concrete
intuition about the variability entailed by this freedom of choice. Staying for the time
being with the momentum cutoff approach, let us consider the one particle to one
particle matrix elements of $O_0(x) \equiv \phi^2(x)$,

$$\langle k'|O_0(x)|k\rangle = e^{iq\cdot x}\langle k'|O_0(0)|k\rangle, \quad q \equiv k' - k \tag{16.52}$$

Ignoring uninteresting initial and final-state factors, the matrix element $\langle k'|O_0(0)|k\rangle$
receives, in addition to the tree-graph contribution (first graph in Fig. 16.4), a one-loop
contribution of order $g_2$ from the graph on the right in Fig. 16.4:

$$\langle k'|O_0(0)|k\rangle = 1 - 12g_2\int \frac{D(l^2/\Lambda^2)^{-1}D((q-l)^2/\Lambda^2)^{-1}}{(l^2+m^2)((q-l)^2+m^2)}\frac{d^4l}{(2\pi)^4} + \cdots \tag{16.53}$$
$$\phantom{\langle k'|O_0(0)|k\rangle} = 1 - 12g_2\,I(q^2;\Lambda^2,m^2) + \cdots \tag{16.54}$$

where the dots represent other perturbative corrections which are not of interest to
us presently. The one-loop integral I(q 2 ; Λ2 , m2 ) can be expanded in powers of the
momentum variable q << Λ, m (see Problem 4):

$$I(q^2;\Lambda^2,m^2) = \sum_n f_n\!\left(\frac{\Lambda^2}{m^2}\right)\left(\frac{q^2}{m^2}\right)^n \tag{16.55}$$

Fig. 16.4 Low-order contributions to a matrix element of $\phi^2(x)$.

With a little thought one establishes that the coefficient functions fn , which contain
the dependence on the UV cutoff Λ, and therefore incorporate the conventionality of
our particular regularization of the operator O0 (x), contain at worst a logarithmic
divergence $\ln(\Lambda^2/m^2)$ in the limit Λ → ∞, plus vanishing corrections involving inverse
powers of the cutoff. For example, taking the first term in the low-momentum expan-
sion, and assuming the parameter α in (16.29) large, so that the cutoff is essentially a
step function at Λ, we find

$$f_0\!\left(\frac{\Lambda^2}{m^2}\right) = \frac{1}{16\pi^2}\left(\ln(\Lambda^2/m^2) - 1 + O\!\left(\frac{m^2}{\Lambda^2}\right)\right) \tag{16.56}$$
with the $f_n(\frac{\Lambda^2}{m^2})$ for n ≥ 1 given by dimensionless constants plus corrections of $O(\frac{m^2}{\Lambda^2})$.
A similar calculation, again including just the two graphs appearing in Fig. 16.4, gives
for the one-particle matrix element of $O'_0 = (\partial_\nu\phi)^2$ (the only difference being the
appearance of a dot product of the four-momenta entering and leaving the two-point
vertex of the $O'_0$ operator),

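$$\begin{aligned}
\langle k'|O'_0(0)|k\rangle &= k\cdot k' - 12g_2\int \frac{l\cdot(l-q)\,D(l^2/\Lambda^2)^{-1}D((q-l)^2/\Lambda^2)^{-1}}{(l^2+m^2)((q-l)^2+m^2)}\frac{d^4l}{(2\pi)^4} + \cdots \\
&= k\cdot k' - 12g_2\,I'(q^2;\Lambda^2,m^2) + \cdots
\end{aligned} \tag{16.57}$$

where

$$I'(q^2;\Lambda^2,m^2) = m^2\sum_n f'_n\!\left(\frac{\Lambda^2}{m^2}\right)\left(\frac{q^2}{m^2}\right)^n \tag{16.58}$$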

In this case a quadratic dependence on the cutoff appears in the leading coefficient
function $f'_0$. Again, taking α large so that $D(l^2/\Lambda^2)^{-1}$ is approximately a step function
θ(Λ − |l|), one finds

$$f'_0\!\left(\frac{\Lambda^2}{m^2}\right) = \frac{1}{16\pi^2}\left(\frac{\Lambda^2}{m^2} - 2\ln\!\left(\frac{\Lambda^2}{m^2}\right) + 1\right) \tag{16.59}$$
$$f'_1\!\left(\frac{\Lambda^2}{m^2}\right) = \frac{1}{16\pi^2}\left(-\frac{1}{2}\ln\!\left(\frac{\Lambda^2}{m^2}\right) + \frac{2}{3}\right) \tag{16.60}$$

while the $f'_n$ are Λ independent constants for n ≥ 2. Of course, for a more general
choice of cutoff function (for example, keeping the parameter α in the cutoff function
finite), the coefficients $f_n$, $f'_n$ (and their generalizations to all orders of perturbation
theory, as well as the corresponding coefficients for all possible local operators) will
be different, although exactly the same low-energy physics can be reproduced by an
appropriate (different) linear combination of the new set of regularized operators as
defined by the altered cutoff method, as we have seen in the preceding section. The
appearance of power-dependence (quadratic, in the case of (16.59)) on the ultraviolet
cutoff Λ in loop integrals, as we have already seen in Section 16.2, is responsible for
the mixing of operators of different mass dimension in the momentum cutoff approach
to the renormalization group.
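
The leading coefficients (16.56) and (16.59) can be reproduced by direct numerical integration of the sharp-cutoff loop integrals at q = 0. The Python sketch below is a check under stated assumptions: a step-function cutoff, illustrative values Λ = 100 and m = 1, and hypothetical function and variable names.

```python
import math

def radial_integral(power, lam, m, steps=100000):
    """Midpoint-rule value of the integral of l^power / (l^2 + m^2)^2 over 0 < l < lam."""
    h = lam / steps
    total = 0.0
    for i in range(steps):
        l = (i + 0.5) * h
        total += l ** power / (l * l + m * m) ** 2
    return total * h

LAM, M = 100.0, 1.0   # hypothetical cutoff and mass with m << Lambda
ratio = LAM ** 2 / M ** 2

# d^4l / (2 pi)^4 -> l^3 dl / (8 pi^2); multiplying by 16 pi^2 leaves a factor 2.
f0_numeric = 2.0 * radial_integral(3, LAM, M)                  # 16 pi^2 f_0 at q = 0
f0_prime_numeric = 2.0 * radial_integral(5, LAM, M) / M ** 2   # 16 pi^2 f'_0 at q = 0

f0_formula = math.log(ratio) - 1.0                       # bracket of (16.56)
f0_prime_formula = ratio - 2.0 * math.log(ratio) + 1.0   # bracket of (16.59)
print(f0_numeric, f0_formula)
print(f0_prime_numeric, f0_prime_formula)
```

The logarithmic growth of $f_0$ and the quadratic growth of $f'_0$ with Λ are both visible, with residual differences of order $m^2/\Lambda^2$.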
We shall now see that an alternative cutoff procedure can be used to give a precise
meaning to the matrix elements of an arbitrary local operator at any order of perturbation
theory, with the remarkable additional feature that all power-dependences
on the cutoff are removed, leaving only terms with a logarithmic dependence on the
cutoff scale (such as the $\ln(\frac{\Lambda^2}{m^2})$ terms visible in (16.59,16.60)). First, note that the
loop integral appearing in (16.53), with the cutoff functions $D^{-1}$ omitted, would in
fact be ultraviolet-convergent in any (integer) spacetime dimension less than four: it
is only logarithmically divergent in four dimensions after all, right at the edge, as it
were, of being a convergent integral at large momenta. This suggests a dimensional
regularization approach whereby we temporarily imagine carrying out the integral in
a general spacetime dimension d < 4, and then examine the behavior of the result
as we analytically continue the resultant expression back to the physical spacetime
dimension d = 4. To see how to do this, first note the expression for the radial
phase-space in a general d-dimensional Euclidean integral

$$\int d^dl = \frac{2\pi^{d/2}}{\Gamma(d/2)}\int_0^\infty l^{d-1}\,dl \tag{16.61}$$
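
As a quick sanity check, the angular factor in (16.61) reduces to the familiar values in integer dimensions and interpolates smoothly between them. The helper name `solid_angle` in this short Python sketch is hypothetical.

```python
import math

def solid_angle(d):
    """Angular factor 2 pi^(d/2) / Gamma(d/2) multiplying the radial integral in (16.61)."""
    return 2.0 * math.pi ** (d / 2.0) / math.gamma(d / 2.0)

# d = 2, 3, 4 give the circumference of the unit circle, the area of the unit
# sphere, and the three-dimensional "area" of the unit sphere in four dimensions.
for d in (2, 3, 4, 3.5):
    print(d, solid_angle(d))
```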

In order to preserve dimensional consistency, so that our expression for the regulated
amplitude retains the same dimension in powers of mass regardless of the dimension d,
we shall append the appropriate power of the ultraviolet scale Λ to each loop integral
to maintain overall mass dimension 4: thus loop integrals will appear as Λ4−d dd l · ·.
Accordingly, we find that the one-loop integral in question, in d-dimensions, becomes

\begin{align}
I_d(q^2;\Lambda^2,m^2) &\equiv \Lambda^{4-d}\int \frac{d^d l}{(2\pi)^d\,(l^2+m^2)\,((q-l)^2+m^2)} \tag{16.62}\\
&= \frac{\Lambda^{4-d}}{(2\pi)^d}\int_0^1 dx \int \frac{d^d l}{(l^2 - 2x\,q\cdot l + xq^2 + m^2)^2} \tag{16.63}\\
&= \frac{\Lambda^{4-d}}{(2\pi)^d}\int_0^1 dx \int \frac{d^d l}{(l^2 + x(1-x)q^2 + m^2)^2} \tag{16.64}\\
&= \frac{\pi^{d/2}\Lambda^{4-d}}{(2\pi)^d\,\Gamma(d/2)}\int_0^1 dx \int_0^\infty \frac{2\,l^{d-1}}{(l^2+M^2)^2}\, dl, \tag{16.65}\\
M^2 &\equiv x(1-x)q^2 + m^2 \tag{16.66}
\end{align}

where we have used the familiar Feynman parameter identity


\[
\frac{1}{AB} = \int_0^1 dx\, \frac{1}{((1-x)A + xB)^2} \tag{16.67}
\]

and performed a shift of integration variable $l \to l + xq$ to obtain (16.64). The remaining
integral in (16.65) can be evaluated by making the change of variable $t = l^2$ and
using the beta function identity
\[
\int_0^\infty \frac{t^{x-1}}{(t+M^2)^{x+y}}\, dt = (M^2)^{-y} B(x,y) = (M^2)^{-y}\,\frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)} \tag{16.68}
\]

whence we find

\[
I_d(q^2;\Lambda^2,m^2) = \frac{\Gamma(2-\frac{d}{2})}{(4\pi)^{d/2}} \int_0^1 \left(\frac{m^2 + x(1-x)q^2}{\Lambda^2}\right)^{\frac{d}{2}-2} dx \tag{16.69}
\]

We see that the right-hand side of (16.69) provides an analytic continuation of our
originally four-dimensional loop integral to general complex spacetime dimensions d,
with a finite result in the region Re(d) < 4, and indeed analytic save at the poles of the
Γ function at d = 4, 6, 8, . . .. Of course, the poles of the Γ function at zero (and negative
integer) values of its argument mean, not surprisingly, that the integral becomes
divergent once we attempt to return to the physical spacetime dimension d = 4. At
this point in the complex d-plane, our continued loop-integral has a Laurent expansion
in the variable $\epsilon \equiv 4 - d$, the first few terms of which (using the Γ function property
$\Gamma(z) \sim \frac{1}{z} - \gamma + O(z)$, $z \to 0$, with γ the Euler–Mascheroni constant) are found to be
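\[
I_d(q^2;\Lambda^2,m^2) \sim \frac{1}{16\pi^2}\left(\frac{2}{\epsilon} + \ln(4\pi) - \gamma + \ln(\Lambda^2/m^2) - \int_0^1 \ln\left(1 + x(1-x)\frac{q^2}{m^2}\right) dx + O(\epsilon)\right) \tag{16.70}
\]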
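As a rough numerical cross-check of the expansion (16.70), the following Python sketch evaluates the closed form (16.69) at small ε = 4 − d and compares it with the pole-plus-finite terms. The function names, the Simpson-rule helper, and the sample values q² = m² = 1, Λ² = 100 are illustrative assumptions and do not come from the text.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def loop_integral_exact(q2, cutoff2, m2, eps):
    """Right-hand side of (16.69) evaluated at d = 4 - eps."""
    d = 4.0 - eps
    prefactor = math.gamma(2.0 - d / 2.0) / (4.0 * math.pi) ** (d / 2.0)
    integrand = lambda x: ((m2 + x * (1 - x) * q2) / cutoff2) ** (d / 2.0 - 2.0)
    return prefactor * simpson(integrand, 0.0, 1.0)


def loop_integral_laurent(q2, cutoff2, m2, eps):
    """Pole and finite terms of (16.70), with the O(eps) remainder dropped."""
    log_term = simpson(lambda x: math.log(1 + x * (1 - x) * q2 / m2), 0.0, 1.0)
    bracket = (2.0 / eps + math.log(4 * math.pi) - EULER_GAMMA
               + math.log(cutoff2 / m2) - log_term)
    return bracket / (16 * math.pi ** 2)


# Illustrative sample point (an assumption of this sketch).
q2, cutoff2, m2 = 1.0, 100.0, 1.0
for eps in (1e-1, 1e-2, 1e-3):
    exact = loop_integral_exact(q2, cutoff2, m2, eps)
    approx = loop_integral_laurent(q2, cutoff2, m2, eps)
    print(eps, exact, approx, exact - approx)
```

The printed difference should shrink roughly in proportion to ε, which is the behavior expected of the neglected O(ε) remainder.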
We now define8 the minimally subtracted dimensionally regularized matrix element of
our O0 operator arising from the one-loop graph in Fig. 16.4 by simply omitting the
pure pole term in (16.70), leaving the remaining “finite part” (henceforth indicated
by the notation FP) in the d → 4 limit:

8 The ubiquitous appearance of the annoying factor of $\ln(4\pi) - \gamma$ accompanying the pole in $\epsilon$ has led
to a modified minimal subtraction scheme, wherein the Laurent expansion is made in a shifted variable $\bar\epsilon$,
with $\frac{2}{\bar\epsilon} \equiv \frac{2}{\epsilon} + \ln(4\pi) - \gamma$, and poles in $\bar\epsilon$ are then dropped. This is commonly referred to as the “MS-bar”
scheme.
\[
\mathrm{FP}\, I_d(q^2;\Lambda^2,m^2) \equiv \frac{1}{16\pi^2}\left(\ln(\Lambda^2/m^2) + \ln(4\pi) - \gamma - \int_0^1 \ln\left(1 + x(1-x)\frac{q^2}{m^2}\right) dx\right) \tag{16.71}
\]
The regularized amplitude (16.71) can be expanded in powers of the momentum as in
(16.55): for example, referring to (16.56), we see that the leading coefficient $f_0(\Lambda^2/m^2)$
has exactly the same logarithmic cutoff dependence in the momentum cutoff and
dimensional regularization schemes, differing only by an overall additive constant, up
to terms of $O(m^2/\Lambda^2)$, suppressed by inverse powers of the cutoff. It can be easily shown
(see Problem 5) that all higher coefficients fn , n ≥ 1 are in fact identical up to such
terms in the two regularization schemes.
For the operator under discussion, therefore, $O_0 = \phi^2$, there would seem to be no
important differences between the use of a momentum cutoff or the pole subtraction
approach leading to (16.71). If we look instead at the operator $O_0 = (\partial_\nu\phi)^2$, with
one-particle matrix elements given in (16.58), the situation is very different. Here, the
one-loop integral, containing the extra factor of $l\cdot(l-q)$ in the numerator, has a quadratic
dependence on the ultraviolet cutoff, resulting in the appearance of terms quadratic in
Λ in the leading coefficient $f_0$ (see (16.59)). On the other hand, the dependence of the
dimensionally regularized amplitude on the cutoff Λ appears only through the prefactor
$\Lambda^{4-d} = \Lambda^\epsilon$, and in developing the Laurent expansion of the regularized Feynman
integral in powers of $\epsilon$ it is apparent that only powers of logarithms of the cutoff
Λ can appear and not whole integer powers, via $\Lambda^\epsilon = 1 + \epsilon\ln\Lambda + \frac{1}{2}\epsilon^2(\ln\Lambda)^2 + \ldots$.
A straightforward calculation, following exactly the steps used above to arrive at
the minimally subtracted matrix element corresponding to (16.58), gives for the matrix
element of O0 ,

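\begin{align}
I_d(q^2;\Lambda^2,m^2) &\equiv \Lambda^{4-d}\int \frac{l\cdot(l-q)\, d^d l}{(2\pi)^d\,(l^2+m^2)\,((q-l)^2+m^2)} \tag{16.72}\\
\mathrm{FP}\, I_d(q^2;\Lambda^2,m^2) &= \frac{1}{16\pi^2}\Big\{Aq^2 + Bm^2 - \left(\tfrac{1}{2}q^2 + 2m^2\right)\ln(\Lambda^2/m^2) \nonumber\\
&\qquad + \int_0^1 (3x(1-x)q^2 + 2m^2)\,\ln\left(1 + x(1-x)\frac{q^2}{m^2}\right) dx\Big\} \tag{16.73}\\
A &= \frac{1}{2}(\gamma - \ln(4\pi)) - \frac{1}{6}, \qquad B = 2(\gamma - \ln(4\pi)) - 1 \tag{16.74}
\end{align}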
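The coefficients in (16.73) and (16.74) can be spot-checked numerically. The sketch below assumes the standard d-dimensional Euclidean results for the integrals of 1/(l² + M²)² and l²/(l² + M²)² after Feynman parametrization and the shift l → l + xq; that intermediate closed form is not displayed in the text, so it should be read as an assumption of this example, as should all function names and sample values.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def derivative_loop_exact(q2, cutoff2, m2, eps):
    """Dimensionally continued (16.72) at d = 4 - eps (assumed closed form)."""
    d = 4.0 - eps
    prefactor = math.gamma(2.0 - d / 2.0) / (4.0 * math.pi) ** (d / 2.0)

    def integrand(x):
        big_m2 = m2 + x * (1 - x) * q2
        # l^2 term gives (d/2)/(1 - d/2) * M^2; the shifted numerator adds -x(1-x)q^2.
        numerator = (d / 2.0) / (1.0 - d / 2.0) * big_m2 - x * (1 - x) * q2
        return (big_m2 / cutoff2) ** (d / 2.0 - 2.0) * numerator

    return prefactor * simpson(integrand, 0.0, 1.0)


def pole_term(q2, m2, eps):
    """Pure 2/eps pole that minimal subtraction removes."""
    return -(0.5 * q2 + 2.0 * m2) * (2.0 / eps) / (16 * math.pi ** 2)


def finite_part(q2, cutoff2, m2):
    """Finite part as written in (16.73) with A, B from (16.74)."""
    a_coef = 0.5 * (EULER_GAMMA - math.log(4 * math.pi)) - 1.0 / 6.0
    b_coef = 2.0 * (EULER_GAMMA - math.log(4 * math.pi)) - 1.0
    log_integral = simpson(
        lambda x: (3 * x * (1 - x) * q2 + 2 * m2) * math.log(1 + x * (1 - x) * q2 / m2),
        0.0, 1.0)
    bracket = (a_coef * q2 + b_coef * m2
               - (0.5 * q2 + 2 * m2) * math.log(cutoff2 / m2) + log_integral)
    return bracket / (16 * math.pi ** 2)


q2, cutoff2, m2 = 1.0, 100.0, 1.0  # illustrative sample point
for eps in (1e-2, 1e-3, 1e-4):
    residual = (derivative_loop_exact(q2, cutoff2, m2, eps)
                - pole_term(q2, m2, eps) - finite_part(q2, cutoff2, m2))
    print(eps, residual)
```

The residual should tend to zero with ε. Note that no term growing like Λ² appears anywhere in this computation: the cutoff enters only through the ratio M²/Λ² raised to the power −ε/2.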

Comparing with the result (16.59) for the zero-momentum amplitude in the momentum
cutoff scheme, we see that the term proportional to $\Lambda^2$ has, as expected,
disappeared: only a logarithm of the cutoff appears, which is in fact restricted to the
coefficients $f_0$, $f_1$, where it appears with the same coefficient in both the momentum
and dimensional regularization schemes.
Our discussion of dimensional regularization has clearly been very restricted: we
have considered only a few simple low-order perturbative contributions to a particular
matrix element of the two simplest local operators of our theory. It would clearly
be very desirable to derive a non-perturbative renormalization group equation for an
effective Lagrangian defined in terms of such operators, along the lines of the derivation
given in the preceding section for the momentum cutoff scheme. Unfortunately, the
obviously very formal prescription given here for obtaining finite matrix elements of
local operators, by simply eliminating the pure pole parts at d = 4 in Feynman loop
integrals analytically continued to complex dimensionality, cannot be extended beyond
the perturbatively expanded amplitudes of the field theory, as we simply have no way
of giving a sensible non-perturbatively valid definition of a local quantum field theory
in other than integer dimensions. For example, we do not know how to write down
the analog of the functional integral (16.32) for the exact generating functional of a
theory in non-integer dimensions, whose dynamics is specified in terms of an effective
Lagrangian expanded in local operators, with perturbative matrix elements defined
by dimensional pole subtraction.
Nevertheless, as we shall see in the next section, many important applications of
effective field theories may be carried out completely in the context of perturbation
theory, and in such cases the dimensional regularization approach is extraordinarily
useful. We have already seen a glimmer of why this might be the case in the examples
above: unlike the situation in a momentum cutoff scheme, integer powers of the renor-
malization scale which would otherwise result in the mixing of operators of different
dimension as the scale is changed are simply absent in dimensional regularization—a
fact which enormously simplifies the power-counting behavior of effective field theories.
In particular, the contribution of higher-dimension “irrelevant” operators to low-
energy amplitudes will remain “small” (in a precisely quantifiable sense) even when
loop integrals are considered, provided we employ dimensional regularization methods
to define these integrals.
Another very important advantage of dimensional regularization (over the momen-
tum cutoff approach) is the ease with which it incorporates local vector gauge
symmetries, which are formally preserved in this approach, as the form of the
Lagrangian for such symmetries remains unaltered in (integer) dimensions other than
the physical one. This turns out to have the very pleasant consequence that the
Ward identities of the theory expressing the local gauge symmetry are preserved
under dimensional regularization. We shall return to these issues in the subsequent
chapters. In particular, a consistent definition of regularized local composite operators,
extending the low-order examples given above, but valid to all orders of perturbation
theory, requires graph-theoretical technology that we will develop in the next two
chapters when we consider perturbative renormalization in more detail. The general
procedure—the “normal product formalism” of Zimmermann—for obtaining well-
defined local composite operators will be explained in detail in Section 18.1.
Before leaving the issue of regularization, we should comment on one potentially
confusing issue which may already have crossed the reader’s mind in connection
with the absence of power-dependence on the cutoff scale in dimensionally regularized
amplitudes. We previously emphasized the difficulty (due to just such power-dependences
in a momentum cutoff scheme) with maintaining “small” values (i.e.,
much smaller than the cutoff scale of the theory) for the coefficients of the relevant
operators in an effective Lagrangian defined by momentum cutoff. This “fine-tuning”
difficulty is most dramatically manifested in the cosmological constant and hierarchy
(Higgs mass) problems, briefly discussed earlier. These issues are not obviated by
the existence of a regularization scheme (dimensional regularization), where
power-dependence on the cutoff scale is automatically absent. In a sense, the “fine-tuning”
at the UV scale necessary to remove, order by order in perturbation theory, large
shifts in the coefficients of the relevant operators at low scales is just an automatic
consequence of the structure of dimensionally regularized perturbative amplitudes: it
is built ab initio into the definition of local operators (or rather, their perturbative
matrix elements) in the dimensional scheme, and is therefore a matter of convention,
not physics. In any event, the important applications of dimensional regularization in
effective field theories, as we shall see in the next section, occur in situations (e.g.,
chiral Lagrangians in QCD) in which the UV scale is perhaps an order or two of
magnitude above the scale of the interesting physics, so there is no issue of “fine-
tuning” at the level of $10^{-15}$ or $10^{-120}$ as in the hierarchy or cosmological-constant
problems.

16.6 Effective field theories: a compendium


Our study of effective field theory so far has concentrated on using the technology of
effective Lagrangians to characterize in a precise mathematical language the change
in the form of the local dynamics of an underlying microscopic theory when it is
examined at progressively longer distance scales, “smearing out”, as it were, the fine
details of the interactions at shorter distances. We have chosen as our prime example
a theory of a single self-interacting scalar field, and the aforesaid smearing process
can be expressed in this case very simply in terms of the momentum-space Fourier
modes φ̃(k) of the field, by progressively integrating out in the path-integral modes
with momentum |k| > Λ, where Λ is a sliding ultraviolet cutoff.
The Wilsonian effective Lagrangian obtained by this procedure is just the simplest
example of a much more general class of effective field theories, obtained in general by
a combination of (a) change of functional variable of integration in the path integral
defining the theory, and (b) a partial evaluation of the resultant path integral, whereby
some, but not all, of the field variables are integrated out in the absence of external
sources. The field modes removed by integration correspond to those which we are
not interested in probing (or are unable to probe), perhaps because they correspond to amplitudes
which are inaccessible in presently available low-energy experiments. The dependence
of the functional integrand obtained by this partial integration on the remaining field
modes defines the effective Lagrangian which incorporates the accessible physics of the
theory. It is essential to recognize that this effective field theory yields exact results
for the n-point amplitudes of the remaining “smeared” fields.
Let us make the argument a little clearer by introducing some notation which
captures the general context. Let $\phi_n$, $n = 1, 2, \ldots, N$ represent the complete set of fields
used to define the dynamics of the theory at some high-energy (short-distance) scale,
and define a new set of fields $\Phi_m$, $m = 1, 2, \ldots, M$ by

\[
\Phi_m = f_m(\phi_n; \Lambda) \tag{16.75}
\]

Note that (a) the number M of smeared fields $\Phi_m$ may be different (typically, smaller)
than the original number N of short-distance fields $\phi_n$, and that (b) the smearing
functions $f_m$ may be linear or non-linear in character, may depend on a sliding energy
scale Λ, and are not in general invertible, the smearing in this sense entailing a “loss
of information” as regards the full local physics of the theory. If the theory is originally
specified in terms of the original $\phi_n$ fields via a (Euclidean) Lagrangian $\mathcal{L}(\phi_n)$, then
the effective Lagrangian $\mathcal{L}_{\rm eff}(\Phi_m)$ associated with the smeared fields $\Phi_m$ is defined by

\[
e^{-\int d^4x\, \mathcal{L}_{\rm eff}(\Phi_m)} \equiv \int \prod_n D\phi_n\, \delta(\Phi_m - f_m(\phi_n;\Lambda))\, e^{-\int d^4x\, \mathcal{L}(\phi_n)} \tag{16.76}
\]

Provided we are only interested in the n-point functions of the new fields $\Phi_m$, we can
discard completely the original microscopic Lagrangian $\mathcal{L}(\phi_n)$ in favor of the effective
theory defined by $\mathcal{L}_{\rm eff}(\Phi_m)$, as the generating functional for the $\Phi_m$ can be written
entirely in terms of the latter,

\begin{align}
Z[J_m] &= \int \prod_n D\phi_n\, e^{-\int d^4x\, \mathcal{L}(\phi_n) + \int d^4x\, J_m(x) f_m(\phi_n;\Lambda)} \tag{16.77}\\
&= \int \prod_m D\Phi_m\, e^{-\int d^4x\, \mathcal{L}_{\rm eff}(\Phi_m) + \int d^4x\, J_m(x)\Phi_m(x)} \tag{16.78}
\end{align}

as we can see by introducing the definition (16.76) on the right-hand side of (16.78). We
note here that the smearing functions are subject to certain smoothness requirements
in order to ensure that the resultant effective Lagrangian (16.76) can be expanded
in multinomials of local products of the smeared fields $\Phi_m$ (see the discussion
following (16.47)).
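The equality of (16.77) and (16.78) can be illustrated in a zero-dimensional toy setting, where the "path integral" is an ordinary double integral. The sketch below is purely illustrative: the two-variable action, the choice of smearing function (keep the light variable, discard the heavy one), and all names and parameter values are assumptions made for this example.

```python
import math

HEAVY_MASS2 = 4.0   # hypothetical "mass squared" of the variable integrated out
COUPLING = 1.0      # hypothetical coupling between the two variables


def trapezoid_grid(lo, hi, n):
    """Points and trapezoid-rule weights on [lo, hi]."""
    h = (hi - lo) / (n - 1)
    points = [lo + i * h for i in range(n)]
    weights = [h] * n
    weights[0] = weights[-1] = h / 2
    return points, weights


def microscopic_action(phi_light, phi_heavy):
    """Toy stand-in for the integral of L(phi_n) in (16.76)."""
    return (0.5 * phi_light ** 2 + 0.5 * HEAVY_MASS2 * phi_heavy ** 2
            + 0.5 * COUPLING * phi_light ** 2 * phi_heavy ** 2)


def effective_action(big_phi):
    """Toy analog of L_eff: minus the log of the Gaussian integral over phi_heavy,
    with phi_light pinned to big_phi by the delta function."""
    width = HEAVY_MASS2 + COUPLING * big_phi ** 2
    return 0.5 * big_phi ** 2 + 0.5 * math.log(width / (2 * math.pi))


def z_microscopic(source, n=401):
    """Analog of (16.77): integrate over both variables, source coupled to phi_light."""
    pts, wts = trapezoid_grid(-10.0, 10.0, n)
    total = 0.0
    for x, wx in zip(pts, wts):
        inner = sum(wy * math.exp(-microscopic_action(x, y)) for y, wy in zip(pts, wts))
        total += wx * math.exp(source * x) * inner
    return total


def z_effective(source, n=401):
    """Analog of (16.78): integrate over the smeared variable only."""
    pts, wts = trapezoid_grid(-10.0, 10.0, n)
    return sum(w * math.exp(-effective_action(x) + source * x) for x, w in zip(pts, wts))


for source in (0.0, 0.5):
    print(source, z_microscopic(source), z_effective(source))
```

In this toy case the heavy variable can be integrated out in closed form, so the two generating functions agree up to quadrature error for any source, mirroring the statement that the effective theory reproduces the amplitudes of the retained variables exactly.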
The application of effective field theory methods has become a widespread industry
in modern high-energy physics, and it would certainly require an entire additional
volume to do justice to only the most widely used. We shall conclude our very brief
introduction with a few examples that illustrate the main types, based on the nature
of the smearing functions used to define the effective theory, and refer the reader to
more detailed treatments available in the many excellent reviews of this subject for
a more thorough discussion of the individual cases. Following the general philosophy
exemplified by (16.75), we can choose functional changes of variables which involve
1. a linear transformation of the modes of a single fundamental field of the theory,
2. a linear transformation involving several distinct fundamental fields, or
3. a non-linear change of field variables. In this latter case, one may be left with
an effective field theory involving completely different fields than the underlying
“microscopic” elementary fields which define the short-distance dynamics of the
theory.
We have already encountered an example of the first type in our discussion of the
renormalization group transformation of a scalar field theory with a momentum cutoff.
Here the smearing function amounts to a cutoff of the Fourier modes φ̃(k) of a single
scalar field. An extreme example of such a cutoff is the constraint effective potential
discussed in Section 14.3 (cf. (14.58)), where the effective field Φ is just the zero-
momentum mode φ̃(0) of the original field theory: all non-zero momentum modes are
integrated out. As we saw in Chapter 14, the remaining (highly truncated!) effective
theory is an important tool when examining the possibility of spontaneous symmetry-
breaking of the underlying theory. A somewhat less trivial example is provided by
non-relativistic effective field theory (NREFT),9 where we are interested in the Fourier
modes φ̃(k) of a massive field corresponding to non-relativistic quanta of the same:
i.e., with $|\vec{k}| \sim \kappa \ll m$, $|k_0| \sim m + O(\kappa^2/m)$. We can expose these modes of the field
by a simple linear transformation of the original field φ, which here we take to be a
real scalar field with $\phi^4$ interaction and short-distance (Minkowski) Lagrangian

\[
\mathcal{L} = \frac{1}{2}\partial_\mu\phi\,\partial^\mu\phi - \frac{1}{2}m^2\phi^2 - \frac{\lambda}{4!}\phi^4 \tag{16.79}
\]
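Define a new field

\[
\psi(x) \equiv \sqrt{2m}\, e^{imx^0} \int \theta(k_0)\,\tilde\phi(k)\, e^{-ik\cdot x}\, \frac{d^4k}{(2\pi)^4} \tag{16.80}
\]

in terms of which the original field may be written

\[
\phi(x) = \frac{1}{\sqrt{2m}}\left(e^{-imx^0}\psi(x) + e^{imx^0}\psi^\dagger(x)\right) \tag{16.81}
\]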
We shall assume that “relativistic” modes of the new field ψ(x) have been integrated
out,10 and that the effective theory that remains contains only Fourier modes of ψ̃(k)
with $k_0 \ll m$. Accordingly, when (16.81) is substituted back into the Lagrangian
(16.79), terms with unequal numbers of ψ and ψ† fields are accompanied by time-dependent
factors $e^{\pm 2inmx^0}$ with n a non-zero integer which must vanish when we
integrate the Lagrange density over time to form the action, as they cannot be
compensated by the remaining time-dependence of the ψ fields (which, given the
assumed momentum scales $|k_0| \sim m + O(\kappa^2/m)$ of the original φ field, mean that
the momentum modes relevant to ψ̃(k) have $|\vec{k}| \sim \kappa \ll m$, $k_0 \sim O(\kappa^2/m) \ll m$).
Substituting (16.81) in the Lagrangian (16.79), one finds for the Minkowski action of
the resultant effective theory the leading terms

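\[
\mathcal{L} \to \int d^4x\, \left\{ i\dot\psi\psi^\dagger + \frac{1}{2m}\psi^\dagger\nabla^2\psi - \frac{\lambda}{16m^2}(\psi^\dagger\psi)^2 \right\} + \cdots \tag{16.82}
\]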

where the dots refer to terms with unequal numbers of ψ and ψ† (which will induce
pair creation and annihilation processes which are unimportant in the non-relativistic
regime), as well as terms like $\frac{1}{2m}\dot\psi^\dagger\dot\psi$ which scale like $k_0^2/m$ (subleading, as $k_0^2/m \ll k_0 \sim \vec{k}^2/m$,
the scaling behavior of the free, quadratic part of the effective Lagrangian). The
effective Lagrangian (16.82) can be used to establish the existence of bound states in
d = 2, 3 dimensions (for negative λ) exactly as in Section 11.2, and the reader can verify
that for weak coupling the bound-state energy is correctly determined to leading order

9 The reader may find it convenient at this point to review our discussion of non-relativistic threshold
physics in Section 11.2.
10 As in the renormalization group transformations of Section 16.3, this will result in a modification of the
coefficients of the leading terms in the effective Lagrangian. In weakly coupled theories, as we imagine here,
these modifications will be small and can be computed in perturbation theory. Practically, the determination
of the coefficients in the effective Lagrangian for any given cutoff scheme is carried out by a “matching”
process which we shall describe briefly at the end of this section.
in λ by the bubble diagrams generated by the non-relativistic Lagrangian (16.82) (see
Problem 6).
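The quartic coefficient λ/16m² in (16.82) follows from simple counting once (16.81) is inserted into the φ⁴ term of (16.79). The short Python sketch below reproduces that counting; the function name and the bookkeeping of phases in units of m x⁰ are illustrative choices made for this example.

```python
from fractions import Fraction
from math import comb


def surviving_quartic_coefficient():
    """Coefficient of (psi^dagger psi)^2 from (lambda/4!) phi^4, in units of lambda/m^2."""
    kept = Fraction(0)
    for n_psi in range(5):                   # number of psi factors among the four fields
        n_psi_dagger = 4 - n_psi
        phase_units = n_psi_dagger - n_psi   # term carries exp(i * phase_units * m * x0)
        if phase_units == 0:                 # only non-oscillating terms survive the time integral
            kept += comb(4, n_psi)           # ways to choose which factors are psi
    # Each phi contributes a factor 1/sqrt(2m), so phi^4 carries 1/(2m)^2 = 1/(4 m^2).
    return Fraction(1, 24) * kept * Fraction(1, 4)


print(surviving_quartic_coefficient())
```

Only the term with two ψ and two ψ† factors is free of the oscillating phase, and its multiplicity 6 combines with 1/4! and 1/(2m)² to give 1/16 in units of λ/m².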
The essential feature of the effective field theory chosen here is the careful choice of
a change of field variable, followed by integrating out those Fourier modes which do not
correspond to the dominant momentum regions involved in the physics of interest—in
this case, the infrared regions generating the threshold singularities responsible for
non-relativistic bound states in weakly coupled theories (cf. Section 11.2). The full
development of the non-relativistic effective field theory for QCD (termed NRQCD)
has led to important progress in understanding the physics of systems containing
“heavy quarks” (i.e., charm, bottom, and top quarks) (Bodwin et al., 1995). Other
examples of effective field theories of this type are heavy quark effective theory
(HQET) and soft collinear effective theory (SCET), which again involve changes of
field variable and momentum mode restrictions appropriate for extracting the leading
contributions to QCD processes involving, respectively, (HQET) processes with non-
relativistic heavy quarks interacting with relativistic light quarks and gluons, and
(SCET) processes in which highly energetic quarks interact with gluons or sets of
gluons with small total squared four-momentum. Much more on the use of effective
field theory methods in heavy quark physics generally can be found in the text of
Manohar and Wise (Manohar and Wise, 2000).
The second type of effective field theory enumerated above corresponds to a situa-
tion in which the smearing function acts differently (though linearly) on different fields
in the theory. The simplest case is one in which we simply integrate out completely the
degrees of freedom corresponding to some subset of particles in the theory. Typically,
this is useful when the particles can be divided into “light” and “heavy” subsets,
with only the light particles accessible at available accelerator energies, so that the
effects of the heavy particles occur only through their appearance in internal lines
in the graphs of the theory. A simple toy model illustrating this situation consists
of a light fermion field ψ (of mass m) coupled to a heavy scalar φ (of mass M )
via a Yukawa interaction g ψ̄ψφ. If we ignore self-interactions of the scalar field, the
Euclidean-generating functional of the theory takes the form
 
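\[
Z = \int D\psi\, D\phi\; e^{-\int d^4x\, \left(\mathcal{L}_\psi + \frac{1}{2}((\partial_\mu\phi)^2 + M^2\phi^2) + g\bar\psi\psi\phi\right)} \tag{16.83}
\]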

where Lψ is the free Lagrangian for the light fermion (although we may also allow
this field to have other interactions unconnected with the heavy scalar, e.g., gauge
interactions, without altering what follows in any essential way). The exponent in this
functional integral is at most quadratic in the scalar field, which we may therefore
integrate out completely, obtaining
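\begin{align}
Z &= \int D\psi\; e^{-\int d^4x\, \mathcal{L}_\psi - \frac{g^2}{2}\int d^4x\, d^4y\, S(x)\Delta_E(x-y)S(y)} \tag{16.84}\\
S(x) &\equiv \bar\psi(x)\psi(x) \tag{16.85}\\
\Delta_E(z) &= \int \frac{e^{-ik\cdot z}}{k^2 + M^2}\, \frac{d^4k}{(2\pi)^4} \tag{16.86}
\end{align}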
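If we formally expand the massive Euclidean scalar propagator $\Delta_E$ in inverse powers
of the scalar mass,

\begin{align}
\Delta_E(z) &= \int e^{-ik\cdot z}\left(\frac{1}{M^2} - \frac{k^2}{M^4} + \frac{k^4}{M^6} - \ldots\right)\frac{d^4k}{(2\pi)^4} \nonumber\\
&= \frac{1}{M^2}\delta^4(z) + \frac{1}{M^4}\Box_z\,\delta^4(z) + \ldots \tag{16.87}
\end{align}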
we see that the net effect of the heavy scalar is to induce an effective Lagrangian
in terms of the light fermion fields containing an infinite number of local terms of
progressively higher dimension (from the spacetime-derivatives) and inverse powers of
the large mass:

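\begin{align}
\mathcal{L}_{\rm ind}(\psi) &= G_0\, S(x)^2 - G_2\, (\partial_\mu S(x))^2 + \ldots \nonumber\\
&= G_0\, (\bar\psi(x)\psi(x))^2 - G_2\, (\partial_\mu(\bar\psi(x)\psi(x)))^2 + \ldots \tag{16.88}\\
G_0 &\equiv \frac{g^2}{2M^2}, \qquad G_2 \equiv \frac{g^2}{2M^4}, \ldots \tag{16.89}
\end{align}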
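The quality of the truncated expansion behind (16.87) is easiest to see in momentum space. The following Python sketch compares the full Euclidean propagator 1/(k² + M²) with its truncated series in k²/M²; the function names and the sample mass and momenta are assumptions made for illustration.

```python
def heavy_propagator(k2, mass2):
    """Exact momentum-space Euclidean propagator 1/(k^2 + M^2)."""
    return 1.0 / (k2 + mass2)


def truncated_propagator(k2, mass2, order):
    """First order+1 terms of (1/M^2) * sum_j (-k^2/M^2)^j, as in (16.87)."""
    return sum((-k2 / mass2) ** j for j in range(order + 1)) / mass2


mass2 = 100.0 ** 2  # hypothetical heavy mass squared
for k2 in (1.0, 100.0, 2500.0):
    for order in (0, 1, 2):
        error = abs(heavy_propagator(k2, mass2) - truncated_propagator(k2, mass2, order))
        print(k2, order, error)
```

Each additional term reduces the error by a further factor of roughly k²/M², which is the momentum-space counterpart of the statement that higher-derivative operators in (16.88) are suppressed by even powers of the low momenta over the heavy mass.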
When we consider the low-energy amplitudes of the light fermion, higher terms will
therefore be suppressed by (even) powers of the ratio of the low momenta (or masses)
of the fermion amplitudes divided by the heavy mass M . Comparing the form of this
induced Lagrangian with the general structure (16.7, 16.8), we see that in this case
the heavy particle mass is playing the role of the ultraviolet cutoff Λ in the momentum
cutoff theory. The neglect of heavy-particle self-interactions means that our toy model
includes their effects only at tree level: the graphs generated by (16.88) correspond to
the contraction of a single internal heavy-particle line to a four-fermion vertex, with
the momentum dependence of the heavy-particle propagator appearing as an infinite
series of higher derivative terms in coordinate space.
The Fermi weak interaction Lagrangian, quartic in fermion fields, arises in just
this way, by integrating out the heavy W boson field in the electroweak model. In that
case, of course, the quartic effective Lagrangian involves the square of vector and axial
currents, as the W couples vectorially to the Fermi fields of the theory. Once loop
effects are taken into account, things become more complicated (just as discussed in
Section 16.3). The general situation was first delineated by Appelquist and Carrazzone
(Appelquist and Carrazzone, 1975), in their decoupling theorem, which asserts that the
only effect of integrating out a subset of heavy-particle fields in a local field theory,
apart from terms suppressed by ratios of the momentum scales and masses of the
light degrees of freedom to the heavy-particle masses, consists of a renormalization of
the coefficients of the relevant and marginal operators in the remaining field theory,
provided that the latter form a perturbatively renormalizable set of operators. The exact
meaning of the latter condition will become clear in the following chapter, when we
define and discuss in depth the concept of perturbative renormalization.
Our final category of effective field theories corresponds to situations in which
the effective Lagrangian is specified in terms of fields which do not even appear in
the microscopic Lagrangian specifying the short-distance dynamics: typically, these
are non-linear functions of the original elementary fields of the theory. The classic
example of this type is provided by the chiral Lagrangians describing the low-energy
behavior of hadronic amplitudes. The underlying QCD Lagrangian containing quark
and gluon fields in this case is replaced by an effective Lagrangian in terms of meson
(and possibly baryon) fields. We shall illustrate the basic idea by taking a highly
simplified version of the real world as our starting point: we assume there are only
two quarks (the “up” and “down” quarks), and neglect, at least initially, their masses
$m_u$ and $m_d$ (which are known to be much smaller than all other mass scales in QCD),
so that our theory is described at the microscopic level (i.e., at distance scales much
smaller than a fermi) by the Lagrangian (cf. (15.112))

\[
\mathcal{L}_{QCD} = -\frac{1}{4}\mathrm{Tr}(F_{\mu\nu}F^{\mu\nu}) + \sum_{a=1}^{2} \bar q_a\, i\slashed{D}\, q_a \tag{16.90}
\]

The local gauge group will as usual be taken to be SU(3), although we shall see that
specific details of the gauge group dynamics remain essentially hidden through the
process of generating the effective field theory. We use the notation $q_a(x)$ for the quark
fields, with $q_1(x) = u(x)$ the up-quark field and $q_2(x)$ the down-quark field. Defining
left and right chiral parts of the quark fields in the usual way, $q_L(x) = \frac{1+\gamma_5}{2}q(x)$,
$q_R(x) = \frac{1-\gamma_5}{2}q(x)$, the quark kinetic term in (16.90) can be rewritten as

\[
\sum_a \bar q_a\, i\slashed{D}\, q_a = \sum_a \left(\bar q_{La}\, i\slashed{D}\, q_{La} + \bar q_{Ra}\, i\slashed{D}\, q_{Ra}\right) \tag{16.91}
\]

as the $\gamma_0\gamma_\mu$ product separating $q^\dagger(x)$ from $q(x)$ in the quark kinetic bilinear commutes
with $\gamma_5$. Formally, therefore, our fundamental Lagrangian is invariant under the
eight-dimensional chiral group U(2)×U(2) (four generators from each U(2)) corresponding
to the global linear field transformations

\begin{align}
q_{La}(x) &\to V_{Lab}\, q_{Lb}(x) \nonumber\\
q_{Ra}(x) &\to V_{Rab}\, q_{Rb}(x) \tag{16.92}
\end{align}

with $V_L$, $V_R$ independent 2×2 unitary matrices. We may identify several important
subgroups of this chiral group:

1. The abelian subgroup U(1)×U(1) corresponding to $V_L = e^{i\omega_L}$, $V_R = e^{i\omega_R}$, where
$\omega_{L,R}$ are real phases. As we saw previously in Section 15.5 (cf. (15.169)), the
diagonal subgroup with $\omega_L = \omega_R$ corresponds to an exact Noether symmetry:
in our case, just “quark number conservation” (or equivalently, modulo a factor of
3, baryon number conservation), while the chiral subgroup with $\omega_L = \omega = -\omega_R$
is broken by the chiral anomaly at the quantum level, and is therefore not a
symmetry of the full quantum field theory. Note that this latter symmetry would
also be broken explicitly if a quark mass term (involving the bilinears $\bar q_L q_R$, $\bar q_R q_L$)
were present.
2. The diagonal non-abelian SU(2) group defined by $V_L = V_R = V$, $V \in SU(2)$, also
corresponds to an exact Noether symmetry of the theory: isospin symmetry (cf.
Section 12.5).
3. The chiral non-abelian subgroup defined by $V_L = V_R^\dagger = V \in SU(2)$ is (absent
quark mass terms) an exact Noether symmetry of the theory: it is not anomalous
(the anomaly (15.187) for the corresponding current would contain a single
generator $t_a$ of SU(2) in the trace, which would thereupon vanish). However,
as we shall now discuss, it fails to be a symmetry of the vacuum: the chiral
symmetry of QCD is spontaneously broken.
The final item above plays a central role in the low-energy dynamics of the theory, as
we know from Goldstone’s theorem that it immediately implies the presence of three
massless spinless particles in the theory—one for each broken generator of the chiral
SU(2) group. Of course, the fact that the up and down quark masses are not zero, but
small, means that in the real world these particles are light, not massless, and since the
late 1960s they have been identified with the pion iso-triplet. The hypothesis that the
chiral symmetry is spontaneously broken via the appearance of a quark condensate,
whereby the quark bilinear $\bar q_L q_R$ acquires a non-vanishing vacuum expectation value
(VEV)

\[
\langle 0|\bar q_{La}\, q_{Rb}|0\rangle = B\delta_{ab} \tag{16.93}
\]

where B is a constant of dimension mass³ (from the dimensions of the quark fields) is
no longer a matter for any serious debate: its validity has been more than adequately
confirmed by extensive non-perturbative numerical computations using lattice QCD.
Under a general chiral transformation $(V_L, V_R)$,

\[
\bar q_{La}\, q_{Rb} \to V^*_{Laa'}\, V_{Rbb'}\, \bar q_{La'}\, q_{Rb'} \tag{16.94}
\]

from which we see that the VEV (16.93) leaves the diagonal isospin subgroup $V_L = V_R$
unbroken, but does indeed break the chiral SU(2) subgroup with $V_L = V_R^\dagger$. Of course,
we are at liberty to redefine the quark fields by a (dynamically exact) chiral symmetry
transformation $q_{La}(x) \to V_{ab}\, q_{Lb}(x)$, $q_{Ra}(x) \to V^T_{ab}\, q_{Rb}(x)$, for some $V \in SU(2)$,
whereupon the VEV becomes

\[
\langle 0|\bar q_{La}\, q_{Rb}|0\rangle \to B(V^2)_{ab} \equiv \langle V|\bar q_{La}\, q_{Rb}|V\rangle \tag{16.95}
\]

where in the final equality we have chosen to parameterize the degenerate vacua of
the theory by the chiral transformation V connecting the particular vacuum to the
canonical one in (16.93) corresponding to V = 1. From our discussion of spontaneous
symmetry-breaking in Section 14.3, we recall that well-defined amplitudes in an
infinite-volume theory where an initially exact symmetry is spontaneously broken
can be obtained only by introducing a small symmetry-breaking perturbation which
“tickles” the system into a particular one of the infinitely many degenerate vacua,
before the infinite volume limit is taken. As we shall see below, in real life this
perturbation is provided by the quark masses which we have so far neglected.
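The statement that the condensate (16.93) preserves the diagonal isospin subgroup but not the chiral one can be checked with explicit matrices. The Python sketch below applies the transformation law that follows from (16.92), with the antiquark field transforming by the complex conjugate of $V_L$, to the canonical VEV. The parametrization of SU(2), the helper names, and the sample angles are assumptions made for this example.

```python
import cmath
import math


def su2_matrix(alpha, beta, gamma):
    """A 2x2 special unitary matrix [[a, b], [-b*, a*]] with |a|^2 + |b|^2 = 1."""
    a = cmath.exp(1j * alpha) * math.cos(gamma)
    b = cmath.exp(1j * beta) * math.sin(gamma)
    return [[a, b], [-b.conjugate(), a.conjugate()]]


def dagger(matrix):
    """Hermitian conjugate of a 2x2 matrix stored as nested lists."""
    return [[matrix[j][i].conjugate() for j in range(2)] for i in range(2)]


def transformed_condensate(v_left, v_right, b_const=1.0):
    """Image of B*delta_ab under qbar_La q_Rb -> conj(V_L)[a][c] * V_R[b][d] * qbar_Lc q_Rd."""
    return [[b_const * sum(v_left[a][c].conjugate() * v_right[b][c] for c in range(2))
             for b in range(2)] for a in range(2)]


def distance_from_canonical(condensate, b_const=1.0):
    """Largest entry-wise deviation from the canonical VEV B*delta_ab."""
    return max(abs(condensate[a][b] - (b_const if a == b else 0.0))
               for a in range(2) for b in range(2))


v = su2_matrix(0.3, 1.1, 0.7)  # arbitrary sample element of SU(2)
isospin_shift = distance_from_canonical(transformed_condensate(v, v))
chiral_shift = distance_from_canonical(transformed_condensate(v, dagger(v)))
print(isospin_shift, chiral_shift)
```

For $V_L = V_R$ the unitarity of V returns the canonical VEV unchanged, whereas for $V_L = V_R^\dagger$ the VEV is rotated to a different point on the vacuum manifold, which is the sense in which the chiral SU(2) is spontaneously broken.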
In our proof of the Goldstone theorem in Section 14.2, we saw that the Noether
current of a spontaneously broken symmetry serves as an interpolating field for
the corresponding Goldstone boson (in other words, this operator possesses a non-
vanishing vacuum to single-particle matrix element for the corresponding Goldstone
boson, which by Haag–Ruelle theory means that it can be used to construct the exact
S-matrix scattering amplitudes of the Goldstone particle). In our case the (three)
relevant currents are the axial vector currents $J^\mu_{5a} \equiv \bar q\gamma^\mu\gamma_5\tau_a q$, $a = 1, 2, 3$ (where $\tau_{1,2,3}$
are the Pauli matrices: we avoid using the usual σ notation here as a field with this
name will shortly make its appearance). We may just as well use the pseudoscalar
operators $\bar q\gamma_5\tau_a q$, however: indeed, if a quark mass term is present (as it is, in the
real world), these operators are proportional to the divergence of the axial vector
currents $J^\mu_{5a}$ (cf. (15.199), but with no anomaly term and an SU(2) generator matrix
$\tau_a$ included both in the current on the left and the pseudoscalar divergence on the
right). Consequently, they must have a non-vanishing vacuum to single pion matrix
element if the $J^\mu_{5a}$ do.
We shall therefore content ourselves with writing a generating functional with sources
for the $\bar q\gamma_5\tau_a q$ operators, as well as for the scalar density $\bar q q = \bar q_L q_R + \mathrm{h.c.}$ whose VEV
signals the spontaneous symmetry-breaking of the theory. Knowledge of this functional
is tantamount (in virtue of the Haag–Ruelle or LSZ scattering theories discussed in
Chapter 9) to knowledge of the full set of multi-pion scattering amplitudes in the
theory. We therefore define (in Minkowski space, and glossing over the usual fine
points vis-à-vis gauge fixing, DFP determinants, ghosts, etc., in our specification of
the functional integral)
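$$Z[s,\vec{p}\,] \equiv \int \mathcal{D}\bar{q}\,\mathcal{D}q\,\mathcal{D}A_\mu\; \exp\Big\{ i\int d^4x\,\big(\mathcal{L}_{QCD} - \bar{q}(x)\,(s(x) - i\gamma_5\,\vec{\tau}\cdot\vec{p}(x))\,q(x)\big)\Big\} \qquad (16.96)$$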

where the Aμ are the gauge vector gluon fields implementing the underlying color
SU(3) local gauge symmetry, and s(x), p (x) are as usual c-number external sources cou-
pled to the operators of interest in the theory. The expected spontaneous symmetry-
breaking means that we must supplement this functional specification of the theory
by a choice of vacuum, inserted via an infinitesimal “magnetizing” field: in this case,
a small perturbing quark mass term ε q̄(x)q(x) (ε small, real and positive), which can
be implemented by taking the source field s(x) to contain the spacetime constant
term ε (plus, as usual, fields vanishing at infinity, or outside some compact region of
support). The source term may be decomposed by chirally splitting the quark fields
in the usual way:

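$$\bar{q}(s - i\gamma_5\,\vec{\tau}\cdot\vec{p}\,)q = \bar{q}_L(s + i\vec{\tau}\cdot\vec{p}\,)q_R + \bar{q}_R(s - i\vec{\tau}\cdot\vec{p}\,)q_L = \bar{q}_L\,\chi\,q_R + \bar{q}_R\,\chi^\dagger q_L \qquad (16.97)$$
$$\chi(x) \equiv s(x) + i\vec{\tau}\cdot\vec{p}(x) \qquad (16.98)$$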

From the form (16.97) it follows immediately that the source term (together with the
Lagrangian LQCD , from our previous discussion) is invariant under the joint chiral
transformation (16.92) (with VL , VR SU(2) matrices), together with the source field
transformation

χ(x) → VL χ(x)VR† (16.99)

If we consider the effect of a functional change of field variable in the path integral
(16.96) consisting precisely of such a SU(2)×SU(2) chiral transformation (which, being
anomaly-free, has unit functional Jacobian), we see that this invariance transfers
Effective field theories: a compendium 603

directly to the functional Z[s, p ] which we may just as well write as a functional
Z[χ] of the 2x2 matrix source field χ(x):

Z[χ] = Z[VL χVR† ] (16.100)

Next, let us define a new 2x2 matrix field Σ(x) = σ(x) + i τ · π (x) which incorporates
four fields—an isoscalar σ and an isovector π —in terms of which we shall write our
effective Lagrangian. The latter may be defined in terms of the functional Fourier
transform of Z[χ], as follows:
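$$\int d^4x\,\mathcal{L}_{eff}[\Sigma] \equiv -i\ln\Big(\int \mathcal{D}\chi\; e^{-\frac{i}{2}\int d^4x\,\mathrm{Tr}(\chi^\dagger(x)\Sigma(x))}\, Z[\chi]\Big) \qquad (16.101)$$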

where the factor of one-half arises as a consequence of the relation Tr(χ† (x)Σ(x)) =
Tr(χ(x)Σ†(x)) = 2(s(x)σ(x) + p(x) · π(x)). The inverse Fourier transform relation then
becomes
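$$Z[\chi] = \int \mathcal{D}\Sigma\; e^{\,i\int d^4x\,\left(\mathcal{L}_{eff}[\Sigma] + \frac{1}{2}\mathrm{Tr}(\chi^\dagger(x)\Sigma(x))\right)} \qquad (16.102)$$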
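The trace identity behind the factor of one-half, and the invariance of the source coupling under the joint transformation of χ and Σ, can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names (`scalar_plus_i_tau_dot`, `su2`, and so on) and the numerical values of the fields and rotation angles are illustrative assumptions, not part of the text.

```python
import math

# Pauli matrices tau_1, tau_2, tau_3 as nested tuples.
PAULI = (
    ((0, 1), (1, 0)),
    ((0, -1j), (1j, 0)),
    ((1, 0), (0, -1)),
)

def scalar_plus_i_tau_dot(scalar, vector):
    """Return the 2x2 matrix scalar*1 + i tau.vector (the form shared by chi and Sigma)."""
    return [[scalar * (i == j) + 1j * sum(vector[a] * PAULI[a][i][j] for a in range(3))
             for j in range(2)] for i in range(2)]

def su2(theta):
    """SU(2) matrix exp(i tau.theta) = cos|theta| + i tau.theta_hat sin|theta|."""
    norm = math.sqrt(sum(t * t for t in theta))
    if norm == 0.0:
        return scalar_plus_i_tau_dot(1.0, [0.0, 0.0, 0.0])
    return scalar_plus_i_tau_dot(math.cos(norm), [math.sin(norm) * t / norm for t in theta])

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

# Illustrative (assumed) values of the sources (s, p) and the fields (sigma, pi) at one point.
s, p = 0.7, [0.2, -0.5, 1.1]
sigma, pi = 1.3, [0.4, 0.9, -0.6]
chi = scalar_plus_i_tau_dot(s, p)
Sigma = scalar_plus_i_tau_dot(sigma, pi)

direct = trace(matmul(dagger(chi), Sigma))
expected = 2.0 * (s * sigma + sum(x * y for x, y in zip(p, pi)))

# Joint chiral rotation chi -> V_L chi V_R^dagger, Sigma -> V_L Sigma V_R^dagger.
v_left, v_right = su2([0.3, -1.2, 0.5]), su2([-0.7, 0.4, 0.9])
chi_rot = matmul(matmul(v_left, chi), dagger(v_right))
Sigma_rot = matmul(matmul(v_left, Sigma), dagger(v_right))
rotated = trace(matmul(dagger(chi_rot), Sigma_rot))
```

Here `direct` should agree with `expected`, and `rotated` with `direct`, up to rounding error, which is the pointwise content of the statement that the source term respects the SU(2)×SU(2) symmetry.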

Once again, the chiral SU(2)×SU(2) symmetry (16.100) transfers directly to our new
effective Lagrangian Leff [Σ] defined in (16.101),¹¹

Leff [Σ] = Leff [VL ΣVR† ] (16.103)

The effective Lagrangian Leff [Σ] is an exact transcription of the dynamics of QCD
relevant for the determination of the full n-point Green functions of the quark bilinear
fields q̄q and q̄γ5 τ q coupled to the sources χ: in particular, if we knew the exact
form of this functional, we would be able to calculate arbitrary multi-pion scattering
amplitudes exactly, and even determine the exact nucleon mass from the location of
the nucleon–antinucleon threshold in π⁰–π⁰ scattering, for example! Of course, all we
know about this effective Lagrangian is that it is invariant under the chiral symmetry
(16.103). However, by the same arguments of clustering and Lorentz-invariance which
led to the general form (16.6) in Section 16.2, any (appropriately regularized) effective
Lagrangian must be expandable in an infinite series of products of local operators
and their spacetime derivatives. The chiral symmetry which we must impose on
Leff [Σ] implies that only certain combinations of the matrix field Σ can appear in this
expansion: specifically, the Lagrangian must be given in terms of traces of products of
Σ and Σ† arranged to ensure the validity of (16.103). We thereby obtain the following
expansion (rescaling the field Σ to fix the coefficient of the kinetic term at 1/4, and
including an infinitesimal mass term ε Tr(Σ) from the source s(x) as discussed above,
with a coefficient rescaled to ε after rescaling Σ)

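$$\mathcal{L}_{lin}[\Sigma] = \frac{1}{4}\mathrm{Tr}(\partial_\mu\Sigma^\dagger\partial^\mu\Sigma) + \frac{\mu^2}{4}\mathrm{Tr}(\Sigma^\dagger\Sigma) - \frac{\lambda}{16}(\mathrm{Tr}(\Sigma^\dagger\Sigma))^2 + \varepsilon\,\mathrm{Tr}(\Sigma) + \ldots \qquad (16.104)$$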

11 The reader may easily check that the SU(2)×SU(2) chiral symmetry in (16.103) is equivalent to an
O(4) rotation of the four fields σ, π. One frequently finds discussions of chiral symmetry phrased in terms
of this O(4) language.

where the dots refer to higher-dimension (and therefore, in the sense of Section 16.3,
irrelevant) operators such as $(\mathrm{Tr}(\Sigma^\dagger\Sigma))^3$, $\mathrm{Tr}(\partial_\mu\Sigma^\dagger\partial_\nu\Sigma)\,\mathrm{Tr}(\partial^\mu\Sigma^\dagger\partial^\nu\Sigma)$, etc. We shall
return to the role of higher-dimension operators in our theory below. Here, we note
that the dimension 4 (or less) terms indicated in (16.104) constitute the so-called
“linear σ model”. As our field σ = ½ Tr(Σ) reproduces the matrix elements of q̄q we
must ensure that it acquires a VEV in the vacuum, which implies the choice of sign
of the second term in (16.104). The linear model leads in the usual way at tree level
to a vacuum expectation value for the σ field at the unique minimum of the field
polynomial (for infinitesimal positive ε)

$$P(\sigma,\vec{\pi}) = -\frac{\mu^2}{2}(\sigma^2+\vec{\pi}^{\,2}) + \frac{\lambda}{4}(\sigma^2+\vec{\pi}^{\,2})^2 - 2\varepsilon\sigma \qquad (16.105)$$

which occurs at σ ≡ v = μ/√λ. Displacing the field σ(x) = v + σ̂(x) in the usual
way, we find that the π fields lose their mass term and become massless Goldstone
bosons as expected. This result, of course, continues to hold at the exact minimum of
the non-derivative part of the (unknown!) full Llin [Σ], which by the chiral symmetry,
is necessarily a function of σ² + π²: thus, at the minimum, we have π = 0 and the flat
directions are just those of the π fields. In addition to the massless Goldstone π fields,
the model also contains the massive σ̂ degree of freedom, with mass of order μ.¹²
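The tree-level statements just made can be checked numerically from the field polynomial (16.105) with the symmetry-breaking term switched off. The sketch below is illustrative only: the function names and the values of μ and λ are assumptions, and the curvatures are estimated by finite differences rather than derived analytically.

```python
import math

def potential(sigma, pi_vec, mu, lam, eps=0.0):
    """Field polynomial P(sigma, pi) of (16.105)."""
    r2 = sigma ** 2 + sum(x * x for x in pi_vec)
    return -0.5 * mu ** 2 * r2 + 0.25 * lam * r2 ** 2 - 2.0 * eps * sigma

def first_derivative(f, x0, h=1e-4):
    return (f(x0 + h) - f(x0 - h)) / (2.0 * h)

def second_derivative(f, x0, h=1e-4):
    return (f(x0 + h) - 2.0 * f(x0) + f(x0 - h)) / h ** 2

mu, lam = 1.0, 4.0          # illustrative (assumed) parameter values
v = mu / math.sqrt(lam)     # claimed location of the minimum along sigma

# Slope and curvature along sigma at (sigma, pi) = (v, 0).
slope = first_derivative(lambda s: potential(s, [0.0, 0.0, 0.0], mu, lam), v)
m_sigma_sq = second_derivative(lambda s: potential(s, [0.0, 0.0, 0.0], mu, lam), v)

# Curvature along one pion direction at the same point.
m_pi_sq = second_derivative(lambda x: potential(v, [x, 0.0, 0.0], mu, lam), 0.0)
```

With ε = 0 the slope at σ = v vanishes, the curvature in a π direction vanishes (the flat Goldstone directions), and the curvature in the σ direction comes out as 2μ², consistent with a σ̂ mass of order μ.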
As we explained in our general discussion at the beginning of this section, the
derivation of an effective field theory usually entails, in addition to a functional change
of variable, the partial elimination of degrees of freedom by integrating out field modes
which are not important at the energy scales of interest. We now proceed to this second
step, beginning with the (still, in principle exact) effective theory (16.104). We shall be
interested in processes occurring at momentum scales much lower than the mass scale
μ of the “heavy” degrees of freedom interpolated by the σ̂ field. Precisely as in our
discussion of heavy particle decoupling above, we can do this by integrating out the σ
degree of freedom, leaving an effective Lagrangian depending only on the Goldstone
fields π . The most convenient way to do this involves a further change of variable,
whereby we re-express the theory in terms of new fields S(x), Π via the non-linear
transformation


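$$\Sigma(x) = \sigma(x) + i\vec{\tau}\cdot\vec{\pi}(x) \equiv S(x)\,e^{i\vec{\tau}\cdot\vec{\Pi}(x)/v},\qquad S(x) \equiv \sqrt{\sigma^2+\vec{\pi}^{\,2}} = v + \hat{S}(x) \qquad (16.106)$$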

The reader may easily verify (Problem 8) that this change of variable, when inserted
in (16.104), leads, after expanding the exponential, to a Lagrangian with (canonically
normalized) massive Ŝ field and massless Π fields. The chiral symmetry transfers
directly to the unitary matrix field $U(x) \equiv e^{i\vec{\tau}\cdot\vec{\Pi}(x)/v}$ (as $S(x) = \sqrt{\det\Sigma}$ is chirally
invariant): the theory must be invariant under

12 Note that while the theory contains physical (massless) pions, there is no stable particle associated
with the σ field: the mass scale μ is naturally of the order of the other important physical hadronic scales,
e.g., the rho resonance pole or nucleon mass, i.e., closer to 1 GeV, and quite a bit larger than the VEV v,
which turns out to be just the pion decay constant fπ ∼ 100 MeV (see Problem 7).

U (x) → VL U (x)VR† (16.107)

The result of integrating out the massive Ŝ field must therefore be an effective
Lagrangian Lnonlin [U ], subject to the exact global symmetry (16.107), which our new
effective theory inherits from the original symmetry (16.103) of the linear model.
Following a by now familiar pattern, we therefore set about constructing the most
general chirally invariant functional of the unitary matrix field U , as an expansion
in terms involving traces of products of U , U † and their spacetime-derivatives. As
factors of U and U † must appear adjacent in the traces to ensure invariance under
(16.107), they must have derivatives to avoid evaporating via the unitarity constraint
U†U = UU† = 1. The coefficient of the leading term in the expansion (two derivatives
only) can be chosen to yield the canonical normalization of the kinetic term for the Π
field, and we find the non-linear σ model

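$$\mathcal{L}_{nonlin}[U] = \frac{v^2}{4}\mathrm{Tr}(\partial_\mu U^\dagger\partial^\mu U) + L_1\,[\mathrm{Tr}(\partial_\mu U^\dagger\partial^\mu U)]^2 + L_2\,\mathrm{Tr}(\partial_\mu U^\dagger\partial_\nu U)\,\mathrm{Tr}(\partial^\mu U^\dagger\partial^\nu U) + L_3\,\mathrm{Tr}(\partial^\mu U^\dagger\partial_\mu U\,\partial^\nu U^\dagger\partial_\nu U) + \ldots \qquad (16.108)$$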

where we have followed the notation of Gasser and Leutwyler (Gasser and Leutwyler,
1985) in labeling the higher-order coefficients L1 , L2 , etc. If we expand the exponential
$U = e^{i\vec{\tau}\cdot\vec{\Pi}/v}$, we find an effective Lagrangian with a massless kinetic term for our
Goldstone pion fields Π and interaction terms which all contain two (or more)
spacetime-derivatives. For example, the terms $\mathcal{L}_2$ with two derivatives only (from
the first term on the right-hand side of (16.108)) become (see Problem 9)
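$$\mathcal{L}_2 = \frac{1}{2}\partial_\mu\vec{\Pi}\cdot\partial^\mu\vec{\Pi} + \frac{1}{6v^2}\big(\vec{\Pi}\cdot\partial_\mu\vec{\Pi}\;\vec{\Pi}\cdot\partial^\mu\vec{\Pi} - \vec{\Pi}^{\,2}\,\partial_\mu\vec{\Pi}\cdot\partial^\mu\vec{\Pi}\big) + \ldots \qquad (16.109)$$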
where the dots represent operators of dimension 8 or higher. The terms $\mathcal{L}_4$, $\mathcal{L}_6$, . . . with
four, six, . . . spacetime-derivatives may similarly be expanded in powers of the fields Π
which interpolate for the massless pions of our toy theory. At least at tree level,
we see that the interaction terms in this effective Lagrangian give rise to powers of
the external momenta of the process corresponding to the spacetime-derivatives, and
that at low momenta (much smaller than the scale v, say, which can be shown to
correspond exactly to the pion decay constant fπ; see Problem 7) the multi-pion
scattering amplitudes of our theory should be given to a good approximation by the
graphs generated by $\mathcal{L}_2$, with progressively smaller contributions from $\mathcal{L}_4$, $\mathcal{L}_6$, . . ., etc.
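The quartic term of (16.109) can be checked numerically against the first term of (16.108) without carrying out the expansion by hand. The sketch below is illustrative: it restricts to a single spacetime direction (so that ∂Π·∂Π reduces to an ordinary dot product, with no attention paid to metric signs), evaluates ∂U by a central finite difference, and uses assumed values for v, Π and ∂Π; all function names are hypothetical.

```python
import math

# Pauli matrices tau_1, tau_2, tau_3.
PAULI = (
    ((0, 1), (1, 0)),
    ((0, -1j), (1j, 0)),
    ((1, 0), (0, -1)),
)

def u_matrix(pion, v):
    """U = exp(i tau.Pi / v) = cos(x) + i tau.Pi_hat sin(x), with x = |Pi| / v."""
    norm = math.sqrt(sum(c * c for c in pion))
    x = norm / v
    unit = [c / norm for c in pion] if norm > 0.0 else [0.0, 0.0, 0.0]
    return [[math.cos(x) * (i == j)
             + 1j * math.sin(x) * sum(unit[a] * PAULI[a][i][j] for a in range(3))
             for j in range(2)] for i in range(2)]

def kinetic_term(pion, dpion, v, h=1e-5):
    """(v^2/4) Tr(dU^dagger dU) for field value `pion` and derivative `dpion`
    along one direction, with dU from a central finite difference."""
    plus = u_matrix([p + h * d for p, d in zip(pion, dpion)], v)
    minus = u_matrix([p - h * d for p, d in zip(pion, dpion)], v)
    du = [[(plus[i][j] - minus[i][j]) / (2.0 * h) for j in range(2)] for i in range(2)]
    # Tr(A^dagger A) equals the sum of |A_ij|^2 over all entries.
    return 0.25 * v * v * sum(abs(du[i][j]) ** 2 for i in range(2) for j in range(2))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

v = 1.0                       # assumed units in which v = 1
pion = [0.12, -0.05, 0.08]    # small field value, |Pi| << v
dpion = [0.3, 0.7, -0.4]      # its derivative along the chosen direction

exact = kinetic_term(pion, dpion, v)
quadratic = 0.5 * dot(dpion, dpion)
quartic = (dot(pion, dpion) ** 2 - dot(pion, pion) * dot(dpion, dpion)) / (6.0 * v * v)
```

For field values small compared with v, `exact` differs visibly from `quadratic` alone but agrees with `quadratic + quartic` up to a remainder of relative order (Π/v)⁴, which is the content of the dots in (16.109).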
But what about loops, which must certainly be included if we wish to calculate in
a systematic way the complete amplitudes implied by our effective theory? And what
about the infinite set of terms represented by the dots in (16.109), containing higher
powers of the Π field, but still only two derivatives, which we may expect to contribute
comparably to the indicated ones, by this argument?
In fact, as the discussion in Section 16.5 made clear, the formal expansion of
an effective Lagrangian in local operators containing powers of the fields and their

derivatives only acquires a precise meaning once a regularization procedure is provided


which defines unambiguously the n-point functions of these operators. In the case of
chiral Lagrangians, the dimensional regularization scheme described there is by far
the best choice. The cutoff introduced in this scheme was called Λ previously, but in
accordance with standard convention, we shall use μ henceforth, to be distinguished
from the similarly notated scale of the “heavy” σ degrees of freedom of the linear
σ model discussed above, although the scale of the dimensional regularization may
perfectly well be chosen at such a value.
Note that the choice of μ is in a sense arbitrary, as it merely corresponds to a
reshuffling of the definition of the (infinite set) of dimensionally regularized operators
of the theory (as we shall see in Chapter 17), although it should certainly be chosen
at a “sensible” value which minimizes the size of the logarithmic arguments in the
amplitudes of interest (thereby improving the rapidity of convergence of the chiral
perturbation theory). Indeed, as we saw in the preceding section, the only dependence
of the regularized amplitudes of the theory on the regularization scale μ comes through
logarithmic factors of the form ln(p²/μ²), where p is a generic external momentum
of the process, as the loop integrals contain only massless pion propagators carrying
combinations of the external and loop momenta, and the latter are integrated out,
with dependence on μ only arising from expanding the $\mu^{4-d}$ factors associated with
each loop integral around the physical dimension d = 4.
Apart from the mass dimensions provided by the couplings associated with the
various vertices in the 2(4,6,. . . ) derivative Lagrangians $\mathcal{L}_2$ ($\mathcal{L}_4$, $\mathcal{L}_6$, . . . ), which (as U
is dimensionless) must have mass dimension $M^{4-n}$ for an n-derivative term, and the dimension $M^{-1}$ associated
with the 1/v factor accompanying each pion in an interaction vertex, the remaining
mass dimension of a dimensionally regulated amplitude must arise solely from integer
powers of the generic external momentum p (modulated by dimensionless logarithms
involving the regularization scale μ). Consider a graph with E external pions and
a total of Nπ pion fields appearing at all the interaction vertices, of which there
are N2 from the $\mathcal{L}_2$ Lagrangian, N4 from $\mathcal{L}_4$, etc. The total mass dimension of the
(truncated) amplitude is 4 − E, of which a mass power $\sum_n (4-n)N_n - N_\pi$ is
contributed by explicit factors of the dimensionful constants of the theory. The remaining
D mass dimensions must therefore come from the powers of external momentum,
and D is given by

$$D = 4 - E - \Big(\sum_n (4-n)N_n - N_\pi\Big) \qquad (16.110)$$

The E external pion fields and Nπ fields at the interaction vertices clearly produce
graphs with the number of internal lines I = (Nπ − E)/2 (as each internal line arises from the
contraction of two pion fields).
Moreover, we saw in Section 10.4 that a connected graph
with L loops and $V = \sum_n N_n$ vertices has I = L + V − 1 internal lines. Eliminating
Nπ = 2L + 2V + E − 2 from (16.110), we find

$$D = 2 + \sum_n (n-2)N_n + 2L \qquad (16.111)$$

As the second and third terms on the right are zero or positive, we see that the leading
behavior of an arbitrary multi-pion amplitude at low momentum (a) vanishes at least
quadratically as the external momenta go to zero, and (b) that the leading behavior
at low momentum (D = 2), is given completely by the lowest term $\mathcal{L}_2$ in our general
effective Lagrangian, and indeed, by only the tree graphs (L = 0) obtained therefrom.
Higher-order corrections at low momentum, of order p⁴ say, require using the higher-order
term $\mathcal{L}_4$ at tree level, or $\mathcal{L}_2$ to one-loop order; and so on. In any event, it is clear
that only a finite and well-defined set of parameters (which must be determined by
experimental fits, or calculated non-perturbatively, say by lattice QCD techniques) are
relevant up to any given order of the low-momentum expansion of the amplitudes of
the theory. We see again the quintessential advantage of an effective field theory: the
sensitivity of the amplitudes of the theory in a restricted momentum regime is found
to be restricted to a limited set of terms, allowing these amplitudes to be calculated in
a systematic, though approximate, fashion in perturbation theory, starting from the
leading terms in the effective Lagrangian for the theory.
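The counting rule (16.111) is simple enough to tabulate mechanically. The following sketch (the function names and the dictionary representation of the vertex content are illustrative assumptions) computes D both from the dimensional bookkeeping of (16.110), using Nπ = 2L + 2V + E − 2, and from the closed form (16.111), so that the two can be compared graph by graph.

```python
def pion_field_count(loops, vertex_counts, external):
    """N_pi = 2L + 2V + E - 2, from I = (N_pi - E)/2 and I = L + V - 1."""
    vertices = sum(vertex_counts.values())
    return 2 * loops + 2 * vertices + external - 2

def momentum_power_from_dimensions(loops, vertex_counts, external):
    """D as given by (16.110); vertex_counts maps n (number of derivatives) to N_n."""
    n_pi = pion_field_count(loops, vertex_counts, external)
    coupling_power = sum((4 - n) * count for n, count in vertex_counts.items()) - n_pi
    return 4 - external - coupling_power

def momentum_power(loops, vertex_counts):
    """D as given by (16.111): D = 2 + sum_n (n - 2) N_n + 2L."""
    return 2 + sum((n - 2) * count for n, count in vertex_counts.items()) + 2 * loops

# Examples for four-pion scattering (E = 4).
tree_l2 = momentum_power(0, {2: 1})       # single L2 vertex at tree level
one_loop_l2 = momentum_power(1, {2: 2})   # one-loop graph built from two L2 vertices
tree_l4 = momentum_power(0, {4: 1})       # single L4 vertex at tree level
```

The first example gives D = 2, and the other two both give D = 4, reproducing the statement that order p⁴ requires the four-derivative Lagrangian at tree level together with the two-derivative Lagrangian at one loop. The number of external pions drops out of D, as (16.111) indicates.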
The highly simplified toy model with which we have introduced the ideas of chiral
effective Lagrangians must, of course, undergo substantial elaboration to provide an
adequate description of low-energy hadronic physics in the real world. Our massless up
and down quarks must be given (small) masses, and the somewhat higher strange quark
mass also included if we wish to study the low-energy amplitudes of strange mesons.
The methods described above can be generalized to deal with the inclusion of quark
mass terms, treated perturbatively (so the amplitudes of the theory are developed in a
double expansion in the small external momenta of the pseudo-Goldstone mesons and
the light quark masses), and sources can be introduced for the vector and axial vector
currents of the theory. Even the axial U(1) anomaly (of importance in the calculation
of the neutral pion decay rate to two photons) can be included systematically in the
resultant effective Lagrangian. The interested reader is encouraged to pursue these
further developments, which are fully laid out in the seminal work of Gasser and
Leutwyler (Gasser and Leutwyler, 1985) (in turn based on the pioneering contributions
of Weinberg (Weinberg, 1968)).
Considerations of space require us to bring to a conclusion this all-too-brief survey
of effective field theory: admittedly, we have barely scratched the surface of this rich,
and, for modern elementary particle theory, profoundly important subject. A few
more comments are in order, however. As a practical matter, most applications of
effective field theory are carried out by a matching procedure, whereby the coefficients
of the putative effective Lagrangian (itself determined by appropriate choice of a
smearing of the underlying fields followed by integrating out unwanted modes, and fully
exploiting any available symmetries to restrict the set of allowed operators) relevant
to the physics regime of interest are determined by equating amplitudes computed
by the use of the effective Lagrangian with the same amplitudes computed either
perturbatively or non-perturbatively (e.g., by lattice methods) up to the appropriate
level of sensitivity in an expansion in some available small momentum ratio. The
technical details of this matching procedure will not be described further here: we
refer the reader to any of a number of excellent reviews on effective field theory, for
example, the review of Georgi (Georgi, 1993) and the TASI-2002 lectures of Rothstein
(Haber and Nelson-eds, 2004).

16.7 Problems
1. Verify that the graphs in Fig. 16.2 give rise to the amplitude displayed in (16.15).
Use the Feynman parameter identity (16.67) to combine the propagators in the
loop integral, which you may evaluate with a sharp momentum cutoff (i.e., a
factor of θ(Λ − |k|)).
2. Carry out the steps leading from (16.34) to (16.37).
3. Show that the cutoff dependence of the connected functional W0 arising from
the functional integral of the (exponentiated) free cutoff Lagrangian L0 (φ, Λ) in
(16.28) corresponds to the source-independent term in (16.43).
4. Verify that the loop integral appearing in (16.53) can be expanded as indicated
in (16.55).
5. Show that the expansion coefficients fn of the one-loop integral (16.62) in
powers of q²/m² for n > 0 are identical up to powers of m²/Λ² in dimensional
regularization and sharp momentum cutoff schemes. Note that the integral
subtracted at zero momentum (thereby leaving only the fn , n > 0 terms) is
convergent as d → 4 in the dimensional scheme and has a finite limit (with
O(m²/Λ²) corrections) as Λ → ∞ in the sharp momentum cutoff scheme.
6. Starting from the effective non-relativistic Lagrangian (16.82), show that the
bubble diagrams contributing to the 2-2 amplitude (corresponding to the Fourier
transform of the four-point function ⟨0|T(ψψψ†ψ†)|0⟩) produce in d = 2 or d = 3
spacetime dimensions (provided λ < 0) a pole corresponding to a non-relativistic
threshold bound state, of exactly the form found in the full relativistic theory in
Section 11.2.
7. By identifying the Noether currents J5aμ = q̄γμ γ5 τa q (to lowest order) for chiral
SU(2) transformations in the QCD Lagrangian with the corresponding current
in the effective non-linear Lagrangian (16.108), we find that the σ model VEV v
can be identified with the pion decay constant fπ . Here is the argument:
(a) Show that the Noether current implementing infinitesimal chiral transforma-
tions (i.e., VL = VR† = 1 + iωa τa , ωa infinitesimal) in the effective non-linear
model is (to lowest order in the chiral expansion)

$$J^{eff}_{5a\mu} = i\,\frac{v^2}{4}\,\mathrm{Tr}\big(\tau_a(U^\dagger\partial_\mu U - U\partial_\mu U^\dagger)\big) \qquad (16.112)$$
(b) The pion decay constant is defined in terms of the vacuum to one-pion matrix
element of the axial current J5aμ (which appears in the hadronic part of the
Fermi interaction Hamiltonian $\frac{G_F}{\sqrt{2}}\,\bar{u}\gamma^\mu(1+\gamma_5)d\;\bar{\mu}\gamma_\mu(1+\gamma_5)\nu_\mu$ responsible for
charged pion decay to μ + ν̄μ), as follows:

$$\langle 0|J_{5a\mu}(0)|\pi_b(p)\rangle = i f_\pi\, p_\mu\, \delta_{ab} \qquad (16.113)$$

Using (16.112) (with U suitably expanded in Π fields) in (16.113), show that
we can identify fπ with v.
8. Carry out the change of field variables indicated in (16.106) in the terms given
explicitly in (16.104), with the “tickling” term set to zero (ε = 0), to obtain
the form of the nonlinear Lagrangian (as a function of S and U fields) prior to
integrating out the “heavy” S field.

9. Substituting the definition $U(x) \equiv e^{i\vec{\tau}\cdot\vec{\Pi}(x)/v}$ in the first term in the non-linear
chiral Lagrangian (16.108), and expanding out the exponential, show that one
obtains the terms shown in (16.109). Use the indicated quartic interaction
term to calculate, to lowest order at small pion momenta, the amplitude for
π⁺ + π⁻ → π⁰ + π⁰ scattering.
17
Scales II: Perturbatively
renormalizable field theories

In the previous chapter we emphasized the importance of the scale separation property
of local quantum field theories, which expresses our ability to predict, at least to some
reasonable level of accuracy, the features of particle interactions at long distances (or
equivalently, at low energies) despite the fact that we are inevitably ignorant of the
“ultimate” details of these interactions at arbitrarily small distance scales. Historically,
this property was first realized in the context of the perturbative treatment of a specific
quantum field theory, quantum electrodynamics, the local gauge theory describing
the interactions of photons with electrons (and other charged leptons). As we saw
in Chapter 2, the development of interacting quantum field theories in the two
decades from the early 1930s to around 1950 was severely hampered by the ubiquitous
infinities—more precisely, ultraviolet divergences in the integrals over the momenta
of particles appearing in intermediate states in scattering amplitudes—which plagued
all higher-order calculations in these theories.
In the late 1940s the development of graphical techniques for covariant perturbation
theory proved to be the critical ingredient needed to surmount this impasse. The
covariant graphical techniques introduced by Feynman revealed in a much more
transparent way the structure of the scattering amplitudes of the theory, and in
particular the “nested” character of the divergent contributions to these amplitudes,
features which were then exploited by Dyson in his classic development of perturbative
renormalization theory for quantum electrodynamics (Dyson, 1949). Dyson was able
to show, order by order in perturbation theory, and to all orders of perturbation
theory, that the distressing divergences disappeared provided the amplitudes of the
theory were re-expressed in terms of a finite number of “renormalized” parameters
corresponding to measurable (and therefore ipso facto finite) low-energy properties
of the theory. Otherwise stated, if an ultraviolet cutoff Λ is introduced to regularize
the theory (thereby making all loop integrals finite), the reparameterization of the
amplitudes of the theory in terms of renormalized quantities softened the dependence
on the cutoff Λ, yielding amplitudes which were finite in the limit Λ → ∞, with a
cutoff sensitivity at finite Λ corresponding to powers (typically quadratic) of the ratio
of the masses and momenta in the scattering amplitude to the UV cutoff Λ.
The remarkable quantitative agreement of the quantum electrodynamic amplitudes
(anomalous magnetic moment of the electron and muon, Lamb shift, etc.) computed
using this procedure with the measured experimental values remains among the most
impressive successes of physical science. Nevertheless, the overpowering impression

persisted among many physicists that the procedures of Dyson amounted to an intel-
lectually unsavory “fudge”—a mere sweeping under the rug of potential underlying
inconsistencies in the theory of which the ultraviolet divergences were apparently
the overt symptom. This unease began to dissipate in the early 1970s, with the
development by Wilson of the effective field theory point of view which we discussed
in Section 16.2. The inescapable presence of new physics (quantum gravity, string
theory, or what have you) at short distances implies, as emphasized in the preceding
chapter, that we must necessarily imagine a cutoff Λ of some kind at high energies:
the question of infinities in the (unphysical!) limit where this cutoff is taken to infinity
is then replaced by the issue of sensitivity of the measurable low-energy amplitudes
of the theory to the value (and type) of this cutoff. Our object in this chapter is to
review the techniques that have been developed to study this sensitivity in the context
of the formal weak coupling perturbation expansions introduced in Chapter 10, and
in particular to show that for a certain subset of field theories, the sensitivity is of the
weak kind (inverse powers of the cutoff) first described by Dyson. We shall also see
that these results fit naturally into the picture of the renormalization group flow of
effective Lagrangians discussed in the previous chapter.
Before diving into the technical details, it may be helpful to the reader to give a
simple example of the phenomenon outlined above, whereby the strong dependence of a
regularized scattering amplitude on an ultraviolet cutoff can be dramatically weakened
by a reparameterization of the theory in terms of “renormalized” parameter(s). Let
us suppose that we are told that the 2-2 scattering amplitude $_{out}\langle k_3,k_4|k_1,k_2\rangle_{in}$ for
some particle is described by the expression (given, as usual, up to terms involving
inverse powers of the cutoff Λ)
$$\Gamma^{(4)}(k_1,k_2,k_3,k_4) = \frac{ig^2}{1 + Cg^2\ln(t/\Lambda^2)} + \frac{ig^2}{1 + Cg^2\ln(u/\Lambda^2)} + O\Big(\frac{k_i^2}{\Lambda^2}\Big)$$
$$t \equiv -(k_3-k_1)^2,\qquad u \equiv -(k_4-k_1)^2 \qquad (17.1)$$
where g is the coupling constant for some interaction in the theory cut off at
momentum Λ, and C is an uninteresting (real positive) numerical constant.¹ If we
expand out the denominator factors in a perturbative series in powers of g², it is
apparent that at any given finite order of perturbation theory, the amplitude displays
logarithmic ultraviolet divergences, becoming infinite as a power of ln(Λ²) as the
cutoff Λ is taken to infinity. Let us instead parameterize the amplitude in terms of a
renormalized coupling gR , defined simply in terms of the value of the 2-2 scattering
amplitude at some experimentally accessible value of the momentum transfer variables
t = u = μ² ≪ Λ², as follows:

1 The curious reader may be interested in the origin of this simple expression, although it is not relevant
to the present discussion. It represents the 2-2 amplitude for scattering in a theory of N massless Dirac
fermions in two spacetime dimensions, with quartic interaction Lagrangian $\frac{1}{2}g^2\big(\sum_{i=1}^{N}\bar{\psi}_i\psi_i\big)^2$ (the so-called
Gross–Neveu model (Gross and Neveu, 1974)), in the limit where N → ∞ with $g^2N$ fixed: the so-called
“1/N” expansion. In this case the constant C = N/(2π). It also gives, in a similar limit, the 2-2 scattering
amplitude for a theory of N massless complex scalar fields in four spacetime dimensions with interaction
Lagrangian $-\frac{1}{2}g^2\big(\sum_{i=1}^{N}\phi_i^\dagger\phi_i\big)^2$, but with the constant C now negative, C = −N/(16π²).

$$g_R^2 \equiv -\frac{i}{2}\,\Gamma^{(4)}(t=u=\mu^2) = \frac{g^2}{1 + Cg^2\ln(\mu^2/\Lambda^2)} \;\Rightarrow\; g^2 = \frac{g_R^2}{1 + Cg_R^2\ln(\Lambda^2/\mu^2)} \qquad (17.2)$$
In the jargon of renormalization theory with which we must now begin to familiarize
ourselves, such a definition is referred to as a renormalization condition: a complete
set of such conditions (one for each free parameter needed to uniquely identify the
low-energy theory) specifies a renormalization scheme. If we now re-express the 2-2
amplitude (17.1) in terms of the renormalized coupling gR , we find
$$\Gamma^{(4)}(k_1,k_2,k_3,k_4) = \frac{ig_R^2}{1 + Cg_R^2\ln(t/\mu^2)} + \frac{ig_R^2}{1 + Cg_R^2\ln(u/\mu^2)} + O\Big(\frac{k_i^2}{\Lambda^2}\Big) \qquad (17.3)$$
and we see that all dependence on the ultraviolet cutoff Λ has been removed to the
level of the (in practice, for quantum electrodynamics) tiny inverse power terms which
are, of course, harmless in the limit (Λ → ∞) in which the cutoff is removed entirely.
In particular, the expansion of our 2-2 amplitude in powers of gR gives a renormalized
perturbation expansion in which each term is separately insensitive (again, up to
harmless inverse power corrections) to the UV cutoff Λ. The essential components
of the renormalization procedure in field theory can be seen already here, in this
admittedly algebraically trivial example: first, the choice of a regularization method
(effectively, a mathematical model of our ignorance of the theory at short distances),
and second, the renormalization conditions identifying a specific reparameterization of
the amplitudes of the theory (allowing us to conveniently inject accessible low-energy
information about the theory).
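The algebra leading from (17.1) to (17.3) is easily confirmed numerically. In the sketch below, which is illustrative only, the inverse power corrections are dropped, t and u are taken positive so that the logarithms are real, and the values of C, gR², μ, t and u are assumptions chosen for the example; the amplitudes are divided by the overall factor of i.

```python
import math

def bare_coupling(gr2, mu, cutoff, c):
    """g^2 in terms of g_R^2, from the second relation in (17.2)."""
    return gr2 / (1.0 + c * gr2 * math.log(cutoff ** 2 / mu ** 2))

def bare_amplitude(g2, t, u, cutoff, c):
    """Gamma^(4)/i from (17.1), neglecting O(k^2/Lambda^2) terms."""
    return sum(g2 / (1.0 + c * g2 * math.log(x / cutoff ** 2)) for x in (t, u))

def renormalized_amplitude(gr2, t, u, mu, c):
    """Gamma^(4)/i from (17.3), neglecting O(k^2/Lambda^2) terms."""
    return sum(gr2 / (1.0 + c * gr2 * math.log(x / mu ** 2)) for x in (t, u))

c, gr2, mu = 0.1, 0.5, 1.0   # illustrative (assumed) values
t, u = 2.0, 3.0

target = renormalized_amplitude(gr2, t, u, mu, c)
results = []
for cutoff in (1e3, 1e6, 1e9):
    g2 = bare_coupling(gr2, mu, cutoff, c)   # the bare coupling changes with the cutoff...
    results.append((cutoff, g2, bare_amplitude(g2, t, u, cutoff, c)))   # ...the amplitude does not
```

As the cutoff is raised the bare coupling g² decreases steadily, while the amplitude computed from (17.1) stays fixed at the value given by (17.3): the cutoff dependence has been absorbed entirely into the relation between g and gR.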
A third feature of the renormalization program which becomes apparent once we
evaluate amplitudes order by order in perturbation theory derives directly from the
reparameterization step: the appearance of subtracted amplitudes at each order of
perturbation theory. Let us illustrate this point at the first subleading order ($g_R^4$) in
the expansion of our 2-2 amplitude (17.1) (corresponding to one-loop contributions
in the underlying field theory). We may evidently define a coupling constant shift (or
counterterm) $\delta g_R^2$ by the trivial identity
$$g^2 \equiv g_R^2 + \delta g_R^2 \qquad (17.4)$$
where, by expanding (17.2), we have through order $g_R^4$,
$$\delta g_R^2 = -Cg_R^4\ln(\Lambda^2/\mu^2) + O(g_R^6) \qquad (17.5)$$

Inserting (17.5) into the perturbative expansion of the amplitude (17.1) in powers of
the bare coupling g, we find (neglecting terms suppressed by powers of the cutoff)

$$\Gamma^{(4)}(k_1,k_2,k_3,k_4) = i\{g^2 + Cg^4\ln(\Lambda^2/t) + \ldots\} + (t \to u)$$
$$= i\{g_R^2 + Cg_R^4\ln(\Lambda^2/t) + \delta g_R^2 + O(g_R^6)\} + (t \to u)$$
$$= i\{g_R^2 + Cg_R^4\ln(\Lambda^2/t) - Cg_R^4\ln(\Lambda^2/\mu^2) + O(g_R^6)\} + (t \to u)$$
$$= i\{g_R^2 + Cg_R^4\ln(\mu^2/t) + O(g_R^6)\} + (t \to u) \qquad (17.6)$$
We see in the penultimate line of (17.6) that the effect of the reparameterization in
terms of the renormalized coupling $g_R$ has been to introduce a subtraction of the $O(g_R^4)$

amplitude which precisely removes the logarithmic Λ-dependence of the latter in the
original cutoff amplitude.
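The subtraction at work in (17.6) can also be seen numerically, one channel at a time. The sketch below is illustrative (function names and parameter values are assumptions; amplitudes are divided by i and inverse power corrections are dropped): it adds the counterterm (17.5) to the cutoff-dependent one-loop term, compares with the last line of (17.6), and then checks that the truncation error relative to the exact expression (17.3) scales as the next order in the renormalized coupling.

```python
import math

def counterterm_one_loop(gr2, mu, cutoff, c):
    """delta g_R^2 through order g_R^4, as in (17.5)."""
    return -c * gr2 ** 2 * math.log(cutoff ** 2 / mu ** 2)

def one_loop_with_subtraction(gr2, t, mu, cutoff, c):
    """t-channel part of the third line of (17.6): cutoff logarithm plus counterterm."""
    return gr2 + c * gr2 ** 2 * math.log(cutoff ** 2 / t) + counterterm_one_loop(gr2, mu, cutoff, c)

def one_loop_renormalized(gr2, t, mu, c):
    """t-channel part of the last line of (17.6)."""
    return gr2 + c * gr2 ** 2 * math.log(mu ** 2 / t)

def exact_renormalized(gr2, t, mu, c):
    """t-channel part of (17.3)."""
    return gr2 / (1.0 + c * gr2 * math.log(t / mu ** 2))

c, mu, t = 0.1, 1.0, 2.0   # illustrative (assumed) values

# The subtracted one-loop amplitude does not depend on the cutoff.
subtracted = [one_loop_with_subtraction(0.2, t, mu, cutoff, c) for cutoff in (1e3, 1e6, 1e9)]
reference = one_loop_renormalized(0.2, t, mu, c)

# Truncation error at two values of the renormalized coupling squared.
def truncation_error(gr2):
    return abs(exact_renormalized(gr2, t, mu, c) - one_loop_renormalized(gr2, t, mu, c))

error_ratio = truncation_error(0.2) / truncation_error(0.1)
```

Halving gR² reduces the truncation error by a factor close to eight, as expected for a remainder of order gR⁶, while the subtracted one-loop amplitude is the same for every value of the cutoff.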
Understanding the structure of such subtractions will be crucial in developing a
renormalization technology capable of exposing the cutoff sensitivity of field theory
amplitudes at all orders of perturbation theory. In particular, let us note that although
the reparameterization indicated in (17.2) has evidently succeeded in suppressing the
cutoff dependence of the elastic scattering 2-2 amplitude (up to ignorable inverse
power terms, as always), we have not demonstrated that the same reparameterization
suffices to remove the cutoff sensitivity of all the amplitudes of the theory: 2-4, 2-6,
etc., particle production amplitudes, for example. This much stronger requirement—
that a reparameterization of a finite number of couplings (and masses) in terms of
low-energy quantities can suppress the cutoff dependence of the entire set of S-matrix
amplitudes of the theory, order by order, and to all orders of perturbation theory—
is, amazingly, satisfied by a rich variety of local quantum field theories, which we
collectively refer to as “perturbatively renormalizable theories”: they are the subject
of our enquiry in this chapter. Later, after developing the appropriate technology for
perturbative renormalization theory, we shall see (in Section 17.4) that such theories
appear naturally as low-energy limits of the effective Wilsonian field theories described
in Chapter 16.
In the next section we shall examine in detail the structure of the cutoff dependence
of general multi-loop Feynman integrals appearing in the perturbative expansion of
amplitudes in a local quantum field theory. We shall see that the occurrence of
divergent integrals (and subintegrals) in such loop amplitudes is associated with cutoff-
dependent contributions which have a very simple (in fact, polynomial) momentum
dependence. This latter fact will then be exploited in the subsequent section to
demonstrate the equivalence of the set of subtractions needed to remove the leading
cutoff dependence of an arbitrary Feynman amplitude (reducing it to the inverse
power-dependence of the type seen above) to the result of a reparameterization of a
set of coupling and mass parameters appearing in the Lagrangian of the theory. The
intimate connection reparameterization ⇐⇒ subtractions, of which we have just seen
a particularly trivial example, is the essence of the proof of cutoff-insensitivity for
perturbatively renormalizable theories.

17.1 Weinberg’s power-counting theorem and the divergence structure of Feynman integrals
The sensitivity of field theory amplitudes to a short distance, or large momentum,
cutoff arises in perturbation theory from the presence of loop integrals over the four-
momenta carried by internal lines of the diagrams contributing to the amplitude in
question. The level of the sensitivity is determined by the size of the contributions
these integrals receive when some subset (or all) of the loop momenta are on the
order of (or greater than) the imposed ultraviolet cutoff. Rigorous estimates of these
contributions are possible if we work in Euclidean space, as first demonstrated by
Weinberg (Weinberg, 1960). In fact, for the remainder of this chapter we shall be
exclusively concerned with the cutoff sensitivity of the Euclidean amplitudes of field
theory. Once well-defined Euclidean amplitudes are obtained (in the infinite cutoff
limit) they can be shown (Zimmermann, 1969) to be analytically continuable to
physically sensible Minkowski amplitudes, and hence to a well-defined S-matrix.
Weinberg’s theorem, which we shall state shortly, applies to a general multi-loop
integral of the form

$$I(k_1, k_2, \dots, k_E) = \int \frac{N(k_1, k_2, \dots, k_E;\, l_1, l_2, \dots, l_L)}{D(k_1, k_2, \dots, k_E;\, l_1, l_2, \dots, l_L)}\, \frac{d^d l_1\, d^d l_2 \cdots d^d l_L}{(2\pi)^{dL}} \tag{17.7}$$

Here k1 , k2 , .., kE are the external momenta associated with the E external lines of the
graph, and l1 , l2 , .., lL the L independent loop momenta appearing within the graph.
The numerator factor N (k1 , k2 , .., kE ; l1 , l2 , .., lL ) is a multinomial of finite order in
its momentum arguments, while the denominator factor is a product of the usual
(Euclidean) Feynman propagator factors $p^2 + m_i^2$. We shall here assume that all
propagators are massive, to avoid concern over infrared divergences (which we shall
take up specifically in Chapter 19), and which are, of course, not relevant in a discussion
of the short-distance sensitivity of the theory. At present, we leave the spacetime dimension d free,
as Weinberg’s theorem applies for all (integral) dimensions.
A simple example is the one-loop graph of Fig. 17.1 (cf. also Fig. 16.2) contributing
to 2-2 scattering in φ4 theory, which we have already encountered on several occasions:

$$I(k_1, \dots, k_4) = \int \frac{1}{(l^2 + m^2)((k_1 + k_2 - l)^2 + m^2)}\, \frac{d^d l}{(2\pi)^d} \tag{17.8}$$

Fig. 17.1 A one-loop contribution to the 2-2 amplitude in $\phi^4$ theory (internal lines carry momenta l and k1 + k2 − l; external momenta k1 , k2 , k3 , k4 ).

For the purposes of the present discussion, we ignore overall numerical factors, powers
of the coupling constant, etc. For large values of the loop momentum, $l \equiv |l| \gg |k_i|, m$,
the integral scales like

$$\int^{\Lambda} \frac{1}{l^4}\, l^{d-1}\, dl = \int^{\Lambda} l^{d-5}\, dl \tag{17.9}$$

where Λ is a UV cutoff. Evidently, if d < 4 the integral is convergent at the upper end,
and we may let Λ → ∞ obtaining a finite result. If d = 4, the integral is logarithmically
divergent, so the result at finite Λ contains a logarithmic sensitivity ∼ ln Λ to the
cutoff. For (integer) dimensions d > 4, the integral has a much larger, power growth
dependence $\Lambda^{d-4}$ on the cutoff. Evidently, the quantity D ≡ d − 4, which we shall term
the superficial degree of divergence of the loop integral, indicates the demarcation point
for ultraviolet convergence of the loop integral. Note that it is obtained very simply by
counting the difference in powers of loop momenta in the numerator and denominator
of the Feynman integrand, including a factor of $l^d$ from the measure $d^d l$. For D < 0, we
say that the integral is superficially convergent: in this case, the result at finite cutoff
(much larger than the external momenta and masses) differs from the infinite cutoff
limit by inverse powers of the cutoff (for d even, at least quadratically, perhaps with
logarithmic modifications which are dominated by the power falloff), and we shall refer
to such a loop integral as “UV-finite”. For D = 0, we have a logarithmically divergent
integral, with ln Λ sensitivity to the cutoff, while for D > 0 we have power-divergent
integrals (linearly for D = +1, quadratically for D = +2, etc.) with much stronger
dependence on the UV cutoff. Weinberg’s theorem generalizes the particular case of a
UV-finite integral to an arbitrary Euclidean Feynman integral of the form (17.7), as
follows.
Theorem 17.1 (Weinberg, 1960) The general Euclidean loop integral (17.7) is UV-
finite provided the superficial degree of divergence associated with scaling any subset
of loop momenta uniformly to infinity is negative. In other words, the overall integral
is UV-finite if the superficial degree of divergence associated with integrating over any
hyperplanar subspace of the full dL-dimensional Euclidean integration space is strictly
negative.
In the event that this condition is satisfied, we are assured that the sensitivity of the
corresponding amplitude to a high-momentum cutoff Λ corresponds to the mild, and
from the point of view of renormalization theory, ignorable inverse power-dependence
on the cutoff that we have seen now on numerous occasions. We shall not attempt
to reproduce the details of Weinberg’s proof here, which depends on a technically
sophisticated (but physically not particularly enlightening) application of real analysis,
especially as the result appears completely natural, particularly once rephrased in the
language of large momentum flows, as we shall see shortly.
Before going on to more general cases, let us just note that for the simple one-
loop integral (17.8) of Fig. 17.1 one easily sees that in d = 4 dimensions, the degree
of divergence associated with a hyperplane subspace of dimension n ≤ 4 is D = n − 4
so that the only divergence comes from the region corresponding to n = 4 in which
all components of the loop four-momentum become large. This will typically be the
case for more complicated multi-loop graphs: we will need only examine hyperplanes
corresponding to all components of some subset (or possibly all) of the loop momenta
becoming large. Physically this corresponds to regions of the integration in which large
momentum “irrigates” various loops of the graph, individually or in some specified
joint fashion. If this seems all a bit vague at the moment, we beg the reader’s patience
for a few moments longer: explicit examples will soon follow.
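The one-loop estimate (17.9) is simple enough to tabulate directly. The following is a minimal sketch, not part of the original argument: it evaluates the radial integral in closed form, assuming a lower limit of 1 (an arbitrary choice that does not affect the large-Λ behavior). The function name `radial_cutoff_integral` is hypothetical and introduced only for illustration.

```python
import math


def radial_cutoff_integral(d, cutoff):
    """Closed form of the radial integral in (17.9): int_1^cutoff l**(d-5) dl.

    The lower limit 1 is an assumption made for illustration; only the
    dependence on the upper limit (the UV cutoff) matters here.
    """
    D = d - 4  # superficial degree of divergence of the one-loop integral
    if D == 0:
        return math.log(cutoff)  # logarithmic sensitivity to the cutoff
    # D > 0: power growth in the cutoff; D < 0: finite limit as cutoff grows
    return (cutoff**D - 1.0) / D


if __name__ == "__main__":
    for d in (3, 4, 5, 6):
        values = [radial_cutoff_integral(d, 10.0**n) for n in (1, 2, 3)]
        print("d =", d, "D =", d - 4, values)
```

Running the loop for increasing cutoffs should show the three regimes described above: saturation for d < 4, logarithmic growth for d = 4, and power growth for d > 4.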
In our first foray into perturbative field theory in Chapter 10, we encountered the
logarithmic divergence of (17.8) for infinite UV cutoff in four spacetime dimensions
and recognized (cf. (10.35)) this infinity as the symptom in momentum space of the
singular result of trying to multiply individually well-defined distributions at the same
point in coordinate space (i.e., trying to obtain the squared coordinate space Feynman
propagator $\Delta_F(z)^2$). We saw there (cf. the discussion following (10.36)) that the
singularity amounts in coordinate space to an additive local δ-function term $\delta^4(z)$,
or to a constant in the Fourier-transformed momentum space: indeed, subtracting the
loop integral at any fixed value of the external momenta—for example, zero—gives a
perfectly UV-finite result (cf. (10.36)). Let us revisit this result briefly from the point
of view of Weinberg’s theorem, as it serves as a useful prelude to the discussion in
the more general multi-loop case. Taking d = 4 spacetime dimensions, and subtracting
from the amplitude (17.8) its value with all external momenta set to zero, we obtain

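$$I_R(k_1, \dots, k_4) \equiv I(k_1, \dots, k_4) - I(0, \dots, 0) \tag{17.10}$$
$$= \int \left\{ \frac{1}{(l^2 + m^2)((k_1 + k_2 - l)^2 + m^2)} - \frac{1}{(l^2 + m^2)^2} \right\} \frac{d^4 l}{(2\pi)^4}$$
$$= \int \frac{2 l \cdot (k_1 + k_2) - (k_1 + k_2)^2}{(l^2 + m^2)^2 ((k_1 + k_2 - l)^2 + m^2)}\, \frac{d^4 l}{(2\pi)^4} \tag{17.11}$$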

and we see that the subtracted Feynman integral now has superficial degree of
divergence D = −1 and is therefore convergent, by Weinberg’s theorem (there is only
a single loop here, so the only region of large momentum flow corresponds to l large,
where the scaling of the subtracted integrand is as $l^{-1}$). Otherwise put, with a cutoff
Λ in place, the subtracted integral depends on Λ at the level of power-suppressed
terms of order $O(k_i^2/\Lambda^2,\, m^2/\Lambda^2)$: in the conventional language of renormalization theory, it is
UV-finite (see footnote 2). The origin of this convergence is simply that the leading behavior of the
original (“unrenormalized”) integrand (17.8) and the subtraction term at large l are
identical, so the subtraction removes the leading asymptotic behavior of the integrand
responsible for the logarithmic divergence (= logarithmic dependence of the cutoff
integrals on Λ).
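The large-l behavior of the subtracted integrand can be checked numerically. The sketch below is an illustration under stated assumptions: the mass is set to 1, and the direction `N_HAT` and total external momentum `K_TOT` (standing for k1 + k2) are hypothetical test values. Expanding the integrand of (17.10) in powers of 1/l suggests that $s^5 f(s\hat{n})$ should approach $2\,\hat{n}\cdot k$, while the average over l and −l (the symmetric integration mentioned in footnote 2) should fall off one power faster, with $s^6$ times the average approaching $4(\hat{n}\cdot k)^2 - k^2$.

```python
def dot(a, b):
    """Euclidean dot product of two 4-vectors given as sequences."""
    return sum(x * y for x, y in zip(a, b))


def subtracted_integrand(l, k, m=1.0):
    """Integrand of (17.10) without the measure: 1/(A*B) - 1/A**2.

    A = l^2 + m^2 and B = (k - l)^2 + m^2, where k stands for k1 + k2.
    """
    A = dot(l, l) + m * m
    q = [ki - li for ki, li in zip(k, l)]
    B = dot(q, q) + m * m
    return 1.0 / (A * B) - 1.0 / (A * A)


def scaled_values(n, k, s, m=1.0):
    """Return (s^5 * f(s n), s^6 * average of f at +s n and -s n)."""
    plus = subtracted_integrand([s * x for x in n], k, m)
    minus = subtracted_integrand([-s * x for x in n], k, m)
    return s**5 * plus, s**6 * 0.5 * (plus + minus)


# Hypothetical test values: a unit direction and a total external momentum.
N_HAT = (0.6, 0.8, 0.0, 0.0)
K_TOT = (1.0, 0.5, 0.0, 0.0)

if __name__ == "__main__":
    for s in (10.0, 100.0, 1000.0):
        print(s, scaled_values(N_HAT, K_TOT, s))
```

With the four powers of l from the measure included, the unaveraged integrand scales as $l^{-1}$ and the averaged one as $l^{-2}$, which is the origin of the quadratic suppression in the cutoff.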
There is an alternative, and for our future purposes extremely important, way to
understand the efficacy of this subtraction procedure in reducing the cutoff dependence
of the loop amplitude. We shall introduce the notation

$$t_\Gamma I(k_1, \dots, k_4) \equiv I(0, \dots, 0) \tag{17.12}$$

for the subtraction term in (17.10). The operation tΓ will be defined on all one-particle-
irreducible (1PI) graphs Γ with superficial degree of divergence D ≥ 0 as extracting
the terms up to order D in the Taylor expansion in the external momenta ki around

Footnote 2: The quadratic dependence arises from the fact that we can further improve the convergence of the
integral by symmetric integration: i.e., taking the average of the integrand at l and −l before integration.

zero momentum of the loop integral(s) I representing Γ. In the present case the loop
integral has D = 0, and the operation tΓ therefore simply takes the leading term of
the Taylor expansion, i.e., the unsubtracted amplitude at zero momentum. Thus, our
subtracted amplitude (17.11) amounts to

$$I_R(k_1, \dots, k_4) = (1 - t_\Gamma) I(k_1, \dots, k_4) \tag{17.13}$$

On the other hand, the right-hand side of (17.13) can be viewed as the sum of all
the Taylor terms in the expansion of I(k1 , .., k4 ) around zero momentum, linear or
higher in the momenta. But as Γ is 1PI, all of its internal lines (see footnote 3) are parts of loops, and
therefore every differentiation of an internal propagator reduces the superficial degree
of divergence of the integrand by 1. In our simple one-loop case, for example,
$$\frac{\partial}{\partial k_1^\mu}\, \frac{1}{(k_1 + k_2 - l)^2 + m^2} = -2\, \frac{(k_1 + k_2 - l)_\mu}{((k_1 + k_2 - l)^2 + m^2)^2} \tag{17.14}$$
so the differentiation has reduced the scaling of the propagator at large loop momentum
from $1/l^2$ to $1/l^3$. Accordingly, all the terms in the Taylor expansion of $I_R$ in
(17.13) have degree of divergence D = −1 or less, and are therefore UV-convergent.
This argument can clearly be generalized to 1PI graphs Γ with a superficial degree
of divergence D > 0, by defining the Taylor operator tΓ as the sum of the first D + 1
orders of the Taylor expansion in the external momenta (around zero): i.e., all terms
up to homogeneous order $k^D$ in the external momenta $k_i$ of the graph. The Taylor
operator is defined simply to be zero when applied to a superficially convergent graph
or subgraph. As what remains after tΓ I(ki ) is subtracted from I(ki ) are just the terms
in the Taylor expansion with at least D + 1 derivatives with respect to the ki , and each
such derivative lowers the superficial degree of divergence of the Feynman integral I
associated with the graph Γ (i.e., the scaling of the integrand when all loop momenta
get uniformly large) by 1, we see that the subtracted amplitude (1 − tΓ )I will again
have D < 0, just as in our simple one-loop example.
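The Taylor operator described above can be written out explicitly. The following is a sketch in a notation of our own (the scaling parameter τ is introduced here for convenience and is an assumption of this illustration): the first line is the truncated Taylor expansion around zero external momenta, and the second is the standard integral form of the Taylor remainder, which makes manifest that the subtracted amplitude involves exactly D + 1 derivatives with respect to the external momenta.

```latex
t_\Gamma I(k_1,\dots,k_E)
  = \sum_{n=0}^{D} \frac{1}{n!}
    \left.\frac{d^n}{d\tau^n}\, I(\tau k_1,\dots,\tau k_E)\right|_{\tau=0},
\qquad
(1 - t_\Gamma)\, I(k_1,\dots,k_E)
  = \frac{1}{D!}\int_0^1 d\tau\,(1-\tau)^{D}\,
    \frac{d^{D+1}}{d\tau^{D+1}}\, I(\tau k_1,\dots,\tau k_E).
```

For D = 0 the first expression reduces to I(0, .., 0), in agreement with (17.12). In the power-counting argument these operations are understood as acting on the integrand of the loop integral.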
Another critical point, the full implications of which will emerge in the next section,
concerns the structure of the subtraction terms generated by the Taylor operator
tΓ : by definition, they are polynomial in the external momentum of the subtracted
graph. This implies that they are equivalent to the terms that would be generated in
perturbation theory by a local term in the Lagrangian with as many field operators as
external lines of the graph in question (in this case, four), and with a finite number
of spacetime-derivatives applied to the fields to generate the appropriate factors of
momentum entering the graph. As we shall see in the following sections, it is exactly
this property of the subtractions effected by the zero-momentum Taylor operators
introduced here that allows us to connect the subtractions needed to remove the
dominant cutoff dependence of the amplitudes of the theory with a precise set of
reparameterizations of these amplitudes in terms of low-energy constants defined by
appropriate renormalization conditions (as in the preceding section). It cannot be
emphasized too forcefully that the entire essence of perturbative renormalization
theory is implicit in the ideas introduced in this paragraph: the reader is strongly

Footnote 3: Recall from Chapter 10 that external legs are truncated by definition in 1PI amplitudes.

encouraged to engage in a thorough mental mastication of the arguments just given,
before swallowing whole and proceeding to the next course.
Our discussion so far has been entirely in the context of the perturbative expansion
of the Euclidean space n-point amplitudes (Schwinger functions) of the field theory,
and indeed, the Weinberg power-counting theorem is explicitly formulated in this
context. One sometimes encounters the assertion that a rigorous convergence analysis
of Feynman amplitudes must necessarily be carried out in Euclidean space: only then,
having obtained UV-finite, and suitably analytic, Euclidean space amplitudes, are we
allowed to analytically continue back to the physically relevant Minkowski amplitudes.
This is in fact, as first pointed out by Zimmermann (Zimmermann, 1968), incorrect. It
is certainly true that Minkowski-based Feynman integrals based on the usually defined
momentum-space free propagator $\Delta_F(k) = 1/(k^2 - m^2 + i\epsilon)$ are only conditionally
convergent (as there is an infinite volume associated with the hyperboloid shells defined
by finite ranges of the Minkowski squared momentum, say $a < k^2 < b$), and the
power-counting theorems require absolute convergence. But if we return to the form (10.63) of
the free propagator implied by the original definition of the Minkowski space functional
integral in terms of an infinitesimally rotated time axis,
$$\Delta_F(k) = \frac{1}{k^2 - m^2 + i\epsilon(\vec{k}^2 + m^2)} = \frac{1}{k_0^2 - (\vec{k}^2 + m^2) + i\epsilon(\vec{k}^2 + m^2)} \tag{17.15}$$
absolute convergence of any Minkowski space Feynman integral (subtracted appropriately
for UV-finiteness in Euclidean space) follows via the inequality (Problem 1)
$$\left| \frac{1}{k_0^2 - (\vec{k}^2 + m^2) + i\epsilon(\vec{k}^2 + m^2)} \right| \le \sqrt{1 + \frac{4}{\epsilon^2}}\;\, \frac{1}{k_0^2 + \vec{k}^2 + m^2} \tag{17.16}$$


In fact, for ε kept positive and non-zero, all of our Minkowski space Feynman integrands
are majorized by the corresponding Euclidean ones, for which the arguments we
have been presenting, using the power-counting prescriptions of the Weinberg theorem,
are rigorously valid. The absolute convergence (and UV-finiteness) of the subtracted
Euclidean amplitudes is therefore a fortiori correct for their Minkowski versions. Of
course, proper mathematical rigor requires that we establish that the ε → 0 limit can
indeed be carried out at the end, leading to well-defined covariant Minkowski space
amplitudes. The covariance property in particular is not immediately obvious, given
the presence of the non-covariant factor $\vec{k}^2 + m^2$ in the ε term in (17.15), but, once
again, Zimmermann (Zimmermann, 1968) has done the hard work for us, and shown
explicitly that the limit does indeed exist, with the resultant amplitudes well-behaved
covariant tempered distributions. For the following, we shall return to Euclidean space,
confident that the subtraction procedures devised there to remove the ultraviolet
sensitivity of the amplitudes will be equally efficacious in Minkowski space.
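The majorization of the Minkowski propagator by its Euclidean counterpart can be spot-checked numerically. The sketch below is not a proof (that is the content of Problem 1); it assumes the bound has the form $|1/(k_0^2 - (\vec{k}^2+m^2) + i\epsilon(\vec{k}^2+m^2))| \le \sqrt{1+4/\epsilon^2}\,/(k_0^2+\vec{k}^2+m^2)$, and the sampling ranges, the mass value and all function names are hypothetical choices made for illustration.

```python
import math
import random


def minkowski_side(k0_sq, kvec_sq, m, eps):
    """|1 / (k0^2 - (kvec^2 + m^2) + i*eps*(kvec^2 + m^2))|."""
    b = kvec_sq + m * m
    return 1.0 / abs(complex(k0_sq - b, eps * b))


def euclidean_side(k0_sq, kvec_sq, m, eps):
    """sqrt(1 + 4/eps^2) / (k0^2 + kvec^2 + m^2)."""
    return math.sqrt(1.0 + 4.0 / eps**2) / (k0_sq + kvec_sq + m * m)


def bound_holds(samples=20000, seed=0, m=1.0):
    """Check the assumed bound on log-uniformly sampled momenta and eps."""
    rng = random.Random(seed)
    for _ in range(samples):
        k0_sq = 10.0 ** rng.uniform(-3, 3)
        kvec_sq = 10.0 ** rng.uniform(-3, 3)
        eps = 10.0 ** rng.uniform(-3, 0)
        lhs = minkowski_side(k0_sq, kvec_sq, m, eps)
        rhs = euclidean_side(k0_sq, kvec_sq, m, eps)
        # Small relative tolerance to absorb floating-point rounding.
        if lhs > rhs * (1.0 + 1e-12):
            return False
    return True


if __name__ == "__main__":
    print("bound holds on all samples:", bound_holds())
```

The bound is nearly saturated on the mass shell $k_0^2 = \vec{k}^2 + m^2$ for small ε, where the left-hand side equals $1/(\epsilon(\vec{k}^2+m^2))$; this is why the constant on the right must grow like 1/ε.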
The true power of Weinberg’s theorem really emerges when we go to higher
orders of perturbation theory, when we encounter multi-loop graphs with multiple
independent regions of integration giving divergent contributions to the overall ampli-
tude. Consider the two-loop diagram illustrated in Fig. 17.2 contributing to the 2-2
amplitude in φ4 theory (again, in four spacetime dimensions). The corresponding loop
integral is (again, ignoring overall numerical factors, couplings, etc.)
$$I(k_i) = \int \frac{1}{(l_1^2 + m^2)((k_1 + k_2 - l_1)^2 + m^2)}\, \frac{d^4 l_1}{(2\pi)^4} \cdot \int \frac{1}{(l_2^2 + m^2)((l_1 - l_2 - k_3)^2 + m^2)}\, \frac{d^4 l_2}{(2\pi)^4} \tag{17.17}$$

Fig. 17.2 A two-loop contribution to the 2-2 amplitude in $\phi^4$ theory (internal lines carry momenta l1 , k1 + k2 − l1 , l2 and l1 − l2 − k3 ).

The superficial degree of divergence of the graph as a whole, obtained by taking
both loop momenta l1 , l2 large and counting total powers of these momenta in the
numerator (namely the eight powers arising from the measure $d^4 l_1\, d^4 l_2$) and in the
denominator (four powers from each pair of propagators), is evidently
D = 8 − 4 − 4 = 0, so there is definitely an overall logarithmic divergence. But there
is also a divergence coming from the region where the “outer” loop momentum l1 is
kept fixed and the “inner” loop momentum l2 becomes large: along this hyperplane,
the degree of divergence (counting just powers of l2 ) is also D = 0. A divergence of this
type, arising from large momentum flow through a part of, but not the whole, diagram,
we shall term a “subdivergence”. Note that there is also a region of the integration
corresponding to a large momentum flow where l1 and l2 get large with l1 − l2 fixed,
corresponding to large momentum flowing through the lines with momentum l1 , l2 and
k1 + k2 − l1 in Fig. 17.2, but this region has degree of divergence D = 4 − 2 − 2 − 2 =
−2 and therefore gives a convergent contribution to the full graph. In order to devise an
appropriate set of subtractions capable of removing the leading logarithmic divergences
in this situation, it will be convenient to introduce a notation for identifying various
subgraphs γ, γ1 , .. etc., of the full graph Γ associated with possible subdivergences of
the full loop integral I. We shall identify a graph or subgraph by indicating within
square brackets the momenta of the lines contained therein, thus

Γ = [l1 , k1 + k2 − l1 , l2 , l1 − l2 − k3 ]
γ = [l2 , l1 − l2 − k3 ] (17.18)

We have indicated here only the renormalization parts of the full graph Γ: namely,
those connected 1PI subgraphs with superficial degree of divergence greater than or
equal to zero (including possibly the entire graph, as here). Each possible subgraph
γ can be assigned a degree of divergence d(γ) corresponding to the power-counting
associated with large momentum flow through all lines of that subgraph. In the present
case, the only renormalization parts are the full graph Γ and the subgraph γ, with in
both cases d(Γ) = d(γ) = 0. Taylor subtraction operators tΓ and tγ can be similarly
associated with each renormalization part, in an obvious extension of the procedure
followed in our previous one-loop example. For the subgraph γ, the external momenta
now include, in addition to the ki , the loop momentum l1 associated with the lower
pair of internal lines in Fig. 17.2 (more generally, the external momenta of a given
renormalization part consist simply of those momenta which remain fixed when the
large momentum flow giving rise to the degree of divergence of that subgraph is
invoked). As in our present case both d(Γ) and d(γ) are zero, the Taylor operators
amount to either setting k1 = k2 = k3 = k4 = 0 (tΓ ) or k1 = k2 = k3 = k4 = l1 = 0
(tγ ). It seems entirely plausible, and the reader is encouraged to verify (Problem 2)
that the following subtracted integral, when examined in the context of Weinberg’s
theorem, is UV-finite:

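$$I_R(k_i) = (1 - t_\Gamma)\bar{I} = \bar{I}(k_i) - \bar{I}(0)$$
$$\bar{I} \equiv \int \frac{1}{(l_1^2 + m^2)((k_1 + k_2 - l_1)^2 + m^2)}\, \frac{d^4 l_1}{(2\pi)^4} \cdot (1 - t_\gamma) \int \frac{1}{(l_2^2 + m^2)((l_1 - l_2 - k_3)^2 + m^2)}\, \frac{d^4 l_2}{(2\pi)^4}$$
$$= \int \frac{2 l_2 \cdot (l_1 - k_3) - (l_1 - k_3)^2}{(l_1^2 + m^2)((k_1 + k_2 - l_1)^2 + m^2)(l_2^2 + m^2)^2((l_1 - l_2 - k_3)^2 + m^2)}\, \frac{d^4 l_1\, d^4 l_2}{(2\pi)^8} \tag{17.19}$$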

Indeed, the subtraction effected by tγ ensures that the degree of divergence associated
with large momentum flow through the subgraph γ is reduced to –1 (just as in our
first one-loop example), while the overall subtraction effected by tΓ reduces the degree
of divergence of the graph as a whole (when both l1 and l2 become large) to –1. The
particular combination of four loop integrals implied by the subtractions of (17.19) is
therefore UV-finite by Weinberg’s theorem, or equivalently, cutoff insensitive at the
level of inverse powers of the cutoff. We have introduced a notation Ī to indicate a
subtracted amplitude containing Taylor subtraction operators for all proper subgraphs
of the graph in question, but not the overall (“top level”) subtraction needed if the
full graph is superficially divergent.
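The power counting used above for the two-loop graph of Fig. 17.2 can be mechanized in a few lines. This is a minimal sketch assuming scalar propagators only (each contributing −2 when it carries large momentum); the function name and the parameter `numerator_power` are hypothetical, introduced for illustration.

```python
def degree_of_divergence(dim, large_loops, large_propagators, numerator_power=0):
    """Superficial degree of divergence for one region of large momentum flow.

    dim               -- spacetime dimension (powers from each loop measure)
    large_loops       -- number of independent loop momenta that become large
    large_propagators -- number of scalar propagators carrying large momentum
    numerator_power   -- powers of large momentum in the numerator, if any
    """
    return dim * large_loops + numerator_power - 2 * large_propagators


# Regions of the two-loop integral (17.17) in four dimensions.
whole_graph = degree_of_divergence(4, 2, 4)     # l1 and l2 both large
inner_loop = degree_of_divergence(4, 1, 2)      # l2 large, l1 fixed
fixed_difference = degree_of_divergence(4, 1, 3)  # l1, l2 large, l1 - l2 fixed

# Inner loop after the t_gamma subtraction, read off from the last line of
# (17.19): one numerator power of l2 and three l2-dependent propagator factors.
inner_subtracted = degree_of_divergence(4, 1, 3, numerator_power=1)

if __name__ == "__main__":
    print(whole_graph, inner_loop, fixed_difference, inner_subtracted)
```

The same function applies to any region in which a chosen subset of loop momenta is scaled to infinity, provided the large-momentum lines are identified by hand as in the text.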
At this point it will be convenient to introduce some graphical notations which will
serve us in good stead as we attempt to generalize the insights gleaned from these first
simple examples to arbitrary graphs. First note that the inner subtraction effected by
the tγ operator in (17.19) actually corresponds to replacing the integral over l2 by a
pure number (as tγ sets l1 and k3 to zero), leaving a graph with only the lines [l1 , k1 +
k2 − l1 ], which amounts to a graph in which the entire subgraph represented by γ
has been shrunk to a single vertex (in this case, without any momentum dependence,
as the Taylor operator is of degree zero). We shall denote such a graph by Γ/γ,
and more generally, graphs Γ in which some set of disjoint renormalization parts
γ1 , . . . γn are shrunk to a point as a consequence of having been replaced by their Taylor
operator evaluations as Γ/{γ1 , .., γn }. Also, the integrands associated with graphs (or
subgraphs, or shrunk graphs) will be indicated by an obvious superscript notation.
We may then abbreviate the expression (17.19) as follows:

$$I_R^\Gamma(k_i) = (1 - t_\Gamma)\bar{I}^\Gamma, \qquad \bar{I}^\Gamma = I^\Gamma + I^{\Gamma/\gamma}(-t_\gamma)\bar{I}^\gamma \tag{17.20}$$

Note that the integrand $\bar{I}^\gamma$ to which the inner subtraction $t_\gamma$ is applied is just the
original subgraph $I^\gamma$ consisting of the lines [l2 , l1 − l2 − k3 ]: the overbar in this case
is superfluous, as this graph contains no proper superficially divergent subgraphs
which need be subtracted. The formula (17.20) is our first example of a general
recursion formula (originally due to Bogoliubov), the explicit solution of which (due
to Zimmermann) will describe in a compact way the exact divergent structure of
arbitrary Feynman integrals.
The two-loop example just discussed possesses a simplifying feature which may
already have occurred to the reader: the divergent regions of the graph are nested—
in other words, the inner subdivergence arising from the region of large momentum
flow through the subgraph γ corresponds to a set of lines which are strictly contained
in the set of lines that carry the large momentum leading to the overall divergence
of the full graph Γ. This makes it easy to guess the proper set of sequential subtractions
$(1 - t_\Gamma) \cdots (1 - t_\gamma) \cdots$ which render the graph UV-finite: one simply begins
by subtracting off the innermost divergence and then proceeds outwards, at each
stage performing the Taylor subtraction if a divergent subgraph is encountered.
The generalization of this procedure to a large graph containing non-intersecting
sets of nested divergences is also clear: the Taylor subtractions can be performed
independently on the separate non-intersecting sets, leading to a subtracted integrand
which possesses a finite Λ → ∞ limit by Weinberg’s theorem.
We shall use the terminology “non-overlapping” to describe the situations summa-
rized here: two renormalization parts γ1 , γ2 of a diagram (i.e., superficially divergent
1PI subgraphs) are said to be non-overlapping if either (a) one is entirely contained
in the other (the “nested” case), or (b) they are completely non-intersecting (i.e., no
lines in common). Our analysis of the divergence structure of Feynman graphs would
be essentially concluded if we had only the non-overlapping case to consider.
There are, however, cases of divergent subgraphs that intersect partially, giving
rise to the famous problem of overlapping divergences. Evidently, we must consider
at least a two-loop diagram in order to find two distinct—but not simply nested—
subdivergences. We shall illustrate the problem in a self-interacting scalar theory
which we have not heretofore studied, $\frac{\lambda}{3!}\phi^3$ theory in six spacetime dimensions, which
has the advantage of being topologically similar to gauge theory, inasmuch as the basic
interaction involves a trilinear coupling (see footnote 4). To fourth order in λ, one encounters the
self-energy contribution to the scalar propagator indicated by the graph in Fig. 17.3,
corresponding to the Feynman integral (the external legs are truncated):

$$I^\Gamma(p) = \int \frac{1}{(l_1^2 + m^2)((p - l_1)^2 + m^2)}\, \frac{d^6 l_1}{(2\pi)^6} \cdot \int \frac{1}{(l_2^2 + m^2)((l_1 - l_2)^2 + m^2)((p - l_2)^2 + m^2)}\, \frac{d^6 l_2}{(2\pi)^6} \tag{17.21}$$

Fig. 17.3 A two-loop contribution to the scalar propagator in $\phi^3$ theory (external momentum p; internal lines carry momenta l1 , p − l1 , l2 , l1 − l2 and p − l2 ).

Footnote 4: For a similar analysis of an overlapping divergence in $\phi^4$ theory, see Problem 4. Note that the $\phi^3$ theory
has a spectrum unbounded below (cf. Section 8.4): all finite-energy states are unstable. Nevertheless, the (continued below)

The overall divergence degree of this graph is quadratic, D = −2 − 2 + 6 − 2 −
2 − 2 + 6 = +2, while the subdivergences corresponding to l1 large (with l2 fixed)
or l2 large (with l1 fixed) have degree of divergence D = −2 − 2 − 2 + 6 = 0: i.e.,
logarithmic. Again, we introduce an abbreviation for the various renormalization parts
of the diagram,
Γ = [l1 , p − l1 , l2 , l1 − l2 , p − l2 ],
γ1 = [l1 , p − l1 , l1 − l2 ],
γ2 = [l2 , l1 − l2 , p − l2 ] (17.22)
with d(Γ) = 2, d(γ1 ) = 0, d(γ2 ) = 0. It is apparent that in this case we are faced with
two divergent subgraphs (γ1 and γ2 ) which have a non-trivial intersection (the line
carrying momentum l1 − l2 ) but are not simply nested as in our previous two-loop
example of Fig. 17.2. The correct subtraction procedure is hardly obvious in this case,
especially as the subtraction terms obtained by applying the Taylor operators tγ1 and
tγ2 depend on the order in which these operations are applied, as the reader may easily
verify. It turns out that the correct way to “slice” the integrand in order to extract
correctly the dominant contributions in all divergent subintegrations of the two-loop
integral (17.21) involves subtractions only on non-overlapping renormalization parts.
Specifically, we define the fully subtracted two-loop self-energy as
$$I_R^\Gamma(p) = (1 - t_\Gamma)\bar{I}^\Gamma(p) \tag{17.23}$$
$$\bar{I}^\Gamma(p) = I^\Gamma + I^{\Gamma/\gamma_1}(-t_{\gamma_1})\bar{I}^{\gamma_1} + I^{\Gamma/\gamma_2}(-t_{\gamma_2})\bar{I}^{\gamma_2} \tag{17.24}$$
$$I_R^\Gamma(p) = I^\Gamma + I^{\Gamma/\gamma_1}(-t_{\gamma_1})\bar{I}^{\gamma_1} + I^{\Gamma/\gamma_2}(-t_{\gamma_2})\bar{I}^{\gamma_2} - t_\Gamma I^\Gamma - t_\Gamma\big(I^{\Gamma/\gamma_1}(-t_{\gamma_1})\bar{I}^{\gamma_1}\big) - t_\Gamma\big(I^{\Gamma/\gamma_2}(-t_{\gamma_2})\bar{I}^{\gamma_2}\big)$$
$$\equiv I_a(p) + I_b(p) + I_c(p) + I_d(p) + I_e(p) + I_f(p) \tag{17.25}$$

Footnote 4 (continued): renormalized perturbation theory of this model is perfectly sensible, to any finite order, and the graph
topology and divergence structure are similar to those appearing in gauge theories, making it a very useful
laboratory for illustrating important issues of renormalization, unclouded by complications introduced by
higher spin fields.

Once again, as in (17.20), the overbars on $\bar{I}^{\gamma_1}$, $\bar{I}^{\gamma_2}$ are superfluous, as these subgraphs
do not contain any further subdivergences to be subtracted. The required subtractions
are indicated in Fig. 17.4, where we have introduced a simple notational device—a
“o” attached to the external legs of a divergent 1PI graph or subgraph γ, to indicate
the presence of the Taylor subtraction operator tγ . A little thought shows that the
subtractions indicated here are precisely those needed to render the fully subtracted
amplitude IR (p) UV-finite by Weinberg’s theorem. One needs merely to check that
the sum of terms indicated in (17.25) has negative degree of divergence in any of the
possible subintegration hyperplanes of the integral in (17.21). For example, consider
the region in which large momentum flows through γ1 : i.e., large l1 with l2 fixed. The
subcombination Ia (p) + Ib (p) has degree of divergence D = −1 as the logarithmic
divergence for large l1 (l2 fixed) is subtracted in exactly the same way as in our
previous examples, as does the combination Id (p) + Ie (p) (which is just minus the first
three terms in the Taylor expansion of the already convergent Ia (p) + Ib (p) around
p = 0: recall that the full graph has degree of divergence d(Γ) = 2). The combination
Ic (p) + If (p) gives

$$I_c(p) + I_f(p) = -\int \frac{1}{l_1^2 + m^2} \left\{ \frac{1}{(p - l_1)^2 + m^2} - \frac{1}{l_1^2 + m^2} - \frac{2 p \cdot l_1}{(l_1^2 + m^2)^2} - \frac{4 (p \cdot l_1)^2 - p^2 (l_1^2 + m^2)}{(l_1^2 + m^2)^3} \right\} \frac{d^6 l_1}{(2\pi)^6} \cdot \int \frac{1}{(l_2^2 + m^2)^3}\, \frac{d^6 l_2}{(2\pi)^6} \tag{17.26}$$

Fig. 17.4 Subtracted two-loop self-energy in $\phi^3$ theory (six diagrams, panels (a) to (f); an “o” attached to the external legs of a graph or subgraph marks a Taylor subtraction).

The subtracted propagator $1/((p - l_1)^2 + m^2) - 1/(l_1^2 + m^2) - \ldots$ appearing between
the curly braces is easily seen to have the asymptotic behavior $\sim 1/l_1^5$ for large l1
(it consists of the terms in the Taylor expansion in p of the unsubtracted propagator
with three or more derivatives), so the degree of divergence of the l1 subintegration is
reduced to −2 − 5 + 6 = −1, as desired. Note that we are specifically examining the
region of l1 large with l2 fixed, so the logarithmic divergence in the final l2 integral is of
no consequence here. The efficacy of the subtractions in the subregion corresponding
to large momentum flow through γ2 is also obvious, by the symmetry of the graph.
There remains the region of large momentum flow through the entire diagram (both l1
and l2 large). In this case it is easy to see that the dominant asymptotic contributions
cancel pairwise between Ia and Id , Ib and Ie , and Ic and If .
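The asymptotic behavior of the subtracted propagator quoted above follows from expanding the propagator in powers of the external momentum. As a sketch, writing $A \equiv l_1^2 + m^2$ (an abbreviation introduced here for illustration) and expanding $1/(A - 2p\cdot l_1 + p^2)$ as a geometric series, one finds through third order:

```latex
\frac{1}{(p - l_1)^2 + m^2}
  = \frac{1}{A}
  + \frac{2\,p\cdot l_1}{A^2}
  + \frac{4\,(p\cdot l_1)^2 - p^2 A}{A^3}
  + \frac{8\,(p\cdot l_1)^3 - 4\,(p\cdot l_1)\,p^2 A}{A^4}
  + O(p^4).
```

The first three terms (orders zero, one and two in p) are those removed by the Taylor operator with d(Γ) = 2. The leading surviving term is of third order in p, and both of its pieces scale as $l_1^3/l_1^8 = 1/l_1^5$ at large $l_1$.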
The generalization of the subtraction procedure described here to arbitrary
Feynman integrals was first obtained as a recursive formula by Bogoliubov and
Parasiuk (Bogoliubov and Parasiuk, 1957), with technical improvements by Hepp
(Hepp, 1966), hence the appellation “BPH scheme”. It is an obvious generalization
of (17.24, 17.25):

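$$I_R^\Gamma = (1 - t_\Gamma)\bar{I}^\Gamma \tag{17.27}$$
$$\bar{I}^\Gamma = I^\Gamma + \sum_{\substack{\gamma_1, \gamma_2, \dots, \gamma_n \\ \gamma_i \cap \gamma_j = \emptyset,\ i \neq j}} I^{\Gamma/\{\gamma_1, \dots, \gamma_n\}} \prod_{i=1}^{n} (-t_{\gamma_i})\bar{I}^{\gamma_i} \tag{17.28}$$
where for conciseness we no longer indicate the dependence on external momenta.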
The sum in (17.28) is over all sets of disjoint renormalization parts γi strictly
contained in Γ. It is a recursive formula inasmuch as the integrand Ī γi associated
with each such renormalization part has all of its subdivergences already subtracted
(by recursive use of the same formula, until renormalization parts are reached with
no further subdivergences). If the graph as a whole is superficially convergent, the
overall subtraction performed by the tΓ operator in (17.27) is absent. If the full graph
is divergent, then this tΓ performs the subtraction needed to remove the divergent
contribution from the asymptotic region in which large momentum irrigates the entire
graph.
The proof of the Bogoliubov–Parasiuk formula is by induction on the number of
loops. It is manifestly correct at one loop: either the graph is already convergent, or
the (1 − tΓ ) subtraction renders it finite; there are no subdivergences in a one-loop
graph, so Ī Γ (p) = I Γ in (17.27). The subtractions given in (17.28) are then shown
to be efficient in subtracting off the divergent contributions from subintegrations in
which large momentum circulates between distinct renormalization parts γi , as we saw
explicitly in our two-loop example of Figs. 17.3 and 17.4. A careful proof of (17.28)
will not be given here (see, for example, (Hepp, 1966), (Van der Kolk and de Kerf,
1975)). From the point of view of physical intuition it is far more useful to examine
its operation in explicit examples, where the subtractions are readily seen to induce
cancellations between appropriate subclasses of diagrams for each distinct type of
large momentum flow. We therefore encourage the reader to repeat the arguments
given above for the integrals of (17.17, 17.21) for the additional examples given in the
Problems at the end of this chapter.
The recursion formula (17.28) was explicitly solved by Zimmermann (Zimmermann,
1969), giving a transparent, and exceedingly simple, formula which generates
automatically the complete set of subtractions needed to make an arbitrary Feynman
diagram UV-finite. First, we define a forest U of a graph Γ as any set (including
possibly the empty set) of non-overlapping renormalization parts of Γ. The set of all
forests of Γ is denoted F(Γ). The origin of the name “forest” is apparent with a glance
at Fig. 17.5. Here we show a forest consisting of N renormalization parts, γ1 , γ2 , .., γN
which are all proper subgraphs of some full graph Γ. Strictly speaking, forests
containing only renormalization parts that are subgraphs but not (in the event that
the full graph is superficially divergent) the full graph Γ are called “normal forests”—
the set of which we shall denote N (Γ)—while forests containing Γ (if divergent) are
“full forests”, comprising the set F(Γ). Fig. 17.5 displays the structure of a normal
forest, with the nodes representing renormalization parts and the lines connecting
two nodes indicating that the upper renormalization part is strictly contained in the
lower one. Note that a normal forest has an extremal set of “biggest” renormalization
parts (in Fig. 17.5, these are γ1 , γ2 , ..γn ), none of which are contained in any other
renormalization part. Typically, we denote forests of a graph by capital Roman letters,
U, V, etc. Let I Γ represent, as usual, the unsubtracted Feynman integrand for the
graph Γ. Zimmermann’s solution for the subtractions needed to generate directly Ī Γ
in (17.28) is, for an arbitrary renormalization part γ (which may include Γ itself),

$$\bar{I}^{\gamma} = \sum_{U\in\mathcal{N}(\gamma)}\ \prod_{\gamma_r\in U} (-t_{\gamma_r})\, I^{\gamma} \tag{17.29}$$

We shall show, by induction on the number of loops, that (17.29) solves (17.28).
In the event that U is the empty forest, the product of (negative) Taylor operators is
interpreted as unity: this term simply reproduces the original, unsubtracted integrand.
All other terms involve at least one renormalization part and, therefore, a subtraction
of the original integrand. The formula is evidently correct at one loop, as there are
no possible subdivergences, the normal forests of γ are empty, and Ī γ = I γ . We now
proceed by induction, and assume that the Zimmermann formula has been established
up to the (lower) number of loops contained in the subgraphs γ1 , .., γn in (17.28). We
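may therefore insert (17.29) in (17.28) for the subgraphs γi , i = 1, 2, .., n, obtaining

$$\bar{I}^{\Gamma} = I^{\Gamma} + \sum_{\substack{\gamma_1,\gamma_2,\ldots,\gamma_n \\ \gamma_i\cap\gamma_j=\emptyset,\ i\neq j}} I^{\Gamma/\{\gamma_1,\ldots,\gamma_n\}} \prod_{i=1}^{n} (-t_{\gamma_i}) \sum_{U_i\in\mathcal{N}(\gamma_i)}\ \prod_{\gamma_r\in U_i} (-t_{\gamma_r})\, I^{\gamma_i} \tag{17.30}$$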

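Fig. 17.5 Structure of renormalization parts in a normal Zimmermann “forest”. [Tree diagram: the extremal parts γ1, γ2, .., γn in the bottom row, with the parts γn+1, .., γN that they contain stacked above them; not reproduced.]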


With a little thought, the reader will easily see that the sum in the second term on
the right-hand side of (17.30) simply assembles all the non-empty normal forests of
the graph Γ, sorted by their extremal elements γ1 , γ2 , ..γn , thereby reproducing the
forest formula at the next higher level of induction. In the event that the full graph
Γ is itself divergent, then the sum over all forests F(Γ) can be divided into pairs of
normal forests U and full forests [Γ, U ] with an additional subtraction −tΓ for each
full forest, yielding the Zimmermann forest formula for the fully subtracted amplitude
given recursively in (17.27):

$$I_R^{\Gamma} = \sum_{U\in\mathcal{F}(\Gamma)}\ \prod_{\gamma_r\in U} (-t_{\gamma_r})\, I^{\Gamma} \tag{17.31}$$

Note that the nested nature of the subtractions in the forest formula (due to the
absence of overlapping renormalization parts) allows us to write the complete Feynman
integrand I Γ (broken up into a product of reduced integrands I Γ/{γ1 ,..,γn } and
individual renormalization parts I γi in (17.30)) to the extreme right in (17.31): the
Taylor operators in each of the “trees” in Fig. 17.5 then act sequentially, starting at
the top (the smallest divergent subdiagram) and working downwards. The trickiest
part of a proper proof⁵ of the convergence of $I_R^{\Gamma}$ in (17.31) lies in the demonstration
that there exists an “admissible” routing of the loop momenta in the subdiagrams such
that the result of the Taylor operations is unambiguous (as we necessarily encounter
the situation in higher orders that the external momenta of one subgraph become the
internal momenta of another).
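The combinatorial claim behind the induction step, that unrolling the recursion (17.28) produces every normal forest exactly once, sorted by its extremal elements, can be checked by brute force. The sketch below is only an illustration under stated assumptions: renormalization parts are modelled as sets of internal-line labels, the particular collection of parts is hypothetical (it is not read off from any actual Feynman graph), and the Taylor operators are tracked purely as labels, so nothing about convergence is tested.

```python
from collections import Counter
from itertools import combinations

def non_overlapping(a, b):
    """Two parts may share a forest if they are nested or disjoint."""
    return a <= b or b <= a or a.isdisjoint(b)

def normal_forests(gamma, parts):
    """All sets of pairwise non-overlapping parts strictly inside gamma."""
    inner = [p for p in parts if p < gamma]
    found = set()
    for k in range(len(inner) + 1):
        for subset in combinations(inner, k):
            if all(non_overlapping(a, b) for a, b in combinations(subset, 2)):
                found.add(frozenset(subset))
    return found

def recursion_terms(gamma, parts):
    """Forests generated by unrolling (17.28), counted with multiplicity."""
    inner = [p for p in parts if p < gamma]
    terms = Counter({frozenset(): 1})  # the unsubtracted integrand
    for k in range(1, len(inner) + 1):
        for subset in combinations(inner, k):
            # (17.28) sums over sets of mutually disjoint parts only.
            if not all(a.isdisjoint(b) for a, b in combinations(subset, 2)):
                continue
            partial = Counter({frozenset(subset): 1})
            for g in subset:
                sub_terms = recursion_terms(g, parts)  # expands I-bar of g
                combined = Counter()
                for forest, mult in partial.items():
                    for sub, sub_mult in sub_terms.items():
                        combined[forest | sub] += mult * sub_mult
                partial = combined
            terms.update(partial)
    return terms

# Hypothetical parts: a and b sit inside c, d overlaps c, e is disjoint from all.
a, b = frozenset({1, 2}), frozenset({3, 4})
c, d, e = frozenset({1, 2, 3, 4, 5}), frozenset({5, 6}), frozenset({7, 8})
Gamma = frozenset(range(1, 9))
PARTS = [a, b, c, d, e, Gamma]

terms = recursion_terms(Gamma, PARTS)
print(len(terms), len(normal_forests(Gamma, PARTS)))
print(all(mult == 1 for mult in terms.values()))
```

Each forest carries the sign (−1) raised to the number of parts it contains, so agreement of the two sets of forests, each with multiplicity one, is what the step from (17.30) to the forest formula requires.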
As usual, nothing serves better to clarify how this works than an explicit exam-
ple. In the case of the graph of Fig. 17.3, in addition to the empty forest, F(Γ)
evidently contains the forests [γ1 ], [γ2 ], [Γ], [Γ, γ1 ], [Γ, γ2 ]—but not a forest containing
both overlapping subdiagrams γ1 and γ2 —giving rise to exactly the six terms indicated
in (17.25) and displayed in Fig. 17.4, the convergence of which has already been
explained. The reader is strongly encouraged to verify the efficacy of the Zimmermann
forest formula with additional examples such as the graph of Figure 17.2, where one
may also verify explicitly the equivalence of the Bogoliubov–Parasiuk recursion and
Zimmermann formulas.
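The counting of forests for the overlapping two-loop graph can be reproduced mechanically. In the sketch below, which is an illustration rather than anything taken from the text, the five internal lines are given hypothetical labels 1 to 5, with the two vertex subgraphs γ1 and γ2 sharing line 3; the helper names are likewise assumptions. Each forest is printed with the sign (−1) raised to the number of Taylor operators it contains.

```python
from itertools import combinations

# Hypothetical line labelling: two triangles sharing the central line 3.
PARTS = {
    "gamma1": frozenset({1, 2, 3}),
    "gamma2": frozenset({3, 4, 5}),
    "Gamma": frozenset({1, 2, 3, 4, 5}),
}

def non_overlapping(a, b):
    """Parts may share a forest only if nested or disjoint."""
    return a <= b or b <= a or a.isdisjoint(b)

def forests(parts):
    """All sets of pairwise non-overlapping parts, the empty set included."""
    names = sorted(parts)
    out = []
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            if all(non_overlapping(parts[x], parts[y])
                   for x, y in combinations(subset, 2)):
                out.append(subset)
    return out

def sign(forest):
    """Overall sign of the term: one factor of -1 per Taylor operator."""
    return (-1) ** len(forest)

for forest in forests(PARTS):
    print(sign(forest), list(forest))
```

The pair (γ1, γ2) is rejected because the two parts share a line without being nested, which leaves one unsubtracted term, three singly subtracted terms, and two doubly subtracted terms.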
The zero-momentum subtraction method described so far in this section (referred
to commonly as the “BPHZ” or Bogoliubov–Parasiuk–Hepp–Zimmermann scheme) is
by no means the only way to arrive at a fully subtracted amplitude which is UV-finite
according to the requirements of Weinberg’s theorem. Clearly, the addition of finite
(cutoff independent) constant terms to the amounts prescribed by our Taylor operators
tγ for each renormalization part would produce an equally UV-finite fully subtracted
amplitude, differing from the BPHZ one by a finite amount. Indeed, soon after the

5 The technical niceties involved in a proper choice of momentum routing are dealt with in full detail in
(Zimmermann, 1969). Alternatively, one may employ an “α-parametric” representation, replacing Feynman
propagators $1/(p^2 + m^2) \to \int_0^\infty e^{-\alpha(p^2 + m^2)}\,d\alpha$, performing the loop momentum integrals explicitly, and
formulating the subtractions directly for the resultant integrands in the multi-dimensional “α” space, in
which case the ambiguities of momentum ordering can be avoided: see (Bergère and Lam, 1976).
introduction of the dimensional regularization scheme by ’t Hooft and Veltman (cf.
Section 16.5), it was demonstrated (Speer, 1973) that one may replace the $t_\gamma$ Taylor
subtraction operators, which act on the Feynman integrands of divergent subdiagrams,
by a dimensional pole part operation $t_\gamma^{\rm DR} \equiv \mathrm{PP}_\gamma$ (cf. Section 16.5) which removes
the pure pole singularities in the Laurent expansion of the dimensionally continued
Feynman integral of the subgraph γ in the variable $\epsilon = d_{\rm ph} - d$, where d is the
dimensional continuation variable and dph the physical spacetime dimension. As the
poles which would produce a divergent result when the limit d → dph is performed
at the end have been removed, this procedure automatically produces a result which
contains no explicit UV cutoff Λ but, as we saw in Section 16.5, necessarily depends on
a renormalization scale μ required to consistently define the dimensionally continued
amplitudes.
In this dimensional subtraction approach, all the formulas we have written so
far (in particular, the Bogoliubov–Parasiuk recursion formula and the Zimmermann
forest formula) remain in force, with the simple replacement tγ → PP γ . As we shall
see in the next section, the demonstration that the subtractions appearing in these
formulas are precisely equivalent to a reparameterization of the amplitudes of the
theory in terms of a well-defined finite set of low-energy parameters (themselves specific
non-linear functions of the coupling and mass parameters appearing as coefficients in
the Lagrangian of the theory) depends crucially on the fact that the counterterm
tγ Ī γ associated with a renormalization part γ is a polynomial of degree d(γ) in the
external momenta of the subgraph γ, where d(γ) is the (by definition, non-negative)
degree of divergence of γ. It takes exactly the same form, in other words, as the
insertion of a local operator O(γ) at a single vertex replacing the entire subgraph γ,
where O(γ) contains as many field operators as there are external lines of γ, with (a
finite number of) spacetime-derivatives chosen to reproduce the powers of external
momenta generated by the Taylor operator tγ . While this is obvious in the BPHZ
zero-momentum subtraction scheme, it is far from so when we employ pole subtractions
to remove the large momentum divergences of a general Feynman integral, as the pole
terms are extracted after the loop integrations are performed, and may contain a
priori a complicated (and in particular, non-polynomial) dependence on the external
momenta of the subgraph.
To see how the dimensional subtraction method avoids this potential trap, we first
note that the subdivergences of γ are already subtracted by internal counterterms in
Ī γ . To show that the pole part PP γ Ī γ is indeed a polynomial in the external momenta
of γ, and therefore equivalent to a local operator insertion, we need only show that
differentiating Ī γ d(γ) + 1 times with respect to the external momenta of γ results in
a UV-convergent integral, with vanishing pole part as d → dph —thereby establishing
that the pole part is a polynomial of degree d(γ) in these momenta, and reproducible
by a local operator insertion. The arguments needed to establish this via Weinberg’s
theorem are not particularly difficult: the essential point is that the differentiation of
Ī γ , which the reader will recall already contains counterterms needed to remove all the
proper subdivergences of γ, lowers the total degree of divergence of both the overall
graph γ and its (already subtracted) subdivergences to the point where we have a
completely UV-finite integral, with no pole in ε. As usual, the only situation where
this is potentially a subtle issue occurs when we have overlapping divergences, so we
encourage the reader to verify the asserted convergence for the third derivative (with
respect to p) of the amplitude Ī Γ (p) in (17.28) for the superficially quadratically
divergent graph of Fig. 17.3 (see Problem 2). This convergence implies that the
final overall subtraction operator tΓ of (17.23), when replaced by the dimensional
pole part operation $\mathrm{PP}_\Gamma$, indeed produces a quadratic (even) polynomial $A + Bp^2$ in
the momentum p, corresponding to the contribution which would be obtained by an
insertion of the local operator $A\phi^2 + B(\partial_\mu\phi)^2$ at the vertex obtained by contracting
the two-loop graph to a point.
Although the individual terms in the Zimmermann forest formula (17.31) corre-
spond to UV-divergent integrals, and must therefore be regulated in some fashion if we
wish to examine them individually, the combined sum of all the terms, by construction,
produces an integrand which yields an absolutely convergent multi-loop integral, even
in the absence of a cutoff (or any other type of regularization: e.g., dimensional). We
shall shortly see that the “BPHZ-renormalized” amplitudes provided by the forest
formula exactly correspond to the perturbative expansion of the amplitudes of a
Lagrangian field theory in powers of a suitably reparameterized coupling constant,
so the reader may be wondering why we need to introduce a regularization scheme in
the first place.
The answer is twofold. First, as emphasized repeatedly in the preceding chapter, a
physical regularization of the amplitudes at high momentum is present in any event,
whether we wish it or not, due to the inevitable breakdown of flat-space Minkowski field
theory once quantum gravity effects become appreciable, or simply because the field
theory itself is only a low-energy effective theory to be supplanted at higher energies
by a more comprehensive microscopic theory (for example, in the way in which the
Standard Model may be replaced at short distance scales by a Grand Unified Theory
with a larger local gauge group containing the gauge symmetries of the Standard
Model). We must therefore always bear in mind that the finite results yielded by
(17.31) are approximations to the actual physical amplitudes, with corrections of order
inverse powers of the ratio of the masses and momenta of the particles described by
the theory to the ultraviolet scale at which our low-energy theory, for whatever reason,
begins to break down.
Secondly, for purely practical reasons, the integrands obtained by combining all
the terms in the forest formula into a single integrand of the form (17.7), especially
in higher orders (i.e., two or more loops) are usually extremely long and cumbersome
expressions, making it impossible in practice to analytically perform the resultant
integration. It is usually vastly simpler to introduce a regularization which renders
the individual terms in the forest formula well-defined, perform the (much simpler)
integrals corresponding to each term individually, and then combine the results, at
which point one can verify that the singular dependence on the regularization variable
(cutoff Λ in momentum cutoff schemes, or 1/ε poles in dimensional regularization)
cancels in the complete subtracted amplitude. In verifying the perturbative renormal-
izability of the theory, to which we now turn, it is very important, when choosing a
regularization scheme, that the symmetries of the underlying Lagrangian are preserved
at the intermediate stages of the calculation.
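The practical strategy described here (regulate each term, integrate it separately, then combine) can be illustrated with a deliberately simple toy. The sketch below is not a Feynman integral: it assumes a one-dimensional integrand 1/(l + a) − 1/(l + b) with a hard cutoff, and the function names are hypothetical. Each piece grows logarithmically with the cutoff, while the difference settles to the cutoff-independent value ln(b/a).

```python
import math

def regulated_term(a, cutoff):
    """Closed form of the integral of 1/(l + a) over 0 <= l <= cutoff."""
    return math.log((cutoff + a) / a)

def combined(a, b, cutoff):
    """Difference of two individually log-divergent regulated terms."""
    return regulated_term(a, cutoff) - regulated_term(b, cutoff)

a, b = 1.0, 2.0
for cutoff in (1e2, 1e4, 1e8):
    # Individual term keeps growing; the combination approaches a constant.
    print(cutoff, regulated_term(a, cutoff), combined(a, b, cutoff))
print(math.log(b / a))  # cutoff-independent limit of the combination
```

The residual cutoff dependence of the combination falls off as an inverse power of the cutoff, mirroring the statement that only the singular dependence on the regulator cancels exactly.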
17.2 Counterterms, subtractions, and perturbative renormalizability
The considerations of the preceding section provide a complete characterization of the
terms giving rise to cutoff-dependence in arbitrary multi-loop Feynman amplitudes,
but as yet the meaning of these contributions in the context of the Lagrangian
dynamics of a local quantum field theory is unclear. In fact, for a certain class of
local field theories, the subtractions prescribed by the BPHZ procedure are exactly
equivalent to a reparameterization of the perturbative Feynman amplitudes of the
theory in terms of a finite set of low-energy parameters, which can be determined
(order by order in perturbation theory) by an equivalent number of independent low-
energy measurements. We shall see how this comes about first in the algebraically
simplest case—that of self-coupled scalar particles. We recall that the general form
taken by the Euclidean action giving a clustering relativistically invariant theory for
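such particles is (cf. (16.6)), in d spacetime dimensions,

$$\begin{aligned}
S_E(\phi) &= \int d^dx\,\Big\{ \tfrac{1}{2}(\partial_\nu\phi)^2 + \sum_{n\ge 0} a_n\,\phi^{2+n} + \sum_{n>0} a'_n\,(\partial_\nu\phi)^2\phi^n + \ldots \Big\} \\
&\equiv \int d^dx\,\Big\{ \tfrac{1}{2}(\partial_\nu\phi)^2 + \sum_{n\ge 0} a_n O_n + \sum_{n>0} a'_n O'_n + \ldots \Big\} = \int \mathcal{L}(\phi)\,d^dx
\end{aligned} \tag{17.32}$$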

where an ultraviolet momentum cutoff Λ (or alternatively some form of short-distance
cutoff) is assumed implicitly present. The dots “. . . ” represent operators with four or
more spacetime-derivatives.
In what follows we shall frequently resort to dimensional analysis, so we remind
the reader at this point that, using natural units (ħ = c = 1), ensuring the dimensional
consistency of our equations amounts to counting powers of mass, with momentum and
energy having dimensions of mass, and space and time coordinates, inverse powers of
mass. The Euclidean action SE must be dimensionless (it appears in the exponent
in the functional integral) so by examining the kinetic term in (17.32) (the first
term on the right-hand side), we conclude that the scalar field φ must have, in d
spacetime dimensions, dimension $m^{d/2-1}$. Having determined the dimension of the
field, it is straightforward to examine the other terms in (17.32) to establish their
mass dimension. We shall denote this “engineering dimension” of any quantity in
powers of mass with the notation “dim”, thus:
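$$\begin{aligned}
\dim(\phi) &= \frac{d}{2} - 1 \\
\dim(a_n) &= 2 + n\Big(1 - \frac{d}{2}\Big) \\
\dim(O_n) &= (n+2)\Big(\frac{d}{2} - 1\Big) \\
\dim(a'_n) &= n\Big(1 - \frac{d}{2}\Big) \\
\dim(O'_n) &= n\Big(\frac{d}{2} - 1\Big) + d
\end{aligned} \tag{17.33}$$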
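The engineering dimensions follow from two inputs only: the field dimension d/2 − 1 and the requirement that every term in the action be dimensionless, so that each coupling carries dimension d minus that of its operator. The sketch below, with hypothetical function names, recomputes the coupling dimensions this way and checks them against the closed forms 2 + n(1 − d/2) and n(1 − d/2).

```python
from fractions import Fraction

def dim_phi(d):
    """Mass dimension of the scalar field in d spacetime dimensions."""
    return Fraction(d, 2) - 1

def dim_O(n, d):
    """Dimension of O_n = phi^(n+2)."""
    return (n + 2) * dim_phi(d)

def dim_O_prime(n, d):
    """Dimension of O'_n = (d phi)^2 phi^n: two derivatives, n + 2 fields."""
    return 2 + (n + 2) * dim_phi(d)

def dim_a(n, d):
    """Coupling of O_n: the action is dimensionless, so dims add up to d."""
    return d - dim_O(n, d)

def dim_a_prime(n, d):
    """Coupling of O'_n, fixed the same way."""
    return d - dim_O_prime(n, d)

for d in (4, 6):
    for n in (1, 2, 3):
        assert dim_a(n, d) == 2 + n * (1 - Fraction(d, 2))
        assert dim_a_prime(n, d) == n * (1 - Fraction(d, 2))
        print(d, n, dim_a(n, d), dim_a_prime(n, d))
```

Exact fractions are used so that the comparison also works for odd d, where the field dimension is half-integral.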

In perturbation theory, we as usual assign the quadratic field kinetic ($\tfrac{1}{2}(\partial_\nu\phi)^2$) and
mass ($a_0\phi^2$) terms to the “free” part of the Lagrangian, and the remaining terms in
(17.32) are placed in the “interaction”, generating the vertices of the Feynman graphs
of the theory.⁶ In the present case, the interaction vertices are associated with the bare
coupling constants $a_1, a_2, a_3, \ldots, a'_1, a'_2, a'_3, \ldots$ etc. Let us now consider a 1PI graph with
E external lines, with momentum-space amplitude $\Gamma^{(E)}(k_1, k_2, \ldots, k_E)$. We shall be
concerned only with dimensional analysis here—counting powers of mass dimension—
so overall numerical (dimensionless) factors may be ignored. Thus, using the symbol
∼ to indicate dimensional equivalence, we have

 
$$\delta^d\Big(\sum_i k_i\Big)\,\Gamma^{(E)}(k_1, k_2, \ldots, k_E) \sim \prod_{i=1}^{E}(k_i^2 + m^2) \int d^dx_1\cdots d^dx_E\, \langle 0|\phi(x_1)\cdots\phi(x_E)|0\rangle \tag{17.34}$$
The product of $k^2 + m^2$ factors on the right-hand side serves to truncate the external
propagators, and, of course, the 1PI character of $\Gamma^{(E)}$ means that only a subset of the
terms contributing to the E-point function in the integral are included, but this does
not alter the fact that dimensional consistency requires the left- and right-hand sides
of (17.34) to have the same engineering dimension. This implies

$$-d + \dim(\Gamma^{(E)}) = 2E - dE + E\Big(\frac{d}{2} - 1\Big) \;\Rightarrow\; \dim(\Gamma^{(E)}) = d + E\Big(1 - \frac{d}{2}\Big) \tag{17.35}$$
Now let us consider a particular L-loop graph contributing to $\Gamma^{(E)}$ with $N_1, N_2, \ldots$
vertices of the interaction terms $O_1, O_2, \ldots$ etc., and similarly $N'_1, N'_2, \ldots$ vertices for the
primed operators $O'_1, O'_2, \ldots$ With l representing a generic loop momentum, and with
all loop momenta large compared to masses and external momenta $k_i$, we also have,
dimensionally,

$$\Gamma^{(E)} \sim a_1^{N_1} a_2^{N_2}\cdots (a'_1)^{N'_1} (a'_2)^{N'_2}\cdots \int l^{D-dL}\, d^{dL}l \tag{17.36}$$

where D is the superficial degree of divergence of the graph in question, which we
recall is computed precisely by counting powers of loop momentum in the multi-loop
Feynman integrand, including, of course, the phase-space associated with L indepen-
dent d-dimensional momentum integrations. Referring to (17.33), and comparing the
dimensions of the left- and right-hand sides of (17.36), we find

$$\dim(\Gamma^{(E)}) = \sum_{n=1}\Big(N_n\Big(2 + n\Big(1 - \frac{d}{2}\Big)\Big) + N'_n\, n\Big(1 - \frac{d}{2}\Big) + \ldots\Big) + D \tag{17.37}$$

Using our previous result (17.35) for the dimension of Γ(E) , we therefore find for the
superficial degree of divergence of this graph

6 For the time being we shall ignore terms with more than two derivatives and quadratic in the field. Such
“Pauli–Villars” terms can be included in the free propagator, and result in a damping at high momentum
analogous to that employed in our treatment of the renormalization group flow of effective Lagrangians in
Section 16.4. In other words, they can be viewed as a modification of the cutoff scheme employed to define
individually divergent Feynman graphs.
$$D = d + E\Big(1 - \frac{d}{2}\Big) - \sum_{n=1}\Big(N_n\Big(2 + n\Big(1 - \frac{d}{2}\Big)\Big) + N'_n\, n\Big(1 - \frac{d}{2}\Big) + \ldots\Big) \tag{17.38}$$

In particular, in d = 4 spacetime dimensions, the result becomes

$$D = 4 - E + \sum_{n=1}\big((n-2)N_n + nN'_n + \ldots\big) \tag{17.39}$$

We note that the sum divides into (a) a single negative contribution (n = 1), corre-
sponding to the operator φ3 , increasing insertions of which decrease the superficial
degree of divergence of the graph, (b) a term with vanishing coefficient of N2 , which
counts the number of appearances of the quartic vertex induced by the φ4 term in
the Lagrangian, increasing insertions of which therefore do not affect the superficial
degree of divergence of the graph, and (c) terms with positive contributions to D,
corresponding to operators $O_3 = \phi^5$, $O_4 = \phi^6, \ldots$, $O'_1 = (\partial_\nu\phi)^2\phi, \ldots$ with engineering
dimension greater than 4, and for which increasing numbers of vertices result in
increasing degree of divergence of the graph containing them. This division precisely
corresponds to the classification of operators referred to in Section 16.3 as (a) relevant,
(b) marginal, or (c) irrelevant, from the point of view of the renormalization group
behavior of effective field theories described there. In the present context it will be
more convenient to refer to the operators of type (a) as super-renormalizable, (b) as
renormalizable, and (c) as non-renormalizable, for reasons that will shortly become
clear. The essential point to be grasped at this juncture is that, while the number
of super-renormalizable and renormalizable operators/vertices is finite (indeed, there
are only two possible interaction terms in four dimensions, corresponding to $\phi^3$ and $\phi^4$),
the number of non-renormalizable vertices/operators is always infinite. We note here
for future reference that in d = 6 spacetime dimensions, (17.39) becomes

$$D = 6 - 2E + \sum_{n=1}\big((2n-2)N_n + 2nN'_n + \ldots\big) \tag{17.40}$$

so the only renormalizable operator in this case is φ3 , with the quartic vertex from φ4
already corresponding to a non-renormalizable term. For spacetime dimensions greater
than 6, all interaction operators fall into the non-renormalizable category.
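The general formula (17.38) and the resulting classification of operators lend themselves to a short calculation. The sketch below is an illustration with hypothetical function names; it keeps only the operators with at most two derivatives that are written out explicitly in (17.32), and it classifies a vertex by the sign of its coupling dimension, which is minus its coefficient in D.

```python
from fractions import Fraction

def superficial_degree(d, E, N=None, N_prime=None):
    """Eq. (17.38); N and N_prime map the operator index n to a vertex count."""
    N = N or {}
    N_prime = N_prime or {}
    half = 1 - Fraction(d, 2)
    D = d + E * half
    D -= sum(count * (2 + n * half) for n, count in N.items())
    D -= sum(count * n * half for n, count in N_prime.items())
    return D

def classify(n, d, primed=False):
    """Classify O_n (or O'_n if primed) by the dimension of its coupling."""
    half = 1 - Fraction(d, 2)
    coupling_dim = n * half if primed else 2 + n * half
    if coupling_dim > 0:
        return "super-renormalizable"
    if coupling_dim == 0:
        return "renormalizable"
    return "non-renormalizable"

for d in (4, 6):
    for n in (1, 2, 3):
        print(d, n, classify(n, d), classify(n, d, primed=True))
```

For example, `superficial_degree(4, 2, {2: 3})` evaluates D for a two-point graph with three quartic vertices in four dimensions, and the result does not change if the number of quartic vertices is varied.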
The result (17.39) allows us to identify immediately the renormalization parts
for a scalar field theory in four dimensions. First, consider the case where only
super-renormalizable and renormalizable terms are present in the Lagrangian. Thus,
$N_n = 0$ for $n > 2$, $N'_n = 0$ for $n > 0$, etc., and we have simply

$$D = 4 - E \tag{17.41}$$

We may identify this theory by the useful notation $\phi^4_4$, where the subscript indicates
the spacetime dimension, and the superscript indicates the highest dimension
(renormalizable) interaction operator. Let us also assume the discrete symmetry
φ → −φ which eliminates terms containing odd powers of the field. Then the only
renormalization parts of the resultant φ4 theory correspond to 1PI graphs with
E = 0, 2, or 4 external lines. We have already seen in Chapter 10 that the vacuum
graphs of the theory (E = 0) induce a physically irrelevant (in flat space) phase
shift of the vacuum state, which cancels in the appropriately normalized amplitudes.
Thus the renormalization parts of this theory correspond to quadratically divergent
1PI two-point subgraphs (termed “self-energy”, or sometimes “vacuum polarization”
graphs) and logarithmically divergent 1PI four-point subgraphs (which we shall call
“vertex renormalization parts” for reasons shortly to become apparent). All graphs
with more than four external lines are superficially convergent: they may, of course,
contain internal subdivergences, but these must correspond to self-energy or vertex
renormalization subgraphs. Similarly, in $\phi^3_6$ theory, we conclude from (17.40) that
with only renormalizable or super-renormalizable terms present, the renormalization
parts correspond to 1PI graphs with 1, 2, or 3 external lines: termed “tadpole”, “self-
energy”, and “vertex” renormalization parts respectively. Graphs with more than three
external lines (e.g., 2-2 scattering graphs) are superficially convergent. Here we cannot
impose the reflection symmetry φ → −φ, as we wish to retain the one non-trivial
renormalizable term, which is cubic in the field.
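The lists of renormalization parts quoted above can be generated directly from D = d + E(1 − d/2), which is what (17.38) reduces to when only vertices with dimensionless couplings are present. The sketch below uses a hypothetical helper name, omits vacuum graphs (E = 0), and takes an optional flag to mimic the reflection symmetry φ → −φ by discarding odd E.

```python
from fractions import Fraction

def renormalization_parts(d, max_legs=12, even_only=False):
    """Pairs (E, D) with D >= 0, assuming only marginal vertices appear."""
    parts = []
    for E in range(1, max_legs + 1):
        if even_only and E % 2:
            continue  # odd-leg functions vanish under phi -> -phi
        D = d + E * (1 - Fraction(d, 2))
        if D >= 0:
            parts.append((E, int(D)))
    return parts

print(renormalization_parts(4, even_only=True))  # phi^4 theory in d = 4
print(renormalization_parts(6))                  # phi^3 theory in d = 6
```

Raising `max_legs` does not add entries for d = 4 or d = 6, since D decreases with each additional external line.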
On the other hand, if non-renormalizable terms are present, there is an infinite
number of renormalization parts in either case, as the superficial degree of divergence
D of graphs with any number of external lines eventually becomes positive if enough
vertices of non-renormalizable operators are inserted, due to the positive terms in the
sums in (17.39, 17.40). We are about to see that the subtractions introduced in the
preceding section to remove the dominant cutoff dependence of Feynman amplitudes
are exactly equivalent to a precise set of reparameterizations of the parameters in
the (cutoff) Lagrangian in terms of parameters defined at low energy. Evidently, the
presence of non-renormalizable operators in the basic Lagrangian will require the repa-
rameterization of an infinite number of Lagrangian parameters, if we wish to compute
cutoff-insensitive amplitudes to arbitrary orders of perturbation theory. In such a case,
we say that we are dealing with a “perturbatively non-renormalizable theory”.
What if we exclude ab initio non-renormalizable terms from the Lagrangian?
From the point of view of the Wilsonian effective Lagrangian theory discussed in the
preceding chapter, this would seem to be a physically unreasonable procedure: after all,
we saw there that the inevitable presence of new physics (such as quantum gravity) at
very short distances necessarily induces an infinite set of higher-dimension operators in
the effective Lagrangian for any theory cutoff at some high-momentum scale. We shall
return to precisely this question in Section 17.4, where the implications of restricting
ourselves to a Lagrangian with only super-renormalizable or renormalizable terms are
examined from a general renormalization group point of view. For the time being,
let us suppose (as we have frequently done in the book, making no excuses!) that we
can describe the interactions of a real scalar field in four dimensions by a Lagrangian
containing only three terms (and an implicit UV cutoff Λ),
$$\mathcal{L} = \frac{1}{2}(\partial_\mu\phi)^2 - \frac{1}{2}m^2\phi^2 - \frac{1}{4!}\lambda\phi^4 \tag{17.42}$$

where we have returned to more conventional notation for the mass and coupling
term: $a_0 = \tfrac{1}{2}m^2$, $a_2 = \tfrac{1}{4!}\lambda$. This Lagrangian, of course, represents a class of physical
theories, with varying masses for the φ-particle, and for the strength of its self-
coupling. The relevant physical theory must be fixed by performing experiments to
measure the physical mass mph (related, but as we shall see, certainly not identical,
to the parameter $m^2$ in the Lagrangian), and a physical coupling strength $\lambda_{\rm ph}$. One
might, for example, define the latter as the S-matrix element for elastic 2-2 scattering
in the zero (spatial) momentum limit $p_0^\mu = (m_{\rm ph}, \vec{0})$, sans uninteresting
momentum-conservation and phase-space factors (cf. (7.196)):

$$S_{p_0 p_0,\,p_0 p_0} \equiv \frac{1}{(2\pi)^6 (2m_{\rm ph})^2}\,\lambda_{\rm ph}\cdot(2\pi)^4\delta^4() \tag{17.43}$$

Evidently, for any given fixed value of the UV cutoff Λ, these definitions fix the bare
parameters m2 , λ appearing in the Lagrangian as functions of measurable quantities

$$m^2 = m^2(\lambda_{\rm ph}, m_{\rm ph}, \Lambda), \qquad \lambda = \lambda(\lambda_{\rm ph}, m_{\rm ph}, \Lambda) \tag{17.44}$$

Arbitrary Feynman amplitudes of the (cutoff) theory can then be computed as
functions of m, λ, and Λ, and then reparameterized in terms of the physical mass
and coupling mph , λph . This can be done (indeed, in general, must be done) order by
order in a formal expansion in the measured coupling λph , as we typically are unable
to solve the theory exactly. The choice of definition of λph means that, to lowest
order in perturbation theory (see (7.196)), the bare and physical coupling coincide,
$\lambda_{\rm ph} = \lambda + O(\lambda^2)$ (equivalently, $\lambda = \lambda_{\rm ph} + O(\lambda_{\rm ph}^2)$), which also ensures that the bare
and physical masses m and mph coincide to lowest order of perturbation theory (in
either λ or λph ). In the first section of this chapter we saw a simple example in which
such a reparameterization in fact completely removed the dominant cutoff-dependence
of the amplitudes, leaving only a quantitatively ignorable dependence at the level
of inverse powers of the cutoff. We are about to see that exactly this softening of
ultraviolet sensitivity occurs, order by order in perturbation theory and (to all orders)
in theories containing only renormalizable or super-renormalizable operators in their
cutoff Lagrangian.
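The order-by-order reparameterization amounts to inverting a power series. As a sketch, suppose the physical coupling has been computed through third order in the bare coupling with coefficients $u_2$, $u_3$; these are hypothetical placeholders for the cutoff-dependent loop corrections and are not taken from the text. Substituting a trial series and matching powers gives the inverse through the same order:

```latex
% u_2, u_3: hypothetical expansion coefficients (functions of m and the cutoff)
\lambda_{\rm ph} = \lambda + u_2\lambda^2 + u_3\lambda^3 + O(\lambda^4)
\quad\Longrightarrow\quad
\lambda = \lambda_{\rm ph} - u_2\lambda_{\rm ph}^2 + (2u_2^2 - u_3)\lambda_{\rm ph}^3 + O(\lambda_{\rm ph}^4)
```

Any amplitude known as a series in λ can then be re-expressed as a series in $\lambda_{\rm ph}$ by substitution, truncating consistently at the order to which the coefficients are known.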
Before proceeding to the general argument, we need to point out an important fea-
ture of the Lagrangian (17.42), which we have heretofore glossed over. The Lagrangian
contains three terms, and we should therefore in general expect it to represent a
three-parameter family of theories, if we allow the coefficient of the kinetic term to vary,
effectively by rescaling the field in (17.42), $\phi \equiv \sqrt{\hat{Z}}\,\phi_R$:

$$\mathcal{L} = \frac{1}{2}\hat{Z}(\partial_\mu\phi_R)^2 - \frac{1}{2}\hat{Z}m^2\phi_R^2 - \frac{1}{4!}\hat{Z}^2\lambda\phi_R^4 \tag{17.45}$$
We use the notation Ẑ, rather than the more common Z, to emphasize the fact that
the “wavefunction renormalization” constant being introduced here may, but need not,
coincide with the LSZ field normalization constant Z appearing in the vacuum to single
particle matrix element of the field, as in (9.155). The new constant Ẑ merely corre-
sponds to our freedom to rescale the field by a constant, φ → κφ. The new rescaled field
φR is generally termed the “renormalized field”. From the point of view of S-matrix
elements, such a rescaling is physically irrelevant: if two fields φA and φB differ by
a simple scaling, $\phi_A = \kappa\phi_B$, then their LSZ normalization constants are related by
$Z_A = \kappa^2 Z_B$, and the LSZ formula (9.176) gives identical results for the S-matrix using
either field, as the change in $Z^{-1/2}$ factors exactly compensates for the rescaling of the
fields in the Feynman Green function. Nevertheless, we shall want to remove cutoff
sensitivity, if possible, from as many elements of the formalism as possible, and in
particular, from the Feynman amplitudes (n-point functions) appearing in the LSZ
formula, even before these are taken on-shell and appropriately scaled. This freedom
will also be necessary to complete the connection of the reparameterization program
described above with the BPHZ subtraction procedure of the preceding section.
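The statement that a constant rescaling of the field drops out of the S-matrix can be made explicit in one line. The following is a schematic sketch only: $G^{(n)}$ denotes the n-point Feynman Green function, one factor of $Z^{-1/2}$ is assumed per external particle as in the LSZ formula, and the kinematic factors common to both fields are omitted.

```latex
% Schematic: truncation factors and momentum integrals, identical for both fields, are dropped
\phi_A = \kappa\,\phi_B
\;\Rightarrow\;
G_A^{(n)} = \kappa^{n}\,G_B^{(n)}, \quad Z_A = \kappa^{2} Z_B
\;\Rightarrow\;
Z_A^{-n/2}\,G_A^{(n)} = \kappa^{-n} Z_B^{-n/2}\,\kappa^{n}\,G_B^{(n)} = Z_B^{-n/2}\,G_B^{(n)}
```

The Green functions themselves do change under the rescaling, which is why the choice of Ẑ matters for removing cutoff sensitivity from the n-point functions before they are taken on-shell.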
The reparameterization of the theory suggested above in terms of a directly
measurable physical mass and coupling is only one of an infinite variety of possible
redefinitions of the parameters of the theory. In order to establish the connection of the
reparameterized amplitudes with the subtracted ones discussed in the previous section,
we shall make an alternative choice. Our “physical” (or “renormalized”) coupling
parameter, now and henceforth dubbed λR , will be defined as the zero four-momentum
value of the 1PI Euclidean four-point function of the renormalized field φR , which we
denote, for general external momenta, $\Gamma^{(4)}(k_1, k_2, k_3, k_4)$:

$$\lambda_R \equiv \Gamma^{(4)}(0, 0, 0, 0) \tag{17.46}$$

Note that we take $k_i^\mu = 0$ for all four components of the external momenta, which
means that we are parameterizing the theory in terms of an off-shell value for a
Green function. The overall numerical factor in Γ(4) is chosen so that its perturbative
expansion in terms of the bare coupling λ begins with λ, with a coefficient of unity.
The first few graphs contributing to Γ(4) in terms of the bare parameters of (17.45)
are indicated in Fig. 17.6. Thus, we have a formal expansion:

\[
\hat{Z}^2 \lambda \equiv \lambda_R + \delta\lambda, \qquad \delta\lambda = c_2 \lambda_R^2 + c_3 \lambda_R^3 + \ldots \tag{17.47}
\]

The shift δλ between the bare and renormalized coupling is called a “coupling constant
counterterm”: the coefficients c2 , c3 , . . . appearing in the expansion of the counterterm
in λR must be determined order by order in perturbation theory to ensure that the
zero momentum value of Γ(4) remains pinned at λR , as it is defined to be. We shall
see shortly how this is accomplished in practice. We likewise define counterterms for
the other two (as yet) floating parameters in the Lagrangian:

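\[
\hat{Z} \equiv 1 + \delta\hat{Z} = 1 + a_1 \lambda_R + a_2 \lambda_R^2 + \ldots \tag{17.48}
\]
\[
\hat{Z} m^2 \equiv m_R^2 - \delta m^2 = m_R^2 - (b_1 \lambda_R + b_2 \lambda_R^2 + \ldots) \tag{17.49}
\]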

Fig. 17.6 Low-order 1PI contributions (through one loop) to $\Gamma^{(4)}(k_1, k_2, k_3, k_4)$ in $\phi^4$ theory.
The crossbars indicate that external legs are amputated.

The counterterms $\delta\hat{Z}$, $\delta m_R^2$ will be used to adjust the behavior of the self-energy (i.e.,
the 1PI two-point function of the renormalized field $\phi_R$) at zero momentum, as follows.
Note that the connected two-point function of the theory (in momentum space, the
“full” Feynman propagator Δ̂F (p); cf. Section 10.4) can be graphically represented
as a series of free propagators ΔF (p), interspersed by 1PI self-energy corrections, as
indicated in Fig. 10.5. Denoting the sum of all 1PI two-point self-energy graphs as
Π(p) (the graphs contributing to this Green function in φ4 through two loops are
displayed in Fig. 17.7), we have algebraically

\[
\hat{\Delta}_F(p) = \Delta_F(p) + \Delta_F(p)\Pi(p)\Delta_F(p) + \Delta_F(p)\Pi(p)\Delta_F(p)\Pi(p)\Delta_F(p) + \ldots
= \frac{1}{\Delta_F^{-1}(p) - \Pi(p)} \tag{17.50}
\]

In other words, the inverse (Euclidean) full propagator consists just of a tree con-
tribution which is just the inverse free propagator, together with (minus) the 1PI
loop graphs indicated in Fig. 17.7, just as we would expect, given that it coincides
with the two-point function corresponding to the functional Γ(φR ) generating the 1PI
graphs of the theory (cf. Equations (10.142, 10.143)). We are at liberty to apportion the
quadratic (mass term) part of the Lagrangian into a “free” and an “interacting”
part, and we shall define the coefficient of $\frac{1}{2}\phi_R^2$ in the free Lagrangian as $m_R^2$: thus
$\Delta_F^{-1}(p) = p^2 + m_R^2$ in (17.50). Moreover, we shall choose $\delta\hat{Z}$, $\delta m_R^2$ order by order in
perturbation theory to remove the first two terms in the Taylor expansion (in $p^2$) of
Π(p) (which is, of course, a scalar function of $p^2$, by Lorentz-invariance), as follows:

\[
\Pi(0) = 0 \tag{17.51}
\]
\[
\left.\frac{\partial^2 \Pi(p)}{\partial p^2}\right|_{p=0} = 0 \tag{17.52}
\]

Equivalently, these latter two conditions may be phrased, using (17.50), in terms of
the full propagator,

\[
\hat{\Delta}_F^{-1}(p = 0) = m_R^2 \tag{17.53}
\]
\[
\frac{1}{2}\left.\frac{\partial^2 \hat{\Delta}_F^{-1}(p)}{\partial p^2}\right|_{p=0} = 1 \tag{17.54}
\]
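The resummation (17.50) and the equivalence of the conditions (17.51, 17.52) with (17.53, 17.54) can be checked numerically in a few lines. The sketch below is only an illustration: the function `toy_self_energy` is an invented stand-in for Π(p) (chosen to vanish like $p^4$ at small momentum), not the result of any loop calculation, and all numerical values are arbitrary.

```python
# Toy check of the Dyson resummation (17.50) and of the statement that
# (17.51, 17.52) for Pi(p) are equivalent to (17.53, 17.54) for the full propagator.
# ASSUMPTION: toy_self_energy is a made-up function, not an actual loop integral.

M_R2 = 1.0  # renormalized mass squared m_R^2, arbitrary units


def free_propagator(p):
    """Euclidean free propagator 1 / (p^2 + m_R^2)."""
    return 1.0 / (p * p + M_R2)


def toy_self_energy(p, strength=0.3, scale2=4.0):
    """Stand-in for Pi(p): behaves like p^4 near p = 0, so that
    Pi(0) = 0 and the second p-derivative vanishes at p = 0."""
    return strength * p**4 / (p * p + scale2)


def full_propagator_series(p, n_terms=200):
    """Partial sum of Delta + Delta*Pi*Delta + Delta*Pi*Delta*Pi*Delta + ..."""
    d, pi = free_propagator(p), toy_self_energy(p)
    total, term = 0.0, d
    for _ in range(n_terms):
        total += term
        term *= pi * d
    return total


def full_propagator_closed(p):
    """Closed form 1 / (Delta^{-1} - Pi) from (17.50)."""
    return 1.0 / (1.0 / free_propagator(p) - toy_self_energy(p))


def inverse_full_propagator(p):
    return p * p + M_R2 - toy_self_energy(p)


def second_derivative_at_zero(f, h=1e-2):
    """Central finite-difference estimate of f''(0)."""
    return (f(h) - 2.0 * f(0.0) + f(-h)) / (h * h)


if __name__ == "__main__":
    print(full_propagator_series(1.5), full_propagator_closed(1.5))
    print(inverse_full_propagator(0.0), 0.5 * second_derivative_at_zero(inverse_full_propagator))
```

With a self-energy obeying (17.51, 17.52), the inverse full propagator at small momentum is indistinguishable, through order $p^2$, from the free one with mass parameter $m_R$.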

Fig. 17.7 Low-order 1PI contributions (through two loops) to Π(p) in $\phi^4$ theory. The crossbars
indicate that external legs are amputated.

The three constraints (17.46, 17.51, 17.52) constitute the renormalization conditions
for our theory, as described in the first section of this chapter: the associated ampli-
tudes are said to be calculated in the BPHZ renormalization scheme.
Note that there is no reason to expect—and indeed it is not the case—that mR
corresponds to the physical mass mph of the particle: i.e., the location of the pole of the
full propagator Δ̂F (p). However, as is apparent from (17.53), it is a perfectly sensible
quantity (with dimensions of mass, of course) in terms of which to parameterize the
amplitudes of the theory, and in terms of which a formula can be obtained (order
by order in perturbation theory) for mph . The reparameterization procedure is most
easily carried out by rewriting the Lagrangian (17.45) as a sum of three terms:

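\[
\mathcal{L} = \mathcal{L}_0 + \mathcal{L}_{\rm basic} + \mathcal{L}_{\rm ct} \tag{17.55}
\]
\[
\mathcal{L}_0 = \frac{1}{2}(\partial_\mu \phi)^2 - \frac{1}{2} m_R^2 \phi^2 \tag{17.56}
\]
\[
\mathcal{L}_{\rm basic} = -\frac{1}{4!}\lambda_R \phi^4 \tag{17.57}
\]
\[
\mathcal{L}_{\rm ct} = \frac{1}{2}\delta\hat{Z}(\partial_\mu \phi)^2 + \frac{1}{2}\delta m^2 \phi^2 - \frac{1}{4!}\delta\lambda\, \phi^4 \tag{17.58}
\]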

To avoid notational overload, we have dropped the “R” subscript on the field, with
the understanding that here and henceforth we are concerned only with the Green
functions of the rescaled field. The “free” Lagrangian L0 , which determines the
propagators appearing in our graphs, is now written in terms of this rescaled (or
“renormalized”) field (φR of (17.45)), and the renormalized mass parameter mR (thus,
the Euclidean propagator denominators are simply $k^2 + m_R^2$). Inasmuch as all the
terms in the “basic vertex Lagrangian” Lbasic and the “counterterm Lagrangian” Lct
contain at least one power of λR , the full interaction Lagrangian now contains vertices
corresponding not just to the original four-point interaction vertex of (17.45), but
additional two- and four-point vertices corresponding to the terms in the counterterm
Lagrangian.
We now wish to establish the following remarkable property of the amplitudes
(i.e., n-point Euclidean Green functions) of this theory: once reparameterized in
terms of the renormalized quantities mR , λR as defined above, the amplitudes acquire
precisely the zero momentum subtractions corresponding to the Zimmermann formula
(17.31), thereby softening their UV cutoff dependence to the level of inverse powers,
order by order in the perturbative expansion of the amplitudes in λR . A theory of this
kind, in which the redefinition of a finite number of Lagrangian mass and coupling
parameters induces subtractions removing the UV cutoff dependence (up to inverse
powers) of all the amplitudes of the theory, to all orders of perturbation theory, is
called “perturbatively renormalizable”.
Our demonstration will be inductive in the number of loops of the diagrams
considered. Accordingly, our first task is to initiate the induction by demonstrating the
validity of the italicized assertion above in the lowest non-trivial order, namely, for the
one-loop amplitudes of the theory. In fact, we need only consider the 1PI amplitudes of
the theory for our proof of renormalizability. Any (connected) amplitude which is not
one-particle-irreducible can be divided into subgraphs that are, connected by single
internal lines carrying fixed momenta (which are linear combinations of the external
momenta of the process). Consequently, any potential loop divergences are isolated
in the separate 1PI parts of the diagram: once these are appropriately subtracted, no
further divergences can arise by connecting 1PI subdiagrams with internal lines whose
momenta are fixed. First, let us consider the superficially divergent 1PI diagrams
of the theory (i.e., the renormalization parts), and calculate these to one-loop order,
imposing the BPHZ renormalization conditions (17.46, 17.51, 17.52). We first examine
the four-point function $\Gamma^{(4)}(k_1, k_2, k_3, k_4)$, computed to order $\lambda_R^2$. In addition to the
graphs of Fig. 17.6, involving only the basic vertex (17.57), there is a contribution at
order $\lambda_R^2$ from the $c_2 \lambda_R^2$ part of the $-\frac{1}{4!}\delta\lambda\,\phi^4$ piece of the counterterm Lagrangian. The
parts of the counterterm Lagrangian quadratic in the fields result in the insertion of
two-point vertices and generate one-particle reducible diagrams. Thus, to order $\lambda_R^2$ (cf.
(16.15), with the change of notation $g_2 \to \lambda_R$), using a momentum cutoff Λ to regulate
the individual diagrams, and neglecting terms suppressed by inverse powers of Λ,
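\[
\Gamma_1^{(4)}(k_1, k_2, k_3, k_4) = \lambda_R - \frac{1}{32\pi^2}\lambda_R^2\,\{I(s, m_R^2, \Lambda^2) + I(t, m_R^2, \Lambda^2) + I(u, m_R^2, \Lambda^2)\} - c_2 \lambda_R^2 \tag{17.59}
\]
\[
I(p^2, m^2, \Lambda^2) \equiv \int_0^1 \left(\ln\left(\frac{\Lambda^2}{x(1 - x)p^2 + m_R^2}\right) - 1\right) dx \tag{17.60}
\]
\[
s \equiv (k_1 + k_2)^2, \quad t \equiv (k_1 - k_3)^2, \quad u \equiv (k_1 - k_4)^2 \tag{17.61}
\]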

where the subscript indicates explicitly that our four-point function is being calculated
through one-loop order. The renormalization condition (17.46) implies that we must
choose the coefficient $c_2$ in the vertex renormalization counterterm to precisely remove
the contribution of the second term on the right-hand side of (17.59) at zero momentum,
as the correct zero-momentum value for $\Gamma^{(4)}$ is already given by the lowest-order
tree contribution:
\[
c_2 = -\frac{1}{32\pi^2} \cdot 3I(0, m_R^2, \Lambda^2) \tag{17.62}
\]
Inserting (17.62) in (17.59), we find, through order $\lambda_R^2$ (i.e., to one loop),
\[
\Gamma_1^{(4)}(k_1, k_2, k_3, k_4) = \lambda_R - \frac{1}{32\pi^2}\lambda_R^2\,\{I_R(s, m_R^2) + I_R(t, m_R^2) + I_R(u, m_R^2)\} \tag{17.63}
\]
with
\[
I_R(p^2, m_R^2) \equiv (1 - t_\gamma) I(p^2, m_R^2, \Lambda^2) = \int_0^1 \ln\left(\frac{m_R^2}{x(1 - x)p^2 + m_R^2}\right) dx \tag{17.64}
\]

satisfying the required constraint IR (0, m2R ) = 0. We remind the reader that the Taylor
operator tγ (with γ referring to the one-loop bubbles of Fig. 17.6) when applied to
a one-loop integral with superficial degree of divergence zero, simply evaluates the
graph at zero external momentum. In other words, the counterterm proportional to
c2 λ2R is exactly equivalent to the Taylor operation defined in the preceding section, as
a consequence of the need to keep the four-point function pinned at the defined value
λR at higher orders. As expected, the result is the removal from the amplitude of
the logarithmic sensitivity to the cutoff Λ, leaving only terms of order $m_R^2/\Lambda^2$, $k_i^2/\Lambda^2$,
which we have neglected above.
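The cancellation of the cutoff in (17.59)-(17.64) is easy to confirm numerically. The following sketch evaluates the Feynman-parameter integrals with a simple midpoint rule; the quadrature, the function names, and the sample values of the momenta, coupling and cutoff are our own illustrative choices, and the formulas are used exactly as written above (i.e., with inverse-power terms in Λ already discarded).

```python
import math


def integrate_01(f, n=2000):
    """Composite midpoint rule on [0, 1] (adequate for these smooth integrands)."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))


def I_unsub(p2, m2, cutoff2):
    """Unsubtracted one-loop integral, eq. (17.60)."""
    return integrate_01(lambda x: math.log(cutoff2 / (x * (1.0 - x) * p2 + m2)) - 1.0)


def I_ren(p2, m2):
    """Subtracted integral, eq. (17.64); no cutoff appears."""
    return integrate_01(lambda x: math.log(m2 / (x * (1.0 - x) * p2 + m2)))


def c2(m2, cutoff2):
    """Counterterm coefficient fixed by the condition (17.46), eq. (17.62)."""
    return -3.0 * I_unsub(0.0, m2, cutoff2) / (32.0 * math.pi**2)


def gamma4_one_loop(s, t, u, lam_r, m2, cutoff2):
    """One-loop four-point function including the counterterm, eq. (17.59)."""
    loops = sum(I_unsub(x, m2, cutoff2) for x in (s, t, u))
    return lam_r - lam_r**2 * loops / (32.0 * math.pi**2) - c2(m2, cutoff2) * lam_r**2


if __name__ == "__main__":
    lam_r, m2 = 0.5, 1.0
    for cutoff2 in (1e6, 1e12):
        # The value at zero momentum stays pinned at lam_r; elsewhere it is cutoff independent.
        print(cutoff2, gamma4_one_loop(0, 0, 0, lam_r, m2, cutoff2),
              gamma4_one_loop(3.0, 1.0, 0.5, lam_r, m2, cutoff2))
```

The second printed column reproduces $\lambda_R$ for any cutoff, and the third agrees with (17.63) evaluated with `I_ren`, which never refers to Λ at all.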
Next, we consider the one-loop subtractions induced by the field renormalization
(17.48) and mass (17.49) counterterms, which require the insertion in our graphs, to
first order, of interaction vertices generated by the associated one-loop counterterm
Lagrangian
\[
\mathcal{L}_{\rm ct,\,1\,loop} = \frac{1}{2} a_1 \lambda_R (\partial_\mu \phi)^2 + \frac{1}{2} b_1 \lambda_R \phi^2 \tag{17.65}
\]
The perturbative expansion of the full propagator Δ̂F (p) through one loop now
consists of the three graphs indicated in Fig. 17.8: in addition to the one-loop self-
energy correction involving the basic vertex of Lbasic , there is a counterterm graph
(Fig. 17.8(c)) in which the operator (17.65) induces a two-point vertex with the
coefficient a1 λR p2 + b1 λR . Thus, we have, through order λR ,

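\[
\hat{\Delta}_{F,1}(p) = \Delta_F(p) + \Delta_F(p)\Pi_1(p)\Delta_F(p) + \Delta_F(p)(a_1 \lambda_R p^2 + b_1 \lambda_R)\Delta_F(p)
= \Delta_F(p) + \Delta_F(p)\Pi_{1,R}(p)\Delta_F(p) \tag{17.66}
\]
\[
\Pi_{1,R}(p) = \Pi_1(p) + a_1 \lambda_R p^2 + b_1 \lambda_R = -\frac{1}{2}\lambda_R \int \frac{1}{l^2 + m_R^2}\frac{d^4 l}{(2\pi)^4} + a_1 \lambda_R p^2 + b_1 \lambda_R \tag{17.67}
\]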

Note that in this case the unsubtracted one-loop self-energy is given by a quadratically
divergent integral which is, however, independent of the external momentum p. We
must now choose the one-loop counterterm coefficients $a_1$, $b_1$ to impose the renormalization
conditions (17.51, 17.52) on the complete one-loop self-energy $\Pi_{1,R}(p)$, including
counterterm contributions. As Π1 (p) is independent of p in this case, we have simply

\[
a_1 = 0 \tag{17.68}
\]
\[
b_1 = \frac{1}{2}\int \frac{1}{l^2 + m_R^2}\frac{d^4 l}{(2\pi)^4} \tag{17.69}
\]

Fig. 17.8 Zeroth order (a), unsubtracted one-loop (b), and counterterm (c) contributions to
the propagator in $\phi^4$ theory, through one-loop order.

Fig. 17.9 Unsubtracted one-loop propagator graph in $\phi^3_6$ theory.
which (in this admittedly very special case) results in the complete cancellation
of the self-energy to one-loop order: Π1,R (p) = 0! More generally, the effect of the
counterterm is clearly just to remove the first two terms in the Taylor expansion in
p2 of the unsubtracted self-energy (which for this special case degenerates to the first
constant term)—namely, the quadratically divergent γ in Fig. 17.8(b):

\[
\Pi_{1,R} = (1 - t_\gamma)\Pi_1 \tag{17.70}
\]
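To make the quadratic divergence and its removal concrete, the momentum-independent integral in (17.67) can be evaluated in closed form once a regulator is chosen. The sketch below assumes a sharp cutoff $|l| < \Lambda$ on the Euclidean four-momentum, for which the angular integration gives $\frac{1}{16\pi^2}[\Lambda^2 - m_R^2 \ln(1 + \Lambda^2/m_R^2)]$; this choice of regulator, like the function names and sample values, is ours and is made purely for illustration.

```python
import math


def tadpole_integral(m2, cutoff):
    """Integral of d^4l/(2 pi)^4 1/(l^2 + m2) over |l| < cutoff.
    ASSUMPTION: sharp Euclidean momentum cutoff (one of many possible regulators)."""
    c2 = cutoff * cutoff
    return (c2 - m2 * math.log(1.0 + c2 / m2)) / (16.0 * math.pi**2)


def tadpole_integral_numeric(m2, cutoff, n=20000):
    """Same integral by midpoint quadrature of the radial form
    (1 / 8 pi^2) * int_0^cutoff l^3 / (l^2 + m2) dl, as a cross-check."""
    h = cutoff / n
    radial = h * sum(((i + 0.5) * h) ** 3 / (((i + 0.5) * h) ** 2 + m2) for i in range(n))
    return radial / (8.0 * math.pi**2)


A1 = 0.0  # eq. (17.68)


def b1(m2, cutoff):
    """Mass counterterm coefficient, eq. (17.69)."""
    return 0.5 * tadpole_integral(m2, cutoff)


def pi1_unsubtracted(p, lam_r, m2, cutoff):
    """First term of (17.67): independent of the external momentum p."""
    return -0.5 * lam_r * tadpole_integral(m2, cutoff)


def pi1_renormalized(p, lam_r, m2, cutoff):
    """Complete one-loop self-energy (17.67), counterterm vertex included."""
    return pi1_unsubtracted(p, lam_r, m2, cutoff) + A1 * lam_r * p * p + b1(m2, cutoff) * lam_r


if __name__ == "__main__":
    for cutoff in (1e2, 1e3, 1e4):
        print(cutoff, pi1_unsubtracted(1.0, 0.5, 1.0, cutoff), pi1_renormalized(1.0, 0.5, 1.0, cutoff))
```

The unsubtracted value grows like $\Lambda^2$, while the renormalized self-energy vanishes identically, in line with (17.68)-(17.70).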

In most theories, the one-loop self-energy $\Pi_1(p)$ will be a non-trivial function of
momentum, and the Taylor operator $t_\gamma$ will, as discussed in the preceding section,
(a) remove the dominant UV-cutoff dependence of the graph, leaving only inverse
power-dependence, and (b) leave a non-zero “renormalized” self-energy $\Pi_{1,R}(p)$, which
contributes non-trivially to the full propagator. For example, in $\phi^3_6$ theory (see
footnote 7), the one-loop propagator graph in Fig. 17.9 corresponds to an unsubtracted self-energy

\[
\Pi_1(p) = \frac{1}{2}\lambda_R^2 \int \frac{1}{(l^2 + m_R^2)((p - l)^2 + m_R^2)}\frac{d^6 l}{(2\pi)^6} \equiv \lambda_R^2 I(p) \tag{17.71}
\]

and the counterterm Lagrangian $a_2 \lambda_R^2 (\partial_\mu \phi)^2 + b_2 \lambda_R^2 \phi^2$ induces a subtraction exactly
equivalent to the Taylor operator tγ , as we must choose, pursuant to (17.51, 17.52),

\[
a_2 = -\frac{1}{2}\left.\frac{\partial^2}{\partial p^2} I(p)\right|_{p=0} \tag{17.72}
\]
\[
b_2 = -I(0) \tag{17.73}
\]

whence

\[
\Pi_{1,R}(p) = \lambda_R^2 (I(p) + a_2 p^2 + b_2) = (1 - t_\gamma)\Pi_1(p) \tag{17.74}
\]

as required. In this case, of course, Π1,R (p) is a non-trivial function of p.
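The action of the two-term Taylor subtraction in (17.72)-(17.74) can be illustrated on a model function. In the sketch below, `make_toy_I` is a hypothetical stand-in for I(p): it is not the six-dimensional integral (17.71), but it mimics the structure expected of a graph with superficial degree of divergence two (a $\Lambda^2$ constant, a $p^2 \ln \Lambda^2$ term, and a cutoff-independent remainder of order $p^4$). The derivative in (17.72) is estimated by a central finite difference.

```python
import math


def make_toy_I(cutoff, m2=1.0):
    """Build a toy I(p) with the cutoff dependence typical of a degree-2 divergence.
    ASSUMPTION: invented model function, not the actual loop integral (17.71)."""
    def I(p):
        p2 = p * p
        return cutoff**2 + p2 * math.log(cutoff**2 / m2) + p2 * math.log(1.0 + p2 / m2)
    return I


def taylor_coefficients(I, h=1e-2):
    """Counterterm coefficients (a2, b2) of eqs. (17.72) and (17.73)."""
    b2 = -I(0.0)
    a2 = -0.5 * (I(h) - 2.0 * I(0.0) + I(-h)) / (h * h)  # finite-difference second derivative
    return a2, b2


def subtracted(I, p, h=1e-2):
    """The bracket I(p) + a2 p^2 + b2 of eq. (17.74)."""
    a2, b2 = taylor_coefficients(I, h)
    return I(p) + a2 * p * p + b2


if __name__ == "__main__":
    for cutoff in (10.0, 100.0):
        I = make_toy_I(cutoff)
        print(cutoff, I(2.0), subtracted(I, 2.0))
```

For this toy function the subtracted value at p = 2 is close to $4 \ln 5$ for either cutoff (up to the small finite-difference error), whereas the unsubtracted values differ by roughly $10^4$.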


To summarize, we have established that the reparameterization of the superficially
divergent Euclidean Green functions of $\phi^4_4$-theory (namely, the two-point propagator
and four-point 2-2 amplitude) in terms of renormalized parameters consistent with the
BPHZ renormalization conditions (17.46, 17.51, 17.52) has resulted in the appearance
of exactly the Taylor subtractions required by the forest formula for these amplitudes,
when computed up to one-loop order. What about all the other one-loop amplitudes
of the theory, with more than four external legs, which are, in virtue of (17.41),
superficially convergent? Such amplitudes may still contain ultraviolet divergences,
through the presence of divergent one-loop 1PI subgraphs. However, the latter are
exactly the two-point and four-point one-loop subgraphs whose UV divergence has
just been shown to be subtracted by the one-loop counterterms generated by the
reparameterization of the theory in terms of $m_R$, $\lambda_R$ (and rescaling of the field). By
Weinberg’s theorem, a superficially convergent graph is UV-finite if all its subdivergences
(in this case, a single one-loop subdivergence) are subtracted, thereby appropriately
lowering the degree of divergence of the subgraph to a negative value. The first step
of our inductive argument is therefore complete.

7 In $\phi^3_6$-theory, as in gauge theories, each additional loop is accompanied by the square of the basic
coupling $\lambda_R$, as the reader may easily verify by examining a few simple graphs.

The inductive step of the renormalization proof proceeds in a familiar fashion: we
assume that it has been established that the subtractions implicit in the Bogoliubov–
Parasiuk recursion formula (17.27, 17.28) (or its explicit solution (17.31)) are exactly
those generated by the counterterms required to implement the renormalization
conditions (17.46, 17.51, 17.52), up to L loops. The reader will recall (cf. Section 10.4) that
reinserting ℏ explicitly in Feynman amplitudes provides a convenient counting device
for loops: an amplitude with L loops is proportional to $\hbar^L$. Consider a contribution of
order $\hbar^{L+1}$ to a 1PI amplitude Γ on the left-hand side of (17.24). The renormalization
parts $\gamma_1, \ldots, \gamma_n$ in (17.27) are proper subgraphs of Γ, therefore of order $\hbar^L$ at most, and
by the induction hypothesis their subtractions correspond to counterterms induced
by the BPHZ renormalization conditions up to loop order L. If Γ is superficially
convergent, we are done, as the $t_\Gamma$ operation in (17.24) is defined to be zero, and the
amplitude is already finite with no further subtractions needed. If Γ is a two- or
four-point function (in $\phi^4_4$ theory), then we must define the as yet unfixed counterterms
$a_{L+1}$, $b_{L+1}$, $c_{L+2}$ to enforce the renormalization conditions, by subtracting off the
constant and quadratic terms in the zero momentum expansion (in the case of the
two-point function, with d(Γ) = 2), or just the zero momentum value for the four-point
functions (with d(Γ) = 0). These counterterms, $a_{L+1}$, $b_{L+1}$, $c_{L+2}$, therefore correspond
exactly to the new order (L + 1)-loop subtraction $t_\Gamma \bar{I}_\Gamma$ in (17.24). There is, as usual,
no better way to convince oneself that nothing is being swept under the rug at this
point than by examining an explicit example: the two-loop self-energy of Fig. 17.3 with
overlapping divergences discussed earlier (see Problem 3). Although our argument has
considered 1PI graphs, we may now extend it to general connected graphs by recalling,
as discussed previously, that a general connected graph is simply the algebraic product
of its 1PI components with connecting propagator factors at fixed momentum, so its
UV-finiteness is assured once the component 1PI pieces are properly subtracted.
The UV-finiteness of amplitudes subtracted according to the recursive Bogoliubov–
Parasiuk procedure (or the explicit forest formula) therefore implies that the amplitudes
of $\phi^4_4$, once reparameterized in terms of renormalized quantities, lose their
dependence on an ultraviolet cutoff Λ, up to the usual inverse power terms of
order $O\left(\frac{m^2}{\Lambda^2}, \frac{k_i^2}{\Lambda^2}\right)$ (times possible powers of logarithms of $\frac{m}{\Lambda}$, $\frac{k_i}{\Lambda}$), which are considered
negligible from the point of view of renormalization theory. Whether they are indeed
so, from a quantitative point of view, depends, of course, on whether the range of
energy scales over which our low-energy field theory remains valid is sufficiently large.
Certainly, any perturbatively renormalizable local field theory which remains valid up
to the scale of Grand Unified Theories ($10^{15}$ GeV) or the Planck scale ($10^{19}$ GeV) will
contain contributions from the “new physics” at the high scale which are completely
negligible, from the point of view of computing amplitudes for processes allowed by
the low-energy theory, even at LHC energies of 10 TeV. Of course, these considerations
are not relevant if we are dealing with processes which are simply forbidden by the
low-energy theory (e.g., proton decay in Grand Unified Theories) and can only occur
by virtue of the new physics appearing at high energy scales.
Once renormalized amplitudes $\Gamma_R^{(n)}(k_1, k_2, \ldots, k_n; m_R, \lambda_R)$ are obtained in the above
BPHZ zero momentum renormalization scheme, up to the desired loop order in
perturbation theory, they may, of course, be re-expressed in terms of other low-energy
parameterizations, which may bear a more direct relation to directly physically mea-
surable quantities. For example, we may prefer to use an “on-shell” renormalization
scheme in which amplitudes are parameterized in terms of the actual physical mass
$m_{ph}$ of our scalar particle (as given by the location of the pole of the full propagator;
cf. Section 9.5) and the on-shell 2-2 scattering amplitude λph at zero (or some other
standard) spatial momentum. In addition, the field rescaling constant Ẑ may be chosen
to adjust the residue of the pole of the full propagator to be unity. These changes
amount to non-linear reparameterizations of the BPHZ parameters mR , λR , Ẑ which,
of course, do not alter the UV-finiteness of the amplitudes: more exactly, this means
that we can re-expand the BPHZ parameters in power series in the new renormalized
coupling λph with coefficients which are finite (up to inverse power corrections, as
usual) as the UV-cutoff Λ is taken much larger than all other scales in the theory.
Alternatively, we may decide to compute our amplitudes ab initio in an on-shell
scheme,8 by an appropriate alteration of the renormalization conditions (17.46, 17.51,
17.52) used to determine the counterterms at each order of perturbation theory.
The zero-momentum renormalization scheme for φ44 -theory can be generalized in
a fairly obvious way by fixing the renormalized coupling λR as the value of the four-
point function Γ(4) (k1 , k2 , k3 , k4 ) (all momenta outgoing) at a non-zero Euclidean
momentum: the most convenient choice is the Euclidean symmetric point, defined by

\[
k_i^2 = \mu^2, \qquad k_i \cdot k_j = -\mu^2/3, \quad i \neq j \tag{17.75}
\]

where the dot-products are determined by symmetry plus momentum conservation
$\sum_{i=1}^{4} k_i = 0$. Similarly, renormalization conditions for the self-energy (17.51, 17.52)
are imposed at $p^2 = \mu^2$ rather than at zero momentum. This scheme (really, a
one-parameter set of schemes, parameterized by the arbitrary scale μ) results in
renormalized Green functions with an explicit dependence on the renormalization scale μ
(exactly as we saw in the toy example at the beginning of this chapter). Nevertheless,
physical S-matrix elements are independent of μ, once they are reparameterized in
terms of measurable low-energy quantities (such as λph , mph of the on-shell scheme,
for example). This means that in order to keep the physics invariant, we must change
λR , mR as a function of the scale μ: in this sense, the renormalization procedure
induces a “running” coupling constant and mass.
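That the constraints (17.75) are mutually consistent can be seen by exhibiting an explicit momentum configuration. One convenient choice (ours, not unique) places the four outgoing momenta at the vertices of a regular tetrahedron embedded in Euclidean four-space; the short sketch below constructs these vectors and evaluates their invariants.

```python
import math


def symmetric_point(mu):
    """Four Euclidean four-momenta satisfying (17.75).
    ASSUMPTION: tetrahedral configuration in the first three components;
    any O(4) rotation of it would serve equally well."""
    s = mu / math.sqrt(3.0)
    directions = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
    return [(s * a, s * b, s * c, 0.0) for a, b, c in directions]


def dot(k, q):
    """Euclidean scalar product of two four-vectors."""
    return sum(a * b for a, b in zip(k, q))


def add(k, q):
    return tuple(a + b for a, b in zip(k, q))


if __name__ == "__main__":
    mu = 2.0
    ks = symmetric_point(mu)
    print([dot(k, k) for k in ks])                 # each k_i^2 = mu^2
    print(dot(ks[0], ks[1]), -mu**2 / 3.0)         # k_i . k_j = -mu^2 / 3
    total = (0.0, 0.0, 0.0, 0.0)
    for k in ks:
        total = add(total, k)
    print(total)                                   # momentum conservation
    pair = add(ks[0], ks[1])
    print(dot(pair, pair), 4.0 * mu**2 / 3.0)      # (k_i + k_j)^2 = 4 mu^2 / 3
```

At this point every two-particle invariant $(k_i + k_j)^2$, $i \neq j$, takes the common value $4\mu^2/3$, which is what makes the choice "symmetric".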

8 The reader should be warned that for massless theories there are difficulties, due to infrared divergences,
in adopting such an on-shell renormalization procedure, as we shall see in Chapter 19.

In the case of gauge theories, the use of a dimensional renormalization scheme,
as described at the end of the preceding section, is almost indispensable, due to
the importance of maintaining the gauge symmetry in the regularized amplitudes,
as we shall discuss in the next section. Here, as for the Euclidean point subtraction
schemes above, the presence of the free dimensionful renormalization scale parameter
μ (cf. Section 16.5) means that this procedure corresponds to a one-parameter class
of renormalization schemes, where, for a given fixed physical theory, the renormalized
parameters mR (μ), λR (μ) will depend on the choice of renormalization scale. Again,
the dimensionally renormalized parameters of the theory mR , λR (in the scalar case)
may be re-expressed, if desired, in terms of “physically defined” mass and coupling
parameters which do not depend at all on the choice of dimensional renormalization
scale μ (any more than the physical scattering amplitudes of a theory computed via
the BPHZ scheme depend on our arbitrary, if technically convenient, choice of a zero-
momentum subtraction point).
The discussion of perturbative renormalizability for scalar theories given above
generalizes without difficulty to other non-gauge theories. There is an important point
which needs further amplification before we can move on to the case of local gauge
theories, however. Perturbative renormalizability of an interacting field theory (in
contrast to the classification of operators by their power dimension) is really a property
of a set of interaction operators,9 not of any given individual term that may appear
in the interaction Lagrangian. A simple example will suffice to illustrate what is at
issue here. Recall the theory of an isotriplet pion field interacting with a Dirac nucleon
doublet via a Yukawa interaction term (previously discussed in the context of global
isospin symmetry and Noether’s theorem, cf. (12.129)), described by the bare (pre-
reparameterization) Lagrangian
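\[
\mathcal{L} = \bar{N}(i\partial\!\!\!/ - M)N + \frac{1}{2}(\partial_\mu \vec{\pi} \cdot \partial^\mu \vec{\pi} - m_\pi^2\, \vec{\pi} \cdot \vec{\pi}) - ig \bar{N}\gamma_5 \vec{\tau} N \cdot \vec{\phi} \tag{17.76}
\]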
The engineering mass dimension of a Dirac fermion field in four spacetime dimensions
is seen to be 3/2, by examining the kinetic term in the Lagrangian, so the Yukawa term
is of dimension 4, hence renormalizable, and we may therefore expect that the theory
is perturbatively renormalizable, according to arguments which parallel those made
above for pure scalar theories. One easily finds, for example (see Problem 4), that a
graph with $E_\psi$ external fermion lines and $E_\phi$ external scalar lines has superficial
degree of divergence
\[
D = 4 - \frac{3}{2}E_\psi - E_\phi \tag{17.77}
\]
so there are evidently only a finite number of types of superficially divergent graphs, as
in $\phi^4$-theory. In particular, once one begins to enumerate the possible renormalization
parts in this theory, one encounters 1PI loop diagrams with four external scalar lines
with superficial degree of divergence zero, as in Fig. 17.10, where the four thick
internal lines refer to Dirac nucleon propagators, each of order 1/(loop momentum),
hence giving rise to a logarithmically divergent loop integral $\sim \int d^4 l / l^4$. The forest
formula implies that counterterms corresponding to the operator $(\vec{\pi} \cdot \vec{\pi})^2$ must be
present to generate the corresponding Taylor subtraction operator needed to remove
the leading cutoff dependence of this diagram. Note that the single required operator
satisfies the global O(3) isospin symmetry of the basic Lagrangian (17.76), provided
the regularization procedure employed to define the individual diagrams also does. In
the case of a global symmetry, such as the isospin symmetry present here, this is not
difficult to manage: even a crude momentum cutoff will suffice, provided the mode
cutoffs on the different components of the scalar and Dirac fields are done identically.
As we shall see in the next section, cutoff procedures capable of maintaining a local
gauge symmetry are much harder to come by: in this case, the spatiotemporal aspect
of the symmetry requires a much more delicate treatment of the momentum modes of
the fields.

9 In Section 17.4 we shall see that this set corresponds to a finite-dimensional low-energy surface in the
infinite-dimensional coupling constant space of Wilsonian effective theory, onto which the renormalization
group flow necessarily contracts, at least in the neighborhood of zero coupling corresponding to formal
perturbation theory.

Fig. 17.10 Fermion loop contribution to π − π scattering amplitude in Yukawa isospin theory
(thick lines are Dirac propagators, thin lines are (amputated) scalar propagators).

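The counting implied by (17.77) is simple enough to tabulate mechanically. The sketch below lists every combination of external legs with $D \geq 0$, under the assumptions (ours) that fermion lines enter and leave in pairs and that graphs with no external legs are ignored. It implements power counting only: symmetries of a particular Lagrangian may force some of the listed amplitudes to vanish, which the code makes no attempt to detect.

```python
from fractions import Fraction


def superficial_degree(e_psi, e_phi):
    """Superficial degree of divergence, eq. (17.77)."""
    return 4 - Fraction(3, 2) * e_psi - e_phi


def divergent_classes(max_legs=8):
    """All (E_psi, E_phi, D) with D >= 0, for even E_psi and at least one external leg.
    ASSUMPTION: pure power counting; no symmetry arguments are applied."""
    classes = []
    for e_psi in range(0, max_legs + 1, 2):   # fermion number conservation: even E_psi
        for e_phi in range(0, max_legs + 1):
            if e_psi + e_phi == 0:
                continue                       # skip vacuum graphs
            d = superficial_degree(e_psi, e_phi)
            if d >= 0:
                classes.append((e_psi, e_phi, int(d)))
    return classes


if __name__ == "__main__":
    for e_psi, e_phi, d in divergent_classes():
        print(f"E_psi={e_psi}  E_phi={e_phi}  D={d}")
```

The list is finite and stops growing once `max_legs` exceeds four, and it contains the entry with four external scalar lines and D = 0 that forces the quartic scalar counterterm discussed above.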
Returning to (17.76), we see that the theory described by this Lagrangian is
not perturbatively renormalizable as it stands, until we include in the interaction
Lagrangian all renormalizable operators (i.e., up to dimension 4) satisfying the global
symmetries of the theory. In the present case, this means that we must include a
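quartic scalar term ab initio in our bare Lagrangian:
\[
\mathcal{L} = \bar{N}(i\partial\!\!\!/ - M)N + \frac{1}{2}(\partial_\mu \vec{\pi} \cdot \partial^\mu \vec{\pi} - m_\pi^2\, \vec{\pi} \cdot \vec{\pi}) - ig \bar{N}\gamma_5 \vec{\tau} N \cdot \vec{\phi} - \frac{\lambda}{4!}(\vec{\pi} \cdot \vec{\pi})^2 \tag{17.78}
\]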
The reparameterization of the bare parameters and fields in the Lagrangian (17.78)
now suffices to generate all the counterterms needed to implement the Taylor sub-
tractions in the BPHZ renormalization scheme for this theory. In particular, the δλ
counterterm arising from the final interaction in (17.78) is needed to remove the
logarithmic divergence of Fig. 17.10. We say that the original Yukawa term, together
with the quartic scalar term, form a perturbatively renormalizable set of operators.
An even simpler example of this phenomenon occurs with the $\phi^3_6$ theory introduced
earlier. Once non-renormalizable operators are excluded, we see from (17.40) that the
renormalization parts of the theory consist of the 1PI diagrams with E = 1, 2, or 3
external lines (there is, of course, no φ → −φ symmetry to exclude odd powers of the
field in this theory), so a naive choice of Lagrangian
\[
\mathcal{L} = \frac{1}{2}(\partial_\mu \phi)^2 - \frac{1}{2} m^2 \phi^2 - \frac{\lambda}{3!}\phi^3 \tag{17.79}
\]
lacks the counterterm needed to remove UV divergences which appear in the “tadpole”
graphs of the theory: i.e., the 1PI diagrams with E = 1 external scalar line, as in
Fig. 17.11.

Fig. 17.11 Tadpole graph appearing in $\phi^3_6$-theory, and its accompanying counterterm.

Instead, we must write
\[
\mathcal{L} = \frac{1}{2}(\partial_\mu \phi)^2 - \delta v\, \phi - \frac{1}{2} m^2 \phi^2 - \frac{\lambda}{3!}\phi^3 \tag{17.80}
\]
and choose the counterterm δv order by order in perturbation theory to remove the
(momentum independent) tadpole terms as they appear at increasing loop order.10 In
this case, the generation of a complete perturbatively renormalizable set of operators
requires the inclusion of the super-renormalizable operator φ. In the next chapter we
shall see that the need for “completing” the set of operators included in the Lagrangian
is intimately related with the “operator mixing” which is characteristic of the behavior
of local operators in an interacting field theory.
The construction of a perturbatively renormalizable theory requires, as we have just
seen, that “enough” operators be included to allow for all necessary counterterms, but
it is also essential not to include even a single non-renormalizable operator in the basic
Lagrangian if we wish to have a theory in which a finite number of reparameterizations
suffice to remove the primary (divergent) cutoff dependence of the amplitudes. As we
mentioned previously, if even a single such operator is included in the Lagrangian,
the divergence counting formula (17.39) implies that renormalization parts appear
with arbitrarily many external lines, simply by considering graphs with arbitrarily
many insertions of the non-renormalizable vertex. Although the Zimmermann forest
formula continues to yield formally UV-finite subtracted amplitudes to all orders of
perturbation theory in such a case, the tγ subtraction operators appearing therein now
implicitly correspond to an infinite series of counterterms present in the reparameterized
Lagrangian: we are now forced to expand the Lagrangian to include essentially
all operators consistent with any global symmetries of the theory, and our amplitudes
will depend on an infinite set of parameters, severely restricting the predictive content
of the theory, at the very least. The restriction of the terms in the Lagrangian to
only that finite set compatible with the given field content, with the imposed global
and local symmetries, and with perturbative renormalizability, may have unintended
consequences: in particular, the appearance of additional accidental symmetries which
are only valid in virtue of the absence of non-renormalizable operators compatible
with the initial imposed symmetries of the theory but not with renormalizability. Such
accidental symmetries play an important role in the Standard Model (see Section 12.5,
(Weinberg, 1995a)).

10 As the tadpole graphs lack momentum dependence, BPHZ subtraction removes them entirely, so we
may simply drop any graph with a tadpole insertion in this scheme.

It may have occurred to the reader at this point that the insertion of an ultraviolet
cutoff in a general clustering, Lorentz-invariant theory, as discussed in Chapter 16
effectively implies that our theory should indeed be described by an effective Wilsonian
Lagrangian in which all operators—super-renormalizable, renormalizable, and non-
renormalizable—appear. And the renormalization group discussion of the role of these
three classes of operator given there suggested that in fact the non-renormalizable
operators are in some sense the least important at low energy. The apparent incon-
sistency between these two points of view—the historically seminal restriction of
acceptable field theories to the (finite) class of perturbatively renormalizable ones,
and the much wider class of effective field theories encompassed in the Wilsonian
approach—will be discussed in Section 17.4, where we shall see that perturbatively
renormalizable theories emerge by evaluating the renormalization group flow of a
general Wilsonian effective Lagrangian with a particular choice of initial conditions
at the ultraviolet end, and utilizing the demonstrable insensitivity of the low-energy
properties (at the level of perturbation theory) to this special choice.

17.3 Renormalization and symmetry


We have seen that the constraint of perturbative renormalizability selects Lagrangians
with a finite but sufficient number of renormalizable and super-renormalizable terms
to absorb the divergent parts of the graphs generated by these terms, while avoiding
the appearance of an infinite set of independent renormalization parts whose diver-
gent parts cannot be absorbed by the counterterms generated by reparameterizing
the coefficients in the original Lagrangian. In some cases, the existence of a finite
perturbatively renormalizable set of operators depends crucially on the existence of a
symmetry restricting the type of operators which may appear in the Lagrangian. In the
preceding section we have already seen an example—the isospin invariant Lagrangian
(17.78), where the O(3) symmetry restricts the possible renormalizable interactions
which can, and indeed must, be included in order to provide counterterms for all
divergent subgraphs of the theory. However, in this case the symmetry is not itself
essential for perturbative renormalizability: any theory with only scalar and Dirac
fermion fields and with interactions of Yukawa type and scalar self-interactions quartic
or less in the scalar fields will be renormalizable, provided all possible renormalizable
(and super-renormalizable) terms are included. The presence of the global O(3)
symmetry serves only to restrict the number of such terms which we must include
646 Scales II: Perturbatively renormalizable field theories

in the Lagrangian, in the example cited above leaving us with only two non-trivial
interaction terms. The situation is completely different once we include vector fields
interpolating for spin-1 particles. In this case, perturbative renormalizability requires
the presence of a local gauge symmetry, and, as we shall see now, the intertwining of
symmetry and renormalization properties becomes much more intricate than in the
global symmetry case.
The new features which appear in the treatment of renormalization for spin-1 fields
can be traced back to the large momentum behavior of the propagator for a canonical
(massive) field of spin-j, which contains a numerator factor of order $k^{2j}$. This leads
to a scalar (j = 0) propagator with a $1/k^2$ falloff at large momentum k, a Dirac
fermion propagator falloff 1/k, but no falloff at all for a massive spin-1 (Euclidean)
propagator $(g^{\mu\nu} - k^\mu k^\nu/m^2)/(k^2 + m^2) \sim k^0$, $k \gg m$. The result is that, although
the engineering dimension of a massive Aμ vector field is the same as that of a scalar
field (namely, mass to the first power in four spacetime dimensions), internal vector
lines contribute an extra factor of +2, compared to scalar lines, to the overall degree
of divergence of any 1PI graph containing them. For a theory of fermions interacting
with vectors by a trilinear coupling $g\bar\psi\gamma^\mu\psi A_\mu$ (with g dimensionless), this means that
the power-counting formula (17.77) for Yukawa theories must be replaced by
$$D = 4 - \frac{3}{2}E_\psi - E_A + 2I_A \tag{17.81}$$
for a graph with Eψ external fermion lines, EA (resp. IA ) external (resp. internal)
vector lines. The positive term +2IA has the usual disastrous consequence: regardless
of the number of external lines possessed by a subgraph, at high enough order
of perturbation theory the subgraph will become divergent, and increasingly so as
we go to even higher orders in the trilinear coupling, by inserting more and more
internal gauge field lines, leading to a non-renormalizable theory with infinitely many
independent renormalization parts.
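The growth of the degree of divergence with the number of internal vector lines can be tabulated directly from (17.81). The following is a minimal sketch: the function names are illustrative and not taken from the text, and the comparison function simply drops the +2IA term, which is what a propagator with scalar-like falloff would give.

```python
def degree_massive_vector(e_psi, e_a, i_a):
    """Superficial degree of divergence from (17.81): D = 4 - (3/2)E_psi - E_A + 2 I_A."""
    return 4 - 1.5 * e_psi - e_a + 2 * i_a


def degree_gauge(e_psi, e_a):
    """Same counting without the +2 I_A term (vector propagator falling like a scalar one)."""
    return 4 - 1.5 * e_psi - e_a


# Fermion self-energy graphs (two external fermion lines, no external vectors)
# with an increasing number of internal vector lines.
for i_a in (1, 2, 3):
    print(i_a, degree_massive_vector(2, 0, i_a), degree_gauge(2, 0))
```

With the +2IA term the degree rises by two for every additional internal vector line, whereas without it the degree is fixed by the external lines alone.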
Of course, we have already seen in Chapter 15 that in a theory with massless
vector particle and an exact local gauge symmetry, there exist choices of gauge in
which the vector propagator has the same falloff, $1/k^2$, as a scalar propagator, thereby
removing the unfortunate +2IA contribution in (17.81). This is the case, for example,
in the covariant ξ-gauges discussed in Section 15.4, where the (Minkowski) gauge field
propagator takes the form (cf. (15.159)):
$$\langle 0|T\{A_{\alpha\mu}(x)A_{\beta\nu}(y)\}|0\rangle = -i\delta_{\alpha\beta}\int \frac{\left(g_{\mu\nu} - (1-\xi)\frac{p_\mu p_\nu}{p^2}\right)e^{-ip\cdot(x-y)}}{p^2 + i\epsilon}\,\frac{d^4p}{(2\pi)^4} \tag{17.82}$$

The equivalence of the theory formulated in Hamiltonian language (and therefore
manifestly unitary) to the manifestly covariant one determined by the (Euclidean)
functional integral (15.161) depends crucially, as we explained in Section 15.4, on
the local gauge invariance of the Lagrangian. Just as in the case of global symmetries,
where Noether’s theorem in functional form leads to a set of Ward–Takahashi identities
constraining the Green functions of the theory, the critical local gauge invariance
here can be re-expressed in a set of Ward identities (in the non-abelian case, more
frequently called Slavnov–Taylor identities), the preservation of which, both by our
regularization and by our reparameterization procedures, will be crucial for ensuring
the perturbative renormalizability of the gauge theory. We shall illustrate the basic
point using an abelian gauge theory—for example, QED—to reduce the algebra to a
manageable level. Thus, we begin with a Euclidean functional integral (recall that no
ghosts are needed in the abelian case) for coupled photons and electrons, with e the
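electron charge,
$$Z[J] = e^{W[J]} = \int \mathcal{D}A_\mu\,\mathcal{D}\psi\,\mathcal{D}\bar\psi\; e^{-\int\left(\mathcal{L}_E + \frac{1}{2\xi}(\partial_\mu A_\mu)^2 - J_\mu A_\mu\right)d^4x} \tag{17.83}$$
$$\mathcal{L}_E = \frac{1}{4}(F_{\mu\nu})^2 - \bar\psi(\not{\partial} - m)\psi - ie\bar\psi\not{A}\psi \tag{17.84}$$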
We have not included source terms for the fermion fields, as we shall be consid-
ering only Green functions with external gauge field lines in the following. The
functional integral (17.83) is invariant under a change of variable of integration
Aμ (x) → Aμ (x) + ∂μ λ(x), with λ(x) infinitesimal; and as this also corresponds to a
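local gauge transformation leaving LE invariant, we have, to first order in λ:
$$0 = \int \mathcal{D}A_\mu\,\mathcal{D}\psi\,\mathcal{D}\bar\psi \int\left[\frac{1}{\xi}\partial_\mu A_\mu(x)\,\Box\lambda(x) - J_\mu\partial_\mu\lambda(x)\right]d^4x\; e^{-\int\left(\mathcal{L}_E + \frac{1}{2\xi}(\partial_\mu A_\mu)^2 - J_\mu A_\mu\right)d^4x} \tag{17.85}$$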
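Defining $\omega(x) \equiv \Box\lambda(x)$, this becomes
$$0 = \int \mathcal{D}A_\mu\,\mathcal{D}\psi\,\mathcal{D}\bar\psi \int\left[\frac{1}{\xi}\partial_\mu A_\mu(x) + \frac{1}{\Box}\partial_\mu J_\mu(x)\right]\omega(x)\,d^4x\; e^{-\int\left(\mathcal{L}_E + \frac{1}{2\xi}(\partial_\mu A_\mu)^2 - J_\mu A_\mu\right)d^4x} \tag{17.86}$$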
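whence, taking the functional derivative with respect to ω(x), we find
$$0 = \int \mathcal{D}A_\mu\,\mathcal{D}\psi\,\mathcal{D}\bar\psi \left[\frac{1}{\xi}\partial_\mu A_\mu(x) + \frac{1}{\Box}\partial_\mu J_\mu(x)\right] e^{-\int\left(\mathcal{L}_E + \frac{1}{2\xi}(\partial_\mu A_\mu)^2 - J_\mu A_\mu\right)d^4x} \tag{17.87}$$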
Factors of Aμ (x) within the functional integral may be replaced by functional deriva-
tives with respect to the source Jμ (x) acting on the generating functional Z[J], so our
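result (17.87) can be rewritten as a functional differential equation for Z[J],
$$\frac{1}{\xi}\partial_\mu\frac{\delta Z[J]}{\delta J_\mu(x)} + \frac{1}{\Box}\partial_\mu J_\mu(x)\,Z[J] = 0 \tag{17.88}$$
or better, for the generating functional of connected Green functions, W[J],
$$\frac{1}{\xi}\partial_\mu\frac{\delta W[J]}{\delta J_\mu(x)} = -\frac{1}{\Box}\partial_\mu J_\mu(x) \tag{17.89}$$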
As we have seen repeatedly in the preceding discussion of renormalization, the 1PI
graphs of the theory play a primary role: they are, as it were, the “atomic” constituents
in terms of which the divergence structure of the theory is most conveniently analyzed.
Consequently, it is preferable to re-express the constraint (17.89) in terms of the
generating functional Γ[Aμ ] which generates the 1PI graphs of the theory. In this
case, it is defined by the Legendre transformation (cf. (10.140))

Γ[A] = −W [J] + Jμ Aμ (17.90)

where the source variable Jμ and classical field variable Aμ satisfy


$$J_\mu(x) = \frac{\delta\Gamma[A]}{\delta A_\mu} \tag{17.91}$$
$$A_\mu = \frac{\delta W[J]}{\delta J_\mu} \tag{17.92}$$
Inserting these relations in (17.89) we obtain, without any further ado,
$$\frac{1}{\xi}\partial_\mu A_\mu(x) = -\frac{1}{\Box}\partial_\mu\frac{\delta\Gamma[A]}{\delta A_\mu(x)} \tag{17.93}$$
This seemingly innocent equation contains a wealth of information about the n-point
1PI functions of the gauge field, as we recover statements about these simply by
taking the appropriate number of functional derivatives of (17.93) with respect to Aμ .
In particular, if we differentiate once with respect to Aν (y), we find

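$$\frac{1}{\xi}\frac{\partial}{\partial x_\nu}\delta^4(x - y) = -\frac{1}{\Box_x}\frac{\partial}{\partial x_\mu}\frac{\delta^2\Gamma[A]}{\delta A_\mu(x)\,\delta A_\nu(y)} \tag{17.94}$$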
Recalling that the second derivative of Γ yields the (full) inverse propagator of the
theory, we may write down immediately the Fourier transform of (17.94), with the
obvious translations $\partial_\mu \to ip_\mu$, $\Box \to -p^2$,
$$\frac{1}{\xi}p_\nu = \frac{1}{p^2}\,p_\mu\hat\Delta^{-1}_{F\,\mu\nu}(p) \tag{17.95}$$
The full gauge field (i.e., photon) propagator can be obtained, just as in (17.50), as
an iteration of free propagators interspersed with self-energy corrections. The only
difference is that here the propagator and self-energy carry two vector indices, so the
products in the iteration are matrix products. Consequently,
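$$\hat\Delta^{-1}_{F\,\mu\nu}(p) = \Delta^{-1}_{F\,\mu\nu}(p) - \Pi_{\mu\nu}(p) = p^2\left(\delta_{\mu\nu} + \left(\frac{1}{\xi} - 1\right)\frac{p_\mu p_\nu}{p^2}\right) - \Pi_{\mu\nu}(p) \tag{17.96}$$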
where the first term on the right-hand side is the inverse of the free Euclidean
propagator (cf. (17.82)) in the covariant ξ-gauge, and the self-energy Πμν (p) is given
in the bare (i.e., pre-reparameterization) theory by graphs such as those indicated in
Fig. 17.12. In particular, the one-loop graph of Fig. 17.12 is given by

$$\Pi_{1\,\mu\nu}(p) = e^2\int\frac{\mathrm{tr}\left[\hat\gamma_\mu(i\not{l} + m)\hat\gamma_\nu(i(\not{p} + \not{l}) + m)\right]}{(l^2 + m^2)((p + l)^2 + m^2)}\,\frac{d^4l}{(2\pi)^4} \tag{17.97}$$
where the γ̂μ are the Euclidean γ-matrices defined in (15.160). Using the appropriate
Euclidean trace identities (see Problem 5), this loop integral becomes

$$\Pi_{1\,\mu\nu}(p) = 4e^2\int\frac{\delta_{\mu\nu}(m^2 + l\cdot(p + l)) - l_\mu(p + l)_\nu - l_\nu(p + l)_\mu}{(l^2 + m^2)((p + l)^2 + m^2)}\,\frac{d^4l}{(2\pi)^4} \tag{17.98}$$

Fig. 17.12 One-loop photon self-energy graph (external photon momentum p, internal fermion line carrying momentum p + l).

which is at first sight quadratically divergent, in accordance with the power-counting
rule (17.81). Now the Ward identity (17.95) evidently implies, on inserting (17.96),

pμ Πμν (p) = 0 (17.99)

as the left-hand side of (17.95) is already given by just the free propagator part of
(17.96). In other words, the local gauge symmetry embodied by the Ward identities of
the theory requires the photon self-energy tensor to be transverse. This transversality
must, of course, hold at each loop order in perturbation theory, as we may formally
expand Πμν in powers of ħ and require that (17.99) hold at each order. Any regularization
which violates this property will ipso facto do violence to the local gauge
symmetry, and, as we shall now see, destroy the perturbative renormalizability of the
theory. Let us examine this issue explicitly for the one-loop integral (17.98). Inserting
a Feynman parameter in the usual way (cf. (16.67)), this becomes
$$\Pi_{1\,\mu\nu}(p) = 4e^2\int_0^1 dx\int\frac{\delta_{\mu\nu}m^2 - 2l_\mu l_\nu + 2x(1 - x)p_\mu p_\nu + \delta_{\mu\nu}(l^2 - x(1 - x)p^2)}{(l^2 + x(1 - x)p^2 + m^2)^2}\,\frac{d^4l}{(2\pi)^4} \tag{17.100}$$
If we regularize the integral simply by imposing a momentum cutoff |l| < Λ, the
integral has a leading contribution for Λ >> p, m given by

$$\Pi_{1\,\mu\nu}(p) \sim \frac{e^2}{8\pi^2}\Lambda^2\delta_{\mu\nu} \tag{17.101}$$

which clearly does not satisfy (17.99). Such a divergence, were it truly present in the
theory, could only be removed by a counterterm corresponding to an explicit photon
mass term in the Lagrangian $\delta m^2 A_\mu A_\mu$, which of course also violates the local gauge
invariance of the theory.
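The algebra linking the Ward identity (17.95), the inverse propagator (17.96), and the transversality condition (17.99) can be checked numerically. The sketch below is illustrative only: the momentum, the gauge parameter, and the coefficient 0.1 are arbitrary made-up values, and the function names are not taken from the text. It compares a transverse self-energy with a term proportional to δμν of the kind produced by the momentum cutoff in (17.101).

```python
import numpy as np


def free_inverse_propagator(p, xi):
    """Inverse free Euclidean photon propagator in covariant xi-gauge, first term of (17.96)."""
    p2 = p @ p
    return p2 * (np.eye(4) + (1.0 / xi - 1.0) * np.outer(p, p) / p2)


def transverse_self_energy(p, scalar):
    """Self-energy of the transverse form (p_mu p_nu - delta_mu_nu p^2) * scalar."""
    return (np.outer(p, p) - np.eye(4) * (p @ p)) * scalar


def ward_residual(p, xi, self_energy):
    """p_mu Dhat^{-1}_{mu nu} / p^2 - p_nu / xi, which (17.95) requires to vanish."""
    full_inverse = free_inverse_propagator(p, xi) - self_energy
    return p @ full_inverse / (p @ p) - p / xi


p = np.array([0.3, -1.2, 0.7, 2.0])  # arbitrary Euclidean momentum (assumption)
xi = 0.5                             # arbitrary gauge parameter (assumption)

good = transverse_self_energy(p, scalar=0.1)  # transverse, as required by (17.99)
bad = 0.1 * np.eye(4)                         # delta_mu_nu term, as in (17.101)

print("transverse self-energy residual:", np.abs(ward_residual(p, xi, good)).max())
print("delta_mu_nu self-energy residual:", np.abs(ward_residual(p, xi, bad)).max())
```

The free inverse propagator alone saturates (17.95), so any self-energy with a non-vanishing contraction against pμ leaves a non-zero residual.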
On the other hand, dimensional regularization of the integral maintains the desired
transversality, as we might expect (or at least hope!), given that the formal structure
(and hence, the propagators and vertices) of a locally gauge-invariant Lagrangian does
not depend on the spacetime dimension in which it is formulated. Making the standard
replacement

$$\frac{d^4l}{(2\pi)^4} \to \mu^{4-d}\frac{d^dl}{(2\pi)^d} \tag{17.102}$$
in (17.100), with μ a regularization scale needed to maintain dimensional consistency of
the expression while we are away from four dimensions, and following steps completely
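analogous to those leading from (16.62) to (16.65) (see Problem 6), we find
$$\Pi_{1\,\mu\nu}(p) = 8e^2\frac{\pi^{d/2}}{(2\pi)^d}(p_\mu p_\nu - \delta_{\mu\nu}p^2)\,\Gamma\!\left(2 - \frac{d}{2}\right)\int_0^1 x(1 - x)\left(\frac{x(1 - x)p^2 + m^2}{\mu^2}\right)^{\frac{d}{2} - 2}dx \tag{17.103}$$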
which manifestly satisfies the transversality condition (17.99). The result is, of
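course, singular as we return to four dimensions, $d = 4 - \epsilon$, $\epsilon \to 0$, with the pole part
given by
$$t^\gamma_{DR}\Pi_{1\,\mu\nu}(p) = 8e^2\frac{\pi^2}{(2\pi)^4}(p_\mu p_\nu - \delta_{\mu\nu}p^2)\,\frac{2}{\epsilon}\int_0^1 x(1 - x)\,dx = \frac{e^2}{6\pi^2}\frac{1}{\epsilon}(p_\mu p_\nu - \delta_{\mu\nu}p^2) \tag{17.104}$$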
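The numerical coefficient of the pole part in (17.104) can be verified with a few lines of arithmetic. This is a minimal sketch that assumes the standard small-argument behavior of the gamma function, Γ(ε/2) ≈ 2/ε − γ with γ the Euler constant; the variable names and the step count of the midpoint rule are illustrative choices.

```python
import math

EULER_GAMMA = 0.5772156649015329


def feynman_weight_integral(n_steps=100_000):
    """Midpoint-rule estimate of the integral of x(1 - x) over [0, 1]."""
    h = 1.0 / n_steps
    return h * sum((i + 0.5) * h * (1.0 - (i + 0.5) * h) for i in range(n_steps))


weight_integral = feynman_weight_integral()

# Coefficient of e^2 (1/eps) (p_mu p_nu - delta_mu_nu p^2):
# prefactor 8 pi^(d/2) / (2 pi)^d at d = 4, times 2 from Gamma(eps/2) ~ 2/eps.
pole_coefficient = 8.0 * math.pi**2 / (2.0 * math.pi)**4 * 2.0 * weight_integral

# Remainder of Gamma(eps/2) after removing the pole; tends to -EULER_GAMMA as eps -> 0.
eps = 1e-4
gamma_remainder = math.gamma(eps / 2.0) - 2.0 / eps

print(weight_integral, 1.0 / 6.0)
print(pole_coefficient, 1.0 / (6.0 * math.pi**2))
print(gamma_remainder, -EULER_GAMMA)
```

The same Euler constant reappears in the finite part left behind once the pole is subtracted.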
We have introduced the notation tγDR for the dimensional subtraction operator on
the renormalization part γ—in this case simply the one-loop graph of Fig. 17.12,
which replaces the zero momentum Taylor operators of the BPHZ scheme when we
follow a dimensional renormalization procedure as described in the preceding section.
The critical point is that the subtraction operator amounts to exactly a counterterm
$\delta Z \times \frac{1}{4}(F_{\mu\nu})^2$, which inserted to first order in the photon propagator clearly produces
(Problem 7), in momentum space, just the transverse tensor seen in (17.104). But
such a counterterm simply amounts to a rescaling of the photon field Aμ in our bare
Lagrangian (17.84): in the more conventional (but somewhat misleading) language of
renormalization theory, to a “wavefunction renormalization”. Taking into account the
subtraction effected by the tγDR pole part operator, we are left with the renormalized
one-loop self-energy (taking the spacetime dimension back to the physical value)

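$$\Pi_{1,R\,\mu\nu}(p) = \lim_{\epsilon\to 0}(1 - t^\gamma_{DR})\Pi_{1\,\mu\nu}(p) \tag{17.105}$$
$$= -\frac{e^2}{2\pi^2}(p_\mu p_\nu - \delta_{\mu\nu}p^2)\int_0^1 x(1 - x)\left(\gamma + \ln\left(\frac{x(1 - x)p^2 + m^2}{4\pi\mu^2}\right)\right)dx \tag{17.106}$$
$$= (p_\mu p_\nu - \delta_{\mu\nu}p^2)\,\Pi_{1,R}(p^2) \tag{17.107}$$
$$\Pi_{1,R}(p^2) \equiv -\frac{e^2}{2\pi^2}\int_0^1 x(1 - x)\left(\gamma + \ln\left(\frac{x(1 - x)p^2 + m^2}{4\pi\mu^2}\right)\right)dx \tag{17.108}$$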

Note that the dependence on the arbitrary renormalization scale μ (which plays the
role of the UV cutoff in a momentum cutoff scheme) is logarithmic, not quadratic. In
effect, the need to produce the transverse tensor pμ pν − δμν p2 outside the loop integral
has reduced the quadratic divergence of the loop integral (17.98) to a logarithmic one.
Inserting the renormalized self-energy in the expression (17.96) for the inverse full
photon propagator, and inverting once again, we find, for the renormalized Euclidean
photon propagator (to one loop),
$$\hat\Delta_{F\,\mu\nu}(p) = \frac{\delta_{\mu\nu} - \frac{p_\mu p_\nu}{p^2}}{p^2(1 - \Pi_{1,R}(p^2))} + \xi\frac{p_\mu p_\nu}{p^4} \tag{17.109}$$

The radiative corrections have induced a change only in the transverse (“physical”)
part of the propagator, but the propagator pole is still at $p^2 = 0$ (note that $\Pi_{1,R}(0)$ is
finite and non-zero): our physical photon mass is still safely zero. The gauge-variant
part of the propagator retains exactly its lowest-order, tree-level value—a property
which can be shown to persist at higher orders of perturbation, thanks to the Ward
identity. Of course, the residue at the pole—the LSZ “Z” constant—is not unity (it is,
in this renormalization scheme, just $1/(1 - \Pi_{1,R}(0))$), and must therefore be included
in the LSZ formula (9.179) when S-matrix elements are computed.
Similar improvements in the degree of divergence relative to the naive power-
counting rule (17.81) occur in other dimensionally subtracted Green functions of the
theory, all with the net result of limiting the renormalization parts of the theory to
just those which correspond to the counterterms available by reparameterizing only
the couplings and fields present in our initially locally gauge-invariant Lagrangian
(17.84). For example, the analog of Fig. 17.10, the one-loop contribution to light by
light (elastic 2-2 photon) scattering arising from an internal electron loop, would seem
at first glance to be logarithmically divergent, as in the Yukawa case, and therefore
to require the presence of a quartic counterterm $\delta\lambda(A_\mu A_\mu)^2$. Such a term, of course,
destroys local gauge invariance, and once admitted, would then lead to the need for a
photon mass counterterm as well, massive photon propagators, and the re-emergence
of the fatal +2IA contribution in (17.81). In fact, the Ward identity obtained by
differentiating (17.94) a further three times with respect to the classical photon field
A amounts to the statement that the 1PI four-point photon Green function of the
theory is divergence-less on any of its four spacetime indices, or in momentum space:
$$p_{1\mu}\Gamma^{(4)}_{\mu\nu\rho\sigma}(p_1, p_2, p_3, p_4) = p_{2\nu}\Gamma^{(4)}_{\mu\nu\rho\sigma}(p_1, p_2, p_3, p_4) = \ldots = 0 \tag{17.110}$$
The transversality, just as in the case of the two-point function, is realized by the
appearance of four transverse tensors on each of the external legs of the diagram,
which then reduces the effective degree of divergence (in this case, by an astonishing
four powers, although in the present circumstance one would suffice!) to a negative
value. The diagram is therefore finite, and no counterterm is in fact needed.
A complete demonstration of perturbative renormalizability for a general non-
abelian gauge theory,11 quantized in a covariant ξ-gauge and defined by the functional
integral (15.161), requires the systematic application of the appropriate generalizations
of our simple Ward identity (17.94) for the general 1PI Green functions of the theory—
involving gauge field, and fermion and ghost fields. These “Slavnov–Taylor” identities
can then be shown to imply that the only renormalization parts arising to arbitrary
orders of perturbation theory indeed correspond to counterterms associated with
reparameterization of the original Lagrangian. The argument is somewhat lengthy

11 It should be confessed at this point that the addition of a photon mass term in the abelian case does
not in fact ruin perturbative renormalizability: the additional term induced in the Ward identity (basically,
one displaces the gauge parameter $\frac{1}{\xi}$ by $\frac{m_A^2}{\Box}$) does not alter the transversality of the multi-photon 1PI
amplitudes needed to exclude quartic non-gauge-invariant counterterms, although a mass counterterm is
now necessary. In the non-abelian case, however, the non-linear gauge couplings result in the generation
of an infinite number of non-gauge-invariant counterterms once an explicit mass term is included for the
gauge fields.

and will not be reproduced here, as its conceptual essence is already visible in the
abelian examples given above.12
We have seen that the explicit breaking of global symmetries of the Lagrangian does
not in general alter the renormalizability status of the theory: typically, one simply
has a larger number of renormalizable and super-renormalizable counterterms which
must be included in the Lagrangian to absorb divergences in multi-loop diagrams.
The same is true of spontaneously broken symmetries, which from a Lagrangian
point of view can be thought of as explicitly broken theories with special relations
between the Lagrangian parameters (cf. Section 8.4). Evidently, the renormalizability
of gauge theories (especially in the non-abelian case) is much more intimately con-
nected with the underlying local symmetry, and relies on the absence of any explicit
symmetry-breaking terms, even if their engineering dimension places them in the
category of (superficially) renormalizable or super-renormalizable operators.
Nevertheless, the perturbative renormalizability of a non-abelian gauge theory is
unaffected by spontaneous breaking of the remnant discrete gauge symmetry (after
gauge-fixing; cf. Section 15.6), which results in the appearance of physically massive
gauge vector particles via the Higgs phenomenon. We saw in Chapter 14 that the
spontaneous breakdown of a symmetry is quintessentially a long-distance (therefore,
low-energy) phenomenon, so it should not be surprising that the short-distance scaling
behavior of the theory is basically unaffected by the choice of vacuum state entailed by
the field shifts used to implement the effects of the spontaneous breakdown. Once a per-
turbatively renormalizable theory is reparameterized in terms of a (finite!) set of well-
defined low-energy parameters (and appropriately rescaled fields), the Lagrangian,
and hence the associated Hamiltonian defines a dynamical evolution in the state space
which is (order by order in perturbation theory, of course) insensitive at the power level
to the UV cutoff in the theory. This remains true whether the Hamiltonian is applied to
states obtained by applying field operators to the “false” non-ground-state “vacuum”
of the unshifted, symmetric theory, or to the physically relevant states built on the
true ground-state vacuum of the theory. Indeed, we have already seen in Section 15.6
that by exploiting the underlying exact local gauge symmetry of a spontaneously
broken gauge theory (abelian or non-abelian) one may derive covariant Feynman rules
in which the massive vector propagators have the soft $1/k^2$ falloff essential for the
normal alignment of renormalizable (resp. non-renormalizable) operators with mass
dimension 4 (resp. > 4). The non-abelian Slavnov–Taylor identities do the remaining
job, just as in the unbroken case, of restricting the divergent counterterms to just
those associated with reparameterization of the parameters of the original Lagrangian,
which, we recall, is manifestly locally gauge-symmetric before this property is disguised
by the field shifts employed to display the spontaneous breaking.

17.4 Renormalization group approach to renormalizability


The property of renormalizability—otherwise stated, the assertion that the complete
dynamics of a local quantum field theory can be expressed in terms of a finite number

12 The interested reader will find the complete proof in (for example) Section 12.4 of (Itzykson and Zuber,
1980).
of parameters defined in terms of the low-energy (or large-distance) properties of
the theory—appears at first sight quite astonishing in the light of our discussion
of Wilsonian effective Lagrangians in Sections 16.2 and 16.3. There, in view of the
inevitable breakdown of Minkowski-based field theories at short distance due to (at
the very least) quantum gravity effects, we insisted that any physically sensible theory
should include a cutoff at some high-energy/momentum scale, and that the necessary
result of such a cutoff was the appearance of an infinite number of operators, including
the baleful non-renormalizable ones, in the Lagrangian density defining the dynamics
of the cutoff theory. The cutoff μ used is a matter of convenience, as the resultant
effective Lagrangian Lμ is defined to yield exactly the same low-energy (i.e., sub-μ)
physics irrespective of the choice of μ; but we saw in Section 16.3 that even if the coef-
ficient of a particular operator is fixed to zero at some scale, it will no longer be so at
lower scales, due to the renormalization group flow in the infinite-dimensional coupling
constant space of a cutoff theory. Thus, the whole notion of working with a Lagrangian,
with fields cutoff at some value Λ, but only a finite number of operators (the super-
renormalizable and renormalizable ones) with non-zero couplings, and maintaining
zero couplings for the (infinitely many) non-renormalizable operators as the cutoff
Λ is progressively increased, seems at first sight incomprehensible from the Wilsonian
point of view. Our object in this section is quite simply to explain the resolution of
this apparent paradox, along the lines originally followed by Polchinski (Polchinski,
1984). Our discussion will make clear that the specific procedure of perturbative
renormalization developed in the preceding sections of this chapter amounts really
to just one special (though technically very convenient!) application of the much more
general idea of a Wilsonian effective Lagrangian introduced in the preceding chapter.
We recall that a local scalar field theory with field modes above some momentum
scale Λ integrated out is described by an effective Lagrangian (cf. (16.25)):

$$\mathcal{L}_\Lambda = \sum_n a_n(\Lambda)\,O_n(\phi_\Lambda) \tag{17.111}$$

We have slightly altered notation here, by labeling all operators (irrespective of the
number of spacetime-derivatives) by a single index n: thus, in addition to simple
powers of the field, the kinetic operator is included in the list, together with all other
operators with two, four, six, etc., spacetime-derivatives. The field φΛ (x) (cf. (16.4))
only contains Fourier modes of the field with Euclidean momentum |k| < Λ. Counting
powers of mass dimension, we find that if the operator On has mass dimension 4 − dn ,
the associated coefficient an has mass dimension dn , and we may define (cf. (16.8))
dimensionless couplings gn (Λ) by the simple scaling

$$g_n(\Lambda) \equiv a_n(\Lambda)\,\Lambda^{-d_n} \tag{17.112}$$

The scale Λ is, as previously stated, arbitrary, and we may imagine fixing the
dimensionless couplings at some fixed very high ultraviolet scale ΛUV (much higher
than the physics we wish to explore, but safely smaller than the scale of quantum
gravity effects, say),

ḡn ≡ gn (ΛU V ) (17.113)



and then using the non-linear first-order evolution equations (16.27) describing the
renormalization group flow of the effective Lagrangian to determine the dimensionless
couplings at any lower scale μ < ΛUV , as a function of the dimensionless ratio μ/ΛUV
and the initial high-energy parameters ḡn :


$$\mu\frac{\partial}{\partial\mu}g_n(\mu) = \beta_n(g_n(\mu)) \;\Rightarrow\; g_n(\mu) = g_n(\bar g_n; \mu/\Lambda_{UV}) \tag{17.114}$$

In Section 16.3 we divided the set of local operators On into the relevant operators
corresponding to dn > 0, the marginal operators with dn = 0, and the irrelevant
operators with dn < 0 (corresponding, respectively, to the classification into super-
renormalizable, renormalizable, and non-renormalizable operators in the language of
perturbative renormalization). The relevant and marginal operators correspond to
operators (in spacetime dimension 4) with mass dimension less than or equal to 4,
and therefore constitute a finite set: let there be N of these (the actual number may
depend on the type of fields and interactions, and imposed symmetries, of course). We
shall distinguish the operators in this finite set by writing small Roman indices a, b,
etc., and indicate the irrelevant operators of dimension greater than 4 (of which there
are an infinite number) by Greek letters α, β, etc., reserving later Roman characters
m, n, r, etc., for the general set of operators. In Section 16.3 we also saw in some simple
examples that the dominant effect of integrating out non-renormalizable operators
between a high UV cutoff scale and a low-energy scale (up to small corrections
involving inverse powers of the large ultraviolet scale) was to produce modifications,
potentially of order unity, in the couplings associated with marginal and relevant
operators. The point of the following discussion is to reproduce this result in a much
more general context. Our derivation will follow the streamlined approach described
by Weinberg (Weinberg, 1995a), based on Polchinski’s original arguments (Polchinski,
1984).
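The classification into relevant, marginal, and irrelevant operators can be made concrete with a short dimension count. The sketch below assumes the usual four-dimensional engineering dimensions (1 for a scalar field, 3/2 for a spin-1/2 field, 1 for each spacetime derivative); the function names and the list of sample operators are illustrative, not taken from the text.

```python
def coupling_dimension(n_scalar=0, n_fermion=0, n_derivative=0):
    """Return d_n = 4 - (mass dimension of the operator) in four spacetime dimensions."""
    operator_dimension = 1.0 * n_scalar + 1.5 * n_fermion + 1.0 * n_derivative
    return 4.0 - operator_dimension


def classify(d_n):
    """Relevant for d_n > 0, marginal for d_n = 0, irrelevant for d_n < 0."""
    if d_n > 0:
        return "relevant"
    if d_n == 0:
        return "marginal"
    return "irrelevant"


examples = {
    "phi^2": coupling_dimension(n_scalar=2),
    "phi^4": coupling_dimension(n_scalar=4),
    "phi^6": coupling_dimension(n_scalar=6),
    "(d phi)^2": coupling_dimension(n_scalar=2, n_derivative=2),
    "Yukawa psibar psi phi": coupling_dimension(n_scalar=1, n_fermion=2),
    "four-fermion": coupling_dimension(n_fermion=4),
}

for name, d_n in examples.items():
    print(f"{name:>22}: d_n = {d_n:+.1f} -> {classify(d_n)}")
```

Only finitely many field and derivative combinations have operator dimension at most 4, which is why the relevant and marginal set is finite.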
We shall demonstrate that in the regime of weakly coupled perturbation theory,
the renormalization group flow implied by (17.114) maps an arbitrary initial surface
S̄ in the high-energy coupling constant space of the {ḡn } to an N-dimensional surface
S of the {gn (μ)}, a given point on which is uniquely determined by specifying N
low-energy parameters, up to corrections which fall as inverse whole powers of the
ratio of the UV cutoff ΛUV to the low-energy scale μ. The demonstration relies on a
linear stability analysis familiar in the treatment of non-linear dynamical systems. We
first consider the effect of a small (infinitesimal) change δgn in the parameters on the
renormalization flow generated by (17.114). Defining

$$M_{nm}(g_n) \equiv \frac{\partial\beta_n}{\partial g_m} \tag{17.115}$$

we have, to first order in the δgn ,


$$\mu\frac{\partial}{\partial\mu}\delta g_n(\mu) = M_{nm}\,\delta g_m(\mu) \tag{17.116}$$

We may also define a matrix Gnm expressing the variation of the low-energy parame-
ters under variation of the initial parameters ḡn (see (17.114)):
$$G_{nm}(\mu) \equiv \frac{\partial g_n}{\partial\bar g_m} \tag{17.117}$$
Differentiating (17.114) with respect to the initial parameters, one finds also

$$\mu\frac{\partial}{\partial\mu}G_{nm} = M_{nr}G_{rm} \tag{17.118}$$
We shall assume that the finite N×N submatrix $G_{ab}$ with rows and columns restricted
to the marginal and relevant couplings is not singular, with well-defined inverse $G^{-1}_{ab}$,
which is presumably the case with the exception of perhaps an isolated set of measure
zero in the coupling constant space, which we assume our renormalization group flow
avoids. By usual matrix algebra, one has
$$\mu\frac{\partial}{\partial\mu}G^{-1}_{ab} = -G^{-1}_{ac}\left(\mu\frac{\partial}{\partial\mu}G_{cd}\right)G^{-1}_{db} = -G^{-1}_{ac}M_{cn}G_{nd}G^{-1}_{db} \tag{17.119}$$

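Note that the final pair of matrices $G_{nd}G^{-1}_{db}$ appearing here cannot (for general n) be
collapsed to a Kronecker δ, as the summed index d only runs over a partial subset
of the couplings. We now introduce a projected set of variations $\widehat{\delta g}_\alpha$ in the irrelevant
(non-renormalizable) couplings:
$$\widehat{\delta g}_\alpha(\mu) \equiv \delta g_\alpha(\mu) - G_{\alpha a}G^{-1}_{ab}\,\delta g_b(\mu) \tag{17.120}$$
Effectively, as we shall soon see, $\widehat{\delta g}_\alpha$ measures the extent to which variations in the
(infinitely many) non-renormalizable couplings cannot be compensated for by variation
in the N marginal/relevant couplings. The flow equation for the $\widehat{\delta g}_\alpha$ follows
directly from (17.116, 17.118, 17.119), and we find
$$\mu\frac{\partial}{\partial\mu}\widehat{\delta g}_\alpha(\mu) = M_{\alpha n}\delta g_n - M_{\alpha n}G_{na}G^{-1}_{ab}\delta g_b$$
$$\qquad + G_{\alpha a}G^{-1}_{ac}M_{cn}G_{nd}G^{-1}_{db}\delta g_b - G_{\alpha a}G^{-1}_{ab}M_{bn}\delta g_n$$
$$= M_{\alpha\beta}\delta g_\beta - M_{\alpha\beta}G_{\beta a}G^{-1}_{ab}\delta g_b$$
$$\qquad + G_{\alpha a}G^{-1}_{ac}M_{c\beta}G_{\beta d}G^{-1}_{db}\delta g_b - G_{\alpha a}G^{-1}_{ab}M_{b\beta}\delta g_\beta$$
$$= (M_{\alpha\beta} - G_{\alpha a}G^{-1}_{ab}M_{b\beta})(\delta g_\beta - G_{\beta c}G^{-1}_{cd}\delta g_d) \tag{17.121}$$
$$\equiv \hat M_{\alpha\beta}\,\widehat{\delta g}_\beta \tag{17.122}$$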

The equivalence of the first and third lines follows from the fact that for values of the
index n in the marginal/relevant subset, we have Gna G−1 ab = δnb , and the two terms
on the first line then cancel: hence, the sum over n may be restricted to the non-
renormalizable set labeled by β in the third line. A similar argument establishes the
equivalence of the second and fourth lines, at which point (17.121) follows with some
straightforward shuffling of indices. In the free field limit (all couplings corresponding
to higher than quadratic operators set to zero) there are no loop integrals, and the
entire scale dependence of the dimensionless parameters is due to the rescaling by
engineering dimension in (17.112), and we therefore have

$$M_{nm} \sim \hat M_{nm} \sim -d_n\delta_{nm} \tag{17.123}$$

If we then consider renormalization group flows in the infinitesimal neighborhood of
the free field surface (and when we work in perturbation theory, effectively computing
multiple derivatives of the amplitudes at zero coupling, this is exactly what we are
doing), the projected variations $\widehat{\delta g}_\alpha$ for the irrelevant couplings, for which $d_\alpha < 0$, must
decay at low energy like inverse powers of the UV cutoff:
$$\widehat{\delta g}_\alpha \sim \left(\frac{\mu}{\Lambda_{UV}}\right)^{-d_\alpha} = \left(\frac{\mu}{\Lambda_{UV}}\right)^{|d_\alpha|} \tag{17.124}$$
or, equivalently,
$$\delta g_\alpha(\mu) \sim G_{\alpha a}G^{-1}_{ab}\,\delta g_b(\mu) + O\!\left(\left(\frac{\mu}{\Lambda_{UV}}\right)^{|d_\alpha|}\right), \quad \mu \ll \Lambda_{UV} \tag{17.125}$$

The reason for the otherwise strange qualifier “irrelevant” applied to the non-
renormalizable couplings and operators in the theory (indexed by α) should be
apparent at this point: their effects at low energy may be entirely subsumed in
variations of the marginal and relevant couplings (indexed by b).
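The decay of the projected variation can be illustrated with a toy linearized flow. The sketch below is not the flow of any particular field theory: it assumes a constant matrix M coupling one marginal coupling g (d = 0) to one irrelevant coupling h (d = −2), with made-up mixing strengths K and C, so that G(μ) = exp(M ln(μ/ΛUV)) solves (17.118) exactly. It then forms the projected variation of (17.120) and extracts its effective scaling exponent.

```python
import numpy as np

K, C = 0.3, 0.2  # made-up mixing strengths between g and h (assumption)

# Row/column 0: marginal coupling g (d = 0); row/column 1: irrelevant coupling h (d = -2).
M = np.array([[0.0, K],
              [C, 2.0]])


def flow_matrix(ratio):
    """G(mu) = exp(M ln(mu / Lambda_UV)), the solution of (17.118) for constant M."""
    t = np.log(ratio)
    eigvals, eigvecs = np.linalg.eig(M)
    return np.real(eigvecs @ np.diag(np.exp(eigvals * t)) @ np.linalg.inv(eigvecs))


def projected_variation(ratio, dg_bar, dh_bar):
    """Projected variation (17.120) of h at scale mu for initial variations at Lambda_UV."""
    G = flow_matrix(ratio)
    dg, dh = G @ np.array([dg_bar, dh_bar])
    return dh - G[1, 0] / G[0, 0] * dg


ratio = 1e-3  # mu / Lambda_UV
hat = projected_variation(ratio, dg_bar=0.01, dh_bar=0.01)

# Effective exponent p in hat ~ (mu / Lambda_UV)^p, compared with the fast eigenvalue of M.
effective_exponent = np.log(hat / 0.01) / np.log(ratio)
lam_fast = 1.0 + np.sqrt(1.0 + K * C)

print("projected variation:", hat)
print("effective exponent:", effective_exponent, "fast eigenvalue:", lam_fast)
```

In this toy model the projected variation is independent of the initial variation of g and falls off with an exponent close to |d| = 2, shifted slightly by the mixing, in line with (17.124).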
The result (17.125) (see Fig. 17.13) expresses the desired result: arbitrary infinites-
imal displacements of the initial (high-energy) point {ḡn } (in other words, in the
tangent space of any surface containing the initial point) amount to displacements of
the low-energy parameters (at cutoff μ) in a finite, N-dimensional surface S, as the
displacements δgα (μ) are simply linear combinations of the N displacements δgb (μ) of
the marginal/relevant couplings of the theory. All low-energy amplitudes of the theory
(i.e., with external Euclidean momenta less than μ) are given by the specification of
the effective Lagrangian Lμ determined by the gn (μ), so fixing the low-energy physics
uniquely amounts to specifying a finite number—in fact, exactly N—of independent
low-energy amplitudes, which then locate a unique point on the attractive surface S.
This is exactly the procedure used in the preceding sections, where we have imposed
N renormalization conditions as a prelude to the reparameterization of the theory in
terms of the parameters defined by these conditions. It should be emphasized that
the gα (μ) coefficients of the infinitely many non-renormalizable operators in Lμ are
not zero: indeed, they are needed to incorporate the effects of the marginal/relevant
couplings once all the modes of the field between ΛUV and μ are integrated out.
In other words, using Lμ to actually calculate field theory amplitudes would require
including all the vertices for the operators On , but restricting the loop integrals to
internal propagator momenta |k| < μ—clearly an impractical procedure, and not the
way we actually proceed in renormalized perturbation theory. Instead, the process
of renormalization as outlined in the preceding sections of this chapter implicitly
amounts to computing field-theory amplitudes (order by order in perturbation theory)
successively starting the renormalization group flow at the initial point
Renormalization group approach to renormalizability 657

Fig. 17.13 Schematic illustration of an attractive renormalization group flow onto a finite-
dimensional (N=2) low-energy surface. [Figure: couplings g1(Λ), ..., g4(Λ) at the high
energy scale flow onto the low-energy surface S, parameterized by g1(μ), g2(μ) at the low
energy scale, with $\delta g_\alpha(\mu) = \sum_{b=1}^{2} G_{\alpha a} G^{-1}_{ab}\,\delta g_b(\mu) + O((\mu/\Lambda)^{|d_\alpha|})$.]

$$g_\alpha(\Lambda_{UV}) = 0, \qquad g_a(\Lambda_{UV}) = \bar{g}_a(\Lambda_{UV}) \neq 0 \tag{17.126}$$

with the “bare” couplings ḡa (ΛUV ) chosen to fix the physical theory at low energy at
the desired end-point on the low-energy surface S (by imposing the renormalization
conditions), and then taking the limit ΛUV → ∞, confident that the end-point of the
renormalization group flow remains on the attractive surface S at the unique physical
point identified by fixing the particular N low-energy amplitudes that define the renor-
malization scheme we have decided to employ. In the language of the renormalization
group, we can say that the huge variety of theories defined by the infinitely many
couplings ḡn specified at the short-distance cutoff lie in a single universality class
of theories, namely those which collapse to a finite-(N)-dimensional surface at low
energies, as pictured in Fig. 17.13.
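
The attraction onto S can be illustrated with a deliberately crude toy model, which is not taken from the analysis above: one marginal coupling g_a, frozen at linear order, and one irrelevant coupling g_α that is sourced by g_a and relaxes with exponent |d_α|. The flow equations, the function name `flow_to_scale`, and the constant `c` are all assumptions made purely for illustration. The sketch suggests how two very different ultraviolet values of g_α end up, at the low scale, within O((μ/Λ_UV)^{|d_α|}) of the same point on the line g_α = c g_a/|d_α|, which here plays the role of S.

```python
def flow_to_scale(g_marginal, g_irrelevant_uv, lam_uv, mu, d_abs=2.0, c=0.5):
    """Closed-form solution of a toy linear flow (hypothetical model, not from the text):

        dg_a/dt     = 0                          (marginal, frozen at linear order)
        dg_alpha/dt = -d_abs * g_alpha + c * g_a (irrelevant, sourced by g_a)

    with t = ln(lam_uv / mu) increasing toward the infrared.
    Returns the pair (g_a(mu), g_alpha(mu)).
    """
    if not (0.0 < mu <= lam_uv):
        raise ValueError("need 0 < mu <= lam_uv")
    decay = (mu / lam_uv) ** d_abs        # equals exp(-d_abs * t)
    on_surface = c * g_marginal / d_abs   # value of g_alpha on the attractive line
    g_alpha = on_surface + (g_irrelevant_uv - on_surface) * decay
    return g_marginal, g_alpha


if __name__ == "__main__":
    # Two different UV starting values of the irrelevant coupling, same g_a.
    for g_uv in (0.0, 0.3):
        print(g_uv, flow_to_scale(0.1, g_uv, lam_uv=1000.0, mu=1.0))
```

The irrelevant coupling does not flow to zero in this toy model; it flows to a value fixed by the marginal coupling, which mirrors the remark that the g_α(μ) are needed to carry the effects of the marginal/relevant couplings.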
While the above argument asserts that the low-energy limit corresponds to some
set of marginal/relevant operators, and hence to a theory that is perturbatively
renormalizable by power-counting, it may be the case, depending on the field content
and the nature of the interactions, that the resultant low-energy theory is in fact
just a free theory. A classic example is the Fermi theory of weak interactions: if we
integrate out all fields down to a GeV (say), and consider only the weak interactions
of leptons and baryons (e.g., nuclear β decay), then we have only spin-1/2 fields to
consider, and there are no renormalizable interactions in four dimensions involving only
fermionic spin-1/2 fields: the four-fermion coupling term is dimension 6 and therefore
non-renormalizable.13 From the point of view of the argument just above, the low-
energy theory is free, and we simply ignore the weak interactions, as a remnant of high-
energy “new physics”, contributing at the “negligible” $O((\mu/\Lambda_{UV})^{|d_\alpha|})$ level of (17.125)
(where here μ ∼ MeVs, ΛUV ∼ the W mass, 80 GeV, and dα = 2). Of course, if we
integrate only down to 1 TeV, we are left with all the fields of the Standard Model,
including the W-, Z-, and (presumptive) Higgs bosons, which do form a perturbatively
renormalizable theory—exactly the point of the great electroweak unification of the
early 1970s.
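
To get a feel for the size of the suppression factor quoted above, one can insert representative numbers. Taking μ = 1 MeV as a stand-in for "μ ∼ MeVs" (an assumption made only for this estimate), Λ_UV = 80 GeV and |d_α| = 2, one finds roughly:

```latex
\left(\frac{\mu}{\Lambda_{UV}}\right)^{|d_\alpha|}
  \sim \left(\frac{10^{-3}\ \mathrm{GeV}}{80\ \mathrm{GeV}}\right)^{2}
  = \left(1.25\times 10^{-5}\right)^{2}
  \approx 1.6\times 10^{-10}
```

For μ of a few MeV the factor grows quadratically but remains many orders of magnitude below unity.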
We also note here the obvious point that we are not always so fortunate to have
the couplings of the low-energy renormalizable theory—albeit a non-trivial one with
interesting interactions—sufficiently small to render perturbation theory quantita-
tively useful. In the case of QCD, we do indeed end up with a renormalizable theory
at low energy, but one with a gauge coupling constant of order unity, which means
that a complete quantitative evaluation of the low-energy amplitudes of the theory
necessarily requires explicitly non-perturbative methods, such as lattice gauge theory.
In the next chapter we shall see that QCD however possesses the remarkable property
that the running coupling decreases with increasing energy (the famous property of
“asymptotic freedom”), so that the renormalization group actually provides an escape
route allowing the perturbative calculation of certain high-energy amplitudes of the
theory. Of course, in the case of QED we are very fortunate in this regard: the low-
energy coupling of the electron to the photon provides an expansion coefficient of
order 1/137 (the fine-structure constant), so perturbation theory is (initially) rapidly
convergent, and the accuracy of the results obtained for the anomalous magnetic
moment of the electron can be used to establish the absence of new physics (in the
form of a dimension-5 operator $\bar\psi\sigma_{\mu\nu}\psi F^{\mu\nu}$) up to an energy scale of at least $10^7$ GeV
(Weinberg, 1995a).
It must once again be emphasized that the arguments given here, based as they are
on a linear stability analysis, make sense only in the context of perturbation theory,
where we are entitled to treat all couplings associated with interaction operators as
infinitesimal. Infinitesimal couplings remain so under the renormalization group flow
from ΛUV down to the low scale μ, even though the flow is non-linear. No statement
is made—nor can be made—concerning the actual existence of flows beginning at the
specified UV starting point (17.126), with finite rather than infinitesimal couplings,
and ending at some specified desired low-energy coupling strengths (say, for the
renormalized coupling λR in φ4 -theory). This is a global issue which requires non-
perturbative control over the theory, as one is in principle interested in situations
where the renormalization group flow is diverted into regions where weak coupling
perturbation theory is no longer valid.

13 The same problem occurs in attempts to describe quantum gravity in terms of a local field theory:
there are no renormalizable interactions of spin-2 gravitons. The leading low-energy residual of whatever
microscopic theory adequately describes quantum gravity effects, the Einstein–Hilbert Lagrangian, leads
to the appearance of infinitely many counterterms if we attempt a perturbative expansion. Indeed, the
only coupling in the theory, Newton’s constant G, has dimensions of mass$^{-2}$, the classic signature of a
non-renormalizable interaction.

In fact, it may well be the case that the limit outlined above, in which ΛUV is taken
to infinity while holding a set of N low-energy amplitudes fixed leads to a high-energy
theory which does not correspond to a physically sensible theory—for example, by
having negative terms in the effective action which render the functional integral at
the high-scale divergent. In such a case, the only scaling of the bare couplings at the
ultraviolet end which leads to a sensible low-energy theory corresponds to sending all
the interaction terms to zero—in other words, to the “trivial” result of a free field
theory. There is considerable circumstantial evidence (we shall consider some in the
next chapter) that both φ4 theory and quantum electrodynamics in four spacetime
dimensions, taken as self-standing field theories, are in fact trivial theories of this sort,
even though they are, as we have seen earlier, formally perturbatively renormalizable
to all orders of perturbation theory.14
There is no paradox here: recall (Section 11.1) that the formal perturbation
series is always only a divergent, asymptotic one. A local field theory in which a
well-defined continuum limit exists, where all ultraviolet cutoffs have been removed
in a way consistent with full Poincaré invariance while retaining the hermiticity of
the Lagrangian and the unitarity of the theory, is presumably one in which n-point
functions exist satisfying the full panoply of Wightman axioms discussed in Chapter 9.
However, it may well be the case that the perfectly well-defined all-orders expansions
of the amplitudes of a field theory do not correspond to the asymptotic expansions
of a set of Wightman functions satisfying the needed axioms. In this situation, the
perturbation expansion may still be of enormous phenomenological utility (as in the
case of QED): we must regard the relevant microscopic theory not as a continuum field
theory, but as an effective Wilsonian theory valid up to a high-energy scale beyond
which new physics comes into play, altering significantly the ultraviolet behavior. As
long as the interaction couplings at the high scale are reasonably small, we may expect
that the flow down to the low-energy scale where we are doing physical measurements
produces an attraction onto a finite-dimensional surface on which we work in the
setting of renormalization theory. In the case of the electroweak sector of the Standard
Model, the measurements are at an energy scale on the order of hundreds of GeVs, and
we are fortunate that the low-energy couplings are small here. In the case of QCD, the
low-energy couplings attract to a finite-dimensional surface where the gauge coupling
appropriate for hadronic phenomena in the sub-GeV regime is of order unity, and
renormalized perturbation theory, though formally perfectly sensible, is not useful,
and in fact, is qualitatively misleading with regard to the physics of the theory, as we
shall see in Chapter 19.
The renormalization group approach to perturbative renormalizability has some
quite striking advantages in comparison to the detailed analysis of divergence structure
presented earlier in this chapter. The decoupling of ultraviolet sensitivity is seen to
proceed by simple scaling arguments, with no reference to the complications of nested
or overlapping subdivergences in the Feynman graphs associated with the perturbative
amplitudes. There are, however, considerable disadvantages attached to this approach,

14 For a detailed description of the mathematical issues involved in establishing triviality of field theories,
see (Fernandez et al., 1992).

at least from the point of view of high-energy theory (though less so in the many
condensed-matter applications of the renormalization group). The main one arises
from the need to impose momentum cutoffs, which we have seen do violence to
the local gauge symmetry which plays a central role in all sectors of the Standard
Model. This results in enormous technical complications when repeating the proof
of renormalizability for gauge theories along the lines of Wilsonian effective theory,
as given above for scalar field theories, although the method has been successfully
applied (if painfully) even in this case (Kopper and Müller, 2009). Furthermore, the
renormalization techniques developed earlier in the chapter provide the germs for
further extensions which are indispensable in understanding the short-distance/high-
energy behavior of field theories (where here we are talking about energy scales
intermediate between the important dimensionful scales defining the theory at low
energy, and the high-energy scale at which the theory becomes invalid). Here the
notion of oversubtraction of amplitudes—a straightforward extension of the subtrac-
tion techniques of Section 17.2—becomes critical in understanding the factorization
properties of field theory amplitudes in this regime. These ideas are difficult, if not
impossible, to implement in the framework of renormalization group flow arguments
of the type given in this section, although, as we are about to see, renormalization
group ideas, appropriately reformulated for use within the framework of renormalized
perturbation theory, play an indispensable role in extracting useful information about
high-energy amplitudes once certain factorization properties of the latter have been
established.

17.5 Problems
1. Verify the inequality, valid for A, B ≥ 0, (implying (17.16), setting $A = k_0^2$,
$B = \mathbf{k}^2 + m^2$):

$$\left|\frac{1}{A - B + i\epsilon B}\right| \le \sqrt{1 + \frac{4}{\epsilon^2}}\;\frac{1}{A+B} \tag{17.127}$$

2. Check that the result of differentiating the internally subtracted two loop graph
in Figure 17.3 three times with respect to the external momentum p is UV-finite
in dimensional regularization. One needs to show that the amplitudes resulting
from the differentiation of the basic graph, together with the inner subtractions
prescribed by the forest formula (i.e. (a), (b), and (c) in Figure 17.4), can be
rearranged into a sum of individually UV-finite terms. For example, the terms in
which the propagators carrying momentum p − l1 , p − l2 both receive derivatives
are manifestly UV-finite by Weinberg’s Theorem. Thus, one must show that the
terms in which all three derivatives are applied to a single propagator, together
with associated subtraction terms, give a UV-finite result. The UV convergence of
the third derivative implies that the pole part of the subtracted two loop diagram
must be a polynomial, at most quadratic, in the external momentum p.
3. Determine, in terms of zero-momentum amplitudes, the self-energy and vertex
counterterms responsible for renormalizing the two-loop self-energy of Fig. 17.3
in $\phi^3_6$-theory, and show that these counterterms give rise to exactly the set of
subtractions indicated in Fig. 17.4.
4. Imitating the arguments leading to (17.41), derive the formula (17.77) giving the
superficial degree of divergence of a graph with Eψ external fermion and Eφ external
scalar lines in a theory of a Dirac fermion field Yukawa-coupled to a scalar field:
$$D = 4 - \frac{3}{2}E_\psi - E_\phi \tag{17.128}$$
5. Show that the Euclidean γ matrices γ̂μ , μ = 1, 2, 3, 4 defined in (15.160) satisfy the
trace identities:
Tr(γ̂μ γ̂ν ) = 4δμν (17.129)
Tr(γ̂μ γ̂ν γ̂ρ γ̂σ ) = 4(δμν δρσ − δμρ δνσ + δμσ δνρ ) (17.130)
6. Evaluate, after dimensional continuation via (17.102), the photon one-loop self-
energy (17.100) integral, and show that one obtains the result displayed in (17.103).
7. Show that a counterterm of the form $\delta Z \times \frac{1}{4}(F_{\mu\nu})^2$, inserted to first order in the
momentum-space photon propagator, produces exactly the transverse tensor in
(17.104).
18
Scales III: Short-distance structure
of quantum field theory

In the preceding chapter we saw that for a certain subclass of local quantum field
theories, whose local dynamics is determined by a Lagrangian containing only a finite
number N of operators, the low-energy amplitudes of the theory, order by order in per-
turbation theory, lose their leading sensitivity to ultraviolet modifications of the theory
once reparameterized in terms of an equal number of independent low-energy quanti-
ties. We refer to such theories as “perturbatively renormalizable”, and the requirement
that the Lagrangian of such theories contain only operators of mass dimension less than
(relevant/super-renormalizable) or equal to (marginal/renormalizable) that of the
spacetime dimension is extremely restrictive, with the pleasant result that enormous
phenomenological predictivity obtains with a minimum of input. More exactly, we
have seen that in perturbatively renormalizable theories, a general Green function
M of the theory, depending on generic momenta p, masses m, and bare Lagrangian
couplings g, evaluated with ultraviolet cutoff Λ, becomes, once reparameterized in
terms of renormalized masses mR and couplings gR (which may depend on a choice
of renormalization scheme—for example, through a renormalization scale μ)

$$M(p, m, g, \Lambda) \to M_R(p, m_R, g_R, \mu) + O\!\left(\frac{p^2, m_R^2, \mu^2}{\Lambda^2}\right) \tag{18.1}$$
The preceding equation is to be interpreted as valid order by order in the formal
asymptotic expansion of both sides in powers of the subset of the gR corresponding
to interaction (higher than quadratic) vertices of the theory. For simplicity, we have
taken the couplings to be dimensionless. The power suppressed terms in (18.1) are
to be thought of as incorporating “new physics” which may be interesting in its own
right, but is not directly of interest in the calculation of the desired amplitude M. In
particular, we assume that these terms are quantitatively negligible: the masses and
momenta of particles involved in the given amplitude are much less than the energy
scale Λ at which new physics may emerge. The subtraction technology developed in
Section 17.2 was precisely fitted to the task of extracting just the parts of the full
amplitude which survive in this limit.
Our subject in this chapter will be to show that the subtraction procedure used to
demonstrate (18.1) has a natural generalization to situations in which three distinct
energy scales are present: the “low” masses m and momenta of some of the particles,
a momentum scale Q ≫ p, m much greater than the remaining momenta and the
masses of the theory, and (as always) a high-energy frontier scale Λ reflecting our

Fig. 18.1 Energy scales involved in UV subtraction of amplitudes (leading behavior for
large Λ). [Figure: an energy-scale axis marked, in increasing order, with the small momenta
and masses in the process (p_i, m_i), the renormalization scale μ, the large momentum Q, and
the scale Λ of unknown short-distance physics; labeled “counterterm subtractions”.]

ignorance of the ultimate microphysics underlying our field theory (see Fig. 18.1).
We shall see that it is possible in many cases to effectively repeat the procedures
leading to (18.1) to extract the leading behavior of the renormalized amplitude (with
Λ dependence now already discarded) and obtain a factorized amplitude

$$M_R(p, Q, m_R, g_R, \mu) \to \sum_i \hat{M}_{R,i}(p, m_R, g_R, \mu)\, C_i(Q, m_R, g_R, \mu) + O\!\left(\frac{p^2, m_R^2, \mu^2}{Q^2}\right) \tag{18.2}$$
where now the “small” terms (usually called “higher twist” contributions) are of the
order of inverse powers of the large momentum scale Q, which in some sense has taken
over the role previously played by the “ultimate” cutoff Λ. The result (18.2), effectively
decoupling the dependence of the full amplitude on the large and small momenta, is
the expression in momentum space of a coordinate space property of local operators
originally uncovered by Wilson (Wilson, 1969), and which has come to be known as
the “Wilson operator-product expansion”.
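
For orientation, the coordinate-space statement is usually written schematically as follows. The operators A and B, the local operators O_i, and the c-number functions \tilde{C}_i are generic notation assumed here rather than taken from the text; the \tilde{C}_i are the coordinate-space counterparts of the coefficients C_i in (18.2).

```latex
A(x)\,B(0) \;\sim\; \sum_i \tilde{C}_i(x)\,O_i(0), \qquad x \to 0
```

All of the singular short-distance behavior resides in the coefficient functions, while the local operators O_i carry the dependence on the states between which the product is evaluated.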
The proof of Wilson’s hypothetical expansion, first given by Zimmermann, simply
extends the subtraction procedure used to remove the leading Λ dependence of the
amplitudes further down, to the “large” (but not too large!) scale Q, as schematically
indicated in Fig. 18.2. The coefficient functions Ci are correspondingly termed “Wilson
coefficients”, while the set of amplitudes M̂R,i will turn out to involve insertions of
appropriately defined local composite operators. We have seen on many occasions (cf.
Section 16.5) that products of field operators contain ultraviolet divergences, and as
(18.2) no longer contains any reference to a cutoff scale Λ, it is clear that the composite
operators appearing here must come fully equipped with a prescription for subtracting
off any additional Λ-dependence which their insertion in a graph might occasion.
Our exploration of the factorization properties of amplitudes at high energy must
therefore begin, naturally enough, with a more detailed treatment of the definition
and properties of local composite operators. A full treatment of the factorization and
renormalization group properties of high-energy amplitudes would easily require a
separate (and sizeable!) book, so we must beg the reader’s indulgence in providing
merely an overview, with (in most cases) detailed proofs omitted.

Fig. 18.2 Energy scales involved in oversubtraction of amplitudes (leading behavior for
large Q). [Figure: the same energy-scale axis as in Fig. 18.1, marked with (p_i, m_i), the
renormalization scale μ, the large momentum Q, and the scale Λ of unknown short-distance
physics; labeled “oversubtracted amplitudes (dominant dependence on Q removed)”.]

18.1 Local composite operators in field theory


We have already alluded on many occasions to the ultraviolet divergences which appear
in a continuum field theory when one attempts to multiply local field operators at the
same spacetime point, in order to form composite operators, such as those needed
in the formulation of either a renormalizable Lagrangian or the more general Wilso-
nian effective Lagrangians discussed in Chapter 16. In coordinate space the singular
structure of these composite operators is directly associated with the singularities
encountered when distributions are multiplied at the same argument (cf. Section 10.2).
We shall concentrate on the structure of these divergences in momentum space, as the
required technology is essentially already at our disposal given our previous study of
the divergence structure of Feynman graphs in Sections 17.1 and 17.2. Our objective
is to describe a systematic approach to the renormalization to all orders of composite
operators, effectively generalizing the introductory discussion given in Section 16.5 in
terms of some simple one-loop examples.
We shall work in $\phi^4_4$-theory, which we assume to be renormalized by zero momentum
BPHZ subtractions, implemented via the counterterms of (17.58). The renormalizability
of the theory then implies the existence of the limit of the Euclidean Green
functions of the theory $\langle\phi(x_1)\phi(x_2)\cdots\phi(x_n)\rangle$ (cf. (10.82)) or their momentum-space
Fourier transforms as the UV cutoff Λ is taken to infinity. However, if two of the
field operators in the above n-point function are taken at the same spacetime point
(say, by taking x1 = x2 = 0), a singularity appears, which can be viewed as the
re-emergence of a singular UV cutoff dependence for graphs involving an insertion
of the composite operator φ2 (0) in an (n − 2)-point amplitude. This singularity is
inherited by the matrix elements of the composite operator (involving a total of n − 2
incoming and outgoing particles), as such matrix elements are obtained (via LSZ)
by taking the appropriately projected on-mass-shell limit of the momentum-space
amplitude (continued back to Minkowski space, of course). To be specific, consider
the momentum-space amplitude

$$\Gamma^{(2,1,1)}(k, k') \equiv \hat\Delta_F^{-1}(k)\,\hat\Delta_F^{-1}(k') \int d^4x\, d^4x'\, e^{ik\cdot x - ik'\cdot x'} \langle\phi^2(0)\phi(x)\phi(x')\rangle \tag{18.3}$$

The superscript notation indicates that the Green function involves a composite
(squared) operator and two separate field operators, and we use the Γ notation
to make explicit the fact that the amplitude is taken to be 1PI, with the inverse
propagator prefactors removing the external legs carrying momentum k in and k′ out.
Accordingly, the perturbative expansion of the amplitude begins with the constant
unity, and the order λR contribution is given simply by the logarithmically divergent
one-loop graph of Fig. 16.4. Note that there is no opportunity as yet for the order $\lambda_R^2$
counterterm contained in δλ to appear to cancel the divergence, as we are working
only to order λR (associated with the vertex on the right): the special vertex on the
left associated with the insertion of the φ2 operator (henceforth labeled V ) does not, of
course, carry a factor of the coupling constant. The presence of the composite operator
φ2 (0) has evidently introduced additional UV divergences which are not taken care
of by the normal counterterm subtractions. Nevertheless, a renormalized version of
the amplitude (18.3) can be defined very simply, and to all loop orders, following
the techniques of the BPHZ subtraction scheme. In order to do this we shall make
a slight change in notation for the Taylor subtraction operator tγ associated with a
renormalization part γ of a graph (namely, a superficially divergent 1PI subgraph),
writing

$$t_\gamma \to t^{D(\gamma)} \tag{18.4}$$

where the degree function D(γ) indicates the number of terms in the Taylor expansion
around zero momentum to be included in the Taylor operator. For renormalization
graphs not containing the composite vertex V , the subtraction degree is computed as
usual

$$D(\gamma) = 4 - E_\gamma \tag{18.5}$$

where Eγ is the number of external lines attached to γ, while for renormalization parts
containing V (such as the one-loop graph in Fig. 16.4), we define

$$D(\gamma) = \delta - E_\gamma \tag{18.6}$$

with δ an integer at least as large as the engineering dimension of the composite
operator at the vertex V (in this case, 2). The Zimmermann forest formula (17.31) is
now taken over exactly as in Chapter 17,

$$I^\Gamma_{R,\delta} = \sum_{U \in F(\gamma)} \prod_{\gamma_r \in U} \left(-t^{D(\gamma_r)}\right) I^\Gamma \tag{18.7}$$

to define a fully subtracted, and UV-finite, amplitude, with $I^\Gamma$ the unrenormalized
Feynman integrand associated with every graph Γ contributing to the amplitude in
(18.3). For the one-loop graph in Fig. 16.4, taking δ = 2, the logarithmically divergent
one-loop subgraph contains the composite vertex, and therefore acquires a Taylor sub-
traction of degree 2 − 2 = 0, exactly sufficient to remove the logarithmic divergence.
With a little thought, one sees that the D(γ) defined in this way exactly computes
the superficial degree of divergence of all subgraphs, whether or not they contain the
special vertex V . In fact, we can think of the vertex V as a normal four-point vertex,
but with two of the external lines missing, whence the difference of 2 in the degree
functions (18.5) and (18.6) (taking δ = 2). However—and this freedom will become
a crucial ingredient in the techniques to be developed in this chapter—we may also
choose δ > 2, thereby subtracting additional finite terms from the already adequately
subtracted subintegrations associated with each γ, and obtaining an oversubtracted
but nevertheless completely UV-finite amplitude. The sum of all graphs, renormalized
according to the prescription (18.7), defines the insertion of a renormalized composite
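$\phi^2$ operator of degree δ, henceforth denoted $N_\delta(\phi^2)$ (and frequently referred to as a
“Zimmermann normal product operator”)1:

$$\sum_\Gamma I^\Gamma_{R,\delta} \equiv \hat\Delta_F^{-1}(k)\,\hat\Delta_F^{-1}(k') \int d^4x\, d^4x'\, e^{ik\cdot x - ik'\cdot x'} \langle N_\delta(\phi^2(0))\phi(x)\phi(x')\rangle \tag{18.8}$$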

In the event that we choose δ = 2, the minimal value required to yield a UV-finite
amplitude, the associated composite operator, N2 (φ2 ) is called minimally subtracted.
Composite operators, such as N4 (φ2 ), containing more than the minimal number
of subtractions required to remove the singular UV-dependence, are called over-
subtracted.
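
The bookkeeping in (18.5) and (18.6) is simple enough to spell out in a few lines. The following is a minimal sketch for φ⁴ theory in four dimensions; the function names and the boolean flag `contains_V` are hypothetical conveniences, not notation from the text.

```python
def subtraction_degree(n_external, contains_V, delta=2):
    """Degree D(gamma) of the Taylor subtraction for a 1PI subgraph.

    n_external : number of external lines E_gamma attached to the subgraph
    contains_V : True if the subgraph contains the composite phi^2 vertex V
    delta      : degree assigned to the composite operator (at least 2)

    Implements D = 4 - E_gamma for ordinary subgraphs, as in (18.5),
    and D = delta - E_gamma for subgraphs containing V, as in (18.6).
    """
    if delta < 2:
        raise ValueError("delta must be at least the operator dimension, 2")
    base = delta if contains_V else 4
    return base - n_external


def taylor_orders(degree):
    """Orders in external momenta removed by a subtraction of the given degree.

    A negative degree means no Taylor subtraction is applied.
    """
    return list(range(degree + 1)) if degree >= 0 else []


if __name__ == "__main__":
    # One-loop graph of Fig. 16.4: contains V, two external lines.
    print(taylor_orders(subtraction_degree(2, True, delta=2)))  # minimal
    print(taylor_orders(subtraction_degree(2, True, delta=4)))  # oversubtracted
```

With δ = 2 only the constant term of the one-loop graph is removed; with δ = 4 the linear and quadratic terms in the external momenta are removed as well, which is the sense in which N4(φ2) is oversubtracted.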
All of the preceding may be carried out in a dimensional renormalization scheme
simply by reinterpreting the Taylor operators tD(γr ) in the forest formula as pole-part
extraction operators, as described in the previous chapter. The renormalized composite
operator $N_\mu(\phi^2)$ (for example) so defined implicitly depends on the renormalization
scale μ used to define the dimensionally continued integrals, but we lose the ability to
define oversubtracted operators in which additional momentum dependence is removed
from the renormalization parts, with inconvenient consequences for the proof of the
operator product expansion (for example). Nevertheless, as we shall see below, the
dimensionally renormalized operators can be explicitly related to linear combinations
of the more intuitive BPHZ normal product ones. Here one is simply exercising the
usual freedom available in choosing a particular “basis” of local operators from an
infinite set of independent ones.
The physical interpretation of the subtractions implemented in our new forest
formula (18.7) according to the degree function (18.5) is clear: these are just the
subtractions generated by the appearance of counterterms in the Lagrangian once the
theory is reparameterized in terms of a set of low-energy parameters identified through
renormalization conditions (in the present scheme, at zero momentum), as we saw
in the previous chapter. But the additional subtractions involving renormalization
parts containing the vertex V associated with the insertion of the composite φ2
operator, employing the degree function (18.6), clearly have nothing to do with
these counterterms, and we may well be concerned that they involve an unacceptable
mutilation of the composite operator, perhaps destroying important properties, such
as locality (space-like commutativity), etc.

1 The normal products defined here are to be distinguished, of course, from the “normal-ordered
products” introduced in our discussion of Wick’s theorem.

Fig. 18.3 Unsubtracted graphs contributing to $\Gamma^{(2,1,1)}$ up to two loops.

In fact, it is not hard to see that the extra subtractions introduced in (18.7)
to render the amplitude Γ(2,1,1) UV-finite correspond simply to a multiplicative
renormalization of the composite $\phi^2$ operator, completely analogous to the previous
rescaling of the basic field operator $\phi \to \sqrt{Z}\phi$ in (17.45), needed to absorb singular
cutoff dependence in the two-point function of the theory. Rather than give a formal
demonstration of this statement with the forest formula, we shall illustrate the basic
point with an example.
In Fig. 18.3 we show the graphs contributing to $\Gamma^{(2,1,1)}$ in $\phi^4_4$-theory through two
loops. The corresponding subtractions induced by application of the forest formula
are indicated in Fig. 18.4, where we remind the reader that the appearance of “o”
symbols on the external legs of a subgraph indicates the application of the appropriate
Taylor zero-momentum operation to that subgraph. In the present case, this effectively
means just setting the momenta entering that subgraph to zero. The lowest-order
graph (a) in Fig. 18.4 is by definition just unity. Also, the reader will recall from the

Fig. 18.4 Fully subtracted graphs contributing to $\Gamma^{(2,1,1)}$ up to two loops. [Figure:
panels (a)-(e) show the unsubtracted graphs; panels (f)-(m) show the subtraction terms, with
“o” marking the legs of subgraphs to which the zero-momentum Taylor operation is applied.]


previous chapter that the one-loop propagator correction in graph (e) is momentum-
independent in φ4 -theory, so graphs (e) and (m) in fact cancel identically. A brief
inspection shows that the indicated subtractions indeed suffice to remove divergent
UV contributions from all possible large momentum flows, as required by Weinberg’s
theorem. The graphs of Fig. 18.4 can be rearranged as indicated in Fig. 18.5. We see
that the fully subtracted amplitude factorizes into a dimensionless number, which
we shall call $Z_{\phi^2}$, independent of the external momenta k, k′, corresponding to
the contents of the parenthesis on the top line, times the contributions to $\Gamma^{(2,1,1)}$
corresponding to the amplitude obtained by inserting the bare $\phi^2$ operator into the
1-1 amplitude and carrying out all necessary counterterm subtractions (in this case, only
the one-loop vertex subtractions) arising from the reparameterization of the theory.
In other words, with a UV-cutoff Λ present to regularize the individual graphs in
Figs. 18.3–18.5, the minimally subtracted φ2 operator, giving UV-finite insertions into
1PI graphs (and hence, by LSZ, with finite matrix elements), is obtained by a cutoff-
dependent rescaling of the bare operator:

$$N_2(\phi^2(0)) = Z_{\phi^2}\!\left(\lambda_R, \frac{\Lambda^2}{m_R^2}\right)\phi^2(0) \tag{18.9}$$

This simple multiplicative relation between the renormalized normal product operator
N2 (φ2 (0)) and its bare counterpart φ2 (0) suggests that the renormalized operator
will possess, in addition to UV-finite matrix elements, the desired Lorentz scalar and
locality (space-like commutativity) properties, and indeed, these properties have been
rigorously established (see (Zimmermann, 1970), and references cited therein). We
should point out here that for certain particularly “nice” composite operators—the
most important examples being those operators corresponding to conserved Noether
currents associated with a Ward–Takahashi identity—the bare composite operator
may already have finite matrix elements, allowing us to simply take the corresponding
Z factor to be unity (see Problem 1).
More generally, defining renormalized composite operators may require a combina-
tion of bare operators, as a consequence of operator mixing. For example, in a theory
with two independent scalar fields φ, χ, with basic interaction Lagrangian

$$\mathcal{L}_{\rm int} = \lambda_1\phi^4 + \lambda_2\phi^2\chi^2 + \lambda_3\chi^4 \tag{18.10}$$

Fig. 18.5 Factorization of the fully subtracted graphs contributing to $\Gamma^{(2,1,1)}$ up to two loops.

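Fig. 18.6 One-loop renormalization parts appearing in the renormalization of $\phi^2(0)$ in the
theory defined by (18.10). [Figure: left, a φ2(0) insertion joined through a λ1 vertex to
external φ lines; right, a φ2(0) insertion joined through a λ2 vertex to external χ lines.]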

the minimally subtracted operators $N_2(\phi^2)$, $N_2(\chi^2)$ are linear combinations of the
bare $\phi^2$ and $\chi^2$ operators: thus, we have a 2×2 matrix of renormalization constants
($Z_{\phi\phi}$, $Z_{\phi\chi}$, etc.) connecting the bare operators with the renormalized ones. The need
for including the $\chi^2$ operator in the renormalization of $\phi^2$ is apparent when one
considers the graphs of Fig. 18.6, where we see that renormalization parts arising
in the insertion of a $\phi^2$ operator induce subtractions corresponding to a local $\chi^2$
operator, as the zero momentum subtraction of the graph on the right amounts to a
lowest-order insertion of a $\chi^2$ operator.
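
To make the mixing structure explicit, the linear relation just described can be written
in matrix form. The following is only a notational sketch: the labels $Z_{\chi\phi}$ and
$Z_{\chi\chi}$ are assumed by analogy with the $Z_{\phi\phi}$, $Z_{\phi\chi}$ named in the text, and the
arguments of the $Z$ factors are suppressed.

```latex
\begin{pmatrix} N_2(\phi^2) \\ N_2(\chi^2) \end{pmatrix}
=
\begin{pmatrix} Z_{\phi\phi} & Z_{\phi\chi} \\ Z_{\chi\phi} & Z_{\chi\chi} \end{pmatrix}
\begin{pmatrix} \phi^2 \\ \chi^2 \end{pmatrix}
```

In this notation the right-hand graph of Fig. 18.6 is the lowest-order source of the
off-diagonal entry $Z_{\phi\chi}$.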
We mentioned previously that the ability to define over-subtracted operators with
more than the minimum number of subtractions needed to ensure the UV-finiteness of
amplitudes containing these operators will be extremely important in understanding
the underlying physics of operator product expansions. In fact, such operators are
simply particular linear combinations of the minimally subtracted ones, as we shall
now see, albeit combinations with particularly useful properties.
Consider, for example, the oversubtracted operator $N_4(\phi^2)$, where the Taylor
operator acting on renormalization parts containing the special vertex $V$ (at which the
operator is inserted into the diagram) contains two additional terms. Thus, the
subtraction (f) of graph (b) in Fig. 18.4 contains, in addition to the constant term obtained by
evaluating the one-loop integral at zero external momentum, the linear and quadratic
terms in the Taylor expansion in external momenta of this graph. The graph evidently
is a scalar function $\Pi(q^2)$ of the momentum $q = k' - k$ inserted by the composite
operator, so the subtraction must take the form $a + bq^2 = a + b(k^2 + k'^2) - 2b\,k\cdot k'$,
with $a$ (cutoff-dependent) and $b$ (finite) constants. The extra terms contained in
the oversubtracted operator, proportional to $b$, are clearly just what we would get
from a lowest-order insertion of the composite operators $\phi\Box\phi$ (giving the momentum
dependence $-(k^2 + k'^2)$) and $\partial_\mu\phi\partial_\mu\phi$ (giving the $2k\cdot k'$ term). A little time spent
examining the effect of the oversubtraction at the next loop order shows that these
new operators containing derivatives appear only minimally subtracted when their
momentum dependence enters a renormalization part requiring subtraction. One also
finds in higher order that the oversubtractions generate a term corresponding to the
minimally subtracted $N_4(\phi^4)$ operator (see Problem 2). The result is the famous
Zimmermann identity:

$$ N_4(\phi^2) = N_2(\phi^2) + r\,N_4(\partial_\mu\phi\partial_\mu\phi) + s\,N_4(\phi\Box\phi) + t\,N_4(\phi^4) \tag{18.11} $$

displaying, as predicted, the oversubtracted operator as a linear combination of
minimally subtracted ones, with the constants $r$, $s$, and $t$ UV-finite functions of the
renormalized parameters of the theory, as they must be, given that all operators
appearing in the identity are fully renormalized. Alternatively, we may write

$$ N_2(\phi^2) = N_4(\phi^2) - r\,N_4(\partial_\mu\phi\partial_\mu\phi) - s\,N_4(\phi\Box\phi) - t\,N_4(\phi^4) \tag{18.12} $$

The general rule is very simple: we may write a minimally subtracted operator of
degree D as a linear combination of all the independent operators of degree D + δ,
δ > 0 with which it may mix (given symmetry constraints) under renormalization.
A general proof of these Zimmermann identities involves straightforward, if lengthy,
algebraic reshuffling of the forest formula (18.7), which we shall not give here. The
interested reader is referred to the lectures of Lowenstein (Lowenstein, 1976) and
Zimmermann (Zimmermann, 1970), in which all the details are given with proper
mathematical rigor.
Higher than quadratic composite operators can be defined in a similar way to the
above: the minimally subtracted $N_4(\phi^4)$ operator, for example, requires any
renormalization part $\gamma$ containing the four-point vertex corresponding to the operator insertion
to be subtracted with the normal Taylor operator $t^{D(\gamma)}$, i.e., with $D(\gamma) = 4 - E_\gamma$.
Moreover, these Zimmermann normal products satisfy some obvious (and convenient!)
properties with respect to spacetime derivatives, namely:
1. Derivatives may be passed through the normal product by the simple expedient
of raising the degree of the subtractions by one for each derivative that enters
the product: e.g.,

$$ \partial_\nu N_\delta(\phi\partial_\mu\phi) = N_{\delta+1}(\partial_\nu(\phi\partial_\mu\phi)) \tag{18.13} $$

The reason is simple. After Fourier-transforming to momentum space, we see
that the interior $\partial_\nu$ derivative on the right-hand side of (18.13) corresponds to
an extra factor of external momentum for any renormalization part containing
the special vertex for the $N_{\delta+1}$ composite operator: thus, the Taylor subtraction
operator on the right must perform an extra momentum differentiation to ensure
that the left- and right-hand sides agree.
2. Usual Leibniz rules of differentiation obtain within normal products: e.g.,

$$ N_\delta(\partial_\mu(\phi\partial_\mu\phi)) = N_\delta(\partial_\mu\phi\partial_\mu\phi) + N_\delta(\phi\Box\phi) \tag{18.14} $$
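
As a small worked illustration (a sketch that simply chains rules 1 and 2, not an
additional result from the text), contracting the free index in (18.13) and then applying
the Leibniz rule (18.14) at degree $\delta + 1$ gives:

```latex
\partial_\mu N_\delta(\phi\,\partial_\mu\phi)
  = N_{\delta+1}\big(\partial_\mu(\phi\,\partial_\mu\phi)\big)
  = N_{\delta+1}(\partial_\mu\phi\,\partial_\mu\phi) + N_{\delta+1}(\phi\,\Box\phi)
```

Both operators on the right carry the raised subtraction degree $\delta + 1$, in line with
rule 1.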

Dimensionally subtracted composite operators satisfy similar properties, with the
important change that there is no freedom to change the subtraction level, so the
derivatives pass through the normal product with no change in its definition: e.g.,
$\partial_\nu N_\mu(\phi\partial_\mu\phi) = N_\mu(\partial_\nu(\phi\partial_\mu\phi))$.
Fully subtracted amplitudes involving multiple insertions of composite operators
are constructed by an obvious generalization of the forest formula (18.7): one simply
applies the appropriate subtraction degree formula for each renormalization part
taking into account the sum of the degree increments (if some or all operators are
oversubtracted) for all the special vertices appearing in that renormalization part.
We thereby arrive at a very general and flexible formalism, with a remarkable formal
benefit: it allows an extremely concise formulation of the renormalized Lagrangian
dynamics of a perturbatively renormalizable theory, in terms of a Zimmermann
effective Lagrangian.² The usual interaction-picture formulation of perturbation theory in
terms of the bare (unrenormalized) fields and parameters is replaced by the (Euclidean)
Lagrangian specification (subscript Z for “Zimmermann”), for $\phi^4_4$-theory,

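$$ \mathcal{L}_Z = \mathcal{L}_{Z,0} + \mathcal{L}_{Z,\rm int} \tag{18.15} $$

$$ \mathcal{L}_{Z,0} = N_4\!\left(\frac{1}{2}(\partial_\mu\phi)^2 + \frac{1}{2}m_R^2\phi^2\right) \tag{18.16} $$

$$ \mathcal{L}_{Z,\rm int} = N_4\!\left(\frac{\lambda_R}{4!}\phi^4\right) \tag{18.17} $$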
No counterterms appear here (the mass and coupling parameters are the BPHZ
renormalized ones), but the perturbative expansion of a general BPHZ renormalized
1PI function in terms of correlation functions of free fields with dynamics specified by
(18.16) is defined as

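$$ \Gamma_R^{(N)}(k_1, ..., k_N) = \sum_{r=0}^{\infty}\frac{(-1)^r\lambda_R^r}{r!(4!)^r}\int d^4z_1\, d^4z_2\cdots d^4z_r\,\langle N_4(\phi^4(z_1))N_4(\phi^4(z_2))\cdots N_4(\phi^4(z_r))\,\tilde\phi(k_1)\cdots\tilde\phi(k_N)\rangle_{\rm 1PI} \tag{18.18} $$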

where the graphs obtained by Wick expansion of the correlation function on the right
are to be subjected to the forest-formula subtractions corresponding to the
indicated multiple insertion of the quartic interaction operator. The latter is minimally
subtracted, and it is more or less obvious that this prescription precisely corresponds
to the BPHZ renormalization scheme described in detail in the preceding chapter.
The reader may be somewhat puzzled by the fact that the mass operator in (18.16)
appears in oversubtracted form, as an $N_4(\phi^2)$. The reason is easily seen if we examine the
effect of a small change in the renormalized (squared) mass, or equivalently, compute
the first derivative of a renormalized 1PI amplitude with respect to $m_R^2$. The effect is
simply (with a change of sign) to double each internal propagator of the graph, as
$$ \frac{\partial}{\partial m_R^2}\,\frac{1}{p^2 + m_R^2} = -\frac{1}{p^2 + m_R^2}\cdot\frac{1}{p^2 + m_R^2} \tag{18.19} $$

This doubling occurs, of course, not only in the basic unsubtracted graphs but also in
each of the subtraction terms which pop up whenever there is a divergent subgraph.
The point at which the propagator is doubled may be regarded as a new special vertex
associated with the insertion of the $\phi^2$ operator appearing in $\mathcal{L}_{Z,0}$. The result is as
shown in Fig. 18.7 for a simple example: the 1PI four-point function at one loop,
where “X” marks the point of the $\phi^2$ insertion. Recall that only internal lines are
present and differentiated, as we are dealing with a 1PI, and therefore automatically
amputated, amplitude. It is clear that the mass derivative of this one-loop contribution
to $\Gamma_R^{(4)}$ corresponds to an insertion of the oversubtracted $N_4(\phi^2)$ operator, as it is
subtracted at zero momentum even though the overall degree of divergence of the

2 We apologize once again to the reader for the lamentable overuse of the adjective “effective”, which
appears here now for the third time with a completely different connotation!

Fig. 18.7 Mass derivative of the four-point one-loop renormalized amplitude $\Gamma_R^{(4)}$ in $\phi^4_4$-theory.

one-loop graph with one of the propagators doubled is now $-2$ rather than zero, and a
minimally subtracted $N_2(\phi^2)$ operator would by definition not require a subtraction of
an already superficially convergent subgraph containing its vertex. In general, all the
operators appearing in a Zimmermann effective Lagrangian of this type carry a degree
subscript equal to the spacetime dimension, independent of their actual engineering
dimension. This means that the operators corresponding to super-renormalizable
terms are necessarily oversubtracted.
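
The power counting behind this statement can be sketched in a few lines of Python. The
helper name `superficial_degree` is hypothetical, and the formula D = d·L − 2·I assumes a
scalar theory with non-derivative couplings, with L loops and I internal propagators in
d spacetime dimensions.

```python
def superficial_degree(dim, loops, internal_lines):
    """Superficial degree of divergence D = dim*L - 2*I for a scalar graph
    with non-derivative vertices (each propagator falls like 1/p^2)."""
    return dim * loops - 2 * internal_lines

# One-loop four-point graph of phi^4 theory in four dimensions: two propagators.
bubble = superficial_degree(dim=4, loops=1, internal_lines=2)

# Doubling one propagator (a phi^2 insertion) adds one internal line.
bubble_with_mass_insertion = superficial_degree(dim=4, loops=1, internal_lines=3)

print(bubble, bubble_with_mass_insertion)
```

The graph with the doubled propagator is already superficially convergent, so a
zero-momentum subtraction of it is the mark of the oversubtracted operator rather than
the minimally subtracted one.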
We must now return to the basic theme of this chapter—the use of renormalization
techniques to study the short distance, or equivalently, large momentum behavior
of amplitudes in a renormalizable local field theory. We indicated earlier that the
concept of oversubtraction provides the key to unlocking this behavior. In particular,
we wish to consider the situation in which there is a distinct large momentum scale
Q present in the renormalized amplitudes being studied, with Q much larger than
all other dimensionful quantities (masses, super-renormalizable couplings if any, and
other momentum variables: but, of course, as the amplitudes have been renormalized,
no UV cutoff Λ).
The simplest case concerns an amplitude in which all external momenta are of
order Q. It was realized a long time ago by Symanzik (Symanzik, 1970) (and almost
simultaneously, by Callan (Callan, 1970)) that in this regime the leading contribution
to the amplitude at large Q (neglecting subdominant terms suppressed by inverse
powers of Q (cf. (18.2))) satisfies a homogeneous partial differential equation, which
in certain circumstances can be solved and used to extract the desired asymptotic
behavior. The equation in question is now referred to universally as the Callan–
Symanzik equation. We shall not follow the more involved methods used by either
Symanzik or Callan to derive this equation here, as it is an almost immediate
consequence of the Zimmermann identity discussed earlier, and the approach we use
will generalize more easily to the case of factorized amplitudes to be treated in the
following section. Also, we shall henceforth focus on the scalar $\phi^3_6$-theory introduced
in the previous chapter, for the same reasons indicated there: the topological structure
of the diagrams is essentially identical to that of a four-dimensional gauge theory, and
moreover, the structure and strength of the ultraviolet divergences are very similar
to the gauge-theory case. Thus, instead of (18.15, 18.16, 18.17), we shall be dealing
with an effective Zimmermann Euclidean Lagrangian (in six dimensions) given by the
following free and interaction parts:

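$$ \mathcal{L}_{Z,0} = N_6\!\left(\frac{1}{2}(\partial_\mu\phi)^2 + \frac{1}{2}m_R^2\phi^2\right) \tag{18.20} $$

$$ \mathcal{L}_{Z,\rm int} = N_6\!\left(\frac{\lambda_R}{3!}\phi^3\right) \tag{18.21} $$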
Note that the engineering dimension of the scalar field $\phi$ is 2 in six dimensions, so
the kinetic and interaction terms are minimally subtracted and the mass operator
oversubtracted, as usual. No linear term in the field is included, as the BPHZ
zero-momentum subtractions automatically remove all tadpoles, a process equivalent to
cancelling such graphs with an additive field shift order by order in perturbation theory
(see Fig. 17.11). The corresponding graphs in a gauge theory such as QED, in which
a photon line virtualizes into an electron–positron pair which subsequently disappears
into the vacuum, are, of course, necessarily zero by (for example) angular momentum
conservation.
The key to understanding the Callan–Symanzik equation lies in an important
difference in the behavior of minimally subtracted (such as the $N_2(\phi^2)$ in $\phi^4_4$-theory
and $N_4(\phi^2)$ in $\phi^3_6$-theory) and oversubtracted (e.g., $N_4(\phi^2)$ in $\phi^4_4$-theory and $N_6(\phi^2)$
in $\phi^3_6$-theory) operators when inserted into amplitudes at large (external) momentum
$k_i = Q\hat{k}_i$, where $Q$ is a large momentum scale and the $\hat{k}_i$ are Euclidean momenta of
order unity. For example, the minimally subtracted $\phi^2$ operators, as we have seen,
simply introduce an additional internal propagator into the diagrams, without any
additional subtractions (see Fig. 18.8(a)). The result is to lower the superficial degree

Fig. 18.8 (a) A one-loop graph corresponding to an insertion of $N_4(\phi^2)$ in $\Gamma_R^{(3)}$ in $\phi^3_6$-theory.
(b) Result of insertion of the oversubtracted $N_6(\phi^2)$ in the same graph.

of divergence of the overall (fully subtracted) graph by 2, which, by a corollary of
Weinberg’s theorem discussed in the previous chapter, lowers correspondingly the
asymptotic dependence of the graph for large $Q$ by (modulo logarithms) two powers
of $Q$. This softening of the asymptotic behavior is guaranteed for all non-exceptional
momenta: specifically, provided no partial sum of the momenta entering the graph
vanishes (for the reason for this, and an explicit counterexample, see Problem 3).
On the other hand, the insertion of an oversubtracted $\phi^2$ operator generates
additional subtractions at zero momentum which do not fall off at large $Q$, as
indicated in Fig. 18.8(b), where the insertion of the oversubtracted $N_6(\phi^2)$ operator
into the one-loop 1PI three-point function $\Gamma_R^{(3)}(k_1, k_2, k_3)$, $k_i = Q\hat{k}_i$, in $\phi^3_6$-theory
produces the softened graph (falling like $1/Q^2$, modulo logs) corresponding to the
insertion of the minimally subtracted operator, as in Fig. 18.8(a), but with an extra
zero-momentum subtraction which is just a constant as $Q \to \infty$. For this reason,
minimally subtracted operators are sometimes referred to as soft operators, while
their oversubtracted counterparts are termed hard operators. Nevertheless, it is the
oversubtracted operator, as we saw previously, that corresponds to mass derivatives
of the renormalized amplitude. This observation, together with the Zimmermann
identity connecting minimally and oversubtracted operators, will provide the key to
our derivation of the Callan–Symanzik equation.
We begin with the formal expansion of the 1PI $N$-point function in $\phi^3_6$-theory
(analogous to (18.18) for $\phi^4_4$-theory):

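$$ \Gamma_R^{(N)}(k_1, ..., k_N) = \langle\tilde\phi(k_1)\cdots\tilde\phi(k_N)\rangle_{\rm 1PI} \tag{18.22} $$

$$ = \sum_{r=0}^{\infty}\frac{(-1)^r\lambda_R^r}{r!(3!)^r}\int d^6z_1\, d^6z_2\cdots d^6z_r\,\langle N_6(\phi^3(z_1))N_6(\phi^3(z_2))\cdots N_6(\phi^3(z_r))\,\tilde\phi(k_1)\cdots\tilde\phi(k_N)\rangle_{\rm 1PI} \tag{18.23} $$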

The fields on the first line are fully interacting (Heisenberg) fields (we omit the usual
“H” subscript here to avoid overburdening the notation), whereas the second line
corresponds to the interaction-picture expansion. Recall that the $\langle\cdots\rangle_{\rm 1PI}$ symbol in
the second line is to be interpreted by first Wick-expanding the operator products
inside the bracket to generate a set of bare (unsubtracted) 1PI graphs,
each of which is then subjected to the forest formula to generate the appropriate
subtractions. We now define a series of zero-momentum insertion operations on the
general renormalized $N$-point function $\Gamma_R^{(N)}$ as follows:


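$$ \Delta_0\Gamma_R^{(N)} \equiv \int d^6z\,\langle N_4(\tfrac{1}{2}\phi^2(z))\,\tilde\phi(k_1)\cdots\tilde\phi(k_N)\rangle_{\rm 1PI} \tag{18.24} $$

$$ \Delta_1\Gamma_R^{(N)} \equiv \int d^6z\,\langle N_6(\tfrac{1}{2}\phi^2(z))\,\tilde\phi(k_1)\cdots\tilde\phi(k_N)\rangle_{\rm 1PI} \tag{18.25} $$

$$ \Delta_2\Gamma_R^{(N)} \equiv \int d^6z\,\langle N_6(\tfrac{1}{2}\partial_\mu\phi(z)\partial_\mu\phi(z))\,\tilde\phi(k_1)\cdots\tilde\phi(k_N)\rangle_{\rm 1PI} \tag{18.26} $$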

corresponding to insertions at zero-momentum (as a consequence of the $\int d^6z$
integration) of the minimally subtracted mass operator, the oversubtracted mass operator,
and the minimally subtracted kinetic term operator, respectively. Finally, there is the
insertion operator for an additional minimally subtracted $\phi^3$ interaction vertex (at
zero momentum):

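$$ \Delta_3\Gamma_R^{(N)} \equiv \sum_{r=0}^{\infty}\frac{(-1)^r\lambda_R^r}{r!(3!)^r}\int d^6z\, d^6z_1\cdots d^6z_r\,\langle N_6(\phi^3(z))N_6(\phi^3(z_1))\cdots N_6(\phi^3(z_r))\,\tilde\phi(k_1)\cdots\tilde\phi(k_N)\rangle_{\rm 1PI} \tag{18.27} $$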



First, note that insertions of $\int d^6z\, N_6(\frac{1}{2}(\partial_\mu\phi(z)\partial_\mu\phi(z) + m_R^2\phi^2(z)))$ introduce a
factor of an inverse propagator at a two-point vertex (i.e., a factor of $p^2 + m_R^2$ for an
internal line carrying momentum $p$) on each internal line of a given bare graph. The
inverse propagator cancels one of the two propagators adjoining the new vertex, leaving,
in other words, just the factor unity for each internal line. This insertion, corresponding
to the operation $m_R^2\Delta_1 + \Delta_2$, therefore just multiplies each bare graph by the number
of internal lines it contains, which by the usual graph topology arguments is just
$\frac{1}{2}(3r - N)$ for a graph containing $r$ basic 3-vertices. On the other hand, the power $r$
associated with each term in (18.23) is evidently obtained by the differential operation
$\lambda_R\frac{\partial}{\partial\lambda_R}$. Thus we obtain the counting identity

$$ (m_R^2\Delta_1 + \Delta_2)\Gamma_R^{(N)}(k_1, .., k_N) = -\frac{N}{2}\Gamma_R^{(N)}(k_1, .., k_N) + \frac{3}{2}\lambda_R\frac{\partial}{\partial\lambda_R}\Gamma_R^{(N)}(k_1, .., k_N) \tag{18.28} $$
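
The graph-topology step can be checked with a short Python sketch. The function names
`internal_lines` and `counting_eigenvalue` are illustrative assumptions; they only encode
the statement that, in an amputated graph with r cubic vertices and N external legs, each
internal line uses two of the 3r vertex slots and each external leg uses one.

```python
from fractions import Fraction

def internal_lines(r, n_external):
    """Number of internal lines of an amputated graph built from r cubic
    vertices with n_external external legs, from 3r = 2I + N."""
    slots = 3 * r - n_external
    if slots < 0 or slots % 2:
        raise ValueError("no such graph: 3r - N must be even and non-negative")
    return slots // 2

def counting_eigenvalue(r, n_external):
    """Right-hand side of (18.28) acting on an order-lambda^r term:
    -N/2 + (3/2) * r, since lambda d/dlambda acting on lambda^r gives r."""
    return Fraction(-n_external, 2) + Fraction(3, 2) * r

# One-loop triangle (r = 3, N = 3), one-loop self-energy (r = 2, N = 2),
# and a two-loop three-point graph (r = 5, N = 3).
for r, n in [(3, 3), (2, 2), (5, 3)]:
    assert internal_lines(r, n) == counting_eigenvalue(r, n)
```

For every admissible pair (r, N) the two functions agree, which is the content of the
counting identity at the level of a single graph.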
On the other hand, from (18.23), we find

$$ \frac{\partial}{\partial\lambda_R}\Gamma_R^{(N)}(k_1, .., k_N) = -\frac{1}{3!}\Delta_3\Gamma_R^{(N)}(k_1, .., k_N) \tag{18.29} $$

so that (18.28) may be written

$$ (m_R^2\Delta_1 + \Delta_2)\Gamma_R^{(N)}(k_1, .., k_N) = -\frac{N}{2}\Gamma_R^{(N)}(k_1, .., k_N) - \frac{\lambda_R}{4}\Delta_3\Gamma_R^{(N)}(k_1, .., k_N) \tag{18.30} $$
We pointed out earlier that an insertion of the oversubtracted $\phi^2$ operator is equivalent
to a mass derivative:

$$ \Delta_1\Gamma_R^{(N)}(k_1, .., k_N) = -\frac{\partial}{\partial m_R^2}\Gamma_R^{(N)}(k_1, .., k_N) \tag{18.31} $$

The final ingredient is the Zimmermann identity analogous to (18.12). It is convenient
to use the operator $\partial_\mu(\phi\partial_\mu\phi) = \partial_\mu\phi\partial_\mu\phi + \phi\Box\phi$, instead of $\phi\Box\phi$, in our basis of
operators, so with a slight change of notation, and with the subtraction degrees
appropriate for $\phi^3_6$-theory,

$$ N_4(\phi^2(z)) = N_6(\phi^2(z)) + r(\lambda_R, m_R)N_6(\partial_\mu\phi(z)\partial_\mu\phi(z)) + s(\lambda_R, m_R)N_6(\partial_\mu(\phi(z)\partial_\mu\phi(z))) + t(\lambda_R, m_R)N_6(\phi^3(z)) \tag{18.32} $$

Integrating over $z$, the pure derivative term proportional to $s(\lambda_R, m_R)$ vanishes, and
we have, in terms of vertex insertion operators, the final relation

$$ \Delta_0 = \Delta_1 + r(\lambda_R, m_R)\Delta_2 + \frac{1}{2}t(\lambda_R, m_R)\Delta_3 \tag{18.33} $$

which is essentially the Callan–Symanzik equation in disguised form, as we shall now
see. Note that the engineering dimension of the functions $r(\lambda_R, m_R)$, $t(\lambda_R, m_R)$ must
be $-2$ in powers of mass, so we have (as $\lambda_R$ is dimensionless)

$$ r(\lambda_R, m_R) = \frac{1}{m_R^2}f(\lambda_R), \qquad t(\lambda_R, m_R) = \frac{1}{m_R^2}g(\lambda_R) \tag{18.34} $$

where $f(\lambda_R)$ (resp. $g(\lambda_R)$) begin at order $\lambda_R^2$ (resp. $\lambda_R^3$) in perturbation theory.
Combining (18.29, 18.30, 18.31, 18.33), we find the promised Callan–Symanzik equation

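$$ \left(m_R^2\frac{\partial}{\partial m_R^2} + \beta(\lambda_R)\frac{\partial}{\partial\lambda_R} - N\gamma(\lambda_R)\right)\Gamma_R^{(N)}(k_i; \lambda_R, m_R) = \frac{m_R^2}{f(\lambda_R) - 1}\Delta_0\Gamma_R^{(N)}(k_i; \lambda_R, m_R) \tag{18.35} $$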
where we have indicated explicitly the dependence of the N -point function on the
renormalized coupling and mass, and defined the functions
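$$ \beta(\lambda_R) \equiv \frac{3}{2}\,\frac{\lambda_R f(\lambda_R) - 2g(\lambda_R)}{f(\lambda_R) - 1} \sim O(\lambda_R^3) \tag{18.36} $$

$$ \gamma(\lambda_R) \equiv \frac{1}{2}\,\frac{f(\lambda_R)}{f(\lambda_R) - 1} \sim O(\lambda_R^2) \tag{18.37} $$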
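
For readers who want to see the combination spelled out, the following is a sketch of
the algebra, using only (18.29), (18.30), (18.31), (18.33) and (18.34). The arguments of
$\Gamma_R^{(N)}$, $f$ and $g$ are suppressed, and the factor $\frac{1}{2}$ in (18.33) is taken to multiply the
$t\,\Delta_3$ term, as follows from integrating one half of (18.32) over $z$.

```latex
\begin{align*}
\Delta_2\Gamma_R^{(N)}
  &= \Big(m_R^2\frac{\partial}{\partial m_R^2} - \frac{N}{2}
      + \frac{3}{2}\lambda_R\frac{\partial}{\partial\lambda_R}\Big)\Gamma_R^{(N)}
  && \text{from (18.29), (18.30), (18.31)} \\
\Delta_0\Gamma_R^{(N)}
  &= -\frac{\partial\Gamma_R^{(N)}}{\partial m_R^2}
     + \frac{f}{m_R^2}\,\Delta_2\Gamma_R^{(N)}
     - \frac{3g}{m_R^2}\frac{\partial\Gamma_R^{(N)}}{\partial\lambda_R}
  && \text{from (18.33), (18.34), (18.29)} \\
  &= \frac{f-1}{m_R^2}\Big(m_R^2\frac{\partial}{\partial m_R^2}
     + \frac{3}{2}\,\frac{\lambda_R f - 2g}{f-1}\,\frac{\partial}{\partial\lambda_R}
     - \frac{N}{2}\,\frac{f}{f-1}\Big)\Gamma_R^{(N)}
\end{align*}
```

Multiplying through by $m_R^2/(f - 1)$ reproduces (18.35), with the coefficient functions
read off as in (18.36) and (18.37).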

We have already seen that in the large-momentum (or short-distance) limit where
$k_i = Q\hat{k}_i$ with $Q$ large and the $\hat{k}_i$ non-exceptional and fixed, the insertion of the soft
mass operator effected by the vertex operation $\Delta_0$ on the right-hand side of (18.35)
suppresses the asymptotic behavior of our $N$-point function by two powers of $Q$, so
neglecting such contributions we find a homogeneous equation which must be obeyed
by the leading high-momentum contributions to the amplitude (up to inverse powers
of $Q$):

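$$ D_{CZ}\Gamma_R^{(N)}(k_i; \lambda_R, m_R) \equiv \left(m_R^2\frac{\partial}{\partial m_R^2} + \beta(\lambda_R)\frac{\partial}{\partial\lambda_R} - N\gamma(\lambda_R)\right)\Gamma_R^{(N)}(k_i; \lambda_R, m_R) \approx 0 \tag{18.38} $$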
In other words, the particular combination of mass and coupling derivatives contained
in the Callan–Symanzik operator DCZ is exactly equivalent, to all orders of perturba-
tion theory, to an insertion of a soft mass operator, and must therefore suppress, by
powers, the asymptotic behavior of any $N$-point (Euclidean) 1PI amplitude provided
all external momenta are taken large. Eq. (18.38) as it stands is not obviously useful,
as we are hardly in a position to explore the response of physical amplitudes to a
change in the mass of the particles being scattered. However, we can translate the
dependence on mass into one on uniformly rescaled momenta for the process by simple
dimensional analysis. Let $d_N$ be the engineering dimension of $\Gamma_R^{(N)}$ in powers of mass
(thus, $d_N = 4 - N$ for $\phi^4_4$-theory, $6 - 2N$ for $\phi^3_6$-theory). The total powers of mass
and momentum in each term (the coupling $\lambda_R$ is dimensionless) contributing to this
1PI amplitude must therefore be $d_N$, which we may express with the usual Euler
derivative:
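$$ \left(\kappa\frac{\partial}{\partial\kappa} + m_R\frac{\partial}{\partial m_R}\right)\Gamma_R^{(N)}(\kappa k_i; \lambda_R, m_R) = d_N\Gamma_R^{(N)}(\kappa k_i; \lambda_R, m_R) \tag{18.39} $$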

We may therefore eliminate the mass derivative in (18.38) to obtain the asymptotic
equation

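$$ \left(-\frac{1}{2}\kappa\frac{\partial}{\partial\kappa} + \beta(\lambda_R)\frac{\partial}{\partial\lambda_R} - N\gamma(\lambda_R) + \frac{1}{2}d_N\right)\Gamma_R^{(N)}(\kappa k_i; \lambda_R, m_R) = 0 \tag{18.40} $$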
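
The elimination step is short enough to write out. This sketch assumes only the chain
rule $m_R\,\partial/\partial m_R = 2m_R^2\,\partial/\partial m_R^2$ together with (18.39):

```latex
m_R^2\frac{\partial}{\partial m_R^2}\Gamma_R^{(N)}(\kappa k_i;\lambda_R,m_R)
  = \frac{1}{2}\,m_R\frac{\partial}{\partial m_R}\Gamma_R^{(N)}(\kappa k_i;\lambda_R,m_R)
  = \frac{1}{2}\Big(d_N - \kappa\frac{\partial}{\partial\kappa}\Big)
    \Gamma_R^{(N)}(\kappa k_i;\lambda_R,m_R)
```

Substituting this into (18.38) gives (18.40) term by term.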

In Section 18.3 we shall return to (18.40), and show how to solve a homogeneous
partial differential equation of this type to constrain the large-momentum asymptotic
behavior of an arbitrary amplitude in a perturbatively renormalizable field theory. It
will also be seen there that a very similar equation can be derived in a completely
different way using renormalization group ideas, and the connection between the two
(involving the concept of mass singularities) will be explained.
The derivation of the Callan–Symanzik equation for fully amputated 1PI amplitudes
can be easily generalized to take care of the case when some or all of the
external propagators are present. For example, if our amplitude contains all external
legs, the number of lines in each basic graph is $\frac{1}{2}(3r + N)$ (instead of $\frac{1}{2}(3r - N)$
in the fully amputated case) for a graph containing $r$ basic 3-vertices, and the end
result is a Callan–Symanzik operator with a change of sign in the $N\gamma(\lambda_R)$ term in
(18.38). Similarly, if the amplitude is “half-amputated”, with only half the external
legs amputated, the $\gamma(\lambda_R)$ term is absent.
The functions³ $\beta(\lambda_R)$ (the famous “β function” of renormalization group lore) and
$\gamma(\lambda_R)$ (which, for reasons to be seen later, is termed the “anomalous dimension”
of the scalar field) could be computed order by order in perturbation theory by
first determining the coefficient functions in the Zimmermann identity perturbatively
(paying very careful attention to the forest formula!), but it is more convenient to
extract them by simply applying the Callan–Symanzik equation to two independent
1PI $N$-point functions, in the asymptotic large momentum limit where the right-hand
side may be neglected. For example, to one-loop order, we may use the two-point 1PI
function (inverse propagator) and three-point function, which a short calculation
(Problem 4) reveals to have the form:

$$ \Gamma_R^{(2)}(k) = k^2 + m_R^2 - \frac{\lambda_R^2}{128\pi^3}\int_0^1\left\{(x(1-x)k^2 + m_R^2)\ln\left(1 + x(1-x)k^2/m_R^2\right) - x(1-x)k^2\right\}dx + O(\lambda_R^4) \tag{18.41} $$

3 Modulo a noisome factor of $\frac{1}{2}$; see below.

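$$ \Gamma_R^{(3)}(k_1, k_2, k_3) = \lambda_R + \lambda_R^3\left\{\int\frac{1}{l^2 + m_R^2}\,\frac{1}{(l + k_1)^2 + m_R^2}\,\frac{1}{(l + k_1 + k_2)^2 + m_R^2}\,\frac{d^6l}{(2\pi)^6} - \int\frac{1}{(l^2 + m_R^2)^3}\,\frac{d^6l}{(2\pi)^6}\right\} + O(\lambda_R^5) \tag{18.42} $$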

Note that the self-energy term in (18.41) (given by the integral over the Feynman
parameter $x$) begins at order $k^4$ at small momentum, in keeping with the BPHZ
subtraction of a quadratically divergent integral. The three-point function (triangle
graph in Fig. 18.8(a), without the mass insertion) is logarithmically divergent and
therefore receives a single subtraction at zero momentum, as indicated in (18.42).
The Callan–Symanzik operator applied to $\Gamma_R^{(2)}(k)$ produces a dominant contribution
proportional to $k^2$ at large $k$ and order $\lambda_R^2$ with contributions from the mass derivative
and anomalous dimension term:

$$ m_R^2\,\frac{\lambda_R^2}{128\pi^3}\,\frac{1}{6}\,\frac{k^2}{m_R^2} - 2\gamma(\lambda_R)k^2 = 0 \;\Rightarrow\; \gamma(\lambda_R) = \frac{\lambda_R^2}{1536\pi^3} + O(\lambda_R^4) \tag{18.43} $$
while the same considerations applied to the three-point function $\Gamma_R^{(3)}(k_1, k_2, k_3)$ (in
this case the mass derivative of the first integral in (18.42) is asymptotically suppressed
and can be thrown away), using

$$ m_R^2\frac{\partial}{\partial m_R^2}\int\frac{1}{(l^2 + m_R^2)^3}\,\frac{d^6l}{(2\pi)^6} = -3m_R^2\int\frac{1}{(l^2 + m_R^2)^4}\,\frac{d^6l}{(2\pi)^6} = -\frac{1}{128\pi^3} \tag{18.44} $$

imply

$$ \frac{1}{128\pi^3}\lambda_R^3 + \beta(\lambda_R) - 3\lambda_R\gamma(\lambda_R) = 0 \;\Rightarrow\; \beta(\lambda_R) = -\frac{3\lambda_R^3}{512\pi^3} + O(\lambda_R^5) \tag{18.45} $$
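
The numerical coefficients in (18.43), (18.44) and (18.45) can be cross-checked with a
short standard-library Python sketch. The helper names `simpson` and
`mass_derivative_of_subtraction` are hypothetical; the check assumes the area $\pi^3$ of the
unit 5-sphere for the angular integration and uses the substitution $l = m_R\tan\theta$ for
the radial integral.

```python
import math
from fractions import Fraction

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def mass_derivative_of_subtraction(m2=1.0):
    """m^2 d/dm^2 of the zero-momentum subtraction integral in (18.42),
    i.e. -3 m^2 * Int d^6l/(2 pi)^6 (l^2 + m^2)^(-4), as in (18.44)."""
    # Radial integral of l^5/(l^2+m^2)^4 after the substitution l = m tan(theta).
    radial = simpson(lambda t: math.sin(t) ** 5 * math.cos(t), 0.0, math.pi / 2) / m2
    # The unit 5-sphere has area pi^3, so d^6l/(2 pi)^6 -> l^5 dl / (64 pi^3).
    return -3.0 * m2 * radial / (64.0 * math.pi ** 3)

# Coefficients in units of lambda_R^n / pi^3.
x_integral = Fraction(1, 6)                        # Int_0^1 x(1-x) dx
gamma_coeff = Fraction(1, 128) * x_integral / 2    # solving (18.43) for gamma
beta_coeff = 3 * gamma_coeff - Fraction(1, 128)    # solving (18.45) for beta

print(gamma_coeff, beta_coeff)
print(mass_derivative_of_subtraction(), -1 / (128 * math.pi ** 3))
```

The exact fractions come out as 1/1536 and −3/512, and the numerical integral is
independent of the mass, as (18.44) requires.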
The significance of the (at first sight innocent) negative sign appearing in the β
function can scarcely be overstated: it leads, as we shall see in Section 18.3, to the
critical property of asymptotic freedom, implying that the theory becomes effectively
weakly coupled at large momenta, restoring the quantitative usefulness of perturbation
theory even in theories which are (at low momenta) strongly coupled. The unphysical
$\phi^3_6$-theory serving as our toy example here shares this remarkable property with
non-abelian gauge theories generally, and with QCD in particular. The discovery and
proper interpretation in 1973 of asymptotic freedom was rewarded in 2004 by the
conferral of the Nobel Prize in Physics to Gross, Politzer, and Wilczek. But before
going on to describe the special features of asymptotically free theories in more detail,
we shall generalize our discussion of high-momentum behavior of amplitudes given
so far to situations in which, as described in the introduction to this chapter, only a
proper subset of the external momenta are large. The required generalization will lead
us directly to the Wilson operator product expansion.
Considerations of space prevent us from describing many of the quite beautiful
applications of the Zimmermann normal product formalism for local composite
operators. Suffice it to say that the use of normal products allows one to write the Lagrangian
(Heisenberg) field equations of the theory as rigorous relations, correct to all orders
of renormalized perturbation theory, between well-defined renormalized composite
operators. Likewise, the current conservation properties in theories with global or
local symmetries can be expressed as precise operator statements,
and, in the case in which anomalies appear in these currents, they can be seen to
arise automatically as a natural consequence of a Zimmermann identity relating the
minimally- and over-subtracted versions of the naive (classical) divergence of the
current. For more on these fascinating and deep results, the reader is encouraged
to consult the Erice lectures of Lowenstein (Lowenstein, 1976).

18.2 Factorizable structure of field theory amplitudes: the operator product expansion
At the beginning of this chapter we indicated that the methods of perturbative
renormalization, originally formulated to isolate and extract the dominant dependence
of field-theoretic amplitudes on a single large momentum scale Λ, beyond which we
lose precise control of the dynamics of the theory, can be generalized to the problem
of extracting the asymptotic dependence of already renormalized amplitudes on a
physical large momentum scale Q, where Q is much larger than the masses, dimen-
sionful couplings, and other momentum scales in the problem. The first indications
that something of this kind might be possible go back to Wilson’s hypothesis (Wilson,
1969) of a short-distance expansion for the product of local operators (in the original
application, these were hadronic current operators) at nearby spacetime points:

$$ A(x + \xi/2)A(x - \xi/2) \to \sum_i C_i(\xi)O_i(x), \qquad \xi \to 0,\ \xi^2 < 0 \tag{18.46} $$

Here, $x$, $\xi$ are Minkowski spacetime coordinates, and the limit $\xi \to 0$ is presumed to
be taken from the space-like direction.⁴
The Wilson coefficient functions Ci (ξ) (which may carry Lorentz indices) are UV-
finite functions which are ordered in the sum to be decreasingly singular as ξ → 0, while
the local operators Oi are well-defined renormalized local composite operators of the
kind discussed in the preceding section, ordered according to increasing engineering
dimension. The expansion acquires a definite meaning in the weak convergence sense,
once we sandwich both sides of (18.46) between definite initial and final states, at
which point the remainder terms at any finite point in the expansion are asserted to
vanish more rapidly as ξ → 0 than the kept terms (see (Zimmermann, 1970)). The
operators Oi appearing in this expansion are just those which are capable of mixing
with the bilocal operator on the left under renormalization, subject to symmetries of
the theory, in a sense to be made precise below. In order to extract useful information
from an expansion of this sort, Wilson was obliged to assume (incorrectly) that the
strong interactions behaved in a scale-invariant way, at which point the dependence of
the coefficient function Ci (ξ) on ξ becomes power-like, with the power related simply
to the scale dimension of the associated composite operator Oi .

4 We shall see below that the structure of the expansion is considerably altered if the local limit is
approached from the light-cone direction, with $\xi^2 = 0$.

We now know, from the rigorous work of Zimmermann, that the expansion is indeed
correct in renormalized perturbation theory, but that the short-distance behavior of
the coefficient functions is more complicated, involving, in general, logarithms as well
as powers of ξ. Nevertheless, up to logarithms, the leading power behavior of the
Wilson coefficient functions (in perturbation theory) is still associated in a simple
way, as we shall see below, with the engineering dimension of the associated composite
operator, in such a way that each additional power of mass dimension in the operator
corresponds to a softening of the short-distance behavior of the associated Wilson
coefficient by a power of ξ (modulo logarithms).
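
The power counting described here follows from dimensional analysis alone. As a sketch,
write $d_A$ and $d_i$ for the engineering (mass) dimensions of $A$ and $O_i$ (these symbols are
introduced here for illustration and are not the book's notation), and ignore logarithms
and any explicit mass dependence of the coefficient:

```latex
C_i(\xi) \sim |\xi|^{\,d_i - 2d_A} \times (\text{logarithms}), \qquad \xi \to 0
```

Raising $d_i$ by one unit therefore multiplies the coefficient by one extra power of $|\xi|$,
which is the softening referred to above.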
The connection of such an expansion—a sort of operator generalization of the
Taylor expansion (though, as typical in field theory, at best an asymptotic and not
a convergent one)—to the promised separation of large momentum behavior becomes
clear once we consider a definite matrix element of (18.46) (with x = 0) and Fourier
transform the ξ variable:

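$$ T(q, k_i, k'_i) \equiv \int d^4\xi\, e^{iq\cdot\xi}\,\langle k'_i|T\{A(\xi/2)A(-\xi/2)\}|k_i\rangle \tag{18.47} $$

$$ T(q, k_i, k'_i) \to \sum_i \tilde{C}_i(q)\,\langle k'_i|O_i(0)|k_i\rangle, \qquad q^2 \to -\infty \tag{18.48} $$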

The amplitude $T(q, k_i, k'_i)$ corresponds to a situation in which a large space-like
momentum $q$ is introduced and then removed (by the $A$ fields) from a set of particles
at fixed low momenta $k_i$, $k'_i$, as indicated in Fig. 18.9.⁵ Here, the $\tilde{C}_i(q)$ are the
Fourier transforms of the coordinate space Wilson coefficient functions $C_i(\xi)$ appearing
in (18.46): by the usual properties of Fourier transformation, successive coefficient
functions $C_i(\xi)$ with (as discussed above) additional powers of $\xi$ in their short-distance
behavior correspond to additional inverse powers of $Q \equiv \sqrt{-q^2}$ in the asymptotics of

Fig. 18.9 Momentum space amplitude $T(q, k_i, k'_i)$ corresponding to the insertion of a bilocal
operator.

5 This is a Euclidean analog of the forward scattering amplitude T (k, q; k, q) discussed in Section 6.6:
we shall see later how to extend the use of the OPE to Minkowskian situations of this sort.

$\tilde{C}_i(q)$ for large $Q$. Thus, the leading asymptotic behavior of $T(q, k_i, k'_i)$ for large $Q$ is
determined by the leading term(s) in the expansion, in many cases, by a single operator
of minimal engineering dimension, provided, of course, that such an operator possesses
a non-vanishing matrix element between the initial and final states indicated in (18.48).
A glance at this formula also shows that the general dependence of our amplitude
on “large” $q$ and “small” $k_i$, $k'_i$ momenta has been factorized in the expansion. This
factorization property of amplitudes is a deep consequence of the dynamics of local
field theories, and the Wilson OPE is the most direct expression thereof.
Our discussion of the Wilson operator product expansion (henceforth, OPE) will
take place entirely in the arena of momentum space: i.e., in the form given in (18.48).
There are several reasons for this. First, we are primarily interested in the behavior
of S-matrix amplitudes at high energy, which are naturally formulated directly as
momentum-space objects. But more importantly, the physical intuition underlying the
emergence of the factorization properties of amplitudes is far more easily acquired by
an examination of the behavior of large momentum flows in graphical amplitudes than
by direct consideration of the corresponding coordinate space amplitudes. Moreover,
there are generalizations of the OPE (the cut vertex formalism of Mueller
(Mueller, 1981) is an example) in which non-local operators appear, and which do not
even have a natural expression in terms of the coordinate space asymptotic behavior
of amplitudes. For all these reasons, our discussion of the OPE, and more generally,
the factorization property, will be given in terms of momentum-space amplitudes.
The basic strategy underlying Zimmermann’s proof of the OPE can easily be
illustrated with a simple example. As usual, the $\phi^3_6$ theory provides a convenient
stage for displaying the central idea. We consider the case where the local field $A$
in (18.47) is just the canonical $\phi$ field, with a single incoming and outgoing particle
carrying momentum $k$. In Euclidean space, the corresponding amplitude is given by
the connected contributions to the correlation function

$$T(q, k) = \int d^6\xi\, e^{iq\cdot\xi}\, \langle \phi(\xi/2)\phi(-\xi/2)\tilde{\phi}(k)\tilde{\phi}(-k)\rangle \tag{18.49}$$

with the lowest-order graph displayed in Fig. 18.10. The external propagators associated
with the fields carrying momentum $\pm k$ are assumed to be truncated (as indicated

Fig. 18.10 Lowest-order contribution to $T(q, k)$.



Fig. 18.11 (a) One-loop graph giving divergent contribution to $\phi^2(0)$. (b) Zero-momentum
subtraction renormalizing $\phi^2(0)$.

by crossbars), but the propagators on the left, carrying momentum $q$, are not. The
local limit $\xi \to 0$ of the operator product in (18.49) corresponds to integrating $T(q, k)$
over $q$, thereby obtaining a $\delta$-function setting $\xi$ to zero. This corresponds, of course,
to the one-loop graph indicated in Fig. 18.11(a), which is logarithmically divergent.
This simply indicates that, as we have seen above, the composite operator $\phi^2(0)$
is ultraviolet-divergent and requires renormalization. The divergence is, of course,
removed by the zero-momentum subtraction indicated in Fig. 18.11(b), leading to a
finite result which we interpret as arising from the minimally subtracted composite
operator $N_4(\phi^2(0))$. This subtraction is effective in making the one-loop integral finite
(and this is the crucial point!) precisely because it removes the dominant dependence of
the graph in Fig. 18.11(a) at large momentum $q$. This suggests that we can introduce
an oversubtracted bilocal operator $N_4(\phi(\xi/2)\phi(-\xi/2))$, with Fourier transform given
to lowest order by the graphs indicated in Fig. 18.12, with a finite integral over $q$, and
therefore with the finite local limit $N_4(\phi(\xi/2)\phi(-\xi/2)) \to N_4(\phi^2(0))$, $\xi \to 0$.
The term “oversubtraction” is appropriate here as the tree diagram giving the
leading order contribution to the amplitude containing the bilocal operator is already
finite and does not in that sense “need” a subtraction. However, the dominant
asymptotic behavior for large q, k fixed, of T (q, k) is closely related to exactly the extra
subtractions introduced to define this oversubtracted bilocal operator, which in the
forest language correspond to counting as renormalization parts those 1PI subgraphs
which would become divergent when the vertices ±ξ/2 associated with the bilocal
operator are pinched to a point (turning Fig. 18.12 into Fig. 18.11).
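
The effect of the lowest-order oversubtraction can be made explicit with a short worked equation. This is a sketch only: coupling and combinatoric factors are dropped, the legs carrying $q$ are stripped off, and the symbol $t(q, k)$ for the remaining tree propagator is a label introduced here, not notation from the text.

```latex
% Tree graph of Fig. 18.10 with all external legs removed (Euclidean):
t(q,k) = \frac{1}{(q+k)^2 + m^2} \;\sim\; \frac{1}{Q^2}
% Zero-momentum subtraction, as in Fig. 18.12:
t(q,k) - t(q,0)
   = \frac{1}{(q+k)^2 + m^2} - \frac{1}{q^2 + m^2}
   = -\frac{2\,q\cdot k + k^2}{\big((q+k)^2 + m^2\big)\big(q^2 + m^2\big)}
   \;\sim\; \frac{1}{Q^3}
```

The numerator grows only linearly in $Q$ while the denominator grows like $Q^4$, so the subtraction gains one power of $Q$ at fixed $k$.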
This insight, combined with clever use of forest formula techniques, allowed
Zimmermann to provide a rigorous, all-orders proof of the OPE in a very general
context. We shall sketch the proof here, but the basic steps will be translated from
the language of Zimmermann forests into explicit graphical expressions where the
structure of the subtractions, and their relation to the dominant large momentum
flows in the diagrams, will be more physically intuitive. Also, we shall restrict ourself
to the leading term in the expansion, as forest formula techniques are more or less
indispensable in handling the combinatorics of the subleading terms in the expansion.
We begin with the Euclidean case, before going on to the phenomenologically more
important light-cone expansion.

Fig. 18.12 Over-subtraction of the bilocal operator $\int d^6\xi\, e^{iq\cdot\xi} N_4(\phi(\xi/2)\phi(-\xi/2))$.

Fig. 18.13 One-loop two-particle-reducible contribution to $T(q, k)$.

Our graphical demonstration of the OPE will depend on a crucial property of two-
particle-irreducible diagrams (or “kernels”: cf. Sections 10.4, 11.2). Consider first the
one-loop “box” diagram contribution to T (q, k) indicated in Fig. 18.13. Recalling that
external propagators are amputated on the right side only, this amplitude (ignoring
combinatoric and coupling factors) takes the form

$$I_{1\,\mathrm{loop}}(q, k) = \frac{1}{(q^2 + m^2)^2} \int \frac{1}{(l^2 + m^2)^2}\, \frac{1}{(q - l)^2 + m^2}\, \frac{1}{(l + k)^2 + m^2}\, \frac{d^6 l}{(2\pi)^6} \tag{18.50}$$

For large $Q \equiv \sqrt{q \cdot q}$, the integral (ignoring the external propagator factors) is UV-finite
and of order $1/Q^2$, as we can see by examining the contributions of the possible
regions of large momentum (of order $Q$) flow through the diagram. In particular, we
have:
1. The region where the loop momentum is large, i.e., $l_\mu \sim Q$, with phase-space
volume $Q^6$ and integrand of order $1/Q^8$.
2. The region in which the large momentum $q$ flows in and out of the diagram
entirely through the propagator carrying momentum $q - l$, corresponding to loop
momentum $l_\mu \sim k_\mu, m \ll Q$. This region also contributes asymptotic behavior
$1/(q - l)^2 \sim 1/Q^2$.
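
The two estimates in the list above can be written out as explicit power counting. This is a sketch under the same conventions as the text: external $q$-propagators are ignored and masses are dropped wherever the momentum is hard.

```latex
% Region 1: hard loop momentum, l_\mu \sim Q
\int d^6 l \;\frac{1}{(l^2)^2}\,\frac{1}{(q-l)^2}\,\frac{1}{(l+k)^2}
   \;\sim\; Q^{6}\cdot Q^{-4}\cdot Q^{-2}\cdot Q^{-2} \;=\; Q^{-2}
% Region 2: soft loop momentum, l_\mu \sim k_\mu, m \ll Q; only one line carries q
\frac{1}{(q-l)^2 + m^2}\int_{l \sim m} \frac{d^6 l}{(2\pi)^6}\,
   \frac{1}{(l^2+m^2)^2}\,\frac{1}{(l+k)^2+m^2}
   \;\sim\; Q^{-2}\times\big(Q\text{-independent}\big)
```

Both regions therefore contribute at the same order $1/Q^2$, which is why neither can be neglected for this two-particle-reducible graph.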
The presence of the second region implies that a zero-momentum subtraction $I(q, k) -
I(q, 0)$ does not reduce the overall asymptotic behavior. On the other hand, in the corresponding
over-subtracted graph, $I_{1\,\mathrm{loop}}(q, k) - I_{1\,\mathrm{loop}}(q, 0)$, it is easy to see that the

contribution of the first region, where large momentum permeates the entire graph,
is suppressed to at least order $1/Q^3$, as the effect of the small incoming momentum
$k$ is subdominant once all the internal propagators of the loop are far off-shell (of
order $Q^2$). The reader may easily verify this assertion explicitly by constructing the
subtracted integrand and subjecting it to the simple power-counting analysis along the
lines just followed above. The presence of a dominant contribution in regions where a
subset of lines remain soft (low momentum) is clearly connected to the two-particle
reducibility of our box diagram: the large momentum $Q$ is afforded a rapid exit route
from the diagram on the left, via the single left-most internal propagator. On the other
hand, a two-particle-irreducible (2PI) diagram such as the one indicated in Fig. 18.14,
while still of order $1/Q^2$ for large $Q$, receives its entire dominant contribution from the
region of large momentum flow through the entire diagram: i.e., $l_\mu \sim Q$. The Feynman
integral in this case is

$$K_{1\,\mathrm{loop}}(q, k) = \frac{1}{(q^2 + m^2)^2} \int \frac{1}{l^2 + m^2}\, \frac{1}{(q - l)^2 + m^2}\, \frac{1}{(l + k)^2 + m^2}\, \frac{1}{(k + l - q)^2 + m^2}\, \frac{d^6 l}{(2\pi)^6} \tag{18.51}$$

and we see immediately that the contribution of the second region $l_\mu \sim k_\mu, m$ to the
integral is of order $1/Q^4$. The reason is simply that in the 2PI case the large momentum
$q$ entering at the bottom left is forced to flow through at least two internal lines of
the graph in order to exit the graph on the upper left-hand side.$^6$ This property
generalizes to an arbitrary multi-loop 2PI contribution to $T(q, k)$, so defining the sum
of all such 2PI graphs as $K(q, k)$, we conclude that the zero-momentum subtraction
$K(q, k) - K(q, 0)$ softens the asymptotic behavior by at least a power of $Q$, along the
same lines as discussed above for the large-momentum region of the box diagram. This
property is all that we shall need below to show that the oversubtractions introduced

Fig. 18.14 One-loop two-particle irreducible contribution to $T(q, k)$.

6 One of the corollaries of Weinberg’s power-counting theorem discussed in Section 17.1 provides a
rigorous estimate for the asymptotic behavior of any convergent Feynman integral in terms of exactly the
minimal routing argument given here, so the reader may be assured that the asserted behavior is on a very
solid footing.

Fig. 18.15 Ladder expansion for $T(q, k)$ in terms of 2PI kernels $K$ (schematically, $T = K + KK + KKK + \dots$).

to define the bilocal operator $N_4(\phi(\xi/2)\phi(-\xi/2))$ are just those needed to obtain a
factorized expression for the leading asymptotic behavior.
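
The softening property of 2PI kernels used here can be checked by power counting on the one-loop kernel (18.51). This is an illustrative sketch: external $q$-propagators are stripped off and masses are dropped where the momentum is hard.

```latex
% Region 1: all four internal lines hard, l_\mu \sim Q
\int d^6 l \;\frac{1}{l^2}\,\frac{1}{(q-l)^2}\,\frac{1}{(l+k)^2}\,\frac{1}{(k+l-q)^2}
   \;\sim\; Q^{6}\cdot Q^{-8} \;=\; Q^{-2}
% Region 2: soft loop momentum, l_\mu \sim k_\mu, m; two lines still carry q
\frac{1}{(q-l)^2+m^2}\,\frac{1}{(k+l-q)^2+m^2} \;\sim\; Q^{-2}\cdot Q^{-2} \;=\; Q^{-4}
% Effect of the zero-momentum subtraction on a hard line carrying k:
\frac{1}{(l+k)^2+m^2} - \frac{1}{l^2+m^2}
   \;=\; -\frac{2\,l\cdot k + k^2}{\big((l+k)^2+m^2\big)\big(l^2+m^2\big)}
   \;\sim\; Q^{-3} \quad\text{(instead of } Q^{-2}\text{)}
```

Since the dominant region is the one in which every line is hard, the subtraction $K(q, k) - K(q, 0)$ gains one power of $Q$ for the whole kernel, whereas for the box diagram the soft region would survive unsuppressed.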
The introduction of two-particle irreducible kernels simplifies the description of the
oversubtraction procedure needed to exhibit the emergence of an operator product
expansion, by simplifying the graphical structure of the 1PI contributions to the
amplitudes $T(q, k_i, k_i')$ of Fig. 18.9. In particular, the 1PI contributions to these
amplitudes$^7$ may be expressed as a sum of ladder graphs in which 2PI kernels are
iterated, as shown in Fig. 18.15 (for the 2-2 case $T(q, k)$). Each 2PI kernel in Fig.
18.15 is itself a sum of infinitely many graphs, of which a few of the lowest-order ones
are shown in Fig. 18.16.
It should be emphasized that Figs. 18.15 and 18.16 are skeleton graphs: each
propagator line actually represents the full renormalized scalar propagator, including
all possible self-energy corrections, with their associated BPHZ subtractions, and each
vertex where three propagator lines meet at a point a full 1PI three-point vertex
function including all BPHZ subtractions from any renormalization parts it may
contain. In this way, we ensure that the set of ladder graphs in Fig. 18.15 indeed
contains all the graphs making up the fully renormalized 1PI amplitude T (q, k).
Another critical point here is one which we encountered earlier in our treatment
of perturbative renormalization in Chapter 17: the subtractions induced by the
counterterms of the theory produce UV-finite amplitudes with a dependence on the

Fig. 18.16 Low-order skeleton graphs contributing to the 2PI kernel $K(q, k)$.

7 We note here that in a $\phi^3_6$ theory there are also one-particle reducible graphs contributing to $T(q, k_i, k_i')$,
as the absence of a discrete $\phi \to -\phi$ symmetry allows the large momentum $q$ to flow through a subgraph
connected only by a zero-momentum propagator to the part containing the small momenta $k_i, k_i'$. This
means that among the operators $O_i$ appearing on the right-hand side of the OPE (18.48) in this theory
is the scalar field $\phi$ itself. We shall ignore these graphs, which do not occur in the QCD/QED analogs of
$\phi^3_6$-theory, as an amplitude in which two photons carry large momentum $q$ in and out of a graph cannot be
connected by a single gluon (or photon) to the rest of the diagram, by Lorentz-invariance. Thus, we shall
only consider the 1PI contributions to the $T(q, k_i, k_i')$ amplitudes in the following.

Fig. 18.17 Oversubtracted amplitude $T(q, k)$: graphs (a) to (f), whose sum is of order $O(1/Q^3)$.

external momenta which is modified from that given by naive power-counting by at
most powers of logarithms. As our discussion of factorization treats only the (integer)
power of the large momentum Q as determinative of the leading asymptotic behavior,
logarithmic factors are ignorable in isolating subdominant asymptotic terms. Thus, the
replacement of free propagators and vertices by their full counterparts in Fig. 18.16
does not alter our previous conclusions vis-à-vis the flow of large momentum through
2PI kernels. In particular, a single subtraction of such a kernel suffices to lower the
asymptotic behavior at large Q by at least a power of Q.
In Fig. 18.17 we show the first few diagrams (namely, the graphs (a) and (c))
in the skeleton expansion of $T(q, k)$, supplemented by a set of subtraction terms
corresponding to the oversubtractions needed to define the $N_4(\phi(x)\phi(y))$ operator
discussed above. The point of these subtraction terms is just to remove the leading
asymptotic dependence of the basic terms (a) and (c) in the large $Q$ limit. The
uninteresting full propagators carrying momentum $q$ in and out of the diagram on the
left are, until further notice, truncated (as indicated by the crossbars), as they supply
simply an overall $Q$ dependence (of order $1/Q^4$) in all the diagrams. Accordingly, the
asymptotic dependence of the basic graphs (a) and (c), prior to subtraction, is, modulo
logarithms, $1/Q^2$.
We have previously explained why the subtraction effected by the graph (b) reduces
the dependence of (a) by at least a single power of $Q$. An analysis of the possible large
momentum flows through the graph (c) allows us to establish a similar suppression
for the graphs on the second line of Fig. 18.17. For example, if the loop momentum
$l_\mu \sim Q$, then the large momentum irrigates both 2PI kernels, and the subtraction is
effective separately between graphs (c) and (d), and between graphs (e) and (f). On
the other hand, if $l_\mu \sim k_\mu \sim m$, the cancellation occurs between graphs (c) and (e), and
between (d) and (f), as only the kernel on the left is irrigated by large momentum. The
result is that the sum of diagrams in Fig. 18.17 is of order $1/Q^3$ at large $Q$, rather than
$1/Q^2$. The extra subtractions appearing here are just those needed if we were to pinch

the external vertices $x$ and $y$ in Fig. 18.15 together to construct an insertion of the
composite minimally subtracted $N_4(\phi^2)$, which is finite precisely because the resultant
loop integral over $q$ has asymptotic behavior $\int (1/q^4 \cdot 1/q^3)\, d^6 q$ and is therefore UV-finite.
The reader should verify this by explicitly constructing the forests appearing in
the renormalized amplitudes for the $N_4(\phi^2)$ operator (see Problem 5).
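
The convergence of the pinched loop integral follows from radial power counting in six Euclidean dimensions. In this sketch $\Lambda$ is a hypothetical ultraviolet cutoff introduced only to display the behavior of the integral; it is not part of the text's subtraction scheme.

```latex
% d^6 q \propto Q^5\, dQ at large Q; two external propagators give 1/Q^4.
\text{unsubtracted } (T \sim 1/Q^2):\quad
   \int^{\Lambda} \frac{1}{Q^4}\cdot\frac{1}{Q^2}\; Q^5\, dQ
   = \int^{\Lambda} \frac{dQ}{Q} \;\sim\; \ln\Lambda
\text{oversubtracted } (T \sim 1/Q^3):\quad
   \int^{\Lambda} \frac{1}{Q^4}\cdot\frac{1}{Q^3}\; Q^5\, dQ
   = \int^{\Lambda} \frac{dQ}{Q^2} \;\longrightarrow\; \text{finite as } \Lambda \to \infty
```

The single power of $Q$ gained by the oversubtraction is exactly what turns the logarithmic divergence of the composite operator into a convergent integral.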
If we examine the subtraction terms appearing in Fig. 18.17 closely, which we now
realize incorporate exactly the leading asymptotic behavior of $T(q, k)$, a remarkable
property emerges: they factorize algebraically into functions of the large momentum
$q$ and the remaining “small” momentum $k$. Indeed, transferring the subtraction terms
to the right-hand side, we obtain the graphical equation indicated in Fig. 18.18, where
the dependence on $q$ is isolated in the set of graphs in the first parenthesis (which
sum simply to $T(q, 0)$), while the second parenthesis contains exactly the graphs
contributing to the insertion of the minimally subtracted composite $N_4(\phi^2)$ operator
in the 1-1 matrix element. In terms of Euclidean correlation functions, reinserting the
external legs carrying momentum $q$ on the left, we have the asymptotic result

$$\langle \tilde{\phi}(q)\tilde{\phi}(-q)\tilde{\phi}(k)\tilde{\phi}(-k)\rangle_{1PI} \sim \tilde{C}_{\phi^2}(q)\, \langle N_4(\phi^2(0))\tilde{\phi}(k)\tilde{\phi}(-k)\rangle_{1PI} + O(1/Q^3) \tag{18.52}$$

where the Wilson coefficient function $\tilde{C}_{\phi^2}(q)$ is given in this case simply by setting
the small momenta to zero in the full 1PI amplitude, and is clearly of order $1/Q^2$ for
large $Q$. This result generalizes straightforwardly to amplitudes with more than two
low-momentum fields,

$$\langle \tilde{\phi}(q)\tilde{\phi}(-q)\tilde{\phi}(k_1)\tilde{\phi}(k_2) \cdots \tilde{\phi}(k_n)\rangle_{1PI} \sim \tilde{C}_{\phi^2}(q)\, \langle N_4(\phi^2(0))\tilde{\phi}(k_1)\tilde{\phi}(k_2) \cdots \tilde{\phi}(k_n)\rangle_{1PI} + O(1/Q^3) \tag{18.53}$$

Indeed, in this case, the ladder expansion of the amplitude $T(q, k_1, k_2, \dots, k_n)$ terminates
on the right with a 2PI kernel with two incoming lines on the left and
$n > 2$ (small momentum) lines on the right. The reader may easily verify that such a
kernel is automatically suppressed (to order $1/Q^4$) if large momentum flows through
it. The subtraction terms needed for the oversubtraction are therefore just the ones
discussed above for the 2PI 2-2 kernels to the left of this final 2-$n$ kernel, and the
reader may easily verify (Problem 6) that the factorization obtained is precisely as

Fig. 18.18 Factorized structure of $T(q, k)$ at large $Q$.



indicated in (18.53). The transition to an operator expansion can then be made in
the usual fashion: the Euclidean correlation functions are analytically continued back
to Minkowski space, and the resultant Minkowski amplitudes (vacuum-expectation-
values of T-products) taken on-shell via the LSZ formula to yield the desired expansion
(18.48) for matrix elements of the operator product.
The graphical arguments presented above are especially valuable in exposing the
physical basis for factorization, and indeed are perfectly adequate in convincing oneself
of the validity of the OPE at the leading order, especially in situations where, as in our
case, only a single operator of lowest dimension contributes to the leading asymptotic
behavior. To derive the general form of the expansion (18.48) however, including all
the subdominant terms, the forest formalism of Zimmermann is indispensable. At
this point, the power of the normal product formalism really manifests itself clearly:
the ability to perform progressively higher degrees of oversubtraction by altering the
subtraction degree of composite operators (as in (18.6)) turns out to be exactly the
formal ingredient needed to organize an efficient proof of the general form of the
expansion, in which local composite operators of increasing dimension are accompanied
by coefficient functions of progressively more rapid falloff (by powers of the large
momentum, modified by logarithms) at large momentum. Lack of space prevents us
from going into the details here, but the interested reader may find the complete
argument in the lectures of Zimmermann (Zimmermann, 1970).
Returning to our result (18.53), we see that the leading large momentum behavior
of $T(q, k)$ is contained in the coefficient function $\tilde{C}_{\phi^2}(q) = T(q, 0)$, with the
low-momentum $k$-dependence isolated in a correlation function of a renormalized composite
operator. Note that the factorization has automatically produced UV-finite
components: both the coefficient function and the composite operator contain all
necessary subtractions to remove ultraviolet cutoff-dependence (which, after all, had
already been removed in the original unfactorized amplitude). However, in a strongly
coupled theory we are still no nearer to actually determining the precise form of the
asymptotic behavior, as this would presumably require the computation of the full
coefficient function, to all orders, and appropriately resummed to obtain a sensible
finite result. Fortunately, just as in the case of the amplitudes discussed in the
preceding section, in which all external momenta were large, a Callan–Symanzik
equation controlling the large-$q$ behavior of $\tilde{C}_{\phi^2}(q)$ can be derived. We shall see in
the next section that the validity of such an equation in non-abelian gauge theories,
together with the remarkable property of asymptotic freedom (namely, a negative β
function at weak coupling), reduces the problem of large-momentum behavior to an
entirely perturbative one.
As we shall now see, the desired Callan–Symanzik equation amounts to the statement
of factorizability of the amplitude $\tilde{C}_{\phi^2}(q) = T(q, 0)$ after insertion of a soft mass
operator. When this holds, we say that the process (or amplitude) is “renormalization
group controlled” (the connection to the renormalization group will also be explained
more fully in the following section). Recall from our previous discussion that the insertion
of the soft mass operator (in other words, of the minimally subtracted $N_4(\phi^2)$) in
a 1PI amplitude with $N$ external legs is equivalent, up to a multiplicative constant, to
application of the Callan–Symanzik operator $D_{CZ} \equiv m_R^2 \frac{\partial}{\partial m_R^2} + \beta(\lambda_R) \frac{\partial}{\partial \lambda_R} - N\gamma(\lambda_R)$

Fig. 18.19 Factorized structure of the two kernel contribution to $D_{CZ} T(q, 0)$ at large $Q$.

to that amplitude. We note first that applying $D_{CZ}$ (with $N = 4$) to the first term in
the skeleton expansion (see Fig. 18.15) of $T(q, 0)$ (consisting of a single 2PI kernel)
automatically produces an asymptotically suppressed amplitude: the insertion of a soft
$\phi^2$ vertex produces an extra internal propagator (with no additional subtractions) in
an amplitude which receives its dominant asymptotic contribution from the regime in
which all internal lines are far off-shell (of order $Q^2$). Moreover, in all higher terms
in the skeleton expansion, the mass insertion must avoid the left-most kernel, as the
large momentum cannot avoid flowing at least through this part of the graph.
Thus, in the term with two kernels, the application of DCZ leads to the factorization
indicated in Fig. 18.19. We use the “hat” (or “caret”) symbol to indicate the part of the
graph containing the mass insertion, with the specification that the two propagators
on the left of a kernel also receive mass insertions if the hat symbol is attached to
that kernel.$^8$ Once a kernel receives the mass insertion, the momentum flowing in and
out on the left is forced to be small, as momenta of order Q would again lead to
an asymptotically suppressed contribution, by the previous argument. The standard
OPE derived earlier may therefore be applied to the part of the graph to the left of the
mass-inserted kernel (in Fig. 18.19, the 2PI kernel on the left), which thereupon loses
its dependence on the small momentum connecting it to the right side of the graph.
The result is that the mass-inserted kernel sees a momentum-independent amplitude
to its left, leading to the appearance of a local vertex V , as indicated in Fig. 18.19.
The same reasoning applied to the contribution with three 2PI kernels leads to the
factorization indicated in Fig. 18.20, as the reader will confirm with a little thought.
(Here, the OPE factorization of the two kernel subgraph on the left of the final graph
on the top line is allowed by the fact that the loop momentum l connecting it to the
mass-inserted kernel must be soft.)
Putting these results together, we arrive at the factorization indicated in Fig. 18.21
for the amplitude obtained by applying the Callan–Symanzik operator
$D_{CZ} \equiv m_R^2 \frac{\partial}{\partial m_R^2} + \beta(\lambda_R) \frac{\partial}{\partial \lambda_R} - 4\gamma(\lambda_R)$ to the coefficient function $\tilde{C}_{\phi^2}(q)$. As usual, the vertex
$V$ indicates the point of insertion of a minimally subtracted $N_4(\phi^2)$ operator. We see
that, up to terms suppressed by $1/Q^2$, the coefficient function satisfies a homogeneous
Callan–Symanzik equation, with an additional term corresponding to $\tilde{C}_{\phi^2}(q)$ (the
graphs on the top line) multiplied by the momentum-independent series of graphs on
the second line. This term is (a) dimensionless and (b) only a function $\gamma_{\phi^2,CZ}$ of the

8 For these terms, one takes $N = 0$ in $D_{CZ}$, as the graph is only “half amputated”; see the discussion
following (18.38).

Fig. 18.20 Factorized structure of the three-kernel contribution to $D_{CZ} T(q, 0)$ at large $Q$.

Fig. 18.21 Callan–Symanzik equation for coefficient function $\tilde{C}_{\phi^2}(q)$, in graphical form.

renormalized parameters $m_R$, $\lambda_R$, and therefore only of the dimensionless renormalized
coupling $\lambda_R$. For reasons that will become apparent in the next section, it is called
the anomalous dimension of the composite operator $N_4(\phi^2)$, and is evidently given
by a single-particle matrix element (with a mass insertion) of this operator at zero
momentum. In equation form, we have

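$$\begin{aligned}
D_{CZ}\, \tilde{C}_{\phi^2}(q; m_R, \lambda_R) &= \left(m_R^2 \frac{\partial}{\partial m_R^2} + \beta(\lambda_R) \frac{\partial}{\partial \lambda_R} - 4\gamma(\lambda_R)\right) \tilde{C}_{\phi^2}(q; m_R, \lambda_R) \\
&\approx \gamma_{\phi^2,CZ}(\lambda_R)\, \tilde{C}_{\phi^2}(q; m_R, \lambda_R)
\end{aligned} \tag{18.54}$$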

where the approximation symbol $\approx$ indicates the neglect of terms suppressed by
powers of $Q$ (see also Problem 7). In the next section we shall see how to solve
this equation. At that point it will become apparent that in the important subclass
of asymptotically free theories (including the present case of $\phi^3_6$-theory, and, much
more importantly, non-abelian gauge theories such as QCD), it determines the precise
leading asymptotic behavior at large $Q$ of the coefficient function (and therefore, of
our factorized amplitude $T(q, k)$) on the basis of purely perturbative information.

Our discussion so far of the short-distance/large momentum factorization properties
of amplitudes has focussed on the behavior of the Euclidean Green functions
of the theory, allowing us to help ourselves plentifully to the physically transparent
power-counting rules provided by Weinberg’s theorem. Real physical processes, on the
other hand, have a stubborn propensity to unfold in Minkowski spacetime, and we
may therefore wonder whether any of the impressive formal results obtained so far
have any meaningful phenomenological application. Fortunately, the basic principles
of factorization are still found to be valid in the Minkowski regime for numerous
high-energy processes. Although the topology of the (over)subtractions required to
demonstrate this is essentially the same as in the Euclidean case, there are important
differences in detail, as we shall now see.
We shall consider the archetypal Minkowski process exhibiting factorization, deep-inelastic
scattering, as the argument in this case provides the basic template for
establishing renormalization group control of many different types of high-energy
amplitudes. The process is depicted in Fig. 18.22: a deeply space-like photon emitted
by an incoming lepton (momentum $p$), with large momentum $q$, $q^2 \equiv -Q^2$, results
in the fragmentation of an incoming hadron (e.g., a proton) of momentum $k$ into
an arbitrary hadronic final state, indicated $n$ in the figure. We are interested in the
total inclusive cross-section for this process, summing over all possible hadronic final
states $n$. The factors contributed to the graph in Fig. 18.22 by the lepton lines and
photon propagator are completely known, as the electromagnetic part of the process
is treated to lowest-order perturbation theory. Apart from these boring kinematical
factors, therefore, the amplitude in Fig. 18.22 is just given by a Fourier transform of the
corresponding matrix element of the hadronic electromagnetic current $\langle n|J_{em,had}(x)|k\rangle$
(cf. (9.202)). The inclusive cross-section is obtained by squaring this amplitude and

Fig. 18.22 Deep-inelastic electron–hadron scattering (photon momentum $q = p - p'$).



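summing over all possible final states, subject to energy-momentum conservation, and
is therefore proportional to the tensor$^9$

$$\begin{aligned}
&\sum_n (2\pi)^4 \delta^4(P_n - k - q)\, \langle k|J^\mu_{em,had}(0)|n\rangle \langle n|J^\nu_{em,had}(0)|k\rangle \\
&\qquad = \mathrm{Im}\Big\{ i \int d^4x\, e^{iq\cdot x}\, \langle k|T\{J^\mu_{em,had}(x) J^\nu_{em,had}(0)\}|k\rangle \Big\}
\end{aligned} \tag{18.55}$$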

where the second line follows by standard manipulations along the lines used to
establish the Källén–Lehmann spectral representation in Section 9.5 (see Problem 8).
This result (basically the optical theorem of scattering theory, relating a total
cross-section to the imaginary part of a forward scattering amplitude) allows us to
concentrate our attention on the 2-2 amplitude on the second line, where a large
space-like momentum $q$ is inserted and then extracted on the left of the diagram,
with the momentum $k$ kept fixed (and eventually, sent on-mass-shell for the incoming
and outgoing hadron). We saw in Section 6.6 that a forward scattering amplitude
like (18.55) can also be written as the Fourier transform of a retarded commutator
$\theta(x_0)[J^\mu_{em,had}(x), J^\nu_{em,had}(0)]$ (cf. (6.126–6.129)), so by locality, the integral over $x$ is
restricted to the forward light-cone. We shall now see that in a certain kinematic
limit, the coordinate displacement $x$ of the two current operators can be forced onto
the light-cone, i.e., to the value $x^2 = 0$ (in Minkowski space), and that the product
of operators can again be expanded, with a leading set of operators (with associated
coefficient functions) providing the dominant asymptotic contribution.
Let us consider the Bjorken limit, in which $k \cdot q$ and $q^2$ are large (i.e., $\gg k^2, m^2$)
and comparable, with the ratio fixed:

$$\omega \equiv \frac{1}{x} \equiv -\frac{2k \cdot q}{q^2} \quad \text{fixed} \tag{18.56}$$
We may automatically realize the Bjorken limit by choosing a convenient Lorentz
frame. For a general Minkowski four-momentum $p$,$^{10}$ define light-cone coordinates
$p_\pm \equiv \frac{1}{\sqrt{2}}(p_0 \pm p_3)$, $\vec{p} = (p_1, p_2)$, in terms of which the invariant dot-product takes the
form

$$p \cdot q = p_+ q_- + p_- q_+ - \vec{p} \cdot \vec{q} \tag{18.57}$$

Note that the vector symbol here applies only to the two (or, in six spacetime dimensions,
four) transverse dimensions orthogonal to the preferred spatial $z$-direction. We
shall work in a frame in which $\vec{q} = 0$, $q_+ \sim Q^2/m_R$, and $q_- < 0$ with $|q_-| \sim m_R$. For $k_\mu \sim m_R$
fixed, in the Bjorken limit,

$$\omega = -\frac{2(k_- q_+ + k_+ q_-)}{2 q_- q_+} \sim -\frac{k_-}{q_-} \tag{18.58}$$
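
The steps leading to (18.58) can be spelled out using (18.56) and (18.57) in the frame chosen above. This is a sketch of the bookkeeping only, with all scalings understood up to factors of order one.

```latex
% Frame: \vec{q} = 0, so (18.57) gives
q^2 = 2 q_+ q_- = -Q^2 \;\Rightarrow\; q_- = -\frac{Q^2}{2 q_+}
% With q_+ \sim Q^2/m_R this gives |q_-| \sim m_R. For k_\mu \sim m_R:
k\cdot q = k_- q_+ + k_+ q_-, \qquad k_- q_+ \sim Q^2, \qquad k_+ q_- \sim m_R^2
% Hence, from (18.56):
\omega = -\frac{2\,k\cdot q}{q^2}
       = -\frac{k_-}{q_-} - \frac{k_+}{q_+}
       = -\frac{k_-}{q_-}\Big(1 + O\big(m_R^2/Q^2\big)\Big)
```

Taking $q_+$ large at fixed $q_-$ and $k_\mu$ therefore sends $Q^2 \to \infty$ while holding $\omega$ fixed, which is the Bjorken limit.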

9 For a fuller discussion, including the kinematic factors glossed over here, see Section 13.4, (Itzykson
and Zuber, 1980).
10 In six dimensions we likewise take $p_\pm \equiv \frac{1}{\sqrt{2}}(p_0 \pm p_5)$, $\vec{p} = (p_1, p_2, p_3, p_4)$.

Now, by taking $q_+$ large, we force the Fourier transform to extract the dominant
dependence of the retarded commutator of currents for $x_-$ small (as the exponent
is $q \cdot x = q_+ x_- + \dots$). However, locality restricts us to the interior of the forward
light-cone, $2x_+ x_- > \vec{x}^{\,2}$, so $x_- \to 0$ forces also $\vec{x} \to 0$, and the Bjorken limit naturally
probes the region $x^2 \to 0$: i.e., the light-cone singularities of the operator product.
Fortunately, as in the Euclidean case, where the structure of the amplitude simplifies
(via factorization) in the Euclidean limit $x^2 \to 0$ ($\Rightarrow x \to 0$), the forward amplitude in
(18.55) also displays a factorized structure in the Bjorken limit in Minkowski space. In
this case, however, the leading contribution involves an infinite “tower” of operators
and coefficient functions (not surprisingly, as the light-cone limit involves a surface,
rather than a single point).
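
The way the Bjorken limit pushes the current separation onto the light-cone can be sketched in light-cone coordinates. The estimate below is heuristic: it assumes (this is not stated in the text) that the matrix element varies in $x_+$ only on the hadronic scale $1/m_R$, and it ignores factors of order one.

```latex
% From (18.57) with both vectors set equal to x:
x^2 = 2 x_+ x_- - \vec{x}^{\,2}
% Interior of the forward light-cone (x^2 > 0, x_\pm > 0):
0 \le \vec{x}^{\,2} < 2 x_+ x_-
% Rapid oscillation of e^{i q_+ x_-} restricts x_- \lesssim 1/q_+ \sim m_R/Q^2;
% assuming x_+ \lesssim 1/m_R, one finds
x^2 < 2 x_+ x_- \;\lesssim\; \frac{1}{m_R}\cdot\frac{m_R}{Q^2} \;\sim\; \frac{1}{Q^2}
```

Only $x_-$ and $\vec{x}$ are forced to vanish while $x_+$ stays finite, so the limit selects a light-like surface rather than the single point $x = 0$.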
To expose the essential ideas, while avoiding the (not inconsiderable!) complications
of spin and local gauge symmetry with which we would have to contend in QCD,
we shall sketch the factorization procedure in our old standby, $\phi^3_6$-theory. Thus,
instead of the second line of (18.55), we consider exactly our previous amplitude
$T(q, k)$ (essentially, the Fourier transform of $\langle k|T\{\phi(x)\phi(0)\}|k\rangle$) of Fig. 18.15, but
now in Minkowski space, and in the Bjorken limit (18.56). The factorization will be
demonstrated for the full amplitude: the imaginary part can then be taken at the end,
to obtain the desired inclusive cross-section.
We begin, as before, with the lowest-order tree diagram contributing to $T(q, k)$, as
indicated in Fig. 18.23(a). The large momentum $q$ flows through a single propagator
and the (fully amputated) graph is therefore (suppressing the ubiquitous $i\epsilon$ terms)
proportional to

$$\frac{1}{(q + k)^2 - m_R^2} = \frac{1}{q^2 + 2k_- q_+ + 2k_+ q_- + k^2 - m_R^2} \tag{18.59}$$

In the Bjorken limit, the terms $2k_+ q_-$ and $k^2$ are of order $m_R^2$, suppressed by two
powers of the large scale $Q$ relative to the terms $q^2$ and $2k_- q_+$, which are both of order
$Q^2$. This means that the leading asymptotic dependence of graph (a) on $Q$ is unaltered
if we set $k_+$ and $\vec{k}$ to zero (the latter meaning, in six spacetime dimensions, $k_1 =
k_2 = k_3 = k_4 = 0$). We shall indicate that the $+$ and transverse vector components

k o(⇒ k̂)
q q

k+q − k̂ + q

q q
k o(⇒ k̂)
(a) (b)

Fig. 18.23 Oversubtraction of lowest-order T (q, k) in the Bjorken limit.


694 Scales III: Short-distance structure of quantum field theory

of an external momentum entering a subgraph have been set to zero by once again
appending the “o” symbol to the corresponding leg, and also define, for a general
momentum pμ = (p+ , p− , p), the projected momentum p̂μ = (0, p− , 0). Accordingly,
the subtraction effected by graph Fig. 18.23(b), with propagator
1 1
= (18.60)
(q + k̂)2 − m2R q 2 + 2k− q+ − m2R

reduces the asymptotic behavior of the amplitude from 1/Q2 to 1/Q4 .
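The improvement from $1/Q^2$ to $1/Q^4$ can be spot-checked numerically. The following is a minimal sketch under stated assumptions, not a calculation taken from the text: it takes $q^2 = -Q^2$ and $2k_- q_+ = \omega Q^2$ with $\omega$ held fixed, treats $2k_+ q_-$, $k^2$ and $m_R^2$ as fixed numbers of order one, and uses hypothetical function names. Doubling $Q$ should then reduce the unsubtracted propagator by a factor close to 4 and the subtracted combination by a factor close to 16.

```python
# Assumed kinematics (not from the text): q^2 = -Q^2, 2 k_- q_+ = omega * Q^2,
# with omega, 2 k_+ q_-, k^2 and m_R^2 all held fixed as Q grows.

def propagator_a(Q, omega, two_kplus_qminus, k_sq, m_sq):
    """Unsubtracted propagator, the right-hand side of (18.59)."""
    return 1.0 / (-Q**2 + omega * Q**2 + two_kplus_qminus + k_sq - m_sq)

def propagator_b(Q, omega, m_sq):
    """Subtraction term (18.60): k replaced by the projected momentum k-hat."""
    return 1.0 / (-Q**2 + omega * Q**2 - m_sq)

def subtracted(Q, omega=0.3, two_kplus_qminus=0.7, k_sq=1.0, m_sq=1.0):
    """Graph (a) minus graph (b) of Fig. 18.23."""
    return (propagator_a(Q, omega, two_kplus_qminus, k_sq, m_sq)
            - propagator_b(Q, omega, m_sq))

# Ratio of values at Q and 2Q: about 2^2 before subtraction, about 2^4 after.
ratio_unsubtracted = (propagator_a(100.0, 0.3, 0.7, 1.0, 1.0)
                      / propagator_a(200.0, 0.3, 0.7, 1.0, 1.0))
ratio_subtracted = subtracted(100.0) / subtracted(200.0)
print(ratio_unsubtracted, ratio_subtracted)
```

The numerical values chosen for $\omega$ and the fixed terms are arbitrary; any choice that keeps both denominators away from zero shows the same scaling.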


At the one-loop level we encounter the box diagram indicated in Fig. 18.24(a).
In analogy to the subtractions indicated in Fig. 18.17, we introduce the subtractions
shown in graphs (b), (c), and (d). As previously, we only expect graph (b) to contain
the leading contribution at large $Q$ from the region in which all four internal lines are
off-shell of order $Q^2$, with additional subtraction terms needed to take care of the case
in which the loop momentum $l$ is soft ($l^2 \ll Q^2$). The Feynman integral for all four
terms combines to the expression

$$
\begin{aligned}
I_{\text{1 loop}}(q, k) &= \int \frac{1}{(l^2 - m_R^2 + i\epsilon)^2}\left\{\frac{1}{(q - l)^2 - m_R^2 + i\epsilon} - \frac{1}{(q - \hat{l})^2 - m_R^2 + i\epsilon}\right\} \\
&\qquad \cdot \left\{\frac{1}{(k + l)^2 - m_R^2 + i\epsilon} - \frac{1}{(\hat{k} + l)^2 - m_R^2 + i\epsilon}\right\}\frac{d^6 l}{(2\pi)^6} \\
&= \int \frac{1}{(l^2 - m_R^2 + i\epsilon)^2}\,\frac{(l^2 - 2q_- l_+)}{((q - l)^2 - m_R^2 + i\epsilon)((q - \hat{l})^2 - m_R^2 + i\epsilon)} \\
&\qquad \cdot \frac{(2k_+ l_- + k^2)}{((k + l)^2 - m_R^2 + i\epsilon)((\hat{k} + l)^2 - m_R^2 + i\epsilon)}\,\frac{d^6 l}{(2\pi)^6}
\end{aligned}
\tag{18.61}
$$

We now need to estimate the leading asymptotic behavior of this rather formidable
expression at large $Q$. We no longer have the Weinberg theorem, and its corollaries,
at our disposal, as we are in Minkowski space. In particular, denominators (such as
$(q - \hat{l})^2 - m_R^2$) with only linear dependence on loop momentum components appear,
a circumstance completely alien to the Euclidean space analysis. Nevertheless, the
indicated subtractions do indeed do their job, and end up reducing the asymptotic
dependence by two powers of $Q$, as desired. A “physicist’s” proof of this assertion is
easily obtained by a straightforward scaling analysis, but the general result is confirmed
by extensive computational experience in perturbative field theory, although a general
power-counting theorem in Minkowski space of the scope and power of Weinberg’s
theorem for the Euclidean case has, to the author’s knowledge, never been established.

Fig. 18.24 Oversubtraction of a one-loop box graph contribution to $T(q, k)$ in the Bjorken
limit. The vertical lines carry momenta (a) $q - l$ and $k + l$; (b) $q - l$ and $\hat{k} + l$; (c) $q - \hat{l}$ and $k + l$; (d) $q - \hat{l}$ and $\hat{k} + l$.

Let us therefore proceed directly, by examining the contribution to (18.61) from a
region of phase-space corresponding to arbitrary power scalings of the loop momentum
components:

$$l_+ \sim Q^\alpha, \quad l_- \sim Q^\beta, \quad \mathbf{l} \sim Q^\gamma, \qquad \alpha, \beta, \gamma > 0 \tag{18.62}$$

The volume of loop phase-space corresponding to this region evidently scales like
$Q^{\alpha+\beta+4\gamma}$. By examining the scaling under (18.62) of each of the numerator and
denominator terms in (18.61), we find that the subtracted amplitude receives a
contribution of order $Q^{\mathcal{P}(\alpha,\beta,\gamma)}$, with the power given by

$$\mathcal{P}(\alpha, \beta, \gamma) = \alpha - \beta + 4\gamma - 2 - 3\max(\alpha + \beta, 2\gamma) - \max(2, \alpha + \beta, 2\gamma) \tag{18.63}$$

A short exercise (see Problem 9) shows that

$$\mathcal{P}(\alpha, \beta, \gamma) \le -4 \tag{18.64}$$

so we may reasonably conclude that the subtractions in graphs (b), (c), and (d) have
indeed succeeded in suppressing by two powers of $Q$ the leading $1/Q^2$ dependence of
the box diagram Fig. 18.24(a). The subtractions are effective because, in analogy to
the Euclidean case, in the region where the loop momentum $l$ is soft ($l_\mu \sim k_\mu, m_R$),
corresponding to the large momentum flowing entirely through the left-most vertical
line, the leading asymptotic dependence on $Q$ cancels separately between graphs (a)
and (c), and between (b) and (d); whereas, in the region of $l$ “hard” (this means
$l_+ \sim q_+ \sim Q^2$, $\mathbf{l} \sim Q$, $l_-$ fixed) where all lines are far off-shell, the cancellation occurs
between graphs (a) and (b), and between (c) and (d), as the reader may easily check,
using power-scaling arguments along the lines of (18.62–18.64).
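The bound (18.64) can be spot-checked by brute force. The sketch below simply evaluates the exponent of (18.63) on a grid of scaling exponents; the grid range and spacing are arbitrary choices made here, and the boundary values $\alpha, \beta, \gamma = 0$ are included (an assumption beyond the strict inequalities of (18.62)) so that the points where the bound is saturated are visible. Exact rational arithmetic avoids rounding noise in the comparison. A scan of this kind is only a sanity check, not a substitute for the exercise of Problem 9.

```python
from fractions import Fraction
import itertools

def power(alpha, beta, gamma):
    """Exponent P(alpha, beta, gamma) of (18.63)."""
    soft = max(alpha + beta, 2 * gamma)
    return alpha - beta + 4 * gamma - 2 - 3 * soft - max(2, soft)

# Grid of exponents 0, 1/4, ..., 4 (range and spacing chosen for illustration).
grid = [Fraction(i, 4) for i in range(0, 17)]
largest = max(power(a, b, g) for a, b, g in itertools.product(grid, repeat=3))

# The "hard" region quoted in the text: l_+ ~ Q^2, |l| ~ Q, l_- fixed.
hard_region = power(2, 0, 1)
print(largest, hard_region)
```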
The preceding examples suggest that we may proceed exactly as in the Euclidean
case to introduce an oversubtracted skeleton expansion for the Minkowski amplitude
$T(q, k)$: the topological structure of these subtractions is exactly the same as in
the Euclidean case, the only difference being that the subtraction point is at a
projected light-like momentum, rather than at zero momentum. The result is that the
leading asymptotic behavior factorizes as indicated graphically in Fig. 18.25 (replacing
Fig. 18.18). This result may be written explicitly as

$$T(q, k) \sim \int T(q, \hat{l})\,(\hat{\Delta}_F(l))^2\,T(l, k)\,\frac{d^6 l}{(2\pi)^6}, \qquad Q \to \infty,\ \omega \text{ fixed} \tag{18.65}$$

The amplitude $T(q, k)$ is a Lorentz scalar and therefore a function of $q^2$, $q \cdot k$, and $k^2$,
or of $Q^2$, $k^2$ and the Bjorken variable $\omega = -\frac{k_-}{q_-}$. A slight change of notation makes
this clear:

$$T(q, k) \equiv \Gamma(\omega, Q^2, k^2) \;\Rightarrow\; T(q, \hat{l}) = \Gamma(\hat{\omega}, Q^2, 0), \qquad \hat{\omega} = -\frac{l_-}{q_-} \tag{18.66}$$

Fig. 18.25 Factorization of $T(q, k)$ in the Bjorken limit: for $Q \to \infty$ with $\omega$ fixed, $T(q, k)$ is replaced by $T(q, \hat{l})$ and $T(l, k)$ joined at the vertex $V$.

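One can show (see Problem 10) that $\Gamma(\omega, Q^2, k^2)$ is analytic in the cut plane of $\omega$ with
cuts running from $-\infty$ to $-1$ and from $1$ to $+\infty$. In the light-like case ($\hat{l}^2 = 0$), we
can therefore expand

$$\Gamma(\hat{\omega}, Q^2, 0) = \sum_{n=0}^{\infty} \hat{\omega}^n C_n(Q^2) \tag{18.67}$$

Thus, the factorization (18.65) can be written, using $(\hat{\omega})^n = (\frac{l_-}{k_-})^n \cdot \omega^n$,

\begin{align}
\Gamma(\omega, Q^2, k^2) &\sim \sum_n C_n(Q^2) \int T(l, k)\,(\hat{\Delta}_F(l))^2\,(\hat{\omega})^n\,\frac{d^6 l}{(2\pi)^6} \tag{18.68} \\
&= \sum_n \omega^n C_n(Q^2) \int T(l, k)\,(\hat{\Delta}_F(l))^2 \left(\frac{l_-}{k_-}\right)^n \frac{d^6 l}{(2\pi)^6} \tag{18.69} \\
&= \sum_n \omega^n C_n(Q^2)\,\langle k|N_4(\phi(0)(\tfrac{i\partial_-}{2k_-})^n \phi(0))|k\rangle \equiv \sum_n \omega^n v_n C_n(Q^2) \tag{18.70}
\end{align}

with $v_n \equiv \langle k|N_4(\phi(0)(\frac{i\partial_-}{2k_-})^n \phi(0))|k\rangle$ the matrix elements of a tower of renormalized
composite operators incorporating the low-energy physics of the process.11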
Factors of loop momentum $l_-$ appearing at the vertex $V$ in Fig. 18.25 (arising from
expanding the $T(q, \hat{l})$ amplitude on the left) have been converted to the corresponding
spatial derivatives appearing in minimally subtracted composite operators, of which
there are clearly an infinite number contributing at leading order. Note that these
operators receive only a single subtraction to remove a logarithmic divergence, irrespective
of the power $n$ of the loop component $l_-$ present in the graph (see Problem 11). The
result (18.70) is called the light-cone expansion of the amplitude $T(q, k) \equiv \Gamma(\omega, Q^2)$.
It shows that the leading asymptotic behavior of the amplitude in the Bjorken limit
is given by an infinite set of factorized terms involving the product of coefficient
functions depending only on the large scale $Q$, and matrix elements of renormalized
composite operators. In the leading term, all operators of minimum twist (defined as
the engineering dimension of the operator minus its spin, the latter given in this case
by the number of spacetime-derivatives $\partial_-$) appear. The subtraction degree of the
operator is determined in a light-cone expansion not by the engineering dimension, as
in the Euclidean case, but by the twist: thus, all the operators $N_4(\phi(0)(\frac{i\partial_-}{2k_-})^n \phi(0))$
appearing in (18.70) have twist 4 (note: the factors of $k_-$ are not included in the
dimension) and appear at the same, leading twist, level in the OPE. Operators of
higher twist will contribute to the amplitude at levels suppressed by powers of $Q$.
In practice, one extracts individual terms in the infinite sum in (18.70) by taking
moments with respect to the variable $x \equiv \frac{1}{\omega}$ of $\mathrm{Im}(\Gamma(\omega, Q^2))$, which is directly related
to the inclusive cross-section for the deep inelastic scattering, as discussed above (see
Problem 12).

11 Note that if the initial and final states are taken as light-like elementary scalars, with $k \to \hat{k}$, the matrix
elements are unity, $v_n = 1$. This is just the renormalization condition for the composite operators equating
the matrix element at the special subtraction point to its lowest-order value, as higher loop corrections
vanish if taken at the subtraction point, where they are subtracted.
The discussion of factorization for deep inelastic amplitudes in QCD follows
completely analogous lines to the argument for $\phi^3_6$-theory given here. In this case
(see, for example, the review of Mueller (Mueller, 1981)), the leading contributions
are given by a tower of operators of twist 2: namely, the quark composite operators

$$O_n \equiv N_2\left(\bar{\psi}\gamma_- \left(\frac{iD_-}{2k_-}\right)^n \psi\right) \tag{18.71}$$

where $D_\mu$ is the gauge-covariant derivative (15.103) for the quark field $\psi$. In a general
covariant gauge, these composite operators involve both quark and gauge fields, but
the analysis simplifies, and becomes (modulo spin complications) extremely similar
to the $\phi^3_6$ case if we choose a light-cone gauge in which $A_-(x) = 0$, in which case the
gauge-covariant derivatives may be replaced by ordinary ones, $D_- \to \partial_-$.
Just as in the Euclidean case, the asymptotic behavior of the coefficient functions
$C_n(Q^2)$ is determined by a Callan–Symanzik equation. The derivation of this equation
follows exactly the lines of the space-like factorization: one applies a soft mass insertion,
via the Callan–Symanzik operator $D_{CZ}$, to the large momentum amplitude $T(q, \hat{l})$,
which is then refactorized. One then obtains, in analogy to (18.54), for the asymptotic
behavior of $C_n(Q^2)$ (as always, up to power suppressed terms),

$$D_{CZ}\,C_n(Q^2) \sim \gamma_n C_n(Q^2) \tag{18.72}$$

where $\gamma_n$, in analogy to the Euclidean case, is given by a single (soft) mass insertion on
the 1-1 matrix element of $N_4(\phi(0)(\frac{i\partial_-}{2k_-})^n \phi(0))$ (see Problem 11), and is a dimensionless
function of $\lambda_R$. In the case of asymptotically free theories such as QCD (or $\phi^3_6$), (18.72)
can be used to reduce the determination of the leading asymptotic behavior in $Q$ to
perturbative information, as we shall explain in the next section. In particular, the
asymptotic behavior of individual moments of the inclusive amplitude can be explicitly
computed and compared with experiment.12

12 For more details on all of this, the reader is referred to the original literature: (Gross and Wilczek,
1974a), (Gross and Wilczek, 1974b). The general approach to factorization outlined here is covered in great
detail in the review of Mueller (Mueller, 1981); see also (Buras, 1980).

18.3 Renormalization group equations for renormalized amplitudes


The problem of determining the asymptotic behavior of amplitudes when all or some
of the external momenta are taken large can be approached from a quite different
direction from the methods used above, which are deeply grounded in the subtraction
technology of renormalization theory. Instead, we can rely on information provided
by the renormalization group flow that underlies the whole dependence of field theory
amplitudes on the energy scale at which these amplitudes are examined. This can
be done both for the cases where all external momenta are taken large (a classic
example of enormous phenomenological importance being inclusive electron–positron
annihilation to hadrons, which amounts to calculating the imaginary part of the two-
point Green function of the hadronic electromagnetic current, with both incoming and
outgoing momentum q large), and when only some momenta are large (as in deep-
inelastic scattering, discussed above). We shall see that results entirely equivalent to
those discussed previously (in particular, the Callan–Symanzik equations regulating
the large-momentum amplitudes) can be obtained, although the physical arguments
involved are somewhat different.
In Section 16.4 we introduced the concept of the renormalization group in order
to describe the evolution of a Wilsonian effective Lagrangian when we change the
ultraviolet momentum scale up to which Fourier components of the field are included.
The reader will recall (Section 16.2) that the physics of a clustering, Lorentz-invariant
system of interacting particles under quite general assumptions can be described by
a scalar Lagrangian function involving an infinite number of terms with arbitrary
powers and spacetime-derivatives of the field(s). Such a Lagrangian, formulated on a
flat Minkowski background spacetime, is assumed to be valid up to some ultraviolet
momentum scale Λ, beyond which the physics of the model breaks down, for example,
via quantum gravity effects. The UV scale Λ can be varied while leaving the low-
energy/momentum predictions of the theory unchanged by appropriately varying the
infinite set of (dimensionless) coefficients gn of the terms in the effective Lagrangian,
and the corresponding flow in the infinite parameter space of coupling coefficients is
referred to as the renormalization group. The Lie algebra of this group, corresponding
to infinitesimal group operations, is specified by an infinite set of coupled, non-linear
first-order differential equations (16.27):13

$$\mu \frac{\partial}{\partial\mu} g_n(\mu) = \beta_n(g_n(\mu)) \tag{18.73}$$

where here the floating cutoff of the theory is denoted, as common in treatments of
the renormalization group, by the Greek letter $\mu$.
In general, we can say little about the solution to these equations, but in the
event that the renormalization group flow remains within the region of parameter
space in which the couplings corresponding to interactions (i.e., terms higher than
quadratic in the fields) are small enough to validate the use of perturbation theory,
one is able to show, as we saw in Section 17.4, that the renormalization group
flow, starting at a generic point in the infinite-dimensional coupling space at a
high cutoff, attracts to a finite ($N$)-dimensional submanifold of the coupling space
at low values of the cutoff. The dimensionality $N$ of this submanifold (points on
which can be regarded as associated in one-to-one fashion with physically distinct
low-energy theories) corresponds simply to the number of renormalizable and
super-renormalizable operators (or equivalently, relevant and marginal terms) in the full
effective Lagrangian of the theory.

13 As in Section 17.4, we have chosen to label all the coefficients with a single symbol here, not
distinguishing between terms with different numbers of spacetime-derivatives.
The fact that the “memory” of a high-energy cutoff is lost in the low-energy physics
provided we parameterize the amplitudes of the theory in terms of renormalization
conditions associated with just this finite subset of operators of the theory is referred
to as “perturbative renormalizability”: it is a property of a truncated Lagrangian
containing solely renormalizable and super-renormalizable terms compatible with the
symmetries and fields present in the theory. To the extent that we are concerned
only with the low-energy properties of the renormalized amplitudes (Green functions)
of such a perturbatively renormalizable theory, we may forget about its origins in
a much more complicated Wilsonian action extending the physics of the theory
up to much higher energies. The remnant information of the full renormalization
group then amounts simply to the statement that the physics at low energy can
be described in terms of any set of N independent parameters uniquely identifying
the point on the N -dimensional submanifold onto which the flow converges at low
energy. Although this freedom seems at first sight a bit trivial, it turns out that
it allows us, in combination with information about the mass-dependence of the
amplitudes of the theory, to derive highly non-trivial conclusions about the energy
dependence of amplitudes, which (fortunately) turn out to be completely consistent
with those implied by the Callan–Symanzik equations derived previously. The material
in this section can therefore be regarded as an alternative approach to the study of
energy/momentum dependence of amplitudes, one which provides additional physical
insight and in some cases is technically more convenient than the operator-inspired
technology used in the preceding sections.
In order to examine the consequences of the reparameterization invariance of
physical amplitudes on the low-energy attractor manifold we must, of course, allow
ourselves the freedom of examining these amplitudes in a variety of renormalization
schemes. Sticking to the zero momentum BPHZ scheme, for example, we learn nothing,
as the parameterization is completely fixed by the unique choice of subtraction point.
In principle, one could imagine very general reparameterizations which mix all N
relevant and marginal couplings. Our chief aim will be to explore the connection of
reparameterization invariance to the scaling of amplitudes as the overall energy scale
is changed, so it suffices to examine a one-parameter subset of reparameterizations in
which the particular renormalization scheme is identified by specifying a dimensionful
parameter: for example, the momentum scale μ introduced in the renormalization
scheme in which the propagator and interaction vertex are subtracted at a Euclidean
momentum point, as in (17.75), or the scale μ introduced in the dimensional renormal-
ization scheme, via (17.102). In either of these schemes, the Green functions necessarily
acquire a dependence on the renormalization scale μ. Moreover, if we fix our attention
on a definite physical theory, identified as a unique point on the low-energy attractor
surface (corresponding to some definite endpoint of the renormalization group flow
at some conventionally chosen low cutoff, and ignoring as usual variations in this
surface of order inverse powers of the much higher UV cutoff), then the renormalized
parameters $m_R$, $\lambda_R$ (in $\phi^4_4$-theory, say) must also vary with $\mu$ to keep the physics fixed.
We shall illustrate the derivation of the renormalization group equation for
renormalized amplitudes taking self-coupled scalar field theory renormalized at Euclidean
scale $\mu$ as our starting point (the reader may imagine our theory to be $\phi^4_4$ although
an essentially identical argument holds for $\phi^3_6$ theory). The analysis of perturbative
renormalization in Section 17.2 makes it clear that the renormalized 1PI Euclidean
$N$-point function $\Gamma_R^{(N)}$ in such a theory is related to the corresponding 1PI function
$\Gamma^{(N)}$ computed by using the bare parameters and field in (17.42) with a UV cutoff
$\Lambda$ by (a) rescaling the bare field $\phi \to \sqrt{\hat{Z}}\,\phi_R$, and (b) reparameterizing the resultant
amplitude in terms of renormalized mass and coupling parameters defined at the
Euclidean subtraction point. Thus, the upshot of our demonstration of perturbative
renormalizability is the relation14

$$\Gamma_R^{(N)}(k_i; \lambda_R, m_R, \mu) = \hat{Z}(\lambda_R, m_R, \mu, \Lambda)^{-N/2}\,\Gamma^{(N)}(k_i; \lambda, m, \Lambda) + O\!\left(\frac{k_i^2, m_R^2, \lambda_R^2, \mu^2}{\Lambda^2}\right) \tag{18.74}$$
In essence, a unique physical theory is specified by the boundary condition that we
start with a cutoff theory at scale $\Lambda_{UV} = \Lambda$, in which the full Wilsonian Lagrangian
is given by (17.42), and then reparameterize the amplitudes (and rescale the field)
at low energy, by using the parameters $m_R$, $\lambda_R$ and field rescaling $\hat{Z}$ defined by the
renormalization conditions (17.46, 17.51, 17.52). The correction term on the right
of (18.74) is just the usual, by assumption ignorable, sensitivity of the amplitudes
of a perturbatively renormalizable theory to the UV cutoff of the theory: we shall
henceforth drop it entirely (by taking $\Lambda \to \infty$ at the appropriate point, for example).
Of course, the same physical theory can just as well be parameterized in terms of
couplings defined at a different Euclidean scale $\tilde{\mu}$:

$$\Gamma_R^{(N)}(k_i; \tilde{\lambda}_R, \tilde{m}_R, \tilde{\mu}) = \hat{Z}(\tilde{\lambda}_R, \tilde{m}_R, \tilde{\mu}, \Lambda)^{-N/2}\,\Gamma^{(N)}(k_i; \lambda, m, \Lambda) \tag{18.75}$$

Dividing (18.75) by (18.74), and defining

$$F(\tilde{\lambda}_R, \tilde{m}_R, \tilde{\mu}; \lambda_R, m_R, \mu) \equiv \frac{\hat{Z}(\tilde{\lambda}_R, \tilde{m}_R, \tilde{\mu}, \Lambda)}{\hat{Z}(\lambda_R, m_R, \mu, \Lambda)} \tag{18.76}$$

we find

$$\Gamma_R^{(N)}(k_i; \tilde{\lambda}_R, \tilde{m}_R, \tilde{\mu}) = F(\tilde{\lambda}_R, \tilde{m}_R, \tilde{\mu}; \lambda_R, m_R, \mu)^{-N/2}\,\Gamma_R^{(N)}(k_i; \lambda_R, m_R, \mu) \tag{18.77}$$

In other words, the 1PI Green functions of our theory transform covariantly (by
multiplicative rescaling) under reparameterizations corresponding to an alteration of
the subtraction scale $\mu$. The absence of a dependence on the UV cutoff $\Lambda$ in the
renormalized amplitudes, of course, implies the same for the rescaling factor $F$ in
(18.76) and (18.77). The finite transformation expressed in (18.77) can clearly be
viewed as an invertible element of a one-parameter continuous Lie group in which
successive transformations of renormalization scale satisfy an obvious composition
rule. This one parameter group is all that remains of the vastly more complicated
renormalization group flow embodied in the equations (18.73), which themselves lead
to the collapse to the low-energy attractor surface corresponding to perturbative
renormalizability, as we saw in Section 17.4. Nevertheless, this equation (or rather,
its infinitesimal Lie algebra version), once combined with information on the mass
singularities of the amplitudes, will lead us back to the same powerful constraints
on the Green functions of the theory derived previously in the form of the
Callan–Symanzik equation.

14 The negative power of $\hat{Z}$ here arises as a consequence of the need to divide the basic $N$-point Green
function $G^{(N)}$ by $N$ full propagators in order to arrive at the fully amputated $\Gamma^{(N)}$: each such propagator
gives a factor of $\hat{Z}$ on rescaling, converting the $\sqrt{\hat{Z}}$ associated with each of the $N$ fields in $G^{(N)}$ to a $1/\sqrt{\hat{Z}}$.
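The composition rule and the cutoff independence of the rescaling factor $F$ can be illustrated with a deliberately simple toy model. Everything specific in the sketch below is an assumption made for illustration and is not taken from the text: the field rescaling is given a pure power-law form $\hat{Z} = (\Lambda/\mu)^{2\gamma_0}$ with a hypothetical constant exponent, and the dependence on the couplings is suppressed. With this form the cutoff cancels in the ratio (18.76), and successive changes of scale multiply.

```python
import math

GAMMA_TOY = 0.05  # hypothetical constant exponent, chosen only for illustration

def z_hat(mu, cutoff):
    """Toy field-rescaling factor; a pure power law in cutoff/mu is assumed."""
    return (cutoff / mu) ** (2 * GAMMA_TOY)

def rescaling(mu_new, mu_old, cutoff):
    """F of (18.76) with couplings suppressed: ratio of Z-hat at the two scales."""
    return z_hat(mu_new, cutoff) / z_hat(mu_old, cutoff)

# Going from scale 1 to scale 3 directly, or via scale 2, gives the same factor.
f_direct = rescaling(3.0, 1.0, 1e6)
f_two_steps = rescaling(3.0, 2.0, 1e6) * rescaling(2.0, 1.0, 1e6)

# In this toy model the cutoff drops out of F entirely.
f_other_cutoff = rescaling(3.0, 1.0, 1e9)

# The inverse transformation undoes the rescaling.
f_round_trip = rescaling(1.0, 3.0, 1e6) * f_direct
print(f_direct, f_two_steps, f_other_cutoff, f_round_trip)
```

The composition and inverse properties hold for any ratio of the form (18.76); only the cancellation of the cutoff relies on the assumed power-law form.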
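The desired infinitesimal version of (18.77) is readily obtained: we simply keep
$\tilde{\mu}$, $\tilde{\lambda}_R$ and $\tilde{m}_R$ fixed while applying $\mu \frac{\partial}{\partial\mu}$. As a result, $\lambda_R$ and $m_R$ must also be allowed
to vary, and, with the understanding that everywhere

$$\mu \frac{\partial}{\partial\mu} \equiv \mu \frac{\partial}{\partial\mu}\bigg|_{\tilde{\lambda}_R, \tilde{m}_R, \tilde{\mu}} \tag{18.78}$$

we obtain

$$\left(\mu \frac{\partial}{\partial\mu} F^{-N/2}\right)\Gamma_R^{(N)} + F^{-N/2}\left(\mu \frac{\partial}{\partial\mu} + \mu \frac{\partial\lambda_R}{\partial\mu}\frac{\partial}{\partial\lambda_R} + \mu \frac{\partial m_R}{\partial\mu}\frac{\partial}{\partial m_R}\right)\Gamma_R^{(N)} = 0 \tag{18.79}$$

Multiplying through by $F^{N/2}$, we find

$$\left(\mu \frac{\partial}{\partial\mu} + \mu \frac{\partial\lambda_R}{\partial\mu}\frac{\partial}{\partial\lambda_R} + \mu \frac{\partial m_R}{\partial\mu}\frac{\partial}{\partial m_R}\right)\Gamma_R^{(N)} = \frac{1}{2} N \left(\mu \frac{\partial}{\partial\mu} \ln F\right)\Gamma_R^{(N)} \tag{18.80}$$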

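After taking the partial derivatives, we may set $\tilde{\mu} = \mu$, $\tilde{m}_R = m_R$, $\tilde{\lambda}_R = \lambda_R$ and define
the dimensionless functions

$$\beta(\lambda_R, \frac{m_R}{\mu}) \equiv \mu \frac{\partial\lambda_R}{\partial\mu} \tag{18.81}$$

$$\gamma_m(\lambda_R, \frac{m_R}{\mu}) \equiv \frac{1}{m_R}\,\mu \frac{\partial m_R}{\partial\mu} \tag{18.82}$$

$$\gamma(\lambda_R, \frac{m_R}{\mu}) \equiv \frac{1}{2}\,\mu \frac{\partial \ln F}{\partial\mu} \tag{18.83}$$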

As the coupling $\lambda_R$ is dimensionless, the dimensionless functions $\beta$, $\gamma$, $\gamma_m$ can, of course,
only depend on the ratio $m_R/\mu$. Inserting these definitions in (18.80), we find the
renormalization group equation for 1PI amplitudes:

$$\left\{\mu \frac{\partial}{\partial\mu} + \beta(\lambda_R, \frac{m_R}{\mu})\frac{\partial}{\partial\lambda_R} + \gamma_m(\lambda_R, \frac{m_R}{\mu})\,m_R \frac{\partial}{\partial m_R} - N\gamma(\lambda_R, \frac{m_R}{\mu})\right\} \Gamma_R^{(N)}(k_i; \lambda_R, m_R, \mu) = 0 \tag{18.84}$$

We emphasize that this equation is exact (having taken the UV cutoff Λ of the theory
to infinity, of course): we have so far not considered any simplifications arising in an
asymptotic regime. An equation of exactly the same form holds if we use dimensional
renormalization, where the parameter μ is introduced via (17.102), with the addi-
tional simplification that the renormalization group functions β, γm , γ lose their mass
dependence (on mR ) and are therefore only functions of the dimensionless coupling
λR . These functions are, moreover, to be regarded as quantum effects: the dependence
on the subtraction scale μ appears only in loop diagrams requiring subtractions—in
other words, in contributions to the amplitude containing non-zero powers of Planck’s
constant. In particular, the tree diagrams of the theory are independent of μ.
At this point, a superficial resemblance of (18.84) to the asymptotic version of the
Callan–Symanzik equation (18.38) should already be apparent. Recall that the latter
equation applies in the event that the external momentum set ki is non-exceptional
(no non-trivial subset of the Euclidean momenta ki summing to zero), and that this
is also the condition for the absence of mass singularities of the amplitude in the
zero mass limit. In particular, all Lorentz-invariant dot-products ki · kj are non-zero
(and uniformly large, say of order $Q^2$, with $Q$ a large momentum scale, if we consider
the asymptotic regime as previously for the Callan–Symanzik case), and Weinberg’s
theorem then assures us that the graphs contributing to $\Gamma_R^{(N)}$ receive their dominant
contribution from regions in which all internal propagators are far off-shell, with
denominators of order $Q^2$, and therefore with a sensitivity to the mass of order $m_R^2/Q^2$.
Neglecting the mass sensitivity, and setting $m_R$ to zero, we arrive at the approximate
asymptotic equation, valid to inverse powers of the large momentum scale,

$$\left\{\mu \frac{\partial}{\partial\mu} + \beta(\lambda_R)\frac{\partial}{\partial\lambda_R} - N\gamma(\lambda_R)\right\}\Gamma_R^{(N)}(k_i; \lambda_R, \mu) = 0 \tag{18.85}$$
which is now formally identical to (18.38). (Note, however, that the renormalization
scale $\mu$ in this zero mass theory now plays the role of the BPHZ renormalized mass
$m_R$ in the Callan–Symanzik equation.) The dependence on the renormalized mass
(now set to zero) in $\Gamma_R^{(N)}(k_i; \lambda_R, \mu)$ and the renormalization group functions $\beta(\lambda_R)$
and $\gamma(\lambda_R)$ (which therefore also lose their dependence on $\mu$) has been omitted, and
we have an equation which can be made useful, in precise analogy to the steps leading
from (18.38) to (18.40), by trading in the derivative with respect to renormalization
scale $\mu$ (which describes a physically inaccessible dependence of the amplitudes) for
one implementing a uniform rescaling of the external momenta, via the dimensional
equation ($d_N$ is the engineering dimension of $\Gamma_R^{(N)}$ in powers of mass)

$$\left(\kappa \frac{\partial}{\partial\kappa} + \mu \frac{\partial}{\partial\mu}\right)\Gamma_R^{(N)}(\kappa k_i; \lambda_R, \mu) = d_N\,\Gamma_R^{(N)}(\kappa k_i; \lambda_R, \mu) \tag{18.86}$$

Using (18.86) to eliminate the $\mu$ derivative in (18.85), we find

$$\left(-\kappa \frac{\partial}{\partial\kappa} + \beta(\lambda_R)\frac{\partial}{\partial\lambda_R} - N\gamma(\lambda_R) + d_N\right)\Gamma_R^{(N)}(\kappa k_i; \lambda_R, \mu) = 0 \tag{18.87}$$

This equation is identical in form (modulo a redefinition of the functions $\beta(\lambda_R)$, $\gamma(\lambda_R)$
by a factor of 2) with the Callan–Symanzik equation (18.40). At the tree level,
as discussed above, the dependence on the subtraction scale disappears, as do the
functions $\beta$, $\gamma$, and the resultant equation,

$$\left(-\kappa \frac{\partial}{\partial\kappa} + d_N\right)\Gamma_{\text{tree}}^{(N)}(\kappa k_i; \lambda_R) = 0 \;\Rightarrow\; \Gamma_{\text{tree}}^{(N)}(\kappa k_i; \lambda_R) = \kappa^{d_N}\,\Gamma_{\text{tree}}^{(N)}(k_i; \lambda_R) \tag{18.88}$$

simply reduces to the statement that the tree amplitudes contributing to $\Gamma_{\text{tree}}^{(N)}$ have
engineering dimension $d_N$, which therefore determines the scaling behavior with
respect to momentum of the zero-mass theory. The modifications to this scaling
induced by loop diagrams, with their concomitant subtractions, are the additional
information inserted in (18.87) by the renormalization group functions $\beta$ and $\gamma$.
As we are at liberty to set the renormalization scale μ to the value of the renor-
malized mass mR in the BPHZ scheme appearing in (18.40), these renormalization
group functions must in fact (apart from the trivial factor of 2) be identical, despite
their very different definitions in the two approaches. We must therefore conclude
that the scaling properties associated with the insertion of soft mass operators in
BPHZ renormalization are in fact physically equivalent to the constraints imposed
by the renormalization group, once supplemented with information about the mass
singularities of the amplitudes under consideration.
The renormalization group equation (18.87) relates in a linear fashion the scaling
behavior in momentum of the $N$-point amplitudes to the coupling constant dependence,
with the derivative-free terms responsible for an overall scaling of the amplitude.
To see this, define a running (or effective) coupling $\lambda_{\text{eff}}(\kappa)$ as the solution to the
first-order ordinary differential equation and boundary condition

$$\kappa \frac{\partial}{\partial\kappa}\lambda_{\text{eff}}(\kappa) = \beta(\lambda_{\text{eff}}(\kappa)), \qquad \lambda_{\text{eff}}(1) = \lambda_R \tag{18.89}$$

Further, let

$$z(\kappa) \equiv \exp\left(\int_1^{\kappa} \gamma(\lambda_{\text{eff}}(\kappa'))\,\frac{d\kappa'}{\kappa'}\right) \tag{18.90}$$

whence we find

$$\kappa \frac{\partial}{\partial\kappa} z^{-N}(\kappa) = -N\gamma(\lambda_{\text{eff}}(\kappa))\,z^{-N}(\kappa) \tag{18.91}$$

With these definitions, it is easy to show (see Problem 13) that the general solution
to (18.87) is

$$\Gamma_R^{(N)}(\kappa k_i; \lambda_R, \mu) = \kappa^{4-N} z^{-N}(\kappa)\,\Gamma_R^{(N)}(k_i; \lambda_{\text{eff}}(\kappa), \mu) \tag{18.92}$$
The extraction of the large momentum behavior of the amplitudes (i.e., for $\kappa$ large)
is therefore transferred to a knowledge of the $\kappa$-dependence of the running coupling
$\lambda_{\text{eff}}(\kappa)$, from which we may determine the scaling factor $z(\kappa)$, provided that we are
able to determine the dependence of the full renormalized 1PI amplitude $\Gamma_R^{(N)}$ on the
coupling (now replaced by its running counterpart). In a general theory where the
coupling(s) may be large, rendering perturbation theory inapplicable, this solution
is not particularly helpful. But in an important subclass of cases, the asymptotic
behavior is determined by the weak-coupling regime of the theory, and we are able to
make rigorous statements about the large momentum properties of the amplitudes.
Suppose that for some renormalized coupling $\bar{\lambda}$, and for all $0 < \lambda_R < \bar{\lambda}$, we have
$\beta(\lambda_R) < 0$. Then it is apparent from (18.89) that with a physical renormalized coupling
$\lambda_R < \bar{\lambda}$, the running coupling $\lambda_{\text{eff}}(\kappa)$ is monotone decreasing for $\kappa > 1$, and indeed,
we have that $\lambda_{\text{eff}}(\kappa) \to 0$, $\kappa \to \infty$. Such a theory is said to be asymptotically free. The
leading term in the perturbative expansion of the $\beta$ function is necessarily negative
in an asymptotically free theory, to enforce negativity of the $\beta$ function at arbitrarily
small couplings. We have already encountered an example in the $\phi^3_6$-theory discussed
previously, with a $\beta$ function given to lowest order (including the factor of two in
accordance with the new definition (18.81)) by

$$\beta(\lambda_R) = -\beta_0 \lambda_R^3 + O(\lambda_R^5), \qquad \beta_0 = \frac{3}{256\pi^3} \tag{18.93}$$

Once $\kappa$ is sufficiently large, the effective coupling will become sufficiently small that
the $\beta$ function is dominated by its leading term, so that the extreme large-momentum
(or short-distance) asymptotics of the theory is determined by solving the defining
equation (18.89), keeping only the leading term in (18.93). It is more convenient to
solve for the squared effective coupling $\lambda_{\text{eff}}^2$, which satisfies, at this leading order

$$\kappa \frac{\partial}{\partial\kappa}\lambda_{\text{eff}}^2(\kappa) = -2\beta_0 \lambda_{\text{eff}}^4(\kappa) \tag{18.94}$$

the solution to which is

$$\lambda_{\text{eff}}^2(\kappa) \sim \frac{\lambda_R^2}{1 + 2\beta_0 \lambda_R^2 \ln(\kappa)}, \qquad \kappa \to \infty \tag{18.95}$$
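The closed form (18.95) can be compared with a direct numerical integration of (18.89), keeping only the leading term of (18.93). The sketch below is a minimal illustration: the fourth-order Runge–Kutta integrator, the step count, and the sample values of $\lambda_R$ and $\kappa$ are choices made here, not part of the text.

```python
import math

BETA0 = 3.0 / (256.0 * math.pi**3)   # one-loop coefficient quoted in (18.93)

def beta(lam):
    """Leading term of (18.93)."""
    return -BETA0 * lam**3

def run_coupling(lam_r, kappa, steps=2000):
    """Integrate d(lambda)/d(ln kappa) = beta(lambda) from kappa = 1 using RK4."""
    h = math.log(kappa) / steps
    lam = lam_r
    for _ in range(steps):
        k1 = beta(lam)
        k2 = beta(lam + 0.5 * h * k1)
        k3 = beta(lam + 0.5 * h * k2)
        k4 = beta(lam + h * k3)
        lam += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return lam

def closed_form(lam_r, kappa):
    """Square root of (18.95)."""
    return lam_r / math.sqrt(1.0 + 2.0 * BETA0 * lam_r**2 * math.log(kappa))

numeric = run_coupling(5.0, 1e6)
analytic = closed_form(5.0, 1e6)
print(numeric, analytic)
```

Because the coefficient $\beta_0$ is numerically small, the coupling decreases only slowly even over six decades in $\kappa$, in line with the logarithmic fall-off.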

The running coupling in $\phi^3_6$-theory therefore falls off logarithmically with the
momentum rescaling variable: in effect, free field behavior is restored in the amplitude
$\Gamma_R^{(N)}(\kappa k_i; \lambda_R, \mu)$ when all momenta are taken large, though admittedly very slowly.
Exactly the same behavior obtains in QCD, with renormalized coupling $g_R$ and
lowest-order $\beta$ function (for an SU($N$) theory with $N_q$ quark fields in the fundamental
representation)

$$\beta(g_R) = -\beta_0 g_R^3 + O(g_R^5), \qquad \beta_0 = \frac{1}{16\pi^2}\left(\frac{11}{3}N - \frac{2}{3}N_q\right) \tag{18.96}$$

The extraordinary progress made in the last three-and-a-half decades in bringing the
high-energy behavior of many strong interaction amplitudes under analytic control is
entirely dependent on this very fortunate property of the theory, first uncovered by
Gross, Politzer, and Wilczek, and for which they received the 2004 Nobel Prize in
Physics. Considerations of space preclude our delving further into this fascinating and
hugely important area of modern particle physics.15

15 Classic QCD applications of the renormalization group control of high-energy processes, such as the
inclusive electron–positron annihilation cross-section to hadrons, and deep-inelastic scattering can be found
in the reviews of Mueller and Buras cited earlier, as well as in any number of modern texts on Standard
Model field theory.
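The sign of $\beta_0$ in (18.96) depends on the balance between the gauge and quark contributions, and it is easy to tabulate. The sketch below evaluates the bracket of (18.96) exactly and finds the largest number of quark fields for which $\beta_0$ stays positive; the function names are hypothetical, and the choice of SU(3) with six quark fields as a sample point is an illustration added here.

```python
import math
from fractions import Fraction

def beta0_bracket(n_colors, n_quarks):
    """Bracket of (18.96): 11N/3 - 2N_q/3, kept exact with Fractions."""
    return Fraction(11 * n_colors, 3) - Fraction(2 * n_quarks, 3)

def beta0(n_colors, n_quarks):
    """Full coefficient beta_0 of (18.96)."""
    return float(beta0_bracket(n_colors, n_quarks)) / (16 * math.pi**2)

def is_asymptotically_free(n_colors, n_quarks):
    """True when beta_0 > 0, so that the one-loop beta function is negative."""
    return beta0_bracket(n_colors, n_quarks) > 0

# Largest number of quark fields compatible with beta_0 > 0 for SU(3).
max_quarks_su3 = max(nq for nq in range(0, 40) if is_asymptotically_free(3, nq))
print(max_quarks_su3, beta0_bracket(3, 6), beta0(3, 6))
```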

Asymptotically free theories in four spacetime dimensions are rather rare: indeed,
in the class of perturbatively renormalizable theories, the only field theories with this
property are gauge theories with a non-abelian gauge group, and with not too many
matter fields charged under the gauge group (e.g., in (18.96), we must have $N_q < \frac{11}{2}N$).
Our other field-theoretic workhorse, $\phi^4_4$-theory, variants of which are clearly present in
the Standard Model in the event that the Higgs particle turns out to be an elementary
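scalar, is not asymptotically free. To lowest (one-loop) order, one finds (see Problem 14)
$$\beta(\lambda_R) = +\beta_0\lambda_R^2 + O(\lambda_R^3), \quad \beta_0 = \frac{3}{16\pi^2} \tag{18.97}$$
and a running coupling
$$\lambda_{\rm eff}(\kappa) \sim \frac{\lambda_R}{1 - \beta_0\lambda_R\ln(\kappa)} \tag{18.98}$$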

which evidently runs into a singularity (the famous “Landau pole”16 ) at a finite value
of κ, beyond which the effective coupling changes sign, apparently destabilizing the
theory. Of course, we are no longer entitled to rely on the perturbative one-loop form
of the β function once the effective coupling becomes large, as it certainly does once
the singularity is approached. Non-perturbative studies have provided considerable
evidence for the hypothesis that this situation should nevertheless be interpreted
as signaling the “triviality” of φ4 theory (cf. the discussion of global aspects of the
renormalization group flow in Section 17.4): the removal of the UV cutoff of the
theory, corresponding to prescribing a well-defined Wilsonian effective Lagrangian at
arbitrarily high-energy scales, necessarily implies the vanishing of the renormalized
coupling λR defined at any fixed low-energy scale (see, for example, (Fernandez et al.,
1992), for a detailed treatment of the mathematical issues surrounding triviality).
Nevertheless, given, as emphasized on many previous occasions, the unavoidable
presence of a physical cutoff at sufficiently high energy, there is absolutely no reason
to reject $\phi^4_4$ theory as a perfectly adequate low-energy effective field theory whose
amplitudes are accurately computable (if we are lucky enough to have a sufficiently
small λR ) by the standard technology of perturbative renormalization theory.
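
To get a feeling for how remote the one-loop singularity is at weak coupling, the sketch below evaluates the running coupling (18.98) with the coefficient β0 of (18.97) and locates the value of ln κ at which the denominator vanishes. The function names and the sample couplings are assumptions made for illustration; as stressed above, the one-loop form cannot be trusted near the pole itself.

```python
import math

# One-loop coefficient quoted in (18.97).
BETA0_PHI4 = 3.0 / (16.0 * math.pi ** 2)


def lambda_eff(lam_r, log_kappa, beta0=BETA0_PHI4):
    """One-loop running coupling (18.98) as a function of ln(kappa)."""
    denom = 1.0 - beta0 * lam_r * log_kappa
    if denom <= 0.0:
        raise ValueError("ln(kappa) is at or beyond the one-loop Landau pole")
    return lam_r / denom


def log_landau_pole(lam_r, beta0=BETA0_PHI4):
    """Value of ln(kappa) at which the denominator of (18.98) vanishes."""
    return 1.0 / (beta0 * lam_r)


# Sample renormalized couplings (illustrative assumptions).
for lam_r in (0.1, 0.5, 1.0):
    t_pole = log_landau_pole(lam_r)
    print(f"lambda_R={lam_r}: ln(kappa_pole)={t_pole:.1f}, "
          f"lambda_eff at half way={lambda_eff(lam_r, 0.5 * t_pole):.3f}")
```

The logarithm of the pole position is returned rather than the position itself, since exp(1/(β0 λR)) overflows double precision for small couplings.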
An alternative scenario to triviality for four-dimensional non-asymptotically-free
theories has been the subject of much interest: the presence of a non-trivial ultraviolet
fixed point at which the theory recovers an exact scale (even conformal) invariance.
Consider a four-dimensional scalar theory where the β function is positive at small
coupling, β(λ) > 0, 0 < λ < λ∗ , but with a zero at some positive coupling value
β(λ∗ ) = 0, as indicated in Fig. 18.26. We also assume that the particular physical
theory of interest has a renormalized coupling λR < λ∗ , which serves as the starting
point, λeff (κ = 1) = λR , for the renormalization group flow of the effective coupling
defined in (18.89). As the β function is positive, the effective coupling increases
monotonically with κ, with the rate of growth slowing as λeff (κ) approaches the

16 Quantum electrodynamics, and abelian gauge theories generally, exhibit a similar structure, with
a positive β function at small coupling: the associated singular behavior, in the context of the photon
propagator, was first pointed out in the 1950s by Landau.

value λ∗ , which therefore acts as a fixed point of the flow: λeff (κ) → λ∗ , κ → ∞. From
(18.90), the scaling factor z(κ) therefore behaves asymptotically as
$$z(\kappa) \sim \exp\Big(A + \gamma(\lambda^*)\int^{\kappa}\frac{d\kappa}{\kappa}\Big) \sim C\kappa^{\gamma(\lambda^*)}, \quad \kappa\to\infty \tag{18.99}$$

which means that the scaling behavior $\kappa^{4-N}$ of the tree amplitudes of the massless
theory (cf. (18.88), which follows from the fact that the N scalar fields in the associated
Green function each have engineering dimension 1 (in four dimensions), together
with naive dimensional analysis), is altered by quantum loop effects to the scaling
$\kappa^{4-N(1+\gamma(\lambda^*))}$. We interpret this by saying that the interactions have induced an
anomalous scale dimension γ(λ∗ ) for the underlying scalar field, but that scaling by
fixed powers (rather than fractional powers of logarithms, as in the asymptotically free
case) is restored at the fixed point. In fact, as we saw earlier in our discussion of the trace
anomaly (cf. (15.201)), the trace of the energy-momentum tensor, which acts as the
divergence of the would-be conserved current of scale transformations (cf. (12.118)),
only vanishes in a massless interacting theory if the β function does so: i.e., at
precisely the fixed point indicated in Fig. 18.26. The situation described here does not
appear to arise in any of the field theories comprising the Standard Model of particle
interactions in four dimensions, but conformally invariant interacting field theories
in two dimensions have been the subject of intense scrutiny, partly as a consequence
of their close connection to aspects of string theory. Moreover, the renormalization
group treatment of critical phenomena in condensed-matter theory—specifically, the
scaling behavior of thermodynamic quantities at a second-order transition—is based
precisely on the existence of an infrared fixed point in scalar field theories which can
be shown to describe the long-distance behavior of spin models near such transitions.
A classic case is the β function of massless φ4 theory in three dimensions, which has
exactly the appearance of Fig. 18.26, but with an overall minus sign, indicating that
power scaling behavior is obtained for the correlation functions in the infrared limit

Fig. 18.26 β function for a theory with an ultraviolet fixed point. (Axes: β(λ) against λ, with the points λR and λ∗ marked on the λ axis.)



(κ taken to zero).17 The Wilson–Fisher analysis of critical exponents in second-order
phase transitions relies on the existence of an infrared fixed point of just this kind.
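
The approach of the effective coupling to an ultraviolet fixed point can be illustrated with a toy model. The sketch below assumes a hypothetical β function, β(λ) = bλ²(1 − λ/λ∗), chosen only because it is positive for 0 < λ < λ∗ and vanishes at λ∗; it is not derived from any theory discussed in the text. The flow equation d λeff / d ln κ = β(λeff) is integrated from λeff(κ = 1) = λR.

```python
def beta_toy(lam, b=1.0, lam_star=2.0):
    """Hypothetical beta function with a zero at lam_star (an assumption)."""
    return b * lam * lam * (1.0 - lam / lam_star)


def run_coupling(lam_r, log_kappa, steps=20000, b=1.0, lam_star=2.0):
    """RK4 integration of d(lam_eff)/d(ln kappa) = beta(lam_eff) from kappa = 1."""
    h = log_kappa / steps
    lam = lam_r
    for _ in range(steps):
        k1 = beta_toy(lam, b, lam_star)
        k2 = beta_toy(lam + 0.5 * h * k1, b, lam_star)
        k3 = beta_toy(lam + 0.5 * h * k2, b, lam_star)
        k4 = beta_toy(lam + h * k3, b, lam_star)
        lam += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return lam


# Starting below the fixed point, the coupling rises monotonically toward it.
for t in (1.0, 5.0, 20.0, 50.0):
    print(f"ln(kappa)={t:5.1f}  lambda_eff={run_coupling(0.5, t):.6f}")
```

Passing a negative `log_kappa` runs the same flow toward the infrared, where in this toy model the coupling decreases toward zero instead.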
Finally, we turn to the case in which the amplitude of interest has, in addition
to a large-momentum scale Q, important low-momentum scales, and in which a
factorization of the amplitude via an operator product expansion can be established.
We examined in the previous section the case of an (Euclidean) amplitude with N + 2
scalar lines, two of which inject and remove a large momentum q, leading to the
factorization indicated in (18.53). Moreover, the factorization of the mass-inserted
amplitude led to a homogeneous Callan–Symanzik equation (18.54) for the coefficient
function $\tilde C_{\phi^2}(q)$, which can evidently be solved in an asymptotically free theory by the
techniques described above to produce an explicit form for the leading asymptotic
behavior for large q. Here we wish to reproduce this result, but starting from a
renormalization group analysis of the amplitude, in analogy to the arguments given
previously for the case of uniformly large external momenta.
Let us suppose that we have already established the validity of an operator
product expansion (OPE) like (18.53) for a scalar amplitude with N + 2 external
legs, and suppose for simplicity that only a single operator O appears in the leading
twist contribution (e.g., $O = \phi^2$ in the case considered explicitly in Section 18.3).
The right-hand side of the OPE contains a renormalized amplitude $\Gamma_R^{(N,O)}$ with N
scalar legs and a single insertion of the renormalized composite operator, obtained by
multiplicative renormalization (cf. (18.9)) of the corresponding bare operator, $O_R(x) =
Z_O(\lambda_R, m_R, \mu, \Lambda)O(x)$. This amplitude, in analogy to (18.74), may be written in terms
of the corresponding bare amplitude (neglecting terms vanishing like inverse powers
of the UV cutoff),
$$\Gamma_R^{(N,O)}(k_i; \lambda_R, m_R, \mu) = Z_O(\lambda_R, m_R, \mu, \Lambda)\,\hat Z(\lambda_R, m_R, \mu, \Lambda)^{-N/2}\,\Gamma^{(N,O)}(k_i; \lambda, m, \Lambda) \tag{18.100}$$
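Repeating the procedure leading from (18.74) to (18.84), we find
$$\Big\{\mu\frac{\partial}{\partial\mu} + \beta(\lambda_R, \tfrac{m_R}{\mu})\frac{\partial}{\partial\lambda_R} + \gamma_m(\lambda_R, \tfrac{m_R}{\mu})\,m_R\frac{\partial}{\partial m_R} - N\gamma(\lambda_R, \tfrac{m_R}{\mu}) + \gamma_O(\lambda_R, \tfrac{m_R}{\mu})\Big\}\Gamma_R^{(N,O)}(k_i; \lambda_R, m_R, \mu) = 0 \tag{18.101}$$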
where the new anomalous dimension arises from a logarithmic derivative of the
renormalization constant ZO (λR , mR , μ, Λ) associated with the composite operator
O. The OPE implies the asymptotic behavior
$$\Gamma_R^{(N+2)}(q, k_i; \lambda_R, m_R, \mu) \sim \tilde C_O(q; \lambda_R, m_R, \mu)\,\Gamma_R^{(N,O)}(k_i; \lambda_R, m_R, \mu) \tag{18.102}$$
with $\Gamma_R^{(N+2)}(q, k_i; \lambda_R, m_R, \mu)$ satisfying the renormalization group equation (18.84)
with N → N + 2. Inserting the right-hand side of (18.102) into the latter, and using
(18.101), one finds

17 A comprehensive treatment of this important area can be found in the treatise of Zinn–Justin (Zinn–
Justin, 1989).

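$$\Big\{\mu\frac{\partial}{\partial\mu} + \beta\frac{\partial}{\partial\lambda_R} + \gamma_m m_R\frac{\partial}{\partial m_R} - (2\gamma + \gamma_O)\Big\}\tilde C_O(q; \lambda_R, m_R, \mu) = 0 \tag{18.103}$$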
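
The substitution leading to (18.103) can be spelled out in a few lines. The sketch below introduces the shorthand D for the common first-order differential operator (a notation assumed here only for compactness) and takes (18.84) to have the form $\{D - N\gamma\}\Gamma_R^{(N)} = 0$, which is what the $-N\gamma$ term in (18.101) suggests.

```latex
% D is a shorthand assumed for this sketch only.
\begin{align*}
D &\equiv \mu\frac{\partial}{\partial\mu} + \beta\frac{\partial}{\partial\lambda_R}
      + \gamma_m m_R\frac{\partial}{\partial m_R}, \\
% (18.84) with N -> N+2, applied to the OPE form (18.102); D obeys the product rule:
0 &= \{D - (N+2)\gamma\}\,\tilde C_O\,\Gamma_R^{(N,O)}
   = (D\tilde C_O)\,\Gamma_R^{(N,O)} + \tilde C_O\,(D\Gamma_R^{(N,O)})
     - (N+2)\gamma\,\tilde C_O\,\Gamma_R^{(N,O)}, \\
% (18.101) rearranged:
D\Gamma_R^{(N,O)} &= (N\gamma - \gamma_O)\,\Gamma_R^{(N,O)}, \\
% the N-dependent terms cancel, leaving (18.103):
0 &= \{D - (2\gamma + \gamma_O)\}\,\tilde C_O .
\end{align*}
```

The cancellation of the N-dependent terms is what makes the resulting equation a statement about the coefficient function alone.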

An equation of Callan–Symanzik type for the coefficient function is therefore obtained,
provided we can neglect the mass dependence in the coefficient function and take
mR = 0—in other words, if the mass singularities of the full amplitude can be removed
from the high-momentum (or “hard”) end of the amplitude (the coefficient function)
and completely incorporated in the low-momentum end (the matrix element(s) of
the composite operator O). We then obtain a renormalization group equation for the
coefficient function,

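$$\Big\{\mu\frac{\partial}{\partial\mu} + \beta(\lambda_R)\frac{\partial}{\partial\lambda_R} - (2\gamma(\lambda_R) + \gamma_O(\lambda_R))\Big\}\tilde C_O(q; \lambda_R, \mu) = 0 \tag{18.104}$$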

of the same form as the Callan–Symanzik equation (18.54) obtained previously
(modulo differences in definition of anomalous dimensions in the two approaches).
The suppression of soft mass insertions in hard amplitudes which lies at the core of
the BPHZ approach to high-energy behavior is thus seen, in the renormalization group
approach, to be tantamount to the property of factorization of mass singularities. In
the case of QCD, the ability to isolate the extremely complicated mass singularity
structure of hadronic amplitudes (containing for example, the intricate confinement
physics inaccessible to perturbative treatment) is the basic precondition to extracting
useful information (via asymptotic freedom) using renormalization group equations.

18.4 Problems
1. The conserved currents $J^\mu_\alpha$ associated with a non-abelian symmetry satisfy the
Ward–Takahashi identity (12.142), where all the operators are bare, unrenormalized
ones (prior to rescaling). Rewriting the identity in terms of renormalized
fields (with $N_3(J^\mu_\alpha) = Z_J J^\mu_\alpha$, $\phi_R = \hat Z^{-1/2}\phi$), show that the bare $J^\mu_\alpha$ must already
be ultraviolet finite, and that we may therefore simply set the associated
renormalization factor $Z_J = 1$.
2. By examining the graph of Fig. 18.27(a), verify that the oversubtracted $N_4(\phi^2)$
operator in $\phi^4_4$-theory contains a contribution from the (minimally subtracted)
$N_4(\phi^4)$ operator, as implied by the Zimmermann identity (18.11).
3. The one-loop hexagon graph in $\phi^3_6$ theory shown in Fig. 18.27(b) is taken at an
exceptional momentum point as various subsets of the incoming external momenta
combine to zero momentum. It is proportional to the (UV-finite) Feynman integral,
with superficial degree of divergence –6,
$$I(q_i, m) = \int\frac{1}{(l-q_1)^2+m^2}\,\frac{1}{(l-q_2)^2+m^2}\,\frac{1}{(l-q_3)^2+m^2}\,\frac{1}{(l^2+m^2)^3}\,\frac{d^6l}{(2\pi)^6} \tag{18.105}$$

(a) By rescaling the loop integration variable by $Q \equiv \sqrt{q^2}$, show that
$$I(q_i, m) = \frac{1}{Q^6}\,I(\hat q_i, \tfrac{m}{Q}), \quad \hat q_i = q_i/Q, \quad \hat q_i^2 = 1, \quad i = 1, 2, 3 \tag{18.106}$$

Fig. 18.27 (a) A one-loop graph in $\phi^4_4$-theory needing an oversubtraction in the insertion of
$N_4(\phi^2)$ in the 2-2 amplitude. (b) A one-loop graph in $\phi^3_6$-theory at an exceptional external
momentum point.

Show that for large Q, the dependence of $I(\hat q_i, \tfrac{m}{Q})$ on the vanishing rescaled
mass $\tfrac{m}{Q}$ is logarithmic, due to a logarithmic infrared divergence when $l \to 0$
in the rescaled integral. This implies a logarithmic modification of the naive
power scaling $1/Q^6$ when all $q_i$ are taken large simultaneously (of order Q):
$I(q_i, m) \sim C\ln(Q/m)/Q^6$.
(b) Now assume that a mass insertion is made on one of the lines carrying
momentum l, thereby increasing the power of the propagator $\frac{1}{l^2+m^2}$ to four.
Show that there is no suppression of the asymptotic dependence in this case,
despite the reduction of the superficial degree of divergence of the graph to –8.
Explain this result in terms of the enhanced mass singularity of the rescaled
integral after the mass insertion.
(c) Show that if the incoming external momenta are chosen non-exceptionally,
$q_i$, $i = 1, 2, \ldots, 6$, $\sum_i q_i = 0$, all $q_i$ of order Q, but no subset of the $q_i$ summing
to zero, a mass insertion on any line does reduce the asymptotic dependence
for large Q by two powers of Q, in accordance with the superficial degree of
divergence.
4. Verify the expressions (18.41,18.42) for the 1PI two- and three-point functions of
$\phi^3_6$-theory at one-loop order.
5. Construct the Zimmermann forests needed to subtract the 1-1 matrix element of
$N_4(\phi^2)$ through 2-loop order, and show how they correspond to the subtractions
introduced in Fig. 18.17 (here, the 2PI kernel factors can be replaced by single
propagators, for simplicity).
6. By analysing the large momentum flows for the skeleton ladder expansion of the
amplitude T (q, k1 , k2 , ..., kn ) (with n > 2), show that the required oversubtrac-
tions lead to the OPE factorization given in (18.53).
7. The point of this exercise is to check the validity of the Callan–Symanzik equation
(18.54) for the coefficient function of the $\phi^2$ operator in $\phi^3_6$-theory. To do this,
first calculate the coefficient function $\tilde C_{\phi^2}(q)$ through one loop, renormalized

by zero momentum BPHZ subtraction (use an interaction Lagrangian $\frac{\lambda_R}{3!}\phi^3$,
and do not forget self-energy and vertex corrections). Then calculate $D_{CZ}\tilde C_{\phi^2}(q)$
(using γ(λR), β(λR) from (18.43,18.45)) and show that (18.54) holds, neglecting
subdominant terms of order $1/q^4$, with $\gamma_{\phi^2,CZ} = -\frac{1}{128\pi^3}\lambda_R^2 + O(\lambda_R^4)$.

8. Verify the equality of the left- and right-hand sides of (18.55) (see also
Section 6.6).
9. Starting from (18.63), verify the inequality (18.64).
10. The analyticity of the amplitude $\Gamma(\omega, Q^2)$ (suppressing the $k^2$ dependence) in
the complex ω-plane is limited by the presence of cuts due to physical thresholds
in the sum over states n in (18.55). For a physical state with forward time-like
four-momentum $P^\mu$, we must have $P_\pm > 0$ (as $P_0 > |\vec P|$). Show that this implies
that the imaginary part of the amplitude vanishes unless ω > 1. Moreover, by
Bose symmetry (amplitude T(q, k) even under q → −q), we must have $\Gamma(\omega, Q^2) =
\Gamma(-\omega, Q^2)$. Consequently, $\Gamma(\omega, Q^2)$ is analytic in the complex ω-plane except for
cuts running along the real axis from −∞ to −1 and +1 to +∞.
11. (a) Show that the one-loop term contributing to the right-most graph in Fig.
18.25, with the operator $N_4\big(\phi(0)(\frac{i\partial_-}{2k_-})^n\phi(0)\big)$ appearing at the vertex V, is
logarithmically divergent regardless of the power n. The relevant Feynman integral is
$$\int\frac{1}{(l^2-m^2+i\epsilon)^2}\,\frac{1}{(k+l)^2-m^2+i\epsilon}\,\Big(\frac{l_-}{k_-}\Big)^n\frac{d^6l}{(2\pi)^6} \tag{18.107}$$

Show that this integral is rendered finite when once subtracted at the light-like
point $k = \hat k$ (i.e., setting $k_+ = \vec k = 0$). Hint: consider the integral with $l_-^n$
replaced by the general tensor $l_{\mu_1}l_{\mu_2}\ldots l_{\mu_n}$; after introducing Feynman parameters (cf.
(16.67), easily generalized to higher powers of either propagator by differentiation
with respect to A or B), and making the usual shift in integration variable, one
sees that the only surviving term when all Lorentz indices are set equal to the −
light-cone coordinate is proportional to $k_-^n$ (as $g_{--} = 0$), with a logarithmically
divergent coefficient.
(b) Repeat for the one-loop graph studied in part (a), but with a single mass
insertion (doubled propagator), which gives, up to uninteresting factors, the
lowest-order contribution to the anomalous dimension γn in (18.72).
12. Use the analyticity of $\Gamma(\omega, Q^2)$ demonstrated in Problem 10 to derive the following
Cauchy representation for the coefficients in (18.70):
$$v_nC_n(Q^2) = \frac{1}{2\pi i}\oint\frac{d\omega}{\omega^{n+1}}\,\Gamma(\omega, Q^2) \tag{18.108}$$

where the contour runs around a small circle enfolding the origin. By unfolding
the contour of integration onto the cuts, and changing variables to x ≡ 1/ω, show
that
$$v_nC_n(Q^2) = \frac{1 + (-1)^n}{\pi}\int_0^1 x^{n-1}\,{\rm Im}(\Gamma(x, q^2))\,dx \tag{18.109}$$

giving the coefficient functions in terms of moments of the inclusive deep-inelastic
cross-section (cf. (18.55)).
13. Our objective in this problem is to verify the general solution (18.92) to (18.87).
Starting from the definitions (18.89, 18.90), establish first the identities
$$\beta(\lambda_R)\frac{\partial\lambda_{\rm eff}(\kappa)}{\partial\lambda_R} = \beta(\lambda_{\rm eff}(\kappa)) \tag{18.110}$$
$$\frac{\partial z(\kappa)}{\partial\lambda_R} = \frac{1}{\beta(\lambda_R)}\big(\gamma(\lambda_{\rm eff}(\kappa)) - \gamma(\lambda_R)\big)z(\kappa) \tag{18.111}$$
where the derivatives with respect to λR are at fixed κ. Using these identities,
together with (18.91), verify directly that the Ansatz (18.92) solves (18.87).
14. Calculate the Euclidean 1PI four-point vertex function $\Gamma_R^{(4)}(k_1, k_2, k_3, k_4)$ in zero
mass $\phi^4_4$-theory, with all momenta incoming, and subtracted at the symmetric
Euclidean point (with $k_i^2 = \mu^2$, $k_i\cdot k_j = -\mu^2/3$, $i \neq j$), through one loop. Show
that one obtains, to order $\lambda_R^2$,
$$\Gamma_R^{(4)}(k_i) = \lambda_R + \frac{\lambda_R^2}{32\pi^2}\Big(\ln\Big(\frac{3(k_1+k_2)^2}{4\mu^2}\Big) + \ln\Big(\frac{3(k_1+k_3)^2}{4\mu^2}\Big) + \ln\Big(\frac{3(k_1+k_4)^2}{4\mu^2}\Big)\Big) + O(\lambda_R^3) \tag{18.112}$$
Use this result to verify the result (18.97) previously stated for the one-loop β
function in $\phi^4_4$-theory.
19
Scales IV: Long-distance structure
of quantum field theory

In the final chapter of this book we wish to turn our attention to the behavior of field
theory amplitudes in the long-distance regime, where “distance” is to be interpreted
spatiotemporally: we are interested in those aspects of elementary processes described
by a local relativistic field theory which involve large, even macroscopic, regions of
spacetime. Evidently, this behavior is closely connected with the whole definition of the
stable, asymptotic particle states of the theory, which are after all what survive when
a long time has elapsed after an elementary interaction and the final-state particles
are allowed to separate and become free from each other’s influence.
In Section 9.3 we described in detail the construction of these asymptotic states
and their connection to the local (or almost local) Heisenberg fields of the theory,
along the lines of the Haag–Ruelle scattering theory. An essential input to the
Haag–Ruelle formalism is the existence of a mass gap: the single particle state(s)
of the theory correspond to isolated δ-function singularities $\delta(p^2 - m^2)$, $m \neq 0$, in the
spectral density of the squared-mass operator $P_\mu P^\mu$, or equivalently, simple poles in
the momentum-space full propagator defined by Fourier transforming the two-point
Feynman function of any suitable interpolating field for the particle in question. In
other words, the scattering theory unfolds in a straightforward and physically intuitive
way provided we adhere to field theories of purely massive particles. Once exactly
massless particles are present in the theory, the construction of a rigorous scattering
theory—in particular, the construction of a separable asymptotic Fock space based on
almost local operators, along the lines of Theorem 9.4—becomes considerably more
complicated. Nevertheless, the desired extension of the Haag–Ruelle theory has been
carried out by Buchholz for massless spin-0 bosons (Buchholz, 1977) and for massless
spin-$\frac{1}{2}$ fermions (Buchholz, 1975), leading to a well-defined S-matrix and asymptotic
in- and out-Fock spaces along more or less the usual lines, at least for theories in odd
spatial dimensions where the Huyghens principle applies.
For massless spin-1 gauge bosons on the other hand, where the dynamics is
subject to an exact local gauge symmetry, the situation is far more subtle. Strangely
enough, the conceptual (if not calculational) difficulties are greater in the case of
unbroken abelian theories such as QED than in non-abelian gauge theories such as
QCD, for the simple reason that the latter theories do have a mass gap, despite the
presence of massless fields in the Lagrangian, as a consequence of the non-perturbative
confinement of non-gauge-invariant states, to be discussed below. Accordingly, the
slippery issues which arise once massless gauge particles appear in the asymptotic

spectrum are avoided in the case of unbroken non-abelian theories, as they also are
in the case of spontaneously broken gauge theories (abelian or non-abelian) where
the Higgs phenomenon results in the spin-1 gauge particles of the theory associated
with the broken generators becoming massive. For an unbroken abelian gauge theory
such as QED, on the other hand, with a massive charged particle (e.g., the electron)
coupled to exactly massless photons, we shall see that the definition of a conventional
Fock space and associated S-matrix fails in a fundamental way: strictly speaking,
the S-matrix vanishes identically in such a theory. Indeed, the single-particle pole(s)
in amplitudes associated with incoming or outgoing charged particle(s) are softened
to branch points, making the LSZ formalism useless for obtaining finite scattering
amplitudes.

19.1 The infrared catastrophe in unbroken abelian gauge theory


We shall calculate the amplitude for creation and absorption of an arbitrary number of
real photons by an external c-number conserved current Jμ (x), coupled to a quantized
photon field Aμ (x). Real photon emission and absorption from this classical external
current is best studied in the transverse, or Coulomb, gauge, with the photon field
satisfying
$$\vec\nabla\cdot\vec A(x) = 0 \tag{19.1}$$
corresponding to the Fourier expansion (cf. (7.174)), choosing a real basis for the
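polarization vectors,
$$\vec A(x) = \int\frac{d^3k}{(2\pi)^{3/2}\sqrt{2E(k)}}\sum_{\lambda=1}^{2}\Big(\vec\epsilon(\vec k, \lambda)\,a(\vec k, \lambda)\,e^{-ik\cdot x} + \vec\epsilon(\vec k, \lambda)\,a^\dagger(\vec k, \lambda)\,e^{ik\cdot x}\Big) \tag{19.2}$$

with $E(k) = |\vec k|$ for massless photons. The polarization vectors $\vec\epsilon(\vec k, \lambda)$ are transverse,
i.e., $\vec k\cdot\vec\epsilon(\vec k, \lambda) = 0$, thereby ensuring the Coulomb gauge condition. It will be convenient
to re-express the photon field in a four-dimensional Fourier representation, by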
reintroducing the energy integral, with the mass-shell condition inserted via the usual
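δ-function:
$$\vec A(x) = \int\frac{d^4k}{(2\pi)^4}\sum_{\lambda=1}^{2}(2\pi)^{5/2}\sqrt{2E(k)}\,\Big(\theta(k_0)\,\delta(k^2)\,\vec\epsilon(\vec k, \lambda)\,a(\vec k, \lambda)\,e^{-ik\cdot x} + {\rm h.c.}\Big) \tag{19.3}$$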

The Heisenberg field equation for the theory is just the quantized version of
Maxwell’s equation,
$$\Box\vec A_H(x) = \vec J_{tr}(x), \quad \vec\nabla\cdot\vec J_{tr}(x) = 0 \tag{19.4}$$

where Jtr (x) is the transverse part of the full external current Jμ (an explicit example
will be considered shortly), the rest of Jμ (or “longitudinal” part) being associated with
the Coulombic flux carried along with the incoming and outgoing charged particle(s)
of the classical current, but not involved in the generation or removal of real photons
present at outgoing or incoming null (i.e., light-like) infinity. The prescribed c-number
current Jtr (x) is real and transverse, and therefore has the Fourier expansion


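$$\vec J_{tr}(x) = \int\frac{d^4k}{(2\pi)^{5/2}}\sum_{\lambda=1}^{2}\vec\epsilon(\vec k, \lambda)\Big(\tilde J(\vec k, \lambda)\,e^{-ik\cdot x} + \tilde J^*(\vec k, \lambda)\,e^{ik\cdot x}\Big) \tag{19.5}$$

with $\tilde J(\vec k, \lambda)$ a c-number function of momentum for each polarization. The strange
power of 2π is chosen for later convenience.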
The field equation (19.4) can be solved subject to the asymptotic conditions which
relate the Heisenberg field to the in- and out-fields in the far past and future (cf.
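(9.46))1
$$\begin{aligned}\vec A_H(\vec x, t) &\to \vec A_{in}(\vec x, t), \quad t\to-\infty\\ \vec A_H(\vec x, t) &\to \vec A_{out}(\vec x, t), \quad t\to+\infty\end{aligned} \tag{19.6}$$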
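by introducing retarded and advanced Green functions for the $\Box$ operator in (19.4):
$$\Delta_R(x) \equiv -\int\frac{e^{-ik\cdot x}}{k^2 + i\epsilon k_0}\,\frac{d^4k}{(2\pi)^4}, \quad \Delta_R(x) = 0, \; x_0 < 0 \tag{19.7}$$
$$\Delta_A(x) \equiv -\int\frac{e^{-ik\cdot x}}{k^2 - i\epsilon k_0}\,\frac{d^4k}{(2\pi)^4}, \quad \Delta_A(x) = 0, \; x_0 > 0 \tag{19.8}$$
$$\Box\Delta_R(x) = \Box\Delta_A(x) = \delta^4(x) \tag{19.9}$$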
For example, the retarded Green function $\Delta_R(x)$ vanishes for negative time coordinates
$x_0 < 0$ as the denominator has two simple poles $k_0 = \pm|\vec k| - i\epsilon$ which are both below
the real axis, allowing us to close the $k_0$ integration contour in the upper-half-plane for
$x_0 < 0$, avoiding both poles and giving zero by Cauchy’s theorem. A similar argument
gives the corresponding advanced property for $\Delta_A(x)$. The Green function property
(19.9) is obvious, as the infinitesimal displacement factors $\pm i\epsilon k_0$ become irrelevant
once the d’Alembertian operator $\Box$ is applied, generating a $-k^2$ factor in the integrand
and cancelling the poles. These Green functions immediately provide the solution of
(19.4) subject to the boundary conditions (19.6):

$$\vec A_H(x) = \vec A_{in}(x) + \int d^4y\,\Delta_R(x - y)\,\vec J_{tr}(y) \tag{19.10}$$
$$\vec A_H(x) = \vec A_{out}(x) + \int d^4y\,\Delta_A(x - y)\,\vec J_{tr}(y) \tag{19.11}$$

Subtracting these two equations, we find




$$\vec A_{out}(x) = \vec A_{in}(x) + \int d^4y\,\Delta(x - y)\,\vec J_{tr}(y), \quad \Delta(x) \equiv \Delta_R(x) - \Delta_A(x) \tag{19.12}$$

1 The field renormalization constant Z appearing in the asymptotic condition, non-trivial in theories
with fully quantized local field interactions, is simply unity in our case, where the photon field is coupled
to a c-number source. This follows from the fact that the Heisenberg field and the associated asymptotic
in- and out-fields differ by c-number terms (see (19.10, 19.11)), and therefore satisfy identical equal-time
commutators, which then forces Z = 1. We remind the reader that the limits indicated in (19.6) are to
be interpreted as weak limits—for matrix elements of the indicated, suitably smeared, operators (see
Section 9.3).

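Employing the familiar identity $\frac{1}{k^2 + i\epsilon k_0} = P(\frac{1}{k^2}) - i\pi\varepsilon(k_0)\delta(k^2)$ (with $\varepsilon(k_0)$ the sign
function, $\varepsilon(k_0) = \theta(k_0) - \theta(-k_0)$), the Green function Δ(x) is found to have the
Fourier representation
$$\Delta(x) = i\int\frac{d^4k}{(2\pi)^3}\,e^{-ik\cdot x}\,\varepsilon(k_0)\,\delta(k^2) \tag{19.13}$$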

Taking the Fourier transform of (19.13), and using the Fourier representations (19.3)
and (19.5) for the field and current, we find
$$a_{out}(\vec k, \lambda) = a_{in}(\vec k, \lambda) + \frac{i}{\sqrt{2E(k)}}\,\tilde J(\vec k, \lambda) \tag{19.14}$$

This explicit relation between the out- and in-annihilation operators also determines
(up to an overall phase) the S-matrix for the theory (cf. (9.51)),

aout (k, λ) = S † ain (k, λ)S (19.15)

with a similar relation connecting the creation operators $a^\dagger_{out}$, $a^\dagger_{in}$. The desired
S-matrix operator is easily found if we recall the Baker–Campbell–Hausdorff formula
$e^BAe^{-B} = A + [B, A]$, valid if [B, A] is a c-number. If we take S as the formally
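unitary operator (exponential of an anti-hermitian operator)
$$S = \exp\Big(i\int\frac{d^3k}{\sqrt{2E(k)}}\sum_{\lambda=1}^{2}\big(\tilde J(\vec k, \lambda)\,a^\dagger_{in}(\vec k, \lambda) + \tilde J^*(\vec k, \lambda)\,a_{in}(\vec k, \lambda)\big)\Big) \tag{19.16}$$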

then (19.15) follows immediately using the creation–annihilation commutator algebra
$[a_{in}(\vec k, \lambda), a^\dagger_{in}(\vec k', \lambda')] = \delta_{\lambda\lambda'}\,\delta^3(\vec k - \vec k')$.
The expression just obtained for the scattering operator is more easily interpreted
by re-expressing it in Wick ordered form, with all creation operators placed to the left
of all destruction operators. This is easily achieved using the standard identity,
$$e^{A+B} = e^{-\frac{1}{2}[A,B]}\,e^A e^B \tag{19.17}$$

valid if the commutator [A, B] is a c-number. Choosing A and B as the first and second
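terms in the exponent in (19.16), we find
$$S = C\exp\Big(i\int\frac{d^3k}{\sqrt{2E(k)}}\sum_{\lambda=1}^{2}\tilde J(\vec k, \lambda)\,a^\dagger_{in}(\vec k, \lambda)\Big)\cdot\exp\Big(i\int\frac{d^3k}{\sqrt{2E(k)}}\sum_{\lambda=1}^{2}\tilde J^*(\vec k, \lambda)\,a_{in}(\vec k, \lambda)\Big) \tag{19.18}$$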

with the commutator contribution
$$C \equiv \exp\Big(-\frac{1}{2}\int\frac{d^3k}{2E(k)}\sum_{\lambda=1}^{2}|\tilde J(\vec k, \lambda)|^2\Big) \tag{19.19}$$

Recalling our original definition of S in Section 9.1 (9.47), we see that the photon
state produced by our classical current, given a photon vacuum in the asymptotic
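past, takes the explicit form
$$S|0\rangle_{in} = C\exp\Big(i\int\frac{d^3k}{\sqrt{2E(k)}}\sum_{\lambda=1}^{2}\tilde J(\vec k, \lambda)\,a^\dagger_{in}(\vec k, \lambda)\Big)|0\rangle_{in} \tag{19.20}$$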

as the exponential in (19.18) containing purely destruction operators simply reduces
to unity when acting on the in-vacuum. This result shows that the effect of a classical
current coupled to the quantized photon field is simply to generate a coherent state of
the photon field $\vec A$ (cf. Section 8.3, especially (8.51)): a linear superposition of multi-
photon states with fixed phase and amplitude relations between the components of
the state with different numbers of photons (determined by the specific charge current
inducing the radiation).
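
One consequence of the coherent-state form (19.20) is that the total number of radiated photons follows a Poisson distribution, with the mean fixed by the exponent of the normalization factor C in (19.19). The sketch below tabulates these weights. The symbol `n_bar`, standing for the integral of the summed squared current moduli divided by 2E(k), is a name introduced here for illustration, and the sample values are assumptions rather than results for any particular current.

```python
import math


def photon_number_probabilities(n_bar, n_max):
    """Poisson weights P(n) = exp(-n_bar) * n_bar**n / n! for n = 0..n_max."""
    probs = []
    p = math.exp(-n_bar)  # P(0) = C**2, the probability of emitting no photon
    for n in range(n_max + 1):
        probs.append(p)
        p *= n_bar / (n + 1)  # recurrence P(n+1) = P(n) * n_bar / (n+1)
    return probs


# Illustrative mean photon numbers (assumed values).
for n_bar in (0.5, 2.0, 10.0):
    probs = photon_number_probabilities(n_bar, 80)
    print(f"n_bar={n_bar}: P(0)={probs[0]:.3e}, sum={sum(probs):.12f}")
```

As `n_bar` grows, the zero-photon weight exp(−n_bar) decreases exponentially, which is the single-number version of the vanishing prefactor discussed next.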
The results (19.16, 19.20) represent the complete solution of the problem of
quantized electromagnetic radiation from a classical source current. The formula for
the S-matrix (19.16) looks plausible and, at first sight, completely unproblematic: we
have an apparently unitary S-operator given in a simple and explicit form in terms of
the (Fourier-transformed) classical current. In fact, as we shall now see, for any process
involving radiation from an accelerated classical charged particle, this operator is
actually zero! The problem arises from the apparently innocent normalization prefactor
C given by (19.19), and in particular from the behavior of the integral in the infrared
regime of small k, where the behavior E(k) = |k| for massless photons will, as we shall
see shortly, lead to an infrared divergence of the integral at low momentum.
The problem is easily exposed if we consider the simplest situation in which
a (classical) massive (mass m) charged (electric charge e) particle generates the
c-number current source Jμ (x). We shall assume that the particle enters and leaves a
bounded spacetime region where it undergoes a temporary acceleration, for simplicity
as a consequence of some non-electromagnetic interaction. Outside this region the
particle moves freely, approaching the interaction zone with four-momentum $p$ and
leaving it with four-momentum $p'$ (see Fig. 19.1). The spacetime trajectory of the
particle, given as a function of the particle’s proper time τ, therefore satisfies
$$\begin{aligned}x^\mu(\tau) &= \frac{p^\mu}{m}\,\tau, \quad \tau < \tau_-\\ x^\mu(\tau) &= \frac{p'^\mu}{m}\,\tau, \quad \tau > \tau_+\end{aligned} \tag{19.21}$$

where we shall also assume the trajectory to be smooth (infinitely τ-differentiable) in
the interaction zone. The four-current $J^\mu(x)$ is just the charge density $e\delta^4(x - x(\tau))$
times the four-velocity $\frac{dx^\mu}{d\tau}$, integrated over the proper time of the trajectory,
$$J^\mu(x) = e\int d\tau\,\frac{dx^\mu}{d\tau}\,\delta^4(x - x(\tau)) \tag{19.22}$$


Fig. 19.1 Spacetime trajectory of a classical charged particle undergoing a localized interaction. (Labels: incoming branch pτ/m before τ−, outgoing branch p′τ/m after τ+.)

with Fourier transform
$$\tilde J^\mu(k) = \frac{e}{(2\pi)^{3/2}}\int_{-\infty}^{+\infty}d\tau\,\frac{dx^\mu}{d\tau}\,e^{-ik\cdot x(\tau)} \tag{19.23}$$
where the unexpected power of 2π arises from our unconventional normalization of
the Fourier transform in (19.5).
The low-momentum behavior of $\tilde J^\mu(k)$ will turn out to be singular, with the leading
term determined by the asymptotic parts of the particle trajectory indicated in (19.21),
as the interaction zone integral from $\tau_-$ to $\tau_+$ is clearly perfectly finite as k → 0. The
contributions of the asymptotic portions to the integral are
$$\int_{-\infty}^{\tau_-}d\tau\,\frac{p^\mu}{m}\,e^{-ik\cdot p\tau/m} = \frac{p^\mu}{-ik\cdot p}\,e^{-ik\cdot p\tau_-/m} \to i\,\frac{p^\mu}{k\cdot p}, \quad \text{small } k \tag{19.24}$$
$$\int_{\tau_+}^{+\infty}d\tau\,\frac{p'^\mu}{m}\,e^{-ik\cdot p'\tau/m} = \frac{p'^\mu}{ik\cdot p'}\,e^{-ik\cdot p'\tau_+/m} \to -i\,\frac{p'^\mu}{k\cdot p'}, \quad \text{small } k \tag{19.25}$$

We conclude that the momentum-space current density in this situation must take
the form
$$\tilde J^\mu(k) = \frac{ie}{(2\pi)^{3/2}}\Big(\frac{p^\mu}{k\cdot p} - \frac{p'^\mu}{k\cdot p'}\Big)M(k) \tag{19.26}$$

where the residual amplitude M(k) satisfies (a) M(k) → 1, k → 0, and (b) M(k)
vanishes faster than any power of k for large k (as a consequence of the smoothness
of the trajectory in the interaction zone). Note that the current conservation property
$\partial_\mu J^\mu = 0 \Rightarrow k_\mu\tilde J^\mu(k) = 0$ is automatically satisfied by (19.26). The full momentum-
space current in (19.26) can be decomposed into a longitudinal and transverse part

$$\tilde J^\mu(k) = k^\mu\tilde J_l(k) + \tilde J^\mu_{tr}(k) = k^\mu\tilde J_l(k) + \sum_{\lambda=1}^{2}\epsilon^\mu(\vec k, \lambda)\,\tilde J(\vec k, \lambda) \tag{19.27}$$

where the Coulomb gauge polarization vectors satisfy $\epsilon^0(\vec k, \lambda) = 0$, $\vec k\cdot\vec\epsilon(\vec k, \lambda) = 0$, and
therefore $k_\mu\epsilon^\mu(\vec k, \lambda) = 0$. The functions $\tilde J(\vec k, \lambda)$ are just (up to uninteresting overall
constants) the objects introduced earlier in (19.5). The current conservation property
$k_\mu\tilde J^\mu(k) = 0$ then follows directly as a consequence of the photon mass-shell condition
$k^2 = 0$. We also have (again employing $k^2 = 0$, together with (19.26))


$$\tilde J^\mu(k)\tilde J^*_\mu(k) = -\sum_{\lambda=1}^{2}|\tilde J(\vec k, \lambda)|^2 = \frac{e^2}{(2\pi)^3}\Big(\frac{p}{k\cdot p} - \frac{p'}{k\cdot p'}\Big)^2|M(k)|^2 \tag{19.28}$$
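
The two kinematic facts used here, that the bracket in (19.26) is orthogonal to the photon momentum and that its four-vector square is negative, are easy to confirm numerically. The sketch below assumes the metric signature (+, −, −, −) and picks arbitrary on-shell momenta for illustration; none of the numerical values are taken from the text, and the function names are hypothetical.

```python
import math


def minkowski_dot(a, b):
    """Four-vector product with signature (+, -, -, -) (an assumed convention)."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]


def on_shell(mass, px, py, pz):
    """Four-momentum (E, px, py, pz) with E = sqrt(|p|^2 + mass^2)."""
    return (math.sqrt(mass ** 2 + px ** 2 + py ** 2 + pz ** 2), px, py, pz)


def eikonal_factor(k, p, p_prime):
    """The bracket p/(k.p) - p'/(k.p') appearing in (19.26)."""
    kp = minkowski_dot(k, p)
    kpp = minkowski_dot(k, p_prime)
    return tuple(p[i] / kp - p_prime[i] / kpp for i in range(4))


# Arbitrary illustrative kinematics (assumed values).
p = on_shell(1.0, 0.0, 0.0, 2.0)          # incoming charged particle
p_prime = on_shell(1.0, 1.0, 0.5, -0.3)   # outgoing charged particle
k = (1.0, 0.6, 0.0, 0.8)                  # light-like photon momentum

v = eikonal_factor(k, p, p_prime)
print("k.v =", minkowski_dot(k, v))   # current conservation: vanishes
print("v.v =", minkowski_dot(v, v))   # negative: only the space-like part survives
```

The first product vanishes identically (it is 1 − 1), while the second is negative whenever the incoming and outgoing momenta differ.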

Note that the four-vector square here is negative because only the space-like transverse
part survives. Returning to our expression for the S-operator, (19.18), we see that the
exponent in the prefactor C displayed in (19.19) is given, for massless photons with
E(k) = |k|, and with the current satisfying (19.28), by an infrared (logarithmically)
divergent integral, resulting in the vanishing of C, and thence, the S-operator itself
(remember the implicit minus sign in the four-vector square!),

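\[ C = \exp\left( \frac{e^2}{2(2\pi)^3} \int \frac{d^3k}{2E(k)}\, \frac{1}{|\vec{k}|^2} \left( \frac{p}{E(p) - \hat{k}\cdot\vec{p}} - \frac{p'}{E(p') - \hat{k}\cdot\vec{p}\,'} \right)^2 |M(k)|^2 \right) \sim \exp(-\infty) = 0 \tag{19.29} \]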

where for the photon $E(k) = |\vec{k}|$, while for the massive charged particle $E(p) = \sqrt{\vec{p}^{\,2} + m^2}$, etc. Note that the integral is cut off at the upper end by the rapid falloff of
the residual amplitude M(k): the problem is entirely at the infrared (low-momentum)
end. Indeed, if we regulate the infrared divergence by temporarily inserting a photon
mass, so that $E(k) = \sqrt{\vec{k}^2 + m_\gamma^2}$, and cut the integral off at some momentum Λ
(below which the amplitude M(k) may be approximated by unity), the integral in
the exponent in (19.29) becomes

\[ \left\langle \left( \frac{p}{E(p) - \hat{n}\cdot\vec{p}} - \frac{p'}{E(p') - \hat{n}\cdot\vec{p}\,'} \right)^2 \right\rangle_{\hat{n}} \int^{|\vec{k}|=\Lambda} \frac{d^3k}{2|\vec{k}|^2 \sqrt{\vec{k}^2 + m_\gamma^2}} \tag{19.30} \]

where the quantity in angle brackets corresponds to an angular average over directions
of the unit vector n̂. The integral on the right is logarithmically divergent in the limit
of vanishing photon mass:

\[ \int^{|\vec{k}|=\Lambda} \frac{d^3k}{2|\vec{k}|^2 \sqrt{\vec{k}^2 + m_\gamma^2}} = 2\pi \int_0^{\Lambda} \frac{dk}{\sqrt{k^2 + m_\gamma^2}} \sim 2\pi \log\frac{\Lambda}{m_\gamma}, \quad m_\gamma \to 0 \tag{19.31} \]
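The logarithmic growth in (19.31) can be checked numerically. The following is a minimal sketch, not part of the original text: the function names are hypothetical, units are chosen so that Λ = 1, and the radial integral is compared with its closed form asinh(Λ/mγ), which differs from log(Λ/mγ) only by a constant (log 2) as mγ → 0.

```python
import math


def radial_integral_numeric(cutoff, photon_mass, steps=200_000):
    """Midpoint-rule estimate of the integral of dk / sqrt(k^2 + m^2) over 0 <= k <= cutoff."""
    width = cutoff / steps
    total = 0.0
    for i in range(steps):
        k = (i + 0.5) * width
        total += 1.0 / math.sqrt(k * k + photon_mass * photon_mass)
    return total * width


def radial_integral_exact(cutoff, photon_mass):
    """Closed form of the same radial integral: asinh(cutoff / photon_mass)."""
    return math.asinh(cutoff / photon_mass)


if __name__ == "__main__":
    cutoff = 1.0  # assumed units in which the cutoff Lambda equals 1
    for photon_mass in (1e-2, 1e-4, 1e-6):
        exact = radial_integral_exact(cutoff, photon_mass)
        leading_log = math.log(cutoff / photon_mass)
        # The difference tends to log(2): the divergence itself is purely logarithmic.
        print(photon_mass, exact, leading_log, exact - leading_log)
```

Each reduction of the photon mass by a fixed factor adds the same fixed amount to the integral, which is the signature of a logarithmic rather than a power-law divergence.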

Note that while the infrared divergence manifests itself as a zero in the S-operator once
we sum to all powers of the electric charge e, normal perturbation theory corresponds
to an expansion in powers of e, in which case the divergence appears as logarithms
of the photon mass—one for each power of the fine structure constant α ∝ e2 —and
therefore an independent logarithmic divergence at each order of perturbation theory
in any specific S-matrix element involving a definite number of incoming and outgoing
photons. The only way to avoid this “infrared catastrophe” is, as we see clearly in
(19.30), to take p = p′, and prevent our charged particle from receiving any transfer
of momentum, however small!
The physical interpretation of these results is actually extremely simple. Unlike the
situation for massless pions in a chirally symmetric theory (cf. Section 16.6), where
emission of low-momentum pions is suppressed by powers of the low momentum, there
is no penalty in quantum electrodynamics to the emission of low-momentum photons.
Moreover, the emission of a massless photon with arbitrarily low momentum incurs an
arbitrarily low-energy cost, and therefore we should hardly be surprised if the slightest
momentum shift of a charged particle induces the emission of a very large number of
extremely soft photons.
The result of this proliferation of emitted photons, as we have just seen, at least
for the simplified case of a classical charged particle acting as the source, is that the
coherent photon state thereby produced contains so many multi-photon states with
arbitrarily many soft photons that the exclusive amplitude for our charged particle to
emit any definite finite number of photons simply vanishes. In particular, the vacuum-persistence
amplitude ${}_{\rm out}\langle 0|0\rangle_{\rm in} = C$ itself vanishes. The situation is somewhat analogous
to that discussed previously in the context of Haag’s theorem (Section 10.5), in
that we have unitarily inequivalent spaces, but in this case, not between the physical
in- and out-asymptotic spaces and the computationally convenient (for perturbation
theory) but physically dispensable interaction-picture space, but between the in- and
out-spaces themselves, which by asymptotic completeness we have come to regard as
identical to each other and to the basic physical Hilbert space of the theory in any
“sensible” field theory.
The situation we are here encountering clearly suggests at the conceptual level a
more serious disease than any we have previously uncovered in our studies of local field
theory, and in fact it must be admitted that after much intense investigation there does
not appear to be any way to resuscitate the concept of a normal separable Fock space
with unitarily equivalent asymptotic spaces in an unbroken abelian gauge theory like
QED, with massless photons coupled to charged particles. Nevertheless, we hasten to
assure the depressed reader that a cure is at hand, even though it requires the abandon-
ment of scattering amplitudes like the S-matrix as the fundamental phenomenological
object of the theory, and the return to a direct evaluation of only carefully defined
and directly measurable quantities. This “Bloch–Nordsieck resolution” of the infrared
catastrophe of QED was already proposed in the 1930s, before the advent of modern
covariant quantum electrodynamics a decade later, and will be the subject of the
following section.
There may be some concern that the results we have obtained are subject to the
restriction that while our photons are described in a fully quantum mechanical context,
the charged source is classical and that perhaps this “hybrid” treatment is in some way
introducing inconsistencies into the theory, resulting in the evaporation of our beloved
S-matrix. In fact, essentially identical results are obtained in the fully quantized version
of the theory. We shall briefly explain how this works, referring the reader to the
beautiful article of Weinberg (Weinberg, 1965) for the combinatoric details needed to
obtain the final result. Consider a process in QED in which a single incoming electron
scatters off an arbitrary set of other particles, which for simplicity we take to be
themselves uncharged (an example might be Compton scattering, where the electron
scatters off a single hard photon). The basic process is indicated in Fig. 19.2, where
we consider the effect of emission of a single soft photon, carrying four-momentum
k (much smaller than all other momentum scales in the process), from either the
incoming (Fig. 19.2(a)) or outgoing (Fig. 19.2(b)) electron. In the limit k → 0, both
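of these amplitudes are found to diverge linearly. In the former case, for example, the
amplitude is proportional to

\[ \begin{aligned} \bar{u}(p',\sigma')\, M(k)\, \frac{\not{p} - \not{k} + m}{(p-k)^2 - m^2}\, ie\gamma^\mu\, u(p,\sigma) &= ie\, \bar{u}(p',\sigma')\, M(k)\, \frac{\gamma^\mu(-\not{p} + m) + 2p^\mu - \not{k}\gamma^\mu}{-2p\cdot k + k^2}\, u(p,\sigma) \\ &\sim \left( -ie\, \frac{p^\mu}{p\cdot k} \right) \bar{u}(p',\sigma')\, M(k)\, u(p,\sigma), \quad k \to 0 \end{aligned} \tag{19.32} \]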
Here the amplitude M(k) simply represents the “core” of the diagram, where the
relevant momentum scales are much higher than k, so that for small k we may simply
assume that it approaches some (for our purposes) uninteresting constant. The vector
index μ of the emitted photon (associated with the vertex factor $ie\gamma^\mu$) is, of course, to
be contracted with a polarization vector $\epsilon_\mu$ in the event that the photon is a real one
appearing in the final state, or with the corresponding vector index of an absorbed
photon if it ends up being a virtual photon. The initial on-mass-shell spinor satisfies
the Dirac equation $(\not{p} - m)u(p,\sigma) = 0$, and we have employed the Dirac algebra $\not{p}\gamma^\mu = 2p^\mu - \gamma^\mu\not{p}$,
and neglected higher powers of k, in arriving at (19.32). The corresponding
emission from an outgoing line in Fig. 19.2(b) produces similarly, in the small k limit,
a factorized amplitude with the dependence on the soft photon momentum isolated in
a linearly divergent prefactor,

\[ \bar{u}(p',\sigma')\, ie\gamma^\mu\, \frac{\not{p}' + \not{k} + m}{(p'+k)^2 - m^2}\, M(k)\, u(p,\sigma) \sim \left( +ie\, \frac{p'^\mu}{p'\cdot k} \right) \bar{u}(p',\sigma')\, M(k)\, u(p,\sigma) \tag{19.33} \]

[Fig. 19.2 Soft photon emission from a charged fermion: (a) emission from the incoming line; (b) emission from the outgoing line.]
Combining these results, we see that the emission of a single soft photon from
the charged electron traversing the process produces a divergent prefactor which is
identical (up to a sign) to our Fourier-transformed classical current (19.26):

\[ M_{\text{emit one photon}}(k) \sim -ie \left( \frac{p^\mu}{k\cdot p} - \frac{p'^\mu}{k\cdot p'} \right) \bar{u}(p',\sigma')\, M(k)\, u(p,\sigma) \tag{19.34} \]
The absorption of a photon on either the incoming or outgoing electron line leads
to an exactly similar result, with a change of sign (as in this case we have k → −k).
A single virtual photon exchange requires both of these factors, together with the
virtual photon propagator $-ig_{\mu\nu}/(k^2 + i\epsilon)$, and an integration over the photon four-momentum.
Putting in the appropriate coupling and combinatoric factors, one finds
that the low-momentum contribution arising from single virtual photon exchange
results in the modification of the uncorrected core amplitude M by a multiplicative
factor

\[ J = i\, \frac{e^2}{2} \int \frac{d^4k}{(2\pi)^4}\, \frac{1}{k^2 + i\epsilon} \left( \frac{p}{k\cdot p} - \frac{p'}{k\cdot p'} \right)^2 \tag{19.35} \]

while multiple virtual photon exchanges simply exponentiate this result leading to an
overall amplitude in which soft photon effects appear as an exponential prefactor,

\[ M_{\text{virtual photon exchanges}} \sim \exp(J)\, M(e = 0) \tag{19.36} \]

The real suppression factor (19.29) previously obtained in our semiclassical analysis
corresponds to the absolute magnitude $|\exp(J)| = \exp(\mathrm{Re}(J))$.2 One finds that
the real part arises effectively from the mass-shell δ-function in the virtual photon
propagator $1/(k^2 + i\epsilon) = P(1/k^2) - i\pi\delta(k^2)$, leaving us with the three-dimensional
$d^3k$ integral visible in (19.29) (recall that in the classical problem, M(k) = 1 in the
infrared region where the factorization of the soft-photon effects is valid). The reader
is referred to the previously mentioned article of Weinberg’s for a full discussion of
the combinatorics of the soft photon effects, and a careful analysis of the integral
appearing in (19.35).
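As a hedged sketch of how the three-dimensional integral emerges (assuming, as stated above, that only the δ-function term of the propagator contributes to the real part, and leaving the careful treatment to Weinberg's article), one may insert $-i\pi\delta(k^2)$ into (19.35) and perform the $k^0$ integration. The squared bracket is even under $k \to -k$, so after the $d^3k$ integration the two roots $k^0 = \pm|\vec{k}|$ contribute equally:

\[ \mathrm{Re}(J) \simeq i\, \frac{e^2}{2} \int \frac{d^4k}{(2\pi)^4}\, \big( -i\pi\delta(k^2) \big) \left( \frac{p}{k\cdot p} - \frac{p'}{k\cdot p'} \right)^2 = \frac{\pi e^2}{2(2\pi)^4} \int \frac{d^3k}{|\vec{k}|} \left( \frac{p}{k\cdot p} - \frac{p'}{k\cdot p'} \right)^2 \Bigg|_{k^0 = |\vec{k}|} = \frac{e^2}{2(2\pi)^3} \int \frac{d^3k}{2|\vec{k}|} \left( \frac{p}{k\cdot p} - \frac{p'}{k\cdot p'} \right)^2 \Bigg|_{k^0 = |\vec{k}|} \]

With $k\cdot p = |\vec{k}|\,(E(p) - \hat{k}\cdot\vec{p})$ on the photon mass shell, this reproduces the exponent of (19.29) for M(k) = 1 and $E(k) = |\vec{k}|$.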
The annoying “evaporation” of S-matrix amplitudes as a consequence of the expo-
nentiation of infrared divergences appearing in S-matrix amplitudes at finite orders of
perturbation theory is a symptom of deep structural problems in the formulation of
the physical state space of a theory like QED with massless gauge particles appearing
in the asymptotic states of the theory. In fact, the phenomenon is even present in
some low-dimensional non-gauge theories—the classic example being the model of a

2 There is also a divergent phase factor, arising from the imaginary part of J. This is associated with
divergences familiar from careful treatments of Coulomb scattering in non-relativistic quantum mechanics:
see (Weinberg, 1965), Section V.
massless boson in two spacetime dimensions derivatively coupled to a massive Dirac
fermion, initially studied in this context by Schroer (Schroer, 1963). The Lagrangian is
just that studied earlier in Chapter 12 (cf. (12.52)) in connection with the appearance
of seagulls and Schwinger terms in a derivatively coupled theory, but in two dimensions
and with the boson mass m = 0 (also, the γ5 corresponding to a pseudoscalar boson
is unimportant and can be dropped). The derivative of the massless boson field, ∂μ φ,
acts as a stand-in for the photon field in four dimensions, and one easily sees (see
Problem 1) that virtual photon exchanges in processes analogous to Fig. 19.2 give rise
to logarithmic infrared divergences arising from loop integrals behaving for small k
like $\int d^2k/k^2$. In this model, Schroer was able to show that the fermion propagator in
momentum space has a branch point rather than a simple pole in the on-mass-shell
limit, $\tilde{S}_F(p) \sim (p^2 - M^2)^{-1+g^2/2\pi}$, where g is the coupling constant in the derivative
coupling term (see (12.52)). The softening of the pole structure means that any literal
application of the LSZ formula will clearly result in a vanishing of S-matrix elements
involving an on-mass-shell Dirac fermion in either the initial or final state.
But this embarrassment is merely the most obvious manifestation of much deeper
problems in the Fock-space formulation of field theory on which we have relied
throughout this book. Indeed, the very general arguments of Section 9.5 (leading to
the Källén–Lehmann representation) show that the existence of a simple pole in the
two-point function of a massive field theory follows from very general principles (single
particle representation of the Poincaré group, unitarity, etc.) which we would certainly
be loath to abandon. Massive particles suffering the indignity of loss of their single
particle pole in Green functions of the associated field have been dubbed infraparticles
by Schroer, and it has been demonstrated rigorously (Buchholz, 1986) that the same
situation obtains in any field theory with an exact local Gauss’s Law, in particular, in
four-dimensional quantum electrodynamics. In the latter case, the situation is made
even messier by the gauge-variance of the electron propagator: in different gauges, the
precise form of the “smearing out” of the simple pole singularity of the free electron
field is dependent on the gauge used to quantize the theory.
A careful analysis of the mathematical structure of the state space in quantum
electrodynamics leads to some dramatic—indeed, disturbing—conclusions,3 to wit:
1. As a consequence of the above-cited Buchholz theorem (following from the
existence of a local Gauss’s Law), the mass-squared operator Pμ P μ lacks a
discrete eigenvalue, with a corresponding normalizable one-electron state, at the
value corresponding to the squared electron mass.
2. Single-particle electron states with differing momentum fall into unitarily inequiv-
alent spaces: in particular, matrix elements of all local operators of the theory
(and not just of the S-operator, as discussed above) vanish between such states.
This unitary inequivalence is far more consequential than the example we have
already encountered with Haag’s theorem, where the unitarily inequivalent spaces
concern the absence of well-defined unitary operators connecting the asymptotic
physical in- (or out-)spaces of the theory to the computationally useful (in pertur-
bation theory) but physically unmeaningful interaction-picture states. Here the

3 For a recent review of the situation, see the discussion of Haag in (Haag, 1992), Sections VI.2, VI.3.
inequivalence is such that we are unable to even construct well-defined normalizable
single-electron wave-packets, as the electron states of different momentum
lie in inequivalent sectors. Again, the unbroken abelian symmetry giving rise
to an exact Gauss’s Law is the underlying culprit: each charged particle state is
associated with an asymptotic Coulomb flux,4 with the form of the flux depending
on the velocity of the charged particle. The local fields of the theory can be
shown to leave the asymptotic flux unchanged, resulting in a non-denumerable
infinity of inequivalent sectors of the full Hilbert space (which thereby becomes
non-separable) characterized by different values of the charged particle velocity
(and therefore, asymptotic flux). The presence of a non-denumerable infinity of
charged particle sectors (distinguished by a superselection rule associated with
the charged particle momentum) can also be regarded as a consequence of a
spontaneous symmetry-breaking of the Lorentz boost symmetry (Fröhlich et al., 1979).
3. The usual Haag–Ruelle scattering theory (cf. Section 9.3) which depends heavily
on the existence of a mass gap, and normalizable single-particle states, becomes
inapplicable for infraparticle amplitudes. Despite many efforts to reinstate the
concept of a well-defined S-matrix, effectively by considering charged particle
states “dressed” with a coherent cloud of infinitely many photons, a physically
transparent and computationally practical scheme for dealing with transition
amplitudes of charged states in QED remains an unrealized goal. The construc-
tion of suitable asymptotic Maxwell fields, on the other hand, is not particularly
problematic (Buchholz, 1982): indeed, one finds that there are no infrared
divergences in the Feynman amplitudes associated with processes (such as light-
by-light scattering) in which only electrically neutral photons, and no electrons,
appear in the initial and final states. The S-matrix in the charge-free sector is
therefore (subject to the usual need for ultraviolet renormalization) perfectly well
defined.

The conceptual difficulties which infect the construction of a rigorous framework
for charged-particle scattering in quantum electrodynamics might seem to fly directly
in the face of the stunningly precise phenomenological successes of this theory. Fortunately,
we shall see that a careful analysis of the conditions under which actual measurements
can be performed in quantum electrodynamic processes provides a secure
pathway out of the quagmire. Just as the difficulties imposed by Haag’s theorem can be
evaded by avoidance of an ill-defined formal intermediary (the interaction picture for
the unregulated theory), either by fully regulating the theory or by non-perturbative
methods, the disconcerting evaporation of a well-defined S-matrix can also be circum-
vented once we formulate properly the inescapably inclusive transition probabilities
which correspond to the actually measured quantities in any feasible experiment.
Admittedly, the abandonment of the S-matrix as the central phenomenological object
incorporating the sum total of available information concerning the scattering physics

4 In our classical current model this flux has its origin in the longitudinal part of the current density
(19.27). In a fully quantized theory the asymptotic flux can be shown to commute with all local operators
and therefore to be a c-number: see (Buchholz, 1982).
of the theory seems at first a radical step. Nevertheless, the elimination of the infrared
catastrophe provided by this “Bloch–Nordsieck” resolution, to which we now turn,
provides the basis for the unambiguous calculation of the quantum electrodynamic
component of essentially all high-energy processes, the vast majority of which involve
charged particles in the initial or final state, in modern particle physics.

19.2 The Bloch–Nordsieck resolution


The physical origin of the infrared problems of a theory with exactly massless photons
was clarified already in 1937, in the seminal paper of Bloch and Nordsieck (Bloch
and Nordsieck, 1937), even prior to the development of the fully covariant pertur-
bative formalism for quantum electrodynamics in the late 1940s. Massless photons
of arbitrarily low momentum (and therefore energy) couple with equal strength to a
charged source, unlike the situation with derivatively coupled massless pions in a chiral
effective Lagrangian (cf. Section 16.6), where the emission or absorption amplitude for
a pion vanishes linearly with the momentum. The result, as we have discussed in the
previous section, is that the slightest acceleration of a charged source results in the
radiation of an infinite number of very soft (low-energy) photons. This “cloud” of soft
photons is the inevitable concomitant of any process in which an electrically charged
particle undergoes any change of momentum, however small. On the other hand, actual
measurements of quantum electrodynamic processes employ photodetectors with a
finite resolution, which cannot register soft photons below a certain minimum energy
Δ. Thus, measurements involving charged particles are inevitably inclusive: transition
probabilities should be computed summing over all final states compatible with the
trigger limitations of the apparatus used.
A simple example will suffice to illustrate the general situation. Let us imagine
that we are interested in the probability of emission of a single “hard” photon, of
momentum q and polarization λ, with |q| > Δ, from a charged particle undergoing
a scattering (and therefore a change of momentum). We shall again resort to the
analytically solvable semiclassical model of the preceding section, so that our charged
particle is represented by a classical c-number current density, with transverse part
as given in (19.5). The hard photon is registered by our photodetector irrespective
of the presence of an arbitrary number n of additional soft photons with momenta
$\vec{k}_1, \vec{k}_2, \ldots, \vec{k}_n$ with $|\vec{k}_i| < \Delta$ which are emitted by the particle but remain unregistered
by the detector. The total probability of a single trigger of the detector by the indicated
hard photon is therefore, allowing for the emission of precisely n soft photons, so to
speak, “flying below the radar” of our photodetector,

\[ P_n(\vec{q},\lambda;\Delta) = \frac{1}{n!} \int_{|\vec{k}_i|<\Delta} d^3k_1 \cdots d^3k_n \sum_{\lambda_i} \left| {}_{\rm out}\langle \vec{q}\lambda, \vec{k}_1\lambda_1, \ldots, \vec{k}_n\lambda_n | 0 \rangle_{\rm in} \right|^2 \tag{19.37} \]

where the 1/n! factor takes into account multiple counting of identical soft photon
states (by Bose symmetry). We may convert the out-state appearing in the matrix
element to an in-state by introducing the scattering operator S, as in (9.47), where in
our case S is given explicitly by (19.18):

\[ P_n(\vec{q},\lambda;\Delta) = |C|^2\, \frac{1}{n!} \int_{|\vec{k}_i|<\Delta} d^3k_1 \cdots d^3k_n \sum_{\lambda_i} \left| {}_{\rm in}\langle \vec{q}\lambda, \vec{k}_1\lambda_1, \ldots, \vec{k}_n\lambda_n |\, \frac{i^{n+1}}{(n+1)!} \Big\{ \int \frac{d^3k}{\sqrt{2E(k)}} \sum_{\lambda} \tilde{J}(\vec{k},\lambda)\, a^\dagger_{\rm in}(\vec{k},\lambda) \Big\}^{n+1} | 0 \rangle_{\rm in} \right|^2 \tag{19.38} \]

The destruction operators in the exponential on the right in (19.18) act on the in-vacuum,
and the exponential therefore reduces to unity, while the left exponential
can be expanded as shown in (19.38), with only the term involving n + 1 creation
operators surviving. C is the vacuum persistence amplitude (amplitude for emission
of no photons) given in (19.19). Any one of the n + 1 creation operators can be used
to the left to remove the single distinguished hard photon, giving an overall factor
of $(n+1)\tilde{J}(\vec{q},\lambda)/\sqrt{2E(q)}$ in the matrix element. Taking this outside the soft photon
integrals, we find

\[ P_n(\vec{q},\lambda;\Delta) = |C|^2\, \frac{|\tilde{J}(\vec{q},\lambda)|^2}{2E(q)}\, \frac{1}{n!} \int_{|\vec{k}_i|<\Delta} d^3k_1 \cdots d^3k_n \sum_{\lambda_i} \left| {}_{\rm in}\langle \vec{k}_1\lambda_1, \ldots, \vec{k}_n\lambda_n |\, \frac{(A_J^\dagger(\Delta))^n}{n!}\, | 0 \rangle_{\rm in} \right|^2 \]

\[ A_J(\Delta) \equiv \int_{|\vec{k}|<\Delta} \frac{d^3k}{\sqrt{2E(k)}} \sum_{\lambda} \tilde{J}^*(\vec{k},\lambda)\, a_{\rm in}(\vec{k},\lambda) \tag{19.39} \]

where we are allowed to restrict the integral over k in the n-particle creation operator
to the soft-momentum regime $|\vec{k}| < \Delta$, as the only photons present in the final state
are now the soft ones. The restriction to soft momenta in the momentum integrals
$d^3k_1 \cdots d^3k_n$ can now be relaxed, as the matrix element for any particle with momentum
$|\vec{k}_i| > \Delta$ vanishes given the restriction to soft momenta in the creation operator $A_J^\dagger(\Delta)$.
Moreover, the sum over all n-particle states implied by these momentum integrals
can be expanded to a complete set (by including states with m ≠ n photons) as the
additional states manifestly have vanishing matrix elements to the vacuum of the
indicated n-particle creation operator. Recalling the completeness relation (5.22) for
a multi-particle bosonic Fock space, the sum over n-particle states can be augmented
to a complete set of in-states:

\[ P_n(\vec{q},\lambda;\Delta) = |C|^2\, \frac{|\tilde{J}(\vec{q},\lambda)|^2}{2E(q)} \sum_{\alpha} \left| {}_{\rm in}\langle \alpha |\, \frac{(A_J^\dagger(\Delta))^n}{n!}\, | 0 \rangle_{\rm in} \right|^2 \tag{19.40} \]

\[ = |C|^2\, \frac{|\tilde{J}(\vec{q},\lambda)|^2}{2E(q)}\, \frac{1}{(n!)^2}\, {}_{\rm in}\langle 0 | (A_J(\Delta))^n (A_J^\dagger(\Delta))^n | 0 \rangle_{\rm in} \tag{19.41} \]

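The matrix element in (19.41) is easily evaluated (see Problem 2),

\[ {}_{\rm in}\langle 0 | (A_J(\Delta))^n (A_J^\dagger(\Delta))^n | 0 \rangle_{\rm in} = n! \left( \int_{|\vec{k}|<\Delta} \frac{d^3k}{2E(k)} \sum_{\lambda} |\tilde{J}(\vec{k},\lambda)|^2 \right)^n \tag{19.42} \]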
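One possible route to (19.42), offered as a sketch rather than as the intended solution of Problem 2, assumes the canonical commutator $[a_{\rm in}(\vec{k},\lambda), a^\dagger_{\rm in}(\vec{k}',\lambda')] = \delta_{\lambda\lambda'}\,\delta^3(\vec{k}-\vec{k}')$ (the normalization consistent with the measure in (19.39)). The commutator of the smeared operators is then a c-number, here given the hypothetical name $I_\Delta$:

\[ [A_J(\Delta), A_J^\dagger(\Delta)] = \int_{|\vec{k}|<\Delta} \frac{d^3k}{2E(k)} \sum_{\lambda} |\tilde{J}(\vec{k},\lambda)|^2 \equiv I_\Delta \]

Commuting one destruction operator through the n creation operators and using $A_J(\Delta)|0\rangle_{\rm in} = 0$ gives

\[ A_J(\Delta)\, (A_J^\dagger(\Delta))^n |0\rangle_{\rm in} = n\, I_\Delta\, (A_J^\dagger(\Delta))^{n-1} |0\rangle_{\rm in} \quad\Rightarrow\quad {}_{\rm in}\langle 0 | (A_J(\Delta))^n (A_J^\dagger(\Delta))^n | 0 \rangle_{\rm in} = n\, I_\Delta \cdot (n-1)\, I_\Delta \cdots 1 \cdot I_\Delta = n!\, I_\Delta^n \]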

The reader will recall (cf. (19.19)) that the vacuum persistence (i.e., no-photon-emission)
factor C is given by

\[ C = \exp\left( -\frac{1}{2} \int \frac{d^3k}{2E(k)} \sum_{\lambda} |\tilde{J}(\vec{k},\lambda)|^2 \right) \tag{19.43} \]

with the integral in the exponent logarithmically divergent in the infrared (in the
massless photon limit) for currents corresponding to a charged particle undergoing
a change of momentum, thereby resulting in the vanishing of C. For any finite n,
the probability Pn (q, λ; Δ) in (19.41) therefore also vanishes for massless photons,
as the matrix element multiplying |C|2 contains a finite power of this same infrared
divergence, as we see from (19.42). On the other hand, if we calculate, as previously
argued, the inclusive probability allowing for emission of arbitrarily many soft photons,
we find that the divergence in the integral at small k is exactly cancelled, giving a finite
probability for the detection of a single hard photon (momentum q and polarization
λ) by a detector of finite resolution Δ:

\[ P_{\rm tot}(\vec{q},\lambda;\Delta) \equiv \sum_{n=0}^{\infty} P_n(\vec{q},\lambda;\Delta) = \frac{|\tilde{J}(\vec{q},\lambda)|^2}{2E(q)} \exp\left( -\int_{|\vec{k}|>\Delta} \frac{d^3k}{2E(k)} \sum_{\lambda} |\tilde{J}(\vec{k},\lambda)|^2 \right) \tag{19.44} \]

The infrared divergence in the integral in the exponent is now effectively cut off by
the detector resolution Δ: in fact, the exponent turns out to contain logarithms of
the form $\ln(q^2/\Delta^2)$ with currents of the form (19.28) and q = p − p′. In other words,
transition probabilities, and more generally all types of measurable cross-sections, for
quantum electrodynamics processes can be expected to depend in an important, but
fortunately calculable, way on the sensitivity of the measurement apparatus to the
“haze” of low-energy photons which are inevitably present.
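The way each exclusive probability vanishes while their sum stays finite can be illustrated with a toy version of (19.41) to (19.44). This is a sketch under stated assumptions, not the book's calculation: the soft-photon integrand is modelled as strength × dk / sqrt(k² + mγ²) up to a cutoff Λ (with M(k) = 1 below Λ and zero above), `strength` is a hypothetical constant lumping together the charge and the angular average, and all probabilities are quoted relative to the hard-photon prefactor |J̃(q, λ)|²/2E(q).

```python
import math


def soft_integral(upper, photon_mass, strength):
    """Toy model of the exponent integral from 0 to `upper`: strength * asinh(upper / m)."""
    return strength * math.asinh(upper / photon_mass)


def exclusive_probability(n, resolution, cutoff, photon_mass, strength):
    """P_n / prefactor = |C|^2 * I_soft^n / n!, with |C|^2 = exp(-I_all), as in (19.41)-(19.43)."""
    i_all = soft_integral(cutoff, photon_mass, strength)
    i_soft = soft_integral(resolution, photon_mass, strength)
    # Work with logarithms so that large n and large integrals do not overflow.
    log_p = -i_all + n * math.log(i_soft) - math.lgamma(n + 1)
    return math.exp(log_p)


def inclusive_probability(resolution, cutoff, photon_mass, strength, n_max=400):
    """Sum of P_n over the number n of undetected soft photons, as in (19.44)."""
    return sum(
        exclusive_probability(n, resolution, cutoff, photon_mass, strength)
        for n in range(n_max + 1)
    )


if __name__ == "__main__":
    resolution, cutoff, strength = 0.01, 1.0, 0.5  # illustrative values only
    for photon_mass in (1e-6, 1e-12, 1e-24):
        p0 = exclusive_probability(0, resolution, cutoff, photon_mass, strength)
        p_tot = inclusive_probability(resolution, cutoff, photon_mass, strength)
        print(photon_mass, p0, p_tot)
```

In this toy model the inclusive sum approaches (Δ/Λ) raised to the power `strength` as the photon mass is removed, while every individual P_n tends to zero.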
We have been illustrating the essential nature of the infrared problem in quan-
tum electrodynamics with the aid of a semiclassical model in which the source
current is treated classically (but with a quantized Maxwell field), and taking full
advantage of a delicious property—complete analytic solvability—of this model. In
particular, we have not needed to resort to perturbative approximations, as our
results contain the exact emission probabilities to all orders in the particle charge
e (which is hidden in the current $\tilde{J}(\vec{k},\lambda)$; cf. (19.26)). In the fully quantized
version (QED) of quantum electrodynamics, in which the charged particle fields
are also treated quantum mechanically, we must of course resort to perturbation
theory.
Before describing the Bloch–Nordsieck resolution in QED proper, it is useful to
take a look at the cancellation of infrared divergences visible in (19.40–19.44), from
the point of view of a perturbative expansion in the squared charge, or fine-structure
constant $\alpha = e^2/4\pi$, as the cancellations occurring in the fully quantized theory arise in
a completely analogous way. We shall work to order $\alpha^2$, or $e^4$, recalling that $\tilde{J} \sim O(e)$,
and that the leading term in $P_{\rm tot}$ is of order α, as we insist on the emission of a single
hard photon of momentum q. It is clear from (19.39) that the n-soft-photon emission
probability is of order $\alpha^{n+1}$, so to order $\alpha^2$ the total transition probability $P_{\rm tot}$, which
we already know to be infrared finite for finite detector resolution Δ, is given by just
the contributions from n = 0 and n = 1:

\[ P_{\rm tot} = P_0 + P_1 + O(\alpha^3) = \frac{|\tilde{J}(\vec{q},\lambda)|^2}{2E(q)} \Big\{ 1 - \int \frac{d^3k}{2E(k)} \sum_{\lambda} |\tilde{J}(\vec{k},\lambda)|^2 + \int_{|\vec{k}|<\Delta} \frac{d^3k}{2E(k)} \sum_{\lambda} |\tilde{J}(\vec{k},\lambda)|^2 \Big\} + O(\alpha^3) \tag{19.45} \]

The first two terms in the curly braces in (19.45) arise from the expansion (to order
α) of the no-photon-emission probability |C|2 : in particular, the infrared divergent
integral in the second term corresponds to the emission and reabsorption of a single
virtual photon (of arbitrary momentum) on the charged-particle line, accompanying,
of course, the hard photon emission described by the overall prefactor. The third
term, also an infrared divergent integral (cut off on the ultraviolet end by the
detector resolution Δ), corresponds to the total probability for the emission of a single
undetected soft photon. We see that the cancellation in the infrared divergence between
the two integrals appearing in (19.45) amounts to a cancellation in the total probability
Ptot between infrared divergences arising from virtual photon contributions and real
photon emission terms. If we introduce a photon mass mγ to separately regularize
each of the integrals in the infrared, the singular ln (mγ ) dependence in each integral
evidently cancels exactly between virtual and real photon emission terms, at each order
of the perturbative expansion in α, once the finite detector resolution is properly taken
into account.
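The cancellation can be made explicit in a simplified setting. As a sketch (assuming, as in (19.31), that below some scale Λ the integrand reduces to $c\, dk/\sqrt{k^2 + m_\gamma^2}$, where c is a hypothetical positive constant of order α absorbing the charge and the angular average, and that contributions above Λ are negligible), the two integrals in (19.45) behave for small $m_\gamma$ as

\[ \int \frac{d^3k}{2E(k)} \sum_{\lambda} |\tilde{J}(\vec{k},\lambda)|^2 \simeq c\, \ln\frac{\Lambda}{m_\gamma}, \qquad \int_{|\vec{k}|<\Delta} \frac{d^3k}{2E(k)} \sum_{\lambda} |\tilde{J}(\vec{k},\lambda)|^2 \simeq c\, \ln\frac{\Delta}{m_\gamma} \]

so that the curly bracket becomes

\[ 1 - c\, \ln\frac{\Lambda}{m_\gamma} + c\, \ln\frac{\Delta}{m_\gamma} = 1 - c\, \ln\frac{\Lambda}{\Delta} \]

which is independent of the photon mass and agrees with the expansion of the exponential in (19.44) to first order in c.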
Returning now to the fully quantized version of quantum electrodynamics, one finds
that precisely the same mechanism operates to produce well-defined transition prob-
abilities (or cross-sections) once photon detector resolutions are taken into account,
once again by cancellation between virtual and real photon contributions. The detailed
analysis, which we must here omit for considerations of space, can be found in the
seminal paper of Yennie, Frautschi, and Suura (Yennie et al., 1961). However, the
mechanism of the cancellation can be indicated with a simple example. Fig. 19.3
shows the low-order Feynman graphs contributing to the emission of a hard photon
of momentum q (indicated by the spiral line) from a charged particle which has had
its momentum altered5 by an interaction indicated by the dashed line (which for
simplicity we may take to be of non-electromagnetic character). To lowest order in the
particle charge e (order $\alpha \sim e^2$ for the cross-section, or the squared amplitude), only
the single hard photon need be taken into account, as in the left graph in Fig. 19.3(a),
but to the next order we must take into account the possibility of a virtual photon
exchange, as in the right graph in Fig. 19.3(a), or an additional emission of a real soft
photon (momentum $k_1$) below the detector threshold, as in Fig. 19.3(b).
[Fig. 19.3 Contributions to hard photon emission from an electron in QED (through $O(\alpha^2)$).]
Once again one finds that the infrared divergence in the virtual photon diagram
(actually, in the interference term obtained by squaring the amplitude indicated in
Fig. 19.3(a)) cancels exactly with an infrared divergence in the real photon emission
graphs of Fig. 19.3(b). The proof that this cancellation is effective for general
processes, to all orders of perturbation theory, can be found in the aforecited paper of
Yennie et al.
The essential point which we wish to emphasize here is that a careful specification
of the limitations of any measurement process in a situation involving exactly
massless abelian gauge particles automatically leads to well-defined finite transition
probabilities and cross-sections,
even if the intermediate quantities (S-matrix amplitudes) which we normally rely on
in field theory have a singular structure in the zero mass limit. The formal difficulties
(infrared divergences at finite orders of perturbation theory, vanishing of the S-matrix
when the amplitudes are summed to all orders) appear because of mathematically
convenient idealizations in the theoretical formulation which do not correspond to
physical reality: specifically, the propagation of particles in a Minkowski space of
infinite spatial volume, and the existence of detectors of infinitely precise resolution.
For example, in a finite spatial volume the momentum integrals become discretized
sums, with a natural infrared cutoff of order the inverse spatial size of the “box”.
Thus, unlike the situation discussed above, where the average number of photons
emitted (into infinite volume) by an accelerated charge is infinite, the average number
of photons per unit volume in blackbody radiation is perfectly finite, as the reader
may easily confirm by integrating ρ(ν, T )/hν, with the energy density ρ(ν, T ) given
by the Planck formula (1.22), over all photon frequencies ν. Of course, if we consider
an infinite volume box, the total number of photons is again infinite. Inasmuch as the
formulation of a scattering theory typically presupposes the asymptotic propagation of
incoming and outgoing particles through an infinite spatial volume, it is not surprising
that we encounter formal difficulties due to the concomitant appearance of infinitely
many very soft photons of arbitrarily long wavelength and correspondingly low energy.
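The finiteness of the blackbody photon density mentioned above can be confirmed numerically. The following is a minimal sketch, assuming the standard form of the Planck energy density, ρ(ν, T) = (8πhν³/c³)/(exp(hν/kT) − 1), for the formula referred to as (1.22); the function names are hypothetical. With x = hν/kT, the number density ∫ dν ρ/hν becomes 8π (kT/hc)³ ∫ x²/(eˣ − 1) dx, and the dimensionless integral converges to 2ζ(3).

```python
import math

PLANCK_H = 6.62607015e-34       # J s
LIGHT_SPEED = 299792458.0       # m / s
BOLTZMANN_K = 1.380649e-23      # J / K


def bose_moment_integral(upper=60.0, steps=200_000):
    """Midpoint estimate of the integral of x^2 / (exp(x) - 1) over 0 <= x <= upper."""
    width = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width
        total += x * x / math.expm1(x)
    return total * width


def photon_number_density(temperature_kelvin):
    """Photons per cubic metre: integral over frequency of rho(nu, T) / (h nu)."""
    inverse_length = BOLTZMANN_K * temperature_kelvin / (PLANCK_H * LIGHT_SPEED)
    return 8.0 * math.pi * inverse_length ** 3 * bose_moment_integral()


if __name__ == "__main__":
    # The integrand behaves like x near x = 0, so there is no infrared divergence.
    print(bose_moment_integral())
    print(photon_number_density(300.0))
```

The integrand vanishes linearly at low frequency, so the density per unit volume is finite; only the total number in an infinite volume diverges.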

19.3 Unbroken non-abelian gauge theory: confinement


Imagine a fictional physicist, fully informed of the basic relativistic and quantum
theoretical frameworks underlying modern physics, but encountering for the first time
the experimental discoveries of the past century. Two phenomena in particular stand
out in their capacity to provoke astonishment—and even, at first sight, disbelief. The

5 The classical term for this sort of process is Bremsstrahlung—German for “braking radiation”.

first—already well established experimentally by the second decade of the twentieth


century—was the phenomenon of superconductivity. The sudden collapse of electrical
resistivity (to unmeasurably low values) in mercury cooled to 4.2 K must have seemed,
frankly, miraculous at the time. As it turned out, a correct microscopic interpretation
of superconductivity would have to wait until the mid 1950s, with the BCS theory
developed by Bardeen, Cooper, and Schrieffer.
The second “miraculous” phenomenon was the discovery of quark confinement. In
this case we are talking more about a gradual process of deepening understanding
rather than a singular event of discovery. The notion that the observed hadronic
particles and resonances could, at least from a group-theoretical point of view, be
easily interpreted by regarding these objects as composites of fractionally charged
“quarks”, as proposed in the mid 1960s by Gell-Mann and Zweig, seemed at first a
purely technical convenience, but the discovery of point-like substructures in deep-
inelastic scattering from protons soon (by the early 1970s) gave rise to the quark-
parton model, identifying the kinematical objects of Gell-Mann and Zweig with actual
particle constituents of the observed hadrons, now thought of as bound states of the
quarks/partons.
Unfortunately, there was soon overwhelming experimental evidence that the puta-
tive quark constituents of hadrons simply never appeared as isolated (fractionally
charged!) objects in the final states of even the most energetic hadronic collisions. This
in turn gave rise to the hypothesis of “quark confinement”: basically, the appearance
in asymptotic states of hadronic scattering processes of particles with the quantum
numbers of quarks was simply excluded by fiat. Physically, this could be “explained”
by the assumption that the separation of an isolated quark from other hadronic
matter required an infinite (or at least, extremely large) amount of energy. Such a
phenomenon certainly runs counter to the deep-seated intuition inherent in all local
quantum field theories that far separated objects have negligible influence on each
other: exponentially decreasing with distance for theories with only massive particles,
or like an inverse power of the separation for massless theories. As we shall see, the
two “miracles” of twentieth-century physics mentioned here—superconductivity and
quark confinement—in fact share deep similarities in their underlying mechanisms.
From a formal point of view our apparent inability to observe isolated quarks
can be re-expressed simply as the statement that the S-matrix vanishes for in- or
out-states in which particles with quantum numbers of quarks appear. Indeed, we
have just encountered a superficially similar circumstance, in the case of quantum
electrodynamics in four spacetime dimensions, where the infrared divergences asso-
ciated with the copious emission and absorption of low-energy photons sum to give
a vanishing result for exclusive S-matrix amplitudes involving scattering of charged
particles (which, for an abelian gauge theory, means the matter fields, e.g., in the case
of QED, electrons or positrons). In the abelian case, however, we have seen that this
vanishing does not imply the inability to produce asymptotic isolated electron states
(for example), but only the impossibility that such charged particles can be treated
in isolation from the inescapable low-energy “photon cloud” surrounding them.
It was realized very early—immediately following the surge in interest in non-
abelian gauge theories following ’t Hooft’s proof of their renormalizability in 1971 and
their subsequent employment (in the symmetry broken phase) in the development of

the electroweak Standard Model—that the infrared divergences of unbroken (massless)


non-abelian gauge theories are even more ferocious than those encountered in the
abelian case. The reason is simple: in the non-abelian case, the massless gauge vector
particles (“gluons”) are themselves charged under the gauge symmetry group of the
theory, so the emission of soft gluons occurs from gluons themselves, as well as from
the fermionic matter fields (quarks). The self-interaction of gluons leads to a highly
non-trivial interacting theory even if the matter fields are decoupled (say, by making
them infinitely massive), unlike the situation in QED. In fact, the similarity between
the long-distance structure of unbroken abelian and non-abelian gauge theories is only
superficial: the underlying dynamics leads to dramatically different physics in the two
cases, as we shall see.
The adoption in the mid 1970s of an unbroken non-abelian gauge theory as the
most promising candidate for a field-theoretic description of the dynamics underlying
hadronic processes came with the realization that the theory would have to provide a
mechanism not just for eliminating the fermionic quark fields (transforming under the
fundamental representation of the gauge group) from the asymptotic spectrum, but
also the massless gauge fields (gluons, transforming under the adjoint representation),
which were, of course, nowhere in evidence experimentally. So the obvious need
for “quark confinement” was expanded into the more general requirement of “color
confinement”, which excludes all objects with non-vanishing charge under the non-
abelian symmetry (referred to as “color” to avoid confusion with electric charge)
from the asymptotic spectrum. From the point of view of a Wightman formulation of
the theory, the physical states of the theory are obtained by application of gauge-
invariant operators to the vacuum, which are the only local fields which can act
as interpolating fields (i.e., possess non-vanishing vacuum to single particle matrix
elements) for the stable particles appearing in the asymptotic spectrum. On the
other hand, the underlying dynamics of the theory is specified in terms of a simple
Lagrangian (15.112) in which only non-gauge-invariant quark and gluon fields appear.6
The local or almost local fields describing observable stable hadronic particles (for pure
QCD, based on an SU(3) gauge group, with the weak interactions switched off, these
include the nucleon and pion fields) then correspond to composite, gauge-invariant
fields built from the underlying quark and gluon fields.
The phenomenon of confinement of a charge associated with a local gauge sym-
metry is actually not limited to non-abelian theories, as we can see if we examine
field theories in one or two spatial dimensions. The simplest case is provided by the
analog of quantum electrodynamics in two spacetime dimensions, comprising the class
of theories with action given by (in Minkowski space)

$$I = \int d^2x\,\Big\{-\frac{1}{4}F_{\mu\nu}(x)F^{\mu\nu}(x) + \bar\psi(x)(i\not{D} - m)\psi(x)\Big\}, \qquad D_\mu \equiv \partial_\mu + ieA_\mu(x) \tag{19.46}$$

Two special cases in this class are of particular importance for the present discussion,
corresponding to the fermionic field (which, in virtue of the similarities of the model

6 A review of the discussion of the general relation between particles and fields provided in Section 9.6
may be useful at this point.

to four-dimensional QCD, we shall dub the “quark” field) being either (a) extremely
massive, m >> e (note that in two spacetime dimensions the gauge field is dimen-
sionless and the charge coupling constant e has dimensions of mass, as required by a
dimensionless action), or (b) massless m = 0—the famous “Schwinger model”. First,
note that in one spatial dimension, gauge-fixing to axial gauge A1 = 0 (equivalent in this
case to the transverse or Coulomb gauge ∂i Ai = ∂1 A1 = 0) leaves only the auxiliary,
non-dynamical A0 field, responsible for the static Coulomb interaction, with the Green
function (in one space dimension)

$$-\nabla^2 V(x) = \delta(x) \;\Rightarrow\; -\frac{\partial^2}{\partial x_1^2}V(x_1) = \delta(x_1) \;\Rightarrow\; V(x_1) = -\frac{1}{2}|x_1| \tag{19.47}$$

leading to a full Hamiltonian (after elimination of A0 , see Problem 3) consisting of the


usual free “quark” Hamiltonian plus a Coulomb energy contribution (cf. 2.53)7 :

$$H_{\rm coul} = -\frac{1}{4}\int \rho(x_1, t)\,|x_1 - x_1'|\,\rho(x_1', t)\,dx_1\,dx_1' \tag{19.48}$$

with the charge density given by ρ(x) = eψ̄(x)γ0 ψ(x). There are no transverse degrees
of freedom in one space dimension, so real “photons” are absent in this model: the
entire physics induced by the gauge field is incorporated in the Coulomb interaction
(in Coulomb gauge). Even classically, this theory confines charge, as we see that
the Coulomb potential grows linearly with the separation of charges: thus if we set
ρ(x1) = +Qδ(x1) − Qδ(x1 − L), the static Coulomb energy of the oppositely charged
pair is ½Q²L, so an infinite amount of energy would be required to completely isolate
either charge from the other. The result is easily understood from Gauss’s Law: the
electric flux leaves the +Q charge with magnitude Q and energy density Q²/2 and
travels directly to the −Q charge along the only available spatial axis, giving a total
electrostatic energy Q²L/2.
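
The linear growth of the pair energy can be confirmed directly from (19.48). The sketch below evaluates the double integral for a charge density made of point charges, in which case it reduces to a double sum over the charges; the function names are hypothetical.

```python
def coulomb_energy_1d(charges):
    """Static Coulomb energy -(1/4) sum_ij q_i q_j |x_i - x_j| of point charges.

    `charges` is a list of (q, x) pairs; this is the form the one-dimensional
    Coulomb energy takes when the density is a sum of delta functions.
    """
    return -0.25 * sum(qi * qj * abs(xi - xj)
                       for qi, xi in charges
                       for qj, xj in charges)


def pair_energy(q, separation):
    """Energy of a +q charge at the origin and a -q charge at `separation`."""
    return coulomb_energy_1d([(+q, 0.0), (-q, separation)])


# Doubling the separation doubles the energy: E = q^2 L / 2.
energies = [pair_energy(1.0, L) for L in (1.0, 2.0, 4.0, 8.0)]
print(energies)
```

The same function shows that a neutral cluster of several charges has an energy that depends only on the separations inside the cluster, which is why only neutral bound states survive in the asymptotic spectrum.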
This is in contrast to the situation in three space dimensions, of course, where the
static electric flux originating on a charge spreads out throughout the ambient three-
dimensional volume, decreasing the energy density and giving rise to an electrostatic
interaction energy of separated charges falling inversely with their separation. In
the present case, the asymptotic spectrum cannot contain increasingly far-separated
“quarks” (with no intervening charged particles) without incurring an arbitrarily large
energy penalty.
On the other hand, we expect, in the limit of very heavy “quarks” (m >> e), to
find non-relativistic bound quark–antiquark states (analogous to the “onium” mesons
of QCD) of zero total charge. In fact, the only stable particles in the theory correspond
to neutral bosons, which can undergo non-trivial scatterings. As we decrease the mass
m relative to the coupling e, eventually reaching the “strong-coupling” regime of
e >> m, the stable bosons of the theory become, somewhat paradoxically, weakly
coupled (Coleman, 1976), and in the exactly massless limit for the quark (the original

7 As pointed out by Coleman, the physics of the massive model is enriched in an interesting way by
allowing for an external electric field, in which case additional terms appear in the Hamiltonian. Here we
set this field to zero. See (Coleman, 1976) for a detailed discussion of the general case.

“Schwinger model”) the spectrum of the theory collapses to that of a single free,
neutral, massive boson (with mass $e/\sqrt{\pi}$). In fact, the gauge-invariant operators of
the theory can all be re-expressed in terms of a scalar field φ, in terms of which the
Hamiltonian density reads $\mathcal{H} = \frac{1}{2}(\pi_\phi^2 + (\partial_1\phi)^2 + e^2\phi^2/\pi)$, with $\pi_\phi$ the momentum conjugate to φ.
The linear form of the Coulomb potential in the Schwinger model is, of course,
a kinematical consequence of the single spatial dimension available for the spread of
electric flux. In two spatial dimensions the Coulomb potential (Green function of the
Laplacian) grows logarithmically with distance, still providing charge confinement in
the abelian case, although a “weaker” form than in one spatial dimension, while in
three space dimensions we have the usual 1/r falloff, allowing us to isolate charged
particles from one another, although not, as we have seen, from the ever present
“cloud” of infrared low-energy photons. In non-abelian models, on the other hand,
there are strong arguments to believe that linear confinement persists even in two
or three space dimensions, as a consequence of the very non-trivial self-interacting
dynamics of the gauge fields of the theory. In the remainder of this chapter we shall
see how the methods of lattice gauge theory can be used to provide both analytic and
numerical support for this hypothesis.
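
For reference, the dimensional dependence of the abelian Coulomb potential described above can be summarized as follows. This is a sketch using the normalization of (19.47); the reference length r_0 in two dimensions is an arbitrary scale introduced here, and the pair energies are quoted up to separation-independent self-energies.

```latex
% Green functions of -\nabla^2 in d space dimensions, -\nabla^2 V_d = \delta^{(d)}(x):
V_1(x) = -\tfrac{1}{2}\,|x|, \qquad
V_2(r) = -\frac{1}{2\pi}\,\ln\frac{r}{r_0}, \qquad
V_3(r) = \frac{1}{4\pi r}.
% Interaction energy -Q^2 V_d(L) of charges +Q and -Q at separation L:
E_1(L) = \tfrac{1}{2}\,Q^2 L, \qquad
E_2(L) = \frac{Q^2}{2\pi}\,\ln\frac{L}{r_0}, \qquad
E_3(L) = -\frac{Q^2}{4\pi L}.
```

Only in three space dimensions does the energy needed to separate the pair stay bounded as L grows.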
As in the case of spontaneous symmetry-breaking, the physics of confinement
primarily concerns the long-distance properties of the theory, and we may therefore
expect that the details of the theory at very short-distance scales are unimportant,
as long as we take care to regularize the theory in a way that does not do violence
to those features of the theory that are intimately connected with the long-distance
phenomenon of interest. In our case, the features in question are those related directly
to the local gauge symmetry of the Lagrangian, which we take to be unbroken in the
Lagrangian, and with the associated remaining global symmetry (after gauge-fixing)
preserved by the vacuum of the theory (in other words, we are not in a Higgs phase of
the theory where the gauge fields of the theory are screened by a vacuum condensate
of charged fields).
In addition, we shall work in Euclidean space, as it is easy to formulate a simple and
direct criterion for confinement (or non-confinement) of matter fields in any particular
representation of the gauge group in an imaginary-time formulation, as we shall soon
see. The regularization of the theory at short distances will be performed by working
on a four-dimensional hypercubic lattice with a large but finite number of points L
in each (Euclidean) spacetime dimension, with a lattice spacing a separating nearest
neighbors in each spacetime direction (see Fig. 19.4, where a small section of the lattice,
in the μ, ν plane, is shown). We shall assume for definiteness that we are dealing with
a single unbroken gauge group SU(N) (the abelian case U(1) can be treated in a
completely analogous fashion), with the dynamics specified by a Euclidean functional
integral, as in (15.161–15.164), for the continuum theory.
The matter fields of the theory will be identified with field variables localized on
the sites of the lattice, which we will label with bold-faced Roman letters n, m, etc.
Thus, a bosonic scalar (resp. Dirac) field in the fundamental representation will be
specified at location n on the lattice as φn (resp. ψn ), where the gauge group “color”
index (and, in the fermion case, Dirac) indices are suppressed. The continuum vector
gauge fields Aαμ (x) of the theory lying in the adjoint representation (thus, the index
α = 1, 2, ..., N 2 − 1) are encoded in an N × N matrix field chosen to simplify the task

[Fig. 19.4: A slice through a Euclidean hypercubic lattice supporting a lattice gauge field. Labels: site n, lattice spacing a, link variable Un,μ, plaquette variable Un;μν.]

of constructing a gauge-invariant discretized action, as follows. Recall that the adjoint


gauge fields can be conveniently “packed” into a single matrix field (cf. (15.101)):

Aμ ≡ tα Aαμ (19.49)

where the tα are a set of hermitian generators of SU(N), normalized by Tr(tα tβ) = δαβ.
We shall also slightly change the notation previously used for finite gauge
transformations: thus, a finite local gauge transformation of a (e.g., scalar) matter
field in the fundamental representation of SU(N ) will be denoted (cf. (15.96))

$$\phi(x) \to \phi'(x) \equiv \Lambda(x)\phi(x), \qquad \Lambda(x) \in SU(N) \tag{19.50}$$

while the gauge field appearing in (19.49) transforms like (cf. (15.105))

$$A_\mu(x) \to A^\Lambda_\mu(x) = \Lambda(x)A_\mu(x)\Lambda^\dagger(x) + \frac{i}{g}(\partial_\mu\Lambda(x))\Lambda^\dagger(x) \tag{19.51}$$
The symbol U , formerly used for the local gauge transformations now denoted by
Λ in (19.50), will instead be used to denote the parallel gauge transporter, which is
defined, for the infinitesimal path (x + dx, x) corresponding to the straight segment
from x to x + dx, to be the transformation

$$U(x + dx, x) \equiv e^{-igA_\mu dx_\mu} = 1 - igA_\mu dx_\mu \tag{19.52}$$

One sees immediately, working to first order in dxμ ,

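$$\begin{aligned}
\Lambda(x + dx)U(x + dx, x)\Lambda^\dagger(x) &= \Lambda(x + dx)(1 - igA_\mu dx_\mu)\Lambda^\dagger(x) \\
&= (\Lambda(x) + \partial_\mu\Lambda(x)dx_\mu)(1 - igA_\mu dx_\mu)\Lambda^\dagger(x) \\
&= 1 + dx_\mu\big((\partial_\mu\Lambda(x))\Lambda^\dagger(x) - ig\Lambda(x)A_\mu\Lambda^\dagger(x)\big) \\
&= 1 - ig\Big(\Lambda(x)A_\mu\Lambda^\dagger(x) + \frac{i}{g}(\partial_\mu\Lambda(x))\Lambda^\dagger(x)\Big)dx_\mu \\
&= 1 - igA^\Lambda_\mu(x)dx_\mu \equiv U^\Lambda(x + dx, x)
\end{aligned} \tag{19.53}$$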

that the transporter U (x + dx, x) serves to “shift” a matter field localized at point x to
one transforming under the local gauge transformation appropriate for point x + dx:

$$U(x + dx, x)\phi(x) \to U^\Lambda(x + dx, x)\Lambda(x)\phi(x) = \Lambda(x + dx)U(x + dx, x)\phi(x) \tag{19.54}$$

The parallel transport property for the infinitesimal path (x + dx, x) generalizes in an
obvious way to finite paths specified by some continuous contour Ca→b from spacetime
point a to point b, as we may simply divide the path into infinitesimal segments
and form the path-ordered product8 of U (x + dx, x) transporters to obtain a finite
transporter

$$U(C_{a\to b}) = P\exp\Big\{-ig\int_{C_{a\to b}} A_\mu dx_\mu\Big\} \tag{19.55}$$

transforming like

$$U(C_{a\to b}) \to U^\Lambda(C_{a\to b}) = \Lambda(b)U(C_{a\to b})\Lambda^\dagger(a) \tag{19.56}$$

For a closed path, we end up with a transporter transforming covariantly under the
adjoint representation of the gauge group,

$$U(C_{a\to a}) \to \Lambda(a)U(C_{a\to a})\Lambda^\dagger(a) \tag{19.57}$$

from which it follows (from Λ† (a)Λ(a) = 1) that the trace of any closed-path trans-
porter is gauge-invariant:

Tr(U (Ca→a )) → Tr(Λ(a)U (Ca→a )Λ† (a)) = Tr(U (Ca→a )) (19.58)
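
The transformation laws (19.56) to (19.58) can be illustrated with a discretized path. In the sketch below a path is a chain of SU(2) matrices, one per segment, standing in for the infinitesimal transporters; the finite transporter is their ordered product with later segments placed to the left. The choice of SU(2), the random matrices, and all function names are assumptions made for illustration only.

```python
import math
import random


def su2(a0, a1, a2, a3):
    """SU(2) matrix a0 + i(a1 s1 + a2 s2 + a3 s3), with (a0..a3) normalised."""
    norm = math.sqrt(a0 * a0 + a1 * a1 + a2 * a2 + a3 * a3)
    a0, a1, a2, a3 = (a / norm for a in (a0, a1, a2, a3))
    return [[complex(a0, a3), complex(a2, a1)],
            [complex(-a2, a1), complex(a0, -a3)]]


def random_su2(rng):
    return su2(*(rng.gauss(0.0, 1.0) for _ in range(4)))


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def transporter(links):
    """Path-ordered product: links[0] is traversed first, so it ends up rightmost."""
    u = [[1, 0], [0, 1]]
    for link in links:
        u = matmul(link, u)
    return u


def gauge_transform(links, lams):
    """links[k] runs from point k to point k+1; lams[k] is the gauge element at point k."""
    return [matmul(matmul(lams[k + 1], links[k]), dagger(lams[k]))
            for k in range(len(links))]


rng = random.Random(0)
links = [random_su2(rng) for _ in range(5)]
lams = [random_su2(rng) for _ in range(6)]

# Open path: the transporter picks up Lambda(b) on the left, Lambda(a)^dagger on the right.
u = transporter(links)
u_new = transporter(gauge_transform(links, lams))
expected = matmul(matmul(lams[-1], u), dagger(lams[0]))
open_path_error = max(abs(u_new[i][j] - expected[i][j])
                      for i in range(2) for j in range(2))

# Closed path: identify the last point with the first; the trace is then invariant.
closed_lams = lams[:-1] + [lams[0]]
w = u[0][0] + u[1][1]
u_closed = transporter(gauge_transform(links, closed_lams))
closed_loop_error = abs(w - (u_closed[0][0] + u_closed[1][1]))
print(open_path_error, closed_loop_error)
```

The gauge elements at the interior points of the path cancel pairwise between neighbouring segments, which is the discrete counterpart of dividing the contour into infinitesimal pieces.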

For the special case of straight line contours, the parallel transporter satisfies a familiar
type of first-order equation, analogous to the equation (4.20) satisfied by the time-
ordered interaction-picture evolution operator (4.28). Suppose the contour is just a
straight line path in the Euclidean “time” (fourth) direction, from the spacetime point
y ≡ (x, y4 ) to the point x ≡ (x, x4 ), where x4 > y4 . Then we have
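$$\frac{\partial}{\partial x_4}U(C_{y\to x}) = \lim_{\Delta x_4\to 0}\frac{1}{\Delta x_4}\big(e^{-igA_4(\mathbf{x},x_4)\Delta x_4} - 1\big)U(C_{y\to x}) = -igA_4(\mathbf{x},x_4)U(C_{y\to x}) \tag{19.59}$$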

Of particular interest in the discretized lattice version of the theory are the parallel
transporters corresponding to links connecting nearest neighbor sites on the lattice.
Thus, we define (see Fig. 19.4) Un,μ as the parallel transporter for the path (n + aμ̂, n)

8 The finite path-ordered product is defined, analogously to the time-ordered product of (4.27), by
expanding the exponential and ordering the gauge-field factors in each term so that “later” fields along the
path are placed to the left.

extending from site n in the positive μ direction by one lattice spacing. As a conse-
quence of (19.56), this object transforms under local SU(N ) gauge transformations,
specified by assigning an SU(N ) element Λn to each lattice site n, as follows:

Un,μ → Λn+aμ̂ Un,μ Λ†n (19.60)

The smallest closed path on the lattice corresponds to a plaquette, or a square


one lattice spacing on a side, corresponding to a path n → n + aμ̂ → n + aμ̂ + aν̂ →
n + aν̂ → n (see Fig. 19.4). The corresponding transporter, Un;μν , μ < ν, is given
explicitly by
$$U_{n;\mu\nu} = U^\dagger_{n,\nu}\,U^\dagger_{n+a\hat\nu,\mu}\,U_{n+a\hat\mu,\nu}\,U_{n,\mu} = e^{igaA_\nu(n)}\,e^{igaA_\mu(n+a\hat\nu)}\,e^{-igaA_\nu(n+a\hat\mu)}\,e^{-igaA_\mu(n)} \tag{19.61}$$

For the time being we shall assume that we are dealing with smooth classical fields, so
that aAμ (n + aν̂) may be approximated in the exponent by a(Aμ (n) + a∂ν Aμ (n)),
neglecting terms of order a3 . The exponentials can be combined using a Baker–
Campbell–Hausdorff formula
$$e^{aX}e^{aY} = e^{aX + aY + \frac{1}{2}a^2[X,Y] + O(a^3)} \tag{19.62}$$

and a short calculation (see Problem 4) then gives


$$U_{n;\mu\nu} = e^{-iga^2F_{\mu\nu} + O(a^3)} \tag{19.63}$$

with Fμν the N × N hermitian matrix field strength tensor (cf. (15.108))

Fμν = ∂μ Aν − ∂ν Aμ + ig[Aμ , Aν ] = tα Fαμν (19.64)
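
One way to organize the short calculation leading to (19.63) is sketched below. It uses the extension of (19.62) to a product of four exponentials, in which the second-order term is half the sum of the commutators of all ordered pairs; the labels X_1, ..., X_4 are introduced here for convenience, and this is not necessarily the route intended in Problem 4.

```latex
% Exponents of the four factors in (19.61), after Taylor expansion of the shifted fields
% (all fields evaluated at site n):
X_1 = igA_\nu, \quad
X_2 = ig(A_\mu + a\,\partial_\nu A_\mu), \quad
X_3 = -ig(A_\nu + a\,\partial_\mu A_\nu), \quad
X_4 = -igA_\mu .
% Product of four exponentials to second order:
U_{n;\mu\nu} = \exp\Big( a\sum_i X_i + \tfrac{1}{2}a^2\sum_{i<j}[X_i, X_j] + O(a^3) \Big).
% First-order terms cancel except for the derivative pieces:
a\sum_i X_i = -iga^2(\partial_\mu A_\nu - \partial_\nu A_\mu).
% Commutators: [X_1,X_2] + [X_1,X_4] + [X_2,X_3] + [X_3,X_4] = 2g^2[A_\mu, A_\nu] + O(a), so
\tfrac{1}{2}a^2\sum_{i<j}[X_i, X_j] = g^2a^2[A_\mu, A_\nu] + O(a^3)
  = -iga^2\,\big(ig[A_\mu, A_\nu]\big) + O(a^3).
% Adding the two contributions:
U_{n;\mu\nu} = \exp\big(-iga^2(\partial_\mu A_\nu - \partial_\nu A_\mu + ig[A_\mu, A_\nu]) + O(a^3)\big).
```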

As expected, the closed-path plaquette variable Un;μν inherits the covariant adjoint
transformation property (19.57) from the field-strength tensor which has the same
transformation behavior. The trace of this quantity is easily seen to be exactly
invariant under the full set of local lattice gauge transformations specified by (19.60).
Moreover, the real part of the trace of the plaquette transporter can be expanded for
small lattice spacing, giving
$$\mathrm{Re\,Tr}(U_{n;\mu\nu}) = \frac{1}{2}\mathrm{Tr}(U_{n;\mu\nu} + U^\dagger_{n;\mu\nu}) = \mathrm{Tr}(1) - \frac{1}{2}g^2a^4\,\mathrm{Tr}(F_{\mu\nu}(n)F_{\mu\nu}(n)) + O(a^5) \tag{19.65}$$
Note that the trace of the term linear in the exponent vanishes, as the exponent must
be anti-hermitian (Un;μν is unitary). The second term on the right is nothing but
the usual pure gauge Lagrangian density (15.111) (in Euclidean space, so there are
no raised indices). The higher terms in (19.65), of order a5 or higher, correspond by
dimensional analysis to operators of dimension 5 or higher—exactly the ones which
in a cutoff effective Lagrangian correspond to “irrelevant” operators, to which the
low-energy physics should be insensitive, as we saw in Chapter 16. The continuum
gauge action corresponding to (15.111) (in Euclidean space) becomes, after a naive
discretization on a hypercubic lattice,
$$S_{\rm gauge} = \frac{1}{4}\int \mathrm{Tr}(F_{\mu\nu}F_{\mu\nu})\,d^4x \;\to\; a^4\sum_n \frac{1}{4}\mathrm{Tr}(F_{\mu\nu}(n)F_{\mu\nu}(n)) \tag{19.66}$$

Comparing this with (19.65), we see that the usual continuum action, once regulated
on a lattice, corresponds up to irrelevant (dimension 5 and higher) operators with the
Wilson lattice action (Wilson, 1974)

$$S_{\rm Wils,latt} = \beta\sum_{n,\mu<\nu}\Big(1 - \frac{1}{N}\mathrm{Re\,Tr}(U_{n;\mu\nu})\Big), \qquad \beta \equiv \frac{N}{g(a)^2} \tag{19.67}$$

The coupling constant g(a) here refers to a dimensionless parameter appearing in an


effective, cutoff action functional: the presence of a spacetime hypercubic lattice with
spacing a between nearest neighbor sites implies that the Fourier momentum modes of
the fields are cut off at a high momentum Λ ∼ π/a, so in the language of Sections 16.4 and
17.4, the value chosen for g(a) (or equivalently, g(Λ)) must be chosen to “flow” as the
lattice spacing a is taken to zero (to recover a continuum theory), or as the UV cutoff Λ
is taken to infinity, while holding some suitable set of low-energy amplitudes fixed. We
expect from the asymptotic freedom property (Section 18.3) of unbroken non-abelian
gauge theories that this will require g(a) to vanish logarithmically as the lattice spacing
is sent to zero. In other words, the continuum limit of the theory is approached by
taking β in (19.67) large, and then examining the correlation of observables over larger
and larger separations in lattice units (corresponding to a fixed separation in physical
units). We have a sensible continuum limit if a uniquely specified rescaling of lattice
to physical units simultaneously results in well-defined and non-trivial (i.e., not simply
free field value) continuum limits for all the distinct gauge-invariant observables of the
theory.
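
The small-a expansion (19.65) that underlies the Wilson action can be tested numerically in a simple special case. The sketch below assumes gauge group SU(2) and spacetime-independent fields A_μ = c_μ·σ and A_ν = c_ν·σ, so that F_μν reduces to ig[A_μ, A_ν] = −2g(c_μ × c_ν)·σ. It compares Tr(1) − Re Tr(U) for the plaquette (19.61) with the predicted ½g²a⁴Tr(F_μν F_μν). The coefficient vectors, coupling and function names are illustrative choices.

```python
import math


def exp_i_sigma(v):
    """exp(i v.sigma) = cos|v| + i sin|v| (v.sigma)/|v| for a real 3-vector v."""
    r = math.sqrt(sum(x * x for x in v))
    if r == 0.0:
        return [[1 + 0j, 0j], [0j, 1 + 0j]]
    c, s = math.cos(r), math.sin(r) / r
    return [[complex(c, s * v[2]), complex(s * v[1], s * v[0])],
            [complex(-s * v[1], s * v[0]), complex(c, -s * v[2])]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def plaquette_deficit(c_mu, c_nu, g, a):
    """Tr(1) - Re Tr(U) for the plaquette built from constant SU(2) fields."""
    factors = [[+g * a * x for x in c_nu], [+g * a * x for x in c_mu],
               [-g * a * x for x in c_nu], [-g * a * x for x in c_mu]]
    u = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for v in factors:            # multiply the four exponentials left to right
        u = matmul(u, exp_i_sigma(v))
    return 2.0 - (u[0][0] + u[1][1]).real


def continuum_prediction(c_mu, c_nu, g, a):
    """(1/2) g^2 a^4 Tr(F F) with F = -2 g (c_mu x c_nu).sigma, so Tr(F F) = 8 g^2 |c_mu x c_nu|^2."""
    n = (c_mu[1] * c_nu[2] - c_mu[2] * c_nu[1],
         c_mu[2] * c_nu[0] - c_mu[0] * c_nu[2],
         c_mu[0] * c_nu[1] - c_mu[1] * c_nu[0])
    tr_f2 = 8.0 * g * g * sum(x * x for x in n)
    return 0.5 * g * g * a ** 4 * tr_f2


c_mu, c_nu, g = (0.3, -0.7, 0.5), (0.9, 0.2, -0.4), 1.3
ratios = [plaquette_deficit(c_mu, c_nu, g, a) / continuum_prediction(c_mu, c_nu, g, a)
          for a in (0.04, 0.02, 0.01)]
print(ratios)  # each ratio should be close to 1, improving as a decreases
```

With constant fields the derivative terms in F_μν are absent, so this checks only the commutator part of the field strength; it is a consistency test of the expansion, not a simulation of the lattice theory.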
We may remark here that abelian gauge theories based on a U(1) gauge group
can be regularized gauge-invariantly on a lattice by a completely analogous pro-
cedure. In this case, the link variables are unimodular complex numbers, Un,μ =
exp (iθn,μ ), −π < θn,μ < +π, with plaquette angles θn;μν constructed by summing the
four angles around an elementary square. One may take, in analogy to the non-abelian
Wilson action (19.67),


$$S_{\rm Wils,latt} = \beta\sum_{n,\mu<\nu}(1 - \cos(\theta_{n;\mu\nu})) \tag{19.68}$$

The continuum limit again corresponds to taking the coupling β large, which forces
the path integral to concentrate in the region of small θn;μν . The specific choice of
periodic function used here is to a large extent a matter of convenience: different
functions with the same Gaussian behavior for small plaquette angles correspond to
effective Lagrangians at the UV cutoff scale differing by higher-dimension operators
(i.e., higher powers of θn;μν ∼ Fμν ) which we expect to be in the same universality
class (cf. Section 17.4) as the theory defined by the Wilson action, say. For example,
a very useful choice for analytic computations is the Villain U(1) action:

$$S_{\rm Vill,latt} = \sum_{n,\mu<\nu} S_{\rm Vill}(\theta_{n;\mu\nu}) \tag{19.69}$$

$$S_{\rm Vill}(\theta) \equiv -\log\sum_{m=-\infty}^{+\infty}\exp\Big\{-\frac{\beta}{2}(\theta - 2m\pi)^2\Big\} \tag{19.70}$$

The Wilson and Villain actions (for β = 5) are displayed in Fig. 19.5: it is apparent
that the small θ behavior is identical; we are free to use either as our regularized version
of the abelian gauge theory. Similarly, a non-abelian Villain lattice action (say, for the
gauge group SU(2)) can be defined by choosing the single plaquette action:


$$S_{\rm Vill}(U_{n;\mu\nu}) = -\log\sum_{m=-\infty}^{+\infty}\exp\Big\{-\frac{\beta}{2}\Big(\arccos\big(\tfrac{1}{2}\mathrm{Tr}(U_{n;\mu\nu})\big) - 2m\pi\Big)^2\Big\} \tag{19.71}$$
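
The agreement of the U(1) Wilson and Villain single-plaquette actions (19.68) and (19.70) at small plaquette angle can be checked with a few lines of code. The sketch below truncates the infinite sum over m at a finite |m|, which is an approximation introduced here (the neglected terms are exponentially small for the values used); the function names are hypothetical.

```python
import math


def wilson_action(theta, beta):
    """Single-plaquette U(1) Wilson action beta * (1 - cos(theta))."""
    return beta * (1.0 - math.cos(theta))


def villain_action(theta, beta, m_max=20):
    """Single-plaquette Villain action, with the sum over m truncated at |m| <= m_max."""
    exponents = [-0.5 * beta * (theta - 2.0 * math.pi * m) ** 2
                 for m in range(-m_max, m_max + 1)]
    top = max(exponents)  # factor out the largest term to avoid underflow
    return -(top + math.log(sum(math.exp(e - top) for e in exponents)))


beta = 5.0
for theta in (0.0, 0.1, 0.5, 1.0, math.pi):
    print(theta, wilson_action(theta, beta), villain_action(theta, beta))
```

Both functions are 2π-periodic and reduce to βθ²/2 for small θ; they differ substantially only near θ = ±π, where the Boltzmann weight is already strongly suppressed at large β.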

The addition of charged (fundamental representation) scalar matter fields φn localized
on lattice sites is also straightforward: one may easily verify (Problem 5) that the
discrete gauge-invariant actions $\sum_{n,\mu}(\phi^*_{n+\hat\mu}U_{n,\mu}\phi_n + \mathrm{c.c.})$ and $\sum_n P(\phi^*_n\phi_n)$ (with P an
at most quadratic polynomial) can be linearly combined to give a matter lattice action
Smatter,latt containing the most general set of gauge-invariant relevant and marginal
continuum operators. The inclusion of charged fermionic fields is somewhat trickier,
as a consequence of the infamous “doubling” problem, whereby naive discretizations
of the usual Dirac action lead to a superfluity (by a power of two) of the fermionic
degrees of freedom (see (Montvay and Münster, 1994), Section 4.2). We shall not treat
this subject here, as we shall be concerned only with the confinement question in the
static (infinite quark mass) limit, where the problem can be circumvented.
The Euclidean path integral for the lattice discretized gauge theory can now be
written down directly. For a theory with scalar matter site fields φn interacting with
the gauge link fields Un,μ, the partition function of the theory is given by

$$Z_{\rm latt} = \int\prod_n d\phi_n\prod_{n\mu} dU_{n,\mu}\; e^{-(S_{\rm Wils,latt} + S_{\rm matter,latt})} \tag{19.72}$$

[Fig. 19.5: Comparison of Wilson and Villain actions S(θ) for a U(1) lattice gauge theory, plotted for −π ≤ θ ≤ +π.]

where the integrals over the link variables Un,μ ∈ SU(N ) are the usual Hurwitz measure
ones (cf. (15.140)). Note that the local gauge group is now compact, as it is simply the
direct product of a finite number of independent SU(N ) groups acting on each lattice
site. The problem of an infinite gauge group volume which plagued the continuum
formulation of the theory, and required the insertion of a gauge-fixing prescription
to provide unambiguous finite results for correlation functions computed from the
functional integral, has simply disappeared. We may therefore perform an unrestricted
integration over the link variables Un,μ , provided the observables O[φn , Un,μ ] being
averaged in the functional integral are themselves gauge-invariant:

$$\langle O\rangle = \int\prod_n d\phi_n\prod_{n\mu} dU_{n,\mu}\; O[\phi_n, U_{n,\mu}]\,e^{-(S_{\rm Wils,latt} + S_{\rm matter,latt})}/Z_{\rm latt} \tag{19.73}$$

with gauge-invariance requiring

O[φn , Un,μ ] = O[Λn φn , Λn+aμ̂ Un,μ Λ†n ] (19.74)

With this non-gauge-fixed formulation, correlation functions of gauge-variant objects
are not useful, and in fact, frequently vanish. The correlation function of two distinct
link variables, for example, ⟨Un,ν Um,μ⟩, automatically vanishes as the integration over
all link variables “includes” an averaging over local gauge transformations of the two
link variables (during which the action exponent is constant), and the Hurwitz integral
over gauge transformations Λ of a single link variable U shifted by Λ will vanish, as
∫dΛ (ΛU) = 0. Thus, we are unable to compute a non-zero gauge field propagator in
this approach. Likewise, the charged matter field Euclidean propagator, defined by
the correlation function ⟨φn φ∗m⟩, will automatically vanish for n ≠ m.
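
The vanishing of gauge-variant averages is easiest to see in the abelian case. The sketch below assumes gauge group U(1) and a single plaquette with arbitrarily chosen link angles: averaging one link variable over the gauge angle at one of its end sites gives zero, while the plaquette angle is unchanged by gauge transformations at all four corners. The link labels, angle values and function names are illustrative.

```python
import cmath
import math


def gauge_average(f, n_points=64):
    """Average f(alpha) over n_points equally spaced U(1) gauge angles alpha."""
    return sum(f(2.0 * math.pi * k / n_points) for k in range(n_points)) / n_points


# Arbitrary link angles around one plaquette with corners 00, 10, 11, 01.
links = {"bottom": 0.4, "right": -1.1, "top": 0.7, "left": 0.25}


def plaquette_angle(th):
    """Oriented sum of the four link angles around the plaquette."""
    return th["bottom"] + th["right"] - th["top"] - th["left"]


def gauge_transform(th, a00, a10, a11, a01):
    """theta -> theta + alpha(end site) - alpha(start site) on each link."""
    return {"bottom": th["bottom"] + a10 - a00,
            "right": th["right"] + a11 - a10,
            "top": th["top"] + a11 - a01,
            "left": th["left"] + a01 - a00}


# Averaging a single link over the gauge angle at its end site gives zero.
link_average = gauge_average(lambda alpha: cmath.exp(1j * (links["bottom"] + alpha)))

# The plaquette angle is unchanged by gauge transformations at the four corners.
shifted = gauge_transform(links, 0.3, -2.0, 1.7, 0.9)
plaquette_change = plaquette_angle(shifted) - plaquette_angle(links)
print(abs(link_average), plaquette_change)
```

In the non-abelian case the same argument goes through with the group integral in place of the angular average, which is why only closed-loop traces and other gauge-invariant combinations survive.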
The physical questions we ask of the theory must therefore be expressed in terms
of explicitly gauge-invariant operators. For example, a quark–antiquark pair, obtained
in standard perturbation theory by applying the bilocal operator ψ̄(x)ψ(y) to the
vacuum, must be obtained here by the application to the vacuum of an almost
local operator ψ̄(x)U (Cy→x )ψ(y), in which the two quark fields are connected by a
parallel transporter (“Wilson line”) along some continuous path from y to x, and
which is exactly gauge-invariant, using (19.56). The specific choice of path will not
be important, as we will be interested below in an asymptotic argument in which the
initial quark–antiquark state, however constituted at the outset, is allowed to rearrange
itself into a compatible state (of equal conserved quantum numbers) with minimal
energy by evolving the system over a long Euclidean time. In the Schwinger model
discussed earlier, the physical significance of the Wilson line is clear: it represents the
“string” of electric flux connecting the oppositely charged particles and giving rise to
the linear dependence of energy with separation.
We wish to study the question of the static energy of a quark–antiquark pair
in a non-abelian gauge theory, where the static condition is enforced by taking the
quark mass M very large (effectively infinite). The pair is inserted into the vacuum
at Euclidean time x4 = 0 and removed at Euclidean time x4 = T . Here, by “quark”
we mean a spin-½ fermionic field transforming under the fundamental representation
of the gauge group, which for definiteness we take to be SU(N ). A suitable almost
local gauge-invariant operator to perform the insertion is ψ̄(x, 0)U (C(y,0)→(x,0) )ψ(y , 0),

while the removal is accomplished by ψ̄(y , T )U (C(x,T )→(y,T ) )ψ(x, T ) (see Fig. 19.6),
where the contours are chosen for simplicity to be straight line spatial paths connecting
the locations x and y at fixed time 0 or T .
We also assume, without explicitly indicating this in the notation, that the quark
and antiquark (although of equal mass M ) are of different “flavors”—i.e., one is not
the antiquark of the other—to eliminate the possibility of mutual annihilation (into
pure gauge energy). Instead, the two heavy objects are forced to propagate over a
large Euclidean time T , after which they are removed from the system. The Euclidean
amplitude for this process, written for the time being in the continuum theory, is
represented schematically (ignoring overall normalization, gauge-fixing issues, etc.) by
the functional integral

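$$\begin{aligned}
&\langle\bar\psi(c)U(C_{b\to c})\psi(b)\,\bar\psi(a)U(C_{d\to a})\psi(d)\rangle \\
&\quad= \int DA_{\alpha\mu}\,D\psi\,D\bar\psi\;\bar\psi(c)U(C_{b\to c})\psi(b)\,\bar\psi(a)U(C_{d\to a})\psi(d)\,e^{-(S_{\rm gauge} + S_{\rm matter})} \\
&\quad= \int DA_{\alpha\mu}\,\mathrm{Tr}\big(S_E(d, c; A)U(C_{b\to c})S_E(b, a; A)U(C_{d\to a})\big)\,e^{-S_{\rm gauge}}
\end{aligned} \tag{19.75}$$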

where a, b, c, d are the spacetime points (x, 0), (x, T ), (y , T ), (y , 0), as indicated (super-
imposed on a spacetime lattice) in Fig. 19.6. In the final line the integral over the
fermionic quark fields has been performed for each fixed gauge field in the remaining
DAαμ functional integral. Thus, the function SE (b, a; A), for example, is the Euclidean
Dirac propagator for the massive quark propagating in the background classical gauge
field Aαμ . The trace appearing in (19.75) is over gauge group (i.e., fundamental
representation) indices, and we have ignored an irrelevant overall minus sign arising
from permuting the Grassmann quark fields.

[Fig. 19.6: Path corresponding to the Wilson loop observable W(R, T): a rectangle with corners a and d at t = 0 and corners b and c at t = T, with spatial extent R between x and y.]


We have already seen in Chapter 11 (see the discussion following (11.50)) that the
exchange of transverse gauge particles between charged Dirac fermions is suppressed in
the limit where the fermion mass(es) are taken large, and that moreover, in the extreme
static limit (with the mass taken to infinity), the spatial momentum dependence of
the propagator and the coupling to spatial gauge fields disappears entirely, plausibly
enough, as an infinitely massive particle is insensitive to the transfer of any finite
amount of momentum, and, in the absence of acceleration, cannot radiate or absorb
real gauge quanta.
In the static limit therefore, the Euclidean propagator for our quarks is just the
Green function for the Euclidean Dirac operator appearing in (15.162), but with the
spatial derivatives and spatial components of the gauge field set to zero, leaving only
the fourth (Euclidean “time”) components

$$-i\hat{\not{D}} + M = -i\hat\gamma_\mu(-i\partial_\mu + gt_\alpha A_{\alpha\mu}) + M \;\to\; -\hat\gamma_4 D_4 + M, \qquad D_4 \equiv \partial_4 + igA_4 \tag{19.76}$$

where we have reverted in the final expression to the use of the matrix gauge field
A4 = tα Aα4 , and removed a factor of the coupling constant from the field to maintain
consistency with our notation throughout this section. Thus, our static propagator in
a background continuum field satisfies

(-\hat\gamma_4 D_4 + M)\, S_E(x, y; A) = \delta^4(x - y)    (19.77)

Using (19.59), we may write down the solution to (19.77) (see Problem 6)

S_E(x, y; A) = e^{-M|x_4 - y_4|}\, \delta^3(\mathbf{x} - \mathbf{y})\, \big( P_+\, \theta(y_4 - x_4) + P_-\, \theta(x_4 - y_4) \big)\, U(C_{(\mathbf{y}, y_4) \to (\mathbf{x}, x_4)})    (19.78)
with P± ≡ (1 ± γ̂4 )/2 the projection operators appropriate for quark and antiquark
propagation (in the static limit). Inserting (19.78) into (19.75) we see that the
Euclidean propagation amplitude for our quark–antiquark pair is proportional (using
cyclicity of the trace) to the Wilson loop variable W (R, T ) = Tr(UCa→b→c→d→a ) ≡
Tr(UC(R,T ) ) corresponding to the gauge-invariant trace of the closed rectangular
contour displayed in Fig. 19.6, averaged in the Euclidean functional integral over
gauge fields distributed according to a Boltzmann weight e−Sgauge determined by
the pure gauge action. If our contour is very long in the Euclidean time direction
(T >> R = |x − y| in Fig. 19.6), the Euclidean propagation amplitude must acquire,
by reasoning familiar from Section 4.2, a factor e^{−V(x−y)T}. Here the static potential
energy V(x − y) is the energy of the minimum-energy state into which the quark–antiquark
pair introduced at t = 0 (plus the gauge gluons with which they interact) can rearrange
itself or, equivalently, of the minimum-energy state with a non-vanishing matrix element
of the bilocal operator ψ̄(x, 0)U(C_{(y,0)→(x,0)})ψ(y, 0) to the vacuum.
For the lattice-regularized theory, this quantity, defined mathematically as

V(R) \equiv \lim_{T \to \infty} \left\{ -\frac{1}{T} \log \langle W(R,T) \rangle \right\}    (19.79)

can be numerically estimated by generating a large ensemble of statistically indepen-
dent gauge field (i.e., link) configurations according to the Boltzmann weight arising
from the Wilson action using Monte Carlo techniques, and then averaging the Wilson
loop variable (for various choices of R and T ) over this ensemble. This program,
initiated in the late 1970s by Creutz (Creutz, 1980), has been pushed to quite large
lattices and a high level of statistical precision, and there is by now overwhelming
numerical evidence that V(R), in addition to the expected Coulombic behavior at
short distances (where we expect perturbative behavior due to the asymptotic freedom
property described in the preceding chapter), rises linearly with R at large
separations (see Fig. 11.13).
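
The limit in (19.79) converges only like 1/T when read off directly, because of the overlap prefactor multiplying the leading exponential. A minimal sketch of the usual workaround, assuming a purely synthetic two-state model for the loop average (the values `v0`, `v1`, `c0`, `c1` are made up for illustration and are not lattice data): the ratio of loops at neighboring T cancels the prefactor and approaches V(R) exponentially fast.

```python
import math

def synthetic_wilson_loop(T, v0=0.5, v1=1.3, c0=0.7, c1=0.3):
    """Toy loop average at fixed R: ground state plus one excited state.
    All four parameters are hypothetical illustration values."""
    return c0 * math.exp(-v0 * T) + c1 * math.exp(-v1 * T)

def direct_estimate(T):
    """-(1/T) log <W(R,T)>, the quantity whose T -> infinity limit defines V(R)."""
    return -math.log(synthetic_wilson_loop(T)) / T

def ratio_estimate(T):
    """log(<W(R,T)> / <W(R,T+1)>); the overlap prefactor cancels in the ratio."""
    return math.log(synthetic_wilson_loop(T) / synthetic_wilson_loop(T + 1))

if __name__ == "__main__":
    for T in (2, 5, 10, 20):
        print(T, round(direct_estimate(T), 6), round(ratio_estimate(T), 6))
```

Both estimators tend to the same V(R); the ratio form simply gets there at much smaller T, which matters when large loops are statistically noisy.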
The appearance of a linearly rising static potential was already demonstrated by
Wilson in the strong coupling limit where β = N/g^2 is taken small, in his seminal
paper on lattice gauge theory (Wilson, 1974). We shall briefly explain the reason for
this result here. For compact Lie groups G (such as the SU(N ) groups considered here),
any invariant function on the group f (U ), f (V −1 U V ) = f (U )∀U, V ∈ G has a Fourier
expansion in terms of the character functions χr (U ) associated with a complete set of
unitary representations of the group (labeled by the index r). The character function
χr (U ) is simply the trace of the unitary matrix representing the group element U in
the rth representation, of dimension dr ,

\chi_r(U) \equiv \mathrm{Tr}(U^{(r)}), \qquad \chi_r(1) = d_r    (19.80)

and any invariant function, and in particular the exponential function
e^{β((1/N) Re Tr(U_P) − 1)} appearing in the lattice functional integral from the
Wilson gauge lattice action (where P denotes a particular plaquette), can be expanded

e^{\beta\left( \frac{1}{N} \mathrm{Re\,Tr}(U_P) - 1 \right)} = \sum_r c_r(\beta)\, \chi_r(U_P) = c_0(\beta) \left( 1 + \sum_{r \neq 0} \frac{c_r(\beta)}{c_0(\beta)}\, \chi_r(U_P) \right)    (19.81)

where we have separated out explicitly the contribution of the trivial representation
(r = 0) with χ0 (U ) = 1. For concreteness, let us take the case of gauge group SU(2).
The representations are labeled by the index j, which can be integer or half-integer,
with the fundamental (spinor) representation corresponding to j = 1/2. In this case the
coefficient ratios appearing in the sum are

\frac{c_j(\beta)}{c_0(\beta)} = (2j+1)\, \frac{I_{2j+1}(\beta)}{I_1(\beta)} \sim \frac{\beta^{2j}}{(2j)!\, 2^{2j}} + O(\beta^{2j+2}), \qquad \beta \to 0    (19.82)

Note that for small β, the leading contributions arise from the use of representations
with minimum dimensionality (which give a non-zero contribution to the desired
amplitude). For SU(N ), the lowest-dimensional non-trivial representation is the fun-
damental, which we shall denote with subscript “F” (thus for SU(2), cF = cj=1/2 ).
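
The exact Bessel-function form of the SU(2) coefficient ratio can be cross-checked by projecting the plaquette weight onto characters numerically. The sketch below assumes the standard class-function (Weyl) measure (2/π) sin²θ dθ for SU(2), with Tr U = 2 cos θ and χ_j = sin((2j+1)θ)/sin θ; these are textbook facts that the passage above does not spell out, and the function names are illustrative only.

```python
import math

def bessel_i(n, x, terms=40):
    """Modified Bessel function I_n(x), integer n >= 0, from its power series."""
    return sum((x / 2.0) ** (2 * k + n) / (math.factorial(k) * math.factorial(k + n))
               for k in range(terms))

def character(two_j, theta):
    """SU(2) character chi_j for eigenvalues exp(+-i*theta); two_j stands for 2j."""
    return math.sin((two_j + 1) * theta) / math.sin(theta)

def coefficient(two_j, beta, steps=4000):
    """c_j(beta): projection of exp(beta*(Tr(U)/2 - 1)) onto chi_j,
    integrated with the assumed Weyl measure by the midpoint rule."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * h
        measure = (2.0 / math.pi) * math.sin(theta) ** 2
        weight = math.exp(beta * (math.cos(theta) - 1.0))
        total += measure * weight * character(two_j, theta) * h
    return total

def ratio_numeric(two_j, beta):
    return coefficient(two_j, beta) / coefficient(0, beta)

def ratio_bessel(two_j, beta):
    """(2j+1) I_{2j+1}(beta) / I_1(beta), the closed form quoted in the text."""
    return (two_j + 1) * bessel_i(two_j + 1, beta) / bessel_i(1, beta)

if __name__ == "__main__":
    for two_j in (1, 2, 3):
        print(two_j, ratio_numeric(two_j, 1.0), ratio_bessel(two_j, 1.0))
```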
The Schur orthogonality theorem for the finite-dimensional irreducible unitary
matrix representations of SU(N ) (where the superscripts r, s identify the represen-
tation)

\int dU\, \big( U^{(r)}_{ij} \big)^{*}\, U^{(s)}_{mn} = \frac{1}{d_r}\, \delta_{rs}\, \delta_{im}\, \delta_{jn}    (19.83)

implies a useful identity,



\int \chi_r(U^{\dagger} V)\, \chi_s(U W)\, dU = \frac{1}{d_r}\, \delta_{rs}\, \chi_r(V W)    (19.84)
which allows us to perform the gauge link integrations appearing in the lattice
functional integral giving the expectation value of the Wilson loop observable W (R, T )
(= trace in the fundamental representation of the product of link variables around the
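rectangular loop of Fig. 19.6):

\langle W(R,T) \rangle = \frac{1}{Z} \int \prod_{n\mu} dU_{n,\mu}\; \chi_F(U_{C(R,T)}) \prod_P e^{\beta\left( \frac{1}{N} \mathrm{Re\,Tr}(U_P) - 1 \right)}    (19.85)

Z = \int \prod_{n\mu} dU_{n,\mu} \prod_P e^{\beta\left( \frac{1}{N} \mathrm{Re\,Tr}(U_P) - 1 \right)}    (19.86)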
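
The gluing identity (19.84) is what makes every link integration in (19.85) tractable, so it is worth a numerical spot check. The following is a rough Monte Carlo sketch for the fundamental representation of SU(2) only, assuming the unit-quaternion parametrization U = a0 + i a·σ with Haar measure uniform on the 3-sphere; the sample size, seed and helper names are arbitrary choices, not anything prescribed by the text.

```python
import math
import random

def random_su2(rng):
    """Haar-random SU(2) element as a unit quaternion (a0, a1, a2, a3)."""
    while True:
        q = [rng.gauss(0.0, 1.0) for _ in range(4)]
        norm = math.sqrt(sum(c * c for c in q))
        if norm > 1e-12:
            return tuple(c / norm for c in q)

def multiply(p, q):
    """Group product for U = a0 + i a.sigma in quaternion components."""
    a0, a1, a2, a3 = p
    b0, b1, b2, b3 = q
    return (a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + a1 * b0 - (a2 * b3 - a3 * b2),
            a0 * b2 + a2 * b0 - (a3 * b1 - a1 * b3),
            a0 * b3 + a3 * b0 - (a1 * b2 - a2 * b1))

def dagger(p):
    return (p[0], -p[1], -p[2], -p[3])

def chi_fund(p):
    """Fundamental character: Tr U = 2 * a0."""
    return 2.0 * p[0]

def check_identity(samples=40000, seed=12345):
    """Return (Monte Carlo estimate of the left side, exact right side) of (19.84)
    for r = s = fundamental, d_r = 2, with V and W drawn once at random."""
    rng = random.Random(seed)
    v, w = random_su2(rng), random_su2(rng)
    total = 0.0
    for _ in range(samples):
        u = random_su2(rng)
        total += chi_fund(multiply(dagger(u), v)) * chi_fund(multiply(u, w))
    return total / samples, 0.5 * chi_fund(multiply(v, w))

if __name__ == "__main__":
    print(check_identity())
```

Agreement is only statistical, at the level of 1/sqrt(samples); an exact check would require a proper quadrature over the group manifold.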

The character expansion (19.81) can be inserted in (19.85), whereupon we obtain

\langle W(R,T) \rangle = \frac{1}{Z} \int \prod_{n\mu} dU_{n,\mu}\; \chi_F(U_{C(R,T)}) \prod_P \sum_{r_P} c_{r_P}\, \chi_{r_P}(U_P)    (19.87)

In the limit β → 0, the leading contribution to W (R, T ) comes from picking the
minimum set of non-trivial representations for each plaquette appearing in the product
over plaquettes in (19.87), compatible with obtaining a non-zero result for the integral
over links. At a minimum, we must include a full set of RT plaquettes “tiling”
the interior of the rectangular R × T contour given by the Wilson loop. Otherwise,
there will be unmatched link variables (appearing only once) which integrate to zero.
Moreover, this minimum set of plaquettes must all be associated with the fundamental
representation in order to obtain a non-zero result, as the boundary links appearing
in χF (UC(R,T ) ) are in this representation, and integrals over products of characters
in different representations vanish by Schur orthogonality, (19.84). The upshot of this
reasoning (see (Montvay and Münster, 1994), Section 3.4, for the gory combinatoric
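details) is that the Wilson loop expectation value in the strong coupling limit satisfies
(see footnote 9)

\langle W(R,T) \rangle \propto \left( \frac{c_F(\beta)}{d_F\, c_0(\beta)} \right)^{RT} \quad (\text{small } \beta)    (19.88)

For SU(2), for example, this becomes

\langle W(R,T) \rangle \propto u(\beta)^{RT}, \qquad u(\beta) = \frac{I_2(\beta)}{I_1(\beta)} \sim \frac{\beta}{4} + O(\beta^3)    (19.89)

i.e.

\langle W(R,T) \rangle \sim e^{-KRT} \equiv e^{-T V(R)}, \qquad V(R) = KR, \qquad K \equiv -\log(u(\beta)) > 0    (19.90)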
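
As a quick numerical illustration of (19.89) and (19.90), the sketch below evaluates u(β) = I_2(β)/I_1(β) from the Bessel power series and the corresponding leading-order string tension in lattice units. The function names are illustrative, and the result is only the leading strong-coupling term, not a statement about the full theory.

```python
import math

def bessel_i(n, x, terms=40):
    """Modified Bessel function I_n(x), integer n >= 0, from its power series."""
    return sum((x / 2.0) ** (2 * k + n) / (math.factorial(k) * math.factorial(k + n))
               for k in range(terms))

def u_su2(beta):
    """u(beta) = I_2(beta) / I_1(beta), as in (19.89)."""
    return bessel_i(2, beta) / bessel_i(1, beta)

def string_tension(beta):
    """Leading-order K = -log u(beta) in lattice units, as in (19.90)."""
    return -math.log(u_su2(beta))

def wilson_loop_leading(beta, R, T):
    """Leading strong-coupling behavior u(beta)**(R*T), up to the overall constant."""
    return u_su2(beta) ** (R * T)

if __name__ == "__main__":
    for beta in (0.05, 0.2, 1.0):
        print(beta, u_su2(beta), beta / 4.0, string_tension(beta))
```

Doubling either side of the loop doubles the exponent, which is the area law in its most elementary form.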

9 The factors appearing here originate as follows. Each plaquette variable integration provides a 1/d_F
factor, pursuant to (19.84), and a character coefficient c_F(β)/c_0(β), as in (19.81), with the overall
factors of c_0(β) cancelling between the numerator and denominator (Z) in (19.85).

The falloff of the Wilson loop as a (negative) exponential of the area RT of the loop
(the famous “area law”) clearly indicates a linear rise in the static potential V (R),
as defined in (19.79). The coefficient K appearing in (19.90) is commonly referred to
as the “string tension”. It has dimensions of force, and turns out in QCD to take the
interesting phenomenological value of approximately 15 metric tons: the gluon flux
“string” extending from a single isolated quark could support a rather large truck
(carrying the quantum numbers of a single anti-quark)!
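
The conversion behind the "metric tons" figure is a short piece of arithmetic. The sketch below assumes a commonly quoted phenomenological value for the square root of the string tension of about 0.44 GeV; that input is an assumption of this example, not a number taken from the text, and the constant names are illustrative. With it, the tension comes out near 1 GeV/fm, which corresponds to a weight in the mid-teens of metric tons.

```python
HBAR_C_GEV_FM = 0.1973269804      # hbar * c in GeV * fm
JOULE_PER_GEV = 1.602176634e-10   # one GeV expressed in joules
METRE_PER_FM = 1.0e-15            # one femtometre in metres
STANDARD_GRAVITY = 9.80665        # m / s^2

def string_tension_gev_per_fm(sqrt_sigma_gev=0.44):
    """Convert sigma from GeV^2 (natural units) to GeV/fm. The default input is an assumed value."""
    return sqrt_sigma_gev ** 2 / HBAR_C_GEV_FM

def string_tension_newtons(sqrt_sigma_gev=0.44):
    """String tension as a force in newtons."""
    return string_tension_gev_per_fm(sqrt_sigma_gev) * JOULE_PER_GEV / METRE_PER_FM

def string_tension_tonnes(sqrt_sigma_gev=0.44):
    """Mass in metric tons whose weight at the Earth's surface equals the string tension."""
    return string_tension_newtons(sqrt_sigma_gev) / STANDARD_GRAVITY / 1000.0

if __name__ == "__main__":
    print(string_tension_gev_per_fm(), string_tension_newtons(), string_tension_tonnes())
```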
The physical interpretation of the area-law dependence of the Wilson loop observ-
able in the strong coupling limit is not hard to uncover, given our earlier discussion
of confinement in the massive Schwinger model. From (19.65) we see that the lattice
plaquette variable Re Tr(Un;μν ) corresponds to the square of the μ, ν component of
the color field strength tensor (summed over colors to yield a gauge-invariant object).
If we take our Wilson loop in Fig. 19.6 to be oriented in the Euclidean “x-t” plane,
the plaquettes tiling the interior of the loop correspond to local values of the square
of the color electric field F14 (Euclidean version of F10 in Minkowski space) in the
x-direction. The necessity for including all these plaquettes in the expansion of the
action (in order to obtain a non-vanishing contribution to the functional integral)
amounts to the imposition of a non-abelian Gauss’s Law whereby color flux originating
on the quark must make its way to the anti-quark.
However, if we probe for the presence of color electric flux elsewhere in the volume
of our lattice, by inserting, say, additional plaquette variables somewhere off the plane
of the loop in the observable being measured in the path integral to check for the
presence of electric flux elsewhere, we find that our result in (19.88) is suppressed by
additional factors of β, which is, of course, small in the strong coupling limit under
consideration. Similarly, subdominant contributions to the Wilson loop expectation are
obtained from tilings in which the set of plaquettes bordered by the Wilson loop “bulge
away” from the plane of the loop, corresponding to color electric flux “straying” away
from the straight line connecting quark to anti-quark. In fact, the appearance of exactly
linear confinement in 1+1-dimensional gauge theory is now seen to follow simply
from the kinematic impossibility of such straying when only one spatial dimension
is present: indeed, one can easily show that the result (19.88) is exact (for all β) in
1+1-dimensional gauge theory (see Problem 7).
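
One way to make the exactness in 1+1 dimensions plausible is the following sketch. It assumes free boundary conditions, so that the plaquette variables tiling the loop can be treated as independent group elements (a standard property of two-dimensional lattice gauge theory that is not derived in the text), and it uses SU(2) with the class-function measure (2/π) sin²θ dθ. The normalized single-plaquette average of (1/2) Tr U_P then reproduces u(β) = I_2(β)/I_1(β) at any β, not just small β, and an R × T loop is the RT-th power of it.

```python
import math

def bessel_i(n, x, terms=80):
    """Modified Bessel function I_n(x), integer n >= 0, from its power series."""
    return sum((x / 2.0) ** (2 * k + n) / (math.factorial(k) * math.factorial(k + n))
               for k in range(terms))

def plaquette_average(beta, steps=4000):
    """<(1/2) Tr U_P> for one SU(2) plaquette with weight exp(beta * cos(theta)),
    using the assumed Weyl measure; the constant factor exp(-beta) cancels in the ratio."""
    h = math.pi / steps
    numerator = 0.0
    denominator = 0.0
    for k in range(steps):
        theta = (k + 0.5) * h
        w = math.sin(theta) ** 2 * math.exp(beta * (math.cos(theta) - 1.0))
        numerator += w * math.cos(theta)
        denominator += w
    return numerator / denominator

def normalized_loop_2d(beta, R, T):
    """<W(R,T)> / d_F under the independent-plaquette assumption."""
    return plaquette_average(beta) ** (R * T)

if __name__ == "__main__":
    for beta in (0.5, 2.0, 8.0):
        print(beta, plaquette_average(beta), bessel_i(2, beta) / bessel_i(1, beta))
```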
A number of rigorous results have been established for the behavior of the Wilson
loop observable in lattice gauge theories. Two results of particular interest can be
stated here:

1. The strong coupling expansion (unlike the weak coupling expansions of per-
turbation theory, which as we saw in Chapter 11 are at best only asymptotic
expansions) is a Taylor expansion: in other words, the lattice observables are
analytic in β at β = 0. Moreover, for at least some finite range of β, the string
tension K(β) is non-vanishing (Osterwalder and Seiler, 1978).
2. The static potential V(R) defined by the limit (19.79) cannot rise more rapidly
than linearly with R at large R (Seiler, 1978): the area law is in this sense maximal.
As we must expect at least constant terms in V(R), Euclidean symmetry
implies that a perimeter law ⟨W(R, T)⟩ < e^{−C(T+R)}, for some constant C, is
minimal.

As we have emphasized previously (see the discussion following (19.67)), the
continuum limit of the lattice-regularized gauge theory is expected to appear in the
limit in which the bare coupling g(a) is taken to zero as the lattice spacing a is taken to
zero: in other words, in the limit of β = N/g(a)^2 going to infinity. Thus the appearance
of an area law at small β tells us absolutely nothing about the persistence of a linearly
rising potential between static quarks once we insist on the validity of our stated
action functional down to length scales much smaller than those implied by the low-
energy parameters of the theory (mass of lightest hadron, string tension, etc.). Once
β becomes large, there is a vast increase in the number of contributing terms in the
strong coupling expansion of the Wilson loop observable, corresponding to “tilings” of
the loops by essentially arbitrary surfaces of plaquettes bounded by the loop. In other
words, field configurations in the path integral become important which correspond to
the spreading out of the electric color flux lines from the “frozen” string going directly
from the quark to the anti-quark (the so-called “roughening transition”), which clearly
have the tendency to weaken the static potential felt by the quark pair.
Indeed, the strong coupling expansion of the loop observable in an abelian U(1)
gauge theory also displays a linearly rising static potential (one simply has
u(β) = I_1(β)/I_0(β) in (19.89)), despite the fact that the continuum limit should yield
a free pure gauge theory with the usual quadratic gauge action F_{μν}^2, and a static
Coulomb potential between charges, V(R) ∝ 1/R. In the abelian case there exists a
rigorous proof (due to Guth (Guth, 1980)) that there is a phase transition at a finite
critical coupling β_c ∼ 1 such that the string tension K defined in (19.90) vanishes
identically for β > β_c, leaving a perimeter law (as e^{−(A+B/R)T} → e^{−AT}, T >> R)
corresponding exactly to the desired Coulomb behavior at long distances, as long as we
stay on the weak-coupling (large β) side of the transition.
For non-abelian gauge theories, we instead expect (or at least, fervently hope)
that the string tension K remains non-zero for arbitrarily large β, and in fact scales
appropriately with β so that the coefficient K of the linear potential, once converted
to physical units (say, in units determined by the mass gap, or lightest hadron, of the
theory) remains finite as we take the lattice spacing to zero. Clearly, this behavior will
depend crucially on the highly non-trivial self-interacting dynamics of a non-abelian
pure gauge theory, which must somehow act to “focus” the lines of electric color flux
along the “string” connecting the static quark–antiquark pair. In the next section we
shall show how a plausible mechanism for such focussing can be displayed in a simpler
model: gauge theory in three spacetime (two space, one time) dimensions.

19.4 How confinement works: three-dimensional gauge theory


The lattice-regularization of gauge theory, together with a strong coupling expansion,
as introduced by Wilson and described in the preceding section, gives a hint of the sort
of mechanism which might be responsible for a linear rising potential between static
quarks. We have seen that in the extreme strong coupling limit, the Euclidean path
integral describing the propagation of a quark–antiquark pair over long Euclidean
times is dominated by field configurations corresponding to “color” electric flux
concentrated on the line between the quarks. The result is an area-law dependence of
the Wilson loop observable.

An analytically precise, and physically intuitive, description of how this happens
is available in at least one non-trivial case: compact abelian gauge theory in 2+1
spacetime dimensions. For this theory, a full semiclassical account of confinement has
been developed, beginning with the work of Polyakov (Polyakov, 1977) and Banks et al.
(Banks et al., 1977), with important elaborations by Göpfert and Mack (Göpfert and
Mack, 1982). As in four spacetime dimensions, there is no distinction in the confining
behavior at strong coupling of abelian and non-abelian gauge theories in 2+1 dimen-
sions: both display a linearly rising inter-quark potential for small β. However, we
expect the continuum limit of the abelian theory in 2+1 dimensions to yield the usual
perturbative Coulomb potential (Green function of the two-dimensional Laplacian
operator), i.e., a potential which rises logarithmically, rather than linearly, at large
distance. On the other hand, numerical simulations provide convincing evidence that
the linear rise (=area law) persists for non-abelian gauge theories in 2+1 dimensions
for lattice spacings much smaller than the scale of the theory, as set by the inverse
squared coupling (in 2+1 dimensions, g has dimensions mass^{1/2}). In this section we
shall present an account of confinement in 2+1-dimensional gauge theories, using an
interpolating model, which contains both the abelian and non-abelian (with gauge
group SU(2)) theories as special limiting cases.
We begin with a lattice action built from an SU(2) gauge field, realized on the
lattice with link variables Un,μ ∈ SU(2), and an isovector (i.e., in the three-vector
representation of SU(2)) scalar field φn which is constrained to be unit length, φn ·
φn = 1. We use the Villain form for the SU(2) gauge action, with coupling βg , and
introduce a second coupling constant βh to indicate the strength of the scalar-gauge
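coupling:

S_V = -\sum_P \log \sum_{m=-\infty}^{+\infty} \exp\left\{ -\frac{\beta_g}{2} \left( \arccos\left( \tfrac{1}{2} \mathrm{Tr}(U_P) \right) - 2m\pi \right)^2 \right\} + \frac{\beta_h}{2} \sum_{n\mu} \mathrm{Tr}\left( \vec\phi_n \cdot \vec\sigma\; U_{n,\mu}\; \vec\phi_{n+\hat\mu} \cdot \vec\sigma\; U^{\dagger}_{n,\mu} \right)    (19.91)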

where we use the simplified index notation P for plaquettes (thus, P runs over {n, μν}
with μ < ν).10 In the limit of vanishing βh , we are left with a pure non-abelian
gauge theory (in this case, with gauge group SU(2)). On the other hand, when βh
is taken to infinity, the theory becomes a pure abelian U(1) gauge theory. We can
see this by using the local SU(2) gauge symmetry on each lattice site to rotate the
φ field to the 3-direction, so that the second term in (19.91) becomes a sum over
links of β_h Tr(σ_3 U_{n,μ} σ_3 U†_{n,μ}), which for β_h → ∞ forces the non-abelian link
variables to collapse onto the U(1) subgroup given by U_{n,μ} = exp(iσ_3 θ_{n,μ}), at which
point we have precisely the abelian Villain model specified earlier in (19.69, 19.70). The
model therefore interpolates smoothly between lattice-regularized unbroken (compact)

10 This is a lattice-regularized version of the so-called Georgi–Glashow model, with an adjoint scalar field
coupled to non-abelian gauge vector fields, in the regime where the isovector “Higgs” field develops a
non-vanishing vacuum-expectation value. The “frozen” magnitude of the scalar field can be viewed as arising
from a scalar potential P(φ) = λ(φ^2 − 1)^2 in the limit of large quartic coupling, λ → ∞.

abelian and non-abelian gauge theories. From the discussion in the preceding section
we know that linear confinement (an area law for the Wilson loop observable)
obtains in both cases in the strong coupling expansion, and indeed, it is easy to
verify, for all finite values of βh , that we obtain a non-vanishing string tension for
small βg .
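
The collapse onto the U(1) subgroup at large β_h can be seen directly from the link trace. The sketch below builds an SU(2) matrix from a unit quaternion (a parametrization chosen here for convenience, with hypothetical helper names) and evaluates Tr(σ_3 U σ_3 U†), which works out to 2(a0² + a3² − a1² − a2²). The trace reaches its extreme value 2 exactly when a1 = a2 = 0, that is for U = exp(iσ_3 θ); which sign convention makes that extreme value the dominant one is left to the conventions of (19.91).

```python
import math
import random

SIGMA3 = [[1, 0], [0, -1]]

def su2_matrix(a0, a1, a2, a3):
    """U = a0 + i (a1 sigma1 + a2 sigma2 + a3 sigma3) for a unit 4-vector (a0, a1, a2, a3)."""
    return [[complex(a0, a3), complex(a2, a1)],
            [complex(-a2, a1), complex(a0, -a3)]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def dagger(A):
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

def hopping_trace(U):
    """Tr(sigma3 U sigma3 U^dagger), the link term obtained with phi along the 3-direction."""
    M = matmul(matmul(SIGMA3, U), matmul(SIGMA3, dagger(U)))
    return (M[0][0] + M[1][1]).real

def random_unit_quaternion(rng):
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    return [c / norm for c in q]

if __name__ == "__main__":
    rng = random.Random(1)
    print(hopping_trace(su2_matrix(math.cos(0.7), 0.0, 0.0, math.sin(0.7))))
    print(hopping_trace(su2_matrix(*random_unit_quaternion(rng))))
```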
In this interpolating model there is substantial analytic and numerical evidence
that linear confinement persists for all values of the lattice gauge parameter βg . For
example, one can determine numerically an “isotonic” line of constant string tension in
the (βg , βh ) plane (see Fig. 19.7) with just the qualitative features expected from the
semiclassical model of monopole confinement which we shall shortly discuss (Duncan
and Mawhinney, 1990). The essential point is that the physics of confinement evolves
smoothly from the purely abelian case, where we have a mathematically rigorous
and fairly complete physical picture of the confining mechanism, to the much more
complicated non-abelian case, where a proper analytic treatment does not exist.
We begin by examining the physical mechanism responsible for linear confinement
at the abelian end (βh → ∞), where the theory reduces to the Villain model. The path
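integral giving the desired Wilson loop observable is just (see Problem 8)

\langle W \rangle = \int \prod_{n\mu} \frac{d\theta_{n,\mu}}{2\pi} \sum_{\{l_P\}} \exp\left( -\frac{1}{2\beta_g} \sum_P l_P^2 + i \sum_P l_P \theta_P + i \sum_{n\mu} J_{n,\mu} \theta_{n,\mu} \right)    (19.92)
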

where the link angle variables θ_{n,μ} are continuous, and the plaquette variables l_P are
integer valued. The abelian Wilson loop is obtained by setting the lattice vector field
J_{n,μ} equal to unity on all the ordered links comprising the perimeter of the Wilson
loop (see Fig. 19.6), and zero on all other links. Also, we have used a shorthand index
notation for plaquettes, where P indicates the plaquette in the μ, ν plane with the site
n in the lower left-hand corner, l_P = l_{n;μν}.

Fig. 19.7 A line of constant string tension in the (β_g, β_h) plane (axes: β_g and β_h).

Performing a discrete integration by parts, we may write the second term in the exponent as

\sum_P l_P \theta_P = \sum_{n\mu\nu} \theta_{n\nu}\, \bar\Delta_\mu l_{n;\mu\nu}    (19.93)

Here, right (resp. left) discrete difference operators Δμ (resp. Δ̄μ ), which interconvert
under a discrete integration by parts, are defined as follows

Δμ φn ≡ φn+μ̂ − φn , Δ̄μ φn ≡ φn − φn−μ̂ (19.94)
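
The two properties of these difference operators that are used repeatedly below can be checked on a small periodic lattice. This is a minimal one-dimensional sketch with illustrative function names: summation by parts converts a forward difference acting on one field into minus a backward difference acting on the other, and the composition of the two is the usual second difference.

```python
import random

def forward(f):
    """Right difference: (Delta f)_n = f_{n+1} - f_n on a periodic chain."""
    n = len(f)
    return [f[(i + 1) % n] - f[i] for i in range(n)]

def backward(f):
    """Left difference: (Delta-bar f)_n = f_n - f_{n-1} on a periodic chain."""
    n = len(f)
    return [f[i] - f[(i - 1) % n] for i in range(n)]

def dot(f, g):
    return sum(a * b for a, b in zip(f, g))

def second_difference(f):
    n = len(f)
    return [f[(i + 1) % n] - 2 * f[i] + f[(i - 1) % n] for i in range(n)]

if __name__ == "__main__":
    rng = random.Random(0)
    f = [rng.randint(-5, 5) for _ in range(12)]
    g = [rng.randint(-5, 5) for _ in range(12)]
    print(dot(f, forward(g)), -dot(backward(f), g))
    print(backward(forward(f)) == second_difference(f))
```

Integer-valued fields are used so that both identities hold exactly rather than up to rounding.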


The integrations over the link angles in (19.92) can now be performed, yielding a single
(Kronecker) δ-function constraint for each link:

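\int \frac{d\theta_{n,\nu}}{2\pi}\, e^{\, i J_{n,\nu} \theta_{n,\nu} + i \theta_{n,\nu} \bar\Delta_\mu l_{n;\mu\nu}} = \delta(\bar\Delta_\mu l_{n;\mu\nu} + J_{n,\nu})    (19.95)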

Note that despite the continuum δ-function notation, the lattice fields appearing on
the right all take integer values, so the constraint is of the Kronecker type. A general
solution of the constraint for the plaquette variable ln;μν can be written as a sum of the
general solution ε_{μνλ} Δ̄_λ φ_n (where φ_n is an arbitrary integer-valued site field) of the
homogeneous equation Δ̄_μ l_{n;μν} = 0, and any particular solution of the inhomogeneous
equation Δ̄_μ l_{n;μν} = −J_{n,ν}. To obtain a solution to the latter, we choose a unit vector ζ
pointing along either axis of the Wilson loop (for definiteness, say along the x-direction
for a Wilson loop oriented in the x-y plane), and define an inverse operator as follows:

\frac{1}{\zeta \cdot \bar\Delta}\, s_n \equiv \sum_{m_x = 0}^{n_x} s_{m_x n_y n_z}    (19.96)

The right-hand side represents a well-defined and periodic integer-valued site field
provided the site field sn sums to zero when accumulated across the lattice in the ζ
(i.e., x) direction. One then finds that a general solution of Δ̄μ ln;μν = −Jn,ν may be
written, using current conservation Δ̄μ Jn,μ = 0,
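l_{n;\mu\nu} = \zeta_\nu \frac{1}{\zeta \cdot \bar\Delta} J_{n,\mu} - \zeta_\mu \frac{1}{\zeta \cdot \bar\Delta} J_{n,\nu} + \epsilon_{\mu\nu\lambda} \bar\Delta_\lambda \phi_n    (19.97)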
where the first two terms on the right-hand side are the desired particular solution.
The inverse operators in (19.97) are well-defined, as only the current in the plane of
the Wilson loop orthogonal to ζ appears (say, the y-direction), by antisymmetry, and
this component of the current clearly sums to zero across the lattice in the ζ direction
(by combining current contributions from opposite sides of the loop). Using the δ
constraint (19.95) in the Wilson loop average (19.92), and converting the sums over
the integer valued site field φn into integrals over a real-valued site field χn via the
Poisson identity,

\sum_{\phi=-\infty}^{+\infty} f(\phi) = \sum_{\rho=-\infty}^{+\infty} \int d\chi\, f(\chi)\, e^{2\pi i \rho \chi}    (19.98)

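we find an equivalent expression for the Wilson loop average:

\langle W \rangle = \sum_{\{\rho_n\}} \int \prod_n d\chi_n \exp\left( -\frac{1}{4\beta_g} \sum_{n\mu\nu} \left( \zeta_\nu \frac{1}{\zeta \cdot \bar\Delta} J_{n,\mu} - \zeta_\mu \frac{1}{\zeta \cdot \bar\Delta} J_{n,\nu} + \epsilon_{\mu\nu\lambda} \bar\Delta_\lambda \chi_n \right)^2 \right) \cdot \exp\left( +2\pi i \sum_n \rho_n \chi_n \right)    (19.99)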
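
The Poisson identity (19.98) that trades the integer field for a real field plus an integer "dual" field can be checked in the simplest setting of a single Gaussian variable. In the sketch below, the test function f(χ) = exp(−aχ²) and the truncation of both sums are choices made for this illustration; the Fourier integral of the Gaussian is done in closed form, sqrt(π/a) exp(−π²ρ²/a).

```python
import math

def integer_sum(a, cutoff=60):
    """Left side of the Poisson identity: sum over integers phi of exp(-a * phi^2)."""
    return sum(math.exp(-a * phi * phi) for phi in range(-cutoff, cutoff + 1))

def dual_sum(a, cutoff=60):
    """Right side: sum over integers rho of the Fourier integral of exp(-a * chi^2),
    which equals sqrt(pi / a) * exp(-pi^2 * rho^2 / a)."""
    prefactor = math.sqrt(math.pi / a)
    return sum(prefactor * math.exp(-math.pi ** 2 * rho * rho / a)
               for rho in range(-cutoff, cutoff + 1))

if __name__ == "__main__":
    for a in (0.1, 1.0, 5.0):
        print(a, integer_sum(a), dual_sum(a))
```

For small a the integer sum needs many terms while the dual sum is dominated by ρ = 0, and vice versa for large a; the same trade-off is what makes the monopole representation useful at large β_g.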

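The Gaussian integral over χ_n can be performed explicitly:

\int \prod_n d\chi_n\, e^{ \sum_n \left[ -\frac{1}{2\beta_g} \chi_n (-\Delta) \chi_n + \chi_n \left( 2 i \pi \rho_n - \frac{1}{\beta_g} \sigma_n \right) \right] }
= e^{ -2\pi^2 \beta_g \sum_{nm} \rho_n (-\Delta^{-1})_{nm} \rho_m + 2\pi i \sum_n \rho_n \Delta^{-1} \sigma_n } \cdot e^{ -\frac{1}{2\beta_g} \sum_n \sigma_n \Delta^{-1} \sigma_n }    (19.100)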

where we have defined


\sigma_n = \epsilon_{\mu\nu\lambda}\, \zeta_\mu\, \Delta_\lambda \frac{1}{\zeta \cdot \bar\Delta} J_{n\nu}    (19.101)
Here Δ is the lattice Laplacian

Δ = Δ̄μ Δμ (19.102)

whose (negative) inverse gives the lattice Coulomb potential


\Delta_{nr}\, v^{\rm coul}_{rm} = -\delta_{nm}    (19.103)
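
For orientation, the lattice Coulomb potential can be tabulated on a small periodic lattice by a momentum-space sum. The sketch below is an illustration under an extra assumption not made in the text: on a finite periodic lattice the Laplacian has a zero mode, which is simply dropped here, so the defining relation (19.103) holds only up to a uniform background 1/V. The lattice size and function names are arbitrary.

```python
import itertools
import math

def coulomb_potential(L):
    """v(r) on an L^3 periodic lattice from the Fourier sum over non-zero momenta.
    Dropping the zero mode is an assumption of this sketch."""
    sites = list(itertools.product(range(L), repeat=3))
    volume = L ** 3
    v = {}
    for r in sites:
        total = 0.0
        for k in sites:
            if k == (0, 0, 0):
                continue
            p = [2.0 * math.pi * k[i] / L for i in range(3)]
            eigenvalue = sum(2.0 - 2.0 * math.cos(pi) for pi in p)  # eigenvalue of -Laplacian
            total += math.cos(sum(p[i] * r[i] for i in range(3))) / eigenvalue
        v[r] = total / volume
    return v

def laplacian_at(v, r, L):
    """Lattice Laplacian (backward difference of forward difference) applied to v at site r."""
    total = 0.0
    for mu in range(3):
        up = list(r)
        up[mu] = (r[mu] + 1) % L
        down = list(r)
        down[mu] = (r[mu] - 1) % L
        total += v[tuple(up)] + v[tuple(down)] - 2.0 * v[r]
    return total

if __name__ == "__main__":
    L = 4
    v = coulomb_potential(L)
    print(v[(0, 0, 0)], laplacian_at(v, (0, 0, 0), L), laplacian_at(v, (1, 2, 3), L))
```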

The site field ρn in fact describes a gas of magnetic monopoles interacting with the
electric current loop carried by Jn,μ . To see this, observe that an electric current Jμ
produces a magnetic B-field via the curl operation

J_\mu = \epsilon_{\mu\nu\lambda} \bar\Delta_\nu B_\lambda    (19.104)

Setting
B_\lambda = \epsilon_{\lambda\nu\mu}\, \zeta_\mu \frac{1}{\zeta \cdot \bar\Delta} J_\nu    (19.105)
one easily finds that (19.104) is satisfied. We now see that the term coupling the
ρ_n field to the Wilson loop current J_ν (the second term in the first exponential in
(19.100)) is proportional to

\sum_n \rho_n \Delta^{-1} \sigma_n = \sum_{nm} \rho_n\, v^{\rm coul}_{nm}\, (\Delta_\lambda B_\lambda)_m    (19.106)

In other words, the objects whose density is represented by the ρn field couple
Coulombically to the divergence of the magnetic field generated by the current loop,
and may therefore be properly regarded as magnetic monopoles. The first term in
the exponent (quadratic in ρn ) is just the Coulombic interaction energy of this gas of
monopoles.
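
For readers who want the intermediate algebra, the two claims just made, that (19.105) solves (19.104) and that σ_n is minus the lattice divergence of B, can be worked out as follows. This is a sketch using only the definitions (19.101), (19.103) and (19.105), the fact that the difference operators commute, ζ·ζ = 1, and current conservation \bar\Delta_\alpha J_\alpha = 0; the dummy indices α, β are introduced here for the contraction.

```latex
\begin{align*}
\epsilon_{\mu\nu\lambda}\,\bar\Delta_\nu B_\lambda
  &= \epsilon_{\mu\nu\lambda}\,\epsilon_{\lambda\alpha\beta}\;
     \zeta_\beta\,\bar\Delta_\nu\,\frac{1}{\zeta\cdot\bar\Delta}\,J_\alpha
   = \left(\delta_{\mu\alpha}\delta_{\nu\beta}-\delta_{\mu\beta}\delta_{\nu\alpha}\right)
     \zeta_\beta\,\bar\Delta_\nu\,\frac{1}{\zeta\cdot\bar\Delta}\,J_\alpha \\
  &= (\zeta\cdot\bar\Delta)\,\frac{1}{\zeta\cdot\bar\Delta}\,J_\mu
     \;-\;\zeta_\mu\,\frac{1}{\zeta\cdot\bar\Delta}\,\bar\Delta_\alpha J_\alpha
   = J_\mu , \\[6pt]
\Delta_\lambda B_\lambda
  &= \epsilon_{\lambda\nu\mu}\,\zeta_\mu\,\Delta_\lambda\,\frac{1}{\zeta\cdot\bar\Delta}\,J_\nu
   = -\,\epsilon_{\mu\nu\lambda}\,\zeta_\mu\,\Delta_\lambda\,\frac{1}{\zeta\cdot\bar\Delta}\,J_\nu
   = -\,\sigma , \\[6pt]
\sum_n \rho_n\,\Delta^{-1}\sigma_n
  &= \sum_{nm}\rho_n\,\bigl(-v^{\rm coul}_{nm}\bigr)\,\bigl(-\Delta_\lambda B_\lambda\bigr)_m
   = \sum_{nm}\rho_n\,v^{\rm coul}_{nm}\,(\Delta_\lambda B_\lambda)_m .
\end{align*}
```

The sign in the second line comes from exchanging the first and last indices of the antisymmetric symbol, and the last line uses Δ^{-1} = −v^{coul} from (19.103).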

To summarize, we have shown that the Wilson loop average in (19.92) factorizes
exactly into the product of two terms,

\langle W \rangle = \langle W \rangle_{\rm mon} \cdot \langle W \rangle_{\rm SW}    (19.107)

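where the first term describes the interaction of a magnetic monopole gas (with
monopole density ρ_n) with an electric current J_{n,μ} running around the Wilson loop
(which generates the field σ_n),

\langle W \rangle_{\rm mon} = \sum_{\{\rho_n\}} e^{ -2\pi^2 \beta_g \sum_{nm} \rho_n (-\Delta^{-1})_{nm} \rho_m + 2\pi i \sum_n \rho_n \Delta^{-1} \sigma_n }    (19.108)

and a second “spin-wave” term which is an explicit functional of the current J_{n,μ}:

\langle W \rangle_{\rm SW} = e^{ -\frac{1}{4\beta_g} \sum_{n\mu\nu} \left( \zeta_\nu \frac{1}{\zeta \cdot \bar\Delta} J_{n,\mu} - \zeta_\mu \frac{1}{\zeta \cdot \bar\Delta} J_{n,\nu} \right)^2 - \frac{1}{2\beta_g} \sum_n \sigma_n \Delta^{-1} \sigma_n }    (19.109)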

This second, rather complicated-looking expression is just the usual (electrostatic)
Coulomb term in disguise. This is most easily seen by choosing a specific orientation
for the Wilson loop (say in the x-y plane) and setting ζ_μ = δ_{μ1}. In this case, only J_{n1}
and J_{n2} are non-zero. The exponent in (19.109) is then seen to become (apart from a
prefactor −1/(2β_g))

\sum_n \left( \bar\Delta_1^{-1} J_{n2}\; \bar\Delta_1^{-1} J_{n2} + \Delta_3 \bar\Delta_1^{-1} J_{n2}\; \Delta^{-1} \Delta_3 \bar\Delta_1^{-1} J_{n2} \right)    (19.110)

= \sum_{nm} J_{n2} \left( -\Delta_1^{-1} \bar\Delta_1^{-1} + \Delta_1^{-1} \bar\Delta_3\, \Delta^{-1} \Delta_3 \bar\Delta_1^{-1} \right) J_{m2}    (19.111)

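However, as the difference operators all commute, one may easily check that

-\Delta_1^{-1} \bar\Delta_1^{-1} + \Delta_1^{-1} \bar\Delta_3\, \Delta^{-1} \Delta_3 \bar\Delta_1^{-1} = -\Delta^{-1} - \Delta_1^{-1} \Delta_2\, \Delta^{-1} \bar\Delta_2 \bar\Delta_1^{-1}    (19.112)

Finally, using current conservation,

\bar\Delta_2 \bar\Delta_1^{-1} J_2 = -J_1    (19.113)

and we obtain

\langle W \rangle_{\rm SW} = e^{ \frac{1}{2\beta_g} \sum_{nm} J_{n,\mu} \Delta^{-1}_{nm} J_{m,\mu} } = e^{ -\frac{1}{2\beta_g} \sum_{nm} J_{n,\mu}\, v^{\rm coul}_{nm}\, J_{m,\mu} }    (19.114)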
This contribution to the Wilson loop is precisely (see Problem 9) the term that one
expects to survive in the continuum limit of the pure abelian theory, and cannot
therefore be responsible for an area-law behavior of the loop average in the limit
of large loops (recalling that the Coulomb potential in two space dimensions only
rises logarithmically, not linearly, at large distance). Returning to the monopole
contribution (19.108), note that with the x-y orientation of the Wilson loop chosen
above, the σn field (cf. (19.101)) can be written as the gradient in the z-direction
orthogonal to the loop of a uniform density localized on the plane interior of the loop:


\sigma_m = \Delta_3 \bar\Delta_1^{-1} J_{m,2} = \Delta_3 \sum_{n_x = 0}^{m_x} J_{n_x m_y m_z, 2}    (19.115)

In other words, the monopole density in (19.108) couples Coulombically to a magnetic
potential corresponding to a dipole sheet localized on the plane of the Wilson loop,
exactly as we would expect for an electric current loop traversing the perimeter.
If we view the sum over monopole fields ρn in (19.108) as a finite temperature
partition function for an electric loop interacting with a gas of magnetic monopoles,
it is perfectly plausible that monopoles will tend to “condense” on the dipole sheet,
producing a screening of the magnetic field produced by the current loop, and giving
rise to a shift in energy of the system proportional to the area of the loop—in other
words, to an area law for the Wilson loop observable. The string tension corresponds
to the coefficient of the area, and will be suppressed exponentially at large β_g due to
the fact that the individual monopoles come with a self-energy proportional to β_g v^{coul}_{nn},
so that in this limit the density of the monopole gas dies exponentially, and we are left
with just the “spin-wave” (i.e., Coulombic) contribution discussed above. For further
details on the abelian theory, the reader is encouraged to consult (Göpfert and Mack,
1982) for rigorous estimates, or (Duncan and Mawhinney, 1990) for a semiclassical
evaluation of the monopole term employing the techniques of Debye–Hückel screening
theory.
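
The statement that the running sum in (19.115) is a uniform density on the interior of the loop, and that the particular solution in (19.97) satisfies the link constraint, can be made concrete on a small lattice. The sketch below restricts to the two-dimensional plane of the loop with ζ along x; the lattice size, loop corners, orientation and function names are choices made for this illustration, and the loop is assumed not to touch the lattice boundary.

```python
def wilson_loop_current(L, x0, x1, y0, y1):
    """Unit current on the links of a counterclockwise rectangle.
    J[0][x][y] lives on the x-link leaving site (x, y); J[1][x][y] on the y-link."""
    J = [[[0] * L for _ in range(L)] for _ in range(2)]
    for x in range(x0, x1):
        J[0][x][y0] += 1   # bottom edge, traversed in +x
        J[0][x][y1] -= 1   # top edge, traversed in -x
    for y in range(y0, y1):
        J[1][x1][y] += 1   # right edge, traversed in +y
        J[1][x0][y] -= 1   # left edge, traversed in -y
    return J

def running_sum_x(field, L):
    """Inverse of the backward x-difference as in (19.96): sum from m_x = 0 to n_x."""
    out = [[0] * L for _ in range(L)]
    for y in range(L):
        acc = 0
        for x in range(L):
            acc += field[x][y]
            out[x][y] = acc
    return out

def particular_solution(J, L):
    """l_{12} from the first two terms of (19.97) with zeta = x-hat: l_{12} = -K_2."""
    K2 = running_sum_x(J[1], L)
    return [[-K2[x][y] for y in range(L)] for x in range(L)]

def constraint_residual(l12, J, L):
    """Largest |backward_mu l_{mu nu} + J_nu| over all sites and both nu."""
    worst = 0
    for x in range(L):
        for y in range(L):
            r2 = (l12[x][y] - l12[(x - 1) % L][y]) + J[1][x][y]    # nu = 2
            r1 = -(l12[x][y] - l12[x][(y - 1) % L]) + J[0][x][y]   # nu = 1, l_21 = -l_12
            worst = max(worst, abs(r1), abs(r2))
    return worst

if __name__ == "__main__":
    L = 8
    J = wilson_loop_current(L, 2, 5, 1, 4)
    l12 = particular_solution(J, L)
    print(sum(map(sum, l12)), constraint_residual(l12, J, L))
```

With this orientation the plaquette field equals one on exactly the plaquettes tiling the interior of the loop and zero elsewhere, which is the lattice form of the dipole sheet.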
So far, we have been discussing the mechanism for inducing a linearly rising
confining potential in the compact abelian U(1) gauge theory corresponding to the
βh → ∞ limit of the interpolating model specified by the lattice action (19.91). For
finite βh > 0, we are dealing with a theory with an exact local SU(2) gauge symmetry,
broken spontaneously to the U(1) subgroup under which the “Higgs” isovector field is
invariant. In such theories, in four spacetime dimensions, it was shown a long time ago,
by ’t Hooft (’t Hooft, 1974), and independently by Polyakov (Polyakov, 1974), that
there exist static solutions to the coupled scalar-gauge field equations corresponding to
field configurations carrying non-zero magnetic charge and finite mass: the
famous ’t Hooft–Polyakov monopoles.
A time-independent (but spatially varying) solution of the 3+1-dimensional theory
amounts, of course, to a solution of the field equations of the Euclidean version of
the scalar-gauge theory in one less dimension (i.e., precisely the 2+1-dimensional
theory we are considering here), where the finite mass of the monopole solutions in
3+1 dimensions translates to a configuration with finite (and, for a solution of the
field equations, locally extremal) Euclidean action. These “instanton”-type Euclidean
configurations are the non-abelian deformations of the abelian monopoles revealed
by the transformations described above in the compact abelian limit. They can be
exposed, and studied numerically, by a “cooling” procedure whereby scalar-gauge
configurations on the lattice generated by Monte Carlo simulation techniques are
subjected to a stochastic process which dampens all fluctuations on the scale of a
lattice spacing, revealing the underlying “classical” configurations. The upshot is that
the non-abelian monopoles are found to persist all the way down to βh = 0—the limit
in which we have a pure, unbroken non-abelian SU(2) gauge theory. The isotones—
lines of constant string tension—of the interpolating theory are completely smooth
curves (see Fig. 19.7) in the βg − βh plane, as we indicated earlier, and the shape of
the curves can even be understood qualitatively from known properties of ’t Hooft–
Polyakov monopoles (see (Duncan and Mawhinney, 1990)). The essential difference
between the behavior of the theory at the abelian (βh → ∞) and non-abelian (βh → 0)
ends lies simply in the fact that in the latter case the monopole action no longer
diverges for large βg : instead, the monopole density remains finite in the continuum
limit, corresponding to the persistence of the area law, and linear confinement on
distance scales much larger than the lattice cutoff. However, in the small βh regime
the monopole cores grow to the point where they overlap, and we no longer have a
beautiful and analytically tractable transcription of the theory in terms of a dilute
monopole gas as in the abelian case.
In four spacetime dimensions, an analogous treatment of the Villain version of
compact U(1) gauge theory (see (Banks et al., 1977)) shows that in the strong-coupling
regime (i.e., for βg smaller than the critical coupling shown by Guth (Guth, 1980) to
mark the transition point to a non-confining phase) an area law is obtained as a con-
sequence of the appearance of magnetic vortices—i.e., closed loops carrying magnetic
current—which interlace with the electric current loop corresponding to the Wilson
loop observable. These Euclidean configurations correspond in Minkowski space to vir-
tual events in which magnetic monopole/anti-monopole pairs appear and subsequently
annihilate. The concomitant large fluctuations in the local magnetic charge density
lead to a suppression of the electric field in the bulk, and a “focussing” of the (by
Gauss’s Law, necessarily conserved) electric flux travelling between opposite electric
charges onto a “flux tube” connecting the charges, with an energy cost proportional to
the length of the tube. The situation is precisely the “dual” (in the sense of interchange
of electric and magnetic fields) of the Meissner effect in superconductivity. There, large
fluctuations in the local electric charge density in the superconducting state lead to
a suppression of magnetic field in the bulk of the superconductor (recall from Section
8.1 that electric and magnetic fields are complementary quantities, with corresponding
mutual uncertainty constraints). Indeed, if we had actual magnetic monopoles at our
disposal, then inserting an oppositely (magnetically) charged pair into the bulk volume
of a superconductor would lead precisely to the formation of a string of magnetic
flux connecting the two, with an energy rising linearly with their separation. This
is the “dual Meissner” interpretation of quark confinement in QCD, which remains,
nearly forty years after its introduction, a perfectly reasonable qualitative picture of
the underlying mechanism leading to a linearly rising potential between static quarks.
A glance at the recent literature surrounding confinement in four-dimensional
Yang–Mills theories reveals a complicated nexus of competing, and at first sight
incompatible, hypotheses advanced by theorists interested in the detailed physical
mechanism leading to quark and/or color confinement (for an extensive review, see
(Greensite, 2003)). The origins of this complexity lie in (at least) two directions.
Firstly, the intrinsic fluidity of gauge theories, which allows physically equivalent,
but sometimes superficially completely different, descriptions of the same physi-
cal phenomenon simply by changing the gauge, tends to induce a proliferation of
hypothetical mechanisms even where the underlying relevant physics is the same.
Secondly, the asymptotic freedom of the theory, so helpful in allowing the extraction
of quantitative results from perturbation theory at high energies, proves a double-
edged sword at low energy or long distances. Precisely in this regime, the field
configurations responsible for the confinement phenomenon (as well as the dynamical
chiral symmetry-breaking discussed in Section 16.5) necessarily correspond to strongly
coupled modes of the theory. Specifically, this means, as indicated previously, that
semiclassical methods, which depend on saddle-point expansions rendered sensible
by the existence of some type of small expansion parameter, no longer prove useful
except in the most crudely qualitative way. This means that we will probably never
be able to arrive at an analytically tractable, as well as quantitatively accurate,
description of four-dimensional non-abelian confinement along the lines discussed
above for three-dimensional compact abelian gauge theory. It is indeed fortunate that
the lattice formulation of four-dimensional Yang–Mills theory has at least given us the
option of direct numerical evaluation, using Monte Carlo methods, of the (Euclidean)
amplitudes of the theory, with results which leave us with no possible doubt that
the overall picture of confined elementary quark and gluon constituents is indeed the
correct framework for hadronic physics.

19.5 Problems
1. In the model used by Schroer to introduce the concept of infraparticles, a massless
boson field is coupled to a massive fermion in 1+1 spacetime dimensions, via the
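Lagrangian

$$\mathcal{L} = \frac{1}{2}\,\partial_\mu\phi\,\partial^\mu\phi + \bar\psi\,(i\not\!\partial - M)\,\psi + ig\,\bar\psi\gamma^\mu\psi\,\partial_\mu\phi \tag{19.116}$$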

Show that the one-loop contributions to the ψ̄ψφ vertex (with the external
fermions on-mass-shell) contain logarithmic divergences from the infrared part of
the loop integral.
2. Verify the result (19.42) for the matrix element appearing in (19.41).
3. Show that the action (19.46) for QED in 1+1 spacetime dimensions leads, after
going to axial (=Coulomb) gauge A1 = 0, to a Hamiltonian consisting of the usual
massive free fermion piece, plus the Coulomb interaction term (19.48), after the
dependent A0 field is eliminated.
4. Using the identity (19.62), verify that the plaquette variable Un;μν defined in
(19.61) reduces to (19.63).
5. Show that by choosing C(a), D(a) as suitable functions of the lattice spacing a,
the combination of lattice fields

$$C(a)\sum_{n\mu}\left(\phi^*_{n+\hat\mu}\,U_{n,\mu}\,\phi_n + \mathrm{c.c.}\right) + D(a)\sum_{n} P(\phi^*_n\phi_n) \tag{19.117}$$

reproduces for a → 0 the classical continuum scalar action for the most general
renormalizable gauge-invariant theory with fundamental representation scalar
fields coupled to the gauge vector fields. Here, P is a polynomial up to
degree 2.
6. Show that the Green function (19.78) satisfies its defining equation (19.77).
7. Consider an SU(2) gauge theory defined on a two-dimensional L × L Euclidean
lattice, with free boundary conditions for the links on the edge (thus, link
variables at the boundary appear in only a single plaquette variable in the action).
Show that the Wilson loop average W (R, T ) is given exactly by the expression
(19.88), at all values of β (and not just for small β, as in higher dimensions).
8. Using the Poisson identity (19.98), show that the abelian gauge action (19.92) is
equivalent to our original version of the Villain action (19.70).
9. In this Problem we shall evaluate the Wilson loop average in a pure abelian
gauge theory in three (Euclidean) dimensions in the continuum. As the Wilson
loop observable is gauge-invariant, we may choose any convenient gauge: in this
case, a Feynman gauge will be the preferred choice, corresponding to a Euclidean
gauge action (cf. the discussion following (15.159))

$$S_{\mathrm{gauge},E} = \frac{1}{4g^2}\int \left(F_{\mu\nu}^2 + 2(\partial_\mu A_\mu)^2\right) d^3x \tag{19.118}$$
The Wilson loop variable in an abelian theory corresponds to the familiar
Aharonov–Bohm phase factor $\exp\left(i\oint A_\mu\,dx_\mu\right) = \exp\left(i\int J_\mu A_\mu\,d^3x\right)$, where Jμ(x)
is a conserved current of unit strength localized on the Wilson loop. Show that

$$\int \mathcal{D}A_\mu\; e^{-S_{\mathrm{gauge},E} + i\int J_\mu A_\mu\, d^3x} = C\,e^{+\frac{g^2}{2}\int J_\mu \Delta^{-1} J_\mu\, d^3x} \tag{19.119}$$

with Δ = ∂μ∂μ. This is exactly the continuum version of (19.114) (with the
identification $\beta_g = 1/g^2$).
Appendix A
The functional calculus

The concepts of functionals and functional derivatives play an indispensable role in
many field-theoretic calculations, so here we shall collect the main results on which
we rely throughout the book. A functional is simply a mapping from some space of
functions to the real (or complex) numbers. Typically, we shall need to probe the
variation of a functional in response to a small change in the function(s) on which it
depends, in order to define, in analogy to derivatives of (real or complex) functions,
a functional derivative. The limiting procedure needed to obtain a derivative requires
that the function space be supplied with a norm, and the most general spaces of this
type which are useful in physics are Banach spaces (of which the complex Hilbert
spaces of quantum theory are a subclass). Some functionals commonly encountered in
quantum field theory include generating functionals, such as

$$Z[j] = \sum_n \frac{1}{n!}\int G^{(n)}(x_1, x_2, \ldots, x_n)\, j(x_1) j(x_2)\cdots j(x_n)\, d^4x_1\, d^4x_2 \cdots d^4x_n \tag{A.1}$$

which encode knowledge of an infinite class of correlation functions G(n)(x1, x2, ..., xn)
(symmetric under permutation of their arguments) in a single functional Z[j], from
which we can recover, as we shall soon see, the correlation functions by functional
differentiation. We shall also encounter action functionals, or spacetime integrals of
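Lagrangian densities, such as

$$I[\phi] = \int \mathcal{L}(\phi, \partial_\mu\phi)\, d^4x = \int \left(\frac{1}{2}\,\partial_\mu\phi(x)\,\partial^\mu\phi(x) - \frac{1}{2}\,m^2\phi(x)^2 - \frac{\lambda}{4!}\,\phi(x)^4\right) d^4x \tag{A.2}$$
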
The examples given here possess an obvious smoothness property with respect to
small perturbations of the argument functions j(x) or φ(x). Let f(x) be a Schwartz
test function (i.e., an infinitely differentiable function of compact support, properties
which will facilitate the processes of integration by parts which are indispensable in
allowing us to develop a useful functional calculus). For a general functional Z[j] (with
j(x) a c-number function on spacetime, say) let us suppose that a distribution $\delta Z[j]/\delta j(x)$
exists such that for all such f(x) the limit

$$\lim_{\epsilon\to 0}\frac{Z[j + \epsilon f] - Z[j]}{\epsilon} = \int \frac{\delta Z[j]}{\delta j(x)}\, f(x)\, d^4x \tag{A.3}$$

exists. The existence of the limit implies that the functional derivative $\delta Z[j]/\delta j(x)$ (called a
Fréchet derivative in the mathematical literature) is a uniquely defined distribution:
in other words, a continuous linear functional on the space of test functions (with an
appropriately defined norm, (Friedlander, 1982)). The definition (A.3) is the obvious
expression of our desire to express the first variation of the functional in a form
analogous to that familiar from ordinary multi-variable calculus:

$$\delta Z[j] \equiv Z[j + \delta j] - Z[j] = \int \frac{\delta Z[j]}{\delta j(x)}\, \delta j(x)\, d^4x + O((\delta j)^2) \tag{A.4}$$

The reader may easily verify that the definition (A.3) applied to (A.1) gives, for the
first functional derivative,

$$\frac{\delta Z[j]}{\delta j(y)} = \sum_n \frac{1}{n!}\int G^{(n+1)}(y, x_1, x_2, \ldots, x_n)\, j(x_1) j(x_2)\cdots j(x_n)\, d^4x_1\, d^4x_2\cdots d^4x_n \tag{A.5}$$

and that the correlation functions G(n) are recoverable from Z[j] by taking the nth
functional derivative and then setting the “source” functions j to zero:

$$G^{(n)}(y_1, y_2, \ldots, y_n) = \left.\frac{\delta^n Z[j]}{\delta j(y_1)\,\delta j(y_2)\cdots\delta j(y_n)}\right|_{j=0} \tag{A.6}$$

which is the functional analog of the usual formula for the Taylor coefficients of
a Taylor-expandable function. For action functionals such as (A.2), the functional
derivative gives the total Euler derivative of the Lagrange density

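$$I[\phi + \delta\phi] - I[\phi] = \int \left(\mathcal{L}(\phi + \delta\phi, \partial_\mu\phi + \partial_\mu\delta\phi) - \mathcal{L}(\phi, \partial_\mu\phi)\right) d^4x$$

$$= \int \left(\frac{\partial\mathcal{L}}{\partial\phi} - \partial_\mu\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)}\right)\delta\phi(x)\, d^4x \tag{A.7}$$

$$\Rightarrow\quad \frac{\delta I[\phi]}{\delta\phi(x)} = \frac{\partial\mathcal{L}}{\partial\phi(x)} - \partial_\mu\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi(x))} \tag{A.8}$$
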
where the integration by parts maneuvers required to reach the second line are
validated by the smoothness and compact support of the test functions δφ(x). The
actual mechanics of functional differentiation can be simplified by noting that

$$j(y) = \int \delta^4(y - x)\, j(x)\, d^4x \quad\Rightarrow\quad \frac{\delta j(y)}{\delta j(x)} = \delta^4(y - x) \tag{A.9}$$

and applying the obvious generalization of the Leibniz rule to arbitrary products of
the source function.
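
The defining limit (A.3) can be illustrated numerically on a grid. The following is only a sketch under stated assumptions: spacetime is replaced by a one-dimensional interval, the functional is a quadratic form with a hypothetical Gaussian kernel G(x, y) (not one taken from the text), and the names `Z`, `dZ_dj`, `bump` and `check_first_variation` are introduced purely for illustration. For a symmetric kernel the functional derivative of Z[j] = ½∫∫ j(x)G(x, y)j(y) dx dy is ∫ G(x, y)j(y) dy, and the code compares the difference quotient on the left of (A.3) with the integral on the right.

```python
import math

# Assumptions for illustration only: one dimension instead of four, a uniform
# grid, and a hypothetical symmetric Gaussian kernel G(x, y).
N = 120                      # number of grid points
BOX = 12.0                   # the grid covers [-BOX/2, BOX/2]
H = BOX / N                  # grid spacing, i.e. the discretized measure dx
XS = [-BOX / 2 + (i + 0.5) * H for i in range(N)]
G = [[math.exp(-(x - y) ** 2) for y in XS] for x in XS]


def Z(j):
    """Quadratic functional Z[j] = (1/2) * double integral of j(x) G(x, y) j(y)."""
    return 0.5 * H * H * sum(
        j[a] * G[a][b] * j[b] for a in range(N) for b in range(N)
    )


def dZ_dj(j):
    """Functional derivative dZ/dj(x) = integral of G(x, y) j(y) dy, on the grid."""
    return [H * sum(G[a][b] * j[b] for b in range(N)) for a in range(N)]


def bump(x):
    """Smooth test function with compact support on (-1, 1)."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0


def check_first_variation(eps=1e-6):
    """Return both sides of (A.3) for a sample source j and test function f."""
    j = [math.exp(-x * x) for x in XS]
    f = [bump(x) for x in XS]
    shifted = [ja + eps * fa for ja, fa in zip(j, f)]
    lhs = (Z(shifted) - Z(j)) / eps                        # difference quotient
    rhs = H * sum(d * fa for d, fa in zip(dZ_dj(j), f))    # integral of dZ/dj * f
    return lhs, rhs


if __name__ == "__main__":
    print(check_first_variation())
```

For this quadratic functional the two sides differ only by a term of order ε, which is the discrete counterpart of the O((δj)²) remainder in (A.4).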
In Section 10.3 the concept of functional differentiation is extended to functionals
of Grassmann functions, where both the argument functions and the functionals
themselves take values in an anticommuting number field. The reader is referred to
that section for an explanation of the basic properties of such functionals.
Appendix B
Rates and cross-sections

Prepare an incoming scattering state in the usual fashion:

$$|t\rangle = \int d\alpha'\, g(\alpha')\, e^{-iE_{\alpha'}t}\, |\alpha'\rangle_{\mathrm{in}} \tag{B.1}$$

where g(α′) is sharply peaked around some state α with well-defined energy (Eα)
and momentum. Recall that α is a shorthand notation for energy, momentum, and
internal quantum numbers (if any) of all the incoming particles. The state (B.1) is
unit normalized, ⟨t|t⟩ = 1, provided

$$\int d\alpha'\, |g(\alpha')|^2 = 1 \tag{B.2}$$

We are interested in the probability of appearance at late times t ≫ 0 of a state |β⟩
with no overlap with the original state |α⟩ (for example, we look for particles coming
out at an angle to the beam). Accordingly, taking the overlap of the time-dependent
state (B.1) with a free state ⟨β|, and referring to (4.181, 4.183), the trivial δβα part of
the S-matrix in (4.182) does not contribute, and we have, after the collision:

$$\langle\beta|t\rangle = \int d\alpha'\, g(\alpha')\, \frac{e^{-iE_{\alpha'}t}}{E_{\alpha'} - E_\beta + i\epsilon}\, T_{\beta\alpha'} \tag{B.3}$$

The probability that we shall find a state in the range of final-state phase-space β
to β + dβ is thus

$$dP(\beta, t) = |\langle\beta|t\rangle|^2\, d\beta = \int d\alpha'\, d\alpha''\, g(\alpha')\, g^*(\alpha'')\, \frac{e^{-i(E_{\alpha'} - E_{\alpha''})t}\, T_{\beta\alpha'}\, T^*_{\beta\alpha''}}{(E_{\alpha'} - E_\beta + i\epsilon)(E_{\alpha''} - E_\beta - i\epsilon)}\, d\beta \tag{B.4}$$

The event rate is just the time-derivative of this: namely,

$$d\Gamma(\beta, t) = -i\int d\alpha'\, d\alpha''\, g(\alpha')\, g^*(\alpha'')\, T_{\beta\alpha'}\, T^*_{\beta\alpha''} \cdot \left(\frac{1}{E_{\alpha''} - E_\beta - i\epsilon} - \frac{1}{E_{\alpha'} - E_\beta + i\epsilon}\right) e^{-i(E_{\alpha'} - E_{\alpha''})t}\, d\beta \tag{B.5}$$


Strictly speaking, this event rate vanishes as t → ∞, as a result of the rapid oscillations
of the exponential factor. A finite number of particles localized in wave-packets
eventually separate, and we should not be surprised that the interaction rate then goes
to zero. The situation in an accelerator is somewhat different: the machine provides
a steady supply of incoming particles in a beam. This situation may be idealized as
a monoenergetic incoming plane wave of infinite extent. In other words, to obtain a
steady event rate, we may assume that the folding functions g(α′), g(α″) are sharply
focussed at energy Eα. The exponential time-dependent factor in (B.4) may then be
dropped. The two energy denominators then become identical except for a change in
the sign of ε: the difference is just a δ-function (see (4.188)). This leads to

$$d\Gamma(\beta) = 2\pi\,\delta(E_\alpha - E_\beta)\, \left|\int d\alpha'\, g(\alpha')\, T_{\beta\alpha'}\right|^2 d\beta \tag{B.6}$$

As a consequence of the clustering property of the S-matrix, as discussed in Chapter
6, the T-matrix element $T_{\beta\alpha'}$ is a smooth function of the momenta characterizing β
and α′, with the exception of a single overall δ-function of momentum conservation
$\delta^3(P_\beta - P_{\alpha'})$. Thus we can write

$$T_{\beta\alpha'} \simeq \mathcal{T}_{\beta\alpha'}\, \delta^3(P_\beta - P_{\alpha'}) \tag{B.7}$$

where in the smooth part $\mathcal{T}$ of T we can replace α′ by α as the smearing function
g is sharply peaked at α′ near α. Since the smooth part no longer depends on the
integration variable α′, it can be pulled out of the integral in (B.6) to give

$$d\Gamma(\beta) \simeq 2\pi\,\delta(E_\alpha - E_\beta)\, |\mathcal{T}_{\beta\alpha}|^2\, \left|\int d\alpha'\, g(\alpha')\, \delta^3(P_\beta - P_{\alpha'})\right|^2 d\beta \tag{B.8}$$

Let α′ correspond to a state of N particles, so $g(\alpha') = g(k'_1, \ldots, k'_N)$ is the
momentum-space wavefunction of the initial packets. This is related to the coordinate space
wavefunction in the usual way

$$g(k'_1, \ldots) = \int \frac{d^3x_1 \cdots d^3x_N}{(2\pi)^{3N/2}}\, \psi(x_1, \ldots, x_N)\, e^{-i(k'_1\cdot x_1 + \ldots)} \tag{B.9}$$

while the δ-function of momentum conservation may be written

$$\delta^3(P_{\alpha'} - P_\beta) = \frac{1}{(2\pi)^3}\int d^3x\, e^{i(k'_1 + \ldots + k'_N - P_\beta)\cdot x} \tag{B.10}$$

Combining (B.9) and (B.10), one finds

$$\int d^3k'_1 \cdots d^3k'_N\, g(k'_1, \ldots)\, \delta^3(k'_1 + \ldots + k'_N - P_\beta) = (2\pi)^{3N/2 - 3}\int d^3x\, e^{-iP_\beta\cdot x}\, \psi(x, x, \ldots, x) \tag{B.11}$$

The initial wavefunction evaluated at coincident spatial points can be written

$$\psi(x, x, \ldots, x) = e^{iP_\alpha\cdot x}\, f(x) \tag{B.12}$$



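where f(x) is a slowly varying envelope. Thus in the quantity

$$\left|\int d\alpha'\, g(\alpha')\, \delta^3(P_{\alpha'} - P_\beta)\right|^2 = (2\pi)^{3N-6}\int d^3x\, d^3y\, e^{i(P_\beta - P_\alpha)\cdot(y - x)}\, f^*(y)\, f(x)$$

we can replace f*(y) by f*(x) in the integral on the right and obtain

$$\begin{aligned}
\left|\int d\alpha'\, g(\alpha')\, \delta^3(P_{\alpha'} - P_\beta)\right|^2 &= (2\pi)^{3N-6}\int d^3x\, |f(x)|^2 \int d^3y\, e^{i(P_\beta - P_\alpha)\cdot y} \\
&= (2\pi)^{3N-3}\int d^3x\, |f(x)|^2\, \delta^3(P_\beta - P_\alpha) \\
&= (2\pi)^{3N-3}\, \delta^3(P_\beta - P_\alpha)\int d^3x\, |\psi(x, x, \ldots, x)|^2 \\
&= (2\pi)^{3N-3}\, \rho_{\mathrm{rel}}\, \delta^3(P_\beta - P_\alpha)
\end{aligned} \tag{B.13}$$

where the relative density ρrel is defined as

$$\rho_{\mathrm{rel}} \equiv \int d^3x\, |\psi(x, x, \ldots, x)|^2 \tag{B.14}$$

Our final result for the event rate is thus

$$d\Gamma(\beta) = (2\pi)^{3N-2}\, \rho_{\mathrm{rel}}\, \delta^4(P_\beta - P_\alpha)\, |\mathcal{T}_{\beta\alpha}|^2\, d\beta \tag{B.15}$$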

where we have combined the energy-conservation δ-function of (B.8) with the three-
momentum δ-function of (B.13) to give a single four-dimensional δ-function of energy-
momentum conservation.
There are two particularly important special cases of (B.15) which deserve special
attention: N=1 (particle decay), and N=2 (two-particle scattering).
1. If there is only one particle in the initial state, we have immediately

$$\rho_{\mathrm{rel}} = \int d^3x\, |\psi(x)|^2 = 1 \tag{B.16}$$

so the differential decay rate (into final-state phase-space between β and β + dβ)
is

$$d\Gamma(\beta) = 2\pi\,\delta^4(P_\beta - P_\alpha)\, |\mathcal{T}_{\beta\alpha}|^2\, d\beta \tag{B.17}$$

2. Two-particle collisions correspond to N=2. Consider a volume V containing ρ1
target particles at rest per unit volume, and ρ2 projectile particles per unit volume
travelling with speed v2. (For the time being we shall assume that the target and
projectile particles are distinguishable.) The effective differential cross-section dσ
is defined on a classical analogy as the effective area presented to the projectile
beam by each target particle leading to final states in the range β to β + dβ.
Since the flux of projectiles is ρ2 v2 and the total number of target particles is
ρ1 V, the event rate is

dΓ(β) = ρ1 V dσ · ρ2 v2 (B.18)

so

$$d\sigma = \frac{1}{\rho_1\rho_2 v_2}\cdot\frac{d\Gamma(\beta)}{V} \tag{B.19}$$

Quantum-mechanically we are dealing with a two-particle wavefunction ψ(x1, x2)
where the joint probability of finding both target and projectile in the same
volume V is

$$\int d^3x_1 \int_{V_1} d^3x_2\, |\psi(x_1, x_2)|^2 \tag{B.20}$$

where V1 is a box of volume V around x1. This integral represents the probability
of having a single projectile particle in volume V moving with speed v2 (i.e., a
flux v2/V) impinging on a single target particle. The corresponding rate, from
definition (B.18), is

$$d\Gamma = d\sigma\cdot\frac{v_2}{V}\int d^3x_1 \int_{V_1} d^3x_2\, |\psi(x_1, x_2)|^2 \tag{B.21}$$

The projectile beam wavefunction usually varies slowly (in amplitude) over the
interaction range with the target particle, so choosing V much larger than the
interaction range but much smaller than the scale of variation of the projectile
wave-packet envelope

$$\begin{aligned}
\int d^3x_1 \int_{V_1} d^3x_2\, |\psi(x_1, x_2)|^2 &\simeq \int d^3x_1 \int_{V_1} d^3x_2\, |\psi(x_1, x_1)|^2 \\
&= V\int d^3x_1\, |\psi(x_1, x_1)|^2 \\
&= V\rho_{\mathrm{rel}}
\end{aligned} \tag{B.22}$$

Inserting this result in (B.21), and using (B.15) for N=2, we obtain

$$d\sigma(\alpha\to\beta) = \frac{(2\pi)^4}{v_2}\, \delta^4(P_\beta - P_\alpha)\, |\mathcal{T}_{\beta\alpha}|^2\, d\beta \tag{B.23}$$

The above result was obtained in the frame in which one of the particles was
at rest. Note that from (B.19) the differential cross-section dσ was given as the
quantity dΓ(β)/V , which is a Lorentz-invariant (# of events per unit time per
unit spatial volume—i.e., # per unit spacetime volume), divided by the quantity
ρ1 ρ2 v2 . It is customary to define the cross-section in a general frame to be exactly
the same number as in (B.23), by generalizing the latter quantity in a Lorentz-
invariant way. Note that in the rest frame of the target (particle 1) the densities
of both particles are given by (c=1 everywhere!)

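$$\rho_1 = \rho_1^{(0)}, \qquad \rho_2 = \frac{\rho_2^{(0)}}{\sqrt{1 - v_2^2}} \tag{B.24}$$

where $\rho_1^{(0)}, \rho_2^{(0)}$ are the particle densities in the rest frames of the particles
themselves. Thus

$$\rho_1\rho_2 v_2 = \rho_1^{(0)}\rho_2^{(0)}\, \frac{v_2}{\sqrt{1 - v_2^2}} = \rho_1^{(0)}\rho_2^{(0)}\, \frac{|\vec p_2|}{m_2} \tag{B.25}$$

The invariant quantity $\sqrt{(p_1\cdot p_2)^2 - m_1^2 m_2^2}$ (p1, p2 four-vectors) becomes, in the
frame where particle 1 is at rest, just $m_1|\vec p_2|$, so the quantity equal to ρ1 ρ2 v2
in the target rest frame may be written in any frame as

$$\begin{aligned}
\rho_1^{(0)}\rho_2^{(0)}\, \frac{\sqrt{(p_1\cdot p_2)^2 - m_1^2 m_2^2}}{m_1 m_2} &= \rho_1\sqrt{1 - v_1^2}\cdot\rho_2\sqrt{1 - v_2^2}\cdot\frac{\sqrt{(p_1\cdot p_2)^2 - m_1^2 m_2^2}}{m_1 m_2} \\
&= \rho_1\rho_2\, \frac{m_1 m_2}{E_1 E_2}\cdot\frac{\sqrt{(p_1\cdot p_2)^2 - m_1^2 m_2^2}}{m_1 m_2} \\
&= \rho_1\rho_2\, v_\alpha
\end{aligned} \tag{B.26}$$

with the relative velocity $v_\alpha \equiv \sqrt{(p_1\cdot p_2)^2 - m_1^2 m_2^2}\,/\,(E_1 E_2)$ providing the appropriate
generalization of v2 in (B.23). Our final formula is thus

$$d\sigma = \frac{(2\pi)^4}{v_\alpha}\, \delta^4(P_\beta - P_\alpha)\, |\mathcal{T}_{\beta\alpha}|^2\, d\beta \tag{B.27}$$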

Although the above formulas were derived under the assumption of distinguishable
particles in the initial state (in particular, we did not worry about niceties of sym-
metrizing or antisymmetrizing the initial-state wavefunction), it turns out that the
final result is still valid in such cases. The derivation can be found in any of the
standard texts on scattering theory (see (Newton, 1966), for example).
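
The frame-independent flux factor introduced in (B.26) lends itself to a quick numerical check. The sketch below assumes units with c = 1 and a (+,−,−,−) metric; the masses and velocities are arbitrary illustrative values, and the helper names `four_momentum`, `minkowski_dot` and `relative_velocity` are hypothetical. It evaluates the relative velocity as the square root of (p1·p2)² − m1²m2², divided by E1E2, and confirms that this reduces to v2 when the target is at rest and to |v1 − v2| for collinear motion.

```python
import math


def minkowski_dot(p, q):
    """Four-vector product with metric signature (+, -, -, -), c = 1."""
    return p[0] * q[0] - p[1] * q[1] - p[2] * q[2] - p[3] * q[3]


def four_momentum(mass, velocity):
    """Four-momentum (E, px, py, pz) of a particle with the given 3-velocity."""
    speed_sq = sum(c * c for c in velocity)
    gamma = 1.0 / math.sqrt(1.0 - speed_sq)
    return (gamma * mass,) + tuple(gamma * mass * c for c in velocity)


def relative_velocity(p1, p2, m1, m2):
    """v_alpha = sqrt((p1.p2)^2 - m1^2 m2^2) / (E1 E2)."""
    invariant = minkowski_dot(p1, p2) ** 2 - (m1 * m2) ** 2
    return math.sqrt(max(invariant, 0.0)) / (p1[0] * p2[0])


# Illustrative masses in arbitrary units (assumptions, not values from the text).
M1, M2 = 1.0, 0.5

# Target rest frame: particle 1 at rest, particle 2 moving with speed 0.6.
v_rest = relative_velocity(
    four_momentum(M1, (0.0, 0.0, 0.0)), four_momentum(M2, (0.0, 0.0, 0.6)), M1, M2
)

# Collinear head-on collision in another frame: v1 = +0.5, v2 = -0.3 along z.
v_collinear = relative_velocity(
    four_momentum(M1, (0.0, 0.0, 0.5)), four_momentum(M2, (0.0, 0.0, -0.3)), M1, M2
)

if __name__ == "__main__":
    print(v_rest, v_collinear)
```

In the collinear case the quantity equals the difference of the two velocities, which is a frame-dependent flux factor rather than the velocity of one particle seen from the other, so values above one are possible.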
Appendix C
Majorana spinor algebra

Recall from the discussion in Section 7.4.3 that we may continue to employ the
very convenient (and familiar) Dirac 4-spinor language even when describing the
two-component Majorana fields that naturally interpolate for self-conjugate spin-½
particles. Here, we shall follow the conventions of that section and define a Majorana
spinor as a Dirac spinor with the following relation between the upper and lower
2-spinors:

$$\phi^{(\frac{1}{2}0)} = C_s\, \phi^{(0\frac{1}{2})*} \tag{C.1}$$

where Cs = iσ2 is the 2-spinor conjugation matrix introduced in Section 7.2. Thus,
if Qa is a (0 ½) spinor, in the fundamental representation of SL(2,C), we can form a
Majorana spinor as follows:

$$\begin{pmatrix} C_s Q^* \\ Q \end{pmatrix} \tag{C.2}$$

Any Dirac spinor ψ can be decomposed as $\psi = \frac{1}{\sqrt{2}}(\chi_1 + i\chi_2)$, where χ1,2 are Majorana:

$$\chi_1 \equiv \frac{1}{\sqrt{2}}\,(\psi - C\gamma_0\psi^*) \tag{C.3}$$

$$\chi_2 \equiv \frac{-i}{\sqrt{2}}\,(\psi + C\gamma_0\psi^*) \tag{C.4}$$

where

$$C \equiv i\gamma_2\gamma_0 = \begin{pmatrix} -C_s & 0 \\ 0 & C_s \end{pmatrix}$$

In general, a numerical Majorana spinor is any 4-spinor of the form

$$\begin{pmatrix} C_s\chi^* \\ \chi \end{pmatrix}$$

where χ is a complex 2-spinor.


Of particular interest in supersymmetry are Majorana spinors whose components
are Grassmann in character. These components can either be c-number constants or
anticommuting components of fermionic fields (with each other and with any c-number
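Grassmann quantities present). Thus, consider

$$s = \begin{pmatrix} C_s\chi^* \\ \chi \end{pmatrix}$$

where

$$\chi = \begin{pmatrix} \chi_1 \\ \chi_2 \end{pmatrix}$$

and χ1, χ2, χ1*, χ2* are independent Grassmann quantities (i.e., they square to zero and
anticommute with each other).
The following 4x4 matrices (written in terms of 2x2 blocks) will be very useful:

$$\epsilon \equiv \begin{pmatrix} C_s & 0 \\ 0 & C_s \end{pmatrix}, \qquad
\gamma_5 \equiv \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad
\beta = \gamma_0 \equiv \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

Using these, one finds that the adjoint of a Majorana spinor, $\bar s \equiv s^{*T}\beta$, can be
written

$$\bar s = s^T \epsilon\gamma_5 \tag{C.5}$$

Let M be a general 4x4 numerical matrix (containing normal complex numbers), and
s1, s2 two Grassmann Majorana spinors:

$$\bar s_1 M s_2 = s_{1\alpha}\,(\epsilon\gamma_5 M)_{\alpha\beta}\, s_{2\beta} \tag{C.6}$$

$$= -s_{2\beta}\,(M^T\gamma_5\epsilon^T)_{\beta\alpha}\, s_{1\alpha} \tag{C.7}$$

$$= s_2^T M^T \epsilon\gamma_5\, s_1 \tag{C.8}$$

$$= \bar s_2\,(\epsilon\gamma_5)^{-1} M^T \epsilon\gamma_5\, s_1 \tag{C.9}$$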

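But εγ5 is basically the charge conjugation matrix C (explicitly, C = −εγ5), and

$$C^{-1} M^T C = M, \qquad M = 1,\ \gamma_5,\ \gamma_5\gamma_\mu \tag{C.10}$$

$$C^{-1} M^T C = -M, \qquad M = \gamma_\mu,\ [\gamma_\mu, \gamma_\nu] \tag{C.11}$$

which therefore gives us

$$\bar s_1 M s_2 = \bar s_2 M s_1, \qquad M = 1,\ \gamma_5,\ \gamma_5\gamma_\mu \tag{C.12}$$

$$\bar s_1 M s_2 = -\bar s_2 M s_1, \qquad M = \gamma_\mu,\ [\gamma_\mu, \gamma_\nu] \tag{C.13}$$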
For the special case where s1 , s2 are the same spinor, we find

s̄γμ s = s̄[γμ , γν ]s = 0 (C.14)

It follows that the only non-vanishing bilinears that can be built from a single
Grassmann Majorana s are s̄s, s̄γ5 s, and s̄γ5 γμ s.
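
The sign rules (C.10) and (C.11), on which the vanishing bilinears rest, can be verified with explicit matrices. The sketch below is an illustration under an assumption: γ0 and γ5 are taken in the block forms given in this appendix, while the spatial matrices are chosen as γi = [[0, −σi], [σi, 0]], a convention picked here only because it reproduces C = iγ2γ0 = diag(−Cs, Cs) and which may differ from the one used elsewhere in the book. All helper names are hypothetical.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def scale(c, a):
    return [[c * x for x in row] for row in a]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def block(a, b, c, d):
    """Assemble the 4x4 matrix [[a, b], [c, d]] from 2x2 blocks."""
    return [a[0] + b[0], a[1] + b[1], c[0] + d[0], c[1] + d[1]]


I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
SIGMA = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]
CS = [[0, 1], [-1, 0]]                      # C_s = i * sigma_2
ONE = block(I2, Z2, Z2, I2)

GAMMA0 = block(Z2, I2, I2, Z2)
GAMMA5 = block(I2, Z2, Z2, scale(-1, I2))
# Assumed spatial matrices, chosen so that i*gamma_2*gamma_0 = diag(-C_s, C_s).
GAMMAS = [GAMMA0] + [block(Z2, scale(-1, s), s, Z2) for s in SIGMA]
C = scale(1j, matmul(GAMMAS[2], GAMMA0))
C_INV = scale(-1, C)                        # valid because C * C = -1


def conjugation_sign(m):
    """Return +1 if C^-1 m^T C = m, -1 if it equals -m, otherwise None."""
    t = matmul(C_INV, matmul(transpose(m), C))
    if close(t, m):
        return +1
    if close(t, scale(-1, m)):
        return -1
    return None


def check_conjugation_signs():
    """Signs for {1, g5, g5 g_mu} and for {g_mu, [g_mu, g_nu] with mu < nu}."""
    even = [ONE, GAMMA5] + [matmul(GAMMA5, g) for g in GAMMAS]
    odd = list(GAMMAS)
    for mu in range(4):
        for nu in range(mu + 1, 4):
            odd.append(sub(matmul(GAMMAS[mu], GAMMAS[nu]),
                           matmul(GAMMAS[nu], GAMMAS[mu])))
    return ([conjugation_sign(m) for m in even],
            [conjugation_sign(m) for m in odd])


if __name__ == "__main__":
    print(check_conjugation_signs())
```

The first list covers the matrices of (C.10) and the second those of (C.11); with the representation assumed here, every entry of the first should be +1 and every entry of the second −1.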
Inserting the explicit expression for s in terms of its Grassmann components, one
finds

s̄s = 2(χ∗1 χ∗2 − χ1 χ2 ) (C.15)


s̄γ5 s = 2(χ∗1 χ∗2 + χ1 χ2 ) (C.16)

There are only four independent objects cubic in χ1, χ2, χ1*, χ2*, so products of three
s's can always be reduced to

$$\bar s\gamma_5 s\cdot s = \begin{pmatrix} 2\chi_1\chi_2\chi_2^* \\ -2\chi_1\chi_2\chi_1^* \\ 2\chi_1^*\chi_2^*\chi_1 \\ 2\chi_1^*\chi_2^*\chi_2 \end{pmatrix}$$

while the only surviving quantity involving four s's is

$$\chi_1^*\chi_2^*\chi_1\chi_2 = \frac{1}{8}\,(\bar s\gamma_5 s)^2 \tag{C.17}$$

We list here for convenience a number of useful identities:

$$(\gamma_5 s)_\alpha\, \bar s\gamma_5 s = -s_\alpha\,(\bar s s) \tag{C.18}$$

$$s_\alpha\,(\bar s\gamma_5\gamma_\nu s) = -(\gamma_\nu s)_\alpha\, \bar s\gamma_5 s \tag{C.19}$$

$$(\bar s\gamma_5 s)\, s_\alpha \bar s_\beta = -\frac{1}{4}\,(\gamma_5)_{\alpha\beta}\,(\bar s\gamma_5 s)^2 \tag{C.20}$$

$$(\bar s s)^2 = -(\bar s\gamma_5 s)^2 \tag{C.21}$$

An important consequence of these identities is the Fierz rearrangement property
for bilinears of Grassmann spinors. Let θα be a Grassmann Majorana spinor as usual.
It follows from the discussion above that the only non-vanishing objects bilinear in θ
are θ̄θ, θ̄γ5 θ and θ̄γ5 γ μ θ. Hence, by Lorentz-invariance,

θα θ̄β = Aδαβ θ̄θ + B(γ5 γμ )αβ θ̄γ5 γ μ θ + C(γ5 )αβ θ̄γ5 θ (C.22)

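Tracing with the identity (i.e., setting α = β and summing) gives

$$\mathrm{Tr}(1\cdot\theta\bar\theta) = -\bar\theta\theta = 4A\,\bar\theta\theta \quad\Rightarrow\quad A = -\frac{1}{4} \tag{C.23}$$

Likewise

$$\mathrm{Tr}(\gamma_5\gamma_\nu\,\theta\bar\theta) = -\bar\theta\gamma_5\gamma_\nu\theta = -4B\,\bar\theta\gamma_5\gamma_\nu\theta \quad\Rightarrow\quad B = \frac{1}{4} \tag{C.24}$$

$$\mathrm{Tr}(\gamma_5\,\theta\bar\theta) = -\bar\theta\gamma_5\theta = 4C\,\bar\theta\gamma_5\theta \quad\Rightarrow\quad C = -\frac{1}{4} \tag{C.25}$$

To summarize, we have the following identity:

$$\theta_\alpha\bar\theta_\beta = -\frac{1}{4}\,\delta_{\alpha\beta}\,\bar\theta\theta + \frac{1}{4}\,(\gamma_5\gamma_\mu)_{\alpha\beta}\,\bar\theta\gamma_5\gamma^\mu\theta - \frac{1}{4}\,(\gamma_5)_{\alpha\beta}\,\bar\theta\gamma_5\theta \tag{C.26}$$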
References

Aarts, G., Seiler, E., and Stamatescu, I. (2010). Complex Langevin method: When
can it be trusted? Physical Review D, 81, 054508.
Abers, E. S. and Lee, B. W. (1973). Gauge theories. Physics Reports, 9, 1–141.
Adler, S. L. (1969). Axial vector vertex in spinor electrodynamics. Physical
Review , 177, 2426–2438.
Adler, S. L. and Bardeen, W. A. (1969). Absence of higher-order corrections in the
anomalous axial-vector divergence equation. Physical Review , 182, 1517–1536.
Adler, S. L, Collins, J. C., and Duncan, A. (1977). Energy-momentum-tensor trace
anomaly in spin-1/2 quantum electrodynamics. Physical Review D, 15, 1712–1721.
Alvarez-Gaumé, L. and Witten, E. (1983). Gravitational anomalies. Nuclear Physics
B , 234, 269–330.
Anderson, A. (1994). Canonical transformations in quantum mechanics. Annals of
Physics, 232, 292–331.
Appelquist, T. and Carrazzone, J. (1975). Infrared singularities and massive fields.
Physical Review D, 11, 2856–2861.
Banks, T., Myerson, R., and Kogut, J. (1977). Phase transitions in abelian lattice
gauge theories. Nuclear Physics B , 129, 493–510.
Barton, G. (1963). Introduction to Advanced Field Theory (1st edn). Interscience
Publishers (John Wiley and Sons), New York.
Barton, G. (1965). Introduction to Dispersion Techniques in Field Theory (1st edn).
W. A. Benjamin, New York.
Baym, G. (1990). Lectures on Quantum Mechanics (3rd edn). Westview Press,
New York.
Becchi, C., Rouet, A., and Stora, R. (1976). Renormalization of gauge theories. Annals
of Physics, 98, 287–321.
Bell, J. S. and Jackiw, R. (1969). A PCAC puzzle: π0 → γγ in the σ-model. Nuovo
Cimento A, 51, 47–61.
Bergère, M. and Lam, Y. M. P. (1976). Bogoliubov-Parasiuk theorem in the α-
parametric representation. Journal of Mathematical Physics, 17, 1546–1557.
Bernard, C. and Duncan, A. (1975). Lorentz covariance and Matthew’s theorem for
derivative-coupled field theories. Physical Review D, 11, 848–859.
Bjorken, J. D. and Drell, S. D. (1965). Relativistic Quantum Fields (1st edn). McGraw-
Hill Book Company, New York.
Bloch, F. and Nordsieck, A. (1937). Note on the radiation field of the electron. Physical
Review , 52, 54–59.
Bloch, P. (2006). CPT invariance tests in neutral kaon physics. Journal of Physics
G: Nuclear and Particle Physics, 33, 666–667.
Bodwin, G., Braaten, E., and Lepage, G. P. (1995). Rigorous QCD analysis of
inclusive annihilation and production of heavy quarkonium. Physical Review D, 51,
1125–1171.
Bogoliubov, N. N. and Parasiuk, O. S. (1957). Über die Multiplikation der
Kausalfunktionen in der Quantentheorie der Felder. Acta Mathematica, 97, 227–266.
Bohr, N. and Rosenfeld, L. (1983). On the question of the measurability of electromag-
netic field quantities. In Quantum Theory and Measurement (eds. J. A. Wheeler and
W. H. Zurek), Chapter IV.2, pp. 479–522. Princeton University Press, Princeton,
New Jersey.
Boltzmann, L. (1898). Über vermeintlich irreversible Strahlungsvorgänge. Berliner
Berichte, 182.
Born, M. and Fuchs, K. (1939a). On fluctuations in electromagnetic radiation. Pro-
ceedings of the Royal Society of London, Series A, 170, 252–265.
Born, M. and Fuchs, K. (1939b). On fluctuations in electromagnetic radiation-
correction. Proceedings of the Royal Society of London, Series A, 172, 465–466.
Born, M., Heisenberg, W., and Jordan, P. (1926). Zur Quantenmechanik 2. Zeitschrift
für Physik , 35, 557–615.
Born, M. and Jordan, P. (1925). Zur Quantenmechanik. Zeitschrift für Physik , 34,
858–888.
Brenig, W. and Haag, R. (1963). General quantum theory of collision processes. In
Quantum Scattering Theory (ed. M. Ross). Indiana University Press, Bloomington,
Indiana.
Brown, L. S. (1992). Quantum Field Theory (1st edn). Cambridge University Press,
New York.
Buchholz, D. (1975). Collision theory for massless fermions. Communications in
Mathematical Physics, 42, 269–279.
Buchholz, D. (1977). Collision theory for massless bosons. Communications in Math-
ematical Physics, 52, 147–173.
Buchholz, D. (1982). The physical state space in quantum electrodynamics. Commu-
nications in Mathematical Physics, 85, 49–71.
Buchholz, D. (1986). Gauss’ law and the infraparticle problem. Physics Letters B , 174,
331–334.
Buras, Andrzej J. (1980). Asymptotic freedom in deep inelastic processes in leading
order and beyond. Reviews of Modern Physics, 52, 199–276.
Callan, Curtis G. (1970). Broken scale invariance in scalar field theory. Physical Review
D, 2, 1541–1547.
Callan, C. G., Coleman, S., and Jackiw, R. (1970). A new, improved energy-momentum
tensor. Annals of Physics (N.Y.), 59, 42–73.
Callen, H. B. (1960). Thermodynamics (1st edn). John Wiley and Sons, New York.
Coleman, S. (1976). More about the massive Schwinger model. Annals of Physics, 101,
239–267.
Coleman, S. and Mandula, J. (1967). All possible symmetries of the S matrix. Physical
Review , 159, 1251–1256.
Coleman, S. and Weinberg, E. (1973). Radiative corrections as the origin of sponta-
neous symmetry breaking. Physical Review D, 7, 1888–1910.
Collins, J. C., Duncan, A., and Joglekar, S. D. (1977). Trace and dilatation anomalies
in gauge theories. Physical Review D, 16, 438–449.
Condon, E. U. and Shortley, G. H. (1935). The Theory of Atomic Spectra (1st edn).
Cambridge University Press, Bentley House, London.
Creutz, M. (1980). Monte Carlo study of quantized SU(2) gauge theory. Physical Review
D, 21, 2308–2315.
Daboul, J. and Nieto, M. M. (1994). Quantum bound states with zero binding energy.
Physics Letters A, 190, 357–362.
Dirac, P. A. M. (1927a). The physical interpretation of the quantum dynamics.
Proceedings of the Royal Society of London, Series A, 113, 621–641.
Dirac, P. A. M. (1927b). The quantum theory of the emission and absorption of
radiation. Proceedings of the Royal Society of London, Series A, 114, 243–265.
Dirac, P. A. M. (1928). The quantum theory of the electron, I. Proceedings of the
Royal Society (London) A, 117, 610–624.
Dirac, P. A. M. (1933). Théorie du positron. In Septieme Conseil de Physique
Solvay: Structure et propriétés des noyaux atomiques, pp. 203–221. Gauthiers-Villars
(Paris).
Dirac, P. A. M. (1945). On the analogy between classical and quantum mechanics.
Reviews of Modern Physics, 17, 195–199.
Dirac, P. A. M. (1964). Lectures on Quantum mechanics. Yeshiva University, New
York.
Donoghue, J. F., Golowich, E., and Holstein, B. R. (1992). Dynamics of the Standard
Model (1st edn). Cambridge University Press, Cambridge, UK.
Duncan, A. (1976). Fine structure in non-abelian gauge theories. Physical Review
D, 13, 2866–2880.
Duncan, A., Eichten, E., Flynn, J., Hill, B., Hockney, G., and Thacker, H. (1995).
Properties of B mesons in lattice QCD. Physical Review D, 51, 5101–5129.
Duncan, A. and Janssen, M. (2007a). Van Vleck and the correspondence principle
(part one). Archive for History of the Exact Sciences, 61, 553–624.
Duncan, A. and Janssen, M. (2007b). Van Vleck and the correspondence principle
(part two). Archive for History of the Exact Sciences, 61, 625–671.
Duncan, A. and Janssen, M. (2008). Pascual Jordan’s resolution of the conundrum
of the wave-particle duality of light. Studies in History and Philosophy of Modern
Physics, 39, 634–666.
Duncan, A. and Janssen, M. (2009). From canonical transformations to transformation
theory, 1926–1927: The road to Jordan’s Neue Begründung. Studies in the History
and Philosophy of Modern Physics, 40, 352–362.
Duncan, A. and Jones, H. F. (1993). Convergence proof for optimized δ expansion:
Anharmonic oscillator. Physical Review D, 47, 2560–2572.
Duncan, A. and Mawhinney, R. (1990). Semiclassical approach to confinement in three-
dimensional gauge theories. Physical Review D, 43, 554–565.
Dyson, F. J. (1949). The S-matrix in quantum electrodynamics. Physical Review , 75,
1736–1755.
Dyson, F. J. (1952). Divergence of perturbation theory in quantum electrodynamics.
Physical Review , 85, 631–632.
Ehrenfest, P. (1911). Welche Züge der Lichtquantenhypothese spielen in der Theorie
der Wärmestrahlung eine wesentliche Rolle? Annalen der Physik , 36, 91–118.
Ehrenfest, P. (1925). Energieschwankungen im Strahlungsfeld oder Kristallgitter
bei Superposition quantisierter Eigenschwingungen. Zeitschrift für Physik , 34,
362–373.
Einstein, A. (1905a). Über einen die Erzeugung und Verwandlung des Lichtes betref-
fenden heuristischen Gesichtspunkt. Annalen der Physik , 17, 132–148.
Einstein, A. (1905b). Zur Elektrodynamik bewegter Körper. Annalen der Physik , 17,
891–921.
Einstein, A. (1909a). Über die Entwicklung unserer Anschauungen über das Wesen
und die Konstitution der Strahlung. Physikalische Zeitschrift, 10, 817–825.
Einstein, A. (1909b). Zum gegenwärtigen Stand des Strahlungsproblems. Physikalische
Zeitschrift, 10, 185–193.
Einstein, A. (1916). Zur Quantentheorie der Strahlung. Mitteilungen der Physikalis-
chen Gesellschaft, Zürich, 18, 47–62.
Einstein, A. (1917). Quantentheorie der Strahlung. Physikalische Zeitschrift, 18,
121–128.
Elitzur, S. (1975). Impossibility of spontaneously breaking local symmetries. Physical
Review D, 12, 3978–3982.
Evans, T. S., Kibble, T. W. B., and Steer, D. A. (1998). Wick’s theorem for non-
symmetric normal ordered products and contractions. Journal of Mathematical
Physics, 39, 5726–5738.
Faddeev, L. D. (1969). The Feynman integral for singular Lagrangians. Theoretical
and Mathematical Physics, 1, 1–13.
Fermi, E. (1929). Sopra l’elettrodinamica quantistica I. Rendiconti d. R. Acc. dei
Lincei , 9, 881–887.
Fermi, E. (1930). Sopra l’elettrodinamica quantistica II. Rendiconti d. R. Acc. dei
Lincei , 12, 431–435.
Fernandez, R., Fröhlich, J., and Sokal, A. D. (1992). Random Walks, Critical
Phenomena, and Triviality in Quantum Field Theory (1st edn). Springer Press,
New York.
Feynman, R. P. (1948). Space-time approach to non-relativistic quantum mechanics.
Reviews of Modern Physics, 20, 367–387.
Feynman, R. P. (1949a). Space-time approach to quantum electrodynamics. Physical
Review, 76, 769–789.
Feynman, R. P. (1949b). The theory of positrons. Physical Review , 76, 749–759.
Fock, V. A. (1933). Zur Theorie des Positrons. Doklady Akademii Nauk USSR, 6,
265–272.
Freedman, D. Z., Muzinich, I. J., and Weinberg, E. J. (1974). On the energy-
momentum tensor in gauge field theories. Annals of Physics, 87, 95–125.
Freedman, D. Z. and Weinberg, E. J. (1974). The energy-momentum tensor in scalar
and gauge field theories. Annals of Physics, 87, 354–374.
Friedlander, F. G. (1982). Introduction to the theory of distributions (1st edn). Cam-
bridge University Press, Cambridge, UK.
Fröhlich, J., Morchio, G., and Strocchi, F. (1979). Charged sectors and scattering
states in quantum electrodynamics. Annals of Physics, 119, 241–284.
Fujikawa, K. (1980). Path integral for gauge theories with fermions. Physical Review
D, 21, 2848–2858.
Fujikawa, K. (1981). Energy-momentum tensor in quantum field theory. Physical
Review D, 23, 2262–2275.
Furry, W. H. and Oppenheimer, J. R. (1934). On the theory of the electron and
positive. Physical Review, 45, 245–262.
Gasser, J. and Leutwyler, H. (1985). Chiral perturbation theory: Expansions in the
mass of the strange quark. Nuclear Physics B , 250, 465–516.
Gell-Mann, M. and Low, F. (1951). Bound states in quantum field theory. Physical
Review , 84, 350–354.
Georgi, H. (1993). Effective field theory. Annual Review of Nuclear and Particle
Science, 43, 209–252.
Glimm, J. and Jaffe, A. (1987). Quantum Physics: A Functional Integral Point of View
(2nd edn). Springer-Verlag, New York.
Goldstein, H. (2002). Classical Mechanics (3rd edn). Addison-Wesley, Reading, Mas-
sachusetts.
Goldstone, J. (1961). Field theories with superconductor solutions. Nuovo
Cimento, 19, 154–164.
Goldstone, J., Salam, A., and Weinberg, S. (1962). Broken symmetries. Physical
Review , 127, 965–970.
Göpfert, M. and Mack, G. (1982). Proof of confinement of static quarks in
3-dimensional U(1) lattice gauge theory for all values of the coupling constant.
Communications in Mathematical Physics, 82, 545–606.
Gordon, W. (1926). Der Comptoneffekt nach der Schrödingerschen Theorie. Zeitschrift
für Physik , 40, 117–133.
Greenberg, O. W. (1959). Haag’s theorem and clothed operators. Physical
Review , 115, 706–710.
Greensite, J. (2003). The confinement problem in lattice gauge theory. Progress in
Particle and Nuclear Physics, 51, 1–83.
Gribov, V. N. (1978). Quantization of non-abelian gauge theories. Nuclear Physics
B , 139, 1–19.
Gross, D. and Wilczek, F. (1974a). Asymptotically free gauge theories I. Physical
Review D, 8, 3633–3652.
Gross, D. and Wilczek, F. (1974b). Asymptotically free gauge theories II. Physical
Review D, 9, 980–993.
Gross, D. J. and Neveu, A. (1974). Dynamical symmetry breaking in asymptotically
free field theories. Physical Review D, 10, 3235–3253.
Guth, A. (1980). Existence proof of a nonconfining phase in four-dimensional U(1)
lattice gauge theory. Physical Review D, 21, 2291–2307.
Haag, R. (1955). On quantum field theories. Kgl. Danske Videnskab. Selskab, Mat.-
Fys. Medd., 29, 1–37.
Haag, R. (1992). Local Quantum Physics (1st edn). Springer-Verlag, Berlin.
Haag, R., Lopuszanski, J. T., and Sohnius, M. (1975). All possible generators of
supersymmetries of the S-matrix. Nuclear Physics B , 88, 257–274.
Haber, H. and Nelson, A. (eds.) (2004). Particle Physics and Cosmology (TASI 2002).
World Scientific, Singapore.
Hall, D. W. and Wightman, A. S. (1957). A theorem on invariant analytic functions
with applications to relativistic quantum field theory. Kgl. Danske Videnskab.
Selskab, Mat.-Fys. Medd., 31, 1–41.
Heisenberg, W. (1925). Über eine quantentheoretische Umdeutung kinematischer und
mechanischer Beziehungen. Zeitschrift für Physik, 33, 879–893.
Heisenberg, W. (1931). Über Energieschwankungen in einem Strahlungsfeld. Berichte
über die Verhandlungen der Sächsischen Akademie der Wissenschaften zu Leipzig,
mathematische-physikalische Klasse, 83, 3–9.
Heisenberg, W. (1938). Über die in der Theorie der Elementarteilchen auftretende
universelle Länge. Annalen der Physik , 424, 20–33.
Heisenberg, W. (1943a). Die beobachtbaren Grössen in der Theorie der Elemen-
tarteilchen I. Zeitschrift für Physik , 120, 513–538.
Heisenberg, W. (1943b). Die beobachtbaren Grössen in der Theorie der Elemen-
tarteilchen II. Zeitschrift für Physik , 120, 673–702.
Heisenberg, W. (1944). Die beobachtbaren Grössen in der Theorie der Elemen-
tarteilchen III. Zeitschrift für Physik , 123, 93–112.
Heisenberg, W. and Pauli, W. (1929). Zur Quantendynamik der Wellenfelder I.
Zeitschrift für Physik , 56, 1–61.
Heisenberg, W. and Pauli, W. (1930). Zur Quantendynamik der Wellenfelder II.
Zeitschrift für Physik , 59, 168–190.
Henneaux, M. and Teitelboim, C. (1992). Quantization of Gauge Systems (1st edn).
Princeton University Press, Princeton, N.J.
Hepp, K. (1966). Proof of the Bogoliubov-Parasiuk theorem on renormalization.
Communications in Mathematical Physics, 2, 301–326.
Higgs, P. W. (1964). Broken symmetries and the masses of gauge bosons. Physical
Review Letters, 13, 508–509.
Iofa, M. Z. and Tyutin, I. V. (1976). Gauge invariance of spontaneously broken non-
Abelian theories in the Bogolyubov-Parasyuk-Hepp-Zimmermann method. Theoret-
ical and Mathematical Physics, 27, 316–322.
Ioffe, B. L., Fadin, V. S., and Lipatov, L. N. (2010). Quantum Chromodynamics:
Perturbative and Nonperturbative Aspects (1st edn). Cambridge University Press,
Cambridge, UK.
Israel, R. B. (1978). Convexity in the Theory of Lattice Gases (1st edn). Princeton
University Press, Princeton, N.J.
Itzykson, C. and Zuber, J-B. (1980). Quantum Field Theory (1st edn). McGraw-Hill,
Inc, New York.
Jona-Lasinio, G. (1964). Relativistic field theories with symmetry-breaking solutions.
Nuovo Cimento, 34, 1790–1795.
Jordan, P. (1926). Über kanonische Transformationen in der Quantenmechanik: II.
Zeitschrift für Physik , 38, 513–517.
Jordan, P. (1927a). Über eine Neue Begründung der Quantenmechanik. Zeitschrift für
Physik , 40, 809–838.
Jordan, P. (1927b). Über eine Neue Begründung der Quantenmechanik II. Zeitschrift
für Physik , 44, 1–25.
Jordan, P. and Klein, O. (1927). Zum Mehrkörperproblem der Quantentheorie.
Zeitschrift für Physik , 45, 751–765.
Jordan, P. and Pauli, W. (1928). Zur Quantenelektrodynamik ladungsfreier Felder.
Zeitschrift für Physik , 47, 151–173.
Jordan, P. and Wigner, E. (1928). Über das Paulische Äquivalenzverbot. Zeitschrift
für Physik, 47, 631–651.
Jost, R. (1961). Properties of Wightman functions. In Lectures on Field Theory and
the Many-Body Problem (ed. E. R. Caianiello), pp. 127–145. Academic Press, New
York.
Jost, R. (1965). The general theory of quantized fields (1st edn). American Mathemat-
ical Society, Providence, Rhode Island.
Kaiser, D. (2005). Drawing theories apart: the dispersion of Feynman diagrams in
postwar physics (1st edn). University of Chicago Press, Chicago, IL.
Kastler, D., Robinson, D. W., and Swieca, A. (1966). Conserved currents and
associated symmetries; Goldstone’s theorem. Communications in Mathematical
Physics, 2, 108–120.
Kato, T. (1995). Perturbation Theory for Linear Operators (2nd edn). Springer-Verlag,
Berlin-Heidelberg-New York.
Kirchhoff, G. (1859). Über den Zusammenhang zwischen Emission und Absorption von
Licht und Wärme. Monatsberichte der Akademie der Wissenschaft zu Berlin, 12,
783–787.
Kirchhoff, G. (1860). On the relation between the radiating and absorbing powers of
different bodies for light and heat. Philosophical Magazine, 20, 1–21.
Kirschner, M. and Gerhart, J. C. (2005). The Plausibility of Life. Yale University
Press, New Haven, Connecticut.
Klauder, J. R. (1984). Coherent-state Langevin equations for canonical quantum
systems with applications to the quantized Hall effect. Physical Review A, 29,
2036–2047.
Klein, M. (1970). Paul Ehrenfest: The Making of a Theoretical Physicist (1st edn).
North-Holland Publishing Company, Amsterdam.
Klein, O. (1926). Quantentheorie und fünfdimensionale Relativitätstheorie. Zeitschrift
für Physik , 37, 895–906.
Kopper, Christoph and Müller, Volkhard F. (2009). Renormalization of spontaneously
broken SU(2) Yang-Mills theory with flow equations. Reviews of Mathematical
Physics, 21, 781–820.
Kramers, H. A. (1927). La diffusion de la lumière par les atomes. In Atti Cong.
Intern. Fisica (Transactions of the Volta Centenary Congress, Como), Volume 2,
pp. 545–557.
Kuhn, T. (1978). Black-body Theory and the Quantum Discontinuity 1894–1912 (1st
edn). Oxford University Press, Oxford.
Kuramashi, Y. (2008). PACS-CS results for 2+1 flavor lattice QCD simulation on and
off the physical point. In Proceedings of Science: Lattice 2008, pp. 18–31.
Kuti, J. and Shen, Y. (1988). Supercomputing the effective action. Physical Review
Letters, 60, 85–88.
Landau, L. and Peierls, R. (1983). Extension of the uncertainty principle to relativistic
quantum theory. In Quantum Theory and Measurement (eds. J. A. Wheeler and
W. H. Zurek), Chapter IV.1, pp. 465–476. Princeton University Press, Princeton,
New Jersey.
Leech, J. W. (1965). Classical Mechanics (2nd edn). Methuen and Co, London.
LeGuillou, J. C. and Zinn-Justin, J. (1980). Critical exponents from field theory.
Physical Review B, 21, 3976–3998.
Lehmann, H., Symanzik, K., and Zimmermann, W. (1955). Zur Formulierung quan-
tisierter Feldtheorien. Nuovo Cimento, 1, 205–225.
Lipatov, L. N. (1977). Divergence of the perturbation-theory series and pseudoparti-
cles. JETP Letters, 25, 104–107.
London, F. (1926). Winkelvariable und kanonische Transformationen in der Undula-
tionsmechanik. Zeitschrift für Physik , 40, 193–210.
Lorentz, H. A. (1916). Les théories statistiques en thermodynamique: conférences faites
au Collège de France, novembre 1912. Teubner, Leipzig, Berlin.
Lowenstein, J. H. (1976). BPHZ renormalization. In Erice Lectures 1975: Proceedings
of the NATO Advanced Study Institute, pp. 95–160. D. Reidel Publishing Company.
Majorana, E. (1937). Teoria simmetrica dell’elettrone e del positrone. Nuovo
Cimento, 14, 171–184.
Manohar, A. V. and Wise, M. B. (2000). Heavy Quark Physics (1st edn). Cambridge
University Press, Cambridge, UK.
Maxwell, J. C. (1873). A Treatise on Electricity and Magnetism (1st edn). Clarendon
Press, Oxford.
Maxwell, J. C. (1875). On the dynamical evidence of the molecular constitution of
bodies. Nature, 11, 357–374.
Messiah, A. (1966). Quantum Mechanics: Volume 2 (1st edn). North Holland Pub-
lishing Company, Amsterdam.
Miller, Arthur I. (1994). Early Quantum Electrodynamics: A Source Book (1st edn).
Cambridge University Press, Cambridge, UK.
Montvay, I. and Münster, G. (1994). Quantum Fields on a Lattice (1st edn). Cam-
bridge University Press, Cambridge, UK.
Mueller, A. H. (1981). Perturbative QCD at high energies. Physics Reports, 73,
237–368.
Newton, R. G. (1966). Scattering Theory of Waves and Particles (1st edn). McGraw-
Hill, New York.
Noether, E. (1971). Invariant variation problems (translation by M. A. Tavel). Transport
Theory and Statistical Physics, 1, 183–207.
O’Raifeartaigh, L., Wipf, A., and Yoneyama, H. (1986). The constraint effective
potential. Nuclear Physics B, 271, 653–680.
Osterwalder, K. and Seiler, E. (1978). Gauge field theories on a lattice. Annals of
Physics, 110, 440–471.
Pais, A. (1982). Subtle is the Lord: The Science and Life of Albert Einstein (1st edn).
Oxford University Press, Oxford.
Pais, A. (1991). Niels Bohr’s Times, in Physics, Philosophy and Polity (1st edn).
Oxford University Press, Oxford.
Parisi, G. (1977). Asymptotic estimates in perturbation theory with fermions. Physics
Letters B , 66, 382–385.
Parisi, G. (1983). On complex probabilities. Physics Letters B , 131, 393–395.
Pauli, W. (1933). Die allgemeinen Prinzipien der Wellenmechanik. In Handbuch der
Physik (2nd edn), Volume 24,1, pp. 83–272. Springer Verlag, Berlin.
Pauli, W. (1940). The connection between spin and statistics. Physical Review , 58,
716–722.
Pauli, W. and Weisskopf, V. (1934). Über die Quantisierung der skalaren relativistis-
chen Wellengleichung. Helvetica Physica Acta, 7, 709–731.
Peccei, R. D. (1988). Discrete and global symmetries in particle physics. In Broken
Symmetries: Proceedings of the 37 International Universitäts Wochen für Kern- und
Teilchenphysik, pp. 1–50.
Peierls, R. E. (1934). The vacuum in Dirac’s theory of the positive electron. Proceedings
of the Royal Society A, London, 146, 420–441.
Planck, M. (1899). Über irreversible Strahlungsvorgänge: Fünfte Mitteilung (Schluss).
Berliner Berichte, 440–480.
Planck, M. (1900a). Über irreversible Strahlungsvorgänge. Annalen der Physik , 1,
69–122.
Planck, M. (1900b). Zur Theorie des Gesetzes der Energieverteilung im Normalspek-
trum. Verhandlungen der Deutschen Physikalischen Gesellschaft , 2, 237–245.
Polchinski, J. (1984). Renormalization and effective Lagrangians. Nuclear Physics
B , 231, 269–295.
Polyakov, A. M. (1974). Particle spectrum in quantum field theory. JETP Letters, 20,
194–195.
Polyakov, A. M. (1977). Quark confinement and topology of gauge theories. Nuclear
Physics B , 120, 429–458.
Proca, A. (1936). Sur la théorie ondulatoire des électrons positifs et négatifs. Journal
de Physique et le Radium, 7, 347–353.
Rayleigh, Lord (1900). Remarks upon the law of complete radiation. Philosophical
Magazine, 49, 539–540.
Reeh, H. and Schlieder, S. (1961). Bemerkungen zur Unitäräquivalenz von
Lorentzinvarianten Feldern. Nuovo Cimento, 22, 1051–1068.
Rey, S-J. (1989). Axion dynamics in wormhole background. Physical Review D, 39,
3185–3189.
Rudin, W. (1966). Real and Complex Analysis (1st edn). McGraw-Hill, Inc, New York.
Ruelle, D. (1962). On the asymptotic condition in quantum field theory. Helvetica
Physica Acta, 35, 147–163.
Sakurai, J. J. (1964). Invariance Principles and Elementary Particles (1st edn).
Princeton University Press, Princeton, New Jersey.
Salpeter, E. E. and Bethe, H. A. (1951). A relativistic equation for bound-state
problems. Physical Review , 84, 1232–1242.
Schroer, B. (1963). Infrateilchen in der Quantenfeldtheorie. Fortschritte der
Physik , 11, 1–32.
Schweber, Silvan S. (1994). QED and the men who made it: Dyson, Feynman,
Schwinger, and Tomonaga (1st edn). Princeton University Press, Princeton,
New Jersey.
Schwinger, J. (1948a). On quantum electrodynamics and the magnetic moment of the
electron. Physical Review , 73, 416–417.
Schwinger, J. (1948b). Quantum electrodynamics. I. A covariant formulation. Physical
Review , 74, 1439–1461.
Schwinger, J. (1951). On gauge invariance and vacuum polarization. Physical
Review, 82, 664–679.
Seiler, E. (1978). Upper bound on the color-confining potential. Physical Review D, 18,
482–483.
Simon, B. (1974). The P(φ)₂ Euclidean (Quantum) Field Theory (1st edn). Princeton
University Press, Princeton, New Jersey.
Smith, P. W. (1972). Mode selection in lasers. Proceedings of the IEEE , 60, 422–440.
Speer, E. R. (1973). Renormalization and Ward identities using complex space-time
dimension. Journal of Mathematical Physics, 15, 1–6.
Stevenson, P. M. (1981). Optimized perturbation theory. Physical Review D, 23,
2916–2944.
Streater, R. F. and Wightman, A. S. (1978). PCT, Spin and Statistics, and All That
(2nd edn). W. A. Benjamin, New York.
Swanson, M. S. (1993). Phase-space anomalies and canonical transformations. Physical
Review A, 47, R2431–R2434.
Symanzik, K. (1960). On the many-particle structure of Green’s functions in quantum
field theory. Journal of Mathematical Physics, 1, 249–273.
Symanzik, K. (1970). Small distance behaviour in field theory and power counting.
Communications in Mathematical Physics, 18, 227–246.
’t Hooft, G. (1974). Magnetic monopoles in unified gauge theories. Nuclear Physics
B , 79, 276–284.
’t Hooft, G. (1976). Symmetry breaking through Bell-Jackiw anomalies. Physical
Review Letters, 37, 8–11.
Taylor, J. C. (1976). Gauge Theories of Weak Interactions (1st edn). Cambridge
University Press, Cambridge, UK.
Taylor, J. R. (1966). Cluster decomposition of S-matrix elements. Physical
Review , 142, 1236–1245.
’t Hooft, G. (1971). Renormalizable Lagrangians for massive Yang-Mills fields. Nuclear
Physics B , 35, 167–188.
Titchmarsh, E. C. (1948). Introduction to the Theory of Fourier Integrals (2nd edn).
Clarendon Press, Oxford.
Toll, J. S. (1956). Causality and the dispersion relation: Logical foundations. Physical
Review , 104, 1760–1770.
Tomonaga, S. (1946). On a relativistically invariant formulation of the quantum theory
of wave fields. Progress of Theoretical Physics, 1, 1–13.
van der Kolk, C. M. and de Kerf, E. A. (1975). A simplified proof of the Bogoliubov-
Parasiuk theorem. Physica, 80A, 339–359.
von Neumann, J. (1996). Mathematical Foundations of Quantum Mechanics (2nd edn).
Princeton University Press, Princeton, New Jersey.
Weinberg, S. (1960). High-energy behavior in quantum field theory. Physical Review,
118, 838–849.
Weinberg, S. (1964a). Feynman rules for any spin. Physical Review , 133, 1318–1332.
Weinberg, S. (1964b). Systematic solution of multiparticle scattering problems. Phys-
ical Review , 133, 232–256.
Weinberg, S. (1965). Infrared photons and gravitons. Physical Review , 140, 516–524.
Weinberg, S. (1967). A model of leptons. Physical Review Letters, 19, 1264–1266.
Weinberg, S. (1968). Nonlinear realizations of chiral symmetry. Physical Review, 166,
1568–1577.
Weinberg, S. (1972). Gravitation and Cosmology (1st edn). John Wiley and Sons, New
York. See Chapter 2, Section 13.
Weinberg, S. (1973). Perturbative calculations of symmetry breaking. Physical Review
D, 7, 2887–2910.
Weinberg, S. (1995a). The Quantum Theory of Fields: Volume 1 (1st edn). Cambridge
University Press, Cambridge, UK.
Weinberg, S. (1995b). The Quantum Theory of Fields: Volume 3 (1st edn). Cambridge
University Press, Cambridge, UK.
Weisskopf, V. (1939). On the self-energy and the electromagnetic field of the electron.
Physical Review , 56, 72–85.
Weisskopf, V. S. (1936). Über die Elektrodynamik des Vakuums auf Grund der Quan-
tentheorie des Elektrons. Kongelige Danske Videnskabernes Selskab, Mathematisk-
fysiske Meddelelser , 14, 3–39.
Weyl, H. (1929). Elektron und Gravitation: I. Zeitschrift für Physik , 56, 330–352.
Wichmann, E. H. and Crichton, J. H. (1963). Cluster decomposition properties of the
S matrix. Physical Review , 132, 2788–2799.
Wick, G. C., Wightman, A. S., and Wigner, E. P. (1952). The intrinsic parity of
elementary particles. Physical Review , 88, 101–105.
Wightman, A. (1956). Quantum field theory in terms of vacuum expectation values.
Physical Review , 101, 860–866.
Wigner, E. P. (1939). On unitary representations of the inhomogeneous Lorentz group.
Annals of Mathematics, 40, 149–204.
Wigner, E. P. (1959). Group Theory and its Applications to the Quantum Mechanics
of Atomic Spectra. Academic Press.
Wigner, E. P. (1979a). Events, laws of nature and conservation laws. In Symmetries
and Reflections. Ox Bow Press, Woodbridge, Connecticut.
Wigner, E. P. (1979b). Symmetry and conservation laws. In Symmetries and Reflec-
tions. Ox Bow Press, Woodbridge, Connecticut.
Wilson, K. G. (1969). Non-Lagrangian models of current algebra. Physical
Review , 179, 1499–1512.
Wilson, K. G. (1971). Renormalization group and critical phenomena. Physical Review
B , 4, 3174–3183.
Wilson, K. G. (1974). Confinement of quarks. Physical Review D, 10, 2445–2459.
Wilson, K. G. and Kogut, J. (1974). The renormalization group and the epsilon
expansion. Physics Reports, 12, 75–199.
Wüthrich, A. (2010). The Genesis of Feynman Diagrams (1st edn). Springer
(Archimedes series), Dordrecht-Heidelberg-London-New York.
Yang, C. N. and Mills, R. L. (1954). Conservation of isotopic spin and isotopic gauge
invariance. Physical Review , 96, 191–195.
Yennie, D. R., Frautschi, S. C., and Suura, H. (1961). The infrared divergence
phenomena and high-energy processes. Annals of Physics, 13, 379–452.
Ziman, J. M. (1964). Principles of the Theory of Solids (1st edn). Cambridge Univer-
sity Press, Cambridge, UK.
Zimmermann, Wolfhart (1968). The power counting theorem for Minkowski metric.
Communications in Mathematical Physics, 11, 1–8.
Zimmermann, W. (1969). Convergence of Bogoliubov’s method of renormalization in
momentum space. Communications in Mathematical Physics, 15, 208–234.
Zimmermann, W. (1970). Local operator products and renormalization. In Bran-
deis Lectures on Elementary Particles and Quantum Field Theory, Volume 1,
pp. 395–582. MIT Press.
Zinn-Justin, J. (1989). Quantum Field Theory and Critical Phenomena (1st edn).
Oxford University Press, Oxford.
Index

N-particle irreducibility, 342
N-particle-reducibility
  threshold singularities from, 358
β function, 552, 677, 678, 704
’t Hooft (ξ-)gauges, 526

accidental symmetries, 490
action functional, 43, 428
action-at-a-distance effects, 124
adjoint matrix field, 531
almost local fields, 256
amputated Green function, 343
analyticity
  connection to causality, 164
  of forward scattering amplitude, 168
  of S-matrix, 164
anomalous dimension, 677, 706
  of a composite operator, 690
anomalous magnetic moment (electron), 307
anomaly, 85, 487
  axial (chiral), 544
  Fujikawa derivation, 545
  gravitational, 552
  trace (of energy-momentum tensor), 551
antiunitary operators, 78
area law, 743
asymptotic completeness, 243, 267
asymptotic condition, 278
asymptotic expansion, 377
asymptotic freedom, 301, 658, 704
  in φ³₆-theory, 678
  in QCD, 704
asymptotic perturbative expansion, 324
axial current, 545
  anomalous divergence, 551
axial gauge, 522
axiomatic quantum field theory, 253

Baker–Campbell–Hausdorff formula, 129, 310
bare coupling, 324
bare mass, 323
baryon number violation, 490
Bethe–Salpeter equation, 390
  in QED, 398
  Schrödinger equation as non-relativistic limit of, 398
Bethe–Salpeter kernel, 358
Bethe–Salpeter wavefunction, 387
Bjorken limit, 692
Bloch–Nordsieck theory, 724
Bogoliubov transformation, 361
Bogoliubov–Parasiuk recursion formula, 624
Bohr–Rosenfeld measurement analysis, 222
Boltzmann, Ludwig, 2, 5, 9
Borel transform, 401
Borel resummation, 385
Borel summability, 383, 401
Born, Max, 18, 162
bound state poles, 387
bound states, 386
  in φ⁴-theories, 392
  in massless gauge theories, 396
  non-relativistic, 394
BPHZ renormalization scheme, 626
  equivalence to Lagrangian counterterms, 640

Callan–Symanzik equation, 676
  for Wilson coefficient function, 690
canonical formalism
  operator version, 421
  path integral version, 424
canonical scalar field, 155
cavity (blackbody) radiation, 3, 9
central charges (SUSY), 450
charge conservation
  from phase invariance of Lagrangian, 439
charge-conjugation operation, 474
chiral effective Lagrangian, 603
chiral field, 176
chiral symmetry, 545
  in QCD, 600
chirality, 198
classical field φ_cl
  as extremum of classical action, 348
  instanton solutions for, 383
classical limit
  bosonic fields, 223
  non-existence for fermionic fields, 221
clustering property, 58
  momentum smoothness condition for, 137
coherent states, 228
  from classical current source, 716
  of a scalar field, 231
Coleman–Mandula theorem, 444
color confinement, 730
complementarity
  in quantum field theory, 220
  in quantum mechanics, 219
composite operators, 664
  minimally subtracted, 666
  oversubtracted, 666
composite operators (cont.)
  soft vs. hard, 673
  subtraction degree of, 665
conformal symmetry, 436
constrained Hamiltonian, 511
constructive quantum field theory, 253
contraction of fields, 313
convex function, 498
convexity
  of Γ[φ], 499
  of W[j], 498
cooling procedure (for gauge configurations), 750
Cooper pairs
  as threshold bound states, 395
cosmological constant problem, 322, 579
Coulomb gauge, 522
covariant derivative
  in superspace, 453
  in abelian gauge theory, 527
  in non-abelian gauge theory, 531
covariant fields, 172
  for any spin, 177
  transformation under charge conjugation, 476
  transformation under time reversal, 478
  transformations under parity, 473
covariant normalization of states, 111
creation–annihilation operators, 144
  commutation algebra, 145
  Lorentz transformation of, 146
crossing symmetry, 213, 288
cyclicity of the vacuum, 259

dangerous δ-functions, 144
deep-inelastic scattering, 691
delta expansion, 403
derivatively-coupled theories, 414
detector resolution, 724
  role in infrared catastrophe, 726
deWitt–Faddeev–Popov (DFP) determinant
  in abelian gauge theory, 523
  in general constrained system, 519
  in non-abelian gauge theory, 537
dilatation symmetry, 436
dimensional regularization, 591
Dirac 4-spinor (bispinor), 187
Dirac adjoint, 190
Dirac equation
  for spin-½ field, 189
  for spinors, 188
Dirac field, 186
  Lorentz transformation properties, 194
Dirac Hamiltonian (free), 191
Dirac matrices, 188
  algebra satisfied by, 189
Dirac propagator, 213, 216
  non-relativistic limit, 397
Dirac, Paul Adrien Maurice, 31
  birth of quantum electrodynamics, 32
  constrained Hamiltonian systems, 513
  derives atomic transition amplitudes, 37
  Dirac equation for the electron, 38
  prediction of the positron, 39
Dirac–Faddeev formula, 518
Diracology, 189
dispersion relation, 294
  for anharmonic oscillator energy, 382
  subtracted, 295
dual Meissner effect, 222, 751
Dyson, Freeman, 56, 357, 377

effective action functional, 353
  generates 1PI graphs, 356
  in spontaneous symmetry-breaking, 496
effective field theories, 595
  chiral Lagrangians, 600
  heavy particle decoupling, 599
  HQET, 598
  non-relativistic (NREFT), 597
  SCET, 598
effective Lagrangian, 569
  coupling constant space of, 580
  scale dependence of, 580
  Wilsonian, 572
effective potential, 495
  constraint version, 507
  energetic interpretation of, 502
Ehrenfest, Paul, 12, 20
Einstein, Albert, 1
  energy fluctuations in cavity radiation, 15, 16
  special relativity, 3
electromagnetic form factor, 302
elementary particle
  as structureless entity, 164
elementary vs composite particles, 299
Elitzur’s theorem, 553
energy-momentum tensor, 430
  canonical, 433
  new, improved version, 434
equal-time anticommutation relations, 45
equal-time commutation relations (ETCR’s), 43, 417
  in Heisenberg picture, 417
equal-time commutator (ETCR), 221
essentially non-perturbative processes, 375
Euclidean action functional, 332
Euler derivative, 421
Euler–Lagrange equation, 43, 422

factorized amplitude, 663
  for soft photons, 721
fermionic determinant, 340
Feynman amplitude, 286
Feynman diagrams, 55
Feynman graphs, 316
Feynman parameters, 592
Feynman propagator, 208
Feynman rules, 210
  for non-abelian gauge theories, 541
  for spontaneously broken gauge theories, 559
  in coordinate space, 314
  in momentum space, 319
Feynman, Richard, 55, 59
fine-tuning (of relevant operators), 594
first-class constraint, 512
Fock space, 47, 112
Fock, Vladimir, 47
forward scattering, 166
forward tube, 260
  extended, 260
  extended permuted, 261
full propagator, 292
functional determinant, 335
Furry, Wendell, 47

gauge field propagator, 541
gauge orbit, 515
gauge symmetry, 200
gauge theory
  abelian, 519
  abelian path integral (ξ-gauge), 526
  Euclidean functional integral, 543
  Euclidean version, 542
  in classical mechanics, 510
  non-abelian (Yang–Mills), 529
  non-abelian path integral (ξ-gauge), 540
  with SU(2)×U(1) gauge group, 555
  with SU(N) gauge group, 529
  with U(1) gauge group, 528
gauge transformation, 201
  abelian, 519
  non-abelian, 529
  non-abelian, infinitesimal version, 533
  turntable model, 510
gauge-fixing
  abelian gauge theory, 522
  non-abelian, 535
  turntable model, 515
  unitary gauge, 554
gauge-invariance
  abelian gauge theory (QED), 527
  turntable model, 511
gauge-invariant observables, 530
Gauss’s Law, 520
  non-abelian, 533
Gell-Mann/Low formula, 245
Generalized Optical theorem, 105
generating functional, 137
  convexity of, 352
  for 1PI diagrams, 351
  for connected Green functions, 343
  for connected S-matrix elements, 137
  for Euclidean Schwinger functions, 332
  for Feynman Green functions, 331
ghost fields, 540
global symmetries, 487
  fragility of, 490
Goldstone boson, 235, 488, 495
Goldstone–Salam–Weinberg theorem, 492
Grand Unified Theories (GUTs), 564
Grassmann algebra, 338
Grassmann calculus
  derivatives, 339
  integrals, 339
Grassmann Majorana spinor, 449
Gribov copies, 536

Haag’s theorem, 253, 254, 308, 359
  axiomatic proof, 366
  in supersymmetric theories, 369
  toy example, 360
Haag, Rudolf, 256, 262, 265
  asymptotic theorem, 269, 275
Haag–Ruelle scattering theory, 268
hadronic electromagnetic current, 302
Hall–Wightman theorem, 260, 368, 481
Hamilton’s principle, 88
  modified version, 91
Hamiltonian equations
  in field theory, 419
Heisenberg field
  as interpolating operator, 250
Heisenberg field equation, 420
Heisenberg field operator, 242
Heisenberg, Werner, 18, 43, 70
  introduces S-matrix, 53
helicity states, 116, 118, 120, 195
  for spin-½ particles, 197
hierarchy problem, 578
Higgs model, 553
Higgs particle, 556
Higgs phenomenon, 494
higher-derivative theories, 427
hole theory, 46
homogeneous Lorentz group (HLG), 108
  finite-dimensional representations of, 173
  isomorphic to SL(2,C), 447
  Lie algebra of, 174, 258
  unitary representative U(Λ), 111
Hurwitz measure, 537
hyperfine splitting, 399

iε factor
  in fermionic path integral, 340
  in Feynman propagator, 209
  in quantum mechanical path-integral, 88
  in scalar field path integral, 329
in(out)-states, 96
inconvenient truth, 324
infraparticles, 722
infrared catastrophe, 713
instantons, 381
  in φ⁴-theory, 383
  in non-Borel theories, 402
interaction picture, 72
  time evolution operator, 75
internal symmetries, 438
interpolating field, 252
  non-uniqueness of, 287
intrinsic parity, 470
irrelevant operator, 576

Jordan, Pascual, 18, 40, 83
  quantized string model, 20

Källén–Lehmann (spectral) representation, 291
  for Feynman two-point function, 292
  for Wightman two-point function, 291
Kirchhoff, Gustav, 4
Klein, Oskar, 40
Klein–Gordon equation, 37, 151, 183
Klein–Gordon operator, 210
Kramers, H. A., 54

Lagrange density, 421
Lagrangian field theory, 43
  Hamiltonian version, 43
Lamb shift, 55, 399
Lamb, Willis Jr, 54
Landau (Lorentz) gauge, 524
lattice field theory, 409
lattice gauge theory, 732
  strong coupling expansions in, 741
Legendre transform, 351
  as functional Fourier transform, 425
  existence of, 423
  supremum definition of, 497
Lehmann–Symanzik–Zimmermann (LSZ) formalism, 281
light-cone expansion, 697
Lippmann–Schwinger equation, 102
little group, 115
  massless, Euclidean group in two dimensions, 119
local covariant fields, 181
local field, 126
local gauge symmetry, 509
  turntable model of, 510
localizability of particles, 160
London, Fritz, 32
Lorentz scalar field, 126
LSZ reduction formula, 283, 284, 286

magnetic monopoles, 748
  ’t Hooft–Polyakov, 750
Majorana equation, 185
Majorana field, 185
marginal operator, 576
mass counterterm, 323
mass gap, 254
mass-shell, 110
matrix mechanics, 71
matter fields, 40
Maxwell, James Clerk, 2, 4
Maxwell–Proca equation, 199
microcausality, 144
minimally supersymmetric Standard Model (MSSM), 564
multi-pion amplitudes
  from chiral Lagrangians, 605
  infrared behavior, 606
Møller operator, 100

N-extended supersymmetry, 450
Nambu–Goldstone symmetry, 488
natural units, 70
neutrino masses, 195
no-hair theorems, 491
Noether current, 430
Noether currents
  dilatation symmetry, 438
  global non-abelian symmetry, 440
  global phase symmetry, 439
  Lorentz symmetry, 435
  translational symmetry, 432
Noether’s theorem, 427, 487
  functional version, 441
non-abelian field tensor, 532
non-Borel theories
  convergent resummation of, 404
  double-well case, 402
non-covariant normalization of states, 112
non-renormalizable operator, 631
normal-ordered product, 158
normal-ordering, 41
number conserving interaction, 154
number-phase uncertainty relation, 227

on-mass-shell, 151
on-shell symmetry, 288
one-particle-irreducible (1PI) graphs, 346
operator mixing, 668
operator product expansion (OPE), 67, 663, 679
operator products
  dimensional regularization, 592
  momentum regularization, 589
Oppenheimer, J. Robert, 47
optimized perturbation theory, 403
orthochronous Lorentz transformations, 109

parallel gauge transporter, 733
parity operation, 470
parity transformation, 176
path integral
  fermionic (Grassmann) fields, 336
  for bosonic fields, 325
  for finite temperature partition function, 89
  for ground-state expectation values, 94
  formulation of quantum mechanics, 86
  gives time-ordered products, 92
  Hamiltonian version, 89
  Lagrangian version, 88
  simple harmonic oscillator, 87
Pauli exclusion principle, 146
Pauli, Wolfgang, 43, 48
Peierls, Rudolf, 50
persistent interactions, 322
perturbation theory
  divergence of, 376
  large order behavior, 379
perturbatively non-perturbative processes, 375, 386 renormalized field φR , 633
perturbatively renormalizable set of operators, 587 renormalized perturbation theory
perturbatively renormalizable theory, 636 (non-)overlapping divergences, 621
phase operator, 225 counterterms in, 612, 635
Planck’s constant, 13 renormalization parts, 620
Planck, Max, 1, 2 role of symmetry in, 645
blackbody spectrum, 13 subtractions in, 612
quantization of energy, 8 Taylor subtraction operator tΓ , 616
plaquette variable, 735 Zimmermann forest formula, 626
Poincaré group, 110 renormalons, 403
composition law, 111 restricted Lorentz group, 109
Lie algebra, 258, 444 rollover solution, 381
Poisson bracket, 514 Ruelle Clustering theorem, 264
power-counting running coupling, 703
in Minkowski space, 695
primary constraint, 510 S-matrix, 53, 96
principle of minimal sensitivity (PMS), 404 cluster decomposition principle, 132
proper Lorentz transformations, 109 clustering, in Haag–Ruelle theory, 277
proper vertex, 354 connected parts, 133
pseudo-Goldstone boson, 237, 489, 553 gauge-invariance of, 528
infrared divergences in, 719
quantum chromodynamics (QCD), 530 LSZ formula for, 286
Lagrangian for, 532 parity invariance of, 471
quantum electrodynamics (QED), 30, 31, 45, 307, relativistic invariance, 121
357 vanishes in QED, 718
divergence of perturbation theory in, 611 scalar field
early difficulties, 46 Lorentz transformation property, 126
infrared cancellations in, 728 scalar fields
Lagrangian for, 527 translation property, 129
non-relativistic bound states in, 396 scale dimension (of a field), 437
quark (color) confinement, 564 Schrödinger equation
quark condensate, 601 as non-relativistic limit of Bethe–Salpeter
quark confinement, 729 equation, 398
semiclassical theory of, 745 Schrödinger, Erwin, 70
Schwartz test function, 256
Rabi, Isidore R., 54 Schwinger functions, 261
Rayleigh–Jeans Law, 13 Schwinger model, 731
regularization techniques, 588 Schwinger term, 127, 415
relativistic wave equations, 171 Schwinger, Julian, 56
relevant operator, 576 seagull vertex, 416
renormalizability, 159 second quantization, 42, 144
renormalizable ξ-gauges, 563 self-conjugate field/particle, 156
renormalizable operator, 631 self-energy divergence, 46
renormalization conditions, 612, 635 Shelter Island conference, 55
renormalization group, 66 sigma model
renormalization group (RG), 569, 581 linear, 603
fixed point, 706 non-linear, 605
for renormalized amplitudes, 699 sign problem, 411
renormalization group equation (RGE), 570, 581 Slavnov–Taylor identities, 647
for 1PI amplitudes, 701 smeared fields, 220, 256
renormalization group flow space-like anticommutativity, 181
of effective Lagrangians, 584 space-like commutativity, 126
renormalization scale μ spectral function, 291
in dimensional renormalization, 641 asymptotic behavior, 293
in momentum subtraction scheme, 641 spin states, 116
renormalization scheme, 612 Spin-Statistics theorem, 45, 59, 484
BPHZ, 636 spontaneous symmetry-breaking (SSB), 234, 488
Euclidean subtraction point, 641 clustering failure in mixed states, 504
on-shell, 641 global symmetries, 492
renormalization theory, 610 in local gauge theories, 552
Wilsonian approach, 652 tests for, 495
renormalized coupling, 611 stability constraint, 234
stable vs unstable particles, 298 Ward–Takahashi identities, 441, 647
Standard Model, 57, 68, 184, 186, 198, 228, 299, chiral, 547
300, 307, 370, 400, 427, 489, 490, 505, 564 wavefunction renormalization constant, 289, 633
anomaly cancellation in, 551 weak hypercharge, 556
electroweak sector, 558 weak (strong) convergence, 98
strong interaction sector, 530 Weinberg power-counting theorem, 613
static quark potential, 740 applicability in Minkowski space, 618
string tension, 743 Weinberg, Steven, 59
super-Poincaré algebra, 444 clustering theorem, 140
super-renormalizable operator, 631 Weinberg–Salam theory, 555
superfields, 451 Weisskopf, Victor, 46, 48, 54
component fields, 453 electron self-energy, 51
chiral, 456 Wess–Zumino model, 463
scalar, 452 Weyl equation, 196
superpotential Weyl field, 186, 195
contribution to energy-momentum tensor, Wick rotation, 332
433 Wick’s theorem, 309
in SUSY, 461 functional form, 313
superselection rules, 85 path integral version, 330
superspace, 447 proof, 310
supersymmetry (SUSY), 63, 444 Wick, Gian-Carlo, 85
fermi-bose degeneracy, 451 Wien, Wilhelm, 6
implies vanishing vacuum energy, 463 Wightman axioms, 253
supersymmetry algebra, 450 field axioms, 255
supersymmetry transformations, 446 particle–field duality axioms, 267
state axioms, 254
T-matrix, 103 Wightman functions, 256
TCP theorem, 59, 478 analyticity properties, 259
tempered distribution, 257 clustering properties, 264
theories of everything (TOE), 67 Wightman reconstruction theorem, 268
Three-man-paper (Drei-Männer-Arbeit), 18, 19 Wightman, Arthur, 85
threshold singularities, 389 axiomatic formulation of field theory, 253
time reversal operation, 79, 477 Wigner rotation, 117, 178
Tomonaga, Sin-itiro, 56 Wigner, Eugene, 42
Two-man-paper, 18 irreducible representations of the Poincaré
two-particle-irreducible (2PI) graphs group, 115
as Bethe–Salpeter kernels, 358 role of symmetry, 62
in threshold bound states, 389 unitarity-antiunitarity theorem, 78
in Wilson OPE, 682 Wigner–Weyl symmetry, 488
Wilson coefficient functions, 679
ultralocal commutator, 127 Wilson lattice action, 736
ultraviolet catastrophe, 14 Wilson line, 738
ultraviolet divergences, 46 Wilson loop, 740
in products of Feynman propagators, 318 monopole contribution in three-dimensional
ultraviolet fixed point, 705 gauge theory, 749
unitarily inequivalent spaces, 363 spin-wave contribution in three-dimensional
gauge theory, 749
vacuum fluctuations, 322 Wilson, Kenneth, 66
vacuum polarization, 50
vector fields, 198 Yang–Mills Lagrangian, 532
transformation under charge conjugation, 476
Villain lattice action, 736 Zimmermann effective Lagrangian, 671
von Neumann, John, 70, 86 Zimmermann identity, 669
