Unifying Typical Entanglement and Coin Tossing: On Randomization in Probabilistic Theories


Markus P. Müller,1 Oscar C. O. Dahlsten,2,3 and Vlatko Vedral2,3,4

1 Perimeter Institute for Theoretical Physics, 31 Caroline Street North, Waterloo, ON N2L 2Y5, Canada
2 Centre for Quantum Technologies, National University of Singapore, 3 Science Drive 2, Singapore 117543, Singapore
3 Clarendon Laboratory, University of Oxford, Parks Road, Oxford OX1 3PU, United Kingdom
4 Department of Physics, National University of Singapore, 2 Science Drive 3, Singapore 117542, Singapore
(Dated: August 1, 2011)

arXiv:1107.6029v1 [quant-ph] 29 Jul 2011

It is well-known that pure quantum states are typically almost maximally entangled, and thus have
close to maximally mixed subsystems. We consider whether this is true for probabilistic theories
more generally, and not just for quantum theory. We derive a formula for the expected purity of a
subsystem in any probabilistic theory for which this quantity is well-defined. It applies to typical
entanglement in pure quantum states, coin tossing in classical probability theory, and randomization
in post-quantum theories; a simple generalization yields the typical entanglement in (anti)symmetric
quantum subspaces. The formula is exact and simple, only containing the number of degrees of
freedom and the information capacity of the respective systems. It allows us to generalize statistical
physics arguments in a way which depends only on coarse properties of the underlying theory. The
proof of the formula generalizes several randomization notions to general probabilistic theories. This
includes a generalization of purity, contributing to the recent effort of finding appropriate generalized
entropy measures.

Contents

I. Introduction
II. Main results and overview
   A. Purity
   B. State space dimension K and capacity N
   C. Unifying typical entanglement and coin tossing
   D. Typical entanglement of symmetric and antisymmetric states
   E. Statistical physics and the second law
   F. Simple proof of the quantum case
III. Mathematical framework and proofs
   A. General probabilistic theories and the Bloch representation
   B. Definition and properties of purity
   C. Comparison to existing entropy measures
   D. Irreducible subgroups and generalized Paulis
   E. Bipartite systems: local purity as an entanglement measure
   F. Classical subsystems and capacity
   G. GG'-invariant faces: entanglement in symmetric subspaces
   H. Theories which are not locally tomographic
IV. Summary and outlook
A. Irreducibility of the Clifford group
B. Purity in boxworld
References

I. INTRODUCTION

It is increasingly recognized that entanglement is ubiquitous, as opposed to a rare resource that is difficult to create.
In fact, most unitary time evolutions (in a sense to be made precise later) generate a large amount of entanglement
within a closed quantum system. This turns out to be equivalent to saying that pure quantum states are typically almost
maximally entangled.
This striking observation was already made decades ago, see e.g. [1–4], although it was initially phrased as subsystem entropy typically being maximal; this was before subsystem entropy became the canonical measure of entanglement for pure states. The observation and its subsequent refinements have helped us understand more about
entanglement and its role in information processing [1–14] as well as statistical mechanics [15–23]. For example,
bearing the above in mind, it is not surprising that the difficulty for an experimenter trying to perform e.g. quantum
computing is not to generate entanglement but to control what is entangled with what, and in particular to avoid
entanglement between the experiment and the environment, as that will increase the entropy of the system.
Here we show that this observation is an instance of a more universal phenomenon which appears in a wide class
of probabilistic theories: systems typically randomize locally if a global transformation is applied. More specifically,
the expected amount of randomization can be expressed by a simple formula, which is universally valid for any
probabilistic theory satisfying a small set of requirements. The formula describes classical coin tossing as well as
typical entanglement in quantum and possible post-quantum theories, and has a particularly simple form which
does not depend on the details of the theory.
We work in the framework of generalized probabilistic theories, also known as the convex framework. This
amounts to taking an operational pragmatic point of view that the physical content of a theory is the predictions
of outcome statistics, conditional on the experimental settings. A wide range of theories can be formulated in this
framework, including quantum theory and classical probability theory.
We ask how pure or mixed subsystems tend to be in such theories, if the global state is drawn randomly (possibly
subject to some constraints). To make the question well-defined, we add some additional restrictions on the set
of theories, including crucially that all pure states are connected by reversible dynamics. Our main result is to
give a simple expression for the expected value of the purity of a subsystem in such probabilistic theories. The
expression shows that, in certain limits, subsystems are typically close to maximally random. (In the case of pure
global quantum states this is equivalent to saying that the states are typically close to maximally entangled).
Our result unifies several instances of randomization associated with different theories. It also clarifies which
features of the theory are behind this phenomenon and govern the strength with which it occurs. Some of the techniques invented in the proof are in addition interesting in themselves. These include generalizations of the notions
of purity and of Pauli operators to general probabilistic theories. The proof is moreover guided by an intuitive
Heisenberg-picture argument which is different to the standard arguments for the quantum case and arguably adds
to our understanding of the quantum result.
We apply the result to generalize a specific statistical mechanical argument employing typical entanglement which
is related to the second law of thermodynamics. We moreover calculate the typical subsystem purity in a variety of
cases, including typical entanglement of pure symmetric and antisymmetric bipartite quantum states, which is to
our knowledge also a new contribution.
The presentation is divided into two parts. It should be possible for readers not wishing to familiarize themselves
with general probabilistic theories to only read the first part. The first part describes the main results and their
implications with an emphasis on the quantum and classical cases. In the second part we deal with the general
probabilistic case.

II. MAIN RESULTS AND OVERVIEW

One of our main results is an identity which relates simple properties of state spaces to the randomization of
subsystems. Suppose that Alice and Bob hold a bipartite system AB (for example, a composite quantum system
$A \otimes B$). They draw a bipartite state $\omega^{AB}$ at random; it may be a random pure state, or a random mixed state with fixed
purity $P(\omega^{AB})$. Then, the reduced state at Alice, $\omega^A$, will in general be mixed: its expected purity turns out to be

\[
\mathbb{E}\, P(\omega^A) = \frac{K_A - 1}{K_A K_B - 1} \cdot \frac{N_A N_B - 1}{N_A - 1} \cdot P(\omega^{AB}). \tag{1}
\]

The parameters $K_A$ and $N_A$ denote the state space dimension and information carrying capacity of A, respectively
(similarly for B). It will turn out that this simple formula describes the typical amount of entanglement in random pure
quantum states (in particular the fact that most quantum states are almost maximally entangled), and at the same time
classical coin tossing.
Moreover, this identity describes randomization in possible probabilistic theories beyond quantum theory. It
shows that very coarse properties of a theory are sufficient to determine its randomization power: basically, the
ratio between the total number of degrees of freedom K versus the number of perfectly distinguishable states N.
A generalization of this identity gives the expected amount of entanglement in symmetric and antisymmetric subspaces, a quantum result that seems to be new as well.
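As a minimal sketch of how identity (1) can be evaluated, the following Python snippet computes its right-hand side exactly with rational arithmetic. The function names are illustrative and not part of the paper; the relations K = N^2 (quantum theory) and K = N (classical probability theory) are the ones discussed in Subsection II B.

```python
from fractions import Fraction

def expected_local_purity(K_A, K_B, N_A, N_B, global_purity=1):
    """Right-hand side of eq. (1), returned as an exact rational number."""
    prefactor = Fraction(K_A - 1, K_A * K_B - 1) * Fraction(N_A * N_B - 1, N_A - 1)
    return prefactor * Fraction(global_purity)

def quantum_case(N_A, N_B, global_purity=1):
    # Quantum theory: K = N^2 for every system.
    return expected_local_purity(N_A**2, N_B**2, N_A, N_B, global_purity)

def classical_case(N_A, N_B, global_purity=1):
    # Classical probability theory: K = N for every system.
    return expected_local_purity(N_A, N_B, N_A, N_B, global_purity)

# Expected local purity of a qubit (quantum) or a bit (classical)
# coupled to baths of increasing capacity N_B, for a pure global state.
for N_B in (2, 10, 100):
    print(N_B, quantum_case(2, N_B), classical_case(2, N_B))
```

In the quantum case the value shrinks as the bath grows, whereas in the classical case the prefactor cancels and the local purity equals the global purity.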
In this section, we give a self-contained and elementary statement of our results:
• First, we outline how we define the purity P in general (Subsection II A), and we explain the notions of state
  space dimension K and capacity N (in Subsection II B).
• Then we demonstrate how our result unifies typical quantum entanglement and coin tossing (Subsection II C)
  into a single identity, and we apply a simple generalization of this result to compute the average entanglement
  in symmetric and antisymmetric quantum subspaces (Subsection II D).
• In Subsection II E, we apply our results to statistical physics. We argue that the results contribute to a theory-independent understanding of some aspects of thermalization and the second law, which may be applied in
  situations like black hole thermodynamics where the underlying probabilistic theory is not fully known.
• Finally, we give a simple proof of the quantum case in Subsection II F, which also illustrates the main ideas of
  the more general proof in Section III.
The detailed mathematical calculations and results are given in Section III. The main result is Theorem 29, which
contains the exact list of assumptions which must be satisfied for eq. (1) to hold. There is also a more general version
of this result which needs fewer assumptions, but is slightly less intuitive (Theorem 22). An even more general version
concerns random states under constraints (Theorem 34); this one can be used to derive the average entanglement in
(anti)symmetric subspaces in quantum theory.
Section III uses the mathematical framework of general probabilistic theories, as explained for example in [24, 25].
Several results in this section are of independent interest in this framework. In particular, we introduce and analyze a
general-probabilistic notion of purity. Due to its group-theoretic origin, purity satisfies several interesting identities.
It can be seen as an easy-to-compute replacement for entropy, and has several advantages over recently proposed
entropy measures for probabilistic theories (cf. Subsection III C).
The remainder of this section does not assume familiarity with general probabilistic theories.

A. Purity

In quantum theory, the standard notion of purity of a density matrix $\rho$ with eigenvalues $\{\lambda_i\}$ is $\mathrm{Tr}(\rho^2) = \sum_i \lambda_i^2$.
This quantity has an operational meaning as the probability that two successive measurements on two identical
copies of $\rho$ give the same outcome, if one measures in the basis where $\rho$ is diagonal, i.e. in the minimal uncertainty
basis. It is therefore sometimes called the collision probability.
In this work, it will turn out to be extremely useful to rescale this quantity slightly. For density matrices $\rho$ on $\mathbb{C}^n$,
we define

\[
P(\rho) = \frac{n}{n-1}\, \mathrm{Tr}(\rho^2) - \frac{1}{n-1}. \tag{2}
\]

For a qubit (n = 2), this quantity has a nice geometrical interpretation in the Bloch ball: it is the squared length of the
Bloch vector which corresponds to $\rho$. For all dimensions n,

\[
P(\rho) = \begin{cases} 1 & \text{if } \rho \text{ is pure}, \\ 0 & \text{if } \rho \text{ is the maximally mixed state}. \end{cases} \tag{3}
\]

The definition above applies to quantum theory, where states are density matrices on a Hilbert space. However,
we can also consider classical probability theory (CPT) instead, where states are simply probability distributions, $p = (p_1, \ldots, p_n)$. In analogy to the quantum definition, we set

\[
P(p) := \frac{n}{n-1} \sum_{i=1}^{n} p_i^2 - \frac{1}{n-1}. \tag{4}
\]

In CPT, pure states are probability distributions like $p = (1, 0, \ldots, 0)$ (one unity, all others zero), and the maximally
mixed state is $p = \left(\frac{1}{n}, \ldots, \frac{1}{n}\right)$. Therefore, eq. (3) is still valid.
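The rescaled purities of eqs. (2) and (4) share the same functional form, so they can be sketched with a single helper that takes either the eigenvalues of a density matrix or a classical probability vector. The helper name `rescaled_purity` is an illustrative choice, not notation from the paper.

```python
from fractions import Fraction

def rescaled_purity(weights):
    """Eqs. (2)/(4): weights are the eigenvalues of a density matrix
    (quantum) or the entries of a probability vector (classical)."""
    n = len(weights)
    if n < 2:
        raise ValueError("need a system with at least two levels")
    collision_probability = sum(Fraction(w) ** 2 for w in weights)
    return (n * collision_probability - 1) / (n - 1)

pure = [1, 0, 0]
maximally_mixed = [Fraction(1, 3)] * 3
print(rescaled_purity(pure), rescaled_purity(maximally_mixed))
```

Both extreme cases of eq. (3) come out exactly, because the sum of squares is computed with rational numbers.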
How can these definitions be naturally generalized to other possible probabilistic theories? (Readers who are not
so interested in the framework of general probabilistic theories may now safely proceed to the next subsection.) In
the quantum case, the standard notion of purity can be expressed as

\[
\mathrm{Tr}(\rho^2) = \langle \rho, \rho \rangle,
\]

where $\langle X, Y \rangle := \mathrm{Tr}(XY)$ is the Hilbert-Schmidt inner product on the real vector space of Hermitian matrices. This
inner product is very special: it is invariant with respect to unitary transformations $U$, that is, $\langle \mathcal{U}(X), \mathcal{U}(Y) \rangle = \langle X, Y \rangle$, where we used the abbreviation $\mathcal{U}(X) := U X U^\dagger$. This suggests the following strategy for defining purity in
general probabilistic theories: Find an inner product $\langle \cdot, \cdot \rangle$ on the state space which is invariant with respect to all reversible
transformations, and define the purity of a state $\omega$ as $\langle \omega, \omega \rangle$.
To make this idea work, we have to be careful, though: even in quantum theory, the invariant inner product
is not unique in the first place. This is due to the fact that the space of Hermitian matrices $V$ decomposes into
$V = \mathrm{span}\{\mu\} \oplus \hat{V}$, where $\mu = I/n$ is the maximally mixed state, and $\hat{V}$ is the subspace of traceless Hermitian
matrices. These two subspaces are both invariant with respect to unitaries. Thus, group representation theory [26]
tells us that there are infinitely many invariant inner products.
We can fix this problem by subtracting away the maximally mixed state $\mu$: if $\rho$ is a density matrix, we define the
corresponding Bloch vector $\hat\rho := \rho - \mu$. This is an element of the traceless Hermitian matrices $\hat{V}$, and that subspace
cannot be further decomposed into invariant subspaces. Thus, there is a unique inner product $\langle \cdot, \cdot \rangle$ (up to a constant
factor) on $\hat{V}$, and we define $P(\rho) := \langle \hat\rho, \hat\rho \rangle$, rescaling the inner product such that $P(\rho) = 1$ for pure states $\rho$. It turns
out that $\langle X, Y \rangle = \frac{n}{n-1} \mathrm{Tr}(XY)$ for $X, Y \in \hat{V}$, and so this definition agrees with eq. (2).
In Section III, we apply exactly the same construction to define purity in general probabilistic theories (cf. Definition 7), under some assumptions on the probabilistic theory which are necessary to get a useful definition. The
resulting purity notion will, in particular, still satisfy eq. (3).
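The agreement between the Bloch-vector definition and eq. (2) can be checked numerically. The sketch below does so for a qubit with an arbitrarily chosen Bloch vector; the helper names and the example vector are assumptions made for illustration, and matrices are plain nested lists so that only the Python standard library is needed.

```python
def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def purity_eq2(rho):
    """Eq. (2): n/(n-1) * Tr(rho^2) - 1/(n-1)."""
    n = len(rho)
    return (n * trace(mat_mul(rho, rho)).real - 1) / (n - 1)

def purity_bloch(rho):
    """Bloch form: n/(n-1) * Tr(rho_hat^2) with rho_hat = rho - I/n."""
    n = len(rho)
    rho_hat = [[rho[i][j] - (1 / n if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n / (n - 1) * trace(mat_mul(rho_hat, rho_hat)).real

# Qubit state (I + x*sigma_x + y*sigma_y + z*sigma_z) / 2 for an example Bloch vector.
x, y, z = 0.3, 0.4, 0.5
rho = [[(1 + z) / 2, complex(x, -y) / 2],
       [complex(x, y) / 2, (1 - z) / 2]]

print(purity_eq2(rho), purity_bloch(rho), x * x + y * y + z * z)
```

All three printed numbers should coincide up to rounding, reflecting that for a qubit the purity is the squared length of the Bloch vector.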

B. State space dimension K and capacity N

For every state space, we denote by K the number of real parameters required to describe an unnormalized, mixed state,
whereas N denotes the maximal number of (normalized) states that can be perfectly distinguished in a single measurement.
These quantities were to our knowledge first introduced by Wootters and Hardy [27, 28].
As a simple example, consider a single quantum bit (qubit). Arbitrary mixed states of a qubit are described by
density matrices

\[
\rho = \begin{pmatrix} w & y + iz \\ y - iz & x \end{pmatrix},
\]

where normalization $\mathrm{Tr}\,\rho = 1$ demands that $w + x = 1$. Since we are
interested in unnormalized states, we may drop this condition. As a result, unnormalized states are described by
four real parameters w, x, y, z. (Positivity of the matrix adds additional constraints in the form of inequalities, but
the set of matrices fulfilling these conditions is still four-dimensional.) That is, we have K = 4. On the other hand, if
we want to distinguish two states $\rho$, $\sigma$ perfectly in a single-shot measurement, they must be orthogonal. Since there
are at most two mutually orthogonal states on $\mathbb{C}^2$, the capacity of the qubit state space is N = 2.
For all state spaces of quantum theory, capacity N equals the Hilbert space dimension (i.e. the states live on $\mathbb{C}^N$),
and we have the relation $K = N^2$.
In classical probability theory, the state space with N perfectly distinguishable configurations consists of the probability distributions $p = (p_1, \ldots, p_N)$ with $\sum_i p_i = 1$. Dropping normalization, these are N real parameters $p_1, \ldots, p_N$
to specify a state. That is, classical state spaces have K = N, in contrast to quantum theory.
For other general probabilistic state spaces, state space dimension K and capacity N can basically be arbitrary
natural numbers; only the relation $K \geq N$ is always true. We give a rigorous mathematical definition of both
quantities in Section III.

C. Unifying typical entanglement and coin tossing

We will now show that eq. (1) describes both typical entanglement of random pure quantum states and classical
coin tossing at the same time. This will be demonstrated by considering three special cases of eq. (1).

Random pure quantum states. Suppose we draw a pure state $\omega^{AB}$ on AB at random, where A and B are Hilbert
spaces of dimensions $N_A$ and $N_B$ respectively. Recalling eqs. (2) and (3) and the fact that $K = N^2$ in quantum theory,
our main formula (1) yields

\[
\mathbb{E}\, P(\omega^A) = \mathbb{E}\left[ \frac{N_A}{N_A - 1}\, \mathrm{Tr}\left( (\omega^A)^2 \right) - \frac{1}{N_A - 1} \right]
= \frac{N_A^2 - 1}{N_A^2 N_B^2 - 1} \cdot \frac{N_A N_B - 1}{N_A - 1}
= \frac{N_A + 1}{N_A N_B + 1}
\overset{N_B \gg 1}{\approx} \frac{1}{N_B}.
\]

Recall that $P(\omega^A)$ is one if and only if $\omega^A$ is pure, and it is zero if and only if $\omega^A$ is the maximally mixed state. Now
if the bath B becomes large, we see that the expected purity of the local reduced state on A gets closer and closer
to zero, so that $\omega^A$ gets close to maximally mixed. This expresses the fact that random pure quantum states are typically
almost maximally entangled, if the bipartition is taken with respect to a small subsystem.[59]
By typicality, at this point, we mean something very specific. Suppose we want to generate a random state $\omega^{AB}$
with fixed purity $P(\omega^{AB}) =: P_0$ (in this case $P_0 = 1$, since we are interested in random pure states). We do this by
choosing a fixed state $\varphi^{AB}$ with purity $P(\varphi^{AB}) = P_0$, and then applying a random reversible transformation T to it,
getting $\omega^{AB} := T \varphi^{AB}$. The transformation T is picked according to the invariant measure (Haar measure) on the
group of reversible transformations. In the quantum case, the Haar measure on unitaries is also called the circular
unitary ensemble. See [29] for an explicit recipe for how to pick unitaries numerically in this manner.
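The quantum prediction $(N_A + 1)/(N_A N_B + 1)$ can be checked by a small Monte Carlo experiment. The sketch below does not sample Haar-random unitaries as in [29]; instead it uses the standard fact that a normalized vector of independent complex Gaussian amplitudes is distributed like a Haar-random unitary applied to a fixed pure state. Function names, sample sizes and the seed are illustrative assumptions.

```python
import random

def random_pure_state(N_A, N_B, rng):
    """Random pure state on AB, stored as an N_A x N_B table of amplitudes."""
    amps = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N_B)]
            for _ in range(N_A)]
    norm = sum(abs(a) ** 2 for row in amps for a in row) ** 0.5
    return [[a / norm for a in row] for row in amps]

def local_purity(psi):
    """Rescaled purity, eq. (2), of the reduced state on A."""
    N_A = len(psi)
    tr_rho_squared = 0.0
    for i in range(N_A):
        for j in range(N_A):
            # Matrix element (i, j) of the reduced density matrix on A.
            rho_ij = sum(a * b.conjugate() for a, b in zip(psi[i], psi[j]))
            tr_rho_squared += abs(rho_ij) ** 2
    return (N_A * tr_rho_squared - 1) / (N_A - 1)

def average_local_purity(N_A, N_B, samples=4000, seed=1):
    rng = random.Random(seed)
    total = sum(local_purity(random_pure_state(N_A, N_B, rng))
                for _ in range(samples))
    return total / samples

def predicted(N_A, N_B):
    # Eq. (1) with K = N^2 and a pure global state.
    return (N_A + 1) / (N_A * N_B + 1)

print(average_local_purity(2, 3), predicted(2, 3))
```

The empirical average should approach the prediction as the number of samples grows; the agreement is statistical, not exact.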
So far, our formula only expresses the expectation value of the local purity $P(\omega^A)$. To call this value the typical value
one needs to show that the distribution is peaked around the mean. Intuitively this must be the case if the expected
value is close to the minimum allowed, as that could only occur if almost all of the distribution is concentrated close
to the minimum. A simple way to see that this is indeed the case is to apply Markov's inequality [30], which in this
case reads (for x > 1)

\[
\mathbb{P}\left[ P(\omega^A) \geq \frac{x}{N_B} \right]
\overset{N_B \gg 1}{\approx} \mathbb{P}\left[ P(\omega^A) \geq x \cdot \mathbb{E}\, P(\omega^A) \right]
\leq \frac{1}{x}.
\]

This shows that if the mean is small, the probability of $P(\omega^A)$ deviating from it must be small. Stronger results of this
kind can be obtained from measure concentration theorems on Lie groups [31], but we will not pursue this approach
further in this paper.
Random pure classical states. What if we draw random pure bipartite states in classical probability theory? In
this case, purity is defined by eq. (4), and, as discussed above, state space dimension and capacity are equal: K = N.
Thus, our main formula (1) yields for the local marginal $\omega^A = (\omega^A_1, \ldots, \omega^A_{N_A})$ the result

\[
\mathbb{E}\, P(\omega^A) = \mathbb{E}\left( \frac{N_A}{N_A - 1} \sum_{i=1}^{N_A} (\omega^A_i)^2 - \frac{1}{N_A - 1} \right)
= \frac{K_A - 1}{K_A K_B - 1} \cdot \frac{N_A N_B - 1}{N_A - 1} = 1.
\]

All the terms cancel, and we get that the expected local reduced purity equals unity. Since 1 is the maximal value,
this is only possible if in fact $P(\omega^A) = 1$ for all pure states $\omega^{AB}$. In other words: all pure bipartite states have
pure marginals. This expresses the simple fact that there are no entangled states in classical probability theory: all pure
bipartite states are product states.
Before turning to the more interesting example of classical coin tossing, we briefly discuss how classical probability
distributions $\omega^{AB}$ of fixed purity $P(\omega^{AB}) =: P_0$ are drawn at random (in this example, so far, we have the case
$P_0 = 1$). In analogy to the quantum case, we start with an arbitrary fixed bipartite probability distribution $\varphi^{AB}$
with purity $P(\varphi^{AB}) = P_0$. In classical probability theory, the reversible transformations are the permutations, that is,
doubly stochastic matrices containing only ones and zeroes. Now the state $\omega^{AB}$ is defined as $\omega^{AB} := T \varphi^{AB}$, where
T is a random permutation.
Classical coin tossing. We can use identity (1) to describe the process of coin tossing in classical probability theory.
Suppose we start with a coin (that is, a classical bit) whose value (heads or tails; say, heads) is perfectly known
to us. In this case, the (pure) state of the coin is a probability distribution $\varphi^A = (1, 0)$, where 1 is the probability of
heads and 0 the probability of tails. However, the environment B is not known to us: it is in some mixed state $\varphi^B$.
The total state (coin and environment) is thus in a mixed state $\varphi^{AB} := \varphi^A \otimes \varphi^B$, with some purity $P(\varphi^{AB}) =: P_0 < 1$.
Now we toss the coin, that is, we flip it in an uncontrolled way which makes it interact with the environment.
It makes sense to model this process as a random reversible transformation (permutation) T of the global system
AB. In the end, we capture the coin, cover it with our hand (so that we cannot see what side is up) and disregard
the environment. The state of the coin is then $\omega^A$, the marginal corresponding to the global state $\omega^{AB} := T \varphi^{AB}$. We
expect that the coin's state should be mixed. In fact,

\[
\mathbb{E}\, P(\omega^A) = \frac{K_A - 1}{K_A K_B - 1} \cdot \frac{N_A N_B - 1}{N_A - 1} \cdot P(\omega^{AB}) = P(\omega^{AB}) = P_0,
\]

where the same cancellation as in the previous example applies. That is, our ignorance about the environment gets
transferred to the coin, which is exactly what coin tossing is all about.
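For small systems, the coin-tossing identity can be verified exactly by averaging over all permutations of the joint sample space. The sketch below does this with rational arithmetic for a coin coupled to a three-state environment; the environment distribution is an arbitrary example and the function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

def classical_purity(p):
    """Eq. (4) for a probability vector p."""
    n = len(p)
    return (n * sum(x * x for x in p) - 1) / Fraction(n - 1)

def marginal_A(joint, N_A, N_B):
    """Marginal on A of a joint distribution indexed as a * N_B + b."""
    return [sum(joint[a * N_B + b] for b in range(N_B)) for a in range(N_A)]

def coin_toss_average(phi_A, phi_B):
    """Average local purity over ALL permutations of AB, and the global purity."""
    N_A, N_B = len(phi_A), len(phi_B)
    phi_AB = [pa * pb for pa in phi_A for pb in phi_B]
    total, count = Fraction(0), 0
    for perm in permutations(range(N_A * N_B)):
        omega_AB = [phi_AB[k] for k in perm]  # joint state after the permutation
        total += classical_purity(marginal_A(omega_AB, N_A, N_B))
        count += 1
    return total / count, classical_purity(phi_AB)

coin = [Fraction(1), Fraction(0)]                               # heads, perfectly known
environment = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # example mixed state
average, P0 = coin_toss_average(coin, environment)
print(average, P0)
```

The enumeration grows factorially with $N_A N_B$, so this brute-force check is only practical for very small systems.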

D. Typical entanglement of symmetric and antisymmetric states

So far, our discussion only covered the case that a state is drawn randomly from the set of all states (subject to
fixed purity). However, there are situations (particularly in thermodynamics, as we discuss in the next subsection)
where one would like to draw random states subject to additional constraints.
This generalization is treated in Subsection III G, where we compute the expected subsystem purity for random
states that satisfy certain symmetry constraints (Theorem 34). In the special case of quantum theory, this gives the
typical entanglement for subspaces $S \subseteq AB$ that have the following symmetry property: For every unitary $U$ on A,
there is a unitary $U'$ on B such that $U \otimes U'$ preserves the subspace S. An explicit formula for the expected purity of the
reduced state is given in Theorem 36.
Here, we apply this result to compute the typical amount of entanglement in pure states of symmetric and antisymmetric subspaces: they are both $U \otimes U$-invariant. As usual, for a Hilbert space $\mathcal{H}$, the symmetric subspace $\mathcal{H} \vee \mathcal{H}$
resp. antisymmetric subspace $\mathcal{H} \wedge \mathcal{H}$ are defined as those vectors $|\psi\rangle$ with $\pi|\psi\rangle = |\psi\rangle$ resp. $\pi|\psi\rangle = -|\psi\rangle$, where
$\pi$ is the unitary that swaps the two particles. For three and more particles, the totally (anti)symmetric subspace is
defined as the set of vectors that satisfy this equation for all pairs of particles simultaneously. Investigating this
case is motivated by the importance of identical bosons and fermions which, by the symmetrisation postulate, have
symmetric and antisymmetric joint states respectively [32, 33].
States such as antisymmetric fermionic states are clearly entangled in the mathematical sense, but we note that
they can only be termed entangled in the operational sense under some additional assumptions. Standard entanglement theory implicitly assumes that different systems corresponding to different tensor factors can be operationally
distinguished, which is in general not true for bosons and fermions. However, whilst e.g. two electrons are always
indistinguishable, they can in fact be treated as distinguishable if they are localized in two separate spatial locations [32, 33]. This fact gives rise to a natural scenario where antisymmetric states appear that are entangled in the
operational sense: If two such localized electrons had previously shared the same spatial part of the wavefunction,
their internal degrees of freedom (spin) would have been antisymmetric, and remain so unless altered. After separating the two electrons, one would have obtained standard (not just mathematical) entanglement between them,
having arisen due to the antisymmetry requirement on their joint state (see e.g. [34–36]). One may, for concreteness,
think about our calculations in this section with such a scenario in mind.
Theorem 1. Consider the symmetric and antisymmetric subspaces $S_\pm$ on two n-level quantum systems $A = B = \mathbb{C}^n$, i.e.
$S_+ = \mathbb{C}^n \vee \mathbb{C}^n$ and $S_- = \mathbb{C}^n \wedge \mathbb{C}^n$. If $\rho$ is a random pure quantum state on $S_\pm$, then the expected local purity is

\[
\mathbb{E}\left[ \mathrm{Tr}\left( (\rho^A)^2 \right) \right] = \frac{2(n \pm 1)}{n^2 \pm n + 2}.
\]

Moreover, drawing a random mixed state $\rho$ of fixed purity $\mathrm{Tr}(\rho^2)$ from the corresponding subspace, the expected local purity is
described by the same equation, only the factor 2 in the numerator has to be replaced by $1 + \mathrm{Tr}(\rho^2)$.

Proof. We use Theorem 36 from Subsection III G: this theorem is applicable because the symmetric and antisymmetric
subspaces are both invariant with respect to transformations of the form $U \otimes U$. In the following, we sketch the proof
for the symmetric subspace $S := S_+$; the proof for the antisymmetric case is completely analogous.
According to the notation of Theorem 36, we have $N_A = n$ and $N_S = \dim(S_+) = n(n + 1)/2$. Since the case n = 1
is trivial, we may assume that $n \geq 2$. Denote orthonormal basis vectors of $A = \mathbb{C}^n$ by $|1\rangle, |2\rangle, \ldots, |n\rangle$. We may choose
the matrix $E_A$ as $E_A := \frac{1}{\sqrt{2}} |1\rangle\langle 1| - \frac{1}{\sqrt{2}} |2\rangle\langle 2|$; it satisfies $\mathrm{Tr}\, E_A = 0$ and $\mathrm{Tr}\, E_A^2 = 1$ as required. An orthonormal basis
of S consists of the vectors $|ii\rangle$ with $1 \leq i \leq n$, and $\frac{1}{\sqrt{2}} (|ij\rangle + |ji\rangle)$ for $i < j$. This allows us to write the projector $\Pi$
onto S as

\[
\Pi = \sum_{i=1}^{n} |ii\rangle\langle ii| + \frac{1}{2} \sum_{i<j} \left( |ij\rangle + |ji\rangle \right) \left( \langle ij| + \langle ji| \right)
= \frac{1}{2} \sum_{i,j} |ij\rangle\langle ij| + \frac{1}{2} \sum_{i,j} |ij\rangle\langle ji|.
\]

Using these expressions, the calculation of $\mathrm{Tr}\left( \left( \Pi (E_A \otimes I_B) \right)^2 \right)$ is lengthy but straightforward. The result is that this
expression equals n/4 + 1/2. Substituting this into Theorem 36 proves the claim.
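The trace appearing in the proof can be checked numerically for small n. The sketch below builds the symmetric projector as (identity + swap)/2 on $\mathbb{C}^n \otimes \mathbb{C}^n$, which is the second form of the projector formula above, and evaluates the trace of the squared product with $E_A \otimes I_B$. It also encodes the purity formula of Theorem 1 for quick evaluation. All function names are illustrative assumptions, and basis vectors are indexed from 0 in the code.

```python
import math
from fractions import Fraction

def symmetric_projector(n):
    """(identity + swap)/2 on C^n (x) C^n, with |ij> stored at index i*n + j."""
    d = n * n
    proj = [[0.0] * d for _ in range(d)]
    for i in range(n):
        for j in range(n):
            proj[i * n + j][i * n + j] += 0.5  # (1/2) |ij><ij|
            proj[i * n + j][j * n + i] += 0.5  # (1/2) |ij><ji|
    return proj

def trace_of_squared_product(n):
    """Tr((Pi (E_A (x) I_B))^2) with E_A = (|1><1| - |2><2|)/sqrt(2)."""
    d = n * n
    proj = symmetric_projector(n)
    e = [0.0] * n
    e[0], e[1] = 1 / math.sqrt(2), -1 / math.sqrt(2)
    diag = [e[index // n] for index in range(d)]  # diagonal of E_A (x) I_B
    M = [[proj[r][c] * diag[c] for c in range(d)] for r in range(d)]
    return sum(M[r][c] * M[c][r] for r in range(d) for c in range(d))

def theorem1_expected_purity(n, sign):
    """2(n +/- 1)/(n^2 +/- n + 2); sign=+1 symmetric, sign=-1 antisymmetric."""
    return Fraction(2 * (n + sign), n * n + sign * n + 2)

for n in (2, 3, 4):
    print(n, trace_of_squared_product(n), n / 4 + 0.5, theorem1_expected_purity(n, +1))
```

For n = 2 the antisymmetric subspace contains only the singlet state, whose reduced state is maximally mixed; the formula then gives 1/2, as expected.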
Since Theorem 36 (which has been used to prove this result) is applicable in more general situations, there exist
several possibilities to generalize the theorem above. For example, consider the totally symmetric or totally antisymmetric subspace on N qudits, $S_+ := \mathbb{C}^n \vee \mathbb{C}^n \vee \ldots \vee \mathbb{C}^n$, and $S_- := \mathbb{C}^n \wedge \mathbb{C}^n \wedge \ldots \wedge \mathbb{C}^n$, both as subspaces of
$AB = (\mathbb{C}^n)^{\otimes N}$. Consider the 1-versus-(N − 1)-qudits cut, i.e. $A = \mathbb{C}^n$ and $B = (\mathbb{C}^n)^{\otimes (N-1)}$. Then the situation
satisfies the conditions of Theorem 36: for every unitary $U$ on A, there is a unitary $U'$ on B such that $(U \otimes U') S_\pm = S_\pm$,
namely $U' := U^{\otimes (N-1)}$. Thus, Theorem 36 can be used to compute the expected local purity for this cut. (We do not
pursue this calculation here.)
It is clear that the result of Theorem 1 above can be proven in principle without the machinery of this paper, purely
within quantum mechanics. However, we think it is important to have it derived within the framework of general
probabilistic theories, showing the power and flexibility of this framework. Our general proof, as given in Section III,
is very geometrical in flavour; it treats the set of quantum states as a convex set, with the (anti)symmetric subspace
as a face. Thereby, it shows very clearly what geometric properties of the quantum state space are important for the
result to hold.
Apart from the generalization to other theories that one obtains for free, this proof method also clarifies some
aspects of the quantum result. For example, it shows why some further generalizations of the above result will need
considerable further effort, such as computing the average local purity for the 2-versus-(N − 2)-cut on the totally
symmetric subspace. If Alice holds two qudits, she can locally perform unitaries of the form $U \otimes U$. If, for any
such unitary, the map $U' := U^{\otimes (N-2)}$ is applied on Bob's part of the state, then the totally symmetric subspace stays
invariant. Since the group of unitaries $U \otimes U$ acts irreducibly on Alice's subspace $\mathbb{C}^n \vee \mathbb{C}^n$, the situation seems fine
at first, and one might guess that Theorem 36 is easily generalized to this situation.
But this turns out to be wrong: the important property is that Alice's unitaries should act irreducibly on her convex
set of states, which is in general not the case. Instead, the group action $\rho \mapsto (U \otimes U)\, \rho\, (U^\dagger \otimes U^\dagger)$ is reducible on the space
of traceless Hermitian matrices over $\mathbb{C}^2 \vee \mathbb{C}^2$, and Alice's (Bloch) state space decomposes into invariant subspaces.
This shows that the relevant question is not whether the group of $U \otimes U$ acts irreducibly, but whether it is a 2-design.
In this case, the answer is negative.

E. Statistical physics and the second law

Our result can also be used to generalize an approach to thermodynamics which has recently attracted a lot of
attention [15–22]. This approach is based on the fact that most pure quantum states are almost maximally entangled,
in the sense described earlier in this paper.
The main idea, as developed for example in [18], can be stated as follows. We divide the universe's Hilbert space
$\mathcal{H}$ into a small system and a large environment, $\mathcal{H} = \mathcal{H}_S \otimes \mathcal{H}_E$. In many cases, the state of the universe is
constrained to be an element of some subspace $\mathcal{H}_R \subseteq \mathcal{H}$, which might be, for example, a subspace corresponding to
a narrow window of energies. The maximally mixed state on $\mathcal{H}_R$ is called the equiprobable state $\mathcal{E}_R$. The actual
state of the universe is then assumed to be some unknown pure state $|\psi\rangle$ from $\mathcal{H}_R$.
At first, it seems as if the exact form of the actual state $|\psi\rangle \in \mathcal{H}_R$ would have profound consequences, and that
very little can be said about the reduced state on the small subsystem, $\rho_S = \mathrm{Tr}_E |\psi\rangle\langle\psi|$. But this turns out to be
wrong: in fact, most states $|\psi\rangle$ look very alike on the small subsystem. That is, $\rho_S \approx \mathrm{Tr}_E\, \mathcal{E}_R$ with high probability
for randomly chosen $|\psi\rangle$. This can be formulated as follows:
Principle of Apparently Equal a priori Probability [18]: For almost every pure state of the universe, the state of a sufficiently small subsystem is approximately the same as if the universe were in
the equiprobable state $\mathcal{E}_R$. In other words, almost every pure state of the universe is locally (i.e. on the
system) indistinguishable from $\mathcal{E}_R$.
This principle is then used to justify the equal a priori probability assumption, an assumption of statistical physics
which is used in the derivation of many major results in that field.
Our results can be interpreted in a similar manner. First, consider the simple case where we have a small system,
A (not necessarily quantum), coupled to a large bath, B, and where all global states are in principle possible. In the
quantum situation, this corresponds to the special case where $\mathcal{H}_R = \mathcal{H}$. If we have a random state $\omega^{AB}$ on AB with
purity $P(\omega^{AB})$, and if the conditions of Theorem 29 are satisfied, we have for $N_B \gg K_A$

\[
\mathbb{E}\, P(\omega^A) = \frac{K_A - 1}{K_A K_B - 1} \cdot \frac{N_A N_B - 1}{N_A - 1} \cdot P(\omega^{AB})
\approx \frac{N_A (K_A - 1)}{K_A (N_A - 1)} \cdot \frac{N_B}{K_B} \cdot P(\omega^{AB}).
\]

If this is very small, then the state of the small subsystem is very close to maximally mixed. As discussed in Subsection II C, the Markov inequality (or more powerful measure concentration inequalities) tells us that the expectation
value is then also the typical value. In this case, the Principle of Apparently Equal a priori Probability is satisfied
in our more general setting.
It remains to see under what conditions this expectation value is actually close to zero. There are two possibilities
how this may happen:
• We might have a random pure state, i.e. $\mathcal{P}(\omega^{AB}) = 1$, but $N_B/K_B$ might tend to zero with increasing size of
the bath $B$. This is exactly what happens in quantum theory, where $N_B$ is the bath's Hilbert space dimension,
and $K_B = N_B^2$. The interpretation is that most pure bipartite states are almost maximally entangled, such
that the local reduced state looks close to maximally mixed.
It is interesting to see that the same phenomenon may appear in general probabilistic theories beyond quantum
theory, and may in fact be stronger: there are natural possible classes of theories [27, 28] where $K_B = N_B^r$ for
some integer $r \in \mathbb{N}$. While we have $r = 2$ for quantum theory, other theories with $r \geq 3$ would have even
stronger-than-quantum randomization: they would have $N_B/K_B = N_B^{1-r}$, tending to zero faster than the
quantum value.
• On the other hand, we may have $K_B \approx N_B$, or equality as in the case of classical probability theory. Then, we
could still have randomization if $\mathcal{P}(\omega^{AB})$ tended to zero with increasing size of $B$, in situations where it makes
sense to model the global state as a random mixed state.
A situation like this is given in classical coin tossing, as discussed in Subsection II C: there, $\mathcal{P}(\omega^{AB})$ describes
the purity of the unknown global initial state, before the coin is tossed in a random, reversible way. A larger
environment usually amounts to less knowledge about its details, which means smaller purity $\mathcal{P}(\omega^{AB})$.
One may argue that a situation like this is also encountered in natural systems of classical statistical mechanics,
if a small finite system is reversibly coupled to a large, unknown environment.
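The two mechanisms above can be made concrete by evaluating the expected-purity formula, $\mathbb{E}\,\mathcal{P}(\omega^A) = \frac{K_A-1}{K_A K_B-1}\cdot\frac{N_A N_B-1}{N_A-1}\,\mathcal{P}(\omega^{AB})$, for a small system $A$ with $N_A = 2$ and a growing bath. The following is only a minimal sketch: the function names are hypothetical, and it simply assumes $K = N^r$ and that the conditions of Theorem 29 hold for every $r$ considered (for $r \geq 3$ this is an assumption for illustration, not a statement about a known theory).

```python
from fractions import Fraction

def expected_local_purity(K_A, N_A, K_B, N_B, global_purity=Fraction(1)):
    """Expected purity of subsystem A for a random global state on AB."""
    return (Fraction(K_A - 1, K_A * K_B - 1)
            * Fraction(N_A * N_B - 1, N_A - 1)
            * global_purity)

def dimension(N, r):
    """Hypothetical helper: K = N**r (r=1 classical, r=2 quantum, r>=3 assumed post-quantum)."""
    return N ** r

N_A = 2
for N_B in (2, 8, 32):
    # Random *pure* global state (global purity 1) for each exponent r.
    values = {r: expected_local_purity(dimension(N_A, r), N_A,
                                       dimension(N_B, r), N_B)
              for r in (1, 2, 3)}
    print(N_B, {r: float(v) for r, v in values.items()})
```

In this sketch the classical case ($r = 1$) stays at purity 1 for a pure global state, so classical randomization can only come from a small global purity, passed as `global_purity`.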
In the quantum situation, the principle above is formulated for the more general case that HR is a proper subspace
of H, and not all of the global Hilbert space. The analogue of this situation in general probabilistic theories would
be to have the set of allowed states restricted to some face of the global state space AB. Our results do not directly
address this situation in full generality, but Theorem 34 covers the special case of a GG0 -invariant face F. Even
though the resulting formula is not as transparent as the one above, it shows that the amount of randomization is
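also very strong if the face's dimension $K_F$ increases with the bath $B$: since projections are contractions, we have
\[
\mathbb{E}_F\, \mathcal{P}(\omega^A) = \left\| \pi_F (X^A \otimes u^B) \right\|_2^2\, \frac{K_A - 1}{K_F - 1} \left( \mathcal{P}(\omega^{AB}) - \mathcal{P}(\mu^F) \right) \leq \left\| X^A \otimes u^B \right\|_2^2\, \frac{K_A - 1}{K_F - 1}\, \mathcal{P}(\omega^{AB})
\]
\[
= \frac{K_A - 1}{K_F - 1} \cdot \frac{\mathcal{P}(\omega^{AB})}{\mathcal{P}(\varphi^A \otimes \mu^B)} = \frac{K_A - 1}{K_F - 1} \cdot \frac{N_A N_B - 1}{N_A - 1}\, \mathcal{P}(\omega^{AB})
\]
whenever the conditions of Theorem 29 are satisfied (we have also used Lemma 21). If $N_B/K_F$ tends to zero with increasing size of the bath (as is the case for symmetric and antisymmetric subspaces in quantum theory), the Principle
of Apparently Equal a priori Probability remains valid.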
A speculative, but interesting application of this result in the post-quantum case could be in black hole thermodynamics. The results on typical entanglement have already been discussed in the context of black hole entropy [37],
and quantum information analysis has been applied to learn more about the black hole information paradox [15, 38–40]. Since no fully complete and unique theory of quantum gravity is available yet, many parts of black hole thermodynamics are subject to speculation. Vice versa, the assumption that the laws of thermodynamics are valid for
black holes is used to obtain information on properties of the possible underlying theory of quantum gravity.
One may speculate that a possible theory of quantum gravity might not only involve a modification of the usual
concepts of geometry and gravity, but also of quantum theory itself. It is possible that quantum theory is only an
approximation to a different kind of deeper probabilistic theory, just as classical probability is only an approximation to quantum theory. The principle of equal a priori probability is closely linked to the second law, and one
may view our results as a first step towards formulating the second law as a kind of meta-theorem that does not
depend on the details of the theory and may thus apply to post-quantum theories.
As further motivation for research in this direction we note that there is a striking historical precedent where assuming the persistence of the second law helped to discover new physics. Planck [41] arrived at energy quantization
(energy $= h\nu$, where $\nu$ is a frequency and $h$ his constant), by implicitly assuming that certain thermodynamical entropic
relations would still be valid after the quantization of energy.

F. Simple proof of the quantum case

We now give a comparatively simple derivation of the value of typical purity for the quantum case. The proof
simplifies and generalizes the proof for the case of globally pure quantum states in [13, 42]. Moreover, it gives some
intuition on the necessary notions and ingredients for the general proof in Section III.
Firstly we note that the local purity is directly related to how well one can predict measurements of local outcomes.
Phrased in these terms we wish to show that local measurements (of the form $g_A \otimes I_B$ for some $g_A \neq I_A$) tend to
be highly unpredictable. We shall accordingly represent the state in a way which makes it clear to what extent local
measurements are defined. We will use a nice way of linking the Heisenberg and Schrödinger pictures, which is to
expand the density matrix in terms of elements $g$ of the Pauli group $\{X, Y, Z, \mathbb{1}\}^{\otimes n}$:
\[
\rho = \sum_i \xi_i\, g_i .
\]
The sum contains $4^n$ terms, which we label from $i = 0$ to $i = 4^n - 1$, such that $g_0 = \mathbb{1}$. The coefficients $\xi_i$ are directly
related to the expectation values of the corresponding Pauli element via
\[
\langle g_i \rangle = \operatorname{Tr}(g_i \rho) = \operatorname{Tr}(\mathbb{1}\, \xi_i) = 2^n \xi_i . \qquad (5)
\]

In what follows we use the above representation to derive the expected purity value. We write the formula in a
way that highlights that the ratio of the local to the total purity is proportional to the ratio of the number of local
versus global observables. The intuition here is as follows. There is a certain limited amount of purity/predictability
about the state, and this gets associated with observables picked at random (not independently) by the random
unitary. If most observables are global, this predictability is then likely to be associated with global observables,
while the remaining local ones become unpredictable.
Lemma 2. Consider any quantum state $\rho$ on $n = n_A + n_B$ qubits, with fixed purity $\operatorname{Tr}(\rho^2)$. Apply a random unitary $U$ to it,
i.e. $\sigma := U \rho U^\dagger$. Then the expected local purity on subsystem $A$ is given by
\[
\frac{\mathbb{E}_U \operatorname{Tr}\left[(\sigma^A)^2\right] - 2^{-n_A}}{\operatorname{Tr}(\rho^2) - 2^{-n}} = 2^{n_B}\, \frac{K_A - 1}{K_{AB} - 1},
\]
where $K_A = 4^{n_A}$ and $K_{AB} = 4^n$ quantify the number of local (i.e. Paulis of the form $g_A \otimes I_B$) and global degrees of freedom
(all other Paulis), respectively. Note that $2^{-n_A}$ and $2^{-n}$ are the minimal possible values of the purity of any quantum state on
$A$ resp. $AB$.
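Proof. Note that $\sigma^A = \operatorname{Tr}_B(\sigma) = \sum_i \xi_i \operatorname{Tr}_B(g_i)$, where the sum runs over the Paulis $g_i$ with $\mathbb{1}$ on $B$, and there are $4^{n_A}$ such elements. Since $\operatorname{Tr}_B(g_A \otimes I_B) = 2^{n_B} g_A$, this shows that
\[
\operatorname{Tr}\left[(\sigma^A)^2\right] = (2^{n_B})^2 \sum_{i=0}^{4^{n_A}-1} \xi_i^2 \operatorname{Tr}(\mathbb{1}_A),
\]
where $i$ is the label of the Pauli operator $g_A \otimes I_B$. Consequently
\[
\mathbb{E}_U \operatorname{Tr}\left[(\sigma^A)^2\right] = 2^{n+n_B} \sum_{i=0}^{4^{n_A}-1} \mathbb{E}_U\, \xi_i^2 = 2^{n+n_B} \left[ \mathbb{E}_U\, \xi_0^2 + \sum_{i=1}^{4^{n_A}-1} \mathbb{E}_U\, \xi_i^2 \right] = 2^{n+n_B} \left[ 2^{-2n} + (4^{n_A} - 1)\, \mathbb{E}_U\, \xi_i^2 \right].
\]
We have used the fact that $\operatorname{Tr}(\sigma) = 1 = 2^n \xi_0$, hence $\mathbb{E}_U\, \xi_0^2 = 2^{-2n}$, and that $\mathbb{E}_U\, \xi_i^2$ takes the same value for all $i \neq 0$, which is seen as follows. Consider two elements $g_i$, $g_j$ with $i, j \neq 0$. Those
elements are connected by some unitary operation $V$, i.e. $g_j = V g_i V^\dagger$. Thus, using eq. (5),
\[
4^n\, \mathbb{E}_U\, \xi_j^2 = \mathbb{E}_U \left[\operatorname{Tr}(\rho\, U^\dagger g_j U)\right]^2 = \mathbb{E}_U \left[\operatorname{Tr}(\rho\, U^\dagger V g_i V^\dagger U)\right]^2 = 4^n\, \mathbb{E}_U\, \xi_i^2
\]
due to the unitary invariance of the Haar measure. Now we exploit the fact that
\[
\operatorname{Tr}(\rho^2) = \operatorname{Tr}(\sigma^2) = 2^{-n} + 2^n \sum_{i=1}^{4^n - 1} \xi_i^2 .
\]
Taking the expectation value of this expression gives
\[
\mathbb{E}_U\, \xi_i^2 = 2^{-n}\, \frac{\operatorname{Tr}(\rho^2) - 2^{-n}}{4^n - 1} \qquad (i \neq 0).
\]
This can be substituted into the expression above, proving the statement of the lemma.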
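As a sanity check of Lemma 2, one can solve its statement for $\mathbb{E}_U \operatorname{Tr}[(\sigma^A)^2]$ and compare with the well-known closed form $(d_A + d_B)/(d_A d_B + 1)$ for the average subsystem purity of Haar-random pure states, where $d_A = 2^{n_A}$ and $d_B = 2^{n_B}$. The sketch below uses exact rational arithmetic; the function names are hypothetical and chosen for illustration only.

```python
from fractions import Fraction

def expected_local_trace_purity(n_A, n_B, global_trace_purity=Fraction(1)):
    """Lemma 2 solved for E_U Tr[(sigma^A)^2], given Tr(rho^2) on n_A + n_B qubits."""
    n = n_A + n_B
    K_A, K_AB = 4 ** n_A, 4 ** n
    ratio = 2 ** n_B * Fraction(K_A - 1, K_AB - 1)
    return Fraction(1, 2 ** n_A) + ratio * (global_trace_purity - Fraction(1, 2 ** n))

def pure_state_average(n_A, n_B):
    """Closed form (d_A + d_B) / (d_A d_B + 1) for globally pure states."""
    d_A, d_B = 2 ** n_A, 2 ** n_B
    return Fraction(d_A + d_B, d_A * d_B + 1)

for n_A, n_B in [(1, 1), (1, 3), (2, 5)]:
    print(n_A, n_B, expected_local_trace_purity(n_A, n_B), pure_state_average(n_A, n_B))
```

For a maximally mixed global state, `expected_local_trace_purity(n_A, n_B, Fraction(1, 2 ** (n_A + n_B)))` reduces to the minimal value $2^{-n_A}$, as the lemma requires.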

Some remarks:

1. We see that $\mathbb{E}_U \operatorname{Tr}\left[(\sigma^A)^2\right] \approx 2^{(n_B - n)} = 2^{-n_A}$ when $n \gg n_A \gg 1$. This is the minimum value it can take.
2. The purity condition $\operatorname{Tr}(\rho^2) = 1$ enforces the uncertainty principle, forcing most of the $\xi_i^2$ to be small.
3. Apart from the purity restriction, the observer is constrained by having access to only the $K_A - 1 = 4^{n_A} - 1$
local observables out of a total of $K_{AB} - 1 = 4^n - 1$. This ratio appears directly in the statement of the lemma.
Perhaps surprisingly, the proof above can also be adapted to classical probability theory (with $n_A + n_B$ classical bits).
The only difference in the result will be that $K_A$ and $K_{AB}$ have to be replaced by $K_A^{\mathrm{CPT}} = 2^{n_A}$ and $K_{AB}^{\mathrm{CPT}} = 2^n$, in
agreement with Subsection II B. Remark 2 above suggests that this result is related to the absence of uncertainty in
classical pure states.
The proof above also illustrates some main ideas for the derivation of the general probabilistic result in Section III.
A useful insight above was to consider the linear maps $\rho \mapsto \xi_i(\rho)$, and to see that purity can in general be expressed
as a sum over $\xi_i(\rho)^2$. More specifically, the local reduced state's purity can be expressed as a sum over $\xi_i(\rho)^2$ for a
certain type of $\xi_i$, namely those which act locally: $\xi_i(\rho) = 2^{-n} \operatorname{Tr}(\rho\, (g \otimes I))$. Since they are all connected by reversible
transformations, they all have the same Haar expectation value.
The general case will use a very similar construction, where the sum is replaced by an integral over the group,
and the map $\xi_i$ is replaced by a general "Pauli map" (cf. Lemma 13). In analogy to the quantum case, it turns out
that the local reduced state's purity can be expressed as an integral over a certain type of Pauli map, namely one
which acts locally (cf. Lemma 21 and the proof of Theorem 22). Again, due to invariance with respect to reversible
transformations, all these maps in the integral have the same Haar expectation value, giving rise to the proof of our
main result.

III. MATHEMATICAL FRAMEWORK AND PROOFS

A. General probabilistic theories and the Bloch representation

We work in the framework of general probabilistic theories, a natural mathematical framework which describes basic
operational laboratory situations like preparations, transformations, and measurements. Quantum theory can be
described within the framework, as well as classical probability theory and a large class of possible generalizations.
For an introduction to this framework, and in particular for the physical motivation, see e.g. [24, 28, 43]. Our notation
is particularly close to [43] and [28].
A state space is a tuple $(A, A_+, u_A)$, where $A$ is a real vector space of finite dimension $K_A$ (we will not consider
infinite-dimensional state spaces in this paper), and $A_+ \subset A$ is a proper cone (that is, a closed, convex cone of full
dimension which does not contain lines). It can be interpreted as the set of unnormalized states. $u_A$ is a linear
functional which is strictly positive on $A_+ \setminus \{0\}$ and is called the order unit of $A$. The set of points $\omega \in A_+$ with
$u_A(\omega) = 1$ is called the set of (normalized) states and is denoted $\Omega_A$. It follows that $A_+ = \bigcup_{\lambda \geq 0} \lambda\, \Omega_A$ and that $\Omega_A$
is a compact convex $(K_A - 1)$-dimensional set. Its extremal points are called pure states, the others are mixed states.
Instead of the full tuple, we will usually just call $A$ the state space.
A linear invertible map $T : A \to A$ is called a symmetry if $T(A_+) = A_+$ and $u_A \circ T = u_A$. That is, symmetries $T$
map the set of normalized states $\Omega_A$ bijectively onto itself. The example of a qubit shows that not all symmetries of
a state space have to be allowed transformations: reflections in the Bloch ball are symmetries, but are not physically
allowed since they do not correspond to completely positive maps. Thus, in order to define reversible dynamics on
a state space, we also have to specify a group GA of (allowed) reversible transformations. For the sake of generality,
we allow arbitrary choices of GA , as long as GA is compact and contains only symmetries.[60] Then, a pair (A, GA ),
where A is a state space (equivalently: a tuple (A, A+ , uA , GA )) will be called a dynamical state space. Again, to save
some ink, we will usually denote the dynamical state space simply by the letter A rather than by the full tuple.
One goal of this paper is to investigate properties of random pure states on general state spaces. In order to have
a meaningful mathematical notion of random states, we need the following property:
Definition 3 (Transitivity). A dynamical state space $A$ is called transitive if for every pair of pure states $\varphi, \omega \in \Omega_A$ there
exists a reversible transformation $T \in G_A$ such that $T\varphi = \omega$.

Since $G_A$ is compact, we have the notion of a Haar measure on that group [26]. Thus, we can draw a pure state by
applying a random reversible transformation $T \in G_A$ to an arbitrary given pure state $\varphi$. Transitivity is required so
that the resulting distribution does not depend on the initial state $\varphi$. The property of transitivity is thus a necessary
mathematical prerequisite in order to have an unambiguous notion of random pure states.
It is also questionable whether reversible theories without transitivity are self-consistent in a specific physical sense.
Imagine for example that given a product state there is no reversible way to transform it into a pure but correlated
(and thus entangled) state. This is the case for the theory with PR-boxes known as boxworld [44]. Then one cannot, for
example, model a measurement as a reversible correlating interaction between a memory system and the system
in question. The possibility of modelling measurement interactions in that way seems to play a fundamental role
for the self-consistency of quantum theory and statistical mechanics (cf. Maxwell's demon and Bennett's reversible
measurements). In accordance with the two justifications above, we shall henceforth, unless otherwise stated, assume
that we are dealing with transitive state spaces.
Definition 4 (Maximally mixed state). If $A$ is a transitive dynamical state space, let $\varphi \in \Omega_A$ be an arbitrary pure state, and
define the maximally mixed state $\mu^A$ on $A$ by
\[
\mu^A := \int_{G \in G_A} G(\varphi)\, dG .
\]
It follows from transitivity that $\mu^A$ does not depend on the choice of $\varphi$. Clearly, $T\mu^A = \mu^A$ for all $T \in G_A$, and $\mu^A$
is the unique state on $A$ with this invariance property. The space $A$ can be decomposed into a direct sum
\[
A = \hat{A} \oplus \mathbb{R}\mu^A ,
\]
where $\mathbb{R}\mu^A$ denotes the one-dimensional subspace which is spanned by $\mu^A$, and $\hat{A}$ is defined as the set of all vectors
$a \in A$ with $u_A(a) = 0$.
Definition 5 (Bloch vector). Given any state $\omega \in \Omega_A$ (or, more generally, any point $\omega \in A$ with $u_A(\omega) = 1$), we define its
corresponding Bloch vector $\hat\omega$ as
\[
\hat\omega := \omega - \mu^A \in \hat{A} .
\]
The set of all Bloch vectors $\hat\omega$ with $\omega \in \Omega_A$ will be called $\hat\Omega_A$.
Note that convex combinations of states yield the corresponding convex combinations of the Bloch vectors:
$\widehat{\left(\sum_i \lambda_i \omega_i\right)} = \sum_i \lambda_i \hat\omega_i$ if $\sum_i \lambda_i = 1$. Every reversible transformation $T \in G_A$ leaves $\mu^A$ invariant. Thus, we have
\[
\widehat{(T\omega)} = T\omega - \mu^A = T(\omega - \mu^A) = T\hat\omega ,
\]
and applying a transformation $T$ to a state is equivalent to applying it to the corresponding Bloch vector.

B. Definition and properties of purity

We would like to define a notion of purity in generalized state spaces. In the quantum case, this is just
$\operatorname{Tr}(\rho^2) = \langle \rho, \rho \rangle$, where $\langle X, Y \rangle := \operatorname{Tr}(XY)$ denotes the Hilbert-Schmidt inner product on Hermitian matrices. This
inner product is very special: it is invariant with respect to the reversible transformations of quantum theory,
$\langle U A U^\dagger, U B U^\dagger \rangle = \langle A, B \rangle$ for all unitaries $U$. Thus, it makes sense to ask for the existence of an analogous inner
product in more general theories.
Lemma 6. Let $A$ be a transitive dynamical state space. Then the following statements are equivalent:
• There is a unique inner product $\langle \cdot, \cdot \rangle$ on $\hat{A}$ (up to constant multiples) such that all reversible transformations $T \in G_A$ are
orthogonal [61].
• $G_A$ acts irreducibly on $\hat{A}$; that is, $\hat{A}$ does not contain any proper subspace which is invariant under the action of all
reversible transformations $T \in G_A$.

Proof. This is a standard use of the (real version of) Schur's Lemma, see Proposition VIII.2.3 in [26].
If $G_A$ acts irreducibly on $\hat{A}$, we call the dynamical state space $A$ irreducible. In other words: $A$ is irreducible if and
only if the only non-trivial subspaces which are invariant under all transformations of $G_A$ are $\hat{A}$ and $\mathbb{R}\mu^A$.
Transitive dynamical state spaces are not automatically irreducible. As a simple example, consider a state space
$A$ where $\Omega_A$ is a cylinder (as in Figure 1) and where $G_A$ contains all symmetries. The pure states are the points on the
two circles. By rotation and reflection, every pure state can be reversibly mapped to every other, such that we have
transitivity. However, it is not irreducible: the symmetry axis and the plane orthogonal to it, intersecting the cylinder's
center, are invariant subspaces.

Figure 1: If $\Omega_A$ is a cylinder, then the corresponding state space is transitive, but not irreducible.

Now it is straightforward to introduce a generalized notion of purity.


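Definition 7. Let $A$ be a transitive and irreducible state space, and let $\langle \cdot, \cdot \rangle$ be the unique inner product on $\hat{A}$ such that all
transformations are orthogonal and $\langle \hat\varphi, \hat\varphi \rangle = 1$ for pure states $\varphi$. Then, the purity $\mathcal{P}(\omega)$ of any state $\omega \in \Omega_A$ is defined as the
squared length of the corresponding Bloch vector, i.e.
\[
\mathcal{P}(\omega) := \|\hat\omega\|^2 \equiv \langle \hat\omega, \hat\omega \rangle .
\]
It is straightforward to deduce some useful properties that follow from this definition:
Lemma 8 (Properties of Purity). Let $A$ be a transitive and irreducible dynamical state space, then
1. $0 \leq \mathcal{P}(\omega) \leq 1$ for all $\omega \in \Omega_A$,
2. $\mathcal{P}(\omega) = 0$ if and only if $\omega = \mu^A$, i.e. if $\omega$ is the maximally mixed state on $A$,
3. $\mathcal{P}(\omega) = 1$ if and only if $\omega$ is a pure state,
4. $\mathcal{P}(T\omega) = \mathcal{P}(\omega)$ for all reversible transformations $T \in G_A$ and states $\omega \in \Omega_A$,
5. $\sqrt{\mathcal{P}}$ is convex, i.e.
\[
\sqrt{\mathcal{P}\left( \sum_{i=1}^m \lambda_i \omega_i \right)} \leq \sum_{i=1}^m \lambda_i \sqrt{\mathcal{P}(\omega_i)}
\]
if $\lambda_i \geq 0$, $\sum_i \lambda_i = 1$, and all $\omega_i \in \Omega_A$.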

Proof. First, 5. follows directly from the fact that $\sqrt{\mathcal{P}(\omega)} = \|\hat\omega\|$ is a norm (use the triangle inequality). That pure
states have $\mathcal{P}(\omega) = 1$ follows directly from Definition 7. Since every state can be written as a convex combination
of pure states, it follows from 5. that $\mathcal{P}(\omega) \leq 1$ for all states $\omega$, and $\mathcal{P}(\omega) = \|\hat\omega\|^2 \geq 0$ is clear. We have proven 1.
Clearly, from $0 = \mathcal{P}(\omega) = \|\hat\omega\|^2$, it follows that $\hat\omega = 0$, so $\omega = \mu^A$, and this proves 2. Since the inner product was
chosen such that reversible transformations are orthogonal, it follows that
\[
\mathcal{P}(T\omega) = \langle T\hat\omega, T\hat\omega \rangle = \langle \hat\omega, \hat\omega \rangle = \mathcal{P}(\omega).
\]
Now consider the ball $B := \{x \in \hat{A} : \langle x, x \rangle \leq 1\}$. If $\omega$ is any state with $\mathcal{P}(\omega) = 1$, then $\hat\omega$ is on the surface of that
ball; in particular, $\hat\omega$ is an exposed point of the convex set $B$. But since $\hat\omega \in \hat\Omega_A$ and $\hat\Omega_A \subset B$, it must then also be an
exposed point of $\hat\Omega_A$, hence a pure state. This proves 3. Note that it also proves that all pure states are exposed.
In general, our definition of purity only works for transitive state spaces. Unfortunately, in the case of bipartite
(and multipartite) state spaces, this already excludes the most popular general probabilistic theory, colloquially
called boxworld. As it turns out, there is a natural way to define an analogous notion of purity in boxworld, which
we explain in Appendix B. However, the resulting notion of purity does not have all the nice properties of Lemma 8
any more: in particular, it equals unity for some pure (product) states, but is necessarily less than one for other pure
(PR box) states.
In the quantum case, our definition of purity coincides with the standard definition up to a factor and an offset:
Example 9 (Purity of Quantum States). The real vector space which describes the states on a quantum $n$-level system is the
set of Hermitian complex $n \times n$ matrices,
\[
A = \{ M \in \mathbb{C}^{n \times n} \mid M = M^\dagger \}.
\]
The cone of unnormalized states is given by all positive matrices, while the order unit is the trace functional:
\[
A_+ = \{ M \in A \mid M \geq 0 \}, \qquad u_A(\rho) = \operatorname{Tr}(\rho).
\]
Thus, the set of normalized states $\Omega_A$ is the usual set of density matrices; similarly, the Bloch vector space $\hat{A}$ is the set of traceless
Hermitian matrices. The group of reversible transformations $G_A$ is the projective unitary group,
\[
G_A = \{ \rho \mapsto U \rho U^\dagger \mid U \in SU(n) \},
\]
and this group acts irreducibly on $\hat{A}$ (this follows from Lemma 43 in the appendix). Thus, there is a unique inner product on
$\hat{A}$ such that all reversible transformations are orthogonal. It is easy to guess (we mentioned it before): it is the Hilbert-Schmidt
inner product, scaled such that pure state Bloch vectors have norm 1:
\[
\langle \hat{L}, \hat{M} \rangle := \frac{n}{n-1} \operatorname{Tr}(\hat{L} \hat{M}) \qquad (\hat{L}, \hat{M} \in \hat{A}).
\]
As a consequence, the purity $\mathcal{P}(\rho)$ of any quantum state $\rho$ is
\[
\mathcal{P}(\rho) = \langle \hat\rho, \hat\rho \rangle = \langle \rho - \mathbb{1}/n,\ \rho - \mathbb{1}/n \rangle = \frac{n}{n-1} \operatorname{Tr}(\rho^2) - \frac{1}{n-1}. \qquad (6)
\]
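For a single qubit ($n = 2$), eq. (6) can be checked against the familiar Bloch-ball picture, where the purity should equal the squared Euclidean length of the Bloch vector $(r_x, r_y, r_z)$. The following is a minimal sketch in plain Python; the function names and the parametrization $\rho = \frac{1}{2}(\mathbb{1} + r_x X + r_y Y + r_z Z)$ are choices made here for illustration.

```python
def qubit_density_matrix(rx, ry, rz):
    """rho = (1 + rx X + ry Y + rz Z) / 2 as a nested list."""
    return [[(1 + rz) / 2, (rx - 1j * ry) / 2],
            [(rx + 1j * ry) / 2, (1 - rz) / 2]]

def trace_of_square(rho):
    """Tr(rho^2) for a square matrix given as a nested list."""
    n = len(rho)
    return sum(rho[i][j] * rho[j][i] for i in range(n) for j in range(n)).real

def purity(rho):
    """Generalized purity of eq. (6): n/(n-1) Tr(rho^2) - 1/(n-1)."""
    n = len(rho)
    return n / (n - 1) * trace_of_square(rho) - 1 / (n - 1)

r = (0.3, 0.4, 0.0)
print(purity(qubit_density_matrix(*r)), sum(c * c for c in r))
```

Both printed numbers should agree up to floating-point rounding; the maximally mixed state ($r = 0$) gives purity 0 and any unit vector $r$ gives purity 1.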

Classical probability distributions can be treated in a similar manner:


Example 10 (Purity of Classical Probability Distributions). The state space $B$ of a classical $n$-level system is the set of all
probability distributions on $n$ outcomes, that is, the simplex
\[
\Omega_B = \left\{ (p_1, \ldots, p_n) \;\middle|\; p_i \geq 0,\ \sum_{i=1}^n p_i = 1 \right\}.
\]
This state space is contained in the vector space $B = \mathbb{R}^n$ with order unit $u_B(p) := \sum_{i=1}^n p_i$ for $p = (p_1, \ldots, p_n) \in B$. The
cone of unnormalized states is
\[
B_+ = \{ p = (p_1, \ldots, p_n) \in B \mid p_i \geq 0 \text{ for all } i \},
\]
and the group of reversible transformations $G_B$ is the permutation group $S_n$. The unique state which is invariant with respect to
$G_B$ is the maximally mixed state $\mu^B = \left( \frac{1}{n}, \frac{1}{n}, \ldots, \frac{1}{n} \right)$. It is a well-known fact of group representation theory [26] that $G_B$ acts
irreducibly on $\hat{B} = \{ p \in B \mid \sum_i p_i = 0 \}$. The unique invariant inner product on $\hat{B}$ turns out to be
\[
\langle \hat{p}, \hat{q} \rangle := \frac{n}{n-1} \sum_{i=1}^n \hat{p}_i \hat{q}_i \qquad (\hat{p}, \hat{q} \in \hat{B}).
\]
Permutations relabel the entries of $\hat{p}$ and $\hat{q}$ and preserve this inner product. Pure states $p = (0, \ldots, 0, 1, 0, \ldots, 0)$ have $\langle \hat{p}, \hat{p} \rangle = 1$.
Thus, the purity of a probability distribution $p$ on $n$ outcomes, using $\hat{p}_i = p_i - 1/n$, is
\[
\mathcal{P}(p) = \langle \hat{p}, \hat{p} \rangle = \frac{n}{n-1} \sum_{i=1}^n p_i^2 - \frac{1}{n-1}. \qquad (7)
\]

This is the quantum result (6) restricted to diagonal matrices (as expected); however, here it is derived without embedding the
probability distributions into quantum state space.
Example 11 (Purity for a gbit). Consider a generalized bit, or gbit [24], where the state space $\Omega_A$ is a square as in Figure 2.
It can be understood as describing one half of a PR box [24], and as a particular type of state space that appears in a theory
called generalized nonsignaling theory or boxworld [44].
We assume that the group of reversible transformations $G_A$ is the group of all symmetries, which is consistent with the tensor
product in boxworld [44]. Then $G_A$ is the dihedral group $D_4$, containing all rotations of multiples of $\pi/2$ and reflections
through diagonals. This group acts irreducibly on $\hat{A} = \mathbb{R}^2$. If we represent $\hat\Omega_A$ as a square with $\hat\mu^A = 0$ as the center, then the
invariant inner product is given by the usual Euclidean inner product. That is, the contour lines of constant purity correspond
to circles in the state space, see Fig. 2.
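The classical and gbit purities of Examples 10 and 11 are simple enough to evaluate directly. The sketch below implements eq. (7), $\mathcal{P}(p) = \frac{n}{n-1}\sum_i p_i^2 - \frac{1}{n-1}$, with exact fractions, and the squared Euclidean length for a gbit Bloch vector. The function names are hypothetical, and the gbit part assumes the square is inscribed in the unit circle so that its corners have purity 1.

```python
from fractions import Fraction

def classical_purity(p):
    """Eq. (7) for a probability vector p with exact rational entries."""
    n = len(p)
    assert sum(p) == 1 and all(x >= 0 for x in p), "p must be a probability vector"
    return Fraction(n, n - 1) * sum(Fraction(x) ** 2 for x in p) - Fraction(1, n - 1)

def gbit_purity(bloch):
    """Squared Euclidean length of a gbit Bloch vector (Example 11)."""
    x, y = bloch
    return x * x + y * y

p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
print(classical_purity(p))                       # purity of a mixed distribution
print(classical_purity(list(reversed(p))))       # unchanged under a permutation
print(gbit_purity((0.0, 0.5)))                   # a mixed gbit state inside the square
```

Deterministic distributions give purity 1 and the uniform distribution gives purity 0, mirroring properties 2 and 3 of Lemma 8.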

C. Comparison to existing entropy measures

The square state space (the gbit), mentioned in Example 11 and depicted in Figure 2, is also a good example to
highlight a difference between the generalized purity used here and another possible generalization.
In the quantum case, the standard purity satisfies the equation $\operatorname{Tr}(\rho^2) = 2^{-H_2(\rho)}$, where $H_2$ denotes the Rényi
entropy of order 2. There has been some work on notions of entropy in general probabilistic theories [25, 45, 46].
One could now imagine defining the purity of some state $\omega$ as $2^{-H_2(\omega)}$.
However, such a definition would have undesirable properties, as can be seen from the definitions in [45].
Two possible definitions of $H_2$ are considered there: one possibility is to define the measurement entropy $\hat{H}_2(\omega)$ of
some state $\omega$ as the minimum Rényi-2 entropy of the set of outcome probabilities of any fine-grained measurement
on $\omega$. However, using the measurement entropy, the corresponding definition of purity would assign purity 1 to
some highly mixed states in the boundary of the square state space, that is, the same value as for pure states. As an
example, consider the fine-grained measurement on the gbit which consists of the two effects $E_1(\omega) := \frac{1}{2} - \frac{1}{\sqrt{2}} \hat\omega_y$ and
$E_2(\omega) := \frac{1}{2} + \frac{1}{\sqrt{2}} \hat\omega_y$, if $\hat\omega = (\hat\omega_x, \hat\omega_y)$ denotes the Bloch vector corresponding to $\omega$ (that is, the corresponding point in
the square). The state $\hat\omega = (0, 1/\sqrt{2})$ is a mixed state in the boundary of the square, but the measurement $(E_1, E_2)$
assigns outcome probabilities 0 and 1 to this state. Hence $2^{-\hat{H}_2(\omega)} = 1$. This shows that the measurement entropy
can be misleading if used as a characterization of the mixedness of a state [62].
A second definition is the decomposition entropy, $\check{H}_2(\omega)$, which is defined as the minimum Rényi 2-entropy of any
probability distribution $(\lambda_1, \ldots, \lambda_n)$ with $n \in \mathbb{N}$ and $\sum_i \lambda_i = 1$ such that $\omega = \sum_i \lambda_i \varphi_i$, with pure states $\varphi_i$. That is,
it is the minimal entropy of the coefficients in any decomposition of $\omega$ into pure states. However, as shown in [45,
(D8)], there are states $\omega_1$, $\omega_2$ in the gbit state space with the property that
\[
\check{H}_2\left( \tfrac{1}{2}\omega_1 + \tfrac{1}{2}\omega_2 \right) < \tfrac{1}{2} \check{H}_2(\omega_1) + \tfrac{1}{2} \check{H}_2(\omega_2).
\]
According to the corresponding purity definition, the mixture $\frac{1}{2}\omega_1 + \frac{1}{2}\omega_2$ would have higher purity than both $\omega_1$
and $\omega_2$, violating intuition about mixtures being at least as mixed as their components. In contrast, it follows from
property 5. in Lemma 8 that our notion of purity $\mathcal{P}$ always satisfies $\mathcal{P}\left( \frac{1}{2}\omega_1 + \frac{1}{2}\omega_2 \right) \leq \max\{ \mathcal{P}(\omega_1), \mathcal{P}(\omega_2) \}$.
Another advantage of our definition of purity, as compared to other approaches, is that it satisfies several useful
identities arising from group theory. Thus, it is sometimes possible to calculate its value explicitly on the basis of
simple properties of the state space (such as in Theorem 28), which is important to derive the results of this paper.
This is analogous to the situation in quantum theory, where purity is often used as an easy-to-calculate replacement
for von Neumann entropy.
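The gbit counterexample for the measurement entropy can be reproduced numerically. The sketch below assumes the square state space has corners at $(\pm 1/\sqrt{2}, \pm 1/\sqrt{2})$ and uses the two-outcome measurement with effects $\frac{1}{2} \mp \hat\omega_y/\sqrt{2}$; these concrete coordinates and all function names are assumptions made for illustration. It compares $2^{-H_2}$ of the outcome distribution with the generalized purity (squared Euclidean length) at the midpoint of an edge.

```python
import math

S = 1 / math.sqrt(2)  # half side length of the square inscribed in the unit circle

def outcome_probabilities(bloch):
    """Probabilities of the two effects 1/2 -/+ y/sqrt(2) on a gbit state."""
    _, y = bloch
    return (0.5 - y * S, 0.5 + y * S)

def two_to_minus_renyi2(probs):
    """2^(-H_2) of a distribution, i.e. its collision probability sum(q^2)."""
    return sum(q * q for q in probs)

def generalized_purity(bloch):
    """Squared Euclidean length of the Bloch vector (Example 11)."""
    return sum(c * c for c in bloch)

edge_midpoint = (0.0, S)   # mixed state on the boundary of the square
corner = (S, S)            # pure state

for name, state in [("edge midpoint", edge_midpoint), ("corner", corner)]:
    print(name,
          two_to_minus_renyi2(outcome_probabilities(state)),
          generalized_purity(state))
```

Under these assumptions the entropy-based quantity is (numerically) 1 for both states, while the generalized purity separates the mixed edge midpoint (about 1/2) from the pure corner (about 1).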

D. Irreducible subgroups and generalized Paulis

In the quantum case, there is a simple formula expressing purity in terms of squared expectation values of Pauli
operators. For a single qubit, denote the $2 \times 2$ Pauli matrices by $(X_0, X_1, X_2, X_3) := (\mathbb{1}, X, Y, Z)$, then
\[
\operatorname{Tr}(\rho^2) = \frac{1}{2} \sum_{i=0}^{3} \left( \operatorname{Tr}(\rho X_i) \right)^2 .
\]
A similar formula holds in the case of several qubits; we discuss this below.
Figure 2: The left and center panels display a gbit state space $A$, where $\Omega_A$ is a square and the capacity is $N_A = 2$; shown are the four pure
states and the maximally mixed state $\hat\mu^A = 0$. The symmetry group is the dihedral group $G_A = D_4$, and the contour lines of constant purity
are circles (cf. Example 11). It has a complete set of Paulis, consisting of 2 maps, which is the minimal possible number. The right panel shows
a pentagon state space $\Omega_B$. The group of symmetries is the dihedral group $G_B = D_5$, containing rotations of multiples of $\theta = 2\pi/5$. This
state space has capacity $N_B = 2$, a complete set of Paulis consists of 5 maps, and the maximally mixed state cannot be written as a uniform
mixture of two perfectly distinguishable pure states, in contrast to the square.

As it turns out, there is an interesting generalization of these identities to the general probabilistic case. To understand the general case, it is useful to think of the Pauli operators not as matrices, but as linear maps that assign
real numbers to states: $\rho \mapsto \operatorname{Tr}(\rho X_i)$. These maps have certain properties that correspond to the conditions in the
following definition:
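Definition 12. Let $A$ be a transitive state space. A linear map $X : A \to \mathbb{R}$ is called a Pauli map if
• $X(\mu^A) = 0$ for the maximally mixed state $\mu^A \in \Omega_A$, and
• $\max\{ |X(\hat{a})| \;:\; \hat{a} \in \hat{A},\ \langle \hat{a}, \hat{a} \rangle \leq 1 \} = 1$.
If $X$ is a Pauli map, then there exists a vector $\hat{X} \in \hat{A}$ such that $X(\hat{a}) = \langle \hat{X}, \hat{a} \rangle$ for all $\hat{a} \in \hat{A}$. Thus, the second
condition in Definition 12 is equivalent to the condition $\|\hat{X}\|_2 = 1$, where $\|\hat{X}\|_2 = \sqrt{\langle \hat{X}, \hat{X} \rangle}$ denotes the norm on $\hat{A}$
which is derived from the invariant inner product $\langle \cdot, \cdot \rangle$ on $\hat{A}$.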


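In the quantum case of a single qubit, it is easy to see that the maps $\rho \mapsto \operatorname{Tr}(\rho X_i)$ are Pauli maps if $i \in \{1, 2, 3\}$, but
not if $i = 0$: recall the definitions in Example 9, and let $i \in \{1, 2, 3\}$, then
\[
\rho \mapsto \operatorname{Tr}(\rho X_i) = \operatorname{Tr}(\hat\rho X_i) \stackrel{!}{=} \langle \hat{X}_i, \hat\rho \rangle = 2 \operatorname{Tr}(\hat{X}_i \hat\rho),
\]
which proves that this map is represented by the vector (traceless Hermitian matrix) $\hat{X}_i = \frac{1}{2} X_i \in \hat{A}$, that is, one half
times the corresponding Pauli matrix. Then, we have for the norm on $\hat{A}$
\[
\|\hat{X}_i\|_2^2 = \langle \hat{X}_i, \hat{X}_i \rangle = 2 \operatorname{Tr}(\hat{X}_i^2) = 2 \operatorname{Tr}\left( \tfrac{1}{4} \mathbb{1} \right) = 1,
\]
which proves that the corresponding maps are Pauli maps in the sense of Definition 12.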
Pauli maps are related to purity by the following lemma:
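Lemma 13. Let $A$ be an irreducible transitive dynamical state space of dimension $K_A$, and let $\mathcal{H} \subseteq G_A$ be a compact subgroup
which acts irreducibly on $\hat{A}$ (for example, $\mathcal{H} = G_A$). Then, if $X$ is any Pauli map on $A$,
\[
\int_{H \in \mathcal{H}} \left( X \circ H(\omega) \right)^2 dH = \frac{\mathcal{P}(\omega)}{K_A - 1} \qquad \text{for all states } \omega \in \Omega_A .
\]
Proof. If $M \geq 0$ is any positive matrix on $\hat{A}$, then $I := \int_{H \in \mathcal{H}} H M H^{-1}\, dH \geq 0$ satisfies $[I, H] = 0$ for all $H \in \mathcal{H}$. Thus,
by Schur's Lemma, we have $I = c\, \mathbb{1}_{\hat{A}}$ for some $c \in \mathbb{R}$. By taking the trace of both sides, we see that $c = \operatorname{Tr} M / (K_A - 1)$.
Since $\left( X \circ H(\omega) \right)^2 = \langle \hat{X} | H | \hat\omega \rangle \langle \hat\omega | H^{-1} | \hat{X} \rangle$, we get
\[
\int_{H \in \mathcal{H}} \left( X \circ H(\omega) \right)^2 dH = \langle \hat{X} | \left( \int_{H \in \mathcal{H}} H | \hat\omega \rangle \langle \hat\omega | H^{-1}\, dH \right) | \hat{X} \rangle = \langle \hat{X} | \frac{\operatorname{Tr} | \hat\omega \rangle \langle \hat\omega |}{K_A - 1}\, \mathbb{1}_{\hat{A}} | \hat{X} \rangle = \frac{\langle \hat\omega | \hat\omega \rangle \langle \hat{X} | \hat{X} \rangle}{K_A - 1} = \frac{\mathcal{P}(\omega)}{K_A - 1} .
\]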

Note that in the quantum case, the vectors themselves are operators, hence the matrices and linear maps appearing
in this calculation are superoperators.
The result becomes particularly interesting if the subgroup $\mathcal{H} \subseteq G_A$ is finite: it will finally give the natural analog
of Pauli matrices in more general theories.
Corollary 14. Let $A$ be an irreducible transitive dynamical state space of dimension $K_A$, and let $\mathcal{H} \subseteq G_A$ be a finite subgroup
which acts irreducibly on $\hat{A}$. Fix any Pauli map $X_1$ on $A$, and let $\mathcal{X}$ be the orbit of $\mathcal{H}$ on $X_1$, disregarding the sign of each map.
That is, $\mathcal{X} := \{ X_1 \circ H \mid H \in \mathcal{H} \} / \{+1, -1\}$. Then
\[
\frac{1}{|\mathcal{X}|} \sum_{X \in \mathcal{X}} \left( X(\omega) \right)^2 = \frac{\mathcal{P}(\omega)}{K_A - 1},
\]
and we call $\mathcal{X}$ a complete set of Paulis for $A$. Note that $X(\omega) = \langle \hat{X}, \hat\omega \rangle$.
For a classical probability distribution $p = (p_1, \ldots, p_n)$, the expression $\sum_{i=1}^n p_i^2$ is sometimes called the collision
probability. It is directly related to our notion of purity $\mathcal{P}(p)$ by eq. (7), and can be interpreted as the probability that
two identically prepared copies of $p$ give the same outcome, if the random variable $i \in \{1, \ldots, n\}$ is measured. A
similar interpretation exists in the quantum case: we can ask for the probability of getting the same outcome, if we
measure two copies of a state $\rho$ in a fixed basis. Maximizing this collision probability over all bases yields $\operatorname{Tr}(\rho^2)$,
with the maximum being attained in the eigenbasis.
The following lemma generalizes this observation to other probabilistic theories.
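Lemma 15 (Operational Interpretation of Purity). Any Pauli map $X$ can be interpreted as a measurement, giving outcomes
$\pm 1$ on a state $\omega$ with probabilities $(1 \pm X(\omega))/2$. The corresponding expectation value is exactly $X(\omega)$.
Denote by $P_c(X)$ the probability that two successive measurements of $X$ on two identically prepared copies of $\omega$ give the
same outcome ("c" is for "collision probability"). Then it turns out that
\[
\max_{X \text{ Pauli map}} P_c(X) = \frac{1}{2} \left( 1 + \mathcal{P}(\omega) \right).
\]
Proof. We use the Cauchy-Schwarz inequality:
\[
P_c(X) = \left( \frac{1 + X(\omega)}{2} \right)^2 + \left( \frac{1 - X(\omega)}{2} \right)^2 = \frac{1}{2} \left( 1 + X(\omega)^2 \right) = \frac{1}{2} + \frac{1}{2} \langle \hat{X}, \hat\omega \rangle^2 \leq \frac{1}{2} + \frac{1}{2} \|\hat{X}\|_2^2\, \|\hat\omega\|_2^2 = \frac{1}{2} \left( 1 + \mathcal{P}(\omega) \right).
\]
This upper bound is attained on the Pauli map corresponding to $\hat{X} := \hat\omega / \|\hat\omega\|_2$.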
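Lemma 15 can be illustrated in a two-dimensional Bloch space with the Euclidean inner product (as for the gbit of Example 11), where Pauli maps correspond to unit vectors. The sketch below scans unit vectors on a grid of angles and compares the best collision probability with $\frac{1}{2}(1 + \mathcal{P}(\omega))$; the function names, the example Bloch vector and the grid resolution are arbitrary choices made for illustration.

```python
import math

def collision_probability(pauli_vec, bloch):
    """P_c(X) = ((1 + X(w)) / 2)^2 + ((1 - X(w)) / 2)^2 with X(w) = <X^, w^>."""
    x = sum(a * b for a, b in zip(pauli_vec, bloch))
    return ((1 + x) / 2) ** 2 + ((1 - x) / 2) ** 2

def best_collision_probability(bloch, steps=3600):
    """Maximize P_c over unit vectors (cos t, sin t) on a grid of angles."""
    angles = (2 * math.pi * k / steps for k in range(steps))
    return max(collision_probability((math.cos(t), math.sin(t)), bloch)
               for t in angles)

bloch = (0.3, 0.4)                      # Bloch vector with purity 0.25
purity = sum(c * c for c in bloch)
bound = 0.5 * (1 + purity)              # right-hand side of Lemma 15
aligned = collision_probability((0.6, 0.8), bloch)   # X^ = w^ / |w^|

print(best_collision_probability(bloch), aligned, bound)
```

The grid maximum approaches the bound from below, and the aligned Pauli vector attains it up to rounding, as the proof suggests.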
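In quantum theory on $k$ qubits, our notion of a complete set of Paulis reduces to the usual Pauli operators:
Example 16 (Paulis on $k$ Qubits in Quantum Theory). Recall the quantum situation described in Example 9, but now on
$n = 2^k$-dimensional Hilbert space, i.e. $A$ is the quantum state space of $k$ qubits. A particular finite subgroup of $G_A$ is given by
the Clifford group [47]
\[
C_k := \{ U \in U(2^k) \mid U P U^\dagger \in P_k \text{ for all } P \in P_k \},
\]
where $P_k$ is the Pauli group on $k$ qubits, i.e. $P_k = \{ \sigma_1 \otimes \ldots \otimes \sigma_k \mid \sigma_i \in \{\mathbb{1}, X, Y, Z\} \}$. This group acts irreducibly by
conjugation on $\hat{A}$, the set of traceless Hermitian matrices on $(\mathbb{C}^2)^{\otimes k}$; we show this in Lemma 43 in Appendix A. Consider $X^{\otimes k}$,
the $k$-fold tensor product of the Pauli matrix $X$. We would like to find a constant $c > 0$ such that $X_1(\rho) := c \operatorname{Tr}(\rho X^{\otimes k})$
becomes a Pauli map. First, we calculate the vector (i.e. matrix) $\hat{X}_1$ which describes $X_1$:
\[
X_1(\rho) = c \operatorname{Tr}(\rho X^{\otimes k}) = c \operatorname{Tr}(\hat\rho X^{\otimes k}) \stackrel{!}{=} \langle \hat{X}_1, \hat\rho \rangle = \frac{2^k}{2^k - 1} \operatorname{Tr}(\hat{X}_1 \hat\rho),
\]
and we see that $\hat{X}_1 = c\, (2^k - 1)\, 2^{-k} X^{\otimes k}$. The constant $c$ is determined by normalization:
\[
1 \stackrel{!}{=} \langle \hat{X}_1, \hat{X}_1 \rangle = \frac{2^k}{2^k - 1} \operatorname{Tr}\left( \hat{X}_1^2 \right) = \frac{2^k}{2^k - 1} \cdot \frac{c^2 (2^k - 1)^2}{2^{2k}} \operatorname{Tr}\left( (X^{\otimes k})^2 \right) = c^2 (2^k - 1),
\]
hence $c = 1/\sqrt{2^k - 1}$. Now if $H = U \cdot U^\dagger$ with $U \in C_k$, then
\[
(X_1 \circ H)(\rho) = \frac{1}{\sqrt{2^k - 1}} \operatorname{Tr}(X^{\otimes k} H(\rho)) = \frac{1}{\sqrt{2^k - 1}} \operatorname{Tr}(X^{\otimes k} U \rho U^\dagger) = \frac{1}{\sqrt{2^k - 1}} \operatorname{Tr}(U^\dagger X^{\otimes k} U \rho).
\]
By choosing appropriate elements $H = U \cdot U^\dagger$ of the Clifford group, the matrix $X^{\otimes k} \in P_k$ is mapped to every other element of
$P_k$ except the identity. Ignoring the sign as suggested in Corollary 14, we get the orbit
\[
\mathcal{X} = \left\{ \rho \mapsto \frac{1}{\sqrt{2^k - 1}} \operatorname{Tr}(\rho\, \sigma_1 \otimes \ldots \otimes \sigma_k) \;\middle|\; \sigma_i \in \{\mathbb{1}, X, Y, Z\},\ \text{not all } \sigma_i = \mathbb{1} \right\}.
\]
This is a complete set of Paulis according to Corollary 14: these are the (maps corresponding to) the usual Pauli matrices.
Therefore, purity can be expressed as
\[
\frac{1}{4^k - 1} \sum_{(\sigma_1, \ldots, \sigma_k) \neq (\mathbb{1}, \ldots, \mathbb{1})} \frac{\left( \operatorname{Tr}(\rho\, \sigma_1 \otimes \ldots \otimes \sigma_k) \right)^2}{2^k - 1} = \frac{\mathcal{P}(\rho)}{4^k - 1} .
\]
The standard result $\operatorname{Tr}(\rho^2) = 2^{-k} \sum_{\sigma_1, \ldots, \sigma_k} \operatorname{Tr}(\rho\, \sigma_1 \otimes \ldots \otimes \sigma_k)^2$ follows from some further simplification.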
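The Pauli-sum identities of Example 16 can be verified numerically for $k = 2$ qubits without any external library. The sketch below builds all 16 Pauli strings with a small Kronecker-product helper and checks both $\operatorname{Tr}(\rho^2) = 2^{-k} \sum_\sigma \operatorname{Tr}(\rho\,\sigma)^2$ and the purity formula of eq. (6) on a Bell state mixed with the maximally mixed state. All function names and the test state are choices made for illustration.

```python
import itertools

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]

def trace_product(a, b):
    """Tr(a b) for square matrices of equal size."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))

def pauli_strings(k):
    """Yield all 4**k tensor products of single-qubit Pauli matrices."""
    for combo in itertools.product([I2, X, Y, Z], repeat=k):
        op = [[1]]
        for m in combo:
            op = kron(op, m)
        yield op

def pauli_sum(rho, k):
    """Sum of squared Pauli expectation values, identity string included."""
    return sum(complex(trace_product(rho, g)).real ** 2 for g in pauli_strings(k))

k, n = 2, 4
bell = [[0.5, 0, 0, 0.5], [0, 0, 0, 0], [0, 0, 0, 0], [0.5, 0, 0, 0.5]]
rho = [[0.5 * bell[i][j] + (0.125 if i == j else 0.0) for j in range(n)]
       for i in range(n)]

tr_rho_sq = complex(trace_product(rho, rho)).real
purity_eq6 = n / (n - 1) * tr_rho_sq - 1 / (n - 1)
purity_paulis = (pauli_sum(rho, k) - 1) / (2 ** k - 1)   # drop the identity term
print(tr_rho_sq, pauli_sum(rho, k) / 2 ** k, purity_eq6, purity_paulis)
```

The first two printed values should agree, as should the last two, up to floating-point rounding.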

1 ,...,k

For a classical n-level system introduced in Example 10, a complete set of Paulis consists of n maps that basically
read out a probability vectors components:
Example 17 (Paulis in Classical Probability Theory). With the notation of Example 10, let $X_1 : B \to \mathbb{R}$ be the map

$$X_1(p) := p_1 - \frac{1}{n-1}\sum_{i=2}^{n} p_i.$$

For the maximally mixed state $\mu^B = \left(\frac{1}{n}, \ldots, \frac{1}{n}\right)$, we have $X_1(\mu^B) = 0$. It is easy to check that $X_1(p) = \langle \hat X_1, \hat p\rangle$ for the invariant inner product on $\hat B$ if $\hat X_1 = \left(\frac{n-1}{n}, -\frac{1}{n}, \ldots, -\frac{1}{n}\right)$. Moreover, we have $\langle \hat X_1, \hat X_1\rangle = 1$, hence $X_1$ is a Pauli map according to Definition 12. Now let $\mathcal{H} = \mathcal{G}_B = S_n$ be the full permutation group. If $\pi \in \mathcal{H}$ is any permutation with, say, $\pi(i) = 1$, then

$$X_i(p) := X_1 \circ \pi(p) = p_i - \frac{1}{n-1}\sum_{j\neq i} p_j$$

(for normalized probability vectors $p \in \Omega_B$, this is just $X_i(p) = \frac{n}{n-1}\,p_i - \frac{1}{n-1}$). Thus, a complete set of Paulis $\mathcal{X}$ is given by the set of maps

$$\mathcal{X} = \left\{ p \mapsto p_i - \frac{1}{n-1}\sum_{j\neq i} p_j \;\middle|\; i \in \{1, \ldots, n\} \right\}.$$

Then the formula from Corollary 14 reproduces eq. (7).
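A short exact check of this example is sketched below. It rests on two assumptions that are not spelled out in this passage: that Corollary 14 reads $P(\omega)/(K-1) = |\mathcal{X}|^{-1}\sum_{X\in\mathcal{X}} X(\omega)^2$ (the form used in the neighbouring quantum and polygon examples), and that the classical purity is the squared Bloch length under the inner product normalized so that deterministic distributions have purity one. The function names are hypothetical.

```python
from fractions import Fraction


def classical_paulis(p):
    """Values X_i(p) = p_i - (sum of the other entries) / (n - 1) for all i."""
    n = len(p)
    total = sum(p)
    return [p[i] - (total - p[i]) / (n - 1) for i in range(n)]


def purity_via_paulis(p):
    """Assumed Corollary 14 form with K = n and n Pauli maps: P = (n-1)/n * sum X_i(p)^2."""
    n = len(p)
    return Fraction(n - 1, n) * sum(x * x for x in classical_paulis(p))


def purity_via_bloch(p):
    """Squared Bloch length n/(n-1) * sum (p_i - 1/n)^2 (assumed normalization)."""
    n = len(p)
    return Fraction(n, n - 1) * sum((x - Fraction(1, n)) ** 2 for x in p)


# Example distributions given as exact fractions.
p_mixed = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
p_pure = [Fraction(1), Fraction(0), Fraction(0)]
p_uniform = [Fraction(1, 3)] * 3
```

Both functions return the same value on every probability vector; using `Fraction` keeps the comparison exact rather than approximate.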


Example 18 (Paulis for Polygonal State Spaces). Consider state spaces which are regular polygons; that is, $\hat\Omega$ is a regular $n$-gon inscribed in the unit circle as in Figure 2. Then complete sets of Paulis (in the sense of Corollary 14) look very different, depending on properties of the symmetry group $D_n$, the dihedral group. We illustrate this for the cases $n = 4$ and $n = 5$.
Let $\hat X_1 := (1, 0)$, such that the corresponding Pauli map acts on the Bloch space $\mathbb{R}^2$ via $X_1(\omega) = \langle \hat X_1, \hat\omega\rangle = \hat\omega_1$, if $\hat\omega = (\hat\omega_1, \hat\omega_2)$. First, consider a gbit system $A$ where the Bloch representation of state space, $\hat\Omega_A$, is a square, inscribed into a unit circle as in Figure 2. The symmetry group is the dihedral group $D_4$; its orbit on $X_1$ consists of the maps

$$\{X_1 \circ G \mid G \in D_4\} = \{(\hat\omega_1, \hat\omega_2) \mapsto \hat\omega_1,\ (\hat\omega_1, \hat\omega_2) \mapsto -\hat\omega_1,\ (\hat\omega_1, \hat\omega_2) \mapsto \hat\omega_2,\ (\hat\omega_1, \hat\omega_2) \mapsto -\hat\omega_2\}.$$

Disregarding the sign, we get a complete set of Paulis in the sense of Corollary 14, which is $\mathcal{X} = \{X_1, X_2\}$, where $X_1(\omega) = \hat\omega_1$ and $X_2(\omega) = \hat\omega_2$. Since $K_A = 3$, the formula from Corollary 14 becomes

$$\frac{P(\omega)}{2} = \frac{1}{2}\left(\hat\omega_1^2 + \hat\omega_2^2\right) \quad \text{for all } \omega \in \Omega_A, \tag{8}$$

which just expresses the fact that the purity equals the squared Euclidean length of the Bloch vector.

Now consider the case $n = 5$, that is, a state space $B$ where $\hat\Omega_B$ is a regular pentagon. A smallest irreducible subgroup is given by $\mathcal{H} := \{R(2\pi k/5) \mid k = 0, 1, \ldots, 4\}$ (denoting its action on Bloch vectors), where $R(\varphi)$ denotes rotation by angle $\varphi$ in $\mathbb{R}^2$. In general, $(X_1 \circ R(2\pi k/5)(\omega))^2$ gives different values for all $k$, which means that a complete set of Paulis necessarily contains all the five maps. Thus, all we get is

$$\frac{P(\omega)}{2} = \frac{1}{5}\sum_{k=0}^{4}\big(X_1(R(2\pi k/5)\,\hat\omega)\big)^2 \quad \text{for all } \omega \in \Omega_B. \tag{9}$$

In a sense, this is inefficient: in order to compute $P(\omega) = \sum_k \hat\omega_k^2$, we could as well use eq. (8), which involves only two addends instead of five. However, the advantage of (9) compared to (8) is that all involved maps $X_k := X_1 \circ R(2\pi k/5)$ are equivalent for the state space $B$: they are all connected by reversible transformations. In other words, in order to build a device that measures $X_k$, it is sufficient to have a device measuring $X_1$. All other measurements can then be accomplished by composing $X_1$ with a reversible transformation $T_k := R(2\pi k/5)$, as sketched in Figure 3. Within state space $B$, this is not possible for the two maps $X_1(\omega) := \hat\omega_1$ and $X_2(\omega) := \hat\omega_2$ that appear in eq. (8).

Figure 3: All elements Xk of a complete set of Paulis (in the sense of Corollary 14) on a general state space are connected by reversible
transformations. That is, every Xk can be measured by first applying a reversible transformation Tk , and then measuring a fixed Pauli X1 .
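The agreement between the two-term formula (8) and the five-term formula (9) can be illustrated numerically. The sketch below is only an illustration under the stated setup (Bloch vectors in the unit disc, $X_1$ reading out the first Bloch component); the function names are hypothetical.

```python
import math


def rotate(v, angle):
    """Rotate the 2D vector v counter-clockwise by the given angle."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def purity_two_terms(bloch):
    """Eq. (8): P = w1^2 + w2^2, using the two coordinate maps."""
    return bloch[0] ** 2 + bloch[1] ** 2


def purity_five_terms(bloch):
    """Eq. (9): P = (2/5) * sum_k X1(R(2 pi k / 5) w)^2, with X1 the first coordinate."""
    total = sum(rotate(bloch, 2 * math.pi * k / 5)[0] ** 2 for k in range(5))
    return 2 * total / 5


def pauli_values(bloch):
    """The five individual values X_k(w); in general they are all different."""
    return [rotate(bloch, 2 * math.pi * k / 5)[0] for k in range(5)]
```

For a generic Bloch vector such as `(0.3, -0.4)`, `pauli_values` returns five distinct numbers while both purity functions return the squared length of the vector, which is the trade-off described above: more addends, but all of them related by reversible transformations of the pentagon.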

E. Bipartite systems: local purity as an entanglement measure

The main goal of this paper is to investigate typical states on composite state spaces. When we have state spaces
A and B, there are in general many different possible ways to combine them into a joint state space AB. However,
there is a minimal set of assumptions that necessarily must hold in order to interpret AB as a joint state space in a
physically meaningful way. The most important assumption is no-signalling: measurements on one subsystem do not
affect the outcome probabilities on other subsystems. In this section, we make an additional simplifying assumption
which is often (but not always) imposed in the framework of general probabilistic theories: that of local tomography.
However, we will later drop this assumption in Subsection III H.
Assumption: Local tomography. If $A$ and $B$ are state spaces, then the joint state space $AB$ has the property that states $\omega^{AB} \in \Omega_{AB}$ are uniquely characterized by the outcome probabilities of the local measurements on $A$ and $B$.
From a physics point of view, this assumption means that the content of bipartite states consists of the correlations of outcome probabilities of local measurements. This is equivalent to the multiplicativity of the state space dimension: $K_{AB} = K_A K_B$. It can be shown [24] that this assumption implies the tensor product formalism: the linear space which carries the global unnormalized states is the algebraic tensor product of the local spaces, $AB = A \otimes B$. We have the notion of product states $\omega^A \otimes \omega^B \in \Omega_{AB}$ for states $\omega^A \in \Omega_A$, $\omega^B \in \Omega_B$, and similarly for effects, with the same interpretation as in the quantum case. In particular, the unit effect on $AB$ is $u^{AB} = u^A \otimes u^B$. For global states $\omega^{AB} \in \Omega_{AB}$, we can define the reduced state $\omega^A \in \Omega_A$ by $L^A(\omega^A) := L^A \otimes u^B(\omega^{AB})$ for all linear functionals $L^A$ on $A$ (in particular, for all effects).
In accordance with [24], we give a list of additional assumptions that naturally follow from the physical interpretation of a composite state space. First, if $\omega^A \in \Omega_A$ and $\omega^B \in \Omega_B$, then we assume that $\omega^A \otimes \omega^B \in \Omega_{AB}$. That is, we assume that it is possible to prepare states independently on $A$ and $B$. Second, if $\omega^{AB} \in \Omega_{AB}$ is any global state, we assume that the local reduced states are valid states on $A$ and $B$: $\omega^A \in \Omega_A$ and $\omega^B \in \Omega_B$. Since this work is on dynamical state spaces, we also postulate that reversible transformations can always be applied locally. That is, $\mathcal{G}_A \otimes \mathcal{G}_B \subseteq \mathcal{G}_{AB}$.
Similarly as in the quantum case, a global state $\omega^{AB} \in \Omega_{AB}$ will be called entangled if it cannot be written as a convex combination of product states. Now suppose $\omega^{AB}$ is pure.
If the purity of the local reduced state is one, i.e. $P(\omega^A) = 1$, then $\omega^A$ must be pure. From this, it follows [48] that $\omega^{AB} = \omega^A \otimes \omega^B$; that is, the global state is unentangled.
On the other hand, if $P(\omega^A) < 1$, then $\omega^{AB}$ cannot be written as a product $\omega^A \otimes \omega^B$, since $\omega^A$ would necessarily have to be pure. Thus, $\omega^{AB}$ is entangled.
That is, the local purity $P(\omega^A)$ can be understood as an entanglement measure: the smaller $P(\omega^A)$, the more entangled $\omega^{AB}$. If $P(\omega^A) = 0$, or equivalently $\omega^A = \mu^A$, we may call $\omega^{AB}$ maximally entangled. It turns out that a PR box is an example of a maximally entangled post-quantum state in this sense [44].
It is natural to ask for the typical entanglement of random pure states on a composite state space $AB$. As discussed in the context of Definition 3 above, in order for this notion to make sense, we need the property of transitivity: for every pair of pure states $\varphi, \omega \in \Omega_{AB}$, there must be a reversible transformation $T \in \mathcal{G}_{AB}$ such that $T\varphi = \omega$. It is important to note that transitivity of the local state spaces $A$ and $B$ does not imply transitivity of the joint state space $AB$. A simple example is given by a state space called "boxworld" [44]: suppose that $A$ and $B$ are both square state spaces as in the left of Figure 2, and $AB$ is the state space which contains all no-signalling behaviours (including, for example, PR-box states). That is, $\Omega_{AB}$ is assumed to be the largest possible subset of $A \otimes B$ that is consistent with the assumptions mentioned above ($\Omega_{AB}$ is sometimes called the "no-signalling polytope", or the "maximal tensor product" of the local state spaces). Then it turns out that the global state space is not transitive: for example, no reversible transformation takes a pure product state to a pure PR-box state.
Thus, in the following, we will only consider composite state spaces $AB$ that are themselves transitive. As a first observation, it turns out that the maximally mixed state on $AB$ is the product of the maximally mixed states of $A$ and $B$. The proof is given in [49].
Lemma 19. If $A$, $B$, and $AB$ are transitive dynamical state spaces, then $\mu^{AB} = \mu^A \otimes \mu^B$.
If $AB$ is transitive, we can decompose it into the Bloch subspace and multiples of the maximally mixed state: $AB = \widehat{AB} \oplus \mathbb{R}\,\mu^{AB}$. On the other hand, if $A$ and $B$ are transitive, we can substitute their local decompositions into the tensor product:

$$AB \equiv A \otimes B = (\hat A \oplus \mathbb{R}\mu^A) \otimes (\hat B \oplus \mathbb{R}\mu^B) = (\hat A \otimes \hat B) \oplus (\hat A \otimes \mu^B) \oplus (\mu^A \otimes \hat B) \oplus \mathbb{R}\,\mu^{AB}.$$

Since $u^{AB} = u^A \otimes u^B$, the unit effect is zero on the first three addends in this decomposition. This shows that

$$\widehat{AB} = (\hat A \otimes \hat B) \oplus (\hat A \otimes \mu^B) \oplus (\mu^A \otimes \hat B). \tag{10}$$

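This decomposition is reminiscent of another Bloch representation [49, 50] which writes global states in terms of three vectors: the two local reduced states, and a correlation matrix. Now suppose that, in addition, $AB$ is irreducible. Then there is a unique inner product on $\widehat{AB}$ such that all transformations $T \in \mathcal{G}_{AB}$ are orthogonal. Moreover, the three subspaces in eq. (10) are preserved by local transformations $T_A \otimes T_B$, and they are mutually orthogonal. To see this, first note that for pure states $\omega^A$, we have

$$\int_{G_A \in \mathcal{G}_A} G_A\,\hat\omega^A\, dG_A = \int_{G_A \in \mathcal{G}_A} G_A\,\omega^A\, dG_A - \mu^A = 0.$$

Since the pure states span $A$, the $\hat\omega^A$ span $\hat A$, and this integral is zero for all vectors $\hat a \in \hat A$. Now mutual orthogonality of the subspaces, for example $\hat A \otimes \hat B \perp \hat A \otimes \mu^B$, follows from

$$\langle \hat a \otimes \hat b, \hat a' \otimes \mu^B\rangle = \langle (I_A \otimes G_B)(\hat a \otimes \hat b), (I_A \otimes G_B)(\hat a' \otimes \mu^B)\rangle = \langle \hat a \otimes G_B\hat b, \hat a' \otimes \mu^B\rangle$$
$$= \int_{G_B \in \mathcal{G}_B} \langle \hat a \otimes G_B\hat b, \hat a' \otimes \mu^B\rangle\, dG_B = \langle \hat a \otimes 0, \hat a' \otimes \mu^B\rangle = 0$$

(the other pairs of subspaces can be treated similarly). The value of the inner product on $\hat A \otimes \mu^B$ (and similarly on $\mu^A \otimes \hat B$) can be calculated explicitly: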
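Lemma 20. Let $A$, $B$, and $AB$ be transitive dynamical state spaces, where $A$ and $AB$ are irreducible. Then

$$\langle \hat x \otimes \mu^B, \hat y \otimes \mu^B\rangle = P(\alpha^A \otimes \mu^B)\,\langle \hat x, \hat y\rangle \quad \text{for all } \hat x, \hat y \in \hat A,$$

where $\alpha^A$ is any pure state on $A$.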


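Proof. For any pair of vectors $\hat x, \hat y \in \hat A$, define $(\hat x, \hat y) := \langle \hat x \otimes \mu^B, \hat y \otimes \mu^B\rangle$. Clearly, this is an inner product on $\hat A$. Moreover, it is invariant with respect to all reversible transformations on $A$. Explicitly, for all $T \in \mathcal{G}_A$,

$$(T\hat x, T\hat y) = \langle T\hat x \otimes \mu^B, T\hat y \otimes \mu^B\rangle = \langle (T \otimes I)(\hat x \otimes \mu^B), (T \otimes I)(\hat y \otimes \mu^B)\rangle = \langle \hat x \otimes \mu^B, \hat y \otimes \mu^B\rangle = (\hat x, \hat y),$$

since local transformations are in particular orthogonal with respect to the global invariant inner product. According to Lemma 6, it follows that there exists some global constant $c > 0$ such that $(\hat x, \hat y) = c\,\langle \hat x, \hat y\rangle$. Choosing $\hat x = \hat y = \hat\alpha^A$ for some pure state $\alpha^A$ shows that $c = (\hat\alpha^A, \hat\alpha^A) = \langle \hat\alpha^A \otimes \mu^B, \hat\alpha^A \otimes \mu^B\rangle$. But

$$(\alpha^A \otimes \mu^B)^\wedge = (\hat\alpha^A + \mu^A) \otimes \mu^B - \mu^A \otimes \mu^B = \hat\alpha^A \otimes \mu^B, \tag{11}$$

hence $c = \langle (\alpha^A \otimes \mu^B)^\wedge, (\alpha^A \otimes \mu^B)^\wedge\rangle = P(\alpha^A \otimes \mu^B)$.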


We would like to construct Pauli maps on $AB$ from Pauli maps on $A$ and $B$. In particular, if $X^A$ is a Pauli map on $A$, a natural idea is to use the map $X^A \otimes u^B$ on the global state space. Is this a Pauli map? First, we have $X^A \otimes u^B(\mu^{AB}) = X^A(\mu^A)\,u^B(\mu^B) = 0$, so the first condition of Definition 12 is satisfied. But there is a second condition, demanding that the vector $(X^A \otimes u^B)^\wedge$ representing this map must be normalized. As it turns out, this vector has norm larger than one in general. The following lemma says how the map has to be normalized in order to obtain a Pauli map on $AB$.
Lemma 21. Let $A$, $B$, and $AB$ be transitive dynamical state spaces, where $A$ and $AB$ are irreducible. If $X^A : A \to \mathbb{R}$ is a Pauli map on $A$, then the following identity holds and describes a Pauli map on $AB$:

$$\frac{X^A \otimes u^B}{\|(X^A \otimes u^B)^\wedge\|_2} = \sqrt{P(\alpha^A \otimes \mu^B)}\; X^A \otimes u^B,$$

where $\alpha^A$ is an arbitrary pure state on $A$, and $\mu^B$ is the maximally mixed state on $B$.


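Proof. Let $c := \|(X^A \otimes u^B)^\wedge\|_2$ and $X := \frac{1}{c}\,X^A \otimes u^B$, then clearly $\|\hat X\|_2 = 1$ and $X(\mu^{AB}) = \frac{1}{c}\,X^A(\mu^A)\,u^B(\mu^B) = 0$, so $X$ is a Pauli map according to Definition 12. It remains to show that $\frac{1}{c^2} = P(\alpha^A \otimes \mu^B)$. Recall the decomposition of $\widehat{AB}$ from eq. (10). The functional $X^A \otimes u^B$ acts as the zero map on $\hat A \otimes \hat B$ and $\mu^A \otimes \hat B$, hence it achieves its maximal value on unit vectors on the subspace $\hat A \otimes \mu^B$. Thus, by elementary analysis,

$$c = \|(X^A \otimes u^B)^\wedge\|_2 = \max_{\hat\omega \in \widehat{AB}\setminus\{0\}} \frac{|X^A \otimes u^B(\hat\omega)|}{\|\hat\omega\|_2} = \max_{\hat\omega \in \hat A \otimes \mu^B\setminus\{0\}} \frac{|X^A \otimes u^B(\hat\omega)|}{\|\hat\omega\|_2} = \max_{\hat a \in \hat A\setminus\{0\}} \frac{|X^A \otimes u^B(\hat a \otimes \mu^B)|}{\|\hat a \otimes \mu^B\|_2}.$$

According to Lemma 20, we have $\|\hat a \otimes \mu^B\|_2 = \sqrt{\langle \hat a \otimes \mu^B, \hat a \otimes \mu^B\rangle} = \sqrt{P(\alpha^A \otimes \mu^B)}\,\|\hat a\|_2$, hence

$$c = \frac{1}{\sqrt{P(\alpha^A \otimes \mu^B)}}\,\max_{\hat a \in \hat A\setminus\{0\}} \frac{|X^A(\hat a)|}{\|\hat a\|_2} = \frac{\|\hat X^A\|_2}{\sqrt{P(\alpha^A \otimes \mu^B)}} = \frac{1}{\sqrt{P(\alpha^A \otimes \mu^B)}}.$$

This proves the claim.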


Suppose we draw a pure state $\omega^{AB} \in \Omega_{AB}$ at random. This can be alternatively understood as a two-part process: first, we fix an arbitrary pure state $\varphi^{AB} \in \Omega_{AB}$. Then, we apply a random reversible transformation $T \in \mathcal{G}_{AB}$ to it (drawn according to the Haar measure): the result $\omega^{AB} = T\varphi^{AB}$ will be a random pure state. Similarly, we can fix a mixed state $\varphi^{AB} \in \Omega_{AB}$ with $P(\varphi^{AB}) = P_0 < 1$, and apply a Haar-random reversible transformation to it: $\omega^{AB} = T\varphi^{AB}$.
Having an initially mixed global state describes, for example, classical coin tossing, with $A$ the coin and $B$ the environment. We will loosely describe this situation as "drawing a random state $\omega^{AB}$ of purity $P_0 := P(\varphi^{AB}) = P(\omega^{AB})$", but this description is not quite correct: there is no natural invariant measure on the set of all states with fixed purity $P_0 < 1$, since those states are in general not all connected by reversible transformations (an obvious example is given by the square state space in Figure 2). Thus, not all properties of the random state $\omega^{AB}$ will be independent of the initial state $\varphi^{AB}$. However, as we shall see, the expected local purity will be independent of $\varphi^{AB}$, and this is all we are interested in here.
Theorem 22. Let $A$, $B$, and $AB$ be transitive dynamical state spaces, where $A$ and $AB$ are irreducible. Draw a state $\omega^{AB} \in \Omega_{AB}$ of fixed purity $P(\omega^{AB})$ randomly. Then, the expected purity of the local reduced state $\omega^A$ is

$$\mathbb{E}_\omega P(\omega^A) = \frac{K_A - 1}{K_A K_B - 1} \cdot \frac{P(\omega^{AB})}{P(\alpha^A \otimes \mu^B)},$$

where $\alpha^A$ is an arbitrary pure state on $A$, and $\mu^B$ is the maximally mixed state on $B$.

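Proof. Let $X^A$ be any Pauli map on $A$, then $X := \sqrt{P(\alpha^A \otimes \mu^B)}\; X^A \otimes u^B$ is a Pauli map on $AB$ according to Lemma 21. Using the invariance of the Haar measure and Lemma 13, we calculate

$$\mathbb{E}_\omega \frac{P(\omega^A)}{K_A - 1} = \mathbb{E}_\omega \int_{G \in \mathcal{G}_A} \big(X^A \circ G(\omega^A)\big)^2 dG = \mathbb{E}_\omega \int_{G \in \mathcal{G}_A} \big(X^A \otimes u^B(G \otimes I(\omega^{AB}))\big)^2 dG$$
$$= \mathbb{E}_\omega \big(X^A \otimes u^B(\omega^{AB})\big)^2 = \mathbb{E}_\omega \int_{G \in \mathcal{G}_{AB}} \big(X^A \otimes u^B(G\omega^{AB})\big)^2 dG$$
$$= \frac{1}{P(\alpha^A \otimes \mu^B)}\, \mathbb{E}_\omega \int_{G \in \mathcal{G}_{AB}} \big(X \circ G(\omega^{AB})\big)^2 dG = \frac{1}{P(\alpha^A \otimes \mu^B)} \cdot \frac{\mathbb{E}_\omega P(\omega^{AB})}{K_A K_B - 1}.$$

The symbol $\mathbb{E}_\omega$ denoting the expected value with respect to $\omega$ disappears since $\omega^{AB}$ is drawn from a "uniform distribution" of a set of states with fixed purity (as described before the theorem).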
It is easily checked that this result contains the well-known quantum result that random pure bipartite states are almost maximally entangled with high probability if $|B| \gg |A|$, but we will not demonstrate this here. Instead, we ask whether the expression $P(\alpha^A \otimes \mu^B)$ appearing in Theorem 22 can be simplified. It turns out that this is possible under some additional assumptions, and that this expression is related to the information carrying capacity of the involved state spaces. This will be shown in the next subsection.
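For classical probability theory, where the reversible transformations on the joint system are the permutations of joint configurations, Theorem 22 can be checked by brute force. The following sketch enumerates all permutations of a small joint distribution and compares the exact average local purity with the formula; it assumes the classical purity $P(p) = \frac{n}{n-1}\sum_i (p_i - \frac{1}{n})^2$ and $K = n$ for an $n$-level classical system, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def purity(p):
    """Classical purity, normalized so that deterministic distributions have purity 1."""
    n = len(p)
    return Fraction(n, n - 1) * sum((x - Fraction(1, n)) ** 2 for x in p)


def marginal_a(joint, n_a, n_b):
    """Marginal on A; configuration (a, b) is stored at index a * n_b + b."""
    return [sum(joint[a * n_b + b] for b in range(n_b)) for a in range(n_a)]


def expected_local_purity(joint, n_a, n_b):
    """Exact average of the local purity over all permutations of joint configurations."""
    perms = list(permutations(range(n_a * n_b)))
    total = Fraction(0)
    for perm in perms:
        permuted = [joint[perm[i]] for i in range(len(joint))]
        total += purity(marginal_a(permuted, n_a, n_b))
    return total / len(perms)


def theorem_22_prediction(joint, n_a, n_b):
    """(K_A - 1)/(K_A K_B - 1) * P(joint) / P(alpha x mu), with K = number of levels."""
    # alpha^A is the first deterministic state on A, mu^B the uniform distribution on B.
    alpha_mu = [Fraction(1, n_b) if i < n_b else Fraction(0)
                for i in range(n_a * n_b)]
    return Fraction(n_a - 1, n_a * n_b - 1) * purity(joint) / purity(alpha_mu)


# A coin (2 levels) with a 2-level environment, in a mixed joint state.
joint_example = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
```

With `joint_example`, both `expected_local_purity(joint_example, 2, 2)` and `theorem_22_prediction(joint_example, 2, 2)` evaluate to the same fraction. The enumeration grows factorially with the number of joint configurations, so it is meant only for very small systems.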

F. Classical subsystems and capacity

How can we quantify the ability of a system (or state space) to carry classical information? A classical bit $A$ and a quantum bit $B$ both carry one bit of classical information, even though the state space dimensions are quite different: $K_A = 2$, while $K_B = 4$. The relevant quantity turns out to be the maximal number of perfectly distinguishable states [28], denoted $N$. In order to define it, we have to talk about measurements.
Single measurement outcomes on a state space $A$ are described by effects, which are linear maps $E : A \to \mathbb{R}$ with the property that $E(\omega) \geq 0$ for all $\omega \in A_+$. The set of all effects [63] is known as the dual cone $A_+^*$ of the cone of unnormalized states $A_+$ in convex geometry [51]. An $n$-outcome measurement is a collection of effects $E_1, \ldots, E_n$ that sum to the order unit: $\sum_{i=1}^n E_i = u^A$. The probability of obtaining outcome $i$ on state $\omega \in \Omega_A$ is $E_i(\omega)$.
Definition 23. Let $A$ be any state space.
A set of pure states $\omega_1, \ldots, \omega_n \in \Omega_A$ is called a classical subsystem if there is a measurement $E_1, \ldots, E_n$ such that $E_i(\omega_j) = \delta_{ij}$ (which is 1 for $i = j$ and 0 otherwise); that is, if the states are perfectly distinguishable by a single-shot measurement.
The capacity $N_A$ is defined to be the maximal size of any classical subsystem of $A$.
If $A$ is a transitive state space, then a classical subsystem $\omega_1, \ldots, \omega_n$ will be called centered if

$$\frac{1}{n}\sum_{i=1}^{n} \omega_i = \mu^A.$$

If $A$ is a transitive dynamical state space, then a classical subsystem $\omega_1, \ldots, \omega_n$ will be called dynamical if for every permutation $\pi$ on $\{1, \ldots, n\}$, there is a reversible transformation $T \in \mathcal{G}_A$ such that $T(\omega_i) = \omega_{\pi(i)}$ for all $i$.
A classical subsystem is a subset of a state space which, in many respects, behaves like a classical system from probability theory. For example, given orthonormal vectors $|\psi_1\rangle, \ldots, |\psi_n\rangle \in \mathbb{C}^d$ with $\langle\psi_i|\psi_j\rangle = \delta_{ij}$, the corresponding quantum states $\omega_i := |\psi_i\rangle\langle\psi_i|$ constitute a classical subsystem. It is centered if and only if $n = d$, and the quantum state space capacity is its Hilbert space dimension $d$. A classical subsystem is dynamical if it also carries all of the reversible dynamics of classical probability theory, that is, all the permutations. This is clearly the case in quantum theory, where every permutation of the orthonormal basis vectors can be implemented by a unitary transformation.
It is easy to see that to any set of mixed states $\omega_1, \ldots, \omega_n$ with effects $E_1, \ldots, E_n$ such that $E_i(\omega_j) = \delta_{ij}$, there exists a set of pure states $\omega_1', \ldots, \omega_n'$ such that $E_i(\omega_j') = \delta_{ij}$. Thus, the requirement of purity in this definition introduces no restriction. Here are some simple consequences of this definition:
Lemma 24. We have the following properties of capacity and classical subsystems:
(i) Capacity satisfies $N_A \leq K_A$, and we have equality if and only if $A$ is a classical state space, i.e. $\Omega_A$ is a simplex.
(ii) If $\omega_1, \ldots, \omega_n$ is a centered classical subsystem, then necessarily $n = N_A$.
(iii) If $A$ and $B$ carry centered classical subsystems, then so does $AB$, and we have $N_{AB} = N_A N_B$.
(iv) If a dynamical classical subsystem contains the maximally mixed state in its affine hull, then it is centered.
Proof. (i) It follows from the definition that sets of perfectly distinguishable states are linearly independent. Since the number of linearly independent vectors is upper-bounded by the dimension $K_A$, this proves that $N_A \leq K_A$. Now suppose we have equality, then the perfectly distinguishable states $\omega_1, \ldots, \omega_n$ with $n = N_A$ are a basis of $A$. Every state $\omega \in \Omega_A$ can thus be written $\omega = \sum_{i=1}^{N_A} \lambda_i \omega_i$ with $\lambda_i \in \mathbb{R}$. Since $u^A(\omega) = 1 = u^A(\omega_i)$, we get $\sum_{i=1}^{N_A} \lambda_i = 1$. Moreover, we have $0 \leq E_j(\omega) = \sum_{i=1}^{N_A} \lambda_i E_j(\omega_i) = \lambda_j$. That is, $\omega$ is in the convex hull of $\omega_1, \ldots, \omega_n$; in other words, $\Omega_A$ is the simplex generated by the $\omega_i$.
(ii) Clearly, $N_A \geq n$. In order to see the converse inequality, let $\varphi_1, \ldots, \varphi_{N_A}$ be a maximal classical subsystem with corresponding effects $E_1, \ldots, E_{N_A}$. Due to transitivity, for every $k$, there is a reversible transformation $T_k \in \mathcal{G}_A$ such that $T_k\omega_1 = \varphi_k$. Using the invariance of the maximally mixed state, we get

$$E_k(\mu^A) = E_k(T_k\mu^A) = E_k\!\left(T_k\,\frac{1}{n}\sum_{i=1}^{n}\omega_i\right) = \frac{1}{n}\sum_{i=1}^{n} E_k(T_k\omega_i) \geq \frac{1}{n}\,E_k(T_k\omega_1) = \frac{1}{n}\,E_k(\varphi_k) = \frac{1}{n}.$$

On the other hand, we have $1 = \sum_{k=1}^{N_A} E_k(\mu^A) \geq \frac{N_A}{n}$. This proves that $N_A \leq n$.
(iii) If $\{\omega_i^A\}_{i=1}^{N_A}$ and $\{\omega_j^B\}_{j=1}^{N_B}$ are centered classical subsystems on $A$ and $B$ respectively, then all states $\omega_i^A \otimes \omega_j^B$ are pure. Moreover, they are perfectly distinguishable by the corresponding product measurement, and

$$\frac{1}{N_A N_B}\sum_{i,j} \omega_i^A \otimes \omega_j^B = \left(\frac{1}{N_A}\sum_{i=1}^{N_A}\omega_i^A\right) \otimes \left(\frac{1}{N_B}\sum_{j=1}^{N_B}\omega_j^B\right) = \mu^A \otimes \mu^B = \mu^{AB}.$$

Thus, $\{\omega_i^A \otimes \omega_j^B\}_{i,j}$ is a centered classical subsystem on $AB$ of size $N_A N_B$, and it follows from part (ii) that $N_{AB} = N_A N_B$.
(iv) Suppose that $\omega_1, \ldots, \omega_n$ is a dynamical classical subsystem on $A$, and $\mu^A = \sum_{i=1}^{n} r_i\omega_i$ for some real numbers $r_i \in \mathbb{R}$ with $\sum_{i=1}^{n} r_i = 1$. Let $E_1, \ldots, E_n$ be the corresponding effects with $E_i(\omega_j) = \delta_{ij}$. Let $j, k \in \{1, \ldots, n\}$ be arbitrary, and $\pi$ a permutation with $k = \pi^{-1}(j)$. Then

$$r_j = \sum_{i=1}^{n} r_i\delta_{ij} = \sum_{i=1}^{n} r_i E_j(\omega_i) = E_j(\mu^A) = E_j(T_\pi\mu^A) = \sum_{i=1}^{n} r_i E_j(T_\pi\omega_i) = \sum_{i=1}^{n} r_i E_j(\omega_{\pi(i)}) = r_{\pi^{-1}(j)} = r_k.$$

Thus, all $r_i$ are equal, and since $\sum_{i=1}^{n} r_i = 1$, we must have $r_i = \frac{1}{n}$ for all $i$. This proves the claim.

Not every state space carries a centered classical subsystem. This is illustrated in Figure 2: both the square state space $A$ and the pentagon $B$ have capacity $N_A = N_B = 2$. Any pair of antipodal pure states of the square constitutes a centered classical subsystem of $A$, but the pentagon does not possess any centered classical subsystem. A polygonal state space with $n \geq 4$ sides carries a centered classical subsystem if and only if $n$ is even.
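The parity statement for polygons can be explored with a small search. The sketch below looks only for two-element classical subsystems (consistent with capacity 2 for $n \geq 4$) on a regular $n$-gon inscribed in the unit circle, assuming the maximally mixed state sits at the centre. As a candidate distinguishing effect it uses $E(\hat\omega) = (1 + \langle \hat v, \hat\omega\rangle)/2$ for a vertex $\hat v$, which is an illustrative choice rather than a construction taken from the text; the function names are hypothetical.

```python
import math


def vertices(n):
    """Bloch vectors of the pure states of a regular n-gon on the unit circle."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]


def candidate_effect(v):
    """Candidate effect E(w) = (1 + <v, w>) / 2 acting on Bloch vectors w."""
    return lambda w: (1 + v[0] * w[0] + v[1] * w[1]) / 2


def centered_pairs(n, tol=1e-9):
    """Pairs (i, j) of vertices that average to the centre and are told apart by E."""
    vs = vertices(n)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            mid = ((vs[i][0] + vs[j][0]) / 2, (vs[i][1] + vs[j][1]) / 2)
            if math.hypot(mid[0], mid[1]) > tol:
                continue  # the average is not the maximally mixed state
            e = candidate_effect(vs[i])
            # E must give probabilities in [0, 1] on all pure states (hence on all states).
            valid = all(-tol <= e(w) <= 1 + tol for w in vs)
            if valid and abs(e(vs[i]) - 1) < tol and abs(e(vs[j])) < tol:
                pairs.append((i, j))
    return pairs
```

For example, `centered_pairs(4)` returns the two antipodal pairs of the square, while `centered_pairs(5)` returns an empty list, in line with the statement above.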
Why is it natural to assume the existence of a centered classical subsystem? We will now discuss three good reasons for a centered classical subsystem to exist in physically relevant state spaces. A first motivation comes from dynamical considerations in group theory. Consider a qubit. The north and south pole (say, $\omega_1 = |0\rangle\langle 0|$ and $\omega_2 = |1\rangle\langle 1|$) constitute a classical subsystem, and it is one with rich dynamics: we can do classical computation in this subsystem, that is, implement all the permutations (which is just a bit flip in the case of a qubit, but involves many more transformations for higher-dimensional quantum systems).
More generally, we may ask what transformations preserve this classical subsystem. Together with the bit flips, these are the rotations around the $z$-axis, and there are many of them: only the maximally mixed state (and no other) is preserved by those transformations. It turns out that this property forces the classical subsystem to be centered:
Lemma 25. Let $\omega_1, \ldots, \omega_n$ be a classical subsystem on a transitive dynamical state space $A$, and let $\mathcal{G}_\omega \subseteq \mathcal{G}_A$ be its stabilizer subgroup. If the maximally mixed state $\mu^A$ is the only $\mathcal{G}_\omega$-invariant state, then the subsystem is centered.

Proof. Every $T \in \mathcal{G}_\omega$ preserves $\bar\omega := \frac{1}{n}\sum_{i=1}^{n}\omega_i$. If the lemma's condition is satisfied, this must be the maximally mixed state $\mu^A$.

This means that if the state space is symmetric enough to allow for a rich group of dynamics (leaving the classical
subsystem invariant), and if that subsystem is large enough such that the corresponding group mixes basically
all of state space, then the subsystem must be centered.
As a second motivation, consider any maximal classical subsystem $\omega_1, \ldots, \omega_N$. We can think of the convex hull $\mathrm{conv}\{\omega_1, \ldots, \omega_N\}$ as a classical state space (a simplex) embedded in the more general, larger state space. This simplex carries its own "classical maximally mixed state", which is $\mu_{\rm classical} := \frac{1}{N}\sum_{i=1}^{N}\omega_i$. The property of being centered just means that this classical maximally mixed state equals the maximally mixed state of the larger theory, $\mu_{\rm classical} = \mu$: classical probability theory is embedded in a symmetric way.
From a physics point of view, this is to be expected whenever we have some kind of decoherence mechanism which effectively reduces observations to the embedded classical system. On an $n$-level quantum system, for example, decoherence can effectively reduce the observable state space to that of an $n$-simplex, which corresponds to diagonal density matrices in the Hamiltonian's eigenbasis. Now suppose that decoherence has taken place, and in addition, we have total ignorance about the classical state of our system, such that we hold the state $\mu_{\rm classical}$.
Physically, we expect that we are left with no remaining information at all: if we have perfect decoherence, followed by perfect classical ignorance of the state, there should be no more remaining information that we could read out by measurement. This implies that $\mu_{\rm classical} = \mu$; that is, the existence of a centered classical subsystem. This subsystem determines a preferred basis for decoherence.
A third, more operational way to understand this property is a principle [52] of information saturation: suppose that Alice obtains a message $i \in \{1, \ldots, N\}$ randomly, with uniform distribution. She encodes this message into the state $\omega_i$ of the state space's maximal classical subsystem, and sends it to Bob. The principle of information saturation asserts that Alice can use this to send the message $i$ to Bob with perfect success probability, but not more. This amounts to saying that the mixed state that she effectively sends, $\frac{1}{N}\sum_{i=1}^{N}\omega_i$, should be the maximally mixed state of the theory.
Before turning to the main result of this section, we need to consider one more property of state spaces. So far,
we have mainly talked about classical subsystems on single state spaces. However, if we are interested in classical
subsystems on composite state spaces AB, we expect that our theory can imitate another computational feature of
classical probability theory: that dynamical classical subsystems on A and B combine to dynamical classical subsystems on AB. In other words, we expect that AB carries a dynamical classical subsystem which can be decomposed
into A- and B-parts.
Definition 26 (Composite Classical Subsystem). A composite transitive dynamical state space $AB$ is said to carry a composite classical subsystem if there are centered dynamical classical subsystems $\omega_1^A, \ldots, \omega_{N_A}^A$ on $A$ and $\omega_1^B, \ldots, \omega_{N_B}^B$ on $B$ such that the corresponding classical subsystem containing the states $\omega_{i,j}^{AB} := \omega_i^A \otimes \omega_j^B$ is dynamical.
We know from Lemma 24 that the states $\omega_{i,j}^{AB}$ are automatically a centered classical subsystem, and $N_{AB} = N_A N_B$. However, it is not automatically clear that all permutations on this classical subsystem can be implemented reversibly, that is, that this classical subsystem is dynamical. If it is, it will be called a composite classical subsystem.

Intuitively, this means that A and B contain classical probability distributions as subsystems, in the friendliest
possible way: all permutations can be applied; the classical states of AB are combinations of those of A and B; the
local classical maximally mixed states correspond to the maximally mixed states of A and B. The philosophy of this
assumption is that physical state spaces should always be generalizations of classical probability theory, reducing to
the latter in the case of decoherence.
Centered dynamical classical subsystems have a nice symmetry property:
Lemma 27. Let $\{\omega_1, \ldots, \omega_N\}$ be a centered dynamical classical subsystem on some state space. Then $\langle \hat\omega_i, \hat\omega_j\rangle = -\frac{1}{N-1}$ for all $i \neq j$.

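Proof. By definition, for every permutation $\pi$ on $\{1, \ldots, N\}$, there exists a reversible transformation $T$ such that $T\omega_i = \omega_{\pi(i)}$. Hence $\langle \hat\omega_i, \hat\omega_j\rangle = \langle T\hat\omega_i, T\hat\omega_j\rangle = \langle \hat\omega_{\pi(i)}, \hat\omega_{\pi(j)}\rangle$. This proves that there is some constant $\gamma \in \mathbb{R}$ such that $\langle \hat\omega_i, \hat\omega_j\rangle = \gamma$ for all $i \neq j$. Now use the fact that the classical subsystem is centered:

$$0 = \langle \hat\mu, \hat\mu\rangle = \frac{1}{N^2}\sum_{i,j=1}^{N}\langle \hat\omega_i, \hat\omega_j\rangle = \frac{1}{N^2}\left(\sum_{i=1}^{N}\langle \hat\omega_i, \hat\omega_i\rangle + \sum_{i=1}^{N}\sum_{j\neq i}\langle \hat\omega_i, \hat\omega_j\rangle\right) = \frac{1}{N^2}\big(N + N(N-1)\gamma\big).$$

This equation can be used to infer that $\gamma = -1/(N-1)$.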
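The value $-1/(N-1)$ can be verified exactly in the simplest setting, an $N$-level classical system whose deterministic distributions form a centered dynamical classical subsystem. The sketch assumes the invariant inner product $\langle \hat x, \hat y\rangle = \frac{N}{N-1}\sum_k \hat x_k \hat y_k$ on Bloch vectors, i.e. the normalization under which pure states have unit length; the function names are hypothetical.

```python
from fractions import Fraction


def bloch_inner(x, y):
    """Invariant inner product of the Bloch vectors of two probability vectors."""
    n = len(x)
    u = Fraction(1, n)  # entry of the maximally mixed state
    return Fraction(n, n - 1) * sum((a - u) * (b - u) for a, b in zip(x, y))


def basis_state(i, n):
    """Deterministic distribution concentrated on outcome i."""
    return [Fraction(1) if k == i else Fraction(0) for k in range(n)]


def check_lemma_27(n):
    """True if all pure states have unit norm and distinct ones have overlap -1/(n-1)."""
    states = [basis_state(i, n) for i in range(n)]
    for i in range(n):
        for j in range(n):
            expected = Fraction(1) if i == j else Fraction(-1, n - 1)
            if bloch_inner(states[i], states[j]) != expected:
                return False
    return True
```

`check_lemma_27(n)` returns `True` for every `n >= 2`; for `n = 2` it reproduces the antipodal overlap $-1$ familiar from the two poles of a qubit.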


Now we are ready to prove the main result of this subsection.
Theorem 28. Let $A$, $B$, and $AB$ be irreducible, and suppose that $AB$ carries a composite classical subsystem. Then

$$P(\alpha^A \otimes \mu^B) = \frac{N_A - 1}{N_A N_B - 1}$$

for every pure state $\alpha^A \in \Omega_A$.

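Proof. By definition, there are centered dynamical classical subsystems $\omega_1^A, \ldots, \omega_{N_A}^A$ on $A$, and $\omega_1^B, \ldots, \omega_{N_B}^B$ on $B$ such that the states $\omega_{i,j}^{AB} := \omega_i^A \otimes \omega_j^B$ constitute a centered dynamical classical subsystem on $AB$. We know from Lemma 27 that $\langle \hat\omega_{i,j}^{AB}, \hat\omega_{k,l}^{AB}\rangle = -1/(N_A N_B - 1)$ if $(i, j) \neq (k, l)$. Decomposing $\mu^B$, we get $\omega_1^A \otimes \mu^B = \frac{1}{N_B}\sum_{j=1}^{N_B}\omega_1^A \otimes \omega_j^B$ and thus $(\omega_1^A \otimes \mu^B)^\wedge = \frac{1}{N_B}\sum_{j=1}^{N_B}(\omega_1^A \otimes \omega_j^B)^\wedge$. Consequently,

$$P(\omega_1^A \otimes \mu^B) = \langle (\omega_1^A \otimes \mu^B)^\wedge, (\omega_1^A \otimes \mu^B)^\wedge\rangle = \frac{1}{N_B^2}\sum_{j,k=1}^{N_B}\langle (\omega_1^A \otimes \omega_j^B)^\wedge, (\omega_1^A \otimes \omega_k^B)^\wedge\rangle$$
$$= \frac{1}{N_B^2}\left(\sum_{j=1}^{N_B}\langle \hat\omega_{1j}^{AB}, \hat\omega_{1j}^{AB}\rangle + \sum_{j=1}^{N_B}\sum_{k\neq j}\langle \hat\omega_{1j}^{AB}, \hat\omega_{1k}^{AB}\rangle\right) = \frac{1}{N_B^2}\left(N_B - \frac{N_B(N_B - 1)}{N_A N_B - 1}\right).$$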

Some simplification completes the proof.
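For completeness, the omitted simplification can be written out as follows. It starts from the expression $\frac{1}{N_B^2}\big(N_B - \frac{N_B(N_B-1)}{N_A N_B - 1}\big)$, obtained by inserting unit norm for the $N_B$ diagonal terms and the Lemma 27 value $-1/(N_A N_B - 1)$ for the $N_B(N_B-1)$ off-diagonal terms; nothing beyond elementary algebra is assumed.

```latex
\begin{align*}
P(\omega_1^A \otimes \mu^B)
  &= \frac{1}{N_B^2}\left(N_B - \frac{N_B\,(N_B - 1)}{N_A N_B - 1}\right) \\
  &= \frac{1}{N_B}\cdot\frac{(N_A N_B - 1) - (N_B - 1)}{N_A N_B - 1} \\
  &= \frac{1}{N_B}\cdot\frac{N_B\,(N_A - 1)}{N_A N_B - 1}
   = \frac{N_A - 1}{N_A N_B - 1}.
\end{align*}
```

Since Lemma 20 identifies $P(\alpha^A \otimes \mu^B)$ with a constant that does not depend on the choice of the pure state $\alpha^A$, the value computed for $\omega_1^A$ holds for every pure state on $A$.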


Substituting Theorem 28 into Theorem 22 proves
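Theorem 29. Let $A$, $B$, and $AB$ be irreducible, and suppose that $AB$ carries a composite classical subsystem. Draw a state $\omega^{AB} \in \Omega_{AB}$ of fixed purity $P(\omega^{AB})$ randomly, then

$$\mathbb{E}_\omega P(\omega^A) = \frac{K_A - 1}{K_A K_B - 1} \cdot \frac{N_A N_B - 1}{N_A - 1}\; P(\omega^{AB}).$$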

This is the sought-for specialization of Theorem 22. Both Theorem 22 and Theorem 29 give explicit expressions for the expected local purity of random bipartite states. While Theorem 22 is more general (it does not assume the existence of a composite classical subsystem), it has the disadvantage of containing a term $P(\alpha^A \otimes \mu^B)$ with no simple operational meaning. The statement of Theorem 29 is operationally simpler, but makes stronger assumptions on the state spaces.
Note also that a further simplification may be made in the case where $K = N^r$ for some integer $r$, a class of theories discussed in [27, 28]. Then $r$ becomes the only parameter that determines the expected purity of a subsystem and, for a pure global state, $\mathbb{E}_\omega P(\omega^A) \approx N_B^{-(r-1)}$ (where the approximation is good if $N \gg 1$ for all systems/subsystems under consideration).
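The formula of Theorem 29 is simple enough to evaluate exactly, which allows a few consistency checks. The sketch below makes three assumptions that go beyond this passage: for quantum theory it uses $N = d$, $K = d^2$ and the purity normalization $P(\rho) = (d\,\mathrm{Tr}\rho^2 - 1)/(d - 1)$; it compares against the well-known average $\mathbb{E}\,\mathrm{Tr}(\rho_A^2) = (d_A + d_B)/(d_A d_B + 1)$ for Haar-random pure states, which is quoted from the general literature rather than from this paper; and the function names are hypothetical.

```python
from fractions import Fraction


def theorem_29(k_a, k_b, n_a, n_b, p_ab=Fraction(1)):
    """Expected local purity (K_A-1)/(K_A K_B-1) * (N_A N_B-1)/(N_A-1) * P(omega^AB)."""
    return (Fraction(k_a - 1, k_a * k_b - 1)
            * Fraction(n_a * n_b - 1, n_a - 1) * p_ab)


def quantum_expected_purity(d_a, d_b):
    """Theorem 29 for quantum systems, assuming N = d and K = d**2, pure global state."""
    return theorem_29(d_a ** 2, d_b ** 2, d_a, d_b)


def known_quantum_average(d_a, d_b):
    """Literature value E Tr(rho_A^2) = (d_a + d_b)/(d_a d_b + 1), rescaled to P."""
    tr_sq = Fraction(d_a + d_b, d_a * d_b + 1)
    return (d_a * tr_sq - 1) / (d_a - 1)


def power_law_approximation(n_b, r):
    """Large-N approximation N_B^(-(r-1)) for theories with K = N**r, pure global state."""
    return Fraction(1, n_b ** (r - 1))
```

For instance, `quantum_expected_purity(2, 8)` and `known_quantum_average(2, 8)` give the same fraction, and for a classical system (`k = n`) `theorem_29(n_a, n_b, n_a, n_b, p)` simply returns `p`. With `n = 100` and `r = 2` or `r = 3`, `theorem_29(n**r, n**r, n, n)` differs from `power_law_approximation(n, r)` by about one percent.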
G. $GG'$-invariant faces: entanglement in symmetric subspaces

So far, we have computed the expected amount of entanglement (that is, the purity of the local reduced state) only for the case that we draw the initial pure state from the full state space $AB$. In many cases, however, it is useful to consider drawing random states under constraints. As a paradigmatic ensemble, suppose we draw a random pure quantum state $|\psi\rangle$ from the symmetric or antisymmetric subspace of $\mathbb{C}^n \otimes \mathbb{C}^n$. What can we say about the expected local purity in this case?
We will see that Hilbert subspaces correspond to faces of the state space in the sense of convex geometry. This will enable us to compute the average reduced purity with geometric methods, using the invariant inner product introduced in earlier subsections. Moreover, both symmetric and antisymmetric subspace are invariant under all transformations of the form $U \otimes U$. This behaviour is a special case of the following general-probabilistic definition.

Definition 30 ($GG'$-invariant face). Let $AB$ be a composite dynamical state space. A face $F$ of $\Omega_{AB}$ will be called $GG'$-invariant if for every $G \in \mathcal{G}_A$ there is some $G' \in \mathcal{G}_B$ such that $G \otimes G'$ maps $F$ into itself. The stabilizer subgroup $\{G \in \mathcal{G}_{AB} \mid GF = F\}$ will be called $\mathcal{G}_F$. The face $F$ will be called transitive if for every pair of extreme points (pure states) $\varphi, \omega \in F$ there is some $G \in \mathcal{G}_F$ such that $G\varphi = \omega$. If $F$ is transitive, we define the $F$-maximally mixed state $\mu_F$ as

$$\mu_F := \int_{G \in \mathcal{G}_F} G\omega\, dG,$$

where $\omega$ is any pure state in $F$. For every $\omega \in F$, we set $\bar\omega := \omega - \mu_F$, and $\bar F := \{\bar\omega \mid \omega \in F\}$. $F$ is called irreducible if $\mathcal{G}_F$ acts irreducibly on $\bar F$.
Note that $GG'$-invariance is not a symmetric notion: if for every $G$, there is some $G'$ such that $G \otimes G'$ stabilizes $F$, then it is not necessarily the case that to every $G'$, there is some $G$ such that $G \otimes G'$ stabilizes $F$.
Example 31. Here are some examples of transitive irreducible $GG'$-invariant faces:
• The symmetric subspace $F_{\rm SYM}$ on $n$-level quantum systems $A$ and $B$. If $\pi$ is the projector onto the symmetric subspace, then $F_{\rm SYM} = \{\rho \mid \mathrm{Tr}(\rho\pi) = 1\}$. This shows that $F_{\rm SYM}$ is in fact a face of the state space on $AB$. If $G = U \cdot U^\dagger \in \mathcal{G}_A$ is some unitary transformation, then $G \otimes G\, F_{\rm SYM} = F_{\rm SYM}$, so it is $GG'$-invariant with $G' = G$.
There is a one-to-one correspondence between the symmetric subspace and the Hilbert space $\mathcal{H} := \mathbb{C}^{n(n+1)/2}$: every state in $F_{\rm SYM}$ corresponds to a density matrix on $\mathcal{H}$, and every reversible transformation in $\mathcal{G}_{F_{\rm SYM}}$ corresponds to a unitary on $\mathcal{H}$. We know that the unitaries act transitively on $\mathcal{H}$, and we have already shown that this action is irreducible (cf. Lemma 43 in the appendix), so $F_{\rm SYM}$ is transitive and irreducible.
• The totally antisymmetric subspace in $A \otimes B$, where $A \simeq \mathbb{C}^n$ and $B \simeq \mathbb{C}^n \otimes \mathbb{C}^n$. If $G = U \cdot U^\dagger$, then this set of quantum states is invariant with respect to $G \otimes G'$, where $G' = U \otimes U \cdot U^\dagger \otimes U^\dagger$.
• The face $F$ of $\Omega_{AB}$ with $A = B \simeq \mathbb{C}^n$ which consists only of the maximally entangled state, $F = \{|\phi_+\rangle\langle\phi_+|\}$, where $|\phi_+\rangle = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}|i\rangle \otimes |i\rangle$. It is $U \otimes \bar U$-invariant.
• Coin tossing in environment with record. Suppose we have a classical coin (corresponding to one bit), and an environment whose state can be described by a bit string of length $n - 1$. Initially, the joint system is in an uncorrelated state $\varphi^{AB} = \varphi^A \otimes \varphi^B$. Since the coin's state is known to us (say, it shows heads), $\varphi^A$ is pure; on the other hand, we may not have full knowledge about the environment, meaning that $\varphi^B$ is mixed.
In contrast to the usual coin tossing example of Subsection II C, we additionally assume that the environment always contains a perfect record of the coin's state. In other words, if the coin's state is 0 (or heads), the environment's state must be some bit string from a set $S_0$; if the coin's state is 1 (tails), it must be some bit string from a set $S_1$. Both $S_0$ and $S_1$ are subsets of $\{0, 1\}^{n-1}$, have empty intersection, and we assume that they have the same cardinality.
As a consequence, the possible configurations of the joint system are restricted to be either of the form $0s_0$ or $1s_1$, where $s_0 \in S_0$ and $s_1 \in S_1$. The possible states (that is, probability distributions) have their full support on those configurations. This defines a face $F$ of the joint state space $AB$.
Since permutations can map every configuration of this kind to every other, $F$ is transitive. Moreover, it is $GG'$-invariant: if $G \in \mathcal{G}_A$ is a reversible transformation, there are only two possibilities. First, $G$ is the identity. Then, setting $G'$ also equal to identity yields a map $G \otimes G'$ which preserves $F$. Second, $G$ is a bit flip. Then, let $T$ be a permutation which swaps $S_0$ and $S_1$ (leaving all other strings invariant). Then $G \otimes T$ preserves $F$.
We will study this scenario further in Example 39 below.
The following lemma will be useful.
Lemma 32. If $F$ is a transitive $G\otimes G'$-invariant face, and if $A$ is transitive, then $\mu_F^A = \mu^A$.

Proof. Let $G\in\mathcal{G}_A$ be arbitrary, and let $E^A$ be any effect on $A$, then
$$E^A(G^{-1}\mu_F^A) = (E^A\circ G^{-1})\otimes u^B(\mu_F) = E^A\otimes u^B\circ(G^{-1}\otimes I)\big(G\otimes G'(\mu_F)\big) = E^A\otimes(u^B\circ G')(\mu_F) = E^A\otimes u^B(\mu_F) = E^A(\mu_F^A).$$
Since this is true for all $E^A$, we must have $G^{-1}\mu_F^A = \mu_F^A$. But the only state which is invariant with respect to all reversible transformations on $A$ is $\mu^A$, hence $\mu_F^A = \mu^A$.

Another technical ingredient is this:


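Lemma 33. Let $AB$ be a transitive dynamical state space and $F$ a transitive irreducible $G\otimes G'$-invariant face. Then $\bar F\perp\hat\mu_F$.
Proof. Suppose that $\bar a\in\bar F$ and $G\in\mathcal{G}_F$, then
$$\langle\bar a,\hat\mu_F\rangle = \langle G\bar a, G\hat\mu_F\rangle = \langle G\bar a,\hat\mu_F\rangle = \left\langle\int_{G\in\mathcal{G}_F} G\bar a\,dG,\ \hat\mu_F\right\rangle = \langle 0,\hat\mu_F\rangle = 0.$$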

This proves the claim.


Theorem 34. Let $F$ be a transitive and irreducible $G\otimes G'$-invariant face on an irreducible dynamical state space $AB$, where $A$ is also transitive and irreducible. Drawing a state $\omega^{AB}\in F$ with fixed purity $P(\omega^{AB})$ randomly, the expected local purity is
$$\mathbb{E}_F\, P(\omega^A) = \left\|\Pi_{\bar F}\,(X^A\otimes u^B)^\wedge\right\|_2^2\cdot\frac{K_A-1}{K_F-1}\cdot\left(P(\omega^{AB}) - P(\mu_F)\right),$$
where $K_F$ denotes the dimension of $F$, $X^A$ is any Pauli map on $A$, and $\Pi_{\bar F}$ denotes the orthogonal projection onto the span of $\bar F$ (using the invariant inner product on $(AB)^\wedge$).
Taking Lemma 21 into account, it is clear that this theorem reduces to Theorem 22 in the case of F = AB .
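Proof. Abbreviate $\omega := \omega^{AB}$, and write $\bar\omega := \omega-\mu_F$ for the vector of a state $\omega\in F$ relative to the maximally mixed state $\mu_F$ on the face. Similarly as in Definition 12, call a linear map $X: AB\to\mathbb{R}$ a Pauli map on $F$ if $X(\mu_F) = 0$ and $\langle\bar X,\bar X\rangle = 1$, where $\bar X\in\mathrm{span}\,\bar F$ is the vector with $\langle\bar X,\bar\omega\rangle = X(\omega)$ for all $\omega\in F$. If $X$ is a Pauli map on $F$, the same calculation as in the proof of Lemma 13 shows that
$$\int_{G\in\mathcal{G}_F} X\big(G(\omega)\big)^2\,dG = \frac{\langle\bar\omega,\bar\omega\rangle}{K_F-1}\qquad\text{for all }\omega\in F.$$
Due to Lemma 33, we also have $\langle\hat\omega,\hat\omega\rangle = \langle\bar\omega+\hat\mu_F,\bar\omega+\hat\mu_F\rangle = \langle\bar\omega,\bar\omega\rangle+\langle\hat\mu_F,\hat\mu_F\rangle$, hence $\langle\bar\omega,\bar\omega\rangle = P(\omega)-P(\mu_F)$.
According to Lemma 32, we have $X^A\otimes u^B(\mu_F) = X^A(\mu_F^A) = X^A(\mu^A) = 0$, hence $\frac{1}{c}X^A\otimes u^B$ is a Pauli map on $F$, where $c = \left\|\overline{X^A\otimes u^B}\right\|_2 = \left\|\Pi_{\bar F}(X^A\otimes u^B)^\wedge\right\|_2$. Similarly as in the proof of Theorem 22, we have for all $\omega\in F$
$$\frac{\mathbb{E}_F\,P(\omega^A)}{K_A-1} = \mathbb{E}_F\int_{G\in\mathcal{G}_A} X^A\big(G(\omega^A)\big)^2\,dG = \mathbb{E}_F\int_{G\in\mathcal{G}_A}\big(X^A\otimes u^B(G\otimes G'(\omega))\big)^2\,dG$$
$$= \int_{G\in\mathcal{G}_A}\mathbb{E}_F\big(X^A\otimes u^B(G\otimes G'(\omega))\big)^2\,dG = \int_{G\in\mathcal{G}_A}\mathbb{E}_F\big(X^A\otimes u^B(\omega)\big)^2\,dG = \mathbb{E}_F\big(X^A\otimes u^B(\omega)\big)^2$$
$$= \mathbb{E}_F\,c^2\int_{G\in\mathcal{G}_F}\left(\frac{1}{c}X^A\otimes u^B(G\omega)\right)^2 dG = c^2\,\frac{\langle\bar\omega,\bar\omega\rangle}{K_F-1}.$$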

Combining all the little results proves the claim.


In the quantum case, we can give an explicit description of the projector F :
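Lemma 35. Suppose that $A$ is a quantum state space, and $\Pi$ is a projector onto some subspace. This subspace defines a face $F$ of $A$ by $F = \{\rho \mid \mathrm{Tr}(\Pi\rho) = 1\} = \{\rho \mid \Pi\rho\Pi = \rho\}$. Then $\Pi_{\bar F}(M) = \Pi M\Pi - \Pi\,\mathrm{Tr}(\Pi M)/(\mathrm{Tr}\,\Pi)$.
Proof. Define $Q(M) := \Pi M\Pi - \Pi\,\mathrm{Tr}(\Pi M)/(\mathrm{Tr}\,\Pi)$ for all $M\in\hat A$, i.e. for all $M$ with $M = M^\dagger$ and $\mathrm{Tr}\,M = 0$. Clearly, $Q(M)^\dagger = Q(M)$ and $\mathrm{Tr}\,Q(M) = 0$, hence we have a map $Q:\hat A\to\hat A$. Furthermore,
$$Q(Q(M)) = \Pi Q(M)\Pi - \frac{\Pi}{\mathrm{Tr}\,\Pi}\mathrm{Tr}(\Pi Q(M)) = Q(M) - \frac{\Pi}{\mathrm{Tr}\,\Pi}\mathrm{Tr}(Q(M)) = Q(M),$$
hence $Q$ is a projector. Denote the Hilbert space dimension by $d$, then we get for the inner product on $\hat A$
$$\langle M, Q(N)\rangle = \frac{d}{d-1}\mathrm{Tr}(M\,Q(N)) = \frac{d}{d-1}\mathrm{Tr}\left[M\left(\Pi N\Pi - \frac{\Pi}{\mathrm{Tr}\,\Pi}\mathrm{Tr}(\Pi N)\right)\right] = \frac{d}{d-1}\left[\mathrm{Tr}(M\Pi N\Pi) - \frac{\mathrm{Tr}(\Pi M)\,\mathrm{Tr}(\Pi N)}{\mathrm{Tr}\,\Pi}\right],$$
and this expression is symmetric with respect to interchanging $M$ and $N$ (for the first addend due to the cyclicity of the trace). Thus $Q$ is an orthogonal projector on $\hat A$.
The maximally mixed state $\mu_F$ on the face is $\mu_F = \Pi/(\mathrm{Tr}\,\Pi)$. Suppose that $M\in\bar F$, i.e. there is some $\rho\in F\subseteq A$ such that $M = \bar\rho = \rho - \Pi/(\mathrm{Tr}\,\Pi)$. Then direct calculation shows that $Q(M) = M$, i.e. $\bar F\subseteq\mathrm{ran}\,Q$, and thus $\mathrm{span}\,\bar F\subseteq\mathrm{ran}\,Q$.
Now let $m := \mathrm{Tr}\,\Pi$ (the dimension of the subspace), then the term $\Pi M\Pi$ in the definition of $Q$ creates an $m\times m$ block matrix, and the subsequent term $-\Pi\,\mathrm{Tr}(\Pi M)/(\mathrm{Tr}\,\Pi)$ removes the trace of this block matrix, leaving $m^2-1$ parameters. Thus, $\dim(\mathrm{ran}\,Q)\leq m^2-1$. On the other hand, density matrices in $F$ are described by $m^2-1$ parameters, so $\dim(\mathrm{span}\,\bar F) = m^2-1$. This proves that $\dim(\mathrm{ran}\,Q)\leq\dim(\mathrm{span}\,\bar F)$. Altogether, this proves that $\mathrm{span}\,\bar F = \mathrm{ran}\,Q$, so that $Q$ is the orthogonal projector onto the span of $\bar F$ as claimed.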
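Theorem 36. Let $S$ be a subspace of dimension $N_S$ on a bipartite quantum state space $AB$ with Hilbert space dimensions $N_A$ and $N_B$, with the property that for every unitary $U$ on $A$ there is a unitary $U'$ on $B$ such that $(U\otimes U')\,S = S$. Drawing a state $\rho^{AB}$ on $S$ with fixed purity $\mathrm{Tr}\left[(\rho^{AB})^2\right]$ randomly, the expected local quantum purity is
$$\mathbb{E}_S\,\mathrm{Tr}\left[(\rho^A)^2\right] = \frac{1}{N_A} + \frac{N_A^2-1}{N_S^2-1}\,\mathrm{Tr}\left[\big(\Pi(E_A\otimes I_B)\Pi\big)^2\right]\left(\mathrm{Tr}\left[(\rho^{AB})^2\right] - \frac{1}{N_S}\right),$$
where $\Pi$ is the orthogonal projector onto $S$ and $E_A = E_A^\dagger$ is any matrix on $A$ with $\mathrm{Tr}\,E_A = 0$ and $\mathrm{Tr}\,E_A^2 = 1$.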

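Proof. The set of states $\rho$ on $AB$ that have full support on $S$ is a face $F$ on the quantum state space $AB$. Since $E_A$ is traceless, we have $\int_U U^\dagger E_A U\,dU = 0$. Hence, if $\Pi$ is the orthogonal projector onto $S$, we have
$$\mathrm{Tr}(\Pi(E_A\otimes I_B)\Pi) = \mathrm{Tr}(\Pi(E_A\otimes I_B)) = \mathrm{Tr}\left[(U\otimes U')\Pi(U\otimes U')^\dagger(E_A\otimes I_B)\right] = \mathrm{Tr}\left[\Pi(U\otimes U')^\dagger(E_A\otimes I_B)(U\otimes U')\right]$$
$$= \mathrm{Tr}\left[\Pi\,(U^\dagger E_AU)\otimes I_B\right] = \int_U\mathrm{Tr}\left[\Pi\,(U^\dagger E_AU)\otimes I_B\right]dU = \mathrm{Tr}\left[\Pi\left(\int_U U^\dagger E_AU\,dU\right)\otimes I_B\right] = 0.$$
It is easy to check that $X^A(\rho) := \sqrt{\frac{N_A}{N_A-1}}\,\mathrm{Tr}(E_A\rho)$ is a Pauli map on $A$. Thus, $X^A\otimes u^B(\rho) = \sqrt{\frac{N_A}{N_A-1}}\,\mathrm{Tr}\big((E_A\otimes I_B)\rho\big)$, and so $(X^A\otimes u^B)^\wedge = \alpha_{AB}\,E_A\otimes I_B$, where $\alpha_{AB} = \frac{N_AN_B-1}{N_AN_B}\sqrt{\frac{N_A}{N_A-1}}$. Using Lemma 35, this proves that
$$\Pi_{\bar F}\,(X^A\otimes u^B)^\wedge = \alpha_{AB}\,\Pi_{\bar F}(E_A\otimes I_B) = \alpha_{AB}\left(\Pi(E_A\otimes I_B)\Pi - \Pi\,\frac{\mathrm{Tr}\big(\Pi(E_A\otimes I_B)\big)}{\mathrm{Tr}\,\Pi}\right),$$
such that, since the trace in the last term vanishes,
$$\left\|\Pi_{\bar F}\,(X^A\otimes u^B)^\wedge\right\|_2^2 = \alpha_{AB}^2\left\|\Pi(E_A\otimes I_B)\Pi\right\|_2^2 = \alpha_{AB}^2\,\frac{N_AN_B}{N_AN_B-1}\,\mathrm{Tr}\left[\big(\Pi(E_A\otimes I_B)\Pi\big)^2\right].$$
In order to apply Theorem 34, note that $K_A = N_A^2$ and $K_F = N_S^2$, and the maximally mixed state on $F$ is $\mu_F = \Pi/(\mathrm{Tr}\,\Pi)$, such that
$$P(\mu_F) = \frac{N_AN_B}{N_AN_B-1}\mathrm{Tr}(\mu_F^2) - \frac{1}{N_AN_B-1} = \frac{1}{N_AN_B-1}\left(\frac{N_AN_B}{N_S}-1\right).$$
Expressing all the purities $P(\rho)$ in terms of $\mathrm{Tr}(\rho^2)$ via eq. (6) and some algebraic simplification proves the claim.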
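As a sanity check of Theorem 36, the following sketch evaluates its right-hand side with exact rational arithmetic for the special case where $S$ is the whole Hilbert space. In that case $\Pi$ is the identity, $N_S = N_AN_B$ and $\mathrm{Tr}[(E_A\otimes I_B)^2] = N_B$, and the result should reproduce the well-known average subsystem purity $(N_A+N_B)/(N_AN_B+1)$ of a random pure state. The function names are hypothetical and chosen only for this illustration; the value of the trace term is supplied by the caller rather than computed from a projector.

```python
from fractions import Fraction

def expected_local_purity(n_a, n_s, t, global_purity):
    """Right-hand side of Theorem 36.

    n_a: Hilbert space dimension of A
    n_s: dimension of the subspace S
    t:   Tr[(Pi (E_A x I_B) Pi)^2], supplied by the caller
    global_purity: Tr[(rho^AB)^2]
    """
    prefactor = Fraction(n_a**2 - 1, n_s**2 - 1)
    return Fraction(1, n_a) + prefactor * t * (global_purity - Fraction(1, n_s))

def full_space_pure_state(n_a, n_b):
    # Assumption of this check: S is the whole space, so Pi = identity,
    # N_S = N_A * N_B and Tr[(E_A x I_B)^2] = Tr(E_A^2) * Tr(I_B) = N_B.
    return expected_local_purity(n_a, n_a * n_b, Fraction(n_b), Fraction(1))

for n_a, n_b in [(2, 2), (2, 3), (3, 5)]:
    known_value = Fraction(n_a + n_b, n_a * n_b + 1)
    assert full_space_pure_state(n_a, n_b) == known_value
    print(n_a, n_b, full_space_pure_state(n_a, n_b))
```

For a genuine subspace, the only additional input needed is the trace term, which depends on the projector onto $S$.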
In Theorem 1 in Subsection II D, we apply this result to compute the average entanglement in symmetric and
antisymmetric quantum subspaces. For the remainder of this subsection, we discuss the case of classical probability
theory. In this case, we can explicitly compute the norm of the projector appearing in Theorem 34:
Lemma 37. Suppose that $A$ and $B$ are classical state spaces over $N_A$ and $N_B$ outcomes, and $F$ is any $G\otimes G'$-invariant face on $AB$, corresponding to $N_F$ outcomes. Then
$$\left\|\Pi_{\bar F}\,(X^A\otimes u^B)^\wedge\right\|_2^2 = \frac{N_F(N_AN_B-1)}{N_AN_B(N_A-1)}\qquad\text{and}\qquad P(\mu_F) = \frac{N_AN_B/N_F-1}{N_AN_B-1}.$$
Proof. We use Theorem 34. First, the maximally mixed state $\mu_F$ is just the uniform distribution on the classical outcomes that generate $F$, that is, a probability vector with $N_F$ entries equal to $1/N_F$ and all others zero. Recalling the formula for purity in the classical case, eq. (7), gives $P(\mu_F) = \frac{N_AN_B/N_F-1}{N_AN_B-1}$. Now we apply Theorem 34 to the special case where the initial state is pure: $P(\omega^{AB}) = 1$. Since there are no entangled states in classical probability theory, we know that $\omega^A$ must be pure as well, i.e. $P(\omega^A) = 1$, and so is its expectation value. Using that $K = N$ classically, substituting all these identities into the statement of Theorem 34 yields the norm of the projector.
Substituting this result back into Theorem 34, we get a very simple statement regarding $G\otimes G'$-invariant faces in classical probability theory. The proof involves only simple algebra and is thus omitted.

Theorem 38. Suppose that $A$ and $B$ are classical state spaces, and $F$ is any $G\otimes G'$-invariant face on $AB$. If we draw a random state $\omega^{AB}$ in $F$ of fixed purity $P(\omega^{AB})$, then the expected purity of the local marginal is
$$\mathbb{E}_F\,P(\omega^A) = P\left(\omega^{AB}\big|_F\right),$$
where the right-hand side denotes the purity of $\omega^{AB}$, computed by treating $\omega^{AB}$ as a state on the smaller state space $F$. Explicitly, if $\{\omega_j^{AB}\}_{j=1}^{N_F}$ denote the entries of the probability vector, then
$$P\left(\omega^{AB}\big|_F\right) = \frac{N_F}{N_F-1}\sum_{j=1}^{N_F}\left(\omega_j^{AB}\right)^2 - \frac{1}{N_F-1}$$
(compare this with eq. (7)). The result of Theorem 38 is no surprise at all: we get the same result in the unconstrained case, Theorem 29, where the prefactors are cancelled due to $N = K$.
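The omitted algebra can be checked mechanically. The sketch below inserts the two expressions of Lemma 37 into the formula of Theorem 34 (with $K = N$) and compares the outcome with the purity of the same probability vector regarded as a state on $F$. It is a purely algebraic check with exact fractions; the function names and the sample probability vector are hypothetical.

```python
from fractions import Fraction

def classical_purity(probs, n_outcomes):
    # Classical purity in the form used above: (N * sum_j p_j^2 - 1) / (N - 1).
    return (n_outcomes * sum(p * p for p in probs) - 1) / Fraction(n_outcomes - 1)

def theorem34_classical(probs, n_a, n_b):
    """Theorem 34 with the values of Lemma 37 inserted (K = N classically).

    probs lists only the N_F entries on the face F; all other entries of the
    global probability vector are zero and do not contribute to sum_j p_j^2.
    """
    n_f = len(probs)
    n_ab = n_a * n_b
    norm_sq = Fraction(n_f * (n_ab - 1), n_ab * (n_a - 1))
    p_mu_f = (Fraction(n_ab, n_f) - 1) / (n_ab - 1)
    p_global = classical_purity(probs, n_ab)
    return norm_sq * Fraction(n_a - 1, n_f - 1) * (p_global - p_mu_f)

# Hypothetical example: a coin (N_A = 2), N_B = 4, and a face with N_F = 4 outcomes.
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
lhs = theorem34_classical(probs, n_a=2, n_b=4)
rhs = classical_purity(probs, len(probs))
assert lhs == rhs
print(lhs)
```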
Example 39 (Coin tossing in environment with record, part 2). Recall the scenario from the last paragraph of Example 31. Does the record in the environment affect the randomization of the coin? Suppose the coin is initially in the pure state 0 (or heads). Then the environment's initial state $\omega^B$ must have full support on $S_0$; for simplicity, we assume that it is otherwise completely unknown, i.e. the uniform mixture over $S_0$. Applying Theorem 38, a little calculation shows that
$$\mathbb{E}_F\,P(\omega^A) = \frac{1}{2\,\#S_0-1}.$$
This is exactly the same result as Theorem 29 gives us for an unconstrained environment $B$ with $N_B = \#S_0$. This is an environment which has half as many possible states as in the first scenario, where the possible environment configurations are in $S_0\cup S_1$ with cardinality $2\,\#S_0$. Intuitively, the informed environment loses one bit of randomization power due to redundancy. The same conclusion holds for correlated initial states.
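The value $1/(2\,\#S_0-1)$ can also be confirmed by brute force for small environments. The sketch below assumes that the random reversible dynamics on the face is a uniformly random permutation of its $2\,\#S_0$ configurations, so that the support of the initial state (uniform on $\#S_0$ configurations) ends up on a uniformly random subset of that size. It then averages the purity of the coin's marginal exactly. The function name and the slot labelling are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def average_coin_purity(m):
    """Exact average purity of the coin, with m = #S_0 = #S_1.

    Slots 0..m-1 stand for configurations (coin 0, string in S_0),
    slots m..2m-1 for configurations (coin 1, string in S_1).
    """
    total = Fraction(0)
    count = 0
    for support in combinations(range(2 * m), m):
        # Probability that the coin shows 0 after the permutation.
        p0 = Fraction(sum(1 for slot in support if slot < m), m)
        p1 = 1 - p0
        # Purity of a single bit: (N * sum p^2 - 1) / (N - 1) with N = 2.
        total += 2 * (p0 * p0 + p1 * p1) - 1
        count += 1
    return total / count

for m in range(1, 5):
    assert average_coin_purity(m) == Fraction(1, 2 * m - 1)
    print(m, average_coin_purity(m))
```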

H. Theories which are not locally tomographic

In the previous sections, we have considered certain types of composite state spaces: transitive locally tomographic
compositions AB of state spaces A and B. At present, there are no known examples of such theories beyond
quantum theory and subspaces within it such as classical probability theory. The search for such theories has
started only recently, but preliminary results suggest that theories of this kind might be rare [53].
On the other hand, it is known that there is a multitude of transitive composite state spaces AB if the requirement
of local tomography is dropped [54]. As it turns out, some of our results are easily generalized to theories without
local tomography. We will sketch this in this subsection, but leave a more detailed analysis of such theories to future
work. We start with a trial definition of arbitrary compositions of state spaces which need not be locally tomographic.
Definition 40. If $A$ and $B$ are state spaces, a composition $AB$ is any state space which can be decomposed as $AB = (A\otimes B)\oplus C$ such that the following properties hold:
If $\omega^A\in A$ and $\omega^B\in B$, then $\omega^A\otimes\omega^B\in AB$.
For $\omega^{AB}\in AB$, define the vector $\omega^A$ via $L(\omega^A) := L\otimes u^B(\omega^{AB})$ for all linear maps $L: A\to\mathbb{R}$. (An analogous definition yields $\omega^B$.) Then $\omega^A\in A$ and $\omega^B\in B$.
Moreover, if $A$ and $B$ are dynamical state spaces, a dynamical composition $AB$ is assumed to have the following property: if $T_A\in\mathcal{G}_A$ and $T_B\in\mathcal{G}_B$, then $(T_A\otimes T_B)\oplus I_C\in\mathcal{G}_{AB}$.
Physically, this means that the global state space $AB$ has some degrees of freedom (collected in $C$) that cannot be accessed locally at $A$ or $B$, not even by comparing correlations of measurement outcomes. It follows from the second property that $u^{AB} = u^A\otimes u^B$, because $u^A\otimes u^B$ is a linear functional which gives unity on all global states. In this notation, the tensor product of two linear functionals on $A$ and $B$ is assumed to act as the zero functional on $C$, i.e. $L_A\otimes L_B\equiv L_A\otimes L_B\oplus 0_C$.
The most famous example of a composite state space which is not locally tomographic is quantum theory over the
reals [55]:

Example 41 (Real quantum theory). Let $A = \{\rho\in\mathbb{R}^{m\times m} \mid \mathrm{Tr}\,\rho = 1,\ \rho = \rho^T,\ \rho\geq 0\}$, that is, the set of $(m\times m)$-density matrices with all real entries. The order unit is $u^A(\rho) = \mathrm{Tr}\,\rho$. This is a state space of dimension $K_A = m(m+1)/2$. Similarly, let $B$ be the state space of $(n\times n)$-density matrices with all real entries. We assume $m, n\geq 2$.
Then, a composition of $A$ and $B$ is given by the set of all $(mn)\times(mn)$-density matrices with all real entries. Since $K_{AB} > K_AK_B$, this is not a locally tomographic composition, but it is easy to check that it satisfies all the properties of Definition 40.
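The inequality $K_{AB} > K_AK_B$ in Example 41 is easy to make concrete. The following sketch counts the real parameters of symmetric matrices and reports the size of the gap $K_{AB}-K_AK_B$, which is the dimension of the locally inaccessible part $C$. As an additional known fact (not stated in the text above), the gap equals the number of independent products of an antisymmetric $m\times m$ matrix with an antisymmetric $n\times n$ matrix; the function names are hypothetical.

```python
def real_qt_dimension(d):
    # Number of real parameters of a real symmetric d x d matrix.
    return d * (d + 1) // 2

def locally_inaccessible_dimension(m, n):
    # dim C = K_AB - K_A * K_B for real quantum theory (Example 41).
    return real_qt_dimension(m * n) - real_qt_dimension(m) * real_qt_dimension(n)

for m, n in [(2, 2), (2, 3), (3, 3)]:
    gap = locally_inaccessible_dimension(m, n)
    # Products of two antisymmetric matrices are symmetric on the joint
    # system, but invisible to products of local (symmetric) observables.
    assert gap == (m * (m - 1) // 2) * (n * (n - 1) // 2)
    print(m, n, real_qt_dimension(m), real_qt_dimension(n), real_qt_dimension(m * n), gap)
```

For two real qubits ($m = n = 2$) this gives $K_A = K_B = 3$, $K_{AB} = 10$ and a one-dimensional subspace $C$.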
Since $A$, $B$, and $AB$ are state spaces in the usual sense, the results of Subsections III A to III D apply without any modification. As usual, if $AB$ is transitive, it has a decomposition $AB = (AB)^\wedge\oplus\mathbb{R}\mu^{AB}$. Moreover, we claim that the locally inaccessible subspace $C$ is part of the Bloch subspace, $C\subseteq(AB)^\wedge$. To see this, let $c\in C$, then $u^{AB}(c) = u^A\otimes u^B(c) = 0$. In more detail, we have the decomposition
$$(AB)^\wedge = (\hat A\otimes\hat B)\oplus(\hat A\otimes\mu^B)\oplus(\mu^A\otimes\hat B)\oplus C,$$
which follows from the fact that the right-hand side is a subspace $V\subset AB$ of dimension $\dim V = \dim(AB)-1$, and $u^{AB} = u^A\otimes u^B$ evaluates to zero on all vectors of $V$. Now suppose that $AB$ is irreducible; then all the addends above are mutually orthogonal in the invariant inner product on $(AB)^\wedge$. For example, to see that $\hat A\otimes B\perp C$, let $\hat a\in\hat A$, $b\in B$ and $c\in C$, and compute
$$\langle\hat a\otimes b, c\rangle = \langle(T_A\otimes I_B\oplus I_C)\,\hat a\otimes b,\ (T_A\otimes I_B\oplus I_C)\,c\rangle = \langle T_A\hat a\otimes b, c\rangle = \left\langle\int_{\mathcal{G}_A}T_A\hat a\,dT_A\otimes b,\ c\right\rangle = \langle 0, c\rangle = 0,$$
using the same argumentation as in Subsection III E. How is the maximally mixed state $\mu^{AB}$ on $AB$ related to $\mu^A$ and $\mu^B$? To answer this question, extend the inner product on $(AB)^\wedge$ to an inner product on all of $AB$: for $v, w\in AB$ with decomposition $v = \hat v + v_0\,\mu^{AB}$ and $w = \hat w + w_0\,\mu^{AB}$, where $v_0 = u^{AB}(v)$ and $w_0 = u^{AB}(w)$, we define
$$\langle v, w\rangle := \langle\hat v,\hat w\rangle + v_0w_0.$$
This inner product is clearly invariant with respect to all reversible transformations from $\mathcal{G}_{AB}$, and it is constructed such that $\mu^{AB}\perp(AB)^\wedge$. Taking into account the orthogonality of subspaces mentioned above, this proves that
$$C\oplus\mathbb{R}\mu^{AB} = \left[(\hat A\otimes\hat B)\oplus(\hat A\otimes\mu^B)\oplus(\mu^A\otimes\hat B)\right]^\perp.$$
By integration as above, it is also easy to see that $\mu^A\otimes\mu^B$ is perpendicular to all the three subspaces $\hat A\otimes\hat B$, $\hat A\otimes\mu^B$, and $\mu^A\otimes\hat B$. Thus, $\mu^A\otimes\mu^B\in C\oplus\mathbb{R}\mu^{AB}$. In other words, there is some constant $\lambda\in\mathbb{R}$ and vector $\mu^C\in C$ such that $\mu^A\otimes\mu^B = \lambda\mu^{AB}-\mu^C$. Applying $u^{AB}$ to this equation, using that $u^{AB}(\mu^{AB}) = u^{AB}(\mu^A\otimes\mu^B) = 1$ and $u^{AB}(\mu^C) = 0$, we get $\lambda = 1$, and thus
$$\mu^{AB} = \mu^A\otimes\mu^B+\mu^C.$$
The Bloch vector $\mu^C$ can be interpreted as the collection of all locally inaccessible degrees of freedom of the maximally mixed state on $AB$. For symmetry reasons, we think it is plausible that $\mu^C = 0$ for many theories, but we were unable to prove this in generality.
Following the argumentation in Subsection III E, it is interesting to see that both Lemma 20 and Lemma 21 remain valid, with only minor modifications, if $AB$ is not locally tomographic. We now assume that $A$, $B$, and $AB$ are transitive dynamical state spaces, where $A$ and $AB$ are irreducible. Lemma 20 becomes
$$\langle\hat x\otimes\mu^B,\hat y\otimes\mu^B\rangle = \left(P(\alpha^A\otimes\mu^B)-\|\mu^C\|_2^2\right)\langle\hat x,\hat y\rangle,$$
while Lemma 21 gets modified to stating that
$$\frac{X^A\otimes u^B}{\left\|(X^A\otimes u^B)^\wedge\right\|_2} = \sqrt{P(\alpha^A\otimes\mu^B)-\|\mu^C\|_2^2}\ X^A\otimes u^B$$
is a Pauli map on $AB$. Here $\alpha^A$ denotes an arbitrary pure state on $A$.
The only modification of Theorem 22 is that $K_AK_B$ has to be replaced by $K_{AB}$, the dimension of the composite state space. The rest of the proof remains unaltered.

Theorem 42. Let $A$, $B$, and $AB$ be transitive dynamical state spaces, where $A$ and $AB$ are irreducible, and $AB$ is not necessarily locally tomographic. Draw a state $\omega^{AB}\in AB$ of fixed purity $P(\omega^{AB})$ randomly. Then, the expected purity of the local reduced state $\omega^A$ is
$$\mathbb{E}\,P(\omega^A) = \frac{K_A-1}{K_{AB}-1}\cdot\frac{P(\omega^{AB})}{P(\alpha^A\otimes\mu^B)-\|\mu^C\|_2^2},$$
where $\alpha^A$ is an arbitrary pure state on $A$, $\mu^B$ is the maximally mixed state on $B$, and $\mu^C$ is the vector which contains the locally inaccessible degrees of freedom of the maximally mixed state $\mu^{AB}$ on $AB$.
We leave it open whether the results of Subsection III F (including a more operational formulation of the main result as in Theorem 29 involving only $N$ and $K$) can be generalized to composite state spaces that are not locally tomographic: this seems to depend strongly on the question under what circumstances $\mu^{AB} = \mu^A\otimes\mu^B$ remains true such that $\mu^C = 0$.

IV. SUMMARY AND OUTLOOK

In summary, we considered general probabilistic theories and asked how mixed (impure) subsystems tend to be in
such theories after undergoing reversible dynamics. We showed that under certain limited assumptions subsystems
tend to be close to maximally mixed in appropriate limits, and the amount of purity is given by a simple formula. Showing this involved developing various generalizations of the corresponding quantum concepts, e.g. purity, which are
of interest in themselves. Our results also apply to subspaces within quantum theory, and we calculated for example
the expected purity of subsystems in symmetric and antisymmetric spaces.
We view our results as a significant first step towards formulating the second law as a meta-theorem, meaning a
theorem that requires weaker assumptions than for example all of quantum theory. More generally we envisage a
formulation of statistical mechanics independent of theory details. Such a formulation can be expected to be useful
for example for black hole thermodynamics, where one cannot be certain that standard quantum theory applies, but
may accept some more basic assumptions.

Appendix A: Irreducibility of the Clifford group

Here we prove a lemma which is used in the main text in Example 16. It shows that our generalized definition
of a Pauli map reduces to the usual Pauli operators for the case of several qubits in quantum theory. It exploits the
well-known fact that the Clifford group is a 2-design [56].
Lemma 43. The Clifford group $C_k$ on $k$ qubits acts irreducibly by conjugation on the real vector space of traceless Hermitian $2^k\times 2^k$-matrices.
Proof. We use the notation from Example 16. If there is a real subspace $S\subseteq\hat A$ which is invariant with respect to all Clifford maps, i.e. $USU^\dagger\subseteq S$ for all $U\in C_k$, then its complexification
$$S' := S+iS := \{S_1+iS_2 \mid S_1, S_2\in S\}\subseteq B(H)$$
is a complex subspace of the set of all complex matrices $B(H)$ on the Hilbert space $H = \mathbb{C}^{2^k}$ which is also invariant with respect to all Clifford maps. Fix any orthonormal basis $\{|i\rangle\}_{i=1}^{2^k}$ on $H$, and define a complex-linear map $\Phi: B(H)\to H\otimes H$ (which is related to the infamous Choi-Jamiolkowski isomorphism) by
$$\Phi(M) := \sum_{i,j=1}^{2^k}\langle i|M|j\rangle\,|i\rangle\otimes|j\rangle.$$
It is a linear isomorphism which satisfies $\langle\Phi(M),\Phi(N)\rangle = \mathrm{Tr}(M^\dagger N)$. Moreover, we have $\Phi(UMU^\dagger) = (U\otimes\bar U)\Phi(M)$ for all unitaries $U$, where $\bar U$ denotes the complex-conjugate of $U$ with respect to the given basis. It follows that the complex subspaces of $B(H)$ which are invariant under conjugation with respect to all unitaries $U\in C_k$ are in one-to-one correspondence with the complex subspaces of $H\otimes H$ which are invariant under $U\otimes\bar U$ for all unitaries $U\in C_k$.

An obvious invariant subspace in $B(H)$ consists of all complex multiples of the identity $\mathbf{1}$. The image is $\Phi(\mathbf{1}) = \sqrt{2^k}\,|\phi_m\rangle = \sum_i|i\rangle\otimes|i\rangle$, which reproduces the well-known fact that multiples of the maximally entangled state $|\phi_m\rangle$ are invariant with respect to transformations of the form $U\otimes\bar U$. In order to prove the lemma, we have to show that this subspace and its orthogonal complement (consisting of the traceless matrices in $B(H)$ respectively of the vectors that are orthogonal to $|\phi_m\rangle$) are the only non-trivial subspaces which are $C_k$-invariant.
It is well-known [56] that the Clifford group is a 2-design, i.e.
$$\frac{1}{|C_k|}\sum_{U\in C_k}(U\otimes U)M(U\otimes U)^\dagger = \int_{U\in U(2^k)}(U\otimes U)M(U\otimes U)^\dagger\,dU = \frac{2\,\mathrm{Tr}(\Pi_sM\Pi_s)}{2^k(2^k+1)}\,\Pi_s+\frac{2\,\mathrm{Tr}(\Pi_aM\Pi_a)}{2^k(2^k-1)}\,\Pi_a\qquad\text{(A1)}$$
for all $M\in B(H\otimes H)$, where $\Pi_s$ and $\Pi_a$ denote the projectors onto the symmetric and antisymmetric subspaces of $H\otimes H$, respectively. We can write $\Pi_s = (\mathbf{1}+\mathbb{F})/2$ and $\Pi_a = (\mathbf{1}-\mathbb{F})/2$, where $\mathbb{F}\,|i\rangle\otimes|j\rangle = |j\rangle\otimes|i\rangle$ is the swap operator. It is easy to see that $2^k\langle\phi_m|M^{T_B}|\phi_m\rangle = \mathrm{Tr}(M\mathbb{F})$, and $\mathbb{F}^{T_B} = 2^k|\phi_m\rangle\langle\phi_m|$, if $T_B$ denotes the partial transposition on the second system. Moreover, it holds $(A\otimes B\,\rho\,C\otimes D)^{T_B} = (A\otimes D^T)\,\rho^{T_B}\,(C\otimes B^T)$ [57]. Using these identities and applying $T_B$ to eq. (A1), we get
$$\frac{1}{|C_k|}\sum_{U\in C_k}(U\otimes\bar U)N(U\otimes\bar U)^\dagger = \int_{U\in U(2^k)}(U\otimes\bar U)N(U\otimes\bar U)^\dagger\,dU = \frac{\mathrm{Tr}(\phi_m^\perp N\phi_m^\perp)}{2^{2k}-1}\,\phi_m^\perp+\mathrm{Tr}(\phi_mN\phi_m)\,\phi_m,$$
where $\phi_m := |\phi_m\rangle\langle\phi_m|$ and $\phi_m^\perp := \mathbf{1}-\phi_m$. By Schur's Lemma, it follows that the one-dimensional subspace spanned by $|\phi_m\rangle$ and its orthogonal complement are the only non-trivial subspaces which are invariant with respect to $U\otimes\bar U$ for all $U\in C_k$.
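The two properties of the vectorization map used in this proof, $\Phi(UMU^\dagger) = (U\otimes\bar U)\Phi(M)$ and $\langle\Phi(M),\Phi(N)\rangle = \mathrm{Tr}(M^\dagger N)$, can be checked numerically for a single qubit. The sketch below uses plain nested lists, takes the product of the phase gate and the Hadamard gate as a sample Clifford unitary, and uses arbitrary test matrices; all helper names and matrix entries are hypothetical choices for this illustration.

```python
import math

def matmul(a, b):
    # Plain nested-list matrix product.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def conj(a):
    return [[complex(x).conjugate() for x in row] for row in a]

def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def kron(a, b):
    # Kronecker product with basis ordering |i>|j> -> index i*d + j.
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def phi(m):
    # Phi(M) = sum_ij <i|M|j> |i>|j>: the row-major flattening of M.
    return [m[i][j] for i in range(len(m)) for j in range(len(m[0]))]

def apply(mat, vec):
    return [sum(row[k] * vec[k] for k in range(len(vec))) for row in mat]

r = 1 / math.sqrt(2)
hadamard = [[r, r], [r, -r]]
phase = [[1, 0], [0, 1j]]
u = matmul(phase, hadamard)        # a single-qubit Clifford unitary
m = [[1 + 2j, 3], [-1j, 0.5]]      # arbitrary test matrices
n = [[0.25, 1j], [2, -1 + 1j]]

# Phi(U M U^dagger) versus (U x conj(U)) Phi(M)
lhs = phi(matmul(matmul(u, m), dagger(u)))
rhs = apply(kron(u, conj(u)), phi(m))
max_deviation = max(abs(x - y) for x, y in zip(lhs, rhs))

# <Phi(M), Phi(N)> versus Tr(M^dagger N)
inner = sum(complex(x).conjugate() * y for x, y in zip(phi(m), phi(n)))
trace = sum(matmul(dagger(m), n)[i][i] for i in range(2))

assert max_deviation < 1e-12
assert abs(inner - trace) < 1e-12
```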

Appendix B: Purity in boxworld

As we show here, it is possible to define a notion of purity for generalized no-signalling theory [24], colloquially called "boxworld", even though this theory is not transitive [44]. However, it will turn out that the resulting notion of purity does not have all the nice properties that hold in the transitive case.
To keep things simple, we will only consider the paradigmatic case of two observers (Alice and Bob), each carrying a square state space (a so-called "gbit", as shown in, and discussed around, Figure 2). Operationally, this means that both Alice and Bob carry two measurement devices with two outcomes each ("yes" and "no"); local states are characterized by the two probabilities of the "yes"-outcomes. Both probabilities can be chosen independently, giving rise to two coordinates in a square state space.
The two local state spaces are equal: $A = B$ (we use the two different labels for convenience). Now we use a particular representation of the square state space introduced in [44]. We define the set of normalized states $A$ as the convex hull of the four pure states
$$\omega_{\pm,\pm} := \begin{pmatrix}1\\ \pm 1/\sqrt{2}\\ \pm 1/\sqrt{2}\end{pmatrix}.$$
As usual, the cone of unnormalized states is $A_+ := \mathbb{R}_0^+\cdot A$, and the order unit is $u^A = (1,0,0)^T$, if we denote effects by vectors (such that $u^A(\omega) = \langle u^A,\omega\rangle$ in the usual inner product). It turns out that the cone of effects $A_+^*$ is generated by the four effects
$$Y = \begin{pmatrix}1/2\\ 1/\sqrt{2}\\ 0\end{pmatrix},\quad u^A-Y = \begin{pmatrix}1/2\\ -1/\sqrt{2}\\ 0\end{pmatrix},\quad Z = \begin{pmatrix}1/2\\ 0\\ 1/\sqrt{2}\end{pmatrix},\quad u^A-Z = \begin{pmatrix}1/2\\ 0\\ -1/\sqrt{2}\end{pmatrix}.$$
The square state space is transitive. As discussed in Example 11, the group of reversible transformations $\mathcal{G}_A$ is the dihedral group $D_4$. In the particular representation chosen here, it acts on the y- and z-components of a state vector $\omega$, and leaves the x-component (the normalization) invariant. The maximally mixed state is $\mu^A = (1,0,0)^T$. The set of unnormalized states on $AB$ is defined as follows:
$$(AB)_+ := \left\{\omega\in A\otimes B \mid E^A\otimes F^B(\omega)\geq 0\ \text{for all}\ E^A\in A_+^*,\ F^B\in B_+^*\right\}.$$
That is, these are all the vectors with the property that all local measurements yield positive outcome probabilities. The bipartite (normalized) state space $AB$ consists of all $\omega\in(AB)_+$ with $u^{AB}(\omega) = 1$, where the order unit is, as always, $u^{AB} = u^A\otimes u^B$. Since the linear space $A\otimes B$ is 9-dimensional, $AB$ is an 8-dimensional polytope, known as the no-signalling polytope.
What are the pure states in $AB$? Clearly, the 16 product states $\omega^A\otimes\omega^B$ are pure. But there are 8 additional entangled pure states: one of them is the famous PR box state $\omega_{PR}$, and the others can be obtained by local transformations from $\omega_{PR}$. We could use vectors with 9 entries to write down those states explicitly, but it will be more convenient to use another representation: given the three unit vectors $e_1, e_2, e_3$, we will denote states (and vectors) $\omega\in AB$ as matrices $(\omega_{ij})$, by using the decomposition $\omega = \sum_{i,j=1}^3\omega_{ij}\,e_i\otimes e_j$. In this representation, the maximally mixed state and the pure product states are
$$\mu^{AB} = \begin{pmatrix}1&0&0\\0&0&0\\0&0&0\end{pmatrix},\qquad \omega_{rs}\otimes\omega_{tu} = \begin{pmatrix}1&t/\sqrt{2}&u/\sqrt{2}\\ r/\sqrt{2}&rt/2&ru/2\\ s/\sqrt{2}&st/2&su/2\end{pmatrix}\quad(r,s,t,u\in\{-1,+1\}).$$
One of the PR-box states is
$$\omega_{PR} = \begin{pmatrix}1&0&0\\0&1/2&1/2\\0&1/2&-1/2\end{pmatrix}.$$
What are the reversible transformations in $AB$? It can be shown [44] that these are exactly the local transformations, that is, those of the form $G_A\otimes G_B$, together with the swap transformation $S$ which exchanges the two subsystems. There are no other reversible transformations in $\mathcal{G}_{AB}$. The bipartite space $AB$ decomposes into $\mathcal{G}_{AB}$-invariant subspaces (the first addend cannot be decomposed further because $D_4$ acts complex-irreducibly on $\hat A$):
$$AB = \underbrace{(\hat A\otimes\hat B)}_{4\text{-dim.}}\oplus\underbrace{(\hat A\otimes\mu^B\oplus\mu^A\otimes\hat B)}_{4\text{-dim.}}\oplus\underbrace{(\mathbb{R}\,\mu^A\otimes\mu^B)}_{1\text{-dim.}}.\qquad\text{(B1)}$$
In this notation, $\hat A$ denotes the subspace of vectors $x\in A$ with $u^A(x) = 0$ (we called this the Bloch subspace in Subsection III A). This shows that the only state on $AB$ which is invariant with respect to all reversible transformations is the maximally mixed state $\mu^{AB} := \mu^A\otimes\mu^B$. We call the subspace generated by the first two addends above $(AB)^\wedge$, such that
$$AB = (AB)^\wedge\oplus\mathbb{R}\,\mu^{AB}.$$
In other words, $(AB)^\wedge$ consists of all vectors $x\in AB$ with $u^{AB}(x) = 0$.
Now we proceed as in Section III: to every state $\omega\in AB$, we define the corresponding Bloch vector $\hat\omega$ as $\hat\omega := \omega-\mu^{AB}$. This vector is obtained from the matrix representation above by replacing the 1 in the upper-left corner by a zero. Denote the usual Euclidean inner product by $\langle\cdot,\cdot\rangle$. Then we define the purity of $\omega$ as
$$P(\omega) := c\cdot\langle\hat\omega,\hat\omega\rangle,\qquad\text{(B2)}$$
and we choose the constant $c > 0$ such that, say, the pure product states have purity $P(\omega^A\otimes\omega^B) = 1$. Using the representation above, it is easy to see that we must have $c = 1/3$. The resulting definition satisfies some of the properties mentioned in Lemma 8:
$0\leq P(\omega)\leq 1$ for all $\omega\in AB$,
$P(\omega) = 0$ if and only if $\omega = \mu^{AB}$, i.e. if $\omega$ is the maximally mixed state on $AB$,
$P$ is convex, and
$P(T\omega) = P(\omega)$ for all reversible transformations $T\in\mathcal{G}_{AB}$ and states $\omega\in AB$.
For example, to prove the last point, note that the local transformations on $A$ and $B$ are orthogonal in the chosen representation: they rotate and reflect the square. Hence their product is orthogonal as well, and so is the swap. It follows that $P(T\omega) = c\,\langle T\hat\omega, T\hat\omega\rangle = c\,\langle\hat\omega, T^\dagger T\hat\omega\rangle = c\,\langle\hat\omega,\hat\omega\rangle = P(\omega)$.
However, there is some bad news: if we compute the purity of the pure PR box state, we get
$$P(\omega_{PR}) = \frac{1}{3}\langle\hat\omega_{PR},\hat\omega_{PR}\rangle = \frac{1}{3}.$$
Even though this state is pure, it has purity (much) less than one. On transitive state spaces as considered in Section III, this cannot happen: all pure states have purity 1. Vice versa, we can see from this result that there is no reversible transformation which maps a pure product state to a PR-box state: if there was one, then both states necessarily would have the same purity.
Can we somehow avoid this problem? So far, we have been a bit hasty in our definition: in eq. (B2), we defined purity with respect to the usual Euclidean inner product, because all reversible transformations in $\mathcal{G}_{AB}$ are orthogonal with respect to this inner product. However, the decomposition (B1) shows that this is not the only inner product on $(AB)^\wedge$ (where the Bloch vectors $\hat\omega$ live) which has this property: if we have two vectors $\hat\omega$, $\hat\varphi$ in this space, we can decompose them as
$$\hat\omega = \hat\omega'+\hat\omega'',\qquad\hat\omega'\in\hat A\otimes\hat B,\qquad\hat\omega''\in(\hat A\otimes\mu^B)\oplus(\mu^A\otimes\hat B),$$
and similarly for $\hat\varphi$, and then define
$$\langle\hat\omega,\hat\varphi\rangle' := a\,\langle\hat\omega',\hat\varphi'\rangle+b\,\langle\hat\omega'',\hat\varphi''\rangle,$$
where all brackets on the right-hand side denote the usual Euclidean inner product. For every choice of $a, b > 0$, this yields an invariant inner product on $(AB)^\wedge$. Is there a way to choose $a$ and $b$ such that the purity of pure product states and PR-box states equals unity at the same time? (We can retain $c = 1/3$ in definition (B2) and absorb any necessary factor into $a$ and $b$.) Using that $\mu^A = e_1$ and $\hat A = \mathrm{span}\{e_2, e_3\}$, it is easy to see that
$$P(\omega^A\otimes\omega^B) = \frac{1}{3}(a+2b),\qquad P(\omega_{PR}) = \frac{1}{3}a.$$
Both expressions can only be simultaneously equal to 1 if $b = 0$. But this ruins the inner-product property. If we ignore this, and go on with setting $a = 3$ and $b = 0$, we lose the property that $P(\omega) = 0$ only for the maximally mixed state $\omega = \mu^{AB}$: for example, we get $P(\omega^A\otimes\mu^B) = 0$.
In summary, there is no definition of purity on bipartite boxworld which has all the nice properties that hold true in transitive state spaces. However, if we accept the existence of pure states with purity less than one, then eq. (B2) can be a useful definition. A similar conclusion holds for other non-transitive state spaces.
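The purity values quoted in this appendix follow directly from the $3\times 3$ matrix representation, and the sketch below recomputes them. It treats the lower-right $2\times 2$ block as the component in $\hat A\otimes\hat B$ (weight $a$) and the remaining first row and column, without the corner, as the component in the second addend of (B1) (weight $b$); $a = b = 1$ corresponds to eq. (B2). The function names are hypothetical, and the position of the minus sign in the PR-box matrix is an assumption of this sketch that does not affect the purity.

```python
import math

R = 1 / math.sqrt(2)

def product_state(r, s, t, u):
    # Matrix (omega_ij) of omega_rs (x) omega_tu, local vectors (1, r/sqrt2, s/sqrt2).
    a_vec = [1.0, r * R, s * R]
    b_vec = [1.0, t * R, u * R]
    return [[x * y for y in b_vec] for x in a_vec]

# One PR-box state; the placement of the minus sign is an assumption here.
PR_BOX = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.5, -0.5]]

def purity(omega, a=1.0, b=1.0, c=1 / 3):
    # The upper-left corner (normalization) is dropped, as for the Bloch vector.
    block = sum(omega[i][j] ** 2 for i in (1, 2) for j in (1, 2))
    border = sum(omega[0][j] ** 2 + omega[j][0] ** 2 for j in (1, 2))
    return c * (a * block + b * border)

# Eq. (B2): product states have purity 1, the PR box has purity 1/3.
for signs in [(1, 1, 1, 1), (1, -1, -1, 1), (-1, -1, 1, -1)]:
    assert math.isclose(purity(product_state(*signs)), 1.0)
assert math.isclose(purity(PR_BOX), 1 / 3)

# With a = 3, b = 0 both are 1, but omega^A (x) mu^B gets purity 0.
local_pure_times_mixed = [[1.0, 0.0, 0.0], [R, 0.0, 0.0], [R, 0.0, 0.0]]
assert math.isclose(purity(PR_BOX, a=3, b=0), 1.0)
assert math.isclose(purity(product_state(1, 1, 1, 1), a=3, b=0), 1.0)
assert purity(local_pure_times_mixed, a=3, b=0) == 0.0
```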

Acknowledgements. We acknowledge valuable discussions with Frédéric Dupuis, Johan Åberg, Jonathan Oppenheim, Lídia del Rio, Lucien Hardy, Renato Renner, Roger Colbeck, as well as financial support from the National Research Foundation (Singapore), the Ministry of Education (Singapore), and the Swiss National Science Foundation (grant no. 200021-119868). Research at Perimeter Institute is supported by the Government of Canada through Industry Canada and by the Province of Ontario through the Ministry of Research and Innovation.

[1] E. Lubkin, Entropy of an n-system from its correlation with a k-reservoir, J. Math. Phys., vol. 19, pp. 1028-1031, 1978.
[2] S. Lloyd and H. Pagels, Complexity as thermodynamic depth, Ann. Phys., vol. 188, pp. 186-213, 1988.
[3] D. N. Page, Average entropy of a subsystem, Phys. Rev. Lett., vol. 71, pp. 1291-1294, 1993.
[4] S. K. Foong and S. Kanno, Proof of Page's conjecture on the average entropy of a subsystem, Phys. Rev. Lett., vol. 72, no. 8, pp. 1148-1151, 1994.
[5] P. Hayden, D. W. Leung, and A. Winter, Aspects of generic entanglement, Comm. Math. Phys., vol. 265, pp. 95-117, 2006.
[6] A. Harrow, P. Hayden, and D. Leung, Superdense coding of quantum states, Phys. Rev. Lett., vol. 92, p. 187901, 2004.
[7] A. Abeyesinghe, P. Hayden, G. Smith, and A. Winter, Optimal superdense coding of entangled states, IEEE Trans. Inf. Theory, vol. 52, no. 8, pp. 3635-3641, 2006.
[8] A. Serafini, O. C. O. Dahlsten, and M. B. Plenio, Teleportation fidelities of squeezed states from thermodynamical state space measures, Phys. Rev. Lett., vol. 98, p. 170501, 2007.
[9] A. Serafini, O. C. O. Dahlsten, D. Gross, and M. B. Plenio, Canonical and micro-canonical typical entanglement of continuous variable systems, J. Phys. A, vol. 40, p. 9551, 2007.
[10] G. Smith and D. Leung, Typical entanglement of stabilizer states, Phys. Rev. A, vol. 74, no. 6, p. 062314, 2006.
[11] O. O. Dahlsten and M. B. Plenio, Exact entanglement probability distribution of bi-partite randomised stabilizer states, QIC, vol. 6, p. 527, 2006.

[12] O. C. O. Dahlsten, Typical Entanglement from the abstract to the physical. PhD thesis, Imperial College, 2008.
[13] R. Oliveira, O. C. O. Dahlsten, and M. B. Plenio, Generic Entanglement Can Be Generated Efficiently, Phys. Rev. Lett. ,
vol. 98, no. 13, p. 130502, 2007.
[14] F. Dupuis, The decoupling approach to quantum information theory. PhD thesis, Universite de Montreal, 2009.
[15] P. Hayden and J. Preskill, Black holes as mirrors: quantum information in random subsystems, J. High Energy Phys., vol. 09,
no. 120, 2007.
and H.-J. Briegel, Entanglement and decoherence in spin gases, IJQI, vol. 5, pp. 509
[16] J. Calsamiglia, L. Hartmann, W. Dur,
523, 2007.
[17] J. Gemmer, A. Otte, and G. Mahler, Quantum approach to a derivation of the second law of thermodynamics, Phys. Rev.
Lett. , vol. 86, no. 10, pp. 19271930, 2001.
[18] S. Popescu, A. J. Short, and A. Winter, Entanglement and the foundations of statistical mechanics, Nature Physics, vol. 2,
pp. 754758, 2006.
[19] S. Lloyd, Black Holes, Demons and the Loss of Coherence: How complex systems get information,and what they do with it. PhD thesis,
Rockefeller University, 1988.
[20] J. Gemmer, M. Michel, and G. Mahler, Quantum Thermodynamics: Emergence of Thermodynamic Behavior Within Composite
Quantum Systems, vol. 657. Lecture Notes in Physics, Berlin Springer Verlag, 2004.
[21] E. Lubkin and T. Lubkin, Average quantal behavior and thermodynamic isolation, IJTP, vol. 32, pp. 933943, 1993.

[22] M. P. Muller,
D. Gross, and J. Eisert, Concentration of Measure for Quantum States with a Fixed Expectation Value, Comm.
Math. Phys., vol. 303, pp. 785824, 2011.
[23] L. Rio, J. Aberg, R. Renner, O. Dahlsten, and V. Vedral, The thermodynamic meaning of negative entropy, Nature (London)
, vol. 474, no. 7349, pp. 6163, 2011.
[24] J. Barrett, Information processing in generalized probabilistic theories, Phys. Rev. A , vol. 75, no. 3, p. 032304, 2007.
[25] H. Barnum, J. Barrett, L. Orloff Clark, M. Leifer, R. Spekkens, N. Stepanik, A. Wilce, and R. Wilke, Entropy and information
causality in general probabilistic theories, New J. Phys., vol. 12, no. 3, p. 033024, 2010.
[26] B. Simon, Representations of Finite and Compact Groups, Graduate Studies in Mathematics, vol. 10. American Mathematical Society,
1995.
[27] W. K. Wootters, Quantum mechanics without probability amplitudes, Found. Phys., vol. 16, pp. 391405, 1986.
[28] L. Hardy, Quantum Theory From Five Reasonable Axioms, ArXiv Quantum Physics e-prints, 2001.
[29] P. Diaconis, What is a random matrix?, Notices of the AMS, p. 1349, 2005.
[30] P. Billingsley, Probability and Measure, 3rd Edition. Wiley-Interscience, 1995.
[31] V. D. Milman and G. Schechtman, Asymptotic theory of finite dimensional normed spaces (Lecture Notes in Mathematics 1200). New
York, NY, USA: Springer, 2001.
[32] A. Peres, Quantum Theory: Concepts and Methods. Kluwer, 1993.
[33] C. Cohen-Tannoudji, B. Diu, F. Laloe, and B. Dui, Quantum Mechanics, Vol. 2. Wiley-Interscience, 2006.
[34] J. R. Gittings and A. J. Fisher, Describing mixed spin-space entanglement of pure states of indistinguishable particles using
an occupation number basis, Phys. Rev. A , vol. 66, p. 032305, 2002.
[35] V. Vedral, Entanglement in the second quantization formalism, Cent. Eur. J. Phys., vol. 1, p. 289, 2003.
[36] D. Cavalcanti, L. M. Moreira, F. Matinaga, M. O. T. Cunha, and M. F. Santos, Useful entanglement from the Pauli principle,
Phys. Rev. B, vol. 76, p. 113304, 2007.
[37] S. D. H. Hsu and D. Reeb, Monsters, black holes and the statistical mechanics of gravity, Mod. Phys. Lett. A, vol. 24, p. 1875,
2009.
[38] J. A. Smolin and J. Oppenheim, Locking Information in Black Holes, Phys. Rev. Lett., vol. 96, no. 8, p. 081302, 2006.
[39] D. N. Page, Information in black hole radiation, Phys. Rev. Lett., vol. 71, pp. 3743–3746, 1993.
[40] J. Preskill, Do Black Holes Destroy Information?, in Black Holes, Membranes, Wormholes and Superstrings (S. Kalara &
D. V. Nanopoulos, ed.), p. 22, 1993.
[41] M. Planck, On the law of distribution of energy in the normal spectrum, Annalen der Physik, vol. 4, p. 553, 1901.
[42] O. C. O. Dahlsten, R. Oliveira, and M. B. Plenio, The emergence of typical entanglement in two-party random processes, J.
Phys. A, vol. 40, pp. 8081–8108, 2007.
[43] H. Barnum and A. Wilce, Information processing in convex operational theories, ArXiv e-prints, 2009.
[44] D. Gross, M. Müller, R. Colbeck, and O. C. O. Dahlsten, All Reversible Dynamics in Maximally Nonlocal Theories are
Trivial, Phys. Rev. Lett., vol. 104, no. 8, p. 080402, 2010.
[45] A. J. Short and S. Wehner, Entropy in general physical theories, New J. Phys., vol. 12, no. 3, p. 033023, 2010.
[46] G. Kimura, K. Nuida, and H. Imai, Distinguishability measures and entropies for general probabilistic theories, Reports on
Mathematical Physics, vol. 66, no. 2, pp. 175–206, 2010.
[47] D. Gottesman, The Heisenberg Representation of Quantum Computers, Proceedings of the XXII International Colloquium on
Group Theoretical Methods in Physics, pp. 32–43, 1999.
[48] L. Hardy, Foliable Operational Structures for General Probabilistic Theories, ArXiv e-prints, 2009.
[49] L. Masanes and M. P. Müller, A derivation of quantum theory from physical requirements, New J. Phys., vol. 13, no. 6,
p. 063001, 2011.
[50] B. Dakic and C. Brukner, Quantum Theory and Beyond: Is Entanglement Special?, ArXiv e-prints, 2009.
[51] C. D. Aliprantis and R. Tourky, Cones and Duality, vol. 84 of Graduate Studies in Mathematics. American Mathematical Society,
2007.
[52] L. Masanes, private communication, 2011.
[53] L. Masanes, M. P. Müller, D. Perez-Garcia, and R. Augusiak, The singularity of entanglement, in preparation, 2011.

[54] H. Barnum and C. Ududec, private communication, 2011.
[55] L. Hardy and W. K. Wootters, Limited holism and real-vector-space quantum theory, ArXiv e-prints, 2010.
[56] D. P. DiVincenzo, D. W. Leung, and B. M. Terhal, Quantum Data Hiding, IEEE Trans. Inf. Theory, vol. 48, no. 3, pp. 580–598,
2001.
[57] D. Bruss and G. Leuchs, Lectures on Quantum Information. Weinheim: Wiley-VCH, 2007.
[58] J. von Neumann, Mathematical Foundations of Quantum Mechanics. Princeton University Press, 1955.
[59] It is interesting to note that the set of stabilizer states (including their convex combinations as density matrices) shares the
values of N and K with usual quantum theory. As we show later, this set of states satisfies all the conditions for eq. (1)
to hold. Thus, this equation gives the same amount of expected local purity as for the full set of quantum states. This was
already observed in [10], and is a consequence of the fact that the Clifford group constitutes a 2-design.
[60] The physical motivation for postulating compact groups $G_A$ is as follows. First, $G_A$ must be bounded (in the topology induced
by its action on $\Omega_A$) due to the compactness of $\Omega_A$. Then, suppose we have a sequence of transformations $(T_n)_{n\in\mathbb{N}} \subset G_A$
such that $\lim_{n\to\infty} T_n = T$. Physically, this means that we can apply the transformation $T$ to arbitrary accuracy. But this is
anyway all that we can hope for in physics; hence it makes sense to call $T$ a physically allowed reversible transformation,
and include it in $G_A$. Thus, from a physical point of view, it makes sense to postulate that $G_A$ must be closed.
[61] A linear map $T$ is orthogonal with respect to an inner product $\langle\cdot,\cdot\rangle$ if $\langle Tu, Tv\rangle = \langle u, v\rangle$ for all vectors $u$ and $v$.
[62] An interesting question is whether this or another suggested measure could quantify the free energy of a state, a property
which has traditionally been used as a justification for choosing one entropy measure over another [58].
[63] We will not consider the possibility to have only a subcone of A+ as the cone of allowed effects. This more general setting
would describe a situation where some mathematically well-defined effects are physically impossible to measure, similar to
superselection rules forbidding certain superpositions in quantum mechanics. In this paper, we assume that all effects can
be physically implemented.
