arXiv:0809.3972v3 [quant-ph] 6 Oct 2008
A Counterexample to Additivity of Minimum Output Entropy
M. B. Hastings
Center for Nonlinear Studies and Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545
We present a random construction of a pair of channels which gives, with non-zero probability for sufficiently large dimensions, a counterexample to the minimum output entropy conjecture. As shown by Shor[4], this implies a violation of the additivity conjecture for the classical capacity of quantum channels. The violation of the minimum output entropy conjecture is relatively small.
PACS numbers:
The field of information theory started with the study of the capacity of a classical noisy channel to send classical information[1]. One fundamental property is additivity: the asymptotic rate at which information can be transmitted noiselessly over a channel can be computed from a single-letter formula for the information transmitted in one use of the channel. The only advantage of correlating the symbols across multiple channels (or the same channel multiple times) is the ability to do channel coding. Channel coding reduces the probability of error, but at the cost of sending a shorter message, so that the asymptotic amount of information that can be sent remains the same.
With the development of quantum information theory, it becomes natural to ask about the ability to send quantum information over quantum channels, or classical information over quantum channels. In the first case, even the operational definition of additivity was recently shown to be false in dramatic fashion: there exist two channels, each of which has zero capacity to send quantum information no matter how many times it is used, but which can be used in tandem to send quantum information[2].
The question of additivity for sending classical information over a quantum channel is known as the additivity conjecture. The Holevo capacity[3] for sending classical information over a quantum channel $\mathcal{E}$ is defined to be the maximum over density matrices $\rho_i$ and probabilities $p_i$ of
$$H\Bigl(\sum_i p_i \mathcal{E}(\rho_i)\Bigr) - \sum_i p_i H\bigl(\mathcal{E}(\rho_i)\bigr), \qquad (1)$$
where $H(\rho) = -\mathrm{Tr}\bigl(\rho \ln(\rho)\bigr)$ is the von Neumann entropy.
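Eq. (1) is straightforward to evaluate numerically for a fixed ensemble. The following Python sketch (illustrative, not from the paper; the Holevo capacity itself requires a further maximization over ensembles) computes the bracketed quantity for a channel specified by its Kraus operators:

```python
import numpy as np

def von_neumann_entropy(rho):
    """H(rho) = -Tr(rho ln rho), computed from the eigenvalues of rho."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]                 # drop numerical zeros
    return float(-np.sum(evals * np.log(evals)))

def apply_channel(kraus, rho):
    """E(rho) = sum_k K_k rho K_k^dagger for Kraus operators K_k."""
    return sum(K @ rho @ K.conj().T for K in kraus)

def holevo_quantity(kraus, probs, states):
    """The quantity of Eq. (1) for a fixed ensemble {p_i, rho_i}."""
    outs = [apply_channel(kraus, rho) for rho in states]
    avg = sum(p * out for p, out in zip(probs, outs))
    return von_neumann_entropy(avg) - sum(
        p * von_neumann_entropy(out) for p, out in zip(probs, outs))
```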
Shor[4] showed that the additivity conjecture for the capacity of quantum channels is equivalent to another conjecture, the minimum output entropy conjecture[5]. Given a channel $\mathcal{E}$, we define the minimum output entropy $H_{\min}$ by
$$H_{\min}(\mathcal{E}) = \min_{|\psi\rangle} H\bigl(\mathcal{E}(|\psi\rangle\langle\psi|)\bigr). \qquad (2)$$
The minimum output entropy conjecture is that for all channels $\mathcal{E}_1$ and $\mathcal{E}_2$, we have
$$H_{\min}(\mathcal{E}_1 \otimes \mathcal{E}_2) = H_{\min}(\mathcal{E}_1) + H_{\min}(\mathcal{E}_2). \qquad (3)$$
In this letter we give a random construction that leads, with high probability, to a counterexample to the minimum output entropy conjecture. Our construction is similar to the constructions Winter and Hayden used to show violation of the maximal $p$-norm multiplicativity conjecture for all $p > 1$[6, 7, 8]. For $p = 1$, this violation would imply violation of the minimum output entropy conjecture; however, the counterexample found in [7] requires a matrix size which diverges as $p \to 1$. We use different system and environment sizes (note that $D \ll N$ in our construction below) and make a different analysis of the probability of different output entropies. Other violations of the maximal $p$-norm multiplicativity conjecture are known for $p$ close to $0$[9].
For $i = 1, \ldots, D$ pick $l_i \geq 0$ independently from a probability distribution with
$$P(l_i) \propto l_i^{2N-1} \exp(-N D\, l_i^2), \qquad (4)$$
where the proportionality constant is chosen such that $\int_0^\infty P(l_i)\, dl_i = 1$. This distribution is the same as that of the length of a random vector chosen from a Gaussian distribution in $N$ complex dimensions. Then, define
$$L^2 = \sum_i l_i^2, \qquad (5)$$
and define completely positive trace-preserving maps $\mathcal{E}$ and $\overline{\mathcal{E}}$ by
$$\mathcal{E}(\rho) = \sum_{i=1}^{D} \frac{l_i^2}{L^2}\, U_i \rho U_i^\dagger, \qquad (6)$$
$$\overline{\mathcal{E}}(\rho) = \sum_{i=1}^{D} \frac{l_i^2}{L^2}\, \overline{U_i} \rho \overline{U_i}^\dagger,$$
where the $U_i$ are $N$-by-$N$ unitary matrices and $\overline{U_i}$ denotes the complex conjugate of $U_i$. The channel $\mathcal{E}$ applies $U_i$ with probability $l_i^2/L^2$. Note that for large $N \gg D$, $l_i^2/L^2$ is very likely to be close to $1/D$, so that we apply the unitaries with roughly equal probability. The only reason in what follows for not choosing all the probabilities equal is that the choice we made will allow us to appeal to certain exact results on random bipartite states later.
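The construction of Eqs. (4)-(6) is easy to sample: each $l_i$ is the norm of a Gaussian vector in $\mathbb{C}^N$ with per-component variance $1/(ND)$, which reproduces Eq. (4), and each $U_i$ is a Haar-random unitary, generated here by the standard QR-with-phase-fixing recipe. A sketch, again assuming the helpers above:

```python
def haar_unitary(n, rng):
    """Haar-random n x n unitary via QR of a complex Ginibre matrix."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))        # fix column phases to get Haar measure

def sample_channel(N, D, seed=0):
    """Kraus operators of E and of its conjugate Ebar, per Eqs. (4)-(6)."""
    rng = np.random.default_rng(seed)
    # l_i = norm of a Gaussian vector in C^N with per-component variance 1/(N*D)
    z = rng.standard_normal((D, N)) + 1j * rng.standard_normal((D, N))
    l = np.linalg.norm(z, axis=1) * np.sqrt(1.0 / (2 * N * D))
    L = np.sqrt(np.sum(l ** 2))
    U = [haar_unitary(N, rng) for _ in range(D)]
    kraus = [(l[i] / L) * U[i] for i in range(D)]   # E applies U_i w.p. l_i^2/L^2
    kraus_bar = [K.conj() for K in kraus]           # Ebar uses conj(U_i)
    return kraus, kraus_bar
```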
We also define the conjugate channel
$$\mathcal{E}^C(\rho) = \sum_{i=1}^{D} \sum_{j=1}^{D} \frac{l_i l_j}{L^2}\, \mathrm{Tr}\bigl(U_i \rho U_j^\dagger\bigr)\, |i\rangle\langle j|. \qquad (7)$$
As shown in [10],
$$H_{\min}(\mathcal{E}) = H_{\min}(\mathcal{E}^C). \qquad (8)$$
In the $O(\ldots)$ notation that follows, we will take
$$1 \ll D \ll N. \qquad (9)$$
We claim that

Theorem 1. For sufficiently large $D$, for sufficiently large $N$, there is a non-zero probability that a random choice of $U_i$ from the Haar measure and of $l_i$ from (4) will give a channel $\mathcal{E}$ such that
$$H_{\min}(\mathcal{E} \otimes \overline{\mathcal{E}}) < H_{\min}(\mathcal{E}) + H_{\min}(\overline{\mathcal{E}}) = 2 H_{\min}(\mathcal{E}). \qquad (10)$$
The size of $N$ depends on $D$.
This theorem will follow from two lemmas below, Lemmas 1 and 5, which give small corrections to the naive estimates of $2\ln(D)$ and $\ln(D)$ for the entropies. Lemma 1 upper bounds $H_{\min}(\mathcal{E} \otimes \overline{\mathcal{E}})$ by $2\ln(D) - \ln(D)/D$. Lemma 5 shows that for given $D$, for sufficiently large $N$, with non-zero probability, the entropy $H_{\min}(\mathcal{E})$ is at least $\ln(D) - S_{\max}$, for
$$S_{\max} = c_1/D + p_1(D)\, O\bigl(\sqrt{\ln(N)/N}\bigr), \qquad (11)$$
where $c_1$ is a constant and $p_1(D) = \mathrm{poly}(D)$. Thus, since for large enough $D$, for large enough $N$, we have $2 S_{\max} < \ln(D)/D$, the theorem follows.
Lemma 1. For any $D$ and $N$, we have
$$H_{\min}(\mathcal{E} \otimes \overline{\mathcal{E}}) \leq \frac{1}{D}\ln(D) + \frac{D-1}{D}\ln(D^2) = 2\ln(D) - \frac{1}{D}\ln(D). \qquad (12)$$
Proof. Consider the maximally entangled state $|\Psi\rangle = (1/\sqrt{N}) \sum_{\alpha=1}^{N} |\alpha\rangle \otimes |\alpha\rangle$. Since $(U \otimes \overline{U})|\Psi\rangle = |\Psi\rangle$ for any unitary $U$, the $i = j$ terms all return the input state, and
$$\mathcal{E} \otimes \overline{\mathcal{E}}(|\Psi\rangle\langle\Psi|) = \Bigl(\sum_i \frac{l_i^4}{L^4}\Bigr) |\Psi\rangle\langle\Psi| + \sum_{i \neq j} \frac{l_i^2 l_j^2}{L^4}\, \bigl(U_i \otimes \overline{U_j}\bigr) |\Psi\rangle\langle\Psi| \bigl(U_i \otimes \overline{U_j}\bigr)^\dagger. \qquad (13)$$
Since the states $|\Psi\rangle\langle\Psi|$ and $(U_i \otimes \overline{U_j})|\Psi\rangle\langle\Psi|(U_i \otimes \overline{U_j})^\dagger$ are pure states, the entropy of the state in (13) is bounded by the entropy of the mixing distribution:
$$H\bigl(\mathcal{E} \otimes \overline{\mathcal{E}}(|\Psi\rangle\langle\Psi|)\bigr) \leq -\Bigl(\sum_i \frac{l_i^4}{L^4}\Bigr) \ln\Bigl(\sum_i \frac{l_i^4}{L^4}\Bigr) - \sum_{i \neq j} \frac{l_i^2 l_j^2}{L^4} \ln\Bigl(\frac{l_i^2 l_j^2}{L^4}\Bigr) \leq \frac{1}{D}\ln(D) + \frac{D-1}{D}\ln(D^2),$$
where the second inequality follows from the Hölder inequality.
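The bound of Lemma 1 is easy to test numerically. Since the output in Eq. (13) is a mixture of $D^2$ (unnormalized) pure states, its nonzero spectrum equals that of their $D^2 \times D^2$ Gram matrix, which avoids ever forming the $N^2 \times N^2$ output. A sketch with illustrative sizes, assuming the helpers above:

```python
N, D = 32, 3
kraus, kraus_bar = sample_channel(N, D, seed=3)
Psi = np.eye(N) / np.sqrt(N)          # matrix form of (1/sqrt N) sum_a |a>|a>
# In matrix form, (A x B)|Psi> corresponds to A @ Psi @ B.T
vecs = np.array([(A @ Psi @ B.T).reshape(N * N)
                 for A in kraus for B in kraus_bar])
gram = vecs @ vecs.conj().T           # same nonzero spectrum as the output state
print(von_neumann_entropy(gram), 2 * np.log(D) - np.log(D) / D)  # first <= second
```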
Lemma 2. Consider a random bipartite pure state $|\psi\rangle\langle\psi|$ on a bipartite system with subsystems $B$ and $E$ of dimensions $N$ and $D$ respectively. Let $\rho_E$ be the reduced density matrix on $E$. Then, the probability density that $\rho_E$ has a given set of eigenvalues $p_1, \ldots, p_D$ is bounded by
$$P(p_1, \ldots, p_D) \prod_i dp_i \leq O(N)^{O(D^2)}\, D^{(N-D)D}\, \delta\Bigl(1 - \sum_{i=1}^{D} p_i\Bigr) \prod_{i=1}^{D} p_i^{N-D}\, dp_i \qquad (14)$$
$$= O(N)^{O(D^2)}\, \delta\Bigl(1 - \sum_{i=1}^{D} p_i\Bigr) \prod_{i=1}^{D} \Bigl[ D^{N-D} p_i^{N-D} \exp\bigl[-(N-D)D(p_i - 1/D)\bigr] \Bigr] dp_i.$$
Note that $D^{N-D}\, p^{N-D} \exp[-(N-D)D(p - 1/D)] \leq 1$ for all $0 \leq p \leq 1$.
Similarly, consider a random pure state $\rho = |\psi\rangle\langle\psi|$ on an $N$-dimensional space, and a channel $\mathcal{E}^C(\ldots)$ as defined in Eq. (7), with unitaries $U_i$ chosen randomly from the Haar measure, with the numbers $l_i$ chosen as described in Eq. (4), and with $N > D$. Then, the probability density that the eigenvalues of $\mathcal{E}^C(\rho)$ assume given values $p_1, \ldots, p_D$ is bounded by the same function $P(p_1, \ldots, p_D) \prod_i dp_i$ as above.
Proof. As shown in [11], the exact probability distribution of eigenvalues is
$$P(p_1, \ldots, p_D) \prod_i dp_i \propto \delta\Bigl(1 - \sum_{i=1}^{D} p_i\Bigr) \prod_{1 \leq j < k \leq D} (p_j - p_k)^2 \prod_{i=1}^{D} p_i^{N-D}\, dp_i, \qquad (15)$$
(15)
where the constant of proportionality is given by
the requirement that the probability distribution in-
tegrate to unity. The proportionality constant is
O(N)
O(D
2
)
D
(ND)D
, and for 0 p
i
1
1j<kD
(p
j
p
k
)
2
D
i=1
p
ND
i
D
i=1
p
ND
i
, (16)
so Eq. (14) follows. The second equality in (14) holds
because
i
(p
i
1/D) = 0.
Given a random pure state $|\psi\rangle\langle\psi|$, with $U_i$ and $l_i$ chosen as described above, the state $\mathcal{E}^C(|\psi\rangle\langle\psi|)$ has the same probability distribution as the reduced density matrix of a random bipartite state, so the second result follows.
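The second claim of Lemma 2 can be probed empirically: resampling both the channel and the input on each trial, the spectrum of $\mathcal{E}^C(|\psi\rangle\langle\psi|)$ should be statistically indistinguishable from the environment spectrum of a random bipartite pure state on $\mathbb{C}^N \otimes \mathbb{C}^D$. A Monte Carlo sketch with illustrative parameters, assuming the helpers above:

```python
def bipartite_env_spectrum(N, D, rng):
    """Eigenvalues of rho_E for a random bipartite pure state in C^N x C^D."""
    M = rng.standard_normal((D, N)) + 1j * rng.standard_normal((D, N))
    M /= np.linalg.norm(M)                   # normalize the pure state
    return np.linalg.eigvalsh(M @ M.conj().T)

N, D, trials = 128, 4, 200
rng = np.random.default_rng(5)
spec_a = np.concatenate([
    np.linalg.eigvalsh(conjugate_channel_output(
        sample_channel(N, D, seed=t)[0], random_pure_state(N, rng)))
    for t in range(trials)])
spec_b = np.concatenate([bipartite_env_spectrum(N, D, rng) for _ in range(trials)])
# Both samples concentrate near 1/D with comparable fluctuations
print(spec_a.mean(), spec_b.mean(), spec_a.std(), spec_b.std())
```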
If all the probabilities $p_i$ are close to $1/D$, so that $p_i = 1/D + \delta p_i$ for small $\delta p_i$, we can Taylor expand the last line of (14), using $p_i^{N-D} = \exp[(N-D)\ln(p_i)]$, to get
$$P(p_1, \ldots, p_D) \leq O(N)^{O(D^2)} \exp\Bigl[-(N-D)\, D^2 \sum_i \delta p_i^2 / 2 + \ldots\Bigr]. \qquad (17)$$
Similarly, we can expand
$$S = -\sum_i p_i \ln(p_i) \approx \ln(D) - D \sum_i \delta p_i^2 / 2 + \ldots \qquad (18)$$
Using Eqs. (17,18), we find that the probability of having $S = \ln(D) - \delta S$ is roughly $O(N)^{O(D^2)} \exp[-(N-D) D\, \delta S]$.
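The quadratic expansion in Eq. (18) is a one-line numerical check. A sketch:

```python
D = 8
rng = np.random.default_rng(7)
dp = rng.standard_normal(D) * 1e-3
dp -= dp.mean()                              # enforce sum of deviations = 0
p = 1.0 / D + dp
print(-np.sum(p * np.log(p)),                # exact entropy
      np.log(D) - D * np.sum(dp ** 2) / 2)   # Eq. (18); agree to O(dp^3)
```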
Using ε-nets, these estimates (17,18) give some motivation for the construction we are using, but just fail to give a good enough bound on their own: define an ε-net with distance $d \ll 1$ between points on the net. There are then $O(d^{-2N})$ points in the net. Then, the probability that, for a random $U_i$, $l_i$, at least one point on the net has a given $\delta S$ is bounded by $\exp[-N D\, \delta S + 2N \ln(1/d)]$. Thus, the probability of having a $\delta S = \ln(D)/2D$ is less than one for $d \sim D^{-1/4}$. However, in order to use ε-nets to show that it is unlikely to have any state $|\psi\rangle$ with given $\delta S$, we need to take a sufficiently dense ε-net. If there exists a state $|\psi_0\rangle$ with given $\delta S_0$, then any state within distance $d$ will have, by the Fannes inequality[12], a $\delta S \geq \delta S_0 - d^2 \ln(D/d^2)$, and therefore we would need to take a $d$ of roughly $1/\sqrt{D}$, which is denser than the counting above can afford. However, if a state $|\psi\rangle$ has a large overlap with such a $|\psi_0\rangle$, the state $\mathcal{E}^C(|\psi\rangle\langle\psi|)$ will typically be close to a mixture of $\mathcal{E}^C(|\psi_0\rangle\langle\psi_0|)$ with the maximally mixed state, and hence will also have a relatively large $\delta S$. This idea motivates what follows.
Precisely, we will say that a density matrix is close to maximally mixed if the eigenvalues $p_i$ all obey
$$|p_i - 1/D| \leq c_{MM} \sqrt{\ln(N)/(N-D)}, \qquad (19)$$
where the constant $c_{MM}$ will be chosen later. For any given channel $\mathcal{E}^C$, let $P_{\mathcal{E}^C}$ denote the probability that, for a randomly chosen $|\psi\rangle$, the density matrix $\mathcal{E}^C(|\psi\rangle\langle\psi|)$ is close to maximally mixed. Let $Q$ denote the probability that a random choice of $U_i$ from the Haar measure and a random choice of numbers $l_i$ produces a channel $\mathcal{E}^C$ such that $P_{\mathcal{E}^C}$ is less than $1/2$. Note: we are defining $Q$ to be the probability of a probability here. Then,
Lemma 3. For an appropriate choice of $c_{MM}$, the probability $Q$ can be made arbitrarily close to zero for all sufficiently large $D$ and $N/D$.
Proof. The probability $Q$ is less than or equal to 2 times the probability that, for a random $U_i$, random $l_i$, and random $|\psi\rangle$, the density matrix $\mathcal{E}^C(|\psi\rangle\langle\psi|)$ is not close to maximally mixed. From (14), this probability is bounded by the maximum over $p$ such that $|p - 1/D| > c_{MM}\sqrt{\ln(N)/(N-D)}$ of
$$O(N)^{O(D^2)}\, D^{N-D} p^{N-D} \exp[-(N-D)D(p - 1/D)] \qquad (20)$$
$$\leq \exp\bigl[O(D^2)\ln(N) - (N-D)\, D^2\, c_{MM}^2\, (\ln(N)/(N-D))/2 + \ldots\bigr].$$
By picking $c_{MM}$ large enough, we can make this probability arbitrarily small for sufficiently large $D$ and $N/D$.
The next lemma is the crucial step.
Lemma 4. Consider a given choice of $U_i$ and $l_i$ which gives an $\mathcal{E}^C$ such that $P_{\mathcal{E}^C} \geq 1/2$. Suppose there exists a state $|\psi_0\rangle$ such that $\mathcal{E}^C(|\psi_0\rangle\langle\psi_0|)$ has given eigenvalues $p_1, \ldots, p_D$. Let $P_{near}$ denote the probability that, for a randomly chosen state $|\psi\rangle$, the density matrix $\mathcal{E}^C(|\psi\rangle\langle\psi|)$ has eigenvalues $q_1, \ldots, q_D$ which obey
$$|q_i - (y\, p_i + (1-y)(1/D))| \leq \mathrm{poly}(D)\, O\bigl(\sqrt{\ln(N)/(N-D)}\bigr) \qquad (21)$$
for some $y \geq 1/2$. Then,
$$P_{near} \geq \exp(-O(N))\, \bigl(1/2 - 1/\mathrm{poly}(D)\bigr), \qquad (22)$$
where the power of $D$ in the polynomial in (22) can be made arbitrarily large by an appropriate choice of the polynomial in (21).
Proof. Consider a random state $|\psi\rangle$. We can write
$$|\psi\rangle = \sqrt{1 - x^2}\, |\psi_0\rangle + x\, |\delta\psi\rangle, \qquad (23)$$
with $\langle\psi_0|\delta\psi\rangle = 0$. With probability $\exp(-O(N))$, we have $x^2 \leq 1/2$. Since $|\psi\rangle$ is random, the probability distribution of $|\delta\psi\rangle$ is that of a random state subject to $\langle\psi_0|\delta\psi\rangle = 0$. One way to generate such a random state $|\delta\psi\rangle$ with this property is to choose a random state $|\phi\rangle$ and set
$$|\delta\psi\rangle = \frac{1}{\sqrt{1 - |\langle\psi_0|\phi\rangle|^2}}\, \bigl(1 - |\psi_0\rangle\langle\psi_0|\bigr)\, |\phi\rangle. \qquad (24)$$
If we choose a random state $|\phi\rangle$, then with probability at least $1/2$, the state $\mathcal{E}^C(|\phi\rangle\langle\phi|)$ is close to maximally mixed. Further, for any given $i, j$, the probability that $|\langle\psi_0|U_j^\dagger U_i|\phi\rangle|$ is greater than $O(\sqrt{\ln(D)/N})$ is $1/\mathrm{poly}(D)$, and the polynomial $\mathrm{poly}(D)$ can be chosen to be any given power of $D$ by an appropriate choice of the constant hidden in the $O$ notation for $O(\sqrt{\ln(D)/N})$. Therefore,
$$\Pr\Bigl[\mathrm{Tr}\,\bigl|\mathcal{E}^C(|\phi\rangle\langle\psi_0|)\bigr| \geq \mathrm{poly}(D)\sqrt{\ln(D)/N}\Bigr] \leq 1/\mathrm{poly}(D), \qquad (25)$$
with any desired power of $D$ in the polynomial on the right-hand side. Then, since
$$\Pr\Bigl[\bigl|\langle\psi_0|\phi\rangle\bigr| \geq O\bigl(\sqrt{\ln(D)/N}\bigr)\Bigr] \leq 1/\mathrm{poly}(D), \qquad (26)$$
we find that
$$\Pr\Bigl[\mathrm{Tr}\,\bigl|\mathcal{E}^C(|\delta\psi\rangle\langle\psi_0|)\bigr| \geq \mathrm{poly}(D)\sqrt{\ln(D)/N}\Bigr] \leq 1/\mathrm{poly}(D), \qquad (27)$$
with again any desired power in the polynomial.
The probability that $\mathcal{E}^C(|\phi\rangle\langle\phi|)$ is close to maximally mixed is at least $1/2$, and so by (26) the probability that the eigenvalues $r_1, \ldots, r_D$ of $\mathcal{E}^C(|\delta\psi\rangle\langle\delta\psi|)$ obey
$$|r_i - 1/D| \leq c_{MM}\sqrt{\ln(N)/(N-D)} + \mathrm{poly}(D)\sqrt{\ln(D)/N} \leq \mathrm{poly}(D)\, O\bigl(\sqrt{\ln(N)/N}\bigr) \qquad (28)$$
is at least $1/2 - 1/\mathrm{poly}(D)$. Let
$$y = 1 - x^2. \qquad (29)$$
Thus, since
$$\mathcal{E}^C(|\psi\rangle\langle\psi|) = (1 - x^2)\, \mathcal{E}^C(|\psi_0\rangle\langle\psi_0|) + x^2\, \mathcal{E}^C(|\delta\psi\rangle\langle\delta\psi|) + \Bigl( x\sqrt{1 - x^2}\, \mathcal{E}^C(|\psi_0\rangle\langle\delta\psi|) + \mathrm{h.c.} \Bigr), \qquad (30)$$
using Eq. (27) we find that for given $x$, the probability that a randomly chosen $|\psi\rangle$ gives a state with eigenvalues $q_1, \ldots, q_D$ such that
$$|q_i - (y\, p_i + (1-y)(1/D))| \leq \mathrm{poly}(D)\, O\bigl(\sqrt{\ln(N)/N}\bigr) \qquad (31)$$
is at least $1/2 - 1/\mathrm{poly}(D)$. Combining this result with the $\exp(-O(N))$ probability of $x^2 \leq 1/2$, the claim of the lemma follows.
Lemma 5. If the unitary matrices $U_i$ are chosen at random from the Haar measure, and the $l_i$ are chosen randomly as described above, then the probability that $H_{\min}(\mathcal{E}^C)$ is less than $\ln(D) - S_{\max}$ is less than one for sufficiently large $N$, for an appropriate choice of $c_1$ and $p_1$. The $N$ required depends on $D$.
Proof. Let $P_{bad}$ denote the probability that $H_{\min}(\mathcal{E}^C) < \ln(D) - S_{\max}$. Then, with probability at least $P_{bad} - Q$, for random $U_i$ and $l_i$, the channel $\mathcal{E}^C$ has $P_{\mathcal{E}^C} \geq 1/2$ and has $H_{\min}(\mathcal{E}^C) < \ln(D) - S_{\max}$.
Let $|\psi_0\rangle$ be a state which minimizes the output entropy of channel $\mathcal{E}^C$. By Lemma 4, for such a channel, for a random state $|\psi\rangle$, the density matrix $\mathcal{E}^C(|\psi\rangle\langle\psi|)$ has eigenvalues $q_1, \ldots, q_D$ which obey
$$|q_i - (y\, p_i + (1-y)(1/D))| \leq \mathrm{poly}(D)\, O\bigl(\sqrt{\ln(N)/N}\bigr) \qquad (32)$$
for $y \geq 1/2$ with probability at least
$$\exp(-O(N))\, \bigl(1/2 - 1/\mathrm{poly}(D)\bigr). \qquad (33)$$
Therefore, for a random choice of $U_i$, $l_i$, $\psi$, the state $\mathcal{E}^C(|\psi\rangle\langle\psi|)$ has eigenvalues $q_i$ which obey Eq. (32) with probability at least
$$(P_{bad} - Q)\, \exp(-O(N))\, \bigl(1/2 - 1/\mathrm{poly}(D)\bigr). \qquad (34)$$
However, by Eq. (14), the probability of having such eigenvalues $q_i$ is bounded by the maximum of the probability density $P(q_1, \ldots, q_D)$ over $q_i$ which obey Eq. (32). Given the assumptions that $-\sum_i p_i \ln(p_i) \leq \ln(D) - S_{\max}$, $y \geq 1/2$, and the constraint that $\sum_i p_i = 1$, the quantity
$$P(q_1, \ldots, q_D) \leq O(N)^{O(D^2)} \exp\bigl(-c_2 (N-D)\bigr),$$
where $c_2$ can be made arbitrarily large by choosing $c_1$ large. We pick $c_{MM}$ so that $Q < 1$, and then if $P_{bad} = 1$, we can pick $c_1$ and $p_1$ such that for sufficiently large $N$ this quantity $P(q_1, \ldots, q_D)$ is less than the probability in (34), giving a contradiction. Therefore, $P_{bad} < 1$. In fact, since $Q$ can be made arbitrarily close to zero, $P_{bad}$ can be made arbitrarily close to zero for sufficiently large $D$, $N$.
This completes the proof of the theorem.
Discussion. We have given a random construction that leads, with non-zero probability, to a counterexample to the minimum output entropy conjecture. It is worth noting, however, that the violation of the conjecture is relatively small, in that, even with careful choices of $D$ and $N$, the relative violation $[2 H_{\min}(\mathcal{E}) - H_{\min}(\mathcal{E} \otimes \overline{\mathcal{E}})]/H_{\min}(\mathcal{E})$ is much less than unity.
We have not computed the exact constants in this relative violation, but the relative violation is at most $1/D$. For any choice of unitaries $U_i$, we can choose a state $|\psi\rangle$ which is an eigenvector of $U_1^\dagger U_2$ to show that
$$H_{\min}(\mathcal{E}) \leq \ln(D) - (2/D)\ln(2). \qquad (35)$$
Comparing $2 \times (2/D)\ln(2)$ to $\ln(D)/D$, we see that $D$ needs to be greater than $16$ for this construction to work. In fact, our randomized estimates require a much greater $D$ than that, so that the relative violation will be quite small.
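The bound of Eq. (35) is transparent numerically in the idealized case of exactly equal weights $1/D$: if $|\psi\rangle$ is an eigenvector of $U_1^\dagger U_2$, then $U_1|\psi\rangle$ and $U_2|\psi\rangle$ agree up to a phase, so the output is a mixture with one weight $2/D$, whose mixing entropy is $\ln(D) - (2/D)\ln(2)$. A sketch assuming the helpers above, with equal weights substituted for the random $l_i$:

```python
N, D = 64, 20
rng = np.random.default_rng(8)
U = [haar_unitary(N, rng) for _ in range(D)]
evals, evecs = np.linalg.eig(U[0].conj().T @ U[1])   # eigenvectors of U_1^dag U_2
psi = evecs[:, 0] / np.linalg.norm(evecs[:, 0])
rho = np.outer(psi, psi.conj())
out = sum(Ui @ rho @ Ui.conj().T for Ui in U) / D    # equal weights 1/D
print(von_neumann_entropy(out), np.log(D) - (2.0 / D) * np.log(2))  # first <= second
```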
Acknowledgments. I thank J. Yard, P. Hayden, and A. Harrow. This work was supported by U.S. DOE Contract No. DE-AC52-06NA25396.
[1] C. E. Shannon, Bell Syst. Tech. Jour. 27, 379 (1948).
[2] G. Smith and J. Yard, arXiv.org:0807.4935.
[3] A. S. Holevo, Probl. Info. Transm. (USSR) 9, 31 (1973).
[4] P. W. Shor, Comm. Math. Phys. 246, 543 (2004).
[5] C. King and M. B. Ruskai, IEEE Trans. Inf. Thy. 47, 192
(2001).
[6] A. Winter, arXiv.org:0707.0402.
[7] P. Hayden, arXiv.org:0707.3291.
[8] P. Hayden and A. Winter, arXiv.org:0807.4753.
[9] T. Cubitt, A. W. Harrow, D. Leung, A. Montanaro, and A. Winter, arXiv.org:0712.3628.
[10] C. King, K. Matsumoto, M. Nathanson, M. B. Ruskai,
arXiv:quant-ph/0509126, Markov Proc. Relat. Fields.
13, 391 (2007).
[11] D. N. Page, Phys. Rev. Lett. 71, 1291 (1993); S. Lloyd and H. Pagels, Ann. Phys. 188, 186 (1988).
[12] M. Fannes, Commun. Math. Phys. 31, 291 (1973).