1
The Three-Terminal Interactive
Lossy Source Coding Problem
arXiv:1502.01359v3 [cs.IT] 18 Jan 2016
Leonardo Rey Vega, Pablo Piantanida and Alfred O. Hero III
Abstract
The three-node multiterminal lossy source coding problem is investigated. We derive an inner bound
to the general rate-distortion region of this problem which is a natural extension of the seminal work by
Kaspi’85 on the interactive two-terminal source coding problem. It is shown that this (rather involved)
inner bound contains several rate-distortion regions of some relevant source coding settings. In this way,
besides the non-trivial extension of the interactive two terminal problem, our results can be seen as a
generalization and hence unification of several previous works in the field. Specializing to particular
cases we obtain novel rate-distortion regions for several lossy source coding problems. We finish by
describing some of the open problems and challenges. However, the general three-node multiterminal
lossy source coding problem seems to offer a formidable mathematical complexity.
Index Terms
Multiterminal source coding, Wyner-Ziv, rate-distortion region, Berger-Tung inner bound, interactive
lossy source coding, distributed lossy source coding.
The material in this paper was partially published in the IEEE International Symposium on Information Theory, Honolulu,
Hawaii, USA, June 29 - July 4, 2014 and in the ncIEEE International Symposium on Information Theory, Hong-Kong, China,
June 14 - June 19, 2015. The work of L. Rey Vega was partially supported by project UBACyT 2002013100751BA. The work
of P. Piantanida was partially supported by the FP7 Network of Excellence in Wireless COMmunications NEWCOM#. The
work of A. Hero was partially supported by a DIGITEO Chair from 2008 to 2013 and by US ARO grant W911NF-15-1-0479.
L. Rey Vega is with the Departments of Electronics (FIUBA) and CSC-CONICET, Buenos Aires, Argentina (e-mail:
[email protected],
[email protected]).
P. Piantanida is with the Laboratoire des Signaux et Systèmes (L2S), CentraleSupelec, 91192 Gif-sur-Yvette, France (e-mail:
[email protected]).
Alfred O. Hero III is with the Department of Electrical Eng. & CompSci University of Michigan, Ann Arbor, MI, USA
(e-mail:
[email protected]).
November 7, 2018
DRAFT
2
I. I NTRODUCTION
A. Motivation and related works
Distributed source coding is an important branch of study in information theory with enormous
relevance for the present and future technology. Efficient distributed data compression may be
the only way to guarantee acceptable levels of performance when energy and link bandwidth
are severely limited as in many real world sensor networks. The distributed data collected by
different nodes in a network can be highly correlated and this correlation can be exploited at
the application layer, e.g., for target localization and tracking or anomaly detection. In such
cases cooperative joint data-compression can achieve a better overall rate-distortion trade-off
that independent compression at each node.
Complete answers to the optimal trade-offs between rate and distortion for distributed source
coding are scarce and the solution to many problems remain elusive. Two of the most important
results in information theory, Slepian-Wolf solution to the distributed lossless source coding
problem [1] and Wyner-Ziv [2] single letter solution for the rate-distortion region when side
information is available at the decoder provided the kick-off for the study of these important
problems. Berger and Tung [3], [4] generalized the Slepian-Wolf problem when lossy reconstructions are required at the decoder. It was shown that the region obtained, although not tight
in general, is the optimal one in several special cases [5]–[8] and strictly suboptimal in others
[9]. Heegard and Berger [10] considered the Wyner-Ziv problem when the side information at
the decoder may be absent or when there are two decoders with degraded side information.
Timo et al [11] correctly extended the achievable region for many (> 2) decoders. In [12] and
the references therein, the complementary delivery problem (closely related to the HeegardBerger problem) is also studied. The use of interaction in a multiterminal source coding setting
has not been so extensively studied as the problems mentioned above. Through the use of
multiple rounds of interactive exchanges of information explicit cooperation can take place using
distributed/successive refinement source coding. Transmitting “reduced pieces” of information,
and constructing an explicit sequential cooperative exchange of information, can be more efficient
that transmitting the “total information” in one-shot.
The value of interaction for source coding problems was first recognized by Kaspi in his seminal work [13], where the interactive two-terminal lossy source coding problem was introduced
November 7, 2018
DRAFT
3
and solved under the assumption of a finite number of communication rounds. In [14] it is shown
that interaction strictly outperforms (in term of sum rate) the Wyner-Ziv rate function. There
are also several extensions to the original Kaspi problem. In [15] the interactive source coding
problem with a helper is solved when the sources satisfy a certain Markov chain property. In
[16]–[18] other interesting cases where interactive cooperation can be beneficial are studied. To
the best of our knowledge, a proper generalization of this setting to interactive multiterminal
(> 2) lossy source coding has not yet been observed.
B. Main contributions
In this paper, we consider the three-terminal interactive lossy source coding problem presented
in Fig. 1. We have a network composed of 3 nodes which can interact through a broadcast ratelimited –error free– channel. Each node measures the realization of a discrete memoryless source
(DMS) and is required to reconstruct the sources from the other terminals with a fidelity criterion.
Nodes are allowed to interact by interchanging descriptions of their observed sources realizations
over a finite number of communication rounds. After the information exchange phase is over,
the nodes try to reconstruct the realization of the sources at the other nodes using the recovered
descriptions.
The general rate-distortion region seems to pose a formidable mathematical problem which
encompass several known open problems. However, several properties of this problem are
established in this paper.
General achievable region
We derive a general achievable region by assuming a finite number of rounds. This region
is not a trivial extension of Kaspi’s region [13] and the main ideas behind its derivation are
the exchange of common and private descriptions between the nodes in the network in order to
exploit optimality the different side informations at the different nodes. As in the original Kaspi’s
formulation, the key to obtaining the achievable region is the natural cooperation between the
nodes induced by the generation of new descriptions based on the past exchanged description.
However, in comparison to Kaspi’s 2 node case, the 3 nodes interactions make significant
differences in the optimal action of each node at the encoding and decoding procedure in
a given round. At each encoding stage, each node need to communicate to two nodes with
November 7, 2018
DRAFT
4
n
n
, D23 )
(X̂21
, D21 ) (X̂23
n
n
, D13 )
(X̂12
, D12 ) (X̂13
R2
R1
Encoder 1
Encoder 2
R3
R3
R1
X1n
X3n
R2
X2n
Encoder 3
n
n
(X̂31
, D31 ) (X̂32
, D32 )
Figure 1: Three-Terminal Interactive Source Coding. There is a single noiseless rate-limited
broadcast channel from each terminal to the other two terminals. Dij denotes the average perletter distortion between the source Xjn and X̂ijn measured at the node i for each pair i 6= j.
different side information. This is reminiscent of the Heegard-Berger problem [10], [11], whose
complete solution is not known, when the side information at the decoders is not degraded.
Moreover, the situation is a bit more complex because of the presence of 3-way interaction.
This similarity between the Heegard-Berger problem leads us to consider the generation of two
sets of messages at each node: common messages destined to all nodes and private messages
destined to some restricted sets of nodes. On the other hand, when each node is acting as a
decoder the nodes need to recover a set of common and private messages generated at different
nodes (i.e. at round l node 3, needs to recover the common descriptions generated at nodes 1
and 2 and the private ones generated also at nodes 1 and 2). This is reminiscent of the BergerTung problem [4]–[6], [19] which is also an open problem. Again, the situation is more involved
because of the cooperation induced by the multiple rounds of exchanged information. Particularly
important is the fact that, in the case of the common descriptions, there is cooperation based
on the conditioning on the previous exchanged descriptions in addition to cooperation naturally
induced by the encoding-decoding ordering imposed by the network. This explicit cooperation
for the exchange of common messages is accomplished through the use of a special binning
technique to be explained in Appendix B.
November 7, 2018
DRAFT
5
Besides the complexity of the achievable region, we give an inner bound to the rate-distortion
region that allows us to recover the two node Kaspi’s region. We also recover several previous
inner bounds and rate-distortion regions of some well-known cooperative and interactive –as
well as non-interactive– lossy source coding problems.
Special cases
As the full problem seems to offer a formidable mathematical complexity, including several
special cases which are known to be long-standing open problems, we cannot give a full converse
proving the optimality of the general achievable region obtained. However, in Section V we
provide a complete answer to the rate-distortion regions of several specific cooperative and
interactive source coding problems:
(1) Two encoders and one decoder subject to lossy/lossless reconstruction constraints without
side information (see Fig. 2).
(2) Two encoders and three decoders subject to lossless/lossy reconstruction constraints with
side information (see Fig. 3).
(3) Two encoders and three decoders subject to lossless/lossy reconstruction constraints, reversal
delivery and side information (see Fig. 4).
(4) Two encoders and three decoders subject to lossy reconstruction constraints with degraded
side information (see Fig. 5).
(5) Three encoders and three decoders subject to lossless/lossy reconstruction constraints with
degraded side information (see Fig. 6).
Interestingly enough, we show that for the two last problems, interaction through multiple
rounds could be helpful. Whereas for the other three cases, it is shown that a single round
of cooperatively exchanged descriptions suffices to achieve optimality. Table I summarizes the
characteristics of each of the above mentioned cases.
Next we summarize the contents of the paper. In Section II we formulate the general problem.
In Section III we present and discuss the inner bound of the general problem. In Section IV we
show how our inner bound contains several results previously obtained in the past. In Section
V we present the converse results and their tightness with respect to the inner bound for the
special cases mentioned above providing the optimal characterization for them. In Section VI
we present a discussion of the obtained results and their limitations and some numerical results
November 7, 2018
DRAFT
6
Cases
R1
R2
R3
(1)
6= 0
6= 0
=0
∅ (is not reconstructing
any source)
n
Pr X̂21
6= X1n ≤ ǫ
Constraints at Node 1
Constraints at Node 2
∅ (is not reconstructing
(2)
6= 0
6= 0
=0
any source)
i
n
E d(X̂12
, X2n ) ≤ D12
(3)
6= 0
6= 0
=0
h
i
n
, X2n ) ≤ D12
E d(X̂12
n
Pr X̂21
6= X1n ≤ ǫ
(4)
6= 0
6= 0
=0
h
i
n
E d(X̂12
, X2n ) ≤ D12
h
i
n
E d(X̂21
, X1n ) ≤ D21
(5)
6= 0
6= 0
6= 0
∅ (is not decoding)
h
n
6= X1n ≤ ǫ
Pr X̂21
i
h
n
E d(X̂23
, X3n ) ≤ D23
Constraints at Node 3
n
Pr X̂31
6= X1n ≤ ǫ
h
i
n
E d(X̂32
, X2n ) ≤ D32
n
Pr X̂31
6= X1n ≤ ǫ
h
i
n
E d(X̂32
, X2n ) ≤ D32
n
Pr X̂32
6= X2n ≤ ǫ
i
h
n
E d(X̂31
, X1n ) ≤ D31
h
i
n
E d(X̂31
, X1n ) ≤ D31
h
i
n
E d(X̂32
, X2n ) ≤ D32
n
6= X1n ≤ ǫ
Pr X̂31
i
h
n
E d(X̂32
, X2n ) ≤ D32
Table I: Special cases fully characterized in Section V.
concerning the new optimal cases from the previous Section. Finally in Section VII we provide
some conclusions. The major mathematical details are relegated to the appendixes.
Notation: We summarize the notation. With xn and upper-case letters X n we denote vectors
and random vectors of n components, respectively. The i-th component of vector xn is denoted
as xi . All alphabets are assumed to be finite.
Entropy is denoted by H(·) and mutual information by I(·; ·). H2 (p) denotes the entropy
associated with a Bernoulli random variable with parameter p. With h(·) we denote differential
entropy. Let X, Y and V be three random variables on some alphabets with probability distribution pXY V . When clear from context we will simple denote pX (x) with p(x). If the probability
distribution of random variables X, Y, V satisfies p(x|yv) = p(x|y) for each x, y, v, then they
form a Markov chain, which is denoted by X −− Y −− V .
The probability of an event A is denoted by Pr {A}, where the measure used to compute it
will be understood from the context. Conditional probability of a set A with respect to a set B
is denoted as Pr A B . The set of strong typical sequences associated with random variable X
n
(see appendix A) is denoted by T[X]ǫ
, where ǫ > 0. We simply denote these sets as Tǫn when
clear from the context. The cardinal of set A is denoted by kAk. The complement of a set is
denoted by Ā. With Z≥α and R≥β we denote the integers and reals numbers greater than α and
November 7, 2018
DRAFT
7
β respectively. co {A} denotes the convex hull of a set A ∈ RN , where N ∈ N.
II. P ROBLEM
FORMULATION
Assume three discrete memoryless sources (DMS’s) with alphabets and pmfs given by X1 ×
X2 × X3 , pX1 X2 X3 and arbitrary bounded distortion measures: dj : Xj × X̂j → R≥0 , j ∈
M , {1, 2, 3} where {X̂j }j∈M are finite reconstruction alphabets1 . We consider the problem
of characterizing the rate-distortion region of the interactive source coding scenario described
in Fig. 1. In this setting, through K rounds of information exchange between the nodes each
one of them will attempt to recover a lossy description of the sources that the others nodes
observe, e.g., node 1 must reconstruct –while satisfying distortion constraints– the realization
of the sources X2n and X3n observed by nodes 2 and 3. Indeed, this setting can be seen as a
generalization of the well-known Kaspi’s problem [13].
Definition 1 (K-step interactive source code): A K-step interactive n-length source code, denoted for the network model in Fig. 1, is defined by a sequence of encoder mappings:
f1l : X1n × J21 × J31 × · · · × J2l−1 × J3l−1 −→ J1l ,
f2l : X2n × J11 × J31 × · · · × J3l−1 × J1l −→ J2l ,
f3l : X3n × J11 × J21 × · · · × J1l × J2l −→ J3l ,
(1)
(2)
(3)
with l ∈ [1 : K] and message sets: Jil , 1, 2, . . . , Iil , Iil ∈ Z≥0 , i ∈ M, and reconstruction
mappings:
gij : Xin ×
O
m∈M, m6=i
Jm1 × · · · × JmK −→ X̂ijn , i 6= j.
(4)
The average per-letter distortion and the corresponding distortion levels achieved at the node i
with respect to source j are:
i
h
n
n
E dj Xj , X̂ij ≤ Dij i, j ∈ M, i 6= j
with
n
1X
d(xm , ym ) .
d (x , y ) ≡
n m=1
n
1
n
(5)
(6)
The problem can be easily generalized to the case in which there are different reconstruction alphabets at the terminals. It
can also be shown that all the results are valid if we employ arbitrary bounded joint distortion functions, e.g. at node 1 we use
d(X2 , X3 ; X̂2 , X̂3 ).
November 7, 2018
DRAFT
8
In compact form we denote a K-step interactive source coding by (n, K, F, G) where F and G
denote the sets of encoders and decoders mappings.
Remark 1: The code definition depends on the node ordering in the encoding procedure.
K
Above we defined the encoding functions f1l , f2l , f3l l=1 assuming that in each round node 1
acts first, followed by node 2, and finally by node 3, and the process beginning again at node 1.
Definition 2 (Achievability and rate-distortion region): Consider R , (R1 , R2 , R3 ) and D ,
(D12 , D13 , D21 , D23 , D31 , D32 ). The rate vector R is (D, K)-achievable if ∀ε > 0 there is
n0 (ε, K) ∈ N such that ∀n > n0 (ε, K) there exists a K-step interactive source code (n, K, F, G)
with rates satisfying:
K
1X
log kJil k ≤ Ri + ǫ, i ∈ M
n l=1
(7)
n
o
R3 (D, K) = R : R is (D, K)-achievable
(9)
and with average per-letter distortions at node i with respect to source j:
h
i
E dj (Xjn , X̂ijn ) ≤ Dij + ǫ, i, j ∈ M, i 6= j ,
(8)
N
n
1
K
n
where X̂ij ≡ gij Xi , m∈M, m6=i Jm × · · · × Jm , i 6= j ∈ M. The rate-distortion region
R3 (D, K) is defined by:
S
2
Similarly, the D-achievable region R3 (D) is given by R3 (D) = ∞
K=1 R3 (D, K) , that is:
o
n
(10)
R3 (D) = R : R is (D, K)-achievable for some K ∈ Z≥1 .
Remark 2: By definition R3 (D, K) is closed and using a time-sharing argument it is easy to
show that it is also convex ∀K ∈ Z≥1 .
Remark 3: R3 (D, K) depends on the node ordering in the encoding procedure. Above we
K
defined the encoding functions f1l , f2l , f3l l=1 assuming that in each round node 1 acts first,
followed by node 2, and finally by node 3, and the process beginning again at node 1. In this
paper we restrict the analysis to the canonical ordering (1 → 2 → 3). However, there are 3! = 6
different orderings that generally lead to different regions and the (D, K)-achievable region
2
Notice that this limit exists because it is the union of a monotone increasing sequence of sets.
November 7, 2018
DRAFT
9
defined above is more explicitly denoted R3 (D, K, σc ), where σc is the trivial permutation for
M. The correct (D, K)-achievable region is:
R3 (D, K) =
[
R3 (D, K, σ)
(11)
σ∈Σ(M)
where Σ(M) contains all the permutations of set M. The theory presented in this paper for
determining R3 (D, K, σc ) can be used on the other permutations σ 6= σc to compute (11)3 .
III. I NNER B OUND ON THE R ATE -D ISTORTION R EGION
In this Section, we provide a general achievable rate-region on the rate-distortion region.
A. Inner bound
We first present a general achievable rate-region where each node at a given round l will
generate descriptions destined to the other nodes based on the realization of its own source, the
past descriptions generated by a particular node and the descriptions generated at the other nodes
and recovered by the node up to the present round. In order to precisely describe the complex
rate-region, we need to introduce some definitions. For a set A, let C (A) = 2A \ {A, ∅} be the
set of all subsets of A minus A and the empty set. Denote the auxiliary random variables:
Ui→S,l , S ∈ C (M) , i ∈
/ S, l = 1, . . . , K.
(12)
Auxiliary random variables {Ui→S,l } will be used to denote the descriptions generated at node i
and at round l and destined to a set of nodes S ∈ C (M) with i ∈
/ S. For example, U1→23,l denote
the description generated at node 1 and at round l and destined to nodes 2 and 3. Similarly,
{U1→2,l } will be used to denote the descriptions generated at node 1 at round l and destined
only to node 2. We define variables:
W[i,l] ≡ Common information4 shared by the three nodes available at node i at round l before
encoding
3
It should be mentioned that this is not the most general setting of the problem. The most general encoding procedure will
follow from the definition of the transmission order by a sequence t1 , t2 , t3 , . . . , tkMk×K with ti ∈ M. This will cover even the
situation in which the order can be changed in each round. To keep the mathematical presentation simpler we will not consider
this more general setting.
4
Not to be confused with the Wyner’s definition of common information [20].
November 7, 2018
DRAFT
10
V[S,l,i] ≡ Private information shared by nodes in S ∈ C (M) available at node i ∈ S, at round
l, before encoding
In precise terms, the quantities introduced above for our problem are defined by:
W[1,l] ={U1→23,k , U2→13,k , U3→12,k }l−1
k=1 ,
W[2,l] =W[1,l] ∪ U1→23,l ,
W[3,l] =W[2,l] ∪ U2→13,l ,
V[12,l,1] ={U1→2,k , U2→1,k }l−1
k=1 , V[12,l,2] = V[12,l,1] ∪ U1→2,l ,
V[13,l,1] ={U1→3,k , U3→1,k }l−1
k=1 , V[13,l,3] = V[13,l,1] ∪ U1→3,l ,
V[23,l,2] ={U2→3,k , U3→2,k }l−1
k=1 , V[23,l,3] = V[23,l,2] ∪ U2→3,l .
Before presenting the general inner bound, we provide the basic idea of the random coding
scheme that achieves the rate-region in Theorem 1 for the case of K communication rounds.
Assume that all codebooks are randomly generated and known to all the nodes before the
information exchange begins and consider the encoding ordering given by 1 → 2 → 3 so that
we begin at round l = 1 in node 1. Also, and in order to maintain the explanation simple and
to help the reader to grasp the essentials of the coding scheme employed, we will consider that
all terminal are able to recover the descriptions generated at other nodes (which will be the
case under the conditions in our Theorem 1). From the observation of the source X1n , node 1
generates a set of descriptions for each of the other nodes connected to it. In particular it generates
a common description to be recovered at nodes 2 and 3 in addition to two private descriptions for
node 2 and 3, respectively, generated from a conditional codebook given the common description.
Then, node 2 tries to recover the descriptions destined to it (the common description generated
at 1 and its corresponding private description), using X2n as side information, and generates its
own descriptions, based on source X2n and the recovered descriptions from node 1. Again, it
generates a common description for nodes 1 and 3, a private description for node 3 and another
one for node 1. The same process goes on until node 3, which tries to recover jointly the common
descriptions generated by node 1 and node 2, and then the private descriptions destined to him
by node 1 and 2. Then generates its own descriptions (common and private ones) destined to
nodes 1 and 2. Finally, node 1 tries to recover all the descriptions destined to it generated by
nodes 2 and 3 in the same way as previously done by node 3. After this, round l = 1 is over,
November 7, 2018
DRAFT
11
and round l = 2 begins with node 1 generating new descriptions using X1n , its encoding history
(from previous round) and the recovered descriptions from the other nodes.
The process continues in a similar manner until we reach round l = K where node 3 recovers
the descriptions from the other nodes and generates its own ones. Node 1 recovers the last
descriptions destined to it from nodes 2 and 3 but does not generate new ones. The same holds
for node 2 who only recovers the descriptions generated by node 3 and thus terminating the
information exchange procedure. Notice that at the end of round K the decoding in node 1 and
node 2 can be done simultaneously. This is due to the fact that node 1 is not generating a new
description destined to node 2. However, in order to simplify the analysis and notation in the
appendix we will consider that the last decoding of node 2 occurs in round K + 15 . After all
the exchanges are done, each node recovers an estimate of the other nodes, source realizations
by using all the available recovered descriptions from the K previous rounds.
Theorem 1 (Inner bound): Let R̄3 (D, K) be the closure of set of all rate tuples satisfying:
R1 =
K
X
(l)
(l)
(l)
(l)
(l)
(l)
(l)
(l)
(l)
(l)
(l)
(l)
(l)
(l−1)
(l)
(l−1)
(l)
(l)
(l)
(l)
l=1
R2 =
K
X
R2→13 + R2→1 + R2→3
l=1
R3 =
K
X
R3→12 + R3→1 + R3→2
l=1
R1 + R2 =
K
X
(l)
R1→23 + R1→2 + R1→3
(13)
(14)
(15)
(l)
(l)
R1→23 + R2→13 + R1→3 + R2→3 + R1→2 + R2→1
l=1
R1 + R3 =
K+1
X
(l)
(l)
R1→23 + R3→12 + R1→2 + R3→2 + R1→3 + R3→1
l=1
R2 + R3 =
K
X
(l)
(l)
R2→13 + R3→12 + R2→1 + R3→1 + R2→3 + R3→2
l=1
5
(16)
(17)
(18)
This is clearly a fictitious round, in the sense that there is not descriptions generation on it. In this way, there is not
modification of the final rates achieved by the procedure described if we consider this additional round.
November 7, 2018
DRAFT
12
where6 for each l ∈ [1 : K]:
(0)
(l)
R1→23 > I X1 ; U1→23,l X2 W[1,l] V[12,l,1] V[23,l−1,3]
(l)
R2→13 > I X2 ; U2→13,l X3 W[2,l] V[13,l,1] V[23,l,2]
(l)
R3→12 > I X3 ; U3→12,l X1 W[3,l] V[12,l,2] V[13,l,3]
(l)
(l)
R1→23 + R2→13 > I X1 X2 ; U1→23,l U2→13,l X3 W[1,l] V[13,l,1] V[23,l,2]
(l)
(l)
R2→13 + R3→12 > I X2 X3 ; U2→13,l U3→12,l X1 W[2,l] V[12,l,2] V[13,l,3]
(l)
(l−1)
R1→23 + R3→12 > I X1 X3 ; U1→23,l U3→12,l−1 X2 W[3,l−1] V[12,l,1] V[23,l−1,3]
(l−1)
R3→2 > I X3 ; U3→2,l−1 X2 W[2,l] V[23,l−1,3] V[12,l,2]
(l)
R1→2 > I X1 ; U1→2,l X2 W[2,l] V[23,l,2] V[12,l,1]
(l)
(l−1)
R1→2 + R3→2 > I X1 X3 ; U1→2,l U3→12,l−1 X2 W[2,l] V[23,l−1,3] V[12,l,1]
(l)
R1→3 > I X1 ; U1→3,l X3 W[3,l] V[23,l,3] V[13,l,1]
(l)
R2→3 > I X2 ; U2→3,l X3 W[3,l] V[23,l,2] V[13,l,3]
(l)
(l)
R1→3 + R2→3 > I X1 X2 ; U1→3,l U2→3,l X3 W[3,l] V[23,l,2] V[13,l,1]
(l)
R2→1 > I X2 ; U2→1,l X1 W[1,l+1] V[12,l,2] V[13,l+1,1]
(l)
R3→1 > I X3 ; U3→1,l X1 W[1,l+1] V[12,l+1,1] V[13,l,3]
(l)
(l)
R2→1 + R3→1 > I X2 X3 ; U2→1,l U3→1,l X1 W[1,l+1] V[12,l,2] V[13,l,3]
(K+1)
with Ri→S = Ri→S
(19)
(20)
(21)
(22)
(23)
(24)
(25)
(26)
(27)
(28)
(29)
(30)
(31)
(32)
(33)
= 0 and Ui→S,0 = Ui→S,K+1 = ∅ for S ∈ C (M) and i ∈
/ S.
With these definitions the rate-distortion region satisfies7 :
[
R̄3 (D, K) ⊆ R3 (D, K) ,
(34)
p∈P(D,K)
6
Notice that these definitions are motivated by the fact that at round 1, node 2 only recovers the descriptions generated by
node 1 and at round K + 1 only recovers what node 3 already generated at round K.
7
It is straightforward to show that the LHS of equation (34) is convex, which implies that the convex hull operation is not
needed.
November 7, 2018
DRAFT
13
where P(D, K) denotes the set of all joint probability measures associated with the following
Markov chains for every l ∈ [1 : K]:
1) U1→23,l −− (X1 , W[1,l] ) −− (X2 , X3 , V[12,l,1] , V[13,l,1] , V[23,l,2] ) ,
2) U1→2,l −− (X1 , W[2,l] , V[12,l,1] ) −− (X2 , X3 , V[13,l,1] , V[23,l,2] ) ,
3) U1→3,l −− (X1 , W[2,l] , V[13,l,1] ) −− (X2 , X3 , V[12,l,2] , V[23,l,2] ) ,
4) U2→13,l −− (X2 , W[2,l] ) −− (X1 , X3 , V[12,l,2] , V[13,l,3] , V[23,l,2] ) ,
5) U2→1,l −− (X2 , W[3,l] , V[12,l,2] ) −− (X1 , X3 , V[13,l,3] , V[23,l,2] ) ,
6) U2→3,l −− (X2 , W[3,l] , V[23,l,2] ) −− (X1 , X3 , V[12,l+1,1] , V[13,l,3] ) ,
7) U3→12,l −− (X3 , W[3,l] ) −− (X1 , X2 , V[12,l+1,1] , V[13,l,3] , V[23,l,3] ) ,
8) U3→1,l −− (X3 , W[1,l+1] , V[13,l,3] ) −− (X1 , X2 , V[12,l+1,1] , V[23,l,3] ) ,
9) U3→2,l −− (X3 , W[1,l+1] , V[23,l,3] ) −− 1(X1 , X2 , V[12,l+1,1] , V[13,l+1,1] ) ,
and such that there exist reconstruction mappings:
h
i
gij Xi , V[ij,K+1,1] ,W[1,K+1] = X̂ij
(35)
with E dj (Xj , X̂ij ) ≤ Dij for each i, j ∈ M and i 6= j.
The proof of this theorem is relegated to Appendix C and relies on the auxiliary results
presented in Appendix A and the theorem on the cooperative Berger-Tung problem with side
information presented in Appendix B.
Remark 4: It is worth mentioning here that our coding scheme is constrained to use successive
decoding, i.e., by recovering first the coding layer of common descriptions and then coding
layer of private descriptions (at each coding layer each node employ joint-decoding). Obviously,
this is a sub-optimum procedure since the best scheme would be to use joint decoding where
both common and private informations can be jointly recovered. However, the analysis of this
scheme is much more involved. The associated achievable rate region involves a large number of
equations that combine rates belonging to private and common messages from different nodes.
Also, several mutual information terms in each of these rate equations cannot be combined,
leading to a proliferation of many equations that offer little insight to the problem.
Remark 5: The idea behind our derivation of the achievable region can be extended to any
number M (> 3) of nodes in the network. This can be accomplished by generating a greater
number of superimposed coding layers. First a layer of codes that generates descriptions destined
November 7, 2018
DRAFT
14
to be decoded by all nodes. The next layer corresponding to all subsets of size M − 1, etc, until
we reach the final layer composed by codes that generate private descriptions for each of nodes.
Again, successive decoding is used at the nodes to recover the descriptions in these layers
destined to them. Of course, the number of required descriptions will increase with the number
of nodes as well as the obtained rate-distortion region.
Remark 6: It is interesting to compare the main ideas of our scheme with those of Kaspi
[13]. The main idea in [13] is to have a single coding tree shared by the two nodes. Each leaf
in the coding tree is codeword generated either at node 1 or 2. At a given round each node
knows (assuming no errors at the encoding and decoding procedures) the path followed in the
tree. For example, at round l, node 1, using the knowledge of the path until round l and its
source realization generate a leaf (from a set of possible ones) using joint typicality encoding
and binning. Node 2, using the same path known at node 1 and its source realization, uses joint
typicality decoding to estimate the leaf generate at node 1. If there is no error at these encoding
and decoding steps, the previous path is updated with the new leaf and both -node 1 and 2know the updated path. Node 2 repeats the procedure. This is done until round K where the
final path is known at both nodes and used to reconstruct the desired sources.
In the case of three nodes the situation is more involved. At a given round, the encoder at an
arbitrary node is seeing two decoders with different side information8 . In order to simplify the
explanation consider that we are at round l in the encoder 1, and that the listening nodes are
nodes 2 and 3. This situation forces node 1 to encode two sets of descriptions: one common
for the other two nodes and a set of private ones associated with each of the listening nodes 2
and 3. Following the ideas of Kaspi, it is then natural to consider three different coding trees
followed by node 1. One coding tree has leaves that are the common descriptions generated
and shared by all the nodes in the network. The second tree is composed by leaves that are
the private descriptions generated and shared with node 2. The third tree is composed by leaves
that are the private descriptions generated and shared with node 3. As the private descriptions
refine the common ones, depending on the quality of the side information of the node that is the
intended recipient, it is clear that descriptions are correlated. For example, the private description
destined to node 2, should depend not only on the past private descriptions generated and shared
8
Because at each node the source realizations are different, and the recovered previous descriptions can also be different.
November 7, 2018
DRAFT
15
by nodes 1 and 2, but also on the common descriptions generated at all previous rounds in all
the nodes and on the common description generated at the present round in node 1. Something
similar happens for the private description destined to node 3. It is clear that as the common
descriptions are to be recovered by all the nodes in the network, they can only be conditioned
with respect to the past common descriptions generated at previous rounds and with respect to
the common descriptions generated at the present round by a node who acted before (i.e. at
round l node 1 acts before node 2). The private descriptions, as they are only required to be
recovered at some set of nodes, can be generated conditioned on the past exchanged common
descriptions and the past private descriptions generated and recovered in the corresponding set
of nodes (i.e., the private descriptions exchanged between nodes 1 and 2 at round l, can only
be generated conditioned on the past common descriptions generated at nodes 1, 2 and 3 and
on the past private descriptions exchanged only between 1 and 2).
We can see clearly that there are basically four paths to be cooperatively followed in the
network:
•
One path of common descriptions shared by nodes 1, 2 and 3.
•
One path of private descriptions shared by nodes 1 and 2.
•
One path of private descriptions shared by nodes 1 and 3.
•
One path of private descriptions shared by nodes 2 and 3.
It is also clear that each node only follows three of these paths simultaneously. The exchange of
common descriptions deserves special mention. Consider the case at round l in node 3. This node
needs to recover the common descriptions generated at nodes 1 and 2. But at the moment node
2 generated its own common description, it also recovered the common one generated at node 1.
This allows for a natural explicit cooperation between nodes 1 and 2 in order to help node 3 to
recover both descriptions. Clearly, this is not the case for private descriptions from nodes 1 and 2
to be recovered at node 3. Node 2 does not recover the private description from node 1 to 3 and
cannot generate an explicit collaboration to help node 3 to recover both private descriptions. Note,
however, that as both private descriptions will be dependent on previous common descriptions
an implicit collaboration (intrinsic to the code generation) is also in force. In appendix B we
consider the problem (not in the interactive setting) of generating the explicit cooperation for
the common descriptions through the use of what we call a super-binning procedure, in order
November 7, 2018
DRAFT
16
to use the results for our interactive three-node problem.
IV. K NOWN C ASES AND R ELATED W ORK
Several inner bounds and rate-distortion regions on multiterminal source coding problems can
be derived by specializing the inner bound (34). Below we summarize only a few of them.
1) Distributed source coding with side information [4], [19]: Consider the distributed source
coding problem where two nodes encode separately sources X1n and X2n to rates (R1 , R2 ) and a
decoder by using side information X3n must reconstruct both sources with average distortion less
than D1 and D2 , respectively. By considering only one-round/one-way information exchange
from nodes 1 and 2 (the encoders) to node 3 (the decoder), the results in [4], [19] can be
recovered as a special case of the inner bound (34). Specifically, we set:
U1→23,l =U2→13,l = U3→12,l = U1→2,l = U2→1,l = U3→1,l = U3→2,l = ∅, ∀l
U1→3,l =U2→3,l = ∅, ∀l > 1 .
In this case, the Markov chains of Theorem 1 reduce to:
U1→3,1 −−X1 −− (X2 , X3 , U2→3,1 ) ,
(36)
U2→3,1 −−X2 −− (X1 , X3 , U1→3,1 ) ,
(37)
and thus the inner bound from Theorem 1 recovers the results in [19]
R1 >I(X1 ; U1→3,1 |X3 U2→3,1 ) ,
(38)
R2 >I(X2 ; U2→3,1 |X3 U1→3,1 ) ,
(39)
R1 + R2 >I(X1 X2 ; U1→3,1 U2→3,1 |X3 ) .
(40)
2) Source coding with side information at 2-decoders [10], [11]: Consider the setting where
one encoder X1n transmits descriptions to two decoders with different side informations (X2n , X3n )
and distortion requirements D2 and D3 . Again we consider only one way/round information
exchange from node 1 (the encoder) to nodes 2 and 3 (the decoders).
In this case, we set:
U2→13,l =U3→12,l = U2→1,l = U3→1,l = U3→2,l = U2→3,l = ∅, ∀l
U1→23,l =U1→23,l = U1→2,l = U1→3,l = ∅, ∀l > 1 .
November 7, 2018
DRAFT
17
The above Markov chains imply
(U1→23,1 , U1→2,1 , U1→3,1 ) −− X1 −− (X2 , X3 )
(41)
and thus the inner bound from Theorem 1 reduces to the results in [10], [11]
R1 >max I(X1 ; U1→23,1 |X2 ) , I(X1 ; U1→23,1 |X3 )
+I(X1 ; U1→2,1 |X2 U1→23,1 ) + I(X1 ; U1→3,1 |X3 U1→23,1 ) .
(42)
3) Two terminal interactive source coding [13]: Our inner bound (34) is basically the generalization of the two terminal problem to the three-terminal setting. Assume only two encodersdecoders X1n and X2n which must reconstruct the other terminal source 3 with distortion constraints D1 and D2 , and after K rounds of information exchange. Let us set:
U1→23,l =U2→13,l = U3→12,l = U1→3,l = U3→1,l = U2→3,l = U3→2,l = ∅, ∀l
X3 =∅ .
The Markov chains become
U1→2,l −− (X1 , V[12,l,1] ) −− X2 ,
(43)
U2→1,l −− (X2 , V[12,l,2] ) −− X2 ,
(44)
for l ∈ [1 : K] and thus the inner bound from Theorem 1 permit us to obtain the results in [13]
R1 >I(X1 ; V[12,K+1,1] |X2 ) ,
(45)
R2 >I(X2 ; V[12,K+1,2] |X1 ) .
(46)
4) Two terminal interactive source coding with a helper [15]: Consider now two encoders/decoders,
namely X2n and X3n , that must reconstruct the other terminal source with distortion constraints
D2 and D3 , respectively, using K communication rounds. Assume also that another encoder
X1n provides both nodes (2, 3) with a common description before beginning the information
exchange and then remains silent. Such common description can be exploited as coded side
information. Let us set:
U2→13,l =U3→12,l = U1→3,l = U1→2,l = U1→3,l = U2→1,l = U3→1,l = ∅, ∀l
U1→23,l =∅, ∀l > 1 .
November 7, 2018
DRAFT
18
The Markov chains reduce to:
U1→23,1 −−X1 −− (X2 , X3 ) ,
(47)
U2→3,l −−(X2 , U1→23,1 , V[23,l,2] ) −− (X1 , X3 ) ,
(48)
U3→2,l −−(X3 , U1→23,1 , V[23,l,3] ) −− (X1 , X2 ) .
(49)
An inner bound to the rate-distortion region for this problem reduces to (using the rate equations
in our Theorem 1)
R1 >max I(X1 ; U1→23,1 |X2 ), I(X1 ; U1→23,1 |X3 ) ,
R2 >I(X2 ; V[23,K+1,2] |X3 U1→23,1 ) ,
(51)
R3 >I(X3 ; V[23,K+1,2] |X2 U1→23,1 ) .
(52)
(50)
This region contains as a special case the region in [15]. In that paper it is further assumed
(in order to have a converse result) that X1 − − X3 − − X2 . Then, the value of R1 satisfies
R1 > I(X1 ; U1→23,1 |X2 ). Obviously, with the same extra Markov chain we obtain the same
limiting value for R1 and the above region is the rate-distortion region.
V. N EW R ESULTS ON I NTERACTIVE AND C OOPERATIVE S OURCE C ODING
A. Two encoders and one decoder subject to lossy/lossless reconstruction constraints without
side information
Consider now the problem described in Fig. 2 where encoder 1 wishes to communicate the
source X1n to node 3 in a lossless manner while encoder 2 wishes to send a lossy description
of the source X2n to node 3 with distortion constraint D31 . To achieve this, the encoders use K
communication rounds. This problem can be seen as the cooperating encoders version of the
well-known Berger-Yeung [5] problem.
Theorem 2: The rate-distortion region of the setting described in Fig. 8 is given by the union
over all joint probability measures pX1 X2 U2→13 such that there exists a reconstruction mapping:
h
i
g32 (X1 , U2→13 ) = X̂32 with E d(X2 , X̂32 ) ≤ D32 ,
(53)
November 7, 2018
DRAFT
19
X1n
Node 1
R1
R1
X2n
Node 3
R2
n
X̂31
≈ X1n
n
(X̂32
, D32 )
R2
Node 2
Figure 2: Two encoders and one decoder subject to lossy/lossless reconstruction constraints
without side information.
of the set of all tuples satisfying:
R1 ≥ H(X1 |X2 ) ,
(54)
R2 ≥ I(X2 ; U2→13 |X1 ) ,
(55)
R1 + R2 ≥ H(X1 ) + I(X2 ; U2→13 |X1 ) .
(56)
The auxiliary random variable U2→13 has a cardinality bound of kU2→13 k ≤ kX1 kkX2 k + 1.
Remark 7: It is worth emphasizing that the rate-distortion region in Theorem 2 outperforms the
non-cooperative rate-distortion region first derived in [5]. This is due to two facts: the conditional
entropy given in the rate constraint (54) which is strictly smaller than the entropy H(X1 ) present
in the rate-region in [5], and the fact that the random description U2→13 may be arbitrarily
dependent on both sources (X1 , X2 ) which is not the case without cooperation [5]. Therefore,
cooperation between encoders 1 and 2 reduces the rate needed to communicate the source X1
while increasing the optimization set of all admissible source descriptions.
Remark 8: Notice that the rate-distortion region in Theorem 2 is achievable with a single
round of interactions K = 1, which implies that multiple rounds do not improve the ratedistortion region in this case. This holds because of the fact that node 3 reconstruct X1 in a
lossless fashion.
Remark 9: Although in the considered setting of Fig. 8 node 1 is not supposed to decode
neither a lossy description nor the complete source X2n , if nodes 1 and 3 wish to recover the same
November 7, 2018
DRAFT
20
descriptions the optimal rate-region remains the same as given in Theorem 2. The only difference
relies on the fact that node 1 is now able to find a function g12 (X1 , U2→13 ) = X̂12 which must
h
i
satisfy an additional distortion constraint E d(X2 , X̂12 ) ≤ D12 . In order to show this, it is
enough to check that in the converse proof given below the specific choice of the auxiliary
random variable already allows node 1 to recover a general function X̂12[t] = g12 X1[t] , U2→13[t]
for each time t ∈ {1, . . . , n}.
Proof: The direct part of the proof simply follows by choosing:
U3→12,l =U1→3,l = U1→2,l = U2→1,l = U2→3,l = U3→1,l = U3→2,l = ∅, ∀l
U1→23,1 =X1 ,
U1→23,l = U2→13,l = ∅ ∀ l > 1 ,
and thus the rate-distortion region (34) reduces to the desired region in Theorem 2 where for
simplicity we dropped the round index. We now proceed to the proof of the converse.
If a pair of rates (R1 , R2 ) and distortion D32 are admissible for the K-steps interactive
cooperative distributed source coding setting described in Fig. 8, then for all ε > 0 there exists
n0 (ε, K), such that ∀ n > n0 (ε, K) there exists a K-steps interactive source code (n, K, F, G)
with intermediate rates satisfying:
K
1X
log kJil k ≤ Ri + ε , i ∈ {1, 2}
n l=1
(57)
and with average per-letter distortions with respect to the source 2 and perfect reconstruction
with respect to the source 1 at node 3:
h
i
n
E d(X2n , X̂32
) ≤ D32 + ε ,
n
≤ε,
Pr X1n 6= X̂31
(58)
(59)
where
[1:K]
[1:K]
[1:K]
[1:K]
n
n
.
, X̂31
≡ g31 J1 , J2
X̂32
≡ g32 J1 , J2
(60)
For each t ∈ {1, . . . , n}, define random variables U2→13[t] as follows:
[1:K]
[1:K]
(61)
U2→13[t] , J1 , J2 , X1[1:t−1] , X1[t+1:n] .
n
≤ ε and Fano’s inequality [21], we have
By the condition (59) which says that Pr X1n 6= X̂31
n
n
n
, nǫn ,
(62)
log2 (kX1n k − 1) + H2 Pr X1n 6= X̂31
H(X1n |X̂31
) ≤ Pr X1n 6= X̂31
where ǫn (ε) → 0 provided that ε → 0 and n → ∞.
November 7, 2018
DRAFT
21
1) Rate at node 1: For the first rate, we have
[1:K]
n(R1 + ε) ≥ H J1
[1:K]
≥ I J1 ; X1n |X2n
n
n [1:K]
= nH(X1 |X2 ) − H X1 |X2 J1
(a)
[1:K] [1:K]
= nH(X1 |X2 ) − H X1n |X2n J1 J2
(63)
(64)
(65)
(66)
(b)
n
≥ nH(X1 |X2 ) − H(X1n |X̂31
)
(67)
≥ nH(X1 |X2 ) − nǫn ,
(68)
(b)
where
•
[1:K]
step (a) follows from the fact that by definition of the code the sequence J2
of the source X2n and the vector of messages
•
•
is a function
[1:K]
J1 ,
step (b) follows from the code assumption that guarantees the existence of a reconstruction
[1:K]
[1:K]
n
,
function X̂31
≡ g31 J1 , J2
step (c) follows from Fano’s inequality in (62).
November 7, 2018
DRAFT
22
2) Rate at node 2: For the second rate, we have
[1:K]
n(R2 + ε) ≥ H J2
[1:K]
≥ I J2 ; X1n X2n
[1:K]
[1:K]
n
n
n
= I J2 ; X1 + I J2 ; X2 |X1
(a)
≥I
(b)
=I
(c)
≥I
[1:K]
J2 ; X1n
[1:K]
J2 ; X1n
[1:K]
J2 ; X1n
(e)
where
•
[1:K]
(70)
(71)
n
X
[1:K]
I J2 ; X2[t] |X1[t] X1[t+1:n] X1[1:t−1] X2[1:t−1]
+
+
+
[1:K]
= I J2 ; X1n +
(d)
(69)
t=1
n
X
t=1
n
X
t=1
n
X
t=1
I
[1:K]
J2 X1[t+1:n] X1[1:t−1] X2[1:t−1] ; X2[t] |X1[t]
I U2→13[t] ; X2[t] |X1[t]
(73)
(74)
I U2→13[Q] ; X2[Q] |X1[Q] , Q = t
= I J2 ; X1n + nI U2→13[Q] ; X2[Q] |X1[Q] , Q
(f )
[1:K]
e2→13 ; X2 |X1
≥ I J2 ; X1n + nI U
(g)
e
≥ nI U2→13 ; X2 |X1 ,
(72)
(75)
(76)
(77)
(78)
step (a) follows from the chain rule for conditional mutual information and non-negativity
of mutual information,
•
step (b) follows from the memoryless property across time of the sources (X1n , X2n ),
•
step (c) follows from the non-negativity of mutual information and definitions (61),
•
step (d) follows from the use of a time sharing random variable Q uniformly distributed
over the set {1, . . . , n},
•
•
•
step (e) follows from the definition of the conditional mutual information,
e2→13 , (U2→13[Q] , Q),
step (f ) follows by letting a new random variable U
step (g) follows from the non-negativity of mutual information.
November 7, 2018
DRAFT
23
3) Sum-rate of nodes 1 and 2: For the sum-rate, we have
[1:K]
n(R1 + R2 + 2ε) ≥ H J1
+ n(R2 + ε)
(a)
[1:K]
[1:K]
e2→13 ; X2 |X1
+ I J2 ; X1n + nI U
≥ H J1
[1:K]
[1:K]
[1:K]
[1:K]
+ I J1 ; J2
= H J1 |J2
[1:K]
e2→13 ; X2 |X1
+ I J2 ; X1n + nI U
[1:K]
[1:K]
[1:K]
[1:K]
≥ I J1 ; X1n |J2
+ I J1 ; J2
[1:K]
n
e
+ I J2 ; X1 + nI U2→13 ; X2 |X1
[1:K]
[1:K]
[1:K]
e2→13 ; X2 |X1
+ nI U
= I J1 ; X1n + I X1n J1 ; J2
h
i
(b)
e2→13 ; X2 |X1
= n H(X1 ) + I U
[1:K]
[1:K]
[1:K]
+ I X1n J1 ; J2
− H X1n |J1
i
h
(c)
[1:K] [1:K]
n
e
≥ n H(X1 ) + I U2→13 ; X2 |X1 − H X1 |J1 J2
i
h
(d)
n
e
)
≥ n H(X1 ) + I U2→13 ; X2 |X1 − H(X1n |X̂31
i
h
(e)
e2→13 ; X2 |X1 − ǫn ,
≥ n H(X1 ) + I U
(79)
(80)
(81)
(82)
(83)
(84)
(85)
(86)
(87)
where
•
step (a) follows from inequality (77),
•
step (b) follows from the memoryless property across time of the source X1n ,
•
step (c) follows from non-negativity of mutual information,
•
step (d) follows from the code assumption that guarantees the existence of reconstruction
[1:K]
[1:K]
n
and from the fact that unconditioning increases entropy,
function X̂31
≡ g31 J1 , J2
•
step (e) from Fano’s inequality in (62).
[1:K]
[1:K]
n
and lossy
4) Distortion at node 3: Node 3 reconstructs lossless X̂31
≡ g31 J1 , J2
[1:K]
[1:K]
n
. For each t ∈ {1, . . . , n}, define a function X̂32[t] as beging the t-th
X̂32
≡ g32 J1 , J2
coordinate of this estimate:
X̂32[t] U2→13[t] , g32[t]
November 7, 2018
[1:K]
[1:K]
J1 , J2
.
(88)
DRAFT
24
The component-wise mean distortion thus verifies
i
h
[1:K]
[1:K]
D32 + ε ≥ E d X2 , g31 J1 , J2
(89)
i
1X h
E d X2[t] , X̂32[t] U2→13[t]
n t=1
n
=
1X h
E d X2[Q] , X̂32[Q] U2→13[Q]
n t=1
h
i
= E d X2[Q] , X̂32[Q] U2→13[Q]
i
h
e2→13
e32 U
,
= E d X2 , X
(90)
n
=
Q=t
i
e32 by
where we defined function X
e32 Q, U2→13[Q] , X̂32[Q] U2→13[Q] .
e2→13 = X
e32 U
X
(91)
(92)
(93)
(94)
This concludes the proof of the converse and thus that of the theorem.
B. Two encoders and three decoders subject to lossless/lossy reconstruction constraints with
side information
Consider now the problem described in Fig. 3 where encoder 1 wishes to communicate the
lossless the source X1n to nodes 2 and 3 while encoder 2 wishes to send a lossy description of its
source X2n to nodes 1 and 3 with distortion constraints D12 and D32 , respectively. In addition to
this, the encoders overhead the communication using K communication rounds. This problem
can be seen as a generalization of the settings previously investigated in [3], [5].
Theorem 3: The rate-distortion region of the setting described in Fig. 3 is given by the union
over all joint probability measures pX1 X2 X3 U2→13 U2→3 satisfying the Markov chain
(U2→13 , U2→3 ) −− (X1 , X2 ) −− X3
(95)
and such that there exists reconstruction mappings:
November 7, 2018
g32 (X1 , X3 , U2→13 , U2→3 )=X̂32
with
g12 (X1 , U2→13 )=X̂12
with
h
i
E d(X2 , X̂32 ) ≤ D32 ,
h
i
E d(X2 , X̂12 ) ≤ D12 ,
(96)
(97)
DRAFT
25
n
(X̂12
, D12 )
X1n
Node 1
R1
R1
Node 3
R2
n
(X̂32
, D32 )
R2
X2n
n
X̂31
≈ X1n
Node 2
X3n
n
X̂21
≈ X1n
Figure 3: Two encoders and three decoders subject to lossless/lossy reconstruction constraints
with side information.
of the set of all tuples satisfying:
R1 ≥ H(X1 |X2 ) ,
(98)
R2 ≥ I(U2→13 ; X2 |X1 ) + I(U2→3 ; X2 |U2→13 X1 X3 ) ,
(99)
R1 + R2 ≥ H(X1 |X3 ) + I(U2→13 U2→3 ; X2 |X1 X3 ).
(100)
The auxiliary random variables have cardinality bounds: kU2→13 k ≤ kX1 kkX2 k + 2, kU2→3 k ≤
kX1 kkX2 kkU2→13 k + 1.
Remark 10: Notice that the rate-distortion region in Theorem 3 is achievable with a single
round of interactions K = 1, which implies that multiple rounds do not improve the ratedistortion region in this case.
Remark 11: It is worth mentioning that cooperation between encoders reduces the rate needed
to communicate the source X2 while increasing the optimization set of all admissible source
descriptions.
Proof: The direct part of the proof follows by choosing:
U3→12,l =U1→3,l = U1→2,l = U2→1,l = U3→1,l = U3→2,l = ∅, ∀l
U1→23 ≡U1→23,1 = X1 ,
November 7, 2018
U1→23,l = U2→13,l = U2→3,l = ∅ ∀ l > 1 .
DRAFT
26
and U2→13,1 ≡ U2→13 and U2→3,1 ≡ U2→3 are auxiliary random variables that according to
Theorem 1 should satisfy:
U2→13 −− (X1 , X2 ) −− X3 , U2→3 , −−(U2→13 , X1 , X2 ) −− X3 .
(101)
Notice, however that these Markov chains are equivalent to (95). From the rate equations in
Theorem 1, and the above choices for the auxiliary random variables we obtain:
R1→23 >H(X1 |X2 ) ,
(102)
R2→13 >max {I(X2 ; U2→13 |X1 ), I(X2 ; U2→13 |X1 X3 )}
(103)
=I(X2 ; U2→13 |X1 ) ,
(104)
R1→23 + R2→13 >H(X1 |X3 ) + I(X2 ; U2→13 |X1 X3 ) ,
(105)
R2→3 >I(X2 ; U2→3 |U2→13 X1 X3 ) .
(106)
Noticing that R1 ≡ R1→23 and R2 ≡ R2→13 + R2→3 the rate-distortion region (34) reduces to
the desired region in Theorem 3, where for simplicity we dropped the round index. We now
proceed to the proof of the converse.
If a pair of rates (R1 , R2 ) and distortions (D12 , D32 ) are admissible for the K-steps interactive
cooperative distributed source coding setting described in Fig. 3, then for all ε > 0 there exists
n0 (ε, K), such that ∀ n > n0 (ε, K) there is a K-steps interactive source code (n, K, F, G) with
intermediate rates satisfying:
K
1X
log kJil k ≤ Ri + ε , i ∈ {1, 2}
n l=1
(107)
and with average per-letter distortions with respect to the source 2 and perfect reconstruction
with respect to the source 1 at all nodes:
h
i
n
n
E d(X2 , X̂32 ) ≤ D32 + ε ,
n
≤ε,
Pr X1n 6= X̂21
h
i
n
) ≤ D12 + ε ,
E d(X2n , X̂12
n
≤ε,
Pr X1n 6= X̂31
November 7, 2018
(108)
(109)
(110)
(111)
DRAFT
27
where
[1:K]
[1:K]
n
X̂32
≡ g32 J1 , J2 , X3n ,
[1:K]
[1:K]
n
X̂31
≡ g31 J1 , J2 , X3n ,
[1:K]
n
X̂12
≡ g12 J2 , X1n ,
[1:K]
n
X̂21
≡ g21 J1 , X2n .
(112)
(113)
For each t ∈ {1, . . . , n}, define random variables U2→13[t] and U2→3[t] as follows:
[1:K]
[1:K]
U2→13[t] , J1 , J2 , X1[1:t−1] , X1[t+1:n] , X3[1:t−1] ,
U2→3[t] , U2→13[t] , X3[t+1:n] , X2[1:t−1] .
(114)
(115)
The fact that these choices of the auxiliary random variables satisfy the Markov chain (95) can be
obtained from point 6) in Lemma 10. By the conditions (111) and (109), and Fano’s inequality,
we have
n
H(X1n |X̂31
)
≤ Pr
X1n
6=
n
X̂31
n
n
H(X1n |X̂21
) ≤ Pr X1n 6= X̂21
log2 (kX1n k
X1n
n
X̂31
, nǫn ,
6
=
− 1) + H2 Pr
n
6 X̂21
, nǫn ,
log2 (kX1n k − 1) + H2 Pr X1n =
(116)
(117)
where ǫn (ε) → 0 provided that ε → 0 and n → ∞.
1) Rate at node 1: For the first rate, we have
[1:K]
n(R1 + ε) ≥ H J1
[1:K]
n
≥ H J1 |X2
(a)
[1:K]
n
n
= I J1 ; X1 |X2
[1:K]
= nH(X1 |X2 ) − H X1n |X2n J1
(118)
(119)
(120)
(121)
(b)
n
≥ nH(X1 |X2 ) − H(X1n |X̂21
)
(122)
(c)
≥ n [H(X1 |X2 ) − ǫn ] ,
(123)
where
•
[1:K]
step (a) follows from the fact that by definition of the code the sequence J1
is a function
of the both sources (X1n , X2n ),
•
•
step (b) follows from the code assumption in (113) that guarantees the existence of a
[1:K]
n
reconstruction function X̂21
≡ g21 J1 , X2n ,
step (c) follows from Fano’s inequality in (117).
November 7, 2018
DRAFT
28
2) Rate at node 2: For the second rate, we have
[1:K]
n(R2 + ε) ≥ H J2
(a)
[1:K]
= I J2 ; X1n X2n X3n
(b)
[1:K]
≥ I J2 ; X2n X3n |X1n
(c)
[1:K] [1:K]
= I J1 J2 ; X2n X3n |X1n
[1:K] [1:K]
[1:K] [1:K]
n
n n
n
n
= I J1 J2 ; X3 |X1 + I J1 J2 ; X2 |X1 X3
(124)
(125)
(126)
(127)
(128)
n h
X
[1:K] [1:K]
n
=
I J1 J2 ; X3[t] |X1 , X3[1:t−1]
(d)
t=1
(e)
=
i
[1:K] [1:K]
+I J1 J2 ; X2[t] |X1n X3n X2[1:t−1]
(129)
n h
X
[1:K] [1:K]
I J1 J2 X1[1:t−1] X1[t+1:n] X3[1:t−1] ; X3[t] |X1[t]
t=1
+I
(f )
=
[1:K] [1:K]
J1 J2 X1[1:t−1] X1[t+1:n] X3[1:t−1] X3[t+1:n] X2[1:t−1] ; X2[t] |X1[t] X3[t]
n h
X
t=1
=
(g)
=
(h)
=
n h
X
t=1
n h
X
t=1
n h
X
t=1
where
I U2→13[t] ; X3[t] |X1[t] + I U2→13[t] ; X2[t] |X1[t] X3[t]
+I U2→3[t] ; X2[t] |X1[t] X3[t] U2→13[t]
i
i
I U2→13[t] ; X2[t] X3[t] |X1[t] + I U2→3[t] ; X2[t] |X1[t] X3[t] U2→13[t]
i
I U2→13[t] ; X2[t] |X1[t] + I U2→3[t] ; X2[t] |X1[t] X3[t] U2→13[t]
I U2→13[Q] ; X2[Q] |X1[Q] , Q = t
i
+I U2→3[Q] ; X2[Q] |X1[Q] X3[Q] U2→13[Q] , Q = t
i
(i) h
e
e
e
,
≥ n I U2→13 ; X2 |X1 + I U2→3 ; X2 |X1 X3 U2→13
[1:K]
step (a) follows from the fact that J2
•
step (b) follows from the non-negativity of mutual information,
•
step (c) follows from the fact that J1
[2:K]
(131)
(132)
(133)
(134)
(135)
is a function of the sources (X1n , X2n ),
•
November 7, 2018
i
(130)
[1:K]
is a function of J2
and the source X1n ,
DRAFT
29
•
step (d) follows from the chain rule for conditional mutual information,
•
step (e) follows from the memoryless property across time of the sources (X1n , X2n , X3n ),
•
step (f ) follows from the chain rule for conditional mutual information and the definitions (114) and (115),
•
step (g) follows from the Markov chain U2→13[t] −
−(X1[t] , X2[t] )−
−X3[t] , for all t ∈ {1, . . . , n},
•
step (h) follows from the use of a time sharing random variable Q uniformly distributed
over the set {1, . . . , n},
•
e2→13 , (U2→13[Q] , Q) and U
e2→3 ,
step (i) follows by letting new random variables U
(U2→3[Q] , Q).
3) Sum-rate of nodes 1 and 2: For the sum-rate, we have
[1:K]
[1:K]
+ H J2
n(R1 + R2 + 2ε) ≥ H J1
[1:K]
[1:K]
[1:K] [1:K]
+ I J1 ; J2
= H J1 J2
(a)
[1:K]
[1:K]
[1:K] [1:K]
= I J1 J2 ; X1n X3n X2n + I J1 ; J2
(b)
[1:K] [1:K]
n
n n
≥ I J1 J2 ; X1 X2 |X3
[1:K] [1:K]
[1:K] [1:K]
= I J1 J2 ; X1n |X3n + I J1 J2 ; X2n |X1n X3n
[1:K] [1:K]
= H (X1n |X3n ) − H X1n |J1 J2 X3n
[1:K] [1:K]
n
n n
+I J1 J2 ; X2 |X1 X3
(c)
[1:K] [1:K]
n
n
n
n n
n
n
≥ H (X1 |X3 ) − H(X1 |X̂31 ) + I J1 J2 ; X2 |X1 X3
(d)
[1:K] [1:K]
≥ n [H (X1 |X3 ) − ǫn ] + I J1 J2 ; X2n |X1n X3n
(e)
=
(136)
(137)
(138)
(139)
(140)
(141)
(142)
(143)
n
X
[1:K] [1:K]
I J1 J2 X1[1:t−1] X1[t+1:n] X3[1:t−1] X3[t+1:n] X2[1:t−1] ; X2[t] |X1[t] X3[t]
t=1
+ n [H (X1 |X3 ) − ǫn ]
(f )
= n [H (X1 |X3 ) − ǫn ] +
November 7, 2018
t=1
I U2→13[t] U2→3[t] ; X2[t] |X1[t] X3[t]
= n H (X1 |X3 ) − ǫn + I U2→13[Q] U2→3[Q] ; X2[Q] |X1[Q] X3[Q] , Q
i
h
(h)
e2→13 U
e2→3 ; X2 |X1 X3
,
= n H (X1 |X3 ) − ǫn + I U
(g)
where
(144)
n
X
(145)
(146)
(147)
DRAFT
30
•
[1:K]
step (a) follows from the fact that J1
[1:K]
and J2
are functions of the sources (X1n , X2n , X3n ),
to emphasize
•
step (b) follows non-negativity of mutual information,
•
step (c) follows from the code assumption in (113) that guarantees the existence of recon
[1:K]
[1:K]
n
n
struction function X̂31 ≡ g31 J1 , J2 , X3 ,
•
step (d) follows from Fano’s inequality in (111),
•
step (e) follows from the chain rule of conditional mutual information and the memoryless
property across time of the source (X1n , X2n , X3n ),
•
step (f ) from follows from the definitions (114) and (115),
•
step (g) follows from the use of a time sharing random variable Q uniformly distributed
over the set {1, . . . , n},
•
e2→13 , (U2→13[Q] , Q) and U
e2→3 ,
step (h) follows by letting new random variables U
(U2→3[Q] , Q).
[1:K]
n
4) Distortion at node 1: Node 1 reconstructs a lossy X̂12
≡ g12 J2 , X1n . It is clear that
[1:K]
[1:K]
n
we write without loss of generality X̂12
≡ g12 J1 , J2 , X1n . For each t ∈ {1, . . . , n},
define a function X̂12[t] as beging the t-th coordinate of this estimate:
[1:K]
[1:K]
X̂12[t] U2→13[t] , X1[t] , g12[t] J1 , J2 , X1n .
(148)
The component-wise mean distortion thus verifies
h
i
[1:K]
[1:K]
D12 + ε ≥ E d X2 , g12 J1 , J2 , X1n
(149)
i
1X h
E d X2[t] , X̂12[t] U2→13[t] , X1[t]
n t=1
n
=
1X h
E d X2[Q] , X̂12[Q] U2→13[Q] , X1[Q]
n t=1
h
i
= E d X2[Q] , X̂12[Q] U2→13[Q] , X1[Q]
i
h
e
e
,
= E d X2 , X12 U2→13 , X1
(150)
n
=
Q=t
i
(151)
(152)
(153)
e12 by
where we defined function X
e12 Q, U2→13[Q] , X1[Q] , X̂12[Q] U2→13[Q] , X1[Q] .
e12 U
e2→13 , X1 = X
X
(154)
November 7, 2018
DRAFT
31
[1:K]
[1:K]
n
5) Distortion at node 3: Node 3 reconstructs a lossy description X̂32
≡ g32 J1 , J2 , X3n .
For each t ∈ {1, . . . , n}, define a function X̂32[t] as beging the t-th coordinate of this estimate:
[1:K]
[1:K]
(155)
X̂32[t] U2→13[t] , U2→3[t] , X3[t] , g32[t] J1 , J2 , X3n .
The component-wise mean distortion thus verifies
h
i
[1:K]
[1:K]
D32 + ε ≥ E d X2 , g32 J1 , J2 , X3n
(156)
i
1X h
E d X2[t] , X̂32[t] U2→13[t] , U2→3[t] , X3[t]
=
n t=1
n
1X h
=
E d X2[Q] , X̂32[Q] U2→13[Q] , U2→3[Q] , X3[Q]
n t=1
h
i
= E d X2[Q] , X̂32[Q] U2→13[Q] , U2→3[Q] , X3[Q]
i
h
e32 U
e2→13 , U
e2→3 , X3
,
= E d X2 , X
(157)
n
Q=t
e32 by
where we defined function X
e32 Q, U2→13[Q] , U2→3[Q] , X3[Q]
e2→13 , U
e2→3 , X3 = X
e32 U
X
, X̂32[Q] U2→13[Q] , U2→3[Q] , X3[Q] .
i
(158)
(159)
(160)
(161)
This concludes the proof of the converse and thus that of the theorem.
C. Two encoders and three decoders subject to lossless/lossy reconstruction constraints, reversal delivery and side information
Consider now the problem described in Fig. 4, where encoder 1 wishes to communicate the source X1^n losslessly to node 2 and a lossy description of it to node 3, while encoder 2 wishes to send a lossy description of its source X2^n to node 1 and a lossless one to node 3. The corresponding distortions at nodes 1 and 3 are D12 and D31, respectively. In addition, the encoders accomplish the communication using K communication rounds. This problem is very similar to the problem described in Fig. 3, with the difference that the decoding at node 3 is inverted.
Theorem 4: The rate-distortion region of the setting described in Fig. 4 is given by the union over all joint probability measures pX1X2X3U2→13 satisfying the Markov chain
U2→13 −− (X1, X2) −− X3   (162)
32
n
(X̂12
, D12 )
X1n
Node 1
R1
R1
Node 3
R2
n
(X̂31
, D31 )
R2
X2n
n
X̂32
≈ X2n
Node 2
X3n
n
X̂21
≈ X1n
to emphasize
Figure 4: Two encoders and three decoders subject to lossless/lossy reconstruction constraints,
reversal delivery and side information.
and such that there exist reconstruction mappings:
g31(X2, X3, U2→13) = X̂31 with E[d(X1, X̂31)] ≤ D31,   (163)
g12(X1, U2→13) = X̂12 with E[d(X2, X̂12)] ≤ D12,   (164)
of the set of all tuples satisfying:
R1 ≥ H(X1|X2),   (165)
R2 ≥ I(U2→13; X2|X1) + H(X2|U2→13 X1 X3),   (166)
R1 + R2 ≥ H(X1 X2|X3).   (167)
The auxiliary random variable has cardinality bound: ‖U2→13‖ ≤ ‖X1‖‖X2‖ + 3.
Remark 12: Notice that the rate-distortion region in Theorem 4 is achievable with a single round of interaction, K = 1, which implies that multiple rounds do not improve the rate-distortion region in this case.
Remark 13: Notice that, although node 3 requires only a lossy recovery of X1, it can in fact recover X1 perfectly. That is, as node 3 requires the perfect recovery of X2, it has the same information as node 2, which recovers X1 perfectly. This explains the sum-rate term. We also see that the cooperation helps in the Wyner-Ziv problem that exists between nodes 2 and 1, by enlarging the optimization region thanks to the Markov chain (162).
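As a small illustration of how the bounds of Theorem 4 can be evaluated, the following sketch (our own, not from the paper) computes H(X1|X2), I(U2→13; X2|X1) + H(X2|U2→13 X1 X3) and H(X1 X2|X3) for a hypothetical binary joint pmf and a hypothetical test channel p(u|x1, x2); both distributions and the parameter choices are illustrative assumptions.

```python
# Minimal numerical sketch: Theorem 4 rate bounds for a toy binary example.
import itertools
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector (zeros ignored)."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Hypothetical source: X1 ~ Bernoulli(1/2), X2 a noisy copy of X1, X3 a noisy copy of X2.
eps12, eps23 = 0.1, 0.2
p_src = np.zeros((2, 2, 2))
for x1, x2, x3 in itertools.product(range(2), repeat=3):
    p_src[x1, x2, x3] = 0.5 * (1 - eps12 if x2 == x1 else eps12) * (1 - eps23 if x3 == x2 else eps23)

# Hypothetical test channel p(u | x1, x2) enforcing U -- (X1, X2) -- X3 as in (162).
q = 0.25
p = np.zeros((2, 2, 2, 2))  # axes: (x1, x2, x3, u)
for x1, x2, x3, u in itertools.product(range(2), repeat=4):
    p[x1, x2, x3, u] = p_src[x1, x2, x3] * (1 - q if u == x2 else q)

def H(axes):
    """Joint entropy of the variables on the given axes of p."""
    other = tuple(a for a in range(4) if a not in axes)
    return entropy(p.sum(axis=other).ravel())

X1, X2, X3, U = 0, 1, 2, 3
R1 = H((X1, X2)) - H((X2,))                                # H(X1|X2), bound (165)
I_U = H((U, X1)) + H((X2, X1)) - H((U, X2, X1)) - H((X1,)) # I(U; X2|X1)
H_c = H((X2, U, X1, X3)) - H((U, X1, X3))                  # H(X2|U X1 X3)
Rsum = H((X1, X2, X3)) - H((X3,))                          # H(X1 X2|X3), bound (167)
print(f"R1 >= {R1:.4f}, R2 >= {I_U + H_c:.4f}, R1+R2 >= {Rsum:.4f}")
```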
Proof: The direct part of the proof follows by choosing:
U3→12,l = U1→3,l = U1→2,l = U2→1,l = U3→1,l = U3→2,l = ∅, ∀l,
U1→23 ≡ U1→23,1 = X1, U2→3 ≡ U2→3,1 = X2, U1→23,l = U2→13,l = U2→3,l = ∅ ∀l > 1,
and with U2→13,1 ≡ U2→13 an auxiliary random variable that, according to Theorem 1, should satisfy:
U2→13 −− (X1, X2) −− X3.   (168)
From the rate equations in Theorem 1 and the above choices for the auxiliary random variables we obtain:
R1→23 > H(X1|X2),   (169)
R2→13 > max{I(X2; U2→13|X1), I(X2; U2→13|X1 X3)}   (170)
= I(X2; U2→13|X1),   (171)
R1→23 + R2→13 > H(X1|X3) + I(X2; U2→13|X1 X3),   (172)
R2→3 > H(X2|U2→13 X1 X3).   (173)
Noticing that R1 ≡ R1→23 and R2 ≡ R2→13 + R2→3, the rate-distortion region (34) reduces to the desired region in Theorem 4, where for simplicity we dropped the round index. We now proceed to the proof of the converse.
If a pair of rates (R1, R2) and distortions (D12, D31) are admissible for the K-steps interactive cooperative distributed source coding setting described in Fig. 4, then for all ε > 0 there exists n0(ε, K) such that ∀ n > n0(ε, K) there exists a K-steps interactive source code (n, K, F, G) with intermediate rates satisfying:
(1/n) Σ_{l=1}^{K} log ‖Jil‖ ≤ Ri + ε, i ∈ {1, 2},   (174)
and with reconstruction constraints:
E[d(X1^n, X̂31^n)] ≤ D31 + ε,   (175)
Pr{X1^n ≠ X̂21^n} ≤ ε,   (176)
E[d(X2^n, X̂12^n)] ≤ D12 + ε,   (177)
Pr{X2^n ≠ X̂32^n} ≤ ε,   (178)
where
X̂32^n ≡ g32(J1^[1:K], J2^[1:K], X3^n), X̂12^n ≡ g12(J2^[1:K], X1^n),   (179)
X̂31^n ≡ g31(J1^[1:K], J2^[1:K], X3^n), X̂21^n ≡ g21(J1^[1:K], X2^n).   (180)
For each t ∈ {1, . . . , n}, define random variables U2→13[t] as follows:
U2→13[t] ≜ (J1^[1:K], J2^[1:K], X1[1:t−1], X1[t+1:n], X3[1:t−1]).   (181)
Using point 6) in Lemma 10 we can see that this choice satisfies (162). By the conditions (176) and (178), and Fano's inequality, we have
H(X1^n|X̂21^n) ≤ Pr{X1^n ≠ X̂21^n} log2(‖X1^n‖ − 1) + H2(Pr{X1^n ≠ X̂21^n}) ≜ nǫn,   (182)
H(X2^n|X̂32^n) ≤ Pr{X2^n ≠ X̂32^n} log2(‖X2^n‖ − 1) + H2(Pr{X2^n ≠ X̂32^n}) ≜ nǫn,   (183)
where ǫn(ε) → 0 provided that ε → 0 and n → ∞.
1) Rate at node 1: For the first rate, from cut-set arguments similar to the ones used in Theorem 3 and Fano's inequality, we can easily obtain:
n(R1 + ε) ≥ n[H(X1|X2) − ǫn].   (184)
2) Rate at node 2: For the second rate, we have
n(R2 + ε) ≥ H(J2^[1:K])
(a) = I(J2^[1:K]; X1^n X2^n X3^n)
(b) ≥ I(J2^[1:K]; X2^n X3^n | X1^n)
(c) = I(J1^[1:K] J2^[1:K]; X2^n X3^n | X1^n)
= I(J1^[1:K] J2^[1:K]; X3^n | X1^n) + I(J1^[1:K] J2^[1:K]; X2^n | X1^n X3^n)
(d) = Σ_{t=1}^{n} [I(J1^[1:K] J2^[1:K]; X3[t] | X1^n X3[1:t−1]) + I(J1^[1:K] J2^[1:K]; X2[t] | X1^n X3^n X2[1:t−1])]
(e) = Σ_{t=1}^{n} [I(J1^[1:K] J2^[1:K] X1[1:t−1] X1[t+1:n] X3[1:t−1]; X3[t] | X1[t]) + I(J1^[1:K] J2^[1:K] X1[1:t−1] X1[t+1:n] X3[1:t−1] X3[t+1:n] X2[1:t−1]; X2[t] | X1[t] X3[t])]
(f) = Σ_{t=1}^{n} [I(U2→13[t]; X3[t] | X1[t]) + I(U2→13[t] X3[t+1:n] X2[1:t−1]; X2[t] | X1[t] X3[t])]
= Σ_{t=1}^{n} [I(U2→13[t]; X2[t] X3[t] | X1[t]) + I(X3[t+1:n] X2[1:t−1]; X2[t] | X1[t] X3[t] U2→13[t])]
(g) = Σ_{t=1}^{n} [I(U2→13[t]; X2[t] | X1[t]) + H(X2[t] | X1[t] X3[t] U2→13[t]) − H(X2[t] | X1[t] X3[t:n] U2→13[t] X2[1:t−1])]
(h) ≥ Σ_{t=1}^{n} [I(U2→13[t]; X2[t] | X1[t]) + H(X2[t] | X1[t] X3[t] U2→13[t]) − ǫn]
(i) = Σ_{t=1}^{n} [I(U2→13[Q]; X2[Q] | X1[Q], Q = t) + H(X2[Q] | X1[Q] X3[Q] U2→13[Q], Q = t) − ǫn]
(j) ≥ n[I(Ũ2→13; X2 | X1) + H(X2 | X1 X3 Ũ2→13) − ǫn],   (185)–(196)
where
• step (a) follows from the fact that J2^[1:K] is a function of the sources (X1^n, X2^n),
• step (b) follows from the non-negativity of mutual information,
• step (c) follows from the fact that J1^[1:K] is a function of J2^[1:K] and the source X1^n,
• step (d) follows from the chain rule for conditional mutual information,
• step (e) follows from the memoryless property across time of the sources (X1^n, X2^n, X3^n),
• step (f) follows from the definition (181),
• step (g) follows from the Markov chain U2→13[t] −− (X1[t], X2[t]) −− X3[t] for all t ∈ {1, . . . , n} and the usual decomposition of mutual information,
• step (h) follows from the fact that Pr{X2[t] ≠ X̂32[t]} ≤ ǫ ∀t ∈ {1, . . . , n}, with X̂32[t] ≡ g32[t](U2→13[t], X3[t:n]), and Fano's inequality,
• step (i) follows from the use of a time-sharing random variable Q uniformly distributed over the set {1, . . . , n},
• step (j) follows by defining the new random variable Ũ2→13 ≜ (U2→13[Q], Q).
3) Sum-rate of nodes 1 and 2: From cut-set arguments and Fano's inequality, we can easily obtain:
n(R1 + R2 + 2ε) ≥ n[H(X1 X2|X3) − ǫn].   (197)
4) Distortion at node 1: Node 1 reconstructs a lossy description X̂12^n ≡ g12(J2^[1:K], X1^n). For each t ∈ {1, . . . , n}, define a function X̂12[t] as the t-th coordinate of this estimate:
X̂12[t](U2→13[t], X1[t]) ≜ g12[t](J2^[1:K], X1^n).   (198)
The component-wise mean distortion thus verifies
D12 + ε ≥ E[d(X2^n, g12(J2^[1:K], X1^n))]   (199)
= (1/n) Σ_{t=1}^{n} E[d(X2[t], X̂12[t](U2→13[t], X1[t]))]   (200)
= (1/n) Σ_{t=1}^{n} E[d(X2[Q], X̂12[Q](U2→13[Q], X1[Q])) | Q = t]   (201)
= E[d(X2[Q], X̂12[Q](U2→13[Q], X1[Q]))]   (202)
= E[d(X2, X̃12(Ũ2→13, X1))],   (203)
where we defined the function X̃12 by
X̃12(Ũ2→13, X1) = X̃12(Q, U2→13[Q], X1[Q]) ≜ X̂12[Q](U2→13[Q], X1[Q]).   (204)
5) Distortion at node 3: Node 3 reconstructs a lossy description X̂31^n ≡ g31(J1^[1:K], J2^[1:K], X3^n). For each t ∈ {1, . . . , n}, define a function X̄31[t] as:
X̄31[t](U2→13[t], X̂32[t], X3[t], X3[t+1:n]) ≜ g31[t](J1^[1:K], J2^[1:K], X3^n), t ∈ {1, . . . , n}.   (205)
This can be done because X̂32[t] is also a function of (J1^[1:K], J2^[1:K], X3^n). The component-wise mean distortion thus verifies
D31 + ε ≥ E[d(X1^n, g31(J1^[1:K], J2^[1:K], X3^n))]   (206)
= (1/n) Σ_{t=1}^{n} E[d(X1[t], X̄31[t](U2→13[t], X̂32[t], X3[t], X3[t+1:n]))]   (207)
(a) = (1/n) Σ_{t=1}^{n} E[d(X1[t], X̄31[t](U2→13[t], X2[t], X3[t], X3[t+1:n]))]   (208)
(b) ≥ (1/n) Σ_{t=1}^{n} E[d(X1[t], X̂31[t](U2→13[t], X2[t], X3[t]))]   (209)
= (1/n) Σ_{t=1}^{n} E[d(X1[Q], X̂31[Q](U2→13[Q], X2[Q], X3[Q])) | Q = t]   (210)
= E[d(X1[Q], X̂31[Q](U2→13[Q], X2[Q], X3[Q]))]   (211)
= E[d(X1, X̃31(Ũ2→13, X2, X3))],   (212)
where
• step (a) follows from the fact that X̄31[t](U2→13[t], X̂32[t], X3[t], X3[t+1:n]) can be trivially expressed as a function of (U2→13[t], X2[t], X3[t], X3[t+1:n]) as follows:
X̄31[t](U2→13[t], X2[t], X3[t], X3[t+1:n]) = X̄31[t](U2→13[t], X2[t], X3[t], X3[t+1:n]) if X2[t] = X̂32[t], and = X̄31[t](U2→13[t], X̂32[t], X3[t], X3[t+1:n]) if X2[t] ≠ X̂32[t],
• step (b) follows from the fact that X1[t] −− (U2→13[t] X2[t] X3[t]) −− X3[t+1:n], which implies that for all t ∈ {1, . . . , n} there exists X̂31[t](U2→13[t], X2[t], X3[t]) such that
E[d(X1[t], X̂31[t](U2→13[t], X2[t], X3[t]))] ≤ E[d(X1[t], X̄31[t](U2→13[t], X2[t], X3[t], X3[t+1:n]))].   (213)
We also defined the function X̃31 by
X̃31(Ũ2→13, X2, X3) = X̃31(Q, U2→13[Q], X2[Q], X3[Q]) ≜ X̂31[Q](U2→13[Q], X2[Q], X3[Q]).
This concludes the proof of the converse and thus that of the theorem.
D. Two encoders and three decoders subject to lossy reconstruction constraints with degraded side information
Consider now the problem described in Fig. 5, where encoder 1 has access to X1 and X3 and wishes to communicate a lossy description of X1 to nodes 2 and 3 with distortion constraints D21 and D31, while encoder 2 wishes to send a lossy description of its source X2^n to nodes 1 and 3 with distortion constraints D12 and D32. In addition, the encoders accomplish the communication using K communication rounds. This problem can be seen as a generalization of the settings previously investigated in [19]. This setup is motivated by the following application. Consider that node 1 transmits a probing signal X3 which is used to explore a spatial region (i.e. a radar transmitter). After transmission of this probing signal, node 1 measures the response (X1) at its location. Similarly, in a different location node 2 measures the response X2. Responses X1 and X2 have to be sent to node 3 (e.g. the fusion center), which has knowledge of the probing signal X3 and wants to reconstruct a lossy estimate of them. Nodes 1 and 2 cooperate through multiple rounds to accomplish this task.
[Figure 5: Two encoders and three decoders subject to lossy reconstruction constraints with degraded side information. Node 1 observes (X1^n, X3^n) and outputs (X̂12^n, D12); node 2 observes X2^n and outputs (X̂21^n, D21); node 3 has side information X3^n and outputs (X̂31^n, D31) and (X̂32^n, D32); rates R1 and R2 are exchanged.]
Theorem 5: The rate-distortion region of the setting described in Fig. 5, where X1 −− X3 −− X2 form a Markov chain, is given by the union over all joint probability measures pX1X2X3W[1,K+1]U1→3,K satisfying the following Markov chains:
U1→23,l −− (X1, X3, W[1,l]) −− X2,   (214)
U2→13,l −− (X2, W[2,l]) −− (X1, X3),   (215)
U1→3,K −− (X1, X3, W[2,K]) −− X2,   (216)
for all l = [1 : K], and such that there exist reconstruction mappings:
g12(X1, X3, U1→3,K, W[1,K+1]) = X̂12 with E[d(X2, X̂12)] ≤ D12,
g21(X2, W[1,K+1]) = X̂21 with E[d(X1, X̂21)] ≤ D21,
g31(X3, W[1,K+1], U1→3,K) = X̂31 with E[d(X1, X̂31)] ≤ D31,
g32(X3, W[1,K+1], U1→3,K) = X̂32 with E[d(X2, X̂32)] ≤ D32,
with W[1,l] = {U1→23,k, U2→13,k}_{k=1}^{l−1} for all l = [1 : K] (notice that U3→12,l = ∅ for all l because R3 = 0), of the set of all tuples satisfying:
R1 ≥ I(W[1,K+1]; X1 X3|X2) + I(U1→3,K; X1|W[1,K+1] X3),   (217)
R2 ≥ I(W[1,K+1]; X2|X3).   (218)
The auxiliary random variables have cardinality bounds:
‖U1→23,l‖ ≤ ‖X1‖‖X3‖ Π_{i=1}^{l−1} ‖U1→23,i‖‖U2→13,i‖ + 1, l ∈ [1 : K],   (219)
‖U2→13,l‖ ≤ ‖X2‖‖U1→23,l‖ Π_{i=1}^{l−1} ‖U1→23,i‖‖U2→13,i‖ + 1, l ∈ [1 : K],   (220)
‖U1→3,K‖ ≤ ‖X1‖‖X3‖ Π_{i=1}^{K} ‖U1→23,i‖‖U2→13,i‖ + 3.   (221)
Remark 14: Notice that multiple rounds are needed to achieve the rate-distortion region in Theorem 5. It is worth mentioning that encoders 1 and 2 first cooperate over the K rounds, while in the last round only node 1 sends a private description to node 3. Because of the Markov chain assumed for the sources we observe the following:
• Only node 1 sends a private description to node 3. This is due to the fact that node 3 has better side information than node 2.
• For the transmissions from node 2, nodes 1 and 3 can be thought of as a single node, and there is no reason for node 2 to send a private description to node 1 or node 3.
• Notice that there is no sum-rate constraint. Node 3 recovers the descriptions generated at nodes 1 and 2 without resorting to joint decoding. That is, node 3 can recover the descriptions generated at nodes 1 and 2 separately and independently.
Proof: The direct part of the proof follows by choosing:
U1→2,l = U2→1,l = U3→12,l = U3→1,l = U3→2,l = U2→3,l = ∅, ∀l,
U1→3,l = ∅, l < K,
and U1→23,l, U2→13,l and U1→3,K are auxiliary random variables that, according to Theorem 1, should satisfy the Markov chains (214)-(216). Cumbersome but straightforward calculations allow us to obtain the desired result. We now proceed to the proof of the converse.
If a pair of rates (R1, R2) and distortions (D12, D21, D31, D32) are admissible for the K-steps interactive cooperative distributed source coding setting described in Fig. 5, then for all ε > 0 there exists n0(ε, K) such that ∀ n > n0(ε, K) there exists a K-steps interactive source code (n, K, F, G) with intermediate rates satisfying:
(1/n) Σ_{l=1}^{K} log ‖Jil‖ ≤ Ri + ε, i ∈ {1, 2},   (222)
and with average per-letter distortions
E[d(X1^n, X̂21^n)] ≤ D21 + ε,   (223)
E[d(X1^n, X̂31^n)] ≤ D31 + ε,   (224)
E[d(X2^n, X̂12^n)] ≤ D12 + ε,   (225)
E[d(X2^n, X̂32^n)] ≤ D32 + ε,   (226)
where
X̂32^n ≡ g32(J1^[1:K], J2^[1:K], X3^n), X̂12^n ≡ g12(J1^[1:K], J2^[1:K], X1^n, X3^n),   (227)
X̂31^n ≡ g31(J1^[1:K], J2^[1:K], X3^n), X̂21^n ≡ g21(J1^[1:K], J2^[1:K], X2^n).   (228)
For each t ∈ {1, . . . , n}, define random variables (U1→3,[t], U2→3,[t]) and the sequences of random variables (U1→23,k,[t], U2→13,k,[t])_{k=[1:K]} as follows:
U1→23,1,[t] ≜ (J1^1, X3[1:t−1], X2[t+1:n]),   (229)
U2→13,1,[t] ≜ J2^1,   (230)
U1→23,k,[t] ≜ J1^k, ∀ k = [2 : K],   (231)
U2→13,k,[t] ≜ J2^k, ∀ k = [2 : K],   (232)
U1→3,K,[t] ≜ X3[t+1:n].   (233)
From Corollary 4 in the Appendices we see that these choices satisfy equations (214), (215) and (216).
1) Rate at node 1: For the first rate, we have
n(R1 + ε) ≥ H(J1^[1:K])
(a) = I(J1^[1:K]; X1^n X2^n X3^n)
(b) ≥ I(J1^[1:K]; X1^n X3^n | X2^n)
(c) = I(J1^[1:K] J2^[1:K]; X1^n X3^n | X2^n)
= I(J1^[1:K] J2^[1:K]; X1^n | X2^n X3^n) + I(J1^[1:K] J2^[1:K]; X3^n | X2^n)
(d) = Σ_{t=1}^{n} [I(J1^[1:K] J2^[1:K] X2[1:t−1] X2[t+1:n] X3[1:t−1] X3[t+1:n] X1[1:t−1]; X1[t] | X2[t] X3[t]) + I(J1^[1:K] J2^[1:K] X3[1:t−1] X2[1:t−1] X2[t+1:n]; X3[t] | X2[t])]
(e) ≥ Σ_{t=1}^{n} [I(J1^[1:K] J2^[1:K] X2[t+1:n] X3[1:t−1] X3[t+1:n] X1[1:t−1]; X1[t] | X2[t] X3[t]) + I(J1^[1:K] J2^[1:K] X3[1:t−1] X2[t+1:n]; X3[t] | X2[t])]
= Σ_{t=1}^{n} [I(J1^[1:K] J2^[1:K] X2[t+1:n] X3[1:t−1]; X1[t] | X2[t] X3[t]) + I(X3[t+1:n] X1[1:t−1]; X1[t] | J1^[1:K] J2^[1:K] X2[t:n] X3[1:t]) + I(J1^[1:K] J2^[1:K] X3[1:t−1] X2[t+1:n]; X3[t] | X2[t])]
= Σ_{t=1}^{n} [I(J1^[1:K] J2^[1:K] X2[t+1:n] X3[1:t−1]; X1[t] X3[t] | X2[t]) + I(X3[t+1:n] X1[1:t−1]; X1[t] | J1^[1:K] J2^[1:K] X2[t:n] X3[1:t])]
(f) = Σ_{t=1}^{n} [I(J1^[1:K] J2^[1:K] X2[t+1:n] X3[1:t−1]; X1[t] X3[t] | X2[t]) + I(X3[t+1:n] X1[1:t−1]; X1[t] X2[t] | J1^[1:K] J2^[1:K] X2[t+1:n] X3[1:t])]
(g) ≥ Σ_{t=1}^{n} [I(J1^[1:K] J2^[1:K] X2[t+1:n] X3[1:t−1]; X1[t] X3[t] | X2[t]) + I(X3[t+1:n]; X1[t] | J1^[1:K] J2^[1:K] X2[t+1:n] X3[1:t])]
(h) = Σ_{t=1}^{n} [I(U1→23,[1:K],[t] U2→13,[1:K],[t]; X1[t] X3[t] | X2[t]) + I(U1→3,K,[t]; X1[t] | U1→23,[1:K],[t] U2→13,[1:K],[t] X3[t])]
= Σ_{t=1}^{n} [I(U1→23,[1:K],[Q] U2→13,[1:K],[Q]; X1[Q] X3[Q] | X2[Q], Q = t) + I(U1→3,K,[Q]; X1[Q] | U1→23,[1:K],[Q] U2→13,[1:K],[Q] X3[Q], Q = t)]
(i) = n[I(Ũ1→23,[1:K] Ũ2→13,[1:K]; X1 X3 | X2) + I(Ũ1→3,K; X1 | Ũ1→23,[1:K] Ũ2→13,[1:K] X3)]
= n[I(W̃[1,K+1]; X1 X3 | X2) + I(Ũ1→3,K; X1 | W̃[1,K+1] X3)],   (234)–(247)
where
• step (a) follows from the fact that J1^[1:K] is a function of the sources (X1^n, X2^n, X3^n),
• step (b) follows from the non-negativity of mutual information,
• step (c) follows from the fact that J2^[1:K] is a function of J1^[1:K] and the source X2^n,
• step (d) follows from the chain rule for conditional mutual information and the memoryless property across time of the sources (X1^n, X2^n, X3^n),
• step (e) follows from the non-negativity of mutual information,
• step (f) follows from the Markov chain X2[t] −− (J1^[1:K] J2^[1:K] X2[t+1:n] X3[1:t]) −− (X3[t+1:n] X1[1:t−1]) (Corollary 4 in the appendices), for all t = [1 : n], which in turn follows from X1 −− X3 −− X2,
• step (g) follows from the non-negativity of mutual information,
• step (h) follows from the definitions (229)–(233),
• step (i) follows from standard time-sharing arguments and the definition of new random variables (i.e. Ũ1→23,[1:K] ≜ (U1→23,[1:K],[Q], Q) and X1 ≜ (X1[Q], Q)),
and the last step follows from the definition of the past shared common descriptions W[1,l] ∀l. It is also immediate to show that (Ũ1→23,l, Ũ2→13,l) satisfies the Markov chains in (214)-(216) for all l ∈ [1 : K].
2) Rate at node 2: For the second rate, by following the same steps as before we have
n(R2 + ε) ≥ H(J2^[1:K])   (248)
(a) = I(J2^[1:K]; X1^n X2^n X3^n)   (249)
(b) ≥ I(J2^[1:K]; X2^n | X1^n X3^n)   (250)
(c) = I(J1^[1:K] J2^[1:K]; X2^n | X1^n X3^n)   (251)
(d) = I(J1^[1:K] J2^[1:K] X1^n; X2^n | X3^n)   (252)
(e) ≥ I(J1^[1:K] J2^[1:K]; X2^n | X3^n)   (253)
(f) = Σ_{t=1}^{n} I(J1^[1:K] J2^[1:K] X3[1:t−1] X3[t+1:n] X2[t+1:n]; X2[t] | X3[t])   (254)
(g) ≥ Σ_{t=1}^{n} I(J1^[1:K] J2^[1:K] X3[1:t−1] X2[t+1:n]; X2[t] | X3[t])   (255)
(h) = Σ_{t=1}^{n} I(U1→23,[1:K],[t] U2→13,[1:K],[t]; X2[t] | X3[t])   (256)
(i) = nI(W̃[1,K+1]; X2 | X3),   (257)
where
• step (a) follows from the fact that J2^[1:K] is a function of the sources (X1^n, X2^n, X3^n),
• step (b) follows from the non-negativity of mutual information,
• step (c) follows from the fact that J1^[1:K] is a function of J2^[1:K] and the sources (X1^n, X3^n),
• step (d) follows from the Markov chain X1 −− X3 −− X2,
• step (e) follows from the non-negativity of mutual information,
• step (f) follows from the chain rule for conditional mutual information and the memoryless property across time of the sources (X1^n, X2^n, X3^n),
• step (g) follows from the non-negativity of mutual information,
• step (h) follows from the definitions (229)–(233),
• step (i) follows from the definition of W[1,l] ∀l and from standard time-sharing arguments similar to the ones for the rate at node 1.
3) Distortion at nodes 1 and 2: Node 1 reconstructs an estimate X̂12^n ≡ g12(J1^[1:K], J2^[1:K], X1^n, X3^n) while node 2 reconstructs X̂21^n ≡ g21(J1^[1:K], J2^[1:K], X2^n). For each t ∈ {1, . . . , n}, define functions X̂12[t] and X̂21[t] as being the t-th coordinates of the corresponding estimates of X̂12^n and X̂21^n, respectively:
X̂12[t](W[1,K+1],[t], U1→3,K,[t], X1[t], X3[t], X1[1:t−1], X1[t+1:n]) ≜ g12[t](J1^[1:K], J2^[1:K], X1^n, X3^n),   (258)
X̂21[t](W[1,K+1],[t], X2[t]) ≜ g21[t](J1^[1:K], J2^[1:K], X2^n).   (259)
The component-wise mean distortions thus verify
D12 + ε ≥ E[d(X2^n, g12(J1^[1:K], J2^[1:K], X1^n, X3^n))]   (260)
= (1/n) Σ_{t=1}^{n} E[d(X2[t], X̂12[t](W[1,K+1],[t], U1→3,K,[t], X1[t], X3[t], X1[1:t−1], X1[t+1:n]))]   (261)
(a) ≥ (1/n) Σ_{t=1}^{n} E[d(X2[t], X̂*12[t](W[1,K+1],[t], U1→3,K,[t], X1[t], X3[t]))]   (262)
(b) = (1/n) Σ_{t=1}^{n} E[d(X2[Q], X̂*12[Q](W[1,K+1],[Q], U1→3,K,[Q], X1[Q], X3[Q])) | Q = t]   (263)
= E[d(X2[Q], X̂*12[Q](W[1,K+1],[Q], U1→3,K,[Q], X1[Q], X3[Q]))]   (264)
(c) = E[d(X2, X̃12(W̃[1,K+1], Ũ1→3,K, X1, X3))],   (265)
where
• step (a) follows from (259),
• step (b) follows from the Markov chain X2[t] −− (X1[t], X3[t], J1^[1:K], J2^[1:K], X3[1:t−1], X3[t+1:n], X2[t+1:n]) −− (X1[1:t−1], X1[t+1:n]) ∀ t = [1 : n] (which can be obtained from Corollary 4 in the appendices) and Lemma 9,
• step (c) follows from the following relations:
X̃12(W̃[1,K+1], Ũ1→3,K, X1, X3) = X̃12(Q, W[1,K+1],[Q], U1→3,K,[Q], X1[Q], X3[Q]) ≜ X̂*12[Q](W[1,K+1],[Q], U1→3,K,[Q], X1[Q], X3[Q]).
By following the very same steps, we can also show that:
D21 + ε ≥ E[d(X1^n, g21(J1^[1:K], J2^[1:K], X2^n))]   (266)
= E[d(X1, X̃21(W̃[1,K+1], X2))],   (267)
where we used the Markov chain X2[1:t−1] −− (X2[t], J1^[1:K], J2^[1:K], X3[1:t−1], X2[t+1:n]) −− X1[t] ∀ t = [1 : n] (which can be obtained from Corollary 4 in the appendices) and Lemma 9, and where we define the function X̃21:
X̃21(W̃[1,K+1], X2) = X̃21(Q, W[1,K+1],[Q], X2[Q]) ≜ X̂*21[Q](W[1,K+1],[Q], X2[Q]).
4) Distortions at node 3: Node 3 computes lossy reconstructions X̂31^n ≡ g31(J1^[1:K], J2^[1:K], X3^n) and X̂32^n ≡ g32(J1^[1:K], J2^[1:K], X3^n). For each t ∈ {1, . . . , n}, define functions X̂31[t] and X̂32[t] as being the t-th coordinates of the corresponding estimates of X̂31^n and X̂32^n, respectively:
X̂31[t](W[1,K+1],[t], U1→3,K,[t], X3[t]) ≜ g31[t](J1^[1:K], J2^[1:K], X3^n),   (268)
X̂32[t](W[1,K+1],[t], U1→3,K,[t], X3[t]) ≜ g32[t](J1^[1:K], J2^[1:K], X3^n).   (269)
The component-wise mean distortions thus verify
D31 + ε ≥ E[d(X1^n, g31(J1^[1:K], J2^[1:K], X3^n))]   (270)
= (1/n) Σ_{t=1}^{n} E[d(X1[t], X̂31[t](W[1,K+1],[t], U1→3,K,[t], X3[t]))]   (271)
= (1/n) Σ_{t=1}^{n} E[d(X1[Q], X̂31[Q](W[1,K+1],[Q], U1→3,K,[Q], X3[Q])) | Q = t]   (272)
= E[d(X1[Q], X̂31[Q](W[1,K+1],[Q], U1→3,K,[Q], X3[Q]))]   (273)
= E[d(X1, X̃31(W̃[1,K+1], Ũ1→3,K, X3))],   (274)
where the last step follows by defining the function X̃31 by
X̃31(W̃[1,K+1], Ũ1→3,K, X3) = X̃31(Q, W[1,K+1],[Q], U1→3,K,[Q], X3[Q]) ≜ X̂31[Q](W[1,K+1],[Q], U1→3,K,[Q], X3[Q]).
By following the very same steps, we can also show that:
D32 + ε ≥ E[d(X2^n, g32(J1^[1:K], J2^[1:K], X3^n))]   (275)
= E[d(X2, X̃32(W̃[1,K+1], Ũ1→3,K, X3))],   (276)
and where we define the function X̃32 by
X̃32(W̃[1,K+1], Ũ1→3,K, X3) = X̃32(Q, W[1,K+1],[Q], U1→3,K,[Q], X3[Q]) ≜ X̂32[Q](W[1,K+1],[Q], U1→3,K,[Q], X3[Q]).
This concludes the proof of the converse and thus that of the theorem.
E. Three encoders and three decoders subject to lossless/lossy reconstruction constraints with degraded side information
Consider now the problem described in Fig. 6, where encoder 1 wishes to communicate the source X1^n losslessly to nodes 2 and 3, while encoder 2 wishes to send a lossy description of its source X2^n to node 3 with distortion constraint D32 and encoder 3 wishes to send a lossy description of its source X3^n to node 2 with distortion constraint D23. In addition, the encoders perform the communication using K communication rounds. This problem can be seen as a generalization of the settings previously investigated in [10]. This setting can model a problem in which node 1 generates a process X1. This process physically propagates to the locations of nodes 2 and 3, which measure X2 and X3 respectively. If node 2 is closer to node 1 than node 3, we can assume that X1 −− X2 −− X3. Nodes 2 and 3 then interact between themselves and with node 1 in order to reconstruct X1 losslessly and X2 and X3 with some distortion level.
[Figure 6: Three encoders and three decoders subject to lossless/lossy reconstruction constraints with degraded side information. Node 1 observes X1^n; node 2 observes X2^n and outputs (X̂23^n, D23) and X̂21^n ≈ X1^n; node 3 observes X3^n and outputs (X̂32^n, D32) and X̂31^n ≈ X1^n; rates R1, R2 and R3 are exchanged.]
Theorem 6: The rate-distortion region of the setting described in Fig. 6, where X1 −− X2 −− X3 form a Markov chain, is given by the union over all joint probability measures pX1X2X3U3→2,[1:K]U2→3,[1:K] satisfying the Markov chains
U2→3,l −− (X1, X2, V[23,l,2]) −− X3,   (277)
U3→2,l −− (X1, X3, V[23,l,3]) −− X2,   (278)
∀l ∈ [1 : K], and such that there exist reconstruction mappings:
g23(X1, X2, V[23,K+1,2]) = X̂23 with E[d(X3, X̂23)] ≤ D23,
g32(X1, X3, V[23,K+1,2]) = X̂32 with E[d(X2, X̂32)] ≤ D32,
of the set of all tuples satisfying:
R1 ≥ H(X1|X2),   (279)
R2 ≥ I(V[23,K+1,2]; X2|X1 X3),   (280)
R3 ≥ I(V[23,K+1,2]; X3|X1 X2),   (281)
R1 + R2 ≥ H(X1|X3) + I(V[23,K+1,2]; X2|X1 X3).   (282)
The auxiliary random variables have cardinality bounds:
‖U2→3,l‖ ≤ ‖X1‖‖X2‖ Π_{i=1}^{l−1} ‖U2→3,i‖‖U3→2,i‖ + 1, l ∈ [1 : K],   (283)
‖U3→2,l‖ ≤ ‖X1‖‖X3‖‖U2→3,l‖ Π_{i=1}^{l−1} ‖U2→3,i‖‖U3→2,i‖ + 1, l ∈ [1 : K].   (284)
Remark 15: Theorem 6 shows that several exchanges between nodes 2 and 3 can be helpful. Node 1 transmits only once, at the beginning, its full source.
Proof: The direct part of the proof follows according to Theorem 1 by choosing:
U1→3,l = U1→2,l = U3→1,l = U2→1,l = U3→12,l = ∅, ∀l ∈ [1 : K],
U1→23,l = U2→13,l = ∅, ∀l ∈ [2 : K],
and U1→23,1 = U2→13,1 = X1. The remaining auxiliary random variables satisfy ∀l ∈ [1 : K]:
U2→3,l −− (X1, X2, V[23,l,2]) −− X3,   (285)
U3→2,l −− (X1, X3, V[23,l,3]) −− X2.   (286)
If a tuple of rates (R1, R2, R3) and distortions (D23, D32) are admissible for the K-steps interactive cooperative distributed source coding setting described in Fig. 6, then for all ε > 0 there exists n0(ε, K) such that ∀ n > n0(ε, K) there exists a K-steps interactive source code (n, K, F, G) with intermediate rates satisfying:
(1/n) Σ_{l=1}^{K} log ‖Jil‖ ≤ Ri + ε, i ∈ {1, 2, 3},   (287)
and with average per-letter distortion constraints on sources 2 and 3 and perfect reconstruction constraints on source 1:
E[d(X2^n, X̂32^n)] ≤ D32 + ε,   (288)
Pr{X1^n ≠ X̂21^n} ≤ ε,   (289)
E[d(X3^n, X̂23^n)] ≤ D23 + ε,   (290)
Pr{X1^n ≠ X̂31^n} ≤ ε,   (291)
where
X̂32^n ≡ g32(J1^[1:K], J2^[1:K], J3^[1:K], X3^n), X̂23^n ≡ g23(J1^[1:K], J2^[1:K], J3^[1:K], X2^n),   (292)
X̂31^n ≡ g31(J1^[1:K], J2^[1:K], J3^[1:K], X3^n), X̂21^n ≡ g21(J1^[1:K], J2^[1:K], J3^[1:K], X2^n).   (293)
For each t ∈ {1, . . . , n} and l ∈ [1 : K], we define random variables U2→3,l,[t] and U3→2,l,[t] as follows:
U2→3,1,[t] ≜ (J1^1, J2^1, X1[1:t−1], X1[t+1:n], X2[t+1:n], X3[1:t−1]),   (294)
U2→3,l,[t] ≜ (J1^l, J2^l), l ∈ [2 : K],   (295)
U3→2,l,[t] ≜ J3^l, l ∈ [1 : K].   (296)
These auxiliary random variables satisfy the Markov conditions (285) and (286), which can be verified from Lemma 11 in the appendices. By the conditions (291) and (289), and Fano's inequality [21], we have
H(X1^n|X̂31^n) ≤ Pr{X1^n ≠ X̂31^n} log2(‖X1^n‖ − 1) + H2(Pr{X1^n ≠ X̂31^n}) ≜ nǫn,   (297)
H(X1^n|X̂21^n) ≤ Pr{X1^n ≠ X̂21^n} log2(‖X1^n‖ − 1) + H2(Pr{X1^n ≠ X̂21^n}) ≜ nǫn,   (298)
where ǫn(ε) → 0 provided that ε → 0 and n → ∞.
1) Rate at node 1: For the first rate, we have
n(R1 + ε) ≥ H(J1^[1:K])   (299)
≥ H(J1^[1:K] | X2^n X3^n)   (300)
(a) = H(J1^[1:K] J2^[1:K] J3^[1:K] | X2^n X3^n)   (301)
(b) = I(J1^[1:K] J2^[1:K] J3^[1:K]; X1^n | X2^n X3^n)   (302)
≥ nH(X1|X2 X3) − H(X1^n | X2^n J1^[1:K] J2^[1:K] J3^[1:K])   (303)
(c) ≥ nH(X1|X2 X3) − H(X1^n | X̂21^n)   (304)
(d) ≥ n[H(X1|X2 X3) − ǫn]   (305)
(e) = n[H(X1|X2) − ǫn],   (306)
where
• step (a) follows from the fact that by definition of the code the sequences J2^[1:K], J3^[1:K] are functions of (J1^[1:K], X2^n, X3^n),
• step (b) follows from the fact that by definition of the code the sequences (J1^[1:K], J2^[1:K], J3^[1:K]) are functions of the sources (X1^n, X2^n, X3^n),
• step (c) follows from the code assumption in (293) that guarantees the existence of a reconstruction function X̂21^n ≡ g21(J1^[1:K], J2^[1:K], J3^[1:K], X2^n),
• step (d) follows from Fano's inequality in (298),
• step (e) follows from the assumption that X1 −− X2 −− X3 form a Markov chain.
2) Rates at nodes 2 and 3: For the second rate, we have
n(R2 + ε) ≥ H(J2^[1:K])   (307)
(a) = I(J2^[1:K]; X1^n X2^n X3^n)   (308)
(b) ≥ I(J2^[1:K]; X2^n | X1^n X3^n)   (309)
(c) = I(J1^[1:K] J2^[1:K] J3^[1:K]; X2^n | X1^n X3^n)   (310)
(d) = Σ_{t=1}^{n} I(J1^[1:K] J2^[1:K] J3^[1:K]; X2[t] | X1^n X3^n X2[t+1:n])   (311)
(e) ≥ Σ_{t=1}^{n} I(V[23,K+1,2][t]; X2[t] | X1[t] X3[t])   (312)
(f) = Σ_{t=1}^{n} I(V[23,K+1,2][Q]; X2[Q] | X1[Q] X3[Q], Q = t)   (313)
(g) ≥ nI(Ṽ[23,K+1,2]; X2 | X1 X3),   (314)
where
• step (a) follows from the fact that J2^[1:K] is a function of the sources (X1^n, X2^n, X3^n),
• step (b) follows from the non-negativity of mutual information,
• step (c) follows from the fact that (J1^[1:K], J2^[1:K], J3^[1:K]) are functions of the sources (X1^n, X2^n, X3^n),
• step (d) follows from the chain rule for conditional mutual information,
• step (e) follows from the definitions (294), (295) and (296), the memoryless property of the sources and the non-negativity of mutual information,
• step (f) follows from the use of a time-sharing random variable Q uniformly distributed over the set {1, . . . , n},
• step (g) follows by letting a new random variable Ṽ[23,K+1,2] ≜ (V[23,K+1,2][Q], Q).
By following similar steps, it is not difficult to check that
n(R3 + ε) ≥ Σ_{t=1}^{n} I(V[23,K+1,2][t]; X3[t] | X1[t] X2[t]) = Σ_{t=1}^{n} I(V[23,K+1,2][Q]; X3[Q] | X1[Q] X2[Q], Q = t) ≥ nI(Ṽ[23,K+1,2]; X3 | X1 X2).   (315)
3) Sum-rate of nodes 1 and 2: For the sum-rate, we have
n(R1 + R2 + 2ε) ≥ H(J1^[1:K]) + H(J2^[1:K])
≥ H(J1^[1:K] J2^[1:K])
(a) = I(J1^[1:K] J2^[1:K]; X1^n X2^n X3^n)
(b) ≥ I(J1^[1:K] J2^[1:K]; X1^n X2^n | X3^n)
= I(J1^[1:K] J2^[1:K]; X1^n | X3^n) + I(J1^[1:K] J2^[1:K] J3^[1:K]; X2^n | X1^n X3^n)
= H(X1^n|X3^n) − H(X1^n | J1^[1:K] J2^[1:K] X3^n) + I(J1^[1:K] J2^[1:K] J3^[1:K]; X2^n | X1^n X3^n)
(c) ≥ H(X1^n|X3^n) − H(X1^n | X̂31^n) + I(J1^[1:K] J2^[1:K] J3^[1:K]; X2^n | X1^n X3^n)
(d) ≥ n[H(X1|X3) − ǫn] + I(J1^[1:K] J2^[1:K] J3^[1:K]; X2^n | X1^n X3^n)
(e) ≥ n[H(X1|X3) − ǫn] + Σ_{t=1}^{n} I(J1^[1:K] J2^[1:K] J3^[1:K] X1[1:t−1] X1[t+1:n] X3[1:t−1] X2[t+1:n]; X2[t] | X1[t] X3[t])
(f) = n[H(X1|X3) − ǫn] + Σ_{t=1}^{n} I(V[23,K+1,2][t]; X2[t] | X1[t] X3[t])
(g) = n[H(X1|X3) − ǫn + I(V[23,K+1,2][Q]; X2[Q] | X1[Q] X3[Q], Q)]
(h) = n[H(X1|X3) − ǫn + I(Ṽ[23,K+1,2]; X2 | X1 X3)],   (316)–(329)
where
• step (a) follows from the fact that J1^[1:K] and J2^[1:K] are functions of the sources (X1^n, X2^n, X3^n),
• step (b) follows from the non-negativity of mutual information,
• step (c) follows from the code assumption in (293) that guarantees the existence of the reconstruction function X̂31^n ≡ g31(J1^[1:K], J2^[1:K], J3^[1:K], X3^n),
• step (d) follows from Fano's inequality in (297),
• step (e) follows from the chain rule of conditional mutual information, the memoryless property across time of the sources (X1^n, X2^n, X3^n), and the non-negativity of mutual information,
• step (f) follows from the definitions (294) and (295),
• step (g) follows from the use of a time-sharing random variable Q uniformly distributed over the set {1, . . . , n},
• step (h) follows from Ṽ[23,K+1,2] ≜ (V[23,K+1,2][Q], Q).
4) Distortion at node 2: Node 2 reconstructs a lossy description X̂23^n ≡ g23(J1^[1:K], J2^[1:K], J3^[1:K], X2^n). For each t ∈ {1, . . . , n}, define a function X̂23[t] as being the t-th coordinate of this estimate:
X̂23[t](V[23,K+1,2][t], X2[t]) ≜ g23[t](J1^[1:K], J2^[1:K], J3^[1:K], X2^n).   (330)
The component-wise mean distortion thus verifies
D23 + ε ≥ E[d(X3^n, g23(J1^[1:K], J2^[1:K], J3^[1:K], X2^n))]   (331)
= (1/n) Σ_{t=1}^{n} E[d(X3[t], X̂23[t](V[23,K+1,2][t], X2[t]))]   (332)
= (1/n) Σ_{t=1}^{n} E[d(X3[Q], X̂23[Q](V[23,K+1,2][Q], X2[Q])) | Q = t]   (333)
= E[d(X3[Q], X̂23[Q](V[23,K+1,2][Q], X2[Q]))]   (334)
= E[d(X3, X̃23(Ṽ[23,K+1,2], X2))],   (335)
where we defined the function X̃23 by
X̃23(Ṽ[23,K+1,2], X2) = X̃23(Q, V[23,K+1,2][Q], X2[Q]) ≜ X̂23[Q](V[23,K+1,2][Q], X2[Q]).   (336)
5) Distortion at node 3: Node 3 reconstructs a lossy description X̂32^n ≡ g32(J1^[1:K], J2^[1:K], J3^[1:K], X3^n). For each t ∈ {1, . . . , n}, define a function X̂32[t] as being the t-th coordinate of this estimate:
X̂32[t](V[23,K+1,2][t], X3[t]) ≜ g32[t](J1^[1:K], J2^[1:K], J3^[1:K], X3^n).   (337)
The component-wise mean distortion thus verifies
D32 + ε ≥ E[d(X2^n, g32(J1^[1:K], J2^[1:K], J3^[1:K], X3^n))]   (338)
= (1/n) Σ_{t=1}^{n} E[d(X2[t], X̂32[t](V[23,K+1,2][t], X3[t]))]   (339)
= (1/n) Σ_{t=1}^{n} E[d(X2[Q], X̂32[Q](V[23,K+1,2][Q], X3[Q])) | Q = t]   (340)
= E[d(X2[Q], X̂32[Q](V[23,K+1,2][Q], X3[Q]))]   (341)
= E[d(X2, X̃32(Ṽ[23,K+1,2], X3))],   (342)
where we defined the function X̃32 by
X̃32(Ṽ[23,K+1,2], X3) = X̃32(Q, V[23,K+1,2][Q], X3[Q]) ≜ X̂32[Q](V[23,K+1,2][Q], X3[Q]).   (343)
This concludes the proof of the converse and thus that of the theorem.
VI. DISCUSSION
A. Numerical example
In order to obtain further insight into the gains obtained from cooperation, we consider the case of two encoders and one decoder subject to lossy/lossless reconstruction constraints without side information, in which the sources are distributed according to:
pX1X2(x1, x2) = α 1{x1 = 1} (1/(√(2π) σ1)) exp(−x2²/(2σ1²)) + (1 − α) 1{x1 = 0} (1/(√(2π) σ0)) exp(−x2²/(2σ0²)).   (344)
This model yields a mixture of discrete and continuous components. We observe that X1 follows a Bernoulli distribution with parameter α ∈ [0, 1], while X2 given X1 follows a Gaussian distribution whose variance depends on the value of X1 ∈ {0, 1}. In this sense, X2 follows a Gaussian mixture distribution.^10
^10 Although the inner bound region in Theorem 1 is strictly valid for discrete sources with finite alphabets, the Gaussian distribution is sufficiently well-behaved to apply a uniform quantization procedure prior to the application of the results of Theorem 1. Then, a limiting argument using a sequence of decreasing quantization step-sizes will deliver the desired result. See chapter 3 in [22].
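The following is a minimal sketch (our own, not the authors' code) of how samples from the mixed discrete/continuous source (344) can be drawn; the parameter defaults match the second scenario considered later and are otherwise arbitrary.

```python
# Sampling sketch for the source model (344): X1 ~ Bernoulli(alpha), and X2 | X1
# is zero-mean Gaussian with variance var1 (X1 = 1) or var0 (X1 = 0).
import numpy as np

def sample_source(n, alpha=0.1, var0=0.5, var1=2.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    x1 = (rng.random(n) < alpha).astype(int)                # Bernoulli(alpha)
    std = np.sqrt(np.where(x1 == 1, var1, var0))            # variance depends on X1
    x2 = rng.normal(0.0, std)                                # Gaussian mixture component
    return x1, x2

x1, x2 = sample_source(100_000)
# Expected Var(X2) = (1 - alpha) * var0 + alpha * var1 = 0.65 for the defaults.
print("empirical P(X1=1):", x1.mean(), " empirical Var(X2):", x2.var())
```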
The optimal rate-distortion region for this case was characterized in Theorem 2 and can be alternatively written as:
R(D) = ∪_{p∈L} {(R1, R2) : R1 > H(X1|X2), R2 > I(X2; U|X1), R1 + R2 > H(X1) + I(X2; U|X1)},   (345)
where
L = {pU|X1X2 : there exists (x1, u) ↦ g(x1, u) such that E[d(X2, g(X1, U))] ≤ D}.   (346)
The corresponding non-cooperative region for the same problem was characterized in [5]:
R(D) = ∪_{p∈L⋆} {(R1, R2) : R1 > H(X1|U), R2 > I(X2; U|X1), R1 + R2 > H(X1) + I(X2; U|X1)},   (347)–(348)
where
L⋆ = {pU|X2 : there exists (x1, u) ↦ g(x1, u) such that E[d(X2, g(X1, U))] ≤ D}.   (349)–(350)
From the previous expressions, it is evident that the cooperative case offers some gains with respect to the non-cooperative setup. This is clearly evidenced by the lower limit on R1 and by the fact that L⋆ ⊆ L. We have the following result.
Theorem 7 (Cooperative region for mixed discrete/continuous source): Assume the source distribution is given by (344) and that, without loss of generality, σ0² ≤ σ1². The rate-distortion region from Theorem 2 can be written as:
R1 > ((1−α)/(√(2π) σ0)) ∫_{−∞}^{∞} exp(−x2²/(2σ0²)) H2(g(x2)) dx2 + (α/(√(2π) σ1)) ∫_{−∞}^{∞} exp(−x2²/(2σ1²)) H2(g(x2)) dx2,
R2 > (1/2) log(σ0^{2(1−α)} σ1^{2α} / D) if D ≤ σ0², and R2 > (α/2) [log(ασ1² / (D − (1−α)σ0²))]^+ if D > σ0²,
R1 + R2 > H2(α) + (1/2) log(σ0^{2(1−α)} σ1^{2α} / D) if D ≤ σ0², and R1 + R2 > H2(α) + (α/2) [log(ασ1² / (D − (1−α)σ0²))]^+ if D > σ0²,
where H2(z) ≡ −z log z − (1 − z) log(1 − z) for z ∈ [0, 1], [x]^+ = max{0, x} and
g(x2) = [(α/(√(2π) σ1)) exp(−x2²/(2σ1²))] / [(α/(√(2π) σ1)) exp(−x2²/(2σ1²)) + ((1−α)/(√(2π) σ0)) exp(−x2²/(2σ0²))].   (351)
Proof: The converse proof is straightforward by observing that when D ≤ σ0²:
I(X2; U|X1) = h(X2|X1) − h(X2|U, X1) ≥ h(X2|X1) − (1/2) log(2πeD),   (352)
and h(X2|X1) = (α/2) log(2πeσ1²) + ((1−α)/2) log(2πeσ0²). For the case when σ0² < D ≤ ασ1² + (1−α)σ0² we can write:
I(X2; U|X1) = h(X2|X1) − h(X2|U, X1)
≥ h(X2|X1) − αh(X2|U, X1 = 1) − (1−α)h(X2|X1 = 0)   (353)
= (α/2) log(2πeσ1²) − (α/2) log(2πe(D − (1−α)σ0²)/α)
= (α/2) log(ασ1² / (D − (1−α)σ0²)).   (354)
When D > ασ1² + (1−α)σ0² we can lower bound the mutual information by zero.
The achievability follows from the choice:
g(U, X1) = (σ0²/(σ0² + σ²_{Z0})) U if X1 = 0, and (σ1²/(σ1² + σ²_{Z1})) U if X1 = 1,   (355)
and by setting the auxiliary random variable:
U = X2 + Z0 if X1 = 0, and X2 + Z1 if X1 = 1,   (356)
where Z0, Z1 are zero-mean Gaussian random variables, independent of X2 and X1 and with variances given by:
σ²_{Z0} = Dσ0²/(σ0² − D), σ²_{Z1} = Dσ1²/(σ1² − D),   (357)
for D ≤ σ0², while for σ0² < D ≤ ασ1² + (1−α)σ0², we choose:
σ²_{Z0} → ∞, σ²_{Z1} = [D − (1−α)σ0²]σ1² / (ασ1² − [D − (1−α)σ0²]).   (358)
Finally, for D > ασ1² + (1−α)σ0², we let σ²_{Z0} → ∞ and σ²_{Z1} → ∞.
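As a sanity check on the closed-form expressions of Theorem 7, the sketch below (our own, under the stated assumption σ0² ≤ σ1²) evaluates the R1 bound by numerical integration of E[H2(g(X2))] and the R2 bound in its two regimes; the integration limits and parameter values are illustrative assumptions.

```python
# Numerical evaluation sketch of the cooperative bounds of Theorem 7 (logs in base 2).
import numpy as np
from scipy.integrate import quad

def H2(z):
    z = np.clip(z, 1e-12, 1 - 1e-12)
    return -z * np.log2(z) - (1 - z) * np.log2(1 - z)

def coop_bounds(D, alpha=0.1, var0=0.5, var1=2.0):
    s0, s1 = np.sqrt(var0), np.sqrt(var1)
    def g(x2):  # posterior Pr{X1 = 1 | X2 = x2} as in (351)
        n1 = alpha / (np.sqrt(2 * np.pi) * s1) * np.exp(-x2 ** 2 / (2 * var1))
        n0 = (1 - alpha) / (np.sqrt(2 * np.pi) * s0) * np.exp(-x2 ** 2 / (2 * var0))
        return n1 / (n0 + n1)
    # R1 > E[H2(g(X2))] = H(X1|X2), computed by integrating over the mixture density.
    pdf = lambda x2: ((1 - alpha) / (np.sqrt(2 * np.pi) * s0) * np.exp(-x2 ** 2 / (2 * var0))
                      + alpha / (np.sqrt(2 * np.pi) * s1) * np.exp(-x2 ** 2 / (2 * var1)))
    R1 = quad(lambda x2: pdf(x2) * H2(g(x2)), -40, 40)[0]
    # R2 lower bound: two regimes of D, clipped at zero beyond alpha*var1 + (1-alpha)*var0.
    if D <= var0:
        R2 = 0.5 * np.log2(var0 ** (1 - alpha) * var1 ** alpha / D)
    else:
        R2 = max(0.0, alpha / 2 * np.log2(alpha * var1 / (D - (1 - alpha) * var0)))
    return R1, R2, H2(alpha) + R2

for D in (0.1, 0.3, 0.5):
    R1, R2, Rsum = coop_bounds(D)
    print(f"D = {D}: R1 > {R1:.3f}, R2 > {R2:.3f}, R1+R2 > {Rsum:.3f}")
```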
Unfortunately, the non-cooperative region is hard to evaluate for the assumed source model.^11 In order to present some comparison between the cooperative and non-cooperative cases, let us fix the same value of the rate R1 in both cases and compare the rate R2 that can be obtained in each case. Clearly, in this way, we are not taking into account the gain in R1 that could be obtained by the cooperative scheme (as H(X1|X2) ≤ H(X1|U) for every U −− X2 −− X1). For both schemes, it follows that for fixed R1:
R2 > max{I(X2; U|X1), H(X1) + I(X2; U|X1) − R1}.   (359)
From Theorem 7, we can compute (359) for the cooperative case. For the non-cooperative case we need to obtain a lower bound on I(X2; U|X1) for pU|X2 ∈ L⋆. It is easy to check that:
I(X2; U|X1) ≥ ((1−α)/2) log(σ0²/β0) + (α/2) log(σ1²/β1),   (360)
where
β0 = E_{X2U|X1}[(X2 − E_{X2|UX1}[X2|U, X1 = 0])² | X1 = 0],   (361)
β1 = E_{X2U|X1}[(X2 − E_{X2|UX1}[X2|U, X1 = 1])² | X1 = 1].   (362)
The distortion constraint imposes the condition:
(1 − α)β0 + αβ1 ≤ D.   (363)
^11 However, there are cases where an exact characterization is possible. This is the case, for example, when X1 and X2 are the input and output of a binary channel with crossover probability α and the distortion function is the Hamming distance [5].
In order to guarantee that (360) and (361) are achievable, under the Markov constraint U −− X2 −− X1, the following conditions on pU|X2(u|x2) should be satisfied:
[pU|X2(u|x2) (1/(√(2π) σ0)) exp(−x2²/(2σ0²))] / [∫_{−∞}^{∞} pU|X2(u|x2) (1/(√(2π) σ0)) exp(−x2²/(2σ0²)) dx2] = (1/√(2πβ0)) exp(−(x2 − f0(u))²/(2β0))   (364)
and
[pU|X2(u|x2) (1/(√(2π) σ1)) exp(−x2²/(2σ1²))] / [∫_{−∞}^{∞} pU|X2(u|x2) (1/(√(2π) σ1)) exp(−x2²/(2σ1²)) dx2] = (1/√(2πβ1)) exp(−(x2 − f1(u))²/(2β1)),   (365)
where
f0(U) ≡ E_{X2|UX1}[X2|U, X1 = 0], f1(U) ≡ E_{X2|UX1}[X2|U, X1 = 1].   (366)
The characterization of all distributions pU|X2(u|x2) that satisfy (364) and (365) appears to be a difficult problem. In order to show a numerical example, we shall simply assume that:
pU|X2(u|x2) = (1/(√(2π) σw)) exp(−(u − x2)²/(2σw²)).   (367)
Indeed, this choice satisfies simultaneously expressions (364) and (365). In this way, we can calculate the corresponding values of β0 and β1, obtaining the parametrization of I(X2; U|X1) as a function of σw²:
I(X2; U|X1) = ((1−α)/2) log((σ0² + σw²)/σw²) + (α/2) log((σ1² + σw²)/σw²),   (368)
with the following constraint:
(1 − α) σw²σ0²/(σ0² + σw²) + α σw²σ1²/(σ1² + σw²) = D.   (369)
We can replace (369) in (359) to obtain an indication of the performance of the non-cooperative case when R1 is fixed.
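A short sketch of ours (not the authors' code) reproducing this comparison: for the non-cooperative scheme we take the assumed additive-noise test channel (367), solve the distortion constraint (369) for σw² by root finding, and evaluate (368); the cooperative value uses the closed-form bound of Theorem 7. Parameter values are illustrative.

```python
# Cooperative vs. non-cooperative I(X2; U | X1) as a function of D.
import numpy as np
from scipy.optimize import brentq

def noncoop_I(D, alpha=0.1, var0=0.5, var1=2.0):
    # Distortion of the additive-noise test channel (367) as a function of sigma_w^2.
    dist = lambda vw: (1 - alpha) * vw * var0 / (var0 + vw) + alpha * vw * var1 / (var1 + vw)
    vw = brentq(lambda v: dist(v) - D, 1e-9, 1e9)            # solve constraint (369)
    return ((1 - alpha) / 2 * np.log2((var0 + vw) / vw)
            + alpha / 2 * np.log2((var1 + vw) / vw))          # evaluate (368)

def coop_I(D, alpha=0.1, var0=0.5, var1=2.0):
    if D <= var0:
        return 0.5 * np.log2(var0 ** (1 - alpha) * var1 ** alpha / D)
    return max(0.0, alpha / 2 * np.log2(alpha * var1 / (D - (1 - alpha) * var0)))

for D in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"D = {D}: cooperative {coop_I(D):.3f} bits, non-cooperative {noncoop_I(D):.3f} bits")
```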
We now present some numerical evaluations. As equation (359) is valid for both the cooperative and the non-cooperative setups, it is sufficient to compare the mutual information term I(X2; U|X1) for each of them. Let us consider the following scenarios:
1) α = 0.1, σ0² = 0.01, σ1² = 2,
2) α = 0.1, σ0² = 0.5, σ1² = 2.
[Figure 7: Comparison between the cooperative and the non-cooperative schemes. The two panels plot I(X2; U|X1) versus D for the cooperative and non-cooperative schemes, for (α = 0.1, σ0 = 0.5, σ1 = 2) and (α = 0.1, σ0 = 0.01, σ1 = 2).]
From Fig. 7 we see that in the case σ0² ≪ σ1² the gain of the cooperative scheme is quite noticeable. However, as σ0² becomes comparable to σ1², the gains are reduced. This was expected from the fact that as σ0² → σ1², the random variable X2 converges to a Gaussian distribution. In that case, the reconstruction of X2 at Node 3 is equivalent, for the cooperative scenario, to a lossy source coding problem with side information X1 at both the encoder and the decoder, while for the non-cooperative setting it reduces to the standard Wyner-Ziv problem. It is known that in this case there are no gains to be expected [2].
B. Interactive Lossless Source Coding
Consider now the problem described in Fig. 8, where encoder 1 wishes to communicate losslessly the source X1^n to two decoders which observe sources X2^n and X3^n. At the same time, node 1 wishes to recover X2^n and X3^n losslessly. Similarly, the other encoders want to communicate their sources losslessly and to recover the sources of the remaining nodes. This is to be done through K rounds of exchanges.
[Figure 8: Interactive lossless source coding. Each node i observes Xi^n, exchanges descriptions at rates R1, R2, R3 over the K rounds, and losslessly reconstructs the sources of the other two nodes.]
Theorem 8 (Interactive lossless source coding): The rate region of the setting described in Fig. 8 is given by the set of all tuples satisfying:
R1 > H(X1|X2 X3),   (370)
R2 > H(X2|X1 X3),   (371)
R3 > H(X3|X1 X2),   (372)
R1 + R2 > H(X1 X2|X3),   (373)
R1 + R3 > H(X1 X3|X2),   (374)
R2 + R3 > H(X2 X3|X1).   (375)
Remark 16: It is worth observing that the multiple exchanges of descriptions between all nodes cannot improve the rates compared to standard Slepian-Wolf coding [1].
Proof: The achievability part is a standard exercise. The converse proof is straightforward from cut-set arguments. For these reasons both proofs are omitted.
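To illustrate how the region of Theorem 8 can be evaluated in a concrete case, the following toy sketch (our own assumption of a joint pmf, not taken from the paper) computes the six conditional entropies for three binary sources forming a Markov chain.

```python
# Evaluating the Slepian-Wolf-type bounds (370)-(375) for a hypothetical binary pmf.
import itertools
import numpy as np

# Hypothetical pmf: X1 uniform, X2 = X1 through a BSC(0.1), X3 = X2 through a BSC(0.1).
p = np.zeros((2, 2, 2))
for x1, x2, x3 in itertools.product(range(2), repeat=3):
    p[x1, x2, x3] = 0.5 * (0.9 if x2 == x1 else 0.1) * (0.9 if x3 == x2 else 0.1)

def H(axes):
    """Joint entropy (bits) of the variables on the given axes."""
    other = tuple(a for a in range(3) if a not in axes)
    q = p.sum(axis=other).ravel()
    q = q[q > 0]
    return float(-np.sum(q * np.log2(q)))

print("R1    >", H((0, 1, 2)) - H((1, 2)))   # H(X1|X2 X3)
print("R2    >", H((0, 1, 2)) - H((0, 2)))   # H(X2|X1 X3)
print("R3    >", H((0, 1, 2)) - H((0, 1)))   # H(X3|X1 X2)
print("R1+R2 >", H((0, 1, 2)) - H((2,)))     # H(X1 X2|X3)
print("R1+R3 >", H((0, 1, 2)) - H((1,)))     # H(X1 X3|X2)
print("R2+R3 >", H((0, 1, 2)) - H((0,)))     # H(X2 X3|X1)
```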
We should note that for this important case, Theorem 1 does not provide the optimal rate region. That is, the coding scheme used is not optimal for this case. In fact, from Theorem 1 and for this problem we can obtain the following achievable region:^12
R1 > H(X1|X2),   (376)
R2 > H(X2|X1 X3),   (377)
R3 > H(X3|X1 X2),   (378)
R1 + R2 > H(X1 X2|X3),   (379)
R2 + R3 > H(X2 X3|X1).   (380)
^12 Consider U1→23,1 ≡ X1, U2→13,1 ≡ X2 and U3→12,1 ≡ X3 and the other auxiliary random variables to be constants for all l ∈ [1 : K].
It is easily seen that in this region node 2 is not performing joint decoding of the descriptions generated at nodes 1 and 3. Because of the encoding ordering assumed (1 → 2 → 3) and the fact that the common description generated at node 2 should be conditionally generated on the common description generated at node 1, node 2 has to recover this common description first. Only at the end does it recover the common description generated at node 3. On the other hand, nodes 1 and 3 perform joint decoding of the common information generated at nodes 2 and 3, and at nodes 1 and 2, respectively. Clearly, this is a consequence of the sequential encoding and decoding structure imposed between the nodes in the network, which is the basis of the interaction.
If all nodes were allowed to perform a joint decoding procedure in order to recover all the exchanged descriptions only at the end of each round, this problem would not appear. However, this would destroy the sequential encoding-decoding structure assumed in this paper, which seems to be optimal in other situations.
VII. SUMMARY
The three-node multiterminal lossy source coding problem was investigated. This problem is not a straightforward generalization of the original problem posed by Kaspi in 1985. As this general problem encompasses several open problems in multiterminal rate-distortion theory, its mathematical complexity is formidable. For that reason we only provided a general inner bound for the rate-distortion region. It was shown that this (rather involved) inner bound contains several rate-distortion regions of some relevant source coding settings. In this way, besides the non-trivial extension of the interactive two-terminal problem, our results can be seen as a generalization, and hence unification, of several previous works in the field. We also showed that our inner bound provides definite answers to special cases of the general problem. It was shown that in some cases the cooperation induced by the interaction can be helpful, while in others it is not. It is clear that further study is needed on the topic of multi-terminal cooperative coding, including a proper generalization to larger networks and to the problem of interactively estimating arbitrary functions of the sources sensed at the nodes.
APPENDIX A
STRONGLY TYPICAL SEQUENCES AND RELATED RESULTS
In this appendix we introduce standard notions in information theory, suited to the mathematical developments and proofs needed in this work. The results presented can be easily derived from the standard formulations provided in [22] and [23]. Let X and Y be finite alphabets and (x^n, y^n) ∈ X^n × Y^n. With P(X × Y) we denote the set of all probability distributions on X × Y. We define the strongly δ-typical sets as:
Definition 3 (Strongly typical set): Consider p ∈ P(X) and δ > 0. We say that x^n ∈ X^n is p-δ-strongly typical if x^n ∈ T^n_[p]δ with:
T^n_[p]δ = {x^n ∈ X^n : |N(a|x^n)/n − p(a)| ≤ δ/‖X‖, ∀a ∈ X such that p(a) ≠ 0},   (381)
where N(a|x^n) denotes the number of occurrences of a ∈ X in x^n and p ∈ P(X). When X ∼ pX(x) we denote the corresponding set of strongly typical sequences by T^n_[X]δ.
Similarly, given pXY ∈ P(X × Y) we can construct the set of δ-jointly typical sequences as:
T^n_[XY]δ = {(x^n, y^n) ∈ X^n × Y^n : |N(a, b|x^n, y^n)/n − pXY(a, b)| ≤ δ/(‖X‖‖Y‖), ∀(a, b) ∈ X × Y such that pXY(a, b) ≠ 0}.   (382)
We also define the conditionally typical sequences. In precise terms, given x^n ∈ X^n we consider the set:
T^n_[Y|X]δ(x^n) = {y^n ∈ Y^n : |N(a, b|x^n, y^n)/n − pXY(a, b)| ≤ δ/(‖X‖‖Y‖), ∀(a, b) ∈ X × Y such that pXY(a, b) ≠ 0}.   (383)
Notice that the following is an alternative way of writing this set:
T^n_[Y|X]δ(x^n) = {y^n ∈ Y^n : (x^n, y^n) ∈ T^n_[XY]δ}.   (384)
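As a small illustration of these definitions, the sketch below (ours, not from the paper) implements the membership tests of Definition 3 and of (382) by direct counting; it assumes the probability dictionaries enumerate the whole alphabet, so that their lengths equal ‖X‖ and ‖X‖‖Y‖.

```python
# Strong and joint typicality membership tests by symbol counting.
from collections import Counter

def strongly_typical(x, p, delta):
    """Check x (a sequence over the support of dict p) against T^n_[p]delta in (381)."""
    n = len(x)
    counts = Counter(x)
    return all(abs(counts[a] / n - p[a]) <= delta / len(p) for a in p if p[a] > 0)

def jointly_typical(x, y, pxy, delta):
    """Check (x, y) against T^n_[XY]delta in (382); pxy maps pairs (a, b) to probabilities."""
    n = len(x)
    counts = Counter(zip(x, y))
    return all(abs(counts[(a, b)] / n - pxy[(a, b)]) <= delta / len(pxy)
               for (a, b) in pxy if pxy[(a, b)] > 0)

# Example usage with a hypothetical binary pmf.
p = {0: 0.7, 1: 0.3}
x = [0] * 70 + [1] * 30
print(strongly_typical(x, p, delta=0.05))  # True: empirical frequencies match p exactly
```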
We now state several useful and standard lemmas, which will be presented without proof (except for the last one):
Lemma 1 (Properties of typical sets [23]): The following are true:
1) Consider (x^n, y^n) ∈ T^n_[XY]ǫ. Then x^n ∈ T^n_[X]ǫ, y^n ∈ T^n_[Y]ǫ, x^n ∈ T^n_[X|Y]ǫ(y^n) and y^n ∈ T^n_[Y|X]ǫ(x^n).
2) Let T^n_[Y|X]ǫ(x^n) with x^n ∉ T^n_[X]ǫ. Then T^n_[Y|X]ǫ(x^n) = ∅.
3) Let (X^n, Y^n) ∼ Π_{t=1}^{n} pXY(xt, yt). If x^n ∈ T^n_[X]ǫ we have
2^{−n(H(X)+δ(ǫ))} ≤ pX^n(x^n) ≤ 2^{−n(H(X)−δ(ǫ))}
with δ(ǫ) → 0 when ǫ → 0. Similarly, if y^n ∈ T^n_[Y|X]ǫ(x^n):
2^{−n(H(Y|X)+δ′(ǫ))} ≤ pY^n|X^n(y^n|x^n) ≤ 2^{−n(H(Y|X)−δ′(ǫ))}
with δ′(ǫ) → 0 when ǫ → 0.
Lemma 2 (Conditional typicality lemma [23]): Consider the product measure Π_{t=1}^{n} pXY(xt, yt). Under that measure, we have
Pr{T^n_[X]ǫ} ≥ 1 − O(c1^{−nf(ǫ)}), c1 > 1,
where f(ǫ) → 0 when ǫ → 0. In addition, for every x^n ∈ T^n_[X]ǫ′ with ǫ′ < ǫ/‖Y‖ we have:
Pr{T^n_[Y|X]ǫ(x^n) | x^n} ≥ 1 − O(c2^{−ng(ǫ,ǫ′)}), c2 > 1,
where g(ǫ, ǫ′) → 0 when ǫ, ǫ′ → 0.
Lemma 3 (Size of typical sets [23]): Given pXY ∈ P(X × Y) we have
(1/n) log ‖T^n_[X]ǫ‖ ≤ H(X) + δ(ǫ), (1/n) log ‖T^n_[X]ǫ‖ ≥ H(X) − δ′(ǫ, n),
where δ(ǫ), δ′(ǫ, n) → 0 when ǫ → 0 and n → ∞. Similarly, for every x^n ∈ X^n we have:
(1/n) log ‖T^n_[Y|X]ǫ(x^n)‖ ≤ H(Y|X) + δ(ǫ)
with δ(ǫ) → 0 as ǫ → 0. In addition, for every x^n ∈ T^n_[X]ǫ′ with ǫ′ < ǫ/‖Y‖ we have:
(1/n) log ‖T^n_[Y|X]ǫ(x^n)‖ ≥ H(Y|X) − δ′(ǫ, ǫ′, n)
where δ′(ǫ, ǫ′, n) → 0 when ǫ, ǫ′ → 0 and n → ∞.
Lemma 4 (Joint typicality lemma [23]): Consider (X, Y, Z) ∼ pXYZ(x, y, z) and (x^n, y^n) ∈ T^n_[XY]ǫ′ with ǫ′ < ǫ/‖Z‖ and ǫ < ǫ′′. If
pZ^n|X^n(z^n|x^n) = 1{z^n ∈ T^n_[Z|X]ǫ′′(x^n)} / ‖T^n_[Z|X]ǫ′′(x^n)‖,
then there exists δ′(ǫ, ǫ′, ǫ′′, n), which goes to zero as ǫ, ǫ′, ǫ′′ → 0 and n → ∞, such that:
2^{−n(I(Y;Z|X)+δ′)} ≤ Pr{Z̃^n ∈ T^n_[Z|XY]ǫ(x^n, y^n)} ≤ 2^{−n(I(Y;Z|X)−δ′)}.
Lemma 5 (Covering Lemma [22]): Let (U, V, X) ∼ pUVX and (x^n, u^n) ∈ T^n_[XU]ǫ′, ǫ′ < ǫ/‖V‖ and ǫ < ǫ′′. Consider also {V^n(m)}_{m=1}^{2^{nR}} random vectors which are independently generated according to
1{v^n ∈ T^n_[V|U]ǫ′′(u^n)} / ‖T^n_[V|U]ǫ′′(u^n)‖.
Then:
Pr{V^n(m) ∉ T^n_[V|UX]ǫ(x^n, u^n) for all m} → 0 as n → ∞   (385)
uniformly for every (x^n, u^n) ∈ T^n_[XU]ǫ′ if:
R > I(V; X|U) + δ(ǫ, ǫ′, ǫ′′, n),   (386)
where δ(ǫ, ǫ′, ǫ′′, n) → 0 when ǫ, ǫ′, ǫ′′ → 0 and n → ∞.
Corollary 1: Assume the conditions in Lemma 5, and also:
Pr{(X^n, U^n) ∈ T^n_[XU]ǫ′} → 1 as n → ∞.   (387)
Then:
Pr{(U^n, X^n, V^n(m)) ∉ T^n_[UXV]ǫ for all m} → 0 as n → ∞   (388)
when (386) is satisfied.
Lemma 6 (Packing Lemma [22]): Let (U1, U2, W, V1, V2, X) ∼ pU1U2WV1V2X, (x^n, w^n, v1^n, v2^n) ∈ T^n_[XWV1V2]ǫ′ with ǫ′ < ǫ/(‖U1‖‖U2‖) and ǫ < min{ǫ1, ǫ2}. Consider random vectors {U1^n(m1)}_{m1=1}^{A1} and {U2^n(m2)}_{m2=1}^{A2} which are independently generated according to
1{ui^n ∈ T^n_[Ui|ViW]ǫi(vi^n, w^n)} / ‖T^n_[Ui|ViW]ǫi(w^n, vi^n)‖, i = 1, 2,
where A1, A2 are positive random variables independent of everything else. Then
Pr{(U1^n(m1), U2^n(m2)) ∈ T^n_[U1U2|XWV1V2]ǫ(x^n, w^n, v1^n, v2^n) for some (m1, m2)} → 0 as n → ∞   (389)
uniformly for every (x^n, w^n, v1^n, v2^n) ∈ T^n_[XWV1V2]ǫ′ provided that:
(log E[A1A2])/n < I(U1; XV2U2|WV1) + I(U2; XV1U1|WV2) − I(U1; U2|XWV1V2) − δ,   (390)
where δ ≡ δ(ǫ, ǫ′, ǫ1, ǫ2, n) → 0 when ǫ, ǫ′, ǫ1, ǫ2 → 0 and n → ∞.
Corollary 2: Assume the conditions in Lemma 6, and also:
Pr{(X^n, W^n, V1^n, V2^n) ∈ T^n_[XWV1V2]ǫ′} → 1 as n → ∞.   (391)
Then:
Pr{(U1^n(m1), U2^n(m2), X^n, W^n, V1^n, V2^n) ∈ T^n_[U1U2XWV1V2]ǫ for some (m1, m2)} → 0 as n → ∞   (392)
when (390) is satisfied.
Lemma 7 (Generalized Markov Lemma [24]): Consider a pmf pUXY belonging to P(X × Y × U) that satisfies the following Markov chain:
Y −− X −− U.
Consider (x^n, y^n) ∈ T^n_[XY]ǫ′ and random vectors U^n generated according to:
Pr{U^n = u^n | x^n, y^n, U^n ∈ T^n_[U|X]ǫ′′(x^n)} = 1{u^n ∈ T^n_[U|X]ǫ′′(x^n)} / ‖T^n_[U|X]ǫ′′(x^n)‖.   (393)
For sufficiently small ǫ, ǫ′, ǫ′′ the following holds uniformly for every (x^n, y^n) ∈ T^n_[XY]ǫ′:
Pr{U^n ∉ T^n_[U|XY]ǫ(x^n, y^n) | x^n, y^n, U^n ∈ T^n_[U|X]ǫ′′(x^n)} = O(c^{−n}),   (394)
where c > 1.
Corollary 3: Assume the conditions in Lemma 7, and also:
Pr{(X^n, Y^n) ∈ T^n_[XY]ǫ′} → 1 as n → ∞,   (395)
and that uniformly for every (x^n, y^n) ∈ T^n_[XY]ǫ′:
Pr{U^n ∉ T^n_[U|X]ǫ′′(x^n) | x^n, y^n} → 0 as n → ∞.   (396)
Then we obtain:
Pr{(U^n, X^n, Y^n) ∈ T^n_[UXY]ǫ} → 1 as n → ∞.   (397)
Lemma 7 and Corollary 3 will be central for us. They guarantee the joint typicality of the descriptions generated at the different encoders under the pmf of the chosen descriptions induced by the coding scheme used. The original proof of this result is given in [4] and involves a combination of rather sophisticated algebraic and combinatorial arguments over finite alphabets. An alternative proof, showing
Pr{(U^n, X^n, Y^n) ∈ T^n_[UXY]ǫ} → 1 as n → ∞,   (398)
was also provided in [22]; it relies strongly on a rather obscure combinatorial result by Uhlmann [25]. In [24] a short and more general proof of this result is given.
We next present a result which will be useful for proving Theorem 1. In order to use the Markov Lemma we need to show that the descriptions induced by the encoding procedure at each node satisfy (393).
Lemma 8 (Encoding induced distribution): Consider a pmf pUXW belonging to P(U × X × W) and ǫ′ ≥ ǫ. Let {U^n(m)}_{m=1}^{S} be random vectors independently generated according to
1{u^n ∈ T^n_[U|W]ǫ′(w^n)} / ‖T^n_[U|W]ǫ′(w^n)‖,
where (W^n, X^n) are generated with an arbitrary distribution. Once these vectors are generated, and given x^n and w^n, we choose one of them if:
(u^n(m), w^n, x^n) ∈ T^n_[UWX]ǫ, for some m ∈ [1 : S].   (399)
If several vectors u^n satisfy this, we choose the one with the smallest index. If none does, we choose an arbitrary one. Let M denote the chosen index. Then we have that:
Pr{U^n(M) = u^n | x^n, w^n, U^n(M) ∈ T^n_[U|XW]ǫ(x^n, w^n)} = 1{u^n ∈ T^n_[U|XW]ǫ(x^n, w^n)} / ‖T^n_[U|XW]ǫ(x^n, w^n)‖.   (400)
Proof: From the selection procedure for M we know that:
M = f({1[(U^n(m), X^n, W^n) ∈ T^n_[UXW]ǫ], m ∈ [1 : S]}),   (401)
where f(·) is an appropriate function. Moreover, because of this and the way in which the random vectors U^n are generated, we have:
U^n(M) −− (1[(U^n(M), X^n, W^n) ∈ T^n_[UXW]ǫ], W^n, X^n) −− M.   (402)
We can write:
Pr{U^n(M) = u^n | x^n, w^n, U^n(M) ∈ T^n_[U|XW]ǫ(x^n, w^n)} = Σ_{m=1}^{S} Pr{M = m | x^n, w^n, U^n(m) ∈ T^n_[U|XW]ǫ(x^n, w^n)} × Pr{U^n(m) = u^n | x^n, w^n, U^n(m) ∈ T^n_[U|XW]ǫ(x^n, w^n), M = m}.   (403)
From (402), the second probability term in the RHS of (403) can be written as:
Pr{U^n(m) = u^n | x^n, w^n, U^n(m) ∈ T^n_[U|XW]ǫ(x^n, w^n)} ∀m.   (404)
We now analyze this term. It is clear that we can write:
Pr{U^n(m) = u^n, U^n(m) ∈ T^n_[U|XW]ǫ(x^n, w^n) | x^n, w^n} = 1{u^n ∈ T^n_[U|WX]ǫ(x^n, w^n)} × Pr{U^n(m) = u^n | x^n, w^n} ∀m.   (405)
This means that
Pr{U^n(m) = u^n | x^n, w^n, U^n(m) ∈ T^n_[U|XW]ǫ(x^n, w^n)} = 1{u^n ∈ T^n_[U|WX]ǫ(x^n, w^n) ∩ T^n_[U|W]ǫ′(w^n)} / (Pr{U^n(m) ∈ T^n_[U|XW]ǫ(x^n, w^n) | x^n, w^n} ‖T^n_[U|W]ǫ′(w^n)‖) ∀m.   (406)
From (406) and the fact that for ǫ′ ≥ ǫ we have T^n_[U|XW]ǫ(x^n, w^n) ⊆ T^n_[U|W]ǫ′(w^n), we obtain:
Pr{U^n(m) = u^n | x^n, w^n, U^n(m) ∈ T^n_[U|XW]ǫ(x^n, w^n)} = 1{u^n ∈ T^n_[U|WX]ǫ(x^n, w^n)} / ‖T^n_[U|XW]ǫ(x^n, w^n)‖ ∀m.   (407)
From this equation and (403) we easily obtain the desired result.
We present, without proof, a useful result about reconstruction functions for lossy source coding problems:
Lemma 9 (Reconstruction functions for degraded random variables [13]): Consider random variables (X, Y, Z) such that X −− Y −− Z. Consider an arbitrary function X̂ = f(Y, Z) and an arbitrary positive distortion function d(·, ·). Then there exists g*(Y) such that
E[d(X, g*(Y))] ≤ E[d(X, f(Y, Z))].   (408)
Finally, we present two lemmas about Markov chains induced by the interactive encoding schemes, which will be relevant for the converse results of this paper.
Lemma 10 (Markov chains induced by interactive encoding of two nodes): Consider a set of three sources (X^n, Y^n, Z^n) ∼ Π_{t=1}^{n} pXYZ(xt, yt, zt) and an integer K ∈ N. For each l ∈ [1 : K] consider arbitrary message sets Ixl, Iyl and arbitrary functions
fxl(X^n, Jx^[1:l−1], Jy^[1:l−1]) = Jxl,   (409)
fyl(Y^n, Jx^[1:l], Jy^[1:l−1]) = Jyl,   (410)
with Jxl ∈ Ixl and Jyl ∈ Iyl. The following Markov chain relations are valid for each t ∈ [1 : n] and l ∈ [1 : K]:
1) (Jx1, X[1:t−1], Y[t+1:n]) −− X[t] −− (Y[t], Z[t]),
2) (Jxl, X[t+1:n]) −− (Jx^[1:l−1], Jy^[1:l−1], X[1:t], Y[t+1:n]) −− (Y[t], Z[t]),
3) (Jyl, Y[1:t−1]) −− (Jx^[1:l], Jy^[1:l−1], X[1:t−1], Y[t:n]) −− (X[t], Z[t]),
4) X[t+1:n] −− (Jx^[1:K], Jy^[1:K], X[1:t], Y[t+1:n]) −− (Y[t], Z[t]),
5) Y[1:t−1] −− (Jx^[1:K], Jy^[1:K], X[1:t−1], Y[t:n]) −− (X[t], Z[t]),
6) (Jx^[1:K], Jy^[1:K], X[1:t−1], X[t+1:n], Z[1:t−1], Z[t+1:n], Y[1:t−1]) −− (X[t], Y[t]) −− Z[t].
Proof: Relations 1), 2) and 3) were obtained in [13]. For completeness we present here a short proof of 2). The proofs of 1) and 3) are similar. For simplicity let us consider A = I(Jxl X[t+1:n]; Y[t] Z[t] | Jx^[1:l−1] Jy^[1:l−1] X[1:t] Y[t+1:n]). We can write the following:
A (a) ≤ I(Jxl X[t+1:n]; Y[1:t] Z[t] | Jx^[1:l−1] Jy^[1:l−1] X[1:t] Y[t+1:n])
(b) = I(X[t+1:n]; Y[1:t] Z[t] | Jx^[1:l−1] Jy^[1:l−1] X[1:t] Y[t+1:n])
= H(X[t+1:n] | Jx^[1:l−1] Jy^[1:l−1] X[1:t] Y[t+1:n]) − H(X[t+1:n] | Jx^[1:l−1] Jy^[1:l−1] X[1:t] Z[t] Y^n)
(c) ≤ I(X[t+1:n]; Y[1:t] Z[t] | Jx^[1:l−1] Jy^[1:l−2] X[1:t] Y[t+1:n])
= H(Y[1:t] Z[t] | Jx^[1:l−1] Jy^[1:l−2] X[1:t] Y[t+1:n]) − H(Y[1:t] Z[t] | Jx^[1:l−1] Jy^[1:l−2] X^n Y[t+1:n])
(d) ≤ I(X[t+1:n]; Y[1:t] Z[t] | Jx^[1:l−2] Jy^[1:l−2] X[1:t] Y[t+1:n]),   (411)
where
• step (a) follows from the non-negativity of mutual information,
• step (b) follows from the fact that Jxl = fxl(X^n, Jx^[1:l−1], Jy^[1:l−1]),
• step (c) follows from the fact that Jy^{l−1} = fy^{l−1}(Y^n, Jx^[1:l−1], Jy^[1:l−2]) and that conditioning reduces entropy,
• step (d) follows from the fact that Jx^{l−1} = fx^{l−1}(X^n, Jx^[1:l−2], Jy^[1:l−2]) and that conditioning reduces entropy.
Continuing this procedure we obtain:
A ≤ I(X[t+1:n]; Y[1:t] Z[t] | X[1:t] Y[t+1:n]) = 0.   (412)
This shows that 2) is true. Relations 4) and 5) are straightforward consequences of 2) and 3): just consider Jx^{K+1} = Jy^{K+1} = ∅. The proof of 6) is straightforward from the fact that (Jx^[1:K], Jy^[1:K]) is only a function of (X^n, Y^n).
Corollary 4: Consider the setting in Lemma 10 with the following modifications:
• X −− Z −− Y,
• f_x^l(X^n, Z^n, J_x^{[1:l−1]}, J_y^{[1:l−1]}) = J_x^l.
The following are true:
1) (J_x^1, Z_{[1:t−1]}, Y_{[t+1:n]}) −− Z_t −− Y_t,
2) (J_x^l, Z_{[t+1:n]}) −− (J_x^{[1:l−1]}, J_y^{[1:l−1]}, Z_{[1:t]}, Y_{[t+1:n]}) −− Y_t,
3) (J_y^l, Y_{[1:t−1]}) −− (J_x^{[1:l]}, J_y^{[1:l−1]}, Z_{[1:t−1]}, Y_{[t:n]}) −− (X_t, Z_t),
4) (Z_{[t+1:n]}, X^n) −− (J_x^{[1:K]}, J_y^{[1:K]}, Z_{[1:t]}, Y_{[t+1:n]}) −− Y_{[1:t]}.
Proof: The proof follows the same lines as that of Lemma 10.
We next consider some Markov chains that arise naturally when we have three interacting nodes and which will be needed for Theorem 6.
Lemma 11 (Markov chains induced by interactive encoding of three nodes): Consider a set of three sources (X^n, Y^n, Z^n) ∼ ∏_{t=1}^{n} p_{XYZ}(x_t, y_t, z_t) and an integer K ∈ N. For each l ∈ [1 : K] consider arbitrary message sets I_x^l, I_y^l, I_z^l and arbitrary functions

f_x^l(X^n, J_x^{[1:l−1]}, J_y^{[1:l−1]}, J_z^{[1:l−1]}) = J_x^l,   (413)
f_y^l(Y^n, J_x^{[1:l]}, J_y^{[1:l−1]}, J_z^{[1:l−1]}) = J_y^l,   (414)
f_z^l(Z^n, J_x^{[1:l]}, J_y^{[1:l]}, J_z^{[1:l−1]}) = J_z^l   (415)

with J_x^l ∈ I_x^l, J_y^l ∈ I_y^l and J_z^l ∈ I_z^l. The following Markov chain relations are valid for each t ∈ [1 : n] and l ∈ [1 : K]:
1) (J_x^1, J_y^1, X_{[1:t−1]}, X_{[t+1:n]}, Y_{[t+1:n]}, Z_{[1:t−1]}) −− (X_t, Y_t) −− Z_t,
2) (J_x^l, J_y^l, Y_{[1:t−1]}) −− (J_x^{[1:l−1]}, J_y^{[1:l−1]}, J_z^{[1:l−1]}, X^n, Y_{[t:n]}, Z_{[1:t−1]}) −− Z_t,
3) (J_z^l, Z_{[t+1:n]}) −− (J_x^{[1:l]}, J_y^{[1:l]}, J_z^{[1:l−1]}, X^n, Y_{[t+1:n]}, Z_{[1:t]}) −− Y_t,
4) Z_{[t+1:n]} −− (J_x^{[1:K]}, J_y^{[1:K]}, J_z^{[1:K]}, X^n, Y_{[t+1:n]}, Z_{[1:t]}) −− Y_t,
5) Y_{[1:t−1]} −− (J_x^{[1:K]}, J_y^{[1:K]}, J_z^{[1:K]}, X^n, Y_{[t:n]}, Z_{[1:t−1]}) −− Z_t.
Proof: Along the same lines as Lemma 10 and for that reason omitted.
[Figure 9: Cooperative Berger-Tung problem. Encoder 1 observes (X_1^n, V_1^n) and sends a message at rate R_1 to both Encoder 2 and the decoder; Encoder 2 observes (X_2^n, V_1^n) and sends a message at rate R_2 to the decoder; the decoder, with side informations (X_3^n, V_1^n, V_2^n), outputs (Û_1^n, Û_2^n).]
APPENDIX B
COOPERATIVE BERGER-TUNG PROBLEM WITH SIDE INFORMATION AT THE DECODER
We derive an inner bound on the rate region of the setup described in Fig. 9. It should be emphasized that we will not consider distortion measures; our only focus is on the exchange of descriptions. Encoders 1 and 2 observe the source sequences X_1^n and X_2^n, and also have access to a common side information V_1^n, whereas the decoder has access to the side informations (X_3^n, V_1^n, V_2^n). Upon observing X_1^n and V_1^n, Encoder 1 generates a message M_1 which is transmitted to Encoder 2 and the decoder. Encoder 2, upon observing (X_2^n, V_1^n) and the message M_1, generates a message M_2 which is transmitted only to the decoder. Finally, the decoder uses the messages (M_1, M_2) and the side informations (X_3^n, V_1^n, V_2^n) to reconstruct two sequences (Û_1^n, Û_2^n) which are jointly typical with (X_1^n, X_2^n, X_3^n, V_1^n, V_2^n). In precise terms, we will assume
the following:
• A probability mass function p_{X1X2X3U1U2V1V2} which takes values on the cartesian product of finite alphabets X_1 × X_2 × X_3 × U_1 × U_2 × V_1 × V_2 and satisfies the following Markov chains:

U_1 −− (X_1, V_1) −− (X_2, X_3, V_2),  U_2 −− (U_1, X_2, V_1) −− (X_1, X_3, V_2).   (416)

• Five random vectors (X_1^n, X_2^n, X_3^n, V_1^n, V_2^n) (not necessarily independently and identically distributed according to p_{X1X2X3V1V2}) which take values on the alphabets X_1^n × X_2^n × X_3^n × V_1^n × V_2^n such that, for every ǫ > 0,

lim_{n→∞} Pr{(X_1^n, X_2^n, X_3^n, V_1^n, V_2^n) ∈ T^n_{[X1X2X3V1V2]ǫ}} = 1.   (417)
Definition 4 (Cooperative code): A code (n, f_1^n, f_2^n, g^n, M_1, M_2) for the setup in Fig. 9 is composed of:
• Two sets of indices M_1, M_2.
• An encoding function f_1^n : X_1^n × V_1^n → M_1, such that f_1^n(x_1^n, v_1^n) = m_1.
• An encoding function f_2^n : X_2^n × V_1^n × M_1 → M_2, such that f_2^n(x_2^n, v_1^n, m_1) = m_2.
• A decoding function g^n : X_3^n × V_1^n × V_2^n × M_1 × M_2 → U_1^n × U_2^n, such that g^n(x_3^n, v_1^n, v_2^n, m_1, m_2) = (û_1^n, û_2^n).
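Purely as a reading aid (and not from the paper), Definition 4 can be transcribed as a small set of function signatures; the names below are illustrative, symbols are integers, and n-length sequences are plain tuples.

```python
from typing import Callable, Tuple
from math import log2

# This only mirrors the shape of Definition 4, nothing more.
Symbol = int
Block = Tuple[Symbol, ...]                                   # an n-length source/codeword block

Encoder1 = Callable[[Block, Block], int]                     # f1^n : X1^n x V1^n -> M1
Encoder2 = Callable[[Block, Block, int], int]                # f2^n : X2^n x V1^n x M1 -> M2
Decoder = Callable[[Block, Block, Block, int, int],
                   Tuple[Block, Block]]                      # g^n : X3^n x V1^n x V2^n x M1 x M2 -> (U1^n, U2^n)

def rate(num_messages: int, n: int) -> float:
    """Per-symbol rate (1/n) log2 |M| of an index set of the given size, as in (418)."""
    return log2(num_messages) / n
```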
Definition 5 (Achievable rates): We say that (R_1, R_2) are ǫ-achievable if there exists a code (n, f_1^n, f_2^n, g^n, M_1, M_2) such that:

(1/n) log ‖M_1‖ ≤ R_1 + ǫ,  (1/n) log ‖M_2‖ ≤ R_2 + ǫ   (418)

and

Pr{(Û_1^n, Û_2^n, V_1^n, V_2^n, X_1^n, X_2^n, X_3^n) ∉ T^n_{[U1U2V1V2X1X2X3]ǫ}} ≤ ǫ.   (419)
The closure of the set of all achievable rates (R1 , R2 ) is denoted by RCBT .
The following theorem presents an inner bound to RCBT .
Theorem 9 (Inner bound on the rate region of the cooperative Berger-Tung problem): Consider R^{inner}_{CBT}, the closure of the set of rates satisfying:

R_1 > I(X_1; U_1 | X_2 V_1),
R_2 > I(X_2; U_2 | X_3 V_1 V_2 U_1),
R_1 + R_2 > I(X_1 X_2; U_1 U_2 | X_3 V_1 V_2),

where the union is over all probability distributions verifying (416). Then R^{inner}_{CBT} ⊆ R_{CBT}.
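As an illustration (not part of the original argument), the three bounds can be evaluated numerically for any pmf satisfying (416). The sketch below assumes binary alphabets, degenerate side informations V_1 = V_2 = 0 (so the conditioning on V_1, V_2 is trivial), and an illustrative test-channel choice for (U_1, U_2) built so that the two chains in (416) hold; all numerical values are placeholders.

```python
from itertools import product
from math import log2
from collections import defaultdict

def h(p):                       # entropy of a distribution given as a dict of probabilities
    return -sum(q * log2(q) for q in p.values() if q > 0)

def marginal(joint, idx):       # marginalize a joint dict keyed by tuples onto the indices idx
    m = defaultdict(float)
    for k, q in joint.items():
        m[tuple(k[i] for i in idx)] += q
    return m

def cond_mi(joint, A, B, C):    # I(A; B | C), with A, B, C lists of positions in the key tuples
    return (h(marginal(joint, A + C)) + h(marginal(joint, B + C))
            - h(marginal(joint, A + B + C)) - h(marginal(joint, C)))

# Toy source: X1 uniform, X2 = X1 flipped w.p. 0.1, X3 = X2 flipped w.p. 0.2.
# Descriptions: U1 = X1 flipped w.p. 0.25 (depends only on X1), U2 = X2 flipped w.p. 0.25,
# so U1 -- (X1,V1) -- (X2,X3,V2) and U2 -- (U1,X2,V1) -- (X1,X3,V2) both hold.
def flip(a, b, eps):
    return 1 - eps if a == b else eps

joint = {}
for x1, x2, x3, u1, u2 in product((0, 1), repeat=5):
    joint[(x1, x2, x3, u1, u2)] = (0.5 * flip(x2, x1, 0.1) * flip(x3, x2, 0.2)
                                   * flip(u1, x1, 0.25) * flip(u2, x2, 0.25))

X1, X2, X3, U1, U2 = [0], [1], [2], [3], [4]
print("R1      >", cond_mi(joint, X1, U1, X2))              # I(X1; U1 | X2 V1)
print("R2      >", cond_mi(joint, X2, U2, X3 + U1))         # I(X2; U2 | X3 V1 V2 U1)
print("R1 + R2 >", cond_mi(joint, X1 + X2, U1 + U2, X3))    # I(X1 X2; U1 U2 | X3 V1 V2)
```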
Remark 17: Notice that we are not asking for (X_1^n, X_2^n, X_3^n, V_1^n, V_2^n) to be independently and identically distributed. This is in fact not needed for the result that follows. For us, when trying to use this result, the case of most interest will be the one in which (X_1^n, X_2^n, X_3^n) is generated using the product measure ∏_{i=1}^{n} p_{X1X2X3}(x_{1i}, x_{2i}, x_{3i}) (that is, when (X_1, X_2, X_3) is a DMS). However, (V_1^n, V_2^n) will not be independently and identically distributed. Still, (417) will be satisfied.
Remark 18: Notice that, unlike the classical rate-distortion problem, we are not interested in average per-symbol distortion constraints at the decoder. We only require that the obtained sequences be jointly typical with the sources. Clearly the problem can be slightly modified to consider the case in which reconstruction distortion constraints are of interest. In fact, case (C) reported in [26] considers a similar setting. Here, given the importance of this result for our interactive scheme, we present a slightly different and more direct proof of the achievability, where we discuss the key points in the encoding and decoding procedures which will be relevant for our extension to the interactive problem.
Proof: Our proof uses standard ideas from multiterminal source coding. As V_1^n is common to both encoders and the decoder, we can set without loss of generality V_1^n = ∅. By conditioning the final expressions on V_1, the situation in which V_1^n ≠ ∅ can be taken into account.
A. Codebook generation
We randomly generate 2^{nR̂_1} codewords U_1^n(k), k ∈ [1 : 2^{nR̂_1}], according to

U_1^n(k) ∼ 1{u_1^n ∈ T^n_{[U1]ǫcd}} / ‖T^n_{[U1]ǫcd}‖, ǫcd > 0.   (420)

These 2^{nR̂_1} codewords are distributed uniformly over 2^{nR_1} bins denoted by B_1(m_1), where m_1 ∈ [1 : 2^{nR_1}]. For each codeword u_1^n(k), with k ∈ [1 : 2^{nR̂_1}], we randomly generate 2^{nR̂_2} codewords according to:

U_2^n(l, k) ∼ 1{u_2^n ∈ T^n_{[U2|U1]ǫcd}(u_1^n(k))} / ‖T^n_{[U2|U1]ǫcd}(u_1^n(k))‖, ǫcd > 0   (421)

with l ∈ [1 : 2^{nR̂_2}]. The 2^{n(R̂_1+R̂_2)} codewords generated are distributed uniformly over 2^{nR_2} bins, denoted by B_2(m_2), m_2 ∈ [1 : 2^{nR_2}]. It is worth mentioning that the codewords {U_2^n(l, k)} are not distributed in a different bin structure for each k, but in only one super-bin structure of size 2^{n(R̂_1+R̂_2)}/2^{nR_2}, where B_2(m_2) does not need to be indexed by k. As will be clear, this does not constrain the decoder to use successive decoding; it can instead use joint decoding in order to recover the desired codewords (Û_1^n, Û_2^n). Finally, all codebooks are revealed to all parties.
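The binning bookkeeping above can be made concrete with a small sketch (ours, not the paper's). It draws the codewords i.i.d. Bernoulli(1/2) instead of uniformly over the typical sets, which is enough to show the codebook and bin data structures and the single super-bin structure shared by all pairs (l, k); the block length and rates are illustrative.

```python
import random

random.seed(0)
n = 12
R1_hat, R1 = 0.5, 0.25          # 2^{n R1_hat} codewords u1, 2^{n R1} bins B1(m1)
R2_hat, R2 = 0.5, 0.25          # 2^{n R2_hat} codewords u2 per u1, 2^{n R2} super-bins B2(m2)
N1_hat, N1 = 2 ** round(n * R1_hat), 2 ** round(n * R1)
N2_hat, N2 = 2 ** round(n * R2_hat), 2 ** round(n * R2)

# Codebook for U1: one codeword per index k, plus a uniform bin assignment k -> m1.
u1 = {k: tuple(random.randint(0, 1) for _ in range(n)) for k in range(N1_hat)}
bin1 = {k: random.randrange(N1) for k in range(N1_hat)}

# Codebook for U2: one codeword per pair (l, k). All pairs are thrown into one super-bin
# structure; the bin index m2 does not depend on k separately, exactly as described above.
u2 = {(l, k): tuple(random.randint(0, 1) for _ in range(n))
      for k in range(N1_hat) for l in range(N2_hat)}
bin2 = {(l, k): random.randrange(N2) for (l, k) in u2}

def B1(m1):                     # all U1-indices k whose codeword lies in bin m1
    return [k for k in u1 if bin1[k] == m1]

def B2(m2):                     # all pairs (l, k) whose codeword lies in super-bin m2
    return [lk for lk in u2 if bin2[lk] == m2]

print(len(B1(0)), "codewords in B1(0);", len(B2(0)), "pairs in B2(0)")
```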
B. Encoding at node 1
Given x_1^n, the encoder searches for k ∈ [1 : 2^{nR̂_1}] such that:

(x_1^n, u_1^n(k)) ∈ T^n_{[X1U1]ǫ2}, ǫ2 > 0.   (422)

If more than one index satisfies this condition, then we choose the one with the smallest index. Otherwise, if no such index exists, we choose an arbitrary one and declare an error. Finally, we select m_1 as the index of the bin which contains the codeword u_1^n(k) found and transmit it to nodes 2 and 3.
C. Decoding at node 2
Given x_2^n and m_1, we search in the bin B_1(m_1) for an index k ∈ [1 : 2^{nR̂_1}] such that:

(x_2^n, u_1^n(k)) ∈ T^n_{[X2U1]ǫ3}, ǫ3 > 0.   (423)

If there is exactly one index that satisfies this, we declare it as the index generated at node 1. If there are several or none, we choose a predefined one and declare an error. The chosen index is denoted by k̂(2).
D. Encoding at node 2
Given x_2^n and k̂(2), we search for l ∈ [1 : 2^{nR̂_2}] such that:

(x_2^n, u_1^n(k̂(2)), u_2^n(l, k̂(2))) ∈ T^n_{[X2U1U2]ǫ4}, ǫ4 > 0.   (424)

If more than one index satisfies this condition, then we choose the one with the smallest index. Otherwise, if no such index exists, we choose an arbitrary one and declare an error. Finally, we select m_2 as the index of the bin which contains the codeword u_2^n(l, k̂(2)) selected and transmit it to node 3.
E. Decoding at node 3
Given x_3^n, v_2^n and m_1, m_2, the decoder searches in the bins B_1(m_1) and B_2(m_2) for a pair of indices (k, l) ∈ [1 : 2^{nR̂_1}] × [1 : 2^{nR̂_2}] such that

(x_3^n, v_2^n, u_1^n(k), u_2^n(l, k)) ∈ T^n_{[X3V2U1U2]ǫ}, ǫ > 0.   (425)

If there is exactly one pair of indices that satisfies this, we declare it as the pair of indices generated at nodes 1 and 2. If there are several or none, we choose a predefined pair and declare an error. The chosen pair is denoted by (k̂(3), l̂(3)). Finally, the decoder declares (û_1^n, û_2^n) = (u_1^n(k̂(3)), u_2^n(l̂(3), k̂(3))).
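A schematic sketch of this joint decoding rule (again ours, with illustrative names) is given below: it walks both bins, keeps the pairs that pass a typicality test supplied as a callable, and declares an error when the jointly typical pair is not unique. The exact typicality test (robust typicality with parameter ǫ, as defined in Appendix A) is abstracted away; only the bin bookkeeping is shown.

```python
def decode_node3(m1, m2, x3, v2, u1, u2, B1, B2, is_typical):
    """Return (k_hat, l_hat, error_flag) for the decoding rule around (425).

    u1, u2 are the codebooks and B1, B2 the bin maps of the previous sketch;
    is_typical(x3, v2, u1_word, u2_word) is whatever typicality test is in force.
    """
    candidates = []
    for k in B1(m1):                            # k must lie in bin B1(m1)
        for (l, kk) in B2(m2):                  # the pair (l, k) must lie in super-bin B2(m2)
            if kk != k:
                continue
            if is_typical(x3, v2, u1[k], u2[(l, k)]):
                candidates.append((k, l))
    if len(candidates) == 1:                    # a unique jointly typical pair: declare it
        return candidates[0][0], candidates[0][1], False
    return 0, 0, True                           # none or several: predefined pair + error
```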
F. Error probability analysis
Consider (K, L) the description indices generated at nodes 1 and 2, and (M_1, M_2) the corresponding bin indices. With K̂(2) and (K̂(3), L̂(3)) we denote the indices recovered at nodes 2 and 3. We want to prove that Pr{E} ≤ ǫ′ when n is sufficiently large, where

E = {(X_1^n, X_2^n, X_3^n, V_2^n, U_1^n(K̂(3)), U_2^n(L̂(3), K̂(3))) ∉ T^n_{[X1X2X3V2U1U2]ǫ}}.   (426)
We consider the following error events:
• E_1 = {(X_1^n, X_2^n, X_3^n, V_2^n) ∉ T^n_{[X1X2X3V2]ǫ1}}, ǫ1 > 0.
• E_2 = {(X_1^n, U_1^n(k)) ∉ T^n_{[X1U1]ǫ2} ∀k ∈ [1 : 2^{nR̂_1}]}, ǫ2 > 0.
• E_3 = {(X_1^n, X_2^n, X_3^n, V_2^n, U_1^n(K)) ∉ T^n_{[X1X2X3V2U1]ǫ3}}, ǫ3 > 0.
• E_4 = {∃k̂ ≠ K, k̂ ∈ B_1(M_1), (X_2^n, U_1^n(k̂)) ∈ T^n_{[X2U1]ǫ3}}, ǫ3 > 0.
• E_5 = {(X_2^n, U_1^n(K̂(2)), U_2^n(l, K̂(2))) ∉ T^n_{[X2U1U2]ǫ4} ∀l ∈ [1 : 2^{nR̂_2}]}, ǫ4 > 0.
• E_6 = {(X_1^n, X_2^n, X_3^n, V_2^n, U_1^n(K), U_2^n(L, K̂(2))) ∉ T^n_{[X1X2X3V2U1U2]ǫ}}, ǫ > 0.
• E_7 = {∃k̂ ≠ K, l̂ ≠ L, k̂ ∈ B_1(M_1), (k̂, l̂) ∈ B_2(M_2), (X_3^n, V_2^n, U_1^n(k̂), U_2^n(l̂, k̂)) ∈ T^n_{[X3V2U1U2]ǫ}}.
Clearly E ⊆ ∪_{i=1}^{7} E_i. In fact, it is easy to show that {(K̂(3), L̂(3)) ≠ (K, L)} ∪ {K̂(2) ≠ K} ⊆ ∪_{i=1}^{7} E_i. From the hypothesis, we obtain that lim_{n→∞} Pr{E_1} = 0. Choosing ǫ1 < ǫ2/‖U_1‖ and ǫ2 < ǫcd, we can use Lemma 5 and its Corollary (with the following equivalences: V ≡ U_1, X ≡ X_1, U ≡ ∅) to obtain lim_{n→∞} Pr{E_2} = 0 if

R̂_1 > I(U_1; X_1) + δ(ǫ1, ǫ2, ǫcd, n).   (427)

For the analysis of Pr{E_3} we can use Lemma 7, its Corollary and Lemma 8, defining Y ≡ X_2 X_3, X ≡ X_1 and U ≡ U_1 and using ǫ2, ǫ3 and ǫcd sufficiently small^13, to obtain lim_{n→∞} Pr{E_3} = 0.
For the analysis of Pr{E_4} we can write:

Pr{E_4} = E[Pr{E_4 | K = k, M_1 = m_1}]
       = E[ Pr{ ∪_{k̂≠k, k̂∈B_1(m_1)} {(X_2^n, U_1^n(k̂)) ∈ T^n_{[X2U1]ǫ3}} | K = k, M_1 = m_1 } ].   (428)

^13 In the following we will no longer indicate the corresponding values of the constants ǫ, the arguments of δ, or the equivalences between the involved random variables needed in order to use the lemmas from Appendix A.
Using Lemma 6 (with the appropriate equivalences on the involved random variables) and the statistical properties of the codebooks, binning and encoding, we have that, for each k, m_1:

lim_{n→∞} Pr{ ∪_{k̂≠k, k̂∈B_1(m_1)} {(X_2^n, U_1^n(k̂)) ∈ T^n_{[X2U1]ǫ3}} | K = k, M_1 = m_1 } = 0   (429)

provided that

(1/n) log E‖B_1(m_1)‖ < I(X_2; U_1) − δ(ǫ1, ǫ3, ǫcd, n).   (430)

As E[‖B_1(m_1)‖] = 2^{n(R̂_1−R_1)} ∀m_1, we have that lim_{n→∞} Pr{E_4} = 0 provided that

R̂_1 − R_1 < I(X_2; U_1) − δ.   (431)
The analysis of Pr{E_5} follows the same lines as that of Pr{E_2}. The above analysis implies that

lim_{n→∞} Pr{(X_2^n, U_1^n(K̂(2))) ∈ T^n_{[X2U1]ǫ3}} = 1.   (432)

Then, by Lemma 5 and its Corollary, we have that lim_{n→∞} Pr{E_5} = 0 if:

R̂_2 > I(X_2; U_2 | U_1) + δ.   (433)
From Lemmas 7 and 8, similarly as with Pr{E_3}, we have lim_{n→∞} Pr{E_6} = 0. Let us turn to analyze Pr{E_7}:

Pr{E_7} = E[Pr{E_7 | K = k, L = l, M_1 = m_1, M_2 = m_2}]
       = E[ Pr{ ∪_{(k̂,l̂)≠(k,l), k̂∈B_1(m_1), (k̂,l̂)∈B_2(m_2)} {(X_3^n, V_2^n, U_1^n(k̂), U_2^n(l̂, k̂)) ∈ T^n_{[X3V2U1U2]ǫ}} | K = k, L = l, M_1 = m_1, M_2 = m_2 } ]
       ≤ E[α_1 + α_2 + α_3]   (434)

where

α_1 = Pr{ ∪_{k̂≠k, k̂∈B_1(m_1), (k̂,l)∈B_2(m_2)} {(X_3^n, V_2^n, U_1^n(k̂), U_2^n(l, k̂)) ∈ T^n_{[X3V2U1U2]ǫ}} | K = k, L = l, M_1 = m_1, M_2 = m_2 },   (435)
α_2 = Pr{ ∪_{l̂≠l, (k,l̂)∈B_2(m_2)} {(X_3^n, V_2^n, U_1^n(k), U_2^n(l̂, k)) ∈ T^n_{[X3V2U1U2]ǫ}} | K = k, L = l, M_1 = m_1, M_2 = m_2 },   (436)
α_3 = Pr{ ∪_{k̂≠k, l̂≠l, k̂∈B_1(m_1), (k̂,l̂)∈B_2(m_2)} {(X_3^n, V_2^n, U_1^n(k̂), U_2^n(l̂, k̂)) ∈ T^n_{[X3V2U1U2]ǫ}} | K = k, L = l, M_1 = m_1, M_2 = m_2 }.   (437)

We can use Lemma 6 to obtain:

lim_{n→∞} α_1 = 0, lim_{n→∞} α_2 = 0, lim_{n→∞} α_3 = 0   (438)
provided that

(1/n) log E|{k̂ : (k̂, l) ∈ B_2(m_2), k̂ ∈ B_1(m_1)}| < I(X_3 V_2; U_1 U_2) − δ,   (439)
(1/n) log E|{l̂ : (k, l̂) ∈ B_2(m_2)}| < I(X_3 V_2; U_2 | U_1) − δ,   (440)
(1/n) log E|{(k̂, l̂) : (k̂, l̂) ∈ B_2(m_2), k̂ ∈ B_1(m_1)}| < I(X_3 V_2; U_1 U_2) − δ.   (441)

Because of how the binning is performed, we have:

E|{k̂ : (k̂, l) ∈ B_2(m_2), k̂ ∈ B_1(m_1)}| = 2^{n(R̂_1−R_1−R_2)},   (442)
E|{l̂ : (k, l̂) ∈ B_2(m_2)}| = 2^{n(R̂_2−R_2)},   (443)
E|{(k̂, l̂) : (k̂, l̂) ∈ B_2(m_2), k̂ ∈ B_1(m_1)}| = 2^{n(R̂_1+R̂_2−R_1−R_2)},   (444)

which give us:

(R̂_1 − R_1) − R_2 < I(X_3 V_2; U_1 U_2) − δ,   (445)
R̂_2 − R_2 < I(X_3 V_2; U_2 | U_1) − δ,   (446)
(R̂_1 + R̂_2) − (R_1 + R_2) < I(X_3 V_2; U_1 U_2) − δ.   (447)
Notice that equation (445) remains inactive because of (447). Equations (427), (431), (433), (446) and (447) can be combined with:

R̂_1 > R_1,   (448)
R̂_1 + R̂_2 > R_2,   (449)

which follow from the binning structure assumed in the generated codebooks. A Fourier-Motzkin elimination procedure can be performed to eliminate R̂_1 and R̂_2, obtaining the desired rate region (conditioning also the mutual information terms on V_1).
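For completeness, the following is a short sketch (ours) of how the elimination goes, with the δ terms dropped and V_1 = ∅; the conditioning on V_1 is restored at the end as indicated above:

R_1 > R̂_1 − I(X_2; U_1) > I(X_1; U_1) − I(X_2; U_1) = I(X_1; U_1 | X_2),
R_2 > R̂_2 − I(X_3 V_2; U_2 | U_1) > I(X_2; U_2 | U_1) − I(X_3 V_2; U_2 | U_1) = I(X_2; U_2 | X_3 V_2 U_1),
R_1 + R_2 > R̂_1 + R̂_2 − I(X_3 V_2; U_1 U_2) > I(X_1 X_2; U_1 U_2) − I(X_3 V_2; U_1 U_2) = I(X_1 X_2; U_1 U_2 | X_3 V_2),

where the first inequality in each line uses (431), (446) and (447), the second uses (427), (433) and the identity I(X_1 X_2; U_1 U_2) = I(X_1; U_1) + I(X_2; U_2 | U_1) (which holds under (416) with V_1 = ∅), and the final equalities use the Markov chains U_1 −− X_1 −− (X_2, X_3, V_2) and (U_1, U_2) −− (X_1, X_2) −− (X_3, V_2) implied by (416).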
The following Corollary considers the case in which a genie gives node 2 the value of M1 .
Indeed, this case will be important for our main result.
Corollary 5: If a genie gives M_1 to node 2, the achievable region R^{inner}_{CBT} reduces to:

R_2 > I(X_2; U_2 | X_3 V_1 V_2 U_1),   (450)
R_1 + R_2 > I(X_1 X_2; U_1 U_2 | X_3 V_1 V_2).   (451)
The proof of this result is straightforward and thus it will not be presented.
APPENDIX C
PROOF OF THEOREM 1

Let us describe the codebook generation, encoding and decoding procedures. We will consider the following notation. With M_{i→S,l} we will denote the index corresponding to the true description U^n_{i→S,l} generated at node i at round l and destined to the group of nodes S ∈ C(M) with i ∉ S. With M̂_{i→S,l}(j), where S ∈ C(M), i ∉ S, j ∈ S, we denote the corresponding estimated index at node j.
A. Codebook generation
Consider the round l ∈ [1 : K]. For simplicity let us consider the descriptions at node 1. We generate 2^{nR̂^{(l)}_{1→23}} independent and identically distributed n-length codewords U^n_{1→23,l}(m_{1→23,l}, m_{W[1,l]}) according to:

U^n_{1→23,l}(m_{1→23,l}, m_{W[1,l]}) ∼ 1{u^n_{1→23,l} ∈ T^n_{[U_{1→23,l}|W_{[1,l]}]ǫ(1,23,l)}(w^n_{[1,l]})} / ‖T^n_{[U_{1→23,l}|W_{[1,l]}]ǫ(1,23,l)}(w^n_{[1,l]})‖, ǫ(1,23,l) > 0,   (452)

where m_{1→23,l} ∈ [1 : 2^{nR̂^{(l)}_{1→23}}] and m_{W[1,l]} denotes the indices of the common descriptions W_{[1,l]} generated in rounds t ∈ [1 : l−1]. For example, m_{W[1,l]} = {m_{1→23,t}, m_{2→13,t}, m_{3→12,t}}_{t=1}^{l−1}. With w^n_{[1,l]} we denote the set of n-length common information codewords from previous rounds corresponding to the indices m_{W[1,l]}. For each m_{W[3,l−1]} consider the set of 2^{n(R̂^{(l)}_{1→23}+R̂^{(l−1)}_{3→12})} codewords U^n_{1→23,l}(m_{1→23,l}, m_{3→12,l−1}, m_{W[3,l−1]}). These n-length codewords are distributed independently and uniformly over 2^{nR^{(l)}_{1→23}} bins denoted by B_{1→23,l}(p_{1→23,l}, m_{W[3,l−1]}), with p_{1→23,l} ∈ [1 : 2^{nR^{(l)}_{1→23}}]. Notice that this binning structure is exactly the same as the one used for the cooperative Berger-Tung problem in Appendix B. Node 1 distributes the codewords U^n_{1→23,l}(m_{1→23,l}, m_{3→12,l−1}, m_{W[3,l−1]}) in a super-binning structure. This will allow node 2 to recover both m_{1→23,l} and m_{3→12,l−1} using the same procedure as in the Berger-Tung problem described above. Notice that a different super-binning structure is generated for every m_{W[3,l−1]}. This is without loss of generality because, at round l, nodes 1, 2 and 3 will have a very good estimate of it (see below).
We also generate 2^{nR̂^{(l)}_{1→2}} and 2^{nR̂^{(l)}_{1→3}} independent and identically distributed n-length codewords U^n_{1→2,l}(m_{1→2,l}, m_{W[2,l]}, m_{V[12,l,1]}) and U^n_{1→3,l}(m_{1→3,l}, m_{W[2,l]}, m_{V[13,l,1]}) according to:

U^n_{1→2,l}(m_{1→2,l}, m_{W[2,l]}, m_{V[12,l,1]}) ∼ 1{u^n_{1→2,l} ∈ T^n_{[U_{1→2,l}|W_{[2,l]}V_{[12,l,1]}]ǫ(1,2,l)}(w^n_{[2,l]}, v^n_{[12,l,1]})} / ‖T^n_{[U_{1→2,l}|W_{[2,l]}V_{[12,l,1]}]ǫ(1,2,l)}(w^n_{[2,l]}, v^n_{[12,l,1]})‖,   (453)
U^n_{1→3,l}(m_{1→3,l}, m_{W[2,l]}, m_{V[13,l,1]}) ∼ 1{u^n_{1→3,l} ∈ T^n_{[U_{1→3,l}|W_{[2,l]}V_{[13,l,1]}]ǫ(1,3,l)}(w^n_{[2,l]}, v^n_{[13,l,1]})} / ‖T^n_{[U_{1→3,l}|W_{[2,l]}V_{[13,l,1]}]ǫ(1,3,l)}(w^n_{[2,l]}, v^n_{[13,l,1]})‖,   (454)

where ǫ(1,2,l) > 0, ǫ(1,3,l) > 0, m_{1→2,l} ∈ [1 : 2^{nR̂^{(l)}_{1→2}}] and m_{1→3,l} ∈ [1 : 2^{nR̂^{(l)}_{1→3}}]. These codewords are distributed uniformly over 2^{nR^{(l)}_{1→2}} bins denoted by B_{1→2,l}(p_{1→2,l}, m_{W[2,l]}, m_{V[12,l,1]}), indexed with p_{1→2,l} ∈ [1 : 2^{nR^{(l)}_{1→2}}], and over 2^{nR^{(l)}_{1→3}} bins denoted by B_{1→3,l}(p_{1→3,l}, m_{W[2,l]}, m_{V[13,l,1]}), indexed with p_{1→3,l} ∈ [1 : 2^{nR^{(l)}_{1→3}}], respectively. Notice that these codewords (which will be used to generate private descriptions for nodes 2 and 3) are not distributed in a super-binning structure. This is because there is no explicit cooperation between the nodes at this level. That is, node 2 is not compelled to recover the private description that node 1 generates for node 3, and for that reason the private description that node 2 generates for node 3 is not superimposed over the former. Notice that the binning structure used for the codewords to be utilized by node 1 imposes the following relationships:

R^{(l)}_{1→23} < R̂^{(l)}_{1→23} + R̂^{(l−1)}_{3→12},   (455)
R^{(l)}_{1→2} < R̂^{(l)}_{1→2},   (456)
R^{(l)}_{1→3} < R̂^{(l)}_{1→3}.   (457)

The common and private codewords to be utilized at nodes 2 and 3, for every round, are generated following a similar procedure, and their corresponding rates satisfy analogous relationships. After this is finished, the generated codebooks are revealed to all the nodes in the network.
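As in Appendix B, the super-binning structure for the common descriptions can be pictured with a small sketch (ours, with illustrative sizes): for each fixed common-history index m_{W[3,l−1]}, the pairs (m_{1→23,l}, m_{3→12,l−1}) are thrown uniformly into the bins, and node 2 later searches one such bin for the jointly typical pair.

```python
import random

random.seed(1)
# Illustrative sizes: 2^{n R̂^{(l)}_{1->23}}, 2^{n R̂^{(l-1)}_{3->12}}, 2^{n R^{(l)}_{1->23}} bins,
# and a handful of possible common-history indices m_W = m_{W[3,l-1]}.
N_1to23_hat, N_3to12_hat, N_bins, N_W = 16, 8, 4, 3

superbin = {m_W: {(m1, m3): random.randrange(N_bins)
                  for m1 in range(N_1to23_hat) for m3 in range(N_3to12_hat)}
            for m_W in range(N_W)}

def bin_index(m1_23, m3_12, m_W):          # p_{1->23,l} transmitted by node 1
    return superbin[m_W][(m1_23, m3_12)]

def bin_members(p, m_W):                   # the pairs node 2 searches when it receives p
    return [pair for pair, b in superbin[m_W].items() if b == p]
```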
B. Encoding technique
Consider node 1 at round l ∈ [1 : K]. Upon observing x_1^n, and given all of its encoding and decoding history up to round l, encoder 1 first looks for a codeword u^n_{1→23,l}(m_{1→23,l}, m̂_{W[1,l]}(1)) such that, for ǫ_c(1,23,l) > 0,

(x_1^n, w^n_{[1,l]}(m̂_{W[1,l]}(1)), u^n_{1→23,l}(m_{1→23,l}, m̂_{W[1,l]}(1))) ∈ T^n_{[U_{1→23,l}X_1W_{[1,l]}]ǫ_c(1,23,l)}.   (458)

Notice that some components in m̂_{W[1,l]}(1) are generated at node 1 and are perfectly known. If more than one codeword satisfies this condition, then we choose the one with the smallest index. Otherwise, if no such codeword exists, we choose an arbitrary index and declare an error. With the chosen index m_{1→23,l}, and with m̂_{W[3,l−1]}(1), we determine the index p_{1→23,l} of the bin B_{1→23,l}(p_{1→23,l}, m̂_{W[3,l−1]}(1)) to which u^n_{1→23,l}(m_{1→23,l}, m̂_{3→12,l−1}(1), m̂_{W[3,l−1]}(1)) belongs. After this, encoder 1 generates the private descriptions by looking for codewords u^n_{1→2,l}(m_{1→2,l}, m̂_{W[2,l]}(1), m̂_{V[12,l,1]}(1)) and u^n_{1→3,l}(m_{1→3,l}, m̂_{W[2,l]}(1), m̂_{V[13,l,1]}(1)) such that

(x_1^n, w^n_{[2,l]}(m̂_{W[2,l]}(1)), v^n_{[12,l,1]}(m̂_{V[12,l,1]}(1)), u^n_{1→2,l}(m_{1→2,l}, m̂_{W[2,l]}(1), m̂_{V[12,l,1]}(1))) ∈ T^n_{[U_{1→2,l}X_1W_{[2,l]}V_{[12,l,1]}]ǫ_c(1,2,l)},   (459)
(x_1^n, w^n_{[2,l]}(m̂_{W[2,l]}(1)), v^n_{[13,l,1]}(m̂_{V[13,l,1]}(1)), u^n_{1→3,l}(m_{1→3,l}, m̂_{W[2,l]}(1), m̂_{V[13,l,1]}(1))) ∈ T^n_{[U_{1→3,l}X_1W_{[2,l]}V_{[13,l,1]}]ǫ_c(1,3,l)},   (460)

respectively, where ǫ_c(1,2,l) > 0 and ǫ_c(1,3,l) > 0. Given (m̂_{W[2,l]}(1), m̂_{V[12,l,1]}(1), m̂_{V[13,l,1]}(1)), the encoding procedure continues by determining the bin indices p_{1→2,l} and p_{1→3,l} to which the generated private descriptions belong. Node 1 then transmits to nodes 2 and 3 the indices (p_{1→23,l}, p_{1→2,l}, p_{1→3,l}). The encoding at nodes 2 and 3 follows along the same lines and for that reason is not described.
C. Decoding technique
Consider round l ∈ [1 : K + 1] and node 2. During the present and the previous round node 2 has received (p_{1→23,l}, p_{3→12,l−1}, p_{1→2,l}, p_{1→3,l}, p_{3→1,l−1}, p_{3→2,l−1}). However, only the indices (p_{1→23,l}, p_{3→12,l−1}, p_{1→2,l}, p_{3→2,l−1}) are relevant to it. Knowing this set of indices, node 2 aims to recover the exact values of (m_{1→23,l}, m_{3→12,l−1}, m_{1→2,l}, m_{3→2,l−1}). This is done through successive decoding where, first, the common information indices are recovered by looking for the unique pair of codewords u^n_{1→23,l}(m_{1→23,l}, m_{3→12,l−1}, m̂_{W[3,l−1]}(2)), u^n_{3→12,l−1}(m_{3→12,l−1}, m̂_{W[3,l−1]}(2)) that satisfies

(x_2^n, w^n_{[3,l−1]}(m̂_{W[3,l−1]}(2)), v^n_{[12,l,1]}(m̂_{V[12,l,1]}(2)), v^n_{[23,l−1,3]}(m̂_{V[23,l−1,3]}(2)), u^n_{1→23,l}(m_{1→23,l}, m_{3→12,l−1}, m̂_{W[3,l−1]}(2)), u^n_{3→12,l−1}(m_{3→12,l−1}, m̂_{W[3,l−1]}(2))) ∈ T^n_{[U_{1→23,l}U_{3→12,l−1}X_2W_{[3,l−1]}V_{[23,l−1,3]}V_{[12,l,1]}]ǫ_{dc}(2,l)}, ǫ_{dc}(2,l) > 0,   (461)

and that also belongs to the bins indicated by p_{1→23,l} and p_{3→12,l−1}. If there is more than one pair of codewords, or none, that satisfies this, we choose a predefined one and declare an error. After this is done, node 2 can recover the private information indices by looking for codewords u^n_{1→2,l}(m_{1→2,l}, m̂_{W[2,l]}(2), m̂_{V[12,l,1]}(2)) and u^n_{3→2,l−1}(m_{3→2,l−1}, m̂_{W[1,l]}(2), m̂_{V[23,l−1,3]}(2)) which satisfy

(x_2^n, w^n_{[2,l]}(m̂_{W[2,l]}(2)), v^n_{[12,l,1]}(m̂_{V[12,l,1]}(2)), v^n_{[23,l−1,3]}(m̂_{V[23,l−1,3]}(2)), u^n_{1→2,l}(m_{1→2,l}, m̂_{W[2,l]}(2), m̂_{V[12,l,1]}(2)), u^n_{3→2,l−1}(m_{3→2,l−1}, m̂_{W[1,l]}(2), m̂_{V[23,l−1,3]}(2))) ∈ T^n_{[U_{1→2,l}U_{3→2,l−1}X_2W_{[2,l]}V_{[23,l−1,3]}V_{[12,l,1]}]ǫ_{dp}(2,l)}, ǫ_{dp}(2,l) > 0,   (462)

and which are in the bins given by p_{1→2,l} and p_{3→2,l−1}. If there is more than one pair of codewords, or none, that satisfies this, we choose a predefined one and declare an error. The decoding at nodes 1 and 3 is exactly the same and for that reason is not described.
D. Lossy reconstructions
When the exchange of information is completed, each node needs to estimate the other nodes' sources. For instance, node 1 reconstructs the source of node 2 by computing:

x̂_{12,i} = g_{12}(x_{1i}, v_{[12,K+1,1]i}, w_{[1,K+1]i}), i = 1, 2, . . . , n,   (463)

and similarly, for the source of node 3:

x̂_{13,i} = g_{13}(x_{1i}, v_{[13,K+1,1]i}, w_{[1,K+1]i}), i = 1, 2, . . . , n.   (464)

Reconstruction at nodes 2 and 3 is done in a similar way using the adequate reconstruction functions.
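In code form this step is a plain per-symbol map; the brief sketch below (ours) only fixes the bookkeeping of (463)-(464), with g12 and g13 passed in as whatever single-letter functions meet the distortion targets.

```python
def reconstruct_node1(x1, v12, v13, w1, g12, g13):
    """Per-symbol reconstructions at node 1 after the K rounds, as in (463)-(464).

    x1, v12, v13, w1 are length-n sequences of symbols (the decoded description
    sequences are assumed already assembled per symbol); returns (x12_hat, x13_hat).
    """
    x12_hat = [g12(x1[i], v12[i], w1[i]) for i in range(len(x1))]   # estimate of node 2's source
    x13_hat = [g13(x1[i], v13[i], w1[i]) for i in range(len(x1))]   # estimate of node 3's source
    return x12_hat, x13_hat
```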
E. Error and distortion analysis
In order to keep the expressions simple, in the following, when we denote a description without the corresponding index, i.e. U^n_{i→S,l} or W^n_{[1,l]}, we will assume that the corresponding index is the true one generated at the corresponding node through the detailed encoding procedure. Consider round l and the event D_l = G_l ∩ F_l, where, for ǫ_l > 0,

G_l = {(X_1^n, X_2^n, X_3^n, W^n_{[1,l]}, V^n_{[12,l,1]}, V^n_{[13,l,1]}, V^n_{[23,l,2]}) ∈ T^n_{[X1X2X3W_{[1,l]}V_{[12,l,1]}V_{[13,l,1]}V_{[23,l,2]}]ǫ_l}},   (465)
F_l = {M̂_{i→S,t}(j) = M_{i→S,t}, S ∈ C(M), i ∉ S, j ∈ S, t ∈ [1 : l−1], with the exception of M̂_{3→12,l−1}(2), M̂_{3→2,l−1}(2)}.   (466)

The set G_l indicates that all the descriptions generated in the network up to round l are jointly typical with the sources. The occurrence of this depends mainly on the encoding procedures at the nodes. The set F_l indicates that, up to round l, all nodes were able to recover the true indices of the descriptions. This clearly implies that there were no errors in the decoding procedures at any node in the network. The condition in F_l on (M̂_{3→12,l−1}(2), M̂_{3→2,l−1}(2)) is due to the fact that the decoding of those descriptions at node 2 occurs during round l. The occurrence of D_l guarantees that, at the beginning of round l:
• Nodes 1 and 2 share a common path of descriptions W^n_{[1,l]} ∪ V^n_{[12,l,1]} which are typical with (X_1^n, X_2^n, X_3^n).
• Nodes 1 and 3 share a common path of descriptions W^n_{[1,l]} ∪ V^n_{[13,l,1]} which are typical with (X_1^n, X_2^n, X_3^n).
• Nodes 2 and 3 share a common path of descriptions W^n_{[3,l−1]} ∪ V^n_{[23,l−1,3]} which are typical with (X_1^n, X_2^n, X_3^n).
Let us also define the event E_l:

E_l = {there exists at least one error at the encoding or decoding in some node during round l}
    = ∪_{i∈M} (E_enc(i,l) ∪ E_dec(i,l))   (467)

where E_enc(i,l) contains the errors at the encoding at node i during round l, and E_dec(i,l) considers the event that, at node i during round l, there is a failure in recovering an index generated previously at another node. For example, at node 1 and during round l:

E_enc(1,l) = E_enc(1,l,23) ∪ E_enc(1,l,2) ∪ E_enc(1,l,3)   (468)
where

E_enc(1,l,23) = {(X_1^n, W^n_{[1,l]}(M̂_{W[1,l]}(1)), U^n_{1→23,l}(m_{1→23,l}, M̂_{W[1,l]}(1))) ∉ T^n_{[U_{1→23,l}X_1W_{[1,l]}]ǫ_c(1,l,23)} ∀m_{1→23,l} ∈ [1 : 2^{nR̂^{(l)}_{1→23}}]},   (469)
E_enc(1,l,2) = {(X_1^n, W^n_{[2,l]}(M̂_{W[2,l]}(1)), V^n_{[12,l,1]}(M̂_{V[12,l,1]}(1)), U^n_{1→2,l}(m_{1→2,l}, M̂_{W[2,l]}(1), M̂_{V[12,l,1]}(1))) ∉ T^n_{[U_{1→2,l}X_1W_{[2,l]}V_{[12,l,1]}]ǫ_c(1,l,2)} ∀m_{1→2,l} ∈ [1 : 2^{nR̂^{(l)}_{1→2}}]},   (470)
E_enc(1,l,3) = {(X_1^n, W^n_{[2,l]}(M̂_{W[2,l]}(1)), V^n_{[13,l,1]}(M̂_{V[13,l,1]}(1)), U^n_{1→3,l}(m_{1→3,l}, M̂_{W[2,l]}(1), M̂_{V[13,l,1]}(1))) ∉ T^n_{[U_{1→3,l}X_1W_{[2,l]}V_{[13,l,1]}]ǫ_c(1,l,3)} ∀m_{1→3,l} ∈ [1 : 2^{nR̂^{(l)}_{1→3}}]}.   (471)
The event E_dec(i,l) can be decomposed as:

E_dec(i,l) = ∪_{S∈C(M), i∈S} ∪_{j : j∉S} {M̂_{j→S,l}(i) ≠ M_{j→S,l}}.   (472)
At the end of the information exchange phase we expect the occurrence of D_{K+1} ∩ Ē_{K+1}, where E_{K+1} is the event of an error during round K + 1. As during round K + 1 only node 2 tries to recover the descriptions generated during round K at node 3, we have:

E_{K+1} = E_dec(2, K+1) = {M̂_{3→12,K}(2) ≠ M_{3→12,K} or M̂_{3→2,K}(2) ≠ M_{3→2,K}}.   (473)

The occurrence of D_{K+1} ∩ Ē_{K+1} guarantees that all the descriptions generated during the K rounds of information exchange in the network are jointly typical with the source realizations and that those descriptions can be perfectly recovered at all the nodes. In this way, if we can guarantee that Pr{D_{K+1} ∩ Ē_{K+1}} → 1 as n → ∞, then with probability converging to one we obtain:
• Nodes 1 and 2 share a common path of descriptions W^n_{[1,K+1]} ∪ V^n_{[12,K+1,1]} which are typical with (X_1^n, X_2^n, X_3^n).
• Nodes 1 and 3 share a common path of descriptions W^n_{[1,K+1]} ∪ V^n_{[13,K+1,1]} which are typical with (X_1^n, X_2^n, X_3^n).
• Nodes 2 and 3 share a common path of descriptions W^n_{[1,K+1]} ∪ V^n_{[23,K+1,2]} which are typical with (X_1^n, X_2^n, X_3^n).
Using standard arguments, the average distortions (over the codebooks) at the reconstruction stages in all the nodes then satisfy the required fidelity constraints, and from there it is straightforward to prove the existence of good codebooks for the network. In order to prove that Pr{D_{K+1} ∩ Ē_{K+1}} → 1 as n → ∞, it suffices to show that the probability of the complementary event vanishes. Let us write:

Pr{D̄_{K+1} ∪ E_{K+1}} = Pr{D̄_{K+1}} + Pr{D_{K+1} ∩ E_{K+1}}
 ≤ Pr{D̄_{K+1} ∩ D_K} + Pr{D̄_K} + Pr{D_{K+1} ∩ E_{K+1}}
 ≤ Pr{D̄_K} + Pr{D̄_{K+1} ∩ D_K ∩ Ē_K} + Pr{D_K ∩ E_K} + Pr{D_{K+1} ∩ E_{K+1}}
 ≤ Pr{D̄_1} + Σ_{l=1}^{K+1} Pr{D_l ∩ E_l} + Σ_{l=1}^{K} Pr{D̄_{l+1} ∩ D_l ∩ Ē_l}.   (474)

Notice that

D_1 = {(X_1^n, X_2^n, X_3^n) ∈ T^n_{[X1X2X3]ǫ1}}, ǫ1 > 0.   (475)
From Lemma 2, we see that for every ǫ1 > 0, Pr{D̄_1} → 0 as n → ∞. Then, it is easy to see that Pr{D_{K+1} ∩ Ē_{K+1}} → 1 as n → ∞ will hold if the codebook generation and the encoding and decoding procedures described above allow us to have the following:
1) If Pr{D_l} → 1 as n → ∞ then Pr{D_{l+1}} → 1 as n → ∞, ∀l ∈ [1 : K + 1].
2) Pr{D_l ∩ E_l} → 0 as n → ∞, ∀l ∈ [1 : K + 1].
In the following we will prove these facts. Observe that, at round l, the nodes act sequentially: encoding at node 1 → decoding at node 2 → · · · → encoding at node 3 → decoding at node 1. Then, using (467), we can write condition 2) as:

Pr{D_l ∩ E_l} = Pr{D_l ∩ E_enc(1,l)} + Pr{D_l ∩ E_dec(2,l) ∩ Ē_enc(1,l)} + Pr{D_l ∩ E_enc(2,l) ∩ Ē_enc(1,l) ∩ Ē_dec(2,l)} + · · · + Pr{D_l ∩ E_dec(1,l) ∩ Ē_enc(1,l) ∩ · · · ∩ Ē_enc(3,l)}.   (476)
Assume then that at the beginning of round l we have Pr{D_l} → 1 as n → ∞. Let us analyze the encoding procedure at node 1 and consider Pr{D_l ∩ E_enc(1,l)}. We can write:

Pr{D_l ∩ E_enc(1,l)} ≤ Pr{E_enc(1,l,23) ∩ D_l} + Pr{E_enc(1,l,2) ∩ D_l ∩ Ē_enc(1,l,23)} + Pr{E_enc(1,l,3) ∩ D_l ∩ Ē_enc(1,l,23)}.   (477)
From Lemma 1 and the fact that lim_{n→∞} Pr{G_l} = 1, we have that lim_{n→∞} Pr{A_l(1,23)} = 1, where:

A_l(1,23) = {(X_1^n, W^n_{[1,l]}) ∈ T^n_{[X1W_{[1,l]}]ǫ_l}}.   (478)
Then, we can use Lemma 5 to obtain:

lim_{n→∞} Pr{E_enc(1,l,23) ∩ D_l} = 0   (479)

provided that

R̂^{(l)}_{1→23} > I(X_1; U_{1→23,l} | W_{[1,l]}) + δ_c(1,l,23)   (480)

where δ_c(1,l,23) can be made arbitrarily small. On the other hand, we can write:

Pr{E_enc(1,l,2) ∩ D_l ∩ Ē_enc(1,l,23)} ≤ Pr{E_enc(1,l,2) ∩ G_l(1,2) ∩ F_l} + Pr{Ḡ_l(1,2)}   (481)

where

G_l(1,2) = {(X_1^n, X_2^n, X_3^n, W^n_{[2,l]}, V^n_{[12,l,1]}, V^n_{[13,l,1]}, V^n_{[23,l,2]}) ∈ T^n_{[X1X2X3W_{[2,l]}V_{[12,l,1]}V_{[13,l,1]}V_{[23,l,2]}]ǫ_l(1,2)}}   (482)

with ǫ_l(1,2) > 0. As explained before, Pr{G_l} → 1 as n → ∞. Then, from condition (480), we have

Pr{Ē_enc(1,l,23) ∩ F_l} = Pr{(X_1^n, W^n_{[2,l]}) ∈ T^n_{[X1W_{[2,l]}]ǫ_c(1,l,23)}} → 1 as n → ∞.   (483)

Moreover, from the codebook generation and the encoding procedure proposed, it is immediate to use Lemma 8 to show that:

Pr{U^n_{1→23,l} = u^n_{1→23,l} | x_1^n, w^n_{[1,l]}, Ē_enc(1,l,23) ∩ F_l} = 1{u^n_{1→23,l} ∈ T^n_{[U_{1→23,l}|X_1W_{[1,l]}]ǫ_c(1,23,l)}(x_1^n, w^n_{[1,l]})} / ‖T^n_{[U_{1→23,l}|X_1W_{[1,l]}]ǫ_c(1,23,l)}(x_1^n, w^n_{[1,l]})‖.   (484)
Then, from the Markov chain

U_{1→23,l} −− (X_1, W_{[1,l]}) −− (X_2, X_3, V_{[12,l,1]}, V_{[13,l,1]}, V_{[23,l,2]})   (485)

and the Markov Lemma 7, for sufficiently small (ǫ_c(1,l,23), ǫ_l, ǫ_l(1,2)) and after some minor manipulations, we can obtain:

Pr{G_l(1,2)} → 1 as n → ∞.   (486)

From equation (481) it is clear that we need to analyze the term Pr{E_enc(1,l,2) ∩ G_l(1,2) ∩ F_l}. Similarly as before, lim_{n→∞} Pr{A_l(1,2)} = 1, where:

A_l(1,2) = {(X_1^n, W^n_{[2,l]}, V^n_{[12,l,1]}) ∈ T^n_{[X1W_{[2,l]}V_{[12,l,1]}]ǫ_l(1,2)}},   (487)

which allows us to write:

Pr{E_enc(1,l,2) ∩ G_l(1,2) ∩ F_l} ≤ Pr{E_enc(1,l,2) ∩ A_l(1,2) ∩ F_l}.   (488)

Using again Lemma 5, we obtain that Pr{E_enc(1,l,2) ∩ G_l(1,2) ∩ F_l} → 0 as n → ∞ provided that

R̂^{(l)}_{1→2} > I(X_1; U_{1→2,l} | W_{[2,l]} V_{[12,l,1]}) + δ_c(1,l,2)   (489)

where δ_c(1,l,2) can be made arbitrarily small. For the analysis of Pr{E_enc(1,l,3) ∩ D_l ∩ Ē_enc(1,l,23)} we follow the same procedure. We can write

Pr{E_enc(1,l,3) ∩ D_l ∩ Ē_enc(1,l,23)} ≤ Pr{E_enc(1,l,3) ∩ G_l(1,3) ∩ F_l} + Pr{Ḡ_l(1,3)}   (490)
with

G_l(1,3) = {(X_1^n, X_2^n, X_3^n, W^n_{[2,l]}, V^n_{[12,l,2]}, V^n_{[13,l,1]}, V^n_{[23,l,2]}) ∈ T^n_{[X1X2X3W_{[2,l]}V_{[12,l,2]}V_{[13,l,1]}V_{[23,l,2]}]ǫ_l(1,3)}}.   (491)

Using the Markov chain

U_{1→2,l} −− (X_1, W_{[2,l]}, V_{[12,l,1]}) −− (X_2, X_3, V_{[13,l,1]}, V_{[23,l,2]}),   (492)

the fact that Pr{G_l(1,2)} → 1 as n → ∞, the Markov Lemma 7 and Lemma 8, for appropriately chosen values of (ǫ_c(1,l,2), ǫ_l(1,2), ǫ_l(1,3)), we have:

Pr{G_l(1,3)} → 1 as n → ∞.   (493)

Following exactly the same reasoning as above, we have that in order to obtain

Pr{E_enc(1,l,3) ∩ D_l ∩ Ē_enc(1,l,23)} → 0 as n → ∞,   (494)

besides conditions (480) and (489) we need:

R̂^{(l)}_{1→3} > I(X_1; U_{1→3,l} | W_{[2,l]} V_{[13,l,1]}) + δ_c(1,l,3)   (495)

for sufficiently small δ_c(1,l,3). With these conditions we have proved that the encoding procedure at node 1 during round l yields:

Pr{D_l ∩ E_enc(1,l)} → 0 as n → ∞.   (496)
Another instance of the Markov lemma, jointly with the Markov chain

U_{1→3,l} −− (X_1, W_{[2,l]}, V_{[13,l,1]}) −− (X_2, X_3, V_{[12,l,2]}, V_{[23,l,2]})   (497)

and Lemma 8, allows us to obtain:

Pr{G_l(2,13)} → 1 as n → ∞   (498)

where

G_l(2,13) = {(X_1^n, X_2^n, X_3^n, W^n_{[2,l]}, V^n_{[12,l,2]}, V^n_{[13,l,3]}, V^n_{[23,l,2]}) ∈ T^n_{[X1X2X3W_{[2,l]}V_{[12,l,2]}V_{[13,l,3]}V_{[23,l,2]}]ǫ_l(2,13)}}.   (499)
At this point we have to analyze the decoding at node 2. If that decoding is successful, with (498), the analysis of the encoding at node 2 follows the same lines as above^14. The same can be said of the encoding at node 3 (after successful decoding). In this way, we terminate round l with

Pr{D_{l+1}} = Pr{G_{l+1} ∩ F_{l+1}} → 1 as n → ∞,   (500)

which is one of the results we wanted. Clearly, analyzing now the decoding at node 2 (from which we can easily extrapolate the analysis to the decoding at nodes 1 and 3) we will be able to obtain Pr{D_l ∩ E_l} → 0 as n → ∞, which is the other required result.
The decoding at each of the nodes follows the approach of successive decoding. Decoder 2 will first try to find the common descriptions M_{1→23,l} and M_{3→12,l−1}. Then, it will try to find the private descriptions M_{1→2,l} and M_{3→2,l−1} (using, of course, the previously obtained common descriptions as side information). Clearly, the use of joint decoding could improve the rate region. However, this strategy, besides being more difficult to analyze, would give rise to a more complex rate region. It can easily be seen that the joint-decoding region would contain several sum-rate inequalities mixing common and private rates. Successive decoding allows for a rate region in which the sum-rate inequalities contain solely common rates or solely private rates, which is easier to analyze and to understand.
In order to analyze the decoding, we can write:

Pr{D_l ∩ E_dec(2,l) ∩ Ē_enc(1,l)} ≤ Pr{E_dec(2,l) ∩ F_l ∩ G_l(2,13)} + Pr{Ḡ_l(2,13)}.   (501)

As Pr{G_l(2,13)} → 1 as n → ∞, we can concentrate our efforts on the first term. The event E_dec(2,l) can be written as:

E_dec(2,l) = H_common(2,l) ∪ H_private(2,l),   (502)

^14 Note that G_l(2,13) plays, for the encoding at node 2, the same role that G_l plays for the encoding at node 1 during round l.
where

H_common(2,l) = {(M̂_{3→12,l−1}(2), M̂_{1→23,l}(2)) ≠ (M_{3→12,l−1}, M_{1→23,l})},   (503)
H_private(2,l) = {(M̂_{3→2,l−1}(2), M̂_{1→2,l}(2)) ≠ (M_{3→2,l−1}, M_{1→2,l})}.   (504)

From these definitions, we can easily deduce that:

Pr{E_dec(2,l) ∩ F_l ∩ G_l(2,13)} = Pr{H_common(2,l) ∩ F_l ∩ G_l(2,13)} + Pr{H_private(2,l) ∩ F_l ∩ G_l(2,13) ∩ H̄_common(2,l)}
                                ≤ Pr{K_common(2,l)} + Pr{K_private(2,l)}   (505)
where

K_common(2,l) = {∃(m̃_{1→23,l}, m̃_{3→12,l−1}) ≠ (M_{1→23,l}, M_{3→12,l−1}), (m̃_{1→23,l}, m̃_{3→12,l−1}) ∈ B_{1→23,l}(P_{1→23,l}) × B_{3→12,l−1}(P_{3→12,l−1}) : (X_2^n, W^n_{[3,l−1]}, V^n_{[23,l−1,3]}, V^n_{[12,l,1]}, U^n_{1→23,l}(m̃_{1→23,l}, m̃_{3→12,l−1}), U^n_{3→12,l−1}(m̃_{3→12,l−1})) ∈ T^n_{[X2W_{[2,l]}V_{[12,l,1]}V_{[23,l−1,3]}]ǫ_{dc}(2,l)}},   (506)
K_private(2,l) = {∃(m̃_{1→2,l}, m̃_{3→2,l−1}) ≠ (M_{1→2,l}, M_{3→2,l−1}), (m̃_{1→2,l}, m̃_{3→2,l−1}) ∈ B_{1→2,l}(P_{1→2,l}) × B_{3→2,l−1}(P_{3→2,l−1}) : (X_2^n, W^n_{[2,l]}, V^n_{[23,l−1,3]}, V^n_{[12,l,1]}, U^n_{1→2,l}(m̃_{1→2,l}), U^n_{3→2,l−1}(m̃_{3→2,l−1})) ∈ T^n_{[X2W_{[2,l]}V_{[12,l,2]}V_{[23,l,2]}]ǫ_{dp}(2,l)}},   (507)

where ǫ_{dc}(2,l), ǫ_{dp}(2,l) are carefully chosen^15 and, to save notation, we indicated only the indices to be recovered, i.e., U^n_{1→23,l}(m̃_{1→23,l}, m̃_{3→12,l−1}) ≡ U^n_{1→23,l}(m̃_{1→23,l}, m̃_{3→12,l−1}, M_{W[3,l−1]}).

^15 Using Lemma 1 we have:
G_l(2,13) ⊆ {(X_2^n, W^n_{[2,l]}, V^n_{[23,l−1,3]}, V^n_{[12,l,1]}) ∈ T^n_{[X2W_{[2,l]}V_{[12,l,1]}V_{[23,l−1,3]}]ǫ_{dc}(2,l)}},
G_l(2,13) ⊆ {(X_2^n, W^n_{[2,l]}, V^n_{[23,l,2]}, V^n_{[12,l,2]}) ∈ T^n_{[X2W_{[2,l]}V_{[12,l,2]}V_{[23,l,2]}]ǫ_{dp}(2,l)}}.
Consider first the recovery of the common information. Node 2 has to recover two indices from a binning structure like the one of the cooperative Berger-Tung problem described in Appendix B. In Fig. 10 we have a representation of the problem as seen at decoder 2.

[Figure 10: Cooperative Berger-Tung decoding problem for node 2. Encoder 3 (input X_3^n, side information W^n_{[3,l−1]}) sends at rate R^{(l−1)}_{3→12}; Encoder 1 (input X_1^n, side information W^n_{[3,l−1]}) sends at rate R^{(l)}_{1→23}; Decoder 2 uses (X_2^n, W^n_{[3,l−1]}, V^n_{[12,l,1]}, V^n_{[23,l−1,3]}) to recover U^n_{1→23,l}(M̂_{1→23,l}, M̂_{3→12,l−1}) and U^n_{3→12,l−1}(M̂_{3→12,l−1}).]

Node 3 generates a common description at rate R^{(l−1)}_{3→12} using W^n_{[3,l−1]} as side information. Similarly node 1, after decoding the common description from node 3, generates its own description using the recovered one and also W^n_{[3,l−1]} as side information. All these operations are done using the super-binning structure as in the cooperative Berger-Tung problem in Appendix B. Then node 2, using (X_2^n, W^n_{[3,l−1]}, V^n_{[12,l,1]}, V^n_{[23,l−1,3]}) as side information, tries to recover the descriptions generated at nodes 3 and 1. Remember that the encoding procedure at nodes 1 and 3 requires:
R̂^{(l−1)}_{3→12} > I(X_3; U_{3→12,l−1} | W_{[3,l−1]}) + δ_c(1, l−1, 12),   (508)
R̂^{(l)}_{1→23} > I(X_1; U_{1→23,l} | W_{[1,l]}) + δ_c(1, l, 23),   (509)
R^{(l)}_{1→23} < R̂^{(l)}_{1→23} + R̂^{(l−1)}_{3→12},   (510)

and that the following Markov chains:

U_{3→12,l−1} −− (X_3, W_{[3,l−1]}) −− (X_1, X_2, V_{[12,l,1]}, V_{[23,l−1,3]}),   (511)
U_{1→23,l} −− (X_1, W_{[1,l]}) −− (X_2, X_3, V_{[12,l,1]}, V_{[23,l,2]}),   (512)
are implied by the Markov chains in the conditions of Theorem 1. In this way, we can use the results in Appendix B to show that the following rates imply Pr{K_common(2,l)} → 0 as n → ∞:

R^{(l)}_{1→23} > I(X_1; U_{1→23,l} | X_2 W_{[1,l]} V_{[23,l−1,3]} V_{[12,l,1]}) + δ_{dc}(2,l),   (513)
R^{(l)}_{1→23} + R^{(l−1)}_{3→12} > I(X_1 X_3; U_{1→23,l} U_{3→12,l−1} | X_2 W_{[3,l−1]} V_{[23,l−1,3]} V_{[12,l,1]}) + δ′_{dc}(2,l),   (514)
[Figure 11: Berger-Tung decoding problem for node 2 when it tries to recover the private descriptions generated at nodes 1 and 3. Encoder 3 (input X_3^n, side information (W^n_{[3,l−1]}, V^n_{[23,l−1,3]})) sends at rate R^{(l−1)}_{3→2}; Encoder 1 (input X_1^n, side information (W^n_{[3,l−1]}, V^n_{[12,l,1]})) sends at rate R^{(l)}_{1→2}; Decoder 2 uses (X_2^n, W^n_{[3,l−1]}, V^n_{[12,l,1]}, V^n_{[23,l−1,3]}) to recover U^n_{1→2,l}(M̂_{1→2,l}) and U^n_{3→2,l−1}(M̂_{3→2,l−1}).]
where δ_{dc}(2,l), δ′_{dc}(2,l) can be made arbitrarily small^16.
The decoding of the private descriptions can be seen as a standard Berger-Tung decoding problem (see Fig. 11) where the binning used to transmit the descriptions generated at nodes 3 and 1 is not cooperative (in the sense of Theorem 9), unlike the case of the common descriptions. Lemma 6 can easily be used to analyze Pr{K_private(2,l)}. The following conditions guarantee that Pr{K_private(2,l)} → 0 as n → ∞:
^16 Here we considered the Corollary to Theorem 9. That is, we assumed that node 1 knows perfectly the value of M_{3→12,l−1}. This follows from the assumed fact that, at the beginning of round l, the probability of decoding errors at previous rounds in all nodes goes to zero as n → ∞. In this way, the constraint on the rate R_{3→12,l−1} that should be considered according to Theorem 9 is not needed. In fact, constraints on the rate R_{3→12,l−1} will arise when at node 1 we consider the recovery of M_{3→12,l−1} and M_{2→13,l−1}. For that reason, the analysis carried out is valid. Through this analysis we avoid carrying out a lengthy and difficult Fourier-Motzkin procedure to eliminate R̂^{(l)}_{1→23}, R̂^{(l)}_{2→13}, R̂^{(l)}_{3→12} for l ∈ [1 : K].
R̂^{(l)}_{1→2} < R^{(l)}_{1→2} + I(U_{1→2,l}; X_2 V_{[23,l,2]} | W_{[2,l]} V_{[12,l,1]}) − δ_{dp}(2,l),   (515)
R̂^{(l−1)}_{3→2} < R^{(l−1)}_{3→2} + I(U_{3→2,l−1}; X_2 U_{1→23,l} V_{[12,l,2]} | W_{[1,l]} V_{[23,l−1,3]}) − δ′_{dp}(2,l),   (516)
R̂^{(l−1)}_{3→2} + R̂^{(l)}_{1→2} < R^{(l−1)}_{3→2} + R^{(l)}_{1→2} + I(U_{1→2,l}; X_2 V_{[23,l,2]} | W_{[2,l]} V_{[12,l,1]}) + I(U_{3→2,l−1}; X_2 U_{1→23,l} V_{[12,l,2]} | W_{[1,l]} V_{[23,l−1,3]}) − I(U_{3→2,l−1}; U_{1→2,l} | W_{[2,l]} V_{[23,l−1,3]} V_{[12,l,1]} X_2) − δ″_{dp}(2,l),   (517)

where δ_{dp}(2,l), δ′_{dp}(2,l), δ″_{dp}(2,l) can be made arbitrarily small. Then, combining all the obtained results, we have that:

Pr{D_l ∩ E_dec(2,l) ∩ Ē_enc(1,l)} → 0 as n → ∞.   (518)

At this point the situation is the same as it was at the encoding stage at node 1, and all the steps can be repeated with minor modifications, proving the desired results at the end of round l:

Pr{D_{l+1}} → 1, Pr{D_l ∩ E_l} → 0 as n → ∞.   (519)
The other rate equations are as follows:
• Encoding at node 2:

R̂^{(l)}_{2→13} > I(X_2; U_{2→13,l} | W_{[2,l]}) + δ_c(2,l,13),   (520)
R̂^{(l)}_{2→1} > I(X_2; U_{2→1,l} | W_{[3,l]} V_{[12,l,2]}) + δ_c(2,l,1),   (521)
R̂^{(l)}_{2→3} > I(X_2; U_{2→3,l} | W_{[3,l]} V_{[23,l,2]}) + δ_c(2,l,3).   (522)

• Decoding at node 3:

R^{(l)}_{2→13} > I(X_2; U_{2→13,l} | X_3 W_{[2,l]} V_{[13,l,1]} V_{[23,l,2]}) + δ_{dc}(3,l),   (523)
R^{(l)}_{2→13} + R^{(l)}_{1→23} > I(X_1 X_2; U_{1→23,l} U_{2→13,l} | X_3 W_{[1,l]} V_{[13,l,1]} V_{[23,l,2]}) + δ″_{dc}(3,l),   (524)
R̂^{(l)}_{2→3} < R^{(l)}_{2→3} + I(U_{2→3,l}; X_3 V_{[13,l,3]} | W_{[3,l]} V_{[23,l,2]}) − δ_{dp}(3,l),   (525)
R̂^{(l)}_{1→3} < R^{(l)}_{1→3} + I(U_{1→3,l}; X_3 U_{2→13,l} V_{[23,l,3]} | W_{[2,l]} V_{[13,l,1]}) − δ′_{dp}(3,l),   (526)
R̂^{(l)}_{1→3} + R̂^{(l)}_{2→3} < R^{(l)}_{1→3} + R^{(l)}_{2→3} + I(U_{2→3,l}; X_3 V_{[13,l,3]} | W_{[3,l]} V_{[23,l,2]}) + I(U_{1→3,l}; X_3 U_{2→13,l} V_{[23,l,3]} | W_{[2,l]} V_{[13,l,1]}) − I(U_{1→3,l}; U_{2→3,l} | W_{[3,l]} V_{[23,l,2]} V_{[13,l,1]} X_3) − δ″_{dp}(3,l).   (527)
• Encoding at node 3:

R̂^{(l)}_{3→12} > I(X_3; U_{3→12,l} | W_{[3,l]}) + δ_c(3,l,12),   (528)
R̂^{(l)}_{3→1} > I(X_3; U_{3→1,l} | W_{[1,l+1]} V_{[13,l,3]}) + δ_c(3,l,1),   (529)
R̂^{(l)}_{3→2} > I(X_3; U_{3→2,l} | W_{[1,l+1]} V_{[23,l,3]}) + δ_c(3,l,2).   (530)

• Decoding at node 1:

R^{(l)}_{3→12} > I(X_3; U_{3→12,l} | X_1 W_{[3,l]} V_{[12,l,2]} V_{[13,l,3]}) + δ_{dc}(1,l),   (531)
R^{(l)}_{3→12} + R^{(l)}_{2→13} > I(X_2 X_3; U_{2→13,l} U_{3→12,l} | X_1 W_{[2,l]} V_{[12,l,2]} V_{[13,l,3]}) + δ″_{dc}(1,l),   (532)
R̂^{(l)}_{3→1} < R^{(l)}_{3→1} + I(U_{3→1,l}; X_1 V_{[12,l+1,1]} | W_{[1,l+1]} V_{[13,l,3]}) − δ_{dp}(1,l),   (533)
R̂^{(l)}_{2→1} < R^{(l)}_{2→1} + I(U_{2→1,l}; X_1 U_{3→12,l} V_{[13,l+1,1]} | W_{[3,l]} V_{[12,l,2]}) − δ′_{dp}(1,l),   (534)
R̂^{(l)}_{2→1} + R̂^{(l)}_{3→1} < R^{(l)}_{2→1} + R^{(l)}_{3→1} + I(U_{3→1,l}; X_1 V_{[12,l+1,1]} | W_{[1,l+1]} V_{[13,l,3]}) + I(U_{2→1,l}; X_1 U_{3→12,l} V_{[13,l+1,1]} | W_{[3,l]} V_{[12,l,2]}) − I(U_{2→1,l}; U_{3→1,l} | W_{[1,l+1]} V_{[12,l,2]} V_{[13,l,3]} X_1) − δ″_{dp}(1,l).   (535)

The final private rate equations in Theorem 1 follow from a rather simple Fourier-Motzkin elimination procedure [22].
REFERENCES
[1] D. Slepian and J. Wolf, "Noiseless coding of correlated information sources," IEEE Transactions on Information Theory, vol. 19, no. 4, pp. 471–480, 1973.
[2] A. Wyner and J. Ziv, "The rate-distortion function for source coding with side information at the decoder," IEEE Transactions on Information Theory, vol. 22, no. 1, pp. 1–10, 1976.
[3] T. Berger, "Multiterminal source coding," in The Information Theory Approach to Communications, G. Longo, Ed., Series CISM Courses and Lectures, vol. 229. New York: Springer-Verlag, 1978, pp. 171–231.
[4] S. Y. Tung, "Multiterminal source coding," Ph.D. dissertation, Electrical Engineering, Cornell University, Ithaca, NY, May 1978.
[5] T. Berger and R. Yeung, "Multiterminal source encoding with one distortion criterion," IEEE Transactions on Information Theory, vol. 35, no. 2, pp. 228–236, Mar. 1989.
[6] T. Berger, Z. Zhang, and H. Viswanathan, "The CEO problem [multiterminal source coding]," IEEE Transactions on Information Theory, vol. 42, no. 3, pp. 887–902, 1996.
[7] Y. Oohama, "The rate-distortion function for the quadratic Gaussian CEO problem," IEEE Transactions on Information Theory, vol. 44, no. 3, pp. 1057–1070, 1998.
[8] A. Wagner, S. Tavildar, and P. Viswanath, "Rate region of the quadratic Gaussian two-encoder source-coding problem," IEEE Transactions on Information Theory, vol. 54, no. 5, pp. 1938–1961, May 2008.
[9] A. Wagner, B. Kelly, and Y. Altug, "Distributed rate-distortion with common components," IEEE Transactions on Information Theory, vol. 57, no. 7, pp. 4035–4057, 2011.
[10] C. Heegard and T. Berger, "Rate distortion when side information may be absent," IEEE Transactions on Information Theory, vol. 31, no. 6, pp. 727–734, Nov. 1985.
[11] R. Timo, T. Chan, and A. Grant, "Rate distortion with side-information at many decoders," IEEE Transactions on Information Theory, vol. 57, no. 8, pp. 5240–5257, Aug. 2011.
[12] R. Timo, A. Grant, and G. Kramer, "Lossy broadcasting with complementary side information," IEEE Transactions on Information Theory, vol. 59, no. 1, pp. 104–131, Jan. 2013.
[13] A. Kaspi, "Two-way source coding with a fidelity criterion," IEEE Transactions on Information Theory, vol. 31, no. 6, pp. 735–740, Nov. 1985.
[14] N. Ma and P. Ishwar, "Interaction strictly improves the Wyner-Ziv rate-distortion function," in Proc. IEEE International Symposium on Information Theory (ISIT), June 2010, pp. 61–65.
[15] H. Permuter, Y. Steinberg, and T. Weissman, "Two-way source coding with a helper," IEEE Transactions on Information Theory, vol. 56, no. 6, pp. 2905–2919, 2010.
[16] N. Ma and P. Ishwar, "Some results on distributed source coding for interactive function computation," IEEE Transactions on Information Theory, vol. 57, no. 9, pp. 6180–6195, Sep. 2011.
[17] N. Ma, P. Ishwar, and P. Gupta, "Interactive source coding for function computation in collocated networks," IEEE Transactions on Information Theory, vol. 58, no. 7, pp. 4289–4305, Jul. 2012.
[18] L. Sankar and H. Poor, "Distributed estimation in multi-agent networks," in Proc. IEEE International Symposium on Information Theory (ISIT), July 2012, pp. 329–333.
[19] M. Gastpar, "The Wyner-Ziv problem with multiple sources," IEEE Transactions on Information Theory, vol. 50, no. 11, pp. 2762–2768, Nov. 2004.
[20] A. Wyner, "The common information of two dependent random variables," IEEE Transactions on Information Theory, vol. 21, no. 2, pp. 163–179, Mar. 1975.
[21] T. Cover and J. Thomas, Elements of Information Theory (2nd Ed.). Wiley-Interscience, 2006.
[22] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge, U.K.: Cambridge Univ. Press, 2011.
[23] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. New York: Academic, 1981.
[24] P. Piantanida, L. Rey Vega, and A. Hero, "A proof of the generalized Markov lemma with countable infinite sources," in Proc. IEEE International Symposium on Information Theory (ISIT), July 2014.
[25] W. Uhlmann, "Vergleich der hypergeometrischen mit der Binomial-Verteilung," Metrika, vol. 10, no. 1, pp. 145–158, 1966.
[26] A. Kaspi and T. Berger, "Rate-distortion for correlated sources with partially separated encoders," IEEE Transactions on Information Theory, vol. 28, no. 6, pp. 828–840, Nov. 1982.