The Three-Terminal Interactive Lossy Source Coding Problem

arXiv:1502.01359v3 [cs.IT] 18 Jan 2016

Leonardo Rey Vega, Pablo Piantanida and Alfred O. Hero III

Abstract

The three-node multiterminal lossy source coding problem is investigated. We derive an inner bound to the general rate-distortion region of this problem which is a natural extension of the seminal work by Kaspi '85 on the interactive two-terminal source coding problem. It is shown that this (rather involved) inner bound contains several rate-distortion regions of some relevant source coding settings. In this way, besides the non-trivial extension of the interactive two-terminal problem, our results can be seen as a generalization, and hence unification, of several previous works in the field. Specializing to particular cases we obtain novel rate-distortion regions for several lossy source coding problems. We finish by describing some of the open problems and challenges. However, the general three-node multiterminal lossy source coding problem seems to offer a formidable mathematical complexity.

Index Terms: Multiterminal source coding, Wyner-Ziv, rate-distortion region, Berger-Tung inner bound, interactive lossy source coding, distributed lossy source coding.

The material in this paper was partially published in the IEEE International Symposium on Information Theory, Honolulu, Hawaii, USA, June 29 - July 4, 2014 and in the IEEE International Symposium on Information Theory, Hong-Kong, China, June 14 - June 19, 2015. The work of L. Rey Vega was partially supported by project UBACyT 2002013100751BA. The work of P. Piantanida was partially supported by the FP7 Network of Excellence in Wireless COMmunications NEWCOM#. The work of A. Hero was partially supported by a DIGITEO Chair from 2008 to 2013 and by US ARO grant W911NF-15-1-0479. L. Rey Vega is with the Department of Electronics (FIUBA) and the CSC-CONICET, Buenos Aires, Argentina (e-mail: [email protected], [email protected]). P. Piantanida is with the Laboratoire des Signaux et Systèmes (L2S), CentraleSupelec, 91192 Gif-sur-Yvette, France (e-mail: [email protected]). Alfred O. Hero III is with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA (e-mail: [email protected]).

I. INTRODUCTION

A. Motivation and related works

Distributed source coding is an important branch of information theory with enormous relevance for present and future technology. Efficient distributed data compression may be the only way to guarantee acceptable levels of performance when energy and link bandwidth are severely limited, as in many real-world sensor networks. The distributed data collected by different nodes in a network can be highly correlated, and this correlation can be exploited at the application layer, e.g., for target localization and tracking or anomaly detection. In such cases, cooperative joint data compression can achieve a better overall rate-distortion trade-off than independent compression at each node. Complete answers to the optimal trade-offs between rate and distortion for distributed source coding are scarce, and the solution to many problems remains elusive. Two of the most important results in information theory, the Slepian-Wolf solution to the distributed lossless source coding problem [1] and the Wyner-Ziv [2] single-letter solution for the rate-distortion region when side information is available at the decoder, provided the kick-off for the study of these important problems.
Berger and Tung [3], [4] generalized the Slepian-Wolf problem to the case where lossy reconstructions are required at the decoder. It was shown that the region obtained, although not tight in general, is the optimal one in several special cases [5]-[8] and strictly suboptimal in others [9]. Heegard and Berger [10] considered the Wyner-Ziv problem when the side information at the decoder may be absent, or when there are two decoders with degraded side information. Timo et al. [11] correctly extended the achievable region to many (> 2) decoders. In [12] and the references therein, the complementary delivery problem (closely related to the Heegard-Berger problem) is also studied.

The use of interaction in a multiterminal source coding setting has not been studied as extensively as the problems mentioned above. Through multiple rounds of interactive exchanges of information, explicit cooperation can take place using distributed/successive refinement source coding. Transmitting "reduced pieces" of information, and constructing an explicit sequential cooperative exchange of information, can be more efficient than transmitting the "total information" in one shot. The value of interaction for source coding problems was first recognized by Kaspi in his seminal work [13], where the interactive two-terminal lossy source coding problem was introduced and solved under the assumption of a finite number of communication rounds. In [14] it is shown that interaction strictly outperforms (in terms of sum rate) the Wyner-Ziv rate function. There are also several extensions of the original Kaspi problem. In [15] the interactive source coding problem with a helper is solved when the sources satisfy a certain Markov chain property. In [16]-[18] other interesting cases where interactive cooperation can be beneficial are studied. To the best of our knowledge, a proper generalization of this setting to interactive multiterminal (> 2) lossy source coding has not yet been reported.

B. Main contributions

In this paper, we consider the three-terminal interactive lossy source coding problem presented in Fig. 1. We have a network composed of 3 nodes which can interact through a broadcast rate-limited, error-free channel. Each node measures the realization of a discrete memoryless source (DMS) and is required to reconstruct the sources of the other terminals subject to a fidelity criterion. Nodes are allowed to interact by exchanging descriptions of their observed source realizations over a finite number of communication rounds. After the information exchange phase is over, the nodes reconstruct the realizations of the sources at the other nodes using the recovered descriptions. The general rate-distortion region seems to pose a formidable mathematical problem which encompasses several known open problems. However, several properties of this problem are established in this paper.

General achievable region. We derive a general achievable region by assuming a finite number of rounds. This region is not a trivial extension of Kaspi's region [13]; the main idea behind its derivation is the exchange of common and private descriptions between the nodes in the network in order to optimally exploit the different side information available at the different nodes. As in the original Kaspi formulation, the key to obtaining the achievable region is the natural cooperation between the nodes induced by the generation of new descriptions based on the previously exchanged descriptions.
However, in comparison to Kaspi's two-node case, the three-node interaction makes for significant differences in the optimal action of each node in the encoding and decoding procedures at a given round. At each encoding stage, each node needs to communicate with two nodes that have different side information.

Figure 1: Three-Terminal Interactive Source Coding. There is a single noiseless rate-limited broadcast channel from each terminal to the other two terminals. D_{ij} denotes the average per-letter distortion between the source X_j^n and X̂_{ij}^n measured at node i, for each pair i ≠ j.

This is reminiscent of the Heegard-Berger problem [10], [11], whose complete solution is not known when the side information at the decoders is not degraded. Moreover, the situation is a bit more complex because of the presence of 3-way interaction. This similarity with the Heegard-Berger problem leads us to consider the generation of two sets of messages at each node: common messages destined to all nodes and private messages destined to restricted sets of nodes. On the other hand, when acting as a decoder, each node needs to recover a set of common and private messages generated at different nodes (e.g., at round l node 3 needs to recover the common descriptions generated at nodes 1 and 2 and the private ones also generated at nodes 1 and 2). This is reminiscent of the Berger-Tung problem [4]-[6], [19], which is also an open problem. Again, the situation is more involved because of the cooperation induced by the multiple rounds of exchanged information. Particularly important is the fact that, in the case of the common descriptions, there is cooperation based on the conditioning on the previously exchanged descriptions, in addition to the cooperation naturally induced by the encoding-decoding ordering imposed by the network. This explicit cooperation for the exchange of common messages is accomplished through the use of a special binning technique explained in Appendix B.

Despite the complexity of the achievable region, this inner bound allows us to recover the two-node Kaspi region. We also recover several previous inner bounds and rate-distortion regions of some well-known cooperative and interactive, as well as non-interactive, lossy source coding problems.

Special cases. As the full problem seems to offer a formidable mathematical complexity, including several special cases which are known to be long-standing open problems, we cannot give a full converse proving the optimality of the general achievable region obtained. However, in Section V we provide a complete answer to the rate-distortion regions of several specific cooperative and interactive source coding problems:

(1) Two encoders and one decoder subject to lossy/lossless reconstruction constraints without side information (see Fig. 2).
(2) Two encoders and three decoders subject to lossless/lossy reconstruction constraints with side information (see Fig. 3).
(3) Two encoders and three decoders subject to lossless/lossy reconstruction constraints, reversal delivery and side information (see Fig. 4).
(4) Two encoders and three decoders subject to lossy reconstruction constraints with degraded side information (see Fig. 5).
(5) Three encoders and three decoders subject to lossless/lossy reconstruction constraints with degraded side information (see Fig. 6).

Interestingly enough, we show that for the last two problems interaction through multiple rounds could be helpful, whereas for the other three cases it is shown that a single round of cooperatively exchanged descriptions suffices to achieve optimality. Table I summarizes the characteristics of each of the above mentioned cases.

Table I: Special cases fully characterized in Section V.

Case (1): R_1 ≠ 0, R_2 ≠ 0, R_3 = 0. Node 1: ∅ (is not reconstructing any source). Node 2: ∅ (is not reconstructing any source). Node 3: Pr{X̂_{31}^n ≠ X_1^n} ≤ ε and E[d(X̂_{32}^n, X_2^n)] ≤ D_{32}.
Case (2): R_1 ≠ 0, R_2 ≠ 0, R_3 = 0. Node 1: E[d(X̂_{12}^n, X_2^n)] ≤ D_{12}. Node 2: Pr{X̂_{21}^n ≠ X_1^n} ≤ ε. Node 3: Pr{X̂_{31}^n ≠ X_1^n} ≤ ε and E[d(X̂_{32}^n, X_2^n)] ≤ D_{32}.
Case (3): R_1 ≠ 0, R_2 ≠ 0, R_3 = 0. Node 1: E[d(X̂_{12}^n, X_2^n)] ≤ D_{12}. Node 2: Pr{X̂_{21}^n ≠ X_1^n} ≤ ε. Node 3: Pr{X̂_{32}^n ≠ X_2^n} ≤ ε and E[d(X̂_{31}^n, X_1^n)] ≤ D_{31}.
Case (4): R_1 ≠ 0, R_2 ≠ 0, R_3 = 0. Node 1: E[d(X̂_{12}^n, X_2^n)] ≤ D_{12}. Node 2: E[d(X̂_{21}^n, X_1^n)] ≤ D_{21}. Node 3: E[d(X̂_{31}^n, X_1^n)] ≤ D_{31} and E[d(X̂_{32}^n, X_2^n)] ≤ D_{32}.
Case (5): R_1 ≠ 0, R_2 ≠ 0, R_3 ≠ 0. Node 1: ∅ (is not decoding). Node 2: Pr{X̂_{21}^n ≠ X_1^n} ≤ ε and E[d(X̂_{23}^n, X_3^n)] ≤ D_{23}. Node 3: Pr{X̂_{31}^n ≠ X_1^n} ≤ ε and E[d(X̂_{32}^n, X_2^n)] ≤ D_{32}.

Next we summarize the contents of the paper. In Section II we formulate the general problem. In Section III we present and discuss the inner bound for the general problem. In Section IV we show how our inner bound contains several results previously obtained in the literature. In Section V we present the converse results and their tightness with respect to the inner bound for the special cases mentioned above, providing the optimal characterization for them. In Section VI we present a discussion of the obtained results and their limitations, and some numerical results concerning the new optimal cases from the previous section. Finally, in Section VII we provide some conclusions. The major mathematical details are relegated to the appendices.

Notation: We summarize the notation. With x^n and upper-case letters X^n we denote vectors and random vectors of n components, respectively. The i-th component of the vector x^n is denoted x_i. All alphabets are assumed to be finite. Entropy is denoted by H(·) and mutual information by I(·;·). H_2(p) denotes the entropy of a Bernoulli random variable with parameter p. With h(·) we denote differential entropy. Let X, Y and V be three random variables on some alphabets with probability distribution p_{XYV}. When clear from the context we will simply denote p_X(x) by p(x). If the probability distribution of the random variables X, Y, V satisfies p(x|yv) = p(x|y) for each x, y, v, then they form a Markov chain, which is denoted by X -- Y -- V. The probability of an event A is denoted by Pr{A}, where the measure used to compute it will be understood from the context. The conditional probability of a set A with respect to a set B is denoted by Pr{A|B}. The set of strongly typical sequences associated with the random variable X (see Appendix A) is denoted by T^n_{[X]ε}, where ε > 0. We simply denote these sets by T^n_ε when clear from the context. The cardinality of a set A is denoted by ‖A‖. The complement of a set is denoted by Ā. With Z_{≥α} and R_{≥β} we denote the integers and real numbers greater than α and β, respectively. co{A} denotes the convex hull of a set A ⊂ R^N, where N ∈ N.
II. PROBLEM FORMULATION

Assume three discrete memoryless sources (DMSs) with alphabets and pmf given by (X_1 × X_2 × X_3, p_{X_1X_2X_3}) and arbitrary bounded distortion measures:

    d_j : X_j × X̂_j → R_{≥0} ,   j ∈ M ≜ {1, 2, 3} ,

where {X̂_j}_{j∈M} are finite reconstruction alphabets¹. We consider the problem of characterizing the rate-distortion region of the interactive source coding scenario described in Fig. 1. In this setting, through K rounds of information exchange between the nodes, each one of them attempts to recover a lossy description of the sources that the other nodes observe; e.g., node 1 must reconstruct, while satisfying distortion constraints, the realizations of the sources X_2^n and X_3^n observed by nodes 2 and 3. Indeed, this setting can be seen as a generalization of the well-known Kaspi problem [13].

Definition 1 (K-step interactive source code): A K-step interactive n-length source code for the network model in Fig. 1 is defined by a sequence of encoder mappings:

    f_1^l : X_1^n × J_2^1 × J_3^1 × ··· × J_2^{l-1} × J_3^{l-1} → J_1^l ,    (1)
    f_2^l : X_2^n × J_1^1 × J_3^1 × ··· × J_3^{l-1} × J_1^l → J_2^l ,    (2)
    f_3^l : X_3^n × J_1^1 × J_2^1 × ··· × J_1^l × J_2^l → J_3^l ,    (3)

with l ∈ [1 : K] and message sets J_i^l ≜ {1, 2, ..., I_i^l}, I_i^l ∈ Z_{≥0}, i ∈ M, and by reconstruction mappings:

    g_{ij} : X_i^n × ∏_{m∈M, m≠i} ( J_m^1 × ··· × J_m^K ) → X̂_{ij}^n ,   i ≠ j .    (4)

The average per-letter distortions and the corresponding distortion levels achieved at node i with respect to source j are:

    E[ d_j( X_j^n, X̂_{ij}^n ) ] ≤ D_{ij} ,   i, j ∈ M, i ≠ j ,    (5)

with

    d^n(x^n, y^n) ≜ (1/n) Σ_{m=1}^{n} d(x_m, y_m) .    (6)

¹ The problem can be easily generalized to the case in which there are different reconstruction alphabets at the terminals. It can also be shown that all the results remain valid if we employ arbitrary bounded joint distortion functions, e.g., at node 1 we could use d(X_2, X_3; X̂_2, X̂_3).

In compact form we denote a K-step interactive source code by (n, K, F, G), where F and G denote the sets of encoder and decoder mappings.

Remark 1: The code definition depends on the node ordering in the encoding procedure. Above we defined the encoding functions {f_1^l, f_2^l, f_3^l}_{l=1}^{K} assuming that in each round node 1 acts first, followed by node 2 and finally by node 3, the process then beginning again at node 1.

Definition 2 (Achievability and rate-distortion region): Consider R ≜ (R_1, R_2, R_3) and D ≜ (D_{12}, D_{13}, D_{21}, D_{23}, D_{31}, D_{32}). The rate vector R is (D, K)-achievable if ∀ε > 0 there exists n_0(ε, K) ∈ N such that ∀n > n_0(ε, K) there exists a K-step interactive source code (n, K, F, G) with rates satisfying:

    (1/n) Σ_{l=1}^{K} log ‖J_i^l‖ ≤ R_i + ε ,   i ∈ M ,    (7)

and with average per-letter distortions at node i with respect to source j:

    E[ d_j( X_j^n, X̂_{ij}^n ) ] ≤ D_{ij} + ε ,   i, j ∈ M, i ≠ j ,    (8)

where X̂_{ij}^n ≡ g_{ij}( X_i^n, { J_m^1, ..., J_m^K }_{m∈M, m≠i} ), i ≠ j ∈ M. The rate-distortion region R_3(D, K) is defined by:

    R_3(D, K) = { R : R is (D, K)-achievable } .    (9)

Similarly, the D-achievable region R_3(D) is given by R_3(D) = ∪_{K=1}^{∞} R_3(D, K)², that is:

    R_3(D) = { R : R is (D, K)-achievable for some K ∈ Z_{≥1} } .    (10)

² Notice that this limit exists because it is the union of a monotone increasing sequence of sets.

Remark 2: By definition R_3(D, K) is closed, and using a time-sharing argument it is easy to show that it is also convex ∀K ∈ Z_{≥1}.

Remark 3: R_3(D, K) depends on the node ordering in the encoding procedure. Above we defined the encoding functions {f_1^l, f_2^l, f_3^l}_{l=1}^{K} assuming that in each round node 1 acts first, followed by node 2 and finally by node 3, the process then beginning again at node 1 (a schematic sketch of this ordering is given below).
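To make the ordering in Definition 1 and the rate accounting in Definition 2 concrete, the following schematic sketch (our own illustration; the mappings f_i^l and g_ij are left abstract and all identifiers are hypothetical) traces which messages are available to each encoder under the canonical order 1 → 2 → 3.

```python
from math import log2

def interactive_schedule(K):
    """Symbolic trace of the canonical order 1 -> 2 -> 3 over K rounds.

    Returns a dict mapping (node i, round l) to the list of messages that
    are already available to encoder i when it produces J_i^l (besides its
    own source X_i^n).  No actual source coding is performed here.
    """
    history = []                      # messages produced so far, in temporal order
    available = {}
    for l in range(1, K + 1):
        for i in (1, 2, 3):           # node 1 acts first, then node 2, then node 3
            available[(i, l)] = list(history)
            history.append(f"J_{i}^{l}")
    return available

def node_rate(sizes, n):
    """Rate of one node per Definition 2: (1/n) * sum_l log2 ||J_i^l||."""
    return sum(log2(s) for s in sizes) / n

if __name__ == "__main__":
    sched = interactive_schedule(K=2)
    # Encoder 2 at round 2 already knows every round-1 message plus J_1^2,
    # exactly as in the domain of f_2^l in (2).
    print(sched[(2, 2)])              # ['J_1^1', 'J_2^1', 'J_3^1', 'J_1^2']
    print(node_rate([4, 8], n=10))    # hypothetical sizes ||J_1^1||=4, ||J_1^2||=8 -> 0.5
```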
In this paper we restrict the analysis to the canonical ordering (1 → 2 → 3). However, there are 3! = 6 different orderings that generally lead to different regions, and the (D, K)-achievable region defined above is more explicitly denoted R_3(D, K, σ_c), where σ_c is the trivial permutation of M. The complete (D, K)-achievable region is:

    R_3(D, K) = ∪_{σ∈Σ(M)} R_3(D, K, σ) ,    (11)

where Σ(M) contains all the permutations of the set M. The theory presented in this paper for determining R_3(D, K, σ_c) can be applied to the other permutations σ ≠ σ_c to compute (11)³.

³ It should be mentioned that this is not the most general setting of the problem. The most general encoding procedure would follow from the definition of the transmission order by a sequence t_1, t_2, t_3, ..., t_{‖M‖×K} with t_i ∈ M. This would even cover the situation in which the order can be changed in each round. To keep the mathematical presentation simpler we will not consider this more general setting.

III. INNER BOUND ON THE RATE-DISTORTION REGION

In this section we provide a general achievable rate-region, i.e., an inner bound on the rate-distortion region.

A. Inner bound

We first present a general achievable rate-region where each node, at a given round l, generates descriptions destined to the other nodes based on the realization of its own source, the past descriptions it has generated, and the descriptions generated at the other nodes and recovered by the node up to the present round. In order to precisely describe this rather involved rate-region, we need to introduce some definitions. For a set A, let C(A) = 2^A \ {A, ∅} be the set of all subsets of A except A itself and the empty set. Denote the auxiliary random variables:

    U_{i→S,l} ,   S ∈ C(M), i ∉ S, l = 1, ..., K .    (12)

The auxiliary random variables {U_{i→S,l}} will be used to denote the descriptions generated at node i at round l and destined to a set of nodes S ∈ C(M) with i ∉ S. For example, U_{1→23,l} denotes the description generated at node 1 at round l and destined to nodes 2 and 3. Similarly, U_{1→2,l} denotes the description generated at node 1 at round l and destined only to node 2. We define the variables:

    W_{[i,l]} ≡ common information⁴ shared by the three nodes, available at node i at round l before encoding,
    V_{[S,l,i]} ≡ private information shared by the nodes in S ∈ C(M), available at node i ∈ S at round l before encoding.

⁴ Not to be confused with Wyner's definition of common information [20].

In precise terms, the quantities introduced above are defined by:

    W_{[1,l]} = {U_{1→23,k}, U_{2→13,k}, U_{3→12,k}}_{k=1}^{l-1} ,   W_{[2,l]} = W_{[1,l]} ∪ U_{1→23,l} ,   W_{[3,l]} = W_{[2,l]} ∪ U_{2→13,l} ,
    V_{[12,l,1]} = {U_{1→2,k}, U_{2→1,k}}_{k=1}^{l-1} ,   V_{[12,l,2]} = V_{[12,l,1]} ∪ U_{1→2,l} ,
    V_{[13,l,1]} = {U_{1→3,k}, U_{3→1,k}}_{k=1}^{l-1} ,   V_{[13,l,3]} = V_{[13,l,1]} ∪ U_{1→3,l} ,
    V_{[23,l,2]} = {U_{2→3,k}, U_{3→2,k}}_{k=1}^{l-1} ,   V_{[23,l,3]} = V_{[23,l,2]} ∪ U_{2→3,l} .

Before presenting the general inner bound, we describe the basic idea of the random coding scheme that achieves the rate-region in Theorem 1 for the case of K communication rounds. Assume that all codebooks are randomly generated and known to all the nodes before the information exchange begins, and consider the encoding ordering 1 → 2 → 3, so that we begin at round l = 1 at node 1. (The small sketch below makes the bookkeeping of the sets W_{[i,l]} and V_{[S,l,i]} explicit.)
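The recursions defining W_{[i,l]} and V_{[S,l,i]} can be reproduced with a few lines of code. The following sketch (our own illustration, with purely symbolic labels, and assuming the canonical order 1 → 2 → 3) generates these sets for any node and round.

```python
def common_info(i, l):
    """W_[i,l]: common descriptions known at node i at round l, before it encodes."""
    W = [f"U_{j}->{''.join(str(m) for m in (1, 2, 3) if m != j)},{k}"
         for k in range(1, l) for j in (1, 2, 3)]
    if i >= 2:
        W.append(f"U_1->23,{l}")      # node 2 has already recovered node 1's round-l common description
    if i == 3:
        W.append(f"U_2->13,{l}")      # node 3 has also recovered node 2's
    return W

def private_info(pair, l, i):
    """V_[pair,l,i]: private descriptions of the pair, as known by node i (i in pair)."""
    a, b = pair                        # pair written with a < b, e.g. (1, 2)
    V = [u for k in range(1, l) for u in (f"U_{a}->{b},{k}", f"U_{b}->{a},{k}")]
    if i == b:                         # the later-acting node of the pair also holds U_{a->b,l}
        V.append(f"U_{a}->{b},{l}")
    return V

if __name__ == "__main__":
    print(common_info(3, 2))           # W_[3,2] = W_[2,2] plus U_2->13,2
    print(private_info((2, 3), 2, 3))  # V_[23,2,3] = {U_2->3,1, U_3->2,1, U_2->3,2}
```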
To keep the explanation simple and to help the reader grasp the essentials of the coding scheme employed, we will assume that all terminals are able to recover the descriptions generated at the other nodes (which will be the case under the conditions of Theorem 1). From the observation of the source X_1^n, node 1 generates a set of descriptions for each of the other nodes connected to it. In particular, it generates a common description to be recovered at nodes 2 and 3, in addition to two private descriptions for nodes 2 and 3, respectively, generated from a conditional codebook given the common description. Then, node 2 tries to recover the descriptions destined to it (the common description generated at node 1 and its corresponding private description) using X_2^n as side information, and generates its own descriptions based on the source X_2^n and the recovered descriptions from node 1. Again, it generates a common description for nodes 1 and 3, a private description for node 3 and another one for node 1. The same process goes on until node 3, which tries to recover jointly the common descriptions generated by nodes 1 and 2, and then the private descriptions destined to it by nodes 1 and 2. It then generates its own descriptions (common and private ones) destined to nodes 1 and 2. Finally, node 1 tries to recover all the descriptions destined to it generated by nodes 2 and 3, in the same way as previously done by node 3. After this, round l = 1 is over, and round l = 2 begins with node 1 generating new descriptions using X_1^n, its encoding history from the previous round and the recovered descriptions from the other nodes. The process continues in a similar manner until we reach round l = K, where node 3 recovers the descriptions from the other nodes and generates its own. Node 1 recovers the last descriptions destined to it from nodes 2 and 3 but does not generate new ones. The same holds for node 2, which only recovers the descriptions generated by node 3, thus terminating the information exchange procedure. Notice that at the end of round K the decoding at node 1 and node 2 can be done simultaneously. This is due to the fact that node 1 is not generating a new description destined to node 2. However, in order to simplify the analysis and notation in the appendix we will consider that the last decoding at node 2 occurs in round K + 1⁵. After all the exchanges are done, each node produces an estimate of the other nodes' source realizations by using all the available descriptions recovered during the K previous rounds.

⁵ This is clearly a fictitious round, in the sense that no descriptions are generated in it. In this way, the final rates achieved by the described procedure are not modified if we consider this additional round.

Theorem 1 (Inner bound): Let R̄_3(D, K) be the closure of the set of all rate tuples satisfying:

    R_1 = Σ_{l=1}^{K} ( R_{1→23}^{(l)} + R_{1→2}^{(l)} + R_{1→3}^{(l)} ) ,    (13)
    R_2 = Σ_{l=1}^{K} ( R_{2→13}^{(l)} + R_{2→1}^{(l)} + R_{2→3}^{(l)} ) ,    (14)
    R_3 = Σ_{l=1}^{K} ( R_{3→12}^{(l)} + R_{3→1}^{(l)} + R_{3→2}^{(l)} ) ,    (15)
    R_1 + R_2 = Σ_{l=1}^{K} ( R_{1→23}^{(l)} + R_{2→13}^{(l)} + R_{1→3}^{(l)} + R_{2→3}^{(l)} + R_{1→2}^{(l)} + R_{2→1}^{(l)} ) ,    (16)
    R_1 + R_3 = Σ_{l=1}^{K+1} ( R_{1→23}^{(l)} + R_{3→12}^{(l-1)} + R_{1→2}^{(l)} + R_{3→2}^{(l-1)} + R_{1→3}^{(l)} + R_{3→1}^{(l)} ) ,    (17)
    R_2 + R_3 = Σ_{l=1}^{K} ( R_{2→13}^{(l)} + R_{3→12}^{(l)} + R_{2→1}^{(l)} + R_{3→1}^{(l)} + R_{2→3}^{(l)} + R_{3→2}^{(l)} ) ,    (18)
where⁶, for each l ∈ [1 : K]:

    R_{1→23}^{(l)} > I( X_1 ; U_{1→23,l} | X_2 W_{[1,l]} V_{[12,l,1]} V_{[23,l-1,3]} ) ,    (19)
    R_{2→13}^{(l)} > I( X_2 ; U_{2→13,l} | X_3 W_{[2,l]} V_{[13,l,1]} V_{[23,l,2]} ) ,    (20)
    R_{3→12}^{(l)} > I( X_3 ; U_{3→12,l} | X_1 W_{[3,l]} V_{[12,l,2]} V_{[13,l,3]} ) ,    (21)
    R_{1→23}^{(l)} + R_{2→13}^{(l)} > I( X_1 X_2 ; U_{1→23,l} U_{2→13,l} | X_3 W_{[1,l]} V_{[13,l,1]} V_{[23,l,2]} ) ,    (22)
    R_{2→13}^{(l)} + R_{3→12}^{(l)} > I( X_2 X_3 ; U_{2→13,l} U_{3→12,l} | X_1 W_{[2,l]} V_{[12,l,2]} V_{[13,l,3]} ) ,    (23)
    R_{1→23}^{(l)} + R_{3→12}^{(l-1)} > I( X_1 X_3 ; U_{1→23,l} U_{3→12,l-1} | X_2 W_{[3,l-1]} V_{[12,l,1]} V_{[23,l-1,3]} ) ,    (24)
    R_{3→2}^{(l-1)} > I( X_3 ; U_{3→2,l-1} | X_2 W_{[2,l]} V_{[23,l-1,3]} V_{[12,l,2]} ) ,    (25)
    R_{1→2}^{(l)} > I( X_1 ; U_{1→2,l} | X_2 W_{[2,l]} V_{[23,l,2]} V_{[12,l,1]} ) ,    (26)
    R_{1→2}^{(l)} + R_{3→2}^{(l-1)} > I( X_1 X_3 ; U_{1→2,l} U_{3→2,l-1} | X_2 W_{[2,l]} V_{[23,l-1,3]} V_{[12,l,1]} ) ,    (27)
    R_{1→3}^{(l)} > I( X_1 ; U_{1→3,l} | X_3 W_{[3,l]} V_{[23,l,3]} V_{[13,l,1]} ) ,    (28)
    R_{2→3}^{(l)} > I( X_2 ; U_{2→3,l} | X_3 W_{[3,l]} V_{[23,l,2]} V_{[13,l,3]} ) ,    (29)
    R_{1→3}^{(l)} + R_{2→3}^{(l)} > I( X_1 X_2 ; U_{1→3,l} U_{2→3,l} | X_3 W_{[3,l]} V_{[23,l,2]} V_{[13,l,1]} ) ,    (30)
    R_{2→1}^{(l)} > I( X_2 ; U_{2→1,l} | X_1 W_{[1,l+1]} V_{[12,l,2]} V_{[13,l+1,1]} ) ,    (31)
    R_{3→1}^{(l)} > I( X_3 ; U_{3→1,l} | X_1 W_{[1,l+1]} V_{[12,l+1,1]} V_{[13,l,3]} ) ,    (32)
    R_{2→1}^{(l)} + R_{3→1}^{(l)} > I( X_2 X_3 ; U_{2→1,l} U_{3→1,l} | X_1 W_{[1,l+1]} V_{[12,l,2]} V_{[13,l,3]} ) ,    (33)

with R_{i→S}^{(0)} = R_{i→S}^{(K+1)} = 0 and U_{i→S,0} = U_{i→S,K+1} = ∅ for S ∈ C(M) and i ∉ S. With these definitions, the rate-distortion region satisfies⁷:

    ∪_{p∈P(D,K)} R̄_3(D, K) ⊆ R_3(D, K) ,    (34)

where P(D, K) denotes the set of all joint probability measures associated with the following Markov chains for every l ∈ [1 : K]:

    1) U_{1→23,l} -- (X_1, W_{[1,l]}) -- (X_2, X_3, V_{[12,l,1]}, V_{[13,l,1]}, V_{[23,l,2]}) ,
    2) U_{1→2,l} -- (X_1, W_{[2,l]}, V_{[12,l,1]}) -- (X_2, X_3, V_{[13,l,1]}, V_{[23,l,2]}) ,
    3) U_{1→3,l} -- (X_1, W_{[2,l]}, V_{[13,l,1]}) -- (X_2, X_3, V_{[12,l,2]}, V_{[23,l,2]}) ,
    4) U_{2→13,l} -- (X_2, W_{[2,l]}) -- (X_1, X_3, V_{[12,l,2]}, V_{[13,l,3]}, V_{[23,l,2]}) ,
    5) U_{2→1,l} -- (X_2, W_{[3,l]}, V_{[12,l,2]}) -- (X_1, X_3, V_{[13,l,3]}, V_{[23,l,2]}) ,
    6) U_{2→3,l} -- (X_2, W_{[3,l]}, V_{[23,l,2]}) -- (X_1, X_3, V_{[12,l+1,1]}, V_{[13,l,3]}) ,
    7) U_{3→12,l} -- (X_3, W_{[3,l]}) -- (X_1, X_2, V_{[12,l+1,1]}, V_{[13,l,3]}, V_{[23,l,3]}) ,
    8) U_{3→1,l} -- (X_3, W_{[1,l+1]}, V_{[13,l,3]}) -- (X_1, X_2, V_{[12,l+1,1]}, V_{[23,l,3]}) ,
    9) U_{3→2,l} -- (X_3, W_{[1,l+1]}, V_{[23,l,3]}) -- (X_1, X_2, V_{[12,l+1,1]}, V_{[13,l+1,1]}) ,

and such that there exist reconstruction mappings:

    g_{ij}( X_i, V_{[ij,K+1,i]}, W_{[1,K+1]} ) = X̂_{ij}    (35)

with E[ d_j( X_j, X̂_{ij} ) ] ≤ D_{ij} for each i, j ∈ M, i ≠ j.

⁶ Notice that these definitions are motivated by the fact that at round 1 node 2 only recovers the descriptions generated by node 1, and at round K + 1 it only recovers what node 3 already generated at round K.
⁷ It is straightforward to show that the left-hand side of (34) is convex, which implies that the convex hull operation is not needed.

The proof of this theorem is relegated to Appendix C and relies on the auxiliary results presented in Appendix A and on the theorem for the cooperative Berger-Tung problem with side information presented in Appendix B.
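All constraints defining P(D, K) are conditional-independence requirements of the form U -- A -- B. As a side illustration (our own, not part of the paper's machinery), such a condition on a candidate finite-alphabet pmf can be checked numerically as follows.

```python
import itertools

def is_markov_chain(p, tol=1e-12):
    """Check the chain U -- A -- B for a joint pmf p given as {(u, a, b): prob}.

    The chain holds iff U and B are conditionally independent given A,
    i.e. p(u, b | a) = p(u | a) * p(b | a) whenever p(a) > 0.
    """
    us = sorted({u for (u, _, _) in p})
    avals = sorted({a for (_, a, _) in p})
    bs = sorted({b for (_, _, b) in p})
    for a in avals:
        pa = sum(p.get((u, a, b), 0.0) for u in us for b in bs)
        if pa <= 0:
            continue
        for u, b in itertools.product(us, bs):
            p_ub_a = p.get((u, a, b), 0.0) / pa
            p_u_a = sum(p.get((u, a, bb), 0.0) for bb in bs) / pa
            p_b_a = sum(p.get((uu, a, b), 0.0) for uu in us) / pa
            if abs(p_ub_a - p_u_a * p_b_a) > tol:
                return False
    return True

if __name__ == "__main__":
    # U is a noisy copy of A, and B is drawn independently of (U, A): chain holds.
    p = {}
    for a, n, b in itertools.product((0, 1), repeat=3):
        u = a ^ n
        p[(u, a, b)] = p.get((u, a, b), 0.0) + 0.5 * (0.1 if n else 0.9) * 0.5
    print(is_markov_chain(p))   # True
```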
Remark 4: It is worth mentioning here that our coding scheme is constrained to use successive decoding, i.e., each node first recovers the coding layer of common descriptions and then the coding layer of private descriptions (within each coding layer, each node employs joint decoding). Obviously, this is a sub-optimal procedure, since the best scheme would be to use joint decoding where both common and private information can be jointly recovered. However, the analysis of such a scheme is much more involved. The associated achievable rate region would involve a large number of equations combining rates belonging to private and common messages from different nodes. Also, several mutual information terms in each of these rate equations cannot be combined, leading to a proliferation of equations that offer little insight into the problem.

Remark 5: The idea behind our derivation of the achievable region can be extended to any number M (> 3) of nodes in the network. This can be accomplished by generating a greater number of superimposed coding layers: first a layer of codes that generates descriptions destined to be decoded by all nodes, then a layer corresponding to all subsets of size M − 1, and so on, until we reach the final layer composed of codes that generate private descriptions for each node. Again, successive decoding is used at the nodes to recover the descriptions in the layers destined to them. Of course, the number of required descriptions increases with the number of nodes, as does the complexity of the obtained rate-distortion region.

Remark 6: It is interesting to compare the main ideas of our scheme with those of Kaspi [13]. The main idea in [13] is to have a single coding tree shared by the two nodes. Each leaf in the coding tree is a codeword generated either at node 1 or at node 2. At a given round, each node knows (assuming no errors in the encoding and decoding procedures) the path followed in the tree. For example, at round l node 1, using the knowledge of the path until round l and its source realization, generates a leaf (from a set of possible ones) using joint typicality encoding and binning. Node 2, using the same path known at node 1 and its own source realization, uses joint typicality decoding to estimate the leaf generated at node 1. If there is no error in these encoding and decoding steps, the previous path is updated with the new leaf and both nodes 1 and 2 know the updated path. Node 2 then repeats the procedure. This is done until round K, where the final path is known at both nodes and used to reconstruct the desired sources.

In the case of three nodes the situation is more involved. At a given round, the encoder at an arbitrary node is facing two decoders with different side information⁸. In order to simplify the explanation, consider that we are at round l at encoder 1, and that the listening nodes are nodes 2 and 3. This situation forces node 1 to encode two sets of descriptions: one common description for the other two nodes and a set of private descriptions, one for each of the listening nodes 2 and 3. Following the ideas of Kaspi, it is then natural to consider three different coding trees followed by node 1. One coding tree has leaves that are the common descriptions generated and shared by all the nodes in the network. The second tree is composed of leaves that are the private descriptions generated and shared with node 2. The third tree is composed of leaves that are the private descriptions generated and shared with node 3. As the private descriptions refine the common ones, depending on the quality of the side information of the node that is the intended recipient, it is clear that the descriptions are correlated. For example, the private description destined to node 2 should depend not only on the past private descriptions generated and shared
by nodes 1 and 2, but also on the common descriptions generated at all previous rounds at all the nodes and on the common description generated at the present round at node 1. Something similar happens for the private description destined to node 3. It is clear that, as the common descriptions are to be recovered by all the nodes in the network, they can only be conditioned on the past common descriptions generated at previous rounds and on the common descriptions generated at the present round by a node that acted before (e.g., at round l node 1 acts before node 2). The private descriptions, as they are only required to be recovered at some set of nodes, can be generated conditioned on the past exchanged common descriptions and on the past private descriptions generated and recovered in the corresponding set of nodes (i.e., the private descriptions exchanged between nodes 1 and 2 at round l can only be generated conditioned on the past common descriptions generated at nodes 1, 2 and 3 and on the past private descriptions exchanged only between nodes 1 and 2). We can see clearly that there are basically four paths to be cooperatively followed in the network:

• One path of common descriptions shared by nodes 1, 2 and 3.
• One path of private descriptions shared by nodes 1 and 2.
• One path of private descriptions shared by nodes 1 and 3.
• One path of private descriptions shared by nodes 2 and 3.

It is also clear that each node only follows three of these paths simultaneously. The exchange of common descriptions deserves special mention. Consider round l at node 3. This node needs to recover the common descriptions generated at nodes 1 and 2. But at the moment node 2 generated its own common description, it had also recovered the common one generated at node 1. This allows for a natural explicit cooperation between nodes 1 and 2 in order to help node 3 recover both descriptions. Clearly, this is not the case for the private descriptions from nodes 1 and 2 to be recovered at node 3. Node 2 does not recover the private description from node 1 to node 3 and cannot engage in explicit collaboration to help node 3 recover both private descriptions. Note, however, that as both private descriptions depend on previous common descriptions, an implicit collaboration (intrinsic to the code generation) is also in force.

⁸ Because at each node the source realizations are different, and the recovered previous descriptions can also be different.

In Appendix B we consider the problem (not in the interactive setting) of generating the explicit cooperation for the common descriptions through the use of what we call a super-binning procedure, in order to use the results for our interactive three-node problem.

IV. KNOWN CASES AND RELATED WORK

Several inner bounds and rate-distortion regions for multiterminal source coding problems can be derived by specializing the inner bound (34). Below we summarize only a few of them.

1) Distributed source coding with side information [4], [19]: Consider the distributed source coding problem where two nodes encode separately the sources X_1^n and X_2^n at rates (R_1, R_2) and a decoder, using side information X_3^n, must reconstruct both sources with average distortions not larger than D_1 and D_2, respectively. By considering only a one-round/one-way information exchange from nodes 1 and 2 (the encoders) to node 3 (the decoder), the results in [4], [19] can be recovered as special cases of the inner bound (34).
Specifically, we set:

    U_{1→23,l} = U_{2→13,l} = U_{3→12,l} = U_{1→2,l} = U_{2→1,l} = U_{3→1,l} = U_{3→2,l} = ∅ , ∀l ,
    U_{1→3,l} = U_{2→3,l} = ∅ , ∀l > 1 .

In this case, the Markov chains of Theorem 1 reduce to:

    U_{1→3,1} -- X_1 -- (X_2, X_3, U_{2→3,1}) ,    (36)
    U_{2→3,1} -- X_2 -- (X_1, X_3, U_{1→3,1}) ,    (37)

and thus the inner bound from Theorem 1 recovers the results in [19]:

    R_1 > I( X_1 ; U_{1→3,1} | X_3 U_{2→3,1} ) ,    (38)
    R_2 > I( X_2 ; U_{2→3,1} | X_3 U_{1→3,1} ) ,    (39)
    R_1 + R_2 > I( X_1 X_2 ; U_{1→3,1} U_{2→3,1} | X_3 ) .    (40)

2) Source coding with side information at 2 decoders [10], [11]: Consider the setting where one encoder observing X_1^n transmits descriptions to two decoders with different side information (X_2^n, X_3^n) and distortion requirements D_2 and D_3. Again we consider only a one-way/one-round information exchange from node 1 (the encoder) to nodes 2 and 3 (the decoders). In this case, we set:

    U_{2→13,l} = U_{3→12,l} = U_{2→1,l} = U_{3→1,l} = U_{3→2,l} = U_{2→3,l} = ∅ , ∀l ,
    U_{1→23,l} = U_{1→2,l} = U_{1→3,l} = ∅ , ∀l > 1 .

The above Markov chains imply

    (U_{1→23,1}, U_{1→2,1}, U_{1→3,1}) -- X_1 -- (X_2, X_3) ,    (41)

and thus the inner bound from Theorem 1 reduces to the results in [10], [11]:

    R_1 > max{ I(X_1; U_{1→23,1} | X_2), I(X_1; U_{1→23,1} | X_3) } + I(X_1; U_{1→2,1} | X_2 U_{1→23,1}) + I(X_1; U_{1→3,1} | X_3 U_{1→23,1}) .    (42)

3) Two-terminal interactive source coding [13]: Our inner bound (34) is basically the generalization of the two-terminal problem to the three-terminal setting. Assume only two encoder-decoders, observing X_1^n and X_2^n, each of which must reconstruct the other terminal's source with distortion constraints D_1 and D_2, after K rounds of information exchange. Let us set:

    U_{1→23,l} = U_{2→13,l} = U_{3→12,l} = U_{1→3,l} = U_{3→1,l} = U_{2→3,l} = U_{3→2,l} = ∅ , ∀l ,
    X_3 = ∅ .

The Markov chains become

    U_{1→2,l} -- (X_1, V_{[12,l,1]}) -- X_2 ,    (43)
    U_{2→1,l} -- (X_2, V_{[12,l,2]}) -- X_1 ,    (44)

for l ∈ [1 : K], and thus the inner bound from Theorem 1 allows us to obtain the results in [13]:

    R_1 > I( X_1 ; V_{[12,K+1,1]} | X_2 ) ,    (45)
    R_2 > I( X_2 ; V_{[12,K+1,2]} | X_1 ) .    (46)

4) Two-terminal interactive source coding with a helper [15]: Consider now two encoders/decoders, namely the ones observing X_2^n and X_3^n, each of which must reconstruct the other terminal's source with distortion constraints D_2 and D_3, respectively, using K communication rounds. Assume also that another encoder, observing X_1^n, provides both nodes (2, 3) with a common description before the information exchange begins and then remains silent. Such a common description can be exploited as coded side information. Let us set:

    U_{2→13,l} = U_{3→12,l} = U_{1→2,l} = U_{1→3,l} = U_{2→1,l} = U_{3→1,l} = ∅ , ∀l ,
    U_{1→23,l} = ∅ , ∀l > 1 .

The Markov chains reduce to:

    U_{1→23,1} -- X_1 -- (X_2, X_3) ,    (47)
    U_{2→3,l} -- (X_2, U_{1→23,1}, V_{[23,l,2]}) -- (X_1, X_3) ,    (48)
    U_{3→2,l} -- (X_3, U_{1→23,1}, V_{[23,l,3]}) -- (X_1, X_2) .    (49)

An inner bound to the rate-distortion region of this problem then follows from the rate equations in Theorem 1:

    R_1 > max{ I(X_1; U_{1→23,1} | X_2), I(X_1; U_{1→23,1} | X_3) } ,    (50)
    R_2 > I( X_2 ; V_{[23,K+1,2]} | X_3 U_{1→23,1} ) ,    (51)
    R_3 > I( X_3 ; V_{[23,K+1,2]} | X_2 U_{1→23,1} ) .    (52)

This region contains as a special case the region in [15]. In that paper it is further assumed (in order to have a converse result) that X_1 -- X_3 -- X_2. Then the value of R_1 satisfies R_1 > I(X_1; U_{1→23,1} | X_2). Obviously, with the same extra Markov chain we obtain the same limiting value for R_1, and the above region is the rate-distortion region.
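As a concrete, purely illustrative numerical instance of the specialized bounds above, the following sketch evaluates (38)-(40) for a binary joint distribution and binary test channels of our own choosing; none of these numbers appear in the paper.

```python
import itertools
from math import log2

def H(pmf):
    """Entropy (in bits) of a pmf given as {outcome: prob}."""
    return -sum(v * log2(v) for v in pmf.values() if v > 0)

def marginal(p, coords):
    """Marginal pmf over the listed coordinates of a joint pmf on tuples."""
    out = {}
    for x, v in p.items():
        key = tuple(x[c] for c in coords)
        out[key] = out.get(key, 0.0) + v
    return out

def I(p, A, B, C=()):
    """Conditional mutual information I(X_A; X_B | X_C)."""
    A, B, C = list(A), list(B), list(C)
    return (H(marginal(p, A + C)) + H(marginal(p, B + C))
            - H(marginal(p, A + B + C)) - H(marginal(p, C)))

# Coordinates: 0:X1, 1:X2, 2:X3, 3:U1 (= U_{1->3,1}), 4:U2 (= U_{2->3,1}).
# X1 ~ Bern(1/2); X2 = X1 + Z2; X3 = X2 + Z3; U1 = X1 + N1; U2 = X2 + N2 (mod 2),
# with Z2, Z3, N1, N2 independent Bern(0.1): an arbitrary illustrative choice
# whose test channels satisfy the Markov chains (36)-(37).
q = 0.1
p = {}
for x1, z2, z3, n1, n2 in itertools.product((0, 1), repeat=5):
    x2, x3 = x1 ^ z2, x1 ^ z2 ^ z3
    u1, u2 = x1 ^ n1, x2 ^ n2
    w = 0.5 * (q if z2 else 1 - q) * (q if z3 else 1 - q) \
            * (q if n1 else 1 - q) * (q if n2 else 1 - q)
    p[(x1, x2, x3, u1, u2)] = p.get((x1, x2, x3, u1, u2), 0.0) + w

X1, X2, X3, U1, U2 = (0,), (1,), (2,), (3,), (4,)
print("(38)  R1    >", I(p, X1, U1, X3 + U2))
print("(39)  R2    >", I(p, X2, U2, X3 + U1))
print("(40)  R1+R2 >", I(p, X1 + X2, U1 + U2, X3))
```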
V. NEW RESULTS ON INTERACTIVE AND COOPERATIVE SOURCE CODING

A. Two encoders and one decoder subject to lossy/lossless reconstruction constraints without side information

Consider now the problem described in Fig. 2, where encoder 1 wishes to communicate the source X_1^n to node 3 in a lossless manner, while encoder 2 wishes to send a lossy description of the source X_2^n to node 3 with distortion constraint D_{32}. To achieve this, the encoders use K communication rounds. This problem can be seen as the cooperating-encoders version of the well-known Berger-Yeung problem [5].

Theorem 2: The rate-distortion region of the setting described in Fig. 2 is given by the union, over all joint probability measures p_{X_1X_2U_{2→13}} such that there exists a reconstruction mapping

    g_{32}( X_1, U_{2→13} ) = X̂_{32}  with  E[ d( X_2, X̂_{32} ) ] ≤ D_{32} ,    (53)

of the set of all tuples satisfying:

    R_1 ≥ H( X_1 | X_2 ) ,    (54)
    R_2 ≥ I( X_2 ; U_{2→13} | X_1 ) ,    (55)
    R_1 + R_2 ≥ H( X_1 ) + I( X_2 ; U_{2→13} | X_1 ) .    (56)

The auxiliary random variable U_{2→13} has the cardinality bound ‖U_{2→13}‖ ≤ ‖X_1‖‖X_2‖ + 1.

Figure 2: Two encoders and one decoder subject to lossy/lossless reconstruction constraints without side information.

Remark 7: It is worth emphasizing that the rate-distortion region in Theorem 2 outperforms the non-cooperative rate-distortion region first derived in [5]. This is due to two facts: the conditional entropy in the rate constraint (54), which is strictly smaller than the entropy H(X_1) present in the rate region of [5], and the fact that the random description U_{2→13} may be arbitrarily dependent on both sources (X_1, X_2), which is not the case without cooperation [5]. Therefore, cooperation between encoders 1 and 2 reduces the rate needed to communicate the source X_1 while enlarging the optimization set of all admissible source descriptions.

Remark 8: Notice that the rate-distortion region in Theorem 2 is achievable with a single round of interaction, K = 1, which implies that multiple rounds do not improve the rate-distortion region in this case. This holds because node 3 reconstructs X_1 in a lossless fashion.

Remark 9: Although in the considered setting of Fig. 2 node 1 is not required to decode either a lossy description or the complete source X_2^n, if nodes 1 and 3 wish to recover the same descriptions, the optimal rate region remains the same as given in Theorem 2. The only difference lies in the fact that node 1 is now able to find a function g_{12}(X_1, U_{2→13}) = X̂_{12}, which must satisfy an additional distortion constraint E[d(X_2, X̂_{12})] ≤ D_{12}. In order to show this, it is enough to check that in the converse proof given below the specific choice of the auxiliary random variable already allows node 1 to recover a general function X̂_{12[t]} = g_{12}( X_{1[t]}, U_{2→13[t]} ) for each time t ∈ {1, ..., n}.

Proof: The direct part of the proof simply follows by choosing:

    U_{3→12,l} = U_{1→3,l} = U_{1→2,l} = U_{2→1,l} = U_{2→3,l} = U_{3→1,l} = U_{3→2,l} = ∅ , ∀l ,
    U_{1→23,1} = X_1 ,   U_{1→23,l} = U_{2→13,l} = ∅ , ∀l > 1 ,

and thus the rate-distortion region (34) reduces to the desired region in Theorem 2, where for simplicity we dropped the round index.
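Before turning to the converse, it is instructive to evaluate the region of Theorem 2 on a simple example (our own illustration, not taken from the paper). Let X_1 ~ Bern(1/2) and X_2 = X_1 ⊕ Z with Z ~ Bern(p) independent of X_1, and suppose D_{32} = 0, so that one may take U_{2→13} = X_2 and g_{32}(X_1, U_{2→13}) = U_{2→13} in (53). The constraints (54)-(56) then evaluate to

    R_1 ≥ H(X_1|X_2) = H_2(p) ,   R_2 ≥ I(X_2; X_2|X_1) = H(X_2|X_1) = H_2(p) ,
    R_1 + R_2 ≥ H(X_1) + H(X_2|X_1) = 1 + H_2(p) = H(X_1 X_2) ,

i.e., the sum rate equals the joint entropy, as expected when both sources end up reproduced losslessly, while encoder 1 individually only needs to cover H(X_1|X_2), illustrating the cooperation gain discussed in Remark 7.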
We now proceed to the proof of the converse. If a pair of rates (R_1, R_2) and a distortion D_{32} are admissible for the K-step interactive cooperative distributed source coding setting described in Fig. 2, then for all ε > 0 there exists n_0(ε, K) such that ∀n > n_0(ε, K) there exists a K-step interactive source code (n, K, F, G) with intermediate rates satisfying:

    (1/n) Σ_{l=1}^{K} log ‖J_i^l‖ ≤ R_i + ε ,   i ∈ {1, 2} ,    (57)

and with average per-letter distortion with respect to source 2 and perfect reconstruction with respect to source 1 at node 3:

    E[ d( X_2^n, X̂_{32}^n ) ] ≤ D_{32} + ε ,    (58)
    Pr{ X_1^n ≠ X̂_{31}^n } ≤ ε ,    (59)

where

    X̂_{32}^n ≡ g_{32}( J_1^{[1:K]}, J_2^{[1:K]} ) ,   X̂_{31}^n ≡ g_{31}( J_1^{[1:K]}, J_2^{[1:K]} ) .    (60)

For each t ∈ {1, ..., n}, define the random variable U_{2→13[t]} as follows:

    U_{2→13[t]} ≜ ( J_1^{[1:K]}, J_2^{[1:K]}, X_{1[1:t-1]}, X_{1[t+1:n]} ) .    (61)

By condition (59), which says that Pr{X_1^n ≠ X̂_{31}^n} ≤ ε, and Fano's inequality [21], we have

    H( X_1^n | X̂_{31}^n ) ≤ Pr{ X_1^n ≠ X̂_{31}^n } log_2( ‖X_1^n‖ − 1 ) + H_2( Pr{ X_1^n ≠ X̂_{31}^n } ) ≜ nε_n ,    (62)

where ε_n(ε) → 0 provided that ε → 0 and n → ∞.

1) Rate at node 1: For the first rate, we have

    n(R_1 + ε) ≥ H( J_1^{[1:K]} )    (63)
               ≥ I( J_1^{[1:K]} ; X_1^n | X_2^n )    (64)
               = nH(X_1|X_2) − H( X_1^n | X_2^n J_1^{[1:K]} )    (65)
          (a)  = nH(X_1|X_2) − H( X_1^n | X_2^n J_1^{[1:K]} J_2^{[1:K]} )    (66)
          (b)  ≥ nH(X_1|X_2) − H( X_1^n | X̂_{31}^n )    (67)
          (c)  ≥ nH(X_1|X_2) − nε_n ,    (68)

where
• step (a) follows from the fact that, by definition of the code, the sequence J_2^{[1:K]} is a function of the source X_2^n and of the vector of messages J_1^{[1:K]},
• step (b) follows from the code assumption that guarantees the existence of a reconstruction function X̂_{31}^n ≡ g_{31}( J_1^{[1:K]}, J_2^{[1:K]} ),
• step (c) follows from Fano's inequality in (62).

2) Rate at node 2: For the second rate, we have

    n(R_2 + ε) ≥ H( J_2^{[1:K]} )    (69)
               ≥ I( J_2^{[1:K]} ; X_1^n X_2^n )    (70)
               = I( J_2^{[1:K]} ; X_1^n ) + I( J_2^{[1:K]} ; X_2^n | X_1^n )    (71)
          (a)  ≥ I( J_2^{[1:K]} ; X_1^n ) + Σ_{t=1}^{n} I( J_2^{[1:K]} ; X_{2[t]} | X_{1[t]} X_{1[t+1:n]} X_{1[1:t-1]} X_{2[1:t-1]} )    (72)
          (b)  = I( J_2^{[1:K]} ; X_1^n ) + Σ_{t=1}^{n} I( J_2^{[1:K]} X_{1[t+1:n]} X_{1[1:t-1]} X_{2[1:t-1]} ; X_{2[t]} | X_{1[t]} )    (73)
          (c)  ≥ I( J_2^{[1:K]} ; X_1^n ) + Σ_{t=1}^{n} I( U_{2→13[t]} ; X_{2[t]} | X_{1[t]} )    (74)
          (d)  = I( J_2^{[1:K]} ; X_1^n ) + Σ_{t=1}^{n} I( U_{2→13[Q]} ; X_{2[Q]} | X_{1[Q]}, Q = t )    (75)
          (e)  = I( J_2^{[1:K]} ; X_1^n ) + n I( U_{2→13[Q]} ; X_{2[Q]} | X_{1[Q]}, Q )    (76)
          (f)  ≥ I( J_2^{[1:K]} ; X_1^n ) + n I( Ũ_{2→13} ; X_2 | X_1 )    (77)
          (g)  ≥ n I( Ũ_{2→13} ; X_2 | X_1 ) ,    (78)

where
• step (a) follows from the chain rule for conditional mutual information and the non-negativity of mutual information,
• step (b) follows from the memoryless property across time of the sources (X_1^n, X_2^n),
• step (c) follows from the non-negativity of mutual information and definition (61),
• step (d) follows from the use of a time-sharing random variable Q uniformly distributed over the set {1, ..., n},
• step (e) follows from the definition of conditional mutual information,
• step (f) follows by letting a new random variable Ũ_{2→13} ≜ (U_{2→13[Q]}, Q),
• step (g) follows from the non-negativity of mutual information.
3) Sum-rate of nodes 1 and 2: For the sum-rate, we have

    n(R_1 + R_2 + 2ε) ≥ H( J_1^{[1:K]} ) + n(R_2 + ε)    (79)
          (a)  ≥ H( J_1^{[1:K]} ) + I( J_2^{[1:K]} ; X_1^n ) + n I( Ũ_{2→13} ; X_2 | X_1 )    (80)
               = H( J_1^{[1:K]} | J_2^{[1:K]} ) + I( J_1^{[1:K]} ; J_2^{[1:K]} ) + I( J_2^{[1:K]} ; X_1^n ) + n I( Ũ_{2→13} ; X_2 | X_1 )    (81)
               ≥ I( J_1^{[1:K]} ; X_1^n | J_2^{[1:K]} ) + I( J_1^{[1:K]} ; J_2^{[1:K]} ) + I( J_2^{[1:K]} ; X_1^n ) + n I( Ũ_{2→13} ; X_2 | X_1 )    (82)
               = I( J_1^{[1:K]} ; X_1^n ) + I( X_1^n J_1^{[1:K]} ; J_2^{[1:K]} ) + n I( Ũ_{2→13} ; X_2 | X_1 )    (83)
          (b)  = n[ H(X_1) + I( Ũ_{2→13} ; X_2 | X_1 ) ] + I( X_1^n J_1^{[1:K]} ; J_2^{[1:K]} ) − H( X_1^n | J_1^{[1:K]} )    (84)
          (c)  ≥ n[ H(X_1) + I( Ũ_{2→13} ; X_2 | X_1 ) ] − H( X_1^n | J_1^{[1:K]} J_2^{[1:K]} )    (85)
          (d)  ≥ n[ H(X_1) + I( Ũ_{2→13} ; X_2 | X_1 ) ] − H( X_1^n | X̂_{31}^n )    (86)
          (e)  ≥ n[ H(X_1) + I( Ũ_{2→13} ; X_2 | X_1 ) − ε_n ] ,    (87)

where
• step (a) follows from inequality (77),
• step (b) follows from the memoryless property across time of the source X_1^n,
• step (c) follows from the non-negativity of mutual information,
• step (d) follows from the code assumption that guarantees the existence of the reconstruction function X̂_{31}^n ≡ g_{31}( J_1^{[1:K]}, J_2^{[1:K]} ) and from the fact that reducing the conditioning can only increase entropy,
• step (e) follows from Fano's inequality in (62).

4) Distortion at node 3: Node 3 reconstructs losslessly X̂_{31}^n ≡ g_{31}( J_1^{[1:K]}, J_2^{[1:K]} ) and lossily X̂_{32}^n ≡ g_{32}( J_1^{[1:K]}, J_2^{[1:K]} ). For each t ∈ {1, ..., n}, define a function X̂_{32[t]} as being the t-th coordinate of this estimate:

    X̂_{32[t]}( U_{2→13[t]} ) ≜ g_{32[t]}( J_1^{[1:K]}, J_2^{[1:K]} ) .    (88)

The component-wise mean distortion thus satisfies

    D_{32} + ε ≥ E[ d( X_2^n, g_{32}( J_1^{[1:K]}, J_2^{[1:K]} ) ) ]    (89)
               = (1/n) Σ_{t=1}^{n} E[ d( X_{2[t]}, X̂_{32[t]}( U_{2→13[t]} ) ) ]    (90)
               = (1/n) Σ_{t=1}^{n} E[ d( X_{2[Q]}, X̂_{32[Q]}( U_{2→13[Q]} ) ) | Q = t ]    (91)
               = E[ d( X_{2[Q]}, X̂_{32[Q]}( U_{2→13[Q]} ) ) ]    (92)
               = E[ d( X_2, X̃_{32}( Ũ_{2→13} ) ) ] ,    (93)

where we defined the function X̃_{32} by

    X̃_{32}( Ũ_{2→13} ) = X̃_{32}( Q, U_{2→13[Q]} ) ≜ X̂_{32[Q]}( U_{2→13[Q]} ) .    (94)

This concludes the proof of the converse and thus that of the theorem.

B. Two encoders and three decoders subject to lossless/lossy reconstruction constraints with side information

Consider now the problem described in Fig. 3, where encoder 1 wishes to communicate the source X_1^n losslessly to nodes 2 and 3, while encoder 2 wishes to send a lossy description of its source X_2^n to nodes 1 and 3 with distortion constraints D_{12} and D_{32}, respectively. In addition, the encoders carry out the communication using K communication rounds. This problem can be seen as a generalization of the settings previously investigated in [3], [5].

Theorem 3: The rate-distortion region of the setting described in Fig. 3 is given by the union over all joint probability measures p_{X_1X_2X_3U_{2→13}U_{2→3}} satisfying the Markov chain

    ( U_{2→13}, U_{2→3} ) -- ( X_1, X_2 ) -- X_3 ,    (95)

and such that there exist reconstruction mappings:

    g_{32}( X_1, X_3, U_{2→13}, U_{2→3} ) = X̂_{32}  with  E[ d( X_2, X̂_{32} ) ] ≤ D_{32} ,    (96)
    g_{12}( X_1, U_{2→13} ) = X̂_{12}  with  E[ d( X_2, X̂_{12} ) ] ≤ D_{12} ,    (97)
Remark 10: Notice that the rate-distortion region in Theorem 3 is achievable with a single round of interactions K = 1, which implies that multiple rounds do not improve the ratedistortion region in this case. Remark 11: It is worth mentioning that cooperation between encoders reduces the rate needed to communicate the source X2 while increasing the optimization set of all admissible source descriptions. Proof: The direct part of the proof follows by choosing: U3→12,l =U1→3,l = U1→2,l = U2→1,l = U3→1,l = U3→2,l = ∅, ∀l U1→23 ≡U1→23,1 = X1 , November 7, 2018 U1→23,l = U2→13,l = U2→3,l = ∅ ∀ l > 1 . DRAFT 26 and U2→13,1 ≡ U2→13 and U2→3,1 ≡ U2→3 are auxiliary random variables that according to Theorem 1 should satisfy: U2→13 −− (X1 , X2 ) −− X3 , U2→3 , −−(U2→13 , X1 , X2 ) −− X3 . (101) Notice, however that these Markov chains are equivalent to (95). From the rate equations in Theorem 1, and the above choices for the auxiliary random variables we obtain: R1→23 >H(X1 |X2 ) , (102) R2→13 >max {I(X2 ; U2→13 |X1 ), I(X2 ; U2→13 |X1 X3 )} (103) =I(X2 ; U2→13 |X1 ) , (104) R1→23 + R2→13 >H(X1 |X3 ) + I(X2 ; U2→13 |X1 X3 ) , (105) R2→3 >I(X2 ; U2→3 |U2→13 X1 X3 ) . (106) Noticing that R1 ≡ R1→23 and R2 ≡ R2→13 + R2→3 the rate-distortion region (34) reduces to the desired region in Theorem 3, where for simplicity we dropped the round index. We now proceed to the proof of the converse. If a pair of rates (R1 , R2 ) and distortions (D12 , D32 ) are admissible for the K-steps interactive cooperative distributed source coding setting described in Fig. 3, then for all ε > 0 there exists n0 (ε, K), such that ∀ n > n0 (ε, K) there is a K-steps interactive source code (n, K, F, G) with intermediate rates satisfying: K 1X log kJil k ≤ Ri + ε , i ∈ {1, 2} n l=1 (107) and with average per-letter distortions with respect to the source 2 and perfect reconstruction with respect to the source 1 at all nodes: h i n n E d(X2 , X̂32 ) ≤ D32 + ε ,   n ≤ε, Pr X1n 6= X̂21 h i n ) ≤ D12 + ε , E d(X2n , X̂12   n ≤ε, Pr X1n 6= X̂31 November 7, 2018 (108) (109) (110) (111) DRAFT 27 where   [1:K] [1:K] n X̂32 ≡ g32 J1 , J2 , X3n ,   [1:K] [1:K] n X̂31 ≡ g31 J1 , J2 , X3n ,   [1:K] n X̂12 ≡ g12 J2 , X1n ,   [1:K] n X̂21 ≡ g21 J1 , X2n . (112) (113) For each t ∈ {1, . . . , n}, define random variables U2→13[t] and U2→3[t] as follows:   [1:K] [1:K] U2→13[t] , J1 , J2 , X1[1:t−1] , X1[t+1:n] , X3[1:t−1] ,  U2→3[t] , U2→13[t] , X3[t+1:n] , X2[1:t−1] . (114) (115) The fact that these choices of the auxiliary random variables satisfy the Markov chain (95) can be obtained from point 6) in Lemma 10. By the conditions (111) and (109), and Fano’s inequality, we have n H(X1n |X̂31 ) ≤ Pr   X1n 6= n X̂31 n n H(X1n |X̂21 ) ≤ Pr X1n 6= X̂21   log2 (kX1n k  X1n n X̂31  , nǫn , 6 = − 1) + H2 Pr    n 6 X̂21 , nǫn , log2 (kX1n k − 1) + H2 Pr X1n = (116) (117) where ǫn (ε) → 0 provided that ε → 0 and n → ∞. 1) Rate at node 1: For the first rate, we have   [1:K] n(R1 + ε) ≥ H J1   [1:K] n ≥ H J1 |X2   (a) [1:K] n n = I J1 ; X1 |X2   [1:K] = nH(X1 |X2 ) − H X1n |X2n J1 (118) (119) (120) (121) (b) n ≥ nH(X1 |X2 ) − H(X1n |X̂21 ) (122) (c) ≥ n [H(X1 |X2 ) − ǫn ] , (123) where • [1:K] step (a) follows from the fact that by definition of the code the sequence J1 is a function of the both sources (X1n , X2n ), • • step (b) follows from the code assumption in (113) that guarantees the existence of a   [1:K] n reconstruction function X̂21 ≡ g21 J1 , X2n , step (c) follows from Fano’s inequality in (117). 
2) Rate at node 2: For the second rate, we have

    n(R_2 + ε) ≥ H( J_2^{[1:K]} )    (124)
          (a)  = I( J_2^{[1:K]} ; X_1^n X_2^n X_3^n )    (125)
          (b)  ≥ I( J_2^{[1:K]} ; X_2^n X_3^n | X_1^n )    (126)
          (c)  = I( J_1^{[1:K]} J_2^{[1:K]} ; X_2^n X_3^n | X_1^n )    (127)
               = I( J_1^{[1:K]} J_2^{[1:K]} ; X_3^n | X_1^n ) + I( J_1^{[1:K]} J_2^{[1:K]} ; X_2^n | X_1^n X_3^n )    (128)
          (d)  = Σ_{t=1}^{n} [ I( J_1^{[1:K]} J_2^{[1:K]} ; X_{3[t]} | X_1^n X_{3[1:t-1]} ) + I( J_1^{[1:K]} J_2^{[1:K]} ; X_{2[t]} | X_1^n X_3^n X_{2[1:t-1]} ) ]    (129)
          (e)  = Σ_{t=1}^{n} [ I( J_1^{[1:K]} J_2^{[1:K]} X_{1[1:t-1]} X_{1[t+1:n]} X_{3[1:t-1]} ; X_{3[t]} | X_{1[t]} ) + I( J_1^{[1:K]} J_2^{[1:K]} X_{1[1:t-1]} X_{1[t+1:n]} X_{3[1:t-1]} X_{3[t+1:n]} X_{2[1:t-1]} ; X_{2[t]} | X_{1[t]} X_{3[t]} ) ]    (130)
          (f)  = Σ_{t=1}^{n} [ I( U_{2→13[t]} ; X_{3[t]} | X_{1[t]} ) + I( U_{2→13[t]} ; X_{2[t]} | X_{1[t]} X_{3[t]} ) + I( U_{2→3[t]} ; X_{2[t]} | X_{1[t]} X_{3[t]} U_{2→13[t]} ) ]    (131)
               = Σ_{t=1}^{n} [ I( U_{2→13[t]} ; X_{2[t]} X_{3[t]} | X_{1[t]} ) + I( U_{2→3[t]} ; X_{2[t]} | X_{1[t]} X_{3[t]} U_{2→13[t]} ) ]    (132)
          (g)  = Σ_{t=1}^{n} [ I( U_{2→13[t]} ; X_{2[t]} | X_{1[t]} ) + I( U_{2→3[t]} ; X_{2[t]} | X_{1[t]} X_{3[t]} U_{2→13[t]} ) ]    (133)
          (h)  = Σ_{t=1}^{n} [ I( U_{2→13[Q]} ; X_{2[Q]} | X_{1[Q]}, Q = t ) + I( U_{2→3[Q]} ; X_{2[Q]} | X_{1[Q]} X_{3[Q]} U_{2→13[Q]}, Q = t ) ]    (134)
          (i)  ≥ n[ I( Ũ_{2→13} ; X_2 | X_1 ) + I( Ũ_{2→3} ; X_2 | X_1 X_3 Ũ_{2→13} ) ] ,    (135)

where
• step (a) follows from the fact that J_2^{[1:K]} is a function of the sources (X_1^n, X_2^n),
• step (b) follows from the non-negativity of mutual information,
• step (c) follows from the fact that J_1^{[1:K]} is a function of J_2^{[1:K]} and the source X_1^n,
• step (d) follows from the chain rule for conditional mutual information,
• step (e) follows from the memoryless property across time of the sources (X_1^n, X_2^n, X_3^n),
• step (f) follows from the chain rule for conditional mutual information and the definitions (114) and (115),
• step (g) follows from the Markov chain U_{2→13[t]} -- (X_{1[t]}, X_{2[t]}) -- X_{3[t]} for all t ∈ {1, ..., n},
• step (h) follows from the use of a time-sharing random variable Q uniformly distributed over the set {1, ..., n},
• step (i) follows by letting the new random variables Ũ_{2→13} ≜ (U_{2→13[Q]}, Q) and Ũ_{2→3} ≜ (U_{2→3[Q]}, Q).
3) Sum-rate of nodes 1 and 2: For the sum-rate, we have

    n(R_1 + R_2 + 2ε) ≥ H( J_1^{[1:K]} ) + H( J_2^{[1:K]} )    (136)
               = H( J_1^{[1:K]} J_2^{[1:K]} ) + I( J_1^{[1:K]} ; J_2^{[1:K]} )    (137)
          (a)  = I( J_1^{[1:K]} J_2^{[1:K]} ; X_1^n X_2^n X_3^n ) + I( J_1^{[1:K]} ; J_2^{[1:K]} )    (138)
          (b)  ≥ I( J_1^{[1:K]} J_2^{[1:K]} ; X_1^n X_2^n | X_3^n )    (139)
               = I( J_1^{[1:K]} J_2^{[1:K]} ; X_1^n | X_3^n ) + I( J_1^{[1:K]} J_2^{[1:K]} ; X_2^n | X_1^n X_3^n )    (140)
               = H( X_1^n | X_3^n ) − H( X_1^n | J_1^{[1:K]} J_2^{[1:K]} X_3^n ) + I( J_1^{[1:K]} J_2^{[1:K]} ; X_2^n | X_1^n X_3^n )    (141)
          (c)  ≥ H( X_1^n | X_3^n ) − H( X_1^n | X̂_{31}^n ) + I( J_1^{[1:K]} J_2^{[1:K]} ; X_2^n | X_1^n X_3^n )    (142)
          (d)  ≥ n[ H(X_1|X_3) − ε_n ] + I( J_1^{[1:K]} J_2^{[1:K]} ; X_2^n | X_1^n X_3^n )    (143)
          (e)  = n[ H(X_1|X_3) − ε_n ] + Σ_{t=1}^{n} I( J_1^{[1:K]} J_2^{[1:K]} X_{1[1:t-1]} X_{1[t+1:n]} X_{3[1:t-1]} X_{3[t+1:n]} X_{2[1:t-1]} ; X_{2[t]} | X_{1[t]} X_{3[t]} )    (144)
          (f)  = n[ H(X_1|X_3) − ε_n ] + Σ_{t=1}^{n} I( U_{2→13[t]} U_{2→3[t]} ; X_{2[t]} | X_{1[t]} X_{3[t]} )    (145)
          (g)  = n[ H(X_1|X_3) − ε_n + I( U_{2→13[Q]} U_{2→3[Q]} ; X_{2[Q]} | X_{1[Q]} X_{3[Q]}, Q ) ]    (146)
          (h)  = n[ H(X_1|X_3) − ε_n + I( Ũ_{2→13} Ũ_{2→3} ; X_2 | X_1 X_3 ) ] ,    (147)

where
• step (a) follows from the fact that J_1^{[1:K]} and J_2^{[1:K]} are functions of the sources (X_1^n, X_2^n, X_3^n),
• step (b) follows from the non-negativity of mutual information,
• step (c) follows from the code assumption in (112) that guarantees the existence of the reconstruction function X̂_{31}^n ≡ g_{31}( J_1^{[1:K]}, J_2^{[1:K]}, X_3^n ),
• step (d) follows from Fano's inequality in (111),
• step (e) follows from the chain rule of conditional mutual information and the memoryless property across time of the sources (X_1^n, X_2^n, X_3^n),
• step (f) follows from the definitions (114) and (115),
• step (g) follows from the use of a time-sharing random variable Q uniformly distributed over the set {1, ..., n},
• step (h) follows by letting the new random variables Ũ_{2→13} ≜ (U_{2→13[Q]}, Q) and Ũ_{2→3} ≜ (U_{2→3[Q]}, Q).

4) Distortion at node 1: Node 1 obtains a lossy reconstruction X̂_{12}^n ≡ g_{12}( J_2^{[1:K]}, X_1^n ). It is clear that, without loss of generality, we can write X̂_{12}^n ≡ g_{12}( J_1^{[1:K]}, J_2^{[1:K]}, X_1^n ). For each t ∈ {1, ..., n}, define a function X̂_{12[t]} as being the t-th coordinate of this estimate:

    X̂_{12[t]}( U_{2→13[t]}, X_{1[t]} ) ≜ g_{12[t]}( J_1^{[1:K]}, J_2^{[1:K]}, X_1^n ) .    (148)

The component-wise mean distortion thus satisfies

    D_{12} + ε ≥ E[ d( X_2^n, g_{12}( J_1^{[1:K]}, J_2^{[1:K]}, X_1^n ) ) ]    (149)
               = (1/n) Σ_{t=1}^{n} E[ d( X_{2[t]}, X̂_{12[t]}( U_{2→13[t]}, X_{1[t]} ) ) ]    (150)
               = (1/n) Σ_{t=1}^{n} E[ d( X_{2[Q]}, X̂_{12[Q]}( U_{2→13[Q]}, X_{1[Q]} ) ) | Q = t ]    (151)
               = E[ d( X_{2[Q]}, X̂_{12[Q]}( U_{2→13[Q]}, X_{1[Q]} ) ) ]    (152)
               = E[ d( X_2, X̃_{12}( Ũ_{2→13}, X_1 ) ) ] ,    (153)

where we defined the function X̃_{12} by

    X̃_{12}( Ũ_{2→13}, X_1 ) = X̃_{12}( Q, U_{2→13[Q]}, X_{1[Q]} ) ≜ X̂_{12[Q]}( U_{2→13[Q]}, X_{1[Q]} ) .    (154)

5) Distortion at node 3: Node 3 obtains a lossy reconstruction X̂_{32}^n ≡ g_{32}( J_1^{[1:K]}, J_2^{[1:K]}, X_3^n ). For each t ∈ {1, ..., n}, define a function X̂_{32[t]} as being the t-th coordinate of this estimate:

    X̂_{32[t]}( U_{2→13[t]}, U_{2→3[t]}, X_{3[t]} ) ≜ g_{32[t]}( J_1^{[1:K]}, J_2^{[1:K]}, X_3^n ) .    (155)

The component-wise mean distortion thus satisfies

    D_{32} + ε ≥ E[ d( X_2^n, g_{32}( J_1^{[1:K]}, J_2^{[1:K]}, X_3^n ) ) ]    (156)
               = (1/n) Σ_{t=1}^{n} E[ d( X_{2[t]}, X̂_{32[t]}( U_{2→13[t]}, U_{2→3[t]}, X_{3[t]} ) ) ]    (157)
               = (1/n) Σ_{t=1}^{n} E[ d( X_{2[Q]}, X̂_{32[Q]}( U_{2→13[Q]}, U_{2→3[Q]}, X_{3[Q]} ) ) | Q = t ]    (158)
               = E[ d( X_{2[Q]}, X̂_{32[Q]}( U_{2→13[Q]}, U_{2→3[Q]}, X_{3[Q]} ) ) ]    (159)
               = E[ d( X_2, X̃_{32}( Ũ_{2→13}, Ũ_{2→3}, X_3 ) ) ] ,    (160)

where we defined the function X̃_{32} by

    X̃_{32}( Ũ_{2→13}, Ũ_{2→3}, X_3 ) = X̃_{32}( Q, U_{2→13[Q]}, U_{2→3[Q]}, X_{3[Q]} ) ≜ X̂_{32[Q]}( U_{2→13[Q]}, U_{2→3[Q]}, X_{3[Q]} ) .    (161)
This concludes the proof of the converse and thus that of the theorem.

C. Two encoders and three decoders subject to lossless/lossy reconstruction constraints, reversal delivery and side information

Consider now the problem described in Fig. 4, where encoder 1 wishes to communicate the source X_1^n losslessly to node 2 and a lossy description of it to node 3, while encoder 2 wishes to send a lossy description of its source X_2^n to node 1 and a lossless one to node 3. The corresponding distortions at nodes 1 and 3 are D_{12} and D_{31}, respectively. In addition to this, the encoders accomplish the communication using K communication rounds. This problem is very similar to the one described in Fig. 3, with the difference that the decoding requirements at node 3 are inverted.

Figure 4: Two encoders and three decoders subject to lossless/lossy reconstruction constraints, reversal delivery and side information (node 1 observes X_1^n and reconstructs (X̂_{12}^n, D_{12}); node 2 observes X_2^n and reconstructs X̂_{21}^n ≈ X_1^n; node 3 observes X_3^n and reconstructs (X̂_{31}^n, D_{31}) and X̂_{32}^n ≈ X_2^n).

Theorem 4: The rate-distortion region of the setting described in Fig. 4 is given by the union, over all joint probability measures p_{X_1 X_2 X_3 U_{2→13}} satisfying the Markov chain

U_{2→13} −− (X_1, X_2) −− X_3   (162)

and for which there exist reconstruction mappings

g_{31}(X_2, X_3, U_{2→13}) = X̂_{31} with E[d(X_1, X̂_{31})] ≤ D_{31},   (163)
g_{12}(X_1, U_{2→13}) = X̂_{12} with E[d(X_2, X̂_{12})] ≤ D_{12},   (164)

of the set of all rate pairs (R_1, R_2) satisfying:

R_1 ≥ H(X_1|X_2),   (165)
R_2 ≥ I(U_{2→13}; X_2|X_1) + H(X_2|U_{2→13} X_1 X_3),   (166)
R_1 + R_2 ≥ H(X_1 X_2|X_3).   (167)

The auxiliary random variable has cardinality bound ‖U_{2→13}‖ ≤ ‖X_1‖‖X_2‖ + 3.

Remark 12: Notice that the rate-distortion region in Theorem 4 is achievable with a single round of interaction (K = 1), which implies that multiple rounds do not improve the rate-distortion region in this case.

Remark 13: Notice that, although node 3 requires only a lossy recovery of X_1, it can in fact recover X_1 perfectly. Indeed, since node 3 must recover X_2 perfectly, it ends up with at least the same information as node 2, which recovers X_1 perfectly. This explains the sum-rate term. We also see that cooperation helps in the Wyner-Ziv problem between nodes 2 and 1, through an enlargement of the optimization region permitted by the Markov chain (162).

Proof: The direct part of the proof follows from Theorem 1 by choosing:

U_{3→12,l} = U_{1→3,l} = U_{1→2,l} = U_{2→1,l} = U_{3→1,l} = U_{3→2,l} = ∅, ∀ l,
U_{1→23} ≡ U_{1→23,1} = X_1,  U_{2→3} ≡ U_{2→3,1} = X_2,
U_{1→23,l} = U_{2→13,l} = U_{2→3,l} = ∅, ∀ l > 1,

and with U_{2→13,1} ≡ U_{2→13} an auxiliary random variable that, according to Theorem 1, should satisfy:

U_{2→13} −− (X_1, X_2) −− X_3.   (168)

From the rate equations in Theorem 1 and the above choices for the auxiliary random variables we obtain:

R_{1→23} > H(X_1|X_2),   (169)
R_{2→13} > max{I(X_2; U_{2→13}|X_1), I(X_2; U_{2→13}|X_1 X_3)}   (170)
         = I(X_2; U_{2→13}|X_1),   (171)
R_{1→23} + R_{2→13} > H(X_1|X_3) + I(X_2; U_{2→13}|X_1 X_3),   (172)
R_{2→3} > H(X_2|U_{2→13} X_1 X_3).   (173)

Noticing that R_1 ≡ R_{1→23} and R_2 ≡ R_{2→13} + R_{2→3}, the rate-distortion region (34) reduces to the desired region in Theorem 4, where for simplicity we dropped the round index.

We now proceed to the proof of the converse. If a pair of rates (R_1, R_2) and distortions (D_{12}, D_{31}) are admissible for the K-steps interactive cooperative distributed source coding setting described in Fig.
4, then for all ε > 0 there exists n0 (ε, K), such that ∀ n > n0 (ε, K) there exists a K-steps interactive source code (n, K, F, G) with intermediate rates satisfying: K 1X log kJil k ≤ Ri + ε , i ∈ {1, 2} n l=1 November 7, 2018 (174) DRAFT 34 and with reconstruction constraints: h i n E d(X1n , X̂31 ) ≤ D31 + ε ,   n ≤ε, Pr X1n 6= X̂21 h i n E d(X2n , X̂12 ) ≤ D12 + ε ,   n ≤ε, Pr X2n 6= X̂32 where n X̂32 ≡ g32   [1:K] [1:K] J1 , J2 , X3n [1:K] n X̂31 ≡ g31 J1 [1:K] , J2 , X3n   , , n X̂12 ≡ g12 (175) (176) (177) (178)   [1:K] J2 , X1n [1:K] n X̂21 ≡ g21 J1 , X2n   , (179) . (180) For each t ∈ {1, . . . , n}, define random variables U2→13[t] as follows:   [1:K] [1:K] U2→13[t] , J1 , J2 , X1[1:t−1] , X1[t+1:n] , X3[1:t−1] . (181) Using point 6) in Lemma 10 we can see that this choice satisfies (162). By the conditions (176) and (178), and Fano’s inequality, we have     n n n log2 (kX1n k − 1) + H2 Pr X1n 6= X̂21 , nǫn , H(X1n |X̂21 ) ≤ Pr X1n 6= X̂21     n n n 6 X̂32 , nǫn , log2 (kX2n k − 1) + H2 Pr X2n = H(X2n |X̂32 ) ≤ Pr X2n 6= X̂32 (182) (183) where ǫn (ε) → 0 provided that ε → 0 and n → ∞. 1) Rate at node 1: For the first rate, from cut-set arguments similar to the ones used in Theorem 3 and Fano inequality, we can easily obtain: n(R1 + ε) ≥ n [H(X1 |X2 ) − ǫn ] November 7, 2018 (184) DRAFT 35 2) Rate at node 2: For the second rate, we have   [1:K] n(R2 + ε) ≥ H J2   (a) [1:K] = I J2 ; X1n X2n X3n   (b) [1:K] ≥ I J2 ; X2n X3n |X1n   (c) [1:K] [1:K] = I J1 J2 ; X2n X3n |X1n     [1:K] [1:K] [1:K] [1:K] n n n n n = I J1 J2 ; X3 |X1 + I J1 J2 ; X2 |X1 X3 (185) (186) (187) (188) (189) n h   X [1:K] [1:K] n = I J1 J2 ; X3[t] |X1 , X3[1:t−1] (d) t=1 (e) = i  [1:K] [1:K] +I J1 J2 ; X2[t] |X1n X3n X2[1:t−1] n h   X [1:K] [1:K] I J1 J2 X1[1:t−1] X1[t+1:n] X3[1:t−1] ; X3[t] |X1[t] (190) t=1 +I (f ) = =  [1:K] [1:K] J1 J2 X1[1:t−1] X1[t+1:n] X3[1:t−1] X3[t+1:n] X2[1:t−1] ; X2[t] |X1[t] X3[t] n h X t=1 n h X t=1 (g) = n h X t=1 (h) ≥ (i) = n h X t=1 n h X t=1 where November 7, 2018 i (191)   I U2→13[t] ; X3[t] |X1[t] + I U2→13[t] X3[t+1:n] X2[1:t−1] ; X2[t] |X1[t] X3[t] I U2→13[t] ; X2[t] X3[t] |X1[t]  +I X3[t+1:n] X2[1:t−1] ; X2[t] |X1[t] X3[t] U2→13[t]   I U2→13[t] ; X2[t] |X1[t] + H X2[t] |X1[t] X3[t] U2→13[t] −H X2[t] |X1[t] X3[t:n] U2→13[t] X2[1:t−1] i i   I U2→13[t] ; X2[t] |X1[t] + H X2[t] |X1[t] X3[t] U2→13[t] − ǫn I U2→13[Q] ; X2[Q] |X1[Q] , Q = t  i  +H X2[Q] |X1[Q] X3[Q] U2→13[Q] , Q = t − ǫn i    (j) h  e2→13 − ǫn , e2→13 ; X2 |X1 + H X2 |X1 X3 U ≥n I U i (192) (193) (194) (195) (196) DRAFT 36 [1:K] is a function of the sources (X1n , X2n ), • step (a) follows from the fact that J2 • step (b) follows from the non-negativity of mutual information, • step (c) follows from the fact that J1 • step (d) follows from the chain rule for conditional mutual information, • step (e) follows from the memoryless property across time of the sources (X1n , X2n , X3n ), • step (f ) follows from the definition (181) • step (g) follows from the Markov chain U2→13[t] −−(X1[t] , X2[t] )−−X3[t] , for all t ∈ {1, . . . , n} • and the usual decomposition of mutual information,   step (h) follows from the fact that Pr X2[t] 6= X̂32[t] ≤ ǫ ∀t ∈ {1, . . . , n} , X̂32 [t] ≡ [2:K] [1:K] is a function of J2 and the source X1n , g32[t] (U2→13[t] , X3[t:n] ) and Fano inequality. • step (i) follows from the use of a time sharing random variable Q uniformly distributed over the set {1, . . . , n}, • e2→13 , (U2→13[Q] , Q). 
step (j) follows by defining new random variables U 3) Sum-rate of nodes 1 and 2: From cut-set arguments and Fano inequality, we can easily obtain: n(R1 + R2 + 2ε) ≥ n [H(X1 |X2 X3 ) − ǫn ]  [1:K] n ≡ g12 J2 4) Distortion at node 1: Node 1 reconstructs a lossy X̂12  (197) , X1n . For each t ∈ {1, . . . , n}, define a function X̂12[t] as the t-th coordinate of this estimate:    [1:K] n X̂12[t] U2→13[t] , X1[t] , g12[t] J2 , X1 . (198) The component-wise mean distortion thus verifies h  i [1:K] D12 + ε ≥ E d X2n , g12 J2 , X1n (199) i 1X h  E d X2[t] , X̂12[t] U2→13[t] , X1[t] n t=1 n =  1X h  E d X2[Q] , X̂12[Q] U2→13[Q] , X1[Q] n t=1 h  i = E d X2[Q] , X̂12[Q] U2→13[Q] , X1[Q] i  h  e2→13 , X1 e12 U , = E d X2 , X (200) n = Q=t i e12 by where we defined function X     e12 Q, U2→13[Q] , X1[Q] , X̂12[Q] U2→13[Q] , X1[Q] . e2→13 , X1 = X e12 U X November 7, 2018 (201) (202) (203) (204) DRAFT 37   [1:K] [1:K] n 5) Distortion at node 3: Node 3 reconstructs a lossy description X̂31 ≡ g31 J1 , J2 , X3n . For each t ∈ {1, . . . , n}, define a function X̂31[t] as:     [1:K] [1:K] n X̄31[t] U2→13[t] , X̂32[t] , X3[t] , X3[t+1:n] , g31[t] J1 , J2 , X3 t ∈ {1, . . . , n}. (205)   [1:K] [1:K] n This can be done because X̂32[t] is also a function of J1 , J2 , X3 . The component-wise mean distortion thus verifies h  i [1:K] [1:K] D32 + ε ≥ E d X1n , g31 J1 , J2 , X3n (206) i  1X h  E d X1[t] , X̄31[t] U2→13[t] , X̂32[t] , X3[t] , X3[t+1:n] = n t=1 n n  1X  = E d X1[t] , X̄31[t] U2→13[t] , X2[t] , X3[t] , X3[t+1:n] n t=1 (208) ≥ (209) (a) (b) i 1X h  E d X1[t] , X̂31[t] U2→13[t] , X2[t] , X3[t] n t=1 n  1X h  E d X1[Q] , X̂31[Q] U2→13[Q] , X2[Q] , X3[Q] n t=1 h  i = E d X1[Q] , X̂32[Q] U2→13[Q] , X2[Q] , X3[Q] i  h  e e , = E d X1 , X32 U2→13 , X2 , X3 n = where • • (207)  Q=t i (210) (211) (212)  step (a) follows from the fact that X̄31[t] U2→13[t] , X̂32[t] , X3[t] , X3[t+1:n] can be trivially  expressed as a function of U2→13[t] , X2[t] , X3[t] , X3[t+1:n] as follows:   if X2[t] = X̂32[t]   X̄31[t] U2→13[t] X2[t] X3[t] X3[t+1:n]   X̄31[t] U2→13[t] X2[t] X3[t] X3[t+1:n] =  X̄31[t] U2→13[t] X̂32[t] X3[t] X3[t+1:n] if X2[t] 6= X̂32[t] step (b) follows from the fact that X1[t] −− (U2→13[t] X2[t] X3[t] ) −− X3[t+1:n] , which implies  that for all t ∈ {1, . . . , n} exists X̂31[t] U2→13[t] , X2[t] , X3[t] such that h   i  E d X1[t] , X̂31[t] U2→13[t] X2[t] X3[t] ≤ E d X1[t] , X̄31[t] U2→13[t] X2[t] X3[t] X3[t+1:n] e32 by We also defined function X    e32 Q, U2→13[Q] , X2[Q] , X3[Q] e2→13 , X2 , X3 = X e32 U X  , X̂32[Q] U2→13[Q] , X2[Q] , X3[Q] . (213) This concludes the proof of the converse and thus that of the theorem. November 7, 2018 DRAFT 38 D. Two encoders and three decoders subject to lossy reconstruction constraints with degraded side information Consider now the problem described in Fig. 5 where encoder 1 has access to X1 and X3 and wishes to communicate a lossy description of X1 to nodes 2 and 3 with distortion constraints D21 and D31 , while encoder 2 wishes to send a lossy description of its source X2n to nodes 1 and 3 with distortion constraints D12 and D32 . In addition to this, the encoders overhead the communication using K communication rounds. This problem can be seen as a generalization of the settings previously investigated in [19]. This setup is motivated by the following application. Consider that node 1 transmits a probing signal X3 which is used to explore a spatial region (i.e. a radar transmitter). 
After transmission of this probing signal, node 1 measures the response (X1 ) at its location. Similarly, in a different location node 2 measures the response X2 . Responses X1 and X2 have to be sent to node 3 (e.g. the fusion center) which has knowledge of the probing signal X3 and wants to reconstruct a lossy estimate of them. Nodes 1 and 2 cooperate through multiple rounds to accomplish this task. n (X̂12 , D12 ) (X1n , X3n ) Node 1 R1 R1 n (X̂32 , D32 ) Node 3 R2 n , D31 ) (X̂31 R2 X2n Node 2 X3n n , D21 ) (X̂21 Figure 5: Two encoders and three decoders subject to lossy reconstruction constraints with degraded side information. Theorem 5: The rate-distortion region of the setting described in Fig. 5 where X1 −−X3 −−X2 form a Markov chain is given by the union over all joint probability measures pX1 X2 X3 W[1,K+1] U1→3,K November 7, 2018 DRAFT 39 satisfying the following Markov chains: U1→23,l −− (X1 , X3 , W[1,l] ) −− X2 , (214) U2→13,l −− (X2 , W[2,l] ) −− (X1 , X3 ) , (215) U1→3,K −− (X1 , X3 , W[2,K] ) −− X2 , (216) for all l = [1 : K], and such that there exist reconstruction mappings: h i  g12 X1 , X3 , U1→3,K , W[1,K+1] = X̂12 with E d(X2 , X̂12 ) ≤ D12 h i  g21 X2 , W[1,K+1] = X̂21 with E d(X1 , X̂21 ) ≤ D21 h i  g31 X3 , W[1,K+1] , U1→3,K = X̂31 with E d(X1 , X̂31 ) ≤ D31 h i  g32 X3 , W[1,K+1] , U1→3,K = X̂32 with E d(X2 , X̂32 ) ≤ D32 , , , , 9 with W[1,l] = {U1→23,l , U2→13,l }l−1 k=1 for all l = [1 : K], of the set of all tuples satisfying: R1 ≥ I(W[1,K+1] ; X1 X3 |X2 ) + I(U1→3,K ; X1 |W[1,K+1] X3 ) , (217) R2 ≥ I(W[1,K+1] ; X2 |X3 ) . (218) The auxiliary random variables have cardinality bounds: kU1→23,l k ≤ kX1 kkX3 k l−1 Y i=1 kU2→13,l k ≤ kX2 kkU1→23,l k kU1→23,i kkU2→13,i k + 1, l ∈ [1 : K] l−1 Y i=1 kU1→3,l k ≤ kX1 kkX3 k kU1→23,i kkU2→13,i k + 1, l ∈ [1 : K] K Y i=1 kU1→23,i kkU2→13,i k + 3. (219) (220) (221) Remark 14: Notice that multiple rounds are needed to achieve the rate-distortion region in Theorem 5. It is worth to mention that first encoders 1 and 2 cooperate over the K rounds while on the last round only node 1 send a private description to node 3. Because of the Markov chain assumed for the sources we observe the following: • Only node 1 send a private description to node 3. This is due to the fact that node 3 has better side information than 2. 9 Notice that U3→12,l = ∅ for all l because R3 = 0. November 7, 2018 DRAFT 40 • For the transmissions from node 2, both node 1 and 3 can be thought as an unique node and there is not reason for node 2 to send a private description to node 1 or node 3. • Notice that the there is not sum-rate. Node 3 recovers the descriptions generated at nodes 1 and 2 without resorting to joint-decoding. That is, node 3 can recover the descriptions generated at nodes 1 and 2 separately and independently. Proof: The direct part of the proof follows by choosing: U1→2,l =U2→1,l = U3→12,l = U3→1,l = U3→2,l = U2→3,l = ∅, ∀l U1→3,l =∅ l < K . and U1→23,l and U2→13,l and U1→3,K are auxiliary random variables that according to Theorem 1 should satisfy the Markov chains (214)-(216). Cumbersome but straightforward calculations allows to obtain the desired results. We now proceed to the proof of the converse. If a pair of rates (R1 , R2 ) and distortions (D12 , D21 , D31 , D32 ) are admissible for the K-steps interactive cooperative distributed source coding setting described in Fig. 
5, then for all ε > 0 there exists n0 (ε, K), such that ∀ n > n0 (ε, K) there exists a K-steps interactive source code (n, K, F, G) with intermediate rates satisfying: K 1X log kJil k ≤ Ri + ε , i ∈ {1, 2} n l=1 and with average per-letter distortions h i n E d(X1n , X̂21 ) ≤ D21 + ε h i n n E d(X1 , X̂31 ) ≤ D31 + ε h i n E d(X2n , X̂12 ) ≤ D12 + ε h i n E d(X2n , X̂32 ) ≤ D32 + ε (222) , (223) , (224) , (225) , (226) where   [1:K] [1:K] n X̂32 ≡ g32 J1 , J2 , X3n ,   [1:K] [1:K] n X̂31 ≡ g31 J1 , J2 , X3n , November 7, 2018   [1:K] [1:K] n X̂12 ≡ g12 J1 , J2 , X1n , X3n ,   [1:K] [1:K] n X̂21 ≡ g21 J1 , J2 , X2n . (227) (228) DRAFT 41 For each t ∈ {1, . . . , n}, define random variables (U1→3,[t] , U2→3,[t] ) and the sequences of random variables (U1→23,k,[t] , U2→13,k,[t] )k=[1:K] as follows: U1→23,1,[t] , J11 , X3[1:t−1] , X2[t+1:n] U2→13,1,[t] , J21 ,  , (229) (230) U1→23,k,[t] , J1k , ∀ k = [2 : K] , (231) U2→13,k,[t] , J2k , ∀ k = [2 : K] , (232) U1→3,K,[t] , X3[t+1:n] , (233) From Corollary 4 in the Appendices we see that these choices satisfy equations (214), (215) and (216). 1) Rate at node 1: For the first rate, we have   [1:K] n(R1 + ε) ≥ H J1   (a) [1:K] = I J1 ; X1n X2n X3n  (b)  [1:K] ≥ I J1 ; X1n X3n |X2n   (c) [1:K] [1:K] = I J1 J2 ; X1n X3n |X2n     [1:K] [1:K] [1:K] [1:K] n n n n n = I J1 J2 ; X1 |X2 X3 + I J1 J2 ; X3 |X2 (234) (235) (236) (237) (238) n   X [1:K] [1:K] = I J1 J2 X2[1:t−1] X2[t+1:n] X3[1:t−1] X3[t+1:n] X1[1:t−1] ; X1[t] |X2[t] X3[t] (d) + (e) ≥ + t=1 n X t=1 n X t=1 n X t=1 November 7, 2018 I I   [1:K] [1:K] J1 J2 X3[1:t−1] X2[1:t−1] X2[t+1:n] ; X3[t] |X2[t]  [1:K] [1:K] J1 J2 X2[t+1:n] X3[1:t−1] X3[t+1:n] X1[1:t−1] ; X1[t] |X2[t] X3[t]   [1:K] [1:K] I J1 J2 X3[1:t−1] X2[t+1:n] ; X3[t] |X2[t] (239)  (240) DRAFT 42 = n   X [1:K] [1:K] I J1 J2 X2[t+1:n] X3[1:t−1] ; X1[t] |X2[t] X3[t] t=1 n h   X [1:K] [1:K] + I X3[t+1:n] X1[1:t−1] ; X1[t] |J1 J2 X2[t:n] X3[1:t] t=1 +I  [1:K] [1:K] J1 J2 X3[1:t−1] X2[t+1:n] ; X3[t] |X2[t] n   X [1:K] [1:K] = I J1 J2 X2[t+1:n] X3[1:t−1] ; X1[t] X3[t] |X2[t] + (f ) = + (g) ≥ + (h) = + = t=1 n X t=1 n X t=1 n X t=1 n X t=1 n X t=1 n X t=1 n X t=1 n X t=1 + n X t=1 i   [1:K] [1:K] I X3[t+1:n] X1[1:t−1] ; X1[t] |J1 J2 X2[t:n] X3[1:t] I I I I     [1:K] [1:K] J1 J2 X2[t+1:n] X3[1:t−1] ; X1[t] X3[t] |X2[t] (242)  [1:K] [1:K] X3[t+1:n] X1[1:t−1] ; X1[t] X2[t] |J1 J2 X2[t+1:n] X3[1:t] [1:K] [1:K] J1 J2 X2[t+1:n] X3[1:t−1] ; X1[t] X3[t] |X2[t] [1:K] [1:K] X3[t+1:n] ; X1[t] |J1 J2 X2[t+1:n] X3[1:t] I U1→23,[1:K],[t] U2→13,[1:K],[t] ; X1[t] X3[t] |X2[t]    I U1→3,K,[t] ; X1[t] |U1→23,[1:K],[t] U2→13,[1:K],[t] X3[t] (241)  (243) (244)  (245) I U1→23,[1:K],[Q] U2→13,[1:K],[Q] ; X1[Q] X3[Q] |X2[Q] , Q = t  I U1→3,K,[Q] ; X1[Q] |U1→23,[1:K],[Q] U2→13,[1:K],[Q] X3[Q] , Q = t  (246) i   h  e1→3,K ; X1 |U e1→23,[1:K] U e2→13,[1:K] X3 e1→23,[1:K] U e2→13,[1:K] ; X1 X3 |X2 +I U =n I U h    i f e f (247) = n I W[1,K+1] ; X1 X3 |X2 +I U1→3,K ; X1 |W[1,K+1] X3 (i) where [1:K] is a function of the sources (X1n , X2n , X3n ), • step (a) follows from the fact that J1 • step (b) follows from the non-negativity of mutual information, • step (c) follows from the fact that J2 November 7, 2018 [1:K] [1:K] is a function of J1 and the source X2n , DRAFT 43 • step (d) follows from the chain rule for conditional mutual information and the memoryless property across time of the sources (X1n , X2n , X3n ), • step (e) follows from the non-negativity of mutual information, • step (f ) follows from the Markov chain X2[t] − −(J1 [1:K] [1:K] J2 X2[t+1:n] X3[1:t] )− 
−(X3[t+1:n] X1[1:t−1] ) (Corollary 4 in the appendices), for all t = [1 : n] which follows from X1 −− X3 −− X2 ., • step (g) follows from the non-negativity of mutual information, • step (h) follows from defintions (233). • step (i) follows from the standard time-sharing arguments and the definition of new random e1→23,[1:K] , (U1→23,[1:K],[Q] , Q) and X1 , (X1[Q] , Q)). variables, (i.e. U and the last step follows from the definition of the past shared common descriptions W[1,l] ∀l. It e1→23,l , U e2→13,l ) satisfies the Markov chains in (214)-(216) for is also immediate to show that (U all l ∈ [1 : K]. 2) Rate at node 2: For the second rate, by following the same steps as before we have   [1:K] (248) n(R2 + ε) ≥ H J2   (a) [1:K] (249) = I J2 ; X1n X2n X3n   (b) [1:K] n n n (250) ≥ I J2 ; X2 |X1 X3   (c) [1:K] [1:K] (251) = I J1 J2 ; X2n |X1n X3n   (d) [1:K] [1:K] (252) = I J1 J2 X1n ; X2n |X3n  (e)  [1:K] [1:K] (253) ≥ I J1 J2 ; X2n |X3n n   X [1:K] [1:K] = I J1 J2 X3[1:t−1] X3[t+1:n] X2[t+1:n] ; X2[t] |X3[t] (f ) (g) ≥ (h) = t=1 n X t=1 n X t=1 where •   [1:K] [1:K] I J1 J2 X3[1:t−1] X2[t+1:n] ; X2[t] |X3[t] (255) I U1→23,[1:K],[t] U2→13,[1:K],[t] ; X2[t] |X3[t] (256)   f[1,K+1] ; X2 |X3 = nI W (i) [1:K] step (a) follows from the fact that J2 November 7, 2018 (254)  (257) is a function of the sources (X1n , X2n , X3n ), DRAFT 44 • step (b) follows from the non-negativity of mutual information, • step (c) follows from the fact that J1 • step (d) follows from the Markov chain X1 −− X3 −− X2 . • step (e) follows from the non-negativity of mutual information, • step (f ) follows from the chain rule for conditional mutual information and the memoryless [1:K] [1:K] is a function of J2 and the source (X1n , X3n ), property across time of the sources (X1n , X2n , X3n ), • step (g) follows from the non-negativity of mutual information, • step (h) follows from definitions (233). • step (i) follows the definition for W[1,l] ∀l and from standard time-sharing arguments similar to the ones for rate at node 1.   [1:K] [1:K] n 3) Distortion at nodes 1 and 2: Node 1 reconstructs an estimate X̂12 ≡ g12 J1 , J2 , X1n , X3n   [1:K] [1:K] n while node 2 reconstructs X̂21 ≡ g21 J1 , J2 , X2n . For each t ∈ {1, . . . , n}, define n functions X̂12[t] and X̂21[t] as being the t-th coordinate of the corresponding estimates of X̂12 n and X̂21 , respectively:    [1:K] [1:K] , X̂12[t] W[1,K+1],[t] , U1→3,K,[t] , X1[t] , X3[t] , X1[1:t−1] , X1[t+1:n] , g12[t] J1 , J2 , X1n (258)    [1:K] [1:K] . X̂21[t] W[1,K+1],[t] , X2[t] , g21[t] J1 , J2 , X2n (259) The component-wise mean distortions thus verify h  i [1:K] [1:K] n D12 + ε ≥ E d X2 , g12 J1 , J2 , X1 (260) i 1X h  (261) E d X2[t] , X̂12[t] W[1,K+1],[t] , U1→3,K,[t] , X1[t] , X3[t] , X1[1:t−1] , X1[t+1:n] = n t=1 (a) n i 1X h  ∗ ≥ E d X2[t] , X̂12[t] W[1,K+1],[t] , U1→3,K,[t] , X1[t] , X3[t] n t=1 (b) n  1X h  ∗ = E d X2[Q] , X̂12[Q] W[1,K+1],[Q] , U1→3,K,[Q] , X1[Q] , X3[Q] n t=1 h  i ∗ W[1,K+1],[Q] , U1→3,K,[Q] X1[Q] , X3[Q] = E d X2[Q] , X̂12[Q] i  h  (c) f[1,K+1] , U e1→3,K , X1 , X3 e12 W , = E d X2 , X (262) n where • Q=t i (263) (264) (265) step (a) follows from (259), November 7, 2018 DRAFT 45 • [1:K] [1:K] step (b) follows from Markov chain X2[t] −− X1[t] , X3[t] , J1 , J2 , X3[1:t−1] , X3[t+1:n] ,   X2[t+1:n] −− X1[1:t−1] , X1[t+1:n] ∀ t = [1 : n] (which can be obtained from Corollary 4 in the appendices) and Lemma 9. 
• step (c) follows from the following relations:    e12 Q, W[1,K+1],[Q] , U1→3,K,[Q] , X1[Q] , X3[Q] f[1,K+1] , U e1→3,K , X1 , X3 = X e12 W X  ∗ W[1,K+1],[Q] , U1→3,K,[Q] , X1[Q] , X3[Q] . , X̂12[Q] By following the very same steps, we can also show that: h  i [1:K] [1:K] D21 + ε ≥ E d X1 , g21 J1 , J2 , X2n i  h  f[1,K+1] , X2 e21 W , = E d X1 , X [1:K] and where we used the Markov chain X2[1:t−1] −− X2[t] , J1 [1:K] , J2 (266) (267)  , X3[1:t−1] , X2[t+1:n] −−X1[t] ∀ t = [1 : n] (which can be obtained from Corollary 4 in the appendices) and Lemma 9 and e21 : where we define the function X       ∗ f f e f e X21 W[1,K+1] , X2 = X21 Q, W[1,K+1],[Q] , X2[Q] , X̂21[Q] W[1,K+1],[Q] , U2→13[Q] , X2[Q] .   [1:K] [1:K] n 4) Distortions at node 3: Node 3 compute lossy reconstructions X̂31 ≡ g31 J1 , J2 , X3n   [1:K] [1:K] n and X̂32 ≡ g32 J1 , J2 , X3n . For each t ∈ {1, . . . , n}, define functions X̂31[t] and X̂32[t] n n as being the t-th coordinate of the corresponding estimates of X̂31 and X̂32 , respectively:    [1:K] [1:K] n (268) X̂31[t] W[1,K+1],[t] , U1→3,K,[t] , X3[t] , g31[t] J1 , J2 , X3 ,    [1:K] [1:K] (269) X̂32[t] W[1,K+1],[t] , U1→3,K,[t] , X3[t] , g32[t] J1 , J2 , X3n . The component-wise mean distortions thus verify h  i [1:K] [1:K] n D31 + ε ≥ E d X1 , g31 J1 , J2 , X3 (270) i 1X h  = E d X1[t] , X̂31[t] W[1,K+1],[t] , U1→3,K,[t] , X3[t] n t=1 n  1X h  E d X1[Q] , X̂31[Q] W[1,K+1],[Q] , U1→3,K,[Q] , X3[Q] = n t=1 h  i = E d X1[Q] , X̂31[Q] W[1,K+1],[Q] , U1→3,K,[Q] , X3[Q] i  h  f[1,K+1] , U e1→3,K , X3 e31 W , = E d X1 , X (271) n November 7, 2018 Q=t i (272) (273) (274) DRAFT 46 e31 by where the last step follows by defining the function X    e31 Q, W[1,K+1],[Q] , U1→3,K,[Q] , X3[Q] f[1,K+1] , U e1→3,K , X3 = X e31 W X  , X̂31[Q] W[1,K+1],[Q] , U1→3,K,[Q] , X3[Q] . By following the very same steps, we can also show that: h  i [1:K] [1:K] D32 + ε ≥ E d X2 , g32 J1 , J2 , X3n i  h  f e e , = E d X2 , X32 W[1,K+1] , U1→3,K , X3 (275) (276) e32 by and where we define the function X    e32 Q, W[1,K+1],[Q] , U1→3,K,[Q] , X3[Q] f[1,K+1] , U e1→3,K , X3 = X e32 W X  , X̂32[Q] W[1,K+1],[Q] , U1→3,K,[Q] , X3[Q] . Cooperative distributed source coding with two distortion criteria under reversal delivery and side information This concludes the proof of the converse and thus that of the theorem. E. Three encoders and three decoders subject to lossless/lossy reconstruction constraints with degraded side information Consider now the problem described in Fig. 6 where encoder 1 wishes to communicate the lossless the source X1n to nodes 2 and 3 while encoder 2 wishes to send a lossy description of its source X2n to node 3 with distortion constraints D32 and encoder 3 wishes to send a lossy description of its source X3n to node 2 with distortion constraints D23 . In addition to this, the encoders perfom the communication using K communication rounds. This problem can be seen as a generalization of the settings previously investigated in [10]. This setting can model a problem in which node 1 generate a process X1 . This process physically propagates to the locations where nodes 2 and 3 are. These nodes measure X2 and X3 respectively. If node 2 is closer to node 1 than node 3, we can assume that X1 −− X2 −− X3 . Nodes 2 and 3 then interact between them and with node 1, in order to reconstruct X1 in lossless fashion and X2 and X3 with some distortion level. 
November 7, 2018 DRAFT 47 n n (X̂23 , D23 ) X̂21 ≈ X1n X2n Node 2 R1 R2 R2 Node 1 R3 R3 X3n Node 3 X1n R1 n n (X̂32 , D32 ) X̂31 ≈ X1n Figure 6: Three encoders and three decoders subject to lossless/lossy reconstruction constraints with degraded side information. Theorem 6: The rate-distortion region of the setting described in Fig. 6 where X1 −−X2 −−X3 form a Markov chain is given by the union over all joint probability measures pX1 X2 X3 U3→2,[1:K] U2→3,[1:K] satisfying the Markov chains U2→3,l −− (X1 , X2 , V[23,l,2] ) −− X3 , (277) U3→2,l −− (X1 , X3 , V[23,l,3] ) −− X2 , (278) ∀l ∈ [1 : K], and such that there exist reconstruction mappings: h i  g23 X1 , X2 , V[23,K+1,2] = X̂23 with E d(X3 , X̂23 ) ≤ D23 , h i  g32 X1 , X3 , V[23,K+1,2] = X̂32 with E d(X2 , X̂32 ) ≤ D32 , of the set of all tuples satisfying: R1 ≥ H(X1 |X2 ) , (279) R2 ≥ I(V[23,K+1,2] ; X2 |X1 X3 ) , (280) R3 ≥ I(V[23,K+1,2] ; X3 |X1 X2 ) , (281) R1 + R2 ≥ H(X1 |X3 ) + I(V[23,K+1,2] ; X2 |X1 X3 ) , November 7, 2018 (282) DRAFT 48 The auxiliary random variables have cardinality bounds: kU2→3,l k ≤ kX1 kkX2 k l−1 Y i=1 kU2→3,i kkU3→2,i k + 1, l ∈ [1 : K] kU3→2,l k ≤ kX1 kkX3 kkU2→3,l k l−1 Y i=1 kU2→3,i kkU3→2,i k + 1, l ∈ [1 : K] (283) (284) Remark 15: Theorem 6 shows that several exchanges between nodes 2 and 3 can be helpful. Node 1 transmit only once at the beginning its full source. Proof: The direct part of the proof follows according to Theorem 1 by choosing: U1→3,l = U1→2,l = U3→1,l = U2→1,l = U3→12,l = ∅ U1→23,l = U2→13,l = ∅, ∀l ∈ [1 : K]: ∀l ∈ [2 : K] and U1→23,1 = U2→13,1 = X1 . The remanding auxiliary random variables satisfy ∀l ∈ [1 : K]: U2→3,l −− (X1 , X2 , V[23,l,2] ) −− X3 , (285) U3→2,l −− (X1 , X3 , V[23,l,3] ) −− X2 . (286) If a pair of tuple (R1 , R2 , R3 ) and distortions (D23 , D32 ) are admissible for the K-steps interactive cooperative distributed source coding setting described in Fig. 6, then for all ε > 0 there exists n0 (ε, K), such that ∀ n > n0 (ε, K) there exists a K-steps interactive source code (n, K, F, G) with intermediate rates satisfying: K 1X log kJil k ≤ Ri + ε , i ∈ {1, 2, 3} n l=1 (287) and with average per-letter distortions with respect to the source 2 and perfect reconstruction with respect to the source 1 at all nodes: h i n E d(X2n , X̂32 ) ≤ D32 + ε ,   n n Pr X1 6= X̂21 ≤ ε , h i n E d(X3n , X̂23 ) ≤ D23 + ε ,   n ≤ε, Pr X1n 6= X̂31 November 7, 2018 (288) (289) (290) (291) DRAFT 49 where   [1:K] [1:K] [1:K] n X̂32 ≡ g32 J1 , J2 , J3 , X3n ,   [1:K] [1:K] [1:K] n X̂31 ≡ g31 J1 , J2 , J3 , X3n ,   [1:K] [1:K] [1:K] n X̂23 ≡ g23 J1 , J2 , J3 , X2n , (292)   [1:K] [1:K] [1:K] n X̂21 ≡ g21 J1 , J2 , J3 , X2n . (293) For each t ∈ {1, . . . , n} and l ∈ [1 : K], we define random variables U2→3,l,[t] and U3→2,l,[t] as follows:  U2→3,1,[t] , J11 , J21 , X1[1:t−1] , X1[t+1:n] , X2[t+1:n] , X3[1:t−1] ,  U2→3,l,[t] , J1l , J2l , l ∈ [2 : K] U3→2,l,[t] , J3l , l ∈ [1 : K] . (294) (295) (296) These auxiliary random variables satisfy the Markov conditions (285) and (286), which can be verified from Lemma 11 in the appendices. By the conditions (291) and (289), and Fano’s inequality [21], we have     n n n , nǫn , log2 (kX1n k − 1) + H2 Pr X1n 6= X̂31 H(X1n |X̂31 ) ≤ Pr X1n = 6 X̂31     n n n , nǫn , 6 X̂21 log2 (kX1n k − 1) + H2 Pr X1n = H(X1n |X̂21 ) ≤ Pr X1n = 6 X̂21 (297) (298) where ǫn (ε) → 0 provided that ε → 0 and n → ∞. 
1) Rate at node 1: For the first rate, we have   [1:K] n(R1 + ε) ≥ H J1   [1:K] ≥ H J1 |X2n X3n   (a) [1:K] [1:K] [1:K] = H J1 J2 J3 |X2n X3n   (b) [1:K] [1:K] [1:K] = I J1 J2 J3 ; X1n |X2n X3n   [1:K] [1:K] [1:K] ≥ nH(X1 |X2 X3 ) − H X1n |X2n J1 J2 J3 (299) (300) (301) (302) (303) (c) n ≥ nH(X1 |X2 X3 ) − H(X1n |X̂21 ) (304) (d) ≥ n [H(X1 |X2 X3 ) − ǫn ] , (e) = n [H(X1 |X2 ) − ǫn ] , (305) (306) where November 7, 2018 DRAFT 50 • [1:K] step (a) follows from the fact that by definition of the code the sequence J2 [1:K] functions of (J1 • , X2n , X3n ), [1:K] step (b) follows from the fact that by definition of the code the sequences (J1 are functions of the sources(X1n , X2n , X3n ), • [1:K] , J3 [1:K] , J2 are [1:K] , J3 step (c) follows from the code assumption in (293) that guarantees the existence of a   [1:K] [1:K] [1:K] n reconstruction function X̂21 ≡ g21 J1 , J2 , J3 , X2n , • step (d) follows from Fano’s inequality in (298), • step (e) follows from the assumption that X1 −− X2 −− X3 form a Markov chain. 2) Rate at nodes 2 and 3: For the second rate, we have   [1:K] n(R2 + ε) ≥ H J2   (a) [1:K] = I J2 ; X1n X2n X3n   (b) [1:K] ≥ I J2 ; X2n |X1n X3n   (c) [1:K] [1:K] [1:K] = I J1 J2 J3 ; X2n |X1n X3n (307) (308) (309) (310) n   X [1:K] [1:K] [1:K] = I J1 J2 J3 ; X2[t] |X1n X3n X2[t+1:n] (d) (311) t=1 (e) ≥ (f ) = n X t=1 n X t=1  I V[23,K+1,2][Q] ; X2[Q] |X1[Q] X3[Q] , Q = t   e[23,K+1,2] ; X2 |X1 X3 , ≥ nI V (g) where I V[23,K+1,2][t] ; X2[t] |X1[t] X3[t] [1:K]  (313) (314) is a function of the sources (X1n , X2n , X3n ), • step (a) follows from the fact that J2 • step (b) follows from the non-negativity of mutual information, • step (c) follows from the fact that (J1 [1:K] (X1n , X2n , X3n ), (312) [1:K] , J2 [1:K] , J3 ) are functions of the sources • step (d) follows from the chain rule for conditional mutual information, • step (e) follows from the definitions (294), (295) and (296), the memoryless property of the sources and the non-negativity of mutual information, November 7, 2018 DRAFT ) 51 • step (f ) follows from the use of a time sharing random variable Q uniformly distributed over the set {1, . . . , n}, • e[23,K+1,2] , (V[23,K+1,2][Q] , Q). step (g) follows by letting a new random variable V By following similar steps, it is not difficult to check that n X  n(R3 + ε) ≥ I V[23,K+1,2][t] ; X3[t] |X1[t] X2[t] = t=1 n X t=1 I V[23,K+1,2][Q] ; X3[Q] |X1[Q] X2[Q] , Q = t   e[23,K+1,2] ; X3 |X1 X2 . 
≥ nI V (315)  3) Sum-rate of nodes 1 and 2: For the sum-rate, we have     [1:K] [1:K] n(R1 + R2 + 2ε) ≥ H J1 + H J2   [1:K] [1:K] ≥ H J1 J2   (a) [1:K] [1:K] = I J1 J2 ; X1n X3n X2n   (b) [1:K] [1:K] ≥ I J1 J2 ; X1n X2n |X3n     [1:K] [1:K] [1:K] [1:K] [1:K] n n n n n = I J1 J2 ; X1 |X3 + I J1 J2 J3 ; X2 |X1 X3   [1:K] [1:K] = H (X1n |X3n ) − H X1n |J1 J2 X3n   [1:K] [1:K] [1:K] +I J1 J2 J3 ; X2n |X1n X3n   (c) [1:K] [1:K] [1:K] n ≥ H (X1n |X3n ) − H(X1n |X̂31 ) + I J1 J2 J3 ; X2n |X1n X3n   (d) [1:K] [1:K] [1:K] n n n ≥ n [H (X1 |X3 ) − ǫn ] + I J1 J2 J3 ; X2 |X1 X3 (316) (317) (318) (319) (320) (321) (322) (323) (324) (325) n   X [1:K] [1:K] [1:K] ≥ I J1 J2 J3 X1[1:t−1] X1[t+1:n] X3[1:t−1] X2[t+1:n] ; X2[t] |X1[t] X3[t] (e) t=1 + n [H (X1 |X3 ) − ǫn ] (f ) = n [H (X1 |X3 ) − ǫn ] + November 7, 2018 t=1 I V[23,K+1,2][t] ; X2[t] |X1[t] X3[t]    = n H (X1 |X3 ) − ǫn + I V[23,K+1,2][Q] ; X2[Q] |X1[Q] X3[Q] , Q i h  (h) e[23,K+1,2] ; X2 |X1 X3 , = n H (X1 |X3 ) − ǫn + I V (g) where (326) n X (327) (328) (329) DRAFT 52 [1:K] [1:K] and J2 are functions of the sources (X1n , X2n , X3n ), • step (a) follows from the fact that J1 • step (b) follows non-negativity of mutual information, • step (c) follows from the code assumption in (293) that guarantees the existence of recon  [1:K] [1:K] [1:K] n n struction function X̂31 ≡ g31 J1 , J2 , J3 , X3 , • step (d) follows from Fano’s inequality in (291), • step (e) follows from the chain rule of conditional mutual information and the memoryless property across time of the sources (X1n , X2n , X3n ), and non-negativity of mutual information, • step (f ) from follows from the definitions (294) and (295), • step (g) follows from the use of a time sharing random variable Q uniformly distributed • over the set {1, . . . , n}, e[23,K+1,2] , (V e[23,K+1,2][Q] , Q). step (h) follows from V 4) Distortion at node 2: Node 2 reconstructs a lossy n X̂23 ≡ g23  [1:K] [1:K] [1:K] J1 , J2 , J3 , X2n  . For each t ∈ {1, . . . , n}, define a function X̂23[t] as beging the t-th coordinate of this estimate:    [1:K] [1:K] [1:K] n (330) X̂23[t] V[23,K+1,2][t] , X2[t] , g23[t] J1 , J2 , J3 , X2 . The component-wise mean distortion thus verifies h  i [1:K] [1:K] [1:K] D23 + ε ≥ E d X3 , g23 J1 , J2 , J3 , X2n (331) i 1X h  E d X3[t] , X̂23[t] V[23,K+1,2][t] , X2[t] = n t=1 n  1X h  = E d X3[Q] , X̂23[Q] V[23,K+1,2][Q] , X2[Q] n t=1 h  i = E d X3[Q] , X̂23[Q] V[23,K+1,2][Q] , X2[Q] i  h  e e , = E d X3 , X23 V[23,K+1,2] , X2 (332) n Q=t i (333) (334) (335) e23 by where we defined function X     e23 Q, V[23,K+1,2][Q] , X2[Q] , X̂23[Q] V[23,K+1,2][Q] , X2[Q] .(336) e e X23 V[23,K+1,2] , X2 = X  [1:K] [1:K] n 5) Distortion at node 3: Node 3 reconstructs a lossy description X̂32 ≡ g32 J1 , J2 ,  [1:K] J3 , X3n . For each t ∈ {1, . . . , n}, define a function X̂32[t] as beging the t-th coordinate of this estimate:    [1:K] [1:K] [1:K] X̂32[t] V[23,K+1,2][t] , X3[t] , g32[t] J1 , J2 , J3 , X3n . November 7, 2018 (337) DRAFT 53 The component-wise mean distortion thus verifies h  i [1:K] [1:K] [1:K] n D32 + ε ≥ E d X2 , g32 J1 , J2 , J3 , X3 (338) i 1X h  E d X2[t] , X̂32[t] V[23,K+1,2][t] , X3[t] = n t=1 n  1X h  E d X2[Q] , X̂32[Q] V[23,K+1,2][Q] , X3[Q] = n t=1 h  i = E d X2[Q] , X̂32[Q] V[23,K+1,2][Q] , X3[Q] i  h  e[23,K+1,2] , X3 e32 V , = E d X2 , X (339) n e32 by where we defined function X    e32 Q, V[23,K+1,3][Q] , X3[Q] e e X32 V[23,K+1,3] , X3 = X  , X̂32[Q] V[23,K+1,3][Q] , X3[Q] . 
Q=t i (340) (341) (342) (343) This concludes the proof of the converse and thus that of the theorem. VI. D ISCUSSION A. Numerical example In order to obtain further insight into the gains obtained from cooperation, we consider the case of two encoders and one decoder subject to lossy/lossless reconstruction constraints without side information in which the sources are distributed according to:   x22 1 exp − 2 pX1 X2 (x1 , x2 ) = α1 {x1 = 1} √ 2σ1 2πσ1   1 x22 + (1 − α)1 {x1 = 0} √ (344) exp − 2 . 2σ0 2πσ0 This model yields a mixed between discrete and continuous components. We observe that X1 follows a Bernoulli distribution with parameter α ∈ [0 : 1] while X2 given X1 follows a Gaussian distribution with different variance according to the value of X1 ∈ {0, 1}. In this sense, X2 follows a Gaussian mixture distribution10 . 10 Although the inner bound region in Theorem 1 is strictly valid for discrete sources with finite alphabets, the Gaussian distribution is sufficiently well-behaved to apply a uniform quantization procedure prior to the application of the results of Theorem 1. Then, a limiting argument using a sequence of decreasing quantization step-sizes will deliver the desired result. See chapter 3 in [22]. November 7, 2018 DRAFT 54 The optimal rate-distortion region for this case was characterized in Theorem 2 and can be alternatively written as: R(D) = [n p∈L (R1 , R2 ) : R1 > H(X1 |X2 ) , R2 > I(X2 ; U |X1 ) , R1 + R2 > H(X1 ) + I(X2 ; U |X1 ) where o ,  L = pU |X1 X2 : there exists (x1 , u) 7→ g(x1 , u) such that E[d(X2 , g(X1 , U ))] ≤ D . (345) (346) The corresponding non-cooperative region for the same problem was characterized in [5]: [ n R(D) = (R1 , R2 ) : R1 > H(X1 |U ) , (347) p∈L⋆ R2 > I(X2 ; U |X1 ) , R1 + R2 > H(X1 ) + I(X2 ; U |X1 ) where (348) o ,  L⋆ = pU |X2 : there exists (x1 , u) 7→ g(x1 , u) such that E[d(X2 , g(X1 , U ))] ≤ D . (349) (350) From the previous expressions, it is evident that the cooperative case offers some gains with respect to the non-cooperative setup. This is clearly evidenced from the lower limit in R1 and the fact that L⋆ ⊆ L. We have the following result. Theorem 7 (Cooperative region for mixed discrete/continuous source): Assume the source distribution is given by (344) and that, without loss of generality, σ02 ≤ σ12 . The rate-distortion region November 7, 2018 DRAFT 55 from Theorem 2 can be written as:   Z ∞ 1−α x22 R1 > √ exp − 2 H2 (g(x2 ))dx2 2σ0 2πσ0 −∞   Z ∞ x22 α exp − 2 H2 (g(x2 ))dx2 , +√ 2σ1 2πσ1 −∞ !  2(1−α) 2α  1 σ0 σ1   D ≤ σ02 log  2 D R2 > , +   2  α ασ  1   D > σ02 log 2 D − (1 − α)σ02 !  2(1−α) 2α  σ σ 1  0 1  H2 (α) + log D ≤ σ02  2 D R1 + R2 > , +   2  α ασ  1   H2 (α) + D > σ02 log 2 D − (1 − α)σ02 where H2 (z) ≡ −z log z − (1 − z) log (1 − z) for z ∈ [0, 1], [x]+ = max {0, x} and   x22 α √ exp − 2 2σ1 2πσ 1     . g(x2 ) = 2 α x2 x22 1−α √ exp − 2 + √ exp − 2 2σ1 2σ0 2πσ1 2πσ0 (351) Proof: The converse proof is straightforward by observing that when D ≤ σ02 : I(X2 ; U |X1 ) = h(X2 |X1 ) − h(X2 |U, X1 ) ≥ h(X2 |X1 ) − and h(X2 |X1 ) = α 2 1 log (2πeD) 2 (352) log (2πeσ12 ) + 1−α log (2πeσ02 ). For the case when σ02 < D ≤ ασ12 + (1 − α)σ02 2 we can write: I(X1 ; U |X1 ) = h(X2 |X1 ) − h(X2 |U, X1 ) ≥ h(X2 |X1 ) − αh(X2 |U, X1 = 1) − (1 − α)h(X2 |X1 = 0)   2πe(D − (1 − α)σ02 ) α 2 = log (2πeσ1 ) − α log 2 α   2 ασ1 α . = log 2 D − (1 − α)σ02 (353) (354) When D > ασ12 + (1 − α)σ02 we can lower bound the mutual information by zero. 
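Before turning to achievability, note that the bound just derived, together with the R_1 bound H(X_1|X_2) = E[H_2(g(X_2))] appearing in Theorem 7, is straightforward to evaluate numerically. The following is a minimal sketch in Python with logarithms in bits; the function names and the quadrature grid are our own illustrative choices and are not part of the statement of Theorem 7.

import numpy as np

def coop_I_X2U_given_X1(D, alpha, s02, s12):
    # Piecewise closed form of the bound on I(X2;U|X1) in Theorem 7
    # (s02 and s12 are the variances, s02 <= s12; logs in bits).
    if D <= s02:
        val = 0.5 * np.log2((s02 ** (1 - alpha)) * (s12 ** alpha) / D)
    elif D <= alpha * s12 + (1 - alpha) * s02:
        val = 0.5 * alpha * np.log2(alpha * s12 / (D - (1 - alpha) * s02))
    else:
        val = 0.0
    return max(val, 0.0)

def H_X1_given_X2(alpha, s02, s12, grid=np.linspace(-20.0, 20.0, 20001)):
    # R1 bound of Theorem 7: H(X1|X2) = E[H2(g(X2))], evaluated with a
    # simple trapezoidal quadrature over x2 (our own discretization choice).
    p0 = (1 - alpha) * np.exp(-grid**2 / (2 * s02)) / np.sqrt(2 * np.pi * s02)
    p1 = alpha * np.exp(-grid**2 / (2 * s12)) / np.sqrt(2 * np.pi * s12)
    g = np.clip(p1 / (p0 + p1), 1e-15, 1 - 1e-15)   # posterior Pr{X1 = 1 | X2 = x2}
    h2 = -g * np.log2(g) - (1 - g) * np.log2(1 - g)
    return np.trapz((p0 + p1) * h2, grid)

For D below σ_0^2 the first branch applies; the sum-rate bound of Theorem 7 is obtained by adding H_2(α) to the same expression.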
The achievability follows from the choice:   2  σ0  U if X1 = 0 2 σ02 +σZ  2 0 g(U, X1 ) = σ1  U if X1 = 1 , σ 2 +σ 2 1 November 7, 2018 (355) Z1 DRAFT 56 and by setting the auxiliary random variable:   X + Z if X = 0 , 2 0 1 U=  X2 + Z1 if X1 = 1 , (356) where Z0 , Z1 are zero-mean Gaussian random variables, independent from X2 and X1 and with variances given by: σZ2 0 = Dσ12 Dσ02 2 , σ , = Z1 σ02 − D σ12 − D (357) for D ≤ σ02 while for σ02 < D ≤ ασ12 + (1 − α)σ02 , we choose: σZ2 0 → ∞ , σZ2 1 = [D − (1 − α)σ02 ] σ12 . ασ12 − [D − (1 − α)σ02 ] (358) Finally, for D > ασ12 + (1 − α)σ02 , we let σZ2 0 → ∞ and σZ2 1 → ∞. Unfortunately, the non-cooperative region is hard to evaluate for the assumed source model11 . In order to present some comparison between the cooperative and non-cooperative case let us fix the same value for the rate R1 in both cases and compare the rate R2 that can be obtained in each case. Clearly, in this way, we are not taking into account the gain in R1 that could be obtained by the cooperative scheme (as H(X1 |X2 ) ≤ H(X1 |U ) for every U −− X2 −− X1 ). For both schemes, it follows that for fixed R1 : R2 > max {I(X2 ; U |X1 ), H(X1 ) + I(X2 ; U |X1 ) − R1 } . (359) From Theorem 7, we can compute (359) for the cooperative case. For the non-cooperative case we need to obtain a lower bound on I(X2 ; U |X1 ) for pU |X1 ∈ L⋆ . It is easy to check that:  2  2 α σ0 σ1 1−α + log , (360) log I(X2 ; U |X1 ) ≥ 2 β0 2 β1 where i |X1 = 0 , h i 2 β1 = EX2 U |X1 X2 − EX2|U X1 [X2 |U, X1 = 1] |X1 = 1 . β0 = EX2 U |X1 h X2 − EX2|U X1 [X2 |U, X1 = 0] 2 (361) (362) The distortion constraint imposes the condition: (1 − α)β0 + αβ1 ≤ D . 11 (363) However, there are cases where an exact characterization is possible. This is the case, for example, when X1 and X2 are the input and output of binary channel with crossover probability α and the distortion function is the Hamming distance [5]. November 7, 2018 DRAFT 57 In order to guarantee that (360) and (361) are achievable, under the Markov constraint U −− X2 −− X1 , the following conditions on pU |X2 (u|x2 ) should be satisfied:   x22 1 ) ( exp − 2 pU |X2 (u|x2 ) √ 2 1 (x2 − f0 (u)) 2σ0 2πσ0   =√ exp − 2 R∞ x 1 2β0 2πβ0 exp − 22 dx2 p (u|x2 ) √ −∞ U |X2 2σ0 2πσ0 and where   x22 1 ( ) exp − 2 pU |X2 (u|x2 ) √ (x2 − f1 (u))2 1 2σ1 2πσ1   exp − , =√ R∞ 1 x22 2β1 2πβ1 (u|x2 ) √ p exp − 2 dx2 −∞ U |X2 2σ1 2πσ1 f0 (U ) ≡ EX2|U X1 [X2 |U, X1 = 0] , f1 (U ) ≡ EX2|U X1 [X2 |U, X1 = 1] . (364) (365) (366) The characterization of all distributions pU |X2 (u|x2 ) that satisfies (365) and (366) appears to be a difficult problem. In order to show a numerical example, we shall simply assume that: ( ) (u − x2 )2 1 exp − . (367) pU |X2 (u|x2 ) = √ 2σw2 2πσw Indeed, this choice satisfies simultaneously expressions (365) and (366). In this way, we can calculate the corresponding values of β0 and β1 obtaining the parametrization of I(X2 ; U |X1 ) as function of σw2 : 1−α I(X2 ; U |X1 ) = log 2  σ02 + σw2 σw2  α + log 2  σ12 + σw2 σw2  (368) with the following constraint: (1 − α) σw2 σ12 σw2 σ02 + α =D . σ02 + σw2 σ12 + σw2 (369) We can replace (369) in (359) to obtain an indication of the performance of the non-cooperative case when R1 is fixed. We present now some numerical evaluations. As equation (359) is valid for both the cooperative and the non-cooperative setups, it is sufficient to compare the mutual information term I(X2 ; U |X1 ) for each of them. 
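To reproduce the comparison reported next (Fig. 7), one can solve the distortion constraint (369) for σ_w^2 and substitute into (368). A minimal sketch follows (Python, logarithms in bits, assuming SciPy is available); it reuses coop_I_X2U_given_X1 from the sketch above, and the bracketing interval and tolerance of the root finder are arbitrary illustrative choices.

import numpy as np
from scipy.optimize import brentq

def noncoop_I_X2U_given_X1(D, alpha, s02, s12):
    # Evaluate (368) for the Gaussian test channel (367): solve the
    # distortion constraint (369) for sigma_w^2, then substitute.
    dmax = (1 - alpha) * s02 + alpha * s12
    if D >= dmax:
        return 0.0   # sigma_w^2 -> infinity satisfies the constraint
    def excess(sw2):
        return ((1 - alpha) * sw2 * s02 / (s02 + sw2)
                + alpha * sw2 * s12 / (s12 + sw2) - D)
    sw2 = brentq(excess, 1e-12, 1e12)
    return (0.5 * (1 - alpha) * np.log2((s02 + sw2) / sw2)
            + 0.5 * alpha * np.log2((s12 + sw2) / sw2))

# Qualitative comparison for the second scenario considered below:
alpha, s02, s12 = 0.1, 0.5, 2.0
for D in np.linspace(0.1, 0.5, 9):
    print(round(D, 3),
          round(coop_I_X2U_given_X1(D, alpha, s02, s12), 3),
          round(noncoop_I_X2U_given_X1(D, alpha, s02, s12), 3))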
Let us consider the following two scenarios: 1) α = 0.1, σ_0^2 = 0.01, σ_1^2 = 2; and 2) α = 0.1, σ_0^2 = 0.5, σ_1^2 = 2.

Figure 7: Comparison between the cooperative and the non-cooperative schemes (I(X_2; U|X_1) versus the distortion D for the two scenarios above).

From Fig. 7 we see that in the case σ_0^2 ≪ σ_1^2 the gain of the cooperative scheme is quite noticeable. However, as σ_0^2 becomes comparable to σ_1^2, the gains are reduced. This was to be expected: as σ_0^2 → σ_1^2, the random variable X_2 converges to a Gaussian distribution. In that case, the reconstruction of X_2 at node 3 is equivalent, in the cooperative scenario, to a lossy source coding problem with side information X_1 available at both the encoder and the decoder, while in the non-cooperative setting it reduces to the standard Wyner-Ziv problem. It is known that in this case no gain can be expected [2].

B. Interactive Lossless Source Coding

Consider now the problem described in Fig. 8, where encoder 1 wishes to communicate losslessly the source X_1^n to two decoders which observe the sources X_2^n and X_3^n. At the same time, node 1
If all nodes would be allowed to perform a joint decoding procedure in order to recover all the exchanged descriptions only at the end of each round, this problem would not appear. However, this would destroy the sequential encoding-decoding structure assumed in the paper which seems to be optimal in other situations. VII. S UMMARY The three-node multiterminal lossy source coding problem was investigated. This problem is not a straightforward generalization of the original problem posed by Kaspi in 1985. As this general problem encompasses several open problems in multiterminal rate distortion the 12 Consider U1→23,1 ≡ X1 , U2→13,1 ≡ X2 and U3→12,1 ≡ X3 and the other auxiliary random variables to be constants for all l ∈ [1 : K]. November 7, 2018 DRAFT 61 mathematical complexity of it is formidable. For that reason we only provided a general inner bound for the rate distortion region. It is shown that this (rather involved) inner bound contains several rate-distortion regions of some relevant source coding settings. It In this way, besides the non-trivial extension of the interactive two terminal problem, our results can be seen as a generalization and hence unification of several previous works in the field. We also showed, that our inner bound provides definite answers to special cases of the general problem. It was shown that in some cases the cooperation induced by the interaction can be helpful while in others not. It is clear that further study is needed on the topic of multiple terminal cooperative coding, including a proper generalization to larger networks and to the problem of interactively estimating arbitrary functions of the sources sensed at the nodes. November 7, 2018 DRAFT 62 A PPENDIX A S TRONGLY TYPICAL SEQUENCES AND RELATED RESULTS In this appendix we introduce standard notions in information theory but suited for the mathematical developments and proof needed in this work. The results presents can be easily derived from the standard formulations provided in [22] and [23]. Be X and Y finite alphabets and (xn , y n ) ∈ X n × Y n . With P(X × Y) we denote the set of all probability distributions on X × Y. We define the strongly δ-typical sets as: Definition 3 (Strongly typical set): Consider p ∈ P(X ) and δ > 0. We say that xn ∈ X n is n pδ- strongly typical if xn ∈ T[p]δ with:   δ N (a|xn ) n n n − p(a) ≤ , ∀a ∈ X such that p(a) 6= 0 , T[p]δ = x ∈ X : n kX k (381) where N (a|xn ) denotes de number of occurrences of a ∈ X in xn and p ∈ P(X ). When n X ∼ pX (x) we can denote the corresponding set of strongly typical sequences as T[X]δ . Similarly, given pXY ∈ P (X × Y) we can construct the set of δ-jointly typical sequences as:  δ N (a, b|xn , y n ) n − pXY (a, b) ≤ , T[XY ]δ = (xn , y n ) ∈ X n × Y n : n kX kkYk ∀(a, b) ∈ X × Y such that pXY (a, b) 6= 0} . (382) We also define the conditional typical sequences. In precise terms, given xn ∈ X n we consider the set: T[Yn |X]δ (xn ) n δ N (a, b|xn , y n ) − pXY (a, b) ≤ , = y ∈Y : n kX kkYk o ∀(a, b) ∈ X × Y such that pXY (a, b) 6= 0 . n n (383) Notice that we the following is an alternative writing of this set:  n T[Yn |X]δ (xn ) = y n ∈ Y n : (xn , y n ) ∈ T[XY ]δ . (384) We have several useful and standard Lemmas, which will be presented without proof (except for the last one): Lemma 1 (Properties of typical sets [23]): The following are true: n n n n n n n 1) Consider (xn , y n ) ∈ T[XY ∈ T[Y ]ǫ , xn ∈ T[X|Y ∈ ]ǫ . Then, x ∈ T[X]ǫ , y ]ǫ (y ) and y T[Yn |X]ǫ (xn ) . 
November 7, 2018 DRAFT 63 n 2) Be T[Yn |X]ǫ (xn ) with xn ∈ / T[X]ǫ . Then T[Yn |X]ǫ (xn ) = ∅ . Q n 3) Be (X n , Y n ) ∼ nt=1 pXY (xt , yt ). If xn ∈ T[X]ǫ we have 2−n(H(X)+δ(ǫ)) ≤ pX n (xn ) ≤ 2−n(H(X)−δ(ǫ)) with δ(ǫ) → 0 when ǫ → 0. Similarly, if y n ∈ T[Yn |X]ǫ (xn ): ′ ′ 2−n(H(Y |X)+δ (ǫ)) ≤ pY n |X n (y n |xn ) ≤ 2−n(H(Y |X)−δ (ǫ)) with δ ′ (ǫ) → 0 when ǫ → 0 . Lemma 2 (Conditional typicality lemma [23]): Consider de product measure Using that measure, we have the following    n −nf (ǫ) , c1 > 1 Pr T[X]ǫ ≥ 1 − O c1 n ′ where f (ǫ) → 0 when ǫ → 0. In addition, for every xn ∈ T[X]ǫ ′ with ǫ <    n −ng(ǫ,ǫ′ ) n n , c2 > 1 Pr T[Y |X]ǫ (x )|x ≥ 1 − O c2 ǫ kYk Qn t=1 pXY (xt , yt ). we have: where g(ǫ, ǫ′ ) → 0 when ǫ, ǫ′ → 0. Lemma 3 (Size of typical sets [23]): Given pXY ∈ P (X × Y) we have 1 n 1 n kT[X]ǫ k ≤ H(X) + δ(ǫ), kT k ≥ H(X) − δ ′ (ǫ, n) n n [X]ǫ where δ(ǫ), δ ′ (ǫ, n) → 0 when ǫ → 0 and n → ∞. Similarly for every xn ∈ X n we have: 1 n kT[Y |X]ǫ (xn )k ≤ H(Y |X) + δ(ǫ) n n ′ with δ(ǫ) → 0 with ǫ → 0. In addition, for every xn ∈ T[X]ǫ ′ with ǫ < ǫ kYk we have: 1 n kT (xn )k ≥ H(Y |X) − δ ′ (ǫ, ǫ′ , n) n [Y |X]ǫ where δ ′ (ǫ, ǫ′ , n) → 0 when ǫ, ǫ′ → 0 and n → ∞. n n Lemma 4 (Joint typicality lemma [23]): Consider (X, o Z (x, y, z) and (x , y ) ∈ n Y, Z) ∼ pXY n ′ T[XY ]ǫ′ with ǫ < ǫ kZk and ǫ < ǫ′′ . If pZ n |X n (z n |xn ) = n n 1 z n ∈T[Z|X]ǫ ′′ (x ) kT[Z|X]ǫ′′ (xn )k there exists δ ′ (ǫ, ǫ′ , ǫ′′ , n) which goes to zero as ǫ, ǫ′ , ǫ′′ → 0 and n → ∞ and: o n ′ ′ n n n (x , y ) ≤ 2−n(I(Y ;Z|X)−δ ) 2−n(I(Y ;Z|X)+δ ) ≤ Pr Z̃ n ∈ T[Z|XY ]ǫ November 7, 2018 DRAFT 64 n ′ Lemma 5 (Covering Lemma [22]): Be (U, V, X) ∼ pU V X and (xn , un ) ∈ T[XU ]ǫ′ , ǫ < ǫ kVk nR and ǫ < ǫ′′ . Consider also o{V n (m)}2m=1 random vectors which are independently generated n according to n (un ) 1 v n ∈T[V |U ]ǫ′′ kT[V |U ]ǫ′′ (un )k . Then:  Pr V n (m) ∈ / T[Vn |U X]ǫ (xn , un ) for all m −−−→ 0 (385) n→∞ n uniformly for every (xn , un ) ∈ T[XU ]ǫ′ if: R > I (V ; X|U ) + δ(ǫ, ǫ′ , ǫ′′ , n) (386) where δ(ǫ, ǫ′ , ǫ′′ , n) → 0 when ǫ, ǫ′ , ǫ′′ → 0 and n → ∞. Corollary 1: Assume the conditions in Lemma 5, and also:  n Pr (X n , U n ) ∈ T[XU −−→ 1 . ]ǫ′ − (387) n→∞ Then:  Pr (U n , X n , V n (m))) ∈ / T[Un XV ]ǫ for all m −−−→ 0 (388) n→∞ when (386) is satisfied. Lemma 6 (Packing Lemma [22]): Be (U1 U2 W V1 V2 X) ∼ pU1 U2 W V1 V2 X , (xn , wn , v1n , v2n ) ∈ n ′ T[XW V1 V2 ]ǫ′ and ǫ < ǫ kU1 kkU2 k 1 and ǫ < min {ǫ1 , ǫ2 }. Consider random vectors {U1n (m1 )}A m1 =1 2 and {U2n (m2 )}A m2 =1 which are independently generated according to o n n n n n 1 ui ∈ T[Ui |Vi W ]ǫi (vi , w ) , i = 1, 2 , kT[Ui |Vi W ]ǫi (wn , vin )k and A1 , A2 are positive random variables independent of everything else. Then  Pr (U1n (m1 ), U2n (m2 )) ∈ T[Un1 U1 |XW V1 V2 ]ǫ (xn , wn , v1n , v2n ) for some (m1 , m2 ) −−−→ 0 n→∞ (389) n uniformly for every (xn , wn , v1n , v2n ) ∈ T[XW V1 V2 ]ǫ′ provided that: log E [A1 A2 ] < I (U1 ; XV2 U2 |W V1 ) + I (U2 ; XV1 U1 |W V2 ) − I (U1 ; U2 |XW V1 V2 ) − δ (390) n where δ ≡ δ(ǫ, ǫ′ , ǫ1 , ǫ2 , n) → 0 when ǫ, ǫ′ , ǫ1 , ǫ2 → 0 and n → ∞. Corollary 2: Assume the conditions in Lemma 6, and also:  n Pr (X n , W n , V1n , V2n ) ∈ T[XW −−→ 1 V1 V2 ]ǫ′ − n→∞ November 7, 2018 (391) DRAFT 65 Then:  Pr (U1n (m1 ), U2n (m2 ), X n , W n , V1n , V2n )) ∈ T[Un1 U1 XW V1 V2 ]ǫ for some (m1 , m2 ) −−−→ 0 n→∞ (392) when (390) is satisfied. 
Lemma 7 (Generalized Markov Lemma [24] ): Consider a pmf pU XY belonging to P (X × Y × U) and that satisfies de following: Y −− X −− U n n Consider (xn , y n ) ∈ T[XY ]ǫ′ and random vectors U generated according to: n o Pr U n = un xn , y n , U n ∈ T[Un |X]ǫ′′ (xn ) = n o 1 un ∈ T[Un |X]ǫ′′ (xn ) kT[Un |X]ǫ′′ (xn )k (393) n For sufficiently small ǫ, ǫ′ , ǫ′′ the following holds uniformly for every (xn , y n ) ∈ T[XY ]ǫ′ : where c > 1. n o  Pr U n ∈ / T[Un |XY ]ǫ (xn , y n ) xn , y n , U n ∈ T[Un |X]ǫ′′ (xn ) = O c−n (394) Corollary 3: Assume the conditions in Lemma 7, and also:  n Pr (X n , Y n ) ∈ T[XY −−→ 1 ]ǫ′ − n→∞ n and that uniformly for every (xn , y n ) ∈ T[XY ]ǫ′ : n o Pr U n ∈ / T[Un |X]ǫ′′ (xn ) xn , y n −−−→ 0 n→∞ we obtain:  Pr (U n , X n , Y n ) ∈ T[Un XY ]ǫ −−−→ 1 . n→∞ (395) (396) (397) Lemma 7 and Corollary 3 will be central for us. They will guarantee the joint typicality of the descriptions generated in different encoders considering the pmf of the chosen descriptions induced by the coding scheme used. The original proof of this result is given in [4] and involves a combination of rather sophisticated algebraic and combinatorial arguments over finite alphabets. Alternative proof was also provided in [22],  Pr (U n X n , Y n ) ∈ T[Un XY ]ǫ −−−→ 1 n→∞ November 7, 2018 (398) DRAFT 66 which strongly relies on a rather obscure result by Uhlmann [25] on combinatorics. In [24] a short and more general proof of this result is given. We next present a result which will be useful for proving Theorem 1. In order to use the Markov Lemma we need to show that the descriptions induced by the encoding procedure in each node satisfies (393). Lemma 8 (Encoding induced distribution): Consider a pmf pU XW belonging to P (U × X × W) and ǫ′ ≥ ǫ. Be {U n (m)}Sm=1 random vectors independently generated according to n o 1 un ∈ T[Un |W ]ǫ′ (wn ) kT[U |W ]ǫ′ (wn )k and where (W n , X n ) are generated with an arbitrary distribution. Once these vectors are generated, and given xn and wn , we choose one of them if: (un (m), wn , xn ) ∈ T[Un W X]ǫ , for some m ∈ [1 : S] . (399) If there are various vectors un that satisfies this we choose the one with smallest index. If there are none we choose an arbitrary one. Let M denote the index chosen. Then we have that: n o n n n n n o 1 u ∈ T[U |XW ]ǫ (x , w ) Pr U n (M ) = un xn , wn , U n (M ) ∈ T[Un |XW ]ǫ (xn , wn ) = . (400) kT[U |XW ]ǫ (xn , wn )k Proof: From the selection procedure for M we know that:   M = f 1 (U n (m), X n , W n ) ∈ T[Un XW ]ǫ , m ∈ [1 : S] , (401) where f (·) is an appropriate function. Moreover, because of this and the way in which the random vectors U n are generated we have: We can write:   U n (M ) −− 1 (U n (M ), X n , W n ) ∈ T[Un XW ]ǫ , W n , X n −− M . (402) n o Pr U n (M ) = un xn , wn , U n (M ) ∈ T[Un |XW ]ǫ (xn , wn ) = S X m=1 n o Pr M = m xn , wn , U n (m) ∈ T[Un |XW ]ǫ (xn , wn ) × n n n n n n Pr U (m) = u x , w , U (m) ∈ November 7, 2018 T[Un |XW ]ǫ (xn , wn ), M o =m . (403) DRAFT 67 From (402), the second probability term in the RHS of (403) can be written as: n o n n n n n n n n Pr U (m) = u x , w , U (m) ∈ T[U |XW ]ǫ (x , w ) ∀m . (404) We are going to analyze this term. It is clear that we can write: n o  n n n n n n n n Pr U (m) = u , U (m) ∈ T[U |XW ]ǫ (x , w ) x , w =1 un ∈ T[Un |W X]ǫ (xn , wn ) o n ×Pr U n (m) = un xn , wn ∀m .(405) This means that n n n n n n T[Un |XW ]ǫ (xn , wn ) o Pr U (m) = u x , w , U (m) ∈ n o 1 un ∈ T[Un |W X]ǫ (xn , wn ) ∩ T[Un |W ]ǫ′ (wn ) o = n ∀m . 
n n n n n n Pr U (m) ∈ T[U |XW ]ǫ (x , w ) x , w kT[U |W ]ǫ′ (wn )k (406) From (406) and the fact that for ǫ′ ≥ ǫ, we have that T[Un |XW ]ǫ (xn , wn ) ⊆ T[Un |W ]ǫ′ (wn ) we obtain: n o n o 1 un ∈ T[Un |W X]ǫ (xn , wn ) Pr U n (m) = un xn , wn , U n (m) ∈ T[Un |XW ]ǫ (xn , wn ) = ∀m . kT[U |XW ]ǫ (xn , wn )k (407) From this equation and (403) we easily obtain the desired result. We present, without proof, a useful result about reconstruction functions for lossy source coding problems: Lemma 9 (Reconstruction functions for degraded random variables [13]): Consider random variables (X, Y, Z) such that X − − Y − − Z. Consider an arbitrary function X̂ = f (Y, Z) and an arbitrary positive distortion function d(·, ·). Then ∃ g ∗ (Y ) such that E [d(X, g ∗ (Y ))] ≤ E[d(X, f (Y, Z))] . (408) Finally we present two lemmas about Markov chains induced by the interactive encoding schemes which will be relevant for the paper converse results Lemma 10 (Markov chains induced by interactive encoding of two nodes): Consider a set of n Q three sources (X n , Y n , Z n ) ∼ pXY Z (xt , yt , zt ) and integer K ∈ N. For each l ∈ [1 : K] t=1 consider arbitrary message sets Ixl , Iyl and arbitrary functions  fxl X n , Jx[1:l−1] , Jy[1:l−1] = Jxl ,  fyl Y n , Jx[1:l] , Jy[1:l−1] = Jyl November 7, 2018 (409) (410) DRAFT 68 with Jxl ∈ Ixl and Jyl ∈ Iyl . The following Markov chain relations are valid for each t ∈ [1 : n] and l ∈ [1 : K]:  1) Jx1 , X[1:t−1] , Y[t+1:n] −− X[t] −− (Y[t] , Z[t] ) ,    [1:l−1] [1:l−1] , Jy , X[1:t] , Y[t+1:n] −− (Y[t] , Z[t] ) , 2) Jxl , X[t+1:n] −− Jx    [1:l−1] [1:l] , X[1:t−1] , Y[t:n] −− (X[t] , Z[t] ) , 3) Jyl , Y[1:t−1] −− Jx , Jy   [1:K] [1:K] 4) X[t+1:n] −− Jx , Jy , X[1:t] , Y[t+1:n] −− (Y[t] , Z[t] ) ,   [1:K] [1:K] 5) Y[1:t−1] −− Jx , Jy , X[1:t−1] , Y[t:n] −− (X[t] , Z[t] ) ,   [1:K] [1:K] 6) Jx , Jy , X[1:t−1] , X[t+1:n] , Z[1:t−1] , Z[t+1:n] , Y[1:t−1] −− (X[t] , Y[t] ) −− Z[t] . Proof: Relations 1), 2) and 3) where obtained in [13]. For completeness we present here a short proof of 2). The proof of 1) and 3) are similar. For simplicity let us consider A =   [1:l−1] [1:l−1] I Jxl X[t+1:n] ; Y[t] Z[t] Jx Jy X[1:t] Y[t+1:n] . We can write the following:  (a)  A ≤ I Jxl X[t+1:n] ; Y[1:t] Z[t] Jx[1:l−1] Jy[1:l−1] X[1:t] Y[t+1:n]   (b) = I X[t+1:n] ; Y[1:t] Z[t] Jx[1:l−1] Jy[1:l−1] X[1:t] Y[t+1:n]     n [1:l−1] [1:l−1] [1:l−1] [1:l−1] = H X[t+1:n] Jx Jy X[1:t] Z[t] Y Jy X[1:t] Y[t+1:n] − H X[t+1:n] Jx  (c)  ≤ I X[t+1:n] ; Y[1:t] Z[t] Jx[1:l−1] Jy[1:l−2] X[1:t] Y[t+1:n]     = H Y[1:t] Z[t] Jx[1:l−1] Jy[1:l−2] X[1:t] Y[t+1:n] − H Y[1:t] Z[t] Jx[1:l−1] Jy[1:l−2] X n Z[t] Y[t+1:n]  (d)  [1:l−2] [1:l−2] (411) ≤ I X[t+1:n] ; Y[1:t] Z[t] Jx Jy X[1:t] Y[t+1:n] where • • • step (a) follows from non-negativity of mutual information,   [1:l−1] [1:l−1] , , Jy step (b) follows from the fact that Jxl = fxl X n , Jx   [1:l−1] [1:l−2] and that conditioning , Jy step (c) follows from the fact that Jyl−1 = fyl Y n , Jx reduces entropy, • step (d) follows from the fact that reduces entropy. Jxl−1 = fxl  X n [1:l−2] [1:l−2] , Jx , Jy Continuing this procedure we obtain:   A ≤ I X[t+1:n] ; Y[1:t] Z[t] X[1:t] Y[t+1:n] = 0. November 7, 2018  and that conditioning (412) DRAFT 69 This shows that 2) is true. 4) and 5) are straightforward consequences of 2) and 3). Just consider   [1:K] [1:K] is JxK+1 = JyK+1 = ∅. The proof for 6) is straightforward from the fact that Jx Jy only function of (X n , Y n ). 
Finally, we present two lemmas about the Markov chains induced by the interactive encoding schemes, which will be relevant for the converse results of the paper.

Lemma 10 (Markov chains induced by interactive encoding of two nodes): Consider three sources $(X^n,Y^n,Z^n)\sim\prod_{t=1}^n p_{XYZ}(x_t,y_t,z_t)$ and an integer $K\in\mathbb{N}$. For each $l\in[1:K]$ consider arbitrary message sets $\mathcal{I}_x^l,\mathcal{I}_y^l$ and arbitrary functions
$$f_x^l\left(X^n, J_x^{[1:l-1]}, J_y^{[1:l-1]}\right) = J_x^l, \qquad (409)$$
$$f_y^l\left(Y^n, J_x^{[1:l]}, J_y^{[1:l-1]}\right) = J_y^l, \qquad (410)$$
with $J_x^l\in\mathcal{I}_x^l$ and $J_y^l\in\mathcal{I}_y^l$. The following Markov chain relations are valid for each $t\in[1:n]$ and $l\in[1:K]$:
1) $\left(J_x^1, X_{[1:t-1]}, Y_{[t+1:n]}\right) −− X_{[t]} −− \left(Y_{[t]},Z_{[t]}\right)$,
2) $\left(J_x^l, X_{[t+1:n]}\right) −− \left(J_x^{[1:l-1]}, J_y^{[1:l-1]}, X_{[1:t]}, Y_{[t+1:n]}\right) −− \left(Y_{[t]},Z_{[t]}\right)$,
3) $\left(J_y^l, Y_{[1:t-1]}\right) −− \left(J_x^{[1:l]}, J_y^{[1:l-1]}, X_{[1:t-1]}, Y_{[t:n]}\right) −− \left(X_{[t]},Z_{[t]}\right)$,
4) $X_{[t+1:n]} −− \left(J_x^{[1:K]}, J_y^{[1:K]}, X_{[1:t]}, Y_{[t+1:n]}\right) −− \left(Y_{[t]},Z_{[t]}\right)$,
5) $Y_{[1:t-1]} −− \left(J_x^{[1:K]}, J_y^{[1:K]}, X_{[1:t-1]}, Y_{[t:n]}\right) −− \left(X_{[t]},Z_{[t]}\right)$,
6) $\left(J_x^{[1:K]}, J_y^{[1:K]}, X_{[1:t-1]}, X_{[t+1:n]}, Z_{[1:t-1]}, Z_{[t+1:n]}, Y_{[1:t-1]}\right) −− \left(X_{[t]},Y_{[t]}\right) −− Z_{[t]}$.
Proof: Relations 1), 2) and 3) were obtained in [13]. For completeness we present a short proof of 2); the proofs of 1) and 3) are similar. For simplicity, let $A = I\left(J_x^l X_{[t+1:n]}; Y_{[t]} Z_{[t]} \,\middle|\, J_x^{[1:l-1]} J_y^{[1:l-1]} X_{[1:t]} Y_{[t+1:n]}\right)$. We can write
$$A \overset{(a)}{\le} I\left(J_x^l X_{[t+1:n]}; Y_{[1:t]} Z_{[t]} \,\middle|\, J_x^{[1:l-1]} J_y^{[1:l-1]} X_{[1:t]} Y_{[t+1:n]}\right) \overset{(b)}{=} I\left(X_{[t+1:n]}; Y_{[1:t]} Z_{[t]} \,\middle|\, J_x^{[1:l-1]} J_y^{[1:l-1]} X_{[1:t]} Y_{[t+1:n]}\right) \overset{(c)}{\le} I\left(X_{[t+1:n]}; Y_{[1:t]} Z_{[t]} \,\middle|\, J_x^{[1:l-1]} J_y^{[1:l-2]} X_{[1:t]} Y_{[t+1:n]}\right) \overset{(d)}{\le} I\left(X_{[t+1:n]}; Y_{[1:t]} Z_{[t]} \,\middle|\, J_x^{[1:l-2]} J_y^{[1:l-2]} X_{[1:t]} Y_{[t+1:n]}\right), \qquad (411)$$
where
• step (a) follows from the non-negativity of mutual information,
• step (b) follows from the fact that $J_x^l = f_x^l\left(X^n, J_x^{[1:l-1]}, J_y^{[1:l-1]}\right)$,
• step (c) follows from the fact that $J_y^{l-1} = f_y^{l-1}\left(Y^n, J_x^{[1:l-1]}, J_y^{[1:l-2]}\right)$ and that conditioning reduces entropy,
• step (d) follows from the fact that $J_x^{l-1} = f_x^{l-1}\left(X^n, J_x^{[1:l-2]}, J_y^{[1:l-2]}\right)$ and that conditioning reduces entropy.
Continuing this procedure we obtain
$$A \le I\left(X_{[t+1:n]}; Y_{[1:t]} Z_{[t]} \,\middle|\, X_{[1:t]} Y_{[t+1:n]}\right) = 0. \qquad (412)$$
This shows that 2) is true. Relations 4) and 5) are straightforward consequences of 2) and 3): just take $J_x^{K+1}=J_y^{K+1}=\emptyset$. The proof of 6) is straightforward from the fact that $\left(J_x^{[1:K]}, J_y^{[1:K]}\right)$ is only a function of $(X^n,Y^n)$.

Corollary 4: Consider the setting of Lemma 10 with the following modifications:
• $X −− Z −− Y$,
• $f_x^l\left(X^n, Z^n, J_x^{[1:l-1]}, J_y^{[1:l-1]}\right) = J_x^l$.
Then the following are true:
1) $\left(J_x^1, Z_{[1:t-1]}, Y_{[t+1:n]}\right) −− Z_{[t]} −− Y_{[t]}$,
2) $\left(J_x^l, Z_{[t+1:n]}\right) −− \left(J_x^{[1:l-1]}, J_y^{[1:l-1]}, Z_{[1:t]}, Y_{[t+1:n]}\right) −− Y_{[t]}$,
3) $\left(J_y^l, Y_{[1:t-1]}\right) −− \left(J_x^{[1:l]}, J_y^{[1:l-1]}, Z_{[1:t-1]}, Y_{[t:n]}\right) −− \left(X_{[t]},Z_{[t]}\right)$,
4) $\left(Z_{[t+1:n]}, X^n\right) −− \left(J_x^{[1:K]}, J_y^{[1:K]}, Z_{[1:t]}, Y_{[t+1:n]}\right) −− Y_{[1:t]}$.
Proof: The proof follows the same lines as that of Lemma 10.

We next consider some Markov chains that arise naturally when three nodes interact, and which will be needed for Theorem 6.

Lemma 11 (Markov chains induced by interactive encoding of three nodes): Consider three sources $(X^n,Y^n,Z^n)\sim\prod_{t=1}^n p_{XYZ}(x_t,y_t,z_t)$ and an integer $K\in\mathbb{N}$. For each $l\in[1:K]$ consider arbitrary message sets $\mathcal{I}_x^l,\mathcal{I}_y^l,\mathcal{I}_z^l$ and arbitrary functions
$$f_x^l\left(X^n, J_x^{[1:l-1]}, J_y^{[1:l-1]}, J_z^{[1:l-1]}\right) = J_x^l, \qquad (413)$$
$$f_y^l\left(Y^n, J_x^{[1:l]}, J_y^{[1:l-1]}, J_z^{[1:l-1]}\right) = J_y^l, \qquad (414)$$
$$f_z^l\left(Z^n, J_x^{[1:l]}, J_y^{[1:l]}, J_z^{[1:l-1]}\right) = J_z^l, \qquad (415)$$
with $J_x^l\in\mathcal{I}_x^l$, $J_y^l\in\mathcal{I}_y^l$ and $J_z^l\in\mathcal{I}_z^l$. The following Markov chain relations are valid for each $t\in[1:n]$ and $l\in[1:K]$:
1) $\left(J_x^1, J_y^1, X_{[1:t-1]}, X_{[t+1:n]}, Y_{[t+1:n]}, Z_{[1:t-1]}\right) −− \left(X_{[t]},Y_{[t]}\right) −− Z_{[t]}$,
2) $\left(J_x^l, J_y^l, Y_{[1:t-1]}\right) −− \left(J_x^{[1:l-1]}, J_y^{[1:l-1]}, J_z^{[1:l-1]}, X^n, Y_{[t:n]}, Z_{[1:t-1]}\right) −− Z_{[t]}$,
3) $\left(J_z^l, Z_{[t+1:n]}\right) −− \left(J_x^{[1:l]}, J_y^{[1:l]}, J_z^{[1:l-1]}, X^n, Y_{[t+1:n]}, Z_{[1:t]}\right) −− Y_{[t]}$,
4) $Z_{[t+1:n]} −− \left(J_x^{[1:K]}, J_y^{[1:K]}, J_z^{[1:K]}, X^n, Y_{[t+1:n]}, Z_{[1:t]}\right) −− Y_{[t]}$,
5) $Y_{[1:t-1]} −− \left(J_x^{[1:K]}, J_y^{[1:K]}, J_z^{[1:K]}, X^n, Y_{[t:n]}, Z_{[1:t-1]}\right) −− Z_{[t]}$.
Proof: Along the same lines as Lemma 10 and for that reason omitted.

[Figure 9: Cooperative Berger-Tung problem. Encoder 1 observes $(X_1^n,V_1^n)$ and sends rate $R_1$ to Encoder 2 and the decoder; Encoder 2 observes $(X_2^n,V_1^n)$ and sends rate $R_2$ to the decoder; the decoder observes $(X_3^n,V_1^n,V_2^n)$ and outputs $(\hat{U}_1^n,\hat{U}_2^n)$.]

APPENDIX B: COOPERATIVE BERGER-TUNG PROBLEM WITH SIDE INFORMATION AT THE DECODER

We derive an inner bound on the rate region of the setup described in Fig. 9. It should be emphasized that we do not consider distortion measures; the focus is only on the exchange of descriptions. Encoders 1 and 2 observe the source sequences $X_1^n$ and $X_2^n$, and also have access to a common side information $V_1^n$, whereas the decoder has access to the side informations $(X_3^n,V_1^n,V_2^n)$. Upon observing $X_1^n$ and $V_1^n$, Encoder 1 generates a message $M_1$ which is transmitted to Encoder 2 and to the decoder. Encoder 2, upon observing $(X_2^n,V_1^n)$ and the message $M_1$, generates a message $M_2$ which is transmitted only to the decoder. Finally, the decoder uses the messages $(M_1,M_2)$ and the side informations $(X_3^n,V_1^n,V_2^n)$ to reconstruct two sequences $(\hat{U}_1^n,\hat{U}_2^n)$ which are jointly typical with $(X_1^n,X_2^n,X_3^n,V_1^n,V_2^n)$. In precise terms, we assume the following:
• A probability mass function $p_{X_1X_2X_3U_1U_2V_1V_2}$ which takes values on the Cartesian product of finite alphabets $\mathcal{X}_1\times\mathcal{X}_2\times\mathcal{X}_3\times\mathcal{U}_1\times\mathcal{U}_2\times\mathcal{V}_1\times\mathcal{V}_2$ and satisfies the Markov chains
$$U_1 −− (X_1,V_1) −− (X_2,X_3,V_2), \qquad U_2 −− (U_1,X_2,V_1) −− (X_1,X_3,V_2). \qquad (416)$$
• Five random vectors $(X_1^n,X_2^n,X_3^n,V_1^n,V_2^n)$ (not necessarily independently and identically distributed according to $p_{X_1X_2X_3V_1V_2}$) which take values on the alphabets $\mathcal{X}_1^n\times\mathcal{X}_2^n\times\mathcal{X}_3^n\times\mathcal{V}_1^n\times\mathcal{V}_2^n$ and are such that, for every $\epsilon>0$,
$$\lim_{n\to\infty}\Pr\left\{(X_1^n,X_2^n,X_3^n,V_1^n,V_2^n)\in\mathcal{T}^n_{[X_1X_2X_3V_1V_2]\epsilon}\right\} = 1. \qquad (417)$$
Definition 4 (Cooperative code): A code $(n,f_1^n,f_2^n,g^n,\mathcal{M}_1,\mathcal{M}_2)$ for the setup in Fig. 9 is composed of:
• Two sets of indices $\mathcal{M}_1,\mathcal{M}_2$.
• An encoding function $f_1^n:\mathcal{X}_1^n\times\mathcal{V}_1^n\to\mathcal{M}_1$ such that $f_1^n(x_1^n,v_1^n)=m_1$.
• An encoding function $f_2^n:\mathcal{X}_2^n\times\mathcal{V}_1^n\times\mathcal{M}_1\to\mathcal{M}_2$ such that $f_2^n(x_2^n,v_1^n,m_1)=m_2$.
• A decoding function $g^n:\mathcal{X}_3^n\times\mathcal{V}_1^n\times\mathcal{V}_2^n\times\mathcal{M}_1\times\mathcal{M}_2\to\mathcal{U}_1^n\times\mathcal{U}_2^n$ such that $g^n(x_3^n,v_1^n,v_2^n,m_1,m_2)=(\hat{u}_1^n,\hat{u}_2^n)$.

Definition 5 (Achievable rates): We say that $(R_1,R_2)$ is $\epsilon$-achievable if there exists a code $(n,f_1^n,f_2^n,g^n,\mathcal{M}_1,\mathcal{M}_2)$ such that
$$\frac{1}{n}\log\|\mathcal{M}_1\| \le R_1+\epsilon, \qquad \frac{1}{n}\log\|\mathcal{M}_2\| \le R_2+\epsilon \qquad (418)$$
and
$$\Pr\left\{(\hat{U}_1^n,\hat{U}_2^n,V_1^n,V_2^n,X_1^n,X_2^n,X_3^n)\notin\mathcal{T}^n_{[U_1U_2V_1V_2X_1X_2X_3]\epsilon}\right\} \le \epsilon. \qquad (419)$$
The closure of the set of all achievable rates $(R_1,R_2)$ is denoted by $\mathcal{R}_{CBT}$. The following theorem presents an inner bound to $\mathcal{R}_{CBT}$.

Theorem 9 (Inner bound on the rate region of the cooperative Berger-Tung problem): Let $\mathcal{R}_{CBT}^{inner}$ be the closure of the set of rates satisfying
$$R_1 > I(X_1;U_1|X_2V_1), \qquad R_2 > I(X_2;U_2|X_3V_1V_2U_1), \qquad R_1+R_2 > I(X_1X_2;U_1U_2|X_3V_1V_2),$$
where the union is over all probability distributions satisfying (416). Then $\mathcal{R}_{CBT}^{inner}\subseteq\mathcal{R}_{CBT}$.

Remark 17: Notice that we do not require $(X_1^n,X_2^n,X_3^n,V_1^n,V_2^n)$ to be independently and identically distributed; this is in fact not needed for the result that follows. When applying this result, the case of most interest will be the one in which $(X_1^n,X_2^n,X_3^n)$ is generated according to the product measure $\prod_{i=1}^n p_{X_1X_2X_3}(x_{1i},x_{2i},x_{3i})$ (that is, $(X_1X_2X_3)$ is a DMS), while $(V_1^n,V_2^n)$ is not independently and identically distributed. Still, (417) will be satisfied.

Remark 18: Notice that, unlike the classical rate-distortion problem, we are not interested in average per-symbol distortion constraints at the decoder; we only require that the obtained sequences be jointly typical with the sources. Clearly, the problem can be slightly modified to consider the case in which reconstruction distortion constraints are of interest; in fact, case (C) reported in [26] considers a similar setting. Here, given the importance of this result for our interactive scheme, we present a slightly different and more direct proof of achievability, in which we discuss the key points of the encoding and decoding procedures that will be relevant for our extension to the interactive problem.

Proof: Our proof uses standard ideas from multiterminal source coding. As $V_1^n$ is common to both encoders and the decoder, we can set $V_1^n=\emptyset$ without loss of generality; conditioning the final expressions on $V_1$ accounts for the situation in which $V_1^n\ne\emptyset$.

A. Codebook generation

We randomly generate $2^{n\hat{R}_1}$ codewords $U_1^n(k)$, $k\in[1:2^{n\hat{R}_1}]$, according to
$$U_1^n(k)\sim\frac{\mathbb{1}\left\{u_1^n\in\mathcal{T}^n_{[U_1]\epsilon_{cd}}\right\}}{\left\|\mathcal{T}^n_{[U_1]\epsilon_{cd}}\right\|}, \qquad \epsilon_{cd}>0. \qquad (420)$$
These $2^{n\hat{R}_1}$ codewords are distributed uniformly over $2^{nR_1}$ bins denoted by $B_1(m_1)$, where $m_1\in[1:2^{nR_1}]$. For each codeword $u_1^n(k)$, $k\in[1:2^{n\hat{R}_1}]$, we randomly generate $2^{n\hat{R}_2}$ codewords according to
$$U_2^n(l,k)\sim\frac{\mathbb{1}\left\{u_2^n\in\mathcal{T}^n_{[U_2|U_1]\epsilon_{cd}}(u_1^n(k))\right\}}{\left\|\mathcal{T}^n_{[U_2|U_1]\epsilon_{cd}}(u_1^n(k))\right\|}, \qquad \epsilon_{cd}>0, \qquad (421)$$
with $l\in[1:2^{n\hat{R}_2}]$. The $2^{n(\hat{R}_1+\hat{R}_2)}$ codewords so generated are distributed uniformly over $2^{nR_2}$ bins, denoted by $B_2(m_2)$, $m_2\in[1:2^{nR_2}]$. It is worth mentioning that the codewords $\{U_2^n(l,k)\}$ are not distributed over a separate bin structure for each $k$, but over a single super-bin structure with $2^{n(\hat{R}_1+\hat{R}_2)}/2^{nR_2}$ codewords per bin, so that $B_2(m_2)$ does not need to be indexed by $k$. As will become clear, this does not force the decoder to use successive decoding; it can instead use joint decoding to recover the desired codewords $(\hat{U}_1^n,\hat{U}_2^n)$. Finally, all codebooks are revealed to all parties.
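The binning and super-binning structure just described can also be made concrete with a small sketch. The following illustrative Python snippet (toy parameters only; all constants are hypothetical and do not correspond to any particular source) assigns the indices $k$ to the bins $B_1(m_1)$ and the pairs $(l,k)$ to a single super-bin structure $B_2(m_2)$ that is not indexed by $k$; the average bin populations then match $2^{n(\hat{R}_1-R_1)}$ and $2^{n(\hat{R}_1+\hat{R}_2-R_2)}$, the quantities used later in (442)-(444).

import random

# Toy stand-ins (hypothetical) for 2^{n R̂1}, 2^{n R̂2}, 2^{n R1}, 2^{n R2}.
N_HAT1, N_HAT2 = 64, 32        # number of U1 codewords; number of U2 codewords per U1 codeword
N_BINS1, N_BINS2 = 8, 16       # number of bins B1(m1); number of super-bins B2(m2)

random.seed(0)
# B1: every index k is thrown uniformly into one of the N_BINS1 bins.
bin1_of = {k: random.randrange(N_BINS1) for k in range(N_HAT1)}
# B2: every PAIR (l, k) is thrown uniformly into one of the N_BINS2 super-bins;
# the assignment does not depend on k separately -- a single structure for all k.
bin2_of = {(l, k): random.randrange(N_BINS2)
           for k in range(N_HAT1) for l in range(N_HAT2)}

def B1(m1):
    """Indices k whose U1-codeword lies in bin m1."""
    return [k for k, b in bin1_of.items() if b == m1]

def B2(m2):
    """Pairs (l, k) whose U2-codeword lies in super-bin m2."""
    return [pair for pair, b in bin2_of.items() if b == m2]

print(len(B1(0)), "~", N_HAT1 / N_BINS1)              # ~ 2^{n(R̂1 - R1)} codewords per bin
print(len(B2(0)), "~", N_HAT1 * N_HAT2 / N_BINS2)     # ~ 2^{n(R̂1 + R̂2 - R2)} pairs per super-bin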
B. Encoding at node 1

Given $x_1^n$, the encoder searches for $k\in[1:2^{n\hat{R}_1}]$ such that
$$(x_1^n,u_1^n(k))\in\mathcal{T}^n_{[X_1U_1]\epsilon_2}, \qquad \epsilon_2>0. \qquad (422)$$
If more than one index satisfies this condition, we choose the one with the smallest index; if no such index exists, we choose an arbitrary one and declare an error. Finally, we select $m_1$ as the index of the bin containing the chosen codeword $u_1^n(k)$ and transmit it to nodes 2 and 3.

C. Decoding at node 2

Given $x_2^n$ and $m_1$, we search in the bin $B_1(m_1)$ for an index $k\in[1:2^{n\hat{R}_1}]$ such that
$$(x_2^n,u_1^n(k))\in\mathcal{T}^n_{[X_2U_1]\epsilon_3}, \qquad \epsilon_3>0. \qquad (423)$$
If there is exactly one index satisfying this, we declare it as the index generated at node 1; if there are several or none, we choose a predefined one and declare an error. The chosen index is denoted by $\hat{k}(2)$.

D. Encoding at node 2

Given $x_2^n$ and $\hat{k}(2)$, we search for $l\in[1:2^{n\hat{R}_2}]$ such that
$$(x_2^n,u_1^n(\hat{k}(2)),u_2^n(l,\hat{k}(2)))\in\mathcal{T}^n_{[X_2U_1U_2]\epsilon_4}, \qquad \epsilon_4>0. \qquad (424)$$
If more than one index satisfies this condition, we choose the one with the smallest index; if no such index exists, we choose an arbitrary one and declare an error. Finally, we select $m_2$ as the index of the bin containing the selected codeword $u_2^n(l,\hat{k}(2))$ and transmit it to node 3.

E. Decoding at node 3

Given $x_3^n$, $v_2^n$ and $(m_1,m_2)$, the decoder searches in the bins $B_1(m_1)$ and $B_2(m_2)$ for a pair of indices $(k,l)\in[1:2^{n\hat{R}_1}]\times[1:2^{n\hat{R}_2}]$ such that
$$(x_3^n,v_2^n,u_1^n(k),u_2^n(l,k))\in\mathcal{T}^n_{[X_3V_2U_1U_2]\epsilon}, \qquad \epsilon>0. \qquad (425)$$
If there is exactly one pair of indices satisfying this, we declare it as the pair generated at nodes 1 and 2; if there are several or none, we choose a predefined pair and declare an error. The chosen pair is denoted by $(\hat{k}(3),\hat{l}(3))$. Finally, the decoder declares $(\hat{u}_1^n,\hat{u}_2^n)=(u_1^n(\hat{k}(3)),u_2^n(\hat{l}(3),\hat{k}(3)))$.

F. Error probability analysis

Let $(K,L)$ denote the description indices generated at nodes 1 and 2, and $(M_1,M_2)$ the corresponding bin indices. With $\hat{K}(2)$ and $(\hat{K}(3),\hat{L}(3))$ we denote the indices recovered at nodes 2 and 3. We want to prove that $\Pr\{\mathcal{E}\}\le\epsilon'$ for $n$ sufficiently large, where
$$\mathcal{E}=\left\{\left(X_1^n,X_2^n,X_3^n,V_2^n,U_1^n(\hat{K}(3)),U_2^n(\hat{L}(3),\hat{K}(3))\right)\notin\mathcal{T}^n_{[X_1X_2X_3V_2U_1U_2]\epsilon}\right\}. \qquad (426)$$
We consider the following error events:
• $\mathcal{E}_1=\left\{(X_1^n,X_2^n,X_3^n,V_2^n)\notin\mathcal{T}^n_{[X_1X_2X_3V_2]\epsilon_1}\right\}$, $\epsilon_1>0$.
• $\mathcal{E}_2=\left\{(X_1^n,U_1^n(k))\notin\mathcal{T}^n_{[X_1U_1]\epsilon_2}\ \forall k\in[1:2^{n\hat{R}_1}]\right\}$, $\epsilon_2>0$.
• $\mathcal{E}_3=\left\{(X_1^n,X_2^n,X_3^n,V_2^n,U_1^n(K))\notin\mathcal{T}^n_{[X_1X_2X_3V_2U_1]\epsilon_3}\right\}$, $\epsilon_3>0$.
• $\mathcal{E}_4=\left\{\exists\hat{k}\ne K,\ \hat{k}\in B_1(M_1):\ (X_2^n,U_1^n(\hat{k}))\in\mathcal{T}^n_{[X_2U_1]\epsilon_3}\right\}$, $\epsilon_3>0$.
• $\mathcal{E}_5=\left\{(X_2^n,U_1^n(\hat{K}(2)),U_2^n(l,\hat{K}(2)))\notin\mathcal{T}^n_{[X_2U_1U_2]\epsilon_4}\ \forall l\in[1:2^{n\hat{R}_2}]\right\}$, $\epsilon_4>0$.
• $\mathcal{E}_6=\left\{(X_1^n,X_2^n,X_3^n,V_2^n,U_1^n(K),U_2^n(L,\hat{K}(2)))\notin\mathcal{T}^n_{[X_1X_2X_3V_2U_1U_2]\epsilon}\right\}$, $\epsilon>0$.
• $\mathcal{E}_7=\left\{\exists\hat{k}\ne K,\hat{l}\ne L,\ \hat{k}\in B_1(M_1),\hat{l}\in B_2(M_2):\ (X_3^n,V_2^n,U_1^n(\hat{k}),U_2^n(\hat{l},\hat{k}))\in\mathcal{T}^n_{[X_3V_2U_1U_2]\epsilon}\right\}$.
Clearly $\mathcal{E}\subseteq\bigcup_{i=1}^7\mathcal{E}_i$; in fact, it is easy to show that $\left\{(\hat{K}(3),\hat{L}(3))\ne(K,L)\right\}\cup\left\{\hat{K}(2)\ne K\right\}\subseteq\bigcup_{i=1}^7\mathcal{E}_i$. From the hypothesis (417) we obtain $\lim_{n\to\infty}\Pr\{\mathcal{E}_1\}=0$.
Choosing $\epsilon_1<\epsilon_2/\|\mathcal{U}_1\|$ and $\epsilon_2<\epsilon_{cd}$, we can use Lemma 5 and its Corollary (with the equivalences $V\equiv U_1$, $X\equiv X_1$, $U\equiv\emptyset$) to obtain $\lim_{n\to\infty}\Pr\{\mathcal{E}_2\}=0$ provided that
$$\hat{R}_1 > I(U_1;X_1)+\delta(\epsilon_1,\epsilon_2,\epsilon_{cd},n). \qquad (427)$$
For the analysis of $\Pr\{\mathcal{E}_3\}$ we can use Lemma 7, its Corollary and Lemma 8, defining $Y\equiv X_2X_3$, $X\equiv X_1$ and $U\equiv U_1$, and choosing $\epsilon_2,\epsilon_3,\epsilon_{cd}$ sufficiently small, to obtain $\lim_{n\to\infty}\Pr\{\mathcal{E}_3\}=0$. (From this point on we no longer indicate the specific values of the constants $\epsilon$, the arguments of $\delta$, or the equivalences between the involved random variables needed to apply the lemmas of Appendix A.) For the analysis of $\Pr\{\mathcal{E}_4\}$ we can write
$$\Pr\{\mathcal{E}_4\}=\mathbb{E}\left[\Pr\{\mathcal{E}_4|K=k,M_1=m_1\}\right]=\mathbb{E}\left[\Pr\left\{\bigcup_{\hat{k}\ne k,\,\hat{k}\in B_1(m_1)}\left\{(X_2^n,U_1^n(\hat{k}))\in\mathcal{T}^n_{[X_2U_1]\epsilon_3}\right\}\,\middle|\,K=k,M_1=m_1\right\}\right]. \qquad (428)$$
Using Lemma 6 (with the appropriate equivalences between the involved random variables) and the statistical properties of the codebooks, binning and encoding, we have, for each $(k,m_1)$,
$$\lim_{n\to\infty}\Pr\left\{\bigcup_{\hat{k}\ne k,\,\hat{k}\in B_1(m_1)}\left\{(X_2^n,U_1^n(\hat{k}))\in\mathcal{T}^n_{[X_2U_1]\epsilon_3}\right\}\,\middle|\,K=k,M_1=m_1\right\}=0 \qquad (429)$$
provided that
$$\frac{1}{n}\log\mathbb{E}\|B_1(m_1)\| < I(X_2;U_1)-\delta(\epsilon_1,\epsilon_3,\epsilon_{cd},n). \qquad (430)$$
As $\mathbb{E}[\|B_1(m_1)\|]=2^{n(\hat{R}_1-R_1)}$ for all $m_1$, we have $\lim_{n\to\infty}\Pr\{\mathcal{E}_4\}=0$ provided that
$$\hat{R}_1-R_1 < I(X_2;U_1)-\delta. \qquad (431)$$
The analysis of $\Pr\{\mathcal{E}_5\}$ follows the same lines as that of $\Pr\{\mathcal{E}_2\}$. The above analysis implies that
$$\lim_{n\to\infty}\Pr\left\{(X_2^n,U_1^n(\hat{K}(2)))\in\mathcal{T}^n_{[X_2U_1]\epsilon_3}\right\}=1. \qquad (432)$$
Then, by Lemma 5 and its Corollary, $\lim_{n\to\infty}\Pr\{\mathcal{E}_5\}=0$ if
$$\hat{R}_2 > I(X_2;U_2|U_1)+\delta. \qquad (433)$$
From Lemmas 7 and 8, proceeding as for $\Pr\{\mathcal{E}_3\}$, we have $\lim_{n\to\infty}\Pr\{\mathcal{E}_6\}=0$. Let us turn to $\Pr\{\mathcal{E}_7\}$:
$$\Pr\{\mathcal{E}_7\}=\mathbb{E}\left[\Pr\{\mathcal{E}_7|K=k,L=l,M_1=m_1,M_2=m_2\}\right]=\mathbb{E}\left[\Pr\left\{\bigcup_{(\hat{k},\hat{l})\ne(k,l),\ \hat{k}\in B_1(m_1),\ (\hat{k},\hat{l})\in B_2(m_2)}\left\{(X_3^n,V_2^n,U_1^n(\hat{k}),U_2^n(\hat{l},\hat{k}))\in\mathcal{T}^n_{[X_3V_2U_1U_2]\epsilon}\right\}\,\middle|\,K=k,L=l,M_1=m_1,M_2=m_2\right\}\right]\le\mathbb{E}[\alpha_1+\alpha_2+\alpha_3], \qquad (434)$$
where
$$\alpha_1=\Pr\left\{\bigcup_{\hat{k}\ne k,\ \hat{k}\in B_1(m_1),\ (\hat{k},l)\in B_2(m_2)}\left\{(X_3^n,V_2^n,U_1^n(\hat{k}),U_2^n(l,\hat{k}))\in\mathcal{T}^n_{[X_3V_2U_1U_2]\epsilon}\right\}\,\middle|\,K=k,L=l,M_1=m_1,M_2=m_2\right\}, \qquad (435)$$
$$\alpha_2=\Pr\left\{\bigcup_{\hat{l}\ne l,\ (k,\hat{l})\in B_2(m_2)}\left\{(X_3^n,V_2^n,U_1^n(k),U_2^n(\hat{l},k))\in\mathcal{T}^n_{[X_3V_2U_1U_2]\epsilon}\right\}\,\middle|\,K=k,L=l,M_1=m_1,M_2=m_2\right\}, \qquad (436)$$
$$\alpha_3=\Pr\left\{\bigcup_{\hat{k}\ne k,\hat{l}\ne l,\ \hat{k}\in B_1(m_1),\ (\hat{k},\hat{l})\in B_2(m_2)}\left\{(X_3^n,V_2^n,U_1^n(\hat{k}),U_2^n(\hat{l},\hat{k}))\in\mathcal{T}^n_{[X_3V_2U_1U_2]\epsilon}\right\}\,\middle|\,K=k,L=l,M_1=m_1,M_2=m_2\right\}. \qquad (437)$$
We can use Lemma 6 to obtain
$$\lim_{n\to\infty}\alpha_1=0, \qquad \lim_{n\to\infty}\alpha_2=0, \qquad \lim_{n\to\infty}\alpha_3=0 \qquad (438)$$
provided that
$$\frac{1}{n}\log\mathbb{E}\left\|\left\{\hat{k}:(\hat{k},l)\in B_2(m_2),\hat{k}\in B_1(m_1)\right\}\right\| < I(X_3V_2;U_1U_2)-\delta, \qquad (439)$$
$$\frac{1}{n}\log\mathbb{E}\left\|\left\{\hat{l}:(k,\hat{l})\in B_2(m_2)\right\}\right\| < I(X_3V_2;U_2|U_1)-\delta, \qquad (440)$$
$$\frac{1}{n}\log\mathbb{E}\left\|\left\{(\hat{k},\hat{l}):(\hat{k},\hat{l})\in B_2(m_2),\hat{k}\in B_1(m_1)\right\}\right\| < I(X_3V_2;U_1U_2)-\delta. \qquad (441)$$
Because of how the binning is performed, we have
$$\mathbb{E}\left\|\left\{\hat{k}:(\hat{k},l)\in B_2(m_2),\hat{k}\in B_1(m_1)\right\}\right\| = 2^{n(\hat{R}_1-R_1-R_2)}, \qquad (442)$$
$$\mathbb{E}\left\|\left\{\hat{l}:(k,\hat{l})\in B_2(m_2)\right\}\right\| = 2^{n(\hat{R}_2-R_2)}, \qquad (443)$$
$$\mathbb{E}\left\|\left\{(\hat{k},\hat{l}):(\hat{k},\hat{l})\in B_2(m_2),\hat{k}\in B_1(m_1)\right\}\right\| = 2^{n(\hat{R}_1+\hat{R}_2-R_1-R_2)}, \qquad (444)$$
which gives
$$(\hat{R}_1-R_1)-R_2 < I(X_3V_2;U_1U_2)-\delta, \qquad (445)$$
$$\hat{R}_2-R_2 < I(X_3V_2;U_2|U_1)-\delta, \qquad (446)$$
$$(\hat{R}_1+\hat{R}_2)-(R_1+R_2) < I(X_3V_2;U_1U_2)-\delta. \qquad (447)$$
Notice that (445) remains inactive because of (447). Equations (427), (431), (433), (446) and (447) can be combined with
$$\hat{R}_1 > R_1, \qquad (448)$$
$$\hat{R}_1+\hat{R}_2 > R_2, \qquad (449)$$
which follow from the binning structure assumed for the generated codebooks. A Fourier-Motzkin elimination procedure can then be applied to eliminate $\hat{R}_1$ and $\hat{R}_2$, obtaining the desired rate region (conditioning the mutual information terms also on $V_1$).
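For the reader's convenience, here is a sketch of how this elimination plays out (our own rewriting; the small $\delta$ terms are dropped and we keep $V_1=\emptyset$, the general case only adding conditioning on $V_1$). Combining (427) with (431), (433) with (446), and (427), (433) with (447):
$$R_1 > \hat{R}_1-I(X_2;U_1) > I(X_1;U_1)-I(X_2;U_1) = I(X_1;U_1|X_2),$$
$$R_2 > \hat{R}_2-I(X_3V_2;U_2|U_1) > I(X_2;U_2|U_1)-I(X_3V_2;U_2|U_1) = I(X_2;U_2|X_3V_2U_1),$$
$$R_1+R_2 > \hat{R}_1+\hat{R}_2-I(X_3V_2;U_1U_2) > I(X_1;U_1)+I(X_2;U_2|U_1)-I(X_3V_2;U_1U_2) = I(X_1X_2;U_1U_2|X_3V_2),$$
where the final equalities use the Markov chains (416), which give $I(U_1;X_1)=I(U_1;X_1X_2)$, $I(U_2;X_2|U_1)=I(U_2;X_2X_3V_2|U_1)$ and $(U_1,U_2) −− (X_1,X_2) −− (X_3,V_2)$. These are exactly the constraints of Theorem 9 with $V_1=\emptyset$, while (448)-(449) only ensure that the binning is nontrivial and do not appear in the final region.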
The following Corollary considers the case in which a genie gives node 2 the value of $M_1$; this case will be important for our main result.

Corollary 5: If a genie gives $M_1$ to node 2, the achievable region $\mathcal{R}_{CBT}^{inner}$ reduces to
$$R_2 > I(X_2;U_2|X_3V_1V_2U_1), \qquad (450)$$
$$R_1+R_2 > I(X_1X_2;U_1U_2|X_3V_1V_2). \qquad (451)$$
The proof of this result is straightforward and is therefore not presented.

APPENDIX C: PROOF OF THEOREM 1

Let us describe the codebook generation, encoding and decoding procedures. We use the following notation. With $M_{i\to\mathcal{S},l}$ we denote the index corresponding to the true description $U^n_{i\to\mathcal{S},l}$ generated at node $i$ at round $l$ and destined to the group of nodes $\mathcal{S}\in\mathcal{C}(\mathcal{M})$ with $i\notin\mathcal{S}$. With $\hat{M}_{i\to\mathcal{S},l}(j)$, where $\mathcal{S}\in\mathcal{C}(\mathcal{M})$, $i\notin\mathcal{S}$, $j\in\mathcal{S}$, we denote the corresponding index estimated at node $j$.

A. Codebook generation

Consider round $l\in[1:K]$. For simplicity, let us consider the descriptions at node 1. We generate $2^{n\hat{R}^{(l)}_{1\to23}}$ independent and identically distributed $n$-length codewords $U^n_{1\to23,l}(m_{1\to23,l},m_{W[1,l]})$ according to
$$U^n_{1\to23,l}(m_{1\to23,l},m_{W[1,l]})\sim\frac{\mathbb{1}\left\{u^n_{1\to23,l}\in\mathcal{T}^n_{[U_{1\to23,l}|W_{[1,l]}]\epsilon(1,23,l)}\left(w^n_{[1,l]}\right)\right\}}{\left\|\mathcal{T}^n_{[U_{1\to23,l}|W_{[1,l]}]\epsilon(1,23,l)}\left(w^n_{[1,l]}\right)\right\|}, \qquad \epsilon(1,23,l)>0, \qquad (452)$$
where $m_{1\to23,l}\in[1:2^{n\hat{R}^{(l)}_{1\to23}}]$ and $m_{W[1,l]}$ denotes the indices of the common descriptions $W_{[1,l]}$ generated in rounds $t\in[1:l-1]$; for example, $m_{W[1,l]}=\{m_{1\to23,t},m_{2\to13,t},m_{3\to12,t}\}_{t=1}^{l-1}$. With $w^n_{[1,l]}$ we denote the set of $n$-length common-information codewords from previous rounds corresponding to the indices $m_{W[1,l]}$. For each $m_{W[3,l-1]}$, consider the set of $2^{n\left(\hat{R}^{(l)}_{1\to23}+\hat{R}^{(l-1)}_{3\to12}\right)}$ codewords $U^n_{1\to23,l}(m_{1\to23,l},m_{3\to12,l-1},m_{W[3,l-1]})$. These $n$-length codewords are distributed independently and uniformly over $2^{nR^{(l)}_{1\to23}}$ bins, denoted by $B_{1\to23,l}\left(p_{1\to23,l},m_{W[3,l-1]}\right)$ with $p_{1\to23,l}\in[1:2^{nR^{(l)}_{1\to23}}]$. Notice that this binning structure is exactly the one used for the cooperative Berger-Tung problem in Appendix B: node 1 distributes the codewords $U^n_{1\to23,l}(m_{1\to23,l},m_{3\to12,l-1},m_{W[3,l-1]})$ in a super-binning structure, which will allow node 2 to recover both $m_{1\to23,l}$ and $m_{3\to12,l-1}$ using the same procedure as in the Berger-Tung problem described above. Notice also that a different super-binning structure is generated for every $m_{W[3,l-1]}$. This is without loss of generality because, at round $l$, nodes 1, 2 and 3 will have a very good estimate of it (see below).

We also generate $2^{n\hat{R}^{(l)}_{1\to2}}$ and $2^{n\hat{R}^{(l)}_{1\to3}}$ independent and identically distributed $n$-length codewords $U^n_{1\to2,l}(m_{1\to2,l},m_{W[2,l]},m_{V[12,l,1]})$ and $U^n_{1\to3,l}(m_{1\to3,l},m_{W[2,l]},m_{V[13,l,1]})$ according to
$$U^n_{1\to2,l}(m_{1\to2,l},m_{W[2,l]},m_{V[12,l,1]})\sim\frac{\mathbb{1}\left\{u^n_{1\to2,l}\in\mathcal{T}^n_{[U_{1\to2,l}|W_{[2,l]}V_{[12,l,1]}]\epsilon(1,2,l)}\left(w^n_{[2,l]},v^n_{[12,l,1]}\right)\right\}}{\left\|\mathcal{T}^n_{[U_{1\to2,l}|W_{[2,l]}V_{[12,l,1]}]\epsilon(1,2,l)}\left(w^n_{[2,l]},v^n_{[12,l,1]}\right)\right\|}, \qquad (453)$$
$$U^n_{1\to3,l}(m_{1\to3,l},m_{W[2,l]},m_{V[13,l,1]})\sim\frac{\mathbb{1}\left\{u^n_{1\to3,l}\in\mathcal{T}^n_{[U_{1\to3,l}|W_{[2,l]}V_{[13,l,1]}]\epsilon(1,3,l)}\left(w^n_{[2,l]},v^n_{[13,l,1]}\right)\right\}}{\left\|\mathcal{T}^n_{[U_{1\to3,l}|W_{[2,l]}V_{[13,l,1]}]\epsilon(1,3,l)}\left(w^n_{[2,l]},v^n_{[13,l,1]}\right)\right\|}, \qquad (454)$$
where $\epsilon(1,2,l)>0$, $\epsilon(1,3,l)>0$, $m_{1\to2,l}\in[1:2^{n\hat{R}^{(l)}_{1\to2}}]$ and $m_{1\to3,l}\in[1:2^{n\hat{R}^{(l)}_{1\to3}}]$.
These codewords are distributed uniformly over $2^{nR^{(l)}_{1\to2}}$ bins, denoted by $B_{1\to2,l}\left(p_{1\to2,l},m_{W[2,l]},m_{V[12,l,1]}\right)$ and indexed by $p_{1\to2,l}\in[1:2^{nR^{(l)}_{1\to2}}]$, and over $2^{nR^{(l)}_{1\to3}}$ bins, denoted by $B_{1\to3,l}\left(p_{1\to3,l},m_{W[2,l]},m_{V[13,l,1]}\right)$ and indexed by $p_{1\to3,l}\in[1:2^{nR^{(l)}_{1\to3}}]$, respectively. Notice that these codewords (which will be used to generate the private descriptions for nodes 2 and 3) are not distributed in a super-binning structure. This is because there is no explicit cooperation between the nodes at this level: node 2 is not compelled to recover the private description that node 1 generates for node 3, and for that reason the private description that node 2 generates for node 3 is not superimposed over the former. Notice that the binning structure used for the codewords to be employed by node 1 imposes the following relationships:
$$R^{(l)}_{1\to23} < \hat{R}^{(l)}_{1\to23}+\hat{R}^{(l-1)}_{3\to12}, \qquad (455)$$
$$R^{(l)}_{1\to2} < \hat{R}^{(l)}_{1\to2}, \qquad (456)$$
$$R^{(l)}_{1\to3} < \hat{R}^{(l)}_{1\to3}. \qquad (457)$$
The common and private codewords to be used at nodes 2 and 3 in every round are generated by following a similar procedure, and their corresponding rates satisfy analogous relationships. Once this is finished, the generated codebooks are revealed to all the nodes in the network.

B. Encoding technique

Consider node 1 at round $l\in[1:K]$. Upon observing $x_1^n$, and given all of its encoding and decoding history up to round $l$, encoder 1 first looks for a codeword $u^n_{1\to23,l}(m_{1\to23,l},\hat{m}_{W[1,l]}(1))$ such that, for $\epsilon_c(1,23,l)>0$,
$$\left(x_1^n,w^n_{[1,l]}(\hat{m}_{W[1,l]}(1)),u^n_{1\to23,l}(m_{1\to23,l},\hat{m}_{W[1,l]}(1))\right)\in\mathcal{T}^n_{[U_{1\to23,l}X_1W_{[1,l]}]\epsilon_c(1,23,l)}. \qquad (458)$$
Notice that some components of $\hat{m}_{W[1,l]}(1)$ are generated at node 1 itself and are therefore perfectly known. If more than one codeword satisfies this condition, we choose the one with the smallest index; if no such codeword exists, we choose an arbitrary index and declare an error. With the chosen index $m_{1\to23,l}$, and with $\hat{m}_{W[3,l-1]}(1)$, we determine the index $p_{1\to23,l}$ of the bin $B_{1\to23,l}\left(p_{1\to23,l},\hat{m}_{W[3,l-1]}(1)\right)$ to which $u^n_{1\to23,l}(m_{1\to23,l},\hat{m}_{3\to12,l-1}(1),\hat{m}_{W[3,l-1]}(1))$ belongs. After this, encoder 1 generates the private descriptions by looking for codewords $u^n_{1\to2,l}(m_{1\to2,l},\hat{m}_{W[2,l]}(1),\hat{m}_{V[12,l,1]}(1))$ and $u^n_{1\to3,l}(m_{1\to3,l},\hat{m}_{W[2,l]}(1),\hat{m}_{V[13,l,1]}(1))$ such that
$$\left(x_1^n,w^n_{[2,l]}(\hat{m}_{W[2,l]}(1)),v^n_{[12,l,1]}(\hat{m}_{V[12,l,1]}(1)),u^n_{1\to2,l}(m_{1\to2,l},\hat{m}_{W[2,l]}(1),\hat{m}_{V[12,l,1]}(1))\right)\in\mathcal{T}^n_{[U_{1\to2,l}X_1W_{[2,l]}V_{[12,l,1]}]\epsilon_c(1,2,l)}, \qquad (459)$$
$$\left(x_1^n,w^n_{[2,l]}(\hat{m}_{W[2,l]}(1)),v^n_{[13,l,1]}(\hat{m}_{V[13,l,1]}(1)),u^n_{1\to3,l}(m_{1\to3,l},\hat{m}_{W[2,l]}(1),\hat{m}_{V[13,l,1]}(1))\right)\in\mathcal{T}^n_{[U_{1\to3,l}X_1W_{[2,l]}V_{[13,l,1]}]\epsilon_c(1,3,l)}, \qquad (460)$$
respectively, where $\epsilon_c(1,2,l)>0$ and $\epsilon_c(1,3,l)>0$. Given $\left(\hat{m}_{W[2,l]}(1),\hat{m}_{V[12,l,1]}(1),\hat{m}_{V[13,l,1]}(1)\right)$, the encoding procedure continues by determining the bin indices $p_{1\to2,l}$ and $p_{1\to3,l}$ to which the generated private descriptions belong. Node 1 then transmits to nodes 2 and 3 the indices $(p_{1\to23,l},p_{1\to2,l},p_{1\to3,l})$. The encoding at nodes 2 and 3 follows along the same lines and is therefore not described.

C. Decoding technique

Consider round $l\in[1:K+1]$ and node 2. During the present and the previous round, node 2 receives $(p_{1\to23,l},p_{3\to12,l-1},p_{1\to2,l},p_{1\to3,l},p_{3\to1,l-1},p_{3\to2,l-1})$. However, only the indices $(p_{1\to23,l},p_{3\to12,l-1},p_{1\to2,l},p_{3\to2,l-1})$ are relevant to it. Knowing this set of indices, node 2 aims to recover the exact values of $(m_{1\to23,l},m_{3\to12,l-1},m_{1\to2,l},m_{3\to2,l-1})$.
This is done through successive decoding: first, the common-information indices are recovered by looking for the unique pair of codewords $u^n_{1\to23,l}(m_{1\to23,l},m_{3\to12,l-1},\hat{m}_{W[3,l-1]}(2))$, $u^n_{3\to12,l-1}(m_{3\to12,l-1},\hat{m}_{W[3,l-1]}(2))$ that satisfies
$$\left(x_2^n,w^n_{[3,l-1]}(\hat{m}_{W[3,l-1]}(2)),v^n_{[12,l,1]}(\hat{m}_{V[12,l,1]}(2)),v^n_{[23,l-1,3]}(\hat{m}_{V[23,l-1,3]}(2)),u^n_{1\to23,l}(m_{1\to23,l},m_{3\to12,l-1},\hat{m}_{W[3,l-1]}(2)),u^n_{3\to12,l-1}(m_{3\to12,l-1},\hat{m}_{W[3,l-1]}(2))\right)\in\mathcal{T}^n_{[U_{1\to23,l}U_{3\to12,l-1}X_2W_{[3,l-1]}V_{[23,l-1,3]}V_{[12,l,1]}]\epsilon_{dc}(2,l)}, \qquad \epsilon_{dc}(2,l)>0, \qquad (461)$$
and whose elements belong to the bins indicated by $p_{1\to23,l}$ and $p_{3\to12,l-1}$. If more than one pair of codewords, or none, satisfies this, we choose a predefined one and declare an error. After this is done, node 2 recovers the private-information indices by looking for codewords $u^n_{1\to2,l}(m_{1\to2,l},\hat{m}_{W[2,l]}(2),\hat{m}_{V[12,l,1]}(2))$ and $u^n_{3\to2,l-1}(m_{3\to2,l-1},\hat{m}_{W[1,l]}(2),\hat{m}_{V[23,l-1,3]}(2))$ which satisfy
$$\left(x_2^n,w^n_{[2,l]}(\hat{m}_{W[2,l]}(2)),v^n_{[12,l,1]}(\hat{m}_{V[12,l,1]}(2)),v^n_{[23,l-1,3]}(\hat{m}_{V[23,l-1,3]}(2)),u^n_{1\to2,l}(m_{1\to2,l},\hat{m}_{W[2,l]}(2),\hat{m}_{V[12,l,1]}(2)),u^n_{3\to2,l-1}(m_{3\to2,l-1},\hat{m}_{W[1,l]}(2),\hat{m}_{V[23,l-1,3]}(2))\right)\in\mathcal{T}^n_{[U_{1\to2,l}U_{3\to2,l-1}X_2W_{[2,l]}V_{[23,l-1,3]}V_{[12,l,1]}]\epsilon_{dp}(2,l)}, \qquad \epsilon_{dp}(2,l)>0, \qquad (462)$$
and which lie in the bins given by $p_{1\to2,l}$ and $p_{3\to2,l-1}$. If more than one pair of codewords, or none, satisfies this, we choose a predefined one and declare an error. The decoding at nodes 1 and 3 is exactly the same and is therefore not described.

D. Lossy reconstructions

When the exchange of information is completed, each node estimates the sources of the other nodes. For instance, node 1 reconstructs the source of node 2 by computing
$$\hat{x}_{12,i}=g_{12}\left(x_{1i},v_{[12,K+1,1]i},w_{[1,K+1]i}\right), \qquad i=1,2,\dots,n, \qquad (463)$$
and similarly for the source of node 3:
$$\hat{x}_{13,i}=g_{13}\left(x_{1i},v_{[13,K+1,1]i},w_{[1,K+1]i}\right), \qquad i=1,2,\dots,n. \qquad (464)$$
Reconstruction at nodes 2 and 3 is done in a similar way, using the adequate reconstruction functions.

E. Error and distortion analysis

In order to keep the expressions simple, in the following, whenever we write a description without its corresponding index, i.e. $U^n_{i\to\mathcal{S},l}$ or $W^n_{[1,l]}$, we assume that the corresponding index is the true one generated at the corresponding node through the encoding procedure detailed above. Consider round $l$ and the event $\mathcal{D}_l=\mathcal{G}_l\cap\mathcal{F}_l$, where, for $\epsilon_l>0$,
$$\mathcal{G}_l=\left\{\left(X_1^n,X_2^n,X_3^n,W^n_{[1,l]},V^n_{[12,l,1]},V^n_{[13,l,1]},V^n_{[23,l,2]}\right)\in\mathcal{T}^n_{[X_1X_2X_3W_{[1,l]}V_{[12,l,1]}V_{[13,l,1]}V_{[23,l,2]}]\epsilon_l}\right\}, \qquad (465)$$
$$\mathcal{F}_l=\left\{\hat{M}_{i\to\mathcal{S},t}(j)=M_{i\to\mathcal{S},t},\ \mathcal{S}\in\mathcal{C}(\mathcal{M}),\ i\notin\mathcal{S},\ j\in\mathcal{S},\ t\in[1:l-1],\ \text{with the exception of } \hat{M}_{3\to12,l-1}(2),\hat{M}_{3\to2,l-1}(2)\right\}. \qquad (466)$$
The event $\mathcal{G}_l$ indicates that all the descriptions generated in the network up to round $l$ are jointly typical with the sources; its occurrence depends mainly on the encoding procedures at the nodes. The event $\mathcal{F}_l$ indicates that, up to round $l$, all nodes were able to recover the true indices of the descriptions, which clearly implies that there were no errors in the decoding procedures at any node of the network. The exception in $\mathcal{F}_l$ concerning $\hat{M}_{3\to12,l-1}(2),\hat{M}_{3\to2,l-1}(2)$ is due to the fact that the decoding of those descriptions at node 2 takes place during round $l$. The occurrence of $\mathcal{D}_l$ guarantees that, at the beginning of round $l$:
• Nodes 1 and 2 share a common path of descriptions $W^n_{[1,l]}\cup V^n_{[12,l,1]}$ which are typical with $(X_1^n,X_2^n,X_3^n)$.
• Nodes 1 and 3 share a common path of descriptions $W^n_{[1,l]}\cup V^n_{[13,l,1]}$ which are typical with $(X_1^n,X_2^n,X_3^n)$.
• Nodes 2 and 3 share a common path of descriptions $W^n_{[3,l-1]}\cup V^n_{[23,l-1,3]}$ which are typical with $(X_1^n,X_2^n,X_3^n)$.
Let us also define the event $\mathcal{E}_l$:
$$\mathcal{E}_l=\{\text{there is at least one error at the encoding or the decoding at some node during round } l\}=\bigcup_{i\in\mathcal{M}}\mathcal{E}_{enc}(i,l)\cup\mathcal{E}_{dec}(i,l), \qquad (467)$$
where $\mathcal{E}_{enc}(i,l)$ contains the errors at the encoding at node $i$ during round $l$, and $\mathcal{E}_{dec}(i,l)$ considers the event that node $i$, during round $l$, fails to recover an index previously generated at another node. For example, at node 1 during round $l$,
$$\mathcal{E}_{enc}(1,l)=\mathcal{E}_{enc}(1,l,23)\cup\mathcal{E}_{enc}(1,l,2)\cup\mathcal{E}_{enc}(1,l,3), \qquad (468)$$
where
$$\mathcal{E}_{enc}(1,l,23)=\left\{\left(X_1^n,W^n_{[1,l]}(\hat{M}_{W[1,l]}(1)),U^n_{1\to23,l}(m_{1\to23,l},\hat{M}_{W[1,l]}(1))\right)\notin\mathcal{T}^n_{[U_{1\to23,l}X_1W_{[1,l]}]\epsilon_c(1,l,23)}\ \forall m_{1\to23,l}\in[1:2^{n\hat{R}^{(l)}_{1\to23}}]\right\}, \qquad (469)$$
$$\mathcal{E}_{enc}(1,l,2)=\left\{\left(X_1^n,W^n_{[2,l]}(\hat{M}_{W[2,l]}(1)),V^n_{[12,l,1]}(\hat{M}_{V[12,l,1]}(1)),U^n_{1\to2,l}(m_{1\to2,l},\hat{M}_{W[2,l]}(1),\hat{M}_{V[12,l,1]}(1))\right)\notin\mathcal{T}^n_{[U_{1\to2,l}X_1W_{[2,l]}V_{[12,l,1]}]\epsilon_c(1,l,2)}\ \forall m_{1\to2,l}\in[1:2^{n\hat{R}^{(l)}_{1\to2}}]\right\}, \qquad (470)$$
$$\mathcal{E}_{enc}(1,l,3)=\left\{\left(X_1^n,W^n_{[2,l]}(\hat{M}_{W[2,l]}(1)),V^n_{[13,l,1]}(\hat{M}_{V[13,l,1]}(1)),U^n_{1\to3,l}(m_{1\to3,l},\hat{M}_{W[2,l]}(1),\hat{M}_{V[13,l,1]}(1))\right)\notin\mathcal{T}^n_{[U_{1\to3,l}X_1W_{[2,l]}V_{[13,l,1]}]\epsilon_c(1,l,3)}\ \forall m_{1\to3,l}\in[1:2^{n\hat{R}^{(l)}_{1\to3}}]\right\}. \qquad (471)$$
The event $\mathcal{E}_{dec}(i,l)$ can be decomposed as
$$\mathcal{E}_{dec}(i,l)=\bigcup_{\mathcal{S}\in\mathcal{C}(\mathcal{M}),\,i\in\mathcal{S}}\ \bigcup_{j:\,j\notin\mathcal{S}}\left\{\hat{M}_{j\to\mathcal{S},l}(i)\ne M_{j\to\mathcal{S},l}\right\}. \qquad (472)$$
At the end of the information-exchange phase we expect the occurrence of $\mathcal{D}_{K+1}\cap\bar{\mathcal{E}}_{K+1}$, where $\mathcal{E}_{K+1}$ is the event of an error during round $K+1$. As during round $K+1$ only node 2 tries to recover the descriptions generated at node 3 during round $K$, we have
$$\mathcal{E}_{K+1}=\mathcal{E}_{dec}(2,K+1)=\left\{\hat{M}_{3\to12,K}(2)\ne M_{3\to12,K} \text{ or } \hat{M}_{3\to2,K}(2)\ne M_{3\to2,K}\right\}. \qquad (473)$$
The occurrence of $\mathcal{D}_{K+1}\cap\bar{\mathcal{E}}_{K+1}$ guarantees that all the descriptions generated during the $K$ rounds of information exchange in the network are jointly typical with the source realizations and that those descriptions can be perfectly recovered at all the nodes. In this way, if we can guarantee that $\Pr\left\{\mathcal{D}_{K+1}\cap\bar{\mathcal{E}}_{K+1}\right\}\to 1$ as $n\to\infty$, then with probability converging to one:
• Nodes 1 and 2 share a common path of descriptions $W^n_{[1,K+1]}\cup V^n_{[12,K+1,1]}$ which are typical with $(X_1^n,X_2^n,X_3^n)$.
• Nodes 1 and 3 share a common path of descriptions $W^n_{[1,K+1]}\cup V^n_{[13,K+1,1]}$ which are typical with $(X_1^n,X_2^n,X_3^n)$.
• Nodes 2 and 3 share a common path of descriptions $W^n_{[1,K+1]}\cup V^n_{[23,K+1,2]}$ which are typical with $(X_1^n,X_2^n,X_3^n)$.
Using standard arguments, the average distortions (over the codebooks) at the reconstruction stages at all the nodes then satisfy the required fidelity constraints, and from there it is straightforward to prove the existence of good codebooks for the network.
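The "standard arguments" invoked for the distortions amount to the usual typical-average step. As a sketch (our own rewriting, where $d_{\max}$ denotes the, necessarily finite, maximum of the per-letter distortion and $\bar{D}_{12}:=\mathbb{E}\big[d\big(X_2,g_{12}(X_1,V_{[12,K+1,1]},W_{[1,K+1]})\big)\big]$ is computed under the single-letter pmf of the scheme), on the event $\mathcal{D}_{K+1}\cap\bar{\mathcal{E}}_{K+1}$ the reconstruction (463) is a per-letter function of sequences that are jointly typical with the sources, so that
$$\frac{1}{n}\sum_{i=1}^n d(x_{2i},\hat{x}_{12,i})\le(1+\epsilon)\,\bar{D}_{12},$$
while on the complementary event the per-letter distortion is at most $d_{\max}$. Hence
$$\mathbb{E}\Big[\frac{1}{n}\sum_{i=1}^n d(X_{2i},\hat{X}_{12,i})\Big]\le(1+\epsilon)\,\bar{D}_{12}+\Pr\Big\{\overline{\mathcal{D}_{K+1}\cap\bar{\mathcal{E}}_{K+1}}\Big\}\,d_{\max},$$
which can be made smaller than $\bar{D}_{12}+\delta$ for any $\delta>0$ once $n$ is large enough; the same bound applies to every other reconstruction in the network.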
In order to prove that $\Pr\left\{\mathcal{D}_{K+1}\cap\bar{\mathcal{E}}_{K+1}\right\}\to 1$ as $n\to\infty$, let us write
$$\Pr\left\{\overline{\mathcal{D}_{K+1}\cap\bar{\mathcal{E}}_{K+1}}\right\}=\Pr\left\{\bar{\mathcal{D}}_{K+1}\cup\mathcal{E}_{K+1}\right\}=\Pr\left\{\bar{\mathcal{D}}_{K+1}\right\}+\Pr\left\{\mathcal{D}_{K+1}\cap\mathcal{E}_{K+1}\right\}\le\Pr\left\{\bar{\mathcal{D}}_{K+1}\cap\mathcal{D}_K\right\}+\Pr\left\{\bar{\mathcal{D}}_K\right\}+\Pr\left\{\mathcal{D}_{K+1}\cap\mathcal{E}_{K+1}\right\}\le\Pr\left\{\bar{\mathcal{D}}_K\right\}+\Pr\left\{\bar{\mathcal{D}}_{K+1}\cap\mathcal{D}_K\cap\bar{\mathcal{E}}_K\right\}+\Pr\left\{\mathcal{D}_K\cap\mathcal{E}_K\right\}+\Pr\left\{\mathcal{D}_{K+1}\cap\mathcal{E}_{K+1}\right\}\le\Pr\left\{\bar{\mathcal{D}}_1\right\}+\sum_{l=1}^{K+1}\Pr\left\{\mathcal{D}_l\cap\mathcal{E}_l\right\}+\sum_{l=1}^{K}\Pr\left\{\bar{\mathcal{D}}_{l+1}\cap\mathcal{D}_l\cap\bar{\mathcal{E}}_l\right\}. \qquad (474)$$
Notice that
$$\mathcal{D}_1=\left\{(X_1^n,X_2^n,X_3^n)\in\mathcal{T}^n_{[X_1X_2X_3]\epsilon_1}\right\}, \qquad \epsilon_1>0. \qquad (475)$$
From Lemma 2 we see that, for every $\epsilon_1>0$, $\Pr\left\{\bar{\mathcal{D}}_1\right\}\to 0$ as $n\to\infty$. It is then easy to see that $\Pr\left\{\mathcal{D}_{K+1}\cap\bar{\mathcal{E}}_{K+1}\right\}\to 1$ will hold if the codebook generation and the encoding and decoding procedures described above allow us to have the following:
1) If $\Pr\{\mathcal{D}_l\}\to 1$ then $\Pr\{\mathcal{D}_{l+1}\}\to 1$, for every $l\in[1:K+1]$.
2) $\Pr\{\mathcal{D}_l\cap\mathcal{E}_l\}\to 0$ for every $l\in[1:K+1]$.
In the following we prove these facts. Observe that, at round $l$, the nodes act sequentially: encoding at node 1 → decoding at node 2 → · · · → encoding at node 3 → decoding at node 1. Then, using (467), we can write condition 2) as
$$\Pr\{\mathcal{D}_l\cap\mathcal{E}_l\}=\Pr\{\mathcal{D}_l\cap\mathcal{E}_{enc}(1,l)\}+\Pr\left\{\mathcal{D}_l\cap\mathcal{E}_{dec}(2,l)\cap\bar{\mathcal{E}}_{enc}(1,l)\right\}+\Pr\left\{\mathcal{D}_l\cap\mathcal{E}_{enc}(2,l)\cap\bar{\mathcal{E}}_{enc}(1,l)\cap\bar{\mathcal{E}}_{dec}(2,l)\right\}+\cdots+\Pr\left\{\mathcal{D}_l\cap\mathcal{E}_{dec}(1,l)\cap\bar{\mathcal{E}}_{enc}(1,l)\cap\cdots\cap\bar{\mathcal{E}}_{enc}(3,l)\right\}. \qquad (476)$$
Assume that at the beginning of round $l$ we have $\Pr\{\mathcal{D}_l\}\to 1$, and let us analyze the encoding procedure at node 1, i.e. the term $\Pr\{\mathcal{D}_l\cap\mathcal{E}_{enc}(1,l)\}$. We can write
$$\Pr\{\mathcal{D}_l\cap\mathcal{E}_{enc}(1,l)\}\le\Pr\{\mathcal{E}_{enc}(1,l,23)\cap\mathcal{D}_l\}+\Pr\left\{\mathcal{E}_{enc}(1,l,2)\cap\mathcal{D}_l\cap\bar{\mathcal{E}}_{enc}(1,l,23)\right\}+\Pr\left\{\mathcal{E}_{enc}(1,l,3)\cap\mathcal{D}_l\cap\bar{\mathcal{E}}_{enc}(1,l,23)\right\}. \qquad (477)$$
From Lemma 1 and the fact that $\lim_{n\to\infty}\Pr\{\mathcal{G}_l\}=1$, we have $\lim_{n\to\infty}\Pr\{\mathcal{A}_l(1,23)\}=1$, where
$$\mathcal{A}_l(1,23)=\left\{(X_1^n,W^n_{[1,l]})\in\mathcal{T}^n_{[X_1W_{[1,l]}]\epsilon_l}\right\}. \qquad (478)$$
Then we can use Lemma 5 to obtain
$$\lim_{n\to\infty}\Pr\{\mathcal{E}_{enc}(1,l,23)\cap\mathcal{D}_l\}=0 \qquad (479)$$
provided that
$$\hat{R}^{(l)}_{1\to23} > I\left(X_1;U_{1\to23,l}\,\middle|\,W_{[1,l]}\right)+\delta_c(1,l,23), \qquad (480)$$
where $\delta_c(1,l,23)$ can be made arbitrarily small. On the other hand, we can write
$$\Pr\left\{\mathcal{E}_{enc}(1,l,2)\cap\mathcal{D}_l\cap\bar{\mathcal{E}}_{enc}(1,l,23)\right\}\le\Pr\{\mathcal{E}_{enc}(1,l,2)\cap\mathcal{G}_l(1,2)\cap\mathcal{F}_l\}+\Pr\left\{\bar{\mathcal{G}}_l(1,2)\right\}, \qquad (481)$$
where, for $\epsilon_l(1,2)>0$,
$$\mathcal{G}_l(1,2)=\left\{\left(X_1^n,X_2^n,X_3^n,W^n_{[2,l]},V^n_{[12,l,1]},V^n_{[13,l,1]},V^n_{[23,l,2]}\right)\in\mathcal{T}^n_{[X_1X_2X_3W_{[2,l]}V_{[12,l,1]}V_{[13,l,1]}V_{[23,l,2]}]\epsilon_l(1,2)}\right\}. \qquad (482)$$
As explained before, $\Pr\{\mathcal{G}_l\}\to 1$. Then, from condition (480), we have
$$\Pr\left\{\bar{\mathcal{E}}_{enc}(1,l,23)\cap\mathcal{F}_l\right\}=\Pr\left\{(X_1^n,W^n_{[2,l]})\in\mathcal{T}^n_{[X_1W_{[2,l]}]\epsilon_c(1,l,23)}\right\}\xrightarrow[n\to\infty]{} 1. \qquad (483)$$
Moreover, from the codebook generation and the encoding procedure proposed, it is immediate to use Lemma 8 to show that
$$\Pr\left\{U^n_{1\to23,l}=u^n_{1\to23,l}\,\middle|\,x_1^n,w^n_{[1,l]},\bar{\mathcal{E}}_{enc}(1,l,23)\cap\mathcal{F}_l\right\}=\frac{\mathbb{1}\left\{u^n_{1\to23,l}\in\mathcal{T}^n_{[U_{1\to23,l}|X_1W_{[1,l]}]\epsilon_c(1,23,l)}\left(x_1^n,w^n_{[1,l]}\right)\right\}}{\left\|\mathcal{T}^n_{[U_{1\to23,l}|X_1W_{[1,l]}]\epsilon_c(1,23,l)}\left(x_1^n,w^n_{[1,l]}\right)\right\|}. \qquad (484)$$
Then, from the Markov chain
$$U_{1\to23,l} −− (X_1,W_{[1,l]}) −− (X_2,X_3,V_{[12,l,1]},V_{[13,l,1]},V_{[23,l,2]}) \qquad (485)$$
and the Markov Lemma 7, for sufficiently small $(\epsilon_c(1,l,23),\epsilon_l,\epsilon_l(1,2))$ and after some minor manipulations, we obtain
$$\Pr\{\mathcal{G}_l(1,2)\}\xrightarrow[n\to\infty]{} 1. \qquad (486)$$
From (481) it is clear that we need to analyze the term $\Pr\{\mathcal{E}_{enc}(1,l,2)\cap\mathcal{G}_l(1,2)\cap\mathcal{F}_l\}$. Similarly as before, $\lim_{n\to\infty}\Pr\{\mathcal{A}_l(1,2)\}=1$, where
$$\mathcal{A}_l(1,2)=\left\{\left(X_1^n,W^n_{[2,l]},V^n_{[12,l,1]}\right)\in\mathcal{T}^n_{[X_1W_{[2,l]}V_{[12,l,1]}]\epsilon_l(1,2)}\right\}, \qquad (487)$$
which allows us to write
$$\Pr\{\mathcal{E}_{enc}(1,l,2)\cap\mathcal{G}_l(1,2)\cap\mathcal{F}_l\}\le\Pr\{\mathcal{E}_{enc}(1,l,2)\cap\mathcal{A}_l(1,2)\cap\mathcal{F}_l\}. \qquad (488)$$
Using Lemma 5 again, we obtain $\Pr\{\mathcal{E}_{enc}(1,l,2)\cap\mathcal{G}_l(1,2)\cap\mathcal{F}_l\}\to 0$ provided that
$$\hat{R}^{(l)}_{1\to2} > I\left(X_1;U_{1\to2,l}\,\middle|\,W_{[2,l]}V_{[12,l,1]}\right)+\delta_c(1,l,2), \qquad (489)$$
where $\delta_c(1,l,2)$ can be made arbitrarily small. For the analysis of $\Pr\left\{\mathcal{E}_{enc}(1,l,3)\cap\mathcal{D}_l\cap\bar{\mathcal{E}}_{enc}(1,l,23)\right\}$ we follow the same procedure. We can write
$$\Pr\left\{\mathcal{E}_{enc}(1,l,3)\cap\mathcal{D}_l\cap\bar{\mathcal{E}}_{enc}(1,l,23)\right\}\le\Pr\{\mathcal{E}_{enc}(1,l,3)\cap\mathcal{G}_l(1,3)\cap\mathcal{F}_l\}+\Pr\left\{\bar{\mathcal{G}}_l(1,3)\right\} \qquad (490)$$
with
$$\mathcal{G}_l(1,3)=\left\{\left(X_1^n,X_2^n,X_3^n,W^n_{[2,l]},V^n_{[12,l,2]},V^n_{[13,l,1]},V^n_{[23,l,2]}\right)\in\mathcal{T}^n_{[X_1X_2X_3W_{[2,l]}V_{[12,l,2]}V_{[13,l,1]}V_{[23,l,2]}]\epsilon_l(1,3)}\right\}. \qquad (491)$$
Using the Markov chain
$$U_{1\to2,l} −− (X_1,W_{[2,l]},V_{[12,l,1]}) −− (X_2,X_3,V_{[13,l,1]},V_{[23,l,2]}), \qquad (492)$$
the fact that $\Pr\{\mathcal{G}_l(1,2)\}\to 1$, the Markov Lemma 7, and Lemma 8, for appropriately chosen values of $(\epsilon_c(1,l,2),\epsilon_l(1,2),\epsilon_l(1,3))$, we have
$$\Pr\{\mathcal{G}_l(1,3)\}\xrightarrow[n\to\infty]{} 1. \qquad (493)$$
Following exactly the same reasoning as above, in order to have
$$\Pr\left\{\mathcal{E}_{enc}(1,l,3)\cap\mathcal{D}_l\cap\bar{\mathcal{E}}_{enc}(1,l,23)\right\}\xrightarrow[n\to\infty]{} 0, \qquad (494)$$
besides conditions (480) and (489) we need
$$\hat{R}^{(l)}_{1\to3} > I\left(X_1;U_{1\to3,l}\,\middle|\,W_{[2,l]}V_{[13,l,1]}\right)+\delta_c(1,l,3) \qquad (495)$$
for sufficiently small $\delta_c(1,l,3)$. With these conditions we have proved that the encoding procedure at node 1 during round $l$ gives
$$\Pr\{\mathcal{D}_l\cap\mathcal{E}_{enc}(1,l)\}\xrightarrow[n\to\infty]{} 0. \qquad (496)$$
Another instance of the Markov Lemma, jointly with the Markov chain
$$U_{1\to3,l} −− (X_1,W_{[2,l]},V_{[13,l,1]}) −− (X_2,X_3,V_{[12,l,2]},V_{[23,l,2]}) \qquad (497)$$
and Lemma 8, allows us to obtain
$$\Pr\{\mathcal{G}_l(2,13)\}\xrightarrow[n\to\infty]{} 1, \qquad (498)$$
where
$$\mathcal{G}_l(2,13)=\left\{\left(X_1^n,X_2^n,X_3^n,W^n_{[2,l]},V^n_{[12,l,2]},V^n_{[13,l,3]},V^n_{[23,l,2]}\right)\in\mathcal{T}^n_{[X_1X_2X_3W_{[2,l]}V_{[12,l,2]}V_{[13,l,3]}V_{[23,l,2]}]\epsilon_l(2,13)}\right\}. \qquad (499)$$
At this point we have to analyze the decoding at node 2. If that decoding is successful, then, with (498), the analysis of the encoding at node 2 follows the same lines as above (observe that $\mathcal{G}_l(2,13)$ plays, for the encoding at node 2, the same role that $\mathcal{G}_l$ plays for the encoding at node 1 during round $l$). The same can be said of the encoding at node 3 (after successful decoding). In this way, we finish round $l$ with
$$\Pr\{\mathcal{D}_{l+1}\}=\Pr\{\mathcal{G}_{l+1}\cap\mathcal{F}_{l+1}\}\xrightarrow[n\to\infty]{} 1, \qquad (500)$$
which is one of the results we wanted. Analyzing now the decoding at node 2 (from which the analysis of the decoding at nodes 1 and 3 is easily extrapolated), we will also obtain $\Pr\{\mathcal{D}_l\cap\mathcal{E}_l\}\to 0$, which is the other required result.

The decoding at each node follows a successive-decoding approach. Decoder 2 first tries to find the common descriptions $M_{1\to23,l}$ and $M_{3\to12,l-1}$; then it tries to find the private descriptions $M_{1\to2,l}$ and $M_{3\to2,l-1}$ (using, of course, the previously recovered common descriptions as side information). Clearly, the use of joint decoding could improve the rate region. However, besides being more difficult to analyze, that strategy would give rise to a more complex rate region: the joint-decoding region contains several sum-rate constraints that mix common and private rates, whereas successive decoding leads to a rate region in which the sum-rate constraints involve only common rates or only private rates, which is easier to analyze and to understand. In order to analyze the decoding, we can write
$$\Pr\left\{\mathcal{D}_l\cap\mathcal{E}_{dec}(2,l)\cap\bar{\mathcal{E}}_{enc}(1,l)\right\}\le\Pr\{\mathcal{E}_{dec}(2,l)\cap\mathcal{F}_l\cap\mathcal{G}_l(2,13)\}+\Pr\left\{\bar{\mathcal{G}}_l(2,13)\right\}. \qquad (501)$$
As $\Pr\{\mathcal{G}_l(2,13)\}\to 1$, we can concentrate our effort on the first term. The event $\mathcal{E}_{dec}(2,l)$ can be written as
$$\mathcal{E}_{dec}(2,l)=\mathcal{H}_{common}(2,l)\cup\mathcal{H}_{private}(2,l), \qquad (502)$$
where
$$\mathcal{H}_{common}(2,l)=\left\{\left(\hat{M}_{3\to12,l-1}(2),\hat{M}_{1\to23,l}(2)\right)\ne\left(M_{3\to12,l-1},M_{1\to23,l}\right)\right\}, \qquad (503)$$
$$\mathcal{H}_{private}(2,l)=\left\{\left(\hat{M}_{3\to2,l-1}(2),\hat{M}_{1\to2,l}(2)\right)\ne\left(M_{3\to2,l-1},M_{1\to2,l}\right)\right\}. \qquad (504)$$
From these definitions we can easily deduce that
$$\Pr\{\mathcal{E}_{dec}(2,l)\cap\mathcal{F}_l\cap\mathcal{G}_l(2,13)\}=\Pr\{\mathcal{H}_{common}(2,l)\cap\mathcal{F}_l\cap\mathcal{G}_l(2,13)\}+\Pr\left\{\mathcal{H}_{private}(2,l)\cap\mathcal{F}_l\cap\mathcal{G}_l(2,13)\cap\bar{\mathcal{H}}_{common}(2,l)\right\}\le\Pr\{\mathcal{K}_{common}(2,l)\}+\Pr\{\mathcal{K}_{private}(2,l)\}, \qquad (505)$$
where
$$\mathcal{K}_{common}(2,l)=\left\{\exists(\tilde{m}_{1\to23,l},\tilde{m}_{3\to12,l-1})\ne(M_{1\to23,l},M_{3\to12,l-1}),\ (\tilde{m}_{1\to23,l},\tilde{m}_{3\to12,l-1})\in B_{1\to23,l}(P_{1\to23,l})\times B_{3\to12,l-1}(P_{3\to12,l-1}):\ \left(X_2^n,W^n_{[3,l-1]},V^n_{[23,l-1,3]},V^n_{[12,l,1]},U^n_{1\to23,l}(\tilde{m}_{1\to23,l},\tilde{m}_{3\to12,l-1}),U^n_{3\to12,l-1}(\tilde{m}_{3\to12,l-1})\right)\in\mathcal{T}^n_{[U_{1\to23,l}U_{3\to12,l-1}X_2W_{[3,l-1]}V_{[23,l-1,3]}V_{[12,l,1]}]\epsilon_{dc}(2,l)}\right\}, \qquad (506)$$
$$\mathcal{K}_{private}(2,l)=\left\{\exists(\tilde{m}_{1\to2,l},\tilde{m}_{3\to2,l-1})\ne(M_{1\to2,l},M_{3\to2,l-1}),\ (\tilde{m}_{1\to2,l},\tilde{m}_{3\to2,l-1})\in B_{1\to2,l}(P_{1\to2,l})\times B_{3\to2,l-1}(P_{3\to2,l-1}):\ \left(X_2^n,W^n_{[2,l]},V^n_{[23,l-1,3]},V^n_{[12,l,1]},U^n_{1\to2,l}(\tilde{m}_{1\to2,l}),U^n_{3\to2,l-1}(\tilde{m}_{3\to2,l-1})\right)\in\mathcal{T}^n_{[U_{1\to2,l}U_{3\to2,l-1}X_2W_{[2,l]}V_{[23,l-1,3]}V_{[12,l,1]}]\epsilon_{dp}(2,l)}\right\}, \qquad (507)$$
where $\epsilon_{dc}(2,l)$ and $\epsilon_{dp}(2,l)$ are carefully chosen (using Lemma 1, so that $\mathcal{G}_l(2,13)\subseteq\left\{\left(X_2^n,W^n_{[2,l]},V^n_{[23,l-1,3]},V^n_{[12,l,1]}\right)\in\mathcal{T}^n_{[X_2W_{[2,l]}V_{[12,l,1]}V_{[23,l-1,3]}]\epsilon_{dc}(2,l)}\right\}$ and $\mathcal{G}_l(2,13)\subseteq\left\{\left(X_2^n,W^n_{[2,l]},V^n_{[23,l,2]},V^n_{[12,l,2]}\right)\in\mathcal{T}^n_{[X_2W_{[2,l]}V_{[12,l,2]}V_{[23,l,2]}]\epsilon_{dp}(2,l)}\right\}$), and, to save notation, we indicate only the indices to be recovered, i.e. $U^n_{1\to23,l}(\tilde{m}_{1\to23,l},\tilde{m}_{3\to12,l-1})\equiv U^n_{1\to23,l}(\tilde{m}_{1\to23,l},\tilde{m}_{3\to12,l-1},M_{W[3,l-1]})$.

Consider first the recovery of the common information. Node 2 has to recover two indices from a binning structure like the one in the cooperative Berger-Tung problem described in Appendix B; Fig. 10 represents the problem seen at decoder 2. Node 3 generates a common description at rate $R^{(l-1)}_{3\to12}$ using $W^n_{[3,l-1]}$ as side information. Similarly, node 1, after decoding the common description from node 3, generates its own description using the recovered one and also $W^n_{[3,l-1]}$ as side information. All these operations are done using the super-binning structure of the cooperative Berger-Tung problem in Appendix B. Then node 2, using $(X_2^n,W^n_{[3,l-1]},V^n_{[12,l,1]},V^n_{[23,l-1,3]})$ as side information, tries to recover the descriptions generated at nodes 3 and 1.

[Figure 10: Cooperative Berger-Tung decoding problem for node 2. Encoder 3 (input $X_3^n$, side information $W^n_{[3,l-1]}$) and encoder 1 (input $X_1^n$, side information $W^n_{[3,l-1]}$) send $U^n_{3\to12,l-1}$ and $U^n_{1\to23,l}$ at rates $R^{(l-1)}_{3\to12}$ and $R^{(l)}_{1\to23}$ to decoder 2, which observes $(X_2^n,W^n_{[3,l-1]},V^n_{[12,l,1]},V^n_{[23,l-1,3]})$.]

Remember that the encoding procedure at nodes 1 and 3 requires
$$\hat{R}^{(l-1)}_{3\to12} > I\left(X_3;U_{3\to12,l-1}\,\middle|\,W_{[3,l-1]}\right)+\delta_c(3,l-1,12), \qquad (508)$$
$$\hat{R}^{(l)}_{1\to23} > I\left(X_1;U_{1\to23,l}\,\middle|\,W_{[1,l]}\right)+\delta_c(1,l,23), \qquad (509)$$
$$R^{(l)}_{1\to23} < \hat{R}^{(l)}_{1\to23}+\hat{R}^{(l-1)}_{3\to12}, \qquad (510)$$
and that the Markov chains
$$U_{3\to12,l-1} −− (X_3,W_{[3,l-1]}) −− (X_1,X_2,V_{[12,l,1]},V_{[23,l-1,3]}), \qquad (511)$$
$$U_{1\to23,l} −− (X_1,W_{[1,l]}) −− (X_2,X_3,V_{[12,l,1]},V_{[23,l,2]}) \qquad (512)$$
are implied by the Markov chains in the conditions of Theorem 1.
In this way, we can use the results in Appendix B to show that the following rates imply $\Pr\{\mathcal{K}_{common}(2,l)\}\to 0$:
$$R^{(l)}_{1\to23} > I\left(X_1;U_{1\to23,l}\,\middle|\,X_2W_{[1,l]}V_{[23,l-1,3]}V_{[12,l,1]}\right)+\delta_{dc}(2,l), \qquad (513)$$
$$R^{(l)}_{1\to23}+R^{(l-1)}_{3\to12} > I\left(X_1X_3;U_{1\to23,l}U_{3\to12,l-1}\,\middle|\,X_2W_{[3,l-1]}V_{[23,l-1,3]}V_{[12,l,1]}\right)+\delta'_{dc}(2,l), \qquad (514)$$
where $\delta_{dc}(2,l)$ and $\delta'_{dc}(2,l)$ can be made arbitrarily small. (Here we used the Corollary of Theorem 9; that is, we assumed that node 1 knows perfectly the value of $M_{3\to12,l-1}$. This follows from the fact that, at the beginning of round $l$, the probability of decoding errors at previous rounds at all nodes goes to zero as $n\to\infty$. In this way, the constraint on the rate $R_{3\to12,l-1}$ that should be considered according to Theorem 9 is not needed; constraints on $R_{3\to12,l-1}$ arise when, at node 1, we consider the recovery of $M_{3\to12,l-1}$ and $M_{2\to13,l-1}$. For that reason, the analysis carried out is valid, and it also avoids a lengthy and difficult Fourier-Motzkin procedure to eliminate $\hat{R}^{(l)}_{1\to23},\hat{R}^{(l)}_{2\to13},\hat{R}^{(l)}_{3\to12}$ for $l\in[1:K]$.)

The decoding of the private descriptions can be seen as a standard Berger-Tung decoding problem (see Fig. 11), where the binning used to transmit the descriptions generated at nodes 3 and 1 is not cooperative (in the sense of Theorem 9), unlike the case of the common descriptions.

[Figure 11: Berger-Tung decoding problem for node 2 when it recovers the private descriptions generated at nodes 1 and 3. Encoder 3 (side information $(W^n_{[3,l-1]},V^n_{[23,l-1,3]})$) and encoder 1 (side information $(W^n_{[3,l-1]},V^n_{[12,l,1]})$) send $U^n_{3\to2,l-1}$ and $U^n_{1\to2,l}$ at rates $R^{(l-1)}_{3\to2}$ and $R^{(l)}_{1\to2}$ to decoder 2, which observes $(X_2^n,W^n_{[3,l-1]},V^n_{[12,l,1]},V^n_{[23,l-1,3]})$.]

Lemma 6 can easily be used to analyze $\Pr\{\mathcal{K}_{private}(2,l)\}$. The following conditions guarantee that $\Pr\{\mathcal{K}_{private}(2,l)\}\to 0$:
$$\hat{R}^{(l)}_{1\to2} < R^{(l)}_{1\to2}+I\left(U_{1\to2,l};X_2V_{[23,l,2]}\,\middle|\,W_{[2,l]}V_{[12,l,1]}\right)-\delta_{dp}(2,l), \qquad (515)$$
$$\hat{R}^{(l-1)}_{3\to2} < R^{(l-1)}_{3\to2}+I\left(U_{3\to2,l-1};X_2U_{1\to23,l}V_{[12,l,2]}\,\middle|\,W_{[1,l]}V_{[23,l-1,3]}\right)-\delta'_{dp}(2,l), \qquad (516)$$
$$\hat{R}^{(l-1)}_{3\to2}+\hat{R}^{(l)}_{1\to2} < R^{(l-1)}_{3\to2}+R^{(l)}_{1\to2}+I\left(U_{1\to2,l};X_2V_{[23,l,2]}\,\middle|\,W_{[2,l]}V_{[12,l,1]}\right)+I\left(U_{3\to2,l-1};X_2U_{1\to23,l}V_{[12,l,2]}\,\middle|\,W_{[1,l]}V_{[23,l-1,3]}\right)-I\left(U_{3\to2,l-1};U_{1\to2,l}\,\middle|\,W_{[2,l]}V_{[23,l-1,3]}V_{[12,l,1]}X_2\right)-\delta''_{dp}(2,l), \qquad (517)$$
where $\delta_{dp}(2,l)$, $\delta'_{dp}(2,l)$ and $\delta''_{dp}(2,l)$ can be made arbitrarily small. Then, combining all the obtained results, we have
$$\Pr\left\{\mathcal{D}_l\cap\mathcal{E}_{dec}(2,l)\cap\bar{\mathcal{E}}_{enc}(1,l)\right\}\xrightarrow[n\to\infty]{} 0. \qquad (518)$$
At this point the situation is as it was at the encoding stage at node 1, and all the steps can be repeated with minor modifications, proving the desired results at the end of round $l$:
$$\Pr\{\mathcal{D}_{l+1}\}\xrightarrow[n\to\infty]{} 1, \qquad \Pr\{\mathcal{D}_l\cap\mathcal{E}_l\}\xrightarrow[n\to\infty]{} 0. \qquad (519)$$
The remaining rate conditions are as follows:
• Encoding at node 2:
$$\hat{R}^{(l)}_{2\to13} > I\left(X_2;U_{2\to13,l}\,\middle|\,W_{[2,l]}\right)+\delta_c(2,l,13), \qquad (520)$$
$$\hat{R}^{(l)}_{2\to1} > I\left(X_2;U_{2\to1,l}\,\middle|\,W_{[3,l]}V_{[12,l,2]}\right)+\delta_c(2,l,1), \qquad (521)$$
$$\hat{R}^{(l)}_{2\to3} > I\left(X_2;U_{2\to3,l}\,\middle|\,W_{[3,l]}V_{[23,l,2]}\right)+\delta_c(2,l,3). \qquad (522)$$
• Decoding at node 3:
$$R^{(l)}_{2\to13} > I\left(X_2;U_{2\to13,l}\,\middle|\,X_3W_{[2,l]}V_{[13,l,1]}V_{[23,l,2]}\right)+\delta_{dc}(3,l), \qquad (523)$$
$$R^{(l)}_{2\to13}+R^{(l)}_{1\to23} > I\left(X_1X_2;U_{1\to23,l}U_{2\to13,l}\,\middle|\,X_3W_{[1,l]}V_{[13,l,1]}V_{[23,l,2]}\right)+\delta''_{dc}(3,l), \qquad (524)$$
$$\hat{R}^{(l)}_{2\to3} < R^{(l)}_{2\to3}+I\left(U_{2\to3,l};X_3V_{[13,l,3]}\,\middle|\,W_{[3,l]}V_{[23,l,2]}\right)-\delta_{dp}(3,l), \qquad (525)$$
$$\hat{R}^{(l)}_{1\to3} < R^{(l)}_{1\to3}+I\left(U_{1\to3,l};X_3U_{2\to13,l}V_{[23,l,3]}\,\middle|\,W_{[2,l]}V_{[13,l,1]}\right)-\delta'_{dp}(3,l), \qquad (526)$$
$$\hat{R}^{(l)}_{1\to3}+\hat{R}^{(l)}_{2\to3} < R^{(l)}_{1\to3}+R^{(l)}_{2\to3}+I\left(U_{2\to3,l};X_3V_{[13,l,3]}\,\middle|\,W_{[3,l]}V_{[23,l,2]}\right)+I\left(U_{1\to3,l};X_3U_{2\to13,l}V_{[23,l,3]}\,\middle|\,W_{[2,l]}V_{[13,l,1]}\right)-I\left(U_{1\to3,l};U_{2\to3,l}\,\middle|\,W_{[3,l]}V_{[23,l,2]}V_{[13,l,1]}X_3\right)-\delta''_{dp}(3,l). \qquad (527)$$
• Encoding at node 3:
$$\hat{R}^{(l)}_{3\to12} > I\left(X_3;U_{3\to12,l}\,\middle|\,W_{[3,l]}\right)+\delta_c(3,l,12), \qquad (528)$$
$$\hat{R}^{(l)}_{3\to1} > I\left(X_3;U_{3\to1,l}\,\middle|\,W_{[1,l+1]}V_{[13,l,3]}\right)+\delta_c(3,l,1), \qquad (529)$$
$$\hat{R}^{(l)}_{3\to2} > I\left(X_3;U_{3\to2,l}\,\middle|\,W_{[1,l+1]}V_{[23,l,3]}\right)+\delta_c(3,l,2). \qquad (530)$$
• Decoding at node 1:
$$R^{(l)}_{3\to12} > I\left(X_3;U_{3\to12,l}\,\middle|\,X_1W_{[3,l]}V_{[12,l,2]}V_{[13,l,3]}\right)+\delta_{dc}(1,l), \qquad (531)$$
$$R^{(l)}_{3\to12}+R^{(l)}_{2\to13} > I\left(X_2X_3;U_{2\to13,l}U_{3\to12,l}\,\middle|\,X_1W_{[2,l]}V_{[12,l,2]}V_{[13,l,3]}\right)+\delta''_{dc}(1,l), \qquad (532)$$
$$\hat{R}^{(l)}_{3\to1} < R^{(l)}_{3\to1}+I\left(U_{3\to1,l};X_1V_{[12,l+1,1]}\,\middle|\,W_{[1,l+1]}V_{[13,l,3]}\right)-\delta_{dp}(1,l), \qquad (533)$$
$$\hat{R}^{(l)}_{2\to1} < R^{(l)}_{2\to1}+I\left(U_{2\to1,l};X_1U_{3\to12,l}V_{[13,l+1,1]}\,\middle|\,W_{[3,l]}V_{[12,l,2]}\right)-\delta'_{dp}(1,l), \qquad (534)$$
$$\hat{R}^{(l)}_{2\to1}+\hat{R}^{(l)}_{3\to1} < R^{(l)}_{2\to1}+R^{(l)}_{3\to1}+I\left(U_{3\to1,l};X_1V_{[12,l+1,1]}\,\middle|\,W_{[1,l+1]}V_{[13,l,3]}\right)+I\left(U_{2\to1,l};X_1U_{3\to12,l}V_{[13,l+1,1]}\,\middle|\,W_{[3,l]}V_{[12,l,2]}\right)-I\left(U_{2\to1,l};U_{3\to1,l}\,\middle|\,W_{[1,l+1]}V_{[12,l,2]}V_{[13,l,3]}X_1\right)-\delta''_{dp}(1,l). \qquad (535)$$
The final private-rate equations in Theorem 1 follow from a rather simple Fourier-Motzkin elimination procedure [22].

REFERENCES

[1] D. Slepian and J. Wolf, "Noiseless coding of correlated information sources," IEEE Transactions on Information Theory, vol. 19, no. 4, pp. 471-480, 1973.
[2] A. Wyner and J. Ziv, "The rate-distortion function for source coding with side information at the decoder," IEEE Transactions on Information Theory, vol. 22, no. 1, pp. 1-10, 1976.
[3] T. Berger, "Multiterminal source coding," in The Information Theory Approach to Communications, G. Longo, Ed., CISM Courses and Lectures, vol. 229. New York: Springer-Verlag, 1978, pp. 171-231.
[4] S. Y. Tung, "Multiterminal source coding," Ph.D. dissertation, Electrical Engineering, Cornell University, Ithaca, NY, May 1978.
[5] T. Berger and R. Yeung, "Multiterminal source encoding with one distortion criterion," IEEE Transactions on Information Theory, vol. 35, no. 2, pp. 228-236, Mar. 1989.
[6] T. Berger, Z. Zhang, and H. Viswanathan, "The CEO problem [multiterminal source coding]," IEEE Transactions on Information Theory, vol. 42, no. 3, pp. 887-902, 1996.
[7] Y. Oohama, "The rate-distortion function for the quadratic Gaussian CEO problem," IEEE Transactions on Information Theory, vol. 44, no. 3, pp. 1057-1070, 1998.
[8] A. Wagner, S. Tavildar, and P. Viswanath, "Rate region of the quadratic Gaussian two-encoder source-coding problem," IEEE Transactions on Information Theory, vol. 54, no. 5, pp. 1938-1961, May 2008.
[9] A. Wagner, B. Kelly, and Y. Altug, "Distributed rate-distortion with common components," IEEE Transactions on Information Theory, vol. 57, no. 7, pp. 4035-4057, 2011.
[10] C. Heegard and T. Berger, "Rate distortion when side information may be absent," IEEE Transactions on Information Theory, vol. 31, no. 6, pp. 727-734, Nov. 1985.
[11] R. Timo, T. Chan, and A. Grant, "Rate distortion with side-information at many decoders," IEEE Transactions on Information Theory, vol. 57, no. 8, pp. 5240-5257, Aug. 2011.
[12] R. Timo, A. Grant, and G. Kramer, "Lossy broadcasting with complementary side information," IEEE Transactions on Information Theory, vol. 59, no. 1, pp. 104-131, Jan. 2013.
[13] A. Kaspi, "Two-way source coding with a fidelity criterion," IEEE Transactions on Information Theory, vol. 31, no. 6, pp. 735-740, Nov. 1985.
[14] N. Ma and P. Ishwar, "Interaction strictly improves the Wyner-Ziv rate-distortion function," in Proc. IEEE International Symposium on Information Theory (ISIT), June 2010, pp. 61-65.
[15] H. Permuter, Y. Steinberg, and T. Weissman, "Two-way source coding with a helper," IEEE Transactions on Information Theory, vol. 56, no. 6, pp. 2905-2919, 2010.
[16] N. Ma and P. Ishwar, "Some results on distributed source coding for interactive function computation," IEEE Transactions on Information Theory, vol. 57, no. 9, pp. 6180-6195, Sep. 2011.
[17] N. Ma, P. Ishwar, and P. Gupta, "Interactive source coding for function computation in collocated networks," IEEE Transactions on Information Theory, vol. 58, no. 7, pp. 4289-4305, Jul. 2012.
[18] L. Sankar and H. Poor, "Distributed estimation in multi-agent networks," in Proc. IEEE International Symposium on Information Theory (ISIT), July 2012, pp. 329-333.
[19] M. Gastpar, "The Wyner-Ziv problem with multiple sources," IEEE Transactions on Information Theory, vol. 50, no. 11, pp. 2762-2768, Nov. 2004.
[20] A. Wyner, "The common information of two dependent random variables," IEEE Transactions on Information Theory, vol. 21, no. 2, pp. 163-179, Mar. 1975.
[21] T. Cover and J. Thomas, Elements of Information Theory, 2nd ed. Wiley-Interscience, 2006.
[22] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge, U.K.: Cambridge Univ. Press, 2011.
[23] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. New York: Academic, 1981.
[24] P. Piantanida, L. Rey Vega, and A. Hero, "A proof of the generalized Markov lemma with countable infinite sources," in Proc. IEEE International Symposium on Information Theory (ISIT), July 2014.
[25] W. Uhlmann, "Vergleich der hypergeometrischen mit der Binomial-Verteilung," Metrika, vol. 10, no. 1, pp. 145-158, 1966.
[26] A. Kaspi and T. Berger, "Rate-distortion for correlated sources with partially separated encoders," IEEE Transactions on Information Theory, vol. 28, no. 6, pp. 828-840, Nov. 1982.