
Spectral Efficiency of Mixed-ADC Massive MIMO

2018, IEEE Transactions on Signal Processing

https://doi.org/10.1109/TSP.2018.2833807

We study the spectral efficiency (SE) of a mixed-ADC massive MIMO system in which K single-antenna users communicate with a base station (BS) equipped with M antennas connected to N high-resolution ADCs and M − N one-bit ADCs. This architecture has been proposed as an approach for realizing massive MIMO systems with reasonable power consumption. First, we investigate the effectiveness of mixed-ADC architectures in overcoming the channel estimation error caused by coarse quantization. For the channel estimation phase, we study to what extent one can combat the SE loss by exploiting just N ≪ M pairs of high-resolution ADCs. We extend the round-robin training scheme for mixed-ADC systems to include both high-resolution and one-bit quantized observations. Then, we analyze the impact of the resulting channel estimation error in the data detection phase. We consider random high-resolution ADC assignment and also analyze a simple antenna selection scheme to increase the SE. Analytical expressions are derived for the SE for maximum ratio combining (MRC) and numerical results are presented for zero-forcing (ZF) detection. Performance comparisons are made against systems with uniform ADC resolution and against mixed-ADC systems without round-robin training to illustrate under what conditions each approach provides the greatest benefit.

arXiv:1802.10259v2 [cs.IT] 30 Apr 2018

Hessam Pirzadeh, Student Member, IEEE, and A. Lee Swindlehurst, Fellow, IEEE

Index Terms—Massive MIMO, analog-to-digital converter, mixed-ADC, spectral efficiency.

I. INTRODUCTION

The seminal work of Marzetta introduced massive MIMO as a promising architecture for future wireless systems [2]. In the limit of an infinite number of base station (BS) antennas, it was shown that massive MIMO can substantially increase the network capacity.
Another key potential of massive MIMO systems, which has also made them interesting from a practical standpoint, is their ability to achieve this goal with inexpensive, low-power components [3], [4]. However, preliminary studies on massive MIMO systems have for the most part analyzed their performance only under the assumption of perfect hardware [5], [6]. The impact of hardware imperfections and nonlinearities on massive MIMO systems has recently been investigated in [7]-[12]. Although it is well known that the dynamic power in massive MIMO systems can be scaled down in proportion to $\sqrt{M}$, where $M$ denotes the number of BS antennas, the static power consumption at the BS increases in proportion to $M$ [8]. Hence, hardware-aware design that accounts for the power consumption at the BS appears necessary for realizing practical massive MIMO systems.

This work was supported by the National Science Foundation under Grants ECCS-1547155 and CCF-1703635, and by a Hans Fischer Senior Fellowship from the Technische Universität München Institute for Advanced Study. H. Pirzadeh and A. L. Swindlehurst are with the Center for Pervasive Communications and Computing, University of California, Irvine, CA 92697 USA (e-mail: [email protected]; [email protected]). Portions of this paper have appeared in [1].

Among the various components responsible for power dissipation at the BS, the contribution of the analog-to-digital converters (ADCs) is known to be dominant [13]. Consequently, replacing the high-power, high-resolution ADCs with power-efficient low-resolution ADCs could be a viable way to address power consumption concerns at massive MIMO BSs. The impact of low-resolution ADCs on the spectral efficiency (SE) and energy consumption of massive MIMO systems has been considered in [14]-[22].
In particular, studies on massive MIMO systems with purely one-bit ADCs show that the high spatial multiplexing gain owing to the use of a large number of antennas is still achievable even with one-bit ADCs [14], [15]. However, many more antennas with one-bit ADCs (at least 2 to 2.5 times as many) are required to attain the same performance as in the high-resolution ADC case. One of the main causes of SE degradation in purely one-bit massive MIMO systems is the error due to coarse quantization during the channel estimation phase. While at low SNR the loss due to one-bit quantization is only about 2 dB, at higher SNRs performance degrades considerably more and leads to an error floor [14]. The SE degradation can be reduced by improving the quality of the channel estimate prior to signal detection. One approach for doing so is to exploit so-called mixed-ADC architectures during the channel estimation phase, in which a combination of low- and high-resolution ADCs are used side by side. This architecture is depicted in Fig. 1. Mixed-ADC implementations were introduced in [23], [24], and their performance was studied from an information-theoretic perspective via generalized mutual information. The basic premise behind the mixed-ADC architecture is to achieve the benefits of conventional massive MIMO systems while exploiting just $N \ll M$ pairs of high-resolution ADCs. An SE analysis of mixed-ADC massive MIMO systems with maximum ratio combining (MRC) detection for Rayleigh and Rician fading channels was carried out in [25] and [26], respectively. The SE and energy efficiency of mixed-ADC systems compared with systems composed of one-bit ADCs were studied in [27] for MRC detection, and conditions were derived under which each architecture provides the highest SE for a given power consumption. The advantage of using a mixed-ADC architecture in designing Bayes-optimal detectors for MIMO systems with low-resolution ADCs is reported in [28].
Although the nonlinearity of the quantization process increases the complexity of the optimal detectors, it is shown there that adding a small number of high-resolution ADCs to the system allows for less complex detectors with only a slight performance degradation. Moreover, the benefit of using mixed-ADC architectures in massive MIMO relay systems and cloud-RAN deployments is elaborated in [29], [30].

Most existing work in the mixed-ADC massive MIMO literature has assumed either perfect channel state information (CSI) or imperfect CSI with "round-robin" training. In the round-robin training approach [23], [24], [26], the training data is repeated several times and the high-resolution ADCs are switched among the RF chains so that every antenna obtains a "clean" snapshot of the pilots for channel estimation. This obviously requires a larger portion of the coherence interval to be devoted to training rather than data transmission. More precisely, for $M$ antennas and $N$ pairs of high-resolution ADCs, $M/N$ pilot signals are required in the single-user scenario to estimate all $M$ channel coefficients with high-resolution ADCs. This issue is pointed out in [23] for the single-user scenario, and its impact is taken into account there. The training overhead is exacerbated in the multiuser scenario, where orthogonal pilot sequences must be assigned to the users. In this case, the training period becomes $(M/N)\eta$, where $\eta$ is the length of the pilot sequences (at least as large as the number of user terminals), which could be prohibitively large and may leave little room for data transmission. Hence, it is crucial to account for this fact in any SE analysis of mixed-ADC massive MIMO systems.

In this paper, we examine the channel estimation performance and the resulting uplink SE of mixed-ADC architectures with and without round-robin training, and compare them with implementations that employ uniform ADC quantization across all antennas.
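The overhead argument made earlier in this section, a training period of $(M/N)\eta$ symbols with round-robin switching versus $\eta$ symbols without it, can be sketched in a few lines. This is an illustrative helper, not code from the paper: the function names are hypothetical, $M/N$ is assumed to be an integer, and the data fraction $(T-\eta_{\text{eff}})/T$ is taken to be the usual pre-log factor of block-fading SE expressions.

```python
def training_symbols(M: int, N: int, eta: int, round_robin: bool) -> int:
    """Pilot symbols per coherence interval (hypothetical helper)."""
    if M % N != 0:
        raise ValueError("M/N is assumed to be an integer")
    # Round-robin repeats the eta-symbol pilot block once per antenna set (M/N sets);
    # without ADC switching, a single pilot block is sent.
    return (M // N) * eta if round_robin else eta


def data_fraction(T: int, M: int, N: int, eta: int, round_robin: bool) -> float:
    """Fraction of a T-symbol coherence interval left for uplink data."""
    eta_eff = training_symbols(M, N, eta, round_robin)
    if eta_eff >= T:
        return 0.0  # training would consume the whole block
    return (T - eta_eff) / T


if __name__ == "__main__":
    # Arbitrary illustrative values: M = 100, N = 10, eta = K = 10, T = 1000.
    for rr in (False, True):
        print(rr, training_symbols(100, 10, 10, rr), data_fraction(1000, 100, 10, 10, rr))
```

With these values the round-robin scheme spends 100 symbols on training and keeps 90% of the block for data, compared with 10 symbols and 99% when the ADCs are not switched.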
The main goals of this work are to determine when, if at all, the benefits of the round-robin approach with ADC/antenna switching outweigh the cost of increased training overhead, and furthermore to examine whether one should employ a mixed-ADC architecture in the first place. The contributions of the paper can be summarized as follows.

• We first present an extension of the round-robin training approach that incorporates both high-resolution and one-bit measurements for channel estimation. The round-robin training proposed in [23], [24], [26] based the channel estimate on high-resolution observations only, assuming that no data was collected from antennas during the intervals when they were not connected to the high-resolution ADCs. In contrast, our extension assumes that these antennas collect one-bit observations, which are combined with the high-resolution samples to improve channel estimation performance.
• We use the Bussgang decomposition [31] to develop a linear minimum mean-squared error (LMMSE) channel estimator based on the combined round-robin measurements, and we derive a closed-form expression for the resulting mean-squared error (MSE). We further illustrate the importance of using the Bussgang approach rather than the simpler additive quantization noise model to obtain the most accurate characterization of channel estimation performance for round-robin training. The analysis illustrates that adding the one-bit observations considerably improves performance at low SNR.
• We perform a spectral efficiency analysis of the mixed-ADC implementation for the MRC and ZF receivers, and obtain expressions for a lower bound on the SE that take into account the channel estimation error and the loss of efficiency due to round-robin training. We compare the resulting SE with that achieved by mixed-ADC implementations that do not switch ADCs among the RF chains, and hence do not use round-robin training. We also compare against the SE of architectures that do not mix ADC resolutions across the array, but instead use uniform resolution with a fixed number of comparators for different array sizes. We show that, depending on the SNR, the coherence interval, the number of high-resolution ADCs, and the choice of linear receiver, there are situations in which each of the considered approaches is superior. In particular, using uniform low-resolution ADCs is better than a mixed-ADC approach for an interference-limited system. On the other hand, a mixed-ADC system, even one with round-robin training, is superior at higher SNRs when zero-forcing is used to reduce the interference.
• We analyze the possible SE improvement achievable with an antenna selection algorithm that connects the high-resolution ADCs to the subset of antennas with the highest channel gains. We analytically derive the SE of the antenna selection algorithm for MRC and numerically study its performance for ZF detection, comparing against the simpler approach of assigning the high-resolution ADCs to an arbitrary fixed subset of the RF chains.

In addition to the above contributions, we also discuss some of the issues related to implementing in hardware an ADC switch or multiplexer that allows different ADCs to be assigned to different antennas. We restrict our analysis and numerical examples to a single-carrier flat-fading scenario, although our methodology can be used in a straightforward way to extend the results to frequency-selective fading or multiple-carrier signals (e.g., see Section III.B of our prior work [14] for the SE analysis of an all-one-bit ADC system with OFDM and frequency selectivity).
The reasons for focusing on the single-carrier flat-fading case are as follows: (1) the mixed-ADC assumption already makes the resulting analytical expressions quite complicated even for the simple flat-fading case, and it would be more difficult to gain insight into the problem if the expressions were further complicated; (2) the original round-robin training idea was proposed in [23] for the single-carrier flat-fading case, and thus we analyze it under the same assumptions; (3) the main conclusions of the paper are based on relative algorithm comparisons under the same set of assumptions, and we expect our general conclusions to remain unchanged if frequency-selective rather than flat fading were considered; and (4) the flat-fading case is still of interest in some applications; for example, in a micro-cell setting with typical path-length differences of 50-100 m, the coherence bandwidth is between 3 and 6 MHz, which is not insignificant.

Further assumptions regarding the system model are outlined in the next section. Section III discusses channel estimation using round-robin training, and derives the LMMSE channel estimator that incorporates both the high-resolution and one-bit observations. A discussion of hardware and other practical considerations associated with using a mixed-ADC system with ADC/antenna switching is presented in Section IV. Section V then presents the analysis of the spectral efficiency for MRC and ZF receivers based on the imperfect channel state estimates, including an analytical performance characterization of antenna selection and of architectures with uniform ADC resolution across the array. A number of numerical studies are then presented in Section VI to illustrate the relative performance of the algorithms considered.

[Fig. 1. Mixed-ADC architecture. Labels: M antennas, RF chains, multiplexer, M − N one-bit ADCs, N high-resolution ADCs, baseband combining.]
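Reason (4) rests on a rule of thumb: a path-length difference $\Delta d$ produces a delay spread of roughly $\Delta d/c$, and the coherence bandwidth is roughly the inverse of the delay spread. The sketch below assumes $B_c \approx 1/\tau$ and a rounded speed of light; other definitions put a constant factor in the denominator and would give smaller values. The function name is hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 3.0e8  # rounded value of c


def coherence_bandwidth_hz(path_length_difference_m: float) -> float:
    """Rough coherence bandwidth B_c ~ 1 / tau, where tau = delta_d / c (assumed rule of thumb)."""
    delay_spread_s = path_length_difference_m / SPEED_OF_LIGHT_M_PER_S
    return 1.0 / delay_spread_s


if __name__ == "__main__":
    for delta_d in (50.0, 100.0):
        print(f"{delta_d:5.0f} m -> {coherence_bandwidth_hz(delta_d) / 1e6:.1f} MHz")
```

This reproduces the 3-6 MHz range quoted above: 100 m of excess path gives about 3 MHz and 50 m about 6 MHz.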
Notation: We use boldface letters to denote vectors, and capitals to denote matrices. The symbols $(\cdot)^*$, $(\cdot)^T$, and $(\cdot)^H$ represent conjugate, transpose, and conjugate transpose, respectively. A circularly-symmetric complex Gaussian (CSCG) random vector with zero mean and covariance matrix $\mathbf{R}$ is denoted $\mathbf{v} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R})$. The symbol $\|\cdot\|$ represents the Euclidean norm. The $K \times K$ identity matrix is denoted by $\mathbf{I}_K$ and the expectation operator by $\mathbb{E}\{\cdot\}$. We use $\mathbf{1}_N$ to denote the $N \times 1$ vector of all ones, and $\mathrm{diag}\{\mathbf{C}\}$ the diagonal matrix formed from the diagonal elements of the square matrix $\mathbf{C}$. For a complex value $c = c_R + jc_I$, we define $\arcsin(c) \triangleq \arcsin(c_R) + j\arcsin(c_I)$.

II. SYSTEM MODEL

Consider the uplink of a single-cell multi-user MIMO system consisting of $K$ single-antenna users that send their signals simultaneously to a BS equipped with $M$ antennas. Assuming a single-carrier frequency-flat channel and symbol-rate sampling, the $M \times 1$ signal received at the BS from the $K$ users is given by

$$\mathbf{r} = \sum_{k=1}^{K} \sqrt{p_k}\, \mathbf{g}_k s_k + \mathbf{n}, \tag{1}$$

where $p_k$ represents the average transmission power of the $k$th user, $\mathbf{g}_k = \sqrt{\beta_k}\,\mathbf{h}_k$ is the channel vector between the $k$th user and the BS, $\beta_k$ models geometric attenuation and shadow fading, and $\mathbf{h}_k \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_M)$ represents the fast fading and is assumed to be independent of the other users' channel vectors. The symbol transmitted by the $k$th user is denoted by $s_k$, where $\mathbb{E}\{|s_k|^2\} = 1$, and is drawn from a CSCG codebook independent of the other users. Finally, $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \sigma_n^2 \mathbf{I}_M)$ denotes additive CSCG receiver noise at the BS. The assumption of symbol-rate sampling means that the matched filter at the receiver must be implemented in the analog domain. Better performance (e.g., higher rates) could be achieved by oversampling the ADCs, particularly those with one-bit resolution. We consider a block-fading model with coherence bandwidth $W_c$ and coherence time $T_c$.
In this model, each channel remains constant in a coherence interval of length $T = T_c W_c$ symbols and changes independently between intervals. Note that $T$ is a fixed system parameter chosen as the minimum coherence duration of all users. At the beginning of each coherence interval, the users send their mutually orthogonal pilot sequences of length $\eta$ ($K \le \eta \le T$) to the BS for channel estimation. Denoting the length of the training phase as $\eta_{\text{eff}}$, the remaining $T - \eta_{\text{eff}}$ symbols are dedicated to uplink data transmission.

III. TRAINING PHASE

In this section, we investigate the linear minimum mean-squared error (LMMSE) channel estimator for different ADC architectures at the BS. In all scenarios, the pilot sequences are drawn from an $\eta \times K$ matrix $\boldsymbol{\Phi}$, where the $k$th column of $\boldsymbol{\Phi}$, $\boldsymbol{\phi}_k$, is the $k$th user's pilot sequence and $\boldsymbol{\Phi}^H\boldsymbol{\Phi} = \mathbf{I}_K$. Therefore, the $M \times \eta$ received signal at the BS before quantization becomes

$$\mathbf{X} = \sum_{k=1}^{K} \sqrt{\eta p_k}\, \mathbf{g}_k \boldsymbol{\phi}_k^T + \mathbf{N}, \tag{2}$$

where $\mathbf{N}$ is an $M \times \eta$ matrix with i.i.d. $\mathcal{CN}(0, \sigma_n^2)$ elements. Since the rows of $\mathbf{X}$ are mutually independent due to the assumption of spatially uncorrelated Gaussian channels and noise, we can analyze them separately. As a result, we focus on the $m$th row of $\mathbf{X}$, which is

$$\mathbf{x}_m^T = \sum_{k=1}^{K} \sqrt{\eta p_k}\, g_{mk} \boldsymbol{\phi}_k^T + \mathbf{n}_m^T, \tag{3}$$

where $g_{mk}$ is the $m$th element of the $k$th user's channel vector $\mathbf{g}_k$, and $\mathbf{n}_m^T$ is the $m$th row of $\mathbf{N}$. Since the analysis does not depend on $m$, hereafter we drop this subscript and denote the received signal at the $m$th antenna by $\mathbf{x}$.

A. Estimation Using One-Bit Quantized Observations

In this subsection, to have a benchmark for comparison purposes, we consider the case in which all antennas at the BS are connected to one-bit ADCs. The received signal $\mathbf{x}^T$ after quantization by one-bit ADCs can be written as

$$\mathbf{y}_t^T = \mathcal{Q}\left(\mathbf{x}^T\right), \tag{4}$$

where the element-wise one-bit quantization operation $\mathcal{Q}(\cdot)$ replaces each input entry with the quantized value $\frac{1}{\sqrt{2}}(\pm 1 \pm j)$, depending on the signs of the real and imaginary parts.
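A small simulation makes the pilot model (2) and the quantizer (4) concrete. This is a sketch under stated assumptions, not the authors' code: the helper names are hypothetical, the pilots are the first $K$ columns of a unitary DFT matrix (one convenient way to satisfy $\boldsymbol{\Phi}^H\boldsymbol{\Phi} = \mathbf{I}_K$), all numeric values are arbitrary, and an exact zero at the quantizer input is mapped to $+1$.

```python
import numpy as np


def dft_pilots(eta: int, K: int) -> np.ndarray:
    """eta x K pilot matrix with orthonormal columns (Phi^H Phi = I_K), built from
    the first K columns of the unitary DFT matrix; one possible choice only."""
    if K > eta:
        raise ValueError("need K <= eta")
    n = np.arange(eta)[:, None]
    k = np.arange(K)[None, :]
    return np.exp(-2j * np.pi * n * k / eta) / np.sqrt(eta)


def received_pilots(G: np.ndarray, p: np.ndarray, Phi: np.ndarray,
                    sigma2_n: float, rng: np.random.Generator) -> np.ndarray:
    """M x eta block X = sum_k sqrt(eta p_k) g_k phi_k^T + N, as in (2)."""
    M, eta = G.shape[0], Phi.shape[0]
    noise = np.sqrt(sigma2_n / 2) * (rng.standard_normal((M, eta))
                                     + 1j * rng.standard_normal((M, eta)))
    # Scaling column k of G by sqrt(eta p_k) and multiplying by Phi^T sums the K outer products.
    return (G * np.sqrt(eta * p)) @ Phi.T + noise


def one_bit_quantize(X: np.ndarray) -> np.ndarray:
    """Q(.) from (4): (+-1 +- j)/sqrt(2) from the signs of Re and Im (zero maps to +1)."""
    re = np.where(X.real >= 0, 1.0, -1.0)
    im = np.where(X.imag >= 0, 1.0, -1.0)
    return (re + 1j * im) / np.sqrt(2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M, K, eta = 8, 3, 4
    beta = np.array([1.0, 0.5, 0.1])   # illustrative large-scale gains
    p = 1.0 / beta                     # illustrative transmit powers
    # g_k = sqrt(beta_k) h_k with h_k ~ CN(0, I_M)
    G = np.sqrt(beta) * (rng.standard_normal((M, K))
                         + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
    Phi = dft_pilots(eta, K)
    Y = one_bit_quantize(received_pilots(G, p, Phi, sigma2_n=0.1, rng=rng))
    print(np.allclose(Phi.conj().T @ Phi, np.eye(K)), Y.shape, np.allclose(np.abs(Y), 1.0))
```

In the noiseless case, despreading with $\boldsymbol{\phi}_k^*$ returns $\sqrt{\eta p_k}\,\mathbf{g}_k$ exactly because the pilots are orthonormal; this is the property the full-resolution LMMSE estimator builds on, whereas after one-bit quantization every entry has unit magnitude and the amplitude information is lost.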
According to the Bussgang decomposition [31], the following linear representation of the quantization can be employed [14]:

$$\mathcal{Q}\left(\mathbf{x}^T\right) = \sqrt{\frac{2}{\pi}}\, \mathbf{x}^T \mathbf{D}_x^{-\frac{1}{2}} + \mathbf{q}_t^T, \tag{5}$$

where $\mathbf{D}_x = \mathrm{diag}\{\mathbf{C}_x\}$ and $\mathbf{C}_x$ denotes the autocorrelation matrix of $\mathbf{x}$, which can be calculated as

$$\mathbf{C}_x = \sum_{k=1}^{K} \eta p_k \beta_k \boldsymbol{\phi}_k^* \boldsymbol{\phi}_k^T + \sigma_n^2 \mathbf{I}_\eta. \tag{6}$$

In addition, $\mathbf{q}_t$ represents quantization noise that is uncorrelated with $\mathbf{x}$, and its autocorrelation matrix can be derived from the arcsine law as [32]

$$\mathbf{C}_{q_t} = \frac{2}{\pi}\arcsin\left\{\mathbf{D}_x^{-\frac{1}{2}} \mathbf{C}_x \mathbf{D}_x^{-\frac{1}{2}}\right\} - \frac{2}{\pi}\mathbf{D}_x^{-\frac{1}{2}} \mathbf{C}_x \mathbf{D}_x^{-\frac{1}{2}}. \tag{7}$$

Much of the existing work on massive MIMO systems with low-resolution ADCs uses the simpler additive quantization noise model (AQNM) [20]-[22], [25]-[30], [39], which is valid only at low SNR and does not capture the correlation among the elements of $\mathbf{q}_t$, a correlation that turns out to be of crucial importance in our analysis. Hence, we use the Bussgang decomposition instead and will show its effect on the system performance analysis.

Stacking the rows of (5) into a matrix, the one-bit quantized observation at the BS becomes

$$\mathbf{Y} = \sqrt{\frac{2}{\pi}}\, \mathbf{X} \mathbf{D}_x^{-\frac{1}{2}} + \mathbf{Q}, \tag{8}$$

where $\mathbf{Q}$ is an $M \times \eta$ matrix whose $m$th row is $\mathbf{q}_t^T$. The LMMSE estimate of the channel $\mathbf{G} = [\mathbf{g}_1, \ldots, \mathbf{g}_K]$ based on just the one-bit quantized observations (8) is given in the following theorem.

Theorem 1. The LMMSE estimate of the $k$th user's channel $\mathbf{g}_k$, given the one-bit quantized observations $\mathbf{Y}$, is [14]

$$\hat{\mathbf{g}}_k = \frac{\beta_k}{\beta_k + \sigma_{w_k}^2}\sqrt{\frac{1}{\eta p_k}}\, \mathbf{Y}\bar{\boldsymbol{\phi}}_k^*, \tag{9}$$

where

$$\bar{\boldsymbol{\phi}}_k \triangleq \sqrt{\frac{\pi}{2}}\, \mathbf{D}_x^{\frac{1}{2}} \boldsymbol{\phi}_k, \tag{10}$$

$$\sigma_{w_k}^2 \triangleq \frac{1}{\eta p_k}\left(\sigma_n^2 + \bar{\boldsymbol{\phi}}_k^T \mathbf{C}_{q_t} \bar{\boldsymbol{\phi}}_k^*\right). \tag{11}$$

Define the channel estimation error $\boldsymbol{\varepsilon} \triangleq \hat{\mathbf{g}}_k - \mathbf{g}_k$. Then we have

$$\sigma_{\hat{g}_k}^2 = \frac{\beta_k^2}{\beta_k + \sigma_{w_k}^2} \quad \text{and} \quad \sigma_{\varepsilon_k}^2 = \frac{\sigma_{w_k}^2 \beta_k}{\beta_k + \sigma_{w_k}^2}, \tag{12}$$

where $\sigma_{\hat{g}_k}^2$ and $\sigma_{\varepsilon_k}^2$ are the variances of the independent zero-mean elements of $\hat{\mathbf{g}}_k$ and $\boldsymbol{\varepsilon}$, respectively.

From Theorem 1, it is apparent that in the channel estimation analysis of massive MIMO systems with one-bit ADCs, the estimation error is directly affected not only by the inner product of the pilot sequences but also by their outer product [14].

To gain insight into the impact of one-bit quantization on channel estimation, in the next corollary we adopt the statistics-aware power control policy proposed in [37]. Apart from its practical advantages, this policy is especially suitable for one-bit ADCs since it avoids near-far blockage and hence strong interference. Moreover, it leads to simple expressions and provides analytical convenience for our derivation in Section VI. Although not the focus of this paper, we note that in general a massive MIMO system employing a mixed-ADC architecture will be more resilient than an all-one-bit implementation to the near-far effect and to jamming. This is an interesting topic for further study.

Corollary 1. For the case in which power control is performed, i.e., $p_k = p/\beta_k$ for some fixed value $p$ and for $k \in \mathcal{K} = \{1, \ldots, K\}$, the number of users is equal to the length of the pilot sequences, i.e., $\eta = K$, and the pilot matrix satisfies $\boldsymbol{\Phi}\boldsymbol{\Phi}^H = \mathbf{I}_K$, we have

$$\mathbf{C}_x = \left(Kp + \sigma_n^2\right)\mathbf{I}_K = \mathbf{D}_x, \tag{13}$$

$$\mathbf{C}_{q_t} = \left(1 - \frac{2}{\pi}\right)\mathbf{I}_K, \tag{14}$$

which yields

$$\sigma_{\hat{g}_k}^2 = \frac{2}{\pi}\,\frac{\beta_k}{1 + \frac{\sigma_n^2}{Kp}}, \tag{15}$$

$$\sigma_{\varepsilon_k}^2 = \frac{\frac{Kp}{\sigma_n^2}\left(1 - \frac{2}{\pi}\right) + 1}{1 + \frac{Kp}{\sigma_n^2}}\,\beta_k. \tag{16}$$

Corollary 1 states conditions under which $\mathbf{C}_{q_t}$ is diagonal. In addition, it is evident that the channel estimation suffers from an error floor at high SNR: as $Kp/\sigma_n^2 \to \infty$, (16) tends to $(1 - 2/\pi)\beta_k$.

[Fig. 2. Transmission protocol for estimation using full-resolution observations. Labels: antenna sets 1-5, training, data transmission; legend: full-resolution observations, no observation; axes $W_c$, $T_c$.]

B. Channel Estimation with Few Full-Resolution ADCs

Channel estimation with coarse observations suffers from large errors, especially in the high-SNR regime.
On the other hand, while estimating all channels using high-resolution ADCs is desirable, the resulting power consumption makes this approach practically infeasible. This motivates the use of a mixed-ADC architecture for channel estimation, to eliminate the large estimation error caused by one-bit quantization while keeping the power consumption penalty at an acceptable level.

In the approach described in [23], [24], [26], $N \ll M$ pairs of high-resolution ADCs are deployed and switched between different antennas during different transmission intervals, an approach referred to as "round-robin" training. The $M$ BS antennas are grouped into $M/N$ sets.¹ In the first training sub-interval, the users send their mutually orthogonal pilots to the BS while the $N$ high-resolution ADC pairs are connected to the first set of $N$ antennas. After the pilot symbols from all users have been received in the $\eta$-symbol training sub-interval, the high-resolution ADCs are switched to the next set of antennas, and so on. In this manner, after $(M/N)\eta$ pilot transmissions ($M/N$ sub-intervals), each channel can be estimated from high-resolution observations only. This round-robin channel estimation protocol is illustrated in Fig. 2 for a mixed-ADC system with $M/N = 5$. Stacking all $N \times \eta$ full-resolution observations into an $M \times \eta$ matrix $\mathbf{X}$, the LMMSE estimate of the $k$th user's channel $\mathbf{g}_k$ is [5]

$$\hat{\mathbf{g}}_k = \frac{1}{1 + \frac{\sigma_n^2}{\eta p_k \beta_k}}\,\frac{1}{\sqrt{\eta p_k}}\,\mathbf{X}\boldsymbol{\phi}_k^*, \tag{17}$$

and the resulting variances of the channel estimate and the error are given respectively by

$$\sigma_{\hat{g}_k}^2 = \frac{\beta_k}{1 + \frac{\sigma_n^2}{\eta p_k \beta_k}} \quad \text{and} \quad \sigma_{\varepsilon_k}^2 = \frac{\beta_k}{1 + \frac{\eta p_k \beta_k}{\sigma_n^2}}. \tag{18}$$

¹ We assume $M/N$ is an integer throughout the paper.

Eq. (18) states that by employing only $N$ pairs of high-resolution ADCs and by expending a larger portion of the coherence interval on channel estimation, the channel can be estimated with the same precision as that achieved by conventional massive MIMO systems with high-resolution ADCs. However, this comes at the high cost of repeating the training data $M/N$ times, which can significantly reduce the time available for data transmission. Indeed, we will see later that in some cases a mixed-ADC implementation with round-robin training achieves a lower SE than a system with all one-bit ADCs because of the long training interval (even with the improvements we propose below for the round-robin method). However, we will also see that there are other situations in which the mixed-ADC round-robin method provides a large gain in SE. The primary goal of this paper is to elucidate under what conditions these and other competing approaches provide the best performance.

Before analyzing the tradeoff between the gain (lower channel estimation error) and the cost (longer training period) of the round-robin approach, in the next subsection we propose channel estimation based on both the full-resolution and the one-bit data received by the BS, in order to further improve the performance of the mixed-ADC architecture with round-robin channel estimation. To our knowledge, this approach has not been considered in prior work on mixed-ADC massive MIMO.

C. Estimation Using Joint Full-Resolution/One-Bit Observations

While channel estimation based on coarsely quantized observations suffers from large errors in the high-SNR regime, it provides reasonable performance at low SNR. Hence, in this subsection we consider joint channel estimation based on observations from both high-resolution and one-bit ADCs to further improve channel estimation accuracy. Unlike the previous subsection, in which the one-bit ADCs were not employed, here we incorporate their coarse observations into the channel estimation procedure. The protocol for this method is illustrated in Fig. 3 for a mixed-ADC system with $M/N = 5$. It can be seen that, in addition to one set of full-resolution observations for each antenna, there are $(M/N) - 1$ sets of one-bit observations that are also taken into account for channel estimation. The next theorem characterizes the performance of this approach.

[Fig. 3. Transmission protocol for estimation using full-resolution/one-bit observations. Labels: antenna sets 1-5, training, data transmission; legend: full-resolution observations, one-bit observations; axes $W_c$, $T_c$.]

Theorem 2. Stack all $N \times \eta$ full-resolution observations into an $M \times \eta$ matrix $\mathbf{X}$, and the $(M/N) - 1$ sets of $N \times \eta$ one-bit quantized observations into $M \times \eta$ matrices $\mathbf{Y}_t$, $t \in \mathcal{T} = \{1, \ldots, M/N - 1\}$. The LMMSE estimate of the $k$th user's channel $\mathbf{g}_k$ is

$$\hat{\mathbf{g}}_k = \sqrt{\frac{1}{\eta p_k}}\left(w_{\infty k}\,\mathbf{X}\boldsymbol{\phi}_k^* + w_{1k}\sum_{t=1}^{\frac{M}{N}-1}\mathbf{Y}_t\bar{\boldsymbol{\phi}}_k^*\right), \tag{19}$$

where

$$w_{\infty k} = \frac{\frac{\eta p_k}{\sigma_n^2}}{\frac{1}{\beta_k} + \frac{\eta p_k}{\sigma_n^2} + \varsigma_k(p_k)}, \tag{20}$$

$$w_{1k} = \frac{\left(\frac{M}{N} - 1\right)^{-1}\varsigma_k(p_k)}{\frac{1}{\beta_k} + \frac{\eta p_k}{\sigma_n^2} + \varsigma_k(p_k)}, \tag{21}$$

$$\varsigma_k(p_k) = \frac{\frac{M}{N} - 1}{\sigma_{w_k}^2 + \left(\frac{M}{N} - 2\right)\varrho_k}, \tag{22}$$

$$\sigma_{w_k}^2 = \frac{1}{\eta p_k}\left(\sigma_n^2 + \bar{\boldsymbol{\phi}}_k^T \mathbf{C}_{q_t} \bar{\boldsymbol{\phi}}_k^*\right), \tag{23}$$

$$\varrho_k = \frac{1}{\eta p_k}\,\bar{\boldsymbol{\phi}}_k^T \bar{\mathbf{C}}_{q_t} \bar{\boldsymbol{\phi}}_k^*, \tag{24}$$

$$\bar{\mathbf{C}}_{q_t} = \frac{2}{\pi}\arcsin\left\{\bar{\mathbf{D}}_x^{-\frac{1}{2}} \bar{\mathbf{C}}_x \bar{\mathbf{D}}_x^{-\frac{1}{2}}\right\} - \frac{2}{\pi}\bar{\mathbf{D}}_x^{-\frac{1}{2}} \bar{\mathbf{C}}_x \bar{\mathbf{D}}_x^{-\frac{1}{2}}, \tag{25}$$

$$\bar{\mathbf{C}}_x = \sum_{k=1}^{K} \eta p_k \beta_k \boldsymbol{\phi}_k^* \boldsymbol{\phi}_k^T, \tag{26}$$

$$\bar{\mathbf{D}}_x = \mathrm{diag}\{\bar{\mathbf{C}}_x\}. \tag{27}$$

This approach yields the following variances for the channel estimate and the estimation error, respectively:

$$\sigma_{\hat{g}_k}^2 = \frac{\left(\frac{\eta p_k}{\sigma_n^2} + \varsigma_k(p_k)\right)\beta_k}{\frac{1}{\beta_k} + \frac{\eta p_k}{\sigma_n^2} + \varsigma_k(p_k)}, \tag{28}$$

$$\sigma_{\varepsilon_k}^2 = \frac{1}{\frac{1}{\beta_k} + \frac{\eta p_k}{\sigma_n^2} + \varsigma_k(p_k)}. \tag{29}$$

Proof. See Appendix A. ∎

Theorem 2 specifies how the observations from the high-resolution and one-bit ADCs should be combined optimally in the LMMSE sense. In addition, it highlights the importance of accounting for the correlation among the one-bit observations in the analysis of mixed-ADC channel estimation, something that the widely used AQNM approach cannot capture.
More precisely, the impact of joint high-resolution/one-bit channel estimation on the variance of the channel estimation error is captured by the term $\varsigma_k(p_k)$. To see this, assume that the correlation among one-bit observations in different training sub-intervals is ignored (as would be the case with the AQNM approach). As shown in the appendix, this is equivalent to setting $\varrho_k = 0$ in (24). Under this assumption, $\varsigma_k(p_k)$ becomes

$$\varsigma_k^0(p_k) = \frac{\frac{M}{N} - 1}{\sigma_{w_k}^2} > \varsigma_k(p_k), \tag{30}$$

and thus $\sigma_{\varepsilon_k}^2 > \sigma_{\varepsilon_k^0}^2$, where $\sigma_{\varepsilon_k^0}^2$ denotes the estimation error for $\varrho_k = 0$. Consequently, the AQNM model yields an overly optimistic assessment of the channel estimation error compared with the more accurate Bussgang analysis. We will see below that the impact of the AQNM approximation is significant for mixed-ADC channel estimation. The next corollary provides insight into the impact of the system parameters on the joint high-resolution/one-bit LMMSE estimation.

Corollary 2. For the case in which power control is performed, i.e., $p_k = p/\beta_k$ for $k \in \mathcal{K}$, the number of users is equal to the length of the pilot sequences, i.e., $\eta = K$, and the pilot matrix satisfies $\boldsymbol{\Phi}\boldsymbol{\Phi}^H = \mathbf{I}_K$, we have $\bar{\mathbf{C}}_x = Kp\,\mathbf{I}_K = \bar{\mathbf{D}}_x$ and

$$\bar{\mathbf{C}}_{q_t} = \left(1 - \frac{2}{\pi}\right)\mathbf{I}_K, \tag{31}$$

which yields

$$\sigma_{\hat{g}_k}^2 = \frac{\frac{Kp}{\sigma_n^2} + \varsigma(p)}{1 + \frac{Kp}{\sigma_n^2} + \varsigma(p)}\,\beta_k, \tag{32}$$

$$\sigma_{\varepsilon_k}^2 = \frac{\beta_k}{1 + \frac{Kp}{\sigma_n^2} + \varsigma(p)}, \tag{33}$$

where $\varsigma(p) = \beta_k \varsigma_k(p_k)$ is the same for all users and is given by

$$\varsigma(p) = \left[\frac{\sigma_n^2}{Kp}\left(\frac{\pi}{2} - 1 + \left(\frac{M}{N} - 1\right)^{-1}\right) + \frac{\pi}{2} - 1\right]^{-1}. \tag{34}$$

In addition,

$$w_\infty = \frac{\frac{Kp}{\sigma_n^2}}{1 + \frac{Kp}{\sigma_n^2} + \varsigma(p)}, \qquad w_1 = \frac{\left(\frac{M}{N} - 1\right)^{-1}\varsigma(p)}{1 + \frac{Kp}{\sigma_n^2} + \varsigma(p)}, \tag{35}$$

where $w_\infty$ and $w_1$ denote the weights of the high-resolution and one-bit observations in the LMMSE estimate, respectively.

Corollary 2 states that, in contrast to Theorem 1, where the correlation among one-bit observations within each training sub-interval can be eliminated by carefully selecting the system parameters as in Corollary 1, the correlation among one-bit observations from different training sub-intervals cannot be eliminated. This phenomenon makes the addition of the one-bit observations less useful, especially in the high-SNR regime. For instance, in the asymptotic case where the SNR $= p/\sigma_n^2$ goes to infinity, we have

$$\varsigma \longrightarrow \left(\frac{\pi}{2} - 1\right)^{-1}, \tag{36}$$

$$w_\infty \longrightarrow 1, \qquad w_1 \longrightarrow 0. \tag{37}$$

It is apparent from (36) that in the asymptotic regime $\varsigma$ tends to a finite value that is independent of $M/N$. Moreover, (37) implies that at high SNR the optimal approach is to estimate the channel based solely on the high-resolution observations.

[Fig. 4. Channel estimation error $\sigma_{\varepsilon_k}^2/\beta_k$ versus $p/\sigma_n^2$.]

The error for the three channel estimation approaches in (12), (18), and (29) is depicted in Fig. 4 for a case with $M = 100$ antennas, $K = 10$ users, and various numbers of high-resolution ADCs $N$ and training lengths $\eta$. The label "Joint" refers to round-robin channel estimation that includes the one-bit observations as described in the previous subsection, "Full resolution" indicates the performance achieved using a full array of high-resolution ADCs, and "One-bit" refers to the performance of an all-one-bit architecture. We also plot the performance predicted for the Joint approach by the AQNM analysis, which ignores the correlation among the one-bit observations. We see that the AQNM-based analysis yields an overly optimistic prediction of the channel estimation error. In particular, unlike AQNM, the more accurate Bussgang analysis shows that channel estimation with an all-one-bit BS actually outperforms the Joint method at low SNR, a critical observation in analyzing whether or not a mixed-ADC implementation makes sense. However, the mixed-ADC architecture eventually overcomes the error floor of the all-one-bit system at high SNR, and in such cases it can reduce the estimation error dramatically. Fig. 4 focuses on channel estimation performance, but does not reflect the full impact of round-robin training on the overall system spectral efficiency, since reducing $N$ increases the amount of training required by the round-robin method.
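Fig. 4 itself cannot be regenerated from the text, but the expressions behind its curves can be cross-checked numerically. The following sketch is not the authors' code; the function names are hypothetical and all numbers are arbitrary. It evaluates (11), (12), (18), (22), (24), (29), and (30) directly from the covariance matrices via the arcsine law, using the complex arcsine convention from the Notation paragraph, and checks that under the assumptions of Corollaries 1 and 2 they reduce to the closed forms (16), (33), and (34).

```python
import numpy as np


def quant_noise_cov(C: np.ndarray) -> np.ndarray:
    """(2/pi) arcsin{D^-1/2 C D^-1/2} - (2/pi) D^-1/2 C D^-1/2 with D = diag{C}, as in (7)/(25).
    arcsin acts separately on real and imaginary parts."""
    d = np.sqrt(np.real(np.diag(C)))
    Cn = C / np.outer(d, d)
    # clipping guards against round-off pushing entries just outside [-1, 1]
    asin = np.arcsin(np.clip(Cn.real, -1, 1)) + 1j * np.arcsin(np.clip(Cn.imag, -1, 1))
    return (2 / np.pi) * (asin - Cn)


def estimation_errors(p, beta, Phi, sigma2_n, M_over_N):
    """Per-user error variances: one-bit (12), full resolution (18), joint (29),
    and the rho_k = 0 (AQNM-like) variant built from (30)."""
    if M_over_N < 2:
        raise ValueError("round-robin needs M/N >= 2")
    eta, K = Phi.shape
    C_bar = (Phi.conj() * (eta * p * beta)) @ Phi.T              # (26)
    C_x = C_bar + sigma2_n * np.eye(eta)                          # (6)
    C_q, C_q_bar = quant_noise_cov(C_x), quant_noise_cov(C_bar)   # (7), (25)
    D_half = np.diag(np.sqrt(np.real(np.diag(C_x))))
    L = M_over_N - 1                                              # one-bit sub-intervals per antenna
    out = {"one_bit": [], "full": [], "joint": [], "joint_aqnm": []}
    for k in range(K):
        phi_bar = np.sqrt(np.pi / 2) * (D_half @ Phi[:, k])                      # (10)
        s2_w = np.real(sigma2_n + phi_bar @ C_q @ phi_bar.conj()) / (eta * p[k])  # (11)
        rho = np.real(phi_bar @ C_q_bar @ phi_bar.conj()) / (eta * p[k])         # (24)
        snr = eta * p[k] / sigma2_n
        vs = L / (s2_w + (L - 1) * rho)                                           # (22)
        vs0 = L / s2_w                                                            # (30)
        out["one_bit"].append(s2_w * beta[k] / (beta[k] + s2_w))
        out["full"].append(beta[k] / (1 + snr * beta[k]))
        out["joint"].append(1 / (1 / beta[k] + snr + vs))
        out["joint_aqnm"].append(1 / (1 / beta[k] + snr + vs0))
    return {name: np.array(vals) for name, vals in out.items()}


def corollary_errors(snr_Kp, M_over_N):
    """Closed forms under Corollaries 1 and 2, normalized by beta_k; snr_Kp = Kp / sigma_n^2."""
    s = 1.0 / snr_Kp
    one_bit = (snr_Kp * (1 - 2 / np.pi) + 1) / (1 + snr_Kp)                  # (16)
    vs = 1.0 / (s * (np.pi / 2 - 1 + 1 / (M_over_N - 1)) + np.pi / 2 - 1)    # (34)
    joint = 1.0 / (1 + snr_Kp + vs)                                           # (33)
    full = 1.0 / (1 + snr_Kp)                                                 # (18) with eta p_k beta_k = Kp
    return one_bit, joint, full, vs


if __name__ == "__main__":
    K, P, sigma2_n, M_over_N = 4, 2.0, 1.0, 10        # arbitrary illustrative values
    beta = np.array([1.0, 0.3, 0.05, 0.01])
    Phi = np.fft.fft(np.eye(K)) / np.sqrt(K)          # unitary DFT, so eta = K and Phi Phi^H = I_K
    errs = estimation_errors(P / beta, beta, Phi, sigma2_n, M_over_N)  # power control p_k = P / beta_k
    one_bit, joint, full, _ = corollary_errors(K * P / sigma2_n, M_over_N)
    print(np.allclose(errs["one_bit"] / beta, one_bit), np.allclose(errs["joint"] / beta, joint))
    print(np.all(errs["joint_aqnm"] < errs["joint"]))
```

Both comparisons print True for these settings, and the $\varrho_k = 0$ variant reports a smaller error than the Bussgang-based value, as stated after (30). Like Fig. 4, these numbers do not include the cost of the longer round-robin training.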
This will be taken into account when we analyze the SE in Section V.

IV. PRACTICAL CONSIDERATIONS

The improvement in channel estimation performance provided by round-robin training comes at the expense of a significantly increased training overhead. As a simple worst-case example, consider a 400 Hz Doppler spread in a narrowband channel of 400 kHz bandwidth. In this case, the coherence time is roughly 1000 symbols. For higher bandwidths, or smaller cells with lower mobility, the coherence time can easily approach 10,000 symbols or more. A mixed-ADC array of 128 antennas with 16 high-resolution ADCs would require repeating the pilots 8 times. For 20 users, this amounts to 160 symbols, or 16% of the coherence time when T = 1000 symbols. This is a relatively high price to pay, and, as we will see later, in many instances the resulting loss in SE cannot be offset by the improved channel estimate. However, we will also see that in other situations the opposite is true: the round-robin method leads to significant gains in SE even when the training overhead is taken into account.

Besides the extra training overhead, the round-robin method has the disadvantage of requiring extra RF switching or multiplexing hardware prior to the ADCs, as shown in Fig. 1. A single large M × M multiplexer is unlikely to be used for this purpose, since complete flexibility in assigning a given high-resolution ADC to any antenna is not needed. A more likely architecture would employ a bank of smaller multiplexers, each allowing one high-resolution ADC to be switched among a small subarray of antennas. This ensures that each RF chain has access to high-resolution training data during one of the round-robin intervals. Such an approach is similar to the simplified "subarray switching" schemes proposed for antenna selection in massive MIMO [33]-[35].
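The overhead arithmetic of the example above can be reproduced as follows. This is a minimal sketch that assumes two things:
- the coherence length in symbols is approximated as bandwidth divided by Doppler spread, matching the "roughly 1000 symbols" figure;
- η = K pilot symbols are sent per sub-interval and repeated M/N times.

The function names are illustrative.

```python
def coherence_symbols(doppler_hz: float, bandwidth_hz: float) -> float:
    """Rough coherence length in symbols: coherence time ~ 1/Doppler, one symbol per 1/bandwidth."""
    return bandwidth_hz / doppler_hz


def training_overhead(M: int, N: int, K: int, T: float, round_robin: bool = True):
    """Pilot symbols and fraction of the coherence interval spent on training.

    Assumes eta = K pilot symbols per sub-interval; round-robin repeats them M/N times.
    """
    if round_robin and M % N != 0:
        raise ValueError("this sketch assumes M/N is an integer")
    eta_eff = (M // N) * K if round_robin else K
    return eta_eff, eta_eff / T


T = coherence_symbols(400.0, 400e3)                        # 400 kHz / 400 Hz
rr = training_overhead(128, 16, 20, T)                     # round-robin pilots
single = training_overhead(128, 16, 20, T, round_robin=False)
print(f"T = {T:.0f} symbols")
print(f"round-robin: {rr[0]} symbols ({100 * rr[1]:.0f}% of T)")
print(f"single pass: {single[0]} symbols ({100 * single[1]:.0f}% of T)")
```

The single-pass line shows the training cost that would be paid without ADC switching. The gap between the two lines is the price of round-robin training.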
An interesting earlier example is the 108 × 108 multiplexer chipset developed in [36] for a local area network application. It was composed of several 36 × 36 differential crosspoint ASIC switches, each consuming less than 100 mW, with a bandwidth of 140 MHz and a 0 dB insertion loss. In the 20 years since [36], RF switch technology has advanced considerably.

For the example discussed above (a 128-element array with 16 high-resolution ADCs and 112 one-bit ADCs), the multiplexing could be achieved using sixteen 8 × 8 analog switches arranged in parallel. The Analog Devices ADV3228 8 × 8 crosspoint switch is an example of an off-the-shelf component for such an architecture². The ADV3228 has a 750 MHz bandwidth, a switching time of 15 ns, and a power consumption of 500 mW. This is similar to the power of an 8-bit ADC (see, for example, Texas Instruments' ADC08B200 8-bit 200 MS/s ADC³). Since the switches can be implemented at a lower intermediate frequency prior to I-Q demodulation, only one per subarray is required. The total power consumption of the switches would therefore be less than half that of the ADCs.

For the vast majority of the coherence time, the switch is idle. To accommodate the round-robin training, the switches only need to be operated M/N − 1 times, once for every repetition of the training data. This reduces the actual power consumption below the specification and further reduces the impact of the additional training. Short guard intervals would need to be inserted between the training intervals to absorb the switching transients, but these will typically not impact the SE.

² See http://www.analog.com/en/products/switches-multiplexers/bufferedanalog-crosspoint-switches/adv3228.html#product-overview for product details.
³ http://www.ti.com/product/ADC08B200/technicaldocuments.
For the example discussed above with 128 antennas and 8 switches, 7 switching events are required. At 15 ns each, this gives a total switching time of 105 ns, which is insignificant compared with the coherence time of 2.5 ms at a 400 Hz Doppler. The insertion loss of the analog switches would also have to be taken into account in an actual implementation, since it directly reduces the overall SNR of the received signals. Harmonic interference due to nonlinearities in the switch is likely not an issue. For example, the specifications of a Texas Instruments switch similar to the ADV3228 (the LMH6583) list second- and third-harmonic distortion of −76 dBc. Furthermore, it has been shown that signal combining over a massive antenna array provides significant robustness to such nonlinearities and other hardware imperfections [7]-[12].

V. SPECTRAL EFFICIENCY

Round-robin training with a mixed-ADC architecture can substantially improve the channel estimation accuracy. However, it requires a longer training interval and therefore leaves less room for data transmission in each coherence interval. More precisely, round-robin channel estimation requires (M/N)η symbol transmissions, which can be large when the number of high-resolution ADCs, N, is small⁴. Part of the coherence interval is lost to channel estimation. In exchange, the more accurate channel estimate can significantly improve the signal-to-quantization-interference-and-noise ratio (SQINR), so a higher rate would be expected during the shorter data transmission period. In this section, we study this trade-off in terms of spectral efficiency for the three channel estimation approaches described above. In the data transmission phase, all users simultaneously send their data symbols to the BS.
To begin, assume the antennas are ordered so that the last N antennas are connected to high-resolution ADCs in this phase. A more thoughtful assignment of the high-resolution ADCs will be considered below. From equation (1), and based on the Bussgang decomposition, the received signal at the BS after one-bit quantization is

$$\mathbf y_d = \begin{bmatrix}\sqrt{\tfrac{2}{\pi}}\,\bar{\mathbf D}^{-\frac12} & \mathbf 0\\ \mathbf 0 & \mathbf I_N\end{bmatrix}\mathbf r + \underbrace{\begin{bmatrix}\bar{\mathbf q}_d\\ \mathbf 0\end{bmatrix}}_{\mathbf q_d}, \tag{38}$$

$$\bar{\mathbf D} = \operatorname{diag}\{\mathbf C_r\}, \tag{39}$$

$$\mathbf C_r = \sum_{k=1}^{K} p_k\,\bar{\mathbf g}_k\bar{\mathbf g}_k^H + \sigma_n^2\,\mathbf I_{M-N}, \tag{40}$$

where $\bar{\mathbf g}_k$ denotes the M − N elements of $\mathbf g_k$ corresponding to the M − N one-bit ADCs. The vector $\bar{\mathbf q}_d$ is the (M − N) × 1 quantization noise in the data transmission phase. The covariance matrix in (40) is not diagonal, which makes the analysis difficult. However, we adopt statistics-aware power control [37], i.e., $p_k = p/\beta_k$, and assume that the number of users is relatively large (typical for massive MIMO systems). Then channel hardening occurs [14], and (40) can be approximated as

$$\mathbf C_r \cong \left(Kp + \sigma_n^2\right)\mathbf I_{M-N} = \bar{\mathbf D}. \tag{41}$$

As a result, according to the arcsine law (see (7)), the covariance matrix of the quantization noise in the data transmission phase becomes $\mathbf C_{\bar q_d} \cong (1 - 2/\pi)\,\mathbf I_{M-N}$, and (38) simplifies to

$$\mathbf y_d \cong \mathbf A\left(\sum_{k=1}^{K}\sqrt{p}\,\mathbf h_k s_k + \mathbf n\right) + \mathbf q_d, \qquad \mathbf A = \begin{bmatrix}\alpha\,\mathbf I_{M-N} & \mathbf 0\\ \mathbf 0 & \mathbf I_N\end{bmatrix}, \tag{42}$$

where $\alpha \triangleq \sqrt{\frac{2}{\pi}\,\frac{1}{Kp + \sigma_n^2}}$.

For data detection, the BS selects a linear receiver $\mathbf W \in \mathbb C^{M\times K}$ as a function of the channel estimate. The quantization model in (4) and (5) does not preserve the power of the quantizer input, since the power of the output is forced to be 1. To offset this effect, we premultiply the received signal as follows:

$$\hat{\mathbf y}_d = \mathbf A^{-1}\mathbf y_d. \tag{43}$$

By employing the linear detector W, the resulting signal at the BS is

$$\hat{\mathbf s} = \mathbf W^H\hat{\mathbf y}_d. \tag{44}$$

Thus, the kth element of $\hat{\mathbf s}$ is

$$\hat s_k = \sqrt{p}\,\mathbf w_k^H\mathbf h_k s_k + \sqrt{p}\sum_{i=1,\,i\neq k}^{K}\mathbf w_k^H\mathbf h_i s_i + \mathbf w_k^H\mathbf n + \mathbf w_k^H\mathbf A^{-1}\mathbf q_d, \tag{45}$$

where $\mathbf w_k$ is the kth column of W. We assume that, when decoding, the BS treats $\mathbf w_k^H\mathbf h_k$ as the gain of the desired signal and the other terms of (45) as Gaussian noise⁵. Consequently, we can use the classical bounding technique of [37] to derive an approximation for the ergodic achievable SE of the kth user as

$$S_k = R\left(\mathrm{SQINR}_k\right), \tag{46}$$

where the effective $\mathrm{SQINR}_k$ is defined in (47) and $R(\theta) \triangleq (1 - \eta_{\mathrm{eff}}/T)\log_2(1 + \theta)$. Here $\eta_{\mathrm{eff}}$ is the training duration, equal to η for the pure one-bit architecture and to (M/N)η for the mixed-ADC architecture.

⁴ In designing a mixed-ADC system with round-robin channel training, one should scale the system by the ratio M/N rather than simply increasing the number of antennas M. In particular, increasing the number of BS antennas requires increasing the number of high-resolution ADCs, N, as well.
⁵ In general, the quantization noise is not Gaussian. However, to derive a lower bound for the SE, we assume it is Gaussian with covariance $\mathbf C_{q_d}$.

A. MRC Detection

1) Random Mixed-ADC Detection: In this subsection, we consider the case in which the high-resolution ADCs are connected to an arbitrary set of N antennas. We denote the channel estimate by $\hat{\mathbf H} = [\hat{\mathbf h}_1, \ldots, \hat{\mathbf h}_K]$, set $\mathbf W = \hat{\mathbf H}$, and follow the same reasoning as in [14]. The SE of the mixed-ADC architecture with MRC detection can then be derived as

$$S_k^{\mathrm{MRC}} = R\left(\frac{pM\sigma_{\hat h}^2}{pK + \sigma_n^2 + \frac{1 - 2/\pi}{\alpha^2}\left(1 - \frac{N}{M}\right)}\right), \tag{48}$$

where the channel estimate variance $\sigma_{\hat h}^2 = \sigma_{\hat g_k}^2/\beta_k$ depends on the estimation approach, as given in (12), (18), and (28). From (48), the gain of the mixed-ADC architecture enters the SE through two factors:
- improved channel estimation, through $\sigma_{\hat h}^2$;
- reduced quantization noise, by a factor of 1 − N/M.

2) Mixed-ADC Detection with Antenna Selection: An accurate channel estimate also makes it possible to use the N high-resolution ADCs more intelligently, further improving the performance of the mixed-ADC architecture.
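Two ingredients of (48) can be checked with a short script: the Bussgang gain α and quantization-noise variance 1 − 2/π from (42), and the closed-form MRC expression itself. This is a minimal sketch under these assumptions:
- the unit-power one-bit quantizer is taken to be $(\mathrm{sign}(\Re x) + j\,\mathrm{sign}(\Im x))/\sqrt2$, consistent with "the power of the output is forced to be 1";
- $\sigma_{\hat h}^2 = 0.8$ is a hypothetical placeholder, whereas in the paper it would come from (12), (18), or (28);
- the function names are illustrative.

```python
import numpy as np


def rate(theta: float, eta_eff: float, T: float) -> float:
    """R(theta) = (1 - eta_eff / T) * log2(1 + theta), the pre-log-scaled rate of (46)."""
    return (1.0 - eta_eff / T) * np.log2(1.0 + theta)


def mrc_se_random(p, sigma_n2, M, N, K, T, eta, sigma_h2):
    """Per-user SE (48) for MRC with a random high-resolution ADC assignment.

    sigma_h2 is the normalized channel-estimate variance from the chosen estimator;
    the training length is eta_eff = (M/N) * eta for round-robin training.
    """
    alpha2 = 2.0 / (np.pi * (K * p + sigma_n2))             # alpha^2 defined after (42)
    quant = (1.0 - 2.0 / np.pi) / alpha2 * (1.0 - N / M)    # one-bit quantization term
    sqinr = p * M * sigma_h2 / (p * K + sigma_n2 + quant)
    return rate(sqinr, (M / N) * eta, T)


def bussgang_gain(var: float, n: int = 200_000, seed: int = 0):
    """Monte Carlo Bussgang gain and residual variance of a unit-power one-bit quantizer."""
    rng = np.random.default_rng(seed)
    r = np.sqrt(var / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    y = (np.sign(r.real) + 1j * np.sign(r.imag)) / np.sqrt(2)  # |y|^2 = 1
    gain = np.real(np.mean(y * np.conj(r))) / np.mean(np.abs(r) ** 2)
    resid = np.mean(np.abs(y - gain * r) ** 2)
    return gain, resid


# Hypothetical operating point; sigma_h2 = 0.8 is a placeholder, not a value from the paper.
p, sigma_n2, M, K, T, eta = 1.0, 1.0, 100, 10, 400, 10
gain, resid = bussgang_gain(K * p + sigma_n2)
print(f"gain {gain:.4f} vs alpha {np.sqrt(2 / (np.pi * (K * p + sigma_n2))):.4f}")
print(f"residual {resid:.4f} vs 1 - 2/pi = {1 - 2 / np.pi:.4f}")
for N in (10, 20, 50):
    print(N, round(float(mrc_se_random(p, sigma_n2, M, N, K, T, eta, 0.8)), 3))
```

Sweeping N shows both effects named above. A larger N shrinks the quantization term, but it also shortens the round-robin training, since $\eta_{\mathrm{eff}} = (M/N)\eta$. In this sketch σ²ĥ is held fixed, so the estimation benefit of smaller N is not captured.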
$$\mathrm{SQINR}_k = \frac{p\left|\mathbb E\{\mathbf w_k^H\mathbf h_k\}\right|^2}{p\sum_{i=1}^{K}\mathbb E\left\{\left|\mathbf w_k^H\mathbf h_i\right|^2\right\} - p\left|\mathbb E\{\mathbf w_k^H\mathbf h_k\}\right|^2 + \sigma_n^2\,\mathbb E\left\{\|\mathbf w_k\|^2\right\} + \alpha^{-2}\,\mathbb E\left\{\mathbf w_k^H\mathbf C_{q_d}\mathbf w_k\right\}} \tag{47}$$

A careful look at the SQINR expression in (47) reveals that one-bit quantization affects the SE through the last term of the denominator. Hence, one can maximize the SE by minimizing this term through smart use of the N high-resolution ADCs. We refer to this approach as Mixed-ADC with Antenna Selection. We consider an antenna selection scheme suggested by (47): the N high-resolution ADCs are connected to the antennas corresponding to the rows of $\hat{\mathbf H}$ with the largest energy $\sum_{k=1}^{K}|\hat h_{mk}|^2$. Section VI evaluates this scheme numerically. In addition, Theorem 3 gives a bound on the SE achieved by MRC detection with antenna selection.

Theorem 3. The spectral efficiency of the mixed-ADC system with antenna selection and an MRC receiver is lower bounded by

$$\bar S_k^{\mathrm{MRC}} = R\left(\frac{pM\sigma_{\hat h}^2}{pK + \sigma_n^2 + \frac{1 - 2/\pi}{MK\alpha^2}\sum_{m=1}^{M-N}\chi_m}\right), \tag{49}$$

where

$$\chi_m = \frac{M!}{(m-1)!\,(M-m)!}\sum_{\ell=0}^{M-m}\binom{M-m}{\ell}(-1)^{-\ell}\,\left(\Gamma(K)\right)^{-m-\ell}K^{1-m-\ell}\,\Gamma\left(1 + K(m+\ell)\right) \times F_A^{(m+\ell-1)}\left(1 + K(m+\ell); K, \ldots, K; K+1, \ldots, K+1; -1, \ldots, -1\right), \tag{50}$$

and $F_A$ denotes the Lauricella function of type A [45].

Proof. See Appendix B. ∎

The lower bound (49) explicitly reflects the benefit of antenna selection in the data transmission phase. Comparing (49) with (48), antenna selection improves the SE by replacing 1 − N/M with $\sum_{m=1}^{M-N}\chi_m/(MK)$. Section VI illustrates how antenna selection improves the SE for different SNRs. Theorem 3 assumes that the high-resolution ADCs can be assigned arbitrarily to different RF chains. This may not be possible if the ADC multiplexing is implemented by a bank of subarray switches. The numerical results presented later show that this restriction does not lead to a significant degradation in performance.

B. ZF Detection

In this section, we study the SE of the mixed-ADC architecture with ZF detection. To design a ZF detector adapted to the mixed-ADC architecture, we rewrite the last two terms of the denominator of (47) as follows:

$$\mathbf w_k^H\left(\sigma_n^2\mathbf I_M + \alpha^{-2}\mathbf C_{q_d}\right)\mathbf w_k = \left[\mathbf W^H\mathbf C_{n_{\mathrm{eff}}}\mathbf W\right]_{kk}, \tag{51}$$

where $\mathbf C_{n_{\mathrm{eff}}} = \sigma_n^2\mathbf I_M + \alpha^{-2}\mathbf C_{q_d}$. Accordingly, the ZF detector for the mixed-ADC architecture can be written as

$$\mathbf W = \mathbf C_{n_{\mathrm{eff}}}^{-1}\hat{\mathbf H}\left(\hat{\mathbf H}^H\mathbf C_{n_{\mathrm{eff}}}^{-1}\hat{\mathbf H}\right)^{-1}. \tag{52}$$

Plugging (52) into (47) yields the ZF SQINR expression (53). As in the MRC case, the SQINR in (53) suggests the same antenna selection approach for ZF detection. In general, the expected values in (53) are tractable neither for mixed-ADC detection with an arbitrary antenna assignment nor for mixed-ADC detection with antenna selection. Hence, we evaluate the performance of mixed-ADC with ZF detection numerically in the next section.

C. Massive MIMO with Uniform ADC Resolution

In contrast to the mixed-ADC architecture, where the ADC comparators are concentrated on a few antennas, the comparators can instead be spread uniformly over the array [19], [20], [21], [41], [44]. In this subsection, we provide the SE expressions for such systems, which will be used in the next section for comparisons with the mixed-ADC architecture. The SE for the case of all one-bit ADCs was derived in [14] using the Bussgang decomposition. For ADC resolutions of 2 bits or higher, the AQNM model is sufficiently accurate. Using AQNM and following the same reasoning as in [21], [41], [44], the SE of a massive MIMO system with uniform-resolution ADCs can be derived as

$$\tilde S_k^{\mathrm{MRC}} = R\left(\frac{pM\tilde\sigma_{\hat h}^2}{pK + \sigma_n^2 + \frac{1 - \alpha_0}{\alpha_0}\left(p\left(\tilde\sigma_{\hat h}^2 + K\right) + \sigma_n^2\right)}\right), \tag{54}$$

$$\tilde S_k^{\mathrm{ZF}} = R\left(\frac{p(M-K)\tilde\sigma_{\hat h}^2}{pK\left(1 - \tilde\sigma_{\hat h}^2\right) + \sigma_n^2 + \frac{(M-K)\tilde\sigma_{\hat h}^2}{\alpha_0^2}\,\mathbb E\left\{\mathbf w_k^H\mathbf C_0\mathbf w_k\right\}}\right), \tag{55}$$

for MRC and ZF detection, respectively.
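The mixed-ADC ZF detector (52) and the row-energy antenna selection rule can be prototyped in a few lines. This is a minimal sketch under these assumptions:
- Ĥ is drawn i.i.d. CN(0, σ²ĥ) with a hypothetical σ²ĥ = 0.8;
- $\mathbf C_{q_d}$ is replaced by its approximation (1 − 2/π) on the one-bit antennas and 0 on the high-resolution antennas;
- the helpers `zf_mixed_adc` and `select_antennas` are illustrative names.

The script checks the zero-forcing constraint $\mathbf W^H\hat{\mathbf H} = \mathbf I$. It then compares the noise term $[(\hat{\mathbf H}^H\mathbf C_{n_{\mathrm{eff}}}^{-1}\hat{\mathbf H})^{-1}]_{kk}$ of the ZF SQINR for random versus energy-based assignment. Finally, it estimates the MRC quantization factor that replaces 1 − N/M in (49).

```python
import numpy as np


def zf_mixed_adc(H_hat, hr_idx, sigma_n2, alpha2):
    """Mixed-ADC ZF detector (52) with C_neff = sigma_n^2 I + alpha^{-2} C_qd.

    Assumes C_qd ~= (1 - 2/pi) on the one-bit antennas and 0 on the high-resolution ones,
    so C_neff is diagonal. Returns W and the diagonal of (H^H C_neff^{-1} H)^{-1}.
    """
    M = H_hat.shape[0]
    c = np.full(M, sigma_n2 + (1.0 - 2.0 / np.pi) / alpha2)  # one-bit antennas
    c[hr_idx] = sigma_n2                                       # high-resolution antennas
    Cinv_H = H_hat / c[:, None]                                # C_neff^{-1} H_hat
    G_inv = np.linalg.inv(H_hat.conj().T @ Cinv_H)             # (H^H C^{-1} H)^{-1}
    return Cinv_H @ G_inv, np.real(np.diag(G_inv))


def select_antennas(H_hat, N):
    """Indices of the N rows of H_hat with the largest energy sum_k |h_mk|^2."""
    energy = np.sum(np.abs(H_hat) ** 2, axis=1)
    return np.argsort(energy)[-N:]


# Hypothetical setting: i.i.d. CN(0, sigma_h2) channel estimates, normalized powers.
rng = np.random.default_rng(1)
M, K, N = 100, 10, 20
p, sigma_n2, sigma_h2 = 1.0, 1.0, 0.8
alpha2 = 2.0 / (np.pi * (K * p + sigma_n2))
noise_rand, noise_as, mrc_factor = [], [], []
for _ in range(200):
    H_hat = np.sqrt(sigma_h2 / 2) * (rng.standard_normal((M, K))
                                     + 1j * rng.standard_normal((M, K)))
    W, d_rand = zf_mixed_adc(H_hat, rng.choice(M, size=N, replace=False), sigma_n2, alpha2)
    assert np.allclose(W.conj().T @ H_hat, np.eye(K))        # zero-forcing constraint
    as_idx = select_antennas(H_hat, N)
    _, d_as = zf_mixed_adc(H_hat, as_idx, sigma_n2, alpha2)
    noise_rand.append(d_rand.mean())
    noise_as.append(d_as.mean())
    # MRC quantization factor: one-bit row energies / (M K sigma_h2), cf. (69) and (49)
    energy = np.sum(np.abs(H_hat) ** 2, axis=1)
    one_bit = np.setdiff1d(np.arange(M), as_idx)
    mrc_factor.append(energy[one_bit].sum() / (M * K * sigma_h2))

print(f"ZF noise term, random: {np.mean(noise_rand):.4f}, AS: {np.mean(noise_as):.4f}")
print(f"MRC factor with AS: {np.mean(mrc_factor):.3f} vs 1 - N/M = {1 - N / M:.3f}")
```

Because $\mathbf C_{n_{\mathrm{eff}}}$ is diagonal under the stated approximation, its inverse is applied as an elementwise row scaling. This avoids forming an M × M matrix.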
In (54) and (55),

$$\tilde\sigma_{\hat h}^2 = \frac{\alpha_0^2\,\eta p}{\alpha_0^2\,\eta p + \alpha_0^2\,\sigma_n^2 + \alpha_0(1 - \alpha_0)\left(pK + \sigma_n^2\right)}. \tag{56}$$

The remaining quantities are defined as follows:
- α₀ is a scalar that depends on the ADC resolution and can be found in Table I of [21];
- $\mathbf w_k$ is the kth column of $\mathbf W = \hat{\mathbf H}\left(\hat{\mathbf H}^H\hat{\mathbf H}\right)^{-1}$;
- $\mathbf C_0$ denotes the covariance matrix of the quantization noise under the AQNM model [21].

The detailed calculation of $\mathbb E\{\mathbf w_k^H\mathbf C_0\mathbf w_k\}$ in (55) is provided in [44], and we omit it here for brevity.

$$\mathrm{SQINR}_k^{\mathrm{ZF}} = \frac{p}{pK\left(1 - \sigma_{\hat h}^2\right)\mathbb E\left\{\left[\left(\hat{\mathbf H}^H\mathbf C_{n_{\mathrm{eff}}}^{-1}\hat{\mathbf H}\right)^{-1}\hat{\mathbf H}^H\mathbf C_{n_{\mathrm{eff}}}^{-2}\hat{\mathbf H}\left(\hat{\mathbf H}^H\mathbf C_{n_{\mathrm{eff}}}^{-1}\hat{\mathbf H}\right)^{-1}\right]_{kk}\right\} + \mathbb E\left\{\left[\left(\hat{\mathbf H}^H\mathbf C_{n_{\mathrm{eff}}}^{-1}\hat{\mathbf H}\right)^{-1}\right]_{kk}\right\}} \tag{53}$$

VI. NUMERICAL RESULTS

Substituting (12), (18), and (28) into (48), (49), and (53), we can evaluate the performance of mixed-ADC architectures for different system settings. All of the following experiments use the same setup:
- M = 100 antennas at the BS and K = 10 users;
- the power control approach of [37], so that $p_k\beta_k = p$ for all k;
- an optimal resource allocation [41], [42], in which the training length $\eta_{\mathrm{eff}}$ and the transmit powers during the training phase, $p_t$, and the data transmission phase, $p_d$, are optimized under the power constraint $\eta_{\mathrm{eff}}p_t + (T - \eta_{\mathrm{eff}})p_d = P_{\mathrm{ave}}T$.

In the following figures, the SNR is defined as $\mathrm{SNR} \triangleq P_{\mathrm{ave}}/\sigma_n^2$.

Fig. 5 illustrates the optimal weights for combining high-resolution and one-bit observations in the joint high-resolution/one-bit LMMSE channel estimation. Interestingly, when M/N is large, the one-bit observations are emphasized relative to the high-resolution observations in the low-SNR regime. In addition, the weight of the high-resolution observations rises monotonically with increasing SNR. By contrast, the weight of the one-bit observations grows at first and then decreases to zero.

Fig. 5. Weights used in the LMMSE channel estimator for high-resolution and one-bit observations.
Fig. 6. Sum SE for MRC detection versus SNR for M = 100, N = 20, and T = 400.
Fig. 7. Sum SE for ZF detection versus SNR for M = 100, N = 20, and T = 400.

To study the performance improvement due to joint channel estimation and antenna selection in mixed-ADC massive MIMO, Figs. 6 and 7 show the sum SE of the MRC and ZF detectors, respectively. The system has a coherence interval of T = 400 symbols and N = 20 high-resolution ADCs. The following labels are used in these and subsequent figures:
- "Joint with AS": channel estimation uses both one-bit and high-resolution ADCs, and antenna selection (AS) is used for data detection;
- "Joint without AS": the same case without antenna selection;
- "Joint Subarray AS": antenna selection occurs only within each M/N-element subarray, with one high-resolution ADC assigned to the strongest channel in each subarray;
- "Not Joint without AS": the channel is estimated from the high-resolution observations only, and no antenna selection is employed.

For both MRC and ZF, antenna selection slightly improves the SE at high SNRs, where the channel estimation is most accurate. At low SNR, joint channel estimation provides a gain from the one-bit ADCs, which carry useful information at these SNRs. The constrained AS, required when switching is performed only within subarrays, performs nearly identically to arbitrary AS. The antenna selection gain is small mainly because, with multiple users, selecting a given antenna does not benefit all users simultaneously. Moreover, the strong users responsible for selecting a given antenna will in general differ from antenna to antenna.
Thus, the improvement in signal-to-noise ratio for some users is partly offset by the fact that other users may experience a lower SNR on those same antennas. Antenna selection would bring a much larger benefit if only a single user were present.

Fig. 8. Sum SE for MRC detection versus SNR for M = 100, N = 20, 10, and T = 400, 1000.

Figs. 8 and 9 compare three systems:
- a mixed-ADC massive MIMO system with joint channel estimation and antenna selection;
- an all-one-bit architecture ("One-bit");
- a mixed-ADC system without round-robin training, in which the high-resolution ADCs are connected to a fixed set of antennas without ADC switching or antenna selection ("Non-roundrobin") [27].

Mixed-ADC channel estimation improves accuracy by spending a larger portion of the coherence interval on training, so its benefit is directly related to the length of the coherence interval. For MRC detection with T = 400, the mixed-ADC architecture outperforms the all-one-bit architecture for N = 20. When N = 10, the all-one-bit architecture is better, because of the larger training overhead incurred when N is smaller. For T = 1000, mixed-ADC outperforms the all-one-bit architecture at high SNRs for both N = 10 and N = 20, while the all-one-bit case is still better for N = 10 at low SNRs. Round-robin training provides better SE at high SNR when N = 20 compared with the case without antenna switching, especially for the longer coherence interval. In the other cases, the round-robin training overhead significantly reduces the SE, especially for N = 10 and the shorter coherence interval. For ZF detection, the mixed-ADC architectures can provide very large gains in SE over the one-bit case at high SNRs, regardless of T.
For low SNRs, there is little to no improvement. These cases still show no significant benefit from round-robin training compared with a fixed ADC assignment; only when N = 20 and T = 1000 do we see a slight improvement.

Fig. 9. Sum SE for ZF detection versus SNR for M = 100, N = 20, 10, and T = 400, 1000.
Fig. 10. Sum SE for MRC detection versus T for M = 100, N = 20, and SNR = −10, 0, 10 dB.

For N = 20, Figs. 10 and 11 show how the coherence interval T affects the effectiveness of the mixed-ADC architecture for the MRC and ZF detectors, respectively. For mixed-ADC MRC detection, the best of the three architectures (all one-bit, and mixed-ADC with and without round-robin training) depends on the SNR operating point and the length of the coherence interval. The advantage of round-robin training becomes apparent for long coherence intervals, where the increased training length has a smaller impact. The gain from round-robin training is greatest at higher SNRs. For shorter coherence intervals, mixed-ADC with fixed antenna/ADC assignments provides the best SE, with the largest gains again at higher SNRs. For this value of N, the all-one-bit system generally has the lowest SE, although the difference is not large for MRC.

Fig. 11. Sum SE for ZF detection versus T for M = 100, N = 20, and SNR = −10, 0, 10 dB.
Fig. 12. Sum SE for MRC detection versus SNR for 180 comparators and T = 400, 1000.

The next example investigates the impact of distributing the resolution (i.e., the comparators of the ADCs) across arrays with different numbers of antennas.
If the "high-resolution" ADCs are assumed to have 5 bits [43], a mixed-ADC architecture with N = 20 high-resolution and M − N = 80 one-bit ADCs has 180 comparators in total. Figs. 12 and 13 show the SE achieved by distributing these 180 comparators across arrays of different sizes, for MRC and ZF detection, respectively. The labels in these figures are:
- "Joint with AS" and "Non-roundrobin": mixed-ADC architectures with N = 20 5-bit ADCs and M − N = 80 one-bit ADCs;
- "One-bit": M = 180 antennas with one-bit ADCs;
- "Multi-bit": either M = 90 2-bit ADCs or M = 60 3-bit ADCs.

The figures indicate that for MRC detection, which is interference limited, it is better to have a larger number of antennas with lower-resolution ADCs than fewer antennas with high-resolution ADCs.

Fig. 13. Sum SE for ZF detection versus SNR for 180 comparators and T = 400, 1000.

This is consistent with the results of [30], [39], and is because a larger number of antennas helps the system cancel the interference more effectively. On the other hand, ZF detection is noise limited. There, high-resolution ADCs avoid the additional quantization noise imposed by low-resolution ADCs, and at high SNR this is more beneficial than a larger number of antennas with low-resolution ADCs.

Finally, Figs. 14 and 15 show the impact of the number N of high-resolution ADCs in a mixed-ADC system with M = 100 antennas and K = 10 users, where N = 100 denotes the all-high-resolution system. Given a long enough coherence interval and enough high-resolution ADCs, the mixed-ADC implementation with joint round-robin channel estimation and antenna selection outperforms both the all-one-bit architecture and mixed-ADC without round-robin training.
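The comparator budget and the AQNM estimate quality (56) of the uniform-resolution alternatives can be tabulated as follows. This is a minimal sketch under these assumptions:
- a b-bit ADC counts as b "comparators", which is the counting under which all four configurations above total 180;
- α₀ = 1 − ρ_b, with the ρ_b values commonly tabulated for the AQNM model (e.g., Table I of [21]); these values should be checked against that table;
- the operating point η = K = 10, p = σ²ₙ = 1 is hypothetical.

```python
# Distortion factors rho_b of an optimal b-bit quantizer, alpha_0 = 1 - rho_b.
# These are the values commonly tabulated for the AQNM model (e.g., Table I of [21]);
# treat them as assumptions and check them against that table before reuse.
RHO = {1: 0.3634, 2: 0.1175, 3: 0.03454, 4: 0.009497, 5: 0.002499}


def comparator_budget(config):
    """Total 'comparators' when each b-bit ADC counts as b units, as in the 180 example."""
    return sum(count * bits for count, bits in config)


def aqnm_estimate_variance(bits, eta, p, K, sigma_n2):
    """Normalized channel-estimate variance (56) under the AQNM model."""
    a0 = 1.0 - RHO[bits]
    signal = a0 ** 2 * eta * p
    return signal / (signal + a0 ** 2 * sigma_n2 + a0 * (1.0 - a0) * (p * K + sigma_n2))


configs = {
    "mixed (20 x 5-bit + 80 x 1-bit)": [(20, 5), (80, 1)],
    "one-bit (180 x 1-bit)": [(180, 1)],
    "2-bit (90 x 2-bit)": [(90, 2)],
    "3-bit (60 x 3-bit)": [(60, 3)],
}
for name, cfg in configs.items():
    print(f"{name}: {comparator_budget(cfg)} comparators")

# Hypothetical operating point: eta = K = 10, p = sigma_n^2 = 1.
for b in (2, 3, 5):
    print(b, round(aqnm_estimate_variance(b, eta=10, p=1.0, K=10, sigma_n2=1.0), 4))
```

The estimate variance rises toward the unquantized value ηp/(ηp + σ²ₙ) as the resolution increases. This illustrates the per-antenna quality that the uniform-resolution designs trade against their larger antenna count.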
The gains are greatest when ZF detection is used and the SNR is high, but such gains must be weighed against the increased power consumption and hardware complexity.

Fig. 14. Sum SE for MRC detection versus N for SNR = −10, 0, 10 dB and T = 1000.
Fig. 15. Sum SE for ZF detection versus N for SNR = −10, 0, 10 dB and T = 1000.

VII. CONCLUSION

We studied the spectral efficiency of mixed-ADC massive MIMO systems with either MRC or ZF detection. We showed that properly accounting for the quantized receivers with the Bussgang decomposition is important for an accurate SE analysis. We introduced a joint channel estimation approach that leverages both high-resolution and one-bit ADCs. Our analytical and numerical results confirmed the benefit of joint channel estimation at low SNRs. We also studied mixed-ADC detection with MRC and ZF detectors and antenna selection. Analytical expressions were derived for MRC detection, and the performance of ZF detection was analyzed numerically. Antenna selection was shown to provide a slight advantage at high SNRs, while this advantage tends to disappear at low SNRs.

We showed that the SNR, the number of high-resolution ADCs, and the length of the coherence interval play a pivotal role in determining the performance of mixed-ADC systems. In general, mixed-ADC architectures benefit most over all-low-resolution implementations when ZF detection is used and the SNR is relatively high. In such cases, the gain of the mixed-ADC approach can be substantial. Gains are also possible for MRC, but they are smaller and require more high-resolution ADCs than in the ZF case. The more complicated mixed-ADC approach based on ADC switching and round-robin training can achieve the best performance in some cases. This is particularly true when the coherence interval is long and more high-resolution ADCs are available to reduce the number of training interval repetitions. Otherwise, a mixed-ADC implementation without ADC switching and extra training is preferred.

APPENDIX

A. Proof of Theorem 2

From (2), the observations from the high-resolution ADCs can be written as

$$\mathbf v(0) = \frac{1}{\sqrt{\eta p_k}}\mathbf X\boldsymbol\phi_k^* = \mathbf g_k + \tilde{\mathbf n}(0), \tag{57}$$

where $\tilde{\mathbf n}(0) \sim \mathcal{CN}\left(\mathbf 0, \frac{\sigma_n^2}{\eta p_k}\mathbf I_M\right)$. In addition, from (8), the observations from the one-bit ADCs become

$$\mathbf v(t) = \frac{1}{\sqrt{\eta p_k}}\mathbf Y_t\bar{\boldsymbol\phi}_k^* = \mathbf g_k + \tilde{\mathbf n}(t) + \tilde{\mathbf q}(t), \quad t \in \mathcal T, \tag{58}$$

where $\tilde{\mathbf n}(t) \sim \mathcal{CN}\left(\mathbf 0, \frac{\sigma_n^2}{\eta p_k}\mathbf I_M\right)$ is independent of $\tilde{\mathbf n}(t')$ for $t \neq t'$, and $\tilde{\mathbf q}(t) = \frac{1}{\sqrt{\eta p_k}}\mathbf Q(t)\bar{\boldsymbol\phi}_k^*$. Since the elements of $\mathbf v(t)$ are independent, we can estimate each channel coefficient $g_{mk}$ separately. Stacking all the observations in a vector, we can write

$$\underbrace{\begin{bmatrix}v_m(0)\\ \vdots\\ v_m(t)\\ \vdots\\ v_m\!\left(\frac MN - 1\right)\end{bmatrix}}_{\mathbf v} = \mathbf 1_{\frac MN}\,g_{mk} + \underbrace{\begin{bmatrix}\tilde n_m(0)\\ \vdots\\ \tilde n_m(t) + \tilde q_m(t)\\ \vdots\\ \tilde n_m\!\left(\frac MN - 1\right) + \tilde q_m\!\left(\frac MN - 1\right)\end{bmatrix}}_{\mathbf u}. \tag{59}$$

As a result, the LMMSE estimate of the mth channel coefficient for the kth user is [40]

$$\hat g_{mk} = \left(\frac{1}{\beta_k} + \mathbf 1_{\frac MN}^T\mathbf C_u^{-1}\mathbf 1_{\frac MN}\right)^{-1}\mathbf 1_{\frac MN}^T\mathbf C_u^{-1}\mathbf v. \tag{60}$$

In (60), $\mathbf C_u$ denotes the covariance matrix of u, which is block diagonal of the form

$$\mathbf C_u = \begin{bmatrix}\frac{\sigma_n^2}{\eta p_k} & 0 & \cdots & 0\\ 0 & \sigma_{w_k}^2 & \cdots & \varrho_k\\ \vdots & \vdots & \ddots & \vdots\\ 0 & \varrho_k & \cdots & \sigma_{w_k}^2\end{bmatrix} = \begin{bmatrix}\frac{\sigma_n^2}{\eta p_k} & \mathbf 0\\ \mathbf 0 & \mathbf S\end{bmatrix}, \tag{61}$$

where

$$\varrho_k = \mathbb E\left\{\left(\tilde n_m(t) + \tilde q_m(t)\right)\left(\tilde n_m(t') + \tilde q_m(t')\right)^*\right\}, \quad t \neq t', \tag{62}$$

can be easily calculated with the aid of the Bussgang decomposition and the arcsine law, as in (24). Substituting (61) into (60), we have

$$\hat g_{mk} = \left(\frac{1}{\beta_k} + \frac{\eta p_k}{\sigma_n^2} + \mathbf 1_{\frac MN - 1}^T\mathbf S^{-1}\mathbf 1_{\frac MN - 1}\right)^{-1}\begin{bmatrix}\frac{\eta p_k}{\sigma_n^2} & \mathbf 1_{\frac MN - 1}^T\mathbf S^{-1}\end{bmatrix}\mathbf v. \tag{63}$$

To calculate the inverse of S, we rewrite it as

$$\mathbf S = \left(\sigma_{w_k}^2 - \varrho_k\right)\mathbf I_{\frac MN - 1} + \varrho_k\,\mathbf 1_{\frac MN - 1}\mathbf 1_{\frac MN - 1}^T, \tag{64}$$

and use Woodbury's matrix identity:

$$\mathbf S^{-1} = \frac{1}{\sigma_{w_k}^2 - \varrho_k}\left(\mathbf I_{\frac MN - 1} - \frac{\varrho_k}{\sigma_{w_k}^2 + \left(\frac MN - 2\right)\varrho_k}\,\mathbf 1_{\frac MN - 1}\mathbf 1_{\frac MN - 1}^T\right), \tag{65}$$

which yields

$$\mathbf 1_{\frac MN - 1}^T\mathbf S^{-1} = \frac{1}{\sigma_{w_k}^2 + \left(\frac MN - 2\right)\varrho_k}\,\mathbf 1_{\frac MN - 1}^T, \tag{66}$$

$$\mathbf 1_{\frac MN - 1}^T\mathbf S^{-1}\mathbf 1_{\frac MN - 1} = \frac{\frac MN - 1}{\sigma_{w_k}^2 + \left(\frac MN - 2\right)\varrho_k}. \tag{67}$$

Substituting (66) and (67) into (63) completes the proof. ∎

B. Proof of Theorem 3

Let $\mathcal M = \{1, \ldots, M\}$, and denote the energy of the mth row of $\hat{\mathbf H}$, $m \in \mathcal M$, by $E_m$, i.e.,

$$E_m \triangleq \sum_{k=1}^{K}\left|\hat h_{mk}\right|^2. \tag{68}$$

To perform antenna selection, we connect the N high-resolution ADCs to the antennas corresponding to the largest $E_m$. Let the set $\mathcal N$ contain the indices of the N antennas connected to the high-resolution ADCs. Hence, we have

$$\sum_{k=1}^{K}\mathbb E\left\{\hat{\mathbf h}_k^H\mathbf C_{q_d}\hat{\mathbf h}_k\right\} = K\,\mathbb E\left\{\hat{\mathbf h}_k^H\mathbf C_{q_d}\hat{\mathbf h}_k\right\} = \left(1 - \frac{2}{\pi}\right)\sum_{m\in\mathcal M\setminus\mathcal N}\mathbb E\{E_m\}. \tag{69}$$

Eq. (69) provides a criterion for connecting the N high-resolution ADCs in the data transmission phase. It states that, for the MRC receiver, the expected value in (69) is minimized when the high-resolution ADCs are connected to the antennas with the largest $E_m$. Denote by $E_{(m)}$ the mth smallest value of $E_m$, i.e., $E_{(1)} \le E_{(2)} \le \cdots \le E_{(M)}$. Hence, $E_{(m)}$ is the mth order statistic. Assuming that the $E_m$ are independent and identically distributed, we have [46]

$$\mathbb E\{E_{(m)}\} = M\binom{M-1}{m-1}\int_{-\infty}^{\infty} x\,[F(x)]^{m-1}\,[1 - F(x)]^{M-m}\,dF(x), \tag{70}$$

where x denotes a realization of $E_m$ and F(x) is the cumulative distribution function of $E_m$. In the case considered here, the channel coefficients are i.i.d. Rayleigh distributed, so the $E_m$ are independent Gamma random variables with

$$F(x) = \frac{\gamma\left(K, x/\sigma_{\hat h}^2\right)}{\Gamma(K)}, \tag{71}$$

where γ(·,·) denotes the lower incomplete Gamma function. From [47], the integral (70) can be calculated in closed form for Gamma random variables as

$$\mathbb E\{E_{(m)}\} = \sigma_{\hat h}^2\,\chi_m. \tag{72}$$

This is in contrast to the unordered case, where $\mathbb E\{E_m\} = K\sigma_{\hat h}^2$. As a result,

$$\min\left\{\mathbb E\left\{\hat{\mathbf h}_k^H\mathbf C_{q_d}\hat{\mathbf h}_k\right\}\right\} = \left(1 - \frac{2}{\pi}\right)\frac{\sigma_{\hat h}^2}{K}\sum_{m=1}^{M-N}\chi_m. \tag{73}$$

The remaining terms in (47) can be calculated as in the case where the high-resolution ADCs are connected to arbitrary antennas. Plugging these terms and (73) into (47), some algebraic manipulation yields (49).

REFERENCES

[1] H. Pirzadeh and A. Swindlehurst, “Analysis of MRC for Mixed-ADC Massive MIMO,” in Proc. IEEE Int. Workshop Comput. Adv. Multi-Sensor Adaptive Process., 2017.
[2] T. L. Marzetta, “Noncooperative cellular wireless with unlimited numbers of base station antennas,” IEEE Trans. Wireless Commun., vol. 9, no. 11, pp. 3590-3600, Nov. 2010.
[3] L. Lu, G. Y. Li, A. Swindlehurst, A. Ashikhmin, and R. Zhang, “An overview of massive MIMO: Benefits and challenges,” IEEE J. Sel. Topics in Signal Process., vol. 8, no. 5, pp. 742-758, Oct. 2014.
[4] E. G. Larsson, O. Edfors, F. Tufvesson, and T. L. Marzetta, “Massive MIMO for next generation wireless systems,” IEEE Commun. Mag., vol. 52, no. 2, pp. 186-195, Feb. 2014.
[5] H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, “Energy and spectral efficiency of very large multiuser MIMO systems,” IEEE Trans. Commun., vol. 61, no. 4, pp. 1436-1449, Apr. 2013.
[6] H. Yang and T. L. Marzetta, “Performance of conjugate and zero-forcing beamforming in large-scale antenna systems,” IEEE J. Sel. Areas in Commun., vol. 31, no. 2, pp. 172-179, Feb. 2013.
[7] E. Björnson, J. Hoydis, M. Kountouris, and M. Debbah, “Massive MIMO systems with non-ideal hardware: Energy efficiency, estimation, and capacity limits,” IEEE Trans. Inf. Theory, vol. 60, no. 11, pp. 7112-7139, Nov. 2014.
[8] E. Björnson, M. Matthaiou, and M. Debbah, “Massive MIMO with nonideal arbitrary arrays: Hardware scaling laws and circuit-aware design,” IEEE Trans. Wireless Commun., vol. 14, no. 8, pp. 4353-4368, Aug. 2015.
[9] C. Mollén, E. Larsson and T.
Eriksson, “Waveforms for the massive MIMO downlink: Amplifier efficiency, distortion, and performance,” IEEE Trans. Commun., vol. 64, no. 12, pp. 5050-5063, Dec. 2016.
[10] C. Mollén, U. Gustavsson, T. Eriksson, and E. Larsson, “Spatial characteristics of distortion radiated from antenna arrays with transceiver nonlinearities,” arXiv preprint, arXiv:1711.02439.
[11] C. Mollén, E. Larsson, U. Gustavsson, T. Eriksson, and R. Heath Jr., “Out-of-Band radiation from large antenna arrays,” arXiv preprint, arXiv:1611.01359.
[12] C. Mollén, U. Gustavsson, T. Eriksson, and E. Larsson, “Impact of spatial filtering on distortion from low-noise amplifiers in massive MIMO base stations,” arXiv preprint, arXiv:1712.09612, submitted to IEEE Trans. Commun.
[13] Q. Bai and J. A. Nossek, “Energy efficiency maximization for 5G multi-antenna receivers,” Trans. Emerging Telecommun. Technol., vol. 26, no. 1, pp. 3-14, 2015.
[14] Y. Li, C. Tao, L. Liu, A. Mezghani, G. Seco-Granados, and A. Swindlehurst, “Channel estimation and performance analysis of one-bit massive MIMO systems,” IEEE Trans. Signal Process., vol. 65, no. 15, pp. 4075-4089, May 2017.
[15] C. Mollén, J. Choi, E. G. Larsson, and R. W. Heath, “Uplink performance of wideband massive MIMO with one-bit ADCs,” IEEE Trans. Wireless Commun., vol. 16, no. 1, pp. 87-100, Jan. 2017.
[16] S. Jacobsson, G. Durisi, M. Coldrey, U. Gustavsson, and C. Studer, “Throughput analysis of massive MIMO uplink with low-resolution ADCs,” IEEE Trans. Wireless Commun., vol. 16, no. 6, pp. 4038-4051, June 2017.
[17] C. Studer and G. Durisi, “Quantized massive MU-MIMO-OFDM uplink,” IEEE Trans. Commun., vol. 64, no. 6, pp. 2387-2399, June 2016.
[18] J. Mo and R. W. Heath, “Capacity analysis of one-bit quantized MIMO systems with transmitter channel state information,” IEEE Trans. Signal Process., vol. 63, no. 20, pp. 5498-5512, Oct. 2015.
[19] M. Sarajlić, L. Liu, and O.
Edfors, “When are low resolution ADCs energy efficient in massive MIMO?,” IEEE Access, vol. 5, pp. 14837-14853, July 2017.
[20] D. Verenzuela, E. Björnson, and M. Matthaiou, “Hardware design and optimal ADC resolution for uplink massive MIMO systems,” in IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM), Rio de Janeiro, Brazil, July 2016.
[21] L. Fan, S. Jin, C. Wen, and V. Zhang, “Uplink achievable rate for massive MIMO systems with low-resolution ADC,” IEEE Commun. Lett., vol. 19, no. 12, pp. 2186-2189, Dec. 2015.
[22] J. Zhang, L. Dai, S. Sun, and Z. Wang, “On the spectral efficiency of massive MIMO systems with low-resolution ADCs,” IEEE Commun. Lett., vol. 20, no. 5, pp. 842-845, May 2016.
[23] N. Liang and W. Zhang, “Mixed-ADC massive MIMO,” IEEE J. Sel. Areas in Commun., vol. 34, no. 4, pp. 983-997, Apr. 2016.
[24] N. Liang and W. Zhang, “Mixed-ADC massive MIMO uplink in frequency-selective channels,” IEEE Trans. Commun., vol. 64, no. 11, pp. 4652-4666, Nov. 2016.
[25] W. Tan, S. Jin, C. Wen, and Y. Jing, “Spectral efficiency of mixed-ADC receivers for massive MIMO systems,” IEEE Access, vol. 4, pp. 7841-7846, Aug. 2016.
[26] J. Zhang, L. Dai, Z. He, S. Jin, and X. Li, “Performance analysis of mixed-ADC massive MIMO systems over Rician fading channels,” IEEE J. Sel. Areas in Commun., vol. 35, no. 6, pp. 1327-1338, June 2017.
[27] H. Pirzadeh and A. Swindlehurst, “Spectral efficiency under energy constraint for mixed-ADC MRC massive MIMO,” IEEE Sig. Process. Lett., vol. 24, no. 12, pp. 1847-1851, Oct. 2017.
[28] T. C. Zhang, C. K. Wen, S. Jin, and T. Jiang, “Mixed-ADC massive MIMO detectors: Performance analysis and design optimization,” IEEE Trans. Wireless Commun., vol. 15, no. 11, pp. 7738-7752, Nov. 2016.
[29] J. Liu, J. Xu, W. Xu, S. Jin, and X. Dong, “Multiuser massive MIMO relaying with Mixed-ADC receiver,” IEEE Sig. Process. Lett., vol. 24, no. 1, pp. 76-80, Dec. 2016.
[30] J. Park, S. Park, A. Yazdan, and R. W.
Heath, “Optimization of Mixed-ADC multi-antenna systems for Cloud-RAN deployments,” IEEE Trans. Commun., vol. 65, no. 9, pp. 3962-3975, Sep. 2017.
[31] J. J. Bussgang, “Crosscorrelation functions of amplitude-distorted Gaussian signals,” Res. Lab. Electron., Massachusetts Inst. Technol., Cambridge, MA, USA, Tech. Rep. 216, 1952.
[32] G. Jacovitti and A. Neri, “Estimation of the autocorrelation function of complex Gaussian stationary processes by amplitude clipped signals,” IEEE Trans. Inf. Theory, vol. 40, no. 1, pp. 239-245, Jan. 1994.
[33] A. Garcia-Rodriguez, C. Masouros, and P. Rulikowski, “Reduced Switching Connectivity for Large Scale Antenna Selection,” IEEE Trans. Commun., vol. 65, no. 5, pp. 2250-2263, May 2017.
[34] Y. Gao, H. Vinck, and T. Kaiser, “Massive MIMO antenna selection: Switching architectures, capacity bounds, and optimal antenna selection algorithms,” IEEE Trans. Sig. Process., vol. 66, no. 5, pp. 1346-1360, Mar. 2018.
[35] X. Gao, O. Edfors, F. Tufvesson, and E. Larsson, “Multi-Switch for antenna selection in massive MIMO,” in Proc. IEEE Global Communications Conference (GLOBECOM), San Diego, CA, 2015.
[36] A. Le Fevre and R. Flett, “A 100 Mb/s Multi-LAN crosspoint chip set for cable management,” IEEE J. Solid-State Circuits, vol. 32, no. 7, pp. 1115-1121, July 1997.
[37] E. Björnson, E. G. Larsson, and M. Debbah, “Massive MIMO for maximal spectral efficiency: How many users and pilots should be allocated?,” IEEE Trans. Wireless Commun., vol. 15, no. 2, pp. 1293-1308, Feb. 2016.
[38] “http://www.analog.com/media/en/news-marketing-collateral/productselection-guide/HighSpeedSwitches.pdf”
[39] H. Pirzadeh and A. Swindlehurst, “On the optimality of mixed-ADC massive MIMO with MRC detection,” in Proc. Int. ITG Workshop Smart Antennas (WSA), 2017.
[40] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Englewood Cliffs, NJ: Prentice-Hall, 1993.
[41] L. Fan, S. Jin, C. K. Wen, and M.
Matthaiou, “Optimal pilot length for uplink massive MIMO systems with low-resolution ADC,” in IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM), Rio de Janeiro, Brazil, July 2016.
[42] H. Q. Ngo, M. Matthaiou, and E. G. Larsson, “Massive MIMO with optimal power and training duration allocation,” IEEE Wireless Commun. Lett., vol. 3, no. 6, pp. 605-608, Dec. 2014.
[43] K. Roth, H. Pirzadeh, A. L. Swindlehurst, and J. A. Nossek, “A comparison of hybrid beamforming and digital beamforming with low-resolution ADCs for multiple users and imperfect CSI,” IEEE J. Sel. Topics Signal Process., to be published, doi: 10.1109/JSTSP.2018.2813973.
[44] D. Qiao, W. Tan, Y. Zhao, C.-K. Wen, and S. Jin, “Spectral efficiency for massive MIMO zero-forcing receiver with low-resolution ADC,” in IEEE Wireless Communication and Signal Processing (WCSP), Yangzhou, China, Oct. 2016.
[45] Q. Shi and Y. Karasawa, “Some applications of Lauricella hypergeometric function $F_A$ in performance analysis of wireless communications,” IEEE Commun. Lett., vol. 16, no. 5, pp. 581-584, May 2012.
[46] H. A. David, Order Statistics, 2nd ed. New York: Wiley, 1981.
[47] S. Nadarajah and M. Pal, “Explicit expressions for moments of gamma order statistics,” Bulletin of the Brazilian Mathematical Society, New Series, vol. 39, no. 1, pp. 45-60, Mar. 2008.
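As a numerical sanity check of the order-statistic expectation (70) in the appendix, the sketch below evaluates its integral with composite Simpson's rule and compares the result against a closed form and a Monte Carlo estimate obtained by sorting samples. The definition of $E_m$ for the Rayleigh case is cut off in this excerpt, so the sketch assumes, purely for illustration, that each $E_m$ is a unit-mean exponential variable (the squared magnitude of a single unit-variance Rayleigh coefficient). For that distribution the mean of the $m$th smallest of $M$ samples is known to be $\sum_{j=1}^{m} 1/(M-j+1)$. All function names are hypothetical and not taken from the paper.

```python
import math
import random


def order_stat_mean(m, M, cdf, pdf, lo, hi, n=20000):
    """Evaluate (70): mean of the m-th smallest of M i.i.d. samples.

    Uses composite Simpson's rule on [lo, hi]; the density is assumed
    negligible outside this interval.
    """
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    coef = M * math.comb(M - 1, m - 1)  # M * binom(M-1, m-1)
    h = (hi - lo) / n

    def integrand(x):
        F = cdf(x)
        # x [F(x)]^(m-1) [1 - F(x)]^(M-m) f(x), with dF(x) = f(x) dx
        return x * F ** (m - 1) * (1.0 - F) ** (M - m) * pdf(x)

    total = integrand(lo) + integrand(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(lo + k * h)
    return coef * total * h / 3.0


# Illustrative assumption: E_m ~ Exp(1), i.e. |h|^2 for one unit-variance
# Rayleigh coefficient. This is not necessarily the E_m used in the paper.
def exp_cdf(x):
    return 1.0 - math.exp(-x)


def exp_pdf(x):
    return math.exp(-x)


def exp_order_stat_mean(m, M):
    """Closed form for Exp(1) samples: sum_{j=1}^{m} 1 / (M - j + 1)."""
    return sum(1.0 / (M - j + 1) for j in range(1, m + 1))


def monte_carlo_order_stat_mean(m, M, trials=20000, seed=1):
    """Empirical mean of the m-th smallest of M Exp(1) draws."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        acc += sorted(rng.expovariate(1.0) for _ in range(M))[m - 1]
    return acc / trials


if __name__ == "__main__":
    M = 16
    for m in (1, M // 2, M):
        print(m,
              round(order_stat_mean(m, M, exp_cdf, exp_pdf, 0.0, 50.0), 4),
              round(exp_order_stat_mean(m, M), 4),
              round(monte_carlo_order_stat_mean(m, M), 4))
```

For the largest sample ($m = M$) the closed form reduces to the harmonic number $H_M$, roughly 3.38 for $M = 16$. Simpson's rule keeps the sketch dependency-free; `scipy.integrate.quad` would be a natural alternative for other choices of $F(x)$, such as the gamma case treated in [47].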