Multi-Attribute Missing Data Reconstruction Based On Adaptive Weighted Nuclear Norm Minimization in Iot
Multi-Attribute Missing Data Reconstruction Based On Adaptive Weighted Nuclear Norm Minimization in Iot
Multi-Attribute Missing Data Reconstruction Based On Adaptive Weighted Nuclear Norm Minimization in Iot
ABSTRACT In the Internet of Things, data sets gathered from sensor nodes are missing a significant
fraction of data, owing to noise, collision, unreliable links, and unexpected damage. This phenomenon is
more serious in some scenarios, which limits the applications of sensor data. It is therefore necessary to
develop methods to reconstruct these lost data with high accuracy. In this paper, a new multi-attribute missing
data reconstruction method based on adaptive weighted nuclear norm minimization is developed, and a
K -means clustering analysis is embedded into this forecasting process to improve the prediction accuracy.
First, to ensure that sensors in one group have similar patterns of measurement, we use a traditional machine
learning algorithm, called K -means clustering algorithm to separate sensors into different groups. Second,
considering the correlations among multiple attributes of sensor data and its joint low-rank characteristics,
we propose an algorithm based on the matrix rank-minimization method of automatic weighted nuclear norm
minimization, which adaptively assigns different weights to each singular value simultaneously. Moreover,
we use the alternating direction method of multipliers to obtain its optimal solution. Finally, we evaluate
the proposed method by using a real sensor data set from the Intel Berkeley Research Laboratory with two
missing patterns, namely, random missing pattern and consecutive missing pattern. The simulation results
prove that our algorithm performs well even with a small number of samples, and it can propagate through
a structure to fill in large missing regions.
INDEX TERMS Data reconstruction, Internet of Things (IoT), K -means, multi-attribute, tensor completion,
weighted nuclear norm minimization (WNNM).
I. INTRODUCTION IoT has employed many technologies and the core tech-
Owing to advancements in information technology, the new nologies of it mainly include Radio Frequency Identifica-
era of the Internet of Things (IoT) is encompassing comput- tion (RFID), sensor technology, wireless network technology,
ing and communication technologies, spanning every aspect artificial intelligence and cloud computing [1]. IoT is widely
of our lives, and it has emerged as one of central issues in used in conventional Long Term Evolution (LTE), Wi-Fi,
our daily lives. The IoT is an intelligent network, it has ZigBee, wireless sensor network (WSN), as well as device-
been defined as a global network with an infrastructure that to-device (D2D) communication [4], transmit antenna selec-
has self-configuring capabilities. Specifically, the sensors are tion (TAS) in cooperative networks [5], the low-power wide
embedded in various objects and then integrated with the area network from the LoRa Alliance (LoRaWAN), narrow
existing Internet to realize the information exchange between band IoT (Nb-IoT), Ethernet and many other communications
human society and physical systems. IoT is to fully utilize the technologies. Therefore, IoT is rapidly transforming into a
new generation of IT technology in all walks of life, which is highly heterogeneous ecosystem that provides inter operabil-
called the third wave of the world information industry after ity among different types of devices and communications
computers and the Internet [1]–[3]. technologies [6]–[8].
2169-3536
2018 IEEE. Translations and content mining are permitted for academic research only.
VOLUME 6, 2018 Personal use is also permitted, but republication/redistribution requires IEEE permission. 61419
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
X. Yu et al.: Multi-Attribute Missing DR-AWNNM in IoT
Stylized as IoT, it is the interconnection of everyday and voltage. Thus, we can assume that there are a few con-
‘‘thing,’’ such as vehicles, buildings, and other Internet- nections between these attributes objectively. For instance,
connected devices embedded with software, electronics, actu- when light illumination is increased, the ambient temperature
ators, and sensors, enabling them to send and receive data. will increase simultaneously, and air humidity will decrease.
Thus, data is the most crucial and significant asset responsible An empirical study [16] revealed that temperature, dewpoint
for the functioning of these smart devices. IoT applications temperature, and relative humidity are linearly correlated.
gather huge volumes of multiple attribute data from the phys- That is, the correlations among the attributes can be used as a
ical world through all connected sensors and reconstruct envi- supplement of the internal correlations to increase estimation
ronmental data in the cyber world [9]. For instance, scientists accuracy.
have revealed the plant evolution based on wind speed, air In this paper, a new approach to reconstructing missing
humidity, and air temperature data [10]. Volcanic eruptions values in IoT data is proposed. This method implements
have been predicted based on temperature and shake data of a K -means clustering algorithm to separate sensors into dif-
volcanoes [11]. ferent groups. The main goal of clustering is to increase the
IoT applications have strict requirements in terms of data similarity within the same group and the difference between
integrity, correctness, and on-time delivery [12]. However, different clusters. Thereafter, we use the potential relation-
in many IoT applications, massive data loss is common and ships among multiple data attributes to propose an algorithm
unavoidable. For example, the Intel Berkeley Research lab based on the matrix rank-minimization method, which is an
dataset [11] is missing roughly 50% of data, the Ocean Sense approach to low rank matrix approximation [17], to recon-
project is missing 64% of data [13], and the GreenOrbs struct the multi-attribute sensor data within each cluster.
project is missing 35% of data [14]. This can be ascribed To improve data reconstruction performance, we enhance
to many reasons. On the one hand, these data have the our algorithm by normalizing the data and using weighted
characteristics of large scale, high dimensions, and complex nuclear norm minimization (WNNM) [18]. Our contributions
structures. Data loss or damage may occur during acquisition, are summarized as follows:
transmission, and storage. On the other hand, sensor nodes 1) We use the K -means clustering algorithm to divide the
have limited capabilities, so limitations in terms of energy, sensor nodes into different clusters, so as to ensure
storage, communication capabilities, battery depletion, hard- that sensors within one group have similar patterns of
ware failures, or other environmental and communications measurement.
issues may cause lead to the capture of incomplete data. 2) To the best of our knowledge, this is the first work
Loss of sensing data hinders various IoT applications and to apply adaptive weighted nuclear norm minimization
makes it more difficult to process sensor data. Moreover, algorithm in tensor-based method to sensor data recon-
if the missing data values cannot be filled in accurately, struction problem.
the existing analysis tools cannot be applied; if the missing 3) We combine the multi-attribute correlation of sensor
data are deleted directly, a large amount of raw data would be data and the low-rank minimization technique to pro-
lost, which would reduce the accuracy and reliability of the pose a tensor-based algorithm, named DR-AWNNM,
analysis results and lead to the wastage of a massive amount for reconstruction missing values.
of energy. According to a report by IBM, tons of data are The remainder of this paper is organized as follows.
produced every day throughout the world. Currently, it is In Section II, we present related work. Section III describes
about 2.5 quintillion bytes per day. With IoT, this number will problem formulation. The performance of the proposed
increase considerably. For example, in 2009, there were about method is evaluated in Section IV. Section V provides our
0.9 billion smart objects, and their number is projected to concluding remarks and an outline for future work.
reach 26 billion by 2020. The more widespread the use of this
technology, the greater will be the volume of data produced. II. RELATED WORK
Recovering missing sensor data effectively to better analyze A. EXISTING APPROACHES AND THEIR LIMITATIONS
them for IoT applications is a major challenge. Therefore, In the modern IoT paradigm, data integrity is the most impor-
it is urgent and important to design effective methods for tant aspect that influences the overall performance of any
reconstructing missing values in big sensor data. system. The IoT is used for many critical applications, such
Many studies have contributed techniques to predict miss- as telemetry in hazardous environments, control of industrial
ing sensor data. Most such techniques are based on temporal processes, e-Health, smart transportation systems, national
methods, spatial methods, or spatio-temporal methods. How- security, etc. Recently, IoT was used also for the network
ever, in a few special scenarios, high data loss rates may veil monitoring to manage the performance of 5G heterogeneous
temporal and spatial correlations. Therefore, it is necessary to networks under variable conditions The problem of missing
find new ways to address this problem. We are aware that a sensor data in WSN has been known for a long time [6].
sensing node is usually integrated with a multi-function sen- WSN, which is one of the key technologies of IoT, has been
sor, and these nodes usually gather multiple attributes simul- researched extensively on the back of rapid development of
taneously, for example, the data collected by sensor nodes wireless communication technology, microelectronics tech-
in [15] contains four attributes: temperature, humidity, light, nology, and embedded computing technology [9].
FIGURE 3. Iterations of K-means clustering algorithm (K = 9, (a)-Step 1, (b)-step 2 and 3, (c)-step 4 and 5).
Step 6: Iterate steps 2, 3, 4 and 5 until set Ck no longer TABLE 1. The data format.
changes or the difference between adjacent iterations is
smaller than the given threshold:
X
4ck < η. (11) Let T be a tensor of the sensor data with M attributes col-
k lected by N nodes within Q time slots, that is, T ∈ RN ×M ×Q ;
the tensor is a generalization of the matrix concept. We use
where η is a small positive number, here we set
tn,m,q to represent the entry of T . Owing to data loss in IoT,
η = 1e−4 [35].
T is usually an incomplete tensor. The intact information of
Figure 3 shows the procedure the five-step procedure of
T is a set of entries tn,m,q , for (n, m, q) ∈ , where is
one iteration. The red solid points represent the cluster cen-
the set of sampled entries of T . We use P (·) to indicate the
ters, and the stars represent the sensor nodes; the different
sampling operator, which is defined as follows:
colors of sensor points indicate different clusters. (
tn,m,q , if (n, m, q) ∈ ,
B. DATA RECONSTRUCTION WITH ADAPTIVE WEIGHTED [P (T )]n,m,q = (12)
0, otherwise.
NUCLEAR NORM (DR-AWNNM)
Suppose that N nodes are deployed in a monitoring area, each The multi-attribute sensor data reconstruction problem is
of which is equipped with K ∗ sensors to monitor different defined as follows:
at-tributes simultaneously. The monitoring period consists Given a subset of T , denoted as P (T ), the ultimate goal
of Q time slots. The sensor data gathered in one node can is to find an optimal solution T̂ that minimizes the error
be organized in the following format [29], where sensor between T and T̂ :
ID stands for sensor identity number, time stamp represents
sampling time, and the attributes include temperature and min kT − T̂ kF
humidity. s.t. P (T̂ ) = P (T ), (13)
q 2
where kX kF := ïijĹ(6i1 ,i2 ,...,in xi1 ,i2 ,...,in )ïijĽ is defined and T denote the incomplete tensor of multi-attribute sen-
as the Frobenius norm of tensor X . sor data. Then, the missing values in Eq.(16) can be estimated
Many studies [19], [32], have mentioned that the sensor effectively by solving the following convex optimization
data tensor is low-rank, so the missing values in T can problem,
be recovered using the matrix complement based on nuclear 3 3
Xi,(i)
+ 1
H ·
X X
2
norm relaxation: Xi − H · T
F .
minimize w,∗
X 2λ
i=1 i=1
min kX k∗
(18)
s.t. P (X ) = P (T ), (14)
where (·) denotes element-wise production of the tensor,
where kX k∗ is the nuclear norm of X , which is defined and λ is an adjustable parameter introduced to prevent
N
as kX k∗ :=
P
αi
X(i)
∗ , where αi is the weight cor-
over-fitting.
i=1
responding to the unfolding of the i-th mode. Here, let C. ALTERNATE DIRECTION MULTIPLIER METHOD
α1 = α2 = α3 (N = 3), which means each X(i) is FOR DR-AWNNM
assigned equal importance. However, traditional NNM has The alternate direction multiplier method (ADMM) algo-
certain limitations, for example, all singular values are treated rithm [36], a convex optimization algorithm, is widely used
equally and shrunk with the same threshold [18]. In doing so, to solve convex optimization problems by breaking them into
prior knowledge on singular values of a practical data matrix smaller pieces, each of which can then be handled easily. It is
is ignored. More specifically, larger singular values of an an extension of the augmented Lagrange multiplier method
input data matrix quantify the information of its underlying algorithm . Therefore, in the present study, we attempted
principal directions. In other words, inner the same unfold- to use the ADMM algorithm to solve the formulated
ing X(i) , considering the information included in different problem.
singular value is different [33]. Obviously, the traditional To apply the ADMM method to problem (18), we need to
NNM model, as well as its corresponding soft-thresholding transform it into the ADMM form. Thus, we first perform
solvers, are not adequately flexible to handle such variable splitting. We introduce N new tensor-valued vari-
issues. ables, M1 , M2 , . . . , MN . Let Xi = Mi (i ∈ 1, 2, . . . , N );
To improve the flexibility of NNM, weak shrinkage by using these new variables Mi , Eq. (18) can be rewritten
strength should be enforced on the large singular value, as follows:
while a large weight should be assigned to the small singular
3 3
value. Based on this idea, we use the weighted nuclear norm
Xi,(i)
+ 1
H ·
X
X
2
minimize w,∗
Mi − H · T
F
minimization (WNNM) algorithm, and define the weighted Xi ,Mi 2λ
i=1 i=1
nuclear norm of X as follows:
s.t. Xi = Mi , i = 1, 2, . . . , N (19)
3
X
wi σi (X(i) ),
X(i)
= (15) The augmented Lagrangian of (19) becomes
w,∗
j=1 ρ
L(Xi , Mi , Yi ) = hYi , Xi − Mi i + kXi − Mi k2F
2
where w = [w1 , w2 , . . . , wn ]T and wi ≥ 0, which is a 3
non-negative weight assigned to σi (X ). The weight vector 1
X
2
+
H · Mi − H · T
F
enhances the representation capability of the original nuclear 2λ
i=1
norm. 3
Therefore, the proposed model is revised as follows:
X
Xi,(i)
,
+ w,∗
(20)
3
X i=1
min
X(i)
For convenience, by combining the linear and quadratic
w,∗
i=1 terms in the augmented Lagrangian and scaling the dual
s.t. P (X ) = P (T ), (16) variable, Eq.(20) can be simplified as follows
For convenience, we define a sampling tensor H, 3
Xi,(i)
+ ρ kXi − Mi + Ui k2
X
L(Xi , Mi , Yi ) =
where w,∗ F
2
( i=1
1, if(i, j, . . . , n) ∈ 3
hi,j,...,n = (17) 1
X
2
0, otherwise, +
H · Mi − H · T
F + const,
2λ
i=1
Clearly, H is a binary tensor that indicates whether any data (21)
in T are missing.
Now, we can define the multi-attribute sensor-data recon- where const represents a constant term, U is the scaled dual
struction problem. Let H denote a binary sampling tensor, variable [36], and Ui = ρ −1 Yi . Now, we can obtain the
iterations of ADMM as follows. According to the method in [31], the update of M is equal
ρ
2 to the iteration of M̄ :
Xik+1 = argmin (
Xi,(i)
w,∗ +
Xi + Uik − Mki
),
2 1
2
Xi F M̄k+1 = argmin (
H · N M̄ − H · T
F
N M̄i 2λ
1
X
2
Mk+1 = argmin (
H · Mi − H · T
F ρ
2
i
2λ +
Z̄ − X̄ k+1 − Ū k
), (27)
Mi i=1 2 F
ρ
2
where
+
Mi − Xik+1 − Uik
),
2 F N
1 X
Uik+1 = Uik + Xik+1 − Mk+1
i . (22) M̄ = Mi ,
N
i=1
1) UPDATE X N
1 X
Before the update step for X , we need to introduce the X̄ k+1 = Xik+1 ,
following definition and theorem. N
i=1
Definition 1: Given a matrix Y, the WNNM problem [18] 1 XN
aims to find a matrix X, which is as close to Y as possible Ū k = Uik . (28)
N
under certain data fidelity functions, and the WNNM problem i=1
is described as follows: and then we get the M̄-update solution :
min kY − Xk2F + kXkw,∗ (23) M̄k+1 = M+ · /M− , (29)
X
here, we let where
C M− = (1/λ)(H + ρI),
wi = , (24)
σi (X(i) ) + ξ M+ = ρ(X̄ k+1 + Ū k + (1/(N λ))H · T ). (30)
where ξ is a small constant, and C is a compromising con-
stant, both constants were set empirically in our experiment. 3) THE DR-AWNNM ALGORITHM
Theorem 1: ∀Y ∈ <m×n , denote by Y = U6VT theSVD After discussing the appearing subproblem, we present the
diag(σ1 (Y), σ2 (Y), . . . , σn (Y)) complete DR-AWNNM algorithm for multi-attribute sensor-
of it, where 6 = and data reconstruction, as in Alg.1.
0,
σi (Y) denotes the i-th singular value of Y. If ε and C satisfy DR-AWNNM algorithm can be mainly divided into four
√
the inequality ε < C
(C), σi (Y ) , and the weights satisfy steps: Step 1: The updating of parameter Xik+1 ; Step 2: Calcu-
w1 ≥ · · · ≥ wn ≥ 0 simultaneously, the WNNM problem in lating X̄ k+1 , Ū k and M̄ to update M̄k+1 ; Step 3: Using Xik+1
Eq.(23)
has the closed-form solution: X∗ = U6̃V , where
T and M̄k+1 to update Uik+1 ; Step 4: Iterate steps 1, 2 and 3
diag(σ1 (X∗ ), σ2 (X∗ ), . . . , σn (X∗ )) until the difference between adjacent iterations is smaller than
6̃ = ,
0, the given threshold ρ.
and The algorithm uses the sampling binary index
tensor H, incomplete sensor data tensor T , and the param-
0 if c2 < 0
√ eters λ,ρ,cλ ,λ∗ as inputs. It minimizes Eq.(18) iteratively
σi (X∗ ) = c1 + c2 (25)
if c2 ≥ 0 by decreasing A toward convergence. λ∗ is set as the lower
2
bound of λ.
where Figure 4 shows the flowchart of the proposed missing data
c1 = σ1 (Y) − ε, c2 = (σ1 (Y) + ε)2 − 4C. reconstruction algorithm. Our algorithm is divided into two
Therefore, the variable Xi can be solved independently by main steps: first, to fully use the spatial correlations among
using the matrix shrinkage operator introduced above. Hence, the data, we use the k-means clustering algorithm to group
the update of Xi can be given as follows: sensor nodes for obtaining the best classification. Second,
Xik+1 = foldi (D1/ρ (Mki − Uik )(i) ). (26) the complete dataset T is processed using two patterns to
represent missing data, namely, random missing pattern and
where the ‘‘unfold’’ operation along the i-th mode on a tensor consecutive missing pattern. Finally, the DR-AWNNM algo-
X is defined as unfoldi (X ) := X(i) ∈ RKi ×(K1 ...Ki−1 Ki+1 ...Kn ) . rithm is used in each cluster according to the clustering result,
is defined as foldi (X(i) ) := X .
The opposite operation ‘‘fold’’ and then the ADMM algorithm is used to iterate parameters
It is clear that kX kF =
X(i)
2 is for any 1 ≤ i ≤ n. until the global optimal solution is output by the algorithm.
A. EXPLANATION OF THE DATASET USED FOR FIGURE 4. A flowchart of the proposed missing data reconstruction
SIMULATION algorithm.
1) INTEL BERKELEY DATASET
For our experimental simulation, we used data collected 2) MISSING DATA
from the Intel Berkeley Research Laboratory between Febru- To verify the validity of the algorithm accurately and objec-
ary 28 and April 5, 2004 [37]. The laboratory has different tively, we adopted two data processing patterns, namely, ran-
rooms, and in each room, Mica2Dot sensors collect times dom missing pattern and consecutive missing pattern. We first
tamped topology information, along with humidity, temper- obtained complete raw sensor data and then produced artifi-
ature, light, and voltage values once every 31 s. Data were cial missing data based on the two patterns.
collected using the TinyDB in-network query processing sys- • Random Missing Pattern (RMP): This pattern repeatedly
tem implemented on the TinyOS platform. The data collected chooses a random time and random sensor to be missing
from all sensors were merged into one big dataset containing and hence the data is missing.
2.3 million readings (∼150 MB). Samples from this dataset • Consecutive Missing Pattern (CMP): This pattern
are given in Table 2. reflects that a few nodes miss all data after a certain
3) PARAMETER SETTINGS
We employed error formulation to measure the differences
between the predicated values and the actual values. Perfor-
mance was measured in terms of RSE, a metric for measuring
reconstruction error, and we defined it as follows:
kX − T k2F
RES = , (31)
kT k2F
The sampling ratio ε refers to the observation ratio of sensor FIGURE 7. Total maximum prediction error for different numbers of
clusters.
data, which is defined in [31]
P
(i,j)∈ 1
ε=P . (32) sensors 44 and 45 are in the second cluster. As can be
(i,j)∈( ) ¯ 1
S
seen in Fig. 6, measurements recorded by the sensors within
In our experiment, we set some parameters empiri- one cluster follow very similar patterns, but the difference
cally [31], specifically, λ = 1, c = 14 , λ∗ = 1e−6 . ρ is a between clusters is relatively large.
positive number, and its value affects the speed of conver- Therefore, when there are missing values in a cluster,
gence of the ADMM algorithm. Here, we set ρ = 1.05 [30]. we can estimate them from neighboring nodes within the
The parameter C is associated with the allocation of weights same cluster. We first used the spatial relationship between
to singular values, and it is a very important parameter. sensor nodes to divide them into different clusters and
Therefore, we discussed its value separately in the following then applied our algorithm within each cluster. To enhance
separate. the reliability of the proposed algorithm, we changed K
All simulations were run in the MATLAB environment from 3 to 17 and then calculated the difference between the
on a desktop computer equipped with a 3.60-GHz Intel predicted and original values within each cluster to obtain the
i7-4790 CPU and 8 GB RAM. maximum error between them.
Theoretically, as the cluster size decreases, the spatial
B. SIMULATION PARAMETERS AND RESULTS relationships within the same cluster become stronger; thus,
Figure 5 shows the locations of sensor nodes and their clusters the final calculation result is more accurate. However, as the
in the Intel Berkeley Research Laboratory. Here, the number number of clusters increases, there will be fewer nodes in the
of clusters is 12. The corresponding sensor nodes of each same cluster.
cluster are shown within the solid lines. Our DR-AWNNM Figure 7 shows the simulation results of the total maximum
algorithm for missing data reconstruction was applied within prediction error (TMPE) for different numbers of clusters.
each group. The red and the blue curves, respectively, represent the total
The main reason for clustering the sensors was to find maximum error of all classes corresponding to each K in
similarities between their measurements. Figure 6 shows the case of consecutive missing and random missing data.
temperature measurements from the sensors in two clus- According to the obtained results, the TMPE value decreases
ters. Here, sensors 22 and 20 are in the first cluster, and upon increasing the number of clusters, which confirms the
when the data missing rate of each node is lower than 50%.
However, with a slightly higher rank-[5,5,2], Tucker decom-
position achieves poor performance. This means that to use
the Tucker decomposition-based algorithm for accurately
reconstructing sensor data, we should first obtain the correct
n-rank. However, this is usually difficult in practice, espe-
cially when the tensor is incomplete. In addition, the error rate
of the CAMP algorithm is also lower than other algorithms.
When the sparsity becomes better and the compression ratio
gets higher, the CAMP algorithm can effectively reduce the
error and improve the accuracy. Notably, when the data miss-
ing rate exceeds 50%, the error rate of the proposed algorithm
is considerably lower than that of the other three algorithms,
and even when the data missing rate is as high as 90%,
the error rate of the proposed algorithm is lower than 20%.
FIGURE 10. (a) DR-AWNNM algorithm and DR-AWNNM1 algorithm under
The reason is that we combine the various attributes of the RMP model. (b) DR-AWNNM algorithm and DR-AWNNM1 algorithm under
data to enhance the intrinsic relationship between the data, CMP model.
and our algorithm uses the K -means cluster algorithm to
group sensors at the beginning, which greatly enhances the
external connection of the data and improves the accuracy of The computational complexity of DR-AWNNM consists
the algorithm. mainly of two parts. One is the complexity of the K -means
In addition, we present in Figure 10 the results obtained clustering algorithm, and the other is the cost of matrix
by using only the DR-AWNNM algorithm (DR-AWNNM1) completion calculation. When we use K -means clustering
to reconstruct the missing data under the RMP and the CMP algorithm to separate sensor nodes, we first set k cluster
models, here all parameters have the same value as DR- centers are randomly, and then calculate the distance from n
AWNNM algorithm. It can be seen that when the data loss points to k centers, the distance between the single image and
rate is high, the DR-AWNNM1 algorithm can reconstruct the the image is calculated as d, and last repeat the above process
missing data well. Moreover, the DR-AWNNM algorithm is for t times, so the complexity of K -means is O = k ×n×d ×t,
superior to the other algorithms, including DR-AWNNM1. namely O(kndt). Usually k, d and t can be considered as
The solution of DR-AWNNM algorithm, ADMAC algo- constants, therefore, the computational cost of K -means can
rithm, and HaLRTC algorithm essentially involves finding be simplified to O(n), which is linear. From Alg.1 we know
the augmented Lagrange function of the target function, and that the matrix completion algorithm mainly involves the
then the ADMM algorithm is used to solve it. Hence, Fig.11 iteration of Xik+1 , Mk+1
i and Uik+1 , the parameters iterated M
shows the convergence curves of the three algorithms in the times each time for N times, so the computational complexity
continuous missing data mode, in which the abscissa repre- of the matrix completion algorithm is O(n2 ), that is to say
sents the iteration number of algorithm convergence, and the the complexity of the ADMAC algorithm and the HaLRTC
ordinate represents the tolerance of the relative difference of algorithm is O(n2 ). Based on Eq.(24)-(26) we can learn that
outputs of two neighboring iterations. The red curve repre- the calculation cost of the weighted nuclear norm is O(1).
sents the proposed algorithm. It can be seen that our algorithm Therefore, the computational complexity of DR-AWNNM in
is convergent, and it is slightly superior to other algorithms. this paper is also O(n2 ). In addition, Fig.8 and Fig.9 show
REFERENCES
[1] IEEE Internet Initiative. (2015). Towards a Definition of the Internet of
Things (IoT). [Online]. Available: http://iot.ieee.org/images/files/pdf/
IEEE_IoT_Towards_Definition_Internet_of_Things_Revision1
_27MAY15.pdf
FIGURE 11. Convergence Curves with consecutive missing pattern.
[2] L. Atzori, A. Iera, and G. Morabito, ‘‘The Internet of Things: A survey,’’
Comput. Netw., vol. 54, no. 15, pp. 2787–2805, Oct. 2010.
[3] K. Rose, S. Eldridge, and L. Chapin, ‘‘The Internet of Things:
An overview,’’ Internet Soc., Reston, VA, USA, Tech. Rep., 2015,
that the DR-AWNNM algorithm has significant effects and pp. 1–50.
far-reaching significance. [4] L. Xu, J. Wang, H. Zhang, and T. A. Gulliver, ‘‘Performance analysis
To summarize, the proposed DR-AWNNM algorithm out- of IAF relaying mobile D2D cooperative networks,’’ J. Franklin Inst.,
performs the other tested algorithms on the dataset obtained vol. 354, no. 2, pp. 902–916, Jan. 2017.
[5] L. Xu, J. Wang, Y. Liu, J. Yang, W. Shi, and T. A. Gulliver, ‘‘Outage
from Intel Berkeley Research Laboratory and considering performance for IDF relaying mobile cooperative networks,’’ in Proc. Int.
the two patterns of data missing. Specifically, the proposed Conf. 5G Future Wireless Netw.. Springer, 2017, pp. 395–402.
algorithm provides better results under the random missing [6] S. Chen, H. Xu, D. Liu, B. Hu, and H. Wang, ‘‘A vision of IoT: Applica-
tions, challenges, and opportunities with china perspective,’’ IEEE Internet
pattern than the consecutive missing pattern. Things J., vol. 1, no. 4, pp. 349–359, Aug. 2014.
[7] D. Bonino et al., ‘‘ALMANAC: Internet of Things for smart cities,’’ in
V. CONCLUSION Proc. 3rd Int. Conf. Future Internet Things Cloud, Rome, Italy, Aug. 2015,
pp. 309–316.
Missing data causes many difficulties in various IoT applica-
[8] C. Perera, C. H. Liu, S. Jayawardena, and M. Chen, ‘‘A survey on
tions. Although this is inevitable due to the inherent character- Internet of Things from industrial market perspective,’’ IEEE Access,
istic of IoT, to solve the problem, it is imperative to estimate vol. 2, pp. 1660–1679, 2014.
the missing data as accurately as possible. [9] O. Salman, I. Elhajj, A. Kayssi, and A. Chehab, ‘‘Edge computing
enabling the Internet of Things,’’ in Proc. IEEE 2nd World Forum Internet
In this paper, a missing data reconstruction method based Things (WF-IoT), Milan, Italy, Dec. 2015, pp. 603–608.
on automatic WNNM was developed, and K -means cluster- [10] M. Heil and R. Karban, ‘‘Explaining evolution of plant communication by
ing analysis was embedded into this forecasting process to airborne signals,’’ Trends Ecol. Evol., vol. 25, no. 3, pp. 137–144, 2010.
[11] L. Kong et al., ‘‘Surface coverage in sensor networks,’’ IEEE Trans.
improve the prediction accuracy. First, to apply spatial corre- Parallel Distrib. Syst., vol. 25, no. 1, pp. 234–243, Jan. 2014.
lation between sensor nodes, we used the K -means clustering [12] B. Nathani and R. Vijayvergia, ‘‘The Internet of intelligent things:
algorithm to separate sensor nodes into different groups. An overview,’’ in Proc. Int. Conf. Intell. Commun. Comput. Techn. (ICCT),
Second, we considered sensor networks with multiple Jaipur, India, Dec. 2017, pp. 119–122.
[13] Z. Yang, M. Li, and Y. Liu, ‘‘Sea depth measurement with restricted
types of sensors in each node. Accounting for possible cor- floating sensors,’’ in Proc. 28th IEEE Int. Real-Time Syst. Symp. (RTSS),
relations among multiple-attribute sensor data, we provided Tucson, AZ, USA, Dec. 2007, pp. 469–478.
a tensor-based method to estimate missing data and proposed [14] Y. Liu, Y. He, M. Li, J. Wang, K. Liu, and X. Li, ‘‘Does wireless sensor
network scale? A measurement study on GreenOrbs,’’ IEEE Trans. Parallel
an algorithm based on matrix rank-minimization method, Distrib. Syst., vol. 24, no. 10, pp. 1983–1993, Oct. 2013.
namely, DR-AWNNM. We considered the weights between [15] X.-Y. Liu, K.-L. Wu, Y. Zhu, L. Kong, and M.-Y. Wu, ‘‘Mobility increases
singular values and adaptively assign different weights to the surface coverage of distributed sensor networks,’’ Comput. Netw.,
vol. 57, no. 11, pp. 2348–2363, 2013.
each singular value. Especially, when the weights are sorted
[16] M. G. Lawrence, ‘‘The relationship between relative humidity and the
in a non-descending order, the optimal solution can be easily dewpoint temperature in moist air: A simple conversion and appli-
obtained in closed-form. cations,’’ Bull. Amer. Meteorol. Soc., vol. 86, no. 2, pp. 225–233,
Third, to demonstrate the feasibility of proposed method, 2005.
[17] E. J. Candès and B. Recht, ‘‘Exact low-rank matrix completion via con-
a traditional EM-based Tucker decomposition algorithm and vex optimization,’’ in Proc. Found. Comput. Math., 2009, vol. 9, no. 6,
other excellent algorithms, namely ADMAC, HaLRTC and pp. 806–812.
CAMP were introduced and compared. Our experiment sug- [18] S. Gu, Q. Xie, D. Meng, W. Zuo, X. Feng, and L. Zhang, ‘‘Weighted nuclear
norm minimization and its applications to low level vision,’’ Int. J. Comput.
gested that the proposed method, on the one hand, when Vis., vol. 121, no. 2, pp. 183–208, Jan. 2017.
the data missing rate is low or the sampling rate is high, [19] L. Kong, M. Xia, X.-Y. Liu, M.-Y. Wu, and X. Liu, ‘‘Data loss and
have outstanding performance as well as the other algorithms. reconstruction in sensor networks,’’ in Proc. IEEE INFOCOM, Apr. 2013,
On the other hand, when the data missing rate is high or the pp. 1654–1662.
[20] S. R. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong, ‘‘TinyDB:
sampling rate is low, it performs much better than the other An acquisitional query processing system for sensor networks,’’ ACM
algorithms. Trans. Database Syst., vol. 30, no. 1, pp. 122–173, 2005.
[21] E. Granger, M. A. Rubin, S. Grossberg, and P. Lavoie, ‘‘Classification XIANG YU received the Ph.D. degree from the
of incomplete data using the fuzzy ARTMAP neural network,’’ in Proc. University of Electronic Science and Technology
IEEE-INNS-ENNS Int. Joint Conf. Neural Netw. (IJCNN), Neural Comput., of China, Chengdu, China, in 1995. From 1999 to
New Challenges Perspect. New Millennium, Como, Italy, vol. 6, Jul. 2000, 2002, he was Post-Doctoral/Associate Professor of
pp. 35–40. postdoctoral mobile station in computer science
[22] T. Cover and P. Hart, ‘‘Nearest neighbor pattern classification,’’ IEEE and technology with the University of Electronic
Trans. Inf. Theory, vol. 13, no. 1, pp. 21–27, Jan. 1967. Science and Technology of China. He first had an
[23] L. Pan and J. Li, ‘‘K-nearest neighbor based missing data estimation
industrial career in R&D, mostly in the field of
algorithm in wireless sensor networks,’’ Wireless Sensor Netw., vol. 2,
radio communications.
no. 2, pp. 115–122, 2010.
[24] M. H. Le Gruenwald, ‘‘Estimating missing values in related sensor data From 1991 to 2008, he was the Chief Repre-
streams,’’ in Proc. COMAD, 2005, pp. 83–94. sentative with Beijing Taige Electronics Co., Ltd., Chengdu Branch, and
[25] L. Gruenwald, H. Chok, and M. Aboukhamis, ‘‘Using data min- a Deputy Chief Engineer with the Seventh Research Institute, China Elec-
ing to estimate missing sensor data,’’ in Proc. 7th IEEE Int. Conf. tronics Technology Group Corporation. He was the expert of major projects
Data Mining Workshops (ICDMW), Omaha, NE, USA, Oct. 2007, of national science and technology. He is currently a Professor with the
pp. 207–212. Chongqing University of Posts and Telecommunications, Chongqing, China.
[26] J. Karjee, H. K. Rath, and A. Pal, ‘‘Efficient data prediction, recon- His main research interests lie in the areas of digital communications and
struction and estimation in indoor IoT networks,’’ in Proc. IEEE radio signal processing. He was a member of the China delegation for
6th Int. Conf. Future Internet Things Cloud (FiCloud), Aug. 2018, (ITU-R) WP5A, 5B, and 5C Conference.
pp. 236–243.
[27] H. Wu, M. Suo, J. Wang, P. Mohapatra, and J. Cao, ‘‘A holistic approach XIA FAN received the B.S. degree in communi-
to reconstruct data in ocean sensor network using compression sensing,’’ cation engineering from the Chongqing Univer-
IEEE Access, vol. 6, pp. 280–286, 2018. sity of Posts and Telecommunications, Chongqing,
[28] J. Chen, ‘‘The research of missing data recovery method based on feature China, in 2016, where she is currently pursuing the
analysis in wireless sensor networks,’’ in Proc. China Nat. Knowl. Infras- master’s degree in communication and electronic
truct. (CNKI), 2016.
information engineering. Since 2016, she has been
[29] G. Chen et al., ‘‘Multiple attributes-based data recovery in wireless sensor
studying at the Wireless Communication Technol-
networks,’’ in Proc. IEEE GLOBECOM, Atlanta, GA, USA, Dec. 2013,
pp. 103–108. ogy Innovation Laboratory, Broad Band Equip-
[30] J. Liu, P. Musialski, P. Wonka, and J. Ye, ‘‘Tensor completion for estimating ment Mobilization Center. Her research interests
missing values in visual data,’’ IEEE Trans. Pattern Anal. Mach. Intell., include mobile cloud computing, artificial intel-
vol. 35, no. 1, pp. 208–220, Jan. 2013. ligence, big data analysis in wireless communications, and the Internet of
[31] Y. Shao, Z. Chen, F. Li, and C. Fu, ‘‘Reconstruction of big sensor data,’’ Things.
in Proc. 2nd IEEE Int. Conf. Comput. Commun.(ICCC), Chengdu, China,
Oct. 2016, pp. 1–6. KAN CHEN received the B.S. degree in commu-
[32] X. Wu, C.-L. Chuang, and J.-A. Jiang, ‘‘Temperature map recov- nications engineering from the Chongqing Univer-
ery based on compressive sensing for large-scale wireless sensor net- sity of Posts and Telecommunications, Chongqing,
works,’’ in Proc. IEEE Int. Conf. Green Comput. Commun. IEEE Internet China, in 2017, where he is currently pursuing the
Things IEEE Cyber, Phys. Social Comput., Beijing, China, Aug. 2013, M.S. degree in information and communication
pp. 1202–1206. engineering. Since 2016, he has been studying at
[33] J.-F. Cai, E. J. Candès, and Z. Shen, ‘‘A singular value threshold- the Wireless Communication Technology Innova-
ing algorithm for matrix completion,’’ SIAM J. Optim., vol. 20, no. 4, tion Laboratory, Broadband Equipment Mobiliza-
pp. 1956–1982, 2010. tion Center. His research interest mainly focuses
[34] D. Raval, G. Raval, and S. Valiveti, ‘‘Optimization of clustering process
on signal processing for wireless communications,
for WSN with hybrid harmony search and K-means algorithm,’’ in Proc.
especially the multicarrier transmissions for next generation wireless com-
Int. Conf. Recent Trends Inf. Technol.(ICRTIT), Chennai, India, Apr. 2016,
pp. 1–6. munication system.
[35] J. A. Hartigan and M. A. Wong, ‘‘Algorithm AS 136: A k-means
clustering algorithm,’’ Appl. Statist., vol. 28, no. 1, pp. 100–108, SIRUI DUAN received the Ph.D. degree from
1979. the Beijing University of Posts and telecommu-
[36] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed Opti- nications, Beijing, China, in 2014. He is cur-
mization and Statistical Learning via the Alternating Direction Method rently a Lecturer with the Chongqing Univer-
of Multipliers. Now Foundations and Trends, 2011. [Online]. Available: sity of Posts and telecommunications, Chongqing,
https://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=8186925 China. His research interests mainly focus on sig-
[37] [Online]. Available: http://db.csail.mit.edu/labdata/labdata.html/ nal processing for wireless communications, with
[38] A. Smoliński, B. Walczak, and J. W. Einax, ‘‘Exploratory analysis of data an emphasis on multicarrier transmissions for next
sets with missing elements and outliers,’’ Chemosphere, vol. 49, no. 3, generation wireless communication system.
pp. 233–245, 2002.