Multi-Attribute Missing Data Reconstruction Based On Adaptive Weighted Nuclear Norm Minimization in Iot

Received September 25, 2018, accepted October 13, 2018, date of publication October 18, 2018, date of current
version November 9, 2018.

Digital Object Identifier 10.1109/ACCESS.2018.2876701
Multi-Attribute Missing Data Reconstruction

Based on Adaptive Weighted Nuclear Norm
Minimization in IoT
XIANG YU , XIA FAN , KAN CHEN , AND SIRUI DUAN
School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Corresponding author: Xia Fan ([email protected])
This work was supported in part by the National Science and Technology Major Program of China under Grant 2017ZX03001004-004,
in part by the Research and Innovation Project of Graduate Student of Chongqing City under Grant CYS18239, and in part by
Chongqing Research Program of Basic science and Frontier Technology under Grant cstc2016jcyjA0542.
ABSTRACT In the Internet of Things, data sets gathered from sensor nodes are missing a significant
fraction of data, owing to noise, collision, unreliable links, and unexpected damage. This phenomenon is
more serious in some scenarios, which limits the applications of sensor data. It is therefore necessary to
develop methods to reconstruct these lost data with high accuracy. In this paper, a new multi-attribute missing
data reconstruction method based on adaptive weighted nuclear norm minimization is developed, and a
K -means clustering analysis is embedded into this forecasting process to improve the prediction accuracy.
First, to ensure that sensors in one group have similar patterns of measurement, we use a traditional machine
learning algorithm, called K -means clustering algorithm to separate sensors into different groups. Second,
considering the correlations among multiple attributes of sensor data and its joint low-rank characteristics,
we propose an algorithm based on the matrix rank-minimization method of automatic weighted nuclear norm
minimization, which adaptively assigns different weights to each singular value simultaneously. Moreover,
we use the alternating direction method of multipliers to obtain its optimal solution. Finally, we evaluate
the proposed method by using a real sensor data set from the Intel Berkeley Research Laboratory with two
missing patterns, namely, random missing pattern and consecutive missing pattern. The simulation results
prove that our algorithm performs well even with a small number of samples, and it can propagate through
a structure to fill in large missing regions.
INDEX TERMS Data reconstruction, Internet of Things (IoT), K -means, multi-attribute, tensor completion,
weighted nuclear norm minimization (WNNM).
I. INTRODUCTION IoT has employed many technologies and the core tech-
Owing to advancements in information technology, the new nologies of it mainly include Radio Frequency Identifica-
era of the Internet of Things (IoT) is encompassing comput- tion (RFID), sensor technology, wireless network technology,
ing and communication technologies, spanning every aspect artificial intelligence and cloud computing [1]. IoT is widely
of our lives, and it has emerged as one of central issues in used in conventional Long Term Evolution (LTE), Wi-Fi,
our daily lives. The IoT is an intelligent network, it has ZigBee, wireless sensor network (WSN), as well as device-
been defined as a global network with an infrastructure that to-device (D2D) communication [4], transmit antenna selec-
has self-configuring capabilities. Specifically, the sensors are tion (TAS) in cooperative networks [5], the low-power wide
embedded in various objects and then integrated with the area network from the LoRa Alliance (LoRaWAN), narrow
existing Internet to realize the information exchange between band IoT (Nb-IoT), Ethernet and many other communications
human society and physical systems. IoT is to fully utilize the technologies. Therefore, IoT is rapidly transforming into a
new generation of IT technology in all walks of life, which is highly heterogeneous ecosystem that provides inter operabil-
called the third wave of the world information industry after ity among different types of devices and communications
computers and the Internet [1]–[3]. technologies [6]–[8].
2169-3536
2018 IEEE. Translations and content mining are permitted for academic research only.
VOLUME 6, 2018 Personal use is also permitted, but republication/redistribution requires IEEE permission. 61419
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
X. Yu et al.: Multi-Attribute Missing DR-AWNNM in IoT
Stylized as IoT, it is the interconnection of everyday and voltage. Thus, we can assume that there are a few con-
‘‘thing,’’ such as vehicles, buildings, and other Internet- nections between these attributes objectively. For instance,
connected devices embedded with software, electronics, actu- when light illumination is increased, the ambient temperature
ators, and sensors, enabling them to send and receive data. will increase simultaneously, and air humidity will decrease.
Thus, data is the most crucial and significant asset responsible An empirical study [16] revealed that temperature, dewpoint
for the functioning of these smart devices. IoT applications temperature, and relative humidity are linearly correlated.
gather huge volumes of multiple attribute data from the phys- That is, the correlations among the attributes can be used as a
ical world through all connected sensors and reconstruct envi- supplement of the internal correlations to increase estimation
ronmental data in the cyber world [9]. For instance, scientists accuracy.
have revealed the plant evolution based on wind speed, air In this paper, a new approach to reconstructing missing
humidity, and air temperature data [10]. Volcanic eruptions values in IoT data is proposed. This method implements
have been predicted based on temperature and shake data of a K -means clustering algorithm to separate sensors into dif-
volcanoes [11]. ferent groups. The main goal of clustering is to increase the
IoT applications have strict requirements in terms of data similarity within the same group and the difference between
integrity, correctness, and on-time delivery [12]. However, different clusters. Thereafter, we use the potential relation-
in many IoT applications, massive data loss is common and ships among multiple data attributes to propose an algorithm
unavoidable. For example, the Intel Berkeley Research lab based on the matrix rank-minimization method, which is an
dataset [11] is missing roughly 50% of data, the Ocean Sense approach to low rank matrix approximation [17], to recon-
project is missing 64% of data [13], and the GreenOrbs struct the multi-attribute sensor data within each cluster.
project is missing 35% of data [14]. This can be ascribed To improve data reconstruction performance, we enhance
to many reasons. On the one hand, these data have the our algorithm by normalizing the data and using weighted
characteristics of large scale, high dimensions, and complex nuclear norm minimization (WNNM) [18]. Our contributions
structures. Data loss or damage may occur during acquisition, are summarized as follows:
transmission, and storage. On the other hand, sensor nodes 1) We use the K -means clustering algorithm to divide the
have limited capabilities, so limitations in terms of energy, sensor nodes into different clusters, so as to ensure
storage, communication capabilities, battery depletion, hard- that sensors within one group have similar patterns of
ware failures, or other environmental and communications measurement.
issues may cause lead to the capture of incomplete data. 2) To the best of our knowledge, this is the first work
Loss of sensing data hinders various IoT applications and to apply adaptive weighted nuclear norm minimization
makes it more difficult to process sensor data. Moreover, algorithm in tensor-based method to sensor data recon-
if the missing data values cannot be filled in accurately, struction problem.
the existing analysis tools cannot be applied; if the missing 3) We combine the multi-attribute correlation of sensor
data are deleted directly, a large amount of raw data would be data and the low-rank minimization technique to pro-
lost, which would reduce the accuracy and reliability of the pose a tensor-based algorithm, named DR-AWNNM,
analysis results and lead to the wastage of a massive amount for reconstruction missing values.
of energy. According to a report by IBM, tons of data are The remainder of this paper is organized as follows.
produced every day throughout the world. Currently, it is In Section II, we present related work. Section III describes
about 2.5 quintillion bytes per day. With IoT, this number will problem formulation. The performance of the proposed
increase considerably. For example, in 2009, there were about method is evaluated in Section IV. Section V provides our
0.9 billion smart objects, and their number is projected to concluding remarks and an outline for future work.
reach 26 billion by 2020. The more widespread the use of this
technology, the greater will be the volume of data produced. II. RELATED WORK
Recovering missing sensor data effectively to better analyze A. EXISTING APPROACHES AND THEIR LIMITATIONS
them for IoT applications is a major challenge. Therefore, In the modern IoT paradigm, data integrity is the most impor-
it is urgent and important to design effective methods for tant aspect that influences the overall performance of any
reconstructing missing values in big sensor data. system. The IoT is used for many critical applications, such
Many studies have contributed techniques to predict miss- as telemetry in hazardous environments, control of industrial
ing sensor data. Most such techniques are based on temporal processes, e-Health, smart transportation systems, national
methods, spatial methods, or spatio-temporal methods. How- security, etc. Recently, IoT was used also for the network
ever, in a few special scenarios, high data loss rates may veil monitoring to manage the performance of 5G heterogeneous
temporal and spatial correlations. Therefore, it is necessary to networks under variable conditions The problem of missing
find new ways to address this problem. We are aware that a sensor data in WSN has been known for a long time [6].
sensing node is usually integrated with a multi-function sen- WSN, which is one of the key technologies of IoT, has been
sor, and these nodes usually gather multiple attributes simul- researched extensively on the back of rapid development of
taneously, for example, the data collected by sensor nodes wireless communication technology, microelectronics tech-
in [15] contains four attributes: temperature, humidity, light, nology, and embedded computing technology [9].
61420 VOLUME 6, 2018

Compressive Sensing(CS) is a powerful and generic tech-

nique for estimating data, it can utilize a small fraction
of data to reconstruct the entire dataset. Reference [26]
developed a Compressive Sensing based Data Prediction
(CS-DP) model to predict data at the gateways by learning
the data pattern received from IoT devices, [27] developed an
extended sparse adaptive matching tracking algorithm based
on aforementioned Greedy algorithm (CAMP) to reconstruct
data in WSN. However, CS cannot be directly applied for
environment reconstruction because of its special inherent
FIGURE 1. Network structure of WSN.
structures. Meanwhile, the missing data must follow the
Gaussian or pure random distribution.
All above methods aiming to estimate missing values are
A WSN consists of many sensor nodes distributed in a based on a single attribute. However, many physical attributes
specific area, each of which has certain computing, storage, in nature are strongly correlated, such as humidity and
and communication capabilities. Figure 1 shows the network temperature. In [28], a method was proposed to analyze inter-
structure of a WSN: These devices capture similar data and attribute correlations based on the perceived data in real envi-
transmit it to the sink node. Sink node is considered as a ronments. The authors selected the temperature and humidity
store and forward device which retrieves information from the attribute data of GreenOrbs [14], and after smoothing, they
IoT devices, performs data acquisition and finally transmits it found a significant negative correlation between the changes
to a central entity called cloud under a cloud robotics frame- in temperature and humidity according to the scatter plot.
work. Many studies in the literature have focused on miss- However, actual perception data may be positively corre-
ing data reconstruction in WSN, and most such techniques lated or non-linearly related, depending on nature-specific
are based on temporal methods, spatial methods, or spatio- attributes. In [29], the characteristics of real sensor data were
temporal methods. studied, and a multi-attribute-assistant compressive-sensing-
Temporal methods leverage the temporal correlations based algorithm was developed to approximate missing data.
among readings recorded by the same sensor node [19], that The simulation results show this algorithm performed well,
is to say, the corresponding sensory data collected by the except in cases where the data were highly complex.
sensing node within the monitoring range usually change Recently, the intrinsic low-rank property of high dimen-
slowly within a short time interval, and the sensing data of sional data has been considered. In [30], the tensor comple-
the same sensory node are often the same or similar before tion theory was used to recover missing data from the sink
and after the time interval. Salient methods include observed node of a large-scale WSN; the authors proposed a high-
data mean [20], last seen [21], and linear interpolation. How- accuracy low-rank tensor completion algorithm (HaLRTC)
ever, for long temporal gaps in observations for a given to solve the tensor completion problem without considering
sensor or when the WSN changes sharply and irregularly, noise. Shao et al. [31] propose a tensor-based method called
the temporal methods cannot perform well, and their effec- ADMAR to reconstruct multi-attribute sensor data. However,
tiveness decreases rapidly as the number of consecutively this method cannot perform well when the data missing rate
missing readings increases. is more than 60% with consecutive missing patterns.
Spatial methods leverage spatial correlations among the
sensor data of spatially similar sensor nodes at the same B. MATRIX COMPLEMENT BASED ON NUCLEAR
monitoring time; usually, the closer the sensory nodes NORM RELAXATION
to each other spatially, the more relevant are the data. Wu et al. [32] analyzed a temperature dataset collated using
The most classical interpolation method is the K -nearest- sensor data and verified that most of the information in the
neighbor method [22], which utilizes the values of the nearest dataset can be described with few principal components,
K neighbors to estimate the missing one. The K -nearest indicating that sensor data have a low-rank characteristic.
neighbor estimation method [23] is applied to describe the Therefore, the missing data reconstruction problem can be
spatial correlation among the sensor data of different sen- transformed into matrix rank minimization problem. We first
sor nodes by using the linear regression model, and it uses introduce the basic theory of matrix complement based on
the data information of each neighboring node to estimate nuclear norm relaxation.
the missing data. The window association rule mining [24] Given an incomplete low-rank matrix E ∈ Rm×n , the goal
and freshness association rule mining [25] estimate miss- is to recover all elements based on a few observed elements of
ing data based on association rules among spatially cor- the matrix, which can be described by the following radiation
related neighbors. However, such models are suitable only rank minimization problem:
for limited types of signals and environments, which typ-
ically require precise three-dimensional distances between min rank(X), s.t. Xij = Eij , (i, j) ∈ . (1)
sensors. X
VOLUME 6, 2018 61421

where ⊂ {1, 2, · · · , n} × {1, 2, · · · , m} is the set of

sampled entries of E, and rank(X) is the rank of matrix X. The
constraints of the optimization problem can be expressed as
P (X) = P (E), as well, where P (·) is defined as follows:
(
Xij , if(i, j) ∈ ,
[P (Xij )] = (2)
0, otherwise.
The rank function of the matrix is discontinuous and non-
convex, and the optimization problem has been evidenced
to be NP-hard [17], therefore, it cannot be solved easily in
practice. At present, the method mainly converts the convex
relaxation process of the problem into the problem of solving
Nuclear Norm Minimization (NNM). Thus problem (1) is FIGURE 2. Temperature collected by three sensor nodes.
transformed into the following:
min kXk∗ , s.t. P (X) = P (E). (3)
X far from nodes 32 and 33. This example shows that not
where kXk∗ = 6i σi (X) denotes the nuclear norm of X, every node in a given is useful for recovering the missing
and σi (X) denotes the ith largest singular value. In prac- values of a certain node. Therefore, to better use the degree
tice, the noises in sensory data may lead to over-fitting, of similarity among measured values of neighboring sensors,
so the equality constraints cannot be satisfied strictly. Thus, we first divide the sensors into different groups to minimize
problem (3) is often placed in the objective function: measurement changes within each group.
In this work, we use the K-means clustering algorithm
1 P (X) − P (X) 2 ,

min kXk∗ + F
(4) to group sensors. K-means clustering algorithm is a classic
X 2λ unsupervised clustering algorithm that provides good group-
In practice, noises in sensory data may lead to over-fitting if ing of rectangular and circular regions [34]. This algorithm
strict satisfaction is required. Therefore, we use the parameter divides a set of points into K clusters so that points in each
λ (0 < λ < 1) controlsq to the fit to constraint P (X ) = cluster tend to be close to each other. We list the main steps
2
P (T ). kXkF := 6i,j xij is the Frobenius norm of X .
2 of K-means clustering algorithm in the following:
Cai et al. [33] proposed a singular value threshold algorithm Step 1: Use X ∗ = {x1 , x2 , . . . , xN } to represent the set of
for solving NNM problems: coordinates of N sensor nodes in the monitoring area, and
λ 2 randomly select K objects as the initial clustering center,
Dτ (X) = argmin X − E F + τ kXk∗ , (5) denoted as C = {c1 , c2 , . . . , cK }, where each object rep-
X 2
resents a clustering center. Next, place the cluster centers
where Dτ (X) is the ‘‘shrinkage,’’ operator of X, and τ (τ > 0) of K clusters uniformly within the target field of sensors.
is a constant. If rank(X) = r, its singular value is decomposed Step 2: Associate each sensor with the nearest cluster
into X = U6 τ VT , where 6 = diag({σi } , 1 ≤ i ≤ r) center by using the criterion of distance:
is a vector, consisting of σi (X) with descending order, and
U and V are orthogonal matrices. Thus, the matrix shrinkage xic = arg min kxi − ck k2 , (7)
k=1,...,K
operator Dτ (X) is defined as follows:
where ci stands for the clustering center of sensor node i.
Dτ (X) = Udiag({max(0, σi − τ )})VT . (6) We gather the sensor nodes associated with clustering center k
into set Ck . We call set Ck as a cluster.
III. PROBLEM FORMULATION
Step 3: For each cluster Ck , calculate the average coordi-
A. GROUPING SENSOR NODES WITH K-MEANS
nates as follows:
CLUSTERING ALGORITHM P
Normally, in a monitored region, many sensor nodes are i∈Ck xi
c0k = , (8)
deployed. In [16], it was proved that data sensed from these |Ck |
nodes have spatial correlations. For example, Fig. 2 shows
where |Ck | denotes the cardinality of set Ck .
the temperature observed by three sensor nodes from the
Step 4: Update the coordinates of cluster centers:
Intel Berkeley Research Laboratory dataset over two days.
On the one hand, the data sensed by nearby nodes, namely, C 0 = c01 , c02 , . . . , c0K .

(9)
nodes 32 and 33, have similar curves. Thus, when some
of the sensed data of a sensor node are missing, we can Step 5: Determine the differences between the new cluster
estimate them by using the data of the neighboring nodes. center and the previous cluster center:
On the other hand, the data curve of node 2 is completely 2
4ck = c0k − ck .

different from those of nodes 32 and 33 because node 2 is (10)
61422 VOLUME 6, 2018

FIGURE 3. Iterations of K-means clustering algorithm (K = 9, (a)-Step 1, (b)-step 2 and 3, (c)-step 4 and 5).
Step 6: Iterate steps 2, 3, 4 and 5 until set Ck no longer TABLE 1. The data format.
changes or the difference between adjacent iterations is
smaller than the given threshold:
X
4ck < η. (11) Let T be a tensor of the sensor data with M attributes col-
k lected by N nodes within Q time slots, that is, T ∈ RN ×M ×Q ;
the tensor is a generalization of the matrix concept. We use
where η is a small positive number, here we set
tn,m,q to represent the entry of T . Owing to data loss in IoT,
η = 1e−4 [35].
T is usually an incomplete tensor. The intact information of
Figure 3 shows the procedure the five-step procedure of
T is a set of entries tn,m,q , for (n, m, q) ∈ , where is
one iteration. The red solid points represent the cluster cen-
the set of sampled entries of T . We use P (·) to indicate the
ters, and the stars represent the sensor nodes; the different
sampling operator, which is defined as follows:
colors of sensor points indicate different clusters. (
tn,m,q , if (n, m, q) ∈ ,
B. DATA RECONSTRUCTION WITH ADAPTIVE WEIGHTED [P (T )]n,m,q = (12)
0, otherwise.
NUCLEAR NORM (DR-AWNNM)
Suppose that N nodes are deployed in a monitoring area, each The multi-attribute sensor data reconstruction problem is
of which is equipped with K ∗ sensors to monitor different defined as follows:
at-tributes simultaneously. The monitoring period consists Given a subset of T , denoted as P (T ), the ultimate goal
of Q time slots. The sensor data gathered in one node can is to find an optimal solution T̂ that minimizes the error
be organized in the following format [29], where sensor between T and T̂ :
ID stands for sensor identity number, time stamp represents
sampling time, and the attributes include temperature and min kT − T̂ kF
humidity. s.t. P (T̂ ) = P (T ), (13)
VOLUME 6, 2018 61423

q 2
where kX kF := ïĳĹ(6i1 ,i2 ,...,in xi1 ,i2 ,...,in )ïĳĽ is defined and T denote the incomplete tensor of multi-attribute sen-
as the Frobenius norm of tensor X . sor data. Then, the missing values in Eq.(16) can be estimated
Many studies [19], [32], have mentioned that the sensor effectively by solving the following convex optimization
data tensor is low-rank, so the missing values in T can problem,
be recovered using the matrix complement based on nuclear 3 3
Xi,(i) + 1 H ·
X X 2
norm relaxation: Xi − H · T F .

minimize w,∗
X 2λ
i=1 i=1
min kX k∗
(18)
s.t. P (X ) = P (T ), (14)
where (·) denotes element-wise production of the tensor,
where kX k∗ is the nuclear norm of X , which is defined and λ is an adjustable parameter introduced to prevent
N
as kX k∗ :=
P
αi X(i) ∗ , where αi is the weight cor-
over-fitting.
i=1
responding to the unfolding of the i-th mode. Here, let C. ALTERNATE DIRECTION MULTIPLIER METHOD
α1 = α2 = α3 (N = 3), which means each X(i) is FOR DR-AWNNM
assigned equal importance. However, traditional NNM has The alternate direction multiplier method (ADMM) algo-
certain limitations, for example, all singular values are treated rithm [36], a convex optimization algorithm, is widely used
equally and shrunk with the same threshold [18]. In doing so, to solve convex optimization problems by breaking them into
prior knowledge on singular values of a practical data matrix smaller pieces, each of which can then be handled easily. It is
is ignored. More specifically, larger singular values of an an extension of the augmented Lagrange multiplier method
input data matrix quantify the information of its underlying algorithm . Therefore, in the present study, we attempted
principal directions. In other words, inner the same unfold- to use the ADMM algorithm to solve the formulated
ing X(i) , considering the information included in different problem.
singular value is different [33]. Obviously, the traditional To apply the ADMM method to problem (18), we need to
NNM model, as well as its corresponding soft-thresholding transform it into the ADMM form. Thus, we first perform
solvers, are not adequately flexible to handle such variable splitting. We introduce N new tensor-valued vari-
issues. ables, M1 , M2 , . . . , MN . Let Xi = Mi (i ∈ 1, 2, . . . , N );
To improve the flexibility of NNM, weak shrinkage by using these new variables Mi , Eq. (18) can be rewritten
strength should be enforced on the large singular value, as follows:
while a large weight should be assigned to the small singular
3 3
value. Based on this idea, we use the weighted nuclear norm Xi,(i) + 1 H ·
X X 2
minimize w,∗
Mi − H · T F
minimization (WNNM) algorithm, and define the weighted Xi ,Mi 2λ
i=1 i=1
nuclear norm of X as follows:
s.t. Xi = Mi , i = 1, 2, . . . , N (19)
3
X
wi σi (X(i) ),

X(i) = (15) The augmented Lagrangian of (19) becomes
w,∗
j=1 ρ
L(Xi , Mi , Yi ) = hYi , Xi − Mi i + kXi − Mi k2F
2
where w = [w1 , w2 , . . . , wn ]T and wi ≥ 0, which is a 3
non-negative weight assigned to σi (X ). The weight vector 1 X 2
+ H · Mi − H · T F
enhances the representation capability of the original nuclear 2λ
i=1
norm. 3
Therefore, the proposed model is revised as follows:
X
Xi,(i) ,

+ w,∗
(20)
3
X i=1

min X(i) For convenience, by combining the linear and quadratic
w,∗
i=1 terms in the augmented Lagrangian and scaling the dual
s.t. P (X ) = P (T ), (16) variable, Eq.(20) can be simplified as follows
For convenience, we define a sampling tensor H, 3
Xi,(i) + ρ kXi − Mi + Ui k2
X
L(Xi , Mi , Yi ) =

where w,∗ F
2
( i=1
1, if(i, j, . . . , n) ∈ 3
hi,j,...,n = (17) 1 X 2
0, otherwise, + H · Mi − H · T F + const,
2λ
i=1
Clearly, H is a binary tensor that indicates whether any data (21)
in T are missing.
Now, we can define the multi-attribute sensor-data recon- where const represents a constant term, U is the scaled dual
struction problem. Let H denote a binary sampling tensor, variable [36], and Ui = ρ −1 Yi . Now, we can obtain the
61424 VOLUME 6, 2018

iterations of ADMM as follows. According to the method in [31], the update of M is equal
ρ 2 to the iteration of M̄ :
Xik+1 = argmin ( Xi,(i) w,∗ + Xi + Uik − Mki ),

2 1 2
Xi F M̄k+1 = argmin ( H · N M̄ − H · T F
N M̄i 2λ
1 X 2
Mk+1 = argmin ( H · Mi − H · T F ρ 2
i
2λ + Z̄ − X̄ k+1 − Ū k ), (27)

Mi i=1 2 F
ρ 2
where
+ Mi − Xik+1 − Uik ),

2 F N
1 X
Uik+1 = Uik + Xik+1 − Mk+1
i . (22) M̄ = Mi ,
N
i=1
1) UPDATE X N
1 X
Before the update step for X , we need to introduce the X̄ k+1 = Xik+1 ,
following definition and theorem. N
i=1
Definition 1: Given a matrix Y, the WNNM problem [18] 1 XN
aims to find a matrix X, which is as close to Y as possible Ū k = Uik . (28)
N
under certain data fidelity functions, and the WNNM problem i=1
is described as follows: and then we get the M̄-update solution :
min kY − Xk2F + kXkw,∗ (23) M̄k+1 = M+ · /M− , (29)
X
here, we let where
C M− = (1/λ)(H + ρI),
wi = , (24)
σi (X(i) ) + ξ M+ = ρ(X̄ k+1 + Ū k + (1/(N λ))H · T ). (30)
where ξ is a small constant, and C is a compromising con-
stant, both constants were set empirically in our experiment. 3) THE DR-AWNNM ALGORITHM
Theorem 1: ∀Y ∈ <m×n , denote by Y = U6VT theSVD After discussing the appearing subproblem, we present the
diag(σ1 (Y), σ2 (Y), . . . , σn (Y)) complete DR-AWNNM algorithm for multi-attribute sensor-
of it, where 6 = and data reconstruction, as in Alg.1.
0,
σi (Y) denotes the i-th singular value of Y. If ε and C satisfy DR-AWNNM algorithm can be mainly divided into four
√
the inequality ε < C
(C), σi (Y ) , and the weights satisfy steps: Step 1: The updating of parameter Xik+1 ; Step 2: Calcu-
w1 ≥ · · · ≥ wn ≥ 0 simultaneously, the WNNM problem in lating X̄ k+1 , Ū k and M̄ to update M̄k+1 ; Step 3: Using Xik+1
Eq.(23)
has the closed-form solution: X∗ = U6̃V , where
T and M̄k+1 to update Uik+1 ; Step 4: Iterate steps 1, 2 and 3
diag(σ1 (X∗ ), σ2 (X∗ ), . . . , σn (X∗ )) until the difference between adjacent iterations is smaller than
6̃ = ,
0, the given threshold ρ.
and The algorithm uses the sampling binary index
tensor H, incomplete sensor data tensor T , and the param-

0 if c2 < 0
√ eters λ,ρ,cλ ,λ∗ as inputs. It minimizes Eq.(18) iteratively
σi (X∗ ) = c1 + c2 (25)
 if c2 ≥ 0 by decreasing A toward convergence. λ∗ is set as the lower
2
bound of λ.
where Figure 4 shows the flowchart of the proposed missing data
c1 = σ1 (Y) − ε, c2 = (σ1 (Y) + ε)2 − 4C. reconstruction algorithm. Our algorithm is divided into two
Therefore, the variable Xi can be solved independently by main steps: first, to fully use the spatial correlations among
using the matrix shrinkage operator introduced above. Hence, the data, we use the k-means clustering algorithm to group
the update of Xi can be given as follows: sensor nodes for obtaining the best classification. Second,
Xik+1 = foldi (D1/ρ (Mki − Uik )(i) ). (26) the complete dataset T is processed using two patterns to
represent missing data, namely, random missing pattern and
where the ‘‘unfold’’ operation along the i-th mode on a tensor consecutive missing pattern. Finally, the DR-AWNNM algo-
X is defined as unfoldi (X ) := X(i) ∈ RKi ×(K1 ...Ki−1 Ki+1 ...Kn ) . rithm is used in each cluster according to the clustering result,
is defined as foldi (X(i) ) := X .
The opposite operation ‘‘fold’’ and then the ADMM algorithm is used to iterate parameters
It is clear that kX kF = X(i) 2 is for any 1 ≤ i ≤ n. until the global optimal solution is output by the algorithm.
2) UPDATE M IV. EXPERIMENTAL RESULTS

Relative to the update of X , the iteration of M is more com- In this section, we evaluate the performance of the proposed
plicated. In the following part, we use the method introduced algorithm and compare it with exiting algorithms for missing
in [31] to solve this problem. data estimation in sensor data reconstruction.
VOLUME 6, 2018 61425

TABLE 2. Data samples obtained in the Intel Berkeley Research Laboratory.
Algorithm 1 DR-AWNNM Algorithm for Multi-Attribute

Sensor-Data Reconstruction
1: Input: T ,H, ε, C, λ, ρ, cλ , λ∗ ;
2: Initialize k = 0, M̄0 = U 0 = Xi0 = 0, i = 1, 2, . . . , N ;
3: for k = 0, 1, . . . do
4: for i = 1 : N do
5: // Lines 6 solve Xik+1 = argmin L(Xi , Mki , Uik )
Xi
6: Xik+1 = foldi (D1/ρ (Mki − Uik )(i) ).
7: end for
8: //Line 9-12 solveMk+1
i = argmin L(Xik+1 , Mi , Uik )
Mi
N
X̄ k+1 1
Xik+1 ;
P
9: Calculate = N
i=1
N
1
Ū k Uik ;
P
10: = N
i=1
N
1 P
11: M̄ = N Mi ;
i=1
12: M− = (1/λ)(H + ρI);
13: M+ = ρ(X̄ k+1 + Ū k + (1/(N λ))H · T );
14: M̄ k+1 = M+ · /M− ;
15: for i = 1 : N do
16: Uik+1 = Uik + Xik+1 − Mk+1 .
17: end for
18: λk+1 = max (cλ λk , λ∗ );
19: end for
N
Output: Xik+1 .
P
20:
i=1
A. EXPLANATION OF THE DATASET USED FOR FIGURE 4. A flowchart of the proposed missing data reconstruction
SIMULATION algorithm.
1) INTEL BERKELEY DATASET
For our experimental simulation, we used data collected 2) MISSING DATA
from the Intel Berkeley Research Laboratory between Febru- To verify the validity of the algorithm accurately and objec-
ary 28 and April 5, 2004 [37]. The laboratory has different tively, we adopted two data processing patterns, namely, ran-
rooms, and in each room, Mica2Dot sensors collect times dom missing pattern and consecutive missing pattern. We first
tamped topology information, along with humidity, temper- obtained complete raw sensor data and then produced artifi-
ature, light, and voltage values once every 31 s. Data were cial missing data based on the two patterns.
collected using the TinyDB in-network query processing sys- • Random Missing Pattern (RMP): This pattern repeatedly
tem implemented on the TinyOS platform. The data collected chooses a random time and random sensor to be missing
from all sensors were merged into one big dataset containing and hence the data is missing.
2.3 million readings (∼150 MB). Samples from this dataset • Consecutive Missing Pattern (CMP): This pattern
are given in Table 2. reflects that a few nodes miss all data after a certain
61426 VOLUME 6, 2018

FIGURE 5. Sensor locations and clustering in the Intel Berkeley Research

Laboratory.
FIGURE 6. Comparison of the measurements within and among clusters.

sampling time point owing to damage or running out of
energy. Therefore, we randomly selected 10% of nodes
as objective nodes that suffer from consecutive data
missing and then let each objective node lose the last
x% of all its data.
3) PARAMETER SETTINGS
We employed error formulation to measure the differences
between the predicated values and the actual values. Perfor-
mance was measured in terms of RSE, a metric for measuring
reconstruction error, and we defined it as follows:
kX − T k2F
RES = , (31)
kT k2F
The sampling ratio ε refers to the observation ratio of sensor FIGURE 7. Total maximum prediction error for different numbers of
clusters.
data, which is defined in [31]
P
(i,j)∈ 1
ε=P . (32) sensors 44 and 45 are in the second cluster. As can be
(i,j)∈( ) ¯ 1
S
seen in Fig. 6, measurements recorded by the sensors within
In our experiment, we set some parameters empiri- one cluster follow very similar patterns, but the difference
cally [31], specifically, λ = 1, c = 14 , λ∗ = 1e−6 . ρ is a between clusters is relatively large.
positive number, and its value affects the speed of conver- Therefore, when there are missing values in a cluster,
gence of the ADMM algorithm. Here, we set ρ = 1.05 [30]. we can estimate them from neighboring nodes within the
The parameter C is associated with the allocation of weights same cluster. We first used the spatial relationship between
to singular values, and it is a very important parameter. sensor nodes to divide them into different clusters and
Therefore, we discussed its value separately in the following then applied our algorithm within each cluster. To enhance
separate. the reliability of the proposed algorithm, we changed K
All simulations were run in the MATLAB environment from 3 to 17 and then calculated the difference between the
on a desktop computer equipped with a 3.60-GHz Intel predicted and original values within each cluster to obtain the
i7-4790 CPU and 8 GB RAM. maximum error between them.
Theoretically, as the cluster size decreases, the spatial
B. SIMULATION PARAMETERS AND RESULTS relationships within the same cluster become stronger; thus,
Figure 5 shows the locations of sensor nodes and their clusters the final calculation result is more accurate. However, as the
in the Intel Berkeley Research Laboratory. Here, the number number of clusters increases, there will be fewer nodes in the
of clusters is 12. The corresponding sensor nodes of each same cluster.
cluster are shown within the solid lines. Our DR-AWNNM Figure 7 shows the simulation results of the total maximum
algorithm for missing data reconstruction was applied within prediction error (TMPE) for different numbers of clusters.
each group. The red and the blue curves, respectively, represent the total
The main reason for clustering the sensors was to find maximum error of all classes corresponding to each K in
similarities between their measurements. Figure 6 shows the case of consecutive missing and random missing data.
temperature measurements from the sensors in two clus- According to the obtained results, the TMPE value decreases
ters. Here, sensors 22 and 20 are in the first cluster, and upon increasing the number of clusters, which confirms the
VOLUME 6, 2018 61427

TABLE 3. Partial experimental results with RMP.
TABLE 4. Partial Experimental Results With CMP.
theoretical expectations. However, when the number of clus-

ters exceeds a certain threshold, the curve shows an upward
trend. Thus, we selected the minimum TMPE corresponding
to the best K as the basis for subsequent research.
Next, we evaluated the performance of the proposed
DR-AWNNM algorithm in reconstructing multi-attribute
sensor data. In this step, we first used multi-attribute sensor
data to constitute a third-order tensor, where the three modes
represent sensor time stamp, sensor node ID, and attributes
(such as temperature and humidity). Then, we obtained the
tensor of multi-attribute sensor data.
In our experiment, we compared the output of the recov-
ered sensor data with that of existing algorithms, namely, FIGURE 8. Tensor-based multi-attribute sensor-data reconstruction, with
HaLRTC [30], ADMAR [31], CAMP [27] and EM-based random missing pattern.
Tucker decomposition algorithm [38]. The Tucker rank is
approximately rank-[2,2,2]. By contrast, we used the cor-
rect rank (rank-[2,2,2]) and a higher rank (here we obtained
rank-[5,5,2]) to execute Tucker decomposition. We applied Laboratory with the random missing pattern. In general,
Tucker decomposition to the Intel Berkeley dataset under two the performance of our algorithm is better than that of the
patterns: random missing and consecutive missing. existing algorithms considered for comparison. The proposed
We know that the parameter C is very important, and it DR-AWNNM algorithm performs as well as the other algo-
can even directly affect the effectiveness of our algorithm. rithms when sampling ratio is higher than 40%. Moreover,
Therefore, we discuss it separately. In [18], the parameter when the sampling ratio is lower than 40%, our error is close
C was set empirically as the square root of the tensor size; to 0.2, which is considerably lower than that of the other three
we changed its value on this basis, and its final value was as algorithms. We can also know that the CAMP algorithm also
follows: has a good performance, because when the data is missing,
the algorithm can develop and gradually develop the sparsity
• RMP : C = 2m2 , adaptive matching tracking framework based on these obser-
• CMP : C = 1.3m. vation nodes to reconstruct the missing data more accurately.
where m is the row of T . We obtained this result by conduct- Figure 9 shows the results of the consecutive missing
ing multiple experiments, and some of these experimental pattern. From this figure, it can be seen that the overall
results are presented in Tables 3 and 4; ε represents the error ratio under this pattern is lower than that under the
sampling ratio, and represents the data missing rate. random missing pattern. The Tucker decomposition of correct
Figure 8 shows the results of multi-attribute sensor data rank-[2,2,2] leads to the best performance, and the
reconstruction by using data from the Intel Berkeley Research DR-AWNNM algorithm performs as well as the best one
61428 VOLUME 6, 2018

FIGURE 9. Tensor-based multi-attribute sensor data reconstruction, with

consecutive missing pattern.
when the data missing rate of each node is lower than 50%.
However, with a slightly higher rank-[5,5,2], Tucker decom-
position achieves poor performance. This means that to use
the Tucker decomposition-based algorithm for accurately
reconstructing sensor data, we should first obtain the correct
n-rank. However, this is usually difficult in practice, espe-
cially when the tensor is incomplete. In addition, the error rate
of the CAMP algorithm is also lower than other algorithms.
When the sparsity becomes better and the compression ratio
gets higher, the CAMP algorithm can effectively reduce the
error and improve the accuracy. Notably, when the data miss-
ing rate exceeds 50%, the error rate of the proposed algorithm
is considerably lower than that of the other three algorithms,
and even when the data missing rate is as high as 90%,
the error rate of the proposed algorithm is lower than 20%.
FIGURE 10. (a) DR-AWNNM algorithm and DR-AWNNM1 algorithm under
The reason is that we combine the various attributes of the RMP model. (b) DR-AWNNM algorithm and DR-AWNNM1 algorithm under
data to enhance the intrinsic relationship between the data, CMP model.
and our algorithm uses the K -means cluster algorithm to
group sensors at the beginning, which greatly enhances the
external connection of the data and improves the accuracy of The computational complexity of DR-AWNNM consists
the algorithm. mainly of two parts. One is the complexity of the K -means
In addition, we present in Figure 10 the results obtained clustering algorithm, and the other is the cost of matrix
by using only the DR-AWNNM algorithm (DR-AWNNM1) completion calculation. When we use K -means clustering
to reconstruct the missing data under the RMP and the CMP algorithm to separate sensor nodes, we first set k cluster
models, here all parameters have the same value as DR- centers are randomly, and then calculate the distance from n
AWNNM algorithm. It can be seen that when the data loss points to k centers, the distance between the single image and
rate is high, the DR-AWNNM1 algorithm can reconstruct the the image is calculated as d, and last repeat the above process
missing data well. Moreover, the DR-AWNNM algorithm is for t times, so the complexity of K -means is O = k ×n×d ×t,
superior to the other algorithms, including DR-AWNNM1. namely O(kndt). Usually k, d and t can be considered as
The solution of DR-AWNNM algorithm, ADMAC algo- constants, therefore, the computational cost of K -means can
rithm, and HaLRTC algorithm essentially involves finding be simplified to O(n), which is linear. From Alg.1 we know
the augmented Lagrange function of the target function, and that the matrix completion algorithm mainly involves the
then the ADMM algorithm is used to solve it. Hence, Fig.11 iteration of Xik+1 , Mk+1
i and Uik+1 , the parameters iterated M
shows the convergence curves of the three algorithms in the times each time for N times, so the computational complexity
continuous missing data mode, in which the abscissa repre- of the matrix completion algorithm is O(n2 ), that is to say
sents the iteration number of algorithm convergence, and the the complexity of the ADMAC algorithm and the HaLRTC
ordinate represents the tolerance of the relative difference of algorithm is O(n2 ). Based on Eq.(24)-(26) we can learn that
outputs of two neighboring iterations. The red curve repre- the calculation cost of the weighted nuclear norm is O(1).
sents the proposed algorithm. It can be seen that our algorithm Therefore, the computational complexity of DR-AWNNM in
is convergent, and it is slightly superior to other algorithms. this paper is also O(n2 ). In addition, Fig.8 and Fig.9 show
VOLUME 6, 2018 61429

Finally, by comparing the convergence of different algo-

rithms, we showed that the proposed algorithm is better,
easier to converge, and less complex. In addition, we learned
that the proposed algorithm is more suitable for missing
data reconstruction under the random missing pattern than
under the consecutive missing pattern. In future research,
we will provide more insights into the performance of the
DR-AWNNM algorithm in different scenarios.
REFERENCES
[1] IEEE Internet Initiative. (2015). Towards a Definition of the Internet of
Things (IoT). [Online]. Available: http://iot.ieee.org/images/files/pdf/
IEEE_IoT_Towards_Definition_Internet_of_Things_Revision1
_27MAY15.pdf
FIGURE 11. Convergence Curves with consecutive missing pattern.
[2] L. Atzori, A. Iera, and G. Morabito, ‘‘The Internet of Things: A survey,’’
Comput. Netw., vol. 54, no. 15, pp. 2787–2805, Oct. 2010.
[3] K. Rose, S. Eldridge, and L. Chapin, ‘‘The Internet of Things:
An overview,’’ Internet Soc., Reston, VA, USA, Tech. Rep., 2015,
that the DR-AWNNM algorithm has significant effects and pp. 1–50.
far-reaching significance. [4] L. Xu, J. Wang, H. Zhang, and T. A. Gulliver, ‘‘Performance analysis
To summarize, the proposed DR-AWNNM algorithm out- of IAF relaying mobile D2D cooperative networks,’’ J. Franklin Inst.,
performs the other tested algorithms on the dataset obtained vol. 354, no. 2, pp. 902–916, Jan. 2017.
[5] L. Xu, J. Wang, Y. Liu, J. Yang, W. Shi, and T. A. Gulliver, ‘‘Outage
from Intel Berkeley Research Laboratory and considering performance for IDF relaying mobile cooperative networks,’’ in Proc. Int.
the two patterns of data missing. Specifically, the proposed Conf. 5G Future Wireless Netw.. Springer, 2017, pp. 395–402.
algorithm provides better results under the random missing [6] S. Chen, H. Xu, D. Liu, B. Hu, and H. Wang, ‘‘A vision of IoT: Applica-
tions, challenges, and opportunities with china perspective,’’ IEEE Internet
pattern than the consecutive missing pattern. Things J., vol. 1, no. 4, pp. 349–359, Aug. 2014.
[7] D. Bonino et al., ‘‘ALMANAC: Internet of Things for smart cities,’’ in
V. CONCLUSION Proc. 3rd Int. Conf. Future Internet Things Cloud, Rome, Italy, Aug. 2015,
pp. 309–316.
Missing data causes many difficulties in various IoT applica-
[8] C. Perera, C. H. Liu, S. Jayawardena, and M. Chen, ‘‘A survey on
tions. Although this is inevitable due to the inherent character- Internet of Things from industrial market perspective,’’ IEEE Access,
istic of IoT, to solve the problem, it is imperative to estimate vol. 2, pp. 1660–1679, 2014.
the missing data as accurately as possible. [9] O. Salman, I. Elhajj, A. Kayssi, and A. Chehab, ‘‘Edge computing
enabling the Internet of Things,’’ in Proc. IEEE 2nd World Forum Internet
In this paper, a missing data reconstruction method based Things (WF-IoT), Milan, Italy, Dec. 2015, pp. 603–608.
on automatic WNNM was developed, and K -means cluster- [10] M. Heil and R. Karban, ‘‘Explaining evolution of plant communication by
ing analysis was embedded into this forecasting process to airborne signals,’’ Trends Ecol. Evol., vol. 25, no. 3, pp. 137–144, 2010.
[11] L. Kong et al., ‘‘Surface coverage in sensor networks,’’ IEEE Trans.
improve the prediction accuracy. First, to apply spatial corre- Parallel Distrib. Syst., vol. 25, no. 1, pp. 234–243, Jan. 2014.
lation between sensor nodes, we used the K -means clustering [12] B. Nathani and R. Vijayvergia, ‘‘The Internet of intelligent things:
algorithm to separate sensor nodes into different groups. An overview,’’ in Proc. Int. Conf. Intell. Commun. Comput. Techn. (ICCT),
Second, we considered sensor networks with multiple Jaipur, India, Dec. 2017, pp. 119–122.
[13] Z. Yang, M. Li, and Y. Liu, ‘‘Sea depth measurement with restricted
types of sensors in each node. Accounting for possible cor- floating sensors,’’ in Proc. 28th IEEE Int. Real-Time Syst. Symp. (RTSS),
relations among multiple-attribute sensor data, we provided Tucson, AZ, USA, Dec. 2007, pp. 469–478.
a tensor-based method to estimate missing data and proposed [14] Y. Liu, Y. He, M. Li, J. Wang, K. Liu, and X. Li, ‘‘Does wireless sensor
network scale? A measurement study on GreenOrbs,’’ IEEE Trans. Parallel
an algorithm based on matrix rank-minimization method, Distrib. Syst., vol. 24, no. 10, pp. 1983–1993, Oct. 2013.
namely, DR-AWNNM. We considered the weights between [15] X.-Y. Liu, K.-L. Wu, Y. Zhu, L. Kong, and M.-Y. Wu, ‘‘Mobility increases
singular values and adaptively assign different weights to the surface coverage of distributed sensor networks,’’ Comput. Netw.,
vol. 57, no. 11, pp. 2348–2363, 2013.
each singular value. Especially, when the weights are sorted
[16] M. G. Lawrence, ‘‘The relationship between relative humidity and the
in a non-descending order, the optimal solution can be easily dewpoint temperature in moist air: A simple conversion and appli-
obtained in closed-form. cations,’’ Bull. Amer. Meteorol. Soc., vol. 86, no. 2, pp. 225–233,
Third, to demonstrate the feasibility of proposed method, 2005.
[17] E. J. Candès and B. Recht, ‘‘Exact low-rank matrix completion via con-
a traditional EM-based Tucker decomposition algorithm and vex optimization,’’ in Proc. Found. Comput. Math., 2009, vol. 9, no. 6,
other excellent algorithms, namely ADMAC, HaLRTC and pp. 806–812.
CAMP were introduced and compared. Our experiment sug- [18] S. Gu, Q. Xie, D. Meng, W. Zuo, X. Feng, and L. Zhang, ‘‘Weighted nuclear
norm minimization and its applications to low level vision,’’ Int. J. Comput.
gested that the proposed method, on the one hand, when Vis., vol. 121, no. 2, pp. 183–208, Jan. 2017.
the data missing rate is low or the sampling rate is high, [19] L. Kong, M. Xia, X.-Y. Liu, M.-Y. Wu, and X. Liu, ‘‘Data loss and
have outstanding performance as well as the other algorithms. reconstruction in sensor networks,’’ in Proc. IEEE INFOCOM, Apr. 2013,
On the other hand, when the data missing rate is high or the pp. 1654–1662.
[20] S. R. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong, ‘‘TinyDB:
sampling rate is low, it performs much better than the other An acquisitional query processing system for sensor networks,’’ ACM
algorithms. Trans. Database Syst., vol. 30, no. 1, pp. 122–173, 2005.
61430 VOLUME 6, 2018

[21] E. Granger, M. A. Rubin, S. Grossberg, and P. Lavoie, ‘‘Classification XIANG YU received the Ph.D. degree from the
of incomplete data using the fuzzy ARTMAP neural network,’’ in Proc. University of Electronic Science and Technology
IEEE-INNS-ENNS Int. Joint Conf. Neural Netw. (IJCNN), Neural Comput., of China, Chengdu, China, in 1995. From 1999 to
New Challenges Perspect. New Millennium, Como, Italy, vol. 6, Jul. 2000, 2002, he was Post-Doctoral/Associate Professor of
pp. 35–40. postdoctoral mobile station in computer science
[22] T. Cover and P. Hart, ‘‘Nearest neighbor pattern classification,’’ IEEE and technology with the University of Electronic
Trans. Inf. Theory, vol. 13, no. 1, pp. 21–27, Jan. 1967. Science and Technology of China. He first had an
[23] L. Pan and J. Li, ‘‘K-nearest neighbor based missing data estimation
industrial career in R&D, mostly in the field of
algorithm in wireless sensor networks,’’ Wireless Sensor Netw., vol. 2,
radio communications.
no. 2, pp. 115–122, 2010.
[24] M. H. Le Gruenwald, ‘‘Estimating missing values in related sensor data From 1991 to 2008, he was the Chief Repre-
streams,’’ in Proc. COMAD, 2005, pp. 83–94. sentative with Beijing Taige Electronics Co., Ltd., Chengdu Branch, and
[25] L. Gruenwald, H. Chok, and M. Aboukhamis, ‘‘Using data min- a Deputy Chief Engineer with the Seventh Research Institute, China Elec-
ing to estimate missing sensor data,’’ in Proc. 7th IEEE Int. Conf. tronics Technology Group Corporation. He was the expert of major projects
Data Mining Workshops (ICDMW), Omaha, NE, USA, Oct. 2007, of national science and technology. He is currently a Professor with the
pp. 207–212. Chongqing University of Posts and Telecommunications, Chongqing, China.
[26] J. Karjee, H. K. Rath, and A. Pal, ‘‘Efficient data prediction, recon- His main research interests lie in the areas of digital communications and
struction and estimation in indoor IoT networks,’’ in Proc. IEEE radio signal processing. He was a member of the China delegation for
6th Int. Conf. Future Internet Things Cloud (FiCloud), Aug. 2018, (ITU-R) WP5A, 5B, and 5C Conference.
pp. 236–243.
[27] H. Wu, M. Suo, J. Wang, P. Mohapatra, and J. Cao, ‘‘A holistic approach XIA FAN received the B.S. degree in communi-
to reconstruct data in ocean sensor network using compression sensing,’’ cation engineering from the Chongqing Univer-
IEEE Access, vol. 6, pp. 280–286, 2018. sity of Posts and Telecommunications, Chongqing,
[28] J. Chen, ‘‘The research of missing data recovery method based on feature China, in 2016, where she is currently pursuing the
analysis in wireless sensor networks,’’ in Proc. China Nat. Knowl. Infras- master’s degree in communication and electronic
truct. (CNKI), 2016.
information engineering. Since 2016, she has been
[29] G. Chen et al., ‘‘Multiple attributes-based data recovery in wireless sensor
studying at the Wireless Communication Technol-
networks,’’ in Proc. IEEE GLOBECOM, Atlanta, GA, USA, Dec. 2013,
pp. 103–108. ogy Innovation Laboratory, Broad Band Equip-
[30] J. Liu, P. Musialski, P. Wonka, and J. Ye, ‘‘Tensor completion for estimating ment Mobilization Center. Her research interests
missing values in visual data,’’ IEEE Trans. Pattern Anal. Mach. Intell., include mobile cloud computing, artificial intel-
vol. 35, no. 1, pp. 208–220, Jan. 2013. ligence, big data analysis in wireless communications, and the Internet of
[31] Y. Shao, Z. Chen, F. Li, and C. Fu, ‘‘Reconstruction of big sensor data,’’ Things.
in Proc. 2nd IEEE Int. Conf. Comput. Commun.(ICCC), Chengdu, China,
Oct. 2016, pp. 1–6. KAN CHEN received the B.S. degree in commu-
[32] X. Wu, C.-L. Chuang, and J.-A. Jiang, ‘‘Temperature map recov- nications engineering from the Chongqing Univer-
ery based on compressive sensing for large-scale wireless sensor net- sity of Posts and Telecommunications, Chongqing,
works,’’ in Proc. IEEE Int. Conf. Green Comput. Commun. IEEE Internet China, in 2017, where he is currently pursuing the
Things IEEE Cyber, Phys. Social Comput., Beijing, China, Aug. 2013, M.S. degree in information and communication
pp. 1202–1206. engineering. Since 2016, he has been studying at
[33] J.-F. Cai, E. J. Candès, and Z. Shen, ‘‘A singular value threshold- the Wireless Communication Technology Innova-
ing algorithm for matrix completion,’’ SIAM J. Optim., vol. 20, no. 4, tion Laboratory, Broadband Equipment Mobiliza-
pp. 1956–1982, 2010. tion Center. His research interest mainly focuses
[34] D. Raval, G. Raval, and S. Valiveti, ‘‘Optimization of clustering process
on signal processing for wireless communications,
for WSN with hybrid harmony search and K-means algorithm,’’ in Proc.
especially the multicarrier transmissions for next generation wireless com-
Int. Conf. Recent Trends Inf. Technol.(ICRTIT), Chennai, India, Apr. 2016,
pp. 1–6. munication system.
[35] J. A. Hartigan and M. A. Wong, ‘‘Algorithm AS 136: A k-means
clustering algorithm,’’ Appl. Statist., vol. 28, no. 1, pp. 100–108, SIRUI DUAN received the Ph.D. degree from
1979. the Beijing University of Posts and telecommu-
[36] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed Opti- nications, Beijing, China, in 2014. He is cur-
mization and Statistical Learning via the Alternating Direction Method rently a Lecturer with the Chongqing Univer-
of Multipliers. Now Foundations and Trends, 2011. [Online]. Available: sity of Posts and telecommunications, Chongqing,
https://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=8186925 China. His research interests mainly focus on sig-
[37] [Online]. Available: http://db.csail.mit.edu/labdata/labdata.html/ nal processing for wireless communications, with
[38] A. Smoliński, B. Walczak, and J. W. Einax, ‘‘Exploratory analysis of data an emphasis on multicarrier transmissions for next
sets with missing elements and outliers,’’ Chemosphere, vol. 49, no. 3, generation wireless communication system.
pp. 233–245, 2002.
VOLUME 6, 2018 61431

Multi-Attribute Missing Data Reconstruction Based On Adaptive Weighted Nuclear Norm Minimization in Iot

Uploaded by

Copyright:

Available Formats

Multi-Attribute Missing Data Reconstruction Based On Adaptive Weighted Nuclear Norm Minimization in Iot

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Multi-Attribute Missing Data Reconstruction Based On Adaptive Weighted Nuclear Norm Minimization in Iot

Uploaded by

Copyright:

Available Formats

Received September 25, 2018, accepted October 13, 2018, date of publication October 18, 2018, date of current

version November 9, 2018.

Multi-Attribute Missing Data Reconstruction

61420 VOLUME 6, 2018

Compressive Sensing(CS) is a powerful and generic tech-

VOLUME 6, 2018 61421

where  ⊂ {1, 2, · · · , n} × {1, 2, · · · , m} is the set of

61422 VOLUME 6, 2018

VOLUME 6, 2018 61423

61424 VOLUME 6, 2018

2) UPDATE M IV. EXPERIMENTAL RESULTS

VOLUME 6, 2018 61425

TABLE 2. Data samples obtained in the Intel Berkeley Research Laboratory.

Algorithm 1 DR-AWNNM Algorithm for Multi-Attribute

61426 VOLUME 6, 2018

FIGURE 5. Sensor locations and clustering in the Intel Berkeley Research

FIGURE 6. Comparison of the measurements within and among clusters.

VOLUME 6, 2018 61427

TABLE 3. Partial experimental results with RMP.

TABLE 4. Partial Experimental Results With CMP.

theoretical expectations. However, when the number of clus-

61428 VOLUME 6, 2018

FIGURE 9. Tensor-based multi-attribute sensor data reconstruction, with

VOLUME 6, 2018 61429

Finally, by comparing the convergence of different algo-

61430 VOLUME 6, 2018

VOLUME 6, 2018 61431

You might also like

where ⊂ {1, 2, · · · , n} × {1, 2, · · · , m} is the set of