A Holistic Approach To Distributed Dimensionality Reduction of Big Data
A Holistic Approach To Distributed Dimensionality Reduction of Big Data
A Holistic Approach To Distributed Dimensionality Reduction of Big Data
fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TCC.2015.2449855, IEEE Transactions on Cloud Computing
1
Abstract—With the exponential growth of data volume, big data have placed an unprecedented burden on current computing
infrastructure. Dimensionality reduction of big data attracts a great deal of attention in recent years as an efficient method to extract the
core data which is smaller to store and faster to process. This paper aims at addressing the three fundamental problems closely related
to distributed dimensionality reduction of big data, i.e. big data fusion, dimensionality reduction algorithm and construction of distributed
computing platform. A chunk tensor method is presented to fuse the unstructured, semi-structured and structured data as a unified
model in which all characteristics of the heterogeneous data are appropriately arranged along the tensor orders. A Lanczos based
High Order Singular Value Decomposition algorithm is proposed to reduce dimensionality of the unified model. Theoretical analyses
of the algorithm are provided in terms of storage scheme, convergence property and computation cost. To execute the dimensionality
reduction task, this paper employs the Transparent Computing paradigm to construct a distributed computing platform as well as utilizes
the linear predictive model to partition the data blocks. Experimental results demonstrate that the proposed holistic approach is efficient
for distributed dimensionality reduction of big data.
2168−7161 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TCC.2015.2449855, IEEE Transactions on Cloud Computing
2
2.1 Preliminaries
Fig. 1. The Transparent Computing paradigm.
Transparent Computing Paradigm. Transparent Com-
puting paradigm [7–9] aims at the vision advocated Tensor Decomposition. Tensor is a type of high
by pervasive computing in which users can demand dimension matrix widely used in many applications
computing services in a hassle-free way. Fig. 1 demon- such as computer vision, data mining, graph analy-
strates the paradigm that consists of three parts, e.g. sis and signal processing [10]. Let T ∈ RI1 ×I2 ×...×IP
Transparent Server (TS), Transparent Network (TN) and denote a P -order tensor, the p-mode unfolded matrix
Transparent Client (TC). TS works as a warehouse that T(p) ∈ RIp ×(Ip+1 Ip+2 ...IP I1 I2 ...Ip−1 ) contains the element
provides software resources such as Operating Systems, ei1 i2 ...ip ip+1 ...iP at the position with row number ip and
supporting softwares and analysis algorithms. A bare column number that is equal to
client first utilizes the remote booting protocol to initiate
itself and loads the Operating System image into local (ip+1 − 1)Ip+2 ...IP I1 ...Ip−1 + (ip+2 − 1)
memory, then it downloads softwares to construct a Ip+3 ...IP I1 ...Ip−1 + ... + (p2 − 1)I3 I4 ...Ip−1 (1)
runtime environment which is responsible for big data + · · · + ip−1 .
2168−7161 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TCC.2015.2449855, IEEE Transactions on Cloud Computing
3
The p-mode product of a tensor by a matrix is defined To extract the core data, the reduction function fdr in
as Eq. (7) is responsible for fusing the heterogenous data
(T ×p U )i1 i2 ...ip−1 jp ip+1 ...iP and constructing a computing platform to execute the
∑Ip
(2) dimensionality reduction tasks. Fig. 2 demonstrates the
= (ei1 i2 ...ip−1 ip ip+1 ...iP × ujp ip ).
ip =1 three fundamental problems closely related to distribut-
ed dimensionality reduction of big data.
During tensor decomposition, Eq. (2) is employed to
reduce the dimensionality of the p-th order. For an initial
Problem 2 core Problem 3
tensor T , the core tensor S and the approximate tensor Distributed
Dimensionality Hardware
T̂ [11] are defined as Reduction alg task platform
Computing
Platform
Algorithm
S = T ×1 U1 T ×2 U2 T ...×P UP T , Construction Software
(3) tensor
T̂ = S×1 U1 ×2 U2 ...×P UP .
Big Data Fusion Problem 1
Generally, the core tensor and the truncated unitary
orthogonal bases are considered as the compressed ver-
sion of the initial tensor. The reconstructed data in Unstructured Semi-structured Structured
Data Data Data
approximate tensor are more accurate than the original
data in initial tensor as the inconsistent, uncertain and
noisy data are removed. Fig. 2. The three problems closely related to distributed
Lanczos Method. The Lanczos method [12] is partic- dimensionality reduction of big data.
ularly useful to find eigenvalues and eigenvectors of
Problem 1: Big Data Fusion. The heterogenous dat from
a large scale sparse matrix. Let A denote a symmetric
multiple sources need to be fused as a unified model
matrix, the Lanczos decomposition is defined as
which will be used for dimensionality reduction.
AQk = Qk Dk + βk qk+1 eT
k, (4) Problem 2: Dimensionality Reduction Algorithm. The
large scale raw data in the unified model are uncertain,
where Qk is an orthogonal matrix, qk+1 and ek are inconsistent and redundant, they need to be cleaned to
unitary orthogonal vectors, D is a tridiagonal matrix that obtain the high quality core data which are small but
has the following form contain valuable information.
Problem 3: Platform Construction and Data Distribution.
α1 β1
.. The hardwares and softwares need to be integrated to
β1 α2 . construct a distributed computing platform, in which
k
D = . (5)
. . efficient method should be proposed to partition and
.. .. β
k−1
distribute the data blocks to autonomic devices.
βk−1 αk
The main objective of this paper is to address the three
Eq. (4) is a Lanczos decomposition with k length. The problems mentioned above. Firstly, a chunk tensor mod-
tridiagonal matrix Dk is obtained by carrying out the el is proposed to fuse the unstructured, semi-structured
iteration procedures and structured data as a unified tensor. During data
fusion, the local data can be preprocessed in Transparent
v = Aqj , αj = qjT v, Clients and only the sub-core data are submitted to
v = v − αj qj − βj−1 qj−1 , (6) Transparent Servers for integration. Secondly, a Lanc-
βj = ∥v∥2 , qj+1 = v/βj . zos based High Order Singular Value Decomposition
The Lanczos method is efficient for obtaining the larger algorithm is presented to reduce the dimensionality of
or smaller eigenvalues and the corresponding eigen- the unified tensor. Finally, the Transparent Computing
vectors of a symmetric matrix. Matrix-vector product paradigm is employed to construct a distributed com-
is the frequently used linear transformation during the puting platform, as well as a linear predictive method
Lanczos iteration. is utilized to partition big data to multi-size data blocks
which are distributed to heterogeneous devices.
2.2 Problem Statement and Solution Overview 3 B IG DATA F USION
Let Du , Dsemi and Ds denote the unstructured data, A chunk tensor model is proposed to fuse the unstruc-
semi-structured data and structured data respectively, tured, semi-structured and structured data. Concepts
core refer to the core data that consist of the core tensor and operations of the chunk tensor model are illustrated
and the truncated unitary orthogonal bases, hd and sf in this section.
denote the hardware and software, then the function fdr
to dimensionality reduction of big data can be formal- 3.1 Fusion Framework of Big Data
ized as
Fig. 3 depicts the fusion framework of big data based
fdr : {Du , Dsemi , Ds , hd, sf } → core. (7) on the Transparent Computing paradigm. The original
2168−7161 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TCC.2015.2449855, IEEE Transactions on Cloud Computing
4
data collected from multiple sources are transported to In Eq. (8), the dimensionality of each order in chunk
transparent clients to generate the sub core chunk tensors tensor C are not greater than the corresponding dimen-
which are submitted to transparent servers where the sionality in the initial tensor T .
unified core data are obtained by fusing all the sub core
chunk tensors. 71 72 73 74 75 73 74 75
76 77 78 79 80 78 79 80
81 82 83 84 85 83 84 85
Unified Core 86 87 88 89 90 88 89 90
Data
TS 41 42 43 44 45 C2 ∈ R 4×3
46 47 48 49 50
51 52 53 54 55 51 52 53
TN 56 57 58 59 60 56 57 58
TC 11 12 13 14 15 21 22 23
Core Chunk ... 16 17 18 19 20 26 27 28
Tensors 21 22 23 24 25
26 27 28 29 30 T ∈ R 4×5×3 C1 ∈ R 2×3×2
Original ...
Data Fig. 4. Example of an initial tensor and two chunk tensors.
Fig. 3. Fusion framework via the Transparent Computing The two chunk tensors C1 and C2 in Fig. 4 are inter-
paradigm. cepted from the initial tensor T . The number of orders of
chunk tensor C1 is equal to tensor T , while the number
The framework in Fig. 3 is hierarchical. The original of orders of chunk tensor C2 is fewer than tensor T . In
data are preprocessed in the clients, the core data are this example, the chunk tensor C2 is a 4-by-3 matrix.
extracted in the servers. This hierarchical model is flex- Definition 2: Chunk Extension Operator. Let C1 ∈
ible and efficient for big data fusion, it can achieve the RI1 ×I2 ×...×IM and C2 ∈ RI1 ×I2 ×...×IN be two chunk
following desired features: tensors, then the extension operator is defined as
1) Distributed Preprocessing of Original Data: The ⇀
2168−7161 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TCC.2015.2449855, IEEE Transactions on Cloud Computing
5
Eq. (10) formally describes the six parts of chunk tensor the individual XML elements and relationships. The GPS
C3 . In this paper, a new notation is employed to repre- data are integrated to tensor order Is . This nine-order
sent the partitioned parts, that is C(i,j,k) , i ∈ {1}, j ∈ tensor includes all data characteristics from the image,
{1, 2, 3}, k ∈ {1, 2}. The elements of chunk tensor XML and GPS data. In general case, as more types of
C(1,3,1) and chunk tensor C(1,1,2) in Fig. 5 are zeros, e.g. data are appended to the basic tensor space, the number
C(1,3,1) = 0 and C(1,1,2) = 0. of tensor orders is huge. For convenience this paper
employs T ∈ R×i=1 Ii to denote a P -order tensor model.
P
Definition 3: Chunking Approximate Tensor. Sup-
pose T is an initial tensor, C1 , C2 , . . . , CN are the
chunk tensors intercepted from tensor T , as well as
4 A L ANCZOS -BASED A LGORITHM FOR D I -
Ĉ1 , Ĉ2 , . . . , ĈN are the corresponding approximate
chunk tensors, then a chunking approximate tensor de- MENSIONALITY R EDUCTION OF B IG DATA
⌣
noted by T is defined as the extension product of all sub This paper proposes a Lanczos based High Order Sin-
⌣ ⇀ ⇀ ⇀ gular Value Decomposition (L-HOSVD) algorithm to
approximate chunk tensors, e.g. T = Ĉ1 ×Ĉ2 × . . . ×ĈN .
reduce the dimensionality of large scale heterogeneous
Given various types of data, we can first transform
data. Storage scheme, convergence property and compu-
them to low-rank chunk tensors, compute the sub ap-
tation cost of the algorithm are investigated in detail.
proximate chunk tensors and then integrate all the sub
approximate chunk tensors as a chunking approximate
tensor. Note that the approximate tensor T̂ is different 4.1 Decomposition Theorem for Large Scale Tensor
⌣
from the chunking approximate tensor T proposed in Theorem 1: Iterative Decomposition of Tensor. Let
this paper. For a P -order tensor, the approximate tensor T ∈ RI1 ×I2 ×···×IP denote a P -order tensor and vec
is reconstructed with the core tensor S and the truncated refer to a random vector, then there exists an iterative
unitary orthogonal bases U1 , U2 , . . . , UP , while the decomposition that satisfies equation
chunking approximate tensor is generated by integrating
S = T ×1 U1T ×2 U2T · · · ×P UPT , (13)
all the sub approximate chunk tensors.
where S is the core tensor as well as U1 , U2 , . . . , UP are
3.3 Fusion with Chunk Tensors the truncated orthogonal bases. The bases are obtained
In this paper a basic three-order tensor is defined for big during the iterative process with matrix-vector product.
data fusion as Proof. The unfolded matrix T(p) can be transformed to
Tbase ∈ RIt ×Is ×Id , (11) corresponding symmetric matrix Sym(T(p) ). Assume the
p-mode unfolded matrix has m rows and n columns, this
where the three orders denote time, space and device paper adopts T(p) ∈ Rm×n , 1 ≤ p ≤ P, m ≥ n to denote
characteristics respectively. Eq. (11) provides a basic the unfolded matrix. Define the symmetric matrix as
space that various types of data can be appended to it [ m×m ]
to generate a unified representation model [14]. 0 T(p)
Sym(T(p) ) = T , (14)
T(p) 0n×n
Is T
and let T(p) = U(p) Σ(p) V(p) be the singular value decom-
Image Data position of matrix T(p) , then the left singular matrix and
singular values can be calculated using the equations
GPS Data ′ ′′ ′
[ (p) , U] (p) ], U (p) ∈ R
U(p) = [U m×n
, U ′′ (p) ∈ Rm×(m−n) ,
XML Data Σ̂(p)
Σ(p) = , Σ̂(p) ∈ Rn×n .
It 0
(15)
By performing eigen value decomposition on the sym-
metric matrix Sym(T(p) ) = Q(p) Λ(p) QT (p) , we can obtain
Id [ ]
Λ(p) = Σ̂(p) , −Σ̂(p) , 0(m−n)×(m−n) ,
Fig. 6. Fuse three types of data as a unified tensor. [ ]
√1 U ′ (p) − √1 U ′ (p) U ′′
(p) (16)
2 2
Fig. 6 illustrates the integrating process of three types Q(p) = 1
√ V 1
√ V n×(m−n) .
2 2
0
of data, i.e. image data, XML document data and GPS da-
ta. These data are transformed to three sub chunk tensors Eq. (16) indicates that the left singular vectors and the
and are embedded to the basic tensor. The constructed singular values of matrix T(p) can be extracted from the
unified tensor model can be defined as eigenvectors and eigenvalues of the symmetric matrix
Sym(T(p) ). The Lanczos method described in Eqs. (4) ∼
T ∈ RIt ×Is ×Id ×Ih ×Iw ×Ics ×Iia ×Iib ×Ir , (12)
(6) is employed in this paper to quickly compute the left
where the orders Ih , Iw , Ics refer to the image height, singular vectors and the singular values of each unfolded
image width, color space respectively, Iia , Iib , Ir denote matrices.
2168−7161 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TCC.2015.2449855, IEEE Transactions on Cloud Computing
6
2168−7161 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TCC.2015.2449855, IEEE Transactions on Cloud Computing
7
the row ptr array to store the locations in the val array Let m denote the number of rows of matrix T(p) in Fig.
that start a new row. 10, n denote the number of columns, then m and n can
As the matrix Sym(T(2) ) is symmetric, only the nonze- be computed with the following equation
ro elements in the upper triangular portion are essential
for storage. The nonzero elements in the lower triangular ∏
P
m = Ip , n = Ii . (18)
portion can be implicitly obtained during the Lanczos
i=1, i̸=p
iteration. In Eq. (17), the symmetric matrix Sym(T(2) )
contains sixteen nonzero elements and eighty-four zero In this paper, we adopt mnz to denote the number of
elements. The eight nonzero elements in the upper tri- rows that contain nonzero elements, then the matrix-
angular portion are stored in memory. vector product of the symmetric matrix Sym(T(p) ) by
the vector q using the Compressed Row Storage format
0 0 0 0 0 9 7 0 0 0 is provided by
0 0 0 0 3 0 0 0 0 6
0 0 0 0 0 0 0 5 4 0
1: for i = 1 to mnz do
0 0 0 0 0 8 2 0 0 0
2: v(i) = 0.
0 3 0 0 0 0 0 0 0 0
Sym(T(2) ) = . (17) 3: for j = row ptr(i) to row ptr(i + 1) − 1 do
9 0 0 8 0 0 0 0 0 0 4: v(i) = v(i) + val(j) × q(col ind(j)).
7 0 0 2 0 0 0 0 0 0
5: end for
0 0 5 0 0 0 0 0 0 0
6: end for
0 0 4 0 0 0 0 0 0 0
7: for j = 1 to mnz do
0 6 0 0 0 0 0 0 0 0 8: for i = row ptr(j) to row ptr(j + 1) − 1 do
9: v(col ind(i)) = v(col ind(i)) + val(i) × q(j).
In Fig. 9, the val array contains the eight nonzero
10: end for
elements and the col ind array contains the correspond-
11: end for
ing column indices. The five locations that start a new
row are stored in the row ptr array. Instead of storing
one hundred (10 × 10) elements, this scheme only need All elements of the vector v are set to zeros before
twenty-one (8 + 8 + 5) storage locations. When the sym- the matrix-vector product. The matrix-vector product of
metric matrix Sym(T(2) ) in Eq. (17) is loaded to computer v = T(p) q is computed from Line 1 to Line 6. The multi-
memory using the CRS scheme, 79% storage locations plication operation can be expressed using the equation
are saved. The storage saving is significant. ∑ ∑
v(i) = (T(p) )i,j q(j) = ei,j q(j), (19)
j j
val 9 7 3 6 5 4 8 2
col_ind 6 7 5 10 8 9 6 7 where ei,j refers to the nonzero element of the matrix
row_ptr 1 3 5 7 9 T(p) . The product of v = (T(p) )T q is obtained from Line
7 to 11 with the following equation
Fig. 9. Compressed row storage of the sparse symmetric ∑ ∑
T
matrix. v(i + m) = (T(p) )i,j q(j) = ej,i q(j). (20)
j j
It is efficient to use the CRS format of the symmetric
matrix Sym(T(2) ) for matrix-vector product during the The Compressed Row Storage format illustrated in Fig.
Lanczos iteration. Fig. 10 demonstrates the matrix-vector 9 is used by the product operations mentioned above.
product of a symmetric matrix with a vector. The upper
tridiagonal part of the symmetric matrix multiplies by 4.4 Convergence and Accuracy of the L-HOSVD Al-
the low part of the vector, while the lower tridiagonal gorithm
part multiplies by the upper part of the vector.
This section investigates the convergence property of
m n the proposed Lanczos based High Order Singular Value
Decomposition (L-HOSVD) algorithm, and analyzes the
0 T( p ) ∈ R m×n m approximation accuracy of the decomposition results ob-
tained by the algorithm. Let T(p) = Up Σp VpT denote the
= × singular value decomposition of the p-mode unfolded
T Τ
0 n matrix T(p) , Upr = [u1 , u2 , . . . , ur ], Vpr = [v1 , v2 , . . . , vr ]
( p)
and Σrp = diag(σ1 , σ2 , . . . , σr ) denote the truncated
left singular matrix, right singular matrix, and singular
v Sym(T( p ) ) q
values respectively, then the truncated singular value
decomposition of the p-th unfolded matrix is defined as
Fig. 10. Illustration of the matrix-vector product of sym-
metric matrix Sym(T(p) ) by vector q. r
T(p) = Upr Σrp (Vpr )T . (21)
2168−7161 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TCC.2015.2449855, IEEE Transactions on Cloud Computing
8
The approximation accuracy of the reconstructed matrix 4.5 Computation Cost and Memory Usage
r
T(p) can be computed with the equation The decomposition of a large scale tensor is a great bur-
r
den on current computing infrastructure. For instance,
T(i) − T(i)
= σr+1 , (22)
let T ∈ R100×100×100×100 be a four order tensor, then the
F
where σr+1 is the (r + 1)-th singular value, ∥·∥F is the first-mode tensor unfolding T(1) is a 102 by 106 matrix.
Frobenius Norm [16]. Eq. (22) provides a method to In order to apply the classical Golub-Reinsch SVD [19]
measure the absolute error of the singular value decom- algorithm to factorize the unfolded matrix, this paper
T
position. This paper extends the absolute error value to uses the equation T(1) = V ΣT U T to compute the singular
a relative error ratio that can appropriately measure the value matrix Σ and the right singular vector matrix V .
approximation accuracy. The ratio is defined as As the flop count of the Golub-Reinsch SVD algorithms
is 4mn2 + 8n3 [20] where m and n are the number of
r
T(p) − T(p)
rows and columns respectively, the execution time to
F
ψ= . (23) decompose the unfolded matrix T(1) T
with a computer
T(p)
that achieves a speed of one trillion operations per
In order to find a better approximation for a unified second are about
tensor model T , this paper employs the error ratio ψ to 2 3
determine the amount of iterations during the Lanczos 4 × 102 × (106 ) + 8 × (106 )
≈ 93 (days). (24)
procedure. An example of the exit from the Lanczos 1012 × 3600 × 24
iteration is demonstrated in Fig. 11. An error ratio ψ This type of dimensionality reduction method is unbear-
is provided in advance based on the requirement of a ably slow for big data applications. Compared to the
big data application, then the singular value σr+1 in Golub-Reinsch SVD algorithm, the proposed L-HOSVD
Fig. 11(b) is obtained using Eq. (22). According to Eq. algorithm is more efficient. Theoretical analyses in terms
(15), the eigen value Λr+1 is equal to this singular value of computation cost and memory usage of the L-HOSVD
σr+1 . The Kaniel-Paige-Saad [17] theory indicates that algorithm are provided in this section.
the exterior eigenvalues are among the fast to converge. Computation Cost. Let T ∈ RI1 ×I2 ×...×IP denote a
The better separated these eigenvalues are from the rest P -order tensor, the required operations to decompose
of the eigenvalues, the faster they converge. Therefore this tensor consist of three parts, e.g. unfolding tensor
during the Lanczos iteration of the proposed L-HOSVD T to matrices T(1) , T(2) , . . . , T(P ) , singular value
algorithm, the procedure can break out once the eigen- decomposition on these unfolded matrices, and the p-
value Λr+1 is obtained. mode product of the tensor T by the obtained truncated
U2
matrices U1 , U2 , . . . , UP . The total flop counts f lp of
the mentioned three parts are equal to f lp = f lpunf +
f lpsvd + f lpprd . As the most time-consuming part is
f lpsvd , this section mainly focuses on the computation
cost of the singular value decomposition of the symmet-
3
U
T U1 S
ric matrix Sym(T(p) ), 1 ≤ p ≤ P .
During the Lanczos iteration, the frequently called
T^ operation is the matrix-vector product which consists
(a) of addition and scalar multiplication. Define nnz as a
function to compute the number of nonzero elements in
T(2) U2 Σ2 a matrix, then the flop counts of the Lanczos method
σ r +1 (V2r )Τ
Σ r2
...
T r
(2) = U r
2 × × described in Eq. (6) are illustrated in Table 2.
...
V2Τ
(b)
TABLE 2
Flop counts to decompose a symmetric matrix.
Fig. 11. Example of the exit condition during the Lanczos
iteration. Operation Name Multiplication Addition
v = Sym(T(p) )qj nnz(Sym(T(p) )) nnz(Sym(T(p) )) − 1
Using the error ratio ψ defined in Eq. (23), we can ∏P ∏P
αj = qjT v Ip + Ii Ip + Ii − 1
control the Lanczos iteration and obtain the truncated i=1, i̸=p i=1, i̸=p
SVD of the unfolded matrices which are used to com- v = v − αj qj ∏P ∏
P
2(Ip + Ii ) 2(Ip + Ii )
pute the core tensor and the approximate tensor. This −βj−1 qj−1 i=1, i̸=p i=1, i̸=p
dimensionality reduction technique can provide a low ∏
P ∏
P
βj = ∥v∥2 (Ip + Ii ) (Ip + Ii ) −1
rank approximation for the initial tensor T , which is i=1, i̸=p i=1, i̸=p
not the best approximation of the initial data, but is still ∏
P
qj+1 = v/βj (Ip + Ii ) 0
considered to be sufficiently good for many applications. i=1, i̸=p
2168−7161 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TCC.2015.2449855, IEEE Transactions on Cloud Computing
9
2168−7161 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TCC.2015.2449855, IEEE Transactions on Cloud Computing
10
the subscript is omitted, it implies this model is suitable a set of training data that can be defined as
for all devices. In this paper, the block size Bij+1 is a
1 eng 1 sec1 pro1 tra1
dependent variable that is affected by four independent 2 2 2 2
variables defined above, the linear function of this pre- χ = 1 eng sec pro tra . (28)
. .. ..
dictive model can be formulated as 1 .. . .
[ ]T
B j+1 = ω0 + ω1 eng j + ω2 secj + ω3 proj + ω4 traj , (27) Let B = B 1 , B 2 , . . . denote the vector of the
dependent variables in the training data, and ω =
T
where ω1 , ω2 , ω3 and ω4 refer to the weight coefficients [ω0 , ω1 , ω2 , ω3 , ω4 ] refer to the vector of the weight
for the four independent variables, as well as ω0 denotes coefficients, then the regression function of the sample
the average affection of other variables that are not data can be formalized as
included in this predictive model.
B = χ × ω. (29)
If χT χ is nonsingular, the coefficient parameters can be
5.3 Quantitative Methods of Dependent Variables obtained using equation
This section illustrates how to quantify the four de- ω = (χT χ)−1 χT B. (30)
pendent variables in Eq. (27). When a dimensionality
reduction task is scheduled to heterogenous clients and After calculating the solutions of this equation, we can
servers, the energy consumption is determined by the obtain the predictive function and estimate the block size
CPU workload which can be measured by the number of that is appropriate for a device to process by applying
CPU cycles required by the task. When the task is trans- Eq. (27). In this paper, the key operations of the proposed
mitted over a wireless network or Internet, the energy L-HOSVD method is matrix-vector multiplication which
consumption is produced for the data transmission. Let is mainly affected by the size of the matrix. The task
engtc , engtra and engts denote the energy consumed in scheduling is controlled by partitioning and distributing
transparent clients, communication network and trans- appropriate size of data blocks to the autonomic devices.
parent servers, then the total energy consumption for
executing the dimensionality reduction task is defined 6 P ERFORMANCE E VALUATION
as eng = engtc + engtra + engts . The amount of energy This section presents the performance evaluation results
consumed in devices and network can be obtained using of the proposed holistic approach for dimensionality
the calculation methods reported in Refs. [25, 26]. reduction of big data. The Brighkite dataset 1 is used for
The study of data security has been published in the evaluation. This dataset includes trajectory information
literature. In Ref. [27], three types of security services, (collected from user’s checking-in) as well as the social
i.e. confidentiality, authentication and integrity are con- relationships among these users. We also fused some
sidered for providing security services. Confidentiality unstructured video data to evaluate the effect of dimen-
protection service is against unauthorised disclosure of sionality reduction of heterogenous data. We have imple-
information. Authentication service is the process of reli- mented a number of algorithms for big data processing
ably determining the identity of a communicating party. such as chunk tensor fusion algorithm, Lanczos based
Integrity protection service is against the unauthorized High Order Singular Value Decomposition algorithm,
modification of data. In Ref. [28], the security features in- linear prediction algorithm, etc. But due to limitation of
cluding data integrity, identity privacy, location privacy, the paper length, we only present some representative
and authentication are investigated for formalization. experimental results.
Processing time are measured in seconds which repre-
sent the computation costs of dimensionality reduction
6.1 Unified Tensor and Chunk Tensor
tasks. The matrix-vector multiplication, matrix-matrix
multiplication and n-mode product of a tensor by a The trajectory data are fused to a four-order tensor which
matrix are three key linear operations in the reduction can be denoted as T ∈ RIuser ×Itime ×Ilatitude ×Ilongitude .
tasks. All of them can be calculated in seconds. The For illustration we selected trajectories of three users
communication traffic are the bytes exchanged in the net- and demonstrated them in Fig. 13. The reference time
work bandwidth, they can be quantified by calculating is 20 : 56 : 10 on the evening of May 25th, 2009, as
the data passed between the autonomic devices. well as the time axis is measured in one-minute unit.
Different colors are used to distinguish users in order
Iuser . The trajectories of the first, second and third user
5.4 Computing the Coefficient Parameters are indicated by 2100, 1210 and 2100 points respectively.
To illustrate the chunk tensor model, we represented a
The least square estimation method [29, 30] is employed week of trajectories of one user as an initial chunk tensor
to compute the coefficient parameters of the predictive and employed high order singular value decomposition
model. We first perform the dimensionality reduction
tasks on the distributed computing platform and collect 1. http://snap.stanford.edu/data/loc-brightkite.html
2168−7161 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TCC.2015.2449855, IEEE Transactions on Cloud Computing
11
50
80%
0
Longitude
−50
60%
Ratio
−100
−150
60 40%
40 1
0.5
20 0 6
Latitude −0.5 x 10
0 −1 20%
Time (Minute) Reduction Ratio
Approximation Ratio
Fig. 13. Trajectories of three users in a tensor model. 0
2 4 6 8 10 12
Experiment
to obtain the approximate chunk tensor. Fig. 14(a) and Fig. 15. Dimensionality reduction and approximate error.
Fig. 14(b) demonstrate the two chunks as well as Fig.
14(c) shows the two chunks in an integrated space which
reveals that there is no great offset between the initial while the approximation ratio decrease slowly from
chunk tensor and the approximate chunk tensor. 99.2% to 86%. In the fifth experiment, 48% core data
can ensure 97% approximation accuracy, where the core
x 10
5
x 10
5
tensor accounted for 71.8% of the core data and the
4 4 truncated orthogonal bases accounted for 28.2%. The
average results of the experiments show that 38.2% core
Longitude
Longitude
0 0
10 10
x 10
5 5 1
2
4 x 10
5 5 1
2
4
6.3 Performance of the L-HOSVD Algorithm
x 10 x 10
Latitude 0 0 Time (Minute) Latitude 0 0 Time (Minute)
(a) (b)
1
x 10
5 General HOSVD
Lanczos−based HOSVD
4 0.8
Normalized execution time
Longitude
2
0.6
Initial Chunk
0 Approximate Chunk
10
15000 0.4
5
5 10000
x 10 5000
0 0
Latitude Time (Minute)
0.2
(c)
2168−7161 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TCC.2015.2449855, IEEE Transactions on Cloud Computing
12
as the approximate accuracy ratio rapidly meet the pre- the faster algorithms for solving symmetric tridiagonal
defined value. We carried out a great deal of iterations on eigen-problem in both sequential and parallel settings.
the symmetric matrix generated from the second-mode Drmac and Veselic[38] apply the classical Jacobi method
unfolded matrix of the seven order tensor, and depicted to design fast SVD algorithm.
the top forty eigenvalues in Fig. 17, which illustrates
the convergence property of the L-HOSVD algorithm. 7.3 Lanczos Method
Experimental results show that the symmetric matrix The Lanczos method [39] is efficient for finding several
reconstructed from 33 percent exterior eigenvalues and extreme eigenvectors and eigenvalues of a large scale
the corresponding eigenvectors can guarantee 95 percent sparse symmetric matrix. In Ref. [40] a block Lanczos
approximate accuracy. type algorithm is proposed to compute the tridiagonal
4
matrix. Parallel implementation of Lanczos algorithms
x 10
3 [41, 42] are studied to improve the efficiency, those al-
gorithms aim to effectively parallelize the matrix-vector
2 or vector-vector operations.
Many studies on tensor decomposition, singular val-
1 ue decomposition and Lanczos method have been per-
formed over the past few decades. However, all the
Eigenvalue
2168−7161 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TCC.2015.2449855, IEEE Transactions on Cloud Computing
13
[4] E. Henry, J. Hofrichter et al., “Singular Value De- pp. 1324–1342, 2000.
composition: Application to Analysis of Experimen- [19] G. H. Golub and C. Reinsch, “Singular Value De-
tal Data,” Essential Numerical Computer Methods, vol. composition and Least Squares Solutions,” Nu-
210, pp. 81–138, 2010. merische Mathematik, vol. 14, no. 5, pp. 403–420, 1970.
[5] P. Comon and C. Jutten, Handbook of Blind Source [20] G. H. Golub and C. F. Van Loan, Matrix Computa-
Separation: Independent Component Analysis and Ap- tions. JHU Press, 2012, vol. 3.
plications. Academic Press, 2010. [21] K. Wu and H. Simon, “Thick-Restart Lanczos
[6] E. E. Cureton and R. B. D’Agostino, Factor Analysis: Method for Large Symmetric Eigenvalue Problem-
An Applied Approach. Psychology Press, 2013. s,” SIAM Journal on Matrix Analysis and Applications,
[7] Y. Zhang and Y. Zhou, “Transparent Computing: vol. 22, no. 2, pp. 602–616, 2000.
A New Paradigm for Pervasive Computing,” in [22] H. D. Simon, “The Lanczos Algorithm with Partial
Ubiquitous Intelligence and Computing. Springer, Reorthogonalization,” Mathematics of Computation,
2006, pp. 1–11. vol. 42, no. 165, pp. 115–142, 1984.
[8] Y. Zhang, L. T. Yang, Y. Zhou, and W. Kuang, “Infor- [23] Y. Zhang and Y. Zhou, “4VP: A Novel Meta OS Ap-
mation Security Underlying Transparent Comput- proach for Streaming Programs in Ubiquitous Com-
ing: Impacts, Visions and Challenges,” Web Intelli- puting,” in 21st International Conference on Advanced
gence and Agent Systems, vol. 8, no. 2, pp. 203–217, Information Networking and Applications AINA’07.
2010. IEEE, 2007, pp. 394–403.
[9] Y. Zhang and Y. Zhou, “Separating Computation [24] M. Johnston and S. Venaas, “Dynamic Host Con-
and Storage with Storage Virtualization,” Computer figuration Protocol (DHCP) Options for the Intel
Communications, vol. 34, no. 13, pp. 1539–1548, 2011. Preboot eXecution Environment (PXE),” RFC 4578,
[10] T. G. Kolda and B. W. Bader, “Tensor Decomposi- November, Tech. Rep., 2006.
tions and Applications,” SIAM Review, vol. 51, no. 3, [25] K. Kumar and Y.-H. Lu, “Cloud Computing for
pp. 455–500, 2009. Mobile Users: Can Offloading Computation Save
[11] L. De Lathauwer, B. De Moor, and J. Vandewalle, “A Energy?” Computer, vol. 43, no. 4, pp. 51–56, 2010.
Multilinear Singular Value Decomposition,” SIAM [26] W. Zhang, Y. Wen, K. Guan, D. Kilper, H. Luo, and
Journal on Matrix Analysis and Applications, vol. 21, D. Wu, “Energy-Optimal Mobile Cloud Computing
no. 4, pp. 1253–1278, 2000. under Stochastic Wireless Channel,” 2013.
[12] M. Grüning, A. Marini, and X. Gonze, “Implemen- [27] M. Lin, L. Xu, L. T. Yang, X. Qin, N. Zheng, Z. Wu,
tation and Testing of Lanczos-based Algorithms and M. Qiu, “Static Security Optimization for Real-
for Random-Phase Approximation Eigenproblems,” Time Systems,” IEEE Transactions on Industrial Infor-
Computational Materials Science, vol. 50, no. 7, pp. matics, vol. 5, no. 1, pp. 22–37, 2009.
2148–2156, 2011. [28] A. N. Khan, M. Mat Kiah, S. U. Khan, and S. A.
[13] S. Ragnarsson and C. F. Van Loan, “Block Tensor Madani, “Towards Secure Mobile Cloud Comput-
Unfoldings,” SIAM Journal on Matrix Analysis and ing: A Survey,” Future Generation Computer Systems,
Applications, vol. 33, no. 1, pp. 149–169, 2012. vol. 29, no. 5, pp. 1278–1299, 2013.
[14] L. Kuang, F. Hao, L. T. Yang, M. Lin, C. Luo, and [29] R. C. Aster, B. Borchers, and C. H. Thurber, Parame-
G. Min, “A Tensor-Based Approach for Big Da- ter Estimation and Inverse Problems. Academic Press,
ta Representation and Dimensionality Reduction,” 2013.
IEEE Transactions on Emerging Topics in Computing, [30] G. Liu, M. Fan, and G. Quan, “Neighbor-Aware
(in Press, DOI 10.1109/TETC.2014.2330516). Dynamic Thermal Management for Multi-Core Plat-
[15] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and form,” in Europe Conference on Design, Automation &
H. van der Vorst, Templates for the Solution of Alge- Test & Exhibition DATE’12. IEEE, 2012, pp. 187–192.
braic Eigenvalue Problems: A Practical Guide. SIAM, [31] C. M. Martin, “Tensor Decompositions Workshop
2000, vol. 11. Discussion Notes,” American Institute of Mathematics,
[16] A. L. Custódio, H. Rocha, and L. N. Vicente, “In- 2004.
corporating Minimum Frobenius Norm Models in [32] L. R. Tucker, “Some Mathematical Notes on Three-
Direct Search,” Computational Optimization and Ap- Mode Factor Analysis,” Psychometrika, vol. 31, no. 3,
plications, vol. 46, no. 2, pp. 265–278, 2010. pp. 279–311, 1966.
[17] T. Ericsson and A. Ruhe, “The Spectral Transfor- [33] R. Wetzker, C. Zimmermann, C. Bauckhage, and
mation Lanczos Method for the Numerical Solution S. Albayrak, “I Tag, You Tag: Translating Tags for
of Large Sparse Generalized Symmetric Eigenvalue Advanced User Models,” in Proceedings of the Third
Problems,” Mathematics of Computation, vol. 35, no. ACM International Conference on Web Search and Data
152, pp. 1251–1268, 1980. Mining. ACM, 2010, pp. 71–80.
[18] L. De Lathauwer, B. De Moor, and J. Vandewalle, [34] B. Savas and L. Eldén, “Handwritten Digit Classifi-
“On the Best Rank-1 and Rank-(R1, R2,..., Rn) Ap- cation Using Higher Order Singular Value Decom-
proximation of Higher-Order Tensors,” SIAM Jour- position,” Pattern Recognition, vol. 40, no. 3, pp. 993–
nal on Matrix Analysis and Applications, vol. 21, no. 4, 1003, 2007.
2168−7161 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TCC.2015.2449855, IEEE Transactions on Cloud Computing
14
[35] G. Golub and W. Kahan, “Calculating the Singular Laurence T. Yang received the B.E. degree
Values and Pseudo-Inverse of A Matrix,” Journal of in Computer Science and Technology from Ts-
inghua University, China and the Ph.D degree
the Society for Industrial & Applied Mathematics, Series in Computer Science from University of Victoria,
B: Numerical Analysis, vol. 2, no. 2, pp. 205–224, 1965. Canada. He is a professor in the School of
[36] M. Gu and S. C. Eisenstat, “A Divide-and-Conquer Computer Science and Technology at Huazhong
University of Science and Technology, China,
Algorithm for the Bidiagonal SVD,” SIAM Journal and in the Department of Computer Science,
on Matrix Analysis and Applications, vol. 16, no. 1, St. Francis Xavier University, Canada. His re-
pp. 79–92, 1995. search interests include parallel and distributed
computing, embedded and ubiquitous/pervasive
[37] J. Cuppen, “A Divide and Conquer Method for the computing, and big data. His research has been supported by the
Symmetric Tridiagonal Eigenproblem,” Numerische National Sciences and Engineering Research Council, and the Canada
Mathematik, vol. 36, no. 2, pp. 177–195, 1980. Foundation for Innovation.
[38] Z. Drmac and K. Veselic, “New Fast and Accurate
Jacobi SVD Algorithm. I,” SIAM Journal on Matrix
Analysis and Applications, vol. 29, no. 4, pp. 1322–
1342, 2008.
[39] J. K. Cullum and R. A. Willoughby, Lanczos Algo-
Jinjun Chen received the Ph.D degree in com-
rithms for Large Symmetric Eigenvalue Computations: puter science and software engineering from
Vol. 1: Theory. SIAM, 2002, vol. 41. Swinburns University of Technology, Australia.
[40] I. Popovyan, “Efficient Parallelization of Lanczos He is an associate professor from the Faculty
of Engineering and Information Technology, U-
Type Algorithms,” Tech. Rep., 2011. niversity of Technology Sydney (UTS), Australia.
[41] I. Flesch and R. Bisseling, “A New Parallel Ap- He is the director of Lab of Cloud Computing and
proach to the Block Lanczos Algorithm for Finding Distributed Systems at UTS. His research inter-
ests include cloud computing, big data, workflow
Nullspaces Over GF (2),” Master’s Thesis, Department management, privacy and security, and related
of Mathematics, Utrecht University, Utrecht, the Nether- various research topics. His research results
lands, 2002. have been published in more than 100 papers in high-quality journals
and at conferences.
[42] P. L. Montgomery, “Distributed Linear Algebra,” in
Proceedings, 4th Workshop on Elliptic Curve Cryptog-
raphy, 2000.
2168−7161 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.