A Holistic Approach to Distributed Dimensionality Reduction of Big Data



Liwei Kuang, Yaoxue Zhang, Laurence T. Yang, Jinjun Chen, Fei Hao, and Changqing Luo

• L. Kuang, F. Hao and C. Luo are with the School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China. E-mail: {liweikuang, feehao, chqluo2013}@gmail.com.
• Y. Zhang is with the School of Information Science and Engineering, Central South University, Changsha 410083, China. E-mail: [email protected].
• L.T. Yang is with the School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China, and the Department of Computer Science, St. Francis Xavier University, Antigonish, NS, Canada. E-mail: [email protected].
• J. Chen is with the Faculty of Engineering and Information Technology, University of Technology, Sydney, PO Box 123, Broadway NSW 2007, Australia. E-mail: [email protected].

Abstract—With the exponential growth of data volume, big data have placed an unprecedented burden on current computing infrastructure. Dimensionality reduction of big data has attracted a great deal of attention in recent years as an efficient method to extract the core data, which are smaller to store and faster to process. This paper aims at addressing the three fundamental problems closely related to distributed dimensionality reduction of big data, i.e. big data fusion, the dimensionality reduction algorithm, and the construction of a distributed computing platform. A chunk tensor method is presented to fuse unstructured, semi-structured and structured data into a unified model in which all characteristics of the heterogeneous data are appropriately arranged along the tensor orders. A Lanczos based High Order Singular Value Decomposition algorithm is proposed to reduce the dimensionality of the unified model. Theoretical analyses of the algorithm are provided in terms of storage scheme, convergence property and computation cost. To execute the dimensionality reduction tasks, this paper employs the Transparent Computing paradigm to construct a distributed computing platform and utilizes a linear predictive model to partition the data blocks. Experimental results demonstrate that the proposed holistic approach is efficient for distributed dimensionality reduction of big data.

Index Terms—Big Data, Tensor, Dimensionality Reduction, Transparent Computing.

1 INTRODUCTION

Data volume is increasing rapidly towards the Terabyte or Petabyte level in various fields including biomedicine, the Internet of Things (IoT) and social media. Big data can support a better understanding of the real world as well as provide high-quality services to users through efficient analysis approaches. However, the tremendously growing data are so large and complex that they are beyond the ability of current computing infrastructure and analysis algorithms to manage and process. Hence, new approaches should be further explored to address the challenges caused by big data.

As the excess of dimensionality can bring about inconsistency, redundancy and confusion in big data applications, extracting the principal variables that strongly capture the distinguishing characteristics of heterogeneous data becomes a primary requirement. Dimensionality reduction [1] is a process that decreases the number of variables in large scale data; it can avoid the effect of the curse of dimensionality [2]. During the past decades, the study of dimensionality reduction has been widely reported in the literature. Principal Component Analysis (PCA) [3] is a technique that expresses the data in such a way as to highlight their similarities and differences: PCA applies linear operations to transform the data to a new coordinate system in which the greatest variance comes to lie on the first coordinate, the second greatest on the next coordinate, and so on. Singular Value Decomposition (SVD) [4] is a method to identify the dimensions along which data points exhibit the most variation; SVD allows one to find the best approximation of the original data using fewer dimensions. Independent Component Analysis (ICA) [5] is a high-order method that seeks linear projections, not necessarily orthogonal to each other, that are as nearly statistically independent as possible. Factor analysis [6] is a statistical method to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors.

Although many studies have been performed on dimensionality reduction, the mentioned methods suffer from some limitations. They are mainly concentrated on homogeneous data, they are computationally expensive when applied to large scale data, and scalability problems might arise if the reduction tasks are scheduled to heterogeneous computing devices. All these investigations fail to recognize the importance of a systematic treatment of dimensionality reduction. The main purpose of this paper is to put forward a holistic approach to address the three fundamental problems closely related to distributed dimensionality reduction of big data, that is, big data fusion, the dimensionality reduction algorithm, and the construction of a distributed computing platform. Our major contributions are summarized as follows:
• We propose a chunk tensor method to fuse the heterogeneous data from multiple sources as a unified tensor model. The original data are represented as sub-chunk tensors in local clients, and the extracted core chunk tensors are uploaded to servers for integration. Concepts and operations of the chunk tensor method are provided, and the fusion process is illustrated in detail.
• We present a Lanczos based High Order Singular Value Decomposition (L-HOSVD) algorithm that can quickly obtain the core tensor and the truncated orthogonal bases from the unified tensor model. The tensor unfoldings are transformed to symmetric matrices, and the Lanczos method is employed to compute the singular vectors and singular values. According to our experiments, the L-HOSVD algorithm is efficient and competitive.
• We employ the Transparent Computing paradigm to construct a distributed computing platform with heterogeneous devices to perform the dimensionality reduction tasks of big data. A linear predictive model is used to estimate the size of the data blocks that will be distributed to autonomic devices.
The remainder of this paper is organized as follows. Section 2 recalls the preliminaries and formally defines the distributed dimensionality reduction problem of big data. A chunk tensor model is proposed for big data fusion in Section 3. In Section 4, a Lanczos based High Order Singular Value Decomposition algorithm is explored. Section 5 illustrates the construction of a distributed computing platform via the Transparent Computing paradigm. Experimental results are shown in Section 6. After reviewing the related works in Section 7, we conclude the paper in Section 8.

2 PRELIMINARIES AND PROBLEM STATEMENT

The preliminaries of the Transparent Computing paradigm, tensor decomposition and the Lanczos method are reviewed in this section. Additionally, the distributed dimensionality reduction problems of big data are formalized, and an overview of the holistic solution is provided. Table 1 summarizes the primary symbols used in this paper.

TABLE 1
Table of symbols.

Symbol        Definition
T             initial tensor
S             core tensor
T̂             approximate tensor
C             chunk tensor
T̆             chunking approximate tensor
T(p)          p-mode unfolded matrix
Sym(T(p))     symmetric matrix created with T(p)
D             tridiagonal matrix
Q             eigenvector matrix
U (V)         left (right) singular vector matrix
Σ (Λ)         singular (eigen) value
×p            p-mode product of a tensor by a matrix
∥T∥F          Frobenius norm of tensor T
α, β          elements of the tridiagonal matrix
dev           device of the computing platform
B_i^j         size of the data block processed in device i at time j
ω             weight parameter
χ             training data

2.1 Preliminaries

Transparent Computing Paradigm. The Transparent Computing paradigm [7–9] aims at the vision advocated by pervasive computing, in which users can demand computing services in a hassle-free way. Fig. 1 demonstrates the paradigm, which consists of three parts, i.e. the Transparent Server (TS), the Transparent Network (TN) and the Transparent Client (TC). A TS works as a warehouse that provides software resources such as Operating Systems, supporting software and analysis algorithms. A bare client first utilizes the remote booting protocol to initiate itself and loads the Operating System image into local memory; it then downloads software to construct a runtime environment that is responsible for big data processing. With the Transparent Computing paradigm, the Von Neumann architecture used in many computers can be extended into a distributed computing environment spatio-temporally.

Fig. 1. The Transparent Computing paradigm.

Tensor Decomposition. A tensor is a type of high dimension matrix widely used in many applications such as computer vision, data mining, graph analysis and signal processing [10]. Let T ∈ R^{I1×I2×...×IP} denote a P-order tensor; the p-mode unfolded matrix T(p) ∈ R^{Ip×(I_{p+1}I_{p+2}...I_P I_1 I_2...I_{p−1})} contains the element e_{i1 i2...ip...iP} at the position with row number i_p and column number equal to

(i_{p+1} − 1)I_{p+2}···I_P I_1···I_{p−1} + (i_{p+2} − 1)I_{p+3}···I_P I_1···I_{p−1} + ··· + (i_1 − 1)I_2 I_3···I_{p−1} + ··· + i_{p−1}.   (1)
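For illustration (this sketch is ours, not part of the original paper), an unfolding can be written in a few lines of NumPy. The sketch below uses NumPy's native row-major ordering for the columns rather than the cyclic ordering of Eq. (1); any fixed convention works, provided the matching fold operation inverts it. The names unfold and fold are assumptions.

```python
import numpy as np

def unfold(T, p):
    """p-mode unfolding: rows are indexed by mode p, columns by all other modes."""
    return np.moveaxis(T, p, 0).reshape(T.shape[p], -1)

def fold(M, p, shape):
    """Inverse of unfold for the same (row-major) column convention."""
    lead = (shape[p],) + tuple(s for i, s in enumerate(shape) if i != p)
    return np.moveaxis(M.reshape(lead), 0, p)
```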

The p-mode product of a tensor by a matrix is defined as

(T ×p U)_{i1 i2 ... i_{p−1} j_p i_{p+1} ... i_P} = Σ_{i_p=1}^{I_p} (e_{i1 i2 ... i_{p−1} i_p i_{p+1} ... i_P} × u_{j_p i_p}).   (2)

During tensor decomposition, Eq. (2) is employed to reduce the dimensionality of the p-th order. For an initial tensor T, the core tensor S and the approximate tensor T̂ [11] are defined as

S = T ×1 U1^T ×2 U2^T ··· ×P UP^T,
T̂ = S ×1 U1 ×2 U2 ··· ×P UP.   (3)

Generally, the core tensor and the truncated unitary orthogonal bases are considered as the compressed version of the initial tensor. The reconstructed data in the approximate tensor are more accurate than the original data in the initial tensor, as the inconsistent, uncertain and noisy data are removed.
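As an illustrative sketch (assuming the unfold/fold helpers given above; not the authors' implementation), the p-mode product of Eq. (2) and the core/approximate tensors of Eq. (3) can be composed as follows:

```python
import numpy as np

def mode_product(T, U, p):
    """p-mode product T x_p U of Eq. (2): contract mode p of T with U."""
    Tp = unfold(T, p)                                   # (I_p, product of the rest)
    shape = (U.shape[0],) + tuple(s for i, s in enumerate(T.shape) if i != p)
    return np.moveaxis((U @ Tp).reshape(shape), 0, p)

def core_and_approx(T, Us):
    """Eq. (3): core tensor S and approximate tensor T_hat from bases Us."""
    S = T
    for p, U in enumerate(Us):
        S = mode_product(S, U.T, p)                     # S = T x_p U_p^T
    T_hat = S
    for p, U in enumerate(Us):
        T_hat = mode_product(T_hat, U, p)               # T_hat = S x_p U_p
    return S, T_hat
```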
Lanczos Method. The Lanczos method [12] is particularly useful for finding eigenvalues and eigenvectors of a large scale sparse matrix. Let A denote a symmetric matrix; the Lanczos decomposition is defined as

A Q_k = Q_k D_k + β_k q_{k+1} e_k^T,   (4)

where Q_k is an orthogonal matrix, q_{k+1} and e_k are unitary orthogonal vectors, and D_k is a tridiagonal matrix of the form

D_k = \begin{bmatrix} \alpha_1 & \beta_1 & & \\ \beta_1 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_{k-1} \\ & & \beta_{k-1} & \alpha_k \end{bmatrix}.   (5)

Eq. (4) is a Lanczos decomposition of length k. The tridiagonal matrix D_k is obtained by carrying out the iteration procedure

v = A q_j,  α_j = q_j^T v,
v = v − α_j q_j − β_{j−1} q_{j−1},   (6)
β_j = ∥v∥₂,  q_{j+1} = v / β_j.

The Lanczos method is efficient for obtaining the largest or smallest eigenvalues and the corresponding eigenvectors of a symmetric matrix. The matrix-vector product is the most frequently used linear transformation during the Lanczos iteration.
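A minimal NumPy sketch of the iteration in Eq. (6) is given below; full reorthogonalization (discussed in Section 4.5) is added because the q_j lose orthogonality in floating point. The eigenvalues of the resulting tridiagonal matrix D_k, e.g. via scipy.linalg.eigh_tridiagonal, then approximate the extreme eigenvalues of A. The function name and signature are assumptions.

```python
import numpy as np

def lanczos(matvec, n, k, rng=np.random.default_rng(0)):
    """k steps of Eq. (6) for a symmetric operator given as q -> A q.
    Returns the tridiagonal coefficients (alpha, beta) and the basis Q."""
    alpha, beta = np.zeros(k), np.zeros(k)
    Q = np.zeros((k + 1, n))
    q0 = rng.standard_normal(n)
    Q[0] = q0 / np.linalg.norm(q0)
    for j in range(k):
        v = matvec(Q[j])
        alpha[j] = Q[j] @ v
        v -= alpha[j] * Q[j]
        if j > 0:
            v -= beta[j - 1] * Q[j - 1]
        v -= Q[:j + 1].T @ (Q[:j + 1] @ v)   # full reorthogonalization
        beta[j] = np.linalg.norm(v)
        if beta[j] < 1e-12:                  # invariant subspace found: stop early
            return alpha[:j + 1], beta[:j], Q[:j + 1]
        Q[j + 1] = v / beta[j]
    return alpha, beta[:k - 1], Q[:k]
```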
2.2 Problem Statement and Solution Overview

Let Du, Dsemi and Ds denote the unstructured data, semi-structured data and structured data respectively, let core refer to the core data that consist of the core tensor and the truncated unitary orthogonal bases, and let hd and sf denote the hardware and software; then the function fdr for dimensionality reduction of big data can be formalized as

fdr : {Du, Dsemi, Ds, hd, sf} → core.   (7)

To extract the core data, the reduction function fdr in Eq. (7) is responsible for fusing the heterogeneous data and constructing a computing platform to execute the dimensionality reduction tasks. Fig. 2 demonstrates the three fundamental problems closely related to distributed dimensionality reduction of big data.

Fig. 2. The three problems closely related to distributed dimensionality reduction of big data.

Problem 1: Big Data Fusion. The heterogeneous data from multiple sources need to be fused as a unified model, which will then be used for dimensionality reduction.

Problem 2: Dimensionality Reduction Algorithm. The large scale raw data in the unified model are uncertain, inconsistent and redundant; they need to be cleaned to obtain the high quality core data, which are small but contain valuable information.

Problem 3: Platform Construction and Data Distribution. The hardware and software need to be integrated to construct a distributed computing platform, in which an efficient method should be proposed to partition and distribute the data blocks to autonomic devices.

The main objective of this paper is to address the three problems mentioned above. Firstly, a chunk tensor model is proposed to fuse the unstructured, semi-structured and structured data as a unified tensor. During data fusion, the local data can be preprocessed in Transparent Clients and only the sub-core data are submitted to Transparent Servers for integration. Secondly, a Lanczos based High Order Singular Value Decomposition algorithm is presented to reduce the dimensionality of the unified tensor. Finally, the Transparent Computing paradigm is employed to construct a distributed computing platform, and a linear predictive method is utilized to partition big data into multi-size data blocks which are distributed to heterogeneous devices.

3 BIG DATA FUSION

A chunk tensor model is proposed to fuse the unstructured, semi-structured and structured data. Concepts and operations of the chunk tensor model are illustrated in this section.

3.1 Fusion Framework of Big Data

Fig. 3 depicts the fusion framework of big data based on the Transparent Computing paradigm.


The original data collected from multiple sources are transported to transparent clients to generate the sub core chunk tensors, which are submitted to transparent servers where the unified core data are obtained by fusing all the sub core chunk tensors.

Fig. 3. Fusion framework via the Transparent Computing paradigm.

The framework in Fig. 3 is hierarchical: the original data are preprocessed in the clients, and the core data are extracted in the servers. This hierarchical model is flexible and efficient for big data fusion, and it achieves the following desired features:

1) Distributed Preprocessing of Original Data: The heterogeneous data collected from multiple sources are transmitted to the nearest transparent clients for preprocessing. The noisy, inconsistent and redundant data are removed, and the sub core chunk tensors are obtained in the autonomic devices. This distributed framework is efficient and scalable for big data fusion.
2) Decreasing Communication Traffic: As only the sub core chunk tensors are submitted to the transparent servers, the required network bandwidth is greatly decreased. When needed, the sub core chunk tensors can be used for data reconstruction.
3) Saving Storage Space: After obtaining the unified core data, all the sub core chunk tensors can be removed; only the unified core data are essential to retain. Storage space is greatly saved by only storing the nonzero elements in the core tensor and the truncated orthogonal bases.

3.2 Concepts and Operations of Chunk Tensor

Concepts and operations of the proposed chunk tensor model are illustrated in this section. The study of the block tensor [13] has been reported in the literature: blocking a tensor is to partition the index range vectors associated with each tensor order. The chunk tensor is different from the block tensor in that a chunk might have fewer orders than the corresponding initial tensor.

Definition 1: Chunk Tensor. Let T ∈ R^{I1×I2×...×IP} denote a P-order initial tensor; then a chunk tensor C is defined as

C ∈ R^{J1×J2×...×JM}, M ≤ P.   (8)

In Eq. (8), the dimensionality of each order in chunk tensor C is not greater than the corresponding dimensionality in the initial tensor T.

Fig. 4. Example of an initial tensor and two chunk tensors.

The two chunk tensors C1 ∈ R^{2×3×2} and C2 ∈ R^{4×3} in Fig. 4 are intercepted from the initial tensor T ∈ R^{4×5×3}. The number of orders of chunk tensor C1 is equal to that of tensor T, while the number of orders of chunk tensor C2 is fewer; in this example, the chunk tensor C2 is a 4-by-3 matrix.

Definition 2: Chunk Extension Operator. Let C1 ∈ R^{I1×I2×...×IM} and C2 ∈ R^{I1×I2×...×IN} be two chunk tensors; then the extension operator is defined as

C3 = C1 ⇀× C2,   (9)

where C3 ∈ R^{J1×J2×...×JK}, K ≤ M + N, is a K-order chunk tensor. Let Set1 = {I1, I2, ..., IM}, Set2 = {I1, I2, ..., IN} and Set3 = {J1, J2, ..., JK}; then the order set Set3 is the union of Set1 and Set2, that is, Set3 = Set1 ∪ Set2.

Fig. 5. Generate a new chunk tensor with the chunk extension operator.

In Fig. 5, two chunk tensors with different numbers of orders are extended to a 3-order chunk tensor. Chunk tensor C1 ∈ R^{I1×I2×I3} = R^{4×4×2} and chunk tensor C2 ∈ R^{I1×I2} = R^{4×3} are unified as a chunk tensor C3 ∈ R^{4×5×3}. Zero elements are padded into the new chunk tensor because the dimensions of the three chunk tensors are not equal. For clear description, we partition the index range vectors associated with each order of chunk tensor C3 and denote them as follows:

1 : 4 = [1 : 4],
1 : 5 = [1 : 2 | 3 : 4 | 5 : 5],   (10)
1 : 3 = [1 : 2 | 3 : 3].


Eq. (10) formally describes the six parts of chunk tensor C3. In this paper, a new notation is employed to represent the partitioned parts, namely C(i,j,k), i ∈ {1}, j ∈ {1, 2, 3}, k ∈ {1, 2}. The elements of chunk tensors C(1,3,1) and C(1,1,2) in Fig. 5 are zeros, i.e. C(1,3,1) = 0 and C(1,1,2) = 0.
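The extension operator can be sketched as a zero-padded embedding. The helper below and the offsets chosen for the Fig. 5 layout are our own illustrative assumptions: a lower-order chunk is promoted with trailing size-1 orders before placement.

```python
import numpy as np

def embed_chunk(base, chunk, offsets):
    """Place `chunk` inside the zero tensor `base` starting at `offsets`,
    promoting a lower-order chunk with trailing singleton orders."""
    c = chunk.reshape(chunk.shape + (1,) * (base.ndim - chunk.ndim))
    base[tuple(slice(o, o + s) for o, s in zip(offsets, c.shape))] = c
    return base

# Fig. 5: C1 (4x4x2) and C2 (4x3) unified into a 4x5x3 chunk tensor C3;
# the blocks C(1,3,1) and C(1,1,2) remain zero, matching Eq. (10).
C1, C2 = np.ones((4, 4, 2)), np.ones((4, 3))
C3 = np.zeros((4, 5, 3))
C3 = embed_chunk(C3, C1, (0, 0, 0))   # columns 1:4, depths 1:2
C3 = embed_chunk(C3, C2, (0, 2, 2))   # columns 3:5, depth 3
```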
Definition 3: Chunking Approximate Tensor. Suppose T is an initial tensor, C1, C2, ..., CN are the chunk tensors intercepted from tensor T, and Ĉ1, Ĉ2, ..., ĈN are the corresponding approximate chunk tensors; then the chunking approximate tensor, denoted by T̆, is defined as the extension product of all sub approximate chunk tensors, i.e. T̆ = Ĉ1 ⇀× Ĉ2 ⇀× ... ⇀× ĈN.

Given various types of data, we can first transform them to low-rank chunk tensors, compute the sub approximate chunk tensors, and then integrate all the sub approximate chunk tensors as a chunking approximate tensor. Note that the approximate tensor T̂ is different from the chunking approximate tensor T̆ proposed in this paper: for a P-order tensor, the approximate tensor is reconstructed with the core tensor S and the truncated unitary orthogonal bases U1, U2, ..., UP, while the chunking approximate tensor is generated by integrating all the sub approximate chunk tensors.
3.3 Fusion with Chunk Tensors

In this paper a basic three-order tensor is defined for big data fusion as

Tbase ∈ R^{It×Is×Id},   (11)

where the three orders denote the time, space and device characteristics respectively. Eq. (11) provides a basic space to which various types of data can be appended to generate a unified representation model [14].

Fig. 6. Fuse three types of data as a unified tensor.

Fig. 6 illustrates the integrating process of three types of data, i.e. image data, XML document data and GPS data. These data are transformed to three sub chunk tensors and are embedded into the basic tensor. The constructed unified tensor model can be defined as

T ∈ R^{It×Is×Id×Ih×Iw×Ics×Iia×Iib×Ir},   (12)

where the orders Ih, Iw, Ics refer to the image height, image width and color space respectively, and Iia, Iib, Ir denote the individual XML elements and relationships. The GPS data are integrated into tensor order Is. This nine-order tensor includes all data characteristics from the image, XML and GPS data. In the general case, as more types of data are appended to the basic tensor space, the number of tensor orders becomes huge. For convenience, this paper employs T ∈ R^{×_{i=1}^{P} I_i} to denote a P-order tensor model.

4 A LANCZOS-BASED ALGORITHM FOR DIMENSIONALITY REDUCTION OF BIG DATA

This paper proposes a Lanczos based High Order Singular Value Decomposition (L-HOSVD) algorithm to reduce the dimensionality of large scale heterogeneous data. The storage scheme, convergence property and computation cost of the algorithm are investigated in detail.

4.1 Decomposition Theorem for Large Scale Tensor

Theorem 1: Iterative Decomposition of Tensor. Let T ∈ R^{I1×I2×···×IP} denote a P-order tensor and let vec refer to a random vector; then there exists an iterative decomposition that satisfies the equation

S = T ×1 U1^T ×2 U2^T ··· ×P UP^T,   (13)

where S is the core tensor and U1, U2, ..., UP are the truncated orthogonal bases. The bases are obtained during an iterative process built on matrix-vector products.

Proof. The unfolded matrix T(p) can be transformed to a corresponding symmetric matrix Sym(T(p)). Assume the p-mode unfolded matrix has m rows and n columns; this paper adopts T(p) ∈ R^{m×n}, 1 ≤ p ≤ P, m ≥ n, to denote the unfolded matrix. Define the symmetric matrix as

Sym(T_{(p)}) = \begin{bmatrix} 0^{m \times m} & T_{(p)} \\ T_{(p)}^T & 0^{n \times n} \end{bmatrix},   (14)

and let T(p) = U(p) Σ(p) V(p)^T be the singular value decomposition of matrix T(p); then the left singular matrix and the singular values can be partitioned as

U_{(p)} = [\,U'_{(p)} \;\; U''_{(p)}\,],\; U'_{(p)} \in R^{m \times n},\; U''_{(p)} \in R^{m \times (m-n)},\qquad
\Sigma_{(p)} = \begin{bmatrix} \hat{\Sigma}_{(p)} \\ 0 \end{bmatrix},\; \hat{\Sigma}_{(p)} \in R^{n \times n}.   (15)

By performing eigenvalue decomposition on the symmetric matrix, Sym(T(p)) = Q(p) Λ(p) Q(p)^T, we can obtain

\Lambda_{(p)} = \mathrm{diag}\big(\hat{\Sigma}_{(p)},\, -\hat{\Sigma}_{(p)},\, 0^{(m-n) \times (m-n)}\big),\qquad
Q_{(p)} = \begin{bmatrix} \tfrac{1}{\sqrt{2}} U'_{(p)} & -\tfrac{1}{\sqrt{2}} U'_{(p)} & U''_{(p)} \\ \tfrac{1}{\sqrt{2}} V_{(p)} & \tfrac{1}{\sqrt{2}} V_{(p)} & 0^{n \times (m-n)} \end{bmatrix}.   (16)

Eq. (16) indicates that the left singular vectors and the singular values of matrix T(p) can be extracted from the eigenvectors and eigenvalues of the symmetric matrix Sym(T(p)). The Lanczos method described in Eqs. (4)–(6) is employed in this paper to quickly compute the left singular vectors and the singular values of each unfolded matrix.
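Assuming the lanczos() sketch from Section 2.1 is available, Theorem 1 can be exercised on a single unfolding as follows: the product with Sym(T(p)) is applied implicitly per Eq. (14), the Ritz pairs of the tridiagonal matrix approximate the eigenpairs, and the top block of each positive eigenvector is rescaled by √2 to recover a left singular vector per Eq. (16). This is a sketch under those assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

def lanczos_svd(Tp, r, k):
    """Top-r singular values and left vectors of an unfolding T(p) via Sym(T(p))."""
    m, n = Tp.shape
    def matvec(q):                     # Eq. (14) applied without forming Sym(T(p))
        return np.concatenate([Tp @ q[m:], Tp.T @ q[:m]])
    alpha, beta, Q = lanczos(matvec, m + n, k)
    theta, W = eigh_tridiagonal(alpha, beta)     # Ritz values/vectors of D_k
    top = np.argsort(theta)[::-1][:r]            # largest Ritz values ~ sigma_i
    Y = Q.T @ W[:, top]                          # Ritz vectors in R^(m+n)
    return Y[:m] * np.sqrt(2), theta[top]        # Eq. (16): U' = sqrt(2) x top block
```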


Fig. 7. Transform a matrix to the corresponding symmetric matrix.

Fig. 7 demonstrates the symmetrization process for a matrix containing five rows and three columns: the 5-by-3 matrix is transformed to the corresponding 8-by-8 symmetric matrix using Eq. (14).

4.2 Dimensionality Reduction Algorithm

Based on the iterative decomposition theorem, this paper proposes a distributed dimensionality reduction algorithm which can represent the large scale heterogeneous data as a chunking approximate tensor and extract the high quality core data.
Algorithm 1 Lanczos based High Order Singular Value Decomposition (L-HOSVD) algorithm.

Input: Heterogeneous data Du, Dsemi, Ds.
Output: Core data core.
1: Represent the heterogeneous data as low rank chunk tensors C1, C2, ..., CM in the transparent clients.
2: Compute the core chunk tensors and submit them to the transparent servers.
3: Obtain the corresponding approximate chunk tensors Ĉ1, Ĉ2, ..., ĈM from the core chunk tensors.
4: Embed the approximate chunk tensors into the basic tensor space, T = Tbase ⇀× Ĉ1 ⇀× Ĉ2 ⇀× ... ⇀× ĈM, to obtain a unified tensor T ∈ R^{I1×I2×...×IP}.
5: Unfold tensor T to P matrices T(1), ..., T(P).
6: for each unfolded matrix T(p), 1 ≤ p ≤ P do
7:   Transform matrix T(p) to the corresponding symmetric matrix Sym(T(p)).
8:   Obtain the larger eigenvalues and the corresponding eigenvectors by applying the Lanczos method to the symmetric matrix Sym(T(p)).
9:   Extract the singular values and singular vectors from the eigenvalues and eigenvectors.
10:  Compute the truncated unitary orthogonal basis Up of the unfolded matrix T(p).
11: end for
12: Collect the truncated orthogonal bases and obtain the core tensor S = T ×1 U1^T ×2 U2^T ... ×P UP^T.
13: return the core data core = {S, U1, U2, ..., UP}.

Algorithm 1 depicts the procedure of dimensionality reduction based on the Lanczos method. The unstructured data Du, semi-structured data Dsemi, and structured data Ds collected from multiple sources are modeled as low order chunk tensors in the transparent clients in Line 1. The core chunk tensors are computed and submitted to the transparent servers in Line 2. All the approximate chunk tensors reconstructed from the core chunk tensors in Line 3 are embedded into the three-order basic tensor Tbase ∈ R^{It×Is×Id} to obtain a unified tensor T ∈ R^{I1×I2×...×IP} in Line 4. The fusion procedures from Line 1 to Line 4 are the essential prerequisites for dimensionality reduction of big data. In Line 5, the unified tensor T is unfolded to P matrices T(1), T(2), ..., T(P). Each unfolded matrix is then decomposed using the Lanczos method in the following steps. Line 7 employs Eq. (14) to transform the unfolded matrix T(p) to the corresponding symmetric matrix Sym(T(p)). The Lanczos method is applied to obtain the large eigenvalues Λ(p) and the corresponding eigenvectors Q(p) in Line 8. Using Eq. (16), the singular values Σ(p) and singular vectors Up are obtained in Line 9. Each left singular vector matrix truncated in Line 10 is collected in Line 12 to compute the core tensor S using Eq. (3). Line 13 returns the obtained core data core, which consists of the core tensor S and the truncated unitary orthogonal bases U1, U2, ..., UP.
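Lines 5–13 of Algorithm 1 can then be sketched by composing the helpers introduced earlier (unfold, mode_product and lanczos_svd, all of which are our own illustrative names); the per-order ranks and the Lanczos length are assumed inputs.

```python
def l_hosvd(T, ranks, k=50):
    """Sketch of Algorithm 1, Lines 5-13: truncated bases per order, then the core."""
    Us = []
    for p, r in enumerate(ranks):
        U, _ = lanczos_svd(unfold(T, p), r, k)   # Lines 7-10
        Us.append(U)
    S = T
    for p, U in enumerate(Us):                   # Line 12: S = T x_p U_p^T
        S = mode_product(S, U.T, p)
    return S, Us                                 # Line 13: core = {S, U_1..U_P}
```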
4.3 Storage Scheme for Symmetric Matrix during Lanczos Iteration

Generally, the unified tensor T ∈ R^{I1×I2×...×IP} constructed from the large scale heterogeneous data is sparse; only the nonzero elements of tensor T are essential for the linear operations in the L-HOSVD algorithm. Hence, this paper employs the Compressed Row Storage (CRS) [15] method to store the unfolded matrices. The CRS scheme is efficient for the matrix-vector product and can decrease memory usage during the Lanczos iteration.

Fig. 8. A sparse tensor and three unfolded matrices.

Fig. 8 is an example that demonstrates a three-order tensor T ∈ R^{2×4×3} and the corresponding unfolded matrices T(1), T(2) and T(3). Eq. (17) shows the symmetric matrix Sym(T(2)) generated from the second-mode unfolded matrix T(2) ∈ R^{4×6}. Fig. 9 demonstrates the storage format of the symmetric matrix Sym(T(2)), in which consecutive nonzero elements are stored in contiguous memory locations.


Three arrays are generated to store this symmetric matrix: the val array stores the values of the nonzero elements of the matrix, arranged in a row-wise fashion; the col_ind array stores the column indices of the elements; and the row_ptr array stores the locations in the val array that start a new row.

As the matrix Sym(T(2)) is symmetric, only the nonzero elements in the upper triangular portion are essential for storage; the nonzero elements in the lower triangular portion can be implicitly obtained during the Lanczos iteration. In Eq. (17), the symmetric matrix Sym(T(2)) contains sixteen nonzero elements and eighty-four zero elements, and the eight nonzero elements in the upper triangular portion are stored in memory.

Sym(T_{(2)}) = \begin{bmatrix}
0&0&0&0&0&9&7&0&0&0\\
0&0&0&0&3&0&0&0&0&6\\
0&0&0&0&0&0&0&5&4&0\\
0&0&0&0&0&8&2&0&0&0\\
0&3&0&0&0&0&0&0&0&0\\
9&0&0&8&0&0&0&0&0&0\\
7&0&0&2&0&0&0&0&0&0\\
0&0&5&0&0&0&0&0&0&0\\
0&0&4&0&0&0&0&0&0&0\\
0&6&0&0&0&0&0&0&0&0
\end{bmatrix}.   (17)

In Fig. 9, the val array contains the eight nonzero elements and the col_ind array contains the corresponding column indices. The five locations that start a new row are stored in the row_ptr array. Instead of storing one hundred (10 × 10) elements, this scheme only needs twenty-one (8 + 8 + 5) storage locations. When the symmetric matrix Sym(T(2)) in Eq. (17) is loaded into computer memory using the CRS scheme, 79% of the storage locations are saved; the storage saving is significant.

val:     9 7 3 6 5 4 8 2
col_ind: 6 7 5 10 8 9 6 7
row_ptr: 1 3 5 7 9

Fig. 9. Compressed row storage of the sparse symmetric matrix.
It is efficient to use the CRS format of the symmetric matrix Sym(T(2)) for the matrix-vector product during the Lanczos iteration. Fig. 10 demonstrates the matrix-vector product of a symmetric matrix with a vector: the upper triangular part of the symmetric matrix multiplies the lower part of the vector, while the lower triangular part multiplies the upper part of the vector.

Fig. 10. Illustration of the matrix-vector product of symmetric matrix Sym(T(p)) by vector q.

Let m denote the number of rows of matrix T(p) in Fig. 10 and n denote the number of columns; then m and n can be computed with the following equation:

m = I_p,  n = ∏_{i=1, i≠p}^{P} I_i.   (18)

In this paper we adopt m_nz to denote the number of rows that contain nonzero elements. The matrix-vector product of the symmetric matrix Sym(T(p)) by the vector q using the Compressed Row Storage format is given by

1: for i = 1 to m_nz do
2:   v(i) = 0.
3:   for j = row_ptr(i) to row_ptr(i + 1) − 1 do
4:     v(i) = v(i) + val(j) × q(col_ind(j)).
5:   end for
6: end for
7: for j = 1 to m_nz do
8:   for i = row_ptr(j) to row_ptr(j + 1) − 1 do
9:     v(col_ind(i)) = v(col_ind(i)) + val(i) × q(j).
10:  end for
11: end for

All elements of the vector v are set to zero before the matrix-vector product. The upper product v = T(p) q is computed from Line 1 to Line 6; the multiplication can be expressed as

v(i) = Σ_j (T(p))_{i,j} q(j) = Σ_j e_{i,j} q(j),   (19)

where e_{i,j} refers to a nonzero element of the matrix T(p). The lower product v = (T(p))^T q is obtained from Lines 7 to 11 with the following equation:

v(i + m) = Σ_j ((T(p))^T)_{i,j} q(j) = Σ_j e_{j,i} q(j).   (20)

The Compressed Row Storage format illustrated in Fig. 9 is used by the product operations mentioned above.
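For illustration (a sketch of ours, using SciPy's CSR format, which matches the CRS layout above), the Fig. 9 arrays can be loaded with 0-based indices, and the two loop nests of Lines 1–11 become two sparse products:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Fig. 9 arrays shifted to 0-based indexing; rows are the four upper rows
# of Sym(T(2)), columns are global indices into the 10-element vector.
val     = np.array([9., 7., 3., 6., 5., 4., 8., 2.])
col_ind = np.array([5, 6, 4, 9, 7, 8, 5, 6])
row_ptr = np.array([0, 2, 4, 6, 8])
upper = csr_matrix((val, col_ind, row_ptr), shape=(4, 10))

q = np.arange(1., 11.)
v = np.concatenate([upper @ q,                # Lines 1-6: v(i), Eq. (19)
                    (upper.T @ q[:4])[4:]])   # Lines 7-11: v(i+m), Eq. (20)
```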


4.4 Convergence and Accuracy of the L-HOSVD Algorithm

This section investigates the convergence property of the proposed Lanczos based High Order Singular Value Decomposition (L-HOSVD) algorithm and analyzes the approximation accuracy of the decomposition results obtained by the algorithm. Let T(p) = Up Σp Vp^T denote the singular value decomposition of the p-mode unfolded matrix T(p), and let Up^r = [u1, u2, ..., ur], Vp^r = [v1, v2, ..., vr] and Σp^r = diag(σ1, σ2, ..., σr) denote the truncated left singular matrix, right singular matrix and singular values respectively; then the truncated singular value decomposition of the p-th unfolded matrix is defined as

T(p)^r = Up^r Σp^r (Vp^r)^T.   (21)

The approximation accuracy of the reconstructed matrix T(p)^r can be computed with the equation

∥T(p) − T(p)^r∥_F = σ_{r+1},   (22)

where σ_{r+1} is the (r + 1)-th singular value and ∥·∥_F is the Frobenius norm [16]. Eq. (22) provides a method to measure the absolute error of the singular value decomposition. This paper extends the absolute error value to a relative error ratio that can appropriately measure the approximation accuracy. The ratio is defined as

ψ = ∥T(p) − T(p)^r∥_F / ∥T(p)∥_F.   (23)

In order to find a better approximation for a unified tensor model T, this paper employs the error ratio ψ to determine the number of iterations during the Lanczos procedure. An example of the exit from the Lanczos iteration is demonstrated in Fig. 11. An error ratio ψ is provided in advance based on the requirement of a big data application; then the singular value σ_{r+1} in Fig. 11(b) is obtained using Eq. (22). According to Eq. (15), the eigenvalue Λ_{r+1} is equal to this singular value σ_{r+1}. The Kaniel-Paige-Saad theory [17] indicates that the exterior eigenvalues are among the fastest to converge: the better separated these eigenvalues are from the rest of the spectrum, the faster they converge. Therefore, during the Lanczos iteration of the proposed L-HOSVD algorithm, the procedure can break out once the eigenvalue Λ_{r+1} is obtained.
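As a small illustration of this exit rule (our own sketch), the rank at which the iteration may stop can be read off the converged singular values, following Eqs. (22) and (23):

```python
import numpy as np

def truncation_rank(sigma, psi):
    """Smallest rank r whose error ratio (Eq. (23)) is at most psi, taking
    the absolute error of a rank-r truncation to be sigma_{r+1} (Eq. (22))."""
    s = np.sort(np.asarray(sigma))[::-1]
    norm = np.linalg.norm(s)                 # ||T(p)||_F computed from its spectrum
    for r in range(1, len(s)):
        if s[r] / norm <= psi:               # s[r] is sigma_{r+1} (0-based)
            return r
    return len(s)
```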
Fig. 11. Example of the exit condition during the Lanczos iteration.

Using the error ratio ψ defined in Eq. (23), we can control the Lanczos iteration and obtain the truncated SVDs of the unfolded matrices, which are used to compute the core tensor and the approximate tensor. This dimensionality reduction technique provides a low rank approximation of the initial tensor T; it is not the best approximation of the initial data, but it is still considered sufficiently good for many applications. The Higher Order Orthogonal Iteration [18] can be used to find a best rank approximation of a tensor.

4.5 Computation Cost and Memory Usage

The decomposition of a large scale tensor is a great burden on current computing infrastructure. For instance, let T ∈ R^{100×100×100×100} be a four-order tensor; then the first-mode tensor unfolding T(1) is a 10² by 10⁶ matrix. In order to apply the classical Golub-Reinsch SVD [19] algorithm to factorize the unfolded matrix, this paper uses the equation T(1)^T = V Σ^T U^T to compute the singular value matrix Σ and the right singular vector matrix V. As the flop count of the Golub-Reinsch SVD algorithm is 4mn² + 8n³ [20], where m and n are the numbers of rows and columns respectively, the execution time to decompose the unfolded matrix T(1)^T on a computer that achieves a speed of one trillion operations per second is about

(4 × 10² × (10⁶)² + 8 × (10⁶)³) / (10¹² × 3600 × 24) ≈ 93 (days).   (24)

This type of dimensionality reduction method is unbearably slow for big data applications. Compared to the Golub-Reinsch SVD algorithm, the proposed L-HOSVD algorithm is more efficient. Theoretical analyses of the computation cost and memory usage of the L-HOSVD algorithm are provided in this section.

Computation Cost. Let T ∈ R^{I1×I2×...×IP} denote a P-order tensor. The operations required to decompose this tensor consist of three parts, i.e. unfolding tensor T to matrices T(1), T(2), ..., T(P), performing singular value decomposition on the unfolded matrices, and computing the p-mode products of the tensor T by the obtained truncated matrices U1, U2, ..., UP. The total flop count flp of the three parts is flp = flp_unf + flp_svd + flp_prd. As the most time-consuming part is flp_svd, this section mainly focuses on the computation cost of the singular value decomposition of the symmetric matrices Sym(T(p)), 1 ≤ p ≤ P.

During the Lanczos iteration, the most frequently called operation is the matrix-vector product, which consists of additions and scalar multiplications. Define nnz as a function that computes the number of nonzero elements in a matrix, and write n = ∏_{i=1, i≠p}^{P} I_i as in Eq. (18); the flop counts of one step of the Lanczos method described in Eq. (6) are listed in Table 2.

TABLE 2
Flop counts to decompose a symmetric matrix.

Operation                          Multiplications     Additions
v = Sym(T(p)) q_j                  nnz(Sym(T(p)))      nnz(Sym(T(p))) − 1
α_j = q_j^T v                      I_p + n             I_p + n − 1
v = v − α_j q_j − β_{j−1} q_{j−1}  2(I_p + n)          2(I_p + n)
β_j = ∥v∥₂                         I_p + n             (I_p + n) − 1
q_{j+1} = v / β_j                  I_p + n             0

According to Table 2, the flop count of one Lanczos


iteration is equal to

itr = 2 nnz(Sym(T(p))) + 9 I_p + 9 ∏_{i=1, i≠p}^{P} I_i − 3.   (25)

Assume that k eigenvalues have converged after j Lanczos iterations; then the total flop count to obtain the rank-r truncated orthogonal basis Up is equal to k × itr. The equation k = 5 + 2.5r is introduced in Ref. [17] to estimate k. In addition, the vectors might lose their orthogonality during the Lanczos iteration, so techniques such as full reorthogonalization [21] and partial reorthogonalization [22] are used to ensure the orthogonality of the vectors; the flop count for reorthogonalization should therefore also be considered. In the general case, however, it can be concluded that the Lanczos based SVD method is more competitive than the classical SVD method, especially for large scale sparse tensor decomposition.
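Eq. (25) and the estimate k = 5 + 2.5r translate directly into a small cost model; this is an illustrative sketch and the names are ours:

```python
from math import prod, ceil

def lhosvd_flops(dims, p, nnz_sym, r):
    """Estimated flops to obtain a rank-r basis for mode p (Eq. (25), k = 5 + 2.5r).
    dims: mode sizes of the tensor; nnz_sym: nnz(Sym(T(p)))."""
    rest = prod(d for i, d in enumerate(dims) if i != p)
    itr = 2 * nnz_sym + 9 * dims[p] + 9 * rest - 3   # one Lanczos iteration
    return ceil(5 + 2.5 * r) * itr                   # k converged eigenvalues
```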
Memory Usage. The memory usage of the L-HOSVD algorithm is related to the number of nonzero elements in tensor T. Let m_nz^p denote the number of rows that contain nonzero elements in the symmetric matrix Sym(T(p)); then the total memory usage is equal to 2P × nnz(T) + Σ_{p=1}^{P} m_nz^p. Based on the Compressed Row Storage (CRS) scheme described in Section 4.3, the proposed L-HOSVD algorithm can significantly save computer memory.

5 DISTRIBUTED COMPUTING PLATFORM CONSTRUCTION AND DATA BLOCK PARTITION

This section uses the Transparent Computing paradigm to construct a distributed computing platform, in which a predictive model is employed to partition the data into blocks of appropriate size for distribution.

5.1 Distributed Computing Platform Construction via the Transparent Computing Paradigm

To build a transparent run-time environment that allows heterogeneous devices to flexibly participate in big data processing, this paper presents an efficient approach to designing a distributed computing platform via the Transparent Computing paradigm. One key idea of the Transparent Computing paradigm is the separation of hardware, software and data: a bare client can flexibly demand Operating System and supporting software resources from the centralized transparent server [23]. Fig. 12 demonstrates the loosely coupled paradigm with one single transparent client and one single transparent server. A client first accesses the OS images located at the servers and loads them to reboot the local device in Step 1, then downloads the software and computing tasks to process the large scale data in Steps 2 and 3. Finally, the processing results are submitted to the transparent server for combination and further mining.

Fig. 12. A simple distributed computing platform via the Transparent Computing paradigm.

Based on the loosely coupled paradigm shown in Fig. 12, the big data processing model via the Transparent Computing paradigm can be formalized as

Y = fs(⋃_{i=1}^{n} fc(hw, os, sf, tk, ds)),   (26)

where Y denotes the analysis results of big data, fc and fs refer to the computing functions executed in the transparent clients and transparent servers, and hw, os, sf, tk, ds represent the hardware, operating system, software, computing tasks and data stream respectively. The resources provided by the transparent servers consist of os, sf and tk, which can be flexibly chosen to boot and process the data blocks on various types of hardware devices. By decoupling the components of the transparent clients and optimally integrating them, the paradigm model illustrated in Eq. (26) can provide a basis for developing high performance distributed systems which support heterogeneous devices, enhanced data security and user transparency.

In Fig. 12, the remote booting step can be supported by hard coding a remote boot protocol into the Transparent Client's BIOS beforehand, e.g. the existing Intel Preboot eXecution Environment (PXE) [24] protocol. The client can first send an OS boot request and get an IP address from the Transparent Server, then download a universal OS loader from the server. Once the universal OS loader is up and running, the client has the ability to remotely access and install the Operating System from the Transparent Server.

5.2 Predictive Model for Data Block Partition

This paper proposes to employ the linear predictive model for data block partition and distribution. The coefficient parameters of the predictive model are estimated using training data. With this model, the distributed computing platform can deliver data blocks of appropriate size to heterogeneous devices.

Assume the distributed computing platform contains M devices that can be denoted as device = {dev1, dev2, ..., devM}. For a device devi, 1 ≤ i ≤ M, we use B_i^j to denote the size of the data block that device devi processed at time_j. Let eng_i^j, sec_i^j, pro_i^j and tra_i^j denote four variables, i.e. the energy consumption, data security, processing time and communication traffic generated by devi at time_j. Without loss of generality,


the subscript i is omitted, which implies that this model is suitable for all devices. In this paper, the block size B^{j+1} is a dependent variable that is affected by the four independent variables defined above; the linear function of this predictive model can be formulated as

B^{j+1} = ω0 + ω1 eng^j + ω2 sec^j + ω3 pro^j + ω4 tra^j,   (27)

where ω1, ω2, ω3 and ω4 refer to the weight coefficients of the four independent variables, and ω0 denotes the average effect of the other variables that are not included in this predictive model.

5.3 Quantitative Methods of Independent Variables

This section illustrates how to quantify the four independent variables in Eq. (27). When a dimensionality reduction task is scheduled to heterogeneous clients and servers, the energy consumption is determined by the CPU workload, which can be measured by the number of CPU cycles required by the task. When the task is transmitted over a wireless network or the Internet, energy is consumed by the data transmission. Let engtc, engtra and engts denote the energy consumed in the transparent clients, the communication network and the transparent servers; then the total energy consumption for executing the dimensionality reduction task is defined as eng = engtc + engtra + engts. The amounts of energy consumed in the devices and the network can be obtained using the calculation methods reported in Refs. [25, 26].

The study of data security has been published in the literature. In Ref. [27], three types of security services, i.e. confidentiality, authentication and integrity, are considered: the confidentiality protection service guards against unauthorised disclosure of information, the authentication service is the process of reliably determining the identity of a communicating party, and the integrity protection service guards against the unauthorized modification of data. In Ref. [28], the security features including data integrity, identity privacy, location privacy and authentication are investigated for formalization.

Processing time is measured in seconds and represents the computation cost of a dimensionality reduction task. The matrix-vector multiplication, matrix-matrix multiplication and n-mode product of a tensor by a matrix are the three key linear operations in the reduction tasks; all of them can be timed in seconds. Communication traffic is the number of bytes exchanged over the network; it can be quantified by counting the data passed between the autonomic devices.

5.4 Computing the Coefficient Parameters

The least square estimation method [29, 30] is employed to compute the coefficient parameters of the predictive model. We first perform the dimensionality reduction tasks on the distributed computing platform and collect a set of training data that can be defined as

χ = \begin{bmatrix} 1 & eng^1 & sec^1 & pro^1 & tra^1 \\ 1 & eng^2 & sec^2 & pro^2 & tra^2 \\ \vdots & \vdots & \vdots & \vdots & \vdots \end{bmatrix}.   (28)

Let B = [B^1, B^2, ...]^T denote the vector of the dependent variables in the training data, and let ω = [ω0, ω1, ω2, ω3, ω4]^T refer to the vector of the weight coefficients; then the regression function of the sample data can be formalized as

B = χ × ω.   (29)

If χ^T χ is nonsingular, the coefficient parameters can be obtained using the equation

ω = (χ^T χ)^{−1} χ^T B.   (30)

After calculating the solution of this equation, we obtain the predictive function and can estimate the block size that is appropriate for a device to process by applying Eq. (27). In this paper, the key operation of the proposed L-HOSVD method is the matrix-vector multiplication, which is mainly affected by the size of the matrix; task scheduling is therefore controlled by partitioning and distributing data blocks of appropriate sizes to the autonomic devices.
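With NumPy, Eqs. (28)–(30) reduce to a few lines. The numbers below are made-up placeholders purely for illustration, and np.linalg.lstsq is used as a numerically stable equivalent of the normal-equation solution in Eq. (30):

```python
import numpy as np

# Hypothetical training records chi (Eq. (28)): rows [1, eng, sec, pro, tra],
# with B holding the observed block sizes processed by one device.
chi = np.array([[1.0, 2.1, 0.8, 12.0, 5.6],
                [1.0, 1.7, 0.8,  9.5, 4.2],
                [1.0, 3.0, 0.6, 15.2, 7.9],
                [1.0, 2.4, 0.7, 11.1, 5.0],
                [1.0, 1.2, 0.9,  7.8, 3.1],
                [1.0, 2.8, 0.6, 14.0, 6.8]])
B = np.array([64.0, 48.0, 96.0, 60.0, 40.0, 88.0])

omega, *_ = np.linalg.lstsq(chi, B, rcond=None)   # Eq. (30)

def predict_block_size(eng, sec, pro, tra):
    """Eq. (27): estimated size of the next data block for this device."""
    return omega @ np.array([1.0, eng, sec, pro, tra])
```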


6 PERFORMANCE EVALUATION

This section presents the performance evaluation results of the proposed holistic approach for dimensionality reduction of big data. The Brightkite dataset¹ is used for the evaluation. This dataset includes trajectory information (collected from users' check-ins) as well as the social relationships among these users. We also fused some unstructured video data to evaluate the effect of dimensionality reduction on heterogeneous data. We have implemented a number of algorithms for big data processing, such as the chunk tensor fusion algorithm, the Lanczos based High Order Singular Value Decomposition algorithm and the linear prediction algorithm, but due to the limitation of the paper length we only present some representative experimental results.

1. http://snap.stanford.edu/data/loc-brightkite.html

6.1 Unified Tensor and Chunk Tensor

The trajectory data are fused into a four-order tensor which can be denoted as T ∈ R^{Iuser×Itime×Ilatitude×Ilongitude}. For illustration, we selected the trajectories of three users and demonstrate them in Fig. 13. The reference time is 20:56:10 on the evening of May 25th, 2009, and the time axis is measured in one-minute units. Different colors are used to distinguish users along order Iuser. The trajectories of the first, second and third user are indicated by 2100, 1210 and 2100 points respectively.

Fig. 13. Trajectories of three users in a tensor model.

To illustrate the chunk tensor model, we represented a week of trajectories of one user as an initial chunk tensor and employed high order singular value decomposition to obtain the approximate chunk tensor. Figs. 14(a) and 14(b) demonstrate the two chunks, and Fig. 14(c) shows the two chunks in an integrated space, which reveals that there is no great offset between the initial chunk tensor and the approximate chunk tensor.

Fig. 14. Initial chunk (a), approximate chunk (b), and integrated space of the two chunks (c).

6.2 Effects of Dimensionality Reduction

The unstructured video clip of MPEG-4 format and the semi-structured user data were transformed to a five-order tensor model. The Lanczos based High Order Singular Value Decomposition (L-HOSVD) algorithm was performed to extract the core tensor and the truncated unitary orthogonal bases. For better comparison, we adopted an approximation ratio ϑ (ϑ = 1 − ψ) to illustrate the effects of dimensionality reduction.

Fig. 15 shows the comparison between the dimensionality reduction ratio and the approximation ratio. From the first experiment to the twelfth experiment, the reduction ratio decreases sharply from 85% to 7%, while the approximation ratio decreases slowly from 99.2% to 86%. In the fifth experiment, 48% core data can ensure 97% approximation accuracy, where the core tensor accounted for 71.8% of the core data and the truncated orthogonal bases accounted for 28.2%. The average results of the experiments show that 38.2% core data can guarantee 94.2% approximation accuracy.

Fig. 15. Dimensionality reduction and approximate error.

6.3 Performance of the L-HOSVD Algorithm

We evaluated the performance of the Lanczos based High Order Singular Value Decomposition (L-HOSVD) algorithm with a seven-order sparse tensor containing unstructured image data and structured trajectory data. The execution time and tensor size were normalized for better comparison. Fig. 16 demonstrates the performance comparison between the L-HOSVD algorithm and the general HOSVD algorithm. A considerable increase occurred in the curve of the general HOSVD method from the normalized size 0.1%, while there was only a gradual increase for the L-HOSVD method.

Fig. 16. Performance comparison between the L-HOSVD algorithm and the general HOSVD algorithm.

In addition, we note that the Lanczos algorithm can quickly exit the iteration


as the approximate accuracy ratio rapidly meet the pre- the faster algorithms for solving symmetric tridiagonal
defined value. We carried out a great deal of iterations on eigen-problem in both sequential and parallel settings.
the symmetric matrix generated from the second-mode Drmac and Veselic[38] apply the classical Jacobi method
unfolded matrix of the seven order tensor, and depicted to design fast SVD algorithm.
the top forty eigenvalues in Fig. 17, which illustrates
the convergence property of the L-HOSVD algorithm. 7.3 Lanczos Method
Experimental results show that the symmetric matrix The Lanczos method [39] is efficient for finding several
reconstructed from 33 percent exterior eigenvalues and extreme eigenvectors and eigenvalues of a large scale
the corresponding eigenvectors can guarantee 95 percent sparse symmetric matrix. In Ref. [40] a block Lanczos
approximate accuracy. type algorithm is proposed to compute the tridiagonal
4
matrix. Parallel implementation of Lanczos algorithms
x 10
3 [41, 42] are studied to improve the efficiency, those al-
gorithms aim to effectively parallelize the matrix-vector
2 or vector-vector operations.
Many studies on tensor decomposition, singular val-
1 ue decomposition and Lanczos method have been per-
formed over the past few decades. However, all the
Eigenvalue

0 investigations mentioned above are concentrated on spe-


cial topics, no systematic research has been dedicated
−1 to dimensionality reduction of big data. The present
study was undertaken to propose a holistic approach to
−2 address the three fundamental problems closely related
to distributed dimensionality reduction of big data, i.e.
−3
0 5 10 15 20
big data fusion, dimensionality reduction algorithm and
Iteration construction of distributed computing platform.

Fig. 17. An illustration of the Lanczos iteration in the L- 8 C ONCLUSION


HOSVD algorithm.

7 RELATED WORKS

This section reviews the literature on tensor decomposition and singular value decomposition; Lanczos type methods are also briefly recalled.

7.1 Tensor Decomposition

A tensor [10, 31] is the generalization of a matrix and is usually called a multidimensional array. The tensor is an effective data representation model from which valuable information can be extracted using the High Order Singular Value Decomposition (HOSVD) [11] method. As the HOSVD method imposes an orthogonal constraint on the truncated vector bases, it is considered a special case of the commonly used TUCKER [32] decomposition. HOSVD has been adopted for data analysis and mining in many applications such as tag recommendation [33] and hand-written digit classification [34].
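
As a minimal sketch of the decomposition recalled here (illustrative only; unfold, mode_mult and hosvd are hypothetical helper names rather than an established API), a truncated HOSVD of a dense tensor can be computed by keeping the leading left singular vectors of each mode-n unfolding and projecting the tensor onto those bases to obtain the core tensor:

    import numpy as np

    def unfold(T, mode):
        # Mode-n unfolding: bring axis 'mode' to the front, flatten the rest.
        return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

    def mode_mult(T, M, mode):
        # Mode-n product T x_n M: contract M's columns with axis 'mode' of T.
        Tm = np.tensordot(M, np.moveaxis(T, mode, 0), axes=1)
        return np.moveaxis(Tm, 0, mode)

    def hosvd(T, ranks):
        # Per mode, keep the leading left singular vectors of the unfolding
        # (the orthonormal bases), then project T onto them to obtain the
        # smaller core tensor G.
        factors = [np.linalg.svd(unfold(T, n), full_matrices=False)[0][:, :r]
                   for n, r in enumerate(ranks)]
        G = T
        for n, U in enumerate(factors):
            G = mode_mult(G, U.T, n)
        return G, factors

    T = np.random.rand(8, 9, 10)            # toy three-order tensor
    G, Us = hosvd(T, ranks=(4, 4, 4))
    print(G.shape)                           # core tensor: (4, 4, 4)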

7.2 Singular Value Decomposition

Singular value decomposition transforms a matrix into the product of two singular vector matrices and a singular value matrix. The Golub-Kahan SVD [35] algorithm is a general and efficient SVD technique for the factorization of small scale matrices. The Divide and Conquer method introduced in Ref. [36] can efficiently optimize the Golub-Kahan SVD algorithm. Another divide-and-conquer algorithm, proposed in Ref. [37], can be considered one of the faster algorithms for solving the symmetric tridiagonal eigen-problem in both sequential and parallel settings. Drmac and Veselic [38] apply the classical Jacobi method to design a fast SVD algorithm.
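
For readers unfamiliar with the factorization, the following one-function sketch (assuming NumPy's dense SVD as a stand-in for the specialized algorithms cited above) shows the product form and the rank-k approximation it induces:

    import numpy as np

    def truncated_svd(A, k):
        # Full SVD A = U diag(s) V^T, then keep the k leading triples.
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    A = np.random.rand(50, 30)
    for k in (5, 10, 20):
        rel_err = np.linalg.norm(A - truncated_svd(A, k)) / np.linalg.norm(A)
        print(k, round(rel_err, 3))          # error shrinks as k grows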


7.3 Lanczos Method

The Lanczos method [39] is efficient for finding several extreme eigenvectors and eigenvalues of a large scale sparse symmetric matrix. In Ref. [40] a block Lanczos type algorithm is proposed to compute the tridiagonal matrix. Parallel implementations of Lanczos algorithms [41, 42] have been studied to improve efficiency; those algorithms aim to effectively parallelize the matrix-vector or vector-vector operations.
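
A compact sketch of the basic three-term Lanczos recurrence appears below. It is illustrative only, assuming a small dense symmetric test matrix rather than a large sparse one, and it omits the reorthogonalization and restarting refinements developed in the literature [21, 22]:

    import numpy as np

    def lanczos(S, v0, m):
        # Plain Lanczos tridiagonalization: after m steps, Q^T S Q is the
        # m x m symmetric tridiagonal matrix with diagonal alpha and
        # off-diagonal beta, whose extreme eigenvalues (Ritz values)
        # approximate those of the large symmetric matrix S.
        n = len(v0)
        Q = np.zeros((n, m))
        alpha, beta = np.zeros(m), np.zeros(m - 1)
        q, q_prev = v0 / np.linalg.norm(v0), np.zeros(n)
        for j in range(m):
            Q[:, j] = q
            w = S @ q
            alpha[j] = q @ w
            w = w - alpha[j] * q - (beta[j - 1] * q_prev if j > 0 else 0)
            if j < m - 1:
                beta[j] = np.linalg.norm(w)
                q_prev, q = q, w / beta[j]
        return alpha, beta, Q

    rng = np.random.default_rng(1)
    A = rng.standard_normal((300, 300))
    S = (A + A.T) / 2                        # symmetric test matrix
    alpha, beta, Q = lanczos(S, rng.standard_normal(300), m=40)
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    # Extreme Ritz values converge to the exterior spectrum of S.
    print(np.sort(np.linalg.eigvalsh(T))[-3:], np.sort(np.linalg.eigvalsh(S))[-3:])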

Many studies on tensor decomposition, singular value decomposition and the Lanczos method have been performed over the past few decades. However, the investigations mentioned above all concentrate on specific topics; no systematic research has been dedicated to dimensionality reduction of big data. The present study was undertaken to propose a holistic approach to address the three fundamental problems closely related to distributed dimensionality reduction of big data, i.e. big data fusion, the dimensionality reduction algorithm and the construction of a distributed computing platform.

8 CONCLUSION

This paper aims at providing a holistic approach to distributed dimensionality reduction of big data. Firstly, a chunk tensor model is proposed to fuse the heterogeneous data from multiple sources into a unified tensor model. Concepts and operations of the chunk tensor model are established in this paper. Secondly, a Lanczos-based High Order Singular Value Decomposition (L-HOSVD) algorithm is proposed to obtain the core data, which are small but contain valuable information. The storage and convergence properties of the L-HOSVD algorithm are studied. Thirdly, the transparent computing paradigm is employed to construct a distributed computing platform, and the linear predictive model is used to partition and distribute data blocks to autonomic devices. Theoretical analysis and experimental results show that the proposed holistic approach is efficient for dimensionality reduction of big data; it can serve as a basis for large scale heterogeneous data processing.

REFERENCES

[1] L. J. van der Maaten, E. O. Postma, and H. J. van den Herik, "Dimensionality Reduction: A Comparative Review," Journal of Machine Learning Research, vol. 10, no. 1-41, pp. 66–71, 2009.
[2] U. Doraszelski and K. L. Judd, "Avoiding the Curse of Dimensionality in Dynamic Stochastic Games," Quantitative Economics, vol. 3, no. 1, pp. 53–93, 2012.
[3] H. Abdi and L. J. Williams, "Principal Component Analysis," Wiley Interdisciplinary Reviews: Computational Statistics, vol. 2, no. 4, pp. 433–459, 2010.
[4] E. Henry, J. Hofrichter et al., "Singular Value Decomposition: Application to Analysis of Experimental Data," Essential Numerical Computer Methods, vol. 210, pp. 81–138, 2010.
[5] P. Comon and C. Jutten, Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press, 2010.
[6] E. E. Cureton and R. B. D'Agostino, Factor Analysis: An Applied Approach. Psychology Press, 2013.
[7] Y. Zhang and Y. Zhou, "Transparent Computing: A New Paradigm for Pervasive Computing," in Ubiquitous Intelligence and Computing. Springer, 2006, pp. 1–11.
[8] Y. Zhang, L. T. Yang, Y. Zhou, and W. Kuang, "Information Security Underlying Transparent Computing: Impacts, Visions and Challenges," Web Intelligence and Agent Systems, vol. 8, no. 2, pp. 203–217, 2010.
[9] Y. Zhang and Y. Zhou, "Separating Computation and Storage with Storage Virtualization," Computer Communications, vol. 34, no. 13, pp. 1539–1548, 2011.
[10] T. G. Kolda and B. W. Bader, "Tensor Decompositions and Applications," SIAM Review, vol. 51, no. 3, pp. 455–500, 2009.
[11] L. De Lathauwer, B. De Moor, and J. Vandewalle, "A Multilinear Singular Value Decomposition," SIAM Journal on Matrix Analysis and Applications, vol. 21, no. 4, pp. 1253–1278, 2000.
[12] M. Grüning, A. Marini, and X. Gonze, "Implementation and Testing of Lanczos-based Algorithms for Random-Phase Approximation Eigenproblems," Computational Materials Science, vol. 50, no. 7, pp. 2148–2156, 2011.
[13] S. Ragnarsson and C. F. Van Loan, "Block Tensor Unfoldings," SIAM Journal on Matrix Analysis and Applications, vol. 33, no. 1, pp. 149–169, 2012.
[14] L. Kuang, F. Hao, L. T. Yang, M. Lin, C. Luo, and G. Min, "A Tensor-Based Approach for Big Data Representation and Dimensionality Reduction," IEEE Transactions on Emerging Topics in Computing, in press, DOI 10.1109/TETC.2014.2330516.
[15] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst, Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. SIAM, 2000, vol. 11.
[16] A. L. Custódio, H. Rocha, and L. N. Vicente, "Incorporating Minimum Frobenius Norm Models in Direct Search," Computational Optimization and Applications, vol. 46, no. 2, pp. 265–278, 2010.
[17] T. Ericsson and A. Ruhe, "The Spectral Transformation Lanczos Method for the Numerical Solution of Large Sparse Generalized Symmetric Eigenvalue Problems," Mathematics of Computation, vol. 35, no. 152, pp. 1251–1268, 1980.
[18] L. De Lathauwer, B. De Moor, and J. Vandewalle, "On the Best Rank-1 and Rank-(R1, R2, ..., RN) Approximation of Higher-Order Tensors," SIAM Journal on Matrix Analysis and Applications, vol. 21, no. 4, pp. 1324–1342, 2000.
[19] G. H. Golub and C. Reinsch, "Singular Value Decomposition and Least Squares Solutions," Numerische Mathematik, vol. 14, no. 5, pp. 403–420, 1970.
[20] G. H. Golub and C. F. Van Loan, Matrix Computations. JHU Press, 2012, vol. 3.
[21] K. Wu and H. Simon, "Thick-Restart Lanczos Method for Large Symmetric Eigenvalue Problems," SIAM Journal on Matrix Analysis and Applications, vol. 22, no. 2, pp. 602–616, 2000.
[22] H. D. Simon, "The Lanczos Algorithm with Partial Reorthogonalization," Mathematics of Computation, vol. 42, no. 165, pp. 115–142, 1984.
[23] Y. Zhang and Y. Zhou, "4VP: A Novel Meta OS Approach for Streaming Programs in Ubiquitous Computing," in 21st International Conference on Advanced Information Networking and Applications (AINA'07). IEEE, 2007, pp. 394–403.
[24] M. Johnston and S. Venaas, "Dynamic Host Configuration Protocol (DHCP) Options for the Intel Preboot eXecution Environment (PXE)," RFC 4578, Tech. Rep., November 2006.
[25] K. Kumar and Y.-H. Lu, "Cloud Computing for Mobile Users: Can Offloading Computation Save Energy?" Computer, vol. 43, no. 4, pp. 51–56, 2010.
[26] W. Zhang, Y. Wen, K. Guan, D. Kilper, H. Luo, and D. Wu, "Energy-Optimal Mobile Cloud Computing under Stochastic Wireless Channel," 2013.
[27] M. Lin, L. Xu, L. T. Yang, X. Qin, N. Zheng, Z. Wu, and M. Qiu, "Static Security Optimization for Real-Time Systems," IEEE Transactions on Industrial Informatics, vol. 5, no. 1, pp. 22–37, 2009.
[28] A. N. Khan, M. Mat Kiah, S. U. Khan, and S. A. Madani, "Towards Secure Mobile Cloud Computing: A Survey," Future Generation Computer Systems, vol. 29, no. 5, pp. 1278–1299, 2013.
[29] R. C. Aster, B. Borchers, and C. H. Thurber, Parameter Estimation and Inverse Problems. Academic Press, 2013.
[30] G. Liu, M. Fan, and G. Quan, "Neighbor-Aware Dynamic Thermal Management for Multi-Core Platform," in Europe Conference on Design, Automation & Test & Exhibition (DATE'12). IEEE, 2012, pp. 187–192.
[31] C. M. Martin, "Tensor Decompositions Workshop Discussion Notes," American Institute of Mathematics, 2004.
[32] L. R. Tucker, "Some Mathematical Notes on Three-Mode Factor Analysis," Psychometrika, vol. 31, no. 3, pp. 279–311, 1966.
[33] R. Wetzker, C. Zimmermann, C. Bauckhage, and S. Albayrak, "I Tag, You Tag: Translating Tags for Advanced User Models," in Proceedings of the Third ACM International Conference on Web Search and Data Mining. ACM, 2010, pp. 71–80.
[34] B. Savas and L. Eldén, "Handwritten Digit Classification Using Higher Order Singular Value Decomposition," Pattern Recognition, vol. 40, no. 3, pp. 993–1003, 2007.


[35] G. Golub and W. Kahan, "Calculating the Singular Values and Pseudo-Inverse of a Matrix," Journal of the Society for Industrial & Applied Mathematics, Series B: Numerical Analysis, vol. 2, no. 2, pp. 205–224, 1965.
[36] M. Gu and S. C. Eisenstat, "A Divide-and-Conquer Algorithm for the Bidiagonal SVD," SIAM Journal on Matrix Analysis and Applications, vol. 16, no. 1, pp. 79–92, 1995.
[37] J. Cuppen, "A Divide and Conquer Method for the Symmetric Tridiagonal Eigenproblem," Numerische Mathematik, vol. 36, no. 2, pp. 177–195, 1980.
[38] Z. Drmac and K. Veselic, "New Fast and Accurate Jacobi SVD Algorithm. I," SIAM Journal on Matrix Analysis and Applications, vol. 29, no. 4, pp. 1322–1342, 2008.
[39] J. K. Cullum and R. A. Willoughby, Lanczos Algorithms for Large Symmetric Eigenvalue Computations: Vol. 1: Theory. SIAM, 2002, vol. 41.
[40] I. Popovyan, "Efficient Parallelization of Lanczos Type Algorithms," Tech. Rep., 2011.
[41] I. Flesch and R. Bisseling, "A New Parallel Approach to the Block Lanczos Algorithm for Finding Nullspaces over GF(2)," Master's Thesis, Department of Mathematics, Utrecht University, Utrecht, the Netherlands, 2002.
[42] P. L. Montgomery, "Distributed Linear Algebra," in Proceedings of the 4th Workshop on Elliptic Curve Cryptography, 2000.

Liwei Kuang is currently studying for the Ph.D. degree in the School of Computer Science and Technology at Huazhong University of Science and Technology, Wuhan, China. He received the master's degree from the School of Computer Science, Hubei University of Technology, Wuhan, China, in 2004. From 2004 to 2012, he was a Research Engineer with FiberHome Technologies Group, Wuhan, China. His research interests include big data, pervasive computing and cloud computing.

Yaoxue Zhang received his B.S. degree from Northwest Institute of Telecommunication Engineering, China, and his Ph.D. degree in computer networking from Tohoku University, Japan, in 1989. He then joined the Department of Computer Science, Tsinghua University, China. He was a visiting professor at the Massachusetts Institute of Technology (MIT) and the University of Aizu in 1995 and 1998, respectively. Currently, he is a fellow of the Chinese Academy of Engineering and the President of Central South University, China. His research interests include computer networking, operating systems, cloud computing, transparent computing, and active services.

Laurence T. Yang received the B.E. degree in Computer Science and Technology from Tsinghua University, China, and the Ph.D. degree in Computer Science from the University of Victoria, Canada. He is a professor in the School of Computer Science and Technology at Huazhong University of Science and Technology, China, and in the Department of Computer Science, St. Francis Xavier University, Canada. His research interests include parallel and distributed computing, embedded and ubiquitous/pervasive computing, and big data. His research has been supported by the Natural Sciences and Engineering Research Council and the Canada Foundation for Innovation.

Jinjun Chen received the Ph.D. degree in computer science and software engineering from Swinburne University of Technology, Australia. He is an associate professor in the Faculty of Engineering and Information Technology, University of Technology Sydney (UTS), Australia, and the director of the Lab of Cloud Computing and Distributed Systems at UTS. His research interests include cloud computing, big data, workflow management, privacy and security, and related research topics. His research results have been published in more than 100 papers in high-quality journals and at conferences.

Fei Hao is an assistant professor at Huazhong University of Science and Technology. He received the B.S. and M.S. degrees from the School of Mathematics and Computer Engineering, Xihua University, Chengdu, China, in 2005 and 2008, respectively. He was a research assistant at the Korea Advanced Institute of Science and Technology and the Hangul Engineering Research Center, Korea. He has published over 30 research papers in international and national journals as well as conferences. His research interests include social computing, big data analysis and processing, and mobile cloud computing.

Changqing Luo received his B.E. and M.E. degrees from Chongqing University of Posts and Telecommunications in 2004 and 2007, respectively, and the Ph.D. degree from Beijing University of Posts and Telecommunications in 2011, all in Electrical Engineering. After graduation, he joined the School of Computer Science and Technology, Huazhong University of Science and Technology in 2011, where he currently works as an Assistant Professor. His current research focuses on algorithms and optimization for wireless networks, cooperative communication, green communication, resource management in heterogeneous wireless networks, and mobile cloud computing.
