Multi-View Multiple Clustering
arXiv:1905.05053v1 [cs.LG] 13 May 2019
Shixing Yao1, Guoxian Yu1, Jun Wang1, Carlotta Domeniconi2, Xiangliang Zhang3
1 College of Computer and Information Sciences, Southwest University, Chongqing, China
2 Department of Computer Science, George Mason University, VA, USA
3 CEMSE, King Abdullah University of Science and Technology, Thuwal, SA
{ysx, gxyu, kingjun}@swu.edu.cn, [email protected], [email protected]
Abstract
Multiple clustering aims at exploring alternative clusterings to organize the data into meaningful groups from different perspectives. Existing multiple clustering algorithms are designed for single-view data. We assume that the individuality and commonality of multi-view data can be leveraged to generate high-quality and diverse clusterings. To this end, we propose a novel multi-view multiple clustering (MVMC) algorithm. MVMC first adapts multi-view self-representation learning to explore the individuality encoding matrices and the shared commonality matrix of multi-view data. It additionally reduces the redundancy (i.e., enhances the individuality) among the matrices using the Hilbert-Schmidt Independence Criterion (HSIC), and collects shared information by forcing the shared matrix to be smooth across all views. It then uses matrix factorization on the individual matrices, along with the shared matrix, to generate diverse clusterings of high quality. We further extend multiple co-clustering to multi-view data and propose a solution called multi-view multiple co-clustering (MVMCC). Our empirical study shows that MVMC (MVMCC) can exploit multi-view data to generate multiple high-quality and diverse clusterings (co-clusterings), with performance superior to the state-of-the-art methods.
1 Introduction
The goal of clustering is to partition samples into disjoint groups to facilitate the discovery of hidden patterns in the data. Traditional clustering algorithms are designed for single-view data. With the diffusion of the internet of things and of big data, samples can easily be collected from different sources, or observed from different views. For example, a video can be characterized using image signals and audio signals, and a given news story can be reported in different languages. Objects with diverse feature views are typically called multi-view data. It is recognized that integrating the information contained in multiple views can achieve consolidated data clustering [Chao et al., 2017]. Many multi-view clustering methods have been developed to extract comprehensive information from multiple feature views; examples are co-training based [Kumar and Daumé, 2011], multiple kernel learning [Gönen and Alpaydın, 2011], and subspace learning based methods [Cao et al., 2015; Luo et al., 2018]. However, the aforementioned clustering methods typically provide a single clustering, which may fail to reveal the high-quality but diverse alternative clusterings of the same data.

Figure 1: An example of multi-view multiple clustering. Two alternative clusterings (texture+shape and color+shape) can be generated using the commonality (shape) and the individuality (texture and color) information of the same multi-view objects.
For example, in Figure 1, we have a collection of objects represented by a texture view and a color view. We can also group the objects based on their shared shapes. By leveraging the commonality and the individuality of these multi-view objects, we can obtain two alternative clusterings (texture+shape and color+shape), as shown at the bottom of the figure. From this example, we can see that multi-view data include not only the commonality information for generating a high-quality clustering (as multi-view clustering does), but also the individual (or specific) information for generating diverse clusterings (as multiple clustering aims to achieve) [Bailey, 2013].
To explore different clusterings of the given data, multiple clustering has emerged as a new branch of clustering in recent years. Some approaches seek clusterings that are alternatives to those already explored, by enforcing the new ones to be different [Bae and Bailey, 2006; Davidson and Qi, 2008; Yang and Zhang, 2017]; other solutions simultaneously seek multiple clusterings by reducing their correlation [Caruana et al., 2006; Jain et al., 2008; Dang and Bailey, 2010; Wang et al., 2018], or by seeking orthogonal (or independent) subspaces and the clusterings therein [Niu et al., 2010; Ye et al., 2016; Mautz et al., 2018; Wang et al., 2019]. However, these multiple clustering methods are designed for single-view data.
Based on the example discussed in Figure 1, we leverage the individuality and the commonality of multi-view data to generate high-quality and diverse clusterings, and we propose an approach called multi-view multiple clustering (MVMC) to achieve this goal. To the best of our knowledge, MVMC is the first attempt to encompass both multiple clustering and multi-view clustering, where the former focuses on generating diverse clusterings from a single data view, and the latter focuses on a single consensus clustering that summarizes the information from different views. MVMC first extends multi-view self-representation learning [Luo et al., 2018] to explore the individuality information encoding matrices and the commonality information matrix shared across views. To obtain more credible commonality information from the multiple views, we force the commonality information matrix to be smooth across all views. In addition, we use the Hilbert-Schmidt Independence Criterion (HSIC) [Gretton et al., 2005] to enhance the individuality between the matrices, and consequently increase the diversity between the clusterings. We then use matrix factorization to jointly factorize each individuality matrix (for diversity) and the commonality matrix (for quality) into a clustering indicator matrix and a basis matrix. To simultaneously seek the individual and common data matrices, and the diverse clusterings therein, we use an alternating optimization technique to solve the unified objective. In addition, we extend multiple co-clustering [Tokuda et al., 2017; Wang et al., 2018] to the multi-view scenario, and term the extended approach multi-view multiple co-clustering (MVMCC).
The main contributions of our work are summarized as follows:
• We study how to generate multiple clusterings from multi-view data, an interesting and challenging problem that has been largely overlooked. The problem we address is different from existing multi-view clustering, which generates a single clustering by leveraging multiple views, and also different from multiple clustering, which produces alternative clusterings from single-view data.
• We introduce a unified objective function to simultaneously seek the multiple individuality information encoding matrices and the commonality information encoding matrix. This unified function leverages the individuality to generate diverse clusterings, and the commonality to boost the quality of the generated clusterings. We further adopt an alternating optimization technique to solve the unified objective.
• Extensive experimental results show that MVMC (MVMCC) performs considerably better than existing multiple clustering (co-clustering) algorithms [Cui et al., 2007; Jain et al., 2008; Niu et al., 2010; Ye et al., 2016; Yang and Zhang, 2017; Tokuda et al., 2017; Wang et al., 2018; Wang et al., 2019] in exploring multiple clusterings and co-clusterings.
2 The Proposed Methods

2.1 Multi-View Multiple Clustering
Suppose X^v ∈ R^{d_v×n} denotes the feature data matrix of the v-th view, v ∈ {1, 2, ..., m}, for n objects in the d_v-dimensional space. We aim to generate h (provided by the user) different clusterings from {X^v}_{v=1}^m using the shared and individual information embedded in the data matrices. Most multi-view clustering approaches in essence focus on the shared and complementary information of the multiple data views to generate a consolidated clustering [Chao et al., 2017]. By viewing each subspace as a feature view, an intuitive solution to explore multiple clusterings on multi-view data is to first concatenate the different feature views, and then apply subspace-based multiple clustering methods to the concatenated feature vectors [Niu et al., 2010; Ye et al., 2016; Wang et al., 2019].
To find high-quality and diverse multiple clusterings, we should make concrete use of the individuality and commonality of multi-view data. The individuality helps to explore diverse clusterings, while the commonality coordinates the diverse clusterings to capture the common knowledge of the multi-view data. To explore the individuality and commonality of multi-view data, we extend multi-view self-representation learning [Cao et al., 2015; Luo et al., 2018] as follows:
\[ \mathcal{J}_D(\{D^k\}_{k=1}^h, U) = \sum_{k=1}^{h} \frac{1}{m} \sum_{v=1}^{m} \big\lVert X^v - X^v (U + D^k) \big\rVert_F^2 + \lambda_1 \Phi_1(\{D^k\}_{k=1}^h) + \lambda_2 \Phi_2(U) \tag{1} \]

where U ∈ R^{n×n} is specified to encode the commonality of the data matrices {X^v}_{v=1}^m, and D^k ∈ R^{n×n} is used to encode the individuality of the k-th (k ∈ {1, 2, ..., h}) group of views. Φ_1({D^k}_{k=1}^h) and Φ_2(U) (defined later) are two constraints used to enhance the individuality and the commonality, respectively. The multi-view self-representation learning in [Cao et al., 2015; Luo et al., 2018] requires h = m; in contrast, Eq. (1) does not have this requirement. As a result, the group-wise individuality and diversity are jointly considered, and the number of alternative clusterings can be adjusted by the user. The assumption behind the linear representation in Eq. (1) is that a data sample can be expressed as a linear combination of the other samples in the subspace. This assumption is widely used in sparse representation [Wright et al., 2010] and low-rank representation learning [Liu et al., 2013].
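As a purely illustrative reading of the data-fitting term in Eq. (1), the sketch below evaluates the average per-view self-representation residual for given U and {D^k}; the sizes and matrices are toy assumptions, not the authors' released code.

```python
import numpy as np

def selfrep_residual(Xs, U, Ds):
    """Data-fitting term of Eq. (1): sum over the h groups of the average
    per-view residual ||X^v - X^v (U + D^k)||_F^2 / m."""
    m = len(Xs)
    total = 0.0
    for Dk in Ds:                      # one individuality matrix per alternative clustering
        for Xv in Xs:                  # every view is reconstructed through U + D^k
            R = Xv - Xv @ (U + Dk)
            total += np.linalg.norm(R, 'fro') ** 2 / m
    return total

# toy example: three views of 50 samples, h = 2 alternative clusterings
rng = np.random.default_rng(0)
n = 50
Xs = [rng.standard_normal((d, n)) for d in (8, 12, 5)]
U, Ds = np.zeros((n, n)), [np.eye(n), np.eye(n)]   # U + D^k = I reconstructs each view exactly
print(selfrep_residual(Xs, U, Ds))                 # ~0 for this trivial choice
```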
[Luo et al., 2018] recently combined U and {D^k}_{k=1}^m into an integrated co-association matrix of samples, and then applied spectral clustering to seek a consistent clustering. Their empirical study shows that the individual information encoded by D^k helps to produce a robust clustering. However, since X^v and X^{v'} (v' ≠ v) describe the same objects using different types of features, the matrix D^k resulting from Eq. (1) may still have a large information overlap with D^{k'}. As a result, the expected individuality of D^k and D^{k'} cannot be guaranteed. This overlap of information is not necessarily an issue for multi-view clustering, which aims at finding a single clustering, but it is for our problem, where multiple clusterings of both high quality and high diversity are expected.
To enhance the diversity between the individuality encoding (representation) matrices {D^k}_{k=1}^h, we approximately quantify the diversity based on the dependency between these matrices: the smaller the dependency between the matrices, the larger their diversity, since the matrices are less correlated. Various measures can be used to evaluate the dependence between variables. Here, we adopt the Hilbert-Schmidt Independence Criterion (HSIC) [Gretton et al., 2005], for its simplicity, solid theoretical foundation, and capability of measuring both linear and nonlinear dependence between variables. HSIC computes the squared norm of the cross-covariance operator over D^k and D^{k'} in Hilbert space to estimate the dependency. The empirical HSIC does not have to explicitly compute the joint distribution of D^k and D^{k'}; it is given by:

\[ \mathrm{HSIC}(D^k, D^{k'}) = (n-1)^{-2}\, \mathrm{tr}(K^k H K^{k'} H) \tag{2} \]

where K^k, K^{k'}, H ∈ R^{n×n}; K^k and K^{k'} measure the kernel-induced similarity between the vectors of D^k and D^{k'}, respectively, and H_{ij} = δ_{ij} − 1/n, with δ_{ij} = 1 if i = j and δ_{ij} = 0 otherwise. In this paper, we adopt the inner product kernel and specify K^k = (D^k)^T D^k, ∀k ∈ {1, 2, ..., h}. We then minimize the overall HSIC over the h individuality matrices to reduce the redundancy between them, and specify Φ_1({D^k}_{k=1}^h) as follows:
\[ \Phi_1(\{D^k\}_{k=1}^h) = \sum_{k=1}^{h} \sum_{k' \neq k} \mathrm{HSIC}(D^k, D^{k'}) = \sum_{k=1}^{h} \sum_{k' \neq k} (n-1)^{-2}\, \mathrm{tr}(D^k H K^{k'} H (D^k)^T) = \sum_{k=1}^{h} \mathrm{tr}(D^k \tilde{K}^k (D^k)^T) \tag{3} \]

where K̃^k = (n−1)^{-2} Σ_{k'≠k} H K^{k'} H.
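For illustration, the following sketch (our own, under the inner-product-kernel choice stated above) computes the empirical HSIC of Eq. (2) between two individuality matrices; summing it over all pairs k ≠ k' yields Φ_1 in Eq. (3).

```python
import numpy as np

def hsic(Dk, Dl):
    """Empirical HSIC of Eq. (2) with the inner-product kernel K^k = (D^k)^T D^k."""
    n = Dk.shape[1]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix, H_ij = delta_ij - 1/n
    Kk, Kl = Dk.T @ Dk, Dl.T @ Dl
    return np.trace(Kk @ H @ Kl @ H) / (n - 1) ** 2

def phi1(Ds):
    """Phi_1 of Eq. (3): total pairwise HSIC over the individuality matrices."""
    return sum(hsic(Ds[k], Ds[l])
               for k in range(len(Ds)) for l in range(len(Ds)) if l != k)

rng = np.random.default_rng(0)
D1, D2 = rng.standard_normal((30, 30)), rng.standard_normal((30, 30))
print(hsic(D1, D2), phi1([D1, D2]))   # lower values indicate less dependency (more diversity)
```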
Inspired by subspace-based multi-view learning [Gao et al., 2015; Chao et al., 2017] and manifold regularization [Belkin et al., 2006], we specify Φ_2(U) in Eq. (1) to collect more shared information from the multiple data views as follows:

\[ \Phi_2(U) = \sum_{v=1}^{m} \sum_{i,j=1}^{n} \lVert u_i - u_j \rVert_2^2 \, W_{ij}^v = \mathrm{tr}(U \tilde{L} U^T) \tag{4} \]

where W_{ij}^v is the feature similarity between x_i^v and x_j^v. To compute W^v, we simply adopt a 5-nearest-neighbor graph and use the Gaussian heat kernel (with the kernel width set to the standard deviation of the distances between samples) to quantify the similarity between neighboring samples. L̃ = Σ_{v=1}^{m} (Λ^v − W^v), where Λ^v is a diagonal matrix with Λ_{ii}^v = Σ_{j=1}^{n} W_{ij}^v. Minimizing Φ_2(U) guides U to encode the consistent and complementary information shared across views. In this way, the quality of the diverse clusterings can be boosted using the enhanced shared information.
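A minimal sketch of one plausible construction of Φ_2(U): build a Gaussian-weighted 5-nearest-neighbor graph per view (kernel width set to the standard deviation of the pairwise distances, as described above), accumulate the graph Laplacians, and evaluate tr(U L̃ U^T). The symmetrization of the kNN graph is our assumption; the paper does not spell out this detail.

```python
import numpy as np

def view_graph(Xv, n_neighbors=5):
    """W^v for one view (d_v x n): Gaussian heat-kernel weights on a 5-NN graph."""
    n = Xv.shape[1]
    pts = Xv.T                                   # n x d_v, one row per sample
    dist = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1))
    sigma = dist[np.triu_indices(n, 1)].std()    # kernel width = std of pairwise distances
    W = np.zeros((n, n))
    for i in range(n):
        nn = np.argsort(dist[i])[1:n_neighbors + 1]          # nearest neighbors, skipping self
        W[i, nn] = np.exp(-dist[i, nn] ** 2 / (2 * sigma ** 2))
    return np.maximum(W, W.T)                    # symmetrize the kNN graph (our assumption)

def phi2(U, Xs):
    """Phi_2(U) = tr(U L~ U^T), with L~ = sum_v (Lambda^v - W^v)."""
    n = U.shape[0]
    L = np.zeros((n, n))
    for Xv in Xs:
        W = view_graph(Xv)
        L += np.diag(W.sum(axis=1)) - W
    return np.trace(U @ L @ U.T)

rng = np.random.default_rng(0)
Xs = [rng.standard_normal((d, 30)) for d in (6, 10)]
print(phi2(np.eye(30), Xs))
```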
Given the equivalence between matrix factorization based clustering and spectral clustering (or k-means clustering), we adopt the widely used semi-nonnegative matrix factorization [Ding et al., 2010] to explore the k-th clustering on U + D^k as follows:

\[ U + D^k = B^k (R^k)^T \tag{5} \]

where R^k ∈ R^{n×r_k} and B^k ∈ R^{n×r_k} (r_k is the number of sample clusters) are the clustering indicator matrix and the basis matrix, respectively. Here, the k-th clustering is generated not only with respect to D^k, but also with respect to U, which encodes the shared information of the multi-view data. As such, the explored k-th clustering (encoded by R^k) not only reflects the individuality of the views in the k-th group, but also captures the commonality of all the data views. As a consequence, a high-quality and yet diverse clustering can be generated.

The above process first explores the individual information matrices and the shared information matrix, and then generates diverse clusterings on these matrices. A sub-optimal solution may be obtained as a result, because the two steps are performed separately. To avoid this, we advocate simultaneously optimizing {D^k}_{k=1}^h and the diverse clusterings {R^k}_{k=1}^h therein, and formulate a unified objective function for MVMC as follows:

\[ \mathcal{J}_{MC}(\{D^k\}_{k=1}^h, \{R^k\}_{k=1}^h, U) = \frac{1}{h} \sum_{k=1}^{h} \big\lVert (U + D^k) - B^k (R^k)^T \big\rVert_F^2 + \lambda_1 \sum_{k=1}^{h} \mathrm{tr}(D^k \tilde{K}^k (D^k)^T) + \lambda_2\, \mathrm{tr}(U \tilde{L} U^T) \tag{6} \]
\[ \text{s.t. } X^v = X^v (U + D^k), \; v \in \{1, 2, \cdots, m\} \]
By solving Eq. (6), we can simultaneously obtain multiple diverse clusterings of quality by leveraging the commonality and individuality information of the multiple views. Our experiments confirm that MVMC can generate multiple clusterings with enhanced diversity and quality. In addition, it outperforms the counterpart algorithms [Cui et al., 2007; Jain et al., 2008; Niu et al., 2010; Ye et al., 2016; Wang et al., 2019], which concatenate the multiple data views into a composite view and then explore multiple clusterings in the subspaces of the composite view.
The binary matrices {R^k}_{k=1}^h are hard to optimize directly. As such, we relax the entries of {R^k}_{k=1}^h to nonnegative numeric values. Since Eq. (6) is not jointly convex in {D^k}_{k=1}^h, U, and {R^k}_{k=1}^h, it is unrealistic to find the globally optimal values of all the variables. Here, we solve Eq. (6) via alternating optimization, which optimizes one variable while fixing the others. The detailed optimization process can be found in the supplementary file due to the limitation of space.
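The alternating optimization cycles over {D^k}, U, and the factor matrices {B^k, R^k}; the updates for D^k and U are only given in the authors' supplementary file, so they are not reproduced here. The factorization step for a fixed M = U + D^k, however, is standard semi-NMF, and the sketch below (an illustration, not the released implementation) applies the multiplicative updates of [Ding et al., 2010] and reads the k-th clustering off R^k.

```python
import numpy as np

def seminmf(M, r, iters=200, eps=1e-10):
    """Semi-NMF M ~ B R^T with R >= 0 (multiplicative updates of Ding et al., 2010).
    In MVMC, M would be U + D^k with all other variables fixed."""
    n = M.shape[1]
    rng = np.random.default_rng(0)
    R = np.abs(rng.standard_normal((n, r)))                  # nonnegative indicator matrix
    pos = lambda A: (np.abs(A) + A) / 2                      # elementwise positive part
    neg = lambda A: (np.abs(A) - A) / 2                      # elementwise negative part
    for _ in range(iters):
        B = M @ R @ np.linalg.pinv(R.T @ R)                  # closed-form basis update
        MtB, BtB = M.T @ B, B.T @ B
        R *= np.sqrt((pos(MtB) + R @ neg(BtB)) /
                     (neg(MtB) + R @ pos(BtB) + eps))        # keeps R nonnegative
    return B, R, R.argmax(axis=1)                            # cluster label of each sample

# toy usage: factorize a synthetic U + D^k into a 3-cluster indicator
rng = np.random.default_rng(1)
M = rng.standard_normal((40, 40))
B, R, labels = seminmf(M, r=3)
print(np.linalg.norm(M - B @ R.T, 'fro'))
```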
2.2 Multi-View Multiple Co-Clusterings
Multiple co-clustering algorithms have also recently been proposed to explore alternative co-clusterings of the same data [Tokuda et al., 2017; Wang et al., 2018]. Multiple co-clustering methods aim at exploring multiple two-way clusterings, where both samples and features are clustered. In contrast, multiple clustering techniques only explore diverse one-way clusterings, where only samples (or only features) are clustered. Based on the merits of matrix tri-factorization in exploring co-clusters [Wang et al., 2011; Wang et al., 2018], we can seek multiple co-clusterings on the multiple views by optimizing the following objective function:
\[ \mathcal{J}_{MCC}(\{D^v\}_{v=1}^m, \{R^v\}_{v=1}^m, \{C^v\}_{v=1}^m, U) = \frac{1}{m} \sum_{v=1}^{m} \big\lVert X^v (U + D^v) - C^v S^v (R^v)^T \big\rVert_F^2 + \lambda_1 \sum_{v=1}^{m} \mathrm{tr}(D^v \tilde{K}^v (D^v)^T) + \lambda_2\, \mathrm{tr}(U \tilde{L} U^T) \tag{7} \]
\[ \text{s.t. } X^v = X^v (U + D^v), \; C^v \ge 0, \; R^v \ge 0 \]

where C^v ∈ R^{d_v×c_v} and R^v ∈ R^{n×r_v} correspond to the row-cluster (grouping features) and column-cluster (grouping samples) indicator matrices of the v-th co-clustering, and S^v ∈ R^{c_v×r_v} plays the role of absorbing the different scaling factors of R^v and C^v to minimize the squared error. Here we fix h = m for MVMCC, since different feature views have different numbers of features. Eq. (7) can be optimized following a procedure similar to the one used for Eq. (6), which is provided in the supplementary file due to the page limit.
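To make the roles of the factor matrices in Eq. (7) concrete, the following illustrative sketch (hypothetical inputs; not the authors' solver) reads the v-th co-clustering off C^v and R^v and evaluates the tri-factorization reconstruction error.

```python
import numpy as np

def read_coclustering(Xv, U, Dv, Cv, Sv, Rv):
    """Illustrative only: read the v-th co-clustering off the Eq. (7) factors.
    Cv (d_v x c_v) groups features (rows); Rv (n x r_v) groups samples (columns)."""
    recon_err = np.linalg.norm(Xv @ (U + Dv) - Cv @ Sv @ Rv.T, 'fro') ** 2
    feature_clusters = Cv.argmax(axis=1)   # row-cluster label of each feature
    sample_clusters = Rv.argmax(axis=1)    # column-cluster label of each sample
    return recon_err, feature_clusters, sample_clusters
```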
3 Experimental Results and Analysis

3.1 Experimental Setup

In this section, we evaluate the proposed MVMC and MVMCC on five widely-used multi-view datasets [Li et al., 2015; Tao et al., 2018], which are described in Table 1. The datasets have different numbers of views and come from different domains. Caltech-7^1 and Caltech-20 [Li et al., 2015] are two subsets of Caltech-101, which contain only 7 and 20 classes, respectively. These subsets were created because the number of samples per class in Caltech-101 is unbalanced. Each sample is made of 6 views of the same image. Mul-fea digits^2 is comprised of 2,000 data points from the digit classes 0 to 9, with 200 data points per class. Six sets of public features are available: 76 Fourier coefficients of the character shapes, 216 profile correlations, 64 Karhunen-Loève coefficients, 240 pixel averages in 2 × 3 windows, 47 Zernike moments, and 6 morphological features. Wiki article^3 contains selected sections from Wikipedia's featured articles collection; we considered only the 10 most populated categories. It contains two views: text and image. Corel^4 [Tao et al., 2018] consists of 5,000 images from 50 different categories, with 100 images per category. The features are a color histogram (9), an edge direction histogram (18), and WT (9). Mirflickr^5 contains 25,000 instances collected from Flickr. Each instance consists of an image and its associated textual tags. To avoid noise, we remove textual tags that appear fewer than 20 times in the dataset, and then delete instances without textual tags or semantic labels. This process gives us 16,738 instances.

^1 https://github.com/yeqinglee/mvdata
^2 https://archive.ics.uci.edu/ml/datasets/Multiple+Features
^3 http://www.svcl.ucsd.edu/projects/crossmodal/
^4 http://www.cais.ntu.edu.sg/~chhoi/SVMBMAL/
^5 http://press.liacs.nl/mirflickr/mirdownload.html

Table 1: Characteristics of multi-view datasets. n is the number of samples, d_v is the dimensionality of samples, 'classes' is the number of ground-truth clusters, and m is the number of views.

Datasets         n      d_v                        classes  m
Caltech-7        1474   [40,48,254,1984,512,928]   7        6
Caltech-20       2386   [40,48,254,1984,512,928]   20       6
Mul-fea digits   2000   [76,216,64,240,47,6]       10       6
Wiki article     2866   [128,10]                   10       2
Corel            5000   [9,18,9]                   50       3
Mirflickr        16738  [150,500]                  24       2

Evaluating multiple clusterings requires quantifying both the quality and the diversity of the alternative clusterings. To measure quality, we use the widely-adopted Silhouette Coefficient (SC) and Dunn Index (DI) as internal indices; large values of SC and DI indicate a high-quality clustering. To quantify the redundancy between alternative clusterings, we use Normalized Mutual Information (NMI) and the Jaccard Coefficient (JC) as external indices; smaller values of NMI and JC indicate less redundancy between alternative clusterings. All these metrics have been used in the multiple clustering literature [Bailey, 2013]. The formal definitions of these metrics, omitted here to save space, can be found in [Bailey, 2013; Yang and Zhang, 2017].
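As a reference for reproducing the diversity measurements, the sketch below computes NMI with scikit-learn and a pair-counting Jaccard coefficient between two clusterings; this is the standard pair-counting variant, which we assume matches the definitions referenced in [Bailey, 2013] (scikit-learn's silhouette_score can likewise serve for SC).

```python
import numpy as np
from itertools import combinations
from sklearn.metrics import normalized_mutual_info_score

def jaccard_coefficient(c1, c2):
    """Pair-counting Jaccard coefficient between two clusterings (label vectors)."""
    n11 = n10 = n01 = 0
    for i, j in combinations(range(len(c1)), 2):
        same1, same2 = c1[i] == c1[j], c2[i] == c2[j]
        n11 += same1 and same2          # pair grouped together in both clusterings
        n10 += same1 and not same2      # together only in the first clustering
        n01 += same2 and not same1      # together only in the second clustering
    return n11 / max(n11 + n10 + n01, 1)

# diversity between two alternative clusterings: lower NMI / JC = less redundancy
c1 = np.array([0, 0, 1, 1, 2, 2]); c2 = np.array([0, 1, 0, 1, 0, 1])
print(normalized_mutual_info_score(c1, c2), jaccard_coefficient(c1, c2))
```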
3.2 Discovering multiple one-way clusterings and multiple co-clusterings
We compare the one-way multiple clusterings found by MVMC against Dec-kmeans [Jain et al., 2008], MNMF [Yang and Zhang, 2017], OSC [Cui et al., 2007], mSC [Niu et al., 2010], ISAAC [Ye et al., 2016], and MISC [Wang et al., 2019]. We also compare the multiple co-clusterings found by MVMCC against MultiCC [Wang et al., 2018] and MCC-NBMM [Tokuda et al., 2017]. The input parameters of the competing methods are set as the authors suggested in their papers or shared code. The parameter values of MVMC and MVMCC are λ1 = 10 and λ2 = 100, with h = 2 for multiple one-way clusterings and h = m for multiple co-clusterings. Since none of the existing multiple clustering algorithms can work on multi-view data, we concatenate the feature vectors of the multiple views and then run these competing methods on the concatenated vectors to seek alternative clusterings. Our MVMC and MVMCC run directly on the multi-view data, without such feature concatenation.

We use k-means to generate the reference clustering for MNMF, and then use the respective solutions of the methods to generate two alternative clusterings (C1, C2). We downloaded the source code of MNMF, ISAAC, MultiCC, MISC, and MCC-NBMM, and implemented the other methods (Dec-kmeans, mSC, and OSC) following the respective original papers. Following the experimental protocol adopted by these methods, we quantify the average clustering quality of C1 and C2, and measure the diversity between C1 and C2. We fix the number of row clusters r_k of each clustering to the respective number of classes of each dataset, as listed in Table 1. For co-clustering, we adopt a widely used technique [Monti et al., 2003] to determine the number of column clusters c_k. Detailed parameter values can be found in the supplementary file.
Table 2 reports the average results (over ten independent runs) and standard deviations of the competing methods on exploring two alternative one-way clusterings with h = 2.
Table 2: Quality and Diversity of the various competing methods on finding multiple clusterings. ↑ (↓) indicates the direction of preferred values for the corresponding measure. •/◦ indicates whether MVMC is statistically (according to a pairwise t-test at the 95% significance level) superior/inferior to the other method.

Dataset       Metric  Dec-kmeans      ISAAC           MISC            MNMF            mSC             OSC             MVMC
Caltech-7     SC↑     0.049±0.002•    0.235±0.011◦    0.201±0.002◦    0.234±0.001◦    0.163±0.008◦    0.261±0.004◦    0.140±0.002
Caltech-7     DI↑     0.044±0.000•    0.034±0.001•    0.048±0.000•    0.034±0.000•    0.056±0.000•    0.066±0.000     0.062±0.000
Caltech-7     NMI↓    0.024±0.000•    0.485±0.023•    0.513±0.016•    0.022±0.000•    0.152±0.002•    0.693±0.015•    0.006±0.000
Caltech-7     JC↓     0.126±0.001•    0.363±0.008•    0.349±0.001•    0.094±0.000•    0.136±0.001•    0.522±0.046•    0.076±0.000
Caltech-20    SC↑     -0.124±0.001•   0.085±0.000◦    0.036±0.000◦    -0.169±0.000•   -0.172±0.006•   0.196±0.001◦    0.004±0.000
Caltech-20    DI↑     0.026±0.000•    0.035±0.000•    0.033±0.000•    0.009±0.000•    0.028±0.000•    0.056±0.000•    0.183±0.000
Caltech-20    NMI↓    0.056±0.000•    0.475±0.011•    0.489±0.013•    0.052±0.002•    0.240±0.003•    0.715±0.025•    0.027±0.000
Caltech-20    JC↓     0.050±0.001•    0.222±0.002•    0.198±0.002•    0.033±0.001•    0.074±0.001•    0.444±0.004•    0.023±0.000
Corel         SC↑     0.112±0.014◦    -0.052±0.000•   -0.070±0.000•   -0.277±0.000•   -0.128±0.000•   0.238±0.002◦    -0.016±0.000
Corel         DI↑     0.031±0.001•    0.032±0.000•    0.020±0.000•    0.015±0.000•    0.019±0.000•    0.032±0.000•    0.354±0.000
Corel         NMI↓    0.643±0.035•    0.204±0.002•    0.209±0.002•    0.092±0.001•    0.394±0.006•    0.762±0.018•    0.070±0.000
Corel         JC↓     0.219±0.004•    0.031±0.001•    0.029±0.001•    0.013±0.000     0.072±0.001•    0.410±0.013•    0.010±0.000
Digits        SC↑     -0.133±0.022•   -0.001±0.000•   0.061±0.000     -0.076±0.000•   -0.117±0.000•   0.471±0.013◦    0.064±0.000
Digits        DI↑     0.016±0.000•    0.016±0.000•    0.018±0.001•    0.008±0.000•    0.016±0.001•    0.062±0.000•    0.087±0.000
Digits        NMI↓    0.078±0.000•    0.364±0.012•    0.399±0.008•    0.011±0.000     0.515±0.022•    0.822±0.028•    0.008±0.000
Digits        JC↓     0.076±0.000•    0.279±0.003•    0.298±0.000•    0.053±0.000     0.279±0.004•    0.656±0.015•    0.052±0.000
Wiki article  SC↑     0.447±0.016◦    -0.024±0.000•   -0.031±0.000•   -0.029±0.000•   0.108±0.002◦    0.418±0.012◦    0.066±0.000
Wiki article  DI↑     0.124±0.000     0.085±0.000•    0.085±0.001•    0.083±0.000•    0.095±0.001•    0.135±0.001◦    0.122±0.000
Wiki article  NMI↓    0.803±0.019•    0.042±0.000•    0.041±0.000•    0.006±0.000     0.212±0.001•    0.783±0.052•    0.006±0.000
Wiki article  JC↓     0.593±0.006•    0.078±0.001•    0.078±0.000•    0.056±0.001     0.113±0.002•    0.535±0.014•    0.052±0.000
Mirflickr     SC↑     -0.004±0.000◦   -0.092±0.000•   -0.028±0.000◦   -0.058±0.000•   -0.093±0.000•   0.017±0.000◦    -0.038±0.000
Mirflickr     DI↑     0.061±0.002•    0.062±0.005•    0.071±0.000•    0.053±0.001•    0.064±0.001•    0.059±0.002•    0.173±0.005
Mirflickr     NMI↓    0.427±0.012•    0.016±0.000•    0.021±0.000•    0.014±0.000•    0.216±0.006•    0.575±0.011•    0.005±0.000
Mirflickr     JC↓     0.878±0.022•    0.047±0.000•    0.037±0.000•    0.023±0.000     0.073±0.000•    0.368±0.011•    0.022±0.000
Table 3: Quality and Diversity of the various competing methods on finding multiple co-clusterings. •/◦ indicates whether MVMCC is statistically (according to a pairwise t-test at the 95% significance level) superior/inferior to the other method.

Dataset       Metric  MCC-NBMM         MultiCC          MVMCC
Caltech-7     SC↑     -0.100±0.002•    -0.103±0.006•    0.198±0.004
Caltech-7     DI↑     0.034±0.000•     0.011±0.000•     0.047±0.000
Caltech-7     NMI↓    0.376±0.014•     0.005±0.000      0.005±0.000
Caltech-7     JC↓     0.185±0.003•     0.087±0.000      0.083±0.000
Caltech-20    SC↑     -0.134±0.000•    -0.229±0.012•    0.080±0.000
Caltech-20    DI↑     0.026±0.000•     0.011±0.000•     0.156±0.008
Caltech-20    NMI↓    0.325±0.010•     0.021±0.000      0.026±0.000
Caltech-20    JC↓     0.150±0.002•     0.056±0.000•     0.029±0.000
Corel         SC↑     -0.087±0.002•    -0.172±0.012•    -0.017±0.000
Corel         DI↑     0.024±0.000•     0.015±0.000•     0.152±0.002
Corel         NMI↓    0.377±0.013•     0.164±0.002•     0.070±0.000
Corel         JC↓     0.176±0.004•     0.044±0.000•     0.010±0.000
Digits        SC↑     -0.243±0.024•    -0.214±0.013•    0.144±0.002
Digits        DI↑     0.014±0.000      0.003±0.000•     0.018±0.000
Digits        NMI↓    0.286±0.006•     0.010±0.000◦     0.207±0.003
Digits        JC↓     0.166±0.001•     0.060±0.000◦     0.115±0.000
Wiki article  SC↑     -0.0694±0.000•   -0.058±0.000•    0.064±0.000
Wiki article  DI↑     0.079±0.000      0.041±0.001•     0.078±0.000
Wiki article  NMI↓    0.287±0.005•     0.007±0.000      0.006±0.000
Wiki article  JC↓     0.127±0.002•     0.054±0.000      0.052±0.000
Mirflickr     SC↑     -0.095±0.000•    -0.194±0.002•    0.064±0.000
Mirflickr     DI↑     0.052±0.000•     0.065±0.000•     0.151±0.003
Mirflickr     NMI↓    0.017±0.000•     0.081±0.000•     0.005±0.000
Mirflickr     JC↓     0.052±0.000•     0.052±0.000•     0.022±0.000
From Table 2, we can see that MVMC often performs better than the competing methods across the different multi-view datasets, which demonstrates the effectiveness of MVMC in exploring alternative clusterings on multi-view data. MVMC always has the best results on the diversity metrics (NMI and JC), which suggests that it can find two alternative clusterings with high diversity. MVMC occasionally has a lower value on the quality metrics (SC and DI) than some of the competing methods. That is explainable: it is a widely-recognized dilemma to obtain alternative clusterings with both high diversity and high quality, and MVMC achieves a much larger diversity than the competing methods. Although the competing methods employ different techniques to explore alternative clusterings in subspaces or by reducing the redundancy between the clusterings, they almost always lose to MVMC.
Table 4: Comparison results with/without the shared information matrix U for discovering multiple clusterings.

        Digits                  Wiki article
        MVMC(nU)    MVMC        MVMC(nU)    MVMC
SC↑     0.061       0.064       0.059       0.066
DI↑     0.076       0.087       0.088       0.122
NMI↓    0.008       0.008       0.005       0.006
JC↓     0.050       0.052       0.051       0.052
The cause is that the long concatenated feature vectors override the intrinsic structures of the different views. This also explains why the competing methods perform worse on the diversity metrics (NMI and JC). In practice, because of the long concatenated feature vectors, the competing methods generally suffer from long runtimes and cannot be applied to multi-view datasets with high-dimensional feature views. In contrast, our MVMC is rather efficient: it does not need to concatenate features and is directly applicable to each view.
Table 3 reports the results of MVMCC, MultiCC, and MCC-NBMM in exploring multiple co-clusterings, whose number is equal to the number of views, h = m (unlike h = 2 for Table 2), because of the heterogeneity of the feature views. For this evaluation, we report the average quality and diversity values over all pairs of the h alternative co-clusterings. We can see that MVMCC significantly outperforms these two state-of-the-art multiple co-clustering methods across the different evaluation metrics and datasets. MultiCC sometimes obtains a better diversity than MVMCC; that is because it directly optimizes the diversity on the sample-cluster and feature-cluster matrices, while our MVMCC optimizes the diversity indirectly, and mainly through the sample-cluster matrices. These results prove the effectiveness of our solution in exploring multiple co-clusterings on multi-view data.
Following the experimental setup of Table 2, we conduct additional experiments to investigate the contribution of the shared matrix U to the quality of multiple clusterings. For this investigation, we introduce a variant of MVMC, MVMC(nU), which only uses {D^k}_{k=1}^h to generate multiple clusterings and disregards the shared information matrix U. From the results on the Digits and Wiki article datasets reported in Table 4, we find that the diversity (NMI and JC) of the two alternative clusterings stays almost the same when the shared information matrix U is excluded, whereas the quality (SC and DI) is clearly reduced. This contrast proves that U indeed improves the quality of multiple clusterings, and justifies our motivation to seek U across views and leverage it together with the individuality information matrices for multiple clusterings.

In summary, we can conclude that the individuality information matrices help to generate diverse clusterings, and the commonality information matrix extracted from the multi-view data improves the quality of these clusterings. These experimental results also confirm our assumption that the individuality and commonality of multi-view data can be leveraged to generate diverse clusterings with high quality.
3.3 Parameter analysis
λ1 and λ2 are two important input parameters of MVMC (MVMCC) for seeking the individuality and commonality information of multi-view data, and they consequently affect the quality and diversity of the multiple clusterings. We investigate the sensitivity of MVMC to these parameters by varying λ1 (which controls diversity) and λ2 (which controls quality) in the range {10^-3, 10^-2, ..., 10^3}. Figure 2 reports the Quality (DI) and Diversity (1-NMI, the larger the better) of MVMC on the Caltech-7 dataset. We have several interesting observations: (i) diversity (1-NMI) increases as λ1 increases, but not as much as with the increase of λ2 (see Figure 2(b)); quality (DI) increases as λ2 increases, but not as much as with the increase of λ1. (ii) A synchronous increase of λ1 and λ2 does not necessarily give the highest quality and diversity. (iii) When both λ1 and λ2 are fixed to a small value, both quality and diversity are reduced. This fact suggests that both the diversity and the commonality information of multi-view data should be used for the exploration of alternative clusterings. These observations again confirm the known dilemma between the diversity and quality of multiple clusterings. The values λ1 = 10 and λ2 = 100 often provide the best balance between quality and diversity.
Figure 2: Quality (DI) and Diversity (1-NMI) of MVMC vs. λ1 and λ2 on the Caltech-7 dataset.

We vary h from 2 to 2m on the Caltech-7 dataset to explore how the average quality and diversity of the multiple clusterings generated by MVMC change. As shown in Figure 3, as h increases, the average quality of the multiple clusterings decreases gradually with small fluctuations, and the average diversity fluctuates within a small range. These patterns are again explained by the dilemma between the quality and diversity of multiple clusterings: more diverse alternative clusterings sacrifice some of their own quality. Overall, we find that MVMC can explore h ≥ 2 alternative clusterings with both quality and diversity.

Figure 3: Quality (DI) and Diversity (NMI, the lower the better) of MVMC vs. h from 2 to 2m on the Caltech-7 dataset.
3.4 Runtime Analysis
Table 5 gives the runtimes of all methods. The time complexity of our MVMC is O(t n^2 d (h^2 v + 2hv + 2h)), where t is the number of iterations of the optimization, v is the number of views, n is the number of samples, d is the number of features, and h is the number of clusterings. The experiments are conducted on a server with Ubuntu 16.04 and an Intel Xeon 8163 CPU with 1TB RAM; all methods are implemented in Matlab 2014a. OSC is the fastest method and Dec-kmeans is the second fastest. OSC finds multiple clusterings by iteratively reducing the dimensionality and applying k-means. Dec-kmeans jointly seeks two different clusterings by minimizing a k-means sum of squared errors objective for the two clustering solutions, along with the correlation between them. Because k-means has a low time complexity, these two k-means based methods are much faster than the other techniques. MVMC and MNMF have similar runtimes, since they are both based on nonnegative matrix factorization, which has a larger complexity than k-means. MVMC is more efficient than the other competing methods (except OSC and Dec-kmeans), since it does not need to concatenate features and is directly applicable to each view. In summary, MVMC not only performs better than the state-of-the-art methods in exploring multiple clusterings, but also has a runtime comparable with the efficient counterparts.
Table 5: Runtimes of the competing methods (in seconds).

           Dec-kmeans  ISAAC  MISC   MNMF  mSC   OSC  MVMC
Caltech7   112         1432   1336   105   864   4    332
Caltech20  273         2232   2156   1257  1564  10   585
Corel      433         1822   1956   614   1269  1    790
Digits     39          1223   1278   356   690   1    234
Wiki       14          656    638    220   527   2    490
Mirflickr  361         10012  21168  1577  2631  217  9564
Total      1232        17074  28532  4129  7545  235  11995
4 Conclusion
In this paper, we proposed an approach to generate multiple clusterings (co-clusterings) from multi-view data, an interesting but largely overlooked clustering topic that encompasses both multi-view clustering and multiple clusterings. Our approach leverages the diversity and commonality of multi-view data to generate multiple clusterings, and it outperforms state-of-the-art multiple clustering solutions. Our study confirms the existence of individuality and commonality in multi-view data, and their contribution to generating diverse clusterings of quality. In future work, we plan to find a principled way to automatically determine the number of alternative clusterings for our proposed approach.
References
[Bae and Bailey, 2006] Eric Bae and James Bailey. Coala: A
novel approach for the extraction of an alternate clustering
of high quality and high dissimilarity. In ICDM, pages
53–62, 2006.
[Bailey, 2013] James Bailey. Alternative clustering analysis:
A review. In Aggarwal Charu and Reddy Chandan, editors, Data Clustering: Algorithms and Applications, pages
535–550. CRC Press, 2013.
[Belkin et al., 2006] Mikhail Belkin, Partha Niyogi, and
Vikas Sindhwani. Manifold regularization: A geometric
framework for learning from labeled and unlabeled examples. JMLR, 7(11):2399–2434, 2006.
[Cao et al., 2015] Xiaochun Cao, Changqing Zhang, Huazhu
Fu, Si Liu, and Hua Zhang. Diversity-induced multi-view
subspace clustering. In CVPR, pages 586–594, 2015.
[Caruana et al., 2006] Rich Caruana, Mohamed Elhawary,
Nam Nguyen, and Casey Smith. Meta clustering. In
ICDM, pages 107–118, 2006.
[Chao et al., 2017] Guoqing Chao, Shiliang Sun, and Jinbo
Bi. A survey on multi-view clustering. arXiv preprint
arXiv:1712.06246, 2017.
[Cui et al., 2007] Ying Cui, Xiaoli Z Fern, and Jennifer G
Dy. Non-redundant multi-view clustering via orthogonalization. In ICDM, pages 133–142, 2007.
[Dang and Bailey, 2010] Xuan Hong Dang and James Bailey. Generation of alternative clusterings using the cami
approach. In SDM, pages 118–129, 2010.
[Davidson and Qi, 2008] Ian Davidson and Zijie Qi. Finding
alternative clusterings using constraints. In ICDM, pages
773–778, 2008.
[Ding et al., 2010] Chris HQ Ding, Tao Li, and Michael I
Jordan. Convex and semi-nonnegative matrix factorizations. TPAMI, 32(1):45–55, 2010.
[Gao et al., 2015] Hongchang Gao, Feiping Nie, Xuelong
Li, and Heng Huang. Multi-view subspace clustering. In
ICCV, pages 4238–4246, 2015.
[Gönen and Alpaydın, 2011] Mehmet Gönen and Ethem Alpaydın. Multiple kernel learning algorithms. JMLR,
12(7):2211–2268, 2011.
[Gretton et al., 2005] Arthur Gretton, Olivier Bousquet,
Alex Smola, and Bernhard Schölkopf. Measuring statistical dependence with Hilbert-Schmidt norms. In ALT, pages
63–77, 2005.
[Jain et al., 2008] Prateek Jain, Raghu Meka, and Inderjit S
Dhillon. Simultaneous unsupervised learning of disparate clusterings. Statistical Analysis and Data Mining,
1(3):195–210, 2008.
[Kumar and Daumé, 2011] Abhishek Kumar and Hal
Daumé. A co-training approach for multi-view spectral
clustering. In ICML, pages 393–400, 2011.
[Li et al., 2015] Yeqing Li, Feiping Nie, Heng Huang, and
Junzhou Huang. Large-scale multi-view spectral clustering via bipartite graph. In AAAI, 2015.
[Liu et al., 2013] Guangcan Liu, Zhouchen Lin, Shuicheng
Yan, Ju Sun, Yong Yu, and Yi Ma. Robust recovery of
subspace structures by low-rank representation. TPAMI,
35(1):171–184, 2013.
[Luo et al., 2018] Shirui Luo, Changqing Zhang, Wei
Zhang, and Xiaochun Cao. Consistent and specific multiview subspace clustering. In AAAI, pages 3730–3737,
2018.
[Mautz et al., 2018] Dominik Mautz, Wei Ye, Claudia Plant,
and Christian Böhm. Discovering non-redundant k-means
clusterings in optimal subspaces. In KDD, pages 1973–
1982, 2018.
[Monti et al., 2003] Stefano Monti, Pablo Tamayo, Jill
Mesirov, and Todd Golub. Consensus clustering: a
resampling-based method for class discovery and visualization of gene expression microarray data. Machine
learning, 52(1-2):91–118, 2003.
[Niu et al., 2010] Donglin Niu, Jennifer G Dy, and Michael I
Jordan. Multiple non-redundant spectral clustering views.
In ICML, pages 831–838, 2010.
[Tao et al., 2018] Hong Tao, Chenping Hou, Xinwang Liu,
Tongliang Liu, Dongyun Yi, and Jubo Zhu. Reliable multiview clustering. In AAAI, pages 4123–4130, 2018.
[Tokuda et al., 2017] Tomoki Tokuda, Junichiro Yoshimoto,
Yu Shimizu, and et al. Multiple co-clustering based
on nonparametric mixture models with heterogeneous
marginal distributions. PLoS ONE, 12(10):e0186566,
2017.
[Wang et al., 2011] Hua Wang, Feiping Nie, Heng Huang,
and Fillia Makedon. Fast nonnegative matrix tri-factorization for large-scale data co-clustering. In IJCAI,
pages 1553–1558, 2011.
[Wang et al., 2018] Xing Wang, Guoxian Yu, Carlotta
Domeniconi, Jun Wang, Zhiwen Yu, and Zili Zhang. Multiple co-clusterings. In ICDM, pages 1308–1313, 2018.
[Wang et al., 2019] Xing Wang, Guoxian Yu, Carlotta
Domeniconi, Jun Wang, Guoqiang Xiao, and Maozu Guo.
Multiple independent subspace clusterings. In AAAI,
pages 1–8, 2019.
[Wright et al., 2010] John Wright, Yi Ma, Julien Mairal,
Guillermo Sapiro, Thomas S Huang, and Shuicheng Yan.
Sparse representation for computer vision and pattern
recognition. Proceedings of the IEEE, 98(6):1031–1044,
2010.
[Yang and Zhang, 2017] Sen Yang and Lijun Zhang. Non-redundant multiple clustering by nonnegative matrix factorization. Machine Learning, 106(5):695–712, 2017.
[Ye et al., 2016] Wei Ye, Samuel Maurus, Nina Hubig, and
Claudia Plant. Generalized independent subspace clustering. In ICDM, pages 569–578, 2016.