Multi-View Multiple Clustering
arXiv:1905.05053v1 [cs.LG] 13 May 2019
Shixing Yao1, Guoxian Yu1, Jun Wang1, Carlotta Domeniconi2, Xiangliang Zhang3
1 College of Computer and Information Sciences, Southwest University, Chongqing, China
2 Department of Computer Science, George Mason University, VA, USA
3 CEMSE, King Abdullah University of Science and Technology, Thuwal, SA
{ysx, gxyu, kingjun}@swu.edu.cn, [email protected], [email protected]
Abstract
Multiple clustering aims at exploring alternative clusterings to organize the data into meaningful groups from different perspectives. Existing multiple clustering algorithms are designed for single-view data. We assume that the individuality and commonality of multi-view data can be leveraged to generate high-quality and diverse clusterings. To this end, we propose a novel multi-view multiple clustering (MVMC) algorithm. MVMC first adapts multi-view self-representation learning to explore the individuality encoding matrices and the shared commonality matrix of multi-view data. It additionally reduces the redundancy (i.e., enhances the individuality) among the matrices using the Hilbert-Schmidt Independence Criterion (HSIC), and collects shared information by forcing the shared matrix to be smooth across all views. It then uses matrix factorization on the individual matrices, along with the shared matrix, to generate diverse clusterings of high quality. We further extend multiple co-clustering to multi-view data and propose a solution called multi-view multiple co-clustering (MVMCC). Our empirical study shows that MVMC (MVMCC) can exploit multi-view data to generate multiple high-quality and diverse clusterings (co-clusterings), with performance superior to the state-of-the-art methods.
1 Introduction
The goal of clustering is to partition samples into disjoint groups to facilitate the discovery of hidden patterns in the data. Traditional clustering algorithms are designed for single-view data. With the diffusion of the internet of things and of big data, samples can easily be collected from different sources, or observed from different views. For example, a video can be characterized using image signals and audio signals, and a given news story can be reported in different languages. Objects with diverse feature views are typically called multi-view data. It is recognized that integrating the information contained in multiple views can achieve consolidated data clustering [Chao et al., 2017]. Many multi-view clustering methods have been developed to extract comprehensive information from multiple feature views; examples are co-training based [Kumar and Daumé, 2011], multiple kernel learning [Gönen and Alpaydın, 2011], and subspace learning based methods [Cao et al., 2015; Luo et al., 2018]. However, the aforementioned clustering methods typically provide a single clustering, which may fail to reveal the high-quality but diverse alternative clusterings of the same data.

Figure 1: An example of multi-view multiple clustering. Two alternative clusterings (texture+shape and color+shape) can be generated using the commonality (shape) and the individuality (texture and color) information of the same multi-view objects.
For example, in Figure 1, we have a collection of objects represented by a texture view and a color view. We can also group the objects based on their shared shapes. By leveraging the commonality and the individuality of these multi-view objects, we can obtain two alternative clusterings (texture+shape and color+shape), as shown at the bottom of the figure. From this example, we can see that multi-view data include not only the commonality information for generating a high-quality clustering (as multi-view clustering does), but also the individual (or specific) information for generating diverse clusterings (as multiple clustering aims to achieve) [Bailey, 2013].
To explore different clusterings of the given data, multiple clustering has emerged as a new branch of clustering in recent years. Some approaches seek clusterings that are alternatives to those already explored, by enforcing the new ones to be different [Bae and Bailey, 2006; Davidson and Qi, 2008; Yang and Zhang, 2017]; other solutions simultaneously seek multiple clusterings by reducing their correlation [Caruana et al., 2006; Jain et al., 2008; Dang and Bailey, 2010; Wang et al., 2018], or by seeking orthogonal (or independent) subspaces and the clusterings therein [Niu et al., 2010; Ye et al., 2016; Mautz et al., 2018; Wang et al., 2019]. However, these multiple clustering methods are designed for single-view data.
Based on the example discussed in Figure 1, we leverage the individuality and the commonality of multi-view data to generate high-quality and diverse clusterings, and we propose an approach called multi-view multiple clustering (MVMC) to achieve this goal. To the best of our knowledge, MVMC is the first attempt to encompass both multiple clustering and multi-view clustering, where the former focuses on generating diverse clusterings from a single data view, and the latter focuses on a single consensus clustering that summarizes the information from different views. MVMC first extends multi-view self-representation learning [Luo et al., 2018] to explore the individuality information encoding matrices and the commonality information matrix shared across views. To obtain more credible commonality information from the multiple views, we force the commonality information matrix to be smooth across all views. In addition, we use the Hilbert-Schmidt Independence Criterion (HSIC) [Gretton et al., 2005] to enhance the individuality between the matrices, and consequently increase the diversity between the clusterings. We then use matrix factorization to jointly factorize each individuality matrix (for diversity) and the commonality matrix (for quality) into a clustering indicator matrix and a basis matrix. To simultaneously seek the individual and common data matrices, and the diverse clusterings therein, we use an alternating optimization technique to solve the unified objective. In addition, we extend multiple co-clustering [Tokuda et al., 2017; Wang et al., 2018] to the multi-view scenario, and term the extended approach multi-view multiple co-clustering (MVMCC).
The main contributions of our work are summarized as follows:
• We study how to generate multiple clusterings from multi-view data, an interesting and challenging problem that has been largely overlooked. The problem we address is different from existing multi-view clustering, which generates a single clustering by leveraging multiple views, and also different from multiple clustering, which produces alternative clusterings from single-view data.
• We introduce a unified objective function to simultaneously seek the multiple individuality information encoding matrices and the commonality information encoding matrix. This unified function leverages the individuality to generate diverse clusterings, and the commonality to boost the quality of the generated clusterings. We further adopt an alternating optimization technique to solve the unified objective.
• Extensive experimental results show that MVMC (MVMCC) performs considerably better than existing multiple clustering (co-clustering) algorithms [Cui et al., 2007; Jain et al., 2008; Niu et al., 2010; Ye et al., 2016; Yang and Zhang, 2017; Tokuda et al., 2017; Wang et al., 2018; Wang et al., 2019] in exploring multiple clusterings and co-clusterings.
2 The Proposed Methods

2.1 Multi-View Multiple Clustering
Suppose X^v ∈ R^{d_v×n} denotes the feature data matrix of the v-th view, v ∈ {1, 2, ..., m}, for n objects in the d_v-dimensional space. We aim to generate h (provided by the user) different clusterings from {X^v}_{v=1}^m using the shared and individual information embedded in the data matrices. Most multi-view clustering approaches in essence focus on the shared and complementary information of the multiple data views to generate a consolidated clustering [Chao et al., 2017]. By viewing each subspace as a feature view, an intuitive solution to explore multiple clusterings on multi-view data is to first concatenate the different feature views, and then apply subspace-based multiple clustering methods to the concatenated feature vectors [Niu et al., 2010; Ye et al., 2016; Wang et al., 2019].
To find high-quality and diverse multiple clusterings, we should make concrete use of the individuality and commonality of multi-view data. The individuality helps to explore diverse clusterings, while the commonality coordinates the diverse clusterings to capture the common knowledge of the multi-view data. To explore the individuality and commonality of multi-view data, we extend multi-view self-representation learning [Cao et al., 2015; Luo et al., 2018] as follows:
\[ \mathcal{J}_D(\{D^k\}_{k=1}^h, U) = \sum_{k=1}^{h} \frac{1}{m} \sum_{v=1}^{m} \big\lVert X^v - X^v (U + D^k) \big\rVert_F^2 + \lambda_1 \Phi_1(\{D^k\}_{k=1}^h) + \lambda_2 \Phi_2(U) \tag{1} \]

where U ∈ R^{n×n} is specified to encode the commonality of the data matrices {X^v}_{v=1}^m, and D^k ∈ R^{n×n} is used to encode the individuality of the k-th (k ∈ {1, 2, ..., h}) group of views. Φ_1({D^k}_{k=1}^h) and Φ_2(U) (defined later) are two constraints used to enhance the individuality and the commonality, respectively. The multi-view self-representation learning in [Cao et al., 2015; Luo et al., 2018] requires h = m; in contrast, Eq. (1) does not have this requirement. As a result, the group-wise individuality and diversity are jointly considered, and the number of alternative clusterings can be adjusted by the user. The assumption behind the linear representation in Eq. (1) is that a data sample can be expressed as a linear combination of the other samples in the subspace. This assumption is widely used in sparse representation [Wright et al., 2010] and low-rank representation learning [Liu et al., 2013].
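As a purely illustrative reading of the data-fitting term in Eq. (1), the sketch below evaluates the average per-view self-representation residual for given U and {D^k}; the sizes and matrices are toy assumptions, not the authors' released code.

```python
import numpy as np

def selfrep_residual(Xs, U, Ds):
    """Data-fitting term of Eq. (1): sum over the h groups of the average
    per-view residual ||X^v - X^v (U + D^k)||_F^2 / m."""
    m = len(Xs)
    total = 0.0
    for Dk in Ds:                      # one individuality matrix per alternative clustering
        for Xv in Xs:                  # every view is reconstructed through U + D^k
            R = Xv - Xv @ (U + Dk)
            total += np.linalg.norm(R, 'fro') ** 2 / m
    return total

# toy example: three views of 50 samples, h = 2 alternative clusterings
rng = np.random.default_rng(0)
n = 50
Xs = [rng.standard_normal((d, n)) for d in (8, 12, 5)]
U, Ds = np.zeros((n, n)), [np.eye(n), np.eye(n)]   # U + D^k = I reconstructs each view exactly
print(selfrep_residual(Xs, U, Ds))                 # ~0 for this trivial choice
```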
[Luo et al., 2018] recently combined U and {D^k}_{k=1}^m into an integrated co-association matrix of samples, and then applied spectral clustering to seek a consistent clustering. Their empirical study shows that the individual information encoded by D^k helps to produce a robust clustering. However, since X^v and X^{v'} (v' ≠ v) describe the same objects using different types of features, the matrix D^k resulting from Eq. (1) may still have a large information overlap with D^{k'}. As a result, the expected individuality of D^k and D^{k'} cannot be guaranteed. This overlap of information is not necessarily an issue for multi-view clustering, which aims at finding a single clustering, but it is for our problem, where multiple clusterings of both high quality and high diversity are expected.
To enhance the diversity between the individuality encoding (representation) matrices {D^k}_{k=1}^h, we approximately quantify the diversity based on the dependency between these matrices: the smaller the dependency between the matrices, the larger their diversity, since the matrices are less correlated. Various measures can be used to evaluate the dependence between variables. Here, we adopt the Hilbert-Schmidt Independence Criterion (HSIC) [Gretton et al., 2005], for its simplicity, solid theoretical foundation, and capability of measuring both linear and nonlinear dependence between variables. HSIC computes the squared norm of the cross-covariance operator over D^k and D^{k'} in Hilbert space to estimate the dependency. The empirical HSIC does not have to explicitly compute the joint distribution of D^k and D^{k'}; it is given by:

\[ \mathrm{HSIC}(D^k, D^{k'}) = (n-1)^{-2}\, \mathrm{tr}(K^k H K^{k'} H) \tag{2} \]

where K^k, K^{k'}, H ∈ R^{n×n}; K^k and K^{k'} measure the kernel-induced similarity between the vectors of D^k and D^{k'}, respectively, and H_{ij} = δ_{ij} − 1/n, with δ_{ij} = 1 if i = j and δ_{ij} = 0 otherwise. In this paper, we adopt the inner product kernel and specify K^k = (D^k)^T D^k, ∀k ∈ {1, 2, ..., h}. We then minimize the overall HSIC over the h individuality matrices to reduce the redundancy between them, and specify Φ_1({D^k}_{k=1}^h) as follows:
\[ \Phi_1(\{D^k\}_{k=1}^h) = \sum_{k=1}^{h} \sum_{k' \neq k} \mathrm{HSIC}(D^k, D^{k'}) = \sum_{k=1}^{h} \sum_{k' \neq k} (n-1)^{-2}\, \mathrm{tr}(D^k H K^{k'} H (D^k)^T) = \sum_{k=1}^{h} \mathrm{tr}(D^k \tilde{K}^k (D^k)^T) \tag{3} \]

where K̃^k = (n−1)^{-2} Σ_{k'≠k} H K^{k'} H.
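For illustration, the following sketch (our own, under the inner-product-kernel choice stated above) computes the empirical HSIC of Eq. (2) between two individuality matrices; summing it over all pairs k ≠ k' yields Φ_1 in Eq. (3).

```python
import numpy as np

def hsic(Dk, Dl):
    """Empirical HSIC of Eq. (2) with the inner-product kernel K^k = (D^k)^T D^k."""
    n = Dk.shape[1]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix, H_ij = delta_ij - 1/n
    Kk, Kl = Dk.T @ Dk, Dl.T @ Dl
    return np.trace(Kk @ H @ Kl @ H) / (n - 1) ** 2

def phi1(Ds):
    """Phi_1 of Eq. (3): total pairwise HSIC over the individuality matrices."""
    return sum(hsic(Ds[k], Ds[l])
               for k in range(len(Ds)) for l in range(len(Ds)) if l != k)

rng = np.random.default_rng(0)
D1, D2 = rng.standard_normal((30, 30)), rng.standard_normal((30, 30))
print(hsic(D1, D2), phi1([D1, D2]))   # lower values indicate less dependency (more diversity)
```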
Inspired by subspace-based multi-view learning [Gao et al., 2015; Chao et al., 2017] and manifold regularization [Belkin et al., 2006], we specify Φ_2(U) in Eq. (1) to collect more shared information from the multiple data views as follows:

\[ \Phi_2(U) = \sum_{v=1}^{m} \sum_{i,j=1}^{n} \lVert u_i - u_j \rVert_2^2 \, W_{ij}^v = \mathrm{tr}(U \tilde{L} U^T) \tag{4} \]

where W_{ij}^v is the feature similarity between x_i^v and x_j^v. To compute W^v, we simply adopt a 5-nearest-neighbor graph and use the Gaussian heat kernel (with the kernel width set to the standard deviation of the distances between samples) to quantify the similarity between neighboring samples. L̃ = Σ_{v=1}^{m} (Λ^v − W^v), where Λ^v is a diagonal matrix with Λ_{ii}^v = Σ_{j=1}^{n} W_{ij}^v. Minimizing Φ_2(U) guides U to encode the consistent and complementary information shared across views. In this way, the quality of the diverse clusterings can be boosted using the enhanced shared information.
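A minimal sketch of one plausible construction of Φ_2(U): build a Gaussian-weighted 5-nearest-neighbor graph per view (kernel width set to the standard deviation of the pairwise distances, as described above), accumulate the graph Laplacians, and evaluate tr(U L̃ U^T). The symmetrization of the kNN graph is our assumption; the paper does not spell out this detail.

```python
import numpy as np

def view_graph(Xv, n_neighbors=5):
    """W^v for one view (d_v x n): Gaussian heat-kernel weights on a 5-NN graph."""
    n = Xv.shape[1]
    pts = Xv.T                                   # n x d_v, one row per sample
    dist = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1))
    sigma = dist[np.triu_indices(n, 1)].std()    # kernel width = std of pairwise distances
    W = np.zeros((n, n))
    for i in range(n):
        nn = np.argsort(dist[i])[1:n_neighbors + 1]          # nearest neighbors, skipping self
        W[i, nn] = np.exp(-dist[i, nn] ** 2 / (2 * sigma ** 2))
    return np.maximum(W, W.T)                    # symmetrize the kNN graph (our assumption)

def phi2(U, Xs):
    """Phi_2(U) = tr(U L~ U^T), with L~ = sum_v (Lambda^v - W^v)."""
    n = U.shape[0]
    L = np.zeros((n, n))
    for Xv in Xs:
        W = view_graph(Xv)
        L += np.diag(W.sum(axis=1)) - W
    return np.trace(U @ L @ U.T)

rng = np.random.default_rng(0)
Xs = [rng.standard_normal((d, 30)) for d in (6, 10)]
print(phi2(np.eye(30), Xs))
```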
Given the equivalence between matrix factorization based clustering and spectral clustering (or k-means clustering), we adopt the widely used semi-nonnegative matrix factorization [Ding et al., 2010] to explore the k-th clustering on U + D^k as follows:

\[ U + D^k = B^k (R^k)^T \tag{5} \]

where R^k ∈ R^{n×r_k} and B^k ∈ R^{n×r_k} (r_k is the number of sample clusters) are the clustering indicator matrix and the basis matrix, respectively. Here, the k-th clustering is generated not only with respect to D^k, but also with respect to U, which encodes the shared information of the multi-view data. As such, the explored k-th clustering (encoded by R^k) not only reflects the individuality of the views in the k-th group, but also captures the commonality of all the data views. As a consequence, a high-quality and yet diverse clustering can be generated.

The above process first explores the individual information matrices and the shared information matrix, and then generates diverse clusterings on these matrices. A sub-optimal solution may be obtained as a result, because the two steps are performed separately. To avoid this, we advocate simultaneously optimizing {D^k}_{k=1}^h and the diverse clusterings {R^k}_{k=1}^h therein, and formulate a unified objective function for MVMC as follows:

\[ \mathcal{J}_{MC}(\{D^k\}_{k=1}^h, \{R^k\}_{k=1}^h, U) = \frac{1}{h} \sum_{k=1}^{h} \big\lVert (U + D^k) - B^k (R^k)^T \big\rVert_F^2 + \lambda_1 \sum_{k=1}^{h} \mathrm{tr}(D^k \tilde{K}^k (D^k)^T) + \lambda_2\, \mathrm{tr}(U \tilde{L} U^T) \tag{6} \]
\[ \text{s.t. } X^v = X^v (U + D^k), \; v \in \{1, 2, \cdots, m\} \]
By solving Eq. (6), we can simultaneously obtain multiple diverse clusterings of quality by leveraging the commonality and individuality information of the multiple views. Our experiments confirm that MVMC can generate multiple clusterings with enhanced diversity and quality. In addition, it outperforms the counterpart algorithms [Cui et al., 2007; Jain et al., 2008; Niu et al., 2010; Ye et al., 2016; Wang et al., 2019], which concatenate the multiple data views into a composite view and then explore multiple clusterings in the subspaces of the composite view.
The binary matrices {R^k}_{k=1}^h are hard to optimize directly. As such, we relax the entries of {R^k}_{k=1}^h to nonnegative numeric values. Since Eq. (6) is not jointly convex in {D^k}_{k=1}^h, U, and {R^k}_{k=1}^h, it is unrealistic to find the globally optimal values of all the variables. Here, we solve Eq. (6) via alternating optimization, which optimizes one variable while fixing the others. The detailed optimization process can be found in the supplementary file due to the limitation of space.
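The alternating optimization cycles over {D^k}, U, and the factor matrices {B^k, R^k}; the updates for D^k and U are only given in the authors' supplementary file, so they are not reproduced here. The factorization step for a fixed M = U + D^k, however, is standard semi-NMF, and the sketch below (an illustration, not the released implementation) applies the multiplicative updates of [Ding et al., 2010] and reads the k-th clustering off R^k.

```python
import numpy as np

def seminmf(M, r, iters=200, eps=1e-10):
    """Semi-NMF M ~ B R^T with R >= 0 (multiplicative updates of Ding et al., 2010).
    In MVMC, M would be U + D^k with all other variables fixed."""
    n = M.shape[1]
    rng = np.random.default_rng(0)
    R = np.abs(rng.standard_normal((n, r)))                  # nonnegative indicator matrix
    pos = lambda A: (np.abs(A) + A) / 2                      # elementwise positive part
    neg = lambda A: (np.abs(A) - A) / 2                      # elementwise negative part
    for _ in range(iters):
        B = M @ R @ np.linalg.pinv(R.T @ R)                  # closed-form basis update
        MtB, BtB = M.T @ B, B.T @ B
        R *= np.sqrt((pos(MtB) + R @ neg(BtB)) /
                     (neg(MtB) + R @ pos(BtB) + eps))        # keeps R nonnegative
    return B, R, R.argmax(axis=1)                            # cluster label of each sample

# toy usage: factorize a synthetic U + D^k into a 3-cluster indicator
rng = np.random.default_rng(1)
M = rng.standard_normal((40, 40))
B, R, labels = seminmf(M, r=3)
print(np.linalg.norm(M - B @ R.T, 'fro'))
```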
2.2 Multi-View Multiple Co-Clusterings
Multiple co-clustering algorithms have also recently been proposed to explore alternative co-clusterings of the same data [Tokuda et al., 2017; Wang et al., 2018]. Multiple co-clustering methods aim at exploring multiple two-way clusterings, where both samples and features are clustered. In contrast, multiple clustering techniques only explore diverse one-way clusterings, where only samples (or only features) are clustered. Based on the merits of matrix tri-factorization in exploring co-clusters [Wang et al., 2011; Wang et al., 2018], we can seek multiple co-clusterings on the multiple views by optimizing the following objective function:
\[ \mathcal{J}_{MCC}(\{D^v\}_{v=1}^m, \{R^v\}_{v=1}^m, \{C^v\}_{v=1}^m, U) = \frac{1}{m} \sum_{v=1}^{m} \big\lVert X^v (U + D^v) - C^v S^v (R^v)^T \big\rVert_F^2 + \lambda_1 \sum_{v=1}^{m} \mathrm{tr}(D^v \tilde{K}^v (D^v)^T) + \lambda_2\, \mathrm{tr}(U \tilde{L} U^T) \tag{7} \]
\[ \text{s.t. } X^v = X^v (U + D^v), \; C^v \ge 0, \; R^v \ge 0 \]

where C^v ∈ R^{d_v×c_v} and R^v ∈ R^{n×r_v} correspond to the row-cluster (grouping features) and column-cluster (grouping samples) indicator matrices of the v-th co-clustering, and S^v ∈ R^{c_v×r_v} plays the role of absorbing the different scaling factors of R^v and C^v to minimize the squared error. Here we fix h = m for MVMCC, since different feature views have different numbers of features. Eq. (7) can be optimized following a procedure similar to the one used for Eq. (6), which is provided in the supplementary file due to the page limit.
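To make the roles of the factor matrices in Eq. (7) concrete, the following illustrative sketch (hypothetical inputs; not the authors' solver) reads the v-th co-clustering off C^v and R^v and evaluates the tri-factorization reconstruction error.

```python
import numpy as np

def read_coclustering(Xv, U, Dv, Cv, Sv, Rv):
    """Illustrative only: read the v-th co-clustering off the Eq. (7) factors.
    Cv (d_v x c_v) groups features (rows); Rv (n x r_v) groups samples (columns)."""
    recon_err = np.linalg.norm(Xv @ (U + Dv) - Cv @ Sv @ Rv.T, 'fro') ** 2
    feature_clusters = Cv.argmax(axis=1)   # row-cluster label of each feature
    sample_clusters = Rv.argmax(axis=1)    # column-cluster label of each sample
    return recon_err, feature_clusters, sample_clusters
```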
3 Experimental Results and Analysis

3.1 Experimental Setup

In this section, we evaluate the proposed MVMC and MVMCC on five widely-used multi-view datasets [Li et al., 2015; Tao et al., 2018], which are described in Table 1. The datasets have different numbers of views and come from different domains. Caltech-7^1 and Caltech-20 [Li et al., 2015] are two subsets of Caltech-101, which contain only 7 and 20 classes, respectively. These subsets were created because the number of samples per class in Caltech-101 is unbalanced. Each sample is made of 6 views of the same image. Mul-fea digits^2 is comprised of 2,000 data points from the digit classes 0 to 9, with 200 data points per class. Six sets of public features are available: 76 Fourier coefficients of the character shapes, 216 profile correlations, 64 Karhunen-Loève coefficients, 240 pixel averages in 2 × 3 windows, 47 Zernike moments, and 6 morphological features. Wiki article^3 contains selected sections from Wikipedia's featured articles collection; we considered only the 10 most populated categories. It contains two views: text and image. Corel^4 [Tao et al., 2018] consists of 5,000 images from 50 different categories, with 100 images per category. The features are a color histogram (9), an edge direction histogram (18), and WT (9). Mirflickr^5 contains 25,000 instances collected from Flickr. Each instance consists of an image and its associated textual tags. To avoid noise, we remove textual tags that appear fewer than 20 times in the dataset, and then delete instances without textual tags or semantic labels. This process gives us 16,738 instances.

^1 https://github.com/yeqinglee/mvdata
^2 https://archive.ics.uci.edu/ml/datasets/Multiple+Features
^3 http://www.svcl.ucsd.edu/projects/crossmodal/
^4 http://www.cais.ntu.edu.sg/~chhoi/SVMBMAL/
^5 http://press.liacs.nl/mirflickr/mirdownload.html

Table 1: Characteristics of multi-view datasets. n is the number of samples, d_v is the dimensionality of samples, 'classes' is the number of ground-truth clusters, and m is the number of views.

Datasets         n      d_v                        classes  m
Caltech-7        1474   [40,48,254,1984,512,928]   7        6
Caltech-20       2386   [40,48,254,1984,512,928]   20       6
Mul-fea digits   2000   [76,216,64,240,47,6]       10       6
Wiki article     2866   [128,10]                   10       2
Corel            5000   [9,18,9]                   50       3
Mirflickr        16738  [150,500]                  24       2

Evaluating multiple clusterings requires quantifying both the quality and the diversity of the alternative clusterings. To measure quality, we use the widely-adopted Silhouette Coefficient (SC) and Dunn Index (DI) as internal indices; large values of SC and DI indicate a high-quality clustering. To quantify the redundancy between alternative clusterings, we use Normalized Mutual Information (NMI) and the Jaccard Coefficient (JC) as external indices; smaller values of NMI and JC indicate less redundancy between alternative clusterings. All these metrics have been used in the multiple clustering literature [Bailey, 2013]. The formal definitions of these metrics, omitted here to save space, can be found in [Bailey, 2013; Yang and Zhang, 2017].
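As a reference for reproducing the diversity measurements, the sketch below computes NMI with scikit-learn and a pair-counting Jaccard coefficient between two clusterings; this is the standard pair-counting variant, which we assume matches the definitions referenced in [Bailey, 2013] (scikit-learn's silhouette_score can likewise serve for SC).

```python
import numpy as np
from itertools import combinations
from sklearn.metrics import normalized_mutual_info_score

def jaccard_coefficient(c1, c2):
    """Pair-counting Jaccard coefficient between two clusterings (label vectors)."""
    n11 = n10 = n01 = 0
    for i, j in combinations(range(len(c1)), 2):
        same1, same2 = c1[i] == c1[j], c2[i] == c2[j]
        n11 += same1 and same2          # pair grouped together in both clusterings
        n10 += same1 and not same2      # together only in the first clustering
        n01 += same2 and not same1      # together only in the second clustering
    return n11 / max(n11 + n10 + n01, 1)

# diversity between two alternative clusterings: lower NMI / JC = less redundancy
c1 = np.array([0, 0, 1, 1, 2, 2]); c2 = np.array([0, 1, 0, 1, 0, 1])
print(normalized_mutual_info_score(c1, c2), jaccard_coefficient(c1, c2))
```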
3.2 Discovering multiple one-way clusterings and multiple co-clusterings
We compare the one-way multiple clusterings found by MVMC against Dec-kmeans [Jain et al., 2008], MNMF [Yang and Zhang, 2017], OSC [Cui et al., 2007], mSC [Niu et al., 2010], ISAAC [Ye et al., 2016], and MISC [Wang et al., 2019]. We also compare the multiple co-clusterings found by MVMCC against MultiCC [Wang et al., 2018] and MCC-NBMM [Tokuda et al., 2017]. The input parameters of the competing methods are set as the authors suggested in their papers or shared code. The parameter values of MVMC and MVMCC are λ1 = 10 and λ2 = 100, with h = 2 for multiple one-way clusterings and h = m for multiple co-clusterings. Since none of the existing multiple clustering algorithms can work on multi-view data, we concatenate the feature vectors of the multiple views and then run these competing methods on the concatenated vectors to seek alternative clusterings. Our MVMC and MVMCC run directly on the multi-view data, without such feature concatenation.

We use k-means to generate the reference clustering for MNMF, and then use the respective solutions of the methods to generate two alternative clusterings (C1, C2). We downloaded the source code of MNMF, ISAAC, MultiCC, MISC, and MCC-NBMM, and implemented the other methods (Dec-kmeans, mSC, and OSC) following the respective original papers. Following the experimental protocol adopted by these methods, we quantify the average clustering quality of C1 and C2, and measure the diversity between C1 and C2. We fix the number of row clusters r_k of each clustering to the respective number of classes of each dataset, as listed in Table 1. For co-clustering, we adopt a widely used technique [Monti et al., 2003] to determine the number of column clusters c_k. Detailed parameter values can be found in the supplementary file.
Table 2 reports the average results (over ten independent runs) and standard deviations of the competing methods on exploring two alternative one-way clusterings with h = 2.
Table 2: Quality and Diversity of the various competing methods on finding multiple clusterings. ↑ (↓) indicates the direction of preferred values for the corresponding measure. •/◦ indicates whether MVMC is statistically (according to a pairwise t-test at the 95% significance level) superior/inferior to the other method.

Dataset       Metric  Dec-kmeans      ISAAC           MISC            MNMF            mSC             OSC             MVMC
Caltech-7     SC↑     0.049±0.002•    0.235±0.011◦    0.201±0.002◦    0.234±0.001◦    0.163±0.008◦    0.261±0.004◦    0.140±0.002
Caltech-7     DI↑     0.044±0.000•    0.034±0.001•    0.048±0.000•    0.034±0.000•    0.056±0.000•    0.066±0.000     0.062±0.000
Caltech-7     NMI↓    0.024±0.000•    0.485±0.023•    0.513±0.016•    0.022±0.000•    0.152±0.002•    0.693±0.015•    0.006±0.000
Caltech-7     JC↓     0.126±0.001•    0.363±0.008•    0.349±0.001•    0.094±0.000•    0.136±0.001•    0.522±0.046•    0.076±0.000
Caltech-20    SC↑     -0.124±0.001•   0.085±0.000◦    0.036±0.000◦    -0.169±0.000•   -0.172±0.006•   0.196±0.001◦    0.004±0.000
Caltech-20    DI↑     0.026±0.000•    0.035±0.000•    0.033±0.000•    0.009±0.000•    0.028±0.000•    0.056±0.000•    0.183±0.000
Caltech-20    NMI↓    0.056±0.000•    0.475±0.011•    0.489±0.013•    0.052±0.002•    0.240±0.003•    0.715±0.025•    0.027±0.000
Caltech-20    JC↓     0.050±0.001•    0.222±0.002•    0.198±0.002•    0.033±0.001•    0.074±0.001•    0.444±0.004•    0.023±0.000
Corel         SC↑     0.112±0.014◦    -0.052±0.000•   -0.070±0.000•   -0.277±0.000•   -0.128±0.000•   0.238±0.002◦    -0.016±0.000
Corel         DI↑     0.031±0.001•    0.032±0.000•    0.020±0.000•    0.015±0.000•    0.019±0.000•    0.032±0.000•    0.354±0.000
Corel         NMI↓    0.643±0.035•    0.204±0.002•    0.209±0.002•    0.092±0.001•    0.394±0.006•    0.762±0.018•    0.070±0.000
Corel         JC↓     0.219±0.004•    0.031±0.001•    0.029±0.001•    0.013±0.000     0.072±0.001•    0.410±0.013•    0.010±0.000
Digits        SC↑     -0.133±0.022•   -0.001±0.000•   0.061±0.000     -0.076±0.000•   -0.117±0.000•   0.471±0.013◦    0.064±0.000
Digits        DI↑     0.016±0.000•    0.016±0.000•    0.018±0.001•    0.008±0.000•    0.016±0.001•    0.062±0.000•    0.087±0.000
Digits        NMI↓    0.078±0.000•    0.364±0.012•    0.399±0.008•    0.011±0.000     0.515±0.022•    0.822±0.028•    0.008±0.000
Digits        JC↓     0.076±0.000•    0.279±0.003•    0.298±0.000•    0.053±0.000     0.279±0.004•    0.656±0.015•    0.052±0.000
Wiki article  SC↑     0.447±0.016◦    -0.024±0.000•   -0.031±0.000•   -0.029±0.000•   0.108±0.002◦    0.418±0.012◦    0.066±0.000
Wiki article  DI↑     0.124±0.000     0.085±0.000•    0.085±0.001•    0.083±0.000•    0.095±0.001•    0.135±0.001◦    0.122±0.000
Wiki article  NMI↓    0.803±0.019•    0.042±0.000•    0.041±0.000•    0.006±0.000     0.212±0.001•    0.783±0.052•    0.006±0.000
Wiki article  JC↓     0.593±0.006•    0.078±0.001•    0.078±0.000•    0.056±0.001     0.113±0.002•    0.535±0.014•    0.052±0.000
Mirflickr     SC↑     -0.004±0.000◦   -0.092±0.000•   -0.028±0.000◦   -0.058±0.000•   -0.093±0.000•   0.017±0.000◦    -0.038±0.000
Mirflickr     DI↑     0.061±0.002•    0.062±0.005•    0.071±0.000•    0.053±0.001•    0.064±0.001•    0.059±0.002•    0.173±0.005
Mirflickr     NMI↓    0.427±0.012•    0.016±0.000•    0.021±0.000•    0.014±0.000•    0.216±0.006•    0.575±0.011•    0.005±0.000
Mirflickr     JC↓     0.878±0.022•    0.047±0.000•    0.037±0.000•    0.023±0.000     0.073±0.000•    0.368±0.011•    0.022±0.000
Table 3: Quality and Diversity of the various competing methods on finding multiple co-clusterings. •/◦ indicates whether MVMCC is statistically (according to a pairwise t-test at the 95% significance level) superior/inferior to the other method.

Dataset       Metric  MCC-NBMM         MultiCC          MVMCC
Caltech-7     SC↑     -0.100±0.002•    -0.103±0.006•    0.198±0.004
Caltech-7     DI↑     0.034±0.000•     0.011±0.000•     0.047±0.000
Caltech-7     NMI↓    0.376±0.014•     0.005±0.000      0.005±0.000
Caltech-7     JC↓     0.185±0.003•     0.087±0.000      0.083±0.000
Caltech-20    SC↑     -0.134±0.000•    -0.229±0.012•    0.080±0.000
Caltech-20    DI↑     0.026±0.000•     0.011±0.000•     0.156±0.008
Caltech-20    NMI↓    0.325±0.010•     0.021±0.000      0.026±0.000
Caltech-20    JC↓     0.150±0.002•     0.056±0.000•     0.029±0.000
Corel         SC↑     -0.087±0.002•    -0.172±0.012•    -0.017±0.000
Corel         DI↑     0.024±0.000•     0.015±0.000•     0.152±0.002
Corel         NMI↓    0.377±0.013•     0.164±0.002•     0.070±0.000
Corel         JC↓     0.176±0.004•     0.044±0.000•     0.010±0.000
Digits        SC↑     -0.243±0.024•    -0.214±0.013•    0.144±0.002
Digits        DI↑     0.014±0.000      0.003±0.000•     0.018±0.000
Digits        NMI↓    0.286±0.006•     0.010±0.000◦     0.207±0.003
Digits        JC↓     0.166±0.001•     0.060±0.000◦     0.115±0.000
Wiki article  SC↑     -0.0694±0.000•   -0.058±0.000•    0.064±0.000
Wiki article  DI↑     0.079±0.000      0.041±0.001•     0.078±0.000
Wiki article  NMI↓    0.287±0.005•     0.007±0.000      0.006±0.000
Wiki article  JC↓     0.127±0.002•     0.054±0.000      0.052±0.000
Mirflickr     SC↑     -0.095±0.000•    -0.194±0.002•    0.064±0.000
Mirflickr     DI↑     0.052±0.000•     0.065±0.000•     0.151±0.003
Mirflickr     NMI↓    0.017±0.000•     0.081±0.000•     0.005±0.000
Mirflickr     JC↓     0.052±0.000•     0.052±0.000•     0.022±0.000
From Table 2, we can see that MVMC often performs better than the competing methods across the different multi-view datasets, which demonstrates the effectiveness of MVMC in exploring alternative clusterings on multi-view data. MVMC always has the best results on the diversity metrics (NMI and JC), which suggests that it can find two alternative clusterings with high diversity. MVMC occasionally has a lower value on the quality metrics (SC and DI) than some of the competing methods. That is explainable: it is a widely-recognized dilemma to obtain alternative clusterings with both high diversity and high quality, and MVMC achieves a much larger diversity than the competing methods. Although the competing methods employ different techniques to explore alternative clusterings in subspaces or by reducing the redundancy between the clusterings, they almost always lose to MVMC.
Table 4: Comparison results with/without the shared information matrix U for discovering multiple clusterings.

        Digits                  Wiki article
        MVMC(nU)    MVMC        MVMC(nU)    MVMC
SC↑     0.061       0.064       0.059       0.066
DI↑     0.076       0.087       0.088       0.122
NMI↓    0.008       0.008       0.005       0.006
JC↓     0.050       0.052       0.051       0.052
The cause is that the long concatenated feature vectors override the intrinsic structures of the different views. This also explains why the competing methods perform worse on the diversity metrics (NMI and JC). In practice, because of the long concatenated feature vectors, the competing methods generally suffer from long runtimes and cannot be applied to multi-view datasets with high-dimensional feature views. In contrast, our MVMC is rather efficient: it does not need to concatenate features and is directly applicable to each view.
Table 3 reports the results of MVMCC, MultiCC, and MCC-NBMM in exploring multiple co-clusterings, whose number is equal to the number of views, h = m (unlike h = 2 for Table 2), because of the heterogeneity of the feature views. For this evaluation, we report the average quality and diversity values over all pairs of the h alternative co-clusterings. We can see that MVMCC significantly outperforms these two state-of-the-art multiple co-clustering methods across the different evaluation metrics and datasets. MultiCC sometimes obtains a better diversity than MVMCC; that is because it directly optimizes the diversity on the sample-cluster and feature-cluster matrices, while our MVMCC optimizes the diversity indirectly, and mainly through the sample-cluster matrices. These results prove the effectiveness of our solution in exploring multiple co-clusterings on multi-view data.
Following the experimental setup of Table 2, we conduct additional experiments to investigate the contribution of the shared matrix U to the quality of multiple clusterings. For this investigation, we introduce a variant of MVMC, MVMC(nU), which only uses {D^k}_{k=1}^h to generate multiple clusterings and disregards the shared information matrix U. From the results on the Digits and Wiki article datasets reported in Table 4, we find that the diversity (NMI and JC) of the two alternative clusterings stays almost the same when the shared information matrix U is excluded, whereas the quality (SC and DI) is clearly reduced. This contrast proves that U indeed improves the quality of multiple clusterings, and justifies our motivation to seek U across views and leverage it together with the individuality information matrices for multiple clusterings.

In summary, we can conclude that the individuality information matrices help to generate diverse clusterings, and the commonality information matrix extracted from the multi-view data improves the quality of these clusterings. These experimental results also confirm our assumption that the individuality and commonality of multi-view data can be leveraged to generate diverse clusterings with high quality.
3.3 Parameter analysis
λ1 and λ2 are two important input parameters of MVMC (MVMCC) for seeking the individuality and commonality information of multi-view data, and they consequently affect the quality and diversity of the multiple clusterings. We investigate the sensitivity of MVMC to these parameters by varying λ1 (which controls diversity) and λ2 (which controls quality) in the range {10^-3, 10^-2, ..., 10^3}. Figure 2 reports the Quality (DI) and Diversity (1-NMI, the larger the better) of MVMC on the Caltech-7 dataset. We have several interesting observations: (i) diversity (1-NMI) increases as λ1 increases, but not as much as with the increase of λ2 (see Figure 2(b)); quality (DI) increases as λ2 increases, but not as much as with the increase of λ1. (ii) A synchronous increase of λ1 and λ2 does not necessarily give the highest quality and diversity. (iii) When both λ1 and λ2 are fixed to a small value, both quality and diversity are reduced. This fact suggests that both the diversity and the commonality information of multi-view data should be used for the exploration of alternative clusterings. These observations again confirm the known dilemma between the diversity and quality of multiple clusterings. The values λ1 = 10 and λ2 = 100 often provide the best balance between quality and diversity.
Figure 2: Quality (DI) and Diversity (1-NMI) of MVMC vs. λ1 and λ2 on the Caltech-7 dataset.

We vary h from 2 to 2m on the Caltech-7 dataset to explore how the average quality and diversity of the multiple clusterings generated by MVMC change. As shown in Figure 3, as h increases, the average quality of the multiple clusterings decreases gradually with small fluctuations, and the average diversity fluctuates within a small range. These patterns are again explained by the dilemma between the quality and diversity of multiple clusterings: more diverse alternative clusterings sacrifice some of their own quality. Overall, we find that MVMC can explore h ≥ 2 alternative clusterings with both quality and diversity.

Figure 3: Quality (DI) and Diversity (NMI, the lower the better) of MVMC vs. h from 2 to 2m on the Caltech-7 dataset.
3.4 Runtime Analysis
Table 5 gives the runtimes of all methods. The time complexity of our MVMC is O(t n^2 d (h^2 v + 2hv + 2h)), where t is the number of iterations of the optimization, v is the number of views, n is the number of samples, d is the number of features, and h is the number of clusterings. The experiments are conducted on a server with Ubuntu 16.04 and an Intel Xeon 8163 CPU with 1TB RAM; all methods are implemented in Matlab 2014a. OSC is the fastest method and Dec-kmeans is the second fastest. OSC finds multiple clusterings by iteratively reducing the dimensionality and applying k-means. Dec-kmeans jointly seeks two different clusterings by minimizing a k-means sum of squared errors objective for the two clustering solutions, along with the correlation between them. Because k-means has a low time complexity, these two k-means based methods are much faster than the other techniques. MVMC and MNMF have similar runtimes, since they are both based on nonnegative matrix factorization, which has a larger complexity than k-means. MVMC is more efficient than the other competing methods (except OSC and Dec-kmeans), since it does not need to concatenate features and is directly applicable to each view. In summary, MVMC not only performs better than the state-of-the-art methods in exploring multiple clusterings, but also has a runtime comparable with the efficient counterparts.
Table 5: Runtimes of the competing methods (in seconds).

           Dec-kmeans  ISAAC  MISC   MNMF  mSC   OSC  MVMC
Caltech7   112         1432   1336   105   864   4    332
Caltech20  273         2232   2156   1257  1564  10   585
Corel      433         1822   1956   614   1269  1    790
Digits     39          1223   1278   356   690   1    234
Wiki       14          656    638    220   527   2    490
Mirflickr  361         10012  21168  1577  2631  217  9564
Total      1232        17074  28532  4129  7545  235  11995
4 Conclusion
In this paper, we proposed an approach to generate multiple clusterings (co-clusterings) from multi-view data, an interesting but largely overlooked clustering topic that encompasses both multi-view clustering and multiple clusterings. Our approach leverages the diversity and commonality of multi-view data to generate multiple clusterings, and it outperforms state-of-the-art multiple clustering solutions. Our study confirms the existence of individuality and commonality in multi-view data, and their contribution to generating diverse clusterings of quality. In future work, we plan to find a principled way to automatically determine the number of alternative clusterings for our proposed approach.
References
[Bae and Bailey, 2006] Eric Bae and James Bailey. Coala: A
novel approach for the extraction of an alternate clustering
of high quality and high dissimilarity. In ICDM, pages
53–62, 2006.
[Bailey, 2013] James Bailey. Alternative clustering analysis:
A review. In Aggarwal Charu and Reddy Chandan, editors, Data Clustering: Algorithms and Applications, pages
535–550. CRC Press, 2013.
[Belkin et al., 2006] Mikhail Belkin, Partha Niyogi, and
Vikas Sindhwani. Manifold regularization: A geometric
framework for learning from labeled and unlabeled examples. JMLR, 7(11):2399–2434, 2006.
[Cao et al., 2015] Xiaochun Cao, Changqing Zhang, Huazhu
Fu, Si Liu, and Hua Zhang. Diversity-induced multi-view
subspace clustering. In CVPR, pages 586–594, 2015.
[Caruana et al., 2006] Rich Caruana, Mohamed Elhawary,
Nam Nguyen, and Casey Smith. Meta clustering. In
ICDM, pages 107–118, 2006.
[Chao et al., 2017] Guoqing Chao, Shiliang Sun, and Jinbo
Bi. A survey on multi-view clustering. arXiv preprint
arXiv:1712.06246, 2017.
[Cui et al., 2007] Ying Cui, Xiaoli Z Fern, and Jennifer G
Dy. Non-redundant multi-view clustering via orthogonalization. In ICDM, pages 133–142, 2007.
[Dang and Bailey, 2010] Xuan Hong Dang and James Bailey. Generation of alternative clusterings using the cami
approach. In SDM, pages 118–129, 2010.
[Davidson and Qi, 2008] Ian Davidson and Zijie Qi. Finding
alternative clusterings using constraints. In ICDM, pages
773–778, 2008.
[Ding et al., 2010] Chris HQ Ding, Tao Li, and Michael I
Jordan. Convex and semi-nonnegative matrix factorizations. TPAMI, 32(1):45–55, 2010.
[Gao et al., 2015] Hongchang Gao, Feiping Nie, Xuelong
Li, and Heng Huang. Multi-view subspace clustering. In
ICCV, pages 4238–4246, 2015.
[Gönen and Alpaydın, 2011] Mehmet Gönen and Ethem Alpaydın. Multiple kernel learning algorithms. JMLR,
12(7):2211–2268, 2011.
[Gretton et al., 2005] Arthur Gretton, Olivier Bousquet,
Alex Smola, and Bernhard Schölkopf. Measuring statistical dependence with Hilbert-Schmidt norms. In ALT, pages
63–77, 2005.
[Jain et al., 2008] Prateek Jain, Raghu Meka, and Inderjit S
Dhillon. Simultaneous unsupervised learning of disparate clusterings. Statistical Analysis and Data Mining,
1(3):195–210, 2008.
[Kumar and Daumé, 2011] Abhishek Kumar and Hal
Daumé. A co-training approach for multi-view spectral
clustering. In ICML, pages 393–400, 2011.
[Li et al., 2015] Yeqing Li, Feiping Nie, Heng Huang, and
Junzhou Huang. Large-scale multi-view spectral clustering via bipartite graph. In AAAI, 2015.
[Liu et al., 2013] Guangcan Liu, Zhouchen Lin, Shuicheng
Yan, Ju Sun, Yong Yu, and Yi Ma. Robust recovery of
subspace structures by low-rank representation. TPAMI,
35(1):171–184, 2013.
[Luo et al., 2018] Shirui Luo, Changqing Zhang, Wei
Zhang, and Xiaochun Cao. Consistent and specific multiview subspace clustering. In AAAI, pages 3730–3737,
2018.
[Mautz et al., 2018] Dominik Mautz, Wei Ye, Claudia Plant,
and Christian Böhm. Discovering non-redundant k-means
clusterings in optimal subspaces. In KDD, pages 1973–
1982, 2018.
[Monti et al., 2003] Stefano Monti, Pablo Tamayo, Jill
Mesirov, and Todd Golub. Consensus clustering: a
resampling-based method for class discovery and visualization of gene expression microarray data. Machine
learning, 52(1-2):91–118, 2003.
[Niu et al., 2010] Donglin Niu, Jennifer G Dy, and Michael I
Jordan. Multiple non-redundant spectral clustering views.
In ICML, pages 831–838, 2010.
[Tao et al., 2018] Hong Tao, Chenping Hou, Xinwang Liu,
Tongliang Liu, Dongyun Yi, and Jubo Zhu. Reliable multiview clustering. In AAAI, pages 4123–4130, 2018.
[Tokuda et al., 2017] Tomoki Tokuda, Junichiro Yoshimoto,
Yu Shimizu, and et al. Multiple co-clustering based
on nonparametric mixture models with heterogeneous
marginal distributions. PLoS ONE, 12(10):e0186566,
2017.
[Wang et al., 2011] Hua Wang, Feiping Nie, Heng Huang,
and Fillia Makedon. Fast nonnegative matrix tri-factorization for large-scale data co-clustering. In IJCAI,
pages 1553–1558, 2011.
[Wang et al., 2018] Xing Wang, Guoxian Yu, Carlotta
Domeniconi, Jun Wang, Zhiwen Yu, and Zili Zhang. Multiple co-clusterings. In ICDM, pages 1308–1313, 2018.
[Wang et al., 2019] Xing Wang, Guoxian Yu, Carlotta
Domeniconi, Jun Wang, Guoqiang Xiao, and Maozu Guo.
Multiple independent subspace clusterings. In AAAI,
pages 1–8, 2019.
[Wright et al., 2010] John Wright, Yi Ma, Julien Mairal,
Guillermo Sapiro, Thomas S Huang, and Shuicheng Yan.
Sparse representation for computer vision and pattern
recognition. Proceedings of the IEEE, 98(6):1031–1044,
2010.
[Yang and Zhang, 2017] Sen Yang and Lijun Zhang. Non-redundant multiple clustering by nonnegative matrix factorization. Machine Learning, 106(5):695–712, 2017.
[Ye et al., 2016] Wei Ye, Samuel Maurus, Nina Hubig, and
Claudia Plant. Generalized independent subspace clustering. In ICDM, pages 569–578, 2016.