Local-Global Fuzzy Clustering With Anchor Graph

188 IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 32, NO.
1, JANUARY 2024
Local-Global Fuzzy Clustering With Anchor Graph

Jingyu Wang , Member, IEEE, Shengzhao Guo , Feiping Nie , Senior Member, IEEE,
and Xuelong Li , Fellow, IEEE
Abstract—Recently, anchor-based strategy is getting a lot of of minimizing the discrepancies between samples within the
attention, which extends spectral clustering to reveal the dual same clusters, while the patterns of different clusters are kept as
relation between samples and features. However, the acquisition dissimilar as possible from others [4]. Based on this basic idea,
of clustering results follows the relaxed-discrete procedure, which
might lead to serious information loss. Given that, local-global numerous clustering methods have been developed. Among
fuzzy clustering with anchor graph is proposed in this article, these methods, graph-based clustering [5] features the capability
which jointly completes subspace learning and clustering. With of allocating data points to corresponding clusters on the strength
the construction of anchor graph, we first impose the fuzzy con- of graph decomposition, which is able to handle datasets with
straint on the indicator matrix, which is beneficial for revealing complex structures [6].
the ambiguity and uncertainty in clustering tasks. Thereafter, we
introduce the graph divergence regularization term for maximizing As a classic graph-based method, spectral clustering (SC) [7]
the variance of fuzzy indicator matrix, which not only avoids consists of subspace learning and clustering for capturing the
the trivial solutions in local graph learning, but also enhances intrinsic structure of data. First, SC constructs the similarity
the separability of data for explicit global structure. In this way, the graph with Gaussian kernel function to explore the pairwise
local graph loss and global divergence regularization term are able
relationship between samples [8]. Second, eigenvalue decompo-
to jointly capture the critical clustering structure. Finally, we can
obtain the clustering results in accordance with the optimal fuzzy sition is performed as the process of subspace learning, which
indicator matrix, which is updated alternately by the presented is used to learn the representation of data in latent space [9].
coordinate descent method in optimization process. Therefore, the Finally, the acquisition of labels requires postprocessing, such
desirable discrete labels come out automatically under the fuzzy as K-means [10], to discretize the latent low-dimensional distri-
strategy without extra discretization operations, which conforms bution. Despite the acceptable performance of SC, its clustering
to reality. The effectiveness of our method will be demonstrated
through comprehensive experimental results. accuracy and efficiency depend on the graph construction to a
great extent [11]. In view of this, various strategies for improving
Index Terms—Anchor-based graph, fuzzy clustering, graph graph quality have been proposed to ameliorate SC, which can
divergence regularization term, local-global structure, trivial
solutions.
be roughly divided as follows.
1) Anchor graph: The widespread adoption of anchor-based
strategy in graph-based clustering allows the “sample-
I. INTRODUCTION feature-label” information chain [12] to be fully explored,
S A foundational data analysis technique of unsupervised where anchors act as intermediaries between samples and
A learning, clustering algorithm has gained a wide range of
applications in data mining [1], image processing [2], and pattern
labels. Specifically, anchors are generated as the center
of neighboring samples [13] so that they are capable of
recognition [3]. The clustering methods follow the principle representing neighborhood feature, according to which the
label of samples can be obtained.
2) Rank-constrained graph: It is hard to guarantee the eigen-
Manuscript received 24 December 2022; revised 27 June 2023; accepted 1
July 2023. Date of publication 13 July 2023; date of current version 3 January vectors with discrete value so that the latent distribution
2024. This work was supported in part by the National Natural Science Foun- is relaxed to be continuous, with discretization restoring
dation of China under Grant 61976179, Grant 61871470, and Grant 62176212. it to ideal outcome [14]. As a result, such an approximate
(Corresponding author: Feiping Nie.)
Jingyu Wang is with the School of Artificial Intelligence, OPtics and Elec- solution could lead to difficulty in obtaining the optimal
troNics (iOPEN), Key Laboratory of Intelligent Interaction and Applications, solution [15]. The feasibility to remove the extra postpro-
Ministry of Industry and Information Technology, Northwestern Polytechni- cessing has been verified [16], while only preset connected
cal University, Xi’an 710072, China, and also with the School of Astronau-
tics, Northwestern Polytechnical University, Xi’an 710072, China (e-mail: components consist in similarity graph.
[email protected]). 3) Multiview graph: Generally, the acquisition of datasets
Feiping Nie and Xuelong Li are with the School of Artificial Intelligence, is inevitably accompanied along with the generation of
OPtics and ElectroNics (iOPEN), Key Laboratory of Intelligent Interaction and
Applications, Ministry of Industry and Information Technology, Northwestern redundant features [17]. However, the weights of each
Polytechnical University, Xi’an 710072, China (e-mail: [email protected]; feature are defaulted to be consistent in SC, even though
[email protected]). the information value contained in each feature varies
Shengzhao Guo is with the School of Astronautics, Northwest-
ern Polytechnical University, Xi’an 710072, China (e-mail: guosheng widely [18]. Hence, utilizing feature legitimately based
[email protected]). on their contribution for robust graph quality is expected,
Color versions of one or more figures in this article are available at which can enhance the complementarity of information
https://doi.org/10.1109/TFUZZ.2023.3294921.
Digital Object Identifier 10.1109/TFUZZ.2023.3294921 from different views.
1063-6706 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: UNIVERSITE Cote d'Azur (Nice). Downloaded on March 12,2024 at 14:57:16 UTC from IEEE Xplore. Restrictions apply.
WANG et al.: LOCAL-GLOBAL FUZZY CLUSTERING WITH ANCHOR GRAPH 189
In recent years, multitudinous methods have been proposed descent method to acquire the clustering result. Our method
based on the graph construction mentioned above. Compared contains following three contributions:
to traditional graph utilizing the relationships between samples, 1) We impose a fuzzy constraint on the assignment ma-
anchor-based strategy is beneficial for coclustering task. trix, which corresponds to the fuzzy latent space. In this
Besides, anchor graph can effectively reduce the size of way, assignment matrix indicates the latent distribution of
similarity matrix, where the elements indicate the relationship samples and anchors, along with the fuzzy membership
between sample-anchor pairs [19]. For instance, Cai et al. [20] belonging to clusters. That is, subspace learning and clus-
combined anchor graph and singular value decomposition to tering can be completed in one step, which is synergistic
accelerate SC, where the running time and storage consumption for approaching the optimal solution.
are proportional to the size of data. Inspired by rank-constrained 2) A novel graph divergence regularization term is proposed
graph, Li et al. [21] optimized the similarity graph to meet rank to further preserve global structure, which corresponds to
constraint, where labels can be obtained without postprocessing. the intersample and interanchor variance. Based on local
Meanwhile, the projected matrix can be updated for unraveling graph learning, it can balance local and global structure
explicit cluster structure, which can mitigate the negative effect and addresses the trivial solutions in fuzzy prototype, so
of noise and irrelevant information. To sufficiently explore the that latent distribution is able to exhibit more discrimina-
information of key features, Zhou et al. [22] optimized multiview tive cluster structure.
graph and dictionary learning in turn, which fully utilized the 3) Instead of performing eigenvalue decomposition and post-
discriminant information in low-dimensional distribution. processing successively, the objective function is opti-
Nevertheless, majorizations are still expected for the sake mized by a coordinate descent method alternately. Experi-
of improving clustering performance, and many efforts have ments can verify that most (if not all) elements in indicator
been made to synthesize the innovations mentioned above. For matrix are discrete after optimization by enhancing the
instance, Nie et al. [23] guaranteed the anchor graph with specific credibility of label information, according to which we
connected components with a global rank constraint imposed can obtain the labels directly.
on the Laplacian matrix, which guarantees both clustering We organize the rest of this article in the following order.
efficiency and accuracy. Lu et al. [24] adopted anchor-strategy First, graph-based clustering, anchor-based clustering, and fuzzy
into multiview graph for handling large-scale datasets with clustering are reviewed in Section II, especially the methods
multiple features. Wang et al. [25] combined rank constraint proposed with anchor graph. Second, we introduce our method
and multifeature fusion to further improve the graph quality, and the corresponding optimization process in Section III,
which has capability to reveal cluster structure more correctly. while comprehensive analysis is shown in detail in Section IV.
However, the patterns of different clusters are similar to each Third, the results of comprehensive experiments are presented in
other in most real-world situations [26], while graph-based Section V to verify the theoretical analysis and effectiveness of
methods perform clustering with hard assignment. As a re- our method. Finally, the content of this article is summarized in
sult, it remains a challenge to extend graph-based clustering Section VI, including the future work on it.
to fuzzy assignment tasks. In light of this, fuzzy graph clus-
tering [27] adopts fuzzy strategy to make graph-based clus- II. PRELIMINARIES
tering more flexible for exploring complex manifold structure.
Inspired by anchor-based strategy, Yang et al. [28] accelerated In this section, we briefly introduce the notations involved
the large-scale hyperspectral clustering and maintained satisfied in this article. Besides, SC, anchor-based clustering and fuzzy
performance as well with fuzzy clustering. Notwithstanding, it clustering, especially the fuzzy clustering methods performed
is preferable to balance both local and global information for with anchor graph, are succinctly reviewed.
competitive performance [29], which can be drawn from analysis
of following two aspects. On one hand, anchor graph mainly A. Notations
focuses on the local structure, while the intersample and inter- Given a matrix W ∈ Rn×d , in which (i, j)-element is wij ,
anchor information are not fully utilized. On the other hand, soft wi is its ith row. wiT and WT are the transpose of vector wi and
membership learning has difficulty in revealing explicit cluster matrix W , respectively. Matrix In ∈ Rn×n represents identity
structure, due to the neglect of relationship among clusters. matrix and 1n ∈ Rn×1 is a column vector of all ones. W ≥ 0
To address these referred deficiencies, we propose the local- and wi ≥ 0 mean all elements in W and wi are nonnegative.
global fuzzy clustering with anchor graph (LGFCA). First, we For visual differentiation, matrices are represented by boldface
generate anchors and construct anchor graph to preserve the uppercase letters while vectors and scalar values are represented
dual relation between samples and features. Second, the fuzzy by lowercase letters. Besides, the main mathematical notations
constraint is imposed on the indicator matrix, where the rep- used in the article are listed in Table I.
resentation in the latent space also indicates the membership
belonging to different clusters. Third, the graph divergence
regularization term is introduced to avoid trivial solutions, which B. Spectral Clustering
also maximizes the differences among clusters for strengthening As a classical method in graph-based clustering, SC explores
the discriminative information. Finally, we present a coordinate the similarity relation between pairs of samples to learn the graph
190 IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 32, NO. 1, JANUARY 2024
TABLE I zj . Thus, we can define the similarity matrix S and the degree
MAIN MATHEMATICAL NOTATIONS INVOLVED IN THIS ARTICLE
matrix D as

B Dx
S= , D = (1)
BT Dz
where Dx ∈ Rn×n and Dz ∈ Rm×m are diagonal matrices, of
which
the ith and the jth diagonal elements are j b ij and
i bij , respectively. In practice, eigenvalue decomposition [36]
or singular value decomposition [37] is performed to learn
the approximate solution because it is NP-hard to solve the
graph partition directly. After that, postprocessing is required to
transform the matrix F into discrete one. For instance, K-means
performs clustering on the latent representation of samples to
gain the labels [38].
In general, it is difficult to evaluate the quality of this approx-
imate solution, which follows the relaxed-discrete strategy [39].
Moreover, hard assignment cannot deal with the samples located
partition [30]. In general, the primal data are organized into the at the boundary of different clusters, which happens in most
form of matrix. The samples in matrix correspond to vertices real-world situations.
in the original space, which is determined by its features. The
distance between vertices indicates the similarity relationship D. Fuzzy Clustering
between samples, where neighboring points tend to have the
same labels. Varies distance measurements have been proposed, Generally speaking, conventional graph-based clustering
among which Euclidean distance is the most popular [31]. In methods only perform well when clusters are well separated.
this way, the edges indicate the mutual distance among vertices, Motivated by this, fuzzy clustering has been widely used in many
where pair of points with greater weight are considered to fields, which exhibits the most natural relation between samples
be more similar to each other. With the similarity graph, SC and clusters. Different from hard assignment, samples belong
performs eigenvalue decomposition for embedding the original to multiple clusters with different degrees in fuzzy assignment,
data into the latent space [32]. However, the construction of which is desirable for the interpretability of the results [40]. Re-
similarity graph and the eigenvalue decomposition on it are cently, numerous works have been proposed to get the utmost out
extremely time consuming, which makes it infeasible to handle of fuzzy set theory combined with graph-based clustering [41].
large-scale data [33]. Besides, dual relation between samples Dong et al. [42] performed hierarchical clustering algorithm
and features is ignored in SC, which limits its performance. with fuzzy connectedness, which enables it to reveal the cluster
structure of arbitrary shapes. Yang et al. [43] performed fuzzy
C. Anchor-Based Clustering graph embedding clustering with the regularization term, which
makes the membership of samples belonging to the clusters more
Recent studies on anchor-based strategy can achieve efficient reasonable. Recently, Nie et al. [44] provided the reference of
graph construction, which takes full advantage of the sample- performing similarity graph learning between sample-anchor
feature duality [34]. In general, anchors can be seen as the com- pairs and fuzzy membership learning jointly based on anchor
bination of neighboring samples, while representing the features graph, which reduces the computational burden significantly.
of them. Denote n, m, and d as the number of samples, anchors, Notably, the intersample and interfeature relationship are not
and features, respectively. The original data and the anchors fully used in anchor-based strategy and difficulty exists in fuzzy
selected from it can be defined as X = [x1 , x2 , . . ., xn ] ∈ Rn×d strategy to mine the discrepancy between clusters. As a result,
and Z = [z1 , z2 , . . ., zm ]T ∈ Rm×d . Anchor generation strate- discriminative information is expected to be explored for im-
gies commonly includes K-means and random selection, while proving the clustering performance.
former cannot guarantee the satisfied selection performance.
For further improving efficiency, a novel generation method III. METHODOLOGY
abbreviated as BKHK [35] is introduced to achieve better rough
coverage of the anchors, where the number of anchors are Here, the LGFCA will be proposed to settle the drawbacks
selected based on the binary tree strategy. Although BKHK can mentioned before, where subspace learning is conducted along
achieve anchor generation with lower computation complexity with clustering in one step. To solve the problem, we introduce
while maintaining relatively high performance, it is not conve- a coordinate descent method in details and present the theory
nient to find the optimal setting of m. Therefore, K-means++ analyses on it.
is adopted to gain anchors in this article. With the number of k
nearest anchors for each sample set in advance, the construction A. Motivation
of anchor graph B can be completed in a parameter-free way, Generally, the optimal graph partition is NP-hard to be solved
which is elaborated in Section III-C. The value of weight bij on with discrete constraint on the indicator matrix. As a conse-
the edges indicate the strength of relationship between xi and quence, an approximate solution is proposed to deal with this
issue, which relaxes the indicator matrix so that we can write LGFCA will be discussed in Section IV, which is corresponding
the problem as to the experimental results in Section V.
⎛ ⎞
T
F F ⎠ B. Optimization
min T r ⎝ Ls
G G The concrete optimization steps are as follows. Denote Y =
[F; G]T ∈ R(n+m)×c as the representation of samples and an-
s.t. FT F = Ic ; GT G = Ic (2) chors, where yi is the row vector of Y. For the sake of conve-
where Ls = D − S and Ic ∈ Rc×c is the identity matrix. F = nience, the solution set satisfying the constraint F1c = 1n , F ≥
[f1 , f2 , . . ., fn ]T ∈ Rn×c and G = [g1 , g2 , . . ., gm ]T ∈ Rm×c 0; G1c = 1m , G ≥ 0 is represented hereinafter by Ω. Then, we
can be seen as the orthogonal representation of samples and can define the objective function as
anchors in the latent space. Subsequently, postprocessing is min YT (Ls − λHn+m ) Y. (5)
required to transform the matrix F into discrete ones. However, F,G∈Ω
some critical information gets lost after these two separate steps, Substitute Ls , Hn+m into the (6), we can define
which causes negative effect on clustering performance [45]. In
addition to that, membership exists between one sample and Lm = Ls − λHn+m
multiple clusters in most (if not all) cases. As a consequence,
1
anchor-based clustering with hard assignment has difficulty in = (D − S) − λ In+m − 1n+m 1Tn+m
n+m
handling overlapping datasets.
In response to this, we replace the orthogonal constraint on 1
= (D − λIn+m ) − S − λ 1n+m 1Tn+m
the indicator matrix with fuzzy constraint, which can be written n+m
as
⎛ ⎞ = Dm − S m (6)
T
F F ⎠ where Dm is defined as diagonal
min T r ⎝ Ls matrix. Noting that the diagonal
G G element Dm,ii equals to the n+m j=1 Sm,ij for each i, we can
transform (5) as
s.t. F1c = 1n , F ≥ 0; G1c = 1m , G ≥ 0. (3)
n+m n+m
In this way, F and G are embedded into a novel low-dimensional min YT Lm Y ⇔ min yi − yj 22 mij (7)
F,G∈Ω F,G∈Ω
space named fuzzy latent space, where elements can be seen as i=1 j=1
the representation of samples and anchors in the latent space where yi ∈ Y, i = 1, 2, . . ., n + m. According to the value of i
as well as the membership indicator matrix. Thus far, we can and j, the mij can be verified as
complete subspace learning and clustering simultaneously by
λ
optimizing (3), where labels can be obtained directly according − n+m i, j ∈ [1, . . ., n] or [n + 1, . . ., n + m]
mij = λ
to the representation of samples without postprocessing. bij − n+m else.
Obviously, trivial solutions exist in (3), that is, all row vectors (8)
of F and G are set equal to each other, where the variance of them Significantly, yi , i = 1, 2, . . ., n + m are independent of each
is set as zero. Therefore, the graph divergence regularization other in (7), so we can update the vectors yi in turn until the
term is introduced to avoid the trivial solutions. The problem objective function converges to obtain the optimal results. We
then become can then rewrite (7) as
⎛ ⎞ ⎛ ⎞
T T n+m
F F ⎠ F F ⎠
min T r ⎝ Ls − λT r ⎝ Hn+m min yi − yj 22 mij i = 1, 2, . . ., n + m. (9)
G G G G F,G∈Ω
j=1,i=j
s.t. F1c = 1n , F ≥ 0; G1c = 1m , G ≥ 0 (4) Subsequently, (9) becomes

n+m
1
where Hn+m = I − n+m 1n+m 1Tn+m ∈ R(n+m)×(n+m) is the min yi − yj 22 mij
centering matrix, λ is the tradeoff parameter to balance the local F,G∈Ω
j=1,i=j
and global structure learning. By setting λ with appropriate
n+m
value, the samples are assigned together with others similar to
them while the discrepancies among different clusters are kept ⇔ min mij (yiT yi − 2yiT yj + yjT yj )
F,G∈Ω
j=1,i=j
as great as possible.
Thus, the neighboring relationship between anchors and sam- n+m n+m
ples are kept while the latent representation exhibits more ex- ⇔ min mij yiT yi − 2 mij yiT yj
F,G∈Ω
plicit cluster structure by optimizing (4). Besides, the discrete j=1,i=j j=1,i=j
labels can be obtained naturally under fuzzy constraint without n+m
approximate solution, which is desirable for clustering [46]. + mij yjT yj
The importance of regularization term and the effectiveness of j=1,i=j
n+m n+m
j=1,i=j mij yj Algorithm 1: Local-Global Fuzzy Clustering With Anchor
⇔ min mij yiT yi − 2yiT n+m + ···
F,G∈Ω
j=1,i=j mij
Graph.
j=1,i=j
n+m
Input: Samples X ∈ Rn×d , the
number of clusters c, the number of anchors
⇔ min mij yi − ȳi 22 i = 1, 2, . . ., n + m
F,G∈Ω m, the number of nearest anchors k for each sample
j=1,i=j
and the regularization parameter λ.
⇔ min ωi yi − ȳi 22 i = 1, 2, . . ., n + m (10) Output: The predicted labels for samples.
F,G∈Ω
1: Generate anchors to capture the simple-feature relation
n+m
mij yj n+m 2: Establish the anchor graph B.
where ȳi = j=1,i=j
n+m and ωi = j=1,i=j mij . The value 3: Initialize indicator matrix Y ∈ R(n+m)×c by random
j=1,i=j mij
of yjT yj is excusably omitted for it is irrelevant to the updating assignment.

of fi . According to the Theorem 1, (10) can be optimized in 4: while not converge do
alternating iterations. 5: Update Y according to (4);
Theorem 1: Denote the vector u ∈ Rc×1 and v ∈ Rc×1 . The 6: end while
problems with the same form as (11) have the optimal solutions 7: Obtain the labels according to fuzzy assignment
according to the different value of coefficient σ matrix F.
min σ u − v22
learned by solving the following problem:
s.t. uT 1c = 1, u ≥ 0, v T 1c = 1, v ≥ 0. (11) m m
min xi − zj 22 bij + η b2ij
Proof: It is worth noting that it is meaningless while σ = 0,
j=1 j=1
which will not be discussed here.
Optimize u while: σ > 0 : Since the value of σ does not affect s.t. bTi 1m = 1, bi ≥ 0 (14)
the optimization result, thus, the original function can be further
written as where η controls the sparsity of bi , with larger η corresponding
to more nonzero values in bi . Therefore, sparse bi learning with
1 exact k nonzero values can be described as
min u − v22
2
max η
s.t. uT 1c = 1, u ≥ 0, v T 1c = 1, v ≥ 0 (12)
s.t. b∗i 0 = k (15)
which can be solved according to an iterative algorithm [47] and
where b∗i is the optimal solution to (14). Denote hij = xi −
takes O(clog(c)) to obtain the optimal solution.
zj 22 and hi as a vector with the jth element as hij . Nie et al.
Optimize a while: σ < 0 : Similar to the optimization while
[48] proved that the value of η can be set adaptively while the
σ > 0, the original minimum optimization problem is trans-
optimal solution of b∗ij can be given as
formed into a maximum optimization problem as

hi,k+1 −hij
1 1 k j≤k
min − u − v22 ⇔ max u − v22 b∗ij = khi,k+1 − r=1 hir (16)
2 2 0 j > k.
⇔ max uT u − 2uT v + v T v Then, the anchor graph affinity matrix B can be obtained
s.t. u 1c = 1, u ≥ 0, v 1c = 1, v ≥ 0
T T
(13) according to the number of anchors m and the number of nearest
anchors k for each sample. Thereafter, the similarity matrix S
where the value of v T v has nothing to do with the optimization defined as (1) can be constructed and fixed in the following
of u. Obviously, uT u takes the maximum value when only one procedure.
element of u is set as 1. Meanwhile, only when ur = 1, r = Initialization to fuzzy assignment matrix Y: The indicator
1, 2, . . ., m and vr is the smallest in vector v could uT v take the matrix Y can be initialized via random assignment subjected
minimum value. to the fuzzy constraint Ω. To be more specific, the elements in
matrix Y are set as nonnegative numbers stochastically
at first
y
and initialized subsequently as m ij yij so that m j=1 yij = 1.
j=1
C. Initialization Although the indicators F and G are randomly set to be
Initial Anchor Graph Construction B: In the proposed model, continuous, most (if not all) elements of them are discrete
an initial anchor graph affinity matrix B is required for mining after optimization. The ith sample belongs to the rth cluster
latent label. It is desirable that the initialization of B follows a while fir is the maximum element in fi . The final algorithm
similar probabilistic neighborhood construction method as the goes like Algorithm 1, where the objective function reaches the
learning of Y, where data points with small distances should convergence condition as the iteration progresses. Immediately,
have a higher probability to be neighbors. Combined with the some comprehensive analysis are carried out from Section IV,
2 -norm regularization to eliminate trivial solutions, B can be including complexity analysis and ablation analysis.
IV. THEORETICAL ANALYSIS Hence, we can transform (4) as

A. Complexity Analysis
n m n+m n+m
1
The time complexity of LGFCA will be discussed in detail min 2 fi −gj 22 bij −λ . yi −yj 22
here, where the big O indicates the computational complexity F,G∈Ω
i=1 j=1 i=1 j=1
n + m
of algorithms. The complexity analysis of our method contains (19)
following four parts: For further optimization, the second item of (19) can be
1) Anchors generation: It generally takes O(nmdt0 ) time for rewritten as
K-means++ to generate anchors, where the iteration times
n+m n+m
of Kmeans++ is denoted by t0 .
2) The construction of B: It generally takes O(nmd + yi − yj 22
i=1 j=1
nmlog(m)) to construct anchor graph according to
[48], where computing hij = xi − zj 22 needs O(nmd) ⇓
and arranging its elements in descending order takes n m n n m m
O(nmlog(m)). Generally, satisfied results can be
2 fi −gj 22 + fi −fh 22 + gj −gl 22 .
achieved with a fat lot anchors so that the time complexity i=1 j=1 i=1 h=1 j=1 l=1
of sorting can be ignored compared to other steps.
3) Fuzzy membership initialization: Initialize the indicator jA jB
matrix Y takes O(nmc) according to Section III-C. (20)

4) Fuzzy membership learning: According to the optimiza-
tion procedure in Section III-B, computing ȳi needs It is apparent that the introduction of graph divergence regu-
O((n + m)c)). It is noteworthy that the value of ωi deter- larization term consists of two parts, where term jA affects on
mines the updating strategy. For situation where ωi < 0, sample-anchor pairwise relationship, and term jB acts on inter-
update row vector yi ∈ R1×c takes O(clog(c)) to obtain sample relationship and interanchor relationship. Substituting it
the vector ȳi arranged in ascending order. For situation into the original (19), we can transform it into
where ωi > 0, update row vector yi ∈ R1×c takes O(c) to
select the smallest element in ȳi . In this way, updating each n m
λ
rows yi , i ∈ [1, 2, . . ., n + m] in Y take O((n + m)((n + min 2 bij − fi − gj 22
F,G∈Ω n + m
m)c + clog(c))) (ωi > 0) or O((n + m)((n + m)c + c)) i=1 j=1
(ωi < 0). m m

λ
n n
λ
Normally, the number of labels c satisfies c (n + m) so − gj − gl 22 − fi − fh 22 .
n+m n+m
that the overall time complexity of LGFCA in Algorithm 1 is j=1 l=1 i=1 h=1
O((n + m)2 ct1 ), where t1 is the number of iterations in step 5. (21)
In general, our method converges after 30 iterations so that its
computational complexity is less than SC with O(n2 d + n3 ), Next, we will elaborate on the function of term jA and
especially when the dimension of data d is pretty high. term jB , respectively. First, the term jA naturally drives the
latent representation to form clusters. The distance between fi
B. Ablation Analysis and gj in the latent space is expected to be small as possible
λ
The graph divergence regularization term is introduced to where bij − n+m ≥ 0, which is corresponding to the structure
avoid trivial solutions and reveal the local-global structure of of anchor graph learned from the original space. At the same
data. Here, we will analyze the function of the graph divergence time, the distance between samples and anchors are maximized
λ
regularization term in details. Since while bij − n+m ≤ 0, which means that there is no or weak
neighboring sample-anchor relation. Second, the term jB is
T r YT Hn+m Y introduced to fully mine the discriminative information, where
the patterns of different clusters are kept as dissimilar as possible.
1
⇔ T r YT I − 1n+m 1Tn+m Y It should be noted that the local structure will get lost in the
n+m latent space with too much emphasis placed on discriminative
n+m n+m
1 information.
⇔2 yi − yj 22 . (17) In conclusion, the local structure of original data is main-
n+m
i=1 j=1 tained in latent representation with anchor graph, while the
Similarly, we can rewrite graph divergence regularization term maximizes the variance
of latent representation, which enables our method to explore
T r Y T Ls Y more separable distribution. The characteristic that labels can be
obtained without any postprocessing be proved on toy datasets
⇔ 2T r FT BG
and real-world datasets in Section V. To observe the influence of
n m fuzzy constraint and the graph divergence regularization term,
⇔4 fi − gj 22 bij . (18) we design ablation experiments and the corresponding results
i=1 j=1 are exhibited in Section V-E.
TABLE II 3) Purity is the percentage of correct labels and can be defined

STATISTICS OF REAL BENCHMARK DATASETS
as
c
1
Purity = max |map(ci ) ∩ lj | (24)
n 1≤j≤c
i=1
which signifies the consistency of labels [57].
B. Validation on Toy Example

For intuitive visualization, two moon toy data and initial
spheres toy data are exploited for analyzing our method, which
are shown in Figs. 1(a) and 2(a). It is evident that two classes exist
in two moon toy data while initial spheres toy data containing
V. EXPERIMENTS three classes, where each class consists of 100 samples with 0.12
noise for both of them.
In this section, the comprehensive experiments are carried As long as the parameters involved in our method are set
out on benchmark datasets to evaluate our method. First, ex- in a reasonable range, we can always maintain the satisfied
perimental scheme is introduced briefly, including datasets and clustering performance. Figs. 1(b) and 2(b) show the distribution
metrics used in this section. Second, the process of anchor graph of anchors selected in two toy examples with m = 8 and k = 2,
construction is exhibited graphically on two toy datasets, along respectively. In order to distinguish the anchor points from
with the visual verify of iteration analysis. Third, the sensitivity the sample points, anchors points are represented by the red
of parameters involved in this method and the convergence diamonds, which are evenly distributed in each class. After
experiment are illustrated on ten real-world datasets. Finally, that, the visual construction of anchor graph is exhibited in
ablation experiments and comparison results validate the effec- Figs. 1(c) and 2(c). Despite the connection between samples
tiveness and predominance of our method, where SC (normal- and anchors from different classes, the ACC can reach 99.50%
ized cut) [30], LSC [49], FSC [50], FRWL [13], SEC [51], and 99.75% for two moon toy data and spheres toy data with
SEANC [52], ULGE [53], and LAPIN [54] are selected as λ = 0.2. Figs. 1(d) and 2(d) show the result of optimized indi-
comparison. cator matrix F in these two toy examples, where labels can be
obtained directly without extra postprocessing. Nevertheless, the
A. Experimental Scheme presence of noisy data will seriously interfere with clustering,
such as the 102th sample in two moon toy data and the 33th,
Before the introduction of comprehensive experiments, we
178th samples in the spheres toy data, which are marked with
first list the real-world datasets used in details on Table II.
black boxes in Figs. 1(a) and 2(a).
Besides, we use the following three popular evaluation metrics
In order to further verify the functionality of our method
to evaluate the performance for all the clustering methods.
to maintain the local and global structure of data, Figs. 3(a)
1) Clustering accuracy (ACC) is the shortform of clustering
and 4(a) exhibits the change of indexes during the iteration
accuracy, which is the percentage of correctly clustered
process, including the percentage of neighboring distance, vari-
samples as the evaluation measurement criteria [55].
ance of latent representation, and ACC. The percentage of
n neighboring distance is expected to be minimized so that the
δ (li , map (ci ))
ACC = i=1 (22) local structure of data can be kept in the latent space, which is
n
defined as
where n is the number of samples and δ(a, b) equals 1 if 2
a = b and equals 0, otherwise. li is the ground truth of i,j∈δ fi − gj 2
n m 2 × 100% (25)
the ith sample and ci is the predicted label. map(·) is a i=1 j=1 fi − gj 2
function that maps computed clustering label index to the
where i, j ∈ δ indicates that sample point xi is located within
best truth class label.
neighborhood of anchor point yj in the initial anchor graph
2) Normalized mutual information(NMI) is calculated by
B while bij = 0. The denominator in (25) is set as the graph
M I(C, C ) divergence regularization term jA and the variance of latent
NMI = (23) representation is defined as (20), both of which correspond to
max(H(C), H(C ))
the global structure of latent distribution.
where C is a set of clusters obtained from the true labels Local structure learning: The heatmaps of initial matrix F for
and C is a set of clusters obtained from the clustering two moon data and spheres data are shown in Figs. 3(b) and 4(b),
algorithm. M I(C, C ) is the mutual information metric, which are consists of soft labels. As shown in Figs. 3(a) and 4(a),
and H(C), H(C ) are the entropies of C and C [56], the percentage of neighboring distance gradually decreases as
respectively. NMI is between 0 and 1 while larger value the iteration goes on. The visualization of elements in matrix
of it indicates better performance. F after one iteration is exhibited in Figs. 3(c) and 4(c), which
Fig. 1. Experiments of the proposed LGFCA on Two Moon data. (a) Samples of Two Moon. (b) Anchors of Two Moon. (c) Similarity graph of Two Moon.
(d) Visualization of fuzzy indicator matrix F, also the membership value of samples to clusters.
Fig. 2. Experiments of the proposed LGFCA on Spheres data. (a) Samples of spheres. (b) Anchors of spheres. (c) Similarity graph of spheres. (d) The visualization
of fuzzy indicator matrix F, also the membership value of samples to clusters.
Fig. 3. Iterative process on Two Moon data. (a) Trend of percentage of neighboring distance (%), variance of latent representations and accuracy (%) with
iterations. (b), (c), (d) Heatmaps of fuzzy assignment matrix F with iterations.
Fig. 4. Iterative process on spheres data. (a) Trend of percentage of neighboring distance (%), variance of latent representations and accuracy (%) with iterations.
(b), (c), (d) Heatmaps of fuzzy assignment matrix F with iterations.
accounts for the decrease of variance of latent representation at Global structure learning: Furthermore, ACC is improved
the beginning of iteration. During this process, fi are assigned gradually with the increasing variance of latent representation,
equal to neighboring gj , corresponding to the local structure which corresponds to maximizing interclass differences (global
learning. structure). At the same time, Figs. 3(c) and 4(c) confirms
Fig. 5. Trend of ACC by adjusting the value of m and λ on ten real-world datasets. (a) Lymphoma. (b) Lung. (c) Glass. (d) Control. (e) JAFFE. (f) MSRA25.
(g) USPS20. (h) Digits. (i) USPS. (j) MNIST.
that even with the continuous constraints, every element of TABLE III
SETTING OF m AND λ FOR DIFFERENT DATASETS
indicator matrix F is set as discrete values after optimization,
maintaining satisfied clustering results despite of randomly
initialization.
C. Parameter Sensitivity
In this section, we focus on the analysis of parameter sen-
sitivity in our method. Except for the number of nearest an-
chors k for each sample in USPS and MNIST are set as 72,
k is set as 8 uniformly. Then, two main parameters are in-
volved in LGFCA, including the number of anchors m and On one hand, Fig. 5 portrays the trend that critical features
the value of regularization parameter λ. The value of λ in- would be lost with sparse anchors while redundant information
fluences the clustering performance by affecting the value of might be introduced with superfluous anchors, both of which
λ lead to negative effect on clustering performance. On the other
n+m , which means the setting of λ should refer to the number
of samples and anchors. Therefore, the synthetic influence of hand, better clustering results are more likely obtained with
λ, n, and m will be discussed first. Thereafter, the function smaller value of λ for small datasets and greater value of λ
of λ on exploring local-global structure will be analyzed in for USPS and MINIST due to the different sizes of datasets,
λ
detail. which matches our analysis. Generally, the value of n+m is set
−3
1) Synthetic Influence of λ and m: It is worth noting that the as η × 10 , η ∈ (0, 5] for better performance.
λ
loss of intrinsic structure happens while the value of λ is set 2) Value of λ: To begin with, we adjust the value of n+m
−3 −3 −3
λ
too large, that is, bij − n+m ≤ 0 regardless of the value of i from 0.5 × 10 to 5 × 10 with interval 0.5 × 10 for ten
and j. As the result, the choice of regularization parameter λ real-world datasets mentioned before. The setting of m for
should be adjusted according to the size of datasets. We adopt different datasets are shown in Table III, which are expected
the range of λ is set from 2 to 20 with interval 2 for USPS and to obtain better performance according to Fig. 5. After fixing
MNIST, while the range of λ is set from 0.2 to 2 with interval parameters, we record the average value of ACC in ten times re-
0.2 for other datasets. According to the complexity analysis in peated experiments. From Fig. 6, the trend of ACC are portrayed
Section IV-A, the running time will increase extremely with along with the percentage of neighboring distance influenced
λ
overly large amounts of anchors. Generally, we set m less than by n+m during the iteration process. Meanwhile, as the value
λ
1/4 of the number of samples for USPS and MNIST and less of n+m continues to increase, the percentage of neighboring
than 1/2 the number of samples for other datasets, e.g., the distance shows a general trend of growth. The reason why the
number of anchors are set from 300 to 2100 with interval 300 for ACC shows a downward trend is that excessive value of λ
USPS while 70 to 130 with interval 10 for JAFFE. To reveal the might cause loss of local structural information. Accordingly,
universal trend of clustering performance with different setting appropriate value of λ should be set according to the scale of
of parameters, we repeat ten times experiments and select the samples and anchors, while balancing the local structure learning
mean value of ACC as the clustering results, which is portrayed and separability of latent representation for better clustering
in Fig. 5. performance.
Fig. 6. Trend of ACC with percentage of neighboring distance on ten real-world datasets. (a) Lymphoma. (b) Lung. (c) Glass. (d) Control. (e) JAFFE. (f) MSRA25.
(g) USPS20. (h) Digits. (i) USPS. (j) MNIST.
Fig. 7. Trend of ACC, NMI, and objective function value with the iterations of LGFCA on ten real-world datasets. (a) Lymphoma. (b) Lung. (c) Glass.
(d) Control. (e) JAFFE. (f) MSRA25. (g) USPS20. (h) Digits. (i) USPS. (j) MNIST.
D. Convergence Experiments two toy datasets and partial real-world datasets are selected for
experiments.
The convergence of LGFCA is controlled by the absolute
value error of the objective function value in (4), which is less 1) Fuzzy Constraint on Assignment Matrix: The subspace
learning result of SC (the assignment matrix F) on two toy
than the threshold 10−4 , i.e., |obj(i) − obj(i − 1)| < 10−4 and
obj(i) denotes the objective function value at the end of the ith datasets are shown in Figs. 8(a) and 9(a), while the latent
iteration. The iterative experimental results of our method on ten distribution (the assignment matrix F and G) in LGFCA after
optimization are exhibited in Figs. 8(b) and 9(b). Although both
real-world datasets are exhibited in Fig. 7 to deeply demonstrate
the optimization process on assignment matrix, where ACC of them can obtain explicit cluster structure, the elements in F are
and NMI almost synchronously rise with the convergence of independent of labels because SC follows orthogonal constraint.
At the same time, fuzzy theory enables LGFCA to unify latent
objective function. Generally, the training of our method reaches
convergence within 30 iterations. Therefore, the trends of ACC, space and label space so that clustering results come from F and
NMI, and the value of objective function can effectively confirm G directly. Furthermore, the ACC of SC on two moon data and
spheres data reach 83.50% and 99.30%, while LGFCA achieves
the theoretical analysis.
16% and 0.45% improvement on them, respectively. Compared
with obtaining labels with extra discretization, one-step learning
processing can prevent information loss effectively, which can
E. Ablation Experiments also be verified on real-world datasets in Section V-F.
In this section, the effect of introducing fuzzy constraint and 2) Function of jA and jB : For visualizing effects of diver-
divergence regularization term will be illustrated in detail, where gence regularization term jA and jB , we design three strategies
Fig. 8. Ablation analysis on Two Moon data. (a) Latent distribution of samples in SC. (b) Fuzzy latent distribution of anchors and samples obtained in LGFCA.
(c) Fuzzy latent distribution of anchors and samples obtained in LGFCA (without jA and jB ). (d) Fuzzy latent distribution of anchors and samples obtained in
LGFCA (without jB ).
Fig. 9. Ablation analysis on spheres data. (a) Latent distribution of samples in SC. (b) Fuzzy latent distribution of anchors and samples obtained in LGFCA.
(c) Fuzzy latent distribution of anchors and samples obtained in LGFCA (without jA and jB ). (d) Fuzzy latent distribution of anchors and samples obtained in
LGFCA (without jB ).
with different forms of objective function on toy datasets as discretization of fuzzy label. Specifically, jA corresponds to the
follows. discriminative feature between samples and anchors, while jB
1) With jA and jB : The clustering results obtained in ac- adjusting the intersample and interanchor discrepancies.
cording to (4) is exhibited in Figs. 8(b) and 9(b). The red
points and blue points correspond to the latent representa-
tion of anchors and samples. The intraclass difference is F. Comparison Results
weakened while the intercluster difference is maximized, Eight clustering methods are selected for comparison to high-
which converts soft labels into discrete labels naturally. light the effectiveness of our method, which involve traditional
2) Without jA and jB : Figs. 8(c) and 9(c) show the clustering SC with normalized cut, three graph clustering methods based
results optimized according to (3). It can be seen that only on anchor strategy, and four clustering with subspace learn-
the local structure is taken into consideration, so that the ing methods. To be specific, anchor-based clustering, such as
value of elements in F fluctuates around 0.5 for two moon LSC, FSC, and FRWL, scales down similarity matrix so that
data and 0.3 for spheres data. As a result, F and G can graph clustering can be extended to large-scale data analysis.
neither exhibit explicit cluster structure, nor indicate the Clustering with subspace learning can be divided into spectral
membership correctly. embedding (e.g., SEC, SEANC, ULGE) and dictionary learning
3) Without jB : Figs. 8(d) and 9(d) present the distribution (e.g., LAPIN). Among them, ULGE combines anchor graph
of fuzzy latent representation in two toy datasets. The and spectral embedding to learn subspace more efficiently. Fur-
samples are assigned together with others similar to them, thermore, LAPIN impose rank-constraint on similarity matrix
while the distance between samples from different clusters for robust clustering, where anchor indicates relevance between
is kept greater. Nevertheless, the discriminative feature sample and feature of dictionary. Among these anchor-based
could be enhanced further. The ablation experiments on se- techniques, different strategies are applied to select anchors,
lected datasets are shown in Table V (including MSRA25, such as random selection, K-means, and BKHK [53]. In general,
USPS20, and Digits), where discrete proportion (DP) is random selection and K-means cannot guarantee the perfor-
defined as (26). It can be seen that the clustering performance while BKHK limits the number of anchors. For better
mance without term jB are obviously poor, since fuzzy comparison effect, we generate anchors with K-means++ con-
membership degree may not clearly indicate labels. sistently and construct the anchor graph uniformly for different
anchor-based clustering methods. In Table III, we itemize the
setting of m and λ for each dataset. In addition to the parameters
The number of discrete indicators in F
DP = × 100%. (26) required to construct an anchor graph, more parameters are
The number of indicators in F
involved in comparison methods. For the sake of fairness, grid
In conclusion, the divergence regularization term jA and jB search strategy is employed to seek the best parameter setting if
are both necessary for mining explicit clustering structure and necessary. Due to the missing of label information in practice,
TABLE IV
COMPARISON IN TERMS OF ACC(%), NMI(%), AND PURITY(%)
TABLE V Spectral clustering with embedding has the ability to make

EFFECTIVENESS OF TERM jB
reasonable predictions on the distribution of unforeseeable data
in subspace, which leads to decrease in clustering quality to
some extent for balancing subspace learning and clustering.
With the introduction of anchor strategy, the efficiency of graph
construction (e.g., LSC, LAPIN, and our method) is streamlined
compared to SC on large-scale datasets. However, the labels of
these methods come after subspace learning, where improve-
ment space still exist to prevent information loss due to post-
processing. Though rank constraint is beneficial for revealing
estimating the clustering performance without priori knowledge explicit clustering structure, it may have difficulty in meeting
faces difficulty. In consideration of that, we reach the reasonable ideal constraint after optimization. As a result, it takes LAPIN
assumption that the best clustering results are considered to much more time for updating dictionary matrix and coefficient
appear while the objective function is optimized more efficiently. matrix, until satisfying the exiting condition of loop (objective
For those methods where the clustering results are obtained function achieves convergence or the number of connected com-
after postprocessing, clustering performance with the lowest ponents satisfies default). By contrast, our method puts forward
objective function value during K-means is selected. In this way, fuzzy latent space, where labels can be obtained naturally from
we record the best clustering performance after repeating each the distribution of low-dimensional representation. Meanwhile,
method for ten times. enhancing discriminative information mining with local struc-
Comparison results are mainly reflected in the clustering ture learning enables fuzzy labels to indicate the membership
quality and efficiency. In detail, Table IV demonstrates the between samples and clusters more unambiguous. Anchor rep-
effectiveness of our method where the best results are bolded resent critical features in neighborhood so that clusters analysis
while the second-best results are shown underlined. Especially, can give consideration to the relationships between features and
our method is far superior to others in USPS and MNIST. samples, where the labels of anchors can be obtained as well.
The computational efficiency of the proposed method will be Different from rank-constrained strategy, the coordinate descent
evaluated in terms of running time, and three clustering methods method guarantees that LGFCA has the capability to reveal the
are selected as comparison, including SC, LSC, and LAPIN. Ad- complex data structure more efficiently. Thus, we have reason
ditionally, computational complexity of all comparison methods to believe that our method is capable of completing clustering
are reported in Table VI to reveal the efficiency of the proposed tasks better than other clustering approaches.
method and support the experiment results shown in Table VII.
TABLE VI
SUMMARY OF TIME COMPLEXITY
TABLE VII
COMPARISON IN TERMS OF RUNNING TIME (S)
VI. CONCLUSION [2] J. Wang, G. Zhang, K. Zhang, Y. Zhao, Q. Wang, and X. Li, “De-
tection of small aerial object using random projection feature with re-
In this article, we proposed a fuzzy clustering method with gion clustering,” IEEE Trans. Cybern., vol. 52, no. 5, pp. 3957–3970,
anchor-based strategy named LGFCA, which achieves subspace May 2022.
[3] C. Hwang and F. C.-H. Rhee, “Uncertain fuzzy clustering: Interval type-2
learning and clustering at the same time and maintains the fuzzy approach to C-means,” IEEE Trans. Fuzzy Syst., vol. 15, no. 1,
local-global structure of data. First, we impose a fuzzy constraint pp. 107–120, Feb. 2007.
on the indicator matrix for dealing with fuzzy clustering tasks, [4] J. Xue, F. Nie, R. Wang, and X. Li, “Iteratively reweighted algorithm for
fuzzy C-means,” IEEE Trans. Fuzzy Syst., vol. 30, no. 10, pp. 4310–4321,
where the latent representation also indicates the membership Oct. 2022.
belonging to different clusters. Second, the graph divergence [5] F. Nie, C. Liu, R. Wang, Z. Wang, and X. Li, “Fast fuzzy clustering based
term is introduced to avoid trivial solutions, which also con- on anchor graph,” IEEE Trans. Fuzzy Syst., vol. 30, no. 7, pp. 2375–2387,
Jul. 2022.
tributes to obtain a explicit cluster structure. Finally, a novel [6] C. Zhang, L. Chen, Y.-P. Zhao, Y. Wang, and C. L. P. Chen, “Graph
coordinate descent method is presented to solve the problem, in enhanced fuzzy clustering for categorical data using a Bayesian dissim-
which the row vectors of indicator matrix are updated alternately ilarity measure,” IEEE Trans. Fuzzy Syst., vol. 31, no. 3, pp. 810–824,
Mar. 2023.
until the objective function converges. Even with the fuzzy [7] C. Liu, F. Nie, R. Wang, and X. Li, “Graph based soft-balanced fuzzy clus-
constraint, the elements in indicator matrices are set as discrete tering,” IEEE Trans. Fuzzy Syst., vol. 31, no. 6, pp. 2044–2055, Jun. 2023.
after optimization, which means postprocessing is needless for [8] Z. Bian, H. Ishibuchi, and S. Wang, “Joint learning of spectral clustering
structure and fuzzy similarity matrix of data,” IEEE Trans. Fuzzy Syst.,
the acquisition of labels. The superiority of our method than vol. 27, no. 1, pp. 31–44, Jan. 2019.
others has been verified through the comprehensive experiments. [9] Z. Wang, X. Dai, P. Zhu, R. Wang, X. Li, and F. Nie, “Fast optimization of
In conclusion, our method can handle complex data structure spectral embedding and improved spectral rotation,” IEEE Trans. Knowl.
Data Eng., vol. 35, no. 2, pp. 1515–1527, Feb. 2023.
while achieving satisfied performance without strict constraints, [10] K. Allab, L. Labiod, and M. Nadif, “Power simultaneous spectral data
which also has the potential for dealing with coclustering embedding and clustering,” Proc. SIAM Int. Conf. Data Mining, 2016,
task. pp. 270–278.
[11] F. Wang, L. Zhu, C. Liang, J. Li, X. Chang, and K. Lu, “Robust optimal
Despite that our method completes subspace learning and graph clustering,” Neurocomputing, vol. 378, pp. 153–165, 2020.
clustering, the algorithm is performed based on the fixed simi- [12] I. S. Dhillon, “Co-clustering documents and words using bipartite spectral
larity graph, which is unable to correctly extract labels for noisy graph partitioning,” in Proc. 17th ACM SIGKDD Int. Conf. Knowl. Discov.
Data Mining, 2001, pp. 269–274.
points. To be specific, the fuzzy membership learning depends on [13] C.-L. Wang, F. Nie, R. Wang, and X. Li, “Revisiting fast spectral clustering
the anchor selection results. As a consequence, robust similarity with anchor graph,” in Proc. IEEE Int. Conf. Acoust. Speech Signal
graph learning is of vital importance for better clustering quality. Process., 2020, pp. 3902–3906.
[14] X. Li, M. Chen, and Q. Wang, “Adaptive consistency propagation method
Furthermore, the representation of samples in fuzzy latent space for graph clustering,” IEEE Trans. Knowl. Data Eng., vol. 32, no. 4,
cannot provide reference for out-of-sample clustering. There- pp. 797–802, Apr. 2020.
after, these limitations mentioned are considered as our future [15] Z. Wang, Z. Li, R. Wang, F. Nie, and X. Li, “Large graph clustering with
simultaneous spectral embedding and discretization,” IEEE Trans. Pattern
work. Anal. Mach. Intell., vol. 43, no. 12, pp. 4426–4440, Dec. 2021.
[16] F. Nie, X. Wang, M. I. Jordan, and H. Huang, “The constrained Laplacian
REFERENCES rank algorithm for graph-based clustering,” in Proc. Conf. AAAI Artif.
Intell., 2016, pp. 1969–1976.
[1] Q.-T. Bui et al., “SFCM: A fuzzy clustering algorithm of extracting [17] J. Wang, H. Wang, Z. Ma, L. Wang, Q. Wang, and X. Li, “Unsupervised
the shape information of data,” IEEE Trans. Fuzzy Syst., vol. 29, no. 1, hyperspectral band selection based on hypergraph spectral clustering,”
pp. 75–89, Jan. 2021. IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, 2022.
[18] Z. Li, F. Nie, X. Chang, Y. Yang, C. Zhang, and N. Sebe, “Dynamic [42] Y. Dong, Y. Zhuang, K. Chen, and X. Tai, “A hierarchical clustering
affinity graph construction for spectral clustering using multiple features,” algorithm based on fuzzy graph connectedness,” Fuzzy Sets Syst., vol. 157,
IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 12, pp. 6323–6332, no. 13, pp. 1760–1774, 2006.
Dec. 2018. [43] X. Yang, W. Yu, R. Wang, G. Zhang, and F. Nie, “Fast spectral clustering
[19] X. Chen, W. Hong, F. Nie, D. He, M. Yang, and J. Z. Huang, “Spectral learning with hierarchical bipartite graph for large-scale data,” Pattern
clustering of large-scale data by directly solving normalized cut,” in Recognit. Lett., vol. 130, pp. 345–352, 2020.
Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2018, [44] C. Liu, F. Nie, R. Wang, and X. Li, “Scalable fuzzy clustering with anchor
pp. 1206–1215. graph,” IEEE Trans. Knowl. Data Eng., vol. 35, no. 8, pp. 8503–8514,
[20] D. Cai and X. Chen, “Large scale spectral clustering via landmark-based Aug. 2023.
sparse representation,” IEEE Trans. Cybern., vol. 45, no. 8, pp. 1669–1680, [45] K. Allab, L. Labiod, and M. Nadif, “Simultaneous spectral data embedding
Aug. 2015. and clustering,” IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 12,
[21] Z. Li, F. Nie, X. Chang, L. Nie, H. Zhang, and Y. Yang, “Rank-constrained pp. 6396–6401, Dec. 2018.
spectral clustering with flexible embedding,” IEEE Trans. Neural Netw. [46] F. De la Torre and T. Kanade, “Discriminative cluster analysis,” in Proc.
Learn. Syst., vol. 29, no. 12, pp. 6073–6082, Dec. 2018. Int. Conf. Mach. Learn., 2006, pp. 241–248.
[22] R. Zhou, X. Chang, L. Shi, Y.-D. Shen, Y. Yang, and F. Nie, “Person [47] J. Huang, F. Nie, and H. Huang, “A new simplex sparse learning model
reidentification via multi-feature fusion with adaptive graph learning,” to measure data similarity for clustering,” in Proc. 24th Int. Conf. Artif.
IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 5, pp. 1592–1601, Intell., 2015, pp. 3569–3575.
May 2020. [48] Q. Wang, Y. Miao, M. Chen, and Y. Yuan, “Spatial-spectral clustering with
[23] F. Nie, X. Wang, C. Deng, and H. Huang, “Learning a structured optimal anchor graph for hyperspectral image,” IEEE Trans. Geosci. Remote Sens.,
bipartite graph for co-clustering,” in Proc. Adv. Neural Inf. Process. Syst., vol. 60, pp. 1–13, 2022.
2017, pp. 4132–4141. [49] B. Yang, X. Zhang, F. Nie, and F. Wang, “Fast multiview clustering with
[24] J. Lu, F. Nie, R. Wang, and X. Li, “Fast multiview clustering by optimal spectral embedding,” IEEE Trans. Image Process., vol. 31, pp. 3884–3895,
graph mining,” IEEE Trans. Neural Netw. Learn. Syst., early access, March 2022.
23, 2023, doi: 10.1109/TNNLS.2023.3256066.. [50] W. Zhu, F. Nie, and X. Li, “Fast spectral clustering with efficient large graph
[25] Q. Wang, R. Liu, M. Chen, and X. Li, “Robust rank-constrained sparse construction,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process.,
learning: A graph-based framework for single view and multiview cluster- 2017, pp. 2492–2496.
ing,” IEEE Trans. Cybern., vol. 52, no. 10, pp. 10228–10239, Oct. 2022. [51] F. Nie, Z. Zeng, I. W. Tsang, D. Xu, and C. Zhang, “Spectral embedded
[26] X. Zhao, F. Nie, R. Wang, and X. Li, “Robust fuzzy K-means clustering clustering: A framework for in-sample and out-of-sample spectral cluster-
with shrunk patterns learning,” IEEE Trans. Knowl. Data Eng., vol. 35, ing,” IEEE Trans. Neural Netw., vol. 22, no. 11, pp. 1796–1808, Nov. 2011.
no. 3, pp. 3001–3013, Mar. 2023. [52] Q. Wang, Z. Qin, F. Nie, and X. Li, “Spectral embedded adaptive neigh-
[27] Y. Peng, X. Zhu, F. Nie, W. Kong, and Y. Ge, “Fuzzy graph clustering,” bors clustering,” IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 4,
Inf. Sci., vol. 571, pp. 38–49, 2021. pp. 1265–1271, Apr. 2019.
[28] X. Yang, Y. Xu, S. Li, Y. Liu, and Y. Liu, “Fuzzy embedded clustering based [53] F. Nie, W. Zhu, and X. Li, “Unsupervised large graph embedding,” in Proc.
on bipartite graph for large-scale hyperspectral image,” IEEE Geosci. 21th Conf. AAAI Artif. Intell., 2017, pp. 2422–2428.
Remote Sens. Lett., vol. 19, pp. 1–5, 2022. [54] F. Nie, W. Chang, R. Wang, and X. Li, “Learning an optimal bipartite
[29] J. Han, H. Liu, and F. Nie, “A local and global discriminative framework graph for subspace clustering via constrained Laplacian rank,” IEEE Trans.
and optimization for balanced clustering,” IEEE Trans. Neural Netw. Cybern., vol. 53, no. 2, pp. 1235–1247, Feb. 2023.
Learn. Syst., vol. 30, no. 10, pp. 3059–3071, Oct. 2019. [55] R. Wang, F. Nie, Z. Wang, H. Hu, and X. Li, “Parameter-free weighted
[30] U. V. Luxburg, “A tutorial on spectral clustering,” Statist. Comput., vol. 17, multi-view projected clustering with structured graph learning,” IEEE
no. 4, pp. 395–416, 2007. Trans. Knowl. Data Eng., vol. 32, no. 10, pp. 2014–2025, Oct. 2020.
[31] J. Wang, Z. Ma, F. Nie, and X. Li, “Entropy regularization for unsupervised [56] Z. Bian, F.-L. Chung, and S. Wang, “Fuzzy density peaks clustering,” IEEE
clustering with adaptive neighbors,” Pattern Recognit., vol. 125, 2022, Trans. Fuzzy Syst., vol. 29, no. 7, pp. 1725–1738, Jul. 2021.
Art. no. 108517. [57] J. Han, K. Xiong, F. Nie, and X. Li, “Structured graph reconstruction
[32] Y. Pang, J. Xie, F. Nie, and X. Li, “Spectral clustering by joint spectral for scalable clustering,” IEEE Trans. Knowl. Data Eng., vol. 33, no. 5,
embedding and spectral rotation,” IEEE Trans. Cybern., vol. 50, no. 1, pp. 2252–2265, May 2021.
pp. 247–258, Jan. 2020.
[33] C. Fowlkes, S. Belongie, F. Chung, and J. Malik, “Spectral grouping using
the Nystrom method,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26,
no. 2, pp. 214–225, Feb. 2004. Jingyu Wang (Member, IEEE) received the Ph.D.
[34] J. Xue, F. Nie, R. Wang, and X. Li, “Rank-r discrete matrix factorization degree in signal, image and automatic from the Uni-
for anchor graph clustering,” IEEE Trans. Knowl. Data Eng., vol. 35, no. 7, versité Paris-Est, Paris, France, in 2015.
pp. 7371–7381, Jul. 2023. He is currently a Full Professor with the School of
[35] J. Wang, Z. Ma, F. Nie, and X. Li, “Progressive self-supervised clustering Astronautics and the School of Artificial Intelligence,
with novel category discovery,” IEEE Trans. Cybern., vol. 52, no. 10, OPtics and ElectroNics (iOPEN), Northwestern Poly-
pp. 10393–10406, Oct. 2022. technical University, Xi’an, China. His research inter-
[36] J. Zhu, L. Tao, H. Yang, and F. Nie, “Unsupervised optimized bipar- ests include information processing, computer vision
tite graph embedding,” IEEE Trans. Knowl. Data Eng., vol. 35, no. 3, and intelligent perception.
pp. 3224–3238, Mar. 2023.
[37] R. Wang, F. Nie, and W. Yu, “Fast spectral clustering with anchor graph
for large hyperspectral images,” IEEE Geosci. Remote Sens. Lett., vol. 14,
no. 11, pp. 2003–2007, Nov. 2017.
[38] D. Shi, L. Zhu, Y. Li, J. Li, and X. Nie, “Robust structured graph
clustering,” IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 11,
pp. 4424–4436, Nov. 2020. Shengzhao Guo is currently working toward the M.E.
[39] Y. Liu, Y. Cai, X. Yang, F. Nie, and W. Ye, “Fast adaptive neighbors clus- degree with the School of Astronautics, Northwestern
tering via embedded clustering,” Neurocomputing, vol. 399, pp. 331–341, Polytechnical University, Xi’an, China.
2020. His research interests include machine learning and
[40] F. Nie, X. Zhao, R. Wang, X. Li, and Z. Li, “Fuzzy k-means clustering computer vision.
with discriminative embedding,” IEEE Trans. Knowl. Data Eng., vol. 34,
no. 3, pp. 1221–1230, Mar. 2022.
[41] R. Nock, P. Vaillant, C. Henry, and F. Nielsen, “Soft memberships for
spectral clustering, with application to permeable language distinction,”
Pattern Recognit., vol. 42, no. 1, pp. 43–53, 2009.
Feiping Nie (Senior Member, IEEE) received the Xuelong Li (Fellow, IEEE) is currently a Full Professor with the School of Arti-
Ph.D. degree in computer science from Tsinghua ficial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical
University, Beijing, China, in 2009. University, Xi’an, China.
He has authored more than 100 papers in
the following top journals and conferences: the
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND
MACHINE INTELLIGENCE, the International Jour-
nal of Computer Vision, the IEEE TRANSAC-
TIONS ON IMAGE PROCESSING, the IEEE TRANSAC-
TIONS ON NEURAL NETWORKS AND LEARNING SYS-
TEMS/TRANSACTIONS ON NEURAL NETWORKS, the
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, the ACM
Transactions on Knowledge Discovery from Data, the Bioinformatics, the
International Conference on Machine Learning, the Conference on Neural
Information Processing Systems, the Knowledge Discovery and Data Mining
Conference, the International Joint Conference on Artificial Intelligence, the
Association for the Advancement of Artificial Intelligence, the International
Conference on Computer Vision, the Conference on Computer Vision and
Pattern Recognition, and the ACM Multimedia. His current research interests
include machine learning and its applications, such as pattern recognition, data
mining, computer vision, image processing, and information retrieval. He is
currently serving as an Associate Editor or a PC Member for several prestigious
journals and conferences in the related fields. His papers have been cited over
5000 times (Google scholar).

Local-Global Fuzzy Clustering With Anchor Graph

Uploaded by

Copyright:

Available Formats

Local-Global Fuzzy Clustering With Anchor Graph

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Local-Global Fuzzy Clustering With Anchor Graph

Uploaded by

Copyright:

Available Formats

188 IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 32, NO.

Local-Global Fuzzy Clustering With Anchor Graph

s.t. F1c = 1n , F ≥ 0; G1c = 1m , G ≥ 0 (4) Subsequently, (9) becomes

of yjT yj is excusably omitted for it is irrelevant to the updating assignment.

IV. THEORETICAL ANALYSIS Hence, we can transform (4) as

matrix Y takes O(nmc) according to Section III-C. (20)

(ωi < 0). m m

TABLE II 3) Purity is the percentage of correct labels and can be defined

which signifies the consistency of labels [57].

B. Validation on Toy Example

TABLE V Spectral clustering with embedding has the ability to make

You might also like