Individuality-Enhanced and Multi-Granularity Consistency-Preserving Graph Neural Network For Semi-Supervised Node Classification
https://doi.org/10.1007/s10489-023-04974-x
Abstract
Semi-supervised node classification is an important task that aims at classifying nodes based on the graph structure, node features, and class labels available for a subset of nodes. While most graph convolutional networks (GCNs) perform well when an ample number of labeled nodes are available, they often degenerate when the amount of labeled data is limited. To address this problem, we propose a scheme, namely, the Individuality-enhanced and Multi-granularity Consistency-preserving graph neural Network (IMCN), which can alleviate the loss of individual information within the encoder while providing reliable supervised signals for learning. First, a simple encoder based only on node features is integrated to enhance node individuality and amend the node commonality learned by the GCN-based encoder. Then, three constraints are defined at different levels of granularity, encompassing node embedding agreement, semantic class alignment, and node-to-class distribution identity. They maintain the consistency between the individuality and commonality of nodes and can be leveraged as latent supervised signals for learning representative embeddings. Finally, the trade-off between the individuality and commonality of nodes captured by the two encoders is taken into consideration for node classification. Extensive experiments on six real-world datasets have been conducted to validate the superiority of IMCN against state-of-the-art baselines in handling node classification tasks with scarce labeled data.
Keywords Node classification · Semi-supervised learning · Multi-granularity consistency · Graph neural network
Inspired by the recent success of contrastive learning [16–18] in computer vision, numerous multi-view GCNs have been specifically designed to capture node feature information from diverse perspectives. There are two primary approaches for generating node embeddings from multiple views: one combines different kinds of information from the same graph, such as the models CG3 [19] and VCHN [20]; the other contrasts the information captured from one graph and its corresponding augmented one, such as the models MVGRL [21] and CSGNN [22]. Information from different views can therefore complement each other and provide valuable hints for promoting node embedding and classification.

Despite the remarkable advancements achieved by single-view and multi-view GCNs in semi-supervised node classification, it remains a daunting challenge to effectively classify nodes with only a limited number of labeled examples (lack of supervision). This is mainly attributed to the following two problems: (1) Graph convolutions focus on information propagation from neighboring nodes to the central node based on the graph topology, so that the similar features, or commonality, shared by connected nodes are learned. However, this may result in a loss of node individuality, the acquisition of over-smoothed features, and the failure to distinguish nodes from different classes [13]. Such outcomes are unfavorable when aiming to learn a discriminative classifier. (2) The class label information of nodes is scarce and concise but plays a significant role in supervising model learning. The relation between nodes and classes should be taken full advantage of for mining valuable supervised signals and optimizing the processes of node embedding and classification. However, only limited useful supervised signals are mined from this relation by most GCNs [19, 20].

In this paper, we develop an Individuality-enhanced and Multi-granularity Consistency-preserving graph neural Network (IMCN) for semi-supervised node classification with scarce labeled data. Specifically, IMCN integrates a simple two-layer MLP as a supplementary encoder to amend the individuality of nodes damaged by graph convolutions. Then, IMCN enriches supervised signals by taking full advantage of the multi-granularity relations among nodes and classes. The main contributions are summarized as follows:

1) An individuality-enhanced and multi-granularity consistency-preserving graph neural network is built for semi-supervised node classification, which can maintain the individuality and commonality of nodes simultaneously during the feature extraction process. The proposed method is highly effective, particularly when there are only a few labeled nodes available for model training.

2) Three consistency constraints at different granularities are designed to enrich the supervised information for model learning: the fine-grained one at the node level by an improved semi-supervised contrastive loss; the coarse-grained one at the semantic class level by aligning the prototypes of the same class learned from different encoders; and the middle-grained one at the node-to-class level by ensuring the identity of node-to-class relational distributions learned from the two encoders.

3) Extensive experiments on six real-world networks from different fields verify that IMCN significantly outperforms the comparison methods for semi-supervised node classification with few labeled nodes. Especially, on the three public benchmark datasets Cora, CiteSeer, and PubMed, the classification accuracies of IMCN are more than 2.5% higher than those of the baseline methods when only two or three labeled nodes per class are available for model training.

The remainder of the paper is structured as follows. Section 2 introduces related work. Section 3 presents the details of the proposed model. The experimental setup and results are introduced and analyzed in Section 4 and Section 5, respectively. Finally, Section 6 provides the main conclusions of this paper and gives ideas for further study.

2 Related work

In this section, previous work on GCNs for semi-supervised node classification is briefly reviewed according to whether the method captures abundant information from different aspects.

2.1 Single-view GCNs

Single-view GCNs usually learn node embeddings for classification by propagating and aggregating feature information between adjacent nodes in the graph from one aspect only. The classical and representative model GCN [11], which derived inspiration primarily from convolutional operations on images, learns low-dimensional node embeddings through propagation and aggregation of nodes' and their neighbors' features. The GraphSAGE (SAmple and aggreGatE) model [23], a general inductive framework, generates node embeddings by sampling and aggregating features from one node's local neighborhood and can efficiently generate node embeddings for previously unseen data. The Graph ATtention network (GAT) [24] employs an attention mechanism to modify the traditional propagation and aggregation operations between one node and its neighbors in GCN. The Simple Graph Convolution network (SGC) [25] reduces the excess complexity in GCN by successively removing nonlinearities and collapsing weight matrices between consecutive layers. The Hierarchical Graph Convolution Network (H-GCN) [12] enlarges the receptive field of graph convolutional processes in GCN by fusing nodes with similar structures into super-nodes. The simplified multi-layer graph convolutional
networks with dropout [6] combines SGC and the dropout regularization in deep learning and extends the shallow GCN model to a multi-layer GCN model to extract information from higher-order neighbors. This model can reduce the redundant calculation and over-fitting of the multi-layer GCN to make it simple and efficient.

Despite the noticeable achievements of these single-view models, the main concern of lacking supervised information in semi-supervised node classification is still not well handled. Most single-view GCNs are usually shallow, take only one aspect into consideration, and cannot obtain adequate information for effective classification when very few nodes are labeled.

2.2 Multi-view GCNs

Different from single-view GCNs, multi-view methods are specifically designed to capture abundant information from different aspects for improving learning and classification. In recent years, many multi-view GCNs have been proposed; they can be classified into the following two classes.

One class captures and combines feature information from two views of the same graph. The Contrastive GCN with Graph Generation (CG3) [19] integrates H-GCN and a two-layer GCN to learn complementary information from local and global views of nodes and imposes the designed node-level contrastive and graph-level generative constraints on the embeddings learned by the above two encoders. The View-Consistent Heterogeneous Network (VCHN) [20] combines the classical methods GCN and GAT to learn node embeddings from spectral and spatial views and applies constraints on the predictions between the two views to promote supervision from one to the other.

The other class captures feature information from one graph and its corresponding augmented graph and contrasts them to provide extra useful information for learning. The Deep Graph Infomax model (DGI) [26] aims at learning patch representations and the corresponding high-level summaries of graphs and related corrupted graphs by GCN, and then maximizes the mutual information between them. Similar to DGI, the contrastive Multi-View Graph Representation Learning model (MVGRL) [21] learns node representations for the graph and its corresponding corrupted one by two different GNNs and a shared MLP, and generates corresponding graph representations from them by shared pooling and MLP layers. Then, contrastive constraints between node and graph representations are designed as important parts of the learning objective. The deep GRAph Contrastive rEpresentation learning model (GRACE) [27] first generates two correlated graph views by randomly performing corruption such as removing edges and masking node features, then maximizes the agreement between node embeddings in these two views based on the idea of contrastive learning. The Contrastive Semi-supervised learning model based on GNN (CSGNN) [22] employs a two-layer GCN as a teacher encoder to learn node representations for one graph and its corrupted graph, and then contrasts the latent vectors between nodes, edges, and labels from these two views for improving predictions. In the final stage, the predictions are distilled into the downstream student module.

The above two branches of multi-view GCNs can learn complementary information for boosting the discrimination of node embeddings and classification accuracy. However, these methods usually rely on the graph convolutional process, which forces the encoders to focus on the commonality of adjacent nodes and damages some of their individuality. In addition, these models do not take full advantage of the complex but valuable relation information among nodes and classes.

The methods H-GCN, CG3, and VCHN are closely related to the method proposed in this paper. The constraints and mechanisms used in these four methods are listed in Table 1.

The following are the differences among the proposed IMCN method, H-GCN, CG3, and VCHN. Compared with the multi-view methods CG3 and VCHN, a GCN-based encoder is coupled with an MLP-based encoder in the proposed IMCN method, which can enhance node individuality for learning discriminative node embedding vectors. For the node-level constraints in IMCN and CG3, the calculation of the former is much simpler than that of the latter because repeated node-pair contrasts are reduced. In addition, IMCN takes full advantage of the complex relations among nodes and classes from the views of node-to-class distribution and class centroid alignment, which are ignored by the other three methods.
Fig. 1 Framework of the proposed IMCN model: feature extraction in Stage 1; multi-granularity consistency constraints in Stage 2; and feature fusion and node classification in Stage 3
where m, n, and c (the latter also being the dimension of the node embeddings) are the number of nodes, node features, and classes, respectively; and φ(·) denotes the processes of generating coarse graphs, graph convolution, and refining coarsened graphs in H-GCN.

However, such GCN-based encoders focus excessively on the commonality of linked nodes and may lose the individuality of nodes during information propagation. This problem also exists in the local encoder designed with two-layer GCNs in [19]. In practice, the categories of nodes are mainly determined by their individual-feature information. Therefore, to compensate for the individual-feature information damaged by the global GCN-based encoder, the node itself is regarded as a local view, and the individual-feature information of nodes is extracted by a simple two-layer MLP.

The consistency between the local and global representations of one node is described as fine-grained node-level consistency. In the proposed IMCN method, the vector distance between local and global representations is used to measure this fine-grained consistency. In detail, this constraint is defined with unsupervised and supervised parts as follows.

On the one hand, in order to utilize the abundant unlabeled information effectively, an unsupervised node-level loss is defined to maintain the consistency between the local and global representations of the same node:

L_{unode} = -\log \frac{\sum_{i=1}^{m} e^{\,\mathrm{sim}(h_i^{local},\, h_i^{global})}}{\sum_{j,k=1}^{m} e^{\,\mathrm{sim}(h_j^{local},\, h_k^{global})}},    (3)
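As an illustration of how this unsupervised term can be computed in practice, the following is a minimal PyTorch sketch; it assumes cosine similarity for sim(·,·) and vectorizes the sums in (3), and the function name and tensor shapes are illustrative rather than taken from the released implementation.

```python
import torch
import torch.nn.functional as F

def unsupervised_node_loss(h_local: torch.Tensor, h_global: torch.Tensor) -> torch.Tensor:
    # h_local, h_global: (m, c) embeddings of the same m nodes produced by the
    # MLP-based (local) and GCN-based (global) encoders, respectively.
    z_l = F.normalize(h_local, dim=1)    # row-normalize so dot products become cosine similarities
    z_g = F.normalize(h_global, dim=1)
    sim = z_l @ z_g.t()                  # (m, m) matrix of sim(h_j^local, h_k^global)
    numerator = torch.exp(sim.diagonal()).sum()    # matched local-global pairs of the same node
    denominator = torch.exp(sim).sum()             # all local-global node pairs
    return -torch.log(numerator / denominator)
```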
Fig. 2 Differences between the proposed IMCN method and CG3 [19] on calculating unsupervised and supervised node-level loss
Therefore, the learning processes of IMCN for local and global node representations can complement and promote each other based on node features and semantic information.

3.3.2 Class-level consistency constraint

The performance of the model tends to be biased when it is trained only at the sample level. The reason is that different samples can share some common features and belong to the same class, and it is necessary for the model to distinguish between samples of different classes. Despite the small number of labeled nodes, their semantic category information is an important supplement for feature embedding. This is not taken into consideration in [19]. From the perspective of semantic categories, the common information between the local and global views of nodes is termed coarse-grained class-level consistency.

Following but different from the idea in [36], prototypes for each class from the local and global views are generated using the learned embeddings of the labeled nodes, and the distance between them is expected to be minimized. The following constraint is designed:

L_{class} = \frac{1}{c} \sum_{i=1}^{c} \left\| c_i^{local} - c_i^{global} \right\|_2^2,    (6)

where c_i^{local} ∈ R^c and c_i^{global} ∈ R^c are the prototypes of the i-th class calculated by average aggregation of the learned local and global embeddings of the labeled nodes belonging to this class, respectively, ||·||_2 is the L2-norm operator, and L_{class} is the mean-squared Euclidean distance of the corresponding class prototypes. Note that this is different from the magnet loss [37], which uses the k-means method to compute cluster centers for each class.

The representations of the class prototypes are not stable during the model learning process and may forget valuable information learned before. Therefore, in the t-th iteration, we compute the class prototypes c_i^{local} and c_i^{global} in the way mentioned above, and then add them to the prototype representations calculated after the last iteration for updating the class prototype representations and suppressing this instability:

c_i^{local(t)} = (1 - \mu)\, c_i^{local(t-1)} + \mu\, c_i^{local},
c_i^{global(t)} = (1 - \mu)\, c_i^{global(t-1)} + \mu\, c_i^{global},    (7)

where μ ∈ [0, 1) is the balance weight for updating the class prototypes in the t-th iteration based on the prototype representations after t − 1 iterations.

3.3.3 Consistency constraint at the node-to-class level

Assume that the distribution around each prototype is isotropic Gaussian and that the distributions around the same class in the local and global views should be similar. Therefore, in addition to the consistencies at the node and class levels, there must be some indispensable consistent information in the node-to-class relationship between the local and global views. This is also not taken into consideration in [19].

To make the best use of unlabeled nodes, sim(h_i, c_j) is used to calculate the similarities between each node embedding and the obtained class prototypes, and then node-to-class relational distributions are generated for unlabeled nodes in the local and global views according to the following expressions:

p_{ij}^{local} = \frac{e^{\mathrm{sim}(h_i^{local},\, c_j^{local})/\tau}}{\sum_{k=1}^{c} e^{\mathrm{sim}(h_i^{local},\, c_k^{local})/\tau}}, \qquad
p_{ij}^{global} = \frac{e^{\mathrm{sim}(h_i^{global},\, c_j^{global})/\tau}}{\sum_{k=1}^{c} e^{\mathrm{sim}(h_i^{global},\, c_k^{global})/\tau}},    (8)

where τ > 0 is a temperature hyper-parameter denoting the concentration of node embeddings around the class prototypes; a smaller τ indicates a larger concentration.

Then, the distribution of the relation of one node to all classes can be represented as p_i = [p_{i1}, p_{i2}, ..., p_{ic}].
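For concreteness, a compact PyTorch sketch of the prototype computation, the moving-average update in (7), the class-level loss in (6), and the node-to-class distributions in (8) is given below; it assumes cosine similarity for sim(·,·), and the helper names are illustrative rather than taken from the released code.

```python
import torch
import torch.nn.functional as F

def class_prototypes(h_labeled: torch.Tensor, y_labeled: torch.Tensor, num_classes: int) -> torch.Tensor:
    # Average the embeddings of the labeled nodes class by class (each class assumed non-empty).
    return torch.stack([h_labeled[y_labeled == k].mean(dim=0) for k in range(num_classes)])  # (c, d)

def ema_update(protos_prev: torch.Tensor, protos_new: torch.Tensor, mu: float) -> torch.Tensor:
    # Moving-average update of the prototypes, as in (7).
    return (1.0 - mu) * protos_prev + mu * protos_new

def class_level_loss(protos_local: torch.Tensor, protos_global: torch.Tensor) -> torch.Tensor:
    # Mean-squared Euclidean distance between matching local and global prototypes, as in (6).
    return ((protos_local - protos_global) ** 2).sum(dim=1).mean()

def node_to_class_distribution(h: torch.Tensor, protos: torch.Tensor, tau: float) -> torch.Tensor:
    # Softmax over node-to-prototype similarities, as in (8); row i is the distribution p_i.
    sim = F.normalize(h, dim=1) @ F.normalize(protos, dim=1).t()   # (m, c)
    return F.softmax(sim / tau, dim=1)
```

The node-to-class consistency is then enforced between the local and global distributions p_i produced by the two encoders, as stated in the contributions above.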
4.2 Comparison models

To verify the effectiveness of the proposed model, a comparison is made between the proposed IMCN model and ten baseline methods, including four basic deep graph models (GCN [11], GAT [24], SGC [25], and H-GCN [12]) and six GCN-based contrastive models (DGI [26], GMI [43], MVGRL [21], GRACE [27], CG3 [19], and VCHN [20]). The details of these methods are as follows.

1) GCN [11] produces node embedding vectors by a recursive average neighborhood aggregation scheme. It is derived from the related work of conducting graph convolutions in the spectral domain [44].

2) GAT [24] generates node embedding vectors by modeling the differences between a node and its one-hop neighbors.

3) SGC [25] reduces the excess complexity in GCN by removing nonlinearities and collapsing weight matrices between consecutive layers.

4) DGI [26] generates node embeddings and a graph summary vector for the original input graph and constructs a corrupted graph to obtain negative node embeddings with the same GNN encoder. DGI then maximizes the mutual information between the positive node embeddings and the graph summary vector and minimizes it between the negative node embeddings and the graph summary vector.

5) GMI [43], different from DGI, focuses on maximizing the mutual information of features and edges between the input graph and the output graph of the encoder.

6) MVGRL [21] uses graph diffusion to generate an additional structural view of a graph; the original-view and diffusion-view graphs are then fed to GNNs and a shared MLP to learn node representations. The learned features are then fed to a graph pooling layer and a shared MLP to learn graph representations. A discriminator contrasts node representations from one view with the graph representation of another view, and vice versa, and scores the agreement between representations, which is used as the training signal.

7) GRACE [27] jointly corrupts the input graph at both the topology and node attribute levels, for example by removing edges and masking node features, to provide diverse contexts for nodes in different views. Contrastive learning is then conducted between node embeddings from the two views.

8) H-GCN [12] is an improved GCN-based model that expands the receptive field of graph convolutions in GCN.

9) CG3 [19] employs the H-GCN model and a two-layer GCN module to obtain local and global node embeddings and designs a semi-supervised node-level contrastive loss and a graph-level generative loss to optimize the model learning process.

10) VCHN [20] uses a two-layer GCN module and a two-layer GAT module to obtain latent features from spectral and spatial views and designs a strategy to generate confident pseudo-labels for unsupervised nodes.

Note that, in the last experiment of Section 5, IMCN with a two-layer GCN as the local encoder is added to illustrate the effectiveness of the proposed scheme in this paper.

4.3 Experimental settings

The proposed IMCN model was trained using the Adam optimizer for 500 epochs with the following settings: (1) The ReLU function is adopted as the non-linear activation of the hidden layers. (2) The output dimension of the local and global node representations is fixed to the number of classes. The dimensions of the hidden layers, learning rate, weight decay, and dropout ratio are searched in {32, 64, 128}, {0.1, 0.05, 0.01}, {0.01, 0.005, 0.001, 0.0005}, and {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}, respectively. (3) The hyper-parameters μ, λ, and τ in the proposed IMCN model are searched in {0.1, 0.2, 0.3, 0.4, 0.5}. (4) The hyper-parameters α, β, and γ for the trade-off among the three consistencies at different granularities are tuned in {0.1, 0.5, 1, 1.5, 2}. In each experiment, the proposed IMCN model is run for 10 random trials, and the mean and standard deviation of the best test classification accuracy are reported. The results of the comparison methods are directly excerpted from the original papers; if unavailable, corresponding experiments are conducted to obtain the results. The code and datasets are publicly available at https://github.com/xinya0817/IMCN.
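Although the full objective is formulated in Section 3, the four loss terms used in the ablation study (Table 3) together with the weights α, β, and γ described above suggest a weighted combination of the supervised term and the three consistency terms; the sketch below illustrates that assumed form only, with illustrative variable names.

```python
import torch

def imcn_objective(l_cross: torch.Tensor, l_node: torch.Tensor, l_class: torch.Tensor,
                   l_node2class: torch.Tensor, alpha: float, beta: float, gamma: float) -> torch.Tensor:
    # Assumed overall loss: the supervised classification term (presumably a
    # cross-entropy on the labeled nodes) plus the node-level, class-level, and
    # node-to-class consistency terms, weighted by alpha, beta, and gamma.
    return l_cross + alpha * l_node + beta * l_class + gamma * l_node2class
```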
5 Experimental results and analysis

This section presents the experimental results and discusses the performance of IMCN from the following seven aspects: (1) performance of IMCN with different weights for the node-level consistency constraint, the class-level one, and the one at the node-to-class level; (2) performance of IMCN with different updating rates for learning class prototypes, different temperatures for calculating the node-to-class distribution, and different weights for the local embedding in the final embedding; (3) visualization of the original nodes and the node embeddings learned by IMCN and its component modules; (4) ablation study of IMCN with different loss terms; (5) performance of IMCN on alleviating over-smoothing; (6) performance of IMCN and comparison methods with scarce labeled training data; and (7) performance of IMCN and baselines on common benchmark datasets. In all tables of experimental results, the highest record on each dataset is highlighted in bold.

5.1 Performance of IMCN with different weights on the consistency constraints

Experiments are carried out to determine the effectiveness of the three local-global consistency constraints in IMCN. To verify the performance of the proposed IMCN model with very limited labeled training nodes, two citation networks (a small one and a relatively large one) are used, with label rates equal to 0.5% for Cora and 0.03% for PubMed. When different values are set for one hyper-parameter, the other two hyper-parameters are fixed.

The classification accuracies of IMCN with different values for α, β, and γ are shown in Fig. 3, and the following observations can be obtained:

(1) IMCN obtained the best classification result on Cora when α = 1.5, β = 1, and γ = 2. This is significantly superior to the results when α, β, or γ equals 0.1.

(2) IMCN got the best result on PubMed when α = 0.5, β = 1, and γ = 0.1, which is obviously much better than when these hyper-parameters are set to other values. The impacts of the three granularity levels of consistency on the model are thus quantified.

(3) The IMCN model demonstrates satisfactory performance when the values of α and β are approximately 1, regardless of whether it is applied to the Cora or PubMed dataset. However, the same level of sensitivity was not observed in relation to the parameter γ. This indicates that IMCN is considerably more responsive to the weight of the node-to-class loss. As a result, it is recommended to set the parameters α and β to 1, while the parameter γ may require careful fine-tuning in domain-specific applications.

Fig. 3 Classification accuracies of the proposed IMCN model with different α, β, and γ

5.2 Performance of IMCN with different parameters μ, τ, and λ

This part discusses the impacts of different rates μ for updating the class prototypes, different temperatures τ for the node-to-class distribution, and different weights λ of the local
embedding in the final embedding on the performance of the proposed IMCN model. Classification experiments are conducted on Cora and PubMed with 0.5% and 0.03% labeled nodes, respectively. From the results shown in Fig. 4, the following observations can be obtained:

(1) IMCN performs best when μ is set to 0.3 on Cora and 0.4 on PubMed. Smaller or larger values for μ cannot ensure that IMCN reaches its ideal performance, which implies that an appropriate updating speed for the class prototypes is important for learning stable and expressive node representations. On the one hand, when μ is very small, node embeddings are unable to obtain new useful information in a timely manner. On the other hand, when μ is very large, the important information learned previously cannot be retained.

(2) In general, a low value for the temperature hyper-parameter τ ensures that IMCN achieves the best performance, as seen on Cora. This is because a small temperature hyper-parameter can ensure a high concentration of the node-to-class distribution. However, the superior result on PubMed obtained when τ = 0.3 may be attributed to PubMed's relatively large scale and sparse structure.

(3) The best classification accuracy was obtained when λ = 0.3 for IMCN on both Cora and PubMed. A much smaller or larger percentage of local information in the final node representations cannot obtain ideal results. This is because the hierarchical GCN module takes the feature information of the node into learning, but some of it is damaged by the propagation and aggregation operations of the GCN layers.

(4) IMCN demonstrates strong performance when the values of μ and λ are set to 0.4 and 0.3, respectively, whether applied to the Cora or PubMed dataset. However, the same level of performance consistency was not observed in relation to the parameter τ. This indicates that IMCN is highly sensitive to the temperature parameter in the node-to-class loss. Therefore, it is recommended to set the parameters μ and λ to 0.4 and 0.3, respectively, while the parameter τ may require fine-tuning in practical tasks.

Fig. 4 Classification accuracies of the proposed IMCN model with different μ, τ, and λ

5.3 Visualization of node embeddings learned by different models

The t-SNE algorithm [45] is used to visualize the original nodes of Cora [40] with a label rate of 0.5% and their embedding representations learned by a two-layer MLP (only the features of the node itself are used), the representative model H-GCN [12] (feature information propagated from multi-hop neighbors is used), and the proposed IMCN model (which integrates the feature information of the node itself and of its neighbors). All original and embedded nodes are projected into a two-dimensional space for visualization, as shown in Fig. 5.

Fig. 5 Two-dimension visualization of original nodes and node embeddings obtained by MLP, H-GCN, and the proposed IMCN model on Cora

From the results, the following observations can be obtained. After the embedding process of a simple two-layer MLP model, nodes from different classes are still mixed and cannot be clearly distinguished. H-GCN can group most embedded nodes into their classes correctly; however, many nodes from different classes in the central area of Fig. 5(c)
are very close, which can easily lead to misclassification. Compared with the above two methods, the proposed IMCN model can push nodes from different classes away while increasing the distance between these classes, ensuring low classification errors.

5.4 Ablation study of IMCN

On three citation networks, ablation experiments are carried out to demonstrate the effectiveness of the various local-global consistency constraints in IMCN. The label rates of Cora, CiteSeer, and PubMed are 0.5%, 0.5%, and 0.03%, respectively. Experimental results are listed in Table 3.

Table 3 Ablation study of the proposed IMCN method with different loss terms (%)

L_cross   L_node   L_class   L_node2class      Cora    CiteSeer    PubMed
                                               68.7    55.4        69.8
                                               70.0    68.4        70.8
                                               76.9    70.0        71.1
                                               78.0    70.8        71.7

From the results, it can be seen that the designed multi-granularity constraints (feature embedding agreement, semantic class alignment, and the identity of node-to-class relational distributions) bring a significant improvement. The consistency of the local and global perspectives at multiple levels can reveal the complementary features between the individuality and commonality of nodes. By combining these constraints, IMCN can make full use of both the limited labeled nodes and the abundant unlabeled nodes and integrate useful information from the two views.

5.5 Performance of IMCN on alleviating over-smoothing

Through a series of experiments, it was observed that nine-layer and twelve-layer GCNs cause all node representations to become similar and indistinguishable on Cora and CiteSeer, respectively, as shown in Fig. 6(a) and (c). The proposed IMCN models with the corresponding GCNs as global encoders then obtain low-dimensional node representations on these two datasets, as shown in Fig. 6(b) and (d). From these figures, it can be seen that the proposed IMCN method obviously alleviates the over-smoothing problem resulting from multiple convolution operations.

5.6 Performance of IMCN and comparison methods on datasets with scarce labeled nodes

In this part, experiments are conducted to verify the effectiveness of the proposed IMCN method in learning expressive node embeddings when only a few nodes are labeled in the training process. In the experiments, three benchmark graph datasets (Cora, CiteSeer, and PubMed) with different label rates are used: 0.5%, 1%, and 3% labeled nodes for Cora and CiteSeer; and 0.03%, 0.05%, and 0.1% labeled nodes for PubMed, respectively. The classification accuracies of all methods are listed in Table 4, and the following three observations are obtained.

(1) The proposed IMCN method outperforms most baselines at different label rates on the three datasets, especially when there are very few labeled nodes. For example, on CiteSeer with 0.5% labeled nodes, the classification accuracy of IMCN is significantly higher than that of the other methods, 6.5% higher than the method ranked second. This is mainly because IMCN can capture the abundant individuality and commonality of nodes while taking the complex relations among nodes into consideration.

(2) On Cora with a label rate of 1% and PubMed with a label rate of 0.1%, IMCN's performance is not superior to that of VCHN, which is ranked first, but it is clearly better than that of the method ranked third. Concretely, the classification accuracy of the proposed IMCN method is 2.6% and 0.1% lower than that of VCHN, but 3.7% and 0.4% higher than that of H-GCN, on these two datasets, respectively.

(3) The performance of the proposed IMCN model is quite good when the density is high, especially on Cora and CiteSeer. This is obviously in contrast with that on PubMed, which is much sparser than the first two datasets according to their density information in Table 2.
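As a complement to the over-smoothing analysis in Section 5.5, one simple way to quantify how indistinguishable a set of node representations has become is to measure their average pairwise cosine similarity; the short sketch below illustrates this diagnostic (an illustrative proxy, not the measurement procedure used in the paper).

```python
import torch
import torch.nn.functional as F

def average_pairwise_cosine(h: torch.Tensor) -> float:
    # h: (n, d) node representations; values close to 1 indicate that the
    # embeddings have collapsed toward a common direction (over-smoothing).
    z = F.normalize(h, dim=1)
    sim = z @ z.t()                                  # (n, n) cosine similarities
    n = h.size(0)
    off_diag = sim.sum() - sim.diagonal().sum()      # exclude self-similarities
    return (off_diag / (n * (n - 1))).item()
```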
Fig. 6 Two-dimension visualization of node embeddings obtained by multi-layer GCNs and the proposed IMCN models with the corresponding multi-layer GCNs on Cora and CiteSeer
5.7 Performance of IMCN and baselines on common benchmark datasets

In this section, classification experiments are conducted on six different networks with commonly used data splits to assess the performance of the proposed IMCN method and compare it with the baseline methods. Note that IMCN1 and IMCN2 are the methods proposed in this paper with the H-GCN model and a two-layer GCN as the global encoder, respectively. In addition, there are two designed comparison methods corresponding to IMCN1 and IMCN2 without the multi-granularity consistency constraints: IEN1 takes the H-GCN model and a two-layer MLP as encoders, and IEN2 uses a two-layer GCN and a two-layer MLP as encoders.

First, the classification accuracies of all methods are shown in Table 5, where the results ranking in the first two are marked in bold. The following three conclusions can be drawn:

(1) The performance of IMCN1 and IMCN2 is obviously better than that of the first three traditional models, GCN, GAT, and SGC, which are based on a single view. This is thanks to the two different views of the graph combined in IMCN to capture both shared and complementary information.

(2) The proposed methods IMCN1 and IMCN2 outperform the contrastive learning-based comparison methods on all experimental datasets. In particular, IMCN1 and IMCN2 obtain the best classification accuracies on Photo, which are about 2.9% and 3.3% higher than that of the recently proposed CG3, which is ranked third. This mainly owes to the individuality of nodes enhanced by the designed simple local encoder and the multi-granularity relations among nodes and classes maintained by the designed consistency constraints.

(3) IMCN1 and IMCN2 are obviously better than the corresponding IEN1 and IEN2 on most datasets. For example, on the citation network CiteSeer, the node classification accuracies are improved by 3.7% and 5.7% with the designed multi-granularity consistency constraints of the proposed method.

Then, the widely used statistical Nemenyi test [46] is employed to conduct a comprehensive analysis of the significant differences among the proposed IMCN methods and the 12 comparison methods on the six datasets, using the classification accuracies in Table 5. The average ranks of all the methods with the critical distance (CD) are plotted in Fig. 7. The following observations are obtained. The classification accuracies of the proposed IMCN1, IMCN2, IEN1, IEN2, CG3, H-GCN, and MVGRL are statistically better than those of the other seven comparison methods. There is no consistent evidence to indicate statistical differences among IMCN1, IMCN2, IEN1, IEN2, CG3, H-GCN, and MVGRL on the metric of classification accuracy.
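For reference, the critical distance in such a Nemenyi diagram follows the standard formula from [46], where k is the number of compared methods (14 here), N is the number of datasets (6 here), and q_α is the critical value of the Studentized range statistic:

CD = q_{\alpha} \sqrt{\frac{k(k+1)}{6N}}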
Table 4 Classification accuracies of the proposed IMCN method and comparison methods on Cora, CiteSeer, and PubMed with very limited labeled nodes (%)

Dataset        Cora                    CiteSeer                PubMed
Label rate     0.5%    1%     3%       0.5%    1%     3%       0.03%   0.05%   0.1%
GCN [11]       42.6    56.9   74.9     33.4    46.5   66.9     61.8    68.8    71.9
GAT [24]       56.4    71.7   78.5     45.7    64.7   69.3     65.7    69.9    72.4
SGC [25]       43.7    64.3   71.0     43.2    50.7   60.9     62.5    69.4    69.9
DGI [26]       67.5    72.4   78.9     60.7    66.9   69.8     60.2    68.4    70.7
GMI [43]       67.1    71.0   78.8     56.2    63.5   68.0     60.1    62.4    71.4
MVGRL [21]     61.6    65.2   79.0     61.7    66.6   70.3     63.3    69.4    72.2
GRACE [27]     60.4    70.2   75.8     55.4    59.3   67.8     64.4    67.5    72.3
H-GCN [12]     70.9    75.0   82.5     56.3    60.3   69.6     68.7    70.3    74.7
CG3 [19]       69.3    74.1   79.9     62.7    70.6   71.3     68.3    70.1    73.2
VCHN [20]      73.9    81.3   82.0     64.3    67.9   69.1     69.2    71.8    75.2
IMCN (our)     78.0    78.7   83.3     70.8    72.9   73.8     71.7    72.4    75.1
Table 5 Classification accuracies of the proposed IMCN method and comparison methods on six benchmark datasets (%)

Methods        Cora       CiteSeer   PubMed     Photo      Computer   CS
GCN [11]       81.5±0.0   70.3±0.0   79.0±0.0   87.3±1.0   76.3±0.5   91.8±0.1
GAT [24]       83.0±0.7   72.5±0.7   79.0±0.3   86.2±1.5   79.3±1.1   90.5±0.7
SGC [25]       81.0±0.0   71.9±0.1   78.9±0.0   86.4±0.0   74.4±0.1   91.0±0.0
DGI [26]       81.7±0.6   71.5±0.7   77.3±0.6   83.1±0.5   75.9±0.6   90.0±0.3
GMI [43]       82.7±0.2   73.0±0.3   80.1±0.2   85.1±0.1   76.8±0.1   91.0±0.0
MVGRL [21]     82.9±0.7   72.6±0.7   79.4±0.3   87.3±0.3   79.0±0.6   91.3±0.1
GRACE [27]     80.0±0.4   71.7±0.6   79.5±1.1   81.8±1.0   71.8±0.4   90.1±0.8
H-GCN [12]     82.9±0.3   71.1±0.5   79.9±0.4   92.0±0.8   80.4±0.8   92.5±0.1
CG3 [19]       83.4±0.7   73.6±0.8   80.2±0.8   89.4±0.5   79.9±0.6   92.3±0.2
VCHN [20]      81.6±0.5   71.5±0.6   78.7±0.4   89.3±1.2   82.1±0.3   88.4±0.3
IEN1           83.2±0.4   72.8±0.3   79.4±0.4   93.0±0.3   83.1±0.3   93.2±0.2
IMCN1 (our)    84.6±0.4   76.5±0.3   81.2±0.8   93.1±0.5   83.6±0.4   93.3±0.2
IEN2           82.0±0.2   71.2±0.9   79.3±0.5   92.8±0.2   84.1±0.5   93.2±0.2
IMCN2 (our)    85.1±0.2   76.9±0.2   82.2±1.2   93.6±0.2   84.9±0.4   93.7±0.2
According to the ranks of the proposed methods and the 12 comparison methods, the macro-F1 results of the first six methods (including IMCN1, IMCN2, IEN1, IEN2, CG3, and H-GCN), the latest method VCHN, and the baseline GCN are listed and compared in Table 6. The best results are marked in bold. It can be seen that, under the macro-F1 metric, the IMCNs' performance is much better than that of their corresponding IENs, GCN, H-GCN, and VCHN in most cases, and this is especially obvious on Cora. This is mainly attributed to the specifically designed individuality-enhanced module and the three consistency constraints at different levels.

Finally, Table 7 shows the FLOPs, trainable parameters, test time, and running memory of the proposed methods IMCN1 and IMCN2 and the corresponding baseline methods (H-GCN and GCN). It can be seen that the space and time complexity of the proposed methods is slightly higher than that of the corresponding baseline methods while maintaining competitive performance. This is because of the added two-layer MLP to enhance individuality and the three consistency constraints between the local and global encoders.

Extensive experiments were conducted on various public benchmark datasets, and the results demonstrated the effectiveness of the proposed IMCN method in solving node classification tasks with extremely limited labeled nodes.

The proposed IMCN model has strict requirements on the input graph: it assumes that the entire structure of the graph is available to capture the common-feature information of nodes. Moreover, IMCN has a considerable number of hyperparameters that require tuning and can be inefficient when dealing with very large networks. In the future, our focus will be on developing scalable and efficient deep semi-supervised node classification methods specifically designed for large-scale graph datasets. We also aim to explore automatic parameter tuning techniques using optimization methods, such as [47].

Acknowledgements This work has been supported by the National Natural Science Foundation of China under Grant No. 61972203.

Declarations
42. Shchur O, Mumme M, Bojchevski A, Günnemann S (2018) Pitfalls of graph neural network evaluation. CoRR abs/1811.05868
43. Peng Z, Huang W, Luo M, Zheng Q, Rong Y, Xu T, Huang J (2020) Graph representation learning via graphical mutual information maximization. In: International World Wide Web Conferences, pp 259–270
44. Defferrard M, Bresson X, Vandergheynst P (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In: Annual Conference on Neural Information Processing Systems, pp 3837–3845
45. Van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(11):2579–2605
46. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
47. Chauhan S, Singh M, Aggarwal AK (2023) Designing of optimal digital IIR filter in the multi-objective framework using an evolutionary algorithm. Eng Appl Artif Intell 119:105803

Weiren Yu received the Ph.D. degree in computer science from the University of New South Wales, Sydney, NSW, Australia. After that, he spent two years as a Postdoctoral Researcher at Imperial College London, U.K., and then was a Lecturer of Computer Science with Aston University, Birmingham, U.K. He is currently an Assistant Professor of Computer Science with the University of Warwick, Coventry, U.K., and an Honorary Visiting Fellow with Imperial College London. His current research interests include graph data management, data mining, and the Internet of Things.