Individuality-Enhanced and Multi-Granularity Consistency-Preserving Graph Neural Network For Semi-Supervised Node Classification


Applied Intelligence

https://doi.org/10.1007/s10489-023-04974-x

Individuality-enhanced and multi-granularity consistency-preserving graph neural network for semi-supervised node classification
Xinxin Liu1 · Weiren Yu2

Accepted: 15 August 2023


© The Author(s) 2023

Abstract
Semi-supervised node classification is an important task that aims at classifying nodes based on the graph structure, node
features, and class labels for a subset of nodes. While most graph convolutional networks (GCNs) perform well when an
ample number of labeled nodes are available, they often degenerate when the amount of labeled data is limited. To address
this problem, we propose a scheme, namely, Individuality-enhanced and Multi-granularity Consistency-preserving graph
neural Network (IMCN), which can alleviate the problem of losing individual information within the encoder while providing
a reliable supervised signal for learning purposes. First, one simple encoder based on node features only is integrated to
enhance node individuality and amend node commonality learned by the GCN-based encoder. Then, three constraints are
defined at different levels of granularity, encompassing node embedding agreement, semantic class alignment, and node-to-
class distribution identity. They can maintain the consistency between the individuality and commonality of nodes and be
leveraged as latent supervised signals for learning representative embeddings. Finally, the trade-off between the individuality
and commonality of nodes captured by two encoders is taken into consideration for node classification. Extensive experiments
on six real-world datasets have been conducted to validate the superiority of IMCN against state-of-the-art baselines in handling
node classification tasks with scarce labeled data.

Keywords Node classification · Semi-supervised learning · Multi-granularity consistency · Graph neural network

Weiren Yu (✉)
[email protected]

Xinxin Liu
[email protected]

1 School of Computer Science and Technology, Nanjing University of Science and Technology, Jiangsu, China
2 Department of Computer Science, University of Warwick, Coventry, UK

1 Introduction

Graphs are useful data structures for capturing relationships between entities in various complex interconnected systems, such as social relationships [1], protein interactions [2], commodity co-purchasing [3], and co-citations [4]. Many fundamental tasks on graphs involve making predictions over nodes, such as predicting labels for unlabeled nodes according to the graph structure and node attributes. In practical scenarios, the availability of labeled nodes is often limited due to the resource-intensive and time-consuming nature of the annotation process [5, 6]. Consequently, the scarcity of labeled examples renders the task of classifying nodes in a semi-supervised manner both challenging and crucial.

In recent years, deep graph neural networks [7–10] have garnered considerable attention and witnessed notable advancements in semi-supervised node classification. Among them, graph convolutional networks (GCNs) have emerged as a prominent approach. Early GCNs, such as GCN [11] and the hierarchical GCN model [12], are based on the graph convolution process, which focuses on propagating and aggregating feature information between adjacent nodes. These models learn node embeddings from a single-view perspective only, often adopting a shallow architecture to mitigate the over-smoothing problem [13–15] that may arise when dealing with node feature information. These characteristics limit the amount of information that most single-view GCNs can capture, which is unfavorable for learning discriminative node embeddings and devising effective classifiers, particularly in situations where labeled nodes are scarce in practical tasks.


Inspired by the recent success of contrastive learning [16–18] in computer vision, numerous multi-view GCNs have been specifically designed to capture node feature information from diverse perspectives. There are two primary approaches for generating node embeddings from multiple views: one combines different kinds of information from the same graph, as in the models CG3 [19] and VCHN [20]; the other contrasts the information captured from one graph and its corresponding augmented graph, as in the models MVGRL [21] and CSGNN [22]. In this way, information from different views can complement each other and provide valuable hints for improving node embedding and classification.

Despite the remarkable advancements achieved by single-view and multi-view GCNs in semi-supervised node classification, it remains a daunting challenge to effectively classify nodes with only a limited number of labeled examples (lack of supervision). This is mainly attributed to the following two problems: (1) Graph convolutions focus on information propagation from neighboring nodes to the central node based on the graph topology, so that the similar features, or commonality, shared by connected nodes are learned. However, this may result in a loss of node individuality, the acquisition of over-smoothed features, and the failure to distinguish nodes from different classes [13]. Such outcomes are unfavorable when aiming to learn a discriminative classifier. (2) The class label information of nodes is scarce and concise but plays a significant role in supervising model learning. The relation between nodes and classes should be taken full advantage of for mining valuable supervised signals and optimizing the processes of node embedding and classification. However, only limited useful supervised signals are mined from this relation by most GCNs [19, 20].

In this paper, we develop an Individuality-enhanced and Multi-granularity Consistency-preserving graph neural Network (IMCN) for semi-supervised node classification with scarce labeled data. Specifically, IMCN integrates a simple two-layer MLP as a supplementary encoder to amend the individuality of nodes damaged by graph convolutions. Then, IMCN enriches supervised signals by taking full advantage of the multi-granularity relations among nodes and classes. The main contributions are summarized as follows:

1) An individuality-enhanced and multi-granularity consistency-preserving graph neural network is built for semi-supervised node classification, which can maintain the individuality and commonality of nodes simultaneously during the feature extraction process. The proposed method is highly effective, particularly when there are only a few labeled nodes available for model training.

2) Three consistency constraints at different granularities are designed to enrich the supervised information for model learning: the fine-grained one at the node level by an improved semi-supervised contrastive loss; the coarse-grained one at the semantic class level by aligning the prototypes of the same class learned from different encoders; and the middle-grained one at the node-to-class level by ensuring the identity of node-to-class relational distributions learned from the two encoders.

3) Extensive experiments on six real-world networks from different fields verify that IMCN significantly outperforms the comparison methods for semi-supervised node classification with few labeled nodes. Especially, on the three public benchmark datasets Cora, CiteSeer, and PubMed, the classification accuracies of IMCN are more than 2.5% higher than baseline methods when only two or three labeled nodes per class are available for model training.

The remainder of the paper is structured as follows. Section 2 introduces some related work. Section 3 presents the details of the proposed model. Then, the experimental setup and results are introduced and analyzed in Section 4 and Section 5, respectively. Finally, Section 6 provides the main conclusions of this paper and gives ideas for further study.

2 Related work

In this section, previous work on GCNs for semi-supervised node classification is briefly reviewed, grouped by whether the methods capture abundant information from different aspects.

2.1 Single-view GCNs

Single-view GCNs usually learn node embeddings for classification by propagating and aggregating feature information between adjacent nodes in the graph from one aspect only. The classical and representative model GCN [11], which derived inspiration primarily from the convolutional operations on images, learns low-dimensional node embeddings through propagation and aggregation of nodes' and their neighbors' features. The GraphSAGE (SAmple and aggreGatE) model [23], a general inductive framework, generates node embeddings by sampling and aggregating features from a node's local neighborhood and can efficiently generate embeddings for previously unseen data. The Graph ATtention network (GAT) [24] employs an attention mechanism to modify the traditional propagation and aggregation operations between one node and its neighbors in GCN. The Simple Graph Convolution network (SGC) [25] reduces the excess complexity in GCN by successively removing nonlinearities and collapsing weight matrices between consecutive layers. The Hierarchical Graph Convolution Network (H-GCN) [12] enlarges the receptive field of the graph convolutional processes in GCN by fusing nodes with similar structures into super-nodes.


The simplified multi-layer graph convolutional networks with dropout [6] combine SGC and the dropout regularization of deep learning and extend the shallow GCN model to a multi-layer GCN model to extract information from higher-order neighbors. This model can reduce redundant calculation and over-fitting of the multi-layer GCN, making it simple and efficient.

Despite the noticeable achievements of these single-view models, the main concern of lacking supervised information in semi-supervised node classification is still not well handled. Most single-view GCNs are shallow, take only one aspect into consideration, and cannot obtain adequate information for effective classification when very few nodes are labeled.

2.2 Multi-view GCNs

Different from the single-view GCNs, multi-view methods are specifically designed to capture abundant information from different aspects for improving learning and classification. In recent years, many multi-view GCNs have been proposed, and they can be classified into the following two classes.

One class captures and combines feature information from two views of the same graph. The Contrastive GCN with Graph Generation (CG3) [19] integrates H-GCN and a two-layer GCN to learn complementary information from local and global views of nodes and imposes the designed node-level contrastive and graph-level generative constraints on the embeddings learned by the two encoders. The View-Consistent Heterogeneous Network (VCHN) [20] combines the classical methods GCN and GAT to learn node embeddings from spectral and spatial views and applies constraints on the predictions between the two views to promote the supervision from one to the other.

The other class captures feature information from one graph and its corresponding augmented graph and contrasts them to provide extra useful information for learning. The Deep Graph Infomax model (DGI) [26] aims at learning patch representations and the corresponding high-level summaries of graphs and related corrupted graphs by GCN, and then maximizes the mutual information between them. Similar to DGI, the contrastive Multi-View Graph Representation Learning model (MVGRL) [21] learns node representations for the graph and its corresponding corrupted one by two different GNNs and a shared MLP and generates corresponding graph representations from them by shared pooling and MLP layers. Then, contrastive constraints between node and graph representations are designed as important parts of the learning objective. The deep GRAph Contrastive rEpresentation learning model (GRACE) [27] first generates two correlated graph views by randomly performing corruptions such as removing edges and masking node features, and then maximizes the agreement between node embeddings in these two views based on the idea of contrastive learning. The Contrastive Semi-supervised learning model based on GNN (CSGNN) [22] employs a two-layer GCN as a teacher encoder to learn node representations for one graph and its corrupted graph, and then contrasts the latent vectors between nodes, edges, and labels from these two views to improve predictions. In the final stage, the predictions are distilled into the downstream student module.

The above two branches of multi-view GCNs can learn complementary information for boosting the discrimination of node embeddings and the classification accuracy. However, these methods usually rely on the graph convolutional process, which forces the encoders to focus on the commonality of adjacent nodes and damages some of their individuality. In addition, these models do not take full advantage of the complex but valuable relational information among nodes and classes.

The methods H-GCN, CG3, and VCHN are closely related to the method proposed in this paper. The constraints and mechanisms used in these four methods are listed in Table 1.

The following are the differences among the proposed IMCN method, H-GCN, CG3, and VCHN. Compared with the multi-view methods CG3 and VCHN, a GCN-based encoder is coupled with an MLP-based encoder in the proposed IMCN method, which can enhance node individuality for learning discriminative node embedding vectors. For the node-level constraints in IMCN and CG3, the calculation of the former is much simpler than that of the latter because repeated node-pair contrasts are reduced. In addition, IMCN takes full advantage of the complex relations among nodes and classes from the views of node-to-class distribution and class centroid alignment, which are ignored by the other three methods.

Table 1 Differences among H-GCN, CG3, VCHN, and the proposed IMCN method

Constraints/Mechanism         H-GCN   CG3   VCHN   IMCN
multi-view encoder            —       √     √      √ (individuality-enhanced)
node-level constraint         —       √     √      √ (no repeated calculation)
node2class-level constraint   —       —     —      √
class-level constraint        —       —     —      √


3 IMCN model

This section first introduces the framework of the proposed Individuality-enhanced and Multi-granularity Consistency-preserving graph neural Network (IMCN) for semi-supervised node classification. Then, the critical components of IMCN are described in detail.

3.1 Framework overview

We design a novel graph neural network (as shown in Fig. 1) for semi-supervised node classification, which is committed to capturing both the individual- and common-feature information of nodes and preserving multi-granularity consistency between them to learn discriminative node representations for effective classification. The model learning and classification process of IMCN mainly includes the following three stages:

1) Feature extraction. The individual- and common-feature information of nodes is extracted by two different encoders according to the graph topology and the original node features.

2) Multi-granularity consistency constraints. The multi-granularity consistency of the feature information learned by the encoders is preserved according to the relations among nodes and classes.

3) Feature fusion and node classification. A trade-off is made between the constrained individuality and commonality of nodes, and a multi-objective loss function is established to obtain the optimized model for node classification.

3.2 Feature extraction

Different views contain quite different information describing the same object, which can provide complementary information to improve model learning. Multi-view learning has grown in popularity as a result of this concept. Meanwhile, contrastive learning has received intensive research in recent years, showing that contrasting congruent and incongruent views of objects can help algorithms learn expressive representations [28–32]. Inspired by these ideas, two different views are established for a graph and applied to learn discriminative node representations for classification.

Although some augmentation strategies have been proposed to generate related graphs with different views, such as node dropping and edge perturbation in [21, 22, 33], they may destroy the original graph topology and degrade the performance of graph convolutional networks. Unlike the previous approaches, in this paper, the node itself is seen as a local view, and the nodes with adjacent neighbor structures serve as a global view of the graph. These two views are clearly different from those of node2vec [34], which takes the width and depth of deep walks, controlled by two parameters, as local and global views.

From the global structure view, many public GCN-based models can be used to capture common-feature information among adjacent nodes. Here, the effective GCN-based model H-GCN [12] is adopted as the global encoder in the proposed IMCN model. H-GCN aggregates nodes that have equivalent or similar structures into hyper-nodes for graph convolution and then refines the coarsened graphs back to the original graph to restore the representation of each node. Therefore, the receptive field of each node is enlarged, and more global and common-feature information of nodes can be comprehensively captured. The node feature matrix X ∈ R^{m×n} and adjacency matrix A ∈ R^{m×m} are input into the H-GCN encoder to generate low-dimensional global node representations H_{global} ∈ R^{m×c} as follows:

H_{global} = \phi(A, X),    (1)

Fig. 1 Framework of the proposed IMCN model: feature extraction in Stage 1; multi-granularity consistency constraints in Stage 2; and feature fusion and node classification in Stage 3


where m, n, and c (also the dimension of the node embeddings) are the number of nodes, node features, and classes, respectively, and φ(·) denotes the processes of generating coarse graphs, graph convolution, and refining the coarsened graphs in H-GCN.

However, such GCN-based encoders focus excessively on the commonality of linked nodes and may lose the individuality of nodes during information propagation. This problem also exists in the local encoder designed with a two-layer GCN in [19]. In practice, the categories of nodes are mainly determined by their individual-feature information. Therefore, to compensate for the individual-feature information damaged by the global GCN-based encoder, the node itself is regarded as a local view, and the individual-feature information of nodes is extracted by a simple two-layer MLP encoder with X as the only input. The local low-dimensional node representations H_{local} ∈ R^{m×c} are obtained by IMCN as follows:

H_{local} = \sigma(X W^{(0)}) W^{(1)},    (2)

where W^{(i)} and σ(·) denote the trainable weight matrices and the non-linear ReLU activation function [35], with W^{(0)} ∈ R^{n×d} and W^{(1)} ∈ R^{d×c}, where d is the feature dimension of the hidden layer. Since H_{local} is computed without the structural information of the graph, the label information can be propagated effectively without the limitation of distance between nodes.

In order to bring the feature information learned by the two encoders into the same metric, H_{local} and H_{global} are normalized by the L2-norm in the column direction before imposing the following multi-granularity consistency constraints and before the classification stage.
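To make the two-view feature extraction concrete, the following is a minimal PyTorch sketch, not the authors' released implementation. It assumes a dense feature matrix X and a pre-normalized adjacency matrix; the paper's global encoder is H-GCN, for which a plain two-layer graph convolution is substituted here as a stand-in, while the local encoder follows the two-layer MLP of (2). Both outputs are L2-normalized column-wise as described above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalMLPEncoder(nn.Module):
    """Two-layer MLP of (2): uses node features X only, no graph structure."""
    def __init__(self, n_feat, d_hidden, c_classes):
        super().__init__()
        self.w0 = nn.Linear(n_feat, d_hidden, bias=False)     # W^(0)
        self.w1 = nn.Linear(d_hidden, c_classes, bias=False)  # W^(1)

    def forward(self, x):
        return self.w1(F.relu(self.w0(x)))                    # H_local in R^{m x c}

class GlobalGCNEncoder(nn.Module):
    """Stand-in for the H-GCN global encoder: two propagation/aggregation layers."""
    def __init__(self, n_feat, d_hidden, c_classes):
        super().__init__()
        self.w0 = nn.Linear(n_feat, d_hidden, bias=False)
        self.w1 = nn.Linear(d_hidden, c_classes, bias=False)

    def forward(self, a_hat, x):
        h = F.relu(a_hat @ self.w0(x))                         # propagate over neighbors
        return a_hat @ self.w1(h)                              # H_global in R^{m x c}

def column_l2_normalize(h, eps=1e-12):
    """L2-normalize embeddings in the column direction so both views share a metric."""
    return h / (h.norm(dim=0, keepdim=True) + eps)

if __name__ == "__main__":
    m, n, d, c = 6, 10, 16, 3                                  # toy sizes
    X = torch.randn(m, n)
    A_hat = torch.eye(m)                                       # placeholder adjacency (self-loops only)
    local, glob = LocalMLPEncoder(n, d, c), GlobalGCNEncoder(n, d, c)
    H_local = column_l2_normalize(local(X))
    H_global = column_l2_normalize(glob(A_hat, X))
    print(H_local.shape, H_global.shape)                       # torch.Size([6, 3]) twice
```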
3.3 Multi-granularity consistency constraints

For the individuality and commonality of nodes captured separately through the above feature extraction processes, it is reasonable to preserve the consistency between them to optimize the encoding process. Inspired by human cognition and intelligence, data are analyzed at different granularities in IMCN. This strategy leads the model to analyze data more comprehensively, utilize data more efficiently, and make more accurate decisions. The following parts introduce the designed multi-granularity consistency constraints according to the relations among nodes and classes in detail.

3.3.1 Node-level consistency constraint

Data tend to be analyzed intuitively at the node (sample) level, which can force the model to focus on the features of representative samples and achieve good generalization ability. At this level, the common information shared by the local and global representations of one node is described as fine-grained node-level consistency.

In the proposed IMCN method, the vector distance between local and global representations is used to measure this fine-grained consistency. In detail, this constraint is defined with an unsupervised and a supervised part as follows.

On the one hand, in order to utilize the abundant unlabeled information effectively, an unsupervised node-level loss is defined to maintain the consistency between the local and global representations of the same node:

\mathcal{L}_{node}^{u} = -\log \frac{\sum_{i=1}^{m} e^{\mathrm{sim}(h_i^{local},\, h_i^{global})}}{\sum_{j,k=1}^{m} e^{\mathrm{sim}(h_j^{local},\, h_k^{global})}},    (3)

where h_i^{local} is the i-th row vector in H_{local}, and sim(a, b) is the cosine similarity between a and b: sim(a, b) = a·b / (|a| × |b|). By minimizing this loss, the representations of the same node from the two views are expected to be similar, while those of different nodes are expected to be pushed away from each other.

On the other hand, labeled nodes are scarce but can provide valuable semantic information for learning expressive node embeddings that are easy to classify. The consistency between the local and global representations of labeled nodes belonging to the same class is maintained by a designed supervised loss:

\mathcal{L}_{node}^{s} = -\log \frac{\sum_{i,j=1;\, y_i = y_j}^{m} e^{\mathrm{sim}(h_i^{local},\, h_j^{global})}}{\sum_{k,q=1}^{m} e^{\mathrm{sim}(h_k^{local},\, h_q^{global})}},    (4)

where y_i is the one-hot coded class vector of the i-th node and y_i ∈ R^{1×c}. Therefore, each labeled node from one view is contrasted with the labeled nodes belonging to the same class from the other view.

Note that in the above two loss terms, we expect node embeddings of the same node, or of nodes from the same class, to be most similar simultaneously in the joint similarity distribution over all node pairs, instead of in the marginal distribution, as shown in Fig. 2. This is more reasonable and avoids the following time-consuming duplicated calculations in CG3 [19]: (1) reusing negative nodes (blue shaded and checked ones as shown in Fig. 2(a) and (c)) in contrastive learning; and (2) repeating calculations of the inner products between row vectors from H_{local} and H_{global} in each loss term.

Finally, the local-global consistency at the fine-grained node level is maintained by a node-wise regularization term defined with both the unsupervised and supervised information:

\mathcal{L}_{node} = \mathcal{L}_{node}^{s} + \mathcal{L}_{node}^{u}.    (5)

Therefore, the learning processes of IMCN for local and global node representations can complement and promote each other based on node features and semantic information.
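The joint-distribution form of (3)–(5) can be sketched as below. This is only an illustration under the reading given above (a single partition over all cross-view node pairs in the denominators), not the authors' code; it assumes integer class ids and a boolean mask of labeled nodes.

```python
import torch
import torch.nn.functional as F

def node_level_loss(h_local, h_global, labels, labeled_mask):
    """Node-level consistency of (3)-(5).

    h_local, h_global: (m, c) embeddings from the two encoders.
    labels: (m,) integer class ids (values for unlabeled nodes are ignored).
    labeled_mask: (m,) bool, True for labeled nodes.
    """
    # Pairwise cosine similarities between local (rows) and global (columns) embeddings.
    s = F.normalize(h_local, dim=1) @ F.normalize(h_global, dim=1).t()   # (m, m)
    exp_s = torch.exp(s)
    denom = exp_s.sum()                                                  # joint partition over all pairs

    # Unsupervised part (3): same node seen from the two views.
    loss_u = -torch.log(exp_s.diag().sum() / denom)

    # Supervised part (4): labeled cross-view pairs that share a class.
    same_class = labels.unsqueeze(0) == labels.unsqueeze(1)
    pos = same_class & labeled_mask.unsqueeze(0) & labeled_mask.unsqueeze(1)
    loss_s = -torch.log(exp_s[pos].sum() / denom)

    return loss_u + loss_s                                               # (5)
```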


Fig. 2 Differences between the proposed IMCN method and CG3 [19] on calculating unsupervised and supervised node-level loss

3.3.2 Class-level consistency constraint

The performance of the model tends to be biased when it is trained only at the sample level. The reason is that different samples can share some common features and belong to the same class, and it is necessary for the model to distinguish between samples of different classes. Despite the small number of labeled nodes, their semantic category information is an important supplement for feature embedding. This is not taken into consideration in [19]. From the perspective of semantic categories, the common information between the local and global views of nodes is referred to as coarse-grained class-level consistency.

Following, but different from, the idea in [36], prototypes for each class in the local and global views are generated using the learned embeddings of the labeled nodes, and the distance between them is expected to be minimized. The following constraint is designed:

\mathcal{L}_{class} = \frac{1}{c} \sum_{i=1}^{c} \left\| c_i^{local} - c_i^{global} \right\|_2^2,    (6)

where c_i^{local} ∈ R^c and c_i^{global} ∈ R^c are the prototypes of the i-th class, calculated by average aggregation of the learned local and global embeddings of the labeled nodes belonging to this class, respectively, ‖·‖_2 is the L2-norm operator, and \mathcal{L}_{class} is the mean squared Euclidean distance between the corresponding class prototypes. Note that this is different from the magnet loss [37], which uses the k-means method to compute cluster centers for each class.

The representations of the class prototypes are not stable during the model learning process and may forget valuable information learned before. Therefore, in the t-th iteration, we compute the class prototypes c_i^{local} and c_i^{global} in the way mentioned above and then add them to the prototype representations calculated after the last iteration, updating the class prototype representations while suppressing the instability:

c_i^{local(t)} = (1 - \mu)\, c_i^{local(t-1)} + \mu\, c_i^{local},
c_i^{global(t)} = (1 - \mu)\, c_i^{global(t-1)} + \mu\, c_i^{global},    (7)

where μ is the balance weight for updating the class prototypes in the t-th iteration based on the prototype representations after t − 1 iterations, and μ ∈ [0, 1).
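A possible realization of the prototype computation, the moving-average update of (7), and the class-level loss of (6) is sketched below; it is only an illustration, and the prototype buffers are assumed to be carried across training iterations by the caller.

```python
import torch

def class_prototypes(h, labels, labeled_mask, num_classes):
    """Average-aggregate labeled embeddings per class (one prototype per class)."""
    protos = h.new_zeros(num_classes, h.size(1))
    for k in range(num_classes):
        idx = labeled_mask & (labels == k)
        if idx.any():
            protos[k] = h[idx].mean(dim=0)
    return protos

def update_prototypes(prev_protos, new_protos, mu):
    """Moving-average update of (7): c^(t) = (1 - mu) * c^(t-1) + mu * c."""
    return (1.0 - mu) * prev_protos + mu * new_protos

def class_level_loss(protos_local, protos_global):
    """Mean squared Euclidean distance between matched prototypes, as in (6)."""
    return ((protos_local - protos_global) ** 2).sum(dim=1).mean()
```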
3.3.3 Consistency constraint at the node-to-class level

Assume that the distribution around each prototype is an isotropic Gaussian and that the distributions around the same class in the local and global views should be similar. Therefore, in addition to the consistencies at the node and class levels, there must be some indispensable consistent information in the node-to-class relationship between the local and global views. This is also not taken into consideration in [19].

To make the best use of unlabeled nodes, sim(h_i, c_j) is used to calculate the similarities between each node embedding and the obtained class prototypes, and node-to-class relational distributions are then generated for the unlabeled nodes in the local and global views according to the following expressions:

p_{ij}^{local} = \frac{e^{\mathrm{sim}(h_i^{local},\, c_j^{local})/\tau}}{\sum_{k=1}^{c} e^{\mathrm{sim}(h_i^{local},\, c_k^{local})/\tau}},
\qquad
p_{ij}^{global} = \frac{e^{\mathrm{sim}(h_i^{global},\, c_j^{global})/\tau}}{\sum_{k=1}^{c} e^{\mathrm{sim}(h_i^{global},\, c_k^{global})/\tau}},    (8)

where τ > 0 is a temperature hyper-parameter denoting the concentration of node embeddings around the class prototypes; a smaller τ indicates a larger concentration.

Then, the distribution of the relation of one node to all classes can be represented as p_i = [p_{i1}, p_{i2}, ..., p_{ic}].


The node-to-class relational consistency between p_i^{local} and p_i^{global} is kept by minimizing the Kullback-Leibler divergence [38] between them as follows:

\mathcal{L}_{node2class} = \sum_{i=1}^{m-l} D_{KL}\big(p_i^{local} \,\|\, p_i^{global}\big) = \sum_{i=1}^{m-l} \big[\, g(p_i^{local}, p_i^{global}) - g(p_i^{local}) \,\big],
\qquad
g(p_i^{local}, p_i^{global}) = -\sum_{j=1}^{c} \mathrm{softmax}(p_i^{local})_j \,\ln \mathrm{softmax}(p_i^{global})_j,    (9)

where the term g(p_i^{local}) is omitted, softmax(·) is the softmax function, and g(p_i^{local}, p_i^{global}) is implemented by the cross-entropy function according to [39]. In the proposed IMCN model, \mathcal{L}_{node2class} is regarded as a middle-grained consistency constraint.
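The node-to-class terms of (8) and (9) can be sketched as below, using the cross-entropy form with the constant entropy term dropped. The inputs are assumed to be the embeddings and prototypes of the unlabeled nodes only; this is an illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def node_to_class_distribution(h, protos, tau):
    """Softmax over cosine similarities to the class prototypes, as in (8)."""
    sims = F.normalize(h, dim=1) @ F.normalize(protos, dim=1).t()   # (m_u, c)
    return F.softmax(sims / tau, dim=1)

def node2class_loss(p_local, p_global, eps=1e-12):
    """Consistency of (9): cross-entropy between the two relational distributions
    of each unlabeled node, summed over nodes (entropy term omitted)."""
    return -(p_local * torch.log(p_global + eps)).sum(dim=1).sum()
```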

3.4 Feature fusion and node classification

Node representations carrying the individuality and commonality of nodes are generated by the designed encoders under the defined multi-granularity consistency constraints. This important information is integrated so that the two views complement each other, yielding the final node representations as follows:

H = \lambda H_{local} + (1 - \lambda) H_{global},    (10)

where λ is a trade-off hyper-parameter between the individuality and commonality of nodes, and λ ∈ (0, 1).

Then, the embedding vectors of the l labeled nodes, denoted H' ∈ R^{l×c}, are taken from the corresponding rows of H, and the cross-entropy classification loss is calculated to penalize the differences between the predicted labels Ŷ = softmax(H') and the ground truth Y ∈ R^{l×c} of the labeled nodes:

\mathcal{L}_{cross} = -\sum_{i=1}^{l}\sum_{j=1}^{c} Y_{ij} \ln \hat{Y}_{ij}.    (11)

Finally, the proposed semi-supervised classification model IMCN is trained with the overall loss function expressed as follows:

\mathcal{L} = \mathcal{L}_{cross} + \alpha \mathcal{L}_{node} + \beta \mathcal{L}_{class} + \gamma \mathcal{L}_{node2class},    (12)

where α, β, and γ are three adjustable hyper-parameters measuring the importance of the multi-granularity consistencies, respectively. The training process of IMCN is sketched in Algorithm 1.

Algorithm 1 Individuality-enhanced and Multi-granularity Consistency-preserving graph neural Network (IMCN)
Require: Graph with adjacency matrix A, node feature matrix X, and one-hot label matrix Y of labeled nodes; hyper-parameters μ, τ, λ, α, β, γ; and the number of iterations T.
Ensure: Predicted labels for unlabeled nodes.
1: for t = 1 : T do
2:   Extract the individual and common features of nodes by (1) and (2);
3:   Calculate the node-level consistency loss by (3), (4), and (5);
4:   Generate class prototypes for global and local labeled nodes by average aggregation;
5:   Compute and update the class prototypes of the two views by (7), respectively;
6:   Calculate the class-level consistency loss with (6);
7:   Compute the node-to-class consistency loss for unlabeled nodes from the two views by (8) and (9);
8:   Calculate the final node representations according to (10);
9:   Compute the cross-entropy classification loss for the labeled training nodes by (11);
10:  Update the parameters of the network by minimizing the overall loss function in (12);
11: end for
12: Conduct category prediction for the unlabeled nodes based on the trained IMCN model.
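Putting the pieces together, one optimization step in the spirit of Algorithm 1 might look as follows. The helper functions are the sketches given earlier in this section, and the hyper-parameter dictionary is an assumed interface rather than the authors' API; the fragment is illustrative only.

```python
import torch
import torch.nn.functional as F

def training_step(H_local, H_global, Y_onehot, labels, labeled_mask,
                  prev_pl, prev_pg, hp):
    """One step in the spirit of Algorithm 1 (illustrative sketch).

    hp: dict with keys 'mu', 'tau', 'lambda', 'alpha', 'beta', 'gamma'.
    prev_pl, prev_pg: class prototypes kept from the previous iteration.
    Returns the overall loss of (12) and the updated prototypes.
    """
    c = Y_onehot.size(1)

    # Node-level consistency (3)-(5).
    l_node = node_level_loss(H_local, H_global, labels, labeled_mask)

    # Prototype computation, moving-average update (7), and class-level loss (6).
    pl = update_prototypes(prev_pl, class_prototypes(H_local, labels, labeled_mask, c), hp['mu'])
    pg = update_prototypes(prev_pg, class_prototypes(H_global, labels, labeled_mask, c), hp['mu'])
    l_class = class_level_loss(pl, pg)

    # Node-to-class consistency (8)-(9) on unlabeled nodes only.
    u = ~labeled_mask
    l_n2c = node2class_loss(node_to_class_distribution(H_local[u], pl, hp['tau']),
                            node_to_class_distribution(H_global[u], pg, hp['tau']))

    # Feature fusion (10) and cross-entropy over labeled nodes (11).
    H = hp['lambda'] * H_local + (1.0 - hp['lambda']) * H_global
    log_pred = F.log_softmax(H[labeled_mask], dim=1)
    l_cross = -(Y_onehot[labeled_mask] * log_pred).sum()

    # Overall objective (12).
    loss = l_cross + hp['alpha'] * l_node + hp['beta'] * l_class + hp['gamma'] * l_n2c
    return loss, pl.detach(), pg.detach()
```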
4 Experimental setup

This section presents the experimental setup from the following three perspectives: (1) the benchmark datasets used for training and testing the model; (2) the baseline models compared with the proposed model; and (3) the parameter settings of the proposed model in the series of experiments.

4.1 Datasets

Six benchmark datasets are used in the experiments for a comprehensive comparison between the proposed method and the state-of-the-art methods, including three undirected citation networks from [40], two co-purchasing networks segmented from the Amazon co-purchasing graph [41], and one co-authorship network [42] from the KDD Cup 2016 challenge (https://kddcup2016.azurewebsites.net/). Detailed statistics of these datasets are summarized in Table 2, where the density of each dataset is defined as the ratio between the number of actual edges in the dataset and the number of edges in its corresponding fully connected graph.

Following the data preprocessing in [19], each dataset is split into training, validation, and test sets as follows: (1) For the first three citation networks, twenty labeled nodes per class are used as the training set, and 500 nodes and 1,000 nodes are used as the validation and test sets, respectively. (2) For the other three networks, thirty labeled nodes per class are used as the training set, thirty nodes per class as the validation set, and the rest as the test set.


Table 2 Dataset statistics

Datasets   Type            Nodes    Features   Edges     Classes   Label rate   Density
Cora       Citation        2,708    1,433      5,429     7         5.2%         0.148%
CiteSeer   Citation        3,327    3,703      4,732     6         3.6%         0.086%
PubMed     Citation        19,717   500        44,338    3         0.3%         0.023%
Photo      Co-purchasing   7,650    745        119,081   8         3.1%         0.407%
Computer   Co-purchasing   13,752   767        245,861   10        2.2%         0.260%
CS         Co-authorship   18,333   6,805      81,894    15        2.5%         0.049%
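As a quick check of the density definition above (actual edges divided by the edges of the complete undirected graph), e.g. for Cora:

```python
def graph_density(num_nodes: int, num_edges: int) -> float:
    """Ratio of actual edges to edges in the complete undirected graph."""
    return num_edges / (num_nodes * (num_nodes - 1) / 2)

print(f"{graph_density(2708, 5429):.3%}")  # Cora: ~0.148%, matching Table 2
```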

4.2 Comparison models

To verify the effectiveness of the proposed model, a comparison is made between the proposed IMCN model and ten other baseline methods, including four basic deep graph models (GCN [11], GAT [24], SGC [25], and H-GCN [12]) and six GCN-based contrastive models (DGI [26], GMI [43], MVGRL [21], GRACE [27], CG3 [19], and VCHN [20]). The details of these methods are as follows.

1) GCN [11] produces node embedding vectors by a recursive average neighborhood aggregation scheme. It is derived from related work on conducting graph convolutions in the spectral domain [44].

2) GAT [24] generates node embedding vectors by modeling the differences between a node and its one-hop neighbors.

3) SGC [25] reduces the excess complexity in GCN by removing nonlinearities and collapsing weight matrices between consecutive layers.

4) DGI [26] generates node embeddings and a graph summary vector for the original input graph and constructs a corrupted graph to obtain negative node embeddings with the same GNN encoder. DGI then aims at maximizing the mutual information between the positive node embeddings and the graph summary vector while minimizing it between the negative node embeddings and the graph summary vector.

5) GMI [43], different from DGI, focuses on maximizing the feature and edge mutual information between the input graph and the output graph of the encoder.

6) MVGRL [21] uses graph diffusion to generate an additional structural view of a graph; the original-view and diffusion-view graphs are then fed to GNNs and a shared MLP to learn node representations. The learned features are then fed to a graph pooling layer and a shared MLP to learn graph representations. A discriminator contrasts node representations from one view with the graph representation of the other view, and vice versa, and scores the agreement between representations, which is used as the training signal.

7) GRACE [27] jointly corrupts the input graph at both the topology and node attribute levels, for example by removing edges and masking node features, to provide diverse contexts for nodes in different views. Contrastive learning is then conducted between the node embeddings of the two views.

8) H-GCN [12] is an improved GCN-based model that expands the receptive field of graph convolutions in GCN.

9) CG3 [19] employs the H-GCN model and a two-layer GCN module to obtain local and global node embeddings and designs a semi-supervised node-level contrastive loss and a graph-level generative loss to optimize the model learning process.

10) VCHN [20] uses a two-layer GCN module and a two-layer GAT module to obtain latent features from spectral and spatial views and designs a strategy to generate confident pseudo-labels for unsupervised nodes.

Note that, in the last experiment of Section 5, IMCN with a two-layer GCN as the local encoder is added to illustrate the effectiveness of the proposed scheme in this paper.

4.3 Experimental settings

The proposed IMCN model was trained using the Adam optimizer for 500 epochs with the following settings: (1) The ReLU function is adopted as the non-linear activation of the hidden layers. (2) The output dimension of the local and global node representations is fixed to the number of classes. The dimensions of the hidden layers, learning rate, weight decay, and dropout ratio are searched in {32, 64, 128}, {0.1, 0.05, 0.01}, {0.01, 0.005, 0.001, 0.0005}, and {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}, respectively. (3) The hyper-parameters μ, λ, and τ in the proposed IMCN model are searched in {0.1, 0.2, 0.3, 0.4, 0.5}. (4) The hyper-parameters α, β, and γ for the trade-off among the three consistencies at different granularities are tuned in {0.1, 0.5, 1, 1.5, 2}. In each experiment, the proposed IMCN model is run for 10 random trials, and the mean and standard deviation of the best test classification accuracy are reported. The results of the comparison methods are directly excerpted from the original papers; if unavailable, corresponding experiments are conducted to obtain the results. The code and datasets are publicly available at https://github.com/xinya0817/IMCN.
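For reference, the search spaces above can be collected into a single grid; the dictionary layout and key names below are illustrative only and are not taken from the released code.

```python
# Search spaces reported in Section 4.3 (layout is illustrative).
search_space = {
    "hidden_dim":   [32, 64, 128],
    "lr":           [0.1, 0.05, 0.01],
    "weight_decay": [0.01, 0.005, 0.001, 0.0005],
    "dropout":      [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    "mu":           [0.1, 0.2, 0.3, 0.4, 0.5],
    "lambda":       [0.1, 0.2, 0.3, 0.4, 0.5],
    "tau":          [0.1, 0.2, 0.3, 0.4, 0.5],
    "alpha":        [0.1, 0.5, 1, 1.5, 2],
    "beta":         [0.1, 0.5, 1, 1.5, 2],
    "gamma":        [0.1, 0.5, 1, 1.5, 2],
}
```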


5 Experimental results and analysis

This section presents the experimental results and discusses the performance of IMCN from the following seven aspects: (1) performance of IMCN with different weights of the node-level consistency constraint, the class-level one, and the one at the node-to-class level; (2) performance of IMCN with different updating rates for learning class prototypes, different temperatures for calculating the node-to-class distribution, and different weights of the local embedding in the final embedding; (3) visualization of the original nodes and the node embeddings learned by IMCN and its component modules; (4) ablation study of IMCN with different loss terms; (5) performance of IMCN in alleviating over-smoothing; (6) performance of IMCN and comparison methods with scarce labeled training data; and (7) performance of IMCN and baselines on common benchmark datasets. In all tables of experimental results, the highest record on each dataset is highlighted in bold.

5.1 Performance of IMCN with different weights on the consistency constraints

Experiments are carried out to determine the effectiveness of the three local-global consistency constraints in IMCN. To verify the performance of the proposed IMCN model with very limited labeled training nodes, two citation networks (a small one and a relatively large one) are used, with label rates equal to 0.5% for Cora and 0.03% for PubMed. When different values are set for one hyper-parameter, the other two hyper-parameters are fixed.

The classification accuracies of IMCN with different values of α, β, and γ are shown in Fig. 3, and the following observations can be made:

(1) IMCN obtained the best classification result on Cora when α = 1.5, β = 1, and γ = 2. This is significantly superior to the results when α, β, or γ equals 0.1.

(2) IMCN obtained the best result on PubMed when α = 0.5, β = 1, and γ = 0.1, which is obviously much better than when these hyper-parameters are set to other values. Therefore, the impacts of the three different granularity consistency levels on the model are quantified.

(3) The IMCN model demonstrates satisfactory performance when the values of α and β are approximately 1, regardless of whether it is applied to the Cora or PubMed dataset. However, the same level of sensitivity was not observed in relation to the parameter γ. This indicates that IMCN is considerably more responsive to the weight of the node-to-class loss. As a result, it is recommended to set the parameters α and β to 1, while the parameter γ may require careful fine-tuning in domain-specific applications.

5.2 Performance of IMCN with different parameters μ, τ, and λ

This part discusses the impacts of different rates μ for updating the class prototypes, different temperatures τ for the node-to-class distribution, and different weights λ of the local

Fig. 3 Classification accuracies of the proposed IMCN model with different α, β, and γ


embedding in the final embedding on the performance of the proposed IMCN model. Classification experiments are conducted on Cora and PubMed with 0.5% and 0.03% labeled nodes, respectively. From the results shown in Fig. 4, the following observations can be made:

(1) IMCN performs best when μ is set to 0.3 on Cora and 0.4 on PubMed. Smaller or larger values of μ cannot ensure that IMCN achieves the ideal performance, which implies that an appropriate updating speed for the class prototypes is important for learning stable and expressive node representations. On the one hand, when μ is very small, node embeddings cannot obtain new useful information in a timely manner. On the other hand, when μ is very large, the important information learned previously cannot be retained.

(2) In general, a low value of the temperature hyper-parameter τ ensures that IMCN achieves the best performance, as seen on Cora. This is because a small temperature hyper-parameter can ensure a high concentration of the node-to-class distribution. However, the superior result on PubMed obtained when τ = 0.3 may be attributed to PubMed's relatively large scale and sparse structure.

(3) The best classification accuracy was obtained when λ = 0.3 for IMCN on both Cora and PubMed. A much smaller or larger proportion of local information in the final node representations cannot obtain ideal results. This is because the hierarchical GCN module takes the feature information of the node into learning, but some of it is damaged by the propagation and aggregation operations of the GCN layers.

(4) IMCN demonstrates strong performance when the values of μ and λ are set to 0.4 and 0.3, respectively, whether applied to the Cora or PubMed dataset. However, the same level of performance consistency was not observed in relation to the parameter τ. This indicates that IMCN is highly sensitive to the temperature parameter in the node-to-class loss. Therefore, it is recommended to set the parameters μ and λ to 0.4 and 0.3, respectively, while the parameter τ may require fine-tuning in practical tasks.

5.3 Visualization of node embeddings learned by different models

The t-SNE algorithm [45] is used to visualize the original nodes of Cora [40] with a label rate of 0.5% and their embedding representations learned by a two-layer MLP (only the features of the node itself are used), the representative model H-GCN [12] (feature information propagated from multi-hop neighbors is used), and the proposed IMCN model (which integrates feature information of the node itself and from its neighbors). All original and embedded nodes are projected into a two-dimensional space for visualization, as shown in Fig. 5.

From the results, the following observations can be made. After the embedding process of a simple two-layer MLP model, nodes from different classes are still mixed and cannot be clearly distinguished. H-GCN can group most embedded nodes into their classes correctly; however, many nodes from different classes in the central area of Fig. 5(c)

Fig. 4 Classification accuracies of the proposed IMCN model with different μ, τ and λ


Fig. 5 Two-dimension visualization of original nodes and node embeddings obtained by MLP, H-GCN, and the proposed IMCN model on Cora

are very close, which can easily lead to misclassification. Compared with the above two methods, the proposed IMCN model can push nodes from different classes away from each other while increasing the distance between these classes, ensuring low classification errors.

5.4 Ablation study of IMCN

On three citation networks, ablation experiments are carried out to demonstrate the effectiveness of the various local-global consistency constraints in IMCN. The label rates of Cora, CiteSeer, and PubMed are 0.5%, 0.5%, and 0.03%, respectively. Experimental results are listed in Table 3.

From the results, it can be seen that the designed multi-granularity constraints (feature embedding agreement, semantic class alignment, and the identity of node-to-class relational distributions) make a significant improvement. The consistency of the local and global perspectives at multiple levels can reveal the complementary features between the individuality and commonality of nodes. By combining these constraints, IMCN can make full use of both the limited labeled nodes and the abundant unlabeled nodes and integrate useful information from the two views.

Table 3 Ablation study of the proposed IMCN method with different loss terms (%)

L_cross   L_node   L_class   L_node2class   Cora   CiteSeer   PubMed
√         —        —         —              68.7   55.4       69.8
√         √        —         —              70.0   68.4       70.8
√         √        √         —              76.9   70.0       71.1
√         √        √         √              78.0   70.8       71.7

5.5 Performance of IMCN on alleviating over-smoothing

Through a series of experiments, it was observed that nine-layer and twelve-layer GCNs cause all node representations to become similar and indistinguishable on Cora and CiteSeer, respectively, as shown in Fig. 6(a) and (c). Then, the proposed IMCN models with the corresponding GCNs as global encoders are used to obtain low-dimensional node representations on these two datasets, as shown in Fig. 6(b) and (d). From these figures, it can be seen that the proposed IMCN method obviously alleviates the over-smoothing problem resulting from multiple convolution operations.

5.6 Performance of IMCN and comparison methods on datasets with scarce labeled nodes

In this part, experiments are conducted to verify the effectiveness of the proposed IMCN method in learning expressive node embeddings with only a few nodes labeled in the training process. In the experiments, three benchmark graph datasets (Cora, CiteSeer, and PubMed) with different label rates are used: 0.5%, 1%, and 3% labeled nodes for Cora and CiteSeer; and 0.03%, 0.05%, and 0.1% labeled nodes for PubMed, respectively. The classification accuracies of all methods are listed in Table 4, and the following three observations are obtained.

(1) The proposed IMCN method outperforms most baselines under different label rates on the three datasets, especially when there are very few labeled nodes. For example, on CiteSeer with 0.5% labeled nodes, the classification accuracy of IMCN is significantly higher than that of the other methods, 6.5% higher than the method ranked second. This is mainly attributed to the fact that IMCN can capture the abundant individuality and commonality of nodes while considering the complex relations among nodes.

(2) On Cora with a label rate of 1% and PubMed with a label rate of 0.1%, IMCN's performance is not superior to VCHN, which is ranked first, but it is clearly better than the method ranked third. Concretely, the classification accuracy of the proposed IMCN method is 2.6% and 0.1% lower than VCHN, but 3.7% and 0.4% higher than H-GCN on these two datasets, respectively.

(3) The performance of the proposed IMCN model is quite good when the density is high, especially on Cora and CiteSeer. This is in obvious contrast with that on PubMed, which is much sparser than the first two datasets according to the density information in Table 2.


Fig. 6 Two-dimension visualization of node embeddings obtained by multi-layer GCNs and the proposed IMCN models with the corresponding multi-layer GCNs on Cora and CiteSeer

5.7 Performance of IMCN and baselines on common benchmark datasets

In this section, classification experiments are conducted on six different networks with the commonly used data splits to assess the performance of the proposed IMCN method and compare it with the baseline methods. Note that IMCN1 and IMCN2 are the methods proposed in this paper with the H-GCN model and a two-layer GCN as the global encoder, respectively. In addition, there are two designed comparison methods corresponding to IMCN1 and IMCN2 without the multi-granularity consistency constraints: IEN1 takes the H-GCN model and a two-layer MLP as encoders, and IEN2 uses a two-layer GCN and a two-layer MLP as encoders.

First, the classification accuracies of all methods are shown in Table 5, where the results ranking in the first two are marked in bold. The following three conclusions can be drawn:

(1) The performance of IMCN1 and IMCN2 is obviously better than that of the first three traditional models, GCN, GAT, and SGC, which are based on a single view. This is thanks to the two different views of the graph combined in IMCN, which capture both shared and complementary information.

(2) The proposed methods IMCN1 and IMCN2 outperform the contrastive learning-based comparison methods on all experimental datasets. In particular, IMCN1 and IMCN2 obtain the best classification accuracies on Photo, which are about 2.9% and 3.3% higher than that of CG3, a recently proposed method ranked third. This is mainly owing to the individuality of nodes enhanced by the designed simple local encoder and the multi-granularity relations among nodes and classes maintained by the designed consistency constraints.

(3) IMCN1 and IMCN2 are obviously better than the corresponding IEN1 and IEN2 on most datasets. For example, on the citation network CiteSeer, the node classification accuracies are improved by 3.7% and 5.7% with the designed multi-granularity consistency constraints of the proposed method.

Then, the widely used statistical Nemenyi test [46] is employed to conduct a comprehensive analysis of the significant differences among the proposed IMCN methods and the 12 comparison methods on the six datasets, using the classification accuracies in Table 5. The average ranks of all the methods with the critical distance (CD) are plotted in Fig. 7. The following observations are obtained. The classification accuracies of the proposed IMCN1, IMCN2, IEN1, IEN2, CG3, H-GCN, and MVGRL are statistically better than those of the other seven comparison methods. There is no consistent evidence to indicate statistical differences among IMCN1, IMCN2, IEN1, IEN2, CG3, H-GCN, and MVGRL on the metric of classification accuracy.

Table 4 Classification accuracies of the proposed IMCN method and comparison methods on Cora, CiteSeer, and PubMed with very limited labeled nodes (%)

Dataset       Cora                  CiteSeer              PubMed
Label rate    0.5%   1%     3%      0.5%   1%     3%      0.03%   0.05%   0.1%
GCN [11]      42.6   56.9   74.9    33.4   46.5   66.9    61.8    68.8    71.9
GAT [24]      56.4   71.7   78.5    45.7   64.7   69.3    65.7    69.9    72.4
SGC [25]      43.7   64.3   71.0    43.2   50.7   60.9    62.5    69.4    69.9
DGI [26]      67.5   72.4   78.9    60.7   66.9   69.8    60.2    68.4    70.7
GMI [43]      67.1   71.0   78.8    56.2   63.5   68.0    60.1    62.4    71.4
MVGRL [21]    61.6   65.2   79.0    61.7   66.6   70.3    63.3    69.4    72.2
GRACE [27]    60.4   70.2   75.8    55.4   59.3   67.8    64.4    67.5    72.3
H-GCN [12]    70.9   75.0   82.5    56.3   60.3   69.6    68.7    70.3    74.7
CG3 [19]      69.3   74.1   79.9    62.7   70.6   71.3    68.3    70.1    73.2
VCHN [20]     73.9   81.3   82.0    64.3   67.9   69.1    69.2    71.8    75.2
IMCN (our)    78.0   78.7   83.3    70.8   72.9   73.8    71.7    72.4    75.1


Table 5 Classification accuracies of the proposed IMCN method and comparison methods on six benchmark datasets (%)

Methods        Cora       CiteSeer   PubMed     Photo      Computer   CS
GCN [11]       81.5±0.0   70.3±0.0   79.0±0.0   87.3±1.0   76.3±0.5   91.8±0.1
GAT [24]       83.0±0.7   72.5±0.7   79.0±0.3   86.2±1.5   79.3±1.1   90.5±0.7
SGC [25]       81.0±0.0   71.9±0.1   78.9±0.0   86.4±0.0   74.4±0.1   91.0±0.0
DGI [26]       81.7±0.6   71.5±0.7   77.3±0.6   83.1±0.5   75.9±0.6   90.0±0.3
GMI [43]       82.7±0.2   73.0±0.3   80.1±0.2   85.1±0.1   76.8±0.1   91.0±0.0
MVGRL [21]     82.9±0.7   72.6±0.7   79.4±0.3   87.3±0.3   79.0±0.6   91.3±0.1
GRACE [27]     80.0±0.4   71.7±0.6   79.5±1.1   81.8±1.0   71.8±0.4   90.1±0.8
H-GCN [12]     82.9±0.3   71.1±0.5   79.9±0.4   92.0±0.8   80.4±0.8   92.5±0.1
CG3 [19]       83.4±0.7   73.6±0.8   80.2±0.8   89.4±0.5   79.9±0.6   92.3±0.2
VCHN [20]      81.6±0.5   71.5±0.6   78.7±0.4   89.3±1.2   82.1±0.3   88.4±0.3
IEN1           83.2±0.4   72.8±0.3   79.4±0.4   93.0±0.3   83.1±0.3   93.2±0.2
IMCN1 (our)    84.6±0.4   76.5±0.3   81.2±0.8   93.1±0.5   83.6±0.4   93.3±0.2
IEN2           82.0±0.2   71.2±0.9   79.3±0.5   92.8±0.2   84.1±0.5   93.2±0.2
IMCN2 (our)    85.1±0.2   76.9±0.2   82.2±1.2   93.6±0.2   84.9±0.4   93.7±0.2

Fig. 7 Average ranks of all methods with the critical distance (CD) for classification accuracy according to the Nemenyi test [46]

Table 6 Macro-F1 results of the proposed IMCN methods and comparison methods on six benchmark datasets (%)

Methods        Cora       CiteSeer   PubMed     Photo      Computer   CS
GCN [11]       76.5±0.5   60.8±1.1   77.1±0.5   84.4±1.3   69.1±6.4   80.2±0.9
H-GCN [12]     81.9±0.3   67.8±0.6   79.2±0.4   90.2±1.0   76.8±1.7   90.7±0.1
CG3 [19]       81.4±0.5   71.3±0.4   80.2±0.2   90.3±1.0   77.0±3.3   90.7±0.1
VCHN [20]      80.7±0.7   67.6±0.6   77.9±0.3   87.3±0.9   82.3±0.3   85.4±0.4
IEN1           82.0±0.5   68.9±0.4   78.9±0.4   91.4±0.4   84.4±0.4   91.5±0.3
IMCN1 (our)    83.6±0.4   71.0±0.4   80.0±0.7   91.2±0.5   81.9±0.8   91.5±0.2
IEN2           80.9±0.3   67.3±0.7   78.9±0.5   91.1±0.2   84.1±0.4   91.5±0.2
IMCN2 (our)    84.0±0.3   71.8±0.4   81.3±0.9   91.9±0.3   84.5±0.4   92.1±0.2


Table 7 FLOPs, trainable parameters, test time, and running memory of GCN, H-GCN, and the corresponding IMCNs

Metrics      Datasets   H-GCN [12]   IMCN1 (our)   GCN [11]   IMCN2 (our)
FLOPs        Cora       0.70G        1.09G         0.77G      0.04G
             CiteSeer   0.94G        1.47G         0.77G      0.05G
             PubMed     4.58G        8.51G         1.17G      1.21G
Parameters   Cora       0.49M        0.59M         0.09M      0.18M
             CiteSeer   1.07M        0.58M         0.09M      0.47M
             PubMed     0.25M        1.31M         0.03M      0.06M
Test time    Cora       0.04s        0.04s         0.01s      0.02s
             CiteSeer   0.05s        0.07s         0.01s      0.04s
             PubMed     0.26s        0.91s         0.40s      0.95s
Memory       Cora       1.52G        1.10G         0.11G      0.40G
             CiteSeer   1.66G        1.21G         0.11G      0.50G
             PubMed     2.97G        3.98G         0.41G      1.63G

According to the ranks of the proposed methods and the 12 comparison methods, the macro-F1 results of the first six methods (including IMCN1, IMCN2, IEN1, IEN2, CG3, and H-GCN), the latest method VCHN, and the baseline GCN are listed and compared in Table 6, where the best results are marked in bold. It can be seen that, under the macro-F1 metric, the performance of the IMCNs is much better than that of their corresponding IENs, GCN, H-GCN, and VCHN in most cases, and especially obviously so on Cora. This is mainly attributed to the specifically designed individuality-enhanced module and the three consistency constraints at different levels.

Finally, Table 7 shows the FLOPs, trainable parameters, test time, and running memory of the proposed methods IMCN1 and IMCN2 and the corresponding baseline methods (H-GCN and GCN). It can be seen that the space and time complexity of the proposed methods is slightly higher than that of the corresponding baseline methods while maintaining competitive performance. This is because of the added two-layer MLP to enhance individuality and the three consistency constraints between the local and global encoders.

6 Conclusions and future work

In this paper, we proposed a graph neural network called the Individuality-enhanced and Multi-granularity Consistency-preserving Network (IMCN). IMCN aims to take advantage of the limited, yet valuable, supervised information available in labeled data and effectively enhance the classification capability. On the one hand, a simple MLP module was combined with the original GCN-based model to enhance individual information in learning node representations. On the other hand, the complex relations among nodes and classes were taken full advantage of by the three designed consistency constraints to optimize the encoding processes of the two encoders. Extensive experiments were conducted on various public benchmark datasets, and the results demonstrated the effectiveness of the proposed IMCN method in solving node classification tasks with extremely limited labeled nodes.

The proposed IMCN model has strict requirements on the input graph; it assumes that the entire structure of the graph is available to capture the common-feature information of nodes. Moreover, IMCN has a considerable number of hyperparameters that require tuning and can be inefficient when dealing with very large networks. In the future, our focus will be on developing scalable and efficient deep semi-supervised node classification methods specifically designed for large-scale graph datasets. We also aim to explore automatic parameter tuning techniques using optimization methods, such as [47].

Acknowledgements This work has been supported by the National Natural Science Foundation of China under Grant No. 61972203.

Declarations

Competing Interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
constraints for optimizing the encoding processes of two


References

1. Wang K, An J, Zhou M, Shi Z, Shi X, Kang Q (2023) Minority-weighted graph neural network for imbalanced node classification in social networks of internet of people. IEEE Internet Things J 10(1):330–340
2. Yu H, Shen Z, Du P (2022) NPI-RGCNAE: fast predicting ncRNA-protein interactions using the relational graph convolutional network autoencoder. IEEE J Biomed Health Inform 26(4):1861–1871
3. Zheng Y, Gao C, He X, Jin D, Li Y (2023) Incorporating price into recommendation with graph convolutional networks. IEEE Trans Knowl Data Eng 35(2):1609–1623
4. Liu J, Xia F, Feng X, Ren J, Liu H (2022) Deep graph learning for anomalous citation detection. IEEE Trans Neural Netw Learn Syst 33(6):2543–2557
5. Lin G, Kang X, Liao K, Zhao F, Chen Y (2021) Deep graph learning for semi-supervised classification. Pattern Recognit 118:108039
6. Yang F, Zhang H, Tao S (2022) Simplified multilayer graph convolutional networks with dropout. Applied Intelligence 52(5):4776–4791
7. Wang J, Liang J, Cui J, Liang J (2021) Semi-supervised learning with mixed-order graph convolutional networks. Information Sciences 573:171–181
8. Li K, Ye W (2022) Semi-supervised node classification via graph learning convolutional neural network. Applied Intelligence 52(11):12724–12736
9. Kazi A, Cosmo L, Ahmadi SA, Navab N, Bronstein MM (2023) Differentiable graph module (DGM) for graph convolutional networks. IEEE Trans Pattern Anal Mach Intell 45(2):1606–1617
10. Zhou H, Gong M, Wang S, Gao Y, Zhao Z (2023) SMGCL: semi-supervised multi-view graph contrastive learning. Knowledge-Based Systems 260:110120
11. Kipf TN, Welling M (2017) Semi-supervised classification with graph convolutional networks. In: International Conference on Learning Representations
12. Hu F, Zhu Y, Wu S, Wang L, Tan T (2019) Hierarchical graph convolutional networks for semi-supervised node classification. In: International Joint Conference on Artificial Intelligence, pp 4532–4539
13. Li Q, Han Z, Wu X (2018) Deeper insights into graph convolutional networks for semi-supervised learning. In: AAAI Conference on Artificial Intelligence, pp 3538–3545
14. Rong Y, Huang W, Xu T, Huang J (2019) DropEdge: towards deep graph convolutional networks on node classification. In: International Conference on Learning Representations
15. Chen M, Wei Z, Huang Z, Ding B, Li Y (2020) Simple and deep graph convolutional networks. International Conference on Machine Learning 119:1725–1735
16. He K, Fan H, Wu Y, Xie S, Girshick RB (2020) Momentum contrast for unsupervised visual representation learning. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 9726–9735
17. Chen X, Yao L, Zhou T, Dong J, Zhang Y (2021) Momentum contrastive learning for few-shot COVID-19 diagnosis from chest CT images. Pattern Recognition 113:107826
18. Xu H, Xiong H, Qi G (2022) K-shot contrastive learning of visual features with multiple instance augmentations. IEEE Trans Pattern Anal Mach Intell 44(11):8694–8700
19. Wan S, Pan S, Yang J, Gong C (2021) Contrastive and generative graph convolutional networks for graph-based semi-supervised learning. In: AAAI Conference on Artificial Intelligence, pp 10049–10057
20. Liao Z, Zhang X, Su W, Zhan K (2022) View-consistent heterogeneous network on graphs with few labeled nodes. IEEE Transactions on Cybernetics 1–10
21. Hassani K, Ahmadi AHK (2020) Contrastive multi-view representation learning on graphs. In: International Conference on Machine Learning, pp 4116–4126
22. Song Y, Gu Y, Li X, Li C, Yu G (2022) CSGNN: improving graph neural networks with contrastive semi-supervised learning. Int Conf Adv Comput Appl 13245:731–738
23. Hamilton WL, Ying Z, Leskovec J (2017) Inductive representation learning on large graphs. In: Annual Conference on Neural Information Processing Systems, pp 1024–1034
24. Velickovic P, Cucurull G, Casanova A, Romero A, Lió P, Bengio Y (2018) Graph attention networks. In: International Conference on Learning Representations
25. Wu F, Souza AH Jr, Zhang T, Fifty C, Yu T, Weinberger KQ (2019) Simplifying graph convolutional networks. International Conference on Machine Learning 97:6861–6871
26. Velickovic P, Fedus W, Hamilton WL, Lió P, Bengio Y, Hjelm RD (2019) Deep graph infomax. In: International Conference on Learning Representations
27. Zhu Y, Xu Y, Yu F, Liu Q, Wu S, Wang L (2020) Deep graph contrastive representation learning. CoRR abs/2006.04131
28. Liu Y, Li Z, Pan S, Gong C, Zhou C, Karypis G (2022) Anomaly detection on attributed networks via contrastive self-supervised learning. IEEE Trans Neural Netw Learn Syst 33(6):2378–2392
29. Liu Y, Wang K, Liu L, Lan H, Lin L (2022) TCGL: temporal contrastive graph for self-supervised video representation learning. IEEE Trans Image Process 31:1978–1993
30. Song X, Jin Z (2022) Robust label rectifying with consistent contrastive learning for domain adaptive person re-identification. IEEE Trans Multimedia 24:3229–3239
31. Lin Y, Gou Y, Liu X, Bai J, Lv J, Peng X (2023) Dual contrastive prediction for incomplete multi-view representation learning. IEEE Trans Pattern Anal Mach Intell 45(4):4447–4461
32. Tian R, Shi H (2023) Momentum memory contrastive learning for transfer-based few-shot classification. Applied Intelligence 53(1):864–878
33. You Y, Chen T, Sui Y, Chen T, Wang Z, Shen Y (2020) Graph contrastive learning with augmentations. In: Annual Conference on Neural Information Processing Systems
34. Grover A, Leskovec J (2016) Node2vec: scalable feature learning for networks. In: ACM International Conference on Knowledge Discovery and Data Mining, pp 855–864
35. Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: International Conference on Machine Learning, pp 807–814
36. Yang X, Deng C, Dang Z, Wei K, Yan J (2021) SelfSAGCN: Self-supervised semantic alignment for graph convolution network. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 16775–16784
37. Rippel O, Paluri M, Dollár P, Bourdev LD (2016) Metric learning with adaptive density discrimination. In: International Conference on Learning Representations
38. Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86
39. Zheng M, You S, Wang F, Qian C, Zhang C, Wang X, Xu C (2021) ReSSL: relational self-supervised learning with weak augmentation. In: Annual Conference on Neural Information Processing Systems, pp 2543–2555
40. Sen P, Namata G, Bilgic M, Getoor L, Gallagher B, Eliassi-Rad T (2008) Collective classification in network data. AI Magazine 29(3):93–106
41. McAuley JJ, Targett C, Shi Q, van den Hengel A (2015) Image-based recommendations on styles and substitutes. In: International Conference on Research and Development in Information Retrieval, pp 43–52
42. Shchur O, Mumme M, Bojchevski A, Gunnemann S (2018) Pitfalls of graph neural network evaluation. CoRR abs/1811.05868
43. Peng Z, Huang W, Luo M, Zheng Q, Rong Y, Xu T, Huang J (2020) Graph representation learning via graphical mutual information maximization. In: International World Wide Web Conferences, pp 259–270
44. Defferrard M, Bresson X, Vandergheynst P (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In: Annual Conference on Neural Information Processing Systems, pp 3837–3845
45. Van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(11):2579–2605
46. Demsar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
47. Chauhan S, Singh M, Aggarwal AK (2023) Designing of optimal digital IIR filter in the multi-objective framework using an evolutionary algorithm. Eng Appl Artif Intell 119:105803

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Xinxin Liu received a master’s degree in computer applied technology from Minnan Normal University, Zhangzhou, China, in 2020. She is currently pursuing the Ph.D. degree in computer science and technology at Nanjing University of Science and Technology, Nanjing, China. Her research interests include graph data mining and machine learning.

Weiren Yu received the Ph.D. degree in computer science from the University of New South Wales, Sydney NSW, Australia. After that, he spent two years as a Postdoctoral Researcher at Imperial College, London, U.K., and then a Lecturer of Computer Science with Aston University, Birmingham, U.K. He is currently an Assistant Professor of Computer Science with the University of Warwick, Coventry, U.K., and an Honorary Visiting Fellow with Imperial College London. His current research interests include graph data management, data mining, and Internet of Things.
