Automated Unsupervised Graph Representation Learning
Abstract—Graph data mining has largely benefited from the recent developments of graph representation learning. Most attempts to improve graph representations have thus far focused on designing new network embedding or graph neural network (GNN) architectures. Inspired by the SGC and ProNE models, we instead focus on enhancing any existing or learned graph representations by further smoothing them via graph filters. In this paper, we introduce an automated framework, AutoProNE, to achieve this. Specifically, AutoProNE automatically searches for a unique optimal set of graph filters for any input dataset, and the dataset's existing representations are then smoothed via the selected filters. To make AutoProNE more general, we adopt self-supervised loss functions to guide the optimization of the automated search process. Extensive experiments on eight commonly used datasets demonstrate that the AutoProNE framework can consistently improve the expressive power of graph representations learned by existing network embedding and GNN methods by up to 44%. AutoProNE is also implemented in CogDL, an open source graph learning library, to help boost more algorithms.
Fig. 1. The overview of the AutoProNE framework. The input is the node embeddings learned by any other network embedding or GNN method, and the output is the smoothed embeddings with enhanced expressive power. AutoProNE includes two components: (1) Parameter selection: the AutoML parameter generator automatically selects graph filters and the corresponding hyperparameters. (2) Embedding propagation: once the graph filters are selected, they are used to smooth the input embeddings. In the example, the PPR and Gaussian graph filters, as well as their parameters, are selected for the specific input graph, and as a result each node pays attention to its indirect neighbors.
self-supervised manner. Second, rather than a fixed Gaussian filter, we would like to answer the question of whether there exist better graph filters for propagating existing embeddings. Finally, the natural need is to avoid the manual design or choice of graph filters for each dataset, that is, can we automatically find the best filters for each specific graph?

Solutions. To address these issues, we present the AutoProNE framework to automatically find the appropriate graph filters to smooth existing graph representations in a self-supervised manner. Different from ProNE, which uses a fixed Gaussian kernel for all graphs, the proposed framework leverages the idea of AutoML to search for the best filter(s) for the given graph, such as low-pass, band-pass, and signal-rescaling filters. Moreover, rather than using supervised signals to guide the AutoML process, the optimization of AutoProNE is built upon the recent advancements in self-supervised graph representation learning, that is, AutoProNE adopts self-supervised loss to direct the AutoML optimization. Fig. 1 illustrates the framework of AutoProNE.

We conduct extensive experiments on eight publicly available datasets. By using both network embedding methods without input features (e.g., DeepWalk) and graph neural networks with node features (e.g., DGI), we show that the AutoProNE framework can consistently and significantly enhance the expressive power of graph representations learned by these base methods. For example, the simple, automated, and unsupervised AutoProNE can boost the node classification performance of LINE's representations by 34–44% on the PPI dataset. Furthermore, we show that the graph filters automatically decided by the AutoML process of AutoProNE are always among the best options across all different datasets. Finally, we find that AutoProNE requires only 2–4% of the running time of DeepWalk to offer outperformance by up to 8%, and we also demonstrate that its scalability is linear in the number of nodes in the graph, enabling it to handle large-scale data. The code is available at https://github.com/THINK2TRY/AutoProNE and in CogDL [17] (https://github.com/THUDM/cogdl), an open source graph learning library, to make it more convenient to collaborate with existing graph representation algorithms.

Contribution. The main contributions of this work include:

- We investigate the role of graph filters on unsupervised graph representation learning and provide insights into various graph filters.
- We propose a comprehensive framework AutoProNE that integrates automatic machine learning and graph representation learning.
- We conduct extensive experiments to evaluate our framework. The results demonstrate the effectiveness of AutoProNE, where our framework achieves significant improvements over various methods and datasets.

2 PRELIMINARIES

2.1 Concepts

Graph Notations. Let $G = (V, E, X)$ denote an undirected graph with $V$ as the set of its $n$ nodes and $E$ as its edges, and $X \in R^{n \times d}$ as the feature matrix of $V$. Given $G$, we have its (symmetric) adjacency matrix as $A = (a_{ij})$ with $a_{ij} = 1$ if and only if there exists an edge between $v_i$ and $v_j$, as well as its degree matrix $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$ with $d_i = \sum_j a_{ij}$. In practice, the row-normalized adjacency matrix $\hat{A}_{rw} = D^{-1}A$ and the symmetric normalized adjacency matrix $\hat{A}_{sym} = D^{-1/2}AD^{-1/2}$ are more commonly used. In this work, we simplify both $\hat{A}_{rw}$ and $\hat{A}_{sym}$ as $\hat{A}$.

Network Embedding. The problem of network embedding aims to learn a mapping function $f: V \to R^d$ that projects each node to a $d$-dimensional space ($d \ll |V|$). Network embedding methods mainly focus on neighborhood similarity and capture the structural properties of
the network. Extensive studies have shown that the learned node representations can benefit various graph mining tasks, such as node classification and link prediction. DeepWalk, LINE, Node2Vec and NetMF are all network embedding methods.

Graph Signal Processing. In this part, we introduce a recent formulation [18] of graph signal processing [19], [20] on irregular graphs. The Laplacian matrix of graph $G$ is defined as $L = D - A$, and $\hat{L} = I_n - \hat{A}$ is the augmented normalized Laplacian. A vector $x \in R^n$ defined on the nodes of $G$ is called a graph signal. A useful property of the graph Laplacian $L$ is that its quadratic form measures the smoothness of the graph signal. It is easy to verify that

$$x^\top L x = \frac{1}{2}\sum_{i,j} A_{ij}(x_i - x_j)^2 = \sum_{(i,j)\in E}(x_i - x_j)^2. \quad (1)$$

Let $U = [u_1, \ldots, u_n]^\top \in R^{n\times n}$ be the eigenvectors of $\hat{L}$, and we have $\hat{L} = U\Lambda U^\top$, where $\Lambda = \mathrm{diag}[\lambda_1, \ldots, \lambda_n]$ contains the eigenvalues of $\hat{L}$ corresponding to $U$. The graph Fourier transform $\mathcal{F}: R^n \to R^n$ is defined by $\mathcal{F}x = \tilde{x} = U^\top x$, and the inverse graph Fourier transform $\mathcal{F}^{-1}$ is $\mathcal{F}^{-1}\tilde{x} = x = U\tilde{x}$, because $\mathcal{F}^{-1}\mathcal{F} = UU^\top = I_n$.

A graph filter is defined based on the graph Fourier transform. Let $g: R \to R$ be a function mapping $x$ to $y$. In the frequency domain, the graph filter specified by $g$ is defined by the relation $\tilde{y} = g(\Lambda)\tilde{x}$, that is, $\tilde{y}(\lambda_i) = g(\lambda_i)\tilde{x}(\lambda_i)$. In the spatial domain, the above equation is equivalent to $y = g(\hat{L})x$.

For multi-dimensional cases, the graph Fourier transform is $\mathcal{F}X = \tilde{X} = U^\top X$ and the inverse form is $\mathcal{F}^{-1}\tilde{X} = X = U\tilde{X}$. Then the graph filter is represented as

$$Y = g(\hat{L})X = Ug(\Lambda)U^\top X. \quad (2)$$

Generally, for computational efficiency, Taylor or Chebyshev approximation is applied to avoid the eigendecomposition of $\hat{L}$. Without loss of generality, let $T \in R^{n\times n}$ be the transition matrix and $\theta_i$ be the weight coefficients. The $k$-order expansion is represented as

$$g(\hat{L}) \approx \sum_{i=0}^{k} \theta_i T^i \in R^{n\times n}. \quad (3)$$

Graph Convolution. In spectral graph convolution networks, the graph convolution operation is fast approximated with the layer-wise propagation rule [1]. GCN simplifies Eq. (3) by only keeping the first two items with $\theta_0 = -\theta_1$:

$$g(\hat{L}) = \theta_0 I_n + \theta_1 \hat{A} = \theta(I_n + \hat{A}). \quad (4)$$

$I_n$ is an identity matrix. The eigenvalues of $I_n + \hat{A}$ are in the range $[0, 2]$. To alleviate the problem of numerical instability and exploding/vanishing gradients when used in deep neural networks, the following renormalization trick is introduced:

$$I_n + \hat{A} \rightarrow \tilde{D}^{-1/2}(I_n + A)\tilde{D}^{-1/2} = \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2},$$

where $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. Furthermore, multiple layers are stacked to recover a rich class of convolutional filter functions, and each layer is followed by a linear transformation and a point-wise non-linearity. Then the one-layer graph convolution becomes as follows:

$$H^{(i+1)} = \sigma(\tilde{A}H^{(i)}\Theta^{(i)}), \quad (5)$$

where $\Theta^{(i)}$ is a layer-specific trainable matrix and $H^{(i)} \in R^{n\times d}$ is the $d$-dimensional hidden node representation in the $i$th layer.

2.2 ProNE

In this part, we give a brief introduction to ProNE. ProNE is a fast network embedding method based on matrix factorization. First, it formulates network embedding as a sparse matrix factorization for efficient representation transformation. Second, it utilizes a Gaussian graph filter for representation smoothing to improve the representation.

Network Embedding as Sparse Matrix Factorization. ProNE leverages an edge to represent a node-context pair. Let $r_i, c_i \in R^d$ be the node embedding and context vectors of node $v_i$, respectively. The concurrence probability of context $v_j$ given $v_i$ is

$$\hat{p}_{i,j} = \sigma(r_i^\top c_j), \quad (6)$$

where $\sigma(\cdot)$ is the sigmoid function. To maximize the concurrence probability and avoid the trivial solution ($r_i = c_i$ and $\hat{p}_{i,j} = 1$) for each pair, the objective can be expressed as the weighted sum of log loss over all edges accompanied with negative sampling:

$$\mathrm{Loss} = -\sum_{(i,j)\in E}\big[p_{i,j}\ln\sigma(r_i^\top c_j) + \tau P_{E,j}\ln\sigma(-r_i^\top c_j)\big], \quad (7)$$

where $p_{i,j} = A_{i,j}/D_{i,i}$ indicates the weight of the pair $(v_i, v_j)$, $\tau$ is the negative sampling ratio, and $P_{E,j} \propto (\sum_{i:(i,j)\in E} p_{i,j})^{\alpha}$ with $\alpha = 1$ or $0.75$. Minimizing the objective is equivalent to setting the partial derivative with respect to $r_i^\top c_j$ to zero, and hence we get

$$r_i^\top c_j = \ln p_{i,j} - \ln(\tau P_{E,j}). \quad (8)$$

ProNE proposes to define a proximity matrix $M$ with each entry as $r_i^\top c_j$, which represents the similarity between the embedding of $v_i$ and the context embedding of $v_j$:

$$M_{i,j} = \begin{cases} \ln p_{i,j} - \ln(\tau P_{E,j}), & (v_i, v_j) \in E \\ 0, & (v_i, v_j) \notin E. \end{cases}$$

Now that the relationship matrix is built, the objective is transformed into matrix factorization. ProNE uses fast randomized tSVD to achieve fast sparse matrix factorization and finally gets $\hat{X} \in R^{N\times d}$ as the node embedding matrix.

Spectral Propagation. To address the representation smoothing problem, ProNE proposes to leverage a Gaussian graph filter as the smoothing function $f(\hat{A})$. ProNE designs the graph filter as $g(\lambda) = e^{-\frac{1}{2}[(\lambda-\mu)^2-1]\theta}$ and formulates the transformation as the following rule:

$$f(\hat{A}) = D^{-1}A(I_n - \tilde{L}). \quad (9)$$
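The polynomial filtering of Eqs. (2)–(3) and the smoothing step above can be computed without any eigendecomposition by repeatedly multiplying a sparse normalized adjacency matrix against the embedding matrix. The following is a minimal NumPy/SciPy sketch of this idea; the symmetric normalization and the specific coefficients are illustrative assumptions rather than the exact ProNE kernel.

```python
import numpy as np
import scipy.sparse as sp

def sym_normalized_adjacency(A):
    """Compute D^{-1/2} A D^{-1/2} for a sparse, symmetric adjacency matrix A."""
    deg = np.asarray(A.sum(axis=1)).flatten()
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    D_inv_sqrt = sp.diags(d_inv_sqrt)
    return D_inv_sqrt @ A @ D_inv_sqrt

def polynomial_smoothing(A, X, theta):
    """Apply a k-order polynomial filter g ~ sum_i theta_i T^i (Eq. (3)) to the
    embedding matrix X using only repeated sparse-dense products."""
    A_hat = sym_normalized_adjacency(A)
    term = X.copy()              # T^0 X
    Y = theta[0] * term
    for theta_i in theta[1:]:
        term = A_hat @ term      # T^i X = A_hat (T^{i-1} X)
        Y = Y + theta_i * term
    return Y

# Toy usage: smooth random 16-d embeddings of a 4-node cycle graph.
A = sp.csr_matrix(np.array([[0, 1, 0, 1],
                            [1, 0, 1, 0],
                            [0, 1, 0, 1],
                            [1, 0, 1, 0]], dtype=float))
X = np.random.default_rng(0).normal(size=(4, 16))
H = polynomial_smoothing(A, X, theta=[0.5, 0.3, 0.2])   # assumed coefficients
```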
Algorithm 1. The AutoProNE Algorithm

Input: Normalized adjacency matrix $\hat{A}$; input embeddings $\hat{X}$; search iterations $N$.
Output: Enhanced embeddings $H$ with the same dimension size as $\hat{X}$.
1: for $i = 1$ to $N$ do
2:   Select filters $GFs$ from {PPR, Heat, Gaussian, SR}
3:   Let $Z$ be an empty list: $Z = [\,]$
4:   for each $gf \in GFs$ do
5:     $H_{gf} = \mathrm{Propagation}(\hat{A}, \hat{X}, gf)$
6:     Append $H_{gf}$ into $Z$
7:   end for
8:   Concatenate the smoothed embeddings in $Z$ to obtain $Z_r$
9:   Conduct dimension reduction on $Z_r$ to get $H$
10:  Calculate the corresponding loss and optimize the parameters
11: end for

3.3 Automated Graph Filter Selection

We introduce the design choices of the core components in AutoProNE: Search Space Design and Unsupervised Loss Function.

3.3.1 Search Space Design

In graph signal processing, different graph filters have been designed for modulating graph signals. We adopt three existing simple and efficient graph filters—PPR, Heat kernel, and Gaussian kernel—that have been widely used in graph representation learning [16], [24], [25], and also propose a new filter based on signal rescaling. The four graph filters are summarized in Table 1.

PPR [26]. Personalized PageRank (PPR) is defined based on random walk with restart and reflects the probability that a random walk starting from one node reaches another. PPR is defined as $p(x) = (1-\alpha)\hat{A}p(x) + \alpha x$, and its closed-form matrix is $A_p = \alpha(I_n - (1-\alpha)\hat{A})^{-1}$. To avoid the high computational cost of matrix inversion, we use Taylor expansion to approximate $A_p$, that is,

$$A_p \approx \sum_{i=0}^{k} \theta_p^{(i)} \hat{A}^i, \quad \text{where } \theta_p^{(i)} = \alpha(1-\alpha)^i. \quad (14)$$

Heat Kernel [27]. The heat kernel is often used in heat conduction and diffusion, and represents the evolution of temperature in a region whose boundary is held fixed at a

the Gaussian graph filter is similar to the heat graph filter:

$$L_g = Uf_g(\Lambda)U^\top = U\,\mathrm{diag}\big(f_g(\lambda_1), \ldots, f_g(\lambda_n)\big)\,U^\top. \quad (16)$$

Signal Rescaling. In addition to the three existing filters, we also propose a signal rescaling filter. The intuition is that structural information plays a central role in networks. An intuitive example is that nodes of a high degree may be more important and influential in a network. To capture this phenomenon, we can attenuate node signals (and perform renormalization) before propagation, that is,

$$A_s = \mathrm{Normalize}\big(\hat{A}\,\sigma(D^{-1})\big), \quad (17)$$

where $\sigma(x) = \frac{1}{1+e^{-x}}$.

Chebyshev Expansion for Efficiency. For both the heat kernel and the Gaussian kernel, we utilize the Chebyshev expansion and the Bessel function [29] to avoid explicit eigendecomposition. Chebyshev polynomials of the first kind are defined as $T_{i+1}(x) = 2xT_i(x) - T_{i-1}(x)$ with $T_0(x) = 1$, $T_1(x) = x$. Then the expression of a general graph filter can be expanded as

$$U\sum_{i=0}^{k} c_i(\theta)T_i(\tilde{\Lambda})U^\top = \sum_{i=0}^{k} c_i(\theta)T_i(\tilde{L}), \quad (18)$$

where specifically $\tilde{\Lambda} = \Lambda$, $\tilde{L} = \hat{L}$ for the heat kernel and $\tilde{\Lambda} = \frac{1}{2}(\Lambda - \mu I)^2$, $\tilde{L} = \frac{1}{2}(\hat{L} - \mu I)^2$ for the Gaussian kernel. The coefficient $c_i(\theta)$ can be obtained with the Bessel function

$$c_i(\theta) = \frac{\beta}{\pi}\int_{-1}^{1}\frac{T_i(x)e^{-x\theta}}{\sqrt{1-x^2}}\,dx = \beta(-1)^i B_i(\theta), \quad (19)$$

where $\beta = 1$ if $i = 0$ and $\beta = 2$ otherwise, and $B_i(\theta)$ is the modified Bessel function of the first kind [29]. Then the truncated expansion of the filter becomes as follows:

$$L_g \approx B_0(\theta)T_0(\tilde{L}) + 2\sum_{i=1}^{k-1}B_i(\theta)T_i(\tilde{L}). \quad (20)$$

This truncated Chebyshev expansion provides a good approximation of the exponential kernel with a fast convergence rate.

Taylor expansion and Chebyshev expansion are common approximation methods. [30] adopts another local spectral embedding method which utilizes random vectors and Gauss-Seidel iteration to avoid explicit eigendecomposition of the adjacency matrix.
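Continuing the illustration, the next sketch mirrors the outer loop of Algorithm 1 with a simple random search: smooth the input embeddings with sampled filters and hyperparameters, concatenate and reduce the candidates, and keep the best-scoring one. The scoring function is a placeholder for the self-supervised (Infomax/InfoNCE-style) losses that guide AutoProNE, and the PPR-only search space is a deliberate simplification.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

def ppr_smooth(A_hat, X, alpha, k=4):
    """Truncated PPR filter (Eq. (14)) used as a single search-space entry."""
    term, H = X.copy(), alpha * X
    for i in range(1, k + 1):
        term = A_hat @ term
        H = H + alpha * (1.0 - alpha) ** i * term
    return H

def auto_filter_search(A_hat, X, score_fn, n_iters=100, seed=0):
    """Random-search sketch of the Algorithm 1 loop; score_fn stands in for the
    self-supervised objective used by AutoProNE."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    best_H, best_score = X, -np.inf
    for _ in range(n_iters):
        # Step 2: sample filters and hyperparameters (here: one or two PPR variants).
        alphas = rng.uniform(0.1, 0.9, size=rng.integers(1, 3))
        # Steps 4-7: smooth the input embeddings with every selected filter.
        Z = np.hstack([ppr_smooth(A_hat, X, a) for a in alphas])
        # Steps 8-9: concatenate the candidates and reduce back to d dimensions.
        H = TruncatedSVD(n_components=d).fit_transform(Z) if Z.shape[1] > d else Z
        # Step 10: evaluate the unsupervised loss and keep the best candidate.
        score = score_fn(H)
        if score > best_score:
            best_H, best_score = H, score
    return best_H
```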
merely row normalization, as we can see from the figures, PPR, HeatKernel and Gaussian kernel enlarge the receptive field, assign different weights to neighbors, and also pay attention to non-neighbor nodes. They do show different forms when capturing the characteristics of higher-order neighbors. In addition, the three filters assign high weights to the node itself even without adding self-loops. Signal rescaling adjusts weights according to the node degrees of neighbors.

Fig. 2 shows the impact of the hyperparameters on the filters. It can be observed that PPR, HeatKernel and Gaussian Kernel are sensitive to hyperparameters, which mainly determine how a node aggregates the features of its neighbors. For PPR, the bigger the restart probability $\alpha$, the more attention a node pays to neighbor nodes. The Gaussian and heat kernels are more complex in the spatial domain.

4.2 Connection With Graph Partition

Graph partition aims to reduce a graph into a set of smaller graphs by partitioning its set of nodes into mutually exclusive groups. In many cases, graphs have good locality, and a node often tends to have similar features and be in the same class as its neighbors. This assumption is also the basis of many algorithms like DeepWalk, LINE and GCN. Therefore, the locality and connectivity of a graph are highly related to the effectiveness of the Propagation in our model. Luckily, high-order Cheeger's inequality [34], [35] suggests that the eigenvalues of the Laplacian matrix are closely associated with a graph's locality and connectivity.

The effect of graph partition can be measured by graph conductance (a.k.a. the Cheeger constant). For a node subset $S \subseteq V$, the volume of $S$ is $\mathrm{vol}(S) = \sum_{x\in S} d(x)$, where $d(x)$ is the degree of node $x$. The edge boundary of $S$ is $e(S) = \{(x, y) \in E \mid x \in S, y \notin S\}$. Then the conductance of $S$ is defined to be

$$\Phi(S) = \frac{|e(S)|}{\min(\mathrm{vol}(S), \mathrm{vol}(\bar{S}))}.$$

To measure the conductance of the whole graph $G$ when partitioned into $k$ parts, let $S_1, \ldots, S_k \subseteq V$ and $S_i \cap S_j = \emptyset$ if $i \neq j$. The $k$-way Cheeger constant is defined as

$$\rho_G(k) = \min\{\max\{\Phi(S_i)\}, i = 1, \ldots, k\}.$$

$\rho_G(k)$ is positively correlated with graph connectivity: if a graph has higher conductance (bigger $\rho_G(k)$), it is better connected. The high-order Cheeger inequality bridges the gap between graph partition and graph spectrum and shows that the eigenvalues of the Laplacian matrix control the bounds of the $k$-way Cheeger constant as follows:

$$\frac{\lambda_k}{2} \leq \rho_G(k) \leq O(k^2)\sqrt{\lambda_k}. \quad (25)$$

In spectral graph theory, the number of connected components in an undirected graph is equal to the multiplicity of the eigenvalue zero in the graph Laplacian. Eq. (25) indicates that we can increase (decrease) the graph conductance $\rho_G(k)$ by increasing (decreasing) the lower bound $\lambda_k$. For example, low-pass filters increase $\lambda_k$ for small $k$ and decrease the eigenvalues for big $k$. As a result, different parts become more isolated when the graph is partitioned into different groups, thus improving the locality of the graph and benefiting node classification. In practical applications, different types of graph filters can be selected to fit the characteristics of graphs.

4.3 Complexity

The computation of the graph filters in Table 1 can be efficiently executed in a recurrent manner. Let $n = |V|$ and $m = |E|$. With Taylor and Chebyshev expansion, we only apply repeated sparse matrix-matrix multiplication (SPMM) between a sparse $n \times n$ matrix and a dense $n \times d$ feature matrix, with time complexity $O(md)$, and avoid explicit eigendecomposition with complexity $O(n^3)$. In addition, the Intel Math Kernel Library (MKL) provides efficient SPMM operators which can handle graphs of millions or even billions of nodes [36]. This makes it possible for AutoProNE to handle large-scale graph data.

For Taylor expansion, we denote $X^{(i)} = \theta_i \hat{X}^{(i)}$, $\hat{X}^{(i)} = \hat{A}\hat{X}^{(i-1)}$ with $\hat{X}^{(0)} = \hat{X}$. For Chebyshev expansion, we denote $X^{(i+1)} = c_i(\theta)\hat{X}^{(i)}$, $\hat{X}^{(i)} = 2\hat{L}\hat{X}^{(i-1)} - \hat{X}^{(i-2)}$ with $\hat{X}^{(0)} = \hat{X}$. As $\hat{L}$ and $\hat{A}$ are both sparse matrices, the complexity of Propagation is $O(kdm)$. The dimension reduction (SVD) on the small matrix is $O(nd^2)$, and the complexity of computing the Infomax and InfoNCE losses is also $O(nd^2)$, as it only involves element-wise multiplication of matrices. Altogether, the complexity of AutoProNE is $O(nd^2 + kdm)$.

5 EXPERIMENTS

We conduct extensive experiments to evaluate the effectiveness and efficiency of the proposed AutoProNE framework. Specifically, we examine to what extent AutoProNE can improve both shallow network embedding methods (without input feature matrix) and graph neural networks (with input feature matrix).

5.1 Experiment Setup

5.1.1 Datasets

We use eight datasets that are commonly used for evaluating graph representation learning, six of which are relatively small scale but widely used in the graph representation learning literature, including BlogCatalog, PPI, Wiki, Cora, Citeseer and Pubmed. DBLP and Youtube are relatively large scale networks. Table 2 lists their statistics.

We use the following five datasets without input node features:

- BlogCatalog [37] is a network of social relationships of online bloggers. The vertex labels represent the interests of the bloggers.
- PPI [38] is a subgraph of the PPI network for Homo Sapiens. The vertex labels are obtained from the hallmark gene sets and represent biological states.
- DBLP [39] is an academic citation network where authors are treated as nodes and their dominant conferences as labels.
- Wiki [40] is a word co-occurrence network in part of the Wikipedia dump. Node labels are the Part-of-Speech tags.
- Youtube [37] is a video-sharing website that allows users to upload, view, rate, share, add to their
TABLE 3
Micro-F1 Results of Node Classification by Different Algorithms w/ and w/o AutoProNE
Datasets Ratio DeepWalk ProDeepWalk LINE ProLINE node2vec ProNode2vec NetMF ProNetMF HOPE ProHOPE
BlogCatalog 0.1 35.6 36.2 (+1.7%) 30.3 32.3 (+6.6%) 35.7 35.7 (0.0%) 35.6 36.8 (+3.4%) 30.4 33.5 (+10.2%)
0.5 40.5 41.8 (+3.2%) 37.1 38.3 (+3.2%) 40.6 41.5 (+2.2%) 41.1 41.9 (+2.0%) 34.0 37.4 (+10.0%)
0.9 42.3 44.2 (+4.5%) 39.6 40.5 (+2.3%) 41.8 43.9 (+5.0%) 41.6 42.3 (+1.7%) 35.3 38.3 (+8.5%)
PPI 0.1 16.7 17.6 (+5.4%) 12.5 17.0 (+34.9%) 16.1 17.5 (+8.7%) 17.9 18.0 (+0.6%) 16.0 17.3 (+8.1%)
0.5 21.6 24.7 (+14.3%) 16.4 23.7 (+44.5%) 20.6 24.1 (+17.0%) 23.1 24.6 (+6.5%) 21.0 23.3 (+10.9%)
0.9 24.0 27.0 (+12.5%) 19.5 26.2 (+34.4%) 23.1 25.7 (+11.2%) 25.5 26.5 (+3.9%) 23.2 24.9 (+7.3%)
Wiki 0.1 43.3 44.5 (+2.8%) 41.8 45.8 (+9.6%) 44.8 44.2 (-1.3%) 45.7 45.9 (+0.5%) 48.8 48.0 (+2.4%)
0.5 49.2 50.0 (+1.6%) 52.5 53.2 (+1.3%) 51.1 50.9 (-0.3%) 50.1 50.9 (+1.6%) 53.1 52.8 (-0.5%)
0.9 50.0 51.4 (+2.8%) 54.7 55.0 (+0.5%) 52.8 52.5 (-0.5%) 50.7 51.9 (+2.4%) 53.1 54.3 (+0.4%)
DBLP 0.01 51.5 55.8 (+8.3%) 49.7 52.4 (+5.4%) 53.4 57.8 (+8.2%) 51.5 52.9 (+2.7%) - -
0.05 58.1 59.0 (+1.5%) 54.9 56.2 (+2.4%) 58.3 60.0 (+2.9%) 57.1 59.5 (+4.2%) - -
0.09 59.4 59.9 (+0.9%) 56.3 57.0 (+1.2%) 59.5 60.6 (+1.8%) 57.9 60.2 (+4.0%) - -
Youtube 0.01 38.2 39.2 (+2.6%) 33.2 39.8 (+19.8%) 38.2 39.7 (+3.9%) - - - -
0.05 41.6 44.7 (+6.0%) 36.2 43.5 (+20.1%) 40.0 45.3 (+12.2%) - - - -
0.09 42.8 46.2 (+7.9%) 38.3 45.9 (+19.8%) 43.0 47.1 (+9.5%) - - - -
Ratio indicates the percentage of labeled data. Numbers in the brackets indicate performance improvements and all are statistically significant (p-value < 0.01; t-test).
and repeat the training and predicting for ten times and report the average Micro-F1 for all methods.

We observe that the performance of all the base algorithms can be significantly improved by the AutoProNE framework, and the improvements are statistically significant. For DeepWalk, LINE, node2vec, NetMF, and HOPE, the improvements brought by AutoProNE are up to 14.3%, 44.5%, 17%, 6.5%, and 10.9%, respectively. On average, LINE benefits more from AutoProNE as it is in nature an embedding method without incorporating high-order structural information.

Table 4 reports the results of unsupervised GraphSAGE and DGI as well as the AutoProNE versions of them. As a reference point, we also list the performance of two popular semi-supervised GNNs—GCN and GAT. We observe that the unsupervised AutoProNE framework can help improve the performance of both DGI and GraphSAGE in most cases. This suggests that by automated usage of graph filters, the simple AutoProNE strategy is capable of enhancing graph representation learning with or without input node features. With the help of AutoProNE, DGI can even yield better performance than the end-to-end semi-supervised GCN and GAT models on Pubmed.

Finally, we observe that AutoProNE prefers the newly proposed signal rescaling (SR) filter for BlogCatalog, PPI, Wiki, DBLP and Youtube, while it tends to favor the Gaussian kernel for Cora, Citeseer, and Pubmed. We leave the reason behind this difference for future work.

TABLE 4
Accuracy Results of Node Classification by Unsupervised GNNs

Dataset                      Cora         Pubmed       Citeseer
Semi-supervised  GCN         81.5         79.0         70.3
                 GAT         83.0 ± 0.7   79.0 ± 0.3   72.5 ± 0.7
Unsupervised     DGI         82.0 ± 0.1   77.1 ± 0.1   71.7 ± 0.2
                 ProDGI      82.9 ± 0.2   81.0 ± 0.1   70.8 ± 0.2
                 GraphSAGE   77.2 ± 0.1   78.0 ± 0.1   61.2 ± 0.1
                 ProSAGE     78.1 ± 0.2   79.5 ± 0.1   62.1 ± 0.2

Significance test (p-value < 0.01; t-test).

TABLE 5
Results of Base Embedding Methods With Different Graph Filters

Dataset      Type        DeepWalk  LINE   node2vec  NetMF  HOPE
BlogCatalog  Heat        41.4      38.3   41.0      41.6   34.6
             PPR         41.7      38.8   41.3      41.8   35.6
             Gaussian    41.4      38.4   41.1      41.3   37.5
             SR          41.9      38.7   40.9      41.6   34.5
             AutoProNE   41.8      38.3   41.5      41.9   37.4
PPI          Heat        23.4      21.1   22.8      24.0   21.7
             PPR         23.9      22.7   23.6      24.3   22.3
             Gaussian    23.2      21.3   22.8      23.7   21.1
             SR          24.5      23.0   24.8      25.0   23.0
             AutoProNE   24.7      23.7   24.1      24.6   23.3
Wiki         Heat        48.2      52.8   50.5      47.4   53.1
             PPR         48.4      52.6   49.9      48.1   53.2
             Gaussian    48.2      52.6   50.3      47.2   53.0
             SR          49.0      54.1   52.2      50.7   53.0
             AutoProNE   50.0      53.2   50.9      50.9   52.8
DBLP         Heat        58.8      54.7   59.4      58.8   -
             PPR         59.0      55.4   59.7      59.0   -
             Gaussian    58.9      54.7   59.7      58.5   -
             SR          58.7      54.1   59.7      58.9   -
             AutoProNE   59.0      56.2   60.0      59.5   -
Youtube      Heat        44.5      42.7   44.7      -      -
             PPR         44.6      43.5   45.1      -      -
             Gaussian    44.3      40.5   44.3      -      -
             SR          44.5      43.0   45.1      -      -
             AutoProNE   44.7      43.5   45.3      -      -

Ratio = 0.5 for BlogCatalog, PPI, and Wiki; 0.05 for DBLP and Youtube.

5.2.2 The Role of AutoML

Tables 5 and 6 report the performance of the base methods with each of the four graph filters in AutoProNE's search space. We can observe that different filters have unique impacts on the performance across datasets. For example, Signal
Rescaling (SR) and PPR perform relatively better than the other filters on the PPI, BlogCatalog and Wiki datasets, but for Cora, Citeseer and Pubmed, the Gaussian filter exhibits better performance. The tables also show that the performance of AutoProNE either equals the best result of a single filter, which may imply that AutoProNE finally picks this filter, or exceeds that of any single filter, which means AutoProNE learns a better combination of graph filters. These all suggest the need for AutoML to select the best filter(s) for each dataset. The proposed AutoProNE strategy consistently and automatically picks the optimal filter(s) and parameters in an unsupervised manner.

The graph filter used for the embedding smoothing in ProNE [16] is a modified 2nd-order Gaussian kernel. For simplicity, the 1st-order Gaussian kernel is covered in AutoProNE's search space. Table 7 reports the performance of ProNE(SMF) enhanced by the ProNE graph filter and by AutoProNE, respectively. We can see that the automated search strategy empowers AutoProNE to generate better performance than ProNE in most cases, further demonstrating the effectiveness of using AutoML for smoothing graph representations.

From the experiment results of AutoML searching, we also have some interesting findings. PPR, Heat kernel and

TABLE 6
Results of Base GNNs With Different Graph Filters

TABLE 7
Results of ProNE and AutoProNE
With ProNE and With AutoProNE mean the result of ProNE(SMF) improved by the graph filter of ProNE and of AutoProNE, respectively.

5.2.3 Efficiency and Scalability

We follow the common practice for efficiency evaluation by the wall-clock time, and AutoProNE's scalability is analyzed by the time cost on networks of multiple scales [5]. Tables 8 and 9 report the extra running time (seconds) when stacked with AutoProNE for 100 searching iterations in 10 threads. Note that AutoProNE is a dataset-specific and base-agnostic framework. The percentage of extra running time in terms of DeepWalk and DGI is also reported inside the brackets, respectively.

We can observe that AutoProNE is very efficient compared to the base methods. Overall, AutoProNE costs only 2.3–4.7% of DeepWalk's or DGI's running time. Take the Youtube graph as an example: it takes DeepWalk 68,272 seconds (≈19 hours) to generate the embeddings of its 1,000,000+ nodes. However, AutoProNE only needs 4.7% of its time to offer 2.6–7.9% performance improvements (cf. Table 3). For convolutional methods, GNNs are usually less efficient for large scale datasets. This suggests that AutoProNE will take an even smaller additional percentage of time to achieve improvement.

We use synthetic networks to demonstrate the scalability of AutoProNE. We generate random regular graphs with a

TABLE 8
Efficiency of AutoProNE (Seconds)

TABLE 9
Efficiency of AutoProNE (Seconds)
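The scalability study above relies on synthetic random regular graphs of increasing size. The following is a minimal sketch of such a benchmark, assuming NetworkX for graph generation; the node counts, the fixed degree of 10, and timing a single propagation step are assumptions for illustration only, since the sentence describing the generated graphs is truncated here.

```python
import time
import networkx as nx
import numpy as np

# Assumed settings: 10-regular graphs with growing node counts and
# 128-dimensional random embeddings (not the paper's exact configuration).
rng = np.random.default_rng(0)
for n in (10_000, 50_000, 100_000):
    G = nx.random_regular_graph(d=10, n=n, seed=0)
    A = nx.to_scipy_sparse_array(G, format="csr", dtype=float)
    X = rng.normal(size=(n, 128)).astype(np.float32)
    start = time.time()
    H = A @ X   # one SPMM propagation step; a full filter repeats this k times
    print(f"n={n}: one propagation step took {time.time() - start:.3f}s")
```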
technique for finding the globally optimal solution of an optimization problem, and this technique is widely used especially in hyperparameter optimization. As AutoProNE mainly aims to search for the best hyperparameters for graph filters, Bayesian optimization is the principle behind the searching. AutoProNE is an unsupervised and task-independent model that aims to pre-train general embeddings.

7 CONCLUSION

In this paper, we investigate the role of graph filters and propose an automated and unsupervised framework, AutoProNE, to generate improved graph embeddings for unsupervised graph representation learning. AutoProNE comprises four graph filters (PPR, Heat Kernel, Gaussian Kernel and Signal Rescaling) and automatically searches for a combination of graph filters and corresponding hyperparameters for the given dataset. Specifically, AutoProNE operates on the adjacency matrix to enrich the context information of each node. It is a flexible and adaptive framework, as the graph filter set can simulate many kinds of filtering functions. It is also very efficient and costs only a little extra time to obtain the improvement in performance.

The developed method is model-agnostic and can be easily stacked with all unsupervised graph representation learning methods such as DeepWalk, LINE, node2vec, NetMF, and HOPE. On eight publicly available datasets, AutoProNE helps significantly improve the performance of various algorithms. In addition, AutoProNE can also enhance the performance of self-supervised/unsupervised GNN methods, e.g., DGI and GraphSAGE. We show that the self-supervised DGI model with the unsupervised AutoProNE can generate comparable or even better performance than semi-supervised end-to-end GNN methods, such as GCN and GAT. We have implemented AutoProNE in CogDL, an open-source graph learning library, to help boost more graph representation learning methods.

REFERENCES

[1] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in Proc. 5th Int. Conf. Learn. Representations, 2017.
[2] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec, "Graph convolutional neural networks for web-scale recommender systems," in Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2018, pp. 974–983.
[3] M. Ding, C. Zhou, Q. Chen, H. Yang, and J. Tang, "Cognitive graph for multi-hop reading comprehension at scale," in Proc. 57th Annu. Meeting Assoc. Comput. Linguistics, 2019, pp. 2694–2703.
[4] B. Perozzi, R. Al-Rfou, and S. Skiena, "DeepWalk: Online learning of social representations," in Proc. 20th ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2014, pp. 701–710.
[5] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, "LINE: Large-scale information network embedding," in Proc. 24th Int. Conf. World Wide Web, 2015, pp. 1067–1077.
[6] A. Grover and J. Leskovec, "node2vec: Scalable feature learning for networks," in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2016, pp. 855–864.
[7] M. Ou, P. Cui, J. Pei, Z. Zhang, and W. Zhu, "Asymmetric transitivity preserving graph embedding," in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2016, pp. 1105–1114.
[8] J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, and J. Tang, "Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec," in Proc. 11th ACM Int. Conf. Web Search Data Mining, 2018, pp. 459–467.
[9] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, "Graph attention networks," in Proc. 6th Int. Conf. Learn. Representations, 2018.
[10] W. Hamilton, Z. Ying, and J. Leskovec, "Inductive representation learning on large graphs," in Proc. 31st Int. Conf. Neural Inf. Process. Syst., 2017, pp. 1025–1035.
[11] P. Velickovic, W. Fedus, W. L. Hamilton, P. Liò, Y. Bengio, and R. D. Hjelm, "Deep graph infomax," in Proc. 7th Int. Conf. Learn. Representations, 2019.
[12] J. Qiu et al., "GCC: Graph contrastive coding for graph neural network pre-training," in Proc. 26th ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2020, pp. 1150–1160.
[13] R. D. Hjelm et al., "Learning deep representations by mutual information estimation and maximization," in Proc. 7th Int. Conf. Learn. Representations, 2019.
[14] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, "Momentum contrast for unsupervised visual representation learning," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 9726–9735.
[15] F. Wu, A. H. S. Jr., T. Zhang, C. Fifty, T. Yu, and K. Q. Weinberger, "Simplifying graph convolutional networks," in Proc. 36th Int. Conf. Mach. Learn., 2019, pp. 6861–6871.
[16] J. Zhang, Y. Dong, Y. Wang, J. Tang, and M. Ding, "ProNE: Fast and scalable network representation learning," in Proc. 28th Int. Joint Conf. Artif. Intell., 2019, pp. 4278–4284.
[17] Y. Cen et al., "CogDL: An extensive toolkit for deep learning on graphs," 2021, arXiv:2103.00959.
[18] B. Girault, A. Ortega, and S. S. Narayanan, "Irregularity-aware graph Fourier transforms," IEEE Trans. Signal Process., vol. 66, no. 21, pp. 5746–5761, Nov. 2018.
[19] A. Sandryhaila and J. M. Moura, "Discrete signal processing on graphs," IEEE Trans. Signal Process., vol. 61, no. 7, pp. 1644–1656, Apr. 2013.
[20] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Process. Mag., vol. 30, no. 3, pp. 83–98, May 2013.
[21] X. He, K. Zhao, and X. Chu, "AutoML: A survey of the state-of-the-art," 2019, arXiv:1908.00709.
[22] T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, "Optuna: A next-generation hyperparameter optimization framework," in Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2019, pp. 2623–2631.
[23] K. Tu, J. Ma, P. Cui, J. Pei, and W. Zhu, "AutoNE: Hyperparameter optimization for massive network embedding," in Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2019, pp. 216–225.
[24] J. Klicpera, S. Weißenberger, and S. Günnemann, "Diffusion improves graph learning," in Proc. Int. Conf. Neural Inf. Process. Syst., 2019, pp. 13333–13345.
[25] B. Xu, H. Shen, Q. Cao, K. Cen, and X. Cheng, "Graph convolutional networks using heat kernel for semi-supervised learning," in Proc. 28th Int. Joint Conf. Artif. Intell., 2019, pp. 1928–1934.
[26] L. Page, S. Brin, R. Motwani, and T. Winograd, "The PageRank citation ranking: Bringing order to the web," Stanford InfoLab, 1999.
[27] R. I. Kondor and J. Lafferty, "Diffusion kernels on graphs and other discrete structures," in Proc. 19th Int. Conf. Mach. Learn., 2002, pp. 315–322.
[28] D. I. Shuman, B. Ricaud, and P. Vandergheynst, "Vertex-frequency analysis on graphs," Appl. Comput. Harmon. Anal., vol. 40, no. 2, pp. 260–291, 2016.
[29] L. C. Andrews, Special Functions of Mathematics for Engineers, vol. 49. Bellingham, WA, USA: SPIE, 1998.
[30] C. Deng, Z. Zhao, Y. Wang, Z. Zhang, and Z. Feng, "GraphZoom: A multi-level spectral approach for accurate and scalable graph embedding," in Proc. 8th Int. Conf. Learn. Representations, 2020.
[31] F.-Y. Sun, J. Hoffmann, V. Verma, and J. Tang, "InfoGraph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization," in Proc. Int. Conf. Learn. Representations, 2020.
[32] J. Snoek, H. Larochelle, and R. P. Adams, "Practical Bayesian optimization of machine learning algorithms," in Proc. 25th Int. Conf. Neural Inf. Process. Syst., 2012, pp. 2951–2959.
[33] H. NT and T. Maehara, "Revisiting graph neural networks: All we have is low-pass filters," 2019, arXiv:1905.09550.
[34] J. R. Lee, S. O. Gharan, and L. Trevisan, "Multiway spectral partitioning and higher-order Cheeger inequalities," J. ACM, vol. 61, no. 6, 2014, Art. no. 37.
[35] A. S. Bandeira, A. Singer, and D. A. Spielman, "A Cheeger inequality for the graph connection Laplacian," SIAM J. Matrix Anal. Appl., vol. 34, no. 4, pp. 1611–1630, 2013.
[36] J. Qiu, L. Dhulipala, J. Tang, R. Peng, and C. Wang, "LightNE: A lightweight graph processing system for network embedding," in Proc. Int. Conf. Manage. Data, 2021, pp. 2281–2289.
[37] R. Zafarani and H. Liu, "Social computing data repository at ASU," 2009.
[38] B.-J. Breitkreutz et al., "The BioGRID interaction database: 2008 update," Nucleic Acids Res., vol. 36, no. suppl_1, pp. D637–D640, 2007.
[39] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su, "ArnetMiner: Extraction and mining of academic social networks," in Proc. 14th ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2008, pp. 990–998.
[40] M. Mahoney, "Large text compression benchmark," 2009. [Online]. Available: http://www.mattmahoney.net/text/text.html
[41] A. K. McCallum, K. Nigam, J. Rennie, and K. Seymore, "Automating the construction of internet portals with machine learning," Inf. Retrieval, vol. 3, no. 2, pp. 127–163, 2000.
[42] P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad, "Collective classification in network data," AI Mag., vol. 29, no. 3, pp. 93–93, 2008.
[43] T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," 2013, arXiv:1301.3781.
[44] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proc. 26th Int. Conf. Neural Inf. Process. Syst., 2013, pp. 3111–3119.
[45] S. Cao, W. Lu, and Q. Xu, "GraRep: Learning graph representations with global structural information," in Proc. 24th ACM Int. Conf. Inf. Knowl. Manage., 2015, pp. 891–900.
[46] T. Chen and Y. Sun, "Task-guided and path-augmented heterogeneous network embedding for author identification," in Proc. 10th ACM Int. Conf. Web Search Data Mining, 2017, pp. 295–304.
[47] M. Henaff, J. Bruna, and Y. LeCun, "Deep convolutional networks on graph-structured data," 2015, arXiv:1506.05163.
[48] M. Defferrard, X. Bresson, and P. Vandergheynst, "Convolutional neural networks on graphs with fast localized spectral filtering," in Proc. 30th Int. Conf. Neural Inf. Process. Syst., 2016, pp. 3844–3852.
[49] H. Chen, H. Yin, X. Sun, T. Chen, B. Gabrys, and K. Musial, "Multi-level graph convolutional networks for cross-platform anchor link prediction," in Proc. 26th ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2020, pp. 1503–1511.
[50] H. Chen, H. Yin, T. Chen, Q. V. H. Nguyen, W.-C. Peng, and X. Li, "Exploiting centrality information with graph convolutions for network representation learning," in Proc. IEEE 35th Int. Conf. Data Eng., 2019, pp. 590–601.
[51] K. Xu, C. Li, Y. Tian, T. Sonobe, K.-I. Kawarabayashi, and S. Jegelka, "Representation learning on graphs with jumping knowledge networks," in Proc. 35th Int. Conf. Mach. Learn., 2018, pp. 5449–5458.
[52] J. Klicpera, A. Bojchevski, and S. Günnemann, "Predict then propagate: Graph neural networks meet personalized PageRank," in Proc. 7th Int. Conf. Learn. Representations, 2019.
[53] B. Jiang, D. Lin, J. Tang, and B. Luo, "Data representation and learning with graph diffusion-embedding networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 10406–10415.
[54] X. Liu et al., "Self-supervised learning: Generative or contrastive," IEEE Trans. Knowl. Data Eng., early access, Jun. 22, 2021, doi: 10.1109/TKDE.2021.3090866.
[55] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, "A simple framework for contrastive learning of visual representations," 2020, arXiv:2002.05709.
[56] A. K. Sankararaman, S. De, Z. Xu, R. W. Huang, and T. Goldstein, "Contrastive multi-view representation learning on graphs," in Proc. Int. Conf. Mach. Learn., 2020.
[57] Y. Zhu, Y. Xu, F. Yu, Q. Liu, S. Wu, and L. Wang, "Deep graph contrastive representation learning," 2020, arXiv:2006.04131.
[58] Y. You, T. Chen, Y. Sui, T. Chen, Z. Wang, and Y. Shen, "Graph contrastive learning with augmentations," in Proc. Int. Conf. Neural Inf. Process. Syst., 2020, pp. 5812–5823.
[59] Y. You, T. Chen, Y. Shen, and Z. Wang, "Graph contrastive learning automated," in Proc. 38th Int. Conf. Mach. Learn., 2021, pp. 12121–12132.
[60] Y. Jiao, Y. Xiong, J. Zhang, Y. Zhang, T. Zhang, and Y. Zhu, "Sub-graph contrast for scalable self-supervised graph representation learning," in Proc. IEEE Int. Conf. Data Mining, 2020, pp. 222–231.
[61] J. Bergstra, D. Yamins, and D. Cox, "Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures," in Proc. 30th Int. Conf. Mach. Learn., 2013, pp. 115–123.
[62] H. Mendoza, A. Klein, M. Feurer, J. T. Springenberg, and F. Hutter, "Towards automatically-tuned neural networks," in Proc. Workshop Autom. Mach. Learn., 2016, pp. 58–65.
[63] Y. Gao, H. Yang, P. Zhang, C. Zhou, and Y. Hu, "GraphNAS: Graph neural architecture search with reinforcement learning," 2019, arXiv:1904.09981.
[64] K. Zhou, Q. Song, X. Huang, and X. Hu, "Auto-GNN: Neural architecture search of graph neural networks," 2019, arXiv:1909.03184.
[65] B. Zoph and Q. V. Le, "Neural architecture search with reinforcement learning," 2016, arXiv:1611.01578.
[66] J. Mockus, "On Bayesian methods for seeking the extremum," in Proc. IFIP Tech. Conf. Optim. Techn., 1975, pp. 400–404.

Zhenyu Hou received the undergraduate degree from the Department of Computer Science and Technology, Tsinghua University, Beijing, China. His main research interests include graph neural networks and representation learning.

Yukuo Cen received the bachelor's degree in computer science and technology from Tsinghua University, Beijing, China. He is currently working toward the PhD degree in the Department of Computer Science and Technology, Tsinghua University, Beijing, China. His research interests include social influence and graph embedding.

Yuxiao Dong received the PhD degree in computer science from the University of Notre Dame, Notre Dame, Indiana, in 2017. He is a research scientist with Facebook AI, Seattle, and was previously a senior researcher with Microsoft Research, Redmond. His research focuses on data mining, representation learning, and networks, with an emphasis on developing machine learning models to address problems in large-scale graph systems.

Jie Zhang received the bachelor's degree in mathematics and physics from the Department of Physics, Tsinghua University, Beijing, China, in 2016, and the master's degree from the Department of Computer Science and Technology, Tsinghua University, Beijing, China, in 2019. His research interests include graph neural networks and data mining on graphs.

Jie Tang (Fellow, IEEE) received the PhD degree from Tsinghua University, Beijing, China. He is a professor with the Department of Computer Science and Technology, Tsinghua University. His main research interests include data mining, social network, and machine learning. He has published more than 200 research papers in top international journals and conferences.