A Comprehensive Survey of Graph Neural Networks PDF
A Comprehensive Survey of Graph Neural Networks PDF
A Comprehensive Survey of Graph Neural Networks PDF
X, DECEMBER 2018 1
Abstract—Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video
processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the
Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and
are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has
arXiv:1901.00596v2 [cs.LG] 10 Mar 2019
imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning
approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in
data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into different
categories. With a focus on graph convolutional networks, we review alternative architectures that have recently been developed; these
learning paradigms include graph attention networks, graph autoencoders, graph generative networks, and graph spatial-temporal
networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes
and benchmarks of the existing algorithms on different learning tasks. Finally, we propose potential research directions in this
fast-growing field.
Index Terms—Deep Learning, graph neural networks, graph convolutional networks, graph representation learning, graph
autoencoder, network embedding
1 I NTRODUCTION meaningful features that are shared with the entire datasets
for various image analysis tasks.
While deep learning has achieved great success on Eu-
T HE recent success of neural networks has boosted re-
search on pattern recognition and data mining. Many
machine learning tasks such as object detection [1], [2], ma-
clidean data, there is an increasing number of applications
where data are generated from the non-Euclidean domain
chine translation [3], [4], and speech recognition [5], which and need to be effectively analyzed. For instance, in e-
once heavily relied on handcrafted feature engineering to commerce, a graph-based learning system is able to exploit
extract informative feature sets, has recently been revolu- the interactions between users and products [9], [10], [11]
tionized by various end-to-end deep learning paradigms, to make highly accurate recommendations. In chemistry,
i.e., convolutional neural networks (CNNs) [6], long short- molecules are modeled as graphs and their bioactivity needs
term memory (LSTM) [7], and autoencoders. The success to be identified for drug discovery [12], [13]. In a citation
of deep learning in many domains is partially attributed to network, papers are linked to each other via citations and
the rapidly developing computational resources (e.g., GPU) they need to be categorized into different groups [14],
and the availability of large training data, and is partially [15]. The complexity of graph data has imposed significant
due to the effectiveness of deep learning to extract latent challenges on existing machine learning algorithms. This is
representation from Euclidean data (e.g., images, text, and because graph data are irregular. Each graph has a variable
video). Taking image analysis as an example, an image can size of unordered nodes and each node in a graph has
be represented as a regular grid in the Euclidean space. a different number of neighbors, causing some important
A convolutional neural network (CNN) is able to exploit operations (e.g., convolutions), which are easy to compute
the shift-invariance, local connectivity, and compositionality in the image domain but are not directly applicable to the
of image data [8], and as a result, CNN can extract local graph domain anymore. Furthermore, a core assumption of
existing machine learning algorithms is that instances are
independent of each other. However, this is not the case for
• Z. Wu, F. Chen, G. Long, C. Zhang are with Centre for graph data where each instance (node) is related to others
Artificial Intelligence, FEIT, University of Technology Sydney,
NSW 2007, Australia (E-mail: [email protected];
(neighbors) via some complex linkage information, which is
[email protected]; [email protected]; used to capture the interdependence among data, including
[email protected]). citations, friendship, and interactions.
• S. Pan is with Faculty of Information Technology, Monash University, Recently, there is increasing interest in extending deep
Clayton, VIC 3800, Australia (Email: [email protected]).
• P. S. Yu is with Department of Computer Science, University of Illinois learning approaches for graph data. Driven by the success
at Chicago, Chicago, IL 60607-7053, USA (Email: [email protected]) of deep learning, researchers have borrowed ideas from
• Corresponding author: Shirui Pan. convolution networks, recurrent networks, and deep auto-
Manuscript received Dec xx, 2018; revised Dec xx, 201x. encoders to design the architecture of graph neural net-
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, DECEMBER 2018 2
3.2 Frameworks
Graph neural networks, graph convolution networks
(GCNs) in particular, try to replicate the success of CNN
in graph data by defining graph convolutions via graph
spectral theory or spatial locality. With the graph structure
and node content information as inputs, the outputs of GCN
can focus on different graph analytics task with one of the
following mechanisms:
• Node-level outputs relate to node regression and
classification tasks. As a graph convolution module
directly gives nodes’ latent representations, a multi-
(a) Graph Convolution Net- (b) Graph Attention Networks
works [14] explicitly assign a [15] implicitly capture the perceptron layer or softmax layer is used as the final
non-parametric weight aij = weight aij via an end-to-end layer of GCN. We review graph convolution modules
√ 1 neural network architecture,
to the neigh- in Section 4.1 and Section 4.2.
deg(vi )deg(vj )
so that more important nodes
bor vj of vi during the aggre-
receive larger weights.
• Edge-level outputs relate to the edge classifica-
gation process. tion and link prediction tasks. To predict the la-
bel/connection strength of an edge, an additional
Fig. 6: Differences between graph convolutional networks
function will take two nodes’ latent representations
and graph attention networks.
from the graph convolution module as inputs.
• Graph-level outputs relate to the graph classification
task. To obtain a compact representation on graph
level, a pooling module is used to coarse a graph into
Graph Generative Networks aim to generate plausible sub-graphs or to sum/average over the node repre-
structures from data. Generating graphs given a graph sentations. We review the graph pooling module in
empirical distribution is fundamentally challenging, mainly Section 4.3.
because graphs are complex data structures. To address this In Table 3, we list the details of the inputs and outputs
problem, researchers have explored to factor the generation of the main GCNs methods. In particular, we summarize
process as forming nodes and edges alternatively [67], [68], output mechanisms in between each GCN layer and in the
to employ generative adversarial training [69], [70]. One final layer of each method. The output mechanisms may
promising application domain of graph generative networks involve several pooling operations, which are discussed in
is chemical compound synthesis. In a chemical graph, atoms Section 4.3.
are treated as nodes and chemical bonds are treated as
edges. The task is to discover new synthesizable molecules End-to-end Training Frameworks. Graph convolutional net-
which possess certain chemical and physical properties. works can be trained in a (semi-) supervised or purely un-
supervised way within an end-to-end learning framework,
depending on the learning tasks and label information avail-
Graph Spatial-temporal Networks aim to learn unseen pat-
able at hand.
terns from spatial-temporal graphs, which are increasingly
important in many applications such as traffic forecasting • Semi-supervised learning for node-level classifi-
and human activity prediction. For instance, the underlying cation. Given a single network with partial nodes
road traffic network is a natural graph where each key loca- being labeled and others remaining unlabeled, graph
tion is a node whose traffic data is continuously monitored. convolutional networks can learn a robust model that
By developing effective graph spatial-temporal network effectively identify the class labels for the unlabeled
models, we can accurately predict the traffic status over nodes [14]. To this end, an end-to-end framework can
the whole traffic system [73], [74]. The key idea of graph be built by stacking a couple of graph convolutional
spatial-temporal networks is to consider spatial dependency layers followed by a softmax layer for multi-class
and temporal dependency at the same time. Many current classification.
approaches apply GCNs to capture the dependency together • Supervised learning for graph-level classification.
with some RNN [73] or CNN [74] to model the temporal Given a graph dataset, graph-level classification aims
dependency. to predict the class label(s) for an entire graph [58],
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, DECEMBER 2018 6
− 21 − 21
[59], [77], [78]. The end-to-end learning for this task defined as L = In − D AD P, where D is a diagonal
can be done with a framework which combines matrix of node degrees, Dii = j (Ai,j ). The normalized
both graph convolutional layers and the pooling graph Laplacian matrix possesses the property of being real
procedure [58], [59]. Specifically, by applying graph symmetric positive semidefinite. With this property, the nor-
convolutional layers, we obtain representation with malized Laplacian matrix can be factored as L = UΛUT ,
a fixed number of dimensions for each node in every where U = [u0 , u1 , · · · , un−1 ] ∈ RN ×N is the matrix of
single graph. Then, we can get the representation of eigenvectors ordered by eigenvalues and Λ is the diagonal
an entire graph through pooling which summarizes matrix of eigenvalues, Λii = λi . The eigenvectors of the
the representation vectors of all nodes in a graph. normalized Laplacian matrix form an orthonormal space, in
Finally, by applying linear layers and a softmax layer, mathematical words, UT U = I. In graph signal processing,
we can build an end-to-end framework for graph a graph signal x ∈ RN is a feature vector of the nodes of a
classification. An example is given in Fig 5a. graph where xi is the value of the ith node. The graph Fourier
• Unsupervised learning for graph embedding. When transform to a signal x is defined as F (x) = UT x and the in-
no class labels are available in graphs, we can learn verse graph Fourier transform is defined as F −1 (x̂) = Ux̂,
the graph embedding in a purely unsupervised way where x̂ represents the resulting signal from graph Fourier
in an end-to-end framework. These algorithms ex- transform. To understand graph Fourier transform, from its
ploit edge-level information in two ways. One simple definition we see that it indeed projects the input graph
way is to adopt an autoencoder framework where signal to the orthonormal space where the basis is formed by
the encoder employs graph convolutional layers to eigenvectors of the normalized graph Laplacian. Elements
embed the graph into the latent representation upon of the transformed signal x̂ are the coordinates of the graph
which a decoder is used to reconstruct the graph signal in the new spaceP so that the input signal can be
structure [62], [64]. Another way is to utilize the neg- represented as x = i x̂i ui , which is exactly the inverse
ative sampling approach which samples a portion graph Fourier transform. Now the graph convolution of the
of node pairs as negative pairs while existing node input signal x with a filter g ∈ RN is defined as
pairs with links in the graphs being positive pairs.
Then a logistic regression layer is applied after the x ∗G g = F −1 (F (x) F (g))
convolutional layers for end-to-end learning [25]. (1)
= U(UT x UT g)
4.1 Spectral-based Graph Convolutional Networks where Xk ∈ RN ×fk−1 is the input graph signal, N is the
Spectral-based methods have a solid foundation in graph number of nodes, fk−1 is the number of input channels and
signal processing [79]. We first give some basic knowledge fk is the number of output channels, Θki,j is a diagonal
background of graph signal processing, after which we matrix filled with learnable parameters, and σ is a non-
review the representative research on the spectral-based linear transformation.
GCNs.
Chebyshev Spectral CNN (ChebNet). Defferrard et al.
[12] propose ChebNet which defines a filter as Chebyshev
4.1.1 Backgrounds polynomials of the diagonal matrix of eigenvalues, i.e,
PK−1
In spectral-based models, graphs are assumed to be undi- gθ = i=0 θi Tk (Λ̃), where Λ̃ = 2Λ/λmax − IN . The
rected. A robust mathematical representation of an undi- Chebyshev polynomials are defined recursively by Tk (x) =
rected graph is the normalized graph Laplacian matrix, 2xTk−1 (x) − Tk−2 (x) with T0 (x) = 1 and T1 (x) = x. As a
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, DECEMBER 2018 7
result, the convolution of a graph signal x with the defined need to be symmetric. For example, it can be the form
filter gθ is of D−1 A. The main drawback of 1stChebNet is that the
K−1 computation cost increases exponentially with the increase
X
x ∗G gθ = U( θi Tk (Λ̃))UT x of the number of 1stChebNet layers during batch training.
i=0 Each node in the last layer has to expand its neighborhood
K−1
(4) recursively across previous layers. Chen et al. [48] assume
X
= θi Ti (L̃)x the rescaled adjacency matrix à in Equation 7 comes from a
i=0 sampling distribution. Under this assumption, the technique
of Monte Carlo and variance reduction techniques are used
where L̃ = 2L/λmax − IN .
to facilitate the training process. Chen et al. [49] reduce
From Equation 4, ChebNet implicitly avoids the compu-
the receptive field size of the graph convolution to an
tation of the graph Fourier basis, reducing the computation
arbitrary small scale by sampling neighborhoods and using
complexity from O(N 3 ) to O(KM ). Since Ti (L̃) is a polyno-
historical hidden representations. Huang et al. [57] propose
mial of L̃ of ith order, Ti (L̃)x operates locally on each node.
an adaptive layer-wise sampling approach to accelerate the
Therefore, the filters of ChebNet are localized in space.
training of 1stChebNet, where sampling for the lower layer
First order of ChebNet (1stChebNet 2 ) Kipf et al. [14] in- is conditioned on the top one. This method is also applicable
troduce a first-order approximation of ChebNet. Assuming for explicit variance reduction.
K = 1 and λmax = 2 , Equation 4 is simplified as
1 1
Adaptive Graph Convolution Network (AGCN). To ex-
x ∗G gθ = θ0 x − θ1 D− 2 AD− 2 x (5) plore hidden structural relations unspecified by the graph
To restrain the number of parameters and avoid over- Laplacian matrix, Li et al. [23] propose the adaptive graph
fitting, 1stChebNet further assumes θ = θ0 = −θ1 , leading convolution network (AGCN). AGCN augments a graph
to the following definition of graph convolution, with a so-called residual graph, which is constructed by
1 1
computing a pairwise distance of nodes. Despite being able
x ∗G gθ = θ(In + D− 2 AD− 2 )x (6) to capture complement relational information, AGCN incurs
expensive O(N 2 ) computation.
In order to incorporate multi-dimensional graph input
signals, 1stChebNet proposes a graph convolution layer
which modifies Equation 6,
Xk+1 = ÃXk Θ (7) 4.1.3 Summary
1 1
where à = IN + D− 2 AD− 2 . Spectral CNN [21] relies on the eigen-decomposition of the
The graph convolution defined by 1stChebNet is local- Laplacian matrix. It has three effects. First, any perturbation
ized in space. It bridges the gap between spectral-based to a graph results in a change of eigenbasis. Second, the
methods and spatial-based methods. Each row of the output learned filters are domain dependent, meaning they cannot
represents the latent representation of each node obtained be applied to a graph with a different structure. Third, eigen-
by a linear transformation of aggregated information from decomposition requires O(N 3 ) computation and O(N 2 )
the node itself and its neighboring nodes with weights memory. Filters defined by ChebNet [12] and 1stChebNet
specified by the row of Ã. From the perspective of spatial- [14] are localized in space. The learned weights can be
based methods, the adjacency matrix à not necessarily shared across different locations in a graph. However, a
common drawback of spectral methods is they need to
2. Due to its impressive performance in many node classification
tasks, 1stChebNet is simply termed as GCN and is considered as a load the whole graph into the memory to perform graph
strong baseline in the research community. convolution, which is not efficient in handling big graphs.
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, DECEMBER 2018 8
grid-data for the target node. In the end, LGCN applies 1D from original feature data. Mean/max/sum pooling is the
CNN on the resultant inputs to get the target node’s hidden most primitive and most effective way of implementing this
representation. While deriving graph labelings in PATCHY- since calculating the mean/max/sum value in the pooling
SAN requires complex pre-processing, sorting feature val- window is rapid.
ues in LGCN does not need a pre-processing step, making
it more efficient. To suit the scenario of large-scale graphs, hG = mean/max/sum(hT1 , hT2 , ..., hTn ) (15)
LGCN proposes a subgraph training strategy, which puts
the sampled subgraphs into a mini-batch. Henaff et al. [22] prove that performing a simple
max/mean pooling at the beginning of the network is espe-
Mixture Model Network (MoNet) [26] unifies standard cially important to reduce the dimensionality in the graph
CNN with convolutional architectures on non-Euclidean domain and mitigate the cost of the expensive graph Fourier
domains. While several spatial-based approaches ignore the transform operation.
relative positions between a node and its neighbors when Defferrard et al. optimize max/min pooling and devise
aggregating neighborhood feature information, MoNet in- an efficient pooling strategy in their approach ChebNet [12].
troduce pseudo-coordinates and weight functions to let the Input graphs are first processed by the coarsening process
weight of a node’s neighbor be determined by the relative described in Fig 5a . After coarsening, the vertices of the
position (pseudo-coordinates) between the node and its input graph and its coarsened versions are reformed in a
neighbor. Under such a framework, several approaches on balanced binary tree. Arbitrarily ordering the nodes at the
manifolds such as Geodesic CNN (GCNN) [87], Anisotropic coarsest level then propagating this ordering to the lower
CNN(ACNN) [88], Spline CNN [89], and on graphs such level in the balanced binary tree would finally produce a
as GCN [14], DCNN [47] can be generalized as special regular ordering in the finest level. Pooling such a rear-
instances of MoNet. These approaches under the framework ranged 1D signal is much more efficient than the original.
of MoNet have fixed weight functions. MoNet instead pro- Zhang et al. also propose a framework DGCNN [58]
poses a Gaussian kernel with learnable parameters to freely with a similar pooling strategy named SortPooling which
adjust the weight function. performs pooling by rearranging vertices to a meaningful
order. Different from ChebNet [12], DGCNN sorts vertices
4.2.4 Summary
according to their structural roles within the graph. The
Spatial-based methods define graph convolutions via ag- graph’s unordered vertex features from spatial graph con-
gregating feature information from neighbors. According to volutions are treated as continuous WL colors [85], and they
different ways of stacking graph convolution layers, spatial- are then used to sort vertices. In addition to sorting the
based methods are split into two groups, recurrent-based vertex features, it unifies the graph size to k by truncat-
and composition-based. While recurrent-based approaches ing/extending the graph’s feature tensor. The last n−k rows
try to obtain nodes’ steady states, composition-based ap- are deleted if n > k , otherwise k − n zero rows are added.
proaches try to incorporate higher orders of neighborhood This method enhances the pooling network to improve the
information. In each layer, both two groups have to update performance of GCNs by solving one challenge underlying
hidden states over all nodes during training. However, it is graph-structured tasks which is referred to as permutation
inefficient to store all the intermediate states into memory. invariant. Verma and Zhang propose graph capsule net-
To address this issue, several training strategies have been works [93] which further explore the permutation invariant
proposed, including sub-graph training for composition- problem for graph data.
based approaches such as GraphSage [25] and stochastically Recently a pooling module, DIFFPOOL [59], is proposed
asynchronous training for recurrent-based approaches such which can generate hierarchical representations of graphs
as SSE [20]. In addition, recent advances in spatial-based and can be combined with not only CNNs, but also var-
approaches tend to construct more complex network ar- ious graph neural network architectures in an end-to-end
chitectures. For examples, GeniePath [51] leverages gating fashion. Compared to all previous coarsening methods,
mechanisms to control the depth and breadth of a node’s DIFFPOOL does not simply cluster the nodes in one graph
neighborhood. DualGCN [52] devises two graph convolu- but provide a general solution to hierarchically pool nodes
tion networks to jointly embed the local consistency and the across a broad set of input graphs. This is done by learning
global consistency knowledge of a graph. Tran et al. [90] a cluster assignment matrix S at layer l referred to as
introduce a hyper-parameter to influence the receptive field S(l) ∈ Rnl ×nl +1 . Two separate GNNs with both input
size of a node. cluster node features X(l) and coarsened adjacency matrix
A(l) are being used to generate the assignment matrix S(l)
4.3 Graph Pooling Modules and embedding matrices Z(l) as follows:
When generalizing convolutional neural networks to graph-
Z(l) = GN Nl,embed (A(l) , X(l) ) (16)
structured data, another key component, graph pooling
module, is also of vital importance, particularly for graph- S(l) = sof tmax(GN Nl,pool (A(l) , X(l) )) (17)
level classification tasks [58], [59], [91]. According to Xu
et al. [92], pooling-assisted GCNs are as powerful as the Equation 16 and 17 can be implemented with any
Weisfeiler-Lehman test [85] in distinguishing graph struc- standard GNN module, which processes the same input
tures. Similar to the original pooling layer which comes data but has distinct parametrizations since the roles they
with CNNs, graph pooling module could easily reduce the play in the framework are different. The GN Nl,embed will
variance and computation complexity by down-sampling produce new embeddings while the GN Nl,pool generates a
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, DECEMBER 2018 11
probabilistic assignment of the input nodes to nl+1 clusters. inputs can be incorporated into the aggregation function
The Softmax function is applied in a row-wise fashion in (e.g. [13], [18], [54], [55], [56]).
Equation 17. As a result, each row of S(l) corresponds to one As a result, spatial models have attracted increasing
of the nl nodes(or clusters) at layer l, and each column of attention in recent years [26].
S(l) corresponds to one of the nl at the next layer. Once we
have Z(l) and S(l) , the pooling operation comes as follows:
5 B EYOND G RAPH C ONVOLUTIONAL N ETWORKS
(l+1) (l) T (l) nl+1 ×d In this section, we review other graph neural networks
X =S Z ∈R (18)
including graph attention neural networks, graph auto-
T
A(l+1) = S(l) A(l) S(l) ∈ Rnl+1 ×nl+1 (19) encoder, graph generative networks, and graph spatial-
temporal networks. In Table 4, we provide a summary of
Equation 18 takes the cluster embeddings Z(l) then the main approaches under each category.
aggregates these embeddings according to the cluster as-
signments S(l) to calculate embedding for each of the nl+1
5.1 Graph Attention Networks
clusters. Initial cluster embedding would be node repre-
sentation. Similarly, Equation 19 takes the adjacency matrix Attention mechanisms have almost become a standard in
A(l) as inputs and generates a coarsened adjacency matrix sequence-based tasks [94]. The virtue of attention mecha-
denoting the connectivity strength between each pair of the nisms is their ability to focus on the most important parts
clusters. of an object. This specialty has been proven to be useful
Overall, DIFFPOOL [59] redefines the graph pooling for many tasks, such as machine translation and natural
module by using two GNNs to cluster the nodes. Any language understanding. Thanks to the increased model
standard GCN module is able to combine with DIFFPOOL, capacity of attention mechanisms, graph neural networks
not only to achieve enhanced performance but also to speed also benefit from this by using attention during aggregation,
up the convolution operation. integrating outputs from multiple models, and generating
importance-oriented random walks. In this section, we will
discuss how attention mechanisms are being used in graph-
4.4 Comparison Between Spectral and Spatial Models structured data.
As the earliest convolutional networks for graph data,
spectral-based models have achieved impressive results in 5.1.1 Methods of Graph Attention Networks
many graph related analytics tasks. These models are ap- Graph Attention Network (GAT) [15] is a spatial-based
pealing in that they have a theoretical foundation in graph graph convolution network where the attention mechanism
signal processing. By designing new graph signal filters is involved in determining the weights of a node’s neighbors
[24], we can theoretically design new graph convolution when aggregating feature information. The graph convolu-
networks. However, there are several drawbacks to spectral- tion operation of GAT is defined as,
based models. We illustrate this in the following from three X
aspects, efficiency, generality and flexibility.
hti = σ( α(hit−1 , ht−1
j )W
t−1 t−1
hj ) (20)
j∈Ni
In terms of efficiency, the computational cost of spectral-
based models increases dramatically with the graph size where α(·) is an attention function which adaptively con-
because they either need to perform eigenvector compu- trols the contribution of a neighbor j to the node i. In order
tation [21] or handle the whole graph at the same time, to learn attention weights in different subspaces, GAT uses
which makes them difficult to parallel or scale to large multi-head attentions.
graphs. Spatial based models have the potential to handle X
large graphs as they directly perform the convolution in the
hti =kK
k=1 σ( αk (hit−1 , ht−1
j )Wk
t−1 t−1
hj ) (21)
j∈Ni
graph domain via aggregating the neighboring nodes. The
computation can be performed in a batch of nodes instead where k denotes concatenation.
of the whole graph. When the number of neighboring nodes
Gated Attention Network (GAAN) [29] also employs the
increases, sampling techniques [25], [28] can be developed
multi-head attention mechanism in updating a node’s hid-
to improve efficiency.
den state. However, rather than assigning equal weight to
In terms of generality, spectral-based models assumed
each head, GAAN introduces a self-attention mechanism
a fixed graph, making them generalize poorly to new or
which computes a different weight for each head. The
different graphs. Spatial-based models, on the other hand,
updating rule is defined as,
perform graph convolution locally on each node, where
weights can be easily shared across different locations and X
structures. hti = φo (xi ⊕ kK k
k=1 gi αk (ht−1
i , ht−1 t−1
j )φv (hj )) (22)
In terms of flexibility, spectral-based models are limited j∈Ni
to work on undirected graphs. There is no clear definition
where φo (·) and φv (·) denote feedforward neural networks
of the Laplacian matrix on directed graphs so that the only
and gik is the attention weight of the k th attention head.
way to apply spectral-based models to directed graphs is to
transfer directed graphs to undirected graphs. Spatial-based Graph Attention Model (GAM) [60] proposes a recur-
models are more flexible to deal with multi-source inputs rent neural network model to solve graph classification
such as edge features and edge directions because these problems, which processes informative parts of a graph
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, DECEMBER 2018 12
TABLE 4: Summary of Alternative Graph Neural Networks (Graph Convolutional Networks Excluded). We summarize
methods based on their inputs, outputs, targeted tasks, and whether a method is GCN-based. Inputs indicate whether a
method suits attributed graphs (A), directed graphs (D), and spatial-temporal graphs (S).
Inputs GCN
Category Approaches Outputs Tasks
A D S Based
Graph GAT (2017) [15] 3 3 7 node labels node classification 3
Attention GAAN (2018) [29] 3 3 7 node labels node classification 3
Networks GAM (2018) [60] 3 3 7 graph labels graph classification 7
Attention Walks (2018) [61] 7 7 7 node embedding network embedding 7
GAE (2016) [62] 3 7 7 reconstructed adajacency matrix network embedding 3
Graph ARGA (2018) [64] 3 7 7 reconstructed adajacency matrix network embedding 3
Auto-encoder reconstructed sequences of
NetRA (2018) [65] 7 7 7 network embedding 7
random walks
DNGR (2016) [42] 7 7 7 reconstructed PPMI matrix network embedding 7
SDNE (2016) [43] 7 3 7 reconstructed adajacency matrix network embedding 7
DNRE (2018) [66] 3 7 7 reconstructed node embedding network embedding 7
MolGAN (2018) [69] 3 7 7 new graphs graph generation 3
Graph
DGMG (2018) [68] 7 7 7 new graphs graph generation 3
Generative
GraphRNN (2018) [67] 7 7 7 new graphs graph generation 7
Networks
NetGAN (2018) [70] 7 7 7 new graphs graph generation 7
spatial-temporal
DCRNN (2018) [73] 7 7 3 node value vectors 3
Graph forecasting
Spatial-Temporal spatial-temporal
CNN-GCN (2017) [74] 7 7 3 node value vectors 3
Networks forecasting
spatial-temporal
ST-GCN (2018) [75] 7 7 3 graph labels 3
classification
spatial-temporal
Structural RNN (2016) [76] 7 7 3 node labels/value vectors 7
forecasting
by adaptively visiting a sequence of important nodes. The the umbrella of graph attention networks, they can also be
GAM model is defined as considered as spatial-based graph convolution networks at
the same time. The advantage of GAT [15] and GAAN [29]
ht = fh (fs (rt−1 , vt−1 , g; θs ), ht−1 ; θh ) (23) is that they can adaptively learn the importance weights of
neighbors as illustrated in Fig 6. However, the computa-
where fh (·) is an LSTM network, fs is the step network tion cost and memory consumption increase rapidly as the
which takes a step from the current node vt−1 to one of its attention weights between each pair of neighbors must be
neighbors ct , prioritizing those whose type have a higher computed.
rank in vt−1 which is generated by a policy network:
The framework of GAE is also depicted in Fig 5b. The GAE of adjacent nodes close to each other as much as possible.
can be trained in a variational manner, i.e., to minimize the Specifically, the loss function L1st is defined as
variational lower bound L: n
(k) (k)
X
L1st = Ai,j ||hi − hj ||2 (30)
L = Eq(Z|X,A) [logp (A|Z)] − KL[q(Z|X, A)||p(Z)] (28) i,j=1
Autoencoders (NetRA) [65] is a graph auto-encoder frame- One innovation of DRNE is that it chooses an LSTM as
work which shares a similar idea with ARGA. It also the aggregation function where the neighbors’ sequence is
regularizes node hidden representations to comply with a ordered by their node degree.
prior distribution via adversarial training. Instead of recon-
structing the adjacency matrix, they recover node sequences 5.2.3 Summary
sampled from random walks by a sequence-to-sequence DNGR and SDNE learn node embeddings only given the
architecture [96]. topological structures, while GAE, ARGA, NetRA, DRNE
learn node embeddings when both topological information
Deep Neural Networks for Graph Representations
and node content features are available. One challenge of
(DNGR) [42] uses the stacked denoising autoencoder
graph auto-encoders is the sparsity of the adjacency matrix
[97] to reconstruct the pointwise mutual information ma-
trix(PPMI). The PPMI matrix intrinsically captures nodes
A, causing the number of positive entries of the decoder to
be far less than the negative ones. To tackle this issue, DNGR
co-occurrence information when a graph is serialized as
reconstructs a denser matrix namely the PPMI matrix, SDNE
sequences by random walks. Formally, the PPMI matrix is
imposes a penalty to zero entries of the adjacency matrix,
defined as
GAE reweights the terms in the adjacency matrix, and
count(v1 , v2 ) · |D| NetRA linearizes Graphs into sequences.
PPMIv1 ,v2 = max(log( ), 0) (29)
count(v1 )count(v2 )
P 5.3 Graph Generative Networks
where |D| = v1 ,v2 count(v1 , v2 ) and v1 , v2 ∈ V . The
The goal of graph generative networks is to generate graphs
stacked denoising autoencoder is able to learn highly non-
given an observed set of graphs. Many approaches to graph
linear regularity behind data. Different from conventional
generative networks are domain specific. For instance, in
neural autoencoders, it adds noise to inputs by randomly
molecular graph generation, some works model a string
switching entries of inputs to zero. The learned latent repre-
representation of molecular graphs called SMILES [98], [99],
sentation is more robust especially when there are missing
[100], [101]. In natural language processing, generating a se-
values present.
mantic or a knowledge graph is often conditioned on a given
Structural Deep Network Embedding (SDNE) [43] uses sentence [102], [103]. Recently, several general approaches
stacked auto encoder to preserve nodes first-order proximity have been proposed. Some work factor the generation pro-
and second-order proximity jointly. The first-order proxim- cess as forming nodes and edges alternatively [67], [68]
ity is defined as the distance between a node’s hidden rep- while others employ generative adversarial training [69],
resentation and its neighbor’s hidden representation. The [70]. The methods in this category either employ GCN as
goal for the first-order proximity is to drive representations building blocks or use different architectures.
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, DECEMBER 2018 14
Fig. 9: Framework of MolGAN [70]. A generator first samples an initial vector from a standard normal distribution. Passing
this initial vector through a neural network, the generator outputs a dense adjacency matrix A and a corresponding feature
matrix X . Next, the generator produces a sampled discrete à and X̃ from categorical distributions based on A and X .
Finally, GCN is used to derive a vector representation of the sampled graph. Feeding this graph representation to two
distinct neural networks, a discriminator and a reward network output a score between zero and one separately, which
will be used as feedback to update the model parameters.
5.3.1 GCN Based Graph Generative Networks NetGAN [70] combines LSTM [7] with Wasserstein GAN
Molecular Generative Adversarial Networks (MolGAN) [106] to generate graphs from a random-walk-based ap-
[69] integrates relational GCN [104], improved GAN [105] proach. The GAN framework consists of two modules, a
and reinforcement learning (RL) objective to generate generator and a discriminator. The generator makes its best
graphs with desired properties. The GAN consists of a effort to generate plausible random walks through an LSTM
generator and a discriminator, competing with each other network while the discriminator tries to distinguish faked
to improve the authenticity of the generator. In MolGAN, random walks from the real ones. After training, a new
the generator tries to propose a faked graph along with its graph is obtained by normalizing a co-occurrence matrix
feature matrix while the discriminator aims to distinguish of nodes which occur in a set of random walks.
the faked sample from the empirical data. Additionally,
a reward network is introduced in parallel with the dis- 5.3.3 Summary
criminator to encourage the generated graphs to possess Evaluating generated graphs remains a difficult problem.
certain properties according to an external evaluator. The Unlike synthesized images or audios, which can be di-
framework of MolGAN is described in Fig 9. rectly assessed by human experts, the quality of generated
graphs is difficult to inspect visually. MolGAN and DGMG
Deep Generative Models of Graphs (DGMG) [68] uti-
make use of external knowledge to evaluate the validity
lizes spatial-based graph convolution networks to obtain
of generated molecule graphs. GraphRNN and NetGAN
a hidden representation of an existing graph. The decision
evaluate generated graphs by graph statistics (e.g. node
process of generating nodes and edges is conditioned on the
degrees). Whereas DGMG and GraphRNN generate nodes
resultant graph representation. Briefly, DGMG recursively
and edges sequentially, MolGAN and NetGAN generate
proposes a node to a growing graph until a stopping crite-
nodes and edges jointly. According to [71], the disadvantage
rion is evoked. In each step after adding a new node, DGMG
of the former approaches is that when graphs become large,
repeatedly decides whether to add an edge to the added
modeling a long sequence is not realistic. The challenge
node until the decision turns to false. If the decision is true,
of the latter approaches is that the global properties of a
it evaluates the probability distribution of connecting the
graph are difficult to control. A recent approach [71] adopts
newly added node to all existing nodes and samples one
variational auto-encoder to generate a graph by proposing
node from the probability distribution. After a new node
the adjacency matrix, imposing penalty terms to address
and its connections are added to the existing graph, DGMG
validity constraints. However, as the output space of a graph
updates the graph representation again.
with n nodes is n2 , none of these methods is scalable to large
graphs.
5.3.2 Miscellaneous Graph Generative Networks
GraphRNN [67] exploits deep graph generative models
through two-level recurrent neural networks. The graph- 5.4 Graph Spatial-Temporal Networks
level RNN adds a new node each time to a node sequence Graph spatial-temporal networks capture spatial and tem-
while the edge level RNN produces a binary sequence poral dependencies of a spatial-temporal graph simultane-
indicating connections between the newly added node and ously. Spatial-temporal graphs have a global graph structure
previously generated nodes in the sequence. To linearize with inputs to each node which are changing across time.
a graph into a sequence of nodes for training the graph For instance, in traffic networks, each sensor taken as a
level RNN, GraphRNN adopts the breadth-first-search (BFS) node records the traffic speed of a certain road continuously
strategy. To model the binary sequence for training the edge- where the edges of the traffic network are determined by
level RNN, GraphRNN assumes multivariate Bernoulli or the distance between pairs of sensors. The goal of graph
conditional Bernoulli distribution. spatial-temporal networks can be forecasting future node
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, DECEMBER 2018 15
values or labels, or predicting spatial-temporal graph labels. graph according to the distance of the two related nodes.
Recent studies have explored the use of GCNs [75] solely, In this way, the adjacency matrix can be represented as a
a combination of GCNs with RNN [73] or CNN [74], and summation of K adjacency matrices where K is the number
a recurrent architecture tailored to graph structures [76]. In of labels. Then ST-GCN applies GCN [14] with a different
the following, we introduce these methods. weight matrix to each of the K adjacency matrix and sums
them. X −1 −1
5.4.1 GCN Based Graph Spatial-Temporal Networks fout = Λj 2 Aj Λj 2 fin Wj (37)
Diffusion Convolutional Recurrent Neural Network j
treated as undirected graphs in evaluating model perfor- NewsGroup dataset consists of around 20,000 News Group
mance with respect to node classification, link prediction, (NG) text documents categorized by 20 news types. The
and node clustering tasks. There are three popular datasets graph of the 20-NewsGroup is constructed by representing
for paper-citation networks, Cora, Citeseer and Pubmed. each document as a node and using the similarities between
The Cora dataset contains 2708 machine learning publi- nodes as edge weights.
cations grouped into seven classes. The Citeseer dataset
Others There are several other datasets worth mentioning.
contains 3327 scientific papers grouped into six classes. Each
The METR-LA is a traffic dataset collected from the high-
paper in Cora and Citeseer is represented by a one-hot
ways of Los Angeles County. The MovieLens-1M dataset
vector indicating the presence or absence of a word from
from the MovieLens website contains 1 million item rat-
a dictionary. The Pubmed dataset contains 19717 diabetes-
ings given by 6k users. It is a benchmark dataset for
related publications. Each paper in Pubmed is represented
recommender systems. The NELL dataset is a knowledge
by a term frequency-inverse document frequency (TF-IDF)
graph obtained from the Never-Ending Language Learning
vector. Furthermore, DBLP is a large citation dataset with
project. It consists of facts represented by a triplet which
millions of papers and authors which have been collected
involves two entities and their relation.
from computer science bibliographies. The raw dataset of
DBLP can be found on https://dblp.uni-trier.de. A pro-
cessed version of the DBLP paper-citation network is up- 6.2 Benchmarks & Open-source Implementations
dated continuously by https://aminer.org/citation. Of the datasets listed in Table 5, Cora, Pubmed, Citeseer,
and PPI are the most frequently used datasets. They are
Social Networks are formed by user interactions from
often tested to compare the performance of graph convo-
online services such as BlogCatalog, Reddit, and Epinions.
lution networks in node classification tasks. In Table 6, we
The BlogCatalog dataset is a social network which con-
report the benchmark performance of these four datasets,
sists of bloggers and their social relationships. The labels
all of which use standard data splits. Open-source imple-
of bloggers represent their personal interests. The Reddit
mentations facilitate the work of baseline experiments in
dataset is an undirected graph formed by posts collected
deep learning research. Due to the vast number of hyper-
from the Reddit discussion forum. Two posts are linked if
parameters, it is difficult to achieve the same results as
they contain comments by the same user. Each post has a
reported in the literature without using published codes.
label indicating the community to which it belongs. The
In Table 7, we provide the hyperlinks of open-source imple-
Epinions dataset is a multi-relation graph collected from an
mentations of the graph neural network models reviewed in
online product review website where commenters can have
Section 4-5. Noticeably, Fey et al. [89] published a geometric
more than one type of relation, such as trust, distrust, co-
learning library in PyTorch named PyTorch Geometric 3 ,
review, and co-rating.
which implements several graph neural networks including
Chemical/Biological Graphs Chemical molecules and com- ChebNet [12], 1stChebNet [14], GraphSage [25], MPNNs
pounds can be represented by chemical graphs with atoms [13], GAT [15] and SplineCNN [89]. Most recently, the Deep
as nodes and chemical bonds as edges. This category of Graph Library (DGL) 4 is released which provides a fast
graphs is often used to evaluate graph classification perfor- implementation of many graph neural networks with a set
mance. The NCI-1 and NCI-9 dataset contain 4110 and 4127 of functions on top of popular deep learning platforms such
chemical compounds respectively, labeled as to whether as PyTorch and MXNet.
they are active to hinder the growth of human cancer cell
lines. The MUTAG dataset contains 188 nitro compounds, 6.3 Practical Applications
labeled as to whether they are aromatic or heteroaromatic.
Graph neural networks have a wide range of applications
The D&D dataset contains 1178 protein structures, labeled
across different tasks and domains. Despite general tasks at
as to whether they are enzymes or non-enzymes. The QM9
which each category of GNNs is specialized, including node
dataset contains 133885 molecules labeled with 13 chemical
classification, node representation learning, graph classifi-
properties. The Tox21 dataset contains 12707 chemical com-
cation, graph generation, and spatial-temporal forecasting,
pounds labeled with 12 types of toxicity. Another important
GNNs can also be applied to node clustering, link predic-
dataset is the Protein-Protein Interaction network(PPI). It
tion [124], and graph partition [125]. In this section, we
contains 24 biological graphs with nodes represented by
mainly introduce practical applications according to general
proteins and edges represented by the interactions between
domains to which they belong.
proteins. In PPI, each graph is associated with one human
tissue. Each node is labeled with its biological states. Computer Vision One of the biggest application areas for
graph neural networks is computer vision. Researchers have
Unstructured Graphs To test the generalization of graph explored leveraging graph structures in scene graph gener-
neural networks to unstructured data, the k nearest neigh- ation, point clouds classification and segmentation, action
bor graph (k-NN graph) has been widely used. The MNIST recognition and many other directions.
dataset contains 70000 images of size 28×28 labeled with ten In scene graph generation, semantic relationships be-
digits. A typical way to convert an MNIST image to a graph tween objects facilitate the understanding of the semantic
is to construct an 8-NN graph based on its pixel locations. meaning behind a visual scene. Given an image, scene
The Wikipedia dataset is a word co-occurrence network ex-
tracted from the first million bytes of the Wikipedia dump. 3. https://github.com/rusty1s/pytorch geometric
Labels of words represent part-of-speech (POS) tags. The 20- 4. https://www.dgl.ai/
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, DECEMBER 2018 17
TABLE 6: Benchmark performance of four most frequently unmanned vehicles. To identify objects depicted by point
used datasets. The listed methods use the same training, clouds, [130], [131], [132] convert point clouds into k-nearest
validation, and test data for evaluation. neighbor graphs or superpoint graphs, and use graph con-
Method Cora Citeseer Pubmed PPI volution networks to explore the topological structure.
1stChebnet (2016) [14] 81.5 70.3 79.0 - In action recognition, recognizing human actions con-
GraphSage (2017) [25] - - - 61.2 tained in videos facilitates a better understanding of video
GAT (2017) [15] 83.0±0.7 72.5±0.7 79.0±0.3 97.3±0.2
Cayleynets (2017) [24] 81.9±0.7 - - - content from a machine aspect. One group of solutions
StoGCN (2018) [49] 82.0±0.8 70.9±0.2 78.7±0.3 97.8±0.05 detects the locations of human joints in video clips. Human
DualGCN (2018) [52] 83.5 72.6 80.0 -
GAAN (2018) [29] - - - 98.71±0.02
joints which are linked by skeletons naturally form a graph.
GraphInfoMax (2018) [109] 82.3±0.6 71.8±0.7 76.8±0.6 63.8±0.2 Given the time series of human joint locations, [75], [76]
GeniePath (2018) [51] - - 78.5 97.9 applies spatial-temporal neural networks to learn human
LGCN (2018) [28] 83.3±0.5 73.0±0.6 79.5±0.2 77.2±0.2
SSE (2018) [20] - - - 83.6 action patterns.
In addition, the number of possible directions in which
to apply graph neural networks in computer vision is still
graph generation models detect and recognize objects and growing. This includes human-object interaction [133], few-
predict semantic relationships between pairs of objects [126], shot image classification [134], [135], semantic segmentation
[127], [128]. Another application inverses the process by [136], [137], visual reasoning [138] and question answering
generating realistic images given scene graphs [129]. As [139].
natural language can be parsed as semantic graphs where Recommender Systems Graph-based recommender sys-
each word represents an object, it is a promising solution to tems take items and users as nodes. By leveraging the
synthesize images given textual descriptions. relations between items and items, users and users, users
In point clouds classification and segmentation, a point and items, as well as content information, graph-based
cloud is a set of 3D points recorded by LiDAR scans. recommender systems are able to produce high-quality
Solutions for this task enable LiDAR devices to see the recommendations. The key to a recommender system is
surrounding environment, which is typically beneficial for to score the importance of an item to a user. As a result,
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, DECEMBER 2018 18
network, a new person may enter into a network at any time [11] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and
and an existing person may quit the network as well. In J. Leskovec, “Graph convolutional neural networks for web-
scale recommender systems,” in Proceedings of the ACM SIGKDD
a recommender system, products may have different types International Conference on Knowledge Discovery and Data Mining.
where their inputs may have different forms such as texts ACM, 2018, pp. 974–983.
or images. Therefore, new methods should be developed to [12] M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional
handle dynamic and heterogeneous graph structures. neural networks on graphs with fast localized spectral filtering,”
in Advances in Neural Information Processing Systems, 2016, pp.
3844–3852.
8 C ONCLUSION [13] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl,
“Neural message passing for quantum chemistry,” in Proceedings
In this survey, we conduct a comprehensive overview of of the International Conference on Machine Learning, 2017, pp. 1263–
graph neural networks. We provide a taxonomy which 1272.
[14] T. N. Kipf and M. Welling, “Semi-supervised classification with
groups graph neural networks into five categories: graph graph convolutional networks,” in Proceedings of the International
convolutional networks, graph attention networks, graph Conference on Learning Representations, 2017.
autoencoders, graph generative networks and graph spatial- [15] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio,
temporal networks. We provide a thorough review, com- and Y. Bengio, “Graph attention networks,” in Proceedings of the
International Conference on Learning Representations, 2017.
parisons, and summarizations of the methods within or [16] M. Gori, G. Monfardini, and F. Scarselli, “A new model for
between categories. Then we introduce a wide range of ap- learning in graph domains,” in Proceedings of the International Joint
plications of graph neural networks. Datasets, open source Conference on Neural Networks, vol. 2. IEEE, 2005, pp. 729–734.
[17] A. Micheli, “Neural network for graphs: A contextual construc-
codes, and benchmarks for graph neural networks are sum-
tive approach,” IEEE Transactions on Neural Networks, vol. 20,
marized. Finally, we suggest four future directions for graph no. 3, pp. 498–511, 2009.
neural networks. [18] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Mon-
fardini, “The graph neural network model,” IEEE Transactions on
Neural Networks, vol. 20, no. 1, pp. 61–80, 2009.
ACKNOWLEDGMENT [19] Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel, “Gated graph
sequence neural networks,” in Proceedings of the International
This research was funded by the Australian Government Conference on Learning Representations, 2015.
through the Australian Research Council (ARC) under [20] H. Dai, Z. Kozareva, B. Dai, A. Smola, and L. Song, “Learning
grants 1) LP160100630 partnership with Australia Govern- steady-states of iterative algorithms over graphs,” in Proceedings
ment Department of Health and 2) LP150100671 partnership of the International Conference on Machine Learning, 2018, pp. 1114–
1122.
with Australia Research Alliance for Children and Youth [21] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, “Spectral net-
(ARACY) and Global Business College Australia (GBCA). works and locally connected networks on graphs,” in Proceedings
We acknowledge the support of NVIDIA Corporation and of International Conference on Learning Representations, 2014.
[22] M. Henaff, J. Bruna, and Y. LeCun, “Deep convolutional networks
MakeMagic Australia with the donation of GPU used for on graph-structured data,” arXiv preprint arXiv:1506.05163, 2015.
this research. [23] R. Li, S. Wang, F. Zhu, and J. Huang, “Adaptive graph convolu-
tional neural networks,” in Proceedings of the AAAI Conference on
Artificial Intelligence, 2018, pp. 3546–3553.
R EFERENCES [24] R. Levie, F. Monti, X. Bresson, and M. M. Bronstein, “Cayleynets:
[1] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only Graph convolutional neural networks with complex rational
look once: Unified, real-time object detection,” in Proceedings of spectral filters,” IEEE Transactions on Signal Processing, vol. 67,
the IEEE conference on computer vision and pattern recognition, 2016, no. 1, pp. 97–109, 2017.
pp. 779–788. [25] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation
[2] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards learning on large graphs,” in Advances in Neural Information
real-time object detection with region proposal networks,” in Processing Systems, 2017, pp. 1024–1034.
Advances in neural information processing systems, 2015, pp. 91–99. [26] F. Monti, D. Boscaini, J. Masci, E. Rodola, J. Svoboda, and M. M.
[3] M.-T. Luong, H. Pham, and C. D. Manning, “Effective approaches Bronstein, “Geometric deep learning on graphs and manifolds
to attention-based neural machine translation,” in Proceedings of using mixture model cnns,” in Proceedings of the IEEE Conference
the Conference on Empirical Methods in Natural Language Processing, on Computer Vision and Pattern Recognition, 2017, pp. 5115–5124.
2015, pp. 1412–1421. [27] M. Niepert, M. Ahmed, and K. Kutzkov, “Learning convolutional
[4] Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, neural networks for graphs,” in Proceedings of the International
M. Krikun, Y. Cao, Q. Gao, K. Macherey et al., “Google’s neural Conference on Machine Learning, 2016, pp. 2014–2023.
machine translation system: Bridging the gap between human [28] H. Gao, Z. Wang, and S. Ji, “Large-scale learnable graph convolu-
and machine translation,” arXiv preprint arXiv:1609.08144, 2016. tional networks,” in Proceedings of the ACM SIGKDD International
[5] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, Conference on Knowledge Discovery & Data Mining. ACM, 2018,
A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath et al., “Deep pp. 1416–1424.
neural networks for acoustic modeling in speech recognition: [29] J. Zhang, X. Shi, J. Xie, H. Ma, I. King, and D.-Y. Yeung, “Gaan:
The shared views of four research groups,” IEEE Signal processing Gated attention networks for learning on large and spatiotem-
magazine, vol. 29, no. 6, pp. 82–97, 2012. poral graphs,” in Proceedings of the Uncertainty in Artificial Intelli-
[6] Y. LeCun, Y. Bengio et al., “Convolutional networks for images, gence, 2018.
speech, and time series,” The handbook of brain theory and neural [30] P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez,
networks, vol. 3361, no. 10, p. 1995, 1995. V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro,
[7] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” R. Faulkner et al., “Relational inductive biases, deep learning, and
Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997. graph networks,” arXiv preprint arXiv:1806.01261, 2018.
[8] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Van- [31] J. B. Lee, R. A. Rossi, S. Kim, N. K. Ahmed, and E. Koh, “Attention
dergheynst, “Geometric deep learning: going beyond euclidean models in graphs: A survey,” arXiv preprint arXiv:1807.07984,
data,” IEEE Signal Processing Magazine, vol. 34, no. 4, pp. 18–42, 2018.
2017. [32] Z. Zhang, P. Cui, and W. Zhu, “Deep learning on graphs: A
[9] R. van den Berg, T. N. Kipf, and M. Welling, “Graph convolu- survey,” arXiv preprint arXiv:1812.04202, 2018.
tional matrix completion,” stat, vol. 1050, p. 7, 2017. [33] P. Cui, X. Wang, J. Pei, and W. Zhu, “A survey on network em-
[10] F. Monti, M. Bronstein, and X. Bresson, “Geometric matrix com- bedding,” IEEE Transactions on Knowledge and Data Engineering,
pletion with recurrent multi-graph neural networks,” in Advances 2017.
in Neural Information Processing Systems, 2017, pp. 3697–3707.
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, DECEMBER 2018 20
[34] W. L. Hamilton, R. Ying, and J. Leskovec, “Representation learn- [57] W. Huang, T. Zhang, Y. Rong, and J. Huang, “Adaptive sampling
ing on graphs: Methods and applications,” in Advances in Neural towards fast graph representation learning,” in Advances in Neu-
Information Processing Systems, 2017, pp. 1024–1034. ral Information Processing Systems, 2018, pp. 4563–4572.
[35] D. Zhang, J. Yin, X. Zhu, and C. Zhang, “Network representation [58] M. Zhang, Z. Cui, M. Neumann, and Y. Chen, “An end-to-end
learning: A survey,” IEEE Transactions on Big Data, 2018. deep learning architecture for graph classification,” in Proceedings
[36] H. Cai, V. W. Zheng, and K. Chang, “A comprehensive survey of of the AAAI Conference on Artificial Intelligence, 2018.
graph embedding: problems, techniques and applications,” IEEE [59] Z. Ying, J. You, C. Morris, X. Ren, W. Hamilton, and J. Leskovec,
Transactions on Knowledge and Data Engineering, 2018. “Hierarchical graph representation learning with differentiable
[37] P. Goyal and E. Ferrara, “Graph embedding techniques, applica- pooling,” in Advances in Neural Information Processing Systems,
tions, and performance: A survey,” Knowledge-Based Systems, vol. 2018, pp. 4801–4811.
151, pp. 78–94, 2018. [60] J. B. Lee, R. Rossi, and X. Kong, “Graph classification using struc-
[38] S. Pan, J. Wu, X. Zhu, C. Zhang, and Y. Wang, “Tri-party deep tural attention,” in Proceedings of the ACM SIGKDD International
network representation,” in Proceedings of the International Joint Conference on Knowledge Discovery & Data Mining. ACM, 2018,
Conference on Artificial Intelligence. AAAI Press, 2016, pp. 1895– pp. 1666–1674.
1901. [61] S. Abu-El-Haija, B. Perozzi, R. Al-Rfou, and A. A. Alemi, “Watch
[39] X. Shen, S. Pan, W. Liu, Y.-S. Ong, and Q.-S. Sun, “Discrete your step: Learning node embeddings via graph attention,” in
network embedding,” in Proceedings of the International Joint Con- Advances in Neural Information Processing Systems, 2018, pp. 9197–
ference on Artificial Intelligence, 7 2018, pp. 3549–3555. 9207.
[40] H. Yang, S. Pan, P. Zhang, L. Chen, D. Lian, and C. Zhang, [62] T. N. Kipf and M. Welling, “Variational graph auto-encoders,”
“Binarized attributed network embedding,” in IEEE International arXiv preprint arXiv:1611.07308, 2016.
Conference on Data Mining. IEEE, 2018. [63] C. Wang, S. Pan, G. Long, X. Zhu, and J. Jiang, “Mgae: Marginal-
[41] B. Perozzi, R. Al-Rfou, and S. Skiena, “Deepwalk: Online learning ized graph autoencoder for graph clustering,” in Proceedings of
of social representations,” in Proceedings of the ACM SIGKDD the ACM on Conference on Information and Knowledge Management.
international conference on Knowledge discovery and data mining. ACM, 2017, pp. 889–898.
ACM, 2014, pp. 701–710. [64] S. Pan, R. Hu, G. Long, J. Jiang, L. Yao, and C. Zhang, “Adver-
[42] S. Cao, W. Lu, and Q. Xu, “Deep neural networks for learning sarially regularized graph autoencoder for graph embedding.”
graph representations,” in Proceedings of the AAAI Conference on in Proceedings of the International Joint Conference on Artificial
Artificial Intelligence, 2016, pp. 1145–1152. Intelligence, 2018, pp. 2609–2615.
[43] D. Wang, P. Cui, and W. Zhu, “Structural deep network embed- [65] W. Yu, C. Zheng, W. Cheng, C. C. Aggarwal, D. Song, B. Zong,
ding,” in Proceedings of the ACM SIGKDD International Conference H. Chen, and W. Wang, “Learning deep network representations
on Knowledge Discovery and Data Mining. ACM, 2016, pp. 1225– with adversarially regularized autoencoders,” in Proceedings of
1234. the ACM SIGKDD International Conference on Knowledge Discovery
[44] A. Sandryhaila and J. M. Moura, “Discrete signal processing on & Data Mining. ACM, 2018, pp. 2663–2671.
graphs,” IEEE transactions on signal processing, vol. 61, no. 7, pp. [66] K. Tu, P. Cui, X. Wang, P. S. Yu, and W. Zhu, “Deep recursive
1644–1656, 2013. network embedding with regular equivalence,” in Proceedings of
[45] S. Chen, R. Varma, A. Sandryhaila, and J. Kovačević, “Discrete the ACM SIGKDD International Conference on Knowledge Discovery
signal processing on graphs: Sampling theory,” IEEE Transactions and Data Mining. ACM, 2018, pp. 2357–2366.
on Signal Processing, vol. 63, no. 24, pp. 6510–6523, 2015. [67] J. You, R. Ying, X. Ren, W. L. Hamilton, and J. Leskovec,
[46] A. Susnjara, N. Perraudin, D. Kressner, and P. Vandergheynst, “Graphrnn: A deep generative model for graphs,” Proceedings of
“Accelerated filtering on graphs using lanczos method,” arXiv International Conference on Machine Learning, 2018.
preprint arXiv:1509.04537, 2015. [68] Y. Li, O. Vinyals, C. Dyer, R. Pascanu, and P. Battaglia, “Learning
[47] J. Atwood and D. Towsley, “Diffusion-convolutional neural net- deep generative models of graphs,” in Proceedings of the Interna-
works,” in Advances in Neural Information Processing Systems, 2016, tional Conference on Machine Learning, 2018.
pp. 1993–2001. [69] N. De Cao and T. Kipf, “Molgan: An implicit generative model
[48] J. Chen, T. Ma, and C. Xiao, “Fastgcn: fast learning with graph for small molecular graphs,” arXiv preprint arXiv:1805.11973,
convolutional networks via importance sampling,” in Proceedings 2018.
of the International Conference on Learning Representations, 2018. [70] A. Bojchevski, O. Shchur, D. Zügner, and S. Günnemann, “Net-
[49] J. Chen, J. Zhu, and L. Song, “Stochastic training of graph gan: Generating graphs via random walks,” in Proceedings of the
convolutional networks with variance reduction,” in Proceedings International Conference on Machine Learning, 2018.
of the International Conference on Machine Learning, 2018, pp. 941– [71] T. Ma, J. Chen, and C. Xiao, “Constrained generation of semanti-
949. cally valid graphs via regularizing variational autoencoders,” in
[50] F. P. Such, S. Sah, M. A. Dominguez, S. Pillai, C. Zhang, Advances in Neural Information Processing Systems, 2018, pp. 7110–
A. Michael, N. D. Cahill, and R. Ptucha, “Robust spatial filter- 7121.
ing with graph convolutional neural networks,” IEEE Journal of [72] Y. Seo, M. Defferrard, P. Vandergheynst, and X. Bresson, “Struc-
Selected Topics in Signal Processing, vol. 11, no. 6, pp. 884–896, 2017. tured sequence modeling with graph convolutional recurrent
[51] Z. Liu, C. Chen, L. Li, J. Zhou, X. Li, and L. Song, “Geniepath: networks,” arXiv preprint arXiv:1612.07659, 2016.
Graph neural networks with adaptive receptive paths,” arXiv [73] Y. Li, R. Yu, C. Shahabi, and Y. Liu, “Diffusion convolutional
preprint arXiv:1802.00910, 2018. recurrent neural network: Data-driven traffic forecasting,” in
[52] C. Zhuang and Q. Ma, “Dual graph convolutional networks for Proceedings of International Conference on Learning Representations,
graph-based semi-supervised classification,” in Proceedings of the 2018.
World Wide Web Conference on World Wide Web. International [74] B. Yu, H. Yin, and Z. Zhu, “Spatio-temporal graph convolutional
World Wide Web Conferences Steering Committee, 2018, pp. 499– networks: A deep learning framework for traffic forecasting,”
508. in Proceedings of the International Joint Conference on Artificial
[53] T. Derr, Y. Ma, and J. Tang, “Signed graph convolutional net- Intelligence, 2018, pp. 3634–3640.
work,” arXiv preprint arXiv:1808.06354, 2018. [75] S. Yan, Y. Xiong, and D. Lin, “Spatial temporal graph con-
[54] T. Pham, T. Tran, D. Q. Phung, and S. Venkatesh, “Column volutional networks for skeleton-based action recognition,” in
networks for collective classification,” in Proceedings of the AAAI Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
Conference on Artificial Intelligence, 2017, pp. 2485–2491. [76] A. Jain, A. R. Zamir, S. Savarese, and A. Saxena, “Structural-rnn:
[55] M. Simonovsky and N. Komodakis, “Dynamic edgeconditioned Deep learning on spatio-temporal graphs,” in Proceedings of the
filters in convolutional neural networks on graphs,” in Proceed- IEEE Conference on Computer Vision and Pattern Recognition, 2016,
ings of the IEEE conference on computer vision and pattern recognition, pp. 5308–5317.
2017. [77] S. Pan, J. Wu, X. Zhu, C. Zhang, and P. S. Yu, “Joint structure
[56] S. Kearnes, K. McCloskey, M. Berndl, V. Pande, and P. Riley, feature exploration and regularization for multi-task graph clas-
“Molecular graph convolutions: moving beyond fingerprints,” sification,” IEEE Transactions on Knowledge and Data Engineering,
Journal of computer-aided molecular design, vol. 30, no. 8, pp. 595– vol. 28, no. 3, pp. 715–728, 2016.
608, 2016. [78] S. Pan, J. Wu, X. Zhu, G. Long, and C. Zhang, “Task sensitive fea-
ture exploration and learning for multitask graph classification,”
IEEE transactions on cybernetics, vol. 47, no. 3, pp. 744–758, 2017.
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, DECEMBER 2018 21
[79] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Van- [101] R. Gómez-Bombarelli, J. N. Wei, D. Duvenaud, J. M. Hernández-
dergheynst, “The emerging field of signal processing on graphs: Lobato, B. Sánchez-Lengeling, D. Sheberla, J. Aguilera-
Extending high-dimensional data analysis to networks and other Iparraguirre, T. D. Hirzel, R. P. Adams, and A. Aspuru-Guzik,
irregular domains,” IEEE Signal Processing Magazine, vol. 30, “Automatic chemical design using a data-driven continuous
no. 3, pp. 83–98, 2013. representation of molecules,” ACS central science, vol. 4, no. 2,
[80] L. B. Almeida, “A learning rule for asynchronous perceptrons pp. 268–276, 2018.
with feedback in a combinatorial environment.” in Proceedings of [102] B. Chen, L. Sun, and X. Han, “Sequence-to-action: End-to-end
the International Conference on Neural Networks, vol. 2. IEEE, 1987, semantic graph generation for semantic parsing,” in Proceedings of
pp. 609–618. the Annual Meeting of the Association for Computational Linguistics,
[81] F. J. Pineda, “Generalization of back-propagation to recurrent 2018, pp. 766–777.
neural networks,” Physical review letters, vol. 59, no. 19, p. 2229, [103] D. D. Johnson, “Learning graphical state transitions,” in Proceed-
1987. ings of the International Conference on Learning Representations, 2016.
[82] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, [104] M. Schlichtkrull, T. N. Kipf, P. Bloem, R. van den Berg, I. Titov,
F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase rep- and M. Welling, “Modeling relational data with graph convolu-
resentations using rnn encoder-decoder for statistical machine tional networks,” in European Semantic Web Conference. Springer,
translation,” in Proceedings of the Conference on Empirical Methods 2018, pp. 593–607.
in Natural Language Processing, 2014, pp. 1724–1734. [105] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C.
[83] D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, Courville, “Improved training of wasserstein gans,” in Advances
T. Hirzel, A. Aspuru-Guzik, and R. P. Adams, “Convolutional in Neural Information Processing Systems, 2017, pp. 5767–5777.
networks on graphs for learning molecular fingerprints,” in [106] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein gan,” arXiv
Advances in Neural Information Processing Systems, 2015, pp. 2224– preprint arXiv:1701.07875, 2017.
2232. [107] P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-
[84] K. T. Schütt, F. Arbabzadah, S. Chmiela, K. R. Müller, and Rad, “Collective classification in network data,” AI magazine,
A. Tkatchenko, “Quantum-chemical insights from deep tensor vol. 29, no. 3, p. 93, 2008.
neural networks,” Nature communications, vol. 8, p. 13890, 2017. [108] X. Zhang, Y. Li, D. Shen, and L. Carin, “Diffusion maps for textual
[85] B. Weisfeiler and A. Lehman, “A reduction of a graph to a network embedding,” in Advances in Neural Information Processing
canonical form and an algebra arising during this reduction,” Systems, 2018.
Nauchno-Technicheskaya Informatsia, vol. 2, no. 9, pp. 12–16, 1968. [109] P. Veličković, W. Fedus, W. L. Hamilton, P. Liò, Y. Ben-
[86] B. L. Douglas, “The weisfeiler-lehman method and graph isomor- gio, and R. D. Hjelm, “Deep graph infomax,” arXiv preprint
phism testing,” arXiv preprint arXiv:1101.5211, 2011. arXiv:1809.10341, 2018.
[87] J. Masci, D. Boscaini, M. Bronstein, and P. Vandergheynst, [110] Q. Li, Z. Han, and X.-M. Wu, “Deeper insights into graph convo-
“Geodesic convolutional neural networks on riemannian mani- lutional networks for semi-supervised learning,” in Proceedings of
folds,” in Proceedings of the IEEE International Conference on Com- the AAAI Conference on Artificial Intelligence, 2018.
puter Vision Workshops, 2015, pp. 37–45. [111] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su, “Arnetminer:
[88] D. Boscaini, J. Masci, E. Rodolà, and M. Bronstein, “Learning extraction and mining of academic social networks,” in Proceed-
shape correspondence with anisotropic convolutional neural net- ings of the ACM SIGKDD International Conference on Knowledge
works,” in Advances in Neural Information Processing Systems, 2016, Discovery and Data Mining. ACM, 2008, pp. 990–998.
pp. 3189–3197. [112] Y. Ma, S. Wang, C. C. Aggarwal, D. Yin, and J. Tang, “Multi-
[89] M. Fey, J. E. Lenssen, F. Weichert, and H. Müller, “Splinecnn: Fast dimensional graph convolutional networks,” arXiv preprint
geometric deep learning with continuous b-spline kernels,” in arXiv:1808.06099, 2018.
Proceedings of the IEEE Conference on Computer Vision and Pattern [113] L. Tang and H. Liu, “Relational learning via latent social dimen-
Recognition, 2018, pp. 869–877. sions,” in Proceedings of the ACM SIGKDD International Conference
[90] D. V. Tran, A. Sperduti et al., “On filter size in graph convolu- on Knowledge Ciscovery and Data Mining. ACM, 2009, pp. 817–
tional networks,” in 2018 IEEE Symposium Series on Computational 826.
Intelligence (SSCI). IEEE, 2018, pp. 1534–1541. [114] H. Wang, J. Wang, J. Wang, M. Zhao, W. Zhang, F. Zhang, X. Xie,
[91] S. Pan, J. Wu, and X. Zhu, “Cogboost: Boosting for fast cost- and M. Guo, “Graphgan: Graph representation learning with
sensitive graph classification,” IEEE Transactions on Knowledge & generative adversarial nets,” in Proceedings of the AAAI Conference
Data Engineering, no. 1, pp. 1–1, 2015. on Artificial Intelligence, 2017.
[92] K. Xu, W. Hu, J. Leskovec, and S. Jegelka, “How powerful are [115] M. Zitnik and J. Leskovec, “Predicting multicellular function
graph neural networks,” arXiv preprint arXiv:1810.00826, 2018. through multi-layer tissue networks,” Bioinformatics, vol. 33,
[93] S. Verma and Z.-L. Zhang, “Graph capsule convolutional neural no. 14, pp. i190–i198, 2017.
networks,” arXiv preprint arXiv:1805.08090, 2018. [116] N. Wale, I. A. Watson, and G. Karypis, “Comparison of descrip-
[94] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. tor spaces for chemical compound retrieval and classification,”
Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Knowledge and Information Systems, vol. 14, no. 3, pp. 347–375,
in Advances in Neural Information Processing Systems, 2017, pp. 2008.
5998–6008. [117] A. K. Debnath, R. L. Lopez de Compadre, G. Debnath, A. J.
[95] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde- Shusterman, and C. Hansch, “Structure-activity relationship of
Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adver- mutagenic aromatic and heteroaromatic nitro compounds. cor-
sarial nets,” in Advances in neural information processing systems, relation with molecular orbital energies and hydrophobicity,”
2014, pp. 2672–2680. Journal of medicinal chemistry, vol. 34, no. 2, pp. 786–797, 1991.
[96] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence [118] P. D. Dobson and A. J. Doig, “Distinguishing enzyme structures
learning with neural networks,” in Advances in Neural Information from non-enzymes without alignments,” Journal of molecular biol-
Processing Systems, 2014, pp. 3104–3112. ogy, vol. 330, no. 4, pp. 771–783, 2003.
[97] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Ex- [119] R. Ramakrishnan, P. O. Dral, M. Rupp, and O. A. Von Lilien-
tracting and composing robust features with denoising autoen- feld, “Quantum chemistry structures and properties of 134 kilo
coders,” in Proceedings of the international conference on Machine molecules,” Scientific data, vol. 1, p. 140022, 2014.
learning. ACM, 2008, pp. 1096–1103. [120] T. Joachims, “A probabilistic analysis of the rocchio algorithm
[98] G. L. Guimaraes, B. Sanchez-Lengeling, C. Outeiral, P. L. C. with tfidf for text categorization.” Carnegie-mellon univ pitts-
Farias, and A. Aspuru-Guzik, “Objective-reinforced generative burgh pa dept of computer science, Tech. Rep., 1996.
adversarial networks (organ) for sequence generation models,” [121] H. Jagadish, J. Gehrke, A. Labrinidis, Y. Papakonstantinou, J. M.
arXiv preprint arXiv:1705.10843, 2017. Patel, R. Ramakrishnan, and C. Shahabi, “Big data and its tech-
[99] M. J. Kusner, B. Paige, and J. M. Hernández-Lobato, “Grammar nical challenges,” Communications of the ACM, vol. 57, no. 7, pp.
variational autoencoder,” arXiv preprint arXiv:1703.01925, 2017. 86–94, 2014.
[100] H. Dai, Y. Tian, B. Dai, S. Skiena, and L. Song, “Syntax-directed [122] B. N. Miller, I. Albert, S. K. Lam, J. A. Konstan, and J. Riedl,
variational autoencoder for molecule generation,” in Proceedings “Movielens unplugged: experiences with an occasionally con-
of the International Conference on Learning Representations, 2018. nected recommender system,” in Proceedings of the international
conference on Intelligent user interfaces. ACM, 2003, pp. 263–266.
JOURNAL OF LATEX CLASS FILES, VOL. X, NO. X, DECEMBER 2018 22
[123] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr, question answering,” in Advances in Neural Information Processing
and T. M. Mitchell, “Toward an architecture for never-ending Systems, 2018, pp. 2655–2666.
language learning.” in Proceedings of the AAAI Conference on [140] Z. Cui, K. Henrickson, R. Ke, and Y. Wang, “High-order graph
Artificial Intelligence, 2010, pp. 1306–1313. convolutional recurrent neural network: a deep learning frame-
[124] M. Zhang and Y. Chen, “Link prediction based on graph neural work for network-scale traffic learning and forecasting,” arXiv
networks,” in Advances in Neural Information Processing Systems, preprint arXiv:1802.07007, 2018.
2018. [141] H. Yao, F. Wu, J. Ke, X. Tang, Y. Jia, S. Lu, P. Gong, J. Ye,
[125] T. Kawamoto, M. Tsubaki, and T. Obuchi, “Mean-field theory and Z. Li, “Deep multi-view spatial-temporal network for taxi
of graph neural networks in graph partitioning,” in Advances in demand prediction,” in Proceedings of the AAAI Conference on
Neural Information Processing Systems, 2018, pp. 4362–4372. Artificial Intelligence, 2018, pp. 2588–2595.
[126] D. Xu, Y. Zhu, C. B. Choy, and L. Fei-Fei, “Scene graph generation [142] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, “Line:
by iterative message passing,” in Proceedings of the IEEE Confer- Large-scale information network embedding,” in Proceedings of
ence on Computer Vision and Pattern Recognition, vol. 2, 2017. the International Conference on World Wide Web. International
[127] J. Yang, J. Lu, S. Lee, D. Batra, and D. Parikh, “Graph r-cnn World Wide Web Conferences Steering Committee, 2015, pp.
for scene graph generation,” in European Conference on Computer 1067–1077.
Vision. Springer, 2018, pp. 690–706. [143] A. Fout, J. Byrd, B. Shariat, and A. Ben-Hur, “Protein interface
[128] Y. Li, W. Ouyang, B. Zhou, J. Shi, C. Zhang, and X. Wang, “Fac- prediction using graph convolutional networks,” in Advances in
torizable net: an efficient subgraph-based framework for scene Neural Information Processing Systems, 2017, pp. 6530–6539.
graph generation,” in European Conference on Computer Vision. [144] J. You, B. Liu, R. Ying, V. Pande, and J. Leskovec, “Graph
Springer, 2018, pp. 346–363. convolutional policy network for goal-directed molecular graph
[129] J. Johnson, A. Gupta, and L. Fei-Fei, “Image generation from generation,” in Advances in Neural Information Processing Systems,
scene graphs,” arXiv preprint, 2018. 2018.
[130] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. [145] M. Allamanis, M. Brockschmidt, and M. Khademi, “Learning to
Solomon, “Dynamic graph cnn for learning on point clouds,” represent programs with graphs,” in Proceedings of the Interna-
arXiv preprint arXiv:1801.07829, 2018. tional Conference on Learning Representations, 2017.
[131] L. Landrieu and M. Simonovsky, “Large-scale point cloud seman- [146] J. Qiu, J. Tang, H. Ma, Y. Dong, K. Wang, and J. Tang, “Deepinf:
tic segmentation with superpoint graphs,” in Proceedings of the Social influence prediction with deep learning,” in Proceedings of
IEEE Conference on Computer Vision and Pattern Recognition, 2018. the ACM SIGKDD International Conference on Knowledge Discovery
[132] G. Te, W. Hu, Z. Guo, and A. Zheng, “Rgcnn: Regular- & Data Mining. ACM, 2018, pp. 2110–2119.
ized graph cnn for point cloud segmentation,” arXiv preprint [147] D. Zügner, A. Akbarnejad, and S. Günnemann, “Adversarial
arXiv:1806.02952, 2018. attacks on neural networks for graph data,” in Proceedings of the
[133] S. Qi, W. Wang, B. Jia, J. Shen, and S.-C. Zhu, “Learning human- ACM SIGKDD International Conference on Knowledge Discovery and
object interactions by graph parsing neural networks,” in Proceed- Data Mining. ACM, 2018, pp. 2847–2856.
ings of the European Conference on Computer Vision (ECCV), 2018, [148] E. Choi, M. T. Bahadori, L. Song, W. F. Stewart, and J. Sun, “Gram:
pp. 401–417. graph-based attention model for healthcare representation learn-
[134] V. G. Satorras and J. B. Estrach, “Few-shot learning with graph ing,” in Proceedings of the ACM SIGKDD International Conference on
neural networks,” in Proceedings of the International Conference on Knowledge Discovery and Data Mining. ACM, 2017, pp. 787–795.
Learning Representations, 2018. [149] E. Choi, C. Xiao, W. Stewart, and J. Sun, “Mime: Multilevel
[135] M. Guo, E. Chou, D.-A. Huang, S. Song, S. Yeung, and L. Fei- medical embedding of electronic health records for predictive
Fei, “Neural graph matching networks for fewshot 3d action healthcare,” in Advances in Neural Information Processing Systems,
recognition,” in European Conference on Computer Vision. Springer, 2018, pp. 4548–4558.
2018, pp. 673–689. [150] J. Kawahara, C. J. Brown, S. P. Miller, B. G. Booth, V. Chau, R. E.
[136] X. Qi, R. Liao, J. Jia, S. Fidler, and R. Urtasun, “3d graph neural Grunau, J. G. Zwicker, and G. Hamarneh, “Brainnetcnn: convo-
networks for rgbd semantic segmentation,” in Proceedings of the lutional neural networks for brain networks; towards predicting
IEEE Conference on Computer Vision and Pattern Recognition, 2017, neurodevelopment,” NeuroImage, vol. 146, pp. 1038–1049, 2017.
pp. 5199–5208. [151] T. H. Nguyen and R. Grishman, “Graph convolutional networks
[137] L. Yi, H. Su, X. Guo, and L. J. Guibas, “Syncspeccnn: Synchro- with argument-aware pooling for event detection,” in Proceedings
nized spectral cnn for 3d shape segmentation.” in Proceedings of of the AAAI Conference on Artificial Intelligence, 2018, pp. 5900–
the IEEE Conference on Computer Vision and Pattern Recognition, 5907.
2017, pp. 6584–6592. [152] Z. Li, Q. Chen, and V. Koltun, “Combinatorial optimization
[138] X. Chen, L.-J. Li, L. Fei-Fei, and A. Gupta, “Iterative visual reason- with graph convolutional networks and guided tree search,” in
ing beyond convolutions,” in Proceedings of the IEEE Conference on Advances in Neural Information Processing Systems, 2018, pp. 536–
Computer Vision and Pattern Recognition, 2018. 545.
[139] M. Narasimhan, S. Lazebnik, and A. Schwing, “Out of the [153] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning
box: Reasoning with graph convolution nets for factual visual for image recognition,” in Proceedings of the IEEE conference on
computer vision and pattern recognition, 2016, pp. 770–778.