Graph Neural Networks

Download as odp, pdf, or txt
Download as odp, pdf, or txt
You are on page 1of 27

Graph Neural Networks


Classic deep learning architectures such as Convolutional
Neural Networks (CNNs) and Recurrent Neural Networks
(RNNs) require the input data domain to be regular, such
as 2D or 3D Euclidean grids for Computer Vision and 1D
lines for Natural Language Processing.

However, most real-world data beyond images and language
has an underlying structure that is non-Euclidean. Such
complex data commonly occurs in science and engineering,
and can be modelled by heterogeneous graphs. Examples
include chemical graphs, computer graphics, social networks,
genetics, neuroscience, and sensor networks.
Why are graphs useful?

Graph-structured data can be large and complex (in the case
of social networks, on the scale of billions), and is a natural
target for machine learning applications. However, designing
models for learning from non-Euclidean data is challenging as
there are no familiar properties such as coordinate systems,
vector space structure, or shift invariance.
Why are graphs useful?

This is a very flexible data structure that generalizes
many other data structures. For example, if there are no
edges, then it becomes a set; if there are only “vertical”
edges and any two nodes are connected by exactly one
path, then we have a tree. Such flexibility is both good
and bad as I’ll discuss in this tutorial.
In the context of computer vision (CV) and machine learning
(ML), studying graphs and the models to learn from them can
give us at least four benefits:

We can become closer to solving important problems
that previously were too challenging, such as: drug
discovery for cancer.

In most CV/ML applications, data can be actually viewed
as graphs even though you used to represent them as
another data structure. Representing your data as
graph(s) gives you a lot of flexibility and can give you a
very different and interesting perspective on your
problem. For instance, instead of learning from image
pixels you can learn from “superpixels”. Graphs also let
you impose a relational inductive bias in data — some
prior knowledge you have about the problem. For
instance, if you want to reason about a human pose,
your relational bias can be a graph of skeleton joints of a
human body.

Your favourite neural network itself can be viewed
as a graph, where nodes are neurons and edges
are weights, or where nodes are layers and edges
denote flow of forward/backward pass (in which
case we are talking about a computational graph
used in TensorFlow, PyTorch and other DL
frameworks). An application can be optimization
of a computational graph, neural architecture
search, analyzing training behavior, etc.

Finally, you can solve many problems, where data
can be more naturally represented as graphs,
more effectively. This includes, but is not limited
to, molecule and social network classification
and , 3D Mesh classification and many other
exciting problems.
For example, recognizing and analyzing faces and
emotions.
Let’s consider an undirected graph G with N nodes. Edges E represent
undirected connections between nodes. Nodes and edges typically come
from your intuition about the problem. Our intuition in the case of images is
that nodes are pixels or superpixels (a group of pixels of weird shape) and
edges are spatial distances between them. For example, the MNIST image
below on the left is typically represented as an 28×28 dimensional matrix.
We can also represent it as a set of N=28*28=784 pixels. So, our graph G is
going to have N=784 nodes and edges will have large values (thicker edges
in the Figure below) for closely located pixels and small values (thinner
edges) for remote pixels.
What is Graph Neural Networks?


Graph/Geometric Deep Learning is an umbrella term for
emerging techniques attempting to generalize deep neural
networks to non-Euclidean domains such as graphs and
manifolds. We are interested to designing neural networks
for graphs of arbitrary topologies and structures in order to
solve generic graph problems, such as vertex classification,
graph classification, graph regression, and graph
generation.

These Graph Neural Network (GNN) architectures are used
as backbones for challenging domain-specific applications
in a myriad of domains, including chemistry, social
networks, recommendations and computer graphics.
How it works?

Each GNN layer computes
d-dimensional
representations for the
nodes/edges of the graph
through recursive
neighborhood diffusion (or
message passing), where
each graph node gathers
features from its neighbors
to represent local graph
structure. Stacking GNN
layers allows the network to
build node representations
from the -hop neighborhood
of each node.
● Let hil denote the feature vector at layer l associated
with node . The updated features hil+1 at the next layer
l+1 are obtained by applying non-linear
transformations to the central feature vector hil and the
feature vectors hjl for all nodes j in the neighborhood of
node i (defined by the graph structure). This
guarantees the transformation to build local reception
fields, such as in standard ConvNets for computer
vision, and be invariant to both graph size and vertex
re-indexing.
● Thus, the most generic version of a feature vector hil+1 at
vertex at the next layer in the graph network is:


Where j → i denotes the set of neighboring nodes j pointed
to node i, which can be replaced by j , the set of neighbors
of node i, if the graph is undirected. In other words, a GNN
is defined by a mapping f taking as input a vector hil(the
feature vector of the center vertex) as well as an un-ordered
set of vectors hjl (the feature vectors of all neighboring
vertices).

The arbitrary choice of the mapping defines an instantiation
of a class of GNNs, e.g., GraphSage, MoNet, GAT, etc.
What makes a neural network a
graph neural network?

You know how a classical neural network works, right? We have some C-
dimensional features X as the input to the net. Using our running MNIST
example, X will be our C=784 dimensional pixel features (i.e. a “flattened”
image). These features get multiplied by C×F dimensional weights W that
we update during training to get the output closer to what we expect. The
result can be directly used to solve the task (e.g. in case of regression) or
can be further fed to some nonlinearity (activation), like ReLU, or other
differentiable (or more precisely, sub-differentiable) functions to form a
multi-layer network. In general, the output of some layer l is:

The signal in MNIST is so strong, that you can
get an accuracy of 91% by just using the
formula above and the Cross Entropy loss
without any nonlinearities and other tricks.

Now, how do we transform our vanilla neural
network to a graph neural network?The core
idea behind GNNs is aggregation over
“neighbors”.

For example, this can be a fragment (subgraph) of a
social network with 5 persons and an edge between
a pair of nodes denotes if two people are friends (or
at least one of them think so). An adjacency matrix
(usually denoted as A) in the figure below on the
right is a way to represent these edges in a matrix
form, convenient for our deep learning frameworks.
Yellow cells in the matrix represent the edge and
blue — the absence of the edge.

This is a typical, but not the only, way to define an
adjacency matrix for visual tasks. This adjacency matrix
is our prior, or our inductive bias, we impose on the
model based on our intuition that nearby pixels should
be connected and remote pixels shouldn’t or should
have very thin edge (edge of a small value). This is
motivated by observations that in natural images nearby
pixels often correspond to the same object or objects
that interact frequently, so it makes a lot of sense to
connect such pixels.

So, now instead of having just features X we
have some fancy matrix A with values in the
range [0,1]. It’s important to note that once we
know that our input is a graph, we assume that
there is no canonical order of nodes that will be
consistent across all other graphs in the
dataset. In terms of images, it means that pixels
are assumed to be randomly shuffled. Finding
the canonical order of nodes is combinatorially
unsolvable in practice. Even though for MNIST
we technically can cheat by knowing this order
(because data are originally from a regular
grid), it’s not going to work on actual graph
datasets.

Remember that our matrix of features X has 𝑁 rows and C
columns. So, in terms of graphs, each row corresponds to one
node and C is the dimensionality of node features. But now
the problem is that we don’t know the order of nodes, so we
don’t know in which row to put features of a particular node. If
we just pretend to ignore this problem and feed X directly to an
MLP as we did before, the effect will be the same as feeding
images with randomly shuffled pixels with independent
shuffling for each image! Surprisingly, a neural network can in
principle still fit such random data, however test performance
will be close to random prediction. One of the solutions is to
simply use the adjacency matrix A, we created before, in the
following way:

We just need to make sure that row i in A
corresponds to features of node in row i of X.
Here, We are using 𝓐 instead of plain A,
because often you want to normalize A. If 𝓐=A,
the matrix multiplication 𝓐X⁽ˡ⁾ will be equivalent
to summing features of neighbors, which turned
out to be useful in many tasks. Most commonly,
you normalize it so that 𝓐X⁽ˡ⁾ averages features
of neighbors, i.e. 𝓐=A/ΣᵢAᵢ.

After running the code, you may notice that the
classification accuracy is actually about the
same. What’s the problem? Aren’t graph
networks supposed to work better? Well, they
are, in many cases. But not in this one,
because the 𝓐X⁽ˡ⁾ operator we added is actually
nothing else, but a Gaussian filter:

So, our graph neural network turned out to be
equivalent to a convolutional neural network with a
single Gaussian filter, that we never update during
training, followed by the fully-connected layer. This
filter basically blurs/smooths the image, which is not
a particularly useful thing to do (see the image above
on the right). However, this is the simplest variant of
a graph neural network, which nevertheless works
great on graph-structured data. To make GNNs work
better on regular graphs, like images, we need to
apply a bunch of tricks. For example, instead of using
a predefined Gaussian filter, we can learn to predict
an edge between any pair of pixels.
Conclusion


Graph Neural Networks are a very flexible and
interesting family of neural networks that can be
applied to really complex data. As always, such
flexibility must come at a certain cost. In case of
GNNs it is the difficulty of regularizing the model
by defining such operators as convolution.
Research in that direction is advancing quite
fast, so that GNNs will see application in
increasingly wider areas of machine learning
and computer vision.
List of references

https://arxiv.org/abs/1611.08097

https://arxiv.org/abs/1605.07736

https://www.nature.com/articles/s41598-019-45349-y

https://arxiv.org/abs/1603.07063

https://arxiv.org/abs/1801.07455

https://ai.googleblog.com/2018/06/how-can-neural-network-similarity-help.html

https://arxiv.org/abs/1811.09595

https://arxiv.org/abs/1711.08920

https://www.cv-foundation.org/openaccess/content_cvpr_2015/html/Antonakos_Active_
Pictorial_Structures_2015_CVPR_paper.html

https://arxiv.org/abs/1711.08920

https://arxiv.org/abs/1606.09375

https://arxiv.org/abs/1611.03530

https://arxiv.org/abs/1810.00826
Questions

What is the main feature of graph neural
networks?

The main feature is flexibility, graph neural
networks allow us to use large and complex data,
which we can not use with classic neural networks
before.

Does this mean that graph neural networks will
always be better than classical neural networks?

No, it does not. There is no universal solution for
deep learning tasks. But it is likely that graph
neural networks will occupy a niche for certain
tasks.
Questions

Are there ready-made frameworks for working with graph
neural networks, such as Tensorflow or pytorch for calsic
neural networks?

Yes, there are. For example, Deep Graph Library (DGL
https://docs.dgl.ai/index.html ) is a Python package built
for easy implementation of graph neural network model
family, on top of existing DL frameworks.

Can you give examples of the application of graph neural
networks in practice?

This year there are applications in fixing bugs in
Javascript, game playing, answering IQ-like tests,
optimization of TensorFlow computational graphs,
molecule generation, and question generation in dialogue
systems.
Thank you

You might also like