PyTorch Geometric
Thibault Formal
Motivation
Up to now, we have been dealing with models, but not with how to implement / train
them
PyTorch Geometric (PyG) is a library for deep learning on irregularly structured
input data such as graphs, point clouds and manifolds, built upon PyTorch
github: https://github.com/rusty1s/pytorch_geometric
doc: https://rusty1s.github.io/pytorch_geometric/build/html/index.html
Main contributions (overview)
● Large number of common benchmark datasets (Cora, CiteSeer, etc.)
● Easy to use dataset interface (for custom ones)
● Helpful data transforms
● Mini-batch handling (graphs of different sizes)
● Clean message passing API
● High data throughput
● Bundles many recently proposed GCN-like layers
● …
Many already implemented operators/models (> 25)
● How Powerful are Graph Neural Networks?, Xu et al., ICLR 2019 (Rohit’s
presentation) → GINConv
● [...]
Installation
Quick note on installation
It took me one day to set up everything on my laptop + servers
Installation is well documented, but a few things are not clear (at least to me)
My recommendation (additional notes w.r.t. doc)
1. Set up a new conda environment
2. conda install -c psi4 gcc-5 → this ensures a valid gcc version (see:
https://github.com/rusty1s/pytorch_geometric/issues/170) (thanks Rohit)
3. Check that CUDA is installed (on my machine it is located in /usr/local/cuda-10.1/, on
the servers in /nfs/core/cuda/*)1
4. Then install PyTorch (1.1, choosing the right CUDA version):
https://pytorch.org/get-started/locally/
5. Then strictly follow installation guide:
https://rusty1s.github.io/pytorch_geometric/build/html/notes/installation.html
6. This should work!
1: On my laptop I installed CUDA following: https://www.if-not-true-then-false.com/2018/install-nvidia-cuda-toolkit-on-fedora/
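Condensed as a shell sketch (a hedged transcription of the steps above, not a tested script: environment name, Python version and cudatoolkit version are assumptions to adapt to your machine):

```shell
# 1. Fresh conda environment (name and Python version are placeholders)
conda create -n pyg python=3.7
conda activate pyg

# 2. Ensure a valid gcc version (see pytorch_geometric issue #170)
conda install -c psi4 gcc-5

# 3. Check that CUDA is visible (path varies per machine)
ls /usr/local/cuda-10.1/

# 4. Install PyTorch 1.1 with the matching CUDA version
#    (take the exact command from https://pytorch.org/get-started/locally/)
conda install pytorch torchvision cudatoolkit=10.0 -c pytorch

# 5. Then strictly follow the PyG installation guide
pip install torch-scatter torch-sparse torch-cluster torch-spline-conv
pip install torch-geometric
```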
A note on design choice
SpMM vs gather + scatter
Generally, you can express message passing as a (sparse) product of the adjacency
matrix with the node feature matrix. You can then implement the model with SpMM,
i.e. sparse-dense matrix multiplication (this is what is done in Kipf’s GCN)
[Figure: scatter example — per-edge values are summed per source index]
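The two designs give the same result; a framework-free sketch on a hypothetical 4-node graph (edge list and features invented for illustration), contrasting the adjacency-matmul view with an explicit gather + scatter:

```python
# Graph in COO format: edge (src, dst) means "src sends a message to dst".
edges = [(0, 1), (0, 2), (1, 2), (3, 2)]
x = [1.0, 2.0, 3.0, 4.0]           # one scalar feature per node
n = len(x)

# --- Variant 1: dense adjacency matmul (what SpMM computes sparsely) ---
A = [[0.0] * n for _ in range(n)]
for src, dst in edges:
    A[dst][src] = 1.0              # row = receiver, column = sender
out_spmm = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]

# --- Variant 2: gather messages, then scatter-add onto the targets ---
messages = [x[src] for src, dst in edges]    # gather
out_scatter = [0.0] * n
for (src, dst), m in zip(edges, messages):   # scatter (sum aggregation)
    out_scatter[dst] += m

assert out_spmm == out_scatter     # both compute the same neighbour sums
```

The gather + scatter formulation is more flexible (messages may depend on both endpoints and edge features), which is why PyG builds its message passing API on it.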
A torch_geometric.data.Data object describes a graph through attributes such as x
(node features), edge_index, edge_attr and y. None of these are required, and one
can easily add new attributes by extending the class
Example
Dataset and DataLoader
Then, similarly to PyTorch, you can define a torch_geometric.data.Dataset (which
may contain several graphs) and a torch_geometric.data.DataLoader (to iterate
over a dataset)
Large number of benchmark datasets + a clean interface to build your own + useful
data transforms
How to batch graphs with different numbers of nodes / edges? → Build a large sparse
block-diagonal adjacency matrix and concatenate the feature and target matrices (no
messages are exchanged between the disconnected graphs)
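The batching mechanism above can be sketched without any framework (function and field names are hypothetical, not PyG's): offset each graph's edge indices by the running node count, concatenate features, and keep a `batch` vector mapping nodes back to their graph.

```python
def collate(graphs):
    """Merge graphs into one big disconnected graph.
    Each graph is a dict with 'x' (list of node features) and
    'edge_index' (list of (src, dst) pairs)."""
    x, edge_index, batch = [], [], []
    offset = 0
    for gid, g in enumerate(graphs):
        x.extend(g["x"])
        # Shift edges by the number of nodes seen so far, so no edge
        # can cross from one graph into another.
        edge_index.extend((s + offset, d + offset) for s, d in g["edge_index"])
        batch.extend([gid] * len(g["x"]))    # node -> graph id
        offset += len(g["x"])
    return {"x": x, "edge_index": edge_index, "batch": batch}

g1 = {"x": [10, 11], "edge_index": [(0, 1)]}
g2 = {"x": [20, 21, 22], "edge_index": [(0, 2), (1, 2)]}
big = collate([g1, g2])
# g2's edges are shifted by 2 (the node count of g1).
```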
The Message Passing interface
Message Passing networks
Neighborhood aggregation or message passing scheme ([2] Gilmer et al., 2017)
[2] Gilmer et al., Neural Message Passing for Quantum Chemistry, ICML 2017
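The scheme can be written compactly (following the notation used in the PyG documentation):

```latex
x_i^{(k)} = \gamma^{(k)}\!\left( x_i^{(k-1)},\;
    \square_{j \in \mathcal{N}(i)} \,
    \phi^{(k)}\!\left( x_i^{(k-1)},\, x_j^{(k-1)},\, e_{j,i} \right) \right)
```

where φ is the (differentiable) message function, □ a differentiable, permutation-invariant aggregation (e.g. sum, mean or max), and γ the update function.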
The MessagePassing base class
PyG provides the torch_geometric.nn.MessagePassing base class, which helps in
creating such networks by automatically taking care of message propagation. You
only need to define:
● message function
● update function
● aggregation scheme (add, mean or max)
Under the hood (high level view)
Toy example
A (COO) =
[0, 0, 0, 1, 1, 2, 3, 3]
[1, 2, 3, 0, 3, 0, 0, 1]
Toy example - simple node update
A (COO) =
[0, 0, 0, 1, 1, 2, 3, 3] ← source nodes i, which aggregate
[1, 2, 3, 0, 3, 0, 0, 1] ← target nodes j, whose features become the messages
Aggregation is a scatter op (e.g. sum), using the source index
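Worked out without any framework (node features invented): gather one message per edge, then scatter-sum onto the aggregating node. This graph is undirected (every edge appears in both directions), so swapping which row gathers and which aggregates gives the same result.

```python
agg = [0, 0, 0, 1, 1, 2, 3, 3]   # row 0: nodes that aggregate
nbr = [1, 2, 3, 0, 3, 0, 0, 1]   # row 1: nodes supplying the messages
x = [1.0, 2.0, 3.0, 4.0]         # one scalar feature per node (made up)

messages = [x[j] for j in nbr]   # gather: one message per edge
out = [0.0] * len(x)
for i, m in zip(agg, messages):  # scatter-add (sum aggregation)
    out[i] += m
# out[i] now holds the sum of node i's neighbours' features
```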
A few lines of code
Should you use PyG?
Contender: Deep Graph Library (DGL)
Both were accepted at the ICLR 2019 RLGM workshop (this week!)
At the time of writing, PyG is much faster than DGL (up to 15 times faster!)
But…
Fused message passing
Blog post: https://www.dgl.ai/blog/2019/05/04/kernel.html
Standard message passing does not scale to large graphs: messages are
explicitly materialized and stored, one per edge. Fused message passing computes
the aggregation inside a single kernel, without materializing the messages