
MLNS Project Report

Alexis-Raja Brachet, Louis Chirol, Sophia Chirrane and Tanguy Olympie


CentraleSupélec

ABSTRACT
This project aims to benchmark various machine learning and deep learning algorithms for graph classification. Graph classification is a fundamental problem in graph analysis, with numerous applications in various domains, including bioinformatics, social networks, and computer vision. The goal of this project is to evaluate the performance of state-of-the-art ML and deep learning algorithms for graph classification on the COLLAB dataset. The project involves the implementation of various algorithms, including traditional ML algorithms such as Random Forests and Support Vector Machines, and deep learning algorithms such as Graph Convolutional Networks and Graph Attention Networks.

ACM Reference Format:
Alexis-Raja Brachet, Louis Chirol, Sophia Chirrane and Tanguy Olympie. 2023. MLNS Project Report. In Proceedings of ACM Conference (Conference'17). ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTION / MOTIVATION
Graph classification is a fundamental task in machine learning with a wide range of applications in domains such as bioinformatics, chemistry, social network analysis, and recommendation systems. It consists in assigning labels to a set of graphs based on their topological structures and properties. It is a challenging task, since graphs can have different sizes, shapes, and connectivity patterns, making it difficult to extract meaningful features and patterns.

The use of graph classification algorithms can provide several benefits, including the ability to identify similarities and differences among graphs, to cluster similar graphs together, and to predict the properties of new graphs based on their structural features. In bioinformatics, graph classification is used to predict protein functions and analyze gene expression data, while in chemistry it is used to predict the properties of molecules and design new drugs.

In recent years, Graph Neural Networks (GNNs) have emerged as a powerful tool for graph classification tasks. GNNs are neural networks that operate on graph-structured data, and they can capture complex dependencies and interactions between graph elements. GNNs have been shown to outperform traditional machine learning algorithms on several graph classification tasks, including molecule classification, social network analysis, and citation network classification.

GNNs were introduced almost 20 years ago. Building on the success of Convolutional Neural Networks (CNNs) on images, which are grid graphs, it was natural to extend such techniques to more complex graph structures such as proteins. These algorithms rely on a message-passing framework reminiscent of Recurrent Neural Networks (RNNs): each layer aggregates the information coming from the neighbors of each node, so that stacking $k$ layers lets each node receive messages from its $k$-hop neighborhood. One limitation of GNNs is clear: adding more and more layers degrades performance, as considering neighbors that are too far away averages out the messages passing through each node. More recently, newer techniques have been proposed to tackle generalisation and computational issues such as the degree variation between nodes or large-scale graphs. GraphSAGE samples a fixed-size set of neighbors and aggregates their representations when updating each node embedding, while Graph Attention Networks (GAT) handle variable-sized inputs by focusing on the most relevant parts of the input [8].

For this project, we compare the performance of a deep learning approach using GNNs with classical machine learning algorithms on the COLLAB dataset.

2 PROBLEM DEFINITION
The COLLAB dataset contains graphs representing scientific collaborations between authors of papers in the arXiv repository. The task is to classify each graph into one of three categories based on the subject area of the papers. Each graph represents the collaborations between authors in a specific research area: the nodes represent authors, and the edges represent co-authorship on a paper. The dataset consists of 5,000 graphs with a variable number of nodes and edges, depending on the number of authors and their collaboration patterns. The COLLAB dataset poses several challenges for graph classification, including varying graph sizes, the sparsity of the graphs, and the presence of noise and outliers.

Despite these challenges, the COLLAB dataset is interesting for graph classification because it reflects real-world collaboration patterns in scientific research. It can be used to explore various research questions, such as how collaboration patterns differ across research areas or how to predict the success of a research collaboration from the co-authorship network. Additionally, the COLLAB dataset provides an opportunity to evaluate the performance of graph classification algorithms in real-world scenarios. It has been used in various studies in the field of graph classification, and the results have led to insights on the effectiveness of different classification algorithms and features for this task.
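As a concrete illustration, the dataset can be loaded and inspected in a few lines. The sketch below assumes the DGL TUDataset loader for the collection [6]; exact class and attribute names may differ between library versions.

from dgl.data import TUDataset   # DGL loader for the TUDataset collection [6]

dataset = TUDataset(name="COLLAB")          # downloads the 5,000 graphs on first use
print(len(dataset), "graphs")

graph, label = dataset[0]                   # a DGLGraph and its class label (0, 1 or 2)
print(graph.num_nodes(), "nodes,", graph.num_edges(), "edges, class", int(label))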

Let us denote by $\mathcal{G} = (G_i)_{i \leq 5000}$ the graph collection, where each graph $G$ is associated with a label $l(G)$. The goal of this project is to learn a labelling function $\hat{l}$ such that:

$\hat{l} : G \in \mathcal{G} \mapsto \hat{l}(G) \in \{0, 1, 2\}$

We aim to maximize the accuracy of the labelling function on the test set:

$\mathrm{accuracy} = \dfrac{\#\{G \in \mathcal{G}_{test} : \hat{l}(G) = l(G)\}}{\#\mathcal{G}_{test}}$

3 RELATED WORK
The COLLAB graph dataset has been used extensively in the literature as a benchmark for evaluating graph classification algorithms. The state-of-the-art performance on this dataset is constantly evolving as new methods are proposed and new features are extracted, see Table 1.

Table 1: Accuracy results on the COLLAB dataset using different methods

Method        Accuracy (%)
GCN [4]       81.5
GAT [7]       83.0
AGNN [10]     83.0
SVM [11]      73.9
RWR [9]       75.1
GraRep [4]    78.9

Our project is at the same time replicating, different from and complementary to these works. Indeed, we base our architectures on the findings of both ML-based and GNN-based approaches. Mixing the two approaches is complementary to these related works, taking the best of each world. In addition, we explore a distance-based approach, which differs from the related works.

4 METHODOLOGY
To compare the performance of GNNs and classical machine learning algorithms, we follow the methodology below.

4.1 Data collection process
The COLLAB dataset is convenient to work with: a single line of code downloads the entire dataset from the TUDataset collection [6].

General Data Preprocessing. We preprocess the COLLAB dataset by converting it into a format that is compatible with both GNN and classical machine learning algorithms.
• For Machine Learning we work with the NetworkX Python library.
• For Graph Neural Networks we use the DGL (Deep Graph Library) and PyTorch Python libraries.
Some additional processing is performed for each method, such as feature extraction for the ML-based approach.

4.2 Data exploration
It is always worth spending some time on data exploration to better understand the data. For graph classification, useful information includes the number of nodes and edges, the average node degree, and the class distribution, over all the graphs and for each attribute. We summarize this information in Figure 1. It shows that many graphs have few nodes and edges, but that a significant number of them have many more edges, leaving a hole in the distribution of edges. The average node degree is 74, which is quite high, with some graphs having a much higher average degree. It is interesting to note that for a majority of graphs the diameter is only two. Finally, the classes are quite imbalanced, so it is worth examining the same information combined with the label, which is done in Figure 2. The number-of-nodes boxplots are not very informative as all the boxes severely overlap, but the number-of-edges boxplots show that graphs of class 1 have significantly fewer edges than the other classes, and that graphs of class 2 can have more. The boxes for the average node degree are also well separated. Graphs of class 1 never have a diameter of 1, while only a few graphs of class 0 do. As a whole, this short analysis highlights that basic features are quite informative about the classes, yet overlap too much to draw clear boundaries between them. Consequently, feeding such features and more advanced ones into classifiers could well produce a good classification, exploiting this between-class variance. Another noteworthy element is that the diameter of most graphs is no more than 2, which means that the graphs are highly connected.

4.3 Machine Learning approach
4.3.1 Features extraction. In order to use classical ML techniques, we extract features from each graph such as degree centrality, betweenness centrality, and the clustering coefficient. We also considered Graph Kernel techniques, as seen during the lectures, in order to measure graph similarity in a higher-dimensional Hilbert space.

4.3.2 Distance based method. Another approach is to use distance-based classification methods such as adapted KNN or K-means. These classic algorithms perform a classification task by comparing samples with respect to a defined distance (often the Euclidean distance on the feature space). We can therefore think of defining a handcrafted distance based on properties of the graphs in the dataset.

Finding an appropriate distance between graphs is not a trivial question. It is often solved using expert knowledge, i.e. defining a specific distance for a specific application using the information represented by the graph [2]. Such a distance does not necessarily take into account the whole graph, but only some specific properties: the number of cliques for community detection, the number of leaves for trees, the node degrees, etc.
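As an illustration, a handcrafted distance of this kind can simply compare a few global properties of the two graphs. The NetworkX sketch below uses an arbitrary feature set and no weighting; these choices are assumptions of the example, not the exact setting used in our experiments.

import numpy as np
import networkx as nx

def graph_profile(G: nx.Graph) -> np.ndarray:
    # A few global properties, in the spirit of Section 4.3.1 (arbitrary choice).
    return np.array([
        G.number_of_nodes(),
        G.number_of_edges(),
        nx.density(G),
        nx.average_clustering(G),
    ])

def handcrafted_distance(G1: nx.Graph, G2: nx.Graph) -> float:
    # Euclidean distance between the property vectors of the two graphs.
    return float(np.linalg.norm(graph_profile(G1) - graph_profile(G2)))

# Example on two random graphs
print(handcrafted_distance(nx.erdos_renyi_graph(30, 0.2), nx.erdos_renyi_graph(40, 0.1)))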

Figure 1: Number of nodes and edges distribution and label distribution of the COLLAB dataset. Red dashed lines indicate the
average.

Figure 2: Label-wise distributions for the number of nodes, edges, the average node degree and the diameter of the COLLAB
dataset.

Another approach, used in [5], is to define a distance based on an optimal transport framework, i.e. a distance derived from a graph alignment problem.

We studied different classic distances to compare graphs:

• Random Walks Distance (RWD) (Equation 1): the RWD is a distance measure based on random walks on graphs. The RWD between two graphs is defined as the sum of the similarities of their random walk distributions. Random walk distributions can be computed using graph Laplacians, and the similarity between distributions can be measured with various metrics, such as cosine similarity or KL divergence.
• Shortest path distance (SPD) (Equation 2): the shortest path distance measures the distance between two nodes in a graph as the length of the shortest path connecting them. It can be turned into a similarity measure between graphs by computing the average shortest path distance between nodes in each graph.
• Diameter distance (Equation 3): the diameter of a graph is the maximum distance between any two nodes in the graph, i.e. the length of the longest shortest path. It can be computed by finding the shortest path between all pairs of nodes and taking the maximum over all pairs.

The RWD is defined as:

$RWD(G_1, G_2) = \sum_i sim(p_i, q_i)$   (1)

where $G_1$ and $G_2$ are the two graphs being compared, $p_i$ and $q_i$ are the random walk distributions for node $i$ in each graph, and $sim(p_i, q_i)$ is a similarity measure between the two distributions, such as cosine similarity or KL divergence.

The shortest path distance between two nodes $u$ and $v$ in a graph is the length of the shortest path connecting them, denoted by $\delta(u, v)$. The average shortest path distance over all pairs of nodes in a graph $G$ is:

$SPD(G) = \dfrac{1}{n(n-1)} \sum_{u \in V} \sum_{v \in V, v \neq u} \delta(u, v)$   (2)

where $V$ is the set of nodes in the graph, $n$ is the number of nodes, and $\delta(u, v)$ is the length of the shortest path between nodes $u$ and $v$.

The diameter of an undirected graph $G$ is:

$diameter(G) = \max_{u, v \in V} d(u, v)$   (3)

where $V$ is the set of nodes of $G$ and $d(u, v)$ is the length of the shortest path between nodes $u$ and $v$ in $G$.
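These quantities can be computed directly with NetworkX. The sketch below restricts the computation to the largest connected component so that shortest paths are always defined (an assumption of the sketch, not a step described above), and turns SPD and the diameter into graph distances by comparing the per-graph values.

import networkx as nx

def _largest_cc(G: nx.Graph) -> nx.Graph:
    # Shortest paths are only defined inside a connected component.
    return G.subgraph(max(nx.connected_components(G), key=len))

def average_spd(G: nx.Graph) -> float:
    # Equation (2), computed on the largest connected component.
    return nx.average_shortest_path_length(_largest_cc(G))

def graph_diameter(G: nx.Graph) -> int:
    # Equation (3), computed on the largest connected component.
    return nx.diameter(_largest_cc(G))

def spd_distance(G1: nx.Graph, G2: nx.Graph) -> float:
    return abs(average_spd(G1) - average_spd(G2))

def diameter_distance(G1: nx.Graph, G2: nx.Graph) -> float:
    return abs(graph_diameter(G1) - graph_diameter(G2))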

The K-nearest neighbors (KNN) algorithm is a simple and effective method for classification. When using a graph distance for classification, the KNN algorithm works as follows:
• For each training graph, compute its distance to the test graph using the chosen graph distance measure.
• Select the K training graphs with the smallest distances to the test graph.
• Assign to the test graph the majority class label among the K nearest neighbors.
The value of K is a hyperparameter that can be tuned to optimize the classification performance. Additionally, the choice of the graph distance measure can also have a significant impact on the classification performance.

4.4 Graph Neural Network approach
For the GNN implementation, we use the DGL library to build our networks. The data is processed into a dataloader with batches. The goal here is to implement a classifier fully based on neural networks, from the graph representation to the classification. To build the representation, we implement:
• the GraphSAGE architecture
• the GAT architecture
to produce the node embeddings. We then average them and feed the resulting embedding into a multi-layer perceptron.

GraphSAGE was introduced in 2017 [3]. At each step $k$, the embedding of node $v$, denoted $h_v^k$, is computed as follows:
• A set of neighbors $\mathcal{N}(v)$ is sampled from all the neighbors of $v$, and an aggregation function is applied on this subset:

$h_{\mathcal{N}(v)}^k = \mathrm{AGGREGATE}_k(\{h_u^{k-1}, \forall u \in \mathcal{N}(v)\})$

• The previous embedding $h_v^{k-1}$ and $h_{\mathcal{N}(v)}^k$ are then fed into a fully connected layer to compute $h_v^k$:

$h_v^k = \sigma(W^k \cdot \mathrm{CONCAT}(h_v^{k-1}, h_{\mathcal{N}(v)}^k))$

This approach allows GraphSAGE to scale efficiently to large graphs while maintaining the ability to capture complex structural patterns in the data.

GAT was introduced in 2018 [8]. It takes advantage of the attention mechanism developed in Natural Language Processing in 2015 [1]. At each step, the embedding $h_i'$ is the concatenation of the outputs of several attention heads, each of which computes a weighted sum over the neighbors of node $i$. For one attention head, we have:

$h_i' = \sigma\big(\sum_{j \in \mathcal{N}_i} \alpha_{ij} W h_j\big)$

where

$\alpha_{ij} = \dfrac{\exp(\mathrm{LeakyReLU}(a^T [W h_i \,\|\, W h_j]))}{\sum_{k \in \mathcal{N}_i} \exp(\mathrm{LeakyReLU}(a^T [W h_i \,\|\, W h_k]))}$

and $a$ and $W$ are learnable parameters. Finally, the heads are concatenated to produce the updated embedding of node $i$:

$h_i' = \mathrm{CONCAT}_k\big(\sigma\big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^k W^k h_j\big)\big)$

The GAT architecture is nowadays the state of the art for many GNN problems, thanks to its use of the attention mechanism.

It is well known that deep GNNs tend to average node representations, a phenomenon called oversmoothing, which is harmful for node classification or link prediction. However, in our case we want to capture graph-level properties, so a deeper network could be beneficial. This direction has not been explored here.

The dataset has been split in 3 parts:
• a train set on which the model is trained, representing 64% of the dataset;
• a validation set to control overfitting, representing 16% of the dataset;
• a test set on which the performance is evaluated, representing 20% of the dataset.
All the dataloaders have been weighted in order to tackle the class imbalance of the dataset.

The final hyperparameters we chose for each architecture are the following:

For GraphSAGE, a 2-layer SAGE embedding model with 128 neurons in each layer and batch-norm layers in between, with a learning rate of $10^{-3}$. Each SAGE layer uses the pooling aggregation function.

For GAT, 2 GAT layers with 4 attention heads and 128 neurons each, with a learning rate of $10^{-3}$.

For both architectures, the input embedding of every node was at first a vector containing the graph-level features defined in Section 5.1. These features are then concatenated with the output of the GNN and fed to a 3-layer MLP for classification, in which the number of neurons is divided by 2 from one layer to the next. A softmax is then applied to compute the probability over the 3 possible classes.

At the end of the project, we started to consider as input the embeddings given by the Node2Vec algorithm. Unfortunately, we did not have time to finish these computations due to the time limit, non-optimized functions and no GPUs remaining. The first results obtained with this initialization were much higher and usable, so we would have followed this path.

The Node2Vec algorithm is inspired by the NLP domain. Multiple random walks are performed from each node of the graph, and an embedding is computed such that neighbors, i.e. nodes occurring in the same walk, have similar embeddings. As a consequence, it can capture the structure of the graph and is much more expressive than a random initialization.

Results and discussion are reported in Section 5.3.
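For reference, below is a minimal sketch of the GraphSAGE classifier described above, written with DGL and PyTorch. The layer sizes follow the hyperparameters given above, while the readout and the exact way the graph-level features are concatenated into the MLP are simplifications of the example, not necessarily our exact implementation.

import torch
import torch.nn as nn
import dgl
from dgl.nn import SAGEConv

class SAGEClassifier(nn.Module):
    # Two SAGE layers (pooling aggregator, 128 units) with batch-norm in between,
    # followed by a 3-layer MLP whose width is halved at each layer.
    def __init__(self, in_feats: int, hidden: int = 128, n_classes: int = 3):
        super().__init__()
        self.conv1 = SAGEConv(in_feats, hidden, aggregator_type="pool")
        self.bn = nn.BatchNorm1d(hidden)
        self.conv2 = SAGEConv(hidden, hidden, aggregator_type="pool")
        self.mlp = nn.Sequential(
            nn.Linear(hidden + in_feats, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, hidden // 4), nn.ReLU(),
            nn.Linear(hidden // 4, n_classes),
        )

    def forward(self, g: dgl.DGLGraph, feats: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.conv1(g, feats))
        h = torch.relu(self.conv2(g, self.bn(h)))
        with g.local_scope():
            g.ndata["h"], g.ndata["x"] = h, feats
            hg = dgl.mean_nodes(g, "h")      # average of the node embeddings
            xg = dgl.mean_nodes(g, "x")      # graph-level input features
        return self.mlp(torch.cat([hg, xg], dim=1))  # logits; softmax applied in the loss

The GAT variant follows the same skeleton, with the SAGEConv layers replaced by dgl.nn.GATConv layers using 4 attention heads whose outputs are concatenated.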

4.5 Hybrid approach
Finally, we could design a new method that combines machine learning and deep learning for the classification task. The deep learning approach would be used to automate representation learning, in order to select a relevant graph representation. This learned representation could then be used as the input of a machine learning approach. The GNN would be trained with a specific loss based on the performance of the ML classifier.

We also thought about learning graph representations with a GNN through self-supervised learning before feeding a ML classifier, which is a hot topic in the modern literature and would have been fun to experiment with in this project. But our lack of experience and the numerous design choices to make (self-supervised task definition, dataset generation, amount of data) led us to abandon this approach.

5 EVALUATION

5.1 ML method
The exact list of features used is:
• diameter
• clustering coefficient
• mean number of edges
• total number of edges
• number of nodes
• mean of the nodes' average degree connectivity
• standard deviation of the nodes' average degree connectivity
• average shortest path length

Three classical machine learning classifiers were trained: a logistic regression classifier, a Random Forests algorithm and a Support Vector Machine algorithm.

Table 2: Balanced Accuracy (%) on the COLLAB dataset using different Machine Learning algorithms

Classifier              Balanced Accuracy (%)
Logistic regression     69
Random Forests          77
SVC                     68

The best performing of the three is the Random Forest algorithm, with a balanced accuracy of 77%. We can extract the feature importances, reported in Table 3.

Table 3: Importance of the graph features for the Random Forests decisions

Feature                                                         Importance
number of edges                                                 .14
number of nodes                                                 .056
average number of edges                                         .211
clustering coefficient                                          .120
density                                                         .099
diameter                                                        .001
maximum number of connected components                          .058
average shortest path length                                    .106
mean of the nodes' average degree connectivity                  .101
standard deviation of the nodes' average degree connectivity    .101

The importance of the features seems rather well distributed, as most of them have a non-negligible impact on the result. The most important feature is the average number of edges, while the least important are the number of nodes, the diameter and the density. These simple algorithms based on feature extraction can be used as a simple baseline for further models. Even if the classical models take very little time to train, the feature extraction in itself is quite costly. The results are slightly below those of the state of the art reported in Table 1.

5.2 Distance based method
As the sizes of the graphs in the dataset differ and can be quite large, the choice of distance has a strong influence on the computation time. The three studied distances were chosen for their rather fast computation.

To evaluate the distance-based approach, we decided to study only a subset of the whole dataset due to computational constraints. We therefore created sub-datasets by taking x samples per class.

Table 4: Balanced Accuracy (%) on the COLLAB dataset using different distance-based methods

Metric      k (kNN)    50 graphs/class    100 graphs/class
RWD         3          26.8               29.8
SPD         3          33.3               33.3
Diameter    3          40.7               45.10
RWD         10         36.1               30.68
SPD         10         33.3               33.33
Diameter    10         37.96              35.27

Table 4 summarizes the obtained results. They show that the choice of the distance is crucial and leads to significantly different results. The distance comparing the diameters of the graphs performs better than the others in every setup. However, this distance seems to reach a maximum accuracy with respect to the number of neighbors used for classification when we add data to the training set. For instance, with 3 neighbors the balanced accuracy increases when going from 50 to 100 graphs per class, but when increasing further to 200 graphs/class the accuracy drops to 42.16 (Figure 3). The same phenomenon is observed with 10 neighbors.
Furthermore, this distance is the only one whose performance does not increase when the number of neighbors increases. The SPD distance does not perform better than a random assignment. We can take a closer look at what happens when using this distance.
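Before looking at the confusion matrices, the evaluation protocol of this section (balanced sub-datasets, k-NN on a precomputed distance matrix, balanced accuracy) can be sketched as follows; the choice of the scikit-learn estimator is an assumption of the sketch.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import balanced_accuracy_score

def knn_balanced_accuracy(train_graphs, y_train, test_graphs, y_test, dist, k=3):
    # Pairwise distances with any of the graph distances defined in Section 4.3.2
    # (e.g. diameter_distance or spd_distance).
    D_train = np.array([[dist(a, b) for b in train_graphs] for a in train_graphs])
    D_test = np.array([[dist(a, b) for b in train_graphs] for a in test_graphs])
    knn = KNeighborsClassifier(n_neighbors=k, metric="precomputed")
    knn.fit(D_train, y_train)
    return balanced_accuracy_score(y_test, knn.predict(D_test))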

Figure 3: Confusion matrix when using the diameter distance for a dataset containing 200 graphs per class and 3 neighbors

Figure 4: Result using the SPD distance

On Figure 4 we see that every graph is classified as belonging to the second class.

Overall, even though this approach is highly interpretable, it does not lead to very good results. Moreover, the choice of the distance is crucial both in terms of performance and in terms of computational complexity.

The diameter of the graphs in the COLLAB dataset seems to play a key role in classifying them.

5.3 GNN method

Figure 5: Learning curves of the GraphSAGE architecture over 50 epochs showing the prediction issue

Figure 6: Learning curves of the GAT architecture for 1 run showing the prediction issue and a different local minimum

Figure 7: Learning curves of the SAGE architecture for 1 run with Node2Vec initialization over 12 epochs, showing promising performance

Despite playing with the hyperparameters of each architecture, we found that the validation accuracy would most of the time get stuck in a local minimum corresponding to predicting the same label for every graph, namely the most represented label in the dataset. Implementing a learning-rate scheduler did not solve the problem. We can think that the model is stuck in a large local minimum.
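For reference, one way the class-weighted dataloaders and the learning-rate scheduler mentioned above could be set up is sketched below; the specific sampler and scheduler choices are assumptions, not necessarily the exact ones we used.

import torch
from torch.utils.data import WeightedRandomSampler
from dgl.dataloading import GraphDataLoader

def weighted_training_setup(train_set, model, batch_size=32):
    # Sample rarer classes more often so that every batch is roughly balanced.
    labels = torch.tensor([int(label) for _, label in train_set])
    class_counts = torch.bincount(labels)
    sample_weights = (1.0 / class_counts.float())[labels]
    sampler = WeightedRandomSampler(sample_weights, num_samples=len(train_set), replacement=True)
    loader = GraphDataLoader(train_set, sampler=sampler, batch_size=batch_size)

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Reduce the learning rate when the validation accuracy stops improving;
    # call scheduler.step(val_accuracy) at the end of each epoch.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="max", patience=5)
    return loader, optimizer, scheduler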

Table 5: Best accuracy (%) on the COLLAB dataset using different GNN architectures and feature initializations

Method       Feature initialization    Train acc    Val acc    Test acc
GraphSAGE    graph features            67           53         53
GAT          graph features            67           53         53
GraphSAGE    Node2Vec                  69           62         ...
GAT          Node2Vec                  65           62         ...

However, setting learning rates from small ($10^{-6}$) to large ($10^{-1}$) did not change anything. As the accuracy on the train set is increasing, the model is learning something. Figure 5 illustrates this quite well: the accuracy on the validation set is constant along the 50 training epochs while the accuracy on the training set keeps increasing.

At the end of our work, we started to implement the Node2Vec embedding method in order to initialize the embeddings fed into the GNNs. The predictions were no longer constant and the accuracy was significantly higher. GraphSAGE with Node2Vec initialization obtained the best results during training. This direction is promising and should be explored in depth. The parameters of the code were the ones producing the Node2Vec-initialization scores in Table 5.

5.4 Hybrid method
The hybrid method consists in training a GNN to learn relevant graph features, and then using these features with a different type of classifier, for instance a Machine Learning classifier. The choice of ML classifier is broad: we could think of a Random Forest, a Gradient Boosting model like XGBoost, or an SVM model. It would also be easy to concatenate more classical graph features to the GNN features, like the ones used in the fully ML part above.

We were unfortunately unable to test this approach, even though it was implemented. As we wanted to use it on top of our best GNN, it took some time before getting an interesting model, and we did not have enough time to exploit it. The original idea was to compare the results of Random Forest, XGBoost and SVC classifiers on top of the GNN, on the basis of their F1-scores and their confusion matrices.
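A minimal sketch of this intended pipeline is given below; the embed method and the classifier settings are hypothetical placeholders for our trained GNN and the classifiers mentioned above.

import numpy as np
import torch
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

@torch.no_grad()
def graph_embeddings(gnn, loader):
    # Reuse the trained GNN as a frozen feature extractor.
    feats, labels = [], []
    for batched_graph, y in loader:
        h = gnn.embed(batched_graph)          # hypothetical graph-level embedding method
        feats.append(h.cpu().numpy())
        labels.append(y.numpy())
    return np.vstack(feats), np.concatenate(labels)

def hybrid_f1(gnn, train_loader, test_loader):
    X_train, y_train = graph_embeddings(gnn, train_loader)
    X_test, y_test = graph_embeddings(gnn, test_loader)
    clf = RandomForestClassifier(n_estimators=300).fit(X_train, y_train)
    return f1_score(y_test, clf.predict(X_test), average="macro")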

6 CONCLUSION
In this work, we performed an extensive study of the graph classification task on the COLLAB dataset.

We first experimented with classical network science, with distance-based methods suited for this task. They proved computationally expensive and inefficient (45% accuracy at most), as the results were worse than a constant prediction of the most populated class (≃ 53% of class 0).

Thus, we moved toward Machine Learning methods, relying on supervised learning of the labels from classical graph features. These attempts gave better results, with a 77% accuracy for a Random Forest classifier.

Finally, we tried to explore state-of-the-art methods through Graph Neural Networks. This approach required a lot of tuning and tricks to finally yield interesting results, despite the low number of classes in the dataset (only three). Yet, the method is once again slow and could not be extended to other approaches, such as a hybrid GNN/ML classifier, or a self-supervised training of the GNN for learning the feature extraction.

It appeared to us that, despite looking similar to other classification tasks, graph classification may prove tricky, as it requires exploiting a lot of network science knowledge, and a lot of effort to extract relevant information from a graph without prior node or edge features in order to classify it.

REFERENCES
[1] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. In Yoshua Bengio and Yann LeCun, editors, 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
[2] C. Colijn and G. Plazzotta. A metric on phylogenetic tree shapes. Systematic Biology, 67(1):113–126, 2017.
[3] William L. Hamilton, Rex Ying, and Jure Leskovec. Inductive representation learning on large graphs. CoRR, abs/1706.02216, 2017.
[4] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. Neural Networks, 61:147–160, 2017.
[5] Hermina Petric Maretic, Mireille El Gheche, Matthias Minder, Giovanni Chierchia, and Pascal Frossard. Wasserstein-based graph alignment. CoRR, abs/2003.06048, 2020.
[6] Christopher Morris, Nils M. Kriege, Franka Bause, Kristian Kersting, Petra Mutzel, and Marion Neumann. TUDataset: A collection of benchmark datasets for learning with graphs. CoRR, abs/2007.08663, 2020.
[7] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In Proceedings of the 6th International Conference on Learning Representations, 2018.
[8] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks, 2017.
[9] Ulrike von Luxburg. A comparison of spectral clustering algorithms. In Advances in Neural Information Processing Systems, pages 857–864, 2007.
[10] Yufei Zhao, Leman Akoglu, and Hanghang Tong. Attributed graph neural networks for graph classification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 4054–4061, 2019.
[11] Dengyong Zhou, Jiayuan Huang, and Bernhard Schölkopf. Learning with local and global consistency. In Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 2, volume 2, pages 321–328. IEEE, 2004.
