Complex Networks and their Applications
Edited by
Hocine Cherifi
This book first published 2014
Cambridge Scholars Publishing
12 Back Chapman Street, Newcastle upon Tyne, NE6 2XX, UK
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
Copyright © 2014 by Hocine Cherifi and contributors
All rights for this book reserved. No part of this book may be reproduced, stored in a retrieval system,
or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or
otherwise, without the prior permission of the copyright owner.
ISBN (10): 1-4438-5370-4, ISBN (13): 978-1-4438-5370-5
TABLE OF CONTENTS
Preface
Chapter One
Disentangling Spatial and Non-spatial Effects in Real Complex Networks
Tiziano Squartini, Francesco Picciolo, Franco Ruzzenenti, Riccardo
Basosi and Diego Garlaschelli
Chapter Two
Online and Offline Sociality: A Multidimensional Complex Network
Approach
Matteo Zignani, Sabrina Gaito and Gian Paolo Rossi
Chapter Three
Multi-Ego-Centered Communities
Maximilien Danisch, Jean-Loup Guillaume and Bénédicte Le Grand
Chapter Four
Complex Networks in Scientometrics
Adam Matusiak and Mikolaj Morzy
Chapter Five
Rumor Dynamics and Inoculation of Nodes in Complex Networks
Anurag Singh and Yatindra Nath Singh
Chapter Six
Exploratory Network Analysis: Visualization and Interaction
Sébastien Heymann and Bénédicte Le Grand
Chapter Seven
The Composite Centrality Framework
Andreas Joseph and Guanrong Chen
Chapter Eight
Complex Networks and Epidemiology
Marco Alberto Javarone and Giuliano Armano
Chapter Nine
Building Social Networks in Online Chats with Users, Agents and Bots
Vladimir Gligorijević, Milovan Šuvakov and Bosiljka Tadić
Chapter Ten
Complex Networks and Web Services
Chantal Cherifi
Chapter Eleven
Non-Overlapping Community Detection
Hocine Cherifi
Contributors
PREFACE
Complex network theory is an emerging multidisciplinary field of research
that is spreading to many disciplines such as physics, engineering, biology,
sociology and economics. The common feature of many systems
encountered in these different scientific fields is that they can be represented
as a graph with the nodes representing a set of individual entities and the
links standing for the interactions between these entities. Regardless of
their physical nature, complex networks share some common structural
properties that distinguish them from purely random graphs. Inspired by
the study of real-world systems rather than by theory and fuelled by the
availability of large datasets and computing power, research on complex
networks is booming. The primary goal of this book is to provide an
overview of the multiple aspects of this fast-growing research area. It
contains eleven chapters presenting a wide spectrum of recent developments
with emphasis on theory and applications in the field. Although this book
is a collection of independent studies, it represents a cohesive work that
provides the reader with an up-to-date picture of the state of the field.
Collectively, these contributions highlight the impact of complex network
theory on a variety of scientific disciplines. This book does not solely
reflect the opinion of the author. Instead, it expresses the views of 25
researchers working in well-known universities and research institutions
throughout the world. The readers of this book are expected to be involved
in a range of interdisciplinary studies. With this aim in mind, care was
taken to make it as readable as possible to newcomers. I am honored to
bring you this book, which was generated by the contributions and
discussions held at the Workshop on Complex Networks and their
Applications. I would like to thank the contributors of the different
chapters for their constructive effort. I hope that “Complex Networks and
their Applications” will be useful to a large audience of experts and
graduate students and that it will stimulate important developments in this
exciting research area.
Hocine Cherifi
CHAPTER ONE
DISENTANGLING SPATIAL AND NON-SPATIAL
EFFECTS IN REAL NETWORKS
TIZIANO SQUARTINI, FRANCESCO PICCIOLO,
FRANCO RUZZENENTI, RICCARDO BASOSI
AND DIEGO GARLASCHELLI
Over the last fifteen years, Network Science has facilitated the
identification of universal and unexpected patterns across systems
belonging to deeply different research fields, such as biology, economics
and physics (Caldarelli 2007). A fruitful cross-fertilization among these
disciplines, leading to the introduction of novel multidisciplinary tools, has
been made possible by the fact that many real complex systems can be
formally abstracted as networks or graphs, irrespective of their specific
nature. In so doing, several details of the original system are discarded and
the emphasis is put on the study of the topological properties of the
underlying ‘network backbone’ (Caldarelli 2007; Barrat Barthelemy and
Vespignani 2008; West Brown and Enquist 1997, 1999, 2001). While this
process facilitates the detection of key structural properties in real complex
systems, it can also obscure other important levels of organization that
involve non-topological factors. A key example is the spatial organization
of networks (Barthelemy 2011).
Many real networks lie embedded in a metric space, i.e. a space where
distances between nodes can be properly defined. In such cases, besides
their connectivity, vertices can be identified by additional parameters,
definable as coordinates, measuring their position and allowing the
quantification of their mutual “proximity”. We will refer to these networks
as embedded networks. Embedded networks represent an important subset
of real networks: transportation systems, electric power grids, wireless
communication networks and the Internet (i.e. the net of physical
connections between servers) are only a few examples of systems embedded
in a two-dimensional metric space (Barthelemy 2003, 2011; Emmerich et
al. 2012; Woolley-Meza et al. 2011).
Social networks, e.g. those represented by friendship or sexual
relations among individuals, are also shaped by the proximity of the nodes
in a two-dimensional space (even if the World Wide Web is challenging
our traditional way of establishing social relations, it is still far more
common to have a higher number of friends in the same city or country
than in a distant one). Other examples, such as neural networks and protein
networks, can instead be considered as occupying a three-dimensional
metric space (Emmerich et al. 2012).
The range of applications can be even further extended to networks
that are not necessarily embedded in a physical or geographic space, by
noticing that the concept of metrics allows us to study configurations lying
in abstract (e.g. cultural, economic or temporal) spaces, where distances
are defined accordingly (Axelrod 1997; Aiello et al. 2012; Starnini et al.
2012; Valori et al. 2012). For instance, networks of protein configurations
linked by saddle-points in a properly defined energy landscape are
examples of networks embedded in high-dimensional configuration spaces.
In all these examples, both vertex-specific and global spatial dependencies
affect the dynamics of the network (Böde 2007). Thus, in order to deepen
our understanding of the mechanisms shaping real networks and ruling
their evolution, an unavoidable step is to take spatial properties into
account as well (Bettencour et al. 2007; Bejan and Lorente 2010;
Emmerich et al. 2012).
Unfortunately, while many theoretical models have already been
introduced in order to artificially generate networks shaped by a
combination of spatial and non-spatial factors, it is still much more
difficult to disentangle these two effects in real networks (Bradde et al.
2010; Barthelemy 2011; Picciolo et al. 2012).
Two main obstacles are encountered. First, most approaches require
the introduction of a mathematical model where the functional dependence
of network properties on distances is postulated a priori and thus
arbitrarily (Duenas and Fagiolo 2011; Anderson and Yotov 2012). Second,
it is very difficult to filter out a spurious or apparent component of spatial
effects which is instead due to other non-spatial factors. For instance, hubs
(vertices with many connections) are generally connected to several other
nodes irrespective of the positions of the latter, simply because they are
highly connected. This effect would generally appear as a local lack of
spatial dependence, spuriously lowering any global measure of spatial
effects, even if the overall network formation process were instead
distance-driven. Conversely, pairs of hubs (vertices with many connections)
tend to contribute to an overestimation of spatial factors, since they are
typically connected to each other even in networks where distances play
no role. The distance between pairs of hubs would then incorrectly appear
as a preferred spatial scale for connectivity, biasing again the
interpretation of the results.
The above considerations clarify that, in order to disentangle spatial
and non-spatial effects in real networks, any satisfactory approach should
be able to control for two potentially misleading factors. First, it should
control for the mathematical arbitrariness a priori associated with the
definition of any proxy of spatial dependence. Second, it should control
for the effects of non-spatial topological constraints inducing a spurious
spatial dependence a posteriori, given the characteristics of the particular
real network considered.
In this chapter, we describe how these two important prescriptions can
be implemented into the definition of a general method that we have
recently introduced (Ruzzenenti et al. 2012; Picciolo et al. 2012).
The method is based on the idea that, given any definition of spatial
effects, the relevant information is not given by the measured value itself.
A comparison is needed with the corresponding expected value under a
suitable null model that preserves the non-spatial properties of the real
network. This comparison removes the mathematical arbitrariness of the
adopted definition, and the fact that the null model controls for non-spatial
effects also removes the undesired effects of the latter. Moreover, by
focusing on both global (network-wide) and local (vertex-specific)
quantities, this method allows us to isolate the (potentially conflicting)
contributions of individual nodes to the overall spatial effects.
We will describe our method in detail by considering its application to
a particular embedded network, namely the World Trade Web (WTW)
defined as the network of international import-export trade relationships
between world countries. Our choice is driven by the fact that both spatial
effects (e.g. geographic distances between countries) and non-spatial
effects (e.g. the countries’ Gross Domestic Products) are known to shape
the structure of this network (Ruzzenenti et al. 2012; Picciolo et al. 2012).
For this reason, the WTW is the ideal candidate not only to illustrate our
method, but also to compare the results with a different class of spatial
models known in the economic literature as Gravity Models (Tinbergen
1962; Linders Martijn and Van Oort 2008; Fagiolo 2010; Duenas and
Fagiolo 2012; Squartini and Garlaschelli 2013).
As the name itself suggests, Gravity Models aim at predicting the
yearly intensity of the total trade exchanges between any two countries by
adopting the same functional form of Newton’s gravitational potential.
The predicted intensity is proportional to the countries’ GDPs (calculated
in the same year as the trade exchanges) and inversely proportional to the
countries’ geographic distance (Tinbergen 1962; Linders Martijn and Van
Oort 2008). Our results show that the effects of geographic distances on
the WTW are much more complicated than what is generally learnt from
the use of Gravity Models.
The remainder of the chapter is structured in three main sections and a
final concluding one. In the first section, we consider the case of binary
networks, where pairs of vertices are either connected or not connected.
We initially define some preliminary quantities that will be our “target”
measures of spatial effects. Then we introduce suitable null models where
the role of distances is, in some sense, switched off. Finally we calculate
some “integrated” quantities defined as combinations of observed and
expected values. This allows us to assess whether spatial effects are
present or not (both locally and globally), given a target quantity used as a
proxy. In the second section, we extend our approach to weighted
networks, where links can have different intensities. Taken together, the
results of the first two sections reveal that spatial effects are clearly present
in the WTW but vary considerably over time, for different countries, and
for different (binary or weighted) representations of the network. In the
third section, we extend our formalism in order to compare the magnitude of
spatial effects with that of other factors shaping the network. We find that
geographic distances are comparatively much less important than non-spatial
properties such as the reciprocity of the network. We conclude that
the role of distances in the WTW, in both absolute and relative terms, is
very different from what is generally thought.
Spatial effects in binary networks
In order to disentangle spatial and non-spatial effects in real embedded
networks, the first step is to define quantities which measure how a
network “feels” its embedding space. For illustrative purposes, in Fig. 1
we show two extreme ways in which this can happen. In both panels,
nodes represent the capitals of the countries adhering to the European
Union and distances between nodes are proportional to the geographic
distances between the EU capital cities. In the top panel, links are
established between the geographically closest pairs of countries,
originating a “spatially polarized” (or shrunk) configuration. In the bottom
panel, the same number of links is instead drawn between the most distant
pairs of countries, originating a “spatially diluted” (or stretched) configuration.
In the following subsections, we define quantities that can properly
distinguish between these extremes and also capture any intermediate
configuration.
Fig. 1. Two examples of a hypothetical EU27 trading network (N=27). The black
dots correspond to the geographic positions of the capital cities. For a given
number of links (here arbitrarily chosen to be L=27), the figure represents the
maximally shrunk network (top) and the maximally stretched network (bottom).
The filling coefficient that we introduce later takes the values $f = 0$ (top) and
$f = 1$ (bottom) for these two extreme configurations, and $0 < f < 1$ in
intermediate cases.
A global measure
A binary, directed graph is specified by an $N \times N$ adjacency matrix, $A$,
where $N$ is the number of nodes and the generic entry $a_{ij}$ is 1 when there
is a connection from node $i$ to node $j$, and 0 otherwise. The simplest
definition of a global measure incorporating distances and network
structure is
$$F = \sum_{i} \sum_{j} a_{ij} d_{ij}$$
where $d_{ij}$ is the generic entry of the matrix of distances, $D$, among nodes
(Ruzzenenti et al. 2012). Since we will consider networks without self-loops
(i.e. $a_{ii} = 0$), $F$ is a measure of the total distance between different,
topologically connected pairs of nodes. Equivalently, $F$ can be seen as a
measure of the extent to which the network “fills” the available space.
The quantity $F$ reaches its minimum when the links are placed between
the closest vertices. Formally speaking, if we consider the list
$V^{\uparrow} = (d_1^{\uparrow}, \dots, d_n^{\uparrow}, \dots, d_{N(N-1)}^{\uparrow})$
of all non-diagonal elements of $D$ ordered from
the smallest to the largest ($d_n^{\uparrow} \le d_{n+1}^{\uparrow}$), the minimum value of $F$ is
simply given by $F_{\min} = \sum_{n=1}^{L} d_n^{\uparrow}$, where $L = \sum_{i} \sum_{j} a_{ij}$ is the number
of links in the network. Similarly, the maximum value of $F$ is reached
when links are placed between the spatially farthest nodes. Considering
the list
$V^{\downarrow} = (d_1^{\downarrow}, \dots, d_n^{\downarrow}, \dots, d_{N(N-1)}^{\downarrow})$
of distances in decreasing order
($d_n^{\downarrow} \ge d_{n+1}^{\downarrow}$), the maximum value of $F$ for a network with $N$ vertices is
$F_{\max} = \sum_{n=1}^{L} d_n^{\downarrow}$.
In order to compare, and possibly rank, different networks according to
their values of $F$, a normalized quantity should be used. An improved
global definition, which we will denote as filling coefficient, is
$$f = \frac{F - F_{\min}}{F_{\max} - F_{\min}} = \frac{\sum_{i} \sum_{j} a_{ij} d_{ij} - F_{\min}}{F_{\max} - F_{\min}}$$
where $0 \le f \le 1$. For the maximally shrunk and maximally stretched
configurations shown in Fig. 1, the filling coefficient takes the values
$f = 0$ and $f = 1$ respectively. Depending on the chosen links’
disposition, different choices of the two values $F_{\min}$ and $F_{\max}$ can be
made. As an example, in Fig. 2 we show how the extreme values of $F$ (or
equivalently $f$) can change if different or additional constraints, besides the
total number of links, are enforced on the network topology (e.g. imposing
only one outgoing link for each vertex). So, in principle, $F_{\min}$ and $F_{\max}$
can be arbitrarily tuned to fit the best scenario for the network under
consideration.
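As an illustration, the global quantities defined above can be computed in a few lines. The following is a minimal sketch, not the authors' implementation: the function name `filling_coefficient` and the toy layout (four nodes on a line) are assumptions made here for the example. It takes the extreme values as the sums of the L smallest and L largest off-diagonal distances, as described in the text.

```python
def filling_coefficient(adjacency, distances):
    """Return (F, F_min, F_max, f) for a binary directed graph without self-loops.

    adjacency and distances are square lists of lists; this helper is
    illustrative and assumes 0 < L < N(N-1) and non-identical distances.
    """
    n = len(adjacency)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    # Total distance spanned by the realized links: F = sum_ij a_ij * d_ij
    total = sum(adjacency[i][j] * distances[i][j] for i, j in pairs)
    n_links = sum(adjacency[i][j] for i, j in pairs)
    # All off-diagonal distances, smallest first
    ordered = sorted(distances[i][j] for i, j in pairs)
    f_min = sum(ordered[:n_links])                      # L smallest distances
    f_max = sum(ordered[len(ordered) - n_links:])       # L largest distances
    return total, f_min, f_max, (total - f_min) / (f_max - f_min)


# Toy example: four nodes placed at positions 0, 1, 2, 3 on a line.
positions = [0, 1, 2, 3]
dist = [[abs(a - b) for b in positions] for a in positions]
shrunk = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]     # links 0<->1
stretched = [[0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0]]  # links 0<->3
print(filling_coefficient(shrunk, dist))     # f = 0 for the maximally shrunk case
print(filling_coefficient(stretched, dist))  # f = 1 for the maximally stretched case
```

With two links, the shrunk configuration uses the two shortest distances and the stretched one the two longest, so the sketch reproduces the two extreme values of the filling coefficient.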
In the next section we will present a general method to disentangle
spatial and non-spatial effects concurring to shape embedded networks and
we will compare them with the existing ones.
The filling coefficient allows us to quantify the tendency of embedded networks to
fill the metric space they are in. However, the interplay between spatial and
non-spatial effects in shaping a network topology can be unambiguously
quantified only after defining a proper reference model with which
to compare the observed value of $f$. The aim of such a model is to discount
spurious non-spatial effects as much as possible, letting the genuine
distance-induced ones emerge (Squartini and Garlaschelli 2011).
The reference models we will define in what follows are probabilistic
in nature and known as null models. The methodology underlying a null
model prescribes retaining only a portion of the information available in the
observed network and testing how effective it is in explaining the rest of
the (unconstrained) topology (Shannon 1948; Jaynes 1957; Holland and
Leinhardt 1975; Wasserman and Faust 1994; Maslov and Sneppen 2002;
Park and Newman 2004; Garlaschelli and Loffredo 2008; Squartini and
Garlaschelli 2011). The effectiveness of the chosen set of constraints will
be also tested over time, by analyzing different temporal snapshots of the
same network. In so doing the presence of statistically significant trends
through time can be highlighted.
Non-spatial null models
As previously mentioned, the aim of this comparison is to discount
apparent or spurious spatial effects due to non-spatial factors. For this
reason, we need to introduce space-neutral models that play the role of
null models.
Fig. 2. Two more examples of a hypothetical EU27 trading network (N=27). The
figure represents the maximally shrunk (top) and the maximally stretched (bottom)
network, under the constraint that each vertex has at most one out-going link. The
values of the filling coefficient are now = . (top) and = . (bottom).
The effectiveness of the encoded information can also be tested over
time, by analyzing different temporal snapshots of the same network, thus
highlighting the presence of statistically significant trends.
Our null models are statistical ensembles of graphs with specified
properties, or constraints. A graph ensemble, ℵ, is a collection of graphs.
For our purposes, we identify ℵ as the so-called “grand-canonical”
ensemble of binary directed networks, i.e. all the networks with a given
number of nodes, $N$, and a number of links varying from 0 to $N(N-1)$.
We want to construct a probability measure $P(G)$, associated to each
graph $G$ of this ensemble, that allows us to realize the desired constraints
(for instance, the average number of links can be set to a given value),
while leaving the unconstrained properties maximally random. This is
achieved by maximizing Shannon’s entropy (Shannon 1948)
$$S = -\sum_{G \in \aleph} P(G) \ln P(G)$$
subject to the normalization condition $\sum_{G \in \aleph} P(G) = 1$ and to the condition
that a set of desired properties $\{C_a\}$ is realized, i.e. the expected value
$$\langle C_a \rangle \equiv \sum_{G \in \aleph} P(G)\, C_a(G)$$
can be tuned to any desired value. The result of this constrained entropy
maximization is an occurrence probability of the form
$$P(G|\vec{\theta}) = \frac{e^{-H(G,\vec{\theta})}}{Z(\vec{\theta})}$$
where $\vec{\theta}$ is a vector of unknown Lagrange multipliers,
$H(G,\vec{\theta}) = \sum_a \theta_a C_a(G)$ is the graph Hamiltonian (a linear combination of the chosen
constraints) and $Z(\vec{\theta}) = \sum_{G \in \aleph} e^{-H(G,\vec{\theta})}$ is the partition function (Park and
Newman 2004).
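To make the ensemble construction concrete, the exponential form of the occurrence probability can be checked by brute force on a very small ensemble. The sketch below is illustrative only: it assumes a single constraint (the number of links, so that the Hamiltonian is theta times the number of links), enumerates every directed graph without self-loops on n nodes, and the function name `ensemble_average_links` is hypothetical.

```python
import math
from itertools import product


def ensemble_average_links(n, theta):
    """Average number of links under P(G) = exp(-theta * L(G)) / Z.

    Enumerates all 2**(n*(n-1)) directed graphs on n nodes without
    self-loops, so it is only feasible for very small n.
    """
    n_pairs = n * (n - 1)
    weights, link_counts = [], []
    for config in product((0, 1), repeat=n_pairs):
        n_links = sum(config)                  # L(G) for this graph
        weights.append(math.exp(-theta * n_links))
        link_counts.append(n_links)
    z = sum(weights)                           # partition function Z(theta)
    probs = [w / z for w in weights]           # occurrence probabilities P(G|theta)
    average = sum(p * l for p, l in zip(probs, link_counts))
    return average, probs


average, probs = ensemble_average_links(3, math.log(2.0))
print(average, sum(probs))
```

For this one-constraint case the enumeration agrees with the closed form N(N-1) x / (1 + x), where x = exp(-theta): with three nodes and theta = ln 2 the average number of links is 2.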
Given an observed graph, $G^*$, the Lagrange multipliers are set to the
numerical values $\vec{\theta}^*$ that maximize the log-likelihood function defined as
$\mathcal{L}(\vec{\theta}) \equiv \ln P(G^*|\vec{\theta})$ (Garlaschelli and Loffredo 2008; Squartini and
Garlaschelli 2011):
$$\left. \frac{\partial \mathcal{L}(\vec{\theta})}{\partial \theta_a} \right|_{\vec{\theta}^*} = 0 \quad \forall a$$
This leads to the system of equations
$$\langle C_a \rangle^* = C_a(G^*) \quad \forall a.$$
In other words, the parameters $\vec{\theta}^*$ ensure that the expected values of the
desired constraints equal the particular values observed in the real
network. If inserted into $P(G|\vec{\theta})$, these parameters allow us to calculate
analytically the expected value $\langle X \rangle^*$ of any other (unconstrained)
topological property $X$ of interest. Comparing $\langle X \rangle^*$ with the observed value
$X(G^*)$ finally allows us to conclude whether the enforced constraints are
(partially) responsible also for other unconstrained properties (Squartini
and Garlaschelli 2011).
For our purposes, the above step is the key ingredient we will exploit
in order to check whether a non-spatial null model (i.e. one where the
chosen constraints are purely topological and independent of distances)
can account for (part of) the spatial organization of a real network by
filtering out spurious spatial effects and highlighting the genuine effects of
distances. Note that our use of the terms “spatial”, “non-spatial” and
“topological” is somewhat improper but very practical; we will give a
complete clarification of our terminology at the end of the chapter. In our
binary analyses, we will employ three non-spatial null models: the
Directed Random Graph model (DRG), the Directed Configuration Model
(DCM) and the Reciprocated Configuration Model (RCM) (Squartini and
Garlaschelli 2011). These models are of increasing complexity, and are
briefly described below.
The DRG is characterized by only one constraint: the total number of
observed links, $L^*$. The DRG Hamiltonian is thus $H(G,\theta) = \theta L(G)$ and,
for any pair of vertices $i$ and $j$, the probability of connection is equal to
$$p_{ij} = \frac{x^*}{1 + x^*} = \frac{L^*}{N(N-1)} \qquad (9)$$
where $x \equiv e^{-\theta}$ ($x^*$ is the fitted value corresponding to $\theta^*$). The DRG
represents the simplest binary model (Erdös and Renyi 1959; Gilbert
1959).
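For the DRG the likelihood maximization can be carried out by hand, which makes it a convenient check of the general recipe. The sketch below assumes the one-constraint ensemble described above, for which the log-likelihood of an observed graph with L* links is -theta L* - N(N-1) ln(1 + exp(-theta)); the function names `drg_log_likelihood` and `fit_drg` are hypothetical.

```python
import math


def drg_log_likelihood(theta, n, n_links):
    """ln P(G*|theta) for the DRG: -theta * L* - N(N-1) * ln(1 + exp(-theta))."""
    n_pairs = n * (n - 1)
    return -theta * n_links - n_pairs * math.log(1.0 + math.exp(-theta))


def fit_drg(n, n_links):
    """Closed-form maximum-likelihood fit; requires 0 < L* < N(N-1)."""
    n_pairs = n * (n - 1)
    p = n_links / n_pairs            # connection probability L* / (N(N-1))
    x_star = p / (1.0 - p)           # fitted x = exp(-theta), so p = x / (1 + x)
    theta_star = -math.log(x_star)   # fitted Lagrange multiplier
    return p, x_star, theta_star


p, x_star, theta_star = fit_drg(5, 8)
# The log-likelihood is concave in theta, so nearby values score lower.
print(p, drg_log_likelihood(theta_star, 5, 8),
      drg_log_likelihood(theta_star + 0.1, 5, 8))
```

Setting the derivative of the log-likelihood to zero gives x / (1 + x) = L* / (N(N-1)), i.e. the expected number of links equals the observed one.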
The DCM is a more refined null model, defined by the network’s in-degree
sequence (the vector of the in-degrees of each vertex, i.e. the
numbers of incoming links defined as $k_i^{in} = \sum_j a_{ji}$) and out-degree
sequence (the vector of the out-degrees of each vertex, i.e. the numbers of
outgoing links defined as $k_i^{out} = \sum_j a_{ij}$). The DCM is one of the most
used null models in network theory and it was shown to replicate many
properties of the WTW (Squartini and Garlaschelli 2011; Squartini
Fagiolo and Garlaschelli 2011a). The resolution of the $2N$ DCM
equations
leads to a probability matrix whose generic entry has the functional form
$$p_{ij} = \frac{x_i^* y_j^*}{1 + x_i^* y_j^*} \qquad (10)$$
where $x_i \equiv e^{-\alpha_i}$ and $y_i \equiv e^{-\beta_i}$, $\alpha_i$ and $\beta_i$ being the Lagrange
multipliers coupled with the out-degree and in-degree sequences
respectively ($x_i^*$ and $y_i^*$ denote the fitted values).
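One simple way to solve the DCM equations numerically is a fixed-point iteration on the conditions "expected out-degree equals observed out-degree" and "expected in-degree equals observed in-degree". The sketch below is an assumption made for illustration, not the solver used by the authors: convergence is not guaranteed in general, nodes whose degree equals N-1 are not handled, and the function names are hypothetical.

```python
import math


def degree_sequences(adjacency):
    """Out- and in-degree sequences of a binary directed graph without self-loops."""
    n = len(adjacency)
    k_out = [sum(adjacency[i][j] for j in range(n) if j != i) for i in range(n)]
    k_in = [sum(adjacency[j][i] for j in range(n) if j != i) for i in range(n)]
    return k_out, k_in


def fit_dcm(k_out, k_in, max_iter=10000, tol=1e-12):
    """Fixed-point iteration for the DCM parameters x_i, y_i (illustrative only).

    Assumes at least one link and no node with degree N-1.
    """
    n = len(k_out)
    scale = math.sqrt(sum(k_out))
    x = [k / scale for k in k_out]   # initial guess
    y = [k / scale for k in k_in]
    for _ in range(max_iter):
        new_x, new_y = [0.0] * n, [0.0] * n
        for i in range(n):
            if k_out[i] > 0:   # k_out_i = sum_{j != i} x_i y_j / (1 + x_i y_j)
                new_x[i] = k_out[i] / sum(
                    y[j] / (1.0 + x[i] * y[j]) for j in range(n) if j != i)
            if k_in[i] > 0:    # k_in_i = sum_{j != i} x_j y_i / (1 + x_j y_i)
                new_y[i] = k_in[i] / sum(
                    x[j] / (1.0 + x[j] * y[i]) for j in range(n) if j != i)
        change = max(abs(a - b) for a, b in zip(new_x + new_y, x + y))
        x, y = new_x, new_y
        if change < tol:
            break
    return x, y


def dcm_probabilities(x, y):
    """Connection probabilities p_ij = x_i y_j / (1 + x_i y_j), zero on the diagonal."""
    n = len(x)
    return [[0.0 if i == j else x[i] * y[j] / (1.0 + x[i] * y[j])
             for j in range(n)] for i in range(n)]


# Directed 4-cycle: every node has out-degree 1 and in-degree 1.
cycle = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]]
k_out, k_in = degree_sequences(cycle)
p = dcm_probabilities(*fit_dcm(k_out, k_in))
print(p[0][1])  # each of the 3 possible out-links gets probability 1/3
```

In the regular example every node has the same degrees, so the fitted model spreads each node's single expected out-link evenly over its three possible targets.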
Finally, the RCM is characterized by $3N$ constraints, decomposing the
in-degree and out-degree sequences into three more detailed sequences
distinguishing between reciprocated (by a mutual link in the opposite
direction) and non-reciprocated links. The three sequences are the
following: the one of reciprocated degrees (the numbers of reciprocated
links involving each vertex), the one of non-reciprocated out-degrees (the
numbers of non-reciprocated out-going links from each vertex) and the
one of non-reciprocated in-degrees (the numbers of non-reciprocated
incoming links into each vertex) (Garlaschelli and Loffredo 2004;
Garlaschelli and Loffredo 2006). The equations to be solved are now $3N$
and the connection probability is
$$p_{ij} = \frac{x_i^* y_j^* + z_i^* z_j^*}{1 + x_i^* y_j^* + x_j^* y_i^* + z_i^* z_j^*} \qquad (11)$$
where $x_i \equiv e^{-\alpha_i}$, $y_i \equiv e^{-\beta_i}$, $z_i \equiv e^{-\gamma_i}$, with $\alpha_i$, $\beta_i$ and $\gamma_i$ being the
Lagrange multipliers associated with the three types of enforced node
degrees (Squartini and Garlaschelli 2011).
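The three degree sequences constrained by the RCM can be read directly off the adjacency matrix: a link from i to j is reciprocated when the link from j to i also exists. The following minimal sketch illustrates the decomposition; the function name `reciprocity_degrees` and the three-node example are assumptions made here.

```python
def reciprocity_degrees(adjacency):
    """Return (reciprocated, non-reciprocated out, non-reciprocated in) degrees."""
    n = len(adjacency)
    k_rec, k_nr_out, k_nr_in = [0] * n, [0] * n, [0] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            a_ij, a_ji = adjacency[i][j], adjacency[j][i]
            k_rec[i] += a_ij * a_ji           # mutual link between i and j
            k_nr_out[i] += a_ij * (1 - a_ji)  # i -> j without j -> i
            k_nr_in[i] += a_ji * (1 - a_ij)   # j -> i without i -> j
    return k_rec, k_nr_out, k_nr_in


# Example: 0 <-> 1 is reciprocated, 0 -> 2 and 2 -> 1 are not.
adjacency = [[0, 1, 1],
             [1, 0, 0],
             [0, 1, 0]]
print(reciprocity_degrees(adjacency))
# ([1, 1, 0], [1, 0, 1], [0, 1, 1])
```

For every node the reciprocated and non-reciprocated out-degrees add up to the out-degree, and likewise for the in-degree, which is why the RCM constraints refine those of the DCM.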
The aforementioned three models are characterized by some kind of
topological property (such as the link density, the degree sequence and the
reciprocity) that is a priori independent of any spatial constraint. They
therefore allow us to improve our definition of filling coefficient by
filtering out the spurious spatial effects due to the non-spatial constraint
enforced. In order to achieve this result, a comparison between the
observed value of f and its expectation is needed. Consider the expected
value of the filling coefficient under any of the three aforementioned null
models (NM)
$$\langle f \rangle_{NM} = \frac{\sum_{i} \sum_{j} p_{ij} d_{ij} - F_{\min}}{F_{\max} - F_{\min}}$$
where $p_{ij}$ is given by one of eqs. (9-11). The comparison between
observation and expectation can be easily carried out by making use of the
following rescaled version of the filling coefficient, that we denote as
filtered filling (Ruzzenenti et al. 2012):
$$\varphi \equiv \frac{f - \langle f \rangle_{NM}}{1 - \langle f \rangle_{NM}}. \qquad (12)$$
The range of $\varphi$ is $[-1, 1]$. A positive value of $\varphi$ means that the
considered network is “more stretched” than its expected counterpart
defined by imposing a selected set of constraints on the graph ensemble.
On the other hand, $\varphi$ is negative for networks which are “more shrunk”
than expected. Thus, the filtered filling combines the model’s prediction
and the observed information in such a way that their comparison can be
carried out by simply looking at the sign of $\varphi$. Note that the
normalization in eq. (12) also allows for a comparison between networks
with different topological properties (i.e. number of nodes, number of
links, degree sequences, etc.), discounting for the different impact of the
imposed constraints on the considered topologies. We also note that the
comparison between the observed and expected values of the filling makes
the choice of $F_{\min}$ and $F_{\max}$ irrelevant, in accordance with our previous
comment about the arbitrariness of the latter.
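The comparison between observed and expected filling can be sketched as follows. The example assumes that the filtered filling takes the form (f - <f>) / (1 - <f>), with <f> obtained by replacing each adjacency entry with its connection probability, and it uses the DRG probability L / (N(N-1)) on a toy layout of four nodes on a line. The function names are hypothetical.

```python
def expected_filling(probabilities, distances, f_min, f_max):
    """Expected filling: adjacency entries replaced by connection probabilities."""
    n = len(distances)
    expected_total = sum(probabilities[i][j] * distances[i][j]
                         for i in range(n) for j in range(n) if i != j)
    return (expected_total - f_min) / (f_max - f_min)


def filtered_filling(f_observed, f_expected):
    """Assumed form of the filtered filling: (f - <f>) / (1 - <f>)."""
    return (f_observed - f_expected) / (1.0 - f_expected)


# Four nodes at positions 0..3 on a line, L = 2 links, DRG null model.
positions = [0, 1, 2, 3]
dist = [[abs(a - b) for b in positions] for a in positions]
n, n_links = 4, 2
p = n_links / (n * (n - 1))
p_drg = [[0.0 if i == j else p for j in range(n)] for i in range(n)]
# With L = 2 the extreme values are F_min = 1 + 1 = 2 and F_max = 3 + 3 = 6.
f_exp = expected_filling(p_drg, dist, 2, 6)
print(f_exp)                         # expected filling under the DRG
print(filtered_filling(0.0, f_exp))  # negative: more shrunk than expected
print(filtered_filling(1.0, f_exp))  # positive: more stretched than expected
```

In this toy case the expected filling is 1/3, so the maximally shrunk configuration has a filtered filling of -0.5 and the maximally stretched one has a filtered filling of 1; only the sign is needed to classify the network.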
Local Measures
The filling coefficient and the filtered filling are global quantities
measuring the extent to which spatial effects shape the graph as a whole.
However, from our introductory remarks it is clear that a vertex-specific
definition is also necessary in order to isolate potentially conflicting
contributions of individual nodes. To this end, a local measure is naturally
induced by the sums
$$F_i^{out} \equiv \sum_{j} a_{ij} d_{ij}, \qquad F_i^{in} \equiv \sum_{j} a_{ji} d_{ji}.$$
As before, after rescaling $F_i^{out}$, we can define the local outward filling
coefficient (Ruzzenenti et al. 2012) as
$$f_i^{out} \equiv \frac{\sum_{j} a_{ij} d_{ij} - (F_i^{out})_{\min}}{(F_i^{out})_{\max} - (F_i^{out})_{\min}} \qquad (14)$$
where the values $(F_i^{out})_{\min}$ and $(F_i^{out})_{\max}$ characterize the extreme local
values for the maximally shrunk and maximally stretched configurations,
in a properly-defined scenario. In analogy with the global quantities
$F_{\min}$ and $F_{\max}$, we choose the extreme values $(F_i^{out})_{\min}$ and $(F_i^{out})_{\max}$ as the
sums of the first $L/N$ smallest and largest distances (now defined locally
for each vertex $i$) respectively. This number of addenda is chosen to be
consistent with the choice made at the global level: for a network with a
given number of links, the expected number of (either incoming or outgoing)
connections of each node is $L/N$. Similarly, we can define the local
inward filling coefficient as
$$f_i^{in} \equiv \frac{\sum_{j} a_{ji} d_{ji} - (F_i^{in})_{\min}}{(F_i^{in})_{\max} - (F_i^{in})_{\min}}. \qquad (15)$$
Note that, due to the symmetry of the matrix of distances,
$(F_i^{in})_{\max} = (F_i^{out})_{\max}$ and $(F_i^{in})_{\min} = (F_i^{out})_{\min}$.
As for the global quantity, the expected value of the local filling
coefficients can be simply obtained by replacing the term $a_{ij}$ in eqs. (14)
and (15) with the probability $p_{ij}$ under the chosen null model. It is
already very useful to compare the observed and expected values of the
local filling coefficients as functions of the corresponding non-spatial
properties (i.e. the out-degree or in-degree). In this case, we do not
introduce any rescaled or “filtered” measure for brevity.
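The local coefficients can be sketched in the same way as the global one. In the example below the number of addenda is passed explicitly as `n_terms`, because the text does not say how L/N is rounded when it is not an integer; that parameter, the function name `local_filling` and the toy layout are assumptions made for illustration.

```python
def local_filling(adjacency, distances, i, n_terms, outward=True):
    """Local outward (or inward) filling coefficient of vertex i.

    n_terms is the number of smallest/largest distances summed to obtain
    the local extreme values (L/N in the text; rounding is left to the caller).
    """
    n = len(adjacency)
    others = [j for j in range(n) if j != i]
    if outward:
        total = sum(adjacency[i][j] * distances[i][j] for j in others)
    else:
        total = sum(adjacency[j][i] * distances[j][i] for j in others)
    # Distances from vertex i to every other vertex, smallest first
    ordered = sorted(distances[i][j] for j in others)
    local_min = sum(ordered[:n_terms])
    local_max = sum(ordered[len(ordered) - n_terms:])
    return (total - local_min) / (local_max - local_min)


# Four nodes at positions 0..3 on a line; links 0 -> 2 and 3 -> 0.
positions = [0, 1, 2, 3]
dist = [[abs(a - b) for b in positions] for a in positions]
adjacency = [[0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0]]
print(local_filling(adjacency, dist, 0, 1, outward=True))   # 0.5
print(local_filling(adjacency, dist, 0, 1, outward=False))  # 1.0
```

Because the extreme values are built from a fixed number of addenda rather than from each node's actual degree, a node whose degree differs from that number can fall outside the interval between 0 and 1.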
The effects of distances on the binary WTW
We now come to the application of the above methodology to the
WTW. We analyzed the yearly binary snapshots of the network from 1948
to 2000, extracted from a comprehensive dataset (Gleditsch 2002). During
this temporal interval, the number of nodes (countries) increased from
$N =$ … to $N =$ …, and the link density, $c = L / [N(N-1)]$,
rose from $c =$ … to $c =$ …. By contrast, the average
distance, $\bar{d} = \sum_{i \ne j} d_{ij} / [N(N-1)]$, remained quite stable from
$\bar{d} =$ … km to $\bar{d} =$ … km. This is not surprising, considering that the
Earth’s surface is a bounded space.
The global filtered filling $\varphi$, calculated under the three null models,
is plotted as a function of time in Fig. 3. In the period under consideration,
all null models always yield negative values of $\varphi$. This means that the
WTW is a systematically “shrunk” network, confirming the naïve
expectation that geographic distances have a suppressing effect on trade:
the farther apart two countries are, the lower the probability of observing a
trade exchange between them (remember that we are carrying out a binary
analysis for the moment). However, the small measured values of $\varphi$
seem also to suggest that the role played by
distances is quite weak, a result that appears to contrast with classical economic
arguments (Linders Martijn and Van Oort 2008).
While the three models qualitatively agree in classifying the WTW as
spatially shrunk, we observe important quantitative differences both
among models and over time. The temporal trends obtained under the
RCM and the DCM are practically identical, but (from 1960 onwards) they
are almost inverted with respect to the trend obtained under the DRG.
Fig. 3. The filtered filling coefficient $\varphi$ of the binary WTW from year 1948 to
2000, under the three null models considered: DRG (diamonds), DCM (circles)
and RCM (squares).
Fig. 4. Local outward filling, defined in eq.(14), versus out-degree (top panel) and
local inward filling, defined in eq.(15), versus in-degree (bottom panel). The empty
circles represent the observed values, while the filled circles represent the expected
values predicted by the DCM.
The first finding means that the introduction of reciprocity as an
additional constraint is not really necessary in order to filter out the local
non-spatial effects, which seem to be already effectively discounted by the
in- and out-degree sequences alone.
A naïve explanation might be the high symmetry of the WTW, i.e. the
high number of reciprocated interactions between world countries
(Ruzzenenti Garlaschelli and Basosi 2010; Garlaschelli and Loffredo
2004). This high reciprocal structure, which reduces the WTW almost to
an undirected network, makes the information carried by the reciprocity
irrelevant. However, as we show later, this interpretation is incorrect. A
statistically appropriate procedure to quantify and rank the effectiveness of
different models in explaining the observed network structure is presented
in the third section of this chapter. Its application reveals that
reciprocity is a key and irreducible structural property of the WTW
(Picciolo et al. 2012).
The second finding, i.e. the almost inverted trend of the DRG with
respect to the other two models, is a result of the intrinsic difference
between the homogeneity of the DRG (which controls only for the overall
density of trade) and the heterogeneity of the other models (which control
for country-specific properties). The continuous appearance of unrealized
long-distance connections overcompensates the establishment of a few
new ones, and the overall result is an effective shrinking of the network.
At this point, it is worth mentioning that the topology of the real WTW is
very different from that of the DRG, while it is instead accurately
reproduced by the DCM and especially the RCM (Squartini Fagiolo and
Garlaschelli 2011a; Squartini and Garlaschelli 2013). This means that the
non-spatial effects filtered out by the DRG do not represent key structural
properties shaping the real WTW.
By contrast, the DCM and RCM filter out the most informative
properties, i.e. the ones that are sufficient in order to reproduce the
observed topology of the WTW. The use of the DCM and RCM should
therefore be strongly preferred to that of the DRG when trying to
disentangle spatial and non-spatial effects in the WTW. The empirical
inverted trends shown above warn us about the opposite interpretations
that can arise from a misuse of homogeneous network benchmarks.
Focusing on the trend obtained under the heterogeneous models, we
find that the two periods known in the economic literature as the first and
second “waves” of globalization (De Benedictis and Helg 2002; Crafts
2004) turn out to correspond to two opposite phenomena at a topological
level. During the “first wave”, i.e. the period starting around 1960 during
which many former colonies became independent states, the topology of
the WTW actually “shrunk”.
This result is apparently a paradox, since it is known that the new
independent states (which gradually appear as new nodes in the network)
kept a strong trade relationship with their former colonizers, thus
originating new long-distance links and (in principle) “stretching out” the
WTW. However, one must also note that the appearance of the new nodes,
while accompanied by new long-distance links, is also accompanied by
many new missing long-distance links: two new (and generally small)
independent states located at opposite locations on the globe typically do
not trade with each other.
By contrast, during the “second wave” of globalization corresponding
to the fall of the east-west division in Europe and the disintegration of the
Soviet Union, the WTW stretched out topologically, as indicated by the
rise of the trend between the late Eighties and the mid Nineties. Since the
trade relationships linking the formerly Soviet states are short-distance,
the overall stretching of the WTW must be the result of the establishment of
additional long-distance connections. In other words, unlike the previous
phase, the new states are now really internationally integrated, at least at a
topological level.
We now turn to a local analysis of spatial effects. The local spatial
quantities defined in eqs. (14) and (15) are plotted as a function of the
corresponding non-spatial properties in Fig. 4. The top panel shows the
local outward filling plotted versus the out-degree, while the bottom panel
shows the local inward filling plotted versus the in-degree. We show the
results for the year 2000 only, but similar results are observed for all the
considered years. The expected values under the DCM are also plotted; we
do not show the expected values according to the RCM because they
overlap with the DCM ones. We find a strong nonlinear correlation between
node degrees and local filling values (both outward and inward). For
countries with very small and very large degrees, the agreement with the
null model is almost perfect, while the largest discrepancy is observed for
countries with intermediate values of the degree. Our explanation of this
effect is the following. Countries with degree (almost) equal to the
maximum value are necessarily connected with (almost) every other
country, both in the real network and in the null model (because the latter
preserves the number of links of each node). This generates the agreement
with the null model for large-degree countries, and also for small-degree
ones: the latter countries are in turn necessarily connected with the “hubs”,
irrespective of distances. Only countries in the intermediate range of
connectivity can have a large degree of freedom. The figure shows that
these countries systematically have a stronger-than-predicted tendency to
trade with geographically closer countries.
The global spatial effects discussed above, encapsulated in a negative
value of the filtered filling, come only from these intermediate-degree
countries, and they are therefore not representative of the behavior of all
nodes.
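The degree-of-freedom argument for large-degree countries can be illustrated with a minimal Python sketch. It is not the computation used in the chapter: the distances are hypothetical and the function name `out_distance_range` is ours. For a node with out-degree k, the sketch computes the smallest and largest total distance that any choice of k partners can produce. The gap between the two vanishes when k equals the number of available partners and is widest at intermediate k.

```python
def out_distance_range(distances_from_i, k):
    """Smallest and largest summed distance over all choices of k partners.

    distances_from_i: distances from one node to every other node
    (hypothetical values, for illustration only).
    """
    ordered = sorted(distances_from_i)
    lowest = sum(ordered[:k])                   # link to the k nearest nodes
    highest = sum(ordered[len(ordered) - k:])   # link to the k farthest nodes
    return lowest, highest


# Hypothetical distances from one country to the five others.
d = [1.0, 2.0, 4.0, 7.0, 9.0]

# Width of the admissible range for every possible out-degree.
spreads = []
for k in range(len(d) + 1):
    lowest, highest = out_distance_range(d, k)
    spreads.append(highest - lowest)
    print(k, lowest, highest, highest - lowest)
```

The sketch covers only the large-degree end of the argument. The small-degree end relies on the additional observation that poorly connected countries link to the hubs, which a single-node calculation does not capture.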
Spatial effects in weighted networks
The concepts introduced in the previous section can be generalized to
the weighted case (Ruzzenenti et al. 2012).
A weighted graph can be unambiguously defined by an adjacency matrix
whose generic entry, $w_{ij}$, represents the intensity of the link from
node $i$ to node $j$ (we assume again that self-loops are absent, i.e.
$w_{ii} = 0$). In this section, we first define the weighted counterparts
of the quantities we have already introduced (this also includes a
definition of weighted null models). Later on we present the corresponding
application to the analysis of the WTW as a weighted network.
Weighted definitions
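By looking at eq. (1), we can define the weighted analogue of $F$ as

$$F^w \equiv \sum_{i} \sum_{j \neq i} w_{ij}\, d_{ij} \qquad (16)$$

Similarly, the weighted filling coefficient can be written as

$$f^w \equiv \frac{\sum_{i} \sum_{j \neq i} w_{ij}\, d_{ij} - F^w_{\min}}{F^w_{\max} - F^w_{\min}}. \qquad (17)$$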
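Also in the weighted case, the two extreme values of $F^w$ can be chosen in
an arbitrary way. For instance, if we fix the total weight
$W = \sum_{i} \sum_{j \neq i} w_{ij}$, $F^w$ reaches its lowest and highest
value when $W$ is placed between the two nearest and farthest vertices
respectively, i.e. $F^w_{\min} = W d_{\min}$ and $F^w_{\max} = W d_{\max}$,
where $d_{\min}$ and $d_{\max}$ are the smallest and largest distances
between two vertices (Ruzzenenti et al. 2012).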
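The weighted filling can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes that the weighted analogue of F is the sum of weight times distance over all ordered pairs, and that the extremes are obtained by placing the whole weight W on the nearest or on the farthest pair of vertices. The function name `weighted_filling` and the example matrices are hypothetical.

```python
def weighted_filling(w, d):
    """Weighted filling for a weight matrix w and a distance matrix d.

    Both arguments are square lists of lists; diagonal entries are ignored.
    Assumed extremes: all the weight on the nearest / farthest pair.
    """
    n = len(w)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    total_weight = sum(w[i][j] for i, j in pairs)            # W
    observed = sum(w[i][j] * d[i][j] for i, j in pairs)      # sum of w_ij * d_ij
    d_min = min(d[i][j] for i, j in pairs)
    d_max = max(d[i][j] for i, j in pairs)
    lowest = total_weight * d_min
    highest = total_weight * d_max
    return (observed - lowest) / (highest - lowest)


# Hypothetical symmetric distances between three vertices.
d = [[0, 1, 2],
     [1, 0, 3],
     [2, 3, 0]]

all_on_nearest = [[0, 5, 0], [0, 0, 0], [0, 0, 0]]    # weight on the pair at distance 1
all_on_farthest = [[0, 0, 0], [0, 0, 5], [0, 0, 0]]   # weight on the pair at distance 3
mixed = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]

print(weighted_filling(all_on_nearest, d))
print(weighted_filling(all_on_farthest, d))
print(weighted_filling(mixed, d))
```

The function divides by zero when all distances are equal or the total weight is zero; a production version would need to handle those degenerate cases.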
As for binary networks, we can introduce null models in order to have
a benchmark filtering out non-spatial effects. The Weighted Random
Graph model (WRG) is the analogue of the DRG for binary networks.
The only constraint we impose is the total weight, $W$, and the Hamiltonian
is

$$H = \theta W.$$

The expected weight of the link from node $i$ to node $j$ is

$$\langle w_{ij} \rangle = \frac{y^*}{1 - y^*} = \frac{W}{N(N-1)} \qquad (18)$$

where now $y \equiv e^{-\theta}$ ($y^*$ is the fitted value corresponding to
$\theta^*$). By imposing only this constraint, we are exclusively making use
of the average intensity of the links (Garlaschelli 2009).
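As a minimal sketch of the WRG benchmark, the snippet below computes the expected weight as the total weight divided by the number of ordered pairs, N(N-1), and then inverts the relation between the expected weight and the fitted parameter, assuming it has the form y/(1-y). The function name `wrg_expected_weight` and the example matrix are hypothetical.

```python
def wrg_expected_weight(w):
    """Expected link weight and fitted parameter under the WRG.

    Assumes <w_ij> = W / (N (N - 1)) = y / (1 - y), so that
    y = <w_ij> / (1 + <w_ij>).
    """
    n = len(w)
    total_weight = sum(w[i][j] for i in range(n) for j in range(n) if i != j)
    mean_weight = total_weight / (n * (n - 1))    # same value for every pair
    y_star = mean_weight / (1.0 + mean_weight)    # inverts y / (1 - y)
    return mean_weight, y_star


# Hypothetical weighted network with three vertices and total weight 12.
w = [[0, 4, 1],
     [2, 0, 3],
     [0, 2, 0]]

mean_weight, y_star = wrg_expected_weight(w)
print(mean_weight, y_star)
```

Because every pair receives the same expected weight, the WRG retains no country-specific information, which is the sense in which it is the homogeneous benchmark.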
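The second weighted null model we consider is the Weighted
Configuration Model (WCM), where the constraints are the in-strength
and out-strength sequences, defined by the values of the in-strength,
$s_i^{\mathrm{in}} = \sum_{j \neq i} w_{ji}$, and the out-strength,
$s_i^{\mathrm{out}} = \sum_{j \neq i} w_{ij}$, of all vertices. The
expected link weight is now

$$\langle w_{ij} \rangle = \frac{x_i^* \, y_j^*}{1 - x_i^* \, y_j^*} \qquad (19)$$

where $x_i \equiv e^{-\theta_i^{\mathrm{out}}}$ and
$y_j \equiv e^{-\theta_j^{\mathrm{in}}}$ ($x_i^*$ and $y_j^*$ indicate the
fitted values realizing the observed strength sequences) (Squartini and
Garlaschelli 2011).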
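The forward step of the WCM can be sketched as follows: given one parameter per vertex for the out-strength and one for the in-strength, compute the expected weights and the strengths they imply. The sketch assumes that the expected weight of a link has the form z/(1-z), with z the product of the sender's out-parameter and the receiver's in-parameter. The names `wcm_expected_weights` and `strengths` and the parameter values are hypothetical, and the fitting step itself (tuning the parameters until the implied strengths match the observed ones) is not shown.

```python
def wcm_expected_weights(x, y):
    """Expected weight matrix for out-parameters x and in-parameters y.

    Assumes <w_ij> = x_i * y_j / (1 - x_i * y_j), which requires
    0 <= x_i * y_j < 1 for every ordered pair.
    """
    n = len(x)
    expected = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue                      # no self-loops
            z = x[i] * y[j]
            if not 0.0 <= z < 1.0:
                raise ValueError("x_i * y_j must lie in [0, 1)")
            expected[i][j] = z / (1.0 - z)
    return expected


def strengths(m):
    """Out-strength (row sums) and in-strength (column sums) of a matrix."""
    n = len(m)
    s_out = [sum(m[i][j] for j in range(n) if j != i) for i in range(n)]
    s_in = [sum(m[j][i] for j in range(n) if j != i) for i in range(n)]
    return s_out, s_in


# Hypothetical parameter values for three vertices.
expected = wcm_expected_weights([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
s_out, s_in = strengths(expected)
print(s_out, s_in)
```

In a fitting procedure one would compare `s_out` and `s_in` with the observed strength sequences and adjust the parameters until they agree.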
Even if a more comprehensive list of null models for weighted
networks has been defined recently, we will consider only the WRG and
WCM for brevity (Squartini et al. 2013; Mastrandrea et al., 2013).
As in the binary case, these models allow us to obtain the expected
value of the weighted filling coefficient by simply substituting in eq. (17)
the observed link weight, $w_{ij}$, with the expected one,
$\langle w_{ij} \rangle$, calculated using either eq. (18) or eq. (19). The
observed and expected values can be combined in the following definition of
weighted filtered filling:

$$\varphi^w \equiv \frac{f^w - \langle f^w \rangle}{1 - \langle f^w \rangle} \qquad (20)$$

which again ranges between $-1$ and $+1$. A positive (negative) value of
$\varphi^w$ means that distances have a stretching (shrinking) effect on the
link weights of the observed weighted network (Ruzzenenti et al. 2012).
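As a worked illustration, the expected weighted filling under the WRG can be written in closed form. The derivation below is a sketch under two assumptions: that the WRG assigns the same expected weight $W/[N(N-1)]$ to every ordered pair, and that the extremes are $F^w_{\min} = W d_{\min}$ and $F^w_{\max} = W d_{\max}$. The symbol $\bar{d}$ for the mean distance over ordered pairs is introduced here for convenience only.

```latex
\langle F^w \rangle_{\mathrm{WRG}}
  = \sum_{i} \sum_{j \neq i} \frac{W}{N(N-1)} \, d_{ij}
  = W \bar{d},
\qquad
\bar{d} \equiv \frac{1}{N(N-1)} \sum_{i} \sum_{j \neq i} d_{ij}

\langle f^w \rangle_{\mathrm{WRG}}
  = \frac{W \bar{d} - W d_{\min}}{W d_{\max} - W d_{\min}}
  = \frac{\bar{d} - d_{\min}}{d_{\max} - d_{\min}}
```

Under these assumptions the total weight cancels, so the WRG benchmark depends only on the geometry of the embedding and not on the volume of trade.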
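A final extension concerns the local structure. The sums

$$F_i^{w\,\mathrm{out}} \equiv \sum_{j \neq i} w_{ij}\, d_{ij}, \qquad F_i^{w\,\mathrm{in}} \equiv \sum_{j \neq i} w_{ji}\, d_{ji} \qquad (21)$$

lead us to the following definitions of local outward weighted filling
(Ruzzenenti et al. 2012)

$$l_i^{w\,\mathrm{out}} \equiv \frac{\sum_{j \neq i} w_{ij}\, d_{ij} - \left(F_i^{w\,\mathrm{out}}\right)_{\min}}{\left(F_i^{w\,\mathrm{out}}\right)_{\max} - \left(F_i^{w\,\mathrm{out}}\right)_{\min}} \qquad (22)$$

and local inward weighted filling

$$l_i^{w\,\mathrm{in}} \equiv \frac{\sum_{j \neq i} w_{ji}\, d_{ji} - \left(F_i^{w\,\mathrm{in}}\right)_{\min}}{\left(F_i^{w\,\mathrm{in}}\right)_{\max} - \left(F_i^{w\,\mathrm{in}}\right)_{\min}} \qquad (23)$$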
where the minimum and maximum values for $F_i^{w\,\mathrm{out}}$ and
$F_i^{w\,\mathrm{in}}$ characterize the maximally shrunk and maximally
stretched possibilities for vertex $i$, respectively, in a properly
chosen weighted scenario. In analogy with the choice made for the global
quantity, we choose a scenario where the total weight $W$ is fixed. The
resulting expected in-strength and out-strength of every vertex have the
same value $W/N$. In a straightforward approach, our choice for the
extreme values of $F_i^{w\,\mathrm{out}}$ and $F_i^{w\,\mathrm{in}}$ is such
that vertex $i$ concentrates all its out-strength in a single outgoing link
of weight $W/N$ directed to the spatially closest vertex (for the minimum)
or to the farthest one (for the maximum), and all its in-strength in a
single incoming link of weight $W/N$ coming from the same vertex. Note that
this implies
$\left(F_i^{w\,\mathrm{in}}\right)_{\max} = \left(F_i^{w\,\mathrm{out}}\right)_{\max}$
and
$\left(F_i^{w\,\mathrm{in}}\right)_{\min} = \left(F_i^{w\,\mathrm{out}}\right)_{\min}$.
As above, the expected local outward and inward fillings are simply
obtained by replacing the terms $w_{ij}$ in eqs. (22) and (23) with the
expectations $\langle w_{ij} \rangle$ under the chosen null model.
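The local weighted fillings can be sketched along the same lines. The snippet assumes that the extremes for vertex i are obtained by placing a single link of weight W/N on its nearest neighbor (minimum) or on its farthest neighbor (maximum), and that distances are symmetric. The function name `local_weighted_fillings` and the example matrices are hypothetical.

```python
def local_weighted_fillings(w, d):
    """Local outward and inward weighted fillings of every vertex.

    Assumed extremes for vertex i: one link of weight W / N to its
    nearest (minimum) or farthest (maximum) neighbor. The distance
    matrix d is assumed to be symmetric.
    """
    n = len(w)
    total_weight = sum(w[i][j] for i in range(n) for j in range(n) if i != j)
    share = total_weight / n                          # W / N
    out_filling, in_filling = [], []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        f_out = sum(w[i][j] * d[i][j] for j in others)   # outgoing links
        f_in = sum(w[j][i] * d[j][i] for j in others)    # incoming links
        lowest = share * min(d[i][j] for j in others)
        highest = share * max(d[i][j] for j in others)
        out_filling.append((f_out - lowest) / (highest - lowest))
        in_filling.append((f_in - lowest) / (highest - lowest))
    return out_filling, in_filling


# Hypothetical symmetric distances between three vertices.
d = [[0, 1, 2],
     [1, 0, 3],
     [2, 3, 0]]

# Every vertex sends weight 2 (= W / N) to its nearest neighbor.
w = [[0, 2, 0],
     [2, 0, 0],
     [2, 0, 0]]

out_filling, in_filling = local_weighted_fillings(w, d)
print(out_filling)
print(in_filling)
```

With this choice of extremes, a vertex whose actual strength differs from W/N can have a local filling outside the interval from 0 to 1, as the inward values in the example show. Replacing `w` with a matrix of expected weights gives the corresponding null-model expectation.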
The effects of distances in the weighted WTW
We can now perform a new analysis of the WTW, by considering its
weighted structure rather than its topology. Our weighted analysis spans
again the years from 1948 to 2000.
As shown in Fig. 5, the (small) negative values of the global filtered
filling confirm that the WTW is a (weakly) shrunk network. However, the
temporal trends are very different from the corresponding binary ones.
Surprisingly, according to the WCM, the strongest spatial stretching
occurred during the Fifties, while during the first wave of globalization the
trend remained approximately constant. The second wave of globalization
corresponds instead to a decreasing trend, now signaling an unexpected
spatial shrinking of the network.
The WRG is instead more in line with the DCM, and identifies a
shrinking during the first wave and a sudden stretching during the second
wave. Considering together the binary and weighted results, it appears
that two tendencies coexist. First, the WTW topology has become more
and more stretched during the last decade of the sample, with distances
opposing less and less resistance. Second, the intensity of trade exchanges
has risen more between countries that are geographically closer, with
distances opposing more and more resistance.
Fig. 5. The filtered weighted filling for the WTW from year 1948 to 2000, under
the two null models WRG (diamonds) and WCM (circles).
In other words, it appears that during the last wave of globalization the
WTW has, from an “extensive” point of view, tended to stretch out in its
embedding space by effectively preferring long-distance connections, and,
from an “intensive” point of view, tended to shrink by strengthening
the existing links between close neighbors.
However, the above results must be interpreted with particular care,
since (unlike the DCM) both the WRG and the WCM are known to be
very poor models of the WTW (Squartini Fagiolo and Garlaschelli 2011b).
We therefore warn the reader that the WCM does not filter out the
weighted, non-spatial patterns as satisfactorily as the DCM does. In order
to reproduce the weighted structure of the WTW, a more refined model
combining binary and weighted constraints is needed (Mastrandrea et al.,
2013; Squartini and Garlaschelli 2013). Thus, even if from an economic
point of view the WCM might appear more satisfactory than the DCM,
because it controls for the total imports and exports of countries, it turns
out to be uninformative about other properties of the network.
Counter-intuitively, the number of exporters and importers (which
defines the DCM) turns out to be a much more informative property. We
will comment again on this point when discussing Gravity Models at the
end of the chapter.
Keeping the above warning in mind, we finally consider the local
spatial effects in the weighted WTW. The top panel of Fig. 6 shows both
the observed and expected local weighted outward filling, plotted versus
the out-strength sequence, while the bottom panel shows both the observed
and expected local weighted inward filling, plotted versus the in-strength
sequence. We only show the results for the year 2000, but similar results
are observed for all the years considered.
Fig. 6. Local outward weighted filling versus out-strength (top panel) and local
inward weighted filling versus in-strength (bottom panel). The empty circles
represent the observed values, while the filled circles represent the expected values
predicted by the WCM.