Complex Networks

Download as pdf or txt
Download as pdf or txt
You are on page 1of 145

Complex networks:

an introduction
Alain Barrat
CPT, Marseille, France
ISI, Turin, Italy
http://www.cpt.univ-mrs.fr/~barrat
http://cxnets.googlepages.com
http://www.sociopatterns.org
REVIEWS:

•Statistical mechanics of complex networks


R. Albert, A.-L. Barabasi, Reviews of Modern Physics 74, 47 (2002),
cond-mat/0106096
•The structure and function of complex networks
M. E. J. Newman, SIAM Review 45, 167-256 (2003), cond-mat/
0303516
•Evolution of networks
S.N. Dorogovtsev, J.F.F. Mendes, Adv. Phys. 51, 1079 (2002) , cond-
mat/0106144
•Complex Networks: Structure and Dynamics
S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, D.-U. Hwang,
Physics Reports 424 (2006) 175
•Evolution of Networks: From Biological Nets to the
Internet and WWW, S.N. Dorogovtsev and J.F.F.
Mendes. Oxford University Press, Oxford, 2003.

•Evolution and Structure of the Internet: A Statistical


Physics Approach, R. Pastor-Satorras and A.
Vespignani. Cambridge University Press, Cambridge,
2004.

•Scale-free networks: Complex Webs in Nature and


Technology, G. Caldarelli. Oxford University Press,
Oxford, 2007
Networks, Crowds, and Markets: Reasoning About a Highly
Connected World
D. Easley, J. Kleinberg

Networks, An introduction
M. Newman

4
•Statistical mechanics of complex networks, edited by R.
Pastor-Satorras, J.-M. Rubí, A. Diaz-Guilera. Springer,
2003.

•Handbook of graphs and networks: from the Genome


to the Internet, edited by S. Bornholdt, H. G. Schuster.
John Wiley and Sons, 2003.

•Large Scale Structure and Dynamics of Complex


Networks: From Information Technology to Finance
and Natural Science, edited by G. Caldarelli and A.
Vespignani. World Scientific, 2007.
Outline of the lectures

I. INTRODUCTION
I. Networks: definitions, statistical characterization, examples
II. Modeling frameworks
II. DYNAMICAL PROCESSES
I. Resilience, vulnerability
II. Random walks
III. Epidemic processes
IV. Social phenomena
V. Some perspectives
What is a network
Network=set V of nodes joined by links (set E)

very abstract representation


very general
convenient to describe
many different systems
Examples
The bridges of Koenigsberg

L. Euler:
Can one walk across the seven bridges and
never cross the same bridge twice?
1735: Leonhard Euler’s theorem:

(a) If a graph has nodes of odd degree, there is no path.

(b) If a graph is connected and has no odd degree nodes, it has at least one path.

11
Metabolic Network Protein Interactions

Nodes: metabolites Nodes: proteins


Links:chemical reactions Links: interactions
Food-webs

N. Martinez, R. Williams
Scientific collaboration network

Weights: depending on
•number of co-authored papers
•number of authors of each paper
•number of citations…

M. E. J. Newman and M. Girvan, Physical Review E 69, 026113 (2004).


Image: MEJ Newman, http://www-personal.umich.edu/~mejn/networks/
Primary school,
cumulative contact network
High-school dating
Transportation network:
Urban level
TRANSIMS project

Nodes=locations (homes, shops, offices…)


Weighted links=flow of individuals
World airport network

Example: complete IATA database


 V = 3100 airports
 E = 17182 weighted edges

 wij #seats / (time scale)


Meta-population networks

Each node: internal structure


Links: transport/traffic

City i

City a

City j
Internet
Graph representation

different
granularities
Internet mapping
•continuously evolving and growing
•intrinsic heterogeneity
•self-organizing
Largely unknown topology/properties

Mapping projects:
•Multi-probe reconstruction (router-level): traceroute
•Use of BGP tables for the Autonomous System level (domains)

•CAIDA, NLANR, RIPE, Topology and performance


IPM, PingER, DIMES measurements
CAIDA AS cross section map
Internet, AS level
http://lanet-vi.soic.indiana.edu/

23
The World-Wide-Web
Virtual network to find and share informations
•web pages
•hyperlinks
Online (virtual) social networks
Online (virtual) social networks
Online (virtual) social networks
Interdisciplinary science
Science of complex networks:
-graph theory
-sociology
-communication science
-biology
-physics
-computer science
Interdisciplinary science
Science of complex networks:
• Empirics
• Characterization
• Modeling
• Dynamical processes
Networks characteristics
Networks: of very different origins

Do they have anything in common?


Possibility to find common properties?

the abstract character of the graph representation


and graph theory allow to answer….
Different types of networks

• Undirected

• Directed

• Weighted

• Multi-relational

• Bipartite
Graph theory: basics
G=(V,E) ; |V|=N

Maximum number of edges


• Undirected: N(N-1)/2
• Directed: N(N-1)

Complete graph:

(all to all interaction/communication)


How to represent a graph
• List of nodes + list of edges
i,j

• List of nodes + list of neighbors of each node


(adjacency lists)
1: 2,3,10,...
2: 1,12,11
3: 1,...

• Adjacency matrix
Adjacency matrix
N nodes i=1,…,N

aij= 1 if (i,j) ∈ E
0 if (i,j) ∉ E
0 1 2 3 0 1
0 0 1 1 1
1 1 0 1 1 2
2 1 1 0 1
3
3 1 1 1 0
Adjacency matrix
N nodes i=1,…,N

aij= 1 if (i,j) ∈ E Symmetric


0 if (i,j) ∉ E for undirected networks

0 1 2 3 0 1
0 0 1 0 0
1 1 0 1 1 2
2 0 1 0 1
3
3 0 1 1 0
Adjacency matrix
N nodes i=1,…,N

aij= 1 if (i,j) ∈ E Non symmetric


0 if (i,j) ∉ E for directed networks
0
0 1 2 3 1
0 0 1 0 1
1 0 0 0 0 2
2 0 1 0 0
3
3 0 1 1 0
Matrix of weights
N nodes i=1,…,N

(Non symmetric
wij= ≠ 0 if (i,j) ∈ E
for directed networks)
0 if (i,j) ∉ E
0
0 1 2 3 1
0 0 2 0 10
1 0 0 0 0 2
2 0 5 0 0
3
3 0 1 2 0
Sparse graphs
Density of a graph D=|E|/(N(N-1)/2)

Number of edges
D=
Maximal number of edges

Sparse graph: D <<1 Sparse adjacency matrix

Representation by lists of neighbours of each


node (adjacency lists) better suited
Node characteristics:
Degrees and strengths

39
Node characteristics
• Degree=number of neighbours=∑j aij

ki=5
i
NB: in a sparse graph we expect ki << N

0 1 2 3 0 1
0 0 1 0 0
1 1 0 1 1 2
i 2 0 1 0 1
3
3 0 1 1 0
Node characteristics
• Degree in directed graphs:
– in-degree= number of in-neighbours=∑j aji

– out-degree= number of out-neighbours=∑j aij

0
0 1 2 3 1
0 0 1 0 1
1 0 0 0 0 2
2 0 1 0 0
3
3 0 1 1 0
Node characteristics
• Weighted graphs: Strength si = ∑j wij
• Directed Weighted graphs:
–in-strength si = ∑j wji
–out-strength si = ∑j wij
0
0 1 2 3 1
0 0 2 0 10
1 0 0 0 0 2
2 0 5 0 0
3
3 0 1 2 0
Paths, connectedness,
small-world effect

43
Paths
G=(V,E)
Path of length n = ordered collection of
• n+1 vertices i0,i1,…,in ∈ V
• n edges (i0,i1), (i1,i2)…,(in-1,in) ∈ E
i3
i4
i5
i0 i1 i2

Cycle/loop = closed path (i0=in)


Tree=graph with no loops
Paths and adjacency matrix

(an)ij =number of paths of length n between i and j

Example: (a2)ij = ∑ aik akj


Paths and connectedness
G=(V,E) is connected if and only if there exists a
path connecting any two nodes in G

is connected

•is not connected


•is formed by two components
Paths and connectedness

G=(V,E)=> distribution of components’ sizes

Giant component= component whose


size scales with the number of vertices N

Existence of a giant Macroscopic fraction of


component the graph is connected
Paths and connectedness:
directed graphs
Paths are directed
Giant SCC: Strongly
Giant IN Connected Component Giant OUT
Component Component
Disconnected
components

Tendrils
Tube Tendril
Shortest paths
Shortest path between i and j: minimum number
of traversed edges
distance l(i,j)=minimum
j
number of edges traversed
on a path between i and j
i

Diameter of the graph= max(l(i,j))


Average shortest path= ∑ij l(i,j)/(N(N-1)/2)
Complete graph: l(i,j)=1 for all i,j
“Small-world”: “small” diameter
Social networks:
Milgram’s experiment
Milgram, Psych Today 2, 60 (1967)
Dodds et al., Science 301, 827 (2003)

“Six degrees of separation”

SMALL-WORLD CHARACTER
Social networks as small-worlds:
Milgram’s experiment, revisited
Dodds et al., Science 301, 827 (2003)
email chains

60000 start nodes


18 targets
384 completed chains

51
Small-world properties
Average number of nodes
within a chemical distance l

Scientific collaborations

Internet
The intuition behind the small-world effect

versus

Tree: (local) regular structure: slower


number of reachable nodes growth of the number of
grows very fast (exponentially) reachable nodes (polynomial),
with the distance because of path redundancy
Small-world yet clustered

54
Structure of neighborhoods
k
Clustering coefficient of a node
3
# of links between 1,2,…n neighbors
C(i) =
i 2
k(k-1)/2

Clustering: My friends will know each other with high probability!


(typical example: social networks)
Structure of neighborhoods

Average clustering coefficient of a graph


C=∑i C(i)/N

NB: slightly different definition from the


fraction of transitive triples:
3 x number of fully connected triples
C’ =
number of triples
Clustering coefficient
n
Empirically: large clustering coefficients
3

Higher probability to be connected

2
1

Clustering: My friends will know each other with high probability


(typical example: social networks)
Empirical networks:
• small-world
• locally very “structured”
=> cf lecture on models

58
Ranking nodes

59
Centrality measures
How to quantify the importance of a node?
• Degree=number of neighbours=∑j aij

ki=5
i

• Large degree nodes=”hubs”


• Nodes with very large degree can be
“peripheral”
Path-based centrality measures

• Closeness centrality

gi= 1 / ∑j l(i,j)

Quantifies the reachability of other nodes

61
Betweenness centrality
for each pair of nodes (l,m) in the graph, there are
σlm shortest paths between l and m
σilm shortest paths going through i
bi is the sum of σilm / σlm over all pairs (l,m)

path-based quantity

i
bi is large
j bj is small

NB: similar quantity= load li=∑ σilm


NB: generalization to edge betweenness centrality
Betweenness centrality
path-based quantity => bc(i) depends on all the nodes
that are connected to i by at least one path

non-local quantity

“hard” to compute

“naive” algorithm: O(N3)


Brandes algorithm: O(N*E)
Statistical characterization
of networks

64
Statistical characterization
Degree distribution

•List of degrees k1,k2,…,kN Not very useful!


•Histogram: P(k)
0.6

Nk= number of nodes with degree k


0.5
0.4
0.3
0.2
0.1

•Distribution: 1 2 3 4 k

P(k)=Nk/N=probability that a randomly chosen


node has degree k
•Cumulative distribution:
P>(k)=probability that a randomly chosen
node has degree at least k
Statistical characterization
Degree distribution

P(k)=Nk/N=probability that a randomly chosen


node has degree k

Average=< k > = ∑i ki/N = ∑k k P(k)=2|E|/N


Sparse graphs: < k > << N

Fluctuations: < k2> - < k > 2


< k2 > = ∑i k2i/N = ∑k k2 P(k)
< kn > = ∑k kn P(k)
Topological heterogeneity
Statistical analysis of centrality measures:

P(k)=Nk/N=probability that a randomly chosen


node has degree k

Two broad classes


•homogeneous networks: light tails
•heterogeneous networks: skewed, heavy tails
Airplane route network
CAIDA AS cross section map
Topological heterogeneity
Statistical analysis of centrality measures

Broad degree
distributions
(often: power-law tails
P(k) ∝ k-γ ,
typically 2< γ <3)

No particular
Internet characteristic scale
Unbounded fluctuations
Topological heterogeneity
Statistical analysis of centrality measures:

linear scale
Poisson
vs.
Power-law
log-scale
Exp. vs. Scale-Free
Poisson distribution Power-law distribution

Exponential Network Scale-free Network


Consequences
Power-law tails
P(k) ∝ k-γ
Average=< k > = ∫k P(k)dk
Fluctuations
< k > =∫ k P(k) dk ∝ kc γ
2 2 3-

kc=cut-off due to finite-size


N → ∞ => diverging degree fluctuations
for γ < 3

Level of heterogeneity:
Other heterogeneity levels

Weights

Strengths
Main things to (immediately) measure
in a network

• Degree distribution

• Distances, average shortest path, diameter

• Clustering coefficient

• (Weights/strengths distributions)

75
Networks characteristics
Most often:

• Small diameter
• Large cohesiveness (clustering)
• Heterogeneities
Of course, this is not
the whole story...

77
Correlations

78
Statistical characterization
Multipoint degree correlations

P(k): not enough to characterize a network

Large degree nodes tend to


connect to large degree nodes
Ex: social networks

Large degree nodes tend to


connect to small degree nodes
Ex: technological networks
Statistical characterization
Multipoint degree correlations
Measure of correlations:
P(k’,k’’,…k(n)|k): conditional probability that a node of
degree k is connected to nodes of degree k’, k’’,…

Simplest case:
P(k’|k): conditional probability that a node of degree k is
connected to a node of degree k’

often inconvenient (statistical fluctuations)


Statistical characterization
Multipoint degree correlations
Practical measure of correlations:
average degree of nearest neighbors

k=4
k=4
i

k=7 k=3 ki=4


knn,i=(3+4+4+7)/4=4.5
Statistical characterization
average degree of nearest neighbors

Correlation spectrum:
putting together nodes which
have the same degree

class of degree k
Statistical characterization
case of random uncorrelated networks

P(k’|k)
• independent of k
• proba that an edge points to a node of degree k’
number of edges from nodes of degree k’
number of edges from nodes of any degree

Punc(k’|k)=k’P(k’)/< k > proportional


to k’ itself
Typical correlations

• Assortative behaviour: growing knn(k)


Example: social networks
Large sites are connected with large sites

• Disassortative behaviour: decreasing knn(k)


Example: internet
Large sites connected with small sites, hierarchical
structure
Correlations:
Clustering spectrum
•P(k’,k’’|k): cumbersome, difficult to estimate from data
•Average clustering coefficient C=average over nodes with
very different characteristics

Clustering spectrum:
putting together nodes which
have the same degree

class of degree k
Empirical clustering and
correlations

non-trivial
structures

No special
scale
Correlations in weighted
networks
Weighted assortativity

5 5
1

5
i
5

ki=5; knn,i=1.8
Weighted assortativity

1 1
5

1
i
1

ki=5; knn,i=1.8
Weighted assortativity

5 5
1

5
i
5

ki=5; si=21; knn,i=1.8 ; knn,iw=1.2: knn,i > knn,iw


Weighted assortativity

1 1
5

1
i
1

ki=5; si=9; knn,i=1.8 ; knn,iw=3.2: knn,i < knn,iw


Examples:

● Scientific collaborations: cond-mat archive;


N=12722 authors, 39967 links
● Airports' network: data by IATA; N=3863 connected
airports, 18807 links

92
Assortativity and weighted
assortativity

Airports' network

knn(k) < knnw(k): larger weights between large nodes


Assortativity and weighted
assortativity
Scientific collaborations

knn(k) < knnw(k): larger weights between large nodes


Weighted clustering coefficient

i i
wij=1
wij=5

si=16 si=8
ki=4
ciw=0.625 > ci ciw=0.25 < ci
ci=0.5
Weighted clustering coefficient

k
Average clustering coefficient
(wjk) C=∑i C(i)/N
wik
j Cw=∑i Cw(i)/N
i wij

Random(ized) weights: C = Cw
C < Cw : more weights on cliques
C > Cw : less weights on cliques

Clustering spectra
Examples:

● Scientific collaborations: cond-mat archive;


N=12722 authors, 39967 links
● Airports' network: data by IATA; N=3863 connected
airports, 18807 links

97
Clustering and weighted clustering
Scientific collaborations: C= 0.65, Cw ~ C

C(k) ~ Cw(k) at small k, C(k) < Cw(k) at large k: larger weights on cliques
Clustering and weighted clustering
Airports' network: C= 0.53, Cw=1.1 C

C(k) < Cw(k): larger weights on cliques at all scales


Participation ratio

1/ki if all weights equal

close to 1 if few weights dominate


Weights-topology correlations

101
Weighted graphs:
• Strength si = ∑j wij
• Degree ki = ∑j aij

<s(k)>=average strength of nodes of degree k


• no correlations, or random weights:
<s(k)>~ <w> k
• existence of correlations:
<s(k)> ~ kα
102
Examples:

● Scientific collaborations: cond-mat archive;


N=12722 authors, 39967 links
● Airports' network: data by IATA; N=3863 connected
airports, 18807 links
● Human contact networks

103
Strengths and degrees

104
Strengths vs degrees
Collaboration network

105
Strengths vs degrees
Air transportation network

S(k) proportional to kβ, β=1.5

Randomized weights: β=1

Strong correlations between topology and dynamics


Human contact networks
Conference Museum

k=number of distinct persons contacted


s=total time spent in contact
Random weights: s ~ <w>k
Hierarchies
A way to measure hierarchies:
K-core decomposition
graph G=(V,E)
–k-core of graph G: maximal subgraph such that for all
vertices in this subgraph have degree at least k

–vertex i has shell index k iff it belongs to the k-core


but not to the (k+1)-core

–k-shell: ensemble of all nodes of shell index k


Example

1-core

shell index 1
shell index 2
2-core
3-core shell index 3
k-core analysis
maximal shell index
size of shells versus shell index
structure of the successive cores:
degree distribution, correlations, clustering...

Remarks
 Additional investigation tool
 Easy to construct

 Focuses on more and more central parts of the network

 Visualization tool
Analysis of AS Internet maps
Analysis of Internet maps
Degree distributions
Analysis of Internet maps
Clustering and assortativity properties
Subgraphs,
communities
Subgraphs

A subgraph of G=(V,E) is a graph G’=(V’,E’)


such that
V’ ⊆ V and E’ ⊆ E
i.e.,V’ and E’ are subsets of nodes and edges of G

Special case: subgraph induced by a set of nodes=


-this set of nodes
-and all links of G between these nodes

Particular subgraphs=connected components


Cliques

A clique is a set C of nodes of G=(V,E)


such that
for all i,j ∊ C, (i,j)∊ E

Examples:
Communities and community detection
Communities: examples
Communities: examples

Social network
(Zachary’s karate club)
Communities: examples

Scientist collaboration network


(Santa Fe Institute)
Communities: examples

Protein-protein interaction network


Communities: examples

Word association network


Communities: (loose) definition

Group of nodes
that are more tightly linked together
than with the rest of the graph
Why are communities interesting?

Discover groups in social networks

Discover common interests

Understand opinion formation mechanisms


Communities: opinion formation/consensus
Communities: (loose) definition

subgraph C of G, n nodes, e links


internal density d = e/(n(n-1)/2)

For C to be a community, we expect:


• d (much) larger than density of G
• d (much) larger than the density of links towards G\C,
given by d’= e’/(n(N-n)), where e’=number of links
between nodes of C and nodes of G\C

n=7, e=10 C G\C


N-n=8, e’=2
d=0.24, d’=0.035
Communities: detection problem

Group of nodes
that are more tightly linked together
than with the rest of the graph

• How to (systematically) detect such groups?


• How to partition a graph into communities?
• How to check if it makes sense?
Communities: detection problem

Two main classes of methods:

-agglomerative: merging clusters iteratively


-divisive: splitting clusters by removing edges

Then: choice of a certain level of the dendogram

Dendogram
Communities: detection problem

Divisive methods: splitting clusters by removing edges


Girvan-Newman algorithm: use edge betweenness centrality

1. Computation of the centrality for all edges;


2. Removal of edge with largest centrality: in case of ties with other
edges, one of them is picked at random;
3. Recalculation of centralities on the new graph;
4. Iteration of the cycle from step 2.

Betweenness computation: O(E2N) => rather slow algorithm


Communities: detection problem
A quality function for graph partitions:
the modularity

Given a partition of a graph, how to quantify


if it is a “good” division into communities?

Fundamental idea: compare the density of edges in


each subgraph to a null model, i.e., a case in which no
community structure is expected
Communities: detection problem
A quality function for graph partitions:
the modularity

Modularity Q of a partition (C1,C2,....):


-Sum over all pairs of vertices
-E=number of links
Q = 1/(2E) ∑ij (aij - Pij) δ(Ci,Cj) -aij=adjacency matrix
-Pij=null model adjacency matrix
-δ(Ci,Cj): 1 if Ci=Cj, 0 otherwise

Compares the average density of the subgraphs Ci


with what it would be in the null model case
Communities: detection problem
A quality function for graph partitions: the modularity

Modularity Q of a partition (C1,C2,....):


Q = 1/(2E) ∑ij (aij - Pij) δ(Ci,Cj)

Choice of the null model:


•uniform probability p of connecting any pair of nodes i,j
•randomly linking nodes, keeping their true degrees ki
(then Pij=kikj/(2E) )

Large Q: larger intra-cluster densities of links than if nodes


were linked at random
Small Q: similar to a random graph with same degree sequence

NB: extensions to weighted and directed graphs


Communities: detection problem
A quality function for graph partitions:
the modularity

Quality function

Optimization of the quality function

“Modularity optimization algorithms”


Modularity optimization algorithms

greedy method (Newman)

• starting from N clusters: 1 node=1 cluster


• join two clusters in the way that increases most Q
• repeat
=> tends to create large communities
=> not very efficient
=> many proposed variations to get better results
Modularity optimization algorithms

“Louvain” method (Blondel et al.): (weighted graphs)


successive steps of
• local modularity optimization by grouping nodes
• aggregation of all nodes of each community into “supernodes”

NB: implemented in gephi


There are limits and drawbacks to modularity optimization

Nodes can belong to multiple communities

Many algorithms available, most often open

http://www.cfinder.org/
http://www.oslom.org/
http://www.tp.umu.se/~rosvall/code.html

For a review
S. Fortunato, Phys. Rep. 486, 75-174, 2010
(http://sites.google.com/site/santofortunato/)
Modularity resolution limit

S. Fortunato, M. Barthélemy, Resolution limit in community detection,


Proc. Natl. Acad. Sci. USA 104 (2007) 36–41.
Networks and complexity

140
Complex networks
Complex is not just “complicated”

Cars, airplanes…=> complicated, not complex

Complex (no unique definition):


•many interacting units
•no centralized authority, self-organized
•complicated at all scales
•evolving structures
•emerging properties (heavy-tails, hierarchies…)

Examples: Internet, WWW, Social nets, etc…


Example: Internet growth
Main features of complex networks
•Many interacting units
•Self-organization
•Small-world
•Scale-free heterogeneity
•Dynamical evolution
Standard graph theory
Random graphs
•Static
•Ad-hoc topology

Example: Internet topology generators


Modeling of the Internet structure with ad-hoc algorithms
tailored on the properties we consider more relevant
Statistical physics approach
Microscopic processes of the
many component units

Macroscopic statistical and dynamical


properties of the system

Cooperative phenomena Natural outcome of


Complex topology the dynamical evolution

Development of new modeling frameworks


Plan of the lecture

I. INTRODUCTION
I. Networks: definitions, statistical characterization, examples
II. Modeling frameworks
II. DYNAMICAL PROCESSES
I. Resilience, vulnerability
II. Random walks
III. Epidemic processes
IV. Social phenomena
V. Some perspectives

You might also like