Navigability Complex PDF

Download as pdf or txt
Download as pdf or txt
You are on page 1of 10

Navigability of complex networks

Mari
an Bogu
na,1 Dmitri Krioukov,2 and kc claffy2

arXiv:0709.0303v2 [physics.soc-ph] 10 Sep 2008

Departament de Fsica Fonamental, Universitat de Barcelona, Mart i Franqu`es 1, 08028 Barcelona, Spain
2
Cooperative Association for Internet Data Analysis (CAIDA), University of California,
San Diego (UCSD), 9500 Gilman Drive, La Jolla, CA 92093, USA
Routing information through networks is a universal phenomenon in both natural and manmade
complex systems. When each node has full knowledge of the global network connectivity, finding
efficient communication paths is merely a matter of distributed computation. However, in many real
networks nodes communicate efficiently even without such global intelligence. Here we show that
the peculiar structural characteristics of observable complex networks are exactly the characteristics
needed to maximize their communication efficiency without global knowledge. We also describe a
general mechanism that explains this connection between network structure and function. This
mechanism relies on the presence of a metric space hidden behind an observable network. Our findings suggest that real networks in nature have underlying metric spaces that remain undiscovered.
Their discovery would have enormous practical applications ranging from routing in the Internet
and searching social networks, to studying information flows in neural, gene regulatory networks, or
signaling pathways.

I.

INTRODUCTION

Networks are ubiquitous in all domains of science and


technology, and permeate many aspects of daily human
life [14], especially upon the rise of the information technology society [5, 6]. Our growing dependence on them
has inspired a burst of activity in the new field of network science, keeping researchers motivated to solve the
difficult challenges that networks offer. Among these, the
relation between network structure and function is perhaps the most important and fundamental. Transport is
one of the most common functions of networked systems.
Examples can be found in many domains: transport of
energy in metabolic networks, of mass in food webs, of
people in transportation systems, of information in cell
signalling processes, or of bytes across the Internet.
In many of these examples, routing or signalling of
information propagation paths through a complex network maze plays a determinant role in the transport
properties of the system. The observed efficiency of this
routing process in real networks poses an intriguing question: how is this efficiency achieved? When each element of the system has a full view of the global network
topology, finding efficient routes to target destinations is
a well-understood computational process. However, in
many networks observed in nature, including those in society and biology (signalling pathways, neural networks,
etc.), nodes efficiently find intended communication targets even though they do not possess any global view
of the system. For example, neural networks would not
function so well if they could not route specific signals to
appropriate organs or muscles in the body, although no
neurone has a full view of global inter-neurone connectivity in the brain.
In this work, we identify a general mechanism that
explains routing conductivity, or navigability of real
networks based on the concept of similarity between
nodes [710]. Specifically, intrinsic characteristics of
nodes define a measure of similarity between them, which

we abstract as a hidden distance. Taken together, hidden distances define a hidden metric space for a given
network. Our recent work shows that these spaces explain the observed structural peculiarities of several real
networks, in particular social and technological ones [11].
Here we show that this underlying metric structure can
be used to guide the routing process, leading to efficient
communication without global information in arbitrarily
large networks. Our analysis reveals that, remarkably,
real networks satisfy the topological conditions that maximise their navigability within this framework. Therefore, hidden metric spaces offer explanations of two open
problems in complex networks science: the communication efficiency networks so often exhibit, and their unique
structural characteristics. Our results have enormous
consequences for network science and engineering, opening the possibility, for example, to design efficient routing
and searching strategies for the Internet and other technological or social networks.

II.

NODE SIMILARITY AND HIDDEN METRIC


SPACES

Our work is inspired by the seminal work of sociologist


Stanley Milgram on the small world problem. The small
world paradigm refers to the existence of short chains
of acquaintances among individuals in societies [17]. At
Milgrams time, direct proof of such a paradigm was impossible due to the lack of large databases of social contacts, so Milgram conceived an experiment to analyse
the small world phenomenon in human social networks.
Randomly chosen individuals in the United States were
asked to route a letter to an unknown recipient using only
friends or acquaintances that, according to their judgement, seemed most likely to know the intended recipient.
The outcome of the experiment revealed that, without
any global network knowledge, letters reached the target
recipient using, on average, 5.2 intermediate people[22],

FIG. 1: How hidden metric spaces influence the structure and function of complex networks. The smaller the
distance between two nodes in the hidden metric space, the more likely they are connected in the observable network topology.
If node A is close to node B, and B is close to C in the hidden space, then A and C are necessarily close, too, because of the
triangle inequality in the metric space. Therefore, triangle ABC exists in the network topology with high probability, which
explains the strong clustering observed in real complex networks. The hidden space also guides the greedy routing process: if
node A wants to reach node F (hidden distance AF is the black dashed line), it checks the hidden distances between F and
its two neighbours B and C. Distance CF (green dashed line) is smaller than BF (red dashed line), therefore A forwards
information to C. Node C then performs similar calculations and selects its neighbour D as the next hop on the path to F .
Node D is directly connected to F . The result is path A C D F shown by green edges in the observable topology.

demonstrating that social acquaintance networks were indeed small worlds.


The small world property can be easily induced by
adding a small number of random connections to a large
world network [12]. More striking is the fact that social networks are navigable without global information.
Indeed, the only information that people used to make
their routing decisions in Milgrams experiment was a set
of descriptive attributes of the destined recipient, such as
place of living and occupation. People then determined
who among their contacts was socially closest to the
target. The success of the experiment indicates that social distances among individuals even though they may
be difficult to define mathematically play a role in shaping the network architecture and that, at the same time,
these distances can be used to navigate the network.
However, it is not clear how this coupling between the
structure and function of the network leads to efficiency
of the search process, or what the minimum structural
requirements are to facilitate such efficiency [18].
In this work, we show how network navigability depends on the structural parameters characterising the
two most prominent and common properties of real complex networks: (1) scale-free (power-law) node degree dis-

tributions characterising the heterogeneity in the number


of connections that different nodes have, and (2) clustering, a measure of the number of triangles in the network
topology. We assume the existence of a hidden metric
space, an underlying geometric frame that contains all
nodes of the network, shapes its topology, and guides
routing decisions, as illustrated in Fig. 1. Nodes are connected in the observable topology, but a full view of their
global connectivity is not available at any node. Nodes
are also positioned in the hidden metric space and identified by their co-ordinates in it. Distances between nodes
in this space abstract their similarity [710]. These distances influence both the observable topology and routing function: (1) the smaller the distance between two
nodes in the hidden space, i.e., the more similar the two
nodes, the more likely they are connected in the observable topology; (2) nodes also use hidden distances to select, as the next hop, the neighbour closest to the destination in the hidden space. Kleinberg introduced the term
greedy routing to describe this forwarding process [18].
We use the class of network models developed in recent work [11]. They generate networks with topologies
similar to those of real networks small-world, scale-free,
and with strong clustering and, simultaneously, with

=2.2
3

10

10

network size (N)

10

10

2.2

2.4

2.6

2.8

hidden metric spaces lying underneath. The simplest


model in this class (the details are in Appendix A) uses a
one-dimensional circle as the underlying metric space, in
which nodes are uniformly distributed. The model first
assigns to each node its expected degree k, drawn from a
power-law degree distribution P (k) k , with > 2,
and then connects each pair of nodes with connection
probability r(d; k, k 0 ) that depends both on the distance
d between the two nodes in the circle and their assigned
degrees k and k 0 ,
,

=2.1
=2.2
=2.3
=2.4
=2.5

=2.6
=2.7
=2.8
=2.9
=3.0
3

10

=1.1
4

(1)

where > 1 and dc kk 0 , which means that the probability of link connection between two nodes in the network decreases with the hidden distance between them
(as d ) and increases with their degrees (as (kk 0 ) ).
These two properties have a clear interpretation. The
connection cost increases with hidden distance, thus discouraging long-range links. However, in making connections, rich (well-connected, high-degree) nodes care less
about distances (connection costs) than poor nodes. Further, the characteristic distance scale dc provides a coupling between node degrees and hidden distances, and ensures the following three topological characteristics that
we commonly see in real networks. First, pairs of richly
connected, high-degree nodes hubs are connected with
high probability regardless of the hidden distance between them because their characteristic distance dc is
so large that any actual distance d between them will be
short in comparison: regardless of d, connection probability r in Eq. (1) is close to 1 if dc is large. Second, pairs
of low-degree nodes will not be connected unless the hidden distance d between them is short enough to compare
with the small value of their characteristic distance dc .
Third, following similar arguments, pairs composed of

=1.1
=1.5
=2.0
=3.0
=5.0

0.4

0.2

10

2.2

2.4

2.8

2.6

degree exponent ()
3

0.6

0.5

0.4

=5.0
0.3
2
10

0.6

10

network size (N)

0.7

FIG. 2: Average length of greedy-routing paths. The


left plot shows the average hop length of successful paths, ,
as a function of the network size N for different values of
and . Results for values of > 2.5 look similar but with
longer paths and are omitted for clarity. In all cases, the
path length grows polylogarithmically with the network size:
the observed values of are fit well by (N ) = A[log N ]
(solid lines), where A and are some constants. The right
plot shows as a function of and for networks of fixed
size N 105 . The effect of the two parameters on average
path length is straightforward: paths are shorter for smaller
exponents and stronger clustering (larger s).

0
2
10

degree exponent ()

r(d; k, k 0 ) r(d/dc ) = (1 + d/dc )

0.1

success probability (ps)

=1.5
=2.5
=3.5
=4.5

degree exponent ()

=2.5

success probability (ps)

=1.5
=2.5
=3.5
=4.5

15

success probability (ps)

average hop length ()

average hop length ()

0.2
8

10

10

network size (N)

10

non-navigable region
Web of trust

2.5

Metabolic
Internet
2

navigable region
0

0.1

0.2

0.3

0.4

0.5

Airports
0.6

0.7

clustering coefficient (C)

FIG. 3: Success probability of greedy routing. The two


plots on the left show the success probability ps as a function
of network size N for different values of with weak (top)
and strong (bottom) clustering. The top-right plot shows
ps as a function of and for networks of fixed size N
105 . In the bottom-right plot, parameter is mapped to the
clustering coefficient C [12] by computing C for each network
with given and . For each value of C, there is a critical
value of = c (C) such that the success ratio in networks
with this C and > c (C) decreases with the network size
N (ps (N ) 0), while ps (N ) reaches a constant value for
N

large N in networks with < c (C). The solid line in the


plot shows these critical values c (C). It separates the low-,
high-C navigable region, in which greedy routing sustains in
the large-graph limit, from the high-, low-C non-navigable
region, where the efficiency of greedy routing degrades for
large networks. The same plot labels the measured values of
and C for several real complex networks. Internet is the
global topology of the Internet at the level of Autonomous
Systems as seen by the Border Gateway Protocol (BGP) [13];
Web of trust is the Pretty Good Privacy (PGP) social network
of mutual trust relationships [14]; Metabolic is the network of
metabolic reactions of E. coli [15]; and Airports is the network
of the public air transportation system [16].

hubs and low-degree nodes are connected only if they are


located at moderate hidden distances.
The parameter in Eq. (1) determines the importance
of hidden distances for node connections. The larger ,
the more preferred are connections between nodes close
in the hidden space. Consequently, the triangle inequality in the metric space leads to stronger clustering in
the network, cf. Fig. 1. Clustering has a clear interpretation in our approach as a reflection of the networks
metric strength: the more powerful is the influence of
the networks underlying metric space on the observable
topology, the more strongly it is clustered.

4
III.

NAVIGABILITY OF MODELLED
NETWORKS

We use the model to generate scale-free networks with


different values of power-law degree distribution exponent and clustering strength , covering the observed
values in a vast majority of documented complex networks [13]. We then simulate greedy routing for a large
sample of paths on all generated networks, and compare
the following two navigability parameters: 1) the average hop length from source to destination of successful
greedy-routing paths, and 2) the success ratio ps , defined as the percentage of successful paths. Unsuccessful
paths are paths that get stuck at nodes without neighbours closer to the destination in the hidden space than
themselves. These nodes usually have small degrees. See
Appendix B for simulation details.
Fig. 2 shows the impact of the networks degree distribution and clustering on the average length of greedy
routing paths. We observe a straightforward dependency:
paths are shorter for smaller exponents and stronger
clustering (larger s). The dependency of the success
ratio (the fraction of successful paths) ps on the two
topology parameters and is more intertwined. Fig. 3
shows that the effect of one parameter, , on the success
ratio depends on the other parameter, the level of clustering. If clustering is weak (low ), the percentage of
successful paths decays with network size N regardless
of the value of (Fig. 3 top-left). However, with strong
clustering (large ), the percentage of successful paths
increases with N and attains a maximum for large networks if . 2.6, whereas it degrades for large networks
if > 2.6 (Fig. 3 bottom-left). Fig. 3 top-right shows
this effect for networks of the same size (N = 105 ) with
different and . The value of = 2.6 0.1 maximises
the number of successful paths once clustering is above
a threshold, 1.5. These observations mean that for
a fixed clustering strength, there is a critical value of the
exponent (Fig. 3 bottom-right) below which networks
remain navigable as their size increases, but above which
their navigability deteriorates with their size.
In summary, strong clustering improves both navigability metrics. We also find a delicate trade-off between
values of close to 2 minimising path lengths, and higher
values not exceeding 2.6 maximising the percentage of successful paths. We explain these findings in the
next section, but we note here that qualitatively, this
navigable parameter region contains a majority of complex networks observed in reality [13], as confirmed in
Fig. 3 (bottom-right), where we juxtapose few paradigmatic examples of communication, social, biological, and
transportation networks vs. the identified navigable region of clustering and degree distribution exponent.

IV.

AIR TRAVEL BY GREEDY ROUTING AS


AN EXPLANATION

We illustrate the greedy routing function, and the


structure of networks conductive to such routing, with
an example of passenger air travel. Suppose we want
to travel from Toksook Bay, Alaska, to Ibiza, Spain, by
the public air transportation network. Nodes in this network are airports, and two airports are connected if there
is at least one flight between them. We travel according to the greedy routing strategy using geography as
the underlying metric space. At each airport we choose
the next-hop airport geographically closest to the destination. Under these settings, our journey goes first to
Bethel, then to Anchorage, to Detroit, over the Atlantic
to Paris, then to Valencia and finally to Ibiza, see Fig. 4.
The sequence and sizes of airport hops reveal the structure of our greedy-routing path. The path proceeds from
a small airport to a local hub at a small distance, from
there to a larger hub at a larger distance, and so on until we reach Paris. At that point, when the distance to
the destination becomes sufficiently small, greedy routing
leads us closer to our final destination by choosing not
another hub, but a less connected neighbouring airport.
We observe that the navigation process has two, somewhat symmetric phases. The first phase is a coarsegrained search, travelling longer and longer distances per
hop toward hubs, thus zooming out from the starting
point. The second phase corresponds to a fine-grained
search, zooming in onto the destination. The turning
point between the two phases appears naturally: once we
are in a hub near the destination, the probability that it
is connected to a bigger hub closer to the destination
sharply decreases, but at this point we do not need hubs
anyway, and greedy routing directs us to smaller airports
at shorter distances next to the destination.
This zoom out/zoom in mechanism works efficiently
only if the coupling between the airport network topology and the underlying geography satisfies the following two conditions: the sufficient hubs condition and
the sufficient clustering condition. The first condition
ensures that a network has enough hub airports (highdegree nodes) to provide an increasing sequence during
the zoom out phase. This condition is fulfilled by the real
airport network and by other scale-free networks with
small values of degree distribution exponent , because
the smaller the , the larger the proportion of hubs in
the network.
However, the presence of many hubs does not ensure
that greedy routing will use them. Unlike humans, who
can use their knowledge of airport size to selectively
travel via hub airports, greedy routing uses only one constraint at each hop: minimise distance to the destination.
Therefore, the network topology must satisfy the second
condition, which ensures that Bethel is larger than Toksook Bay, Anchorage larger than Bethel, and so on. More
generally, this condition is that the next greedy hop from
a remote low-degree node likely has a higher degree, so

Logarithm of airport degree

Paris
2.5

Detroit
2

1.5

Anchorage
Bethel

Valencia

0.5

9000

Ibiza
Toksook bay
8000

7000

6000

5000

4000

3000

2000

1000

Distance to Ibiza (Km)

FIG. 4: Greedy-routing path from Toksook Bay to Ibiza. At each intermediate airport, the next hop is the airport
closest to Ibiza geographically. Sizes of symbols representing the airports are proportional to the logarithm of their degrees.
The inset shows the changing distance to Ibiza (in the x axis) and the degree of the visited airports (y axis, in logarithmic
scale).

that greedy paths typically head first toward the highly


connected network core. But the network metric strength
is exactly the required property: preference for connections between nodes nearby in the hidden space means
that low-degree nodes are less likely to have connectivity
to distant low-degree nodes; only high-degree nodes can
have long-range connection that greedy routing will effectively select. The stronger this coupling between the
metric space and topology (the higher in Eq. (1)), the
stronger the clustering in the network.
To illustrate, imagine an airport network without sufficient clustering, one where the airport closest to our
destination (Ibiza) among all airports connected to our
current node (Toksook Bay, Alaska) is not Bethel, which
is bigger than Toksook Bay, but Nightmute, Alaska, a
nearby airport of comparable size to Toksook Bay. As
greedy routing first leads us to Nightmute, then to another small nearby airport, and then to another, we can
no longer get to Ibiza in few hops. Worse, travelling via
these numerous small airports, we could reach one with
no connecting flights heading closer to Ibiza. Our greedy
routing would be stuck at this airport with an unsuccessful path.
These factors explain why the most navigable topologies correspond to scale-free networks with small exponents of the degree distribution, i.e., a large number of
hubs, and with strong clustering, i.e., strong coupling between the hidden geometry and the observed topology.

V.

THE STRUCTURE OF GREEDY-ROUTING


PATHS

We observe the discussed zoom-out/zoom-in mechanism in analytical calculations and numerical simulations. Specifically, we calculate in Appendix C the probability that the next hop from a node of degree k located
at hidden distance d from the destination has a larger
degree k 0 > k, in which case the path moves toward the
high-degree network core, see Fig. 5. In the most navigable case, with small degree-distribution exponent and
strong clustering, the probability of increasing the node
degree along the path is high at low-degree nodes, and
sharply decreases to zero after reaching a node of a critical degree value, which increases with distance d. This
observation implies that greedy-routing paths first propagate up to higher-degree nodes in the network core and
then exit the core toward low-degree destinations in the
periphery. In contrast, with low clustering, paths are
less likely to find higher-degree nodes regardless of the
distance to the destination. This path structure violates
the zoom-out/zoom-in pattern required for efficient navigation.
Fig. 6 shows the structure of greedy-routing paths
in simulations, further confirming our analysis. We
again see that for small degree-distribution exponents
and strong clustering (upper left and middle left), the
routing process quickly finds a way to the high-degree
core, makes a few hops there, and then descends to a lowdegree destination. In the other, non-navigable cases, the

Pup(k,d)

6
1

0.8

0.8

0.6

0.6

0.4

0.4

0.2

Pup(k,d)

0
0
10
1

0.2
2

10

10

10

0
0
10
1

0.8

0.8

0.6

0.6

0.4

0.4

0.2
0
0
10
1

Pup(k,d)

=5.0, =2.2

0.8

=5.0, =2.5
1

10

10

=5.0, =3.0

0
0
10
1

0.6

0.4

0.4

0.2

0.2
1

10

10

10

=1.1, =2.5

10

10

0
0
10

10

10

10

=1.1, =3.0

0.8

0.6

0
0
10

10

0.2
2

10

=1.1, =2.2

d=100
d=1000
d=10000
d=100000
1

10

10

10

FIG. 5: Probability that greedy routing travels to


higher-degree nodes. More precisely, the probability
Pup (k, d) that the greedy-routing next hop after a node of
degree k located at distance d from a destination has higher
degree k0 k and is closer to the destination. The distance
legend in the right-bottom plot applies to all the plots. The
results are for the large-graph limit N .

process can almost never get to the core of high-degree


nodes. Instead, it wanders in the low-degree periphery
increasing the probability of getting lost at low-degree
nodes.

VI.

CONCLUSION

In this paper, we have shown that the existence of


hidden metric spaces, coupled with heterogeneous degree
distributions and strong clustering, explains the surprising navigability of real networks. Discovery of the explicit
structure of such hidden metric spaces underlying actual
networks may have profound practical implications. In
social or some communication networks (e.g., the Web,
overlay, or online social networks) hidden spaces would
yield efficient strategies for searching specific individuals
or content. The metric spaces hidden under some biological networks (such as neural, gene regulatory networks,
or signalling pathways) can become a powerful tool in
studying the structure of information or signal flows in
these networks. Even more promising and immediately
applicable is the potential use of hidden metric spaces
in the global Internet. Its routing architecture bears

FIG. 6: The structure of greedy-routing paths. We visualise the results of our simulation of greedy routing in modelled networks with different values of and observed in real
complex networks. The hidden distance between the starting
point and the destination is always approximately 104 , and
the network size N and number of attempted paths is always
105 for each (, ) combination, but the number of successful
paths and path hop-lengths vary, cf. Figs. 2,3. All paths start
and end at low-degree nodes located, respectively, in the leftand right-bottom corners of the diagrams (see top left plot).
For each (, ) we depict a single typical path in black and
use colour to indicate how often paths included a node of degree k located at distance d from the destination (blue/red
indicates exponentially less/more visits to those nodes). The
simulations confirm that only when is small and is large
does the average path structure follow the zoom-out/zoomin pattern that characterises successful greedy routing in real
networks, e.g., the airport network in our example.

long-standing scalability problems related to the need for


routers to maintain a coherent view of the increasingly
dynamic and ever-growing global Internet topology [19].
Greedy routing over the Internets hidden metric space
would remove this scalability bottleneck, as it does not
require any global topological awareness. In general, we

7
believe that the present and future work on hidden metric
spaces and network navigability will deepen our understanding of the fundamental laws describing relationships
between structure and function of complex networks.

Acknowledgments

We thank M. Angeles
Serrano for useful comments and
discussions. This work was supported in part by DGES
grants No. FIS2004-05923-CO2-02 and FIS2007-66485C02-02, Generalitat de Catalunya grant No. SGR00889,
the Ram
on y Cajal program of the Spanish Ministry of
Science, and by NSF CNS-0434996 and CNS-0722070, by
DHS N66001-08-C-2029, and by Cisco Systems.

APPENDIX A: A MODEL WITH THE CIRCLE


AS A HIDDEN METRIC SPACE

In our model we place all nodes on a circle by assigning them a random variable , i.e., their polar angle, distributed uniformly in [0, 2). The circle radius R grows
linearly with the total number of nodes N , 2R = N ,
in order to keep the average density of nodes on the circle fixed to 1. We next assign to each node its expected
degree drawn from some distribution (). The connection probability between two nodes with hidden coordinates (, ) and (0 , 0 ) takes the form
r(, ; 0 , 0 ) =


1+

d(, 0 )
0


,

( 1)
, (A1)
2hki

where d(, 0 ) is the geodesic distance between the two


node on the circle, while hki is the average degree. One
can show that the average degree of nodes with hidden

variable , k(),
is proportional to .[20] This proportionality guarantees that the shape of the node degree distribution P (k) in generated networks is approximately the same as the shape of (). The choice of
() = ( 1)01 , > 0 ( 2)hki/( 1),
> 2, generates random networks with a power-law degree distribution of the form P (k) k .

APPENDIX B: NUMERICAL SIMULATIONS

each network instance G, we randomly select 106 sourcedestination pairs (a, b) and execute the greedy-routing
process for them starting at a and selecting, at each hop
h, the next hop as the hs neighbour in G closest to b
in the circle. If for a given (a, b), this process visits the
same node twice, then the corresponding path leads to a
loop and is unsuccessful. We then average the measured
values of path hop lengths and percentage of successful
paths ps across all pairs (a, b) and networks G for the
same (, , N ).

APPENDIX C: THE ONE-HOP PROPAGATOR


OF GREEDY ROUTING

To derive the greedy-routing propagator in this appendix, we adopt a slightly more general formalism than
in the main text. Specifically, we assume that nodes
live in a generic metric space H and, at the same time,
have intrinsic attributes unrelated to H. Contrary to
normed spaces or Riemannian manifolds, generic metric
spaces do not admit any coordinates, but we still use the
coordinate-based notations here to simplify the exposition below, and denote by x nodes coordinates in H and
by all their other, non-geometric attributes. In other
words, hidden variables x and in this general formalism represent some collections of nodes geometric and
non-geometric hidden attributes, not just a pair of scalar
quantities. Therefore, integrations over x and in what
follows stand merely to denote an appropriate form of
summation in each concrete case.
As in the main text, we assume that x and are independent random variables so that the probability density
to find a node with hidden variables (x, ) is
(x, ) = (x)()/N,

where () is the probability density of the variables


and (x) is the concentration of nodes in H. The total
number of nodes is
Z
N=
(x)dx,
(C2)
H

and the connection probability between two nodes is an


integrable decreasing function of the hidden distance between them,
r(x, ; x0 , 0 ) = r[d(x, x0 )/dc (, 0 )],

Our model has three independent parameters: exponent of power-law degree distributions, clustering
strength , and average degree hki. We fix the latter to
6, which is roughly equal to the average degree of some
real networks of interest [13, 14], and vary [2.1, 3]
and [1.1, 5], covering their observed ranges in documented complex networks [13]. For each (, ) pair, we
produce networks of different sizes N [103 , 105 ] generating, for each (, , N ), a number of different network
instancesfrom 40 for large N to 4000 for small N . In

(C1)

(C3)

where dc (, 0 ) a characteristic distance scale that depends on and 0 .


We define the one-step propagator of greedy routing as
the probability G(x0 , 0 |x, ; xt ) that the next hop after
a node with hidden variables (x, ) is a node with hidden variables (x0 , 0 ), given that the final destination is
located at xt .
To further simplify the notations below, we label the
set of variables (x, ) as a generic hidden variable h and

8
by {h, ht , h1 , , hN 2 } {h, ht ; {hj }} with j =
1, , N 2, where h and ht denote the hidden variables of the current hop and the destination, respectively.
In this particular network configuration, the probability
that the current nodes next hop is a particular node i
with hidden variable hi is the probability that the current node is connected to i but disconnected to all nodes
that are closer to the destination than i,

undo this notation change at the end of the calculations


according to the following rules:
(x, )
(x, )
dxd
r(x, ; x0 , 0 )

h
(h)
dh
r(h, h0 ).

(C4)

We begin the propagator derivation assuming that a


particular network instance has a configuration given

Prob(i|h, ht ; {hj }) = r(h, hi )

N
2
Y

[d(hi ,ht )d(hj ,ht )]

[1 r(h, hj )]

(C5)

j(6=i)=1

where () is the Heaviside step function.


Taking the average over all possible configurations
{h1 , , hi1 , hi+1 , , hN 2 } excluding node i, we obtain

N 3
1
Prob(i|h, ht ; hi ) = r(h, hi ) 1
k(h|hi , ht )
,
N 3
(C6)
where
Z

k(h|h
(h0 )r(h, h0 )dh0
i , ht ) = (N 3)
d(hi ,ht )<d(h0 ,ht )

(C7)
is the average number of connections between the current
node and nodes closer to the destination than node i,
excluding i and t.
The probability that the next hop has hidden variable
h0 , regardless of its label, i.e., index i, is
Prob(h0 |h, ht ) =

N
2
X

(h0 )Prob(i|h, ht ; h0 ).

(C8)

tity. Taking the limit of large N , the above expression


simplifies to

|x0 , xt ) =
k(x,

(C10)

We now undo the notation change and express this


propagator in terms of our mixed coordinates:
G(x0 , 0 |x, ; xt ) =



0
d(x, x0 )
(x0 )( 0 )

r
ek(x,|x ,xt ) ,

dc (, 0 )
1 ek(x,)
(C11)

with

Z
dy

In the particular case of the S1 model, we can express


this propagator in terms of relative hidden distances instead of absolute coordinates. Namely, G(d0 , 0 |d, ) is
the probability that an -labeled node at hidden distance

N (h0 )r(h, h0 )ek(h|h ,ht )


.

1 ek(h)

G(h0 |h, ht ) =

Z
d(x0 ,xt )>d(y,xt )

(C9)

Yet, this equation is not a properly normalized probability density function for the variable h0 since node h can
have degree zero with some probability. If we consider
only nodes with degrees greater than zero, then the nor
malization factor is given by 1 ek(h) . Therefore, the
properly normalized propagator is finally

i=1
0

In the case of sparse networks, k(h|h


, ht ) is a finite quan-

Prob(h0 |h, ht ) = N (h0 )r(h, h0 )ek(h|h ,ht ) .

d 0 (y)( 0 )r


d(x, y)
.
dc (, 0 )

(C12)

d from the destination has as the next hop an 0 -labeled


node at hidden distance d0 from the destination. After
tedious calculations, the resulting expression reads:

G(d0 , 0 |d, ) =

(1)
0

(1)
0

dd0
(1+
0)

1
0

d d
(1+
0)

+
+

d+d0
(1+
0)

exp

(1)
1

d+d
B( dd
, 2, 2 ) B( , 2, 2 )

exp

(1)
1

2
2

io

; d0 d
,

d+d
(1+
0)

0
d
B( d
,

2, 2 )

0
B( d+d
,

2, 2 )

io

;d > d

(C13)

0.8

=1.1, =2.2

Pup(,d)

Pup(,d)

0.8
0.6

0.6

0.4

B(z, a, b) z

0.2

=5.0, =2.2

0
-3
10

-2

10

-1

10

/d

10

10

0
-3
10

10

-2

10

1/2

-1

10

/d
1

0.8

0.8

10

10

10

1/2

=1.1, =2.5

Pup(,d)

0.6

0.6

0.4

0.4

0.2
0
-3
10

-2

10

-1

10

/d

10

10

0
-3
10

10

Z
-2

10

1/2

-1

10

/d

0.8

0.8

ta1 (1 + t)b1 dt,

(C14)

which is somewhat
R z similar to the incomplete beta function B(z, a, b) = 0 ta1 (1 t)b1 dt.
One of the informative quantities elucidating the structure of greedy-routing paths is the probability Pup (, d)
that the next hop after an -labeled node at distance d
from the destination has a higher value of . The greedyrouting propagator defines this probability as

0.2

=5.0, =2.5

0.4

0.2

Pup(,d)

where we have defined function

10

10

10

1/2

Pup (, d) =

d
0

dd0 G(d0 , 0 |d, ),

(C15)

d0 <d

[1] R. Albert and A.-L. Barab


asi, Rev. Mod. Phys. 74, 47
(2002).
[2] M. E. J. Newman, SIAM Rev 45, 167 (2003).
[3] S. N. Dorogovtsev and J. F. F. Mendes, Evolution of
networks: From biological nets to the Internet and WWW
(Oxford University Press, Oxford, 2003).
[4] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.U. Hwang, Phys. Rep. 424, 175 (2006).
[5] M. Castells, The rise of the network society (Blackwell
Publishing, Oxford, 1996).
[6] R. Pastor-Satorras and A. Vespignani, Evolution and
Structure of the Internet. A Statistical Physics Approach
(Cambridge University Press, Cambridge, 2004).
[7] D. J. Watts, P. S. Dodds, and M. E. J. Newman, Science
296, 1302 (2002).
[8] M. Girvan and M. E. J. Newman, Proc Natl Acad Sci
USA 99, 7821 (2002).
[9] F. Menczer, Proc Natl Acad Sci USA 99, 14014 (2002).
[10] E. A. Leicht, P. Holme, and M. E. J. Newman, Phys Rev

E 73, 026120 (2006).


[11] M. A. Serrano, D. Krioukov, and M. Bogu
na
, Phys. Rev.
Lett. p. (in press) (2008).
[12] D. J. Watts and S. H. Strogatz, Nature 393, 440 (1998).
[13] P. Mahadevan, D. Krioukov, M. Fomenkov, B. Huffaker,
X. Dimitropoulos, kc claffy, and A. Vahdat, Comput
Commun Rev 36, 17 (2006).
[14] M. Bogu
na
, R. Pastor-Satorras, A. Daz-Guilera, and
A. Arenas, Phys Rev E 70, 056122 (2004).
[15] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L.
Barab
asi, Nature 407, 651 (2000).
[16] A. Barrat, M. Barthelemy, R. Pastor-Satorras, and
A. Vespignani, Proc. Natl. Acad. Sci. USA 101, 3747
(2004).
[17] S. Milgram, Psychol Today 1, 61 (1967).
[18] J. Kleinberg, Nature 406, 845 (2000).
[19] D. Meyer, L. Zhang, and K. Fall, eds., Report from the
IAB Workshop on Routing and Addressing (The Internet
Architecture Board, 2007).

0.6

0.6

0.4
0.2

d=100
d=1000
d=10000
d=100000

Pup(,d)

Pup(,d)

FIG. 7: Probability Pup (/d1/2 , d).

and we show Pup (/d1/2 , d) in Fig. 7. We see that the


proper scaling of c d1/2 , where c is the critical
value of above which Pup (, d) quickly drops to zero,
is present only when clustering is strong. Furthermore,
Pup (, d) is an increasing function of for small s only
when the degree distribution exponent is close to 2.
A combination of these two effects guarantees that the
layout of greedy routes properly adapts to increasing distances or graph sizes, thus making networks with strong
clustering and s greater than but close to 2 navigable.

=1.1, =3.0

0.4
0.2

=5.0, =3.0

0
-3
10

-2

10

-1

10

/d

10
1/2

10

10

0
-3
10

-2

10

-1

10

/d

10

10

10

1/2

10
[20] M. Bogu
na
and R. Pastor-Satorras, Phys. Rev. E 68,
036112 (2003).
[21] J. M. Guiot, Eur J of Soc Psychol 6, 503 (1976).
[22] The percentage of successfully delivered messages in

the original experiments was around 30% [17], while


their subsequent simple modifications improved it up to
90% [21].

You might also like