Navigability Complex PDF
Navigability Complex PDF
Navigability Complex PDF
Mari
an Bogu
na,1 Dmitri Krioukov,2 and kc claffy2
Departament de Fsica Fonamental, Universitat de Barcelona, Mart i Franqu`es 1, 08028 Barcelona, Spain
2
Cooperative Association for Internet Data Analysis (CAIDA), University of California,
San Diego (UCSD), 9500 Gilman Drive, La Jolla, CA 92093, USA
Routing information through networks is a universal phenomenon in both natural and manmade
complex systems. When each node has full knowledge of the global network connectivity, finding
efficient communication paths is merely a matter of distributed computation. However, in many real
networks nodes communicate efficiently even without such global intelligence. Here we show that
the peculiar structural characteristics of observable complex networks are exactly the characteristics
needed to maximize their communication efficiency without global knowledge. We also describe a
general mechanism that explains this connection between network structure and function. This
mechanism relies on the presence of a metric space hidden behind an observable network. Our findings suggest that real networks in nature have underlying metric spaces that remain undiscovered.
Their discovery would have enormous practical applications ranging from routing in the Internet
and searching social networks, to studying information flows in neural, gene regulatory networks, or
signaling pathways.
I.
INTRODUCTION
we abstract as a hidden distance. Taken together, hidden distances define a hidden metric space for a given
network. Our recent work shows that these spaces explain the observed structural peculiarities of several real
networks, in particular social and technological ones [11].
Here we show that this underlying metric structure can
be used to guide the routing process, leading to efficient
communication without global information in arbitrarily
large networks. Our analysis reveals that, remarkably,
real networks satisfy the topological conditions that maximise their navigability within this framework. Therefore, hidden metric spaces offer explanations of two open
problems in complex networks science: the communication efficiency networks so often exhibit, and their unique
structural characteristics. Our results have enormous
consequences for network science and engineering, opening the possibility, for example, to design efficient routing
and searching strategies for the Internet and other technological or social networks.
II.
FIG. 1: How hidden metric spaces influence the structure and function of complex networks. The smaller the
distance between two nodes in the hidden metric space, the more likely they are connected in the observable network topology.
If node A is close to node B, and B is close to C in the hidden space, then A and C are necessarily close, too, because of the
triangle inequality in the metric space. Therefore, triangle ABC exists in the network topology with high probability, which
explains the strong clustering observed in real complex networks. The hidden space also guides the greedy routing process: if
node A wants to reach node F (hidden distance AF is the black dashed line), it checks the hidden distances between F and
its two neighbours B and C. Distance CF (green dashed line) is smaller than BF (red dashed line), therefore A forwards
information to C. Node C then performs similar calculations and selects its neighbour D as the next hop on the path to F .
Node D is directly connected to F . The result is path A C D F shown by green edges in the observable topology.
=2.2
3
10
10
10
10
2.2
2.4
2.6
2.8
=2.1
=2.2
=2.3
=2.4
=2.5
=2.6
=2.7
=2.8
=2.9
=3.0
3
10
=1.1
4
(1)
where > 1 and dc kk 0 , which means that the probability of link connection between two nodes in the network decreases with the hidden distance between them
(as d ) and increases with their degrees (as (kk 0 ) ).
These two properties have a clear interpretation. The
connection cost increases with hidden distance, thus discouraging long-range links. However, in making connections, rich (well-connected, high-degree) nodes care less
about distances (connection costs) than poor nodes. Further, the characteristic distance scale dc provides a coupling between node degrees and hidden distances, and ensures the following three topological characteristics that
we commonly see in real networks. First, pairs of richly
connected, high-degree nodes hubs are connected with
high probability regardless of the hidden distance between them because their characteristic distance dc is
so large that any actual distance d between them will be
short in comparison: regardless of d, connection probability r in Eq. (1) is close to 1 if dc is large. Second, pairs
of low-degree nodes will not be connected unless the hidden distance d between them is short enough to compare
with the small value of their characteristic distance dc .
Third, following similar arguments, pairs composed of
=1.1
=1.5
=2.0
=3.0
=5.0
0.4
0.2
10
2.2
2.4
2.8
2.6
degree exponent ()
3
0.6
0.5
0.4
=5.0
0.3
2
10
0.6
10
0.7
0
2
10
degree exponent ()
0.1
=1.5
=2.5
=3.5
=4.5
degree exponent ()
=2.5
=1.5
=2.5
=3.5
=4.5
15
0.2
8
10
10
10
non-navigable region
Web of trust
2.5
Metabolic
Internet
2
navigable region
0
0.1
0.2
0.3
0.4
0.5
Airports
0.6
0.7
4
III.
NAVIGABILITY OF MODELLED
NETWORKS
IV.
Paris
2.5
Detroit
2
1.5
Anchorage
Bethel
Valencia
0.5
9000
Ibiza
Toksook bay
8000
7000
6000
5000
4000
3000
2000
1000
FIG. 4: Greedy-routing path from Toksook Bay to Ibiza. At each intermediate airport, the next hop is the airport
closest to Ibiza geographically. Sizes of symbols representing the airports are proportional to the logarithm of their degrees.
The inset shows the changing distance to Ibiza (in the x axis) and the degree of the visited airports (y axis, in logarithmic
scale).
V.
We observe the discussed zoom-out/zoom-in mechanism in analytical calculations and numerical simulations. Specifically, we calculate in Appendix C the probability that the next hop from a node of degree k located
at hidden distance d from the destination has a larger
degree k 0 > k, in which case the path moves toward the
high-degree network core, see Fig. 5. In the most navigable case, with small degree-distribution exponent and
strong clustering, the probability of increasing the node
degree along the path is high at low-degree nodes, and
sharply decreases to zero after reaching a node of a critical degree value, which increases with distance d. This
observation implies that greedy-routing paths first propagate up to higher-degree nodes in the network core and
then exit the core toward low-degree destinations in the
periphery. In contrast, with low clustering, paths are
less likely to find higher-degree nodes regardless of the
distance to the destination. This path structure violates
the zoom-out/zoom-in pattern required for efficient navigation.
Fig. 6 shows the structure of greedy-routing paths
in simulations, further confirming our analysis. We
again see that for small degree-distribution exponents
and strong clustering (upper left and middle left), the
routing process quickly finds a way to the high-degree
core, makes a few hops there, and then descends to a lowdegree destination. In the other, non-navigable cases, the
Pup(k,d)
6
1
0.8
0.8
0.6
0.6
0.4
0.4
0.2
Pup(k,d)
0
0
10
1
0.2
2
10
10
10
0
0
10
1
0.8
0.8
0.6
0.6
0.4
0.4
0.2
0
0
10
1
Pup(k,d)
=5.0, =2.2
0.8
=5.0, =2.5
1
10
10
=5.0, =3.0
0
0
10
1
0.6
0.4
0.4
0.2
0.2
1
10
10
10
=1.1, =2.5
10
10
0
0
10
10
10
10
=1.1, =3.0
0.8
0.6
0
0
10
10
0.2
2
10
=1.1, =2.2
d=100
d=1000
d=10000
d=100000
1
10
10
10
VI.
CONCLUSION
FIG. 6: The structure of greedy-routing paths. We visualise the results of our simulation of greedy routing in modelled networks with different values of and observed in real
complex networks. The hidden distance between the starting
point and the destination is always approximately 104 , and
the network size N and number of attempted paths is always
105 for each (, ) combination, but the number of successful
paths and path hop-lengths vary, cf. Figs. 2,3. All paths start
and end at low-degree nodes located, respectively, in the leftand right-bottom corners of the diagrams (see top left plot).
For each (, ) we depict a single typical path in black and
use colour to indicate how often paths included a node of degree k located at distance d from the destination (blue/red
indicates exponentially less/more visits to those nodes). The
simulations confirm that only when is small and is large
does the average path structure follow the zoom-out/zoomin pattern that characterises successful greedy routing in real
networks, e.g., the airport network in our example.
7
believe that the present and future work on hidden metric
spaces and network navigability will deepen our understanding of the fundamental laws describing relationships
between structure and function of complex networks.
Acknowledgments
We thank M. Angeles
Serrano for useful comments and
discussions. This work was supported in part by DGES
grants No. FIS2004-05923-CO2-02 and FIS2007-66485C02-02, Generalitat de Catalunya grant No. SGR00889,
the Ram
on y Cajal program of the Spanish Ministry of
Science, and by NSF CNS-0434996 and CNS-0722070, by
DHS N66001-08-C-2029, and by Cisco Systems.
In our model we place all nodes on a circle by assigning them a random variable , i.e., their polar angle, distributed uniformly in [0, 2). The circle radius R grows
linearly with the total number of nodes N , 2R = N ,
in order to keep the average density of nodes on the circle fixed to 1. We next assign to each node its expected
degree drawn from some distribution (). The connection probability between two nodes with hidden coordinates (, ) and (0 , 0 ) takes the form
r(, ; 0 , 0 ) =
1+
d(, 0 )
0
,
( 1)
, (A1)
2hki
variable , k(),
is proportional to .[20] This proportionality guarantees that the shape of the node degree distribution P (k) in generated networks is approximately the same as the shape of (). The choice of
() = ( 1)01 , > 0 ( 2)hki/( 1),
> 2, generates random networks with a power-law degree distribution of the form P (k) k .
each network instance G, we randomly select 106 sourcedestination pairs (a, b) and execute the greedy-routing
process for them starting at a and selecting, at each hop
h, the next hop as the hs neighbour in G closest to b
in the circle. If for a given (a, b), this process visits the
same node twice, then the corresponding path leads to a
loop and is unsuccessful. We then average the measured
values of path hop lengths and percentage of successful
paths ps across all pairs (a, b) and networks G for the
same (, , N ).
To derive the greedy-routing propagator in this appendix, we adopt a slightly more general formalism than
in the main text. Specifically, we assume that nodes
live in a generic metric space H and, at the same time,
have intrinsic attributes unrelated to H. Contrary to
normed spaces or Riemannian manifolds, generic metric
spaces do not admit any coordinates, but we still use the
coordinate-based notations here to simplify the exposition below, and denote by x nodes coordinates in H and
by all their other, non-geometric attributes. In other
words, hidden variables x and in this general formalism represent some collections of nodes geometric and
non-geometric hidden attributes, not just a pair of scalar
quantities. Therefore, integrations over x and in what
follows stand merely to denote an appropriate form of
summation in each concrete case.
As in the main text, we assume that x and are independent random variables so that the probability density
to find a node with hidden variables (x, ) is
(x, ) = (x)()/N,
Our model has three independent parameters: exponent of power-law degree distributions, clustering
strength , and average degree hki. We fix the latter to
6, which is roughly equal to the average degree of some
real networks of interest [13, 14], and vary [2.1, 3]
and [1.1, 5], covering their observed ranges in documented complex networks [13]. For each (, ) pair, we
produce networks of different sizes N [103 , 105 ] generating, for each (, , N ), a number of different network
instancesfrom 40 for large N to 4000 for small N . In
(C1)
(C3)
8
by {h, ht , h1 , , hN 2 } {h, ht ; {hj }} with j =
1, , N 2, where h and ht denote the hidden variables of the current hop and the destination, respectively.
In this particular network configuration, the probability
that the current nodes next hop is a particular node i
with hidden variable hi is the probability that the current node is connected to i but disconnected to all nodes
that are closer to the destination than i,
h
(h)
dh
r(h, h0 ).
(C4)
N
2
Y
[1 r(h, hj )]
(C5)
j(6=i)=1
k(h|h
(h0 )r(h, h0 )dh0
i , ht ) = (N 3)
d(hi ,ht )<d(h0 ,ht )
(C7)
is the average number of connections between the current
node and nodes closer to the destination than node i,
excluding i and t.
The probability that the next hop has hidden variable
h0 , regardless of its label, i.e., index i, is
Prob(h0 |h, ht ) =
N
2
X
(h0 )Prob(i|h, ht ; h0 ).
(C8)
|x0 , xt ) =
k(x,
(C10)
0
d(x, x0 )
(x0 )( 0 )
r
ek(x,|x ,xt ) ,
dc (, 0 )
1 ek(x,)
(C11)
with
Z
dy
1 ek(h)
G(h0 |h, ht ) =
Z
d(x0 ,xt )>d(y,xt )
(C9)
Yet, this equation is not a properly normalized probability density function for the variable h0 since node h can
have degree zero with some probability. If we consider
only nodes with degrees greater than zero, then the nor
malization factor is given by 1 ek(h) . Therefore, the
properly normalized propagator is finally
i=1
0
d 0 (y)( 0 )r
d(x, y)
.
dc (, 0 )
(C12)
G(d0 , 0 |d, ) =
(1)
0
(1)
0
dd0
(1+
0)
1
0
d d
(1+
0)
+
+
d+d0
(1+
0)
exp
(1)
1
d+d
B( dd
, 2, 2 ) B( , 2, 2 )
exp
(1)
1
2
2
io
; d0 d
,
d+d
(1+
0)
0
d
B( d
,
2, 2 )
0
B( d+d
,
2, 2 )
io
;d > d
(C13)
0.8
=1.1, =2.2
Pup(,d)
Pup(,d)
0.8
0.6
0.6
0.4
B(z, a, b) z
0.2
=5.0, =2.2
0
-3
10
-2
10
-1
10
/d
10
10
0
-3
10
10
-2
10
1/2
-1
10
/d
1
0.8
0.8
10
10
10
1/2
=1.1, =2.5
Pup(,d)
0.6
0.6
0.4
0.4
0.2
0
-3
10
-2
10
-1
10
/d
10
10
0
-3
10
10
Z
-2
10
1/2
-1
10
/d
0.8
0.8
(C14)
which is somewhat
R z similar to the incomplete beta function B(z, a, b) = 0 ta1 (1 t)b1 dt.
One of the informative quantities elucidating the structure of greedy-routing paths is the probability Pup (, d)
that the next hop after an -labeled node at distance d
from the destination has a higher value of . The greedyrouting propagator defines this probability as
0.2
=5.0, =2.5
0.4
0.2
Pup(,d)
10
10
10
1/2
Pup (, d) =
d
0
(C15)
d0 <d
0.6
0.6
0.4
0.2
d=100
d=1000
d=10000
d=100000
Pup(,d)
Pup(,d)
=1.1, =3.0
0.4
0.2
=5.0, =3.0
0
-3
10
-2
10
-1
10
/d
10
1/2
10
10
0
-3
10
-2
10
-1
10
/d
10
10
10
1/2
10
[20] M. Bogu
na
and R. Pastor-Satorras, Phys. Rev. E 68,
036112 (2003).
[21] J. M. Guiot, Eur J of Soc Psychol 6, 503 (1976).
[22] The percentage of successfully delivered messages in