A spatial web graph model with local influence regions
W. Aiello¹, A. Bonato², C. Cooper³, J. Janssen⁴, and P. Pralat⁴
¹ University of British Columbia, Vancouver, Canada, [email protected]
² Wilfrid Laurier University, Waterloo, Canada, [email protected]
³ King’s College, London, UK, [email protected]
⁴ Dalhousie University, Halifax, Canada, [email protected], [email protected]
Abstract. The web graph may be considered as embedded in a topic space, with a metric that
expresses the extent to which web pages are related to each other. Using this assumption, we
present a new model for the web and other complex networks, based on a spatial embedding
of the nodes, called the Spatial Preferred Attachment (SPA) model. In the SPA model, nodes
have influence regions of varying size, and new nodes may only link to a node if they fall within
its influence region. We prove that our model gives a power law in-degree distribution, with
exponent in [2, ∞) depending on the parameters, and with concentration for a wide range of
in-degree values. We also show that the model allows for edges that span a large distance in the
underlying space, modelling a feature often observed in real-world complex networks.
1 Introduction
Current stochastic models for complex networks (such as those described in [1, 2]) aim
to reproduce a number of graph properties observed in real-world networks such as
the web graph. On the other hand, experimental and heuristic treatments of real-life
networks operate under the tacit assumption that the network is a visible manifestation
of an underlying hidden reality. For example, it is commonly assumed that communities
in a social network can be recognized as densely linked subgraphs, or that web pages
with many common neighbours contain related topics. Such assumptions imply that
there is an a priori community structure or relatedness measure of the nodes, which is
reflected by the link structure of the graph.
A common method to represent relatedness of objects is by an embedding in a metric
space, so that related objects are placed close together, and communities are represented
by clusters of points. Following a common text mining technique, web pages are often
represented as vectors in a word-document space. Using Latent Semantic Indexing, these
vectors can then be embedded in a Euclidean topic space, so that pages on similar topics
are located close together. Experimental studies [7] have confirmed that similar pages
are more likely to link to each other. On the other hand, experiments also confirm a
large amount of topic drift: it is possible to move to a completely different topic in a
relatively small number of hops. This points to a model where nodes are embedded in a
metric space, and the edge probability between nodes is influenced by their proximity,
but edges that span a larger distance in the space are not uncommon.

The authors gratefully acknowledge support from NSERC and MITACS grants.
The Spatial Preferred Attachment (SPA) model proposed in this paper combines
the above considerations with the often-used preferential attachment principle: pages
with high in-degree are more likely to receive new links. In the SPA model, each node is
placed in space and surrounded by an influence region. The area of the influence region
is determined by the in-degree of the node. Moreover, in each time-step all regions
decrease in area as a function of time. A new node v can only link to an existing node
u if v falls within the influence region of u. If v falls within the region of influence of u,
then v will link to u with probability p. Thus, the model is based on the preferential
attachment principle, but only implicitly: nodes with high in-degree have a large region
of influence, and therefore are more likely to attract new links.
A random graph model with certain similarities to the SPA model is the geometric
random graph; see [8]. In that model, all influence regions have the same size, and
the link probability is p = 1. Flaxman, Frieze, and Vera in [5] supply an interesting
geometric model where nodes are embedded on a sphere, and the link probability is
influenced by the relative positions of the nodes. This model is a generalization of a
geometric preferential attachment model presented by the same authors in [4], which
influenced our model.
There are at least three features that distinguish the SPA model from previous
work. First, a new node can choose its links purely based on local information. Namely,
the influence region of a node can be seen as the region where a web page is visible:
only web pages that are close enough (in topic) to fall within the influence region will
be aware of the given page, and thus have a possibility to link to it. Moreover, a new
node links independently to each node visible to it. Consequently, the new node needs
no knowledge of the invisible part of the graph (such as the in-degree of other nodes, or
the total number of nodes or links) to determine its neighbourhood. Second, since a new
node links to each visible node independently, the out-degree is neither constant nor
chosen according to a pre-determined distribution, but arises naturally from the model.
Third, the varying size of the influence regions allows for occasional long links, that is,
edges between nodes that are spaced far apart. This implies a certain “small world” property.
We formally define the SPA model as follows. Let S be the surface of the sphere of
area 1 in $\mathbb{R}^3$. For each positive real number $\alpha \le 1$, and $u \in S$, define the cap around $u$
with area $\alpha$ as
$$B_\alpha(u) = \{x \in S : \|x - u\| \le r_\alpha\},$$
where $\|\cdot\|$ is the usual Euclidean norm, and $r_\alpha$ is chosen such that $B_\alpha(u)$ has area $\alpha$.
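The radius $r_\alpha$ can be made explicit. The following short derivation is an addition to the text; it uses only the normalisation that $S$ has area 1 and Archimedes' theorem that a spherical cap of height $h$ on a sphere of radius $R$ has area $2\pi R h$:

```latex
\begin{aligned}
4\pi R^2 = 1 &\quad\Longrightarrow\quad R = \frac{1}{2\sqrt{\pi}},\\
|B_\alpha(u)| = 2\pi R h, \qquad r_\alpha^2 = 2Rh &\quad\Longrightarrow\quad |B_\alpha(u)| = \pi r_\alpha^2,\\
\text{hence}\qquad r_\alpha &= \sqrt{\alpha/\pi}, \qquad 0 < \alpha \le 1.
\end{aligned}
```

Here $r_\alpha^2 = 2Rh$ is the squared chord length from $u$ to the boundary circle of the cap. For $\alpha = 1$ this gives $r_1 = 1/\sqrt{\pi} = 2R$, so $B_1(u) = S$, as expected.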
The SPA model has parameters A1 , A2 , A3 , p ≥ 0 such that p ≤ 1, A1 ≤ 1 and
A2 > 0. It generates stochastic sequences of graphs (Gt : t ≥ 0), where Gt = (Vt , Et ),
and $V_t \subseteq S$. Let $d^-(v,t)$ ($d^+(v,t)$) be the in-degree (out-degree) of node $v$ in $G_t$. We
define the influence region of node $v$ at time $t \ge 1$, written $R(v,t)$, to be the cap around
$v$ with area
$$|R(v,t)| = \frac{A_1 d^-(v,t) + A_2}{t + A_3},$$
or $R(v,t) = S$ if the right-hand side is greater than 1.
The process begins at t = 0, with G0 being the empty graph, and we let G1 be just
K1 . Time-step t, t ≥ 2, is defined to be the transition between Gt−1 and Gt . At the
beginning of each time-step t, a new node vt is chosen uniformly at random (uar ) from
S, and added to Vt−1 to create Vt . Next, independently, for each node u ∈ Vt−1 such
that vt ∈ R(u, t − 1), a directed edge (vt , u) is created with probability p. Thus, the
probability that a link (vt , u) is added in time-step t equals p|R(u, t − 1)|.
Because new nodes choose independently whether to link to each visible node, and
the size of the influence region of a node depends only on the edges from younger nodes,
the distribution of the random graph Gn produced by the SPA model with parameters
A1 , A2 , A3 , p is equivalent to the graph Gn+A3 produced by the SPA model with the same
values for A1 , A2 , p, but with A3 = 0, where the first A3 nodes have been removed. Since
the results presented in this paper do not depend on the first nodes, we will assume
throughout that A3 = 0.
Note that the model could be defined on any compact set of measure 1. However,
if the set has non-empty boundary, the definition of the influence regions should be
adjusted. If higher dimensions are desired, S could be chosen to be the boundary of
a hypersphere in Rk for some k. The results in Sections 2 and 3 will still hold, while
Section 4 can be easily extended to this case.
We prove in Section 2 that with high probability a graph $G_n$ generated by the SPA
model has an in-degree distribution that follows a power law with exponent $1 + 1/(pA_1)$,
with concentration up to $i_f = n^{pA_1/(6pA_1+2)}/\log^4 n$. If $pA_1 = 10/11$, then the power
law in-degree exponent is 2.1, the same as observed in the web graph (see, for example, [2]).
We also give a precise expression for the in-degree distribution of each individual node $v_i$,
provided that $pA_1 < 1$. In Section 3, we show that, if $pA_1 < 1$, the number of edges of $G_n$
is linear, and strongly concentrated around the mean, while if $pA_1 = 1$ the expected number
of edges is asymptotic to $pA_2\, n \log n$. In Section 4 we explore a geometric version of the
small world property. We show that, for $pA_1 > 2/3$, the expected sum of (geometric) lengths
of new edges added at time $t$ in the SPA model is $\Theta(t^{2-b})$, where $b = 1 + 1/(pA_1)$ is the
exponent of the power law. For the in-degree power law exponent $b = 2.1$ commonly observed
in the web graph, this expected sum of lengths is greater than the corresponding expected sum
in a geometric random graph with equal-sized influence regions.
2 In-degree distribution
In the rest of the paper, (Gt : t ≥ 0) refers to a sequence of random graphs generated by
the SPA model with parameters A1 , A2 , A3 = 0, and p. In this section, we explore the
in-degree of the nodes in Gn . We say that an event holds asymptotically almost surely
(aas) if it holds with probability tending to one as n → ∞; an event holds with extreme
probability (wep) if it holds with probability at least $1 - \exp(-\Theta(\log^2 n))$ as $n \to \infty$.
Let $N_{i,t}$ denote the number of nodes of in-degree $i$ in $G_t$. For an integer $n \ge 0$, define
$$i_f = i_f(n) = \frac{n^{pA_1/(6pA_1+2)}}{\log^4 n}. \qquad (1)$$
Our main result in this section is the following.
Theorem 1. Fix p ∈ (0, 1]. Then for any i ≥ 0,
$$\mathbb{E}(N_{i,n}) = c_i n (1 + o(1)), \qquad (2)$$
where
$$c_0 = \frac{1}{1 + pA_2}, \qquad (3)$$
and for $1 \le i \le n$,
$$c_i = \frac{p^i}{1 + pA_2 + ipA_1} \prod_{j=0}^{i-1} \frac{jA_1 + A_2}{1 + pA_2 + jpA_1}. \qquad (4)$$
For $i = 0, \ldots, i_f$, wep
$$N_{i,n} = c_i n (1 + o(1)). \qquad (5)$$
Since $c_i = c\, i^{-(1 + 1/(pA_1))}(1 + o(1))$ for some constant $c$, this shows that for large $i$, the
expected proportion $N_{i,n}/n$ follows a power law with exponent $1 + 1/(pA_1)$, with concentration
for all values of $i$ up to $i_f$. The proof of Theorem 1 is contained in the rest of this section.
2.1 Expected value
The equations relating the random variables $N_{i,t}$ are described as follows. As $G_1$ consists
of one isolated node, $N_{0,1} = 1$, and $N_{i,1} = 0$ for $i > 0$. For all $t > 0$, we derive that
$$\mathbb{E}(N_{0,t+1} - N_{0,t} \mid G_t) = 1 - N_{0,t}\, p\, \frac{A_2}{t}, \qquad (6)$$
and, for $i \ge 1$,
$$\mathbb{E}(N_{i,t+1} - N_{i,t} \mid G_t) = N_{i-1,t}\, p\, \frac{A_1(i-1) + A_2}{t} - p N_{i,t} \frac{A_1 i + A_2}{t}. \qquad (7)$$
Recurrence relations for the expected values of $N_{i,t}$ can be derived by taking the
expectation of the above equations. To solve these relations, we use the following lemma
on real sequences, which is Lemma 3.1 from [2].
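Lemma 1. If $(\alpha_t)$, $(\beta_t)$ and $(\gamma_t)$ are real sequences satisfying the relation
$$\alpha_{t+1} = \left(1 - \frac{\beta_t}{t}\right)\alpha_t + \gamma_t,$$
and $\lim_{t\to\infty} \beta_t = \beta > 0$ and $\lim_{t\to\infty} \gamma_t = \gamma$, then $\lim_{t\to\infty} \alpha_t/t$ exists and equals
$$\frac{\gamma}{1 + \beta}.$$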
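Lemma 1 is easy to check numerically in the special case of constant sequences $\beta_t = \beta$ and $\gamma_t = \gamma$. The Python sketch below (the function name `lemma1_ratio` is hypothetical) iterates the recurrence from an arbitrary starting value at some $t_0 > \beta$, so that the factor $1 - \beta/t$ stays positive, and returns $\alpha_T/T$, which should approach $\gamma/(1+\beta)$.

```python
def lemma1_ratio(beta, gamma, t0=10, T=100_000, alpha0=0.0):
    """Iterate alpha_{t+1} = (1 - beta/t) * alpha_t + gamma from t = t0; return alpha_T / T."""
    alpha = alpha0
    for t in range(t0, T):          # produces alpha_{t0+1}, ..., alpha_T
        alpha = (1.0 - beta / t) * alpha + gamma
    return alpha / T
```

In this constant case the convergence is fast: writing $e_t = \alpha_t - \gamma t/(1+\beta)$, the recurrence gives exactly $e_{t+1} = (1 - \beta/t)\, e_t$, so $e_t$ decays like $t^{-\beta}$ and the error in $\alpha_t/t$ like $t^{-(1+\beta)}$.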
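Applying this lemma with $\alpha_t = \mathbb{E}(N_{0,t})$, $\beta_t = pA_2$, and $\gamma_t = 1$ gives that $\mathbb{E}(N_{0,t}) =
c_0 t + o(t)$ with $c_0$ as in (3). For $i > 0$, the lemma can be inductively applied with $\alpha_t =
\mathbb{E}(N_{i,t})$, $\beta_t = p(A_1 i + A_2)$, and
$$\gamma_t = p\, \mathbb{E}(N_{i-1,t})\, \frac{A_1(i-1) + A_2}{t}$$
(so that $\gamma_t \to p\, c_{i-1}(A_1(i-1) + A_2)$ by induction) to show that $\mathbb{E}(N_{i,t}) = c_i t + o(t)$, where
$$c_i = c_{i-1}\, \frac{p\,(A_1(i-1) + A_2)}{1 + p(A_1 i + A_2)}.$$
It is easy to verify that the expression for $c_i$ as defined in (3) and (4) satisfies this
recurrence relation.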
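The constants $c_i$ are easy to tabulate. The Python sketch below (the function names are hypothetical) computes $c_0, \ldots, c_{i_{\max}}$ from (3) and the recurrence above, and checks them against the closed form (4). It can also be used to inspect two consequences of the recurrence for $pA_1 < 1$: the $c_i$ sum to 1 (as the $N_{i,n}$ sum to $n$), and $\sum_i i\,c_i = pA_2/(1 - pA_1)$, the edge density found in Section 3. The local slope of $\log c_i$ against $\log i$ approaches $-(1 + 1/(pA_1))$. The parameter values in the checks are arbitrary.

```python
import math


def c_recurrence(p, A1, A2, imax):
    """c_0..c_imax from (3) and c_i = c_{i-1} p (A1 (i-1) + A2) / (1 + p (A1 i + A2))."""
    c = [1.0 / (1.0 + p * A2)]
    for i in range(1, imax + 1):
        c.append(c[-1] * p * (A1 * (i - 1) + A2) / (1.0 + p * (A1 * i + A2)))
    return c


def c_closed_form(p, A1, A2, i):
    """Closed form (4); for i = 0 the empty product gives (3)."""
    prod = 1.0
    for j in range(i):
        prod *= (j * A1 + A2) / (1.0 + p * A2 + j * p * A1)
    return p ** i / (1.0 + p * A2 + i * p * A1) * prod


def local_slope(c, i):
    """Slope of log c_k against log k between k = i and k = 2i."""
    return math.log(c[2 * i] / c[i]) / math.log(2.0)
```

Summing the recurrence shows that the partial sum $\sum_{i \le I} c_i$ equals $1 - p\,c_I(A_1 I + A_2)$, so the deficit from 1 is governed by the tail of the power law.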
2.2 Concentration
We prove concentration for $N_{i,t}$ when $i \le i_f$ by using a relaxation of Azuma-Hoeffding
martingale techniques. The random variables $N_{i,t}$ do not a priori satisfy the c-Lipschitz
condition: it is possible that a new node may fall into many overlapping regions of
influence. Nevertheless, we will prove that deviation from the c-Lipschitz condition
occurs only with probability $\exp(-\Theta(\log^2 n))$. The following lemma gives a bound for
$|N_{i,t+1} - N_{i,t}|$ which holds with extreme probability.
Lemma 2. Wep for all $0 \le t \le n - 1$ the following inequalities hold.
(i) $|N_{i,t+1} - N_{i,t}| \le 2(A_1 i + A_2)\log^2 n$, for $0 \le i \le t$.
(ii) $|N_{i,t+1} - N_{i,t}| \le 2(A_1 i + A_2)$, for $\log^2 n < i \le t$.
Proof. Fix $t$, let $i, j \le t$, and let $X_j(i,t)$ denote the indicator variable for the event that
$v_j$ has in-degree $i$ at time $t$ and $v_{t+1}$ links to $v_j$. Thus,
$$N_{i,t+1} - N_{i,t} = \sum_{j=1}^{t} X_j(i-1,t) - \sum_{j=1}^{t} X_j(i,t),$$
and so
$$|N_{i,t+1} - N_{i,t}| \le \max\left(\sum_{j=1}^{t} X_j(i-1,t),\ \sum_{j=1}^{t} X_j(i,t)\right). \qquad (8)$$
Let $Z_j(i,t)$ denote the indicator variable for the event that $v_{t+1}$ is chosen in the cap
of area $(A_1 i + A_2)/t$ around node $v_j$. Clearly, if $X_j(i,t) = 1$, then $Z_j(i,t) = 1$ as well,
so $X_j(i,t) \le Z_j(i,t)$. Thus, to bound $|N_{i,t+1} - N_{i,t}|$ it suffices to bound the values of
$Z(i,t)$, where
$$Z(i,t) = \sum_{j=1}^{t} Z_j(i,t).$$
The variables $Z_j(i,t)$ for $j = 1, \ldots, t$ are independent. To see this, we can
assume the position of $v_{t+1}$ to be fixed. Then, the value of $Z_j(i,t)$ depends only on the
position of $v_j$. Since the position of each node is chosen independently and uniformly,
the values $Z_1(i,t), \ldots, Z_t(i,t)$ are mutually independent, and by symmetry their joint
distribution does not depend on the position of $v_{t+1}$. Therefore, $Z(i,t)$ is the sum of
independent Bernoulli variables with probability of success equal to
$$\mathbb{P}(Z_j(i,t) = 1) = \frac{A_1 i + A_2}{t}.$$
Using Chernoff’s inequalities (see, for instance, Theorem 2.1 in [6]), we can show that wep
$Z(i,t) < A_1 i + A_2 + (A_1 i + A_2)\log^2 n < 2(A_1 i + A_2)\log^2 n$, and that wep $Z(i,t) < 2(A_1 i + A_2)$
if $i > \log^2 n$. Using these bounds, the proof now follows since by (8),
$$|N_{i,t+1} - N_{i,t}| \le \max(Z(i-1,t), Z(i,t)).$$
⊓⊔
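To get a feel for how strong the bounds in Lemma 2 are, note that $Z(i,t)$ is exactly Binomial$(t, (A_1 i + A_2)/t)$ when $(A_1 i + A_2)/t \le 1$. The Python sketch below computes the exact binomial tail rather than the Chernoff bound used in the proof; the function name `log_binom_upper_tail` is hypothetical. It works in log-space, since the probabilities involved underflow double precision.

```python
import math


def log_binom_upper_tail(t, q, k):
    """Natural log of P(Bin(t, q) >= k) for 0 < q < 1, summed in log-space."""
    if k <= 0:
        return 0.0
    if k > t:
        return float("-inf")
    logs = [
        math.lgamma(t + 1) - math.lgamma(j + 1) - math.lgamma(t - j + 1)
        + j * math.log(q) + (t - j) * math.log1p(-q)
        for j in range(k, t + 1)
    ]
    m = max(logs)  # log-sum-exp keeps the sum finite even when every term underflows
    return m + math.log(sum(math.exp(x - m) for x in logs))
```

For instance, with the illustrative values $n = t = 10^4$, $i = 1$ and $A_1 = A_2 = 1$, the threshold $2(A_1 i + A_2)\log^2 n$ is about 339 while $Z(i,t)$ has mean 2, and the returned log-probability lies far below $-\log^2 n \approx -85$.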
To sketch the technique of the proof of Theorem 1, we consider N0,t , the number
of nodes of in-degree zero. We use the supermartingale method of Pittel et al. [9], as
described in [10].
Lemma 3. Let $G_0, G_1, \ldots, G_n$ be a random graph process and $X_t$ a random variable
determined by $G_0, G_1, \ldots, G_t$, $0 \le t \le n$. Suppose that for some real $\beta$ and constants
$\gamma_t$,
$$\mathbb{E}(X_t - X_{t-1} \mid G_0, G_1, \ldots, G_{t-1}) < \beta$$
and
$$|X_t - X_{t-1} - \beta| \le \gamma_t$$
for $1 \le t \le n$. Then for all $\alpha > 0$,
$$\mathbb{P}\left(\text{for some } t \text{ with } 0 \le t \le n:\ X_t - X_0 \ge t\beta + \alpha\right) \le \exp\left(-\frac{\alpha^2}{2\sum_j \gamma_j^2}\right).$$
The proof of Lemma 3 uses a stopping time to obtain this stronger form: the bound on the
deviation of $X_n$ holds, with the same probability, simultaneously for all $X_t$ with $t \le n$.
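As a sanity check of the form of Lemma 3 (not of its proof), the Python sketch below takes one of the simplest processes satisfying its hypotheses: a walk with steps $+1$ (probability `up_prob` $< 1/2$) and $-1$, for which $\beta = 0$ and $\gamma_t = 1$. It estimates by simulation the probability that $X_t - X_0 \ge \alpha$ for some $t \le n$ and compares it with $\exp(-\alpha^2/(2\sum_j \gamma_j^2))$. The function names and parameter values are illustrative assumptions.

```python
import math
import random


def crossing_frequency(n, alpha, up_prob, trials, seed=0):
    """Fraction of runs of a +-1 walk (P(+1) = up_prob) with X_t - X_0 >= alpha for some t <= n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = 0
        for _ in range(n):
            x += 1 if rng.random() < up_prob else -1
            if x >= alpha:          # with beta = 0 the event is X_t - X_0 >= alpha
                hits += 1
                break
    return hits / trials


def lemma3_bound(alpha, gammas):
    """Right-hand side of Lemma 3: exp(-alpha^2 / (2 * sum of gamma_j^2))."""
    return math.exp(-alpha ** 2 / (2.0 * sum(g * g for g in gammas)))
```

With `up_prob = 0.45` the mean increment is $-0.1 < \beta = 0$ and every increment satisfies $|X_t - X_{t-1}| \le 1$, so for $n = 100$ and $\alpha = 20$ the lemma bounds the crossing probability by $e^{-2}$.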
Theorem 2. Wep for every $t$, $1 \le t \le n$,
$$N_{0,t} = \frac{t}{1 + A_2 p} + O(n^{1/2}\log^3 n).$$
Proof. We first transform $N_{0,t}$ into something close to a martingale. Consider the following real-valued function
$$H(x,y) = x^{pA_2} y - \frac{x^{1+pA_2}}{1 + pA_2} \qquad (9)$$
(note that we expect $H(t, N_{0,t})$ to be close to zero). Let $w_t = (t, N_{0,t})$, and consider the
sequence of random variables $(H(w_t) : 1 \le t \le n)$. The second-order partial derivatives
of $H$ evaluated at $w_t$ are all $O(t^{pA_2-1})$. Therefore, we have
$$H(w_{t+1}) - H(w_t) = (w_{t+1} - w_t) \cdot \operatorname{grad} H(w_t) + O(t^{pA_2-1}), \qquad (10)$$
where “·” denotes the scalar product and $\operatorname{grad} H(w_t) = (H_x(w_t), H_y(w_t))$.
Observe that, from our choice of $H$,
$$\mathbb{E}(w_{t+1} - w_t \mid G_t) \cdot \operatorname{grad} H(w_t) = 0,$$
since $H$ was chosen so that $H(w)$ is constant along every trajectory $w$ of the differential
equation $dy/dx = 1 - pA_2\, y/x$ that approximates the recurrence relation (6).
Hence, taking the expectation of (10) conditional on $G_t$, we obtain that
$$\mathbb{E}(H(w_{t+1}) - H(w_t) \mid G_t) = O(t^{pA_2-1}).$$
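From (10), noting that
$$\operatorname{grad} H(w_t) = \left(pA_2 t^{pA_2-1} N_{0,t} - t^{pA_2},\ t^{pA_2}\right),$$
and using Lemma 2 (with $i = 0$) to bound the change in $N_{0,t}$, we have that wep
$$|H(w_{t+1}) - H(w_t)| \le t^{pA_2} \cdot 2A_2\log^2 n + O(t^{pA_2}) = O(t^{pA_2}\log^2 n).$$
Now we may apply Lemma 3 to the sequence $(H(w_t) : 1 \le t \le n)$, and symmetrically to $(-H(w_t) : 1 \le t \le n)$, with $\alpha = n^{1/2+pA_2}\log^3 n$, $\beta = O(t^{pA_2-1})$ and
$\gamma_t = O(t^{pA_2}\log^2 n)$. Since $\sum_{j \le n}\gamma_j^2 = O(n^{1+2pA_2}\log^4 n)$, the exponent $\alpha^2/(2\sum_j \gamma_j^2)$ is $\Omega(\log^2 n)$, and we obtain that wep
$$|H(w_t) - H(w_0)| = O(n^{1/2+pA_2}\log^3 n)$$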
for $1 \le t \le n$. As $H(w_0) = 0$, this implies from the definition (9) of the function $H$
that wep
$$N_{0,t} = \frac{t}{1 + pA_2} + O(n^{1/2}\log^3 n) \qquad (11)$$
for $1 \le t \le n$, which finishes the proof of the theorem.
⊓⊔
We may repeat the argument of the proof of Theorem 2 for $N_{i,t}$ with $i \ge 1$. We
omit the details here; they will appear in the long version of the paper.
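The key property of $H$ used in the proof of Theorem 2, that it is constant along solutions of $dy/dx = 1 - pA_2\, y/x$, can be checked numerically. The Python sketch below integrates this equation from $(x, y) = (1, 1)$ and evaluates $H$ at the end point. The function name `rk4_trajectory` is hypothetical, and the classical fourth-order Runge-Kutta scheme is a choice made here, not something used in the paper. Since the solutions are $y = x/(1 + pA_2) + C x^{-pA_2}$, the ratio $y/x$ also approaches $1/(1 + pA_2)$, the constant $c_0$ in (3).

```python
def H(x, y, pA2):
    """The function (9): H(x, y) = x^{pA2} y - x^{1+pA2} / (1 + pA2)."""
    return x ** pA2 * y - x ** (1.0 + pA2) / (1.0 + pA2)


def rk4_trajectory(pA2, x0=1.0, y0=1.0, x1=1000.0, steps=100_000):
    """Integrate dy/dx = 1 - pA2 * y / x from (x0, y0) to x = x1 with classical RK4."""
    def f(x, y):
        return 1.0 - pA2 * y / x

    h = (x1 - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return x, y
```

Along the exact solution $H(x, y) = C$; the numerical value of $H$ at the end point differs from $H(1, 1) = 1 - 1/(1 + pA_2)$ only by the integration error.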
2.3 In-degree of a given node
In contrast to the large-scale behaviour of the degree distribution described in the
previous subsection, here we focus on the distribution of the in-degree of an individual
node. The indicator variable $Y_t$ for the increase in $d^-(v,t)$ by receiving a link from $v_{t+1}$
is Bernoulli $\mathrm{Be}(p(A_1 d^-(v,t) + A_2)/t)$. Thus,
$$\mathbb{E}(d^-(v,t+1) \mid G_t) = d^-(v,t) + \frac{p\,(A_1 d^-(v,t) + A_2)}{t}. \qquad (12)$$
This is very similar to the growth of the degree in the Preferential Attachment model
as analysed in [3]. As in the PA model, a “rich get richer” principle applies for the
in-degrees, and the richer nodes are those that were born first. Theorem 2.1 of [3] can
be used to obtain results on the concentration of Ni,t , but the methods employed in the
previous sections give a stronger result.
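Because the link from $v_{t+1}$ depends only on the current value of $d^-(v,t)$, the in-degree of a single node evolves as a Markov chain, and its exact distribution can be computed by dynamic programming. The Python sketch below (the function name `indegree_distribution` is hypothetical) does this for $v_j$ with $A_3 = 0$: it starts from in-degree 0 at time $j$ and applies the Bernoulli step with success probability $p \cdot \min(1, (A_1 d + A_2)/t)$ for $t = j, \ldots, n-1$, where the minimum reflects $R(v,t) = S$ when the area exceeds 1. When the areas stay below 1, its mean satisfies the recurrence obtained by taking expectations in (12).

```python
def indegree_distribution(j, n, p, A1, A2):
    """Exact law of d^-(v_j, n): dist[d] = P(d^-(v_j, n) = d), with d^-(v_j, j) = 0."""
    dist = [1.0]
    for t in range(j, n):
        new = [0.0] * (len(dist) + 1)
        for d, prob in enumerate(dist):
            q = p * min(1.0, (A1 * d + A2) / t)   # P(v_{t+1} links to v_j | in-degree d)
            new[d] += prob * (1.0 - q)
            new[d + 1] += prob * q
        dist = new
    return dist
```

The list grows by one entry per step, so the cost is quadratic in $n - j$; this is fine for moderate $n$ and gives the whole distribution, not just its mean.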
The results on the distribution of d− (v, n) are summarized in parts (a) and (b) of
the theorem below (use Theorem 2.2 of [3] with minor reworking). Part (c) will be
discussed in the next section, and used to establish the concentration of the number of edges of
$G_t$.
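Theorem 3. Let $\omega = \log n$ and let $l^* = n^{\min\{pA_1, 1/2\}}/\omega^4$. For $0 < pA_1 < 1$,
(a) For $\omega^8 \le j \le n - n/\omega$ and $0 \le l \le l^*$, or for $n - n/\omega < j < n$ and $l = 0, 1$,
$$\mathbb{P}(d^-(v_j,n) = l) = \left(1 + O(1/\omega^2)\right)\left(\frac{j}{n}\right)^{pA_1}\left(1 - \left(\frac{j}{n}\right)^{pA_1}\left(1 + O(1/\omega^2)\right)\right)^{l}.$$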
(b) For $n - n/\omega < j < n$ and $l \ge 2$,
$$\mathbb{P}(d^-(v_j,n) = l) = O\left(l^{pA_1-1}/\omega^{l}\right).$$
(c) For all $K > 0$,
$$\mathbb{P}\left(\text{there exists } j \le n : d^-(v_j,n) \ge K\omega^2 (n/j)^{pA_1}\right) = O\left(n^{-Ke^{-18}}\right).$$
Theorem 3(c) implies that aas the in-degree of every node $v_j$ is at most $K\omega^2 (n/j)^{pA_1}$.
Conditional on this, (a) and (b) characterize the distribution of $d^-(v_j,n)$ for all $j \ge \omega^8$
when $pA_1 \le 1/2$, and for $j \ge \omega^8 n^{pA_1-1/2}$ when $pA_1 > 1/2$.
3 The number of edges of $G_t$
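We derive a concentration result for the number of edges in graphs generated by the
SPA model. Let $M_t = |E_t|$, the number of edges in $G_t$, and let $m_t = \mathbb{E}(M_t)$. Then, using
$\sum_{j=1}^{t} d^-(v_j,t) = M_t$, we have that
$$\mathbb{E}(M_{t+1} \mid M_t) = M_t + p\sum_{j=1}^{t} \frac{A_1 d^-(v_j,t) + A_2}{t} = M_t + \frac{pA_1 M_t}{t} + pA_2,$$
and so $m_1 = 0$, and for $t \ge 1$,
$$m_{t+1} = m_t\left(1 + \frac{pA_1}{t}\right) + pA_2.$$
The (first-order) solutions of this recurrence are
$$m_n \sim \begin{cases} \dfrac{pA_2}{1 - pA_1}\, n, & pA_1 < 1,\\[4pt] pA_2\, n\log n, & pA_1 = 1. \end{cases}$$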
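The recurrence for $m_t$ can be iterated directly to see both regimes. The Python sketch below (the function name `expected_edges` is hypothetical) computes $m_n$ from $m_1 = 0$. For $pA_1 < 1$ the ratio $m_n/n$ approaches $pA_2/(1 - pA_1)$ with a relative correction of order $n^{pA_1 - 1}$. For $pA_1 = 1$ the recurrence gives exactly $m_n/n = pA_2(H_n - 1)$, where $H_n$ is the $n$-th harmonic number, so $m_n/(n\log n)$ tends to $pA_2$ only slowly.

```python
def expected_edges(n, pA1, pA2):
    """m_n from m_1 = 0 and m_{t+1} = m_t * (1 + pA1 / t) + pA2."""
    m = 0.0
    for t in range(1, n):        # after the step with index t, m holds m_{t+1}
        m = m * (1.0 + pA1 / t) + pA2
    return m
```

The identity for $pA_1 = 1$ follows by dividing the recurrence by $t + 1$, which gives $m_{t+1}/(t+1) = m_t/t + pA_2/(t+1)$.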
Theorem 4. If pA1 < 1, then aas the number of edges is concentrated around its
expected value:
Mn = mn (1 + o(1)).
The following lemma (whose proof is left to the long version of the paper) is used
in the proof of Theorem 4, and proves Theorem 3(c).
Lemma 4. For all $v_j$, $j > 0$ and $K > 0$,
$$\mathbb{P}\left(d^-(v_j,n) \ge K\log^2 n\,(n/j)^{pA_1}\right) = O\left(n^{-Ke^{-18}}\right).$$
Proof of Theorem 4. We count the number of edges by counting the in-degrees of
nodes. Our approach is as follows: by Theorem 1, wep for $i \le i_f$ the number $N_{i,n}$ of nodes
of in-degree $i$ at time $n$ is concentrated. Let $a$ be the solution of $(n/a)^{pA_1} = i_f$, and
let $\omega' = (K\log^2 n)^{1/(pA_1)}$ be the solution of
$$K\log^2(n)\left(\frac{n}{a\omega'}\right)^{pA_1} = \left(\frac{n}{a}\right)^{pA_1},$$
where $K \ge 4e^{18}$. From Lemma 4, with probability $1 - O(n^{-3})$ no node $v_j$ with $j \ge a\omega'$ has
in-degree exceeding $i_f$. Let $\mu(n) = \sum_{i \le i_f} \mathbb{E}(N_{i,n})$, and let $\lambda(n) = \sum_{j=1}^{a\omega'} d^-(v_j,n)$. We prove,
conditional on the event of Lemma 4, that $\lambda(n) = o(m_n)$, and thus the number of edges is concentrated
around $m_n$. Since $a = n^{1 - 1/(6pA_1+2)}\log^{4/(pA_1)} n$, we have that for $pA_1 < 1$
$$\begin{aligned}
\lambda(n) = \sum_{j=1}^{a\omega'} d^-(v_j,n) &\le K\omega^2 \sum_{j=1}^{a\omega'} \left(\frac{n}{j}\right)^{pA_1}\\
&= O(1/(1 - pA_1))\,\log^{2/(pA_1)}(n)\, n^{pA_1} a^{1-pA_1}\\
&= O(1/(1 - pA_1))\,\log^{2/(pA_1) + 4(1-pA_1)/(pA_1)}(n)\, n^{(7pA_1+1)/(6pA_1+2)}\\
&= o(n).
\end{aligned}$$
However, $\mu(n) \ge cn$ for some constant $c > 0$.
⊓⊔
4 A geometric small world property
In Section 2 it was shown that the number of nodes of in-degree zero in a graph $G_n$
generated by the SPA model is linear in $n$. Also, with positive probability a new node
will land in an area of $S$ not covered by any influence regions, and thus have out-degree
zero. Therefore, the underlying undirected graph of $G_n$ is not connected. In fact, we
expect that for the majority of distinct pairs $u, v$, there will not be a directed path from
$u$ to $v$. Since this is a property also observed in the web graph, it does not detract from
the SPA model, but rather indicates that we should consider a variable other than the
diameter to indicate a “small world” property. Thus, we focus on the (geometric)
distance, in $S$, spanned by the links.
For a pair of points $u, v \in S$, let $L(u,v)$ be the length of the shortest curve in $S$
that connects $u$ and $v$. Define
$$L_t = \sum_{(v_t, v_i) \in E_t} L(v_t, v_i);$$
that is, $L_t$ is the sum of the lengths of new edges added at time $t$ in the SPA model.
Note that $L_t$ is a continuous random variable.
Theorem 5. Suppose that $pA_1 > 2/3$. For the expectation of $L_t$,
$$\mathbb{E}(L_t) = \Theta\left(t^{-\frac{1-pA_1}{pA_1}}\right).$$
To prove Theorem 5 we need the following lemma whose (straightforward) proof is
omitted.
Lemma 5. Let $u$ be chosen uar from a cap with centre $v$ and area $\alpha$. If $X$ is the
distance between $u$ and $v$, measured over the surface of $S$, then $\mathbb{E}(X) = \frac{2}{3}\sqrt{\alpha/\pi}\,(1 + o(1))$ as $\alpha \to 0$.
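The expression $\frac{2}{3}\sqrt{\alpha/\pi}$ is exactly the mean distance from the centre of a flat disc of area $\alpha$ to a uniform point of the disc, so it describes small caps. For a cap on $S$ itself the mean geodesic distance has a closed form, derived here rather than taken from the paper: if $R = 1/(2\sqrt{\pi})$ is the radius of $S$ and $\theta$ the angular radius of the cap, then $\cos\theta = 1 - 2\alpha$ and $\mathbb{E}(X) = R(\sin\theta - \theta\cos\theta)/(2\alpha)$. The Python sketch below (function names hypothetical) evaluates both expressions. Their ratio tends to 1 as $\alpha \to 0$, the relevant regime when the area $(A_1 d^-(v,t) + A_2)/t$ is small; for the whole sphere ($\alpha = 1$) the exact value is $\sqrt{\pi}/4$.

```python
import math

R = 1.0 / (2.0 * math.sqrt(math.pi))  # radius of the sphere S of area 1


def mean_geodesic_distance(alpha):
    """Exact E(X) for a uniform point in a cap of area alpha (0 < alpha <= 1) on S."""
    theta = math.acos(1.0 - 2.0 * alpha)  # angular radius, from alpha = (1 - cos theta) / 2
    return R * (math.sin(theta) - theta * math.cos(theta)) / (2.0 * alpha)


def lemma5_value(alpha):
    """The expression (2/3) * sqrt(alpha / pi) from Lemma 5."""
    return (2.0 / 3.0) * math.sqrt(alpha / math.pi)
```

The closed form comes from integrating the geodesic distance $R\varphi$ against the area element $2\pi R^2 \sin\varphi\, d\varphi$ over $0 \le \varphi \le \theta$ and dividing by $\alpha$, using $2\pi R^2 = 1/2$.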
Proof of Theorem 5. Define
$$Z_{j,t} = \begin{cases} L(v_t, v_j) & \text{if } (v_t, v_j) \in E_t,\\ 0 & \text{else.} \end{cases}$$
Then $L_t = \sum_{j=1}^{t-1} Z_{j,t}$. Let $B_{t,j}$ be the event that $(v_t, v_j) \in E_t$. Then using Lemma 5 we
have that
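$$\begin{aligned}
\mathbb{E}(Z_{j,t+1} \mid G_t) &= \mathbb{P}(B_{t+1,j})\,\mathbb{E}(Z_{j,t+1} \mid G_t, B_{t+1,j}) + \mathbb{P}(\overline{B_{t+1,j}})\,\mathbb{E}(Z_{j,t+1} \mid G_t, \overline{B_{t+1,j}})\\
&= \mathbb{P}(B_{t+1,j})\,\mathbb{E}(L(v_{t+1}, v_j) \mid G_t, B_{t+1,j})\\
&= p\,\frac{A_1 d^-(v_j,t) + A_2}{t}\cdot\frac{2}{3}\sqrt{\frac{A_1 d^-(v_j,t) + A_2}{\pi t}}\\
&= \frac{2p}{3\sqrt{\pi}}\left(\frac{A_1 d^-(v_j,t) + A_2}{t}\right)^{3/2},
\end{aligned}$$
where the second equality follows from the definition of $Z_{j,t+1}$, and the second last equality
follows by Lemma 5 and the definition of the model (conditional on $B_{t+1,j}$, the node $v_{t+1}$ is
uniformly distributed in $R(v_j, t)$). Thus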
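$$\mathbb{E}(L_{t+1} \mid G_t) = \sum_{k=0}^{t} \sum_{\{j : d^-(v_j,t) = k\}} \mathbb{E}(Z_{j,t+1} \mid G_t) = \frac{2p}{3\sqrt{\pi}} \sum_{k=0}^{t} \left(\frac{A_1 k + A_2}{t}\right)^{3/2} N_{k,t}. \qquad (13)$$
Taking expectations on both sides, and using that $c_k = c\,k^{-(1 + 1/(pA_1))}(1 + o(1))$, we have
that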
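$$\begin{aligned}
\mathbb{E}(L_{t+1}) &= \frac{2p}{3\sqrt{\pi}} \sum_{k=0}^{t} \left(\frac{A_1 k + A_2}{t}\right)^{3/2} \mathbb{E}(N_{k,t})\\
&= \frac{2p}{3\sqrt{\pi t}} \sum_{k=0}^{t} (A_1 k + A_2)^{3/2} c_k (1 + o(1))\\
&= \frac{2pc}{3\sqrt{\pi t}} \int_0^t x^{1/2 - 1/(pA_1)} (1 + o(1))\,dx\\
&= \Theta\left(t^{1 - 1/(pA_1)}\right),
\end{aligned}$$
where the second equality follows by Theorem 1 (2). The last step is justified since it
can be shown that the $o(1)$ term in the integrand is in fact $O(x^{-\epsilon})$ for some $\epsilon > 0$;
moreover, the condition $pA_1 > 2/3$ ensures that $1/2 - 1/(pA_1) > -1$, so that the integral
converges at 0 and grows as $t^{3/2 - 1/(pA_1)}$.
⊓⊔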
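The exponent in Theorem 5 can be read off numerically from the second line of the display above. The Python sketch below (function names hypothetical) evaluates $S(t) = \frac{2p}{3\sqrt{\pi t}} \sum_{k=0}^{t} (A_1 k + A_2)^{3/2} c_k$, with $c_k$ from (3) and the recurrence of Section 2.1, and estimates the local exponent $\log_2(S(2t)/S(t))$, which should be close to $1 - 1/(pA_1)$ when $pA_1 > 2/3$. It only illustrates the asymptotic step that replaces $\mathbb{E}(N_{k,t})$ by $c_k t$; it does not simulate the model.

```python
import math


def c_recurrence(p, A1, A2, kmax):
    """c_0..c_kmax from (3) and c_k = c_{k-1} p (A1 (k-1) + A2) / (1 + p (A1 k + A2))."""
    c = [1.0 / (1.0 + p * A2)]
    for k in range(1, kmax + 1):
        c.append(c[-1] * p * (A1 * (k - 1) + A2) / (1.0 + p * (A1 * k + A2)))
    return c


def length_proxy(t, p, A1, A2, c):
    """(2p / (3 sqrt(pi t))) * sum_{k=0}^{t} (A1 k + A2)^{3/2} c_k."""
    s = sum((A1 * k + A2) ** 1.5 * c[k] for k in range(t + 1))
    return 2.0 * p / (3.0 * math.sqrt(math.pi * t)) * s


def local_exponent(t, p, A1, A2):
    """log2 of S(2t) / S(t); approaches 1 - 1/(pA1) when pA1 > 2/3."""
    c = c_recurrence(p, A1, A2, 2 * t)
    return math.log(length_proxy(2 * t, p, A1, A2, c) / length_proxy(t, p, A1, A2, c), 2)
```

For $pA_1 \le 2/3$ the sum converges instead of growing, and the same estimate tends to $-1/2$, the exponent of the threshold model discussed next.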
Theorem 5 contrasts with the analogous result for graphs generated with a similar
process to the SPA model, but where all influence regions have area d/t for d > 0 a
constant. We call this a threshold model. In the threshold model, E(Lt ) decreases much
faster than for the SPA model with p large, such as when p > 2/3 and A1 = 1. For
example, if pA1 = 1, then E(Lt ) = O(1).
Theorem 6. In the threshold model with areas of influence $d/t$, where $d$ is a constant,
$\mathbb{E}(L_t) \sim c\,t^{-1/2}$.
Proof. With the same notation as in the proof of Theorem 5 and using Lemma 5, we
have that
$$\mathbb{E}(Z_{j,t+1} \mid G_t) = \mathbb{P}(B_{t+1,j})\,\mathbb{E}(L(v_{t+1}, v_j) \mid B_{t+1,j}) = \frac{2d}{3t}\sqrt{\frac{d}{\pi t}}.$$
Hence,
$$\mathbb{E}(L_{t+1} \mid G_t) = \sum_{j=1}^{t} \mathbb{E}(Z_{j,t+1} \mid G_t) = \frac{2d}{3}\sqrt{\frac{d}{\pi t}} = \Theta(t^{-1/2}).$$
Taking expectations completes the proof.
⊓⊔
5 Conclusions and further work
We have proved that graphs produced by the SPA model have some of the graph properties observed in real-world complex networks: a power law in-degree distribution, and
constant average degree (when pA1 < 1). In future work, we will investigate additional graph properties,
such as the expected length of a directed path between two nodes (when such a path
exists), expansion properties, and spectral values. We are also interested in aspects suggesting self-similarity: is it true that the subgraph induced by all nodes that fall in a
certain compact region of the sphere S shares some of the graph properties of the whole
graph?
Several generalizations of this model may be proposed. An undirected version could
be developed, where the link probability depends on the influence regions of both endpoints. In a more realistic model, both the addition of edges without adding a node
and the deletion of edges and nodes should be incorporated. The effect of replacing S
with other underlying geometric spaces, either with boundaries or of higher dimension,
would be interesting to investigate.
Last but not least, a realistic spatial model gives the possibility for reverse engineering of real-life networks: given a real-life network and assuming a spatial graph model
by which the network was generated, it should be possible to give reliable estimates
about the positions of the nodes in space. This direction has important applications to
web graph clustering and development of link-based similarity measures.
References
1. A. Bonato, A survey of web graph models, In: Proceedings of Combinatorial and Algorithm Aspects of Networking, 2004.
2. F.R.K. Chung, L. Lu, Complex Graphs and Networks, American Mathematical Society, 2006.
3. C. Cooper, The age specific degree distribution of web-graphs, Combinatorics, Probability and Computing 15 (2006) 637–661.
4. A. Flaxman, A.M. Frieze, J. Vera, A geometric preferential attachment model of networks, Internet Mathematics 3 (2006) 187–205.
5. A. Flaxman, A.M. Frieze, J. Vera, A geometric preferential attachment model of networks II, preprint.
6. S. Janson, T. Luczak, A. Ruciński, Random Graphs, Wiley, New York, 2000.
7. F. Menczer, Lexical and semantic clustering by Web links, JASIST 55(14) (2004) 1261–1269.
8. M. Penrose, Random Geometric Graphs, Oxford University Press, Oxford, 2003.
9. B. Pittel, J. Spencer, N. Wormald, Sudden emergence of a giant k-core in a random graph, Journal of Combinatorial Theory, Series B 67 (1996) 111–151.
10. N. Wormald, The differential equation method for random graph processes and greedy algorithms, In: Lectures on Approximation and Randomized Algorithms, eds. M. Karoński and H. J. Prömel, PWN, Warsaw, (1999) 73–155.