A Spatial Web Graph Model with Local Influence Regions

Colin Cooper

2008, Internet Mathematics

The web graph may be considered as embedded in a topic space, with a metric that expresses the extent to which web pages are related to each other. Using this assumption, we present a new model for the web and other complex networks, based on a spatial embedding of the nodes, called the Spatial Preferred Attachment (SPA) model. In the SPA model, nodes have influence regions of varying size, and new nodes may only link to a node if they fall within its influence region. We prove that our model gives a power law in-degree distribution, with exponent in [2, ∞) depending on the parameters, and with concentration for a wide range of in-degree values. We also show that the model allows for edges that span a large distance in the underlying space, modelling a feature often observed in real-world complex networks. The authors gratefully acknowledge support from NSERC and MITACS grants.

W. Aiello (University of British Columbia, Vancouver, Canada), A. Bonato (Wilfrid Laurier University, Waterloo, Canada), C. Cooper (King's College London, UK), J. Janssen and P. Pralat (Dalhousie University, Halifax, Canada)

1 Introduction

Current stochastic models for complex networks (such as those described in [1, 2]) aim to reproduce a number of graph properties observed in real-world networks such as the web graph. On the other hand, experimental and heuristic treatments of real-life networks operate under the tacit assumption that the network is a visible manifestation of an underlying hidden reality. For example, it is commonly assumed that communities in a social network can be recognized as densely linked subgraphs, or that web pages with many common neighbours contain related topics. Such assumptions imply that there is an a priori community structure or relatedness measure of the nodes, which is reflected by the link structure of the graph.
A common method to represent relatedness of objects is by an embedding in a metric space, so that related objects are placed close together, and communities are represented by clusters of points. Following a common text mining technique, web pages are often represented as vectors in a word-document space. Using Latent Semantic Indexing, these vectors can then be embedded in a Euclidean topic space, so that pages on similar topics are located close together. Experimental studies [7] have confirmed that similar pages are more likely to link to each other. On the other hand, experiments also confirm a large amount of topic drift: it is possible to move to a completely different topic in a relatively small number of hops. This points to a model where nodes are embedded in a metric space and the edge probability between nodes is influenced by their proximity, but where edges that span a larger distance in the space are not uncommon.

The Spatial Preferred Attachment (SPA) model proposed in this paper combines the above considerations with the often-used preferential attachment principle: pages with high in-degree are more likely to receive new links. In the SPA model, each node is placed in space and surrounded by an influence region. The area of the influence region is determined by the in-degree of the node. Moreover, in each time-step all regions decrease in area as a function of time. A new node v can only link to an existing node u if v falls within the influence region of u; if it does, then v links to u with probability p. Thus, the model is based on the preferential attachment principle, but only implicitly: nodes with high in-degree have a large region of influence, and therefore are more likely to attract new links. A random graph model with certain similarities to the SPA model is the geometric random graph; see [8].
In that model, all influence regions have the same size, and the link probability is $p = 1$. Flaxman, Frieze, and Vera [5] propose an interesting geometric model where nodes are embedded on a sphere, and the link probability is influenced by the relative positions of the nodes. This model is a generalization of the geometric preferential attachment model presented by the same authors in [4], which influenced our model.

There are at least three features that distinguish the SPA model from previous work. First, a new node can choose its links purely based on local information. Namely, the influence region of a node can be seen as the region where a web page is visible: only web pages that are close enough (in topic) to fall within the influence region will be aware of the given page, and thus have a possibility to link to it. Moreover, a new node links independently to each node visible to it. Consequently, the new node needs no knowledge of the invisible part of the graph (such as the in-degrees of other nodes, or the total number of nodes or links) to determine its neighbourhood. Second, since a new node links to each visible node independently, the out-degree is neither a constant nor chosen according to a pre-determined distribution, but arises naturally from the model. Third, the varying size of the influence regions allows for occasional long links: edges between nodes that are spaced far apart. This implies a certain "small world" property.

We formally define the SPA model as follows. Let $S$ be the surface of the sphere of area 1 in $\mathbb{R}^3$. For each positive real number $\alpha \le 1$ and $u \in S$, define the cap around $u$ with area $\alpha$ as
$$B_\alpha(u) = \{x \in S : \|x - u\| \le r_\alpha\},$$
where $\|\cdot\|$ is the usual Euclidean norm, and $r_\alpha$ is chosen such that $B_\alpha(u)$ has area $\alpha$.

The SPA model has parameters $A_1, A_2, A_3, p \ge 0$ such that $p \le 1$, $A_1 \le 1$ and $A_2 > 0$. It generates stochastic sequences of graphs $(G_t : t \ge 0)$, where $G_t = (V_t, E_t)$ and $V_t \subseteq S$.
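For the unit-area sphere, the radius $r_\alpha$ in the definition of $B_\alpha(u)$ can be made explicit. The following short derivation is a supplementary sketch using only elementary spherical geometry; it is not taken from the original argument. Let $R$ be the radius of $S$ and $\theta$ the angular radius of the cap. Then, for any $x$ on the boundary of the cap,
$$4\pi R^2 = 1, \qquad |B_\alpha(u)| = 2\pi R^2(1 - \cos\theta), \qquad r_\alpha^2 = \|x - u\|^2 = 2R^2(1 - \cos\theta).$$
Hence $|B_\alpha(u)| = \pi r_\alpha^2$, that is,
$$r_\alpha = \sqrt{\alpha/\pi}.$$
For $\alpha = 1$ this gives $r_1 = 1/\sqrt{\pi} = 2R$, the diameter of $S$, so $B_1(u) = S$, as expected.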
Let $d^-(v,t)$ ($d^+(v,t)$) be the in-degree (out-degree) of node $v$ in $G_t$. We define the influence region of node $v$ at time $t \ge 1$, written $R(v,t)$, to be the cap around $v$ with area
$$|R(v,t)| = \frac{A_1 d^-(v,t) + A_2}{t + A_3},$$
or $R(v,t) = S$ if the right-hand side is greater than 1.

The process begins at $t = 0$, with $G_0$ being the empty graph, and we let $G_1$ be just $K_1$. Time-step $t$, $t \ge 2$, is defined to be the transition between $G_{t-1}$ and $G_t$. At the beginning of each time-step $t$, a new node $v_t$ is chosen uniformly at random (uar) from $S$ and added to $V_{t-1}$ to create $V_t$. Next, independently, for each node $u \in V_{t-1}$ such that $v_t \in R(u, t-1)$, a directed edge $(v_t, u)$ is created with probability $p$. Thus, the probability that a link $(v_t, u)$ is added in time-step $t$ equals $p|R(u, t-1)|$.

Because new nodes choose independently whether to link to each visible node, and the size of the influence region of a node depends only on the edges from younger nodes, the distribution of the random graph $G_n$ produced by the SPA model with parameters $A_1, A_2, A_3, p$ is equivalent to that of the graph $G_{n+A_3}$ produced by the SPA model with the same values of $A_1, A_2, p$ but with $A_3 = 0$, where the first $A_3$ nodes have been removed. Since the results presented in this paper do not depend on the first nodes, we will assume throughout that $A_3 = 0$.

Note that the model could be defined on any compact set of measure 1. However, if the set has non-empty boundary, the definition of the influence regions should be adjusted. If higher dimensions are desired, $S$ could be chosen to be the boundary of a hypersphere in $\mathbb{R}^k$ for some $k$. The results in Sections 2 and 3 will still hold, while Section 4 can be easily extended to this case.

We prove in Section 2 that with high probability a graph $G_n$ generated by the SPA model has an in-degree distribution that follows a power law with exponent $1 + \frac{1}{pA_1}$, with concentration up to in-degree $i_f$, where
$$i_f = \frac{n^{pA_1/(6pA_1+2)}}{\log^4 n}.$$
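The process defined above can be simulated directly. The Python sketch below is a naive $O(n^2)$ illustration with $A_3 = 0$, written under stated assumptions; it is not the authors' code, and the names `simulate_spa` and `uniform_point_on_sphere` are hypothetical. It places points uniformly on the sphere of radius $R = 1/\sqrt{4\pi}$ (so that $S$ has area 1), and it decides whether $v_t \in R(u, t-1)$ by comparing the Euclidean distance with $\sqrt{\alpha/\pi}$, which is the chord radius of a cap of area $\alpha$ on this sphere (from the cap-area formula $2\pi R^2(1 - \cos\theta)$).

```python
import math
import random

Point = tuple[float, float, float]


def uniform_point_on_sphere(rng: random.Random, radius: float) -> Point:
    """Uniform point on a sphere of the given radius (normalised Gaussian vector)."""
    while True:
        x, y, z = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 0.0:
            return (radius * x / norm, radius * y / norm, radius * z / norm)


def simulate_spa(n: int, A1: float, A2: float, p: float, seed: int = 0):
    """Naive O(n^2) simulation of the SPA model with A3 = 0 (hypothetical helper).

    Node k (0-based) is v_{k+1} in the paper's notation. Returns
    (points, edges, indeg); each edge (a, b) is the directed link v_{a+1} -> v_{b+1}.
    """
    rng = random.Random(seed)
    radius = 1.0 / math.sqrt(4.0 * math.pi)  # sphere of area 1
    points: list[Point] = [uniform_point_on_sphere(rng, radius)]  # G_1 = K_1
    indeg = [0]
    edges: list[tuple[int, int]] = []
    for t in range(2, n + 1):  # time-step t creates v_t
        v = uniform_point_on_sphere(rng, radius)
        targets = []
        for u, pu in enumerate(points):
            # |R(u, t-1)| = (A1 d^-(u, t-1) + A2) / (t - 1); an area >= 1 means R = S
            area = (A1 * indeg[u] + A2) / (t - 1)
            visible = area >= 1.0 or math.dist(v, pu) <= math.sqrt(area / math.pi)
            if visible and rng.random() < p:  # independent link with probability p
                targets.append(u)
        # in-degrees change only after the scan, so every test above used G_{t-1}
        for u in targets:
            indeg[u] += 1
            edges.append((t - 1, u))
        points.append(v)
        indeg.append(0)
    return points, edges, indeg
```

For example, `points, edges, indeg = simulate_spa(1000, A1=1.0, A2=1.0, p=0.9, seed=1)` produces one sample of $G_{1000}$; the empirical fraction `indeg.count(0) / len(indeg)` can then be compared with $c_0 = 1/(1 + pA_2)$ from Theorem 1. The quadratic scan is chosen for clarity; a spatial index would be needed for large $n$.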
If $pA_1 = 10/11$, then the power law in-degree exponent is $1 + 11/10 = 2.1$, the same as observed in the web graph (see, for example, [2]). We also give a precise expression for the probability distribution of the in-degree of each individual node $v_i$, provided that $pA_1 < 1$. In Section 3, we show that, if $pA_1 < 1$, the number of edges of $G_n$ is linear and strongly concentrated around its mean, while if $pA_1 = 1$ the expected number of edges is of order $n \log n$. In Section 4 we explore a geometric version of the small world property. We show that the expected sum of (geometric) lengths of the new edges added at time $t$ in the SPA model is $\Theta(t^{2-b})$, where $b = 1 + \frac{1}{pA_1}$ is the exponent of the power law. For the in-degree power law exponent $b = 2.1$ commonly observed in the web graph, this expected sum of lengths is asymptotically larger than the corresponding expected sum in a geometric random graph with equal-sized influence regions.

2 In-degree distribution

In the rest of the paper, $(G_t : t \ge 0)$ refers to a sequence of random graphs generated by the SPA model with parameters $A_1$, $A_2$, $A_3 = 0$, and $p$. In this section, we explore the in-degree of the nodes in $G_n$. We say that an event holds asymptotically almost surely (aas) if it holds with probability tending to one as $n \to \infty$; an event holds with extreme probability (wep) if it holds with probability at least $1 - \exp(-\Theta(\log^2 n))$ as $n \to \infty$.

Let $N_{i,t}$ denote the number of nodes of in-degree $i$ in $G_t$. For an integer $n \ge 0$, define
$$i_f = i_f(n) = \frac{n^{pA_1/(6pA_1+2)}}{\log^4 n}. \qquad (1)$$

Our main result in this section is the following.

Theorem 1. Fix $p \in (0, 1]$. Then for any $i \ge 0$,
$$\mathbb{E}(N_{i,n}) = c_i n (1 + o(1)), \qquad (2)$$
where
$$c_0 = \frac{1}{1 + pA_2}, \qquad (3)$$
and for $1 \le i \le n$,
$$c_i = \frac{p^i}{1 + pA_2 + ipA_1} \prod_{j=0}^{i-1} \frac{jA_1 + A_2}{1 + pA_2 + jpA_1}. \qquad (4)$$
For $i = 0, \dots, i_f$, wep
$$N_{i,n} = c_i n (1 + o(1)). \qquad (5)$$

Since $c_i = c\, i^{-(1 + 1/(pA_1))}(1 + o(1))$ for some constant $c$, this shows that for large $i$ the expected proportion $N_{i,n}/n$ follows a power law with exponent $1 + \frac{1}{pA_1}$, with concentration for all values of $i$ up to $i_f$.
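Formulas (3) and (4) are easy to evaluate numerically. The Python sketch below (the names `spa_indegree_constants` and `tail_exponent` are hypothetical, not from the paper) computes $c_0, \dots, c_{i_{\max}}$ by folding the factor $p$ into each term of the product in (4), which avoids underflow of $p^i$ and overflow of the product for large $i$. It then estimates the tail exponent as the slope of $\log c_i$ against $\log i$, which should approach $1 + 1/(pA_1)$.

```python
import math


def spa_indegree_constants(p: float, A1: float, A2: float, imax: int) -> list[float]:
    """Return [c_0, ..., c_imax] from equations (3) and (4) (hypothetical helper)."""
    c = [1.0 / (1.0 + p * A2)]  # equation (3)
    prod = 1.0  # p^i * prod_{j<i} (j*A1 + A2) / (1 + p*A2 + j*p*A1), built incrementally
    for i in range(1, imax + 1):
        j = i - 1
        prod *= p * (j * A1 + A2) / (1.0 + p * A2 + j * p * A1)
        c.append(prod / (1.0 + p * A2 + i * p * A1))  # equation (4)
    return c


def tail_exponent(c: list[float], i_lo: int, i_hi: int) -> float:
    """Estimate of -d(log c_i)/d(log i) between i_lo and i_hi."""
    return -(math.log(c[i_hi]) - math.log(c[i_lo])) / (math.log(i_hi) - math.log(i_lo))
```

For instance, with $p = 1$, $A_1 = 10/11$ and $A_2 = 1$, `tail_exponent(spa_indegree_constants(1.0, 10/11, 1.0, 10**5), 10**3, 10**5)` should be close to $1 + 1/(pA_1) = 2.1$. As a further sanity check, summing the recurrence $c_i(1 + p(A_1 i + A_2)) = c_{i-1}\,p(A_1(i-1) + A_2)$ over $1 \le i \le N$, together with $c_0(1 + pA_2) = 1$, gives $\sum_{i=0}^{N} c_i = 1 - c_N\,p(A_1 N + A_2)$, so the proportions $c_i$ sum to 1.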
The proof of Theorem 1 is contained in the rest of this section.

2.1 Expected value

The equations relating the random variables $N_{i,t}$ are described as follows. As $G_1$ consists of one isolated node, $N_{0,1} = 1$ and $N_{i,1} = 0$ for $i > 0$. For all $t > 0$, we derive that
$$\mathbb{E}(N_{0,t+1} - N_{0,t} \mid G_t) = 1 - N_{0,t}\,\frac{pA_2}{t}, \qquad (6)$$
$$\mathbb{E}(N_{i,t+1} - N_{i,t} \mid G_t) = N_{i-1,t}\,\frac{p(A_1(i-1) + A_2)}{t} - N_{i,t}\,\frac{p(A_1 i + A_2)}{t}, \quad i \ge 1. \qquad (7)$$

Recurrence relations for the expected values of $N_{i,t}$ can be derived by taking the expectation of the above equations. To solve these relations, we use the following lemma on real sequences, which is Lemma 3.1 from [2].

Lemma 1. If $(\alpha_t)$, $(\beta_t)$ and $(\gamma_t)$ are real sequences satisfying the relation
$$\alpha_{t+1} = \left(1 - \frac{\beta_t}{t}\right)\alpha_t + \gamma_t,$$
and $\lim_{t\to\infty} \beta_t = \beta > 0$ and $\lim_{t\to\infty} \gamma_t = \gamma$, then $\lim_{t\to\infty} \alpha_t/t$ exists and equals $\frac{\gamma}{1+\beta}$.

Applying this lemma with $\alpha_t = \mathbb{E}(N_{0,t})$, $\beta_t = pA_2$ and $\gamma_t = 1$ gives that $\mathbb{E}(N_{0,t}) = c_0 t + o(t)$, with $c_0$ as in (3). For $i > 0$, the lemma can be inductively applied with $\alpha_t = \mathbb{E}(N_{i,t})$, $\beta_t = p(A_1 i + A_2)$ and $\gamma_t = \mathbb{E}(N_{i-1,t})\,\frac{p(A_1(i-1) + A_2)}{t}$ to show that $\mathbb{E}(N_{i,t}) = c_i t + o(t)$, where
$$c_i = c_{i-1}\,\frac{p(A_1(i-1) + A_2)}{1 + p(A_1 i + A_2)}.$$
It is easy to verify that the expression for $c_i$ defined in (3) and (4) satisfies this recurrence relation.

2.2 Concentration

We prove concentration for $N_{i,t}$ when $i \le i_f$ by using a relaxation of Azuma-Hoeffding martingale techniques. The random variables $N_{i,t}$ do not a priori satisfy the $c$-Lipschitz condition: a new node may fall into many overlapping regions of influence. Nevertheless, we will prove that deviation from the $c$-Lipschitz condition occurs only with very small probability. The following lemma gives a bound for $|N_{i,t+1} - N_{i,t}|$ which holds with extreme probability.

Lemma 2. Wep, for all $0 \le t \le n - 1$ the following inequalities hold:
(i) $|N_{i,t+1} - N_{i,t}| \le 2(A_1 i + A_2)\log^2 n$ for $0 \le i \le t$;
(ii) $|N_{i,t+1} - N_{i,t}| \le 2(A_1 i + A_2)$ for $\log^2 n < i \le t$.

Proof.
Fix $t$, let $i, j \le t$, and let $X_j(i,t)$ denote the indicator variable for the event that $v_j$ has in-degree $i$ at time $t$ and $v_{t+1}$ links to $v_j$. Thus,
$$N_{i,t+1} - N_{i,t} = \sum_{j=1}^{t} X_j(i-1,t) - \sum_{j=1}^{t} X_j(i,t),$$
and so
$$|N_{i,t+1} - N_{i,t}| \le \max\left(\sum_{j=1}^{t} X_j(i-1,t),\ \sum_{j=1}^{t} X_j(i,t)\right). \qquad (8)$$

Let $Z_j(i,t)$ denote the indicator variable for the event that $v_{t+1}$ is chosen in the cap of area $(A_1 i + A_2)/t$ around node $v_j$. Clearly, if $X_j(i,t) = 1$, then $Z_j(i,t) = 1$ as well, so $X_j(i,t) \le Z_j(i,t)$. Thus, to bound $|N_{i,t+1} - N_{i,t}|$ it suffices to bound the values of
$$Z(i,t) = \sum_{j=1}^{t} Z_j(i,t).$$
The variables $Z_j(i,t)$ for $j = 1, \dots, t$ are independent. To see this, we can assume the position of $v_{t+1}$ to be fixed. Then the value of $Z_j(i,t)$ depends only on the position of $v_j$. Since the position of each node is chosen independently and uniformly, the value of $Z_j(i,t)$ is independent of the values of the other $Z_{j'}(i,t)$, $j' \ne j$, and its success probability does not depend on the position of $v_{t+1}$. Therefore, $Z(i,t)$ is the sum of independent Bernoulli variables with probability of success
$$\mathbb{P}(Z_j(i,t) = 1) = \frac{A_1 i + A_2}{t}.$$
Using Chernoff's inequalities (see, for instance, Theorem 2.1 in [6]), we can show that wep
$$Z(i,t) < A_1 i + A_2 + (A_1 i + A_2)\log^2 n < 2(A_1 i + A_2)\log^2 n,$$
and $Z(i,t) < 2(A_1 i + A_2)$ if $i > \log^2 n$. Using these bounds, the proof now follows since, by (8), $|N_{i,t+1} - N_{i,t}| \le \max(Z(i-1,t), Z(i,t))$. □

To sketch the technique of the proof of Theorem 1, we consider $N_{0,t}$, the number of nodes of in-degree zero. We use the supermartingale method of Pittel et al. [9], as described in [10].

Lemma 3. Let $G_0, G_1, \dots, G_n$ be a random graph process and $X_t$ a random variable determined by $G_0, G_1, \dots, G_t$, $0 \le t \le n$. Suppose that for some real $\beta$ and constants $\gamma_t$,
$$\mathbb{E}(X_t - X_{t-1} \mid G_0, G_1, \dots, G_{t-1}) < \beta \quad\text{and}\quad |X_t - X_{t-1} - \beta| \le \gamma_t$$
for $1 \le t \le n$. Then for all $\alpha > 0$,
$$\mathbb{P}\bigl(\text{for some } t \text{ with } 0 \le t \le n:\ X_t - X_0 \ge t\beta + \alpha\bigr) \le \exp\left(-\frac{\alpha^2}{2\sum_{j=1}^{n}\gamma_j^2}\right).$$

The proof of Lemma 3 uses a stopping time to obtain this stronger form: the stopping time shows that the bound for the deviation of $X_n$ applies, with the same probability, simultaneously to all $X_t$ with $t \le n$.

Theorem 2. Wep, for every $t$ with $1 \le t \le n$,
$$N_{0,t} = \frac{t}{1 + pA_2} + O(n^{1/2}\log^3 n).$$

Proof. We first transform $N_{0,t}$ into something close to a martingale. Consider the real-valued function
$$H(x,y) = x^{pA_2} y - \frac{x^{1+pA_2}}{1 + pA_2} \qquad (9)$$
(note that we expect $H(t, N_{0,t})$ to be close to zero). Let $w_t = (t, N_{0,t})$, and consider the sequence of random variables $(H(w_t) : 1 \le t \le n)$. The second-order partial derivatives of $H$ evaluated at $w_t$ are all $O(t^{pA_2 - 1})$. Therefore,
$$H(w_{t+1}) - H(w_t) = (w_{t+1} - w_t) \cdot \operatorname{grad} H(w_t) + O(t^{pA_2 - 1}), \qquad (10)$$
where "$\cdot$" denotes the scalar product and $\operatorname{grad} H(w_t) = (H_x(w_t), H_y(w_t))$. Observe that, from our choice of $H$,
$$\mathbb{E}(w_{t+1} - w_t \mid G_t) \cdot \operatorname{grad} H(w_t) = 0,$$
since $H$ was chosen so that $H(w)$ is constant along every trajectory $w$ of the differential equation $dy/dx = 1 - pA_2\, y/x$ that approximates the recurrence relation (6). Hence, taking the expectation of (10) conditional on $G_t$, we obtain
$$\mathbb{E}(H(w_{t+1}) - H(w_t) \mid G_t) = O(t^{pA_2 - 1}).$$
From (10), noting that
$$\operatorname{grad} H(w_t) = \left(pA_2\, t^{pA_2 - 1} N_{0,t} - t^{pA_2},\ t^{pA_2}\right),$$
and using Lemma 2(i) with $i = 0$ to bound the change in $N_{0,t}$, we have that wep
$$|H(w_{t+1}) - H(w_t)| \le t^{pA_2} \cdot 2A_2\log^2 n + O(t^{pA_2}) = O(t^{pA_2}\log^2 n).$$
Now we may apply Lemma 3 to the sequence $(H(w_t) : 1 \le t \le n)$, and symmetrically to $(-H(w_t) : 1 \le t \le n)$, with $\alpha = n^{1/2 + pA_2}\log^3 n$, $\beta = O(t^{pA_2 - 1})$ and $\gamma_t = O(t^{pA_2}\log^2 n)$, to obtain that wep
$$|H(w_t) - H(w_0)| = O(n^{1/2 + pA_2}\log^3 n) \quad \text{for } 1 \le t \le n.$$
As $H(w_0) = 0$, this implies, by the definition (9) of $H$, that wep
$$N_{0,t} = \frac{t}{1 + pA_2} + O(n^{1/2}\log^3 n) \qquad (11)$$
for $1 \le t \le n$, which finishes the proof of the theorem. □

We may repeat the argument in the proof of Theorem 2 for $N_{i,t}$ with $i \ge 1$. We omit the details here; they will appear in the long version of the paper.
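Two ingredients of this subsection can be checked numerically. The Python sketch below (all function names are hypothetical) iterates the expected-value recurrence obtained from (6), $\mathbb{E}(N_{0,t+1}) = (1 - pA_2/t)\,\mathbb{E}(N_{0,t}) + 1$, which is Lemma 1 with $\beta_t = pA_2$ and $\gamma_t = 1$, and compares $\mathbb{E}(N_{0,T})/T$ with $c_0 = 1/(1 + pA_2)$. It also evaluates the scalar product of the expected drift of $w_t = (t, N_{0,t})$ with $\operatorname{grad} H$, which should vanish, and the function $H$ of (9) itself. The iteration applies (6) literally from $t = 1$, which tacitly assumes that no influence region has been truncated to $S$ (for example, $A_2 \le 1$).

```python
import math


def expected_n0_ratio(p: float, A2: float, T: int) -> float:
    """Iterate E(N_{0,t+1}) = (1 - p*A2/t) E(N_{0,t}) + 1 from E(N_{0,1}) = 1; return E(N_{0,T}) / T."""
    a = 1.0  # E(N_{0,1}) = 1, since G_1 = K_1
    for t in range(1, T):
        a = (1.0 - p * A2 / t) * a + 1.0  # expectation of equation (6)
    return a / T


def H(x: float, y: float, p: float, A2: float) -> float:
    """H(x, y) = x^{pA2} y - x^{1+pA2} / (1 + pA2), equation (9)."""
    s = p * A2
    return x ** s * y - x ** (1.0 + s) / (1.0 + s)


def drift_dot_grad(x: float, y: float, p: float, A2: float) -> float:
    """E(w_{t+1} - w_t | G_t) . grad H(w_t) at w_t = (x, y); should be zero."""
    s = p * A2
    grad = (s * x ** (s - 1.0) * y - x ** s, x ** s)
    drift = (1.0, 1.0 - s * y / x)  # time advances by 1; N_0 changes as in (6)
    return drift[0] * grad[0] + drift[1] * grad[1]
```

For example, `expected_n0_ratio(0.5, 1.0, 10_000)` should be very close to $1/1.5$. The exact solutions of $dy/dx = 1 - pA_2\,y/x$ are $y = x/(1 + pA_2) + Cx^{-pA_2}$, and `H` evaluated on such a solution returns the constant $C$, which is the sense in which $H$ is constant along trajectories.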
2.3 In-degree of a given node

In contrast to the large-scale behaviour of the degree distribution described in the previous subsection, here we focus on the distribution of the in-degree of an individual node. The indicator variable $Y_t$ for the increase in $d^-(v,t)$ by receiving a link from $v_{t+1}$ is Bernoulli, $\mathrm{Be}\bigl(p(A_1 d^-(v,t) + A_2)/t\bigr)$. Thus,
$$\mathbb{E}(d^-(v,t+1) \mid G_t) = d^-(v,t) + \frac{p(A_1 d^-(v,t) + A_2)}{t}. \qquad (12)$$
This is very similar to the growth of the degree in the preferential attachment (PA) model as analysed in [3]. As in the PA model, a "rich get richer" principle applies to the in-degrees, and the richer nodes are those that were born first. Theorem 2.1 of [3] can be used to obtain results on the concentration of $N_{i,t}$, but the methods employed in the previous subsections give a stronger result. The results on the distribution of $d^-(v,n)$ are summarized in parts (a) and (b) of the theorem below (they follow from Theorem 2.2 of [3] with minor reworking). Part (c) will be discussed in the next section, and used to establish the concentration of the number of edges of $G_t$.

Theorem 3. Let $\omega = \log n$ and let $l^* = n^{\min\{pA_1, 1/2\}}/\omega^4$. Suppose that $0 < pA_1 < 1$.
(a) For $\omega^8 \le j \le n - n/\omega$ and $0 \le l \le l^*$, or for $n - n/\omega < j < n$ and $l \in \{0, 1\}$,
$$\mathbb{P}(d^-(v_j,n) = l) = (1 + O(1/\omega^2))\left(\frac{j}{n}\right)^{pA_1}\left(1 - \left(\frac{j}{n}\right)^{pA_1}(1 + O(1/\omega^2))\right)^{l}.$$
(b) For $n - n/\omega < j < n$ and $l \ge 2$,
$$\mathbb{P}(d^-(v_j,n) = l) = O(l^{pA_1 - 1}/\omega^l).$$
(c) For all $K > 0$,
$$\mathbb{P}\bigl(\text{there exists } j \le n : d^-(v_j,n) \ge K\omega^2 (n/j)^{pA_1}\bigr) = O\bigl(n^{-Ke^{-18}}\bigr).$$

Theorem 3(c) implies that aas, for every $j$, the in-degree of node $v_j$ is at most $K\omega^2 (n/j)^{pA_1}$. Conditional on this, (a) and (b) characterize the distribution of $d^-(v_j,n)$ for all $j \ge \omega^8$ when $pA_1 \le 1/2$, and for $j \ge \omega^8 n^{pA_1 - 1/2}$ when $pA_1 > 1/2$.

3 The number of edges of $G_t$

We derive a concentration result for the number of edges in graphs generated by the SPA model. Let $M_t = |E_t|$ be the number of edges in $G_t$, and let $m_t = \mathbb{E}(M_t)$.
Then
$$\mathbb{E}(M_{t+1} \mid G_t) = M_t + p\sum_{j=1}^{t}\frac{A_1 d^-(v_j,t) + A_2}{t} = M_t + \frac{pA_1 M_t}{t} + pA_2,$$
since $\sum_{j=1}^{t} d^-(v_j,t) = M_t$. Hence $m_1 = 0$ and, for $t \ge 1$,
$$m_{t+1} = m_t\left(1 + \frac{pA_1}{t}\right) + pA_2.$$
The (first-order) solutions of this recurrence are
$$m_n \sim \begin{cases} \dfrac{pA_2}{1 - pA_1}\, n, & pA_1 < 1,\\[4pt] pA_2\, n\log n, & pA_1 = 1. \end{cases}$$

Theorem 4. If $pA_1 < 1$, then aas the number of edges is concentrated around its expected value: $M_n = m_n(1 + o(1))$.

The following lemma (whose proof is left to the long version of the paper) is used in the proof of Theorem 4, and proves Theorem 3(c).

Lemma 4. For all $v_j$, $j > 0$, and $K > 0$,
$$\mathbb{P}\bigl(d^-(v_j,n) \ge K\log^2 n\,(n/j)^{pA_1}\bigr) = O\bigl(n^{-Ke^{-18}}\bigr).$$

Proof of Theorem 4. We count the number of edges by counting the in-degrees of the nodes. Our approach is as follows: by Theorem 1, wep the number $N_{i,n}$ of nodes of in-degree $i$ at time $n$ is concentrated for every $i \le i_f$. Let $a$ be the solution of $(n/a)^{pA_1} = i_f$, and let $\omega' = (K\log^2 n)^{1/(pA_1)}$ be the solution of
$$K\log^2 n\left(\frac{n}{a\omega'}\right)^{pA_1} = \left(\frac{n}{a}\right)^{pA_1},$$
where $K \ge 4e^{18}$. From Lemma 4, with probability $1 - O(n^{-3})$ no node $v_j$ with $j \ge a\omega'$ has in-degree exceeding $i_f$. Let $\mu(n) = \sum_{i \le i_f} i\,\mathbb{E}N_{i,n}$, and let $\lambda(n) = \sum_{j=1}^{a\omega'} d^-(v_j,n)$. We prove, conditional on the event of Lemma 4, that $\lambda(n) = o(m_n)$, and thus the number of edges is concentrated around $m_n$. For $pA_1 < 1$ we have
$$\lambda(n) = \sum_{j=1}^{a\omega'} d^-(v_j,n) \le K\omega^2\sum_{j=1}^{a\omega'}\left(\frac{n}{j}\right)^{pA_1} = O\!\left(\frac{1}{1 - pA_1}\right)\log^{2/(pA_1)}(n)\, n^{pA_1} a^{1 - pA_1}$$
$$= O\!\left(\frac{1}{1 - pA_1}\right)\log^{2/(pA_1) + 4(1 - pA_1)/(pA_1)}(n)\, n^{(7pA_1 + 1)/(6pA_1 + 2)} = o(n),$$
since $(7pA_1 + 1)/(6pA_1 + 2) < 1$ when $pA_1 < 1$. On the other hand, $\mu(n) \ge cn$ for some constant $c > 0$, so the contribution $\lambda(n)$ is negligible. □

4 A geometric small world property

In Section 2 it was shown that the number of nodes of in-degree zero in a graph $G_n$ generated by the SPA model is linear in $n$. Also, with positive probability a new node will land in an area of $S$ not covered by any influence region, and thus have out-degree zero. Combining these observations, $G_n$ typically contains isolated nodes, and therefore its underlying undirected graph is not connected. In fact, we expect that for the majority of distinct pairs $u, v$ there will not be a directed path from $u$ to $v$.
Since this is a property also observed in the web graph, it does not detract from the SPA model, but rather indicates that we should consider a measure other than diameter to capture a "small world" property. Thus, we focus on the (geometric) distance in $S$ spanned by the links. For a pair of points $u, v \in S$, let $L(u,v)$ be the length of the shortest curve in $S$ that connects $u$ and $v$. Define
$$L_t = \sum_{(v_t, v_i) \in E_t} L(v_t, v_i);$$
that is, $L_t$ is the sum of the lengths of the new edges added at time $t$ in the SPA model. Note that $L_t$ is a real-valued (rather than integer-valued) random variable.

Theorem 5. Suppose that $pA_1 > 2/3$. Then
$$\mathbb{E}(L_t) = \Theta\left(t^{-(1 - pA_1)/(pA_1)}\right) = \Theta\left(t^{1 - 1/(pA_1)}\right).$$

To prove Theorem 5 we need the following lemma, whose (straightforward) proof is omitted.

Lemma 5. Let $u$ be chosen uar from a cap with centre $v$ and area $\alpha$. If $X$ is the distance between $u$ and $v$, measured over the surface of $S$, then $\mathbb{E}(X) = \frac{2}{3}\sqrt{\alpha/\pi}$.

Proof of Theorem 5. Define
$$Z_{j,t} = \begin{cases} L(v_t, v_j) & \text{if } (v_t, v_j) \in E_t,\\ 0 & \text{otherwise.} \end{cases}$$
Then $L_t = \sum_{j=1}^{t-1} Z_{j,t}$. Let $B_{t,j}$ be the event that $(v_t, v_j) \in E_t$, and let $\overline{B}_{t,j}$ be its complement. Then, using Lemma 5, we have that
$$\mathbb{E}(Z_{j,t+1} \mid G_t) = \mathbb{P}(B_{t+1,j})\,\mathbb{E}(Z_{j,t+1} \mid G_t, B_{t+1,j}) + \mathbb{P}(\overline{B}_{t+1,j})\,\mathbb{E}(Z_{j,t+1} \mid G_t, \overline{B}_{t+1,j})$$
$$= \mathbb{P}(B_{t+1,j})\,\mathbb{E}(L(v_{t+1}, v_j) \mid G_t, B_{t+1,j})$$
$$= p\,\frac{A_1 d^-(v_j,t) + A_2}{t}\cdot\frac{2}{3}\sqrt{\frac{A_1 d^-(v_j,t) + A_2}{\pi t}}$$
$$= \frac{2p}{3\sqrt{\pi}}\left(\frac{A_1 d^-(v_j,t) + A_2}{t}\right)^{3/2},$$
where the second equality follows from the definition of $Z_{j,t+1}$, and the third follows from Lemma 5 and the definition of the model (conditional on $B_{t+1,j}$, the node $v_{t+1}$ is uniform in $R(v_j,t)$). Thus
$$\mathbb{E}(L_{t+1} \mid G_t) = \sum_{k=0}^{t}\ \sum_{\{j : d^-(v_j,t) = k\}}\mathbb{E}(Z_{j,t+1} \mid G_t) = \frac{2p}{3\sqrt{\pi}}\sum_{k=0}^{t}\left(\frac{A_1 k + A_2}{t}\right)^{3/2} N_{k,t}. \qquad (13)$$
Taking expectations on both sides, and using that $c_k = c\,k^{-(1 + 1/(pA_1))}(1 + o(1))$, we have that
$$\mathbb{E}(L_{t+1}) = \frac{2p}{3\sqrt{\pi}}\sum_{k=0}^{t}\left(\frac{A_1 k + A_2}{t}\right)^{3/2}\mathbb{E}(N_{k,t})$$
$$= \frac{2p}{3\sqrt{\pi t}}\sum_{k=0}^{t}(A_1 k + A_2)^{3/2} c_k (1 + o(1))$$
$$= \frac{2pc}{3\sqrt{\pi t}}\int_0^t x^{1/2 - 1/(pA_1)}(1 + o(1))\,dx$$
$$= \Theta\bigl(t^{1 - 1/(pA_1)}\bigr),$$
where the second equality follows by Theorem 1, equation (2).
The last step is justified since it can be shown that the $o(1)$ term in the integrand is in fact $O(x^{-\varepsilon})$ for some $\varepsilon > 0$; moreover, the assumption $pA_1 > 2/3$ ensures that $1/2 - 1/(pA_1) > -1$, so that the integral is of order $t^{3/2 - 1/(pA_1)}$. □

Theorem 5 contrasts with the analogous result for graphs generated by a process similar to the SPA model, but where all influence regions have area $d/t$ for a constant $d > 0$. We call this a threshold model. In the threshold model, $\mathbb{E}(L_t)$ decreases much faster than in the SPA model when $pA_1$ is large, for example when $p > 2/3$ and $A_1 = 1$. In particular, if $pA_1 = 1$, then in the SPA model $\mathbb{E}(L_t) = \Theta(1)$.

Theorem 6. In the threshold model with areas of influence $d/t$, where $d$ is a constant, $\mathbb{E}(L_t) \sim c\,t^{-1/2}$ for some constant $c > 0$.

Proof. With the same notation as in the proof of Theorem 5 and using Lemma 5, we have that
$$\mathbb{E}(Z_{j,t+1} \mid G_t) = \mathbb{P}(B_{t+1,j})\,\mathbb{E}(L(v_{t+1}, v_j) \mid B_{t+1,j}) = \frac{2d}{3t}\sqrt{\frac{d}{\pi t}}.$$
Hence,
$$\mathbb{E}(L_{t+1} \mid G_t) = \sum_{j=1}^{t}\mathbb{E}(Z_{j,t+1} \mid G_t) = \frac{2d}{3}\sqrt{\frac{d}{\pi t}} = \Theta(t^{-1/2}).$$
Taking expectations completes the proof. □

5 Conclusions and further work

We have proved that graphs produced by the SPA model have some of the graph properties observed in real-world complex networks: a power law in-degree distribution, and constant average degree. In future work, we will investigate additional graph properties, such as the expected length of a directed path between two nodes (when such a path exists), expansion properties, and spectral values. We are also interested in aspects suggesting self-similarity: is it true that the subgraph induced by all nodes that fall in a certain compact region of the sphere $S$ shares some of the graph properties of the whole graph?

Several generalizations of this model may be proposed. An undirected version could be developed, where the link probability depends on the influence regions of both endpoints. In a more realistic model, both the addition of edges without adding a node and the deletion of edges and nodes should be incorporated.
The effect of replacing $S$ with other underlying geometric spaces, either with boundaries or of higher dimension, would be interesting to investigate. Last but not least, a realistic spatial model makes it possible to reverse engineer real-life networks: given a real-life network, and assuming a spatial graph model by which the network was generated, it should be possible to give reliable estimates of the positions of the nodes in space. This direction has important applications to web graph clustering and the development of link-based similarity measures.

References

1. A. Bonato, A survey of web graph models, In: Proceedings of Combinatorial and Algorithmic Aspects of Networking, 2004.
2. F.R.K. Chung, L. Lu, Complex Graphs and Networks, American Mathematical Society, 2006.
3. C. Cooper, The age specific degree distribution of web-graphs, Combinatorics, Probability and Computing 15 (2006) 637–661.
4. A. Flaxman, A.M. Frieze, J. Vera, A geometric preferential attachment model of networks, Internet Mathematics 3 (2006) 187–205.
5. A. Flaxman, A.M. Frieze, J. Vera, A geometric preferential attachment model of networks II, preprint.
6. S. Janson, T. Luczak, A. Ruciński, Random Graphs, Wiley, New York, 2000.
7. F. Menczer, Lexical and semantic clustering by Web links, JASIST 55(14) (2004) 1261–1269.
8. M. Penrose, Random Geometric Graphs, Oxford University Press, Oxford, 2003.
9. B. Pittel, J. Spencer, N. Wormald, Sudden emergence of a giant k-core in a random graph, Journal of Combinatorial Theory, Series B 67 (1996) 111–151.
10. N. Wormald, The differential equation method for random graph processes and greedy algorithms, In: Lectures on Approximation and Randomized Algorithms, eds. M. Karoński and H. J. Prömel, PWN, Warsaw, 1999, 73–155.
