13 Network Models: Nadine Baumann and Sebastian Stiller


13 Network Models

Nadine Baumann and Sebastian Stiller

The starting point in network analysis is not primarily the mathematically defined object of a graph, but rather almost everything that in ordinary language is called a network. These networks, which occur in biology, computer science, economics, physics, and everyday life, belong to what is often called the real world. Finding suitable models for the real world is the primary goal here. The analyzed real-world networks mostly fall into three categories.

The biggest fraction of research work is devoted to the Internet, the WWW, and related networks. The HTML pages and their links (WWW), the newsgroups and the messages posted to two or more of them (USENET), the routers and their physical connections, the autonomous systems, and several more are examples from this scope of interest.

In biology, in particular chemical biology, including genetics, researchers encounter numerous structures that can be interpreted as networks. Some of these show their net structure directly, at least under a microscope. But some of the most notorious biological networks, namely the metabolic networks, are formed more subtly. Here the vertices model certain molecules, and edges represent chemical reactions between these molecules in the metabolism of a certain organism. In the simplest case, two vertices are connected if there is a reaction between those molecules.

Sociological networks often appear without scientific help. We think of cronyism and other (usually malign) networks in politics and the economy, we enjoy being part of a circle of friends, we get lost in the net of administration, and networking has become a publicly acknowledged sport. The trouble, for scientists as for everyone else, is to get exact data. How can we collect the data of a simple acquaintance network for a whole country, or even for a big city? For some networks, however, the data is available in electronic form.
For example, the collaboration of actors in movies, and co-authorship and citation in some research communities, partly owe their scientific attraction to the availability of the data.

Many, but not all, of these examples from different areas have some characteristics in common. For example, metabolic networks, the WWW, and co-authorship networks often have very few vertices of very high degree, some of considerable degree, and a huge number of vertices of very low degree. Unfortunately, the data is sometimes forced to fit into that shape, or even mischievously interpreted to show a so-called power law. Often deeper results are not only presented without proof, but are also based only on so-called experimental observations.

Yet one feature can be regarded as prevalent without any alchemy: most real-world networks are intrinsically historical. They did not come into being as a complete and fixed structure at one single moment in time; they developed step by step. They emerged. Therefore, on the one hand, it makes sense to understand the current structure as the result of a process. On the other hand, one is often more interested in the network's future than in any one of its single states. Therefore several models have been developed that define a graph, or a family of graphs, via a process in the course of which it emerges.

The mathematical models for evolving networks are developed with three main intentions. First of all, the model should meet or advocate a certain intuition about the nature of the development of the real-world network. Secondly, the model should be mathematically analyzable. A third objective is to find a model that is well suited for computational purposes, i.e., to simulate the future development or to generate synthetic instances resembling the real network, for example to test an algorithm.

There are several overviews, in particular on models for the Internet and WWW networks (see [164, 68] for a more mathematically inclined overview). Some of these papers already exceed this chapter in length. It hardly pays, and it is virtually impossible, to mention all models, experimental conjectures, and results. We rather intend to endow the reader with the knowledge necessary to spark her own research.

We proceed in four sections. In the first section the founding questions, driving ideas, and predominant models are summarized. Then, in the second section, we compile some methods that are deemed to be, or have proven to be, fruitful in analyzing the structure of a network as a whole. Third, we broaden our scope in order to exemplify the great variety of models for evolving networks. The last section is devoted to state-of-the-art generators for networks that resemble the Internet.

Unless stated otherwise, we consider graphs to be directed. Some graph processes may generate multigraphs, which should be clear from the context.

U. Brandes and T. Erlebach (Eds.): Network Analysis, LNCS 3418, pp. 341–372, 2005. © Springer-Verlag Berlin Heidelberg 2005

13.1 Fundamental Models

13.1.1 The Graph Model $(G_{n,p})$

First we want to discuss the historical starting point of random graph theory. More precisely, we define the graph model $(G_{n,p})$. A graph model is a set of graphs endowed with a probability distribution. In this case the graphs under consideration are undirected. The following three graph models stochastically converge to each other as $n \to \infty$:

1. The first way to generate a random graph is to choose a graph uniformly at random among all graphs with a given number of vertices $n$ and average vertex degree $z$.

2. Alternatively, choose every edge of the complete graph on $n$ vertices with probability $p$ to be part of $E(G)$, where $\frac{2p\binom{n}{2}}{n} = p(n-1) =: z$ is the expected average degree. This model is denoted by $(G_{n,p})$.

3. In the third method, $n$ vertices $v_i$ are added successively, deciding for each $v_i$ and for each $j < i$, with probability $p$, whether to put $\{v_i, v_j\}$ into the edge set.

The last one is an interpretation of the second as a graph process; see Section 13.1.4 for more details about graph processes. The first is, of course, more restrictive, because the average degree is fixed rather than just expected, as in the other two models. Still, these models converge. The first model may be more intuitive, the second is often more suitable for analysis, and the third views the graph as the outcome of a process. These three aspects are also important for the other models we discuss in this section: some models best capture our intuition about the real world, others are superior in mathematical tractability, and, third, networks in the real world very often are structures that emerged from a process rather than popped up as a whole.

There is a myriad of literature and highly developed theory on $(G_{n,p})$ and related models. It turns out that a graph chosen according to this distribution shows a number of interesting characteristics with high probability. On the other hand, precisely because of these characteristics, this graph model has often been disqualified as a model for real-world networks, which usually do not show them. For example, without deep mathematical consideration one can see that the majority of the vertices will have almost or exactly the average degree. For many networks in the real world this is not the case. Still, our interest in this model is more than historical. We should state at least one fundamental and very illuminating result on $(G_{n,p})$ graphs. Let $G_{n,p}$ denote a fixed graph generated by one of these models.

Theorem 13.1.1. Let $m$ be the expected number of edges in a $G_{n,p}$, i.e., $m = p\binom{n}{2}$. If $m = \frac{n}{2}\left(\log n + \omega(n)\right)$, then for $\omega(n) \to -\infty$ the graph $G_{n,p}$ is disconnected with high probability, and for $\omega(n) \to \infty$ it is connected with high probability.

This chapter will extensively treat degree sequences. Therefore we state the following immediate fact about the probability distribution $p$ of the degree $k$ of a vertex in a $(G_{n,p})$ graph. We use $z$ or $z_1$ to denote the average degree of a vertex in the graph under consideration. For large $n$ and fixed $z$,

$$p(k) = \binom{n-1}{k} p^k (1-p)^{n-1-k} \approx \frac{z^k e^{-z}}{k!}.$$

After this classical mathematical model, let us turn to a topic strongly inspired by the real world: the concept of a small world.

13.1.2 Small World

One of the starting points of network analysis is a sociological experiment conducted to verify the urban legend that everyone knows everyone else indirectly through just a few mediators. To scrutinize this assumption, Milgram [421] asked several people in the US to deliver a message just by passing it on to people they knew personally. The senders and mediators knew nothing about the recipient except his name, profession, and the town he lived in. These messages reached their destination on average after roughly five or six mediators, justifying the popular claim of six degrees of separation. The world, at least in the US, appears to be small.

The notion of a small world has since become technical, usually encompassing two characteristics. First, the average shortest-path distance over all pairs of vertices in a small-world network has to be small, where small is conceptualized as growing at most logarithmically with the number of vertices. In this sense $(G_{n,p})$ graphs (see Section 13.1.1) are small even for small values of $p$, and the sociological observation would come as no surprise. But in a vicinity network like the one the sociological experiment was conducted on, a huge fraction of the people one knows personally also know each other personally. Mathematically speaking, a network shows this second, worldly aspect of a small world if it has a high clustering coefficient, whereas in a $(G_{n,p})$ graph the clustering coefficient tends to zero (for fixed average degree $z$ it is about $p = z/(n-1)$). (The clustering coefficient gives the fraction of pairs of neighbors of a vertex that are adjacent, averaged over all vertices of the graph. For a precise definition of the clustering coefficient and a related graph statistic, called transitivity, see Section 11.5.)

A very popular abstract model of small-world networks, i.e., of graphs with clustering coefficient bounded from below by a constant and logarithmically growing average path distance, is obtained by a simple rewiring procedure. Start with the $k$th power of an $n$-cycle, denoted by $C_n^k$. The $k$th power of a cycle is a graph in which each vertex is adjacent not only to its direct neighbors but to its $k$ nearest neighbors to the right and its $k$ nearest neighbors to the left.
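The contrast between the two kinds of graphs can be checked directly. The following is a minimal sketch (all helper names are hypothetical) that builds $C_n^k$ and a $(G_{n,p})$ graph with the same average degree $z = 2k$ and compares their clustering coefficients. It assumes $n > 2k$, so that the cycle power has no repeated neighbors, and uses the common convention that vertices with fewer than two neighbors contribute 0 to the average.

```python
import random
from itertools import combinations

def cycle_power(n, k):
    """Adjacency sets of C_n^k: vertex i is joined to i +- 1, ..., i +- k (mod n)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def gnp(n, p, rng):
    """Undirected G(n,p): each of the n(n-1)/2 possible edges independently with prob. p."""
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def clustering(adj):
    """Average over all vertices of the fraction of adjacent neighbor pairs;
    vertices with fewer than two neighbors contribute 0 (one common convention)."""
    total = 0.0
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += links / (d * (d - 1) / 2)
    return total / len(adj)

rng = random.Random(1)
n, k = 1000, 3
z = 2 * k                                     # average degree of C_n^k
print(clustering(cycle_power(n, k)))          # 3(k-1)/(2(2k-1)), i.e. 0.6 up to rounding
print(clustering(gnp(n, z / (n - 1), rng)))   # close to p = z/(n-1), i.e. near 0
```

For $C_n^3$, each vertex has 6 neighbors, and 9 of the 15 neighbor pairs are adjacent, giving $9/15 = 0.6$ independently of $n$.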
Decide for each edge independently, with a given probability $p$, whether to keep it in place or to rewire it, i.e., to replace the edge $\{a, b\}$ by an edge $\{a, c\}$, where $c$ is chosen uniformly at random from the vertex set. The description contains a harmless ambiguity. Viewing the rewiring process as iteratively passing through all vertices, one may choose an edge to be rewired from either of its endpoints, and it is not a priori clear how to handle these ties. The natural way to straighten this out is the following: visit the vertices iteratively in some order, and make the rewiring decisions for each of the currently incident edges. Therefore, strictly speaking, the model depends on the order in which the vertex set is traversed. Still, the reader may be confident that this does not affect the quantities we are interested in, namely the average shortest-path distance and the clustering coefficient $C$.

For small $p$ the clustering coefficient stays virtually that of $C_n^k$. To be more precise, for small $k$ and $p$ and large $n$:

$$C(G_{\text{rewired}}) = C(C_n^k)\left(1 - \frac{p}{2k}\right),$$

as the $p$th fraction of an average of the $2k$ neighbors' contribution is removed from the numerator. On the other hand, the average path distance in such a graph decreases quickly (as $p$ increases) from the original $\frac{n}{4k}$ (on average one has to walk a quarter of the circle by steps of length $k$, except maybe for the last) to small values, claimed [573] to be in $O(\log n)$ (compare Figure 13.1).

Fig. 13.1. Clustering coefficient and path lengths for the small-world model by Watts and Strogatz. Found at: http://backspaces.net/PLaw/. Results are from 2,000 random graphs, each with 300 vertices and 900 edges

Unfortunately, these figures were obtained and verified only empirically. The chart suggests that calculating the second moments of the distributions would be desirable, as the lower cloud of points, i.e., the average shortest-path distances, appears far less stable. Maybe the most important problem with such experimental figures is that they can hardly distinguish between, for example, a logarithmic and a $\sqrt{n}$ behavior.

A weakness of the rewiring model, and thus of the whole definition of small-world graphs, is that by fixing the number of random edges and enlarging $k$, the clustering coefficient can be kept artificially high, whereas the average path distance depends essentially only on the number of random edges relative to $n$. An increase in short deterministic edges does not contribute to the average path distance, except for a constant: on average, the number of steps needed to go from one long-range edge to the next becomes smaller only by a constant factor. Sociologically speaking, having many friends in the neighborhood brings you only a constant closer to the Dalai Lama.

13.1.3 Local Search

Revisiting the sociological experiment, one may not be satisfied that the theoretical explanation only accounts for the existence of short paths. The fact that the letters reached their destination within a few steps requires short paths not only to exist, but also to be detectable by agents in the vicinity network who are ignorant of its global structure. This led Kleinberg to the idea of a local algorithm. Roughly speaking, a local algorithm should act, for example crawl a network, step by step without knowing the whole structure. In each step only a specific, local part of the whole data should be used to reach the current decision. A definition of a local algorithm for the specific problem will be given in a moment.

The real-world vicinity network is idealized by a parameterized network model that is easily seen to have a high and constant clustering coefficient. It is once again a network comprised of short deterministic and long random edges, modeling the intuition that we know our neighborhood and have some occasional acquaintances. The aim is to determine the parameters under which there exists a local algorithm capable of finding a path of on average logarithmic length for a randomly chosen pair of vertices.

Model for Local Search. The network $G(V, E)$ is parameterized by $n$, $p$, $q$, and $r$. The vertex set $V$ contains the points of a 2-dimensional $n \times n$ lattice. On the one hand, $E$ contains bi-directed arcs between each vertex and its $2p$ closest horizontal and $2p$ closest vertical neighbors. On the other hand, for each vertex $v$ there are $q$ directed arcs of the form $(v, x) \in E$, where $x$ is chosen out of $V \setminus \{v\}$ according to the distribution

$$\Pr[x] = \frac{d(v,x)^{-r}}{\sum_{y \neq v} d(v,y)^{-r}},$$

where $d(x, y)$ denotes the minimum number of steps to go from $x$ to $y$ on the grid and $r \geq 0$ is a constant. We call such a network $G_K(n, p, q, r)$ a Kleinberg-Grid. (Note that for $p = 1$ the clustering coefficient is 0, but for $p > 1$ it is greater than 0 and essentially independent of $n$.)
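The long-range part of a Kleinberg-Grid is easy to state as code. The following is a minimal sketch, assuming 0-based grid coordinates $(i, j)$ and that the $q$ arcs of a vertex are drawn independently (so repeated heads are possible), a detail the model description leaves open. All function names are hypothetical.

```python
import random

def grid_distance(a, b):
    """Lattice distance d(a, b): minimum number of grid steps between a and b."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def contact_distribution(n, v, r):
    """All x != v of the n x n grid with Pr[x] = d(v, x)^(-r) / sum_y d(v, y)^(-r)."""
    targets = [(i, j) for i in range(n) for j in range(n) if (i, j) != v]
    weights = [grid_distance(v, x) ** (-r) for x in targets]
    total = sum(weights)
    return targets, [w / total for w in weights]

def long_range_arcs(n, v, q, r, rng):
    """Heads of the q long-range arcs (v, x), drawn independently from Pr[x]."""
    targets, probs = contact_distribution(n, v, r)
    return rng.choices(targets, weights=probs, k=q)

def mass_within(n, v, r, radius):
    """Probability that one long-range arc of v ends within distance `radius` of v."""
    targets, probs = contact_distribution(n, v, r)
    return sum(pr for x, pr in zip(targets, probs) if grid_distance(v, x) <= radius)

rng = random.Random(0)
print(long_range_arcs(50, (25, 25), 3, 2, rng))
for r in (0, 2, 4):
    print(r, mass_within(50, (25, 25), r, 5))   # larger r keeps the arcs shorter
```

For $r = 0$ the head is uniform over $V \setminus \{v\}$; increasing $r$ shifts probability mass toward the neighborhood of $v$.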

Local Algorithm. The following notion of a local algorithm is not very general, but rather tailor-made for the above model. A local algorithm provides a rule that gives, at each vertex of the path to be output in the end, the subsequent vertex, based only on the following types of information:

Global Knowledge
- The structure of the underlying grid.
- The position of the destination vertex in the underlying grid.

Local Knowledge
- The positions of the current vertex in the underlying grid and of its neighbors in the whole network (i.e., including its long-range connections).
- The positions of all vertices visited so far, including their neighbors' positions.

Results. The local algorithm Kleinberg analyzes is the most natural one, which gives even more explanatory power for the sociological experiment: every recipient of the message passes it on to the neighbor that is closest to the destination in $d(\cdot, \cdot)$. Call this the Kleinberg-Algorithm.
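The Kleinberg-Algorithm can be simulated with the idea of postponed decisions used in the proof sketch that follows: the long-range arcs of a vertex are drawn only when the message reaches it. This is valid because greedy routing always moves strictly closer to the destination and so never revisits a vertex. The following is a minimal sketch, assuming 0-based coordinates, independent draws for the $q$ arcs, and ties broken by list order. All function names are hypothetical.

```python
import random

def dist(a, b):
    """Lattice distance d(a, b) on the grid."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def local_neighbors(v, n, p):
    """Grid neighbors of v: up to p steps horizontally and up to p steps vertically."""
    i, j = v
    out = []
    for s in range(1, p + 1):
        for a, b in ((i + s, j), (i - s, j), (i, j + s), (i, j - s)):
            if 0 <= a < n and 0 <= b < n:
                out.append((a, b))
    return out

def long_range_contacts(v, n, q, r, rng):
    """q heads drawn independently with Pr[x] proportional to d(v, x)^(-r), x != v."""
    targets = [(a, b) for a in range(n) for b in range(n) if (a, b) != v]
    weights = [dist(v, x) ** (-r) for x in targets]
    return rng.choices(targets, weights=weights, k=q)

def greedy_route(n, p, q, r, s, t, rng):
    """Kleinberg-Algorithm from s to t: always move to the neighbor closest to t.
    Long-range arcs are drawn lazily (postponed decisions). Returns the step count."""
    steps, v = 0, s
    while v != t:
        nbrs = local_neighbors(v, n, p) + long_range_contacts(v, n, q, r, rng)
        v = min(nbrs, key=lambda x: dist(x, t))   # a grid neighbor is always 1 closer
        steps += 1
    return steps

rng = random.Random(7)
n = 30
pairs = [((rng.randrange(n), rng.randrange(n)), (rng.randrange(n), rng.randrange(n)))
         for _ in range(30)]
for r in (0, 2, 4):
    avg = sum(greedy_route(n, 1, 1, r, s, t, rng) for s, t in pairs) / len(pairs)
    print(f"r = {r}: average greedy path length {avg:.1f}")
```

On grids this small, the asymptotic separation stated in Theorem 13.1.2 is only hinted at. The sketch shows the mechanics of the algorithm, not the bounds.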

Theorem 13.1.2. Let $p, q$ be fixed. Then the following holds for every Kleinberg-Grid $G_K(n, p, q, r)$:
- For $r = 0$, every local algorithm finds paths of average length in $\Omega(n^{2/3})$.
- For $0 < r < 2$, every local algorithm finds paths of average length in $\Omega(n^{(2-r)/3})$.
- For $r = 2$, the Kleinberg-Algorithm finds paths of average length in $O(\log^2 n)$.
- For $r > 2$, every local algorithm finds paths of average length in $\Omega(n^{(r-2)/(r-1)})$.

Sketch of Proof. Though the negative results of Theorem 13.1.2 (that no local algorithm can find a path of the desired length) are the intriguing ones, it is the proof of the positive result that gives us the required insight, and it is sketched here. In order to estimate the average number of steps the Kleinberg-Algorithm takes for $r = 2$, fix the destination vertex $x$ and subdivide the vertex set into subsets $U_k$ of vertices $v$ with $2^{k-1} \le d(x, v) < 2^k$. The algorithm always proceeds to a vertex that is closer to $x$ than the current vertex. Thus, once it reaches a subset $U_i$, it will only advance to subsets $U_j$ with $j \le i$. As the total number of subsets grows logarithmically with $n$, we are done if we can show that the algorithm needs at most $O(\log n)$ steps in expectation to leave a subset $U_k$, independent of $k$. As the subset $U_k$ can be very big, we interpret leaving a subset as finding a vertex that has a random edge into $\bigcup_{i<k} U_i$. As the algorithm visits every vertex at most once, we can apply the technique of postponed decisions, i.e., choose the random edge of a vertex $v$ only when we reach $v$. To obtain the same bound at every level $k$, the probability for $v \in U_k$ to have a random contact within distance $2^{k-1}$ of $x$ must be bounded from below by the same quantity, of order $1/\log n$, for all $k$. For a 2-dimensional lattice this is true if and only if $r = 2$.

At this point the result of Theorem 13.1.2 seems generalizable to other dimensions, where $r$ should always equal the dimension. This can easily be seen for dimension 1. The details of the proof and the negative results may be more difficult to generalize.
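The key probability in the sketch can be bounded explicitly. The following is a rough version of the counting argument for $r = 2$, $q \ge 1$, and a current vertex $v \in U_k$, with constants not optimized. Here $B_k = \{y : d(x, y) < 2^{k-1}\}$ is $x$ together with $\bigcup_{i<k} U_i$. The argument uses two facts: at most $4j$ grid vertices lie at distance exactly $j$ from $v$, and, because $2^{k-1} \le d(x,v) \le 2n - 2$, the set $B_k$ contains at least $c\,2^{2k}$ grid vertices for some absolute constant $c > 0$.

$$Z_v = \sum_{y \ne v} d(v,y)^{-2} \le \sum_{j=1}^{2n-2} 4j \cdot j^{-2} = 4\sum_{j=1}^{2n-2} \frac{1}{j} \le 4\,(1 + \ln 2n),$$

$$y \in B_k \implies d(v,y) \le d(v,x) + d(x,y) < 2^{k} + 2^{k-1} < 2^{k+1},$$

$$\Pr[\text{a long-range arc of } v \text{ ends in } B_k] \ge \frac{|B_k| \cdot (2^{k+1})^{-2}}{Z_v} \ge \frac{c\,2^{2k}}{2^{2k+2} \cdot 4\,(1 + \ln 2n)} = \frac{c}{16\,(1 + \ln 2n)}.$$

With decisions postponed, each step taken inside $U_k$ therefore leaves it with probability $\Omega(1/\log n)$, independently of $k$. Hence $U_k$ is left after $O(\log n)$ expected steps, and with $O(\log n)$ levels the $O(\log^2 n)$ bound of Theorem 13.1.2 follows.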
The above proof already hints at why the negative results hold in dimension 2. If $r > 2$, the random arcs are on average too short to reach the next subset quickly enough when the algorithm is in a big and far-away subset. On the other hand, $r < 2$ puts too much of the probability mass on long-reaching arcs: the algorithm will encounter many arcs that take it far beyond the target, but too rarely one that takes it to a subset closer to the target. In general, the distribution must pay sufficient respect to the underlying grid structure to allow a local algorithm to make use of the random arcs, but it still needs to be far-reaching enough. It seems worthwhile to conduct such an analysis on other, more life-like, models of the vicinity network.

13.1.4 Power Law Models

As already described in Section 11.1, there is wide interest in finding graphs in which the fraction of vertices of a specified degree $k$ follows a power law. That means that the degree distribution $p$ is of the form

$$p(k) = c\,k^{-\gamma}, \qquad \gamma > 0,\ c > 0.$$
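The normalizing constant $c$ depends on $\gamma$ and on the support. The following is a minimal sketch, using a hypothetical helper that truncates the support to $1 \le k \le k_{\max}$. For a pure power law on $k \ge 1$, $c$ converges to $1/\zeta(\gamma)$ as $k_{\max}$ grows if $\gamma > 1$, but tends to $0$ if $\gamma \le 1$. The optional factor $q^k$ with $0 < q < 1$, which is the exponential cutoff discussed below, keeps $c$ bounded away from zero for every $\gamma > 0$.

```python
def power_law_pmf(gamma, kmax, q=1.0):
    """Truncated law p(k) = c * k**(-gamma) * q**k for k = 1..kmax.

    q = 1.0 gives the pure power law; 0 < q < 1 adds an exponential cutoff.
    Returns (c, pmf), where c makes the truncated pmf sum to one.
    """
    weights = [k ** (-gamma) * q ** k for k in range(1, kmax + 1)]
    c = 1.0 / sum(weights)
    return c, [c * w for w in weights]

for kmax in (10**3, 10**5):
    c_pure, _ = power_law_pmf(2.5, kmax)           # tends to 1/zeta(2.5), about 0.745
    c_flat, _ = power_law_pmf(0.8, kmax)           # keeps shrinking: not normalizable
    c_cut, _ = power_law_pmf(0.8, kmax, q=0.99)    # stabilizes thanks to the cutoff
    print(kmax, c_pure, c_flat, c_cut)
```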

This describes a distribution in which most vertices have a small degree, some vertices have a medium degree, and only very few vertices have a very high degree. Power laws have been observed not only for degree distributions but also for other graph properties. The following dependencies (according to [197]) can especially be found in the Internet topology:
1. the degree of a vertex as a function of its rank, i.e., of the position of the vertex in the list of vertex degrees sorted in decreasing order;
2. the number of vertex pairs within a neighborhood as a function of the neighborhood size (in hops);
3. the eigenvalues of the adjacency matrix as a function of their rank.

A more detailed view of these power laws found in Internet topologies is given in Section 13.4. Since in the literature the most interesting fact seems to be the degree distribution, or equivalently the number of vertices that have a certain degree $k$, we will focus mostly on this.

In some contexts (protein networks, e-mail networks, etc.) we can observe an additional factor $q^k$ with $0 < q < 1$ in the power law, the so-called exponential cutoff (for details see [448]). When a degree distribution is fitted to this special form, $p(k) = c\,k^{-\gamma} q^k$, the power law obtains a lower exponent than would be attained otherwise. A power law with an exponential cutoff can be normalized, with finite mean, even if the exponent lies in $(0, 2]$. Since the strict power law, i.e., the form without cutoff, is more fundamental and mathematically more explicit, we will in the following restrict ourselves to models that construct networks with power laws without exponential cutoff. We start by describing the best-known preferential attachment model and then give some modifications of it, as well as other models.

Preferential Attachment Graphs. In many real-life networks we can observe two important facts: growth and preferential attachment. Growth happens because networks like the WWW, friendships, etc. grow with time.
Every day more websites go online, and someone finds new friends. An often-made observation in nature is that already highly connected vertices are likely to become even more connected than vertices with small degree. It is more likely that a new website inserts a link to a well-known website like Google than to some private homepage. One could argue that someone who already has a lot of friends easily gets more new friends than someone with only a few friends: the so-called rich-get-richer phenomenon. This is modeled by a preferential attachment rule.

One of the first models to tackle these two characteristics is the preferential attachment model presented by Barabási and Albert in [40].

Graph Process. Formally speaking, a graph process $(\mathcal{G}^t)$ is a sequence of sets $\mathcal{G}^t$ of graphs (called states of the process), each endowed with a probability distribution. The sets and their distributions are defined recursively by some rule of evolution. More intuitively, one thinks of a graph process as the different ways in which a graph can develop over the time states.

In [40] a graph process $(G_m^t)$ is described in this intuitive way as the history of a graph $G = (V, E)$. At every point in time, one vertex $v$ with outdegree $m$ is added to the graph $G$. Each of its outgoing edges connects to some vertex $i \in V$ chosen by a probability distribution proportional to the current degree or indegree of $i$. Formally, this description gives a rule for how any graph of a certain state of the process is transformed into a graph of the next state. Further, this rule of evolution prescribes, for any graph of a state of the graph process, the probabilities with which it transforms into a certain graph of the next state. In this way, the sets and distributions of the graph process are recursively defined.

Unfortunately, the above description from [40] entails some significant imprecisions, which we discuss now. First, the choice of the initial state (which usually contains exactly one graph) is a non-negligible matter. For example, taking $m = 1$, if the graph is disconnected at the beginning of the sequence, then any emerging graph is also disconnected. In contrast, any connected graph stays connected. Moreover, we need at least one vertex to which the $m$ new edges can connect. But it is not defined how to connect to a vertex without any edge, since its probability is zero. Thus, there must be at least one loop at that vertex, or some other rule for how to connect to this vertex. Secondly, one has to spell out how the distribution is to be proportional to the degree. In particular, it has to be clear whether and how the new vertex is already part of the vertex set $V$.
If it is excluded, no loops can occur. (Note that for $m = 1$ loops are the only elementary cycles possible.) If it is an element of $V$, it is usually counted as if it had degree $m$, though its edges are not yet connected to their second endpoints. Moreover, if $m > 1$, one has to define a probability distribution on the set of all $\binom{|V|}{m}$ or $\binom{|V|+1}{m}$ possible ways to connect, which is not sufficiently defined by requiring proportionality to the degree for each single vertex.

Note that the process $(G_1^t)$ is equivalent to the process $(G_m^t)$ for large $t$ in the following sense: starting with the process $(G_1^{tm})$ and always contracting the last $m$ vertices after $m$ states, we get the same result as for the process $(G_m^t)$. With the graph process $(G_1^t)$, the probability that an arbitrary vertex has degree $k$ is $\Pr[k] \propto k^{-\gamma}$ with $\gamma = 3$. There are several possibilities to prove this degree distribution. Some of them, and a precise version of the model, are presented in Section 13.2.1.
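The following is a minimal sketch of $(G_1^t)$ that resolves the imprecisions above in one particular way. This is an assumption, not the only possible choice: the process starts from the empty graph, and the new vertex is counted as part of $V$ with degree $m = 1$, so that it can form a loop. In particular, the very first vertex always receives a loop. Storing every edge endpoint in a list makes a uniform draw from the list a draw proportional to degree. The helper name is hypothetical, and the rule coincides with (13.1) in Section 13.2.1.

```python
import random
from collections import Counter

def preferential_attachment_m1(t_max, rng):
    """Grow (G_1^t) for t = 1..t_max and return the final degree of every vertex.

    In step t the new vertex v_t (label t - 1) attaches to an old vertex w with
    probability deg(w)/(2t - 1), and to itself, forming a loop, with probability
    1/(2t - 1): the new vertex counts as if it already had degree m = 1.
    """
    ends = []  # each edge contributes both endpoints, so w occurs deg(w) times
    for label in range(t_max):
        # len(ends) == 2 * label; the extra slot is the new vertex's own half-edge
        slot = rng.randrange(len(ends) + 1)
        target = label if slot == len(ends) else ends[slot]
        ends.extend((label, target))  # a loop adds 2 to the degree of `label`
    deg = Counter(ends)
    return [deg[v] for v in range(t_max)]

rng = random.Random(42)
degrees = preferential_attachment_m1(100_000, rng)
counts = Counter(degrees)
for k in (1, 2, 3, 5, 10):
    print(k, counts[k] / len(degrees), 4 / (k * (k + 1) * (k + 2)))
```

The last column is $4/(k(k+1)(k+2))$, the limiting fraction that a rate-equation argument gives for $m = 1$. It decays like $k^{-3}$, in line with $\gamma = 3$.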

Other Power-Law Models. There are more models that try to construct a graph that resembles some real process and whose degree distribution follows a power law. One of them is the model of initial attractiveness by Buckley and Osthus (see [68] for details and references). Here the vertices are given a value $a \ge 1$ that describes their initial attractiveness. For example, a search engine is more attractive to link to from the start than a specialized web page for scientists. So the probability that a new vertex is linked to vertex $i$ is proportional to its indegree plus a constant initial attractiveness $am$.

A different approach to imitating the growth of the World Wide Web is the copying model introduced by Kleinberg and others [375], where we are given a fixed outdegree and no attractiveness. We choose a prototype vertex $v \in V$ uniformly at random out of the vertex set $V$ of the current graph. Let $v'$ be a new vertex inserted into the network. For all edges $(v, w)$ with $w \in V$, edges $(v', w)$ are inserted into the network. In a next step each edge $(v', w)$ is retained unchanged with probability $p$, or rewired with probability $1 - p$. This simulates the process of copying a web page almost identical to the one the user is geared to and modifying it by rewiring some links. The authors also showed that this model yields a degree distribution following a power law. One advantage of this model is that it constructs many induced bipartite subgraphs, which are often observed in real web graphs (for further details see Section 3.9.3 about Hubs & Authorities). But it does not account for the high clustering coefficient that is also characteristic of the web graph.

Another model that tries to combine most of the observations made in nature, and therefore does not restrict itself to only one way of choosing the endpoints of a connection, is the model of Cooper and Frieze [131].
Here we first have the choice between a method NEW and a method OLD, which we choose according to a probability distribution. Method OLD inserts a number of edges starting at a vertex already in the network, whereas method NEW first inserts a new vertex and then connects a number of edges to this new vertex. The number of inserted edges is chosen according to a further probability distribution. The end vertices of these edges are chosen either uniformly at random, or depending on the current vertex degree, or by a mixture of both.
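The copying model of Kleinberg and others [375], described above, can be made concrete as follows. This is a minimal sketch under assumptions that the description leaves open: the outdegree is $d$, the process starts from a hypothetical seed of $d + 1$ vertices that each point to all others, and a rewired arc $(v', w)$ gets a new head chosen uniformly at random among the old vertices. The function name is hypothetical.

```python
import random
from collections import Counter

def copying_model(n, d, p, rng):
    """Directed copying model; returns out-neighbor lists out[u] for u = 0..n-1.
    Seed (an assumption): d + 1 vertices, each pointing to the d others."""
    out = [[w for w in range(d + 1) if w != u] for u in range(d + 1)]
    for new in range(d + 1, n):
        proto = rng.randrange(new)                  # prototype, uniformly at random
        arcs = []
        for w in out[proto]:                        # copy every arc (proto, w) as (new, w)
            if rng.random() < p:
                arcs.append(w)                      # retained with probability p
            else:
                arcs.append(rng.randrange(new))     # rewired to a uniform old vertex
        out.append(arcs)
    return out

rng = random.Random(5)
out = copying_model(50_000, 4, 0.8, rng)
indeg = Counter(w for arcs in out for w in arcs)
print(max(indeg.values()), sum(indeg.values()) / len(out))   # a few hubs vs. mean d
```

Every vertex has outdegree exactly $d$, so the mean indegree is $d$. The largest indegrees are far above this mean, which reflects the heavy-tailed indegree distribution.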

13.2 Global Structure Analysis

13.2.1 Finding Power Laws of the Degree Distribution

We would like to have some general approaches to finding the exact degree distribution of a given graph model. Since no general methods are known, we present four different ways of showing the degree distribution of the preferential attachment model. One is a static representation of one state of the graph process, called Linearized Chord Diagrams, introduced by Bollobás [68]. Furthermore, we give three heuristic approaches that yield the same result.

Linearized Chord Diagrams. A Linearized Chord Diagram (LCD) consists of $2n$ distinct points on the x-axis paired off by chords in the upper half-plane.

Fig. 13.2. An LCD representing a graph (vertices v1, ..., v5)

The goal is now to construct a graph out of this LCD that represents a static state of a graph process. Reconsider the preferential attachment model by Barabási and Albert (Section 13.1.4), where a graph process $(G_m^t)$ is used. Let us consider the case $m = 1$. Let $\Pr[v]$ be the probability that the vertex $v_t$ inserted at time $t$ is connected to vertex $v$. We define

$$\Pr[v] = \begin{cases} 1/(2t-1) & \text{if } v = v_t, \\ k_v/(2t-1) & \text{otherwise,} \end{cases} \qquad (13.1)$$

where $k_v$ denotes the current degree of vertex $v$ before the connection. The normalizing term is $2t - 1$ because the new edge is understood to be incident to $v_t$ only, until its second endpoint is chosen.

The LCD Model. To construct a Linearized Chord Diagram as defined at the beginning of this section, we can use $n$-pairings. An $n$-pairing $L$ is a partition of the set $S = \{1, 2, \ldots, 2n\}$ into pairs, so there are $\frac{(2n)!}{n!\,2^n}$ $n$-pairings. Place the elements of $S$ in their natural order on the x-axis and represent each pair by a chord connecting its two elements (compare Figure 13.2). On such a Linearized Chord Diagram the construction of the graph $\phi(L)$ for the pairing $L$ becomes intuitive. Construct $\phi(L)$ by the following rules: starting from the left of the x-axis, identify all endpoints up to and including the first right endpoint of a chord to form vertex $v_1$. Then identify all further endpoints up to and including the second right endpoint as vertex $v_2$, and so on. To form the edges, replace every chord by an edge connecting the vertices associated with its endpoints. Figure 13.2 gives an example of such a Linearized Chord Diagram and the associated graph.
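The construction of $\phi(L)$ from a pairing is mechanical enough to state as code. The following is a minimal sketch, assuming a pairing is given as a list of chords $(a, b)$ over the points $1, \ldots, 2n$. The helper names are hypothetical. A uniformly random $n$-pairing is obtained by shuffling the points and pairing consecutive entries, since every pairing arises from exactly $n!\,2^n$ orderings.

```python
import random
from math import factorial

def count_pairings(n):
    """Number of n-pairings of {1, ..., 2n}: (2n)! / (n! 2^n)."""
    return factorial(2 * n) // (factorial(n) * 2 ** n)

def lcd_to_graph(pairs):
    """phi(L): pairs are chords (a, b) covering 1..2n exactly once.
    Returns the edge list on vertices 1..n (loops and multi-edges allowed)."""
    right_ends = sorted(max(a, b) for a, b in pairs)
    vertex_of = {}
    vertex, start = 1, 1
    for end in right_ends:   # vertex i collects all points up to the i-th right endpoint
        for point in range(start, end + 1):
            vertex_of[point] = vertex
        vertex, start = vertex + 1, end + 1
    return [(vertex_of[a], vertex_of[b]) for a, b in pairs]

def random_pairing(n, rng):
    """Uniformly random n-pairing: shuffle 1..2n and pair consecutive entries."""
    points = list(range(1, 2 * n + 1))
    rng.shuffle(points)
    return [(points[2 * i], points[2 * i + 1]) for i in range(n)]

print(count_pairings(3))                     # 15
print(lcd_to_graph([(1, 3), (2, 4)]))        # [(1, 1), (1, 2)]: a loop at v1, edge v1-v2
print(lcd_to_graph(random_pairing(5, random.Random(0))))
```

In the second example, the first right endpoint is point 3, so points 1 to 3 form $v_1$ and point 4 forms $v_2$. The chord $(1, 3)$ thus becomes a loop at $v_1$.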

The same can be achieved by choosing $2n$ points independently and uniformly at random in the interval $[0, 1]$ and joining the $(2i-1)$th and the $(2i)$th chosen point by a chord, $i \in \{1, 2, \ldots, n\}$.

LCDs as Static Representation of $(G_m^n)$. For a special point in time $t = n$ we can construct a Linearized Chord Diagram with $n$ chords and build the graph $\phi(L)$. The obtained graph model is exactly the $n$th state of the graph process $(G_1^t)$, i.e., $G_1^n$. To see this, observe how the evolution rule of $(G_1^t)$ can be imitated for LCDs. Add one pair to an arbitrary LCD by placing the right point of the pair at the end of the point set and inserting the left point of the pair uniformly at random before any of the $2n + 1$ points. Then the new edge is connected according to the same distribution as in the $(G_1^t)$ process. It can easily be shown that the degree distribution of this static graph follows a power law with exponent $\gamma = 3$ (for details see [69]).

Now we will give three heuristic approaches that work with the preferential attachment model.

Continuum Theory. Let $k_i$ again denote the degree of vertex $i$. The value $k_i$ increases if a new vertex $v$ enters the network and connects an edge to vertex $i$. The probability that this happens is $\frac{k_i}{\sum_{j \in V \setminus \{v\}} k_j}$. Note that this does not yet determine the full probability distribution, but it is sufficient for our argument. In addition, we have to specify a start sequence. We want to start with a graph of $m_0$ ($\ge m$) vertices and zero edges. As in this case the probability distribution is not defined, we stipulate it to be the uniform distribution for the first step. Obviously, after the first step we have a star plus vertices of degree zero, which are irrelevant for the further process. Unfortunately, the exact shape of the initial sequence is not given in [15].
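The start sequence just described can be turned into a simulation of $(G_m^t)$. The following is a minimal sketch. Because [15] leaves the detail open, it assumes that the $m$ targets of each new vertex are distinct old vertices, drawn one after another proportionally to degree among those not yet chosen (implemented by rejecting repeats). The new vertex itself is excluded, so no loops arise. The function name is hypothetical.

```python
import random

def barabasi_albert(t_max, m, m0, rng):
    """(G_m^t) with the start sequence above: m0 >= m isolated vertices; step 1 joins
    a new vertex to m of them chosen uniformly, every later step joins a new vertex
    to m distinct old vertices chosen proportionally to degree.
    Returns the degree list after t_max steps (vertices 0..m0 + t_max - 1)."""
    assert m0 >= m >= 1
    deg = [0] * m0
    ends = []                                   # vertex u appears deg[u] times
    for step in range(t_max):
        new = len(deg)
        if step == 0:
            targets = rng.sample(range(m0), m)  # uniform: all degrees are still zero
        else:
            targets = set()
            while len(targets) < m:             # rejecting repeats = degree-proportional
                targets.add(rng.choice(ends))   # sampling without replacement
        deg.append(0)
        for u in targets:
            deg[u] += 1
            deg[new] += 1
            ends.extend((u, new))
    return deg

rng = random.Random(11)
deg = barabasi_albert(20_000, 2, 3, rng)
print(max(deg), sum(deg) / (len(deg) - 3))   # largest hub vs. average degree 2m
```

After the first step the graph is a star on $m + 1$ vertices plus $m_0 - m$ isolated vertices, which keep degree zero forever, as noted in the text. The resulting degree list can be compared with the continuum solution derived next.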

We now want to consider ki as a continuous real variable. Therefore the rate with which ki changes is proportional to the probability that an edge connects to i. So we can state the following dynamic equation: ki =m t ki
jjV \{v}

kj

(13.2)

So we get for the total number of degrees in the network, except for that of the N 1 new vertex, j=1 kj = 2mt m. ki Thus the above equation changes to ki = 2t1 and, since we consider very t large times t, we can approximate it as ki ki = . t 2t (13.3)

By construction of the preferential attachment model we know that the initial condition $k_i(t_i) = m$ holds, where $t_i$ is the time when vertex i was inserted into the network. Using this initial condition we obtain as a solution of the differential equation (13.3) the following result:
$$k_i(t) = m\left(\frac{t}{t_i}\right)^{\beta}, \qquad \beta = \frac{1}{2}. \tag{13.4}$$

Our goal is now to determine p(k), the probability that an arbitrary vertex has degree exactly k. Since $p(k) = \frac{\partial \Pr[k_i(t) < k]}{\partial k}$, we first have to determine the probability that the degree of vertex i at time t is strictly smaller than k. By using the solution (13.4) of the differential equation, with $\beta = 1/2$, the following equations arise:
$$\begin{aligned}
\Pr[k_i(t) < k] &= \Pr\left[m\left(\frac{t}{t_i}\right)^{\beta} < k\right] = \Pr\left[t_i > \frac{m^{1/\beta}\,t}{k^{1/\beta}}\right] = 1 - \Pr\left[t_i \le \frac{m^{1/\beta}\,t}{k^{1/\beta}}\right]\\
&= 1 - \int_0^{m^{1/\beta}t/k^{1/\beta}} \Pr[t_i = t']\,dt' = 1 - \frac{m^{1/\beta}\,t}{k^{1/\beta}\,(t + m_0)}.
\end{aligned}$$
The last equation follows from the fact that the probability space over $t_i$ has to sum up to one and the probabilities are assumed to be constant and uniformly distributed; thus $1 = \sum_{i=1}^{m_0+t} \Pr[t_i] = (m_0 + t)\Pr[t_i]$, i.e., $\Pr[t_i] = \frac{1}{m_0 + t}$. Differentiating the above equations with respect to k we obtain for p(k):
$$p(k) = \frac{\partial \Pr[k_i(t) < k]}{\partial k} = \frac{1}{\beta}\,\frac{m^{1/\beta}\,t}{m_0 + t}\,\frac{1}{k^{1/\beta + 1}}. \tag{13.5}$$

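For $t \to \infty$ we asymptotically get $p(k) \sim 2m^{1/\beta}k^{-\gamma}$ with $\gamma = 1/\beta + 1 = 3$ (recall $\beta = 1/2$), i.e., $p(k) \sim 2m^2k^{-3}$. Note that the exponent γ is independent of m. So we get a degree distribution that follows a power law where the coefficient is proportional to $m^2$.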
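The continuum derivation can be checked numerically. The following Python sketch is only an illustration under assumed parameters (m = 3, $t_i$ = 10, $m_0$ = 5 and the step sizes are arbitrary choices, and all function names are hypothetical): it integrates (13.3) with forward Euler and compares the result with the closed form (13.4), then differentiates the distribution function $\Pr[k_i(t) < k]$ numerically at a very large t and compares the slope with the limit $2m^2k^{-3}$ of (13.5).

```python
def euler_degree(m, t_i, t_end, steps=200_000):
    """Integrate dk/dt = k / (2t), eq. (13.3), from k(t_i) = m with forward Euler."""
    h = (t_end - t_i) / steps
    k, t = float(m), float(t_i)
    for _ in range(steps):
        k += h * k / (2.0 * t)
        t += h
    return k


def closed_form_degree(m, t_i, t):
    """Solution (13.4): k_i(t) = m * (t / t_i) ** beta with beta = 1/2."""
    return m * (t / t_i) ** 0.5


def cdf_below(m, m0, t, k):
    """Pr[k_i(t) < k] = 1 - m^2 t / (k^2 (t + m0)), i.e. 1/beta = 2, for k >= m."""
    return 1.0 - m ** 2 * t / (k ** 2 * (t + m0))


def continuum_pk(m, k):
    """Limit t -> infinity of (13.5): p(k) ~ 2 m^2 k^(-3)."""
    return 2.0 * m ** 2 / k ** 3


if __name__ == "__main__":
    m, t_i, t_end = 3, 10, 1000
    print(euler_degree(m, t_i, t_end), closed_form_degree(m, t_i, t_end))
    # Central difference of the distribution function at a very large t.
    t, m0, k, h = 1e9, 5, 7.0, 1e-4
    slope = (cdf_below(m, m0, t, k + h) - cdf_below(m, m0, t, k - h)) / (2 * h)
    print(slope, continuum_pk(m, k))
```

The Euler value approaches the closed form as the step size shrinks, and for $t \gg m_0$ the factor $t/(t + m_0)$ in (13.5) is essentially 1, which is why the numerical slope matches $2m^2k^{-3}$.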

Master Equation Approach. With the master equation approach we want to use recursion to find the shape of the degree distribution. So we are looking for equations that use information from the time steps before, in the form of the degree distribution of older time steps. Since we know the initial distribution it is easy to solve this recursion. This approach to determining the power law of the preferential attachment model was introduced by Dorogovtsev, Mendes, and Samukhin [166]. We study the probability $p(k, t_i, t)$ that a vertex i that entered the system at time $t_i$ has degree k at time t. During the graph process the degree of a vertex i increases by one with probability $\frac{k}{2t}$. For simplicity of formulas we use the dot notation for the derivative with respect to t. A master equation for this probability $p(k, t_i, t)$ is of the form:
$$\dot p(k, t_i, t) = \sum_{k'} \big[ W_{k'k}\,p(k', t_i, t) - W_{kk'}\,p(k, t_i, t) \big] \tag{13.6}$$

Here $W_{k'k}$ denotes the probability of changing from state k' to state k. In our model this probability is obviously
$$W_{k'k} = \frac{k'}{2t}\,\delta_{k',k-1}, \qquad \text{where } \delta_{i,j} = \begin{cases} 1 & i = j\\ 0 & \text{otherwise}\end{cases} \tag{13.7}$$

is the Kronecker symbol. By summing up over all vertices inserted up to time t, we define the probability $P(k, t) := \frac{1}{t}\sum_{t_i \le t} p(k, t_i, t)$ that some arbitrary vertex has degree k. As we are interested in a stationary distribution, we are looking for the point where the derivative with respect to time is zero:
$$\begin{aligned}
0 = \dot P(k,t) &= \frac{\partial}{\partial t}\Big[\frac{1}{t}\sum_{t_i} p(k,t_i,t)\Big] = -\frac{1}{t^2}\sum_{t_i} p(k,t_i,t) + \frac{1}{t}\sum_{t_i} \dot p(k,t_i,t)\\
&= -\frac{1}{t}P(k,t) + \frac{1}{t}\sum_{t_i}\sum_{k'}\big[W_{k'k}\,p(k',t_i,t) - W_{kk'}\,p(k,t_i,t)\big]\\
&= -\frac{1}{t}P(k,t) + \sum_{k'}\big[W_{k'k}\,P(k',t) - W_{kk'}\,P(k,t)\big]\\
&= -\frac{2}{2t}P(k,t) + \sum_{k'}\Big[\frac{k'}{2t}\,\delta_{k',k-1}\,P(k',t) - \frac{k}{2t}\,\delta_{k,k'-1}\,P(k,t)\Big]\\
&= \frac{k-1}{2t}\,P(k-1,t) - \frac{k+2}{2t}\,P(k,t).
\end{aligned}$$
There is now a $t^*$ such that for every time t greater than $t^*$ we get the stationary distribution P(k). This results in the recursive equation $P(k) = \frac{k-1}{k+2}\,P(k-1)$ for $k \ge m+1$. For the case k = m the probability directly results from the scaling condition of the probability measure: $P(m) = \frac{2}{m+2}$. This directly yields the power law
$$\Pr[k] = \frac{2m(m+1)}{k(k+1)(k+2)},$$
which for large k behaves like $2m(m+1)k^{-3}$: the exponent 3 agrees with the power law found using the continuum theory, $2m^2k^{-3}$, and the coefficients agree for large m.

This approach can also be used to determine the degree distribution of a more general case of preferential linking. In this model, one new vertex is inserted at every point in time. At the same time we insert m edges with one endpoint at unspecified vertices or from the outside. This can be done since here we only take into consideration the indegree of a vertex. The other endpoints are distributed to existing vertices proportionally to q(s) + A, where q(s) is the indegree of vertex s, and A is an additional attractiveness associated with all vertices.
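Because the recursion for P(k) involves only rational numbers, it can be solved exactly. The small Python sketch below (function names are hypothetical) iterates $P(k) = \frac{k-1}{k+2}P(k-1)$ from $P(m) = \frac{2}{m+2}$ with `fractions.Fraction`, confirms the closed form $\frac{2m(m+1)}{k(k+1)(k+2)}$ term by term, and checks normalization via the telescoping identity $\frac{2}{k(k+1)(k+2)} = \frac{1}{k(k+1)} - \frac{1}{(k+1)(k+2)}$, a standard fact that is not stated in the text.

```python
from fractions import Fraction


def stationary_distribution(m, k_max):
    """Solve the recursion exactly: P(m) = 2/(m+2), P(k) = (k-1)/(k+2) * P(k-1)."""
    P = {m: Fraction(2, m + 2)}
    for k in range(m + 1, k_max + 1):
        P[k] = Fraction(k - 1, k + 2) * P[k - 1]
    return P


def closed_form(m, k):
    """Pr[k] = 2m(m+1) / (k(k+1)(k+2))."""
    return Fraction(2 * m * (m + 1), k * (k + 1) * (k + 2))


def tail_mass(m, k_max):
    """Sum of Pr[k] over k > k_max; it telescopes to m(m+1) / ((k_max+1)(k_max+2))."""
    return Fraction(m * (m + 1), (k_max + 1) * (k_max + 2))


if __name__ == "__main__":
    m, k_max = 3, 200
    P = stationary_distribution(m, k_max)
    assert all(P[k] == closed_form(m, k) for k in P)
    # Normalisation: computed mass plus the analytic tail is exactly 1.
    assert sum(P.values()) + tail_mass(m, k_max) == 1
    # Ratio to the continuum prediction 2 m^2 k^-3; tends to (m+1)/m as k grows.
    print(float(P[k_max] * k_max ** 3 / (2 * m * m)))
```

The printed ratio illustrates the remark above: both approaches give the exponent 3, while the coefficients differ by the factor (m + 1)/m, which is only negligible for large m.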

Rate Equation Approach. In this approach we want to analyze the change over time of the number of vertices with degree k, i.e., we are looking for the rate at which this number changes.

This approach for the preferential attachment model is due to Krapivsky, Redner, and Leyvraz [369]. We are considering the average number (over all graphs of the state of the process) $N_k(t)$ of vertices that have exactly degree k at time t. Asymptotically we have, by the strong law of large numbers, the following for large t: $N_k(t)/t \to \Pr[k]$ and $\sum_k kN_k(t)/t \to 2m$. If a new vertex enters the network, $N_k(t)$ changes as follows:
$$\frac{\partial N_k}{\partial t} = m\,\frac{(k-1)N_{k-1}(t) - kN_k(t)}{\sum_k kN_k(t)} + \delta_{k,m}. \tag{13.8}$$

Here the first term of the numerator is the total degree of all vertices with degree exactly k − 1; new edges that connect to those vertices increase their degree to k. The second term is the total degree of all vertices with degree exactly k; new edges that connect to those vertices increase their degree to a value higher than k. If the newly entered vertex has exactly degree k, i.e., m = k, then we have to add 1 to our rate equation. Applying the above limits, i.e., substituting $N_k(t) \approx t\Pr[k]$ and $\sum_k kN_k(t) \approx 2mt$ into (13.8), gives $\Pr[k] = \frac{(k-1)\Pr[k-1] - k\Pr[k]}{2} + \delta_{k,m}$, hence $\Pr[k] = \frac{k-1}{k+2}\Pr[k-1]$ for $k > m$ and $\Pr[m] = \frac{2}{m+2}$. This is exactly the same recursive equation as found with the master equation approach, and therefore we have the same power law.

Flexibility of the Approaches. All the approaches mentioned before are very helpful and easy to understand for the case of analyzing the preferential attachment model. Some of them are also applicable to more general versions of the preferential attachment model, as for m = 1 and others. But it is not clear whether there is a useful application of these approaches to totally different models. For the rate equation approach an adaptation to more general evolving networks, as well as to networks with site deletion and link rearrangement, is possible. There is a huge need for approaches that can deal with other models. It would be even more desirable to find a way to treat numerous types of evolving network models with a single approach.

13.2.2 Generating Functions

The power law exemplifies an often faced problem in network analysis: In many cases all that is known of the network is its degree sequence, or at least the distribution of the degrees. It seems as if one could infer certain other structural features of the network, for example second-order neighbors, from its degree sequence. Combinatorics provides a powerful tool to retrieve such insights from sequences: generating functions. Our goal is to present some basics of generating functions and then develop the method for the special purposes of network analysis.

Ordinary Generating Functions. We are given the distribution of the degree sequence, to be precise a function p(k) mapping each vertex degree k to the probability that a randomly chosen vertex, in a network chosen according to that degree sequence, is adjacent to k other vertices. (For simplicity we confine ourselves to undirected graphs.) Calculating the expectation of that distribution immediately gives the (expected) average degree $z_1$, i.e., the average number of neighbors of a random vertex. Can we as easily calculate the probability for a vertex to have k second-order neighbors, i.e., vertices whose shortest path distance to it equals exactly 2, from the distribution of the first-order neighbors? Trying a direct approach, one might want to average, over all degrees of a vertex, the average of the degrees of the adjacent vertices. In some sense, one would like to perform calculations that relate the whole distribution to itself. But how do we get hold of the entire distribution in a way useful for calculation? A generating function solves exactly this problem: On the one hand, it is an encoding of the complete information contained in the distribution, but on the other hand it is a mathematical object that can be calculated with. We define:

Definition 13.2.1. For a probability distribution $p : \mathbb{N} \to [0, 1]$,
$$G_p(x) = \sum_k p(k)\,x^k \tag{13.9}$$
is called the generating function of p.

This definition is by no means in its most general form. This particular way of encapsulating p is sometimes called the ordinary generating function. The formal definition is justified in the light of the following proposition:

Proposition 13.2.2. Let p be a probability distribution and $G_p$ its generating function:
1. $G_p(1) = 1$.
2. $G_p(x)$ converges for $x \in [-1, 1]$.
3. $p(k) = \frac{1}{k!}\,\frac{\partial^k G_p}{\partial x^k}\Big|_{x=0}$.
4. $E(p) := \sum_k k\,p(k) = G_p'(1)$.
5. $\sum_k k(k-1)\,p(k) = G_p''(1)$; this is the second factorial moment, so that $\mathrm{Var}(p) = G_p''(1) + G_p'(1) - G_p'(1)^2$.
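To make Proposition 13.2.2 concrete, here is a minimal Python sketch that stores a finite distribution as its list of coefficients (an assumption for illustration; the proposition also covers infinite distributions) and checks items 1, 3, 4 and 5 with exact fractions. The helper names and the example distribution are made up.

```python
from fractions import Fraction
from math import factorial


def derivative(coeffs, order=1):
    """Coefficient list of the order-th derivative of sum_k coeffs[k] x^k."""
    c = list(coeffs)
    for _ in range(order):
        c = [k * c[k] for k in range(1, len(c))]
    return c


def evaluate(coeffs, x):
    """Evaluate sum_k coeffs[k] x^k by Horner's rule."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result


# Hypothetical example: degrees 1, 2, 3 with probabilities 1/2, 1/3, 1/6.
p = [Fraction(0), Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

normalised = evaluate(p, 1) == 1                                                   # item 1
recovered = [evaluate(derivative(p, k), 0) / factorial(k) for k in range(len(p))]  # item 3
mean = evaluate(derivative(p, 1), 1)                                               # item 4
factorial_moment = evaluate(derivative(p, 2), 1)                                   # item 5
variance = factorial_moment + mean - mean ** 2

if __name__ == "__main__":
    print(normalised, recovered == p, mean, factorial_moment, variance)
    # prints: True True 5/3 5/3 5/9
```

The variance differs from $G_p''(1)$ here, which is why item 5 is stated as a factorial moment.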

The convergence is shown by standard analytic criteria. The other parts of the proposition are immediate from the definition, keeping in mind for the first that $\sum_k p(k) = 1$ for a probability distribution. Part 3 of the proposition shows that a generating function encodes the information of the underlying distribution.

From another perspective a generating function $G_p$ is a formal power series that actually converges on a certain interval. Taking powers $(G_p(x))^m$ of it will again result in such a power series. Interpreting this in turn as a generating function amounts to interpreting the coefficient of some $x^k$ in $(G_p(x))^m$. For m = 2 this is $\sum_{j+l=k} p(j)\,p(l)$; in other words, this is the probability that the values of two independent realizations of the random variable with distribution p sum up to k. In general $(G_p(x))^m$ is the generating function for the distribution of the sum of the values of m independent realizations of the random variable distributed according to p.

Generating Functions for Degree Sequences. For $k \in \mathbb{N}$ let $D_k$ be a random variable equal to the number of vertices of degree k. Further, p(k) shall be the probability that a randomly chosen vertex has degree equal to k. It holds that $n\,p(k) = E(D_k)$, the expectation of the random variable. To construct a random graph according to the distribution p may mean two slightly different things. We may take $D_k$ to be a constant function for every k; thus, there is a fixed number of vertices with degree k. Alternatively, we only require the expectation of $D_k$ to equal that fixed number. In the latter model, the graphs to which the first model is confined are merely the most probable ones. Moreover, as the first model fixes the degree sequence, only those sequences of fixed values of $D_k$ that are realizable generate a non-empty model. For example, the sum of all degrees must not be odd. (The next section will discuss which degree sequences are realizable.)
Despite these differences, for a realizable sequence the statistical results we are interested in here are not affected by changing between these two models. We confine our explicit considerations to the second and more general interpretation, where p(k) only prescribes the expectation of $D_k$. To justify the technicality of generating functions, some structural features of the network should be easily derived from the distribution of its degree sequence. So far the average vertex degree $z_1$ has been shown to be $G_p'(1)$, which is not a real simplification for computation. Next we ask for the degree distribution of a vertex chosen by the following experiment: Choose an edge of the graph uniformly at random and then one of its endpoints. The probability f(k) to thereby reach a vertex of degree k is proportional to $k\,p(k)$. That means the corresponding generating function is
$$G_f(x) = \sum_k \frac{k\,p(k)}{\sum_j j\,p(j)}\,x^k = x\,\frac{G_p'(x)}{G_p'(1)}.$$
Removing the factor x in the right-hand term amounts to reducing the exponent of x in the middle term by one, thus obtaining a generating function where the coefficient of $x^k$ in $G_f(x)$ becomes the coefficient of $x^{k-1}$ in the new generating function. Hence the new function is the generating function of the distribution of k − 1, where k is distributed according to f. Interpreting this combinatorially, we look at the distribution of the degrees minus one. In other words, we want to know the probability distribution $\tilde p$ for the number of edges that are incident to the vertex, not counting the one edge we came from in the above choosing procedure. Its generating function can thus be written nicely as
$$G_{\tilde p}(x) = \sum_k \frac{k\,p(k)}{\sum_j j\,p(j)}\,x^{k-1} = \frac{G_p'(x)}{G_p'(1)}. \tag{13.10}$$

This distribution $\tilde p$ is useful to determine the distribution of r-th neighbors of a random vertex.

Distribution of r-th Neighbors. What is the probability, for a randomly chosen vertex v, that exactly k vertices are at a shortest path distance of exactly r? For r = 2, assume that the number of vertices at distance exactly 2 from v is $\big(\sum_{w \in N(v)} d(w)\big) - d(v)$ (where d(v) denotes the degree of v), and for general r that the network contains no cycles. This assumption seems to be a good approximation for big, sparse, random graphs, as the number of occasional cycles seems negligible. But its exact implications are left to be studied. For the sake of explanation, assume the network to be directed in the following way: Choose a shortest-path tree from a fixed vertex v and direct each of its edges in the orientation in which it is used by that tree. Give a third orientation, zero, to the remaining edges. In this way the definition is consistent even for non-treelike networks. But assume again a tree structure. Any vertex except for v has exactly one in-edge, and $\tilde p$ is the distribution of the number of out-edges of the vertex reached by that in-edge. Now a second assumption is required: For an out-edge {x, y} of a vertex x, the degree of y shall be distributed independently of the degrees of the end vertices of x's other out-edges, and independently of the degree of x. Of course, there are examples of pathological degree distributions where this assumption fails. Again, the assumption seems reasonable in most cases, and again, precise propositions on the matter are left to be desired. Given these two assumptions, tree structure and independence, the generating function of the distribution of second neighbors is seen to be
$$\sum_k p(k)\,\big(G_{\tilde p}(x)\big)^k = G_p\big(G_{\tilde p}(x)\big), \tag{13.11}$$
recalling that k independent realizations of a random variable amount to taking the k-th power of its generating function. Correspondingly, the generating function $G^{(r)}$ of the distribution of the r-th neighbors is:
$$G^{(r)}(x) := \underbrace{G_p(G_{\tilde p}(G_{\tilde p}(\cdots G_{\tilde p}}_{r \text{ functions altogether}}(x)\cdots))). \tag{13.12}$$
Taking expectations for second neighbors, i.e., calculating $z_2$, simplifies nicely:
$$z_2 = \Big[G_p\big(G_{\tilde p}(x)\big)\Big]'\Big|_{x=1} = G_p'\big(\underbrace{G_{\tilde p}(1)}_{=1}\big)\,G_{\tilde p}'(1) = G_p'(1)\,\frac{G_p''(1)}{G_p'(1)} = G_p''(1). \tag{13.13}$$
Recall that the expectation $z_1$ of the number of first neighbors is $G_p'(1)$. Note that in general the formula for r-th neighbors does not boil down to the r-th derivative; by the chain rule one obtains instead $z_r = G_p'(1)\,\big(G_{\tilde p}'(1)\big)^{r-1} = z_1\,(z_2/z_1)^{r-1}$.
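The composition rules (13.11) to (13.13) are easy to try out on a finite degree distribution. The sketch below is written in Python with exact fractions; the helper names and the example distribution are hypothetical. It builds $G_{\tilde p}$ from (13.10), composes generating functions as polynomials following (13.12), and compares the means of the results with $G_p''(1)$ and with $z_1(z_2/z_1)^{r-1}$. Polynomial multiplication here is exactly the convolution of distributions described after Proposition 13.2.2.

```python
from fractions import Fraction


def poly_add(a, b):
    n = max(len(a), len(b))
    a = list(a) + [Fraction(0)] * (n - len(a))
    b = list(b) + [Fraction(0)] * (n - len(b))
    return [x + y for x, y in zip(a, b)]


def poly_mul(a, b):
    """Product of coefficient lists = distribution of a sum of independent variables."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def compose(outer, inner):
    """Coefficients of outer(inner(x)) = sum_k outer[k] * inner(x)**k."""
    result, power = [Fraction(0)], [Fraction(1)]
    for coeff in outer:
        result = poly_add(result, [coeff * c for c in power])
        power = poly_mul(power, inner)
    return result


def excess(p):
    """Coefficients of G_{p~}(x) = G_p'(x) / G_p'(1), eq. (13.10)."""
    z1 = sum(k * pk for k, pk in enumerate(p))
    return [(k + 1) * p[k + 1] / z1 for k in range(len(p) - 1)]


def neighbours_gf(p, r):
    """G^(r) from (13.12): G_p applied to r - 1 nested copies of G_{p~}."""
    inner = [Fraction(0), Fraction(1)]  # the identity x
    for _ in range(r - 1):
        inner = compose(excess(p), inner)
    return compose(p, inner)


def mean(coeffs):
    """G'(1), the expectation of the encoded distribution."""
    return sum(k * c for k, c in enumerate(coeffs))


# Hypothetical degree distribution: degrees 0, 1, 2 with probabilities 1/4, 1/2, 1/4.
p = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
z1 = mean(p)
z2 = mean(neighbours_gf(p, 2))
z3 = mean(neighbours_gf(p, 3))

if __name__ == "__main__":
    second_derivative = sum(k * (k - 1) * pk for k, pk in enumerate(p))  # G_p''(1)
    print(z1, z2, second_derivative, z3, z1 * (z2 / z1) ** 2)
    # prints: 1 1/2 1/2 1/4 1/4
```

Note that these numbers rely on the tree and independence assumptions stated above; for a concrete finite graph they are only expectations under the random model.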

Component Size. For the analysis of the component size, first consider the case without a giant component. A giant component is a component of size in Θ(n). Thus we assume that all components have finite size even in the limit. Assume again the network to be virtually tree-like. Again the results are subject to further assumptions on the independence of certain stochastic events. And again these assumptions are false for certain distributions and, though likely to hold for large networks, it is unclear where they are applicable. To point out these presuppositions we take a closer look at the stochastic events involved.

Construct a random graph G for a given probability distribution p of the vertex degree, as always in this section. Next, choose an edge e uniformly at random among the edges of G. Flip a coin to select v, one of e's vertices. The random variable we are interested in is the size of the component of v in G \ e. Let $p^\circ$ be its distribution, and $\tilde p$, as above, the distribution of the degree of v in G \ e found by this random experiment. Then, for example, $p^\circ(1) = \tilde p(0)$. In general, for k the degree of v in G \ e, let $n_1, \dots, n_k$ be the neighbors of v in G \ e. Further, we need a laborious definition:
$$P_k(s-1) := \Pr\big[\text{the sizes of the components of } n_1, \dots, n_k \text{ in } G \setminus \{e, (v,n_1), \dots, (v,n_k)\} \text{ sum up to } s-1\big].$$
Then we may write $p^\circ(s) = \sum_k \tilde p(k)\,P_k(s-1)$. How do we compute $P_k$? It does not in general equal the distribution of the component size of a randomly chosen vertex when removing one of its edges. Take into account that in the above experiment a vertex is more likely to be chosen the higher its degree. On the other hand, supposing a tree-like structure, the component size of $n_j$ is the same in $G \setminus \{e, (v,n_1), \dots, (v,n_k)\}$ as in $G \setminus (v, n_j)$. Now, assume that our experiment chooses the edges $(v, n_i)$ independently and uniformly at random among all edges in G; then $P_k$ is distributed as the sum of k random variables distributed according to $p^\circ$.
These assumptions are not true in general. Yet, granted their applicability for a special case under consideration, we can conclude along the following lines for the generating function of $p^\circ$:
$$G_{p^\circ}(x) = \sum_{s=0}^{n} p^\circ(s)\,x^s = \sum_{s=0}^{n} x^s \sum_k \tilde p(k)\,P_k(s-1) = x \sum_k \tilde p(k)\,\underbrace{\sum_{s=0}^{n} x^{s-1}\,P_k(s-1)}_{G_{P_k}(x)}.$$
Since we presume $P_k$ to be the distribution of the sum of k independent realizations of $p^\circ$, we have $G_{P_k}(x) = (G_{p^\circ}(x))^k$, and $G_{p^\circ}(x) = x\sum_k \tilde p(k)\,(G_{p^\circ}(x))^k$. This can be restated as
$$G_{p^\circ}(x) = x\,G_{\tilde p}\big(G_{p^\circ}(x)\big). \tag{13.14}$$
In a similar way we arrive at a consistency requirement for $p^\bullet$, the distribution of the component size of a randomly chosen vertex:
$$G_{p^\bullet}(x) = x\,G_p\big(G_{p^\circ}(x)\big). \tag{13.15}$$

The assumptions on stochastic independence made here are not true in general. Even granted that they are verified for a specific degree distribution, the functional equations (13.14) and (13.15) still resist a general solution. Numerical solutions have been carried out for special cases (for details see [448]). But the expected component size of a random vertex can be computed directly from those equations. The expectation of a distribution is the derivative of its generating function at the point 1. Therefore, using $G_{p^\circ}(1) = 1$, we get $E(p^\bullet) = G_{p^\bullet}'(1) = 1 + G_p'(1)\,G_{p^\circ}'(1)$. But, as $G_{p^\circ}'(1) = 1 + G_{\tilde p}'(1)\,G_{p^\circ}'(1)$, i.e., $G_{p^\circ}'(1) = 1/(1 - G_{\tilde p}'(1))$, and as $G_{\tilde p}'(1) = G_p''(1)/G_p'(1) = z_2/z_1$, this becomes:
$$E(p^\bullet) = G_{p^\bullet}'(1) = 1 + \frac{G_p'(1)}{1 - G_{\tilde p}'(1)} = 1 + \frac{z_1^2}{z_1 - z_2}. \tag{13.16}$$
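Equation (13.16) can be compared with a direct numerical treatment of the functional equations. In this Python sketch the function names, the tolerance, the step size h and the example distribution are assumptions: (13.14) is solved at a point x slightly below 1 by fixed-point iteration started at 0, (13.15) is then evaluated, and the derivative at 1 is approximated by a difference quotient using $G_{p^\bullet}(1) = 1$, which only holds below the phase transition. For the example, $z_1 = 1$ and $z_2 = 1/2$, so (13.16) predicts 3.

```python
def gf(coeffs, x):
    """Evaluate the generating function sum_k coeffs[k] x^k."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def excess(p):
    """Coefficients of G_{p~}(x) = G_p'(x) / G_p'(1), eq. (13.10)."""
    z1 = sum(k * pk for k, pk in enumerate(p))
    return [(k + 1) * p[k + 1] / z1 for k in range(len(p) - 1)]


def mean_component_size(p):
    """Closed form (13.16): 1 + z1^2 / (z1 - z2), meaningful only while z2 < z1."""
    z1 = sum(k * pk for k, pk in enumerate(p))
    z2 = sum(k * (k - 1) * pk for k, pk in enumerate(p))
    if z2 >= z1:
        raise ValueError("z2 >= z1: giant-component regime, (13.16) does not apply")
    return 1 + z1 ** 2 / (z1 - z2)


def solve_edge_gf(p, x, tol=1e-15, max_iter=100_000):
    """Solve (13.14), H = x * G_{p~}(H), by fixed-point iteration started at H = 0."""
    q, H = excess(p), 0.0
    for _ in range(max_iter):
        H_new = x * gf(q, H)
        if abs(H_new - H) < tol:
            return H_new
        H = H_new
    raise RuntimeError("fixed-point iteration did not converge")


def mean_size_numerically(p, h=1e-6):
    """Difference quotient of (13.15) at x = 1, assuming the vertex-component
    generating function equals 1 at x = 1 (no giant component)."""
    x = 1.0 - h
    value = x * gf(p, solve_edge_gf(p, x))
    return (1.0 - value) / h


# Hypothetical example: degrees 0, 1, 2 with probabilities 1/4, 1/2, 1/4 (z1 = 1, z2 = 1/2).
p = [0.25, 0.5, 0.25]

if __name__ == "__main__":
    print(mean_component_size(p), mean_size_numerically(p))
```

The difference quotient carries an error of order h, so the two printed values agree only approximately; the explicit check $z_2 < z_1$ mirrors the restriction to graphs without a giant component.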

Giant Component. So far we have excluded graphs with a giant component, i.e., a component that grows linearly with the graph. For a distribution that would generate such a component, the probability for a cycle would of course no longer be negligible. If we, however, still assume a tree-like structure as a good approximation, Formula (13.16) for the expected component size should no longer be independent of n, the number of vertices. Indeed, as $G_{\tilde p}'(1) \to 1$, equation (13.16) diverges, meaning that the expected component size is not bounded for unbounded n. What can be derived from $G_{\tilde p}'(1) = 1$?
$$1 = G_{\tilde p}'(1) = \frac{\sum_k k(k-1)\,p(k)\,x^{k-2}\big|_{x=1}}{\sum_k k\,p(k)} \quad\Longleftrightarrow\quad \sum_k k(k-2)\,p(k) = 0$$

This equation marks the phase transition to the occurrence of a giant component, as the sum on the left increases monotonically with the relative number of vertices of degree greater than 2. How much of the graph is occupied by the giant component? In [448] it is claimed that the above considerations on the component size still apply to the non-giant part of the graph. But $G_{p^\bullet}(1)$ becomes smaller than 1 in these cases. Following the lines of [448], this should in turn give the fraction of the vertex set that is covered by non-giant components. In other words, $n(1 - G_{p^\bullet}(1))$ equals the (expected) number of vertices in the giant component. This is an audacious claim, as we calculate information about the non-giant part insinuating that it shows the same degree distribution as the whole graph. For example, high-degree vertices could be more likely to be in the giant component. Maybe those calculations actually lead to reasonable results, at least in many cases, but we cannot give any mathematical reason to be sure.

Generating Functions for Bipartite Graphs. So far this section has collected several ideas based on generating functions in order to squeeze as much information as possible from the mere knowledge of the degree distribution. Some of them depend on further assumptions, some are less appropriate than others. Some conclusions drawn in the literature are left out due to their questionable validity. Finally, we become a little more applied. Many real-world networks show a bipartite structure. For example, in [184] we find a graph model used to analyze how infections can spread in a community. The model consists of two sets of vertices, persons and places, and edges from any person to any place the person regularly visits. As in other examples, like the co-author or the co-starring network, we are given bipartite data, but the interest is often mainly in the projection onto one of the vertex sets, mostly the persons. Suppose we are given two probability distributions of degrees a and b, for the persons and the places, and the ratio ρ between the numbers of persons and places. Make a partition of n vertices according to ρ, realize a and b each in one part of the partition, and choose H uniformly at random among all bipartite graphs of n vertices with the same partition and degree sequences on the partition sets. Let H' be the projection of H on the persons' vertices, and p the corresponding distribution of its degree sequence. Then $G_p = G_a(G_{\tilde b})$ and $G_{\tilde p} = G_{\tilde a}(G_{\tilde b})$, where $\tilde a$ and $\tilde b$ are obtained from a and b as in (13.10). Now the whole machinery of generating functions can be applied again. In this way generating functions can help to bridge the gap between the bipartite data we are given and the projected behavior we are interested in.

13.2.3 Graphs with Given Degree Sequences

Given a degree sequence, generating functions allow us to derive some deeper graph properties. Now we wish to construct a graph with a given degree sequence. At best, the generating algorithm would construct such a graph with uniform probability over all graphs that have a proposed degree sequence $d_1, d_2, \dots, d_n$. For reasons of simplicity we assume that $d_1 \ge d_2 \ge \dots \ge d_n$ are the degrees of vertices $v_1, v_2, \dots, v_n$.

Definition 13.2.3. A degree sequence $d_1, d_2, \dots, d_n$ is called realizable if there is a graph G with vertices $v_1, v_2, \dots, v_n \in V$ with exactly the given degree sequence.

Erdős and Gallai [180] gave sufficient and necessary conditions for realizability by a simple, undirected graph.

Necessary and Sufficient Conditions. In order to construct a graph with a given degree sequence we should first verify whether that sequence is realizable at all. Secondly, we are only interested in connected graphs. Thus, we also want to know whether the degree sequence can be realized by a connected graph. Starting with the first property we can observe the following. A degree sequence $d = (d_1, d_2, \dots, d_n)$ is realizable if and only if $\sum_{i=1}^n d_i$ is even (since the sum of the degrees is twice the number of edges), and for all subsets $\{v_1, v_2, \dots, v_\ell\}$ of the ℓ vertices with highest degrees, the degrees of those vertices can be absorbed within those vertices and by the outside degrees. This means that there are enough edges within the vertex set and to the outside to bind all the degrees. More formally we can state the following theorem.

Theorem 13.2.4. A degree sequence $d = (d_1, d_2, \dots, d_n)$ is realizable if and only if $\sum_{i=1}^n d_i$ is even and
$$\sum_{i=1}^{\ell} d_i \le \ell(\ell-1) + \sum_{i=\ell+1}^{n} \min\{\ell, d_i\} \qquad \text{for all } 1 \le \ell \le n. \tag{13.17}$$

This inequality is intuitively obvious, and therefore one direction of the theorem (necessity) is trivial to prove. The ℓ vertices of highest degree can use at most $\ell(\ell-1)$ of their degrees on edges among themselves, since each is connected to at most the $\ell - 1$ other vertices in this set. All remaining open degrees must be absorbed by vertices outside the chosen set. How many can there be? Each outside vertex $i \in \{\ell+1, \dots, n\}$ can absorb at most $\min\{\ell, d_i\}$ of them: not more than ℓ, since it can be adjacent to at most the ℓ vertices of the chosen set, and not more than its own degree $d_i$. A recursive criterion for realizability by an undirected, simple graph is given below.

Theorem 13.2.5. A sequence $d = (d_1, d_2, \dots, d_n)$ with $d_1 \ge d_2 \ge \dots \ge d_n$ is realizable if and only if the sequence $H(d) = (d_2 - 1, d_3 - 1, \dots, d_{d_1+1} - 1, d_{d_1+2}, d_{d_1+3}, \dots, d_n)$ is realizable.

Furthermore we are interested not only in a graph with this degree sequence, but in a connected graph. The necessary and sufficient conditions on connectedness are well known, but should be repeated here for completeness.

Theorem 13.2.6. A graph G is connected if and only if it contains a spanning tree as a subgraph.

As we have neither a graph nor a spanning tree, we are interested in a property that tells us whether a connected graph with a certain degree sequence is constructible. As a spanning tree comprises n − 1 edges, the sum of degrees must be at least 2(n − 1). Together with the requirement that no vertex has degree zero (for n ≥ 2), this necessary condition is already sufficient, as will become clear from the constructing algorithms given below. If Theorem 13.2.4 is fulfilled, all $d_i \ge 1$, and $\sum_{v_i \in V} d_i \ge 2(n-1)$ holds, there exists a connected graph with the given degree sequence.

Algorithms. There are several easy-to-implement algorithms with linear running time that construct a graph with a given degree sequence. In the following we present two slightly different algorithms; one constructs a graph with a sparse core, the other constructs a graph with a dense core.
The reader has to be aware that none of these easy algorithms constructs a random graph out of all graphs with the desired degree sequence with the same probability. But starting from the graph constructed by one of these algorithms we give a method to generate a random instance that is indeed equiprobable among all connected graphs with the desired degree sequence. We assume that all degrees are positive and that the sum of all degrees is at least 2(n − 1).
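Both realizability criteria and the connectivity condition translate directly into short functions. A Python sketch follows; the function names are hypothetical. Note that H(d) has to be sorted again before Theorem 13.2.5 is applied the next time, and that the connectivity test additionally requires every degree to be positive (an isolated vertex can never be part of a connected graph with n ≥ 2), which the code checks explicitly.

```python
from itertools import product


def erdos_gallai(degrees):
    """Theorem 13.2.4: even degree sum and inequality (13.17) for every ell."""
    d = sorted(degrees, reverse=True)
    if sum(d) % 2 or any(x < 0 for x in d):
        return False
    for ell in range(1, len(d) + 1):
        if sum(d[:ell]) > ell * (ell - 1) + sum(min(ell, x) for x in d[ell:]):
            return False
    return True


def havel_hakimi(degrees):
    """Theorem 13.2.5 applied repeatedly until only zeros (or a contradiction) remain."""
    d = sorted(degrees, reverse=True)
    while d and d[0] > 0:
        first, rest = d[0], d[1:]
        if first > len(rest):
            return False
        for i in range(first):
            rest[i] -= 1
        if min(rest) < 0:
            return False
        d = sorted(rest, reverse=True)  # H(d) must be sorted again before the next step
    return all(x == 0 for x in d)


def connected_realizable(degrees):
    """Realizable, no vertex of degree zero, and degree sum at least 2(n - 1)."""
    n = len(degrees)
    if n <= 1:
        return all(x == 0 for x in degrees)
    return erdos_gallai(degrees) and min(degrees) >= 1 and sum(degrees) >= 2 * (n - 1)


def criteria_agree(n, max_degree):
    """Check that Theorems 13.2.4 and 13.2.5 agree on every sequence of length n."""
    return all(erdos_gallai(d) == havel_hakimi(d)
               for d in product(range(max_degree + 1), repeat=n))


if __name__ == "__main__":
    print(erdos_gallai([3, 3, 3, 1]), havel_hakimi([3, 3, 3, 1]))
    print(connected_realizable([3, 3, 2, 2, 2, 2]), criteria_agree(5, 5))
```

For example, (3, 3, 3, 3, 0) passes the Erdős-Gallai test (a complete graph on four vertices plus an isolated vertex) but not the connectivity test.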

For both algorithms we need a subroutine called connectivity. This subroutine first checks whether the constructed graph is connected. If the graph G is not connected, it finds a connected component that contains a cycle. Such a connected component must exist because of the assumption on the degrees made above: a graph with at least n − 1 edges and more than one component cannot be a forest. Let uv be an edge in the cycle, and st be an edge in another connected component. We now delete edges uv and st, and insert edges us and vt into the network. This exchange preserves all degrees and reduces the number of components by one, so it is repeated until G is connected.

Sparse Core. In this section we want to describe an algorithm that constructs a graph with the given degree sequence that additionally has a sparse core. We are given a degree sequence $d_1 \ge d_2 \ge \dots \ge d_n$, and we assign the vertices $v_1, v_2, \dots, v_n$ to those degrees. As long as there exists a vertex $v_i$ with $d_i > 0$, we choose the vertex $v_\ell$ with the currently lowest positive residual degree $d_\ell$. Then we insert $d_\ell$ edges from $v_\ell$ to the $d_\ell$ other vertices with highest residual degree. After that we update the residual degrees, $d_i := d_i - 1$ for each of these $d_\ell$ vertices and $d_\ell := 0$. Last, but not least, we have to check connectivity and, if necessary, establish it using the above-mentioned subroutine connectivity.

Dense Core. To construct a graph with a dense core for a certain degree sequence, we only have to change the above algorithm for sparse cores slightly. As long as there exists a vertex $v_i$ with $d_i > 0$, we now choose such a vertex arbitrarily and insert edges from $v_i$ to the $d_i$ vertices with the highest residual degrees. After that we only have to update the residual degrees and establish connectivity if it is not given.

Markov Process. To generate a random instance from the space of all graphs with the desired degree sequence, we start with an easy-to-find graph G with the desired degree sequence. In a next step, two edges (u, v) and (s, t) with $u \ne s$, $v \ne t$ such that $(u, s), (v, t) \notin G$ are chosen uniformly at random. The second step is to delete the edges (u, v) and (s, t) and replace them with (u, s) and (v, t).
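The following Python sketch illustrates the sparse-core construction, the connectivity subroutine, and one step of the Markov chain. It is a straightforward, deliberately unoptimized reading of the description (it does not attain linear running time); all names are hypothetical, the input is assumed to be realizable with all degrees positive and degree sum at least 2(n − 1), and an edge is taken to lie on a cycle if its endpoints remain connected without it.

```python
import random
from collections import defaultdict

def components(n, edges):
    """Connected components of ({0, ..., n-1}, edges) by depth-first search."""
    adj = defaultdict(set)
    for e in edges:
        a, b = tuple(e)
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for s in range(n):
        if s in seen:
            continue
        seen.add(s)
        stack, comp = [s], {s}
        while stack:
            for y in adj[stack.pop()]:
                if y not in seen:
                    seen.add(y)
                    comp.add(y)
                    stack.append(y)
        comps.append(comp)
    return comps

def degree_sequence(n, edges):
    deg = [0] * n
    for e in edges:
        for x in e:
            deg[x] += 1
    return deg

def sparse_core(degrees):
    """Join the vertex of lowest positive residual degree to the vertices of
    highest residual degree until all residual degrees are zero."""
    residual = dict(enumerate(degrees))
    edges = set()
    while any(r > 0 for r in residual.values()):
        v = min((u for u in residual if residual[u] > 0), key=residual.get)
        targets = sorted((u for u in residual if u != v and residual[u] > 0),
                         key=residual.get, reverse=True)[:residual[v]]
        if len(targets) < residual[v]:
            raise ValueError("degree sequence is not realizable")
        for u in targets:
            edges.add(frozenset((u, v)))
            residual[u] -= 1
        residual[v] = 0
    return edges

def on_cycle(n, edges, e):
    """An edge lies on a cycle iff its endpoints stay connected without it."""
    u, v = tuple(e)
    return any(u in c and v in c for c in components(n, edges - {e}))

def connectivity(n, edges):
    """Replace uv (on a cycle) and st (other component) by us and vt until connected."""
    edges = set(edges)
    while True:
        comps = components(n, edges)
        if len(comps) == 1:
            return edges
        uv = next(e for e in edges if on_cycle(n, edges, e))
        u, v = tuple(uv)
        home = next(c for c in comps if u in c)
        st = next(e for e in edges if not e & home)
        s, t = tuple(st)
        edges -= {uv, st}
        edges |= {frozenset((u, s)), frozenset((v, t))}

def swap_step(n, edges, rng):
    """One chain step: (u,v),(s,t) -> (u,s),(v,t) if simple and connected, else stay."""
    e1, e2 = rng.sample(sorted(edges, key=sorted), 2)
    (u, v), (s, t) = sorted(e1), sorted(e2)
    if rng.random() < 0.5:
        u, v = v, u  # random orientation, so both possible rewirings can occur
    new1, new2 = frozenset((u, s)), frozenset((v, t))
    if u == s or v == t or new1 in edges or new2 in edges:
        return edges
    candidate = (edges - {e1, e2}) | {new1, new2}
    return candidate if len(components(n, candidate)) == 1 else edges

if __name__ == "__main__":
    degrees = [3, 3, 2, 2, 2, 2]
    g = connectivity(len(degrees), sparse_core(degrees))
    rng = random.Random(42)
    for _ in range(1000):
        g = swap_step(len(degrees), g, rng)
    print(degree_sequence(len(degrees), g), len(components(len(degrees), g)))
```

Rejected proposals leave the graph unchanged, which is how the sketch realizes the rule that a rewiring producing a disconnected or non-simple graph is simply skipped.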
This process is a standard Markov chain process often used for randomized algorithms. We can observe that the degree sequence is unchanged by this algorithm. If rewiring two edges would induce a disconnected graph, the algorithm simply does not do this step and repeats the random choice. The following theorem states that this algorithm constructs a random instance out of the space of all connected graphs with the desired degree sequence.

Theorem 13.2.7. Independent of the starting point, in the limit, the above Markov-chain process will reach every possible connected realization with equal probability.

For practical reasons one has to find a stopping rule so that we can bound the number of steps of the algorithm. Mihail et al. [420] observed that the process levels off in terms of the difference of two sorted lists (at different points in time) of all neighbors (by degree) of nodes with unique degrees. Using this measure they heuristically claim that at most 3 times the level-off number of steps suffice to get a good random graph for instances like today's AS-level topology (about 12,000 vertices). They observed the number of steps to level off to be less than 10,000 for graphs of 3,000 vertices, less than 75,000 for graphs with 7,500 vertices, and less than 180,000 for graphs with 11,000 vertices.

d-Regular Graphs. A special variant of graphs with given degree sequences are d-regular graphs, where each vertex has exactly degree d. There are several known algorithms that can construct an equiprobable d-regular graph. McKay and Wormald [416] gave an algorithm that is also applicable to arbitrary degree sequences. For a given $d \in O(n^{1/3})$ its expected running time is in $O(n^2d^4)$, and furthermore it is very difficult to implement. A modification of this algorithm for d-regular graphs only improves the running time to $O(nd^3)$, but does not remove the disadvantages. Tinhofer [550] gave a simpler algorithm that does not generate the graphs uniformly at random and, moreover, the resulting probability distribution can be virtually unknown. Jerrum and Sinclair [329] introduced an approximation algorithm where the probabilities of all graphs vary only by a factor of $(1 + \varepsilon)$, but the d-regular graph can be constructed in polynomial time (in n and $1/\varepsilon$), and the algorithm works for all possible degrees d. A very simple model is the pairing model, introduced in the following. Its running time is exponential ($O(nd\,\exp(\frac{d^2-1}{4}))$), and the graph can only be constructed in this running time for $d = O(n^{1/3})$.

Pairing Model. A simple model to construct a d-regular graph is the so-called pairing model. There, nd points (with nd even) are partitioned into n groups; clearly every group should include exactly d points. In a first step a random pairing of all points has to be chosen. Out of this pairing we now construct a graph G. Let the n groups be associated with n vertices of the graph. There is an edge (i, j) between vertices i and j in the graph if and only if there is a pair in the pairing containing points in the ith and jth group.
The graph constructed in this way is d-regular if it contains no loops and no duplicated edges; otherwise the pairing is discarded and a new one is chosen. Furthermore, we have to check a posteriori whether the graph is connected.
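A rejection-sampling reading of the pairing model can be sketched in Python as follows; the function names and the retry limit are assumptions. Shuffling the nd points and pairing consecutive ones yields a uniformly random pairing; pairings that produce a loop or a duplicated edge are rejected. Since every simple d-regular graph arises from the same number of pairings, each accepted graph is uniformly distributed among the simple d-regular graphs, and connectivity is checked afterwards.

```python
import random
from collections import deque


def pairing_model(n, d, rng, max_tries=10_000):
    """Random simple d-regular graph via the pairing model with rejection.
    Point p (0 <= p < n*d) belongs to group, i.e. vertex, p // d."""
    if (n * d) % 2:
        raise ValueError("n * d must be even")
    points = list(range(n * d))
    for _ in range(max_tries):
        rng.shuffle(points)  # consecutive points form a uniformly random pairing
        edges, simple = set(), True
        for a, b in zip(points[0::2], points[1::2]):
            i, j = a // d, b // d
            if i == j or frozenset((i, j)) in edges:  # loop or duplicated edge
                simple = False
                break
            edges.add(frozenset((i, j)))
        if simple:
            return edges
    raise RuntimeError("no simple pairing found; d is probably too large")


def is_connected(n, edges):
    """A posteriori connectivity check by breadth-first search from vertex 0."""
    adj = {v: [] for v in range(n)}
    for e in edges:
        a, b = tuple(e)
        adj[a].append(b)
        adj[b].append(a)
    seen, queue = {0}, deque([0])
    while queue:
        for y in adj[queue.popleft()]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return len(seen) == n


if __name__ == "__main__":
    g = pairing_model(10, 3, random.Random(0))
    print(len(g), is_connected(10, g))
```

The expected number of rejected pairings grows like $\exp(\frac{d^2-1}{4})$, matching the exponential running time mentioned above, so this sketch is only practical for small d.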

13.3 Further Models of Network Evolution

In this section we want to present some further models for evolving networks. Since there is a huge variety of them, we want to consider only some of those network models that include significantly new ideas or concepts.

13.3.1 Game Theory of Evolution

The literature on games on a (fixed) network is considerable. But game-theoretical mechanisms can also be used to form a network, and this falls within our scope of interest. The following example is designed to model economic cooperation. Vertices correspond to agents, who establish or destroy edges between each other, trying to selfishly maximize the value of their cost revenue function.

The objective function of an agent sums the revenues that arise from every other agent directly or indirectly connected to him, minus the costs that occur for each edge incident to him. Let $c$ be the fixed cost of an incident edge and let $\delta \in (0, 1)$. The cost revenue of a vertex $v$ is
$$u_v(G) = \Bigl(\sum_{w \in V(G)} \delta^{d(v,w)}\Bigr) - \deg(v)\, c,$$
where $G$ is the current network and $d(v, w)$ is the edge-minimal path distance from $v$ to $w$ in $G$. (To avoid confusion we denote the degree of a vertex $v$ by $\deg(v)$ here.) Set that distance to infinity for vertices in different components, so that they contribute $\delta^{\infty} = 0$, or, equivalently, confine the index set of the sum to the component of $v$. An edge is built when it increases the objective function of at least one of the agents becoming incident and does not diminish the objective function of the other. To delete an edge it suffices that one incident agent benefits from the deletion. In fact, the model analyzed is a little more involved: agents may simultaneously delete any subset of their incident edges while participating in the creation of a new edge, and they consider their cost revenue after both actions. Formally:

Definition 13.3.1. A network $G$ is stable if for all $v \in V(G)$
$$\forall e \in E(G):\ v \in e \implies u_v(G) \ge u_v(G \setminus e)$$
and
$$\forall w \in V(G),\ \forall S \subseteq \{e \in E(G) \mid v \in e \lor w \in e\}:\ u_v\bigl((G \cup \{v,w\}) \setminus S\bigr) > u_v(G) \implies u_w\bigl((G \cup \{v,w\}) \setminus S\bigr) < u_w(G).$$

This quasi-Pareto notion of stability does not guarantee that a stable network is in some sense good, namely that the total benefit is maximal. Therefore we define:

Definition 13.3.2. A network $G$ is efficient if
$$\forall G':\ V(G) = V(G') \implies \sum_{v} u_v(G') \le \sum_{v} u_v(G).$$
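To make the cost revenue function and Definition 13.3.2 concrete, here is a minimal Python sketch, assuming graphs are stored as dictionaries of adjacency sets on the vertices $0, \ldots, n-1$. It computes $u_v(G)$ by breadth-first search and compares the total value of only three candidates (complete graph, star, empty graph). It illustrates Theorem 13.3.4 below but is not a proof of efficiency over all graphs; all function names are hypothetical.

```python
from collections import deque

def bfs_distances(adj, v):
    """Edge-minimal distances d(v, w) to every w in the component of v."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def utility(adj, v, delta, c):
    """u_v(G) = sum_w delta**d(v, w) - deg(v) * c.

    Vertices outside the component of v have d = infinity and contribute 0,
    so only the component is summed (w = v itself adds delta**0 = 1)."""
    revenue = sum(delta ** d for d in bfs_distances(adj, v).values())
    return revenue - len(adj[v]) * c

def total_value(adj, delta, c):
    """Objective compared in Definition 13.3.2: the sum of u_v(G) over all v."""
    return sum(utility(adj, v, delta, c) for v in adj)

def complete_graph(n):
    return {v: {w for w in range(n) if w != v} for v in range(n)}

def star_graph(n):
    adj = {v: set() for v in range(n)}
    for w in range(1, n):  # vertex 0 is the center
        adj[0].add(w)
        adj[w].add(0)
    return adj

def empty_graph(n):
    return {v: set() for v in range(n)}

def best_candidate(n, delta, c):
    """Name of the candidate with the largest total value
    (a comparison of three networks, not a search over all graphs)."""
    candidates = {"complete": complete_graph(n), "star": star_graph(n),
                  "empty": empty_graph(n)}
    return max(candidates, key=lambda name: total_value(candidates[name], delta, c))
```

For instance, with $n = 5$ and $\delta = 0.5$, the costs $c = 0.1$, $0.4$, and $1.2$ fall into the three regimes of Theorem 13.3.4, and `best_candidate` returns `"complete"`, `"star"`, and `"empty"`, respectively.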

Theorem 13.3.3. In the above setting we have: For $c < \delta$ and $(\delta - c) > \delta^2$, the complete graph is stable. For $c < \delta$ and $(\delta - c) \le \delta^2$, the star is stable. For $c \ge \delta$, the empty graph is stable.

Theorem 13.3.4. In the above setting we have: For $(\delta - c) > \delta^2$, only the complete graph is efficient. For $(\delta - c) < \delta^2$ and $c < \delta + (n-2)\delta^2/2$, only a star is efficient. For $(\delta - c) < \delta^2$ and $c > \delta + (n-2)\delta^2/2$, only the empty graph is efficient.

Through the work of A. Watts [572], this approach received a push towards evolution. Given the parameters $c$ and $\delta$, which networks will emerge? This remains unclear until the order in which agents may alter their incident part of the edge set (and the initial network) is given. So far, only the empty network has been scrutinized as initial configuration. The order in which the agents can make their decisions is random: at each step $t$ of a discretized time model, one edge $e$ of the complete graph on all agents (whether $e$ is part of the current network or not) is chosen uniformly at random. Then the two incident agents may decide whether to keep or drop, or respectively establish or leave out, the edge $e$ in the updated network. For an existing edge $e$, this means that it is deleted if and only if one of its endvertices benefits from the deletion; for a non-existing edge $e$, it is inserted if and only if at least one of the potentially incident vertices benefits and the other is at least not worse off. Note that in this model more sophisticated actions, comprising the creation of one edge and the possible deletion of several others, are not allowed. All decisions are taken selfishly by considering only the cost revenue of the network immediately after the decision at time $t$. In particular, no vertex has any kind of long-term strategy. The process terminates once a stable network is reached. For this model the following holds:

Theorem 13.3.5. In the above setting we have: For $(\delta - c) > \delta^2 > 0$, the process terminates in a complete graph in finite time. For $(\delta - c) < 0$, the empty graph is stable. For $\delta^2 > (\delta - c) > 0$, $P_{\text{star}} := \Pr[\text{process terminates in finite time in a star}] > 0$, but $P_{\text{star}} \to 0$ for $n \to \infty$.

The first result is obvious, as any new edge pays. The second merely restates part of Theorem 13.3.3. For the third part, note that a star can no longer emerge as soon as two disjoint pairs of vertices have formed their edges. The model and the results, though remarkable, still leave much room for further refinement, generalization, and variation.
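The random-order process described above can be simulated directly. The following Python sketch is a hedged illustration, not Watts' own code: it starts from the empty network, picks a pair of agents uniformly at random, applies the myopic keep/drop and establish/leave-out rules, and stops once no pair would change its edge. The function names, the tolerance `EPS`, and the `max_steps` guard are our assumptions.

```python
import random
from collections import deque

EPS = 1e-12  # tolerance so that floating-point noise is not read as a gain

def utility(adj, v, delta, c):
    """u_v(G) = sum of delta**d(v, w) over the component of v, minus deg(v)*c."""
    dist, queue = {v: 0}, deque([v])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return sum(delta ** d for d in dist.values()) - len(adj[v]) * c

def toggled(adj, v, w):
    """Copy of adj with edge {v, w} removed if present, added if absent."""
    new = {x: set(nbrs) for x, nbrs in adj.items()}
    if w in new[v]:
        new[v].discard(w)
        new[w].discard(v)
    else:
        new[v].add(w)
        new[w].add(v)
    return new

def would_toggle(adj, v, w, delta, c):
    """Myopic decision of the pair {v, w}, looking only one step ahead."""
    after = toggled(adj, v, w)
    gain_v = utility(after, v, delta, c) - utility(adj, v, delta, c)
    gain_w = utility(after, w, delta, c) - utility(adj, w, delta, c)
    if w in adj[v]:
        # Existing edge: deleted iff one endvertex strictly benefits.
        return gain_v > EPS or gain_w > EPS
    # Missing edge: inserted iff one endpoint benefits, the other is not worse off.
    return (gain_v > EPS and gain_w >= -EPS) or (gain_w > EPS and gain_v >= -EPS)

def watts_process(n, delta, c, seed=0, max_steps=100_000):
    """Run the process from the empty network until no pair wants to change."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    pairs = [(v, w) for v in range(n) for w in range(v + 1, n)]
    for _ in range(max_steps):
        if not any(would_toggle(adj, v, w, delta, c) for v, w in pairs):
            return adj  # stable with respect to single-edge moves
        v, w = rng.choice(pairs)  # an edge of the complete graph, uniformly at random
        if would_toggle(adj, v, w, delta, c):
            adj = toggled(adj, v, w)
    raise RuntimeError("process did not terminate within max_steps")
```

With $n = 6$ and $\delta = 0.5$, the cost $c = 0.1$ satisfies $(\delta - c) > \delta^2$ and the run ends in the complete graph; with $c = 0.6$ the empty network is already stable and the run stops immediately. In the regime $\delta^2 > (\delta - c) > 0$, repeated runs with different seeds could be used to estimate how often a star appears; we have not checked that every such run terminates, hence the guard.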
For example, if the probability of ending in a star is positive but tends to zero, one might expect networks in which some vertices have high degree while most vertices have very low degree. This is a first indication of the much-discussed power-law structure.

13.3.2 Deterministic Impact of the Euclidean Distance

For the following geometric model we denote the Euclidean distance between a vertex $i$ and a vertex $j$ by $d(i, j)$. The idea of this model by Fabrikant, Koutsoupias, and Papadimitriou [196] is to construct a tree iteratively. In a first step, a sequence $p_0, p_1, \ldots, p_n$ of vertices is distributed within a unit square or unit sphere. In the next step, edges are inserted successively. Here we want to balance two opposing goals. On the one hand, we are interested in connecting vertices to their geometrically nearest neighbor. On the other hand, we are interested in a high degree of centrality for each vertex. In order to deal with this trade-off between the last-mile costs and the operation costs due to communication delays, we connect the vertices with edges in the following way. Vertex $i$ becomes connected to the vertex $j < i$ that minimizes $\alpha\, d(i, j) + h_j$, where $h_j$ denotes the centrality measure of $j$ and $\alpha$ the relative importance of both goals. Here centrality measures can be the average number of hops to other vertices, the maximum number of hops to another vertex, or the number of hops to a given center, i.e., a fixed vertex $v \in V$ (for more details on centrality measures, see Chapter 3). The behavior of this model is of course highly dependent on the value of $\alpha$ and, to a lesser extent, on the shape used to place the vertices. Let $T$ denote the tree constructed in a unit square, and let $h_j$ be the number of hops from $p_j$ to $p_0$ in $T$. Then we can state the following properties of $T$ for different values of $\alpha$.

Theorem 13.3.6 (Theorem 2.1 in [196]). If $T$ is generated as above, then:
1. If $\alpha < 1/\sqrt{2}$, then $T$ is a star with vertex $p_0$ as its center.
2. If $\alpha = \Omega(\sqrt{n})$, then the degree distribution of $T$ is exponential, that is, the expected number of vertices that have degree at least $k$ is at most $n^2 \exp(-ck)$ for some constant $c$: $\mathbb{E}[|\{i : \text{degree of } i \ge k\}|] < n^2 \exp(-ck)$.
3. If $\alpha \ge 4$ and $\alpha = o(\sqrt{n})$, then the degree distribution of $T$ is a power law; specifically, the expected number of vertices with degree at least $k$ is greater than $c\,(k/n)^{-\beta}$ for some constants $c$ and $\beta$ (that may depend on $\alpha$, though): $\mathbb{E}[|\{i : \text{degree of } i \ge k\}|] > c\,(k/n)^{-\beta}$. Specifically, for $\alpha = o(\sqrt[3]{n})$ the constants are $\beta \ge 1/6$ and $c = O(\alpha^{-1/2})$.

This theorem gives the impression that networks constructed by this algorithm have a degree sequence following a power law. But there are some points to add. First, the power law given in this theorem does not match the definition of a power law given in this chapter: the authors bound the number of vertices with degree at least $k$, a cumulative quantity, rather than the number of vertices with degree exactly $k$. Therefore, one has to take care when comparing results.
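The tree construction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the points are drawn uniformly at random from the unit square, and $h_j$ is the hop count from $p_j$ to $p_0$, as in Theorem 13.3.6. The function names are hypothetical, and the naive minimum over all $j < i$ makes the sketch quadratic in $n$.

```python
import math
import random

def fkp_tree(n, alpha, seed=0):
    """Place p_0, ..., p_n uniformly in the unit square (assumption) and attach
    each p_i to the j < i minimizing alpha * d(i, j) + h_j, where h_j is the
    number of hops from p_j to p_0 in the tree built so far."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n + 1)]
    parent = [None] * (n + 1)  # p_0 is the root and has no parent
    hops = [0] * (n + 1)       # h_0 = 0
    for i in range(1, n + 1):
        j = min(range(i), key=lambda k: alpha * math.dist(pts[i], pts[k]) + hops[k])
        parent[i] = j
        hops[i] = hops[j] + 1
    return pts, parent

def degrees(parent):
    """Vertex degrees of the tree given by its parent pointers."""
    deg = [0] * len(parent)
    for i, j in enumerate(parent):
        if j is not None:
            deg[i] += 1
            deg[j] += 1
    return deg

def count_degree_at_least(deg, k):
    """|{i : degree of i >= k}|, the cumulative count bounded in Theorem 13.3.6."""
    return sum(1 for d in deg if d >= k)
```

With $\alpha = 0.5 < 1/\sqrt{2}$, attaching to $p_0$ costs at most $\alpha\sqrt{2} < 1$, while any other vertex costs at least $h_j \ge 1$, so the result is a star centered at $p_0$, as in part 1 of the theorem. `count_degree_at_least` evaluates the cumulative count that the theorem bounds, not the number of vertices with degree exactly $k$.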
A second caveat is that the power-law statement covers only a very small number of vertices in the network: for the majority of vertices (all but $O(n^{1/6})$), Fabrikant et al. make no statement. Subsequently, Berger et al. [58] determined the actual behavior of the degree distribution obtained by this model and showed that there is a very large number of vertices with almost maximum degree.

13.3.3 Probabilistic Impact of the Euclidean Distance

The model of Waxman [574] uses the Euclidean distance, henceforth denoted by $d(\cdot, \cdot)$, to determine the probability distribution used to generate a graph of the model. In a first step, $n$ points on a finite 2-dimensional lattice are chosen equiprobably to form the vertex set $V(G)$ of the graph $G$. Then each edge $\{i, j\}$, $i, j \in V$, of the complete graph on these vertices is chosen to be part of the edge set $E(G)$ with probability
$$\Pr(\{i, j\}) = \beta \exp\left(-\frac{d(i, j)}{L\alpha}\right).$$
Here $L$ denotes the maximum Euclidean distance of two lattice points, i.e., the diagonal of the lattice. Increasing $\alpha \in (0, 1]$ raises the probability of long edges relative to short ones and thus increases the expected length of an edge, whereas increasing $\beta \in (0, 1]$ results in a higher expected number of edges. In a variant of the model, the function $d(\cdot, \cdot)$ is defined randomly for each pair of chosen vertices. Thus, it will in general not even fulfill the triangle inequality.
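A Waxman-style generator can be sketched as follows. This is a minimal Python illustration, assuming a square lattice with `side` points per axis and sampling the $n$ vertices without replacement; the function name and the default side length are hypothetical, and the code uses the edge probability $\beta \exp(-d(i,j)/(L\alpha))$ given above.

```python
import math
import random
from itertools import combinations

def waxman_graph(n, alpha, beta, side=100, seed=0):
    """n distinct points chosen uniformly from a side x side integer lattice;
    each pair {i, j} becomes an edge with probability
    beta * exp(-d(i, j) / (L * alpha)), where L is the lattice diagonal."""
    if not (0 < alpha <= 1 and 0 < beta <= 1):
        raise ValueError("alpha and beta must lie in (0, 1]")
    rng = random.Random(seed)
    lattice = [(x, y) for x in range(side) for y in range(side)]
    pts = rng.sample(lattice, n)  # n distinct lattice points, equiprobably
    L = math.dist((0, 0), (side - 1, side - 1))  # maximum distance of two lattice points
    edges = set()
    for i, j in combinations(range(n), 2):
        p = beta * math.exp(-math.dist(pts[i], pts[j]) / (L * alpha))
        if rng.random() < p:  # exactly one uniform draw per pair
            edges.add((i, j))
    return pts, edges
```

Because the points and the uniform draws depend only on the seed, raising $\beta$ or $\alpha$ with a fixed seed can only add edges, which makes the role of both parameters easy to observe.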

13.4 Internet Topology

The Internet consists of two main levels, the router level and the Autonomous System (AS) level. Both levels have characteristic properties, such as a power law with a specific exponent, a certain connectivity, and so on. These properties are analyzed in detail by Faloutsos et al. [197]. One goal is now to construct synthetic networks that closely resemble the Internet and, further, can be used to predict its future topology. There are two types of generators. The first type are model-oriented generators that implement only a specified set of models, as for example those given in the previous sections. A universal topology generator, on the other hand, should in addition be easily extensible by new models. Such a universal generator is interesting for researchers who need good synthetic topologies to simulate their Internet protocols and algorithms. Therefore very good generation tools are needed. Those tools should have at least the following characteristics to be usable for a wide range of researchers and their different applications, not only for Internet topologies (see also [417]).
1. Representativeness: The tool should generate accurate synthetic topologies in which as many aspects of the target network as possible are reflected.
2. Inclusiveness: A single tool should combine the strengths of as many models as possible.
3. Flexibility: The tool should be able to generate networks of arbitrary size.
4. Efficiency: Even large topologies should be generated in reasonable CPU time and memory.
5. Extensibility: The generator should be easily extendible by the user with new models.
6. User-friendliness: There should be an interface and mechanics of use that are easy to learn.
7. Interoperability: There should be interfaces to the main simulation and visualization applications.
8. Robustness: The tool should be robust in the sense of resilience to random failures and, moreover, have the capability to detect errors easily.
These are the desired characteristics of a generator tool. To achieve them, some challenges remain that have not yet been solved satisfactorily. Two main challenges in the field of topology generation are (quoted from [417]):

1. How do we develop an adapting and evolving generation tool that constitutes an interface between general Internet research and pure topology generation research? Through this interface, representative topologies, developed by the topology generation research community, can be made readily available to the Internet research community at large.
2. How do we design a tool that also achieves the goal of facilitating pure topology generation research? A researcher that devises a generation model should be able to test it readily without having to develop a topology generator from scratch.

The topology generators available today, or rather their underlying models, can be classified as follows (after [417]). On the one hand, there are ad-hoc models that are based on educated guesses, like the model of Waxman [574] and further models [109, 157]. On the other hand, there are measurement-based models, where a measure can be, for example, a power law. We can divide this class into causality-oblivious and causality-aware models. By causality we mean possible fundamental or physical causes of the observed structure, whereas causality-oblivious models orient themselves towards abstract features such as power laws. The INET model and generator, described in Section 13.4.2, and the PLRG model by Aiello et al. [9] belong to the first of these subclasses. The preferential attachment model and the topology generator BRITE belong to the causality-aware models.

13.4.1 Properties of the Internet's Topology

Faloutsos, Faloutsos and Faloutsos [197] analyzed the structure of the Internet topology at three different points in time, and in particular the growth of certain metrics. Some of the most obvious metrics are, for example, the rank of a vertex, i.e., the position of the vertex in a list of vertex degrees sorted in decreasing order, and the frequency of a vertex degree, i.e., how often a degree $k$ occurs among all the vertices. Using the minimal distance between two vertices, i.e., the minimal number of edges on a path between the two vertices, one can determine the number of pairs of vertices $P(h)$ that are separated by a distance of no more than $h$. With this definition, self-pairs are included in $P(h)$ and all other pairs are counted twice. A resultant metric is the average number of vertices $N(h)$ that lie within a distance of at most $h$ hops. In the Internet there are two levels worth evaluating (for details see [259]). In [197] the data collected by the Border Gateway Protocol (BGP), which records all inter-domain connections, is evaluated. There the authors looked at three particular points in time: the first graph is from November 1997, the second from April 1998, and the third from December 1998. By evaluating this information and looking for power laws for the above-mentioned metrics, they draw the following conclusions. The first conclusion is that the degree $d(v)$ of a vertex $v$ is proportional to the rank $r_v$ of the vertex, raised to the power of a constant $\mathcal{R}$. This yields the power law of the form
$$d(v) \propto r_v^{\mathcal{R}}.$$
$\mathcal{R}$ is defined as the slope of the plot of $d(v)$ versus the rank of $v$ in log-log scale. A second observation is about the frequency $f_k$ of a vertex degree $k$: $f_k \propto k^{\mathcal{O}}$ with $\mathcal{O}$ a constant. The constant $\mathcal{O}$ can be determined as the slope of the plot of $f_k$ versus $k$ in log-log scale. For the total number of pairs of vertices $P(h)$, they can only approximate the power law to the form $P(h) \propto h^{\mathcal{H}}$, where $\mathcal{H}$ is the so-called hop-plot exponent and is constant. In this case the constant $\mathcal{H}$ is defined by the slope of the plot of $P(h)$ versus $h$ in log-log scale.
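The metrics of Faloutsos et al. are straightforward to compute. The Python sketch below estimates the exponents $\mathcal{R}$ and $\mathcal{O}$ by a least-squares fit in log-log scale and computes $P(h)$ by breadth-first search from every vertex. Using a plain least-squares fit to determine "the slope" is our assumption; the helper names are hypothetical, and degree-zero vertices are skipped because $\log 0$ is undefined.

```python
import math
from collections import Counter, deque

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x), used as the exponent estimate."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den  # den == 0 if all x coincide: no slope can be fitted

def rank_exponent(degrees):
    """Estimate R from d(v) ~ r_v**R; rank 1 is the largest degree."""
    ranked = sorted((d for d in degrees if d > 0), reverse=True)
    return loglog_slope(range(1, len(ranked) + 1), ranked)

def degree_exponent(degrees):
    """Estimate O from f_k ~ k**O, using only degrees k >= 1."""
    freq = Counter(d for d in degrees if d > 0)
    ks = sorted(freq)
    return loglog_slope(ks, [freq[k] for k in ks])

def pair_counts(adj, max_h):
    """P(0), ..., P(max_h): ordered pairs at distance <= h, self-pairs included,
    so every unordered pair of distinct vertices is counted twice."""
    exact = [0] * (max_h + 1)
    for s in adj:
        dist, queue = {s: 0}, deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        for d in dist.values():
            if d <= max_h:
                exact[d] += 1
    # Cumulative sums turn "distance exactly h" into "distance at most h".
    P, running = [], 0
    for count in exact:
        running += count
        P.append(running)
    return P
```

The average neighborhood size is then $N(h) = P(h)/n$, and the hop-plot exponent $\mathcal{H}$ can be estimated by applying `loglog_slope` to the points $(h, P(h))$ for $h \ge 1$.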

13.4.2 INET: The InterNEt Topology Generator

This model-oriented topology generator (more details in [332]) tries to implement more than just the analyzed power law in the degree distribution. Several of the analyses of Faloutsos et al. result in exponential laws. The first exponential law they observed and determined an exact form for is the frequency of certain degrees:
$$f_k = \exp(at + b)\, k^{\mathcal{O}}, \qquad (13.18)$$
where $f_k$ is the frequency of a degree $k$, $a$, $b$, $\mathcal{O}$ are known constants, and $t$ is the time in months since November 1997. With this equation, we can also predict the frequency of a degree $k$ at a time $t$ in the future. A second exponential law they found was the degree growth:
$$k = \exp(pt + q)\, r^{\mathcal{R}}. \qquad (13.19)$$

The degree $k$ at a given rank $r$ also grows exponentially over time. Here $p$, $q$, $\mathcal{R}$ are known constants and $t$ is again the number of months since November 1997. This law tells us that the value of the $i$th largest degree of the Internet grows exponentially. This does not necessarily mean that every AS's degree grows exponentially with time, because the rank of a particular AS can change as the number of ASs increases. Two further exponential laws are the pair size growth and the resultant neighborhood size growth:
$$P_t(h) = \exp(s_h t)\, P_0(h). \qquad (13.20)$$

The pair size within $h$ hops, $P(h)$, grows exponentially at rate $s_h$ from $P_0(h)$, the pair size within $h$ hops at time 0 (November 1997). The neighborhood size within $h$ hops, $A(h) = P(h)/P(0)$, grows exponentially as follows:
$$A_t(h) = \frac{P_t(h)}{P_t(0)} = \exp\bigl((\log P_0(h) - \log P_0(0)) + (s_h - s_0)t\bigr) = A_0(h) \exp\bigl((s_h - s_0)t\bigr). \qquad (13.21)$$

Here $A_0(h)$ is the neighborhood size at time 0 (November 1997), and $t$ is, as always, the number of months since time 0. The INET topology generator uses the observed exponential laws to construct a network that strongly resembles the real Internet evaluated at time $t$. In a first step, the user inputs the number of vertices and the fraction $p$ of vertices that have degree one. Assuming exponential growth of the number of ASs in the Internet, the generator computes the value of $t$, the number of months since November 1997. It is then easy to compute the degree-frequency and rank distributions. Since the second power law above (frequency versus degree) only holds for 98% of the vertices, the degrees of the top 2% of the vertices are assigned using the rank distribution (13.19). A fraction $p$ of the vertices is assigned degree one. The remaining vertices are assigned degrees following the degree-frequency distribution. The edges are inserted into the graph $G$ to be generated according to the following rules. First, a spanning tree is built among the vertices with degree strictly larger than one. This is done by successively choosing, uniformly at random, a vertex with degree strictly larger than one that is not in the current tree $G$, and connecting it to a vertex in $G$ with probability $k/K$, i.e., proportional to $k$. Here $k$ is the degree of the vertex in $G$, and $K$ is the sum of degrees of all vertices already in $G$ that still have at least one unfilled degree. In a next step, the $p|V|$ vertices with degree one are connected to vertices in $G$ with the same proportional probability. In a final step, the remaining free degrees in $G$ are filled, starting with the vertex with the largest assigned degree, by connecting to randomly picked vertices with free degrees, again using proportional probability. Before the edges are actually inserted, the connectivity of the graph is checked by a feasibility test.

Other Generators. There are several more topology generators available. The GT-ITM generator [109] is able to construct different topologies. One of them is a transit-stub network that has a well-defined hierarchical structure.
The Tiers generator [157] is designed to provide a three-level hierarchy that represents Wide Area Networks (WAN), Metropolitan Area Networks (MAN), and Local Area Networks (LAN). The generator BRITE [417] also contains several mechanisms to construct topologies. It includes not only the well-known basic model of Barabási and Albert on router and on AS level but also the Waxman model for both types. It can also illustrate and evaluate the networks made by the INET and GT-ITM generators, as well as the routing data obtained from the National Laboratory for Applied Network Research [452]. The power laws observed in the Internet make it easy to assess the representativeness of such a generator. Medina et al. [418] used the above-mentioned topology generators to generate different kinds of topologies and then evaluated them with respect to the existence of power laws. As a result, they found that the degree-versus-rank and the number-of-vertices-versus-degree power laws were not observed in all of the topologies. In this way, the presence of these laws can be used to validate the accuracy of a generator. Power laws concerning the neighborhood size and the eigenvalues were found in all the generated topologies, but with different values of the exponent.
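The first two INET edge-insertion phases described earlier in this section can be sketched as follows. This simplified Python illustration is not the INET code: it takes the assigned degrees as input, builds the spanning tree on the vertices of degree larger than one, and then attaches the degree-one vertices, each time choosing the partner with probability $k/K$. Using the assigned degree for $k$ and seeding the tree with the first randomly ordered vertex are our assumptions; the feasibility test is reduced to raising an error, and the final phase is omitted. The function name is hypothetical.

```python
import random

def inet_tree_phases(target_deg, seed=0):
    """Simplified sketch of INET's first two edge-insertion phases.

    target_deg maps every vertex to its assigned degree (>= 1). Returns the
    edge list; the final phase (filling the remaining free degrees) is omitted."""
    rng = random.Random(seed)
    filled = {v: 0 for v in target_deg}
    edges = []

    def attach(u, tree):
        # Tree vertices with at least one unfilled degree are the candidates.
        cands = [v for v in tree if filled[v] < target_deg[v]]
        if not cands:
            raise ValueError("infeasible degree assignment")
        # Pick v with probability k / K (k: assigned degree, K: sum over cands).
        v = rng.choices(cands, weights=[target_deg[x] for x in cands])[0]
        edges.append((u, v))
        filled[u] += 1
        filled[v] += 1

    big = [v for v in target_deg if target_deg[v] > 1]
    if not big:
        raise ValueError("need at least one vertex of degree > 1")
    rng.shuffle(big)      # degree > 1 vertices join in uniformly random order
    tree = [big[0]]       # assumption: the first of them seeds the tree
    for u in big[1:]:     # phase 1: spanning tree on the degree > 1 vertices
        attach(u, tree)
        tree.append(u)
    for u in target_deg:  # phase 2: hang the degree-one vertices on the tree
        if target_deg[u] == 1:
            attach(u, tree)
    return edges
```

When the free degrees left after phase 1 exactly match the number of degree-one vertices, as for the degree assignment 3, 2, 2, 1, 1, 1, the two phases already produce a spanning tree that realizes every assigned degree.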

13.5 Chapter Notes

In 1959, Gilbert [244] introduced the model $(G_{n,p})$ in the sense of our second definition. In the same year, Erdős and Rényi [181] presented a model parameterized by the number of vertices and the number of edges, $n$ and $m$, which corresponds to the first definition we give, except that we fix the average vertex degree, $z = 2m/n$. For a good introduction to this research area we refer to Bollobás [67, Chapter 7]. There, Theorem 9 corresponds to our Theorem 13.1.1. Our discussion of Local Search in Small Worlds is based on Kleinberg's pertinent work [360]. Further details on the exponential cutoff, and an evolutionary model that accounts for the exponential cutoff, are given in the very recent paper by Fenner, Levene, and Loizou [205]. In Bollobás et al. [68], further interesting behavior caused by certain initial choices of vertices and edges for the preferential attachment model of Barabási and Albert is described, and some more imprecisions are pointed out. For more details on the equivalence of $(G^t_m)$ and $(G^{tm}_1)$ see [68], too. More explanations of other power law models and some mathematical background are also given there. Simple and efficient generators for standard random graphs, small worlds, and preferential attachment graphs are described in [43]. Generating functions are a concept for dealing with counting problems that is far more general than we present it here. Most books on combinatorics include a thorough discussion of generating functions (see for example [10]). A reference devoted entirely to generating functions is [586], which can be downloaded at www.cis.upenn.edu/~wilf/. Our treatment mainly makes the assertions found in [448] mathematically precise; further details on generating functions for bipartite graphs can also be found there. The results in Section 13.2.3 are basically taken from Mihail et al. [420]. A proof of Theorem 13.2.4 is contained in [57, Chapter 6] (the original paper of Erdős and Gallai [180] is in Hungarian).
The more precise Theorem 13.2.5 was first given and proven by Havel and Hakimi [288, 270], and it is stated here as found in Aigner and Triesch [11]. For references on Markov chain processes see [420]. An overview of the algorithms for $d$-regular graphs can be found in Steger and Wormald [532]. They also construct a polynomial algorithm that works for all $d$ and give an idea of how to implement that algorithm to obtain an expected running time in $O(nd^2 + d^4)$. We present the results given in Section 13.3.1 following A. Watts [572], though earlier work by Jackson and Wolinsky [323] prepared the ground. The $p^*$-model is introduced in Section 10.2.5 and is therefore omitted in this chapter.
