
Efficiency of scale-free networks: error and attack tolerance

2003, Physica A: Statistical …

arXiv:cond-mat/0205601v1 28 May 2002
Efficiency of Scale-Free Networks: Error and Attack Tolerance
Paolo Crucitti a, Vito Latora b, Massimo Marchiori c,d, and Andrea Rapisarda b
a Scuola Superiore di Catania, Via S. Paolo 73, 95123 Catania, Italy
b Dipartimento di Fisica e Astronomia, Università di Catania, and INFN sezione di Catania, Corso Italia 57, 95129 Catania, Italy
c W3C and Lab. for Computer Science, Massachusetts Institute of Technology, USA
d Dipartimento di Informatica, Università di Venezia, Italy
Abstract
The concept of network efficiency, recently proposed to characterize the properties of small-world networks, is here used to study the effects of errors and attacks on scale-free networks. Two different kinds of scale-free networks, i.e. networks with power law P(k), are considered: 1) scale-free networks with no local clustering produced by the Barabasi-Albert model and 2) scale-free networks with high clustering properties as in the model by Klemm and Eguíluz. Their properties are compared to the properties of random graphs (exponential graphs). By using the global and the local efficiency as mathematical measures, we investigate the effects of errors and attacks on both the global and the local properties of the network. We show that the global efficiency is a better measure than the characteristic path length to describe the response of complex networks to external factors. We find that, at variance with random graphs, scale-free networks display, both on a global and on a local scale, a high degree of error tolerance and an extreme vulnerability to attacks. In fact, the global and the local efficiency are unaffected by the failure of some randomly chosen nodes, though they are extremely sensitive to the removal of the few nodes which play a crucial role in maintaining the network's connectivity.
Key words: Structure of Complex Networks, Scale-Free Networks
PACS: 89.75.-k, 89.75.Fb, 05.90.+m
Preprint submitted to Elsevier Preprint 1 February 2008
1 Introduction
The study of the structural properties of the underlying network can be very important to understand the functions of a complex system [1]. For instance, the architecture of a computer network is the first critical issue to take into account when we want to design an efficient communication system. Similarly, the efficiency of the communication and of the navigation over the Net is strongly related to the topological properties of the Internet and of the World Wide Web. The connectivity structure of a population (the set of social contacts) affects the way ideas are diffused, but also the spreading of epidemics over the network. Only very recently have the increasing accessibility of databases of real networks on one side, and the availability of powerful computers on the other side, made possible a series of empirical studies on the properties of biological, technological and social networks. The results obtained have shown that, in most cases, real networks are very different from random and regular networks, and display some common properties such as high efficiency and a high degree of robustness. The literature on complex networks has grown exponentially in the last few years; a comprehensive review can be found in Refs. [2–4]. In the following we enumerate some of the results that have appeared in the recent literature and that are important in order to understand the purpose of this paper:
(1) In ref. [5], Watts and Strogatz have shown that the connection topology of some real networks is neither completely regular nor completely random. These networks, named small-world networks [6], in fact exhibit a high clustering coefficient, like regular lattices, and a small average distance between two generic points (small characteristic path length), like random graphs.
Watts and Strogatz have also proposed a simple model (the WS model) to construct networks with small-world properties (i.e. networks with high clustering and small average distance), by rewiring a few edges of a regular lattice.
(2) In ref. [7] two of us have introduced the concept of efficiency of a network, which measures how efficiently information is exchanged over the network. By using the efficiency as a new measure to characterize the network, it has been shown that small-worlds are systems that are both globally and locally efficient. Moreover, the description of a network in terms of its efficiency extends the small-world analysis also to unconnected networks and to real systems that are better represented as weighted networks [8–10].
(3) Small average distance and high clustering are not the only common features of complex networks. Barabasi and collaborators have studied P(k), the degree distribution of a network, and found that many large networks (the World Wide Web, the Internet, metabolic networks and protein networks) are scale-free, i.e. have a power-law degree distribution $P(k) \sim k^{-\gamma}$ [11–16]. Neither random graph theory [17], nor the WS model to construct networks with small-world properties [5], can reproduce this feature: in fact both give a P(k) peaked around the average value of k. In ref. [12] Barabasi and Albert have proposed a simple model (the BA model) to construct a scale-free topology by modeling the dynamical growth of the network: some ad hoc assumptions in the network dynamics result in a network with the correct scale-free features, i.e. with a power-law degree distribution $P(k) \sim k^{-3}$. Moreover, in ref. [14] the authors have shown that scale-free networks, at variance with random networks, display a high degree of error tolerance. That is, the ability of their nodes to communicate is unaffected by the failure of some randomly chosen nodes.
However, error tolerance comes at a high price in that scale-free networks are extremely vulnerable to attacks, i.e. to the removal of a few nodes which play a crucial role in maintaining the network's connectivity. Such error tolerance and attack vulnerability, typical of scale-free networks, have also been found in real networks [14].
(4) The BA scale-free model produces networks with a power law connectivity distribution, but not with small-world properties. In fact the BA scale-free networks have a small average distance between two generic points, the first property of a small-world network, while they lack high clustering, the other property of a small-world network. More recently Klemm and Eguíluz [18] have proposed an alternative model (the KE model) to construct networks where scale-free degree distributions coexist with small average distances and with strong clustering. Therefore, the KE model reproduces, at the same time, the two distinct features present in real networks: power law degree distribution and small-world behavior.
In this paper we use the concept of global and local efficiency to characterize the properties of scale-free networks (i.e. networks with power law degree distributions), and to study their error and attack tolerance. We consider both scale-free networks with no clustering (the BA model), and scale-free networks with high clustering properties (the KE model). We analyze the effect of errors and attacks not only on the global properties of the network (as done in ref. [14] by using the average distance between two points as a measure) but also on the local properties of the network. Moreover, we compare the results obtained in terms of global and local efficiency of the network with the results in terms of average distance and clustering coefficient. The three innovative points of our paper are:
• The use of the efficiency measure to characterize scale-free networks.
This allows us to avoid problems due to the divergence of the average distance.
• The parallel study of scale-free networks with no clustering, and scale-free networks with high clustering.
• The study of the effect of errors and attacks not only on the global properties, but also on the local properties of the network.
The paper is organized as follows. In Section 2 we define the variable efficiency and we illustrate how the small-world behavior can be expressed in terms of the local and the global efficiency of the network. In Section 3 we discuss the relevance and the properties of scale-free networks, and we illustrate the BA model and the KE model. In Section 4, the central part of the paper, we investigate the effects of errors and attacks both on the global and on the local properties of scale-free networks. We show that the efficiency is a better measure than the characteristic path length to describe the global properties of complex networks, especially when a large number of nodes is removed. The local properties of the scale-free networks are equally well described by the local efficiency or by the clustering coefficient. By considering both BA and KE scale-free networks, we show that scale-free networks are systems resistant to errors but vulnerable to attacks both at a global and at a local level. In Section 5 we draw the conclusions.
2 Small-World behavior and Efficiency of a Network
In their seminal paper Watts and Strogatz have shown that the connection topology of some real (biological, social and technological) networks is neither completely regular nor completely random [5]. Watts and Strogatz have named these networks, which are somehow in between regular and random networks, small-worlds, in analogy with the small-world phenomenon, empirically observed in social systems more than 30 years ago [6].
The mathematical characterization of the small-world behavior is based on the evaluation of two quantities: the characteristic path length L, measuring the typical separation between two generic nodes in the network, and the clustering coefficient C, measuring the average cliquishness of a node. Small-world networks are in fact highly clustered, like regular lattices, yet have small characteristic path lengths, like random graphs. Let us give some useful mathematical formalism. A generic unweighted (or relational) network [9] is represented by a graph G with N vertices (nodes) and K edges (arcs, links or connections). Such a graph is described by the so-called adjacency matrix {aij} (also called connection matrix). This is an N × N symmetric matrix, whose entry aij is 1 if there is an edge joining vertex i to vertex j, and 0 otherwise. An important quantity of graph G, which will be used in the rest of this paper, is the degree of a generic vertex i, i.e. the number ki of edges incident with vertex i, the number of neighbours of i. We have $K = \sum_i k_i / 2$ because each link is counted twice, and the average value of ki is $\langle k \rangle = 2K/N$. To define L we first need to construct the shortest path length dij between two vertices (known in social network studies as the number of degrees of separation [6]), measured as the minimum number of edges traversed to get from a vertex i to another vertex j. By definition dij ≥ 1, with dij = 1 if there exists a direct edge between i and j. The characteristic path length L of graph G is defined as the average of the shortest path lengths between two generic vertices:
$$L(G) = \frac{1}{N(N-1)} \sum_{i \neq j \in G} d_{ij} \qquad (1)$$
Of course this definition is valid only if G is totally connected, which means that there must exist at least one path connecting any pair of vertices with a finite number of steps. Otherwise, when from $i^*$ we cannot reach $j^*$, then $d_{i^* j^*} = +\infty$ and consequently L as given in eq. (1), being divergent, is an ill-defined quantity.
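As an illustration of eq. (1) and of its divergence on unconnected graphs, the following is a minimal Python sketch. The adjacency representation (a dictionary mapping each node to the set of its neighbours) and the helper names `make_graph`, `shortest_path_lengths` and `characteristic_path_length` are assumptions introduced here, not part of the original formalism.

```python
from collections import deque
from math import inf


def make_graph(n, edges):
    """Build an undirected, unweighted graph as {node: set of neighbours}."""
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def shortest_path_lengths(adj, source):
    """Breadth-first search: number of edges from `source` to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def characteristic_path_length(adj):
    """Eq. (1): average of d_ij over ordered pairs i != j.

    Returns inf as soon as one pair is not connected, since d_ij = +inf.
    """
    n = len(adj)
    total = 0
    for i in adj:
        dist = shortest_path_lengths(adj, i)
        if len(dist) < n:  # at least one node cannot be reached from i
            return inf
        total += sum(dist.values())
    return total / (n * (n - 1))


# A path 0-1-2 is connected; dropping the edge 1-2 isolates node 2.
connected = make_graph(3, [(0, 1), (1, 2)])
unconnected = make_graph(3, [(0, 1)])
print(characteristic_path_length(connected))
print(characteristic_path_length(unconnected))
```

For the three-node path the sum of the distances over ordered pairs is 8, so L = 8/6 = 4/3, while the unconnected variant yields an infinite L.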
When studying how the properties of a network are affected by the removal of nodes, one often ends up with non-connected networks. In such cases the alternative formalism in terms of efficiency proposed here is much more powerful, as will be clarified in the following. The second measure, the clustering coefficient C, is a local quantity of G defined as follows. For any node i we consider Gi, the subgraph of neighbors of i. That is, once i is eliminated, we study how the nodes previously connected to i remain connected to each other. If the node i has ki neighbors, then Gi has ki nodes and at most ki(ki − 1)/2 edges. Ci is the fraction of these edges that actually exist, and C is the average value of Ci over the whole network:
$$C(G) = \frac{1}{N} \sum_{i \in G} C_i, \qquad C_i = \frac{\#\ \text{of edges in}\ G_i}{k_i (k_i - 1)/2} \qquad (2)$$
To illustrate the onset of the small-world, Watts and Strogatz have proposed a one-parameter model (the WS model) to construct a class of unweighted graphs which interpolates between a regular lattice and a random graph. The edges of a regular lattice are rewired with a probability p. As the rewiring probability p increases, the network becomes increasingly disordered and for p = 1 a random graph is obtained. Although in the two limiting cases large C is associated with large L (p = 0) and, vice versa, small C with small L (p = 1), there is an intermediate regime where the network is a small-world: highly clustered like a regular lattice and with small characteristic path lengths like a random graph. In fact only a few rewired edges (0 < p ≪ 1) are sufficient to produce a rapid drop in L, while C is not affected and remains equal to the value for the regular lattice [5]. By means of this mathematical formalism based on the evaluation of L and C, Watts and Strogatz have found three examples of small-world behavior in real networks: 1) the collaboration graph of actors in feature films from Ref. [19], as an example of a social system; 2) the neural network of a nematode, the C.
elegans [20], as an example of a biological network; 3) finally an example of a technological network, the electric power grid of the western United States.
An alternative definition of the small-world behavior has been proposed more recently by two of us in ref. [7,9] and is based on the definition of the efficiency of a network. Instead of L and C, the network is characterized in terms of how efficiently it propagates information on a global and on a local scale, respectively. To define the efficiency of G let us suppose that every node sends information along the network, through its edges. We assume that the efficiency εij in the communication between node i and j is inversely proportional to the shortest distance: εij = 1/dij ∀i, j. With this definition, when there is no path in the graph between i and j, dij = +∞ and consistently εij = 0. The global efficiency of the graph G can be defined as:
$$E_{glob}(G) = \frac{\sum_{i \neq j \in G} \epsilon_{ij}}{N(N-1)} = \frac{1}{N(N-1)} \sum_{i \neq j \in G} \frac{1}{d_{ij}} \qquad (3)$$
and the local efficiency, in analogy with C, can be defined as the average efficiency of local subgraphs:
$$E_{loc}(G) = \frac{1}{N} \sum_{i \in G} E(G_i), \qquad E(G_i) = \frac{1}{k_i (k_i - 1)} \sum_{l \neq m \in G_i} \frac{1}{d'_{lm}} \qquad (4)$$
where Gi, as previously defined, is the subgraph of the neighbours of i, which is made of ki nodes and at most ki(ki − 1)/2 edges. It is important to notice that the quantities {d′lm} are the shortest distances between nodes l and m calculated on the graph Gi. The two definitions we have given have the important property that both the global and local efficiency are already normalized, that is: 0 ≤ Eglob(G) ≤ 1 and 0 ≤ Eloc(G) ≤ 1 [21]. The maximum values of the efficiency, Eglob(G) = 1 and Eloc(G) = 1, are obtained in the ideal case of a completely connected graph, i.e. in the case in which the graph G has all the N(N − 1)/2 possible edges and dij = 1 ∀i, j. In the efficiency-based formalism a small-world results as a system with high Eglob (corresponding to low L) and high Eloc (corresponding to high clustering C), i.e.
a network extremely efficient in exchanging information both on a global and on a local scale. Moreover, the description of a network in terms of its efficiency extends the small-world analysis also to unconnected networks and, more importantly, with only a few modifications, to weighted networks. A weighted network is a case in which there is a weight associated with each of the edges. Such a network needs two matrices to be described: the usual adjacency matrix {aij}, telling about the existence or non-existence of a link (and whose entry aij, as for the unweighted case, is 1 when there is an edge joining i to j, and 0 otherwise), and a second matrix, the matrix of the weights associated with each link. All the details of the applications of the efficiency-based formalism to study real weighted networks, e.g. the Boston subway transportation system, can be found in [7–9]. In this paper we focus instead on the simpler case of unweighted networks: we are in fact interested in the use of the efficiency formalism to describe in quantitative terms the global and the local properties of scale-free networks, and to study how these properties are affected by the random removal of nodes or by attacks.
Fig. 1. The connectivity properties of two graphs G1 and G2, both with N = 5 nodes, are compared. Differently from the efficiency Eglob, the characteristic path length L is not a representative measure when the graph is unconnected. At the local level, C is a good approximation of Eloc.
A simple example will be very useful to illustrate the comparison between Eglob, Eloc and L, C, and to explain why the efficiency in many cases works better than L and C, even for unweighted networks. In particular the differences between the description in terms of Eglob and the description in terms of L are evident when the network is unconnected. Fig. 1 is an example of the problems associated with the calculation of L when the graph is unconnected.
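The definitions in eqs. (2)–(4) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the graph representation (a dictionary of neighbour sets) and all function names are assumptions. Since the drawing of Fig. 1 is not reproduced in this text, the two edge lists below are also assumptions: they are five-node graphs chosen to be consistent with the values quoted in the text for G1 and G2.

```python
from collections import deque


def bfs_distances(adj, source):
    """Number of edges from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def global_efficiency(adj):
    """Eq. (3): average of 1/d_ij over ordered pairs; unreachable pairs add 0."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for i in adj:
        for j, d in bfs_distances(adj, i).items():
            if j != i:
                total += 1.0 / d
    return total / (n * (n - 1))


def neighbour_subgraph(adj, i):
    """G_i: the subgraph induced by the neighbours of i (i itself excluded)."""
    nbrs = adj[i]
    return {l: adj[l] & nbrs for l in nbrs}


def local_efficiency(adj):
    """Eq. (4): average over i of the efficiency of G_i."""
    return sum(global_efficiency(neighbour_subgraph(adj, i)) for i in adj) / len(adj)


def clustering(adj):
    """Eq. (2): average fraction of existing edges among the neighbours of i."""
    total = 0.0
    for i in adj:
        k = len(adj[i])
        if k < 2:
            continue  # C_i is taken as 0 for nodes with fewer than 2 neighbours
        sub = neighbour_subgraph(adj, i)
        edges = sum(len(s) for s in sub.values()) / 2
        total += edges / (k * (k - 1) / 2)
    return total / len(adj)


def make_graph(n, edges):
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


# Assumed stand-ins for the graphs of Fig. 1 (the figure is not available here).
g1 = make_graph(5, [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4)])
g2 = make_graph(5, [(0, 1), (1, 2), (0, 2)])  # a triangle plus two isolated nodes
for g in (g1, g2):
    print(global_efficiency(g), local_efficiency(g), clustering(g))
```

With these assumed graphs the sketch gives Eglob = 17/20, Eloc = 9/10 and C = 4/5 for the first graph and Eglob = 3/10, Eloc = C = 3/5 for the second, up to floating-point rounding, matching the values quoted for Fig. 1.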
We consider two graphs G1 and G2, both having the same number of nodes N = 5, but a different number of edges. By using definition (1) we obtain L1 = 13/10 for the graph on the left hand side and L2 = ∞ for the graph on the right hand side. An alternative possibility to avoid the divergence of L2 is to limit the use of definition (1) only to a part of the graph, the main connected component [22] of G2, which is made of 3 nodes. In this way we get L2 = 1, and the final information we extract from the analysis of the characteristic path length is that graph G2 has better structural properties than graph G1, since L2 < L1. This is of course wrong because G1 is certainly much better connected than G2, and the misleading information comes from the fact that in the second graph we had to remove two nodes from the analysis. By studying instead the efficiency of the two graphs we are allowed to take into account also the nodes not connected to the main connected component: we get (Eglob)1 = 17/20 and (Eglob)2 = 3/10, in perfect agreement with the fact that G1 has a much better connectivity (17/20 of the efficiency of the completely connected graph) than G2. On the other hand, an evaluation of the local clustering of the two graphs gives C1 = 4/5, C2 = 3/5, and an evaluation of the local efficiency gives (Eloc)1 = 9/10, (Eloc)2 = 3/5. This indicates that the first graph also has better local properties than the second one. Moreover, the variable C is a good approximation of the local efficiency Eloc (this is in general true when the subgraphs Gi of a generic node i are small graphs [9]).
3 Scale-Free Networks
An important piece of information to characterize a graph G, as previously mentioned, is the degree of a generic vertex i, i.e. the number ki of edges incident with vertex i, the number of neighbours of i.
Barabasi and collaborators focussed their attention on P(k), the degree distribution of a network, and showed that many large real networks, such as the World Wide Web, the Internet, metabolic and protein networks, are scale-free, that is, their degree distribution follows a power law for large k [11–16]. Also some social systems of interest for the spreading of sexually transmitted diseases [23,24], and the connectivity network of atomic clusters' systems [25], display a similar behavior. Neither random graphs [17], nor small-world networks constructed according to the WS model, have a power-law degree distribution P(k) like the one observed in large real networks. In fact for a random graph P(k) is described by a Poisson distribution $P(k) = \frac{\langle k \rangle^{k}}{k!} e^{-\langle k \rangle}$, a curve peaked at $k = \langle k \rangle$ and exponentially decaying for large k, in contrast to the power-law decay of scale-free graphs. This is the reason why random graphs are sometimes referred to in the literature as exponential graphs [14]. Also in the case of the WS small-world model P(k) is strongly peaked around the average value of k (since it is very close to the P(k) of regular graphs). Furthermore, even for those real networks for which P(k) is not clearly a power law for all values of k, and has for instance an exponential cut-off for very large k, the degree distribution significantly deviates from the Poisson distribution expected for random graphs [26]. At this point two natural questions come to mind: 1) What is the mechanism responsible for the emergence of a scale-free structure in such a huge number of real networks? 2) What are the main properties of a scale-free topology, and why is it privileged with respect to the other topologies? An answer to the first question and a concrete algorithm to construct a scale-free network have been proposed by Barabasi and collaborators.
In Refs. [12,13] the authors argue that the scale-free nature of real networks is rooted in two generic mechanisms occurring in many real networks. First of all, most real-world networks describe open systems which grow by the continuous addition of new nodes: as an example the WWW grows exponentially in time by the addition of new web pages, and the research literature constantly grows by the publication of new papers. Moreover, most real networks exhibit preferential attachment, that is, the likelihood of connecting to a node depends on the node's degree. A webpage will most likely include hyperlinks to popular documents which already have a high degree, because such highly connected documents are easier to find. A new manuscript will most likely cite a well-known one, further increasing its high number of citations. Growth and preferential attachment are the two sufficient ingredients to produce a scale-free network. The Barabasi-Albert (BA) model proposed in [12,13] is a simple way to generate a network with a power-law degree distribution $P(k) \sim k^{-\gamma}$ with γ = 3. On the contrary, neither of the two ingredients is present in the small-world model discussed in Section 2, which assumes instead a fixed number N of vertices and a probability that two nodes are connected (or their connection is rewired) independent of the nodes' degree. Concerning the second question, the authors of ref. [14] have studied the response of scale-free networks to errors and to attacks. By error and attack they indicate, respectively, the removal of randomly chosen nodes, and the removal of the most connected nodes. In particular they study the change of the characteristic path length L when a small fraction of the nodes is eliminated: in fact the removal of a node in general increases the distance between the remaining nodes, because it can eliminate paths contributing to the connectivity of the system.
Differently from random networks, scale-free networks display a high degree of error tolerance, i.e. the ability of their nodes to communicate is unaffected by the failure of some randomly chosen nodes. On the contrary, these networks are extremely vulnerable to attacks, i.e. to the removal of a few nodes that play a vital role in maintaining the network's connectivity. In practice the presence of the scale-free topology in so many real cases [11,15,16,23] can be attributed to the need to construct systems with a high degree of tolerance against errors. However, the error tolerance comes at a high price, in that scale-free networks are extremely vulnerable to attacks. The response of scale-free networks to the removal of nodes is also one of the main points of our paper. In fact, in Section 4 we will extend the analysis of ref. [14], which was based only on the quantity L, to both the global and the local properties of the network. In order to characterize the local properties of a graph we will use both C and Eloc. For the global properties we will see that Eglob is better than L, especially when a large number of nodes is removed. The BA scale-free model reproduces the power-law connectivity distribution, but not the small-world effect. In fact it produces networks with a small average distance between two generic nodes, like a small-world network, but lacks high clustering, which is typical of a small-world network. On the contrary, most large real networks with a power-law connectivity distribution also show a high clustering coefficient. As an example, the values of C obtained from the two databases studied, of the Internet and of the World Wide Web, are orders of magnitude larger than the clustering coefficients of the corresponding random graphs [3].
In order to overcome this problem Klemm and Eguíluz [18] have recently proposed an alternative model, the KE model, which produces networks with scale-free degree distributions, small average distances and strong clustering. With a minimal amount of changes to the BA model, the KE model reproduces, at the same time, the two distinct features of real networks: power-law degree distribution and small-world effect. We do not go into the details of the KE model now. Since the subject of this paper is the study of the properties of scale-free networks, in the next section we will discuss how to construct scale-free networks with the BA model, and scale-free networks with high clustering by means of the KE model.
4 Efficiency in Scale-Free Networks
We are finally ready to study how the efficiency of a network with scale-free topology is affected by the removal of some of its nodes. We will make use of the measures defined in eqs. (3) and (4), and compare the results with the ones obtained in terms of L and C. The first step is the construction of a scale-free network: for this purpose we consider both the BA model and the KE model.
4.1 Barabasi-Albert (BA) scale-free networks
First we construct the scale-free network following the Barabasi-Albert (BA) model [12,13]. As previously mentioned, the two ingredients of the BA model are growth and preferential attachment. In fact the algorithm [12] is based on the iteration of the following two steps:
(1) Addition of nodes: Starting with a small number (m0) of nodes, at every timestep a new node with m (≤ m0) edges is added. The edges link the new node to m different nodes already present in the system.
(2) Preferential attachment of new edges: When choosing the nodes to which the new node connects, the probability Π that the new node will be connected to node i is assumed to depend on the degree ki of node i, according to:
$$\Pi(k_i) = \frac{k_i}{\sum_j k_j} \qquad (5)$$
After t timesteps the algorithm produces a network with N = t + m0 nodes and mt edges. The analytical solution of the BA model in the mean field approximation predicts a degree distribution $P(k) = \frac{2 m^2 t}{m_0 + t} k^{-3}$. This function asymptotically converges for t → ∞ to a time-independent degree distribution $P(k) \sim 2 m^2 k^{-\gamma}$, i.e. to a power law with an exponent γ = 3. It is interesting to notice that γ depends neither on m nor on the size N = m0 + t of the network. The mean field predictions are confirmed by other analytical approaches (master equation [27] and rate equation [28]) and by numerical simulations. Both ingredients, growth and preferential attachment, are necessary in the BA model for the emergence of the power-law scaling. Barabasi et al. have in fact checked that a model with growth but no preferential attachment gives for t → ∞ an exponential degree distribution. On the other hand, a model with preferential attachment but no growth predicts that the degree distribution becomes a Gaussian around its mean value.
Fig. 2. Degree distribution P(k) as a function of k for the BA scale-free model (indicated as SFBA, full circles) and for the KE scale-free model (indicated as SFKE, open squares). Two system sizes are considered: N = 5000, K = 10000 in (a), and N = 15000, K = 75000 in (b). For N = 5000 the results reported are obtained as averages over 10 different realizations, while in the case N = 15000 only one realization is considered. The dashed line is $P(k) \sim k^{-\gamma}$ with γ = 3.
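The two steps of the BA algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the simulations: the function name `barabasi_albert` is hypothetical, the seed has m0 = m nodes, and, because the seed nodes start with degree zero (so that eq. (5) is undefined at the first timestep), the first new node is assumed to link to all m seed nodes.

```python
import random


def barabasi_albert(n_nodes, m, seed=None):
    """Grow a BA network with n_nodes nodes, adding m edges per new node.

    Returns an adjacency dictionary {node: set of neighbours}.
    Assumption: m0 = m seed nodes, and the first new node links to all of them.
    """
    if m < 1 or n_nodes <= m:
        raise ValueError("need m >= 1 and n_nodes > m")
    rng = random.Random(seed)
    adj = {i: set() for i in range(n_nodes)}
    # `stubs` lists every edge endpoint, so node i appears k_i times in it and
    # a uniform draw from `stubs` picks node i with probability k_i / sum_j k_j,
    # which is eq. (5).
    stubs = []
    targets = list(range(m))
    for new in range(m, n_nodes):
        # Step (1): add the new node with m edges to m different old nodes.
        for old in targets:
            adj[new].add(old)
            adj[old].add(new)
            stubs.extend((new, old))
        # Step (2): choose the m distinct targets of the next node preferentially.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(stubs))
        targets = sorted(chosen)
    return adj


# Example with the smaller system size used in the text: N = 5000 and m = 2.
network = barabasi_albert(5000, 2, seed=0)
n_edges = sum(len(nbrs) for nbrs in network.values()) // 2
print(len(network), n_edges, max(len(nbrs) for nbrs in network.values()))
```

With m0 = m the network has mt = m(N − m) edges after t = N − m timesteps, which is close to, but not exactly, the K = 10000 quoted for N = 5000. Keeping a list of edge endpoints is a common way to sample in proportion to degree without recomputing the normalisation in eq. (5) at every step.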
The BA model can be considered as a particular case of a model proposed by Simon [29] in 1955 to describe the scaling behaviour observed in distributions of word frequencies in texts, and in population figures of cities [30]. Simon's original model has recently been reformulated for network growth in ref. [31]. In Fig. 2 we report the degree distribution of a scale-free network obtained from the BA model (reported as full circles and indicated as SFBA). We have constructed two networks, the first with N = 5000, K = 10000, and the second with N = 15000, K = 75000. In the first case the results reported are obtained as averages over 10 different realizations, while in the case N = 15000 only one realization is sufficient to obtain good statistics.
4.2 Klemm-Eguíluz (KE) scale-free networks
In this section we introduce a different class of scale-free networks with high clustering coefficient. We follow the method developed by Klemm and Eguíluz (KE) in Ref. [18]. In the KE model, each node of the network is assigned a binary state variable and can be either in an active state or in a non-active state. Taking a completely connected network of m active nodes as initial condition, the time-discrete dynamics of the KE model is based on the iteration of the following three steps:
(1) Addition of nodes: A new node with m edges is added to the network.
(2) Preferential attachment: For each of the m edges of the new node it is decided with a probability µ whether the link connects to one of the active nodes or if it connects to a non-active node. In the latter case the random node is chosen according to the same rule as in the BA model, the linear preferential attachment of eq. (5), i.e. the probability that node i obtains a link is proportional to the node's degree: $\Pi(k_i) = k_i / \sum_j k_j$. The limit case µ = 1 of the KE model is the BA model.
The limit case µ = 0 is a model with high clustering but large path length: in fact, as a function of the system size, C quickly converges to a constant value, whereas L increases linearly [32].
(3) Activation and deactivation of nodes: One of the m active nodes is deactivated: the probability that node i is chosen for deactivation is $\Pi_i^{deact} = k_i^{-1} / \sum_l k_l^{-1}$. The new node is set in the active state.
The KE model generates scale-free networks with degree distribution $P(k) = 2 m^2 k^{-3}$ (for k ≥ m) and average connectivity $\langle k \rangle = 2K/N = 2m$ [32]. Furthermore, by varying µ in the interval [0, 1] the model makes it possible to study the cross-over between a case with high L and C (the model with µ = 0 has been studied previously in ref. [32]), and a case with small L and C (µ = 1 corresponds exactly to the BA model). Klemm and Eguíluz have shown in Figure 1 of Ref. [18] that a few "long-range" connections are sufficient to have a small-world transition: in fact, as soon as µ is different from zero, the average shortest path length L drops rapidly, approaching the minimum value of the BA model, while the clustering coefficient C remains practically constant. Thus the KE model with µ ≠ 0 and µ ≪ 1 reproduces the three generic properties of real-world networks: power law degree distribution, small L and high C. In our simulations in the rest of this paper, we have used the KE model with µ = 0.1. In Fig. 2 we report the degree distribution obtained for two different networks, N = 5000, K = 10000 and N = 15000, K = 75000. Numerical simulations are shown both for the BA model (full circles) and the KE model (open squares). A good power-law behavior is obtained with an exponent γ = 3 as expected. The results for N = 5000 are obtained as averages over 10 different realizations, while in the case N = 15000 only one realization is sufficient to obtain good statistics.
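The three steps of the KE dynamics, as described in this section, can be sketched in Python as follows. This is a rough illustration rather than a reference implementation of Ref. [18]: the function name `klemm_eguiluz` is hypothetical, each of the m links of the new node is assumed to start from a different active node and to be redirected with probability µ to a non-active node, a redirected link is never allowed to duplicate an existing target, and a link stays on its active node when no non-active node is available yet.

```python
import random


def klemm_eguiluz(n_nodes, m, mu, seed=None):
    """Grow a KE-style network; returns {node: set of neighbours}."""
    if m < 1 or n_nodes <= m or not 0.0 <= mu <= 1.0:
        raise ValueError("need m >= 1, n_nodes > m and 0 <= mu <= 1")
    rng = random.Random(seed)
    adj = {i: set() for i in range(n_nodes)}
    # Initial condition: a completely connected network of m active nodes.
    for i in range(m):
        for j in range(i + 1, m):
            adj[i].add(j)
            adj[j].add(i)
    active = list(range(m))
    inactive = []
    for new in range(m, n_nodes):
        # Steps (1) and (2): one link per active node; with probability mu the
        # link goes instead to a non-active node drawn with probability
        # proportional to its degree (linear preferential attachment, eq. (5)).
        targets = set()
        for a in active:
            candidates = [v for v in inactive if v not in targets]
            if candidates and rng.random() < mu:
                weights = [len(adj[v]) for v in candidates]
                targets.add(rng.choices(candidates, weights=weights)[0])
            else:
                targets.add(a)
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
        # Step (3): deactivate one active node with probability proportional
        # to 1/k_i, then set the new node in the active state.
        inv_degree = [1.0 / len(adj[a]) for a in active]
        out = rng.choices(range(m), weights=inv_degree)[0]
        inactive.append(active[out])
        active[out] = new
    return adj


# Example with the value mu = 0.1 used in the text, on a small system.
network = klemm_eguiluz(1000, 2, 0.1, seed=0)
print(len(network), sum(len(nbrs) for nbrs in network.values()) // 2)
```

Every new node contributes exactly m edges, so the sketch produces m(m − 1)/2 + m(N − m) edges in total. Building the candidate list costs a time proportional to the number of non-active nodes for each link, which is acceptable for an illustration but slow for very large N.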
Fig. 3. Resistance to failures and attacks: analysis of the global characteristics. BA scale-free graphs (SFBA) are compared with random graphs (EXP). In both cases we start with a graph of N = 5000 nodes and K = 10000 edges, and we remove a fraction p of the nodes with two different prescriptions: failure and attack (see text). The characteristic path length L, in panel (a), and the global efficiency Eglob, in panel (b), are plotted as a function of p. The results reported here and in all the following figures are averages over 10 different realizations.

4.3 Error and attack tolerance of BA scale-free networks

We are finally ready to address the problem of how the global and the local properties of a scale-free network are affected by the removal of some of the nodes. We consider first the class of scale-free networks generated by means of the BA model of Section 4.1. The malfunctioning of a node in general makes the communication between the remaining nodes less efficient, because it can eliminate some of the edges and consequently some of the paths that contribute to the interconnectedness of the system. This affects not only the global, but also the local properties of the graph (though the latter have never been addressed in the literature before). As a starting point in our numerical experiments we consider a BA scale-free network with N = 5000 nodes and K = 10000 edges, corresponding to $\langle k \rangle = 4$. The error and attack tolerance of this network is compared to that of a random graph with the same number of nodes and edges. As previously mentioned, the P(k) of a random graph is a Poisson distribution, a curve which for large k decays exponentially and not as a power law. For this reason the random graph is indicated in the figure captions as exponential graph (EXP). In removing the nodes, we use two different strategies.
We can simulate an error in the system as the failure of a node chosen at random among all the possible nodes. Alternatively, we can simulate an attack on the system by sorting the nodes in order of importance, according to their degree k_i, and then removing them one by one starting from the node with the highest degree. An agent who is well informed about the whole structure of the network and wants to deliberately damage it will not target the nodes randomly, but will preferentially attack the most connected nodes. Both for failures and for attacks a fraction p of the N nodes is removed, and the properties of the networks are studied by computing the two quantities L and C, or the two quantities Eglob and Eloc, as a function of p (see Section 4.4).

Fig. 4. Resistance to failures and attacks: analysis of the global characteristics. BA scale-free graphs are compared with random graphs. Same as in the previous figure, but now the whole range of p is considered.

Global properties. In Fig. 3 and in Fig. 4 we report L and Eglob as a function of the fraction p of nodes removed. We first perform the same analysis as in Ref. [14] by studying the changes in the characteristic path length L. The scale-free graph considered initially has L ∼ 4.6 (on average, two generic nodes can be connected in less than 5 steps), a value lower than that of the random graph (L ∼ 6.7). In the upper part of Fig. 3, we observe for the exponential network a slow monotonic increase of L with p (for p ≪ 1), both for failures and for attacks. In practice it makes no substantial difference whether the nodes are selected randomly or in decreasing order of connectivity.
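The two removal prescriptions can be written down compactly. The following is a minimal sketch, assuming a graph stored as a dictionary of neighbour sets; the function names and the ten-node star used as an example are hypothetical, and the attack ranks the nodes once by their initial degree, as the description above suggests, without recomputing degrees after each removal.

```python
import random


def removal_order(adj, strategy, seed=None):
    """Return the nodes in the order in which they would be removed."""
    nodes = list(adj)
    if strategy == "failure":
        # Error: nodes fail in a uniformly random order.
        random.Random(seed).shuffle(nodes)
    elif strategy == "attack":
        # Attack: most connected nodes first (ties keep insertion order).
        nodes.sort(key=lambda v: len(adj[v]), reverse=True)
    else:
        raise ValueError("strategy must be 'failure' or 'attack'")
    return nodes


def remove_fraction(adj, p, strategy, seed=None):
    """Return a copy of the graph with a fraction p of the nodes removed."""
    n_remove = round(p * len(adj))
    removed = set(removal_order(adj, strategy, seed)[:n_remove])
    return {u: nbrs - removed for u, nbrs in adj.items() if u not in removed}


# Toy example: a star with hub 0 and nine leaves.
star = {0: set(range(1, 10)), **{i: {0} for i in range(1, 10)}}
attacked = remove_fraction(star, 0.1, "attack")         # removes the hub
failed = remove_fraction(star, 0.1, "failure", seed=0)  # removes one random node
```

On the star, an attack with p = 0.1 removes the hub and leaves nine isolated nodes, whereas a failure most likely hits a leaf and leaves the rest of the graph connected. This is the inhomogeneity argument of the text in its most extreme form.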
The similar response of the exponential network to failures and attacks is rooted in the homogeneity of the network: since all nodes have approximately the same number of links, they all contribute equally to the characteristic path length, and the removal of a generic node or of the best connected one causes about the same amount of damage. On the other hand we observe a drastically different behaviour for scale-free networks (the same observed in [14]): L remains almost unchanged under an increasing level of errors, while it increases rapidly when the most connected nodes are eliminated. For example, when 2% of the nodes fail (p = 0.02), the communication between the remaining nodes in the network is unaffected, whereas when the 2% most connected nodes are removed, L almost doubles with respect to its original value. This robustness to failures and simultaneous vulnerability to attacks is rooted in the inhomogeneity of the connectivity distribution P(k): the connectivity is maintained by a few highly connected nodes, whose removal drastically alters the network's topology and decreases the ability of the remaining nodes to communicate with each other.

In the following we show that this behavior can be better quantified by using Eglob, since the variable in formula (3) is normalized to the ideal case, obtained when all the N(N − 1)/2 links are present in the graph. In the lower part of Fig. 3 we observe that initially the scale-free graph has Eglob = 0.24 and the random graph has Eglob = 0.15, i.e. respectively 24% and 15% of the efficiency of a completely connected graph. When p = 0.02 and the nodes are removed under attack (i.e. according to their degree), the efficiency of the scale-free graph has decreased to Eglob = 0.12: by attacking only 2% of the nodes, the scale-free network has already lost 50% of its efficiency. Conversely, the global efficiency of the scale-free graph does not vary much in the case of failures. The same happens for the exponential graph, where the communication between the remaining nodes of the network is affected neither by failures nor by attacks.

Fig. 5. Resistance to failures and attacks: analysis of the local characteristics. BA scale-free graphs are compared with random graphs. In both cases we consider a graph with the same initial number of nodes N = 5000 and edges K = 10000. The clustering coefficient C, in panel (a), and the local efficiency Eloc, in panel (b), are plotted as a function of p, the fraction of nodes removed.

Fig. 6. Resistance to failures and attacks: analysis of the local characteristics. BA scale-free graphs are compared with random graphs. Same as in the previous figure, but now the whole range of p is considered.

So far we have only considered the removal of a small percentage of nodes. What happens if we extend the analysis to larger values of p, even to values of the order of 1? In this case it becomes evident that the efficiency is a better quantity to study. For large values of p we have to deal with the problem of the graph becoming unconnected. In the upper part of Fig. 4, we observe that L reaches very high values when more and more nodes are removed. In practice, as explained in Section 2, a straightforward application of the definition in formula (1) would give L = ∞ for p larger than a certain value p∗ at which the graph becomes unconnected. To avoid this divergence we have to limit the use of definition (1) to a part of the graph, the main connected component (as also done in [14]).
In this way, for different values of p we compare graphs with different numbers of nodes, and this can give unrealistic results (see Fig. 1), such as the maxima of L observed in Fig. 4(a). See for example the BA scale-free network (SFBA) under attack: we have L = 30 for p = 0.1 and then a rapid drop to L = 4 for p = 0.2. This effect indicates that at p = 0.1 the network starts to fragment into many unconnected small parts (each with more or less the same size), as evidenced by the cluster size distribution studied in Ref. [14], but at the same time it makes the comparison of the connectivity properties of graphs with different p unfeasible. The misleading information we get from L is that, by increasing p, i.e. by removing more nodes, we can get a network with better connectivity (shorter L). In reality, when we want to compare graphs with p varying over a wide range of values, it is better to use the efficiency. In the lower part of Fig. 4 one can clearly see that, evaluating Eglob as a function of p, we get four monotonically decreasing curves, and we avoid the problem of the unphysical change of slope of L. Again we notice the rapid drop in the global efficiency of a scale-free network under attack: the removal of 10% of the nodes completely destroys the global efficiency, which drops to values Eglob ∼ 0. The removal of nodes by failure produces instead a slower decrease of Eglob with p. When we compare these two curves with the two analogous curves obtained for an exponential graph, we observe that in the case of a random graph the difference between failure and attack is less pronounced than in the case of the SFBA network (though it is clearly visible on such a scale of p, while it was not visible on the short range of p used in Fig. 3(b)).
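The artefact discussed above, an apparent improvement of L when a graph breaks apart, can be reproduced on a toy graph. The sketch below assumes the standard definition of the global efficiency, Eglob = (1/(N(N − 1))) Σ_{i≠j} 1/d_ij with 1/d_ij = 0 for unconnected pairs, normalized over the nodes that remain after the removal; the helper names and the six-node chain are illustrative and not taken from the paper.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def global_efficiency(adj):
    """Average of 1/d_ij over all ordered pairs; unreachable pairs add 0."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = sum(1.0 / d
                for s in adj
                for t, d in bfs_distances(adj, s).items() if t != s)
    return total / (n * (n - 1))


def path_length_main_component(adj):
    """Characteristic path length L restricted to the largest component."""
    seen, best = set(), {}
    for s in adj:
        if s not in seen:
            comp = bfs_distances(adj, s)
            seen.update(comp)
            if len(comp) > len(best):
                best = comp
    k = len(best)
    if k < 2:
        return 0.0
    total = sum(d for s in best for d in bfs_distances(adj, s).values())
    return total / (k * (k - 1))


def remove_node(adj, node):
    """Copy of the graph without the given node and its edges."""
    return {u: nbrs - {node} for u, nbrs in adj.items() if u != node}


# Chain 0-1-2-3-4-5; removing node 2 splits it into {0,1} and {3,4,5}.
chain = {i: {j for j in (i - 1, i + 1) if 0 <= j < 6} for i in range(6)}
cut = remove_node(chain, 2)
L_before, L_after = path_length_main_component(chain), path_length_main_component(cut)
E_before, E_after = global_efficiency(chain), global_efficiency(cut)
```

For the intact chain L = 35/15 ≈ 2.33 and Eglob = 0.58. After the removal, the largest component has L = 4/3 ≈ 1.33, an apparent improvement, while Eglob drops to 0.35 and correctly signals the loss of connectivity.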
On a large scale of p, therefore, the sudden drop of Eglob observed for SFBA under attack is the only qualitative difference between scale-free and random graphs. The results we have reported in Fig. 3 and Fig. 4 are averages over 10 different realizations. The averaging makes no important difference in the case of the global properties, although it can be very important for the local quantities, which are in general affected by larger fluctuations.

Local properties. In Fig. 5 and in Fig. 6 we report C and Eloc as a function of the fraction of nodes removed. We start, as before, with two networks, a BA scale-free graph and a random graph, with N = 5000 nodes and K = 10000 links. Both networks have, by construction, a very small local clustering, as indicated by the small values of C (0.007 for the BA scale-free network and less than 0.001 for the random graph) and by the small values of Eloc (again 0.007 for the BA scale-free network and less than 0.001 for the random graph). The first thing to notice is that, in agreement with what was said in Section 2, the values of C and Eloc are very similar. In fact we expect C to be a reasonable approximation of Eloc when the subgraphs G_i of the neighbours of a generic node i are composed of very small graphs [7,9]. This is the case both for the random graph and for the scale-free network of the BA model (things will be different for KE scale-free networks). Since the local clustering is very small, we have large fluctuations among different realizations, and we must average over different realizations to obtain stable results. The curves reported in Fig. 5 and in Fig. 6 are averages over 10 different realizations. Though the local clustering of the two networks is very small, we observe a rapid drop in the local efficiency of the scale-free network under attack, similar to that observed for the global efficiency.
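The statement that C approximates Eloc when the neighbour subgraphs G_i are made of very small pieces can be checked on two four-node examples. This is a sketch under assumed definitions: C_i is the fraction of existing links among the k_i neighbours of i, Eloc is the average over i of the efficiency of G_i computed inside G_i, and nodes with fewer than two neighbours contribute zero to both. The function names and the two toy graphs are hypothetical.

```python
from collections import deque


def subgraph_efficiency(adj, nodes):
    """Efficiency of the subgraph induced by `nodes` (paths stay inside it)."""
    nodes = set(nodes)
    k = len(nodes)
    if k < 2:
        return 0.0
    total = 0.0
    for s in nodes:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in nodes and v not in dist:
                    dist[v] = dist[u] + 1
                    total += 1.0 / dist[v]
                    queue.append(v)
    return total / (k * (k - 1))


def local_efficiency(adj):
    """E_loc: mean efficiency of the neighbour subgraphs G_i."""
    return sum(subgraph_efficiency(adj, adj[i]) for i in adj) / len(adj)


def clustering(adj):
    """C: mean fraction of existing links among the neighbours of each node."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for u in nbrs for v in adj[u] if v in nbrs) / 2
            total += links / (k * (k - 1) / 2)
    return total / len(adj)


# Triangle 0-1-2 with a pendant node 3: every G_i has at most one edge.
small = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
# Two triangles sharing the edge 0-2: G_0 and G_2 are three-node paths.
larger = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
```

In the first graph all finite distances inside each G_i equal 1, so C = Eloc = 7/12. In the second graph two neighbours can be at distance 2 inside G_i, which the efficiency counts with weight 1/2 and the clustering coefficient ignores, giving C = 5/6 but Eloc = 11/12.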
Fig. 7. Resistance to failures and attacks: analysis of the global characteristics. KE scale-free graphs are compared with random graphs. In both cases we have a graph with the same initial number of nodes N = 5000 and edges K = 10000. The characteristic path length L, in panel (a), and the global efficiency Eglob, in panel (b), are plotted as a function of p, the fraction of nodes removed.

4.4 Error and attack tolerance of KE scale-free networks

We now repeat the same analysis for the class of scale-free networks generated by the KE model, i.e. for networks with power-law degree distribution and at the same time strong clustering. We can consider such networks as small worlds with power-law degree distribution. We start by considering a KE scale-free (SFKE) network with N = 5000 nodes and K = 10000 edges, generated by the prescription of the KE model of Section 4.2 with µ = 0.1 (such a scale-free network also has small-world properties: it has Eglob = 0.12 and Eloc = 0.54). As in the previous section we remove the nodes by using the two different strategies simulating failures or attacks, and we investigate how the properties of the network change by reporting as a function of p the two quantities L and C, or the two quantities Eglob and Eloc.

Global properties. In Fig. 7 and in Fig. 8 we report L and Eglob as a function of the fraction p of nodes removed. The KE scale-free graph considered initially has L ∼ 9.5 (two generic nodes can be connected in about 10 steps on average). This value is higher than the value obtained for SFBA networks (L ∼ 4.6), and also higher than that of random graphs (L ∼ 6.7). This is the price to pay for a strong local clustering: the increase in local connectivity is obtained at the expense of the global connectivity.
In any case, the results are similar to those obtained for the BA scale-free networks, though the difference between scale-free and exponential networks is less marked. In the upper part of Fig. 7 we observe on the one hand that the exponential network has a slow monotonic increase of L with p (for p ≪ 1), both for failures and for attacks, and on the other hand that for scale-free networks L remains almost unchanged under an increasing level of errors, while it increases rapidly when the most connected nodes are eliminated. In the lower part of Fig. 7 we see that the same behavior is confirmed when the global connectivity of the graph is expressed in terms of the efficiency Eglob: the initial efficiency of the scale-free graph, Eglob = 0.12 (12% of the efficiency of the completely connected graph), decreases to Eglob = 0.08 when 2% of the nodes are attacked (though this result is not as drastic as in the case of BA networks, compare with Fig. 3). The global efficiency of the scale-free graph does not vary much in the case of failures.

Fig. 8. Resistance to failures and attacks: analysis of the global characteristics. KE scale-free graphs are compared with random graphs. Same as in the previous figure, but now the whole range of p is considered.

In Fig. 8 we consider a larger range of values of p. From panel (a) we see again that the correct variable to evaluate is Eglob and not L: L would give unphysical results, such as the presence of a spurious maximum when the network becomes unconnected. From the plot of Eglob versus p in Fig. 8(b) we observe that the KE scale-free graph and the exponential graph behave similarly as a function of p when compared on the whole scale of p, apart from a different normalization factor, i.e. a different value at p = 0.
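As a quick check of the comparison between the two models, the relative loss of global efficiency under attack at p = 0.02 follows directly from the values quoted in the text (0.24 → 0.12 for the BA network, 0.12 → 0.08 for the KE network); the notation ΔEglob/Eglob(0) is introduced here only for this comparison:

```latex
\left.\frac{\Delta E_{glob}}{E_{glob}(0)}\right|_{BA}
  = \frac{0.24 - 0.12}{0.24} = 0.50 ,
\qquad
\left.\frac{\Delta E_{glob}}{E_{glob}(0)}\right|_{KE}
  = \frac{0.12 - 0.08}{0.12} \approx 0.33 .
```

The KE network thus loses about one third of its initial efficiency where the BA network loses one half, which is the sense in which the result is less drastic.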
A qualitatively different behavior in the global properties of KE scale-free and exponential graphs is observed only for p < 0.02 (compare Fig. 7 to Fig. 8), i.e. only when a very small fraction of nodes is removed.

Fig. 9. Resistance to failures and attacks: analysis of the local characteristics. KE scale-free graphs are compared with random graphs. In both cases we have a graph with the same initial number of nodes N = 5000 and edges K = 10000. The clustering coefficient C, in panel (a), and the local efficiency Eloc, in panel (b), are plotted as a function of p, the fraction of nodes removed.

Local properties. In Fig. 9 and in Fig. 10 we report C and Eloc as a function of the fraction of nodes removed. We observe that the KE scale-free network has a good local connectivity, expressed by a clustering coefficient C = 0.43 and a local efficiency Eloc = 0.54 (meaning that the graph has 54% of the local efficiency of the completely connected graph). Notice that for KE scale-free networks the numerical values of Eloc and C are not similar to each other, as they were in BA scale-free networks. For SFKE networks the subgraph G_i of the neighbours of a generic node i is not always a very small graph, and therefore C is no longer a good approximation of Eloc [7,9]. Though the numerical value of C is different from that of Eloc, the information we get from the behavior of these two quantities as a function of p is similar. We observe, both in Fig. 9 and in Fig. 10, a rapid decrease in the local efficiency (and in the clustering coefficient C) of SFKE networks under attack, while the local efficiency (and C) decreases much more slowly under failures. Eloc(p) and C for random graphs (the same curves were plotted in Fig. 5 and Fig. 6 on an enlarged scale) are here orders of magnitude smaller than the values of the local efficiency of SFKE networks, and are practically indistinguishable from zero on the scale adopted.

Fig. 10. Resistance to failures and attacks: analysis of the local characteristics. KE scale-free graphs are compared with random graphs. Same as in the previous figure, but now the whole range of p is considered.

5 Conclusions

In this paper we have studied the effects of errors and attacks on the efficiency of scale-free networks. Two different kinds of scale-free networks have been considered and compared to random graphs: scale-free networks with no local clustering, produced by the Barabási-Albert (BA) model, and scale-free networks with high clustering properties, as in the model by Klemm and Eguíluz (KE). By using the global and the local efficiency as mathematical measures, we have investigated the effects of errors and attacks both on the global and on the local properties of the network. We have found that both the global and the local efficiency of scale-free networks are unaffected by the failure of some of the nodes, i.e. when some (up to 2%) of the nodes are chosen at random and removed. On the other hand, at variance with random graphs, in scale-free networks the global and the local efficiency rapidly decrease when the nodes removed are those with the highest connectivity k_i, i.e. scale-free networks are extremely sensitive to attacks. These properties hold both for BA networks and for KE networks, though KE networks have higher local efficiency but lower global efficiency than BA networks. We have also studied the effects of errors and attacks when a large number of nodes (even up to 80% of the nodes of the network) are removed.
On such a larger scale of p the difference between scale-free networks and random graphs is less pronounced than on the smaller scale p < 0.02. When a large number of nodes are removed, especially when the network becomes unconnected, the efficiency is definitely a better quantity than the characteristic path length L to measure the response of the networks to external factors.

References

[1] Y. Bar-Yam, Dynamics of Complex Systems (Addison-Wesley, Reading, Mass., 1997).
[2] S.H. Strogatz, Nature 410, 268 (2001).
[3] R. Albert and A.-L. Barabási, Reviews of Modern Physics 74, 47 (2002).
[4] M.E.J. Newman, J. Stat. Phys. 101, 819 (2000).
[5] D.J. Watts and S.H. Strogatz, Nature 393, 440 (1998).
[6] S. Milgram, Psychol. Today 2, 60 (1967).
[7] V. Latora and M. Marchiori, Phys. Rev. Lett. 87, 198701 (2001).
[8] V. Latora and M. Marchiori, cond-mat/0202299, Proceedings of the International Conference "Horizons in Complex Systems", Messina, December 2001, to appear in Physica A.
[9] V. Latora and M. Marchiori, cond-mat/0204089, submitted to Phys. Rev. E.
[10] M. Marchiori and V. Latora, Physica A 285, 539 (2000).
[11] R. Albert, H. Jeong, and A.-L. Barabási, Nature 401, 130 (1999).
[12] A.-L. Barabási and R. Albert, Science 286, 509 (1999).
[13] A.-L. Barabási, R. Albert, and H. Jeong, Physica A 272, 173 (1999).
[14] R. Albert, H. Jeong, and A.-L. Barabási, Nature 406, 378 (2000); Correction, Nature 409, 542 (2001).
[15] H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, and A.-L. Barabási, Nature 407, 651 (2000).
[16] H. Jeong, S.P. Mason, A.-L. Barabási, and Z.N. Oltvai, Nature 411, 41 (2001).
[17] B. Bollobás, Random Graphs (Academic, London, 1985).
[18] K. Klemm and V.M. Eguíluz, Phys. Rev. E 65, 057102 (2002).
[19] The Internet Movie Database, http://www.imdb.com
[20] T.B. Achacoso and W.S. Yamamoto, AY's Neuroanatomy of C. elegans for Computation (CRC Press, Boca Raton, FL, 1992).
[21] The formalism can be easily extended to the case of weighted networks [7,9].
Since in this paper we are interested in the study of unweighted networks, we have directly presented the definition of the efficiency in the particular and simpler case of unweighted networks. In the general definition, valid for weighted and unweighted networks, a normalization factor has to be introduced to have 0 ≤ Eglob(G) ≤ 1 and 0 ≤ Eloc(G) ≤ 1 (see Refs. [7,9]).
[22] As done for example in Ref. [5] when the collaboration network of movie actors is studied, or in all the examples of [14].
[23] F.L. Liljeros, C.R. Edling, N. Amaral, H.E. Stanley, and Y. Aberg, Nature 411, 907 (2001).
[24] R. Pastor-Satorras and A. Vespignani, Phys. Rev. Lett. 86, 3200 (2001).
[25] J.P.K. Doye, cond-mat/0201430.
[26] L.A.N. Amaral, A. Scala, M. Barthélémy, and H.E. Stanley, Proc. Natl. Acad. Sci. 97, 11149 (2000).
[27] S. Dorogovtsev, J. Mendes, and A.N. Samukhin, Phys. Rev. Lett. 85, 4633 (2000).
[28] P.L. Krapivsky, S. Redner, and F. Leyvraz, Phys. Rev. Lett. 85, 4629 (2000).
[29] H.A. Simon, Biometrika 42, 425 (1955).
[30] G.K. Zipf, Human Behaviour and the Principle of Least Effort (Addison-Wesley, Cambridge, Massachusetts, 1949).
[31] S. Bornholdt and H. Ebel, Phys. Rev. E 64, 035104(R) (2001).
[32] K. Klemm and V.M. Eguíluz, Phys. Rev. E 65, 036123 (2002).