Academia.eduAcademia.edu

Finding k-highest betweenness centrality vertices in graphs

2014, Proceedings of the 23rd International Conference on World Wide Web

The betweenness centrality is a measure for the relative participation of the vertex in the shortest paths in the graph. In many cases, we are interested in the k-highest betweenness centrality vertices only rather than all the vertices in a graph. In this paper, we study an efficient algorithm for finding the exact k-highest betweenness centrality vertices.

Finding k-Highest Betweenness Centrality Vertices in Graphs Min-Joong Lee Chin-Wan Chung Department of Computer Science, KAIST 291 Daehak-ro, Yuseong-gu Daejeon, Korea Division of Web Science and Technology & Department of Computer Science, KAIST 291 Daehak-ro, Yuseong-gu Daejeon, Korea [email protected] ABSTRACT The betweenness centrality is a measure for the relative participation of the vertex in the shortest paths in the graph. In many cases, we are interested in the k-highest betweenness centrality vertices only rather than all the vertices in a graph. In this paper, we study an efficient algorithm for finding the exact k-highest betweenness centrality vertices. Categories and Subject Descriptors G.2.2 [Discrete Mathematics]: Graph Theory—Graph algorithms, Path and circuit problems; E.1 [Data]: Data structures—Graphs and networks Keywords Graph; Betweenness centrality; Top-k 1. INTRODUCTION The centrality is one of the essential concepts for the analysis of networks, and the betweenness centrality is one of the most prominent measures among several centrality measures. The betweenness centrality problem has been extensively studied in the literature since the idea of the betweenness centrality is defined by Anthonisse et al. [1]. The fastest known algorithm to compute exact betweenness centralities for all the vertices is proposed by Brandes et al. [2]. Recently, starting from the work by Lee et al. [3], a few works address an updating problem in dynamic graphs. However, in many cases, we are only interested in the khighest betweenness centrality vertices rather than the betweenness centralities of all the vertices in a graph. This is reasonable since a vertex with a higher centrality is viewed as a more important vertex than a vertex with a lower centrality, and we are only interested in the important vertices in many applications such as finding influencers in a social network, and locating bottlenecked junctions/routers in a transportation network/the Internet. Moreover, a community detection application, one of the most prominent application of the betweenness centrality, utilizes the highest Copyright is held by the International World Wide Web Conference Committee (IW3C2). IW3C2 reserves the right to provide a hyperlink to the author’s site if the Material is used in electronic media. WWW’14 Companion, April 7–11, 2014, Seoul, Korea. ACM 978-1-4503-2745-9/14/04. http://dx.doi.org/10.1145/2567948.2577358. [email protected] centrality vertex only. Despite the above evident applications, the k-highest betweenness centrality problem is hardly discussed in the literature. In this paper, we propose an efficient algorithm for finding the exact k-highest betweenness centrality vertices in a graph. We utilize the novel properties of biconnected components to compute the betweenness centralities of a part of vertices in a graph, and employ an idea of the upper-bounding to compute the relative partial order of the vertices with respect to their betweenness centralities. 2. PRELIMINARY Definition 1 (Betweenness Centrality). A graph is represented by G = (V, E). The betweennessPcentrality of a vertex vj ∈ V is defined as follow. c(v) = i,j∈V (σi,j (v)/σi,j ) where v, i, j ∈ V , v 6= i, i 6= j, j 6= v, σi,j (v) is the number of shortest paths between i and j that include v, and σi,j is the number of shortest paths between i and j. Definition 2 (Biconnected Component and Articulation Vertex). A biconnected graph is a inseparable graph by removing an any vertex in a graph, and the biconnected component is a maximal biconnected subgraph of the graph. Any connected graph can be decomposed into biconnected components that are connected to each other through a vertex called articulation vertex. An example is shown in Figure 1. a b c e d g i h j f Biconnected components k l B1 = {a, b, c}, B2 = {c, e}, B3 = {d, e, f }, B4 = {e, g, h, i, j}, B5 = {j, k, l} Articulation vertices = c, e, j Figure 1: Graph Example. 3. SOLUTION We can efficiently find the k-highest betweenness centrality vertices by solving the following two sub-problems. 1. Compute the betweenness centrality of a vertex faster than the betweenness centralities of all vertices. 2. Compute the partial order of vertices with respect to their betweenness centralities. Sub-proplem 1. To compute the betweenness centrality of a single vertex v, σi,j (v)/σi,j for all vertices pairs i, j in a graph should be computed by Definition 1. Therefore, solving this sub-problem seems impossible. However, we propose a novel idea to compute σi,j (v)/σi,j without performing allpairs shortest paths computation for all vertices pairs i, j in the graph. Let V(Bv ) = {V i , ..., V k } be a set of vertex sets. V i ∈ V(Bv ) is a set of vertices in the disconnected component which has a vertex i among the disconnected components induced by removals of all vertices in Bv . Then, the betweenness centrality of vertex v in Bv can be efficiently computed as follow. c(v)=cBv (v)+ X |V j |·σi,j (v) + σi,j i i∈Bv ,V j ∈V(Bv ) X |V i |·|V j |·σi,j (v) σi,j V ,V j ∈V(Bv ) where i 6= j. The first term cBv (v) represents an amount of the betweenness centrality of v with respect to the shortest paths between vertices in Bv . Since the shortest paths between vertices in a biconnected component only include vertices in the component, we can compute this term by computing the betweenness centrality of v on Bv using any existing betweenness centrality computation algorithm. The second term represents an amount of the betweenness centrality with respect to the shortest paths between a vertex in Bv and a vertex in not Bv , and the third term represents an amount of the betweenness centrality with respect to the shortest paths between vertices not in Bv . Note that the above equation requires the shortest paths between pairs of vertices i, j in Bv only, and the betweenness centralities of all vertices in Bv can be computed also using the shortest paths between i, j pairs in Bv . The number of vertices in a biconnected component is fairly smaller than that in the entire graph. In Figure 1, c(h) = 21 +( 5·1 + 2·1 )+( 5·2·1 ). 2 2 2 Sub-proplem 2. The betweenness centrality of any vertex in Bv can not exceeds C(|Bv |, 2), the number of ways of choosing two vertices from Bv . It is used to exclude biconnected components with a small vertices. The number of the shortest paths between a vertex in Bv and a vertex not in Bv is (|V |−|Bv |)·(|Bv | − 1). paths P Among the shortest i j between vertices not in Bv , V i ,V j ∈V(Bv ) |V | · |V | shortest paths include at least one vertex in Bv . It is used to exclude outlier biconnected components. Using the above numbers, the upper-bound of the betweenness centralities of vertices in Bv , denoted as b c(Bv ),Pcan be computed as c(Bv ) = C(|Bv |, 2)+(|V |−|Bv |)·(|Bv |−1)+ V i ,V j ∈V(Bv ) |V i |·|V j | . b The majority of cases, this upper-bound is an overestimate compared to the actual centralities. Despite the overestimate, it is enough to give a concrete theoretical reason for pruning small and outlier components. Overall Process. The overall process of our proposed algorithm is as follows. 1. Decompose G into biconnected components. Let B be a set of the decomposed biconnected components. 2. Calculate b c(Bv ), the upper-bound of the betweenness centralities of vertices in Bv , for each Bv ∈ B. 3. Calculate and update the betweenness centralities of vertices in Bh . Bh is Bv with the highest b c(Bv ) among Bv ∈ B. 4. Update B to B \ {Bh }. 5. Repeat 3-4 until the known k-th highest betweenness centrality is higher than the new b c(Bh ). Optimization. We can further optimize the above algorithm by computing and utilizing the lower-bound of the betweenness centrality of an articulation vertex after Step 2. Let V(a) = {V1 , ..., Vm } be a set of vertex sets. Vi ∈ V(a) is consist of vertices in each disconnected component induced by the removal of an articulation vertex a. The lower-bound of the betweenness centrality of P an articulation vertex a is denoted and computed as č(a) = Vi ,Vj ∈V(a) |Vi |·|Vj | where Vi 6= Vj . Only the amount of the betweenness centrality, with respects to the shortest paths between vertices in each biconnected component which has a, is not considered in the lower-bound of the centrality of a. Thus, it is a fairly tight bound, and articulation vertices generally have higher betweenness centralities than non-articulation vertices. Therefore, we can prune vertices in many biconnected components which have smaller upper-bounds than the lower-bound of an articulation vertex. 4. EXPERIMENTAL RESULTS Since an algorithm for finding the exact k-highest betweenness centrality does not exist, we compare our algorithm to the Brandes algorithm which is the fastest known algorithm for the exact betweenness centrality computation. Note that all existing algorithms require all-pairs shortest paths even for computing the centrality of one vertex in a graph. We show the experimental results with the summary of graphs in Table 1. The speed-up shows how much improvement is achieved by our algorithm compared to the Brandes algorithm. For example, our algorithm is 3312 times faster in ’eva’ dataset. Moreover, our algorithm is scalable with respect to k since it compute the centralities of vertices in a biconnected component simultaneously. Table 1: Experiments on Real Graphs Graph disease eva erdos972 geon CAGrQc power pgp CAhep contact Type Ownership Ownership Collab. Collab. Social Web link Web link Collab. Social |V | |E| 516 8343 5822 9072 5242 4941 4680 9877 11604 2376 6726 14750 13567 14496 13188 24340 25998 88806 Speed-Up k=1 k=5 k=10 k=50 k=100 18.22 18.09 17.38 17.12 3312.37 3312.13 37.43 6.33 6.31 3.321 3.313 4.021 4.020 21.61 3.02 4.24 5. CONCLUSION AND FUTURE WORKS In this paper, we propose an efficient algorithm for finding the exact k-highest betweenness centrality vertices. To the best of our knowledge, this is the first work which addresses the exact k-highest betweenness centrality problem. The proposed algorithm utilizes the novel properties of biconnected components, and outperforms the existing algorithm for finding the k-highest betweenness centrality vertices. As future works, we plan to improve the upper-bound computation. Currently, an upper-bound is computed by only using the number of vertices in a biconnected component. However, we expect a proper utilization of other graph measures, such as the number of edges and the diameter, will tighten the upper-bound. Also, we plan to employ our idea for computing the exact betweenness centralities for all vertices. This is also promising since our idea provides a key for the divide and conquer computation. Acknowledgments. This work was supported by the National Research Foundation of Korea grant funded by the Korean government (MSIP) (No. NRF-2009-0081365). 6. REFERENCES [1] J. M. Anthonisse. The rush in a directed graph. Stichting Mathematisch Centrum. Mathematische Besliskunde, (BN 9/71):1–10, 1971. [2] U. Brandes. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology, 25(1994):163–177, 2001. [3] M.-J. Lee, J. Lee, J. Y. Park, R. H. Choi, and C.-W. Chung. Qube: a quick algorithm for updating betweenness centrality. In Proceedings of the 21st international conference on World Wide Web, pages 351–360. ACM, 2012.