A Benchmark For Betweenness Centrality Approximation Algorithms On Large Graphs
Table 1: Summary of the presented approximation methods (methods presented in bold are evaluated in Section 5)
on the input graph. Particularly, common imbalances in a graph's structure may result in poor pivot choices [7]. There are even cases where the introduction of additional pivots decreases the quality of the betweenness centrality approximation.

The experimental analysis of [7] showed that uniform random sampling forms the best selection strategy. The random selection of the pivots allows the strategy to handle structural imbalances of the input graph and avoids the drawbacks of the other strategies. Overall, uniform random sampling is a simple method that is both effective and consistent across all tested graphs. For these reasons, it is used in several studies as a comparison method [13, 19, 20, 31]. Thus, we have also decided to include uniform random source sampling in our evaluation framework (Section 4). This method is referred to as RAND1 (see Table 1).

Figure 3: Example of overestimating node 1 using RAND1

Different contributions sampling. Initial sampling methods tend to overestimate the betweenness centrality values of nodes that are near the sampled pivots [13, 19]. For example, consider Figure 3 and assume that we sample only node 1 (which has degree one) as a pivot. To compute the approximation of the betweenness centrality value of node 2 (which is close to node 1 and has degree two), initial sampling methods compute δ1•(2) = 5 and scale it up using |V|/|P| = 7/1 = 7, where V and P are the sets of nodes and pivots, respectively. Overall, the betweenness centrality value of node 2 is approximated by 5 · 7 = 35. Compared with the exact betweenness centrality value of node 2, which is 5, the approximation overestimates by a factor of |V|/|P| = 7/1 = 7. Although this example shows an extreme overestimation case, Geisberger et al. [13] showed that such overestimation is rather common, particularly for nodes lying close to pivots.

To address this overestimation problem, Geisberger et al. [13] and Khopkar et al. [19] introduced ideas that reduce the contribution of pivots to the betweenness centrality values of nodes in their proximity. To this end, Geisberger et al. [13] determine the contribution of a node in computing the betweenness centrality based on its distance from the sampled pivots; nodes closer to sampled pivots have lower contribution. Geisberger et al. [13] introduced and evaluated several scaling functions that consider the distance to the sampled pivots and showed that better results were achieved using the linear scaling function f(x) = x, where x is the distance from a node to a pivot.

In the same spirit, Khopkar et al. [19] introduced a non-uniform random sampling method that discourages nodes near pivots from being sampled. To this end, they incorporate a penalty box that contains nodes that cannot be sampled. Nodes near the sampled pivots are put in the penalty box. If a node stays in the penalty box for more than k subsequent sampling iterations (where k is a parameter), it is released and can be used as a pivot in a following sampling iteration.
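To make the basic sampling-and-scaling scheme concrete, the following is a minimal C++ sketch of RAND1-style uniform pivot sampling. The code and all identifiers (Graph, accumulate_from_source, rand1_estimate) are our own illustration, not the implementation evaluated in Section 4: it accumulates Brandes-style source dependencies [5] from each sampled pivot on an unweighted, undirected graph and scales the sums by |V|/|P|.

```cpp
// Minimal sketch of RAND1-style pivot sampling (our code, not the paper's):
// Brandes-style dependency accumulation from sampled pivots, scaled by |V|/|P|.
#include <algorithm>
#include <queue>
#include <random>
#include <stack>
#include <vector>

using Graph = std::vector<std::vector<int>>;  // unweighted, undirected adjacency list

// Add the source dependencies delta_{s.}(v) of pivot s to dep[] (Brandes [5]).
void accumulate_from_source(const Graph& g, int s, std::vector<double>& dep) {
    const int n = g.size();
    std::vector<int> dist(n, -1);
    std::vector<double> sigma(n, 0.0), delta(n, 0.0);
    std::vector<std::vector<int>> pred(n);
    std::stack<int> order;                 // nodes in order of non-decreasing distance
    std::queue<int> q;
    dist[s] = 0; sigma[s] = 1.0; q.push(s);
    while (!q.empty()) {                   // BFS builds the shortest-path DAG
        int v = q.front(); q.pop();
        order.push(v);
        for (int w : g[v]) {
            if (dist[w] < 0) { dist[w] = dist[v] + 1; q.push(w); }
            if (dist[w] == dist[v] + 1) { sigma[w] += sigma[v]; pred[w].push_back(v); }
        }
    }
    while (!order.empty()) {               // backward accumulation of dependencies
        int w = order.top(); order.pop();
        for (int v : pred[w])
            delta[v] += (sigma[v] / sigma[w]) * (1.0 + delta[w]);
        if (w != s) dep[w] += delta[w];
    }
}

// Sample p pivots uniformly at random, accumulate, then scale up by |V|/|P|.
std::vector<double> rand1_estimate(const Graph& g, int p, unsigned seed = 42) {
    const int n = g.size();
    std::vector<int> pivots(n);
    for (int i = 0; i < n; ++i) pivots[i] = i;
    std::shuffle(pivots.begin(), pivots.end(), std::mt19937(seed));
    std::vector<double> dep(n, 0.0);
    for (int i = 0; i < p && i < n; ++i) accumulate_from_source(g, pivots[i], dep);
    for (double& d : dep) d *= double(n) / p;   // the scale-up step from the text
    return dep;
}
```

On the seven-node graph of Figure 3, with node 1 as the only pivot, this sketch would return 5 · 7/1 = 35 for node 2, reproducing exactly the overestimation discussed above.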
δ and outputs the approximate betweenness centrality values of all nodes of the graph G. The algorithm starts by computing the initial sample size S (Line 1). After sampling the required number of node-pairs and paths, it computes Equation 8 (Line 8) for all nodes that lie on the sampled paths (Line 7). These are used to compute the stopping condition (Line 10). If the stopping condition is satisfied, the algorithm computes and returns the betweenness centrality values (Line 13). Otherwise, it increases the sample size and samples new node-pairs and paths (i.e., repeats the while loop of Lines 2–11).

Note that there are two critical points that may affect performance dramatically: the initial sample size (Line 1) and the incremental factor of the sample size at each iteration (Line 11). These two factors affect performance because expensive computations are performed whenever the stopping condition is checked. If we increase the sample size dramatically, we might end up with unnecessary samples. On the other hand, if we increase the sample size only slightly, we will have too many iterations, each with expensive operations to check the stopping condition. Riondato and Upfal [32] proposed formulas to compute the initial size and the incremental factor for each iteration.

In our evaluation framework (Section 4), we have included the method of Riondato and Upfal [32] with an allowed error ϵ = 0.01 and probability δ = 0.1. This method is referred to as ABRA (see Table 1).

Other approaches. All previously mentioned node-pair sampling methods are meant to approximate the betweenness centrality values of all nodes. Ji and Yan [16] introduced an approximation algorithm that combines the techniques of Bader et al. [1] and Riondato and Upfal [32] into a progressive sampling approach that approximates the betweenness centrality of a specific node.

3.3 Bounded traversal

The general idea behind bounded traversal methods is to sample pairs of nodes with a specific shortest distance, compute their pair dependencies and use these values to approximate the betweenness centrality values. The sum of all dependencies on a node v forms the approximate betweenness centrality value for that node.

Pfeffer and Carley [30] proposed a bounded traversal method that works similarly to Brandes' algorithm, except that all traversals are bounded by K steps, i.e., the height of the DAGs considered is K. Given a node v, the intuition behind this approach is that nodes that are distant from v are unlikely to affect the betweenness centrality value of v. Ostrowski [29] proposed a methodology that follows the MapReduce paradigm to enable a parallel implementation of bounded-distance approximation of betweenness centrality.

Everett and Borgatti [11] devised a method utilizing the notion of an ego network. The ego network of a node v is the subgraph that consists of node v and all its direct neighbors [11]. Their experiments showed that there is a correlation between the betweenness centrality value of a node in its ego network and its betweenness centrality value in the whole graph. However, considering only direct neighbors does not give a good indicator of the betweenness centrality of all nodes in an ego network. Therefore, Everett and Borgatti [11] suggested using all neighbors within distance one or two hops.

In addition to the previously mentioned methods, Kourtellis et al. introduced a new metric called k-path centrality that is based on random walks with at most k steps [20]. They reported that nodes with high k-path centrality values also have high betweenness centrality values. Lee et al. [23] showed that the exact or the approximate betweenness centrality values of a modified graph can be found by recalculating the betweenness centrality values on the portion of the graph where the update occurred.

In our evaluation framework (Section 4), we have included the methods of Pfeffer and Carley [30] and Everett and Borgatti [11], referred to as KPATH and EGO, respectively (see Table 1).
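To make the ego-network notion concrete, here is a small C++ sketch (our own helper names, not from the original implementations) that extracts the induced subgraph on all nodes within a given number of hops of a node. EGO corresponds to hops = 2; exact betweenness computed inside this small subgraph then serves as the local proxy for the node's global score. KPATH instead keeps the whole graph but cuts every Brandes traversal off at depth K.

```cpp
// Sketch (hypothetical helper, not the paper's code): induced subgraph on all
// nodes within `hops` of `ego`, i.e., the ego network of Section 3.3.
#include <queue>
#include <unordered_map>
#include <vector>

using Graph = std::vector<std::vector<int>>;

Graph ego_network(const Graph& g, int ego, int hops = 2) {
    std::unordered_map<int, int> id;        // old node id -> compact subgraph id
    std::vector<int> dist(g.size(), -1), members;
    std::queue<int> q;
    dist[ego] = 0; q.push(ego);
    while (!q.empty()) {                    // BFS truncated after `hops` levels
        int v = q.front(); q.pop();
        id.emplace(v, members.size());
        members.push_back(v);
        if (dist[v] == hops) continue;      // do not expand beyond the hop bound
        for (int w : g[v])
            if (dist[w] < 0) { dist[w] = dist[v] + 1; q.push(w); }
    }
    Graph sub(members.size());              // keep only edges between members
    for (int v : members)
        for (int w : g[v])
            if (id.count(w)) sub[id[v]].push_back(id[w]);
    return sub;
}
```

Since the ego network is typically tiny compared with the full graph, running an exact algorithm such as Brandes' on it is cheap, which is the source of EGO's speed on small inputs.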
4 OUR BENCHMARK: BeBeCA

In this section we present BeBeCA, our benchmark for the evaluation of approximate betweenness centrality algorithms. BeBeCA consists of:
• 20 real-world graphs with different sizes, average degrees and diameters; their characteristics cover a wide range of real applications.
• The exact betweenness centrality values for all the above graphs. These values form the golden standard that is needed to assess the accuracy of the tested methods.
• A set of metrics that allow us to assess various aspects of approximation efficiency and effectiveness: execution time, absolute accuracy (i.e., maximum and average error), whole-graph ranking and top-1% ranking accuracy, plus accuracy for specific nodes.
• Implementations of the most representative state-of-the-art approximation methods, and the scripts that calculate the above-mentioned evaluation metrics.

BeBeCA can be used by researchers to evaluate any betweenness centrality approximation method in a systematic way, or by practitioners to select the most appropriate method for the characteristics of their applications and datasets. BeBeCA is available online³.

Computing infrastructure: We used two machines:
Workstation: We have used a server with twin AMD Opteron 6172 CPUs (24 cores in total) running at 2.1GHz and 192GB of RAM. We used the workstation for all approximate algorithms and for computing the exact values for small graphs.
Supercomputer: We have used a Cray XC40 supercomputer with 6,174 nodes. Each node contains four 8-core Intel Haswell CPUs running at 2.3GHz and 128GB of RAM. We used roughly half of the machine, that is, 96,000 CPU cores and almost 400TB of RAM. The supercomputer was used for computing the exact betweenness centrality values of all large graphs.

³ http://bit.ly/BeBeCa
Table 2: Summary of datasets. The bolded datasets are used for evaluation in Section 5.
4.1 Datasets

BeBeCA contains 20 real-world datasets (see Table 2) downloaded from the SNAP [24] and Konect [22] repositories. They cover a variety of domains and properties. We extract the largest connected component of each graph and perform our analysis on it. The ground truth for all 20 datasets is included in our benchmark. However, for clarity, in the experimental section of this paper we consider only 5 representative datasets (highlighted in bold in Table 2). The following are brief descriptions of the chosen datasets:

ca-GrQc is a collaboration network between authors in arXiv, published under the category "General Relativity and Quantum Cosmology". Nodes are authors, whereas edges represent co-authorship. The data covers all papers between 1993 and 2003.

email-Enron captures email communications. Nodes are email addresses, whereas edges represent email exchange between two addresses.

com-Amazon is a customer-product network created by crawling the Amazon online store. Nodes are customers and edges denote that two customers bought the same product.

com-LiveJournal is a social network where nodes are users and edges are friendships.

dbpedia-link is a hyperlink network created from Wikipedia; pages are nodes and hyperlinks are edges.

4.2 Golden standard: Exact values

In order to evaluate the various accuracy metrics for the approximation algorithms, it is necessary to compare against the ground truth. However, this is a difficult issue because of the excessive cost of computing the ground truth for large realistic graphs. To solve this problem, we used 96,000 CPU cores with 400TB of RAM on a supercomputer to compute the exact betweenness centrality values of the above-mentioned 20 datasets. The largest one contains 126M edges; that is three orders of magnitude larger than any graph discussed in the literature so far. We publish the calculated exact values as a golden standard for the evaluation of existing and future approximation algorithms. It is worth noting that we spent 2M core hours on the calculations; for comparison, this corresponds to roughly 10 years of computation on a high-end workstation with 24 CPU cores and 256GB of RAM.

Our parallel Brandes algorithm implementation works as follows (a simplified sketch is given after the list):
(1) The sources of the input graph are divided uniformly among the available threads.
(2) Each thread solves the single-source shortest path problem (SSSP) from each of its assigned sources and computes the contribution of these sources to the betweenness centrality of all nodes in the graph.
(3) The results are accumulated and summed by the master thread to obtain the exact betweenness centrality values for all nodes.
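The following is a simplified shared-memory C++ sketch of this scheme; it is our illustration only, as the actual 96,000-core runs use a far more elaborate distributed implementation. Sources are split uniformly across threads, each thread accumulates dependencies into a private buffer, and the master sums the buffers.

```cpp
// Simplified multi-threaded Brandes sketch (illustration only, not the code
// used on the supercomputer). Steps (1)-(3) from the list are marked below.
#include <queue>
#include <stack>
#include <thread>
#include <vector>

using Graph = std::vector<std::vector<int>>;

// Dependencies of a single source s, added into bc[] (Brandes, unweighted).
static void add_source(const Graph& g, int s, std::vector<double>& bc) {
    const int n = g.size();
    std::vector<int> dist(n, -1);
    std::vector<double> sigma(n, 0.0), delta(n, 0.0);
    std::vector<std::vector<int>> pred(n);
    std::stack<int> order; std::queue<int> q;
    dist[s] = 0; sigma[s] = 1.0; q.push(s);
    while (!q.empty()) {
        int v = q.front(); q.pop(); order.push(v);
        for (int w : g[v]) {
            if (dist[w] < 0) { dist[w] = dist[v] + 1; q.push(w); }
            if (dist[w] == dist[v] + 1) { sigma[w] += sigma[v]; pred[w].push_back(v); }
        }
    }
    while (!order.empty()) {
        int w = order.top(); order.pop();
        for (int v : pred[w]) delta[v] += (sigma[v] / sigma[w]) * (1.0 + delta[w]);
        if (w != s) bc[w] += delta[w];
    }
}

std::vector<double> exact_bc_parallel(const Graph& g, int nthreads) {
    const int n = g.size();
    std::vector<std::vector<double>> part(nthreads, std::vector<double>(n, 0.0));
    std::vector<std::thread> pool;
    for (int t = 0; t < nthreads; ++t)            // step (1): split the sources
        pool.emplace_back([&, t] {
            for (int s = t; s < n; s += nthreads) // step (2): per-thread SSSPs
                add_source(g, s, part[t]);
        });
    for (auto& th : pool) th.join();
    std::vector<double> bc(n, 0.0);               // step (3): master sums buffers
    for (int t = 0; t < nthreads; ++t)
        for (int v = 0; v < n; ++v) bc[v] += part[t][v];
    // For undirected graphs, each pair is counted twice; halve bc[] if needed.
    return bc;
}
```

Because each thread writes only to its private buffer, no locking is needed during the traversals; the only synchronization point is the final reduction.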
4.3 Evaluation metrics

In this section, we detail the measures incorporated by our evaluation framework.

Execution time. To measure the efficiency of a tested algorithm, our framework uses wall-clock time.

Accuracy. To evaluate the accuracy of a tested approximation algorithm, we measure the maximum and the average error between the exact and the approximate betweenness centrality values. Since there are nodes with betweenness centrality equal to zero in all the chosen datasets, and no approximation algorithm overestimates nodes whose betweenness centrality is zero, we do not measure the minimum error. Thus, the maximum and the average error suffice to assess the deviation between the exact and the approximate betweenness centrality values and, therefore, the accuracy of the tested method.

Ranking. Many applications use the ranking of the nodes based on the betweenness centrality values rather than the actual betweenness centrality values. To measure the quality of the ranking produced by a tested method, we use the Kendall-tau distance [17, 18] to compare it with the ranking produced by the exact method. The Kendall-tau distance [17, 18] counts the number of disagreements between the rankings of the compared methods. Its value lies in the range [-1, 1], where 1 means that the two rankings are in total agreement, while -1 means that the two rankings are in complete disagreement. For instance, say we have a graph with n = 5 nodes A, B, C, D and E. Let their exact and approximate rankings be τ1(A, B, C, D, E) = (1, 2, 3, 4, 5) and τ2(A, B, C, D, E) = (1, 5, 2, 4, 3), respectively. Then the number of concordant pairs is α = 6 and the number of discordant pairs is β = 4, and the Kendall-tau distance for the two rankings is:

K(τ1, τ2) = (α − β) / (n(n − 1)/2) = (6 − 4) / 10 = 0.2    (9)
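As a sanity check, the following small C++ snippet (our own helper, not part of BeBeCA's scripts) brute-forces the concordant and discordant pair counts for the example above; it assumes tie-free rankings, as in the example.

```cpp
// Sketch: Kendall-tau on two tie-free rankings by brute-force pair counting.
#include <iostream>
#include <vector>

double kendall_tau(const std::vector<int>& r1, const std::vector<int>& r2) {
    int n = r1.size(), concordant = 0, discordant = 0;
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j) {
            // A pair agrees if both rankings order items i and j the same way.
            bool same = (r1[i] < r1[j]) == (r2[i] < r2[j]);
            same ? ++concordant : ++discordant;
        }
    return double(concordant - discordant) / (0.5 * n * (n - 1));
}

int main() {
    //               A  B  C  D  E
    std::vector<int> exact {1, 2, 3, 4, 5};
    std::vector<int> approx{1, 5, 2, 4, 3};
    std::cout << kendall_tau(exact, approx) << "\n";  // prints 0.2, as in Eq. (9)
}
```

The O(n²) pair enumeration is only for illustration; for rankings over millions of nodes, an O(n log n) merge-sort-based count is the practical choice.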
Identifying the top 1%. Some applications require only the nodes with the highest betweenness centrality values. Therefore, we measure how many nodes of the top-1% set (the nodes with the highest betweenness centrality values) the approximation methods manage to identify.

Betweenness centrality for specific nodes. Some approximation algorithms have been shown to estimate certain nodes' betweenness centrality values in very limited time. Therefore, we use these algorithms to estimate the betweenness centrality values of some nodes and compare these values with the respective values produced by the approximation algorithms that estimate all nodes' values. We measure the average error for approximating the chosen nodes' values.

4.4 Approximation algorithms

Our framework includes implementations of seven methods for the approximate computation of betweenness centrality values, namely RAND1, RAND2, GSIZE, DIAM, ABRA, KPATH and EGO. As discussed in Section 3, the chosen methods are representative of the related state-of-the-art literature. Let us now briefly discuss the implementation of these methods. Note that for all tested algorithms, we have used the original implementation whenever it was publicly available or kindly provided by the authors; otherwise, we have implemented them in C++.

RAND1. To implement RAND1, we first implemented Brandes' algorithm. Then, instead of using all nodes of the graph, pivots are sampled uniformly at random and Brandes' algorithm is executed from the selected pivots. The sums of dependencies are kept for all nodes in the input graph. Finally, for any node v, the approximate betweenness centrality value is obtained by scaling up the sum of dependencies by |V|/|P|, where |P| is the number of sampled pivots.

RAND2. For the implementation of RAND2, we used the available source code in the NetworKit library [35].

GSIZE. The implementation of GSIZE is also based on our implementation of Brandes' algorithm. However, instead of sampling all the pivots from the beginning, we use adaptive sampling, where we keep sampling until a certain stopping condition is met and then terminate. Therefore, we modified our Brandes implementation so that it takes as additional arguments a node v and a constant C, and keeps sampling pivots until the sum of source dependencies on v is larger than C|V|. Meanwhile, we compute and maintain the sum of dependencies for all nodes in the input graph. After that, for any node v, we scale up the sum of dependencies by |V|/|P|, where |P| is the number of sampled pivots, to find its approximate betweenness centrality value.

DIAM. For the implementation of DIAM, we used the parallel implementation made available by the authors in the NetworKit library [35].

ABRA. For the implementation of ABRA, we used the parallel implementation made available by the authors in the NetworKit library [35].

KPATH. According to Brandes [6], KPATH can be implemented by modifying Brandes' algorithm to backtrack whenever a node at distance K is reached. Therefore, we modified our implementation of Brandes' algorithm to implement the KPATH algorithm. We set the value of K to 20% of the diameter of the graph.

EGO. For the implementation of EGO, we followed the same approach as for KPATH, except that we traverse at most two hops from any node instead of K steps.

We need to emphasize that all of the above implementations of the approximation algorithms are serial, except for DIAM and ABRA, where the source code made available by the authors runs in parallel.
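To make GSIZE's adaptive stopping rule concrete, here is a condensed C++ sketch under our own naming (it is not the benchmark's actual code); the accumulate routine repeats the Brandes-style single-source accumulation from the earlier RAND1 sketch.

```cpp
// Sketch of a GSIZE-style adaptive sampler (our naming, not the paper's code):
// keep sampling pivots until the dependency sum on target v exceeds C*|V|.
#include <algorithm>
#include <queue>
#include <random>
#include <stack>
#include <vector>

using Graph = std::vector<std::vector<int>>;

static void accumulate(const Graph& g, int s, std::vector<double>& dep) {
    const int n = g.size();
    std::vector<int> dist(n, -1);
    std::vector<double> sigma(n, 0.0), delta(n, 0.0);
    std::vector<std::vector<int>> pred(n);
    std::stack<int> order; std::queue<int> q;
    dist[s] = 0; sigma[s] = 1.0; q.push(s);
    while (!q.empty()) {
        int u = q.front(); q.pop(); order.push(u);
        for (int w : g[u]) {
            if (dist[w] < 0) { dist[w] = dist[u] + 1; q.push(w); }
            if (dist[w] == dist[u] + 1) { sigma[w] += sigma[u]; pred[w].push_back(u); }
        }
    }
    while (!order.empty()) {
        int w = order.top(); order.pop();
        for (int u : pred[w]) delta[u] += (sigma[u] / sigma[w]) * (1.0 + delta[w]);
        if (w != s) dep[w] += delta[w];
    }
}

// Approximate bc(v): stop once dep[v] > C*|V|, then scale by |V|/|P|.
double gsize_estimate(const Graph& g, int v, double C, unsigned seed = 42) {
    const int n = g.size();
    int pivots = 0;
    std::vector<int> candidates(n);
    for (int i = 0; i < n; ++i) candidates[i] = i;
    std::shuffle(candidates.begin(), candidates.end(), std::mt19937(seed));
    std::vector<double> dep(n, 0.0);
    for (int s : candidates) {
        accumulate(g, s, dep);
        ++pivots;
        if (dep[v] > C * n) break;      // the stopping condition from the text
    }
    return dep[v] * double(n) / pivots; // the same scale-up as for RAND1
}
```

The constant C directly trades time for accuracy: a larger C forces more pivots before the stopping condition fires, which matches the c = 5 versus c = 10 behavior reported in Section 5.5.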
5 EXPERIMENTS

In this section, we compare the selected approximation methods (Section 3) using our evaluation methodology (Section 4). We show the results for 5 real-world datasets (Section 4.1). Note that we set a limit of 7 days on the execution time of an algorithm. If an algorithm does not finish within 7 days, we terminate it and do not include its results in our comparisons.

5.1 Runtime

As we can see in Figure 4, the runtime follows the same behavior for all approximation methods: the runtime increases as the dataset size increases.

We can see that GSIZE outperforms all other approximation methods, almost by a factor of 10, and terminates within a limited time on all datasets. On the other hand, EGO and KPATH outperform most approximation algorithms on the smallest dataset, but their running time increases dramatically with the size of the dataset and they become the worst. EGO and KPATH did not finish within 7 days for the largest datasets, com-LiveJournal and dbpedia-link, so we terminated them; their data points for these datasets are therefore missing from all the figures. DIAM did not finish within 7 days for the largest dataset, and its data points for that dataset are missing from all figures as well.
[Figure 4: Runtime (Time in seconds, log scale) of the approximation methods per dataset, alongside a companion plot of the maximum error (Max Error, log scale). x-axis: Dataset (ca-GrQc, email-Enron, com-Amazon, com-LiveJournal, dbpedia-link); legend: RAND1, RAND2, DIAM, ABRA, GSIZE, KPATH, EGO.]
[Plots of the average error (Avg. Error, log scale) and the Kendall correlation coefficient (Kendall Tau) per dataset; legend as in Figure 4.]

Figure 5: Average error of approximation algorithms.

Figure 7: Kendall correlation coefficient for approximation algorithms.
As the dataset size increases, the differences in runtime become clearer. Hence, taking into consideration that DIAM and ABRA ran in parallel, the approximation algorithms' performance can generally, on all datasets, be ranked as follows: GSIZE, RAND2, RAND1, ABRA, DIAM, EGO, then KPATH.

approximation algorithms, so it offers a good trade-off between accuracy and speed. On the other hand, KPATH and EGO provide the worst accuracy among all the approximation algorithms on all datasets, since they result in the highest maximum and average error.
[Plots of the percentage of top-1% nodes identified (Top 1% Hits) and the average error for the selected nodes (Avg. Error, log scale) per dataset; legend: RAND1, RAND2, DIAM, ABRA, GSIZE, KPATH, EGO.]

Figure 8: Percentage of top 1% nodes identified by approximation algorithms.

Figure 9: Average error of selected nodes.
When we investigated this case, we found that dbpedia-link has many nodes with the same betweenness centrality values. These nodes will always be ranked by their node IDs, which improves the overall ranking quality of the approximation method.

On the other hand, RAND1, RAND2 and GSIZE provide high-quality rankings consistently for all datasets. For applications that care only about the nodes' ranking, GSIZE is the best option, taking into consideration its runtime performance demonstrated previously.

5.4 Identifying top 1%

As we can see in Figure 8, for ca-GrQc, EGO was the worst at identifying the top betweenness centrality values; it managed to identify only about half of them. GSIZE and KPATH identified almost 80% of the top-ranked nodes. On the other hand, RAND1, RAND2, DIAM and ABRA performed best and managed to identify more than 90% of the top-ranked nodes.

For the datasets email-Enron and com-Amazon, GSIZE, KPATH and EGO perform almost identically and managed to identify almost 80% of the top-ranked nodes. RAND1, RAND2, DIAM and ABRA again performed better and managed to identify around 90% of the top-ranked nodes on both datasets.

However, as we move toward the largest datasets, we can see that RAND2 starts outperforming the other approximation algorithms.

5.5 Betweenness centrality for specific nodes

GSIZE's goal is to produce a good estimate for an individual node in very limited time. We have witnessed in the previous subsections that it has the best runtime performance among the approximation algorithms, and we want to test whether it also gives a good estimate for certain nodes chosen at random.

To that end, we select 10 nodes with high betweenness centrality, and we run GSIZE to approximate their betweenness centrality. We report in Figure 9 the average error produced by GSIZE for these nodes only. For comparison, we include the average error of the other approximation algorithms for these nodes.

As we can see in Figure 9, for all datasets, KPATH and EGO provide the worst approximation for the chosen nodes. On the other hand, RAND1, RAND2, DIAM and ABRA produce more accurate approximations than GSIZE.

Therefore, GSIZE does not provide a more accurate estimate for the chosen vertices, but it does give a good estimate in very limited time. Observing the behavior of GSIZE, when we change its input from c = 5 to c = 10, it takes longer and uses a larger sample size, but it produces a better approximation for the chosen nodes.

6 CONCLUSIONS

The lack of a golden standard benchmark for evaluating approximate betweenness centrality algorithms has made it difficult for researchers to compare their algorithms to existing ones, and difficult for users to choose the appropriate approximation algorithm for their applications. Motivated by that, we developed the golden standard BeBeCA in an attempt to standardize and facilitate the evaluation of betweenness centrality approximation algorithms.

Our extensive evaluation of existing betweenness centrality approximation algorithms showed that simple approximation algorithms based on source sampling offer a good trade-off between performance and accuracy. More specifically, algorithms based on uniform random sampling of sources perform consistently well as the graph size grows, with tolerable runtimes, and they outperform more sophisticated and recent algorithms in terms of both performance and accuracy.

Our plans for future work include extending BeBeCA to support other centrality metrics, to facilitate evaluating them on large graphs.

ACKNOWLEDGEMENTS

For computer time, this research used the resources of the Supercomputing Laboratory at King Abdullah University of Science and Technology (KAUST) in Thuwal, Saudi Arabia.
REFERENCES
[1] David A Bader, Shiva Kintali, Kamesh Madduri, and Milena Mihail. 2007. Approximating betweenness centrality. In International Workshop on Algorithms and Models for the Web-Graph. Springer, 124–137.
[2] Reinhard Bauer and Dorothea Wagner. 2009. Batch dynamic single-source shortest-path algorithms: An experimental study. In International Symposium on Experimental Algorithms. Springer, 51–62.
[3] Elisabetta Bergamini and Henning Meyerhenke. 2016. Approximating Betweenness Centrality in Fully Dynamic Networks. Internet Mathematics 12, 5 (2016), 281–314.
[4] Elisabetta Bergamini, Henning Meyerhenke, and Christian L Staudt. 2014. Approximating betweenness centrality in large evolving networks. In 2015 Proceedings of the Seventeenth Workshop on Algorithm Engineering and Experiments (ALENEX). SIAM, 133–146.
[5] Ulrik Brandes. 2001. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology 25, 2 (2001), 163–177.
[6] Ulrik Brandes. 2008. On variants of shortest-path betweenness centrality and their generic computation. Social Networks 30, 2 (2008), 136–145.
[7] Ulrik Brandes and Christian Pich. 2007. Centrality estimation in large networks. International Journal of Bifurcation and Chaos 17, 07 (2007), 2303–2318.
[8] Kathleen M Carley. 1999. On the evolution of social and organizational networks. Research in the Sociology of Organizations 16, 0 (1999).
[9] Mostafa Haghir Chehreghani. 2014. An efficient algorithm for approximate betweenness centrality computation. Comput. J. (2014), bxu003.
[10] Shlomi Dolev, Yuval Elovici, and Rami Puzis. 2010. Routing betweenness centrality. Journal of the ACM (JACM) 57, 4 (2010), 25.
[11] Martin Everett and Stephen P Borgatti. 2005. Ego network betweenness. Social Networks 27, 1 (2005), 31–38.
[12] Linton C Freeman. 1977. A set of measures of centrality based on betweenness. Sociometry (1977), 35–41.
[13] Robert Geisberger, Peter Sanders, and Dominik Schultes. 2008. Better approximation of betweenness centrality. In Proceedings of the Meeting on Algorithm Engineering & Experiments. Society for Industrial and Applied Mathematics, 90–100.
[14] M. Girvan and M. E. J. Newman. 2002. Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99, 12 (2002), 7821–7826.
[15] K-I Goh, Eulsik Oh, Byungnam Kahng, and Doochul Kim. 2003. Betweenness centrality correlation in social networks. Physical Review E 67, 1 (2003), 017101.
[16] Shiyu Ji and Zenghui Yan. 2016. Refining approximating betweenness centrality based on samplings. arXiv preprint arXiv:1608.04472 (2016).
[17] Maurice G Kendall. 1938. A new measure of rank correlation. Biometrika 30, 1/2 (1938), 81–93.
[18] Maurice George Kendall. 1948. Rank correlation methods. (1948).
[19] Sushant S Khopkar, Rakesh Nagi, and Gregory Tauer. 2016. A penalty box approach for approximation betweenness and closeness centrality algorithms. Social Network Analysis and Mining 6, 1 (2016), 1–13.
[20] Nicolas Kourtellis, Tharaka Alahakoon, Ramanuja Simha, Adriana Iamnitchi, and Rahul Tripathi. 2013. Identifying high betweenness centrality nodes in large social networks. Social Network Analysis and Mining 3, 4 (2013), 899–914.
[21] Valdis E Krebs. 2002. Mapping networks of terrorist cells. Connections 24, 3 (2002), 43–52.
[22] Jérôme Kunegis. 2013. Konect: the Koblenz network collection. In Proceedings of the 22nd International Conference on World Wide Web. ACM, 1343–1350.
[23] Min-Joong Lee, Sunghee Choi, and Chin-Wan Chung. 2016. Efficient algorithms for updating betweenness centrality in fully dynamic graphs. Information Sciences 326 (2016), 278–296.
[24] Jure Leskovec and Rok Sosič. 2016. SNAP: A general-purpose network analysis and graph-mining library. ACM Transactions on Intelligent Systems and Technology (TIST) 8, 1 (2016), 1.
[25] Yeon-sup Lim, Daniel S Menasché, Bruno Ribeiro, Don Towsley, and Prithwish Basu. 2011. Online estimating the k central nodes of a network. In Network Science Workshop (NSW), 2011 IEEE. IEEE, 118–122.
[26] Leandros A Maglaras and Dimitrios Katsaros. 2012. New measures for characterizing the significance of nodes in wireless ad hoc networks via localized path-based neighborhood analysis. Social Network Analysis and Mining 2, 2 (2012), 97–106.
[27] Ahmad Mahmoody, Charalampos E Tsourakakis, and Eli Upfal. 2016. Scalable Betweenness Centrality Maximization via Sampling. arXiv preprint arXiv:1609.00790 (2016).
[28] Arun S Maiya and Tanya Y Berger-Wolf. 2010. Online sampling of high centrality individuals in social networks. In Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 91–98.
[29] David Alfred Ostrowski. 2015. An approximation of betweenness centrality for Social Networks. In Semantic Computing (ICSC), 2015 IEEE International Conference on. IEEE, 489–492.
[30] Jürgen Pfeffer and Kathleen M Carley. 2012. k-Centralities: local approximations of global measures based on shortest paths. In Proceedings of the 21st International Conference on World Wide Web. ACM, 1043–1050.
[31] Matteo Riondato and Evgenios M Kornaropoulos. 2016. Fast approximation of betweenness centrality through sampling. Data Mining and Knowledge Discovery 30, 2 (2016), 438–475.
[32] Matteo Riondato and Eli Upfal. 2016. ABRA: Approximating betweenness centrality in static and dynamic graphs with Rademacher averages. arXiv preprint arXiv:1602.05866 (2016).
[33] Gert Sabidussi. 1966. The centrality index of a graph. Psychometrika 31, 4 (1966), 581–603.
[34] Shai Shalev-Shwartz and Shai Ben-David. 2014. Understanding machine learning: From theory to algorithms. Cambridge University Press.
[35] Christian Staudt, Aleksejs Sazonovs, and Henning Meyerhenke. 2014. NetworKit: An interactive tool suite for high-performance network analysis. CoRR abs/1403.3005 (2014).
[36] Vladimir Naumovich Vapnik. 2000. The nature of statistical learning theory, ser. Statistics for Engineering and Information Science. New York: Springer 21 (2000), 1003–1008.
[37] Vladimir N Vapnik and A Ya Chervonenkis. 2015. On the uniform convergence of relative frequencies of events to their probabilities. In Measures of Complexity. Springer, 11–30.
[38] Stanley Wasserman and Katherine Faust. 1994. Social network analysis: Methods and applications. Vol. 8. Cambridge University Press.
[39] Yuichi Yoshida. 2014. Almost linear-time algorithms for adaptive betweenness centrality using hypergraph sketches. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1416–1425.