Anomaly Detection in Dynamic Networks - A Survey

Download as pdf or txt
Download as pdf or txt
You are on page 1of 25

Advanced Review

Anomaly detection in dynamic


networks: a survey
Stephen Ranshous,1,2 Shitian Shen,1,2 Danai Koutra,3 Steve
Harenberg,1,2 Christos Faloutsos3 and Nagiza F. Samatova1,2∗

Anomaly detection is an important problem with multiple applications, and thus


has been studied for decades in various research domains. In the past decade there
has been a growing interest in anomaly detection in data represented as networks,
or graphs, largely because of their robust expressiveness and their natural ability
to represent complex relationships. Originally, techniques focused on anomaly
detection in static graphs, which do not change and are capable of representing
only a single snapshot of data. As real-world networks are constantly changing,
there has been a shift in focus to dynamic graphs, which evolve over time.
In this survey, we aim to provide a comprehensive overview of anomaly
detection in dynamic networks, concentrating on the state-of-the-art methods. We
first describe four types of anomalies that arise in dynamic networks, providing
an intuitive explanation, applications, and a concrete example for each. Having
established an idea for what constitutes an anomaly, a general two-stage approach
to anomaly detection in dynamic networks that is common among the methods is
presented. We then construct a two-tiered taxonomy, first partitioning the methods
based on the intuition behind their approach, and subsequently subdividing
them based on the types of anomalies they detect. Within each of the tier one
categories—community, compression, decomposition, distance, and probabilistic
model based—we highlight the major similarities and differences, showing the
wealth of techniques derived from similar conceptual approaches. © 2015 The Authors.
WIREs Computational Statistics published by Wiley Periodicals, Inc.

How to cite this article:


WIREs Comput Stat 2015, 7:223–247. doi: 10.1002/wics.1347

Keywords: anomaly detection; dynamic networks; outlier detection; graph min-


ing; dynamic network anomaly detection; network anomaly detection

INTRODUCTION financial systems connecting banks across the world,


a electric power grids connecting geographically dis-
A network is a powerful way to represent a
collection of objects and the relationships or
connections among them. Examples include global
tributed areas, and social networks that connect users,
businesses, or customers using relationships such
as friendship, collaboration, or transactional inter-
∗ Correspondence actions. These are examples of dynamic networks,
to: [email protected]
1 Department of Computer Science, North Carolina State University,
which, unlike static networks, are constantly under-
Raleigh, NC, USA going changes to their structure or attributes. Possi-
2 Computer Science and Mathematics Division, Oak Ridge National ble changes include insertion and deletion of vertices
Laboratory, Oak Ridge, TN, USA (objects), insertion and deletion of edges (relation-
3 Computer Science Department, Carnegie Mellon University, Pitts- ships), and modification of attributes (e.g., vertex or
burgh, PA, USA edge labels).
Conflict of interest: The authors have declared no conflicts of interest An important problem over dynamic networks is
for this article. anomaly detection—finding objects, relationships, or
Volume 7, May/June 2015 223
© 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc.
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction
in any medium, provided the original work is properly cited and is not used for commercial purposes.
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Advanced Review wires.wiley.com/compstats

points in time that are unlike the rest. There are many used in the papers we discuss. We then introduce four
high-impact and practical applications of anomaly different types of anomalies that these algorithms
detection spanning numerous domains. A small sam- detect, namely, anomalous vertices, edges, subgraphs,
ple includes: detection of ecological disturbances, and events. We continue with an extensive overview
such as wildfires1,2 and cyclones3 ; intrusion detection of the existing methods based on the proposed tax-
for individual systems4 and network systems5–7 ; iden- onomy, illustrated in Figure 5 that takes into account
tifying abnormal users and events in communication their underlying design principles, such as those based
networks8,9 ; and detecting civil unrest using twitter on graph communities, compression, decomposition,
feeds.10 distance metrics, and probabilistic modeling of graph
The ubiquitousness and importance of anomaly features. Each taxonomic group is then subcatego-
detection in dynamic networks has led to the emer- rized further based on the types of anomalies detected.
gence of dozens of methods over the last 5 years Finally, we end with a more in-depth discussion of
(see Tables 3 and 4). These methods complement the methods that have code publicly available (see
techniques for static networks,11–14 as the latter often Table 6), highlighting pros and cons of each.
cannot be easily adopted for dynamic networks.
When considering the dynamic nature of the data,
new challenges are introduced, including: BACKGROUND
Anomaly, or outlier, detection is a problem that spans
• New types of anomalies are introduced as a result many domains. Chandola et al.16 provide an excellent
of the graph evolving over time, for example, overview, taxonomy, and analysis of a multitude
splitting, disappearing, or flickering communi- of techniques (e.g., classification, clustering, and
ties. statistical) for a variety of domains, expanding the
• New graphs or updates that arrive over time work of Hodge et al.23 and Agyemang et al.24 As our
need to be stored and analyzed. Storing all the focus will be on graphs, it is important to have a
new graphs in their entirety can vastly increase basic understanding of graph theory. West et al.25 and
the size of the data. Therefore, typical offline Balakrishnan et al.26 both offer comprehensive and
analysis, where multiple passes over the data are approachable introductions to graph theory, covering
acceptable and all of the data are assumed to all of the basics required for this survey and well
fit into memory, becomes infeasible. Conversely, beyond. Cook and Holder27 show how the theoretical
storing only the most recent graph or updates concepts can be applied for graph mining, and recently
restricts the analysis to a single point in time. Samatova et al.28 provide an overview of many graph
mining techniques as well as implementation details
• Graphs from different domains, such as social in the R programming language.b Owing to space
networks compared to gene networks, may limitations we cannot provide an introduction to the
exhibit entirely different behavior over time. fundamentals of the types of methods we discuss, so
This divergence in evolution can lead to we provide references for introductory and overview
application-specific anomalies and approaches. material for each of them in Table 1.
• Anomalies, particularly those that are slow to It is important to note that in many domains
develop and span multiple time steps, can be hard the data are naturally represented as a network, with
to differentiate from organic graph evolution. the vertices and edges clearly defined. However, in

No dedicated and comprehensive survey of TABLE 1 Introductory and Overview References


anomaly detection in dynamic networks exists, despite
the growing importance of the topic because of the Method References
increasing availability of network data. Although Community detection Fortunato,29 Lancichinetti and
anomaly detection has been surveyed in a variety of Fortunato,30 Reichardt and
domains,15–19 it has only recently been touched upon Bornholdt,31 Harenberg et al.32
in the context of dynamic networks.20–22 MDL and compression Rissanen,33–35 Grünwald36
In this survey, we hope to bridge the gap Decomposition Golub and Reinsch,37 Klema and
between the increasing number of methods for Laub,38 Kolda and Bader39
anomaly detection in dynamic networks and the Distance Gao et al.,40 Rahm and
lack of their comprehensive analysis. First, we give a Bernstein,41 Cook and Holder27
broad overview of the related work in graph mining, Probabilistic Koller and Friedman,42 Glaz43
anomaly detection, and the high-level approaches

224 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. Volume 7, May/June 2015
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
WIREs Computational Statistics Anomaly detection in dynamic networks

some cases, how to represent the data as a network some applications, and conclude with a representative
is unclear and can depend on the specific research example.
question being asked. The conversion processes used
in specific domains are outside the scope of this survey,
and we assume all data are already represented as Type 1: Anomalous Vertices
dynamic networks. Anomalous vertex detection aims to find a subset
of the vertices such that every vertex in the subset
has an ‘irregular’ evolution compared to the other
TYPES OF ANOMALIES vertices in the graph. Optionally, the time point(s)
In this section, we identify and formalize four types where the vertices are determined to be anomalous
of anomalies that arise in dynamic networks. These can be identified. What constitutes irregular behavior
categories represent the output of the methods, not is dependent on the specific method, but it can be
the implementation details of how they detect the generalized by assuming that each method provides
anomalies, e.g., comparing consecutive time points, a function that scores each vertex’s behavior, e.g.,
using a sliding window technique. Note that often measuring the change in the degree of a vertex from
times in real-world graphs, (e.g., social and biological time step to time step. In static graphs, the single
networks) the graph’s vertex set V is called a set of snapshot allows only intragraph comparisons to be
nodes. However, to avoid confusion with nodes in made, such as finding vertices with an abnormal
a physical computer network, and to align with the egonet density.11 Dynamic graphs allow the temporal
abstract mathematical representation, we call it a set dynamics of the vertex to be included, introducing new
of vertices. types of anomalies that are not present in static graphs.
Because the graphs are assumed to be dynamic, A high-level definition for a set of anomalous vertices
vertices and edges can be inserted or removed at every is as follows.
time step. For the sake of simplicity, we assume that
the vertex correspondence and the edge correspon- Definition 1. (Anomalous vertices). Given G, the
dence across different time steps are resolved because total vertex set V = ∪Tt=1 Vt , and a specified scoring
of unique labeling of vertices and edges, respectively. ( ′ ) ̂V ⊆ V
function f : V → ℝ, the set of anomalous vertices ′

We define a graph series G as an ordered set of graphs is a vertex set such that ∀ v ∈ V , |f v − f| > c0 ,
′ ′

with a fixed number of time steps. Formally, G =


{ }T where ̂f is a summary statistic of the scores f(v),
Gt t=1 , where T is the total number of time steps, ∀ v ∈ V.
Gt = (V t , Et ⊆ (V t × V t )), and the vertex set V t and edge
set Et may be plain or attributed (labeled). Graph An example of an anomalous vertex is shown
series where T → ∞ are called graph streams. The full in Figure 1. Two time steps are shown, both with
set of notations can be found in Table 2. In the fol- 10 vertices and 2 communities. In this example,
lowing subsections, we start with an intuitive expla- anomalous vertices are those that have a substantial
nation of the problem definition, then give a general change in their community involvement compared
formal definition of the anomaly type, continue with to the rest of the vertices in the graph. As v6 is the
only vertex that has any change in its community
involvement, it is labeled as an anomaly. Formally, we
TABLE 2 Notation Summary
can measure the change in community involvement
Symbol Meaning between adjacent time steps t and t − 1 by letting
∑|C|
G Graph series with a fixed number of time points f(v) = i=1 ct−1
i,v
⊕ cti,v , where C = {c1 , c2 , … , c|C| } is
Gt Snapshot of the graph series at time t the set of communities, cti,v = 1 if v is part of commu-
G s
The sth graph segment, a grouping of temporally
nity ci at time step t and 0 otherwise, and ⊕ is the xor
adjacent graphs operator.
The scoring function f will depend on the appli-
It Vertex partitioning at time t , separating all the
vertices into disjoint sets
cation. In the example shown in Figure 1, the ver-
tices were scored based on the change in community
Vt Vertex set for the graph at time point t
involvement. However, if the objective is identifying
vi Vertex i computers on a network that become infected and
Et Edge set for the graph at time point t part of a botnet, then an appropriate scoring function
ei ,j Edge between v i and v j might be measuring the change in the number of edges
c0 Threshold value for normal versus anomalous data each vertex has between time steps, or the change in
the weights of the edges.

Volume 7, May/June 2015 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. 225
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Advanced Review wires.wiley.com/compstats

a real number, low values indicating unusual behavior.


1 10 In a static graph, a distribution of the edge weights
can be found, and each edge can be assigned a score
according to the probability of its weight. However,
2 5 6 9
because of the temporal nature of dynamic graphs,
two new main types of irregular edge evolution can be
3 4 7 8 found: (1) abnormal edge weight evolution,47 where
the weight of a single edge fluctuates over time and
Time step 1 has inconsistent spikes in value, and (2) appearance
of unlikely edges in a graph between two vertices
that are not typically connected or part of the same
community.48,49 Anomalous edge detection can be
Community 1 Community 2 formally defined as follows.
Time step 2

Definition 2. (Anomalous edges). Given G, the total


1 10
edge set E = ∪Tt=1 Et , and a specified scoring function
f : E → ℝ, the set of anomalous edges E′ ⊆ E is an edge
( )
2 5 6 9 set such that ∀ e′ ∈ E′ , |f e′ − ̂f| > c0 , where ̂f is a
summary statistic of the scores f(e), ∀ e ∈ E.

3 4 7 8 Figure 2 shows an example of edges that exhibit


irregular edge weight evolutions. Over the five time
steps there are two anomalous edges, both having a
FIGURE 1 | Example of an anomalous vertex. A dynamic graph with single time point that is unlike the rest of the series.
two time steps showing a vertex, v 6 , that is found to be anomalous
Letting f (e) = |wt (e) − wt − 1 (e)| + |wt (e) − wt + 1 (e)|,
based on its change in community involvement. In time step 1, v 6 is
only part of community 2, whereas in time step 2 it is part of both where wt (e) is the edge weight at time step t, each
communities 1 and 2. As no other vertex’s community involvement edge is scored based on its current weight compared
changes, v 6 is deemed anomalous. to the previous and following weight. Abnormally
large changes in the weight of an edge results in it
is being flagged as anomalous. For example, at time
Typical applications of this type of anomaly
detection are identifying vertices that contribute the step 2, using the mean change in weights ̂ f = 0.43 as a
most to a discovered event (also known as attribution), summary statistic and c0 = 0.10 as a threshold, results
such as in communication networks,44 and observing in edge (3, 4) being declared anomalous.
the shifts in community involvement.45,46 While our example used a simple thresholding
approach for edge weights, various other scoring func-
tions can be used. Finding unlikely social interactions
Type 2: Anomalous Edges can be accomplished by modeling the edges between
Similar to vertex detection, edge detection aims to find people as the number of times they communicate daily,
a subset of the edges such that every edge in the subset and then scoring each edge at each time step based on
has an ‘irregular’ evolution, optionally identifying the the probability of it (1) being present, and (2) having a
time point(s) where they are abnormal. Again, this particular weight.50 Another application is identifying
concept can be generalized by assuming each method anomalous traffic patterns in a graph where vertices
employs a function that maps each edge in the graph to represent road intersections and edges represent the

1 1 1 1 1
0.50 0.45 0.45 0.48 0.47 0.93 0.51 0.47 0.49 0.50

2 4 2 4 2 4 2 4 2 4

0.85 0.22 0.90 0.77 0.87 0.23 0.88 0.20 0.91 0.18
3 3 3 3 3
t=1 t=2 t=3 t=4 t=5

FIGURE 2 | An illustration of anomalous edges that occur because of an irregular pattern of their weight over time, with anomalous edges
highlighted. At each time step, a vertex’s weight typically shifts by ± 0.05 at most. However, edge (3, 4) has a spike in its weight at time step 2, unlike
any other time in the series. Similarly, at time step 3, edge (1, 4) spikes in value. These spikes cause the edges to be considered anomalous.

226 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. Volume 7, May/June 2015
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
WIREs Computational Statistics Anomaly detection in dynamic networks

roads themselves. Projecting the edges into a feature 1 1


space based on their average speed of traffic through-
out the day, the pairwise similarity between every edge 2 6
can be calculated. Each edge is then scored based on
the change in the similarity between itself and every 3 5 3 5
other edge.47
4 4
t=1 t=2
Type 3: Anomalous Subgraphs (a)
Finding subgraphs with irregular behavior requires
an approach different from the ones for anomalous 1 1
vertices or edges, as enumerating every possible sub-
graph in even a single graph is an intractable problem. 2 6 2 6
Hence, the subgraphs that are tracked or identified are
typically constrained, for example, to those found by 3 5 3 5
community detection methods. In these cases, match-
4 4
ing algorithms are required to track the subgraphs
t=1 t=2
across time steps, such as the community matching
technique used in Ref 51. Once a set of subgraphs has (b)
been determined, intragraph comparisons or intertime FIGURE 3 | Two different types of anomalous subgraphs. (a)
point comparisons can be made to assign scores to Shrunken community shows a shrunken community, where a community
each subgraph, e.g., measuring the change in the total loses members from one time step to the next. (b) Split community
weight of the subgraph between adjacent time steps. shows a split community, where a single community breaks into several
Typical intragraph comparison methods, such as distinct smaller communities.
finding unique substructures in the graph that inhibit
compressibility,14 must be extended to incorporate
series. Therefore, split communities can be found by
the additional information gained by using a dynamic
examining two adjacent time steps and letting f score
network. Instead of finding structures that are unique
each community in t − 1 as the number of communities
within a single graph, structures that are unique to a
it is matched to in t.
single graph within a series of graphs can be found.
What constitutes an anomalous subgraph is
Anomalies of this type, unique to dynamic networks,
heavily dependent on the application domain. Auto-
include communities that split, merge, disappear, and
matic identification of accidents in a traffic network
reappear frequently, or exhibit a number of other
can be done by letting edge weights represent out-
behaviors.
lier scores, then mining the heaviest dynamic subgraph
Definition 3. (Anomalous subgraphs). Given G, a (HDS).52 Similarly, changes and threats in social net-
subgraph set H = ∪Tt=1 Ht where Ht ⊆ Gt , and a spec- works can be found by running community detection
ified scoring function f : H → ℝ, the set of anomalous on a reduced graph composed of suspected anomalous
subgraphs H′ ⊆ H is a subgraph set such that ∀ h′ ∈ H′ , vertices50 and performing an Eigen decomposition on
( )
|f h′ − ̂f| > c0 , where ̂f is a summary statistic of the a residual graph.53
scores f(h), ∀ h ∈ H.

Two specific types of anomalous subgraphs Type 4: Event and Change Detection
are shown in Figure 3. A shrunken community, Unlike the previous three types of anomalies, the two
Figure 3(a), is when a single community loses a sig- types discussed here are exclusively found in dynamic
nificant number of its vertices between time steps. graphs. We start by defining the problem of event
Assuming that a matching of communities between detection, which has attracted much interest in the
adjacent time steps t and t − 1 is known, shrunken data mining community. Event detection has a much
communities can be identified by measuring the broader scope compared to the previous three types
decrease in the number of vertices in the community, of anomalies, aiming to identify time points that are
f (h) = |ht ∩ ht − 1 | − |ht − 1 |. Figure 3(b) is an example of significantly different from the rest. Isolated points
a split community, when a single community divides in time where the graph is unlike the graphs at the
into several distinct communities. Split communities previous and following time points represent events.
will have a high matching score or probability for One approach to measuring the similarity of two
more than one community in the next graph of the graphs is comparing the signature vector of summary

Volume 7, May/June 2015 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. 227
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Advanced Review wires.wiley.com/compstats

1 1 1 1 1

2 6 2 6 2 6 2 6 2 6

3 5 3 5 3 5 3 5 3 5
4 4 4 4 4
t=1 t=2 t=3 t=4 t=5
(a)

1 1 1 1 1

2 6 2 6 2 6 2 6 2 6

3 5 3 5 3 5 3 5 3 5
4 4 4 4 4
t=1 t=2 t=3 t=4 t=5
(b)
FIGURE 4 | An example of an event (a), a change (b), and the difference between them. In both, the graphs initially have only a few edges. At
time step 3, the graphs become highly connected, almost forming a clique. The major difference is observed at the time step following the insertion of
the edges. In Figure 4(a) the edges are then removed and the graph returns to a state similar to the one before the insertion, representing an isolated
event. In Figure 4(b), however, the graph maintains its new state indicating the sustained shift, or change, in the graph.

values extracted from each graph, such as average applications include finding frames in a video that are
clustering coefficient. The task of attribution—finding unlike the others,55,56 and detecting disturbances in
the cause for the event—is not required as part the ecosystem (e.g., wildfires).1
of event detection, and is often omitted. However, Now, we move on to the problem of change
once the time points for events have been identified, detection, which is complementary to event detection.
potential causes can be found by applying techniques It is important to note the distinction between event
for anomaly detection in static graphs.11–14 and change detection. While events represent isolated
incidents, change points mark a point in time where
Definition 4. (Event detection). Given G and a the entire behavior of the graph changes and the
scoring function f : Gt → ℝ, an event is defined as difference is maintained until the next change point,
a time point t, such that |f(Gt ) − f(Gt − 1 )| > c0 and Figure 4 illustrates the difference between the two. The
|f(Gt ) − f(Gt + 1 )| > c0 . distinction between events and change points mani-
fests in the modification of the constraint that the value
One of the simplest similarity metrics for graphs
of the scoring function at t be more than a threshold
is comparing the number of vertices and edges in
value away from both t − 1 and t + 1, to being more
them. In Figure 4(a), the event at time step 3 can be
than a threshold value away from only t − 1.
found by counting the edges in each graph, f (Gt ) = |Et |,
and comparing the adjacent time points. The number Definition 5. (Change detection). Given G and a
of edges in each graph, {6, 6, 14, 7, 7}, shows that scoring function f : Gt → ℝ, a change is defined as
there is a significant change in the structure of the a time point t, such that |f(Gt ) − f(Gt − 1 )| > c0 and
graph at time step 3, almost forming a clique. Note |f(Gt ) − f(Gt + 1 )| ≤ c0 .
that this time point is isolated, with the surrounding
time points being very dissimilar, indicating an event An example of change detection is shown in
occurred. Figure 4(b). Similar to event detection, this change
Event detection, while providing less specific can be identified by comparing the number of edges at
information than vertex, edge, or subgraph detection, adjacent time points. Just as in Figure 4(a), a number
is highly applicable in many areas. Approximating of edges are added to the graph, forming nearly a
the data available using a tensor decomposition, then clique. However, at time step 4, instead of the graph
scoring the time point as the amount of error in reverting back to its original structure, the added
the reconstruction, has been shown to effectively edges are persistent and the graph has a new ‘normal’
detect moments in time when the collective motions behavior. The persistence of the new edges indicates
in molecular dynamics simulations change.54 Other that at t = 3 a change was detected, not an event.

228 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. Volume 7, May/June 2015
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
WIREs Computational Statistics Anomaly detection in dynamic networks

One of the most popular applications of change numbers, then applying existing anomaly detectors to
detection is in networks modeling human interactions, identify the anomalies. Equivalently, they are propos-
such as communication and coauthorship networks. ing methodologies for stage one, and applying existing
Authors that act as a ‘bridge’ between different con- methodologies for stage two. A complete list of the
ferences, or switch areas of interest throughout their methods surveyed is provided in Tables 3 and 4, with
career, will exhibit a number of change points in their the complexity notation provided in Table 5.
publication records.57 Additionally, change detection
has been applied to communication graphs44 and net-
work traffic58 by measuring the change in the Eigen Community Detection
decompositions. Community-based methods track the evolution of
communities and their associated vertices in the
graphs over time.105–107 While the data-specific fea-
METHODS tures generated by the methods discussed in this
section are all derived using the community struc-
A common two-stage methodology was found among ture of the graphs, the various approaches differ in
the papers reviewed. In the first stage, data-specific two main points: (1) in the aspects of the community
features are generated by applying a mapping from the structure they analyze, e.g., the connectivity within
domain-specific representation, graphs, to a common each community versus how the individual vertices are
data representation, a vector of real numbers. A assigned to the communities at each time step, and (2)
simple example is taking a single static graph as in the definitions of communities they use, e.g., soft
input and outputting its diameter. In the second stage, communities where each vertex has a probability of
an anomaly detector is applied, taking the output belonging to each community versus disjoint commu-
from stage one, and optionally some historical data, nities where each vertex is placed into at most one
and mapping it to a decision on the anomalousness community in the graph. Moreover, based on how the
of the data. Anomaly detectors, such as support community evolution is viewed, it can be applied to
vector machines (SVMs)59 or the local outlier factor the detection of different anomaly types. For example,
(LOF) algorithm,60 are general methodologies that are a rapid expansion or contraction of a community
domain-independent, as they operate on the common could indicate that the specific subgraph for that com-
data representation that is output from stage one. munity is undergoing drastic changes, whereas a drop
Hence, stage two consists of identifying outlier points in the number of communities by two corresponds to
in a multidimensional space.60–64 The two stages can an abnormal event.
be formally defined as follows:
Example
Stage one ⇒ Ω ∶ D → ℝd , In Ref 67, given a weighted directed graph series G and
a vertex partitioning similarity function Sim(Is , It ), the
{[ ] [ ] } set of change points is defined as a set of time points
∗ t
Stage two ⇒ f ∶ ℝd , ℝd → {0, 1} , T : {t ∈ T|Sim(Is , It ) ≤ c0 }. The similarity function is
defined as
where D is the domain-specific representation, d is ( ) ( )
( ) PIs It + PIt Is
the number of feature dimensions, [ℝd ]* denotes the Sim Is , It = , (1)
history of the data, and [ℝd ]t denotes the current time 2N
point. Often times, stage one maps the domain-specific ( ) ∑ | |
where PIs It = p max |Isp ∩ c|, Is is the partition-
data to a single value, d = 1, such as the graph diameter c∈Is ∩It | |
example. Additionally, stage two can be generalized ing of the current graph segment Gs and It is the
to mapping to [0, 1] instead of {0, 1}, representing partitioning of the newly arrived graph Gt , N is the
the probability that the data are anomalous or a number of vertices, p is the community index in a
normalized outlier score assigned to the data. partitioning, c is the common community between
This two-stage process can be found in many the two partitionings. Mapping the similarity of two
other areas of anomaly or outlier detection, such as graphs to a single score via vertex partition similar-
text mining and abnormal document detection,65 and ity is the extraction of the data-specific feature, and
computer vision.66 However, in this survey we restrict the anomaly detector is the application of a threshold
the scope to anomaly detection in dynamic graphs. value. If the similarity is above a threshold c0 , then Gt
The papers we surveyed are interested in finding the will be put into the current graph segment Gs , other-
best mapping from dynamic graphs to a vector of real wise, Gt starts a new graph segment Gs + 1 . The graph

Volume 7, May/June 2015 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. 229
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Advanced Review wires.wiley.com/compstats

TABLE 3 Summary of Methods


Paper V E SG E/C Year Complexity1
Community
√ ( )
Duan67 2009 𝒪 n3 t

Aggarwal68 2012 𝒪 (mt )
69

Chen 2012 Exp.
√ ( )
Gupta70 2012 𝒪 nk 2 t
√ ( )
Gupta71 2012 𝒪 ntk 2t

Chen3 2013 Exp.
√ ( )
Ji46
2013 𝒪 n3 t

Rossi72 2013 𝒪 (mt )

Araujo 73
2014 𝒪 (mt )
Compression

Chakrabarti74 2004 𝒪 (mt )

Sun 75
2007 𝒪 (mt )
Decomposition
√ √ ( )
Ide76 2004 𝒪 n2 t
√ ( )
Lakhina77 2004 𝒪 tm2
√ √ √ ( 3 )
Sun 78
2006 𝒪 nr t
√ √ √ ( )
Sun79 2006 𝒪 rt |𝒳 |N𝒳
√ √ √ ( )
Sun 80
2007 𝒪 nct + c 3 t
√ √ √ ( )
Kolda81 2008 𝒪 |𝒳 |r 3 t
√ √ √ ( )
Tong 82
2008 𝒪 mct + c 3 t
√ √ ( )
Akoglu44 2010 𝒪 n2 t
√ ( )
Ishibashi 58
2010 𝒪 n2 t
√ √ ( )
Jiang83 2011 𝒪 nt 2 + n2
√ √ √
Koutra 57
2012 𝒪 (mt )

Miller84 2012 𝒪 (mt )
√ √ √
Papalexakis85 2012 𝒪 (mt )
√ √
Miller86 2013 𝒪 (mt )

Yu 87
2013 𝒪 (mt )

V, vertex; E, edge; SG, subgraph; C/E, change/event.


1 The complexity stated is, in some cases, a coarse approximation provided for comparative purposes. The complexity shown is for the entire graph series.

is partitioned using a relevance matrix computed Based on this logic, using soft community match-
with random walks with restarts and modularity ing and looking at each community individually, the
optimization. average change in belongingness (the probability the
vertex is part of the community) for each vertex can
Vertex Detection be found for consecutive time steps. Vertices whose
change in belongingness is different from the average
A group of vertices that belong to the same community change of the vertices in the community are said to be
are expected to exhibit similar behavior. Intuitively evolutionary community outliers.70
this means that if at consecutive time steps one vertex
The changes in a vertex’s community belong-
in the community has a significant number of new
ingness may form a pattern over time, called a soft
edges added, the other vertices in the community
would also have a significant number of new edges. temporal pattern. In Ref 72, a non-negative matrix
If the rest of the vertices in the community did factorization approach in combination with the min-
not have new edges added, the vertex that did is imum description length (MDL) principle is used to
anomalous. detect automatically vertex roles and build transition

230 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. Volume 7, May/June 2015
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
WIREs Computational Statistics Anomaly detection in dynamic networks

TABLE 4 Summary of Methods (Cont.)


Paper V E SG E/C Year Complexity1
Distance
√ ( )
Pincombe88 2005 𝒪 m2 t
√ ( )
Gaston89 2006 𝒪 n3 t
√ ( )
Li47 2009 𝒪 m2 t
√ √ √
Abello 48
2010 𝒪 (mt )

Papadimitriou 90
2010 𝒪 (mt )

Arackaparambil91 2011 𝒪 (mt )
√ ( )
Le 92
2011 𝒪 n2 t
√ ( )
Berlingerio 93
2012 𝒪 nt log n
√ ( 3 )
He94 2012 𝒪 nt
√ ( )
Malliaros 95
2012 𝒪 n2 t

Koutra96 2013 𝒪 (mt )
√ ( )
Mongiov 52
2013 𝒪 mt + m log m
√ ( )
Mongiov 97
2013 𝒪 mt log m
Probabilistic
√ √ ( )
Priebe8 2005 𝒪 n3 t
√ √ ( )
Huang98 2006 𝒪 n2 t

Pandit99 2007 𝒪 (mt )
√ √ ( )
Hirose 100
2009 𝒪 n2 t
√ ( )
Thompson 101
2009 𝒪 n2 t
√ √ √ √ ( )
Heard50 2010 𝒪 n2 t

Djidjev 102
2011 𝒪 ((n + m) t )
√ √
Aggarwal103 2011 𝒪 (mt )
√ ( k −1 )
Neil 104
2013 𝒪 mn t

V, vertex; E, edge; SG, subgraph; C/E, change/event.


1 The complexity stated is, in some cases, a coarse approximation provided for comparative purposes. The complexity shown is for the entire graph series.

models. The community memberships are found In Ref 46, communities manifest in the form
slightly differently in Ref 71, where Xmeans is used. of corenets. Instead of defining the corenet (commu-
However, in both cases, the patterns common across nity) based solely on density, modularity, or hop dis-
all of the vertices in the graph are extracted, then tance (as egonets are11 ), the definition is based on the
each vertex’s patterns are compared to the extracted weighted paths between vertices. More formally, a ver-
ones. If a vertex’s patterns are not similar to any of tex’s corenet consists of itself and all the vertices within
the extracted common patterns, then the vertex is two hops that have a weighted path above a thresh-
anomalous. Later extended to static networks derived old value. If the edge weight between two vertices is
from heterogeneous data sources,108 the two-step considered the strength of their connection, then intu-
approach in Ref 71 is modified to an alternating itera- itively the vertices connected with higher weight edges
tive approach in Ref 70. Instead of extracting patterns should be considered part of the same community.
first and then identifying outliers, the patterns and Consequently, if a vertex has two neighbors removed,
outliers are found in an alternating fashion (pattern one connected with a high edge weight and the other
extraction → outlier detection → pattern extraction connected with a low edge weight, then the removal of
→ · · ·) until the outliers discovered do not change on the vertex connected by the higher edge weight should
consecutive iterations. By alternating back and forth have more of an impact. At each time step every vertex
between pattern extraction and outlier detection, the is first given an outlier score based on the change in its
algorithm accounts for the affect the outliers have on corenet, and the top k outlier scores are then declared
the communities discovered. anomalous.

Volume 7, May/June 2015 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. 231
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Advanced Review wires.wiley.com/compstats

TABLE 5 Algorithm Complexity Notation that behave similarly can be grouped using clustering
or prior knowledge. In Ref 3, if a community is
Symbol Meaning
conserved across time steps and the networks within
n Number of vertices in the graph its group, but has no corresponding community in
m Number of edges in the graph any other group of networks, then the community is
t Number of time steps defined as anomalous; two communities are consid-
k Number of communities ered corresponding communities if they have a certain
𝒳 Tensor representing the graph
percentage of their vertices in common.
Unlike Refs 3, 69, 73 that consider only the
|𝒳 | Size of the largest mode in 𝒳
structure of the network, in social networks there is
r Size of the largest mode of the core tensor often more information available. For example, in the
approximation (e.g., in the Tucker Twitter user network, clusters can be found based
decomposition113 )
on the content of tweets (edges), as well as the users
N𝒳 Number of modes in 𝒳 (vertices) involved. If the fraction of the tweets (edges)
c Number of columns sampled for low rank added during the recent time window for a cluster is
approximation much larger than the fraction of tweets (edges) added
Exp. Exponential time complexity anytime before the window, then this influx is declared
as an evolution event for that cluster.68 A cluster
that experiences an evolution event is marked as an
Subgraph Detection anomalous subgraph at the time when the evolution
event occurs. Although the authors did not use labeled
Instead of looking at individual vertices and their com- datasets in Ref 73, the proposed algorithm can be
munity belongingness, entire subgraphs that behave applied to such data by appropriately incorporating
abnormally can be found by observing the behavior the information in the tensor.
of communities themselves over time.

Six different types of community-based anoma- Change Detection


lies are proposed in Ref 69: shrink, grow, merge, split,
Changes are detected by partitioning the streaming
born, and vanish. A graph at time t is compared to the graphs into coherent segments based on the similarity
graphs at time t + 1 and t − 1 using graph and commu- of their partitionings (communities). The beginning of
nity representatives. The representatives reduce the each segment represents a detected change.
computation required, providing seeding points for
community enumeration. Communities are defined The segments are found online by comparing the
as maximal cliques, hence enumerating all commu- vertex partitioning of the newest graph to the par-
nities is NP-hard, and overlapping communities are titioning found for the graphs in the current, grow-
allowed. Graph representatives are the set of vertices ing, segment. Vertex partitioning can be achieved with
that have appeared in time step t in addition to t + 1, many methods, but in Ref 67 it is done using the rele-
t − 1, or both; a community representative is a vertex vance matrix computed by random walks with restarts
that appears in the fewest number of the other com- and modularity maximization. When the partitioning
munities. A set of rules are then derived based on these of the new graph is much different from the current
representatives that decide whether communities are segment’s, a new segment begins, and the time point
anomalous, falling into one of the six defined classes, for the new graph is output as a detected change. The
or normal. One additional type of community-based similarity of two partitions is computed as their Jac-
anomaly was proposed recently in Ref 73: comets, card coefficient, and the similarity of two partitionings
or communities that come and go, often periodically. is the normalized sum of their partition similarities. A
The time-evolving graph is represented by a tensor similar approach, GraphScope,75 based on the same
(3-mode matrix), and the comets are detected by low idea but using the MDL principle to guide its parti-
rank tensor decomposition combined with the MDL tioning and segmenting will be discussed later.
principle that allows for parameter-free community
search.
Conversely, instead of finding changes, commu- Compression
nities that are conserved, or stable, can be identified. The methods discussed in this section are all based on
Constructing multiple networks at each time step the MDL principle.33 The MDL principle and com-
based on different information sources, communities pression techniques based on this principle exploit pat-
can be conserved across time and networks. Networks terns and regularity in the data to achieve a compact

232 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. Volume 7, May/June 2015
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
WIREs Computational Statistics Anomaly detection in dynamic networks

graph representation.109 Applying this principle to matrix including the edge to the encoding cost if the
graphs is done by viewing the adjacency matrix of edge is removed. Streaming updates can be performed
a graph as a single binary string, flattened in row using the previous result as a seed for a new run of
or column major order. If the rows and columns of the algorithm, thus avoiding full recomputations.
the matrix can be rearranged such that the entropy
of the binary string representation of the matrix is Change Detection
minimized, then the compression cost (also known as
encoding cost34 ) is minimized. The data-specific fea- The main idea is that consecutive time steps that are
tures are all derived from the encoding cost of the very similar can be grouped together leading to low
graph or its specific substructures; hence, anomalies compression cost. Increases in the compression cost
are then defined as graphs or substructures (e.g., edges) mean that the new time step differs significantly from
that inhibit compressibility. the previous ones, and thus signifies a change.

Akin to Ref 67, to detect changes in a graph


Example
stream, consecutive time steps that are similar can be
In Ref 75, given unweighted and undirected graph
grouped into segments. When considering the next
series G and segment encoding cost function c, the
graph in the stream, it can either be grouped with
set of
{ change(points{ is })
defined( as )a set (of time
)} points the graphs in the current segment, or it can be the
T ∶ t ∈ T|c Gs ∪ Gt − c Gs ≥ c Gt , where
beginning of a new segment. The decision to start a
Gs is the current graph segment, and Gt is the newly
new segment in Ref 75 is made by comparing the
arrived graph at time point t. The main difference
encoding cost of the current segment without the next
between GraphScope75 and the example from the
graph to the encoding cost of the segment if the next
Community Detection section by Duan et al67 is the
graph were included. If the vertex partitioning for the
scoring function used for partitioning and change
new graph is very similar to the vertex partitioning of
detection. Equivalently, the two methods generate dif-
the graphs in the segment then the encoding cost will
ferent data-specific features. Duan et al.67 used com-
not change much. However, if the partitions are very
munity driven metrics, computed by random walks
different, the encoding cost would increase because of
with restarts and modularity optimization, to parti-
the increase in entropy. Changes in the graph stream
tion and segment the graphs, whereas, GraphScope75
are the time points when a new segment begins. This
is guided by the MDL principle, using encoding cost as
method is also parameter-free.
a scoring function. While both methods use a thresh-
old system as an anomaly detector, GraphScope
uses a dynamic threshold, based on the current graph Matrix/Tensor Decomposition
segment, instead of a fixed threshold as in Ref 67.
These techniques represent the set of graphs as a
tensor, most easily thought of as a multidimensional
Edge Detection
array, and perform tensor factorization or dimen-
An edge is considered anomalous if the compression sionality reduction. The most straightforward method
of a subgraph has higher encoding cost when the edge for modeling a dynamic graph as a tensor is to cre-
is included than when it is omitted. ate a dimension for each graph aspect of interest,
e.g., a dimension for time, source vertices, and des-
In Ref 74, a two-step alternating iterative tination vertices. For example, modeling the Enron
method is used for automatic partitioning of the ver- email dataset can be done using a 4-mode tensor, with
tices. Vertex partitioning can be done by rearranging dimensions for sender, recipient, keyword, and date.
the rows and columns in the adjacency matrix. In The element (i, j, k, l) is 1 if there exists an email that
the first step, the vertices are separated into a fixed is sent from sender i to recipient j with keyword k on
number of disjoint partitions such that the encoding day l, otherwise, it is 0.
cost is minimized. The second step iteratively splits Similar to compression techniques, decomposi-
the partitions with high entropy into two. Any vertex tion techniques search for patterns or regularity in
whose removal from the original partition would the data to exploit. All of the data-specific features
result in a decrease in entropy is removed from that generated by these methods are derived from the
partition and placed into the new one. Once the result of the decomposition of a matrix or tensor.
method has converged, meaning steps 1 and 2 are Unlike the community methods, these typically take
unable to find an improvement, the edges can be a more global view of the graphs, but are able to
given outlierness scores. The score for each edge is incorporate more information (attributes) via addi-
computed by comparing the encoding cost of the tional dimensions in a tensor. One of the most popular

Volume 7, May/June 2015 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. 233
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Advanced Review wires.wiley.com/compstats

methods for matrices (2-mode tensors) is singular weighted frequencies are found using a decay function,
value decomposition (SVD),37 and for higher order where edges that occurred more recently have a higher
tensors (≥ 3 modes) is PARAFAC,110 a generalization weight. The largest Eigenvalue and its corresponding
of SVD. The main differences among the decomposi- vector obtained by performing PCA on M are sum-
tion based methods are whether they use a matrix or maries of the activity of the vertex and the correlation
a higher order tensor, how the tensor is constructed of its edges, respectively. The time series formed by
(what information is stored), and the method of finding the changes in these values are used to com-
decomposition. pute a score for each vertex at each time step. Vertices
that have a score above a threshold value are output
Example as anomalies at that time.

In Ref 44, given the 3-D T × N × F tensor, where


Event Detection
T denotes the number of time steps, N denotes the
number of vertices in a graph, F denotes the num- There are two main approaches: (1) Tensor decom-
ber of features extracted for each vertex, a scoring position approximates the original data in a reduced
function Z(t), and the time window W, a set of time dimensionality, and the reconstruction error is an indi-
points for change detection is a set of time steps cator of how well the original data is approximated.
T : {t ∈ T |Z(t) ≥ c0 }, where c0 is a dynamic threshold,
′ ′
Sub-tensors, slices, or individual fibers in the tensor
Z(t) = 1 − u(t)T r(t − 1), u(t) is the principal Eigen- that have high reconstruction error do not exhibit
vector (or ‘Eigen-behavior’) of the vertex-to-vertex behavior typical of the surrounding data, and reveal
feature correlation matrix computed at time point t, anomalous vertices, subgraphs, or events. (2) Singular
and r(t − 1) is the ‘typical Eigen-behavior’, calculated values and vectors, as well as Eigenvalues and Eigen-
using the last W Eigenvectors. Each time points score, vectors are tracked over time in order to detect signif-
icant changes that showcase anomalous vertices.
Z(t), can be thought of as the similarity between the
current graph, and the previous W graphs. Using the
Using the reconstruction error as an indicator
Z scores that are output as the data-specific features,
for anomalies has been employed for detecting times
the anomaly detector is a simple heuristic of choosing
during molecular dynamics simulations where the
the top k values and labeling them as the anomalies.
collective motions suddenly change,54 finding frames
Similar to the examples in the Community Detection
in a video that are unlike the others,56 and identifying
and Compression-based sections, this method incor-
data that do not fit any concepts.112
porates the historical values into the data-specific
To address the intermediate blowup problem—
feature generation instead of the anomaly detector.
when the input tensor and output tensors exceed
However, unlike the previous two examples, it con-
memory capacity during the computation—
siders the entire graph at once to compute its score,
memory-efficient tucker (MET) decomposition
instead of building up the score based on substruc-
was proposed,81 based on the original Tucker
tures or partitions, and uses a fixed sliding window
decomposition.113 The Tucker decomposition approx-
of graphs over the time series instead of a dynamic
imates a higher order tensor using a smaller core tensor
segmenting process.
(thought of as a compressed version of the original
tensor) and a matrix for every mode (dimension)
Vertex Detection
of the original tensor. Similarly, methods have been
Matrix decomposition is used to obtain activity vec-
developed for offline, dynamic, and streaming tensor
tors per vertex. A vertex is characterized as anomalous analysis,79 in addition to static and sliding window
if its activity changes significantly between consecutive based methods.78 These extensions allow the method
time steps. to operate on continuous graph streams as well as
those with a fixed number of time points. Compact
Owing to the computational complexity of per- matrix decomposition80 (CMD) computes a sparse
forming principal component analysis111 (PCA) on the low rank approximation of a given matrix. By apply-
entire graph, it is computationally advantageous to ing CMD to every adjacency matrix in the stream,
apply it locally. The approach used in Ref 87 is to a time series of reconstruction values is created and
have each vertex maintain an edge correlation matrix used for event detection. Colibri82 and ParCube85
M, which has one row and column for every neigh- can be used in the same fashion and provide a large
bor of the vertex. The value of an entry in the matrix increase in efficiency. The PARAFAC decomposition
for vertex i, M(j, k), is the correlation between the has been shown to be effective at spotting anomalies
weighted frequencies of the edges (i, j) and (i, k). The in tensors as well.57

234 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. Volume 7, May/June 2015
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
WIREs Computational Statistics Anomaly detection in dynamic networks

Instead of finding the difference between the The normal activity vector, r(t − 1), is the left
graph reconstructed from the approximation obtained singular vector obtained by performing SVD on the
by a decomposition, a probabilistic model can be used. matrix formed by the activity vectors for the last
The Chung-Lu random graph model114 is used in W time steps. Each time point is given a score
Ref 84 Taking the difference between the real graph’s Z(t) = 1 − r(t − 1)T u(t), which is intuitively a score of
adjacency matrix and the expected graph’s forms a how different the current activity is compared to
residual matrix. Anomalous time windows are found normal, where a higher value is worse. Anomalies can
by performing SVD on the residual matrices—on be found online using a dynamic thresholding scheme,
which a linear ramp filter has been applied—and where time points with a score above the threshold
by analyzing the change in the top singular values. are output as changes.76 The vertices responsible are
The responsible vertices are identified via inspection found by calculating the ratio of change between
of the right singular vectors. More accurate graph the normal and activity vectors. The vertices that
models that also consider attributes are proposed correspond to the indexes with the largest change are
in Ref 86. labeled anomalous. Similar approaches have used the
Going away from comparing an expected or activity vector of a vertex-to-vertex feature correlation
approximate model of the graph to the real model to matrix,44 and a vertex-to-vertex correlation matrix
find the deviation, events can be identified via signifi- based on the similarity between vertex’s neighbors.58
cant changes in the decomposed space. Specifically, by
performing PCA on the data, the calculated Eigenvec-
Distance Measures
tors can be separated into ‘normal’ and ‘anomalous’
Using the notion of distance as a metric to measure
sets by examining the values of the data projected onto
change is natural and widely used.23,60,116,117 Two
each Eigenvector. At each time step the Eigenvectors
objects that have a small difference in a measured
are examined (via projection of the data) in descend-
metric can be said to be similar. The metrics measured
ing order of their corresponding Eigenvalues, and the
in graphs are typically structural features, such as
first Eigenvector to contain a projected point outside 3
the number of vertices. Once the summary metrics
standard deviations of the rest of the values, and every
are found for each graph, the difference or similarity,
Eigenvector thereafter, constitute the anomalous set.
which are inversely related, can be calculated. The
The second step is then to project the data onto its
variation in the algorithms lies in the metrics chosen
normal and anomalous subspaces. Once this is done,
to extract and compare, and the methods they use to
events are detected when the modifications in the
determine the anomalous values and corresponding
anomalous components from the previous time step to
graphs.
the current are above a threshold value.77 Expanding
on this method, joint sparse PCA and graph-guided Example
joint sparse PCA were developed to localize the In Ref 96, given a graph series G, a distance function
anomalies and identify the vertices responsible.83 The d that computes the distance between two matrices,
responsible vertices are more easily identified by using and a function to calculate the vertex affinity matrix
a sparse set of components for the anomalous set. Ver- S, where sij indicates the influence vertex i has on
tices are given an anomaly score based on the values vertex j, a set of time points for detected events is
of their corresponding row in the abnormal subspace. T : {t ∈ T|d(Gt , Gt + 1 ) ≤ c0 }, where
As a result of the anomalous components being sparse,

the vertices that are not anomalous receive a score of √ n n (√ √ )2
( ) √ ∑∑
0. Owing to the popularity of PCA in traffic anomaly d Gt , Gt + 1 = √ St,ij − St+1,ij , (2)
detection, a study was performed identifying and eval- i=1 j=1
uating the main challenges of using PCA.115
and c0 is a threshold value. Similar to the example
Change Detection given in the Community Detection section,67 the
data-specific feature of interest here is a measure of
finding the distance (inversely related to similarity)
The activity vector of a graph, u(t), is the principal
component, the left singular vector corresponding to
between two graphs. Thus, each adjacent graph pair
the largest singular value obtained by performing SVD is mapped to a single real value, creating a time
on the weighted adjacency matrix. A change point is series. The anomaly detector, the quality control mov-
when an activity vector is substantially different from ing average threshold, is unique compared to the
the ‘normal activity’ vector, which is derived from previous examples, as it considers the historical val-
previous activity vectors. ues of the data-specific features. The previous three

Volume 7, May/June 2015 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. 235
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Advanced Review wires.wiley.com/compstats

sections (Community Detection, Compression, and existing method that assigns outlier scores for edges
Decomposition) incorporated this information into in a network47,48,50,74,119 could be used as the input
their data-specific feature generation, using window- to this method. The latter approach allows52 to be
ing or segmenting techniques. applied to any network that is capable of having out-
lier scores assigned to edges (e.g., weighted, directed,
Edge Detection and attributed). Once every edge is assigned an outlier
score, significant anomalous regions (SARs)—fixed
If the evolution of some edge attribute (e.g., edge subgraphs over a window of time—are mined from
weight) differs from the ‘normal’ evolution, then the the sequence, analogous to the problem of finding the
corresponding edge is characterized as anomalous. HDSs. An alternating iterative approach based on Ref
120—first finding an optimal time window for a fixed
In Ref 47, a dynamic road traffic network whose subgraph, then finding an optimal subgraph for the
edge weights vary over time is studied. The similarity fixed time window, repeating until no improvement
between the edges over time is computed using a decay is found—is constructed to approximate a solution to
function that weighs the more recent patterns higher. the NP-hard problem. This work was later extended
At each time step, an outlierness score is calculated to allow the subgraphs to smoothly evolve over time,
for each edge based on the sum of the changes in where vertices can be added or removed between adja-
similarity scores. Edges with the highest score, chosen cent time steps.97 A similar approach mines weighted
using either a threshold value or top-k heuristic, are frequent subgraphs in network traffic, where the edge
marked as anomalous at that time step. weights correspond to the anomaly contribution of
Viewing the network as a stream of edges, that edge.94
meaning the network does not have a fixed topology
as the road traffic network did, the frequency and
Event Detection
persistence of an edge can be measured and used as
an indicator of its novelty. The persistence of an edge Provided a function f (Gi , Gj ) that measures the dis-
is how long it remains in the graph once it is added, tance between two graphs, a time series of distance
and the frequency is how often it appears. In Ref 48, values can be constructed by applying the function on
set system discrepancy118 is used to measure the consecutive time steps in the series. Anomalous val-
persistence and frequency of the edges. When an edge ues can then be extracted from this time series using
arrives, its discrepancy is calculated and compared to a number of different heuristics, such as selecting the
the mean discrepancy value of the active edge set. If top k or using a moving average threshold.
the weighted discrepancy of the edge is more than a
threshold level greater than the mean discrepancy, the Extracting features from the graphs is a common
edge is declared novel (anomalous). Using the novel technique to create a summary of the graph in a few
edges detected, a number of metrics can be calculated scalar values, its signature. Local features are specific
for various graph elements (e.g., vertices, edges, and to a single vertex and its egonet11 (the subgraph
subgraphs). Individual graph elements can then be induced by itself and its one-hop neighbors), such
identified anomalous via comparison of the calculated as the vertex or egonet degree. Global features are
metrics for that element with the overall distribution derived from the graph as a whole, such as the
and change of the metric. graph radius. The local features of every vertex in
the graph can be agglomerated into a single vector,
Subgraph Detection the signature vector, of values that describe the graph
using the moments of distribution (such as mean
A subgraph with many ‘anomalous’ edges is deemed and standard deviation) of the feature values. In
anomalous. Ref 93, the similarity between two graphs is the
Canberra distance, a weighted version of the L1
Contrasting with Ref 46 where the edge weights distance, between the two signature vectors. A similar
are considered the strength of a connection, in Ref approach is used in Ref 92 to detect abnormal times in
52 the edge weights are considered as outlier scores, traffic dispersion graphs. Instead of an agglomeration
or how ‘anomalous’ that edge is at that time. Every of local features, it extracts global features from each
edge at every time step is given its own anomaly score, graph, introducing a dK-2 series121–123 -based distance
which is a function of the probability of seeing that metric, and any graph with a feature value above a
particular edge weight on that particular edge given threshold is anomalous.
the distribution of weights for that edge over all the As an alternative to extracting multiple features
graphs in the series. Alternatively, the output of an from the graph and using the signatures, the pairwise

236 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. Volume 7, May/June 2015
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
WIREs Computational Statistics Anomaly detection in dynamic networks

vertex affinity scores may be used. Pairwise vertex features used were each vertex and its PageRank
affinity scores are a measure of how much each vertex value, and each edge (u, v) with a weight equal to u’s
influences another vertex, and can be found using fast PageRank value divided by the number of outlinks
belief propagation.124 In Ref 96, the scores are calcu- from u. Weighting the vertices with their ‘quality’,
lated for two consecutive time steps, and the similarity measured by PageRank, ensures that the removal of a
between the two graphs is the rooted Euclidean dis- high-quality vertex will have more of an impact than
tance (Matusita distance) between the score matrices. the removal of a low quality vertex. A fixed threshold
The changes in the vertex affinity score are shown to was set to find graphs with abnormally low similarity
accurately reflect and scale with the level of impact scores.
of the changes. For example, removing an edge that Instead of finding structural difference between
connects two otherwise disconnected components, a two consecutive graphs, events can be detected using
‘bridging edge’, results in a higher score than remov- the time series of robustness values for each graph.
ing an edge that does not affect the overall structure Robustness is a measure of the connectivity of the
of the graph. A moving threshold is set on the time graph. A graph with a high robustness will retain the
series of similarity scores using quality control with same general structure and connectivity even when
individual moving range. The exponential weighted some vertices or edges are removed. Finding events
moving average has also been used as a way to is then finding when the robustness value changes
dynamically set the threshold, tested on distribution significantly.95 A similar approach is given in Ref 89,
features extracted from Wikipedia revision logs.91 but using a variant of the graph diameter. The graph
Complementary to feature similarity, one can diameter used is the average of all vertex eccentricities
look at the structural differences between graphs (the eccentricity of a vertex v is the longest shortest
to identify the magnitude of change. These meth- path from v to any other vertex), instead of the typical
ods focus on the function that defines the distance definition of using the maximum vertex eccentricity.
between graphs instead of finding the optimal features Both the time series of graph diameter changes and the
to use as summaries. Many metrics have been devel- graph diameters themselves are shown to be effective
oped and tested to quantify the differences between methods for detecting events.
graphs. In Ref 88, 10 different distance functions
were evaluated on TCP/IP traffic graphs with known
anomalies (ground truth). Box-Jenkins autoregressive Probabilistic Models
moving average (ARMA) modeling125 was used to With a foundation in probability theory, distributions,
create a ‘normal’ model of the feature values, and and scan statistics, these methods typically construct
the time points with residuals, calculated as the dif- a model of what is considered ‘normal’ or expected,
ference between expected and real value, exceeding and flag deviations from this model as anomalous. The
a threshold are flagged. The 10 distance functions type of model used, how it is constructed, what it is
tested were weight, maximum common subgraph modeling, and the method for determining outliers is
(MCS) weight, MCS edge, MCS vertex, graph edit what differentiates these approaches.
distance, median graph edit distance, modality, diam-
eter, entropy, and spectral. Of these 10 distance func- Example
tions, the MCS-based methods performed the best; In Ref 98, given a graph series G and a likelihood scor-
however, finding the MCS between two graphs is an ing function L, a set of time points for change detec-
NP-hard problem. More recently in Ref 90, five dif- tion is a set of time steps T : {t ∈ T|log(L(t))/|𝜃| ≤ c0 },

ferent distance scoring functions, all with a linear where L(t) = 𝜃appear in time t P(𝜃)c(𝜃) , 𝜃 denotes
complexity, were tested on web graphs with specific sender–recipient tuples, P(𝜃) indicates the probability
types of anomalies: missing subgraph, missing vertices, of all possible sender–recipient tuples (emails from a
and connectivity change. The five scoring functions single sender to a list of receivers), and c(𝜃) is the num-
were: vertex/edge overlap, vertex ranking, vector sim- ber of occurrences of a particular sender–recipient
ilarity, sequence similarity, and signature similarity. tuple in time t. Thus, each time step is mapped to
The method that performed the best was the signa- a single value, the average log-likelihood, indicating
ture similarity, which is done by extracting a signa- the predictability of the emails during that time step,
ture vector from each graph and finding the distance high values indicating normal behavior. Abnormal
between them using SimHash.126 Unlike the signa- time steps are identified via inspection of the plot
ture vectors discussed above that create summaries of of log-likelihood values. In general, the data-specific
the entire graph in a few scalar values, the signature features that are output are often probabilities, or
here includes every vertex and edge. Specifically, the likelihoods, of certain structures or events occurring.

Volume 7, May/June 2015 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. 237
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Advanced Review wires.wiley.com/compstats

Typically, models or distributions are constructed Edge Detection


using past data to derive an ‘expected’ case for the
next time step. Anomaly detectors in probabilistic Communications (edges) are modeled using a count-
methods do not always perform a hard mapping ing process, and edges that deviate from the model by
from features to anomalies, where everything is either a statistically significant amount are flagged.
normal or abnormal, but can provide a probability One way to model the relationship between ver-
that the structure or event is anomalous. tices is considering the communication between them
as a counting process. In Ref 50, a Bayesian discrete
Vertex Detection time counting process is used to model the number of
communications (edge weights) between vertices, and
There are two main approaches: (1) Building scan
statistics time series and detecting points that are is continuously updated as new graphs are considered.
several standard deviations away from the mean, (2) Predictive p-values, based on the learning of the dis-
vertex classification. tribution of the counts, are calculated for the edges of
the newest observation and subsequently used for flag-
Scan statistics are often called ‘moving window ging anomalous vertex pairs (edges). Moreover, as new
analysis’, where the local maximum or minimum of a graphs are considered, both sequential—comparing
measured statistic is found in specific regions of the the new graph to the model based on history—and
data. In a graph, a scan statistic can be considered retrospective analysis—adjusting the history based on
as the maximum of a graph invariant feature, such the newest graphs—are performed. The retrospective
as the number of edges, found for each vertex and analysis helps alleviate the insufficient data problem,
its neighborhood in the graph. In Ref 8, they use a when decisions are made early on with insufficient his-
variable ‘scale’ for the measured statistic; each vertex tory to be accurate.
has the number of edges in its 0, 1, and 2 step
neighborhood measured. The local statistic for each Subgraph Detection
vertex is then normalized using the mean and standard
Fixed subgraphs (e.g., paths and stars), multigraphs,
deviation of its recent values, and the scan statistic
and cumulative graphs are used to construct models on
of a graph is the maximum of all of the normalized the expected behaviors. Deviations from the models
local statistics. Normalizing the values accounts for signify an anomalous subgraph.
the history of each vertex, meaning the statistic for
each vertex is relative to its own past instead of To identify hacker behaviors in a network,104
the values of the other vertices. This ensures that combines scan statistics with a Hidden Markov Model
the largest change measured in the graph is not tied (HMM) for edge behavior. Unlike Ref 8 that used
directly to the magnitude of change, but the ratio. neighborhoods, the local scan statistics are based on
Building a standardized time series of the scan statistic two graph shapes, the k-path and star. Comparing
values, any value that is five standard deviations away the scan statistics of a sliding window to its past
from the mean of the series is considered an event values, and using an online thresholding system, local
(hence, events are detected as well). The vertex most anomalies are identified. The local anomaly is the
responsible is identified as the one that was chosen for union of all subgraphs (representing the k-paths or
the scan statistic value for the entire graph. stars) that led to statistically significant test statistics.
Similar to the use of neighborhoods for scan Another method to model a dynamic network,
statistics, the Markov random field model (MRF) is instead of using a series or stream of graphs, is to have
used to uncover the states for vertices and infer the a single multigraph where parallel edges correspond to
maximum likelihood assignment by a belief propaga- communication between vertices at two different time
tion algorithm, where a vertex is labeled based on the steps. The initial multigraph is decomposed into tele-
vertices it is connected to. In Ref 99, anomalies (fraud- scoping subgraphs (TSGs) for each time window.102
sters) are uncovered in an online auction network by A TSG satisfies two conditions: (1) for any two edges
discovering bipartite cores, which are posited to be the that share a vertex, the edge that begins communica-
interaction behavior between fraudsters and accom- tion first finishes communication last; (2) there exists a
plices. It incrementally updates the model as new edges vertex r, the root, that has no incoming edges and has a
are added, taking advantage of the fact that an edge path to every vertex in the TSG. TSGs that have a low
insertion or removal will affect only a local subgraph. probability of appearing (according to their size and
In the propagation matrix a vertex can be in one of historic connection patterns) are output as anomalies.
three states: fraudster, accomplice, or honest. Vertices Likewise, a cumulative graph, which includes
that are assigned the label of fraudster are anomalous. every edge seen up until the current time step, is

238 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. Volume 7, May/June 2015
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
WIREs Computational Statistics Anomaly detection in dynamic networks

used in Ref 101. Edge weights in the graph are events and the vertices responsible can be identified.
calculated using a decay function where more recent When the Eigenvalues deviate from the expected dis-
edges weigh more. A normal behavior for subgraphs tribution an event has occurred, and the vertices whose
is defined by identifying persistent patterns, here matrices deviate from the matrix distribution127 the
found by constructing a graph where each edge is most are considered responsible.100
weighted by the fraction of time it is in the graph and
iteratively extracting the heaviest weight connected
components. As the cumulative graph evolves the DISCUSSION
extracted subgraphs are monitored, comparing their
With the wealth of algorithms discussed, the deci-
current activity to the expected activity, which is based
sion of which to use can be difficult. The choice
on the recent behavior.
depends on the types of graphs being analyzed (plain,
attributed, directed, undirected, etc.), the desired types
Event Detection
of anomalies, and the size of the graphs—influencing
Deviations from the models of the graph likelihood the allowed complexity, required parallelism, and
or the distribution of the Eigenvalues reveal when an whether streaming analysis is required. It is worth not-
event occurs. ing, however, that using multiple methods in parallel
or in conjunction with one another may yield superior
Recently, a new reservoir sampling method was results compared to using a single method. Using pub-
proposed in Ref 103 to maintain structural summaries licly available datasets (Table 7), each algorithm can
of a graph stream. The online sampling method man- be applied to data from a variety of domains, com-
ages multiple distinct partitionings of the network to paring the found anomalies, their sensitivity, and their
provide a statistically significant summary. As a new scalability.
graph is added to the stream, every edge has its likeli- While each method has its own advantages and
hood calculated based on the edge generation models disadvantages, for brevity we will give an overview
of the different partitions. Once the edge likelihoods of only those that have publicly available code (see
have been found an global graph likelihood is calcu- Table 6). We note that the available software listed
lated as the geometric mean of the edge likelihood is not comprehensive. We listed all software that was
values. A similar edge generation model approach was said to be available in their papers, as well as the
used in Ref 98, where the probability that and edge software of the authors that responded to our email
exists between vertices i and j is stored in a matrix, asking about the availability of their code. The meth-
M(i, j). The estimated probabilities are calculated ods are again partitioned by the types of anomalies
using expectation maximization, and subsequently they detect so that a fair qualitative comparison can
used to give potential scores to every sender–recipient be made. Note that a quantitative comparison among
pair. Each sender–recipient tuple, which is an email the available methods was impractical because the
from one sender to multiple recipients, then has its methods mostly detect different types of anomalies.
likelihood computed based on the estimated distribu- Of the few intra-type comparisons that exist, coordi-
tion of all possible sender–recipient tuples. A single nating the different outputs (e.g., anomalous vertices
score is calculated for each graph as the average of in the graphs versus anomalous vertices in communi-
the log-likelihood scores. In both of these methods a ties of the graphs) and attaining a valuable comparison
single score is calculated for each graph based on the would be unlikely.
likelihoods of the individual edges within it, and the
individual edges (or tuples) responsible are those that
have the lowest likelihood. Type 1 (Vertices) Methods
Similar to estimating the distribution of possible The earliest work with available software is the online
sender–recipient tuples, the distribution of the Eigen- method by Heard et al.,50 which models the frequency
values and compressed Eigen equations is calculated of the connections between vertices as a counting
in Ref 100. Based on the assumption that each ver- process and uses Bayesian learning and predictive
tex has a time series of a vertex-local feature (e.g., p-values to identify anomalies.71,70,50 In addition to
summed edge weight), a vertex-to-vertex correlation operating on graph streams, it leverages the sparsity
matrix can be generated for each time step. The Eigen of the network by only examining edges that appear
equation of the correlation matrix is compressed by in the graphs. A key advantage of this technique is
keeping the largest Eigenvalues and a set of low dimen- that it performs both sequential analysis, where new
sion matrices (one matrix for each vertex). By learning graphs are analyzed using the history of the graph
the distributions of the Eigenvalues and matrices, both stream, in addition to retrospective analysis, where

Volume 7, May/June 2015 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. 239
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Advanced Review wires.wiley.com/compstats

the history is updated based on the new graphs that

[0, 1] per vertex per community per


arrive. One example of when this could be useful

[0, 1] per vertex for all time steps


[0, 1] per time step and [0, 1] per

Tensor factors for all time steps


is in the initial stages of the analysis when very lit-

Subgraphs with time intervals


tle history is available. When the algorithm termi-
nates the output is very easy to interpret, labeling ver-
edge per time step
tices/edges/subgraphs explicitly as anomalous or not.
[0, 1] per time step The flexibility of the final part of the algorithm, when
analysis is performed on only the identified anomalous

time step
portions of the graph and their neighbors, opens many
Output

possibilities—anomalous community detection, iden-


tifying the most important or influential anomalous
vertex, and many more.
Language

Similar to Ref 50, ECOutlier70 can be per-


MATLAB

MATLAB

MATLAB
Java
Java
Java

formed on graph streams. Each new graph will have


a community detection algorithm run on it, and then
methods operate on the community belongingess matrices of graphs, therefore they are applicable to any graph that can have communities extracted.

the community matching and anomaly detection can


Streaming

be performed. However, it compares only two adja-


TABLE 6 Methods with Open Source Code, the Types of Graphs they Work on, the Language They Are Written in, and the Output Format

cent time points in the graph stream—the newly added


graph and the one immediately preceding it. An inter-
esting feature of this method is that it operates on
the community belongingness matrices of a graph,
Parameterized

and is therefore applicable to graphs of any type




(plain, attributed, directed, etc.). The output from


this method is not as straightforward as Ref 50, at
each time step providing a matrix that has the outlier
score for each vertex in regards to each community. A
Attributed

related method, CTOutlier,71 has the same appeal of



working on community belongingness matrices. A dis-


tinguishing factor is that CTOutlier operates on the
sequence of belongingness matrices, requiring all of
Plain

them to be in memory at once for a complete analysis.




As a direct result of this, it is able to perform commu-


nity pattern extraction at an arbitrary scale, removing
Unweighted

the restriction of comparing only adjacent time points.



For long sequences the length of the patterns should


be restricted to 2, as the number of potential patterns
grows exponentially with the number of time steps. A
single score for each vertex is output, representing how
Weighted

anomalous the community evolution for each vertex is





over the entire sequence.


Undirected

Type 2 (Edges) Methods





Only the Bayesian learning technique by Heard et al.50


identifies anomalous edges explicitly.50 However, ten-
sor methods, such as ParCube,85 can find the recon-
Directed

struction error at arbitrary granularities, even on a



per cell basis. Therefore, they are capable of identi-


fying anomalous edges; however, the reconstruction
error must then be maintained for each edge, instead of
CTOUTLIER 711
701
DELTACON96
50

PARCUBE85
ECOUTLIER

each tensor, or the user must provide individual edges


52
BAYESIAN

NETSPOT

of interest to examine. A more practical approach for


Code

1 These

using a tensor-based method could be to identify the


time points that are considered anomalous, and then

240 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. Volume 7, May/June 2015
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
WIREs Computational Statistics Anomaly detection in dynamic networks

examine the reconstruction error for the cells in those Type 4 (Events/Changes) Methods
time points individually. DeltaCon96 provides a graph similarity scoring func-
tion based on a number of desirable principles.85,96
It has been shown to effectively distinguish between,
Type 3 (Subgraphs) Methods and correctly mark as more important, the removal of
NetSpot52 is designed to find optimal anoma- ‘bridging’ edges and edges with higher weights, com-
lous subgraphs.52 Its alternating optimization pared to edges that do not affect the overall structure
approach—fixing a subgraph and finding the opti- if removed or have a low weight. Anomalies are found
mal time window, then fixing the time window and by finding the similarity between two adjacent graphs
finding the optimal subgraph within the window, in the stream, flagging those that are sufficiently differ-
and repeat—restricts the method to operating on an ent from their immediate neighbors as outliers. While
entire graph series at a time. The output from the not as robust as methods such as Ref 50 that perform
algorithm is a set of highly anomalous subgraphs both sequential and retrospective analysis, there is a
and their corresponding time windows, solving the commensurate decrease in the run time, scaling lin-
problem of attribution and requiring no further anal- early with the number of edges in the networks.
ysis on the part of the user. Additionally, using their The tensor decomposition method ParCube85
novel seed generation algorithm for mining heavy is the only method we discuss that is capable of
subgraphs, it is shown to scale linearly with the size handling graphs that do not fit in memory. Moreover,
of the network in terms of both vertices and time it is parallelizable on a shared memory machine
slices. While the authors assign outlier scores to the and has been designed specifically for sparse tensors.
edges in the network in their own way, it would be As with other tensor methods, it is able to handle
interesting to utilize an algorithm specifically designed any type of graphs with arbitrary attributes, as long
to assign outlier scores to edges in a network.48,74,50,47 as the attributes can be reduced to a real number
Preprocessing the networks using an auxiliary edge (e.g., edge labels being enumerated and mapping
scoring method allows a mapping from networks that them to integers). Using an online threshold, e.g.,
do not fit the requirements of being plain, weighted, maintaining the mean and standard deviation of the
and undirected to networks that do. reconstruction error and setting the threshold to two
As mentioned above, tensor methods are capa- standard deviations above the mean, ParCube can be
ble of identifying anomalous subgraphs using the applied to graph streams.
reconstruction error for particular portions of the
tensor. Without user guidance this is prohibitively
expensive, as the number of possible subgraphs is
CONCLUSION
exponential on the number of vertices in the graph. In this survey, we hoped to bridge the gap between the
However, if the user is interested in a particular set of increasing interest in anomaly detection in dynamic
vertices, or a particular vertex and its neighbors, the networks and the extensive literature available.
error can be tracked for just that region of the graph. A common high-level theoretical formulation was

Anomaly detection in dynamic networks

Community Compression Decomposition Distance Probabilistic model


based based based based based

Vertex detection Vertex detection


Vertex detection [129] Edge detection [90, 97]
[55, 56, 67, 106] Edge detection Event detection [1, 80] Edge detection
Subgraph detection [22] [68, 72, 75, 77, Subgraph detection [61]
[3, 13, 30, 31] Change detection 83, 84, 93, 113, [60, 86, 87] Subgraph detection
Change detection [112] 114, 115, 122] Event detection [37, 88, 120]
[38] Change detection [12, 17, 44, 76, Event detection
[7, 65, 66] 79, 82, 91, 95] [5, 62, 64]

FIGURE 5 | The proposed two-tiered taxonomy that illustrates the layout of the survey. Methods are first categorized based on their overarching
approach, then subdivided based on the types of anomalies they detect.

Volume 7, May/June 2015 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. 241
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Advanced Review wires.wiley.com/compstats

found among the methods surveyed. Within this methods into five main categories (Figure 5) derived
formulation, all of the methods are concerned with from the intuition behind each: communities, com-
exploring various mappings from graphs to real pression, decomposition, distance, and probabilistic
numbers to best identify anomalous behavior using model based.
existing anomaly detectors. Although a myriad of The field of anomaly detection in dynamic net-
approaches have been proposed and explored, they works is relatively young and is rapidly growing in
all aim to identify one or more of the four key anoma- popularity, as indicated by the number of papers pub-
lies we have outlined for dynamic networks: vertices lished in the past five years. A concise summary of the
(Figure 1), edges (Figure 2), subgraphs (Figure 3), methods discussed in this survey is given in Tables 3
and events or changes (Figure 4). In conjunction with and 4, showing the type of anomalies detected, year
the types of anomalies identified, we partition the of publication, and time complexity (notation defined

TABLE 7 Public Datasets


Dataset Network Type1 Description
SNAP Multiple A large collection of networks, both static and dynamic, including domains
such as network traffic, social networks, and online reviews. See also
http://www-personal.umich.edu/mejn/netdata/Newman’s collection.
VAST Challenge 20082,3 Social A who-calls-whom dataset, recording 10 fictional days of calls and 400
unique cell phones (vertices).
PeMS Vehicle traffic Provides up to 10 years of real historical data on the road networks in
California.
Wikipedia Web links Provides an expansive Wikipedia data set that has information such as
page links, page views, and page revisions.
Enron Communication A who-emails-whom network between about 150 former Enron
employees, mostly upper management.
IMDB Social Over 100 years worth of movie data, including titles, producers, and
actors.
DBLP Co-author Computer science co-author network.
US patent citations Citation A list of almost 3 million U.S. patents granted between January 1963 and
December 1999, and all citations made to these patents between 1975
and 1999.
Climate Reanalysis4 Climate Various climate variables (air temperature, precipitable water) assimilated
from 1948 to the present, across the globe in a fixed grid.
Reality Commons Communication and social Multiple datasets, each with information on who-calls-whom and
proximity based connections.
HEP-Th Citation network Consolidated data from the http://www.cs.cornell.edu/projects/kddcup/
datasets.htmlKDD 2003 datasets for theoretical high energy physics
papers.
Can-o-sleep Communication A p2p filesharing network consisting of records of all the mp3 files shared
and transferred over 81 days.
KDD Cup 19992 TCP traffic Labeled network traffic data that is a version of the simulated network
data made available by the http://www.ll.mit.edu/mission/
communications/cyber/CSTcorpora/ideval/data/1998data.html1988
DARPA Intrusion Detection Evaluation Program.
Facebook Social A list of Facebook wall posts, where a wall posts between two users
creates a timestamped edge.
House and Senate Sponsors Co-sponsorship The co-sponsorship networks of 280,000 pieces of legislation proposed in
the U.S. House and Senate from 1973 to 2004.
1 Data may not already be in network format.
2 Syntheticinstead of real data.
3 Many more datasets are available from the 2008, 2009, and 2010 challenges.
4 See Ref 128 for one example of creating climate networks from raw data.

242 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. Volume 7, May/June 2015
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
WIREs Computational Statistics Anomaly detection in dynamic networks

in Tables 2 and 5). A list of papers that have made NOTES


their code available to the public is given in Table 6. aThroughout this paper we will use the terms network
Moreover, there are many publicly accessible datasets
and graph interchangeably.
for testing the available methods or for personal use, b
Information on the R programming language can be
see Table 7. For further reading on anomaly detection
found at http://www.r-project.org/
in graphs and other domains we direct the reader to
Refs 16–18, 20–22, 28, 129, 130.

ACKNOWLEDGMENTS
The authors would like to thank the reviewers for their excellent feedback and suggestions. We would also like
to thank the authors of the work we cited who provided feedback and additional information regarding their
papers. This work was supported in part by the DOE SDAVI Institute and the U.S. National Science Foundation
(Expeditions in Computing program). In addition, this material is based upon work supported in part with
funding from the Laboratory for Analytic Sciences. Any opinions, findings, conclusions, or recommendations
expressed in this material are those of the author(s) and do not necessarily reflect the views of the Laboratory
for Analytic Sciences and/or any agent or entity of the United States government.

REFERENCES
1. Cheng H, Tan P-N, Potter C, Klooster S. A robust 8. Priebe CE, Conroy JM, Marchette DJ, Park Y. Scan
graph-based algorithm for detection and characteriza- statistics on Enron graphs. Comput Math Organ
tion of anomalies in noisy multivariate time series. In: Theory 2005, 11:229–247.
IEEE International Conference on Data Mining Work- 9. Wang H, Tang M, Park Y, Priebe CE. Locality statis-
shops, Pisa, Italy, 2008, 349–358. tics for anomaly detection in time series of graphs.
2. Cheng H, Tan P-N, Potter C , Klooster SA. Detection arXiv:1306.0267, 2013.
and characterization of anomalies in multivariate time 10. Chen F, Neill DB. Non-parametric scan statistics
series. In: Proceedings of the 9th SIAM International for event detection and forecasting in heterogeneous
Conference on Data Mining (SDM), Sparks, NV, 2009, social media graphs. In: Proceedings of the 20th ACM
413–424. SIGKDD International Conference on Knowledge
3. Chen Z, Hendrix W, Guan H, Tetteh IK, Choudhary Discovery and Data Mining. New York: ACM; 2014,
A, Semazzi F, Samatova NF. Discovery of extreme 1166–1175.
events-related communities in contrasting groups of 11. Akoglu L, McGlohon M, Faloutsos C. Odd-
physical system networks. Data Min Knowl Discov ball: spotting anomalies in weighted graphs. In:
2013, 27:225–258. Advances in Knowledge Discovery and Data Mining.
4. Zhang T, Zhuang X, Pande S, Lee W. Anomalous Berlin/Heidelberg: Springer; 2010, 410–421.
path detection with hardware support. In: Proceedings 12. Eberle W, Holder L. Detecting anomalies in cargo
of the 2005 International Conference on Compilers, using graph properties. In: Intelligence and Secu-
Architectures and Synthesis for Embedded Systems. rity Informatics. Berlin/Heidelberg: Springer; 2006,
New York: ACM; 2005, 43–54. 728–730.
5. Ding Q Katenka N, Barford P, Kolaczyk E , Crovella 13. Eberle W, Holder L. Anomaly detection in data repre-
M. Intrusion as (anti) social communication: charac- sented as graphs. Intell Data Anal 2007, 11:663–689.
terization and detection. In: Proceedings of the 18th
ACM International Conference on Knowledge Discov- 14. Noble CC, Cook DJ. Graph-based anomaly detection.
ery and Data Mining (SIGKDD), Beijing, China, 2012, In: Proceedings of the 9th ACM International Con-
886–894. ference on Knowledge Discovery and Data Mining
(SIGKDD), Washington, DC, 2003, 631–636.
6. Ghoting A, Eric Otey M, Parthasarathy S. Loaded:
link-based outlier and anomaly detection in evolving 15. Barnett V, Lewis T. Outliers in Statistical Data, vol. 3.
data sets. In: Proceedings of the 4th IEEE Interna- New York: John Wiley & Sons; 1994608 p.
tional Conference on Data Mining (ICDM), Brighton, 16. Chandola V, Banerjee A, Kumar V. Anomaly detec-
UK, 2004, 387–390. tion: a survey. ACM Comput Surv 2009, 41:15.
7. Jing X, Shelton CR. Intrusion detection using contin- 17. Chandola V, Banerjee A, Kumar V. Anomaly detection
uous time Bayesian networks. J Artif Intell Res 2010, for discrete sequences: a survey. IEEE Trans Knowl
39:745–774. Data Eng 2012, 24:823–839.

Volume 7, May/June 2015 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. 243
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Advanced Review wires.wiley.com/compstats

18. Chandola V, Mithal V, Kumar V. Comparative eval- 38. Klema V, Laub AJ. The singular value decomposition:
uation of anomaly detection techniques for sequence its computation and some applications. IEEE Trans
data. In: Proceedings of the 8th IEEE International Automatic Control 1980, 25:164–176.
Conference on Data Mining (ICDM), Pisa, Italy, 2008, 39. Kolda TG, Bader BW. Tensor decompositions and
743–748. applications. SIAM Rev 2009, 51:455–500.
19. Hawkins DM. Identification of Outliers, vol. 11. 40. Gao X, Xiao B, Tao D, Li X. A survey of graph edit
London: Chapman and Hall; 1980. distance. Pattern Anal Appl 2010, 13:113–129.
20. Akoglu L, Tong H, Koutra D. Graph-based anomaly 41. Rahm E, Bernstein PA. A survey of approaches to auto-
detection and description: a survey. Data Min Knowl matic schema matching. VLDB J 2001, 10:334–350.
Discov 2014, 28:1–63. 42. Koller D, Friedman N. Probabilistic Graphical Mod-
21. Bilgin CC, Yener B. Dynamic network evolution: els: Principles and Techniques. Cambridge, MA: MIT
models, clustering, anomaly detection. IEEE Netw Press; 2009.
2006. 43. Glaz J. Scan Statistics. Berlin/Heidelberg: Springer;
22. Gupta M, Gao J, Aggarwal CC, Han J. Outlier 2001.
Detection for Temporal Data. San Rafael, CA: Morgan 44. Akoglu L, Faloutsos C. Event detection in time series
& Claypool Publishers; 2014. of mobile communication graphs. 27th Army Sci Conf
23. Hodge VJ, Austin J. A survey of outlier detection 2010, 2:18.
methodologies. Artif Intell Rev 2004, 22:85–126. 45. Gao J, Liang F, Fan W, Wang C, Sun Y, Han J.
24. Agyemang M, Barker K, Alhajj R. A comprehensive On community outliers and their efficient detection
survey of numeric and symbolic outlier mining tech- in information networks. In: Proceedings of the 16th
niques. Intell Data Anal 2006, 10:521–538. ACM International Conference on Knowledge Discov-
25. Brent West D. Introduction to Graph Theory, vol. 2. ery and Data Mining (SIGKDD), Seattle, WA, 2010,
Upper Saddle River, NJ: Prentice Hall; 2001. 813–822.
46. Ji T, Yang D, Gao J. Incremental local evolution-
26. Balakrishnan R, Ranganathan K. A Textbook of
ary outlier detection for dynamic social networks.
Graph Theory. Berlin/Heidelberg: Springer; 2012.
In: Machine Learning and Knowledge Discovery in
27. Cook DJ, Holder LB. Mining Graph Data. New York: Databases. Berlin/Heidelberg: Springer; 2013, 1–15.
John Wiley & Sons; 2006.
47. Li X, Li Z, Han J, Lee J-G. Temporal outlier detection
28. Samatova NF, Hendrix W, Jenkins J, Padmanabhan K, in vehicle traffic data. In: Proceedings of the 25th Inter-
Chakraborty A. Practical Graph Mining with R. Boca national Conference on Data Engineering (ICDE),
Raton, FL: CRC Press; 2013. Shanghai, China, 2009, 1319–1322.
29. Fortunato S. Community detection in graphs. Phys 48. James A, Tina E-R, Nishchal D. Detecting novel dis-
Rep 2010, 486:75–174. crepancies in communication networks. In: Proceed-
30. Lancichinetti A, Fortunato S. Community detection ings of the 10th IEEE International Conference on
algorithms: a comparative analysis. Phys Rev E 2009, Data Mining (ICDM), Sydney, Australia, 2010, 8–17.
80:056117. 49. Rattigan MJ, Jensen D. The case for anomalous
31. Reichardt J, Bornholdt S. Statistical mechanics of link discovery. ACM SIGKDD Explor Newsl 2005,
community detection. Phys Rev E 2006, 74:016110. 7:41–47.
32. Harenberg S, Bello G, Gjeltema L, Ranshous S, Har- 50. Heard NA, Weston DJ, Platanioti K, Hand DJ.
lalka J, Seay R, Padmanabhan K, Samatova N. Com- Bayesian anomaly detection methods for social net-
munity detection in large-scale networks: a survey works. Ann Appl Stat 2010, 4:645–662.
and empirical evaluation. WIREs Comput Stat 2014, 51. Greene D, Doyle D, Cunningham P. Tracking the
6:426–439. evolution of communities in dynamic social networks.
33. Rissanen J. Modeling by shortest data description. In: 2010 International Conference on Advances in
Automatica 1978, 14:465–471. Social Networks Analysis and Mining (ASONAM),
2010, 176–183.
34. Rissanen J. A universal prior for integers and estima-
tion by minimum description length. Ann Stat 1983, 52. Mongiov M, Bogdanov P, Ranca R, Papalexakis EE,
11:416–431. Faloutsos C, Singh AK. Netspot: spotting signifi-
cant anomalous regions on dynamic networks. In:
35. Rissanen J. Minimum-Description-Length Principle. Proceedings of the 13th SIAM International Confer-
New York: John Wiley & Sons; 1985. ence on Data Mining (SDM), Austin, TX, 2013.
36. Grünwald P. The Minimum Description Length Prin- 53. Miller BA, Beard MS, Bliss NT. Eigenspace analysis
ciple. Cambridge, MA: MIT Press; 2007. for threat detection in social networks. In: 2011
37. Golub GH, Reinsch C. Singular value decomposition Proceedings of the 14th International Conference on
and least squares solutions. Numerische Mathematik Information Fusion (FUSION), Chicago, IL, 2011,
1970, 14:403–420. 1–7.

244 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. Volume 7, May/June 2015
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
WIREs Computational Statistics Anomaly detection in dynamic networks

54. Ramanathan A, Agarwal PK, Kurnikova M, Lang- International Conference on Data Mining (SDM),
mead CJ. An online approach for mining collec- Anaheim, CA, 2012, 624–635.
tive behaviors from molecular dynamics simulations. 69. Chen Z, Hendrix W, Samatova NF. Community-based
J Comput Biol 2010, 17:309–324. anomaly detection in evolutionary networks. J Intell
55. Saligrama V, Chen Z. Video anomaly detection based Inf Syst 2012, 39:59–85.
on local statistical aggregates. In: 2012 IEEE Con- 70. Gupta M, Gao J, Sun Y, Han J. Integrating community
ference on Computer Vision and Pattern Recognition matching and outlier detection for mining evolution-
(CVPR), Providence, RI, 2012, 2112–2119. ary community outliers. In: Proceedings of the 18th
56. Tran L, Navasca C, Luo J. Video detection anomaly via ACM International Conference on Knowledge Discov-
low-rank and sparse decompositions. In: Image Pro- ery and Data Mining (SIGKDD), Beijing, China, 2012,
cessing Workshop (WNYIPW), 2012 Western New 859–867.
York, New York, NY, 2012, 17–20. 71. Gupta M, Gao J, Sun Y, Han J. Community trend
57. Koutra D, Papalexakis EE, Faloutsos C. Tensorsplat: outlier detection using soft temporal pattern min-
spotting latent anomalies in time. In: 2012 16th ing. In: Machine Learning and Knowledge Discov-
Panhellenic Conference on Informatics (PCI), Piraeus, ery in Databases. Berlin/Heidelberg: Springer; 2012,
Greece, 2012, 144–149. 692–708.
58. Ishibashi K, Kondoh T, Harada S, Mori T, Kawahara 72. Rossi RA, Gallagher B, Neville J, Henderson K. Mod-
R, Asano S. Detecting anomalous traffic using commu- eling dynamic behavior in large evolving graphs. In:
nication graphs. In: Telecommunications: The Infras- Proceeding of the 6th ACM International Conference
tructure for the 21st Century (WTC), Vienna, Austria, on Web Search and Data Mining (WSDM), Rome,
2010, 1–6 Italy, 2013, 667–676.
59. Suykens JAK, Vandewalle J. Least squares support 73. Araujo M, Papadimitriou S, Günnemann S, Faloutsos
vector machine classifiers. Neural Process Lett 1999, C, Basu P, Swami A, Papalexakis EE, Koutra D.
9:293–300. Com2: fast automatic discovery of temporal (’comet’)
60. Breunig MM, Kriegel H-P, Ng RT, Sander J. LOF: communities. In: Proceedings of the 18th Pacific-Asia
identifying density-based local outliers. In: ACM Sig- Conference on Knowledge Discovery and Data Mining
mod Record, vol. 29. New York: ACM; 2000, 93–104. (PAKDD), Tainan, Taiwan, 2014, 271–283.

61. Aggarwal CC, Yu PS. Outlier detection for high 74. Chakrabarti D. Autopart: parameter-free graph parti-
dimensional data. In: ACM Sigmod Record, vol. 30. tioning and outlier detection. In: Proceedings of the
New York: ACM; 2001, 37–46. 8th European Conference on Principles and Practice
of Knowledge Discovery in Databases (PKDD), Pisa,
62. Angiulli F, Pizzuti C. Fast outlier detection in high Italy, 2004, 112–124.
dimensional spaces. In: Principles of Data Mining and
Knowledge Discovery. Berlin/Heidelberg: Springer; 75. Sun J, Faloutsos C, Papadimitriou S, Yu PS. Graph-
2002, 15–27. scope: parameter-free mining of large time-evolving
graphs. In: Proceedings of the 13th ACM International
63. Angiulli F, Pizzuti C. Outlier mining in large Conference on Knowledge Discovery and Data Mining
high-dimensional data sets. IEEE Trans Knowl (SIGKDD), San Jose, CA, 2007, 687–696.
Data Eng 2005, 17:203–215.
76. Ide T, Kashima H. Eigenspace-based anomaly detec-
64. Parsons L, Haque E, Liu H. Subspace clustering for tion in computer systems. In: Proceedings of the 10th
high dimensional data: a review. ACM SIGKDD ACM International Conference on Knowledge Discov-
Explor Newsl 2004, 6:90–105. ery and Data Mining (SIGKDD), Seattle, WA, 2004,
65. Srivastava AN. Enabling the discovery of recur- 440–449.
ring anomalies in aerospace problem reports using 77. Lakhina A, Crovella M, Diot C. Diagnosing
high-dimensional clustering techniques. In: Aerospace network-wide traffic anomalies. In: ACM SIGCOMM
Conference, Big Sky, MT, 2006, 17 pp. Computer Communication Review, vol. 34. New
66. Pokrajac D, Lazarevic A, Latecki LJ. Incremental York: ACM; 2004, 219–230.
local outlier detection for data streams. In: IEEE 78. Sun J, Papadimitriou S, Yu PS. Window-based ten-
Symposium on Computational Intelligence and Data sor analysis on high-dimensional and multi-aspect
Mining, Orlando, FL, 2007, 504–515. streams. In: Proceedings of the 6th IEEE International
67. Duan D, Li Y, Jin Y, Zhengding L. Community mining Conference on Data Mining (ICDM), Hong Kong,
on dynamic weighted directed graphs. In: Proceedings China, 2006, 1076–1080.
of the 1st ACM International Workshop on Complex 79. Sun J, Tao D, Christos F. Beyond streams and graphs:
Networks Meet Information & Knowledge Manage- dynamic tensor analysis. In: Proceedings of the 12th
ment. New York: ACM; 2009, 11–18. ACM International Conference on Knowledge Discov-
68. Aggarwal CC, Subbian K. Event detection in ery and Data Mining (SIGKDD), Philadelphia, PA,
social streams. In: Proceedings of the 12th SIAM 2006, 374–383.

Volume 7, May/June 2015 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. 245
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Advanced Review wires.wiley.com/compstats

80. Sun J, Xie Y, Zhang H, Faloutsos C. Less is more: com- Information and Communication Technology. New
pact matrix decomposition for large sparse graphs. York: ACM; 2011, 36–41.
In: Proceedings of the 7th SIAM International Con- 93. Berlingerio M, Koutra D, Eliassi-Rad T, Faloutsos
ference on Data Mining (SDM), Minneapolis, MN, C. Netsimile: a scalable approach to size-independent
2007. network similarity. arXiv:1209.2684, 2012.
81. Kolda TG, Sun J. Scalable tensor decompositions 94. He W, Guangmin H, Zhou Y. Large-scale ip network
for multi-aspect data mining. In: Proceedings of the behavior anomaly detection and identification using
8th IEEE International Conference on Data Mining substructure-based approach and multivariate time
(ICDM), Pisa, Italy, 2008, 363–372. series mining. Telecomm Syst 2012, 50:1–13.
82. Tong H, Papadimitriou S, Sun J, Yu PS, Faloutsos 95. Malliaros FD, Megalooikonomou V, Faloutsos C.
C. Colibri: fast mining of large static and dynamic Fast robustness estimation in large social graphs:
graphs. In: Proceedings of the 14th ACM International communities and anomaly detection. In: Proceedings
Conference on Knowledge Discovery and Data Mining of the 12th SIAM International Conference on Data
(SIGKDD), Las Vegas, NV, 2008, 686–694. Mining (SDM), Anaheim, CA, 2012, 942–953.
83. Jiang R, Fei H, Huan J. Anomaly localization for 96. Koutra D, Vogelstein J, Faloutsos C. Deltacon: a prin-
network data streams with graph joint sparse PCA. cipled massive-graph similarity function. In: Proceed-
In: Proceedings of the 17th ACM International Con- ings of the 13th SIAM International Conference on
ference on Knowledge Discovery and Data Mining Data Mining (SDM), Texas-Austin, TX, 2013.
(SIGKDD), San Diego, CA, 2011, 886–894.
97. Mongiovi M, Bogdanov P, Singh AK. Mining evolving
84. Miller BA, Arcolano N, Beard MS, Kepner J, Schmidt network processes. In: Proceedings of the 13th IEEE
MC, Bliss NT, Wolfe PJ. A scalable signal processing International Conference on Data Mining (ICDM),
architecture for massive graph analysis. In: 2012 IEEE Dallas, TX, 2013.
International Conference on Acoustics, Speech and
98. Huang Z, Dajun Zeng D. A link prediction approach
Signal Processing (ICASSP), 2012, 5329–5332.
to anomalous email detection. In: IEEE International
85. Papalexakis EE, Faloutsos C, Sidiropoulos ND. Par- Conference on Systems, Man and Cybernetics, San
cube: sparse parallelizable tensor decompositions. Diego, CA, 2006, 1131–1136.
In: Machine Learning and Knowledge Discovery
99. Pandit S, Horng Chau D, Wang S, Faloutsos C. Net-
in Databases. Berlin/Heidelberg: Springer; 2012,
probe: a fast and scalable system for fraud detection in
521–536.
online auction networks. In: Proceedings of the 16th
86. Miller BA, Arcolano N, Bliss NT. Efficient anomaly International Conference on World Wide Web, Banff,
detection in dynamic, attributed graphs: emerging Canada, 2007, 201–210.
phenomena and big data. In: 2013 IEEE International
100. Hirose S, Yamanishi K, Nakata T, Fujimaki R. Net-
Conference on Intelligence and Security Informatics
work anomaly detection based on Eigen equation com-
(ISI), Washington, DC, 2013, 179–184.
pression. In: Proceedings of the 15th ACM Interna-
87. Yu W, Aggarwal CC, Ma S, Wang H. On anomalous tional Conference on Knowledge Discovery and Data
hotspot discovery in graph streams. In: Proceedings Mining (SIGKDD), Paris, France, 2009, 1185–1194.
of the 13th IEEE International Conference on Data
101. Thompson B, Eliassi-Rad T. Dapa-v10: discovery
Mining (ICDM), Dallas, TX, 2013.
and analysis of patterns and anomalies in volatile
88. Pincombe B. Anomaly detection in time series of time-evolving networks. In: Notes of the 1st Workshop
graphs using Arma processes. ASOR Bull 2005, 24:2. on Information in Networks (WIN), New York, NY,
89. Gaston ME, Kraetzl M, Wallis WD. Using graph 2009.
diameter for change detection in dynamic networks. 102. Djidjev H, Sandine G, Storlie C, Vander Wiel S.
Australasian J Combinatorics 2006, 35:299. Graph based statistical analysis of network traffic. In:
90. Papadimitriou P, Dasdan A, Garcia-Molina H. Web Proceedings of the Ninth Workshop on Mining and
graph similarity for anomaly detection. J Internet Serv Learning with Graphs, San Diego, CA, 2011.
Appl 2010, 1:19–30. 103. Aggarwal CC, Zhao Y, Yu PS. Outlier detection in
91. Arackaparambil C, Yan G. Wiki-watchdog: anomaly graph streams. In: Proceedings of the 27th Interna-
detection in Wikipedia through a distributional lens. tional Conference on Data Engineering (ICDE), Han-
In: Proceedings of the 2011 IEEE/WIC/ACM Interna- nover, Germany, 2011, 399–409.
tional Conferences on Web Intelligence and Intelligent 104. Neil J, Hash C, Brugh A, Fisk M, Storlie CB. Scan
Agent Technology. Washington, DC: IEEE Computer statistics for the online detection of locally anomalous
Society; 2011, 257–264. subgraphs. Technometrics 2013, 55:403–414.
92. Quoc Le D, Jeong T, Eduardo Roman H, Won-Ki 105. Tantipathananandh C, Berger-Wolf T. Constant-
Hong J. Traffic dispersion graph based anomaly detec- factor approximation algorithms for identifying
tion. In: Proceedings of the Second Symposium on dynamic communities. In: Proceedings of the 15th

246 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. Volume 7, May/June 2015
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
WIREs Computational Statistics Anomaly detection in dynamic networks

ACM International Conference on Knowledge Dis- 118. Chazelle B. The Discrepancy Method: Randomness
covery and Data Mining (SIGKDD), Paris, France, and Complexity. Cambridge, UK: Cambridge Univer-
2009, 827–836 sity Press; 2002.
106. Tantipathananandh C , Berger-Wolf T. Finding com- 119. Tong H, Lin C-Y. Non-negative residual matrix factor-
munities in dynamic social networks. In: Proceed- ization with application to graph anomaly detection.
ings of the 11th IEEE International Conference on In: SDM. Mesa, AZ: SIAM; 2011, 143–153.
Data Mining (ICDM), Vancouver, Canada, 2011, 120. Bogdanov P, Mongiov M, Singh AK. Mining heavy
1236–1241. subgraphs in time-evolving networks. In: Proceedings
107. Tantipathananandh C , Berger-Wolf T, Kempe D. A of the 11th IEEE International Conference on Data
framework for community identification in dynamic Mining (ICDM), Vancouver, Canada, 2011, 81–90.
social networks. In: Proceedings of the 13th ACM 121. Mahadevan P, Krioukov D, Fall K, Vahdat A. Sys-
International Conference on Knowledge Discovery tematic topology analysis and generation using degree
and Data Mining (SIGKDD), San Jose, CA, 2007, correlations. In: ACM SIGCOMM Computer Com-
717–726. munication Review, vol. 36. New York: ACM; 2006,
108. Gupta M, Gao J, Han J. Community distribution 135–146.
outlier detection in heterogeneous information net- 122. Sala A, Cao L, Wilson C, Zablit R, Zheng H, Zhao
works. In: Machine Learning and Knowledge Discov- BY. Measurement-calibrated graph models for social
ery in Databases. Berlin/Heidelberg: Springer; 2013, network experiments. In: Proceedings of the 19th
557–573. International Conference on World Wide Web. New
109. Grünwald P. A Tutorial Introduction to the Minimum York: ACM; 2010, 861–870.
Description Length Principle. Cambridge, MA: MIT 123. Wang J, Provan GM. Generating application-specific
Press; 2005. benchmark models for complex systems. In: AAAI,
110. Harshman RA. Foundations of the Parafac Procedure: 2008, 566–571.
Models and Conditions for an “Explanatory” Mul- 124. Koutra D, Ke T-Y, Kang U, Horng Polo Chau D,
timodal Factor Analysis. Los Angeles, CA: UCLA; Kenneth Pao H-K, Faloutsos C. Unifying guilt-by-
1970. association approaches: theorems and fast algorithms.
111. Jolliffe I. Principal Component Analysis. New York: In: Machine Learning and Knowledge Discovery
John Wiley & Sons; 2005. in Databases. Berlin/Heidelberg: Springer; 2011,
112. Barnathan M, Megalooikonomou V, Faloutsos C, 245–260.
Faro S, Mohamed FB. Twave: high-order analysis of 125. Box GEP, Jenkins G. Time Series Analysis: Forecasting
functional MRI. Neuroimage 2011, 58:537–548. and Control. San Francisco, CA: Holden-Day; 1970.
113. Tucker LR. Some mathematical notes on three-mode 126. Charikar MS. Similarity estimation techniques
factor analysis. Psychometrika 1966, 31:279–311. from rounding algorithms. In: Proceedings of the
114. Chung F, Linyuan L, Van V. Spectra of random graphs Thiry-Fourth Annual ACM Symposium on Theory of
with given expected degrees. Proc Natl Acad Sci 2003, Computing. New York: ACM; 2002, 380–388.
100:6313–6318. 127. Gupta AK, Nagar DK. Matrix Variate Distributions,
115. Ringberg H, Soule A, Rexford J, Diot C. Sensitivity vol. 104. Boca Raton, FL: CRC Press; 1999.
of PCA for traffic anomaly detection. ACM SIGMET- 128. Steinhaeuser K, Chawla NV, Ganguly AR. Complex
RICS Perf Eval Rev 2007, 35:109–120. networks as a unified framework for descriptive anal-
116. Chen Y, Miao D, Zhang H. Neighborhood outlier ysis and predictive modeling in climate science. Stat
detection. Expert Syst Appl 2010, 37:8745–8749. Anal Data Min 2011, 4:497–511.
117. Papadimitriou S, Kitagawa H, Gibbons PB, Falout- 129. Aggarwal CC. Outlier Analysis. Berlin/Heidelberg:
sos C. Loci: fast outlier detection using the local Springer; 2013.
correlation integral. In: Proceeding of the 19th Inter- 130. Gupta M, Gao J, Aggarwal CC, Han J. Outlier
national Conference on Data Engineering, 2003, detection for temporal data: a survey. IEEE Trans
315–326. Knowl Data Eng 2013, 25:1.

Volume 7, May/June 2015 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. 247

You might also like