Anomaly Detection in Dynamic Networks - A Survey
Anomaly Detection in Dynamic Networks - A Survey
Anomaly Detection in Dynamic Networks - A Survey
points in time that are unlike the rest. There are many used in the papers we discuss. We then introduce four
high-impact and practical applications of anomaly different types of anomalies that these algorithms
detection spanning numerous domains. A small sam- detect, namely, anomalous vertices, edges, subgraphs,
ple includes: detection of ecological disturbances, and events. We continue with an extensive overview
such as wildfires1,2 and cyclones3 ; intrusion detection of the existing methods based on the proposed tax-
for individual systems4 and network systems5–7 ; iden- onomy, illustrated in Figure 5 that takes into account
tifying abnormal users and events in communication their underlying design principles, such as those based
networks8,9 ; and detecting civil unrest using twitter on graph communities, compression, decomposition,
feeds.10 distance metrics, and probabilistic modeling of graph
The ubiquitousness and importance of anomaly features. Each taxonomic group is then subcatego-
detection in dynamic networks has led to the emer- rized further based on the types of anomalies detected.
gence of dozens of methods over the last 5 years Finally, we end with a more in-depth discussion of
(see Tables 3 and 4). These methods complement the methods that have code publicly available (see
techniques for static networks,11–14 as the latter often Table 6), highlighting pros and cons of each.
cannot be easily adopted for dynamic networks.
When considering the dynamic nature of the data,
new challenges are introduced, including: BACKGROUND
Anomaly, or outlier, detection is a problem that spans
• New types of anomalies are introduced as a result many domains. Chandola et al.16 provide an excellent
of the graph evolving over time, for example, overview, taxonomy, and analysis of a multitude
splitting, disappearing, or flickering communi- of techniques (e.g., classification, clustering, and
ties. statistical) for a variety of domains, expanding the
• New graphs or updates that arrive over time work of Hodge et al.23 and Agyemang et al.24 As our
need to be stored and analyzed. Storing all the focus will be on graphs, it is important to have a
new graphs in their entirety can vastly increase basic understanding of graph theory. West et al.25 and
the size of the data. Therefore, typical offline Balakrishnan et al.26 both offer comprehensive and
analysis, where multiple passes over the data are approachable introductions to graph theory, covering
acceptable and all of the data are assumed to all of the basics required for this survey and well
fit into memory, becomes infeasible. Conversely, beyond. Cook and Holder27 show how the theoretical
storing only the most recent graph or updates concepts can be applied for graph mining, and recently
restricts the analysis to a single point in time. Samatova et al.28 provide an overview of many graph
mining techniques as well as implementation details
• Graphs from different domains, such as social in the R programming language.b Owing to space
networks compared to gene networks, may limitations we cannot provide an introduction to the
exhibit entirely different behavior over time. fundamentals of the types of methods we discuss, so
This divergence in evolution can lead to we provide references for introductory and overview
application-specific anomalies and approaches. material for each of them in Table 1.
• Anomalies, particularly those that are slow to It is important to note that in many domains
develop and span multiple time steps, can be hard the data are naturally represented as a network, with
to differentiate from organic graph evolution. the vertices and edges clearly defined. However, in
224 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. Volume 7, May/June 2015
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
WIREs Computational Statistics Anomaly detection in dynamic networks
some cases, how to represent the data as a network some applications, and conclude with a representative
is unclear and can depend on the specific research example.
question being asked. The conversion processes used
in specific domains are outside the scope of this survey,
and we assume all data are already represented as Type 1: Anomalous Vertices
dynamic networks. Anomalous vertex detection aims to find a subset
of the vertices such that every vertex in the subset
has an ‘irregular’ evolution compared to the other
TYPES OF ANOMALIES vertices in the graph. Optionally, the time point(s)
In this section, we identify and formalize four types where the vertices are determined to be anomalous
of anomalies that arise in dynamic networks. These can be identified. What constitutes irregular behavior
categories represent the output of the methods, not is dependent on the specific method, but it can be
the implementation details of how they detect the generalized by assuming that each method provides
anomalies, e.g., comparing consecutive time points, a function that scores each vertex’s behavior, e.g.,
using a sliding window technique. Note that often measuring the change in the degree of a vertex from
times in real-world graphs, (e.g., social and biological time step to time step. In static graphs, the single
networks) the graph’s vertex set V is called a set of snapshot allows only intragraph comparisons to be
nodes. However, to avoid confusion with nodes in made, such as finding vertices with an abnormal
a physical computer network, and to align with the egonet density.11 Dynamic graphs allow the temporal
abstract mathematical representation, we call it a set dynamics of the vertex to be included, introducing new
of vertices. types of anomalies that are not present in static graphs.
Because the graphs are assumed to be dynamic, A high-level definition for a set of anomalous vertices
vertices and edges can be inserted or removed at every is as follows.
time step. For the sake of simplicity, we assume that
the vertex correspondence and the edge correspon- Definition 1. (Anomalous vertices). Given G, the
dence across different time steps are resolved because total vertex set V = ∪Tt=1 Vt , and a specified scoring
of unique labeling of vertices and edges, respectively. ( ′ ) ̂V ⊆ V
function f : V → ℝ, the set of anomalous vertices ′
We define a graph series G as an ordered set of graphs is a vertex set such that ∀ v ∈ V , |f v − f| > c0 ,
′ ′
Volume 7, May/June 2015 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. 225
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Advanced Review wires.wiley.com/compstats
1 1 1 1 1
0.50 0.45 0.45 0.48 0.47 0.93 0.51 0.47 0.49 0.50
2 4 2 4 2 4 2 4 2 4
0.85 0.22 0.90 0.77 0.87 0.23 0.88 0.20 0.91 0.18
3 3 3 3 3
t=1 t=2 t=3 t=4 t=5
FIGURE 2 | An illustration of anomalous edges that occur because of an irregular pattern of their weight over time, with anomalous edges
highlighted. At each time step, a vertex’s weight typically shifts by ± 0.05 at most. However, edge (3, 4) has a spike in its weight at time step 2, unlike
any other time in the series. Similarly, at time step 3, edge (1, 4) spikes in value. These spikes cause the edges to be considered anomalous.
226 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. Volume 7, May/June 2015
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
WIREs Computational Statistics Anomaly detection in dynamic networks
Two specific types of anomalous subgraphs Type 4: Event and Change Detection
are shown in Figure 3. A shrunken community, Unlike the previous three types of anomalies, the two
Figure 3(a), is when a single community loses a sig- types discussed here are exclusively found in dynamic
nificant number of its vertices between time steps. graphs. We start by defining the problem of event
Assuming that a matching of communities between detection, which has attracted much interest in the
adjacent time steps t and t − 1 is known, shrunken data mining community. Event detection has a much
communities can be identified by measuring the broader scope compared to the previous three types
decrease in the number of vertices in the community, of anomalies, aiming to identify time points that are
f (h) = |ht ∩ ht − 1 | − |ht − 1 |. Figure 3(b) is an example of significantly different from the rest. Isolated points
a split community, when a single community divides in time where the graph is unlike the graphs at the
into several distinct communities. Split communities previous and following time points represent events.
will have a high matching score or probability for One approach to measuring the similarity of two
more than one community in the next graph of the graphs is comparing the signature vector of summary
Volume 7, May/June 2015 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. 227
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Advanced Review wires.wiley.com/compstats
1 1 1 1 1
2 6 2 6 2 6 2 6 2 6
3 5 3 5 3 5 3 5 3 5
4 4 4 4 4
t=1 t=2 t=3 t=4 t=5
(a)
1 1 1 1 1
2 6 2 6 2 6 2 6 2 6
3 5 3 5 3 5 3 5 3 5
4 4 4 4 4
t=1 t=2 t=3 t=4 t=5
(b)
FIGURE 4 | An example of an event (a), a change (b), and the difference between them. In both, the graphs initially have only a few edges. At
time step 3, the graphs become highly connected, almost forming a clique. The major difference is observed at the time step following the insertion of
the edges. In Figure 4(a) the edges are then removed and the graph returns to a state similar to the one before the insertion, representing an isolated
event. In Figure 4(b), however, the graph maintains its new state indicating the sustained shift, or change, in the graph.
values extracted from each graph, such as average applications include finding frames in a video that are
clustering coefficient. The task of attribution—finding unlike the others,55,56 and detecting disturbances in
the cause for the event—is not required as part the ecosystem (e.g., wildfires).1
of event detection, and is often omitted. However, Now, we move on to the problem of change
once the time points for events have been identified, detection, which is complementary to event detection.
potential causes can be found by applying techniques It is important to note the distinction between event
for anomaly detection in static graphs.11–14 and change detection. While events represent isolated
incidents, change points mark a point in time where
Definition 4. (Event detection). Given G and a the entire behavior of the graph changes and the
scoring function f : Gt → ℝ, an event is defined as difference is maintained until the next change point,
a time point t, such that |f(Gt ) − f(Gt − 1 )| > c0 and Figure 4 illustrates the difference between the two. The
|f(Gt ) − f(Gt + 1 )| > c0 . distinction between events and change points mani-
fests in the modification of the constraint that the value
One of the simplest similarity metrics for graphs
of the scoring function at t be more than a threshold
is comparing the number of vertices and edges in
value away from both t − 1 and t + 1, to being more
them. In Figure 4(a), the event at time step 3 can be
than a threshold value away from only t − 1.
found by counting the edges in each graph, f (Gt ) = |Et |,
and comparing the adjacent time points. The number Definition 5. (Change detection). Given G and a
of edges in each graph, {6, 6, 14, 7, 7}, shows that scoring function f : Gt → ℝ, a change is defined as
there is a significant change in the structure of the a time point t, such that |f(Gt ) − f(Gt − 1 )| > c0 and
graph at time step 3, almost forming a clique. Note |f(Gt ) − f(Gt + 1 )| ≤ c0 .
that this time point is isolated, with the surrounding
time points being very dissimilar, indicating an event An example of change detection is shown in
occurred. Figure 4(b). Similar to event detection, this change
Event detection, while providing less specific can be identified by comparing the number of edges at
information than vertex, edge, or subgraph detection, adjacent time points. Just as in Figure 4(a), a number
is highly applicable in many areas. Approximating of edges are added to the graph, forming nearly a
the data available using a tensor decomposition, then clique. However, at time step 4, instead of the graph
scoring the time point as the amount of error in reverting back to its original structure, the added
the reconstruction, has been shown to effectively edges are persistent and the graph has a new ‘normal’
detect moments in time when the collective motions behavior. The persistence of the new edges indicates
in molecular dynamics simulations change.54 Other that at t = 3 a change was detected, not an event.
228 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. Volume 7, May/June 2015
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
WIREs Computational Statistics Anomaly detection in dynamic networks
One of the most popular applications of change numbers, then applying existing anomaly detectors to
detection is in networks modeling human interactions, identify the anomalies. Equivalently, they are propos-
such as communication and coauthorship networks. ing methodologies for stage one, and applying existing
Authors that act as a ‘bridge’ between different con- methodologies for stage two. A complete list of the
ferences, or switch areas of interest throughout their methods surveyed is provided in Tables 3 and 4, with
career, will exhibit a number of change points in their the complexity notation provided in Table 5.
publication records.57 Additionally, change detection
has been applied to communication graphs44 and net-
work traffic58 by measuring the change in the Eigen Community Detection
decompositions. Community-based methods track the evolution of
communities and their associated vertices in the
graphs over time.105–107 While the data-specific fea-
METHODS tures generated by the methods discussed in this
section are all derived using the community struc-
A common two-stage methodology was found among ture of the graphs, the various approaches differ in
the papers reviewed. In the first stage, data-specific two main points: (1) in the aspects of the community
features are generated by applying a mapping from the structure they analyze, e.g., the connectivity within
domain-specific representation, graphs, to a common each community versus how the individual vertices are
data representation, a vector of real numbers. A assigned to the communities at each time step, and (2)
simple example is taking a single static graph as in the definitions of communities they use, e.g., soft
input and outputting its diameter. In the second stage, communities where each vertex has a probability of
an anomaly detector is applied, taking the output belonging to each community versus disjoint commu-
from stage one, and optionally some historical data, nities where each vertex is placed into at most one
and mapping it to a decision on the anomalousness community in the graph. Moreover, based on how the
of the data. Anomaly detectors, such as support community evolution is viewed, it can be applied to
vector machines (SVMs)59 or the local outlier factor the detection of different anomaly types. For example,
(LOF) algorithm,60 are general methodologies that are a rapid expansion or contraction of a community
domain-independent, as they operate on the common could indicate that the specific subgraph for that com-
data representation that is output from stage one. munity is undergoing drastic changes, whereas a drop
Hence, stage two consists of identifying outlier points in the number of communities by two corresponds to
in a multidimensional space.60–64 The two stages can an abnormal event.
be formally defined as follows:
Example
Stage one ⇒ Ω ∶ D → ℝd , In Ref 67, given a weighted directed graph series G and
a vertex partitioning similarity function Sim(Is , It ), the
{[ ] [ ] } set of change points is defined as a set of time points
∗ t
Stage two ⇒ f ∶ ℝd , ℝd → {0, 1} , T : {t ∈ T|Sim(Is , It ) ≤ c0 }. The similarity function is
defined as
where D is the domain-specific representation, d is ( ) ( )
( ) PIs It + PIt Is
the number of feature dimensions, [ℝd ]* denotes the Sim Is , It = , (1)
history of the data, and [ℝd ]t denotes the current time 2N
point. Often times, stage one maps the domain-specific ( ) ∑ | |
where PIs It = p max |Isp ∩ c|, Is is the partition-
data to a single value, d = 1, such as the graph diameter c∈Is ∩It | |
example. Additionally, stage two can be generalized ing of the current graph segment Gs and It is the
to mapping to [0, 1] instead of {0, 1}, representing partitioning of the newly arrived graph Gt , N is the
the probability that the data are anomalous or a number of vertices, p is the community index in a
normalized outlier score assigned to the data. partitioning, c is the common community between
This two-stage process can be found in many the two partitionings. Mapping the similarity of two
other areas of anomaly or outlier detection, such as graphs to a single score via vertex partition similar-
text mining and abnormal document detection,65 and ity is the extraction of the data-specific feature, and
computer vision.66 However, in this survey we restrict the anomaly detector is the application of a threshold
the scope to anomaly detection in dynamic graphs. value. If the similarity is above a threshold c0 , then Gt
The papers we surveyed are interested in finding the will be put into the current graph segment Gs , other-
best mapping from dynamic graphs to a vector of real wise, Gt starts a new graph segment Gs + 1 . The graph
Volume 7, May/June 2015 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. 229
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Advanced Review wires.wiley.com/compstats
is partitioned using a relevance matrix computed Based on this logic, using soft community match-
with random walks with restarts and modularity ing and looking at each community individually, the
optimization. average change in belongingness (the probability the
vertex is part of the community) for each vertex can
Vertex Detection be found for consecutive time steps. Vertices whose
change in belongingness is different from the average
A group of vertices that belong to the same community change of the vertices in the community are said to be
are expected to exhibit similar behavior. Intuitively evolutionary community outliers.70
this means that if at consecutive time steps one vertex
The changes in a vertex’s community belong-
in the community has a significant number of new
ingness may form a pattern over time, called a soft
edges added, the other vertices in the community
would also have a significant number of new edges. temporal pattern. In Ref 72, a non-negative matrix
If the rest of the vertices in the community did factorization approach in combination with the min-
not have new edges added, the vertex that did is imum description length (MDL) principle is used to
anomalous. detect automatically vertex roles and build transition
230 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. Volume 7, May/June 2015
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
WIREs Computational Statistics Anomaly detection in dynamic networks
models. The community memberships are found In Ref 46, communities manifest in the form
slightly differently in Ref 71, where Xmeans is used. of corenets. Instead of defining the corenet (commu-
However, in both cases, the patterns common across nity) based solely on density, modularity, or hop dis-
all of the vertices in the graph are extracted, then tance (as egonets are11 ), the definition is based on the
each vertex’s patterns are compared to the extracted weighted paths between vertices. More formally, a ver-
ones. If a vertex’s patterns are not similar to any of tex’s corenet consists of itself and all the vertices within
the extracted common patterns, then the vertex is two hops that have a weighted path above a thresh-
anomalous. Later extended to static networks derived old value. If the edge weight between two vertices is
from heterogeneous data sources,108 the two-step considered the strength of their connection, then intu-
approach in Ref 71 is modified to an alternating itera- itively the vertices connected with higher weight edges
tive approach in Ref 70. Instead of extracting patterns should be considered part of the same community.
first and then identifying outliers, the patterns and Consequently, if a vertex has two neighbors removed,
outliers are found in an alternating fashion (pattern one connected with a high edge weight and the other
extraction → outlier detection → pattern extraction connected with a low edge weight, then the removal of
→ · · ·) until the outliers discovered do not change on the vertex connected by the higher edge weight should
consecutive iterations. By alternating back and forth have more of an impact. At each time step every vertex
between pattern extraction and outlier detection, the is first given an outlier score based on the change in its
algorithm accounts for the affect the outliers have on corenet, and the top k outlier scores are then declared
the communities discovered. anomalous.
Volume 7, May/June 2015 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. 231
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Advanced Review wires.wiley.com/compstats
TABLE 5 Algorithm Complexity Notation that behave similarly can be grouped using clustering
or prior knowledge. In Ref 3, if a community is
Symbol Meaning
conserved across time steps and the networks within
n Number of vertices in the graph its group, but has no corresponding community in
m Number of edges in the graph any other group of networks, then the community is
t Number of time steps defined as anomalous; two communities are consid-
k Number of communities ered corresponding communities if they have a certain
𝒳 Tensor representing the graph
percentage of their vertices in common.
Unlike Refs 3, 69, 73 that consider only the
|𝒳 | Size of the largest mode in 𝒳
structure of the network, in social networks there is
r Size of the largest mode of the core tensor often more information available. For example, in the
approximation (e.g., in the Tucker Twitter user network, clusters can be found based
decomposition113 )
on the content of tweets (edges), as well as the users
N𝒳 Number of modes in 𝒳 (vertices) involved. If the fraction of the tweets (edges)
c Number of columns sampled for low rank added during the recent time window for a cluster is
approximation much larger than the fraction of tweets (edges) added
Exp. Exponential time complexity anytime before the window, then this influx is declared
as an evolution event for that cluster.68 A cluster
that experiences an evolution event is marked as an
Subgraph Detection anomalous subgraph at the time when the evolution
event occurs. Although the authors did not use labeled
Instead of looking at individual vertices and their com- datasets in Ref 73, the proposed algorithm can be
munity belongingness, entire subgraphs that behave applied to such data by appropriately incorporating
abnormally can be found by observing the behavior the information in the tensor.
of communities themselves over time.
232 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. Volume 7, May/June 2015
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
WIREs Computational Statistics Anomaly detection in dynamic networks
graph representation.109 Applying this principle to matrix including the edge to the encoding cost if the
graphs is done by viewing the adjacency matrix of edge is removed. Streaming updates can be performed
a graph as a single binary string, flattened in row using the previous result as a seed for a new run of
or column major order. If the rows and columns of the algorithm, thus avoiding full recomputations.
the matrix can be rearranged such that the entropy
of the binary string representation of the matrix is Change Detection
minimized, then the compression cost (also known as
encoding cost34 ) is minimized. The data-specific fea- The main idea is that consecutive time steps that are
tures are all derived from the encoding cost of the very similar can be grouped together leading to low
graph or its specific substructures; hence, anomalies compression cost. Increases in the compression cost
are then defined as graphs or substructures (e.g., edges) mean that the new time step differs significantly from
that inhibit compressibility. the previous ones, and thus signifies a change.
Volume 7, May/June 2015 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. 233
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Advanced Review wires.wiley.com/compstats
methods for matrices (2-mode tensors) is singular weighted frequencies are found using a decay function,
value decomposition (SVD),37 and for higher order where edges that occurred more recently have a higher
tensors (≥ 3 modes) is PARAFAC,110 a generalization weight. The largest Eigenvalue and its corresponding
of SVD. The main differences among the decomposi- vector obtained by performing PCA on M are sum-
tion based methods are whether they use a matrix or maries of the activity of the vertex and the correlation
a higher order tensor, how the tensor is constructed of its edges, respectively. The time series formed by
(what information is stored), and the method of finding the changes in these values are used to com-
decomposition. pute a score for each vertex at each time step. Vertices
that have a score above a threshold value are output
Example as anomalies at that time.
234 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. Volume 7, May/June 2015
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
WIREs Computational Statistics Anomaly detection in dynamic networks
Instead of finding the difference between the The normal activity vector, r(t − 1), is the left
graph reconstructed from the approximation obtained singular vector obtained by performing SVD on the
by a decomposition, a probabilistic model can be used. matrix formed by the activity vectors for the last
The Chung-Lu random graph model114 is used in W time steps. Each time point is given a score
Ref 84 Taking the difference between the real graph’s Z(t) = 1 − r(t − 1)T u(t), which is intuitively a score of
adjacency matrix and the expected graph’s forms a how different the current activity is compared to
residual matrix. Anomalous time windows are found normal, where a higher value is worse. Anomalies can
by performing SVD on the residual matrices—on be found online using a dynamic thresholding scheme,
which a linear ramp filter has been applied—and where time points with a score above the threshold
by analyzing the change in the top singular values. are output as changes.76 The vertices responsible are
The responsible vertices are identified via inspection found by calculating the ratio of change between
of the right singular vectors. More accurate graph the normal and activity vectors. The vertices that
models that also consider attributes are proposed correspond to the indexes with the largest change are
in Ref 86. labeled anomalous. Similar approaches have used the
Going away from comparing an expected or activity vector of a vertex-to-vertex feature correlation
approximate model of the graph to the real model to matrix,44 and a vertex-to-vertex correlation matrix
find the deviation, events can be identified via signifi- based on the similarity between vertex’s neighbors.58
cant changes in the decomposed space. Specifically, by
performing PCA on the data, the calculated Eigenvec-
Distance Measures
tors can be separated into ‘normal’ and ‘anomalous’
Using the notion of distance as a metric to measure
sets by examining the values of the data projected onto
change is natural and widely used.23,60,116,117 Two
each Eigenvector. At each time step the Eigenvectors
objects that have a small difference in a measured
are examined (via projection of the data) in descend-
metric can be said to be similar. The metrics measured
ing order of their corresponding Eigenvalues, and the
in graphs are typically structural features, such as
first Eigenvector to contain a projected point outside 3
the number of vertices. Once the summary metrics
standard deviations of the rest of the values, and every
are found for each graph, the difference or similarity,
Eigenvector thereafter, constitute the anomalous set.
which are inversely related, can be calculated. The
The second step is then to project the data onto its
variation in the algorithms lies in the metrics chosen
normal and anomalous subspaces. Once this is done,
to extract and compare, and the methods they use to
events are detected when the modifications in the
determine the anomalous values and corresponding
anomalous components from the previous time step to
graphs.
the current are above a threshold value.77 Expanding
on this method, joint sparse PCA and graph-guided Example
joint sparse PCA were developed to localize the In Ref 96, given a graph series G, a distance function
anomalies and identify the vertices responsible.83 The d that computes the distance between two matrices,
responsible vertices are more easily identified by using and a function to calculate the vertex affinity matrix
a sparse set of components for the anomalous set. Ver- S, where sij indicates the influence vertex i has on
tices are given an anomaly score based on the values vertex j, a set of time points for detected events is
of their corresponding row in the abnormal subspace. T : {t ∈ T|d(Gt , Gt + 1 ) ≤ c0 }, where
As a result of the anomalous components being sparse,
√
the vertices that are not anomalous receive a score of √ n n (√ √ )2
( ) √ ∑∑
0. Owing to the popularity of PCA in traffic anomaly d Gt , Gt + 1 = √ St,ij − St+1,ij , (2)
detection, a study was performed identifying and eval- i=1 j=1
uating the main challenges of using PCA.115
and c0 is a threshold value. Similar to the example
Change Detection given in the Community Detection section,67 the
data-specific feature of interest here is a measure of
finding the distance (inversely related to similarity)
The activity vector of a graph, u(t), is the principal
component, the left singular vector corresponding to
between two graphs. Thus, each adjacent graph pair
the largest singular value obtained by performing SVD is mapped to a single real value, creating a time
on the weighted adjacency matrix. A change point is series. The anomaly detector, the quality control mov-
when an activity vector is substantially different from ing average threshold, is unique compared to the
the ‘normal activity’ vector, which is derived from previous examples, as it considers the historical val-
previous activity vectors. ues of the data-specific features. The previous three
Volume 7, May/June 2015 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. 235
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Advanced Review wires.wiley.com/compstats
sections (Community Detection, Compression, and existing method that assigns outlier scores for edges
Decomposition) incorporated this information into in a network47,48,50,74,119 could be used as the input
their data-specific feature generation, using window- to this method. The latter approach allows52 to be
ing or segmenting techniques. applied to any network that is capable of having out-
lier scores assigned to edges (e.g., weighted, directed,
Edge Detection and attributed). Once every edge is assigned an outlier
score, significant anomalous regions (SARs)—fixed
If the evolution of some edge attribute (e.g., edge subgraphs over a window of time—are mined from
weight) differs from the ‘normal’ evolution, then the the sequence, analogous to the problem of finding the
corresponding edge is characterized as anomalous. HDSs. An alternating iterative approach based on Ref
120—first finding an optimal time window for a fixed
In Ref 47, a dynamic road traffic network whose subgraph, then finding an optimal subgraph for the
edge weights vary over time is studied. The similarity fixed time window, repeating until no improvement
between the edges over time is computed using a decay is found—is constructed to approximate a solution to
function that weighs the more recent patterns higher. the NP-hard problem. This work was later extended
At each time step, an outlierness score is calculated to allow the subgraphs to smoothly evolve over time,
for each edge based on the sum of the changes in where vertices can be added or removed between adja-
similarity scores. Edges with the highest score, chosen cent time steps.97 A similar approach mines weighted
using either a threshold value or top-k heuristic, are frequent subgraphs in network traffic, where the edge
marked as anomalous at that time step. weights correspond to the anomaly contribution of
Viewing the network as a stream of edges, that edge.94
meaning the network does not have a fixed topology
as the road traffic network did, the frequency and
Event Detection
persistence of an edge can be measured and used as
an indicator of its novelty. The persistence of an edge Provided a function f (Gi , Gj ) that measures the dis-
is how long it remains in the graph once it is added, tance between two graphs, a time series of distance
and the frequency is how often it appears. In Ref 48, values can be constructed by applying the function on
set system discrepancy118 is used to measure the consecutive time steps in the series. Anomalous val-
persistence and frequency of the edges. When an edge ues can then be extracted from this time series using
arrives, its discrepancy is calculated and compared to a number of different heuristics, such as selecting the
the mean discrepancy value of the active edge set. If top k or using a moving average threshold.
the weighted discrepancy of the edge is more than a
threshold level greater than the mean discrepancy, the Extracting features from the graphs is a common
edge is declared novel (anomalous). Using the novel technique to create a summary of the graph in a few
edges detected, a number of metrics can be calculated scalar values, its signature. Local features are specific
for various graph elements (e.g., vertices, edges, and to a single vertex and its egonet11 (the subgraph
subgraphs). Individual graph elements can then be induced by itself and its one-hop neighbors), such
identified anomalous via comparison of the calculated as the vertex or egonet degree. Global features are
metrics for that element with the overall distribution derived from the graph as a whole, such as the
and change of the metric. graph radius. The local features of every vertex in
the graph can be agglomerated into a single vector,
Subgraph Detection the signature vector, of values that describe the graph
using the moments of distribution (such as mean
A subgraph with many ‘anomalous’ edges is deemed and standard deviation) of the feature values. In
anomalous. Ref 93, the similarity between two graphs is the
Canberra distance, a weighted version of the L1
Contrasting with Ref 46 where the edge weights distance, between the two signature vectors. A similar
are considered the strength of a connection, in Ref approach is used in Ref 92 to detect abnormal times in
52 the edge weights are considered as outlier scores, traffic dispersion graphs. Instead of an agglomeration
or how ‘anomalous’ that edge is at that time. Every of local features, it extracts global features from each
edge at every time step is given its own anomaly score, graph, introducing a dK-2 series121–123 -based distance
which is a function of the probability of seeing that metric, and any graph with a feature value above a
particular edge weight on that particular edge given threshold is anomalous.
the distribution of weights for that edge over all the As an alternative to extracting multiple features
graphs in the series. Alternatively, the output of an from the graph and using the signatures, the pairwise
236 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. Volume 7, May/June 2015
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
WIREs Computational Statistics Anomaly detection in dynamic networks
vertex affinity scores may be used. Pairwise vertex features used were each vertex and its PageRank
affinity scores are a measure of how much each vertex value, and each edge (u, v) with a weight equal to u’s
influences another vertex, and can be found using fast PageRank value divided by the number of outlinks
belief propagation.124 In Ref 96, the scores are calcu- from u. Weighting the vertices with their ‘quality’,
lated for two consecutive time steps, and the similarity measured by PageRank, ensures that the removal of a
between the two graphs is the rooted Euclidean dis- high-quality vertex will have more of an impact than
tance (Matusita distance) between the score matrices. the removal of a low quality vertex. A fixed threshold
The changes in the vertex affinity score are shown to was set to find graphs with abnormally low similarity
accurately reflect and scale with the level of impact scores.
of the changes. For example, removing an edge that Instead of finding structural difference between
connects two otherwise disconnected components, a two consecutive graphs, events can be detected using
‘bridging edge’, results in a higher score than remov- the time series of robustness values for each graph.
ing an edge that does not affect the overall structure Robustness is a measure of the connectivity of the
of the graph. A moving threshold is set on the time graph. A graph with a high robustness will retain the
series of similarity scores using quality control with same general structure and connectivity even when
individual moving range. The exponential weighted some vertices or edges are removed. Finding events
moving average has also been used as a way to is then finding when the robustness value changes
dynamically set the threshold, tested on distribution significantly.95 A similar approach is given in Ref 89,
features extracted from Wikipedia revision logs.91 but using a variant of the graph diameter. The graph
Complementary to feature similarity, one can diameter used is the average of all vertex eccentricities
look at the structural differences between graphs (the eccentricity of a vertex v is the longest shortest
to identify the magnitude of change. These meth- path from v to any other vertex), instead of the typical
ods focus on the function that defines the distance definition of using the maximum vertex eccentricity.
between graphs instead of finding the optimal features Both the time series of graph diameter changes and the
to use as summaries. Many metrics have been devel- graph diameters themselves are shown to be effective
oped and tested to quantify the differences between methods for detecting events.
graphs. In Ref 88, 10 different distance functions
were evaluated on TCP/IP traffic graphs with known
anomalies (ground truth). Box-Jenkins autoregressive Probabilistic Models
moving average (ARMA) modeling125 was used to With a foundation in probability theory, distributions,
create a ‘normal’ model of the feature values, and and scan statistics, these methods typically construct
the time points with residuals, calculated as the dif- a model of what is considered ‘normal’ or expected,
ference between expected and real value, exceeding and flag deviations from this model as anomalous. The
a threshold are flagged. The 10 distance functions type of model used, how it is constructed, what it is
tested were weight, maximum common subgraph modeling, and the method for determining outliers is
(MCS) weight, MCS edge, MCS vertex, graph edit what differentiates these approaches.
distance, median graph edit distance, modality, diam-
eter, entropy, and spectral. Of these 10 distance func- Example
tions, the MCS-based methods performed the best; In Ref 98, given a graph series G and a likelihood scor-
however, finding the MCS between two graphs is an ing function L, a set of time points for change detec-
NP-hard problem. More recently in Ref 90, five dif- tion is a set of time steps T : {t ∈ T|log(L(t))/|𝜃| ≤ c0 },
∏
ferent distance scoring functions, all with a linear where L(t) = 𝜃appear in time t P(𝜃)c(𝜃) , 𝜃 denotes
complexity, were tested on web graphs with specific sender–recipient tuples, P(𝜃) indicates the probability
types of anomalies: missing subgraph, missing vertices, of all possible sender–recipient tuples (emails from a
and connectivity change. The five scoring functions single sender to a list of receivers), and c(𝜃) is the num-
were: vertex/edge overlap, vertex ranking, vector sim- ber of occurrences of a particular sender–recipient
ilarity, sequence similarity, and signature similarity. tuple in time t. Thus, each time step is mapped to
The method that performed the best was the signa- a single value, the average log-likelihood, indicating
ture similarity, which is done by extracting a signa- the predictability of the emails during that time step,
ture vector from each graph and finding the distance high values indicating normal behavior. Abnormal
between them using SimHash.126 Unlike the signa- time steps are identified via inspection of the plot
ture vectors discussed above that create summaries of of log-likelihood values. In general, the data-specific
the entire graph in a few scalar values, the signature features that are output are often probabilities, or
here includes every vertex and edge. Specifically, the likelihoods, of certain structures or events occurring.
Volume 7, May/June 2015 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. 237
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Advanced Review wires.wiley.com/compstats
238 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. Volume 7, May/June 2015
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
WIREs Computational Statistics Anomaly detection in dynamic networks
used in Ref 101. Edge weights in the graph are events and the vertices responsible can be identified.
calculated using a decay function where more recent When the Eigenvalues deviate from the expected dis-
edges weigh more. A normal behavior for subgraphs tribution an event has occurred, and the vertices whose
is defined by identifying persistent patterns, here matrices deviate from the matrix distribution127 the
found by constructing a graph where each edge is most are considered responsible.100
weighted by the fraction of time it is in the graph and
iteratively extracting the heaviest weight connected
components. As the cumulative graph evolves the DISCUSSION
extracted subgraphs are monitored, comparing their
With the wealth of algorithms discussed, the deci-
current activity to the expected activity, which is based
sion of which to use can be difficult. The choice
on the recent behavior.
depends on the types of graphs being analyzed (plain,
attributed, directed, undirected, etc.), the desired types
Event Detection
of anomalies, and the size of the graphs—influencing
Deviations from the models of the graph likelihood the allowed complexity, required parallelism, and
or the distribution of the Eigenvalues reveal when an whether streaming analysis is required. It is worth not-
event occurs. ing, however, that using multiple methods in parallel
or in conjunction with one another may yield superior
Recently, a new reservoir sampling method was results compared to using a single method. Using pub-
proposed in Ref 103 to maintain structural summaries licly available datasets (Table 7), each algorithm can
of a graph stream. The online sampling method man- be applied to data from a variety of domains, com-
ages multiple distinct partitionings of the network to paring the found anomalies, their sensitivity, and their
provide a statistically significant summary. As a new scalability.
graph is added to the stream, every edge has its likeli- While each method has its own advantages and
hood calculated based on the edge generation models disadvantages, for brevity we will give an overview
of the different partitions. Once the edge likelihoods of only those that have publicly available code (see
have been found an global graph likelihood is calcu- Table 6). We note that the available software listed
lated as the geometric mean of the edge likelihood is not comprehensive. We listed all software that was
values. A similar edge generation model approach was said to be available in their papers, as well as the
used in Ref 98, where the probability that and edge software of the authors that responded to our email
exists between vertices i and j is stored in a matrix, asking about the availability of their code. The meth-
M(i, j). The estimated probabilities are calculated ods are again partitioned by the types of anomalies
using expectation maximization, and subsequently they detect so that a fair qualitative comparison can
used to give potential scores to every sender–recipient be made. Note that a quantitative comparison among
pair. Each sender–recipient tuple, which is an email the available methods was impractical because the
from one sender to multiple recipients, then has its methods mostly detect different types of anomalies.
likelihood computed based on the estimated distribu- Of the few intra-type comparisons that exist, coordi-
tion of all possible sender–recipient tuples. A single nating the different outputs (e.g., anomalous vertices
score is calculated for each graph as the average of in the graphs versus anomalous vertices in communi-
the log-likelihood scores. In both of these methods a ties of the graphs) and attaining a valuable comparison
single score is calculated for each graph based on the would be unlikely.
likelihoods of the individual edges within it, and the
individual edges (or tuples) responsible are those that
have the lowest likelihood. Type 1 (Vertices) Methods
Similar to estimating the distribution of possible The earliest work with available software is the online
sender–recipient tuples, the distribution of the Eigen- method by Heard et al.,50 which models the frequency
values and compressed Eigen equations is calculated of the connections between vertices as a counting
in Ref 100. Based on the assumption that each ver- process and uses Bayesian learning and predictive
tex has a time series of a vertex-local feature (e.g., p-values to identify anomalies.71,70,50 In addition to
summed edge weight), a vertex-to-vertex correlation operating on graph streams, it leverages the sparsity
matrix can be generated for each time step. The Eigen of the network by only examining edges that appear
equation of the correlation matrix is compressed by in the graphs. A key advantage of this technique is
keeping the largest Eigenvalues and a set of low dimen- that it performs both sequential analysis, where new
sion matrices (one matrix for each vertex). By learning graphs are analyzed using the history of the graph
the distributions of the Eigenvalues and matrices, both stream, in addition to retrospective analysis, where
Volume 7, May/June 2015 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. 239
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Advanced Review wires.wiley.com/compstats
time step
portions of the graph and their neighbors, opens many
Output
MATLAB
MATLAB
Java
Java
Java
√
√
√
√
√
√
√
√
√
√
√
√
√
√
√
√
√
√
√
√
PARCUBE85
ECOUTLIER
NETSPOT
1 These
240 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. Volume 7, May/June 2015
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
WIREs Computational Statistics Anomaly detection in dynamic networks
examine the reconstruction error for the cells in those Type 4 (Events/Changes) Methods
time points individually. DeltaCon96 provides a graph similarity scoring func-
tion based on a number of desirable principles.85,96
It has been shown to effectively distinguish between,
Type 3 (Subgraphs) Methods and correctly mark as more important, the removal of
NetSpot52 is designed to find optimal anoma- ‘bridging’ edges and edges with higher weights, com-
lous subgraphs.52 Its alternating optimization pared to edges that do not affect the overall structure
approach—fixing a subgraph and finding the opti- if removed or have a low weight. Anomalies are found
mal time window, then fixing the time window and by finding the similarity between two adjacent graphs
finding the optimal subgraph within the window, in the stream, flagging those that are sufficiently differ-
and repeat—restricts the method to operating on an ent from their immediate neighbors as outliers. While
entire graph series at a time. The output from the not as robust as methods such as Ref 50 that perform
algorithm is a set of highly anomalous subgraphs both sequential and retrospective analysis, there is a
and their corresponding time windows, solving the commensurate decrease in the run time, scaling lin-
problem of attribution and requiring no further anal- early with the number of edges in the networks.
ysis on the part of the user. Additionally, using their The tensor decomposition method ParCube85
novel seed generation algorithm for mining heavy is the only method we discuss that is capable of
subgraphs, it is shown to scale linearly with the size handling graphs that do not fit in memory. Moreover,
of the network in terms of both vertices and time it is parallelizable on a shared memory machine
slices. While the authors assign outlier scores to the and has been designed specifically for sparse tensors.
edges in the network in their own way, it would be As with other tensor methods, it is able to handle
interesting to utilize an algorithm specifically designed any type of graphs with arbitrary attributes, as long
to assign outlier scores to edges in a network.48,74,50,47 as the attributes can be reduced to a real number
Preprocessing the networks using an auxiliary edge (e.g., edge labels being enumerated and mapping
scoring method allows a mapping from networks that them to integers). Using an online threshold, e.g.,
do not fit the requirements of being plain, weighted, maintaining the mean and standard deviation of the
and undirected to networks that do. reconstruction error and setting the threshold to two
As mentioned above, tensor methods are capa- standard deviations above the mean, ParCube can be
ble of identifying anomalous subgraphs using the applied to graph streams.
reconstruction error for particular portions of the
tensor. Without user guidance this is prohibitively
expensive, as the number of possible subgraphs is
CONCLUSION
exponential on the number of vertices in the graph. In this survey, we hoped to bridge the gap between the
However, if the user is interested in a particular set of increasing interest in anomaly detection in dynamic
vertices, or a particular vertex and its neighbors, the networks and the extensive literature available.
error can be tracked for just that region of the graph. A common high-level theoretical formulation was
FIGURE 5 | The proposed two-tiered taxonomy that illustrates the layout of the survey. Methods are first categorized based on their overarching
approach, then subdivided based on the types of anomalies they detect.
Volume 7, May/June 2015 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. 241
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Advanced Review wires.wiley.com/compstats
found among the methods surveyed. Within this methods into five main categories (Figure 5) derived
formulation, all of the methods are concerned with from the intuition behind each: communities, com-
exploring various mappings from graphs to real pression, decomposition, distance, and probabilistic
numbers to best identify anomalous behavior using model based.
existing anomaly detectors. Although a myriad of The field of anomaly detection in dynamic net-
approaches have been proposed and explored, they works is relatively young and is rapidly growing in
all aim to identify one or more of the four key anoma- popularity, as indicated by the number of papers pub-
lies we have outlined for dynamic networks: vertices lished in the past five years. A concise summary of the
(Figure 1), edges (Figure 2), subgraphs (Figure 3), methods discussed in this survey is given in Tables 3
and events or changes (Figure 4). In conjunction with and 4, showing the type of anomalies detected, year
the types of anomalies identified, we partition the of publication, and time complexity (notation defined
242 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. Volume 7, May/June 2015
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
WIREs Computational Statistics Anomaly detection in dynamic networks
ACKNOWLEDGMENTS
The authors would like to thank the reviewers for their excellent feedback and suggestions. We would also like
to thank the authors of the work we cited who provided feedback and additional information regarding their
papers. This work was supported in part by the DOE SDAVI Institute and the U.S. National Science Foundation
(Expeditions in Computing program). In addition, this material is based upon work supported in part with
funding from the Laboratory for Analytic Sciences. Any opinions, findings, conclusions, or recommendations
expressed in this material are those of the author(s) and do not necessarily reflect the views of the Laboratory
for Analytic Sciences and/or any agent or entity of the United States government.
REFERENCES
1. Cheng H, Tan P-N, Potter C, Klooster S. A robust 8. Priebe CE, Conroy JM, Marchette DJ, Park Y. Scan
graph-based algorithm for detection and characteriza- statistics on Enron graphs. Comput Math Organ
tion of anomalies in noisy multivariate time series. In: Theory 2005, 11:229–247.
IEEE International Conference on Data Mining Work- 9. Wang H, Tang M, Park Y, Priebe CE. Locality statis-
shops, Pisa, Italy, 2008, 349–358. tics for anomaly detection in time series of graphs.
2. Cheng H, Tan P-N, Potter C , Klooster SA. Detection arXiv:1306.0267, 2013.
and characterization of anomalies in multivariate time 10. Chen F, Neill DB. Non-parametric scan statistics
series. In: Proceedings of the 9th SIAM International for event detection and forecasting in heterogeneous
Conference on Data Mining (SDM), Sparks, NV, 2009, social media graphs. In: Proceedings of the 20th ACM
413–424. SIGKDD International Conference on Knowledge
3. Chen Z, Hendrix W, Guan H, Tetteh IK, Choudhary Discovery and Data Mining. New York: ACM; 2014,
A, Semazzi F, Samatova NF. Discovery of extreme 1166–1175.
events-related communities in contrasting groups of 11. Akoglu L, McGlohon M, Faloutsos C. Odd-
physical system networks. Data Min Knowl Discov ball: spotting anomalies in weighted graphs. In:
2013, 27:225–258. Advances in Knowledge Discovery and Data Mining.
4. Zhang T, Zhuang X, Pande S, Lee W. Anomalous Berlin/Heidelberg: Springer; 2010, 410–421.
path detection with hardware support. In: Proceedings 12. Eberle W, Holder L. Detecting anomalies in cargo
of the 2005 International Conference on Compilers, using graph properties. In: Intelligence and Secu-
Architectures and Synthesis for Embedded Systems. rity Informatics. Berlin/Heidelberg: Springer; 2006,
New York: ACM; 2005, 43–54. 728–730.
5. Ding Q Katenka N, Barford P, Kolaczyk E , Crovella 13. Eberle W, Holder L. Anomaly detection in data repre-
M. Intrusion as (anti) social communication: charac- sented as graphs. Intell Data Anal 2007, 11:663–689.
terization and detection. In: Proceedings of the 18th
ACM International Conference on Knowledge Discov- 14. Noble CC, Cook DJ. Graph-based anomaly detection.
ery and Data Mining (SIGKDD), Beijing, China, 2012, In: Proceedings of the 9th ACM International Con-
886–894. ference on Knowledge Discovery and Data Mining
(SIGKDD), Washington, DC, 2003, 631–636.
6. Ghoting A, Eric Otey M, Parthasarathy S. Loaded:
link-based outlier and anomaly detection in evolving 15. Barnett V, Lewis T. Outliers in Statistical Data, vol. 3.
data sets. In: Proceedings of the 4th IEEE Interna- New York: John Wiley & Sons; 1994608 p.
tional Conference on Data Mining (ICDM), Brighton, 16. Chandola V, Banerjee A, Kumar V. Anomaly detec-
UK, 2004, 387–390. tion: a survey. ACM Comput Surv 2009, 41:15.
7. Jing X, Shelton CR. Intrusion detection using contin- 17. Chandola V, Banerjee A, Kumar V. Anomaly detection
uous time Bayesian networks. J Artif Intell Res 2010, for discrete sequences: a survey. IEEE Trans Knowl
39:745–774. Data Eng 2012, 24:823–839.
Volume 7, May/June 2015 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. 243
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Advanced Review wires.wiley.com/compstats
18. Chandola V, Mithal V, Kumar V. Comparative eval- 38. Klema V, Laub AJ. The singular value decomposition:
uation of anomaly detection techniques for sequence its computation and some applications. IEEE Trans
data. In: Proceedings of the 8th IEEE International Automatic Control 1980, 25:164–176.
Conference on Data Mining (ICDM), Pisa, Italy, 2008, 39. Kolda TG, Bader BW. Tensor decompositions and
743–748. applications. SIAM Rev 2009, 51:455–500.
19. Hawkins DM. Identification of Outliers, vol. 11. 40. Gao X, Xiao B, Tao D, Li X. A survey of graph edit
London: Chapman and Hall; 1980. distance. Pattern Anal Appl 2010, 13:113–129.
20. Akoglu L, Tong H, Koutra D. Graph-based anomaly 41. Rahm E, Bernstein PA. A survey of approaches to auto-
detection and description: a survey. Data Min Knowl matic schema matching. VLDB J 2001, 10:334–350.
Discov 2014, 28:1–63. 42. Koller D, Friedman N. Probabilistic Graphical Mod-
21. Bilgin CC, Yener B. Dynamic network evolution: els: Principles and Techniques. Cambridge, MA: MIT
models, clustering, anomaly detection. IEEE Netw Press; 2009.
2006. 43. Glaz J. Scan Statistics. Berlin/Heidelberg: Springer;
22. Gupta M, Gao J, Aggarwal CC, Han J. Outlier 2001.
Detection for Temporal Data. San Rafael, CA: Morgan 44. Akoglu L, Faloutsos C. Event detection in time series
& Claypool Publishers; 2014. of mobile communication graphs. 27th Army Sci Conf
23. Hodge VJ, Austin J. A survey of outlier detection 2010, 2:18.
methodologies. Artif Intell Rev 2004, 22:85–126. 45. Gao J, Liang F, Fan W, Wang C, Sun Y, Han J.
24. Agyemang M, Barker K, Alhajj R. A comprehensive On community outliers and their efficient detection
survey of numeric and symbolic outlier mining tech- in information networks. In: Proceedings of the 16th
niques. Intell Data Anal 2006, 10:521–538. ACM International Conference on Knowledge Discov-
25. Brent West D. Introduction to Graph Theory, vol. 2. ery and Data Mining (SIGKDD), Seattle, WA, 2010,
Upper Saddle River, NJ: Prentice Hall; 2001. 813–822.
46. Ji T, Yang D, Gao J. Incremental local evolution-
26. Balakrishnan R, Ranganathan K. A Textbook of
ary outlier detection for dynamic social networks.
Graph Theory. Berlin/Heidelberg: Springer; 2012.
In: Machine Learning and Knowledge Discovery in
27. Cook DJ, Holder LB. Mining Graph Data. New York: Databases. Berlin/Heidelberg: Springer; 2013, 1–15.
John Wiley & Sons; 2006.
47. Li X, Li Z, Han J, Lee J-G. Temporal outlier detection
28. Samatova NF, Hendrix W, Jenkins J, Padmanabhan K, in vehicle traffic data. In: Proceedings of the 25th Inter-
Chakraborty A. Practical Graph Mining with R. Boca national Conference on Data Engineering (ICDE),
Raton, FL: CRC Press; 2013. Shanghai, China, 2009, 1319–1322.
29. Fortunato S. Community detection in graphs. Phys 48. James A, Tina E-R, Nishchal D. Detecting novel dis-
Rep 2010, 486:75–174. crepancies in communication networks. In: Proceed-
30. Lancichinetti A, Fortunato S. Community detection ings of the 10th IEEE International Conference on
algorithms: a comparative analysis. Phys Rev E 2009, Data Mining (ICDM), Sydney, Australia, 2010, 8–17.
80:056117. 49. Rattigan MJ, Jensen D. The case for anomalous
31. Reichardt J, Bornholdt S. Statistical mechanics of link discovery. ACM SIGKDD Explor Newsl 2005,
community detection. Phys Rev E 2006, 74:016110. 7:41–47.
32. Harenberg S, Bello G, Gjeltema L, Ranshous S, Har- 50. Heard NA, Weston DJ, Platanioti K, Hand DJ.
lalka J, Seay R, Padmanabhan K, Samatova N. Com- Bayesian anomaly detection methods for social net-
munity detection in large-scale networks: a survey works. Ann Appl Stat 2010, 4:645–662.
and empirical evaluation. WIREs Comput Stat 2014, 51. Greene D, Doyle D, Cunningham P. Tracking the
6:426–439. evolution of communities in dynamic social networks.
33. Rissanen J. Modeling by shortest data description. In: 2010 International Conference on Advances in
Automatica 1978, 14:465–471. Social Networks Analysis and Mining (ASONAM),
2010, 176–183.
34. Rissanen J. A universal prior for integers and estima-
tion by minimum description length. Ann Stat 1983, 52. Mongiov M, Bogdanov P, Ranca R, Papalexakis EE,
11:416–431. Faloutsos C, Singh AK. Netspot: spotting signifi-
cant anomalous regions on dynamic networks. In:
35. Rissanen J. Minimum-Description-Length Principle. Proceedings of the 13th SIAM International Confer-
New York: John Wiley & Sons; 1985. ence on Data Mining (SDM), Austin, TX, 2013.
36. Grünwald P. The Minimum Description Length Prin- 53. Miller BA, Beard MS, Bliss NT. Eigenspace analysis
ciple. Cambridge, MA: MIT Press; 2007. for threat detection in social networks. In: 2011
37. Golub GH, Reinsch C. Singular value decomposition Proceedings of the 14th International Conference on
and least squares solutions. Numerische Mathematik Information Fusion (FUSION), Chicago, IL, 2011,
1970, 14:403–420. 1–7.
244 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. Volume 7, May/June 2015
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
WIREs Computational Statistics Anomaly detection in dynamic networks
54. Ramanathan A, Agarwal PK, Kurnikova M, Lang- International Conference on Data Mining (SDM),
mead CJ. An online approach for mining collec- Anaheim, CA, 2012, 624–635.
tive behaviors from molecular dynamics simulations. 69. Chen Z, Hendrix W, Samatova NF. Community-based
J Comput Biol 2010, 17:309–324. anomaly detection in evolutionary networks. J Intell
55. Saligrama V, Chen Z. Video anomaly detection based Inf Syst 2012, 39:59–85.
on local statistical aggregates. In: 2012 IEEE Con- 70. Gupta M, Gao J, Sun Y, Han J. Integrating community
ference on Computer Vision and Pattern Recognition matching and outlier detection for mining evolution-
(CVPR), Providence, RI, 2012, 2112–2119. ary community outliers. In: Proceedings of the 18th
56. Tran L, Navasca C, Luo J. Video detection anomaly via ACM International Conference on Knowledge Discov-
low-rank and sparse decompositions. In: Image Pro- ery and Data Mining (SIGKDD), Beijing, China, 2012,
cessing Workshop (WNYIPW), 2012 Western New 859–867.
York, New York, NY, 2012, 17–20. 71. Gupta M, Gao J, Sun Y, Han J. Community trend
57. Koutra D, Papalexakis EE, Faloutsos C. Tensorsplat: outlier detection using soft temporal pattern min-
spotting latent anomalies in time. In: 2012 16th ing. In: Machine Learning and Knowledge Discov-
Panhellenic Conference on Informatics (PCI), Piraeus, ery in Databases. Berlin/Heidelberg: Springer; 2012,
Greece, 2012, 144–149. 692–708.
58. Ishibashi K, Kondoh T, Harada S, Mori T, Kawahara 72. Rossi RA, Gallagher B, Neville J, Henderson K. Mod-
R, Asano S. Detecting anomalous traffic using commu- eling dynamic behavior in large evolving graphs. In:
nication graphs. In: Telecommunications: The Infras- Proceeding of the 6th ACM International Conference
tructure for the 21st Century (WTC), Vienna, Austria, on Web Search and Data Mining (WSDM), Rome,
2010, 1–6 Italy, 2013, 667–676.
59. Suykens JAK, Vandewalle J. Least squares support 73. Araujo M, Papadimitriou S, Günnemann S, Faloutsos
vector machine classifiers. Neural Process Lett 1999, C, Basu P, Swami A, Papalexakis EE, Koutra D.
9:293–300. Com2: fast automatic discovery of temporal (’comet’)
60. Breunig MM, Kriegel H-P, Ng RT, Sander J. LOF: communities. In: Proceedings of the 18th Pacific-Asia
identifying density-based local outliers. In: ACM Sig- Conference on Knowledge Discovery and Data Mining
mod Record, vol. 29. New York: ACM; 2000, 93–104. (PAKDD), Tainan, Taiwan, 2014, 271–283.
61. Aggarwal CC, Yu PS. Outlier detection for high 74. Chakrabarti D. Autopart: parameter-free graph parti-
dimensional data. In: ACM Sigmod Record, vol. 30. tioning and outlier detection. In: Proceedings of the
New York: ACM; 2001, 37–46. 8th European Conference on Principles and Practice
of Knowledge Discovery in Databases (PKDD), Pisa,
62. Angiulli F, Pizzuti C. Fast outlier detection in high Italy, 2004, 112–124.
dimensional spaces. In: Principles of Data Mining and
Knowledge Discovery. Berlin/Heidelberg: Springer; 75. Sun J, Faloutsos C, Papadimitriou S, Yu PS. Graph-
2002, 15–27. scope: parameter-free mining of large time-evolving
graphs. In: Proceedings of the 13th ACM International
63. Angiulli F, Pizzuti C. Outlier mining in large Conference on Knowledge Discovery and Data Mining
high-dimensional data sets. IEEE Trans Knowl (SIGKDD), San Jose, CA, 2007, 687–696.
Data Eng 2005, 17:203–215.
76. Ide T, Kashima H. Eigenspace-based anomaly detec-
64. Parsons L, Haque E, Liu H. Subspace clustering for tion in computer systems. In: Proceedings of the 10th
high dimensional data: a review. ACM SIGKDD ACM International Conference on Knowledge Discov-
Explor Newsl 2004, 6:90–105. ery and Data Mining (SIGKDD), Seattle, WA, 2004,
65. Srivastava AN. Enabling the discovery of recur- 440–449.
ring anomalies in aerospace problem reports using 77. Lakhina A, Crovella M, Diot C. Diagnosing
high-dimensional clustering techniques. In: Aerospace network-wide traffic anomalies. In: ACM SIGCOMM
Conference, Big Sky, MT, 2006, 17 pp. Computer Communication Review, vol. 34. New
66. Pokrajac D, Lazarevic A, Latecki LJ. Incremental York: ACM; 2004, 219–230.
local outlier detection for data streams. In: IEEE 78. Sun J, Papadimitriou S, Yu PS. Window-based ten-
Symposium on Computational Intelligence and Data sor analysis on high-dimensional and multi-aspect
Mining, Orlando, FL, 2007, 504–515. streams. In: Proceedings of the 6th IEEE International
67. Duan D, Li Y, Jin Y, Zhengding L. Community mining Conference on Data Mining (ICDM), Hong Kong,
on dynamic weighted directed graphs. In: Proceedings China, 2006, 1076–1080.
of the 1st ACM International Workshop on Complex 79. Sun J, Tao D, Christos F. Beyond streams and graphs:
Networks Meet Information & Knowledge Manage- dynamic tensor analysis. In: Proceedings of the 12th
ment. New York: ACM; 2009, 11–18. ACM International Conference on Knowledge Discov-
68. Aggarwal CC, Subbian K. Event detection in ery and Data Mining (SIGKDD), Philadelphia, PA,
social streams. In: Proceedings of the 12th SIAM 2006, 374–383.
Volume 7, May/June 2015 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. 245
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Advanced Review wires.wiley.com/compstats
80. Sun J, Xie Y, Zhang H, Faloutsos C. Less is more: com- Information and Communication Technology. New
pact matrix decomposition for large sparse graphs. York: ACM; 2011, 36–41.
In: Proceedings of the 7th SIAM International Con- 93. Berlingerio M, Koutra D, Eliassi-Rad T, Faloutsos
ference on Data Mining (SDM), Minneapolis, MN, C. Netsimile: a scalable approach to size-independent
2007. network similarity. arXiv:1209.2684, 2012.
81. Kolda TG, Sun J. Scalable tensor decompositions 94. He W, Guangmin H, Zhou Y. Large-scale ip network
for multi-aspect data mining. In: Proceedings of the behavior anomaly detection and identification using
8th IEEE International Conference on Data Mining substructure-based approach and multivariate time
(ICDM), Pisa, Italy, 2008, 363–372. series mining. Telecomm Syst 2012, 50:1–13.
82. Tong H, Papadimitriou S, Sun J, Yu PS, Faloutsos 95. Malliaros FD, Megalooikonomou V, Faloutsos C.
C. Colibri: fast mining of large static and dynamic Fast robustness estimation in large social graphs:
graphs. In: Proceedings of the 14th ACM International communities and anomaly detection. In: Proceedings
Conference on Knowledge Discovery and Data Mining of the 12th SIAM International Conference on Data
(SIGKDD), Las Vegas, NV, 2008, 686–694. Mining (SDM), Anaheim, CA, 2012, 942–953.
83. Jiang R, Fei H, Huan J. Anomaly localization for 96. Koutra D, Vogelstein J, Faloutsos C. Deltacon: a prin-
network data streams with graph joint sparse PCA. cipled massive-graph similarity function. In: Proceed-
In: Proceedings of the 17th ACM International Con- ings of the 13th SIAM International Conference on
ference on Knowledge Discovery and Data Mining Data Mining (SDM), Texas-Austin, TX, 2013.
(SIGKDD), San Diego, CA, 2011, 886–894.
97. Mongiovi M, Bogdanov P, Singh AK. Mining evolving
84. Miller BA, Arcolano N, Beard MS, Kepner J, Schmidt network processes. In: Proceedings of the 13th IEEE
MC, Bliss NT, Wolfe PJ. A scalable signal processing International Conference on Data Mining (ICDM),
architecture for massive graph analysis. In: 2012 IEEE Dallas, TX, 2013.
International Conference on Acoustics, Speech and
98. Huang Z, Dajun Zeng D. A link prediction approach
Signal Processing (ICASSP), 2012, 5329–5332.
to anomalous email detection. In: IEEE International
85. Papalexakis EE, Faloutsos C, Sidiropoulos ND. Par- Conference on Systems, Man and Cybernetics, San
cube: sparse parallelizable tensor decompositions. Diego, CA, 2006, 1131–1136.
In: Machine Learning and Knowledge Discovery
99. Pandit S, Horng Chau D, Wang S, Faloutsos C. Net-
in Databases. Berlin/Heidelberg: Springer; 2012,
probe: a fast and scalable system for fraud detection in
521–536.
online auction networks. In: Proceedings of the 16th
86. Miller BA, Arcolano N, Bliss NT. Efficient anomaly International Conference on World Wide Web, Banff,
detection in dynamic, attributed graphs: emerging Canada, 2007, 201–210.
phenomena and big data. In: 2013 IEEE International
100. Hirose S, Yamanishi K, Nakata T, Fujimaki R. Net-
Conference on Intelligence and Security Informatics
work anomaly detection based on Eigen equation com-
(ISI), Washington, DC, 2013, 179–184.
pression. In: Proceedings of the 15th ACM Interna-
87. Yu W, Aggarwal CC, Ma S, Wang H. On anomalous tional Conference on Knowledge Discovery and Data
hotspot discovery in graph streams. In: Proceedings Mining (SIGKDD), Paris, France, 2009, 1185–1194.
of the 13th IEEE International Conference on Data
101. Thompson B, Eliassi-Rad T. Dapa-v10: discovery
Mining (ICDM), Dallas, TX, 2013.
and analysis of patterns and anomalies in volatile
88. Pincombe B. Anomaly detection in time series of time-evolving networks. In: Notes of the 1st Workshop
graphs using Arma processes. ASOR Bull 2005, 24:2. on Information in Networks (WIN), New York, NY,
89. Gaston ME, Kraetzl M, Wallis WD. Using graph 2009.
diameter for change detection in dynamic networks. 102. Djidjev H, Sandine G, Storlie C, Vander Wiel S.
Australasian J Combinatorics 2006, 35:299. Graph based statistical analysis of network traffic. In:
90. Papadimitriou P, Dasdan A, Garcia-Molina H. Web Proceedings of the Ninth Workshop on Mining and
graph similarity for anomaly detection. J Internet Serv Learning with Graphs, San Diego, CA, 2011.
Appl 2010, 1:19–30. 103. Aggarwal CC, Zhao Y, Yu PS. Outlier detection in
91. Arackaparambil C, Yan G. Wiki-watchdog: anomaly graph streams. In: Proceedings of the 27th Interna-
detection in Wikipedia through a distributional lens. tional Conference on Data Engineering (ICDE), Han-
In: Proceedings of the 2011 IEEE/WIC/ACM Interna- nover, Germany, 2011, 399–409.
tional Conferences on Web Intelligence and Intelligent 104. Neil J, Hash C, Brugh A, Fisk M, Storlie CB. Scan
Agent Technology. Washington, DC: IEEE Computer statistics for the online detection of locally anomalous
Society; 2011, 257–264. subgraphs. Technometrics 2013, 55:403–414.
92. Quoc Le D, Jeong T, Eduardo Roman H, Won-Ki 105. Tantipathananandh C, Berger-Wolf T. Constant-
Hong J. Traffic dispersion graph based anomaly detec- factor approximation algorithms for identifying
tion. In: Proceedings of the Second Symposium on dynamic communities. In: Proceedings of the 15th
246 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. Volume 7, May/June 2015
19390068, 2015, 3, Downloaded from https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.1347, Wiley Online Library on [18/11/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
WIREs Computational Statistics Anomaly detection in dynamic networks
ACM International Conference on Knowledge Dis- 118. Chazelle B. The Discrepancy Method: Randomness
covery and Data Mining (SIGKDD), Paris, France, and Complexity. Cambridge, UK: Cambridge Univer-
2009, 827–836 sity Press; 2002.
106. Tantipathananandh C , Berger-Wolf T. Finding com- 119. Tong H, Lin C-Y. Non-negative residual matrix factor-
munities in dynamic social networks. In: Proceed- ization with application to graph anomaly detection.
ings of the 11th IEEE International Conference on In: SDM. Mesa, AZ: SIAM; 2011, 143–153.
Data Mining (ICDM), Vancouver, Canada, 2011, 120. Bogdanov P, Mongiov M, Singh AK. Mining heavy
1236–1241. subgraphs in time-evolving networks. In: Proceedings
107. Tantipathananandh C , Berger-Wolf T, Kempe D. A of the 11th IEEE International Conference on Data
framework for community identification in dynamic Mining (ICDM), Vancouver, Canada, 2011, 81–90.
social networks. In: Proceedings of the 13th ACM 121. Mahadevan P, Krioukov D, Fall K, Vahdat A. Sys-
International Conference on Knowledge Discovery tematic topology analysis and generation using degree
and Data Mining (SIGKDD), San Jose, CA, 2007, correlations. In: ACM SIGCOMM Computer Com-
717–726. munication Review, vol. 36. New York: ACM; 2006,
108. Gupta M, Gao J, Han J. Community distribution 135–146.
outlier detection in heterogeneous information net- 122. Sala A, Cao L, Wilson C, Zablit R, Zheng H, Zhao
works. In: Machine Learning and Knowledge Discov- BY. Measurement-calibrated graph models for social
ery in Databases. Berlin/Heidelberg: Springer; 2013, network experiments. In: Proceedings of the 19th
557–573. International Conference on World Wide Web. New
109. Grünwald P. A Tutorial Introduction to the Minimum York: ACM; 2010, 861–870.
Description Length Principle. Cambridge, MA: MIT 123. Wang J, Provan GM. Generating application-specific
Press; 2005. benchmark models for complex systems. In: AAAI,
110. Harshman RA. Foundations of the Parafac Procedure: 2008, 566–571.
Models and Conditions for an “Explanatory” Mul- 124. Koutra D, Ke T-Y, Kang U, Horng Polo Chau D,
timodal Factor Analysis. Los Angeles, CA: UCLA; Kenneth Pao H-K, Faloutsos C. Unifying guilt-by-
1970. association approaches: theorems and fast algorithms.
111. Jolliffe I. Principal Component Analysis. New York: In: Machine Learning and Knowledge Discovery
John Wiley & Sons; 2005. in Databases. Berlin/Heidelberg: Springer; 2011,
112. Barnathan M, Megalooikonomou V, Faloutsos C, 245–260.
Faro S, Mohamed FB. Twave: high-order analysis of 125. Box GEP, Jenkins G. Time Series Analysis: Forecasting
functional MRI. Neuroimage 2011, 58:537–548. and Control. San Francisco, CA: Holden-Day; 1970.
113. Tucker LR. Some mathematical notes on three-mode 126. Charikar MS. Similarity estimation techniques
factor analysis. Psychometrika 1966, 31:279–311. from rounding algorithms. In: Proceedings of the
114. Chung F, Linyuan L, Van V. Spectra of random graphs Thiry-Fourth Annual ACM Symposium on Theory of
with given expected degrees. Proc Natl Acad Sci 2003, Computing. New York: ACM; 2002, 380–388.
100:6313–6318. 127. Gupta AK, Nagar DK. Matrix Variate Distributions,
115. Ringberg H, Soule A, Rexford J, Diot C. Sensitivity vol. 104. Boca Raton, FL: CRC Press; 1999.
of PCA for traffic anomaly detection. ACM SIGMET- 128. Steinhaeuser K, Chawla NV, Ganguly AR. Complex
RICS Perf Eval Rev 2007, 35:109–120. networks as a unified framework for descriptive anal-
116. Chen Y, Miao D, Zhang H. Neighborhood outlier ysis and predictive modeling in climate science. Stat
detection. Expert Syst Appl 2010, 37:8745–8749. Anal Data Min 2011, 4:497–511.
117. Papadimitriou S, Kitagawa H, Gibbons PB, Falout- 129. Aggarwal CC. Outlier Analysis. Berlin/Heidelberg:
sos C. Loci: fast outlier detection using the local Springer; 2013.
correlation integral. In: Proceeding of the 19th Inter- 130. Gupta M, Gao J, Aggarwal CC, Han J. Outlier
national Conference on Data Engineering, 2003, detection for temporal data: a survey. IEEE Trans
315–326. Knowl Data Eng 2013, 25:1.
Volume 7, May/June 2015 © 2015 The Authors. WIREs Computational Statistics published by Wiley Periodicals, Inc. 247