Measuring Temporal Patterns in Dynamic Social Networks
Given social networks over time, how can we measure network activities across different timesteps with
a limited number of metrics? We propose two classes of dynamic metrics for assessing temporal evolution
patterns of agents in terms of persistency and emergence. For each class of dynamic metrics, we implement
it using three different temporal aggregation models, ranging from the most commonly used Average
Aggregation Model to more complex models such as the Exponential Aggregation Model. We argue that the
problem of measuring temporal patterns can be formulated using the Recency and Primacy effects, concepts
used to characterize human cognitive processes. Experimental results show that the way metrics
model the Recency–Primacy effect is closely related to their ability to measure temporal patterns. Furthermore, our results indicate that future network agent activities can be predicted based on historical information
using dynamic metrics. By conducting multiple experiments, we are also able to find an optimal length of
history information that is most relevant to future activities. This optimal length is highly consistent within
a dataset and can be used as an intrinsic metric to evaluate a dynamic social network.
Categories and Subject Descriptors: H.2.8 [Information Systems-Database Applications]: Data Mining
General Terms: Measurement, Algorithms, Performance
Additional Key Words and Phrases: Social network analysis, aggregating method, temporal analysis, Primacy
and Recency effects
ACM Reference Format:
Wei Wei and Kathleen M. Carley. 2015. Measuring temporal patterns in dynamic social networks. ACM
Trans. Knowl. Discov. Data. 10, 1, Article 9 (July 2015), 27 pages.
DOI: http://dx.doi.org/10.1145/2749465
1. INTRODUCTION
A social network can be modeled mathematically by a graph G = {V, E}, which consists of a set of agents V and a set of edges E connecting them. A dynamic social
network consists of a series of observations of social networks at different timesteps
{G1 , G2 , . . . , Gn}. A dynamic social network contains not only a set of relationships
between agents, but also information on how these relationships change over time.
Understanding how temporal factors affect the evolution of network structures is a
This work is part of the Dynamic Networks project at the Center for Computational Analysis of Social
and Organizational Systems (CASOS) in the School of Computer Science (SCS) at Carnegie Mellon University (CMU). This work was supported in part by the Office of Naval Research (ONR) through a MURI
N000140811186 on adversarial reasoning, a MINERVA N000141310835 on trust and media, the Defense
Threat Reduction Agency (DTRA) HDTRA11010102, and by Center for Computational Analysis of Social
and Organization Systems (CASOS). The views and conclusions contained in this document are those of the
authors and should not be interpreted as representing the official policies, either expressed or implied, of
the ONR, the DTRA, or the US government.
Authors’ addresses: W. Wei and K. M. Carley, School of Computer Science, Carnegie Mellon University,
Pittsburgh, PA; emails: {weiwei, kathleen.carley}@cs.cmu.edu.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted
without fee provided that copies are not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. Copyrights for components of this work owned
by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request
permissions from [email protected].
2015 Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 1556-4681/2015/07-ART9 $15.00
DOI: http://dx.doi.org/10.1145/2749465
ACM Transactions on Knowledge Discovery from Data, Vol. 10, No. 1, Article 9, Publication date: July 2015.
persistent research topic in social network analysis. Recently, there have been many
works focusing on temporal analysis such as link prediction [Dunlavy et al. 2011],
graph evolution [Leskovec et al. 2007; Leskovec et al. 2008], and community evolution
[Lin et al. 2009].
One important issue in the study of dynamic social networks is how to measure net-
work activity patterns across different timesteps. For example, we might be interested
in the message-sending patterns over time of a particular agent in the social network.
Such analysis can be useful for a number of reasons. (1) Understanding the temporal
patterns of agents is a critical step in abnormality detection. In an online social
network, a sharp decrease in network activities indicates potential abnormalities such
as account intrusion or difficulties accessing service. On the other hand, a sudden in-
crease might indicate, for example, the introduction of a bot-spammer. (2) Temporal
activity analysis is a keystone to the problem of network evolution patterns. Future
network activities can be predicted based on temporal patterns learned from the exist-
ing networks. (3) Repeated temporal patterns suggest certain temporal invariants in
the dynamic social network. Those temporal factors are critical to the understanding
of the periodic nature of many dynamic social networks.
In this article, we propose two types of dynamic metrics to assess temporal patterns
in dynamic social networks – Persistence and Emergence. Persistence, which aggre-
gates agents’ activities over time, is a class of metrics to evaluate the extent to which
an agent maintains its level of activity. Emergence, which compares activity in two
adjacent timesteps, is a class of metrics to evaluate the suddenness of change in net-
work activities. Each of these dynamic metrics relies on a specific temporal aggregation
model, which synthesizes a series of temporal network activities into single numerical
representations. Here, the network activities are measured per timestep using static
network metrics such as degree centrality, clustering coefficients, or authority cen-
trality. They will be aggregated through one of the temporal aggregation models and
form either a Persistence or an Emergence metric. Clearly there can be many different
models to aggregate historic data and each has its advantages and disadvantages. In
this article, we evaluate three particular temporal aggregation models: average, linear,
and exponential. The Average Model aggregates temporal information by assigning an
equal weight to each timestep. The Linear Model assigns linearly increased weights,
and the Exponential Model assigns exponentially increased ones to each timestep. We
will see how those aggregation models affect the performance of the models later in the
article.
Since each dynamic metric is an evaluation of temporal patterns, we can use it to
approximate activity patterns in the future. We use a regression model to evaluate
the performance of each metric by using it to predict future network activities. Exper-
imental results show that the performances of the metrics are closely related to the
ways that they replicate Recency–Primacy effects. From a human cognition perspec-
tive, Primacy is the extent to which initial information dominates the perception. In
contrast, Recency is the extent to which the most current information dominates the
perception.
The contributions of this article are threefold. First, we provide two efficient dynamic metrics to effectively measure temporal patterns in social networks. Each metric
measures one of two important aspects of temporal patterns. Second, we build a systematic approach to assess the performance of our metrics. The approach is based on
the fact that dynamic metrics are both a representation of historical activity patterns
and an approximation of future activity. Third, we provide statistical evidence that
dynamic social networks have intrinsic temporal invariants. Activity patterns in social
networks are highly dependent on historical activity information within a specific time
frame.
2. BACKGROUND
2.1. Evolution of Dynamic Social Networks
Social networks are in a constant state of flux [Palinkas et al. 1998; Romney 1988;
Sampson 1968; Sanil et al. 1995]. Unsurprisingly, then, studies of network dynam-
ics abound. Early studies in this area suggested that social networks are generated
according to random graph models with a fixed probability p that governs the rate at which new links are added to the network [Erdős and Rényi 1960]. The fact that
this random graph model does not exhibit clustering behaviors, which are generally
observed in real social network data, resulted in more realistic network models such
as the small world model [Watts and Strogatz 1998] being developed. In a small world
network, the distance between two arbitrary agents is proportional to the logarithm of
the number of agents in the network. Recent findings on large-scale social networks
data [Faloutsos et al. 1999; Newman 2001b, 2004] suggest that the degree distribution
of networks may be fairly accurately modeled as a power law [Albert and Barabási
2002; Barabási and Albert 1999; Newman 2005]. A power-law network can be generated by a process called preferential attachment [Newman 2001a], which models
an agent's ability to attract links as proportional to the number of its already
established links. However, in real social networks, links grow and decay, leading to the
need for incremental metrics that handle both positive and negative network changes
[Kas et al. 2015]. We design temporal aggregation metrics that account for both tie
growth and atrophy.
have developed enable the assessment of change over different time periods. However,
in contrast to Kas et al., our approach accounts for burstiness by supporting an averaging
procedure.
3.2. Emergence
Emergence is a type of metric used to evaluate the changes in activity over time. We
use Ei,t to denote the Emergence for agent i at time t. In Equation (2), the Emergence
is defined in two different cases. At the very beginning timestep where t = 1 or at any
timestep when both Pi,t and Pi,t−1 are zero, Emergence is equal to 0, which indicates a
neutral change. When t is larger than 1 and max{Pi,t, Pi,t−1} ≠ 0, it is defined to be the
normalized change of Persistence at the most recent timesteps. Depending on whether
the Persistence contains negative values, the normalization term is defined differently.
For Persistence that is guaranteed to be positive, Ni,t = max{Pi,t, Pi,t−1}. If the positive
restriction does not hold, the normalization factor is Ni,t = |Pi,t| + |Pi,t−1|, which is a
weaker bound of the numerator:

$$E_{i,t} = \begin{cases} 0 & t = 1 \text{ or } \max\{P_{i,t}, P_{i,t-1}\} = 0 \\[4pt] \dfrac{P_{i,t} - P_{i,t-1}}{N_{i,t}} & t > 1 \end{cases} \qquad (2)$$

where Ni,t = max{Pi,t, Pi,t−1} for Persistence that contains only positive values and
Ni,t = |Pi,t| + |Pi,t−1| when Persistence might contain negative values.
To see why the normalization factors make sense, consider Equation (3) as the definition of E^{+}_{i,t}, which is equivalent to Equation (2) under the assumption that Persistence
is guaranteed to be positive. Here we separate the conditions into three parts, and
Ni,t = max{Pi,t, Pi,t−1} is substituted for its corresponding value under each condition.
Emergence is positive when Pi,t > Pi,t−1 and negative when Pi,t < Pi,t−1. It also follows
naturally that if Pi,t and Pi,t−1 are the same or t = 1, the Emergence value is
0. Because of the normalization, Emergence always ranges from −1 to 1. When Emergence is close to 1, it suggests a radical increase in Persistence over time. This can be
the result of a sudden increase in activity measures from 0 to an arbitrary amount.
When the Emergence is close to −1, it suggests that the agent has a radical decrease
in its Persistence. If an agent remains at a stable amount of activity over time, it will
have an Emergence of approximately 0, which indicates a neutral emergent pattern:
$$E^{+}_{i,t} = \begin{cases} 1 - \dfrac{P_{i,t-1}}{P_{i,t}} & P_{i,t} > P_{i,t-1} \\[4pt] 0 & t = 1 \text{ or } P_{i,t} = P_{i,t-1} \\[4pt] \dfrac{P_{i,t}}{P_{i,t-1}} - 1 & P_{i,t} < P_{i,t-1} \end{cases} \qquad (3)$$
When Persistence may contain negative values, we have a slightly different definition
of Emergence, called E^{±}_{i,t}, in Equation (4). As in the positive case, the Emergence is
positive when Pi,t > Pi,t−1 and vice versa. It is also easy to see that
|Pi,t − Pi,t−1| ≤ |Pi,t| + |Pi,t−1|, which makes −1 ≤ E^{±}_{i,t} ≤ 1:
$$E^{\pm}_{i,t} = \begin{cases} \dfrac{P_{i,t} - P_{i,t-1}}{|P_{i,t}| + |P_{i,t-1}|} & P_{i,t} > P_{i,t-1} \\[4pt] 0 & t = 1 \text{ or } P_{i,t} = P_{i,t-1} \\[4pt] \dfrac{-(P_{i,t-1} - P_{i,t})}{|P_{i,t}| + |P_{i,t-1}|} & P_{i,t} < P_{i,t-1} \end{cases} \qquad (4)$$
To summarize, Equations (3) and (4) are definitions of Emergence under different
assumptions of the data. However, those two definitions can be combined into one using
Equation (2), which is the formal definition of Emergence.
One thing that makes Emergence different from Persistence is that it carries no
magnitude information because of the normalization. We believe that normalization
makes Emergence more useful in illustrating agents' temporal patterns over time.
Consider two agents i and j who have Persistence Pi,t = β · a and Pj,t = γ · a at time t
and Pi,t−1 = β · b and Pj,t−1 = γ · b at time t − 1. When β and γ are significantly different
(e.g., β is significantly larger than γ), those two agents exhibit fundamentally different
behaviors in terms of Persistence because of the differences in the magnitudes of their
activities. If we assume that those Persistence values are positive and a > b, applying Equation (3)
makes the magnitude factors β and γ disappear. In that case, Ei,t = Ej,t = 1 − b/a.
In other words, agents i and j have the same Emergence in spite of their difference in
magnitude demonstrated in Persistence.
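The two-case definition of Emergence in Equation (2) is straightforward to compute. The following sketch (function name ours, not part of the paper; the t = 1 case is assumed to be handled by the caller) evaluates Emergence for a pair of consecutive Persistence values and reproduces the magnitude-invariance example above:

```python
def emergence(p_t, p_prev, signed=False):
    """Emergence for one agent at time t (Eq. (2)), given Persistence at t and t-1."""
    if p_t == 0 and p_prev == 0:
        return 0.0  # no activity at all: neutral change
    # Normalization: tighter bound for nonnegative Persistence,
    # weaker bound when Persistence may contain negative values.
    n = (abs(p_t) + abs(p_prev)) if signed else max(p_t, p_prev)
    return (p_t - p_prev) / n

# Two agents whose Persistence differs only by a magnitude factor
# (beta = 5, gamma = 1, a = 4, b = 2) get identical Emergence: 1 - b/a = 0.5.
print(emergence(5 * 4, 5 * 2))  # 0.5
print(emergence(1 * 4, 1 * 2))  # 0.5
```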
3.3. Activity Measures
At each network snapshot t, the activity measure mi,t represents the evaluation of
activity of node i. Common activity measures include centrality measures, cluster-
ing coefficients, and eigenvector-based measures [Bonacich 1972; Watts and Strogatz
1998]. Contrary to Persistence and Emergence, which are dynamic network metrics,
the activity measures can only evaluate the activities within a specific timestep. In
some situations, they are referred to as activity metrics.
Our Persistence and Emergence metrics can be applied to a wide variety of activ-
ity metrics. In fact, there are no restrictions on the types of activity metrics that our
method can be utilized in conjunction with. If the activity metrics are guaranteed to be
non-negative, the corresponding Persistence will also be non-negative. Some examples
include degree centrality and betweenness centrality. When calculating Emergence, we
can choose the normalization factor as max{Pi,t , Pi,t−1 } to achieve a tighter bound. If the
activity metrics are not guaranteed to be nonnegative, the corresponding Persistence
may contain negative values. In this case, we have to choose the more relaxed nor-
malization factor |Pi,t | + |Pi,t−1 |. Such activity metrics include eigenvector centrality. In
some cases, one can use linear transformation techniques (e.g., to divide by its minimal
value) to convert those metrics into nonnegative measures [Wei et al. 2011].
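As an illustration, degree centrality, one of the nonnegative activity measures mentioned above, can be computed per snapshot to build the activity series m_i that Persistence and Emergence then aggregate. This is a minimal sketch on hypothetical toy snapshots:

```python
def degree_centrality(edges, nodes):
    """Degree centrality per node for one snapshot: degree / (n - 1)."""
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    n = len(nodes)
    return {v: d / (n - 1) for v, d in deg.items()}

# Activity vector (m_{i,1}, ..., m_{i,t}) for agent "a" over two snapshots.
snapshots = [
    [("a", "b"), ("a", "c")],  # t = 1: a interacts with b and c
    [("a", "b")],              # t = 2: a interacts only with b
]
nodes = ["a", "b", "c"]
m_a = [degree_centrality(e, nodes)["a"] for e in snapshots]
print(m_a)  # [1.0, 0.5]
```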
4. TEMPORAL AGGREGATION MODELS
Temporal aggregation models are essential components of the dynamic metrics we
defined in Section 3. In this section, we describe three temporal aggregation
models and illustrate how they relate to the dynamic metrics.
4.1. Aggregation Functions
The aggregation function agg(·) takes a variable length vector parameter m as input
and returns a single numeric evaluation of the temporal information contained in m.
The temporal activity of a specific agent i from time 1 to time t is denoted as a vector
mi = (mi,1 , . . . , mi,t ). We then apply the aggregation function agg(·) on this vector to
assess Persistence and Emergence. Given the fact that the length of temporal history
varies in the networks, the aggregation function should be adaptable to variable input
length.
For an input vector m ∈ R^d, we define the aggregation function to be the linear
combination of the input variables, as in Equation (5):

$$\mathrm{Agg}(m) = \alpha_1 m_1 + \alpha_2 m_2 + \cdots + \alpha_{d-1} m_{d-1} + \alpha_d m_d = \alpha \cdot m \qquad (5)$$

with regularization condition $\sum_{i=1}^{d} \alpha_i = 1$.
From Equation (5), we can see that the aggregation function takes all the values in
the activity vector into account. A regularized vector α, which we call the strength
vector, is used to tune the activity vector m so that only the information in the desired
timesteps are counted toward the value of the aggregation function. The sum of the
strength vector over the time range of the activity measure must be equal to 1. This is
referred to as the regularization condition of the strength vector. In some cases where
the dimension of m is very large, a good temporal aggregation model will assign large αi
to timesteps which are of the most interest to the analysis of the temporal assessments
and assign small αi to the timesteps that are less important. The temporal aggregation
models that we will discuss later define mechanisms to generate the values of strength
vector α according to the length of the activity measure. Note that the values of strength
vector α only depend on the temporal model and the dimension of the input vector d.
Once the temporal model and the dimension of the input vector are determined, the
strength vector will remain fixed regardless of what the exact values of the input vector
m are.
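In code, Equation (5) is simply a dot product between the strength vector and the activity vector, with the regularization condition checked explicitly. A sketch; the two strength vectors shown correspond to equal and linearly increasing weights:

```python
def aggregate(m, alpha):
    """Agg(m) = alpha . m (Eq. (5)); alpha must satisfy sum(alpha) == 1."""
    assert len(alpha) == len(m)
    assert abs(sum(alpha) - 1.0) < 1e-9  # regularization condition
    return sum(a_i * m_i for a_i, m_i in zip(alpha, m))

m = [2.0, 4.0, 6.0]
print(aggregate(m, [1/3, 1/3, 1/3]))  # equal weights: the plain average, 4.0
print(aggregate(m, [1/6, 2/6, 3/6]))  # linearly increasing weights, ~4.67
```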
A good temporal aggregation model should satisfy the following requirements:
(1) The model should come up with a scalable algorithm to dynamically calculate the
result of an aggregation function based on input vector m. For applications such as
abnormality detection in large datasets, it might require the algorithm to respond
in real time.
(2) The aggregation models should be temporally informative and meaningful to Per-
sistence and Emergence metrics. They should achieve good generalizations of tem-
poral trends and provide predictive insight.
(3) A tunable model parameter should be provided to switch between Recency and
Primacy effect. When the model parameter favors Recency effect, αi on the most
recent timesteps should be large. On the other hand, when the model parameter
favors the Primacy effect, αi on the distant past should be increased.
We fit the aggregation function into Equations (1) and (2) to get the Persistence and
Emergence for Average Aggregation Model in Equations (7) and (8), respectively:
$$P^{A(w)}_{i,t} = \frac{\sum_{k=t-\min\{t,w\}+1}^{t} m_{i,k}}{\min\{t,w\}} \qquad (7)$$

$$E^{A(w)}_{i,t} = \begin{cases} 0 & t = 1 \text{ or } P^{A(w)}_{i,t-1} = P^{A(w)}_{i,t} \\[4pt] \dfrac{P^{A(w)}_{i,t} - P^{A(w)}_{i,t-1}}{N^{A(w)}_{i,t}} & t > 1 \end{cases} \qquad (8)$$

where N^{A(w)}_{i,t} = max{P^{A(w)}_{i,t−1}, P^{A(w)}_{i,t}} if Persistence is non-negative and
N^{A(w)}_{i,t} = |P^{A(w)}_{i,t−1}| + |P^{A(w)}_{i,t}| otherwise.
One of the issues in calculating the metrics is computational complexity. Fortunately,
both the Persistence and Emergence in Average Aggregation Model can be calculated
recursively based on values at the previous timestep. Equation (9) illustrates a recur-
sive method to calculate Persistence at time t based on Persistence at time t − 1 and
the activity measure at time t. The basic idea here is that we first get the sum of the
previous activity measures by multiplying P^{A(w)}_{i,t−1} by its window size. If 1 < t < w, the
window size is t − 1. If t ≥ w, the window size is w. After we get the sum, depending on
how many terms are already in the sum, we might need to subtract the oldest activity
measure from the sum. If t ≥ w, we subtract m_{i,t−w} from the sum because this is the
element that is discarded as the window moves one step forward. If 1 < t < w, we do not
need to subtract anything because the new window will not discard any elements. If t = 1, then the calculation of Persistence degenerates into the base case. Only
O(1) modifications are needed to calculate Persistence at the current timestep based on
the result at the previous timestep. The calculation of Emergence can also be done in
O(1) as a byproduct of the Persistence calculation which is illustrated in Equation (8),
$$P^{A(w)}_{i,t} = \begin{cases} m_{i,1} & t = 1 \\[4pt] \dfrac{P^{A(w)}_{i,t-1} \cdot (t-1) + m_{i,t}}{t} & 1 < t < w \\[4pt] \dfrac{P^{A(w)}_{i,t-1} \cdot w - m_{i,t-w} + m_{i,t}}{w} & t \ge w \end{cases} \qquad (9)$$
Algorithm 1 is the pseudocode for calculating Persistence and Emergence recursively
in the Average Aggregation Model. The algorithm takes four input variables: the current
timestep t, the Persistence at time t − 1, P_{t−1}, the activity measure at time t, m_t, and the
window size w. The algorithm returns Persistence and Emergence at time t. Based on
the analysis of recursive calculation, the algorithm scales in O(1) in terms of running
time when activity metrics are calculated in advance.
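Since the pseudocode of Algorithm 1 is not reproduced here, the recursion of Equation (9) can be sketched in Python as follows (function name ours; we fold the t = w boundary into the growing-window branch so that m_{t−w} is only accessed once a full window of w elements exists):

```python
def persistence_avg(m, w):
    """Persistence under the Average Aggregation Model, via the
    O(1)-per-step recursion of Eq. (9). `m` holds m_1..m_T in order."""
    p = []
    for t, m_t in enumerate(m, start=1):
        if t == 1:
            p.append(m_t)                              # base case
        elif t <= w:
            p.append((p[-1] * (t - 1) + m_t) / t)      # growing window
        else:
            # full window: m_{t-w} drops out, m_t enters
            p.append((p[-1] * w - m[t - w - 1] + m_t) / w)
    return p

m = [1.0, 2.0, 3.0, 4.0, 5.0]
print(persistence_avg(m, w=3))  # [1.0, 1.5, 2.0, 3.0, 4.0]
```

The last two values are simply the averages of the three most recent activity measures, confirming the recursion against the direct definition in Equation (7).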
4.3. Linear Aggregation Model
One serious drawback of the Average Aggregation Model is the lack of differences
between timesteps within a window. In the Linear Aggregation Model, we assign different strengths to the values of the input vector m based on how close each is to the
most recent timestep.
In the Linear Aggregation Model, the strengths increase linearly from the distant
timesteps to the recent timesteps. For the Linear Aggregation Model with window size
w, we can get the strength vector α ∈ R^w = (μ, 2μ, ..., (w − 1)μ, wμ). Using the regularization condition in definition (5), we can solve μ = 2/((1 + w)w). Substituting it back into the
strength vector, we find that α = (1, 2, ..., w − 1, w) · 2/((1 + w)w). Using this strength vector,
we are left with Equation (10), which is the formal definition of the aggregation function
of the Linear Aggregation Model. Similar to the aggregation function of the Average
Aggregation Model, we divide the definition into two cases: If the length of input vector
d is less than the window size, then the window is shrunk into the actual dimension
of the data. Otherwise, the aggregation function will consider the input information
that falls into the window. The window is always aligned with the current timestep and
anything outside the window will be discarded. The model represents more of a Recency
effect when the window size is small. When the window size is large, the model puts
more weight on historical information and better represents a Primacy effect:
$$\mathrm{Agg}^{L(w)}(m \in \mathbb{R}^d) = \begin{cases} \dfrac{2}{(1+d)d} \displaystyle\sum_{i=1}^{d} i \cdot m_i & d < w \\[8pt] \dfrac{2}{(1+w)w} \displaystyle\sum_{i=d-w+1}^{d} (i-d+w) \cdot m_i & d \ge w \end{cases} \qquad (10)$$
Equations (11) and (12) are the definitions of Persistence and Emergence of Linear
Aggregation Model by substituting the aggregation function. Unfortunately, there
is no recursive method to calculate the dynamic metrics in the Linear Aggregation
Model. The calculation of both Persistence and Emergence must be made directly from
the definition,
$$P^{L(w)}_{i,t} = \frac{2}{(1+\min\{t,w\})\min\{t,w\}} \sum_{k=t-\min\{t,w\}+1}^{t} (k-t+\min\{t,w\}) \cdot m_{i,k} \qquad (11)$$
$$E^{L(w)}_{i,t} = \begin{cases} 0 & t = 1 \text{ or } P^{L(w)}_{i,t-1} = P^{L(w)}_{i,t} \\[4pt] \dfrac{P^{L(w)}_{i,t} - P^{L(w)}_{i,t-1}}{N^{L(w)}_{i,t}} & t > 1 \end{cases} \qquad (12)$$

where N^{L(w)}_{i,t} = max{P^{L(w)}_{i,t−1}, P^{L(w)}_{i,t}} if Persistence is non-negative and
N^{L(w)}_{i,t} = |P^{L(w)}_{i,t−1}| + |P^{L(w)}_{i,t}| otherwise.
Algorithm 2 is the pseudocode to calculate the dynamic metrics in the Linear Aggregation Model. The computational complexity of this algorithm is O(w), which is proportional to the window size.
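A direct (non-recursive) implementation of the Linear Aggregation Model's aggregation function runs in O(w) per call, matching the complexity of Algorithm 2. A sketch with hypothetical toy values (function name ours):

```python
def agg_linear(m, w):
    """Aggregation function of the Linear Aggregation Model (Eq. (10)).
    Weights grow linearly toward the most recent timestep; the window
    shrinks to the history length when fewer than w observations exist."""
    d = len(m)
    k = min(d, w)                 # effective window size
    window = m[d - k:]            # the last k observations
    mu = 2.0 / ((1 + k) * k)      # solves the regularization condition
    return sum((j + 1) * mu * x for j, x in enumerate(window))

print(agg_linear([3.0, 6.0], w=5))       # short history: (1*3 + 2*6) / 3 = 5.0
print(agg_linear([9.0, 3.0, 6.0], w=2))  # window keeps only (3, 6): 5.0
```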
Equation (14) is the corresponding strength vector of the aggregation function. One
can easily validate that the sum of the strength vector is always equal to 1, which
meets the regularization condition in Equation (5). The strength on the first timestep
is slightly different from the subsequent ones. However, all the values in the strength
vector are dominated by β^{d−j}, which is very large at recent timesteps but decays exponentially at earlier timesteps when β ranges from 0 to 1. If we change the transmission
parameter to 0, the aggregation function will only consider the information in the most
recent timestep, that is, α^{E(β)}(d) = 1 and α^{E(β)}(j) = 0 for j < d. This means no history
information is transmitted into the dynamic metrics, and is an extreme practice of the
Recency effect. However, if we increase the transmission parameter, we see that some
of the strength of the first timestep is moved to other timesteps. This represents a
blend of the Recency and Primacy effects. If the transmission parameter is set to 1,
then only the information in the first time period will have an impact on the dynamic
metrics, while all other timesteps will have a strength of 0. This represents a
pure Primacy effect where only the first information matters:
$$\alpha^{E(\beta)}(j) = \begin{cases} \beta^{\,d-1} & \text{if } j = 1 \\ (1-\beta)\,\beta^{\,d-j} & \text{otherwise} \end{cases} \qquad (14)$$
Equation (15) is the definition of Persistence under the Exponential Aggregation
Model. The Persistence on the first timestep is just the activity measure itself. On the
subsequent timesteps, it is defined recursively based on Persistence at time t − 1 and
the activity measure at time t. The transmission parameter β plays an important role
in controlling the exponential components. One can easily validate that the Persistence
defined in Equation (15) has the same aggregation function defined in Equation (13)
as well as the same strength vector defined in Equation (14):

$$P^{E(\beta)}_{i,t} = \begin{cases} m_{i,t} & \text{if } t = 1 \\ \beta\, P^{E(\beta)}_{i,t-1} + (1-\beta)\, m_{i,t} & \text{otherwise} \end{cases} \qquad (15)$$
Equation (16) is the Emergence of the Exponential Aggregation Model. Similar to the
Emergence definitions in other aggregation models, the Emergence is defined based on
the Persistence at t and t − 1:
$$E^{E(\beta)}_{i,t} = \begin{cases} 0 & t = 1 \text{ or } P^{E(\beta)}_{i,t-1} = P^{E(\beta)}_{i,t} \\[4pt] \dfrac{P^{E(\beta)}_{i,t} - P^{E(\beta)}_{i,t-1}}{N^{E(\beta)}_{i,t}} & t > 1 \end{cases} \qquad (16)$$

where N^{E(β)}_{i,t} = max{P^{E(β)}_{i,t−1}, P^{E(β)}_{i,t}} if Persistence is nonnegative and
N^{E(β)}_{i,t} = |P^{E(β)}_{i,t−1}| + |P^{E(β)}_{i,t}| otherwise.
Algorithm 3 is the pseudocode to calculate Persistence and Emergence in the Exponential Aggregation Model. The Persistence and Emergence in the Exponential Aggregation
Model can be calculated recursively. The computational complexity is O(1).
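The recursion of Equation (15) is a standard exponentially weighted update; a short sketch (function name ours) showing the two extreme settings of the transmission parameter:

```python
def persistence_exp(m, beta):
    """Persistence under the Exponential Aggregation Model (Eq. (15)):
    P_1 = m_1, then P_t = beta * P_{t-1} + (1 - beta) * m_t."""
    p = [m[0]]
    for m_t in m[1:]:
        p.append(beta * p[-1] + (1 - beta) * m_t)
    return p

# beta = 0 keeps only the newest observation (pure Recency);
# beta = 1 keeps only the first one (pure Primacy).
print(persistence_exp([5.0, 1.0, 2.0], beta=0.0))  # [5.0, 1.0, 2.0]
print(persistence_exp([5.0, 1.0, 2.0], beta=1.0))  # [5.0, 5.0, 5.0]
```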
One thing worth mentioning is the possibility of converting the transmission pa-
rameter to an equivalent window parameter in the Exponential Aggregation Model.
In our experiment section, we will compare the results across different aggregation
models. For the average and linear models, it is not difficult to compare them because
both of them work on the same parameter, which is the window parameter. For the
exponential model that uses the transmission parameter, however, it is hard to com-
pare it against other models under the same scale. One way to solve this problem
is to use an equivalent variable that can be calculated from the transmission parameter
to represent the window size parameter. In Equation (17), we define the Window Pa-
rameter of the transmission parameter β, λ (β, T ) to be the shortest distance from a
specific timestep to the current timestep T so that the sum of the strength vector from
the current timestep to this specific timestep is larger than 0.99. Since the sum of all
the elements in the strength vector is always 1, this ensures that the majority of the
timesteps that will have impacts to the aggregation function will be selected into the
window parameter,
$$\lambda(\beta, T) = \left\{ k \;\middle|\; \sum_{j=T-k}^{T} \alpha^{E(\beta)}(j) \ge 0.99 \ge \sum_{j=T-k+1}^{T} \alpha^{E(\beta)}(j) \right\} \qquad (17)$$
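Equation (17) can be evaluated numerically by scanning k upward until the cumulative strength first crosses the 0.99 threshold, using the strength vector of Equation (14). A sketch (function names ours):

```python
def strength(j, d, beta):
    """Strength vector of the Exponential Aggregation Model (Eq. (14))."""
    return beta ** (d - 1) if j == 1 else (1 - beta) * beta ** (d - j)

def equiv_window(beta, T, threshold=0.99):
    """Window parameter lambda(beta, T) of Eq. (17): the smallest k whose
    last k+1 strengths sum to at least `threshold`."""
    for k in range(T):
        if sum(strength(j, T, beta) for j in range(T - k, T + 1)) >= threshold:
            return k
    return T - 1

print(equiv_window(0.0, 20))  # 0: pure Recency, only the newest timestep counts
print(equiv_window(0.5, 20))  # 6: the partial sums 1 - 0.5^(k+1) first exceed 0.99 at k = 6
```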
5. DATASET
We use four different datasets to validate the dynamic metrics: the Newcomb fraternity
dataset, the Enron email dataset, the DBLP (i.e. DBLP computer science bibliography
hosted at http://dblp.uni-trier.de) co-authorship dataset, and the “Arab Spring” Twitter
dataset. Each dataset has several snapshots and a set of agents who have a series of
network activities over time. In the Enron, DBLP, and Arab Spring datasets, there is a
significant number of agents who have very little activity over time. We applied a data
Table I.
Dataset Nodes Snapshots
Newcomb 17 15
Enron 946 34
DBLP 1981 35
Arab Spring 4825 18
The general idea of the experiment is to first calculate the dynamic metrics and then
use a regression model to find statistical correlations between the dynamic metrics at
time t and the true activity metrics at time t + 1. We use lasso regression, a linear
regression model, to train on the dynamic metrics and use the fitted model
to predict future activity measures. We use a linear regression model instead of more
complicated models for the following reasons: (1) a simpler and less powerful regression
model leaves the complexity of the prediction task to the dynamic metrics, which is the
main focus of our article. If the regression model is simple, we can easily distinguish
between different dynamic metrics. However, if we choose a complicated regression
model, chances are that those dynamic metrics will have virtually no performance
differences because the regression model is powerful enough to generate a model that
fits the input/output features. (2) For the dynamic metrics, we want them to meet the
human expectation of Persistence and Emergence so that when either of those two
metrics changes, people will immediately realize what happened to the network. If
we use a more complicated model, it will likely pick up an aggregation model that
requires a less intuitive way to interpret Persistence and Emergence, which makes
the metrics less useful in detecting network changes. (3) The lasso regression model is
implemented in most statistical software packages. This increases the repeatability of
our analysis presented in the article. We want to emphasize that the regression models
in the experiments are used for the purpose of comparing different metrics only. They
are only used to illustrate the statistical correlations between the metrics and the
activity measure and should not be considered to be part of the metrics.
We first generate activity metrics at each timestep for each agent in the network.
We then calculate Persistence and Emergence using the algorithms proposed in
Section 4; each aggregation model has a different way of calculating those metrics.
We record the running time of calculating these dynamic metrics. After that, we
start the training process of the regression models. We build two versions of the
experiments: one uses only Persistence and the other uses both Persistence and
Emergence. We complete the same experiments on four different datasets
illustrated in Section 5. Figure 1 illustrates the general procedure of the experiments.
We do not build a regression model using only Emergence, because Emergence alone
indicates only the trend and not the magnitude of the activities. Without the
magnitude, the performance of the regression model would be compromised.
We implemented lasso regression using the R glmnet package [Friedman et al. 2010].
The lambda parameter is set to 1e−5.
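As a rough sketch of this regression step on hypothetical data (the paper uses R's glmnet; scikit-learn's Lasso below is a comparable coordinate-descent implementation, and all variable names are ours):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_agents = 100

# Hypothetical dynamic metrics at timestep t (feature variables).
persistence = rng.random(n_agents)
emergence = rng.random(n_agents)
X = np.column_stack([persistence, emergence])

# Hypothetical activity measure at timestep t + 1 (target variable).
y = 0.8 * persistence + 0.5 * emergence + rng.normal(0, 0.01, n_agents)

model = Lasso(alpha=1e-5)  # plays the role of the lambda parameter
model.fit(X, y)
mse = float(np.mean((model.predict(X) - y) ** 2))
```

With such a small penalty, the fit is close to ordinary least squares, which keeps the comparison focused on the metrics rather than on the power of the regressor.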
To illustrate the feature variables and target variables of the regression, consider
Figure 2, where data in a dynamic network are organized into a matrix whose horizontal
axis represents time and whose vertical axis represents the agents in the network.
We choose a column in the matrix to be the feature variable of the regression model,
and choose one or more columns ahead of it in time to be target variables.
The feature variables are the dynamic metrics, while
the target variables are the activity measures. The regression analysis is conducted to
find statistical correlations between feature variables and target variables. In Figure 2,
green cells are feature variables and blue cells are target variables. In that situation,
we are trying to predict information that is two timesteps away using the dynamic
metrics. For each experiment, we slide the selection of the feature variable through the
time frame of the dataset to obtain results for different regression experiments under
similar parameter settings.
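The selection of feature and target columns described above can be sketched as follows (hypothetical data; the variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n_agents, n_steps = 50, 20

# Rows are agents and columns are timesteps, as in Figure 2.
metric = rng.random((n_agents, n_steps))    # dynamic metrics (features)
activity = rng.random((n_agents, n_steps))  # activity measures (targets)

validation_size = 2  # number of future timesteps used as targets

# Slide the feature column through the time frame of the dataset.
experiments = []
for t in range(n_steps - validation_size):
    X = metric[:, t]                                  # one feature column
    Y = activity[:, t + 1 : t + 1 + validation_size]  # later target columns
    experiments.append((X, Y))
```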
The parameter space varies depending on the aggregation model. For the Average
Aggregation Model and Linear Aggregation Model, the window parameter takes value
from 1 up to approximately the maximum timestep of the data. For the
Exponential Aggregation Model, however, we choose the transmission parameter to vary
evenly: 0.01, 0.02, . . . , 0.99, 1.00. In some cases, where we want to compare the models
under the same parameterization, we use the equivalent window parameter defined in
Section 4.4 to replace the transmission parameter of the Exponential Aggregation Model.
We also fix the length of the feature variable to 1 while changing the length
of the target variable to see how many future timesteps we can predict based on the
dynamic metrics. We call the length of the target variable the validation size.
Fig. 3. Comparison of aggregation model performance in DBLP dataset when validation size = 1.
Closeness centrality measures an agent's efforts to reach all other agents in the
network through shortest paths. Betweenness centrality measures the importance of
an agent by counting the fraction of shortest paths that pass through that agent.
Finally, the clustering coefficient measures to what extent agents in a network tend
to cluster with each other.
Table II illustrates the results of the lasso regression. Since lasso is a linear
regression model, fitting it on a two-dimensional input generates three estimates:
two coefficients and an intercept. For this specific experiment, we see that both
Persistence and Emergence have fairly large coefficients while the intercept is
relatively small. Emergence has slightly higher coefficients than Persistence.
In Table III, we illustrate the prediction results for the first 10 agents in the
network in one of the experiments. In this specific experiment, we choose the
Exponential aggregation model and set the transmission parameter to 0.75. We
calculated both Persistence and Emergence on the 15th timestep and ran the
regression analysis to predict the activity measure on the 16th timestep. The
dynamic metrics do a good job of generalizing the temporal patterns, and even a
linear regression model can achieve good accuracy in the prediction task: the
predicted and actual activity measures are fairly close, and the differences are
quite small for all of the agents. Similar experimental results are observed on
all other datasets and timesteps.
The analysis in Figure 3 suggests an interesting way to look at the experimental
results, which is to compare the performance of the models when parameters minimize
the prediction error. Figure 4 and Figure 5 present the analysis using this technique.
Since we have four different datasets and each of them has four different activity
measures, we have 16 different sets of Persistence and Emergence measures.
For each measure set, we conduct experiments using both Persistence and Emergence
(marked with “∗”) and only Persistence (marked without “∗”), for a total of 32
experiments. In each plot, the vertical axis represents the MSE of prediction and
the horizontal axis represents the validation size of the experiments.
For each validation size, we conducted many sets of regression experiments by varying
the model parameters. The one we report in the graph is the one that minimizes the
mean of MSE. Different colors and shapes of the points represent different aggregation
models.
From Figure 4, we can see that the Exponential model generally achieves the best
prediction accuracy, followed by the Linear and Average models. There are two reasons
for this. First, the Exponential model is more expressive in representing the
Primacy and Recency effects than the other two models. The exponential model uses a
real-valued parameter to control the level of Primacy and Recency effects, while other
models can only use a discrete window-size parameter. The real-valued parameter
gives the dynamic metrics a better chance of capturing the optimal values for regression
analysis. Second, the Exponential model features a more reasonable strength vector
than the Linear and Average models. The exponential component makes the aggregation
function focus on the most recent timesteps, which are important to the task of
prediction. Although the Linear model also has a strength vector that increases with
time, it increases too gently to capture the actual evolutionary pattern of the agents.
The Average model, which does not distinguish between recent and past information
at all, has the worst prediction accuracy and the worst ability to measure temporal
patterns.
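To illustrate the contrast between the three strength vectors, here is a minimal sketch with plausible weight forms (the exact definitions are given in Section 4; the formulas below are illustrative assumptions, not the paper's):

```python
import numpy as np

def average_weights(window):
    # Uniform strength vector: past and recent timesteps weighted equally.
    return np.ones(window) / window

def linear_weights(window):
    # Strength increases linearly toward the most recent timestep.
    w = np.arange(1, window + 1, dtype=float)
    return w / w.sum()

def exponential_weights(window, transmission):
    # Strength decays exponentially into the past; the real-valued
    # transmission parameter controls the Recency-Primacy balance.
    w = (1 - transmission) ** np.arange(window - 1, -1, -1)
    return w / w.sum()

history = np.array([0.2, 0.3, 0.1, 0.6, 0.9])  # hypothetical activity history
for weights in (average_weights(5), linear_weights(5),
                exponential_weights(5, transmission=0.75)):
    aggregated = float(weights @ history)  # weighted aggregation of history
```

The exponential vector concentrates almost all weight on the most recent timesteps, the linear vector increases only gently, and the average vector is flat, mirroring the comparison made above.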
One important observation is that the validation size plays an important role in
the prediction task. With small validation sizes, models generally achieve small errors,
since the activities to be predicted are not far from the point at which the dynamic
metrics are generated. As the validation size increases, all the models demonstrate a
certain degree of decrease in prediction accuracy. This is because certain temporal
variables are not incorporated into the network information we have. The regression
model achieves better performance on timesteps that are closer to the
feature variable. The larger the validation size is, the worse the performance of the
predictions will be.
Fig. 5. Comparison of prediction performance across validating size with confidence interval.
Another interesting observation is that the error first decreases and then increases
as the validation size grows. This indicates that the models can do a better job of
predicting activity metrics in the somewhat distant future than in the near future.
Datasets in which this phenomenon is frequently observed, such as the Enron dataset
and the Arab Spring dataset, have complex temporal dependencies in network activity.
In those datasets, a change in activity measures sees a delay before its effects on
future activities appear. In other words, activities that are this distance away in the
future have the strongest temporal correlation with the dynamic metrics and thus
achieve the best prediction accuracy. We use the term best validating distance to
indicate the validation size that achieves the best prediction error. For those
datasets whose best validating distance equals 1, the dynamic metrics have an
immediate impact on the activity measures in the next timestep.
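The best validating distance can be read off as the validation size that minimizes the error; a sketch on a hypothetical error curve:

```python
import numpy as np

# Hypothetical mean prediction errors for validation sizes 1..8; in datasets
# such as Enron the curve first decreases and then increases.
validation_sizes = np.arange(1, 9)
mse = np.array([0.40, 0.35, 0.28, 0.31, 0.37, 0.45, 0.52, 0.60])

# The best validating distance is the validation size with minimal error.
best_validating_distance = int(validation_sizes[np.argmin(mse)])
```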
Figure 5 is made using exactly the same information as Figure 4, but with each
aggregation model plotted separately. That makes it possible to show the confidence
intervals of each aggregation model under different dataset/activity measure/experimental
settings. We observe that the Exponential model has both a lower prediction error and
a tighter confidence interval than the other models, which again validates that the
Exponential model performs better than the other two. In addition, we see
that the combined use of Persistence and Emergence makes the confidence intervals
tighter by comparing experiments with and without “∗”. The two dynamic metrics
represent two different aspects of the temporal evolution patterns. Using those two
dynamic metrics together achieves a better generalization of the historical information
and hence a better prediction of future activities.
Please note that the results presented in Figure 4 and Figure 5 are asynchronous,
in that each point in the graph represents a different experiment using a different
parameter. This is because we report only the results of the model that minimizes
the prediction error for each validation size. The parameter that reaches the optimal
prediction error varies from model to model and from one validation size to another. One
interesting question to ask is how these three aggregation models behave when they
have similar parameters. We use the technique introduced in Section 4.4 to convert
the transmission parameter into an equivalent window parameter so that we can
compare the three models under the same window parameter.
Figure 6 and Figure 7 compare the experimental results under the same window
parameters. They use the same experimental results as Figure 4 and Figure 5, but are
aggregated over the validation size rather than the parameter space; as a result,
those two figures carry different information. In Figure 6, the horizontal
axis is changed from the validation size to the window parameter. Each point in the
graph represents the regression experiment that optimizes the MSE with the same
window-size parameter but a different validation size. The real-valued transmission
parameter of the Exponential model is now replaced by a discrete-valued window size
using the technique introduced in Section 4.4. This removes one of the benefits of
the Exponential model, namely the better expressiveness of a continuously varying
transmission parameter. However, even when the results are presented in this way,
the main conclusion from the previous plots remains the same: the Exponential model
performs better than the Linear and Average models across different datasets. Figure 7 is
a similar plot to Figure 6 but shows the confidence intervals. Again, using two metrics
together achieves better performance in measuring temporal patterns. The reduction
in MSE and the correspondingly tighter confidence interval of the Exponential model
are even more significant when using both metrics together as opposed to only Persistence.
One important observation from Figure 6 and Figure 7 is that the change of the
best model’s prediction error over window parameters for each dataset and activity
measure has different patterns. For some experiments such as degree centrality on
DBLP, the performance remains basically unchanged across different window sizes. For
the Arab Spring dataset, however, all centrality measures exhibit a pattern in which
the prediction errors first increase and then decrease as the window parameter increases.
This illustrates that there are potential long-distance temporal dependencies between
the dynamic metrics and the activity measures in the past. We call this distance the
best training distance, which is the difference between two window sizes that achieve
the best prediction accuracies. Regression models can achieve good performance when
the window size is small but the same prediction error can also be observed again
when the window size is large enough. The concept of the best training distance is
similar to that of the best validating distance, since both demonstrate long-distance
temporal dependencies between activity measures on different timesteps. However,
the best training distance is the distance from the dynamic metrics to the past
activity measures, while the best validating distance is the distance from the
dynamic metrics to the future activity measures. In addition, we observe that the
experiments showing the increase-then-decrease pattern in Figure 6 and Figure 7 are
those whose best validating distance is larger than 1. This indicates that experiments
showing long-distance dependencies into the past also demonstrate similar
long-distance dependencies into the future.
Fig. 7. Comparison of prediction performance across window parameter with confidence interval.
Fig. 8. Mean and confidence intervals of window parameters that achieve optimal performance.
For each experiment, we record the window parameter that minimizes the error of
prediction. We then calculate the mean and confidence intervals of those optimal
parameters for the experiments conducted on each dataset. We call the average window
parameter that optimizes the prediction error the reference window. The mean and
standard deviations of the reference window are reported in Figure 8 as well. We see
that most of the experiments have a relatively stable reference window size with a
small standard deviation. This illustrates an internal long-distance dependency
within a dataset that is independent of the prediction task. Another observation is
that the Exponential model
usually requires more data to achieve the optimal prediction error and always has a
small confidence interval. This is due to the flexibility of the model itself and its ability
to take more information in the distant past into account. For example, the Newcomb
dataset has a reference window of three-to-five weeks, which is a reasonable cycle for
a real social network. For the Arab Spring dataset, the reference window size is
between three and five months. These analyses suggest the existence of temporal
invariants in dynamic social networks.
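The reference window is simply the mean of the optimal window parameters across experiments; a sketch with hypothetical values:

```python
import numpy as np

# Hypothetical optimal window parameters from repeated experiments on one
# dataset (e.g., weeks in the Newcomb dataset).
optimal_windows = np.array([3, 4, 5, 4, 4, 3, 5, 4])

reference_window = float(optimal_windows.mean())  # the reference window
spread = float(optimal_windows.std())  # small spread suggests an invariant
```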
6.5. Running Time Analysis
One of the key considerations in practice is the running time of the metrics. As earlier
theoretical analysis suggests, both the Average and Exponential models can be
calculated recursively, so their update takes constant time. However, since there is
no recursive solution for the Linear model, its complexity is proportional to the
length of the input activity measures. We recorded the execution time of the dynamic metrics
on each of the three temporal aggregation models. In order to compare models with
different running time, we again use the technique introduced in Section 4.4 to convert
the transmission parameter of the Exponential model into the window-size parameter.
We reported the mean and confidence interval of the running time in each group. The
running time only includes the execution time of calculating dynamic metrics, which
is illustrated in Algorithms 1, 2, and 3. The running time of regression models, which
is used to evaluate our metrics, is not included.
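The complexity difference can be sketched as follows (an illustrative recursion assuming an exponential moving-average form; the paper's exact recursions are given in Algorithms 1, 2, and 3):

```python
def exponential_update(previous, new_value, transmission):
    """O(1) recursive update of the exponential aggregate: the new
    observation is blended with the running aggregate, so the full
    history never has to be rescanned."""
    return transmission * new_value + (1 - transmission) * previous

def linear_aggregate(history):
    """O(n) aggregation for the Linear model: the linearly increasing
    strength vector must be reapplied to the whole history."""
    weights = [i + 1 for i in range(len(history))]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, history)) / total

history = [0.2, 0.3, 0.1, 0.6]
agg = history[0]
for x in history[1:]:  # each new timestep costs constant work
    agg = exponential_update(agg, x, transmission=0.75)
```

This is why the Exponential and Average models show flat running-time curves in Figure 9, while the Linear model's cost grows with the window.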
Figure 9 is a comparison of the running time of different models in each of the
datasets we used in the experiments. The horizontal axis is the window parameter and
the vertical axis is the actual running time. Each point in the graph represents the
mean execution time of several sets of experiments with the same aggregation model
and parameter. Confidence intervals are also plotted for each window size. We see that
the Average and Exponential models maintain essentially constant running time as
the model parameter changes.
7. FUTURE WORK
This research lays the groundwork for many additional developments. One thing that
deserves further exploration is how to make the model parameters adaptable with the
network data. In our analysis, the model parameters are fixed prior to the analysis
and stay unchanged later on. Future work should consider building a dynamic model
that tweaks the parameters to fit the changes of incoming data adaptively. Such an
application would be particularly useful in contexts where network externalities are
changing network dynamics. Second, we choose activity measures that are used to
characterize network structure. There are many other activity measures that are not
related to network structures. Future research should consider what kind of activity
metrics could yield the most meaningful results in a dynamic setting. Third, in the
definition of Emergence, we use two different ways to normalize it so that it becomes
an indicator independent of the magnitude of network changes. More analysis might
be necessary to understand the impacts of those normalizations and
compare the normalized and non-normalized versions of Emergence. Finally, further
empirical work on the temporal dimension of social network data is critical including
an evaluation of what additional features, beyond Persistence and Emergence, are
necessary in understanding network dynamics.
8. CONCLUSIONS AND LIMITATIONS
In this article, we explored the problem of measuring temporal patterns in dynamic
social networks. We argued that the patterns of evolution could be modeled in a
manner similar to the human cognitive process, using the principles of Recency
and Primacy. We built two classes of dynamic metrics and implemented those metrics
with three different temporal aggregation models. We found that the extents to which
we model Recency and Primacy effects are closely related to the performance of these
metrics in terms of the ability to predict future network activities. In particular,
the Exponential aggregation model best captures the Recency–Primacy effect and
achieves the best performance. We also found that there exist intrinsic temporal
invariants in dynamic social networks. These temporal invariants remain similar for each
agent over time and reflect the temporal dependencies of activities between the recent
and the past. We conclude that the activities of agents in the social network are highly
predictable given the retrospective information within a fixed temporal length, and
that these patterns can be measured using our dynamic metrics.
There are several limitations to this work. For example, the work presented in
the article is based on the consistency assumption of human behavior. Although our
experimental results support this consistency, it is still possible that sporadic human
behavior might invalidate the basic model assumption in some situations.
There are many applications of this research. First, the dynamic metrics proposed
in this research can be used in anomaly detection. Using Persistence and Emergence,
we can easily identify agents who show sharp increases in network activities. In
online social network websites, accounts with abnormal activities might be spammers
or robots. The metrics proposed in this research provide a scalable method to detect
those abnormalities in large dynamic social networks. Second, the fact that
our metrics have statistical correlations to future activities suggests the possibility of
estimating agent level activities in dynamic social networks. Although our approach
cannot be used to predict the exact link changes, we can use the metrics to predict the
activity measures on those link changes. This makes our metrics useful in forecasting
the future importance of agents within a network according to the activity measures
used without knowing the exact network representations. Understanding the changes
in the activity measures, such as centralities, is key to many problems such as
recommendation systems and user impact estimation. Finally, our metrics can be used
to study the periodicity of network changes. The temporal invariants can be used to
support studies of periodicity, such as the life cycles of fashions and gossip and
how they impact the evolution of social networks.
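As an illustration of the anomaly-detection application mentioned above, here is a simple thresholding sketch on hypothetical Emergence scores (the three-sigma threshold rule is our assumption, not part of the metrics):

```python
import numpy as np

rng = np.random.default_rng(2)
n_agents = 1000

# Hypothetical Emergence scores for a population of agents.
emergence = rng.normal(0.0, 1.0, n_agents)
emergence[10] = 8.0  # a hypothetical spammer with a sharp activity increase

# Flag agents whose Emergence deviates strongly from the population.
threshold = emergence.mean() + 3 * emergence.std()
suspects = np.flatnonzero(emergence > threshold)
```

Because Persistence and Emergence can be updated in constant time per timestep, such a scan scales to large dynamic networks.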
REFERENCES
R. Albert and A. L. Barabási. 2002. Statistical mechanics of complex networks. Reviews of Modern Physics
74, 1, 49–52.
J. M. Anthonisse. 1971. The rush in a directed graph. Vol. Technical Report BN 9/71. Stichting Mathematisch
Centrum, Amsterdam.
A. L. Barabási and R. Albert. 1999. Emergence of scaling in random networks. Science 286, 5439, 509–512.
L. Bian and R. Butler. 1999. Comparing effects of aggregation methods on statistical and spatial properties
of simulated spatial data. Photogrammetric Engineering and Remote Sensing 65, 73–84.
P. Bonacich. 1972. Factoring and weighting approaches to status scores and clique identification. Journal of
Mathematical Sociology 2, 1, 113–120.
S. P. Borgatti. 2003. The Key Player Problem. In Dynamic Social Network Modeling and Analysis: Workshop
Summary and Papers. National Academies Press, 241.
K. M. Carley. 1990. Group stability: A socio-cognitive approach. Advances in Group Processes 7, 1–44.
W. Chen, A. Collins, R. Cummings, T. Ke, Z. Liu, D. Rincon, and Y. Yuan. 2011. Influence maximization in
social networks when negative opinions may emerge and propagate. In SDM. Vol. 11, 379–390.
J. Deese and R. A. Kaufman. 1957. Serial effects in recall of unorganized and sequentially organized verbal
material. Journal of Experimental Psychology 54, 3, 180–187.
D. M. Dunlavy, T. G. Kolda, and E. Acar. 2011. Temporal link prediction using matrix and tensor factoriza-
tions. ACM Transactions on Knowledge Discovery from Data 5, 2, 1–27. DOI:10.1145/1921632.1921636
P. Erdős and A. Rényi. 1960. On the evolution of random graphs. Magyar Tud. Akad. Mat. Kutató Int. Közl
5, 17–61.
M. Faloutsos, P. Faloutsos, and C. Faloutsos. 1999. On power-law relationships of the internet topology. In
ACM SIGCOMM Computer Communication Review. Vol. 29, ACM, 251–262.
L. C. Freeman. 1979. Centrality in social networks conceptual clarification. Social Networks 1, 3, 215–239.
J. Friedman, T. Hastie, and R. Tibshirani. 2010. Regularization paths for generalized linear models via
coordinate descent. Journal of Statistical Software 33, 1, 1–22.
T. Hastie, R. Tibshirani, J. Friedman, and J. Franklin. 2005. The elements of statistical learning: data
mining, inference and prediction. The Mathematical Intelligencer 27, 2, 83–85.
A. E. Hoerl and R. W. Kennard. 1970. Ridge regression: biased estimation for nonorthogonal problems.
Technometrics 12, 1, 55–67.
C. I. Hovland. 1957. The order of presentation in persuasion. Yale University Press, Inc.
M. Kas, K. M. Carley, and L. R. Carley. 2013. Incremental closeness centrality for dynamically changing
social networks. Workshop on the Semantic and Dynamic Analysis of Information Networks (SDAIN’13).
In Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis
and Mining (ASONAM), August 25–28, 2013, Niagara Falls, Canada.
M. Kas, K. M. Carley, and L. R. Carley. 2015. An incremental algorithm for updating betweenness centrality
and k-betweenness centrality and its performance on realistic dynamic social network data. Social
Network Analysis and Mining (SNAM). Springer
D. Kempe, J. Kleinberg, and E. Tardos. 2003. Maximizing the spread of influence through a social network.
In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining. ACM.
B. Klimt and Y. Yang. 2004. The enron corpus: A new dataset for email classification research. Machine
Learning: ECML 2004, 217–226.
J. Leskovec, L. Backstrom, R. Kumar, and A. Tomkins. 2008. Microscopic evolution of social networks. In
Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining, Las Vegas, Nevada, USA.
J. Leskovec, J. Kleinberg, and C. Faloutsos. 2007. Graph evolution: Densification and shrinking diameters.
ACM Transactions on Knowledge Discovery from Data 1, 1, 2. DOI: 10.1145/1217299.1217301
Y.-R. Lin, Y. Chi, S. Zhu, H. Sundaram, and B. L. Tseng. 2009. Analyzing communities and their evolutions
in dynamic social networks. ACM Transactions on Knowledge Discovery from Data 3, 2, 1–31. DOI:
10.1145/1514888.1514891
N. Miller and D. T. Campbell. 1959. Recency and primacy in persuasion as a function of the timing of speeches
and measurements. The Journal of Abnormal and Social Psychology 59, 1, 1–9.
B. B. Murdock Jr. 1962. The serial position effect of free recall. Journal of Experimental Psychology 64, 5,
482–488.
T. N. Newcomb. 1961. The acquaintance process. New York: Holt, Rinehart and Winston.
M. E. J. Newman. 2001a. Clustering and preferential attachment in growing networks. Physical Review E
64, 2, 025102.
M. E. J. Newman. 2001b. The structure of scientific collaboration networks. Proceedings of the National
Academy of Sciences 98, 2, 404–409.
M. E. J. Newman. 2004. Coauthorship networks and patterns of scientific collaboration. Proceedings of the
National Academy of Sciences of the United States of America 101, Suppl 1, 5200–5205.
M. E. J. Newman. 2005. Power laws, Pareto distributions and Zipf ’s law. Contemporary Physics 46, 5, 323–
351.
L. A. Palinkas, J. C. Johnson, J. S. Boster, and M. Houseal. 1998. Longitudinal studies of behavior and
performance during a winter at the South Pole. Aviation, Space, and Environmental Medicine 69, 1,
73–77.
A. K. Romney. 1988. Quantitative models, science and cumulative knowledge. School of Social Sciences,
University of California, Irvine.
S. F. Sampson. 1968. A Novitiate in a Period of Change: An Experimental and Case Study of Social Relation-
ships. Ph.D dissertation. Cornell University.
A. Sanil, D. Banks, and K. Carley. 1995. Models for evolving fixed node networks: Model fitting and model
testing. Social Networks 17, 1, 65–81. DOI: 10.1016/0378-8733(94)00249-a.
R. Tibshirani. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society.
Series B (Methodological) 267–288.
D. J. Watts and S. H. Strogatz. 1998. Collective dynamics of ‘small-world’ networks. Nature 393, 6684,
440–442. DOI: 10.1038/30918.
W. Wei, J. Pfeffer, J. Reminga, and K. M. Carley. 2011. Handling Weighted, Asymmetric, Self-Looped, and
Disconnected Networks in ORA. Technical Report No. CMU-ISR-11-113. Institute of Software Research,
Carnegie Mellon University Pittsburgh, PA.
O. R. Zaiane, J. Chen, and R. Goebel. 2007. DBconnect: Mining research community on DBLP data. In
Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 Workshop on Web Mining and Social Network
Analysis. ACM, 74–81.
H. Zou and T. Hastie. 2005. Regularization and variable selection via the elastic net. Journal of the Royal
Statistical Society: Series B (Statistical Methodology) 67, 2, 301–320.