Understanding Mobility in a Social Petri Dish
Understanding the statistical patterns of human mobility, predicting trajectories and uncovering the
mechanisms behind human movements1 is a considerable challenge with important practical applications to traffic
management2,3, planning of urban spaces4,5, epidemics6–9, information spreading10,11, and geo-marketing12,13.
In recent years, advanced digital technologies have provided huge amounts of data on human activities, making it
possible to extract information on human movements. For instance, observations of banknote circulation14,15,
mobile phone records16, online location-based social networks17,18, GPS location data of vehicles19, and radio
frequency identification traces1,5,20 have all been used as proxies for human movements. These studies have
provided valuable insights into several aspects of human mobility, uncovering distinct features of human travel
behaviour such as scaling laws14,21, predictability of trajectories22, and the impact of motion on disease
spreading7–9,23. However, a comparative analysis of the different works makes clear that a “unified theory” of
human mobility is still outstanding, since results, even on some very basic features of the motion, often appear
to be contradictory1. One example is the measured distribution of human trip lengths in various types of
transportation: some studies agree that mobility is generally characterized by fat-tailed distributions of trip
lengths14,21, while others report exponential or binomial forms1,5,19. The discrepancies arise from the different
mobility data sets used, where mobility is indirectly inferred from some specific human activity in a particular
context. For instance, mobile phone records typically provide location information only when a person uses the
phone21, while radio frequency identification traces like those of Oyster cards in the London subway5 only log
movements based on public transportation systems. Analyses of these data sets can then result in a possibly
biased view of the underlying mobility processes. Furthermore, most of the analyzed data sets have poor
information on how socio-economic factors influence human mobility patterns. More generally, the lack of an
all-encompassing record set with positional raw data, including complete information on the socio-economic
context and on the behaviour of all members of a human society, has so far limited the possibilities for a
comprehensive exploration of human mobility.
Here, we address the issue of mobility from a novel point of view by analyzing, with unprecedented precision,
the movements of a large number of individuals, the players of a self-developed massive multiplayer online game
(MMOG). Such online platforms provide a fascinating new way of observing hundreds of thousands of interacting
individuals who are simultaneously engaged in social and economic activities. The potential of online worlds as
large-scale “socio-economic laboratories” has been demonstrated in a number of previous studies25–28.
power-law distribution:
$$P(\Delta t) \sim \Delta t^{-\beta} \qquad (2)$$
Figure 3 | Influence of socio-economic clusters on mobility. (a) Sketch of jump patterns from a sector $i$ to
sectors within the same cluster, $j$ and $l$, and to sectors in a different cluster, $j'$ and $l'$. Although
sectors $j'$ and $l'$ have the same graph distance from sector $i$ as sectors $j$ and $l$ respectively,
transitions across cluster borders have smaller probabilities. (b) Quantitative evidence of the tendency of
players to avoid crossing borders. Red squares show the null model, i.e. the fraction of all pairs of sectors at
a given distance $d$ that lie in the same cluster. Blue circles show the fraction of measured jumps leading into
the same cluster, per distance. Coincidence of the two curves would indicate that clusters have no effect on
mobility. Clearly this is not the case: there is a strong tendency of players to avoid crossing the borders
between clusters.
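The comparison described in panel (b) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the adjacency-list input format and the four-sector toy universe are assumptions made for illustration, and graph distance is taken to be the hop count on a connected network.

```python
from collections import defaultdict, deque


def hop_distances(adjacency, source):
    """Breadth-first search: hop distance from `source` to each reachable sector."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def same_cluster_fractions(adjacency, cluster_of, jumps):
    """Return (null, observed): per-distance fractions of same-cluster pairs and jumps."""
    all_dist = {sector: hop_distances(adjacency, sector) for sector in adjacency}

    # Null model: every ordered pair of distinct sectors, grouped by distance.
    null_same, null_total = defaultdict(int), defaultdict(int)
    for i, dist in all_dist.items():
        for j, d in dist.items():
            if i != j:
                null_total[d] += 1
                null_same[d] += cluster_of[i] == cluster_of[j]

    # Observation: every recorded jump (i -> j), grouped by its distance.
    obs_same, obs_total = defaultdict(int), defaultdict(int)
    for i, j in jumps:
        if i != j:
            d = all_dist[i][j]  # assumes j is reachable from i
            obs_total[d] += 1
            obs_same[d] += cluster_of[i] == cluster_of[j]

    null = {d: null_same[d] / null_total[d] for d in null_total}
    observed = {d: obs_same[d] / obs_total[d] for d in obs_total}
    return null, observed


# Hypothetical toy universe: a chain 0-1-2-3 split into clusters A and B.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cluster_of = {0: "A", 1: "A", 2: "B", 3: "B"}
jumps = [(0, 1), (1, 0), (2, 3), (1, 2)]
null, observed = same_cluster_fractions(adjacency, cluster_of, jumps)
print(null[1], observed[1])
```

In this toy case the observed same-cluster fraction at distance 1 lies above the null value, which is the kind of gap between the two curves that the caption interprets as border avoidance.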
Figure 4 | Extracting communities from network topology and from mobility patterns. (a) The adjacency matrix A
of the universe network, (b) the matrix D of shortest path distances, and (c) the matrix M of transition counts
of player jumps. Each of the three matrices contains 400 × 400 entries, whose values are colour-coded. Sector IDs
are ordered by cluster, resulting in the block-diagonal form of the three matrices. We have used
modularity-optimization algorithms to extract community structures from the information encoded in the three
matrices. Different node colours represent the different communities found, while the 20 different colour-shaded
areas indicate the predefined socio-economic clusters as in Fig. 1. The displayed Fowlkes and Mallows index
$F \in [0, 1]$ quantifies the overlap of the detected communities with the predefined clusters. The closer F is
to 1, the better the match, see Supplementary Section S4. (d) Although the information contained in the adjacency
matrix A allows one to find 18 communities, a number close to the real number of clusters, the communities
extracted do not correspond to the underlying colour-shaded areas ($F = 0.68$). (e) Extracting communities from
the distance matrix D results in only 6 different groups ($F = 0.49$). (f) The 23 communities detected using the
transition count matrix M reproduce almost perfectly the real socio-economic clusters ($F = 0.96$), with only a
few mismatched nodes detected as additional clusters. For more measures quantifying the match of communities,
see Supplementary Table II.
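For readers unfamiliar with the index F quoted in the caption, the following Python sketch computes the standard pair-counting form of the Fowlkes and Mallows index for two partitions of the same nodes. It assumes the textbook definition, $F = TP / \sqrt{(TP + FP)(TP + FN)}$ over node pairs; the exact variant used by the authors is given in Supplementary Section S4 and may differ in detail. The function names and label lists are illustrative.

```python
from collections import Counter
from math import sqrt


def pair_count(n):
    """Number of unordered pairs that can be formed from n items."""
    return n * (n - 1) // 2


def fowlkes_mallows(labels_a, labels_b):
    """Pair-counting Fowlkes-Mallows index between two labelings of the same nodes."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same nodes")
    # Pairs grouped together in both partitions (true positives).
    together_in_both = sum(pair_count(c) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs grouped together in each partition separately.
    together_in_a = sum(pair_count(c) for c in Counter(labels_a).values())
    together_in_b = sum(pair_count(c) for c in Counter(labels_b).values())
    if together_in_a == 0 or together_in_b == 0:
        return 0.0
    return together_in_both / sqrt(together_in_a * together_in_b)


# Hypothetical example: detected communities versus predefined clusters.
detected = [0, 0, 0, 1]
predefined = ["A", "A", "B", "B"]
print(fowlkes_mallows(detected, predefined))
```

Identical partitions give F = 1 regardless of how the groups are named, which is why the index can compare detected communities with predefined clusters directly.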
comparison we considered the player transition count matrix M, shown in Fig. 4 (c), which displays a similar
block-diagonal structure as A and D, but with the qualitative difference that it contains dynamic information on
the system. Figure 4 (f) shows that community detection methods applied to the transition count matrix M reveal
almost perfectly all the socio-economic areas of the universe. This finding demonstrates that mobility patterns
contain fundamental information on the socio-economic constraints present in a social system. Therefore, a
community detection algorithm applied to raw mobility information, as the one proposed here, is able to extract
the underlying socio-economic features, which are instead invisible to methods based solely on topology. For a
detailed treatment of adopted community detection methods and measures see Supplementary Section S4,
Supplementary Table II and Supplementary Figs. 4 and 5.

A long-term memory model. In order to characterize the diffusion of players over the network, we have computed
the mean square displacement (MSD) of their positions, $\sigma^2(t)$, as a function of time. Results reported in
Fig. 5 (a) indicate that, for long times, the MSD increases as a power-law:

$$\sigma^2(t) \sim t^{u} \qquad (3)$$

with an exponent $u \approx 0.26$. This anomalous subdiffusive behaviour is not a simple effect of the topology
of the Pardus universe. In fact, as shown in Fig. 5 (b), gray stars, the simulation of plain random walks on the
same network produces a standard diffusion with an exponent $u \approx 1$ up to $t \approx 100$ days, and then a
rapid saturation effect which is not present in the case of the human players.

Insights from the previous section suggest that the anomalous diffusion behaviour might be related to the
tendency of players to avoid crossing borders. We have therefore considered a Markov model in which each walker
moves from a current node i to a node j with a transition probability $p_{ij} = m_{ij} / \sum_l m_{il}$, where
$m_{ij}$ is the number of jumps between sector i and sector j, as expressed by the transition count matrix M of
Fig. 4 (c). The probabilities $p_{ij}$ are the entries of the transition probability matrix P, which contains
all the information on the day-to-day movement of real players, such as the preference to move within clusters,
the length distribution of jumps, as well as the tendency to remain in the same sector. Despite this detailed
amount of information used (the matrix P has 160,000 elements), the Markov model fails to reproduce the
asymptotic behaviour of the MSD, see magenta diamonds in Fig. 5 (b). Since the model considers only the position
of the individual at its current time to determine its position at the following time, deviations from empirical
data appear presumably due to the presence of higher-order memory effects37. For this reason we have considered
the recently proposed preferential return model21 which incorporates a strong memory feature. The model is based
on a reinforcement mechanism which takes into account the propensity of individuals to return to locations they
visited frequently before. This mechanism is able to reproduce the observed tendency of individuals to spend most
of their time in a small number of locations, a tendency which is also prevalent in the mobility behaviour of
Pardus players (see Supplementary Fig. 3). However, the implementation of the preferential return model on the
Pardus universe network is not able to capture the scaling patterns of the MSD, as shown in Fig. 5 (b). The
reason is that in the model the probability for an individual to move to a given location does not depend on the
current location, nor on the order of previously visited locations. Instead, we observe that in reality
individuals tend to return with higher probability to sectors they have visited recently and with lower
probability to
Figure 5 | Diffusion scaling in empirical data and simulated models. (a) The mean square displacement (MSD) of
the positions of players follows a power relation $\sigma^2(t) \sim t^{u}$ with a subdiffusive exponent
$u \approx 0.26$. The inset shows the average probability $P_{\mathrm{ret}}(\tau)$ for a player to return after
$\tau$ jumps to a sector previously visited. The curve follows a power law
$P_{\mathrm{ret}}(\tau) \sim \tau^{-\alpha}$ with an exponent of $\alpha \approx 1.3$ and an exponential cutoff.
We report, for comparison, (b) the MSD for various models of mobility. For random walkers and in the case of a
Markov model with transition probability $p_{ij} = m_{ij} / \sum_l m_{il}$ we observe an initial diffusion with
an exponent $u \approx 1$ and then a rapid saturation of $\sigma^2(t)$, due to the finite size of the network. A
preferential return model also shows saturation and does not fit the empirically observed scaling exponent u.
Conversely, a model with long-time memory (Time Order Memory) reproduces the exponent almost perfectly. Such a
model makes use of the empirically observed $P_{\mathrm{ret}}(\tau)$ while the Markov model and the preferential
return model over-emphasize preferences to locations visited long ago and do not recreate the empirical curve
well. Curves are shifted vertically for visual clarity.
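The MSD curves and the exponent u discussed in this caption can be estimated along the following lines. This is a hedged sketch rather than the authors' procedure: the function names are illustrative, `distance` stands for whatever sector-to-sector distance is used (in the toy example simply the absolute difference of positions on a line), and the exponent is taken from a plain least-squares fit in log-log space over all supplied lags, whereas the paper reports the scaling for long times.

```python
from math import log


def mean_square_displacement(trajectories, distance, lags):
    """Average squared distance between positions t steps apart.

    The average runs over all windows of size t and over all trajectories.
    """
    msd = {}
    for t in lags:
        total, count = 0.0, 0
        for traj in trajectories:
            for start in range(len(traj) - t):
                d = distance(traj[start], traj[start + t])
                total += d * d
                count += 1
        if count:
            msd[t] = total / count
    return msd


def scaling_exponent(msd):
    """Least-squares slope of log(MSD) against log(t)."""
    points = [(log(t), log(value)) for t, value in msd.items() if t > 0 and value > 0]
    if len(points) < 2:
        raise ValueError("need at least two positive (t, MSD) points")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Hypothetical check: a walker moving one step per day along a line is ballistic,
# so its MSD grows as t squared and the fitted exponent should be close to 2.
trajectories = [list(range(50))]
msd = mean_square_displacement(trajectories, lambda a, b: abs(a - b), [1, 2, 4, 8, 16])
print(scaling_exponent(msd))
```

On real data, a fitted exponent below 1 would indicate subdiffusion, and a curve that flattens at large t would indicate the saturation seen for the random walk and Markov models.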
sectors visited a long time before. Consequently a sector that has been visited many times but with the most
recent visit dating back one year has a lower probability to be visited again than a sector that has been visited
just a few times but with the last visit dating back only one week.

To highlight this mechanism we measured the return time distribution in the jump-time series (see Methods). In
particular, we extracted the probability $P_{\mathrm{ret}}(\tau)$ for an individual to return again (for the
first time) to the currently occupied sector after $\tau$ jumps. As shown in the inset of Fig. 5 (a), we found
that the return time distribution reads

$$P_{\mathrm{ret}}(\tau) \sim \tau^{-\alpha} \qquad (4)$$

with an exponent $\alpha \approx 1.3$. We used this information for constructing a model which takes into account
the higher re-visiting probability of recently explored locations. In this way we can capture the long-term
scaling properties of movements. Exactly these asymptotic properties are fundamentally relevant for issues of
epidemics spreading or traffic management.

This “Time Order Memory” (TOM) model incorporates a power-law distribution of first return times, together with a
power-law distribution of waiting times and an exponential distribution of jump distances, like those observed
empirically in Fig. 2. We show below that these ingredients are sufficient to reproduce the subdiffusive
behaviour reported in Fig. 5 (a). The model works as follows: an individual stands still in a given sector for a
number of days drawn from the waiting time distribution, Eq. (2). Then, the individual jumps. There are two
possibilities: (i) with a probability $v$ she returns to an already visited sector, (ii) with the probability
$1 - v$ she jumps to a so far unexplored sector. In case (i), one of the previously visited sectors is chosen
according to Eq. (4). In the exploration case (ii), the individual draws a distance d from the distance
distribution, Eq. (1), and jumps to a randomly selected, unexplored sector at that distance. The model has four
parameters. The parameters $\lambda$, $\beta$ and $\alpha$ of equations (1), (2) and (4) respectively, are fixed
by the data. Further, averaging over all jumps and players, the probability of returning to an already visited
location is $v \approx 0.83$. Similarly to the measured data, the MSD of the TOM model, black squares in
Fig. 5 (b), exhibits no saturation effects and displays an exponent $u_{\mathrm{TOM}} = 0.23 \pm 0.02$ (the error
is calculated over an ensemble of realizations) in agreement with the exponent observed for the players.

Discussion
The flat slope of $u \approx 0.26$ and the lack of saturation of the MSD of the players over the whole
observation period expose the significant level of subdiffusivity in the motions of individuals, consistent with
previous findings21,38–41. However, the mere tendency of individuals to return to already visited locations is
not sufficient to capture these subdiffusive properties of the MSD; it is fundamental to consider a mechanism
that takes into account the temporal order of visited locations, as achieved by the TOM model. Moreover, the TOM
model is realistic in the sense that, in contrast to Markov models, it takes into account the tendency of
individuals to develop a preference for visiting certain locations. At the same time it allows for the
possibility that a previously preferred location is no longer frequented. This view provides an alternative to
recently suggested reinforcement mechanisms in preferential return models21. The possibility for individuals to
“change home” is relevant when the model should be able to account for migration, which is an important feature
in the long-time mobility behaviour of humans.

Finally, we discuss to what extent the findings from our “social petri dish” are valid also for human populations
unrelated to the game. Previous analyses of human social behaviour in Pardus25,26 have shown agreement with
well-known sociological theories and with properties on comparable behavioural data. Examining the preference of
players to move within socio-economic regions is of obvious importance for clearing up the role of political or
socio-economic borders on the movement and migration of humans, where the presence of borders has a strong
influence on mobility15,42–44. Online societies such as that of Pardus have the evident potential to serve as
“socio-economic laboratories”, where the complete knowledge of activities, social relations, and positions of all
individuals can significantly advance our understanding of large-scale human behaviour, in particular of
mobility.

Methods
Data set. We focus on one of the three Pardus universes, Artemis. For this universe, we extract player mobility
data from day 200 to day 1200 of its existence. We discard the first 200 days because social networks between
players of Pardus have shown aging effects in the beginning of the universe, i.e. there seems to exist a
transient phase in the development of the society, possibly affecting mobility, which we would like to avoid
considering25. To make sure we only consider active players, we select all who exist in the game between the days
200 and 1200, yielding 1458 players active over a time-period of 1000 days. The sector IDs of these players, i.e.
their positions on the universe network’s nodes, are logged every day at 05:35 GMT. Players typically log in
once a day and perform all their limited movements of the day within a few minutes, see Supplementary Section S1.
The legal department of the Medical University of Vienna has attested the innocuousness of the used anonymized
data.

Transition count matrix and transition probability matrix. The entry $m_{ij}$ of the transition count matrix M is
equal to the number of times a player’s position was on sector i and then, on the following day, on sector j.
This number is cumulated for all players. The entry $p_{ij}$ of the transition probability matrix P corresponds
to the probability that a player moves to a sector j given that on the previous day the player’s location was
sector i. It reads: $p_{ij} = m_{ij} / \sum_l m_{il}$, where $m_{ij}$ is the number of observed player movements
from sector i to sector j, and the sum over l is over all sectors of the universe. The matrix P is a stochastic
matrix, i.e. it has the property that the entries of each row sum to one.

MSD and diffusion. The MSD is defined as $\sigma^2(t) = \langle (r(T + t) - r(T))^2 \rangle$, where $r(T)$ and
$r(T + t)$ are the sectors a player occupies at times $T$ and $T + t$ respectively, and where by
$(r(T + t) - r(T))$ we denote the distance between the two sectors. The average $\langle \cdot \rangle$ is
performed over all windows of size $t$, with their left boundaries going from $T = 0$ to $T = 1000 - t$, and over
all the 1458 players in the data set. If $\sigma^2$ has the form $\sigma^2(t) \sim t^{u}$ with an exponent
$u < 1$, the diffusion process is subdiffusive, in the case $u > 1$ it is super-diffusive. An exponent of $u = 1$
corresponds to classical Brownian motion38,39.

Jump-time and first return time distribution. We transform the time-series of daily sector IDs occupied by the
players from real-time to jump-time, in order to be able to compare time-series of different length and to focus
on the movements between sectors. An example of this conversion is provided: a time series [5, 5, 5, 32, 32, 104,
5, 5, 104, 104, 104, 32, 337, 337, 32…] becomes in jump-time [5, 32, 104, 5, 104, 32, 337, 32, …]. We denote
jump-time by the Greek letter $\tau$, that is, at jump-time $\tau$ a player has performed exactly $\tau$ jumps.
We use $\tau$ in the computation of the first return time distribution. In the hypothetical time series of
sectors [5, 32, 104, 5, 104, 32, 337, 32] a first return to a sector lying $\tau = 1$ jumps back happens 2 times
(104, 5, 104 and 32, 337, 32), for $\tau = 2$ this happens once (5, 32, 104, 5), for $\tau = 3$ also once, so
that $P_{\mathrm{ret}}(1) = 0.5$ and $P_{\mathrm{ret}}(2) = P_{\mathrm{ret}}(3) = 0.25$, where the sum over all
$P_{\mathrm{ret}}(\tau)$ is equal to 1.

1. Barthélemy, M. Spatial networks. Phys. Rep. 499, 1–101 (2010).
2. Guimerà, R., Mossa, S., Turtschi, A. & Amaral, L. The worldwide air transportation network: anomalous centrality, community structure, and cities' global roles. Proc. Natl. Acad. Sci. USA 102, 7794–7799 (2005).
3. Helbing, D. Traffic and related self-driven many-particle systems. Rev. of Mod. Phys. 73, 1067 (2001).
4. Makse, H. A., Havlin, S. & Stanley, H. E. Modelling urban growth patterns. Nature 377, 608–612 (1995).
5. Roth, C., Kang, S. M., Batty, M. & Barthélemy, M. Structure of urban movements: Polycentric activity and entangled hierarchical flows. PLoS ONE 6, e15923 (2011).
6. Pastor-Satorras, R. & Vespignani, A. Epidemic spreading in scale-free networks. Phys. Rev. Lett. 86, 3200–3203 (2001).
7. Colizza, V., Barrat, A., Barthélemy, M. & Vespignani, A. The role of the airline transportation network in the prediction and predictability of global epidemics. Proc. Natl. Acad. Sci. USA 103, 2015 (2006).
8. Hufnagel, L., Brockmann, D. & Geisel, T. Forecast and control of epidemics in a globalized world. Proc. Natl. Acad. Sci. USA 101, 15124–15129 (2004).
9. Balcan, D. et al. Multiscale mobility networks and the spatial spreading of infectious diseases. Proc. Natl. Acad. Sci. USA 106, 21484–21489 (2009).
10. Miritello, G., Moro, E. & Lara, R. Dynamical strength of social ties in information spreading. Phys. Rev. E 83, 045102 (2011).
11. Onnela, J. et al. Structure and tie strengths in mobile communication networks. Proc. Natl. Acad. Sci. USA 104, 7332 (2007).
12. Quercia, D., Lathia, N., Calabrese, F., Di Lorenzo, G. & Crowcroft, J. Recommending social events from mobile phone location data. In Data Mining (ICDM), 2010 IEEE 10th International Conference on, 971–976 (2010).
13. Jensen, P. Network-based predictions of retail store commercial categories and optimal locations. Phys. Rev. E 74, 035101 (2006).
14. Brockmann, D., Hufnagel, L. & Geisel, T. The scaling laws of human travel. Nature 439, 462–465 (2006).
15. Thiemann, C., Theis, F., Grady, D., Brune, R. & Brockmann, D. The structure of borders in a small world. PLoS ONE 5, e15422 (2010).
16. González, M., Hidalgo, C. & Barabási, A. Understanding individual human mobility patterns. Nature 453, 779–782 (2008).
17. Scellato, S., Noulas, A., Lambiotte, R. & Mascolo, C. Socio-spatial properties of online location-based social networks. Proceedings of ICWSM 11 (2011).
18. Scellato, S., Musolesi, M., Mascolo, C., Latora, V. & Campbell, A. Nextplace: A spatio-temporal prediction framework for pervasive systems. Pervasive Computing 152–169 (2011).
19. Bazzani, A., Giorgini, B., Rambaldi, S., Gallotti, R. & Giovannini, L. Statistical laws in urban mobility from microscopic GPS data in the area of Florence. J. Stat. Mech. 2010, P05001 (2010).
20. Cattuto, C. et al. Dynamics of person-to-person interactions from distributed RFID sensor networks. PLoS ONE 5, e11596 (2010).
21. Song, C., Koren, T., Wang, P. & Barabási, A. Modelling the scaling properties of human mobility. Nature Physics 6, 818–823 (2010).
22. Song, C., Qu, Z., Blumm, N. & Barabási, A. Limits of predictability in human mobility. Science 327, 1018 (2010).
23. Belik, V., Geisel, T. & Brockmann, D. Natural human mobility patterns and spatial spread of infectious diseases. Phys. Rev. X 1, 011001 (2011).
24. Boccaletti, S., Latora, V., Moreno, Y., Chavez, M. & Hwang, D. Complex networks: structure and dynamics. Phys. Rep. 424, 175–308 (2006).
25. Szell, M. & Thurner, S. Measuring social dynamics in a massive multiplayer online game. Social Networks 32, 313–329 (2010).
26. Szell, M., Lambiotte, R. & Thurner, S. Multirelational organization of large-scale social networks in an online world. Proc. Natl. Acad. Sci. USA 107, 13636–13641 (2010).
27. Castronova, E. On the research value of large games. Games and Culture 1, 163–186 (2006).
28. Bainbridge, W. The scientific research potential of virtual worlds. Science 317, 472 (2007).
29. www.pardus.at.
30. Thurner, S., Szell, M. & Sinatra, R. Emergence of good conduct, scaling and Zipf laws in human behavioral sequences in an online world. PLoS ONE 7, e29796 (2012).
31. Kölbl, R. & Helbing, D. Energy laws in human travel behaviour. New J. of Phys. 5, 48 (2003).
32. Han, X., Hao, Q., Wang, B. & Zhou, T. Origin of the scaling law in human mobility: Hierarchy of traffic systems. Phys. Rev. E 83, 036117 (2011).
33. Barabási, A. The origin of bursts and heavy tails in human dynamics. Nature 435, 207 (2005).
34. Fortunato, S. Community detection in graphs. Phys. Rep. 486, 75–174 (2010).
35. Arenas, A., Fernández, A. & Gómez, S. Analysis of the structure of complex networks at different resolution levels. New J. of Phys. 10, 053039 (2008).
36. Newman, M. Analysis of weighted networks. Phys. Rev. E 70, 056131 (2004).
37. Sinatra, R., Condorelli, D. & Latora, V. Networks of motifs from sequences of symbols. Phys. Rev. Lett. 105, 178702 (2010).
38. West, B., Grigolini, P., Metzler, R. & Nonnenmacher, T. Fractional diffusion and Lévy stable processes. Phys. Rev. E 55, 99 (1997).
39. Metzler, R. & Klafter, J. The random walk’s guide to anomalous diffusion: a fractional dynamics approach. Phys. Rep. 339, 1–77 (2000).
40. Scafetta, N., Latora, V. & Grigolini, P. Lévy statistics in coding and non-coding nucleotide sequences. Phys. Lett. A 299, 565–570 (2002).
41. Viswanathan, G. et al. Optimizing the success of random searches. Nature 401, 911–914 (1999).
42. Ratti, C. et al. Redrawing the map of Great Britain from a network of human interactions. PLoS ONE 5, e14248 (2010).
43. Newman, D. The lines that continue to separate us: borders in our borderless world. Progress in Human Geography 30, 143 (2006).
44. Lambiotte, R. et al. Geographical dispersal of mobile communication networks. Physica A 387, 5317–5325 (2008).

Acknowledgments
This work was conducted under the HPC-EUROPA2 project (project number: 228398) with the support of the European
Commission – Capacities Area – Research Infrastructures initiative, and within the framework of European
Cooperation in Science and Technology Action MP0801 Physics of Competition and Conflicts. M.S. and S.T.
acknowledge support from the Austrian Science Fund Fonds zur Förderung der wissenschaftlichen Forschung P 23378,
and from project EU FP7 – INSITE. M.S., R.S. and G.P. also thank the Santa Fe Institute for the opportunities
offered during the Complex Systems Summer School 2010, where this project originated.

Author contributions
All the authors have equally contributed to the design of the study, to the analysis and interpretation of the
results and to the preparation of the manuscript.

Additional information
Supplementary information accompanies this paper at http://www.nature.com/scientificreports
Competing financial interests: The authors declare no competing financial interests.
License: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported
License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/
How to cite this article: Szell, M., Sinatra, R., Petri, G., Thurner, S. & Latora, V. Understanding mobility in a
social petri dish. Sci. Rep. 2, 457; DOI:10.1038/srep00457 (2012).