The Spreading of Misinformation Online (PNAS-2016)
Michela Del Vicario^a, Alessandro Bessi^b, Fabiana Zollo^a, Fabio Petroni^c, Antonio Scala^a,d, Guido Caldarelli^a,d, H. Eugene Stanley^e, and Walter Quattrociocchi^a,1

^a Laboratory of Computational Social Science, Networks Department, IMT Alti Studi Lucca, 55100 Lucca, Italy; ^b IUSS Institute for Advanced Study, 27100 Pavia, Italy; ^c Sapienza University, 00185 Rome, Italy; ^d ISC-CNR Uos Sapienza, 00185 Rome, Italy; and ^e Boston University, Boston, MA 02115

Edited by Matjaz Perc, University of Maribor, Maribor, Slovenia, and accepted by the Editorial Board December 4, 2015 (received for review September 1, 2015)
The massive diffusion of sociotechnical systems and microblogging platforms on the World Wide Web (WWW) creates a direct path from producers to consumers of content, i.e., allows disintermediation, and changes the way users become informed, debate, and form their opinions (1–5). This disintermediated environment can foster confusion about causation, and thus encourage
speculation, rumors, and mistrust (6). In 2011 a blogger claimed
that global warming was a fraud designed to diminish liberty and
weaken democracy (7). Misinformation about the Ebola epidemic
has caused confusion among healthcare workers (8). Jade Helm 15,
a simple military exercise, was perceived on the Internet as the
beginning of a new civil war in the United States (9).
Recent works (10–12) have shown that increasing the exposure
of users to unsubstantiated rumors increases their tendency to
be credulous.
According to ref. 13, belief formation and revision are influenced by the way communities attempt to make sense of events or facts. Such a phenomenon is particularly evident on the WWW, where users, embedded in homogeneous clusters (14–16), process information through a shared system of meaning (10, 11, 17, 18)
and trigger collective framing of narratives that are often biased
toward self-confirmation.
In this work, through a thorough quantitative analysis on a
massive dataset, we study the determinants behind misinformation
diffusion. In particular, we analyze the cascade dynamics of Facebook users when the content is related to very distinct narratives:
conspiracy theories and scientific information. On the one hand,
conspiracy theories simplify causation, reduce the complexity of
reality, and are formulated in a way that is able to tolerate a certain
level of uncertainty (19–21). On the other hand, scientific information disseminates scientific advances and exhibits the process
of scientific thinking. Notice that we do not focus on the quality of
the information but rather on the possibility of verification. Indeed,
the main difference between the two is content verifiability. The generators of scientific information and their data, methods, and outcomes are readily identifiable and available. The origins of conspiracy
theories are often unknown and their content is strongly disengaged
from mainstream society and sharply divergent from recommended
practices (22), e.g., the belief that vaccines cause autism.
Massive digital misinformation is becoming pervasive in online
social media to the extent that it has been listed by the World
Economic Forum (WEF) as one of the main threats to our society (23). To counteract this trend, algorithmic-driven solutions
have been proposed (24–29); e.g., Google (30) is developing a
trustworthiness score to rank the results of queries. Similarly,
Facebook has proposed a community-driven approach where
users can flag false content to correct the newsfeed algorithm.
This issue is controversial, however, because it raises fears that
the free circulation of content may be threatened and that the
proposed algorithms may not be accurate or effective (10, 11,
31). Often conspiracists will denounce attempts to debunk false
information as acts of misinformation.
Whether a claim (either substantiated or not) is accepted by
an individual is strongly influenced by social norms and by the
claim's coherence with the individual's belief system, i.e., confirmation bias (32, 33). Many mechanisms animate the flow of false information that generates false beliefs in an individual, which, once adopted, are rarely corrected (34–37).
In this work we provide important insights toward the understanding of cascade dynamics in online social media and in
particular about misinformation spreading.
We show that content-selective exposure is the primary driver of content diffusion and generates the formation of homogeneous clusters, i.e., echo chambers (10, 11, 38, 39). Indeed, our analysis reveals that two well-formed and highly segregated communities exist around conspiracy and scientific topics. We also find that although consumers of scientific information and conspiracy theories exhibit similar consumption patterns with respect to content, the cascade patterns of the two differ. Homogeneity appears to be the preferential driver for the diffusion of content, yet each echo chamber has its own cascade dynamics. To account for these features we provide an accurate data-driven percolation model of rumor spreading showing that homogeneity and polarization are the main determinants for predicting cascade size.

The paper is structured as follows. First we provide the preliminary definitions and details concerning data collection. We then provide a comparative analysis and characterize the statistical signatures of cascades of the different kinds of content. Finally, we introduce a data-driven model that replicates the analyzed cascade dynamics.
Significance
The wide availability of user-provided content in online social
media facilitates the aggregation of people around common
interests, worldviews, and narratives. However, the World
Wide Web is a fruitful environment for the massive diffusion of
unverified rumors. In this work, using a massive quantitative
analysis of Facebook, we show that information related to
distinct narratives, conspiracy theories and scientific news, generates homogeneous and polarized communities (i.e., echo chambers) having similar information consumption patterns. Then, we derive a data-driven percolation model of rumor spreading that demonstrates that homogeneity and polarization are the main determinants for predicting cascade size.
Author contributions: M.D.V., A.B., F.Z., A.S., G.C., H.E.S., and W.Q. designed research;
M.D.V., A.B., F.Z., H.E.S., and W.Q. performed research; M.D.V., A.B., F.Z., F.P., and W.Q.
contributed new reagents/analytic tools; M.D.V., A.B., F.Z., A.S., G.C., H.E.S., and W.Q.
analyzed data; and M.D.V., A.B., F.Z., A.S., G.C., H.E.S., and W.Q. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission. M.P. is a guest editor invited by the Editorial
Board.
Methods
Fig. 1. PDF of lifetime computed on science news and conspiracy theories,
where the lifetime is here computed as the temporal distance (in hours) between the first and last share of a post. Both categories show a similar behavior.
Data Collection. Debate about social issues continues to expand across the
Web, and unprecedented social phenomena such as the massive recruitment
of people around common interests, ideas, and political visions are emerging.
Using the approach described in ref. 10, we define the space of our investigation with the support of diverse Facebook groups that are active in
the debunking of misinformation.
The resulting dataset is composed of 67 public pages divided between 32
about conspiracy theories and 35 about science news. A second set, composed
of two troll pages, is used as a benchmark to fit our data-driven model.
The first category (conspiracy theories) includes the pages that disseminate
alternative, controversial information, often lacking supporting evidence
and frequently advancing conspiracy theories. The second category (science
news) includes the pages that disseminate scientific information. The third
category (trolls) includes those pages that intentionally disseminate sarcastic
false information on the Web with the aim of mocking the collective
credulity online.
For the three sets of pages we download all of the posts (and their
respective user interactions) across a 5-y time span (2010–2014). We
perform the data collection process by using the Facebook Graph API (40),
which is publicly available and accessible through any personal Facebook
user account. The exact breakdown of the data is presented in SI Appendix,
section 1.
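As an illustration of this collection step, the sketch below shows a paginated download through the Graph API's /{page-id}/posts endpoint with cursor-based paging. It is a minimal example of the kind of loop involved, not the paper's actual pipeline; PAGE_ID, ACCESS_TOKEN, and the API version string are placeholders.

```python
import requests

GRAPH_URL = "https://graph.facebook.com/v2.5"  # version string is an assumption
PAGE_ID = "PAGE_ID"            # placeholder for one of the 67 public pages
ACCESS_TOKEN = "ACCESS_TOKEN"  # placeholder for any valid user access token

def fetch_posts(page_id, token, since="2010-01-01", until="2014-12-31"):
    """Download all posts of a public page over the 5-y window, following cursors."""
    url = f"{GRAPH_URL}/{page_id}/posts"
    params = {"access_token": token, "since": since, "until": until, "limit": 100}
    posts = []
    while url:
        payload = requests.get(url, params=params).json()
        posts.extend(payload.get("data", []))
        # The Graph API returns a paging.next URL while more results remain.
        url = payload.get("paging", {}).get("next")
        params = {}  # the next URL already embeds all query parameters
    return posts

posts = fetch_posts(PAGE_ID, ACCESS_TOKEN)
```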
Preliminaries and Definitions. A tree is an undirected simple graph that is
connected and has no simple cycles. An oriented tree is a directed acyclic
graph whose underlying undirected graph is a tree. A sharing tree, in the
context of our research, is an oriented tree made up of the successive sharing
of a news item through the Facebook system. The root of the sharing tree is
the node that performs the first share. We define the size of the sharing tree
as the number of nodes (and hence the number of news sharers) in the tree
and the height of the sharing tree as the maximum path length from the root.
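For concreteness, both measures can be read off a sharing tree stored as a child-to-parent map; the toy tree below is our illustration, not the paper's data.

```python
# A sharing tree as {child: parent}; the root is the first sharer (parent None).
tree = {"u1": None, "u2": "u1", "u3": "u1", "u4": "u2", "u5": "u4"}

def tree_size(tree):
    """Size = number of nodes, i.e., the number of news sharers."""
    return len(tree)

def tree_height(tree):
    """Height = maximum path length (in edges) from the root to any node."""
    def depth(node):
        d = 0
        while tree[node] is not None:
            node = tree[node]
            d += 1
        return d
    return max(depth(n) for n in tree)

print(tree_size(tree), tree_height(tree))  # 5 3
```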
We define the user polarization σ = 2ρ − 1, where 0 ≤ ρ ≤ 1 is the fraction of likes a user puts on conspiracy-related content, and hence −1 ≤ σ ≤ 1. From user polarization, we define the edge homogeneity, for any edge e_ij between nodes i and j, as

σ_ij = σ_i σ_j,

with −1 ≤ σ_ij ≤ 1. Edge homogeneity reflects the similarity level between the polarization of the two sharing nodes. A link in the sharing tree is homogeneous if its edge homogeneity is positive.
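These two definitions translate directly into code; the like counts below are illustrative values, not the paper's data.

```python
def polarization(likes_conspiracy, likes_science):
    """sigma = 2*rho - 1, with rho the fraction of likes on conspiracy content."""
    rho = likes_conspiracy / (likes_conspiracy + likes_science)
    return 2 * rho - 1  # in [-1, 1]

def edge_homogeneity(sigma_i, sigma_j):
    """sigma_ij = sigma_i * sigma_j, in [-1, 1]; positive for like-minded users."""
    return sigma_i * sigma_j

s_i = polarization(90, 10)  # 0.8: a mostly conspiracy-oriented user
s_j = polarization(20, 80)  # -0.6: a mostly science-oriented user
print(edge_homogeneity(s_i, s_j))  # -0.48: a nonhomogeneous link
```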
Fig. 2. Lifetime as a function of the cascade size for conspiracy news (Left) and science news (Right). Science news quickly reaches a higher diffusion; a longer
lifetime does not correspond to a higher level of interest. Conspiracy rumors are assimilated more slowly and show a positive relation between lifetime
and size.
Ethics Statement. Approval and informed consent were not needed because
the data collection process has been carried out using the Facebook Graph
application program interface (API) (40), which is publicly available. For the
analysis (according to the specification settings of the API) we only used
publicly available data (thus users with privacy restrictions are not included in
the dataset). The pages from which we download data are public Facebook
entities and can be accessed by anyone. User content contributing to these
pages is also public unless the user's privacy settings specify otherwise, and in
that case it is not available to us.
We next investigate the main determinants that drive sharing patterns, and we focus on the role of homogeneity in friendship networks.
Fig. 3 shows the PDF of the mean-edge homogeneity, computed for all cascades of science news and conspiracy theories. It
shows that the majority of links between consecutively sharing
users are homogeneous. In particular, the average edge homogeneity value of the entire sharing cascade is always greater than or
equal to zero, indicating that either the information transmission
occurs inside homogeneous clusters in which all links are homogeneous or it occurs inside mixed neighborhoods in which the
balance between homogeneous and nonhomogeneous links is
favorable toward the former. However, the probability of a mean-edge homogeneity close to zero is quite small: content tends to circulate only inside its echo chamber.
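The per-cascade statistic behind Fig. 3 is simply the average of edge homogeneity over the edges of one sharing tree. A minimal sketch, with illustrative polarization values rather than the paper's data:

```python
def mean_edge_homogeneity(edges, sigma):
    """Average sigma_i * sigma_j over the edges (i, j) of one sharing tree."""
    values = [sigma[i] * sigma[j] for i, j in edges]
    return sum(values) / len(values)

# A toy cascade inside a conspiracy echo chamber: all users positively polarized.
sigma = {"u1": 0.9, "u2": 0.7, "u3": 0.8, "u4": 0.95}
edges = [("u1", "u2"), ("u1", "u3"), ("u3", "u4")]
print(mean_edge_homogeneity(edges, sigma))  # ~0.70 -> a homogeneous cascade
```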
Hence, to further characterize the role of homogeneity in
shaping sharing cascades, we compute cascade size as a function
of mean-edge homogeneity for both science and conspiracy news
(Fig. 4). In science news, higher levels of mean-edge homogeneity are associated with larger cascades.
Fig. 3. PDF of edge homogeneity for science (orange) and conspiracy (blue)
news. Homogeneous paths are dominant on whole cascades for both scientific and conspiracy news.
*Recall that a sharing path is here defined as any path from the root to one of the leaves
of the sharing tree. A homogeneous path is a sharing path for which the edge homogeneity of each edge is positive.
Fig. 4. Cascade size as a function of mean-edge homogeneity for science and conspiracy news.
The Model. Our findings show that users mostly tend to select and share content related to a specific narrative and to ignore the rest. The model therefore considers a network of users whose links are either homogeneous or nonhomogeneous: denoting by M the total number of links and by n_h the number of homogeneous ones, we fix the fraction of homogeneous links

HL = n_h / M, with 0 ≤ n_h ≤ M.

Notice that 0 ≤ HL ≤ 1 and that 1 − HL, the fraction of nonhomogeneous links, is complementary to HL. In particular, we can reduce the parameter space to HL ∈ [0.5, 1], as we would restrict our attention to either one of the two complementary clusters.
The model can be seen as a branching process where the sharing threshold τ and the neighborhood dimension z are the key parameters. More formally, let the fitness θ_j of the jth news item and the opinion ω_i of the ith user be uniformly and independently distributed in [0, 1]. A user shares a news item when its fitness falls within the sharing threshold of her opinion, so that the probability p that a user with opinion ω shares a news item with fitness density f is

p = ∫_{max(0, ω − τ)}^{min(1, ω + τ)} f(θ) dθ,

which for uniform f equals the window length min(1, ω + τ) − max(0, ω − τ), i.e., approximately 2τ away from the boundaries of [0, 1].
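To make the dynamics concrete, here is a minimal simulation sketch of our reading of the model, using networkx's Watts–Strogatz generator (ref. 41) for the small-world substrate with 2z neighbors per node and rewiring probability r. Restricting transmission to homogeneous links and the exact update loop are our assumptions; see SI Appendix, section 3 for the authors' full specification.

```python
import random
import networkx as nx  # assumed dependency; ref. 41 gives the substrate

def simulate_cascade(n=1000, z=2, hl=0.56, r=0.01, tau=0.015, n_first=5, seed=0):
    """One news item spreading as a branching process on a small-world graph.

    Opinions omega and fitness theta are i.i.d. uniform on [0, 1]; an exposed
    user shares when |omega - theta| <= tau, and the news travels only across
    homogeneous links (a fraction hl of all links). Returns the cascade size.
    """
    rng = random.Random(seed)
    g = nx.watts_strogatz_graph(n, 2 * z, r, seed=seed)
    omega = {u: rng.random() for u in g}                         # user opinions
    homogeneous = {frozenset(e): rng.random() < hl for e in g.edges}
    theta = rng.random()                                         # news fitness

    shared = {u for u in rng.sample(list(g), n_first)
              if abs(omega[u] - theta) <= tau}                   # first sharers
    frontier = set(shared)
    while frontier:
        exposed = {v for u in frontier for v in g.neighbors(u)
                   if homogeneous[frozenset((u, v))]} - shared
        frontier = {v for v in exposed if abs(omega[v] - theta) <= tau}
        shared |= frontier
    return len(shared)

sizes = [simulate_cascade(seed=s) for s in range(100)]
print(sum(sizes) / len(sizes))  # mean cascade size over 100 news items
```

With the small threshold τ = 0.015 most simulated cascades die out quickly, matching the heavy-tailed, mostly small cascades observed in the data.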
For details on the parameters of the fitted distributions used, see SI Appendix, section 3.2. Note that the real-data values for the mean (and SD) of size and height on the troll posts are, respectively, 23.54 (122.32) and 1.78 (0.73).
Fig. 5. CCDF of size (Left) and CDF of height (Right) for the best parameters combination that fits real-data values, (HL, r, τ) = (0.56, 0.01, 0.015), and first sharers distributed as IG(18.73, 9.63).
In our simulations we fixed the sample (the number of nodes in the system and the number of news items) and varied the fraction of homogeneous links HL, the rewiring probability r, and the sharing threshold τ. See SI Appendix, section 3.2 for the distribution of first sharers used and for additional simulation results of the fit on trolling messages.
We simulated the model dynamics with the best combination
of parameters obtained from the simulations and the number of
first sharers distributed as an inverse Gaussian. Fig. 5 shows the
CCDF of cascade size and the cumulative distribution function
(CDF) of their height. A summary of relevant statistics (min
value, first quantile, median, mean, third quantile, and max
value) to compare the real-data size and height distributions with
the fitted ones is reported in SI Appendix, section 3.2.
We find that the inverse Gaussian is the distribution that best
fits the data both for science and conspiracy news, and for troll
messages. For this reason, we performed one more simulation
using the inverse Gaussian as the distribution of the number of first sharers, 1,072 news items, 16,889 users, and the best parameter combination obtained in the simulations. The CCDF of size and
the CDF of height for the above parameters combination, as well
as basic statistics considered, fit real data well.
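The distributional comparison can be reproduced with standard tools; below is a sketch using scipy's inverse Gaussian. Reading the paper's IG(18.73, 9.63) as (mean, shape) and mapping it to scipy's (mu, scale) parameterization is our assumption, and the sample data are synthetic.

```python
import numpy as np
from scipy import stats

# Paper's IG(mean=18.73, shape=9.63), mapped to scipy's parameterization,
# where invgauss(mu, scale) has mean mu * scale and shape parameter scale.
mean, shape = 18.73, 9.63
ig = stats.invgauss(mu=mean / shape, scale=shape)
print(ig.mean())  # 18.73

# Fit candidate distributions to observed first-sharer counts
# (here synthetic draws stand in for the real counts).
first_sharers = ig.rvs(size=1000, random_state=0)
mu_hat, loc_hat, scale_hat = stats.invgauss.fit(first_sharers, floc=0)
print(mu_hat * scale_hat)  # fitted mean, to compare against the data mean
```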
Conclusions
Digital misinformation has become so pervasive in online social
media that it has been listed by the WEF as one of the main threats
to human society. Whether a news item, either substantiated or not,
is accepted as true by a user may be strongly affected by social
norms or by how much it coheres with the user's system of beliefs
(32, 33). Many mechanisms cause false information to gain acceptance, which in turn generate false beliefs that, once adopted by an
individual, are highly resistant to correction (34–37). In this work,
using extensive quantitative analysis and data-driven modeling, we
provide important insights toward the understanding of the mechanism behind rumor spreading. Our findings show that users mostly
tend to select and share content related to a specific narrative and
to ignore the rest. In particular, we show that social homogeneity is
the primary driver of content diffusion, and one frequent result is
the formation of homogeneous, polarized clusters. Most of the time the information is taken from a friend having the same profile (polarization), i.e., belonging to the same echo chamber.
The best parameter combination is (HL, r, τ) = (0.56, 0.01, 0.015). In this case we have a mean size equal to 23.42 (33.43) and a mean height of 1.28 (0.88), which is indeed a good approximation; see SI Appendix, section 3.2.
Comparison of the first-sharer data with the fitted IG, lognormal, and Poisson distributions (min, first quartile, median, mean, third quartile, max):

Distribution   Min    1st Qu.   Median   Mean    3rd Qu.   Max
Data           1      5         10       39.34   27        3,033
IG             0.36   4.16      10.45    39.28   31.59     1,814
Lognormal      0.10   3.16      6.99     13.04   14.85     486.10
Poisson        20     35        39       39.24   43        66
The inverse Gaussian (IG) shows the best fit for the distribution of first
sharers with respect to all of the considered statistics.
1. Brown J, Broderick AJ, Lee N (2007) Word of mouth communication within online
communities: Conceptualizing the online social network. J Interact Market 21(3):2–20.
2. Kahn R, Kellner D (2004) New media and internet activism: From the battle of Seattle to blogging. New Media Soc 6(1):87–95.
3. Quattrociocchi W, Conte R, Lodi E (2011) Opinions manipulation: Media, power and
gossip. Adv Complex Syst 14(4):567–586.
4. Quattrociocchi W, Caldarelli G, Scala A (2014) Opinion dynamics on interacting networks: Media competition and social influence. Sci Rep 4:4938.
5. Kumar R, Mahdian M, McGlohon M (2010) Dynamics of conversations. Proceedings of
the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining (ACM, New York), pp 553–562.
6. Sunstein C, Vermeule A (2009) Conspiracy theories: Causes and cures. J Polit Philos
17(2):202–227.
7. Kadlec C (2011) The goal is power: The global warming conspiracy. Forbes, July 25,
2011. Available at www.forbes.com/sites/charleskadlec/2011/07/25/the-goal-is-power-the-global-warming-conspiracy/. Accessed August 21, 2015.
8. Millman J (2014) The inevitable rise of Ebola conspiracy theories. The Washington
Post, Oct. 13, 2014. Available at https://www.washingtonpost.com/news/wonk/
wp/2014/10/13/the-inevitable-rise-of-ebola-conspiracy-theories/. Accessed August
31, 2015.
9. Lamothe D (2015) Remember Jade Helm 15, the controversial military exercise? It's over. The Washington Post, Sept. 14, 2015. Available at https://www.washingtonpost.com/news/checkpoint/wp/2015/09/14/remember-jade-helm-15-the-controversial-military-exercise-its-over/. Accessed September 20, 2015.
10. Bessi A, et al. (2015) Science vs conspiracy: Collective narratives in the age of misinformation. PLoS One 10(2):e0118093.
11. Mocanu D, Rossi L, Zhang Q, Karsai M, Quattrociocchi W (2015) Collective attention in the age of (mis)information. Comput Human Behav 51:1198–1204.
12. Bessi A, Scala A, Rossi L, Zhang Q, Quattrociocchi W (2014) The economy of attention in the age of (mis)information. J Trust Manage 1(1):1–13.
13. Furedi F (2006) Culture of Fear Revisited (Bloomsbury, London).
14. Aiello LM, et al. (2012) Friendship prediction and homophily in social media. ACM
Trans Web 6(2):9.
15. Gu B, Konana P, Raghunathan R, Chen HM (2014) Research note: The allure of homophily in social media: Evidence from investor responses on virtual communities. Inf Syst Res 25(3):604–617.
16. Bessi A, et al. (2015) Viral misinformation: The role of homophily and polarization.
Proceedings of the 24th International Conference on World Wide Web Companion
(International World Wide Web Conferences Steering Committee, Florence,
Italy), pp 355–356.
17. Bessi A, et al. (2015) Trend of narratives in the age of misinformation. PLoS One 10(8):
e0134641.
18. Zollo F, et al. (2015) Emotional dynamics in the age of misinformation. PLoS One
10(9):e0138740.
19. Byford J (2011) Conspiracy Theories: A Critical Introduction (Palgrave Macmillan,
London).
20. Fine GA, Campion-Vincent V, Heath C (2005) Rumor Mills: The Social Impact of Rumor
and Legend, eds Fine GA, Campion-Vincent V, Heath C (Aldine Transaction, New
Brunswick, NJ), pp 103–122.
21. Hogg MA, Blaylock DL (2011) Extremism and the Psychology of Uncertainty (John
Wiley & Sons, Chichester, UK), Vol 8.
22. Betsch C, Sachse K (2013) Debunking vaccination myths: Strong risk negations can
increase perceived vaccination risks. Health Psychol 32(2):146–155.
23. Howell L (2013) Digital wildfires in a hyperconnected world. WEF Report 2013.
Available at reports.weforum.org/global-risks-2013/risk-case-1/digital-wildfires-in-a-hyperconnected-world. Accessed August 31, 2015.
24. Qazvinian V, Rosengren E, Radev DR, Mei Q (2011) Rumor has it: Identifying misinformation in microblogs. Proceedings of the Conference on Empirical Methods in
Natural Language Processing (Association for Computational Linguistics, Stroudsburg,
PA), pp 1589–1599.
25. Ciampaglia GL, et al. (2015) Computational fact checking from knowledge networks.
arXiv:1501.03471.
26. Resnick P, Carton S, Park S, Shen Y, Zeffer N (2014) Rumorlens: A system for analyzing
the impact of rumors and corrections in social media. Proceedings of Computational
Journalism Conference (ACM, New York).
27. Gupta A, Kumaraguru P, Castillo C, Meier P (2014) Tweetcred: Real-time credibility
assessment of content on Twitter. Social Informatics (Springer, Berlin), pp 228–243.
28. Al Mansour AA, Brankovic L, Iliopoulos CS (2014) A model for recalibrating credibility in different contexts and languages: A Twitter case study. Int J Digital Inf Wireless Commun 4(1):53–62.
29. Ratkiewicz J, et al. (2011) Detecting and tracking political abuse in social media.
Proceedings of the 5th International AAAI Conference on Weblogs and Social Media
(AAAI, Palo Alto, CA).
30. Dong XL, et al. (2015) Knowledge-based trust: Estimating the trustworthiness of web
sources. Proc VLDB Endowment 8(9):938–949.
31. Nyhan B, Reifler J, Richey S, Freed GL (2014) Effective messages in vaccine promotion:
A randomized trial. Pediatrics 133(4):e835–e842.
32. Zhu B, et al. (2010) Individual differences in false memory from misinformation:
Personality characteristics and their interactions with cognitive abilities. Pers Individ
Dif 48(8):889–894.
33. Frenda SJ, Nichols RM, Loftus EF (2011) Current issues and advances in misinformation
research. Curr Dir Psychol Sci 20(1):20–23.
34. Kelly GR, Weeks BE (2013) The promise and peril of real-time corrections to political
misperceptions. Proceedings of the 2013 Conference on Computer Supported
Cooperative Work (ACM, New York), pp 1047–1058.
35. Meade ML, Roediger HL, 3rd (2002) Explorations in the social contagion of memory.
Mem Cognit 30(7):995–1009.
36. Koriat A, Goldsmith M, Pansky A (2000) Toward a psychology of memory accuracy.
Annu Rev Psychol 51(1):481–537.
37. Ayers MS, Reder LM (1998) A theoretical review of the misinformation effect: Predictions from an activation-based memory model. Psychon Bull Rev 5(1):1–21.
38. Sunstein C (2001) Echo Chambers (Princeton Univ Press, Princeton, NJ).
39. Kelly GR (2009) Echo chambers online?: Politically motivated selective exposure
among internet news users. J Comput Mediat Commun 14(2):265–285.
40. Facebook. (2015) Using the graph API. Available at https://developers.facebook.com/
docs/graph-api/using-graph-api. Accessed December 19, 2015.
41. Watts DJ, Strogatz SH (1998) Collective dynamics of 'small-world' networks. Nature 393(6684):440–442.
42. Leskovec J, Huttenlocher D, Kleinberg J (2010) Signed networks in social media.
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
(ACM, New York), pp 1361–1370.
ACKNOWLEDGMENTS. Special thanks go to Delia Mocanu, Protesi di Protesi di Complotto, Che vuol dire reale, La menzogna diventa verità e passa alla storia, Simply Humans, Semplicemente me, Salvatore Previti, Elio Gabalo, Sandro Forgione, Francesco Pertini, and The rooster