Cuihua Shen
University of California, Davis, USA
Mona Kasra
University of Virginia, USA
James F O’Brien
University of California, Berkeley, USA
Fake or manipulated images propagated through the Web and social media have the
capacity to deceive, emotionally distress, and influence public opinions and actions.
Yet few studies have examined how individuals evaluate the authenticity of images that
accompany online stories. This article details a 6-batch large-scale online experiment
using Amazon Mechanical Turk that probes how people evaluate image credibility
across online platforms. In each batch, participants were randomly assigned to 1 of
28 news-source mockups featuring a forged image, and they evaluated the credibility
of the images based on several features. We found that participants’ Internet skills,
photo-editing experience, and social media use were significant predictors of image
credibility evaluation, while most social and heuristic cues of online credibility (e.g.
source trustworthiness, bandwagon, intermediary trustworthiness) had no significant
impact. Viewers’ attitude toward a depicted issue also positively influenced their
credibility evaluation.
Digital media literacy, image credibility, image manipulation, intermediary, online
images, source credibility
The ubiquitous availability of easy-to-use software for editing digital images brought
about by rapid technological advances of the 21st century has dramatically decreased the
time, cost, effort, and skill required to fabricate convincing visual forgeries. Often dis-
tributed through trusted sources such as mass media outlets, perhaps unknowingly, these
manipulated images propagate across social media with growing frequency and sophistication.
Moreover, the technology that allows for manipulating or generating realistic-appearing
images has far outpaced the development of methods for detecting fake imagery,
and even experts often cannot rely on visual inspection to distinguish
authentic digital images from forgeries. Bad actors can thus easily publish manipulated
visual content to deceive their viewers, inflicting cognitive stress, exploiting prior beliefs,
or influencing individuals’ decisions and actions.
Although it is difficult to say how prevalent undetected occurrences of fake imagery
are, numerous examples have been exposed in which manipulated images have caused
substantial harms at individual, organizational, and societal levels. For instance, an
image of Senator John Kerry and Jane Fonda sharing the stage at a Vietnam era antiwar
rally emerged during the 2004 presidential primaries as Senator Kerry was campaigning
for the Democratic nomination. The accompanying caption stated, “Actress and Anti-
War Activist Jane Fonda Speaks to a crowd of Vietnam Veterans as Activist and Former
Vietnam Vet John Kerry (LEFT) listens and prepares to speak next concerning the war in
Vietnam (AP Photo).” The forged photograph, however, was created by compositing
together two separate photos that separately depicted Kerry and Fonda. The edited image
showing them together gave the false impression that Kerry shared the controversial
antiwar views of activist Jane Fonda (Light, 2004). In a more recent example, in January
2014, the Associated Press news agency fired its Pulitzer Prize-winning photographer
Narciso Contreras for digitally removing an object from one of his widely distributed
photographs of the Syrian civil war (The Guardian, 2014). This case has stirred an ongo-
ing and contested discussion about the authenticity of digital photographs, the potential
repercussions of image manipulation, and the ethics code in photojournalism. Numerous
other examples exist where fake imagery has been used to distort the truth and manipulate
viewers (for more examples, see http://pth.izitru.com/). It remains unclear how prevalent
instances of undetected photo manipulation are.
The damage done by manipulated imagery is real, substantial, and persistent. Studies
suggest that manipulated images can distort viewers’ memory (Wade et al., 2002),
thereby further enhancing the credibility of these images, and can even influence
decision-making behaviors such as voting (Bailenson et al., 2008; Nash et al., 2009).
Moreover, even when individuals do become aware of the true nature of a forgery, the
harmful impact of misinformation on their perception, memory, emotions, viewpoints,
and attitude toward a topic can linger (Sacchi et al., 2007). Quite often the distribution of
fake images will far surpass the distribution of any correction or attempt to expose the
forgery (Friggeri et al., 2014). These factors combine to make image manipulation an
extremely effective form of manipulation that is difficult to combat.
While there is a growing awareness that images should no longer be automatically
assumed to be credible, authentic, or reliable sources of information, the general public
remains vulnerable to visual deception. Due to the scope and speed of information dis-
semination across social media websites, the potential for ill-intentioned players to inflict
emotional distress or to purposefully influence opinions, attitudes, and actions through
visual misinformation poses a severe and growing societal risk. Yet we know distress-
ingly little about how online viewers assess or make credibility judgments of online
images. This article details a large-scale online experiment of image credibility that
seeks to understand how individuals evaluate manipulated images that accompany online
stories, and what features (image-related and non-image-related) impact their credibility
judgment. The images tested in this study were altered using common manipulation tech-
niques: composition, elimination, and retouching (identified in Kasra et al., 2018).
The research design was informed by earlier research on social and heuristic
approaches to credibility judgment as well as by our previous exploratory findings on
online image credibility (Kasra et al., 2018). Previous research in this area has
predominantly focused either on fake image detection using machine learning approaches
(Gupta et al., 2013) or on the credibility of textual information, such as websites and
blogs (Allcott and Gentzkow, 2017; Morris et al., 2012; Wineburg and McGrew, 2016).
These studies tend to assume that individuals make credibility evaluations on their own
without considering that decisions are heavily influenced by one’s social networks. Our
study is among the first to test the social and cognitive heuristics of information credibil-
ity and evaluation in the context of image authenticity.
Users rarely perform any evaluation behaviors (such as seeking out other sources to
validate information, or checking out the author) to verify the credibility of online
information (Metzger, 2007). To save cognitive time and effort, they instead process and
determine the credibility of the information by relying on social and heuristic cues (Fogg
et al., 2003; Metzger et al., 2010). These strategies include social information pooling
(especially from like-minded others), cognitive heuristics (e.g. website reputation,
endorsements, consistency, and expectancy violation), and persuasive intent (Metzger
et al., 2010).
Social network sites such as Facebook and Twitter introduce a host of system-gener-
ated metrics (e.g. number of followers) as cognitive heuristics for credibility judgment
(Sundar, 2008; Westerman et al., 2012, 2014). Similarly, information sharing behaviors
on social network sites and news media sites, facilitated by the prevalence of “social but-
tons” (Gerlitz and Helmond, 2013), present additional cognitive heuristics for credibility
evaluation. For instance, encountering a “secondhand” Facebook post, originally shared
by the New York Times but now reshared on an individual’s or organization’s page,
could complicate users’ credibility assessment. Unfortunately, there has been relatively
little empirical evidence on how the credibility of intermediaries may affect people’s
credibility judgment, especially when the credibility of the intermediary is inconsistent
with that of the original source. It is therefore crucial that, in addition to features related
to the original source, research on credibility evaluation takes into account the interme-
diaries, or “second-order” sources that share or endorse information originally published
by someone else.
Furthermore, most online credibility research to date focuses on textual information.
Among the few empirical studies specifically on image credibility, for instance, Gupta
et al. (2013) identified more than 10,000 tweets containing fake photos, and used machine
learning models to compare fake images to real ones based on several features. These
features ranged from characteristics of the source (e.g. number of followers and whether
the user is verified) to the quality of the tweet (e.g. length, sentiment, or hashtags used).
Their results showed that tweet-based features distinguished fake from real images very well,
while source-based (user-based) features performed poorly.
Prior research suggests that viewers tend to believe the content depicted in online
images. A study measuring students’ ability to evaluate online sources of information
also found that most high school students accept photographs as facts without verifying
them (Wineburg and McGrew, 2016). Similarly, an exploratory study based on groups of
US college students found that, in general, users are overly trusting toward images on the
web (Kasra et al., 2018). More importantly, this study also revealed that viewers made
their credibility judgments based on non-image-related features such as source and
media channel, instead of image-related features such as inconsistencies in lighting,
shadow, or color (Kasra et al., 2018).
This article reports the results of a comprehensive experiment on contextual assess-
ment of image credibility online, measuring several factors such as the effects of source,
intermediary, and digital media literacy on viewers’ contextual assessments. In the fol-
lowing sections, we present hypotheses on six major factors that may influence credibil-
ity evaluation: source trustworthiness, source and media type, intermediary, bandwagon
cues, digital skills and experiences, and pro-issue attitude.
Source trustworthiness
Decades of credibility research have concluded that the reputation of the source is an
important credibility heuristic (Metzger et al., 2010), and that credibility lies foremost in the
trustworthiness and expertise of the source itself (Tseng and Fogg, 1999). Users tend to
transfer the reputation of the source (companies as well as news organizations) to the
content itself (Metzger et al., 2010). In a previous focus group study on fake images,
most participants relied heavily on the source of online information, such as nationally
recognized news organizations, to determine the credibility of an image (Kasra et al.,
2018). So, source trustworthiness appears to play a critical role in evaluating online
images as well. Therefore,
H1: Images from more credible sources will be perceived as more credible than those
from less credible sources.
H2(a): Images from news organizations will be perceived as more credible than those
from individuals.
H2(b): Images from a news organization’s official website will be perceived as more
credible than those from their social media accounts.
Information sharing and curating behaviors online, fueled by the prevalence of “social
buttons” on social network sites and news media websites (Gerlitz and Helmond, 2013),
H3: Images from more credible intermediaries will be perceived as more credible
than those from less credible intermediaries.
RQ1: Are images shared by an intermediary perceived as more credible than images
without an intermediary?
RQ2: Do source credibility and intermediary credibility interact with each other in
affecting viewers’ image credibility evaluation?
The social buttons (like, share, favorite, etc.) available on websites and social media
often aggregate opinions about specific stories. Several studies have shown consistent
evidence for “bandwagon” effects where people are more likely to agree with a per-
ceived consensus from aggregate metrics (Lee and Sundar, 2013; Sundar, 2008). In ask-
ing participants to select and read news articles with different recommendation ratings,
Knobloch-Westerwick et al. (2005) found that higher rated articles were selected more
often. We expect the same heuristic for online images, therefore:
H4: Images with higher levels of bandwagon cues will be perceived as more credible
than those with lower levels of bandwagon cues.
H5: People with greater levels of (a) photography and digital imaging experience and
(b) general Internet skills will be more likely to perceive fake images as less credible
than people with less experience/skills.
H6: People who use (a) Facebook and (b) Twitter more often will perceive fake images
as less credible than people who use (a) Facebook and (b) Twitter less frequently.
Issue attitudes
Finally, confirmation bias is a well-established finding in the context of credibility judg-
ment (e.g. Knobloch-Westerwick et al., 2015), in that people are more likely to perceive
something as credible if it confirms their existing beliefs and opinions. This confirmation
bias is probably even more pronounced for information related to politics or current
events (Metzger et al., 2010). Therefore,
H7: People who have a high level of pro-attitude toward the issue depicted in the image
will perceive it as more credible than people with a low level of pro-attitude.
Study design
Based on previous findings reviewed above, we identified 4 factors to be manipulated
and tested in the experiment and statistically controlled 2 factors. We adopted a partial
factorial design: Source credibility (High Credibility/Low Credibility) × Source and Media
Type (Website/Social Media Organization Account/Social Media Individual Account) ×
Intermediary (No/High Trust/Low Trust) × Bandwagon (High/Low), resulting in 28
unique conditions for each image tested (Table 1).
Note that a fully factorial design would have included 36 conditions (2 × 3 × 3 × 2 = 36),
but our partial factorial design had only 28 conditions. We omitted 8 conditions
where an intermediary shared content from a news organization’s website (e.g. a person
sharing a story on Facebook from the New York Times website), because such conditions
typically result in a thumbnail of the original image, which is significantly smaller than the
image presented on the original website. In such cases, readers are required to click through
the link in order to view the image in its original form. To ensure all images were presented
and consumed in the same size and resolution without further clicking (see next section for
mockup creation), we decided to exclude these 8 conditions, while keeping the conditions
where an intermediary is sharing content from a news organization’s social media account
(e.g. a person sharing a story on Facebook from the New York Times Facebook account).
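The arithmetic behind the 28 conditions can be checked with a short enumeration. The sketch below is illustrative only: the level labels and the function names (`full_factorial`, `is_excluded`, `partial_factorial`) are our own assumptions and are not part of the study materials. It builds the 36-cell full design and drops the cells in which an intermediary shares content from a news organization's website.

```python
from itertools import product

# Factor levels as described in the study design; the label strings are illustrative.
SOURCE_TRUST = ["High", "Low"]
SOURCE_MEDIA = ["Org. Website", "Org. Social Media", "Individual Social Media"]
INTERMEDIARY = ["None", "High Trust", "Low Trust"]
BANDWAGON = ["High", "Low"]


def full_factorial():
    """Return every combination of the four factors (2 x 3 x 3 x 2)."""
    return list(product(SOURCE_TRUST, SOURCE_MEDIA, INTERMEDIARY, BANDWAGON))


def is_excluded(condition):
    """True for cells where an intermediary shares content from an organization's website."""
    _, media, intermediary, _ = condition
    return media == "Org. Website" and intermediary != "None"


def partial_factorial():
    """Return the full design minus the excluded website-plus-intermediary cells."""
    return [c for c in full_factorial() if not is_excluded(c)]


if __name__ == "__main__":
    full = full_factorial()
    kept = partial_factorial()
    # Prints: 36 8 28
    print(len(full), len(full) - len(kept), len(kept))
```

The excluded cells are 2 (source credibility) × 2 (high/low trust intermediary) × 2 (bandwagon) = 8, which leaves 36 − 8 = 28 conditions.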
| Condition | Source trustworthiness | Source used | Source and media type | Intermediary | Intermediary used | Bandwagon | Sample bandwagon cues |
|---|---|---|---|---|---|---|---|
| 1 | High | New York Times | Org. Website | NA | | Low | 50 likes, 11 shares, 1,000 page views |
| 2 | High | New York Times | Org. Website | NA | | High | 22,575 likes, 1,490 shares, 87,501 page views |
| 3 | High | New York Times | Org. Twitter/Facebook | NA | | Low | 5 favorites, 3 retweets |
| 4 | High | New York Times | Org. Twitter/Facebook | NA | | High | 64,540 favorites, 26,361 retweets |
| 5 | High | New York Times | Org. Twitter/Facebook | Low | Buzzfeed | Low | 5 favorites, 3 retweets |
| 6 | High | New York Times | Org. Twitter/Facebook | Low | Buzzfeed | High | 64,540 favorites, 26,361 retweets |
| 7 | High | New York Times | Org. Twitter/Facebook | High | NPR/Bill Gates | Low | 5 favorites, 3 retweets |
| 8 | High | New York Times | Org. Twitter/Facebook | High | NPR/Bill Gates | High | 64,540 favorites, 26,361 retweets |
| 9 | High | Bill Gates | Personal Twitter/Facebook | NA | | Low | 3 favorites, 2 retweets |
| 10 | High | Bill Gates | Personal Twitter/Facebook | NA | | High | 30,635 favorites, 10,000 retweets |
| 11 | High | Bill Gates | Personal Twitter/Facebook | Low | Rachael Hughes/Mark Smith | Low | 3 favorites, 2 retweets |
| 12 | High | Bill Gates | Personal Twitter/Facebook | Low | Rachael Hughes/Mark Smith | High | 30,635 favorites, 10,000 retweets |
| 13 | High | Bill Gates | Personal Twitter/Facebook | High | NPR | Low | 3 favorites, 2 retweets |
| 14 | High | Bill Gates | Personal Twitter/Facebook | High | NPR | High | 30,635 favorites, 10,000 retweets |
Table 1. (Continued)
(Smith, n.d.). We also included two generic individuals, Rachael Hughes and Mark
Smith, to represent ordinary social media users. Fifty-three undergraduate students at a
US university were recruited to pretest the manipulations: various sources
(including both news organizations and individuals) were presented to participants to
obtain their ratings of source trustworthiness.
The pretest results revealed the New York Times as the most trustworthy among news
organizations, so it was included in our study as a highly credible source. National Public
Radio (NPR) also scored consistently high and was included as a highly credible interme-
diary. Buzzfeed was regarded as a low credibility news organization (Table 1). As for indi-
viduals, Bill Gates was rated the most trustworthy individual. Generic individuals Rachael
Hughes and Mark Smith received low credibility ratings and were both included in the
manipulation as less credible individuals to counterbalance possible biases due to gender.
Source and media type. This factor included three levels: a news organization’s website
(e.g. New York Times’ website), a news organization’s social media account (e.g. New
York Times’ Twitter/Facebook account), and an individual’s social media account (e.g.
Bill Gates’ Twitter/Facebook account). We did not consider an individual’s website as a
separate media platform, as based on prior research this combination is rarely a source of
news information (Flanagin and Metzger, 2007). To assess the respective impact of Twit-
ter and Facebook, we adopted Twitter as the social media account for half of the images
tested, and Facebook for the other half (see Table 1).
Intermediary. This factor included three levels: no intermediary, intermediary with low
trustworthiness, and intermediary with high trustworthiness. We used the same or similar
news sources to manipulate the intermediary who helps diffuse images from the original
source on social media sites. Highly credible intermediaries included National Public
Radio (news organization) and Bill Gates (individual), and less credible intermediaries
included Buzzfeed (news organization) and Rachael Hughes/Mark Smith (individual).
Figure 1. Mockup for Condition #2: A fake news article purportedly from the New York Times
website. The image was modified to depict a bridge collapse.
Figure 2. Mockup for Condition #21: A fake Facebook post allegedly created by Buzzfeed and
shared by Bill Gates. The image was composed by layering three separate images.
Figure 3. Mockup for Condition #18: A fake Twitter post allegedly created by Buzzfeed. The
image was composed by layering two separate images to create a cat-mouse chimera.
Figure 4. Mockup for Condition #19: A Facebook post allegedly created by Buzzfeed and
shared by a generic person, Mark Smith. The image was composed by layering three separate
images to depict a school in Africa.
political event, scientific discovery, natural disaster, and social issues. The images
depicted (1) a bridge collapse in China, (2) a gay couple accompanied by their children,
(3) a genetically modified mouse with a cat’s head, (4) a school in Africa, (5) a bombing in
Syria, and (6) a Hispanic politician meeting with students. A short textual description
accompanied all conditions. The captions provided brief textual information about the
content depicted, similar to how images are shared and viewed on the Web.
Figure 5. Mockup for Condition #13: A Facebook post allegedly created by Bill Gates and
shared by NPR. The image was modified to display airstrikes in Syria.
Figure 6. Mockup for Condition #8: A fake Twitter post purportedly created by the New York
Times and retweeted by Bill Gates. The image was composed by layering two separate images
to show a politician meeting with students.
We created mockup compositions for each image based on the 28 experimental conditions,
resulting in 28 mockups per image. The total number of mockups tested was 168 (28
conditions × 6 images). Every participant was exposed to only one mockup depicting
one single image. For Figures 1, 3, and 6 (bridge collapse, genetically modified mouse,
Hispanic politician), we used the Twitter interface to create the mockups wherever appropriate.
For Figures 2, 4, and 5 (gay couple, African school, and Syrian bombing), we used
the Facebook interface to create the mockups wherever appropriate. We used the actual
social media handles and profile pictures for all news organizations and public figures.
The two individuals, Rachael Hughes and Mark Smith, had generic Twitter and Facebook
silhouettes instead of personalized profile pictures to keep them consistent and prevent
partiality (see Figures 1 to 6). Note that silhouette profile images may imply anonymity,
which could adversely affect trustworthiness. However, Rachael Hughes and Mark
Smith are intended to be low-credibility individuals, so reduced trustworthiness due to
silhouette usage aligns with our study design.
Our goal was to understand (1) how viewers evaluate image credibility online and (2)
what contextual cues and features (image-related and non-image-related) impact their
credibility judgment. By including only fake images in the study instead of using unal-
tered, original images, we wanted to avoid the possibility of the participants having pre-
viously encountered one of the images. Such familiarity could have influenced their
credibility evaluation without consideration of the features we attempted to test in this study.
Image credibility. Six items on a 7-point scale (1 = strongly disagree, 7 = strongly agree)
were used to measure participants’ perception of the credibility of the photo. The items
were adapted from Flanagin and Metzger’s (2007) scale of message credibility online
and modified for use in the current study, and assessed the extent to which participants
perceived the image to be believable, original, authentic, fake, manipulated, and
retouched. Negatively worded items were reverse-coded. Then all items were averaged
to create a composite credibility score (Cronbach’s alpha = .95 for the whole sample).
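The scoring step described above (reverse-code, then average) can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function names, the item keys, and the sample response are hypothetical, and treating "fake", "manipulated", and "retouched" as the negatively worded items is our reading of the item list.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 7  # 7-point scale: 1 = strongly disagree, 7 = strongly agree

# Assumption: these three items are the negatively worded ones.
NEGATIVE_ITEMS = {"fake", "manipulated", "retouched"}


def reverse_code(score):
    """Flip a score on the scale, so 1 becomes 7, 2 becomes 6, and so on."""
    return SCALE_MIN + SCALE_MAX - score


def composite_credibility(responses):
    """Average the six items after reverse-coding the negatively worded ones."""
    adjusted = [
        reverse_code(value) if item in NEGATIVE_ITEMS else value
        for item, value in responses.items()
    ]
    return mean(adjusted)


if __name__ == "__main__":
    # Hypothetical response from one participant.
    example = {
        "believable": 6, "original": 5, "authentic": 6,
        "fake": 2, "manipulated": 3, "retouched": 2,
    }
    print(round(composite_credibility(example), 2))
```

With this coding, higher composite scores always indicate higher perceived credibility, regardless of how an individual item was worded.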
Internet skills. Participants’ Internet skills were measured by their self-reported familiar-
ity with 10 Internet-related terms (e.g. phishing and tagging) on a 5-point Likert-type
scale (Hargittai and Hsieh, 2012). A composite score of Internet skills was obtained by
taking the mean of these 10 items (Cronbach’s alpha = .92).
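For readers unfamiliar with the reliability coefficient reported for each scale, the sketch below computes Cronbach's alpha with the standard formula, alpha = k / (k − 1) × (1 − sum of item variances / variance of total scores). The function name `cronbach_alpha` and the toy data are hypothetical; the sketch assumes complete responses (no missing items) and uses sample variances.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Compute Cronbach's alpha.

    rows: list of per-participant lists, one score per item (no missing values).
    """
    if len(rows) < 2:
        raise ValueError("need at least two participants")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")

    # Variance of each item across participants (columns of the data).
    item_variances = [variance(column) for column in zip(*rows)]
    # Variance of each participant's summed score.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Toy data: five hypothetical participants answering two items.
    toy = [[1, 2], [2, 1], [3, 3], [4, 5], [5, 4]]
    print(round(cronbach_alpha(toy), 3))
```

Values close to 1 indicate that the items vary together, which is what justifies averaging them into a single composite score.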
Digital imaging experiences and skills. Two items on a 5-point scale (1 = None, 5 = I’m an
expert) were used to measure participants’ digital imaging (e.g. photo editing) experi-
ences and skills (Greer and Gosen, 2002; Cronbach’s alpha = .74).
Facebook use/Twitter use. We first asked if participants had a Facebook account (Y/N). If
they did, we proceeded with a six-item instrument on a 5-point Likert-type scale
(1 = strongly disagree, 5 = strongly agree), adapted from Ellison et al. (2007), to measure
participants’ intensity of Facebook use (e.g. “Facebook is part of my everyday activity”;
“I would be sorry if Facebook shut down”). A composite intensity score was created by
taking the average of these 6 items (Cronbach’s alpha = .90). People who did not have a
Facebook account skipped this scale. Twitter use was measured in a similar manner
(Cronbach’s alpha = .91).
Pro-issue attitude. Two issue-relevant questions were used to measure participants’ pre-
existing attitudes toward the issue depicted in the image they were about to encounter in
the mockup. The issue relevant questions were adapted from Treier and Hillygus (2009)
and modified to fit each of the images tested. For example, the issue attitude question for
Figure 3 (genetically modified mouse) asked the participant whether it is ethical or
acceptable to genetically modify animals for research purposes. Negatively worded
questions were reverse-coded, and then the two items were averaged to create a
composite score of pro-issue attitude.
Demographics. At the end of the survey, participants were asked to indicate their biologi-
cal sex, age, race, annual household income, and education level. Participants’ age and
sex were included in our analysis as control variables.
examining the image before answering the credibility questions, the “Next” button was
not displayed on the page until 30 seconds later. After rating image credibility, partici-
pants answered questions on their social media use, followed by demographic questions.
The entire survey took approximately 5 minutes to complete. After completion, partici-
pants were paid $0.25 for the task.
In order to compare how the MTurk sample may deviate from typical undergraduate
samples often employed in social science studies, including web credibility studies
(Hargittai and Hsieh, 2012), we also included a sample of undergraduate students from a
large public university on the west coast of the United States for study #3 (with the image
depicting a genetically modified cat/mouse). There were 486 MTurk workers and 401
undergraduate students who completed study #3, all randomly assigned to the 28
experimental conditions. Compared to MTurk workers, students were significantly younger
(Mstudents = 21.24, SD = 2.58, t = −27.00, p < .001), included more women (71.54%, Pearson
χ² = 45.11, p < .001), had lower Internet skills (Mstudents = 3.31, SD = 0.84, t = −12.23,
p < .001), had less photography/digital imaging experience (Mstudents = 2.74, SD = 0.75,
t = −3.70, p < .001), and were less likely to have a Twitter account (45.64%, Pearson
χ² = 17.91, p < .001) but equally likely to have a Facebook account (92.02%, Pearson
χ² = 1.64, p = .201).
They were also marginally more credulous than MTurk workers (Mstudents = 1.95, SD = .95,
t = 1.87, p = .06).1 In the following section, only MTurk sample results are reported.
There were slightly more men (N = 1902, 54.72%) than women (N = 1548, 44.53%)
among those who completed the study. Participants were between 20 and 87 years old
(one participant reported being 11 years old and was removed, as all participants were
required to be 18 or older to enter the study), with a mean age of 34.71 years (SD = 11.16).
The largest household income category was less than 30,000 US dollars annually.
Participants were well-educated, with 89.8% reporting at least some college or above.
Detailed demographic statistics are reported in Table 2.
Overall, we observed significant differences in the average credibility judgment of the
six images, as expected. The mean credibility ratings on a 7-point scale for each of the
images are: 4.65 (SD = 1.19, bridge collapse), 3.86 (SD = 1.74, gay couple adopting chil-
dren), 1.83 (SD = 0.96, genetically modified mouse), 3.08 (SD = 1.66, a school in Africa),
4.06 (SD = 1.35, Syrian bombing), and 2.29 (SD = 1.32, Hispanic politician). The descrip-
tive statistics and correlations are reported in Table 3.
To test all hypotheses and answer research questions, we ran an analysis of covariance
(ANCOVA) for all participants (N = 3476), with all four experimental factors (source trust-
worthiness, source and media type, intermediary, and bandwagon). The participant’s sex
and the image tested were considered as fixed factors. The covariates were the participant’s
age, digital imaging experience, Internet skills, and favorable attitude toward the issue. An
interaction term between source trustworthiness and intermediary was also included.
H1 predicted that images from highly trustworthy sources are evaluated as more cred-
ible than those from less trustworthy sources. H1 was not supported, as source trustwor-
thiness did not have a significant main effect in the whole model, F(1, 3449) = 1.64,
p = .20.
| Variable | Category | Total | Percent |
|---|---|---|---|
| Sex | Men | 1902 | 54.72 |
| | Women | 1548 | 44.53 |
| | Not disclosed | 26 | 0.75 |
| Age | M = 34.71, SD = 11.16 | | |
| Race | White/Caucasian | 2596 | 74.68 |
| | African American | 241 | 6.93 |
| | Hispanic | 199 | 5.72 |
| | Asian | 325 | 9.35 |
| | Native American | 19 | 0.55 |
| | Pacific Islander | 11 | 0.32 |
| | Other | 62 | 1.78 |
| | Rather not to disclose | 23 | 0.66 |
| Income | Less than US$30,000 | 819 | 23.56 |
| | US$30,000–US$39,999 | 480 | 13.81 |
| | US$40,000–US$49,999 | 399 | 11.48 |
| | US$50,000–US$59,999 | 391 | 11.25 |
| | US$60,000–US$69,999 | 286 | 8.23 |
| | US$70,000–US$79,999 | 268 | 7.71 |
| | US$80,000–US$89,999 | 145 | 4.17 |
| | US$90,000–US$99,999 | 164 | 4.72 |
| | US$100,000 or more | 462 | 13.29 |
| | Rather not to disclose | 62 | 1.78 |
| Education | Less than High School | 21 | 0.60 |
| | High School/GED | 319 | 9.18 |
| | Some College | 893 | 25.69 |
| | 2 year College Degree | 379 | 10.90 |
| | 4 year College Degree | 1340 | 38.55 |
| | Masters Degree | 396 | 11.39 |
| | Doctoral Degree | 50 | 1.44 |
| | Professional Degree (JD, MD) | 64 | 1.84 |
| | Rather not to disclose | 14 | 0.40 |
H2a predicted that images from news organizations are perceived as more credible
than those from individuals. H2b predicted that images from an organization’s official
website will be perceived as more credible than those from their social media accounts.
We tested the main effect of source and media type, as well as planned contrasts between
the three levels within the factor (news organization website, news organization social
media, individual social media). The results showed that the main effect was nonsignifi-
cant for the whole sample, F(2, 3449) = 1.75, p = .17, so were the planned contrasts. As a
result, H2a and H2b were not supported.
| | Variable | Mean | SD | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Age | 35.71 | 11.16 | | | | | | |
| 2 | Photo experience | 2.82 | 0.79 | −.09** | | | | | |
| 3 | Pro-issue attitude | 4.14 | 1.81 | −.07** | .004 | | | | |
| 4 | Internet skills | 4.04 | 0.82 | −.04* | .33** | .04* | | | |
| 5 | Photo credibility | 3.50 | 1.73 | .07** | −.11** | .05** | −.05** | | |
| 6 | TW intensity (account holders only, N = 1927) | 2.58 | 1.06 | −.08** | .10** | −.01 | .01 | .03 | |
| 7 | FB intensity (account holders only, N = 3114) | 3.04 | 1.03 | .12** | .04* | −.04* | −.04** | .03 | .19** |
H3 predicted that images from more credible intermediaries will be perceived as more
credible than those from less credible intermediaries, while RQ1 asked if having an
intermediary affects image credibility. The factor intermediary did not have a significant
main effect, F(2, 3449) = 0.97, p = .38. Subsequent planned contrasts among the three
levels (no intermediary, low trustworthiness, and high trustworthiness) yielded nonsig-
nificant results. Therefore, H3 was not supported, and the answer to RQ1 was negative.
RQ2 explored the potential interaction between source trustworthiness and intermediary.
We again found no significant interaction in any of the models.
H4 predicted that images with higher levels of bandwagon cues such as shares and
favorites will be perceived as more credible than those with lower levels of bandwagon
cues. The main effect of bandwagon cues was nonsignificant in the whole model, F(1,
3449) = 0.04, p = .85, as well as in both subsamples. H4 was therefore not supported.
H5a predicted that people with greater amounts of photography experience and digital
imaging skills will perceive images as less credible compared to people with less skill or
experience. This hypothesis was supported in the whole model, F(1, 3449) = 12.38, p <
.001. H5b predicted that people with greater levels of Internet skills will perceive images
as less credible compared to people with lower skills. This prediction was also supported,
F(1, 3449) = 6.79, p = .01.
To investigate further how Facebook and Twitter use in particular may play a role in
credibility judgment of online images using either Facebook or Twitter mockups, we
divided the participants into two subsamples: the Twitter sample is based on people who
were exposed to Figures 1, 3, or 6 (bridge collapse in China, genetically modified
cat/mouse, Hispanic politician), where the Twitter interface was used in the mockup; the
Facebook sample is based on people who were exposed to Figures 2, 4, or 5 (gay couple,
African school, and Syrian bombing). We ran separate ANCOVAs on both samples,
using the same design as the whole model, first adding whether participants have a
Facebook/Twitter account (binary variable), and if they do, their Facebook/Twitter use
intensity measure, respectively. This resulted in two ANCOVA models for the Facebook
subsample and two ANCOVA models for the Twitter subsample (Table 4).
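The subsample construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the figure-to-platform mapping comes from the text, but the record layout, the field names (`figure`, `has_twitter_account`, `has_facebook_account`), and the four example participants are hypothetical and are not the study's data. The sketch stops at building the samples and does not fit the ANCOVA models.

```python
# Figure-to-platform mapping taken from the paragraph above.
TWITTER_FIGURES = {1, 3, 6}
FACEBOOK_FIGURES = {2, 4, 5}


def platform_for(figure):
    """Return the mockup platform for a figure number (1-6)."""
    if figure in TWITTER_FIGURES:
        return "twitter"
    if figure in FACEBOOK_FIGURES:
        return "facebook"
    raise ValueError(f"unknown figure: {figure}")


def split_subsamples(participants):
    """Partition participant records by the platform of the figure they saw."""
    subsamples = {"twitter": [], "facebook": []}
    for person in participants:
        subsamples[platform_for(person["figure"])].append(person)
    return subsamples


def account_holders(subsample, platform):
    """Keep only participants who report an account on the given platform."""
    return [p for p in subsample if p[f"has_{platform}_account"]]


# Made-up records for illustration only; field names are hypothetical.
participants = [
    {"id": 1, "figure": 1, "has_twitter_account": True, "has_facebook_account": True},
    {"id": 2, "figure": 2, "has_twitter_account": False, "has_facebook_account": True},
    {"id": 3, "figure": 6, "has_twitter_account": False, "has_facebook_account": True},
    {"id": 4, "figure": 5, "has_twitter_account": True, "has_facebook_account": False},
]

subsamples = split_subsamples(participants)

# The use-intensity models apply to account holders only, so those
# samples are subsets of the platform subsamples.
twitter_intensity_sample = account_holders(subsamples["twitter"], "twitter")
facebook_intensity_sample = account_holders(subsamples["facebook"], "facebook")
```

In this layout, the account-ownership models would use the full platform subsample, while the use-intensity models would use the smaller account-holder samples, which is consistent with the smaller denominator degrees of freedom reported for the intensity models.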
H6a predicted that people who use Facebook more will perceive images as less
credible compared to people who use Facebook less. This hypothesis did not receive
support, as Facebook use intensity was not associated with credibility rating, Model 3:
F(1, 1535) = 0.86, p = .35. H6b predicted that people who use Twitter more will perceive
images as less credible compared to people who use Twitter less. This hypothesis was
supported, as Twitter use intensity was significant, Model 5: F(1, 999) = 5.98, p = .02.
Table 4. ANCOVA predicting image credibility.
H7 predicted that people’s support of the issue depicted in the image is positively
related to their credibility rating of the image. This hypothesis received strong support in
the whole sample, F(1, 3449) = 9.00, p < .001, and Facebook subsample, Model 2: F(1,
1701) = 10.94, p = .001; Model 3, F(1, 1535) = 8.74, p = .003.
Finally, participants’ sex and age were included as controls. Age showed a strong
main effect across the board, Model 1: F(1, 3449) = 44.08, p < .001. Sex was significant
in the Facebook subsample, Model 2: F(1, 1701) = 39.61, p < .001; Model 3, F(1,
1535) = 36.29, p < .001, but not significant in the whole sample, Model 1: F(1,
3449) = 2.63, p = .105, or the Twitter subsample.
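For readers who want to relate the reported F statistics to their p values, the following Python sketch uses a large-sample approximation. It is not the authors' procedure and it is not an exact F test: it assumes the denominator degrees of freedom are large (here they are in the thousands), so that an F statistic with one numerator degree of freedom behaves like a squared standard normal, and one with two numerator degrees of freedom behaves like half of a chi-square with 2 df. The function name `approx_p_from_f` is hypothetical. Exact values would require the F distribution from a statistics package.

```python
import math
from statistics import NormalDist


def approx_p_from_f(f_value, df_num):
    """Approximate p for an F test when the denominator df is large.

    Large-sample approximation only: with df_num == 1, F behaves like a
    squared standard normal; with df_num == 2, 2 * F behaves like a
    chi-square with 2 df, whose upper tail probability is exp(-F).
    """
    if f_value < 0:
        raise ValueError("F statistics cannot be negative")
    if df_num == 1:
        z = math.sqrt(f_value)
        return 2.0 * (1.0 - NormalDist().cdf(z))
    if df_num == 2:
        return math.exp(-f_value)
    raise ValueError("this sketch only covers 1 or 2 numerator df")


# F statistics as reported in the text.
p_sex_whole = approx_p_from_f(2.63, 1)      # reported: F(1, 3449), p = .105
p_intermediary = approx_p_from_f(0.97, 2)   # reported: F(2, 3449), p = .38
p_fb_intensity = approx_p_from_f(0.86, 1)   # reported: F(1, 1535), p = .35
```

Because the reported F values are rounded to two decimals, small discrepancies in the last digit of a recomputed p value are to be expected.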
As tools for creating and manipulating digital images become increasingly commonplace
and easy to use, fake images continue to propagate across social media platforms
and the contemporary media environment, influencing viewers and posing a significant
sociopolitical threat around the world. It is thus imperative to better understand how
people evaluate the credibility of online images. This study reports the findings from a
large-scale experiment on image credibility evaluations on the Web, conducted on
Amazon MTurk. Based on previous work on social and cognitive heuristics for evaluating
online credibility, we tested the effects of several features, such as source, intermediary,
and the background and skills of the viewers, on assessing the credibility of images
online. The results were consistent across all six images tested, showing that viewers’
Internet skills, digital imaging experiences, social media use, and pro-issue attitude are
significant predictors of image credibility evaluation. However, none of the image context
features tested (for example, where the image was posted or how many people
liked it) had an impact on participants’ credibility judgments. Our findings also reveal
that credibility evaluations are far less impacted by the content of an online image.
Instead, they are influenced by the viewers’ backgrounds, prior experiences, and digital
media literacy.
This study contributes critical insights to image credibility research. Past studies
reported that people generally believe that they are rarely capable of identifying fake
images as such (Farid and Bravo, 2010), and that images are generally considered trust-
worthy (Kasra et al., 2018; Nightingale et al., 2017). Yet the credibility variance of
images was limited in these studies, either by the measurement scale employed (binary
yes/no), or by the topics and contexts of the images. To ensure a large variance, our study
purposefully chose six fake images depicting a wide range of issues. Each image was
forged using various image-manipulation techniques (e.g. composition, elimination,
retouching) and exhibited different levels of sophistication. Contrary to what the previ-
ous studies suggested, our results show that people are not as gullible in evaluating image
credibility on the Web. Our participants rated four images as fake or manipulated (below
4 on a 7-point scale). The other two images were rated only a little above the midpoint.
This result indicates that participants, no matter how careless or distracted they may be,
can still be discerning consumers of digital images.
Compared to previous research, our study implemented three notable changes. First,
our design recognized that image consumption and evaluation on the Internet are always
contextual rather than occurring in a vacuum. We therefore provided brief textual infor-
mation about each fabricated image, similar to how images are usually presented and
viewed online. Second, taking into account that online information is continuously
shared and reshared by different sources, we explicitly manipulated and tested whether
the existence and trustworthiness of an intermediary had any bearing on image credibil-
ity evaluation. Third, we adopted a measure of credibility (6 items, on a 7-point scale)
that is more nuanced than a binary yes/no choice, which was used in the study by
Nightingale et al. (2017). Our scale has better reliability and validity than a binary meas-
ure, which is prone to false positive and false negative results.
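To make the contrast with a binary measure concrete, the sketch below shows how a six-item, 7-point scale could be scored and how its internal consistency could be checked. Several things here are assumptions rather than details from this section: that the composite is the mean of the six items, that Cronbach's alpha is the reliability index of interest, and the example ratings, which are made up for illustration.

```python
from statistics import mean, variance


def composite_score(item_ratings):
    """Average one participant's six 1-7 item ratings into a single score."""
    if len(item_ratings) != 6:
        raise ValueError("expected six item ratings")
    if any(not 1 <= r <= 7 for r in item_ratings):
        raise ValueError("ratings must lie on the 1-7 scale")
    return mean(item_ratings)


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-participant rating lists."""
    k = len(responses[0])
    # Sample variance of each item (column) across participants.
    item_variances = [variance(col) for col in zip(*responses)]
    # Sample variance of the participants' summed scores.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up ratings for illustration only (rows = participants, columns = items).
responses = [
    [2, 3, 2, 3, 2, 3],
    [5, 5, 6, 5, 6, 5],
    [4, 4, 3, 4, 4, 3],
    [1, 2, 1, 1, 2, 2],
]

scores = [composite_score(row) for row in responses]
alpha = cronbach_alpha(responses)

# A composite below the scale midpoint of 4 leans toward "not credible".
below_midpoint = [score < 4 for score in scores]
```

Unlike a yes/no item, the composite keeps gradations of doubt: a participant scoring 3.7 and one scoring 1.5 both fall below the midpoint but are clearly distinguishable.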
The most significant discovery of our study is that viewers’ skills and experience
greatly impact their image credibility evaluations. The more knowledge and experience
people have with the Internet, digital imaging and photography, and the online media
platforms, the better they are at evaluating image credibility. Our results suggest that to
mitigate the potential harm caused by fake images online, the best strategy is investing
in educational efforts to increase users’ digital media literacy. Meanwhile, issue attitude
has a significant effect as well. This is consistent with the confirmation bias found in
many similar studies (e.g. Knobloch-Westerwick et al., 2015) that people are more likely
to accept an image as real if it aligns with their prior beliefs. This finding could explain
why fake news spreads so readily in social media settings.
Several well-researched social and cognitive heuristic cues found in online credibility
research (e.g. source trustworthiness, media platform, and bandwagon cues) did not have
any significant effect on image credibility. Although surprising, this result does not mean
that the process of image credibility evaluation is inherently different from the process of
judging online information, nor does it mean that people use very different heuristics.
We speculate that workers on MTurk might have been rushing through the experiment
without paying enough attention to the various source, intermediary, and bandwagon
cues. MTurk participants were certainly not motivated to pay attention (Antin and Shaw,
2012), as they were compensated by the completion of the task, regardless of the response
quality. We included a few quality-control mechanisms, such as a 30-second minimum
stay time on the image page before participants could advance to the next page.2 However,
it can be argued that a rushed, careless scan without motivation to consider various cues
is indeed how people consume news and images in today’s media environment. In this
regard, the MTurk workers’ behaviors may be representative of people’s actual behaviors.
The first limitation is that MTurk workers are still not representative of the general
population. We found them to have good self-reported Internet skills (M = 4.04 on a
5-point scale), compared to a student sample in our
study and those reported previously (Hargittai and Hsieh, 2012; Hargittai and Shaw,
2015), perhaps unsurprisingly as they participated in an online labor marketplace. Still,
research found that the MTurk samples are slightly more diverse demographically than
standard Internet samples, and a lot more diverse than American college samples
(Buhrmester et al., 2011). The second limitation is the lack of manipulation checks in our
design. As a result, we do not know for sure whether observed results stemmed from lack
of attention (e.g. participants did not notice the purported source of an image was New
York Times) or the factor itself (e.g. participants did not think New York Times was a
credible source). Given the pretest results, we believe the former is more likely than the
latter, yet this remains a minor threat to validity. Third, our study was cross-sectional so
causality cannot be ascertained, although we believe most variables capture pre-existing
states and habits (Internet skills, digital imaging experiences) that are unlikely to change
due to image evaluation tasks.
In addition, we purposefully included only fake images in the credibility evaluation
task and excluded unaltered and/or misattributed original images. Although this approach
eliminated the risk of participants being already familiar with the stimuli, it only tested
participants’ suspicion when confronted with an image that was actually fake. In other
words, for those participants who rated fake images as less credible, we could not determine
whether they were truly capable of evaluating image credibility, or were just more
skeptical in general. However, given the amount of misinformation and disinformation
in today’s media environment, being skeptical is arguably the crucial first step in all cred-
ibility evaluation tasks. As Rheingold (2012) argues, “the heuristic for crap detection is
to make skepticism your default” (p. 77). Meanwhile, the fake images used in this study
were all forgeries of sufficient quality so as not to be immediately distinguishable from
authentic images. Previous work showed that participants failed to identify the images as
fake and, even when told that the images were fake, failed to correctly identify which
image areas had been manipulated (Kasra et al., 2018). Inability to distinguish between
compelling fakes and authentic images implies that results would be the same regardless
of whether the study was done using authentic images, compelling fakes, or a mixture.
Nevertheless, future research should test users’ ability to evaluate unaltered and misat-
tributed images as well as fakes of varying quality.
Even though our experiment aimed to be comprehensive, it still left out a few impor-
tant factors. For example, we did not manipulate the recency of images posted (all images
were presented with random 2015 dates), which could influence credibility judgments
(Westerman et al., 2014). Furthermore, considering that people are more likely to con-
sume news on social media sites instead of traditional channels, and that their networks
of “friends” will play an important role in information diffusion, a productive future
research direction is to examine credibility judgment in participants’ naturalistic social
network environment. This will allow the study to factor in the endorsements and aggre-
gate ratings from the participants’ self-curated network of friends and contacts. We also
focused only on image consumption on the desktop while people increasingly access
news on mobile devices (Fedeli and Matsa, 2018). How the parameters of mobile devices
may impact credibility judgment remains a fruitful future direction. Operationally, our
study design could be further improved by swapping the order of the Internet and digital
skills questions with the fake image to eliminate potential priming effects. Forcing the
participants to stay on the image page for 30 seconds was also a less-than-ideal approach
to ensuring sufficient time for evaluation. Furthermore, participants’ Internet and digital
photography skills were self-reported rather than measured objectively, although evi-
dence shows that the likelihood of participants misreporting their Internet skills is low
(Hargittai, 2009). Future research is encouraged to address the above operational con-
cerns to further improve validity.
In the age of fake news and alternative facts, the risks and dangers associated with ill-
intentioned individuals or groups easily routing forged visual information through com-
puter and social networks to deceive, cause emotional distress, or to purposefully
influence opinions, attitudes, and actions have never been more severe. This article
details an online experiment to probe how people respond to and evaluate the credibility
of images in online environments. Through a series of between-subjects factorial experi-
ments that randomly assigned participants on Mechanical Turk to rate the credibility of
fake image mockups, we found that image characteristics such as where an image is published
and how many people shared it do not matter. Instead, participants’ Internet skills, digital
imaging experiences, social media use, and pro-issue attitude are significant predictors
of credibility evaluation of online images.
The author(s) disclosed receipt of the following financial support for the research, authorship, and/
or publication of this article: This research was supported by National Science Foundation grants
CNS-1444840 and CNS-1444861.
1. We ran the same ANCOVA analyses on the student sample and found little difference from
the MTurk sample.
2. We also ran models including the total time spent on the image page as a covariate, which did
not change results.
Author biographies
Cuihua Shen is an associate professor at the Department of Communication and co-founder of the
Computational Communication Research Lab, UC Davis. Her research interests include digital
media literacy, online and mobile social networks, and computational social science.
Mona Kasra is an assistant professor of Digital Media Design at the University of Virginia (UVa).
Her research is centered around the power and politics of emerging media and its impact on art,
culture, and society. In 2016, she served as Conference Chair at ACM SIGGRAPH, the world’s
largest, most influential annual conference on the theory and practice of computer graphics and
interactive techniques.
Wenjing Pan completed this project as a Ph.D. student in the Department of Communication at UC
Davis. She is currently an Assistant Professor at the School of Journalism and Communication,
Renmin University of China. Her research interests include online supportive communication,
health communication, and computational social science.
Grace A Bassett completed this project as a Ph.D. student in the Department of Communication at
UC Davis. Her research interests focus on social networks, teams, language, and computational
social science. She is currently working as a User Experience Researcher at Facebook.
Yining Malloch is a Ph.D. candidate at the Department of Communication, UC Davis. Her research
interests include computer-mediated social interaction and health communication.
James F O’Brien is a professor of Computer Science at the University of California, Berkeley. His
research interests include media forensics, computer graphics, physically based animation, and
simulation of physical systems. Professor O’Brien is a Fellow of the Sloan, Okawa, and Hellman
foundations and an ACM Distinguished Scientist, has been selected as one of Technology Review’s
TR-100, and received an Academy Award from the Academy of Motion Picture Arts and Sciences.