Using Social Media for Mental Health Surveillance: A Review

RUBA SKAIK and DIANA INKPEN, University of Ottawa, Canada

Data on social media contain a wealth of user information. Big-data analysis of social media may also
support standard surveillance approaches and provide decision-makers with actionable information. These data
can be analyzed using Natural Language Processing (NLP) and Machine Learning (ML) techniques to detect
signs of mental disorders that need attention, such as depression and suicide ideation. This article presents
the recent trends and tools that are used in this field, the different means for data collection, and the current
applications of ML and NLP in the surveillance of public mental health. We highlight the best practices and
the challenges. Furthermore, we discuss the current gaps that need to be addressed and resolved.
CCS Concepts: • General and reference → Surveys and overviews; • Computing methodologies →
Natural language processing; Machine learning algorithms;
Additional Key Words and Phrases: Mental health, social media
ACM Reference format:
Ruba Skaik and Diana Inkpen. 2020. Using Social Media for Mental Health Surveillance: A Review. ACM
Comput. Surv. 53, 6, Article 129 (December 2020), 31 pages.
https://doi.org/10.1145/3422824

1 INTRODUCTION
Public health surveillance is “the ongoing, systematic collection, analysis, interpretation, and dissemination of data regarding a health-related event for use in public health action to reduce morbidity and mortality and to improve health” [German et al. 2001]. A critical requirement for a surveillance system is to obtain reliable information and evidence in a timely manner. In this article, we review the research studies that focus on using social media text for mental health surveillance. The importance of mental health surveillance, focusing on depression and suicide, will be addressed in the following subsections.

1.1 Mental Illness Impact with a Focus on Depression and Suicide


Mental illness is a leading cause of disability worldwide [WHO 2019], and it is a serious problem that must be addressed. Mental illness includes mood and personality disorders such as depression, insomnia, bipolar disorder, schizophrenia, anxiety disorders, and drug or alcohol use disorders. Millions of people suffer from mental illness, and only a fraction receive adequate treatment. It is estimated
that 792M people worldwide had a mental disorder in 2017 [Ritchie and Roser 2018], and this is proving to be an economic burden for governments, costing them billions of dollars yearly.
Depression is one of the most widely recognized mental disorders in the world. It is the main contributor to non-fatal health loss, accounting for a high number of disability-adjusted life years globally [Public Health Agency of Canada 2015], and one of the leading causes of suicide. Early recognition of signs of depression and application of the proper treatments can help those affected and ease their pain.
Although suicide is not a mental illness, most attempts are ascribed to mental disorders, very commonly depression. Suicide has become one of the leading causes of death worldwide. It is a serious public health problem, and a prompt, appropriate response can mitigate it. For instance, the youth suicide rate in Canada is the third highest in the developed world, and suicide was the second-leading cause of death among adolescents in the United States in 2017 [NIMH 2019]. According to Statistics Canada, 4,405 people took their own lives in Canada in 2015, at a rate of 11.5 per 100,000 people. Suicide has serious physical and emotional implications for the well-being of families and communities. Early detection of suicidal ideation can prevent many cases of suicide and help identify those who need immediate counseling. This is one of the crucial steps in maintaining global mental health.

1.2 Mental Health Surveillance


Identifying early warning signs in patients can lead to prompt medical treatment that avoids relapse and hospitalization [Abdullah and Choudhury 2018]. Furthermore, identifying affected clusters in terms of demographic information can guide governments to target each group with suitable vigilance programs, plan the required medical assistance for the concerned parties at early stages, and allocate the necessary resources to reduce the burden of mental illness in their respective regions. Identifying people with mental illness requires initiative from those in need, available medical services, and time from professional experts. These resources might not be available all the time. The common practice is to rely on clinical data, which are generally collected after the illness has developed and been reported. Moreover, such clinical data are incomplete, as the majority of people facing mental illness will not seek treatment [Marcus et al. 2012]. An alternative is to collect data through surveys conducted by phone, interview, or mail, but this is costly and time-consuming. Social media analysis has brought advances in leveraging population data to understand mental health problems. Thus, analyzing social media posts can serve as an important alternative for identifying mental disorders throughout the population.

1.3 The Scope of This Study


Many reviews have been conducted to illustrate the significance of using machine learning methods and social media data for predictive mental health in general [Calvo et al. 2017; Guntuku et al. 2017; Wongkoblap et al. 2017; Thieme et al. 2020; Chancellor et al. 2019], for a specific mental illness [Khan et al. 2018; Morales et al. 2017; Mahdy et al. 2020; Giuntini et al. 2020; Franco-Martín et al. 2018], for a specific social media platform [Nugroho et al. 2020], or for public health–related issues [Phillips et al. 2017; Sinnenberg et al. 2017; Conway et al. 2019; Yin et al. 2019; Edo-Osagie et al. 2019; Shatte et al. 2019]. In contrast, this systematic review analyzes the literature on using social media posts to predict mental disorders with ML and NLP methods that could be useful for mental health surveillance, and it presents the cutting-edge techniques in predictive analysis of suicide ideation and depression at the population level. It also points to the gaps that need further research from the perspective of the data, the models, and the evaluation procedures. By analyzing 110 publications, this review addresses the following:


Table 1. Number of Monthly Active Users (Millions) of Social Media Platforms in 2018

Social media platform Users Main usage


Facebook 2,260 Social networking service
YouTube 1,900 Video-sharing
Instagram 1,000 Photo and video-sharing
WeChat 1,000 Multi-purpose messaging application
Tumblr 624 Multimedia and microblogging
TikTok 500 Video-sharing
Sina Weibo 431 Microblogging
Google+ 430 Social networking service
Reddit 355 Social news and blogging
Twitter 330 Microblogging and social networking service

(1) Data collection techniques used for predicting mental illness in text-based social media
platforms such as Twitter, Reddit, and Sina Weibo.
(2) Features used in training ML models.
(3) State-of-the-art methods in population-level mental illness prediction.
(4) Study limitations that need to be identified to facilitate mental health surveillance and
provide more accurate tools to concerned parties to enhance public mental health.
This article is structured as follows: Section 2 describes the methodology used for the collection of the relevant articles, the eligibility criteria, and the compilation process for this review. Section 3 presents an overview of the various techniques that are used for data collection and annotation. Section 4 summarizes the ML methods used for identifying signs of and predicting mental health issues within social media. Section 5 presents the techniques for using social media data to predict mental health issues at the population level, focusing on depression and suicide ideation. Section 6 addresses the challenges and limitations in this field. The last section concludes the article.

2 THE REVIEW METHODOLOGY


Social media usage has grown exponentially over the past few years. By January 2019, there were 3.5B active social media users, a 9% increase over the previous year, which suggests that by mid-2020 about half of the world’s overall population would be using social media [Ortiz-Ospina 2019]. Currently, social media is considered a primary source of information among youth, providing essential means for networking and opportunities to post user-generated content, such as text, videos, images, and reviews. It contains myriad data on people’s thoughts, feelings, moods, and experiences over time, which makes it a suitable data source for mental health surveillance. Table 1 shows the number of active users for the most popular social media platforms along with the main usage of each platform. This study reviews the use of NLP and ML techniques; thus, we focus on frameworks designed for posting user-generated text, such as Twitter. These platforms were chosen because they contain sufficient public data that are relevant and easy to collect.
We followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) framework guidelines [Moher et al. 2011] to select publications related to predicting mental health at the population level using ML and NLP techniques on user-generated text. The search was done in two phases. Phase (I) started on January 7, 2019. A total of 969 related papers published between 2013 and 2018 (inclusive) were identified after searching PubMed, ACM Digital Library, Springer,

Table 2. Eligibility Criteria in This Review

Inclusion Criteria Level


Papers using user-generated text in a social media framework General
Papers published in English General
Phase (I): Papers published from 2013 to 2018 (inclusive) General
Phase (II): Papers published during 2019 and 2020 General
Papers estimating suicide or depression disorders General
Papers published in journals or computer science conference proceedings General
Papers using machine learning tools for population level Population

Fig. 1. Number of publications included in this review (a) per year (b) per topic.

Elsevier, IEEE Xplore databases, and Google Scholar for articles where any field matched the following Boolean search string: “mental health” AND “machine learning” AND “social media” AND “natural language processing” AND (population OR surveillance); for Google Scholar, we excluded any article containing the terms “image” or “speech”. Searches done through the online databases were limited to publications in English. Twenty-four additional articles were identified using the snowball process, resulting in a total of 838 articles. Based on title and abstract screening, 711 articles were excluded according to the eligibility criteria shown in Table 2, resulting in 165 papers for full-text screening. Studies were excluded if: they used data other than social media text, such as speech, images, or other multimedia data1; they relied on wearable or mobile data; they used clinical data; or they investigated non-mental health issues such as obesity, diabetes, or flu. For the population-level section, mental disorders other than depression and suicide ideation were excluded.
Phase (II) started on August 9, 2020, to include papers published in 2019 and 2020. The same process was repeated, and a total of 381 related papers were identified using the same resources. After screening titles and abstracts, 82 articles were eligible for full-text screening. After the full-text screening, 50 of the 82 initially eligible studies did not meet the inclusion criteria and were excluded, resulting in an additional 32 studies added in Phase (II). Figure 1 shows the distribution of the publications included in this review by publication year and by topic.

1 Data from visual platforms such as Instagram and Snapchat.


Fig. 2. PRISMA flow diagram of the study selection process for using NLP and ML to predict depression and
suicide ideation at population level.

The full list of all papers is provided in the Appendix. These papers did not all specifically focus on population-level analysis but still highlighted useful methods that could be applied; thus, they are included in the data collection techniques and general methods sections. Ultimately, 110 publications were included in the current review, of which 25 specifically concern population-level mental health classification techniques. Fifteen of the 25 publications were identified for depression and 10 for suicide ideation, as shown in Figure 2.


3 DATA COLLECTION
The first step to addressing mental illness is obtaining reliable information and evidence [Paul and Dredze 2017]. Having a comprehensive and accurate dataset is a critical success factor for applying ML algorithms. A gold standard is a dataset against which ML models are compared [Calvo et al. 2017]. Such datasets could contain only a test set on which the performance of various classifiers can be compared, but more often they include a training set as well. The classifiers can be trained on the latter, though any additional labeled or unlabeled data can be used for training if desired.
There are several methods to gather information on social media relevant to users’ mental
health, including self-reporting (directly or indirectly), mental illness signs inference, manual an-
notations, and external statistics. In this section, we will summarize the data collection techniques
and annotation procedures and objectives.

3.1 Screening Surveys


Crowdsourcing platforms are a significant source of explicit labels provided by human workers or volunteers. Platforms such as Amazon’s Mechanical Turk or CrowdFlower enable researchers to post a questionnaire and invite people to contribute. Each participant is expected to complete a diagnostic survey and to consent to providing access to their social media accounts (mostly Twitter).
Different psychiatric scales are used to measure the participants’ mental health. For quantifying depressive symptoms, researchers may choose the PHQ-9 (Patient Health Questionnaire), which is widely used for diagnosing and assessing the severity of depression. It measures behavioral attributes, including concentration troubles, changes in sleeping habits, eating disorders, lower activity, and losing interest, as well as feeling-oriented attributes such as feeling tired, down, guilty, or like a failure, and self-harm and suicidal thoughts. There is also the Center for Epidemiological Studies Depression Scale (CES-D), a self-report scale of 20 multiple-choice questions designed to test depression-related symptoms such as depressed feeling, restless sleep, and decreased appetite. Similarly, the Beck Depression Inventory (BDI) and the Short Depression-Happiness Scale (SDHS) questionnaires serve the same purpose. For anxiety, researchers can use the State Anxiety Inventory (SAI) or the Anxiety Sensitivity Index (ASI). Other questionnaires exist to measure other behaviors: for example, the Life Events Checklist (LEC) is used to detect major life events, and the Anger Rumination Scale (ARS) measures anger. To check for the presence of suicide ideation, there are many questionnaires, such as the Scale for Suicide Ideation (SSI), the Depressive Symptom Inventory-Suicide Subscale (DSI-SS), the Interpersonal Needs Questionnaire (INQ), the Acquired Capability for Suicide Scale (ACSS), and many others.
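To make the scoring concrete, the sketch below sums the nine PHQ-9 items (each rated 0-3) into a 0-27 total and maps it to the severity bands commonly used with the PHQ-9; the function and the example scores are illustrative, not taken from any reviewed study.

```python
# Illustrative PHQ-9 scoring: nine items rated 0-3 are summed into a
# 0-27 total; bands follow common PHQ-9 severity conventions.
def phq9_severity(item_scores):
    assert len(item_scores) == 9 and all(0 <= s <= 3 for s in item_scores)
    total = sum(item_scores)
    bands = [(4, "minimal"), (9, "mild"), (14, "moderate"),
             (19, "moderately severe"), (27, "severe")]
    for cutoff, label in bands:
        if total <= cutoff:
            return total, label

print(phq9_severity([1, 2, 1, 0, 2, 1, 1, 0, 1]))  # (9, 'mild')
```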
Although these scales are established tools in psychiatry, it is sometimes hard to determine which scale should be used and which is more accurate for a given population sample. For example, to predict suicide attempts, the Emotion Regulation Questionnaire (ERQ) performed better than the SSI questionnaire [Oh et al. 2017].
The design of the screening method varies depending on the study purpose and the target platform. While this method is reasonably close to clinical practice, it is expensive to administer on a large scale and suffers from sampling biases [Guntuku et al. 2017]. Table 3 shows some examples of using this method for data collection, its applications, and references.
Some studies proposed semi-supervised approaches that mimic screening surveys and answer clinical questionnaires from social media data. Yazdavar et al. [2017] incorporated a linguistic analysis of user-generated content on social media over time to answer the PHQ-9 questions. They collected a total of 23M tweets posted by 45K Twitter users with self-declared symptoms of depression

Table 3. Selected Studies That Used Screening Surveys to Collect Data. DD: Major Depressive Disorder; PT: Post-traumatic Stress Disorder (PTSD); PD: Postpartum Depression; SD: Suicide

Ref. Field Q.Type Platform #Users #Posts


[De Choudhury et al. 2013a] DD CES-D Twitter 489 69,514
[De Choudhury et al. 2013d] DD CES-D Twitter 476 2,157,992
[De Choudhury et al. 2014] PD PHQ-9 Facebook 156 578,220
[Tsugawa et al. 2015] DD CES-D Twitter 209 574,562
[Zhang et al. 2015] SD SPS Sina Weibo 697 2,000/user
[Braithwaite et al. 2016] SD DSI-SS Twitter 135 2,000/user
[Reece et al. 2017] DD CES-D Twitter 204 279,951
PT TSQ Twitter 174 243,775
[Almouzini et al. 2019] DD CES-D, PHQ-9 Twitter 89 6,122
INQ, ACSS
[Stankevich et al. 2019] DD BDI Vkontakte 531 32,872
[Stankevich et al. 2020] DD BDI Vkontakte 1,330 94,660

in users’ profile descriptions. The authors developed a probabilistic topic model over user tweets with partial supervision to monitor clinical depression symptoms. They achieved results competitive with a fully supervised approach, with an accuracy of 68% for predicting the answers to all the questions over a time interval. Their semi-supervised topic modeling approach (ssToT) achieved an enhanced F1 score specifically for the following symptoms: decreased satisfaction, feeling down, sleep disturbance, energy loss, and appetite change. A similar approach was taken by Karmen et al. [2015], who translated questionnaires and synonyms into a depression lexicon and used it to assign a post-level cumulative depression rating.

3.2 Forums Membership


Many online communities exist to discuss and support mental health topics, whether initiated by patients themselves or by related persons seeking help. Online communities include discussion forums, chat rooms, and blogs. Online forums are considered one of the essential knowledge-based resources for adults on the Internet [Korda and Itani 2013]. Support forums allow people to discuss their mental health issues and seek the required assistance from peers or specialized personnel. Some researchers rely on users’ affiliations to indicate a mental health condition. For example, Nguyen et al. [2014] considered joining LiveJournal depression-related communities to be a sign of mental illness. Furthermore, known suicide web forums such as recoveryourlife, enotalone, and endthislife were used to generate a lexicon of terms likely to identify suicidal thoughts within posts [Burnap et al. 2015, 2017; Colombo et al. 2016].

3.3 Social Media Posts


Social media use has risen sharply over the past 10 years. In 2018, there were approximately 25.3M social media users in Canada, i.e., more than 65% of the population.2 Social media platforms can be utilized as a source of insights into the mental health status of the population. Some users share their mental illness conditions through these platforms. Users may reveal such information as a means to seek help from their community, challenge stigma, share their experience, or as an empowering coping mechanism [Berry et al. 2017]. Social

2 https://www.statista.com/statistics/260710/number-of-social-network-users-in-canada.


media data can be extracted using available Application Programming Interfaces (API) of the so-
cial media platforms. There are different ways of collecting and processing social media data. This
is determined by the types of illnesses being researched and the platforms that are used as source
material. In the following subsections, we will briefly summarize some of the most popular data
sources in the social media research community.
3.3.1 Twitter. Twitter is the social network service most attractive to researchers and currently ranks as one of the world’s leading social networks by active users.3 Every update a user posts to their followers on Twitter is called a tweet. Tweets are mostly accessible to the public and can be obtained and analyzed, unless flagged by the user as “private”. Tweets can be collected using the Twitter API by searching for specific keywords, hashtags, or any defined query, and the search can be limited to particular locations and time periods.
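A minimal sketch of such keyword-, language-, and location-bounded collection is shown below, assuming the Tweepy client; the credentials, query, and geocode are placeholders, and endpoint names vary across Twitter API versions.

```python
# Minimal sketch of keyword-based tweet collection with Tweepy
# (placeholder credentials; endpoint names vary across API versions).
import tweepy

auth = tweepy.OAuth1UserHandler("API_KEY", "API_SECRET",
                                "ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# Depression-related keyword query, restricted to English tweets
# posted near Ottawa (lat, long, radius), excluding retweets.
query = '"feeling low" OR "hate myself" -filter:retweets'
for tweet in tweepy.Cursor(api.search_tweets, q=query,
                           geocode="45.42,-75.69,50km",
                           lang="en", tweet_mode="extended").items(500):
    print(tweet.created_at, tweet.full_text)
```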
Other researchers look for focal users and build the dataset incrementally based on social connections [Wang et al. 2017a; Zhao et al. 2018]. Several depressive symptoms derived from mental disorder manuals, such as the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5), can be identified by applying machine learning algorithms to Twitter data [Mowery et al. 2015; Prieto et al. 2014].
Some researchers obtained tweets with self-reported diagnoses, filtered via regular expressions (RegEx) to capture statements such as “I was diagnosed with ... Condition”; the collected tweets are then manually labeled by human annotators to determine whether the expression indicates the mentioned mental health diagnosis or not [Chen et al. 2018b; Coppersmith et al. 2014a, 2015a; Li et al. 2017; Mowery et al. 2017b; Qntfy et al. 2017]. For suicide-related tweets, more tailored expressions need to be considered, for example .+(\took | \take).+\own.+\life.+ [Burnap et al. 2017].
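A hypothetical version of such a self-report pattern is sketched below; the exact expressions used in the cited studies differ, so this regex and its condition list are only illustrative.

```python
import re

# Illustrative (not the studies' exact) RegEx for self-reported diagnoses
# of the form "I was diagnosed with <condition>".
DIAGNOSIS_RE = re.compile(
    r"\bi (?:was|am|have been|got) diagnosed with "
    r"(depression|ptsd|bipolar disorder|anxiety)\b",
    re.IGNORECASE)

for post in ["I was diagnosed with depression two years ago",
             "watching a show about someone diagnosed with depression"]:
    match = DIAGNOSIS_RE.search(post)
    # Only first-person statements match; others still need human review.
    print(post, "->", "candidate" if match else "no self-report")
```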
Some researchers extracted candidate tweets using keywords that are related to mental disor-
ders, for example, “Depression” or “Suicide” or terms that may indicate mental disorders conditions,
or symptoms such as distress, dejected, gloomy, cheerless, blue, empty, sad, feeling low, hate myself,
kill myself, don’t want to live anymore, ashamed of myself, and so on [Kale 2015; Cavazos-Rehg
et al. 2016; Varathan and Talib 2014].
There are also other ways to collect tweets that may include signs of mental disorders, like using
indicative hashtags such as #MyDepressionLooksLike [Lachmar et al. 2017], #WhatYouDontSee4 ,
or #KMS.
In addition to user-generated text, Twitter includes user data and social metadata, such as geographical information, the date and time of the tweet, and user networking and interaction information. Thus, state-of-the-art applications, including Twitris5, SOCIALmetrics, and OSoMe6, have been built to analyze massive real-time social media data. Twitris is a linguistic social web network that utilizes user-generated social media content to understand social perspectives on real-world events, whereas SOCIALmetrics is a system that processes crawled Twitter data using NLP and text mining tools.
Razak et al. [2020] presented Tweep, a rule-based system using the VADER (Valence Aware Dictionary and sEntiment Reasoner) sentiment analysis tool7 and two machine learning algorithms, Naive Bayes (NB) and Convolutional Neural Network (CNN), to analyze tweet sentiment for logged-in users and their Twitter followers. Table 4 presents a summary of recent research done using the Twitter website as a data source.
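For illustration, a minimal sketch of the VADER scoring step in such a pipeline is shown below; the example input and the -0.05 compound-score threshold are common conventions, not details taken from Tweep.

```python
# Minimal sketch of VADER sentiment scoring (vaderSentiment package).
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
scores = analyzer.polarity_scores("i don't want to live anymore")
# 'compound' lies in [-1, 1]; a common convention treats <= -0.05 as negative.
label = "negative" if scores["compound"] <= -0.05 else "non-negative"
print(scores["compound"], label)
```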

3 https://www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/.
4 https://www.buzzfeed.com/annaborges/what-you-dont-know-campaign.
5 http://twitris.knoesis.org.
6 https://osome....
7 https://github.com/cjhutto/vaderSentiment.


Table 4. References for ML Algorithms Applied to Data Collected from the Twitter Platform. DD: Major Depressive Disorder; PT: Post-traumatic Stress Disorder (PTSD); PD: Postpartum Depression; LS: Life Satisfaction; MI: Mental Illness; SI: Suicide Ideation

Ref. Users Posts Field Objective


De Choudhury 376 77,374 PD Identify mothers at risk of postpartum depression
et al. [2013b] using engagement, emotion, linguistic style, and
social network features.
De Choudhury 85 5 (1) PD Predict mothers at postpartum depression risk using
et al. [2013c] prenatal behavior observations changes in language,
patterns of posting, and emotion.
Schwartz et al. - 82M LS Predict life satisfaction of US counties using
[2013] linguistic features.
Abboute et al. 6,000 - SI Classify tweets into risky and non-risky language.
[2014]
Coppersmith et al. 5,972 926K PT Analyzed the language usage of PTSD Twitter users
[2014b] utilizing LIWC.
Coppersmith et al. 6,966 16.7M MI Analyzed language usage relevant to mental health.
[2014a]
Culotta [2014] 1.46M 4.31M MI Estimated health statistics for US counties using
LIWC+PERMA lexicons.
Jashinsky et al. 28,088 37,717 SI Analyzed the spatial correlation of suicide rates in
[2013] the US and predicted at-risk users.
Burnap et al. [2015] - 2,000 SI Ensemble classifier to differentiate between suicidal
ideation contents and other suicide-related topics
such as reporting of suicide and condolences.
Coppersmith et al. 100 (2) - SI Quantifiable linguistic differences between users’
[2015b] posts prior to suicide attempt and control users as
well as depressed users and suicide attempts in the
US.
Huang et al. [2015] 7,314 - SI Topic modeling using extended suicide psychological
lexicon.
O’Dea et al. [2015] - 14,701 SI Analyzed suicidality and predicted the level of
concern among suicide-related tweets.
Preot et al. [2015] 1,957 6.7M MI Demographics and personality estimated from tweets
achieved high performance to identify mental illness.
Resnik et al. 2,000 3M MI Used sLDA to analyze linguistic signals and uncover
[2015b, 2015a] meaningful latent structure.
Coppersmith et al. 1,088 320K SI An empirical study of the language trends and
[2016] emotional changes for individuals before a suicide
attempt.
Kang et al. [2016] 45 23,956 DD Multi-modal analysis for mood, emoticon, and
images.
Mowery et al. - 9,473 DD Classify evidence of depression using binary features.
[2016]
Benton et al. 9,611 33.8M MI Multi-task learning (MTL) framework for eight
[2017b] mental health conditions prediction.
Burnap et al. [2017] - 816 SI Differentiate between suicidal ideation contents and
other suicide-related topics using Rotation Forest and
a Maximum Probability voting.
De Choudhury 534,829 1.3M MI Explored how gender and culture influences the
et al. [2017b] online conversation of mental disorder.

Jamil et al. [2017] 25,362 156,612 DD Predict at-risk for depression users using tweet
depression index.
Mowery et al. - 9,300 DD Analysis of social media content should take into
[2017b] account the context in which certain terms are used.
Mowery et al. - 9,300 DD Used lexical features and reduced feature sets to
[2017a] classify depressed tweets.
Nguyen et al. 3,221 (3) 769M LS Suggested textual and temporal kernel-based features
[2017a] for population health indices prediction.
Yazdavar et al. 7,046 21M DD Guided approach to combine semantically
[2017] PHQ-related terms in the same topical LDA cluster.
Chen et al. [2018b] 7,968 11.9M MI Used SVM and RF to classify four types of mental
disorders and control groups.
Chen et al. [2018a] 1,185 2.3M SI Predict users at risk of depression using temporal
analysis of eight Ekman’s basic emotions as features.
Coppersmith et al. 836 395,230 SI Used GloVe, bidirectional LSTM, and attention
[2018] mechanism to fetch the most informative terms.
Vioules et al. [2018] 60 5,446 SI Implemented pointwise mutual information measure
to detect sudden emotional changes for monitoring
suicide warning signs.
Joshi et al. [2018] 200 1.2M DD Sentiment, emotion, and behavioral features using
ensemble classifiers with accuracy of 90%.
Nguyen et al. 3,221 (3) 1.1B MI Estimate population health indices of the US counties
[2019] using graph-based model.
Weerasinghe et al. 654 - DD After removing direct mention of depression, using
[2019] SVM with sLDA, BoW, word clusters, and POS
features achieved Average Precision (0.87).
Weerasinghe et al. 492 - PT Removed direct mention of PTSD, using SVM with
[2019] sLDA and BoW features achieved Average
Precision (0.88).
Li et al. [2020] 1.4M 80M DD CorEx topic modeling with lexicons derived from
PHQ-9 to estimate stress related to the COVID-19
pandemic in the US.
Razak et al. [2020] 20 - DD Online sentiment analysis on the users’ and
followers’ tweets using VADER, TextBlob, and CNN.
Roy et al. [2020] 2,938 4M SI RF classifier based on NN-estimated psychological SI
indicators and sentiment polarity.
Verma et al. [2020] - 15,000 DD Hybrid deep learning model to predict depressed
tweets.
(1) months. (2) users per state. (3) states.

3.3.2 Reddit. Reddit is an open-source platform that allows users to publish, comment, or vote on submissions. Reddit had over 430M monthly active users who collectively generated 199M posts, 1.7B comments, and 32B up-votes across more than 130K active communities during 2019.8

8 https://www.redditinc.com/.


Table 5. References for ML Training Algorithms Applied to Data Collected from the Reddit Platform

Ref Posts Notes


Balani and De 32,509 Data-driven approaches for predicting self-disclosure of social media
Choudhury content related to mental health.
[2015]
De Choudhury 63,485 Predictive models of online communities to prevent suicidal disclosures.
et al. [2016]
Kavuluru et al. 11,730 Designed a system for defining the helpfulness of comments on Reddit
[2016] forums for mental health.
Gkotsis et al. 1,014,660 Classified posts into one of 11 mental health themes with 71.4% accuracy using a
[2017] CNN and recognized mental illness-related posts with 91.1% accuracy.
Wolohan et al. 12,106* SVM classifier using LIWC and n-grams to predict hidden depression.
[2018]
Cong et al. 116,484 Attention-BiLSTM to capture informative terms after an XGBoost classifier
[2018] to rectify the imbalance problem, predicting depression with an F1-score of
0.6.
Alambo et al. 4,992 Framework for semantic clustering and sequence-to-sequence models to
[2019] assess the level of suicide risk by question answering mechanism of
C-SSRS9 questionnaire.
Thorstad and 515,374 Classified posts to one of four mental illness subreddits and used
Wolff [2019] non-clinical subreddits to early predict mental illness.
Tadesse et al. 7,201 Using LSTM-CNN over word2vec hybrid model discovered shift in
[2020] language usage of at-risk users.
(*) Number of users; their posts total 149,089,719 words.

Posts are grouped by areas of interest that cover a variety of topics, such as gaming, sports, news, and many others. Every subreddit has its own rules, administrators, and subscribers. Some subreddits address mental health issues, including anxiety, depression, and suicide. Subscribers may share personal experiences, seek help, and offer support to others. In March 2020, there were nearly 190K subscribers to the “SuicideWatch” subreddit and nearly 600K subscribers to the Depression subreddit. Yates et al. [2017] created an experimental dataset that contains around 9K diagnosed users and over 100K control users, named the “Reddit Self-reported Depression Diagnosis (RSDD) dataset.” Similarly, Losada and Crestani [2016] collected 49,580 depressed and 481,873 control posts. Thorstad and Wolff [2019] collected 56,009 posts for each of the following clinical subreddits: r/ADHD, r/Anxiety, r/Bipolar, and r/Depression, and used unigram word vectors to train a logistic regression model that classifies a post into one of the four mentioned mental illnesses. The depression classification model achieved an F1-score of 0.74. They also concluded that the language used in a non-clinical context is predictive of which clinical subreddit the user would later post to. Table 5 presents a summary of recent research done using the Reddit website as a data source.
3.3.3 Sina Weibo. Sina Weibo is China’s largest microblogging site, and many studies have used it as a data source. Wang et al. [2018] randomly crawled 1M users (394M postings) and used a keyword-based method to pinpoint users at risk of suicide. Afterwards, three mental health researchers manually labeled the at-risk users. They identified 114 users (60,839 posts) with suicide ideation and used linguistic analysis to explore behavioral and demographic characteristics. Lv et al. [2015] used Sina Weibo to build a suicide dictionary. They found that the dictionary-based recognition correlates well with the expert ratings (r = 0.507) in detecting

9 Columbia Suicide Severity Rating Scale.


suicidal expressions and in evaluating the level of suicide risk. Similarly, Hao et al. [2013] used Support Vector Machines (SVM) and Neural Networks (NN) on psychological measurement data (SCL-90-R) along with Sina Weibo blogs to identify users with mental health issues. At the post level, Gao et al. [2017] extracted new features based on content and emotions by examining the semantic relationships between the words in a labeled dataset of 9,123 microblogs with suicidal ideation.

3.4 Available Datasets


Datasets can be made available for shared tasks or competitions, such as the CLPsych Shared Task,
eRisk, and Crisis Text Line datasets that are described in the next subsections.

3.4.1 CLPsych Shared Task. In 2014, the workshop on Computational Linguistics and Clinical Psychology (CLPsych) initiated a collaboration between clinical psychologists and computer scientists and developed links across the research community. The workshop series aims to expedite the development of language technology for mental healthcare, with an emphasis on using social media to predict population mental health. Shared tasks provide gold standards, because they are built on the same dataset to test and compare various solutions to the same problem under study.
At the 2015 CLPsych workshop, participants were asked to determine whether a user had PTSD, depression, or neither, based on self-reported Twitter diagnoses [Coppersmith et al. 2014a]. The dataset is composed of 1,146 users: 246 users with PTSD, 327 depressed users, and 573 control users matching the age and gender of the former two groups. For all three tasks, the system of Resnik et al. [2015a] performed best, obtaining an average precision above 0.80. The maximum precision was achieved for the task of distinguishing PTSD vs. control users, by training an SVM classifier with a linear kernel on topic modeling and lexical TF-IDF features. Later, Orabi et al. [2018] used optimized word embeddings with a deep learning model and achieved a precision of 87.4% and an F1-score of 86.97%.
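A minimal sketch of such a lexical TF-IDF plus linear-kernel SVM baseline is shown below (scikit-learn; the per-user documents and labels are placeholders, and the topic modeling features of the original system are omitted).

```python
# Minimal TF-IDF + linear SVM baseline of the kind described above
# (scikit-learn; toy data stands in for per-user aggregated tweets).
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

docs = ["i feel empty and exhausted every day ...",           # user 1
        "great match tonight cannot wait for the finals ..."]  # user 2
labels = [1, 0]  # 1 = condition, 0 = control

model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),  # unigrams and bigrams
    ("svm", LinearSVC()),
])
model.fit(docs, labels)
print(model.predict(["so tired and empty again ..."]))
```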
The CLPsych 2016 and 2017 shared tasks invited participants to automatically triage posts collected from the ReachOut.com forum as green, amber, red, or crisis, to assist the forum moderators in identifying and addressing pertinent cases as soon as possible. A total of 15 teams participated in the task, with 60 different submissions. At first, 947 annotated posts were given to each team to develop and train their models. The best-performing system used an ensemble classification approach with TF-IDF weighted unigrams and post embeddings, achieving an F1-score of 0.42 [Kim et al. 2016]. Subsequently, Cohan et al. [2017] used the same dataset and applied an ensemble of lexical, LIWC, emotion, contextual, and topic modeling features in an SVM model to reach a better F1-score of 0.51.
In 2019, the shared task consisted of three tasks. Task A was about assessing the risk of users who posted in the SuicideWatch subreddit at one of four levels: no risk, low, moderate, and high. Task B was about risk assessment using all the subreddits. Task C was about screening users for probabilistic risk using non-mental health-related subreddits. The dataset consists of 1,242 users (including both positive examples and controls). Mohammadi et al. [2019], participating under the team name CLaC, obtained the best macro-F1 score for Task A (0.533) by adding SVM-predicted class probabilities at the end of a pipeline built on top of a set of CNN, Bi-LSTM, Bi-RNN, and Bi-GRU neural networks. For Task B, Matero et al. [2019] achieved the best F1-score (0.504) using BERT features extracted separately from SuicideWatch and non-SuicideWatch posts. For Task C, a stacked parallel CNN with LIWC and a universal sentence encoder [Cer et al. 2018] produced the best unofficial F1-score (0.278), as compared to 0.268 for the CLaC primary system. Finally, Howard et al. [2020] used lexicon analysis, LIWC, Empath, word counts, VADER, and DeepMoji for emotional feature extraction to train a model on the CLPsych 2017 dataset and test it on the CLPsych 2019 expert-labeled dataset, with a maximum F1-score of 0.616.


3.4.2 eRisk Task. The 2017 CLEF eRisk is a pilot project that extends the CLEF initiatives, which have been operating since 2000 and have led to the systematic evaluation of information systems, mainly through experiments on shared tasks. The primary purpose of CLEF eRisk 2017 and 2018 (Task 1) is to address issues related to assessment criteria, effectiveness indicators, and other early detection of depression mechanisms [Coello-Guilarte et al. 2019; Losada et al. 2017, 2018]. The shared task focuses on automatically detecting the risk of depression from a user’s Reddit posts as soon as possible.
For eRisk 2017, the training set was manually annotated by experts and contains 486 users (83 depressed with 30,851 posts and 403 non-depressed with 264,172 posts). The test set holds 401 users (52 depressed with 18,706 posts, and 349 non-depressed with 217,665 posts). A total of 30 systems were submitted by eight teams in the pilot task [Almeida et al. 2017]. The highest precision, 0.69, was submitted by the Biomedical Computer Science Group from the University of Applied Sciences and Arts Dortmund (FHDO), while the highest recall, 0.79, was submitted by the LIDIC Research Group from Universidad Nacional de San Luis, who examined multiple document representations, such as Bag of Words (BoW), Concise Semantic Analysis, character 3-grams, and LIWC, using Random Forests, NB, and decision tree machine learning algorithms. The evaluation measures include an early risk detection error measure (ERDE) along with standard classification measures, such as F1, Precision, and Recall. The ERDE measure mainly rewards correct classification based on a fewer number of user submissions and penalizes late decisions.
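For reference, the ERDE measure is commonly written as follows; this is our reconstruction of the standard definition (k is the number of user posts observed before decision d is made, o controls how quickly the delay penalty grows, and the c terms are the error costs):

\[
\mathrm{ERDE}_o(d, k) =
\begin{cases}
c_{fp} & \text{if } d \text{ is a false positive} \\
c_{fn} & \text{if } d \text{ is a false negative} \\
lc_o(k) \cdot c_{tp} & \text{if } d \text{ is a true positive} \\
0 & \text{if } d \text{ is a true negative}
\end{cases}
\qquad
lc_o(k) = 1 - \frac{1}{1 + e^{\,k - o}}
\]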
The 2018 eRisk task continued the goal of early detection of symptoms of depression, along with a new task of early detection of anorexia indicators. The 2017 dataset was used as the training set, and an additional 820 users with more than 500K posts were added for testing. There were 45 contributions from 11 teams. No significant improvement was noticed, and most participants ignored the tradeoff between early detection and accuracy. Separately, Leiva and Freire [2017] used TF-IDF features to compare different ML algorithms on the same dataset and concluded that Random Forest shows the highest precision, while K-Nearest Neighbors obtains the highest recall; combining all of them in a voting algorithm improves the F1 measure.
There were three tasks for eRisk 2019. In addition to an ambitious new challenge of instantly completing a depression questionnaire based on a user’s interactions on social media, the first task continued in the same direction as previous challenges for the early detection of depression symptoms, and a similar task was introduced for unsupervised self-harm detection. The findings indicate that it is uncertain whether early signs of self-harm can be identified from social media user experiences until they join a self-harm community [Losada et al. 2019]. Recently, eRisk 2020, in its fourth year, continued with two tasks: self-harm detection and measuring the severity of depressive symptoms.10

3.4.3 Crisis Text Line. The Crisis Text Line11, supported by Kids Help Phone, is a free 24/7 text-based crisis support hotline that assists people with mental health issues [Dinakar et al. 2014]. As of October 2019, Crisis Text Line had processed more than 100M text messages. The data are used to study different mental illness trends across the US, including but not limited to depression, self-harm, and suicidal ideation. The results of the analysis are displayed publicly on CrisisTrends.org. Althoff et al. [2016] experimented with roughly 15K counselor messages to evaluate the linguistic aspects of effective therapy. A regression model with L1 regularization and 10-fold cross-validation over unigram and bigram features performed best at predicting the effectiveness of a patient-counselor conversation, with an accuracy of 0.687 and an AUC of 0.716. To gain access to an anonymized version of the dataset, researchers must apply and be accepted into the Research Fellows program sponsored by Crisis Text Line.

10 https://early.irlab.org/.
11 The labeled dataset used to produce the word cloud is from Shen et al. [2017].

4 METHODS
Social media data can provide a valuable source of information for public health research. Understanding the data and the domain of discourse is vital for building a good model. Accordingly, detecting mental disorders using social media posts requires a thorough understanding of the key predictors of the illness, called features in ML terminology. Many researchers have tried to determine the contributing features by utilizing different NLP approaches to build an accurate predictive model.
Most predictive models focus on determining the best features that contribute to the problem under analysis in order to design good classifiers. Selecting the best set of features, which helps reduce the dimensionality of the dataset, strongly influences the learning process. Because of the heterogeneity of social media content, a variety of features can be developed, ranging from textual and linguistic features to user-based and metadata-related features [Wijeratne et al. 2017]. As mentioned earlier, only a subset of these features has the requisite ability to distinguish classes for specific applications and contexts, in particular to predict user types and behaviors [Kursuncu et al. 2018]. Feature engineering or extraction aims to reduce the number of features extracted from the dataset under study by choosing the most discriminative ones; a minimal feature-selection sketch is given below. Reducing data dimensionality via the feature extraction process helps avoid the curse of dimensionality [Cummins et al. 2015]. The main target is to identify strongly relevant and non-redundant features [Li et al. 2017], which is a challenging task. Using deep learning frameworks helps capture relevant features during the learning process without exhaustive feature engineering [Orabi et al. 2018].
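As one minimal illustration of such discriminative feature selection, the sketch below ranks bag-of-words features with a chi-squared test (scikit-learn; the toy corpus, labels, and k are placeholders rather than settings from any reviewed study).

```python
# Minimal sketch: chi-squared selection of the most discriminative
# bag-of-words features (scikit-learn; toy corpus and labels).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

posts = ["cant sleep again and no energy",
         "great game last night with friends",
         "feel empty and tired all the time"]
labels = [1, 0, 1]  # 1 = depression-related, 0 = control

vec = CountVectorizer()
X = vec.fit_transform(posts)
selector = SelectKBest(chi2, k=5).fit(X, labels)
# Report the retained feature names for inspection.
kept = selector.get_support(indices=True)
print([vec.get_feature_names_out()[i] for i in kept])
```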
Table 6 summarizes the features that have been used in various mental health sign analysis studies. Several efforts have attempted to predict mental illness within social media content at the post level, user level, or population level. In this section, more details on the analysis levels are explained.

4.1 Post-level Analysis


Predicting indicators of mental disorders within posts can be an intermediate step towards a more comprehensive model [Kim et al. 2016; Lin et al. 2017; Wang et al. 2017b]. Figure 3 shows the difference between depression and non-depression content at the post level. A post has either explicit or implicit attributes. The explicit attributes are the raw attributes provided by the social media framework, such as the post itself or the metadata embedded within it: the time of the post, the number of up-votes/shares/replies (depending on the application), and, when GPS tagging is enabled, the location of the post. The implicit attributes can be inferred from explicit attribute(s) with a simple or more complicated process, such as the post sentiment, emotions, the type of the post, and sleep patterns.

4.2 User-level Analysis


For user-level classification, usually multiple posts are aggregated into a single document, or behavioral changes are detected over a defined period. This can be done in a hierarchical manner starting from the post level using only the post content [Aladag et al. 2018; Amir et al. 2017; Orabi et al. 2018; Schwartz et al. 2014; Zhang et al. 2015], by considering user-defined or derived information such as gender and personality [Preot et al. 2015], or by adding behavior patterns and social engagements [Lin et al. 2014; Shuai et al. 2018; Yates et al. 2017].


Table 6. List of Features Used in Different ML Algorithms for Mental Health Classification

References Type Description


Aladag et al. [2018]; Burnap et al. [2017]; De Lexical Short/long words ratio, word
Choudhury et al. [2017a]; Colombo et al. frequency, LIWC lexicon,
[2016]; Coppersmith et al. [2015b]; Doan Bag of Words, n-grams,
et al. [2017]; Huang et al. [2014]; Karmen words that show surprise,
et al. [2015]; Kavuluru et al. [2016]; Kumar exaggeration or emphasis
et al. [2015]; Mowery et al. [2017a]; Nguyen
et al. [2014, 2017b]; Reece et al. [2017]; Yang
and Mu [2015]
Acuña Caicedo et al. [2020]; Burnap et al. Syntactic POS, verb tenses, first-person
[2015]; Kang et al. [2016]; Kumar et al. pronouns, usage of
[2015]; Mowery et al. [2017b]; Wang et al. intensifier terms,
[2018] dependency relation,
emotion and sentiment
analysis
Colombo et al. [2016]; De Choudhury et al. Social The type and number of
[2013b]; Vioules et al. [2018]; Kumar et al. networking connections with other users
[2015]; Park et al. [2013] as inlinks or outlinks
Chen et al. [2018b]; Coppersmith et al. Pattern of Life The behavior of the user
[2014a]; Nguyen et al. [2017a]; Peng et al. during a specific period
[2017] concerning the volume and
time of posts and the type
and number of connections
with other users
Nsoesie et al. [2016]; Oh et al. [2017]; Peng Demographics The users’ demographic data,
et al. [2017]; Preot et al. [2015] such as age, gender,
ethnicity, income, education,
and personality.
Amir et al. [2017]; Joshi et al. [2018]; Kim Word Representing users’ posts
et al. [2016]; Lin et al. [2016]; Orabi et al. embedding vocabulary and capturing the
[2018] semantic and syntactic
relations with other words
Cohan et al. [2017]; Nguyen et al. [2017a]; Topic Identifying topic patterns
Resnik et al. [2013, 2015b]; Toulis and Golab modeling that are present across the
[2017]; Seah and Jin Shim [2019] users’ posts

4.3 Population-level Analysis


Studies show the usefulness of social media data for recognizing mental health indicators in countries as a tool of public health surveillance, but it is necessary to corroborate the social media results with other sources, such as national surveys. There are two main approaches to the textual analysis of social media content for population-level mental-illness detection:

4.3.1 Bottom-up Approach. Using this approach, the researcher starts with individual models,
then generalizes to make an inference about the population [Coppersmith et al. 2014a, 2014b, 2015b; De Choudhury et al. 2017b; Yazdavar et al. 2017].

Fig. 3. Word cloud overview of emerging terms from users who did not express depression (left) and users who self-reported depression (right).

Fig. 4. Bottom-up approach for population textual analysis.12

Fig. 5. Top-down approach.

Figure 4 illustrates the steps that need to
be followed to draw a population-related inference. Importantly, the sample under study has a significant impact on the findings, because often only majority classes are represented, excluding crucial minority classes. Hence, researchers have adopted techniques typically used in conventional surveys to obtain population-representative samples and rectify representation errors, such as probability sampling techniques, including Stratified Sampling [De Choudhury et al. 2016; Wang et al. 2019b], Simple Random Sampling [Cheng et al. 2017; Liu et al. 2017; Calderon-Vilca et al. 2017; Shing et al. 2018], Cluster Sampling [Schwartz et al. 2013], and Multi-Stage Sampling [De Choudhury et al. 2013c]. In addition, researchers have used non-probability sampling techniques, such as snowball sampling, to deal with minority classes [Balani and De Choudhury 2015; Tsugawa et al. 2015; Wee et al. 2017; Wolohan et al. 2018; Zhao et al. 2018]. A stratified sampling sketch is given below.
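A minimal sketch of the stratified variant, which draws a sample that preserves the population's minority-class proportion, is shown below (scikit-learn; the 1,000-user population and 5% positive rate are illustrative).

```python
# Minimal sketch of stratified sampling so a drawn sample preserves the
# population's minority-class proportion (scikit-learn; toy population).
from sklearn.model_selection import train_test_split

users = [f"user_{i}" for i in range(1000)]
labels = [1 if i < 50 else 0 for i in range(1000)]  # 5% positive class

sample, _, sample_labels, _ = train_test_split(
    users, labels, train_size=200, stratify=labels, random_state=0)
print(sum(sample_labels) / len(sample_labels))  # ~0.05, as in the population
```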
4.3.2 Top-down Approach. In this approach, studies use the aggregated data of the population and then make inferences from those data [Coppersmith et al. 2017; Culotta 2014; Schwartz et al. 2013; Jaques et al. 2016; Gruebner et al. 2017a; Nguyen et al. 2017a; Giorgi et al. 2018]. This approach should be applied carefully; otherwise, the differences among subgroups could be masked and dissolved during the aggregation process, as shown in Figure 5.

12 Map from https://commons.wikimedia.org/wiki/File:Political_map_of_Canada.png.


Machine learning algorithms play a vital role in modeling the relationships between features [Hahn et al. 2017]. It is well known that there is no universal algorithm for classification, and it is increasingly important to understand the diverse cultures and societies contributing to social media. Adding demographic attributes, including age, gender, and personality type, affects the prediction of health status [Preot et al. 2015]. Religion, ethnicity, marital status, and socioeconomic status are expected to add value to the research. Previous research on users’ posts and metadata has shown that demographic information can be extracted by applying different ML algorithms with an accuracy ranging from 60% to 90% [Sinnenberg et al. 2017], or can be extracted from profile information, such as username, screen name, biography, and profile image, with a macro-F1 of 0.9 for gender and 0.5 for age [Wang et al. 2019a].

5 POPULATION MENTAL HEALTH CLASSIFICATION TECHNIQUES


Machine learning, in its simplest forms, utilizes selected features to classify data instances into binary classes of mental conditions (positive or negative). Supervised learning algorithms such as SVM, NB, KNN, Logistic Regression (LR), and Decision Trees (DT) are employed. Recently, deep learning models have been extensively applied in the field of NLP and show remarkable improvements in the prediction process. One of the key advantages of deep neural networks is their ability to learn the input representation and the parameters of the network without domain-specific feature engineering. The lower layers learn simple features and propagate what they learn to the higher layers. This process allows the model to identify complex relationships between the input text/post(s) and the label/diagnosis.
Social media has been employed to assess population health statistics. For example, Culotta [2014] gathered approximately 1.4M Twitter users spread over the 100 most populous counties in the US and performed a linguistic analysis of the tweets’ text and users’ descriptions to predict 27 health-related topics, ranging from lack of health insurance, obesity, and teen births to indications of mental illness. He contrasted the output of his model with figures from the County Health Rankings & Roadmaps. A significant correlation was observed with six health measures, and models with added linguistically analyzed Twitter data improved the predictive accuracy for 20 community health measures. In fact, the distribution of a county’s textual characteristics may be insightful and predictive of the county’s medical activity and health outcomes [Nguyen et al. 2017a]. Moreover, using word categories from LIWC13, which has been developed by psychologists across the world in their native languages, and the PERMA lexicon, which contains 10 categories reflecting the five dimensions of positive psychology, a significant correlation with county-level health statistics was reported [Culotta 2014].
Recent studies that analyze textual features and LIWC have predicted mental health conditions [Nguyen et al. 2017a, 2017b; Vioules et al. 2018; Kavuluru et al. 2016; Lehrman et al. 2012], detected personal events [Lin et al. 2016], and performed topic modeling using Bayesian probabilistic modeling tools such as LDA (Latent Dirichlet Allocation) [Resnik et al. 2013; Paul and Dredze 2014; Seah and Jin Shim 2019]; a topic modeling sketch is given below. In some cases, experiments show that reduced feature sets and simple lexical features can yield results comparable to much larger feature sets [Mowery et al. 2017a].
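The sketch below shows LDA topic modeling of the kind these studies apply (scikit-learn; the toy corpus and number of topics are placeholders).

```python
# Minimal sketch of LDA topic modeling over posts (scikit-learn; toy data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

posts = ["cant sleep no energy lost interest in everything",
         "exam deadline stress panic all week",
         "tired all day no appetite restless sleep"]

vec = CountVectorizer()
counts = vec.fit_transform(posts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Top words per topic, for manual inspection of the latent themes.
terms = vec.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = weights.argsort()[-5:][::-1]
    print(f"topic {k}:", [terms[i] for i in top])
```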
In the following subsections, we summarize the recent techniques used in the most prominent areas of research: depression and suicide ideation at the population level.

13 http://liwc.wpengine.com/.

5.1 Depression Detection


Due to its importance, depression disorder has received considerable attention among researchers.
NLP along with ML techniques has been used on social media to detect depression [Coppersmith et al. 2014a; Karmen et al. 2015; Resnik et al. 2015b]. De Choudhury et al. [2013a] developed a probabilistic model to detect the behavioral changes associated with the onset of depression, whereas Mowery et al. [2016] used lexical and emotional features to identify depressive symptoms such as anhedonia (reduced motivation to feel pleasure), insomnia, loss of energy, and so on. Nguyen
et al. [2014] achieved an accuracy of 93% using a logistic regression model to classify blog posts
to be belonging to depression or control sets. Shen et al. [2017] constructed a multi-modal depres-
sive dictionary learning model (MDL) to learn the latent features of depressed and non-depressed
users on Twitter from a joint sparse representation of emotion, visual, social network properties,
user profile, topic-level, and domain-specific features and achieved 85% in F1-score. To predict
user stress levels, Lin et al. [2016] used neural network–based architecture with word embeddings
(WE) learned from Sina Weibo dataset along with stress-based keywords. The word embedding
was found to be effective in predicting semantic similarity between different words. Nguyen et al.
[2019] represented each county as a graph interactions between LIWC features, then trained sev-
eral graph neural networks: graph convolutional network (GCN), graph attention network (GAT),
a hybrid GAT-GCN network, and graph isomorphism network (GIN) to learn the population health
representation, and finally, used LR to estimate 3,221 counties health indices.
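The word-embedding idea can be illustrated with a small gensim sketch; a real system such as that of Lin et al. [2016] would train on millions of posts, whereas the toy corpus below only demonstrates the API.

```python
# Sketch of the word-embedding idea used by Lin et al. [2016]: embeddings
# trained on posts place semantically related words close together. The
# repeated toy corpus is invented so the model has something to fit.
from gensim.models import Word2Vec

sentences = [
    ["deadline", "stress", "overwhelmed", "tired"],
    ["exam", "stress", "pressure", "anxious"],
    ["vacation", "beach", "relaxed", "happy"],
] * 50

model = Word2Vec(sentences, vector_size=32, window=2, min_count=1,
                 seed=0, workers=1)
print(model.wv.similarity("stress", "pressure"))  # cosine similarity of two words
```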
Table 7 summarizes the research findings related to predicting depression from Twitter text.

5.2 Predicting Suicide Ideation


Suicide prevention starts with recognizing the warning signs and taking them seriously. Thus,
predicting SI from social media is considered one step towards identifying affected groups based
on gender, age, geographic location, or other characteristics.
In 2015, O’Dea et al. [2015] used machine learning algorithms to distinguish strongly concerning
suicide-related tweets among 14,701 tweets, with an accuracy of 80%; Wang et al. [2019b] performed
a similar study on Chinese online communities.
In 2017, Benton et al. [2017b] presented a multi-task learning (MTL) model, a feed-forward
neural network over character n-gram features, to predict potential suicide attempts and the presence
of atypical mental health. A framework to instantly detect suicide-related posts on Twitter
was proposed by Vioules et al. [2018], using NLP methods that combine text-generated features
based on a lexicon ensemble. Similarly, the association between suicidal ideation and linguistic
features was examined in Cheng et al. [2017] and Huang et al. [2014]. Jashinsky et al. [2013] found
a correlation between the rate of risk per tweet per state, measured by the appearance of terms
associated with suicide risk, and state-level age-adjusted suicide rates.
Well-known strong predictors of completed suicide are a previous suicide attempt [Rakesh 2017]
and self-harm [Robinson et al. 2015]. De Choudhury et al. [2016] built a statistical approach on data
from Reddit users who shifted from mental health concerns to SI. Their approach derives markers
to detect this transition through three cognitive psychological integrative models of suicide,
covering thinking, ambivalence, and decision-making. In 2020, Roy et al. [2020] used 10 neural
networks to estimate the weight of psychological factors such as stress, loneliness, burdensomeness,
hopelessness, depression, anxiety, and insomnia, in addition to sentiment polarity; they then
trained a random forest model on the 10 estimated psychological metrics to predict SI within
tweets and achieved an AUC score of 0.88.
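As an illustration of character n-gram features feeding a feed-forward network (in the spirit of, but not identical to, Benton et al. [2017b]), the sketch below pairs scikit-learn's character n-gram extractor with its small MLP; all posts and labels are invented.

```python
# Sketch of character n-gram features for suicide-risk classification, fed to
# a small feed-forward network. Posts and labels are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier

posts = ["i cant do this anymore", "great day at the park",
         "nobody would miss me", "made pasta for dinner"]
labels = [1, 0, 1, 0]  # 1 = concerning, 0 = not concerning

# "char_wb" builds n-grams of 2-4 characters within word boundaries.
X = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)).fit_transform(posts)
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(X, labels)
print(clf.predict(X))
```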
Table 8 shows the best reported ML methods for predicting suicide ideation on manually
annotated datasets.

6 LIMITATIONS AND CHALLENGES


Social media provides benefits over conventional data sources, including ease of access, reduced
costs, and up-to-date data availability. We can learn about public health topics by passively
analyzing existing social media data [Paul et al. 2016].

Table 7. Summary of the Features Used for Predictive Models for Depression with the Best Results
Achieved in Terms of Accuracy (AC), F1-score, or Recall (RC)

Ref.                        | Features                                         | Model    | Metric | Score
----------------------------|--------------------------------------------------|----------|--------|------
De Choudhury et al. [2013a] | Lexical (swear words & 1st person pronoun)       | SVM      | AC     | 73%
De Choudhury et al. [2013d] | Semantic and social                              | SVM      | AC     | 70%
De Choudhury et al. [2013b] | Semantic and social                              | SVM      | AC     | 80%
Burnap et al. [2015]        | N-gram and semantic                              | Ensemble | F1     | 0.69
Resnik et al. [2015a]       | LDA, supervised anchor, TF-IDF                   | SVM      | AC     | 86%
Tsugawa et al. [2015]       | N-gram, semantic, social                         | SVM      | AC     | 66%
De Choudhury et al. [2016]  | Interpersonal, interaction, linguistic           | LR       | AC     | 80%
Mowery et al. [2016]        | N-grams, emotions, LIWC, age, gender             | SVM      | F1     | 0.52
Jamil et al. [2017]         | Lexical, polarity, depression terms, self-report | SVM      | RC     | 0.80
Peng et al. [2017]          | User profile, text, and behavior                 | SVM      | AC     | 83%
Chen et al. [2018b]         | LIWC, pattern of life, emotions                  | SVM, RF  | AC     | 90%
Chen et al. [2018a]         | Emotions, temporal, LIWC                         | RF       | AC     | 93%
Wolohan et al. [2018]       | LIWC & n-gram                                    | SVM      | AC     | 82%
Wongkoblap et al. [2018]    | LIWC & life satisfaction                         | SVM      | AC     | 78%
Thorstad and Wolff [2019]   | TF-IDF                                           | LR       | F1     | 0.74

Social media analysis may also help uncover links between diseases and their symptoms or causes.
There are still many challenges and questions that need to be addressed to exploit the opportunities
of employing social media data to predict mental health issues within the population. This section
briefly discusses some of the limitations and challenges that arise in this endeavor and offers some
recommendations to overcome such barriers.

6.1 Availability and Correctness of Social Media Data


One of the major issues in using NLP and ML tools for public mental health prediction is the
availability of correctly labeled data. Social media provides more than 10 times the volume of
data collected by country-wide surveys; however, collecting relevant posts is challenging because
of semantic heterogeneity and diverse writing styles. Moreover, labeling a good sample is time-
consuming and requires professional resources and an adequate level of consensus. Using self-
reports to identify mental disorders within posts is the easiest approach. Still, self-reports cannot
be considered clinical ground truth, since there is no way to verify an actual diagnosis, nor any
way to determine that a control instance might not be positive for the condition. Semi-automatic
labeling techniques can be adapted to label the data, as previously stated, but more accurate labeling
techniques need to be researched to minimize human interference. Another approach is to delve
into the users’ posts looking for symptoms and for psychiatric questionnaire answers in order to
label the data automatically.
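As an illustration of the self-report heuristic, here is a minimal regex-based sketch; the pattern and posts are ours, invented for illustration, and not a validated screening instrument.

```python
# Sketch of self-report labeling (as used in, e.g., Coppersmith et al.
# [2015a]): match first-person diagnosis statements with a regex.
import re

SELF_REPORT = re.compile(
    r"\bi (was|am|have been|got) diagnosed with (depression|anxiety|bipolar)\b",
    re.IGNORECASE,
)

posts = [
    "I was diagnosed with depression last spring",
    "my cousin was diagnosed with depression",  # about someone else: no match,
                                                # since the pattern requires "I"
    "feeling low today",
]
for post in posts:
    print(bool(SELF_REPORT.search(post)), "|", post)
```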

Table 8. Summary of the Features Used on Social Media Text for Predictive Models for
Suicide Ideation with the Best Results Achieved in Terms of Accuracy (AC),
F1-score, Recall (RC), or Area Under Curve (AUC)

Ref.                      | Positive Tweets/Users | Features                        | Model | Metric | Score
--------------------------|-----------------------|---------------------------------|-------|--------|------
Abboute et al. [2014]     | 623/3,263             | WEKA                            | NB    | AC     | 63%
Huang et al. [2015]       | 664/-                 | LDA                             | SVM   | AC     | 96%
O’Dea et al. [2015]       | 1,820/14,701          | TF-IDF                          | SVM   | AC     | 80%
Braithwaite et al. [2016] | 17/-                  | LIWC                            | DT    | AC     | 92%
Coppersmith et al. [2016] | 554/-                 | Sentiment                       | LR    | F1     | 0.53
Burnap et al. [2017]      | 425/-                 | TF-IDF                          | SVM   | AC     | 66%
Cheng et al. [2017]       | 976/-                 | SC-LIWC                         | SVM   | AUC    | 0.48
Aladag et al. [2018]      | -/785                 | TF-IDF                          | SVM   | F1     | 0.92
Coppersmith et al. [2018] | 418/197,615           | WE                              | LSTM  | RC     | 0.85
Desmet and Hoste [2018]   | 257/-                 | Gallop & BoW                    | SVM   | F1     | 0.75
Roy et al. [2020]         | 283/512,526           | NN for psychological constructs | RF    | AUC    | 0.88


6.2 Is Social Media Representative of the General Population?


A fundamental limitation in this field is that social media users do not represent the general
population [Greenwood et al. 2016; Mellon and Prosser 2017]. Social media is biased in different
ways. For example, most Twitter users are between 18 and 34 years old, and most Facebook users
are female [InsightsWest 2017]. Besides, the tendency to post with geo-location enabled varies
with age, gender, marital and socioeconomic status, religion, and personality type [Rzeszewski and
Beluch 2017]. Other attributes, such as lack of education or Internet access, can negatively affect
the presence of under-served populations and minority groups [Denecke et al. 2015]. Moreover,
social media information suffers from a lack of key demographic variables such as age, ethnicity,
and gender. Nevertheless, some researchers have suggested approaches for the automatic
identification of demographic characteristics of social media users [Cesare et al. 2017].
These limitations can be mitigated using different techniques, such as user filtering [Filho et al.
2015; Yang et al. 2015], characterizing demographic attributes [Chen et al. 2015], and user sampling
[White et al. 2012; Aghababaei and Makrehchi 2017]. User attributes such as gender, age, ethnicity,
and location can be predicted from the messages posted by the user on social media [Farzindar
and Inkpen 2017]. Although social media content shows promise for mental health detection within
the population, significant research on representative samples is necessary to validate the results
and strengthen the models.
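One common correction idea, post-stratification reweighting, can be sketched as follows; all proportions are invented, and the works cited above use their own, more elaborate methods.

```python
# Sketch of post-stratification: reweight users so the sample's demographic
# mix matches census proportions. All numbers below are invented.
sample_share = {"18-34": 0.62, "35-54": 0.28, "55+": 0.10}  # hypothetical Twitter sample
census_share = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}  # hypothetical population

weights = {g: census_share[g] / sample_share[g] for g in sample_share}

# A user's signal is multiplied by their group weight when aggregating, so
# over-represented groups count less and under-represented ones count more.
print(weights)
```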

6.3 Evaluation Metric to Measure Model Generalization


Adequately labeled data and a well-tuned model are insufficient to obtain a general mental
health surveillance system. The model needs to be built in a way that allows it to perform well
across different datasets. Different evaluation metrics can be adopted, as detailed in Sokolova and
Lapalme [2009]; therefore, having the means to evaluate the model’s output and its ability to
generalize is another success factor. Some population-level systems are evaluated using the Pearson
correlation coefficient between the ML model predictions and official national statistics. It is
important to note that national statistics may not be sufficiently accurate, since mental illnesses
are significantly under-reported, especially in developing and under-developed countries, where
data are more difficult to obtain and mental illnesses are less recognized. Moreover, some critical
clusters, such as veterans, indigenous people, and the elderly, may not be well represented in the
data because they do not use social media regularly.
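A minimal sketch of such a cross-dataset check follows, assuming scikit-learn, with two invented corpora standing in for independently collected datasets.

```python
# Sketch of a generalization check: train on one labeled collection, evaluate
# on a different one, and report standard metrics (cf. Sokolova and Lapalme
# [2009]). Both corpora are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

train_posts = ["so hopeless lately", "loving this sunny day",
               "cant get out of bed", "party was fun"]
train_labels = [1, 0, 1, 0]
test_posts = ["feeling hopeless lately", "what a fun sunny day"]  # a *different* dataset
test_labels = [1, 0]

vec = TfidfVectorizer().fit(train_posts)  # fit the vocabulary on training data only
clf = LogisticRegression().fit(vec.transform(train_posts), train_labels)
print(classification_report(test_labels, clf.predict(vec.transform(test_posts))))
```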

6.4 Importance of Identifying Risk Factors


Specific risk factors include age, sex, substance abuse, certain medications, chronic diseases, family
and workplace environments, genetics, illness, and major life events such as marriage, divorce,
death, or abuse. More research needs to be done to better understand the influence of features for
identifying such risk factors and, consequently, for efficiently classifying social media posts with
signs of mental illness to support mental health monitoring at the population level. ML offers an
opportunity to dig into the complex interactions between risk factors.
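As an illustration of how ML can surface such interactions, the sketch below fits a random forest to synthetic tabular risk factors and inspects feature importances; the data and the injected interaction are entirely artificial.

```python
# Sketch: probe interactions between risk factors with a random forest and
# its feature importances. The tiny dataset is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 200
age = rng.integers(18, 80, n)
substance_use = rng.integers(0, 2, n)
chronic_disease = rng.integers(0, 2, n)
# Synthetic outcome: risk driven by substance use interacting with chronic
# disease, plus a little noise.
at_risk = ((substance_use & chronic_disease) | (rng.random(n) < 0.1)).astype(int)

X = np.column_stack([age, substance_use, chronic_disease])
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, at_risk)
for name, imp in zip(["age", "substance_use", "chronic_disease"],
                     forest.feature_importances_):
    print(f"{name}: {imp:.2f}")
```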

6.5 Ethics
There are ethical challenges in using social media as a source for NLP and ML. Conway [2014]
provided a taxonomy of ethical principles for the use of Twitter in public health research, based
on a review of the literature; these principles can be applied across all social media platforms.
Researchers face important challenges in ensuring the privacy of social media data [Conway
et al. 2016; Gruebner et al. 2017b]. Although some of the data are publicly available, the problem
becomes more complicated when personal attributes can be predicted and the identity of particular
users can be revealed. Elaborate discussions of ethics, particularly in public health research, can
be found in Mckee [2013]; Mikal et al. [2016]; Denecke et al. [2015]; Golder et al. [2017]; and
Valdez and Keim-Malpass [2019]. Based on their experience in the domain, Benton et al. [2017a]
developed practical protocols to guide NLP research using social media data from a healthcare
perspective. Their guidelines recommend that researchers acquire ethical approval or an exemption
from their Institutional Review Board (IRB), obtain informed consent when possible, and protect
and anonymize sensitive data used in presentations or analysis. Researchers also need to be vigilant
when linking data across sites is necessary. Finally, when sharing their data, they need to ensure
that other researchers respect ethical and privacy concerns.
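A minimal sketch of the anonymization step follows, with illustrative (not exhaustive) patterns for mentions, links, and phone numbers; production de-identification would need a broader set of rules.

```python
# Sketch of de-identification before posts appear in analysis or
# presentations: scrub usernames, URLs, and phone-like numbers.
import re

def anonymize(post: str) -> str:
    post = re.sub(r"@\w+", "@USER", post)        # mask @-mentions
    post = re.sub(r"https?://\S+", "URL", post)  # mask links
    post = re.sub(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b", "PHONE", post)
    return post

print(anonymize("reach me @jane_doe at 613-555-0147 or https://example.com/p"))
```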
In general, there is agreement that researchers can use publicly available data for health monitoring,
but preserving the confidentiality of social media users is a must. Predicting the clusters
vulnerable to mental disorders is one of the steps in preventive medicine. After identifying such
groups, the responsible parties need to take practical steps and collaborate on disease control,
treatment, and prevention. These steps may require informed consent and acceptance of such
interventions in the target population, or government programs to address specific clusters and
provide the appropriate help.

7 CONCLUSION
Despite the above-mentioned methodological and technical difficulties accompanying the use of
social media data in predictive analysis, social media has proven to be a valuable source for detecting
the characteristics of depressed individuals or those who are vulnerable to suicidal thoughts.
With the progressive rise in the number of depressed users due to the COVID-19 pandemic and an
expected increase in the number of suicides [Sher 2020], this field becomes even more promising
in providing decision-makers with rapid tools to mitigate risk. This review addresses the ideas
presented by different researchers in this emerging field and provides a summary of data collection
methods, classification techniques, and evaluation results. It highlights the importance of applying
proper sampling methods in population-level analysis, so that minority communities are included
and imbalanced data are handled correctly. It also highlights the potential of applying deep neural
network models to textual features, rather than to particular user-centric features (such as
demographic and social features) or post-centric features (such as linguistic and behavioral features),
in order to determine the most prominent features for a better predictive model. The success of
such a system should be measured by its accuracy in identifying at-risk clusters, its ability to
generalize to unseen data, its early detection of mental illness signs from users’ posts, and its
timely prediction in terms of performance.

APPENDIX
B THE MANUSCRIPTS INCLUDED IN THE REVIEW

Data Collection Method    References
Screening Surveys Almouzini et al. [2019]; De Choudhury et al. [2013a, 2013d, 2014]; Oh et al.
[2017]; Tsugawa et al. [2015]; Zhang et al. [2015]; Rakesh [2017]; Reece et al.
[2017]; Stankevich et al. [2020, 2019]
Forums Acuña Caicedo et al. [2020]; Burnap et al. [2015, 2017]; Colombo et al. [2016];
Desmet and Hoste [2018]; Karmen et al. [2015]; Nguyen et al. [2014, 2017b];
Wang et al. [2019b]; Howard et al. [2020]
Twitter Abboute et al. [2014]; Benton et al. [2017b]; Braithwaite et al. [2016]; Burnap
et al. [2015, 2017]; Chen et al. [2018b, 2018a]; Colombo et al. [2016];
Coppersmith et al. [2015b, 2014b, 2014a, 2016, 2018]; Culotta [2014]; De
Choudhury et al. [2013d, 2013a, 2013b, 2013c, 2017b]; Jamil et al. [2017];
Jashinsky et al. [2013]; Vioules et al. [2018]; Joshi et al. [2018]; Kang et al.
[2016]; Liu et al. [2017]; Mowery et al. [2016, 2017b, 2017a]; Nguyen et al.
[2017a]; O’Dea et al. [2015]; Preotiuc-Pietro et al. [2015]; Yazdavar et al. [2017]; Resnik
et al. [2015b, 2015a]; Schwartz et al. [2013]; Tsugawa et al. [2015]; Razak
et al. [2020]; Verma et al. [2020]; Giorgi et al. [2018]; Samuel et al. [2019]
Reddit Alambo et al. [2019]; Aladag et al. [2018]; Balani and De Choudhury [2015];
De Choudhury et al. [2016, 2017a]; Gkotsis et al. [2017]; Seah and Jin Shim
[2019]; Shing et al. [2018]; Toulis and Golab [2017]; Tadesse et al. [2020];
Thorstad and Wolff [2019]; Wolohan et al. [2018]; Cong et al. [2018]
Sina Weibo Cheng et al. [2017]; Huang et al. [2014, 2015]; Gao et al. [2017]; Peng et al.
[2017]; Wang et al. [2018]
CLPsych Shared Amir et al. [2017]; Cohan et al. [2017]; Matero et al. [2019]; Mohammadi
Task et al. [2019]; Orabi et al. [2018]; Resnik et al. [2015b]
eRisk Losada et al. [2019, 2017, 2018]; Almeida et al. [2017]; Kim et al. [2016]; Shen
et al. [2017]
Reviews Dreisbach et al. [2019]; Edo-Osagie et al. [2020]; Giuntini et al. [2020];
Mahdy et al. [2020]; Thieme et al. [2020]; Tsakalidis et al. [2019]; Robila and
Robila [2019]
Others Coppersmith et al. [2017]; Resnik et al. [2013]; Toulis and Golab [2017]


REFERENCES
Amayas Abboute, Yasser Boudjeriou, Gilles Entringer, Jérôme Azé, Sandra Bringay, and Pascal Poncelet. 2014. Mining
Twitter for suicide prevention. In Proceedings of the International Conference on Applications of Natural Language to Data
Bases/Information Systems (LNCS, Vol 8455). Springer, Cham, 250–253. DOI:https://doi.org/10.1007/978-3-319-07983-
7_36
Saeed Abdullah and Tanzeem Choudhury. 2018. Sensing technologies for monitoring serious mental illnesses. IEEE Multi-
media 25, 1 (Jan. 2018), 61–75.
Roberto Wellington Acuña Caicedo, José Manuel Gómez Soriano, and Héctor Andrés Melgar Sasieta. 2020. Assessment of
supervised classifiers for the task of detecting messages with suicidal ideation. Heliyon 6, 8 (2020). DOI:https://doi.org/
10.1016/j.heliyon.2020.e04412
Somayyeh Aghababaei and Masoud Makrehchi. 2017. Activity-based Twitter sampling for content-based and user-centric
prediction models. Hum.-centr. Comput. Inf. Sci. 7, 1 (2017). DOI:https://doi.org/10.1186/s13673-016-0084-z
Ahmet Emre Aladag, Serra Muderrisoglu, Naz Berfu Akbas, Oguzhan Zahmacioglu, and Haluk O. Bingol. 2018. Detecting
suicidal ideation on forums: Proof-of-concept study. J. Med. Internet Res. 20, 6 (June 2018), e215. DOI:https://doi.org/10.
2196/jmir.9840
Amanuel Alambo, Manas Gaur, Usha Lokala, Ugur Kursuncu, and Krishnaprasad Thirunarayan. 2019. Question answering
for suicide risk assessment using Reddit. In Proceedings of the IEEE 13th International Conference on Semantic Computing
(ICSC’19). 468–473.
Hayda Almeida, Antoine Briand, and Marie Jean Meurs. 2017. Detecting early risk of depression from social media user-
generated content. In Proceedings of the CEUR Workshop, Vol. 1866.
Salma Almouzini, Maher Khemakhem, and Asem Alageel. 2019. Detecting Arabic depressed users from Twitter data. Pro-
cedia Comput. Sci. 163 (2019), 257–265. DOI:https://doi.org/10.1016/j.procs.2019.12.107
Tim Althoff, Kevin Clark, and Jure Leskovec. 2016. Large-scale analysis of counseling conversations: An application of
natural language processing to mental health. Trans. Assoc. Comput. Ling. 4 (2016), 463–476.
Silvio Amir, Glen Coppersmith, Paula Carvalho, Mário J. Silva, and Byron C. Wallace. 2017. Quantifying mental health
from social media with neural user embeddings. In Proceedings of the 2nd Machine Learning for Healthcare Conference.
PMLR, 306–321. arxiv:1705.00335.
Laritza Coello-Guilarte, Rosa María Ortega-Mendoza, Luis Villaseñor-Pineda, and Manuel Montes-y-Gómez. 2019. Crosslingual
depression detection in Twitter using bilingual word alignments. In Proceedings of the International Conference of the
Cross-Language Evaluation Forum for European Languages. Springer, 49–61. DOI:https://doi.org/10.1007/978-3-
030-28577-7
Sairam Balani and Munmun De Choudhury. 2015. Detecting and characterizing mental health related self-disclosure in
social media. In Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing
Systems (CHI EA’15). 1373–1378.
Adrian Benton, Glen Coppersmith, and Mark Dredze. 2017a. Ethical research protocols for social media health research.
In Proceedings of the 1st Workshop on Ethics in Natural Language Processing. Association for Computational Linguistics,
94–102. DOI:https://doi.org/10.18653/v1/w17-1612
Adrian Benton, Margaret Mitchell, and Dirk Hovy. 2017b. Multitask learning for mental health conditions with limited
social media data. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational
Linguistics, Vol. 1. 152–162. arxiv:1712.03538.
Natalie Berry, Fiona Lobban, Maksim Belousov, Richard Emsley, Goran Nenadic, and Sandra Bucci. 2017. #Why-
WeTweetMH: Understanding why people use Twitter to discuss mental health problems. J. Med. Internet Res. 19, 4
(2017), 107–1071. DOI:https://doi.org/10.2196/jmir.6173
Scott R. Braithwaite, Christophe Giraud-Carrier, Josh West, Michael D. Barnes, and Carl Lee Hanson. 2016. Validating
machine learning algorithms for Twitter data against established measures of suicidality. JMIR Ment. Health 3, 2 (2016),
e21. DOI:https://doi.org/10.2196/mental.4822
Pete Burnap, Gualtiero Colombo, Rosie Amery, Andrei Hodorog, and Jonathan Scourfield. 2017. Multi-class machine clas-
sification of suicide-related communication on Twitter. Online Social Netw. Media 2 (2017), 32–44.
Pete Burnap, Walter Colombo, and Jonathan Scourfield. 2015. Machine classification and analysis of suicide-related com-
munication on Twitter. In Proceedings of the 26th ACM Conference on Hypertext & Social Media (HT’15). 75–84.
Hugo D. Calderon-Vilca, William I. Wun-Rafael, and Roberto Miranda-Loarte. 2017. Simulation of suicide tendency by using
machine learning. In Proceedings of 36th International Conference of the Chilean Computer Science Society (SCCC’17).
IEEE, 1–6. DOI:https://doi.org/10.1109/SCCC.2017.8405128
Rafael A. Calvo, David N. Milne, M. Sazzad Hussain, and Helen Christensen. 2017. Natural language processing in men-
tal health applications using non-clinical texts. Nat. Lang. Eng. 23, 5 (Sep. 2017), 1–37. DOI:https://doi.org/10.1017/
S1351324916000383


Patricia Cavazos-Rehg, Melissa Krauss, Shaina Sowles, Sarah Connolly, Rosas Carlos, Meghana Bharadwaj, and Laura
Bierut. 2016. A content analysis of depression-related Tweets. Comput. Hum. Behav. 2, 74 (2016), 351–357.
Daniel Cer, Yinfei Yang, Sheng yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-
Céspedes, Steve Yuan, Chris Tar, Yun Hsuan Sung, Brian Strope, and Ray Kurzweil. 2018. Universal sentence encoder. In
Proceedings of the Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association
for Computational Linguistics, 169–174. DOI:https://doi.org/10.18653/v1/d18-2029arxiv:1803.11175.
Nina Cesare, Christan Grant, and Elaine O. Nsoesie. 2017. Detection of user demographics on social media: A review of
methods and recommendations for best practices. CoRR (Feb. 2017), 1–18.
Stevie Chancellor, Eric P. S. Baumer, and Munmun De Choudhury. 2019. Who is the “human” in human-centered machine
learning: The case of predicting mental health from social media. Proc. ACM Hum.-comput. Interact. 3, Nov. (2019).
Xuetong Chen, Martin Sykora, Thomas Jackson, Suzanne Elayan, and Fehmidah Munir. 2018b. Tweeting your mental
health: An exploration of different classifiers and features with emotional signals in identifying mental health con-
ditions. In Proceedings of the 51st Hawaii International Conference on System Sciences. 3320–3328. DOI:https://doi.org/
10.24251/HICSS.2018.421
Xuetong Chen, Martin D. Sykora, Thomas W. Jackson, and Suzanne Elayan. 2018a. What about mood swings? Identifying
depression on Twitter with temporal measures of emotions. In The 2018 Web Conference Companion. ACM, 1653–1660.
Xin Chen, Yu Wang, Eugene Agichtein, and Fusheng Wang. 2015. A comparative study of demographic attribute inference
in Twitter. In Proceedings of the 9th International AAAI Conference on Web and Social Media. 590–593. DOI:https://doi.
org/10.1177/2047487314541731
Qijin Cheng, Tim Mh Li, Chi Leung Kwok, Tingshao Zhu, and Paul S. F. Yip. 2017. Assessing suicide risk and emotional
distress in Chinese social media: A text mining and machine learning study. J. Med. Internet Res. 19, 7 (2017), 1–10.
DOI:https://doi.org/10.2196/jmir.7276
Arman Cohan, Sydney Young, Andrew Yates, and Nazli Goharian. 2017. Triaging content severity in online mental health
forums. J. Assoc. Inf. Sci. Technol. 68, 11 (2017), 2675–2689. DOI:https://doi.org/10.1002/asi.23865 arxiv:1702.06875
Gualtiero B. Colombo, Pete Burnap, Andrei Hodorog, and Jonathan Scourfield. 2016. Analysing the connectivity and com-
munication of suicidal users on Twitter. Comput. Commun. 73 (2016), 291–300.
Qing Cong, Zhiyong Feng, Fang Li, Yang Xiang, Guozheng Rao, and Cui Tao. 2018. X-A-BiLSTM: A deep learning approach
for depression detection in imbalanced data. In Proceedings of the IEEE International Conference on Bioinformatics and
Biomedicine (BIBM’18). IEEE, 1624–1627. DOI:https://doi.org/10.1109/BIBM.2018.8621230
Mike Conway. 2014. Ethical issues in using Twitter for public health surveillance and research: Developing a taxonomy of
ethical concepts from the research literature. J. Med. Internet Res. 16, 12 (2014), 1–9. DOI:https://doi.org/10.2196/jmir.
3617
Mike Conway, Mengke Hu, and Wendy W. Chapman. 2019. Recent advances in using natural language processing to
address public health research questions using social media and consumer-generated data. Yearb. Med. Inform. 28, 01
(Aug. 2019), 208–217. DOI:https://doi.org/10.1055/s-0039-1677918
Mike Conway and Daniel O’Connor. 2016. Social media, big data, and mental health: Current advances and ethical impli-
cations. Curr. Opin. Psychol. 9 (June 2016), 77–82. DOI:https://doi.org/10.1016/j.copsyc.2016.01.004
Glen Coppersmith, Mark Dredze, and Craig Harman. 2014a. Quantifying mental health signals in Twitter. In Proceedings
of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. 51–60.
Glen Coppersmith, Mark Dredze, Craig Harman, and Kristy Hollingshead. 2015a. From ADHD to SAD: Analyzing the lan-
guage of mental health on Twitter through self-reported diagnoses. In Proceedings of the 2nd Workshop on Computational
Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. 1–10.
Glen Coppersmith, Casey Hilland, Ophir Frieder, and Ryan Leary. 2017. Scalable mental health analysis in the clinical
whitespace via natural language processing. In Proceedings of the IEEE EMBS International Conference on Biomedical
and Health Informatics (BHI’17). IEEE, 393–396. DOI:https://doi.org/10.1109/BHI.2017.7897288
Glen Coppersmith, Ryan Leary, Patrick Crutchley, and Alex Fine. 2018. Natural language processing of social me-
dia as screening for suicide risk. Biomed. Inform. Ins. 10 (2018), 117822261879286. DOI:https://doi.org/10.1177/
1178222618792860
Glen Coppersmith, Ryan Leary, Eric Whyne, and Tony Wood. 2015b. Quantifying suicidal ideation via language usage on
social media. In Joint Statistics Meetings Proceedings, Statistical Computing Section (JSM’15).
Glen Coppersmith, Kim Ngo, Ryan Leary, and Anthony Wood. 2016. Exploratory analysis of social media prior to a suicide
attempt. In Proceedings of the 3rd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal
to Clinical Reality. Association for Computational Linguistics. 106–117. DOI:https://doi.org/10.18653/v1/w16-0311
Glen A. Coppersmith, Craig T. Harman, and Mark H. Dredze. 2014b. Measuring post traumatic stress disorder in Twitter.
In Proceedings of the International AAAI Conference on Web and Social Media (ICWSM’14). DOI:https://doi.org/10.1016/
S1003-6326(14)63309-4


Aron Culotta. 2014. Estimating county health statistics with Twitter. In Proceedings of the 32nd Annual ACM Conference on
Human Factors in Computing Systems (CHI’14). 1335–1344.
Nicholas Cummins, Stefan Scherer, Jarek Krajewski, Sebastian Schnieder, Julien Epps, and Thomas F. Quatieri. 2015.
A review of depression and suicide risk assessment using speech analysis. Speech Commun. 71 (2015), 10–49. DOI:
https://doi.org/10.1016/j.specom.2015.03.004
Munmun De Choudhury, Scott Counts, and Eric Horvitz. 2013c. Major life changes and behavioral markers in social media:
Case of childbirth. In Proceedings of the ACM Conference on Computer Supported Cooperative Work (CSCW’13). 1431–
1442. DOI:https://doi.org/10.1145/2441776.2441937
Munmun De Choudhury, Scott Counts, and Eric Horvitz. 2013b. Predicting postpartum changes in emotion and behavior
via social media. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI’13).
Munmun De Choudhury, Scott Counts, and Eric Horvitz. 2013a. Social media as a measurement tool of depression in
populations. In Proceedings of the 5th Annual ACM Web Science Conference (WebSci’13). 47–56. DOI:https://doi.org/10.
1145/2464464.2464480
Munmun De Choudhury, Scott Counts, Eric J. Horvitz, and Aaron Hoff. 2014. Characterizing and predicting postpartum
depression from shared Facebook data. In Proceedings of the 17th ACM Conference on Computer Supported Cooperative
Work & Social Computing (CSCW’14). 626–638. DOI:https://doi.org/10.1145/2531602.2531675
Munmun De Choudhury, Michael Gamon, Scott Counts, and Eric Horvitz. 2013d. Predicting depression via social media.
In Proceedings of the 7th International AAAI Conference on Weblogs and Social Media, Vol. 2. 128–137.
Munmun De Choudhury, Emre Kiciman, Mark Dredze, Glen Coppersmith, and Mrinal Kumar. 2016. Discovering shifts to
suicidal ideation from mental health content in social media. In Proceedings of the CHI Conference on Human Factors in
Computing Systems (CHI’16). 2098–2110. DOI:https://doi.org/10.1145/2858036.2858207
Munmun De Choudhury and Emre Kiciman. 2017a. The language of social support in social media and its effect on suicidal
ideation risk. In Proceedings of the International AAAI Conference on Weblogs and Social Media. 32–41.
Munmun De Choudhury, Sanket S. Sharma, Tomaz Logar, Wouter Eekhout, and René Clausen Nielsen. 2017b. Gender and
cross-cultural differences in social media disclosures of mental illness. In Proceedings of the ACM Conference on Computer
Supported Cooperative Work and Social Computing (CSCW’17). 353–369. DOI:https://doi.org/10.1145/2998181.2998220
K. Denecke, P. Bamidis, C. Bond, E. Gabarron, M. Househ, A. Y. S. S. Lau, M. A. Mayer, M. Merolli, and M. Hansen. 2015.
Ethical issues of social media usage in healthcare. Yearb. Med. Inf. 10, 1 (Aug. 2015), 137–147. DOI:https://doi.org/10.
15265/IY-2015-001
Bart Desmet and Véronique Hoste. 2018. Online suicide prevention through optimised text classification. Inf. Sci. 439–440
(2018), 61–78. DOI:https://doi.org/10.1016/j.ins.2018.02.014
Karthik Dinakar, Henry Lieberman, Allison J. B. Chaney, and David M. Blei. 2014. Real-time topic models for crisis coun-
seling. In Proceedings of the KDD DSSG Workshop.
Son Doan, Amanda Ritchart, Nicholas Perry, Juan D. Chaparro, and Mike Conway. 2017. How do you #relax when you’re
#stressed? A content analysis and infodemiology study of stress-related tweets. JMIR Pub. Health Surveill. 3, 2 (2017),
e35. DOI:https://doi.org/10.2196/publichealth.5939
Caitlin Dreisbach, Theresa A. Koleck, Philip E. Bourne, and Suzanne Bakken. 2019. A systematic review of natural language
processing and text mining of symptoms from electronic patient-authored text data. Int. J. Med. Inform. 125, Dec. (2019),
37–46. DOI:https://doi.org/10.1016/j.ijmedinf.2019.02.008
Oduwa Edo-Osagie, Beatriz De La Iglesia, Iain Lake, and Obaghe Edeghere. 2020. A scoping review of the use of Twitter
for public health research. Comput. Biol. Med. 122, Apr. (2020), 103770. DOI:https://doi.org/10.1016/j.compbiomed.2020.
103770
Oduwa Edo-Osagie, Gillian Smith, Iain Lake, Obaghe Edeghere, and Beatriz De La Iglesia. 2019. Twitter mining using
semi-supervised classification for relevance filtering in syndromic surveillance. PLoS One 14, 7 (2019), 1–29. DOI:
https://doi.org/10.1371/journal.pone.0210689
Atefeh Farzindar and Diana Inkpen. 2017. Natural language processing for social media, second edition. Synth. Lect. Hum.
Lang. Technol. 10, 2 (2017), 1–195.
Renato Miranda Filho, Jussara M. Almeida, and Gisele L. Pappa. 2015. Twitter population sample bias and its impact on
predictive outcomes. In Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis
and Mining (ASONAM’15). ACM, 1254–1261. DOI:https://doi.org/10.1145/2808797.2809328
Manuel A. Franco-Martín, Juan Luis Muñoz-Sánchez, Beatriz Sainz-de Abajo, Gema Castillo-Sánchez, Sofiane Hamrioui,
and Isabel de la Torre-Díez. 2018. A systematic literature review of technologies for suicidal behavior prevention.
J. Med. Systems 42, 4 (Apr. 2018), 71. DOI:https://doi.org/10.1007/s10916-018-0926-5
Yuanbo Gao, Baobin Li, Xuefei Wangy, Jingying Wangy, Yang Zhouy, Shuotian Bai, and Tingshao Zhuy. 2017. Detecting
suicide ideation from Sina microblog. In Proceedings of the IEEE International Conference on Systems, Man, and Cyber-
netics (SMC’17). 182–187. DOI:https://doi.org/10.1109/SMC.2017.8122599


Robert R. German, John M. Horan, Lisa M. Lee, Bobby Milstein, and Carol A. Pertowski. 2001. Updated guidelines for
evaluating public health surveillance systems; recommendations from the guidelines working group. Retrieved from
https://www.cdc.gov/mmwr/preview/mmwrhtml/rr5013a1.htm.
Salvatore Giorgi, Daniel Preotiuc-Pietro, Anneke Buffone, Daniel Rieman, Lyle Ungar, and H. Andrew Schwartz. 2018.
The remarkable benefit of user-level aggregation for lexical-based population-level predictions. In Proceedings of the
Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1167–1172.
DOI:https://doi.org/10.18653/v1/d18-1148 arxiv:1808.09600.
Felipe T. Giuntini, Mirela T. Cazzolato, Maria de Jesus Dutra dos Reis, Andrew T. Campbell, Agma J. M. Traina, and Jó
Ueyama. 2020. A review on recognizing depression in social networks: Challenges and opportunities. J. Amb. Intell.
Humaniz. Comput. 2016 (2020). DOI:https://doi.org/10.1007/s12652-020-01726-4
George Gkotsis, Anika Oellrich, Sumithra Velupillai, Maria Liakata, Tim J. P. Hubbard, Richard J. B. Dobson, and Rina Dutta.
2017. Characterisation of mental health conditions in social media using informed deep learning. Sci. Rep. 7 (Mar. 2017),
45141. DOI:https://doi.org/10.1038/srep45141
Su Golder, Shahd Ahmed, Gill Norman, and Andrew Booth. 2017. Attitudes toward the ethics of research using social
media: A systematic review. J. Med. Internet Res. 19, 6 (June 2017), e195. DOI:https://doi.org/10.2196/jmir.7082
Shannon Greenwood, Andrew Perrin, and Maeve Duggan. 2016. Social media update 2016. Pew Research Center (Nov. 2016).
Oliver Gruebner, Sarah R. Lowe, Martin Sykora, Ketan Shankardass, S. V. Subramanian, and Sandro Galea. 2017a. A novel
surveillance approach for disaster mental health. PLoS One 12, 7 (2017), e0181233. DOI:https://doi.org/10.1371/journal.
pone.0181233
Oliver Gruebner, Martin Sykora, Sarah R. Lowe, Ketan Shankardass, Sandro Galea, and S. V. Subramanian. 2017b.
Big data opportunities for social behavioral and mental health research. Social Sci. Med. 189 (2017), 167–169. DOI:
https://doi.org/10.1016/j.socscimed.2017.07.018
Sharath Chandra Guntuku, David B. Yaden, Margaret L. Kern, Lyle H. Ungar, and Johannes C. Eichstaedt. 2017. Detecting
depression and mental illness on social media: An integrative review. Curr. Opin. Behav. Sci. 18 (2017), 43–49. DOI:
https://doi.org/10.1016/j.cobeha.2017.07.005
T. Hahn, A. A. Nierenberg, and S. Whitfield-Gabrieli. 2017. Predictive analytics in mental health: Applications, guidelines,
challenges and perspectives. Molec. Psychi. 22, 1 (2017), 37–43.
Bibo Hao, Lin Li, Ang Li, and Tingshao Zhu. 2013. Predicting mental health status on social media—A preliminary study
on microblog. In Proceedings of the 15th International Conference on Human-Computer Interaction. 101–110.
Derek Howard, Marta M. Maslej, Justin Lee, Jacob Ritchie, Geoffrey Woollard, and Leon French. 2020. Transfer learning for
risk classification of social media posts: Model evaluation study. J. Med. Internet Res. 22, 5 (2020). DOI:https://doi.org/
10.2196/15371 arxiv:1907.02581
Xiaolei Huang, Xin Li, Lei Zhang, Tianli Liu, David Chiu, and Tingshao Zhu. 2015. Topic model for identifying suicidal
ideation in Chinese microblog. In Proceedings of the 29th Pacific Asia Conference on Language, Information and Compu-
tation. 553–562.
Xiaolei Huang, Lei Zhang, David Chiu, Tianli Liu, Xin Li, and Tingshao Zhu. 2014. Detecting suicidal ideation in Chinese
microblogs with psychological lexicons. In Proceedings of the IEEE International Conference on Ubiquitous Intelligence
and Computing. 844–849.
InsightsWest. 2017. Canadian social media monitor 2017. Retrieved from https://bcama.com/wp-content/uploads/2018/03/
Rep_IW_CDNSocialMediaMonitor_Oct2017.pdf.
Zunaira Jamil, Diana Inkpen, Prasadith Buddhitha, and Kenton White. 2017. Monitoring tweets for depression to detect
at-risk users. In Proceedings of the 4th Workshop on Computational Linguistics and Clinical Psychology. 32–40.
Natasha Jaques, Sara Taylor, Ehimwenma Nosakhare, Akane Sano, and Rosalind Picard. 2016. Multi-task learning for pre-
dicting health, stress, and happiness. In Proceedings of the NIPS Workshop on Machine Learning for Healthcare. 1–5.
Jared Jashinsky, Scott H. Burton, Carl L. Hanson, Josh West, Christophe Giraud-Carrier, Michael D. Barnes, and Trenton
Argyle. 2013. Tracking suicide risk factors through Twitter in the US. Crisis 35, 1 (Jan. 2013), 51–59. DOI:https://doi.org/
10.1027/0227-5910/a000234
Deepali J. Joshi, Mohit Makhija, Yash Nabar, Ninad Nehete, and Manasi S. Patwardhan. 2018. Mental health analysis using
deep learning for feature extraction. In ACM International Conference Proceeding Series. ACM, 356–359. DOI:https://doi.
org/10.1145/3152494.3167990
Sayali Shashikant Kale. 2015. Tracking Mental Disorders across Twitter Users. Ph.D. Dissertation. University of Mumbai.
Keumhee Kang, Chanhee Yoon, and Eun Yi Kim. 2016. Identifying depressive users in Twitter using multimodal anal-
ysis. In Proceedings of the International Conference on Big Data and Smart Computing (BigComp’16). 231–238. DOI:
https://doi.org/10.1109/BIGCOMP.2016.7425918
Christian Karmen, Robert C. Hsiung, and Thomas Wetter. 2015. Screening Internet forum participants for depression symp-
toms by assembling and enhancing multiple NLP methods. Comput. Meth. Prog. Biomed. 120, 1 (2015), 27–36.


Ramakanth Kavuluru, Amanda G. Williams, María Ramos-Morales, Laura Haye, Tara Holaday, and Julie Cerel. 2016. Clas-
sification of helpful comments on online suicide watch forums HHS public access. In Proceedings of the ACM Conference
on Bioinformatics, Computational Biology, and Health Informatics. 32–40. DOI:https://doi.org/10.1145/2975167.2975170
Ashar Anam Khan and Mohd Husain. 2018. Analysis of mental state of users using social media to predict depression! A
survey. In Int. J. Adv. Res. Comput. Sci. 9 (2018).
Sunghwan Mac Kim, Yufei Wang, Stephen Wan, and Cécile Paris. 2016. Data61-CSIRO systems at the CLPsych 2016 shared
task. In Proceedings of the 3rd Workshop on Computational Linguistics and Clinical Psychology. 128–132.
Holly Korda and Zena Itani. 2013. Harnessing social media for health promotion and behavior change. Health Promot. Pract.
14, 1 (2013), 15–23.
Mrinal Kumar, Mark Dredze, Glen Coppersmith, and Munmun De Choudhury. 2015. Detecting changes in suicide content
manifested in social media following celebrity suicides. In Proceedings of the 26th ACM Conference on Hypertext & Social
Media (HT’15), Vol. 2015. NIH Public Access, 85–94.
Ugur Kursuncu, Manas Gaur, Usha Lokala, Krishnaprasad Thirunarayan, Amit Sheth, and I. Budak Arpinar. 2018. Predictive
analysis on Twitter: Techniques and applications. In Emerging Research Challenges and Opportunities in Computational
Social Network Analysis and Mining. Springer, Cham, 67–104.
E. Megan Lachmar, Andrea K. Wittenborn, Katherine W. Bogen, and Heather L. McCauley. 2017. #MyDepressionLooksLike:
Examining public discourse about depression on Twitter. JMIR Ment. Health 4, 4 (Oct. 2017), e43. DOI:https://doi.org/
10.2196/mental.8141
Michael Thaul Lehrman, Cecilia Ovesdotter, Alm Rubén, and A. Proaño. 2012. Detecting distressed and non-distressed
affect states in short forum texts. In Proceedings of the 2nd Workshop on Language in Social Media. 9–18.
Victor Leiva and Ana Freire. 2017. Towards suicide prevention: Early detection of depression on social media. In
Proceedings of the Conference on Internet Science (INSCI’17), I. Kompatsiaris et al. (Eds.), Vol. 10673. Springer.
DOI:https://doi.org/10.1007/978-3-319-70284-1_34
Diya Li, Harshita Chaudhary, and Zhe Zhang. 2020. Modeling spatiotemporal pattern of depressive symptoms caused by
COVID-19 using social media data mining. Int. J. Environ. Res. Pub. Health 17, 14 (2020), 1–23. DOI:https://doi.org/10.
3390/ijerph17144988
Yun Li, Tao Li, and Huan Liu. 2017. Recent advances in feature selection and its applications. Knowl. Inf. Syst. 53, 3 (2017),
551–577.
Huijie Lin, Jia Jia, Quan Guo, Yuanyuan Xue, Qi Li, Jie Huang, Lianhong Cai, and Ling Feng. 2014. User-level psychological
stress detection from social media using deep neural network. In Proceedings of the ACM International Conference on
Multimedia (MM’14). 507–516.
Huijie Lin, Jia Jia, Liqiang Nie, Guangyao Shen, and Tat-Seng Chua. 2016. What does social media say about your stress?
In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI’16). 3775–3781.
Huijie Lin, Jia Jia, Jiezhong Qiu, Yongfeng Zhang, Guangyao Shen, Lexing Xie, Jie Tang, Ling Feng, and Tat-seng Seng
Chua. 2017. Detecting stress based on social interactions in social networks. IEEE Trans. Knowl. Data Eng. 29, 9 (2017),
1820–1833. DOI:https://doi.org/10.1109/TKDE.2017.2686382
Tong Liu, Qijin Cheng, Christopher M. Homan, and Vincent M. B. Silenzio. 2017. Learning from various labeling strategies
for suicide-related messages on social media: An experimental study. In Proceedings of the ACM International Conference
on Web Search and Data Mining Workshop on Mining Online Health Reports. arxiv:1701.08796
David E. Losada and Fabio Crestani. 2016. A test collection for research on depression and language use. In Lecture Notes
in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol.
9822. Springer, Cham, 28–39.
David E. Losada, Fabio Crestani, and Javier Parapar. 2017. CLEF 2017 eRisk overview: Early risk prediction on the Internet:
Experimental foundations. In Proceedings of the International Conference of the Cross-language Evaluation Forum for
European Languages, Vol. 1866. Springer.
David E. Losada, Fabio Crestani, and Javier Parapar. 2018. Overview of eRisk 2018: Early risk prediction on the Internet
(extended lab overview). In Proceedings of the 9th International Conference of the CLEF Association, Vol. 2125.
David E. Losada, Fabio Crestani, and Javier Parapar. 2019. Overview of eRisk: Early risk prediction on the Internet. In Exper-
imental IR Meets Multilinguality, Multimodality, and Interaction, F. Crestani et al. (Eds.), Vol. 11696. Springer International
Publishing, 343–361. DOI:https://doi.org/10.1007/978-3-030-28577-7_27
Meizhen Lv, Ang Li, Tianli Liu, and Tingshao Zhu. 2015. Creating a Chinese suicide dictionary for identifying suicide risk
on social media. PeerJ 3 (2015), e1455.
Nourane Mahdy, Dalia A. Magdi, Ahmed Dahroug, and Mohammed Abo Rizka. 2020. Comparative study: Different tech-
niques to detect depression using social media. In Lecture Notes in Networks and Systems. Vol. 114. Springer Singapore,
441–452. DOI:https://doi.org/10.1007/978-981-15-3075-3_30
M. Marcus, Mohammad Taghi Yasamy, M. Van Ommeren, D. Chisholm, and S. Saxena. 2012. Depression: A global public
health concern. World Health Org. Paper Depress. 01, Dec. (2012), 6–8.


Matthew Matero, Akash Idnani, Youngseo Son, Sal Giorgi, Huy Vu, Mohammad Zamani, Parth Limbachiya, Sharath
Chandra Guntuku, and H. Andrew Schwartz. 2019. Suicide risk assessment with multi-level dual-context language
and BERT. In Proceedings of the 6th Workshop on Computational Linguistics and Clinical Psychology. 39–44. DOI:
https://doi.org/10.18653/v1/w19-3005
Rebecca Mckee. 2013. Ethical issues in using social media for health and health care research. Health Polic. 110, 2-3 (May
2013), 298–301. DOI:https://doi.org/10.1016/j.healthpol.2013.02.006
Jonathan Mellon and Christopher Prosser. 2017. Twitter and Facebook are not representative of the general population:
Political attitudes and demographics of British social media users. Res. Polit. 4, 3 (2017).
Jude Mikal, Samantha Hurst, and Mike Conway. 2016. Ethical issues in using Twitter for population-level depression mon-
itoring: A qualitative study. BMC Med. Ethics 17, 1 (Apr. 2016), 22. DOI:https://doi.org/10.1186/s12910-016-0105-5
Elham Mohammadi, Hessam Amini, and Leila Kosseim. 2019. CLaC at CLPsych 2019: Fusion of neural features and predicted
class probabilities for suicide risk assessment based on online posts. In Proceedings of the 6th Workshop on Computational
Linguistics and Clinical Psychology. 34–38. DOI:https://doi.org/10.18653/v1/W19-3004
David Moher, Douglas G. Altman, Alesandro Liberati, and Jennifer Tetzlaff. 2011. PRISMA statement. Epidemiology 22, 1
(2011), 128.
Michelle Renee Morales, Stefan Scherer, and Rivka Levitan. 2017. A cross-modal review of indicators for depression detec-
tion systems. In Proceedings of the 4th Workshop on Computational Linguistics and Clinical Psychology. 1–12.
Danielle Mowery, Craig Bryan, and Mike Conway. 2017a. Feature Studies to Inform the Classification of Depressive Symp-
toms from Twitter Data for Population Health. DOI:https://doi.org/10.1056/NEJMoa1010095
Danielle Mowery, Albert Park, Mike Conway, and Craig Bryan. 2016. Towards automatically classifying depressive symp-
toms from Twitter data for population health. In Proceedings of the Workshop on Computational Modeling of People’s
Opinions, Personality, and Emotions in Social Media. 182–191.
Danielle Mowery, Hilary Smith, Tyler Cheney, Greg Stoddard, Glen Coppersmith, Craig Bryan, and Mike Conway. 2017b.
Understanding depressive symptoms and psychosocial stressors on Twitter: A corpus-based study. J. Med. Internet Res.
19, 2 (2017). DOI:https://doi.org/10.2196/jmir.6895
Danielle L. Mowery, Craig Bryan, and Mike Conway. 2015. Towards developing an annotation scheme for depressive
disorder symptoms: A preliminary study using Twitter data. In Proceedings of the 2nd Workshop on Computational
Linguistics and Clinical Psychology. 89–98.
Hung Nguyen, Duc Thanh Nguyen, and Thin Nguyen. 2019. Estimating county health indices using graph neural networks.
In Proceedings of the Australasian Data Mining Conference (AusDM’19), T. Le et al. (Eds). (Communications in Computer
and Information Science, Vol. 1127). 16–27. DOI:https://doi.org/10.1007/978-981-15-1699-3
Thin Nguyen, Bridianne O’Dea, Mark Larsen, Dinh Phung, Svetha Venkatesh, and Helen Christensen. 2017b. Using lin-
guistic and topic analysis to classify sub-groups of online depression communities. Multimedia Tools Applic. 76, 8 (Apr.
2017), 10653–10676.
Thin Nguyen, Duc Thanh Nguyen, Mark E. Larsen, Bridianne O’Dea, John Yearwood, Dinh Phung, Svetha Venkatesh, and
Helen Christensen. 2017a. Prediction of population health indices from social media using kernel-based textual and
temporal features. In Proceedings of the 26th International Conference on World Wide Web (WWW’17). ACM, 99–107.
DOI:https://doi.org/10.1145/3041021.3054136
Thin Nguyen, Dinh Phung, Bo Dao, Svetha Venkatesh, and Michael Berk. 2014. Affective and content analysis of online
depression communities. IEEE Trans. Affect. Comput. 5, 3 (2014), 217–226.
Elaine O. Nsoesie, Luisa Flor, Jared Hawkins, Adyasha Maharana, Tobi Skotnes, Fatima Marinho, and John S. Brownstein.
2016. Social media as a sentinel for disease surveillance: What does sociodemographic status have to do with it? PLoS
Currents 8 (2016), 1–26.
Robertus Nugroho, Cecile Paris, Surya Nepal, Jian Yang, and Weiliang Zhao. 2020. A survey of recent methods on deriving
topics from Twitter: Algorithm to evaluation. Knowl. Inf. Syst. 62 (2020), 2485–2519. DOI:https://doi.org/10.1007/
s10115-019-01429-z
Bridianne O’Dea, Stephen Wan, Philip J. Batterham, Alison L. Calear, Cecile Paris, and Helen Christensen. 2015. Detecting
suicidality on Twitter. Internet Interven. 2, 2 (May 2015), 183–188. DOI:https://doi.org/10.1016/j.invent.2015.03.005
The National Institute of Mental Health Information Resource Center NIMH. 2019. Suicide. Retrieved from https://www.
nimh.nih.gov/health/statistics/suicide.shtml.
Jihoon Oh, Kyongsik Yun, Ji-Hyun Hwang, and Jeong-Ho Chae. 2017. Classification of suicide attempts through a machine
learning algorithm based on multiple systemic psychiatric scales. Front. Psychi. 8 (2017), 192.
Ahmed Husseini Orabi, Prasadith Buddhitha, Mahmoud Husseini Orabi, and Diana Inkpen. 2018. Deep learning for depres-
sion detection of Twitter users. In Proceedings of the 5th Workshop on Computational Linguistics and Clinical Psychology:
From Keyboard to Clinic. 88–97.
Esteban Ortiz-Ospina. 2019. The Rise of Social Media. Retrieved from https://ourworldindata.org/rise-of-social-media.


Minsu Park, David W. Mcdonald, and Meeyoung Cha. 2013. Perception differences between the depressed and non-
depressed users in Twitter. In Proceedings of the 7th International AAAI Conference on Weblogs and Social Media
(ICWSM’13). 476–485.
Michael J. Paul and Mark Dredze. 2014. Discovering health topics in social media using topic models. PLoS One 9, 8 (Aug.
2014), e103408. DOI:https://doi.org/10.1371/journal.pone.0103408
Michael J. Paul and Mark Dredze. 2017. Social monitoring for public health. Synth. Lect. Inf. Concepts, Retr. Serv. 9, 5 (2017),
1–183.
Michael J. Paul, Abeed Sarker, John S. Brownstein, Azadeh Nikfarjam, Matthew Scotch, Karen L. Smith, and Graciela
Gonzalez. 2016. Social media mining for public health monitoring and surveillance. In Proceedings of the Pacific Sympo-
sium on Biocomputing. 468–79.
Zhichao Peng, Qinghua Hu, and Jianwu Dang. 2017. Multi-kernel SVM based depression recognition using social media
data. Int. J. Mach. Learn. Cyber. (June 2017), 1–15.
Lawrence Phillips, Chase Dowling, Kyle Shaffer, Nathan Hodas, and Svitlana Volkova. 2017. Using social media to predict
the future: A systematic literature review. arXiv preprint arXiv:1706.06134 (2017), 1–55.
Daniel Preotiuc-Pietro, Johannes Eichstaedt, Gregory Park, Maarten Sap, Laura Smith, Victoria Tobolsky, H. Andrew Schwartz, and
Lyle Ungar. 2015. The role of personality, age and gender in tweeting about mental illnesses. In Proceedings of the
Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. 21–30.
V. M. Prieto, M. Alvarez, F. Cacheda, and J. L. Oliveira. 2014. Twitter: A good place to detect health conditions. PLoS One 9,
1 (Jan. 2014), e86191.
Public Health Agency of Canada. 2015. Report from the Canadian Chronic Disease Surveillance System: Mental Illness in
Canada 2015. Vol. 2015. Minister of Health, Ottawa, Canada. 1–54. DOI:https://doi.org/10.1002/yd.20038
Kate Loveys, Patrick Crutchley, Emily Wyatt, and Glen Coppersmith. 2017. Small but mighty: Affective micropatterns for
quantifying mental health from social media language. In Proceedings of the 4th Workshop on Computational Linguistics
and Clinical Psychology. 85–95. DOI:https://doi.org/10.18653/v1/w17-3110
Gopalkumar Rakesh. 2017. Suicide prediction with machine learning. Amer. J. Psychi. Resid. J. 12, 1 (2017), 15–17.
Chempaka Seri Abdul Razak, Muhammad Ameer Zulkarnain, Siti Hafizah Ab Hamid, Nor Badrul Anuar, Mohd Zalisham
Jali, and Hasni Meon. 2020. Tweep: A system development to detect depression in Twitter posts. Vol. 603. Springer
Singapore, 543–552. DOI:https://doi.org/10.1007/978-981-15-0058-9_52
Andrew G. Reece, Andrew J. Reagan, Katharina L. M. Lix, Peter Sheridan Dodds, Christopher M. Danforth, and Ellen J.
Langer. 2017. Forecasting the onset and course of mental illness with Twitter data. Sci. Rep. 7, 1 (2017), 1–11. DOI:
https://doi.org/10.1038/s41598-017-12961-9
Philip Resnik, William Armstrong, Leonardo Claudino, and Thang Nguyen. 2015a. The University of Maryland CLPsych
2015 shared task system. In Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology. 54–60.
DOI:https://doi.org/10.3115/v1/w15-1207
Philip Resnik, William Armstrong, Leonardo Claudino, Thang Nguyen, Viet-An Nguyen, and Jordan Boyd-Graber. 2015b.
Beyond LDA: Exploring supervised topic modeling for depression-related language in Twitter. In Proceedings of the
52nd Workshop Computational Linguistics and Clinical Psychology, Vol. 1. 99–107.
Philip Resnik, Anderson Garron, and Rebecca Resnik. 2013. Using topic modeling to improve prediction of neuroticism and
depression in college students. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
1348–1353.
Hannah Ritchie and Max Roser. 2018. Our World in Data. Retrieved from https://ourworldindata.org/mental-health.
Mihaela Robila and Stefan A. Robila. 2019. Applications of artificial intelligence methodologies to behavioral and social
sciences. J. Child Fam. Stud. DOI:https://doi.org/10.1007/s10826-019-01689-x
Jo Robinson, Maria Rodrigues, Steve Fisher, Eleanor Bailey, and Helen Herrman. 2015. Social media and suicide prevention:
Findings from a stakeholder survey. Shanghai Arch. Psychi. 27, 1 (2015), 27–35.
Arunima Roy, Katerina Nikolitch, Rachel McGinn, Safiya Jinah, William Klement, and Zachary A. Kaminsky. 2020. A
machine learning approach predicts future risk to suicidal ideation from social media data. npj Dig. Med. 3, 1 (2020),
1–12. DOI:https://doi.org/10.1038/s41746-020-0287-6
Michal Rzeszewski and Lukasz Beluch. 2017. Spatial characteristics of Twitter users—Toward the understanding of geosocial
media production. ISPRS Int. J. Geo-inf. 6, 8 (2017), 236. DOI:https://doi.org/10.3390/ijgi6080236
Hamman Samuel, Benyamin Noori, Sara Farazi, and Osmar Zaiane. 2019. Context prediction in the social web using applied
machine learning: A study of Canadian tweeters. In Proceedings of the IEEE/WIC/ACM International Conference on Web
Intelligence (WI’18). 230–237. DOI:https://doi.org/10.1109/WI.2018.00-85
H. Andrew Schwartz, Johannes Eichstaedt, Margaret L. Kern, Gregory Park, Maarten Sap, David Stillwell, Michal Kosinski,
and Lyle Ungar. 2014. Towards assessing changes in degree of depression through Facebook. In Proceedings of the
Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. 118–125.


H. Andrew Schwartz, Johannes C. Eichstaedt, Margaret L. Kern, Lukasz Dziurzynski, Megha Agrawal, Gregory J. Park,
Shrinidhi K. Lakshmikanth, Sneha Jha, Martin E. P. Seligman, and Lyle Ungar. 2013. Characterizing geographic
variation in well-being using tweets. In Proceedings of the 7th International AAAI Conference on Weblogs and Social
Media (ICWSM’13). 331–346.
Jane H. K. Seah and Kyong Jin Shim. 2019. Data mining approach to the detection of suicide in social media: A case
study of Singapore. In Proceedings of the IEEE International Conference on Big Data (Big Data’18). IEEE, 5442–5444.
DOI:https://doi.org/10.1109/BigData.2018.8622528
Adrian B. R. Shatte, Delyse M. Hutchinson, and Samantha J. Teague. 2019. Machine learning in mental health: A scoping review of methods and applications. Psychol. Med. 49, 9 (2019), 1426–1448. DOI:https://doi.org/10.1017/S0033291719000151
Guangyao Shen, Jia Jia, Liqiang Nie, Fuli Feng, Cunjun Zhang, Tianrui Hu, Tat-Seng Chua, and Wenwu Zhu. 2017. Depression detection via harvesting social media: A multimodal dictionary learning solution. In Proceedings of the International Joint Conference on Artificial Intelligence. 3838–3844.
Leo Sher. 2020. The impact of the COVID-19 pandemic on suicide rates. QJM: Int. J. Med. May (2020), 1–6. DOI:
https://doi.org/10.1093/qjmed/hcaa202
Han-Chin Shing, Suraj Nair, Ayah Zirikly, Meir Friedenberg, Hal Daumé III, and Philip Resnik. 2018. Expert, crowdsourced, and machine assessment of suicide risk via online postings. In Proceedings of the 5th Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic. 25–36.
Hong-Han Shuai, Chih-Ya Shen, De-Nian Yang, Yi-Feng Lan, Wang-Chien Lee, Philip S. Yu, and Ming-Syan Chen. 2018. A comprehensive study on social network mental disorders detection via online social media mining. IEEE Trans. Knowl. Data Eng. 30, 7 (2018), 1212–1225.
Lauren Sinnenberg, Alison M. Buttenheim, Kevin Padrez, Christina Mancheno, Lyle Ungar, and Raina M. Merchant. 2017.
Twitter as a tool for health research: A systematic review. Amer. J. Pub. Health 107, 1 (2017), e1–e8.
Marina Sokolova and Guy Lapalme. 2009. A systematic analysis of performance measures for classification tasks. Inf. Proc.
Manag. 45, 4 (2009), 427–437. DOI:https://doi.org/10.1016/j.ipm.2009.03.002
Maxim Stankevich, Andrey Latyshev, Evgenia Kuminskaya, Ivan Smirnov, and Oleg Grigoriev. 2019. Depression detection from social media texts. In CEUR Workshop Proceedings, Vol. 2523. 279–289.
Maxim Stankevich, Ivan Smirnov, Natalia Kiselnikova, and Anastasia Ushakova. 2020. Depression detection from social media profiles. In Communications in Computer and Information Science, Vol. 1223. Springer International Publishing, 181–194. DOI:https://doi.org/10.1007/978-3-030-51913-1_12
Michael Mesfin Tadesse, Hongfei Lin, Bo Xu, and Liang Yang. 2020. Detection of suicide ideation in social media forums
using deep learning. Algorithms 13, 1 (2020), 1–19. DOI:https://doi.org/10.3390/a13010007
Anja Thieme, Danielle Belgrave, and Gavin Doherty. 2020. Machine learning in mental health: A systematic review of the HCI literature to support effective ML system design. ACM Trans. Comput.-Hum. Interact. 27, 5 (2020). DOI:https://doi.org/10.1145/3398069
Robert Thorstad and Phillip Wolff. 2019. Predicting future mental illness from social media: A big-data approach. Behav.
Res. Meth. 51, 4 (2019), 1586–1600. DOI:https://doi.org/10.3758/s13428-019-01235-z
Andrew Toulis and Lukasz Golab. 2017. Social media mining to understand public mental health. In Lecture Notes in Com-
puter Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 10494.
55–70.
Adam Tsakalidis, Maria Liakata, Theo Damoulas, and Alexandra Cristea. 2019. Can we assess mental health through social media and smart devices? Addressing bias in methodology and evaluation. In Proceedings of the Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD’18), U. Brefeld et al. (Eds.) (Lecture Notes in Computer Science, Vol. 11053). 186–201. DOI:https://doi.org/10.1007/978-3-030-10997-4
Sho Tsugawa, Yusuke Kikuchi, Fumio Kishino, Kosuke Nakajima, Yuichi Itoh, and Hiroyuki Ohsaki. 2015. Recognizing
depression from Twitter activity. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing
Systems (CHI’15). 3187–3196.
Rupa Valdez and Jessica Keim-Malpass. 2019. Ethics in health research using social media. In Social Web and Health Research. Springer. DOI:https://doi.org/10.1007/978-3-030-14714-3_13
Kasturi Dewi Varathan and Nurhafizah Talib. 2014. Suicide detection system based on Twitter. In Proceedings of the IEEE
Science and Information Conference. 785–788.
Bhanu Verma, Sonam Gupta, and Lipika Goel. 2020. A neural network based hybrid model for depression detection in Twitter. Vol. 19. Springer Singapore. DOI:https://doi.org/10.1007/978-981-15-6634-9
M. Johnson Vioulès, B. Moulahi, J. Azé, and S. Bringay. 2018. Detection of suicide-related posts in Twitter data streams. IBM J. Res. Dev. 62, 1 (2018), 7:1–7:12. DOI:https://doi.org/10.1147/JRD.2017.2768678
Tao Wang, Markus Brede, Antonella Ianni, and Emmanouil Mentzakis. 2017a. Detecting and characterizing eating-disorder
communities on social media. In Proceedings of the 10th ACM International Conference on Web Search and Data Mining
(WSDM’17). 91–100.

Yilin Wang, Jiliang Tang, Jundong Li, Baoxin Li, Yali Wan, Clayton Mellina, Neil O’Hare, and Yi Chang. 2017b. Understand-
ing and discovering deliberate self-harm content in social media. In Proceedings of the 26th International World Wide
Web Conference (WWW’17). 93–102. DOI:https://doi.org/10.1145/3038912.3052555
Zijian Wang, Scott A. Hale, David Adelani, Przemyslaw A. Grabowicz, Timo Hartmann, Fabian Flöck, and David Jurgens. 2019a. Demographic inference and representative population estimates from multilingual social media data. In Proceedings of the World Wide Web Conference (WWW’19). 2056–2067. DOI:https://doi.org/10.1145/3308558.3313684 arXiv:1905.05961.
Zheng Wang, Guang Yu, and Xianyun Tian. 2019b. Exploring behavior of people with suicidal ideation in a Chinese online suicidal community. Int. J. Environ. Res. Publ. Health 16, 1 (2019), 54. DOI:https://doi.org/10.3390/ijerph16010054
Zheng Wang, Guang Yu, Xianyun Tian, Jingyun Tang, and Xiangbin Yan. 2018. A study of users with suicidal ideation on
Sina Weibo. Telemed. e-Health 24, 9 (2018), 702–709. DOI:https://doi.org/10.1089/tmj.2017.0189
Jieun Wee, Sooyeun Jang, Joonhwan Lee, and Woncheol Jang. 2017. The influence of depression and personality on social
networking. Comput. Hum. Behav. 74 (2017), 45–52. DOI:https://doi.org/10.1016/j.chb.2017.04.003
Janith Weerasinghe, Kediel Morales, and Rachel Greenstadt. 2019. “Because... I was told... so much”: Linguistic indicators of mental health status on Twitter. Proc. Priv. Enhanc. Technol. 2019, 4 (2019), 152–171. DOI:https://doi.org/10.2478/popets-2019-0063
Kenton White, Guichong Li, and Nathalie Japkowicz. 2012. Sampling online social networks using coupling from the past. In Proceedings of the 12th IEEE International Conference on Data Mining Workshops (ICDMW’12). IEEE, 266–272. DOI:https://doi.org/10.1109/ICDMW.2012.126
World Health Organization (WHO). 2019. Mental Disorders. Retrieved from https://www.who.int/news-room/fact-sheets/detail/mental-disorders.
Sanjaya Wijeratne, Amit Sheth, Shreyansh Bhatt, Lakshika Balasuriya, Hussein S. Al-Olimat, Manas Gaur, Amir Hossein Yazdavar, and Krishnaprasad Thirunarayan. 2017. Feature engineering for Twitter-based applications. In Feature Engineering for Machine Learning and Data Analytics, Guozhu Dong and Huan Liu (Eds.). Chapman and Hall, 35. DOI:https://doi.org/10.1201/9781315181080-14
J. T. Wolohan, Misato Hiraga, Atreyee Mukherjee, Zeeshan Ali Sayyed, and Matthew Millard. 2018. Detecting linguistic
traces of depression in topic-restricted text: Attending to self-stigmatized depression with NLP. In Proceedings of the 1st
International Workshop on Language Cognition and Computational Models. 11–21.
Akkapon Wongkoblap, Miguel A. Vadillo, and Vasa Curcin. 2017. Researching mental health disorders in the era of social media: Systematic review. J. Med. Internet Res. 19, 6 (2017), e228. DOI:https://doi.org/10.2196/jmir.7215
Akkapon Wongkoblap, Miguel A. Vadillo, and Vasa Curcin. 2018. A multilevel predictive model for detecting social network
users with depression. In Proceedings of the IEEE International Conference on Healthcare Informatics (ICHI’18). IEEE, 130–
135. DOI:https://doi.org/10.1109/ICHI.2018.00022
Ming Yang, Melody Kiang, and Wei Shang. 2015. Filtering big data from social media—Building an early warning system for adverse drug reactions. J. Biomed. Inform. 54 (2015), 230–240. DOI:https://doi.org/10.1016/j.jbi.2015.01.011
Wei Yang and Lan Mu. 2015. GIS analysis of depression among Twitter users. Appl. Geog. 60 (2015), 217–223.
Andrew Yates, Arman Cohan, and Nazli Goharian. 2017. Depression and self-harm risk assessment in online forums. In
Proceedings of the Conference on Empirical Methods in Natural Language Processing. 2968–2978.
Amir Hossein Yazdavar, Hussein S. Al-Olimat, Monireh Ebrahimi, Goonmeet Bajaj, Tanvi Banerjee, Krishnaprasad Thirunarayan, Jyotishman Pathak, and Amit Sheth. 2017. Semi-supervised approach to monitoring clinical depressive symptoms in social media. In Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. 1191–1198. DOI:https://doi.org/10.1145/3110025.3123028
Zhijun Yin, Lina M. Sulieman, and Bradley A. Malin. 2019. A systematic literature review of machine learning in online
personal health data. J. Amer. Med. Inform. Assoc. 26, 6 (2019), 561–576. DOI:https://doi.org/10.1093/jamia/ocz009
Lei Zhang, Xiaolei Huang, Tianli Liu, Ang Li, Zhenxiang Chen, and Tingshao Zhu. 2015. Using linguistic features to estimate
suicide probability of Chinese microblog users. In Lecture Notes in Computer Science (including subseries Lecture Notes
in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 8944. 549–559.
Yunpeng Zhao, Yi Guo, Xing He, Jinhai Huo, Yonghui Wu, Xi Yang, and Jiang Bian. 2018. Assessing mental health signals
among sexual and gender minorities using Twitter data. In Proceedings of the IEEE International Conference on Healthcare
Informatics Workshops (ICHI-W’18). IEEE, 51–52. DOI:https://doi.org/10.1109/ICHI-W.2018.00015

Received October 2019; revised July 2020; accepted August 2020
