Social media data contain a wealth of user information. Big data research on social media can also
complement standard surveillance approaches and provide decision-makers with actionable information. These data
can be analyzed using Natural Language Processing (NLP) and Machine Learning (ML) techniques to detect
signs of mental disorders that need attention, such as depression and suicide ideation. This article presents
the recent trends and tools used in this field, the different means of data collection, and the current
applications of ML and NLP in the surveillance of public mental health. We highlight the best practices and
the challenges. Furthermore, we discuss the current gaps that need to be addressed.
CCS Concepts: • General and reference → Surveys and overviews; • Computing methodologies →
Natural language processing; Machine learning algorithms;
Additional Key Words and Phrases: Mental health, social media
ACM Reference format:
Ruba Skaik and Diana Inkpen. 2020. Using Social Media for Mental Health Surveillance: A Review. ACM
Comput. Surv. 53, 6, Article 129 (December 2020), 31 pages.
https://doi.org/10.1145/3422824
1 INTRODUCTION
Public health surveillance is “the ongoing, systematic collection, analysis, interpretation, and dissemination
of data regarding a health-related event for use in public health action to reduce morbidity
and mortality and to improve health” [German et al. 2001]. A critical requirement for a
surveillance system is to obtain reliable information and evidence in a timely manner. In this article,
we review the research studies that focus on using social media text for mental health surveillance.
The importance of mental health surveillance, focusing on depression and suicide, will be
addressed in the following subsections.
This research is funded by Natural Sciences and Engineering Research Council of Canada (NSERC).
Authors’ addresses: R. Skaik and D. Inkpen, University of Ottawa, 800 King Edward Avenue, Ottawa, ON, K1N 6N5, Canada;
emails: {rskai034, diana.inkpen}@uottawa.ca.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee. Request permissions from [email protected].
© 2020 Association for Computing Machinery.
0360-0300/2020/12-ART129 $15.00
https://doi.org/10.1145/3422824
ACM Computing Surveys, Vol. 53, No. 6, Article 129. Publication date: December 2020.
that 792M people worldwide had a mental disorder in 2017 [Ritchie and Roser 2018]. Mental disorders
are also proving to be an economic burden for governments, costing them billions of dollars yearly.
Depression is one of the most widely recognized mental disorders in the world. It is the main
contributor to non-fatal health loss, accounting for a large number of disability-adjusted life years
globally [Public Health Agency of Canada 2015], and one of the leading causes of suicide. Early
recognition of the signs of depression and application of the proper treatment can help those
affected and ease their pain.
Although suicide is not a mental illness, most attempts are ascribed to mental disorders, most
commonly depression. Suicide has become one of the leading causes of death worldwide. It is a
serious public health problem, and a prompt, appropriate response can mitigate it. For instance, the rate
of youth suicide in Canada is the third highest in the developed world, and suicide was the second-leading
cause of death among adolescents in the United States during 2017 [NIMH 2019]. According to
Statistics Canada, 4,405 people took their own lives in Canada in 2015, a rate of
11.5 per 100,000 people. Suicide has serious implications for the well-being of families and
communities, both physically and emotionally. Early detection of suicide ideation can prevent
many cases of suicide and help identify those who need immediate counseling. This is one
of the crucial steps in maintaining global mental health.
Table 1. Number of Monthly Active Users (Millions) Using Social Media Platforms 2018
(1) Data collection techniques used for predicting mental illness in text-based social media
platforms such as Twitter, Reddit, and Sina Weibo.
(2) Features used in training ML models.
(3) State-of-the-art methods in population-level mental illness prediction.
(4) Study limitations that need to be identified to facilitate mental health surveillance and
provide more accurate tools to concerned parties to enhance public mental health.
This article is structured as follows: Section 2 describes the methodology used for collecting
the relevant articles, the eligibility criteria, and the compilation process for this
review. Section 3 presents an overview of the various techniques used for data collection
and annotation. Section 4 summarizes the ML methods used for identifying signs of and predicting
mental health issues within social media. Section 5 presents techniques for using social
media data to predict mental health issues at the population level, focusing on depression and suicide
ideation. Section 6 addresses the challenges and limitations in this field. The last section concludes
the article.
Fig. 1. Number of publications included in this review (a) per year (b) per topic.
Elsevier, IEEE Xplore databases, and Google Scholar for articles where any of their fields matched the
following Boolean search string: “mental health” AND “machine learning” AND “social media”
AND “natural language processing” AND (population OR surveillance); for Google Scholar, we also
excluded any article containing the terms “image” or “speech”. Searches done through the online databases
were limited to publications in English. Twenty-four additional articles were identified using
the snowball process, resulting in a total of 838 articles. Based on title and abstract screening,
711 articles were excluded using the eligibility criteria shown in Table 2, resulting in 165 papers
for full-text screening. Studies were excluded if they used data other than social media text, such
as speech, images, or other multimedia data1 ; if they relied on wearable-device or mobile data; if they
used clinical data; or if they investigated non-mental-health issues such as obesity, diabetes, or the flu.
For the population-level section, mental disorders other than depression and suicide ideation were
excluded.
Phase II started on August 9, 2020, to include papers published in 2019 and 2020.
The same process was repeated, resulting in a total of 381 related papers identified using
the same resources. After screening titles and abstracts, 82 articles were eligible for full-text screening.
After the full-text screening, another 50 studies did not meet the inclusion criteria and were excluded
from the 82 initially eligible ones, resulting in an additional 32 studies added in Phase II.
Figure 1 shows the number of publications included in our review by publication year and by
the topics covered in the publications.
Fig. 2. PRISMA flow diagram of the study selection process for using NLP and ML to predict depression and
suicide ideation at population level.
The full list of all papers is provided in the Appendix. Some of these papers did not specifically focus on
population-level analysis but still highlighted useful methods that could be applied; they are therefore
included in the data collection techniques and general methods sections. Ultimately, 110
publications were included in the current review, 25 of which specifically cover population-level
mental health classification techniques. Fifteen of the 25 publications address depression
and 10 address suicide ideation, as shown in Figure 2.
3 DATA COLLECTION
The first step in addressing mental illness is obtaining reliable information and evidence [Paul
and Dredze 2017]. Having a comprehensive and accurate dataset is a critical success factor for
applying ML algorithms. A gold standard is a dataset against which ML models are compared
[Calvo et al. 2017]. Such datasets may contain only a test set on which the performance of various
classifiers can be compared, but more often they include a training set as well. The classifiers can
be trained on the latter, though any additional labeled or unlabeled data can be used for training
if desired.
There are several methods for gathering information on social media relevant to users’ mental
health, including self-reporting (direct or indirect), inference of mental illness signs, manual
annotation, and external statistics. In this section, we summarize the data collection techniques
and annotation procedures and objectives.
Table 3. Selected Studies That Used Screening Surveys to Collect Data about Depression and
Suicide within the Population. DD: Major Depressive Disorder; PT: Posttraumatic Stress Disorder (PTSD);
PD: Postpartum Depression; SD: Suicide
in users’ profile descriptions. The authors developed a probabilistic topic model on user tweets
with partial supervision to monitor clinical depression symptoms. They achieved results competitive
with a fully supervised approach, with an accuracy of 68% for predicting the answers to all the
questions over a time interval. Their semi-supervised topic modeling approach, called ssToT, achieved
an enhanced F1-score specifically for the following symptoms: decreased satisfaction, feeling down,
sleep disturbance, energy loss, and appetite change. A similar approach was
taken by Karmen et al. [2015], who translated questionnaires and synonyms into a depression
lexicon and used it to assign a post-level cumulative depression rating.
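A post-level cumulative rating of this kind can be sketched as follows; the lexicon entries and weights here are purely illustrative and are not the actual lexicon of Karmen et al. [2015]:

```python
import re

# Illustrative lexicon only; the real lexicon was derived from
# translated questionnaires and their synonyms.
DEPRESSION_LEXICON = {
    "hopeless": 2.0,
    "worthless": 2.0,
    "insomnia": 1.5,
    "tired": 1.0,
    "sad": 1.0,
}

def depression_rating(post: str) -> float:
    """Return the cumulative lexicon weight of a single post."""
    tokens = re.findall(r"[a-z']+", post.lower())
    return sum(DEPRESSION_LEXICON.get(tok, 0.0) for tok in tokens)
```

A post mentioning several lexicon terms accumulates their weights, so symptom-heavy posts score higher; a real system would also normalize by post length.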
2 https://www.statista.com/statistics/260710/number-of-social-network-users-in-canada.
media data can be extracted using the available Application Programming Interfaces (APIs) of the
social media platforms. There are different ways of collecting and processing social media data,
determined by the types of illnesses being researched and the platforms used as source
material. In the following subsections, we briefly summarize some of the most popular data
sources in the social media research community.
3.3.1 Twitter. Twitter is the most attractive social network service among researchers and
currently ranks as one of the world’s leading social networks by number of active users.3 Every update
a user posts to their followers on Twitter is called a tweet. Tweets are mostly accessible to the
public and can be obtained and analyzed, unless flagged by the user as “private”. Tweets can be
collected using the Twitter API by searching for specific keywords, hashtags, or any defined
query, and the search can be limited to particular locations and time periods.
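As a sketch of such keyword-based collection, the helper below (a hypothetical name, not part of any library) composes a Twitter API v2 search query that could be passed to a client such as Tweepy; the string uses real v2 operators (OR, lang:, -is:retweet), while time and location constraints would be supplied as separate request parameters:

```python
# Hypothetical helper: builds a Twitter API v2 search query string
# from keywords and hashtags, as described above.
def build_twitter_query(keywords, hashtags=(), lang="en", no_retweets=True):
    terms = list(keywords) + [f"#{h}" for h in hashtags]
    query = "(" + " OR ".join(terms) + f") lang:{lang}"
    if no_retweets:
        query += " -is:retweet"   # drop retweets to avoid duplicates
    return query

query = build_twitter_query(['"feel hopeless"', "depressed"],
                            hashtags=["MyDepressionLooksLike"])
```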
Other researchers look for focal users and build the dataset incrementally based on social
connections [Wang et al. 2017a; Zhao et al. 2018]. Several depressive symptoms derived from mental
disorder manuals, such as the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition
(DSM-5), can be identified by applying machine learning algorithms to Twitter data [Mowery et al.
2015; Prieto et al. 2014].
Some researchers obtained tweets with self-reported diagnoses, filtered via regular expressions
(RegEx) to capture phrases such as “I was diagnosed with ... Condition”; the collected tweets are then manually
labeled by human annotators to determine whether the expression indicates the mentioned
mental health diagnosis [Chen et al. 2018b; Coppersmith et al. 2014a, 2015a; Li et al.
2017; Mowery et al. 2017b; Qntfy et al. 2017]. For suicide-related tweets, more tailored expressions
need to be considered, for example .+(\took | \take).+\own.+\life.+ [Burnap et al. 2017].
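A minimal sketch of such a self-diagnosis pattern might look like the following; the exact expressions used in the cited studies differ, and matches still require manual verification by annotators:

```python
import re

# Sketch of a self-reported diagnosis pattern; matches are candidates
# only and must be verified by human annotators.
DIAGNOSIS_RE = re.compile(
    r"\bI (?:was|am|have been) diagnosed with (?P<condition>[\w -]+)",
    re.IGNORECASE,
)

def extract_self_diagnosis(tweet: str):
    """Return the stated condition, or None when no pattern matches."""
    m = DIAGNOSIS_RE.search(tweet)
    return m.group("condition").strip().lower() if m else None
```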
Other researchers extracted candidate tweets using keywords related to mental disorders,
for example, “depression” or “suicide”, or terms that may indicate mental disorder conditions
or symptoms, such as distress, dejected, gloomy, cheerless, blue, empty, sad, feeling low, hate myself,
kill myself, don’t want to live anymore, ashamed of myself, and so on [Kale 2015; Cavazos-Rehg
et al. 2016; Varathan and Talib 2014].
There are also other ways to collect tweets that may include signs of mental disorders, like using
indicative hashtags such as #MyDepressionLooksLike [Lachmar et al. 2017], #WhatYouDontSee4 ,
or #KMS.
In addition to user-generated text, Twitter includes user data and social metadata, such as
geographical information, the date and time of the tweet, and user networking and interaction
information. Thus, state-of-the-art applications, including Twitris5 , SOCIALmetricsTM , and OsoMe6 ,
have been built to analyze massive real-time social media data. Twitris is a linguistic social web
network that utilizes user-generated social media content to understand social perspectives on
real-world events, whereas SOCIALmetricsTM is a system that processes crawled Twitter data
using NLP and text mining tools.
Razak et al. [2020] presented Tweep, a rule-based system that uses the VADER (Valence Aware
Dictionary and sEntiment Reasoner) sentiment analysis tool7 and two machine learning algorithms,
Naive Bayes (NB) and a Convolutional Neural Network (CNN), to analyze tweet sentiment for logged-in
users and their Twitter followers. Table 4 presents a summary of recent research done using
Twitter as a data source.
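To illustrate the kind of rule-based scoring VADER performs, the toy scorer below (hypothetical weights, not the real VADER lexicon, which holds thousands of rated tokens) sums word valences, flips a valence that follows a negation word, and applies a VADER-style normalization to [-1, 1]:

```python
import math

# Toy valence lexicon; purely illustrative values.
VALENCE = {"good": 1.9, "happy": 2.7, "sad": -2.1, "terrible": -2.6}
NEGATIONS = {"not", "never", "no"}

def toy_sentiment(text: str) -> float:
    """Sum word valences, flipping any valence preceded by a negation,
    then squash the total into [-1, 1] (VADER-style normalization)."""
    tokens = text.lower().split()
    total = 0.0
    for i, tok in enumerate(tokens):
        valence = VALENCE.get(tok, 0.0)
        if valence and i > 0 and tokens[i - 1] in NEGATIONS:
            valence = -valence
        total += valence
    return total / math.sqrt(total * total + 15)
```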
3 https://www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/.
4 https://www.buzzfeed.com/annaborges/what-you-dont-know-campaign.
5 http://twitris.knoesis.org.
6 https://osome....
7 https://github.com/cjhutto/vaderSentiment.
Table 4. References for ML Algorithms Applied to Data Collected from the Twitter Platform. DD: Major
Depressive Disorder; PT: Posttraumatic Stress Disorder (PTSD); PD: Postpartum Depression; LS: Life
Satisfaction; MI: Mental Illness; SI: Suicide Ideation
3.3.2 Reddit. Reddit is an open platform that allows users to publish, comment on, or vote
on submissions. During 2019, Reddit had over 430M monthly active users, who collectively generated
199M posts, 1.7B comments, and 32B up-votes across more than 130K active communities8 .
8 https://www.redditinc.com/.
Table 5. References for ML Training Algorithms Applied over Data Collected from Reddit Platform
Posts are grouped by areas of interest (“subreddits”) covering a variety of topics, such as gaming, sport,
news, and many others. Every subreddit has its own rules, administrators, and subscribers. Some
of these subreddits address mental health issues, including anxiety, depression, and suicide;
subscribers may share personal experiences, seek help, and offer support to others. In March 2020,
there were nearly 190K subscribers to the “SuicideWatch” subreddit and nearly 600K subscribers to
the “Depression” subreddit. Yates et al. [2017] created an experimental dataset, the “Reddit Self-reported
Depression Diagnosis (RSDD) dataset,” that contains around 9K diagnosed users and over 100K control
users. Similarly, Losada and Crestani [2016] collected 49,580 posts from depressed users and
481,873 posts from control users. Thorstad and Wolff [2019] collected 56,009 posts from each of the following
clinical subreddits: r/ADHD, r/Anxiety, r/Bipolar, and r/Depression, and used unigram word vectors
to train a logistic regression model that classifies a post into one of the four mentioned mental
illnesses. The depression classification model achieved an F1-score of 0.74. They also concluded
that the language used in a non-clinical context is predictive of which clinical subreddit the
user will later post to. Table 5 presents a summary of recent research done using the Reddit
website as a data source.
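The unigram pipeline of Thorstad and Wolff [2019] can be sketched with scikit-learn on toy data; the posts and labels below are invented for illustration, and scikit-learn is assumed to be available:

```python
# Bag-of-words counts feed a logistic regression that assigns a post
# to one of four clinical subreddits (toy data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

posts = [
    "cant focus on anything, so distracted today",   # r/ADHD
    "constant worry and panic before work",          # r/Anxiety
    "manic episode then crashing for weeks",         # r/Bipolar
    "feel hopeless and empty every day",             # r/Depression
]
labels = ["ADHD", "Anxiety", "Bipolar", "Depression"]

model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
model.fit(posts, labels)
pred = model.predict(["so hopeless and empty lately"])[0]
```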
3.3.3 Sina Weibo. Sina Weibo is China’s largest microblogging site, and many studies have been
conducted using it as a data source. Wang et al. [2018] randomly crawled 1M users (394M
posts) and used a keyword-based method to pinpoint users at risk of suicide; three
mental health researchers then manually labeled the at-risk users. They identified 114 users
(60,839 posts) with suicide ideation and used linguistic analysis to explore their behavioral and
demographic characteristics. Lv et al. [2015] used Sina Weibo to build a suicide dictionary. They found
that dictionary-based recognition correlates well with expert ratings (r = 0.507) in detecting
suicidal expression and evaluating the level of suicide risk. Similarly, Hao et al. [2013] applied
Support Vector Machines (SVM) and Neural Networks (NN) to psychological measurement data
(SCL-90-R) along with Sina Weibo blogs to identify users with mental health issues. At the post level,
Gao et al. [2017] extracted new content- and emotion-based features by examining the semantic
relationships between the words in a labeled dataset of 9,123 microblogs with suicidal ideation.
3.4.1 CLPsych Shared Task. In 2014, the workshop on Computational Linguistics and Clinical
Psychology (CLPsych) began as a collaboration between clinical psychologists and computer
scientists and has developed links across the research community. The workshop series aims to expedite
the development of language technology for mental healthcare, with an emphasis on using social
media to predict population mental health. Shared tasks provide gold standards, because the same
dataset is used to test and compare various solutions to the same problem under study.
At the 2015 CLPsych workshop, participants were asked to determine whether a user has PTSD,
depression, or neither, based on self-reported Twitter diagnoses [Coppersmith et al. 2014a].
The dataset comprises 1,146 users: 246 users with PTSD, 327 depressed users, and 573 control
users matched to the age and gender of the former two groups. For all three tasks, the system of
Resnik et al. [2015a] performed best, obtaining an average precision above 0.80. The maximum
precision was achieved for the task distinguishing PTSD from control users, by training an SVM
classifier with a linear kernel on topic modeling and lexical TF-IDF features. Later, Orabi et al.
[2018] used optimized word embeddings with a deep learning model and achieved a precision of
87.4% and an F1-score of 86.97%.
The CLPsych 2016 and 2017 shared tasks invited participants to automatically triage posts
collected from the ReachOut.com forum as green, amber, red, or crisis, to help the forum moderators
identify and address pertinent cases as soon as possible. A total of 15 teams participated in
the task, with 60 different submissions. Initially, 947 annotated posts were given to each team to
develop and train their models. The best-performing system used an ensemble classification
approach with TF-IDF-weighted unigrams and post embeddings and achieved an F1-score of 0.42
[Kim et al. 2016]. Subsequently, Cohan et al. [2017] used the same dataset and applied an ensemble of
lexical, LIWC, emotion, contextual, and topic modeling features with an SVM model, reaching a
better F1-score of 0.51.
In 2019, the shared task consisted of three tasks. Task A assessed the risk of users
who posted in the SuicideWatch subreddit at one of four levels: no risk,
low, moderate, or high. Task B assessed risk using all the subreddits. Task C
screened users for probabilistic risk using only non-mental-health-related subreddits. The dataset
consists of 1,242 users (including both positive examples and controls). Mohammadi et al. [2019],
participating under the team name CLaC, obtained the best macro-F1 score for Task A (0.533) by
adding SVM-predicted class probabilities at the end of a pipeline built on top of a set of CNN, Bi-
LSTM, Bi-RNN, and Bi-GRU neural networks. For Task B, Matero et al. [2019] achieved the best F1-
score (0.504) using BERT features extracted separately from SuicideWatch and non-SuicideWatch
posts. For Task C, a stacked parallel CNN with LIWC features and a universal sentence encoder [Cer et al.
2018] produced the best unofficial F1-score (0.278), compared to 0.268 for the CLaC primary
system. Finally, Howard et al. [2020] used lexicon analysis, LIWC, Empath, word counts, VADER,
and DeepMoji for emotional feature extraction to train a model on the CLPsych 2017 dataset and
test it on the CLPsych 2019 expert-labeled dataset, reaching a maximum F1-score of 0.616.
3.4.2 eRisk task. The 2017 CLEF eRisk pilot project extends the CLEF initiatives that
have been operating since 2000, enabling the systematic evaluation of information systems, mainly
through experiments on shared tasks. The primary purpose of CLEF eRisk 2017 and 2018 (Task 1) is
to address issues related to assessment criteria, effectiveness indicators, and other mechanisms
for the early detection of depression [Coello-Guilarte et al. 2019; Losada et al. 2017, 2018]. The shared task
focuses on automatically detecting the risk of depression from a user’s Reddit posts as early as
possible.
For eRisk 2017, the training set was manually annotated by experts and contains 486 users
(83 depressed users with 30,851 posts and 403 non-depressed users with 264,172 posts). The test set holds
401 users (52 depressed users with 18,706 posts and 349 non-depressed users with 217,665 posts). A total
of 30 systems were submitted by eight teams in the pilot task [Almeida et al. 2017]. The highest
precision (0.69) was achieved by the Biomedical Computer Science Group from the University
of Applied Sciences and Arts Dortmund (FHDO), while the highest recall (0.79) was achieved
by the LIDIC Research Group from the Universidad Nacional de San Luis, who examined multiple
document representations, such as Bag of Words (BoW), Concise Semantic Analysis, character
3-grams, and LIWC, with Random Forests, NB, and decision tree machine learning algorithms.
The evaluation measures include an early risk detection error (ERDE) measure along with standard
classification measures such as F1, precision, and recall. The ERDE measure rewards
correct classification based on fewer user submissions and penalizes late decisions.
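ERDE can be sketched as follows; the cost constants below are illustrative defaults, not necessarily the official task parameters:

```python
import math

# Sketch of ERDE: a true positive is discounted by a latency cost that
# grows with the number k of user writings read before the decision.
def erde(decision, truth, k, o=5, c_fp=0.1296, c_fn=1.0, c_tp=1.0):
    if decision and not truth:                 # false positive
        return c_fp
    if not decision and truth:                 # false negative
        return c_fn
    if decision and truth:                     # true positive, delayed
        latency_cost = 1 - 1 / (1 + math.exp(k - o))
        return latency_cost * c_tp
    return 0.0                                 # true negative
```

The sigmoid latency cost stays near 0 when the decision is made after only a few writings (k well below the deadline parameter o) and approaches the full cost c_tp when the decision comes late.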
The 2018 eRisk task continued with the goal of early detection of signs of depression and added
a new task on early detection of anorexia indicators. The 2017 dataset was used as the training set,
and an additional 820 users with more than 500K posts were added for testing. There were 45
contributions from 11 teams. No significant improvement was observed, and most participants ignored the
tradeoff between early detection and accuracy. Subsequently, Leiva and Freire [2017] used TF-IDF
features to compare different ML algorithms on the same dataset and concluded that Random
Forest achieves the highest precision, while K-Nearest Neighbors obtains the highest recall;
combining all of them in a voting algorithm improves the F1
measure.
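A toy version of such a voting combination over TF-IDF features might look like the following; the data, labels, and hyperparameters are invented, and scikit-learn is assumed to be available:

```python
# Hard-voting ensemble of Random Forest and k-NN over TF-IDF features.
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

texts = ["i feel hopeless and alone", "great day with friends",
         "cant stop crying lately", "excited about the new job"]
y = [1, 0, 1, 0]   # 1 = depressed, 0 = control (toy labels)

ensemble = make_pipeline(
    TfidfVectorizer(),
    VotingClassifier([
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=1)),
    ], voting="hard"),
)
ensemble.fit(texts, y)
pred = ensemble.predict(["feeling hopeless again"])[0]
```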
There were three tasks for eRisk 2019. In addition to an ambitious new challenge of automatically
completing a depression questionnaire from a user’s social media interactions, the first task continued
in the same direction as previous challenges for early detection of signs of depression, and
a similar task was introduced for unsupervised self-harm detection. The findings indicate that it is
uncertain whether early signs of self-harm can be identified from social media users’ posts before
they join a self-harm community [Losada et al. 2019]. Recently, eRisk 2020, in its fourth year,
continued with two tasks: self-harm detection and measuring the severity of depressive symptoms.10
3.4.3 Crisis Text Line. The Crisis Text Line11 , supported by Kids Help Phone, is a free 24/7 crisis-support
texting hotline that assists people with mental health issues through texting [Dinakar et al.
2014]. As of October 2019, Crisis Text Line had processed more than 100M text messages. The
data are used to study different mental illness trends across the US, including but not limited
to depression, self-harm, and suicidal ideation. The results of the analysis are displayed publicly
on CrisisTrends.org. Althoff et al. [2016] experimented with roughly 15K counselor messages to
evaluate the linguistic aspects of effective therapy. Unigram and bigram features in a regression
model with L1 regularization and 10-fold cross-validation yielded the best-performing model for
predicting the effectiveness of a patient-counselor conversation, with an accuracy of 0.687 and AUC
10 https://early.irlab.org/.
11 The labeled dataset used to produce the word cloud is from Shen et al. [2017].
of 0.716. To gain access to an anonymized version of the dataset, researchers must apply to and
be accepted into the Research Fellows program sponsored by Crisis Text Line.
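A modeling setup of this kind can be sketched with scikit-learn on invented toy data; the messages, labels, and fold behavior below are illustrative only:

```python
# Unigram and bigram counts with an L1-regularized logistic
# regression, scored by 10-fold cross-validation (toy data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

msgs = ["you are not alone", "tell me more about that",
        "have you considered a plan", "i hear how hard this is",
        "lets find someone to call", "that sounds overwhelming"] * 5
effective = [1, 1, 0, 1, 0, 1] * 5   # toy conversation-effectiveness labels

clf = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),                   # unigrams + bigrams
    LogisticRegression(penalty="l1", solver="liblinear"),  # L1 regularization
)
scores = cross_val_score(clf, msgs, effective, cv=10)      # 10-fold CV
```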
4 METHODS
Social media data can provide a valuable source of information for public health research.
Understanding the data and the domain of discourse is vital for building a good model. Accordingly,
detecting mental disorders from social media posts requires a thorough understanding of the key
predictors of the illness, called features in ML terminology. Many researchers have tried to determine
the contributing features using different NLP approaches in order to build accurate predictive models.
Most predictive models focus on determining the features that contribute most to the problem
under analysis in order to design good classifiers. Selecting the set of features that reduces the
dimensionality of the dataset is influential to the learning process. Because of the heterogeneity of
social media content, a variety of features can be derived, from textual and linguistic features
to user-based and metadata-related features [Wijeratne et al. 2017]. As mentioned earlier, only
a subset of these features has the requisite ability to distinguish classes for specific applications
and contexts, in particular to predict user types and behaviors [Kursuncu et al. 2018]. Feature
engineering, or extraction, aims to reduce the number of features extracted from the dataset under
study by choosing the most discriminative ones. Reducing data dimensionality via feature
extraction helps avoid the curse of dimensionality [Cummins et al. 2015]. The main goal
is to identify strongly relevant and non-redundant features [Li et al. 2017], which is a challenging
task. Deep learning frameworks help capture relevant features during the learning process
without exhaustive feature engineering [Orabi et al. 2018].
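As one concrete example of discriminative feature selection (toy data; scikit-learn assumed to be available), a chi-squared filter can keep only the top-k most class-associated terms:

```python
# Reduce a bag-of-words matrix to its k most discriminative terms.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

docs = ["sad empty hopeless", "happy walk sunshine",
        "hopeless tired empty", "sunshine friends happy"]
y = [1, 0, 1, 0]   # toy labels: 1 = depressed, 0 = control

X = CountVectorizer().fit_transform(docs)              # 4 docs x 8 terms
selected = SelectKBest(chi2, k=3).fit_transform(X, y)  # keep top 3 terms
```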
Table 6 shows a summary of the features that have been used in various studies of mental health
signs. Several efforts have attempted to predict mental illness from social media content
at the post, user, or population level. In this section, these analysis levels
are explained in more detail.
Table 6. List of Features Used in Different ML Algorithms for Mental Health Classification
4.3.1 Bottom-up Approach. Using this approach, the researcher starts with individual models,
then generalizes to make an inference about the population [Coppersmith et al. 2014a, 2015b,
Fig. 3. Word cloud overview of emerging terms from users who did not express depression (left) and users
who self-reported depression (right).
2014b; De Choudhury et al. 2017b; Yazdavar et al. 2017]. Figure 4 illustrates the steps to be
followed to reach a population-level inference. Importantly, the sample under study has
a significant impact on the findings, because often only majority classes are represented, excluding
crucial minority classes. Hence, researchers have utilized techniques typically used in conventional
surveys. To obtain a population-representative sample, researchers have applied different techniques to
rectify representation errors, such as probability sampling techniques, including stratified sampling
[De Choudhury et al. 2016; Wang et al. 2019b], simple random sampling [Cheng et al. 2017;
Liu et al. 2017; Calderon-Vilca et al. 2017; Shing et al. 2018], cluster sampling [Schwartz et al.
2013], and multi-stage sampling [De Choudhury et al. 2013c]. In addition, researchers have used
non-probability sampling techniques such as snowball sampling to deal with minority classes [Balani
and De Choudhury 2015; Tsugawa et al. 2015; Wee et al. 2017; Wolohan et al. 2018; Zhao et al.
2018].
4.3.2 Top-down Approach. In this approach, studies use the aggregated data of the population
and then make inferences from that data [Coppersmith et al. 2017; Culotta 2014; Schwartz et al. 2013;
Jaques et al. 2016; Gruebner et al. 2017a; Nguyen et al. 2017a; Giorgi et al. 2018]. This type of analysis
should be applied carefully; otherwise, the differences among subgroups can be masked and dissolved
during the aggregation process, as shown in Figure 5.
Machine learning algorithms play a vital role in modeling relationships between features [Hahn
et al. 2017]. It is well known that there is no universal algorithm for classification. It is increasingly
important to understand the diverse cultures and societies contributing to social media. Adding
demographic attributes, including age, gender, and personality type, affects the prediction of health
status [Preot et al. 2015]. Religion, ethnicity, marital status, and socioeconomic status are expected to add
value to the research. Previous research on users’ posts and metadata has shown that
demographic information can be extracted by applying different ML algorithms with an accuracy
ranging from 60% to 90% [Sinnenberg et al. 2017], or from profile information
such as username, screen name, biography, and profile image, with a macro-F1 of 0.9 for
gender and 0.5 for age [Wang et al. 2019a].
13 http://liwc.wpengine.com/.
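The macro-F1 scores quoted above average the per-class F1 with equal weight, so minority classes count as much as majority ones. A minimal stdlib sketch (the gender labels are invented for illustration):

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute F1 per class, then average with
    equal weight so rare classes are not drowned out."""
    labels = set(y_true) | set(y_pred)
    f1s = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Hypothetical gender predictions: perfect on 'F', one miss on 'M'.
y_true = ["F", "F", "M", "M", "M"]
y_pred = ["F", "F", "M", "M", "F"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.8
```

Accuracy on the same toy data would be 80% as well, but on imbalanced demographic data the two metrics diverge, which is why macro-F1 is reported for attributes such as age.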
2014a; Karmen et al. 2015; Resnik et al. 2015b]. De Choudhury et al. [2013a] developed a proba-
bilistic model to detect the behavioral changes associated with the onset of depression, whereas
Mowery et al. [2016] used lexical and emotional features to identify depressive symptoms such
as anhedonia (a reduced ability to experience pleasure), insomnia, and loss of energy. Nguyen
et al. [2014] achieved an accuracy of 93% using a logistic regression model to classify blog posts
as belonging to depression or control sets. Shen et al. [2017] constructed a multi-modal depres-
sive dictionary learning model (MDL) to learn the latent features of depressed and non-depressed
users on Twitter from a joint sparse representation of emotion, visual, social network, user profile,
topic-level, and domain-specific features, and achieved an F1-score of 0.85. To predict user stress
levels, Lin et al. [2016] used a neural network-based architecture with word embeddings (WE)
learned from a Sina Weibo dataset along with stress-based keywords; the word embeddings
proved effective at capturing semantic similarity between words. Nguyen et al. [2019] represented
each county as a graph of interactions between LIWC features, trained several graph neural
networks (a graph convolutional network (GCN), a graph attention network (GAT), a hybrid
GAT-GCN network, and a graph isomorphism network (GIN)) to learn a population health
representation, and finally used LR to estimate the health indices of 3,221 counties.
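Several of the surveyed models feed TF-IDF weighted lexical features to a classifier such as LR or SVM. A minimal sketch of the weighting itself, with invented example posts (not data from any cited study):

```python
import math
from collections import Counter

def tfidf(docs):
    """Term frequency-inverse document frequency: terms that are
    frequent in a post but rare across posts get high weights."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()                      # document frequency per term
    for tokens in tokenized:
        df.update(set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({w: (c / len(tokens)) * math.log(n / df[w])
                        for w, c in tf.items()})
    return vectors

posts = ["i feel hopeless and tired",
         "great day at the park",
         "tired but happy today"]
vecs = tfidf(posts)
# 'hopeless' occurs in one of three posts, 'tired' in two, so the
# rarer (and here more clinically suggestive) term is weighted higher.
print(vecs[0]["hopeless"] > vecs[0]["tired"])  # True
```

The resulting sparse vectors are what a downstream logistic regression or SVM would consume to separate depression posts from control posts.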
Table 7 summarizes the research findings related to predicting depression from Twitter text.
Table 7. Summary of the Features Used for Predictive Models for Depression with the Best Results
Achieved in Terms of Accuracy (AC), F1-score, or Recall (RC)
analyzing existing social media data [Paul et al. 2016]. Social media analysis may help establish
links between diseases and their symptoms or causes. Still, many challenges and open questions
must be addressed to exploit the opportunities of employing social media data to predict mental
health issues within the population. This section briefly discusses some of the limitations and
challenges that arise in this endeavor and offers some recommendations to overcome these
barriers.
Table 8. Summary of the Features Used on Social Media Text for Predictive Models for
Suicide Ideation with the Best Results Achieved in Terms of Accuracy (AC),
F1-score, Recall (RC), or Area Under Curve (AUC)

Ref.                       Positive Tweets/Users   Features                          Model   Metric   Score
Abboute et al. [2014]      623/3,263               WEKA                              NB      AC       63%
Huang et al. [2015]        664/-                   LDA                               SVM     AC       96%
O’Dea et al. [2015]        1,820/14,701            TF-IDF                            SVM     AC       80%
Braithwaite et al. [2016]  17/-                    LIWC                              DT      AC       92%
Coppersmith et al. [2016]  554/-                   Sentiment                         LR      F1       0.53
Burnap et al. [2017]       425/-                   TF-IDF                            SVM     AC       66%
Cheng et al. [2017]        976/-                   SC-LIWC                           SVM     AUC      0.48
Aladag et al. [2018]       -/785                   TF-IDF                            SVM     F1       0.92
Coppersmith et al. [2018]  418/197,615             WE                                LSTM    RC       0.85
Desmet and Hoste [2018]    257/-                   Gallop & BoW                      SVM     F1       0.75
Roy et al. [2020]          283/512,526             NN for psychological constructs   RF      AUC      0.88
perform well across different datasets. Different evaluation metrics can be adopted, as detailed in
Sokolova and Lapalme [2009]. Therefore, having the means to evaluate the model’s output and its
ability to generalize is another success factor. Some population-level systems are evaluated using
the Pearson correlation coefficient between the ML model’s predictions and official national
statistics. It is important to note that national statistics may not be sufficiently accurate, as mental
illnesses are significantly under-reported, especially in developing and under-developed countries
where data are more difficult to obtain and mental illnesses are less recognized. Moreover, some
critical clusters, such as veterans, indigenous people, and the elderly, may not be well represented
in the data because they do not use social media regularly.
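The Pearson-based evaluation above can be sketched in a few lines. The county-level rates below are invented for illustration; in practice the two vectors would be model predictions and official per-region statistics.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired series,
    e.g., model-predicted vs. officially reported regional rates."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical depression rates for five counties.
predicted = [0.12, 0.18, 0.25, 0.09, 0.21]
official  = [0.10, 0.20, 0.24, 0.08, 0.19]
print(round(pearson_r(predicted, official), 3))  # ~0.97, a strong correlation
```

A high r only indicates agreement with the official figures; given the under-reporting noted above, it should be read as a lower-bound sanity check rather than ground-truth accuracy.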
6.5 Ethics
There are ethical challenges in using social media as a source for NLP and ML. Conway [2014]
provided a taxonomy of ethical principles for the use of Twitter in public health research, based
on a review of the literature. These principles can be applied across all social media platforms.
Researchers face important challenges in ensuring the privacy of social media data [Conway
et al. 2016; Gruebner et al. 2017b]. Although some of the data are publicly available, the problem
becomes more complicated when personal attributes can be predicted and the identity of particular
users can be revealed. Ethics in public health research is discussed in detail in, e.g., Mckee [2013];
Mikal et al. [2016]; Denecke et al. [2015]; Golder et al. [2017]; and Valdez and Keim-Malpass
[2019]. Based on their experience in the domain, Benton et al. [2017a]
developed practical protocols to guide NLP research using social media data from a healthcare
perspective. Their guidelines recommend that researchers acquire ethical approval or an
exemption from their Institutional Review Board (IRB). Researchers also need to obtain informed
consent when possible, and to protect and anonymize sensitive data used in presentations or
analysis. In addition, they need to be vigilant when linking data across sites is necessary. Finally,
when sharing their data, they need to ensure that other researchers respect the same ethical and
privacy concerns.
In general, there is agreement that researchers can use publicly available data for health mon-
itoring, but preserving the confidentiality of social media users is a must. Predicting the clusters
vulnerable to mental disorders is one step in preventive medicine. After identifying such groups,
practical steps need to be taken by the responsible parties collaborating on disease control,
treatment, and prevention. These steps may require informed consent and acceptance of such
interventions in the target population, or government programs to address specific clusters and
provide the appropriate help.
7 CONCLUSION
Despite the above-mentioned methodological and technical difficulties accompanying the use of
social media data in predictive analysis, social media has proven to be a valuable source for detect-
ing the characteristics of depressed individuals or those who are vulnerable to suicidal thoughts.
With the progressive rise in the number of depressed users due to the COVID-19 pandemic and an
expected increase in the number of suicides [Sher 2020], this field becomes even more promising
in providing decision-makers with rapid tools to mitigate risk. This review addresses the ideas
presented by different researchers in this emerging field and provides a summary of data collection
methods, classification techniques, and evaluation results. It highlights the importance of applying
proper sampling methods in population-level analysis to include minority communities, together
with correct handling of imbalanced data. It also highlights the potential of applying deep neural
network models to textual features, rather than particular user-centric features (such as demo-
graphic and social features) or post-centric features (such as linguistic and behavioral features),
to determine the most prominent features for a better predictive model. A model’s success should
be measured by its accuracy in identifying at-risk clusters, its ability to generalize to unseen data,
its early detection of signs of mental illness in users’ posts, and its timely prediction in terms of
performance.
APPENDIX
B THE MANUSCRIPTS INCLUDED IN THE REVIEW
Data Collection
method References
Screening Surveys Almouzini et al. [2019]; De Choudhury et al. [2013a, 2013d, 2014]; Oh et al.
[2017]; Tsugawa et al. [2015]; Zhang et al. [2015]; Rakesh [2017]; Reece et al.
[2017]; Stankevich et al. [2020, 2019]
Forums Acuña Caicedo et al. [2020]; Burnap et al. [2015, 2017]; Colombo et al. [2016];
Desmet and Hoste [2018]; Karmen et al. [2015]; Nguyen et al. [2014, 2017b];
Wang et al. [2019b]; Howard et al. [2020]
Twitter Abboute et al. [2014]; Benton et al. [2017b]; Braithwaite et al. [2016]; Burnap
et al. [2015, 2017]; Chen et al. [2018b, 2018a]; Colombo et al. [2016];
Coppersmith et al. [2015b, 2014b, 2014a, 2016, 2018]; Culotta [2014]; De
Choudhury et al. [2013d, 2013a, 2013b, 2013c, 2017b]; Jamil et al. [2017];
Jashinsky et al. [2013]; Vioules et al. [2018]; Joshi et al. [2018]; Kang et al.
[2016]; Liu et al. [2017]; Mowery et al. [2016, 2017b, 2017a]; Nguyen et al.
[2017a]; O’Dea et al. [2015]; Preot et al. [2015]; Yazdavar et al. [2017]; Resnik
et al. [2015b, 2015a]; Schwartz et al. [2013]; Tsugawa et al. [2015]; Razak
et al. [2020]; Verma et al. [2020]; Giorgi et al. [2018]; Samuel et al. [2019]
Reddit Alambo et al. [2019]; Aladag et al. [2018]; Balani and De Choudhury [2015];
De Choudhury et al. [2016, 2017a]; Gkotsis et al. [2017]; Seah and Jin Shim
[2019]; Shing et al. [2018]; Toulis and Golab [2017]; Tadesse et al. [2020];
Thorstad and Wolff [2019]; Wolohan et al. [2018]; Cong et al. [2018]
Sina Weibo Cheng et al. [2017]; Huang et al. [2014, 2015]; Gao et al. [2017]; Peng et al.
[2017]; Wang et al. [2018]
CLPsych Shared Task Amir et al. [2017]; Cohan et al. [2017]; Matero et al. [2019]; Mohammadi
et al. [2019]; Orabi et al. [2018]; Resnik et al. [2015b]
eRisk Losada et al. [2019, 2017, 2018]; Almeida et al. [2017]; Kim et al. [2016]; Shen
et al. [2017]
Reviews Dreisbach et al. [2019]; Edo-Osagie et al. [2020]; Giuntini et al. [2020];
Mahdy et al. [2020]; Thieme et al. [2020]; Tsakalidis et al. [2019]; Robila and
Robila [2019]
Others Coppersmith et al. [2017]; Resnik et al. [2013]; Toulis and Golab [2017]
REFERENCES
Amayas Abboute, Yasser Boudjeriou, Gilles Entringer, Jérôme Azé, Sandra Bringay, and Pascal Poncelet. 2014. Mining
Twitter for suicide prevention. In Proceedings of the International Conference on Applications of Natural Language to Data
Bases/Information Systems (LNCS, Vol 8455). Springer, Cham, 250–253. DOI:https://doi.org/10.1007/978-3-319-07983-
7_36
Saeed Abdullah and Tanzeem Choudhury. 2018. Sensing technologies for monitoring serious mental illnesses. IEEE Multi-
media 25, 1 (Jan. 2018), 61–75.
Roberto Wellington Acuña Caicedo, José Manuel Gómez Soriano, and Héctor Andrés Melgar Sasieta. 2020. Assessment of
supervised classifiers for the task of detecting messages with suicidal ideation. Heliyon 6, 8 (2020). DOI:https://doi.org/
10.1016/j.heliyon.2020.e04412
Somayyeh Aghababaei and Masoud Makrehchi. 2017. Activity-based Twitter sampling for content-based and user-centric
prediction models. Hum.-centr. Comput. Inf. Sci. 7, 1 (2017). DOI:https://doi.org/10.1186/s13673-016-0084-z
Ahmet Emre Aladag, Serra Muderrisoglu, Naz Berfu Akbas, Oguzhan Zahmacioglu, and Haluk O. Bingol. 2018. Detecting
suicidal ideation on forums: Proof-of-concept study. J. Med. Internet Res. 20, 6 (June 2018), e215. DOI:https://doi.org/10.
2196/jmir.9840
Amanuel Alambo, Manas Gaur, Usha Lokala, Ugur Kursuncu, and Krishnaprasad Thirunarayan. 2019. Question answering
for suicide risk assessment using Reddit. In Proceedings of the IEEE 13th International Conference on Semantic Computing
(ICSC’19). 468–473.
Hayda Almeida, Antoine Briand, and Marie Jean Meurs. 2017. Detecting early risk of depression from social media user-
generated content. In Proceedings of the CEUR Workshop, Vol. 1866.
Salma Almouzini, Maher Khemakhem, and Asem Alageel. 2019. Detecting Arabic depressed users from Twitter data. Pro-
cedia Comput. Sci. 163 (2019), 257–265. DOI:https://doi.org/10.1016/j.procs.2019.12.107
Tim Althoff, Kevin Clark, and Jure Leskovec. 2016. Large-scale analysis of counseling conversations: An application of
natural language processing to mental health. Trans. Assoc. Comput. Ling. 4 (2016), 463–476.
Silvio Amir, Glen Coppersmith, Paula Carvalho, Mário J. Silva, and Byron C. Wallace. 2017. Quantifying mental health
from social media with neural user embeddings. In Proceedings of the 2nd Machine Learning for Healthcare Conference.
PMLR, 306–321. arxiv:1705.00335.
Laritza Coello-Guilarte, Rosa María Ortega-Mendoza, Luis Villaseñor-Pineda, and Manuel Montes-y-Gómez. 2019. Cross-
lingual depression detection in Twitter using bilingual word alignments. In Proceedings of the International Conference
of the Cross-Language Evaluation Forum for European Languages. Springer, 49–61. DOI:https://doi.org/10.1007/978-3-
030-28577-7
Sairam Balani and Munmun De Choudhury. 2015. Detecting and characterizing mental health related self-disclosure in
social media. In Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing
Systems (CHI EA’15). 1373–1378.
Adrian Benton, Glen Coppersmith, and Mark Dredze. 2017a. Ethical research protocols for social media health research.
In Proceedings of the 1st Workshop on Ethics in Natural Language Processing. Association for Computational Linguistics,
94–102. DOI:https://doi.org/10.18653/v1/w17-1612
Adrian Benton, Margaret Mitchell, and Dirk Hovy. 2017b. Multitask learning for mental health conditions with limited
social media data. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational
Linguistics, Vol. 1. 152–162. DOI:https://doi.org/10.1890/06-0645.1 arxiv:1712.03538.
Natalie Berry, Fiona Lobban, Maksim Belousov, Richard Emsley, Goran Nenadic, and Sandra Bucci. 2017. #Why-
WeTweetMH: Understanding why people use Twitter to discuss mental health problems. J. Med. Internet Res. 19, 4
(2017), 107–1071. DOI:https://doi.org/10.2196/jmir.6173
Scott R. Braithwaite, Christophe Giraud-Carrier, Josh West, Michael D. Barnes, and Carl Lee Hanson. 2016. Validating
machine learning algorithms for Twitter data against established measures of suicidality. JMIR Ment. Health 3, 2 (2016),
e21. DOI:https://doi.org/10.2196/mental.4822
Pete Burnap, Gualtiero Colombo, Rosie Amery, Andrei Hodorog, and Jonathan Scourfield. 2017. Multi-class machine clas-
sification of suicide-related communication on Twitter. Online Social Netw. Media 2 (2017), 32–44.
Pete Burnap, Walter Colombo, and Jonathan Scourfield. 2015. Machine classification and analysis of suicide-related com-
munication on Twitter. In Proceedings of the 26th ACM Conference on Hypertext & Social Media (HT’15). 75–84.
Hugo D. Calderon-Vilca, William I. Wun-Rafael, and Roberto Miranda-Loarte. 2017. Simulation of suicide tendency by using
machine learning. In Proceedings of 36th International Conference of the Chilean Computer Science Society (SCCC’17).
IEEE, 1–6. DOI:https://doi.org/10.1109/SCCC.2017.8405128
Rafael A. Calvo, David N. Milne, M. Sazzad Hussain, and Helen Christensen. 2017. Natural language processing in men-
tal health applications using non-clinical texts. Nat. Lang. Eng. 23, 5 (Sep. 2017), 1–37. DOI:https://doi.org/10.1017/
S1351324916000383
Patricia Cavazos-Rehg, Melissa Krauss, Shaina Sowles, Sarah Connolly, Rosas Carlos, Meghana Bharadwaj, and Laura
Bierut. 2016. A content analysis of depression-related Tweets. Comput. Hum. Behav. 2, 74 (2016), 351–357.
Daniel Cer, Yinfei Yang, Sheng yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-
Céspedes, Steve Yuan, Chris Tar, Yun Hsuan Sung, Brian Strope, and Ray Kurzweil. 2018. Universal sentence encoder. In
Proceedings of the Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association
for Computational Linguistics, 169–174. DOI:https://doi.org/10.18653/v1/d18-2029 arxiv:1803.11175.
Nina Cesare, Christan Grant, and Elaine O. Nsoesie. 2017. Detection of user demographics on social media: A review of
methods and recommendations for best practices. CoRR (Feb. 2017), 1–18.
Stevie Chancellor, Eric P. S. Baumer, and Munmun De Choudhury. 2019. Who is the “human” in human-centered machine
learning: The case of predicting mental health from social media. Proc. ACM Hum.-comput. Interact. 3 (Nov. 2019).
Xuetong Chen, Martin Sykora, Thomas Jackson, Suzanne Elayan, and Fehmidah Munir. 2018b. Tweeting your mental
health: An exploration of different classifiers and features with emotional signals in identifying mental health con-
ditions. In Proceedings of the 51st Hawaii International Conference on System Sciences. 3320–3328. DOI:https://doi.org/
10.24251/HICSS.2018.421
Xuetong Chen, Martin D. Sykora, Thomas W. Jackson, and Suzanne Elayan. 2018a. What about mood swings? Identifying
depression on Twitter with temporal measures of emotions. In The 2018 Web Conference Companion. ACM, 1653–1660.
Xin Chen, Yu Wang, Eugene Agichtein, and Fusheng Wang. 2015. A comparative study of demographic attribute inference
in Twitter. In Proceedings of the 9th International AAAI Conference on Web and Social Media. 590–593. DOI:https://doi.
org/10.1177/2047487314541731
Qijin Cheng, Tim Mh Li, Chi Leung Kwok, Tingshao Zhu, and Paul S. F. Yip. 2017. Assessing suicide risk and emotional
distress in Chinese social media: A text mining and machine learning study. J. Med. Internet Res. 19, 7 (2017), 1–10.
DOI:https://doi.org/10.2196/jmir.7276
Arman Cohan, Sydney Young, Andrew Yates, and Nazli Goharian. 2017. Triaging content severity in online mental health
forums. J. Assoc. Inf. Sci. Technol. 68, 11 (2017), 2675–2689. DOI:https://doi.org/10.1002/asi.23865 arxiv:1702.06875
Gualtiero B. Colombo, Pete Burnap, Andrei Hodorog, and Jonathan Scourfield. 2016. Analysing the connectivity and com-
munication of suicidal users on Twitter. Comput. Commun. 73 (2016), 291–300.
Qing Cong, Zhiyong Feng, Fang Li, Yang Xiang, Guozheng Rao, and Cui Tao. 2018. X-A-BiLSTM: A deep learning approach
for depression detection in imbalanced data. In Proceedings of the IEEE International Conference on Bioinformatics and
Biomedicine (BIBM’18). IEEE, 1624–1627. DOI:https://doi.org/10.1109/BIBM.2018.8621230
Mike Conway. 2014. Ethical issues in using Twitter for public health surveillance and research: Developing a taxonomy of
ethical concepts from the research literature. J. Med. Internet Res. 16, 12 (2014), 1–9. DOI:https://doi.org/10.2196/jmir.
3617
Mike Conway, Mengke Hu, and Wendy W. Chapman. 2019. Recent advances in using natural language processing to
address public health research questions using social media and consumer-generated data. Yearb. Med. Inform. 28, 01
(Aug. 2019), 208–217. DOI:https://doi.org/10.1055/s-0039-1677918
Mike Conway and Daniel O’Connor. 2016. Social media, big data, and mental health: Current advances and ethical impli-
cations. Curr. Opin. Psychol. 9 (June 2016), 77–82. DOI:https://doi.org/10.1016/j.copsyc.2016.01.004
Glen Coppersmith, Mark Dredze, and Craig Harman. 2014a. Quantifying mental health signals in Twitter. In Proceedings
of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. 51–60.
Glen Coppersmith, Mark Dredze, Craig Harman, and Kristy Hollingshead. 2015a. From ADHD to SAD: Analyzing the lan-
guage of mental health on Twitter through self-reported diagnoses. In Proceedings of the 2nd Workshop on Computational
Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. 1–10.
Glen Coppersmith, Casey Hilland, Ophir Frieder, and Ryan Leary. 2017. Scalable mental health analysis in the clinical
whitespace via natural language processing. In Proceedings of the IEEE EMBS International Conference on Biomedical
and Health Informatics (BHI’17). IEEE, 393–396. DOI:https://doi.org/10.1109/BHI.2017.7897288
Glen Coppersmith, Ryan Leary, Patrick Crutchley, and Alex Fine. 2018. Natural language processing of social me-
dia as screening for suicide risk. Biomed. Inform. Ins. 10 (2018), 117822261879286. DOI:https://doi.org/10.1177/
1178222618792860
Glen Coppersmith, Ryan Leary, Eric Whyne, and Tony Wood. 2015b. Quantifying suicidal ideation via language usage on
social media. In Joint Statistics Meetings Proceedings, Statistical Computing Section (JSM’15).
Glen Coppersmith, Kim Ngo, Ryan Leary, and Anthony Wood. 2016. Exploratory analysis of social media prior to a suicide
attempt. In Proceedings of the 3rd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal
to Clinical Reality. Association for Computational Linguistics. 106–117. DOI:https://doi.org/10.18653/v1/w16-0311
Glen A. Coppersmith, Craig T. Harman, and Mark H. Dredze. 2014b. Measuring post traumatic stress disorder in Twitter.
In Proceedings of the International AAAI Conference on Web and Social Media (ICWSM’14). DOI:https://doi.org/10.1016/
S1003-6326(14)63309-4
Aron Culotta. 2014. Estimating county health statistics with Twitter. In Proceedings of the 32nd Annual ACM Conference on
Human Factors in Computing Systems (CHI’14). 1335–1344.
Nicholas Cummins, Stefan Scherer, Jarek Krajewski, Sebastian Schnieder, Julien Epps, and Thomas F. Quatieri. 2015.
A review of depression and suicide risk assessment using speech analysis. Speech Commun. 71 (2015), 10–49. DOI:
https://doi.org/10.1016/j.specom.2015.03.004
Munmun De Choudhury, Scott Counts, and Eric Horvitz. 2013c. Major life changes and behavioral markers in social media:
Case of childbirth. In Proceedings of the ACM Conference on Computer Supported Cooperative Work (CSCW’13). 1431–
1442. DOI:https://doi.org/10.1145/2441776.2441937
Munmun De Choudhury, Scott Counts, and Eric Horvitz. 2013b. Predicting postpartum changes in emotion and behavior
via social media. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI’13).
Munmun De Choudhury, Scott Counts, and Eric Horvitz. 2013a. Social media as a measurement tool of depression in
populations. In Proceedings of the 5th Annual ACM Web Science Conference (WebSci’13). 47–56. DOI:https://doi.org/10.
1145/2464464.2464480
Munmun De Choudhury, Scott Counts, Eric J. Horvitz, and Aaron Hoff. 2014. Characterizing and predicting postpartum
depression from shared Facebook data. In Proceedings of the 17th ACM Conference on Computer Supported Cooperative
Work & Social Computing (CSCW’14). 626–638. DOI:https://doi.org/10.1145/2531602.2531675
Munmun De Choudhury, Michael Gamon, Scott Counts, and Eric Horvitz. 2013d. Predicting depression via social media.
In Proceedings of the 7th International AAAI Conference on Weblogs and Social Media, Vol. 2. 128–137.
Munmun De Choudhury, Emre Kiciman, Mark Dredze, Glen Coppersmith, and Mrinal Kumar. 2016. Discovering shifts to
suicidal ideation from mental health content in social media. In Proceedings of the CHI Conference on Human Factors in
Computing Systems (CHI’16). 2098–2110. DOI:https://doi.org/10.1145/2858036.2858207
Munmun De Choudhury and Emre Kiciman. 2017a. The language of social support in social media and its effect on suicidal
ideation risk. In Proceedings of the International AAAI Conference on Weblogs and Social Media. 32–41.
Munmun De Choudhury, Sanket S. Sharma, Tomaz Logar, Wouter Eekhout, and René Clausen Nielsen. 2017b. Gender and
cross-cultural differences in social media disclosures of mental illness. In Proceedings of the ACM Conference on Computer
Supported Cooperative Work and Social Computing (CSCW’17). 353–369. DOI:https://doi.org/10.1145/2998181.2998220
K. Denecke, P. Bamidis, C. Bond, E. Gabarron, M. Househ, A. Y. S. S. Lau, M. A. Mayer, M. Merolli, and M. Hansen. 2015.
Ethical issues of social media usage in healthcare. Yearb. Med. Inf. 10, 1 (Aug. 2015), 137–147. DOI:https://doi.org/10.
15265/IY-2015-001
Bart Desmet and Véronique Hoste. 2018. Online suicide prevention through optimised text classification. Inf. Sci. 439–440
(2018), 61–78. DOI:https://doi.org/10.1016/j.ins.2018.02.014
Karthik Dinakar, Henry Lieberman, Allison J. B. Chaney, and David M. Blei. 2014. Real-time topic models for crisis coun-
seling. In Proceedings of the KDD DSSG Workshop.
Son Doan, Amanda Ritchart, Nicholas Perry, Juan D. Chaparro, and Mike Conway. 2017. How do you #relax when you’re
#stressed? A content analysis and infodemiology study of stress-related tweets. JMIR Pub. Health Surveill. 3, 2 (2017),
e35. DOI:https://doi.org/10.2196/publichealth.5939
Caitlin Dreisbach, Theresa A. Koleck, Philip E. Bourne, and Suzanne Bakken. 2019. A systematic review of natural language
processing and text mining of symptoms from electronic patient-authored text data. Int. J. Med. Inform. 125, Dec. (2019),
37–46. DOI:https://doi.org/10.1016/j.ijmedinf.2019.02.008
Oduwa Edo-Osagie, Beatriz De La Iglesia, Iain Lake, and Obaghe Edeghere. 2020. A scoping review of the use of Twitter
for public health research. Comput. Biol. Med. 122, Apr. (2020), 103770. DOI:https://doi.org/10.1016/j.compbiomed.2020.
103770
Oduwa Edo-Osagie, Gillian Smith, Iain Lake, Obaghe Edeghere, and Beatriz De La Iglesia. 2019. Twitter mining using
semi-supervised classification for relevance filtering in syndromic surveillance. PLoS One 14, 7 (2019), 1–29. DOI:
https://doi.org/10.1371/journal.pone.0210689
Atefeh Farzindar and Diana Inkpen. 2017. Natural language processing for social media, second edition. Synth. Lect. Hum.
Lang. Technol. 10, 2 (2017), 1–195.
Renato Miranda Filho, Jussara M. Almeida, and Gisele L. Pappa. 2015. Twitter population sample bias and its impact on
predictive outcomes. In Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis
and Mining (ASONAM’15). ACM, 1254–1261. DOI:https://doi.org/10.1145/2808797.2809328
Manuel A. Franco-Martín, Juan Luis Muñoz-Sánchez, Beatriz Sainz-de Abajo, Gema Castillo-Sánchez, Sofiane Hamrioui,
and Isabel de la Torre-Díez. 2018. A systematic literature review of technologies for suicidal behavior prevention.
J. Med. Systems 42, 4 (Apr. 2018), 71. DOI:https://doi.org/10.1007/s10916-018-0926-5
Yuanbo Gao, Baobin Li, Xuefei Wang, Jingying Wang, Yang Zhou, Shuotian Bai, and Tingshao Zhu. 2017. Detecting
suicide ideation from Sina microblog. In Proceedings of the IEEE International Conference on Systems, Man, and Cyber-
netics (SMC’17). 182–187. DOI:https://doi.org/10.1109/SMC.2017.8122599
Robert R. German, John M. Horan, Lisa M. Lee, Bobby Milstein, and Carol A. Pertowski. 2001. Updated guidelines for
evaluating public health surveillance systems; recommendations from the guidelines working group. Retrieved from
https://www.cdc.gov/mmwr/preview/mmwrhtml/rr5013a1.htm.
Salvatore Giorgi, Daniel Preotiuc-Pietro, Anneke Buffone, Daniel Rieman, Lyle Ungar, and H. Andrew Schwartz. 2018.
The remarkable benefit of user-level aggregation for lexical-based population-level predictions. In Proceedings of the
Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1167–1172.
DOI:https://doi.org/10.18653/v1/d18-1148 arxiv:1808.09600.
Felipe T. Giuntini, Mirela T. Cazzolato, Maria de Jesus Dutra dos Reis, Andrew T. Campbell, Agma J. M. Traina, and Jó
Ueyama. 2020. A review on recognizing depression in social networks: Challenges and opportunities. J. Amb. Intell.
Humaniz. Comput. 2016 (2020). DOI:https://doi.org/10.1007/s12652-020-01726-4
George Gkotsis, Anika Oellrich, Sumithra Velupillai, Maria Liakata, Tim J. P. Hubbard, Richard J. B. Dobson, and Rina Dutta.
2017. Characterisation of mental health conditions in social media using informed deep learning. Sci. Rep. 7 (Mar. 2017),
45141. DOI:https://doi.org/10.1038/srep45141
Su Golder, Shahd Ahmed, Gill Norman, and Andrew Booth. 2017. Attitudes toward the ethics of research using social
media: A systematic review. J. Med. Internet Res. 19, 6 (June 2017), e195. DOI:https://doi.org/10.2196/jmir.7082
Shannon Greenwood, Andrew Perrin, and Maeve Duggan. 2016. Social media update 2016. Pew Research Center (Nov. 2016).
Oliver Gruebner, Sarah R. Lowe, Martin Sykora, Ketan Shankardass, S. V. Subramanian, and Sandro Galea. 2017a. A novel
surveillance approach for disaster mental health. PLoS One 12, 7 (2017), e0181233. DOI:https://doi.org/10.1371/journal.
pone.0181233
Oliver Gruebner, Martin Sykora, Sarah R. Lowe, Ketan Shankardass, Sandro Galea, and S. V. Subramanian. 2017b.
Big data opportunities for social behavioral and mental health research. Social Sci. Med. 189 (2017), 167–169. DOI:
https://doi.org/10.1016/j.socscimed.2017.07.018
Sharath Chandra Guntuku, David B. Yaden, Margaret L. Kern, Lyle H. Ungar, and Johannes C. Eichstaedt. 2017. Detecting
depression and mental illness on social media: An integrative review. Curr. Opin. Behav. Sci. 18 (2017), 43–49. DOI:
https://doi.org/10.1016/j.cobeha.2017.07.005
T. Hahn, A. A. Nierenberg, and S. Whitfield-Gabrieli. 2017. Predictive analytics in mental health: Applications, guidelines,
challenges and perspectives. Molec. Psychi. 22, 1 (2017), 37–43.
Bibo Hao, Lin Li, Ang Li, and Tingshao Zhu. 2013. Predicting mental health status on social media—A preliminary study
on microblog. In Proceedings of the 15th International Conference on Human-Computer Interaction. 101–110.
Derek Howard, Marta M. Maslej, Justin Lee, Jacob Ritchie, Geoffrey Woollard, and Leon French. 2020. Transfer learning for
risk classification of social media posts: Model evaluation study. J. Med. Internet Res. 22, 5 (2020). DOI:https://doi.org/
10.2196/15371 arxiv:1907.02581
Xiaolei Huang, Xin Li, Lei Zhang, Tianli Liu, David Chiu, and Tingshao Zhu. 2015. Topic model for identifying suicidal
ideation in Chinese microblog. In Proceedings of the 29th Pacific Asia Conference on Language, Information and Compu-
tation. 553–562.
Xiaolei Huang, Lei Zhang, David Chiu, Tianli Liu, Xin Li, and Tingshao Zhu. 2014. Detecting suicidal ideation in Chinese
microblogs with psychological lexicons. In Proceedings of the IEEE International Conference on Ubiquitous Intelligence
and Computing. 844–849.
InsightsWest. 2017. Canadian social media monitor 2017. Retrieved from https://bcama.com/wp-content/uploads/2018/03/
Rep_IW_CDNSocialMediaMonitor_Oct2017.pdf.
Zunaira Jamil, Diana Inkpen, Prasadith Buddhitha, and Kenton White. 2017. Monitoring tweets for depression to detect
at-risk users. In Proceedings of the 4th Workshop on Computational Linguistics and Clinical Psychology. 32–40.
Natasha Jaques, Sara Taylor, Ehimwenma Nosakhare, Akane Sano, and Rosalind Picard. 2016. Multi-task learning for pre-
dicting health, stress, and happiness. In Proceedings of the NIPS Workshop on Machine Learning for Healthcare. 1–5.
Jared Jashinsky, Scott H. Burton, Carl L. Hanson, Josh West, Christophe Giraud-Carrier, Michael D. Barnes, and Trenton
Argyle. 2013. Tracking suicide risk factors through Twitter in the US. Crisis 35, 1 (Jan. 2013), 51–59. DOI:https://doi.org/
10.1027/0227-5910/a000234
Deepali J. Joshi, Mohit Makhija, Yash Nabar, Ninad Nehete, and Manasi S. Patwardhan. 2018. Mental health analysis using
deep learning for feature extraction. In ACM International Conference Proceeding Series. ACM, 356–359. DOI:https://doi.
org/10.1145/3152494.3167990
Sayali Shashikant Kale. 2015. Tracking Mental Disorders across Twitter Users. Ph.D. Dissertation. University of Mumbai.
Keumhee Kang, Chanhee Yoon, and Eun Yi Kim. 2016. Identifying depressive users in Twitter using multimodal anal-
ysis. In Proceedings of the International Conference on Big Data and Smart Computing (BigComp’16). 231–238. DOI:
https://doi.org/10.1109/BIGCOMP.2016.7425918
Christian Karmen, Robert C. Hsiung, and Thomas Wetter. 2015. Screening Internet forum participants for depression symp-
toms by assembling and enhancing multiple NLP methods. Comput. Meth. Prog. Biomed. 120, 1 (2015), 27–36.
ACM Computing Surveys, Vol. 53, No. 6, Article 129. Publication date: December 2020.
Using Social Media for Mental Health Surveillance: A Review 129:27
Ramakanth Kavuluru, Amanda G. Williams, María Ramos-Morales, Laura Haye, Tara Holaday, and Julie Cerel. 2016. Clas-
sification of helpful comments on online suicide watch forums. In Proceedings of the ACM Conference on Bioinformatics,
Computational Biology, and Health Informatics. 32–40. DOI:https://doi.org/10.1145/2975167.2975170
Ashar Anam Khan and Mohd Husain. 2018. Analysis of mental state of users using social media to predict depression: A
survey. Int. J. Adv. Res. Comput. Sci. 9 (2018).
Sunghwan Mac Kim, Yufei Wang, Stephen Wan, and Cécile Paris. 2016. Data61-CSIRO systems at the CLPsych 2016 shared
task. In Proceedings of the 3rd Workshop on Computational Linguistics and Clinical Psychology. 128–132.
Holly Korda and Zena Itani. 2013. Harnessing social media for health promotion and behavior change. Health Promot. Pract.
14, 1 (2013), 15–23.
Mrinal Kumar, Mark Dredze, Glen Coppersmith, and Munmun De Choudhury. 2015. Detecting changes in suicide content
manifested in social media following celebrity suicides. In Proceedings of the 26th ACM Conference on Hypertext & Social
Media (HT’15). 85–94.
Ugur Kursuncu, Manas Gaur, Usha Lokala, Krishnaprasad Thirunarayan, Amit Sheth, and I. Budak Arpinar. 2018. Predictive
analysis on Twitter: Techniques and applications. In Emerging Research Challenges and Opportunities in Computational
Social Network Analysis and Mining. Springer, Cham, 67–104.
E. Megan Lachmar, Andrea K. Wittenborn, Katherine W. Bogen, and Heather L. McCauley. 2017. #MyDepressionLooksLike:
Examining public discourse about depression on Twitter. JMIR Ment. Health 4, 4 (Oct. 2017), e43. DOI:https://doi.org/
10.2196/mental.8141
Michael Thaul Lehrman, Cecilia Ovesdotter Alm, and Rubén A. Proaño. 2012. Detecting distressed and non-distressed
affect states in short forum texts. In Proceedings of the 2nd Workshop on Language in Social Media. 9–18.
Victor Leiva and Ana Freire. 2017. Towards suicide prevention: Early detection of depression on social media. In Proceedings
of the Conference on Internet Science (INSCI’17), I. Kompatsiaris et al. (Eds.), Vol. 10673. DOI:https://doi.org/10.1007/
978-3-319-70284-1_34
Diya Li, Harshita Chaudhary, and Zhe Zhang. 2020. Modeling spatiotemporal pattern of depressive symptoms caused by
COVID-19 using social media data mining. Int. J. Environ. Res. Pub. Health 17, 14 (2020), 1–23. DOI:https://doi.org/10.
3390/ijerph17144988
Yun Li, Tao Li, and Huan Liu. 2017. Recent advances in feature selection and its applications. Knowl. Inf. Syst. 53, 3 (2017),
551–577.
Huijie Lin, Jia Jia, Quan Guo, Yuanyuan Xue, Qi Li, Jie Huang, Lianhong Cai, and Ling Feng. 2014. User-level psychological
stress detection from social media using deep neural network. In Proceedings of the ACM International Conference on
Multimedia (MM’14). 507–516.
Huijie Lin, Jia Jia, Liqiang Nie, Guangyao Shen, and Tat-Seng Chua. 2016. What does social media say about your stress?
In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI’16). 3775–3781.
Huijie Lin, Jia Jia, Jiezhong Qiu, Yongfeng Zhang, Guangyao Shen, Lexing Xie, Jie Tang, Ling Feng, and Tat-Seng
Chua. 2017. Detecting stress based on social interactions in social networks. IEEE Trans. Knowl. Data Eng. 29, 9 (2017),
1820–1833. DOI:https://doi.org/10.1109/TKDE.2017.2686382
Tong Liu, Qijin Cheng, Christopher M. Homan, and Vincent M. B. Silenzio. 2017. Learning from various labeling strategies
for suicide-related messages on social media: An experimental study. In Proceedings of the ACM International Conference
on Web Search and Data Mining Workshop on Mining Online Health Reports. arxiv:1701.08796
David E. Losada and Fabio Crestani. 2016. A test collection for research on depression and language use. In Lecture Notes
in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol.
9822. Springer, Cham, 28–39.
David E. Losada, Fabio Crestani, and Javier Parapar. 2017. CLEF 2017 eRisk overview: Early risk prediction on the Internet:
Experimental foundations. In Proceedings of the International Conference of the Cross-language Evaluation Forum for
European Languages, Vol. 1866. Springer.
David E. Losada, Fabio Crestani, and Javier Parapar. 2018. Overview of eRisk 2018: Early risk prediction on the Internet
(extended lab overview). In Proceedings of the 9th International Conference of the CLEF Association, Vol. 2125.
David E. Losada, Fabio Crestani, and Javier Parapar. 2019. Overview of eRisk: Early risk prediction on the Internet. In Exper-
imental IR Meets Multilinguality, Multimodality, and Interaction, F. Crestani et al. (Eds.), Vol. 11696. Springer International
Publishing, 343–361. DOI:https://doi.org/10.1007/978-3-030-28577-7_27
Meizhen Lv, Ang Li, Tianli Liu, and Tingshao Zhu. 2015. Creating a Chinese suicide dictionary for identifying suicide risk
on social media. PeerJ 3 (2015), e1455.
Nourane Mahdy, Dalia A. Magdi, Ahmed Dahroug, and Mohammed Abo Rizka. 2020. Comparative study: Different tech-
niques to detect depression using social media. In Lecture Notes in Networks and Systems. Vol. 114. Springer Singapore,
441–452. DOI:https://doi.org/10.1007/978-981-15-3075-3_30
M. Marcus, Mohammad Taghi Yasamy, M. Van Ommeren, D. Chisholm, and S. Saxena. 2012. Depression: A global public
health concern. World Health Org. Paper Depress. 01, Dec. (2012), 6–8.
Matthew Matero, Akash Idnani, Youngseo Son, Sal Giorgi, Huy Vu, Mohammad Zamani, Parth Limbachiya, Sharath
Chandra Guntuku, and H. Andrew Schwartz. 2019. Suicide risk assessment with multi-level dual-context language
and BERT. In Proceedings of the 6th Workshop on Computational Linguistics and Clinical Psychology. 39–44. DOI:
https://doi.org/10.18653/v1/w19-3005
Rebecca McKee. 2013. Ethical issues in using social media for health and health care research. Health Policy 110, 2-3 (May
2013), 298–301. DOI:https://doi.org/10.1016/j.healthpol.2013.02.006
Jonathan Mellon and Christopher Prosser. 2017. Twitter and Facebook are not representative of the general population:
Political attitudes and demographics of British social media users. Res. Polit. 4, 3 (2017).
Jude Mikal, Samantha Hurst, and Mike Conway. 2016. Ethical issues in using Twitter for population-level depression mon-
itoring: A qualitative study. BMC Med. Ethics 17, 1 (Apr. 2016), 22. DOI:https://doi.org/10.1186/s12910-016-0105-5
Elham Mohammadi, Hessam Amini, and Leila Kosseim. 2019. CLaC at CLPsych 2019: Fusion of neural features and predicted
class probabilities for suicide risk assessment based on online posts. In Proceedings of the 6th Workshop on Computational
Linguistics and Clinical Psychology. 34–38. DOI:https://doi.org/10.18653/v1/W19-3004
David Moher, Douglas G. Altman, Alessandro Liberati, and Jennifer Tetzlaff. 2011. PRISMA statement. Epidemiology 22, 1
(2011), 128.
Michelle Renee Morales, Stefan Scherer, and Rivka Levitan. 2017. A cross-modal review of indicators for depression detec-
tion systems. In Proceedings of the 4th Workshop on Computational Linguistics and Clinical Psychology. 1–12.
Danielle Mowery, Craig Bryan, and Mike Conway. 2017a. Feature Studies to Inform the Classification of Depressive Symp-
toms from Twitter Data for Population Health. DOI:https://doi.org/10.1056/NEJMoa1010095
Danielle Mowery, Albert Park, Mike Conway, and Craig Bryan. 2016. Towards automatically classifying depressive symp-
toms from Twitter data for population health. In Proceedings of the Workshop on Computational Modeling of People’s
Opinions, Personality, and Emotions in Social Media. 182–191.
Danielle Mowery, Hilary Smith, Tyler Cheney, Greg Stoddard, Glen Coppersmith, Craig Bryan, and Mike Conway. 2017b.
Understanding depressive symptoms and psychosocial stressors on Twitter: A corpus-based study. J. Med. Internet Res.
19, 2 (2017). DOI:https://doi.org/10.2196/jmir.6895
Danielle L. Mowery, Craig Bryan, and Mike Conway. 2015. Towards developing an annotation scheme for depressive
disorder symptoms: A preliminary study using Twitter data. In Proceedings of the 2nd Workshop on Computational
Linguistics and Clinical Psychology. 89–98.
Hung Nguyen, Duc Thanh Nguyen, and Thin Nguyen. 2019. Estimating county health indices using graph neural networks.
In Proceedings of the Australasian Data Mining Conference (AusDM’19), T. Le et al. (Eds.). (Communications in Computer
and Information Science, Vol. 1127). 16–27. DOI:https://doi.org/10.1007/978-981-15-1699-3
Thin Nguyen, Bridianne O’Dea, Mark Larsen, Dinh Phung, Svetha Venkatesh, and Helen Christensen. 2017b. Using lin-
guistic and topic analysis to classify sub-groups of online depression communities. Multimedia Tools Applic. 76, 8 (Apr.
2017), 10653–10676.
Thin Nguyen, Duc Thanh Nguyen, Mark E. Larsen, Bridianne O’Dea, John Yearwood, Dinh Phung, Svetha Venkatesh, and
Helen Christensen. 2017a. Prediction of population health indices from social media using kernel-based textual and
temporal features. In Proceedings of the 26th International Conference on World Wide Web (WWW’17). ACM, 99–107.
DOI:https://doi.org/10.1145/3041021.3054136
Thin Nguyen, Dinh Phung, Bo Dao, Svetha Venkatesh, and Michael Berk. 2014. Affective and content analysis of online
depression communities. IEEE Trans. Affect. Comput. 5, 3 (2014), 217–226.
Elaine O. Nsoesie, Luisa Flor, Jared Hawkins, Adyasha Maharana, Tobi Skotnes, Fatima Marinho, and John S. Brownstein.
2016. Social media as a sentinel for disease surveillance: What does sociodemographic status have to do with it? PLoS
Currents 8 (2016), 1–26.
Robertus Nugroho, Cecile Paris, Surya Nepal, Jian Yang, and Weiliang Zhao. 2020. A survey of recent methods on deriving
topics from Twitter: Algorithm to evaluation. Knowl. Inf. Syst. 62 (2020), 2485–2519. DOI:https://doi.org/10.1007/
s10115-019-01429-z
Bridianne O’Dea, Stephen Wan, Philip J. Batterham, Alison L. Calear, Cecile Paris, and Helen Christensen. 2015. Detecting
suicidality on Twitter. Internet Interven. 2, 2 (May 2015), 183–188. DOI:https://doi.org/10.1016/j.invent.2015.03.005
National Institute of Mental Health (NIMH). 2019. Suicide. Retrieved from https://www.nimh.nih.gov/health/statistics/
suicide.shtml.
Jihoon Oh, Kyongsik Yun, Ji-Hyun Hwang, and Jeong-Ho Chae. 2017. Classification of suicide attempts through a machine
learning algorithm based on multiple systemic psychiatric scales. Front. Psychi. 8 (2017), 192.
Ahmed Husseini Orabi, Prasadith Buddhitha, Mahmoud Husseini Orabi, and Diana Inkpen. 2018. Deep learning for depres-
sion detection of Twitter users. In Proceedings of the 5th Workshop on Computational Linguistics and Clinical Psychology:
From Keyboard to Clinic. 88–97.
Esteban Ortiz-Ospina. 2019. The Rise of Social Media. Retrieved from https://ourworldindata.org/rise-of-social-media.
Minsu Park, David W. McDonald, and Meeyoung Cha. 2013. Perception differences between the depressed and non-
depressed users in Twitter. In Proceedings of the 7th International AAAI Conference on Weblogs and Social Media
(ICWSM’13). 476–485.
Michael J. Paul and Mark Dredze. 2014. Discovering health topics in social media using topic models. PLoS One 9, 8 (Aug.
2014), e103408. DOI:https://doi.org/10.1371/journal.pone.0103408
Michael J. Paul and Mark Dredze. 2017. Social monitoring for public health. Synth. Lect. Inf. Concepts, Retr. Serv. 9, 5 (2017),
1–183.
Michael J. Paul, Abeed Sarker, John S. Brownstein, Azadeh Nikfarjam, Matthew Scotch, Karen L. Smith, and Graciela
Gonzalez. 2016. Social media mining for public health monitoring and surveillance. In Proceedings of the Pacific Sympo-
sium on Biocomputing. 468–479.
Zhichao Peng, Qinghua Hu, and Jianwu Dang. 2017. Multi-kernel SVM based depression recognition using social media
data. Int. J. Mach. Learn. Cyber. (June 2017), 1–15.
Lawrence Phillips, Chase Dowling, Kyle Shaffer, Nathan Hodas, and Svitlana Volkova. 2017. Using social media to predict
the future: A systematic literature review. arXiv preprint arXiv:1706.06134 (2017), 1–55.
Daniel Preoţiuc-Pietro, Johannes Eichstaedt, Gregory Park, Maarten Sap, Laura Smith, Victoria Tobolsky, H. Andrew Schwartz, and
Lyle Ungar. 2015. The role of personality, age and gender in tweeting about mental illnesses. In Proceedings of the
Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. 21–30.
V. M. Prieto, M. Alvarez, F. Cacheda, and J. L. Oliveira. 2014. Twitter: A good place to detect health conditions. PLoS One 9,
1 (Jan. 2014), e86191.
Public Health Agency of Canada. 2015. Report from the Canadian Chronic Disease Surveillance System: Mental Illness in
Canada 2015. Vol. 2015. Minister of Health, Ottawa, Canada. 1–54. DOI:https://doi.org/10.1002/yd.20038
Kate Loveys, Patrick Crutchley, Emily Wyatt, and Glen Coppersmith. 2017. Small but mighty: Affective micropatterns for
quantifying mental health from social media language. In Proceedings of the 4th Workshop on Computational Linguistics
and Clinical Psychology. 85–95. DOI:https://doi.org/10.18653/v1/w17-3110
Gopalkumar Rakesh. 2017. Suicide prediction with machine learning. Amer. J. Psychi. Resid. J. 12, 1 (2017), 15–17.
Chempaka Seri Abdul Razak, Muhammad Ameer Zulkarnain, Siti Hafizah Ab Hamid, Nor Badrul Anuar, Mohd Zalisham
Jali, and Hasni Meon. 2020. Tweep: A system development to detect depression in Twitter posts. Vol. 603. Springer
Singapore, 543–552. DOI:https://doi.org/10.1007/978-981-15-0058-9_52
Andrew G. Reece, Andrew J. Reagan, Katharina L. M. Lix, Peter Sheridan Dodds, Christopher M. Danforth, and Ellen J.
Langer. 2017. Forecasting the onset and course of mental illness with Twitter data. Sci. Rep. 7, 1 (2017), 1–11. DOI:
https://doi.org/10.1038/s41598-017-12961-9
Philip Resnik, William Armstrong, Leonardo Claudino, and Thang Nguyen. 2015a. The University of Maryland CLPsych
2015 shared task system. In Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology. 54–60.
DOI:https://doi.org/10.3115/v1/w15-1207
Philip Resnik, William Armstrong, Leonardo Claudino, Thang Nguyen, Viet-An Nguyen, and Jordan Boyd-Graber. 2015b.
Beyond LDA: Exploring supervised topic modeling for depression-related language in Twitter. In Proceedings of the
2nd Workshop on Computational Linguistics and Clinical Psychology, Vol. 1. 99–107.
Philip Resnik, Anderson Garron, and Rebecca Resnik. 2013. Using topic modeling to improve prediction of neuroticism and
depression in college students. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
1348–1353.
Hannah Ritchie and Max Roser. 2018. Our World in Data. Retrieved from https://ourworldindata.org/mental-health.
Mihaela Robila and Stefan A. Robila. 2019. Applications of artificial intelligence methodologies to behavioral and social
sciences. J. Child Fam. Stud. DOI:https://doi.org/10.1007/s10826-019-01689-x
Jo Robinson, Maria Rodrigues, Steve Fisher, Eleanor Bailey, and Helen Herrman. 2015. Social media and suicide prevention:
Findings from a stakeholder survey. Shanghai Arch. Psychi. 27, 1 (2015), 27–35.
Arunima Roy, Katerina Nikolitch, Rachel McGinn, Safiya Jinah, William Klement, and Zachary A. Kaminsky. 2020. A
machine learning approach predicts future risk to suicidal ideation from social media data. npj Dig. Med. 3, 1 (2020),
1–12. DOI:https://doi.org/10.1038/s41746-020-0287-6
Michal Rzeszewski and Lukasz Beluch. 2017. Spatial characteristics of Twitter users—Toward the understanding of geosocial
media production. ISPRS Int. J. Geo-inf. 6, 8 (2017), 236. DOI:https://doi.org/10.3390/ijgi6080236
Hamman Samuel, Benyamin Noori, Sara Farazi, and Osmar Zaiane. 2019. Context prediction in the social web using applied
machine learning: A study of Canadian tweeters. In Proceedings of the IEEE/WIC/ACM International Conference on Web
Intelligence (WI’18). 230–237. DOI:https://doi.org/10.1109/WI.2018.00-85
H. Andrew Schwartz, Johannes Eichstaedt, Margaret L. Kern, Gregory Park, Maarten Sap, David Stillwell, Michal Kosinski,
and Lyle Ungar. 2014. Towards assessing changes in degree of depression through Facebook. In Proceedings of the
Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. 118–125.
H. Andrew Schwartz, Johannes C. Eichstaedt, Margaret L. Kern, Lukasz Dziurzynski, Megha Agrawal, Gregory J. Park,
Shrinidhi K. Lakshmikanth, Sneha Jha, Martin E. P. Seligman, and Lyle Ungar. 2013. Characterizing geographic
variation in well-being using tweets. In Proceedings of the 7th International AAAI Conference on Weblogs and Social
Media (ICWSM’13). 331–346.
Jane H. K. Seah and Kyong Jin Shim. 2019. Data mining approach to the detection of suicide in social media: A case
study of Singapore. In Proceedings of the IEEE International Conference on Big Data (Big Data’18). IEEE, 5442–5444.
DOI:https://doi.org/10.1109/BigData.2018.8622528
Adrian B. R. Shatte, Delyse M. Hutchinson, and Samantha J. Teague. 2019. Machine learning in mental health: A scoping
review of methods and applications. Psychol. Med. (2019). DOI:https://doi.org/10.1017/S0033291719000151
Guangyao Shen, Jia Jia, Liqiang Nie, Fuli Feng, Cunjun Zhang, Tianrui Hu, Tat Seng Chua, and Wenwu Zhu. 2017. Depres-
sion detection via harvesting social media: A multimodal dictionary learning solution. In Proceedings of the International
Joint Conference on Artificial Intelligence. 3838–3844.
Leo Sher. 2020. The impact of the COVID-19 pandemic on suicide rates. QJM: Int. J. Med. May (2020), 1–6. DOI:
https://doi.org/10.1093/qjmed/hcaa202
Han-Chin Shing, Suraj Nair, Ayah Zirikly, Meir Friedenberg, Hal Daumé III, and Philip Resnik. 2018. Expert, crowdsourced, and
machine assessment of suicide risk via online postings. In Proceedings of the 5th Workshop on Computational Linguistics
and Clinical Psychology: From Keyboard to Clinic. 25–36.
Hong-Han Shuai, Chih-Ya Shen, De-Nian Yang, Yi-Feng Lan, Wang-Chien Lee, Philip Yu, and Ming-Syan Chen. 2018. A
comprehensive study on social network mental disorders detection via online social media mining. IEEE Trans. Knowl.
Data Eng. (2018).
Lauren Sinnenberg, Alison M. Buttenheim, Kevin Padrez, Christina Mancheno, Lyle Ungar, and Raina M. Merchant. 2017.
Twitter as a tool for health research: A systematic review. Amer. J. Pub. Health 107, 1 (2017), e1–e8.
Marina Sokolova and Guy Lapalme. 2009. A systematic analysis of performance measures for classification tasks. Inf. Proc.
Manag. 45, 4 (2009), 427–437. DOI:https://doi.org/10.1016/j.ipm.2009.03.002
Maxim Stankevich, Andrey Latyshev, Evgenia Kuminskaya, Ivan Smirnov, and Oleg Grigoriev. 2019. Depression detection
from social media texts. In CEUR Workshop Proceedings, Vol. 2523. 279–289.
Maxim Stankevich, Ivan Smirnov, Natalia Kiselnikova, and Anastasia Ushakova. 2020. Depression detection from social media
profiles. In Communications in Computer and Information Science, Vol. 1223. Springer International Publishing, 181–194.
DOI:https://doi.org/10.1007/978-3-030-51913-1_12
Michael Mesfin Tadesse, Hongfei Lin, Bo Xu, and Liang Yang. 2020. Detection of suicide ideation in social media forums
using deep learning. Algorithms 13, 1 (2020), 1–19. DOI:https://doi.org/10.3390/a13010007
Anja Thieme, Danielle Belgrave, and Gavin Doherty. 2020. Machine learning in mental health: A systematic review
of the HCI literature to support effective ML system design. ACM Trans. Comput.-Hum. Interact. 27, 5 (2020). DOI:
https://doi.org/10.1145/3398069
Robert Thorstad and Phillip Wolff. 2019. Predicting future mental illness from social media: A big-data approach. Behav.
Res. Meth. 51, 4 (2019), 1586–1600. DOI:https://doi.org/10.3758/s13428-019-01235-z
Andrew Toulis and Lukasz Golab. 2017. Social media mining to understand public mental health. In Lecture Notes in Com-
puter Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 10494.
55–70.
Adam Tsakalidis, Maria Liakata, Theo Damoulas, and Alexandra Cristea. 2019. Can we assess mental health through social
media and smart devices? Addressing bias in methodology and evaluation. In Proceedings of the Conference on Machine
Learning and Knowledge Discovery in Databases (ECML PKDD’18), U. Brefeld et al. (Eds.). (Lecture Notes in Computer
Science, Vol. 11053). 186–201. DOI:https://doi.org/10.1007/978-3-030-10997-4
Sho Tsugawa, Yusuke Kikuchi, Fumio Kishino, Kosuke Nakajima, Yuichi Itoh, and Hiroyuki Ohsaki. 2015. Recognizing
depression from Twitter activity. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing
Systems (CHI’15). 3187–3196.
Rupa Valdez and Jessica Keim-Malpass. 2019. Ethics in health research using social media. In Social Web and Health Research.
Springer. DOI:https://doi.org/10.1007/978-3-030-14714-3_13
Kasturi Dewi Varathan and Nurhafizah Talib. 2014. Suicide detection system based on Twitter. In Proceedings of the IEEE
Science and Information Conference. 785–788.
Bhanu Verma, Sonam Gupta, and Lipika Goel. 2020. A Neural Network Based Hybrid Model for Depression Detection in
Twitter. Vol. 19. Springer Singapore. DOI:https://doi.org/10.1007/978-981-15-6634-9
M. Johnson Vioulès, B. Moulahi, J. Azé, and S. Bringay. 2018. Detection of suicide-related posts in Twitter data streams.
IBM J. Res. Dev. 62, 1 (2018), 7:1–7:12. DOI:https://doi.org/10.1147/JRD.2017.2768678
Tao Wang, Markus Brede, Antonella Ianni, and Emmanouil Mentzakis. 2017a. Detecting and characterizing eating-disorder
communities on social media. In Proceedings of the 10th ACM International Conference on Web Search and Data Mining
(WSDM’17). 91–100.
Yilin Wang, Jiliang Tang, Jundong Li, Baoxin Li, Yali Wan, Clayton Mellina, Neil O’Hare, and Yi Chang. 2017b. Understand-
ing and discovering deliberate self-harm content in social media. In Proceedings of the 26th International World Wide
Web Conference (WWW’17). 93–102. DOI:https://doi.org/10.1145/3038912.3052555
Zijian Wang, Scott A. Hale, David Adelani, Przemyslaw A. Grabowicz, Timo Hartmann, Fabian Flöck, and David
Jurgens. 2019a. Demographic inference and representative population estimates from multilingual social media data.
In Proceedings of the World Wide Web Conference (WWW’19). 2056–2067. DOI:https://doi.org/10.1145/3308558.3313684
arxiv:1905.05961.
Zheng Wang, Guang Yu, and Xianyun Tian. 2019b. Exploring behavior of people with suicidal ideation in a Chinese online
suicidal community. Int. J. Environ. Res. Publ. Health 16, 1 (2019). DOI:https://doi.org/10.3390/ijerph16010054
Zheng Wang, Guang Yu, Xianyun Tian, Jingyun Tang, and Xiangbin Yan. 2018. A study of users with suicidal ideation on
Sina Weibo. Telemed. e-Health 24, 9 (2018), 702–709. DOI:https://doi.org/10.1089/tmj.2017.0189
Jieun Wee, Sooyeun Jang, Joonhwan Lee, and Woncheol Jang. 2017. The influence of depression and personality on social
networking. Comput. Hum. Behav. 74 (2017), 45–52. DOI:https://doi.org/10.1016/j.chb.2017.04.003
Janith Weerasinghe, Kediel Morales, and Rachel Greenstadt. 2019. “Because... I was told... so much”: Linguistic indicators of
mental health status on Twitter. Proc. Priv. Enhanc. Technol. 2019, 4 (2019), 152–171. DOI:https://doi.org/10.2478/popets-
2019-0063
Kenton White, Guichong Li, and Nathalie Japkowicz. 2012. Sampling online social networks using coupling from the Past.
In Proceedings of the 12th IEEE International Conference on Data Mining Workshops (ICDMW’12). IEEE, 266–272. DOI:
https://doi.org/10.1109/ICDMW.2012.126
World Health Organization (WHO). 2019. Mental Disorders. Retrieved from https://www.who.int/news-room/fact-sheets/
detail/mental-disorders.
Sanjaya Wijeratne, Amit Sheth, Shreyansh Bhatt, Lakshika Balasuriya, Hussein S. Al-Olimat, Manas Gaur, Amir
Hossein Yazdavar, and Krishnaprasad Thirunarayan. 2017. Feature engineering for Twitter-based applications. In Fea-
ture Engineering for Machine Learning and Data Analytics, Guozhu Dong and Huan Liu (Eds.). Chapman and Hall, 35.
DOI:https://doi.org/10.1201/9781315181080-14
J. T. Wolohan, Misato Hiraga, Atreyee Mukherjee, Zeeshan Ali Sayyed, and Matthew Millard. 2018. Detecting linguistic
traces of depression in topic-restricted text: Attending to self-stigmatized depression with NLP. In Proceedings of the 1st
International Workshop on Language Cognition and Computational Models. 11–21.
Akkapon Wongkoblap, Miguel A. Vadillo, and Vasa Curcin. 2017. Researching mental health disorders in the era of social
media: Systematic review. J. Med. Internet Res. 19, 6 (2017). DOI:https://doi.org/10.2196/jmir.7215
Akkapon Wongkoblap, Miguel A. Vadillo, and Vasa Curcin. 2018. A multilevel predictive model for detecting social network
users with depression. In Proceedings of the IEEE International Conference on Healthcare Informatics (ICHI’18). IEEE, 130–
135. DOI:https://doi.org/10.1109/ICHI.2018.00022
Ming Yang, Melody Kiang, and Wei Shang. 2015. Filtering big data from social media—Building an early warning system
for adverse drug reactions. J. Biomed. Inform. 54 (2015), 230–240. DOI:https://doi.org/10.1016/j.jbi.2015.01.011
Wei Yang and Lan Mu. 2015. GIS analysis of depression among Twitter users. Appl. Geog. 60 (2015), 217–223.
Andrew Yates, Arman Cohan, and Nazli Goharian. 2017. Depression and self-harm risk assessment in online forums. In
Proceedings of the Conference on Empirical Methods in Natural Language Processing. 2968–2978.
Amir Hossein Yazdavar, Hussein S. Al-Olimat, Monireh Ebrahimi, Goonmeet Bajaj, Tanvi Banerjee, Krishnaprasad
Thirunarayan, Jyotishman Pathak, and Amit Sheth. 2017. Semi-supervised approach to monitoring clinical depres-
sive symptoms in social media. In Proceedings of the IEEE/ACM International Conference on Advances in Social Networks
Analysis and Mining. 1191–1198. DOI:https://doi.org/10.1145/3110025.3123028
Zhijun Yin, Lina M. Sulieman, and Bradley A. Malin. 2019. A systematic literature review of machine learning in online
personal health data. J. Amer. Med. Inform. Assoc. 26, 6 (2019), 561–576. DOI:https://doi.org/10.1093/jamia/ocz009
Lei Zhang, Xiaolei Huang, Tianli Liu, Ang Li, Zhenxiang Chen, and Tingshao Zhu. 2015. Using linguistic features to estimate
suicide probability of Chinese microblog users. In Lecture Notes in Computer Science (including subseries Lecture Notes
in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 8944. 549–559.
Yunpeng Zhao, Yi Guo, Xing He, Jinhai Huo, Yonghui Wu, Xi Yang, and Jiang Bian. 2018. Assessing mental health signals
among sexual and gender minorities using Twitter data. In Proceedings of the IEEE International Conference on Healthcare
Informatics Workshops (ICHI-W’18). IEEE, 51–52. DOI:https://doi.org/10.1109/ICHI-W.2018.00015