17732-Article Text-21226-1-2-20210518
identify homeless youth who may need intervention even if they are not willing to answer lengthy surveys truthfully; and 2) can potentially be integrated into an online service tool so that it can reach and help more homeless youth.

From a machine learning perspective, this problem is challenging for several reasons. First, unlike traditional text classification problems, Facebook posts are typically noisy, as they can contain a significant number of typographical errors and Internet slang. Second, as is often the case, only a limited amount of data is available, so we need algorithms that perform robustly in scarce-data settings. As a result of these challenges, several novel adaptations are required if we want to apply established machine learning algorithms to predict substance use for homeless youth.

To address these challenges, we develop a general-purpose multi-step framework which consists of multiple steps of pre-processing and vectorization, followed by a combination of late-fusion and early-fusion techniques for effective training of predictive models on scarce and noisy social-media data. Experimental results demonstrate the effectiveness of the proposed methods, achieving ROC-AUC scores of ∼0.77 on identifying certain kinds of substance use among homeless youth with social media conversations only, and scores of ∼0.8 when combined with answers to four survey questions. In addition, we investigate associations between certain characteristics of people’s Facebook posts and substance use and provide several unique insights about the problem.

In short, we make the following novel contributions:
• We target a highly vulnerable population, i.e., homeless youth, which has received little attention in previous work, and collect Facebook data and survey responses from them (a first-of-its-kind effort to the best of our knowledge).
• We investigate several substance use detection models with machine learning and natural language processing techniques that are specifically adapted to noisy social media texts. Our models exhibit superior performance on real-world data using Facebook posts alone, and further improvements can be achieved when non-drug-related survey responses are included as additional inputs.
• We investigate associations between certain characteristics of word usage (or survey responses) and substance use, and gain insights about the problem.
• We demonstrate that our proposed methods can benefit homeless youth by presenting specific use cases regarding substance use prevention in real-world settings.

Related Work

Social Media and Health-Risk Behaviors among Homeless Youth. Only three studies (Rice, Monro, and Barman-Adhikari 2010; Rice, Milburn, and Monro 2011; Barman-Adhikari et al. 2016) have assessed social media use and health-risk behaviors among homeless youth. In Rice, Monro, and Barman-Adhikari (2010), ∼25% of the homeless youth surveyed report looking for a sex partner online and are more likely to engage in exchange sex or survival sex. However, none of them analyze how to leverage social media data to detect the group of homeless youth at risk of substance use; our work fills this gap.

Mining Information from Social Media Texts. With the advent of social media, users tend to post a large quantity of data online, including what they have done and how they feel, and this data has been used to study users’ behaviors. For instance, Aramaki, Maskawa, and Morita (2011) extract information from Twitter to detect influenza epidemics, and Gerber (2014) uses linguistic analysis and statistical topic modeling to automatically identify discussion topics for predicting different types of crime. Researchers have also tried to understand the behaviors of substance users or predict substance use using social media data. Zhou et al. (2016) study the behaviors of illicit drug users by collecting Instagram posts and utilizing a dictionary of illicit drug-related slang to find common substance use behaviors with regard to time. Ding, Bickel, and Pan (2017) explore several ways to predict whether a user suffers from substance use disorder, and we treat their approach as one of our baselines.

While machine learning has been used to predict substance use among other youth and adult populations, researchers have not applied it to a group of young adults who face transient living circumstances and also experience very high rates of trauma, which translate to extremely high rates of substance use engagement. Besides, all the previous methods fail to consider the noise in social media texts, which has been shown to degrade model performance. In addition, we ask participants to complete a survey on their demographic information and health conditions, and propose methods that outperform previous state-of-the-art algorithms by utilizing both Facebook posts and survey responses to detect substance use among homeless youth.

Dataset

We collected a total of 135,189 textual Facebook conversations (posts and comments) from 158 survey participants (homeless youth) who shared this content on their Facebook profiles during the data collection period. A purposive sampling design (a non-probabilistic sampling method which uses a pre-defined list of characteristics for the population based on the objective of the study) was used to recruit participants. Recruiters were present at a non-profit agency, over six months, for the duration of service provision hours to approach and screen youth, and invite participation.

Youth who were interested in the study were screened for eligibility. Eligibility criteria were assessed by a trained research assistant who asked participants about: where they slept last night, how long they could stay at that location, their age, and whether they had owned a Facebook profile for at least a year. For youth who met eligibility criteria, the research assistant sought informed consent for participation. For those who did not meet eligibility, the research assistant thanked them for their time and discontinued the interaction. For eligible participants, we collected all Facebook posts shared by them in the past year. The resulting dataset consists of ∼135K posts in total.
To pre-process the data for our analysis, we removed Facebook posts and comments that were either empty or contained only web links. The resulting dataset consists of 91,482 Facebook conversations, including 24,960 Facebook posts and 66,522 comments.

In addition to collecting their Facebook information, we also asked participants to fill out a self-reported survey that collected information such as their demographic information, past and current living status, etc. Beyond basic demographic information such as age and gender, the participants were also asked questions like “Why did you leave home or become homeless?” and “How often do you feel that you lack companionship?” that are not directly about demographic characteristics or substance use, yet can be utilized for substance use predictions. Table 1 summarizes the general aspects of the survey data and its participants’ Facebook conversations revealed from their posts and comments. Because not all participants both shared their Facebook conversations and filled out the survey, we removed users who either did not have Facebook posts available or did not complete the survey, resulting in a dataset consisting of ∼25K posts and ∼66K comments from 87 Facebook profiles.

Most importantly, in the survey, the participants were asked to note if they had used drugs in the last 30 days. Specifically, they reported whether they had used marijuana, cocaine (including powder, coke, blow or snow), crack (including freebase or rock), heroin, methamphetamines, ecstasy, needles to inject any illegal drug into their body, and/or prescription drugs without a doctor’s prescription or more often than prescribed. The statistics of people using drugs are shown in Table 2. We want a machine learning model to predict which specific drug a person is using based on Facebook posts, survey responses, or both. Prior research suggests that there are unique predictors and consequences of some kinds of drugs compared to others. For example, a recent study found that drug users who used methamphetamine had an 80 percent greater risk of attempting suicide than drug users who did not (Marshall et al. 2011). Also, homeless youth who use illicit drugs experience longer episodes of homelessness and victimization while living on the streets compared to those who use recreational drugs such as alcohol, tobacco, and marijuana (Bender et al. 2015). Because the table shows that only two people were using crack, we ran experiments only on the other seven types of drugs. Note that while collecting this survey data is onerous in day-to-day settings, it is important from an evaluation perspective, as the survey allows us to gather ground truth labels for our prediction task, i.e., which people are substance users and which people are not. In addition, we also analyze the impact of these survey questions on our predictive performance, and discuss potential workarounds later in the paper.

Methods

In this section, we discuss our algorithms in detail. We first describe each component of our algorithms separately, including the pre-processing and vectorization steps, and then illustrate the overall classification procedure.

Preprocessing Social Media Texts

Noise in social media text is a known issue that has been investigated in a variety of previous work (Michel and Neubig 2018), most of which focuses on data augmentation. Unfortunately, we empirically show that popular data augmentation methods do not work sufficiently well in our problem domain. As a result, in this paper, we approach the problem from a completely different perspective, and propose a general-purpose methodology that utilizes subword information for handling noise in social media text.

Subwords are an effective solution to the out-of-vocabulary problem, which is commonly observed in noisy social media text. In this work, we perform subword segmentation with byte pair encoding (BPE), a simple data compression technique that is widely used in machine translation (Sennrich, Haddow, and Birch 2016). The basic idea of BPE is to iteratively merge pairs of characters or character sequences that appear frequently in the corpus to create subwords.

In order to compare against data-augmentation based methods for handling noise in social media data, we also attempted to utilize BPE-Dropout, a recently proposed data augmentation technique, to tackle the problem. The procedure of BPE-Dropout is simple: it mainly alters the segmentation procedure of BPE while keeping its original merge table. At a high level, BPE-Dropout stochastically corrupts the segmentation procedure of BPE, which can benefit the machine learning models by 1) augmenting the dataset; 2) enabling them to be robust against noise. Because both of these properties can be of great benefit in our setting, BPE-Dropout seems to be a promising technique to use.

Vector Representations for Users

After pre-processing the inputs (using BPE), we also need to convert them into vector representations that machine learning (ML) models can process. This process is referred to as vectorization. In this part, we describe how we obtain vectorized representations for each homeless youth (or user), ranging from a simple bag-of-words model to a more complicated distributed bag-of-words model.

Bag-of-Words Model. The bag-of-words model (BoW) is a simple and intuitive vectorization method for text classification. The idea of the bag-of-words model is to convert text into fixed-length vectors by counting how many times each word (or subword) appears in the input text. One caveat of the bag-of-words model is that it does not take word order into consideration. However, we empirically demonstrate that the method is effective (see experiments).

Singular Value Decomposition. Typically, a vocabulary can contain thousands of entries, which can cause bag-of-words (BoW) models to break down without sufficient amounts of training data. To solve this problem, we use Singular Value Decomposition (SVD) to reduce the dimensions of the BoW vector representation. SVD (De Lathauwer, De Moor, and Vandewalle 2000) is a matrix factorization method that generalizes the eigendecomposition of a square
matrix to any $m \times n$ matrix. Concretely, given any $m \times n$ matrix $A$, the SVD algorithm can find matrices $U$, $W$ and $V$ which satisfy the equation $A = U W V^{\top}$, where $U$ is an orthogonal matrix with a size of $m \times n$, $W$ is an $n \times n$ diagonal matrix and $V$ is an $n \times n$ orthogonal matrix. To perform dimension reduction after obtaining matrices $U$, $V$, $W$ by SVD, we first keep the $r$ ($r < n$) largest singular values in the diagonal matrix $W$ and obtain the resulting matrix $W'$, then compute a new matrix $A' = U W'$.

In our setting, the variable $m$ is the number of homeless youth in the data. We treat all the posts from each user as one big document and utilize the BoW model to vectorize each user’s posts. The vectors of all the users are then stacked as rows to form the matrix $A$, which is reduced to a new matrix $A'$ of size $m \times r$ using SVD. Each row of the new matrix $A'$ becomes the new feature vector for the corresponding user. The new features have significantly fewer dimensions than the old ones, and words with similar meanings can share similar representations. However, valuable information might be lost during compression, and thus we need to find a balance between efficiency and effectiveness.

Table 1
Data Sources                     Characteristics   Mean     Standard Deviation   Min   Max
Survey (158 observations)        age               20.72    1.89                 18    25
                                 #posts            243.44   275.96               1     1598
                                 #comments         133.61   189.64               1     1061
Posts (21,179 observations)      #characters       108.53   174.07               4     1452
Comments (12,025 observations)   #characters       59.71    92.93                4     1452

Document Embedding with Distributed Bag-of-Words Model. Previously, researchers have tried to learn document embeddings with distributed memory and the distributed bag-of-words model (D-DBoW) (Le and Mikolov 2014). The idea of these approaches is simple: during training, either a document vector and one or more word vectors are aggregated to predict a target word in the context, or a document vector is fed to a neural network to predict words randomly sampled from the document.

Specifically, we treat all the posts by one user as one document as before, and try to train a document vector to represent each user. At each training step, a global document vector $v_i$ is sampled, which is treated as the representation of the $i$-th user. Then, we sample $n$ (sub)words from the posts uploaded by the $i$-th user. A neural network is trained to maximize the likelihood of the $n$ sampled words given the global document representation $v_i$. The training process is repeated until convergence. After the training completes, we obtain vector representations for all the users. Following Ding, Bickel, and Pan (2017), we choose the document embedding with distributed bag-of-words model (D-DBoW) approach in our drug use prediction domain.

Multi-Task Learning

When the amount of training data is limited, multi-task learning can be used to add additional supervision to the model, as well as function as a regularizer. In our setting, we have information on which types of drugs are being used by each homeless youth in our dataset. Therefore, when we predict whether a homeless youth (user) is consuming one type of drug (say type A), we can utilize information about other types of drugs that they may have consumed (in addition to drug type A) and make predictions simultaneously.

While additional supervision signals from related tasks are typically helpful, multi-task learning with unrelated task objectives can be harmful and deteriorate model performance. To alleviate this issue, instead of predicting all types of drugs simultaneously, for each type of drug, we perform multi-task learning with different combinations of types of drugs and evaluate the model performance on a development set. The combination which achieves the best performance is selected to train the model. While this brute-force algorithm can be computationally inefficient, it ensures that we only utilize the positive connections.

Multi-View Learning

As we have illustrated in the previous section, in our setting, we not only have access to the Facebook posts from all the homeless youth, but also have them complete a questionnaire that documents their biographic information as well as their answers to multi-choice questions like “How would you rate your perceived health?”. In order to utilize information from the questionnaire and combine it with the posts, we propose both early fusion and late fusion techniques.

Early Fusion. The idea of the early fusion strategy is to concatenate the features of posts and the features of the questionnaires into a single vector before feeding them to classifiers. After the concatenation, the classifier is trained using the same techniques as before.
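As a rough illustration of the early fusion idea, the sketch below concatenates a user's post feature vector with one-hot encodings of that user's survey answers. The function names, the question keys, and the answer choices are hypothetical and chosen only for this example; the paper does not specify how the survey schema is represented.

```python
from typing import Dict, List, Sequence


def one_hot(answer: str, choices: Sequence[str]) -> List[float]:
    """Encode one multiple-choice answer as a one-hot vector over `choices`.

    An answer that is not among `choices` yields an all-zero vector.
    """
    return [1.0 if answer == choice else 0.0 for choice in choices]


def early_fusion(
    post_vector: Sequence[float],
    survey_answers: Dict[str, str],
    survey_schema: Dict[str, Sequence[str]],
) -> List[float]:
    """Concatenate post features with one-hot survey features (early fusion)."""
    fused = list(post_vector)
    # Iterate in schema order so every user gets the same feature layout.
    for question, choices in survey_schema.items():
        fused.extend(one_hot(survey_answers[question], choices))
    return fused


# Hypothetical schema and user, for illustration only.
schema = {
    "lack_companionship": ["never", "sometimes", "often"],
    "perceived_health": ["poor", "fair", "good"],
}
post_vec = [0.2, -1.3, 0.7]  # e.g. a 3-dimensional user embedding
answers = {"lack_companionship": "often", "perceived_health": "fair"}

fused = early_fusion(post_vec, answers, schema)
print(fused)  # [0.2, -1.3, 0.7, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0]
```

The fused vector would then be passed to a single classifier. Because all views share one feature space, the classifier can in principle learn interactions between text features and survey answers, at the cost of a higher-dimensional input.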
Table 3
Index  Methods                        Marijuana  Cocaine  Heroin  Meth.  Ecstasy  Inject  Prescription  Average
1      Majority Voting                0.500      0.500    0.500   0.500  0.500    0.500   0.500         0.500
Using Posts
2      Ding, Bickel, and Pan (2017)   0.503      0.435    0.476   0.530  0.503    0.532   0.519         0.500
3      BERT (Devlin et al. 2019)      0.500      0.500    0.500   0.500  0.500    0.500   0.500         0.500
4      BOW                            0.532      0.435    0.491   0.523  0.533    0.468   0.545         0.504
5      SVD                            0.450      0.538    0.436   0.452  0.532    0.532   0.479         0.488
6      D-DBoW                         0.583      0.485    0.464   0.592  0.515    0.552   0.454         0.521
7      D-DBoW + Multi                 0.617      0.655    0.796   0.711  0.774    0.622   0.712         0.698
Using Survey Answers
8      Survey                         0.440      0.632    0.491   0.636  0.500    0.464   0.626         0.541
Combining Posts and Comments
9      Early Fusion                   0.668      0.594    0.724   0.641  0.706    0.713   0.648         0.671
10     Late Fusion                    0.681      0.617    0.732   0.632  0.723    0.779   0.747         0.702
Combining Posts, Survey Answers and Comments
11     Early Fusion                   0.702      0.680    0.750   0.633  0.694    0.728   0.747         0.704
12     Late Fusion                    0.677      0.648    0.815   0.712  0.694    0.826   0.728         0.728
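The scores in the table above are ROC-AUC values; the Experiments section states that weighted ROC-AUC under three-fold cross-validation is used. The sketch below shows one way such a score can be computed from predicted scores. How the authors weight the score is not specified, so the support-weighted average here (similar in spirit to the `average="weighted"` option of scikit-learn's `roc_auc_score`) is an assumption, and the function names and toy data are illustrative only.

```python
from typing import Dict, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive example is scored
    above a random negative example, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def weighted_auc(
    y_true: Dict[str, Sequence[int]],
    y_score: Dict[str, Sequence[float]],
) -> float:
    """Average per-label AUCs, weighting each label by its number of positives.

    Assumption: 'weighted' means support-weighted averaging across labels.
    """
    total, support_sum = 0.0, 0
    for name, labels in y_true.items():
        support = sum(labels)
        total += support * roc_auc(labels, y_score[name])
        support_sum += support
    return total / support_sum


# Toy data for two drug types (illustrative, not from the paper).
y_true = {"marijuana": [1, 1, 0, 0], "heroin": [1, 0, 0, 0]}
y_score = {"marijuana": [0.9, 0.4, 0.6, 0.1], "heroin": [0.8, 0.3, 0.5, 0.2]}

print(roc_auc(y_true["marijuana"], y_score["marijuana"]))  # 0.75
print(round(weighted_auc(y_true, y_score), 3))             # 0.833

# A constant predictor ties on every pair, which gives exactly 0.5.
print(roc_auc([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))         # 0.5
```

The last line shows why a model that outputs the same prediction for every user, such as the majority voting baseline, scores 0.500 in every column.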
Late Fusion. The late fusion strategy first trains separate classifiers on different views, then ensembles the classifiers together. Concretely, different classifiers are trained on different views. When making the final predictions, all the classifiers’ outputs are combined. In this paper, we have attempted a meta-classifier approach. The meta-classifier takes the output probabilities of both classifiers as input (one classifier for each view) and outputs the final probability, and is trained with the training samples.

Overall Algorithm

First, we use subword segmentation algorithms to segment words from Facebook posts into subwords. Afterwards, we convert the inputs into vectors by applying the bag-of-words, SVD, and document embedding with distributed bag-of-words algorithms. The converted vectors are then fed into a machine learning classifier, which is trained with multi-task learning. We also use one-hot representations for vectorizing survey answers. Then, for each user, we use either the early fusion or the late fusion algorithm to perform classification. We choose a decision tree as the base classifier in this paper, and we also try other options (see experiments).

Experiments

We evaluate our models on our collected dataset. In this section, we first describe our basic experimental settings and the baselines we compare our models with, then present the experimental results. We also conduct a fair amount of ablation studies and analysis to gain some insights into our model.

Settings

Datasets. The detailed description of our collected dataset can be found in the Dataset section. We removed users who do not have either survey responses or Facebook posts, resulting in 87 datapoints containing information from ∼25K Facebook posts and ∼66K comments. We lowercased all the texts after performing subword segmentation strategies, which resulted in 8K merge operations. For BPE-Dropout, we performed the algorithm three times with a dropout rate of 0.1, resulting in a dataset that is three times larger than the original dataset. It should be noted that we did not perform any kind of tokenization for subword-level models. We performed an analysis on a validation set and chose survey answers to the multi-choice questions “Why did you leave home or become homeless?”, “How often do you feel that you lack companionship?”, “I can share happy and sad moments with these friends”, “In your first 18 years of life, a parent or other adult in the household often pushed, grabbed, slapped or threw something at you” because the answers to these questions are correlated with substance use and they are not directly about demographic characteristics or substance use.

Implementation Details. Because of the scarcity of the data, we mainly tried decision trees instead of deep neural networks for classification. We have also attempted random forests to perform classification, as they have also been shown to be powerful models on small-scale datasets and can be easily adapted to multi-learning classification settings. The feature size is set to 50 for SVD and D-DBoW.

Baselines. We compared our model with three baselines: 1) majority voting; 2) the strongest model in Ding, Bickel, and Pan (2017); 3) BERT (Devlin et al. 2019), which achieves state-of-the-art performance on a variety of tasks. In addition, we conducted extensive ablation studies to
demonstrate the necessity of each component in our algorithmic framework. We used three-fold cross validation and weighted ROC-AUC scores to evaluate the performance of the model.

Main Results

Single-View Learning. We first trained models with data from only one view. As we can see from rows 1-7 in Table 3, our models are consistently better than all the baseline models. BERT cannot outperform the simple majority voting strategy, probably because BERT is mainly trained with Wikipedia data, and the huge domain differences between Wikipedia articles and social media can degrade the performance of the model. The strongest method in Ding, Bickel, and Pan (2017) (row 2) can outperform the majority voting mechanism to some extent, while being outperformed by our models. As the main difference between Ding, Bickel, and Pan (2017) and our methods is the adoption of the subword model, the improvements indicate the necessity of using subword models in our setting. Despite its simplicity, the bag-of-words model (row 4) can achieve reasonable performance compared with the simple baseline models. The SVD model (row 5), however, cannot improve upon the baseline in most cases, possibly because the compression can drop some valuable information that is useful for classification. The D-DBoW method (row 6), on the other hand, can improve upon the baseline by a large margin, which is consistent with the previous findings (Ding, Bickel, and Pan 2017). Multi-task learning (row 7) is highly beneficial in our setting, as it outperforms all the other methods significantly, which demonstrates the effectiveness of additional supervision from other tasks. Specifically, in the best scenarios, the model can achieve a ROC-AUC score of 0.774, improving the next best baseline by 0.241 points. Using survey answers alone (row 8) can achieve ROC-AUC scores over 0.6 on three types of drugs, which is quite effective compared to other methods. The results are intuitive, as information such as people’s emotional stability and social life can reveal if they are engaging in substance use.

Multi-View Learning. As we have described above, a natural idea is to combine information from different views. In our setting, we have access to people’s posts and comments on Facebook, as well as their survey answers, and we have tried to combine information from these “views”. As we can see from the results (rows 9-10) in Table 3, combining posts and comments does not always help, probably because there is some overlap between comments and posts. As demonstrated in rows 11-12, because there is greater diversity among survey answers and social media texts, adding survey answers to the fusion can improve the model performance, with the best ROC-AUC score being 0.826. In addition, comparing the early fusion and late fusion strategies, we find that late fusion generally outperforms early fusion, indicating that combining high-level information is better than fusing low-level features in our setting.

Ablation Studies

We also did a fair amount of ablation studies, and the results are shown in Table 4. First, we take the D-DBoW model and remove subword segmentation. Instead, we use the Twitter-aware tokenizer in the NLTK package (https://www.nltk.org/api/nltk.tokenize.html), which is designed to be flexible and easy to adapt to new domains and tasks, to segment posts into sequences of words. We can see from the table that this modification significantly degrades the performance, which shows that the use of subword segmentation algorithms is necessary. As mentioned in the Methods section, we also attempted to utilize the BPE-Dropout algorithm. As shown in the table, surprisingly, the adoption of the BPE-Dropout algorithm leads to degraded performance. We conjecture that this is because the augmented datapoints are similar to the original ones, which can cause the model to overfit the training data. Next, we tried to use random forests as our classification algorithm (instead of decision trees). However, we can see that the adoption of random forests does not improve the model performance. One possible explanation is that since the dimension of features is small (< 30), and predictions of every tree in the random forest are correlated with each other, a combination of these trees can result in a relatively poor generalization ability. We also tried to provide the number of posts as one feature to the model. However, this leads to degraded performance, which suggests that this may not be a reasonable feature in our setting.

Table 4
Variant          Marijuana  Cocaine  Heroin  Methamphetamines  Ecstasy  Needles  Prescription  Average
D-DBoW           0.583      0.485    0.464   0.592             0.515    0.552    0.454         0.521
-subword         0.503      0.435    0.476   0.530             0.503    0.532    0.519         0.500
+BPE-Dropout     0.513      0.464    0.509   0.601             0.482    0.485    0.519         0.512
+random forest   0.519      0.487    0.500   0.500             0.500    0.589    0.553         0.521
+#post           0.480      0.429    0.491   0.457             0.532    0.474    0.552         0.488

Analysis

Associations between Word Usages/Survey Answers and Substance Use. We first check the associations between word usages or survey answers and substance use. It should be noted that here we just classify people into two types, namely substance users and non-substance users, without considering which specific type of substance they are using. We compute the correlations by directly training a linear SVM classifier with the number of appearances of one
Table 5: Associations between words (or survey answers) and substance use.

Words
  Non-Substance User: sincerely (0.611), love (0.549), ...
  Substance User: sucking (0.535), ’.’ (0.526)

Survey Answers
  Non-Substance User: I can share my happy and sad moments with friends (0.611)
  Substance User: In your first 18 years of life, a parent or other adult in the household often push, grab, slap or throw something at you, or ever hit you so hard you had marks or were injured. (0.611)

Sentences
  Non-Substance User: “So cute!! I want one!! <3.”; I miss my Daughter; My favorite person in the world.
  Substance User: “They either don’t know, don’t understand or don’t care.”; Smoke weed every day
Ethical Impact

Our purpose is to examine discrete types of drug use rather than patterns of drug use. It is important to understand what kind of substance homeless youth are using because of the implications and consequences of use. Engagement in some forms of substance use, such as meth, heroin, and injection drug use, is known to have more dire physical and mental health effects than many other commonly used drugs.

Substance abuse is a highly significant public health and social problem in the United States (Tabar et al. 2020; Yadav et al. 2020). While substance abuse is a debilitating problem in its own right, even more importantly, it is a key causative issue for a whole host of other problems faced by homeless youth in their lives, e.g., substance abuse has been shown to increase the likelihood of (i) exposure to STIs; (ii) unstable mental health, etc. As a result, it becomes very important to tackle substance use and abuse among homeless youth from a policy planner’s and practitioner’s perspective. Furthermore, the addictive tendencies associated with substance use mean that it is more cost-effective to prevent substance use before youth get addicted (through proactive interventions) as opposed to treating youth medically after they fall into addiction (through reactive interventions). Social media may offer a powerful opportunity for accessing, educating, and intervening with this typically hard-to-reach group with extremely high rates of drug use. Within this context, we place our work as one of the first attempts at using Facebook data to get weak, low-cost (but hopefully accurate) signals which can be used to identify homeless youth at risk of substance abuse in the near future.

It is important to consider how the findings of this study can be applied to substance use prevention in real-world settings (i.e., non-profit agencies serving homeless youth). One option is to consider engaging Facebook in efforts to use such algorithms to flag users’ substance use behaviors. Facebook already uses its own algorithm to detect suicidal ideation. However, such efforts by Facebook have recently become mired in controversy because of concerns with privacy, transparency and ethical issues.

A potential alternative to engaging Facebook would be to create a screening tool that is less likely to violate such privacy and ethical standards and engage agencies that serve this population in utilizing and deploying this tool. For example, most agencies that serve young people who experience homelessness have some kind of an intake process, where youth are screened for various needs and health risks, including substance use. Our algorithmic screening tool can easily be integrated into existing intake processes. These intake processes typically rely on intensive self-reported surveys to screen for substance use. One option is to provide a software tool (or a phone application) that non-profit agencies serving homeless youth can download on their computers/phones and that contains (and runs) our algorithm. When homeless youth are signing up to receive services at these agencies, they can be asked to volunteer their Facebook conversations along with some simple survey questions to

out would not prevent them from accessing services at that agency. Agencies already have existing protocols that prevent such coercive practices, and we can make this opt-out option a part of that protocol. To keep their information safe, their data would be destroyed from the computers once the analysis is run. This would allow agencies to screen young people for substance use without the same privacy and transparency concerns associated with utilization of social media data or the burden associated with intensive surveys. In addition, these agencies may provide online service tools through creating a social media account, and the participants can make some or all of their social media conversations visible to the account. Thus, our algorithm can be run frequently to identify the participants in need of help and support in a timely fashion without asking the participants to fill in lengthy surveys repeatedly.

It should be noted that our proposed system could potentially be misused. For instance, leakage of private Facebook data is a concern, as it means that our system could be used by malicious actors. Also, agencies serving this population might stigmatize youth who are screened as potential drug users and deny them services. Additional efforts are required to prevent the system from being misused.

In addition, it is inevitable that our system could make mistakes and may learn false correlations between Facebook posts and labels. As shown in the analysis section, our model can balance between true and false negative rates by tuning the threshold. Therefore, our model can be adapted to the specific needs of serving agencies. Concretely, if a false negative is more costly than a false positive, we can set the prediction rate to a high value, and vice versa.

References

Aramaki, E.; Maskawa, S.; and Morita, M. 2011. Twitter catches the flu: detecting influenza epidemics using Twitter. In EMNLP, 1568–1576.

Barman-Adhikari, A.; Bowen, E.; Bender, K.; Brown, S.; and Rice, E. 2016. A social capital approach to identifying correlates of perceived social support among homeless youth. In Child & Youth Care Forum, 691–708.

Bender, K.; Brown, S. M.; Thompson, S. J.; Ferguson, K. M.; and Langenderfer, L. 2015. Multiple victimizations before and after leaving home associated with PTSD, depression, and substance use disorder among homeless youth. Child Maltreatment 115–124.

De Lathauwer, L.; De Moor, B.; and Vandewalle, J. 2000. A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications 1253–1278.

Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL, 4171–4186.

Ding, T.; Bickel, W. K.; and Pan, S. 2017. Multi-view unsupervised user feature embedding for social media-based
screen them for substance use. To prevent any potential for substance use prediction. In EMNLP, 2275–2284.
coercion, youth would be given the option to opt-out of Gerber, M. S. 2014. Predicting crime using Twitter and ker-
the screening if they had any concerns. However, opting- nel density estimation. Decision Support Systems 115–125.
Guadagno, R. E.; Muscanell, N. L.; and Pollio, D. E. 2013.
The homeless use Facebook?! Similarities of social network
use between college students and homeless young adults.
Computers in Human Behavior 86–89.
Jones, S.; and Fox, S. 2009. Generations online in 2009.
Pew Internet & American Life Project.
Kennedy, D. P.; Wenzel, S. L.; Tucker, J. S.; Green, H. D.;
Golinelli, D.; Ryan, G. W.; Beckman, R.; and Zhou, A. 2010.
Unprotected sex of homeless women living in Los Angeles
County: An investigation of the multiple levels of risk. AIDS
and Behavior 960–973.
Le, Q.; and Mikolov, T. 2014. Distributed representations of
sentences and documents. In ICML, 1188–1196.
Marshall, B. D.; Galea, S.; Wood, E.; and Kerr, T. 2011.
Injection methamphetamine use is associated with an
increased risk of attempted suicide: a prospective cohort study.
Drug and alcohol dependence 134–137.
Michel, P.; and Neubig, G. 2018. MTNT: A Testbed for
Machine Translation of Noisy Text. In EMNLP, 543–553.
Nyamathi, A.; Hudson, A.; Greengold, B.; and Leake, B.
2012. Characteristics of homeless youth who use cocaine
and methamphetamine. American journal on addictions
243–249.
Rice, E.; Barman-Adhikari, A.; Rhoades, H.; Winetrobe, H.;
Fulginiti, A.; Astor, R.; Montoya, J.; Plant, A.; and Kordic,
T. 2013. Homelessness experiences, sexual orientation, and
sexual risk taking among high school students in Los
Angeles. Journal of Adolescent Health 773–778.
Rice, E.; Milburn, N. G.; and Monro, W. 2011. Social
networking technology, social network composition, and
reductions in substance use among homeless adolescents.
Prevention Science 80–88.
Rice, E.; Monro, W.; and Barman-Adhikari, A. 2010.
Internet use, social networking, and HIV/AIDS risk for homeless
adolescents. Journal of Adolescent Health 610–613.
Sennrich, R.; Haddow, B.; and Birch, A. 2016. Neural
machine translation of rare words with subword units. In
ACL, 1715–1725.
Tabar, M.; Park, H.; Winkler, S.; Lee, D.; Barman-Adhikari,
A.; and Yadav, A. 2020. Identifying Homeless Youth
At-Risk of Substance Use Disorder: Data-Driven Insights for
Policymakers. In KDD, 3092–3100.
Toro, P. A.; Lesperance, T. M.; and Braciszewski, J. M.
2011. The heterogeneity of homeless youth in America:
Examining typologies. National Alliance to End Homelessness.
Yadav, A.; Singh, R.; Siapoutis, N.; Barman-Adhikari, A.;
and Liang, Y. 2020. Optimal and Non-Discriminative
Rehabilitation Program Design for Opioid Addiction Among
Homeless Youth. In IJCAI, 4389–4395.
Zhou, Y.; Sani, N.; Lee, C.-K.; and Luo, J. 2016.
Understanding illicit drug use behaviors by mining social media.
arXiv preprint arXiv:1604.07096.