Learning Chemical Sensitivity Reveals Mechanisms of Cellular Response

Download as pdf or txt
Download as pdf or txt
You are on page 1of 13

communications biology Article

https://doi.org/10.1038/s42003-024-06865-4

Learning chemical sensitivity reveals


mechanisms of cellular response
Check for updates
1,2,3 3,4,5,6 3,4,5,6 1,2,3
William Connell , Kristle Garcia , Hani Goodarzi & Michael J. Keiser

Chemical probes interrogate disease mechanisms at the molecular level by linking genetic changes to
observable traits. However, comprehensive chemical screens in diverse biological models are
impractical. To address this challenge, we develop ChemProbe, a model that predicts cellular
sensitivity to hundreds of molecular probes and drugs by learning to combine transcriptomes and
1234567890():,;
1234567890():,;

chemical structures. Using ChemProbe, we infer the chemical sensitivity of cancer cell lines and tumor
samples and analyze how the model makes predictions. We retrospectively evaluate drug response
predictions for precision breast cancer treatment and prospectively validate chemical sensitivity
predictions in new cellular models, including a genetically modified cell line. Our model interpretation
analysis identifies transcriptome features reflecting compound targets and protein network modules,
identifying genes that drive ferroptosis. ChemProbe is an interpretable in silico screening tool that
allows researchers to measure cellular response to diverse compounds, facilitating research into
molecular mechanisms of chemical sensitivity.

Chemical probes are highly potent small molecules that selectively target expression profile7. However, significant improvements have been
known mechanism-of-action proteins1. These tools are crucial for achieved by incorporating multimodal information, such as chemical
understanding the role of specific proteins in biological processes and structure and pharmacological features8,9. These advancements have
diseases, and have been instrumental in investigating a range of functions demonstrated the value of integrating diverse types of data to enhance
such as those related to the cell cytoskeleton, immunosuppression, drug response prediction.
mTOR signaling, protein kinase dynamics, and have often served as the Deep learning has become a way to effectively represent and integrate
starting point for drug development1–3. In addition to their primary use as diverse feature sets. These methods commonly employ separate feature
therapeutic agents, drugs can serve as chemical probes in complex dis- encoders that learn rich representations prior to integration10,11. For
eases like cancer. Addressing cancer heterogeneity necessitates precision example, variational auto-encoders (VAEs) can leverage pretraining for
clinical treatment strategies and research into the mechanisms that transfer learning12–15. More broadly, neural networks are adaptable to novel
control disease resistance and sensitivity4,5. By improving our under- inputs, such as graph representations for chemical structures16–18, and their
standing of gene expression patterns contributing to variance in drug composability enables feature integration techniques like cross-
response, we can develop better solutions for cancer patients exploiting attention18,19.
specific tumor vulnerabilities. Interpreting the computational structure of predictive models
Ideally, we could test large libraries of chemicals on disease models, themselves can inform on the underlying biology of compound response.
engineered cell lines, and patient samples to probe disease mechanisms. On one hand, ensemble models offer confidence scores, and direct
However, screening biological samples against a large library of chemical interpretation of model coefficients (e.g., attention matrices) reveals
probes is resource-prohibitive. To overcome this problem, a variety of feature relationships19–22. Gradient-based attribution methods can also
traditional machine-learning methods have been applied to predict drug help identify features driving a particular prediction15. On the other hand,
response, including support vector machines (SVMs), random forests integrating biological priors into neural networks effectively reduces the
(RFs), and multi-layer perceptrons (MLPs)6. Early approaches often feature space and incorporates interpretable features, such as gene
relied on a single cellular feature set, such as mutation status or gene ontologies and pathway annotations23–25. By imposing constraints

1
Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, USA. 2Institute for Neurodegenerative Diseases, University
of California, San Francisco, San Francisco, CA, USA. 3Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA,
USA. 4Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA, USA. 5Department of Urology, University of
California, San Francisco, San Francisco, CA, USA. 6Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA,
USA. e-mail: [email protected]

Communications Biology | (2024)7:1149 1


https://doi.org/10.1038/s42003-024-06865-4 Article

Fig. 1 | ChemProbe design and model interpretation. a Workflow of model of the conditional neural network (ChemProbe) trained to predict cell line viability
training, validation, prediction, dose-response modeling, and feature attribution. from molecular features and compound structure. ChemProbe learns an embedding
We trained a deep neural network model to predict drug sensitivity at specific of protein-coding gene expression features conditioned by parameters learned from
compound concentrations and fit log-logistic models to predictions. We derived an embedding of compound structure and concentration. c Decomposition of
compound pharmacodynamics from dose-response curves and applied integrated learned conditioning parameters. Points represent compound-concentration sam-
gradient saliency mapping to predicted IC50 to derive input feature attributions. ples; color indicates compound; size indicates concentration; and shape indicates
IC50, inhibitory concentration of compound at 50% cellular viability. b Architecture parameter.

through priors, learning operates in the context of existing biological Results


knowledge. However, this approach can limit a model’s capacity to learn Conditional modeling enhances cellular drug sensitivity
novel gene combinations and systems mechanisms that are not well prediction
understood. We hypothesized that a deep neural network model could learn to combine
We developed a conditional deep-learning model that predicts the gene expression with chemical structure to predict cellular sensitivity
sensitivity of new cellular samples to a panel of chemicals along with a (Fig. 1a). We leveraged publicly available datasets to match cancer cell line
framework for understanding the gene features implicated in the basal transcriptomes with a large-scale chemical screen. The Cancer Ther-
response. While previous research has addressed the tradeoffs among apeutics Response Portal (CTRP) reports the viability of 842 cancer cell lines
various input feature modalities, few studies explore how diverse feature in response to 545 compounds and compound pairs across a range of
sets are integrated. Moreover, little work has been done to explore if the concentrations26,27. These compounds span cell circuitry targets, offering a
features relied on by well-performing models indeed reflect expected nuanced view of cellular response to pathway perturbations across a broad
biological relationships. Our work investigates methods for integrating range of cellular components. The Cancer Cell Line Encyclopedia (CCLE)
biological and chemical features to improve drug sensitivity prediction provides basal transcriptomic characterizations of all 842 CTRP cell lines28.
and assesses the utility of model interpretation methods for advancing We combined compound structures and concentrations from the CTRP
biological discovery. ChemProbe learns to combine cellular tran- with protein-coding gene transcriptomes from the CCLE to create a dataset
scriptomes and chemical structures to predict sensitivity. The model can of compound-cell line pairs consisting of approximately 5.8 million labeled
be applied to new biological samples and leverages integrated gradients to examples (Methods).
generate interpretable learned gene features relevant to known com- We formulated the cellular drug sensitivity prediction task as a con-
pound mechanisms. ChemProbe accurately models chemical response ditional model y ¼ f ðxjnÞ, where y is cellular viability, x is a matrix of
without biological priors, enabling in silico chemical screening of biolo- standardized RNA abundance values, n is a matrix of chemical features, and
gical models and mechanistic interpretation of learned gene f is parameterized by a neural network (Methods). Thus the model’s pre-
dependencies. diction of cellular viability depends on a cell’s transcriptomic profile in the

Communications Biology | (2024)7:1149 2


https://doi.org/10.1038/s42003-024-06865-4 Article

Table 1 | Predictive performance The I-SPY2 dataset introduced a significant change in the input data
modality and allowed us to assess the robustness of ChemProbe. Unlike the
Model R2 CCLE training data, which quantified gene expression through high-
Concatenation 0.6066 ± 0.0165 throughput RNA sequencing, I-SPY2 collected pre-treatment patient gene
Shift 0.7060 ± 0.0304 expression by microarray. Microarrays have lower overall specificity and
Scale 0.7113 ± 0.0081 sensitivity and capture a smaller dynamic range of gene abundance34. Neural
networks often fail on “out of distribution” samples whose features (gene
FiLM 0.7089 ± 0.0040
abundance values) come from different assays than their training data. To
Sructural ablation 0.3016 ± 0.0304
determine the extent to which the I-SPY2 data was outside ChemProbe’s
Average performance and standard error of 5 models trained across identical data folds. training distribution, we compared dataset expression profiles across the top
two principal components. A subset of the I-SPY2 data fell outside the
training data distribution, consistent with the expectation that assay types
context of a chemical structure and concentration. ChemProbe predicts introduce systematic measurement effects (Methods, Supplementary Fig. 2a
viability by learning to use chemical features to modulate gene expression and Supplementary Data 4).
through linear transformations of internal gene expression representations We assessed whether ChemProbe could retrospectively stratify I-SPY2
(Fig. 1b). This enables a logic akin to chemical substructures modulating responders and non-responders despite these differences in datasets. We
gene products (proteins). We tested several ways to combine cellular fea- compared ChemProbe predictions with the original treatment allocations in
tures and chemical information within a single model, as assessed by the the I-SPY2 trial, which were determined by standard biomarkers of the
average maximum coefficient of determination (R2). Accounting for com- participants’ tumors. Five drugs from the I-SPY2 trial were in ChemProbe’s
parable model sizes, we trained, validated, and hyperparameter-optimized panel. We first compared the magnitudes of ChemProbe’s sensitivity pre-
different model architectures across five data folds stratified by cell line (five- dictions between responders and non-responders. For four out of five drugs,
fold cross-validation, Methods). We compared three methods of learned ChemProbe predicted lower scaled-AUC values for the responder group
feature conditioning against a baseline feature concatenation approach29. All (Fig. 2a and Supplementary Data 5). Next, we generated receiver operating
conditioning approaches outperformed feature concatenation by a notable characteristic (ROC) curves to compare the drug response predictions of
margin (Table 1 and Supplementary Data 1). Among the conditioning I-SPY2 and ChemProbe with the trial outcomes. We used the treatment
models, scaling, shifting, and linearly modulating gene expression by che- designations from I-SPY2 as a proxy for drug response prediction.
mical features performed similarly. A t-distributed stochastic neighbor ChemProbe’s area under the ROC curve for each drug ranged from 0.60
embedding (t-SNE) decomposition of learned parameters demonstrated (paclitaxel and neratinib) to 0.73 (veliparib), with a macro-average auROC
that scaling and shifting operations encoded distinct chemical features of 0.65 (Fig. 2b and Supplementary Data 5).
(Fig. 1c; Supplementary Data 2). Hierarchical clustering of scaling (γ) To evaluate the clinical utility of ChemProbe, we used it to classify
parameters grouped compounds by identity (Supplementary Fig. 1a and patients by treatment response: responders (+) and non-responders (−)
Supplementary Data 3), whereas compound concentration correlated with (ChemProbe + /−). Since the model determines cellular viability by drug
the first principal component of shifting (β) parameters (p = 1.72e-55; concentration, we established a decision threshold for detecting responders
Supplementary Fig. 1b and Supplementary Data 3). Thus the learned (Methods). ChemProbe + /− classification accuracy significantly out-
conditioning parameters interpretably reflected compound structure and performed I-SPY2 (p < 5e-2; Fig. 2c and Supplementary Data 5). Although
concentration in the drug-response modeling task as an emergent property I-SPY2 predictions had a higher true positive rate (0.30, I-SPY2; 0.21,
of model learning. ChemProbe), ChemProbe + /− classifications massively reduced the false
Cellular response commonly follows a sigmoidal relationship to positive rate (0.70, I-SPY2; 0.37, ChemProbe) with relatively few false
drug concentration. To quantify whether compound dosage alone was negatives (0.00, I-SPY2; 0.095, ChemProbe) (Supplementary Fig. 2b, c and
driving drug sensitivity predictions, we performed a feature ablation Supplementary Data 4). By correctly predicting a portion of patients with a
experiment, wherein we purposefully removed crucial data from the low likelihood of drug response, ChemProbe + /− significantly increased
model’s training and compared it to the actual model. For the “straw the true negative rate of drug-response classification relative to I-SPY2,
model,”30,31 we replaced chemical fingerprints with unique but structu- providing crucial information for clinical decision-making. Despite being
rally uninformative and randomized numerical values. The “straw trained only on isogenic cell lines, these results support ChemProbe’s use
model” trained on ablated features failed, underscoring the importance of with heterogeneous tumors from clinical patient samples.
compound structural features in the modeling task (Table 1 and Sup-
plementary Data 1)31. Explicitly modeling chemical information as ChemProbe predicts cellular drug sensitivity
conditioning provides a valuable inductive bias for chemical sensitivity We conducted a prospective evaluation of ChemProbe’s ability to differ-
prediction and gives insights into the predictive mechanisms of the entiate drug sensitivity between two primary breast cancer cell lines,
model. We hyperparameter-optimized 5 FiLM models across cell-line HCC1806-Par and MDA-MB-231-Par35–38. We compared the gene
stratified data folds and used this ensemble (0.7173 ± 0.0052 R2) of expression profiles of the two cell lines to their CCLE counterparts by
models in subsequent experiments (Methods). analyzing the top two gene-expression principal components. Our analysis
showed significant disparities in the gene expression patterns of the two cell
ChemProbe predicts breast cancer patient response lines compared to the training data, highlighting the challenges of main-
We next asked whether learned transcriptional patterns would generalize to taining consistency across cellular models (Supplementary Fig. 3a and
an in vivo cellular context. We measured how well ChemProbe, trained Supplementary Data 6). The observed differences, which make the pro-
solely on cell line expression profiles, could predict drug response in clinical spective test more difficult but particularly informative, may be attributed to
tumor samples. We used gene expression and patient-drug response data variations in cell culture protocols, reagents, and genetic drift commonly
from the I-SPY2 adaptive, randomized, phase II clinical trial of neoadjuvant found between experimental settings39.
therapies for early-stage breast cancer (NCT01042379)32,33. I-SPY2 assigned We predicted sensitivity at 32 drug concentrations (1e-3 µM–300 µM),
patients to treatment arms based on biomarkers such as hormone receptor fit log-logistic models, and determined 50% inhibitory concentration (IC50)
status, human epidermal growth factor receptor-2 expression, and Mam- values from each in silico dose-response curve (Fig. 3a). ChemProbe pre-
maPrint status. The absence of invasive cancer in the breast and regional dicted that HCC1806-Par would be more sensitive than MDA-MB-231-Par
lymph nodes at the time of surgery defined the endpoint of pathological to 88.16% (201/228) of the compounds with fitted curves (Fig. 3b and
complete response (pCR) (nonresponse, pCR = 0; response, pCR = 1). Supplementary Data 7). We focused on compounds with the largest

Communications Biology | (2024)7:1149 3


https://doi.org/10.1038/s42003-024-06865-4 Article

Fig. 3 | Differential potency and in silico dose-response curve predictions.


a Approach to model training and dose-response modeling. We trained individual
models on held-out cell line dataset splits by 5-fold cross-validation. We then fit log-
logistic models to cross-validated model predictions and derived pharmacodynamic
features. b Expected cumulative distribution plot of predicted compound IC50
differences between HCC1806-Par and MDA-MB-231-Par cell lines. Compounds
selected for in vitro dose-response testing are highlighted. c Predicted dose-response
relationships of HCC1806-Par and MDA-MB-231-Par response to neratinib (n = 5
Fig. 2 | I-SPY2 clinical trial retrospective analysis. a Predicted dose-response AUC independent samples) and d 1S,3R-RSL-3 (n = 5 independent samples). 95% con-
for I-SPY2 patients treated with each drug. AUCs scaled between the minimum and fidence intervals.
maximum predicted AUC of patients treated with each drug. Blue = non-respon-
der, orange = responder; centerline, median; box limits, upper and lower quartiles;
whiskers, 1.5x interquartile range; points, outliers; two-sided Wilcoxon rank-sum
differences in IC50 between the cell lines, selecting four compounds pre-
test; ns: p < =1e1, *p < =5e-2, **p < =1e-2. Sample sizes: paclitaxel (n = 38 inde-
pendent case samples, n = 172 independent control samples); neratinib (n = 41
dicted to have strong IC50s against HCC1806-Par (neratinib, ceranib-2,
independent case samples, n = 73 independent control samples); MK2206 (n = 35 CAY10618, and AZD7762) and two compounds with IC50s favoring
independent case samples, n = 59 independent control samples); veliparib (n = 27 MDA-MB-231-Par (ML162 and 1S,3R-RSL-3) (Fig. 3c, d; Supplementary
independent case samples, n = 44 independent control samples); carboplatin Fig. 4 and Supplementary Data 7). In vitro, prospective testing confirmed
(n = 27 independent case samples, n = 44 independent control samples). b Receiver ChemProbe’s predictions for all six compounds. Predicted differences in
operating characteristic curve of patients treated with each drug and corresponding IC50s between the two cell lines significantly correlated with observed dif-
auROC. c Accuracy of I-SPY2 (n = 5 independent samples) predictions versus ferences (Fig. 4a–f, p = 0.035; Fig. 4g, Supplementary Data 8). We compared
ChemProbe (n = 5 independent samples) predictions for non-responders/respon- the relative potency of each compound at the median effective dose (ED50)
ders. Centerline, median; box limits, upper and lower quartiles; whiskers, 1.5x between cell lines, finding significant differences in compound cellular
interquartile range; points, outliers); two-sided Wilcoxon rank-sum test; viability as predicted (Table 2). Additionally, predicted IC50s correlated
*p < =5e-2. highly with measured IC50s for individual cell lines after correcting for an

Communications Biology | (2024)7:1149 4


https://doi.org/10.1038/s42003-024-06865-4 Article

Fig. 4 | Validation of differential potency predictions. a–d In vitro dose-response independent experiments). 95% confidence intervals. g Relationship between the pre-
relationships of HCC1806-Par differentially potent compounds (n = 4 independent dicted and observed differences in IC50 values of tested compounds between HCC1806-
experiments) and e, f MDA-MB-231-Par differentially potent compounds (n = 4 Par and MDA-MB-231-Par cell lines. Two-sided t-test (n = 6 independent experiments).

experimental outlier (p = 0.096, Supplementary Fig. 5a and Supplementary undermining their usefulness in quantifying feature importances44. We used
Data 9). These results were consistent with initial concentration range- principal component analysis to determine the first two principal compo-
finding experiments, where five out of six compounds had shown predicted nents of attribution vectors by cell line to check this. We found that attri-
differences in IC50s and relevant IC50s specific to individual lines (Sup- bution and transcriptome vectors correlated (Supplementary Fig. 6a, c and
plementary Fig. 5b, c and Supplementary Data 9). ChemProbe accurately Supplementary Data 10). Although some gene expression magnitudes may
predicted the sensitivities of independently obtained and characterized cell linearly correlate with phenotypes, this framework does not capture other
line samples, despite their transcriptomic profiles differing from the training causal nonlinear interactions within gene regulatory networks. A well-
dataset. calibrated interpretation method should attribute significance to features
based on their relevance to the model’s predictions, rather than solely on
Gene expression attribution vectors pass interpretability their expression levels. To address this confounder, we normalized attri-
soundness checks bution vectors by cell line, which decreased cell-line-specific effects in the
To assess whether ChemProbe learned biologically relevant patterns, we principal component analysis and decoupled the correlation between
investigated whether the model’s gene expression saliency reflected known attribution and transcriptome vectors (Methods, Supplementary Fig. 6b, c
compound pharmacology and network biology. Saliency mapping methods and Supplementary Data 10).
evaluate which gene features a model puts the most “weight” on when To ensure model interpretation accurately reflected learned feature
making its predictions and how these choices change for different cell lines transformations, we performed two tests (Methods)44. The first test ran-
or compounds. We experimentally characterized seven new cell line tran- domly initialized the model parameters and compared the outcomes with
scriptomes, including primary tissue models and metastatic derivatives40–42. the true-model’s attribution vectors. We trained the model using scrambled
Using integrated gradient saliency mapping43, we determined IC50 values labels in the second test and compared its attribution vectors with the true-
for each compound-cell line pair with ChemProbe and computed its gene model’s. We conducted these tests using both uncorrected (raw) and cell
attribution vectors. However, we acknowledge that integrated gradient line-effect corrected (adjusted) attribution vectors. The raw attribution
attribution vectors may correlate with input feature magnitudes, potentially vectors were highly correlated with transcriptome profiles, random-model,
and permuted-model attribution vectors, failing the tests of independence
Table 2 | Relative potency at median effective dose from learned parameters and the training data. However, the adjusted
attribution vectors were not correlated with those derived from the control
Compound ED50 ratio (HCC1806-Par/MDA- t-value p-value models, indicating that adjusted attribution vectors do not simply reflect
MB-231-Par)
data or architecture artifacts (Supplementary Fig. 6c and Supplementary
Neratinib 0.4946 ± 0.2426 −2.0830 4.0220e-2
Data 10).
Ceranib-2 0.5165 ± 0.1943 −2.4887 1.4700e-2
CAY10618 0.2089 ± 1.869e-2 −42.3233 1.1593e-59 Learned transcriptomic features reflect compound pharmacol-
AZD7762 0.5639 ± 0.1004 −4.3430 3.7500e-5 ogy and network biology
Neural networks are notoriously difficult to interpret, but we hypothesized
1S,3R-RSL-3 2.1123 ± 0.4446 2.5244 1.3384e-2
that ChemProbe’s highly attributed gene features may reflect causative
ML162 3.008 ± 0.7041 2.8521 5.4120e-3 mechanisms or correlative biomarkers of drug sensitivity. First, we inves-
Relative potency of each compound at the median effective dose (ED50) between cell lines. tigated whether the model relied on similar gene features for compounds

Communications Biology | (2024)7:1149 5


https://doi.org/10.1038/s42003-024-06865-4 Article

with the same known protein targets. We created a control compound set Changes in compound sensitivity following gene knockout (KO) or
(CCS) based on nominal target classes, each with at least two compounds overexpression can inform on mechanisms of gene-dependent protection or
successfully predicted in all seven cell lines. We applied K-means clustering resistance. Accordingly, we assessed ChemProbe’s utility for screening gene-
to the CCS attribution vectors and computed the adjusted mutual infor- dependent ferroptosis resistance in silico. Lipoprotein receptor LRP8 has
mation (AMI) between clusters and target class labels to determine whether recently been shown to act as a ferroptosis resistance factor by maintaining
transcriptomic attribution vector similarity corresponded to known com- cellular selenium levels and appropriate translation of GPX4. Selenium
pound mechanisms of action (MOA) (Fig. 5a and Supplementary Data 11). uptake is reduced in LRP8 KO models, leading to ribosome stalling and early
We also examined the AMI between structural clusters and target classes, as translation termination of GPX4, which sensitizes cells to ferroptosis49. We
chemical structure similarity alone may at times reflect target profile simi- tested if ChemProbe correctly predicted that an LRP8 KO cell line would
larity, albeit imperfectly45. have reduced sensitivity to ferroptosis-inducing compounds than a wild-
The attribution vector clusters AMI was significantly greater than that type. Consistent with previous research, ChemProbe predicted LRP8 KO
of structural clusters, a randomly initialized model, and a model trained on cells were more sensitive than wild-type to known ferroptosis-inducing
permuted labels (Fig. 5b, Methods; Supplementary Data 11). Moreover, we compounds ML210, 1S,3R-RSL-3, ML162, and CIL56 (Fig. 6d and Sup-
found that compounds belonging to the same target class frequently had plementary Data 13).
high nominal target attributions relative to other compounds, indicating We noticed several correlations between cellular response and the
that ChemProbe often made predictions based on the expression infor- expression of highly attributed genes for compounds that induce ferroptosis
mation of nominal targets (Fig. 5c; Supplementary Fig. 6d–k and Supple- (Supplementary Fig. 8b, c). We wondered if highly attributed genes played
mentary Data 10–11). functional roles related to ferroptosis. We extracted the ten highest differ-
We next examined the network topology of nominal target classes entially attributed genes and applied a functional enrichment analysis
using the STRING database of high-confidence protein–protein (Supplementary Fig. 8d and Supplementary Data 14). We observed the
interactions46 to interrogate biological relevance. We clustered attribution enrichment of terms related to lipid transport and fatty acid metabolic
vectors, gathered target annotations within each cluster, and queried processes, pathways adjacent to lipid peroxidation, and ferroptosis (Fig. 6e
STRING for the respective target interactome (Fig. 5d, Methods; Supple- and Supplementary Data 13). These results indicate that transcriptomic
mentary Data 11). Target modules had significantly greater connectivity attributions align with ferroptosis biology, underscoring the potential of
than modules generated from randomly sampled target protein sets or ChemProbe in screening genetic dependencies and identifying novel bio-
randomly sampled protein sets (Fig. 5e; Supplementary Data 11). Finally, we logical mechanisms.
tested whether attribution-defined target modules of action (ModOA) also
showed protein interaction enrichment. On analysis, 10/26 ModOA Discussion
reflected significant network interaction enrichment and a variety of func- Using a conditional deep-learning approach, ChemProbe evaluates cel-
tional enrichments from gene ontologies, KEGG pathways, and Reactome lular transcriptomic signatures against bioactive molecular structures to
pathways (Fig. 5f; Supplementary Fig. 7 and Supplementary Data 11, 12). predict cellular responses to chemical perturbations. In experiments on
These findings suggest that highly attributed transcriptome features reflect cellular models and clinical tumor samples, this tool accurately predicts
systems biology and potential mechanisms of drug response. cellular viabilities. ChemProbe complements more clinically oriented
approaches with its ability to directly screen engineered cell lines and
Screening genetic dependencies for mechanisms of ferroptosis interrogate potential molecular mechanisms. Engineered cell lines, which
We further hypothesized that ChemProbe’s highly attributed gene features possess specific genetic modifications or alterations, physically model
would relate to compound MOA. To test this, we used linear regression for disease conditions or the results of targeting pathways of interest. By
differences in gene attribution between groups. This “differential attribution leveraging ChemProbe, researchers can evaluate the sensitivity of known
analysis” (DAA; see Methods) generates ranked gene lists, which we use as and newly engineered lines to a panel of chemical probes to assess how
marker genes to arrange attribution clusters hierarchically (Fig. 6a and specific genetic modifications or alterations influence compound
Supplementary Data 13). We noticed clusters 26 and 28 showed different response.
prediction sensitivity to ferroptosis-inducing compounds (Fig. 6b and Intriguingly, deep-learning model interpretation reflects compound
Supplementary Data 13). Ferroptosis is a type of cell death implicated in mechanisms of action (Fig. 5). The differential attribution analysis (DAA)
multiple biological contexts, with therapeutic applications in cancer, method we introduce surfaces potential gene patterns driving responses and
immunity, development, and aging47,48. These attribution clusters included new disease-gene relationships. In one example, we identified genes linked
compounds ML162 and 1S,3R-RSL-3, which had shown differential cellular to ferroptosis resistance in an LRP8 knockout cell line. ChemProbe’s cal-
sensitivity in the prospective in vitro experiments (Fig. 4e, f). Additional culations were not specific to this biology; it may find similar use in screens
compounds with ferroptosis-inducing mechanisms of action in these for resistance mechanisms and target discovery across diverse cellular
clusters included ML210, erastin, CIL56, and CIL70. models (Fig. 6). In cancer research, the tool rapidly evaluates the influence of
Next, we investigated the attribution vectors of ferroptosis-inducing specific oncogenic mutations or alterations in tumor suppressor genes on
compounds to assess the alignment of model interpretations with estab- chemical sensitivity. When applied to engineered cell lines representing
lished ferroptosis biology. We observed a clear distinction in predicted different genetic backgrounds, ChemProbe can highlight vulnerabilities and
sensitivity between two groups of cell lines exposed to the same compounds potential mechanisms of drug resistance associated with particular genetic
(Fig. 6b and Supplementary Data 13). To further analyze this, we merged alterations.
clusters 26 and 28 into a combined cluster representing ferroptosis-inducing Extending to clinical samples, ChemProbe becomes a tool for targeted
compounds and applied DAA. Since multiple mechanisms induce ferrop- therapy and precision medicine. We found that it predicts drug sensitivity in
tosis, we queried differential attributions of multiple ferroptosis-associated breast cancer patients across heterogeneous clinical tumor samples (Fig. 2).
genes, including GPX4, SCD, SLC7A11, FSP1, and LRP847. All ferroptosis- Likewise, ChemProbe suggests which drugs may be ineffective for a given
associated genes were within the most highly attributed in the ferroptosis- patient; if borne out in clinical studies, it or similar methods could mean-
inducing compound cluster (Fig. 6c and Supplementary Data 13). To verify ingfully reduce therapeutic trial and error50,51. Although our data may
that these results were not artifacts of the transcriptomes or relative gene suggest alternative drug treatments, we refrained from recommending
expression differences, we also performed differential expression analysis therapies as their potential value is challenging to evaluate within the scope
(DEA) between MDA-MB-231-Par and HCC1806-Par. Besides GPX4, a of this study. The ability to expedite treatment at earlier disease stages and
key ferroptosis regulator, no ferroptosis-associated genes rose to significance target cellular vulnerabilities would be particularly impactful for tumors
(p < 5e-2; Supplementary Fig. 8a and Supplementary Data 14). whose resistance mechanisms rapidly evolve.

Communications Biology | (2024)7:1149 6


https://doi.org/10.1038/s42003-024-06865-4 Article

Fig. 5 | Feature attribution analysis of nominal compound targets. a Distinct highest significance target of the nominal target class versus all other target classes.
protein target clusters emerge from UMAP decomposition of adjusted attribution Two-sided Wilcoxon rank-sum test; *p < =5e-2, **p < =1e-2, ***p < =1e-3,
vectors at compound IC50s for predicted and fitted dose-response relationships in ****p < =1e-4. Sample sizes: n = 14 independent case samples, n = 455 independent
MDAMB231, MDAMB231-LM2, HCC1806, HCC1806-LM2b/c, SW480, and control samples except otherwise noted; EGFR;ERBB2 (n = 21 case, n = 448 control);
SW480-LvM2 prospective cell lines. Control compound set (CCS) attribution vec- HDAC1/2/3/6/8 (n = 56 case, n = 413 control); HMGCR (n = 21 case, n = 448
tors colored by nominal target class. b Comparison of adjusted mutual information control); NAMPT (n = 28 case, n = 441 control); TP53 (n = 21 case, n = 448 control).
(AMI) derived from CCS nominal target labels and K-means clustering of trained d Leiden clustering of all attribution vectors. e Comparison of PPI subgraph con-
model adjusted attribution vectors (n = 5 independent samples), compound fin- nectivity derived from clustered target profiles (n = 26 independent samples), ran-
gerprints (n = 5 independent samples), random-model adjusted attribution vectors dom target profiles (n = 26 independent samples), and random protein-coding genes
(n = 5 independent samples) and permuted-model adjusted attribution vectors (n = 26 independent samples). Centerline, median; box limits, upper and lower
(n = 5 independent samples). Centerline, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range; points, outliers; two-sided Wilcoxon
quartiles; whiskers, 1.5x interquartile range; points, outliers; two-sided Wilcoxon rank-sum test; ***p < =1e-3, ****p < =1e-4. f Network representation of exemplar
rank-sum test; *p < =5e-2, **p < =1e-2. c Average attribution difference between the clustered target profile subgraphs.

Communications Biology | (2024)7:1149 7


https://doi.org/10.1038/s42003-024-06865-4 Article

Fig. 6 | Differential attribution analysis (DAA) of ferroptosis-inducing com- c Volcano plot of DAA results derived by comparing ferroptosis-inducing com-
pounds. a Heatmap of top-10 differentially attributed genes within Leiden clusters pound attributions to all other compound attributions. Known ferroptosis-
from Fig. 5d. Clusters ordered by hierarchical clustering of DAA profiles (columns). mediating genes in orange. d Expected cumulative distribution plot of predicted
Rows: top-10 attributed genes, columns: cell line-compound attribution sample. compound IC50 differences between HCC1143 WT and LRP8 KO cell lines.
b Comparison of predicted IC50s between cluster 26 (ferroptosis-sensitive, n = 24 Ferroptosis-inducing compounds predicted differentially potent in LRP8 KO
independent samples) and cluster 28 (ferroptosis-resistant, n = 18 independent marked. e Enrichment analysis of top-10 differentially attributed genes of
samples). Centerline, median; box limits, upper and lower quartiles; whiskers, 1.5x ferroptosis-inducing compound samples. Fisher’s exact test.
interquartile range; points, outliers; two-sided Wilcoxon rank-sum test; *p < =5e-2.

Nonetheless, several caveats merit mention. The experiments were subset of chemical space would require collecting biological screening
constrained to a limited set of cell lines and compounds, and model inter- training data at a commensurate scale, which is currently impractical. Deep
pretation reflects a limited set of biological factors. Without further study, learning models like AlphaFold and ESM have leveraged self-supervised
gene features achieving high model attribution may reflect useful but obtuse learning to extract emergent properties from extensive unlabeled protein
patterns in the datasets rather than biological causality. Similarly, without a sequence data54,55. Similarly, integrating ChemProbe with pre-trained cel-
more diverse training set of chemical structures, the model may not be lular transcriptomic or small-molecule structure foundation models may be
leveraging generalizable structural features. Deep learning model attribution a means to expand into broader biological and chemical space.
methods are primarily empirical, so the potential compound mechanisms of When used to screen disease models, engineered cell lines, and clinical
action they reveal ultimately necessitate prospective biological testing52,53. samples, ChemProbe is a powerful tool to assess how cells respond to
ChemProbe screens cell lines against half a thousand chemical probes various compounds. It supports exploring new therapeutic targets, suggests
and drug-like compounds. However, expanding its predictions to a larger disease mechanisms, and can help researchers develop more precise and

Communications Biology | (2024)7:1149 8


https://doi.org/10.1038/s42003-024-06865-4 Article

effective treatments. ChemProbe’s conditional architecture enables no. 3917, Corning). Twenty-four hours later, cells were treated with serial
expressive sensitivity predictions across chemical concentrations, enhan- dilutions between 2.05 pM and 100 µM of the following compounds: ner-
cing model interpretability and adaptability to new probes. In this study, atinib (catalog no. 18404, Cayman Chemical), CAY10618, 1S,3R-RSL-3
ChemProbe predicted drug sensitivities in multiple contexts, from cancer (catalog no. 19288, Cayman Chemical), AZD7762 (catalog no. 11491,
cells and tumor samples to data on breast cancer patient responses. Looking Cayman Chemical), ceranib-2 (catalog no. 11092, Cayman Chemical), and
forward, we hope that ChemProbe’s availability as an open-source tool will ML162 (catalog no. 20455, Cayman Chemical), and DMSO control. Cells
contribute to a range of research efforts in precision medicine and beyond. were treated for 72 h with media replaced every 24 hours. Cell viability was
measured with the CellTiter-Glo 2.0 Assay (catalog no. G9243, Promega
Methods Corporation) with 1000 ms integration time.
Pharmacogenomic dataset All cells were cultured at 37 °C in a humidified incubator with 5% CO2.
Drug sensitivity data was obtained from the Cancer Therapeutic Response MDA-MB-231-Par (ATCC HTB-26) cells were grown in DMEM supple-
Portal v1 and v2 (CTRP v1/2). These datasets comprise 864 cell line mented with 10% FCS, penicillin (100 U ml−1), streptomycin (100 μg ml−1)
responses to 481 individual compounds and 64 compound pairs across a and amphotericin (1 μg ml−1). HCC1806-Par (ATCC CRL-2335) cells were
range of concentrations. Response phenotypes were quantified by cellular grown in Roswell Park Memorial Institute-1640 medium supplemented
viability, a normalized measure characterizing complete cell killing to cell with 10% FCS, L-glutamine (2 mM), sodium pyruvate (1 mM), penicillin
stasis (0–1) and cell growth (>1). We utilized predicted cellular viability (100 U ml−1), streptomycin (100 μg ml−1) and amphotericin (1 μg ml−1).
derived from fitted dose-response curves of each experimental set, in which
replicate cell line-compound experiments were fit with a log-logistic func- Statistics and reproducibility
tion, and predicted cellular viability was derived at the original experimental Details regarding sample sizes, number of replicates, and statistical methods
concentrations27. The compound structure was represented as 512-bit are provided in the respective section subheadings.
Morgan fingerprints (radius = 2) converted by RDKit from SMILES pro-
vided by the CTRP. Experimental micromolar compound concentrations Predictive modeling baselines. We compared different models that
were concatenated with Morgan fingerprints, resulting in 513-length modify gene expression features by compound structure and concentra-
compound feature vectors. We matched CTRP cell lines with the Cancer tion using various transformations. Our baseline model, “concatenation”
Cell Line Encyclopedia (CCLE) molecular characterizations and extracted architecture, simply combined gene expression and compound features
protein-coding gene expression measurements, resulting in 19,144-length into a single vector, which was fed into a multi-layer perceptron. We
cell line feature vectors. In total, 545 total compounds or compound pairs independently evaluated the isolated effects of learning transformation
and 860 cell lines comprised 366,710 unique pairs and 5,849,340 total types using the “scale” and “shift” variants of the ChemProbe model. The
individual examples of compound response at various concentrations. “scale” model held shift parameters constant (β=0) and learned only the
scale parameters (γ), whereas the “shift” model held scale parameters
ChemProbe architecture, training, and evaluation constant (γ=1) and learned only the shift parameters (β). We assessed
The study focused on predicting drug sensitivity in the context of phar- ChemProbe’s dependence on compound concentration by creating a
macological intervention by integrating cell state features with compound “permuted” model that used random binary fingerprints for each com-
features. To achieve this, we formulated a conditional model where cellular pound, ablating structural information. We trained and evaluated all
viability is predicted based on a vector of standardized protein-coding RNA models using 5-fold cross-validation on the originally defined dataset splits
abundance values and a vector of chemical features, including structure and for three rounds of hyperparameter optimization.
concentration. We explored two methods of integrating gene expression
and small-molecule feature representations: simple concatenation and Dose-response modeling. To generate predicted dose-response curves,
hierarchical integration using feature-wise linear modulation (FiLM). log-logistic functions were fit to each set of cell line-compound predic-
ChemProbe includes a conditional encoder that embeds compound tions obtained from the five individually trained ChemProbe models. A
features into a vector of length c and an inputs encoder that embeds gene sequence of quality control conditions was defined to ensure the relia-
expression features into a vector of length g. We used a FiLM generator to bility of each dose-response relationship. Firstly, cellular viability at any
predict γ and β parameters of length g based on compound embeddings. The of the four largest compound concentrations was checked for increases of
FiLM layer then applies an affine transformation of gene expression 20% or more from the fifth largest compound concentration. If this
embeddings by γ and β parameters. This process repeats across n FiLM condition was met, the viability prediction at the largest concentration
layers, and the modulated gene expression embeddings pass through a linear was dropped. This process was repeated recursively, and a minimum of
block consisting of a linear layer, ReLU activation, batch normalization, and 16 data points was required for fitting a dose-response curve. If the
dropout. The final linear block compresses feature maps to a vector of length minimum predicted cellular viability was greater than 0.4, no dose-
1, and the mean-squared error is calculated between predicted cellular response curve was fit. For cell line-compound pairs that passed quality
viability and true cellular viability. In our experiments and publicly available control, a 4-parameter log-logistic function was fit. If the optimization
trained models, we use a transcriptome embedding block with layers of size failed, a 3-parameter log-logistic function was fit. If this optimization also
[2048, 512, 256] and a compound embedding block with layers of size [256, failed, a 2-parameter log-logistic function was fit. Additional quality
128] to project to an embedding of size g = c = 128. We used n = 2 FiLM control was performed during the analysis of predicted dose-response
layers in the final models. curves by filtering out log-logistic functions with undetermined para-
To evaluate the performance of our model, we used cross-validation meters and with predicted EC50 < 1e-3 or EC50 > 300. Scipy was used to
and split cell line-compound pairs into five groups of approximately equal fit parameters of log-logistic functions to dose-response relationships.
size by cell line to avoid data leakage and performance inflation. We trained For relative potency comparisons, the drc package in R was employed
five individual models in a leave-one-out cross-validation scenario and to fit dose-response models with a four-parameter log-logistic model. We
applied 20 rounds of hyperparameter optimization to all five individually focused on the median effective dose (ED50) as an indicator of relative
trained models. We implemented the ChemProbe model in PyTorch and potency, calculating it with the EDcomp function. Significant differences
applied hyperparameter optimization with Optuna. in compound effects between cell lines were assessed using t-values and
p-values obtained from EDcomp.
Dose-response assay and cell culture
MDA-MB-231-Par and HCC1806-Par cells were seeded at 1,000 cells per Retrospective I-SPY2 analysis. We obtained I-SPY2 clinical trial
well in quadruplicate per condition in a white opaque 96-well plate (catalog metadata and microarray characterizations of 988 patient transcriptomes

Communications Biology | (2024)7:1149 9


https://doi.org/10.1038/s42003-024-06865-4 Article

from the Gene Expression Omnibus (GEO) (GSE194040). We matched Attribution method soundness checks. To evaluate the sensitivity of
90% of the recorded genes to our training dataset, mean-imputed the the attribution method to learned parameters and data features, we
remaining 10% of genes, and standardized the data using z-score trans- conducted soundness checks. First, we assessed model-dependent attri-
formation. We then evaluated the alignment of I-SPY2 patient data with bution method invariance by comparing the attribution vectors of ran-
CCLE cell line training data across the first two principal components. domly initialized parameters of architecturally identical models with
Next, we predicted drug sensitivity for each patient across 32 con- those of the trained models. We applied integrated gradients to the
centrations (1E-3 µM–300 µM) in response to all 545 compounds and trained and randomly initialized models and compared the attribution
compound pairs in the CTRP. We generated patient-drug response vectors. Second, we evaluated data-dependent attribution method
predictions using independent models and computed the area under the invariance by permuting the data labels, training architecturally identical
curve (AUC) of each predicted dose-response assay. We scaled the AUC models, and applying integrated gradients to compare the true-model
of each patient-drug prediction between 0-1 based on the drug’s mini- and permuted-model attribution vectors (Supplementary Fig. 6c). We
mum and maximum predicted AUC across all I-SPY2 patients. used the correlation between the attribution vectors of the true and
Participants in the I-SPY2 trial were assigned to treatment arms based alternative models to assess the attribution method’s sensitivity to
on classification into one of eight subtypes, determined by HR, ERBB2- learned parameters and dependence between data features and labels.
receptor, and MammaPrint status. Adaptive randomization was employed
to dynamically adjust treatment assignments based on ongoing analysis of Attribution similarity analysis. We investigated the relationship
treatment outcomes, optimizing the likelihood of each patient achieving a between compound MOAs and learned gene expression feature depen-
pathological complete response. The trial assessed the efficacy of various dence by examining attribution vector similarity. First, we filtered attri-
combination therapies relative to paclitaxel treatment, the clinical standard bution vectors by considering compound MOA classes with at least two
of care. We identified drugs matched between I-SPY2 treatment arms and compounds successfully attributed in all seven cell lines to obtain MOA
the CTRP, including paclitaxel, neratinib, MK2206, veliparib, and carbo- classes with sufficient samples for analysis (control compound set). This
platin. In the I-SPY2 experimental arms, patients were treated with a resulted in 28 MOA classes, which served as a true label baseline. We then
combination of paclitaxel and an additional drug(s) to assess response compared these true labels to unsupervised labels generated by K-means
relative to paclitaxel treatment only. As the predictive ability of ChemProbe clustering of attribution vectors from a trained model, a randomly
was only evaluated with respect to the available compounds and compound initialized model, compound fingerprints, and a label-permutation
pairs in the CTRP, the ChemProbe predictions for I-SPY2 patients reflected baseline (Fig. 5b). We applied K-means clustering on five independent
predicted patient response to a single compound rather than a combination trials.
therapy. We analyzed gene target attributions to further investigate the model
dependence on individual nominal targets within each MOA class. Speci-
Prospective differential potency predictions. To identify differentially fically, we applied a two-sided Wilcoxon rank-sum test to group attributions
potent compounds between HCC1806-Par and MDA-MB-231-Par cell for each nominal target in the MOA class of interest and adjusted for false
lines, we computed the difference in predicted IC50 values for com- discovery rate (FDR) using the Benjamini-Hochberg (BH) procedure. We
pounds that passed dose-response modeling. We visually examined dose- visualized the nominal target with the largest average attribution difference
response curves of the top 50 differentially sensitive compounds and between groups for each MOA class (Fig. 5c).
selected candidates for in vitro testing. We based selection criteria on the
completeness of dose-response curves in each cell line, including ade- Attribution network analysis. We extended our analysis to include all
quate Emax and Emin boundaries within the predicted attribution vectors generated from the 7-cell line test set. We randomly
concentration range. selected a single nominal target from each compound MOA class to avoid
We conducted a preliminary dose-response experiment to determine bias towards closely associated targets. This is because the nominal tar-
appropriate concentration points for the subsequent dose-response gets of a single compound likely fall in close network proximity, and
experiments across a broader range of concentrations than our predic- downstream network analysis of target sets would reflect artificial over-
tions (300—1.7e-3 µM, 12 points) (Supplementary Fig. 5b, c). We narrowed connectivity. For example, the MOA class of neratinib includes nominal
the concentration range for the following experiment to capture response targets EGFR and HER2, which are involved in the same pathway.
granularity (100—2.1e-6 µM, 12 points) (Fig. 4). Therefore, we randomly chose one target from this set.
We applied Leiden clustering unsupervised discovery of attribution
Integrated gradients. We employed integrated gradients, a path-based clusters. As described above, we defined attribution cluster MOA classes by
model attribution technique, to determine the extent to which feature random target selection from each compound MOA class. We filtered the
gradients changed compared to a baseline feature vector. The method STRING database to consider only high-confidence protein–protein asso-
involves linearly interpolating n feature vectors between a designated ciations (combined score > 0.7). We queried STRING for attribution cluster
baseline and the query feature vector. We used zero-vector baselines for nominal targets and computed the connectivity of the resulting subgraph.
compound and gene expression features and set n = 50 as the step size. At To account for random subgraph connectivity due to target biases in
each interpolated feature vector step, gradients of the inputs are calcu- STRING, we randomly sampled from available targets, queried the filtered
lated with respect to the corresponding prediction. Finally, the integral of STRING database, and computed connectivity. We repeated this procedure
each feature along the path of feature gradients between the baseline with randomly sampled protein-coding genes to account for random pro-
vector and the query vector is computed. The Python package captum tein associations (Fig. 5f). We used the Networkx library for analysis.
was used to compute integrated gradients. To test for protein interaction enrichment, we defined attribution
To account for potential differences in cellular responses, we used cluster nominal targets by random target selection from each compound
the predicted compound IC50 for each cell line-compound pair to cal- MOA class, as described above (number of targets > 3). Next, we queried the
culate integrated gradients and obtain an attribution vector at the pre- STRING API for protein–protein interaction enrichment in the network of
dicted IC50. We then extracted the cell line feature attribution vector for high-confidence protein–protein associations (combined score > 0.7). We
each pair to investigate the influence of conditional compound infor- computed statistical enrichment using the hypergeometric test, which tests
mation on gradient changes in the input gene expression features. To if a query set of proteins has more interactions than expected relative to the
address cell line-specific effects, we standardized the attribution features background proteome-wide interaction distribution. We also applied the
of each cell line separately using a z-score transformation, resulting in hypergeometric test for functional enrichment of GO terms, KEGG path-
adjusted attribution vectors (Supplementary Fig. 6b, c). ways, and Reactome pathways. We used the stringdb python package to

Communications Biology | (2024)7:1149 10


https://doi.org/10.1038/s42003-024-06865-4 Article

access the STRING API. To infer potential modules of action for com- In the context of our prospective experiments, biological sample ran-
pounds, we selected the unique set of all nominal targets associated with an domization was not applicable. However, for model training and evaluation,
attribution cluster. we employed a numerical random split of samples by cell line into groups for
five-fold cross-validation.
Cell line characterization and differential expression analysis. RNA Blinding was deemed unnecessary for our study, as the prospective
sequencing was conducted on seven test cell lines in triplicate, namely experiments were solely determined by the objective predictions of an
HCC1806-Par, HCC1806-LMb/c, MDA-MB-231-Par, MDA-MB-231- algorithm.
LM2, SW480, and SW480-LvM256. Using RNA that was rRNA-depleted Cell line sources: MDA-MB-231-Par (ATCC HTB-26); HCC1806-Par
with Ribo-Zero Gold (Illumina), libraries were prepared with SciptSeq-v2 (ATCC CRL-2335).
(Illumina) and sequenced on an Illumina HiSeq4000 at UCSF Center for
Advanced Technologies. Transcript abundances were quantified using Use of large language models (LLMs)
Salmon, and tximport was utilized to summarize transcript-level mea- We used OpenAI ChatGPT 3.5 Turbo and ChatGPT 4 as scientific editing
surements. We employed DESeq2 to identify differentially expressed tools when writing the manuscript. We prompted the LLMs to suggest
genes (n = 3 per condition). revisions to our manually drafted text for improved clarity and conciseness,
predominantly at the paragraph level. We did not ask the LLMs to generate
Differential attribution analysis. To assess model dependence on indi- content de novo. An example of a prompt we used was, “You are helping edit
vidual genes within attribution clusters, we conducted an unbiased papers for a broad scientific audience, emphasizing clarity and conciseness.
analysis. We applied a two-sided Wilcoxon rank-sum test to each gene to Revise the following paragraph: <draft text here > .” We manually reviewed
analyze gene attributions within a cluster relative to all remaining sam- the LLMs’ suggested revised text and decided whether to include part, all, or
ples. We adjusted for FDR using BH to account for multiple testing. We none of it on a word-by-word basis.
utilized Scanpy to apply tests across genes in each cluster relative to all
other samples. Attributions were standard scaled and each cluster’s top- Reporting summary
10 most significant genes were plotted. Leiden groups were hierarchically Further information on research design is available in the Nature Portfolio
clustered (complete linkage) by Pearson correlation. Scanpy was used for Reporting Summary linked to this article.
computation and visualization. We obtained gene expression—sensi-
tivity Pearson correlation z-scores and corresponding visualizations from Data availability
the Cancer Therapeutics Response Portal v2 feature correlation analysis Training and validation data from CTRP v1/2 can be downloaded at ftp://
(Supplementary Fig. 8b, c). caftpd.nci.nih.gov/pub/OCG-DCC/CTD2/Broad/CTRPv2.0_2015_ctd2_
ExpandedDataset/CTRPv2.0_2015_ctd2_ExpandedDataset.zip27 CCLE exp
Software and code reporting ression data can be downloaded at https://ndownloader.figshare.com/files/
Data collection tools: Python (3.10.6) was used to collect data, along with the 24613325. CCLE sample metadata can be downloaded at https://
following packages: stringdb (0.1.5), scanpy (1.9.3). Data analysis tools: ndownloader.figshare.com/files/2461339457. I-SPY2 gene expression data is
Python (3.10.6) was used along with the following packages: numpy located at GSE194040. I-SPY2 patient-level biomarker scores, subtype classes,
(1.23.4), pandas (1.5.1), matplotlib (3.6.2), seaborn (0.11.2), scanpy (1.9.3), and clinical/response data were gathered from supplementary information
scipy (1.9.3), scikit-learn (1.1.3), statsmodels (0.13.5), rdkit (2022.9.4), of Wolf, et al.: https://www.cell.com/cms/10.1016/j.ccell.2022.05.005/
pytorch (1.13.0), pytorch-lightning (1.8.4). R (4.0.2) was used along with the attachment/c220411b-c281-41e8-befa-a45e48af9c64/mmc3.xlsx33 HGNC
following packages: tidyverse (1.13.0), tximport (1.18.0), genomicfeatures was used to map gene names: https://www.genenames.org/tools/multi-
(1.42.2), deseq2 (1.30.1), enhancedvolcano (1.8.0), drc (3.0_1). symbol-checker/. Protein–protein interaction data was downloaded from the
STRING database v11.5. The current file version is found here: https://
Life science study design stringdb-downloads.org/download/protein.links.v12.0/9606.protein.links.
Samples for prospective in vitro testing of model predictions were selected v12.0.txt.gz. CCLE gene expression—sensitivity Pearson correlation z-scores
based on the presence of complete dose-response curves among the top 50 and corresponding visualizations were obtained from the CTRP v1/2 web
differential predictions for the cell line pair. The chosen predicted dose- portal: https://portals.broadinstitute.org/ctrp.v2.1/. RNA sequencing gene
response curves exhibited adequate Emax and Emin boundaries, appro- expression profiles of triple-negative breast cancer cell line HCC1143 WT and
priately covering the predicted concentration range. Given resource con- LRP8 KO were obtained from a data access request to Zhipeng Li as original
straints, we reasoned that six out of 50 compounds (12%) provided adequate data from a related publication49. The Enrichr web portal was used to perform
representativeness of model predictions. Wikipathway, KEGG, and GO enrichment analysis (https://maayanlab.
In the prospective drug screening experiment (Fig. 4a–f; Supplemen- cloud/Enrichr/). Source data for all tables and figures are provided in Sup-
tary Fig. 4 and Supplementary Data 7), one plate showed significant cell plementary Data.
death in four wells within column 6. To maintain the integrity of the analysis
and avoid distorting the representation of the data, the values from these Code availability
four wells were excluded from the analysis. Specifically, two wells were used Our code to download, preprocess data, reproduce model training, load pre-
for testing the drug CAY10618 against the cell lines HCC1806-Par and trained weights, and run model inference is available as open source at
MDA-MB-231-Par. Similarly, two wells were dedicated to testing the drug https://github.com/keiserlab/chemprobe under the MIT License58.
neratinib against the same cell lines, HCC1806-Par and MDA-MB-231-Par.
As outlined in the methods, we initially conducted a preliminary screen Received: 11 December 2023; Accepted: 6 September 2024;
to calibrate the dose-response concentration ranges of the tested com-
pounds (300—1.7e-3 µM, 12 points) (Supplementary Fig. 4b, c and Sup-
plementary Data 7). Based on the insights gained from the preliminary References
screen, we refined the concentration range for the subsequent experiment to 1. Arrowsmith, C. H. et al. The promise and peril of chemical probes. Nat.
enhance the capture of response granularity (100—2.1e-6 µM, 12 points) Chem. Biol. 11, 536–541 (2015).
(Fig. 4 and Supplementary Data 7). Both experiments consistently 2. Workman, P. & Collins, I. Probing the probes: fitness factors for small
demonstrated a significant relationship between the responses of the dif- molecule tools. Chem. Biol. 17, 561–577 (2010).
ferential compounds (Fig. 4g; Supplementary Fig. 4c and Supplemen- 3. Blagg, J. & Workman, P. Choose and use your chemical probe wisely
tary Data 7). to explore cancer biology. Cancer Cell 32, 9–25 (2017).

Communications Biology | (2024)7:1149 11


https://doi.org/10.1038/s42003-024-06865-4 Article

4. Roden, D. M. et al. Pharmacogenomics. Lancet 394, 521–532 (2019). 27. Seashore-Ludlow, B. et al. Harnessing connectivity in a large-scale
5. Tyner, J. W. et al. Understanding drug sensitivity and tackling small-molecule sensitivity dataset. Cancer Discov. 5, 1210–1223
resistance in cancer. Cancer Res. 82, 1448–1460 (2022). (2015).
6. Baptista, D., Ferreira, P. G. & Rocha, M. Deep learning for drug 28. Ghandi, M. et al. Next-generation characterization of the Cancer Cell
response prediction in cancer. Brief. Bioinform. 22, 360–379 (2021). Line Encyclopedia. Nature 569, 503–508 (2019).
7. Costello, J. C. et al. A community effort to assess and improve drug 29. Perez, E., Strub, F., de Vries, H., Dumoulin, V. & Courville, A. FiLM:
sensitivity prediction algorithms. Nat. Biotechnol. 32, 1202–1212 Visual Reasoning with a General Conditioning Layer. Proc. AAAI Conf.
(2014). Artif. Intell. 32 (2018).
8. Partin, A. et al. Deep learning methods for drug response prediction in 30. Chuang, K. V. & Keiser, M. J. Comment on ‘Predicting reaction
cancer: predominant and emerging trends. Front. Med. 10, performance in C–N cross-coupling using machine learning’. Science
1086097 (2023). 362, eaat8603 (2018).
9. Menden, M. P. et al. Machine learning prediction of cancer cell 31. Chuang, K. V. & Keiser, M. J. Adversarial controls for scientific
sensitivity to drugs based on genomic and chemical properties. PLoS machine learning. ACS Chem. Biol. 13, 2819–2821 (2018).
ONE 8, e61318 (2013). 32. Nanda, R. et al. Effect of pembrolizumab plus neoadjuvant
10. Chiu, Y.-C. et al. Predicting drug response of tumors from integrated chemotherapy on pathologic complete response in women with early-
genomic profiles by deep neural networks. BMC Med. Genomics 12, stage breast cancer: an analysis of the ongoing phase 2 adaptively
18 (2019). randomized I-SPY2 trial. JAMA Oncol. 6, 676–684 (2020).
11. Sharifi-Noghabi, H., Zolotareva, O., Collins, C. C. & Ester, M. MOLI: 33. Wolf, D. M. et al. Redefining breast cancer subtypes to guide
multi-omics late integration with deep neural networks for drug treatment prioritization and maximize response: Predictive
response prediction. Bioinformatics 35, i501–i509 (2019). biomarkers across 10 cancer therapies. Cancer Cell 40,
12. Ding, M. Q., Chen, L., Cooper, G. F., Young, J. D. & Lu, X. Precision 609–623.e6 (2022).
oncology beyond targeted therapy: combining omics data with 34. Mantione, K. J. et al. Comparing bioinformatic gene expression
machine learning matches the majority of cancer cells to effective profiling methods: microarray and RNA-Seq. Med. Sci. Monit. Basic
therapeutics. Mol. Cancer Res. 16, 269–278 (2018). Res. 20, 138–142 (2014).
13. Rampasek, L., Hidru, D., Smirnov, P., Haibe-Kains, B. & Goldenberg, 35. Ahmadian, M. et al. Analysis of the FHIT gene and FRA3B region in
A. Dr.VAE: Drug Response Variational Autoencoder. https://arxiv.org/ sporadic breast cancer, preneoplastic lesions, and familial breast
abs/1706.08203 (2017). cancer probands. Cancer Res. 57, 3664–3668 (1997).
14. Li, M. et al. DeepDSC: a deep learning method to predict drug 36. Gazdar, A. F. et al. Characterization of paired tumor and non-tumor
sensitivity of cancer cell lines. IEEE/ACM Trans. Comput. Biol. cell lines established from patients with breast cancer. Int. J. Cancer
Bioinform. 18, 575–582 (2021). 78, 766–774 (1998).
15. Chen, J. et al. Deep transfer learning of cancer drug responses by 37. Brinkley, B. R. et al. Variations in cell form and cytoskeleton in
integrating bulk and single-cell RNA-seq data. Nat. Commun. 13, human breast carcinoma cells in vitro. Cancer Res. 40, 3118–3129
6494 (2022). (1980).
16. Liu, Q., Hu, Z., Jiang, R. & Zhou, M. DeepCDR: a hybrid graph 38. Cailleau, R., Olivé, M. & Cruciger, Q. V. Long-term human breast
convolutional network for predicting cancer drug response. carcinoma cell lines of metastatic origin: preliminary characterization.
Bioinformatics 36, i911–i918 (2020). Vitro 14, 911–915 (1978).
17. Yi, H.-C., You, Z.-H., Huang, D.-S. & Kwoh, C. K. Graph representation 39. Liu, Y. et al. Multi-omic measurements of heterogeneity in HeLa cells
learning in bioinformatics: trends, methods and applications. Brief. across laboratories. Nat. Biotechnol. 37, 314–322 (2019).
Bioinform. 23, bbab340 (2022). 40. Minn, A. J. et al. Genes that mediate breast cancer metastasis to lung.
18. Zuo, Z. et al. SWnet: a deep learning model for drug response Nature 436, 518–524 (2005).
prediction from cancer genomic signatures and compound chemical 41. Earnest-Noble, L. B. et al. Two isoleucyl tRNAs that decode
structures. BMC Bioinforma. 22, 434 (2021). synonymous codons divergently regulate breast cancer metastatic
19. Manica, M. et al. Toward explainable anticancer compound sensitivity growth by controlling translation of proliferation-regulating genes.
prediction via multimodal attention-based convolutional encoders. Nat. Cancer 3, 1484–1497 (2022).
Mol. Pharm. 16, 4797–4806 (2019). 42. Loo, J. M. et al. Extracellular metabolic energetics can promote cancer
20. Chang, Y. et al. Cancer Drug Response Profile scan (CDRscan): a progression. Cell 160, 393–406 (2015).
deep learning model that predicts drug effectiveness from cancer 43. Gupta, A. & Arora, S. A simple saliency method that passes the sanity
genomic signature. Sci. Rep. 8, 8857 (2018). checks. https://arxiv.org/abs/1905.12152 (2019).
21. Janizek, J. D. et al. Uncovering expression signatures of synergistic 44. Adebayo, J. et al. Sanity checks for saliency maps. https://arxiv.org/
drug responses via ensembles of explainable machine-learning abs/1810.03292 (2018).
models. Nat. Biomed. Eng. https://doi.org/10.1038/s41551-023- 45. Maggiora, G., Vogt, M., Stumpfe, D. & Bajorath, J. Molecular similarity
01034-0 (2023). in medicinal chemistry. J. Med. Chem. 57, 3186–3204 (2014).
22. Cadow, J., Born, J., Manica, M., Oskooei, A. & Rodríguez Martínez, M. 46. Szklarczyk, D. et al. STRING v10: protein-protein interaction
PaccMann: a web service for interpretable anticancer compound networks, integrated over the tree of life. Nucleic Acids Res. 43,
sensitivity prediction. Nucleic Acids Res. 48, W502–W508 (2020). D447–D452 (2015).
23. Parca, L. et al. Modeling cancer drug response through drug-specific 47. Stockwell, B. R. Ferroptosis turns 10: emerging mechanisms,
informative genes. Sci. Rep. 9, 15222 (2019). physiological functions, and therapeutic applications. Cell 185,
24. Zhang, H., Chen, Y. & Li, F. Predicting anticancer drug response with 2401–2421 (2022).
deep learning constrained by signaling pathways. Front. Bioinforma. 48. Li, J. et al. Ferroptosis: past, present and future. Cell Death Dis. 11,
1, 10 (2021). 88 (2020).
25. Kuenzi, B. M. et al. Predicting drug response and synergy using a deep 49. Li, Z. et al. Ribosome stalling during selenoprotein translation exposes
learning model of human cancer cells. Cancer Cell 38, 672–684.e6 (2020). a ferroptosis vulnerability. Nat. Chem. Biol. 1–11 https://doi.org/10.
26. Rees, M. G. et al. Correlating chemical sensitivity and basal gene 1038/s41589-022-01033-3 (2022).
expression reveals mechanism of action. Nat. Chem. Biol. 12, 50. Duffy, M. J. & Crown, J. A personalized approach to cancer treatment:
109–116 (2016). how biomarkers can help. Clin. Chem. 54, 1770–1779 (2008).

Communications Biology | (2024)7:1149 12


https://doi.org/10.1038/s42003-024-06865-4 Article

51. Tsimberidou, A. M., Fountzilas, E., Nikanjam, M. & Kurzrock, R. Visualization, W.C.; Supervision, H.G., M.J.K.; Project administration,
Review of precision cancer medicine: evolution of the treatment M.J.K.; Funding acquisition, H.G., M.J.K.
paradigm. Cancer Treat. Rev. 86, 102019 (2020).
52. Alipanahi, B., Delong, A., Weirauch, M. T. & Frey, B. J. Predicting the Competing interests
sequence specificities of DNA- and RNA-binding proteins by deep The authors declare no competing interests.
learning. Nat. Biotechnol. 33, 831–838 (2015).
53. Avsec, Ž. et al. Base-resolution models of transcription-factor binding Additional information
reveal soft motif syntax. Nat. Genet. 53, 354–366 (2021). Supplementary information The online version contains
54. Jumper, J. et al. Highly accurate protein structure prediction with supplementary material available at
AlphaFold. Nature https://doi.org/10.1038/s41586-021-03819-2 https://doi.org/10.1038/s42003-024-06865-4.
(2021).
55. Rives, A. et al. Biological structure and function emerge from scaling Correspondence and requests for materials should be addressed to
unsupervised learning to 250 million protein sequences. Proc. Natl Michael J. Keiser.
Acad. Sci. USA 118, e2016239118 (2021).
56. Culbertson, B. et al. A sense-antisense RNA interaction promotes Peer review information Communications Biology thanks the anonymous
breast cancer metastasis via regulation of NQO1 expression. Nat. reviewers for their contribution to the peer review of this work. Primary
Cancer 4, 682–698 (2023). handling editor: Christina Karlsson Rosenthal. A peer review file is available.
57. DepMap, B. Current DepMap Release data, including CRISPR
Screens, PRISM Drug Screens, Copy Number, Mutation, Expression, Reprints and permissions information is available at
and Fusions. DepMap 21Q2 Public https://doi.org/10.25452/figshare. http://www.nature.com/reprints
plus.25880521.v1 (2021).
58. Connell, W. ChemProbe. https://doi.org/10.5281/zenodo.13381833 Publisher’s note Springer Nature remains neutral with regard to
(Github, 2024). jurisdictional claims in published maps and institutional affiliations.

Acknowledgements Open Access This article is licensed under a Creative Commons


This work was supported by CZI grant DAF2018-191905 (https://doi. Attribution 4.0 International License, which permits use, sharing,
org/10.37921/550142lkcjzw) from the Chan Zuckerberg Initiative DAF, adaptation, distribution and reproduction in any medium or format, as long
an advised fund of Silicon Valley Community Foundation (funder https:// as you give appropriate credit to the original author(s) and the source,
doi.org/10.13039/100014989) (M.J.K.), the National Institute of Mental provide a link to the Creative Commons licence, and indicate if changes
Health of the National Institutes of Health (U01 MH115747-02 M.J.K.). were made. The images or other third party material in this article are
We thank Zhipeng Li for his valuable insights into ferroptosis inter- included in the article’s Creative Commons licence, unless indicated
pretation and for generously sharing the sequencing data of the LRP8 otherwise in a credit line to the material. If material is not included in the
KO cell line. article’s Creative Commons licence and your intended use is not permitted
by statutory regulation or exceeds the permitted use, you will need to
Author contributions obtain permission directly from the copyright holder. To view a copy of this
Conceptualization, W.C.; Methodology, W.C.; Software, W.C.; Validation, licence, visit http://creativecommons.org/licenses/by/4.0/.
K.G.; Formal analysis, W.C.; Investigation, W.C., K.G.; Data curation, W.C.;
Writing - Original Draft, W.C.; Writing - Review & Editing, W.C., M.J.K.; © The Author(s) 2024

Communications Biology | (2024)7:1149 13

You might also like