Cancer Systems Biology - Methods and Protocols
Cancer Systems
Biology
Methods and Protocols
METHODS IN MOLECULAR BIOLOGY
Series Editor
John M. Walker
School of Life and Medical Sciences,
University of Hertfordshire, Hatfield,
Hertfordshire AL10 9AB, UK
Edited by
Cancer is a highly complex disease that is often characterized by vast changes in the genetic
and epigenetic landscape. Those changes result in altered protein expression levels in tumors
compared to healthy tissues. Moreover, posttranscriptional alterations lead to deregulation
of signaling processes, and altered metabolic pathways can produce aberrant metabolic
signatures in cancer cells.
A wealth of high-throughput information has emerged over the last decade, including
global measurements of genes, proteins, and metabolites, as well as many other molecular
species. Those studies provide a glimpse of the molecular makeup of cancer cells on various
levels. In order to classify tumor types and predict clinical outcomes of cancer, researchers
often employ sophisticated computational tools to extract cancer-specific events from the
excessive amounts of data that have been compiled.
This volume on “Cancer Systems Biology” comprises protocols that describe systems
biology methodologies and computational tools, offering a variety of ways to analyze
different types of high-throughput cancer data. These include, for example, network- and
pathway-based analyses. Other chapters cover descriptive and predictive mathematical
models used to analyze complex cancer phenotypes and responses to anticancer drugs.
A number of chapters give an overview of data types available in large-scale data
repositories, describe state-of-the-art computational methods used, and highlight key
trends in the field of cancer systems biology.
Contents
Preface
Contributors
Index
Contributors
Abstract
Cancer genes may mutate in a co-mutational or a mutually exclusive manner within the tumor samples of a
specific cancer; these are the two known combinatorial mutational patterns for a given gene set.
Previous studies have established that genes functioning in different signaling pathways can mutate in the
same sample, i.e., a tumor from one patient, while genes operating in the same pathway are rarely mutated
in the same cancer genome. Therefore, reliable identification of combinatorial mutational patterns of
candidate cancer genes has important ramifications in inferring signaling network modules in a particular
cancer type. While algorithms for discovering mutated driver pathways based on mutual exclusivity of
mutations in cancer genes have been proposed, a systematic pipeline for identifying both co-mutational and
mutually exclusive patterns with rational significance estimation is still lacking. Here, we describe a reliable
framework with detailed procedures to simultaneously explore both combinatorial mutational patterns
from public cross-sectional gene mutation data.
Key words Cancer genomics, Co-mutation, Mutual exclusivity, Signaling pathway, Hypergeometric test
1 Introduction
Louise von Stechow (ed.), Cancer Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 1711,
https://doi.org/10.1007/978-1-4939-7493-1_1, © Springer Science+Business Media, LLC 2018
Hua Tan and Xiaobo Zhou
[Figure 1 labels: Tumor Samples; Gene 1; Gene 2; Co-mutational pattern]
Fig. 1 Schematic representation of two combinatorial mutational patterns studied in this protocol: the
co-mutational pattern (upper panel) refers to the scenario that a set of genes tends to mutate simultaneously
in a tumor sample, whereas the mutually exclusive pattern (lower panel) represents the opposite scenario:
genes in a given set tend to avoid mutating simultaneously in any one tumor sample
[Figure 2 flowchart labels: Mutation records in database; Quality control; Preprocessing; Calculate likelihood ratio; Calculate significance level; Determine mutational pattern; For inferring signaling network]
Fig. 2 Schematic of the overall pipeline proposed in this protocol. The specific
steps of text processing, computation, and visualization are provided in
Subheading 3
2 Materials
3 Methods
3.1 Data Quality Control and Preprocessing of COSMIC Mutation Entries
1. Extract mutations of a designated cancer type from the mixed
mutation records in COSMIC by the keyword “Primary site”
(see Note 1).
2. Remove synonymous mutations by the keyword “Substitution-coding silent” (see Note 1).
3. Remove mutation records that are not from a genome-wide
study by the keyword “genome-wide screen” (see Note 1).
4. Generate a gene mutation pattern matrix based on the mutations
and sample IDs. The rows and columns of the matrix refer
to samples and genes, respectively. The entry at row i and
column j of the matrix is the number of mutations
occurring on gene j in tumor sample i. Figure 3 highlights an
example in which the 9th tumor sample has a mutation on gene
2, marked at the coordinate (9,2) (see Note 2).
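The four steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the records are assumed to be already parsed into dictionaries, and the keys `sample_id`, `gene`, `primary_site`, `description`, and `genome_wide` are hypothetical names, not the actual COSMIC column headers (which should be checked against the downloaded file and Note 1).

```python
from collections import defaultdict


def build_mutation_matrix(records, primary_site,
                          silent_label="Substitution-coding silent"):
    """Return (samples, genes, matrix) for one cancer type.

    matrix[i][j] is the number of retained mutations of genes[j] in
    samples[i]. The dictionary keys used below are hypothetical, and
    `silent_label` is the keyword as printed in the protocol text; the
    exact spelling in a real download may differ.
    """
    counts = defaultdict(int)
    for rec in records:
        if rec["primary_site"] != primary_site:
            continue  # step 1: keep only the designated cancer type
        if rec["description"] == silent_label:
            continue  # step 2: drop synonymous (silent) mutations
        if not rec["genome_wide"]:
            continue  # step 3: drop records not from a genome-wide screen
        counts[(rec["sample_id"], rec["gene"])] += 1

    # Step 4: rows are samples, columns are genes. Samples without any
    # retained mutation do not appear in the matrix at all.
    samples = sorted({sample for sample, _ in counts})
    genes = sorted({gene for _, gene in counts})
    matrix = [[counts.get((s, g), 0) for g in genes] for s in samples]
    return samples, genes, matrix
```

Sorting the sample and gene identifiers is a design choice made here only so that the row and column order is reproducible.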
Combinatorial Mutational Patterns in Human Cancers
[Figure 3 axis labels: Samples 1-12; the second tick row omits sample 11]
Fig. 3 Schematic depicting the mutation pattern matrix and entry filtering criteria. (a) A mutation pattern matrix
is generated to represent the mutation profiles of the tumor samples across all genes. A gray grid indicates the
corresponding sample has at least one mutation on the gene specified by the column ID. (b) Columns 3, 5, and
6 are deleted since the associated genes are mutated in only a small fraction of samples (the threshold of
fraction can be prescribed). (c) Row 11 is deleted as the corresponding sample has no mutation in the
remaining genes after the processing in (b)
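The two filtering criteria in panels (b) and (c) of Fig. 3 can be sketched as below. The function name and the default threshold of 0.05 are hypothetical; the caption only states that the fraction threshold can be prescribed.

```python
def filter_matrix(samples, genes, matrix, min_fraction=0.05):
    """Apply the column filter (b) and then the row filter (c) of Fig. 3.

    `min_fraction` is an illustrative default, not a value taken from
    the protocol.
    """
    n_samples = len(samples)

    # (b) Keep genes mutated in at least `min_fraction` of the samples.
    keep_cols = [
        j for j in range(len(genes))
        if n_samples
        and sum(1 for row in matrix if row[j] > 0) / n_samples >= min_fraction
    ]
    kept_genes = [genes[j] for j in keep_cols]
    reduced = [[row[j] for j in keep_cols] for row in matrix]

    # (c) Drop samples left without any mutation in the remaining genes.
    keep_rows = [i for i, row in enumerate(reduced) if any(v > 0 for v in row)]
    kept_samples = [samples[i] for i in keep_rows]
    kept_matrix = [reduced[i] for i in keep_rows]
    return kept_samples, kept_genes, kept_matrix
```

The column filter runs first because, as in the caption, a sample is removed only if it has no mutation in the genes that survive step (b).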
\[
\mathrm{LR}_{\mathrm{comb}} = \frac{P(g_1 = 1,\ g_2 = 1)}{P(g_1 = 1)\,P(g_2 = 1)} \tag{1}
\]
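Formula (1) divides the joint probability that both genes are mutated by the product of their marginal mutation probabilities, so a ratio above 1 points toward co-mutation and a ratio below 1 toward mutual exclusivity. A minimal sketch follows, assuming the probabilities are estimated as plain sample frequencies from two columns of the mutation pattern matrix; this excerpt does not confirm that this is the estimator used in the protocol, and the function name is hypothetical.

```python
def likelihood_ratio(g1, g2):
    """Estimate LR_comb of formula (1) from two mutation-count columns.

    g1 and g2 hold one entry per tumor sample; any value > 0 counts as
    "mutated". Probabilities are estimated as sample frequencies, which
    is an assumption of this sketch.
    """
    n = len(g1)
    if n == 0 or n != len(g2):
        raise ValueError("g1 and g2 must be non-empty and of equal length")

    p1 = sum(1 for x in g1 if x > 0) / n
    p2 = sum(1 for x in g2 if x > 0) / n
    p12 = sum(1 for a, b in zip(g1, g2) if a > 0 and b > 0) / n

    if p1 == 0 or p2 == 0:
        raise ValueError("each gene must be mutated in at least one sample")
    return p12 / (p1 * p2)
```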
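3.3 Calculation of Significance of Combinatorial Mutational Patterns
1. Calculate the significance level of the co-mutational pattern $P_{\mathrm{co}}$
by the hypergeometric test, as in formula (2):
\[
P_{\mathrm{co}} = \sum_{k=n_{12}}^{n_2} \binom{n_1}{k}\binom{n-n_1}{n_2-k} \bigg/ \binom{n}{n_2} \tag{2}
\]
\[
P_{\mathrm{excl}} = \sum_{k=0}^{n_{12}} \binom{n_1}{k}\binom{n-n_1}{n_2-k} \bigg/ \binom{n}{n_2} \tag{3}
\]
where $n$, $n_1$, $n_2$, and $n_{12}$ are defined as in the formula of $P_{\mathrm{co}}$ above.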
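The two hypergeometric tail sums can be evaluated directly with exact integer binomial coefficients. In the sketch below the symbols are read in the usual way for this test: n is the total number of samples, n1 and n2 are the numbers of samples in which gene 1 and gene 2 are mutated, and n12 is the number of samples in which both are mutated. These definitions are an assumption, since they are not spelled out in this excerpt, and the function names are hypothetical.

```python
from math import comb


def _hypergeom_sum(n, n1, n2, k_from, k_to):
    """Sum C(n1,k) * C(n-n1, n2-k) over k in [k_from, k_to], divided by C(n,n2)."""
    total = sum(comb(n1, k) * comb(n - n1, n2 - k)
                for k in range(k_from, k_to + 1))
    return total / comb(n, n2)


def _check(n, n1, n2, n12):
    if not (0 <= n12 <= min(n1, n2) and max(n1, n2) <= n):
        raise ValueError("require 0 <= n12 <= min(n1, n2) and n1, n2 <= n")


def p_co(n, n1, n2, n12):
    """Formula (2): probability of at least n12 co-mutated samples by chance."""
    _check(n, n1, n2, n12)
    return _hypergeom_sum(n, n1, n2, n12, n2)


def p_excl(n, n1, n2, n12):
    """Formula (3): probability of at most n12 co-mutated samples by chance."""
    _check(n, n1, n2, n12)
    return _hypergeom_sum(n, n1, n2, 0, n12)
```

Terms with k larger than n1 vanish because `math.comb` returns 0 when k exceeds its first argument, so the upper limit n2 of formula (2) can be used as written.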
5 Notes
Acknowledgments
References
1. Hanahan D, Weinberg RA (2000) The hall- mutations in cancer. FASEB J 22
marks of cancer. Cell 100(1):57–70. https:// (8):2605–2622. https://doi.org/10.1096/fj.
doi.org/10.1016/S0092-8674(00)81683-9 08-108985
2. Stratton MR, Campbell PJ, Futreal PA (2009) 11. Ciriello G, Cerami E, Sander C, Schultz N
The cancer genome. Nature 458 (2012) Mutual exclusivity analysis identifies
(7239):719–724. https://doi.org/10.1038/ oncogenic network modules. Genome Res 22
Nature07943 (2):398–406. https://doi.org/10.1101/gr.
3. Peng H, Tan H, Zhao W, Jin G, Sharma S, 125567.111
Xing F, Watabe K, Zhou X (2016) Computa- 12. Vandin F, Upfal E, Raphael BJ (2012) De novo
tional systems biology in cancer brain metasta- discovery of mutated driver pathways in cancer.
sis. Front Biosci 8:169–186 Genome Res 22(2):375–385. https://doi.
4. Tan H, Bao J, Zhou X (2012) A novel org/10.1101/gr.120477.111
missense-mutation-related feature extraction 13. Leiserson MD, Blokh D, Sharan R, Raphael BJ
scheme for ‘driver’ mutation identification. (2013) Simultaneous identification of multiple
Bioinformatics 28(22):2948–2955. https:// driver pathways in cancer. PLoS Comput Biol 9
doi.org/10.1093/bioinformatics/bts558 (5):e1003054. https://doi.org/10.1371/jour
5. Tan H, Bao J, Zhou X (2015) Genome-wide nal.pcbi.1003054
mutational spectra analysis reveals significant 14. Forbes SA, Beare D, Gunasekaran P, Leung K,
cancer-specific heterogeneity. Sci Rep Bindal N, Boutselakis H, Ding M, Bamford S,
5:12566. https://doi.org/10.1038/ Cole C, Ward S, Kok CY, Jia M, De T, Teague
srep12566 JW, Stratton MR, McDermott U, Campbell PJ
6. Tan H, Li F, Singh J, Xia X, Cridebring D, (2015) COSMIC: exploring the world’s
Yang J, Bao J, Ma J, Zhan M, Wong STC knowledge of somatic mutations in human can-
(2012) A 3-dimentional multiscale model to cer. Nucleic Acids Res 43(Database issue):
simulate tumor progression in response to D805–D811. https://doi.org/10.1093/nar/
interactions between cancer stem cells and gku1075
tumor microenvironmental factors. IEEE 6th 15. Greenman C, Stephens P, Smith R, Dalgliesh
International Conference on Systems Biology GL, Hunter C, Bignell G, Davies H, Teague J,
(ISB):297–303. https://doi.org/10.1109/ Butler A, Stevens C, Edkins S, O’Meara S,
ISB.2012.6314153 Vastrik I, Schmidt EE, Avis T, Barthorpe S,
7. Tan H, Wei K, Bao J, Zhou X (2013) In silico Bhamra G, Buck G, Choudhury B,
study on multidrug resistance conferred by Clements J, Cole J, Dicks E, Forbes S,
I223R/H275Y double mutant neuraminidase. Gray K, Halliday K, Harrison R, Hills K,
Mol BioSyst 9(11):2764–2774. https://doi. Hinton J, Jenkinson A, Jones D, Menzies A,
org/10.1039/c3mb70253g Mironenko T, Perry J, Raine K, Richardson D,
8. Vogelstein B, Papadopoulos N, Velculescu VE, Shepherd R, Small A, Tofts C, Varian J,
Zhou S, Diaz LA Jr, Kinzler KW (2013) Can- Webb T, West S, Widaa S, Yates A, Cahill DP,
cer genome landscapes. Science 339 Louis DN, Goldstraw P, Nicholson AG,
(6127):1546–1558. https://doi.org/10. Brasseur F, Looijenga L, Weber BL, Chiew
1126/science.1235122 YE, DeFazio A, Greaves MF, Green AR,
Campbell P, Birney E, Easton DF, Chenevix-
9. Vogelstein B, Kinzler KW (2004) Cancer genes Trench G, Tan MH, Khoo SK, Teh BT, Yuen
and the pathways they control. Nat Med 10 ST, Leung SY, Wooster R, Futreal PA, Stratton
(8):789–799. https://doi.org/10.1038/ MR (2007) Patterns of somatic mutation in
nm1087 human cancer genomes. Nature 446
10. Yeang CH, McCormick F, Levine A (2008) (7132):153–158. https://doi.org/10.1038/
Combinatorial patterns of somatic gene nature05610
Combinatorial Mutational Patterns in Human Cancers 11
16. Ihaka P, Gentleman R (1996) R: a language for 18. Shannon P, Markiel A, Ozier O, Baliga NS,
data analysis and graphics. J Comput Graph Wang JT, Ramage D, Amin N,
Stat 5(3):299–314 Schwikowski B, Ideker T (2003) Cytoscape: a
17. Dempster AP, Laird NM, Rubin DB (1977) software environment for integrated models of
Maximum likelihood from incomplete data via biomolecular interaction networks. Genome
EM Algorithm. J Roy Stat Soc B Met 39 Res 13(11):2498–2504. https://doi.org/10.
(1):1–38 1101/gr.1239303. 13/11/2498 [pii]
Chapter 2
Abstract
With the extraordinary rise in available biological data, biologists and clinicians need unbiased tools for data
integration in order to reach accurate, succinct conclusions. Network biology provides one such method for
high-throughput data integration, but comes with its own set of algorithmic problems and needed
expertise. We provide a step-by-step guide for using Omics Integrator, a software package designed for
the integration of transcriptomic, epigenomic, and proteomic data. Omics Integrator can be found at
http://fraenkel.mit.edu/omicsintegrator.
Key words Data integration, Network biology, Computational biology, High-throughput data
1 Introduction
Louise von Stechow (ed.), Cancer Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 1711,
https://doi.org/10.1007/978-1-4939-7493-1_2, © Springer Science+Business Media, LLC 2018
Amanda J. Kedaigle and Ernest Fraenkel
sources can lead to novel discoveries that each assay could have
missed on its own [4–6].
Network biology is a fast-growing category of methods for this
type of analysis [7]. Network models provide a valuable resource for
biologists looking to analyze their high-throughput data in a systems
context. By mapping “hits” from high-throughput assays
onto interaction networks, the mechanistic connections between
the hits become apparent, and investigators can focus on pathways
(series of interactions in the cell that are related to a certain
function) that may be perturbed in the system.
Network methods typically involve modeling the molecules
within a cell—which can for example be DNA, mRNAs, proteins,
or metabolites—as nodes in a graph. Edges between these nodes
connect molecules that are functionally or physically connected
[7]. For example, a protein-protein interaction network (PPI)
would represent the binding of protein A to protein B by drawing
an edge between the “A” node and “B” node in the network.
Several publicly available databases have been created to translate
experimentally discovered protein interactions into PPIs, such as
iRefIndex [8], BioGRID [9], and STRING [10]. There are also
databases that store interactions of proteins with other molecules,
such as metabolites [11–13]. In other types of networks, the edges
can represent more abstract relationships. For example, in a
correlation-based network, edges between nodes might represent
probable co-regulation, rather than physical interactions, based on
covariance between the concentration of molecule A and molecule
B [14, 15].
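The node-and-edge representation described above can be illustrated with a small adjacency structure. This is a toy sketch only: the protein names are placeholders, the function names are hypothetical, and it is not how any of the cited databases or Omics Integrator store their networks.

```python
from collections import defaultdict


def build_network(edges):
    """Build an undirected network as a mapping node -> set of neighbors."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def edges_between_hits(adjacency, hits):
    """Return the sorted list of network edges whose two ends are both hits."""
    hits = set(hits)
    found = set()
    for a in hits:
        for b in adjacency.get(a, ()):
            if b in hits:
                found.add(tuple(sorted((a, b))))
    return sorted(found)
```

For a PPI, each pair in `edges` would be a reported physical interaction; for a correlation-based network, the same structure could hold pairs whose concentrations covary.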
Mapping high-throughput hits onto networks in search of
affected pathways has several advantages. Hits that are close to
each other in a network might function in the same pathway.
Focusing on subnetworks of functionally related nodes can produce
a more tractable number of targets, rather than the potentially
hundreds of individual factors identified in high-throughput
experiments. In addition, this type of pathway identification
reduces the chance of devoting resources to the analysis of false
positives from the high-throughput screen. Although the confi-
dence for each hit in a screen may be low, the confidence in a
pathway that contains many hits is much higher. Finally, pathway
analysis can help to find novel nodes that may not have appeared in
a high-throughput screen. These “hidden nodes” can be false
negatives in a screen, or true negatives that are nonetheless impor-
tant players in the investigated biological system. Our work has
shown that these hidden nodes can often be important to a system
under study, despite the lack of direct experimental evidence [4, 16,
17]. Using the PPI to discover these pathways de novo, rather than
relying on predetermined pathway databases like KEGG [18],
expands our ability to find novel information, and avoids biasing
the results toward well-studied pathways.
Network-based Integration of ‘Omic’ Data
2 Materials
Fig. 1 Outline of the Omics Integrator workflow. Epigenomic data (open chromatin regions or histone marks)
and transcriptomic data are used to predict influential transcription factors (TFs). Transcription factors and
proteomic data are then mapped onto an interactome, and the Prize Collecting Steiner Forest algorithm is used
to produce small pathways and subnetworks predicted to be relevant to the experimental system
3 Methods
3.1 Installation of Omics Integrator
1. You can run Omics Integrator as a web tool on our website,
http://fraenkel.mit.edu/omicsintegrator/, or install it on your
own computer using the instructions at https://github.com/fraenkel-lab/OmicsIntegrator.
You should make sure you have
all dependencies (see Note 2) installed and that you have the
most recent version of Omics Integrator from our GitHub
page (see Note 3).
3.2 Finding Transcriptional Regulators with Garnet
Garnet uses differentially expressed genes from your transcriptomic
assays (e.g., RNA-seq) to predict transcription factors (TFs) that are
likely to be responsible for the altered gene expression. It uses
epigenomic data to identify regions of the genome in which to look for
differential TF binding. For example, this could be ATAC-seq data that
point out accessible regions of the genome in your cell type. The
algorithm will search for transcription factor binding motifs within
regions implicated by your epigenomic data. The strength of these
motifs is then correlated with the magnitude of change of nearby
differentially expressed genes to give each TF a score.
1. Obtain epigenomic data for cell lines related to your samples
from one of the sources listed under Subheading 2.1. Alterna-
tively, if you have epigenomic data for your own samples, you
can use this as well. These data can be in the form of histone
marks ChIP-seq, or DNase-seq or ATAC-seq, all of which
indicate accessible chromatin regions where a TF might be
bound. Collect these data in a BED-formatted file.
2. Go to the Galaxy webserver [23] (see Note 4) to extract the
DNA sequences for your epigenomic regions. Upload your
BED file to Galaxy under the “Get Data” tool, specify which
genome you are using, and then use the “Fetch Alignments/Sequences”>“Extract Genomic DNA” tool to download a
FASTA-formatted file.
3. Format your experimentally derived gene expression data in a
tab-delimited file with two columns. The first should be the
name of the gene, and the second should be the log-fold-change
of that gene in the study conditions (e.g., tumor
vs. control). We recommend including only genes with a statistically
significant change in expression (see Note 5).
4. Create the Garnet configuration file. For an example configu-
ration file, see the README on the Omics Integrator GitHub
page, or the comment on the top of scripts/garnet.py. Your
configuration file should be formatted similarly, but you should
replace the paths to the bedfile, fastafile, and expressionFile with
the paths to the files you created in steps 1–3 in Subheading
3.2. Make sure the annotation files referenced by genefile,
xreffile, and genome are using the correct genome for your
sample (files for mm9 and hg19 are provided with Omics
Integrator).
5. You can change the parameters to your liking (Table 1).
6. Run Garnet on the command line by navigating to the direc-
tory with garnet.py and running python garnet.py yourconfigfile.
cfg. You can also add a --outdir directoryname flag if you would
like to put the output from garnet into a different directory.
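As an illustration of step 3 above, the two-column, tab-delimited expression file could be written as follows. The function name, the input tuple layout, and the significance cutoff `alpha` are hypothetical choices for this sketch; whether Garnet tolerates a header line is not stated here, so none is written.

```python
import csv


def write_garnet_expression_file(rows, out_path, alpha=0.05):
    """Write gene name and log-fold-change for significantly changed genes.

    rows: iterable of (gene, log_fold_change, adjusted_p_value) tuples,
          e.g. taken from a differential expression analysis.
    alpha: illustrative significance cutoff, not a value from the protocol.
    Returns the number of genes written.
    """
    kept = 0
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for gene, log_fold_change, adjusted_p in rows:
            if adjusted_p < alpha:  # keep only significant changes
                writer.writerow([gene, log_fold_change])
                kept += 1
    return kept
```

The path of the resulting file is what the expressionFile entry of the Garnet configuration file would point to.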
Table 1
An explanation of the parameters used by Garnet
Parameter   Explanation
windowsize  The maximum distance, in nucleotides, from a gene's transcription start site (TSS) to a TF
binding motif at which the two are considered related. Higher values will find more TFs, but their
binding may be farther away from the gene and thus less likely to be directly related to
expression. Values usually range from 2000 to 20,000
pvalThresh  The p value of a correlation measures how likely you are to obtain this correlation value if the
events were not correlated. This threshold determines which transcription factors will be
passed to Forest: only those whose correlation with expression has a p value below the provided
threshold will be included. Recommended values range from 0.01 to 0.05. Leave this
value blank to use a q value threshold rather than a p value threshold
qvalThresh  A q value is a false discovery rate (FDR) adjusted p value. Using this measure will result in fewer
false positives. This threshold determines which transcription factors will be passed to
Forest: only those whose correlation with expression has a q value below the provided threshold
will be included. Recommended values range from 0.01 to 0.05. Leave this value blank if
a p value threshold is sufficient. (If you are going on to run Forest, a p value is generally
sufficient, since the network nature of Forest makes false positives less likely to appear in a
final network)
3.3 Network Integration with Forest
Forest integrates proteomic data and the output from Garnet into a
network. After mapping the data onto a provided interactome
network, it uses the prize-collecting Steiner tree algorithm (solved
by the msgsteiner code that you downloaded and installed) to find
an optimal set of subnetworks. These subnetworks can then be
analyzed for pathway context.
1. If you are not using the default interactome provided with
Omics Integrator, prepare your input interactome file. An
interactome file (or “edge file”) contains the large network of
all known connections between nodes. The file should be
Table 2
An explanation of the parameters used by Forest
w This parameter influences the number of separate trees detected, which can aid in identifying
functionally distinct processes. Higher values of w lead to more trees in the optimal forest,
while lower values force most prizes to be found in the same tree. Values usually range
from 1 to 10. See Tuncbag et al. [14] for a more detailed explanation
b This parameter linearly scales the prizes, thereby changing the relative weighting of edge
weights and node prizes. Higher values lead to larger trees, including some
low-confidence edges, while lower values force networks to be small and use only high
confidence edges, and lead to the possible exclusion of some prize terminals. Values
usually range from 1 to 20
D This parameter sets the maximum depth from the dummy node, or root of the tree, to the
leaf nodes. Higher values lead to long pathways, while lower values lead to shorter
disparate pathways. Values usually range from 5 to 15
μ This parameter controls negative prizes in Forest. Negative prizes are explained in detail in
Section 3.4.2. The default value is zero, and if you want to use negative prizes, values
usually range from 0.0001 to 0.1
garnetBeta This parameter controls the relative weighting of TF scores derived from Garnet and prize
values on proteomic nodes. Higher values will encourage the inclusion of more TF nodes
in the network, while lower values force networks to include only the most significant or
pathway-relevant TF nodes. Typically, the value for this parameter is set to the median
value of the proteomic prizes divided by the median value of the TF scores
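The rule of thumb given for garnetBeta in Table 2 (median proteomic prize divided by median TF score) is simple enough to compute directly. The function name below is hypothetical, and the result is only a starting value to be tuned together with the other parameters.

```python
from statistics import median


def suggest_garnet_beta(proteomic_prizes, tf_scores):
    """Median proteomic prize divided by median Garnet TF score."""
    median_tf = median(tf_scores)
    if median_tf == 0:
        raise ValueError("median TF score is zero; ratio is undefined")
    return median(proteomic_prizes) / median_tf
```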
and D. If you are including results from Garnet, you will also
need a garnetBeta parameter. See Subheading 3.4.1 for more
information.
4. You can now run forest with the command python forest.py -p
yourprizefile.txt -e youredgefile.txt -c yourconfigfile.txt --garnet
yourgarnetoutput_FOREST_INPUT.tsv. You can also add a
--outlabel yourexperimentname flag to give your output files a
prefix and a --outpath directoryname flag if you would like to
put the output from forest into a different directory. You may
need to add a --msgpath directoryname flag to indicate where
you installed the msgsteiner code during the installation step.
There are several other optional flags you can add to this
command if desired (see Note 7).
5. Forest will run through several steps, informing you on the
command line where it is in the process. These steps include:
- reading in your input files.
- running the msgsteiner optimization.
- writing the output files.
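When Forest is launched from a script, for instance to try several configuration files, it can help to assemble the command of step 4 programmatically. The helper below is a hypothetical convenience, not part of Omics Integrator; it uses only the flags named in step 4, typed with plain ASCII hyphens, and it builds the argument list without running anything.

```python
def forest_command(prize_file, edge_file, config_file,
                   garnet_output=None, outlabel=None,
                   outpath=None, msgpath=None):
    """Return the forest.py command line as a list of arguments."""
    command = ["python", "forest.py",
               "-p", prize_file,
               "-e", edge_file,
               "-c", config_file]
    optional_flags = [
        ("--garnet", garnet_output),  # Garnet output, if TFs are included
        ("--outlabel", outlabel),     # prefix for the output files
        ("--outpath", outpath),       # directory for the output files
        ("--msgpath", msgpath),       # location of the msgsteiner code
    ]
    for flag, value in optional_flags:
        if value is not None:
            command.extend([flag, value])
    return command
```

The returned list can be passed to `subprocess.run(command, check=True)` from the directory that contains forest.py.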
3.4 Network Quality Control
1. We recommend checking the robustness and specificity of your
networks. You can do this by adding flags to the forest.py
command. Add --noisyEdges 10 to test the robustness of your network
to noise in the edge weights. This flag will add
Gaussian noise to the edge weights, re-run Forest ten (or your
input number of) times, and then merge the results into output
files with noisyEdges in the filenames. Add --randomTerminals
10 to test the specificity of your network to your input terminals.
3.4.1 Choosing Parameters for Forest
The network resulting from this data integration algorithm is
highly dependent on several parameters. These include w, b, D, μ,
and garnetBeta (Table 2).
We recommend running Forest over a range of these values to
find the best set for your system. For an example of a script for
testing parameters, see OmicsIntegrator/example/GBM/GBM_case_study.py.
Once you have several resulting networks,
we recommend choosing the best result as follows:
1. Choose a set of parameters that maximizes the fraction of
input prize nodes that are included in the final network and are
robust to noise (as judged by the noisyEdges runs).
2. Some parameters will lead to networks with large “hubs,” that
is, one hidden protein in the middle connected to several prize
nodes with few interactions between these “spokes.” These
hubs are usually neither informative nor very specific to one system.
We recommend choosing parameters that minimize this effect by
comparing the average degree of hidden nodes in your network
(i.e., the number of edges connecting to those nodes in the
interactome) with the average degree of prize nodes. A
good parameter set will minimize the difference between these
metrics. Figure 2 shows an example of this analysis using the
data in the example/a549 folder.
3. Once conditions 1 and 2 are satisfied, we prefer larger networks,
as those provide the most opportunities for novel discoveries
of hidden nodes and pathways enriched in the
subnetworks.
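The three selection criteria above can be turned into per-network summary numbers, as sketched below. The function name and the dictionary keys are hypothetical, robustness to noise is not covered (it would come from the noisyEdges output), and the average prize-node degree is taken over the prize nodes that made it into the network, which is one possible reading of the text.

```python
def summarize_run(network_nodes, prize_nodes, interactome_degree):
    """Summarize one Forest result for comparison across parameter sets.

    network_nodes:      nodes in the final network of this run
    prize_nodes:        all input prize nodes (terminals)
    interactome_degree: mapping node -> number of edges in the interactome
    """
    network_nodes = set(network_nodes)
    prize_nodes = set(prize_nodes)
    if not prize_nodes:
        raise ValueError("at least one prize node is required")

    included_prizes = network_nodes & prize_nodes
    hidden_nodes = network_nodes - prize_nodes

    def mean_degree(nodes):
        if not nodes:
            return 0.0
        return sum(interactome_degree[n] for n in nodes) / len(nodes)

    return {
        # criterion 1: fraction of input prizes kept in the network
        "prize_fraction": len(included_prizes) / len(prize_nodes),
        # criterion 2: gap between hidden-node and prize-node average degree
        "degree_gap": abs(mean_degree(hidden_nodes) - mean_degree(included_prizes)),
        # criterion 3: network size, used to prefer larger networks
        "size": len(network_nodes),
    }
```

A run dominated by a single high-degree hub shows up as a large `degree_gap`, which is the pattern described for ubiquitin-C in the A549 example.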
3.4.2 Negative Prizes in Forest
One of the more innovative aspects of Omics Integrator is its ability
to incorporate negative evidence. There are two settings in which
Fig. 2 An analysis of several parameter sets when running Forest on the sample A549 data provided with
Omics Integrator. A good parameter set will minimize the difference between the average degree of prize
nodes and hidden nodes, and will include a large number of prize nodes. A good choice of a parameter set is
highlighted by the black arrow. The A549 dataset reflects phosphoproteomic changes in a lung cancer cell line
when stimulated with TGF-beta. The black arrow highlights a network that includes relevant nodes such as
EGFR, while networks with large average degree of hidden nodes are mostly comprised of a hub centered on
ubiquitin-C, which connects to most prize nodes in the interactome, but is not specific to the lung cancer cell
system
4 Notes
is set not to turn off or interrupt the run. You can also run
Omics Integrator on a cloud server. However, if the run is
taking more than a day, you should cancel the run and look
for errors. In particular, try running Forest without or with a
smaller input to noisyEdges or randomTerminals, as these
options can lead to large memory and time consumption.
High values for the D parameter can also increase runtime.
Acknowledgments
References
Abstract
Epigenetic modifications play a key role in cellular development and tumorigenesis. Recent large-scale
genomic studies have shown that mutations in players of the epigenetic machinery and concomitant
perturbation of epigenomic patterning are frequent events in tumors. Among epigenetic marks, DNA
methylation is one of the best studied. Hyper- and hypo-methylation events of specific regulatory elements
(such as promoters and enhancers) are sometimes thought to be correlated with expression of nearby genes.
High-throughput bisulfite converted sequencing is currently the technology of choice for studying DNA
methylation in base-pair resolution and on whole-genome scale. Such broad and high-resolution coverage
investigations of the epigenome provide unprecedented opportunities to analyze DNA methylation pat-
terns, which are correlated with tumorigenesis, tumor evolution, and tumor progression. However, few
computational pipelines are available to the public to perform systematic DNA methylation analysis.
Utilizing open source tools, we here describe a comprehensive computational methodology to thoroughly
analyze DNA methylation patterns during tumor evolution based on bisulfite converted sequencing data,
including intra-tumor methylation heterogeneity.
1 Introduction
Louise von Stechow (ed.), Cancer Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 1711,
https://doi.org/10.1007/978-1-4939-7493-1_3, © Springer Science+Business Media, LLC 2018
Heng Pan and Olivier Elemento
2 Materials
[Figure 1 flowchart labels: FASTQ files; FastQC; FastQC report; Trim Galore; Trimmed FASTQ files; Bismark; BAM files; Errbs-tools; Methyl files; RRBSseeqer; ChIPseeqer; Regulatory Region Methyl; MH Hotspots; DMCs/DMRs; Gene list]
Fig. 1 Schematic of the ERRBS data analysis pipeline. This comprehensive computational pipeline starts from the
ERRBS FASTQ files. The first step is to use FastQC to perform a quality check of the ERRBS data and make sure the
data quality is good enough for downstream analysis. Second, Trim Galore is used to remove adapter
contamination. Third, Bismark is used to map reads to bisulfite converted genomes and call methyl files,
which indicate the methylation status of each CpG site in the genome. Next, several computational tools, including
Errbs-tools, ChIPseeqer, and RRBSseeqer, are used to perform downstream analysis. MH hotspots can be used
to perform global MH analysis and link MH to tumor evolution and disease progression. Individual DMCs/DMRs
can be annotated to their nearest genes, and such gene lists can be used to perform pathway analysis. Methylation
levels for regulatory regions are good inputs for both supervised and unsupervised types of downstream
analysis
Tumor Methylome Analysis
2.2 Input Files
1. For fastqc in FastQC: FASTQ format files of normal or tumor
samples are the most common inputs; BAM or SAM format files
are also acceptable.
2. For trim_galore in Trim Galore: FASTQ format files of healthy
tissue or tumor samples are required.
3. For bismark_genome_preparation in Bismark: FASTA format files
of the genome reference are required.
4. For bismark in Bismark: FASTQ format files processed with
trim_galore are required. FASTA format files are also acceptable
but not recommended, since quality values are missing
for this type of data.
5. For bismark_methylation_extractor in Bismark: BAM files
from bismark are used as inputs.
6. For methylCall_from_Bismark.py in Errbs-tools:
CpG_OT_sample.RRBS_trimmed.1bp.fq_bismark.txt and
CpG_OB_sample.RRBS_trimmed.1bp.fq_bismark.txt from
bismark_methylation_extractor are used as input files. Reads in file set
1 (labeled with OT) reflect methylation levels of CpGs in the
forward strand. Reads in file set 2 (labeled with OB) contain
methylation information of CpGs in the reverse strand.
7. For epicore2calls.pl in RRBSseeqer: Methyl files from
methylCall_from_Bismark.py are used as input files (see Table 1).
8. For RRBSseeqer_CG in RRBSseeqer: output files from
epicore2calls.pl are used as inputs.
9. For RRBSidentifyUpDownDMR.pl in RRBSseeqer: output
files with DMC information from RRBSseeqer_CG are used
as input files.
10. For ChIPseeqerAnnotate, mergeCSAnnotateGenesColumns.pl,
make_PAGE_input.pl, and page.pl in ChIPseeqer:
files with DMR information from RRBSidentifyUpDownDMR.pl
are used as inputs. These four tools run sequentially, each
taking the output files of the previous one as input.
11. For regionMethyl.R in Errbs-tools: two kinds of input files are
required. One is the Methyl file from methylCall_from_Bismark.py
(see Table 1); the other is an RDS format file containing
genomic region annotations in GRanges or GRangesList
objects [18, 19]. RDS is an R-specific format that
stores a single R object.
12. For regionMH.R in Errbs-tools: three types of input files are
required. The first is the Methyl file from
methylCall_from_Bismark.py (see Table 1). The second is the BAM
file from bismark; BAM is a binary format for storing
sequence data and is more space-saving than the
SAM format. The last is the RDS format file
containing the genomic region annotations in GRanges
or GRangesList objects [18, 19].
Table 1
RRBSseeqer input files example
3 Methods
3.2 ERRBS Reads Alignment
Mapping ERRBS reads to a bisulfite-converted genome presents
many computational challenges. Alignments should allow for
mismatches, especially at potential methylation sites. Alignments
should also be unique: because each read can carry many
combinations of methylation states, ambiguous placements would lead
to miscalled methylation levels. Among the publicly available mapping
tools, such as BSMAP, RMAP-bs, MAQ, and BS Seeker [20–23], we have
chosen Bismark [15] to map ERRBS reads because of a couple of
substantial advantages (see Note 4).
The alignment process requires two steps:
1. Bisulfite converted genome preparation: typically no parameter
changes are required for the genome preparation process. The
only thing that must be specified is the directory
where the genome reference files are located. These files need to
be in FASTA format and can be downloaded from public
databases such as the UCSC Genome Browser or Ensembl
[24, 25]. A recent genome build is recommended, e.g., hg19
or GRCh38.
$ bismark_genome_preparation [options] <path_to_genome_folder>
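The idea behind the genome preparation step can be illustrated with a small Python sketch. It only mimics the concept of building fully converted copies of a reference sequence (C to T for one strand, G to A for the other) and is not Bismark's own code; the function name bisulfite_convert and the toy sequence are assumptions made for illustration.

```python
def bisulfite_convert(seq: str) -> dict:
    """Return two fully converted copies of a reference sequence.

    Hypothetical helper for illustration only; Bismark performs the
    conversion and indexing itself during genome preparation.
    """
    seq = seq.upper()
    return {
        "C_to_T": seq.replace("C", "T"),  # every C read as T
        "G_to_A": seq.replace("G", "A"),  # every G read as A (opposite strand)
    }


converted = bisulfite_convert("ACGTCCGGA")  # toy sequence, not real data
print(converted["C_to_T"])
print(converted["G_to_A"])
```

Because all cytosines are removed from the converted copies, methylation status cannot bias where a read aligns; the methylation call is made afterwards by comparing the read with the original sequence.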
3.3 Cytosine Methylation State Calling
Once suitable ERRBS alignments are generated, the methylation
level for individual sites (mostly CpG sites) can be determined. To
be consistent with our alignment process, we use
bismark_methylation_extractor, a script from Bismark, for this
purpose. After methylation levels are generated, we
perform quality checks to assess the accuracy of individual CpG
methylation levels. We then convert the data into a more user-friendly
format for further analysis. This process consists of the following
steps:
1. Extract CpG methylation levels from BAM files: we use
bismark_methylation_extractor from Bismark to extract CpG
methylation levels from each read in the BAM files. This tool
is one of the most important advantages of Bismark compared
to other computational tools (see Note 4).
Table 2
Bismark output statistics example
<input_file> sets the name of the Methyl files from step 3 (see
Table 1). <output_dir> specifies that all output files are writ-
ten into this directory.
Example:
Using one of our BAM files as an example (DLBCL_1D.
ERRBS_trimmed.fq_bismark.bam), the commands are as
follows:
1. $ bismark_methylation_extractor -s --output . --merge_non_CpG
--multicore 6 --genome_folder genome/
DLBCL_1D.ERRBS_trimmed.fq_bismark.bam
2. $ python methylCall_from_Bismark.py -c 10 DLBCL_1D
bismark_output/ cpg/
3. $ perl epicore2calls.pl cpg.DLBCL_1D.mincov10.txt | gzip
> cpg.DLBCL_1D.mincov10.txt.calls.gz
As before, all files, directories, and samples are assumed to be in
the current working directory. If they are located elsewhere, the
full path to each file and directory must be specified.
When working with non-directional ERRBS
data, additional parameters are required as indicated in Subheading
4 (see Notes 6 and 7). In the above example, the minimum coverage
per CpG was set to 10. Sufficient read coverage is needed for the
methylation level of a CpG to be reliable. For ERRBS analysis, 10 is
always used as the cutoff, a tradeoff between the number of CpGs
retained and sequencing cost. Suggestions on how to choose this
parameter are given in Subheading 4 (see Note 8).
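The effect of the minimum coverage cutoff can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the logic of methylCall_from_Bismark.py: the input dictionary, the function name methylation_calls, and the example counts are hypothetical.

```python
def methylation_calls(counts, min_cov=10):
    """Keep CpGs with at least min_cov reads and compute their methylation.

    counts maps (chrom, pos) -> (methylated_reads, unmethylated_reads).
    This layout is an assumption for illustration, not the Methyl file format.
    """
    calls = {}
    for site, (meth, unmeth) in counts.items():
        coverage = meth + unmeth
        if coverage >= min_cov:
            # Methylation level as the percentage of methylated reads.
            calls[site] = (coverage, 100.0 * meth / coverage)
    return calls


# Hypothetical read counts for three CpGs.
example = {
    ("chr1", 10468): (9, 3),
    ("chr1", 10470): (2, 1),
    ("chr1", 10483): (0, 10),
}
calls = methylation_calls(example, min_cov=10)
for site, (coverage, level) in sorted(calls.items()):
    print(site, coverage, level)
```

Raising min_cov removes more CpGs but makes each retained methylation percentage less sensitive to sampling noise, which is the tradeoff described above.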
[Figure 2 panels: (a) pathways (rows) by patients (columns), showing enrichment or depletion; (b) regulatory elements (rows) by patient samples at diagnosis (1D to 11D) and relapse (1R1 to 11R) (columns), showing the row z-score of methylation level]
Fig. 2 Examples of DMR identification and visualization. (a) Pathways overrepresented among the
hypermethylated genes (promoters overlapping hypermethylated DMRs) of individual patients.
Each row represents a single pathway and each column represents a patient pair. (b) Each row represents a
single differentially methylated regulatory element. Each column represents a single diagnosis or relapse
sample from a patient. Scale bars represent z-scores of methylation levels. Values were centered and scaled
by row
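Row-wise centering and scaling, as used for panel (b), can be sketched as follows. The function name row_zscores and the toy matrix are assumptions, and the use of the sample standard deviation is likewise an assumption rather than a statement about the plotting software used for the figure.

```python
from statistics import mean, stdev


def row_zscores(matrix):
    """Center and scale each row: z = (x - row mean) / row standard deviation."""
    scaled = []
    for row in matrix:
        mu = mean(row)
        sd = stdev(row)  # sample standard deviation (n - 1), an assumption
        # Rows without any variation are mapped to zeros to avoid dividing by 0.
        scaled.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return scaled


# Hypothetical methylation levels (%) for two regions across three samples.
toy = [[10.0, 20.0, 30.0], [5.0, 5.0, 5.0]]
z = row_zscores(toy)
print(z)
```

Scaling by row makes regions with very different baseline methylation comparable, so the heatmap colors reflect relative change across samples rather than absolute level.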
$ mergeCSAnnotateGenesColumns.pl --genefile=<input_file>
--outfile=<output_file> [options]
This tool creates an input file for iPAGE. Briefly, each gene is
labeled as “gene of interest” (a gene near a DMR) or “back-
ground”. <input_file> indicates the output files from mer-
geCSAnnotateGenesColumns.pl, which should be used as
input for this step. The --refgene parameter specifies the gene
data annotation used by ChIPseeqer and is used to create the
background gene category.
5. Pathway analysis of DMR-related genes: given a gene profile
with genes labeled either as genes of interest or as background,
iPAGE is used to run pathway analysis against known pathways
and gene sets. It uses mutual information to connect input
gene sets and published gene sets and pathways.
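The mutual information between a gene labeling and a pathway can be illustrated with a small Python sketch. It is a generic calculation for two binary vectors and does not reproduce iPAGE, which is a separate tool with its own procedure; the function name and the toy vectors are assumptions.

```python
from collections import Counter
from math import log2


def mutual_information(labels, membership):
    """Mutual information (in bits) between two equally long discrete vectors.

    labels:     1 for a gene of interest, 0 for background
    membership: 1 if the gene belongs to the pathway, 0 otherwise
    """
    n = len(labels)
    joint = Counter(zip(labels, membership))
    p_label = Counter(labels)
    p_member = Counter(membership)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * log2(p_xy / ((p_label[x] / n) * (p_member[y] / n)))
    return mi


# Hypothetical example with four genes.
informative = mutual_information([1, 1, 0, 0], [1, 1, 0, 0])
uninformative = mutual_information([1, 1, 0, 0], [1, 0, 1, 0])
print(informative, uninformative)
```

A pathway whose membership tracks the gene-of-interest label yields high mutual information, whereas a pathway that is unrelated to the label yields a value near zero.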
3.5 Genomic Region-Specific DMRs Analysis (Supervised)
The analysis in Subheading 3.4 is currently limited to pairwise
sample analysis. While it can be extended to more than two samples,
an alternative approach is to compare the methylation levels of
specific regions across two groups of samples. Groups of samples
can be defined based on clinical variables such as diagnosis and
relapse, chemo-resistant versus chemo-refractory, etc. Genomic
regions can be defined as promoters, CpG islands, enhancers, and
binding sites for certain proteins, e.g., CTCF [34]. The proposed
2. > pheatmap(mat, ...)
In this example, the --regions parameter specifies a list of
promoters (defined as 2 kb windows centered on RefSeq
transcription start sites) in GRanges data format [19]. Other regions
such as CpG islands and enhancers can also be used here.
[Figure 3 panel labels: x-axis, DNA Methylation (%) from 0 to 100; y-axis, Epipolymorphism from 0.0 to 1.0; areas marked as high and low intra-tumor heterogeneity; panel (c), loci in promoters, Diagnosis versus Relapse, p=0.002414]
Fig. 3 Examples of intra-tumor MH analysis. (a) Epipolymorphism levels are dependent on DNA methylation
levels. All loci are divided into groups based on their methylation level, and the median epipolymorphism
of each group is calculated. Genome-wide intra-tumor MH is quantified by the area under the median line. (b)
Median epipolymorphism lines for diagnosis and relapse tumors from patient 1.1 in our cohort. Intra-tumor MH
visibly decreased with tumor evolution. (c) Relapsed samples displayed significantly lower intra-tumor MH. All
the loci located in gene promoter
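The quantification described in panel (a) can be sketched in Python as follows. This is a minimal illustration and not the code of regionMH.R: the bin width of 10%, the use of bin centers as x-coordinates, the trapezoidal rule, and all function names and values are assumptions.

```python
from statistics import median


def median_line(loci, bin_width=10):
    """Group loci by methylation level and take the median epipolymorphism.

    loci: iterable of (methylation_percent, epipolymorphism) pairs.
    Returns a sorted list of (bin_center, median_epipolymorphism) points.
    """
    last_bin = int(100 // bin_width) - 1
    bins = {}
    for meth, epi in loci:
        index = min(int(meth // bin_width), last_bin)  # 100% joins the top bin
        bins.setdefault(index, []).append(epi)
    return sorted(((i + 0.5) * bin_width, median(v)) for i, v in bins.items())


def area_under_line(points):
    """Trapezoidal area under a piecewise-linear line through the points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical loci: (methylation %, epipolymorphism).
toy_loci = [(5, 0.1), (8, 0.3), (50, 0.6), (55, 0.8), (95, 0.2), (100, 0.0)]
line = median_line(toy_loci)
mh = area_under_line(line)
print(line, mh)
```

Summarizing the whole curve as one area gives a single genome-wide number per sample, which is what allows diagnosis and relapse tumors to be compared as in panels (b) and (c).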
3.7 Conclusion and Outlook
Epigenetic modifications play a key role in cell development and
tumorigenesis. DNA methylation is one of the best studied epigenetic
modifications. High-throughput bisulfite sequencing technology
provides great opportunities to analyze DNA
methylation patterns during various physiological and
pathophysiological processes. DNA methylation is relevant for cancer
biology and has been linked to tumor evolution. Here we describe a
comprehensive computational methodology to analyze DNA methylation,
utilizing open source tools and our own in-house software. Our
methodology starts with pre-alignment quality control and data
cleaning, followed by alignment, methylation state
calling, and multiple downstream analyses. Following our
directions, users can perform supervised and unsupervised analyses at
different scales, including base-pair DMCs, patient-specific DMRs,
and genomic region-specific DMRs. Using these tools to identify
DNA methylation abnormalities makes it possible to link them
to cellular development, tumor progression,
and tumor evolution.
It is still unclear how DNA methylation or epigenetic
modifications contribute to genetic changes and subsequently influence
tumor evolution. Computationally, faster and more accurate
4 Notes
References
1. Dawson MA, Kouzarides T (2012) Cancer epigenetics: from mechanism to therapy. Cell 150:12–27
2. Clozel T, Yang S, Elstrom RL, Tam W, Martin P, Kormaksson M, Banerjee S, Vasanthakumar A, Culjkovic B, Scott DW, Wyman S, Leser M, Shaknovich R, Chadburn A, Tabbo F, Godley LA, Gascoyne RD, Borden KL, Inghirami G, Leonard JP, Melnick A, Cerchietti L (2013) Mechanism-based epigenetic chemosensitization therapy of diffuse large B-cell lymphoma. Cancer Discov 3:1002–1019. https://doi.org/10.1158/2159-8290.CD-13-0117
3. Shaknovich R, Melnick A (2011) Epigenetics and B-cell lymphoma. Curr Opin Hematol 18:293–299. https://doi.org/10.1097/MOH.0b013e32834788cf
4. Pan H, Jiang Y, Boi M, Tabbò F, Redmond D, Nie K, Ladetto M, Chiappella A, Cerchietti L, Shaknovich R, Melnick AM, Inghirami GG, Tam W, Elemento O (2015) Epigenomic evolution in diffuse large B-cell lymphomas. Nat Commun 6:6921
5. Lin P-CC, Giannopoulou EG, Park K, Mosquera JM, Sboner A, Tewari AK, Garraway LA, Beltran H, Rubin MA, Elemento O (2013) Epigenomic alterations in localized and advanced prostate cancer. Neoplasia 15:373–383. https://doi.org/10.1593/neo.122146
6. Pike BL, Greiner TC, Wang X, Weisenburger DD, Hsu Y-H, Renaud G, Wolfsberg TG, Kim M, Weisenberger DJ, Siegmund KD, Ye W, Groshen S, Mehrian-Shai R, Delabie J, Chan WC, Laird PW, Hacia JG (2008) DNA methylation profiles in diffuse large B-cell lymphoma and their relationship to gene expression status. Leukemia 22:1035–1043. https://doi.org/10.1038/leu.2008.18
7. Esteller M (2002) CpG island hypermethylation and tumor suppressor genes: a booming present, a brighter future. Oncogene 21:5427–5440
8. Shaknovich R, Geng H, Johnson NA, Tsikitas L, Cerchietti L, Greally JM, Gascoyne RD, Elemento O, Melnick A (2010) DNA methylation signatures define molecular subtypes of diffuse large B-cell lymphoma. Blood 116:e81–e89
9. Akalin A, Garrett-Bakelman FE, Kormaksson M, Busuttil J, Zhang L, Khrebtukova I, Milne TA, Huang Y, Biswas D, Hess JL, Allis CD, Roeder RG, Valk PJM, Löwenberg B, Delwel R, Fernandez HF, Paietta E, Tallman MS, Schroth GP, Mason CE, Melnick A, Figueroa ME (2012) Base-pair resolution DNA methylation sequencing reveals profoundly divergent epigenetic landscapes in acute myeloid leukemia. PLoS Genet 8:e1002781. https://doi.org/10.1371/journal.pgen.1002781
10. Meissner A, Gnirke A, Bell GW, Ramsahoye B, Lander ES, Jaenisch R (2005) Reduced representation bisulfite sequencing for comparative high-resolution DNA methylation analysis. Nucleic Acids Res 33:5868–5877
11. Sidow A, Spies N (2015) Concepts in solid tumor evolution. Trends Genet 31:208–214
12. Landau DA, Clement K, Ziller MJ, Boyle P, Fan J, Gu H, Stevenson K, Sougnez C, Wang L, Li S, Kotliar D, Zhang W, Ghandi M, Garraway L, Fernandes SM, Livak KJ, Gabriel S, Gnirke A, Lander ES, Brown JR, Neuberg D, Kharchenko PV, Hacohen N, Getz G, Meissner A, Wu CJ (2014) Locally disordered methylation forms the basis of intratumor methylome variation in chronic lymphocytic leukemia. Cancer Cell 26:813–825. https://doi.org/10.1016/j.ccell.2014.10.012
13. Andrews S (2010) FastQC: a quality control tool for high throughput sequence data. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
14. Krueger F (2012) Trim Galore!. http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/
15. Krueger F, Andrews SR (2011) Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27:1571–1572
16. Giannopoulou EG, Elemento O (2011) An integrated ChIP-seq analysis platform with customizable workflows. BMC Bioinformatics 12:277
17. Goodarzi H, Elemento O, Tavazoie S (2009) Revealing global regulatory perturbations across human cancers. Mol Cell 36:900–911. https://doi.org/10.1016/j.molcel.2009.11.016
18. R Development Core Team (2011) R Foundation for Statistical Computing, Vienna AI 3-900051-07-0. R A Lang Environ Stat Comput 55:275–286
19. Lawrence M, Huber W, Pagès H, Aboyoun P, Carlson M, Gentleman R, Morgan MT, Carey VJ (2013) Software for computing and annotating genomic ranges. PLoS Comput Biol 9:e1003118
20. Xi Y, Li W (2009) BSMAP: whole genome bisulfite sequence MAPping program. BMC Bioinformatics 10:1–9
21. Smith AD, Chung WY, Hodges E, Kendall J, Hannon G, Hicks J, Xuan Z, Zhang MQ (2009) Updates to the RMAP short-read mapping software. Bioinformatics 25:2841–2842
22. Li H, Ruan J, Durbin R (2008) Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res 18:1851–1858
23. Chen P-Y, Cokus SJ, Pellegrini M (2010) BS Seeker: precise mapping for bisulfite sequencing. BMC Bioinformatics 11:203
24. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D (2002) The Human Genome Browser at UCSC. Genome Res 12:996–1006. https://doi.org/10.1101/gr.229102
25. Aken BL, Ayling S, Barrell D, Clarke L, Curwen V, Fairley S, Fernandez-Banet J, Billis K, Garcia-Giron C, Hourlier T, Howe KL, Kahari AK, Kokocinski F, Martin FJ, Murphy DN, Nag R, Ruffier M, Schuster M, Tang YA, Vogel J-H, White S, Zadissa A, Flicek P, Searle SMJ (2016) The Ensembl gene annotation system. Database (Oxford) 2016:baw093. https://doi.org/10.1093/database/baw093
26. Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:1–25. https://doi.org/10.1186/gb-2009-10-3-r25
27. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357–359
28. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25:25–29
29. Shaffer AL, Wright G, Yang L, Powell J, Ngo V, Lamy L, Lam LT, Davis RE, Staudt LM (2006) A library of gene expression signatures to illuminate normal and pathological lymphoid biology. Immunol Rev 210:67–85. https://doi.org/10.1111/j.0105-2896.2006.00373.x
30. Kolde R (2012) Package ‘pheatmap’. Bioconductor:1–6
31. Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M (2016) KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res 44:D457–D462
32. Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M (1999) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 27:29–34
33. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102:15545–15550
34. Lai AY, Fatemi M, Dhasarathy A, Malone C, Sobol SE, Geigerman C, Jaye DL, Mav D, Shah R, Li L, Wade PA (2010) DNA methylation prevents CTCF-mediated silencing of the oncogene BCL6 in B cell lymphomas. J Exp Med 207:1939–1950
35. Landan G, Cohen NM, Mukamel Z, Bar A, Molchadsky A, Brosh R, Horn-Saban S, Zalcenstein DA, Goldfinger N, Zundelevich A, Gal-Yam EN, Rotter V, Tanay A (2012) Epigenetic polymorphism and the stochastic formation of differentially methylated regions in normal and cancerous tissues. Nat Genet 44:1207–1214. https://doi.org/10.1038/ng.2442
36. Eichten SR, Stuart T, Srivastava A, Lister R, Borevitz JO (2016) DNA methylation profiles of diverse Brachypodium distachyon aligns with underlying genetic diversity. Genome Res 26:1520–1531. https://doi.org/10.1101/gr.205468.116
37. Li S, Garrett-Bakelman F, Perl AE, Luger SM, Zhang C, To BL, Lewis ID, Brown AL, D’Andrea RJ, Ross ME, Levine R, Carroll M, Melnick A, Mason CE (2014) Dynamic evolution of clonal epialleles revealed by methclone. Genome Biol 15:472
Chapter 4
Abstract
A variety of molecular techniques can be used in order to unravel the molecular composition of cells. In
particular, the microarray technology has been used to identify novel biomarkers that may be useful in the
diagnosis, prognosis, or treatment of cancer. The microarray technology is ideal for biomarker discovery as
it allows for the screening of a large number of molecules at once. In this review, we focus on microRNAs
(miRNAs) which are key molecules in cells and regulate gene expression post-transcriptionally. miRNAs are
small, single-stranded RNA molecules that bind to complementary mRNAs. Binding of miRNAs to
mRNAs leads either to degradation, or translational inhibition of the target mRNA. Roughly one third
of all the mRNAs are postulated to be regulated by miRNAs. miRNAs are known to be deregulated in
different types of cancer, including breast cancer, and it has been demonstrated that deregulation of several
miRNAs can be used as biological markers in cancer. miRNA expression can for example discriminate
between normal, benign and malignant breast tissue, and between different breast cancer subtypes.
In the post-genomic era, an important task of molecular biology is to understand gene regulation in the
context of biological networks. Because miRNAs have such a pronounced role in cells, it is pivotal to
understand the mechanisms that underlie their control, and to identify how miRNAs influence cancer
development and progression.
Key words Biomarkers, Breast cancer, Cancer, Microarrays, microRNA, Systems biology
1 microRNA Biology
1.1 microRNAs: A Historical Perspective
The central dogma in molecular biology has for a long time been
“DNA makes RNA that makes protein” [1]. However, the impact
of a gene on the phenotype is highly dependent on different
mechanisms that allow a particular gene to be turned “on” or
“off” in a particular state, in a particular cell, at a particular time.
One way this type of regulation can be performed is by small RNA
regulatory units called microRNAs (miRNAs). miRNAs are small,
non-protein-coding RNA molecules that function as negative
regulators of gene expression either by inhibiting translation or
inducing degradation of messenger RNA (mRNA). Lin-4 was the first
Louise von Stechow (ed.), Cancer Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 1711,
https://doi.org/10.1007/978-1-4939-7493-1_4, © Springer Science+Business Media, LLC 2018
56 Andliena Tahiri et al.
1.2 miRNA Biogenesis and Function
The process of generating mature miRNAs in the cell consists of a
series of nuclear and cytoplasmic steps (see Fig. 1). miRNAs are
encoded either independently of protein-coding genes (intergenic)
or inside introns of a host gene (intronic). Transcription is carried
out in the nucleus by RNA polymerase II and produces a long primary
hairpin transcript called the primary miRNA (pri-miRNA). The
pri-miRNA is long (>1 kb) and contains a local stem–loop
structure, which is cleaved by the microprocessor complex (RNase III
Drosha, in combination with DiGeorge syndrome critical region
gene 8) in order to generate a precursor miRNA (pre-miRNA)
[6, 7]. The pre-miRNA is exported to the cytoplasm by Exportin-5
and RAN-GTP, where it is further processed by the RNase III
endonuclease Dicer to form a double-stranded miRNA duplex (~22 nt)
[8]. The duplex is made up of two mature miRNA strands (named
-5p and -3p depending on the 5′ and 3′ directions of the strand),
and is subsequently loaded onto an Argonaute (AGO) protein to
form an effector complex called the RNA-induced silencing
complex (RISC) [7]. Usually, the RNA strand with the unstable 5′-end
is recruited into RISC, whereas the other strand (-3p) is released
and quickly degraded. However, some studies have shown that the
less abundant strand is also active in silencing, albeit usually less
The Role and Function of MicroRNAs in Normal and Pathological Processes 57
[Figure 1 labels: DNA, POL II, pri-miRNA, pre-miRNA, EXP5, cytoplasm, DICER, miRNA strand incorporated into RISC, mRNA degradation]
Fig. 1 The canonical miRNA biogenesis pathway and miRNA function (see the text for details). Pri-miRNA,
primary miRNA; EXP5, Exportin 5; POL II, RNA polymerase II; pre-miRNA, precursor miRNA; RISC, RNA-induced
silencing complex
potently than the more abundant guide strand [19]. Once the
mature miRNA strand is incorporated into the RISC complex,
the miRNA sequence targets mRNA through either perfect or
imperfect complementary binding to the 3′ untranslated region
(UTR), the coding region, or the 5′-UTR of genes [9, 10]. Binding
of RISC to target mRNA can have different outcomes. Imperfect
complementary binding of miRNAs to their targets inhibits
translation and reduces protein expression without affecting the mRNA
levels of these genes. Perfect complementary pairing between
miRNA and mRNA targets the mRNA for degradation by RISC
[11]. The exact mechanism of protein reduction is not fully
understood, but it is likely that it occurs through both RNA degradation
and translational repression pathways, with different miRNAs
contributing to each pathway in different proportions [12]. mRNA
degradation in mammals involves poly(A)-tail shortening
(deadenylation) and decapping at the 5′-end of the
mRNA strand. It is believed that miRNAs regulate a substantial
portion of all protein-coding genes. The complexity of mRNA
regulation through miRNAs is remarkable, as each miRNA can
potentially regulate hundreds of genes, and one gene can be
1.3 miRNA–Target Gene Interactions and Predictions
The dominant target recognition sequence in the miRNA is termed
the “seed” sequence and is located at nucleotides 2–8 from the
5′-end of the miRNA [17]. These positions in the miRNA are
often evolutionarily conserved. Other compensatory rules for
miRNA–mRNA target recognition also exist, but all of them
include some degree of sequence complementarity. miRNA target
prediction is a major task in computational biology. Several in silico
approaches exist that predict targets for a given miRNA and are
described later in more detail. They are based on different criteria
such as complementarity to the miRNA seed region, evolutionary
conservation of the miRNA recognition elements in the mRNA,
free energy of the miRNA–mRNA heteroduplex, and mRNA
sequence features outside the target site [18, 19].
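The seed rule can be made concrete with a short Python sketch that scans a 3′ UTR for perfect matches to nucleotides 2–8 of a miRNA. It covers only the first criterion listed above (seed complementarity) and is not the algorithm of any published prediction tool; the function name seed_sites and both sequences are toy assumptions.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_sites(mirna, utr):
    """Return 0-based UTR positions that pair perfectly with the miRNA seed.

    Both sequences are written 5'->3'. DNA-style T is treated as U.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    seed = mirna[1:8]  # nucleotides 2-8 counted from the 5' end
    # On the mRNA, the matching site is the reverse complement of the seed.
    site = "".join(COMPLEMENT[base] for base in reversed(seed))
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]


# Toy sequences for illustration only.
toy_mirna = "UAGCUUAUCAGACUGAUGUUGA"
toy_utr = "CCGGAUAAGCUACC"
hits = seed_sites(toy_mirna, toy_utr)
print(hits)
```

Seed matching alone produces many false positives, which is why the prediction tools combine it with conservation, duplex free energy, and sequence context.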
1.4 miRNA Function on a Cancer Systems Level
miRNAs are able to fine-tune the protein level of thousands of
genes, either directly or indirectly, and thereby make fine-scaled
adjustments to protein output [20]. The variety and abundance of
targets offer an enormous level of combinatorial possibilities. This
high level of complexity suggests that miRNAs and their targets
form an intricate regulatory network intertwined with other cellular
networks. It is pivotal to understand how miRNAs regulate cellular
processes at the systems level, including miRNA regulation of
cellular networks, metabolic processes, protein interactions, and
gene regulatory networks. Studying different networks to assess
the influence of miRNAs on their targets will help to identify
miRNAs that have a strong influence on breast cancer development
and progression.
2 miRNAs in Cancer
Table 1
Different types of biomarkers important in clinical settings of cancer research
2.3 miRNA Function in Cancer
The first evidence implicating miRNAs in cancer came from
B-cell chronic lymphocytic leukemia (CLL), during the
search for tumor suppressors at chromosome 13q14, a region
commonly deleted in CLL patients [46]. In this study, the authors
found that miR-15a and miR-16-1 were located in this region.
Since loss of this chromosomal region was frequent in CLL, loss of
these miRNAs also occurred, raising the question of
whether miRNAs could be involved in the pathogenesis of cancer.
Later, the same group identified several miRNAs located in
regions of the genome that are frequently deleted or amplified in
different tumors [47].
Iorio et al. in 2005 described the first breast cancer miRNA
signature which could discriminate tumors from normal tissues
[48]. Subsequent studies have increased our understanding of
miRNA involvement in breast cancer, and identified aberrant
miRNA expression related to survival, metastasis, stage, prolifera-
tion, molecular subtype, TP53 mutational status, hormone recep-
tor status, and response to treatment [49–53]. The studies revealed
that changes in miRNA expression profiles can serve as phenotypic
signatures of specific types of cancer. Aberrant miRNA expression
associated with tumorigenesis can be a result of various mechan-
isms. Several studies point to transcriptional deregulation, copy
number aberrations, mutations, epigenetic alterations, and defects
in the miRNA biogenesis machinery as contributors to miRNA
deregulation in cancer [54]. Some miRNAs may be causally linked
to tumorigenesis by directly modifying tumor-suppressor or onco-
genic pathways. For example, the overexpression of miRNAs can
inhibit tumor-suppressor genes in a pathway. Conversely, reduced
miRNA expression through loss-of-function mutations could result
in increased expression of oncogenes, also contributing to cancer
development and progression (see Fig. 2).
Fig. 2 miRNAs may have oncogenic or tumor-suppressive roles in cancer. Upregulation of oncogenic miRNAs
results in increased repression of tumor-suppressor target genes. Conversely, downregulation of tumor-
suppressor miRNAs results in decreased repression and thus increased expression of target oncogenes. Both
scenarios may lead to cancer development and progression. Figure based on Lujambio and Lowe [119]
2.4 miRNAs as Cancer Biomarkers
Various studies provide evidence that miRNAs can be used as
biomarkers for different purposes [55, 56]. Deregulated expression
profiles of miRNAs have been discovered in a wide variety of human
cancers, including breast cancer [57], colorectal cancer [58],
glioma [59], lymphoma [60], and prostate cancer [61].
The survival and prognosis of a patient are highly dependent on
the stage of the tumor at the time of detection. The earlier a tumor
is detected, the better the prognosis. Thus, a major clinical
challenge in cancer is the identification of biomarkers that can
detect cancer at an early stage. miRNAs can be reliably extracted
and detected from frozen and paraffin-embedded tissues. They can
moreover be found circulating freely in the blood or bound to
circulating exosomes, and in different body fluids like urine, saliva,
and sputum [62]. The fact that miRNAs are stable in body fluids,
and that they are easily detectable through noninvasive procedures
makes miRNAs attractive biomarker candidates. For example,
miRNA signatures in plasma had strong diagnostic and prognostic
potential detecting lung cancer before disease onset, as plasma
samples were collected 1–2 years before lung cancer was detected
by CT [63]. Another recent study by Cava et al. [64] showed that
miRNA profiling improved breast cancer classification and could
differentiate patients with breast cancer as responding or not
responding to therapy, with promising results. The correct classifi-
cation of breast cancer is a fundamental factor in determining the
appropriate treatment, and it is now evident that miRNAs have the
potential to provide new diagnostic, prognostic, and predictive
[Figure 3 labels: cDNA, labeled RNA, labeled cRNA, scanning, feature extraction]
Fig. 3 miRNA and mRNA expression profiling using Agilent microarrays. RNA is labeled with a fluorescent dye
(Cyanine 3; Cy) and transferred to the microarray where the sample material hybridizes to complementary
probes during incubation. Then follows washing and scanning of the array, and finally feature extraction where
probe hybridization intensities are quantified. The protocol deviates slightly between microRNA and mRNA
analysis. For the former RNA is treated with phosphatase to remove the 3′-phosphate group (P), which is
followed by labeling. For mRNA profiling the RNA is first converted to complementary DNA (cDNA) by reverse
transcriptase (RT), and then the cDNA is further transcribed into complementary RNA (cRNA) by the use of RNA
polymerase (POL II) where labeled cytosine residues are incorporated
3.2 Functional Experiments to Validate miRNA Targets and Their Effect on Cells
Data from functional studies of miRNAs in cell lines can be
generated after identifying interesting candidates from analyses of
high-throughput data. The aim is to determine whether the candidate
miRNA is functionally involved in cancer-associated processes. This
can be done by testing the effects of silencing or overexpression of
the candidate miRNA on the viability and proliferation of cancer
cells. Knockdown of potential tumor driver miRNAs can be per-
formed using small, single-stranded anti-miRs which are miRNA
inhibitors that bind to and inhibit endogenous miRNAs [65]. Con-
versely, the effect of candidate tumor-suppressor miRNAs can be
assessed by overexpression, for example by adding miRNA mimics
and measuring the effect on cell viability. In order to effectively
study the functional role of miRNAs in cell lines, high-throughput
screens can be performed. Leivonen et al. used libraries of either
miRNA mimics or anti-miRNAs which were tested simultaneously
in large scale and used to measure the effect of miRNA overexpres-
sion or knockdown, respectively [66]. miRNAs can be spotted in
96- or 384-well formats, and incubated with cells from a cell line of
interest. The phenotypic end-points of such screens may measure
the effects that miRNAs have on cell viability, apoptosis, and prolif-
eration, as well as expression of marker proteins. Leivonen et al.
[66] performed a high-throughput screen to identify miRNAs that
were important for the growth of HER2-positive breast cancer
cells. They overexpressed miRNAs in HER2-positive cell lines and
assessed the effect on HER2 protein levels, proliferation (Ki67),
and apoptosis (cleaved PARP). Thirty-eight miRNAs were identi-
fied that inhibited HER2 signaling and cell growth. In another
study [53], miRNAs that were identified as differentially expressed
between high and low proliferative tumor samples (scored by
immunohistochemistry) were further functionally validated by
transfecting a library of pre-miR constructs into breast cancer cell
lines. The cells were lysed and the lysates printed on slides that were
then stained with an antibody against Ki67 to assess the effect of the
miRNAs on proliferation. Among the 123 identified differentially
expressed miRNAs, 13 showed a corresponding functional effect
on Ki67 protein levels [53].
The measurement of ATP using luciferase is one of the most
commonly used assays for assessing cell viability in high-
throughput screening applications [67]. The assay is fast and easy
to use, sensitive, and also less prone to artifacts than other viability
assay methods [67, 68]. However, the assay measures metabolically
active cells, which cannot be translated into viable cells in all con-
texts. Another method that is widely used is the MTT Tetrazolium
Reduction Assay. Yet, the MTT assay lacks sensitivity, is more time-
consuming and more prone to variation, due to multiple experi-
mental steps involved compared to the ATP assay [68]. Other
methods that are used to measure the effect of miRNAs on cell
viability or proliferation include the TUNEL assay, Trypan Blue
3.3 Databases and Tools
Different databases exist that list miRNAs, their chromosomal
location, sequence, and their putative target genes. For example,
the miRBase database contains all published miRNA sequences and
annotations [4]. The Ingenuity Pathway Analysis database (IPA,
Ingenuity Systems; www.ingenuity.com) can be used to associate
genes correlated to candidate miRNAs with pathways and for vari-
ous gene annotation purposes. The SEEK tool [69] can be used to
identify and annotate genes that are co-expressed with miRNA-
correlated genes. Different computational tools are readily available
for the analysis of miRNA target sites, such as miRanda [70–72],
TargetScan [5, 73, 74], PicTar [75–77], and DIANA-microT-CDS
[78, 79] (Table 2). These can be used to predict potential targets of
a miRNA that has been identified (for example, in cancer tissue) or,
vice versa, to identify candidate miRNAs predicted to bind to a gene
of interest.
Feedback from functional validation results has greatly
improved the performance of these in silico miRNA target predic-
tion algorithms. The miRanda software was initially designed to
predict miRNA target genes in Drosophila melanogaster
[70, 71]. The algorithm searches the 3′ UTRs for stretches of highly
complementary base pairs to identify potential binding sites [70]. A
higher score is given to sequences that are complementary to the 5′ end
of the miRNA than to the 3′ end, leading to higher prediction
scores for seed regions with a perfect or nearly perfect match.
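The idea of seed complementarity can be illustrated with a minimal Python sketch. This is a toy scan for perfect seed matches only, not the miRanda or TargetScan algorithm; the function names, the choice of miRNA nucleotides 2–8 as the seed, and the example UTR are assumptions made for illustration.

```python
# Illustrative sketch only: a toy scan for perfect seed matches.
# It is NOT the miRanda or TargetScan algorithm; names and the
# example UTR below are hypothetical.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_seed_sites(mirna, utr, seed_start=2, seed_end=8):
    """Return 0-based UTR positions where the miRNA seed pairs perfectly.

    Both sequences are given 5'->3'. The seed is assumed to be miRNA
    nucleotides seed_start..seed_end (1-based, inclusive).
    """
    seed = mirna[seed_start - 1:seed_end]
    site = reverse_complement(seed)  # sequence the UTR must contain
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)  # allow overlapping hits
    return positions


# Example input: a let-7a-like miRNA and a made-up 3' UTR fragment.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAACUACCUCAGGGCUACCUCUU"
print(find_seed_sites(mirna, utr))
```

A real predictor would additionally score imperfect pairing, weight the 5′ end more heavily than the 3′ end, and consider site conservation; none of that is attempted here.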
TargetScan is an algorithm developed by Lewis et al. [74], and
was the first miRNA target prediction tool for the human genome,
using a different search approach than miRanda. TargetScan
searches for perfect complementarity in the seed region and beyond
[74]. Complementarity outside the seed region helps to filter out
false positives more efficiently prior to prediction.
Data from conservation analysis derived from orthologous 3′
Table 2
Computational algorithms for miRNA target prediction
3.4 (Epi-)Genome–Transcriptome Analysis
DNA aberrations are a hallmark of cancer genomes [81], and the
phenotypic effects of such alterations are commonly investigated
through the integration of genomic and transcriptomic data.
Analyzing changes in DNA copy number can be used to identify
aberrant cancer genes. The correlation between copy number and
mRNA expression can be utilized to single out genes for which
DNA aberration is manifested in the altered expression of the gene.
In a similar manner, DNA copy number and methylation status can
be used together with miRNA expression to identify miRNAs
altered on the (epi-)genomic level with effects on the transcrip-
tomic level. The rationale behind such integrative approaches is
that recurrent alterations across tumor samples may indicate func-
tionality through the effect on the transcription levels of the
corresponding miRNAs or genes. Thus, RNA expression is used
as an additional layer to the genomic or epigenetic data to further
identify potential candidate genes. If a change in DNA copy num-
ber affects the expression of a miRNA, the miRNA is more likely to
be under selection in the tumor and hence might be important for
tumorigenesis.
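As a minimal sketch of the correlation-based idea described above, the following Python example computes the Pearson correlation between copy number and expression for each miRNA across samples and keeps those above a cutoff. The miRNA names, the values, and the cutoff of 0.5 are hypothetical; published analyses use larger cohorts, appropriate normalization, and multiple-testing correction.

```python
# Illustrative sketch only: per-miRNA correlation between DNA copy
# number and expression. Data, names, and threshold are hypothetical.
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so no measurable dosage effect
    return sxy / sqrt(sxx * syy)


def dosage_candidates(copy_number, expression, threshold=0.5):
    """Return {miRNA: r} for miRNAs whose expression tracks copy number."""
    hits = {}
    for name, cn_values in copy_number.items():
        r = pearson(cn_values, expression[name])
        if r >= threshold:
            hits[name] = r
    return hits


# Toy data: one value per tumor sample (same sample order in both tables).
copy_number = {
    "miR-A": [-1, 0, 0, 1, 2],
    "miR-B": [-1, 0, 0, 1, 2],
}
expression = {
    "miR-A": [4.0, 5.0, 5.2, 6.1, 7.0],  # rises with copy number
    "miR-B": [5.0, 6.0, 4.0, 5.5, 5.0],  # largely unrelated to copy number
}
print(dosage_candidates(copy_number, expression))
```

In this toy input only the hypothetical miR-A passes the cutoff, which mirrors the reasoning in the text: a miRNA whose expression follows its copy number is a stronger candidate than one whose expression does not.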
Studies integrating DNA copy number and mRNA expression
in breast cancer have revealed a clear dosage effect of gene copy
number on gene expression [82, 83], which also holds true for
miRNA expression [84]. Lahti et al. [85] divided implementations
for the integrative analysis of DNA copy number and expression
into four main categories of approaches: two-step approaches,
correlation-based approaches, regression-based approaches, and
latent variable models. In a two-step approach, tumor samples
and miRNAs/genes are first grouped based on altered copy
[Figure 4 schematic. Labels: DNA level, methylation, gene co-expression, transcription regulation, mRNA, miRNA, translation regulation, protein, protein-protein interaction. Column titles: individual measurements on various levels; individual relations; networks and pathways]
Fig. 4 Integration and analysis of multi-dimensional data. Biological components are measured across
individuals and platforms, and their relations and interactions are identified. From this, complete networks
and pathways are overlaid or built, and the emerging system is interrogated for alterations. Figure based on
McDermott et al. [89]
4.1 Methods to Study miRNA Regulation and Target Validation
miRNAs play an important role in the post-transcriptional
regulation of gene expression. To date, the number of experimentally
validated targets is low compared to the hundreds of putative
targets predicted by the different in silico prediction algorithms
[96]. The most common methods for the validation of miRNA
targets include the transfection of reporter vector constructs or
mimic miRNAs into cells, or the use of miRNA inhibitors. Those
are followed by assessing the effects on mRNA (by, e.g., qRT-PCR,
microarrays or sequencing) or protein levels (by, e.g., western blot)
of the putative miRNA targets. The challenge entailed in these
techniques lies in distinguishing direct from indirect effects
[96]. Alternatively, direct methods for the validation of miRNA
targets are based on the immunoprecipitation of the RISC complex
together with the bound miRNA-mRNA complex. The RNA isolated
by crosslinking immunoprecipitation can then be analyzed by
high-throughput sequencing (HITS-CLIP) [97]. Yet Co-IP protocols
are also prone to nonspecific binding and co-isolation of secondary
binders.
Most analyses of miRNA crosslinking to date have not included
protein data. Indeed, the majority of studies modeling the regu-
latory impact of miRNAs have been performed on joint miRNA-
mRNA expression data. While the physical interaction takes place
between miRNA and mRNA, in order to validate a true miRNA-
4.2 Dissecting the Functional Role of miRNAs in Breast Cancer
Altered miRNA expression in cancer has been extensively reported;
however, there are still many unanswered questions regarding the
role of miRNAs in cancer. miRNAs that are differentially
expressed between samples of different molecular subtypes, TP53
mutation status, and ER status have been described in breast cancer
[53]. The causes of miRNA deregulation in breast cancer have been
investigated by trying to comprehensively study the effect of DNA
methylation and copy number aberrations of miRNA loci and
couple those to miRNA expression [84]. Identifying the various
mechanisms underlying perturbation of miRNA levels will help us
to understand more about the role of miRNAs in tumor develop-
ment and also about miRNA biology in general.
Dissecting the functional role of miRNAs is a challenging task due
to several aspects. miRNA families have likely arisen due to gene
duplication events [99], and members of the same miRNA family
have a high degree of similarity in sequence. In some cases, members
of a miRNA family are also encoded in the same polycistronic tran-
script [100]. Sequence similarities suggest that they may target the
same genes and thus have potentially overlapping functions. From an
evolutionary perspective, the mRNA 3′-UTR, where miRNA targeting
most often occurs, is not constrained by coding needs and thus
has the potential to be subject to selection so that beneficial miRNA-
mRNA target interactions may evolve [101]. Moreover, miRNAs
originating from the same polycistronic transcript or encoded in
close proximity have a high chance of being co-expressed. Hence,
untangling the role of individual miRNAs is complicated.
4.3 miRNAs as Clinical Biomarkers for Diagnostic and Predictive Purposes in Breast Cancer
The field of miRNA biology is rather new, considering that the first
miRNA was discovered in 1993, and there are still new miRNAs
being identified today. During the past 10 years, miRNA research
has advanced rapidly, and has produced new knowledge about the
molecular basis of cancer, tools for molecular classification, and new
markers with diagnostic and prognostic relevance [62]. miRNAs
are considered suitable biomarkers for early cancer detection
because they are present and stable in human serum and plasma
[108]. miRNA alterations during breast cancer progression from
DCIS to invasive cancer have recently been identified within the
intrinsic subtypes, luminal A, luminal B, HER2-enriched, and
basal-like [109, 110]. For immunohistochemical-based subtypes,
no miRNAs are differentially expressed between DCIS and the
luminal subtypes. Six miRNAs were downregulated in ER/HER2+
invasive samples compared to DCIS, of which five belong
to the miR-30 family, whereas miR-139-5p was downregulated in
both ER subtypes, while miR-887-3p was downregulated in
triple-negative breast cancer only [109]. This study found that
subtype stratification based on molecular signatures resulted in
more correct classification than stratification based on ER, PR,
and HER2 alone, indicating a better representation of the intrinsic
biology of the samples.
Although the focus has previously been on identifying molecu-
lar differences between cancerous and normal tissue, we often tend
to forget that abnormal cell growth also occurs at benign stages. As
discussed earlier in this chapter, previous studies have shown that
certain types of benign tumors can increase the risk of breast cancer
[22, 23, 25–28]. As the use of mammography has increased, the
identification of benign breast disease has become more common.
Thus, having accurate risk estimates for women who receive this
diagnosis is vital. Moreover, comparing benign and malignant
tumors, Tahiri et al. [111] found that deregulation of known
cancer-related miRNAs is also evident in fibroadenomas and
fibroadenomatosis, which are considered benign lesions of the
breast. These cancer-related miRNAs included miR-21, members of
the let-7 family, and other miRNAs well known to be involved in
malignant transformation [111]. The
level of deregulation in benign tumors was less pronounced than
that observed in malignant tumors. Nevertheless, the identification
of tumor-associated miRNAs in benign tumors hinted that similar
processes are in place already at early stages of tumor formation.
The identification of miRNAs that can be assigned to either benign
or malignant groups of tumor tissue would be important for diag-
nostic purposes, but these results need to be further strengthened
by independent confirmations.
When identifying a signature of miRNAs through expression
arrays, there are different points to take into consideration. First,
there is a reported lack of consistency between different studies that
Acknowledgments
Parts of this review have been part of two doctoral theses from the
University of Oslo, Norway, under the supervision of V.N.K.: one
of M.R.A., fellow of the Research Council of Norway, and one of A.
T., fellow of the South-Eastern Norway Regional Health Authority.
Both are at present postdoctoral fellows of the South-Eastern Nor-
way Regional Health Authority.
References
1. Crick F (1970) Central dogma of molecular biology. Nature 227(5258):561
2. Lee RC, Feinbaum RL, Ambros V (1993) The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75(5):843–854
3. Reinhart BJ, Slack FJ, Basson M, Pasquinelli AE, Bettinger JC, Rougvie AE, Horvitz HR, Ruvkun G (2000) The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature 403(6772):901–906
4. Kozomara A, Griffiths-Jones S (2011) miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res 39(Database Issue):D152–D157
5. Friedman RC, Farh KK-H, Burge CB, Bartel DP (2009) Most mammalian mRNAs are conserved targets of microRNAs. Genome Res 19(1):92–105
6. Lee Y, Ahn C, Han J, Choi H, Kim J, Yim J, Lee J, Provost P, Radmark O, Kim S et al (2003) The nuclear RNase III Drosha initiates microRNA processing. Nature 425(6956):415–419
7. Ha M, Kim VN (2014) Regulation of microRNA biogenesis. Nat Rev Mol Cell Biol 15(8):509–524
8. Kolb FA, Zhang H, Jaronczyk K, Tahbaz N, Hobman TC, Filipowicz W (2005) Human dicer: purification, properties, and interaction with PAZ PIWI domain proteins. Methods Enzymol 392:316–336
9. Forman JJ, Legesse-Miller A, Coller HA (2008) A search for conserved sequences in coding regions reveals that the let-7 microRNA targets Dicer within its coding sequence. Proc Natl Acad Sci U S A 105(39):14879–14884
10. Lytle JR, Yario TA, Steitz JA (2007) Target mRNAs are repressed as efficiently by microRNA-binding sites in the 5′ UTR as in the 3′ UTR. Proc Natl Acad Sci U S A 104(23):9667–9672
11. Dennis C (2002) The brave new world of RNA. Nature 418(6894):122–124
12. Sullivan RP, Leong JW, Fehniger TA (2013) MicroRNA regulation of natural killer cells. Front Immunol 4:44
13. Boyer LA, Lee TI, Cole MF, Johnstone SE, Levine SS, Zucker JR, Guenther MG, Kumar RM, Murray HL, Jenner RG et al (2005) Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 122(6):947–956
14. O’Donnell KA, Wentzel EA, Zeller KI, Dang CV, Mendell JT (2005) c-Myc-regulated microRNAs modulate E2F1 expression. Nature 435(7043):839–843
15. Marson A, Levine SS, Cole MF, Frampton GM, Brambrink T, Johnstone S, Guenther MG, Johnston WK, Wernig M, Newman J et al (2008) Connecting microRNA genes to the core transcriptional regulatory circuitry of embryonic stem cells. Cell 134(3):521–533
16. Mattick JS, Makunin IV (2006) Non-coding RNA. Hum Mol Genet 15:R17–R29
17. Czech B, Hannon GJ (2011) Small RNA sorting: matchmaking for Argonautes. Nat Rev Genet 12(1):19–31
18. Martin G, Schouest K, Kovvuru P, Spillane C (2007) Prediction and validation of microRNA targets in animal genomes. J Biosci 32(6):1049–1052
19. Thomas M, Lieberman J, Lal A (2010) Desperately seeking microRNA targets. Nat Struct Mol Biol 17(10):1169–1174
20. Baek D, Villen J, Shin C, Camargo FD, Gygi SP, Bartel DP (2008) The impact of microRNAs on protein output. Nature 455(7209):64–71
21. Jemal A, Bray F, Center MM, Ferlay J, Ward E, Forman D (2011) Global cancer statistics. CA Cancer J Clin 61(2):69–90
22. Dupont WD, Page DL (1985) Risk factors for breast cancer in women with proliferative breast disease. N Engl J Med 312(3):146–151
23. Dupont WD, Page DL, Parl FF, Vnencak-Jones CL, Plummer WD Jr, Rados MS, Schuyler PA (1994) Long-term risk of breast cancer in women with fibroadenoma. N Engl J Med 331(1):10–15
24. McPherson K, Steel CM, Dixon JM (2000) ABC of breast diseases. Breast cancer-epidemiology, risk factors, and genetics. BMJ 321(7261):624–628
25. Worsham MJ, Raju U, Lu M, Kapke A, Botttrell A, Cheng J, Shah V, Savera A, Wolman SR (2009) Risk factors for breast cancer from benign breast disease in a diverse population. Breast Cancer Res Treat 118(1):1–7
26. Fitzgibbons PL, Henson DE, Hutter RV (1998) Benign breast changes and the risk for subsequent breast cancer: an update of the 1985 consensus statement. Cancer Committee of the College of American Pathologists. Arch Pathol Lab Med 122(12):1053–1055
27. McDivitt RW, Stevens JA, Lee NC, Wingo PA, Rubin GL, Gersell D (1992) Histologic types of benign breast disease and the risk for breast cancer. The Cancer and Steroid Hormone Study Group. Cancer 69(6):1408–1414
28. Cole P, Mark Elwood J, Kaplan SD (1978) Incidence rates and risk factors of benign breast neoplasms. Am J Epidemiol 108(2):112–120
29. Sgroi DC (2010) Preinvasive breast cancer. Annu Rev Pathol 5:193–221
30. Johnson K, Sarma D, Hwang ES (2015) Lobular breast cancer series: imaging. Breast Cancer Res 17:94
31. Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, Pollack JR, Ross DT, Johnsen H, Akslen LA et al (2000) Molecular portraits of human breast tumours. Nature 406(6797):747–752
32. Sorlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, Hastie T, Eisen MB, van de Rijn M, Jeffrey SS et al (2001) Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci U S A 98(19):10869–10874
33. Gown AM (2008) Current issues in ER and HER2 testing by IHC in breast cancer. Mod Pathol 21(Suppl 2):S8–S15
34. de Azambuja E, Cardoso F, de Castro G, Colozza M, Mano MS, Durbecq V, Sotiriou C, Larsimont D, Piccart-Gebhart
MJ, Paesmans M (2007) Ki-67 as prognostic marker in early breast cancer: a meta-analysis of published studies involving 12 155 patients. Br J Cancer 96(10):1504–1513
35. van’t Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT et al (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415(6871):530–536
36. Enerly E, Steinfeld I, Kleivi K, Aure MR, Leivonen SK, Johnsen H, Kallioniemi O, Kristensen VN, Yakhini Z, Borresen-Dale AL (2010) Molecular characterization of breast cancer subtypes derived from joint analysis of high throughput miRNA and mRNA data. EJC Suppl 8(5):164
37. Sotiriou C, Neo SY, McShane LM, Korn EL, Long PM, Jazaeri A, Martiat P, Fox SB, Harris AL, Liu ET (2003) Breast cancer classification and prognosis based on gene expression profiles from a population-based study. Proc Natl Acad Sci U S A 100(18):10393–10398
38. Inic Z, Zegarac M, Inic M, Markovic I, Kozomara Z, Djurisic I, Inic I, Pupic G, Jancic S (2014) Difference between luminal A and luminal B subtypes according to Ki-67, tumor size, and progesterone receptor negativity providing prognostic information. Clin Med Insights Oncol 8:107–111
39. Subik K, Lee JF, Baxter L, Strzepek T, Costello D, Crowley P, Xing L, Hung MC, Bonfiglio T, Hicks DG et al (2010) The expression patterns of ER, PR, HER2, CK5/6, EGFR, Ki-67 and AR by immunohistochemical analysis in breast cancer cell lines. Breast Cancer (Auckl) 4:35–41
40. Prat A, Adamo B, Cheang MC, Anders CK, Carey LA, Perou CM (2013) Molecular characterization of basal-like and non-basal-like triple-negative breast cancer. Oncologist 18(2):123–133
41. Atkinson AJ, Colburn WA, DeGruttola VG, DeMets DL, Downing GJ, Hoth DF, Oates JA, Peck CC, Schooley RT, Spilker BA et al (2001) Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clin Pharmacol Therap 69(3):89–95
42. Sobin LH (2003) TNM: evolution and relation to other prognostic factors. Semin Surg Oncol 21(1):3–7
43. Karve TM, Cheema AK (2011) Small changes huge impact: the role of protein posttranslational modifications in cellular homeostasis and disease. J Amino Acids 2011:207691
44. Sharma S, Kelly TK, Jones PA (2010) Epigenetics in cancer. Carcinogenesis 31(1):27–36
45. Wu W, Zhao S (2013) Metabolic changes in cancer: beyond the Warburg effect. Acta Biochim Biophys Sin Shanghai 45(1):18–26
46. Calin GA, Dumitru CD, Shimizu M, Bichi R, Zupo S, Noch E, Aldler H, Rattan S, Keating M, Rai K (2002) Frequent deletions and down-regulation of micro-RNA genes miR15 and miR16 at 13q14 in chronic lymphocytic leukemia. Proc Natl Acad Sci U S A 99(24):15524–15529
47. Calin GA, Sevignani C, Dan Dumitru C, Hyslop T, Noch E, Yendamuri S, Shimizu M, Rattan S, Bullrich F, Negrini M et al (2004) Human microRNA genes are frequently located at fragile sites and genomic regions involved in cancers. Proc Natl Acad Sci U S A 101(9):2999–3004
48. Iorio MV, Ferracin M, Liu C-G, Veronese A, Spizzo R, Sabbioni S, Magri E, Pedriali M, Fabbri M, Campiglio M et al (2005) MicroRNA gene expression deregulation in human breast cancer. Cancer Res 65(16):7065–7070
49. Tavazoie SF, Alarcon C, Oskarsson T, Padua D, Wang Q, Bos PD, Gerald WL, Massague J (2008) Endogenous human microRNAs that suppress breast cancer metastasis. Nature 451(7175):147–152
50. Yan L-X, Huang X-F, Shao Q, Huang MAY, Deng L, Wu Q-L, Zeng Y-X, Shao J-Y (2008) MicroRNA miR-21 overexpression in human breast cancer is associated with advanced clinical stage, lymph node metastasis and patient poor prognosis. RNA 14(11):2348–2360
51. Castellano L, Giamas G, Jacob J, Coombes RC, Lucchesi W, Thiruchelvam P, Barton G, Jiao LR, Wait R, Waxman J et al (2009) The estrogen receptor-α-induced microRNA signature regulates itself and its transcriptional response. Proc Natl Acad Sci U S A 106(37):15732–15737
52. Cittelly D, Das P, Spoelstra N, Edgerton S, Richer J, Thor A, Jones F (2010) Downregulation of miR-342 is associated with tamoxifen resistant breast tumors. Mol Cancer 9(1):317
53. Enerly E, Steinfeld I, Kleivi K, Leivonen S-K, Aure MR, Russnes HG, Rønneberg JA, Johnsen H, Navon R, Rødland E et al (2011) miRNA-mRNA integrated analysis reveals roles for miRNAs in primary breast tumors. PLoS One 6(2):e16915
54. Deng S, Calin GA, Croce CM, Coukos G, Zhang L (2008) Mechanisms of microRNA deregulation in human cancer. Cell Cycle 7(17):2643–2646
55. Bertoli G, Cava C, Castiglioni I (2015) MicroRNAs: new biomarkers for diagnosis,
prognosis, therapy prediction and therapeutic tools for breast cancer. Theranostics 5(10):1122–1143
56. Nana-Sinkam SP, Croce CM (2013) Clinical applications for microRNAs in cancer. Clin Pharmacol Ther 93(1):98–104
57. Ouyang M, Li Y, Ye S, Ma J, Lu L, Lv W, Chang G, Li X, Li Q, Wang S et al (2014) MicroRNA profiling implies new markers of chemoresistance of triple-negative breast cancer. PLoS One 9(5):e96228
58. Dong Y, Wu WK, Wu CW, Sung JJ, Yu J, Ng SS (2011) MicroRNA dysregulation in colorectal cancer: a clinical perspective. Br J Cancer 104(6):893–898
59. Tumilson CA, Lea RW, Alder JE, Shaw L (2014) Circulating microRNA biomarkers for glioma and predicting response to therapy. Mol Neurobiol 50(2):545–558
60. Mazan-Mamczarz K, Gartenhaus RB (2013) Role of microRNA deregulation in the pathogenesis of diffuse large B-cell lymphoma (DLBCL). Leuk Res 37(11):1420–1428
61. Maugeri-Sacca M, Coppola V, Bonci D, De Maria R (2012) MicroRNAs and prostate cancer: from preclinical research to translational oncology. Cancer J 18(3):253–261
62. Iorio MV, Croce CM (2012) MicroRNA dysregulation in cancer: diagnostics, monitoring and therapeutics. A comprehensive review. EMBO Mol Med 4(3):143–159
63. Boeri M, Verri C, Conte D, Roz L, Modena P, Facchinetti F, Calabro E, Croce CM, Pastorino U, Sozzi G (2011) MicroRNA signatures in tissues and plasma predict development and prognosis of computed tomography detected lung cancer. Proc Natl Acad Sci U S A 108(9):3713–3718
64. Cava C, Bertoli G, Ripamonti M, Mauri G, Zoppis I, Della Rosa PA, Gilardi MC, Castiglioni I (2014) Integration of mRNA expression profile, copy number alterations, and microRNA expression levels in breast cancer to improve grade definition. PLoS One 9(5):e97681
65. Weiler J, Hunziker J, Hall J (2006) Anti-miRNA oligonucleotides (AMOs): ammunition to target miRNAs implicated in human disease? Gene Ther 13(6):496–502
66. Leivonen SK, Sahlberg KK, Makela R, Due EU, Kallioniemi O, Borresen-Dale AL, Perala M (2014) High-throughput screens identify microRNAs essential for HER2 positive breast cancer cell growth. Mol Oncol 8(1):93–104
67. Kepp O, Galluzzi L, Lipinski M, Yuan J, Kroemer G (2011) Cell death assays for drug discovery. Nat Rev Drug Discov 10(3):221–237
68. Riss TL, Moravec RA, Niles AL, Duellman S, Benink HA, Worzella TJ, Minor L (2004) Cell viability assays. In: Sittampalam GS, Coussens NP, Brimacombe K, Grossman A, Arkin M, Auld D, Austin C, Bejcek B, Glicksman M, Inglese J et al (eds) Assay guidance manual. Eli Lilly & Company, Bethesda, MD
69. Zhu Q, Wong AK, Krishnan A, Aure MR, Tadych A, Zhang R, Corney DC, Greene CS, Bongo LA, Kristensen VN et al (2015) Targeted exploration and analysis of large cross-platform human transcriptomic compendia. Nat Methods 12(3):211–214
70. Enright A, John B, Gaul U, Tuschl T, Sander C, Marks D (2003) MicroRNA targets in Drosophila. Genome Biol 5(1):R1
71. John B, Enright AJ, Aravin A, Tuschl T, Sander C, Marks DS (2004) Human microRNA targets. PLoS Biol 2(11):e363
72. Betel D, Wilson M, Gabow A, Marks DS, Sander C (2008) The microRNA.org resource: targets and expression. Nucleic Acids Res 36(Database Issue):D149–D153
73. Lewis BP, Burge CB, Bartel DP (2005) Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120(1):15–20
74. Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP, Burge CB (2003) Prediction of mammalian microRNA targets. Cell 115(7):787–798
75. Krek A, Grun D, Poy MN, Wolf R, Rosenberg L, Epstein EJ, MacMenamin P, da Piedade I, Gunsalus KC, Stoffel M et al (2005) Combinatorial microRNA target predictions. Nat Genet 37(5):495–500
76. Grun D, Wang YL, Langenberger D, Gunsalus KC, Rajewsky N (2005) microRNA target predictions across seven Drosophila species and comparison to mammalian targets. PLoS Comput Biol 1(1):e13
77. Lall S, Grun D, Krek A, Chen K, Wang YL, Dewey CN, Sood P, Colombo T, Bray N, Macmenamin P et al (2006) A genome-wide map of conserved microRNA targets in C. elegans. Curr Biol 16(5):460–471
78. Maragkakis M, Alexiou P, Papadopoulos GL, Reczko M, Dalamagas T, Giannopoulos G, Goumas G, Koukis E, Kourtis K, Simossis VA et al (2009) Accurate microRNA target
prediction correlates with protein repression levels. BMC Bioinformatics 10:295
79. Paraskevopoulou MD, Georgakilas G, Kostoulas N, Vlachos IS, Vergoulis T, Reczko M, Filippidis C, Dalamagas T, Hatzigeorgiou AG (2013) DIANA-microT web server v5.0: service integration into miRNA functional analysis workflows. Nucleic Acids Res 41(W1):W169–W173
80. Ekimler S, Sahin K (2014) Computational methods for microRNA target prediction. Genes (Basel) 5(3):671–683
81. Hanahan D, Weinberg RA (2011) Hallmarks of cancer: the next generation. Cell 144(5):646–674
82. Hyman E, Kauraniemi P, Hautaniemi S, Wolf M, Mousses S, Rozenblum E, Ringnér M, Sauter G, Monni O, Elkahloun A et al (2002) Impact of DNA amplification on gene expression patterns in breast cancer. Cancer Res 62(21):6240–6245
83. Bergamaschi A, Kim YH, Wang P, Sørlie T, Hernandez-Boussard T, Lonning PE, Tibshirani R, Børresen-Dale A-L, Pollack JR (2006) Distinct patterns of DNA copy number alteration are associated with different clinicopathological features and gene-expression subtypes of breast cancer. Genes Chromosom Cancer 45(11):1033–1040
84. Aure MR, Leivonen SK, Fleischer T, Zhu Q, Overgaard J, Alsner J, Tramm T, Louhimo R, Alnæs GI, Perälä M, Busato F, Touleimat N, Tost J, Børresen-Dale AL, Hautaniemi S, Troyanskaya OG, Lingjærde OC, Sahlberg KK, Kristensen VN (2013) Individual and combined effects of DNA methylation and copy number alterations on miRNA expression in breast tumors. Genome Biol 14(11):R126
85. Lahti L, Schäfer M, Klein H-U, Bicciato S, Dugas M (2012) Cancer gene prioritization by integrative analysis of mRNA expression and DNA copy number data: a comparative review. Brief Bioinform 14(1):27–35
86. Louhimo R, Lepikhova T, Monni O, Hautaniemi S (2012) Comparative analysis of algorithms for integration of copy number and expression data. Nat Methods 9(4):351–355
87. Huang N, Shah PK, Li C (2011) Lessons from a decade of integrating cancer copy number alterations with gene expression profiles. Brief Bioinform 13(3):305–316
88. Zhang S, Liu C-C, Li W, Shen H, Laird PW, Zhou XJ (2012) Discovery of multi-dimensional modules by integrative analysis of cancer genomic data. Nucleic Acids Res 40(19):9379–9391
89. McDermott JE, Costa M, Janszen D, Singhal M, Tilton SC (2010) Separating the drivers from the driven: integrative network and pathway approaches aid identification of disease biomarkers from high-throughput data. Dis Markers 28(4):253–266
90. Baudot A, Real FX, Izarzugaza JMG, Valencia A (2009) From cancer genomes to cancer models: bridging the gaps. EMBO Rep 10(4):359–366
91. Chin L, Hahn WC, Getz G, Meyerson M (2011) Making sense of cancer genomic data. Genes Dev 25(6):534–555
92. Dvinge H, Git A, Graf S, Salmon-Divon M, Curtis C, Sottoriva A, Zhao Y, Hirst M, Armisen J, Miska EA et al (2013) The shaping and functional consequences of the microRNA landscape in breast cancer. Nature 497(7449):378–382
93. Curtis C, Shah SP, Chin SF, Turashvili G, Rueda OM, Dunning MJ, Speed D, Lynch AG, Samarajiwa S, Yuan Y et al (2012) The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486(7403):346–352
94. Hernández Patiño CE, Jaime-Muñoz G, Resendis-Antonio O (2013) Systems biology of cancer: moving toward the integrative study of the metabolic alterations in cancer cells. Front Physiol 3:481
95. The Cancer Genome Atlas Network (2012) Comprehensive molecular portraits of human breast tumours. Nature 490(7418):61–70
96. Muniategui A, Pey J, Planes FJ, Rubio A (2012) Joint analysis of miRNA and mRNA expression data. Brief Bioinform 14(3):263–278
97. Chi SW, Zang JB, Mele A, Darnell RB (2009) Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature 460(7254):479–486
98. Aure MR, Jernstrom S, Krohn M, Vollan H, Due E, Rodland E, Karesen R, Ram P, Lu Y, Mills G et al (2015) Integrated analysis reveals microRNA networks coordinately expressed with key proteins in breast cancer. Genome Med 7(1):21
99. Hertel J, Lindemeyer M, Missal K, Fried C, Tanzer A, Flamm C, Hofacker I, Stadler P, Students of Bioinformatics Computer Labs 2004 and 2005 (2006) The expansion of the metazoan microRNA repertoire. BMC Genomics 7:25
100. Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ (2008) miRBase: tools for microRNA genomics. Nucleic Acids Res 36(Database issue):D154–D158
101. Inui M, Martello G, Piccolo S (2010) MicroRNA control of signal transduction. Nat Rev Mol Cell Biol 11(4):252–263
102. Bentwich I (2005) Prediction and validation of microRNAs and their targets. FEBS Lett 579(26):5904–5910
103. Patnaik SK, Dahlgaard J, Mazin W, Kannisto E, Jensen T, Knudsen S, Yendamuri S (2012) Expression of microRNAs in the NCI-60 cancer cell-lines. PLoS One 7(11):e49918
104. Lu J, Getz G, Miska EA, Alvarez-Saavedra E, Lamb J, Peck D, Sweet-Cordero A, Ebert BL, Mak RH, Ferrando AA et al (2005) MicroRNA expression profiles classify human cancers. Nature 435(7043):834–838
105. Selbach M, Schwanhausser B, Thierfelder N, Fang Z, Khanin R, Rajewsky N (2008) Widespread changes in protein synthesis induced by microRNAs. Nature 455(7209):58–63
106. Creixell P, Schoof EM, Erler JT, Linding R (2012) Navigating cancer network attractors for tumor-specific therapy. Nat Biotechnol 30(9):842–848
107. Avraham R, Yarden Y (2012) Regulation of signalling by microRNAs. Biochem Soc Trans 40(1):26–30
108. Mitchell PS, Parkin RK, Kroh EM, Fritz BR, Wyman SK, Pogosova-Agadjanyan EL, Peterson A, Noteboom J, O’Briant KC, Allen A et al (2008) Circulating microRNAs as stable blood-based markers for cancer detection. Proc Natl Acad Sci U S A 105(30):10513–10518
109. Haakensen VD, Nygaard V, Greger L, Aure MR, Fromm B, Bukholm IR, Luders T, Chin SF, Git A, Caldas C et al (2016) Subtype-specific micro-RNA expression signatures in breast cancer progression. Int J Cancer 139(5):1117–1128
110. Lesurf R, Aure MR, Mork HH, Vitelli V, Oslo Breast Cancer Research Consortium, Lundgren S, Borresen-Dale AL, Kristensen V, Warnberg F, Hallett M et al (2016) Molecular features of subtype-specific progression from ductal carcinoma in situ to invasive breast cancer. Cell Rep 16(4):1166–1179
111. Tahiri A, Leivonen SK, Luders T, Steinfeld I, Aure MR, Geisler J, Makela R, Nord S, Riis MLH, Yakhini Z et al (2014) Deregulation of cancer-related miRNAs is a common event in both benign and malignant human breast tumors. Carcinogenesis 35(1):76–85
112. Callari M, Dugo M, Musella V, Marchesi E, Chiorino G, Grand MM, Pierotti MA, Daidone MG, Canevari S, De Cecco L (2012) Comparison of microarray platforms for measuring differential microRNA expression in paired normal/cancer colon tissues. PLoS One 7(9):e45105
113. Backes C, Sedaghat-Hamedani F, Frese K, Hart M, Ludwig N, Meder B, Meese E, Keller A (2016) Bias in high-throughput analysis of miRNAs and implications for biomarker studies. Anal Chem 88(4):2088–2095
114. Meiri E, Mueller WC, Rosenwald S, Zepeniuk M, Klinke E, Edmonston TB, Werner M, Lass U, Barshack I, Feinmesser M et al (2012) A second-generation microRNA-based assay for diagnosing tumor tissue origin. Oncologist 17(6):801–812
115. Garzon R, Marcucci G, Croce CM (2010) Targeting microRNAs in cancer: rationale, strategies and challenges. Nat Rev Drug Discov 9(10):775–789
116. Thorsen SB, Obad S, Jensen NF, Stenvang J, Kauppinen S (2012) The therapeutic potential of microRNAs in cancer. Cancer J 18(3):275–284
117. Aagaard L, Rossi JJ (2007) RNAi therapeutics: principles, prospects and challenges. Adv Drug Deliv Rev 59(2–3):75–86
118. Yan LX, Wu QN, Zhang Y, Li YY, Liao DZ, Hou JH, Fu J, Zeng MS, Yun JP, Wu QL et al (2011) Knockdown of miR-21 in human breast cancer cell lines inhibits proliferation, in vitro migration and in vivo tumor growth. Breast Cancer Res 13(1):R2
119. Lujambio A, Lowe SW (2012) The microcosmos of cancer. Nature 482(7385):347–355
Chapter 5
Abstract
Loss-of-function screening using RNA interference or CRISPR approaches can be used to identify genes
that specific tumor cell lines depend upon for survival. By integrating the results from screens in multiple
cell lines with molecular profiling data, it is possible to associate the dependence upon specific genes with
particular molecular features (e.g., the mutation of a cancer driver gene, or transcriptional or proteomic
signature). Here, using a panel of kinome-wide siRNA screens in osteosarcoma cell lines as an example, we
describe a computational protocol for analyzing loss-of-function screens to identify genetic dependencies
associated with particular molecular features. We describe the steps required to process the siRNA screen
data, integrate the results with genotypic information to identify genetic dependencies, and finally
integrate protein-protein interaction data to interpret these dependencies.
1 Introduction
Louise von Stechow (ed.), Cancer Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 1711,
https://doi.org/10.1007/978-1-4939-7493-1_5, © The Author (s) 2018
84 James Campbell et al.
2 Materials
2.2 Input Files
1. Plate files (txt) contain the output from a loss-of-function
screen. Each comprises three tab-separated columns of
data containing the plate number (numeric), the well position
(e.g., B07), and the response value for that well (e.g., luminescence
readout). See the cellHTS2 documentation for further
information.
2. Plate file list. This file contains three tab-separated columns
with a header row listing “Filename,” “Plate,” and “Replicate.”
Filenames correspond to each plate file. The plate column
defines which plate in the plate configuration file the data
correspond to. The replicate column defines which replicate
a plate represents. See the cellHTS2 documentation for further
information.
3. Plate configuration file. The first line defines the number of
wells in each plate (e.g., “Wells: 384”). The second line defines
the number of plates in the library (e.g., “Plates: 3”). The third
line is a header associated with the subsequent columns (e.g.,
[Fig. 1 schematic. (A) Plate-arrayed siRNA screens are processed using CellHTS2/R (Step 3.1) to give a Z-score table (rows: Kinase 1 to Kinase N; columns: cell lines 1 to N). (B) The Z-score table is combined with a mutations table (rows such as RB1 and CDKN2A; one 0/1 entry per cell line) in an association analysis using R (Step 3.2), giving an associations table. (C) The associations table (columns: Marker, Target, P-value) is combined with a protein-protein interaction network to annotate dependencies using Python (Step 3.3).]
Fig. 1 Analyzing siRNA screens in Tumor Cell Line Panels. (a) Luminescence values derived from pooled siRNA
screens are converted into Z-scores using CellHTS2 and custom R scripts. (b) Z-score profiles for each cell line
are integrated with mutational profiles for the same set of cell lines using R. Custom R scripts are used to
identify associations between the presence of particular mutations (e.g., in the RB1 gene) and increased
sensitivity to siRNAs targeting specific genes (e.g., DYRK1A). (c) The associations table is
integrated with a data file describing known protein-protein interactions using Python. This results in a table of
annotated dependencies, indicating whether a given association occurs between a pair of genes whose
protein products are known to physically interact
Analyzing siRNA Screens in Tumor Cell Line Panels 87
3 Methods
3.1 Processing siRNA Screen Data Using CellHTS2
Typically, siRNA screens are conducted in multiwell tissue culture
plates. The process of transfecting a cancer cell line with siRNAs is
optimized prior to screening and, once optimal conditions have
been selected (described in [17]), cells are dispensed into multiwell
plates containing growth media, transfection reagents, and siRNAs.
The data in the example provided represent a screen of a single
osteosarcoma tumor cell line using an siRNA library targeting
714 kinase and kinase-related genes. Positive and negative controls
are included on each plate, typically a non-targeting siRNA as a
negative control and an siRNA pool targeting PLK1 as a positive
control. The full experimental protocol for this screen has been
described elsewhere [4, 5]. Briefly, following siRNA transfection,
the cells were cultured for 5 days, after which a luminescence assay
measuring cellular ATP was used to estimate cell viability. A Victor
X5 plate reader was used to read luminescence values, resulting in
data files in Microsoft Excel format. Prior to the analysis in R, these
data files were converted to plain text plate files. Each plate file
contains the luminescence reading from each well in one 96- or
384-well plate. Where an siRNA library is larger than the plate
format used in the screen, several plates are required for a single
screen. Additionally, multiple replicate screens are typically con-
ducted for a given cell line and siRNA library. The organization of
plates into segments of an siRNA library and replicate screens is
described in a plate list file. A plate list file contains the file names of
the plate files, the replicate numbers, and plate numbers in a multi-
plate screen. Annotations indicating the genes targeted by siRNAs
in the library across multiple plates as well as the positions of
control wells are provided in separate plain text files. The analysis
protocol set out below uses the cellHTS2 [18] R package devel-
oped by Huber and Boutros to combine data from the plate files,
the plate list file, the plate configuration file, and the annotation file.
The luminescence data are normalized to produce Z-scores by first
log2 transforming the values and subtracting the median log lumi-
nescence value on a plate-by-plate basis. The plate-centered data are
then scaled to the median absolute deviation (MAD) value calcu-
lated across the entire siRNA library to produce Z-scores.
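The normalization described above (log2 transform, plate-wise median centering, scaling by the library-wide MAD) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the cellHTS2 implementation: the function names, the nested-dictionary layout, and the toy readings are hypothetical, and the `consistency` factor is an assumption (1.0 gives the raw MAD, whereas 1.4826 mimics the default of R's `mad()` function).

```python
import math
import statistics


def plate_center(raw_by_plate):
    """Log2-transform raw readings and subtract each plate's median."""
    centered = {}
    for plate, wells in raw_by_plate.items():
        logged = {well: math.log2(value) for well, value in wells.items()}
        plate_median = statistics.median(logged.values())
        centered[plate] = {w: v - plate_median for w, v in logged.items()}
    return centered


def robust_z_scores(raw_by_plate, consistency=1.0):
    """Scale plate-centered values by the MAD pooled over all plates.

    consistency=1.0 uses the raw MAD; 1.4826 mimics R's mad() default
    (which scaling the protocol intends is an assumption here).
    """
    centered = plate_center(raw_by_plate)
    pooled = [v for wells in centered.values() for v in wells.values()]
    library_median = statistics.median(pooled)
    mad = consistency * statistics.median(
        [abs(v - library_median) for v in pooled]
    )
    return {
        plate: {well: v / mad for well, v in wells.items()}
        for plate, wells in centered.items()
    }


# Toy luminescence readings for two plates (illustrative values only).
toy_screen = {
    1: {"A01": 1000.0, "A02": 2000.0, "A03": 4000.0},
    2: {"A01": 500.0, "A02": 1000.0, "A03": 8000.0},
}
z = robust_z_scores(toy_screen)
```

Because the median and MAD are used instead of the mean and standard deviation, a few strongly lethal siRNAs have little influence on the scaling of the remaining wells.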
An R script named “run_cellHTS.R” in the R-scripts directory
contains the following commands. The first command loads the
cellHTS2 R package that provides the functions required for the
analysis.
require(cellHTS2)
x <- readPlateList(
  filename = "platelist_p3r3.txt",
  name = "CGDsExample",
  path = "./"
)
x <- configure(
  x,
  descripFile = "screen_description.txt",
  confFile = "plateconf_384.txt",
  logFile = "Screenlog.txt",
  path = "./"
)
x <- annotate(
  x,
  geneIDFile = "kinome_library.txt",
  path = "./"
)
xn <- normalizePlates(
  x,
  scale = "multiplicative",
  log = TRUE,
  method = "median",
  varianceAdjust = "none",
  negControls = "neg",
  posControls = "pos"
)
setSettings(
  list(
    plateList = list(
      reproducibility = list(
        include = TRUE,
        map = TRUE
      ),
      intensities = list(
        include = TRUE,
        map = TRUE
      )
    ),
    screenSummary = list(
      scores = list(
        range = c(-20, 10),
        map = TRUE
      )
    )
  )
)
writeReport(
  raw = x,
  normalized = xn,
  scored = xsc,
  outdir = "./report",
  force = TRUE,
  posControls = "pos",
  negControls = "neg",
  mainScriptFile = "../R-scripts/run_cellHTS.R"
)
We can then write out the “combined” data frame to a text file.
A use case for this is to enable joining data from multiple screens
into a single file for analysis.
write.table(
  combinedz,
  "zscore.txt",
  sep = "\t",
  quote = FALSE,
  row.names = FALSE
)
cor(
  summary_info[, c(
    "normalized_r1_ch1",
    "normalized_r2_ch1",
    "normalized_r3_ch1"
  )],
  use = "pairwise.complete.obs"
)
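The `cor()` call above compares the normalized values of the three replicates pairwise, skipping wells with missing readings. As a rough standard-library Python sketch of the same idea (the function names and toy values are hypothetical, and missing values are represented by `None`):

```python
import math


def pearson(xs, ys):
    """Pearson correlation over positions where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for constant data
    return cov / math.sqrt(var_x * var_y)


def replicate_correlations(replicates):
    """Return the correlation for every pair of named replicates."""
    names = sorted(replicates)
    return {
        (a, b): pearson(replicates[a], replicates[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Toy normalized values for three replicates (illustrative only).
toy_replicates = {
    "r1": [1.0, 2.0, 3.0, 4.0],
    "r2": [2.0, 4.0, 6.0, 8.0],
    "r3": [4.0, 3.0, None, 1.0],
}
correlations = replicate_correlations(toy_replicates)
```

Low correlation between replicates of the same screen is usually a signal to inspect the plates before proceeding to the association analysis.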
3.2 Identification of Kinase Dependencies Associated with Driver Gene Mutation or Copy Number Alteration
We next integrate the processed results from multiple siRNA
screens with data describing the genetic alterations present in each
sample. For this tutorial we use the siRNA data from 18 osteosarcoma
tumor cell lines and a mutations file that describes the presence
or absence of genetic alterations in different members of the
Retinoblastoma (RB1) pathway. In the git repository downloaded,
there is a set of directories containing pre-formatted siRNA and
mutation datasets as well as R scripts to process the data. Open the
script R-scripts/identifying_CGDs_RB1_osteosarcoma.R and
examine its contents. The first command sets the working directory
to the top level of the git repository we cloned/downloaded earlier.
Modify the path given to the setwd() function so that it
points to the appropriate location on your local system.
setwd("~/software/identifying-genetic-dependencies")
source("./R-scripts/identifying_CGDs_library.R")
We next define the paths to the siRNA and mutation data files
used in the analysis. It is helpful to define this kind of information
near the top of scripts so that in the future the files can be changed
without having to find the commands where these values are used.
  func_muts_file = rb_pathway_func_muts_file,
  all_muts_file = rb_pathway_all_muts_file
)
We write out the results of the association tests to a text file that
can be opened in a spreadsheet application or used as input for
other programs such as the annotate_dependencies.py Python program
described in Subheading 3.3.
write.table(
  kinome_rb_mut_associations,
  "./results/kinome_rb_mut_associations.txt",
  sep = "\t",
  col.names = TRUE,
  row.names = FALSE,
  quote = FALSE
)
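The annotation step that consumes this associations file marks each marker-target association according to whether the two protein products are known to interact. The sketch below illustrates that lookup in standard-library Python; it is not the annotate_dependencies.py program itself. The column names (`marker`, `target`, `p_value`), the two-column interaction file layout, and the gene names are hypothetical placeholders.

```python
import csv
import io


def load_interactions(handle):
    """Read a two-column, tab-separated interaction list into a set of pairs."""
    pairs = set()
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2:
            # frozenset makes the pair order-independent (A-B equals B-A).
            pairs.add(frozenset((row[0], row[1])))
    return pairs


def annotate_associations(handle, interactions):
    """Add an 'interacts' flag to each row of a tab-separated associations table."""
    annotated = []
    for row in csv.DictReader(handle, delimiter="\t"):
        pair = frozenset((row["marker"], row["target"]))
        row["interacts"] = pair in interactions
        annotated.append(row)
    return annotated


# Toy inputs with placeholder gene names (not real interaction data).
ppi_text = "GENE_A\tKINASE_1\nGENE_B\tKINASE_3\n"
assoc_text = (
    "marker\ttarget\tp_value\n"
    "GENE_A\tKINASE_1\t0.005\n"
    "GENE_B\tKINASE_2\t0.600\n"
)
interactions = load_interactions(io.StringIO(ppi_text))
annotated = annotate_associations(io.StringIO(assoc_text), interactions)
```

Storing each interaction as an unordered pair avoids missing a match simply because the interaction database lists the two proteins in the opposite order.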
4 Notes
install.packages(
  "gplots",
  dependencies = TRUE
)
source("https://bioconductor.org/biocLite.R")
biocLite("cellHTS2")
References
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 Interna-
tional License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation,
distribution and reproduction in any medium or format, as long as you give appropriate credit to the
original author(s) and the source, provide a link to the Creative Commons license and indicate if changes
were made.
The images or other third party material in this chapter are included in the chapter’s Creative Commons
license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s
Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the
permitted use, you will need to obtain permission directly from the copyright holder.
Part II
Abstract
Cellular signaling, predominantly mediated by phosphorylation through protein kinases, is found to be
deregulated in most cancers. Accordingly, protein kinases have been subject to intense investigations in
cancer research, to understand their role in oncogenesis and to discover new therapeutic targets. Despite
great advances, an understanding of kinase dysfunction in cancer is far from complete.
A powerful tool to investigate phosphorylation is mass-spectrometry (MS)-based phosphoproteomics,
which enables the identification of thousands of phosphorylated peptides in a single experiment. Since every
phosphorylation event results from the activity of a protein kinase, high-coverage phosphoproteomics data
should indirectly contain comprehensive information about the activity of protein kinases.
In this chapter, we discuss the use of computational methods to predict kinase activity scores from
MS-based phosphoproteomics data. We start with a short explanation of the fundamental features of the
phosphoproteomics data acquisition process from the perspective of the computational analysis. Next, we
briefly review the existing databases with experimentally verified kinase-substrate relationships and present a
set of bioinformatic tools to discover novel kinase targets. We then introduce different methods to infer
kinase activities from phosphoproteomics data and these kinase-substrate relationships. We illustrate their
application with a detailed protocol of one of the methods, KSEA (Kinase Substrate Enrichment Analysis).
This method is implemented in Python within the framework of the open-source Kinase Activity Toolbox
(kinact), which is freely available at http://github.com/saezlab/kinact/.
1 Introduction
Louise von Stechow (ed.), Cancer Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 1711,
https://doi.org/10.1007/978-1-4939-7493-1_6, © The Author(s) 2018
104 Jakob Wirbel et al.
originating from the fusion of the BCR and ABL genes, can give
rise to and sustain chronic myeloid leukemia [3]. Accordingly, the
small molecule inhibitor of the BCR-ABL kinase, Imatinib, has
shown unprecedented therapeutic effectiveness in affected
patients [4].
Fueled by these promising clinical results, due to the essential
role for kinases in the patho-mechanism of cancer, and because
kinases are in general pharmacologically tractable [5], a range of
new kinase inhibitors has been approved or is in development for
different cancer types [6]. However, not all eligible patients
respond equally well, and in addition, cancers often develop resis-
tance to initially successful therapies. This calls for a deeper under-
standing of kinase signaling and opens up the possibility of
exploiting this knowledge therapeutically [7].
By definition, the activity of a kinase is reflected in the occur-
rence of phosphorylation events catalyzed by this kinase. Thus,
analysis of kinase activity was traditionally achieved by monitoring
the phosphorylation status of a limited number of sites known to be
targeted by the kinase of interest using immunochemical techniques
[8]. This, however, requires substantial prior knowledge and
yields a comparatively low throughput. Other approaches exist, e.g.,
protein kinase activity assays [9, 10] or attempts to measure kinase
activity with chromatographic beads functionalized with ATP or
small molecule inhibitors [11].
Mass spectrometry-based techniques to measure phosphoryla-
tion can identify thousands of phosphopeptides in a single sample
with ever-increasing coverage, throughput, and quality, nourished
by technological advances and dramatically increased performance
of MS instruments in recent years [12–14]. High-coverage phos-
phoproteomics data should indirectly contain information about
the activity of many active kinases. The high-content nature of
phosphoproteomics data, however, poses challenges for computa-
tional analysis. For example, only a small subset of the described
phosphorylation sites can be explicitly associated with functional
impact [15].
As a means to extract functional insight, methods to infer
kinase activities from phosphoproteomics data based on prior-
knowledge about kinase-substrate relationships have been put for-
ward [16–19]. The knowledge about kinase-substrate relation-
ships, compiled in databases like PhosphoSitePlus [20] or
Phospho.ELM [21], covers only a limited set of interactions. Alter-
natively, computational resources to predict kinase-substrate rela-
tionships based on kinase recognition motifs and contextual
information have been used to enrich the collections of substrates
per kinase [22, 23], but the accuracy of such kinase-substrate
relationships has not been validated experimentally for most cases.
The inferred kinase activities can in turn be used to reconstruct
Phosphoproteomics-Based Profiling of Kinase Activities 105
2.2 Data Acquisition
For most phosphoproteomics studies so far, the MS instrument is
operated in the data-dependent acquisition (DDA) mode. Therein,
precursor ions from a first survey scan are selected—typically based
on relative ion abundance—in order to generate fragmentation
spectra in a second MS run [32], for which a database search yields
the corresponding peptide sequences [33]. As a result, peptide
detection in DDA is on the one hand biased toward high abun-
dance species, but also considerably irreproducible due to stochas-
tic precursor ion selection [34]. This inherent under-sampling of
DDA usually leads to missing data points in LC-MS/MS datasets.
However, this problem may be solved to some extent by extracting
ion chromatograms of the peptides that are missing in some of the
runs that are being compared [35–38], by matching across samples
[39], or with the accurate mass and retention tag method [40].
In an alternative operation mode, selected reaction monitor-
ing/multiple reaction monitoring (SRM/MRM), the presence and
abundance of only a limited set of pre-specified peptides with
known fragmentation spectra is surveyed [41]. This targeted
approach overcomes many of the issues of shotgun methods, but
is usually not feasible for large-scale investigation of the complete
phosphoproteome.
Data-independent acquisition (DIA), e.g., SWATH-MS [42],
tries to address the shortcomings of both established data acquisition
strategies in order to combine the throughput of DDA with
the reproducibility of SRM. In DIA, fragmentation spectra are
generated for all precursor ions in a specific window of m/z ratios,
leading to a complete map of fragmentation spectra, followed by
computational extraction of quantitative information for known
spectra. For phosphoproteomics, DIA-MS has already been applied
to investigate insulin signaling [43] or histone modifications
[44]. However, the spectra generated by DIA-MS are usually
highly complex and require intricate data extraction techniques,
2.5 Pitfalls in the Analysis of MS-Based Phosphoproteomics Data
Although the available experimental methods for MS-based phosphoproteomics
data acquisition have evolved considerably over the
last years, leading to a steadily increasing number of detected
phosphosites, several limitations remain for the investigation of
signaling processes using phosphoproteomics data.
While it has been estimated that there are around 500,000
phosphorylation sites in the human proteome [62], the number of
phosphosites that can be identified in a single MS experiment usually
ranges from around 10,000 to 40,000 [63]. Therefore, the sampled
phosphoproteomic picture is incomplete. It has to be taken into
account, though, that not all possible phosphorylation sites are
expected to be modified at the same time. This is caused by
context-dependent regulation of phosphosites. For example, some
phosphosites are controlled differentially at different cell cycle stages,
while others only change under specific external stimulation such as
growth factors or other effector molecules [64, 65]. The hope is
therefore that a significantly larger portion of phosphosites could be
mapped with improving technology and by increasing the diversity
of biologically relevant conditions analyzed. So far though, in differ-
ent MS runs or replicates, usually a distinct set of phosphosites is
detected, as the selection of precursor ions is stochastic. This leads to
incomplete datasets with a high number of missing data points,
challenging computational investigation of the data such as clustering
or correlation analysis. However, as discussed above, approaches
in which phosphopeptide intensities are compared across MS runs
post-acquisition reduce this problem [38].
The functional impact of a phosphorylation event is known only
in the minority of cases [15]. Indeed, it has been hypothesized that a
substantial fraction of phosphorylation sites are non-functional [66],
since phosphorylation sites tend to be poorly conserved throughout
species [67]. Although approaches to studying the function of indi-
vidual phosphorylation events have been proposed [68], it may be
that a large part of the detected phosphosites serves no function at
all. Thus, non-functional sites add a substantial amount of noise to
phosphoproteomics data and complicate the computational analysis.
The inference of kinase activity from phosphoproteomics data
that will be described in the next section aims to overcome these
limitations, by the integration of the information from many
3.1 Resources for Kinase-Substrate Relationships
As the large-scale detection of phosphorylation events using mass
spectrometry became routine, many freely available databases that
collect experimentally verified phosphosites have been set up,
including PhosphoSitePlus [20], Phospho.ELM [21], Signor
[71], or PHOSIDA [72], to name just a few. The databases differ
in size and aim; PHOSIDA for example provides a tool for the
prediction of putative phosphorylation sites and recently also added
acetylation and other posttranslational modification sites to its
scope. Phospho.ELM computes a score for the conservation of a
phosphosite. Signor is focused on interactions between proteins
participating in signal transduction. PhosphoNetworks [73] is dedicated
to kinase-substrate interactions, but the information is at
the level of proteins, not phosphosites. Arguably the most prominent
database for expert-edited and curated interactions between
kinases and individual phosphosites (that have not been derived
3.3 KAA
Another approach to link phosphoproteomics data with the activity
of kinases, termed kinase activity analysis (KAA), was presented
by Qi et al. [16].
In this study, the authors collected phosphoproteomics data
from adult mouse testis in order to investigate the process of
mammalian spermatogenesis. With the software package iGPS
[23] they predicted putative kinase-substrate relationships for the
detected phosphosites. The authors hypothesized that the number
of links for a given kinase in the predicted kinase-substrate network
can serve as proxy for the activity of this kinase in the specific cell
type. This activity was then compared to the kinase activity back-
ground which was calculated by computing the number of links in
the background kinase-substrate network based on the mouse
phosphorylation atlas by Huttlin et al. [93]. Qi and colleagues
predicted high activity of PLK kinases in adult mouse testis and
could validate this prediction experimentally.
However, there are several limitations of KAA. For one, it is
mainly based on computational predictions of kinase-substrate
relationships, which are known to be susceptible to errors
temporal activity profiles. Since the method does not provide sin-
gular activity scores for each kinase, it may be only partly applicable
to experiments in which the individual responses of kinases to
different treatments or conditions are of interest.
3.5 KSEA
Casado et al. [17] presented a method for kinase activity estimation
based on kinase-substrate sets. Using kinase-substrate relationships
derived from the databases PhosphoSitePlus and Phospho.ELM, all
phosphosites that are targeted by a given kinase can be grouped
together into a substrate set (see Fig. 1 for an outline of the work-
flow). In theory, these phosphosites should show similar values,
since they are targeted by the same kinase. However, due to the
transient and therefore inherently noisy nature of phosphorylation,
Casado and colleagues proposed integrating the information from
all phosphosites in the substrate set in order to enhance the signal-
to-noise ratio by signal averaging [95].
For KSEA, log2-transformed fold change data is needed, i.e.,
the change of the abundance of a phosphosite between the initial
and treated states, initial and later time points, or between two
different cell types. Therefore, KSEA activity scores describe the
activity of a kinase in one condition relative to another.
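As a small worked example of the input KSEA expects, the following standard-library Python sketch computes log2 fold changes from phosphosite abundances in two conditions. The function name, the dictionary layout, and the site identifiers and abundances are hypothetical; sites missing from either condition, or with non-positive abundance, are simply skipped here, which is one possible convention rather than a prescribed one.

```python
import math


def log2_fold_changes(treated, control):
    """Return log2(treated / control) for sites quantified in both conditions."""
    changes = {}
    for site, treated_value in treated.items():
        control_value = control.get(site)
        if control_value is None or control_value <= 0 or treated_value <= 0:
            continue  # the ratio is undefined without two positive abundances
        changes[site] = math.log2(treated_value / control_value)
    return changes


# Toy abundances with placeholder site identifiers (illustrative only).
control_abundance = {"PROT1_S10": 100.0, "PROT2_T45": 100.0, "PROT3_Y7": 80.0}
treated_abundance = {"PROT1_S10": 200.0, "PROT2_T45": 50.0, "PROT4_S3": 10.0}
fold_changes = log2_fold_changes(treated_abundance, control_abundance)
```

A value of 1 corresponds to a doubling and a value of -1 to a halving of the phosphosite abundance relative to the reference condition.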
The authors suggested three possible metrics (mean score,
alternative mean score, and delta score) that can be extracted from
the substrate set and serve as a proxy for kinase activity: (1) The
[Fig. 1 schematic: the log2(fold change) values of the P-sites in the substrate set of kinase X enter a statistical analysis that yields a kinase activity score.]
Fig. 1 Work-flow of methods to obtain Kinase activity scores such as KSEA. As prior knowledge, the targets of
a given kinase are extracted out of curated databases like PhosphoSitePlus. Together with the data of the
detected phosphosites, substrate sets are constructed for each kinase, from which an activity score can be
calculated
$$P_i = \frac{\sum_{j=1}^{m} e_{ji}}{\sum_{j=1}^{m} p_{ji}}$$
[Figure, panels A–D: activity score (mean and median) and log2(fold change) plotted against time [min] at 5, 10, 20, 30, and 60 min.]
4.1 Quick Start
As a quick start for practiced Python users, we can use the utility
functions from kinact to load the example dataset. The data should
be organized as a Pandas DataFrame containing the log2-transformed
fold changes, in which the columns represent different
conditions or time points and the rows represent individual phosphosites. The
p-value of the fold change is optional, but should be organized in
the same way as the data.
import kinact
data_fc, data_p_value = kinact.get_example_data()
print(data_fc.head())
>>>                 5min     10min     20min     30min     60min
>>> ID
>>> A0AVK6_S71 -0.319306 -0.484960 -0.798082 -0.856103 -0.928753
kin_sub_interactions = kinact.get_kinase_targets(sources=['all'])
4.2 Loading the Data
In the following, we walk the reader step by step through the
procedure for KSEA. First, we need to organize the data so that
the KSEA functions can interpret it.
In Python, the library Pandas [99] provides useful data structures
and powerful tools for data analysis. Since the provided script
depends on many utilities from this library, we would strongly
advise the reader to have a look at the Pandas documentation,
although it will not be crucial in order to understand the presented
protocol. The library, together with the NumPy [100] package, can
be loaded with:
import pandas as pd
import numpy as np
data_reduced = data_raw[~data_raw['Proteins'].str.contains(';')]
4.3 Loading the Kinase-Substrate Relationships
Now, we load the prior knowledge about kinase-substrate relationships.
In this example, we use the information provided in the
PhosphoSitePlus database (see Note 5), which can be downloaded
from the website www.phosphosite.org. The organization of the
data from comparable databases, e.g., Phospho.ELM, does not
differ drastically from that of PhosphoSitePlus and therefore
requires only minor modifications. Using ‘read_csv’ again, we load
the downloaded file with:
ks_rel_human['psite'] = ks_rel_human['SUB_ACC_ID'] + '_' + ks_rel_human['SUB_MOD_RSD']
print(adj_matrix.sum(axis=0).sort_values(ascending=False).head())
>>> GENE
>>> CDK2 541
>>> CDK1 458
>>> PRKACA 440
>>> CSNK2A1 437
>>> SRC 391
>>> dtype: int64
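The adjacency matrix whose column sums are printed above records, for each kinase, which phosphosites belong to its substrate set. The same bookkeeping can be sketched without Pandas as a dictionary of sets. The column names `GENE`, `SUB_ACC_ID`, and `SUB_MOD_RSD` follow the ones used in this protocol; the function names and the records themselves are hypothetical placeholders.

```python
from collections import defaultdict


def build_substrate_sets(records):
    """Group phosphosite IDs (accession_residue) by the kinase that targets them."""
    substrate_sets = defaultdict(set)
    for record in records:
        psite = record["SUB_ACC_ID"] + "_" + record["SUB_MOD_RSD"]
        substrate_sets[record["GENE"]].add(psite)
    return dict(substrate_sets)


def substrate_counts(substrate_sets):
    """Return (kinase, number of substrates) sorted by decreasing set size."""
    return sorted(
        ((kinase, len(sites)) for kinase, sites in substrate_sets.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Toy kinase-substrate records (placeholder names, not database content).
toy_records = [
    {"GENE": "KINASE_A", "SUB_ACC_ID": "P00001", "SUB_MOD_RSD": "S10"},
    {"GENE": "KINASE_A", "SUB_ACC_ID": "P00002", "SUB_MOD_RSD": "T45"},
    {"GENE": "KINASE_A", "SUB_ACC_ID": "P00001", "SUB_MOD_RSD": "S10"},
    {"GENE": "KINASE_B", "SUB_ACC_ID": "P00001", "SUB_MOD_RSD": "S10"},
]
substrate_sets = build_substrate_sets(toy_records)
counts = substrate_counts(substrate_sets)
```

Using sets removes duplicate database entries for the same kinase-site pair, so each phosphosite contributes only once to a kinase's substrate set.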
data_condition = data_fc['60min'].copy()
p_values = data_p_value['p value_60vs0min']
kinase = 'CDK1'
substrate_set = adj_matrix[kinase].replace(
    0, np.nan).dropna().index  # (see Note 9)
detected_p_sites = data_condition.index
intersect = list(set(substrate_set).intersection(detected_p_sites))
print(len(intersect))
>>> 114
4.4.1 KSEA Using the “Mean” Method
For the “mean” method, the KSEA score is equal to the mean of
the fold changes in the substrate set, $m_S$.
The significance of the score is tested with a z-statistic using

$$z = \frac{(m_S - m_P)\,\sqrt{m}}{\delta}$$

with $m_P$ as the mean of the complete dataset, $m$ the size of the
substrate set, and $\delta$ the standard deviation of the complete dataset,
adapted from the PAGE method for gene set enrichment
[101]. The “mean” method has established itself as the preferred
method in the Cutillas lab that developed the KSEA approach.
mS = data_condition.loc[intersect].mean()
mP = data_fc.values.mean()
m = len(intersect)
delta = data_fc.values.std()
z_score = (mS - mP) * np.sqrt(m) * 1/delta
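To make the arithmetic of the “mean” method concrete, the sketch below recomputes the score, the z-statistic, and a one-sided p-value with the Python standard library only. It is an illustration, not the kinact implementation: the function name and the toy fold changes are hypothetical, the background mean and standard deviation are taken over a single condition for brevity, and the p-value is the upper-tail probability of the standard normal distribution for |z|.

```python
import math
import statistics


def ksea_mean(fold_changes, substrate_set):
    """Return (mS, z, p) for one kinase, or None if no substrate was detected."""
    detected = [site for site in substrate_set if site in fold_changes]
    if not detected:
        return None
    all_values = list(fold_changes.values())
    m = len(detected)
    mean_set = statistics.mean([fold_changes[site] for site in detected])  # mS
    mean_all = statistics.mean(all_values)                                 # mP
    delta = statistics.pstdev(all_values)  # population standard deviation
    z = (mean_set - mean_all) * math.sqrt(m) / delta
    # Upper-tail probability of the standard normal distribution for |z|.
    p = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return mean_set, z, p


# Toy log2 fold changes with placeholder site names (illustrative only).
toy_fold_changes = {
    "s1": 2.0, "s2": 2.0, "s3": -1.0, "s4": -1.0, "s5": -2.0, "s6": 0.0,
}
toy_substrates = {"s1", "s2", "s_not_detected"}
score, z_value, p_value = ksea_mean(toy_fold_changes, toy_substrates)
```

In this toy case the two detected substrates both increase twofold, so the score is 2 and the z-statistic is positive; substrates that were not detected in the experiment are ignored.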
4.4.2 KSEA Using the Alternative ‘Mean’ Method
Alternatively, only the phosphosites in the substrate set that change
significantly between conditions can be considered when computing
the mean of the fold changes in the substrate set. Therefore, we
need a cutoff that determines a significant increase or decrease,
which can be a user-supplied parameter. Here, we use a
from scipy.stats import norm

# The comparison below assumes p_values are stored as -log10(p)
cut_off = -np.log10(0.05)
set_alt = data_condition.loc[intersect].where(
    p_values.loc[intersect] > cut_off).dropna()
mS_alt = set_alt.mean()
z_score_alt = (mS_alt - mP) * np.sqrt(len(set_alt)) * 1/delta
p_value_mean_alt = norm.sf(abs(z_score_alt))
print(mS_alt, p_value_mean_alt)
>>> -0.680835732551 1.26298232031e-13
4.4.3 KSEA Using the “Delta Count” Method
In the “Delta count” method, we count the number of phosphosites
in the substrate set that are significantly increased in the
condition versus the control and subtract the number of phosphosites
that are significantly decreased.
cut_off = -np.log10(0.05)
score_delta = (
    len(data_condition.loc[intersect].where(
        (data_condition.loc[intersect] > 0) &
        (p_values.loc[intersect] > cut_off)).dropna()) -
    len(data_condition.loc[intersect].where(
        (data_condition.loc[intersect] < 0) &
        (p_values.loc[intersect] > cut_off)).dropna())
)  # (see Note 10)
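The delta count can also be written as an explicit loop, which may be easier to follow than the chained Pandas expression. The standard-library sketch below assumes, as the comparison against `-np.log10(0.05)` in the code above suggests, that significance values are stored as -log10(p); the function name and the toy values are hypothetical.

```python
import math


def delta_count(fold_changes, neg_log10_p, substrates, alpha=0.05):
    """Number of significantly increased minus significantly decreased sites."""
    cut_off = -math.log10(alpha)
    increased = 0
    decreased = 0
    for site in substrates:
        if site not in fold_changes or site not in neg_log10_p:
            continue  # site was not quantified in this experiment
        if neg_log10_p[site] <= cut_off:
            continue  # change is not significant at the chosen level
        if fold_changes[site] > 0:
            increased += 1
        elif fold_changes[site] < 0:
            decreased += 1
    return increased - decreased


# Toy data with placeholder site names (illustrative only).
toy_fc = {"a": 1.2, "b": -0.5, "c": 0.8, "d": 2.0}
toy_significance = {"a": 2.0, "b": 3.0, "c": 0.5, "d": 1.5}  # -log10(p)
score_delta_toy = delta_count(toy_fc, toy_significance, ["a", "b", "c", "d", "e"])
```

Here sites a and d count as significantly increased, site b as significantly decreased, and site c is ignored because it does not pass the cutoff, giving a delta count of 1.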
5 Closing Remarks
6 Notes
intensity_columns = []
for x in data_indexed:
    if x.startswith('Average'):
        intensity_columns.append(x)
data_intensity = data_indexed[intensity_columns]
data_log2 = np.log2(data_intensity)
Acknowledgments
References
1. Jørgensen C, Linding R (2010) Simplistic pathways or complex networks? Curr Opin Genet Dev 20:15–22
2. Hanahan D, Weinberg RA (2011) Hallmarks of cancer: the next generation. Cell 144:646–674
3. Sawyers CL (1999) Chronic myeloid leukemia. N Engl J Med 340:1330–1340
4. Sawyers CL, Hochhaus A, Feldman E et al (2002) Imatinib induces hematologic and cytogenetic responses in patients with chronic myelogenous leukemia in myeloid blast crisis: results of a phase II study. Blood 99:3530–3539
5. Zhang J, Yang PL, Gray NS (2009) Targeting cancer with small molecule kinase inhibitors. Nat Rev Cancer 9:28–39
6. Gonzalez de Castro D, Clarke PA, Al-Lazikani B et al (2012) Personalized cancer medicine: molecular diagnostics, predictive biomarkers and drug resistance. Clin Pharmacol Ther 93:252–259
7. Cutillas PR (2015) Role of phosphoproteomics in the development of personalized cancer therapies. Proteomics Clin Appl 9:383–395
8. Bertacchini J, Guida M, Accordi B et al (2014) Feedbacks and adaptive capabilities of the PI3K/Akt/mTOR axis in acute myeloid leukemia revealed by pathway selective inhibition and phosphoproteome analysis. Leukemia 28:2197–2205
9. Cutillas PR, Khwaja A, Graupera M et al (2006) Ultrasensitive and absolute quantification of the phosphoinositide 3-kinase/Akt signal transduction pathway by mass spectrometry. Proc Natl Acad Sci U S A 103:8959–8964
10. Yu Y, Anjum R, Kubota K et al (2009) A site-specific, multiplexed kinase activity assay using stable-isotope dilution and high-resolution mass spectrometry. Proc Natl Acad Sci U S A 106:11606–11611
11. McAllister FE, Niepel M, Haas W et al (2013) Mass spectrometry based method to increase throughput for kinome analyses using ATP probes. Anal Chem 85:4666–4674
12. Doll S, Burlingame AL (2015) Mass spectrometry-based detection and assignment of protein posttranslational modifications. ACS Chem Biol 10:63–71
13. Choudhary C, Mann M (2010) Decoding signalling networks by mass spectrometry-based proteomics. Nat Rev Mol Cell Biol 11:427–439
14. Sabidó E, Selevsek N, Aebersold R (2012) Mass spectrometry-based proteomics for systems biology. Curr Opin Biotechnol 23:591–597
15. Beltrao P, Albanèse V, Kenner LR et al (2012) Systematic functional prioritization of protein posttranslational modifications. Cell 150:413–425
16. Qi L, Liu Z, Wang J et al (2014) Systematic analysis of the phosphoproteome and kinase-substrate networks in the mouse testis. Mol Cell Proteomics 13:3626–3638
17. Casado P, Rodriguez-Prados J-C, Cosulich SC et al (2013) Kinase-substrate enrichment analysis provides insights into the heterogeneity of signaling pathway activation in leukemia cells. Sci Signal 6:rs6
18. Yang P, Zheng X, Jayaswal V et al (2015) Knowledge-based analysis for detecting key signaling events from time-series Phosphoproteomics data. PLoS Comput Biol 11:e1004403
19. Mischnik M, Sacco F, Cox J et al (2015) IKAP: a heuristic framework for inference of kinase activities from Phosphoproteomics data. Bioinformatics 32(3):424–431
20. Hornbeck PV, Zhang B, Murray B et al (2015) PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Res 43:D512–D520
21. Dinkel H, Chica C, Via A et al (2011) Phospho.ELM: a database of phosphorylation sites—update 2011. Nucleic Acids Res 39:D261–D267
22. Horn H, Schoof EM, Kim J et al (2014) KinomeXplorer: an integrated platform for kinome biology studies. Nat Methods 11:603–604
Phosphoproteomics-Based Profiling of Kinase Activities 129
23. Song C, Ye M, Liu Z et al (2012) Systematic cell lymphoma cell line. Mol Cell Proteomics
analysis of protein phosphorylation networks 4:1038–1051
from phosphoproteomic data. Mol Cell Pro- 37. Bateman NW, Goulding SP, Shulman NJ et al
teomics 11:1070–1083 (2014) Maximizing peptide identification
24. Riley NM, Coon JJ (2016) Phosphoproteo- events in proteomic workflows using data-
mics in the age of rapid and deep proteome dependent acquisition (DDA). Mol Cell Pro-
profiling. Anal Chem 88:74–94 teomics 13:329–338
25. Nilsson CL (2012) Advances in quantitative 38. Alcolea MP, Casado P, Rodrı́guez-Prados J-C
phosphoproteomics. Anal Chem 84:735–746 et al (2012) Phosphoproteomic analysis of
26. Hennrich ML, Gavin A-C (2015) Quantita- leukemia cells under basal and drug-treated
tive mass spectrometry of posttranslational conditions identifies markers of kinase path-
modifications: keys to confidence. Sci Signal way activation and mechanisms of resistance.
8:re5 Mol Cell Proteomics 11:453–466
27. Giansanti P, Aye TT, van den Toorn H et al 39. Cox J, Hein MY, Luber CA et al (2014)
(2015) An augmented multiple-protease- Accurate proteome-wide label-free quantifica-
based human phosphopeptide atlas. Cell Rep tion by delayed normalization and maximal
11:1834–1843 peptide ratio extraction, termed MaxLFQ.
28. Ruprecht B, Roesli C, Lemeer S et al (2016) Mol Cell Proteomics 13:2513–2526
MALDI-TOF and nESI Orbitrap MS/MS 40. Strittmatter EF, Ferguson PL, Tang K et al
identify orthogonal parts of the phosphopro- (2003) Proteome analyses using accurate
teome. Proteomics 16(10):1447–1456 mass and elution time peptide tags with capil-
29. Zhou H, Ye M, Dong J et al (2013) Robust lary LC time-of-flight mass spectrometry. J
phosphoproteome enrichment using mono- Am Soc Mass Spectrom 14:980–991
disperse microsphere-based immobilized tita- 41. Lange V, Picotti P, Domon B et al (2008)
nium (IV) ion affinity chromatography. Nat Selected reaction monitoring for quantitative
Protoc 8:461–480 proteomics: a tutorial. Mol Syst Biol 4:222
30. Rush J, Moritz A, Lee KA et al (2005) Immu- 42. Gillet LC, Navarro P, Tate S et al (2012)
noaffinity profiling of tyrosine phosphoryla- Targeted data extraction of the MS/MS spec-
tion in cancer cells. Nat Biotechnol tra generated by data-independent acquisi-
23:94–101 tion: a new concept for consistent and
31. Ruprecht B, Koch H, Medard G et al (2015) accurate proteome analysis. Mol Cell Proteo-
Comprehensive and reproducible phospho- mics 11:O111.016717
peptide enrichment using iron immobilized 43. Parker BL, Yang G, Humphrey SJ et al (2015)
metal ion affinity chromatography Targeted phosphoproteomics of insulin sig-
(Fe-IMAC) columns. Mol Cell Proteomics naling using data-independent acquisition
14:205–215 mass spectrometry. Sci Signal 8:rs6
32. Domon B, Aebersold R (2006) Mass spec- 44. Sidoli S, Fujiwara R, Kulej K et al (2016)
trometry and protein analysis. Science Differential quantification of isobaric phos-
(New York, NY) 312:212–217 phopeptides using data-independent acquisi-
33. Nesvizhskii AI (2007) Protein identification tion mass spectrometry. Mol BioSyst 12
by tandem mass spectrometry and sequence (8):2385–2388
database searching. Methods Mol Biol (Clif- 45. Keller A, Bader SL, Kusebauch U et al (2016)
ton, NJ) 367:87–119 Opening a SWATH window on posttransla-
34. Liu H, Sadygov RG, Yates JR (2004) A model tional modifications: automated pursuit of
for random sampling and estimation of rela- modified peptides. Mol Cell Proteomics
tive protein abundance in shotgun proteo- 15:1151–1163
mics. Anal Chem 76:4193–4201 46. Ong S-E, Blagoev B, Kratchmarova I et al
35. Cutillas PR, Vanhaesebroeck B (2007) Quan- (2002) Stable isotope labeling by amino
titative profile of five murine core proteomes acids in cell culture, SILAC, as a simple and
using label-free functional proteomics. Mol accurate approach to expression proteomics.
Cell Proteomics 6:1560–1573 Mol Cell Proteomics 1:376–386
36. Cutillas PR, Geering B, Waterfield MD et al 47. Zanivan S, Meves A, Behrendt K et al (2013)
(2005) Quantification of gel-separated pro- In vivo SILAC-based proteomics reveals
teins and their phosphorylation sites by phosphoproteome changes during mouse
LC-MS using unlabeled internal standards: skin carcinogenesis. Cell Rep 3:552–566
analysis of phosphoprotein dynamics in a B
130 Jakob Wirbel et al.
48. Shenoy A, Geiger T (2015) Super-SILAC: 61. Baker PR, Trinidad JC, Chalkley RJ (2011)
current trends and future perspectives. Expert Modification site localization scoring
Rev Proteomics 12:13–19 integrated into a search engine. Mol Cell Pro-
49. Thompson A, Sch€afer J, Kuhn K et al (2003) teomics 10:M111.008078
Tandem mass tags: a novel quantification 62. Lemeer S, Heck AJR (2009) The phospho-
strategy for comparative analysis of complex proteomics data explosion. Curr Opin Chem
protein mixtures by MS/MS. Anal Chem Biol 13:414–420
75:1895–1904 63. Sharma K, D’Souza RCJ, Tyanova S et al
50. Ross PL, Huang YN, Marchese JN et al (2014) Ultradeep human phosphoproteome
(2004) Multiplexed protein quantitation in reveals a distinct regulatory nature of Tyr and
Saccharomyces cerevisiae using amine- Ser/Thr-based signaling. Cell Rep
reactive isobaric tagging reagents. Mol Cell 8:1583–1594
Proteomics 3:1154–1169 64. Olsen JV, Blagoev B, Gnad F et al (2006)
51. Li Z, Adams RM, Chourey K et al (2012) Global, in vivo, and site-specific phosphoryla-
Systematic comparison of label-free, meta- tion dynamics in signaling networks. Cell
bolic labeling, and isobaric chemical labeling 127:635–648
for quantitative proteomics on LTQ Orbitrap 65. Olsen JV, Vermeulen M, Santamaria A et al
Velos. J Proteome Res 11:1582–1590 (2010) Quantitative phosphoproteomics
52. Chelius D, Bondarenko PV (2002) Quantita- reveals widespread full phosphorylation site
tive profiling of proteins in complex mixtures occupancy during mitosis. Sci Signal 3:ra3
using liquid chromatography and mass spec- 66. Landry CR, Levy ED, Michnick SW (2009)
trometry. J Proteome Res 1:317–323 Weak functional constraints on phosphopro-
53. Neilson KA, Ali NA, Muralidharan S et al teomes. Trends Genet 25:193–197
(2011) Less label, more free: approaches in 67. Beltrao P, Trinidad JC, Fiedler D et al (2009)
label-free quantitative mass spectrometry. Evolution of phosphoregulation: comparison
Proteomics 11:535–553 of phosphorylation patterns across yeast spe-
54. Perkins DN, Pappin DJ, Creasy DM et al cies. PLoS Biol 7:e1000134
(1999) Probability-based protein identifica- 68. Beltrao P, Bork P, Krogan NJ et al (2013)
tion by searching sequence databases using Evolution and functional cross-talk of protein
mass spectrometry data. Electrophoresis post-translational modifications. Mol Syst
20:3551–3567 Biol 9:714
55. Clauser KR, Baker P, Burlingame AL (1999) 69. Newman RH, Zhang J, Zhu H (2014)
Role of accurate mass measurement (+/10 Toward a systems-level view of dynamic phos-
ppm) in protein identification strategies phorylation networks. Front Genet 5:263
employing MS or MS/MS and database 70. Glickman JF (2012) Assay development for
searching. Anal Chem 71:2871–2882 protein kinase enzymes. Eli Lilly & Company
56. MacCoss MJ, Wu CC, Yates JR (2002) and the National Center for Advancing Trans-
Probability-based validation of protein identi- lational Sciences, Bethesda, MD. http://
fications using a modified SEQUEST algo- www.ncbi.nlm.nih.gov/books/NBK91991/
rithm. Anal Chem 74:5593–5599 71. Perfetto L, Briganti L, Calderone A et al
57. Cox J, Neuhauser N, Michalski A et al (2011) (2016) SIGNOR: a database of causal rela-
Andromeda: a peptide search engine tionships between biological entities. Nucleic
integrated into the MaxQuant environment. Acids Res 44:D548–D554
J Proteome Res 10:1794–1805 72. Gnad F, Gunawardena J, Mann M (2011)
58. Beausoleil SA, Villén J, Gerber SA et al (2006) PHOSIDA 2011: the posttranslational modi-
A probability-based approach for high- fication database. Nucleic Acids Res 39:
throughput protein phosphorylation analysis D253–D260
and site localization. Nat Biotechnol 73. Hu J, Rho H-S, Newman RH et al (2014)
24:1285–1292 PhosphoNetworks: a database for human
59. Savitski MM, Lemeer S, Boesche M et al phosphorylation networks. Bioinformatics
(2011) Confident phosphorylation site locali- (Oxford, England) 30:141–142
zation using the Mascot Delta Score. Mol Cell 74. Sadowski I, Breitkreutz B-J, Stark C et al
Proteomics 10:M110.003830 (2013) The PhosphoGRID Saccharomyces
60. Chalkley RJ, Clauser KR (2012) Modification cerevisiae protein phosphorylation site data-
site localization scoring: strategies and perfor- base: version 2.0 update. Database 2013:
mance. Mol Cell Proteomics 11:3–14 bat026
Phosphoproteomics-Based Profiling of Kinase Activities 131
75. Duan G, Li X, K€ ohn M (2015) The human 89. Chen EY, Tan CM, Kou Y et al (2013)
DEPhOsphorylation database DEPOD: a Enrichr: interactive and collaborative
2015 update. Nucleic Acids Res 43: HTML5 gene list enrichment analysis tool.
D531–D535 BMC Bioinformatics 14:128
76. Zhang H, Zha X, Tan Y et al (2002) Phospho- 90. Kuleshov MV, Jones MR, Rouillard AD et al
protein analysis using antibodies broadly reac- (2016) Enrichr: a comprehensive gene set
tive against phosphorylated motifs. J Biol enrichment analysis web server 2016 update.
Chem 277:39379–39387 Nucleic Acids Res 44(W1):W90–W97
77. Obenauer JC, Cantley LC, Yaffe MB (2003) 91. Lachmann A, Ma’ayan A (2009) KEA: kinase
Scansite 2.0: proteome-wide prediction of cell enrichment analysis. Bioinformatics (Oxford,
signaling interactions using short sequence England) 25:684–686
motifs. Nucleic Acids Res 31:3635–3641 92. Keshava Prasad TS, Goel R, Kandasamy K et al
78. C. Chen and B.E. Turk (2010) Analysis of (2009) Human Protein Reference Database—
serine-threonine kinase specificity using 2009 update. Nucleic Acids Res 37:
arrayed positional scanning peptide libraries., D767–D772
Curr Protoc Mol Biol Chapter 18:Unit 18.14 93. Huttlin EL, Jedrychowski MP, Elias JE et al
79. Sidhu SS, Koide S (2007) Phage display for (2010) A tissue-specific atlas of mouse protein
engineering and analyzing protein interaction phosphorylation and expression. Cell
interfaces. Curr Opin Struct Biol 17:481–487 143:1174–1189
80. Miller ML, Jensen LJ, Diella F et al (2008) 94. de Graaf EL, Giansanti P, Altelaar AFM et al
Linear motif atlas for phosphorylation- (2014) Single-step enrichment by Ti4+-
dependent signaling. Sci Signal 1:ra2 IMAC and label-free quantitation enables
81. Hjerrild M, Stensballe A, Rasmussen TE et al in-depth monitoring of phosphorylation
(2004) Identification of phosphorylation sites dynamics with high reproducibility and tem-
in protein kinase A substrates using artificial poral resolution. Mol Cell Proteomics
neural networks and mass spectrometry. J 13:2426–2434
Proteome Res 3:426–433 95. Wilm M, Mann M (1996) Analytical proper-
82. Linding R, Jensen LJ, Pasculescu A et al ties of the nanoelectrospray ion source. Anal
(2008) NetworKIN: a resource for exploring Chem 68:1–8
cellular phosphorylation networks. Nucleic 96. Wilkes EH, Terfve C, Gribben JG et al (2015)
Acids Res 36:D695–D699 Empirical inference of circuitry and plasticity
83. Szklarczyk D, Franceschini A, Wyder S et al in a kinase signaling network. Proc Natl Acad
(2015) STRING v10: protein-protein inter- Sci U S A 112:7719–7724
action networks, integrated over the tree of 97. T€urei D, Korcsmáros T, Saez-Rodriguez J
life. Nucleic Acids Res 43:D447–D452 (2016) OmniPath: guidelines and gateway
84. Wagih O, Sugiyama N, Ishihama Y et al for literature-curated signaling pathway
(2016) Uncovering phosphorylation-based resources. Nat Methods 13:966–967
specificities through functional interaction 98. Benjamini Y, Hochberg Y (2000) On the
networks. Mol Cell Proteomics 15:236–245 adaptive control of the false discovery rate in
85. Linding R, Jensen LJ, Ostheimer GJ et al multiple testing with independent statistics. J
(2007) Systematic discovery of in vivo phos- Educ Behav Stat 25:60–83
phorylation networks. Cell 129:1415–1426 99. Mckinney W (2010) Data structures for sta-
86. Subramanian A, Tamayo P, Mootha VK et al tistical computing in python. Proceedings of
(2005) Gene set enrichment analysis: a the 9th python in science conference
knowledge-based approach for interpreting 100. Van Der Walt S, Colbert SC, Varoquaux G
genome-wide expression profiles. Proc Natl (2011) The NumPy Array: A Structure for
Acad Sci U S A 102:15545–15550 Efficient Numerical Computation, Comput
87. Schacht T, Oswald M, Eils R et al (2014) Sci Eng 13:22–30. https://doi.org/10.
Estimating the activity of transcription factors 1109/MCSE.2011.37
by the effect on their target genes. Bioinfor- 101. Kim S-Y, Volsky DJ (2005) PAGE: parametric
matics (Oxford, England) 30:i401–i407 analysis of gene set enrichment. BMC Bioin-
88. Drake JM, Graham NA, Stoyanova T et al formatics 6:144
(2012) Oncogene-specific activation of tyro- 102. Jones E, Oliphant TE, Peterson P (2007)
sine kinase networks during prostate cancer Python for scientific computing. Comput Sci
progression. Proc Natl Acad Sci Eng 9:10–20
109:1643–1648 103. Imamura H, Sugiyama N, Wakabayashi M
et al (2014) Large-scale identification of
132 Jakob Wirbel et al.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 Interna-
tional License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation,
distribution and reproduction in any medium or format, as long as you give appropriate credit to the
original author(s) and the source, provide a link to the Creative Commons license and indicate if changes
were made.
The images or other third party material in this chapter are included in the chapter’s Creative Commons
license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s
Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the
permitted use, you will need to obtain permission directly from the copyright holder.
Chapter 7
Abstract
Mass spectrometry-based proteomics is a continuously growing field marked by technological and meth-
odological improvements. Cancer proteomics is aimed at pursuing goals such as accurate diagnosis, patient
stratification, and biomarker discovery, relying on the richness of information of quantitative proteome
profiles. Translating these high-dimensional data into biological findings of clinical importance necessitates
the use of robust and powerful computational tools and methods. In this chapter, we provide a detailed
description of standard analysis steps for a clinical proteomics dataset performed in Perseus, a software for
functional analysis of large-scale quantitative omics data.
Key words Perseus, Software, Omics data analysis, Translational bioinformatics, Cancer proteomics
1 Introduction
Louise von Stechow (ed.), Cancer Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 1711,
https://doi.org/10.1007/978-1-4939-7493-1_7, © The Author(s) 2018
134 Stefka Tyanova and Juergen Cox
Fig. 1 Outline of a typical analysis workflow in Perseus. The workflow shows the process of converting data
into information and knowledge. Statistical analysis can guide the identification of biologically
relevant hits and drive hypothesis generation. Various external databases, annotation sources, and multiple
omics types can be loaded and matched within the software; together with powerful enrichment
techniques, these allow for smooth data integration
2 Materials
2.1 Software Download and Installation
Written in C#, Perseus achieves optimal performance when run on
Windows operating systems. The latest versions require a 64-bit
system and .NET Framework 4.5 installed on the same computer. To
use the software on macOS, set up Boot Camp and, optionally,
Parallels in addition. Registration and acceptance of the Software
License Agreement are required prior to downloading Perseus from
the official website: http://www.coxdocs.org/doku.php?id=perseus:start.
Once the download has finished, decompress the folder, locate the
Perseus.exe file, and double-click it to start the program.
2.2 Data Files
In the subsequent analysis, we used a subset of the data measured
by Pozniak et al. [20]. The authors provide a genome-wide proteomic
analysis of breast cancer progression in patients by studying major
differences at the proteome level between healthy, primary tumor,
and metastatic tissues. The data were measured as ratios between an
optimized heavy-labeled mix of cell lines representing different
breast cancer stages and the patient proteome [2]. This constitutes
an accurate relative quantification approach used especially in the
analysis of clinical samples. Peptide and protein identification and
quantification were performed using the MaxQuant suite for the
analysis of raw mass spectrometry data [21] at a peptide spectrum
match and protein false discovery rate of 1%. The subset used in this
protocol contains proteome profiles of 22 healthy, 21 lymph node
negative, and 25 lymph node metastatic tissue samples and spans more
than 10,000 protein groups. It can be found in the proteinGroups.txt
file provided as supplementary material to the Pozniak et al. study
(see Note 1).
3 Methods
3.1 Loading the Data
1. Go to the “Load” section in Perseus and click the “Generic
matrix upload” button.
2. In the pop-up window, navigate to the file to be loaded (see
Note 2).
3. Select all the expression columns and transfer them to the Main
columns window (see Note 3). Select all additional numerical
data that may be needed in the analysis and transfer them to the
Numerical columns window. Make sure that the columns containing
identifiers (e.g., protein IDs) are selected as Text columns.
Click ok.
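The column assignment in step 3 can be pictured outside the graphical interface as follows. This is only a minimal Python sketch, not how Perseus itself loads data: the two-row table, its column names, and the helper `split_columns` are hypothetical, and a real proteinGroups.txt file would be opened from disk rather than from the in-memory string used here.

```python
import csv
import io

# Hypothetical tab-separated excerpt standing in for a real input file.
EXAMPLE = (
    "Protein IDs\tRatio H/L A3H\tRatio H/L A3T\tPeptides\n"
    "P12345\t1.8\t0.6\t12\n"
    "Q67890\t\t2.4\t5\n"
)


def to_float(cell):
    """Convert a cell to float; empty cells become NaN (missing value)."""
    return float(cell) if cell.strip() else float("nan")


def split_columns(handle, main, numerical, text):
    """Read a tab-separated table and sort its columns into three roles."""
    table = {"main": [], "numerical": [], "text": []}
    for row in csv.DictReader(handle, delimiter="\t"):
        table["main"].append([to_float(row[c]) for c in main])
        table["numerical"].append([to_float(row[c]) for c in numerical])
        table["text"].append([row[c] for c in text])
    return table


matrix = split_columns(
    io.StringIO(EXAMPLE),
    main=["Ratio H/L A3H", "Ratio H/L A3T"],  # expression columns
    numerical=["Peptides"],                   # additional numerical data
    text=["Protein IDs"],                     # identifiers
)
print(matrix["text"])
```

Keeping the identifiers as text rather than numbers mirrors the advice in step 3, since identifiers are later used for matching annotations and not for arithmetic.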
3.2 Summary Statistics
Familiarize yourself with the software and its five main sections:
Load, Processing, Analysis, Multi-processing, and Export (see Fig. 2).
1. In the workflow panel, change the name of the data matrix from
matrix 1 to InitialData by right-clicking the node and changing
the Alternative name box. Close the pop-up window. Explore
the right-most panel of Perseus, which contains useful information
such as the number of main columns and the number of rows.
2. Go to “Processing → Filter rows → Filter rows based on
categorical column” to exclude proteins identified only by site,
matching the reverse database, or flagged as contaminants (see Note 4).
3. Transform the data to a logarithmic scale by going to
“Processing → Basic → Transform” and specifying the transformation
function (e.g., log2(x)).
4. In the “Processing” section, select the “Basic” menu and click
on the “Summary statistics (columns)” button. Select all
expression columns by transferring them to the right-hand
side. Click ok and explore the new matrix.
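Steps 2 to 4 can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the Perseus implementation: the three example rows are invented, the flag column names `Reverse` and `Potential contaminant` and the "+" marker are assumed to follow the usual MaxQuant output, and zero or negative values are simply treated as missing after the log transformation.

```python
import math
import statistics

# Hypothetical rows, one protein group each; "+" marks a flagged entry.
rows = [
    {"id": "P1", "Reverse": "", "Potential contaminant": "", "values": [2.0, 8.0]},
    {"id": "P2", "Reverse": "+", "Potential contaminant": "", "values": [4.0, 4.0]},
    {"id": "P3", "Reverse": "", "Potential contaminant": "", "values": [0.5, 0.0]},
]
FLAG_COLUMNS = ["Reverse", "Potential contaminant"]  # assumed column names


def keep(row):
    """Keep a row only if none of the flag columns is marked with '+'."""
    return all(row[c] != "+" for c in FLAG_COLUMNS)


def log2_or_nan(x):
    """log2 of positive values; zero or negative entries become NaN."""
    return math.log2(x) if x > 0 else float("nan")


filtered = [r for r in rows if keep(r)]
logged = [[log2_or_nan(v) for v in r["values"]] for r in filtered]

# Column-wise summary over the valid (non-NaN) entries.
for j in range(2):
    col = [row[j] for row in logged if not math.isnan(row[j])]
    print(j, len(col), statistics.mean(col), statistics.median(col))
```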
3.3 Filtering
1. Use the workflow window to select the InitialData matrix
by clicking on it (see Note 5).
2. In the “Processing” section, go to the “Filter rows” menu and
select “Filter rows based on valid values.” Change the Min.
valids parameter to Percentage and keep the default value of
70% for the Min. percentage of values parameter. Click ok.
Check how many protein groups were retained after the
filtering (see Note 6).
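The valid-value filter in step 2 amounts to a per-row threshold on the fraction of non-missing entries. A minimal Python sketch of that idea follows; the function names and the three example rows are hypothetical, and missing values are assumed to be encoded as NaN.

```python
import math

NAN = float("nan")


def fraction_valid(values):
    """Fraction of entries in a row that are not NaN."""
    return sum(not math.isnan(v) for v in values) / len(values)


def filter_valid(rows, min_fraction=0.7):
    """Keep rows whose fraction of valid values reaches the threshold."""
    return [r for r in rows if fraction_valid(r) >= min_fraction]


rows = [
    [1.0, 2.0, 3.0, 4.0],  # all four values valid
    [1.0, NAN, 3.0, 4.0],  # three of four values valid
    [1.0, NAN, NAN, 4.0],  # two of four values valid
]
print(len(filter_valid(rows)))       # rows passing the 70% threshold
print(len(filter_valid(rows, 1.0)))  # rows without any missing value
```

Raising `min_fraction` to 1.0 corresponds to requiring complete rows, which is the setting needed later for principal component analysis.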
Fig. 2 Interfaces of Perseus and the augmented data matrix format. (a) Perseus extends over five interfaces,
each of which includes various analysis and transformation functionalities and visualization possibilities. (b)
Experimental design is specified as annotation (e.g., treatment vs. control groups) or numerical rows (e.g.,
variable concentration). Multiple annotation rows can be specified that allow biological and technical
replicates to be analyzed together. (c) The data is organized in a matrix format where typically all samples
are displayed as columns and all proteins as rows. (d) Additional protein information can be added in the form
of Numerical, Categorical, or Text annotations
3. Click on the pdf button to export the plot (see Note 7).
4. Switch the view to the “Data” tab.
5. Go to “Analysis → Visualization → Multi scatter plot.” Select
the desired samples by transferring them to the right-hand side.
Click ok (see Fig. 3).
6. Adjust the plot using the Fit width and Fit height options and
by resizing the plot window.
7. In the drop-down menu “Display in plots” in the plot window,
select Pearson correlation.
8. Select a scatter plot by clicking on it. The selected plot will be
shown in an enlarged view.
9. Select a number of proteins from the “Point” table on the right
of the multi scatter plot and examine their position in all
pairwise sample comparisons.
10. Switch back to the “Data” tab to continue with the analysis.
11. Go to “Processing → Basic → Column correlation.” Make sure
that the Type is set to Pearson correlation. The output table
contains all pairwise correlations between the selected columns.
12. To visualize the sample correlations, go to “Analysis →
Clustering/PCA → Hierarchical clustering.” Use the Change color
gradient option to set a continuous gradient similar to the one in
Fig. 3a.
13. Export the plot by clicking on the pdf button.
14. Navigate back to the previous data matrix by clicking on it in
the workflow panel.
15. Principal component analysis requires all values to be valid. To
remove all protein groups with missing values, repeat
Subheading 3.3, step 2, setting the percentage parameter to 100 (see
Note 8).
16. Go to “Analysis → Clustering/PCA → Principal component
analysis” and click ok. Explore the sample separation (dot plot
in the upper panel) and the corresponding loadings (dot plot in
the lower panel).
17. In the table on the right of the PCA plot, select a set of samples
(e.g., all samples that belong to one experimental condition)
and change their color by clicking on the Symbol color button
and selecting the desired color.
18. Check the contribution of other components by substituting
Component 1 and 2 with other components from the drop-down
menu. Find the components that show sample separation
according to the experimental conditions (see Fig. 3c).
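The pairwise column correlation of step 11 can be sketched in a few lines of Python. The sample names and values below are invented, and skipping pairs that contain a missing value is an assumption made for this sketch; it is not confirmed to be how Perseus treats missing values.

```python
import math


def pearson(x, y):
    """Pearson correlation over positions where both values are valid."""
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical log-transformed profiles of three samples.
samples = {
    "S1": [1.0, 2.0, 3.0, 4.0],
    "S2": [2.0, 4.0, 6.0, float("nan")],
    "S3": [4.0, 3.0, 2.0, 1.0],
}
names = sorted(samples)
correlations = {(a, b): pearson(samples[a], samples[b])
                for a in names for b in names}
for pair, r in correlations.items():
    print(pair, round(r, 3))
```

The resulting symmetric table is the kind of matrix that is then passed to hierarchical clustering in step 12.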
Fig. 3 Exploratory analysis outputs in Perseus. (a) Hierarchical clustering of all the samples based on the
correlation coefficients between them reveals higher similarity between primary and metastatic tumors versus
healthy tissue samples. (b) Multi-scatter plot of averaged profiles among the three main groups clearly
represents the disease progression by highlighting strong similarities between subsequent stages, e.g.,
healthy tissue samples are more similar to primary tumors than to metastasis (correlation coefficient 0.76
vs 0.69), whereas primary tumors are most similar to metastasis (R = 0.94). The category Cell division is
highlighted in bright green in all pairwise comparison plots. (c) Principal component analysis (PCA) attributes
the largest variance to the difference between healthy (blue dots) and cancer tissues (pink and red dots)
(Component 1, 21.1%) and shows that primary and metastatic tumors (pink and red dots respectively) are
difficult to distinguish
3.5 Normalization
1. Navigate back to the data matrix obtained before filtering for
100% valid values (Subheading 3.3, step 2).
2. Go to “Processing → Normalization → Z-score.” Change the
Matrix access parameter to Columns and select the Use median
option. In the new data table, plot histograms for the same
subset of samples as in Subheading 3.4, step 1 (see Note 9).
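A column-wise Z-score with the Use median option can be sketched as below. This is a hedged illustration: the function name and example column are hypothetical, and the use of the sample standard deviation as the scaling factor is an assumption, since the protocol does not state which estimator Perseus applies.

```python
import math
import statistics


def zscore_column(values, use_median=True):
    """Center a column on its median (or mean) and scale by its spread."""
    valid = [v for v in values if not math.isnan(v)]
    center = statistics.median(valid) if use_median else statistics.mean(valid)
    spread = statistics.stdev(valid)  # sample standard deviation (assumption)
    # Missing values propagate: (NaN - center) / spread is still NaN.
    return [(v - center) / spread for v in values]


column = [1.0, 2.0, 3.0, 4.0, float("nan"), 10.0]
scaled = zscore_column(column)
print([round(v, 3) for v in scaled])
```

Centering on the median rather than the mean makes the location estimate less sensitive to a few extreme ratios, such as the value 10.0 in the example column.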
3.7 Loading Annotations
1. Go to the drop-down menu indicated with a white arrow at the
top left corner of Perseus and select “Annotation download.”
2. Click on the link in the pop-up window. Select the appropriate
annotation file (e.g., “PerseusAnnotaion → FrequentlyUsed →
mainAnnot.homo_sapiens.txt.gz,” if the organism of interest is
Homo sapiens).
3. Download the file to the Perseus/conf/annotations folder.
4. Go to “Processing → Annot. columns → Add annotation.”
Select the file from the previous step as a Source.
5. Set the UniProt column parameter to the column that
contains UniProt identifiers (e.g., Protein IDs). These identifiers
will be used for overlaying the annotation data with the
expression matrix.
6. Select several categories of interest to be overlaid with the main
matrix and move them to the right-hand side. Click ok.
3.8 Differential Expression Analysis
1. Go to “Processing → Tests.” From the menu select the
appropriate test. For the data set used in this chapter, the
Multiple-sample tests option should be chosen, as more than
two conditions are compared. The default parameters do
not have to be changed (see Note 11).
2. In the Grouping parameter, specify the categorical row that
contains the experimental conditions of the samples to be used
in the differential analysis.
3. Keep the default value of 0 for the S0 parameter to use the
standard t-test statistic. Set S0 to a value greater than 0 to use
the modified test statistic approach described by Tusher et al. [15].
4. Select the multiple hypothesis testing correction method to
be used by specifying the Use for truncation parameter (see
Note 12, Fig. 4a).
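The permutation-based correction referred to in step 4 can be sketched in Python as follows. This is a simplified illustration and not the Perseus algorithm: it thresholds the ANOVA F statistic directly instead of converting it to a p-value (for fixed group sizes the two rank proteins identically), it estimates the FDR at a single threshold rather than a q-value per protein, and the data, the threshold, and the function names are all hypothetical.

```python
import random


def f_statistic(values, labels):
    """One-way ANOVA F statistic for one protein across labelled samples."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                  for g in groups.values())
    within = sum((v - sum(g) / len(g)) ** 2
                 for g in groups.values() for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def permutation_fdr(data, labels, threshold, r=100, seed=0):
    """Estimate the FDR among proteins whose F statistic reaches threshold."""
    rng = random.Random(seed)
    observed = sum(f_statistic(row, labels) >= threshold for row in data)
    if observed == 0:
        return 0.0
    false_hits = 0
    shuffled = list(labels)
    for _ in range(r):
        rng.shuffle(shuffled)  # randomly swap the sample labels
        false_hits += sum(f_statistic(row, shuffled) >= threshold
                          for row in data)
    # Average number of hits per randomization over hits in the real data.
    return min(1.0, (false_hits / r) / observed)


labels = ["H"] * 3 + ["T"] * 3 + ["M"] * 3
data = [
    [1.0, 1.2, 0.9, 3.1, 2.9, 3.0, 5.0, 5.2, 4.9],  # differs between groups
    [2.0, 2.3, 1.9, 2.1, 2.2, 1.8, 2.0, 2.4, 2.1],  # roughly constant
]
fdr = permutation_fdr(data, labels, threshold=20.0)
print(fdr)
```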
Fig. 4 Differential expression and multiple hypothesis testing. (a) Multiple hypothesis testing correction using a
permutation-based false discovery rate approach is shown. Labels are randomly swapped between the three
groups (blue, yellow, and red). The randomization is repeated r times. ANOVA p-values are computed both on
the measured and the permuted data, and local FDR values (q-values) are computed as the fraction of
accepted hits from the permuted data over accepted hits from the measured data, normalized by the total
number of randomizations r. All hits with a q-value smaller than a threshold are considered significant. (b) To
determine the exact pairwise differences of protein expression, Tukey’s Honest Significant Difference (THSD)
test is used on the ANOVA significant hits. If the mean difference between two groups is greater than or equal
to the corresponding THSD, the difference is considered significant between the compared groups. q: constant
depending on the number of treatments and the degrees of freedom that can be found in a Studentized range q
table; MSE: mean squared error; n1, n2: number of data points in each group
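The pairwise comparison described in panel (b) can be sketched in Python. The sketch assumes the Tukey-Kramer form of the threshold, THSD = q * sqrt(MSE/2 * (1/n1 + 1/n2)), which is consistent with the symbols listed in the caption but is not reproduced from the figure itself. The three groups are invented, and the constant `Q_CRIT` is an approximate value taken from a Studentized range table for three groups, six degrees of freedom, and a significance level of 0.05.

```python
import math


def mean_squared_error(groups):
    """Within-group mean square from a one-way ANOVA."""
    n = sum(len(g) for g in groups)
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return within / (n - len(groups))


def thsd(q, mse, n1, n2):
    """Tukey-Kramer honest significant difference for two group sizes."""
    return q * math.sqrt(mse / 2 * (1 / n1 + 1 / n2))


# Hypothetical expression values of one protein in the three groups.
healthy = [1.0, 2.0, 3.0]
primary = [2.0, 3.0, 4.0]
metastasis = [6.0, 7.0, 8.0]

mse = mean_squared_error([healthy, primary, metastasis])
Q_CRIT = 4.34  # approximate table value (assumption, see lead-in)
cutoff = thsd(Q_CRIT, mse, len(healthy), len(metastasis))

diff = abs(sum(metastasis) / len(metastasis) - sum(healthy) / len(healthy))
significant = diff >= cutoff
print(round(cutoff, 3), diff, significant)
```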
Cluster  Annotation term                     Enrichment factor  p-value
1        Cytoplasmic translation             9.3                0.3E-02
1        Arp2/3 complex                      8.6                0.1E-02
1        Cytosolic small ribosomal subunit   8.5                1.2E-17
1        Actin nucleation                    7.8                0.3E-02
3        Doane breast cancer ESR1 DN         6.2                0.1E-02
3        Epidermis development               4.9                3.7E-06
3        Basolateral plasma membrane         4.3                0.2E-03
3        Cell-cell adhesion                  3.1                0.2E-02
Fig. 5 Enrichment analysis highlighting important pathways and processes. (a) Hierarchical clustering of
proteins found to have differential expression between pairs of disease states. High and low expression are
shown in red and blue respectively. Various clusters of protein groups are highlighted in the dendrogram. (b)
Profile plots of three selected clusters showing distinct behavior with respect to the three disease states are
shown: 1 strongly increased expression in tumor tissues; 2 moderate increase in tumor tissue; and
3 decreased expression in tumor samples. (c) Functional analysis of protein annotation terms resulted in
multiple categories that were enriched in the three selected clusters. The enriched terms and the
corresponding enrichment factor and p-value are shown
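The enrichment factor and p-value reported in panel (c) can be illustrated with a small Python sketch. It assumes that the enrichment factor is the fraction of cluster members carrying a term divided by the fraction in the whole data set, and that the p-value comes from a one-sided Fisher exact (hypergeometric) test; the counts in the example and the function name are hypothetical.

```python
from math import comb


def enrichment(k, n, K, N):
    """Enrichment factor and one-sided hypergeometric p-value.

    k: cluster members annotated with the term
    n: cluster size
    K: proteins annotated with the term in the whole data set
    N: total number of proteins in the data set
    """
    factor = (k / n) / (K / N)
    # Probability of drawing k or more annotated proteins by chance.
    upper = min(K, n)
    p_value = sum(comb(K, i) * comb(N - K, n - i)
                  for i in range(k, upper + 1)) / comb(N, n)
    return factor, p_value


factor, p_value = enrichment(k=8, n=20, K=40, N=1000)
print(round(factor, 1), p_value)
```

Because many terms are tested per cluster, p-values obtained this way would still need a multiple testing correction before being interpreted.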
4 Notes
References
1. Mann M, Kulak NA, Nagaraj N, Cox J (2013) Geiger T, Mann M, Flores-Morales A (2016)
The coming age of complete, accurate, and The proteome of primary prostate cancer. Eur
ubiquitous proteomes. Mol Cell 49 Urol 69(5):942–952. https://doi.org/10.
(4):583–590. https://doi.org/10.1016/j. 1016/j.eururo.2015.10.053
molcel.2013.01.029 10. Deeb SJ, Tyanova S, Hummel M, Schmidt-
2. Geiger T, Cox J, Ostasiewicz P, Wisniewski JR, Supprian M, Cox J, Mann M (2015) Machine
Mann M (2010) Super-SILAC mix for quanti- learning based classification of diffuse large
tative proteomics of human tumor tissue. Nat B-cell lymphoma patients by their protein
Methods 7(5):383–385. https://doi.org/10. expression profiles. Mol Cell Proteomics 14
1038/nmeth.1446 (11):2947–2960. https://doi.org/10.1074/
3. Shenoy A, Geiger T (2015) Super-SILAC: cur- mcp.M115.050245
rent trends and future perspectives. Expert Rev 11. Tyanova S, Albrechtsen R, Kronqvist P, Cox J,
Proteomics 12(1):13–19. https://doi.org/10. Mann M, Geiger T (2016) Proteomic maps of
1586/14789450.2015.982538 breast cancer subtypes. Nat Commun
4. Cox J, Hein MY, Luber CA, Paron I, 7:10259. https://doi.org/10.1038/
Nagaraj N, Mann M (2014) Accurate ncomms10259
proteome-wide label-free quantification by 12. Mertins P, Mani DR, Ruggles KV, Gillette MA,
delayed normalization and maximal peptide Clauser KR, Wang P, Wang X, Qiao JW, Cao S,
ratio extraction, termed MaxLFQ. Mol Cell Petralia F, Kawaler E, Mundt F, Krug K, Tu Z,
Proteomics 13(9):2513–2526. https://doi. Lei JT, Gatza ML, Wilkerson M, Perou CM,
org/10.1074/mcp.M113.031591 Yellapantula V, Huang KL, Lin C, McLellan
5. Ellis MJ, Gillette M, Carr SA, Paulovich AG, MD, Yan P, Davies SR, Townsend RR, Skates
Smith RD, Rodland KK, Townsend RR, SJ, Wang J, Zhang B, Kinsinger CR, Mesri M,
Kinsinger C, Mesri M, Rodriguez H, Liebler Rodriguez H, Ding L, Paulovich AG, Fenyo D,
DC, Clinical Proteomic Tumor Analysis C Ellis MJ, Carr SA, Nci C (2016) Proteoge-
(2013) Connecting genomic alterations to can- nomics connects somatic mutations to signal-
cer biology with proteomics: the NCI Clinical ling in breast cancer. Nature 534(7605):55–62.
Proteomic Tumor Analysis Consortium. Can- https://doi.org/10.1038/nature18003
cer Discov 3(10):1108–1112. https://doi. 13. Troyanskaya O, Cantor M, Sherlock G,
org/10.1158/2159-8290.CD-13-0219 Brown P, Hastie T, Tibshirani R, Botstein D,
6. Hanash S, Taguchi A (2010) The grand chal- Altman RB (2001) Missing value estimation
lenge to decipher the cancer proteome. Nat methods for DNA microarrays. Bioinformatics
Rev Cancer 10(9):652–660. https://doi.org/ 17(6):520–525
10.1038/nrc2918 14. Lazar C, Gatto L, Ferro M, Bruley C, Burger T
7. Wisniewski JR, Dus-Szachniewicz K, (2016) Accounting for the multiple natures of
Ostasiewicz P, Ziolkowski P, Rakus D, Mann missing values in label-free quantitative prote-
M (2015) Absolute proteome analysis of colo- omics data sets to compare imputation strate-
rectal mucosa, adenoma, and cancer reveals gies. J Proteome Res 15(4):1116–1125.
drastic changes in fatty acid metabolism and https://doi.org/10.1021/acs.jproteome.
plasma membrane transporters. J Proteome 5b00981
Res 14(9):4005–4018. https://doi.org/10. 15. Tusher VG, Tibshirani R, Chu G (2001) Sig-
1021/acs.jproteome.5b00523 nificance analysis of microarrays applied to the
8. Zhang B, Wang J, Wang X, Zhu J, Liu Q, ionizing radiation response. Proc Natl Acad Sci
Shi Z, Chambers MC, Zimmerman LJ, Shad- U S A 98(9):5116–5121. https://doi.org/10.
dox KF, Kim S, Davies SR, Wang S, Wang P, 1073/pnas.091062498
Kinsinger CR, Rivers RC, Rodriguez H, Town- 16. Benjamini Y, Hochberg Y (1995) Controlling
send RR, Ellis MJ, Carr SA, Tabb DL, Coffey the false discovery rate: a practical and powerful
RJ, Slebos RJ, Liebler DC, Nci C (2014) Pro- approach to multiple testing. J R Stat Soc Series
teogenomic characterization of human colon B 57:289–300
and rectal cancer. Nature 513 17. Fisher RA (1922) On the interpretation of x
(7518):382–387. https://doi.org/10.1038/ (2) from contingency tables, and the calcula-
nature13438 tion of P. J R Stat Soc 85:87–94. https://doi.
9. Iglesias-Gato D, Wikstrom P, Tyanova S, org/10.2307/2340521
Lavallee C, Thysell E, Carlsson J, Hagglof C, 18. Cox J, Mann M (2012) 1D and 2D annotation
Cox J, Andren O, Stattin P, Egevad L, enrichment: a statistical method integrating
Widmark A, Bjartell A, Collins CC, Bergh A,
148 Stefka Tyanova and Juergen Cox
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 Interna-
tional License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation,
distribution and reproduction in any medium or format, as long as you give appropriate credit to the
original author(s) and the source, provide a link to the Creative Commons license and indicate if changes
were made.
The images or other third party material in this chapter are included in the chapter’s Creative Commons
license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s
Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the
permitted use, you will need to obtain permission directly from the copyright holder.
Chapter 8
Abstract
Glioblastoma is the most aggressive primary brain tumor with a poor mean survival even with the current
standard of care. Kinase signaling analyses of clinical glioblastoma samples provide a physiologically relevant
view of oncogenic signaling networks. Here, we describe the methods that enable the quantification of
protein expression profiles and phosphotyrosine signaling across flash frozen and optimal cutting tempera-
ture (OCT) compound embedded tumor specimens. The data derived from these experiments can be used
to identify the intra- and inter-patient heterogeneity present in these tumors. Correlation and functional
analyses on the quantitative protein expression and phosphotyrosine signaling data obtained from clinical
samples can be used to identify tyrosine kinase signaling networks present in these tumors and reveal the
differential expression of functionally related proteins. This chapter provides the quantitative mass spec-
trometry methods required for the identification of in vivo oncogenic signaling networks from human
tumor specimens.
1 Introduction
Louise von Stechow (ed.), Cancer Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 1711,
https://doi.org/10.1007/978-1-4939-7493-1_8, © Springer Science+Business Media, LLC 2018
150 Hannah Johnson and Forest M. White
Fig. 1 Quantification of tyrosine phosphorylation signaling and protein expression profiles across human
glioblastoma tumors. Experimental mass spectrometry workflow. Human glioblastoma tumor sections are
homogenized, reduced, alkylated, and digested with trypsin, and the resulting peptides are labeled with
iTRAQ 8plex. Phosphotyrosine peptides are enriched by immunoprecipitation using anti-phosphotyrosine
antibodies and analyzed by LC-MS/MS. For protein expression profiling, peptides are fractionated by
isoelectric focusing and analyzed by LC-MS/MS
2 Materials
Prepare all the solutions using HPLC grade solvents unless indi-
cated otherwise. Prepare and store all the reagents at room temper-
ature unless indicated otherwise. Follow waste disposal regulations
when disposing of chemicals.
6. C18 cartridges.
7. Acetonitrile.
9. C18 cartridges.
10. Acetic acid.
3 Methods
3.1 Tumor Homogenization and Removal of Optimal Cutting Temperature
It is essential that tumors are flash frozen immediately following resection, as cold ischemia
can lead to significant changes in the tyrosine kinase signaling network [13]. Perform
steps 3–8 in the chemical hood on ice.
1. Immediately flash freeze tumor samples in liquid nitrogen following resection, or embed
in OCT compound and flash freeze in liquid nitrogen as soon as possible (ideally within
5 min).
2. Take tissue out of the tube using tweezers and deposit it in a weighing tray that has
been previously tared. Record the weight of the tumor, the size, and the shape (see Note 1).
Quantitative Tyrosine Kinase Signaling in Glioblastoma 155
3.2 Immunoblotting
The RTK status of the tumors can be used to help understand the tyrosine phosphorylation
results (e.g., to help define the relative stoichiometry of phosphorylation between samples).
RTK expression can be assessed using standard immunoblotting (see Note 2).
1. Separate tissue homogenates on 7.5% polyacrylamide gels and
electrophoretically transfer to nitrocellulose.
2. Block nitrocellulose with blocking buffer for 1 h.
3. Dilute primary antibodies in blocking buffer and incubate with
nitrocellulose overnight at 4 °C.
4. Dilute secondary antibodies (either goat anti-rabbit or goat
anti-mouse conjugated to horseradish peroxidase) in TBS-T
at a 1:10,000 ratio and incubate at room temperature for 1 h.
5. Wash nitrocellulose 3 × 10 min with TBS-T.
6. Detect antibody binding with ECL, film, and a standard
developer.
3.4 iTRAQ 8plex Labeling
iTRAQ labeling currently allows the multiplexed quantification across up to eight different
samples. Multiple iTRAQ 8plex experiments can be combined to quantify across multiples of
eight tumors. This multiplexing strategy requires the presence of a common sample in each
experiment in order to compare across different experiments. This multiplex labeling strategy
can also be performed with TMT reagents, available in 6-plex or 10-plex.
1. Label 400 μg peptide (quantified by BCA before C18 desalting)
from each of the tumors with one tube of iTRAQ 8plex
reagent (see Note 3).
2. Dissolve 400 μg lyophilized peptides in 30 μL dissolution
buffer. Vortex each sample for 1 min and spin at 12,000 × g
for 1 min.
3. Dissolve each tube of iTRAQ reagent in 70 μL of isopropanol.
Vortex each tube for 1 min and spin at 12,000 × g for 1 min.
4. Add the isopropanol and iTRAQ 8plex reagent to the 400 μg
peptide in dissolution buffer and vortex. Incubate at room
temperature for 2 h.
5. Concentrate the eight tubes of peptide/iTRAQ mix to 40 μL
using a vacuum centrifuge (speed-vac).
6. Combine the eight differentially labeled samples into a
single tube.
7. Sequentially rinse out all the tubes with 3 × 60 μL 0.1% acetic
acid and add to the sample.
8. Concentrate the combined iTRAQ sample using a vacuum
centrifuge (spin to dryness) and store at −80 °C. At this
point the sample is stable for long-term storage (see Note 4).
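The role of the common sample in bridging several 8plex experiments can be illustrated with a short sketch. It is not part of the protocol: it assumes, hypothetically, that reporter ion intensities from each experiment are available as an array with one row per peptide and one column per channel, that the same peptides occupy the same rows in every experiment, and that the common sample sits in channel 0. The function name and the simulated intensities are illustrative only.

```python
import numpy as np

def ratios_to_reference(intensities, ref_channel=0):
    """Express each channel as a log2 ratio to the common reference channel.

    intensities: array of shape (n_peptides, 8), one column per iTRAQ channel.
    ref_channel: column holding the common sample (an assumption of this sketch).
    """
    intensities = np.asarray(intensities, dtype=float)
    reference = intensities[:, [ref_channel]]
    return np.log2(intensities / reference)

# Two hypothetical 8plex experiments measuring the same three peptides.
rng = np.random.default_rng(0)
exp_a = rng.uniform(1e4, 1e6, size=(3, 8))
exp_b = rng.uniform(1e4, 1e6, size=(3, 8))

# After referencing, the common-sample column is zero in both experiments,
# so the remaining seven columns of each can be placed side by side.
log_a = ratios_to_reference(exp_a)
log_b = ratios_to_reference(exp_b)
combined = np.hstack([log_a[:, 1:], log_b[:, 1:]])
```

Under these assumptions, each experiment contributes seven tumor channels to the combined matrix, all expressed relative to the same reference.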
3.6 Phosphopeptide Enrichment by IMAC
IMAC columns are packed and used according to the previously described protocol [18].
The steps required to enrich for phosphopeptides are briefly highlighted here.
1. Rinse the IMAC column with 100 mM EDTA pH 8.0 for
10 min at 10 μL/min.
2. Rinse the IMAC column with MilliQ water for 10 min at
10 μL/min.
3. Load 100 mM iron(III) chloride onto the column for 30 min
at 10 μL/min (see Note 7).
4. Rinse the IMAC column with 0.1% acetic acid for 10 min at
10 μL/min.
3.7 Analysis of Tyrosine Phosphorylation by MS
1. After rinsing with 0.1% acetic acid, attach the pre-column to a
C18 reverse-phase analytical column with integrated electrospray
emitter tip.
2. Chromatographically separate peptides by reverse phase HPLC
over a 140 min gradient, with the eluent ionized by nanoelectrospray
into an Orbitrap QExactive Plus instrument.
3. Operate the instrument in positive ion mode. Record full scans
in the Orbitrap mass analyzer (resolution 60,000 FWHM) at a
mass/charge (m/z) range of 350–2000 in profile mode. Select
the top 15 most intense ions per scan for higher-energy C-trap
dissociation (HCD)-based MS/MS analysis for peptide fragmentation
and for iTRAQ reporter ion quantification, recording MS/MS scans
in the Orbitrap mass analyzer (resolution 60,000 FWHM) at a
mass/charge (m/z) range of 100–2000 in profile mode.
3.12 Functional Data Analysis
1. To identify groups of similarly expressed proteins and phosphorylation
sites, perform unsupervised hierarchical clustering.
(a) Clustering of the mean normalized and log2 transformed
phosphotyrosine and protein expression quantitative
iTRAQ data (using one minus Pearson correlation as a
distance metric) can be performed using GENE-E (see
Note 10).
2. To visualize quantitative phosphotyrosine and protein expression
profiles across tumors, generate heat maps of mean normalized
and log2 transformed phosphotyrosine and protein
expression quantitative iTRAQ data (see Note 11).
(a) When using GENE-E, upload an Excel file with mean
normalized and log2 transformed phosphotyrosine and
protein expression quantitative iTRAQ data with the
quantitative information specified in a data matrix, where
phosphorylation sites are row metadata and the
corresponding iTRAQ labels are column metadata.
(b) Heat maps can be aesthetically modified under
“preferences.”
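For readers who prefer a scripted alternative to GENE-E, the clustering step can be sketched in Python. This is an illustration, not the authors' workflow: the intensity matrix is simulated, the helper name is hypothetical, and average linkage is an assumed choice because the text specifies only the distance metric. SciPy's "correlation" metric is defined as one minus the Pearson correlation, which matches the distance named above.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

def mean_normalize_log2(intensities):
    """Divide each row by its mean across channels, then log2 transform."""
    intensities = np.asarray(intensities, dtype=float)
    return np.log2(intensities / intensities.mean(axis=1, keepdims=True))

rng = np.random.default_rng(1)
# Hypothetical matrix: 6 phosphorylation sites (rows) x 8 iTRAQ channels (columns).
raw = rng.uniform(1e4, 1e6, size=(6, 8))
data = mean_normalize_log2(raw)

# Cluster rows (sites) and columns (tumors) with one minus Pearson correlation.
row_tree = linkage(data, method="average", metric="correlation")
col_tree = linkage(data.T, method="average", metric="correlation")

# Reorder rows and columns by dendrogram leaf order, as a heat map would show them.
ordered = data[leaves_list(row_tree)][:, leaves_list(col_tree)]
```

The reordered matrix can then be passed to any plotting library to draw the heat map.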
4 Notes
Acknowledgments
This work was supported in part by a generous gift from the James
S. McDonnell Foundation and by NIH grants P30 CA014051 and
R01 CA184320. The authors would like to thank Ms. Marcela
White at the brain tumor bank (www.Braintumourbank.com) for
access to patient materials.
References
1. Stupp R, Mason WP, van den Bent MJ, Weller M, Fisher B, Taphoorn MJ, Belanger K, Brandes AA, Marosi C, Bogdahn U, Curschmann J, Janzer RC, Ludwin SK, Gorlia T, Allgeier A, Lacombe D, Cairncross JG, Eisenhauer E, Mirimanoff RO (2005) Radiotherapy plus concomitant and adjuvant temozolomide for glioblastoma. N Engl J Med 352(10):987–996. https://doi.org/10.1056/NEJMoa043330
2. Stupp R, Hegi ME, Mason WP, van den Bent MJ, Taphoorn MJ, Janzer RC, Ludwin SK, Allgeier A, Fisher B, Belanger K, Hau P, Brandes AA, Gijtenbeek J, Marosi C, Vecht CJ, Mokhtari K, Wesseling P, Villa S, Eisenhauer E, Gorlia T, Weller M, Lacombe D, Cairncross JG, Mirimanoff RO (2009) Effects of radiotherapy with concomitant and adjuvant temozolomide versus radiotherapy alone on survival in glioblastoma in a randomised phase III study: 5-year analysis of the EORTC-NCIC trial. Lancet Oncol 10(5):459–466. https://doi.org/10.1016/S1470-2045(09)70025-7
3. Verhaak RG, Hoadley KA, Purdom E, Wang V, Qi Y, Wilkerson MD, Miller CR, Ding L, Golub T, Mesirov JP, Alexe G, Lawrence M, O’Kelly M, Tamayo P, Weir BA, Gabriel S, Winckler W, Gupta S, Jakkula L, Feiler HS, Hodgson JG, James CD, Sarkaria JN, Brennan C, Kahn A, Spellman PT, Wilson RK, Speed TP, Gray JW, Meyerson M, Getz G, Perou CM, Hayes DN (2006) Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell 17(1):98–110. https://doi.org/10.1016/j.ccr.2009.12.020
4. Brennan C, Momota H, Hambardzumyan D, Ozawa T, Tandon A, Pedraza A, Holland E (2009) Glioblastoma subclasses can be defined by activity among signal transduction pathways and associated genomic alterations. PLoS One 4(11):e7752. https://doi.org/10.1371/journal.pone.0007752
5. Phillips HS, Kharbanda S, Chen R, Forrest WF, Soriano RH, TD W, Misra A, Nigro JM, Colman H, Soroceanu L, Williams PM, Modrusan Z, Feuerstein BG, Aldape K (2006) Molecular subclasses of high-grade glioma predict prognosis, delineate a pattern of disease progression, and resemble stages in neurogenesis. Cancer Cell 9(3):157–173. https://doi.org/10.1016/j.ccr.2006.02.019
6. Network TCGAR (2008) Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455(7216):1061–1068
7. Krakstad C, Chekenya M (2010) Survival signalling and apoptosis resistance in glioblastomas: opportunities for targeted therapeutics. Mol Cancer 9:135. https://doi.org/10.1186/1476-4598-9-135
8. Johnson H, White FM (2014) Quantitative analysis of signaling networks across differentially embedded tumors highlights interpatient heterogeneity in human glioblastoma. J Proteome Res 13(11):4581–4593. https://doi.org/10.1021/pr500418w
9. Rikova K, Guo A, Zeng Q, Possemato A, Yu J, Haack H, Nardone J, Lee K, Reeves C, Li Y, Hu Y, Tan Z, Stokes M, Sullivan L, Mitchell J, Wetzel R, Macneill J, Ren JM, Yuan J, Bakalarski CE, Villen J, Kornhauser JM, Smith B, Li D, Zhou X, Gygi SP, TL G, Polakiewicz RD, Rush J, Comb MJ (2007) Global survey of phosphotyrosine signaling identifies oncogenic kinases in lung cancer. Cell 131(6):1190–1203. https://doi.org/10.1016/j.cell.2007.11.025
10. Drake JM, Graham NA, Lee JK, Stoyanova T, Faltermeier CM, Sud S, Titz B, Huang J, Pienta KJ, Graeber TG, Witte ON (2013) Metastatic castration-resistant prostate cancer reveals intrapatient similarity and interpatient heterogeneity of therapeutic kinase targets. Proc Natl Acad Sci U S A 110(49):E4762–E4769. https://doi.org/10.1073/pnas.1319948110
11. Steu S, Baucamp M, von Dach G, Bawohl M, Dettwiler S, Storz M, Moch H, Schraml P (2008) A procedure for tissue freezing and processing applicable to both intra-operative frozen section diagnosis and tissue banking in surgical pathology. Virchows Arch 452(3):305–312. https://doi.org/10.1007/s00428-008-0584-y
12. Loken SD, Demetrick DJ (2005) A novel method for freezing and storing research tissue bank specimens. Hum Pathol 36(9):977–980. https://doi.org/10.1016/j.humpath.2005.06.016
13. Gajadhar AS, Johnson H, Slebos RJ, Shaddox K, Wiles K, Washington MK, Herline AJ, Levine DA, Liebler DC, White FM (2015) Phosphotyrosine signaling analysis in human tumors is confounded by systemic ischemia-driven artifacts and intra-specimen heterogeneity. Cancer Res 75(7):1495–1503. https://doi.org/10.1158/0008-5472.CAN-14-2309
14. Snuderl M, Fazlollahi L, Le LP, Nitta M, Zhelyazkova BH, Davidson CJ, Akhavanfard S, Cahill DP, Aldape KD, Betensky RA, Louis DN, Iafrate AJ (2011) Mosaic amplification of multiple receptor tyrosine kinase genes in glioblastoma. Cancer Cell 20(6):810–817. https://doi.org/10.1016/j.ccr.2011.11.005
15. Szerlip NJ, Pedraza A, Chakravarty D, Azim M, McGuire J, Fang Y, Ozawa T, Holland EC, Huse JT, Jhanwar S, Leversha MA, Mikkelsen T, Brennan CW (2012) Intratumoral heterogeneity of receptor tyrosine kinases EGFR and PDGFRA amplification in glioblastoma defines subpopulations with distinct growth factor response. Proc Natl Acad Sci U S A 109(8):3041–3046. https://doi.org/10.1073/pnas.1114033109
16. Sottoriva A, Spiteri I, Piccirillo SG, Touloumis A, Collins VP, Marioni JC, Curtis C, Watts C, Tavare S (2013) Intratumor heterogeneity in human glioblastoma reflects cancer evolutionary dynamics. Proc Natl Acad Sci U S A 110(10):4009–4014. https://doi.org/10.1073/pnas.1219747110
17. Johnson H, Del Rosario AM, Bryson BD, Schroeder MA, Sarkaria JN, White FM (2012) Molecular characterization of EGFR and EGFRvIII signaling networks in human glioblastoma tumor xenografts. Mol Cell Proteomics 11(12):1724–1740. https://doi.org/10.1074/mcp.M112.019984
18. Zhang Y, Wolf-Yadlin A, Ross PL, Pappin DJ, Rush J, Lauffenburger DA, White FM (2005) Time-resolved mass spectrometry of tyrosine phosphorylation sites in the epidermal growth factor receptor signaling network reveals dynamic modules. Mol Cell Proteomics 4(9):1240–1250. https://doi.org/10.1074/mcp.M500089-MCP200
19. Curran TG, Bryson BD, Reigelhaupt M, Johnson H, White FM (2013) Computer aided manual validation of mass spectrometry-based proteomic data. Methods 61(3):219–226. https://doi.org/10.1016/j.ymeth.2013.03.004
Part III
Abstract
Metabolic profiles reflect biological conditions as a result of biochemical changes within a living system. It is
therefore possible to associate metabolic signatures with clinical endpoints of diseases, such as breast cancer.
Nuclear magnetic resonance (NMR) spectroscopy is one of the most common techniques used for
metabolic profiling, and produces high dimensional datasets from which meaningful biological information
can be extracted. Here, we present an overview of data analysis techniques used to achieve this, describing
key steps in the procedure. Moreover, examples of clinical endpoints of interest are provided. Although
these are specific for breast cancer, the procedures for the analysis of NMR spectra as described here are
applicable to any type of cancer and to other diseases.
Key words Breast cancer, Cross validation, Hierarchical clustering, Hypothesis testing, Metabolites,
Model diagnostic statistics, Multivariate analysis, NMR spectroscopy, Partial least squares, Principal
component analysis, Permutation testing
1 Introduction
Louise von Stechow (ed.), Cancer Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 1711,
https://doi.org/10.1007/978-1-4939-7493-1_9, © Springer Science+Business Media, LLC 2018
168 Leslie R. Euceda et al.
Fig. 1 Example 1H NMR spectrum of breast tumor tissue. Observable metabolite peaks include glucose (Glc),
ascorbate (Asc), lactate (Lac), myo-inositol (mI), creatine (Cr), glutamate (Glu), glycine (Gly), taurine (Tau),
glycerophosphocholine (GPC), phosphocholine (PCho), choline (Cho), glutathione (GSH), glutamine (Gln),
succinate (Succ), and alanine (Ala)
2 Materials
2.1 Data Input
Choosing the optimal approach for statistical analysis is dependent
on the type and structure of the data input as well as the hypothesis
of interest. The NMR spectral data should, prior to statistical
analysis, have been through proper preprocessing procedures. For
multivariate analysis, the preprocessed data forms the X-matrix,
where each row represents one sample and each column represents
one variable or point in an NMR spectrum. Alternatively, metabo-
lites can be quantified to make the data applicable for both multi-
variate and univariate analysis. In such cases, quantified metabolites
from the same sample can be combined so that each variable of the
X-matrix used for multivariate analysis represents one individual
metabolite.
The Y-matrix or vector, used in supervised analysis, contains
information about the clinical endpoint that should be predicted.
The clinical endpoint is defined as the relevant patient information
of interest to test for correlation with metabolic signature. Exam-
ples of clinical outcome variables of interest in breast cancer are
patient 5-year survival, tumor size, tumor cell percentage, lymph
node status, metastatic status, pathological response to treatment,
and hormone (estrogen or progesterone) receptor status. These can
either be categorical (e.g., lymph node status) or continuous vari-
ables (e.g., tumor cell percentage) (see Note 1).
NMR Metabolic Profiles of Breast Cancer 171
2.2 Software
Several software packages are available for univariate or multivariate analysis of
metabolomics data, differing in their flexibility and user-friendliness (see Table 1).
Software such as Matlab and R can be used for all types of data analysis, but requires
programming knowledge.
3 Methods
3.1 Multivariate Analysis
3.1.1 Unsupervised Methods
Unsupervised methods are exploratory and useful tools for getting to know your dataset
in terms of possible groupings, patterns, and outliers, without taking a response variable
into account. Examples of common methods are principal component analysis (PCA) and
hierarchical cluster analysis (HCA).
Principal Component Analysis (PCA)
PCA is a method that, through linear combinations of the original independent variables X,
constructs a new lower-dimensional coordinate system made up of latent variables (LVs),
which in PCA are called principal components (PCs) [11]. These variables explain
variance within the dataset with the aim of capturing the main trends in the data. The
position of each sample in the new coordinate system is reflected by the scores matrix (T),
while the influence of the original variables on the PCs is defined by the loadings matrix
(P) such that:
$$X = TP^{T} + E \tag{1}$$
where E is the residual matrix of variance not explained by the model, and the
superscript T indicates the transpose of a matrix. The results can
thus be visualized in scores and loadings plots (see Fig. 2).
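The decomposition in Eq. 1 can be checked numerically with a minimal sketch. It assumes the data have already been mean centered (done inside the function here) and uses a singular value decomposition, which is one common way to obtain PCA scores and loadings; the function name and the random data are illustrative and not taken from the chapter.

```python
import numpy as np

def pca(X, n_components):
    """Return scores T, loadings P and residuals E so that X_centered = T P^T + E."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]  # sample positions on the PCs
    P = Vt[:n_components].T                     # weight of each variable on the PCs
    E = Xc - T @ P.T                            # variance left unexplained
    return T, P, E

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 5))   # 10 hypothetical samples x 5 variables
T, P, E = pca(X, n_components=2)
Xc = X - X.mean(axis=0)
```

Plotting the columns of T against each other gives a scores plot, and plotting the columns of P gives the corresponding loadings plot.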
Table 1
Examples of available software and interfaces to perform multivariate and/or univariate
metabolomics analyses described here

Software/Interface   Reference/URL                                                                        Methods implemented
Amix                 https://www.bruker.com/products/mr/nmr/nmr-software/software/amix/overview.html      Multivariate
Knime (a)            https://www.knime.org/knime                                                          Univariate and multivariate
Matlab               http://www.mathworks.com/products/matlab/                                            Univariate and multivariate
MetaboAnalyst (a)    http://www.metaboanalyst.ca/faces/docs/Format.xhtml                                  Univariate and multivariate
PLS toolbox (b)      http://www.eigenvector.com/software/pls_toolbox.htm                                  Multivariate
R                    https://www.r-project.org/                                                           Univariate and multivariate
SIMCA                http://umetrics.com/products/simca                                                   Multivariate
SIRIUS               http://www.prs.no/Sirius/Sirius.html                                                 Multivariate
SPSS                 http://www-01.ibm.com/software/analytics/spss/products/statistics/                   Univariate
STATA                http://www.stata.com/                                                                Univariate and multivariate
The Unscrambler      http://www.camo.com/rt/Products/Unscrambler/unscrambler.html                         Univariate and multivariate

(a) Uses R packages
(b) Requires Matlab

Protocol for PCA
1. Additional preprocessing of variables. Although the spectral data was preprocessed prior to
data analysis, PCA is sensitive to the scaling of the variables.
Spectral data: Mean center the data by subtracting the variable mean from each variable
value to make the mean zero. Mean centering of spectral data removes the offset from each
variable so that PC1 will not capture the mean of the data but the direction of maximum
variance.
Quantified metabolites: Autoscale the metabolite concentrations by normalizing each value
to the variable standard deviation after mean centering. Autoscaling allows each variable to
have the same influence on the model, and the resulting variables have mean zero and
standard deviation of one (see Note 2).
2. Perform PCA using the software of choice (see Table 1).
3. Select number of components to include in the model. There are
two alternative approaches:
Cumulative variance plot: Evaluate the cumulative variance
explained by the model with increasing number of PCs. Choose
the number of PCs that explain a certain predetermined amount
of variance (see Note 3).
Scree plot: Plot the variance explained by each PC (see Note 4).
The variance will decrease for each PC. Choose the PC representing
the “knee” in the curve (see Fig. 3).

Fig. 2 Result from principal component analysis of breast cancer tissue from two different groups. The two
classes are perfectly separated in the second principal component (PC2). Samples from class 2 have high PC2
scores compared to class 1, thus they have higher levels of the metabolite phosphocholine (PCho) and lower
levels of glycerophosphocholine (GPC) compared to the class 1 samples. The first principal component (PC1)
shows that the largest variation in the dataset is due to differences in lipid concentrations between the
samples, as the samples to the right in the scores plot, having high scores on PC1, have high lipid levels
compared to the remaining samples

Fig. 3 Scree plot example. Two “knees,” marked by red arrows, are observed, suggesting two or four principal
components (PCs) to be the optimal number. In this case, the cumulative variance plot can aid in the
determination of the best “knee,” selecting the one that represents the number of PCs that explains a
certain predetermined amount of variance
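The PCA protocol can be sketched in a few lines, here for quantified metabolites. This is an illustration under stated assumptions: the data are simulated, the function names are hypothetical, and the 80% cumulative variance threshold is an arbitrary placeholder, since the appropriate amount of variance depends on the dataset.

```python
import numpy as np

def autoscale(X):
    """Mean center each column and divide by its sample standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def explained_variance_ratio(X_scaled):
    """Fraction of total variance captured by each PC (the scree plot values)."""
    s = np.linalg.svd(X_scaled, compute_uv=False)
    return s**2 / np.sum(s**2)

def n_components_for(ratio, threshold=0.8):
    """Smallest number of PCs whose cumulative variance reaches the threshold."""
    cumulative = np.cumsum(ratio)
    return min(int(np.searchsorted(cumulative, threshold)) + 1, len(ratio))

rng = np.random.default_rng(3)
# 20 hypothetical samples x 6 metabolites measured on very different scales.
X = rng.normal(size=(20, 6)) * np.array([5.0, 3.0, 1.0, 1.0, 0.5, 0.5])
Xs = autoscale(X)
ratio = explained_variance_ratio(Xs)   # plot against PC number for a scree plot
k = n_components_for(ratio, threshold=0.8)
```

For spectral data, the same functions apply if `autoscale` is replaced by mean centering alone.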
Fig. 4 Dendrogram example. Samples, whose ID numbers are specified on the x-axis, are divided into six
clusters, shown in different colors, by manually setting a cutoff at height 150
Hierarchical Cluster Analysis (HCA)
HCA aims to find natural clusters among samples using a hierarchical approach where
samples are grouped according to calculated similarities and dissimilarities. The result
is visualized as a dendrogram (see Fig. 4). At the bottom of the dendrogram, all objects
represent individual clusters. For each level, the two closest objects
are joined into one cluster. This continues until all clusters are
joined by one branch. There are different measures for determining
the distance between individual samples or between clusters of
samples. Common measures for individual samples include Euclid-
ean distance, Manhattan distance, and sample correlations. Com-
mon measures for distance between clusters include single linkage,
average linkage, complete linkage, and Ward’s method. The
Protocol for HCA
1. Calculate the distance between all possible pairs of clusters using
the chosen distance measure for individual samples.
2. Merge the two clusters with the smallest distance.
3. Calculate the new clusters’ distance to other clusters using a
chosen distance measure for clusters.
4. Repeat steps 2–3 until all samples are merged into one cluster.
Alternatively, a top down approach can be used where all
objects are considered one cluster initially and subsequently divided
into smaller clusters depending on their dissimilarities.
To decide which metrics are optimal for distance measurements
and to assess how well the dendrogram reflects your data,
the cophenetic correlation coefficient [12] can be used. This coefficient
is the correlation between the original pairwise distances
between objects and the level/height at which the two objects
were joined in one cluster.
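As a hedged illustration of this comparison, the sketch below computes the cophenetic correlation coefficient for several combinations of distance metric and linkage using SciPy. The data are simulated and the set of combinations is an arbitrary selection; Ward's method is left out here because SciPy's implementation assumes Euclidean distances.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

rng = np.random.default_rng(4)
# Hypothetical data: 12 samples x 5 metabolites, drawn as two loose groups.
X = np.vstack([rng.normal(0.0, 1.0, size=(6, 5)),
               rng.normal(4.0, 1.0, size=(6, 5))])

results = {}
for metric in ("euclidean", "cityblock"):        # cityblock = Manhattan distance
    distances = pdist(X, metric=metric)          # original pairwise distances
    for method in ("single", "average", "complete"):
        tree = linkage(distances, method=method)
        coefficient, _ = cophenet(tree, distances)
        results[(metric, method)] = coefficient

# The combination whose dendrogram best preserves the original distances.
best = max(results, key=results.get)
```

A coefficient close to 1 suggests that the dendrogram heights reproduce the original pairwise distances well.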
The resulting dendrogram can be used to divide the samples
into clusters. The number of resulting clusters can be defined
beforehand or a cutoff can be set at a decided level of the dendrogram,
either manually or using for instance the gap statistic [13]. All
the samples joined by branches below the cutoff are considered one
cluster. The resulting clusters can be evaluated in terms of clinical
endpoints of interest. Prediction of cluster labels for new samples
can be achieved based on the shortest distance to each of the cluster
centroids or using validated supervised models (e.g., PLS-DA, see
Subheading 3.1.2).
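Cutting the dendrogram and assigning a new sample by its nearest cluster centroid can be sketched as follows. The example assumes Ward's method on Euclidean distances and simulated data; the cutoff height is arbitrary, the gap statistic is not implemented, and the helper function is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(5)
# Hypothetical data: two well separated groups of 8 samples x 4 variables.
X = np.vstack([rng.normal(0.0, 0.5, size=(8, 4)),
               rng.normal(5.0, 0.5, size=(8, 4))])

tree = linkage(X, method="ward")

# Option 1: define the number of clusters beforehand.
labels = fcluster(tree, t=2, criterion="maxclust")
# Option 2: cut at a chosen dendrogram height (the value here is arbitrary).
labels_by_height = fcluster(tree, t=5.0, criterion="distance")

def nearest_centroid(X, labels, new_sample):
    """Assign a new sample to the cluster with the closest centroid."""
    cluster_ids = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in cluster_ids])
    distances = np.linalg.norm(centroids - new_sample, axis=1)
    return cluster_ids[np.argmin(distances)]

predicted = nearest_centroid(X, labels, np.full(4, 5.0))
```

The resulting labels can then be cross-tabulated against a clinical endpoint to evaluate the clusters.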
Partial Least Squares (PLS) Methods
Partial least squares (PLS) is commonly used both for regression and for classification
problems. PLS defines underlying structures that maximize the covariance between the
independent variables and the response variable [15]. Instead of only modeling the
independent variables X, as is the case for PCA, the dependent response
variables Y are also modeled:
$$X = TP^{T} + E \tag{2}$$
$$Y = UQ^{T} + F \tag{3}$$
where T and U are the score matrices, P and Q are the loading
matrices, and E and F are the residuals for X and Y, respectively. The superscript T
indicates the transpose of a matrix. The X-scores, T, are predictors
of Y and will also model X, thus both X and Y are assumed to be
modeled by the same latent variables. Hence, Y can be written as
$$Y = TGQ^{T} + F \tag{4}$$
where G is the diagonal matrix resulting from U = TG.
There are several algorithms that can be used to estimate these
parameters, all of which provide more or less similar results [16].
The covariance between X and Y is optimized by defining PLS
LVs, which are linear combinations of the original X variables, and
the dimensionality of the resulting PLS model is equal to the
number of LVs used in the model. The optimal number of LVs to
use is chosen based on different model diagnostic terms used to
evaluate the overall quality of the model for different numbers of
LVs. For PLSR, the Q2 statistic (Eq. 5) and the root mean square
error (RMSE) (Eq. 6) are typically used. These statistics reflect the
differences between the predicted value ($\hat{y}$) and the known y:
$$Q^{2} = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^{2}}{\sum_{i} (y_i - \bar{y})^{2}} \tag{5}$$
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i} (y_i - \hat{y}_i)^{2}}{m}} \tag{6}$$
where i = 1, 2, ..., m for m included samples and $\bar{y}$ is the mean of all y
values. If RMSE is to be compared between datasets, the value
should be normalized (NRMSE) to make it independent of scale
(Eq. 7):
$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{y_{\max} - y_{\min}} \tag{7}$$
A Q2 of 1 corresponds to perfect prediction, while a very low or
negative Q2 indicates a poor prediction. An NRMSE value of 0
corresponds to perfect prediction, while a value close to 1 indicates
poor prediction.
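The three regression diagnostics translate directly into code. The sketch below follows the definitions of Q2, RMSE, and NRMSE given above; the function names and the toy response values are illustrative, and in practice the predictions would come from cross-validation.

```python
import numpy as np

def q2(y, y_pred):
    """Q2 = 1 - sum((y - y_pred)^2) / sum((y - mean(y))^2)."""
    y, y_pred = np.asarray(y, dtype=float), np.asarray(y_pred, dtype=float)
    return 1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)

def rmse(y, y_pred):
    """Root mean square error over the m included samples."""
    y, y_pred = np.asarray(y, dtype=float), np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y - y_pred) ** 2))

def nrmse(y, y_pred):
    """RMSE divided by the range of the known y values."""
    y = np.asarray(y, dtype=float)
    return rmse(y, y_pred) / (y.max() - y.min())

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
perfect = y.copy()                       # a model that predicts every value exactly
mean_only = np.full_like(y, y.mean())    # a model that always predicts the mean
```

With these toy values, the perfect predictor gives Q2 = 1 and RMSE = 0, whereas always predicting the mean gives Q2 = 0.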
For PLS-DA, commonly used diagnostic statistics include the
classification error, sensitivity, and specificity. The number of cor-
rectly classified samples, i.e., true positives (TP) and true negatives
(TN), and the number of incorrectly classified samples, i.e., false
negatives (FN) and false positives (FP), is subsequently recorded.
The prediction error (see Note 5) relates the number of incorrectly
classified samples with the total number of samples (Eq. 8). The
model accuracy equals one minus the classification error.
$$\mathrm{Error} = \frac{\text{Number of incorrectly classified samples}}{\text{Total number of samples}} = \frac{FN + FP}{TP + TN + FN + FP} \tag{8}$$
Sensitivity measures the ability to correctly predict the case
class, or true positive samples (Eq. 9). A highly sensitive model is
one that generates few false negatives.
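The diagnostics above can be sketched from the four confusion counts. The error follows Eq. 8; the sensitivity and specificity lines use the standard definitions TP/(TP + FN) and TN/(TN + FP), which are stated here as an assumption because the corresponding equations are not reproduced in this section. The function name and the example counts are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Classification error (Eq. 8), accuracy, sensitivity, and specificity."""
    total = tp + tn + fp + fn
    error = (fn + fp) / total
    return {
        "error": error,
        "accuracy": 1.0 - error,        # one minus the classification error
        "sensitivity": tp / (tp + fn),  # assumed standard definition
        "specificity": tn / (tn + fp),  # assumed standard definition
    }

# Hypothetical unbalanced dataset: 10 cases, 90 controls,
# and a model that labels every sample as control.
metrics = classification_metrics(tp=0, tn=90, fp=0, fn=10)
print(metrics)
```

The example shows why the error alone can mislead: the error is low although no case sample is detected.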
Partial Least Squares Regression (PLSR)
PLSR is used for regression problems where the intention is to
model correlations and/or make predictions on a continuous
response variable.
Protocol for PLSR
1. Perform additional scaling of variables as described in the
protocol for PCA, step 1.
2. Perform PLSR using the spectral data or quantified metabolites
as independent variables X and the sample characteristic to be
modeled as a continuous response variable Y. Depending on the
number of samples in your dataset, choose the type of cross-
validation that suits your dataset (see Subheading 3.1.4). Make
PLSR models for a restricted number of LVs (see Note 6).
3. Examine the cross-validated RMSE or Q2 for the different num-
bers of LVs.
4. Choose the number of LVs giving the first minimum in cross-
validated RMSE or the first maximum in Q2 (see Note 7).
5. If you have an independent validation set, make predictions of
the independent test set using the model obtained in step 4.
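Step 4 can be sketched as a small helper that scans the cross-validated diagnostic over increasing numbers of LVs and stops at the first local minimum. The helper name and the example values are hypothetical; the cross-validated RMSE values are assumed to have been produced by whichever PLS implementation is in use.

```python
def first_minimum(cv_values):
    """Return the number of LVs (1-based) at the first local minimum.

    cv_values[i] is the cross-validated RMSE (or classification error)
    of the model built with i + 1 latent variables.
    """
    for i in range(len(cv_values) - 1):
        if cv_values[i + 1] >= cv_values[i]:
            return i + 1
    return len(cv_values)  # still decreasing at the largest model tested

# Hypothetical cross-validated RMSE for models with 1 to 5 LVs
cv_rmse = [0.92, 0.71, 0.64, 0.66, 0.61]
n_lvs = first_minimum(cv_rmse)
print(n_lvs)

# For Q2 the first maximum is wanted, so the values are negated
cv_q2 = [0.20, 0.50, 0.45]
print(first_minimum([-q for q in cv_q2]))
```

Stopping at the first minimum rather than the global one favors the simpler model; here the global minimum at 5 LVs is deliberately ignored.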
Partial Least Squares Discriminant Analysis (PLS-DA)
PLS-DA is used for classification problems where the intention is to
model correlations and/or make predictions between two or more
groups of samples.
Protocol for PLS-DA
1. Perform additional scaling of variables as described in the
Protocol for PCA, step 1.
2. Perform PLS-DA using the spectral data or quantified metabo-
lites as independent variables X and the sample characteristic to
be modeled as a categorical response variable Y, with discrete
numbers representing each class (e.g., 1 for treatment group and
2 for control group). Depending on the number of samples in
your dataset, choose the type of cross-validation that suits your
dataset (see Subheading 3.1.4). Make PLS-DA models for a
restricted number of LVs (see Note 6).
3. Examine the cross-validated classification error for the different
numbers of LVs.
4. Choose the number of LVs giving the first minimum in cross-
validated classification error (see Note 7).
5. If you have an independent validation set, make predictions of
the independent test set using the model obtained in step 4.
Orthogonal PLS (OPLS)
In orthogonal PLSR (OPLSR) and OPLS-DA, the response-orthogonal
variations in X are separated out before the model is
built. Orthogonalizing the model gives identical model perfor-
mance to that of original PLS as the OPLS components are
Multilevel PLS-DA
For longitudinal or cross-over studies, where each individual serves
as its own control, multilevel PLS-DA can be used to separate the
within-patient variation from the between-patient variation
[17]. By focusing on the within-patient variation, representing
metabolic changes due to the intervention (e.g., samples before
and after treatment), metabolic changes that would otherwise be
masked by the often much larger between-patient variation can be
revealed. In the example of samples of the same individuals before
and after intervention, the within-patient variation would be sepa-
rated according to:
$$\mathrm{Control} = A - B \qquad (11)$$
$$\mathrm{Intervention} = B - A \qquad (12)$$
where A is the metabolic data before intervention and B is the
metabolic data after intervention. These new matrices from
Eqs. 11 and 12 are concatenated and used as independent variables
in PLS-DA with a categorical variable representing control and
intervention as the Y vector.
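The construction of the multilevel input can be sketched as follows, assuming that A and B are matrices of identical shape with one row per individual (same row order) and one column per variable. The function name and the class codes (2 for control, 1 for intervention) are illustrative assumptions.

```python
import numpy as np

def multilevel_matrices(A, B):
    """Build the within-individual variation matrix and class vector.

    A: data before intervention, B: data after intervention,
    rows paired by individual.
    """
    A, B = np.asarray(A, float), np.asarray(B, float)
    if A.shape != B.shape:
        raise ValueError("A and B must have the same shape (paired samples)")
    control = A - B            # Eq. 11
    intervention = B - A       # Eq. 12
    X = np.vstack([control, intervention])
    y = np.array([2] * len(control) + [1] * len(intervention))
    return X, y

# Hypothetical data: 2 individuals, 3 metabolites
A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
B = [[2.0, 2.0, 2.0], [1.0, 1.0, 1.0]]
X, y = multilevel_matrices(A, B)
print(X)
print(y)
```

Because the two blocks are mirror images of each other, the between-individual offsets cancel and only the paired differences remain as input to PLS-DA.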
Other Multivariate Analysis Methods
In addition to PLS-based methods, several other multivariate analysis
methods are suitable for the analysis of breast cancer metabolomics
data. Neural networks (NNs) can model complex, nonlinear
relationships between the input variables and the problem to be
solved, and can be used for both regression and classification pro-
blems [18]. NNs consist of three or more layers: an input layer, one
or several hidden layers, and an output layer. The nodes of each
layer are connected through weights, and the weights and hidden
layer(s) will be adapted to the input data through learning. Another
suitable method is support vector machines (SVMs) [19]. SVMs do
not learn like NNs, but instead aim to find boundaries that separate
different groups. The boundary determined by SVMs will be a line
in 2D, a plane in 3D, or a hyperplane in n dimensions. By choosing
different kernel functions, SVMs can be applied to nonlinear pro-
blems by transformation of the input space into a higher dimension
where the classes are linearly separable. Although these methods
can be powerful for making predictions, a main drawback of NNs
and SVMs is the difficulty in interpreting the resulting models.
3.1.3 Variable Selection
Metabolic datasets are made up of several variables, or columns,
each one representing a point in a spectrum or an individual metab-
olite. Variables that are biologically irrelevant add noise to the
model and can impair model performance. A variable selection
procedure is therefore often performed when analyzing
180 Leslie R. Euceda et al.
Protocol for Variable Selection
1. Define the variable selection method to use. For an overview of
available methods, see Ref. 23.
2. Perform variable selection using the algorithm or script defined.
3. Build models using only the selected variables.
It is worth mentioning that if many variables provide the same
information, only one (or very few) of these will be selected to
minimize redundancy. Highly correlated metabolites involved in
the same pathway could thus be discarded while still being biologi-
cally important.
Fig. 5 Illustration of data splitting for a four-fold double cross validation procedure through which the number
of latent variables (LVs) is optimized in the inner loop and model quality is assessed in the outer loop. Samples
are divided into four different outer loop groups or folds (k = 1–4). At each outer loop repetition, three folds
comprise the data input for the inner loop, while one is left out as a validation set. The inner loop is then
partitioned into four inner loop folds (k2 = 1–4), which at each inner loop repetition alternate the role of test
set while the remaining folds comprise the training set. The samples comprising the validation set in the outer
loop are therefore unseen to the latent variable optimization procedure, reducing the risk of overoptimistic
results when using them to assess the model built with the inner loop data. A classical, single-layered CV
procedure consists only of the outer loop, with the inner loop data being the training set
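The splitting scheme of Fig. 5 can be sketched as an index generator. This is a minimal illustration that assumes random assignment of samples to folds; the function name, the fixed seed, and the sample count are hypothetical, and stratification or grouping of replicate samples is not handled.

```python
import numpy as np

def double_cv_splits(n_samples, k_outer=4, k_inner=4, seed=0):
    """Yield (validation, inner_splits) index sets for double cross validation.

    inner_splits is a list of (train, test) index pairs drawn only from
    the samples that are not in the outer-loop validation fold.
    """
    rng = np.random.default_rng(seed)
    outer_folds = np.array_split(rng.permutation(n_samples), k_outer)
    for k in range(k_outer):
        validation = outer_folds[k]
        inner_data = np.concatenate(
            [fold for j, fold in enumerate(outer_folds) if j != k])
        inner_folds = np.array_split(inner_data, k_inner)
        inner_splits = []
        for k2 in range(k_inner):
            test = inner_folds[k2]
            train = np.concatenate(
                [fold for j, fold in enumerate(inner_folds) if j != k2])
            inner_splits.append((train, test))
        yield validation, inner_splits

for validation, inner_splits in double_cv_splits(16):
    print(sorted(validation.tolist()), len(inner_splits))
```

The number of LVs would be chosen using only the inner splits, and the resulting model evaluated once on the corresponding validation indices.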
Permutation Testing
To ensure that obtained model diagnostic statistics are significantly
better than those that would be obtained by chance, permutation
testing can be performed. By rearranging the y response variable in
a random order, the y continuous values or classes are no longer
associated with their true corresponding metabolic information
(X); thus, any relationships between X and y are lost [24]. The
procedure can be performed to evaluate a double CV procedure as
follows.
Protocol for Permutation Testing
1. Permute or rearrange the values in the original y variable in a
random order to obtain $y_{\text{permuted}}$. Replace the original y variable
with $y_{\text{permuted}}$.
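The permutation idea can be sketched generically as below. The scoring function stands in for a complete modeling procedure (for example, a double CV run returning a diagnostic statistic) and is assumed to return a value for which higher is better; the function names, the toy scoring function, and the p-value convention (count of permuted scores at least as good as the true score, plus one, divided by the number of permutations plus one) are assumptions made for illustration.

```python
import numpy as np

def permutation_p_value(score_fn, X, y, n_permutations=1000, seed=0):
    """Compare the true score with scores obtained after permuting y."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    true_score = score_fn(X, y)
    perm_scores = np.array(
        [score_fn(X, rng.permutation(y)) for _ in range(n_permutations)])
    # Fraction of permutations scoring at least as well as the true labels
    p_value = (np.sum(perm_scores >= true_score) + 1) / (n_permutations + 1)
    return true_score, perm_scores, p_value

def abs_correlation(X, y):
    """Toy score (assumption): |Pearson r| between the first variable and y."""
    return abs(np.corrcoef(X[:, 0], y)[0, 1])

X_demo = np.arange(20, dtype=float).reshape(-1, 1)
y_demo = 2.0 * X_demo[:, 0] + 1.0
score, perm, p = permutation_p_value(abs_correlation, X_demo, y_demo, 200)
print(score, p)
```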
3.2 Univariate Analysis
Univariate analysis can be performed to search for statistically
significant differences in individual metabolites between groups.
3.2.1 Selection Criteria for Univariate Tests
Prior to univariate analysis, it should be decided whether the data are
suited to parametric or nonparametric tests. This is decided based
on at least three check points: normality (see Note 12), homogene-
ity of variances, i.e., homoscedasticity (in case of heteroscedasticity,
see Note 13), and independency of samples (for dependent sam-
ples, e.g., repeated measurements, samples from the same hospital,
etc., linear mixed-effects models can be used (see Subheading
3.2.2)). Figure 6 shows a simplified overview of tests to select
according to data distribution and number of groups to test. For
more extensive details regarding selection criteria for univariate
tests, refer to [25].
3.2.2 Linear Mixed-Effects Model
Linear mixed-effects models (LMM) are an extension of general
linear models taking into account both fixed and random effects,
where fixed effects often are those of primary interest, e.g., effect of
treatment type, while random effects are results of random selec-
tion, e.g., age, hospital, or individual. The modeling of random
effects enables inclusion of repeated measurements. An additional
advantage is that LMM can handle missing values, thus improving
the power in multilevel analysis where some observations are
missing.
In longitudinal studies, where samples have been collected
from individuals over time, LMM can be used to evaluate which
metabolites are significantly different with respect to one or more
outcomes of interest. In such cases, metabolite levels are set as
individual response variables, clinical outcome as a fixed effect,
and patient number as a random effect.
To perform LMM, first define the fixed and random effects.
Categorical fixed effects are set as factors. To decide whether or not
to model interactions between the fixed effects, a likelihood ratio
test comparing the reduced model (without interactions) to the full
Fig. 6 Examples of univariate tests that can be used for evaluating group differences in the level of quantified
metabolites
3.2.3 Multiple Testing Correction
When performing tests to associate a p-value to each individual
metabolite separately, such as those described in Subheadings
3.2.1 and 3.2.2, the same test is repeated for all metabolites. The
likelihood of significant p-values being achieved by chance will
increase with the number of tests performed. Hence, the number
of false positives (i.e., type I errors) should be controlled for. Here
lies the purpose of multiple testing corrections, which can be
achieved via different approaches. A widely used approach is the
Bonferroni adjustment [26].
Protocol for Bonferroni Adjustment
1. Generate p-values for all n metabolites using a suitable
statistical test.
2. Multiply each p-value by n.
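The two steps amount to a one-line computation. In the sketch below, the adjusted values are additionally capped at 1 so that they remain valid probabilities; this cap is a common convention added here as an assumption, and the function name and example p-values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests n, capped at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]

# Hypothetical p-values for three metabolites
adjusted = bonferroni([0.01, 0.04, 0.5])
print(adjusted)
```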
The Bonferroni method controls for the family-wise error rate
(FWER), which is the probability of producing at least one false
positive. Although simple, the Bonferroni adjustment is generally
unnecessarily strict for the purposes of metabolic analyses. Alterna-
tively, one can implement less stringent correction methods that
control for the false discovery rate (FDR), which is the expected
proportion of false positives to be generated. One such method is
Protocol for Benjamini-Hochberg Adjustment
1. Input p-values and record their order, referred to as the “original
order.”
2. Rank and sort the inputted p-values in an ascending order, such
that the rank i of the smallest p-value is 1, the second smallest has
an i = 2, etc.
3. Calculate an intermediate q-value ($q_{int}$) for each sorted p-value
($p_{val}$): $q_{int} = (p_{val}/i)\,n$, where n is the number of inputted
p-values.
4. Sort the qint in an ascending order, recording their
corresponding p-value ranks.
5. The sorted qint values will now be adjusted according to their p-
value rank. The first qint value (qint0) remains the same. If the
rank of the second sorted qint value (qint1) is lower than that of
the previous value (qint0), overwrite qint1 with qint0. Next, look to
the rank of qint2. If its rank is lower than qint1 then replace qint2
with the new value of qint1, if the rank is higher, then qint2
remains unchanged. Next, compare the rank of qint2 with qint3
and repeat the previous steps until all qint values have been
adjusted. The result is a list of the final q-values (see Note 14).
6. Reorder the final q-values so that they correspond to the original
order of the inputted p-values recorded in step 1.
The adjusted p-value (q-value) represents the smallest FDR at
which the corresponding test will be significant. So for a q-value of
0.02, the test would be considered significant (null hypothesis
rejected) when allowing a maximum of 2% of all significant tests
to be false positives (i.e., FDR threshold is 2%). As for all statistical
tests, the desired FDR threshold value to base significance on
should be defined prior to testing.
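A compact sketch of the Benjamini-Hochberg adjustment is given below. It follows steps 1–3 and 6 of the protocol, but enforces monotonicity with a running minimum taken from the largest p-value downward, which is the usual formulation of the procedure and may differ in bookkeeping from steps 4 and 5 as worded above. The cap at 1, the function name, and the example p-values are assumptions.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values (q-values) in the original order."""
    p = np.asarray(p_values, float)
    n = p.size
    order = np.argsort(p)                  # ascending sort; position + 1 = rank i
    ranks = np.arange(1, n + 1)
    q_int = p[order] * n / ranks           # intermediate q-values (step 3)
    # A q-value may not exceed that of any larger p-value:
    # running minimum from the highest rank down to the lowest.
    q_sorted = np.minimum.accumulate(q_int[::-1])[::-1]
    q_sorted = np.minimum(q_sorted, 1.0)   # keep values in [0, 1] (assumption)
    q = np.empty(n)
    q[order] = q_sorted                    # restore the original order (step 6)
    return q

# Hypothetical p-values; the third pulls down the q-values of the first two
print(benjamini_hochberg([0.01, 0.02, 0.021, 0.5]))
```

Tests with q-values at or below the FDR threshold chosen in advance would then be called significant.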
3.3 Multivariate Versus Univariate Analysis
An overview of the key steps to analyze metabolic profiles in breast
cancer using both multivariate and univariate methods has been
provided. To conclude, a comparison of these methods regarding
their advantages and disadvantages is presented in Table 2.
4 Notes
Table 2
Advantages and disadvantages of univariate and multivariate methods

Univariate methods
Advantages
• Widely used/known in all scientific fields.
• Usually simple and straightforward to perform and interpret.
• Useful for targeted approaches when one or a few metabolites have been defined to be tested.
• Variables (i.e., metabolites) do not affect the outcome of each other’s tests (with the exception of multiple testing procedures).
Disadvantages
• Accurate measure of absolute or relative concentrations is essential.
• Untargeted approaches present challenges, particularly the risk of false discoveries increasing with increasing number of univariate paralleled tests performed. Although this can be addressed by applying multiple testing corrections, these in turn may be too strict, thereby risking to miss a true discovery.
• Does not account for variable correlation.

Multivariate methods
Advantages
• Useful for exploratory purposes, such as outlier detection.
• Applicable for untargeted approaches as they can handle large numbers of variables.
• Takes proper account of the correlation between spectral points/metabolites.
• No need to correct for multiple testing, as all variables are analyzed simultaneously.
• Evidence of individual metabolites can accumulate to reveal findings that would not be detected separately with univariate methods.
• Quantification not necessary.
Disadvantages
• Not widely known in clinical fields.
• Computationally intensive, time-consuming algorithms.
• Interpretation might not be [...] noise can obscure information from important variables that would be detected using univariate tests.
• When using the metabolic profile as input, scaling will increase the influence of the noise and might not be optimal. Thus, differences in metabolites of lower abundance may be obscured by those of higher abundance.
tissue; early recurrence [31, 32] and weight change [33] study-
ing serum; risk of disease development [34] studying plasma.
2. Autoscaling should not be performed on spectral data, as this
will scale up the noise regions between metabolite signals.
3. Choosing a number of PCs explaining 80–90% of the data
variation will usually be sufficient to get a good overview of
the data.
4. Alternatively, scree plots can be plotted as the eigenvalue versus
the corresponding principal component (PC). An eigenvalue
describes the amount of variance accounted for by its associated
PC [35].
5. For unbalanced datasets, i.e., those with very few samples of
one class and many of the other, the prediction error may be
misleading. For example, if 90% of samples in a dataset are of
class A, a model that predicts every sample as class A will achieve
References
1. Clarke CJ, Haselden JN (2008) Metabolic profiling as a tool for understanding mechanisms of toxicity. Toxicol Pathol 36(1):140–147. https://doi.org/10.1177/0192623307310947
2. Bujak R, Struck-Lewicka W, Markuszewski MJ, Kaliszan R (2015) Metabolomics for laboratory diagnostics. J Pharm Biomed Anal 113:108–120. https://doi.org/10.1016/j.jpba.2014.12.017
3. Keeler J (2010) Understanding NMR spectroscopy, 2nd edn. Wiley, Chichester, UK
4. Hu K, Westler WM, Markley JL (2011) Simultaneous quantification and identification of individual chemicals in metabolite mixtures by two-dimensional extrapolated time-zero (1)H(13)C HSQC (HSQC(0)). J Am Chem Soc 133(6):1662–1665. https://doi.org/10.1021/ja1095304
5. Nicholson JK (1989) High resolution nuclear magnetic resonance spectroscopy in clinical chemistry and disease diagnosis. In: den Boer NC, van der Heiden C, Leijnse B, Souverijn JHM (eds) Clinical chemistry, an overview. Plenum Press, New York, NY
6. Le Gall G (2015) NMR spectroscopy of biofluids and extracts. In: Bjerrum TJ (ed) Metabonomics: methods and protocols. Springer, New York, NY, pp 29–36
7. Giskeødegård GF, Cao MD, Bathen TF (2015) High-resolution magic-angle-spinning NMR spectroscopy of intact tissue. In: Bjerrum TJ (ed) Metabonomics: methods and protocols. Springer, New York, NY, pp 37–50
8. Bathen TF, Geurts B, Sitter B, Fjøsne HE, Lundgren S, Buydens LM et al (2013) Feasibility of MR metabolomics for immediate analysis of resection margins during breast cancer surgery. PLoS One 8(4):e61578. https://doi.org/10.1371/journal.pone.0061578
9. Vettukattil R (2015) Preprocessing of raw metabonomic data. In: Bjerrum TJ (ed) Metabonomics: methods and protocols. Springer, New York, NY, pp 123–136
10. Euceda LR, Giskeødegård GF, Bathen TF (2015) Preprocessing of NMR metabolomics data. Scand J Clin Lab Invest 75(3):193–203. https://doi.org/10.3109/00365513.2014.1003593
11. Wold S, Esbensen K, Geladi P (1987) Principal component analysis. Chemometr Intell Lab Syst 2(1):37–52. https://doi.org/10.1016/0169-7439(87)80084-9
12. The Mathworks Inc. Cophenetic correlation coefficient. http://www.mathworks.com/help/stats/cophenet.html#zmw57dd0e176726. Accessed 13 Apr 2016
13. Tibshirani R, Walther G, Hastie T (2001) Estimating the number of clusters in a data set via the gap statistic. J R Stat Soc Series B Stat Methodol 63(2):411–423. https://doi.org/10.1111/1467-9868.00293
14. Hawkins DM (2004) The problem of overfitting. J Chem Inf Comput Sci 44(1):1–12. https://doi.org/10.1021/ci0342472
15. Wold S, Sjöström M, Eriksson L (2001) PLS-regression: a basic tool of chemometrics. Chemometr Intell Lab Syst 58(2):109–130. https://doi.org/10.1016/S0169-7439(01)00155-1
16. Andersson M (2009) A comparison of nine PLS1 algorithms. J Chemometr 23(10):518–529. https://doi.org/10.1002/cem.1248
17. van Velzen EJJ, Westerhuis JA, van Duynhoven JPM, van Dorsten FA, Hoefsloot HCJ, Jacobs DM et al (2008) Multilevel data analysis of a crossover designed human nutritional intervention study. J Proteome Res 7(10):4483–4491. https://doi.org/10.1021/pr800145j
18. Brougham DF, Ivanova G, Gottschalk M, Collins DM, Eustace AJ et al (2011) Artificial neural networks for classification in metabolomic studies of whole cells using 1H nuclear magnetic resonance. J Biomed Biotechnol 2011:8. https://doi.org/10.1155/2011/158094
19. Gromski PS, Muhamadali H, Ellis DI, Xu Y, Correa E, Turner ML et al (2015) A tutorial review: metabolomics and partial least squares-discriminant analysis – a marriage of convenience or a shotgun wedding. Anal Chim Acta 879:10–23. https://doi.org/10.1016/j.aca.2015.02.012
20. Wold S, Johansson E, Cocchi M (1993) PLS: partial least-squares projections to latent structures. In: Kubinyi H (ed) 3D QSAR in Drug Design. ESCOM, Leiden, The Netherlands, pp 523–550
21. Li H-D, Zeng M-M, Tan B-B, Liang Y-Z, Xu Q-S, Cao D-S (2010) Recipe for revealing informative metabolites based on model population analysis. Metabolomics 6(3):353–361. https://doi.org/10.1007/s11306-010-0213-z
22. Rajalahti T, Arneberg R, Kroksveen AC, Berle M, Myhr K-M, Kvalheim OM (2009) Discriminating variable test and selectivity ratio plot: quantitative tools for interpretation and variable (biomarker) selection in complex spectral or chromatographic profiles. Anal Chem 81(7):2581–2590. https://doi.org/10.1021/ac802514y
23. Mehmood T, Liland KH, Snipen L, Sæbø S (2012) A review of variable selection methods in partial least squares regression. Chemometr Intell Lab Syst 118:62–69. https://doi.org/10.1016/j.chemolab.2012.07.010
24. Westerhuis JA, Hoefsloot HCJ, Smit S, Vis DJ, Smilde AK, Velzen EJJ et al (2008) Assessment of PLSDA cross validation. Metabolomics 4(1):81–89. https://doi.org/10.1007/s11306-007-0099-6
25. Riffenburgh RH (2006) Statistics in medicine, 2nd edn. Elsevier Academic Press, Burlington, MA
26. Bonferroni CE (1936) Teoria statistica delle classi e calcolo delle probabilità, Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze, vol 8. Seeber, Firenze, pp 3–62. doi:citeulike-article-id:1778138
27. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B Stat Methodol 57(1):289–300
28. Benjamini Y, Yekutieli D (2001) The control of the false discovery rate in multiple testing under dependency. Ann Stat 29(4):1165–1188
29. Giskeødegård GF, Grinde MT, Sitter B, Axelson DE, Lundgren S, Fjøsne HE et al (2010) Multivariate modeling and prediction of breast cancer prognostic factors using MR metabolomics. J Proteome Res 9(2):972–979. https://doi.org/10.1021/pr9008783
30. Cao MD, Giskeødegård GF, Bathen TF, Sitter B, Bofin A, Lønning PE et al (2012) Prognostic value of metabolic response in breast cancer patients receiving neoadjuvant chemotherapy. BMC Cancer 12(1):1–11. https://doi.org/10.1186/1471-2407-12-39
31. Asiago VM, Alvarado LZ, Shanaiah N, Gowda GAN, Owusu-Sarfo K, Ballas RA et al (2010) Early detection of recurrent breast cancer using metabolite profiling. Cancer Res 70(21):8309–8318. https://doi.org/10.1158/0008-5472.can-10-1319
32. Tenori L, Oakman C, Morris PG, Gralka E, Turner N, Cappadona S et al (2015) Serum metabolomic profiles evaluated after surgery may identify patients with oestrogen receptor negative early breast cancer at increased risk of disease recurrence. Results from a retrospective study. Mol Oncol 9(1):128–139. https://doi.org/10.1016/j.molonc.2014.07.012
33. Keun HC, Sidhu J, Pchejetski D, Lewis JS, Marconell H, Patterson M et al (2009) Serum molecular signatures of weight change during early breast cancer chemotherapy. Clin Cancer Res 15(21):6716–6723. https://doi.org/10.1158/1078-0432.ccr-09-1452
34. Bro R, Kamstrup-Nielsen MH, Engelsen SB, Savorani F, Rasmussen MA, Hansen L et al (2015) Forecasting individual breast cancer risk using plasma metabolomics and biocontours. Metabolomics 11(5):1376–1380. https://doi.org/10.1007/s11306-015-0793-8
35. Bernstein IH, Garvin CP, Teng GK (1988) Applied multivariate analysis. Springer, New York, NY
36. Ghasemi A, Zahediasl S (2012) Normality tests for statistical analysis: a guide for non-statisticians. Int J Endocrinol Metab 10(2):486–489. https://doi.org/10.5812/ijem.3505
Part IV
Abstract
Although the detection of metastases radically changes prognosis of and treatment decisions for a cancer
patient, clinically undetectable micrometastases hamper a consistent classification into localized or meta-
static disease. This chapter discusses mathematical modeling efforts that could help to estimate the
metastatic risk in such a situation. We focus on two approaches: (1) a stochastic framework describing
metastatic emission events at random times, formalized via Poisson processes, and (2) a deterministic
framework describing the micrometastatic state through a size-structured density function in a partial
differential equation model. Three aspects are addressed in this chapter. First, a motivation for the Poisson
process framework is presented and modeling hypotheses and mechanisms are introduced. Second, we
extend the Poisson model to account for secondary metastatic emission. Third, we highlight an inherent
crosslink between the stochastic and deterministic frameworks and discuss its implications. For increased
accessibility the chapter is split into an informal presentation of the results using a minimum of mathemati-
cal formalism and a rigorous mathematical treatment for more theoretically interested readers.
Key words Poisson process, Structured population equation, Metastasis, Mathematical modeling
1 Introduction
Louise von Stechow (ed.), Cancer Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 1711,
https://doi.org/10.1007/978-1-4939-7493-1_10, © Springer Science+Business Media, LLC 2018
194 Christophe Gomez and Niklas Hartung
2.1 Metastatic Risk
Predicting the probability of metastatic disease at diagnosis of the
primary tumor is of major clinical importance since it is strongly
linked to survival expectancy. One possibility to build such a pre-
diction model is by using large databases to correlate information
on the presence of metastases to primary tumor characteristics at
diagnosis or surgery. As an example, we show a relationship estab-
lished in [14] between primary tumor size at surgery and probabil-
ity of metastasis based on clinical data on breast cancer:
$$\mathbb{P}(\text{no metastases}) = \exp(-c\,d^{z}), \qquad (1)$$
2.2 Poisson Processes
A Poisson process (PP) is a model for counting a series of events
occurring at random times. The precise definition of this process is
given in Subheading 5, but its basic properties are the two follow-
ing ones (see Fig. 1 for an illustration):
1. The numbers of events in disjoint time intervals are independent.
This expresses the memorylessness property: given some
time t, the number of future events (those happening at any time
$t_{\text{future}} > t$) does not depend on the past events (those happening
at any time $t_{\text{past}} < t$), but only depends on the present state of the
system at time t. For example, in Fig. 1 the time elapsed between
t and $T^{(4)}$ is independent of when exactly $T^{(3)}$ occurred. In other
words, the system forgot what happened up to time t.
2. The number of events $N_t$ that occurred by time t has a Poisson
distribution with parameter
$$\Lambda(t) = \int_0^t \lambda(s)\,ds,$$
i.e., the integral over each emission intensity λ(s) for s in the time
interval [0, t]. This means that the probability of having observed
exactly k events until time t is given by
$$\mathbb{P}(N_t = k) = \frac{\Lambda(t)^k}{k!}\,e^{-\Lambda(t)}.$$
Fig. 1 Schematic trajectory of a Poisson process (x-axis: time; y-axis: number of events). Here, $T^{(1)}, \ldots, T^{(4)}$ are the
times at which events occur, and by time t we have 3 events, i.e. $N_t = 3$
These two properties characterize the PP, and can even serve as
a definition in addition to $N_0 = 0$. From these two properties, one
can show that the probability that the next event time lies between
times t and t + Δt is approximately λ(t)Δt. Hence, λ determines the
event frequency, and this is the reason why it is called the intensity
function.
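As a minimal numerical sketch of the second property, the cumulative intensity Λ(t) can be approximated with the trapezoidal rule and inserted into the Poisson probability formula. The function names and the linear intensity used in the example are hypothetical and chosen only so that Λ(2) = 1 can be checked by hand.

```python
import math
import numpy as np

def cumulative_intensity(lam, t, n_grid=10001):
    """Approximate Lambda(t), the integral of lam(s) over [0, t] (trapezoidal rule).

    lam must accept a NumPy array of times and return an array of intensities.
    """
    s = np.linspace(0.0, t, n_grid)
    values = lam(s)
    return float(np.sum((values[1:] + values[:-1]) * np.diff(s)) / 2.0)

def prob_k_events(lam, t, k):
    """P(N_t = k) = Lambda(t)^k / k! * exp(-Lambda(t))."""
    big_lambda = cumulative_intensity(lam, t)
    return big_lambda ** k * math.exp(-big_lambda) / math.factorial(k)

# Hypothetical intensity lam(s) = 0.5 * s, so that Lambda(2) = 1
lam_demo = lambda s: 0.5 * s
print(cumulative_intensity(lam_demo, 2.0))
print([prob_k_events(lam_demo, 2.0, k) for k in range(4)])
```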
In the setting of this chapter, we are interested in describing the
inception times of new metastatic lesions via PPs. This means that
Nt is the number of metastases emitted until time t in our context.
Following [21, 22], we will first suppose that only the primary
tumor has the capacity of seeding metastases.
A constant emission intensity λ (called a homogeneous PP)
would mean that a tumor consisting of a few cells is equally likely
to shed a metastasis as a large tumor of several grams. Since such a
model is not realistic, we need to consider time-varying intensities λ
(called non-homogeneous PPs). We will consider an emission
intensity λ that depends on some measure of primary tumor size
Xp(t) (diameter, volume, number of cells, etc.). The relationship
between primary tumor size and emission intensity is given by a
size-dependent emission law γ, i.e. $\lambda(t) = \gamma(X_p(t))$.
Before going into more detail, let us introduce a set of clinical
parameters (summarized in Table 1), which will be used throughout
this chapter to further illustrate those concepts. These parameters
were estimated in [23] from clinical data on a hepatocellular
carcinoma with multiple liver metastases. Although derived
within the deterministic framework of the size-structured model
(see Subheading 3.1 for more details), the inherent link with the PP
framework described in Subheading 3.2 ensures that these parameters
are also relevant in the PP model; we will therefore use the
same set of parameters in both frameworks. Also, we will make use
of a slight modification of this clinical setting to predict the risk of
distant metastasis after surgery. To represent the impact of a surgery
at time $t_{\text{surgery}}$, the emission intensity will be set to zero for all times
larger than $t_{\text{surgery}}$. Randomness of emission means that each emission
time can be represented via its probability density function; this
is illustrated for the emission time of the first metastasis $T^{(1)}$ in
Fig. 2.
Table 1
Growth and emission laws derived in [23] from clinical data of a hepatocellular carcinoma case with
multiple liver metastases
Fig. 2 Probability density function (pdf) for the emission time of the first
metastasis $T^{(1)}$, using the clinical parameters in Table 1 (x-axis: time in months;
y-axis: probability density). The analytical formula of the pdf is
$f_{T^{(1)}}(t) = \lambda(t)\,e^{-\Lambda(t)}$
The number of metastases $N_t$ is itself random in this model, but
relevant deterministic quantities can be derived from $N_t$, such as the
expected number of metastases $\mathbb{E}[N_t]$ or the probability of
metastatic disease $\mathbb{P}(N_t > 0)$. Exploiting the memorylessness property
of PPs, these quantities can be computed without any need to
simulate the process (all the following formulas are proven in
the Appendix):
$$\mathbb{E}[N_t] = \int_0^t \lambda(s)\,ds \qquad (2)$$
and
$$\mathbb{P}(N_t > 0) = 1 - \exp\left(-\int_0^t \lambda(s)\,ds\right).$$
Also, a formula for the variance of $N_t$ is obtained readily:
$$\mathrm{var}[N_t] = \int_0^t \lambda(s)\,ds. \qquad (3)$$
[Fig. 3: number of metastases versus time (years); legend: expectation, variability, stochastic trajectory]
The concepts $N_t$, $\mathbb{E}[N_t]$ and $\mathrm{var}[N_t]$ are illustrated in Fig. 3.
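The three quantities can be sketched numerically as below. Surgery is represented as described in the text, by setting the emission intensity to zero after the surgery time, which is equivalent to stopping the integration there. The function name, the grid size, and the linear intensity in the example are hypothetical and are not the clinical parameters of Table 1.

```python
import math
import numpy as np

def metastasis_summary(lam, t, t_surgery=None, n_grid=10001):
    """Expected number, variance, and probability of metastases at time t.

    lam must accept a NumPy array of times. If t_surgery is given, the
    intensity is treated as zero for all times larger than t_surgery.
    """
    upper = t if t_surgery is None else min(t, t_surgery)
    s = np.linspace(0.0, upper, n_grid)
    values = lam(s)
    big_lambda = float(np.sum((values[1:] + values[:-1]) * np.diff(s)) / 2.0)
    return {
        "expected": big_lambda,                           # Eq. 2
        "variance": big_lambda,                           # Eq. 3
        "prob_metastatic": 1.0 - math.exp(-big_lambda),   # P(N_t > 0)
    }

# Hypothetical intensity lam(s) = 2 * s (per year)
lam_demo = lambda s: 2.0 * s
print(metastasis_summary(lam_demo, t=2.0))
print(metastasis_summary(lam_demo, t=2.0, t_surgery=1.0))
```

Expectation and variance coincide because $N_t$ is Poisson distributed with parameter Λ(t).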
If a metastatic growth law is added to the model, the total
metastatic mass (or total cell count, sum of lesion volumes) Mt—
again a random quantity—can be represented via the emission times
of the PP. Mt can be compared to quantitative measures of total
metastatic biomass, obtainable, e.g., via bioluminescence imaging
[24]. We will assume that all metastases follow the same determin-
istic growth law Xm, but which can be different from the primary
tumor growth law. Therefore, the size difference among metastases
is entirely explained by differences in metastatic inception times,
and Mt can be written as
$$M_t = \sum_{k=1}^{N_t} X_m\left(t - T^{(k)}\right),$$
where $X_m(0) = x_{m0}$ is the initial size of a metastasis. Expectation
and variance of the metastatic burden can also be calculated
analytically:
Stochastic and Deterministic Metastatic Emission Models 201
$$\mathbb{E}[M_t] = \int_0^t \lambda(s)\,X_m(t - s)\,ds. \qquad (4)$$
$$\mathrm{var}[M_t] = \int_0^t \lambda(s)\,\left(X_m(t - s)\right)^2 ds. \qquad (5)$$
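Equations 4 and 5 can be evaluated with the same numerical integration. In the sketch below, the constant intensity and the exponential metastatic growth law are hypothetical choices made so that the integrals have closed forms, $e^t - 1$ and $(e^{2t} - 1)/2$, against which the output can be compared; they are not the laws of Table 1.

```python
import numpy as np

def metastatic_burden_moments(lam, x_m, t, n_grid=10001):
    """Approximate E[M_t] (Eq. 4) and var[M_t] (Eq. 5) by the trapezoidal rule.

    lam: emission intensity as a function of time (array in, array out).
    x_m: metastatic growth law as a function of age since emission.
    """
    s = np.linspace(0.0, t, n_grid)
    ds = np.diff(s)
    size = x_m(t - s)                    # size at time t of a lesion emitted at s
    f_mean = lam(s) * size
    f_var = lam(s) * size ** 2
    mean = float(np.sum((f_mean[1:] + f_mean[:-1]) * ds) / 2.0)
    var = float(np.sum((f_var[1:] + f_var[:-1]) * ds) / 2.0)
    return mean, var

# Hypothetical laws: constant intensity 1, exponential growth with x_m0 = 1
lam_demo = lambda s: np.ones_like(s)
x_m_demo = lambda age: np.exp(age)
print(metastatic_burden_moments(lam_demo, x_m_demo, t=1.0))
```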
The assumption of equal growth law for the metastases greatly
simplifies the model, which is both an advantage (for identifiability
from clinical data) and drawback (for correct representation of
cancer biology). Beyond the scope of this chapter, it could be
replaced by a less restrictive assumption, e.g. by supposing that
individual growth parameters are drawn randomly from a given
probability distribution. However, even if easily integrated into
numerical algorithms, such a feature would be prohibitive for any
characterization of the model through mathematical analysis.
2.3 Secondary Emission
In the model described above, metastases do not have the capacity
to emit metastases themselves. However, it is easy to think of a case
in which such a property would make a difference in the model. For
example, suppose that only a single metastasis is emitted prior to
surgery of the primary tumor (see Note 2). If this metastasis cannot
emit further metastases, its successful removal cures the patient but
the second surgery may fail if the metastasis is able to seed as well.
Of course, there are other mechanisms potentially leading to treat-
ment failure (e.g., local recurrence, surgery impossible, etc., see,
e.g., [36, 37]), but for simplicity these are not considered here.
In this section, we extend the previously shown PP model to
account for secondary metastatic emission using PPs as building
blocks. Many of the advantages and limitations of the PP model
carry over to the extended model, and we do not claim that a
comprehensive framework for cancer metastasis is built in that
way. The model is, however, simple enough to have a chance to
be parametrized reasonably from clinical data.
Conceptually, the extension is straightforward: as before, the
primary tumor grows according to Xp and metastatic emission by
the primary tumor is represented by a PP with intensity λp. In
addition, any emitted metastasis has the same capacities as the
primary tumor, but possibly with different growth and emission
rates (Xm instead of Xp, and λm instead of λp). If we consider a
metastasis emitted at time s, this means that at a later time t it
reaches the size Xm(t − s) and emits metastases with intensity
λm(t − s). Every newly emitted metastasis starts a new PP. Also,
each metastasis has a precursor (either the primary tumor or
another metastasis). The whole model then consists of the meta-
static emission times from all of these PPs. Since each PP can start
[Figure 4: diagram of the PP cascade, showing PP(λp; 0), PP(λm; T(1)), PP(λm; T(2)), and PP(λm; T(1,1)) with emission times T(1,1,1) and T(1,1,2)]
Fig. 4 Illustration of the first three generations for a Poisson process (PP) cascade. Each long horizontal arrow
represents a PP (from top to bottom: primary tumor, first metastasis of first generation, second metastasis of
first generation, first metastasis of second generation emitted by first metastasis of first generation). Each
short vertical arrow represents an emission by the PP it points towards. This starts a new PP, connected by a
dashed line. In the notation PP(λ; T), λ is the intensity of the PP and T is its starting time for the new PP
(emission times are counted from the start of the respective PP and not from zero)
3.1 Size-Structured Model
Let us consider a different framework for the description of metastasis,
which also represents the metastatic process purely as growth
and emission dynamics. To describe the micrometastatic state of
cancer patients a size-structured model was developed [23]. The
model describes the time evolution of a density function ρ(x, t)
representing the size distribution of metastatic colonies: the integral
$\int_{x_1}^{x_2} \rho(x,t)\,dx$ represents the number of metastases at time t with
size between x1 and x2. Therefore, ρ is like a smoothed histogram of
the number of metastases within different size ranges.
To better understand why a size density is considered, it is
instructive to draw an analogy to Lagrangian and Eulerian descrip-
tion of a fluid flow (see, e.g., [38] for a comprehensive discussion).
In a Lagrangian description, the observer follows individual parti-
cles through the flow field. In contrast, for the Eulerian point of
view the observer considers the flow density through fixed refer-
ence points. These two frames of reference are illustrated in Fig. 5.
In this picture, metastatic growth becomes “flow through size
space.” In the PP model this is represented in a Lagrangian fashion:
a growth function is associated to each individual metastasis. In the
size-structured model an Eulerian frame of reference is used: the
entire population of metastatic tumors is described through a den-
sity function moving through size space at a “speed” g(x) (i.e., the
growth rate), in other words a size-structured density.
Formalizing metastatic growth from an Eulerian perspective
leads to a PDE model. Metastatic emission is the boundary
[Figure 5: two panels with axes Time and Size; the growth curve is labeled Xm(t − s)]
Fig. 5 Representation of the Lagrangian (left) and Eulerian (right) frames of reference for describing a
population of growing metastases. Left: the observer (the eye symbol) follows the growth curves of individual
metastases; time and size coordinates determine the observer’s position. Right: a static observer looks from
the outside at the growth speed g in fixed time-size areas. The relationship X′m(t) = g(Xm(t)) holds
[Figure 6: log-log plot of the metastatic density ρ(x, t) (vertical axis, 10^−12 to 10^0) against size x in number of cells (horizontal axis, 10^0 to 10^10); the snapshots move to the right with increasing t. Legend: t = 1 year: Nmicro = 0.1, Nmacro = 0, M = 320; t = 2 years: Nmicro = 14, Nmacro = 0.1, M = 8.5 · 10^6; t = 3 years: Nmicro = 118, Nmacro = 15, M = 3.8 · 10^9]
Fig. 6 Time evolution of the metastatic density in the size-structured model for metastasis. Each solid line
represents a snapshot of the metastatic density ρ at a particular point in time (1/2/3 years after inception of
the primary tumor). Due to the growth dynamics, the density is transported to the right. Several quantities
computable from the density are represented in the legend: Nmicro, number of metastases smaller than 10^8
cells; Nmacro, number of metastases larger than 10^8 cells; M, total metastatic mass (number of cells of all
metastases together)
3.2 Bridging the Gap: Model Observables
We now describe how the size-structured model and the PP cascade
model are related. At first view, the two frameworks describe quite
different objects. While the PP cascade is concerned with a collec-
tion of emission times with a generational hierarchy, the size-
structured model features a density function. Nevertheless, as we
will see, the latter can be seen as the expectation of the PP cascade
model. To describe precisely the relationship between the models,
we need to introduce model observables as a common theme. In fact,
we have already introduced some model observables without nam-
ing them so. The model observables include the number of metas-
tases, the number of micro-/macro-metastases, and the total
metastatic mass.
Let us start with the size-structured model. For each function f,
a model observable (MO) is defined by
$$MO_f(t) := \int_{x_m^0}^{x_m^1} f(x)\, \rho(x,t)\, dx,$$
where xm0 is the size of a newly emitted metastasis and xm1 denotes
the theoretical upper boundary, i.e. it is integrated over all possible
sizes of metastases. Different choices for f are possible, and each of
them corresponds to one observable (this dependency is made
explicit through the subscript f in MO_f). The definition includes
the above-mentioned quantities:
• The number of metastases is obtained for f = 1, i.e. the function
that equals 1 for all x: $MO_1(t) = \int_{x_m^0}^{x_m^1} 1 \cdot \rho(x,t)\, dx = N(t)$.
$$SMO_1(t) = \sum_{k=1}^{N_t} 1 = N_t,$$
$$SMO_{\mathrm{Id}}(t) = \sum_{k=1}^{N_t} X_m\left(t - T^{(k)}\right) = M_t. \tag{8}$$
[Figure 7: three panels, horizontal axis Time (days) from 1000 to 1400, vertical axis from 0 to 80; legend entries: Variability, Data, Stochastic trajectory]
Fig. 7 Comparison of residual variability from the size-structured model fit and stochastic variability of the PP
cascade model. Expectation (bold solid line) is the size-structured model prediction Nmacro(t), which was used
to fit the clinical data from [23] (computed via Eq. 10). Variability of the corresponding PP cascade model is
displayed in two ways: through stochastic trajectories (thin lines) and $\mathbb{E}[N_{\mathrm{macro},t}] \pm 2\sqrt{\mathrm{var}[N_{\mathrm{macro},t}]}$, with
var[Nmacro,t] computed via Eq. 15 (Variability, bold dashed line). As in [23], we count time from inception of the
first primary tumor cell, which was back-calculated from primary tumor data assuming Gompertzian growth
(hence the first CT scan with metastatic disease is approximately 3 years post-inception)
[Figure 8: two bar charts of Probability (in %) (vertical axis, 0 to 40) against Number of metastases (horizontal axis, 0 to 8); the top panel is titled "With secondary metastatic emission"]
Fig. 8 Probability of metastatic disease after surgery with (top panel) or without (bottom panel) secondary
metastatic emission (each based on 10,000 simulations). In addition to the clinical parameters derived in [23],
it is assumed that the primary tumor is surgically removed 500 days after its inception, and that the number of
metastases is evaluated another 500 days later
relatively simple way, PPs have appealing properties that have been
illustrated here. They can be easily included as building blocks in
larger models, which has been shown with the PP cascade model,
but which applies in a much more general way. Also, they allow for a
high degree of analytical tractability, which was exploited here to
characterize the mean behavior of the PP cascade model.
Without doubt, further improvements of these techniques are
required. In particular, to make individualized risk predictions with
the model we have to match patient characteristics to model para-
meters. In this respect, circulating biomarkers, such as circulating
tumor cells or circulating tumor DNA can be a useful source of
information, especially since quantification methods are rapidly
getting more reliable [43, 44, 2]. Both frameworks presented
here allow for such an extension. Once validated, a mathematical
model can serve as a powerful tool for informed treatment decisions
for cancer patients by integrating case-specific information into a
consistent quantitative framework.
While this chapter has focused on the natural metastatic emis-
sion kinetics, it is possible to extend the formalism to cover systemic
treatments such as chemotherapy (represented as a size function
Xm(t; tincept) depending on inception time in the stochastic context
of Subheading 2.2, or a time-varying growth rate g(x, t) in the
deterministic context of Subheading 3.1). However, although
5.1 Definition of a Poisson Process and Basic Properties
The Poisson distribution is a standard way to count the occurrences
of some events.
Definition 5.1 (Poisson Distribution). Let μ ≥ 0. A random variable
Y with values in ℕ is said to have a Poisson distribution with
parameter μ, which we denote by $Y \sim \mathcal{P}(\mu)$, if for all k ∈ ℕ
$$\mathbb{P}(Y = k) = e^{-\mu}\, \frac{\mu^k}{k!}$$
for μ > 0, and if $\mathbb{P}(Y = 0) = 1$ in the case μ = 0.
The parameter μ ∈ ℝ₊ can be interpreted as the expected number
of occurrences since
$$\mathbb{E}[Y] = \mu \quad \text{with } Y \sim \mathcal{P}(\mu).$$
In our context, it counts the number of metastases. However, at
this level, we have no information on the event times we are
counting, nor how this number evolves with respect to time. To
handle the random nature of these times, let us introduce the
Poisson processes.
Definition 5.2 (Non-homogeneous Poisson Process). Let
λ : ℝ₊ → ℝ₊ be a continuous function. We say that $(N_t)_{t \ge 0}$ is a
non-homogeneous Poisson process with intensity λ if:
1. $N_0 = 0$;
2. the numbers of occurrences in disjoint time intervals are independent,
i.e. for $t_0 < \dots < t_n$, the random variables $N_{t_k} - N_{t_{k-1}}$, k = 1, . . ., n,
are independent;
3. for all t > 0, $N_t$ has a Poisson distribution with parameter Λ(t),
given by
$$\Lambda(t) = \int_0^t \lambda(u)\, du.$$
The terminology non-homogeneous results from the fact that the
intensity function λ can vary in time, as opposed to a homogeneous
Poisson process for which λ is constant. Also, there are several
equivalent definitions for a non-homogeneous Poisson process.
For instance, the third item above can be replaced by the following
properties:
$$\mathbb{P}(N_{t+\Delta t} - N_t = 1) = \lambda(t)\Delta t + o(\Delta t)$$
and
$$\mathbb{P}(N_{t+\Delta t} - N_t \ge 2) = o(\Delta t),$$
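$$P(du) := \sum_{k=1}^{+\infty} \delta_{T^{(k)}}(du),$$
$$N_t = P([0,t]) = \sum_{k=1}^{+\infty} \mathbf{1}_{(T^{(k)} \le t)}.$$
$$M_t = \sum_{k=1}^{+\infty} \mathbf{1}_{(T^{(k)} \le t)}\, X_m\left(t - T^{(k)}\right) = \int_0^t X_m(t-u)\, P(du).$$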
Expressions involving an integral with respect to the Poisson mea-
sure allow convenient manipulations as we will see in Appendix
using Proposition A.1.
5.2 Derivation of Empirical Formula from Poisson Assumptions
Let us assume that the primary tumor diameter d(t) follows a power
law:
$$d'(t) = a\, d(t)^{\alpha}, \quad d(0) = d_0, \quad 0 < \alpha < 1.$$
Power growth of volume with a power between 2/3 and 1 has been
described in the literature, leading to the above model if we assume
a spherical shape of the tumor.
Furthermore, let us assume that metastatic emission is governed
by a Poisson process with intensity $\lambda(t) = b\, d(t)^{\beta}$. We will
require β > 0, since the emission rate should increase with primary
tumor size. Then, the number of metastases $N_t$ by time t is Poisson
distributed with parameter $\Lambda(t) = \int_0^t \lambda(s)\, ds$ and the probability of
metastasis-free disease at time t is given by
$$\mathbb{P}(\text{no metastases}) = \mathbb{P}(N_t = 0) = \exp\left(-b \int_0^t d(s)^{\beta}\, ds\right).$$
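Here, we have
$$b \int_0^t d(s)^{\beta}\, ds = \frac{b}{a} \int_0^t d'(s)\, d(s)^{\beta-\alpha}\, ds = \frac{b}{a} \int_0^t \frac{d}{ds}\left(\frac{d(s)^{\beta-\alpha+1}}{\beta-\alpha+1}\right) ds = \frac{b}{a(\beta-\alpha+1)} \left(d(t)^{\beta-\alpha+1} - d_0^{\beta-\alpha+1}\right),$$
and assuming that the tumor is initiated with a negligible size ($d_0 \ll d(t)$), one obtains
$$b \int_0^t d(s)^{\beta}\, ds \approx \frac{b}{a(\beta-\alpha+1)}\, d(t)^{\beta-\alpha+1}.$$
The probability of metastasis-free disease therefore takes the form $\mathbb{P}(\text{no metastases}) \approx \exp(-c\, d(t)^{z})$,
with $c = \frac{b}{a(\beta-\alpha+1)}$, and z = β − α + 1. Since β > 0 and α ≤ 1, we have
z > 0, and the above manipulation is justified.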
It should be noted that although c and z can be determined
unambiguously from information on metastatic status and primary
tumor size at surgery if the patient cohort is large enough, this does
not apply for the growth and emission parameters of the underlying
Poisson process. To distinguish the growth and emission processes
additional information is required, such as repeated tumor size
measurements over time.
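The derivation above can be checked numerically with a short sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the parameter values are arbitrary, and the closed-form diameter d(t) = (d0^(1−α) + a(1−α)t)^(1/(1−α)) is obtained here by separating variables in the power law rather than quoted from the chapter.

```python
import math

def diameter(t, a, alpha, d0):
    """Solution of d'(t) = a * d(t)**alpha with d(0) = d0 and 0 < alpha < 1."""
    return (d0 ** (1.0 - alpha) + a * (1.0 - alpha) * t) ** (1.0 / (1.0 - alpha))

def cumulative_intensity(t, a, alpha, b, beta, d0):
    """Lambda(t) = b * integral of d(s)**beta over [0, t], in closed form."""
    z = beta - alpha + 1.0
    return b / (a * z) * (diameter(t, a, alpha, d0) ** z - d0 ** z)

def prob_metastasis_free(t, a, alpha, b, beta, d0):
    """P(N_t = 0) = exp(-Lambda(t)) for the Poisson emission model."""
    return math.exp(-cumulative_intensity(t, a, alpha, b, beta, d0))

def prob_from_size(d, c, z):
    """Empirical form exp(-c * d**z), valid when d0 is negligible."""
    return math.exp(-c * d ** z)

# Arbitrary illustrative parameters (not fitted to any data).
a, alpha, b, beta, d0, t = 0.1, 0.5, 0.01, 2.0, 0.01, 50.0
z = beta - alpha + 1.0
c = b / (a * z)
exact = prob_metastasis_free(t, a, alpha, b, beta, d0)
approx = prob_from_size(diameter(t, a, alpha, d0), c, z)
```

Scaling a and b by the same factor leaves c and z unchanged, which is one way to see why growth and emission parameters cannot be separated from size-at-surgery data alone.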
5.3 Summary of Key Results on the Size-Structured Model
The framework proposed in [23] focuses on the evolution of a size-structured
metastatic density ρ. Originally, it was assumed that
primary and secondary tumors have the same growth and emission
patterns. Here, we present an extended version, described, e.g., in
[40], where primary and secondary growth and emission dynamics
can be different.
As before, Xp denotes the size of the primary tumor and γ p the
primary tumor emission law. The size of a metastasis is given by Xm,
which is the solution of an autonomous ordinary differential
equation
$$X_m'(t) = g(X_m(t)), \quad X_m(0) = x_m^0.$$
5.4 Probabilistic Framework for Secondary Emission
Let us first remind the reader that we assume all emissions of the
primary tumor and of the metastases to be independent. To exploit
this property in calculations, we have to take care of the filiation of
each metastasis, i.e. the generational hierarchy (the primary tumor,
the metastases emitted from the primary tumor, the metastases
emitted from the metastases emitted from the primary tumor,
etc.). We will therefore introduce a cascade of independent PPs,
and define recursively the emission times with respect to the gener-
ational hierarchy.
• The emission times for the first generation of metastases, that is,
the ones emitted by the primary tumor, are the event times of a
PP $(N_t^{(1)})_{t \ge 0}$ with intensity λp; we will write $\Pi^{(1)} := (T^{(j)})_{j \ge 1}$
for the set of random emission times.
The emission times for the next generations of metastases are
defined recursively.
• Let k ≥ 2 and $n_1, \dots, n_{k-1} \ge 1$. The jth emission time for the kth
generation of metastasis with filiation $n_1, \dots, n_{k-1}$ is defined by
$$T^{(n_1, \dots, n_{k-1}, j)} := T^{(n_1, \dots, n_{k-1})} + \tilde{T}^{(n_1, \dots, n_{k-1}, j)}. \tag{11}$$
Here, $\tilde{T}^{(n_1, \dots, n_{k-1}, j)}$ is the time it takes for the offspring with filiation
$n_1, \dots, n_{k-1}$ to give birth to its jth offspring. The family
$(\tilde{T}^{(n_1, \dots, n_{k-1}, j)})_{j \ge 1}$ is formed by the event times of a PP
$(N_t^{(n_1, \dots, n_{k-1})})_{t \ge 0}$
with intensity λm.
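The recursive construction can be sketched in a few lines of simulation code. This is a minimal illustration, not the authors' implementation: it assumes the thinning method of Lewis and Shedler [35] for each non-homogeneous PP, the names `thinning` and `simulate_cascade` are hypothetical, and the caller is assumed to supply upper bounds `lam_p_max` and `lam_m_max` on the two intensities over the simulated window.

```python
import random

def thinning(intensity, horizon, lam_max, rng):
    """Event times on [0, horizon] of a PP whose intensity(t) is bounded by lam_max."""
    times, t = [], 0.0
    if lam_max <= 0.0:
        return times
    while True:
        t += rng.expovariate(lam_max)          # candidate from a homogeneous PP
        if t > horizon:
            return times
        if rng.random() * lam_max < intensity(t):
            times.append(t)                    # accept with probability intensity(t) / lam_max

def simulate_cascade(lam_p, lam_m, horizon, lam_p_max, lam_m_max, rng):
    """Absolute emission times up to `horizon`, grouped by generation."""
    generations = []
    current = thinning(lam_p, horizon, lam_p_max, rng)   # first generation
    while current:
        generations.append(current)
        children = []
        for start in current:
            # Each metastasis runs its own PP on a local clock starting at `start`;
            # adding `start` converts local times to absolute times (Eq. 11).
            local = thinning(lam_m, horizon - start, lam_m_max, rng)
            children.extend(start + u for u in local)
        current = children
    return generations

rng = random.Random(0)
cascade = simulate_cascade(lambda t: 1.0, lambda t: 0.5, 2.0, 1.0, 0.5, rng)
total_metastases = sum(len(gen) for gen in cascade)
```

Setting `lam_m_max` to 0 recovers the model without secondary emission. Note that the expected number of events grows quickly with the horizon when λm is large, so long horizons can be expensive.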
$$SMO_f^{(k)}(t) := \sum_{n_1, \dots, n_k \ge 1} \mathbf{1}_{(T^{(n_1, \dots, n_k)} \le t)}\, f\big(X_m(t - T^{(n_1, \dots, n_k)})\big), \tag{12}$$
$$e_f(t) = MO_f(t).$$
Let us remark that the SMO (Eq. 13) may also be seen as
integrals w.r.t. a random measure
$$SMO_f(t) = \int f(x)\, \mathcal{M}_t(dx),$$
This description is the key point to bridge the gap to the description
of metastasis via a structured population equation.
Theorem 5.2 (Link to the Structured Population Model). For all
t ≥ 0, the measure
$$\mu_t := \mathbb{E}[\mathcal{M}_t]$$
is σ-finite, absolutely continuous with respect to the Lebesgue
measure, and its Radon–Nikodým density is given by ρ(·, t),
$$\frac{d\mu_t}{dx} = \rho(\cdot, t),$$
where ρ is the solution of the structured population
equation (Eq. 9).
$$v_f(t) = \int_0^t \lambda_p(s)\, \big(f(X_m(t-s)) + e_{m,f}(t-s)\big)^2\, ds + \int_0^t \lambda_m(s)\, v_f(t-s)\, ds. \tag{15}$$
Here,
6 Notes
Acknowledgements
and
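$$\mathbb{E}\left[\int \phi(u_1)\, P(du_1) \int \psi(u_2)\, P(du_2)\right] = \int \phi(u_1)\lambda(u_1)\, du_1 \int \psi(u_2)\lambda(u_2)\, du_2 + \int \phi(u)\psi(u)\lambda(u)\, du.$$
$$\mathbb{E}[P(du_1)P(du_2)] = \lambda(u_1)\lambda(u_2)\, du_1\, du_2 + \delta(u_1 - u_2)\lambda(u_1)\, du_1\, du_2.$$
$$\phi * \psi(t) := \int_0^t \phi(t-u)\, \psi(u)\, du. \tag{16}$$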
A.1 Proof of Proposition 5.1
We first need to establish the following lemma, which is proven
further below.
Lemma A.1. We have
$$e_f = \lambda_p * \big(f(X_m) + e_{m,f}\big), \tag{17}$$
where $e_{m,f}$ has been introduced in Proposition 5.2.
This is not exactly the renewal equation we want. To derive the
desired equation (Eq. 10) we just have to make the following
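remark. Taking λp = λm, Lemma A.1 gives that $e_{m,f}$ satisfies
$$e_{m,f} = \lambda_m * \big(f(X_m) + e_{m,f}\big).$$
Hence, from (Eq. 17), and using the commutativity and associativity of the convolution, we have
$$\begin{aligned}
e_f &= \lambda_p * f(X_m) + \lambda_p * \big(\lambda_m * f(X_m) + \lambda_m * e_{m,f}\big) \\
&= \lambda_p * f(X_m) + \lambda_m * \big(\lambda_p * f(X_m) + \lambda_p * e_{m,f}\big) \\
&= \lambda_p * f(X_m) + \lambda_m * e_f.
\end{aligned} \tag{18}$$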
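The last line of (Eq. 18) is a linear Volterra equation of the second kind for e_f, which can be solved by stepping forward in time. The sketch below is an illustration under assumptions rather than the scheme used by the authors: it discretizes both convolutions with the trapezoidal rule on a uniform grid, `solve_renewal` is a hypothetical name, and `source` stands for the function u ↦ f(Xm(u)).

```python
import math

def solve_renewal(lam_p, lam_m, source, horizon, n_steps):
    """Solve e = lam_p * source + lam_m * e, where * is convolution on [0, t]."""
    h = horizon / n_steps
    t = [i * h for i in range(n_steps + 1)]
    src = [source(u) for u in t]
    e = [0.0] * (n_steps + 1)                  # e(0) = 0: both integrals are empty
    for i in range(1, n_steps + 1):
        # Trapezoidal rule for the forcing term (lam_p * source)(t_i).
        forcing = 0.5 * (lam_p(t[i]) * src[0] + lam_p(0.0) * src[i])
        forcing += sum(lam_p(t[i] - t[j]) * src[j] for j in range(1, i))
        forcing *= h
        # Known part of the trapezoidal rule for (lam_m * e)(t_i), i.e. nodes j < i.
        known = 0.5 * lam_m(t[i]) * e[0]
        known += sum(lam_m(t[i] - t[j]) * e[j] for j in range(1, i))
        # The node j = i contributes 0.5 * h * lam_m(0) * e[i]; solve for e[i].
        e[i] = (forcing + h * known) / (1.0 - 0.5 * h * lam_m(0.0))
    return t, e

# Constant intensities and f = 1 (expected number of metastases), as a check case.
grid, e_num = solve_renewal(lambda u: 1.0, lambda u: 0.5, lambda u: 1.0, 2.0, 200)
e_exact = (1.0 / 0.5) * (math.exp(0.5 * 2.0) - 1.0)
```

For constant intensities and f = 1 the equation reduces to e' = λm e + λp with e(0) = 0, whose solution λp(e^{λm t} − 1)/λm is used above as a reference value. The cost is quadratic in the number of time steps.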
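Now, let T > 0. From the last line of (Eq. 18), we have for all t ∈
[0, T],
$$e_f(t) \le \|f\|_{L^\infty([x_m^0, x_m^1])}\, \|\lambda_p\|_{L^\infty([0,T])}\, T + \|\lambda_m\|_{L^\infty([0,T])} \int_0^t e_f(u)\, du,$$
and then
$$\mathbb{P}(\forall t \ge 0,\ SMO_f(t) < +\infty) = \lim_{n \to +\infty} \mathbb{P}(\forall t \in [0,n],\ SMO_f(t) < +\infty) = 1.$$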
Proof (of Lemma A.1). Let us start with the following remark.
According to the recursive definition (Eq. 11) of our PP cascade, one
has
$$T^{(n_1, \dots, n_k)} = T^{(n_1)} + \bar{T}^{(n_1, \dots, n_k)}, \tag{20}$$
where all the times
$$\{\bar{T}^{(n_1, \dots, n_k)},\ k \ge 2,\ n_1, \dots, n_k \ge 1\}$$
are independent of $\Pi^{(1)} := (T^{(n_1)})_{n_1 \ge 1}$.
:= I + J,
with
$$SMO_{n_1,f}(t) := \sum_{k \ge 2}\ \sum_{n_2, \dots, n_k \ge 1} \mathbf{1}_{(\bar{T}^{(n_1, \dots, n_k)} \le t)}\, f\big(X_m(t - \bar{T}^{(n_1, \dots, n_k)})\big),$$
= λp ∗ f(Xm)(t).
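For the second term, using standard properties of the conditional
expectation (especially $\mathbb{E}[X] = \mathbb{E}[\mathbb{E}[X \mid Y]]$), one has
$$\mathbb{E}[J] = \mathbb{E}\Big[\sum_{n_1 \ge 1} \mathbf{1}_{(T^{(n_1)} \le t)}\, \mathbb{E}\big[SMO_{n_1,f}(t - T^{(n_1)}) \mid \Pi^{(1)}\big]\Big]$$
with
$$\mathbb{E}\big[SMO_{n_1,f}(t - T^{(n_1)}) \mid \Pi^{(1)}\big] := \mathbb{E}\big[SMO_{n_1,f}(t-u)\big]\Big|_{u = T^{(n_1)}},$$
$$\sum_{n_1 \ge 1} \mathbf{1}_{(T^{(n_1)} \le t)}\, \mathbb{E}\big[SMO_{n_1,f}(t - T^{(n_1)}) \mid \Pi^{(1)}\big] = \int_0^t e_{m,f}(t-u)\, P^{(1)}(du).$$
$$\mathbb{E}[J] = \mathbb{E}\Big[\int_0^t e_{m,f}(t-u)\, P^{(1)}(du)\Big] = \int_0^t \lambda_p(u)\, e_{m,f}(t-u)\, du = \lambda_p * e_{m,f}(t),$$
which concludes the proof of (Eq. 17). □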
A.2 Proof of Proposition 5.2
Using the same strategy as for (Eq. 18), the proof of (Eq. 15)
consists only in proving the following relation:
with $v_{m,f}(t) := \mathrm{var}[SMO_{m,f}(t)]$. Knowing Proposition 5.1 and the
formula of the variance, one can focus on the term
$e_f^{(2)}(t) := \mathbb{E}[SMO_f^2(t)]$. Using (Eq. 21), we have to compute three
terms
$$e_f^{(2)}(t) = \mathbb{E}[I^2] + 2\,\mathbb{E}[IJ] + \mathbb{E}[J^2].$$
The Term $\mathbb{E}[I^2]$: Recalling that $I = \int_0^t f(X_m(t-u))\, P^{(1)}(du)$, and
using Proposition A.1, it follows directly that
$$\mathbb{E}[I^2] = \Big(\int_0^t \lambda_p(u)\, f(X_m(t-u))\, du\Big)^2 + \int_0^t \lambda_p(u)\, f^2(X_m(t-u))\, du.$$
The Term $\mathbb{E}[IJ]$: Using standard properties of the conditional expectation,
and that for all $n_1 \ge 1$
ðT 1 tÞ ðT 1 tÞ
n11 , n21 1
½J 2 i ii
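$$\begin{aligned}
&= \mathbb{E}\Big[\sum_{n_1^1 \ne n_1^2} \mathbf{1}_{(T^{(n_1^1)} \le t)}\, \mathbf{1}_{(T^{(n_1^2)} \le t)}\, \mathbb{E}\big[SMO_{n_1^1,f}(t - T^{(n_1^1)}) \mid \Pi^{(1)}\big]\, \mathbb{E}\big[SMO_{n_1^2,f}(t - T^{(n_1^2)}) \mid \Pi^{(1)}\big]\Big] \\
&= \mathbb{E}\Big[\Big(\sum_{n_1 \ge 1} \mathbf{1}_{(T^{(n_1)} \le t)}\, e_{m,f}(t - T^{(n_1)})\Big)^2\Big] - \mathbb{E}\Big[\sum_{n_1 \ge 1} \mathbf{1}_{(T^{(n_1)} \le t)}\, e_{m,f}^2(t - T^{(n_1)})\Big] \\
&= \mathbb{E}\Big[\Big(\int_0^t e_{m,f}(t-u)\, P^{(1)}(du)\Big)^2\Big] - \mathbb{E}\Big[\int_0^t e_{m,f}^2(t-u)\, P^{(1)}(du)\Big] \\
&= \Big(\int_0^t \lambda_p(u)\, e_{m,f}(t-u)\, du\Big)^2.
\end{aligned}$$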
Combining the three previous computations, we obtain
$$e_f^{(2)} = \big(\lambda_p * (f(X_m) + e_{m,f})\big)^2 + \lambda_p * f^2(X_m) + 2\lambda_p * \big(f(X_m)\, e_{m,f}\big) + \lambda_p * e_{m,f}^{(2)}. \tag{24}$$
$$e_{m,f}^{(2)}(t) \le C_1 + C_2 \int_0^t e_{m,f}^{(2)}(u)\, du,$$
and then for all T > 0,
$$\sup_{t \in [0,T]} e_{m,f}^{(2)}(t) \le C_{1,T} + C_{2,T} \sup_{t \in [0,T]} e_{m,f}^2 < +\infty,$$
A.3 Proof of Theorem 5.2
Using that Xm(s) ∈ [xm0, xm1] for all s ∈ ℝ₊, the σ-finiteness and
absolute continuity of μt (for any t ≥ 0) are direct consequences
of (Eq. 19). Denoting by $\tilde{\rho}_t$ its Radon–Nikodým density, Proposition 5.1
and Theorem 5.1 then yield
$$\int_{x_m^0}^{x_m^1} f(x)\, \mu_t(dx) = \int_{x_m^0}^{x_m^1} f(x)\, \tilde{\rho}_t(x)\, dx = \int_{x_m^0}^{x_m^1} f(x)\, \rho(x,t)\, dx,$$
for all $f \in C([x_m^0, x_m^1]) \cap L^\infty([x_m^0, x_m^1])$, which concludes the proof.
References
1. Çınlar E (2011) Probability and stochastics. Graduate texts in mathematics, vol 261. Springer, New York
2. Yu M, Bardia A, Wittner BS, Stott SL, Smas ME, Ting DT, Isakoff SJ, Ciciliano JC, Wells MN, Shah AM, Concannon KF, Donaldson MC, Sequist LV, Brachtel E, Sgroi D, Baselga J, Ramaswamy S, Toner M (2013) Circulating breast tumor cells exhibit dynamic changes in epithelial and mesenchymal composition. Science 339(6119):580–584
3. Nguyen DX, Bos PD, Massagué J (2009) Metastasis: from dissemination to organ-specific colonization. Nat Rev Cancer 9(4):274–284
4. Sahai E (2007) Illuminating the metastatic process. Nat Rev Cancer 7(10):737–749
5. WHO (2015) Cancer fact sheet. http://www.who.int/mediacentre/factsheets/fs297/en/. Accessed 14 Jan 2016
6. Pantel K, Cote RJ, Fodstad O (1999) Detection and clinical importance of micrometastatic disease. J Natl Cancer Inst 91(13):1113–1124
7. Scott JG, Gerlee P, Basanta D, Fletcher AG, Maini PK, Anderson ARA (2013) Mathematical modeling of the metastatic process. In: Malek A (ed) Experimental metastasis: modeling and analysis. Springer, Dordrecht, pp 189–208
8. Gupta GP, Massagué J (2006) Cancer metastasis: building a framework. Cell 127(4):679–695
9. Hanahan D, Weinberg RA (2011) Hallmarks of cancer: the next generation. Cell 144(5):646–674
10. Michor F, Nowak MA, Iwasa Y (2006) Stochastic dynamics of metastasis formation. J Theor Biol 240(4):521–530
11. Haeno H, Michor F (2010) The evolution of tumor metastases during clonal expansion. J Theor Biol 263(1):30–44
12. Anderson AR, Quaranta V (2008) Integrative mathematical oncology. Nat Rev Cancer 8(3):227–234
13. Koscielny S, Tubiana M, Lê MG, Valleron J, Mouriesse H, Contesso G, Sarrazin D (1984) Breast cancer: relationship between the size of the primary tumour and the probability of metastatic dissemination. Br J Cancer 49(6):709–715
14. Michaelson JS, Silverstein M, Wyatt J, Weber G, Moore R, Halpern E, Kopans DB, Hughes K (2002) Predicting the survival of patients with breast carcinoma using tumor size. Cancer 95(4):713–723
15. van de Vijver MJ, He YD, van’t Veer LJ, Dai H, Hart AAM, Voskuil DW, Schreiber GJ, Peterse JL, Roberts C, Marton MJ, Parrish M, Atsma D, Witteveen A, Glas A, Delahaye L, van der Velde T, Bartelink H, Rodenhuis S, Rutgers ET, Friend SH, Bernards R (2002) A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med 347(25):1999–2009
16. Hahnfeldt P, Panigrahy D, Folkman J, Hlatky L (1999) Tumor development under angiogenic signaling: a dynamical theory of tumor growth, response and postvascular dormancy. Cancer Res 59:4770–4775
17. Norton L (1988) A Gompertzian model of human breast cancer growth. Cancer Res 48:7067–7071
18. Verga F (2010) Modélisation mathématique de processus métastatiques. Ph.D. thesis, Aix-Marseille Université
19. Hart D, Shochat E, Agur Z (1998) The growth law of primary breast cancer as inferred from mammography screening trials data. Br J Cancer 78:382–387
20. Benzekry S, Lamont C, Beheshti A, Tracz A, Ebos JML, Hlatky L, Hahnfeldt P (2014) Classical mathematical models for description and prediction of experimental tumor growth. PLoS Comput Biol 10(8):e1003800
21. Bartoszyński R, Edler L, Hanin L, Kopp-Schneider A, Pavlova L, Tsodikov A, Zorin A, Yakovlev A (2001) Modeling cancer detection: tumor size as a source of information on unobservable stages of carcinogenesis. Math Biosci 171:113–142
22. Hanin L, Rose J, Zaider M (2006) A stochastic model for the sizes of detectable metastases. J Theor Biol 243:407–417
23. Iwata K, Kawasaki K, Shigesada N (2000) A dynamical model for the growth and size distribution of multiple metastatic tumors. J Theor Biol 203:177–186
24. Hartung N, Mollard S, Barbolosi D, Benabdallah A, Chapuisat G, Henry G, Giacometti S, Iliadis A, Ciccolini J, Faivre C, Hubert F (2014) Mathematical modeling of tumor growth and metastatic spreading: validation in tumor-bearing mice. Cancer Res 74:6397–6407
25. Benzekry S, Tracz A, Mastri M, Corbelli R, Barbolosi D, Ebos JML (2016) Modeling spontaneous metastasis following surgery: an in vivo-in silico approach. Cancer Res 76(3):535–547
26. Chaffer CL, Weinberg RA (2011) A perspective on cancer cell metastasis. Science 331(6024):1559–1564
27. Newton PK, Mason J, Bethel K, Bazhenova LA, Nieva J, Kuhn P (2012) A stochastic Markov chain model to describe lung cancer growth and metastasis. PLoS One 7(4):e34637
28. Newton PK, Mason J, Bethel K, Bazhenova L, Nieva J, Norton L, Kuhn P (2013) Spreaders and sponges define metastasis in lung cancer: a Markov chain Monte Carlo mathematical model. Cancer Res 73(9):2760–2769
29. Comen E, Norton L, Massague J (2011) Clinical implications of cancer self-seeding. Nat Rev Clin Oncol 8(6):369–377
30. Scott JG, Basanta D, Anderson AR, Gerlee P (2013) A mathematical model of tumour self-seeding reveals secondary metastatic deposits as drivers of primary tumour growth. J R Soc Interface 10(82):20130011
31. Hanin L, Zaider M (2011) Effects of surgery and chemotherapy on metastatic progression of prostate cancer: evidence from the natural history of the disease reconstructed through mathematical modeling. Cancers 3(3):3632–3660
32. Wheldon TE (1988) Mathematical models in cancer research. Medical science series. Adam Hilger, Bristol/Philadelphia
33. Benzekry S, Gandolfi A, Hahnfeldt P (2014) Global dormancy of metastases due to systemic inhibition of angiogenesis. PLoS One 9(1):e84249
34. Bethge A, Schumacher U, Wedemann G (2015) Simulation of metastatic progression using a computer model including chemotherapy and radiation therapy. J Biomed Inform 57:74–87
35. Lewis PAW, Shedler GS (1979) Simulation of nonhomogeneous Poisson processes by thinning. Nav Res Log Q 26(3):403
36. Sadahiro S, Suzuki T, Ishikawa K, Nakamura T, Tanaka Y, Masuda T, Mukoyama S, Yasuda S, Tajima T, Makuuchi H, Murayama C (2003) Recurrence patterns after curative resection of colorectal cancer in patients followed for a minimum of ten years. Hepatogastroenterology 50(53):1362–1366
37. Siegel R, DeSantis C, Virgo K, Stein K, Mariotto A, Smith T, Cooper D, Gansler T, Lerro C, Fedewa S, Lin C, Leach C, Cannady RS, Cho H, Scoppa S, Hachey M, Kirch R, Jemal A, Ward E (2012) Cancer treatment and survivorship statistics, 2012. CA Cancer J Clin 62(4):220–241
38. Batchelor GK (1967) An introduction to fluid dynamics. Cambridge University Press, Cambridge
39. Barbolosi D, Benabdallah B, Hubert F, Verga F (2009) Mathematical and numerical analysis for a model of growing metastatic tumors. Math Biosci 218:1–14
40. Hartung N (2015) Efficient resolution of metastatic tumour growth models by reformulation into integral equations. Discrete Contin Dyn Syst B 20:445–467
41. Lavielle M (2014) Mixed effects models for the population approach: models, tasks, methods and tools. Chapman & Hall/CRC biostatistics series. Chapman & Hall/CRC, Boca Raton
42. Tornøe CW, Overgaard RV, Agersø H, Nielsen HA, Madsen H, Jonsson EN (2005) Stochastic differential equations in NONMEM: implementation, application, and comparison with ordinary differential equations. Pharm Res 22(8):1247–1258
43. Bulfoni M, Gerratana L, Del Ben F, Marzinotto S, Sorrentino M, Turetta M, Scoles G, Toffoletto B, Isola M, Beltrami CA, Di Loreto C, Beltrami AP, Puglisi F, Cesselli D (2016) In patients with metastatic breast cancer the identification of circulating tumor cells in epithelial-to-mesenchymal transition is associated with a poor prognosis. Breast Cancer Res 18(1):30
44. Paoletti C, Hayes DF (2016) Circulating tumor cells. Adv Exp Med Biol 882:235–258
45. Chen LL, Blumm N, Christakis NA, Barabasi AL, Deisboeck TA (2009) Cancer metastasis networks and the prediction of progression patterns. Br J Cancer 101(5):749–758
Chapter 11
Abstract
Biophysical models designed to predict the growth and response of tumors to treatment have the potential
to become a valuable tool for clinicians in care of cancer patients. Specifically, individualized tumor forecasts
could be used to predict response or resistance early in the course of treatment, thereby providing an
opportunity for treatment selection or adaption. This chapter discusses an experimental and modeling
framework in which noninvasive imaging data is used to initialize and parameterize a subject-specific model
of tumor growth. This modeling approach is applied to an analysis of murine models of glioma growth.
Key words Cancer, Biophysical stress, Diffusion, Invasion, MRI, Finite difference method
1 Introduction
Louise von Stechow (ed.), Cancer Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 1711,
https://doi.org/10.1007/978-1-4939-7493-1_11, © Springer Science+Business Media, LLC 2018
226 David A. Hormuth II et al.
$$\nabla \cdot \sigma - \lambda_f \nabla N = 0, \tag{2}$$
where σ is the stress tensor and λf is the tumor cell-force coupling
constant. For implementation, Eq. 2 is rewritten in terms of the
tissue displacement ($\vec{u}$) under a linear elastic isotropic material
assumption in Eq. 3:
$$\nabla \cdot G \nabla \vec{u} + \nabla \frac{G}{1 - 2\nu} (\nabla \cdot \vec{u}) - \lambda_f \nabla N = 0, \tag{3}$$
where G is the shear modulus (a material property that represents
the constant of proportionality between shear stress and shear strain)
and ν is Poisson’s ratio (a material property that relates
lateral to longitudinal strain). The first two terms on the left-hand
side in Eq. 3 represent the linear-elastic description of tissue displacement,
while the third term represents a local body force generated
by the invading tumor. $\vec{u}$ is then used to calculate the local
normal (εxx, εyy, εzz) and shear strains (εxy, εxz, εyz). For small
deformations, strain εi,j is defined as the total deformation in the
Methods for a Mechanically Coupled Reaction-Diffusion Glioma Model 227
2 Materials
2.1 Dataset
The numerical methods presented in this chapter use an in vivo
dataset acquired in rats with intracranially inoculated glioma cells
[5, 18, 19]. Alternatively, an in silico dataset can also be used
[5]. For both approaches the dataset should contain:
1. Three-dimensional estimates of the distribution of tumor cells
at several time points.
2. Three-dimensional map of k (or initial guess).
3. Single value for D0 (or initial guess).
4. Values for G, ν, λD, λf, and θ (based on literature, calculation, or
assignment, see Note 1).
For use in Matlab this dataset should be saved as a “.mat” file
consisting of a 4D array of tissue cellularity, a 3D array of k values,
and one-element arrays of D0, G, ν, λD, λf, and θ all with double
precision.
2.2 Software/Hardware Requirements
The forward evaluation and parameter optimization of the mechanically
coupled model was run on a Dell PowerEdge R820 server
consisting of four Intel Xeon E5-4610 2.3 GHz processors with a
total of 256 GB of memory using Matlab 2015b. The forward
evaluation is relatively less computationally intensive and takes less
than 16 s for a 10 day simulation on a laptop with 8 GB of memory
and an Intel i5-2550 M 2.5 GHz processor. The parameter optimi-
zation computation time, however, depends on both the number of
parameters being estimated and the number of iterations of the
optimization algorithm until stopping criteria are met. Parallelization
of the parameter perturbation code can reduce computation
time by a factor approximately equal to the number of parallel
threads, at least up to about 16 threads; the gain levels off beyond that.
(For example, parameter perturbation for 100 parameters
takes 13.1 min with 1 thread, 3.1 min with 4 threads, 1.7 min with
8 threads, 0.9 min with 16 threads, and 0.7 min with 32 threads.)
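The speedup implied by these timings can be tabulated directly. The short sketch below uses only the numbers quoted above; the function name `speedup_table` is hypothetical, and "efficiency" is taken to mean speedup divided by thread count.

```python
# Wall-clock times (minutes) for perturbing 100 parameters, keyed by thread count.
timings_min = {1: 13.1, 4: 3.1, 8: 1.7, 16: 0.9, 32: 0.7}

def speedup_table(timings):
    """Map thread count to (speedup over 1 thread, parallel efficiency)."""
    base = timings[1]
    table = {}
    for threads, minutes in sorted(timings.items()):
        speedup = base / minutes
        table[threads] = (speedup, speedup / threads)
    return table

table = speedup_table(timings_min)
for threads, (speedup, efficiency) in table.items():
    print(f"{threads:>2} threads: speedup {speedup:5.1f}x, efficiency {efficiency:4.2f}")
```

Efficiency stays close to 1 through 16 threads and drops below 0.6 at 32 threads, so adding threads beyond that point is unlikely to pay off for a problem of this size.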
3 Methods
3.1 Animal Experiments
While details are presented in [5], we here discuss the salient
features of the experimental procedure (see Fig. 1). The in vivo
Fig. 1 Experimental timeline and estimation of in vivo cell number from DW-MRI data. (a) On day 0, rats are
injected intracranially with 10^5 C6 glioma cells. (b) Jugular catheters are then inserted on day 8. (c) On days
10 through 20, rats are imaged with MRI with 3D gradient echo, DW-MRI, and CE-MRI. (d) CE-MRI is used to
identify tumor tissue by subtracting pre-contrast image from the post-contrast image. (e) ADC(x, y, z, t ) is
then estimated from DW-MRI data. Finally, N(x, y, z, t ) is estimated (f) within the tumor tissue using Eq. 9 and
ADC(x, y, z, t )
3.2 Modeling
We now discuss the details of the finite difference simulation for Eqs. 1 and 2, the forward evaluation of the model system, and the parameter optimization and tumor growth prediction approach. Figure 2 shows an overview of the data collection, parameter optimization, and prediction approach. Briefly, data are acquired from t_i to t_f. A subset of the total data (days t_i to t_n, where t_n is less than t_f) is first used to determine the optimal model parameters. Once the stopping criteria are met for the parameter optimization approach, the optimized model parameters are then
Fig. 2 Tumor growth modeling and prediction flow chart. DW-MRI and CE-MRI data are first acquired in rats at days t_i to t_f. A subset of the total data (t_i to t_n) is first used to estimate model parameters using an iterative optimization algorithm. The optimized model parameters are then used in a forward evaluation of the model system to predict tumor growth at the remaining data points (t_{n+1} to t_f). The error is then assessed between the model and measured values of N(x, y, z, t)
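The calibration/prediction split in Fig. 2 can be sketched as a small helper. This is an illustrative Python sketch only: the function name is hypothetical, and the imaging days in the usage note are example values rather than the days used in the study.

```python
def split_time_points(days, n_calibration):
    """Split imaging days into a calibration set (t_i..t_n) used for
    parameter optimization and a prediction set (t_{n+1}..t_f)."""
    if not 1 <= n_calibration < len(days):
        raise ValueError("need at least one calibration and one prediction day")
    ordered = sorted(days)
    return ordered[:n_calibration], ordered[n_calibration:]
```

For example, `split_time_points([10, 12, 14, 16, 18, 20], 3)` assigns days 10, 12, and 14 to calibration and days 16, 18, and 20 to prediction.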
3.2.1 Finite Difference Simulation Setup
As an illustrative example, we show the derivation of the finite difference model for a 1D implementation, followed by its extension to the full 3D implementation. A Taylor series expansion is used to derive the finite difference approximation of the tumor cell model (Eq. 1), as shown for the 1D implementation in Eq. 10:

$$
\frac{N(x, t + h_t) - N(x, t)}{h_t} = \left( \frac{\delta N(x, t)}{2h_x}\,\frac{\delta D(x)}{2h_x} + D(x)\,\frac{\delta^2 N(x, t)}{h_x^2} \right) + k(x)\,N(x, t)\left( 1 - \frac{N(x, t)}{\theta} \right), \tag{10}
$$
where h_t is the time step, h_x is the grid spacing in the x-direction, and δ represents the central difference operator, defined below in Eqs. 11 and 12. Finite difference approximations are derived using a full grid approach to take advantage of the natural, voxelized gridding of the experimental imaging data.
232 David A. Hormuth II et al.
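The central difference approximation of the first derivative in the x-direction is shown in Eq. 11:

$$
\frac{\partial N(x, t)}{\partial x} \approx \frac{\delta N(x, t)}{2h_x} = \frac{N(x + h_x, t) - N(x - h_x, t)}{2h_x}. \tag{11}
$$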
Similarly, the central difference approximation of the second derivative in (for example) the x-direction is shown in Eq. 12:

$$
\frac{\partial^2 N(x, t)}{\partial x^2} \approx \frac{\delta^2 N(x, t)}{h_x^2} = \frac{N(x + h_x, t) - 2N(x, t) + N(x - h_x, t)}{h_x^2}. \tag{12}
$$
In the case of a mesh boundary, where the node at either (x + 1) or (x − 1) does not exist, the zero flux boundary condition (∂N/∂x = 0) can be used to relate N(x + h_x, t) to N(x − h_x, t) (or vice versa) as shown in Eq. 13:

$$
\frac{N(x + h_x, t) - N(x - h_x, t)}{2h_x} = 0 \;\Rightarrow\; N(x + h_x, t) = N(x - h_x, t). \tag{13}
$$
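To make Eqs. 10 through 13 concrete, the following is a minimal pure-Python sketch of one explicit 1D time step. It is not the authors' Matlab implementation: the function names are hypothetical, and applying the same mirrored ghost node to D at the boundary is an assumption made here for simplicity.

```python
def ghost(values, i):
    """Return values[i]; an index one step outside the grid is mirrored
    to the interior neighbour (zero-flux rule of Eq. 13)."""
    n = len(values)
    if i < 0:
        return values[1]
    if i >= n:
        return values[n - 2]
    return values[i]


def step_1d(N, D, k, theta, h_t, h_x):
    """Advance the cell number N by one explicit time step of Eq. 10.

    N, D, and k are lists of equal length (one entry per grid node),
    theta is the carrying capacity, h_t the time step, h_x the spacing.
    """
    updated = []
    for i in range(len(N)):
        dN = ghost(N, i + 1) - ghost(N, i - 1)                 # numerator of Eq. 11
        dD = ghost(D, i + 1) - ghost(D, i - 1)                 # same stencil applied to D
        d2N = ghost(N, i + 1) - 2.0 * N[i] + ghost(N, i - 1)   # numerator of Eq. 12
        diffusion = (dN / (2 * h_x)) * (dD / (2 * h_x)) + D[i] * d2N / h_x ** 2
        growth = k[i] * N[i] * (1.0 - N[i] / theta)
        updated.append(N[i] + h_t * (diffusion + growth))
    return updated
```

For a spatially uniform field the diffusion terms vanish and the update reduces to logistic growth. As with explicit diffusion schemes in general, the time step should be kept small relative to h_x²/D to avoid instability.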
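The 3D implementation of Eq. 1 is shown below in Eq. 14:

$$
\begin{aligned}
\frac{N(x, y, z, t + h_t) - N(x, y, z, t)}{h_t} ={} & \left( \frac{\delta N(x, y, z, t)}{2h_x}\,\frac{\delta D(x, y, z)}{2h_x} + D(x, y, z)\,\frac{\delta^2 N(x, y, z, t)}{h_x^2} \right) \\
& + \left( \frac{\delta N(x, y, z, t)}{2h_y}\,\frac{\delta D(x, y, z)}{2h_y} + D(x, y, z)\,\frac{\delta^2 N(x, y, z, t)}{h_y^2} \right) \\
& + \left( \frac{\delta N(x, y, z, t)}{2h_z}\,\frac{\delta D(x, y, z)}{2h_z} + D(x, y, z)\,\frac{\delta^2 N(x, y, z, t)}{h_z^2} \right) \\
& + k(x, y, z)\,N(x, y, z, t)\left( 1 - \frac{N(x, y, z, t)}{\theta} \right).
\end{aligned} \tag{14}
$$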
The derivation of the finite difference approximation of Eq. 2 is shown for the 1D implementation in Eqs. 15–17. Equation 2 is first rewritten in terms of the 1D stress in the x-direction (σ_x) in Eq. 15:

$$
\nabla \cdot \sigma_x(x) - \lambda_f \nabla N(x, t) = 0. \tag{15}
$$

σ_x is then replaced with Hooke's law for a linear elastic isotropic material (σ_x = E ε_x) in Eq. 16:
3.2.2 Forward Evaluation
A summary and example of the forward evaluation algorithm is presented in Fig. 3. The forward evaluation begins with solving the mechanical model (steps 1 through 4 in Fig. 3). At the beginning of each iteration, the gradient of the current distribution of tumor cells, ∇N(x, y, z, t), is calculated and assigned to {∇N} (step 1 in Fig. 3). {U} is then solved for in Eq. 21 (step 2 in Fig. 3). The strains (Eq. 4) and stresses (Eqs. 5 and 6) are calculated (step 3 in Fig. 3). σ_vm(x, y, z, t) is then used to update D(x, y, z, t) (Eq. 7, step 4 in Fig. 3). Finally, D(x, y, z, t) is used in the evaluation of Eq. 1 to determine N(x, y, z, t + 1) (step 5 in Fig. 3). The forward evaluation of the model system is then repeated at each simulation time step.
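The ordering of the five steps can be written as a short driver loop. The sketch below is a Python skeleton only: every callable passed in (`gradient`, `solve_displacement`, `stress_from_displacement`, `update_diffusion`, `step_tumor`) is a hypothetical placeholder for the corresponding model component and must be supplied by the user.

```python
def forward_evaluation(N0, n_steps, gradient, solve_displacement,
                       stress_from_displacement, update_diffusion, step_tumor):
    """Run the coupled model for n_steps and return the history of N.

    The callables are placeholders for the model components:
      gradient(N)                  -> {grad N}           (step 1)
      solve_displacement(grad_N)   -> {U}, Eq. 21        (step 2)
      stress_from_displacement(U)  -> sigma_vm, Eqs. 4-6 (step 3)
      update_diffusion(sigma_vm)   -> D, Eq. 7           (step 4)
      step_tumor(N, D)             -> N at t + 1, Eq. 1  (step 5)
    """
    N = N0
    history = [N0]
    for _ in range(n_steps):
        grad_N = gradient(N)
        U = solve_displacement(grad_N)
        sigma_vm = stress_from_displacement(U)
        D = update_diffusion(sigma_vm)
        N = step_tumor(N, D)
        history.append(N)
    return history
```

Passing the components in as functions keeps the loop independent of the grid dimension, so the same driver can be exercised with a 1D test problem before moving to the full 3D model.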
3.3 Parameter Optimization and Tumor Growth Prediction
The optimal model parameters are determined using an iterative Levenberg-Marquardt [32, 33] weighted least squares optimization:

$$
\left[ J^T W J + \alpha D_{J^T W J} \right] \{\Delta\beta\} = J^T W \left\{ N_{meas} - N_{model}(\beta) \right\}, \tag{22}
$$

where J is the Jacobian matrix, W is a diagonal weighting matrix, α is a damping parameter, D_{J^T W J} is a diagonal matrix consisting of the diagonal elements of J^T W J, {Δβ} is a vector of updates to the model parameters, {N_meas} is a vector of the measured cell number, and {N_model(β)} is a vector of the model-described cell number using the current best set of parameters β. J is an (n (number of voxels) × n_t (number of time points)) by p (number of model parameters) matrix, W is an (n × n_t) × (n × n_t) matrix, {Δβ} has p components, and {N_meas} has (n × n_t) components. J can be estimated using
Fig. 3 Algorithm and example forward evaluation of mechanical and tumor cell model. The mechanical model is first solved to calculate the tissue displacement vector {U} due to N(x, y, z, t), Eq. 21. {U} is then used to calculate strain, stress, and σ_vm(x, y, z, t). The new value of D(x, y, z, t) is calculated using Eq. 7 and σ_vm(x, y, z, t). Finally, D(x, y, z, t) is used in Eq. 1 to calculate the value of N(x, y, z, t + 1)
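A single parameter update from Eq. 22 can be sketched in pure Python for small problems. This is an illustrative sketch, not the chapter's Matlab code: the helper names are hypothetical, the weights are passed as the diagonal of W, and a simple Gaussian elimination stands in for a proper linear algebra library.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def lm_update(J, w, residual, alpha):
    """Return the update {delta beta} of Eq. 22.

    J: Jacobian as a list of rows (one row per observation, p columns).
    w: diagonal of the weighting matrix W (one weight per observation).
    residual: N_meas - N_model(beta), one entry per observation.
    alpha: damping parameter.
    """
    m, p = len(J), len(J[0])
    JTWJ = [[sum(w[i] * J[i][a] * J[i][b] for i in range(m)) for b in range(p)]
            for a in range(p)]
    JTWr = [sum(w[i] * J[i][a] * residual[i] for i in range(m)) for a in range(p)]
    # Add alpha times the diagonal of J^T W J (the D matrix in Eq. 22).
    A = [[JTWJ[a][b] + (alpha * JTWJ[a][a] if a == b else 0.0) for b in range(p)]
         for a in range(p)]
    return solve_linear(A, JTWr)
```

With alpha = 0 the update is the Gauss-Newton step, while larger alpha shortens the step and makes it more robust far from the optimum.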
3.4 Summary and Outlook
In this chapter, a modeling and experimental framework was described which can be used to individualize a predictive biophysical model from an individual patient's imaging data. Clinically available imaging measurements from CE-MRI and DW-MRI were used to provide serial estimates of tumor cell number that were then used in an inverse problem to optimize model parameters for the measured tumor. These individually optimized model parameters could then be used to predict future growth or response. For example, acquiring data early in the course of a patient's therapy could be used to calibrate a patient-specific model that could
4 Notes
Acknowledgments
References
1. Yankeelov TE, Quaranta V, Evans KJ, Rericha EC (2015) Toward a science of tumor forecasting for clinical oncology. Cancer Res 75(6):918–923
2. Atuegwu NC, Gore JC, Yankeelov TE (2010) The integration of quantitative multi-modality imaging data into mathematical models of tumors. Phys Med Biol 55(9):2429–2449
3. Atuegwu NC, Colvin DC, Loveless ME, Xu L, Gore JC, Yankeelov TE (2012) Incorporation of diffusion-weighted magnetic resonance imaging data into a simple mathematical model of tumor growth. Phys Med Biol 57(1):225–240
4. Weis JA, Miga MI, Arlinghaus LR, Li X, Chakravarthy AB, Abramson V et al (2013) A mechanically coupled reaction-diffusion model for predicting the response of breast tumors to neoadjuvant chemotherapy. Phys Med Biol 58(17):5851–5866
5. Hormuth DA II, Weis JA, Barnes SL, Miga MI, Rericha EC, Quaranta V et al (2015) Predicting in vivo glioma growth with the reaction diffusion equation constrained by quantitative magnetic resonance imaging data. Phys Biol 12(4):46006
6. Weis JA, Miga MI, Arlinghaus LR, Li X, Abramson V, Chakravarthy AB et al (2015) Predicting the response of breast cancer to neoadjuvant therapy using a mechanically coupled reaction-diffusion model. Cancer Res 75(22):4697–4707
7. Baldock A, Rockne R, Boone A, Neal M, Bridge C, Guyman L et al (2013) From patient-specific mathematical neuro-oncology to precision medicine. Front Oncol 3:62
8. Corwin D, Holdsworth C, Rockne RC, Trister AD, Mrugala MM, Rockhill JK et al (2013) Toward patient-specific, biologically optimized radiation therapy plans for the treatment of glioblastoma. PLoS One 8(11):e79115
9. Hogea C, Davatzikos C, Biros G (2008) An image-driven parameter estimation problem for a reaction-diffusion glioma growth model with mass effects. J Math Biol 56(6):793–825
10. Liu Y, Sadowski SM, Weisbrod AB, Kebebew E, Summers RM, Yao J (2014) Patient specific tumor growth prediction using multimodal images. Med Image Anal 18(3):555–566
11. Konukoglu E, Clatz O, Menze BH, Stieltjes B, Weber M-A, Mandonnet E et al (2010) Image guided personalization of reaction-diffusion type tumor growth models using modified anisotropic eikonal equations. IEEE Trans Med Imaging 29:77–95
12. Garg I, Miga MI (2008) Preliminary investigation of the inhibitory effects of mechanical stress in tumor growth. Proc SPIE 29:69182L-11
13. Venes D (2013) Taber's® cyclopedic medical dictionary, 22nd edn. F. A. Davis Company, Philadelphia, PA
14. DeAngelis LM (2001) Brain tumors. N Engl J Med 344(2):114–123
15. Helmlinger G, Netti PA, Lichtenbeld HC, Melder RJ, Jain RK (1997) Solid stress inhibits the growth of multicellular tumor spheroids. Nat Biotechnol 15(8):778–783
16. Padhani AR, Liu G, Mu-Koh D, Chenevert TL, Thoeny HC, Takahara T et al (2009) Diffusion-weighted magnetic resonance imaging as a cancer biomarker: consensus and recommendations. Neoplasia 11(2):102–125
17. Yankeelov TE, Gore JC (2009) Dynamic contrast enhanced magnetic resonance imaging in oncology: theory, data acquisition, analysis, and examples. Curr Med Imaging Rev 3(2):91–107
18. Barth R, Kaur B (2009) Rat brain tumor models in experimental neuro-oncology: the C6, 9L, T9, RG2, F98, BT4C, RT-2 and CNS-1 gliomas. J Neuro-Oncol 94(3):299–312
19. Hormuth DA II, Weis JA, Barnes SL, Miga MI, Rericha EC, Quaranta V, Yankeelov TE (2017) A mechanically-coupled reaction-diffusion model that incorporates intra-tumoral heterogeneity to predict in vivo glioma growth. J R Soc Interface 14:128
20. Barnes SL, Sorace AG, Loveless ME, Whisenant JG, Yankeelov TE (2015) Correlation of tumor characteristics derived from DCE-MRI and DW-MRI with histology in murine models of breast cancer. NMR Biomed 28(10):1345–1356
21. Anderson AW, Xie J, Pizzonia J, Bronen RA, Spencer DD, Gore JC (2000) Effects of cell volume fraction changes on apparent diffusion in human cells. Magn Reson Imaging 18(6):689–695
22. Guo Y, Cai Y-Q, Cai Z-L, Gao Y-G, An N-Y, Ma L et al (2002) Differentiation of clinically benign and malignant breast lesions using diffusion-weighted imaging. J Magn Reson Imaging 16(2):172–178
23. Sugahara T, Korogi Y, Kochi M, Ikushima I, Shigematu Y, Hirai T et al (1999) Usefulness of diffusion-weighted MRI with echo-planar technique in the evaluation of cellularity in gliomas. J Magn Reson Imaging 9(1):53–60
24. Humphries PD, Sebire NJ, Siegel MJ, Olsen ØE (2007) Tumors in pediatric patients at diffusion-weighted MR imaging: apparent diffusion coefficient and tumor cellularity. Radiology 245(3):848–854
25. Whisenant JG, Ayers GD, Loveless ME, Barnes SL, Colvin DC, Yankeelov TE (2014) Assessing reproducibility of diffusion-weighted magnetic resonance imaging studies in a murine model of HER2+ breast cancer. Magn Reson Imaging 32(3):245–249
26. Martin I, Dozin B, Quarto R, Cancedda R, Beltrame F (1997) Computer-based technique for cell aggregation analysis and cell aggregation in in vitro chondrogenesis. Cytometry 28(2):141–146
27. Rouzaire-Dubois B, Milandri JB, Bostel S, Dubois JM (2000) Control of cell proliferation by cell volume alterations in rat C6 glioma cells. Pflugers Arch 440(6):881–888
28. Elkin BS, Ilankovan AI, Morrison B III (2011) A detailed viscoelastic characterization of the P17 and adult rat brain. J Neurotrauma 28:2235
29. Lee SJ, King MA, Sun J, Xie HK, Subhash G, Sarntinoranont M (2014) Measurement of viscoelastic properties in multiple anatomical regions of acute rat brain tissue slices. J Mech Behav Biomed Mater 29:213–224
30. Lynch D (2005) Numerical partial differential equations for environmental scientists and engineers: a first practical course. Springer, New York, NY
31. Miga MI, Paulsen KD, Lemery JM, Eisner SD, Hartov A, Kennedy FE et al (1999) Model-updated image guidance: initial clinical experiences with gravity-induced brain deformation. IEEE Trans Med Imaging 10:866–874
32. Levenberg K (1944) A method for the solution of certain non-linear problems in least squares. Q J Appl Mathematics II(2):164–168
33. Marquardt DW (1963) An algorithm for least-squares estimation of nonlinear parameters. J Soc Ind Appl Math 11(2):431–441
34. Eisenhauer EA, Therasse P, Bogaerts J, Schwartz LH, Sargent D, Ford R et al (2009) New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). Eur J Cancer 45(2):228–247
35. Yankeelov TE, Atuegwu N, Hormuth DA, Weis JA, Barnes SL, Miga MI et al (2013) Clinically relevant modeling of tumor growth and treatment response. Sci Transl Med 5(187):187ps9
36. Marino S, Hogue IB, Ray CJ, Kirschner DE (2008) A methodology for performing global uncertainty and sensitivity analysis in systems biology. J Theor Biol 254(1):178–196
37. Broyden CG (1965) A class of methods for solving nonlinear simultaneous equations. Math Comput 19(92):577–593
Chapter 12
Abstract
Tumor infiltrating leukocytes (TILs) are an integral component of the tumor microenvironment and have
been found to correlate with prognosis and response to therapy. Methods to enumerate immune subsets
such as immunohistochemistry or flow cytometry suffer from limitations in phenotypic markers and can be
challenging to practically implement and standardize. An alternative approach is to acquire aggregative high
dimensional data from cellular mixtures and to subsequently infer the cellular components computationally.
We recently described CIBERSORT, a versatile computational method for quantifying cell fractions from
bulk tissue gene expression profiles (GEPs). Combining support vector regression with prior knowledge of
expression profiles from purified leukocyte subsets, CIBERSORT can accurately estimate the immune
composition of a tumor biopsy. In this chapter, we provide a primer on the CIBERSORT method and
illustrate its use for characterizing TILs in tumor samples profiled by microarray or RNA-Seq.
Key words Cancer immunology, Deconvolution, Support vector regression (SVR), Tumor infiltrat-
ing leukocytes (TILs), Tumor microenvironment, Tumor heterogeneity, Gene expression, Microarray,
RNA-Seq, TCGA
1 Introduction
Louise von Stechow (ed.), Cancer Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 1711,
https://doi.org/10.1007/978-1-4939-7493-1_12, © Springer Science+Business Media, LLC 2018
244 Binbin Chen et al.
[Figure 1 panel labels: bulk tissue/tumor or blood draw, RNA profile, purified cell RNA profiles, signature matrix, CIBERSORT, cell proportions, significance analysis]
Fig. 1 Overview of CIBERSORT. As input, CIBERSORT requires a “signature matrix” comprised of barcode
genes that are enriched in each cell type of interest. Once a suitable knowledgebase is created and validated,
CIBERSORT can be applied to characterize cell type proportions in bulk tissue expression profiles. Although
originally validated using a signature matrix containing 22 functionally defined human immune subsets (LM22)
profiled by microarrays, CIBERSORT is a general framework that can be applied to diverse cell phenotypes and
genomic data types, including RNA-Seq. To quantitatively capture deconvolution confidence, CIBERSORT
calculates several quality control metrics, including a deconvolution p-value
2 Materials
Table 1
Format of input mixture files (tab separated plain text)
3 Methods
3.2 Enumerating TIL Subsets with LM22
LM22 is a signature matrix file consisting of 547 genes that accurately distinguish 22 mature human hematopoietic populations isolated from peripheral blood or in vitro culture conditions, including seven T cell types, naive and memory B cells, plasma cells, NK cells, and myeloid subsets. LM22 was designed and extensively validated using gene expression microarray data, but is also applicable to RNA-Seq data for hypothesis generation (see Note 1). Here, we illustrate how to prepare Affymetrix microarray data for use with LM22, and how to run CIBERSORT with LM22 to characterize the leukocyte composition of prostate biopsies obtained from patients with prostate cancer and from healthy subjects. To follow the examples in this section, download GSE55945 CEL files from GEO (https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE55945&format=file). Processed data for GSE55945 can be downloaded from the CIBERSORT website.
Profiling Tumor Infiltrating Immune Cells with CIBERSORT 249
3.2.1 General Tips for Mixture File Preparation
Gene expression data must be preprocessed as specified in Subheadings 2 and 3.2.2. Because LM22 uses HUGO gene symbols (e.g., CD8A, MS4A1, CTLA4, etc.), all mixture files need to possess matching HUGO identifiers. See Note 2 for using non-HUGO gene symbols. Importantly, all expression values should be in non-log (i.e., linear) space with positive numerical values and no missing data. Not all signature matrix genes need to be present in the mixture expression data, but performance will improve with the presence of more signature genes.
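The formatting requirements above lend themselves to an automated pre-upload check. The following Python sketch is a hypothetical helper, not part of CIBERSORT; the threshold of 50 used to flag possibly log-transformed data is a rule of thumb assumed here, and zeros are tolerated while negative values are flagged.

```python
import math


def check_mixture(text):
    """Check tab-separated mixture text (header row, then one row per gene).

    Returns a list of human-readable problems; an empty list means that
    none of the checks below were triggered.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_samples = len(lines[0].split("\t")) - 1
    problems = []
    largest = 0.0
    for ln in lines[1:]:
        fields = ln.split("\t")
        gene, values = fields[0], fields[1:]
        if len(values) != n_samples:
            problems.append(f"{gene}: expected {n_samples} values, found {len(values)}")
            continue
        for v in values:
            try:
                x = float(v)
            except ValueError:
                problems.append(f"{gene}: missing or non-numeric value {v!r}")
                continue
            if math.isnan(x):
                problems.append(f"{gene}: missing value (NaN)")
                continue
            if x < 0:
                problems.append(f"{gene}: negative value {x}")
            largest = max(largest, x)
    # Assumed heuristic: linear-scale expression data usually contain
    # values far above 50, whereas log2-scale data rarely do.
    if largest < 50:
        problems.append("maximum value below 50: data may be log-transformed")
    return problems
```

Calling `check_mixture(open("prostate_cancer.txt").read())` before uploading would list any rows that need attention.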
3.2.3 Running CIBERSORT
Before running CIBERSORT, all mixture files need to be uploaded (Menu > “Upload Files”). The user needs to select “Mixture” when uploading mixture files. After uploading the correctly formatted mixture file (e.g., prostate_cancer.txt) to the website, go to “Run CIBERSORT” under Menu (see Fig. 2). Select “LM22 (22 immune cell types)” for “Signature gene file.” When clicking “Mixture file,” the uploaded mixture file will be one of the options. Select “Run” after choosing both the mixture file of interest and a permutation number. At least 100 permutations are recommended to achieve statistical rigor.
To run CIBERSORT locally in R, navigate to the directory containing the CIBERSORT.R script, and run the following commands within the R terminal:
Fig. 2 CIBERSORT web interface. All the files except the LM22 gene signature need to be uploaded to the
CIBERSORT website before proceeding to this page. When using LM22, the user will need to select the
uploaded mixture file and specify “LM22 (22 immune cell types)” for the signature gene file. When creating
custom gene signatures, a reference sample file and a phenotype classes file are required, and need to be
uploaded to the webserver. For CIBERSORT to generate a meaningful p-value, we recommend at least
100 permutations; however, this parameter can be set to a small number for exploratory analyses
> source('CIBERSORT.R')
> results <- CIBERSORT('sig_matrix_file.txt', 'mixture_file.txt',
perm=100, QN=TRUE)
Deconvolution output will be saved to a results object in R and
written to disk as CIBERSORT-results.txt in the same directory.
In this example, sig_matrix_file.txt should be “LM22.txt”
(obtain under Menu>Download); mixture_file.txt should be
“prostate_cancer.txt”; perm is an integer number for the number
of permutations; and QN is a Boolean value (TRUE or FALSE) for
performing quantile normalization. QN is set to TRUE by default
and recommended when the gene signature matrix is derived from
several different studies or sample batches.
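For readers unfamiliar with quantile normalization, the idea behind the QN option can be illustrated in a few lines. This Python sketch shows the generic technique only and is not taken from the CIBERSORT script: ties are broken by position rather than averaged, and all samples are assumed to have the same number of genes.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length samples (one list of values per sample).

    Each value is replaced by the mean, across samples, of the values
    that share its rank, so all samples end up with the same distribution.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(sc[r] for sc in sorted_cols) / len(columns) for r in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # indices from lowest to highest
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]
        normalized.append(out)
    return normalized
```

For two samples `[5, 2, 3]` and `[4, 1, 6]`, the rank means are 1.5, 3.5, and 5.5, so the normalized samples become `[5.5, 1.5, 3.5]` and `[3.5, 1.5, 5.5]`.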
3.2.4 Interpretation of Results
Once the online analysis is complete, the website will output a stacked bar plot (see Fig. 3) and a heat map (see Fig. 4). The output
Fig. 3 Inferred composition of 22 immune cell subsets in malignant and normal prostate biopsies (related to
Subheading 3.2). The results were generated using CIBERSORT and the built-in LM22 immune cell gene
signature, and the stacked bar plot display was automatically generated by the CIBERSORT webserver
Fig. 4 Estimated proportions of six major leukocyte subsets (B cells, CD8 T cells, CD4 T cells, NK cells,
monocytes/macrophages, neutrophils) in skin cutaneous melanoma tumor biopsies profiled by The Cancer
Genome Atlas (TCGA). The results were determined using a custom RNA-Seq leukocyte signature matrix
(“LM6,” Subheading 3.3.3), and the heat map figure was generated by the CIBERSORT webserver
3.3 TIL Characterization with a Custom Signature Matrix
A custom signature matrix can be created using data from purified cell populations. While the process to generate a custom matrix from expression profiles is straightforward, the performance of a custom matrix will depend on the quality of the data used to generate it.
3.3.1 Generation of Expression Profiles for Custom Gene Signature Matrix Creation
Immunophenotyping of leukocytes is a dynamic field with new immune populations continuing to be identified. Care should be taken in determining which immune “cell types” should be included in the signature matrix and which canonical markers should be used to isolate these populations. For example, it is clear that the population of “CD4-expressing T lymphocytes” encompasses heterogeneous populations with diverse functional phenotypes including naive, memory, Th1, Th2, Th17, T-regulatory cells, and T follicular helper cells. Replicates for each purified immune cell type are required to gauge variance in the expression profile (see Note 4 for further details). The platform and methods used to generate data for the signature matrix ideally should be identical to those applied to the analysis of the mixture samples. See Note 3 for analyzing murine data. While SVR is robust to unknown cell populations, performance can be adversely affected by genes that are highly expressed in a relevant unknown cell population (e.g., in the malignant cells) but not by any immune components present in the signature matrix. A simple option implemented in CIBERSORT to limit this effect is to remove genes highly expressed in non-hematopoietic cells or tumor cells. If expression data is available from purified tumor cells for the malignancy to be studied, this can be used as a guideline to filter other confounding genes from the signature matrix.
3.3.2 Input Data Preparation
The mixture input data format for the custom signature gene matrix option is identical to that used for analysis with the LM22 signature gene matrix (Subheading 3.2.1). To generate the custom signature gene matrix, the user needs to provide a reference sample file containing the GEPs for each purified immune population of interest, and a phenotype class file assigning the profiles to each phenotypic type of immune cell to be included in the signature matrix. The expression data in the reference sample file should be in non-log (i.e., linear) space with genes listed in the rows and reference populations listed in columns. The phenotype class file lists the desired cell populations in the signature matrix in rows and the purified reference samples contained in the reference sample file in columns (refer to the CIBERSORT website manual for more details). These must be listed in the exact same order as the reference sample file. The cells are used to assign phenotypic classes to
Table 2
Format of input files to generate reference files and class files necessary for custom gene signatures (tab separated plain text)

Gene symbol (required)    Cell type Name1    Cell type Name1    Cell type Name1    Cell type Name2    Cell type Name2    Cell type Name2    ...
Gene1
Gene2
...
3.3.3 Creating the Signature Matrix
In the following two sections, we describe how to create a custom leukocyte signature matrix and apply it to study cellular heterogeneity and TIL survival associations in melanoma tumors profiled by The Cancer Genome Atlas (TCGA). Readers can follow along by creating “LM6,” a leukocyte RNA-Seq signature matrix comprised of six peripheral blood immune subsets (B cells, CD8 T cells, CD4 T cells, NK cells, monocytes/macrophages, neutrophils; GSE60424 [21]). Key input files are provided on the CIBERSORT website (“Menu>Download”).
A custom signature file can be created by uploading the Reference sample file and the Phenotype classes file (Subheading 3.3.2) to the online CIBERSORT application (see Fig. 2) or can be created using the downloadable Java package. To build a custom gene signature matrix with the latter, the user should download the Java package from the CIBERSORT website and place all relevant files under the package folder. To link Java with R, start Rserve within R and then call the Java package from the command line:
Within R:
> library(Rserve)
> Rserve(args="--no-save")
Command line:
> java -Xmx3g -Xms3g -jar CIBERSORT.jar -M Mixture_file -P Reference_sample_file -c phenotype_class_file -f
The last argument (-f) will eliminate non-hematopoietic genes
from the signature matrix and is generally recommended for signa-
ture matrices tailored to leukocyte deconvolution. The user can also
run this step on the website by choosing the corresponding refer-
ence sample file and phenotype class file (see Fig. 2). The CIBER-
SORT website will generate a gene signature matrix located under
“Uploaded Files” for future download.
Following signature matrix creation, quality control measures
should be taken to ensure robust performance (see “Calibration of
in silico TIL profiling methods” in Newman et al.) [18]. Factors
that can adversely affect signature matrix performance include poor
input data quality, significant deviations in gene expression between
cell types that reside in different tissue compartments (e.g., blood
versus tissue), and cell populations with statistically indistinguish-
able expression patterns. Manual filtering of poorly performing
genes in the signature matrix (e.g., genes expressed highly in the
tumor of interest) may improve performance.
To benchmark our custom leukocyte matrix (LM6), we compared it to LM22 using a set of TCGA lung squamous cell carcinoma tumors profiled by RNA-Seq and microarray (n = 130 pairs). Deconvolution results were significantly correlated for all cell subsets shared between the two signature matrices (p < 0.0001). Notably, since LM6 was derived from leukocytes isolated from peripheral blood [21, 22], we restricted the CD4 T cell comparison to naive and resting memory CD4 T cells in LM22. Once validation is complete, a CIBERSORT signature matrix can be broadly applied to mixture samples as described in Subheading 3.3 (e.g., see Fig. 4).
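A comparison of this kind requires collapsing the finer LM22 subsets to the coarser LM6 classes before correlating the two sets of estimates. The Python sketch below illustrates the idea with a plain Pearson correlation; the function names are hypothetical, and the column labels in the usage note are assumed to match the LM22 output headers.

```python
import math


def collapse(samples, members):
    """Sum the listed subset fractions within each sample.

    samples: list of {subset name: fraction} dictionaries, one per tumor.
    members: names of the subsets to merge into one coarser class.
    """
    return [sum(sample[name] for name in members) for sample in samples]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

For the CD4 comparison, `collapse(lm22_results, ["T cells CD4 naive", "T cells CD4 memory resting"])` would give one value per tumor, which can then be passed to `pearson_r` together with the LM6 CD4 T cell estimates for the same tumors.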
3.4 Correlating TIL Levels with Clinical Outcomes
Associations with clinical indices and outcomes are commonly assessed using a log-rank test for binary variables and Cox proportional hazards regression for continuous variables. There are a number of freely available tools for such analyses. We typically use the R “survival” package or the Python “lifelines” package. To illustrate TIL survival analysis in primary tumor samples, we applied LM6 (Subheading 3.3.3) to 473 TCGA skin cutaneous melanoma tumor samples profiled by RNA-Seq (see Fig. 4). We then analyzed the influence of estimated CD8 T cell levels on overall survival. Higher levels of CD8 T lymphocytes were associated with favorable overall survival in both dichotomous (Fig. 5) and continuous models (p = 0.013, Cox regression), consistent with previous studies [1, 2].
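The dichotomous analysis can be illustrated without external packages by splitting samples on the estimated CD8 T cell level and computing a Kaplan-Meier curve for each group. The following Python sketch makes two assumptions that are not stated in the text: the cut point is the median, and the function names are hypothetical. For real analyses, the packages named above remain the better choice because they also provide the log-rank test and Cox regression.

```python
def split_by_median(values):
    """Label each sample 'high' or 'low' relative to the median value."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2:
        median = ordered[n // 2]
    else:
        median = 0.5 * (ordered[n // 2 - 1] + ordered[n // 2])
    return ["high" if v > median else "low" for v in values]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times: follow-up time per patient; events: 1 = death, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients leave the risk set
    return curve
```

Running `kaplan_meier` separately on the "high" and "low" groups yields the two step curves that a plot such as Fig. 5 would display.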
[Fig. 5 plot: overall survival versus time (months); n = 364 tumors]
3.5 Use of CIBERSORT to Infer Absolute TIL Levels
By default, CIBERSORT estimates the relative fraction of each cell type in the signature matrix, such that the sum of all fractions is equal to 1 for a given mixture sample. CIBERSORT can also be used to produce a score that quantitatively measures the overall abundance of each cell type (as described in “Analysis of deconvolution consistency” in Newman et al.) [17]. Briefly, the absolute immune fraction score is estimated by the median expression level of all genes in the signature matrix divided by the median expression level of all genes in the mixture. Using this metric coupled with LM22, we have found that CIBERSORT effectively captures overall immune content in RNA-Seq and microarray datasets when benchmarked against other methods. These include H&E staining and computational inference by ESTIMATE [23], a previously published method for determining overall immune content in tumor expression profiles.
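The score described above can be sketched for a single mixture sample as follows. This Python sketch reflects one reading of the description, namely that the numerator is the median mixture expression of the signature matrix genes and that absolute levels are obtained by scaling the relative fractions by the score; both points are assumptions, and the function names are hypothetical.

```python
from statistics import median


def absolute_score(mixture, signature_genes):
    """Median expression of signature genes divided by the median of all genes.

    mixture: {gene symbol: expression in linear space} for one sample.
    signature_genes: gene symbols in the signature matrix; genes that are
    absent from the mixture are skipped.
    """
    shared = [mixture[g] for g in signature_genes if g in mixture]
    return median(shared) / median(mixture.values())


def to_absolute(relative_fractions, score):
    """Scale relative fractions (summing to 1) by the sample's score."""
    return {cell: fraction * score for cell, fraction in relative_fractions.items()}
```

Unlike relative fractions, the scaled values do not sum to 1, so they can be compared across samples with different overall immune content.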
Absolute results can be easily accessed from the CIBERSORT website by toggling the output between relative and absolute modes in the Results page (see online manual for details). When using the R script (Subheading 3.2.3), the user should download the latest version of the script and set “absolute=TRUE.” For example:
results <- CIBERSORT('sig_matrix_file.txt', 'mixture_file.txt',
perm=100, absolute=TRUE)
4 Notes
Acknowledgments
We would like to thank David Steiner, M.D., Ph.D. for his assis-
tance in generating the RNA-Seq derived signature matrix. This
work is supported by grants from the Doris Duke Charitable Foun-
dation (A.A.A.), the Damon Runyon Cancer Research Foundation
(A.A.A.), the B&J Cardan Oncology Research Fund (A.A.A.), the
Ludwig Institute for Cancer Research (A.A.A.), NIH grant
1K99CA187192-01A1 (A.M.N.), NIH grant PHS NRSA 5T32
CA09302-35 (A.M.N.), US Department of Defense grant
W81XWH-12-1-0498 (A.M.N.), a grant from the Siebel Stem
Cell Institute and the Thomas and Stacey Siebel Foundation
(A.M.N.), an NIH/Stanford MSTP training grant (B.C.), and a
PD Soros Fellowship (B.C.).
References
1. Fridman WH, Pagès F, Sautès-Fridman C, Galon J (2012) The immune contexture in human tumours: impact on clinical outcome. Nat Rev Cancer 12(4):298–306. https://doi.org/10.1038/nrc3245
2. Gentles AJ, Newman AM, Liu CL, Bratman SV, Feng W, Kim D, Nair VS, Xu Y, Khuong A, Hoang CD, Diehn M, West RB, Plevritis SK, Alizadeh AA (2015) The prognostic landscape of genes and infiltrating immune cells across human cancers. Nat Med 21(8):938–945. https://doi.org/10.1038/nm.3909
3. Tumeh PC, Harview CL, Yearley JH, Shintaku IP, Taylor EJ, Robert L, Chmielowski B, Spasic M, Henry G, Ciobanu V, West AN, Carmona M, Kivork C, Seja E, Cherry G, Gutierrez AJ, Grogan TR, Mateus C, Tomasic G, Glaspy JA, Emerson RO, Robins H, Pierce RH, Elashoff DA, Robert C, Ribas A (2014) PD-1 blockade induces responses by inhibiting adaptive immune resistance. Nature 515(7528):568–571. https://doi.org/10.1038/nature13954
4. Herbst RS, Soria JC, Kowanetz M, Fine GD, Hamid O, Gordon MS, Sosman JA, McDermott DF, Powderly JD, Gettinger SN, Kohrt HE, Horn L, Lawrence DP, Rost S, Leabman M, Xiao Y, Mokatrin A, Koeppen H, Hegde PS, Mellman I, Chen DS, Hodi FS (2014) Predictive correlates of
Abstract
The complex network of the tissue system, in both pre-neoplastic tissues and tumors, demonstrates the
need for a systems biology approach to cancer pathology, in which quantification of key tissue system
processes is combined with informatics tools to produce actionable scores to aid clinical decision-making. A
systems biology approach to cancer pathology enables integration of key system features that are relevant to
diagnoses, patient outcomes, and responses to therapies. Key tissue system features relevant to cancer
pathology include molecular and morphologic abnormalities in epithelia, cellular changes in the stroma
such as immune infiltrates, and relationships between components of the system, such as interactions and
spatial relationships between epithelial and stromal components, and also between specific immune cell
subsets. Here, we describe a method for objective quantification of multiple epithelial and stromal
biomarkers in the context of tissue architecture to generate a high-dimensional tissue profile that can be
used to build multivariable predictive models for cancer pathology.
Key words Biomarkers, Multiplexed immunofluorescence, Whole slide fluorescence imaging, Digital
pathology, Quantitative image analysis, Cancer systems biology
1 Introduction
Louise von Stechow (ed.), Cancer Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 1711,
https://doi.org/10.1007/978-1-4939-7493-1_13, © Springer Science+Business Media, LLC 2018
262 Aaron DeWard and Rebecca J. Critchley-Thorne
2 Materials
3 Methods
3.1 Multiplexed Immunofluorescence Slide Labeling Procedure
Program the BondRX autostainer to perform the following steps
(application volume is 150 μL for all the reagent steps):
1. Bake slides (no reagent), incubation time 30 min, 60 °C.
2. Apply Bond Dewax solution, incubation time 30 min, 72 °C.
3. Reapply Bond Dewax solution, incubation time 0 s, 72 °C.
4. Reapply Bond Dewax solution, incubation time 0 s, ambient
temperature (see Note 8).
5. Apply ethanol, incubation time 0 s, ambient temperature,
repeat twice for a total of three ethanol washes.
6. Apply Bond Wash (diluted to 1× with deionized water),
incubation time 0 s, ambient temperature, repeat for a total of three
washes with Bond Wash.
3.2 Whole Slide Fluorescence Scanning
1. Calibrate the light source to a steady absolute output using the
X-Cite® XR2100 Power Meter. An output of 2.2 W will ensure
adequate illumination for most imaging applications and can
be maintained for 1500–2500 h of scanning, depending on the
initial attainable wattage of the bulb.
Tissue Systems Pathology 265
Fig. 1 Representative images of multiplexed panels of tissue system biomarkers in Barrett’s esophagus pinch
biopsies. Sections of Barrett’s esophagus pinch biopsies were fluorescently immunolabeled for the
multiplexed panels of biomarkers described in Notes 4 and 6. Whole slide images were acquired at 20×
magnification using the ScanScope FL. (Panels a–d) (a) HIF-1α-green, (b) CD45RO-red, (c) CD1a-yellow,
(d) HIF-1α-green, CD1a-yellow overlay demonstrating infiltration of the lamina propria by cells expressing
HIF-1α, which indicates stromal angiogenesis, and also memory lymphocytes and dendritic cells. (Panels
e–h) (e) HIF-1α-green (f) CD45RO-red, (g) CD1a-yellow, (h) HIF-1α-green, CD45RO-red, CD1a-yellow overlay,
providing an additional example of infiltration of the lamina propria by cells expressing HIF-1α, memory
lymphocytes and dendritic cells. (Panels i–l) (i) p16-green, (j) AMACR-red, (k) p53-yellow, (l) p16-green,
AMACR-red, p53-yellow overlay showing loss of p16, focal overexpression of AMACR and overexpression of
p53. (Panels m–p) (m) p16-green, (n) AMACR-red, (o) p53-yellow, (p) p16-green, AMACR-red, p53-yellow
overlay showing normal/positive expression of p16, multi-focal overexpression of AMACR and loss of p53.
Hoechst shown in blue in all panels
Fig. 2 Cellular object segmentation and tissue structure segmentation to enable quantitative, contextual
feature measurements. The TissueCypher® Image Analysis Platform was used to detect a Barrett’s esophagus
biopsy and segment subcellular compartments and tissue objects. (a) Barrett’s esophagus biopsy labeled for
p16 (green), AMACR (red), p53 (yellow), and Hoechst (blue). (b) Segmentation of nuclei objects based on the
Hoechst channel. (c) Segmentation of cell objects containing nuclei by first creating a distance map to which
the watershed operation was applied, and then performing connected components labeling, as previously
described [4]. (d) Segmentation of cytoplasm by subtracting the nuclei mask shown in Panel b from the cell
mask shown in Panel c. (e) A nuclei cluster mask was produced via Gaussian smoothing of the Hoechst signal,
rank order filter, image thresholding, morphological operations, and connected components labeling, as
previously described [4]. (f) p53 signal (yellow) was measured within the segmented nuclei clusters
4 Notes
Acknowledgments
References
1. Pantanowitz L, Valenstein PN, Evans AJ, Kaplan KJ, Pfeifer JD, Wilbur DC, Collins LC, Colgan TJ (2011) Review of the current state of whole slide imaging in pathology. J Pathol Inf 2:36
2. Dennis J, Parsa R, Chau D, Koduru P, Peng Y, Fang Y, Sarode VR (2015) Quantification of human epidermal growth factor receptor 2 immunohistochemistry using the Ventana image analysis system: correlation with gene amplification by fluorescence in situ hybridization: the importance of instrument validation for achieving high (>95%) concordance rate. Am J Surg Pathol 39(5):624–631
3. Gough A, Lezon T, Faeder J, Chennubhotla C, Murphy R, Critchley-Thorne R, Taylor DL (2014) High content analysis and cellular and tissue systems biology: a bridge between cancer cell biology and tissue-based diagnostics. In: Mendelsohn J, Howley PM, Israel MA, Gray JW, Thompson CB (eds) The molecular basis of cancer, 4th edn. Elsevier, New York
4. Prichard JW, Davison JM, Campbell BB, Repa KA, Reese LM, Nguyen XM, Li J, Foxwell T, Taylor DL, Critchley-Thorne RJ (2015) TissueCypher: a systems biology approach to anatomic pathology. J Pathol Inf 6:48
5. Critchley-Thorne RJ, Duits LC, Prichard JW, Davison JM, Jobe BA, Campbell BB, Repa KA, Reese LM, Li J, Diehl DL, Jhala NC, Ginsberg GG, DeMarshall M, Foxwell T, Zaidi AH, Taylor DL, Rustgi AK, Bergman JJ, Falk GW (2016) A novel tissue systems pathology test predicts progression in Barrett’s esophagus patients. Cancer Epidemiol Biomark Prev 25(6):958–968
6. MathWorks image processing toolbox. https://www.mathworks.com/products/image
7. Kothari S, Phan JH, Wang MD (2013) Eliminating tissue-fold artifacts in histopathological whole-slide images for improved image-based prediction of cancer grade. J Pathol Inf 4:22
8. Hang W, Phan JH, Bhatia AK, Cundiff CA, Shehata BM, Wang MD (2015) Detection of blur artifacts in histopathological whole-slide images of endomyocardial biopsies. Conf Proc IEEE Eng Med Biol Soc 2015:727–730
9. Dolled-Filhart M, McCabe A, Giltnane J, Cregger M, Camp RL, Rimm DL (2006) Quantitative in situ analysis of beta-catenin expression in breast cancer shows decreased expression is associated with poor outcome. Cancer Res 66(10):5487–5494
Part V
Abstract
Fulfilling the promises of precision medicine will depend on our ability to create patient-specific treatment
regimens. Therefore, being able to translate genomic sequencing into predicting how a patient will respond
to a given drug is critical. In this chapter, we review common bioinformatics approaches that aim to use
sequencing data to predict sample-specific drug susceptibility. First, we explain the importance of custo-
mized drug regimens to the future of medical care. Second, we discuss the different public databases and
community efforts that can be leveraged to develop new methods for identifying new predictive biomar-
kers. Third, we cover the basic methods that are currently used to identify markers or signatures of drug
response, without any prior knowledge of the drug’s mechanism of action. We further discuss how one can
integrate knowledge about drug targets, mechanisms, and predictive markers to better estimate drug
response in a diverse set of samples. We begin this section with a primer on popular methods to identify
targets and mechanisms of action for new small molecules. This discussion also includes a set of
computational methods that incorporate other drug features, such as drug structures, side effects, and
efficacy profiles, which do not relate to drug-induced genetic changes or sequencing data. These additional
drug properties can improve the accuracy of drug target and mechanism-of-action identification.
We then progress to discuss using these targets in combination with disease-specific expression patterns,
known pathways, and genetic interaction networks to aid drug choice. Finally, we conclude this chapter
with a general overview of machine learning methods that can integrate multiple pieces of sequencing data
along with prior drug or biological knowledge to drastically improve response prediction.
Key words Bioinformatics, Precision medicine, Drug response, Machine learning, Biomarkers
1 Introduction
Louise von Stechow (ed.), Cancer Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 1711,
https://doi.org/10.1007/978-1-4939-7493-1_14, © Springer Science+Business Media, LLC 2018
278 Neel S. Madhukar and Olivier Elemento
2 Databases
2.1 NCI60 Drug Sensitivity Database
The National Cancer Institute’s (NCI) 60 cell line drug screen is a
database of in vitro drug efficacies (in terms of GI50, LC50,
or TGI) for over 50,000 compounds screened against the NCI60
panel of cancer cell lines [5]. With 60 cancer cell lines from nine
distinct tumor types (leukemia, colon, lung, central nervous
system, renal, melanoma, ovarian, breast, and prostate), the NCI60
collection aims to provide information on a broad set of genetic
conditions and tumor types. The NCI60 panel has itself been
profiled using a variety of assays, from genomics to gene expression
and proteomics [6–9]. The profiling data can be used in conjunction
with the Developmental Therapeutics Program’s (DTP) drug
screening database to identify genetic signatures indicative of a
certain response pattern.
2.2 Cancer Cell Line Encyclopedia
The Cancer Cell Line Encyclopedia (CCLE) [10, 11] is a database
of 947 different human cancer cell lines encompassing 36 different
tumor types that have been genetically profiled (gene expression,
copy number, mutations, etc.). Furthermore, 24 known anticancer
drugs were profiled against approximately 500 of these cell lines.
Though the number of compounds profiled is smaller than in the
NCI60 drug screen, the greater number of cell lines tested allows
for more precise identification of genetic predictors of sensitivity for
the drugs measured.
Bioinformatics Approaches to Predict Drug Responses from Genomic Sequencing 279
Table 1
List of databases and abbreviations that are mentioned throughout the text of the chapter
2.3 Genomics of Drug Sensitivity in Cancer
Hosted by the Wellcome Trust Sanger Institute, the Genomics of
Drug Sensitivity in Cancer (GDSC) database is a large drug
screening project similar to the NCI60 and CCLE. In their initial
release, investigators screened a set of 138 known anticancer
compounds against over 1000 different cancer cell lines (on average
525 cell lines tested per compound). Each cell line was also
subjected to thorough expression and copy number profiling, along
with targeted mutation profiling for a set of 75 cancer genes. This
dataset constitutes another valuable resource for the identification of
genomic markers of drug responses.
2.4 Connectivity Map/LINCS
Released by the Broad Institute, the Connectivity Map (CMap)
seeks to find connections between small molecules, physiological
processes, and disease states [12]. Using mRNA expression
(measured by DNA microarrays) as the “language” of cellular
response, the CMap measures how a panel of cancer cell lines
responds transcriptionally to a variety of different drug treatments.
This approach had previously been successful in identifying drug
mechanisms in yeast but had never been applied to cancer cells
[13]. The investigators profiled four different cancer cell lines
before and after treatment with a panel of more than 1000 small
molecules. The LINCS database is an updated version of this
profiling system with a much larger number of drugs and cell
lines. This database makes use of the L1000 expression
profiling system, in which the expression of approximately 1000 key
genes is measured and used to infer the global gene expression profile.
From these transcriptional changes it is possible to explore a
drug’s mechanisms of action, which can in turn be used to
repurpose drugs for specific diseases or genetic states [14, 15].
A key first step to any drug response prediction effort involves the
identification of genomic markers that can impact efficacy. Identify-
ing those markers makes response prediction a much simpler task.
Once a polymorphism, gene expression pattern, or pathway has
been identified, all new samples can simply be screened for that
marker and, using known correlations with drug response, a pre-
diction of drug susceptibility can be made. Here, we focus on a
variety of approaches that can be used to identify genomic markers
indicative of drug response.
3.1 Using Genome-Wide Association Studies to Identify Polymorphisms Related to Drug Response
Genome-wide association studies (GWAS) have classically been used
to detect genetic variations associated with specific disease
phenotypes. In recent years, however, GWAS have also proved to be a
powerful method for identifying polymorphisms that can affect drug
efficacy and toxicity [16]. Unlike approaches focusing on known
drug targets or candidate gene lists, a GWAS provides a hypothesis-free
method that can systematically test a large number of variants
[17, 18]. In order to run a GWAS, one must provide a measure of
response or toxicity for a large number of samples, as well as a
thorough genotyping of each sample.
GWA studies typically fall into two main categories depending
on whether the provided response measure is categorical (such as
case/control, responder/non-responder, adverse reaction/no
reaction, etc.) or quantitative (such as IC50 or a measure of side
effect severity). Recently, there have been a series of developments
improving the traditional GWAS, such as taking into account a
Table 2
Sample contingency table showing how we can use the number of responders with a certain SNP to
test whether it is related to drug efficacy

              Responders    Non-responders
SNP present   90            15
SNP absent    10            485
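As a rough illustration of how the counts in Table 2 could be tested, the sketch below computes an odds ratio and a one-sided Fisher's exact test using only the Python standard library. The helper name `fisher_one_sided` and the choice of a one-sided test are assumptions made for this example; the chapter does not prescribe a specific test, and a real GWAS would rely on dedicated software and correct for the many variants tested.

```python
from fractions import Fraction
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """P(X >= a) under the hypergeometric null for the 2x2 table [[a, b], [c, d]].

    Rows: SNP present / SNP absent; columns: responders / non-responders.
    Hypothetical helper written for this illustration.
    """
    row1 = a + b             # samples carrying the SNP
    col1 = a + c             # responders
    n = a + b + c + d        # all samples
    upper = min(row1, col1)  # largest possible count in the top-left cell
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    # Exact integer arithmetic avoids overflow; convert to float only at the end.
    return float(Fraction(tail, comb(n, col1)))


# Counts from Table 2.
a, b, c, d = 90, 15, 10, 485
odds_ratio = (a * d) / (b * c)
p_value = fisher_one_sided(a, b, c, d)
print(f"odds ratio = {odds_ratio:.1f}, one-sided p = {p_value:.3g}")
```

With these counts the SNP is strongly over-represented among responders, which is the pattern the table is meant to convey.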
[Figure: Manhattan plot. x-axis: Chromosome (1–22); y-axis: −log10(p); boxed points are labeled “Significantly Associated Hits”]
Fig. 1 Sample Manhattan plot showcasing how one can use the output of GWAS calculation to find SNPs
related to drug efficacy. Boxed hits represent those that pass the significant p value cutoff and thus may be
relevant to treatment response
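The cutoff step behind a Manhattan plot can be sketched as follows: each p value is converted to −log10(p) and compared with a multiple-testing threshold. The SNP identifiers and p values below are invented, and the Bonferroni correction is one common choice assumed here for illustration; the chapter does not state which cutoff was used for the figure.

```python
from math import log10

# Hypothetical per-SNP association results: (snp_id, chromosome, p_value).
results = [
    ("snp_a", 1, 0.20),
    ("snp_b", 3, 1e-7),
    ("snp_c", 7, 0.004),
    ("snp_d", 12, 3e-6),
    ("snp_e", 19, 0.03),
]

alpha = 0.05
cutoff = alpha / len(results)  # Bonferroni: divide alpha by the number of tests
y_cutoff = -log10(cutoff)      # height of the significance line on the plot

# Keep SNPs whose p value passes the cutoff, with their plot height.
hits = [(snp, chrom, -log10(p)) for snp, chrom, p in results if p < cutoff]
for snp, chrom, y in hits:
    print(f"{snp} (chr{chrom}): -log10(p) = {y:.2f} > {y_cutoff:.2f}")
```

In a genome-wide setting the number of tests is in the hundreds of thousands or more, so the corrected cutoff is far stricter than in this toy example.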
3.2 Using Gene Expression to Find Response Signatures and Predict Response
While GWA studies aim to find a set of mutations or polymorphisms
that are predictive of how a patient will respond to a drug,
another popular approach is to use gene expression data to find an
expression signature associated with a positive (or negative)
response. Different transcriptional profiles can often lead to
different levels of drug efficacy, and differential expression analyses can
help pinpoint the specific genes or pathways that drive the
heterogeneous drug response and can be used to predict response levels.
[Figure: expression of Gene 1 and Gene 2 in susceptible and resistant samples; differential gene analysis yields the identified genomic signature of response]
Fig. 2 Diagram of how gene expression patterns from responders and non-responders can be used to identify
signatures related to response and how these can be used to better select new patients likely to respond
The classic approach involves treating a cohort of mice,
patients, patient samples, or cell lines with a given drug and
measuring the degree of response in each sample. As in a
GWAS, the response can be measured either categorically
(responder/non-responder) or as a continuous variable. Using
either sequencing data from before treatment or differential gene
expression (comparing pre- and post-treatment samples), one can
search for gene expression patterns that seem more prevalent in the
samples that are susceptible (or resistant) to treatment (see Fig. 2).
For instance, one would expect genes that confer drug
resistance to be more highly expressed in samples where drug
treatment shows a limited effect.
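The idea of searching for genes whose expression separates resistant from susceptible samples can be sketched with a per-gene Welch's t statistic. This is a toy illustration only: the gene names and expression values are invented, and Welch's t is a stand-in assumed here for brevity, not the method used by any of the packages discussed in this chapter.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical log-expression values per gene, split by response group.
expression = {
    "GENE_1": {"resistant": [8.1, 7.9, 8.4, 8.0], "susceptible": [5.2, 5.0, 5.5, 4.9]},
    "GENE_2": {"resistant": [6.0, 6.3, 5.8, 6.1], "susceptible": [6.1, 5.9, 6.2, 6.0]},
}


def welch_t(x: list[float], y: list[float]) -> float:
    """Welch's t statistic for a difference in means (unequal variances)."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))


scores = {
    gene: welch_t(groups["resistant"], groups["susceptible"])
    for gene, groups in expression.items()
}
# Genes with the largest |t| are candidate members of a response signature.
ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
print(ranked, scores)
```

Here GENE_1 is much more highly expressed in the resistant group and ranks first, while GENE_2 shows no group difference. A real analysis would also need p values, multiple-testing correction, and a model suited to the data type.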
A number of methods exist for detecting differential expression
across a set of samples. For microarray data, statistical
tests such as an ANOVA often suffice, but packages such as limma
[30] (see also Chapter 6 for an application of the limma package to
phosphoproteomics data) use linear models that can help deal with
more complicated experimental designs. For RNA-seq data, the
most popular methods include limma-voom [31], DESeq2
[32], edgeR [33], and Cufflinks (Cuffdiff) [34]. DESeq2 and
edgeR are currently considered the standard for differential
expression analysis, and both use similar underlying models (although with
different dispersion estimates). In our experience,
DESeq2 tends to be more conservative. One key difference
between DESeq2/edgeR and limma-voom is that voom does not
employ a negative binomial distribution and instead estimates the
3.3 Using Pathway Annotations and GSEA to Identify Differential Biological States
Often a differential gene expression analysis will output a set of genes
that has no obvious pattern or relevance to the type of
drug being investigated. Additionally, it is quite common for a set
of genes to be marked as significant in a differential gene expression
analysis, yet when experiments are done to perturb the individual
genes, they seem to have little to no effect on drug response. In
cases like these, it is often helpful to translate the differentially
expressed genes into a set of enriched biological pathways or gene
sets. These can provide a broader explanation of a drug’s
mechanism of action and a clearer understanding of how to predict
efficacy. This approach has previously been successful not only in
drug response prediction, but also in the development of highly
effective drugs. Overexpression of the mTOR pathway in
lymphoma led to the development of inhibitors that specifically target
genes in that pathway [37], and global activation of the epidermal
growth factor receptor pathway was found to be predictive of
erlotinib susceptibility in pancreatic cancer xenografts [38].
The basic technique for finding enriched pathways or canonical
gene sets is to first annotate each gene based on the pathways/sets
it falls into. A few popular resources for pathway and gene set
annotation include the Molecular Signatures Database (MSigDB)
[39], Reactome [40, 41], the Kyoto Encyclopedia of Genes and
Genomes (KEGG) [42], Gene Ontology [43], and InnateDB
[44, 45]. Reactome, KEGG, and InnateDB group genes based on
their biochemical pathways (with InnateDB focusing on pathways
relating to immunity), Gene Ontology groups genes based on their
biological/molecular function or cellular localization, and
MSigDB combines the aforementioned databases
with custom “hallmark” gene sets of important genes
involved in certain processes. Following annotation, a statistical
test (such as Fisher’s exact test) can be used to test whether a
certain pathway is enriched for up- (or down-) regulated genes
compared to what would be expected by random chance.
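The enrichment test described above can be sketched as an over-representation analysis. A one-sided Fisher's exact test on the corresponding 2×2 table is equivalent to the hypergeometric tail probability computed below. The gene universe, the list of up-regulated genes, and the two pathway annotations are all invented for illustration, and `enrichment_p` is a hypothetical helper, not a function from any of the annotation resources named in the text.

```python
from fractions import Fraction
from math import comb


def enrichment_p(n_universe: int, n_pathway: int, n_hits: int, n_overlap: int) -> float:
    """P(overlap >= n_overlap) when n_hits genes are drawn at random from the universe."""
    upper = min(n_pathway, n_hits)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return float(Fraction(tail, comb(n_universe, n_hits)))


# Hypothetical inputs: a 20-gene universe, 8 up-regulated genes,
# and two made-up pathway annotations.
universe = {f"G{i}" for i in range(1, 21)}
upregulated = {"G1", "G2", "G3", "G4", "G5", "G6", "G11", "G17"}
pathways = {
    "pathway_A": {"G1", "G2", "G3", "G4", "G5", "G6"},
    "pathway_B": {"G7", "G8", "G9", "G10", "G11", "G12"},
}

p_values = {
    name: enrichment_p(len(universe), len(genes), len(upregulated), len(genes & upregulated))
    for name, genes in pathways.items()
}
print(p_values)
```

In this toy example pathway_A, whose six genes are all up-regulated, receives a small p value, while pathway_B does not. When many pathways are tested, the resulting p values should be corrected for multiple testing.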
4.1 Computational Techniques to Identify Drug Targets and Mechanisms
For a small molecule in development, the mechanisms of action and
binding targets are often not fully understood. A number of
computational methods exist that seek to predict targets for these
orphan small molecules, based either on their chemical structure or on
their downstream effects. These methods can broadly be divided into
three categories:
1. Molecular dynamics: Using intricate mathematical models,
molecular dynamics methods computationally simulate a
drug’s interaction with a given protein. To predict targets, an
orphan small molecule is tested against a series of proteins to
identify any with favorable binding results [47, 48]. However,
this approach requires significant computation power, complex
mathematical models, and full 3D structures for each queried
protein—data that is often unavailable.
2. Ligand-based [49, 50]: Using a set of known protein binding
partners for a given small molecule, ligand-based approaches
apply machine learning techniques to find other proteins with
high enough similarity to the known targets. The proteins with
high degrees of similarity are predicted to be novel binding
targets. However, ligand-based methods often require a large
number of known binding partners for each tested small molecule,
and thus can mostly be used on drugs that are far enough along in
development.
3. Downstream effect based: Recently, a number of methods
emerged, which use the downstream effects of a small molecule
(such as induced gene expression change [51] or side-effects
[52]) to predict targets for orphan small molecules. The basic
premise of these methods is to compare the effects of an orphan
small molecule to the effects of drugs with known targets. If the
4.2 Using Known Drug Targets to Predict Response
Assuming one can determine the mechanisms of action of a drug,
either in terms of specific binding targets or broad knowledge of
the biological pathways mobilized, the task of predicting efficacies
is often much simpler. For example, if a drug’s main mechanism of
action is to target Protein A, then one would expect different
efficacies in samples depending on whether the gene encoding
Protein A is amplified or deleted. This type of reasoning also applies when
there are mutations in a known drug target. Examples of this are
treatments involving gefitinib or Herceptin. Gefitinib is an
anticancer small molecule known to target the EGFR kinase, and
mutations in EGFR were found to predict sensitivity of samples
to gefitinib treatment [56]. Herceptin, an antibody that targets
HER2, was found to improve the outcomes of cancer patients
with HER2 amplifications or activating mutations [57, 58]. Another
example of this concept is vemurafenib, a small molecule that
targets the V600E mutant form of BRAF, which has been found to be
selectively effective in cancer patients with this exact mutation while
having no beneficial effect on samples with normal BRAF
[59–61]. These are just a few of the many examples showing how
combining known drug targets with targeted sequencing can help
detect instances of differential response.
However, it is also important to note that while alterations
of a drug’s target are often predictive of efficacy, this is not always
the case, even if the target itself serves as a biomarker [62].
Moreover, there are often cases where the predictive biomarker for a
given drug is not the actual target, but rather another gene or set of
genes involved in the same pathway or biological processes as
the drug’s target. In cases like these, sequencing could still prove to
be a valuable tool, and we advise utilizing some of the other
methods mentioned in this chapter. Drug target information
could be used in combination with these methods to refine
predictions and gain greater biological insights.
Sequencing-based approaches can also be very successful in
positioning drugs for specific disease conditions, especially
different cancer types. Using resources like The Cancer Genome Atlas
(TCGA) [63] and the Genotype-Tissue Expression (GTEx) project
[64], one can find genes or pathways that are significantly
upregulated in certain cancers or cancer types compared to either normal
tissue samples or other cancer subtypes. Identifying such
cancer-subtype-specific, upregulated signatures could highlight drugs
known to target these signatures as particularly viable candidates
for treatment. For instance, it was recently discovered that
dopamine receptors were selectively upregulated in neoplastic stem cells
in breast cancer, and thioridazine (a compound
known to target dopamine receptors) was observed to be particularly
effective against these cell populations [65].
4.3 Exploiting Genetic Interactions (SL/SDL)
One approach that has become increasingly popular is exploiting
networks of synthetic lethality (SL) and synthetic dosage lethality
(SDL) to predict drug efficacy. SL describes a specific type of
genetic interaction involving two or more genes, where the loss
of any one gene individually is non-fatal, but the combined loss of all
SL partner genes leads to a severe decrease in fitness or cell death.
SDL describes a related genetic interaction where lethality is
observed when one gene is lost while its SDL partner is
overexpressed [66, 67]. Both SL and SDL interactions are highly relevant
to cancer biology, as most cancers have both widespread losses and
gains of certain genes. Exploiting these interactions could substantially
improve patient prognosis. For instance, if Gene A and Gene B form an SL
pair and Gene A is lost in a given cancer sample, then one would
expect compounds targeting Gene B to have better responses in
this sample (see Fig. 3).
Fig. 3 (a) Diagram highlighting the concept of synthetic lethality and how known synthetic lethal relationships
can be combined with genomic information to better predict drug response. (b) Using synthetic lethality to
predict differential response
To this end, there have recently been many efforts to uncover
underlying SL and SDL networks in cancer. Among the most
successful efforts was the data mining synthetic lethality
identification pipeline DAISY [68]. DAISY uses three distinct hypotheses to
detect SL pairs (with the inverse hypotheses being used for SDL
pair detection):
1. Genes in an SL pair will have significantly lower rates of
co-mutation or co-loss.
2. Knockout/knockdown of a given gene will be more lethal in
samples with under-expression or loss of its SL partner.
3. Genes in an SL pair are more likely to be co-expressed.
By scanning for gene pairs that fulfill all three hypotheses,
DAISY predicted networks of SL and SDL interactions. It achieved
an area under the receiver operating characteristic curve (AUC) of
approximately 0.77 when compared to known SL
interactions, demonstrating that DAISY could infer SL and
SDL genetic interactions with reasonable accuracy. To translate this into
predicting drug responses, the authors identified sample-specific exploitable
interactions: SDL interactions where one gene was overexpressed and
SL interactions where one gene was lost. DAISY then identified
drugs known to target the other gene in each exploitable
interaction. For each drug, DAISY ranked the most sensitive samples based
on the number of exploitable interactions targeted by that
drug. The authors found that specific drugs were significantly more
effective in cell lines predicted to be sensitive than in those predicted to be
resistant. Furthermore, the authors used a similar approach to
predict the IC50 value for each drug across a set of cancer
cell lines and observed a strong correlation between the predicted
and observed values (R = 0.721). Taken together, these results
show how known genetic interactions (particularly SL and SDL
interactions) can be combined with sequencing data to better
predict drug sensitivities and inform treatment.
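The ranking of samples by exploitable interactions can be sketched as below. This is not the DAISY implementation: the gene names, networks, sample alterations, and drug target are invented, and each pair is stored as (altered gene, partner gene to target), which is a simplification of the symmetric SL relationship assumed here to keep the example short.

```python
# Hypothetical interaction networks; pairs are (altered gene, partner gene to target).
sl_pairs = [("GENE_A", "GENE_B"), ("GENE_C", "GENE_B"), ("GENE_D", "GENE_E")]
sdl_pairs = [("GENE_F", "GENE_B")]  # GENE_F overexpressed -> GENE_B becomes essential

# Hypothetical per-sample alterations derived from sequencing data.
samples = {
    "sample_1": {"lost": {"GENE_A", "GENE_C"}, "overexpressed": {"GENE_F"}},
    "sample_2": {"lost": {"GENE_A"}, "overexpressed": set()},
    "sample_3": {"lost": {"GENE_D"}, "overexpressed": set()},
}

drug_targets = {"drug_X": {"GENE_B"}}  # hypothetical drug and its known target


def exploitable_count(sample: dict, targets: set) -> int:
    """Number of SL/SDL interactions in this sample whose partner gene the drug targets."""
    sl = sum(1 for lost, partner in sl_pairs
             if lost in sample["lost"] and partner in targets)
    sdl = sum(1 for over, partner in sdl_pairs
              if over in sample["overexpressed"] and partner in targets)
    return sl + sdl


counts = {name: exploitable_count(s, drug_targets["drug_X"]) for name, s in samples.items()}
ranking = sorted(counts, key=counts.get, reverse=True)  # predicted most sensitive first
print(counts, ranking)
```

In this example sample_1 carries three exploitable interactions targeted by the drug and is ranked as most likely to be sensitive, while sample_3 carries none.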
[Figure: genomic data (gene expression levels across samples) and drug sensitivities are used to train a machine learning model; 2. Prediction: genomic data for new samples are passed to the machine learning model to produce sensitivity predictions]
Fig. 4 Overview of how common machine-learning methods combine multiple data types to train a specific
model that can be applied to new samples to predict sensitivity
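The train-then-predict workflow in Fig. 4 can be sketched with a very small k-nearest-neighbors regressor. The cell line names, expression values, and sensitivities are invented, and k-nearest neighbors is chosen here only because it fits in a few lines of standard-library Python; the figure does not specify a particular model.

```python
from math import dist

# Hypothetical training data: expression of two genes per cell line and a
# measured drug sensitivity (for example log IC50; lower = more sensitive).
train_features = {
    "line_1": (1.0, 5.0),
    "line_2": (1.2, 4.8),
    "line_3": (6.0, 1.0),
    "line_4": (5.8, 1.3),
}
train_sensitivity = {"line_1": -2.0, "line_2": -1.8, "line_3": 1.5, "line_4": 1.7}


def predict_knn(x: tuple[float, ...], k: int = 2) -> float:
    """Average the sensitivities of the k training samples closest to x."""
    nearest = sorted(train_features, key=lambda name: dist(train_features[name], x))[:k]
    return sum(train_sensitivity[name] for name in nearest) / k


# Genomic data for new samples, for which sensitivity has not been measured.
new_samples = {"new_1": (1.1, 5.1), "new_2": (6.1, 0.9)}
predictions = {name: predict_knn(x) for name, x in new_samples.items()}
print(predictions)
```

A realistic model would use many more features and samples, hold out data to estimate prediction error, and could incorporate prior drug or biological knowledge as additional inputs.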
Acknowledgments
The authors would like to thank the Elemento Lab members and
Natalie R. Davidson for their feedback and discussion. O.E. and
N.M. are supported by the CAREER grant from National Science
Foundation (DB1054964), NIH grant R01CA194547, the Starr
Cancer Foundation, as well as by startup funds from the Institute
for Computational Biomedicine. Support for N.M. was also
provided by the PhRMA Foundation Pre Doctoral Informatics
Fellowship and by the Tri-Institutional Training Program in
Computational Biology and Medicine.
References
1. Fry RC, Svensson JP, Valiathan C, Wang E, Hogan BJ, Bhattacharya S, Bugni JM, Whittaker CA, Samson LD (2008) Genomic predictors of interindividual differences in response to DNA damaging agents. Genes Dev 22(19):2621–2626. https://doi.org/10.1101/gad.1688508
2. Rice SD, Heinzman JM, Brower SL, Ervin PR, Song N, Shen K, Wang DK (2010) Analysis of chemotherapeutic response heterogeneity and drug clustering based on mechanism of action using an in vitro assay. Anticancer Res 30(7):2805–2811
3. Bosquet JG, Marchion DC, Chon H, Lancaster JM, Chanock S (2014) Analysis of chemotherapeutic response in ovarian cancers using publicly available high-throughput data. Cancer Res 74(14):3902–3912. https://doi.org/10.1158/0008-5472.CAN-14-0186
4. Sboner A, Elemento O (2016) A primer on precision medicine informatics. Brief Bioinform 17(1):145–153. https://doi.org/10.1093/bib/bbv032
5. Shoemaker RH (2006) The NCI60 human tumour cell line anticancer drug screen. Nat Rev Cancer 6(10):813–823. https://doi.org/10.1038/nrc1951
6. Abaan OD, Polley EC, Davis SR, Zhu YJ, Bilke S, Walker RL, Pineda M, Gindin Y, Jiang Y, Reinhold WC, Holbeck SL, Simon RM, Doroshow JH, Pommier Y, Meltzer PS (2013) The exomes of the NCI-60 panel: a genomic resource for cancer biology and systems pharmacology. Cancer Res 73(14):4372–4382. https://doi.org/10.1158/0008-5472.Can-12-3342
7. Reinhold WC, Varma S, Sousa F, Sunshine M, Abaan OD, Davis SR, Reinhold SW, Kohn KW, Morris J, Meltzer PS, Doroshow JH, Pommier Y (2014) NCI-60 whole exome sequencing and pharmacological CellMiner analyses. PLoS One 9(7). https://doi.org/10.1371/journal.pone.0101670
8. Scherf U, Ross DT, Waltham M, Smith LH, Lee JK, Tanabe L, Kohn KW, Reinhold WC, Myers TG, Andrews DT, Scudiero DA, Eisen MB, Sausville EA, Pommier Y, Botstein D, Brown PO, Weinstein JN (2000) A gene expression database for the molecular pharmacology of cancer. Nat Genet 24(3):236–244. https://doi.org/10.1038/73439
9. Gholami AM, Hahne H, Wu ZX, Auer FJ, Meng C, Wilhelm M, Kuster B (2013) Global proteome analysis of the NCI-60 cell line panel. Cell Rep 4(3):609–620. https://doi.org/10.1016/j.celrep.2013.07.018
34. Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L (2012) Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7(3):562–578. https://doi.org/10.1038/nprot.2012.016
35. Wright G, Tan B, Rosenwald A, Hurt EH, Wiestner A, Staudt LM (2003) A gene expression-based method to diagnose clinically distinct subgroups of diffuse large B cell lymphoma. Proc Natl Acad Sci U S A 100(17):9991–9996. https://doi.org/10.1073/pnas.1732008100
36. Lam LT, Davis RE, Pierce J, Hepperle M, Xu Y, Hottelet M, Nong Y, Wen D, Adams J, Dang L, Staudt LM (2005) Small molecule inhibitors of IkappaB kinase are selectively toxic for subgroups of diffuse large B-cell lymphoma defined by gene expression profiling. Clin Cancer Res 11(1):28–40
37. Briones J (2009) Emerging therapies for B-cell non-Hodgkin lymphoma. Expert Rev Anticancer 9(9):1305–1316. https://doi.org/10.1586/Era.09.86
38. Jimeno A, Tan AC, Coffa J, Rajeshkumar NV, Kulesza P, Rubio-Viqueira B, Wheelhouse J, Diosdado B, Messersmith WA, Lacobuzio-Donahue C, Maitra A, Varella-Garcia M, Hirsch FR, Meijer GA, Hidalgo M (2008) Coordinated epidermal growth factor receptor pathway gene overexpression predicts epidermal growth factor receptor inhibitor sensitivity in pancreatic cancer. Cancer Res 68(8):2841–2849. https://doi.org/10.1158/0008-5472.Can-07-5200
39. Liberzon A, Subramanian A, Pinchback R, Thorvaldsdottir H, Tamayo P, Mesirov JP (2011) Molecular signatures database (MSigDB) 3.0. Bioinformatics 27(12):1739–1740. https://doi.org/10.1093/bioinformatics/btr260
40. Fabregat A, Sidiropoulos K, Garapati P, Gillespie M, Hausmann K, Haw R, Jassal B,

Nucleic Acids Res 42(Database issue):D472–D477. https://doi.org/10.1093/nar/gkt1102
42. Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M (1999) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 27(1):29–34. https://doi.org/10.1093/nar/27.1.29
43. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G (2000) Gene ontology: tool for the unification of biology. The gene ontology consortium. Nat Genet 25(1):25–29. https://doi.org/10.1038/75556
44. Breuer K, Foroushani AK, Laird MR, Chen C, Sribnaia A, Lo R, Winsor GL, Hancock RE, Brinkman FS, Lynn DJ (2013) InnateDB: systems biology of innate immunity and beyond--recent updates and continuing curation. Nucleic Acids Res 41(Database issue):D1228–D1233. https://doi.org/10.1093/nar/gks1147
45. Lynn DJ, Winsor GL, Chan C, Richard N, Laird MR, Barsky A, Gardy JL, Roche FM, Chan TH, Shah N, Lo R, Naseer M, Que J, Yau M, Acab M, Tulpan D, Whiteside MD, Chikatamarla A, Mah B, Munzner T, Hokamp K, Hancock RE, Brinkman FS (2008) InnateDB: facilitating systems-level analyses of the mammalian innate immune response. Mol Syst Biol 4:218. https://doi.org/10.1038/msb.2008.55
46. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102(43):15545–15550. https://doi.org/10.1073/pnas.0506580102
47. Li HL, Gao ZT, Kang L, Zhang HL, Yang K,
Jupe S, Korninger F, McKay S, Matthews L, Yu KQ, Luo XM, Zhu WL, Chen KX, Shen JH,
May B, Milacic M, Rothfels K, Shamovsky V, Wang XC, Jiang HL (2006) TarFisDock: a web
Webber M, Weiser J, Williams M, Wu G, server for identifying drug targets with docking
Stein L, Hermjakob H, D’Eustachio P (2016) approach. Nucleic Acids Res 34:W219–W224.
The reactome pathway knowledgebase. https://doi.org/10.1093/nar/gkl114
Nucleic Acids Res 44(D1):D481–D487. 48. Rarey M, Kramer B, Lengauer T, Klebe G
https://doi.org/10.1093/nar/gkv1351 (1996) A fast flexible docking method using
41. Croft D, Mundo AF, Haw R, Milacic M, an incremental construction algorithm. J Mol
Weiser J, Wu G, Caudy M, Garapati P, Biol 261(3):470–489. https://doi.org/10.
Gillespie M, Kamdar MR, Jassal B, Jupe S, 1006/jmbi.1996.0477
Matthews L, May B, Palatnik S, Rothfels K, 49. Butina D, Segall MD, Frankcombe K (2002)
Shamovsky V, Song H, Williams M, Birney E, Predicting ADME properties in silico: methods
Hermjakob H, Stein L, D’Eustachio P (2014) and models. Drug Discov Today 7(11):
The reactome pathway knowledgebase.
Bioinformatics Approaches to Predict Drug Responses from Genomic Sequencing 295
Magazine H, Syron J, Fleming J, Siminoff L, 67. Chan DA, Giaccia AJ (2011) Harnessing syn-
Traino H, Mosavel M, Barker L, Jewell S, thetic lethal interactions in anticancer drug dis-
Rohrer D, Maxim D, Filkins D, Harbach P, covery. Nat Rev Drug Discov 10(5):351–364.
Cortadillo E, Berghuis B, Turner L, https://doi.org/10.1038/nrd3374
Hudson E, Feenstra K, Sobin L, Robb J, 68. Jerby-Arnon L, Pfetzer N, Waldman YY,
Branton P, Korzeniewski G, Shive C, McGarry L, James D, Shanks E, Seashore-
Tabor D, Qi LQ, Groch K, Nampally S, Ludlow B, Weinstock A, Geiger T, Clemons
Buia S, Zimmerman A, Smith A, Burges R, PA, Gottlieb E, Ruppin E (2014) Predicting
Robinson K, Valentino K, Bradbury D, cancer-specific vulnerability via data-driven
Cosentino M, Diaz-Mayoral N, Kennedy M, detection of synthetic lethality. Cell 158
Engel T, Williams P, Erickson K, Ardlie K, (5):1199–1209. https://doi.org/10.1016/j.
Winckler W, Getz G, DeLuca D, cell.2014.07.027
MacArthur D, Kellis M, Thomson A, 69. Garnett MJ, Edelman EJ, Heidorn SJ, Green-
Young T, Gelfand E, Donovan M, Meng Y, man CD, Dastur A, Lau KW, Greninger P,
Grant G, Mash D, Marcus Y, Basile M, Liu J, Thompson IR, Luo X, Soares J, Liu Q,
Zhu J, Tu ZD, Cox NJ, Nicolae DL, Gamazon Iorio F, Surdez D, Chen L, Milano RJ, Bignell
ER, Im HK, Konkashbaev A, Pritchard J, GR, Tam AT, Davies H, Stevenson JA,
Stevens M, Flutre T, Wen XQ, Dermitzakis Barthorpe S, Lutz SR, Kogera F, Lawrence K,
ET, Lappalainen T, Guigo R, Monlong J, McLaren-Douglas A, Mitropoulos X,
Sammeth M, Koller D, Battle A, Mostafavi S, Mironenko T, Thi H, Richardson L, Zhou W,
McCarthy M, Rivas M, Maller J, Rusyn I, Jewitt F, Zhang T, O’Brien P, Boisvert JL,
Nobel A, Wright F, Shabalin A, Feolo M, Price S, Hur W, Yang W, Deng X, Butler A,
Sharopova N, Sturcke A, Paschal J, Anderson Choi HG, Chang JW, Baselga J, Stamenkovic I,
JM, Wilder EL, Derr LK, Green ED, Struew- Engelman JA, Sharma SV, Delattre O, Saez-
ing JP, Temple G, Volpi S, Boyer JT, Thomson Rodriguez J, Gray NS, Settleman J, Futreal
EJ, Guyer MS, Ng C, Abdallah A, PA, Haber DA, Stratton MR, Ramaswamy S,
Colantuoni D, Insel TR, Koester SE, Little McDermott U, Benes CH (2012) Systematic
AR, Bender PK, Lehner T, Yao Y, Compton identification of genomic markers of drug sen-
CC, Vaught JB, Sawyer S, Lockhart NC, sitivity in cancer cells. Nature 483
Demchok J, Moore HF (2013) The (7391):570–575. https://doi.org/10.1038/
genotype-tissue expression (GTEx) project. nature11005
Nat Genet 45(6):580–585. https://doi.org/
10.1038/ng.2653 70. Menden MP, Iorio F, Garnett M,
McDermott U, Benes CH, Ballester PJ, Saez-
65. Sachlos E, Risueno RM, Laronde S, Rodriguez J (2013) Machine learning predic-
Shapovalova Z, Lee JH, Russell J, Malig M, tion of cancer cell sensitivity to drugs based on
McNicol JD, Fiebig-Comyn A, Graham M, genomic and chemical properties. PLoS One 8
Levadoux-Martin M, Lee JB, Giacomelli AO, (4). https://doi.org/10.1371/journal.pone.
Hassell JA, Fischer-Russell D, Trus MR, 0061318
Foley R, Leber B, Xenocostas A, Brown ED,
Collins TJ, Bhatia M (2012) Identification of 71. Costello JC, Heiser LM, Georgii E, Gonen M,
drugs including a dopamine receptor antago- Menden MP, Wang NJ, Bansal M, Ammad-ud-
nist that selectively target cancer stem cells. Cell din M, Hintsanen P, Khan SA, Mpindi JP,
149(6):1284–1297. https://doi.org/10. Kallioniemi O, Honkela A, Aittokallio T,
1016/j.cell.2012.03.049 Wennerberg K, Collins JJ, Gallahan D,
Singer D, Saez-Rodriguez J, Kaski S, Gray
66. Madhukar NS, Elemento O, Pandey G (2015) JW, Stolovitzky G, Community ND (2014) A
Prediction of genetic interactions using community effort to assess and improve drug
machine learning and network properties. sensitivity prediction algorithms. Nat Biotech-
Front Bioeng Biotechnol 3(172). https://doi. nol 32(12):1202–U1257. https://doi.org/10.
org/10.3389/fbioe.2015.00172 1038/nbt.2877
Chapter 15
Abstract
The design of optimal protocols plays an important role in cancer treatment. However, in clinical applica-
tions, the outcomes under the optimal protocols are sensitive to variations of parameter settings such as
drug effects and the attributes of age, weight, and health conditions in human subjects. One approach to
overcoming this challenge is to formulate the problem of finding an optimal treatment protocol as a robust
optimization problem (ROP) that takes parameter uncertainty into account. In this chapter, we describe a
method to model toxicity uncertainty. We then apply a mixed integer ROP to derive the optimal protocols
that minimize the cumulative tumor size. While our method may be applied to other cancers, in this work
we focus on the treatment of chronic myeloid leukemia (CML) with tyrosine kinase inhibitors (TKI). For
simplicity, we focus on one particular mode of toxicity arising from TKI therapy, low blood cell counts, in
particular low absolute neutrophil count (ANC). We develop optimization methods for locating optimal
treatment protocols assuming that the rate of decrease of ANC varies within a given interval. We further
investigate the relationship between parameter uncertainty and optimal protocols. Our results suggest
that the optimized dosing schedule can significantly reduce tumor size without recurrence in 360 weeks while ensuring
that toxicity constraints are satisfied for all realizations of uncertain parameters.
Key words Robust optimization, Mixed integer optimization, Cancer treatment, Toxicity uncertainty
1 Introduction
Louise von Stechow (ed.), Cancer Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 1711,
https://doi.org/10.1007/978-1-4939-7493-1_15, © Springer Science+Business Media, LLC 2018
298 Junfeng Zhu et al.
2 General Models
$$\hat{f}(x(t), y(t)) \le 0 \tag{1d}$$
$$\tilde{f}(x(t), y(t)) = 0 \tag{1e}$$
$$x_{\min} \le x(t) \le x_{\max} \tag{1f}$$
$$y_{\min} \le y(t) \le y_{\max} \tag{1g}$$
2.1 Objective Functions

The role of the objective function in Eq. 1 is to specify the desired outcome of the course of anti-cancer therapy. The simplest form of an objective function is to minimize the tumor population at the end of treatment [15], i.e.,

$$J = C(T) \tag{2}$$

where C(t) is the tumor cell population at time t and T is a given constant parameter indicating the length of the treatment period. Although objective functions of the form (Eq. 2) are easy to implement, they suffer from the drawback that they allow for large tumor populations during treatment. To deal with this shortcoming, Murray et al. [16] minimized the total tumor cell population over the interval [0, T] while limiting the side effects of therapy. In particular, they consider the objective function

$$J = \int_0^T \left( \alpha_1 C(t) + \alpha_2 S_e(t) \right) dt$$

where $S_e(t)$ is a function modeling side effects, which can be a function of dosage [17] or of loss of body weight [18], and $\alpha_1$ and $\alpha_2$ are weighting values for the cumulative tumor population and normal tissue toxicity, respectively. Note that if one sets $\alpha_2$ to zero, the goal reduces to minimizing the cumulative tumor population over the time frame [0, T].
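The difference between the terminal objective (Eq. 2) and the cumulative objective can be illustrated numerically. The following is a minimal sketch, assuming the tumor population and side-effect function are available as equally spaced samples; the function names and the two sample trajectories are hypothetical and serve only as an illustration.

```python
def trapezoid(values, dt):
    """Composite trapezoidal rule for equally spaced samples."""
    if len(values) < 2:
        return 0.0
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def terminal_objective(tumor):
    """Eq. 2: J = C(T), the tumor population at the end of treatment."""
    return tumor[-1]


def cumulative_objective(tumor, side_effect, dt, alpha1=1.0, alpha2=0.0):
    """J = integral over [0, T] of alpha1*C(t) + alpha2*S_e(t)."""
    integrand = [alpha1 * c + alpha2 * s for c, s in zip(tumor, side_effect)]
    return trapezoid(integrand, dt)


# Hypothetical weekly samples: trajectory A responds early, B responds late.
traj_a = [100.0, 40.0, 40.0, 40.0, 50.0]
traj_b = [100.0, 100.0, 100.0, 100.0, 50.0]
no_side_effects = [0.0] * 5

print(terminal_objective(traj_a), terminal_objective(traj_b))
print(cumulative_objective(traj_a, no_side_effects, dt=1.0),
      cumulative_objective(traj_b, no_side_effects, dt=1.0))
```

Both trajectories end at the same tumor size, so Eq. 2 cannot distinguish them, whereas the cumulative objective penalizes the trajectory that carries a large tumor burden for most of the treatment period.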
2.2 Tumor Growth Models

Most optimization models of cancer therapy assume that tumor growth can be accurately modeled by a set of differential equations (usually ordinary differential equations). Some important questions to consider when building these kinds of models are how the tumor cells grow, how they interact, and how they are affected by anti-cancer therapy. The simplest tumor growth model assumes that all tumor cells proliferate with constant cell cycle duration, which results in an exponential growth model:

$$\dot{x}(t) = \lambda x(t)$$

where x(t) is the tumor size at time t, and λ is a constant related to the net-growth rate of the tumor. By using a single parameter, an exponential growth model can capture some key features of the beginning phase of tumor growth. However, the prediction of tumor size based on the exponential growth model does not match well with clinical datasets, since the exponential model will give unreasonably large values over a long time. In particular, limited nutrient availability for large tumors makes the exponential growth an inappropriate model for tumor growth [19]. To overcome this drawback, researchers often use models such as logistic or Gompertz models, where the growth rate decays as the tumor population increases [19]. Thus, as t increases, tumor size
Robust Optimization with Toxicity Uncertainty 301
$$\dot{x}(t) = \lambda x(t) \ln\left(\frac{K}{x(t)}\right) \tag{3}$$
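The contrast between exponential growth and the Gompertz model (Eq. 3) can be sketched numerically. The code below is an illustration under stated assumptions: K is interpreted as the plateau size toward which the tumor saturates, the closed-form expression follows from substituting u = ln(x/K) into Eq. 3, and the function names and parameter values in the usage lines are hypothetical.

```python
import math


def exponential_size(x0, lam, t):
    """Solution of x' = lam * x with x(0) = x0."""
    return x0 * math.exp(lam * t)


def gompertz_size(x0, lam, K, t):
    """Closed-form solution of x' = lam * x * ln(K / x) with x(0) = x0.

    With u = ln(x / K) the equation becomes u' = -lam * u, so
    u(t) = u(0) * exp(-lam * t).
    """
    return K * math.exp(math.log(x0 / K) * math.exp(-lam * t))


def gompertz_euler(x0, lam, K, t_end, dt=0.001):
    """Forward Euler integration of Eq. 3, as a check on the closed form."""
    x = x0
    for _ in range(int(round(t_end / dt))):
        x += dt * lam * x * math.log(K / x)
    return x


# Illustrative values only: 1e3 cells initially, plateau at 1e6 cells.
for t in (10.0, 50.0, 200.0):
    print(t, exponential_size(1e3, 0.1, t), gompertz_size(1e3, 0.1, 1e6, t))
```

For long times the exponential model grows without bound, while the Gompertz trajectory levels off just below K.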
$$\dot{v}(t) = u(t) - \beta v(t)$$

where k is the proportion of tumor cells killed per unit time per unit drug concentration, and H is the Heaviside step function, a discontinuous function whose value is zero for negative argument and one for non-negative argument [22]:

$$H(v(t) - v_{th}) = \begin{cases} 0, & \text{if } v(t) < v_{th} \\ 1, & \text{if } v(t) \ge v_{th} \end{cases}$$

The cell dynamics are described as

$$\dot{x}(t) = \gamma x(t) \ln\left(\frac{K}{x(t)}\right) - L(x(t), v(t))\, x(t)$$
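$$\text{PC:}\quad \dot{x}_1 = a_x x_0 - d_1 x_1, \qquad \dot{y}_1 = a_y y_0 - d_1 y_1, \qquad \dot{z}_1 = a_z z_0 - d_1 z_1$$
$$\text{DC:}\quad \dot{x}_2 = b_x x_1 - d_2 x_2, \qquad \dot{y}_2 = b_y y_1 - d_2 y_2, \qquad \dot{z}_2 = b_z z_1 - d_2 z_2$$
$$\text{TC:}\quad \dot{x}_3 = c_x x_2 - d_3 x_3, \qquad \dot{y}_3 = c_y y_2 - d_3 y_3, \qquad \dot{z}_3 = c_z z_2 - d_3 z_3$$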
where x0, x1, x2, and x3 indicate the populations of normal SC, PC,
DC, and TC, respectively. y0, y1, y2, and y3 indicate the populations
of wild-type leukemic SC, PC, DC, and TC, respectively. z0, z1, z2,
and z3 indicate the populations of mutant leukemic SC, PC, DC,
and TC, respectively. The rate constants are given by a, b, and c with
appropriate indices between normal, wild type, and mutant leuke-
mic cells. d0, d1, d2, and d3 indicate the death rates of SC, PC, DC,
and TC, respectively. λ is a decreasing function describing the
homeostasis of normal SC. ry and rz are the birth rates of sensitive
leukemic and resistant leukemic SC, respectively. In our previous
$$\int_0^T v(s)\, ds \le v_{cum}$$
ð
tþdt
side effects arise due to drug toxicity, including low blood cell
count, fever, heart problems, as well as a number of other adverse
events [35–37]. Different patients may suffer different side effects
and even for the same patient, due to the change in health condi-
tion over time, side effects may vary over the course of treatment.
This complexity of the side effects induced by TKI therapy makes
the scheduling of treatment for CML challenging.
3.1 Nominal Problem Formulation

In this section, we first introduce a series of ordinary differential equations (ODEs) that describe the dynamics of normal stem cells, wild-type CML cells, and mutant CML cells in response to combination therapy. Then we explain how the toxicity associated with treatment protocols is quantified by monitoring ANC values during treatment. Next, we propose a deterministic optimization problem to find the best schedule of multiple therapies based on the evolution of CML cells according to our ODE model. The resulting optimization problem is nontrivial due to the presence of ODE constraints and integer variables. We explain how the nominal problem can be solved efficiently.
3.1.1 CML Dynamics

We use ODEs to describe the dynamics of stem cells for CML patients over a given time period of M weeks. There are three different types of stem cells: normal stem cells (NSC), wild-type stem cells (WSC), and mutant stem cells (MSC). Let $I = \{1, 2, 3, \ldots, n\}$ be the set of stem cell types, where types 1, 2, and i denote NSC, WSC, and type (i − 2) MSC ($3 \le i \le n$), respectively. Let $J = \{0, 1, 2, 3\}$ be the set of drugs used to treat CML, where drugs 0, 1, 2, and 3 denote a drug holiday, nilotinib, dasatinib, and imatinib, respectively. Let $\mathcal{M} = \{1, 2, 3, \ldots, M\}$ be the set of treatment periods and $x_i(t)$ the abundance of cell type $i \in I$ (NSC, WSC, or MSC) at time t. In this project, we assume that Δt = 7 days. If drug $j \in J$ is taken in week m, the cell dynamics are modeled as follows:

$$\dot{x}_1(t) = \left( b_1^j \psi_{x_1} - d \right) x_1, \quad t \in [m\Delta t, (m+1)\Delta t], \; m \in \mathcal{M} \setminus \{M\}, \tag{4a}$$

$$\dot{x}_2(t) = \left( b_2^j \left(1 - (n-2)\mu\right) \psi_{x_2} - d \right) x_2, \quad t \in [m\Delta t, (m+1)\Delta t], \; m \in \mathcal{M} \setminus \{M\}, \tag{4b}$$

$$\dot{x}_i(t) = \left( b_i^j \psi_{x_2} - d \right) x_i + \mu b_2^j \psi_{x_2} x_2, \quad t \in [m\Delta t, (m+1)\Delta t], \; m \in \mathcal{M} \setminus \{M\}, \; 3 \le i \le n, \tag{4c}$$
Here, we assume that the birth rates of the NSC, WSC, and MSC are drug specific, but that drugs do not affect the death rates of stem cells, and that all stem cells have the same death rate d. The division rates of NSC, WSC, and MSC under drug j are $b_1^j$, $b_2^j$, and $b_i^j$ per week, respectively. MSC arise from WSC by mutation at rate μ. The competition between normal and leukemic stem cells is modeled by the density dependence function $\psi_{x_i}$, where

$$\psi_{x_i} = \frac{1}{1 + p_i \sum_{k=1}^{n} x_k(t)}.$$

These functions ensure that the total number of normal and leukemic stem cells remains constant once the system reaches a steady state [38]. We set the constants $p_1 = \left( b_1^0 / d - 1 \right) / K_1$ and $p_2 = \left( b_2^0 / d - 1 \right) / K_2$, where $K_1$ and $K_2$ are the equilibrium abundances of NSC and WSC, respectively (see Subheading 4.1).
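As a minimal sketch of how Eqs. 4a–4c could be integrated numerically, the following Python code advances the stem cell populations by one week under a fixed drug using a forward Euler scheme. The step size, the use of days as the time unit for the rates, and all numerical values in the usage lines are illustrative assumptions rather than the settings used in this chapter; `psi` and `step_week` are hypothetical helper names.

```python
def psi(p, cells):
    """Density dependence 1 / (1 + p * total stem cell count)."""
    return 1.0 / (1.0 + p * sum(cells))


def step_week(cells, b, d, mu, p1, p2, dt=0.1, days=7.0):
    """Advance [NSC, WSC, MSC_3, ..., MSC_n] by one week under one drug.

    b[i] is the division rate of cell type i under that drug, d the common
    death rate, mu the mutation rate from WSC to each mutant type.
    """
    n = len(cells)
    x = list(cells)
    for _ in range(int(round(days / dt))):
        psi1, psi2 = psi(p1, x), psi(p2, x)
        dx = [0.0] * n
        dx[0] = (b[0] * psi1 - d) * x[0]                         # Eq. 4a
        dx[1] = (b[1] * (1 - (n - 2) * mu) * psi2 - d) * x[1]    # Eq. 4b
        for i in range(2, n):                                    # Eq. 4c
            dx[i] = (b[i] * psi2 - d) * x[i] + mu * b[1] * psi2 * x[1]
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x


# Illustrative values (assumptions): common holiday division rate and
# equilibrium abundances chosen only to exercise the code.
d, b0, K1, K2 = 0.003, 0.008, 1e7, 2e7
p1 = (b0 / d - 1) / K1
p2 = (b0 / d - 1) / K2
print(step_week([K1, 0.0, 0.0], [b0, b0, b0], d, 1e-7, p1, p2))
```

With only NSC present at abundance K1, the density dependence evaluates to d/b0, so the net growth term in Eq. 4a vanishes and the population stays at its equilibrium, which is the steady-state property that the choice of p1 is meant to enforce.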
3.1.2 Toxicity Modeling in Nominal Optimization Problem

In our previous work [14], we developed a model to quantify the ANC levels in patients during the course of therapy. Here we review this model. We assume that the patient's ANC level decreases at rate $d_{anc,j}$ per week while taking drug j, for j = 1, 2, 3. During a drug holiday, ANC increases at rate $d_{anc,0}$ per week but never exceeds the normal level $ANC_{normal}$. At the same time, ANC should stay above an acceptable threshold level $L_{anc}$. The ANC levels are modeled as:

$$y^{m+1} = \min\left( y^m - \sum_{j \in J} d_{anc,j}\, z^{m,j},\; ANC_{normal} \right)$$

$$y^m \ge L_{anc}$$
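The ANC recursion above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the schedule is given as one drug index per week (equivalent to exactly one $z^{m,j}$ being 1 in each week), the numerical rates are those quoted in Subheading 4.1, and the holiday rate is stored as a negative number so that subtracting it raises the ANC. That sign convention and the function names are assumptions made for this sketch.

```python
ANC_NORMAL = 3000.0   # normal ANC level per mm^3 (Subheading 4.1)
L_ANC = 1000.0        # lowest acceptable ANC per mm^3 (Subheading 4.1)

# Weekly ANC change per drug; positive values are decreases. The negative
# holiday entry is an assumed sign convention (recovery of 500 per week).
D_ANC = {0: -500.0, 1: 145.8333, 2: 125.0, 3: 56.4516}


def anc_trajectory(schedule, y0=ANC_NORMAL, d_anc=D_ANC):
    """Return ANC levels y^0, y^1, ... for a weekly drug schedule."""
    levels = [y0]
    for drug in schedule:
        levels.append(min(levels[-1] - d_anc[drug], ANC_NORMAL))
    return levels


def is_feasible(schedule, **kwargs):
    """True if the ANC never drops below the threshold L_ANC."""
    return all(y >= L_ANC for y in anc_trajectory(schedule, **kwargs))


# 0 = holiday, 1 = nilotinib, 2 = dasatinib, 3 = imatinib
print(is_feasible([2] * 16), is_feasible([2] * 17))
print(anc_trajectory([2, 2, 0, 0]))
```

Under these nominal rates, continuous dasatinib brings the ANC from 3000 to exactly 1000 after 16 weeks, so a 17th consecutive week would violate the toxicity constraint unless a drug holiday is inserted.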
3.1.3 Nominal Optimization Problem

Assume that the initial population for each cell type is known. The goal of the nominal problem is to develop a treatment protocol that minimizes the cumulative leukemic cell number over a given planning period subject to the toxicity constraints. The drug used in each treatment cycle is determined by the weekly treatment decision. Within each week, the dosing regimen stays identical on a day-to-day basis. The cumulative leukemic cell number is modeled by the total number of WSC and MSC summed over all treatment periods, i.e., $\sum_{m \in \mathcal{M},\, i \in I \setminus \{1\}} x_{i,m}$, where $x_{i,m} = x_i(m\Delta t)$.

The nominal optimization problem can be formulated as a mixed-integer optimization problem with ODE constraints; details are provided in Appendix 1.
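To make the structure of the nominal problem concrete, the sketch below solves a toy version by exhaustive enumeration over all weekly drug choices for a short horizon. It is not the solution method of Appendix 1. The weekly net growth rates in `RATE` are hypothetical placeholders for the ODE dynamics, the negative holiday entry in `D_ANC` is an assumed sign convention, and enumeration is only practical for a handful of weeks because the number of schedules grows as 4 to the power of the horizon.

```python
import itertools
import math

DRUGS = (0, 1, 2, 3)  # 0 = holiday, 1 = nilotinib, 2 = dasatinib, 3 = imatinib
D_ANC = {0: -500.0, 1: 145.8333, 2: 125.0, 3: 56.4516}  # weekly ANC change
ANC_NORMAL, L_ANC = 3000.0, 1000.0

# HYPOTHETICAL weekly net growth rates for (WSC, mutant A, mutant B) per drug.
RATE = {
    0: (0.05, 0.05, 0.05),
    1: (-0.10, 0.02, -0.10),
    2: (-0.10, -0.10, 0.30),
    3: (-0.10, 0.06, -0.05),
}


def evaluate(schedule, cells=(9e5, 1e5, 1e5), anc=3000.0):
    """Cumulative leukemic burden, or None if ANC drops below L_ANC."""
    cells = list(cells)
    total = 0.0
    for drug in schedule:
        anc = min(anc - D_ANC[drug], ANC_NORMAL)
        if anc < L_ANC:
            return None  # toxicity constraint violated
        cells = [c * math.exp(r) for c, r in zip(cells, RATE[drug])]
        total += sum(cells)
    return total


def best_schedule(weeks, anc=3000.0):
    """Enumerate all 4**weeks schedules and keep the best feasible one."""
    best = None
    for schedule in itertools.product(DRUGS, repeat=weeks):
        value = evaluate(schedule, anc=anc)
        if value is not None and (best is None or value < best[0]):
            best = (value, schedule)
    return best


# A low starting ANC makes the toxicity constraint binding within six weeks.
print(best_schedule(6, anc=1300.0))
```

The integer decision (one drug per week) and the feasibility check mirror the mixed-integer structure of the nominal problem, while the exponential update stands in for the ODE constraints.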
3.2 Robust Problem Formulation

A challenge of utilizing this optimization procedure in the clinical setting is that parameters such as birth rates, death rates, and ANC decrease rates in model (Eq. 7) may vary among patients.
Even for the same patient, these parameters may vary over time due to changes in health status. By modeling the uncertainty in (Eq. 7), we investigate how parametric uncertainty affects the optimal solution. Specifically, we consider the uncertainty of drug toxicity in the model. The problem is formulated as a mixed integer robust optimization problem whose objective is to minimize the cumulative leukemic cell number over a fixed period.
3.2.1 Toxicity Uncertainty

We primarily focus on uncertainty in the rate at which the ANC level decreases under the different treatment options. In particular, we assume

$$d_{anc}^{m,j} = L^j + \hat{C}^j \eta^{m,j} \tag{5}$$
In this section, first we will describe the dataset and parameters that
were used in our numerical experiments, then the dynamics of the
CML cells under three mono-therapies will be simulated. Next, the
4.1 Parameter Selection

For our model, we assume there are two BCR-ABL mutant cell types, Y253F and F317L. We consider patients harboring three different levels of the BCR-ABL mutant cells before the start of therapy: low, medium, and high. The corresponding initial cell populations are given in Table 1. The parameter settings for birth rates and death rates in our model (Eq. 7) are given below. Based on [38], we set the death rate d to 0.003. The net-growth rate of NSC is assumed to be 0.005. The net-growth rates of WSC, $b_2^j$, under drug holiday and mono-therapies are 0.008 and 0.002, respectively. We assume that the net-growth rates of MSC under drug holiday, $b_i^0$ (i = 3, 4), are the same as $b_2^0$, which is 0.008. The values of $b_i^j$ for i = 3, 4 and $j \ge 1$ are estimated based on the work presented in [39], which studied the in vivo mutational selectivity profile for mono-therapies. For Y253F, the estimated values of $b_3^j$ are 0.0088, 0.0097, and 0.0101 under nilotinib, dasatinib, and imatinib, respectively. For F317L, the estimated values of $b_4^j$ are 0.0228, 0.0509, and 0.0079 under nilotinib, dasatinib, and imatinib, respectively. The mutation rate of WSC is $10^{-7}$ [24]. We assume the equilibrium abundances of NSC ($K_1$) and WSC ($K_2$) are $10^7$ and $2 \times 10^7$, respectively.
For toxicity constraints, we assume that the patient's normal ANC level is $U_{anc}$ = 3000/mm³ and that the ANC cannot fall below $L_{anc}$ = 1000/mm³. We assume that the patient's initial ANC is 3000/mm³. Based on the median time to a grade 3 or 4 episode of neutropenia, we estimated the weekly decrease rates of ANC as $d_{anc,1}$ = 145.8333/mm³ under nilotinib [40], $d_{anc,2}$ = 125/mm³ under dasatinib [41], and $d_{anc,3}$ = 56.4516/mm³ under imatinib [10]. We assume that the ANC of a patient increases by $d_{anc,0}$ = 500/mm³ per week during a drug holiday until it reaches the normal level of 3000/mm³. In this project, we consider two uncertainty ranges: $\hat{C}^j = 0.2 L^j$ and $\hat{C}^j = 0.3 L^j$.
Table 1
Initial cell population conditions
4.2 Cell Dynamics Simulations

In this part, we present the dynamics of stem cells with the preexisting BCR-ABL mutations Y253F and F317L under mono-therapies. As reported, Y253F is highly resistant to imatinib, mildly resistant to nilotinib, and sensitive to dasatinib; F317L is highly resistant to dasatinib and sensitive to imatinib and nilotinib. In this setting, we expect that all mono-therapies will eventually fail because of the presence of the mutant cells and their differential responses to drugs. The initial levels of NSC, WSC, Y253F, and F317L are 9E+06, 9E+05, 1E+05, and 1E+05, respectively.

Figure 1 plots the cell dynamics over 420 weeks (around 8 years) for six treatment protocols: nilotinib, dasatinib, and imatinib mono-therapy, each performed with and without drug holiday. As F317L is resistant to dasatinib, the population of F317L explodes around week 50 when administering dasatinib [42]. On the other hand, the population of Y253F is well controlled. In Fig. 2, we look only at the performance of imatinib and nilotinib mono-therapy. The population of Y253F increases over time, but the population size of F317L decreases in both cases. These results indicate that drug combination may be more effective for treating patients with multiple mutant cell types.
Next, we discuss the results for the nominal and robust optimization problems. We first report the recurrence time of the optimal schedule and of the mono-therapies assuming that the toxicity parameters are known, i.e., the nominal problem. In addition, we investigate the recurrence time of the resulting optimal schedule when
Fig. 1 (a–e) Cell dynamics under mono-therapies for 420 weeks (mutant cell types: Y253F and F317L)
Fig. 2 (a–e) Cell dynamics under mono-therapies (without dasatinib) for 420 weeks (mutant cell types: Y253F and F317L)
4.3 Nominal Optimal Treatment Plans

In this section, we are interested in the recurrence time for two scenarios: mono-therapy and the nominal optimized therapies obtained by solving the model presented in Appendix 1 for 360 weeks. The recurrence time is defined as the time at which the tumor cell population returns to its size at the start of treatment. The initial conditions for NSC, WSC, Y253F, and F317L are 9E+06, 5E+05, 3E+05, and 3E+05, respectively. The nominal optimal treatment plans are given in Fig. 3. The cell growth is shown in Fig. 4. Since F317L is highly resistant to dasatinib, we show the dynamics of tumor growth under dasatinib only for 50 weeks. The results are summarized in Table 2. Under the optimal schedule, the tumor size keeps decreasing, and thus there is no recurrence; we denote the recurrence time in this case by NA. Under nilotinib, the tumor size reaches its minimum at week 88, returns to the initial population size at week 183, and doubles its size at week 261. Under imatinib, the tumor size reaches its minimum at week 63, returns to the initial population size at week 130, and doubles its size at week 185. Under dasatinib, the tumor size keeps increasing.
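The quantities reported for each protocol (week of minimal tumor size, recurrence time, and doubling time) can be read off a simulated weekly trajectory. The following is a minimal sketch, assuming the tumor size is available as a list with one entry per week; the function names, the convention of returning `None` for NA, and the treatment of a trajectory that never falls below its initial size (recurrence at week 1) are assumptions made for illustration.

```python
def first_week_at_or_above(sizes, threshold):
    """First week m >= 1 whose tumor size is at or above the threshold."""
    for week, size in enumerate(sizes):
        if week > 0 and size >= threshold:
            return week
    return None  # corresponds to "NA"


def recurrence_time(sizes):
    """Week at which the tumor is back at its size at the start of treatment."""
    return first_week_at_or_above(sizes, sizes[0])


def doubling_time(sizes):
    """Week at which the tumor reaches twice its initial size."""
    return first_week_at_or_above(sizes, 2.0 * sizes[0])


def nadir_week(sizes):
    """Week at which the tumor size is minimal."""
    return min(range(len(sizes)), key=lambda week: sizes[week])


# Hypothetical weekly tumor sizes: initial response followed by relapse.
sizes = [100.0, 60.0, 40.0, 70.0, 100.0, 150.0, 210.0]
print(nadir_week(sizes), recurrence_time(sizes), doubling_time(sizes))
```

For a trajectory that decreases throughout, such as the one reported under the optimal schedule, `recurrence_time` returns `None`.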
Fig. 3 Optimal solution of the nominal problem for 360 weeks (mutant cell types: Y253F and F317L). Digits 0, 1, 2, and 3 represent drug holiday, nilotinib, dasatinib, and imatinib
Fig. 4 Cell dynamics under mono-therapies and optimal solution of the nominal problem for 360 weeks (mutant cell types: Y253F and F317L)
Table 2
Recurrence time for multiple mutants
We also performed a sensitivity analysis on the nominal optimal solution (shown in Fig. 3) with respect to the birth rates of mutant cells ($b_i^j$ for i = 3, 4 and j = 1, 2, 3). We are interested in how the recurrence time under the schedule shown in Fig. 3 changes as we vary the birth rates of mutant cells. A 360-week simulation is run to study the behavior of the recurrence time. We consider two scenarios.
Fig. 5 The recurrence time with respect to the birth rate changes of Y253F and F317L under one drug when
the treatment protocols are fixed as the nominal optimal solution. (a–c) show the recurrence time when birth
rates of Y253F and F317L vary under nilotinib, dasatinib, and imatinib, respectively
Scenario one is that the birth rates of both mutant cell types vary under only one drug, while the birth rates of mutant cells stay constant under the other two drugs. For example, if the drug affecting birth rates is nilotinib, then the ratios of $b_3^1$ and $b_4^1$ to their original values are uniformly distributed on [0.7, 1.3], while $b_3^2$, $b_4^2$, $b_3^3$, and $b_4^3$ are fixed. Scenario two is that the birth rates of one mutant cell type change under all drugs, whereas the birth rates of the second mutant cell type stay constant. For example, the birth rates of Y253F change under all three drugs, while the birth rates of F317L stay the same under all three drugs.
Figure 5 shows the results for scenario one. The colors indicate different recurrence times as given by the colorbar, i.e., blue, green, and red correspond to a recurrence time of 0, 150, and 360, respectively. The original birth rate of the type (i − 2) mutant cell under drug j, $b_i^{j,o}$, is given in Subheading 4.1. The varied birth rate of the type (i − 2) mutant cell under drug j is represented by $b_i^j$. The ratio $b_i^j / b_i^{j,o}$ is set to be uniformly distributed on [0.7, 1.3]. The results in Fig. 5a, c indicate that the tumor size is below the initial tumor size after 360 weeks using the proposed method. The results in Fig. 5b show that if $b_4^2 / b_4^{2,o}$ is >1.27, recurrence happens before the end of treatment. Recall that $b_4^{2,o}$, a positive value, is the original net growth rate of F317L under dasatinib. As we increase $b_4^2 / b_4^{2,o}$, $b_4^2$ increases, which causes F317L to grow faster under dasatinib. However, overall we see that the optimal schedule (shown in Fig. 3) is largely robust to changes in the birth rates of the mutant cells.
Figure 6 shows the results for scenario two, where the birth rates of F317L vary. For better visualization, we fix the birth rates under one drug while varying the birth rates under the other two, and show the resulting recurrence time. In Fig. 6a, the ratio of the F317L birth rate under nilotinib (drug 1) is fixed at $b_4^1 / b_4^{1,o} = 0.7$, and the ratios $b_4^2 / b_4^{2,o}$ and $b_4^3 / b_4^{3,o}$ are uniformly distributed on [0.7, 1.3]. Columns 1, 2, and 3 show the recurrence time of the tumor for the fixed ratio $b_4^j / b_4^{j,o}$ set at 0.7, 1.0, and 1.3, respectively. The panels in rows 1 (a, b, c), 2 (d, e, f), and 3 (g, h, i) correspond to j = 1, 2, 3, respectively. The panels in the first row show that an increase in $b_4^1 / b_4^{1,o}$ makes it less likely that the tumor reaches its initial size by the end of treatment. The reason is that F317L is highly sensitive to drug 1, which is nilotinib. As we increase the ratio $b_4^1 / b_4^{1,o}$, the growth rate of F317L is reduced. From Fig. 6a, we can see that under the extreme case ($b_4^1 / b_4^{1,o} = 0.7$, $b_4^2 / b_4^{2,o} = 1.3$, and $b_4^3 / b_4^{3,o} = 0.7$), recurrence happens around week 150. The result is consistent with the recurrence time reported in Fig. 6f, g. There is no recurrence when $b_4^2 / b_4^{2,o} \le 1$, but if $b_4^2 / b_4^{2,o} = 1.3$, recurrence happens in almost half of the cases. Since F317L is sensitive to both nilotinib and imatinib, the results in Fig. 6g–i are similar to the ones in Fig. 6a–c. The difference is that there is still a chance of tumor recurrence when $b_4^3 / b_4^{3,o} = 1.3$, because nilotinib is applied more often than imatinib in the nominal optimal solution.
4.4 Robust Optimal Treatment Plans

As we discuss in Appendix 2, protection levels (Γm) trade off the robustness of the proposed model against the conservatism of the solution. In this part, we first compare the robust optimal solutions under different protection levels, which are provided in Appendix 3, with two mono-therapies (imatinib and nilotinib). Figure 7 shows the dynamics of tumor growth for 30 weeks under nilotinib, imatinib, and optimal solutions with different protection levels (Fig. 8). For this simulation, we assume $\hat{C}^j = 0.2 L^j$, and the initial population sizes are 9E+06, 9E+05, 1E+05, and 1E+05 for NSC, WSC, Y253F, and F317L, respectively. It is interesting to note that the tumor sizes under the proposed methods at week 30 are lower than those predicted for either of the
Fig. 6 (a–i) The recurrence time with respect to the varied birth rates of F317L under three drugs when the treatment protocols are fixed as the nominal optimal solution. Rows: constant birth rate set under nilotinib, dasatinib, and imatinib, respectively. Columns: constant birth rate ratio set at 0.7, 1.0, and 1.3, respectively
Fig. 7 Cell dynamics under mono-therapies (without dasatinib) and optimal solutions for 30 weeks (mutant cell types: Y253F and F317L)
increments are calculated by $\left( Y^*_\Gamma - Y^*_0 \right) / Y^*_0$, where $Y^*_0$ and $Y^*_\Gamma$ are the optimal values of the nominal and robust optimization problems under different protection levels, respectively. It is interesting to note that the optimal value of the objective function increases as we increase the protection level of robust solutions.
Next, we consider how the optimal treatment protocols are affected by the protection levels Γ and the initial conditions of tumor size. Figure 10a–c shows the optimal treatment protocols for $\hat{C}^j = 0.2 L^j$ and initial tumor size at low (a), medium (b), and high (c) levels. For an initial tumor size at the low level, as wild-type cells dominate the total tumor size at the beginning of treatment, it is efficient to reduce tumor size by taking the drug with the lowest toxicity, which is drug 3. Recall that drugs 0, 1, 2, and 3 represent drug holiday, nilotinib, dasatinib, and imatinib, respectively. Toward the end of treatment, as the number of mutant cells increases, it is necessary to switch to dasatinib, which can reduce the number of Y253F cells efficiently. As we increase the protection level, more drug holidays are needed; e.g., for unprotected optimal solutions (Γ = 0), the third break happens at the end of treatment, week 30, however for
Fig. 8 Robust optimal solutions under $\hat{C}^j = 0.2 L^j$ for 30 weeks. The initial conditions for NSC, WSC, Y253F, and F317L are 9E+06, 9E+05, 1E+05, and 1E+05, respectively
Table 3
Robust solution for $\hat{C}^j = 0.2 L^j$: Multiple mutants
Fig. 9 Tumor size increments of optimal values under Γ with three different initial conditions
Fig. 10 Optimal solutions under $\hat{C}^j = 0.2 L^j$ with three different initial conditions: (a) initial tumor size at low level; (b) initial tumor size at medium level; (c) initial tumor size at high level
Table 4
Robust solution for $\hat{C}^j = 0.3 L^j$: Multiple mutants
Fig. 11 Optimal solutions under $\hat{C}^j = 0.3 L^j$ with three different initial conditions: (a) initial tumor size at low level; (b) initial tumor size at medium level; (c) initial tumor size at high level
for patients with an initial tumor size at low, medium, and high
levels, respectively. These results are similar to those of
$\hat{C}_j = 0.2\,L_j$. Hence, we can conclude that the structure of the
optimal solution is only mildly sensitive to the size of the
uncertainty range.
Next, we focus on comparing the differences that resulted
from the uncertainty ranges $\hat{C}_j = 0.2\,L_j$ and $\hat{C}_j = 0.3\,L_j$.
From Fig. 12, we observe that, for patients with an initial tumor size at
low, medium, and high levels, the larger the toxicity uncertainty
range, the larger the optimal value.
The idea of imposing protection levels on robust optimization
is to use conservative constraints that guarantee no toxic side effects
occur. Here, we compare the performance of nominal solutions
versus robust optimization solutions in terms of objective function
and toxic side effects. We do this by randomly generating ANC
320 Junfeng Zhu et al.
Fig. 12 The effects of uncertainty ranges under three different initial conditions: (a) initial tumor size at low
level; (b) initial tumor size at medium level; (c) initial tumor size at high level
Table 5
Price of robust optimization

        $\hat{C}_j = 0.2\,L_j$                      $\hat{C}_j = 0.3\,L_j$
Γ       Increments    Toxicity            Increments    Toxicity
        in OBJ (%)    invalidation (%)    in OBJ (%)    invalidation (%)
1       1.2707        0                   1.8250        0
0.5     0.5386        49.98               1.0237        48.64
0.4     0.4560        61.80               0.6217        61.89
0.3     0.3626        69.18               0.6044        72.31
0.2     0.2542        79.07               0.4176        82.67
0.1     0.0842        94.08               0.2397        96.53
0.05    0.0842        94.08               0.0842        97.38
0       0             100                 0             100
the ANC value is below $L_{anc}$ during the simulation, then the patient
is considered to have experienced a toxic side effect and the simulation
is considered infeasible. The fraction of cases that are infeasible due to
toxic side effects is calculated as the total number of infeasible cases
divided by the total number of cases ($10^6$). The results are shown in Table 5.
Recall that Γ = 0 is equivalent to the nominal problem. For both
simulations, as we increase Γ the objective value increases, while the
probability of toxicity violation decreases. The optimal solution
obtained by ROP seems to yield an interesting tradeoff between
the two objectives of minimizing cumulative tumor population and
the infeasibility of toxicity constraints, beyond which allowing for
riskier regimens, i.e., using smaller Γ, does not lead to any
significant gain in objective function. In particular, if
$\hat{C}_j = 0.2\,L_j$, then it appears that around Γ = 0.2 there is a
sharp change in the fraction of runs that lead to toxic side effects
and a significant increase in objective value.
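The violation fraction reported in Table 5 can be illustrated with a minimal Monte Carlo sketch. This is not the authors' simulation code: the weekly ANC update follows the toxicity constraints of the Appendix, but the uniform sampling of each weekly ANC decrease within the interval $[L_j - \hat{C}_j, L_j + \hat{C}_j]$, the function names, and all numerical values below are assumptions chosen only for illustration.

```python
import random


def violates_toxicity(schedule, nominal_drop, deviation, y0, anc_normal, l_anc, rng):
    """Return True if the ANC falls below l_anc at any weekly check."""
    y = y0
    for drug in schedule:
        y_hat = min(y, anc_normal)          # ANC is capped at its normal level
        if y_hat < l_anc:
            return True
        low = nominal_drop[drug] - deviation[drug]
        high = nominal_drop[drug] + deviation[drug]
        y = y_hat - rng.uniform(low, high)  # sampled weekly ANC decrease (assumed uniform)
    return min(y, anc_normal) < l_anc


def violation_fraction(schedule, nominal_drop, deviation, y0, anc_normal, l_anc,
                       n_runs=10_000, seed=0):
    """Fraction of simulated patients whose schedule violates the ANC limit."""
    rng = random.Random(seed)
    bad = sum(
        violates_toxicity(schedule, nominal_drop, deviation, y0, anc_normal, l_anc, rng)
        for _ in range(n_runs)
    )
    return bad / n_runs


# Hypothetical values: drug 0 is a holiday (negative decrease = recovery).
NOMINAL_DROP = {0: -500.0, 1: 400.0, 2: 350.0, 3: 300.0}
DEVIATION = {0: 0.0, 1: 80.0, 2: 70.0, 3: 60.0}   # 0.2 * L_j for the three drugs
NO_DEVIATION = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0}

if __name__ == "__main__":
    frac = violation_fraction([1, 1, 1, 1, 1], NOMINAL_DROP, DEVIATION,
                              y0=3000.0, anc_normal=3000.0, l_anc=1000.0)
    print(f"fraction of infeasible runs: {frac:.3f}")
```

In this sketch a schedule that is exactly feasible for the nominal toxicity values can still fail in a sizeable share of runs once the decreases are perturbed, which is the effect the protection level Γ is meant to control.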
5 Conclusion
Acknowledgments
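\begin{align}
\text{s.t.}\quad \dot{x}_1(t) &= \sum_{j=0}^{3} z_{m,j}\left(b_1^{j}\,\psi_{x_1} - d\right)x_1, & t &\in [m\Delta t, (m+1)\Delta t],\ m \in \mathcal{M}\setminus\{M\} \tag{7b}\\
\dot{x}_2(t) &= \sum_{j=0}^{3} z_{m,j}\left(b_2^{j}\,(1-(n-2)\mu)\,\psi_{x_2} - d\right)x_2, & t &\in [m\Delta t, (m+1)\Delta t],\ m \in \mathcal{M}\setminus\{M\} \tag{7c}\\
\dot{x}_i(t) &= \sum_{j=0}^{3} z_{m,j}\left[\left(b_i^{j}\,\psi_{x_2} - d\right)x_i + \mu\, b_2^{j}\,\psi_{x_2}\,x_2\right], & t &\in [m\Delta t, (m+1)\Delta t],\ m \in \mathcal{M}\setminus\{M\},\ 3 \le i \le n \tag{7d}
\end{align}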
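\begin{align}
\sum_{j \in J} z_{m,j} &= 1, & m &\in \mathcal{M}\setminus\{M\}, \tag{7e}\\
y_{m+1} &= \hat{y}_m - \sum_{j \in J} d_{anc,j}\, z_{m,j}, & m &\in \mathcal{M}\setminus\{M\}, \tag{7f}\\
\hat{y}_m &= \min(y_m, \mathrm{ANC}_{normal}), & m &\in \mathcal{M}, \tag{7g}\\
L_{anc} &\le \hat{y}_m, & m &\in \mathcal{M}, \tag{7h}\\
z_{m,j} &\in \{0,1\}, & m &\in \mathcal{M}\setminus\{M\},\ j \in J \tag{7i}
\end{align}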
where $x(0)$ and $y_0$ are given. Equations 7b, 7c, and 7d describe the
dynamics of NSC, WSC, and MSC, respectively. Equations 7e
and 7i indicate that during each week only one type of drug, or no
drug, is allowed. Equations 7f, 7g, and 7h describe the toxicity
constraints.
As discussed in the previous work [26], the ODEs can be
approximated by linear functions:
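\begin{align}
\min\ & \sum_{m \in \mathcal{M},\, i \in I\setminus\{1\}} x_{i,m} \\
\text{s.t.}\quad & x_{i,m+1} = \sum_{j=0}^{3} z_{m,j}\left(C_{i,0}^{j} + \sum_{k=1}^{n} C_{i,k}^{j}\, x_{k,m}\right), \quad t \in [m\Delta t, (m+1)\Delta t],\ m \in \mathcal{M}\setminus\{M\} \\
& \sum_{j \in J} z_{m,j} = 1, \quad m \in \mathcal{M}\setminus\{M\} \\
& y_{m+1} = \hat{y}_m - \sum_{j \in J} d_{anc,j}\, z_{m,j}, \quad m \in \mathcal{M}\setminus\{M\} \\
& \hat{y}_m = \min(y_m, \mathrm{ANC}_{normal}), \quad m \in \mathcal{M} \\
& L_{anc} \le \hat{y}_m, \quad m \in \mathcal{M} \\
& z_{m,j} \in \{0,1\}, \quad m \in \mathcal{M}\setminus\{M\},\ j \in J
\end{align}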
To linearize $\hat{y}_m = \min(y_m, \mathrm{ANC}_{normal})$, we introduce a binary
variable $p_m$:
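\begin{align}
\hat{y}_m &\ge \mathrm{ANC}_{normal} - U_y\,(1 - p_m), \\
\hat{y}_m &\ge y_m - U_y\, p_m, \\
\hat{y}_m &\le y_m, \\
\hat{y}_m &\le \mathrm{ANC}_{normal}, \\
p_m &\in \{0,1\}.
\end{align}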
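The following short sketch checks numerically that these four inequalities, together with the binary variable, leave the minimum as the only admissible value. It assumes that $U_y$ is an upper bound at least as large as the gap between $y_m$ and $\mathrm{ANC}_{normal}$; the function names and test values are hypothetical.

```python
def feasible_interval(y, anc_normal, u_y, p):
    """Range of y_hat allowed by the four inequalities for a fixed binary p, or None."""
    lower = max(anc_normal - u_y * (1 - p), y - u_y * p)
    upper = min(y, anc_normal)
    return (lower, upper) if lower <= upper else None


def linearized_min(y, anc_normal, u_y):
    """Return the single value of y_hat admitted over both choices of p."""
    values = set()
    for p in (0, 1):
        interval = feasible_interval(y, anc_normal, u_y, p)
        if interval is None:
            continue                      # this choice of p is infeasible
        lower, upper = interval
        if lower != upper:
            raise ValueError("u_y is too small: y_hat is not pinned to one value")
        values.add(lower)
    if len(values) != 1:
        raise ValueError("constraints admit no value or several values")
    return values.pop()


if __name__ == "__main__":
    for y in (1500.0, 2000.0, 2500.0):
        print(y, linearized_min(y, anc_normal=2000.0, u_y=5000.0))
```

When $y_m < \mathrm{ANC}_{normal}$ only $p_m = 0$ is feasible, and when $y_m > \mathrm{ANC}_{normal}$ only $p_m = 1$ is feasible, so the solver is forced to pick the branch that reproduces the minimum.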
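The nominal problem can be transformed into a MILP as
\begin{align}
\min\ & \sum_{m \in \mathcal{M},\, i \in I\setminus\{1\}} x_{i,m} \tag{8a}\\
\text{s.t.}\quad & x_{i,m+1} = \sum_{j=0}^{3}\left(z_{m,j}\, C_{i,0}^{j} + \sum_{k=1}^{n} C_{i,k}^{j}\, v_k^{m,j}\right), \quad t \in [m\Delta t, (m+1)\Delta t],\ m \in \mathcal{M}\setminus\{M\}, \tag{8b}\\
& 0 \le v_i^{m,j} \le U_i\, z_{m,j}, \tag{8c}\\
& -U_i\,(1 - z_{m,j}) \le v_i^{m,j} - x_{i,m}, \tag{8d}\\
& v_i^{m,j} - x_{i,m} \le U_i\,(1 - z_{m,j}), \tag{8e}\\
& \sum_{j \in J} z_{m,j} = 1, \quad m \in \mathcal{M}\setminus\{M\}, \tag{8f}\\
& y_{m+1} = \hat{y}_m - \sum_{j \in J} d_{anc,j}\, z_{m,j}, \quad m \in \mathcal{M}\setminus\{M\}, \tag{8g}\\
& \hat{y}_m \ge \mathrm{ANC}_{normal} - U_y\,(1 - p_m), \tag{8h}\\
& \hat{y}_m \ge y_m - U_y\, p_m, \tag{8i}\\
& \hat{y}_m \le y_m, \tag{8j}\\
& \hat{y}_m \le \mathrm{ANC}_{normal}, \tag{8k}\\
& p_m \in \{0,1\}, \tag{8l}\\
& L_{anc} \le \hat{y}_m, \quad m \in \mathcal{M}, \tag{8m}\\
& z_{m,j} \in \{0,1\}, \quad m \in \mathcal{M}\setminus\{M\},\ j \in J, \tag{8n}
\end{align}
where $x(0)$ and $y_0$ are given.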
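Constraints 8c, 8d, and 8e replace the product of the binary decision $z_{m,j}$ and the cell count $x_{i,m}$ by the auxiliary variable $v_i^{m,j}$. A small sketch, assuming $0 \le x_{i,m} \le U_i$ (the function name and the test values are hypothetical), confirms that the three constraints admit only $v = z\,x$:

```python
def feasible_v(x, z, u):
    """Interval of v allowed by constraints 8c-8e for x in [0, u] and binary z, or None."""
    lower = max(0.0, x - u * (1 - z))     # from 8c (v >= 0) and 8d
    upper = min(u * z, x + u * (1 - z))   # from 8c (v <= U z) and 8e
    return (lower, upper) if lower <= upper else None


if __name__ == "__main__":
    u = 10.0
    for x in (0.0, 2.5, 10.0):
        for z in (0, 1):
            print(f"x={x}, z={z}: v in {feasible_v(x, z, u)}")
```

With $z = 1$ the interval collapses to $v = x$, and with $z = 0$ it collapses to $v = 0$, so the bilinear term in the linear approximation becomes linear in the decision variables.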
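\begin{align}
\text{s.t.}\quad & y_{m+1} - \hat{y}_m + \sum_{j \in J} L_j\, z_{m,j} + \max_{C_m^{RO}}\left\{\sum_{j \in S_m} \hat{C}_j\, z_{m,j} + \left(\Gamma_m - \lfloor \Gamma_m \rfloor\right)\hat{C}_{t_m}\, z_{m,t_m}\right\} \le 0, \tag{9b}
\end{align}
\begin{align}
\text{s.t.}\quad & -\hat{C}_j\, z_{m,j} + q_m + p_{m,j} \ge 0, \tag{11b}\\
& q_m \ge 0, \tag{11c}\\
& p_{m,j} \ge 0, \tag{11d}
\end{align}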
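Thus, the optimal solution of our robust problem can be
obtained by solving the MILP:
\begin{align}
\min\ & \sum_{m \in \mathcal{M},\, i \in I\setminus\{1\}} x_{i,m} \tag{12a}\\
\text{s.t.}\quad & x_{i,m+1} = \sum_{j=0}^{3}\left(z_{m,j}\, C_{i,0}^{j} + \sum_{k=1}^{n} C_{i,k}^{j}\, v_k^{m,j}\right), \quad t \in [m\Delta t, (m+1)\Delta t],\ m \in \mathcal{M}\setminus\{M\}, \tag{12b}\\
& 0 \le v_i^{m,j} \le U_i\, z_{m,j}, \tag{12c}\\
& -U_i\,(1 - z_{m,j}) \le v_i^{m,j} - x_{i,m}, \tag{12d}\\
& v_i^{m,j} - x_{i,m} \le U_i\,(1 - z_{m,j}), \tag{12e}\\
& \sum_{j \in J} z_{m,j} = 1, \quad m \in \mathcal{M}\setminus\{M\}, \tag{12f}\\
& y_{m+1} \le \hat{y}_m - \sum_{j \in J} L_j\, z_{m,j} - q_m \Gamma_m - \sum_{j \in J} p_{m,j}, \quad m \in \mathcal{M}\setminus\{M\}, \tag{12g}\\
& \hat{y}_m \ge \mathrm{ANC}_{normal} - U_y\,(1 - p_m), \tag{12h}\\
& \hat{y}_m \ge y_m - U_y\, p_m, \tag{12i}\\
& \hat{y}_m \le y_m, \tag{12j}\\
& \hat{y}_m \le \mathrm{ANC}_{normal}, \tag{12k}\\
& p_m \in \{0,1\}, \tag{12l}\\
& L_{anc} \le \hat{y}_m, \quad m \in \mathcal{M}, \tag{12m}
\end{align}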
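The step from the inner maximization in Eq. 9b to the terms $q_m \Gamma_m + \sum_j p_{m,j}$ in Eq. 12g rests on linear programming duality. The sketch below illustrates that equivalence numerically under two assumptions that are ours rather than the chapter's: the inner maximum is taken to be the sum of the $\lfloor \Gamma \rfloor$ largest deviations plus a fractional share of the next one (the usual budgeted-uncertainty form), and the dual is taken to be the minimization of $q\Gamma + \sum_j p_j$ subject to constraints of the form 11b, 11c, and 11d. Function names and numbers are hypothetical.

```python
import math


def protection_primal(deviations, gamma):
    """Worst-case extra toxicity: floor(gamma) largest deviations plus a fractional one."""
    ranked = sorted(deviations, reverse=True)
    whole = math.floor(gamma)
    total = sum(ranked[:whole])
    if whole < len(ranked):
        total += (gamma - whole) * ranked[whole]
    return total


def protection_dual(deviations, gamma):
    """Minimum of q*gamma + sum_j p_j with q + p_j >= deviation_j and q, p_j >= 0.

    For fixed q the best p_j is max(deviation_j - q, 0); the resulting function of q
    is convex and piecewise linear, so it suffices to test q = 0 and q = each deviation.
    """
    candidates = [0.0] + list(deviations)
    return min(
        q * gamma + sum(max(d - q, 0.0) for d in deviations)
        for q in candidates
    )


if __name__ == "__main__":
    # One drug per week: only the chosen drug has a nonzero deviation C_hat_j * z_{m,j}.
    weekly = [0.0, 80.0, 0.0, 0.0]
    for gamma in (0.0, 0.2, 0.5, 1.0):
        print(gamma, protection_primal(weekly, gamma), protection_dual(weekly, gamma))
```

Because exactly one $z_{m,j}$ equals 1 in each week, the protection term in this setting reduces to Γ times the deviation of the selected drug whenever $0 \le \Gamma \le 1$, which is the range of protection levels listed in Table 5.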
Fig. 13 Robust optimal solutions under $\hat{C}_j = 0.2\,L_j$ for 30 weeks. The initial conditions for NSC, WSC,
Y253F, and F317L are 9E+06, 9E+05, 1E+04, and 1E+04, respectively
Fig. 14 Robust optimal solutions under $\hat{C}_j = 0.2\,L_j$ for 30 weeks. The initial conditions for NSC, WSC,
Y253F, and F317L are 9E+06, 5E+05, 3E+05, and 3E+05, respectively
Fig. 15 Robust optimal solutions under $\hat{C}_j = 0.3\,L_j$ for 30 weeks. The initial conditions for NSC, WSC,
Y253F, and F317L are 9E+06, 9E+05, 1E+04, and 1E+04, respectively
Fig. 16 Robust optimal solutions under $\hat{C}_j = 0.3\,L_j$ for 30 weeks. The initial conditions for NSC, WSC,
Y253F, and F317L are 9E+06, 9E+05, 1E+05, and 1E+05, respectively
Fig. 17 Robust optimal solutions under $\hat{C}_j = 0.3\,L_j$ for 30 weeks. The initial conditions for NSC, WSC,
Y253F, and F317L are 9E+06, 5E+05, 3E+05, and 3E+05, respectively
References
1. Shi Z, Peng XX, Kim IW et al (2007) Erlotinib (Tarceva, OSI-774) antagonizes ATP-binding cassette subfamily B member 1 and ATP-binding cassette subfamily G member 2-mediated drug resistance. Cancer Res 67:11012–20
2. Paraiso KH, Xiang Y, Rebecca VW et al (2011) PTEN loss confers BRAF inhibitor resistance to melanoma cells through the suppression of BIM expression. Cancer Res 71:2750–2760
3. Foo J, Michor F (2014) Evolution of acquired resistance to anti-cancer therapy. J Theor Biol 355:10
4. Leder K, Foo J, Skaggs B et al (2011) Fitness conferred by BCR-ABL kinase domain mutations determines the risk of pre-existing resistance in chronic myeloid leukemia. PLoS One 6(11):e27682. https://doi.org/10.1371/journal.pone.0027682
5. Foo J, Leder K (2013) Dynamics of cancer recurrence. Annals Appl Probab 23(4):1437–1468
6. Swanson KR, Bridge C, Murray JD et al (2003) Virtual and real brain tumors: using mathematical modeling to quantify glioma growth and invasion. J Neurol Sci 216(1):1–10
7. Badri H, Watanabe Y, Leder K (2015) Optimal radiotherapy dose schedules under parametric uncertainty. Phys Med Biol 61(1):338
8. Zhou C, Wu YL, Chen G et al (2011) Erlotinib versus chemotherapy as first-line treatment for patients with advanced EGFR mutation-positive non-small-cell lung cancer (OPTIMAL, CTONG-0802): a multicentre, open-label, randomised, phase 3 study. Lancet Oncol 12(8):735–742
9. Druker BJ, Talpaz M, Resta DJ et al (2001) Efficacy and safety of a specific inhibitor of the BCR-ABL tyrosine kinase in chronic myeloid leukemia. N Engl J Med 344:1031–1037
10. Kantarjian H, Sawyers C, Hochhaus A et al (2002) Hematologic and cytogenetic responses to imatinib mesylate in chronic myelogenous leukemia. N Engl J Med 346:645–652
11. Cortes JE, Jones D, O’Brien S et al (2010) Results of dasatinib in patients with early chronic-phase chronic myeloid leukemia. J Clin Oncol 28(3):398–404
12. Giles FJ, Abruzzese E, Rosti G et al (2010) Nilotinib is active in chronic and accelerated phase chronic myeloid leukemia following failure of imatinib and dasatinib therapy. Leukemia 24:1299–1301
13. O’Hare T, Eide CA, Deininger MWN (2007) Bcr-Abl kinase domain mutations, drug resistance, and the road to a cure for chronic myeloid leukemia. Blood 110:2242–2249
14. He Q, Zhu JF, Dingli D et al (2016) Optimized treatment schedules for chronic myeloid leukemia. PLoS Comput Biol 12:e1005129
15. Harrold JM, Parker RS (2009) Clinically relevant cancer chemotherapy dose scheduling via mixed-integer optimization. Comput Chem Eng 33(12):2042–2054
16. Murray JM (1990) Some optimal control problems in cancer chemotherapy with a toxicity limit. Math Biosci 100(1):49–67
17. Murray JM (1990) Optimal control for a cancer chemotherapy problem with general growth and loss functions. Math Biosci 98:273–287
18. Hadjiandreou MM, Mitsis GG (2014) Mathematical modeling of tumor growth, drug-resistance, toxicity, and optimal therapy design. IEEE Trans Biomed Eng 61(2):415–425
19. Laird AK (1964) Dynamics of tumour growth. Br J Cancer 18(3):490–502
20. Martin RB (1992) Optimal control drug scheduling of cancer chemotherapy. Automatica 28:1113–1123
21. Floares A Neural networks control of drug dosage regimens in cancer chemotherapy. SAIA, Cluj-Napoca, Transilvania
22. Weisstein, Eric W. Heaviside step function. MathWorld
23. Afenya EK (2001) Recovery of normal hemopoiesis in disseminated cancer therapy-a model. Math Biosci 172
24. Michor F, Hughes TP, Iwasa Y et al (2005) Dynamics of chronic myeloid leukaemia. Nature 435:1267–1270
25. Bozic I, Reiter JG, Allen B et al (2013) Evolutionary dynamics of cancer in response to targeted combination therapy. Elife 2:e00747
26. Nanda S, Moore H, Lenhart S (2007) Optimal control of treatment in a mathematical model of chronic myelogenous leukemia. Math Biosci 210:143
27. O’Brien S, Berman E, Borghaei H et al (2009) NCCN clinical practice guidelines in oncology: chronic myelogenous leukemia. J Natl Compr Canc Netw 7(9):984–1023
28. Sokal JE, Cox EB, Baccarani M et al (1984) Prognostic discrimination in “good-risk” chronic granulocytic leukemia. Blood 63:789–799
29. Hasford J, Pfirrmann M, Hehlmann R et al (1998) A new prognostic score for survival of patients with chronic myeloid leukemia treated with interferon alfa. Writing Committee for the Collaborative CML Prognostic Factors Project Group. J Natl Cancer Inst 90:850–858
30. Scheijen B, Griffin JD (2002) Tyrosine kinase oncogenes in normal hematopoiesis and hematological disease. Oncogene 21:3314
31. Deininger MW, O’Brien S, Ford JM et al (2003) Practical management of patients with chronic myeloid leukemia receiving imatinib. J Clin Oncol 21(8):1637–1647
32. Katia BBP, Israel B, Carla B et al (2015) BCR-ABL mutations in Chronic Myeloid Leukemia treated with tyrosine kinase inhibitors and impact on survival. Cancer Invest 33:451–458
33. Ravin JG, Hagop K, Susan O et al (2009) The use of nilotinib or dasatinib after failure to 2 prior tyrosine kinase inhibitors: long-term follow-up. Blood 114(20):4361
34. Wei G, Rafiyath S, Liu D (2010) First-line treatment for chronic myeloid leukemia: dasatinib, nilotinib, or imatinib. J Hematol Oncol 3:47
35. Cornelison M, Jabbour EJ, Welch MA (2012) Managing side effects of tyrosine kinase inhibitor therapy to optimize adherence in patients with chronic myeloid leukemia: the role of the midlevel practitioner. J Support Oncol 10(1):14–24
36. Conchon M, Freitas CM, Rego MA et al (2011) Dasatinib - clinical trials and management of adverse events in imatinib resistant/intolerant chronic myeloid leukemia. Rev Bras Hematol Hemoter 33(2):131–139
37. Marin D (2012) Initial choice of therapy among plenty for newly diagnosed chronic myeloid leukemia. Hematology Am Soc Hematol Educ Program 1:115–121
38. Foo J, Drummond MW, Clarkson B et al (2009) Eradication of chronic myeloid leukemia stem cells: a novel mathematical model predicts no therapeutic benefit of adding G-CSF to imatinib. PLoS Comput Biol 5(9):e1000503
39. Gruber FX, Ernst T, Porkka K et al (2012) Dynamics of the emergence of dasatinib and nilotinib resistance in imatinib-resistant CML patients. Leukemia 26:172–177
40. Cortes JE, Jones D, O’Brien S et al (2010) Nilotinib as front-line treatment for patients with chronic myeloid leukemia in early chronic phase. J Clin Oncol 28(3):392–397
41. Radich JP, Kopecky KJ, Appelbaum FR et al (2012) A randomized trial of dasatinib 100 mg versus imatinib 400 mg in newly diagnosed chronic-phase chronic myeloid leukemia. Blood 120(19):3898–3905
42. Deininger M, Mauro M, Matloub Y et al (2008) Prevalence of T315I, dasatinib-specific resistant mutations (F317L, V299L, and T315A), and nilotinib-specific resistant mutations (P-loop and F359) at the time of imatinib resistance in chronic-phase chronic myeloid leukemia (CP-CML). Blood 112:3236
43. Sawyers C (2004) Targeted cancer therapy. Nature 432:294–297
44. Cortes J, Talpaz M, O’Brien S et al (2005) Molecular responses in patients with chronic myelogenous leukemia in chronic phase treated with imatinib mesylate. Clin Cancer Res 11:3425
Chapter 16
Abstract
Mathematical models of cancer stem cells are useful in translational cancer research for facilitating the
understanding of tumor growth dynamics and for predicting treatment response and resistance to com-
bined targeted therapies. In this chapter, we describe appealing aspects of different methods used in
mathematical oncology and discuss compelling questions in oncology that can be addressed with these
modeling techniques. We describe a simplified version of a model of the breast cancer stem cell niche,
illustrate the visualization of the model, and apply stochastic simulation to generate full distributions and
average trajectories of cell type populations over time. We further discuss the advent of single-cell data in
studying cancer stem cell heterogeneity and how these data can be integrated with modeling to advance
understanding of the dynamics of invasive and proliferative populations during cancer progression and
response to therapy.
Key words Breast cancer, Cancer stem cell, Mathematical model, Optimal therapy design
1 Introduction
Louise von Stechow (ed.), Cancer Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 1711,
https://doi.org/10.1007/978-1-4939-7493-1_16, © Springer Science+Business Media, LLC 2018
334 Mary E. Sehl and Max S. Wicha
2.2 Modeling Cell-Cell Interactions Between Cancer Stem Cells and Their Microenvironment
Because tumors consist of many cell types that interact with each
other, as well as with the numerous cell types that are present in the
tumor microenvironment, models that account for these interactions
are required. Evolutionary game theory has been useful in
modeling these interactions [1]. Models based on evolutionary
game theory have been employed to examine mechanisms of
growth control under conditions of competing resources [21],
and have predicted the evolution of cooperation among tumor
cells [22].
The breast cancer stem cell microenvironment consists of a
number of diverse cell types including more differentiated tumor
cells, stromal cells, endothelial cells, and immune cells. These cells
interact with each other through a number of signaling mechanisms
involving cytokines, growth factors, and other signaling molecules,
such as miRNAs [23–29].
Under normal conditions, the stem cell niche regulates how
stem cells participate in tissue generation, maintenance, and repair,
preventing stem cell depletion and overpopulation. The interaction
between these normal, tissue-specific stem cells and their niche is
required for balanced tissue maintenance, and aberrant function of
the niche may contribute to malignant transformation.
The cancer stem cell niche plays an important role in the
regulation of tumor growth and metastasis, as well as in modulating
therapeutic response. Here, we will describe the cellular elements of
the breast cancer stem cell niche.
Breast cancer stem cells (BCSCs) exist in either a proliferative, epithelial
state, characterized by expression of ALDH as well as epithelial
markers such as E-cadherin, or in a quiescent, invasive, mesenchymal
state, characterized by expression of CD44 as well as additional
mesenchymal markers such as vimentin, N-cadherin, Twist,
Slug, and Snail [30]. When a BCSC is in the proliferative state, it
can undergo symmetric self-renewal, or asymmetric self-renewal,
giving rise to one identical copy of itself and one bipotent progenitor
cell [31, 32]. Alternatively, it can undergo symmetric
Fig. 1 Types of stem cell division. A stem cell or stem-like cell can undergo symmetric self-renewal, giving
rise to two identical copies of itself, or asymmetric self-renewal, giving rise to one identical copy of itself
and one partially differentiated progenitor cell. It can also undergo symmetric differentiation, in which it gives
rise to two partially differentiated daughter cells
2.3 Relevance of Spatial Factors?
Spatial organization is a key factor for growth and tissue renewal
during development and regeneration of healthy tissues [34]. It
was first observed in the germ stem cell niche of Drosophila
melanogaster that during cell division, the mitotic spindle is aligned with
support cells of the niche so that the daughter cell that remains
within the niche retains stem cell identity, whereas the daughter cell
that is displaced outside the niche (away from self-renewal signals)
initiates differentiation [35]. These oriented divisions have also
been observed in mammalian epithelia. For example, the position
of a stem cell within a hair follicle predicts whether it is likely to
Modeling Cancer Stem Cell Niche Dynamics 337
2.5 Integration of Immunotherapy with Molecularly Targeted and Cytotoxic Therapies
The advent of immunotherapy has led to a dramatic shift in the
treatment and survival of several tumors, such as melanoma, renal
cell carcinoma, lung cancer, and Hodgkin lymphoma
[42–49]. Approximately one-quarter of patients with triple negative
breast cancer respond to immunotherapy [50]. Immunotherapy is
particularly successful in aggressive malignancies, where the percentage
of tumor-initiating cells is high. For example, in melanoma the
majority of tumor cells have capacity for self-renewal [51]. These
tumors were the first where immunotherapy was shown to be
successful. Immunotherapy, informed by mathematical modeling, may
have a greater chance of leading to durable remissions [52].
Successful immunotherapy should target stem-like cells as well
as bulk tumor cells. Mathematical modeling can be helpful in pre-
dicting the variable response to immunotherapy based on different
proportions of cell types comprising a tumor. These models are
especially relevant in the adjuvant setting, where tumor growth and
invasion are driven by a small number of cells on a longer time scale,
and where considerably more time and resources are required to
directly observe survival outcomes in relation to therapy. If
immunotherapy is successful in activating the immune system to target
the stem cell compartment, it should eventually lead to eradication
of the tumor. However, the duration of therapy required
to observe an appreciable change in bulk tumor size is unknown.
Stochastic models can be used to predict extinction times of the cell
populations comprising the tumor, allowing the estimation of the
treatment duration required to eradicate cancer cells [53]. Models
should also take into account the potential costs of immunotherapy,
including autoimmune side effects. These models would allow
selection of the optimal treatment dosing and duration that
would have the best chance of tumor eradication while
minimizing the risk of side effects.
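As a minimal illustration of how an extinction-time prediction can be turned into a treatment duration, the sketch below uses the classical closed-form extinction probability of a linear birth-death process, in which each cell divides at rate `birth` and dies at rate `death` independently of the others. This is a textbook simplification and not the model of [53]; the single-compartment structure, the function names, and the rates in the example are assumptions made only for illustration.

```python
import math


def extinction_probability(birth, death, n0, t):
    """P(all n0 initial cells and their progeny are extinct by time t)."""
    if birth == death:
        single = birth * t / (1.0 + birth * t)
    else:
        growth = math.exp((birth - death) * t)
        single = death * (growth - 1.0) / (birth * growth - death)
    return single ** n0   # lineages of different founder cells are independent


def duration_for_eradication(birth, death, n0, target=0.99, tol=1e-6):
    """Shortest treatment time (within tol) at which extinction probability >= target."""
    if birth >= death:
        raise ValueError("sketch assumes the death rate exceeds the birth rate on therapy")
    lo, hi = 0.0, 1.0
    while extinction_probability(birth, death, n0, hi) < target:
        hi *= 2.0                          # expand until the target is reachable
    while hi - lo > tol:                   # bisection on the monotone probability
        mid = 0.5 * (lo + hi)
        if extinction_probability(birth, death, n0, mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Hypothetical per-week rates for a stem-like compartment of 1000 cells on therapy.
    weeks = duration_for_eradication(birth=0.1, death=0.5, n0=1000)
    print(f"treatment duration for 99% eradication probability: {weeks:.1f} weeks")
```

Even in this simplified setting the required duration grows only logarithmically with the initial number of cells, whereas it is highly sensitive to the net death rate achieved by therapy.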
Another area in immunotherapy where mathematical modeling
may prove useful is in determining optimal combinations of thera-
pies. A branching process model has been used to predict success of
combination therapy under assumptions of mutations conferring
resistance [54]. In models combining cytotoxic chemotherapy,
vaccine therapy, CTLA4 and PD-1 inhibitors, and drugs targeting
the BRAF and MEK pathways and other molecular pathways
[55, 56], it will be important to model dosing and effectiveness in
order to address the need to minimize potentially debilitating side
effects, including autoimmune processes as well as the development
of secondary malignancies.
3.1 Defining the Model
The breast cancer stem cell niche is a complex system comprised of
cancer stem cells, and the surrounding cells and molecular signals
that govern the behavior of the stem cells. Multiple overlapping
feedback loops regulate whether a cancer stem cell undergoes
self-renewal, quiescence, differentiation, or apoptosis. The niche also
regulates the rare event of partially differentiated breast epithelial
cancer cells undergoing dedifferentiation into a stem-like state.
The scope of a model is defined by the reactant species involved,
and by the reactions or events that take place. Examples of species
involved in the breast cancer stem cell niche include cancer stem
cells (quiescent and invasive versus proliferative), progenitor cells,
differentiated luminal and basal cells, endothelial cells, mesenchy-
mal cells, immune cells as well as the elements of signaling path-
ways, which regulate the transitions and interactions between these
cell types [26, 61]. Those signaling pathway elements include
cytokines (e.g., IL-6, IL-8, TGF-β, BMPs), receptors (e.g., HER2
and CXCR1), and intracellular signals, including protein kinases
(e.g., Akt), transcription factor proteins (e.g., Lin28, IκB, Stat3),
microRNA precursors (e.g., let-7), and microRNAs (e.g., mir-93)
[23–29]. The reactions of a model describe the important events
that change the abundance of reactant species. Examples of reac-
tions in the breast cancer stem cell niche include stem cell self-
renewal, quiescence, differentiation, and apoptosis. In general, a
model should be kept as simple as possible, adding sufficient com-
plexity to address the biological principles involved.
Figure 2 shows a simplified model of the state transitions that
occur between the proliferative epithelial (MET) state of breast
cancer stem cells (BCSCs) and their invasive quiescent mesenchy-
mal (EMT) state (for illustration, a small number of species and
reactions have been included here). The species include cell types
(EMT and MET states of the BCSCs) and the factors (cytokines
and intracellular signaling molecules) that regulate transitions
[Figure 2 diagram. Nodes: Epithelial BCSC, Mesenchymal BCSC. Signal labels: IL-6, gp130, TGF-β, TGF-βR2; mir-93, BMPs, HER2, EGFR]
Species: epithelial BCSC, mesenchymal BCSC, IL-6 and its receptor (gp130), TGF-β
and its receptor (TGF-βR2), mir-93, BMP, HER2 and its receptor (EGFR).
Fig. 2 Schematic of microenvironmental signals governing BCSC state transitions. In this simplified model of
the BCSC niche, we identify the species involved, including cell types (the proliferative epithelial BCSCs and
the quiescent mesenchymal BCSC populations) and cytokines and intracellular signals that regulate transition
between these two states. The reactions included in our model directly or indirectly play a role in regulating
the BCSC state transitions
3.2 Deterministic Versus Stochastic Models
Deterministic models can provide insight into many important
aspects of microenvironmental signaling, including the
understanding of dynamic control (as revealed by time-course studies),
the impact of cellular cross-talk and identification of control points,
\begin{equation}
\frac{dE}{dt} = -\frac{dM}{dt}. \tag{3}
\end{equation}
If symmetric self-renewal, a process that results in an increase in
the number of BCSCs, were to be added to this mathematical
model, as well as apoptosis, which decreases the number of
BCSCs, the system of equations would be:
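\begin{align}
\frac{dE}{dt} &= -(k_1 y_1 + k_2 y_2)\,E + \left(\beta - \delta + k_3 y_3 + k_4 y_4 + k_5 y_5\right) M \tag{4}\\
\frac{dM}{dt} &= -\left(k_3 y_3 + k_4 y_4 + k_5 y_5\right) M + (k_1 y_1 + k_2 y_2)\,E \tag{5}
\end{align}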
where β and δ are the rates of symmetric self-renewal and apoptosis,
respectively. In this case, the rate of change in epithelial and
mesenchymal BCSCs would be equal only when β = δ.
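For the transition-only case of Eq. 3, the behavior of the deterministic model can be previewed with a few lines of forward-Euler integration. In this sketch the two lumped rates stand for $k_1 y_1 + k_2 y_2$ (epithelial to mesenchymal) and $k_3 y_3 + k_4 y_4 + k_5 y_5$ (mesenchymal to epithelial) with the signal levels $y$ held constant; the numerical values, the step size, and the function name are assumptions made for illustration.

```python
def simulate_two_state(e0, m0, rate_e_to_m, rate_m_to_e, dt=0.01, steps=5000):
    """Forward-Euler integration of dE/dt = -a*E + b*M and dM/dt = a*E - b*M."""
    e, m = e0, m0
    for _ in range(steps):
        flow = (rate_e_to_m * e - rate_m_to_e * m) * dt   # net E -> M flux in one step
        e -= flow
        m += flow                                         # what leaves E enters M (Eq. 3)
    return e, m


if __name__ == "__main__":
    # Hypothetical lumped rates: a = k1*y1 + k2*y2, b = k3*y3 + k4*y4 + k5*y5.
    e, m = simulate_two_state(e0=1000.0, m0=0.0, rate_e_to_m=0.3, rate_m_to_e=0.2)
    print(f"E = {e:.1f}, M = {m:.1f}, total = {e + m:.1f}")
```

Because every unit leaving one state enters the other, the total number of BCSCs stays constant, and the populations settle at the ratio E : M = b : a. Adding self-renewal and apoptosis terms breaks this conservation unless β = δ.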
3.3 Visualizing the Model
Petri nets are diagrams that are used in systems biology to describe
transitions and interactions that occur in complex systems [65]. In
these graphs, boxes represent the occurrence of transitions, ovals
represent species, and directed arcs delineate which reactant species
enter the reaction (i.e., arrow flows from species to reaction) and
products that are produced during the reaction (i.e., arrow flows
from the reaction to the species). Figure 3 shows the petri net
[Figure 3 diagram. Node labels: EM_Trans1, IL-6*gp130, EMT, EGFR, HER2, HER2_Dimerization]
Fig. 3 Petri net generated by the simplified model of factors regulating transitions between proliferative and
quiescent BCSC states. The Petri net demonstrates the interconnectivity of the model, defining its reactant
species (ovals) and the transitions and events (boxes) that relate them to each other
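The bookkeeping behind a Petri net can be written down compactly. The sketch below is a generic illustration rather than the software used to draw Fig. 3: places (ovals) hold integer token counts, and a transition (box) may fire when every input place holds enough tokens. The node names are borrowed from the labels in Fig. 3, but the arcs attached to them here, including the treatment of IL-6*gp130 as a signal that is consumed and returned, are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    name: str
    consumes: dict   # place -> tokens removed when the transition fires
    produces: dict   # place -> tokens added when the transition fires


def is_enabled(marking, transition):
    """A transition is enabled if every input place holds enough tokens."""
    return all(marking.get(place, 0) >= n for place, n in transition.consumes.items())


def fire(marking, transition):
    """Return the new marking after firing; the input marking is left unchanged."""
    if not is_enabled(marking, transition):
        raise ValueError(f"{transition.name} is not enabled")
    new_marking = dict(marking)
    for place, n in transition.consumes.items():
        new_marking[place] = new_marking.get(place, 0) - n
    for place, n in transition.produces.items():
        new_marking[place] = new_marking.get(place, 0) + n
    return new_marking


# Hypothetical arcs: the bound cytokine acts as a signal that is returned after firing.
EM_TRANS1 = Transition(
    name="EM_Trans1",
    consumes={"MET": 1, "IL-6*gp130": 1},
    produces={"EMT": 1, "IL-6*gp130": 1},
)
HER2_DIMERIZATION = Transition(
    name="HER2_Dimerization",
    consumes={"HER2": 1, "EGFR": 1},
    produces={"HER2*EGFR": 1},
)

if __name__ == "__main__":
    marking = {"MET": 2, "EMT": 0, "IL-6*gp130": 1}
    print(fire(marking, EM_TRANS1))
```

Listing the model in this form makes the stoichiometric change of each reaction explicit, which is the same information a stochastic simulation needs for each event.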
Table 1
Propensity and stoichiometric change for two example reactions
Fig. 4 Sample output from stochastic simulation of stem cell state transitions. The first panel shows the full
distribution of epithelial BCSC cell counts over 1000 simulations for a fixed period of time. For slower birth
rates, BCSC cell populations reach smaller final counts. In the second panel, the average trajectories of
epithelial-like BCSC populations are shown. When the birth rate is faster, BCSC cell counts initially diminish in
response to therapy but later increase over time
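A stochastic simulation of this kind can be sketched with the standard Gillespie direct method, in which each reaction is described by a propensity and a stoichiometric change. The code below is an illustrative sketch and not the simulation that produced Fig. 4: the four reactions, the choice to let self-renewal and apoptosis act on the epithelial (proliferative) state, and all rate values are assumptions.

```python
import random


def make_reactions(k_em, k_me, birth, death):
    """Each reaction: (name, propensity function of the state, stoichiometric change)."""
    return [
        ("E_to_M", lambda s: k_em * s["E"], {"E": -1, "M": +1}),
        ("M_to_E", lambda s: k_me * s["M"], {"E": +1, "M": -1}),
        ("E_self_renewal", lambda s: birth * s["E"], {"E": +1}),
        ("E_apoptosis", lambda s: death * s["E"], {"E": -1}),
    ]


def gillespie(state, reactions, t_end, rng):
    """Simulate one trajectory up to time t_end and return the final cell counts."""
    state = dict(state)
    t = 0.0
    while True:
        propensities = [rate(state) for _, rate, _ in reactions]
        total = sum(propensities)
        if total <= 0.0:
            break                                   # no reaction can occur any more
        t += rng.expovariate(total)                 # waiting time to the next event
        if t > t_end:
            break
        _, _, change = rng.choices(reactions, weights=propensities)[0]
        for species, delta in change.items():
            state[species] += delta
    return state


if __name__ == "__main__":
    rng = random.Random(2018)
    reactions = make_reactions(k_em=0.3, k_me=0.2, birth=0.25, death=0.3)
    finals = [gillespie({"E": 100, "M": 0}, reactions, 10.0, rng)["E"] for _ in range(1000)]
    print("mean epithelial BCSC count:", sum(finals) / len(finals))
```

Repeating the simulation many times, as in the usage example, yields a full distribution of final counts, and averaging the runs at each time point gives the mean trajectory.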
Acknowledgments
Thanks are given to Jill Granger for manuscript review and editing.
This work was supported by grants RO1 CA101860 and R35
CA129765, NIH/NCATS UCLA CTSI Grant KL2TR000122,
and by the Breast Cancer Research Foundation.
References
1. Nowak M (2006) Evolutionary dynamics: exploring the equations of life. Harvard University Press, Canada
2. Michor F (2008) Mathematical models of cancer stem cells. J Clin Oncol 26:2854–2861
3. Foo J, Michor F (2014) Evolution of acquired resistance to anti-cancer therapy. J Theor Biol 355:10–20
4. Weekes SL, Barker B, Bober S, Cisneros K, Cline J, Thompson A, Hlatky L, Hahnfeldt P, Enderling H (2014) A multicompartment mathematical model of cancer stem cell-driven tumor growth dynamics. Bull Math Biol 76:762–782
5. Beerenwinkel N, Schwarz RF, Gerstung M, Markowetz F (2014) Cancer evolution: mathematical models and computational inference. Syst Biol 0:1–24
6. Gupta PB, Fillmore CM, Jiang G, Shapira SD, Tao K, Kuperwasser C, Lander ES (2011) Stochastic state transitions give rise to phenotypic equilibrium in populations of cancer cells. Cell 146:633–644
7. Sehl ME, Shimada M, Landeros A, Lange K, Wicha MS (2015) Modeling of cancer stem cell state transitions predicts therapeutic response. PLoS One 10:e0135797
8. Norton L (2005) Conceptual and practical implications of breast tissue geometry: toward a more effective, less toxic therapy. Oncologist 10:370–381
9. Baldock AL, Rockne RC, Boone AD, Neal ML, Hawkins-Daarud A, Corwin DM, Bridge CA, Guyman LA, Trister AD, Mrugala MM, Rockhill JK, Swanson KR (2013) From patient-specific mathematical neuro-oncology to precision medicine. Front. Oncologia 3:62
10. Withers HR, Taylor JMG, Maciejewski B (1988) Treatment volume and tissue tolerance. Int J Radiat Oncol Biol Phys 14:751–759
11. Simon R, Altman DG (1994) Statistical aspects of prognostic factor studies in oncology. Br J Cancer 69:979–985
12. Almendro V, Cheng Y-K, Randles A, Itzkovitz S, Marusyk A, Ametller E, Gonzalez-Farre X, Munoz M, Russnes HG, Helland A, Rye IH, Borresen-Dale AL, Maruyama R, van Oudenaarden A, Dowsett M, Jones RL, Reis-Filho J, Gascon P, Goenen M, Michor F, Polyak K (2014) Inference of tumor evolution during chemotherapy by computational modeling and in situ analysis of genetic and phenotypic cellular diversity. Cell Rep 6:514–527
13. Trinh A, Rye IH, Almendro V, Helland A, Russnes HG, Markowetz F (2014) Goifish: a system for the quantification of single cell heterogeneity from ifish images. Genome Biol 15:442
14. Hou Y, Song L, Zhu P, Zhang B, Tao Y, Xu X, Li F, Wu K, Liang J, Shao D, Wu H, Ye X, Ye C, Wu R, Jian M, Chen Y, Xie W, Zhang R, Chen L, Liu X, Yao X, Zheng H, Yu C, Li Q, Gong Z, Mao M, Yang X, Yang L, Li J, Wang W, Lu Z, Gu N, Laurie G, Bolund L, Kristiansen K, Wang J, Yang X, Wang J (2012) Single-cell exome sequencing and monoclonal evolution of a JAK2-negative myeloproliferative neoplasm. Cell 148:873–885
15. Kim KI, Simon R (2014) Using single-cell sequencing data to model the evolutionary history of a tumor. BMC Bioinformatics 15:27
16. Azizi E, Fouladdel S, Deol YS, Bender J, McDermott S, Jiang H, Sehl M, Clouthier SG, Nagrath S, Wicha MS. Exploring cancer stem cells heterogeneity via single cell multiplex gene expression analysis. Abstract 1943. Proceedings: AACR 106th Annual Meeting 2015; April 5–9th, 2014; San Diego, CA.
17. Azizi E, Jiagge EM, Fouladdel S, Wong S, Dziubinski ML, Sehl M, Kyani A, Li J, Jiang H, Luther TK, Clouthier SG, McDermott SP, Carpten J, Newman LA, Merajver SD, Wicha M. Single cell multiplex gene expression analysis to unravel heterogeneity of PDX samples established from tumors of breast cancer patients with different ethnicity. Abstract 4834. Proceedings: AACR 106th Annual Meeting 2015; April 18–22, 2015; Philadelphia, PA.
18. Hwang D, Smith JJ, Leslie DM, Weston AD, Rust AG, Ramsey S, de Atauri P, Siegel AF, Bolouri H, Aitchison JD, Hood L (2005) A data integration methodology for systems biology: experimental verification. Proc Natl Acad Sci U S A 102:17302–17307
19. Yeang CH, Ideker T, Jaakkola T (2004) Physical Network Models. J Comput Biol 11:243–262
20. Markowetz F, Sprang R (2007) Inferring cellular networks – a review. BioMed Central Bioinformatics 8(Suppl 6):S5
21. Gatenby RA, Vincent TL (2003) An evolutionary model of carcinogenesis. Cancer Res 63:6212–6220
22. Axelrod R, Axelrod DE, Pienta KJ (2006) Evolution of cooperation among tumor cells. Proc Natl Acad Sci U S A 103:13474–13479
23. Korkaya H, Kim GI, Davis A, Malik F, Henry NL, Ithimakin S, Quraishi AA, Tawakkol N, D’Angelo R, Paulson AK, Chung S, Luther T, Paholak HJ, Liu S, Hassan KA, Zen Q, Clouthier SG, Wicha MS (2012) Activation of an IL6 inflammatory loop mediates trastuzumab resistance in HER2+ breast cancer by expanding the cancer stem cell population. Mol Cell 47:570–584
24. Korkaya H, Liu S, Wicha MS (2011) Breast cancer stem cells, cytokine networks, and the tumor microenvironment. J Clin Invest 121:3804–3809
25. Korkaya H, Liu S, Wicha MS (2011) Regulation of cancer stem cells by cytokine networks: attacking cancer’s inflammatory roots. Clin Cancer Res 17:6125–6129
26. Liu S, Ginestier C, SJ O, Clouthier SG, Patel SH, Monville F, Korkaya H, Heath A, Dutcher J, Kleer CG, Jung Y, Dontu G, Taichman R, Wicha MS (2011) Breast cancer stem cells are regulated by mesenchymal stem cells through cytokine networks. Cancer Res 71:614–624
27. Liu S, Clouthier SG, Wicha MS (2012) Role of microRNAs in the regulation of breast cancer stem cells. J Mammary Gland Biol Neoplasia 17:15–21
28. Deng L, Shang L, Bai S, Chen J, He X, Martin-Trevino R, Chen S, Li XY, Meng X, Yu B, Wang X, Liu Y, McDermott SP, Ariazi AE, Ginestier C, Ibarra I, Ke J, Luther T, Clouthier SG, Xu L, Shan G, Song E, Yao H, Hannon GJ, Weiss SJ, Wicha MS, Liu S (2014) MicroRNA100 inhibits self-renewal of breast cancer stem-like cells and breast tumor development. Cancer Res 74:6648–6660
29. Liu S, Patel SH, Ginestier C, Ibarra I, Martin-Trevino R, Bai S, McDermott SP, Shang L, Ke J, SJ O, Heath A, Zhang KJ, Korkaya H, Clouthier SG, Charafe-Jauffret E, Birnbaum D, Hannon GJ, Wicha MS (2012) MicroRNA93 regulates proliferation and differentiation of normal and malignant breast
32. Cicalese A, Bonizzi G, Pasi CE, Faretta M, Ronzoni S, Giulini B, Brisken C, Minucci S, Di Fiore PP, Pelicci PG (2009) The tumor suppressor p53 regulates polarity of self-renewing divisions in mammary stem cells. Cell 138:1083–1095
33. Peng D, Tanikawa T, Li W, Zhao L, Vatan L, Szeliga W, Wan S, Wei S, Wang Y, Liu Y, Staroslawska E, Szubstarski F, Rolinski J, Grywalska E, Stanisławek A, Polkowski W, Kurylcio A, Kleer C, Chang AE, Wicha M, Sabel M, Zou W, Kryczek I (2016) Myeloid-derived suppressor cells endow stem-like qualities to breast cancer cells through IL6/STAT3 and NO/NOTCH cross-talk signaling. Cancer Res 76:3156–3165
34. Rompolas P, Mesa KR, Greco V (2013) Spatial organization within a niche as a determinant of stem cell fate. Nature 402:513–518
35. Jones DL, Wagers AJ (2008) No place like home: anatomy and function of the stem cell niche. Nat Rev Mol Cell Biol 9:11–21
36. Ovadia J, Nie Q (2013) Stem cell niche structure as an inherent cause of undulating epithelial morphologies. Biophys J 104:237–246
37. Szekely T, Burrage K, Mangel M, Bonasall MB (2014) Stochastic dynamics of interacting haematopoietic stem cell niche lineages. PLoS Comput Biol 10:e1003794
38. Komarova NL (2006) Spatial stochastic models for cancer initiation and progression. Bull Math Biol 68:1573–1599
39. Komarova NL (2007) Loss- and gain-of-function mutations in cancer: mass-action, spatial and hierarchical models. J Stat Phys 128:413–446
40. Conley SJ, Gheordunescu E, Kakarala P, Newman B, Korkaya H, Heath AN, Clouthier SG, Wicha MS (2012) Antiangiogenic agents increase breast cancer stem cells via the generation of tumor hypoxia. Proc Natl Acad Sci U S A 109:1784–1789
41. Savage VM, Herman AB, West GB, Leu K
stem cells. PLoS Genet 8:e1002751 (2013) Using fractal geometry and universal
growth curves as diagnostics for comparing
30. Liu S, Cong Y, Wang D, Sun Y, Deng L, Liu Y, tumor vasculature and metabolic rate with
Martin-Trevino R, Shang L, McDermott SP, healthy tissue and for predicting responses to
Landis MD, Hog S, Adams A, D’Angelo R, drug therapies. Discr Cont Dyn Syst Ser B
Ginestier C, Charafe-Jauffret E, Clouthier SG, 18:1077–1108
Birnbaum D, Wong ST, Zhan M, Chang JC,
Wicha MS (2013) Breast cancer stem cell tran- 42. Pardoll DM (2012) The blockade of immune
sition between epithelial and mesenchymal checkpoints in cancer immunotherapy. Nat Rev
states reflective of their normal counterparts. Cancer 12:252–264
Stem Cell Rep 2:78–91 43. Hodi FS, O’Day SJ, DF MD, Weber RW, Sos-
31. Morrison SJ, Kimble J (2006) Asymmetric and man JA, Haanen JB, Gonzalez R, Robert C,
symmetric stem-cell divisions in development Schadendorf D, Hassel JC, Akerley W, van den
and cancer. Nature 441:1068–1074 Eertwegh AJ, Lutzky J, Lorigan P, Vaubel JM,
Linette GP, Hogg D, Ottensmeier CH,
348 Mary E. Sehl and Max S. Wicha
Lebbé C, Peschel C, Quirt I, Clark JI, Wolchok MPDL3280A leads to clinical activity in
JD, Weber JS, Tian J, Yellin MJ, Nichol GM, patients with metastatic triple-negative breast
Hoos A, Urba WJ (2010) Improved survival cancer (TNBC). Proceedings: AACR 106th
with ipilimumab in patients with metastatic Annual Meeting 2015; April 18–22, 2015;
melanoma. N Engl J Med 363:711–723 Philadelphia, PA
44. Mellman I, Coukos G, Dranoff G (2011) Can- 51. Quintana E, Shackleton M, Sabel MS, Fullen
cer immunotherapy comes of age. Nature DR, Johnson TM, Morrison SJ (2008) Effi-
480:480–489 cient tumour formation by single human mela-
45. Luke JJ, Flaherty KT, Ribas A, Long noma cells. Nature 456:593–598
GV. Targeted agents and immunotherapies: 52. Walker R, Enderling H (2015) From concept
optimizing outcomes in melanoma. Nat Rev to clinic: mathematically informed immuno-
Clin Oncol. 2017 14 463 therapy. Curr Probl Cancer 40:68–83
46. Ribas A, Hamid O, Daud A, Hodi FS, Wolchok 53. Sehl M, Zhou H, Sinsheimer JS, Lange KL
JD, Kefford R, Joshua AM, Patnaik A, Hwu (2011) Extinction models for cancer stem cell
WJ, Weber JS, Gangadhar TC, Hersey P, therapy. Math Biosci 234(2):132–146
Dronca R, Joseph RW, Zarour H, 54. Robert L, Ribas A, Hu-Lieskovan S (2016)
Chmielowski B, Lawrence DP, Algazi A, Rizvi Combining targeted therapy with immuno-
NA, Hoffner B, Mateus C, Gergich K, Lindia therapy. Can 1+1 equal more than 2? Semin
JA, Giannotti M, Li XN, Ebbinghaus S, Kang Immunol 28:73–80
SP, Robert C (2016) Association of Pembroli- 55. Hu-Lieskovan S, Robert L, Homet Moreno B,
zumab With Tumor Response and Survival Ribas A (2014) Combining targeted therapy
Among Patients With Advanced Melanoma. with immunotherapy in BRAF-mutant mela-
JAMA 315:1600–1609 noma: promise and challenges. J Clin Oncol
47. Garon EB, Rizvi NA, Hui R, Leighl N, Balma- 32:2248–2254
noukian AS, Eder JP, Patnaik A, Aggarwal C, 56. Lu H, Clauser KR, Tam WL, Fröse J, Ye X,
Gubens M, Horn L, Carcereny E, Ahn MJ, Eaton EN, Reinhardt F, Donnenberg VS,
Felip E, Lee JS, Hellmann MD, Hamid O, Bhargava R, Carr SA, Weinberg RAA (2014)
Goldman JW, Soria JC, Dolled-Filhart M, breast cancer stem cell niche supported by jux-
Rutledge RZ, Zhang J, Lunceford JK, tacrine signalling from monocytes and macro-
Rangwala R, Lubiniecki GM, Roach C, phages. Nat Cell Biol 16:1105–1117
Emancipator K, Gandhi L (2015)
KEYNOTE-001 Investigators. Pembrolizu- 57. Bozic I, Reiter JG, Allen B, Antal T,
mab for the treatment of non-small-cell lung Chatterjee K, Shah P, Moon YS, Yaqubie A,
cancer. N Engl J Med 372:2018–2028 Kelly N, Le DT, Lipson EJ, Chapman PB,
Diaz LA Jr, Vogelstein B, Nowak MA (2013)
48. Ansell SM, Lesokhin AM, Borrello I, Evolutionary dynamics of cancer in response to
Halwani A, Scott EC, Gutierrez M, Schuster targeted combination therapy. Elife 2:e00747
SJ, Millenson MM, Cattry D, Freeman GJ,
Rodig SJ, Chapuy B, Ligon AH, Zhu L, Grosso 58. Sehl ME, Sinsheimer JS, Zhou H, Lange KL
JF, Kim SY, Timmerman JM, Shipp MA, (2009) Differential destruction of stem cells:
Armand P (2015) PD-1 blockade with nivolu- implications for targeted cancer stem cell ther-
mab in relapsed or refractory Hodgkin’s lym- apy. Cancer Res 69(24):9481–9489
phoma. N Engl J Med 372:311–319 59. Rodriguez-Brenes IA, Komarova NL, Wodarz
49. Motzer RJ, Escudier B, McDermott DF, D (2011) Evolutionary dynamics of feedback
George S, Hammers HJ, Srinivas S, Tykodi escape and the development of stem-cell-
SS, Sosman JA, Procopio G, Plimack ER, driven cancers. Proc Natl Acad Sci U S A
Castellano D, Choueiri TK, Gurney H, 108:18983–18988
Donskov F, Bono P, Wagstaff J, Gauler TC, 60. Behar M, Barken D, Werner SL, Hoffmann A
Ueda T, Tomita Y, Schutz FA, (2013) The dynamics of signaling as a pharma-
Kollmannsberger C, Larkin J, Ravaud A, cological target. Cell 155:448–461
Simon JS, LA X, Waxman IM, Sharma P 61. Sun Z, Komarova NL (2012) Stochastic mod-
(2015) CheckMate 025 Investigators. Nivolu- eling of stem-cell dynamics with control. Math
mab versus everolimus in advanced renal-cell Biosci 240:231–240
carcinoma. N Engl J Med 373:1803–1813 62. Mitchell S, Tsui R, Hoffmann A (2015) Study-
50. Leisha A. Emens, Fadi S. Braiteh, Philippe Cas- ing NF-kB signaling with mathematical mod-
sier, Jean-Pierre Delord, Joseph Paul Eder, els. Methods Mol Biol 1280:647–661
Marcella Fasso, Yuanyuan Xiao, Yan Wang, 63. Schmidt H, Jirstrand M (2006) Systems Biol-
Luciana Molinero, Daniel S. Chen and Ian ogy Toolbox for MATLAB: a computational
Krop. Abstract 2859: Inhibition of PD-L1 by
Modeling Cancer Stem Cell Niche Dynamics 349
platform for research in systems biology. Bioin- 67. Gillespie DT, Petzold LR (2003) Improved
formatics 22:514–515 leap-size selection for accelerated stochastic
64. Nagy AL, Papp D, Toth J (2012) ReactionKi- simulation. J Chem Phys 119:8229–8234
netics-- a mathematica package with applica- 68. Macklin P, Edgerton ME, Thompson AM,
tions. Chem Eng Sci 83:12–23 Cristini V (2012) Patient-calibrated agent-
65. Peterson JL (1981) Petri net theory and the based modeling of ductal carcinoma in situ
modeling of systems. Prentice-Hall, Engle- (DCIS): from microscopic measurements to
wood Cliffs, NJ macroscopic predictions of clinical progression.
66. Gillespie DT (1977) Exact stochastic simula- J Theor Biol 301:122–140
tion of coupled chemical reactions. J Phys 69. Enderling H (2013) Unveiling stem cell kinet-
Chem 81:2340–2361 ics: prime time for integrating experimental
and computational models. Front Oncol 3:291
Chapter 17
Abstract
Gene products or pathways that are aberrantly activated in cancer but not in normal tissue hold great
promise as effective and safe anticancer therapeutic targets. Many targeted drugs have entered
clinical trials but have so far shown limited efficacy, mostly due to variability in treatment responses and
often rapidly emerging resistance. More effective treatment options will require multi-targeted drugs or
drug combinations that selectively inhibit the viability and growth of cancer cells and block the distinct
escape mechanisms by which the cells become resistant. Functional profiling of drug combinations requires
careful experimental design and robust data analysis approaches. At the Institute for Molecular Medicine
Finland (FIMM), we have developed an experimental-computational pipeline for high-throughput screening
of drug combination effects in cancer cells. The integration of automated screening techniques with
advanced synergy scoring tools allows for efficient and reliable detection of synergistic drug interactions
within a specific window of concentrations, hence accelerating the identification of potential drug
combinations for further confirmatory studies.
Key words Drug combinations, High-throughput screening, Experimental design, Synergy scoring,
Computational modeling
1 Introduction
Louise von Stechow (ed.), Cancer Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 1711,
https://doi.org/10.1007/978-1-4939-7493-1_17, © Springer Science+Business Media, LLC 2018
352 Liye He et al.
2 Materials
2.1 Cell Culture
1. Established cancer cell lines can be purchased from multiple
vendors (see Note 1).
2. Patient-derived samples are obtained with permission from
Finnish biobanks, hospitals, and clinical collaborators [2].
3. Cell media, serum and supplements recommended by cell line
providers.
4. Trypsin-EDTA.
5. HyQTase.
6. CellTox Green Cytotoxicity reagent (Promega).
7. CellTiter-Glo or CellTiter-Glo 2.0 reagent (Promega).
8. 384-well tissue culture treated sterile assay plates.
9. MicroClime Environmental Lids.
10. Beckman Coulter Biomek FXP for dispensing primary cells,
which tend to grow as aggregates.
11. Plate reader.
Fig. 1 An overview of the FIMM oncology compound collection. The drug combination platform enables the
testing of pairwise drug combinations from 525 small-molecule anticancer compounds, mainly
kinase inhibitors and other signal transduction modulators. About half of the compounds in the
library are either FDA-approved or being evaluated in clinical trials at different stages
3 Methods
[Fig. 2 graphic. Panel A: dose-response matrix with dose axes running from 0 to 10,000. Panel B: synergy score surfaces (color scale −40 to 40) titled Non-interactive, Antagonistic, and Synergistic.]
Fig. 2 An overview of the drug combination data analysis. (a) A typical high-throughput drug combination
screen utilizes a dose-response matrix design where all possible dose combinations for a drug pair can be
tested. Colors in the dose-response matrices show different levels of phenotypic responses of the cancer cell
with red indicating stronger inhibition and green indicating lower inhibition. (b) Depending on the interaction
pattern models derived from the dose-response matrices, a drug combination can be classified as
non-interactive, antagonistic, or synergistic
spin the plates at 218 × g for 5 min. Measure the CTG signal in
the assay wells using a plate reader in luminescence mode.
4. MicroClime Environmental Lids are used to minimize edge
effect and to keep concentrations of solutions constant.
3.2 Drug Combination Plate Design
We utilize a combination plate layout where six compound pairs can
be accommodated on one 384-well plate. A given pair of drugs is
combined in a series of one blank and seven half-log dilution
concentrations, resulting in an 8 × 8 dose matrix. To be able to
transfer the compounds according to this matrix format, a pick list
defining the source and destination plate locations and transfer
volumes for the compounds is needed. An in-house program, called
FIMMCherry, has been developed to automatically generate these
rather complex pick lists effortlessly (see Note 7).
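The plate arithmetic described above can be illustrated with a minimal Python sketch. It is not part of FIMMCherry; the function names and the top concentrations (10,000 nM and 1,000 nM) are assumptions chosen for illustration. It builds one blank plus seven half-log doses per drug, forms the 8 × 8 dose matrix, and checks how many such matrices fit on a 384-well plate.

```python
import math


def half_log_series(top_nm, n_doses=7):
    """Return one blank (0) followed by n_doses concentrations in ascending
    order, spaced by half-log (10**0.5-fold) steps and ending at top_nm."""
    step = math.sqrt(10.0)
    doses = [top_nm / step ** i for i in range(n_doses)]
    return [0.0] + sorted(doses)


def dose_matrix(row_doses, col_doses):
    """All pairwise (row dose, column dose) combinations for one drug pair."""
    return [[(r, c) for c in col_doses] for r in row_doses]


# Hypothetical top concentrations; real values come from the compound library.
row_doses = half_log_series(10000.0)
col_doses = half_log_series(1000.0)

matrix = dose_matrix(row_doses, col_doses)
wells_per_pair = len(matrix) * len(matrix[0])   # 8 x 8 = 64 wells
pairs_per_plate = 384 // wells_per_pair         # 6 pairs fill the plate

print([round(d, 1) for d in row_doses])
print(wells_per_pair, pairs_per_plate)
```

Six 64-well matrices use all 384 wells, so any control wells would have to be accounted for within the matrices (for example, the blank-blank well) or handled on separate plates; the text does not specify which.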
Two tab-delimited text files are needed as input:
1. A source plate file provides information of the compound stocks
(compound identification, available concentration ranges,
source plate identification, and well identification).
2. A drug combination file containing the selected compound
pairs.
After loading the input files, FIMMCherry will show the layout
of the plates accordingly (Fig. 3). A pick list that is compatible with
the Labcyte Echo dispenser is then created by the program for
compound dispensing. The Labcyte Echo 550 acoustic dispenser
transfers liquid from source wells to destination wells in a
non-contact fashion in 2.5 nL droplets. The pick list generated
above is compatible with the Echo Cherry Pick software without
further modifications to produce the pre-drugged assay plates [10].
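Because the dispenser works in 2.5 nL droplets, each transfer volume in a pick list has to be a whole number of droplets. The following Python sketch shows that rounding step under stated assumptions: the 10 mM stock concentration and the helper name are hypothetical, the 25 μL well volume is the final assay volume used later in the protocol, and the small volume added by the transfer itself is ignored. It does not reproduce the FIMMCherry pick-list format.

```python
DROPLET_NL = 2.5  # droplet size of the acoustic dispenser, as stated in the text


def transfer_volume_nl(final_conc_nm, stock_conc_nm, well_volume_ul):
    """Stock volume (nL) giving final_conc_nm in well_volume_ul, rounded to
    the nearest whole number of droplets."""
    ideal_nl = final_conc_nm / stock_conc_nm * well_volume_ul * 1000.0
    droplets = round(ideal_nl / DROPLET_NL)
    return droplets * DROPLET_NL


STOCK_NM = 10_000_000.0  # hypothetical 10 mM stock solution
WELL_UL = 25.0           # final assay volume per well

vol_1000 = transfer_volume_nl(1000.0, STOCK_NM, WELL_UL)    # one droplet
vol_10000 = transfer_volume_nl(10000.0, STOCK_NM, WELL_UL)  # ten droplets
print(vol_1000, vol_10000)
```

With these assumed numbers, any final concentration below 1000 nM would need less than one droplet, which is presumably why the source plate file lists several available stock concentrations per compound.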
1. The compounds are dissolved in DMSO except for 19 drugs
(e.g., platinum drugs) with poor DMSO solubility or stability
that are instead dissolved in water. All 525 compounds are
transferred in five doses on eight 384-well plates.
2. The pre-dosed plates are stored in Storage Pods under nitrogen
gas at room temperature for up to 1 month.
3. For quality control, the compound library is checked regularly:
every 2 months the compounds are tested with four assay-ready
cell lines (DU4475, HDQ-P1, IGROV-1, and MOLM-13). Tracking the
reproducibility of the drug responses over time allows us to
detect changes in compound stability and activity.
Fig. 3 Drug combination plate design using FIMMCherry. The graphical user interface contains a virtual plate
enabling an interactive way of designing the plate. After loading the input files including the source, the
control, and drug pair information (the black inset boxes), the selected drug combinations and their dose
ranges will be listed in the “Drug Pair” tab, for which an echo file will be generated for acoustic dispensing.
Each plate can be visualized in a separate tab and will be named by its plate identifier (the red inset box). The
“Info” tab shows the liquid consumption in the source plates (the yellow inset box)
2. Shake the plate on the plate shaker at 450 rpm for 5 min to
dissolve the drugs properly.
3. Transfer a single-cell suspension in 20 μL of media to a 384-well
plate. Final dilution of CellTox Green reagent should be 1:2000
in 25 μL.
4. Incubate the cells in the plates for 72 h.
5. Shake the plates on the plate shaker at 500 rpm for 30 s. Read
fluorescence in the plates using a plate reader for CellTox Green
Cytotoxicity detection.
6. Transfer 25 μL of CellTiter-Glo reagent to the plate.
7. Shake the plates on the plate shaker at 450 rpm for 5 min and
spin the plates at 218 × g for 5 min.
8. Read luminescence in the plates using a plate reader to detect
cell viability.
Drug Combination Screening and Data Analysis 359
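> # Installer as given in the original protocol; newer Bioconductor releases install with BiocManager::install("synergyfinder")
> source("https://www.bioconductor.org/biocLite.R")
> biocLite("synergyfinder")
> library(synergyfinder)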
3.5 Synergy Scoring: Input Data
1. A single csv file that describes a drug combination dataset is
provided as input. The csv file is in a list format and must contain
the following columns:
• BlockID: the identifier for a drug combination. If multiple
drug combinations are present, e.g., in the standard 384-well
plate where six drug combinations are fitted, then the
identifiers for each of them must be unique.
• Row and Col: the row and column indexes for each well in
the plate.
• DrugCol: the name of the drug on the columns in a
dose-response matrix.
• DrugRow: the name of the drug on the rows in a
dose-response matrix.
• ConcCol and ConcRow: the concentrations of the column
drugs and row drugs in combination.
• ConcUnit: the unit of concentrations. It is typically nM or
μM.
• Response: the effect of drug combinations at the concentrations
specified by ConcCol and ConcRow. The effect must be
normalized to %inhibition of cell viability or proliferation
based on the positive and negative controls. For a
well-controlled experiment, the range of the response values is
expected from 0 to 100. However, missing values or extreme
values are allowed. For input data where the drug effect is
represented as %viability, the program will internally convert
it to %inhibition value by 100-%viability.
2. We provide example input data in the R package, which is
extracted from a recent drug combination screen for treatment
of diffuse large B-cell lymphoma (DLBCL) [7]. The example
input data contains two representative drug combinations
> data("mathews_screening_data")
> dose.response.mat <- ReshapeData(mathews_screening_data,
data.type = "viability")
> help("ReshapeData")
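The list-format input described in Subheading 3.5 can be sketched outside R as well. The Python example below is an illustration only, not part of SynergyFinder: the drug names, raw signals, and control values are made up, and the normalization formula (negative control = 0 % inhibition, positive control = 100 % inhibition) is a common convention assumed here rather than one stated in the text. Only the column names are taken from the protocol.

```python
import csv
import io

# Column names as listed in the protocol.
COLUMNS = ["BlockID", "Row", "Col", "DrugRow", "DrugCol",
           "ConcRow", "ConcCol", "ConcUnit", "Response"]


def percent_inhibition(raw, neg_ctrl, pos_ctrl):
    """Normalize a raw viability signal to % inhibition (assumed convention:
    neg_ctrl wells are untreated, pos_ctrl wells are fully inhibited)."""
    return 100.0 * (neg_ctrl - raw) / (neg_ctrl - pos_ctrl)


def build_rows(block_id, drug_row, drug_col, conc_row, conc_col, unit,
               raw, neg_ctrl, pos_ctrl):
    """One record per well of the dose-response matrix."""
    rows = []
    for i, c_row in enumerate(conc_row):
        for j, c_col in enumerate(conc_col):
            response = percent_inhibition(raw[i][j], neg_ctrl, pos_ctrl)
            rows.append({
                "BlockID": block_id, "Row": i + 1, "Col": j + 1,
                "DrugRow": drug_row, "DrugCol": drug_col,
                "ConcRow": c_row, "ConcCol": c_col,
                "ConcUnit": unit, "Response": round(response, 2),
            })
    return rows


# Hypothetical 2 x 2 block of raw luminescence readings.
raw_signal = [[10000.0, 8000.0],
              [9000.0, 2500.0]]
rows = build_rows(1, "drugA", "drugB", [0, 100], [0, 50], "nM",
                  raw_signal, neg_ctrl=10000.0, pos_ctrl=0.0)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

A file written this way already holds %inhibition values, so it would not need the internal 100-%viability conversion mentioned above.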
3.6 Synergy Scoring: Input Data Visualization
1. The input data can be visualized using the function
PlotDoseResponse by typing:
> PlotDoseResponse(dose.response.mat)
3. The pdf file will be saved in the current working directory with
a file name of the form “drug1.drug2.dose.response.blockID.pdf.”
3.7 Synergy Scoring: Drug Synergy Scoring (See Notes 9 and 10)
1. The current SynergyFinder package provides the synergy scores
of four major reference models, including HSA, Loewe, Bliss,
and ZIP. In a drug combination experiment where drug 1 at
dose x1 is combined with drug 2 at dose x2, the effect of such a
combination is yc as compared to the monotherapy effect y1(x1)
and y2(x2). To be able to quantify the degree of drug
interactions, one needs to determine the deviation of yc from the
[Fig. 4 graphic. Left panels: single-drug dose-response curves (x-axis: Concentration (nM); y-axis: Inhibition (%)). Right panels: dose-response heatmaps with ibrutinib (nM) on the columns and ispinesib (nM) (panel a) or canertinib (nM) (panel b) on the rows.]
Fig. 4 Plots for single-drug dose-response curves and drug combination dose-response matrices. (a) The
ibrutinib and ispinesib combination. (b) The ibrutinib and canertinib combination. Left panel: single drug dose-
response curves fitted with the commonly-used 4-parameter log-logistic (4PL) function. Right panel: the raw
dose-response matrix data is visualized as a heatmap
• Bliss: ye is the effect that would be achieved if the two drugs are
acting independently on the phenotype, i.e., ye = y1 + y2 − y1y2
(with the effects expressed as fractions).
• ZIP: ye is the effect that would be achieved if the two drugs
do not potentiate each other, i.e., both the assumptions of
the Loewe model and the Bliss model are met.
2. Once ye can be determined, the synergy score can be calculated
as the difference between the observed effect yc and the expected
effect ye. Depending on whether yc > ye or yc < ye the drug
combination can be classified as synergistic or antagonistic,
respectively. Furthermore, as the input data has been normalized
as %inhibition, the synergy score can be directly interpreted as
the proportion of cellular responses that can be attributed to the
drug interactions.
3. For a given dose-response matrix, one needs to first choose
which reference model to use and then apply the
CalculateSynergy function to calculate the corresponding synergy
score at each dose combination. For example, the ZIP-based synergy
score for the example data can be obtained by typing:
4. For assessing the synergy scores with the other reference models,
one needs to change the “method” parameter to “HSA,”
“Loewe,” or “Bliss.” The “correction” parameter specifies if a
baseline correction is applied on the raw dose-response data or
not. The baseline correction utilizes the average of the minimum
responses of the two single drugs as a baseline response to
correct the negative response values. The output
“synergy.score” contains a score matrix of the same size as the
dose-response matrix, which facilitates a dose-level evaluation of
drug synergy as well as a direct comparison of the synergy scores
between two reference models.
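The scoring logic of Subheading 3.7 (synergy score = observed effect yc minus expected effect ye) can be illustrated with a small Python sketch. This is not the SynergyFinder implementation: it uses the raw monotherapy responses directly rather than fitted dose-response curves, it covers only the Bliss formula given above (rescaled to %inhibition) and the standard highest-single-agent definition of HSA (ye = max(y1, y2)), and the response values are hypothetical.

```python
def bliss_expected(y1, y2):
    """Bliss independence with effects in % inhibition (0-100)."""
    return y1 + y2 - y1 * y2 / 100.0


def hsa_expected(y1, y2):
    """Highest single agent: the larger of the two monotherapy effects."""
    return max(y1, y2)


def synergy_matrix(response, expected_fn):
    """Score each combination well as observed minus expected effect.

    response[i][j] is the % inhibition at row-drug dose i and column-drug
    dose j; row 0 and column 0 hold the monotherapy responses (the other
    drug at dose 0). Monotherapy wells are left at 0."""
    n_rows, n_cols = len(response), len(response[0])
    scores = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(1, n_rows):
        for j in range(1, n_cols):
            y1 = response[i][0]   # row drug alone
            y2 = response[0][j]   # column drug alone
            scores[i][j] = response[i][j] - expected_fn(y1, y2)
    return scores


# Hypothetical 3 x 3 dose-response matrix (% inhibition).
response = [[0.0, 20.0, 40.0],
            [10.0, 50.0, 60.0],
            [30.0, 45.0, 80.0]]

bliss_scores = synergy_matrix(response, bliss_expected)
hsa_scores = synergy_matrix(response, hsa_expected)
print(bliss_scores)
print(hsa_scores)
```

Positive entries mark dose combinations where the observed inhibition exceeds the reference model's expectation; comparing the two matrices shows how the choice of reference model changes the score for the same data.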
3.8 Synergy Scoring: The Drug Interaction Landscape
1. The synergy scores are calculated across all the tested
concentration combinations, which can be visualized as either a
two-dimensional or a three-dimensional interaction surface
over the dose matrix. The landscape of such a drug interaction
scoring is very informative when identifying the specific dose
regions where a synergistic or antagonistic drug interaction
occurs. The height of the 3D drug interaction landscape is
normalized as the % inhibition effect to facilitate a direct
comparison of the degrees of interaction among multiple drug
combinations. In addition, a summarized synergy score is provided
by averaging over the whole dose-response matrix. To visualize
the drug interaction landscape, one can utilize the PlotSynergy
function as below (see Fig. 5):
[Fig. 5 graphic. Panel A: 2D and 3D interaction landscapes titled "ZIP synergy score: 18.042", with axes ibrutinib (nM) and ispinesib (nM). Panel B: 2D and 3D interaction landscapes titled "ZIP synergy score: −16.339", with axes ibrutinib (nM) and canertinib (nM). Color scale and surface height: −40 to 40, Inhibition (%).]
Fig. 5 The drug interaction landscapes based on the ZIP model. (a) The ibrutinib and ispinesib combination. (b)
The ibrutinib and canertinib combination
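The single number shown in each Fig. 5 panel title is the summarized synergy score, which Subheading 3.8 describes as an average over the dose-response matrix. A minimal Python sketch of that summary step follows. It is an illustration, not the package code: the score matrix is hypothetical, the option to leave out the monotherapy row and column is an assumption about how one might average, and the sign-based labels simply restate the yc > ye and yc < ye rule from Subheading 3.7.

```python
def summary_score(scores, skip_monotherapy=True):
    """Mean of a synergy score matrix. If skip_monotherapy is True, the
    first row and column (single-drug wells) are left out (an assumption)."""
    start = 1 if skip_monotherapy else 0
    values = [v for row in scores[start:] for v in row[start:]]
    return sum(values) / len(values)


def classify(score, tolerance=0.0):
    """Label a score by its sign; tolerance is a hypothetical dead band."""
    if score > tolerance:
        return "synergistic"
    if score < -tolerance:
        return "antagonistic"
    return "non-interactive"


# Hypothetical score matrix; row 0 and column 0 are monotherapy wells.
scores = [[0.0, 0.0, 0.0],
          [0.0, 22.0, 14.0],
          [0.0, 1.0, 22.0]]

combo_mean = summary_score(scores)                          # combination wells only
whole_mean = summary_score(scores, skip_monotherapy=False)  # every well
print(combo_mean, classify(combo_mean))
print(classify(18.042), classify(-16.339))  # the two scores shown in Fig. 5
```

Applied to the two scores reported in Fig. 5, the sign rule labels the ibrutinib and ispinesib pair as synergistic and the ibrutinib and canertinib pair as antagonistic.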
4 Notes
1. Examples of cell lines include four cell lines that are used for
quality check of the compound library: DU4475 (breast can-
cer), HDQ-P1 (breast cancer), IGROV-1 (ovarian cancer), and
MOLM-13 (acute monocytic leukemia).
2. Specific software tools are needed in the experimental design
stage and in the data analysis stage. For the 384-well plate
design, once the drugs and the concentration ranges are
selected, we use the in-house cherry-picking program,
FIMMCherry, to automatically generate the echo files needed for the
Labcyte Access system.
3. The FIMM oncology collection contains both FDA/EMA-
approved drugs and investigational compounds (see Fig. 1).
The collection is constantly evolving and the current FO4B
version contains 525 compounds with concentrations ranging
typically between 1 and 10,000 nM. For some compounds, the
concentration range is adjusted upward (e.g., platinum drugs,
100,000 nM) or downward (e.g., rapalogs, 100 nM) to better
match their relevant concentrations of bioactivity. The full list
of the FIMM oncology compounds can be found in
Supplementary Table 1.
4. When the drug combination dose-response matrix data is
ready, we use the SynergyFinder R package to score and
visualize the drug interactions. SynergyFinder is also
available as a web application, which does not require
installing the R environment.
5. The SynergyFinder package will be continuously updated to
include more rigorous analyses such as statistical significance,
effect size, and noise detection.
6. Availability: The source code for the FIMMCherry program is
available at github (https://github.com/hly89/FIMM-
Cherry). The SynergyFinder R package for drug combination
data analysis is available at CRAN and Bioconductor.
7. FIMMCherry is a desktop GUI application developed
using Python (https://www.python.org/) and the Qt application
development framework (https://www.qt.io/). The
integration of Python and Qt allows FIMMCherry to run on all the
major computer platforms including Windows, Linux, and
Mac OS X.
8. We have not seen problems in cell proliferation rate or other
major effects when using the reagent. The reagent is stable for at
least 72 h in cell culture, and cells that die at the beginning
of the 72 h incubation are still stained after 72 h.
Table 1
The FIMM oncology compound collection
High phase High
DRUG_ Mechanism Class approval Trade Supplier Conc.
NAME targets explained status Alias names Supplier Ref Solvent (nM)
SN-38 Active metabolite of A. Conv. (approved) BR-36613, 7 ChemieTek CT-SN38 DMSO 10000
irinotecan. Chemo -Ethyl-10-
Topoisomerase I hydroxy
inhibitor camptothecine
Idarubicin Topoisomerase II A. Conv. Approved Zavedos, Sigma-Aldrich I1656 DMSO 1000
inhibitor Chemo Idamycin
Auranofin Antirheumatic agent A. Conv. Approved Sigma-Aldrich A6733 DMSO 2500
Chemo
Plicamycin RNA synthesis inhibitor A. Conv. Approved Mithramycin A Santa Cruz sc-200909-5 DMSO 10000
Chemo Biotechnology
Bortezomib Proteasome inhibitor A. Conv. Approved MS-341 Velcade, National Cancer NSC 681239- DMSO 1000
(26S subunit) Chemo Cytomib Institute L/9
Clofarabine Antimetabolite; Purine A. Conv. Approved Evoltra, National Cancer NSC 606869- DMSO 10000
analog Chemo Clolar Institute X/4
Lomustine Alkylating nitrosourea A. Conv. Approved CCNU, CeeNU National Cancer NSC 79037- DMSO 10000
compound Chemo Institute R/12
Vincristine Mitotic inhibitor. Vinca A. Conv. Approved Selleck S1241 DMSO 1000
alkaloid microtubule Chemo
depolymerizer
Vinorelbine Mitotic inhibitor. Vinca A. Conv. Approved Selleck S4269 DMSO 10000
alkaloid microtubule Chemo
depolymerizer
Altretamine Formaldehyde release, A. Conv. Approved National Cancer NSC 13875- DMSO 10000
alkylating agent Chemo Institute O/97
Vinblastine Mitotic inhibitor. Vinca A. Conv. Chemo Approved National Cancer NSC 49842- DMSO 1000
alkaloid microtubule Institute J44
depolymerizer
Chlorambucil Nitrogen mustard A. Conv. Chemo Approved National Cancer NSC 3088- DMSO 10000
alkylating agent Institute N/6
Drug Combination Screening and Data Analysis
Dacarbazine Alkylating agent A. Conv. Chemo Approved National Cancer NSC 45388- DMSO 10000
Institute R/74
Cyclophosphamide Alkylating agent A. Conv. Chemo Approved Selleck S1217 DMSO 40000
365
(continued)
Table 1
366
(continued)
Cytarabine Antimetabolite, interferes A. Conv. Chemo Approved Ara-C National Cancer NSC 63878- DMSO 10000
with DNA synthesis Institute P/19
Liye He et al.
Fluorouracil Antimetabolite A. Conv. Chemo Approved 5-fluorouracil, 5-FU National Cancer NSC 19893- DMSO 10000
Institute G/4
Ifosfamide Nitrogen mustard A. Conv. Chemo Approved National Cancer NSC 109724- DMSO 10000
alkylating agent Institute X/4
Melphalan Nitrogen mustard A. Conv. Chemo Approved Sigma-Aldrich M2011 AQ 12500
alkylating agent
Mitoxantrone Topoisomerase II A. Conv. Chemo Approved National Cancer NSC 279836- DMSO 1000
inhibitor Institute C/2
Paclitaxel Mitotic inhibitor, taxane A. Conv. Chemo Approved Taxol National Cancer NSC 125973- DMSO 1000
microtubule stabilizer Institute L/68
Procarbazine Alkylating agent A. Conv. Chemo Approved National Cancer NSC 77213- DMSO 10000
Institute K/6
Topotecan Topoisomerase I A. Conv. Chemo Approved National Cancer NSC 609699- DMSO 10000
inhibitor. Institute Y/16
Camptothecin analog
Temozolomide Alkylating agent A. Conv. Chemo Approved National Cancer NSC 362856- DMSO 100000
Institute R/31
Mechlorethamine Nitrogen mustard A. Conv. Chemo Approved Nitrogen mustard Mustargen Sigma-Aldrich 122564 DMSO 100000
alkylating agent
Mitotane Antineoplastic agent A. Conv. Chemo Approved National Cancer NSC 38721- DMSO 10000
Institute U/3
Allopurinol Xanthine oxidase A. Conv. Chemo Approved Zyloprim National Cancer NSC 1390-R/ DMSO 10000
inhibitor Institute 3
Busulfan Alkylating antineoplastic A. Conv. Chemo Approved Sigma-Aldrich B2635 DMSO 100000
agent
Hydroxyurea Antineoplastic agent A. Conv. Chemo Approved Myelostat Sigma-Aldrich H8627 DMSO 100000
Mercaptopurine Antimetabolite A. Conv. Chemo Approved 6-mercaptopurine, National Cancer NSC 755-Z/ DMSO 10000
6-MP Institute 13
Thioguanine Antimetabolite; Purine A. Conv. Chemo Approved 6-thioguanine, National Cancer NSC 752-W/ DMSO 10000
analog 6-TG Institute 47
Carmustine Alkylating agent A. Conv. Approved BCNU National Cancer NSC 409962- DMSO 10000
Chemo Institute T/3
Thio-TEPA Alkylating agent A. Conv. Approved Sigma-Aldrich T6069 DMSO 50000
Chemo
Pipobroman Alkylating agent A. Conv. Chemo Approved National Cancer NSC 25154- DMSO 10000
Institute X/2
Raltitrexed DHFR/GARFT/ A. Conv. Chemo Approved ICI-D 1694 Tomudex Medchemexpress HY-10821 DMSO 1000
thymidylate synthase
inhibitor
Irinotecan Topoisomerase I A. Conv. Chemo Approved Camptosar LC Laboratories I-4122 DMSO 10000
inhibitor.
Camptothecin
prodrug analog
Nelarabine Nucleoside analog, DNA, A. Conv. Chemo Approved Arranon, SequoiaResearch SRP003328n DMSO 10000
RNA synth inhibitor Atriance Products
Docetaxel Mitotic inhibitor, taxane A. Conv. Chemo Approved Taxotere, LC Laboratories D-1000 DMSO 1000
microtubule stabilizer Docecad
Pentostatin Antimetabolite; Purine A. Conv. Chemo Approved Deoxycoformycin National Cancer NSC 218321- DMSO 10000
analog Institute O/48
Estramustine Alkylating agent A. Conv. Chemo Approved Sigma-Aldrich E0407 AQ 10000
Floxuridine Antimetabolite; Analog of A. Conv. Chemo Approved 5-fluorodeoxyuridine National Cancer NSC 27640- DMSO 10000
5-fluorouracil Institute Z/31
Gemcitabine Antimetabolite; A. Conv. Chemo Approved Gemsar, National Cancer NSC 613327- DMSO 1000
Nucleoside analog Gemzar Institute S/2
Teniposide Topoisomerase II A. Conv. Chemo Approved National Cancer NSC 122819- DMSO 10000
inhibitor Institute I/52
Dactinomycin RNA and DNA synthesis A. Conv. Chemo Approved Actinomycin D National Cancer NSC 3053-Y/ DMSO 1000
inhibitor Institute 14
Streptozocin Alkylating glucosamine- A. Conv. Chemo Approved National Cancer NSC 37917- DMSO 10000
nitrosourea agent Institute V/5
Cladribine Antimetabolite; Purine A. Conv. Chemo Approved Leustatin National Cancer NSC 105014- DMSO 1000
analog Institute F/2
Drug Combination Screening and Data Analysis
Mitomycin C Antineoplastic A. Conv. Chemo Approved National Cancer NSC 26980- DMSO 10000
anatibiotic; DNA Institute J/65
crosslinker
367
(continued)
Table 1
368
(continued)
Gefitinib EGFR inhibitor B. Kinase inhibitor Approved Iressa LC Laboratories G-4408 DMSO 10000
Imatinib Abl, Kit, PDGFRB B. Kinase inhibitor Approved Gleevec, LC Laboratories C-5508 DMSO 10000
inhibitor Glivec
369
(continued)
Table 1
370
(continued)
Erlotinib EGFR inhibitor B. Kinase inhibitor Approved OSI-774 Tarceva National Cancer NSC 718781- DMSO 10000
Institute R/4
Liye He et al.
Lapatinib HER2, EGFR inhibitor B. Kinase inhibitor Approved GW2016 Tykerb, LC Laboratories L-4804 DMSO 1000
Tyverb
Palbociclib CDK inhibitor (Cdk4/6) B. Kinase inhibitor Approved Ibrance Selleck S1116 DMSO 10000
Afatinib EGFR inhibitor B. Kinase inhibitor Approved Gilotrif, Selleck S1011 DMSO 1000
Giotrif
Crizotinib ALK, c-Met inhibitor B. Kinase inhibitor Approved Xalkori Selleck S1068 DMSO 1000
Ponatinib Broad TK inhibitor B. Kinase inhibitor Approved Iclusig Selleck S1490 DMSO 1000
Trametinib MEK1/2 inhibitor B. Kinase inhibitor Approved JTP-74057 Mekinist ChemieTek CT-GSK112 DMSO 250
Ruxolitinib JAK1&2 inhibitor B. Kinase inhibitor Approved Jakafi, Jakavi ChemieTek CT-INCB DMSO 10000
Nilotinib Abl inhibitor B. Kinase inhibitor Approved Tasigna LC Laboratories N-8207 DMSO 10000
Vemurafenib B-Raf(V600E) inhibitor B. Kinase inhibitor Approved RG7204, RO5185426 Zelboraf ChemieTek CT-P4032 DMSO 10000
Vandetanib VEGFR,EGFR, RET B. Kinase inhibitor Approved Caprelsa LC Laboratories V-9402 DMSO 1000
inhibitor
Dasatinib Abl, Src, Kit, EphR... B. Kinase inhibitor Approved Sprycel LC Laboratories D-3307 DMSO 1000
Inhibitor
Tofacitinib JAK3, JAK2(V617F) B. Kinase inhibitor Approved tasocitinib Xeljanz, LC Laboratories T-1377 DMSO 5000
inhibitor Jakvinus
Axitinib VEGFR, PDGFR, KIT B. Kinase inhibitor Approved Inlyta LC Laboratories A-1107 DMSO 10000
inhibitor
Bosutinib Abl, Src inhibitor B. Kinase inhibitor Approved Bosulif LC Laboratories B-1788 DMSO 10000
Pazopanib VEGFR inhibitor B. Kinase inhibitor Approved Votrient LC Laboratories P-6706 DMSO 10000
Sorafenib B-Raf, FGFR-1, VEGFR- B. Kinase inhibitor Approved Nexavar LC Laboratories S-8502 DMSO 1000
2 & -3, PDGFR-beta,
KIT, and FLT3 inhib
Sunitinib Broad TK inhibitor B. Kinase inhibitor Approved Sutent LC Laboratories S-8803 DMSO 1000
Regorafenib B-Raf, c-Kit, VEGFR2 B. Kinase inhibitor Approved Stivarga Selleck S1178 DMSO 10000
inhibitor
Cabozantinib VEGFR2, Met, FLT3, B. Kinase inhibitor Approved XL184 Cometriq ChemieTek CT-XL184 DMSO 1000
Tie2, Kit and Ret
inhibitor
Ibrutinib Btk inhibitor B. Kinase inhibitor Approved CRA-032765 Imbruvica Selleck S2680 DMSO 1000
Dabrafenib B-Raf(V600E) inhibitor B. Kinase inhibitor Approved Tafinlar ChemieTek CT-DABR DMSO 2500
Ceritinib ALK inhibitor B. Kinase inhibitor Approved LDK378 Zykadia Selleck S7083 DMSO 2500
Fasudil ROCK, PKA, PKG, PRK B. Kinase inhibitor Approved HA-1077 LC Laboratories H-2330 DMSO 50000
inhibitor, prodrug (Japan)
Alectinib ALK (incl gatekeeper B. Kinase inhibitor Approved Alecensa ChemieTek CT-CH542 DMSO 1000
mut) inhib (Japan)
Idelalisib PI3K inhibitor, B. Kinase inhibitor Approved (US) CAL-101 Zydelig ChemieTek CT-CAL101 DMSO 10000
p110δ-selective
Nintedanib VEGFR, PDGFR, FGFR B. Kinase inhibitor Approved (US) Intedanib Vargatef, Selleck S1010 DMSO 10000
inhibitor Ofev
Lenvatinib VEGFR inhibitor B. Kinase inhibitor Approved (US) Lenvima Selleck S1164 DMSO 2500
CUDC-101 HDAC & EGFR, Her2 B. Kinase inhibitor Investigational Selleck S1194 DMSO 10000
inhibitor (Ph 1)
PF-00477736 Chk1 inhibitor B. Kinase inhibitor Investigational Axon Medchem Axon 1379-2 DMSO 10000
(Ph 1)
AZD7762 Chk1 inhibitor B. Kinase inhibitor Investigational Axon Medchem Axon 1399 DMSO 1000
(Ph 1)
AZD8055 mTOR inhibitor B. Kinase inhibitor Investigational ChemieTek CT-A8055-3 DMSO 10000
(Ph 1)
Doramapimod p38MAPK inhibitor B. Kinase inhibitor Investigational Axon Medchem Axon 1358 DMSO 10000
(Ph 1)
Bryostatin 1 PKC activator B. Kinase inhibitor Investigational Santa Cruz sc-201407-4 DMSO 100
(Ph 1) Biotechnology
EMD1214063 c-Met inhibitor B. Kinase inhibitor Investigational ChemieTek CT-EMD063 DMSO 1000
(Ph 1)
AZD1480 JAK1/2, FGFR inhibitor B. Kinase inhibitor Investigational ChemieTek CT-A1480 DMSO 1000
(Ph 1)
Tamatinib Syk inhibitor B. Kinase inhibitor Investigational Selleck S2194-2 DMSO 10000
(Ph 1)
TAK-733 MEK1/2 inhibitor B. Kinase inhibitor Investigational Selleck S2617 DMSO 1000
(Ph 1)
Omipalisib PI3K/mTOR inhibitor B. Kinase inhibitor Investigational Selleck S2658 DMSO 1000
(Ph 1)
TAK-901 Aurora B inhibitor B. Kinase inhibitor Investigational Selleck S2718 DMSO 1000
(Ph 1)
NVP-BGJ398 FGFR inhibitor B. Kinase inhibitor Investigational BGJ398 ChemieTek CT-BGJ398 DMSO 1000
(Ph 1)
INK128 mTOR inhibitor B. Kinase inhibitor Investigational INK128 ChemieTek CT-INK128 DMSO 1000
(Ph 1)
ZSTK474 PI3K gamma selective B. Kinase inhibitor Investigational LC Laboratories Z-1066 DMSO 10000
inhibitor (Ph 1)
AZD2014 mTOR inhibitor, B. Kinase inhibitor Investigational Selleck S2783 DMSO 10000
ATP-competitive (Ph 1)
GSK2636771 PI3K beta selective B. Kinase inhibitor Investigational ChemieTek CT-GSK263 DMSO 10000
inhibitor (Ph 1)
Rebastinib Allosteric ABL, FLT3, B. Kinase inhibitor Investigational ChemieTek CT-DCC20 DMSO 1000
TIE2, TRKA inhibitor (Ph 1)
BMS-911543 JAK2 inhibitor B. Kinase inhibitor Investigational ChemieTek CT-BMS911 DMSO 10000
(Ph 1)
LY-294002 PI3K inhibitor B. Kinase inhibitor Investigational LC Laboratories L-7962 DMSO 100000
(Ph 1)
ASP3026 ALK inhibitor B. Kinase inhibitor Investigational ChemieTek CT-ASP302 DMSO 10000
(Ph 1)
PF-03758309 PAK inhibitor B. Kinase inhibitor Investigational ChemieTek CT-PF0375 DMSO 10000
(Ph 1)
AZD-8330 MEK1/2 inhibitor B. Kinase inhibitor Investigational ARRY-424704 ChemieTek CT-A8330 DMSO 10000
(Ph 1)
BMS-599626 Pan-HER inhibitor B. Kinase inhibitor Investigational AC-480 ChemieTek CT-BMS59 DMSO 10000
(Ph 1)
LY-2874455 FGFR inhibitor B. Kinase inhibitor Investigational Axon Medchem Axon 1981 DMSO 1000
(Ph 1)
SGI-1776 PIM kinase inhibitor B. Kinase inhibitor Investigational Selleck S2198 DMSO 10000
(Ph 1)
AT7519 CDK1, 2, 4, 6 and B. Kinase inhibitor Investigational Selleck S1524 DMSO 10000
9 inhibitor (Ph 1)
TAK-960 PLK1 inhibitor B. Kinase inhibitor Investigational Santa Cruz sc-364631 DMSO 2500
(Ph 1) Biotechnology
Lucitanib FGFR1, VEGFR B. Kinase inhibitor Investigational CO-3810, S 80881 Axon Medchem Axon 1942 DMSO 10000
inhibitor (Ph 1)
AMG-208 MET inhibitor B. Kinase inhibitor Investigational Selleck S1316 DMSO 2500
(Ph 1)
AMG-900 pan-Aurora inhibitor B. Kinase inhibitor Investigational Selleck S2719 DMSO 1000
(Ph 1)
ARRY-380 HER2 inhibitor B. Kinase inhibitor Investigational Selleck S2752 DMSO 2500
(Ph 1)
GSK-1070916 AURb, AURc inhibitor B. Kinase inhibitor Investigational Selleck S2740 DMSO 1000
(Ph 1)
GSK-461364 PLK1 inhibitor B. Kinase inhibitor Investigational Selleck S2193 DMSO 10000
(Ph 1)
NVP-INC280 MET inhibitor B. Kinase inhibitor Investigational INC280, INCB-28060 Selleck S2788 DMSO 1000
(Ph 1)
OSI-930 KIT, VEGFR inhibitor B. Kinase inhibitor Investigational Selleck S1220 DMSO 2500
(Ph 1)
Palomid-529 AKT, MTOR, PI3K B. Kinase inhibitor Investigational P529 Selleck S2238 DMSO 10000
inhibitor (Ph 1)
PF-00562271 FAK inhibitor B. Kinase inhibitor Investigational Selleck S2672 DMSO 10000
(Ph 1)
PF-03814735 AURa, AURb inhibitor B. Kinase inhibitor Investigational Selleck S2725 DMSO 10000
(Ph 1)
Gedatolisib PI3K/mTOR inhibitor B. Kinase inhibitor Investigational PKI-587 Selleck S2628 DMSO 1000
(Ph 1)
SNS-314 AURa, AURb inhibitor B. Kinase inhibitor Investigational Selleck S1154 DMSO 1000
(Ph 1)
TAK-285 HER2 inhibitor B. Kinase inhibitor Investigational Selleck S2784 DMSO 2500
(Ph 1)
MLN-8054 AURa AURb FLT3 KIT B. Kinase inhibitor Investigational Selleck S1100 DMSO 10000
(PDGFR) (Ph 1)
KW-2449 AURa AURb FLT3 B. Kinase inhibitor Investigational Selleck S2158 DMSO 2500
inhibitor (Ph 1)
KRN-633 VEGFR inhibitor B. Kinase inhibitor Investigational Selleck S1557 DMSO 2500
(Ph 1)
PHA-793887 CDK inhibitor B. Kinase inhibitor Investigational Selleck S1487 DMSO 10000
(Ph 1)
AZD-6482 PI3Kbeta-selective B. Kinase inhibitor Investigational Selleck S1462 DMSO 2500
inhibitor (Ph 1)
GSK-1059615 PI3K/mTOR inhibitor B. Kinase inhibitor Investigational GSK-615 Selleck S1360 DMSO 10000
(Ph 1)
CYC-116 Aurora and VEGFR2 B. Kinase inhibitor Investigational Selleck S1171 DMSO 10000
inhibitor (Ph 1)
CP-724714 EGFR ERBB2 inhibitor B. Kinase inhibitor Investigational Selleck S1167 DMSO 10000
(Ph 1)
SGX-523 MET inhibitor B. Kinase inhibitor Investigational Selleck S1112 DMSO 5000
(Ph 1)
JNJ-38877605 MET inhibitor B. Kinase inhibitor Investigational Selleck S1114 DMSO 10000
(Ph 1)
GSK-690693 AKT, PKA, PKC inhibitor B. Kinase inhibitor Investigational Selleck S1113 DMSO 10000
(Ph 1)
OSU-03012 PDPK1 inhibitor B. Kinase inhibitor Investigational AR-12 Selleck S1106 DMSO 25000
(Ph 1)
NVP-AEW541 IGF1R inhibitor B. Kinase inhibitor Investigational AEW541 Selleck S1034 DMSO 10000
(Ph 1)
PF-04217903 MET inhibitor B. Kinase inhibitor Investigational Selleck S1094 DMSO 2500
(Ph 1)
AZD-1080 GSK3 inhibitor B. Kinase inhibitor Investigational AZ-11548415 Selleck S7145 DMSO 10000
(Ph 1)
RG-7603 PI3K inhibitor, pan-class B. Kinase inhibitor Investigational GDC-0349 Selleck S8040 DMSO 2500
I (Ph 1)
MK-8776 CHEK1 inhibitor B. Kinase inhibitor Investigational SCH-900776 Selleck S2735 DMSO 2500
(Ph 1)
CH-5132799 PI3K inhibitor, pan-class B. Kinase inhibitor Investigational PA-799 Selleck S2699 DMSO 10000
I (Ph 1)
AZD-5438 CDK1,2,9 inhibitor B. Kinase inhibitor Investigational Selleck S2621 DMSO 10000
(Ph 1)
Silmitasertib CSNK2A1 inhibitor B. Kinase inhibitor Investigational Selleck S2248 DMSO 10000
(Ph 1)
Mubritinib ERBB2 inhibitor B. Kinase inhibitor Investigational Selleck S2216 DMSO 1000
(Ph 1)
AZD-8186 PI3Kbeta inhibitor B. Kinase inhibitor Investigational Active Biochem A-1610 DMSO 1000
(Ph 1)
XL019 JAK2 inhibitor B. Kinase inhibitor Investigational Selleck S7036 DMSO 10000
(Ph 1)
Bentamapimod JNK inhibitor B. Kinase inhibitor Investigational Medchemexpress HY-14761 DMSO 10000
(Ph 1)
AZD1208 PIM1, 2, 3 kinase B. Kinase inhibitor Investigational Medchemexpress HY-15604 DMSO 10000
inhibitor (Ph 1)
BGB324 Axl inhibitor B. Kinase inhibitor Investigational R 428 Axon Medchem Axon 1946 DMSO 10000
(Ph 1)
CEP-37440 ALK inhibitor B. Kinase inhibitor Investigational ChemieTek CT-CEP374 DMSO 5000
(Ph 1)
AT13148 p70S6K, PKA, ROCK B. Kinase inhibitor Investigational Medchemexpress HY-16071 DMSO 10000
(AKT) inhibitor (Ph 1)
Cerdulatinib JAK, SYK inhibitor B. Kinase inhibitor Investigational Medchemexpress HY-15999 DMSO 10000
(Ph 1)
TEW-7197 TGF-β receptor ALK4/ B. Kinase inhibitor Investigational EW-7197 Selleck S7530 DMSO 2500
ALK5 inhibitor (Ph 1)
GDC-0994 ERK inhibitor B. Kinase inhibitor Investigational Medchemexpress HY-15947-2 DMSO 10000
(Ph 1)
Merestinib Met inhibitor B. Kinase inhibitor Investigational Medchemexpress HY-15514A DMSO 1000
(Ph 1)
VS-4718 FAK inhibitor B. Kinase inhibitor Investigational PND-1186 ChemieTek CT-VS4718 DMSO 10000
(Ph 1)
LY3009120 pan-RAF inhibitor B. Kinase inhibitor Investigational DP-4978 Medchemexpress HY-12558 DMSO 10000
(Ph 1)
BI 2536 PLK1 inhibitor B. Kinase inhibitor Investigational Selleck S1109 DMSO 1000
(Ph 2)
AT9283 Aurora A & B, Jak2, Flt, B. Kinase inhibitor Investigational Selleck S1134 DMSO 1000
Abl inhibitor (Ph 2)
Danusertib Aurora, Ret, TrkA, B. Kinase inhibitor Investigational Selleck S1107 DMSO 10000
FGFR-1 inhibitor (Ph 2)
Foretinib MET, VEGFR2 inhibitor B. Kinase inhibitor Investigational XL880, EXEL-2880 Selleck S1111 DMSO 1000
(Ph 2)
SNS-032 CDK inhibitor B. Kinase inhibitor Investigational BMS-387032 Selleck S1145 DMSO 10000
(Ph 2)
Alvocidib CDK inhibitor B. Kinase inhibitor Investigational Flavopiridol, Selleck S1230 DMSO 10000
(Ph 2) HMR-1275
Pimasertib MEK1/2 inhibitor B. Kinase inhibitor Investigational MSC1936369B Selleck S1475 DMSO 10000
(Ph 2)
Motesanib VEGFR, PDGFR, Ret, B. Kinase inhibitor Investigational Selleck S1032 DMSO 10000
Kit inhibitor (Ph 2)
PF-04691502 PI3K/mTOR inhibitor B. Kinase inhibitor Investigational ChemieTek CT-PF1502 DMSO 10000
(Ph 2)
MK1775 Wee1 inhibitor B. Kinase inhibitor Investigational Axon Medchem Axon 1494 DMSO 10000
(Ph 2)
BMS-754807 IGF1R inhibitor B. Kinase inhibitor Investigational ChemieTek CT-BMS75 DMSO 10000
(Ph 2)
OSI-027 mTOR inhibitor B. Kinase inhibitor Investigational ChemieTek CT-O027 DMSO 10000
(Ph 2)
Refametinib MEK1/2 inhibitor B. Kinase inhibitor Investigational RDEA119 ChemieTek CT-R119 DMSO 10000
(Ph 2)
MK-2206 AKT inhibitor B. Kinase inhibitor Investigational ChemieTek CT-MK2206 DMSO 1000
(Ph 2)
Linsitinib IGF1R, IR inhibitor B. Kinase inhibitor Investigational ASP7487 ChemieTek CT-O906 DMSO 10000
(Ph 2)
Tandutinib FLT3, PDGFR, KIT B. Kinase inhibitor Investigational CT53518 LC Laboratories T-7802 DMSO 1000
inhibitor (Ph 2)
Pictilisib PI3K inhibitor, pan-class B. Kinase inhibitor Investigational RG-7321 LC Laboratories G-9252 DMSO 10000
I (Ph 2)
Seliciclib CDK2/7/9 inhibitor B. Kinase inhibitor Investigational Roscovitine LC Laboratories R-1234 DMSO 10000
(Ph 2)
Dactolisib mTOR/(PI3K) inhibitor B. Kinase inhibitor Investigational LC Laboratories N-4288 DMSO 1000
(Ph 2)
Quizartinib FLT3 inhibitor B. Kinase inhibitor Investigational ChemieTek CT-AC220 DMSO 1000
(Ph 2)
Gandotinib JAK2 inhibitor B. Kinase inhibitor Investigational Selleck S2179 DMSO 10000
(Ph 2)
Sotrastaurin PKC inhibitor B. Kinase inhibitor Investigational AEB071 Axon Medchem Axon 1635-2 DMSO 10000
(Ph 2)
UCN-01 PKCbeta, PDK1, Chk, B. Kinase inhibitor Investigational 7-Hydroxy Sigma-Aldrich U6508-4 DMSO 10000
Cdk2 inhibitor (Ph 2) staurosporine
Tivantinib MET inhibitor B. Kinase inhibitor Investigational ChemieTek CT-ARQ197 DMSO 1000
(Ph 2)
RAF265 C-Raf inhibitor B. Kinase inhibitor Investigational CHIR-265 Selleck S2161 DMSO 1000
(Ph 2)
Rabusertib Chk1 inhibitor B. Kinase inhibitor Investigational IC-83 Selleck S2626 DMSO 1000
(Ph 2)
Galunisertib TGF-B/Smad inhibitor B. Kinase inhibitor Investigational Selleck S2230 DMSO 1000
(Ph 2)
Buparlisib PI3K inhibitor, pan-class B. Kinase inhibitor Investigational BKM-120 Selleck S2247 DMSO 10000
I (Ph 2)
Apitolisib PI3K/mTOR inhibitor B. Kinase inhibitor Investigational ChemieTek CT-G0980 DMSO 10000
(Ph 2)
AZD4547 FGFR inhibitor B. Kinase inhibitor Investigational ChemieTek CT-A4547 DMSO 1000
(Ph 2)
Sonolisib PI3K inhibitor, pan-class B. Kinase inhibitor Investigational DJM-166 Active Biochem PX-866 DMSO 10000
I. Irreversible (Ph 2)
Binimetinib MEK1/2 inhibitor B. Kinase inhibitor Investigational MEK162, ARRY- ChemieTek CT-A162 DMSO 1000
(Ph 2) 438162, ARRY-162
KX2-391 non-ATP competitive Src B. Kinase inhibitor Investigational Selleck S2700 DMSO 10000
inhibitor (Ph 2)
Fostamatinib Syk inhibitor B. Kinase inhibitor Investigational Selleck S2206-3 AQ 2500
(Ph 2)
Momelotinib JAK1 & 2 inhibitor B. Kinase inhibitor Investigational CYT1138 ChemieTek CT-CYT387 DMSO 10000
(Ph 2)
Ralimetinib p38MAPK inhibitor B. Kinase inhibitor Investigational CP868569 Selleck S1494 DMSO 10000
(Ph 2)
Crenolanib PDGFRA and PDGFRB B. Kinase inhibitor Investigational Selleck S2730 DMSO 10000
inhibitor (Ph 2)
GDC-0068 AKT inhibitor B. Kinase inhibitor Investigational RG7440 ChemieTek CT-G0068 DMSO 10000
(Ph 2)
Alpelisib PI3Kalpha inhibitor B. Kinase inhibitor Investigational BYL719 Selleck S2814 DMSO 2500
(Ph 2)
Baricitinib JAK inhibitor B. Kinase inhibitor Investigational INCB28050 Selleck S2851 DMSO 2500
(Ph 2)
AZD-5363 AKT inhibitor B. Kinase inhibitor Investigational ChemieTek CT-A5363 DMSO 10000
(Ph 2)
SAR302503 JAK2-selective inhibitor B. Kinase inhibitor Investigational TG-101348 ChemieTek CT-TG101 DMSO 10000
(Ph 2)
Bafetinib Abl, Lyn inhibitor B. Kinase inhibitor Investigational NS-187 Selleck S1369 DMSO 1000
(Ph 2)
Tideglusib GSK3 inhibitor B. Kinase inhibitor Investigational Selleck S2823 DMSO 3000
(Ph 2)
Rigosertib Ras-Raf interaction B. Kinase inhibitor Investigational Estybon, Selleck S1362 DMSO 10000
inhibitor (Ph 2) Novonex
Milciclib CDK2 inhibitor B. Kinase inhibitor Investigational Selleck S2751 DMSO 10000
(Ph 2)
Duvelisib PI3K inhibitor B. Kinase inhibitor Investigational INK1197 Selleck S7028 DMSO 500
(Ph 2)
Icotinib EGFR inhibitor B. Kinase inhibitor Investigational Selleck S2922 DMSO 10000
(Ph 2)
Amuvatinib Broad spectrum TK inhib B. Kinase inhibitor Investigational Selleck S1244 DMSO 10000
(Ph 2)
Pelitinib EGFR inhibitor B. Kinase inhibitor Investigational WAY-EKB 569 Selleck S1392 DMSO 2500
(Ph 2)
Telatinib VEGFR, KIT, PDGFR B. Kinase inhibitor Investigational Selleck S2231 DMSO 10000
inhibitor (Ph 2)
Triciribine AKT inhibitor B. Kinase inhibitor Investigational Pentaazacentopthylene, Selleck S1117-2 DMSO 100000
(Ph 2) Tricyclic nucleoside
Tozasertib pan-Aurora inhibitor B. Kinase inhibitor Investigational MK-0457 Selleck S1048 DMSO 10000
(Ph 2)
Varlitinib EGFR HER2 inhibitor B. Kinase inhibitor Investigational Selleck S2755 DMSO 10000
(Ph 2)
Golvatinib MET, VEGFR2 inhibitor B. Kinase inhibitor Investigational Selleck S2859 DMSO 2500
(Ph 2)
Copanlisib PI3K alpha, beta selective B. Kinase inhibitor Investigational Selleck S2802-2 DMSO 1000
inhibitor (Ph 2) w/ 10 mM TFA
Sapitinib Pan-HER inhibitor B. Kinase inhibitor Investigational Selleck S2192 DMSO 1000
(Ph 2)
NVP-AEE788 EGFR, VEGFR, ABL, B. Kinase inhibitor Investigational AEE788, GNF-PF- Selleck S1486 DMSO 2500
SRC inhibitor (Ph 2) 5343
NVP-BGT226 PI3K/mTOR inhibitor B. Kinase inhibitor Investigational BGT226 Selleck S2749 DMSO 1000
(Ph 2)
BMS-777607 Met, Axl, Ron and Tyro3 B. Kinase inhibitor Investigational Selleck S1561 DMSO 2500
inhibitor (Ph 2)
Abemaciclib CDK4 and 6 inhibitor B. Kinase inhibitor Investigational Selleck S7158 DMSO 2500
(Ph 2)
VX 745 p38MAPK inhibitor B. Kinase inhibitor Investigational Tocris 3915 DMSO 10000
(Ph 2) Biosciences
XL-647 EGFR, ERBB2, VEGFR, B. Kinase inhibitor Investigational Santa Cruz sc-364659 DMSO 1000
EPHB4 (Ph 2) Biotechnology
PD184352 MEK1/2 inhibitor B. Kinase inhibitor Investigational Selleck S1020 DMSO 10000
(Ph 2)
ENMD-2076 pan-Aurora inhibitor B. Kinase inhibitor Investigational Selleck S1181 DMSO 10000
(Ph 2)
MK-2461 MET inhibitor B. Kinase inhibitor Investigational Selleck S2774 DMSO 10000
(Ph 2)
PD0325901 MEK1/2 inhibitor B. Kinase inhibitor Investigational Selleck S1036 DMSO 1000
(Ph 2)
PH-797804 p38MAPK inhibitor B. Kinase inhibitor Investigational Selleck S2726 DMSO 1000
(Ph 2)
TAK-715 p38MAPK inhibitor B. Kinase inhibitor Investigational Selleck S2928 DMSO 10000
(Ph 2)
TG100-115 PI3K gamma/delta B. Kinase inhibitor Investigational Selleck S1352 DMSO 10000
inhibitor (Ph 2)
CEP-32496 BRAF inhibitor B. Kinase inhibitor Investigational Selleck S8015 DMSO 10000
(Ph 2)
GDC-0623 MEK1/2 inhibitor B. Kinase inhibitor Investigational Active Biochem A-1181 DMSO 2500
(Ph 2)
Talmapimod p38MAPK alpha selective B. Kinase inhibitor Investigational Axon Medchem Axon 1671 DMSO 10000
inhibitor (Ph 2)
Encorafenib B-RAF(V600E) B. Kinase inhibitor Investigational LGX818 Selleck S7108 DMSO 1000
(Ph 2)
Tanzisertib JNK1, 2, 3 inhibitor B. Kinase inhibitor Investigational Medchemexpress HY-15495 DMSO 10000
(Ph 2)
BMS863233 Cdc7 inhibitor B. Kinase inhibitor Investigational Selleck S7547 AQ 10000
(Ph 2)
Entospletinib SYK inhibitor B. Kinase inhibitor Investigational Selleck S7523 DMSO 5000
(Ph 2)
Voxtalisib mTOR/PI3K inhibitor B. Kinase inhibitor Investigational XL765 ChemieTek CT-XL765c DMSO 10000
(Ph 2)
Pilaralisib PI3K inhibitor, pan-class B. Kinase inhibitor Investigational XL147 Medchemexpress HY-16526 DMSO 2500
I (Ph 2)
Uprosertib AKT inhibitor B. Kinase inhibitor Investigational Medchemexpress HY-15965 DMSO 10000
(Ph 2)
Filgotinib JAK1-selective inhibitor B. Kinase inhibitor Investigational Selleck S7605 DMSO 10000
(Ph 2)
Afuresertib AKT1-selective inhibitor B. Kinase inhibitor Investigational Medchemexpress HY-15966A DMSO 1000
(Ph 2)
PF-06463922 ALK, ROS1 inhibitor B. Kinase inhibitor Investigational Selleck S7536 DMSO 1000
(Ph 2)
SLx-2119 ROCK2 inhibitor B. Kinase inhibitor Investigational KD025 Medchemexpress HY-15307 DMSO 5000
(Ph 2)
Poziotinib pan-HER inhibitor B. Kinase inhibitor Investigational NOV120101 Medchemexpress HY-15730 DMSO 1000
(Ph 2)
Spebrutinib BTK inhibitor B. Kinase inhibitor Investigational AVL-292 Medchemexpress HY-18012 DMSO 1000
(Ph 2)
Ulixertinib ERK inhibitor B. Kinase inhibitor Investigational VRT752271 ChemieTek CT-VRT752 DMSO 10000
(Ph 2)
Prexasertib Chk1 inhibitor B. Kinase inhibitor Investigational Medchemexpress HY-18174A DMSO 10000
(Ph 2)
Vatalanib VEGFR-1 & -2 inhibitor B. Kinase inhibitor Investigational PTK 787, ZK222584 LC Laboratories V-8303 DMSO 10000
(Ph 3)
Orantinib KDR, FGFR, PDGFR B. Kinase inhibitor Investigational SU6668 Selleck S1470 DMSO 10000
inhibitor (Ph 3)
Selumetinib MEK1/2 inhibitor B. Kinase inhibitor Investigational ARRY-142886 Selleck S1008 DMSO 10000
(Ph 3)
Dovitinib FGFR inhibitor B. Kinase inhibitor Investigational TKI258 Selleck S1018 DMSO 10000
(Ph 3)
Perifosine AKT/PI3K inhibitor B. Kinase inhibitor Investigational Selleck S1037 AQ 2500
(Ph 3)
Cediranib KDR/Flt/VEGFR B. Kinase inhibitor Investigational Recentin Selleck S1017 DMSO 1000
inhibitor (Ph 3)
Tivozanib VEGFR1, 2, 3, c-Kit, B. Kinase inhibitor Investigational KRN951 ChemieTek CT-AV951 DMSO 10000
Alisertib Aurora A inhibitor B. Kinase inhibitor Investigational ChemieTek CT-M8237 DMSO 10000
(Ph 3)
Lestaurtinib FLT3, JAK2, TrkA, TrkB, B. Kinase inhibitor Investigational LC Laboratories L-6307 DMSO 1000
TrkC inhibitor (Ph 3)
Saracatinib Src, Abl inhibitor B. Kinase inhibitor Investigational LC Laboratories S-8906 DMSO 10000
(Ph 3)
Canertinib pan-HER inhibitor B. Kinase inhibitor Investigational PD 183805 LC Laboratories C-1201 DMSO 10000
(Ph 3)
Enzastaurin PKCbeta inhibitor B. Kinase inhibitor Investigational LC Laboratories E-4506 DMSO 10000
(Ph 3)
Masitinib KIT inhibitor B. Kinase inhibitor Investigational LC Laboratories M-7007 DMSO 10000
(Ph 3)
Midostaurin PKC, PKA, S6K and B. Kinase inhibitor Investigational PKC412, CGP 41251 LC Laboratories P-7600 DMSO 10000
EGFR inhibitor (Ph 3)
Ruboxistaurin PKCbeta inhibitor B. Kinase inhibitor Investigational Axon Medchem Axon 1401-2 DMSO 10000
(Ph 3)
Volasertib PLK1 inhibitor B. Kinase inhibitor Investigational ChemieTek CT-BI6727 DMSO 1000
(Ph 3)
Neratinib EGFR inhibitor B. Kinase inhibitor Investigational Selleck S2150(2) DMSO 1000
(Ph 3)
Linifanib VEGFR, PDGFR, B. Kinase inhibitor Investigational AL-39324, RG3635 Selleck S1003 DMSO 1000
CSF-1R, FLT3 (Ph 3)
inhibitor
Brivanib VEGFR inhibitor B. Kinase inhibitor Investigational BMS-582664 Selleck S1084 DMSO 1000
(Ph 3)
Dacomitinib pan-HER inhibitor B. Kinase inhibitor Investigational ChemieTek CT-DACO DMSO 1000
(Ph 3)
Dinaciclib CDK inhibitor B. Kinase inhibitor Investigational ChemieTek CT-DINA DMSO 1000
(Ph 3)
Apatinib VEGFR inhibitor B. Kinase inhibitor Investigational Selleck S2221 DMSO 10000
(Ph 3)
Semaxanib VEGFR inhibitor B. Kinase inhibitor Investigational Selleck S2845 DMSO 10000
(Ph 3)
NVP-LEE011 CDK4/6 inhibitor B. Kinase inhibitor Investigational LEE011 Selleck S7440 DMSO 10000
(Ph 3)
Pacritinib FLT3/JAK2 B. Kinase inhibitor Investigational Selleck S8057 DMSO 10000
(Ph 3)
Cobimetinib MEK1/2 inhibitor B. Kinase inhibitor Investigational XL-518 Medchemexpress HY-13064 DMSO 1000
(Ph 3)
Osimertinib EGFR(L858R/T790M) B. Kinase inhibitor Investigational Selleck S7297 DMSO 2500
inhibitor (Ph 3)
Losmapimod p38MAPK inhibitor B. Kinase inhibitor Investigational Selleck S7215 DMSO 10000
(Ph 3)
Rociletinib EGFR(L858R/T790M) B. Kinase inhibitor Investigational AVL-301, CNX-419 ChemieTek CT-CO1686 DMSO 10000
inhibitor (Ph 3)
Taselisib PI3K alpha, gamma B. Kinase inhibitor Investigational RG7604 Medchemexpress HY-13898-2 DMSO 1000
selective inhibitor (Ph 3)
Pexidartinib KIT, CSF1R, FLT3 B. Kinase inhibitor Investigational Medchemexpress HY-16749 DMSO 10000
inhibitor (Ph 3)
AZ 3146 Mps1 kinase (TTK) B. Kinase inhibitor Probe Tocris 3994 DMSO 10000
inhibitor Biosciences
PF-04708671 p70S6K inhibitor B. Kinase inhibitor Probe Sigma-Aldrich PZ0143 DMSO 10000
TGX-221 PI3K beta selective B. Kinase inhibitor Probe ChemieTek CT-TGX221 DMSO 10000
inhibitor
VX-11E ERK1 & 2 inhibitor B. Kinase inhibitor Probe ChemieTek CT-VX11e DMSO 2500
PF-4800567 CK1epsilon inhibitor B. Kinase inhibitor Probe Tocris 4281 DMSO 10000
Biosciences
PF-670462 CK1epsilon and B. Kinase inhibitor Probe Tocris 3316 DMSO 10000
CK1delta inhibitor Biosciences
(5Z)-7-Oxozeaenol TAK1 inhibitor B. Kinase inhibitor Probe (5Z)-7-Oxozeaenol Tocris 3604-03-01 DMSO 10000
Biosciences
GSK269962 ROCK1 and ROCK2 B. Kinase inhibitor Probe Tocris 4009 DMSO 10000
inhibitor Biosciences
PF 431396 FAK/PYK2 inhibitor B. Kinase inhibitor Probe Tocris 4278 DMSO 10000
Biosciences
GSK650394 SGK1 & 2 inhibitor B. Kinase inhibitor Probe Tocris 3572 DMSO 10000
Biosciences
AZ-23 Trk inhibitor B. Kinase inhibitor Probe Axon Medchem Axon 1610 DMSO 1000
GSK-1838705A IGF1R, INSR, ALK inhib B. Kinase inhibitor Probe ChemieTek CT-GSK183 DMSO 2500
GSK-1904529A IGF1R, INSR inhib B. Kinase inhibitor Probe ChemieTek CT-GSK190 DMSO 10000
SP600125 pan-JNK inhibitor B. Kinase inhibitor Probe Pyrazolanthrone, LC Laboratories S-7979 DMSO 100000
Anthrapyrazolone
BX-912 PDK1 inhib B. Kinase inhibitor Probe Selleck S1275 DMSO 10000
GSK-2334470 PDK1 inhibitor B. Kinase inhibitor Probe ChemieTek CT-GSK233 DMSO 10000
SCH772984 ERK1 & 2 inhibitor B. Kinase inhibitor Probe ChemieTek CT-SCH772 DMSO 10000
PKI-402 PI3K/mTOR inhibitor B. Kinase inhibitor Probe Selleck S2739 DMSO 2500
PHA 408 IKK-2 inhibitor B. Kinase inhibitor Probe Axon Medchem Axon 1651 DMSO 10000
PS-1145 IKK-2 inhibitor B. Kinase inhibitor Probe Axon Medchem Axon 1568 DMSO 25000
KU-60019 ATM inhibitor B. Kinase inhibitor Probe Selleck S1570 DMSO 25000
TPCA-1 IKK-2 inhibitor B. Kinase inhibitor Probe Selleck S2824 DMSO 25000
IRAK1/4 inhibitor IRAK1/4 inhibitor B. Kinase inhibitor Probe IRAK1/4 inhibitor Merck Millipore 407601 DMSO 25000
MK-8745 Aurora A inhibitor B. Kinase inhibitor Probe Selleck S7065 DMSO 2500
AZ191 DYRK1A inhibitor B. Kinase inhibitor Probe Selleck S7338 DMSO 10000
VE-821 ATR inhibitor B. Kinase inhibitor Probe Selleck S8007 DMSO 10000
AZD7545 PDHK inhibitor B. Kinase inhibitor Probe Selleck S7517 DMSO 10000
GNE-0877 LRRK2 inhibitor B. Kinase inhibitor Probe Selleck S7367 DMSO 1000
OTSSP167 MELK inhibitor B. Kinase inhibitor Probe Selleck S7159 DMSO 1000
UNC2881 MER inhibitor B. Kinase inhibitor Probe Selleck S7325 DMSO 2500
AMG-925 FLT-3, CDK4 inhibitor B. Kinase inhibitor Probe ChemieTek CT-AMG925 DMSO 1000
FRAX486 PAK1, 2, 3 inhibitor B. Kinase inhibitor Probe ChemieTek CT-F486 DMSO 5000
GNE-7915 LRRK2 inhibitor B. Kinase inhibitor Probe ChemieTek CT-GNE79 DMSO 1000
TAK-632 pan-RAF inhibitor B. Kinase inhibitor Probe ChemieTek CT-TAK632 DMSO 10000
GSK2656157 PERK inhibitor B. Kinase inhibitor Probe Medchemexpress HY-13820 DMSO 2500
Tacrolimus Binds FKBP12, causes C. Rapalog Approved Fujimycin Prograf, Tocris 3631 DMSO 10000
inhibition of Advagraf, Biosciences
calcineurin Protopic
Everolimus binds FKBP12, causes C. Rapalog Approved SDZ-RAD Afinitor, LC Laboratories E-4040 DMSO 100
inhibition of Certican,
mTORC1 Zortress
Temsirolimus binds FKBP12, causes C. Rapalog Approved Torisel LC Laboratories T-8040 DMSO 100
inhibition of
mTORC1
Sirolimus binds FKBP12, causes C. Rapalog Approved Rapamycin Rapamune LC Laboratories R-5000 DMSO 100
inhibition of
mTORC1
Ridaforolimus binds FKBP12, causes C. Rapalog Investigational AP 23573, Deforolimus Active Biochem A-1004 DMSO 100
inhibition of (Ph 3)
mTORC1
Dexamethasone Immunosuppressant; D. Immunomodulatory Approved Decadron, Selleck S1322 DMSO 10000
glucocorticoid Dexpak
Thalidomide Immunosuppressant D. Immunomodulatory Approved National Cancer NSC 66847- DMSO 10000
Institute R/5
Imiquimod Immunomodulatory D. Immunomodulatory Approved National Cancer NSC 369100- DMSO 2500
agent, TLR7 agonist Institute F/4
Levamisole Immunomodulatory D. Immunomodulatory Approved Tetramisole Sigma-Aldrich L9756 DMSO 10000
agent
Methylprednisolone Immunosuppressant D. Immunomodulatory Approved Santa Cruz sc-205749 DMSO 10000
Biotechnology
Prednisolone Immunomodulatory D. Immunomodulatory Approved Santa Cruz sc-205815 DMSO 10000
agent Biotechnology
Prednisone Immunomodulatory D. Immunomodulatory Approved Santa Cruz sc-205816 DMSO 10000
agent Biotechnology
Bimatoprost Prostaglandin analog D. Immunomodulatory Approved Latisse, Selleck S1407 DMSO 5500
Lumigan,
Prostamide
Lenalidomide Immunomodulatory D. Immunomodulatory Approved Revlimid LC Laboratories L-5499 DMSO 100000
Pomalidomide Immunomodulatory D. Immunomodulatory Approved 3-amino-thalidomide Pomalyst Sigma-Aldrich P0018 DMSO 10000
agent, anti-angiogenic
Quisinostat HDAC inhibitor E. Differentiating/ Investigational Active Biochem A-1162 DMSO 1000
epigenetic modifier (Ph 2)
FG-4592 HIF prolyl hydroxylase E. Differentiating/ Investigational Selleck S1007 DMSO 10000
inhibitor epigenetic modifier (Ph 2)
inhibitor
Pracinostat HDAC inhibitor E. Differentiating/ Investigational Selleck S1515 DMSO 10000
epigenetic modifier (Ph 2)
Resminostat HDAC1, 3, 6 inhibitor E. Differentiating/ Investigational Medchemexpress HY-14718A DMSO 10000
epigenetic modifier (Ph 2)
Givinostat HDAC inhibitor E. Differentiating/ Investigational Selleck S2170 DMSO 1000
epigenetic modifier (Ph 2)
Abexinostat HDAC1-selective E. Differentiating/ Investigational Medchemexpress HY-10990 DMSO 10000
inhibitor epigenetic modifier (Ph 2)
Veliparib PARP inhibitor E. Differentiating/ Investigational Selleck S1004 DMSO 10000
epigenetic modifier (Ph 3)
Tipifarnib Farnesyltransferase E. Differentiating/ Investigational Zarnestra, IND 58359 Selleck S1453 DMSO 10000
inhibitor epigenetic modifier (Ph 3)
Tacedinaline HDAC inhibitor E. Differentiating/ Investigational Acetyldinaline, Gö LC Laboratories C-2606 DMSO 1000
epigenetic modifier (Ph 3) 5549, PD 123654,
Iniparib PARP inhibitor E. Differentiating/ Investigational IND-71677 Axon Medchem Axon 1566 DMSO 10000
epigenetic modifier (Ph 3)
Lonafarnib Farnesyl transferase E. Differentiating/ Investigational Sarasar Selleck S2797 DMSO 10000
inhibitor epigenetic modifier (Ph 3)
Talazoparib PARP1/2 inhibitor E. Differentiating/ Investigational Medchemexpress HY-16106 DMSO 1000
epigenetic modifier (Ph 3)
XAV-939 Tankyrase-1 and -2 E. Differentiating/ Probe Selleck S1180 DMSO 10000
epigenetic modifier
(+)JQ1 BET family inhibitor E. Differentiating/ Probe (+)JQ1, SGCBD01(+) SGC SGCBD01(+) DMSO 10000
epigenetic modifier
Tubacin HDAC6 inhibitor E. Differentiating/ Probe Tubacin Selleck S2239 DMSO 10000
epigenetic modifier
Tubastatin A HDAC6 inhibitor E. Differentiating/ Probe Tubastatin A ChemieTek CT-TUBA DMSO 10000
epigenetic modifier
StemRegenin 1 AHR antagonist, stem cell E. Differentiating/ Probe StemRegenin 1, SR1 ChemieTek CT-SR1 DMSO 10000
regenerating epigenetic modifier
PFI-1 Selective chemical probe E. Differentiating/ Probe SGC SGCPFI DMSO 10000
for BET epigenetic modifier
Bromodomains
I-BET151 BET family inhibitor E. Differentiating/ Probe GSK1210151A ChemieTek CT-BET151 DMSO 10000
epigenetic modifier
IOX-2 PHD2 inhibitor E. Differentiating/ Probe Tocris 4451 DMSO 50000
epigenetic modifier Biosciences
GSK-J4 JMJD3 (histone E. Differentiating/ Probe Selleck S7070-2 DMSO 100000
demethylase) inhibitor epigenetic modifier
UNC1215 L3MBTL3 inhibitor E. Differentiating/ Probe Tocris 4666 DMSO 30000
epigenetic modifier Biosciences
SGC0946 DOT1L inhibitor E. Differentiating/ Probe Selleck S7079 DMSO 10000
epigenetic modifier
UNC0642 G9a/GLP inhibitor E. Differentiating/ Probe Tocris 5132 DMSO 10000
epigenetic modifier Biosciences
GSK343 EZH2 inhibitor E. Differentiating/ Probe SGC SGCGSK343 DMSO 10000
epigenetic modifier
UNC0638 G9a/GLP inhibitor E. Differentiating/ Probe Tocris 4343 DMSO 10000
epigenetic modifier Biosciences
C646 p300/CREB-binding E. Differentiating/ Probe Axon Medchem Axon 1781 DMSO 25000
protein (CBP) epigenetic modifier
inhibitor
EPZ-5687 EZH2 inhibitor E. Differentiating/ Probe ChemieTek CT-EPZ687 DMSO 10000
epigenetic modifier
IOX-1 2-Oxoglutarate E. Differentiating/ Probe Selleck S7234 DMSO 100000
Oxygenase Inhibitor epigenetic modifier
GSK2801 BAZ2B/A bromodomain E. Differentiating/ Probe SGC SGCGSK2801 DMSO 10000
inhibitor epigenetic modifier
SGC-CBP30 CREBBP/EP300 E. Differentiating/ Probe Medchemexpress HY-15826 DMSO 25000
bromodomain epigenetic modifier
inhibitor
RGFP966 HDAC3 inhibitor E. Differentiating/ Probe Selleck S7229 DMSO 10000
epigenetic modifier
PTC-209 BMI-1 inhibitor E. Differentiating/ Probe Selleck S7539 DMSO 10000
epigenetic modifier
(AhR) antagonists
PCI-34051 HDAC8 inhibitor E. Differentiating/ Probe Medchemexpress HY-15224 DMSO 10000
epigenetic modifier
EPZ015666 PRMT5 inhibitor E. Differentiating/ Probe Selleck S7748 DMSO 10000
epigenetic modifier
Goserelin Gonadotropin releasing F. Hormone therapy Approved Tocris 3592 DMSO 10000
hormone superagonist Biosciences
Raloxifene Selective estrogen F. Hormone therapy Approved National Cancer NSC 747974- DMSO 10000
receptor modulator Institute X/1
Letrozole Aromatase inhibitor F. Hormone therapy Approved Femara National Cancer NSC 719345- DMSO 10000
Institute G/2
Anastrozole Aromatase inhibitor F. Hormone therapy Approved National Cancer NSC 719344- DMSO 10000
Institute F/2
Bicalutamide Nonsteroidal F. Hormone therapy Approved Casodex, ChemieTek CT-BIC DMSO 10000
antiandrogen Cosudex,
Calutide,
Kalumid
Aminoglutethimide Anti-steroid, aromatase F. Hormone therapy Approved Sigma-Aldrich A9657 DMSO 10000
inhibitor
Clomifene Selective estrogen F. Hormone therapy Approved Clomid, Serophene, Selleck S2561 DMSO 10000
receptor modulator Milophene
Finasteride Type II 5-alpha reductase F. Hormone therapy Approved Tocris 3293 DMSO 10000
inhibitor Biosciences
Flutamide Nonsteroidal F. Hormone therapy Approved Tocris 4094 DMSO 10000
antiandrogen Biosciences
Fulvestrant Estrogen receptor F. Hormone therapy Approved Selleck S1191 DMSO 1000
antagonist
Megestrol Progestogen F. Hormone therapy Approved Megace National Cancer NSC 71423- DMSO 10000
Institute Q/12
Tamoxifen Estrogen receptor F. Hormone therapy Approved National Cancer NSC 180973- DMSO 10000
antagonist Institute S/203
Nilutamide Nonsteroidal F. Hormone therapy Approved Santa Cruz sc-203644 DMSO 10000
antiandrogen Biotechnology
Exemestane Aromatase inhibitor F. Hormone therapy Approved National Cancer NSC 713563- DMSO 10000
Institute U/2
Abiraterone P450 17alpha- F. Hormone therapy Approved Selleck S1123 DMSO 5000
hydroxylase-17,20-
lyase inhibitor
Toremifene Selective estrogen F. Hormone therapy Approved Fareston, Santa Cruz sc-253712 DMSO 10000
receptor modulator Acapodene Biotechnology
Lasofoxifene Selective estrogen F. Hormone therapy Approved Oporia Santa Cruz sc-211721 DMSO 1000
receptor modulator Biotechnology
Enzalutamide AR antagonist F. Hormone therapy Approved Xtandi Axon Medchem Axon 1613 DMSO 10000
ARN 509 AR antagonist F. Hormone therapy Investigational Axon Medchem Axon 1979 DMSO 10000
(Ph 2)
Orteronel CYP17A1, androgen F. Hormone therapy Investigational Selleck S1195 DMSO 10000
synth inhib. (Ph 3)
4-hydroxy- Selective estrogen F. Hormone therapy Investigational Santa Cruz sc-3542 DMSO 10000
tamoxifen receptor modulator as a gel Biotechnology
preparation
RD162 AR antagonist F. Hormone therapy Probe Axon Medchem Axon 1532 DMSO 10000
Serdemetan HDM2-p53 antagonist G. Apoptotic Investigational Selleck S1172 DMSO 10000
modulator (Ph 1)
APR-246 p53 activator, thioredoxin G. Apoptotic Investigational Prima-1 Met Tocris 3710 DMSO 10000
reductase 1 inhibitor modulator (Ph 1) Biosciences
PAC-1 procaspase-3 activator G. Apoptotic Investigational Selleck S2738 DMSO 10000
modulator (Ph 1)
AT-406 XIAP, cIAP1, cIAP2 G. Apoptotic Investigational Selleck S2754 DMSO 10000
inhibitor modulator (Ph 1)
Venetoclax Bcl-2-selective inhibitor G. Apoptotic Investigational GDC-0199 ChemieTek CT-A199 DMSO 1000
modulator (Ph 1)
Verdinexor XPO1/CRM1 inhibitor G. Apoptotic Investigational Medchemexpress HY-15970 DMSO 1000
modulator (Ph 1)
SAR405838 MDM2 inhibitor G. Apoptotic Investigational MI-773 Selleck S7649 DMSO 10000
modulator (Ph 1)
AT 101 Bcl-2 family inhibitor G. Apoptotic Investigational R-(-).gossypol Selleck S2812 DMSO 100000
modulator (Ph 2)
inhibitor
TH588 MTH1 inhibitor H. Metabolic modifier Probe KI/Helleday TH588 DMSO 25000
AGI-6780 IDH2-R140Q inhibitor H. Metabolic modifier Probe Medchemexpress HY-15734 DMSO 10000
GSK923295 CENP-E inhibitor I. Kinesin inhibitor Investigational Medchemexpress HY-10299 DMSO 10000
(Ph 1)
SB 743921 Mitotic inhibitor. I. Kinesin inhibitor Investigational Selleck S2182 DMSO 100
Eg5/KSP inhibitor (Ph 2)
ARRY-520 KSP/Eg5 inhibitor I. Kinesin inhibitor Investigational Medchemexpress HY-15187 DMSO 1000
(Ph 2)
Rofecoxib COX-2 inhibitor J. NSAID Approved Vioxx ChemieTek CT-RX001 DMSO 10000
Celecoxib Selective COX-2 inhibitor J. NSAID Approved National Cancer NSC 719627- DMSO 10000
Institute M/1
CUDC-305 HSP90 inhibitor K. HSP inhibitor Investigational DEBIO-0932 ChemieTek CT-CU305 DMSO 10000
(Ph 1)
Tanespimycin HSP90 inhibitor K. HSP inhibitor Investigational 17-AAG Selleck S1141 DMSO 10000
(Ph 2)
Alvespimycin HSP90 inhibitor K. HSP inhibitor Investigational 17-DMAG Selleck S1142 DMSO 1000
(Ph 2)
BIIB021 HSP90 inhibitor K. HSP inhibitor Investigational Selleck S1175 DMSO 10000
(Ph 2)
Luminespib HSP90 inhibitor K. HSP inhibitor Investigational AUY922 ChemieTek CT-AUY922 DMSO 1000
(Ph 2)
Onalespib HSP90 inhibitor K. HSP inhibitor Investigational Medchemexpress HY-14463 DMSO 2500
(Ph 2)
Ganetespib HSP90 inhibitor K. HSP inhibitor Investigational Selleck S1159 DMSO 1000
(Ph 3)
VER 155008 HSP70 inhibitor K. HSP inhibitor Probe Axon Medchem Axon 1608 DMSO 10000
Radicicol HSP90 inhibitor K. HSP inhibitor Probe Monorden Tocris 2/1/1589 DMSO 10000
Biosciences
Pilocarpine Non-selective muscarinic X. Other Approved Salagen Tocris 694 DMSO 40000
receptor agonist Biosciences
Anagrelide PDE-3, PLA2 inhibitor X. Other Approved Agrylin, Tocris 2432 DMSO 10000
Xagrid Biosciences
Mepacrine Unclear. PLA2 inhibitor. X. Other Approved Quinacrine, Achricrine Sigma-Aldrich Q3251 AQ 50000
NF-kB inhibitor, p53
activator
Plerixafor CXCR4 antagonist X. Other Approved JM 3100, AMD 3100 Mozobil Cayman 10011332-2 AQ 10000
Chemical
Company
Fingolimod S1PR antagonist X. Other Approved Gilenya LC Laboratories F-4633 DMSO 10000
Vismodegib Smoothened X. Other Approved HhAntag691 Erivedge LC Laboratories V-4050 DMSO 10000
(Hh) inhibitor
Deferoxamine Iron chelator X. Other Approved desferrioxamine, Sigma-Aldrich D9533 AQ 10000
(non- DFOM
oncology)
Itraconazole antifungal, hedgehog X. Other Approved Selleck S2476 DMSO 5000
signaling inhibitor (non-
oncology)
NVP-LGK974 PORCN inhibitor X. Other Investigational LGK974 Selleck S7143 DMSO 10000
(Ph 1)
Sonidegib Smoothened (Hh) inhibitor X. Other Investigational LDE225, erismodegib ChemieTek CT-LDE225 DMSO 10000
(Ph 2) (USAN)
2-methoxyestradiol Angiogenesis inhibitor X. Other Investigational 2ME2 Cayman 13021 DMSO 10000
(Ph 2) Chemical
Company
MK-0752 gamma-secretase/notch X. Other Investigational Selleck S2660 DMSO 1000
inhibitor (Ph 2)
Varespladib Secretory phospholipase X. Other Investigational A-001 ChemieTek CT-VARE DMSO 10000
A2 inhibitor (Ph 2)
1-methyl-D-tryptophan Indoleamine 2,3-dioxygenase 1 and 2 inhibitor X. Other Investigational (Ph 2) 1-methyl-D-tryptophan Sigma-Aldrich 452483 AQ 5000
Glasdegib Smo inhibitor X. Other Investigational Medchemexpress HY-16391 DMSO 1000
(Ph 2)
Tarenflurbil Gamma-secretase X. Other Investigational (R)-Flurbiprofen Cayman 70255 DMSO 10000
inhibitor (Ph 3) Chemical
Company
Tosedostat Aminopeptidase inhibitor X. Other Investigational Tocris 3595 DMSO 10000
(Ph 3) Biosciences
Cilengitide alphaVbeta3 integrin X. Other Investigational NSC 707544 Selleck S7077 DMSO 10000
inhibitor (Ph 3)
Darapladib lipoprotein-associated X. Other Investigational Selleck S7520 DMSO 1000
phospholipase A2 (Ph 3)
inhibitor
Marimastat MMP-9, MMP-1, X. Other Investigational Selleck S7156 DMSO 10000
MMP-2, MMP-14, (Ph 3)
MMP-7 inhibitor
Galiellalactone STAT3-DNA interaction X. Other Probe Santa Cruz sc-202165-6 DMSO 25000
inhibitor Biotechnology
PF-3845 FAAH inhibitor X. Other Probe Selleck S2666 DMSO 10000
15D-PGJ2 Endogenous PPARγ X. Other Probe 15D-PGJ2, 15-deoxy Merck 538927-2 DMSO 3000
ligand, prostaglandin, delta(12,14)
NFkB signaling prostaglandin J2
inhibitor
Stattic STAT3 SH2 domain X. Other Probe Stattic Tocris 2798-03-01 DMSO 50000
inhibitor Biosciences
TRAM-34 intermediate- X. Other Probe Selleck S1160 DMSO 1000
conductance Ca2+-
activated K+ channel
inh.
deltarasin Ras-PDEdelta inhibitor X. Other Probe deltarasin ChemieTek CT-DELT DMSO 10000
NSC348884 NPM1 oligomerization X. Other Probe Axon Medchem Axon 1402 DMSO 50000
inhibitor
ONX-0914 LMP7 (immunoproteasome) X. Other Probe PR-957 Selleck S7172 DMSO 10000
NMS-873 p97/VCP inhibitor X. Other Probe Selleck S7285 DMSO 10000
ML323 USP1-UAF1 inhibitor X. Other Probe Selleck S7529 DMSO 10000
GSK2830371 Wip1 inhibitor X. Other Probe ChemieTek CT-GSK283 DMSO 5000
BCI Dusp6 inhibitor X. Other Probe BCI, NSC 150117 Sigma B4313 DMSO 50000
MST-312 Telomerase inhibitor X. Other Probe Telomerase Inhibitor IX Sigma M3949 DMSO 10000
SH-4-54 STAT3 inhibitor X. Other Probe Selleck S7337 DMSO 25000
Compound name, classification, mechanism of action, clinical phase and supplier information are displayed
Acknowledgments
References
1. Vogelstein B, Papadopoulos N, Velculescu VE 6. Gillies RJ, Verduzco D, Gatenby RA (2012)
et al (2013) Cancer genome landscapes. Sci- Evolutionary dynamics of carcinogenesis and
ence 339:1546–1558 why targeted therapy does not work. Nat Rev
2. Pemovska T, Kontro M, Yadav B et al (2013) Cancer 12:487–493
Individualized systems medicine strategy to tai- 7. Mathews Griner LA, Guha R, Shinn P et al
lor treatments for patients with chemorefrac- (2014) High-throughput combinatorial
tory acute myeloid leukemia. Cancer Discov screening identifies drugs that cooperate with
3:1416–1429 ibrutinib to kill activated B-cell-like diffuse
3. Yang W, Soares J, Greninger P et al (2013) large B-cell lymphoma cells. Proc Natl Acad
Genomics of drug sensitivity in cancer Sci U S A 111:2349–2354
(GDSC): a resource for therapeutic biomarker 8. Crystal AS, Shaw TA, Sequist VL et al (2014)
discovery in cancer cells. Nucleic Acids Res Patient-derived models of acquired resistance
D41:D955–D961 can identify effective drug combinations for
4. Seashore-Ludlow B, Rees MG, Cheah JH et al cancer. Science 346:1480–1486
(2015) Harnessing connectivity in a large-scale 9. Pemovska T, Johnson E, Kontro M et al
small-molecule sensitivity dataset. Cancer Dis- (2015) Axitinib effectively inhibits
cov 5:1210–1223 BCR-ABL1 (T315I) with a distinct binding
5. Tang J, Aittokallio T (2014) Network pharma- conformation. Nature 519:102–105
cology strategies toward multi-target antican- 10. Kulesskiy E, Saarela J, Turunen L et al (2016)
cer therapies: from computational models to Precision cancer medicine in the acoustic dis-
experimental design principles. Curr Pharm pensing era: ex vivo primary cell drug sensitiv-
Des 20:20–36 ity testing. J Lab Autom 21:27–36
398 Liye He et al.
11. Haltia UM, Andersson N, Yadav B et al (2017) 19. Greco WR, Bravo G, Parsons JC (1995) The
Systematic drug sensitivity testing reveals syn- search for synergy: a critical review from a
ergistic growth inhibition by dasatinib or response surface perspective. Pharmacol Rev
mTOR inhibitors with paclitaxel in ovarian 47:331–385
granulosa cell tumor cells. Gynecol Oncol 20. Zhao W, Sachsenmeier K, Zhang L et al (2014)
144:621 A new bliss independence model to analyze
12. Saeed K, Rahkama V, Eldfors S et al (2017) drug combination data. J Biomol Screen
Comprehensive drug testing of patient-derived 19:817–821
conditionally reprogrammed cells from 21. Yadav B, Wennerberg K, Aittokallio T et al
castration-resistant prostate cancer. Eur Urol (2015) Searching for drug synergy in complex
71:319. https://doi.org/10.1016/j.eururo. dose-response landscapes using an interaction
2016.04.019 potency model. Comput Struct Biotechnol J
13. Berenbaum MC (1989) What is synergy. Phar- 13:504–513
macol Rev 41:93–141 22. Szwajda A, Gautam P, Karhinen L et al (2015)
14. Loewe S (1953) The problem of synergism and Systematic mapping of kinase addiction combi-
antagonism of combined drugs. Arzneimittel- nations in breast cancer cells by integrating
forschung 3:285–290 drug sensitivity and selectivity profiles. Chem
15. Bliss CI (1939) The toxicity of poisons applied Biol 22:1144–1155
jointly. Ann Appl Biol 26:585–615 23. Gautam P, Karhinen L, Szwajda A et al (2016)
16. Chou TC (2006) Theoretical basis, experimen- Identification of selective cytotoxic and syn-
tal design, and computerized simulation of syn- thetic lethal drug responses in triple negative
ergism and antagonism in drug combination breast cancer cells. Mol Cancer 15:34
studies. Pharmacol Rev 58:621–681 24. Karjalainen R, Pemovska T, Majumder M et al
17. Boik JC, Narasimhan B (2010) An R package (2017) JAK1/2 and BCL2 inhibitors synergize
for assessing drug synergism/antagonism. J to counteract bone marrow stromal cell-
Stat Softw 34:6 induced protection of AML. Blood 130:789
18. Ritz C, Baty F, Streibig JC (2005) Bioassay
analysis using R. J Stat Softw 12:5
INDEX
Louise von Stechow (ed.), Cancer Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 1711,
https://doi.org/10.1007/978-1-4939-7493-1, © Springer Science+Business Media, LLC 2018
E L
Enhanced chemiluminescence (ECL)................. 152, 155 Liquid chromatography (LC)
Enhanced reduced representation bisulfite high-performance liquid chromatography
sequencing (ERRBS)..............................28–30, (HPLC)..................... 151, 153, 154, 158, 159
33–36, 39, 46, 49–51 reversed phase liquid chromatography (RPLC) .... 106
Epigenomics ...................................................9, 13–25, 27 Locked nucleic acids (LNA) ........................................... 75
Epipolymorphism......................................................46–48 Loss of function ................................................. 61, 83–85
Epitope Retrieval (ER) ............................... 262, 264, 271 Luminescence assay......................................................... 87
F M
Formalin-fixed paraffin-embedded (FFPE) ................150, Machine learning
262, 270 support vector machine (SVM)..................... 175, 179
support vector regression (SVR)...........245–246, 252
G Magnetic resonance imaging (MRI)................... 229–230
Gene expression profiles (GEPs)........................ 244–247, Magnetic resonance spectroscopy (MRS).................... 169
252, 280, 284 Manhattan plot.............................................................. 282
Mascot ................................................................. 108, 109,
Gene ontology (GO) .............................................. 43, 44,
154, 161 154, 159, 160
Gene Set Enrichment Analysis (GSEA) ......112, 284–286 Mass spectrometry
data-dependent acquisition (DDA) .............. 106, 107
Genome-Wide Association Studies (GWAS).......... 279–282
Genomic sequencing............................................ 277–291 data-independent acquisition (DIA)............. 106–107
GitHub .....................................................................16–18, higher-energy collisional dissociation (HCD)....... 158
20, 23, 24, 119 multiple reaction monitoring (MRM) ................... 106
Glioblastoma ........................................................ 149–162 selected reaction monitoring (SRM) ..................... 106
Graphical user interfaces (GUIs)............................... 182 tryptic digestion ............................ 152–153, 155–156
Mathematical model
H deterministic model ...................... 193–216, 340–341
finite difference method ......................................... 227
Hematoxylin and Eosin (H&E) staining............ 239, 255 Gompertz model................................... 194, 300, 301
High-throughput data ............................................ 13, 14, nominal optimization problem .............306, 322–324
64, 68–69, 72 objective function.......................................... 236, 237,
Histograms .......................................................... 138, 141, 299, 300, 307, 315, 316, 319, 321
187, 203, 268 partial differential equation (PDE) ...... 195, 203, 204
petri nets ......................................................... 341, 342
I
population model.......................... 203–208, 215, 306
Immunoblotting ......................................... 152, 155, 161 robust optimization problem (ROP).................... 307,
Immunoprecipitation (IP) .................................... 70, 151, 316, 321, 324–327
153, 157, 158 stochastic model ............................................ 193–216,
In silico ....................................................... 24, 58, 65, 70, 338, 340–342
125, 228, 254, 256, 338–339 Poisson processes ............194–195, 197–212, 217
Intra-tumor heterogeneity.............................................. 70 toxicity modeling ...................................303–304, 306
In vivo ...................................................................... 49, 56, toxicity uncertainty ...................................297–329
66, 110, 111, 151, 194, 228–229, 238, tumor growth model .............................231, 300–303
308, 352 Matlab..............................................................6, 117, 162,
172, 228, 239, 265
K Metastatic emission process................................. 193–216
Methylation heterogeneity (MH) ........ 28–30, 46–48, 51
Kinase activity analysis (KAA) ............................. 112–113
Microarray .........................................................63, 70, 74,
Kinase activity scoring (kinact)................... 105, 117, 127
135, 244, 245, 247–249, 254–256, 283, 284
Kinase-Substrate Enrichment Analysis (KSEA)..........105,
Microarray microdissection with analysis of
114–124, 126
differences (MMAD)................................... 244
Kinase-substrate relationships .............................. 97, 104,
Microenvironment ................................ 28, 268, 333–345
110–116, 118, 121–122, 125–127
miRbase .............................................................. 56, 65, 74
Multi-dimensional data.............................................67–70 phosphopeptide enrichment: phosphotyrosine
Multiplexed immunofluorescence peptide immunoprecipitation ..................... 153
BondRX autostainer....................................... 262, 263 phosphosite assignment ............................108–109
fluorescence scanning..................................... 262–265 Python ..................................................................... 23, 39,
Hoechst 33342 ..................................... 262, 264, 272 85, 87, 94, 96, 105, 117–120, 122, 123, 127,
image artifacts................................................. 265, 272 254, 364
quantitative image analysis ....................263, 265–270
Multi scatter plot.................................................. 139, 140 Q
Mutational pattern ........................................................ 3–9
Quantitative mass spectrometry
Mutual exclusivity ............................................................. 4 isobaric tags for relative and absolute quantitation
Myeloid derived suppressor cells (iTRAQ).................................... 107, 150, 153,
(MDSCs)............................................. 243, 336
156, 158–160
stable isotope labeling by metabolic incorporation
N
of amino acids (SILAC) .............................. 107
Natural killer cells............................................................ 57 tandem mass tags (TMT) .............................. 107, 156
Network biology
hub node ................................................................... 22 R
node ........................................................14, 18, 20–24 R
protein-protein interaction network Cellhts2 (R-package) .............................85–92, 96, 97
(PPI).........................................................14, 15 Gplots (R-package) .............................................85, 96
Neural networks (NNs) .............................. 179, 180, 290
limma package ................................................ 247, 283
Nuclear magnetic resonance (NMR) .................. 167–187 Radio frequency (RF) ................................................... 168
Radioimmunoprecipitation assay (RIPA) ........... 152, 155
O
Reduced representation bisulfite sequencing
Omics Integrator (RRBS) ........................................................... 28
Forest ............................................................ 15, 18–24 Reduction ............................................................ 152–153,
Garnet .....................................................15, 17–20, 23 155–156, 301, 303
Perseus ............................................................ 133–146 Regions of interest (ROI)........................... 230, 239, 269
Synergyfinder................................................. 354, 355, Ribonucleic acid (RNA)
359, 360, 397 messenger RNA (mRNA). 14, 15, 55–58, 63, 66–72,
Targetscan.................................................................. 65 75, 256, 280
Trim Galore .................................................. 29, 30, 34 micro RNAs (miRNA) ............................... 55–76, 335
Optimal cutting temperature (OCT)..........150, 154–155 primary miRNA (pri-miRNA)............................56, 57
Optimal therapy design ....................................... 338, 397 short hairpin RNA (shRNA) ..............................84, 85
small interfering RNA (siRNA).......................... 83–98
P RNA-Seq .........................................................17, 23, 245,
Patient-derived xenograft (PDX) ........................ 150, 334 247, 248, 251, 253–256, 283, 334
Phenotypic readouts ..................................................... 354
S
Phosphate buffered saline (PBS)......................... 152, 155
Phosphotyrosine signaling............................................ 151 Signaling pathway ................................................... 4, 5, 8,
Polymerase chain reaction (PCR) .................................. 33 118, 125, 126, 335, 339
reverse transcription polymerase chain Single nucleotide polymorphisms (SNP)....................279,
reaction (RT-PCR) ........................................ 63 281, 282
Precision medicine ........................................................ 277 Software tools
Proteomic ........................................................13–25, 107, BANDIT.................................................................. 286
108, 133–146, 150, 256, 278, 290 Bismark ............................... 29–31, 35–38, 47, 50–51
phosphoproteomics Bliss ................................................352, 353, 360, 362
phosphopeptide enrichment ChIPseeqer .............................................29–31, 42, 43
phosphopeptide enrichment: immobilized CIBERSORT.................................................. 243–257
metal affinity chromatography (IMAC).... 105, Cytoscape........................................... 8, 16, 20, 23, 24
153, 157–158 DAISY............................................................. 287–288
phosphopeptide enrichment: metal oxide FastQC............................................. 29–31, 33–35, 49
affinity chromatography (MOAC) ............. 105 FIMMcherry......................... 353–355, 357, 358, 364
Statistical methods univariate analysis ...................................170, 183–185
Analysis of Variance (ANOVA) .................... 142–144, unsupervised analysis ................................... 40, 43, 48
146, 281 Synergy scoring .................................................... 351–397
Benjamini-Hochberg Adjustment.......................... 185 Synthetic dosage lethality ............................................. 287
cumulative variance plot ................................ 172, 173 Synthetic lethality (SL) ........................... 84, 95, 287, 288
false discovery rate (FDR) .............................. 18, 108,
135, 136, 142–143, 146, 160, 184, 185 T
generalized linear model (GLM) ........................... 281 Terminal deoxynucleotidyl transferase dUTP nick
hierarchical cluster analysis ....................171, 174–175
end labeling (TUNEL) ...........................64–65
latent variables ................................................. 66, 171, TissueCypher® .....................................263, 265, 267, 269
176–180, 182, 187 Total growth inhibition (TGI) ............................ 278, 279
leave-one-out-analysis ............................................. 181
Transcription factor (TF)........................ 15–20, 112, 339
likelihood ratio .......................................5, 7, 183–184 Transcriptomics ................................................. 13–25, 28,
linear least-square regression (LLSR) .................... 244 66, 133, 334, 345
linear mixed-effects models (LMM) ............. 183–184
Translational bioinformatics ................................ 133–146
median absolute deviation (MAD) ............. 88, 89, 98 Tumor evolution .............................................. 27–51, 298
memorylessness ..................................... 196, 197, 199 Tumor growth prediction............................230, 234–237
model diagnostic statistics ............................. 181, 182
Tumorigenesis ......................................... 8, 27, 48, 61, 66
multiple hypothesis testing ........................... 135, 141, Tumor infiltrating leukocytes (TILs) ..........................243,
142, 246, 282 244, 248–255
multivariate analysis ....................................... 170–183 Tumor progression ....................................................8, 43,
partial least squares discriminant analysis (PLS-DA)
48, 83, 345
175, 177–179, 181 Tyrosine kinase inhibitors (TKI)........................ 298, 299,
partial least squares regression (PLSR) ................. 176, 305, 322
178, 181
Pearson correlation ..................................92, 139, 160 U
permutation testing........................................ 182–183
principal component analysis (PCA)............ 139, 140, Untranslated region (UTR) .............................. 57, 65, 66
171–174, 176–178
W
response surface model ........................................... 353
root mean square error (RMSE) ...........176–178, 181 Whole-genome bisulfite sequencing (WGBS).............. 28,
scree plot......................................................... 173, 186 51, 334
Tukey’s honestly significant difference ................. 142,
143, 146