Background: Cancer is a complex, multiscale dynamical system, with interactions between tumor cells and non-cancerous host systems. Therapies act on this combined cancer-host system, sometimes with unexpected results. Systematic investigation of mechanistic computational models can augment traditional laboratory and clinical studies, helping identify the factors driving a treatment's success or failure. However, given the uncertainties regarding the underlying biology, these multiscale computational models can take many potential forms, in addition to encompassing high-dimensional parameter spaces. Therefore, the exploration of these models is computationally challenging. We propose that integrating two existing technologies (one to aid the construction of multiscale agent-based models, the other developed to enhance model exploration and optimization) can provide a computational means for high-throughput hypothesis testing, and eventually, optimization. Results: In this paper, we introduce a high-throughput computing (HTC) framework that integrates a mechanistic 3-D multicellular simulator (PhysiCell) with an extreme-scale model exploration platform (EMEWS) to investigate high-dimensional parameter spaces. We show early results in applying PhysiCell-EMEWS to 3-D cancer immunotherapy and share insights on therapeutic failure. We describe a generalized PhysiCell-EMEWS workflow for high-throughput cancer hypothesis testing, where hundreds or thousands of mechanistic simulations are compared against data-driven error metrics to perform hypothesis optimization. Conclusions: While key notational and computational challenges remain, mechanistic agent-based models and high-throughput model exploration environments can be combined to systematically and rapidly explore key problems in cancer.
These high-throughput computational experiments can improve our understanding of the underlying biology, drive future experiments, and ultimately inform clinical practice.
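The hypothesis-optimization loop described above (many mechanistic simulations scored against a data-driven error metric) can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in: `simulate_tumor`, its parameters, and the target value are invented for illustration and are not the PhysiCell or EMEWS API; a real workflow would launch full 3-D simulations in parallel.

```python
import itertools
import random

def simulate_tumor(immune_recruitment, kill_rate, seed=0):
    """Toy stand-in for one mechanistic simulation run: returns a final
    tumor cell count. (Hypothetical surrogate, not a PhysiCell call.)"""
    rng = random.Random(seed)
    cells = 1000.0
    for _ in range(50):
        growth = 0.05 * cells
        killed = kill_rate * immune_recruitment * cells * rng.uniform(0.8, 1.2)
        cells = max(cells + growth - killed, 0.0)
    return cells

OBSERVED_FINAL_COUNT = 200.0  # hypothetical data-driven target

def error_metric(simulated, observed=OBSERVED_FINAL_COUNT):
    """Squared error against the observed endpoint."""
    return (simulated - observed) ** 2

# Sweep a small hypothesis grid, averaging stochastic replicates per point.
grid = itertools.product([0.5, 1.0, 2.0], [0.01, 0.05, 0.1])
scored = []
for recruit, kill in grid:
    errs = [error_metric(simulate_tumor(recruit, kill, seed=s)) for s in range(5)]
    scored.append(((recruit, kill), sum(errs) / len(errs)))

# The best-scoring parameterization is the surviving hypothesis.
best_params, best_err = min(scored, key=lambda t: t[1])
print("best hypothesis:", best_params, "mean squared error:", round(best_err, 1))
```

In the actual framework, the grid sweep would be replaced by an EMEWS model-exploration module dispatching thousands of such runs across HPC nodes.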
There is increasing interest in the use of mechanism-based multi-scale computational models (such as agent-based models) to generate simulated clinical populations in order to discover and evaluate potential diagnostic and therapeutic modalities. A necessary precondition of this use is the ability to parameterize these highly complex models in an appropriate clinical context, which itself is often a complex environment. The description of the environment in which a biomedical simulation operates (model context) and the parameterization of internal model rules (model content) require the optimization of a large number of free parameters; given the wide range of variable combinations, along with the intractability of ab initio modeling techniques that could be used to constrain these combinations, an astronomical number of simulations would be required to achieve this goal. In this work, we utilize a nested active-learning workflow to efficiently parameterize and contextualize an agent-based model (ABM) of systemic inflammation used to examine sepsis. Methods: Billions of microbial sepsis patients were simulated using a previously validated ABM of acute systemic inflammation, the Innate Immune Response Agent-Based Model (IIRABM). Contextual parameter space was examined using the following parameters: cardio-respiratory-metabolic resilience; two properties of microbial virulence, invasiveness and toxigenesis; and degree of contamination from the environment. The model's internal parameterization, which represents gene expression and associated cellular behaviors, was explored through the augmentation or inhibition of signaling pathways for 12 signaling mediators associated with inflammation and wound healing.
We have implemented a nested active learning approach in which the clinically relevant (CR) model environment space for a given internal model parameterization is mapped using a small Artificial Neural Network (ANN). The outer active-learning level workflow is a larger ANN that uses a novel approach to active learning, Double Monte-Carlo Dropout Uncertainty (DMCDU), to efficiently regress the volume and centroid location of the CR space given by a single internal parameterization. Results: A brute-force exploration of the IIRABM's content and context would require approximately 3 × 10^12 simulations, and the result would be only a coarse representation of a continuous space. We have reduced the number of simulations required to efficiently map the clinically relevant parameter space of this model by approximately 99%. Additionally, we have shown that more complex models with a larger number of variables can expect further improvements in efficiency.
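The core idea behind Monte-Carlo dropout uncertainty, on which the DMCDU approach builds, is to leave dropout active at inference time and use the spread of repeated stochastic forward passes as an uncertainty estimate, then simulate next the candidate parameterization the surrogate is least certain about. The sketch below is a minimal single-level illustration with a tiny fixed-weight network; the network, its sizes, and the candidate set are hypothetical stand-ins, not the paper's actual (larger, trained, doubly nested) architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny one-hidden-layer network with fixed random weights stands in for a
# trained ANN surrogate (hypothetical; the real surrogates are trained and larger).
W1 = rng.normal(size=(4, 16))
W2 = rng.normal(size=16)

def mc_dropout_predict(x, n_passes=100, p_drop=0.5):
    """Repeated stochastic forward passes with dropout left ON at inference;
    the mean approximates the prediction, the std estimates uncertainty."""
    outs = []
    for _ in range(n_passes):
        h = np.maximum(x @ W1, 0.0)            # ReLU hidden layer
        mask = rng.random(h.shape) >= p_drop   # random dropout mask
        h = h * mask / (1.0 - p_drop)          # inverted-dropout scaling
        outs.append(float(h @ W2))
    outs = np.array(outs)
    return outs.mean(), outs.std()

# Active-learning step: among candidate parameterizations, choose the one
# with the highest predictive uncertainty as the next simulation to run.
candidates = rng.uniform(-1, 1, size=(20, 4))
stds = [mc_dropout_predict(c)[1] for c in candidates]
next_index = int(np.argmax(stds))
print("query candidate", next_index, "with std", round(stds[next_index], 3))
```

The nested ("double") variant applies this same dropout-uncertainty selection at two levels: an inner network mapping CR space per parameterization and an outer network regressing CR volume and centroid across parameterizations.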
OSTI OAI (U.S. Department of Energy Office of Scientific and Technical Information), Nov 13, 2016
Agent-based models (ABMs) integrate multiple scales of behavior and data to produce higher-order dynamic phenomena and are increasingly used in the study of cancer. However, the complexity of ABMs provides numerous challenges to their effective use, mostly related to the relatively high computational cost in carrying out the simulation experiments by which ABMs are developed, calibrated and used. High-performance computing (HPC) platforms can address some of these computational constraints. We have developed a framework, called Extreme-scale Model Exploration with Swift/T (EMEWS), that can leverage the computing capabilities of HPC parallel architectures by integrating model exploration (ME) modules such as machine learning and evolutionary computing methods to augment the performance of large-scale simulation experiments. EMEWS can be used to aid in the calibration, parameter estimation and model exploration of any simulation model. Herein we provide a use case examining the factors and patterns of mutational events of oncogenesis in population-level simulations of a mechanism-based ABM of colorectal cancer (CRC).
Sepsis, a manifestation of the body's inflammatory response to injury and infection, has a mortality rate of between 28% and 50% and affects approximately 1 million patients annually in the United States. Currently, there are no therapies targeting the cellular/molecular processes driving sepsis that have demonstrated the ability to control this disease process in the clinical setting. We propose that this is in great part due to the considerable heterogeneity of the clinical trajectories that constitute clinical "sepsis," and that determining how this system can be controlled back into a state of health requires the application of concepts drawn from the field of dynamical systems. In this work, we consider the human immune system to be a random dynamical system, and investigate its potential controllability using an agent-based model of the innate immune response (the Innate Immune Response ABM or IIRABM) as a surrogate, proxy system. Simulation experiments with the IIRABM provide an explanation as to why single/limited cytokine perturbations at a single time point, or a small number of time points, are unlikely to significantly improve the mortality rate of sepsis. We then use genetic algorithms (GA) to explore and characterize multi-targeted control strategies for the random dynamical immune system that guide it from a persistent, non-recovering inflammatory state (functionally equivalent to the clinical states of systemic inflammatory response syndrome (SIRS) or sepsis) to a state of health. We train the GA on a single parameter set with multiple stochastic replicates, and show that while the calculated results show good generalizability, more advanced strategies are needed to achieve the goal of adaptive personalized medicine.
This work evaluating the extent of interventions needed to control a simplified surrogate model of sepsis provides insight into the scope of the clinical challenge, and can serve as a guide on the path towards true "precision control" of sepsis.
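The GA-based search for multi-targeted control strategies can be sketched in miniature: a genome encodes a per-timestep multiplier for each mediator, fitness averages the final inflammation level over stochastic replicates, and selection plus mutation evolves the schedule. All model details below (the three-mediator toy dynamics, population sizes, rates) are hypothetical stand-ins for illustration, far simpler than the IIRABM and the actual GA configuration.

```python
import random

N_MEDIATORS, N_STEPS = 3, 10  # toy dimensions; the IIRABM tracks many more

def simulate_inflammation(controls, seed):
    """Toy stochastic surrogate: inflammation self-amplifies unless the
    control schedule (one multiplier per mediator per step) damps it."""
    rng = random.Random(seed)
    level = 1.0
    for t in range(N_STEPS):
        drive = sum(controls[t]) / N_MEDIATORS          # mean intervention
        level = level * (1.3 - 0.5 * drive) + rng.uniform(-0.05, 0.05)
        level = max(level, 0.0)
    return level  # ~0 corresponds to health; large values to persistent inflammation

def fitness(genome):
    # Average over stochastic replicates, mirroring training on replicates.
    return sum(simulate_inflammation(genome, s) for s in range(5)) / 5

def random_genome(rng):
    return [[rng.uniform(0, 2) for _ in range(N_MEDIATORS)] for _ in range(N_STEPS)]

def mutate(genome, rng):
    g = [row[:] for row in genome]
    t, m = rng.randrange(N_STEPS), rng.randrange(N_MEDIATORS)
    g[t][m] = min(max(g[t][m] + rng.gauss(0, 0.3), 0), 2)
    return g

rng = random.Random(1)
pop = [random_genome(rng) for _ in range(30)]
for gen in range(40):
    pop.sort(key=fitness)                 # rank by mean final inflammation
    survivors = pop[:10]                  # truncation selection
    pop = survivors + [mutate(rng.choice(survivors), rng) for _ in range(20)]

print("best final inflammation:", round(fitness(pop[0]), 3))
```

The multi-timepoint, multi-mediator genome is the essential feature: it is what lets the GA discover interventions that a single-cytokine, single-timepoint perturbation cannot represent.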
Digital twins, customized simulation models pioneered in industry, are beginning to be deployed in medicine and healthcare, with some major successes, for instance in cardiovascular diagnostics and in insulin pump control. Personalized computational models are also assisting in applications ranging from drug development to treatment optimization. More advanced medical digital twins will be essential to making precision medicine a reality. Because the immune system plays an important role in such a wide range of diseases and health conditions, from fighting pathogens to autoimmune disorders, digital twins of the immune system will have an especially high impact. However, their development presents major challenges, stemming from the inherent complexity of the immune system and the difficulty of measuring many aspects of a patient's immune state in vivo. This perspective outlines a roadmap for meeting these challenges and building a prototype of an immune digital twin. It is structure...
Damage to an epithelial surface disrupts its mechanical and immunologic barrier function and exposes underlying tissues to a potentially hostile external environment. Epithelial restitution occurs quickly to reestablish the barrier and comprises a major part of the immediate host response to injured tissue. Pathways involving transforming growth factor beta and activation of epidermal growth factor receptor are both of critical importance, although cross-pathway interactions have been poorly characterized. Agent-based modeling has been shown to be useful in integrating disparate bodies of knowledge and showing the dynamic consequences of pathway structures and cellular population behavior, and is used herein to create an in silico analog of an in vitro scratch assay. The In Vitro Scratch Agent-Based Model consists of agents representing individual epithelial cells in a simulated extracellular matrix. Agents sense signals from the damaged environment and produce effector molecules, l...
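The structure of a scratch-assay ABM can be sketched very simply: epithelial-cell agents occupy a grid, a denuded band represents the scratch, and wound-edge agents that sense adjacent empty space spread into it until the barrier is restored. This is a hypothetical minimal skeleton for illustration only; the actual In Vitro Scratch Agent-Based Model includes mediator production, growth-factor signaling, and cross-pathway interactions that are omitted here.

```python
import random

WIDTH, HEIGHT = 30, 20
rng = random.Random(42)

# 1 = epithelial cell agent, 0 = denuded scratch region down the middle
grid = [[0 if 12 <= x < 18 else 1 for x in range(WIDTH)] for _ in range(HEIGHT)]

def empty_neighbors(x, y):
    """Von Neumann neighborhood sites that are currently denuded."""
    out = []
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < WIDTH and 0 <= ny < HEIGHT and grid[ny][nx] == 0:
            out.append((nx, ny))
    return out

def step(p_restitution=0.3):
    """One tick: each wound-edge agent senses the damaged (empty) environment
    and, with some probability, spreads into an adjacent empty site --
    a toy stand-in for growth-factor-driven restitution."""
    moves = []
    for y in range(HEIGHT):
        for x in range(WIDTH):
            if grid[y][x] == 1:
                empties = empty_neighbors(x, y)
                if empties and rng.random() < p_restitution:
                    moves.append(rng.choice(empties))
    for nx, ny in moves:
        grid[ny][nx] = 1  # spreading/proliferation fills the denuded site

def wound_area():
    return sum(row.count(0) for row in grid)

t = 0
while wound_area() > 0 and t < 200:
    step()
    t += 1
print("scratch closed after", t, "ticks")
```

Even this skeleton reproduces the qualitative readout of the in vitro assay (time to closure), which is the quantity an in silico analog is calibrated against.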
There has been a great deal of interest in the concept, development and implementation of medical digital twins. This interest has led to wide-ranging perceptions of what constitutes a medical digital twin. This Perspectives article will provide 1) a description of fundamental features of industrial digital twins, the source of the digital twin concept, 2) aspects of biology that challenge the implementation of medical digital twins, 3) a schematic program of how a specific medical digital twin project could be defined, and 4) an example description within that schematic program for a specific type of medical digital twin intended for drug discovery, testing and repurposing, the Drug Development Digital Twin (DDDT).
The use of synthetic data is recognized as a crucial step in the development of neural network-based Artificial Intelligence (AI) systems. While the methods for generating synthetic data for AI applications in other domains have a role in certain biomedical AI systems, primarily those related to image processing, there is a critical gap in the generation of time series data for AI tasks where it is necessary to know how the system works. This is most pronounced in the ability to generate synthetic multi-dimensional molecular time series data (subsequently referred to as synthetic mediator trajectories, or SMTs); this is the type of data that underpins research into biomarkers and mediator signatures for forecasting various diseases and is an essential component of the drug development pipeline. We argue that statistical and data-centric machine learning (ML) means of generating this type of synthetic data are insufficient due to a combination of factors: perpetual data sparsity due to the Curse of Dimensionality, the inapplicability of the Central Limit Theorem in terms of making assumptions about the statistical distributions of this type of data, and the inability to use ab initio simulations due to the state of perpetual epistemic incompleteness in cellular/molecular biology. Instead, we present a rationale for using complex multi-scale mechanism-based simulation models, constructed and operated on to account for perpetual epistemic incompleteness and the need to provide maximal expansiveness in concordance with the Maximal Entropy Principle. These procedures provide for the generation of SMTs that minimize the known shortcomings associated with neural network AI systems, namely overfitting and lack of generalizability.
The generation of synthetic data that accounts for the identified factors of multi-dimensional time series data is an essential capability for the development of mediator-biomarker-based AI forecasting systems, and for the development and optimization of therapeutic control.
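The generation strategy described above (mechanism-based simulation sampled broadly across its parameter ranges rather than fitted to point values) can be sketched with a toy two-mediator model. Everything here is hypothetical: the dynamics, rate constants, and ranges are invented stand-ins, drastically simpler than the multi-scale models the argument calls for, and serve only to show the shape of the procedure.

```python
import random

def simulate_mediators(k_pro, k_anti, k_cross, steps=60, dt=0.1, seed=0):
    """Toy two-mediator mechanistic model: a pro-inflammatory mediator
    drives an anti-inflammatory one, which in turn suppresses it.
    (Hypothetical stand-in for a multi-scale mechanism-based simulation.)"""
    rng = random.Random(seed)
    pro, anti = 1.0, 0.1
    traj = []
    for _ in range(steps):
        d_pro = k_pro * pro - k_cross * anti * pro   # growth minus suppression
        d_anti = k_anti * pro - 0.1 * anti           # induction minus decay
        pro = max(pro + dt * d_pro + rng.gauss(0, 0.01), 0.0)
        anti = max(anti + dt * d_anti + rng.gauss(0, 0.01), 0.0)
        traj.append((pro, anti))
    return traj

# Sample rate constants over wide, uninformative ranges rather than fitted
# point values: maximal expansiveness under epistemic incompleteness.
rng = random.Random(7)
dataset = []
for i in range(100):
    params = (rng.uniform(0.1, 1.0), rng.uniform(0.1, 1.0), rng.uniform(0.1, 1.0))
    dataset.append(simulate_mediators(*params, seed=i))

print(len(dataset), "synthetic trajectories of length", len(dataset[0]))
```

Each trajectory is a multi-dimensional mediator time series; sweeping parameters broadly is what gives the synthetic dataset coverage that a distribution fitted to sparse real data cannot provide.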
Papers by Gary An