We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can be trained by maximum likelihood without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.
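To make the continuous-depth idea concrete, here is a minimal sketch of a forward pass, assuming a toy two-layer tanh network for the dynamics and SciPy's general-purpose solver in place of the paper's solvers; the sizes and tolerances are illustrative, not the paper's settings.

```python
# Minimal continuous-depth block: the "layer stack" is an ODE
# dh/dt = f(h, t; theta), integrated by a black-box solver.
# The two-layer tanh network f and its dimensions are illustrative
# choices, not the architecture from the paper.
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
D = 4                                # hidden-state dimension
W1, b1 = rng.normal(0, 0.5, (D, D)), np.zeros(D)
W2, b2 = rng.normal(0, 0.5, (D, D)), np.zeros(D)

def f(t, h):
    """Parameterized derivative of the hidden state."""
    return W2 @ np.tanh(W1 @ h + b1) + b2

h0 = rng.normal(size=D)              # "input layer" state
# Integrating from t=0 to t=1 replaces a discrete stack of layers;
# rtol/atol explicitly trade numerical precision for speed.
sol = solve_ivp(f, (0.0, 1.0), h0, rtol=1e-5, atol=1e-7)
h1 = sol.y[:, -1]                    # "output layer" state
print(h1)
```

Training in the paper backpropagates through the solver with the adjoint method rather than by differentiating the solver's internals; the sketch covers only the forward computation.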
The cornerstone of neural algorithmic reasoning is the ability to solve algorithmic tasks, especially in a way that generalises out of distribution. While recent years have seen a surge in methodological improvements in this area, they have mostly focused on building specialist models. Specialist models are capable of learning to neurally execute either only one algorithm or a collection of algorithms with an identical control-flow backbone. Here, instead, we focus on constructing a generalist neural algorithmic learner: a single graph neural network processor capable of learning to execute a wide range of algorithms, such as sorting, searching, dynamic programming, path-finding and geometry. We leverage the CLRS benchmark to empirically show that, much like recent successes in the domain of perception, generalist algorithmic learners can be built by "incorporating" knowledge. That is, it is possible to effectively learn algorithms in a multi-task manner, so long as we can learn to execute them well in a single-task regime. Motivated by this, we present a series of improvements to the input representation, training regime and processor architecture over CLRS, improving average single-task performance by over 20% relative to prior art. We then conduct a thorough ablation of multi-task learners leveraging these improvements. Our results demonstrate a generalist learner that effectively incorporates knowledge captured by specialist models.
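As a rough illustration of the generalist setup, the sketch below wires task-specific encoders and decoders around one shared message-passing core; the layer shapes, sum aggregation and task names are hypothetical stand-ins, not the architecture evaluated on CLRS.

```python
# Schematic multi-task setup with one shared processor: per-algorithm
# encoders/decoders map task-specific inputs into a common latent
# space processed by a single GNN core. All names and sizes are
# illustrative; the paper's actual architecture differs.
import numpy as np

H = 16                                       # shared latent width

class SharedProcessor:
    def __init__(self, rng):
        self.W_msg = rng.normal(0, 0.3, (H, H))
        self.W_upd = rng.normal(0, 0.3, (2 * H, H))

    def step(self, nodes, adj):
        # nodes: (N, H), adj: (N, N) binary adjacency
        msgs = adj @ (nodes @ self.W_msg)            # aggregate neighbours
        return np.maximum(np.hstack([nodes, msgs]) @ self.W_upd, 0.0)

rng = np.random.default_rng(0)
processor = SharedProcessor(rng)             # one core for all tasks
encoders = {t: rng.normal(0, 0.3, (8, H)) for t in ("sorting", "bfs")}
decoders = {t: rng.normal(0, 0.3, (H, 1)) for t in ("sorting", "bfs")}

def run(task, feats, adj, steps=3):
    h = feats @ encoders[task]               # task-specific encode
    for _ in range(steps):
        h = processor.step(h, adj)           # shared processing
    return h @ decoders[task]                # task-specific decode

adj = (rng.random((5, 5)) < 0.4).astype(float)
print(run("bfs", rng.normal(size=(5, 8)), adj).shape)   # (5, 1)
```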
In this paper we show that simple noisy regularisation can be an effective way to address GNN oversmoothing. First we argue that regularisers addressing oversmoothing should both penalise node latent similarity and encourage meaningful node representations. From this observation we derive "Noisy Nodes", a simple technique in which we corrupt the input graph with noise and add a noise-correcting node-level loss. The diverse node-level loss encourages latent node diversity, and the denoising objective encourages graph manifold learning. Our regulariser applies well-studied methods in simple, straightforward ways which allow even generic architectures to overcome oversmoothing and achieve state-of-the-art results on quantum chemistry tasks, and significantly improve results on Open Graph Benchmark (OGB) datasets. Our results suggest Noisy Nodes can serve as a complementary building block in the GNN toolkit.
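A minimal sketch of the recipe, assuming a one-layer toy GNN, Gaussian input noise with scale sigma, and an arbitrary 0.1 loss weight (none of these are the paper's settings):

```python
# Noisy Nodes sketch: corrupt node inputs with Gaussian noise and add
# a node-level denoising loss alongside the primary objective. The
# GNN, noise scale and loss weight are placeholders.
import numpy as np

def gnn(x, adj, W):
    # one relu message-passing layer, purely illustrative
    return np.maximum((adj @ x) @ W, 0.0)

rng = np.random.default_rng(0)
N, D = 6, 4
x = rng.normal(size=(N, D))                   # clean node features
adj = (rng.random((N, N)) < 0.5).astype(float)
W = rng.normal(0, 0.5, (D, D))
W_dec = rng.normal(0, 0.5, (D, D))            # denoising head

sigma = 0.1
x_noisy = x + sigma * rng.normal(size=x.shape)   # corrupt the input
h = gnn(x_noisy, adj, W)

primary_loss = h.mean() ** 2                  # stand-in for the task loss
denoise_loss = ((h @ W_dec - x) ** 2).mean()  # predict the clean input back
loss = primary_loss + 0.1 * denoise_loss      # 0.1 is an arbitrary weight
print(loss)
```

The per-node denoising target keeps node latents from collapsing onto each other, which is the oversmoothing failure mode the regulariser targets.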
In the area of physical simulations, nearly all neural-network-based methods directly predict future states from the input states. However, many traditional simulation engines instead model the constraints of the system and select the state which satisfies them. Here we present a framework for constraint-based learned simulation, where a scalar constraint function is implemented as a graph neural network, and future predictions are computed by solving the optimization problem defined by the learned constraint. Our model achieves accuracy comparable to or better than top learned simulators on a variety of challenging physical domains, and offers several unique advantages. We can improve the simulation accuracy on a larger system by applying more solver iterations at test time. We can also incorporate novel hand-designed constraints at test time and simulate new dynamics which were not present in the training data. Our constraint-based framework shows how key techniques from traditional simulation and numerical methods can be leveraged as inductive biases in machine learning simulators.
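The sketch below illustrates the inference pattern under stated assumptions: a hand-written quadratic constraint stands in for the learned GNN constraint, and plain gradient descent stands in for the paper's solver. The key property, that more solver iterations refine the prediction, carries over.

```python
# Constraint-based inference sketch: instead of predicting the next
# state directly, minimize a scalar constraint c(prev, proposal) over
# the proposed next state. The damped-inertia constraint below is a
# toy stand-in for a learned GNN constraint.
import numpy as np

def constraint(prev_prev, prev, prop):
    target = prev + 0.9 * (prev - prev_prev)   # toy damped inertia
    return 0.5 * np.sum((prop - target) ** 2)

def constraint_grad(prev_prev, prev, prop):
    target = prev + 0.9 * (prev - prev_prev)
    return prop - target

prev_prev = np.array([0.0, 0.0])
prev = np.array([0.1, 0.05])

prop = prev.copy()                     # initialize proposal at prev state
for _ in range(50):                    # more iterations -> better accuracy
    prop -= 0.2 * constraint_grad(prev_prev, prev, prop)
print(prop, constraint(prev_prev, prev, prop))
```

Hand-designed test-time constraints fit naturally in this pattern: add their gradients to the learned constraint's gradient inside the same solver loop.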
Pacific Symposium on Biocomputing, 2020
Mutational signatures are patterns of mutation types, many of which are linked to known mutagenic processes. Signature activity represents the proportion of mutations a signature generates. In cancer, cells may gain advantageous phenotypes through mutation accumulation, causing rapid growth of that subpopulation within the tumour. The presence of many subclones can make cancers harder to treat and have other clinical implications. Reconstructing changes in signature activities can give insight into the evolution of cells within a tumour. Recently, we introduced a new method, TrackSig, to detect changes in signature activities across time from a single bulk tumour sample. By design, TrackSig is unable to identify mutation populations with different frequencies but little to no difference in signature activity. Here we present an extension of this method, TrackSigFreq, which enables trajectory reconstruction based on both the observed density of mutation frequencies and changes in mutational signature activity.
Bayesian optimization is a principled approach for globally optimizing expensive, black-box functions by using a surrogate model of the objective. However, each step of Bayesian optimization involves solving an inner optimization problem, in which we maximize an acquisition function derived from the surrogate model to decide where to query next. This inner problem can be challenging to solve, particularly in discrete spaces, such as protein sequences or molecular graphs, where gradient-based optimization cannot be used. Our key insight is that we can train a parameterized policy to generate candidates that maximize the acquisition function. This is faster than standard parameter-free search methods, since we can amortize the cost of learning the policy across rounds of Bayesian optimization. We therefore call this Amortized Bayesian Optimization. On several challenging discrete design problems, we show this method generally outperforms other methods at optimizing the inner acquisition function.
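A toy sketch of the amortization idea: an independent-Bernoulli policy over bit strings is trained with REINFORCE to propose high-acquisition candidates. The acquisition function, policy class and learning rate here are placeholder choices, not the paper's.

```python
# Train a parameterized policy to maximize a (fixed, toy) acquisition
# function over binary strings. A Hamming-distance reward stands in
# for e.g. expected improvement under a surrogate model.
import numpy as np

rng = np.random.default_rng(0)
L = 10                                    # sequence length
target = rng.integers(0, 2, L)            # hidden optimum of the toy acquisition

def acquisition(x):
    return -np.sum(x != target)           # maximized at x == target

logits = np.zeros(L)                      # policy parameters
baseline = 0.0                            # running reward baseline

for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-logits))
    x = (rng.random(L) < p).astype(float)     # sample a candidate
    r = acquisition(x)
    baseline = 0.9 * baseline + 0.1 * r
    # REINFORCE: grad of log p(x) is (x - p) for independent Bernoullis
    logits += 0.1 * (r - baseline) * (x - p)

print((logits > 0).astype(int), target)   # policy drifts toward the optimum
```

Across rounds of Bayesian optimization, the trained policy is reused and fine-tuned rather than restarting an inner search from scratch, which is where the amortization saving comes from.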
Graph Neural Networks (GNNs) perform learned message passing over an input graph, but conventional wisdom says performing more than a handful of steps makes training difficult and does not yield improved performance. Here we show the contrary. We train a deep GNN with up to 100 message passing steps and achieve several state-of-the-art results on two challenging molecular property prediction benchmarks, Open Catalyst 2020 IS2RE and QM9. Our approach depends crucially on a novel but simple regularisation method, which we call “Noisy Nodes”, in which we corrupt the input graph with noise and add an auxiliary node autoencoder loss if the task is graph property prediction. Our results show this regularisation method allows the model to improve monotonically in performance with increased message passing steps. Our work opens new opportunities for reaping the benefits of deep neural networks in the space of graph and other structured prediction problems.
We present a new method, TrackSig, to estimate evolutionary trajectories in cancer. Our method represents cancer evolution in terms of mutational signatures: multinomial distributions over mutation types. TrackSig infers an approximate order in which mutations accumulated in the cancer genome, and then fits the signatures to the mutation time series. We assess TrackSig's reconstruction accuracy using simulations. We find 1.9% median discrepancy between estimated mixtures and ground truth. The size of the signature change is consistent in 87% of cases and the direction of change is consistent in 95% of cases. The code is available at https://github.com/YuliaRubanova/TrackSig.
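The core fitting step can be sketched as estimating mixture weights of multinomial distributions by EM; the two toy signatures over four mutation types below are invented for illustration, whereas real analyses use, e.g., 96 trinucleotide mutation types and a reference signature catalogue.

```python
# Estimate signature activities (mixture weights over multinomial
# mutation-type distributions) with a few EM iterations. The two toy
# signatures over 4 mutation types are made up for illustration.
import numpy as np

sigs = np.array([[0.7, 0.1, 0.1, 0.1],     # signature A
                 [0.1, 0.1, 0.4, 0.4]])    # signature B
counts = np.array([30, 5, 40, 25])         # observed mutation-type counts

act = np.full(2, 0.5)                      # initial activities
for _ in range(100):
    # E-step: responsibility of each signature for each mutation type
    joint = act[:, None] * sigs            # shape (signatures, types)
    resp = joint / joint.sum(axis=0, keepdims=True)
    # M-step: re-estimate activities from expected mutation counts
    act = (resp * counts).sum(axis=1)
    act /= act.sum()
print(act)                                 # estimated signature activities
```

TrackSig applies this kind of fit per time bin along the inferred mutation ordering, so activity changes between bins trace the evolutionary trajectory.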
Time series with non-uniform intervals occur in many applications, and are difficult to model using standard recurrent neural networks (RNNs). We generalize RNNs to have continuous-time hidden dynamics defined by ordinary differential equations (ODEs), a model we call ODE-RNNs. Furthermore, we use ODE-RNNs to replace the recognition network of the recently proposed Latent ODE model. Both ODE-RNNs and Latent ODEs can naturally handle arbitrary time gaps between observations, and can explicitly model the probability of observation times using Poisson processes. We show experimentally that these ODE-based models outperform their RNN-based counterparts on irregularly-sampled data.
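A minimal sketch of the ODE-RNN update rule, assuming Euler integration, toy tanh dynamics and a vanilla RNN cell (the paper uses adaptive solvers and a gated update):

```python
# ODE-RNN sketch: between observations the hidden state evolves under
# a learned ODE; at each observation a standard RNN cell updates it.
# Euler steps and the tanh dynamics are illustrative simplifications.
import numpy as np

rng = np.random.default_rng(0)
H, X = 4, 2
W_f = rng.normal(0, 0.3, (H, H))            # ODE dynamics parameters
W_h = rng.normal(0, 0.3, (H, H))            # RNN cell parameters
W_x = rng.normal(0, 0.3, (X, H))

def evolve(h, dt, n_steps=10):
    """Integrate dh/dt = tanh(h W_f) over a gap of length dt (Euler)."""
    step = dt / n_steps
    for _ in range(n_steps):
        h = h + step * np.tanh(h @ W_f)
    return h

def observe(h, x):
    """Standard RNN update at an observation."""
    return np.tanh(h @ W_h + x @ W_x)

# Irregularly sampled series: (time, value) pairs with non-uniform gaps
ts = [0.0, 0.3, 1.4, 1.5]
xs = rng.normal(size=(4, X))

h, t_prev = np.zeros(H), 0.0
for t, x in zip(ts, xs):
    h = evolve(h, t - t_prev)               # continuous-time gap
    h = observe(h, x)                       # discrete observation
    t_prev = t
print(h)
```

Because the between-observation dynamics are a function of elapsed time, arbitrary gaps need no imputation or time discretization.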
We have characterised intra-tumour heterogeneity (ITH) across 2,778 whole genome sequences of tumours in the International Cancer Genome Consortium Pan-Cancer Analysis of Whole Genomes project, representing 36 distinct cancer types. We applied six copy number alteration (CNA) callers and 11 subclonal reconstruction algorithms and developed approaches to integrate the results into robust, high-confidence CNA calls and subclonal architectures. The analysis reveals widespread ITH. We find at least one subclone in nearly all (96.7%) tumours with sufficient sequencing depth. Analysis using dN/dS ratios yields clear signs of positive selection in clonal and subclonal mutations and we find subclonal driver mutations in known driver genes. However, only 24% of subclones contain a driver mutation in a known driver gene, suggesting that a multitude of undiscovered late drivers exist and that tumours continue to undergo selection after tumourigenesis, at least until diagnosis. Consistent with other studies, ...
Cancer develops through a continuous process of somatic evolution. Whole genome sequencing provides a snapshot of the tumor genome at the point of sampling; however, the data can contain information that permits the reconstruction of a tumor's evolutionary past. Here, we apply such life history analyses on an unprecedented scale, to a set of 2,658 tumors spanning 39 cancer types. We estimated the timing of large chromosomal gains during tumor evolution by comparing the rates of doubled to non-doubled point mutations within gained regions. Although we find that such events typically occur in the second half of clonal evolution, we also observe distinctive and early chromosomal gains in some cancer types, such as gains of chromosomes 7, 19 and 20 in glioblastoma, and isochromosome 17q in medulloblastoma. By integrating these results with the qualitative timing of individual driver mutations, we obtained an overall ranking, from early to late, of frequent somatic events per cancer type.
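As a worked example of the doubled-versus-non-doubled comparison, consider a region where one of two copies is gained at molecular time t: pre-gain mutations on the duplicated allele end up at multiplicity 2, all other mutations at multiplicity 1, giving the single-gain estimator t = 3·m2 / (m1 + 2·m2). This is a common formulation from the tumour-timing literature; the paper's exact estimator may differ in detail.

```python
# Timing a single-copy gain (2 -> 3 copies) from mutation
# multiplicities. With per-copy mutation rate r and gain at molecular
# time t: m2 = t*r (duplicated pre-gain allele) and
# m1 = t*r + 3*(1 - t)*r, so t = 3*m2 / (m1 + 2*m2).
# Standard-literature estimator, not necessarily the paper's exact one.
def gain_time(m1, m2):
    """m1: multiplicity-1 mutation count, m2: multiplicity-2 count."""
    return 3.0 * m2 / (m1 + 2.0 * m2)

print(gain_time(m1=80, m2=10))   # 0.3: gain in the first third of clonal time
```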
We present a new method, TrackSig, to estimate the evolutionary trajectories of signatures of different somatic mutational processes from DNA sequencing data from a single, bulk tumour sample. TrackSig uses probability distributions over mutation types, called mutational signatures, to represent different mutational processes and detects the changes in signature activity using an optimal segmentation algorithm that groups somatic mutations based on their estimated cancer cellular fraction (CCF) and their mutation type (e.g. CAG->CTG). We use two different simulation frameworks to assess both TrackSig's reconstruction accuracy and its robustness to violations of its assumptions, as well as to compare it to a baseline approach. We find 2-4% median error in reconstructing the signature activities on simulations with varying difficulty with one to three subclones at an average depth of 30x. The size and the direction of the activity change is consistent in 83% and 95% of cases, respectively.
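The optimal-segmentation step can be sketched as a penalised dynamic program over ordered mutations; below, a squared-deviation cost on a 1-D series stands in for TrackSig's signature-likelihood cost over binned mutations, and the penalty value is arbitrary.

```python
# Dynamic-programming segmentation: find changepoints minimizing
# per-segment cost plus a constant penalty per segment. A
# squared-deviation cost is a toy stand-in for a signature-likelihood
# cost; the idea (exact optimal segmentation) is the same.
import numpy as np

def segment(y, penalty=1.0):
    n = len(y)
    def cost(i, j):                  # cost of segment y[i:j]
        return np.sum((y[i:j] - y[i:j].mean()) ** 2)
    best = np.zeros(n + 1)           # best[j]: optimal cost of y[:j]
    back = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        cands = [best[i] + cost(i, j) + penalty for i in range(j)]
        back[j] = int(np.argmin(cands))
        best[j] = cands[back[j]]
    cuts, j = [], n                  # recover changepoints
    while j > 0:
        cuts.append(back[j])
        j = back[j]
    return sorted(c for c in cuts if c > 0)

y = np.concatenate([np.zeros(20), np.ones(20) * 3.0])
print(segment(y, penalty=2.0))       # -> [20]
```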
We report the integrative analysis of more than 2,600 whole cancer genomes and their matching normal tissues across 39 distinct tumour types. By studying whole genomes we have been able to catalogue non-coding cancer driver events, study patterns of structural variation, infer tumour evolution, probe the interactions among variants in the germline genome, the tumour genome and the transcriptome, and derive an understanding of how coding and non-coding variations together contribute to driving individual patients' tumours. This work represents the most comprehensive look at cancer whole genomes to date. NOTE TO READERS: This is an incomplete draft of the marker paper for the Pan-Cancer Analysis of Whole Genomes Project, and is intended to provide background information for a series of in-depth papers that will be posted to bioRxiv during the summer of 2017.