Papers by Russell Schwartz
Biophysical Journal, 2013
Applied Bioinformatics, Feb 1, 2004
While the shared consensus genetic sequence of our species contains a great deal of information about our common biology, there is also much to be learned from the subtle genetic variations across our species. These variations are believed to be generally of little or no direct functional significance and predominantly reflect the chance accumulation of small genetic changes since our emergence as a species. Therefore, they carry little useful information when observed in a single individual. When tallied across a whole population though, these chance mutations can teach us a great deal about our evolutionary history and the patterns of inheritance in particular individuals. In particular, frequently observed patterns of single nucleotide polymorphisms (SNPs) in a population can identify segments of chromosome that have been passed down largely intact through long stretches of our evolution. Finding these frequently conserved chromosomal segments, or haplotypes, and developing methods to identify haplotype patterns in particular individuals, will in turn help us to identify those particular segments that carry genetic factors influencing risk for many common human diseases. To make the best use of this data, we will need to develop new models for the encoding of information in genome variations--the "language of genetic variation"--and new algorithms for fitting datasets to those models. This article surveys past work by the author and colleagues on this problem, utilising computational methods for locating frequent patterns in haploid sequence data, and "parsing" sequences so as to optimally explain them given the knowledge of the general population structure. The author's recent work in this area has been compiled into a set of computational tools available at http://www-2.cs.cmu.edu/~russells/software/hapmotif.html.
Jcb, 1999
A computer model of protein aggregation competing with productive folding is proposed. Our model adapts techniques from lattice Monte Carlo studies of protein folding to the problem of aggregation. However, rather than starting with a single string of residues, we allow independently folding strings to undergo collisions and consider their interactions in different orientations. We first present some background on the nature and significance of protein aggregation and the use of lattice Monte Carlo simulations in understanding other aspects of protein folding. The results of a series of simulation experiments involving simple versions of the model illustrate the importance of considering aggregation in simulations of protein folding and provide some preliminary understanding of the characteristics of the model. Finally, we discuss the value of the model in general and of our particular design decisions and experiments. We conclude that computer simulation techniques developed to study protein folding can provide insights into protein aggregation, and that a better understanding of aggregation may in turn provide new insights into and constraints on the more general protein folding problem.
Cell Biochemistry and Biophysics, Feb 1, 2007
The cellular environment creates numerous obstacles to efficient chemistry, as molecular components must navigate through a complex, densely crowded, heterogeneous, and constantly changing landscape in order to function at the appropriate times and places. Such obstacles are especially challenging to self-organizing or self-assembling molecular systems, which often need to build large structures in confined environments and typically have high-order kinetics that should make them exquisitely sensitive to concentration gradients, stochastic noise, and other non-ideal reaction conditions. Yet cells nonetheless manage to maintain a finely tuned network of countless molecular assemblies constantly forming and dissolving with a robustness and efficiency generally beyond what human engineers currently can achieve under even carefully controlled conditions. Significant advances in high-throughput biochemistry and genetics have made it possible to identify many of the components and interactions of this network, but its scale and complexity will likely make it impossible to understand at a global, systems level without predictive computational models. It is thus necessary to develop a clear understanding of how the reality of cellular biochemistry differs from the ideal models classically assumed by simulation approaches and how simulation methods can be adapted to accurately reflect biochemistry in the cell, particularly for the self-organizing systems that are most sensitive to these factors. In this review, we present approaches that have been undertaken from the modeling perspective to address various ways in which self-organization in the cell differs from idealized models.
BMC Genomics, 2016
Despite the enormous medical impact of cancers and intensive study of their biology, detailed characterization of tumor growth and development remains elusive. This difficulty occurs in large part because of enormous heterogeneity in the molecular mechanisms of cancer progression, both tumor-to-tumor and cell-to-cell in single tumors. Advances in genomic technologies, especially at the single-cell level, are improving the situation, but these approaches are held back by limitations of the biotechnologies for gathering genomic data from heterogeneous cell populations and the computational methods for making sense of those data. One popular way to gain the advantages of whole-genome methods without the cost of single-cell genomics has been the use of computational deconvolution (unmixing) methods to reconstruct clonal heterogeneity from bulk genomic data. These methods, too, are limited by the difficulty of inferring genomic profiles of rare or subtly varying clonal subpopulations from bulk data, a problem that can be computationally reduced to that of reconstructing the geometry of point clouds of tumor samples in a genome space. Here, we present a new method to improve that reconstruction by better identifying subspaces corresponding to tumors produced from mixtures of distinct combinations of clonal subpopulations. We develop a nonparametric clustering method based on medoidshift clustering for identifying subgroups of tumors expected to correspond to distinct trajectories of evolutionary progression. We show on synthetic and real tumor copy-number data that this new method substantially improves our ability to resolve discrete tumor subgroups, a key step in the process of accurately deconvolving tumor genomic data and inferring clonal heterogeneity from bulk data.
Biophysical Journal, 2014
The Golgi apparatus plays an important role in processing and sorting proteins and lipids. Golgi compartments constantly exchange material with each other and with other cellular components, allowing them to maintain and reform distinct identities despite dramatic ...
Mechanics & chemistry of biosystems: MCB
Understanding the connection between mechanics and cell structure requires the exploration of the key molecular constituents responsible for cell shape and motility. One of these molecular bridges is the cytoskeleton, which is involved with intracellular organization and mechanotransduction. In order to examine the structure in cells, we have developed a computational technique that is able to probe the self-assembly of actin filaments through a lattice-based Monte Carlo method. We have modeled the polymerization of these filaments based upon the interactions of globular actin through a probabilistic model encompassing both inert and active proteins. The results show a similar response to classic ordinary differential equations at low molecular concentrations, but a bi-phasic divergence at realistic concentrations for living mammalian cells. Further, by introducing localized mobility parameters, we are able to simulate molecular gradients that are observed in nonhomogeneous protein di...
Synthetic biology studies have unraveled insights into gene circuit dynamics, but often neglected the impact of non-transcriptional factors, including circuit-host interactions, epigenetic factors, and intracellular microenvironments. An obvious and important non-transcriptional factor is molecular crowding, which refers to the volume exclusion effect resulting from the packing of high-density macromolecules into constrained intracellular spaces. To date, it remains unclear if and how molecular crowding can impact dynamics of gene circuits. We addressed this question by using a multi-scale synthetic biology approach integrating single-molecule experiments, cell-free expression systems, and artificial cells. Specifically, we found that the size of crowding molecules uniquely affects both the diffusion of T7 RNA polymerase and its binding to a T7 RNAP promoter. Based on the single-molecule results, we further showed that the impact of molecular crowding on gene circuits could be enhan...
Biophysical Journal, 2015
Biophysical Journal, 2015
Proceedings of the seventh annual international conference on Computational molecular biology - RECOMB '03, 2003
It is widely hoped that variation in the human genome will provide a means of predicting risk of a variety of complex, chronic diseases. A major stumbling block to the successful identification of association between human DNA polymorphisms (SNPs) and variability in risk of complex diseases is the enormous number of SNPs in the human genome. The large number of SNPs results in unacceptably high costs for exhaustive genotyping, and so there is a broad effort to determine ways to select SNPs so as to maximize the informativeness of a subset. In this paper we contrast two methods for reducing the complexity of SNP variation: haplotype tagging, i.e. typing a subset of SNPs to identify segments of the genome that appear to be nearly unrecombined (haplotype blocks), and a new block-free model that we develop in this report. We present a statistic for comparing haplotype blocks and show that while the concept of haplotype blocks is reasonably robust there is substantial variability among block partitions. We develop a measure for selecting an informative subset of SNPs in a block-free model. We show that the general version of this problem is NP-hard and give efficient algorithms for two important special cases of this problem.
Lecture Notes in Computer Science, 2004
There has been considerable recent interest in the use of haplotype structure to aid in the design and analysis of case-control association studies searching for genetic predictors of human disease. The use of haplotype structure is based on the premise that genetic variations that are physically close on the genome will often be predictive of one another due to their frequent descent intact through recent evolution. Understanding these correlations between sites should make it possible to minimize the amount of redundant information gathered through assays or examined in association tests, improving the power and reducing the cost of the studies. In this work, we evaluate the potential value of haplotype structure in this context by applying it to two key subproblems: inferring hidden polymorphic sites in partial haploid sequences and choosing subsets of variants that optimally capture the information content of the full set of sequences. We develop methods for these approaches based on a prior method we developed for predicting piece-wise shared ancestry of haploid sequences. We apply these methods to a case study of two genetic regions with very different levels of sequence diversity. We conclude that haplotype correlations do have considerable potential for these problems, but that the degree to which they are useful will be strongly dependent on the population sizes available and the specifics of the genetic regions examined.
Lecture Notes in Computer Science, 2002
Abstract. Recent evidence for a blocky haplotype structure to the human genome and for its importance to disease inference studies has created a pressing need for tools that identify patterns of past recombination in sequences of samples of human genes and gene regions. We ...
Briefings in Bioinformatics, 2002
Proceedings IEEE 2001 Symposium on Parallel and Large-Data Visualization and Graphics (Cat. No.01EX520), 2001
In recent years, an explosion in data has been profoundly changing the field of biology and creating the need for new areas of expertise, particularly in the handling of data. One vital area that has so far received insufficient attention is how to communicate the large quantities of diverse and complex information that is being generated. Celera has encountered a number of visualization problems in the course of developing tools for bioinformatics research, applying them to our data generation efforts, and making that data available to our customers. This paper presents several examples from Celera's experience. In the area of genomics, challenging visualization problems have come up in assembling genomes, studying variations between individuals, and comparing different genomes to one another. The emerging area of proteomics has created new visualization challenges in interpreting protein expression data, studying protein regulatory networks, and examining protein structure. These examples illustrate how the field of bioinformatics is posing new challenges concerning the communication of data that are often very different from those that have heretofore dominated scientific computing. Addressing the level of detail, the degree of complexity, and the interdisciplinary barriers that characterize bioinformatic problems can be expected to be a sizable but rewarding task for the field of scientific visualization.