Background: Photosynthetic organisms convert atmospheric carbon dioxide into numerous metabolites along the pathways to make new biomass. Aquatic photosynthetic organisms, which fix almost half of global inorganic carbon, have great potential: as a carbon dioxide fixation method, for the economical production of chemicals, or as a source of lipids and starch that can be converted to biofuels. To harness this potential through metabolic engineering and to maximize production, a more thorough understanding of photosynthetic metabolism must first be achieved. A model algal species, C. reinhardtii, was chosen and its metabolic network reconstructed. Intracellular fluxes were then calculated using flux balance analysis (FBA). Results: The metabolic network of primary metabolism for a green alga, C. reinhardtii, was reconstructed using genomic and biochemical information. The reconstructed network accounts for the intracellular localization of enzymes to three compartments and includes 484 metabolic reactions and 458 intracellular metabolites. Based on BLAST searches, one newly annotated enzyme (fructose-1,6-bisphosphatase) was added to the Chlamydomonas reinhardtii database. FBA was used to predict metabolic fluxes under three growth conditions: autotrophic, heterotrophic, and mixotrophic growth. Biomass yields ranged from 28.9 g per mole C for autotrophic growth to 15 g per mole C for heterotrophic growth. Conclusion: The flux balance analysis model of central and intermediary metabolism in C. reinhardtii is the first such model for algae and the first model to include three metabolically active compartments. In addition to providing estimates of intracellular fluxes, metabolic reconstruction and modelling efforts also provide a comprehensive method for annotating genome databases. As a result of our reconstruction, one new enzyme was annotated in the database and several others were found to be missing, implying new pathways or non-conserved enzymes. The use of FBA to estimate intracellular fluxes also provides flux values that can serve as a starting point for rational engineering of C. reinhardtii. From these initial estimates, it is clear that aerobic heterotrophic growth on acetate has a low yield on carbon, whereas mixotrophically and autotrophically grown cells are significantly more carbon efficient.
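FBA reduces to a linear program: maximize a biomass objective subject to steady-state mass balance (S v = 0) and flux bounds. The sketch below illustrates that formulation on a four-reaction toy network using scipy; the network, bounds, and reaction names are invented for illustration and are not the published C. reinhardtii reconstruction.

```python
import numpy as np
from scipy.optimize import linprog

# Toy FBA problem (illustration only; not the published C. reinhardtii network).
# Reactions: R1: -> A  (uptake);  R2: A -> B;  R3: A -> C;  R4: B + C -> biomass
# Metabolite rows: A, B, C. Steady state requires S v = 0.
S = np.array([
    [1, -1, -1,  0],   # A
    [0,  1,  0, -1],   # B
    [0,  0,  1, -1],   # C
])
bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]  # uptake capped at 10

# Maximize flux through the biomass reaction (R4) by minimizing its negative.
c = np.array([0, 0, 0, -1])
res = linprog(c, A_eq=S, b_eq=np.zeros(3), bounds=bounds, method="highs")
print("biomass flux:", res.x[3])   # expect 5: the uptake splits equally into B and C
```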
Complex DNA mixtures are challenging to interpret and require computational tools that aid in that interpretation. Recently, several computational methods that estimate the number of contributors (NOC) to a sample have been developed. Unlike analogous tools that interpret profiles and report LRs, NOC tools vary widely in their operational principle: some are Bayesian and others are machine learning tools. In addition, NOC tools may return a single estimate of n or a distribution on n. This vast array of constructs, coupled with a gap in standardized methods by which to validate NOC systems, warrants an exploration of the measures by which differing NOC systems might be tested for operational use. In the current paper, we use two exemplar NOC systems: a probabilistic system named NOCIt, which renders an a posteriori probability (APP) distribution on the number of contributors given an electropherogram, and an artificial neural network (ANN). NOCIt is a continuous Bayesian inference system incorporating models of peak height, degradation, differential degradation, forward and reverse stutter, noise, and allelic drop-out while considering allele frequencies in a reference population. The ANN is also a continuous method, taking all the same features (barring degradation) into account. Unlike its Bayesian counterpart, it demands substantively more data to parameterize, requiring synthetic data. We explore each system's performance by conducting tests on 214 PROVEDIt mixtures where the limit of detection was 1 copy of DNA. We found that after a lengthy training period of approximately 24 h, the ANN's evaluation process was very fast and perfectly repeatable. In contrast, NOCIt took only a few minutes to train but took tens of minutes to complete each sample and was less repeatable. However, it rendered a probability distribution that was more sensitive and specific, affording a reasonable method by which to report all reasonable n that explain the evidence for a given sample. Whatever the method, by acknowledging the inherent differences between NOC systems, we demonstrate that validation constructs will necessarily be guided by the needs of the forensic domain and depend upon whether the laboratory seeks to assign a single n or a range of n.
The interpretation of DNA evidence may rely upon the assumption that the forensic short tandem repeat (STR) profile is composed of multiple genotypes, or partial genotypes, originating from n contributors. In cases where the number of contributors (NOC) is in dispute, it may be justifiable to compute likelihood ratios that utilize different NOC parameters in the numerator and denominator, or to present different likelihoods separately. Therefore, in this work, we evaluate the impact of allele dropout on estimating the NOC for simulated mixtures with up to six contributors in the presence or absence of a major contributor. These simulations demonstrate that in the presence of dropout, or with the application of an analytical threshold (AT), estimating the NOC using counting methods was unreliable for mixtures containing one or more minor contributors present at low levels. The number of misidentifications was only slightly reduced when we expanded the number of STR loci from 16 to 21. In ...
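For context, the simplest counting method is the maximum allele count rule: a single contributor can show at most two alleles per locus, so the minimum NOC is the ceiling of half the largest allele count observed at any locus. A minimal sketch with hypothetical locus data is below; drop-out removes alleles from this count, which is exactly why the rule under-estimates the NOC for low-level minor contributors.

```python
from math import ceil

# Hypothetical observed alleles per locus, for illustration only.
profile = {
    "D3S1358": ["14", "15", "16", "17"],
    "vWA":     ["16", "18"],
    "FGA":     ["20", "22", "23", "24", "25"],
}

def min_noc(profile):
    # Each contributor carries at most two alleles per locus.
    return max(ceil(len(set(alleles)) / 2) for alleles in profile.values())

print(min_noc(profile))  # 3, driven by the five alleles observed at FGA
```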
Advances in Experimental Medicine and Biology, 2018
With the demand for renewable energy growing, hydrogen (H2) is becoming an attractive energy carrier. Developing H2 production technologies with near-net-zero carbon emissions is a major challenge for the "H2 economy." Certain cyanobacteria inherently possess enzymes (nitrogenases and bidirectional hydrogenases) that are capable of H2 evolution using sunlight, making them ideal cell factories for photocatalytic conversion of water to H2. With the advances in synthetic biology, cyanobacteria are currently being developed as a "plug and play" chassis to produce H2. This chapter describes the metabolic pathways involved and the theoretical limits to cyanobacterial H2 production and summarizes the metabolic engineering technologies pursued.
Boosting cellular growth rates while redirecting metabolism to make desired products are the preeminent goals of gene engineering of photoautotrophs, yet so far these goals have largely gone unachieved owing to a lack of understanding of the functional pathways and their choke points. Here we apply a 13C mass isotopic method (INST-MFA) to quantify instantaneous fluxes of metabolites during photoautotrophic growth. INST-MFA determines the globally most accurate set of absolute fluxes for each metabolite from a finite set of measured 13C-isotopomer fluxes by minimizing the sum of squared residuals between experimental and predicted mass isotopomers. We show that the widely observed shift in biomass composition in cyanobacteria, demonstrated here with Synechococcus sp. PCC 7002, favoring glycogen synthesis during nitrogen starvation is caused by (1) increased flux through a bottleneck step in gluconeogenesis (3PG → GAP/DHAP), and (2) flux overflow through a previously unrecognized hybrid glucon...
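The core numerical idea, choosing fluxes that minimize squared residuals between predicted and measured isotope labeling, can be illustrated on a one-pool toy problem: a pool of size P fed at flux v by a fully labeled substrate enriches as L(t) = 1 - exp(-(v/P) t), and v is recovered by least squares. This is only a sketch of the fitting principle with made-up measurements, not the INST-MFA network model.

```python
import numpy as np
from scipy.optimize import least_squares

# Labeling enrichment of a single well-mixed pool of size P fed at flux v by a
# fully labeled substrate: dL/dt = (v / P) * (1 - L), so L(t) = 1 - exp(-(v/P) t).
def enrichment(t, v, P):
    return 1.0 - np.exp(-(v / P) * t)

# Hypothetical time-course measurements (consistent with v ≈ 0.2), for illustration only.
t_obs = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
L_obs = np.array([0.00, 0.18, 0.33, 0.55, 0.80])
P = 1.0  # assumed pool size (arbitrary units)

def residuals(params):
    (v,) = params
    return enrichment(t_obs, v, P) - L_obs

fit = least_squares(residuals, x0=[0.1], bounds=(0.0, np.inf))
print(f"estimated flux v = {fit.x[0]:.3f} (pool units per time unit)")
```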
2012 IEEE Information Theory Workshop (ITW), 2012
We take a theoretical approach to the problem of identification, or "reverse engineering", of gene regulatory networks. Through a mathematical model of a gene regulatory network, we examine fundamental questions on the limits and achievability of network identification. We apply simplifying assumptions to construct an acyclic binary model, and we assume that the identification strategy is restricted to perturbing the network by gene expression assignments, followed by expression profile measurements at steady state. Further, we assume the presence of side information, which we call sensitivity, that is likely to be present in actual gene networks. We show that with sensitivity side information and realistic topology assumptions we can identify the topology of acyclic binary networks using O(n) assignments and measurements, n being the number of genes in the network. Our work establishes a theoretical framework for examining an important technological problem where a number of significant questions remain open.
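The flavor of assignment-and-measure identification can be seen on a tiny acyclic Boolean network: clamp one gene per experiment, read the steady state, and record which genes respond. The sketch below is an invented toy, not the paper's algorithm; note that perturbing gene 0 or gene 2 leaves gene 3 unchanged in this background because the AND gate is not sensitized, which is the kind of ambiguity the sensitivity side information is meant to resolve.

```python
# Toy acyclic Boolean network (invented): gene 2 = g0 OR g1; gene 3 = g1 AND g2.
RULES = {2: lambda s: s[0] or s[1], 3: lambda s: s[1] and s[2]}
INPUTS = [0, 1]   # genes with no regulators; their values are set externally
N = 4

def steady_state(assignment):
    """Evaluate the acyclic network with the genes in `assignment` clamped."""
    state = [False] * N
    for gene, value in assignment.items():
        state[gene] = value
    for gene in sorted(RULES):            # topological order for this toy network
        if gene not in assignment:
            state[gene] = RULES[gene](state)
    return tuple(state)

# One expression assignment per gene: flip that gene against a fixed reference
# background and record which other genes change at steady state.
reference = steady_state({g: False for g in INPUTS})
for g in range(N):
    assignment = {i: reference[i] for i in INPUTS}
    assignment[g] = not reference[g]
    perturbed = steady_state(assignment)
    responders = [j for j in range(N) if j != g and perturbed[j] != reference[j]]
    print(f"perturbing gene {g}: responders {responders}")
```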
Proceedings of the National Academy of Sciences of the United States of America, Jan 13, 2015
Diatoms are unicellular algae that accumulate significant amounts of triacylglycerols as storage lipids when their growth is limited by nutrients. Using biochemical, physiological, bioinformatics, and reverse genetic approaches, we analyzed how the flux of carbon into lipids is influenced by nitrogen stress in a model diatom, Phaeodactylum tricornutum. Our results reveal that the accumulation of lipids is a consequence of remodeling of intermediate metabolism, especially reactions in the tricarboxylic acid and urea cycles. Specifically, approximately one-half of the cellular proteins are cannibalized; the nitrogen is scavenged by the urea and glutamine synthetase/glutamine 2-oxoglutarate aminotransferase pathways and redirected to the de novo synthesis of nitrogen assimilation machinery, while, simultaneously, the photobiological flux of carbon and reductants is used to synthesize lipids. To further examine how nitrogen stress triggers the remodeling process, we knocked down th...
Network Coding With Wireless Applications. Outline: What is Network Coding? Main Theorem of Network Coding; Wireless Applications; Conclusion. Motivation: Network coding is a theory for communicating information across networks more efficiently. Although it has been around since the year 2000, there still isn't a single deployed product that uses it.
ATP-dependent proteases are processive, meaning that they degrade full-length proteins into small peptide products without releasing large intermediates along the reaction pathway. In the case of the bacterial ATP-dependent protease ClpAP, ATP hydrolysis by the ClpA component has been proposed to be required for processive proteolysis of full-length protein substrates. We present here data showing that in the absence of the ATPase subunit ClpA, the protease subunit ClpP can degrade full-length protein substrates processively, albeit at a greatly reduced rate. Moreover, the size distribution of peptide products from a ClpP-catalyzed digest is remarkably similar to the size distribution of products from a ClpAP-catalyzed digest. The ClpAP- and ClpP-generated peptide product size distributions are fitted well by a sum of multiple underlying Gaussian peaks with means at integral multiples of ∼900 Da (7-8 amino acids). Our results are consistent with a mechanism in which ClpP controls product sizes by alternating between translocation in steps of 7-8 (±2-3) amino acid residues and proteolysis. On the structural and molecular level, the step size may be controlled by the spacing between the ClpP active sites, and processivity may be achieved by coupling peptide bond hydrolysis to the binding and release of substrate and products in the protease chamber. Energy-dependent proteases are large molecular machines responsible for the intracellular degradation of misfolded and short-lived regulatory proteins. Members of this family include the eukaryotic 26S proteasome and the bacterial enzymes ClpAP, ClpXP, HslVU, Lon, and FtsH. In these systems, AAA+ (ATPases associated with a variety of cellular activities) components bind, unfold, and translocate protein substrates into barrel-shaped compartmental proteases where the substrates are hydrolyzed and released as peptide products (1-4). While much attention has been devoted to understanding how protein substrates are recognized, unfolded, and translocated by the ATPase component of energy-dependent proteases (5, 6), much remains to be learned about how proteolysis proceeds after the substrates reach the proteolytic chamber. In particular, the way that the ATPase and protease components may work together to maintain processivity is incompletely understood (7). One way to study the proteolytic mechanism is to examine the size distribution of peptide products generated by these proteases. Different proteolytic mechanisms make distinct predictions about the shape of the product size distribution and its sensitivity to perturbations in enzymatic function (8). Size-exclusion chromatography allows the product size distribution to be estimated (8-10), with the caveat that the chromatogram reflects both the true size distribution and chromatographic factors that broaden the observed peaks (11). Previous work has shown that the size range of products generated by energy-dependent proteases spans a 10-fold range (from ∼3 to ∼30 residues). Interestingly, all ATP-dependent proteases for which product size has been characterized [including the 20S archaeal (9) and eukaryotic proteasomes (10, 12), the 26S proteasome (10) (which includes the protease subunits in the 20S proteasome, accessory ATPase subunits, and other regulatory subunits), ClpAP (8), HslVU (13), and Lon (14, 15)] share this size range of products.
Furthermore, both ClpAP and the proteasome have been shown to generate approximately lognormal peptide product size distributions with a peak between 6 and 9 amino acids and a tail skewed toward longer products (8-10). The similarity in the range of product sizes and in the shape of product size distributions suggests that all ATP-dependent proteases share a common proteolytic mechanism. However, this mechanism remains a matter of debate. The distance between protease active sites corresponds to 7-8 amino acids (16-18), matching the peak observed at this product length in both proteasome- and ClpAP-generated size distributions. This observation led to the hypothesis that ...
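The "sum of Gaussian peaks at integral multiples of ∼900 Da" fit described above can be sketched as a constrained curve fit in which every peak mean shares one spacing parameter. The data below are synthetic and the peak count, widths, and amplitudes are assumptions; this illustrates only the fitting approach, not the published analysis.

```python
import numpy as np
from scipy.optimize import curve_fit

# Sum of Gaussian peaks whose means are integer multiples of a single spacing.
def gaussian_comb(m, spacing, width, *amps):
    return sum(a * np.exp(-0.5 * ((m - (k + 1) * spacing) / width) ** 2)
               for k, a in enumerate(amps))

# Synthetic "chromatogram" built with a 900 Da spacing plus noise.
mass = np.linspace(300, 3600, 300)
true = gaussian_comb(mass, 900.0, 180.0, 1.0, 0.7, 0.4, 0.2)
rng = np.random.default_rng(0)
observed = true + rng.normal(0, 0.02, mass.size)

popt, _ = curve_fit(gaussian_comb, mass, observed,
                    p0=[850.0, 150.0, 1.0, 1.0, 1.0, 1.0])
print(f"fitted peak spacing: {popt[0]:.0f} Da")   # expect ~900 Da
```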
MOST (metabolic optimization and simulation tool) is a software package that implements GDBB (genetic design through branch and bound) in an intuitive, user-friendly interface with Excel-like editing functionality; it also implements FBA (flux balance analysis) and supports Systems Biology Markup Language and comma-separated values files. GDBB is currently the fastest algorithm for finding gene knockouts predicted by FBA to increase production of desired products, but until the release of MOST, GDBB was only available through a command-line interface, which is difficult to use for those without programming knowledge. MOST is distributed for free under the GNU General Public License. The software and full documentation are available at http://most.ccib.rutgers.edu/.
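To make the design problem concrete, the sketch below runs a naive exhaustive single-knockout scan on an invented toy network: for each deleted reaction, FBA maximizes biomass and the product flux at that optimum is recorded. This is only the brute-force baseline that motivates branch-and-bound approaches; it is not the GDBB algorithm or MOST's interface.

```python
import numpy as np
from scipy.optimize import linprog

# Toy network (invented): R1: -> A;  R2: A -> B;  R3: A -> 0.9 B + C;
#                         R4: B -> biomass;  R5: C -> product (secreted)
S = np.array([
    [1, -1, -1.0,  0,  0],   # A
    [0,  1,  0.9, -1,  0],   # B
    [0,  0,  1.0,  0, -1],   # C
])
base_bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000), (0, 1000)]
BIOMASS, PRODUCT = 3, 4

def fba(bounds):
    c = np.zeros(5); c[BIOMASS] = -1.0           # maximize biomass flux
    return linprog(c, A_eq=S, b_eq=np.zeros(3), bounds=bounds, method="highs")

for ko in range(5):
    bounds = list(base_bounds)
    bounds[ko] = (0, 0)                          # delete reaction `ko`
    res = fba(bounds)
    if res.success:
        print(f"knock out R{ko + 1}: biomass {res.x[BIOMASS]:.1f}, "
              f"product {res.x[PRODUCT]:.1f}")
# Deleting R2 couples product secretion to growth (biomass 9.0, product 10.0).
```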
Forensic DNA signal is notoriously challenging to assess, requiring computational tools to support its interpretation. Over-expressions of stutter, allele drop-out, allele drop-in, degradation, differential degradation, and the like make forensic DNA profiles too complicated to evaluate by manual methods. In response, computational tools that make point estimates of the Number of Contributors (NOC) to a sample have been developed, as have Bayesian methods that evaluate an A Posteriori Probability (APP) distribution on the NOC. In cases where an overly narrow NOC range is assumed, the downstream strength of evidence may be incomplete insofar as the evidence is evaluated with an inadequate set of propositions. In the current paper, we extend previous work on NOCIt, a Bayesian method that determines an APP on the NOC given an electropherogram, by reporting on an implementation where the user can add assumed contributors. NOCIt is a continuous system that incorporates models of peak height (including degradation and differential degradation), forward and reverse stutter, noise, and allelic drop-out, while being cognizant of allele frequencies in a reference population. When conditioned on a known contributor, we found that the mode of the APP distribution can shift to one greater than in the circumstance where no known contributor is assumed, and that this occurred most often when the assumed contributor was the minor constituent of the mixture. Building on a result of Slooten and Caliebe (FSI:G, 2018) that, under suitable assumptions, establishes that the NOC can be treated as a nuisance variable in the computation of a likelihood ratio between the prosecution and defense hypotheses, we show that this computation must not only use coincident models but also coincident contextual information. The results reported here, therefore, illustrate the power of modern probabilistic systems to assess full weights-of-evidence and to provide information on reasonable NOC ranges across multiple contexts.
With the establishment of advanced technology facilities for high-throughput plant phenotyping, the problem of estimating the biomass of individual plants from their two-dimensional images is becoming increasingly important. The approach predominantly cited in the literature is to estimate the biomass of a plant as a linear function of the projected shoot area of plants in the images. However, the estimation error from this model, which is solely a function of projected shoot area, is large, prohibiting accurate estimation of the biomass of plants, particularly for salt-stressed plants. In this paper, we propose a method based on plant specific weight for improving the accuracy of the linear model and reducing the estimation bias (the difference between the actual shoot dry weight and the value of the shoot dry weight estimated with a predictive model). For the proposed method in this study, we modeled the plant shoot dry weight as a function of plant area and plant age. The data used...
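A minimal sketch of the two modelling choices contrasted above is given below, using made-up measurements: a baseline that fits dry weight as proportional to projected area, and a variant in which the specific weight (weight per unit projected area) is allowed to vary with plant age. The numbers, units, and functional form are assumptions for illustration, not the paper's fitted model.

```python
import numpy as np

# Hypothetical phenotyping measurements, for illustration only.
area = np.array([120.0, 150.0, 200.0, 260.0, 300.0])   # projected shoot area (cm^2)
age = np.array([14, 14, 21, 21, 28])                    # days after planting
weight = np.array([0.55, 0.70, 1.15, 1.45, 1.95])       # shoot dry weight (g)

# Baseline: dry weight proportional to projected area (single constant).
a, *_ = np.linalg.lstsq(area.reshape(-1, 1), weight, rcond=None)
baseline = area * a[0]

# Variant: specific weight varies linearly with age, weight ≈ (b0 + b1 * age) * area.
X = np.column_stack([area, area * age])
b, *_ = np.linalg.lstsq(X, weight, rcond=None)
proposed = X @ b

print("baseline mean bias:", np.mean(weight - baseline))
print("age-aware mean bias:", np.mean(weight - proposed))
```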
1 Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA; 2 Biomedical Forensic Sciences Program, Boston University School of Medicine, Boston, MA 02118, USA; 3 Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; 4 School of Mathematics and Statistics, University of South Australia, Mawson Lakes, SA 5095, Australia. *Corresponding author. Tel.: +1-856-225-6094.
Forensic DNA signal is notoriously challenging to interpret and requires computational tools that support its interpretation. While data from high-copy, low-contributor samples result in electropherogram signal that is readily interpreted by probabilistic methods, electropherogram signal from forensic stains is often garnered from low-copy, high-contributor-number samples and is frequently obfuscated by allele sharing, allele drop-out, stutter, and noise. Since forensic DNA profiles are too complicated to quantitatively assess by manual methods, continuous, probabilistic frameworks that draw inferences on the Number of Contributors (NOC) and compute the Likelihood Ratio (LR) given the prosecution's and defense's hypotheses have been developed. In the current paper, we validate a new version of the NOCIt inference platform that determines an A Posteriori Probability (APP) distribution on the number of contributors given an electropherogram. NOCIt is a continuous inference system that incorporates models of peak height (including degradation and differential degradation), forward and reverse stutter, noise, and allelic drop-out while taking into account allele frequencies in a reference population. We established the algorithm's performance by conducting tests on samples representative of the types often encountered in practice. In total, we tested NOCIt's performance on 815 degraded, UV-damaged, inhibited, differentially degraded, or uncompromised DNA mixture samples containing up to 5 contributors. We found that the model makes accurate, repeatable, and reliable inferences about the NOC and significantly outperforms methods that rely on signal filtering. By leveraging recent theoretical results of Slooten and Caliebe (FSI:G, 2018) that, under suitable assumptions, establish that the NOC can be treated as a nuisance variable, we demonstrated that when NOCIt's APP is used in conjunction with a downstream likelihood ratio (LR) inference system that employs the same probabilistic model, a full evaluation across multiple contributor numbers is rendered. This work, therefore, illustrates the power of modern probabilistic systems to report holistic and interpretable weights-of-evidence to the trier-of-fact without assigning a specified number of contributors or filtering signal.
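As a rough illustration of the nuisance-variable treatment referenced above, the likelihood ratio can be written as a ratio of likelihoods each marginalized over the number of contributors, LR = Σ_n Pr(n) P(E | Hp, n) / Σ_n Pr(n) P(E | Hd, n), assuming a prior on n shared by both hypotheses. The sketch below uses hypothetical per-n likelihoods and priors; it shows only the form of the calculation, not NOCIt's implementation.

```python
# Minimal sketch of treating the number of contributors n as a nuisance variable.
# All values are hypothetical, for illustration only.
prior_n = {2: 0.2, 3: 0.6, 4: 0.2}               # Pr(n), shared by Hp and Hd
lik_hp  = {2: 4.0e-12, 3: 9.0e-12, 4: 6.0e-12}   # P(evidence | Hp, n)
lik_hd  = {2: 1.0e-15, 3: 3.0e-14, 4: 7.0e-14}   # P(evidence | Hd, n)

numerator   = sum(prior_n[n] * lik_hp[n] for n in prior_n)
denominator = sum(prior_n[n] * lik_hd[n] for n in prior_n)
print(f"LR marginalized over n: {numerator / denominator:.3g}")
```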