The chloroplast, a photosynthetic organelle found in all plant and algal species, originates from an ancient event in which a cyanobacterium was engulfed by a larger eukaryote. Thus, modern chloroplasts still harbor a bacterial-like genome and carry out all stages of gene expression, including mRNA translation by a 70S ribosome. However, the Shine-Dalgarno model, which predominantly regulates translation initiation by base-pairing between the ribosomal RNA and the mRNA in model bacterial genera, was reported to have ambiguous effects on chloroplast gene expression. Here we show that while the Shine-Dalgarno motif is clearly conserved in proteobacterial mRNAs, its general absence from chloroplast mRNAs is observed in cyanobacteria as well, promoting the idea that the evolutionary process of reducing the centrality of the Shine-Dalgarno mechanism began well before plastid endosymbiosis. As plastid ribosomal RNA anti-Shine-Dalgarno elements are highly similar to their bacterial counterp...
The transcript is populated with numerous overlapping codes that regulate all steps of gene expression. Deciphering these codes is very challenging due to the large number of variables involved, the non-modular nature of the codes, biases and limitations in current experimental approaches, our limited knowledge of gene expression regulation across the tree of life, and other factors. In recent years, it has been shown that computational modeling and algorithms can significantly accelerate the discovery of novel gene expression codes. Here, we briefly summarize the latest developments and different approaches in the field.
Background: mRNA can form local secondary structure within the protein-coding sequence, and the strength of this structure is thought to influence gene expression regulation. Previous studies suggest that secondary structure strength may be maintained under selection, but the details of this phenomenon are not well understood. Results: We perform a comprehensive study of the selection on local mRNA folding strengths considering variation between species across the tree of life. We show for the first time that local folding strength selection tends to follow a conserved characteristic profile in most phyla, with selection for weak folding at the two ends of the coding region and for strong folding elsewhere in the coding sequence, with an additional peak of selection for strong folding located downstream of the start codon. The strength of this pattern varies between species and organism groups, and we highlight contradicting cases. To better understand the underlying evolutionary proc...
Motivation: Understanding how viruses co-evolve with their hosts and adopt various genome-level strategies to ensure their fitness may have essential implications for unveiling the secrets of viral evolution and for developing new vaccines and therapeutic approaches. Here, based on a novel genomic analysis of 2625 different viruses and 439 corresponding host organisms, we provide evidence of universal evolutionary selection for high-dimensional 'silent' patterns of information hidden in the redundancy of the viral genetic code. Results: Our model suggests that long substrings of nucleotides in the coding regions of viruses from all classes often also repeat in the corresponding viral hosts from all domains of life. Selection for these substrings cannot be explained only by such phenomena as codon usage bias, horizontal gene transfer and the encoded proteins. Genes encoding structural proteins responsible for building the core of the viral particles were found to include more host-repeating substrings, and these substrings tend to appear in the middle parts of the viral coding regions. In addition, in human viruses these substrings tend to be enriched with motifs related to transcription factors and RNA binding proteins. The host-repeating substrings are possibly related to the evolutionary pressure on the viruses to effectively interact with the host's intracellular factors and to efficiently escape from the host's immune system.
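To make the notion of a "host-repeating substring" concrete, the following is a minimal sketch, not the method used in the study. It assumes plain nucleotide strings and a naive search; the function name `longest_shared_substring_lengths` and the toy sequences are hypothetical, and a real analysis over thousands of genomes would need an indexed structure such as a suffix array.

```python
def longest_shared_substring_lengths(virus: str, host: str) -> list[int]:
    """For each start position in `virus`, return the length of the longest
    substring starting there that also occurs somewhere in `host`.

    Naive illustration only: each extension is a fresh substring search.
    """
    lengths = []
    for i in range(len(virus)):
        k = 0
        # Extend the candidate one nucleotide at a time. Once a prefix is
        # absent from the host, every longer extension is absent too.
        while i + k < len(virus) and virus[i:i + k + 1] in host:
            k += 1
        lengths.append(k)
    return lengths


# Toy sequences, invented for illustration.
profile = longest_shared_substring_lengths("ATGCC", "GGATGA")
```

Positions with unusually large values in such a profile would be the candidates for host-repeating substrings, once compared against a suitable null model (which the abstract indicates must control for codon usage bias and the encoded proteins).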
The ribosome flow model with input and output (RFMIO) is a deterministic dynamical system that has been used to study the flow of ribosomes during mRNA translation. The input of the RFMIO controls its initiation rate and the output represents the ribosome exit rate (and thus the protein production rate) at the 3′ end of the mRNA molecule. The RFMIO and its variants encapsulate important properties that are relevant to modeling ribosome flow such as the possible evolution of "traffic jams" and nonhomogeneous elongation rates along the mRNA molecule, and can also be used for studying additional intracellular processes such as transcription, transport, and more. Here we consider networks of interconnected RFMIOs as a fundamental tool for modeling, analyzing and re-engineering the complex mechanisms of protein production. In these networks, the output of each RFMIO may be divided, using connection weights, between several inputs of other RFMIOs. We show that under quite general feedback connections the network has two important properties: (1) it admits a unique steady-state and every trajectory converges to this steady-state; and (2) the problem of how to determine the connection weights so that the network steady-state output is maximized is a convex optimization problem. These mathematical properties make these networks highly suitable as models of various phenomena: property (1) means that the behavior is predictable and ordered, and property (2) means that determining the optimal weights is numerically tractable even for large-scale networks. For the specific case of a feed-forward network of RFMIOs we prove an additional useful property, namely, that there exists a spectral representation for the network steady-state, and thus it can be determined without any numerical simulations of the dynamics.
We describe the implications of these results to several fundamental biological phenomena and biotechnological objectives.
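As an illustration of a single RFMIO, here is a minimal sketch that integrates the dynamics to steady state. The abstract does not state the equations, so this assumes the form commonly given for the ribosome flow model: site densities x_i in [0, 1], transition rates λ_0, ..., λ_n, an input u that scales the initiation rate λ_0, and output y = λ_n x_n. The function names, the forward-Euler scheme, and the step size are choices made for this example, not taken from the paper.

```python
def rfmio_step(x, rates, u, dt):
    """One forward-Euler step of a single RFMIO (assumed standard form).

    x     : densities of the n sites, each in [0, 1]
    rates : n + 1 positive rates; rates[0] is initiation, rates[n] is exit
    u     : input scaling the initiation rate
    """
    n = len(x)
    # flow[i] is the ribosome flow into site i (0-based); flow[n] is the exit flow.
    flow = [0.0] * (n + 1)
    flow[0] = rates[0] * u * (1.0 - x[0])
    for i in range(1, n):
        # Flow grows with the density behind and with the free space ahead.
        flow[i] = rates[i] * x[i - 1] * (1.0 - x[i])
    flow[n] = rates[n] * x[n - 1]
    return [x[i] + dt * (flow[i] - flow[i + 1]) for i in range(n)]


def rfmio_steady_state(rates, u=1.0, dt=0.01, steps=200_000, tol=1e-12):
    """Integrate from an empty mRNA until the densities stop changing.

    Returns (densities, output), where output = rates[n] * x_n.
    """
    n = len(rates) - 1
    x = [0.0] * n
    for _ in range(steps):
        new_x = rfmio_step(x, rates, u, dt)
        converged = max(abs(a - b) for a, b in zip(new_x, x)) < tol
        x = new_x
        if converged:
            break
    return x, rates[n] * x[-1]


# Three sites with homogeneous rates (illustrative values).
densities, output = rfmio_steady_state([1.0, 1.0, 1.0, 1.0])
```

In a network of the kind the abstract describes, the `output` of one such unit would be split by connection weights and fed as the input `u` of others; the spectral representation mentioned for feed-forward networks would make this numerical integration unnecessary.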
Horizontal gene transfer (HGT) has been described as a major driving force for the innovation and evolution of prokaryotic genomes. Previously, multiple research endeavors were undertaken to decipher HGT in different bacterial lineages. The genus Mycobacterium houses some of the most deadly human pathogens; however, the impact of HGT in Mycobacterium has never been addressed in a systematic way. Previous initiatives to explore the genomic imprints of HGTs in Mycobacterium focused on a few selected species, specifically among the members of the Mycobacterium tuberculosis complex. Considering the recent availability of a large number of genomes, the current study was initiated to decipher the probable events of HGT among 109 completely sequenced Mycobacterium species. Our comprehensive phylogenetic analysis with more than 9,000 families of Mycobacterium proteins allowed us to list several instances of gene transfers spread across the Mycobacterium phylogeny. Moreover, by examining the topology of gene phylogenies, we identified the species most likely to donate and receive these genes and provided a detailed overview of the putative functions these genes may be involved in. Our study suggests that horizontally acquired foreign genes have played an enduring role in the evolution of Mycobacterium genomes and have contributed to their metabolic versatility and pathogenicity.
A significant fraction of genes in all living species is considered to have been acquired from genealogically distant species [1-7]. This mode of gene exchange between reproductively isolated species, commonly known as horizontal gene transfer (HGT) or lateral gene transfer, has been described as a major evolutionary force in several prokaryotic lineages [6,8-10]. Previous studies have implicated horizontally transferred genes in various important traits including novel metabolic pathways [11-13], oxygenic photosynthesis [14], antibiotic resistance [15], pathogenesis [16], microbial translation efficiency [17] and various other features [1-4,6-10]. Moreover, foreign genes were shown to assist microbes in colonizing new niches and in withstanding environmental changes [1-7,18]. Considering these implications for bacterial genome evolution, this study traces the probable HGT events among all currently available completely sequenced Mycobacterium genomes. The genus Mycobacterium comprises more than 160 species, of which about 15 deadly pathogens are commonly encountered in humans and other animals [19]. Among the pathogenic Mycobacterium species, M. tuberculosis alone was estimated to have infected one-third of the human population, causing more than 2 million annual deaths globally [20]. Pathogenic strains were suggested to originate from their free-living ancestors, driven by the independent or combined influence of genome reduction, gene duplication, gene rearrangement and HGT. Among these processes, HGT has been described as a major factor contributing to Mycobacterium pathogenesis. Although there are controversies regarding the intensity and extent of HGT among different Mycobacterium species, it is clear from recent studies that Mycobacterium genomes have undergone many episodes of intra- and interspecies HGT, acquiring genes from diverse origins including some members of eukaryotic families [21-25]. Along with substantial evidence of HGT in several Mycobacterium genomes, previous studies provided important insights regarding the function and evolutionary importance of the transferred genes [21-25]. For instance, foreign genes were shown to play important roles in the evolution of M. ulcerans (from M. marinum) [26] and M. avium subsp. paratuberculosis [27], and in shaping the pathogenic potential of M. abscessus [28] and M. tuberculosis [22,23]. However, the contribution of HGT to Mycobacterium genome evolution has never been investigated in a systematic way. Earlier studies mainly focused on finding the genomic imprints of HGT in a few selected genomes,
In many important cellular processes, including mRNA translation, gene transcription, phosphotransfer, and intracellular transport, biological "particles" move along some kind of "tracks". The motion of these particles can be modeled as a one-dimensional movement along an ordered sequence of sites. The biological particles (e.g., ribosomes or RNAPs) have volume and cannot overtake one another. In some cases, there is a preferred direction of movement along the track, but in general the movement may be bidirectional, and furthermore the particles may attach to or detach from various regions along the tracks. We derive a new deterministic mathematical model for such transport phenomena that may be interpreted as a dynamic mean-field approximation of an important model from statistical mechanics called the asymmetric simple exclusion process (ASEP) with Langmuir kinetics. Using tools from the theory of monotone dynamical systems and contraction theory we show that the m...
Viruses undergo extensive evolutionary selection for efficient replication, which affects, among other things, their codon distribution. In the current study, we aimed to understand how evolution shapes the codon distribution in early vs. late viral genes in terms of their expression during different stages of the viral replication cycle. To this end we analyzed 14 bacteriophages and 11 human viruses with available information about the expression phases of their genes. We demonstrated evidence of selection for distinct composition of synonymous codons in early and late viral genes in 50% of the analyzed bacteriophages. Among other explanations, this phenomenon may be related to the time-specific adaptation of the viral genes to the translation efficiency factors involved at different bacteriophage developmental stages. Specifically, we showed that the differences in codon composition in different temporal gene groups cannot be explained only by phylogenetic proximities between the analyzed bac...
DNA research : an international journal for rapid publication of reports on genes and genomes, Jan 17, 2017
Translation initiation in prokaryotes is affected by the mRNA folding and interaction of the ribosome binding site with the ribosomal RNA. The elongation rate is affected, among other factors, by the local biophysical properties of the coding regions, the decoding rates of different codons, and the interactions among ribosomes. Currently, there is no comprehensive biophysical model of translation that enables the prediction of mRNA translation dynamics based only on the transcript sequence and while considering all of these fundamental aspects of translation. In this study, we provide, for the first time, a computational simulative biophysical model of both translation initiation and elongation with all aspects mentioned above. We demonstrate our model performance and advantages focusing on Escherichia coli genes. We further show that the model enables prediction of translation rate, protein levels, and ribosome densities. In addition, our model enables quantifying the effect of sil...
The two major steps of gene expression are transcription and translation. While hundreds of studies regarding the effect of sequence features on the translation elongation process have been published, very few connect sequence features to the transcription elongation rate. We suggest, for the first time, that short transcript sub-sequences have a typical effect on RNA polymerase (RNAP) speed: we show that nucleotide 5-mers tend to have typical RNAP speed (or transcription rate), which is consistent along different parts of genes and among different groups of genes with high correlation. We also demonstrate that relative RNAP speed correlates with mRNA levels of endogenous and heterologous genes. Furthermore, we show that the estimated transcription and translation elongation rates correlate in endogenous genes. Finally, we demonstrate that our results are consistent for different high resolution experimental measurements of RNAP densities. These results suggest for the first time th...
The Plant journal : for cell and molecular biology, 2018
Various species of microalgae have recently emerged as promising host organisms for use in biotechnology industries due to their unique properties. These include efficient conversion of sunlight into organic compounds, the ability to grow in extreme conditions and the occurrence of numerous post-translational modification pathways. However, the inability to obtain high levels of nuclear heterologous gene expression in microalgae hinders the development of the entire field. To overcome this limitation, we analyzed different sequence optimization algorithms while studying the effect of transcript sequence features on heterologous expression in the model microalga Chlamydomonas reinhardtii, whose genome exhibits unusual features such as a high GC content. Based on the analysis of genomic data, we created eight unique sequences coding for a synthetic ferredoxin-hydrogenase enzyme, used here as a reporter gene. Following in silico design, these synthetic genes were transformed into the C...
Ribosome queuing is a fundamental phenomenon suggested to be related to topics such as genome evolution, synthetic biology, gene expression regulation, intracellular biophysics, and more. However, this phenomenon has not yet been quantified at a genomic level. Indeed, methodologies for studying translation (e.g. ribosome footprints) are usually calibrated to capture only single ribosome protected footprints (mRPFs) and are thus limited in their ability to detect ribosome queuing. On the other hand, most of the models in the field assume and analyze a certain level of queuing. Here we present an experimental-computational approach for studying ribosome queuing based on sequencing of RNA footprints extracted from pairs of ribosomes (dRPFs) using a modified ribosome profiling protocol. We combine our approach with traditional ribosome profiling to generate a detailed profile of ribosome traffic. The data are analyzed using computational models of translation dynamics. The approach...
Background: The regulation of all gene expression steps (e.g., transcription, RNA processing, translation, and mRNA degradation) is known to be primarily encoded in different parts of genes and in genomic regions in proximity to genes (e.g., promoters, untranslated regions, coding regions, introns, etc.). However, the entire gene expression codes and the genomic regions where they are encoded are still unknown. Results: Here, we employ an unsupervised approach to estimate the concentration of gene expression codes in different non-coding parts of genes and transcripts, such as introns and untranslated regions, focusing on three model organisms (Escherichia coli, Saccharomyces cerevisiae, and Schizosaccharomyces pombe). Our analyses support the conjecture that regions adjacent to the beginning and end of ORFs and the beginning and end of introns tend to include a higher concentration of gene expression information relative to regions further away. In addition, we report the exact regions with elevated concentration of gene expression codes. Furthermore, we demonstrate that the concentration of these codes in different genetic regions is correlated with the expression levels of the corresponding genes, and with splicing efficiency measurements and meiotic stage gene expression measurements in S. cerevisiae. Conclusion: We suggest that these discoveries improve our understanding of gene expression regulation and evolution; they can also be used for developing improved models of genome/gene evolution and for engineering gene expression in various biotechnological and synthetic biology applications.
It has recently been shown that the organization of genes in eukaryotic genomes, and specifically in 3D, is strongly related to gene expression and function and partially conserved between organisms. However, previous studies of 3D genomic organization analyzed each organism independently from others. Here, we propose an approach for unified interorganismal analysis of gene organization based on a network representation of Hi-C data. We define and detect four classes of spatially co-evolving orthologous modules (SCOMs), i.e. gene families that coevolve in their 3D organization, based on patterns of divergence and conservation of distances. We demonstrate our methodology on Hi-C data from Saccharomyces cerevisiae and Schizosaccharomyces pombe, and identify, among others, modules relating to RNA splicing machinery and chromatin silencing by small RNA which are central to S. pombe's lifestyle. Our results emphasize the importance of 3D genomic organization in eukaryotes and suggest that the evolutionary mechanisms that shape gene organization affect the organism fitness and phenotypes. The proposed algorithms can be utilized in future studies of genome evolution and comparative analysis of spatial genomic organization in different tissues, conditions and single cells.
Background: It is known that mRNA folding can affect and regulate various gene expression steps both in living organisms and in viruses. Previous studies have recognized functional RNA structures in the genome of the Dengue virus. However, these studies usually focused either on the viral untranslated regions or on very specific and limited regions at the beginning of the coding sequences, in a limited number of strains, and without considering evolutionary selection. Results: Here we performed the first large-scale comprehensive genomic analysis of selection for local mRNA folding strength in the Dengue virus coding sequences, based on a total of 1,670 genomes and 4 serotypes. Our analysis identified clusters of positions along the coding regions that may undergo a conserved evolutionary selection for strong or weak local folding maintained across different viral variants. Specifically, we recognized 53-66 clusters for strong folding and 49-73 clusters for weak folding (depending on serotype), composed of positions with a significant conservation of folding energy signals (related to partially overlapping local genomic regions). In addition, up to 7% of these positions were found to be conserved in more than 90% of the viral genomes. Although some of the identified positions undergo frequent synonymous / non-synonymous substitutions, the selection for folding strength therein is preserved, and thus cannot be trivially explained based on sequence conservation alone. Conclusions: The fact that many of the positions with significant folding-related signals are conserved among different Dengue variants suggests that a better understanding of the mRNA structures in the corresponding regions may promote the development of prospective anti-Dengue vaccination strategies. The comparative genomics approach described here can be employed in the future for detecting functional regions in other pathogens with very high mutation rates.
A large number of studies have demonstrated the importance of different HIV RNA structural elements at all stages of the viral life cycle. Nevertheless, the significance of many of these structures is unknown, and plausibly new regions containing RNA structure-mediated regulatory signals remain to be identified. An important characteristic of genomic regions carrying functionally significant secondary structures is their mutational robustness, that is, the extent to which a sequence remains constant despite mutations in terms of its underlying secondary structure. Structural robustness to mutations is expected to be important in the case of functional RNA structures in viruses with a high mutation rate; it may prevent fitness loss due to disruption of possibly functional conformations, pointing to the specific significance of the corresponding genomic region. In the current work, we perform a genome-wide computational analysis to detect signals of a direct evolutionary selection for strong folding and RNA structure-based mutational robustness within HIV coding sequences. We provide evidence that specific regions of HIV structural genes undergo an evolutionary selection for strong folding; in addition, we demonstrate that the HIV Rev responsive element seems to undergo a direct evolutionary selection for increased secondary structure robustness to point mutations. We believe that our analysis may enable a better understanding of viral evolutionary dynamics at the RNA structural level and may benefit practical efforts to engineer antiviral vaccines and novel therapeutic approaches.
The chloroplast, a photosynthetic organelle found in all plant and algae species, originates from... more The chloroplast, a photosynthetic organelle found in all plant and algae species, originates from an ancient event in which a cyanobacterium was engulfed by a larger eukaryote. Thus, modern chloroplasts still harbor a bacterial-like genome and carry out all stages of gene expression, including mRNA translation by a 70S ribosome. However, the Shine-Dalgarno model, which predominantly regulates translation initiation by base-pairing between the ribosomal RNA and the mRNA in model bacteria genera, was reported to have ambiguous effects on chloroplast gene expression. Here we show that while the Shine-Dalgarno motif is clearly conserved in proteobacterial mRNAs, its general absence from chloroplast mRNAs is observed in cyanobacteria as well, promoting the idea that the evolutionary process of reducing the centrality of the Shine-Dalgarno mechanism began well before plastid endosymbiosis. As plastid ribosomal RNA anti-Shine-Dalgarno elements are highly similar to their bacterial counterp...
The transcript is populated with numerous overlapping codes that regulate all steps of gene expre... more The transcript is populated with numerous overlapping codes that regulate all steps of gene expression. Deciphering these codes is very challenging due to the large number of variables involved, the non-modular nature of the codes, biases and limitations in current experimental approaches, our limited knowledge in gene expression regulation across the tree of life, and other factors. In recent years, it has been shown that computational modeling and algorithms can significantly accelerate the discovery of novel gene expression codes. Here, we briefly summarize the latest developments and different approaches in the field.
Background mRNA can form local secondary structure within the protein-coding sequence, and the st... more Background mRNA can form local secondary structure within the protein-coding sequence, and the strength of this structure is thought to influence gene expression regulation. Previous studies suggest that secondary structure strength may be maintained under selection, but the details of this phenomenon are not well understood. Results We perform a comprehensive study of the selection on local mRNA folding strengths considering variation between species across the tree of life. We show for the first time that local folding strength selection tends to follow a conserved characteristic profile in most phyla, with selection for weak folding at the two ends of the coding region and for strong folding elsewhere in the coding sequence, with an additional peak of selection for strong folding located downstream of the start codon. The strength of this pattern varies between species and organism groups, and we highlight contradicting cases. To better understand the underlying evolutionary proc...
Motivation: Understanding how viruses co-evolve with their hosts and adapt various genomic level ... more Motivation: Understanding how viruses co-evolve with their hosts and adapt various genomic level strategies in order to ensure their fitness may have essential implications in unveiling the secrets of viral evolution, and in developing new vaccines and therapeutic approaches. Here, based on a novel genomic analysis of 2625 different viruses and 439 corresponding host organisms, we provide evidence of universal evolutionary selection for high dimensional 'silent' patterns of information hidden in the redundancy of viral genetic code. Results: Our model suggests that long substrings of nucleotides in the coding regions of viruses from all classes, often also repeat in the corresponding viral hosts from all domains of life. Selection for these substrings cannot be explained only by such phenomena as codon usage bias, horizontal gene transfer and the encoded proteins. Genes encoding structural proteins responsible for building the core of the viral particles were found to include more host-repeating substrings, and these substrings tend to appear in the middle parts of the viral coding regions. In addition, in human viruses these substrings tend to be enriched with motives related to transcription factors and RNA binding proteins. The host-repeating substrings are possibly related to the evolutionary pressure on the viruses to effectively interact with host's intracellular factors and to efficiently escape from the host's immune system.
The ribosome flow model with input and output (RFMIO) is a deterministic dynamical system that ha... more The ribosome flow model with input and output (RFMIO) is a deterministic dynamical system that has been used to study the flow of ribosomes during mRNA translation. The input of the RFMIO controls its initiation rate and the output represents the ribosome exit rate (and thus the protein production rate) at the 3′ end of the mRNA molecule. The RFMIO and its variants encapsulate important properties that are relevant to modeling ribosome flow such as the possible evolution of "traffic jams" and nonhomogeneous elongation rates along the mRNA molecule, and can also be used for studying additional intracellular processes such as transcription, transport, and more. Here we consider networks of interconnected RFMIOs as a fundamental tool for modeling, analyzing and re-engineering the complex mechanisms of protein production. In these networks, the output of each RFMIO may be divided, using connection weights, between several inputs of other RFMIOs. We show that under quite general feedback connections the network has two important properties: (1) it admits a unique steady-state and every trajectory converges to this steady-state; and (2) the problem of how to determine the connection weights so that the network steady-state output is maximized is a convex optimization problem. These mathematical properties make these networks highly suitable as models of various phenomena: property (1) means that the behavior is predictable and ordered, and property (2) means that determining the optimal weights is numerically tractable even for large-scale networks. For the specific case of a feed-forward network of RFMIOs we prove an additional useful property, namely, that there exists a spectral representation for the network steady-state, and thus it can be determined without any numerical simulations of the dynamics. 
We describe the implications of these results to several fundamental biological phenomena and biotechnological objectives.
Horizontal gene transfer (HGT) was attributed as a major driving force for the innovation and evo... more Horizontal gene transfer (HGT) was attributed as a major driving force for the innovation and evolution of prokaryotic genomes. Previously, multiple research endeavors were undertaken to decipher HGT in different bacterial lineages. The genus Mycobacterium houses some of the most deadly human pathogens; however, the impact of HGT in Mycobacterium has never been addressed in a systematic way. Previous initiatives to explore the genomic imprints of HGTs in Mycobacterium were focused on few selected species, specifically among the members of Mycobacterium tuberculosis complex. Considering the recent availability of a large number of genomes, the current study was initiated to decipher the probable events of HGTs among 109 completely sequenced Mycobacterium species. Our comprehensive phylogenetic analysis with more than 9,000 families of Mycobacterium proteins allowed us to list several instances of gene transfers spread across the Mycobacterium phylogeny. Moreover, by examining the topology of gene phylogenies here, we identified the species most likely to donate and receive these genes and provided a detailed overview of the putative functions these genes may be involved in. Our study suggested that horizontally acquired foreign genes had played an enduring role in the evolution of Mycobacterium genomes and have contributed to their metabolic versatility and pathogenicity. A significant fraction of genes in all living species was considered to be acquired from genealogically distant species 1-7. This mode of gene exchange between reproductively isolated species, commonly known as horizontal gene transfer (HGT) or lateral gene transfer, was attributed as a major evolutionary force in several prokaryotic lineages 6,8-10. 
Previous studies have implicated horizontally transferred genes in various important traits including novel metabolic pathways 11-13, oxygenic photosynthesis 14, antibiotic resistance 15, pathogenesis 16, microbial translation efficiency 17 and various other features 1-4,6-10. Moreover, foreign genes were shown to assist microbes in colonizing new niches and in sustaining environmental changes 1-7,18. Considering their implications in bacterial genome evolution, in this study an initiative has been undertaken to trace the probable HGT events among all currently available completely sequenced Mycobacterium genomes. The genus Mycobacterium comprises more than 160 species, of which about 15 deadly pathogens are commonly encountered in humans and other animals 19. Among the pathogenic Mycobacterium species, M. tuberculosis alone was estimated to have infected one-third of the human population, causing more than 2 million annual deaths globally 20. Pathogenic strains were suggested to originate from their free-living ancestors, driven by the independent or combined influence of genome reduction, gene duplication, gene rearrangement and HGT. Horizontal gene transfer, among these, was attributed as a major factor contributing to Mycobacterium pathogenesis. Although there are controversies regarding the intensity and extent of HGT among different Mycobacterium species, it is clear from recent studies that Mycobacterium genomes had undergone many episodes of intra- and interspecies HGTs, acquiring genes from diverse origins including some members of eukaryotic families 21-25. Along with substantial evidence of HGTs in several Mycobacterium genomes, previous studies provided important insights regarding their function and evolutionary importance 21-25. For instance, foreign genes were shown to play important roles in the evolution of M. ulcerans (from M. marinum) 26 and M. avium subsp. paratuberculosis 27, and in shaping the pathogenic potential of M. abscessus 28 and M. tuberculosis 22,23. However, the contribution of HGT in Mycobacterium genome evolution has never been investigated in a systematic way. Earlier studies were mainly focused on finding the genomic imprints of HGT in a few selected genomes,
In many important cellular processes, including mRNA translation, gene transcription, phosphotransfer, and intracellular transport, biological "particles" move along some kind of "tracks". The motion of these particles can be modeled as a one-dimensional movement along an ordered sequence of sites. The biological particles (e.g., ribosomes or RNAPs) have volume and cannot overtake one another. In some cases, there is a preferred direction of movement along the track, but in general the movement may be bidirectional, and furthermore the particles may attach or detach from various regions along the tracks. We derive a new deterministic mathematical model for such transport phenomena that may be interpreted as a dynamic mean-field approximation of an important model from statistical mechanics called the asymmetric simple exclusion process (ASEP) with Langmuir kinetics. Using tools from the theory of monotone dynamical systems and contraction theory we show that the m...
Viruses undergo extensive evolutionary selection for efficient replication, which affects, among other things, their codon distribution. In the current study, we aimed at understanding the way evolution shapes the codon distribution in early vs. late viral genes in terms of their expression during different stages in the viral replication cycle. To this end we analyzed 14 bacteriophages and 11 human viruses with available information about the expression phases of their genes. We demonstrated evidence of selection for distinct composition of synonymous codons in early and late viral genes in 50% of the analyzed bacteriophages. Among other things, this phenomenon may be related to the time-specific adaptation of the viral genes to the translation efficiency factors involved at different bacteriophage developmental stages. Specifically, we showed that the differences in codon composition in different temporal gene groups cannot be explained only by phylogenetic proximities between the analyzed bac...
DNA research : an international journal for rapid publication of reports on genes and genomes, Jan 17, 2017
Translation initiation in prokaryotes is affected by the mRNA folding and interaction of the ribosome binding site with the ribosomal RNA. The elongation rate is affected, among other factors, by the local biophysical properties of the coding regions, the decoding rates of different codons, and the interactions among ribosomes. Currently, there is no comprehensive biophysical model of translation that enables the prediction of mRNA translation dynamics based only on the transcript sequence and while considering all of these fundamental aspects of translation. In this study, we provide, for the first time, a computational simulative biophysical model of both translation initiation and elongation with all aspects mentioned above. We demonstrate our model performance and advantages focusing on Escherichia coli genes. We further show that the model enables prediction of translation rate, protein levels, and ribosome densities. In addition, our model enables quantifying the effect of sil...
The two major steps of gene expression are transcription and translation. While hundreds of studies regarding the effect of sequence features on the translation elongation process have been published, very few connect sequence features to the transcription elongation rate. We suggest, for the first time, that short transcript sub-sequences have a typical effect on RNA polymerase (RNAP) speed: we show that nucleotide 5-mers tend to have typical RNAP speed (or transcription rate), which is consistent along different parts of genes and among different groups of genes with high correlation. We also demonstrate that relative RNAP speed correlates with mRNA levels of endogenous and heterologous genes. Furthermore, we show that the estimated transcription and translation elongation rates correlate in endogenous genes. Finally, we demonstrate that our results are consistent for different high resolution experimental measurements of RNAP densities. These results suggest for the first time th...
The Plant journal : for cell and molecular biology, 2018
Various species of microalgae have recently emerged as promising host-organisms for use in biotechnology industries due to their unique properties. These include efficient conversion of sunlight into organic compounds, the ability to grow in extreme conditions and the occurrence of numerous post-translational modification pathways. However, the inability to obtain high levels of nuclear heterologous gene expression in microalgae hinders the development of the entire field. To overcome this limitation, we analyzed different sequence optimization algorithms while studying the effect of transcript sequence features on heterologous expression in the model microalga Chlamydomonas reinhardtii, whose genome consists of rare features such as a high GC content. Based on the analysis of genomic data, we created eight unique sequences coding for a synthetic ferredoxin-hydrogenase enzyme, used here as a reporter gene. Following in silico design, these synthetic genes were transformed into the C...
Ribosome queuing is a fundamental phenomenon suggested to be related to topics such as genome evolution, synthetic biology, gene expression regulation, intracellular biophysics, and more. However, this phenomenon has not yet been quantified at a genomic level. Indeed, methodologies for studying translation (e.g. ribosome footprints) are usually calibrated to capture only single ribosome protected footprints (mRPFs) and are thus limited in their ability to detect ribosome queuing. On the other hand, most of the models in the field assume and analyze a certain level of queuing. Here we present an experimental-computational approach for studying ribosome queuing based on sequencing of RNA footprints extracted from pairs of ribosomes (dRPFs) using a modified ribosome profiling protocol. We combine our approach with traditional ribosome profiling to generate a detailed profile of ribosome traffic. The data are analyzed using computational models of translation dynamics. The approach...
Background: The regulation of all gene expression steps (e.g., transcription, RNA processing, translation, and mRNA degradation) is known to be primarily encoded in different parts of genes and in genomic regions in proximity to genes (e.g., promoters, untranslated regions, coding regions, introns, etc.). However, the entire gene expression codes and the genomic regions where they are encoded are still unknown. Results: Here, we employ an unsupervised approach to estimate the concentration of gene expression codes in different non-coding parts of genes and transcripts, such as introns and untranslated regions, focusing on three model organisms (Escherichia coli, Saccharomyces cerevisiae, and Schizosaccharomyces pombe). Our analyses support the conjecture that regions adjacent to the beginning and end of ORFs and the beginning and end of introns tend to include a higher concentration of gene expression information relative to regions further away. In addition, we report the exact regions with elevated concentration of gene expression codes. Furthermore, we demonstrate that the concentration of these codes in different genetic regions is correlated with the expression levels of the corresponding genes, and with splicing efficiency measurements and meiotic stage gene expression measurements in S. cerevisiae. Conclusion: We suggest that these discoveries improve our understanding of gene expression regulation and evolution; they can also be used for developing improved models of genome/gene evolution and for engineering gene expression in various biotechnological and synthetic biology applications.
It has recently been shown that the organization of genes in eukaryotic genomes, and specifically in 3D, is strongly related to gene expression and function and partially conserved between organisms. However, previous studies of 3D genomic organization analyzed each organism independently from others. Here, we propose an approach for unified interorganismal analysis of gene organization based on a network representation of Hi-C data. We define and detect four classes of spatially co-evolving orthologous modules (SCOMs), i.e. gene families that coevolve in their 3D organization, based on patterns of divergence and conservation of distances. We demonstrate our methodology on Hi-C data from Saccharomyces cerevisiae and Schizosaccharomyces pombe, and identify, among others, modules relating to RNA splicing machinery and chromatin silencing by small RNA which are central to S. pombe's lifestyle. Our results emphasize the importance of 3D genomic organization in eukaryotes and suggest that the evolutionary mechanisms that shape gene organization affect the organism fitness and phenotypes. The proposed algorithms can be utilized in future studies of genome evolution and comparative analysis of spatial genomic organization in different tissues, conditions and single cells.
Background: It is known that mRNA folding can affect and regulate various gene expression steps both in living organisms and in viruses. Previous studies have recognized functional RNA structures in the genome of the Dengue virus. However, these studies usually focused either on the viral untranslated regions or on very specific and limited regions at the beginning of the coding sequences, in a limited number of strains, and without considering evolutionary selection. Results: Here we performed the first large-scale comprehensive genomic analysis of selection for local mRNA folding strength in the Dengue virus coding sequences, based on a total of 1,670 genomes and 4 serotypes. Our analysis identified clusters of positions along the coding regions that may undergo a conserved evolutionary selection for strong or weak local folding maintained across different viral variants. Specifically, 53-66 clusters for strong folding and 49-73 clusters for weak folding (depending on serotype) were recognized, each composed of positions with a significant conservation of folding energy signals (related to partially overlapping local genomic regions). In addition, up to 7% of these positions were found to be conserved in more than 90% of the viral genomes. Although some of the identified positions undergo frequent synonymous/non-synonymous substitutions, the selection for folding strength therein is preserved, and thus cannot be trivially explained based on sequence conservation alone. Conclusions: The fact that many of the positions with significant folding-related signals are conserved among different Dengue variants suggests that a better understanding of the mRNA structures in the corresponding regions may promote the development of prospective anti-Dengue vaccination strategies. The comparative genomics approach described here can be employed in the future for detecting functional regions in other pathogens with very high mutation rates.
A large number of studies demonstrated the importance of different HIV RNA structural elements at all stages of the viral life cycle. Nevertheless, the significance of many of these structures is unknown, and plausibly new regions containing RNA structure-mediated regulatory signals remain to be identified. An important characteristic of genomic regions carrying functionally significant secondary structures is their mutational robustness, that is, the extent to which a sequence remains constant despite mutations in terms of its underlying secondary structure. Structural robustness to mutations is expected to be important in the case of functional RNA structures in viruses with high mutation rates; it may prevent fitness loss due to disruption of possibly functional conformations, pointing to the specific significance of the corresponding genomic region. In the current work, we perform a genome-wide computational analysis to detect signals of a direct evolutionary selection for strong folding and RNA structure-based mutational robustness within HIV coding sequences. We provide evidence that specific regions of HIV structural genes undergo an evolutionary selection for strong folding; in addition, we demonstrate that the HIV Rev responsive element seems to undergo a direct evolutionary selection for increased secondary structure robustness to point mutations. We believe that our analysis may enable a better understanding of viral evolutionary dynamics at the RNA structural level and may benefit practical efforts to engineer antiviral vaccines and novel therapeutic approaches.
Papers by Tamir Tuller