Many applications in the biomedical domain involve the detailed molecular and functional characterization of macro-molecules such as proteins. Where possible, this involves the knowledge of detailed 3D coordinates of every atom within a protein. At the same time, machine learning has become the basis of much innovation within this domain in recent years. There are, however, a few challenges in applying machine learning to 3D protein structures, such as variability in size and high dimensionality of the data. It would therefore be beneficial to be able to map every protein structure to a smaller fixed-dimensional representation that is directly learned from the structure without manual curation. In addition, it would be valuable for biomedical researchers if such approaches required little method development and instead drew from cutting-edge research such as image classification via deep neural networks. Here, such an approach is outlined that first re-formats protein structures as 2D color images and then applies off-the-shelf neural networks for image classification. It is shown that such neural networks can be trained to effectively encode the CATH protein classification database and that feature vectors extracted from such networks, once trained, can be transferred to a completely new task that is likely to benefit from molecular protein information, namely that of small molecule binding.
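As a rough illustration of what re-formatting a structure as a fixed-size 2D image can mean, the sketch below turns a list of 3D coordinates into a pairwise-distance map and rescales it to a fixed number of pixels. This is an assumption made for illustration only: the abstract does not say how the images are constructed, and the function names (`distance_map`, `resize_nearest`, `to_pixels`), the single grey channel, and the 20 Å cutoff are all hypothetical.

```python
import math


def distance_map(coords):
    """Pairwise Euclidean distances between 3D points (e.g. one point per residue)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def resize_nearest(matrix, size):
    """Rescale a square matrix to size x size by nearest-neighbour sampling,
    so that proteins of different lengths yield images of the same shape."""
    n = len(matrix)
    return [[matrix[r * n // size][c * n // size] for c in range(size)]
            for r in range(size)]


def to_pixels(matrix, max_dist=20.0):
    """Map distances to 0..255 intensities, clipping at max_dist (hypothetical cutoff)."""
    return [[round(255 * min(d, max_dist) / max_dist) for d in row]
            for row in matrix]


if __name__ == "__main__":
    # Toy coordinates, not taken from any real structure.
    coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 30.0)]
    image = to_pixels(resize_nearest(distance_map(coords), 4))
    for row in image:
        print(row)
```

A fixed output size is the point of the exercise here: whatever the chain length, the resulting array has the same shape and could in principle be passed to an off-the-shelf image classifier.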
Rotational view onto fitness landscape leading to Neofunctionalization (under strong selection pressure). Gene fitness is plotted over a 2D representation of sequence space. Solid black lines indicate evolutionary trajectories of pre-duplication gene loci, whereas green lines indicate evolution of a gene copy arising by duplication.
Heterodimerisation of transcription factors and combinatorial control are features of gene regulatory networks, especially in higher eukaryotes. Here I propose that these features could have been important mechanisms for the evolution of multicellular complexity. Random Boolean networks were used to model regulatory networks of dimerising and non-dimerising transcription factors. These networks were evolved in computer simulations to produce high numbers of steady states (attractors) which were interpreted as individual cellular states as found in different cell types. Simulations were run that either allowed dimerisation or did not. Dimerising networks evolved to produce a higher number of steady states, thus supporting the initial proposal.
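To make the notion of attractors in a random Boolean network concrete, here is a minimal sketch that builds a small random network and counts its attractors by exhaustive enumeration of all states. It is a generic textbook-style construction, not the model from the paper: dimerisation and the evolutionary simulation are not represented, and the names `make_network`, `step`, and `find_attractors` as well as the parameters `n` and `k` are assumptions chosen for illustration.

```python
import random
from itertools import product


def make_network(n, k, rng):
    """Each of n nodes reads k randomly chosen nodes through a random truth table."""
    inputs = [tuple(rng.sample(range(n), k)) for _ in range(n)]
    tables = [tuple(rng.randint(0, 1) for _ in range(2 ** k)) for _ in range(n)]
    return inputs, tables


def step(state, inputs, tables):
    """Synchronously update every node from the current state."""
    nxt = []
    for node_inputs, table in zip(inputs, tables):
        index = 0
        for i in node_inputs:
            index = (index << 1) | state[i]  # pack input bits into a table index
        nxt.append(table[index])
    return tuple(nxt)


def find_attractors(n, inputs, tables):
    """Follow every one of the 2**n states until it revisits a state;
    the revisited portion of the trajectory is an attractor (cycle or fixed point)."""
    attractors = set()
    for start in product((0, 1), repeat=n):
        seen = {}
        state = start
        while state not in seen:
            seen[state] = len(seen)
            state = step(state, inputs, tables)
        first = seen[state]  # position where the cycle begins
        attractors.add(frozenset(s for s, pos in seen.items() if pos >= first))
    return attractors


if __name__ == "__main__":
    rng = random.Random(0)
    inputs, tables = make_network(8, 2, rng)
    found = find_attractors(8, inputs, tables)
    print("number of attractors:", len(found))
```

Exhaustive enumeration is only practical for small `n`, since the state space grows as 2**n; larger networks would need sampling of start states instead.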
The presented thesis investigates fundamental theoretical questions regarding the evolutionary processes that lead to new proteins. In particular, it addresses the question of how the opposing demands for conservation and innovation on evolving proteins (referred to as adaptive conflicts) can be reconciled at the biophysical level of protein tertiary structures. The trade-offs that become necessary as a result of adaptive conflicts are studied by means of simple biophysical models of protein folding and simulations of evolving populations of proteins. Additional results were obtained by applying structural bioinformatics techniques to experimentally determined protein tertiary structures. Two main mechanisms by which evolution can resolve adaptive conflicts are studied: first, by mutations that shift the thermodynamic equilibrium of a protein towards bi-stability, i.e. the capacity of the protein to form two alternative tertiary structures, each of which performs a different benefic...
Life-history traits controlling the duration and timing of developmental phases in the life cycle jointly determine fitness. Therefore, life-history traits studied in isolation provide an incomplete view of the relevance of life-cycle variation for adaptation. In this study, we examine genetic variation in traits covering the major life-history events of the annual species Arabidopsis thaliana: seed dormancy, vegetative growth rate and flowering time. In a sample of 112 genotypes collected throughout the European range of the species, both seed dormancy and flowering time follow a latitudinal gradient independent of the major population structure gradient. This finding confirms previous studies reporting the adaptive evolution of these two traits. Here, however, we further analyze patterns of co-variation among traits. We observe that co-variation between primary dormancy, vegetative growth rate and flowering time also follows a latitudinal cline. At higher latitudes, vegetative growth rate is positively correlated with primary dormancy and negatively with flowering time. In the South, this trend disappears. Patterns of trait co-variation change, presumably because major environmental gradients shift with latitude. This pattern appears unrelated to population structure, suggesting that changes in the coordinated evolution of major life-history traits are adaptive. Our data suggest that A. thaliana provides a good model for the evolution of trade-offs and their genetic basis.
Deciphering the effects of nonsynonymous mutations on protein structure is central to many areas of biomedical research and is of fundamental importance to the study of molecular evolution. Much of the investigation of protein evolution has focused on mutations that leave a protein's folded structure essentially unchanged. However, to evolve novel folds of proteins, mutations that lead to large conformational modifications have to be involved. Unraveling the basic biophysics of such mutations is a challenge to theory, especially when only one or two amino acid substitutions cause a large-scale conformational switch. Among the few such mutational switches identified experimentally, the one between the GA all-α and GB α+β folds is extensively characterized; but all-atom simulations using fully transferable potentials have not been able to account for this striking switching behavior. Here we introduce an explicit-chain model that combines structure-based native biases for multiple alternative structures with a general physical atomic force field, and apply this construct to twelve mutants spanning the sequence variation between GA and GB. In agreement with experiment, we observe conformational switching from GA to GB upon a single L45Y substitution in the GA98 mutant. In line with the latent evolutionary potential concept, our model shows a gradual sequence-dependent change in fold preference in the mutants before this switch. Our analysis also indicates that a sharp GA/GB switch may arise from the orientation dependence of aromatic π-interactions. These findings provide physical insights toward rationalizing, predicting and designing evolutionary conformational switches.
Many organisms live under complex and changing environmental conditions, while having a limited number of proteins to deal with these conditions. Multi-functionality, as exhibited by many functionally promiscuous enzymes, has been hypothesised as an advantageous compromise whenever the same protein is under selection to conserve an existing function while adapting towards a new function (adaptive conflict). A stage of multi-functionality may or may not be followed by gene duplication and divergence. We use computational biophysical models to analyse multi-functionality of proteins that can fold into more than one stable structure (using structure formation as a proxy for functionality). Our model predicts that proteins evolving under selection for two alternative structures can follow gradients of stability shift from the formation of only one stable structure towards an equilibrium state between two stable structures (bi-stability). Population dynamics simulations show that weak co...
Rotational view onto fitness landscape leading to Subfunctionalization (under weak selection pressure). Gene fitness is plotted over a 2D representation of sequence space. Solid black lines indicate evolutionary trajectories of pre-duplication gene loci, whereas green lines indicate evolution of a gene copy arising by duplication.
Understanding how rapid functional adaptation is possible under complex environmental constraints is crucial not only in the context of long-term genome evolution, but also in critical short-term scenarios such as the evolution of multi-drug resistance in bacteria. Recent theoretical developments in the area of molecular evolution via gene duplication emphasise the importance of functionally promiscuous ancestors. The concept of subfunctionalisation not only provides a mechanism for retention of otherwise redundant gene copies but also for the "escape from an adaptive conflict". This conflict of a single pre-duplication gene to adapt to more than one selection pressure can lead to a certain degree of multi-functionality prior to a duplication event. After a duplication, the resulting paralogs are thought to quickly specialise on different sub-functions. Here, multi-functionality is considered at the protein conformational level, assuming a direct link between structure an...
Most hypotheses about the evolution of new proteins via the mechanism of gene duplication and divergence only take into account the processes following the duplication event. The Neofunctionalisation hypothesis assumes that, due to the functional redundancy of the two gene copies, one is free to undergo adaptation towards a new function. The Subfunctionalisation hypothesis assumes that, for a while, the duplicates retain complementary subfunctions of the original gene. Here, we review evidence for adaptive changes occurring in proteins (or single protein domains) that have not (yet) undergone gene duplication. These changes are neutral with respect to the fitness contribution of the native function of a protein, but potentially adaptive regarding the fitness contributions of latent or promiscuous functions. It is likely that those promiscuous functions are associated with different protein conformations existing in equilibrium. If a gene duplication occurs for a protein...
Many organisms live under complex and changing environmental conditions, while having a limited number of proteins to deal with these conditions. Multi-functionality, as exhibited by many functionally promiscuous enzymes, has been hypothesised as an advantageous compromise whenever the same protein is under selection to conserve an existing function while adapting towards a new function (adaptive conflict). A stage of multi-functionality may or may not be followed by gene duplication and divergence. We use simple biophysical models to explain the basic principles behind the multi-functionality of proteins that can fold into more than one stable structure (using structure formation as a proxy for functionality). Our model predicts that proteins evolving under selection for two alternative structures can follow gradients of stability shift from the formation of only one stable structure towards an equilibrium state between two stable structures (bi-stability), each providing an indepe...
Molecular evolution has focused on sequence data due to its easy accessibility and the wealth of information it contains. However, certain aspects of protein evolution can only be understood in the light of structure and the biophysical rules that determine it. One such rule is the formation of a stable hydrophobic core in many if not most folded proteins that we see today. Another rule is the efficient folding of an extended peptide chain into a functional native form, avoiding misfolding and misinteracting with other molecules. While the core of a protein fold is strongly conserved, the surface can tolerate many mutations, leading to the widely observed properties of robustness and adaptability which are crucial for evolution. Adaptability can arise from mutations stabilizing hidden or excited protein folds that may compete with the dominant native fold during the folding and other dynamic processes crucial for function. If these hidden folds have functional potential, they may be...
This chapter contains sections titled: Introduction; Stability of Protein Structures; Structural Promiscuity of Proteins; Evolutionary Transitions between Protein Phenotypes; Functional Promiscuity of Enzymes; Gene Duplications and Phenotypic Transitions at the Population Level; Evolution of Ribozyme Structures; Conclusions; References.
The study of molecular evolution at the level of protein-coding genes often entails comparing large datasets of sequences to infer their evolutionary relationships. Despite the importance of a protein's structure and conformational dynamics to its function and thus its fitness, common phylogenetic methods embody minimal biophysical knowledge of proteins. To underscore the biophysical constraints on natural selection, we survey effects of protein mutations, highlighting the physical basis for marginal stability of natural globular proteins and how the requirement for kinetic stability and avoidance of misfolding and misinteractions might have affected protein evolution. The biophysical underpinnings of these effects have been addressed by models with an explicit coarse-grained spatial representation of the polypeptide chain. Sequence–structure mappings based on such models are powerful conceptual tools that rationalize mutational robustness, evolvability, epistasis, promiscuous function performed by ‘hidden’ conformational states, resolution of adaptive conflicts and conformational switches in the evolution from one protein fold to another. Recently, protein biophysics has been applied to derive more accurate evolutionary accounts of sequence data. Methods have also been developed to exploit sequence-based evolutionary information to predict biophysical behaviours of proteins. The success of these approaches demonstrates a deep synergy between the fields of protein biophysics and protein evolution.
Experimental studies have shown that some proteins exist in two alternative native-state conformations. It has been proposed that such bi-stable proteins can potentially function as evolutionary bridges at the interface between two neutral networks of protein sequences that fold uniquely into the two different native conformations. Under adaptive conflict scenarios, bi-stable proteins may be of particular advantage if they simultaneously provide two beneficial biological functions. However, computational models that simulate protein structure evolution do not yet recognize the importance of bi-stability. Here we use a biophysical model to analyze sequence space to identify bi-stable or multi-stable proteins with two or more equally stable native-state structures. The inclusion of such proteins enhances phenotype connectivity between neutral networks in sequence space. Consideration of the sequence space neighborhood of bridge proteins revealed that bi-stability decreases gradually with each mutation that takes the sequence further away from an exactly bi-stable protein. With relaxed selection pressures, we found that bi-stable proteins in our model are highly successful under simulated adaptive conflict. Inspired by these model predictions, we developed a method to identify real proteins in the PDB with bridge-like properties, and have verified a clear bi-stability gradient for a series of mutants studied by Alexander et al. (Proc Nat Acad Sci USA 2009, 106:21149-21154) that connect two sequences that fold uniquely into two different native structures via a bridge-like intermediate mutant sequence. Based on these findings, new testable predictions for future studies on protein bi-stability and evolution are discussed.
Papers by Tobias Sikosek