Checking the normality assumption using Q-Q plots. Checking the normality assumption for the initial and transformed spore count datasets of differentially infected wild-type and npr1 mutant plants using Q-Q plots. (PDF 536 kb)
Experimental data from Arabidopsis wild-type and npr1 mutant plants. Leaf bacteria spore count (phenotype) dataset of 31 Arabidopsis wild-type (control) and 29 npr1 mutant (case) plants 48 h post Pst-DC3000 infection. (XLS 10 kb)
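As a rough illustration of the Q-Q check named in the caption above, the sketch below computes normal Q-Q points and the correlation between theoretical and observed quantiles, before and after a log transform. It is not the authors' procedure: the function names, the plotting positions (i + 0.5)/n, the log10 transform and the synthetic counts are all assumptions made for illustration, and no values from the actual spore count dataset are used.

```python
import math
import random
from statistics import NormalDist


def qq_points(sample):
    """Return (theoretical, observed) quantiles for a normal Q-Q plot.

    Plotting positions (i + 0.5) / n are an assumed convention.
    """
    n = len(sample)
    observed = sorted(sample)
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return theoretical, observed


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def qq_correlation(sample):
    """Correlation of the Q-Q points; values near 1 suggest normality."""
    theoretical, observed = qq_points(sample)
    return pearson(theoretical, observed)


# Synthetic, hypothetical counts (NOT the published data): skewed values
# standing in for raw spore counts from 31 plants.
random.seed(0)
raw_counts = [random.lognormvariate(10, 1) for _ in range(31)]
log_counts = [math.log10(c) for c in raw_counts]  # assumed transform

print("Q-Q correlation, raw:", round(qq_correlation(raw_counts), 3))
print("Q-Q correlation, log10:", round(qq_correlation(log_counts), 3))
```

Plotting the two lists returned by `qq_points` against each other gives the Q-Q plot itself; points lying close to a straight line indicate that the normality assumption is plausible for that dataset.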
FRANC is a Python framework for local ancestry inference that integrates eight existing state-of-the-art ancestry deconvolution tools that use high-density SNP data. It estimates ancestry using WINPOP, LAMPLD, SUPPORTMIX, RFMIX, PCADMIX, ELAI, CHROMOPAINTER and LOTER. It processes tool-specific inputs, runs the tools, and standardizes the outputs to different output formats according to user specifications. Output is also available in the standardized LAIT format. FRANC v19.1 runs ELAI with either phased or unphased data; for unphased data, ELAI output is available only in the ELAI or LAIT format. FRANC allows a user to run one or more tools on one or more chromosomes. The FRANC interface enables users to manipulate inputs, infer local ancestry estimates, convert outputs to the format of their choice and summarize the results for eight existing multi-way local ancestry methods using one straightforward command line. This may ease the use of these outputs in further applications and paves t...
Artificial Intelligence - Applications in Medicine and Biology, 2019
Advances in sequencing technology have significantly contributed to shaping the area of genetics and enabled the identification of genetic variants associated with complex traits through genome-wide association studies. This has provided insights into genetic medicine, in which genetic factors influence variability in disease and treatment outcomes. On the other hand, the missing or hidden heritability has suggested that the host quality of life and other environmental factors may also influence differences in disease risk and drug/treatment responses in genomic medicine, and orient biomedical research, even though this may be highly constrained by genetic capabilities. It is expected that combining these different factors can yield a paradigm shift in personalized medicine and lead to more effective medical treatment. With existing "big data" initiatives and high-performance computing infrastructures, there is a need for data-driven learning algorithms and models that enable the selection and prioritization of relevant genetic variants (post-genomic medicine) and trigger effective translation into clinical practice. In this chapter, we survey and discuss existing machine learning algorithms and post-genomic analysis models supporting the process of identifying valuable markers.
Bioinformatics Tools for Detection and Clinical Interpretation of Genomic Variations [Working Title], 2019
Rapid advances in sequencing and genotyping technologies have significantly contributed to shaping the area of medical and population genetics. Several thousand genomes have been completed, with millions of variants identified in human deoxyribonucleic acid (DNA) sequences. These genomic variations strongly influence changes in phenotypic manifestations and physiological functions of different individuals or population groups. Of particular importance are variations introduced by admixture events, which contribute significantly to a remarkable phenotypic variability with medical and/or evolutionary implications. In this case, knowledge of local ancestry estimates and the date of admixture is of utmost importance for a better understanding of genomic variation patterns throughout modern human evolution and adaptive processes. In this chapter, we survey existing models for local ancestry deconvolution and for dating admixture events to identify possible gaps that still need to be filled and to orient future trends in designing more effective models, which account for current challenges and produce more accurate and biologically relevant estimates.
Over the past decade, studies of admixed populations have increasingly gained interest in both medical and population genetics. These studies have so far shed light on the patterns of genetic variation throughout modern human evolution and have improved our understanding of the demographics and adaptive processes of human populations. To date, there exist about 20 methods or tools to deconvolve local ancestry. These methods have merits and drawbacks in estimating local ancestry in multiway admixed populations. In this article, we survey existing ancestry deconvolution methods, with special emphasis on multiway admixture, and compare these methods based on simulation results reported by different studies, computational approaches used, including mathematical and statistical models, and biological challenges related to each method. This should orient users on the choice of an appropriate method or tool for given population admixture characteristics and update researchers on current ad...
Motivation: Recent technological advances in high-throughput sequencing and genotyping have facilitated an improved understanding of genomic structure and disease-associated genetic factors. In this context, simulation models can play a critical role in revealing various evolutionary and demographic effects on genomic variation, enabling researchers to assess existing analytical approaches and design novel ones. Although various simulation frameworks have been suggested, they do not account for natural selection in admixture processes. Most are tailored to a single chromosome or a genomic region, very few capture large-scale genomic data, and most are not accessible to genomic communities. Results: Here we develop a multi-scenario genome-wide medical population genetics simulation framework called 'FractalSIM'. FractalSIM has the capability to accurately mimic and generate genome-wide data under various genetic models on genetic diversity, genomic variation affecting diseases and DNA sequence patterns of admixed and/or homogeneous populations. Moreover, the framework accounts for natural selection in both homogeneous and admixture processes. The outputs of FractalSIM have been assessed using popular tools, and the results demonstrated its capability to accurately mimic real scenarios. They can be used to evaluate the performance of a range of genomic tools from ancestry inference to genome-wide association studies. Availability and implementation: The FractalSIM package is available at http://www.cbio.uct.ac.za/ FractalSIM.
Advances in forward and reverse genetic techniques have enabled the discovery and identification of several plant defence genes based on quantifiable disease phenotypes in mutant populations. Existing models for testing the effect of gene inactivation or genes causing these phenotypes do not take into account possible uncertainty in these datasets and potential noise inherent in the biological experiment used, which may mask downstream analysis and limit the use of these datasets. Moreover, elucidating the biological mechanisms driving the induced disease resistance and influencing these observable disease phenotypes has never been systematically tackled, eliciting the need for an efficient model to completely characterize the gene target under consideration. We developed a post-gene silencing bioinformatics (post-GSB) protocol which accounts for potential biases related to the disease phenotype datasets in assessing the contribution of the gene target to the plant defence response. The...
The application of statistical methods in the monitoring and control of industrial processes is generally known as statistical process control (SPC). Since most modern industrial processes are multivariate in nature, multivariate statistical process control (MSPC) has supplanted univariate SPC techniques. MSPC techniques are not only of academic interest; they have also been addressing industrial problems in the recent past. Monitoring and controlling a chemical process is a challenging task because of its multivariate, highly correlated and non-linear nature. In this paper, a series of techniques were applied. A time series plot was used to determine the stationarity of the data. The Box-Jenkins methodology of model identification, estimation and validation was used to generate ARIMA models based on multiple non-sequential data. As a result, the residuals from ARIMA models have shown four attributes: normally distributed, uncorrelated, independent and no autocorrelation ...
Papers by Ephifania Geza