We present calculations of the lattice thermal conductivity of silicon that incorporate several commonly used empirical models of the interatomic potential. Second- and third-order force constants obtained from these potentials are used as inputs to an exact iterative solution of the inelastic phonon Boltzmann equation, which includes the anharmonic three-phonon scattering as well as isotopic defect and boundary scattering. Comparison of the calculated lattice thermal conductivity with experiment shows that none of these potentials provides satisfactory agreement. Calculations of the mode Grüneisen parameters and the linear thermal expansion coefficient help elucidate the reasons for this. We also examine a set of parameters for one of these empirical potentials that produces improved agreement with both the measured lattice thermal conductivity and the thermal expansion data.
Journal of Clinical and Translational Science, 2017
Introduction: Computational analysis of genome or exome sequences may improve inherited disease diagnosis, but is costly and time-consuming. Methods: We describe the use of iobio, a web-based tool suite for intuitive, real-time genome diagnostic analyses. Results: We used iobio to identify the disease-causing variant in a patient with early infantile epileptic encephalopathy with prior nondiagnostic genetic testing. Conclusions: Iobio tools can be used by clinicians to rapidly identify disease-causing variants from genomic patient sequencing data.
Using density functional perturbation theory and a full solution of the linearized phonon Boltzmann transport equation (BTE), a parameter-free theory of semiconductor thermal properties is developed. The approximations and shortcomings of previous approaches to thermal conductivity calculations are investigated. The use of empirical interatomic potentials in the BTE approach is shown to give poor agreement with measured values of thermal conductivity. By using the adiabatic bond charge model, the importance of accurate descriptions of phonon dispersions is highlighted. The extremely limited capacity of previous theoretical techniques in the realm of thermal conductivity prediction is highlighted; this is due to a dependence on adjustable parameters. Density functional perturbation theory is coupled with an iterative solution to the full Boltzmann transport equation creating a theoretical construct where thermal conductivity prediction becomes possible. Validation of the approach is ...
Background: Mobile elements (MEs) constitute greater than 50% of the human genome as a result of repeated insertion events during human genome evolution. Although most of these elements are now fixed in the population, some MEs, including ALU, L1, SVA and HERV-K elements, are still actively duplicating. Mobile element insertions (MEIs) have been associated with human genetic disorders, including Crohn's disease, hemophilia, and various types of cancer, motivating the need for accurate MEI detection methods. To comprehensively identify and accurately characterize these variants in whole genome next-generation sequencing (NGS) data, a computationally efficient detection and genotyping method is required. Current computational tools are unable to call MEI polymorphisms with sufficiently high sensitivity and specificity, or call individual genotypes with sufficiently high accuracy. Results: Here we report Tangram, a computationally efficient MEI detection program that integrates read-pair (RP) and split-read (SR) mapping signals to detect MEI events. By utilizing SR mapping in its primary detection module, a feature unique to this software, Tangram is able to pinpoint MEI breakpoints with single-nucleotide precision. To understand the role of MEI events in disease, it is essential to produce accurate individual genotypes in clinical samples. Tangram is able to determine sample genotypes with very high accuracy. Using simulations and experimental datasets, we demonstrate that Tangram has superior sensitivity, specificity, breakpoint resolution and genotyping accuracy, when compared to other, recently developed MEI detection methods. Conclusions: Tangram serves as the primary MEI detection tool in the 1000 Genomes Project, and is implemented as a highly portable, memory-efficient, easy-to-use C++ computer program, built under an open-source development model.
Early infantile epileptic encephalopathy (EIEE) is a devastating epilepsy syndrome with onset in the first months of life. Although mutations in more than 50 different genes are known to cause EIEE, current diagnostic yields with gene panel tests or whole-exome sequencing are below 60%. We applied whole-genome analysis (WGA) consisting of whole-genome sequencing and comprehensive variant discovery approaches to a cohort of 14 EIEE subjects for whom prior genetic tests had not yielded a diagnosis. We identified both de novo point and INDEL mutations and de novo structural rearrangements in known EIEE genes, as well as mutations in genes not previously associated with EIEE. The detection of a pathogenic or likely pathogenic mutation in all 14 subjects demonstrates the utility of WGA to reduce the time and costs of clinical diagnosis of EIEE. While exome sequencing may have detected 12 of the 14 causal mutations, 3 of the 12 patients received non-diagnostic exome panel tests prior to genome sequencing. Thus, given the continued decline of sequencing costs, our results support the use of WGA with comprehensive variant discovery as an efficient strategy for the clinical diagnosis of EIEE and other genetic conditions.
The primary goal of precision genomics is the identification of causative genetic variants in targeted or whole-genome sequencing data. The ultimate clinical hope is that these findings lead to an efficacious change in treatment for the patient. In current clinical practice, these findings are typically returned by expert analysts as static, text-based reports. Ideally, these reports summarize the quality of the data obtained, integrate known gene–phenotype associations, follow allele segregation and affected status within the sequenced samples, and weigh computational evidence of pathogenicity. These findings are used to prioritize the variant(s) most likely to cause the given patient’s phenotypes. In most diagnostic settings, a team of experts contributes to these reports, including bioinformaticians, clinicians, and genetic counselors, among others. However, these experts often do not have the necessary tools to review genomic findings, test genetic hypotheses, or query specific g...
A comprehensive list of candidate genes that succinctly describes the complete and objective phenotypic features of disease is critical both when ordering genetic testing and when triaging candidate variants in exome and genome sequencing studies. Great efforts have been made to curate gene:disease associations both in academic research and commercial gene testing settings. However, many of these valuable resources exist as islands and must be used independently, generating static, single-resource gene:disease association lists. To more effectively utilize these resources we created genepanel.iobio (https://genepanel.iobio.io), an easy-to-use, free and open-source web tool for generating disease- and phenotype-associated gene lists from multiple gene:disease association resources, including the NCBI Genetic Testing Registry (GTR), Phenolyzer, and the Human Phenotype Ontology (HPO). We demonstrate the utility of genepanel.iobio by applying it to complex, rare and undiagnosed disease ca...
MOSAIK is a stable, sensitive and open-source program for mapping second- and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific BioSciences SMRT. Indeed, MOSAIK was the only aligner to provide consistent mappings for all the generated data (sequencing technologies, low-coverage and exome) in the 1000 Genomes Project. To provide highly accurate alignments, MOSAIK employs a hash clustering strategy coupled with the Smith-Waterman algorithm. This method is well-suited to capture mismatches as well as short insertions and deletions. To support the growing interest in larger structural variant (SV) discovery, MOSAIK provides explicit support for handling known-sequence SVs, e.g. mobile element insertions (MEIs), as well as generating outputs tailored to aid in SV discovery. All variant discovery benefits from an accurate description of the read placement confidence. To this end, MOSAIK uses a neural-net based training scheme to provide well-calibrated mapping quality scores, demonstrated by a correlation coefficient between MOSAIK-assigned and actual mapping qualities greater than 0.98. To ensure that studies of any genome are supported, a training pipeline is provided that optimizes mapping quality scores for the genome under investigation. MOSAIK is multi-threaded, open source, and incorporated into our command and pipeline launcher system GKNO (http://gkno.me).
By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide ...
With increasing utilization of comprehensive genomic data to guide clinical care, anticipated to become the standard of care in many clinical settings, the practice of diagnostic medicine is undergoing a notable shift. However, the move from single-gene or panel-based genetic testing to exome and genome sequencing has not been matched by the development of tools to enable diagnosticians to interpret increasingly complex genomic findings. A new paradigm has emerged, where genome-based tests are often evaluated by a large multi-disciplinary collaborative team, typically including a diagnostic pathologist, a bioinformatician, a genetic counselor, and often a subspeciality clinician. This team-based approach calls for new computational tools to allow every member of the clinical care provider team, at varying levels of genetic knowledge and diagnostic expertise, to quickly and easily analyze and interpret complex genomic data. Here, we present gene.iobio, a real-time, intuitive and inte...
The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, ...
Papers by Alistair Ward