Papers by Nicholas Sinnott-Armstrong
Archaeological and Anthropological Sciences, 2011
This object-specific case study focuses on cupreous artifacts excavated from the Great Temple complex of Petra, Jordan to demonstrate how the use of compositional X-ray analyses alongside two experimental applications (ImageJ software and nanoindentation) has the potential to generate different and otherwise unobtainable information about archaeological metals. The study highlights the value of using multiple techniques as a means of resolving the ambiguities that tend to arise from interpretations of single-sited measurements on objects and from single-instrumental analyses during studies of production processes and consequent material performance. Employing different techniques on multiple localities within a sample permits the gathering of precise information about the behavior of and interrelationships between variables that affect the objects’ fabrication and use, particularly composition, structure, and hardness properties. The resulting data are interpreted in association with contextual archaeological information from Petra to consider the use-life and potential significance of these objects.
Bioinformatics/computer Applications in The Biosciences, 2010
Motivation: Epistasis, the presence of gene-gene interactions, has been hypothesized to be at the root of many common human diseases, but current genome-wide association studies largely ignore its role. Multifactor dimensionality reduction (MDR) is a powerful model-free method for detecting epistatic relationships between genes, but computational costs have made its application to genome-wide data difficult. Graphics processing units (GPUs), the hardware responsible for rendering computer games, are powerful parallel processors. Using GPUs to run MDR on a genome-wide dataset allows for statistically rigorous testing of epistasis. Results: The implementation of MDR for GPUs (MDRGPU) includes core features of the widely used Java software package, MDR. This GPU implementation allows for large-scale analysis of epistasis at a dramatically lower cost than the standard CPU-based implementations. As a proof-of-concept, we applied this software to a genome-wide study of sporadic amyotrophic lateral sclerosis (ALS). We discovered a statistically significant two-SNP classifier and subsequently replicated the significance of these two SNPs in an independent study of ALS. MDRGPU makes the large-scale analysis of epistasis tractable and opens the door to statistically rigorous testing of interactions in genome-wide datasets. Availability: MDRGPU is open source and available free of charge from http://www.sourceforge.net/projects/mdr.
Interdisciplinary Sciences: Computational Life Sciences, 2010
Advances in the video gaming industry have led to the production of low-cost, high-performance graphics processing units (GPUs) that possess more memory bandwidth and computational capability than central processing units (CPUs), the standard workhorses of scientific computing. With the recent release of general-purpose GPUs and NVIDIA’s GPU programming language, CUDA, graphics engines are being adopted widely in scientific computing applications, particularly in the fields of computational biology and bioinformatics. The goal of this article is to concisely present an introduction to GPU hardware and programming, aimed at the computational biologist or bioinformaticist. To this end, we discuss the primary differences between GPU and CPU architecture, introduce the basics of the CUDA programming language, and discuss important CUDA programming practices, such as the proper use of coalesced reads, data types, and memory hierarchies. We highlight each of these topics in the context of computing the all-pairs distance between instances in a dataset, a common procedure in numerous disciplines of scientific computing. We conclude with a runtime analysis of the GPU and CPU implementations of the all-pairs distance calculation. We show our final GPU implementation to outperform the CPU implementation by a factor of 1700.
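For readers unfamiliar with the all-pairs distance calculation named in the abstract above, the following is a minimal CPU-side sketch in Python. It is not the article's CUDA code: the abstract does not state which distance metric is used, so Euclidean distance is an assumption here, and the function name `all_pairs_distance` is hypothetical.

```python
import math

def all_pairs_distance(rows):
    """Return the symmetric matrix of Euclidean distances between all rows.

    Assumption: Euclidean distance; the abstract does not name the metric.
    """
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Each (i, j) pair is independent of every other pair, which is
        # what makes this computation a natural fit for parallel hardware.
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])))
            dist[i][j] = dist[j][i] = d
    return dist

points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
matrix = all_pairs_distance(points)
```

The doubly nested loop does work proportional to the square of the number of instances, which is why the calculation serves as a useful benchmark when comparing CPU and GPU implementations.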
Effective rule generalization in learning classifier systems (LCSs) has long been an important consideration. In noisy problem domains, where attributes do not precisely determine class, overemphasis on accuracy without sufficient generalization leads to over-fitting of the training data, and a large discrepancy between training and testing accuracies. This issue is of particular concern within noisy bioinformatic problems such as complex disease gene association studies. In an effort to promote effective generalization we introduce and explore a simple strategy which seeks to discourage over-fitting via the probabilistic incorporation of random noise within training instances. We evaluate a variety of noise models and magnitudes which either specify an equal probability of noise per attribute, or target higher noise probability to the attributes which tend to be more frequently generalized. Our results suggest that targeted noise incorporation can reduce training accuracy without eroding testing accuracy. In addition, we observe a slight improvement in our power estimates (i.e. ability to detect the true underlying model(s)).
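The "equal probability of noise per attribute" model described in the abstract above might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how a noisy value is chosen, so this sketch assumes a noisy attribute is redrawn uniformly from a fixed set of allowed values (0, 1, 2, as for SNP genotypes), and the name `add_uniform_noise` is hypothetical.

```python
import random

def add_uniform_noise(instance, noise_prob, values=(0, 1, 2), rng=random):
    """Return a copy of `instance` with each attribute independently
    replaced, with probability `noise_prob`, by a value drawn uniformly
    from `values`.

    Assumption: the replacement rule and the value set are illustrative;
    the abstract does not specify them.
    """
    return [
        rng.choice(values) if rng.random() < noise_prob else value
        for value in instance
    ]

rng = random.Random(0)  # seeded so that a run is repeatable
original = [0, 1, 2, 1]
unchanged = add_uniform_noise(original, 0.0, rng=rng)
fully_noisy = add_uniform_noise(original, 1.0, rng=rng)
```

A targeted variant, as the abstract describes, would replace the single `noise_prob` with a per-attribute probability that is higher for attributes that are generalized more often.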
Genetic epidemiologists, tasked with the disentanglement of genotype-to-phenotype mappings, continue to struggle with a variety of phenomena which obscure the underlying etiologies of common complex diseases. For genetic association studies, genetic heterogeneity (GH) and epistasis (gene-gene interactions) epitomize well-recognized phenomena that represent a difficult, but accessible challenge for computational biologists. While progress has been made addressing epistasis, methods for
BMC Research Notes, 2009
Background: Human geneticists are now capable of measuring more than one million DNA sequence variations from across the human genome. The new challenge is to develop computationally feasible methods capable of analyzing these data for associations with common human disease, particularly in the context of epistasis. Epistasis describes the situation where multiple genes interact in a complex non-linear manner to determine an individual's disease risk and is thought to be ubiquitous for common diseases. Multifactor Dimensionality Reduction (MDR) is an algorithm capable of detecting epistasis. An exhaustive analysis with MDR is often computationally expensive, particularly for high-order interactions. This challenge has previously been met with parallel computation and expensive hardware. The option we examine here exploits commodity hardware designed for computer graphics. In modern computers, Graphics Processing Units (GPUs) have more memory bandwidth and computational capability than Central Processing Units (CPUs) and are well suited to this problem. Advances in the video game industry have led to an economy of scale creating a situation where these powerful components are readily available at very low cost. Here we implement and evaluate the performance of the MDR algorithm on GPUs. Of primary interest are the time required for an epistasis analysis and the price-to-performance ratio of available solutions.
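To make the cost of an exhaustive MDR analysis concrete, here is a simplified Python sketch of the core MDR step for two-SNP models. It rests on assumptions not stated in the abstracts: a genotype cell is labeled high-risk when its case-to-control ratio exceeds the overall ratio, models are ranked by balanced accuracy, and cross-validation and permutation testing are omitted. The function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def mdr_balanced_accuracy(genotypes, labels, snp_pair):
    """Score one SNP pair: pool genotype cells into high/low risk and
    return the balanced accuracy of the resulting classifier."""
    a, b = snp_pair
    cases, controls = Counter(), Counter()
    for row, label in zip(genotypes, labels):
        cell = (row[a], row[b])
        (cases if label == 1 else controls)[cell] += 1
    n_cases, n_controls = sum(cases.values()), sum(controls.values())
    true_pos = true_neg = 0
    for cell in set(cases) | set(controls):
        # Assumed rule: high risk if the cell's case:control ratio exceeds
        # the overall ratio (cross-multiplied to avoid dividing by zero).
        if cases[cell] * n_controls > controls[cell] * n_cases:
            true_pos += cases[cell]
        else:
            true_neg += controls[cell]
    return 0.5 * (true_pos / n_cases + true_neg / n_controls)

def best_pair(genotypes, labels):
    """Exhaustively score every SNP pair and return the best one."""
    n_snps = len(genotypes[0])
    return max(
        combinations(range(n_snps), 2),
        key=lambda pair: mdr_balanced_accuracy(genotypes, labels, pair),
    )

# Toy data: disease status depends on an XOR-like interaction between
# SNP 0 and SNP 1; SNP 2 is uninformative.
genotypes = [
    (0, 0, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1),
    (0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0),
]
labels = [0, 1, 1, 0, 0, 1, 1, 0]
winner = best_pair(genotypes, labels)
```

The number of pairs grows quadratically with the number of SNPs, and higher-order models grow faster still. Each model is scored independently of the others, which is consistent with the abstract's point that the problem suits parallel hardware.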