DNA Sequence Classification Using Compression-Based Induction

1995
Abstract
Inductive learning methods, such as neural networks and decision trees, have become a popular approach to developing DNA sequence identification tools. Such methods attempt to form models of a collection of training data that can be used to predict future data accurately. The common approach to using such methods on DNA sequence identification problems forms models that depend on the absolute locations of nucleotides and assume independence of consecutive nucleotide locations. This paper describes a new class of learning methods, called compression-based induction (CBI), that is geared towards sequence learning problems such as those that arise when learning DNA sequences. The central idea is to use text compression techniques on DNA sequences as the means for generalizing from sample sequences. The resulting methods form models that are based on the more important relative locations of nucleotides and on the dependence of consecutive locations. They also provide a suitable framewor...
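The central idea can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say which compressor or decision rule the authors use, so this sketch swaps in the general-purpose `zlib` (DEFLATE) compressor and assigns a query sequence to the class whose training text lets it be encoded in the fewest additional bytes. The names `compressed_size`, `extra_bytes`, `classify`, and the `TOY_TRAINING` sequences are hypothetical and are not taken from the paper.

```python
import zlib


def compressed_size(text: str) -> int:
    """Number of bytes zlib needs to encode the given nucleotide string."""
    return len(zlib.compress(text.encode("ascii"), 9))


def extra_bytes(training: str, query: str) -> int:
    """Additional bytes needed to encode `query` once `training` has been seen.

    A small value means the query shares many substrings with the training
    text, wherever those substrings occur (relative rather than absolute
    positions matter).
    """
    return compressed_size(training + query) - compressed_size(training)


def classify(query: str, training_by_class: dict) -> str:
    """Return the label whose training text compresses the query best."""
    return min(
        training_by_class,
        key=lambda label: extra_bytes(training_by_class[label], query),
    )


# Hypothetical toy data, invented for this sketch (not real genomic sequences).
TOY_TRAINING = {
    "at_rich": (
        "TATAATGCATTAGACATTTGACAATCGTATAAAGTCATTAGCTTTACAATG"
        "ATATTCAGTAAATCGTTACATAGATTTCAAGTAATACGATT"
    ),
    "gc_rich": (
        "GGCGCCATGGCCGCGGTCCGGCAGCCCGGTGCGCCGACGGGCTCCGCGGAG"
        "CCGGCACGCCGGGTCGCCCAGCGGCCTGCGGGACCGGCGCT"
    ),
}

if __name__ == "__main__":
    # Use a 40-nucleotide slice of one class's training text as the query.
    query = TOY_TRAINING["at_rich"][20:60]
    print(classify(query, TOY_TRAINING))
```

Because the compressor finds repeated substrings at any offset and encodes each symbol in the context of what came before, a rule of this kind depends on relative locations and on runs of consecutive nucleotides, which is the property the abstract contrasts with position-dependent models. A compressor built for short nucleotide strings would likely behave better than `zlib` on real data.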

Peter Yianilos hasn't uploaded this paper.
