We improve the fastest known algorithm for approximate string matching, which can be used only for low error levels. By using a new method to verify potential matches and a new optimization technique for biased texts (such as English),...
      Engineering, Information Retrieval, IPL, Mathematical Sciences
In this article, a word-oriented approximate string matching approach for searching Arabic text is presented. The distance between a pair of words is determined on the basis of aligning the two words by using occurrence heuristic tables....
      Information Systems, Library and Information Studies, Approximate string matching
This paper presents a software library, nicknamed BATS, for some basic sequence analysis tasks, namely local alignments via approximate string matching, and global alignments via longest common subsequence and alignments with affine...
      Sequence Analysis, Biological Sciences, Statistical Significance, Theory and Practice
The web has become a valuable resource for almost all domains today. Search engines predominantly use the inverted indexing technique to locate the web pages containing the user's query. The performance of an inverted index fundamentally depends upon...
      String Matching, String Matching Algorithms, Web Searching, Approximate string matching
This paper focuses on the problem of alias detection based on orthographic variations of Arabic names. Alias detection is the process of identifying different variants of the same name. To detect aliases based on orthographic variations,...
      Experimental Evaluation, Approximate string matching
The δ-approximate string matching problem, recently introduced in connection with applications to music retrieval, is a generalization of the exact string matching problem for alphabets of integer numbers. In the δ-approximate variant,...
      Stringology, Efficient Algorithm for ECG Coding, Approximate string matching
A compressed full-text self-index for a text T is a data structure requiring reduced space and able to search for patterns P in T. It can also reproduce any substring of T, thus actually replacing T. Despite the recent explosion of...
      Engineering, Algorithms, Data Structure, Mathematical Sciences
We present new algorithms for approximate string matching based on simple, but efficient, ideas. First, we present an algorithm for string matching with mismatches based on arithmetical operations that runs in linear worst-case time for...
      Engineering, Algorithms, Mathematical Sciences, Pattern Matching
Approximate string matching is an important operation in information systems because an input string is often an inexact match to the strings already stored. Commonly known accurate methods are computationally expensive as they compare...
      Dictionary, Approximate string matching, N gram
Given two strings, a pattern P of length m and a text T of length n over some alphabet Σ, we consider the string matching problem under k mismatches. The well-known Shift-Add algorithm (Baeza-Yates and Gonnet, 1992) solves the problem in...
      Engineering, Algorithms, Algorithm, Information Processing
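To make the k-mismatches problem statement concrete, here is a minimal brute-force sketch in Python. It is not the Shift-Add algorithm named in the abstract, only a quadratic-time baseline that counts mismatches at every alignment; the function name `k_mismatch_search` is a hypothetical one chosen for illustration.

```python
def k_mismatch_search(pattern: str, text: str, k: int) -> list[int]:
    """Return every start index where pattern aligns with text
    with at most k mismatching positions (Hamming distance <= k)."""
    m, n = len(pattern), len(text)
    hits = []
    for start in range(n - m + 1):
        mismatches = 0
        for a, b in zip(pattern, text[start:start + m]):
            if a != b:
                mismatches += 1
                if mismatches > k:
                    break  # this alignment already exceeds the budget
        if mismatches <= k:
            hits.append(start)
    return hits


print(k_mismatch_search("abc", "abxabcaxc", 1))  # [0, 3, 6]
```

Bit-parallel methods such as Shift-Add aim to avoid the inner loop by packing the per-position counters into machine words; the baseline above is useful mainly for checking such implementations.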
We consider a version of pattern matching useful in processing large musical data: δ-matching, which consists in finding matches which are δ-approximate in the sense of the distance measured as maximum difference between symbols. The...
      Pattern Matching, String Matching, Approximate string matching, Distance Measure
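The distance described in this abstract, the maximum difference between aligned symbols, can be illustrated with a small brute-force Python sketch. It assumes symbols are integers (for example, pitch numbers) and writes the tolerance as `delta`; the function name `delta_match` is hypothetical, and the code is a naive baseline rather than any algorithm from the paper.

```python
def delta_match(pattern: list[int], text: list[int], delta: int) -> list[int]:
    """Return every start index where each pattern symbol differs from
    the aligned text symbol by at most delta."""
    m = len(pattern)
    return [
        start
        for start in range(len(text) - m + 1)
        if all(abs(p - t) <= delta
               for p, t in zip(pattern, text[start:start + m]))
    ]


# Integer sequences standing in for pitches (illustrative values only).
print(delta_match([60, 62, 64], [59, 62, 65, 60, 61, 64, 70], 1))  # [0, 3]
```

With `delta = 0` the function reduces to exact matching, which is why this kind of matching is described as a generalization of the exact problem.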
Indexing Methods for Approximate String Matching. Gonzalo Navarro, Ricardo Baeza-Yates, Erkki Sutinen, Jorma Tarhio. Abstract: Indexing for approximate text searching is a novel problem that has received significant attention because of...
      Signal Processing, Computational Biology, Indexation, Approximate string matching
In this chapter we deal with various string manipulation problems which originate from the field of computational biology and musicology. These problems are: "approximate string matching with gaps", "inference of maximal...
      Computational Biology, Data Structure, Suffix Tree, Upper Bound
This presentation looks at how spammers modify content in spam, for example by deliberately misspelling words like Viagra to get through spam filters. We use the dynamic programming algorithm to identify variations of these words....
      Dynamic programming, Spam Filtering, Approximate string matching
The Boyer-Moore idea applied in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The k mismatches problem is to find all approximate occurrences of a pattern string (length...
      Pure Mathematics, Experimental Evaluation, String Matching, Edit Distance
The newest generation of sequencing instruments, such as Illumina/Solexa Genome Analyzer and ABI SOLiD, can generate hundreds of millions of short DNA "reads" from a single run. These reads must be matched against a reference genome to...
      Bioinformatics, Reconfigurable Computing, Genomics, Field-Programmable Gate Arrays
We present a new bit-parallel technique for approximate string matching. We build on two previous techniques. The first one [Myers, J. of the ACM, 1999] searches for a pattern of length m in a text of length n permitting k differences in...
      Bioinformatics, Distance, Natural language, Search Algorithm
A compressed full-text self-index for a text T is a data structure requiring reduced space and capable of searching for patterns P in T. Furthermore, the structure can reproduce any substring of T, thus it actually replaces T. Despite the...
      Mathematics, Computer Science, Computational Biology, Space Time
We introduce a problem called Maximum Common Characters in Blocks (MCCB), which arises in applications of approximate string comparison, particularly in the unification of possibly erroneous textual data coming from different sources. We...
      Linear Programming, Integer Programming, Multidisciplinary, Unification
We propose a method for efficiently finding all parallel passages in a large corpus, even if the passages are not quite identical due to rephrasing and orthographic variation. The key ideas are the representation of each word in the...
      Approximate string matching
In this paper we consider several new versions of approximate string matching with gaps. The main characteristic of these new versions is the existence of gaps in the matching of a given pattern in a text. Algorithms are devised for each...
      Dynamic programming, Music analysis, Nordic, Approximate string matching
We present a new index for approximate string matching. The index collects text q-samples, i.e., disjoint text substrings of length q, at fixed intervals and stores their positions. At search time, part of the text is filtered out by...
      Applied Mathematics, Load Balance, Indexation, Discrete Algorithms
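The index-building step described in this abstract (disjoint substrings of length q taken at fixed intervals, stored with their positions) might be sketched in Python as follows. The function name `build_q_sample_index` and the `step` parameter are assumptions made for illustration; the search-time filtering is truncated in the abstract and is deliberately not reconstructed here.

```python
from collections import defaultdict


def build_q_sample_index(text: str, q: int, step: int) -> dict[str, list[int]]:
    """Map each q-sample (a length-q substring taken every `step`
    characters) to the list of text positions where it was sampled."""
    if q <= 0 or step < q:
        # step >= q keeps consecutive samples disjoint, as the abstract requires
        raise ValueError("need q > 0 and step >= q")
    index = defaultdict(list)
    for pos in range(0, len(text) - q + 1, step):
        index[text[pos:pos + q]].append(pos)
    return dict(index)


print(build_q_sample_index("abcabcabc", q=2, step=3))  # {'ab': [0, 3, 6]}
```

Because only one substring per interval is stored, such an index is much smaller than one over all q-grams, at the price of needing a filtering argument at search time.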
In this paper we describe a factorial language, denoted by L(S, k, r), that contains all words that occur in a string S up to k mismatches every r symbols. Then we give some combinatorial properties of a parameter, called repetition...
      Data Structure, Combinatorics on Words, Formal language, Indexation
Distinctive visual cues are of central importance for image retrieval applications, in particular, in the context of visual location recognition. While in indoor environments typically only a few distinctive features can be found, outdoors...
      Computational Complexity, Content based image retrieval, Image Retrieval, Feature Extraction
Treating electronic ink as first-class data (as opposed to simply a substitute for keyboard input) offers intriguing possibilities. The pen has well-known advantages in terms of portability and user acceptance, and ink is an extremely...
      Information Retrieval, Handwriting Recognition, User Acceptance, Approximate string matching
This study focuses on the intellectual accessibility of information in indigenous languages, using Zulu, one of the main indigenous languages in South Africa, as a test case. Both Cross-Lingual Information Retrieval (CLIR) and metadata...
      Indigenous Knowledge, Knowledge Representation, Information Access, Cross Lingual Information Retrieval
In this paper we consider the confidentiality aspects of particular Grid applications, such as genetic applications. The search for DNA similarities is one of the interesting areas of genetic biology. However, DNA sequences...
      Genetics, String Matching, Edit Distance, Grid System
Searching a large data set for the strings most similar to a given one, according to the edit distance, is a time-consuming process. In this paper we investigate the performance of metric trees, namely the M-tree, when they are...
      Computer Science, Data Mining, Data Analysis, Pattern Recognition
The k differences approximate string matching problem specifies a text string of length n, a pattern string of length m, the number k of differences (insertions, deletions, substitutions) allowed in a match, and asks for every location in...
      Genetics, Computer Science, Molecular Biology, Pattern Recognition
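As an illustration of the k differences problem stated in this abstract, the following Python sketch uses the classical column-by-column dynamic programming formulation in O(mn) time. It is a textbook baseline, not the method of the paper, and it assumes that "location" means the text position where an approximate occurrence ends; the function name `k_differences_ends` is hypothetical.

```python
def k_differences_ends(pattern: str, text: str, k: int) -> list[int]:
    """Return every text index j such that some substring of text ending
    at j is within k insertions, deletions, or substitutions of pattern."""
    m = len(pattern)
    # column[i]: fewest edits turning pattern[:i] into some substring
    # of the text that ends at the current position.
    column = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text):
        prev_diag = column[0]  # top cell is always 0: a match may start anywhere
        for i in range(1, m + 1):
            old = column[i]
            cost = 0 if pattern[i - 1] == ch else 1
            column[i] = min(prev_diag + cost,   # match or substitution
                            old + 1,            # extra text character
                            column[i - 1] + 1)  # missing pattern character
            prev_diag = old
        if column[m] <= k:
            ends.append(j)
    return ends


print(k_differences_ends("abc", "xabxcx", 1))  # [2, 3, 4]
```

Keeping a single column means the sketch needs only O(m) working space, although the running time remains proportional to mn.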
Here, we present PatMatch, an efficient, web-based pattern-matching program that enables searches for short nucleotide or peptide sequences such as cis-elements in nucleotide sequences or small domains and motifs in protein sequences. The...
      Genetics, Computer Science, Biology, Medicine
This article shows how finite-state methods can be employed in a new and different task: the conflation of personal name variants into standard forms. In bibliographic databases and citation index systems, variant forms create problems of...
      Information Systems, Computer Science, Library and Information Studies, Indexation
Approximate string matching is an important problem in Computer Science. The standard solution for this problem is a dynamic programming algorithm that takes O(mn) time and space for two strings of length m and n. Landau and Vishkin...
      Approximate string matching, Dynamic Programming Algorithm
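The standard O(mn) time and space dynamic programming solution mentioned in this abstract can be sketched in Python as below, assuming unit costs for insertion, deletion, and substitution. The function name `edit_distance` is chosen for illustration; the Landau and Vishkin improvement that the abstract goes on to discuss is not shown.

```python
def edit_distance(a: str, b: str) -> int:
    """Fill the full (m+1) x (n+1) table of prefix distances and
    return the unit-cost edit distance between a and b."""
    m, n = len(a), len(b)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i  # delete all i characters of a[:i]
    for j in range(n + 1):
        dist[0][j] = j  # insert all j characters of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # match or substitution
    return dist[m][n]


print(edit_distance("kitten", "sitting"))  # 3
```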
We explain new ways of constructing search algorithms using fuzzy sets and fuzzy automata. This technique can be used to search or match strings in special cases when some pairs of symbols are more similar to each other than the others....
      Soft Computing, Search Algorithm, String Matching, Fuzzy Set
Approximate string matching is an important paradigm in domains ranging from speech recognition to information retrieval and molecular biology. In this paper, we introduce a new formalism for a class of applications that takes two strings...
      Set Theory, Information Retrieval, Molecular Biology, Speech Recognition
Abstract: Several new number representations based on a Residue Number System are presented which use the smallest prime numbers as moduli and are suited for parallel computations on a reconfigurable mesh architecture. The bit model of...
      Distributed Computing, Time Use, Residue Number System, Computer Software
Index-based search algorithms are an important part of genomic search, and index construction is the key to an index-based search algorithm for computing similarities between two DNA sequences. In this paper, we propose an efficient...
      Mathematics, Computer Science, Algorithms, Medicine
This paper introduces a framework for clarifying and formalizing the duplicate document detection problem. Four distinct models are presented, each with a corresponding algorithm for its solution derived from the realm of approximate...
      Computer Science, Information Management, Data Mining, Packaging
We propose novel machine learning methods for exploring the domain of music performance praxis. Based on simple measurements of timing and intensity in 12 recordings of a Schubert piano piece, short performance sequences are fed into a...
      Cognitive Science, Artificial Intelligence, Machine Learning, Evolutionary Algorithm
In this paper we focus on the construction of the minimal deterministic finite automaton S_k that recognizes the set of suffixes of a word w up to k errors. We present an algorithm that makes use of the automaton S_k in order to accept in...
      Mathematics, Computer Science, Combinatorics on Words, Application