Katerina Perdikuri

University of Patras, Computer Engineering and Informatics, Graduate Student

Papers by Katerina Perdikuri

Computing the Repetitions in a Biological Weighted Sequence

One of the most important goals in computational molecular biology is locating repeated patterns in nucleic or protein sequences, and identifying structural or functional motifs that are common to a set of such sequences. Although the problem of computing the repetitions in biological sequences has been extensively studied in the relevant literature, the problem of computing the repetitions in biological weighted sequences has not been efficiently solved. In this work we present an O(n^2) algorithm for computing the set of repetitions in a biological weighted sequence with probability of appearance larger than 1/k, where k is a given constant. Our algorithm can be applied in the detection of repeated patterns in biological weighted sequences such as assembled DNA sequences.
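As a minimal illustration of the 1/k probability threshold mentioned in the abstract above, the sketch below models a weighted sequence as a list of per-position probability distributions over symbols. It assumes (the abstract does not spell this out) that the probability of a word occurring at a position is the product of the per-position symbol probabilities. The names `occurrence_probability` and `repeated_words` are hypothetical, and the brute-force enumeration is for illustration only; it is not the O(n^2) algorithm of the paper.

```python
from itertools import product
from math import prod


def occurrence_probability(wseq, start, word):
    """Probability that `word` occurs at `start`, assuming independent positions."""
    if start + len(word) > len(wseq):
        return 0.0
    return prod(wseq[start + i].get(ch, 0.0) for i, ch in enumerate(word))


def repeated_words(wseq, length, k):
    """Words of the given length that occur at two or more positions,
    each time with probability >= 1/k (brute-force enumeration)."""
    threshold = 1.0 / k
    positions = {}
    for start in range(len(wseq) - length + 1):
        # Only symbols with non-zero probability at a position can contribute.
        choices = [sorted(wseq[start + i]) for i in range(length)]
        for letters in product(*choices):
            word = "".join(letters)
            if occurrence_probability(wseq, start, word) >= threshold:
                positions.setdefault(word, []).append(start)
    return {w: p for w, p in positions.items() if len(p) >= 2}


# Hypothetical weighted sequence over {A, C}; positions 1 and 3 are uncertain.
wseq = [
    {"A": 1.0},
    {"A": 0.5, "C": 0.5},
    {"A": 1.0},
    {"A": 0.25, "C": 0.75},
]
print(repeated_words(wseq, 2, k=2))
```

With k = 2, a word counts as a repetition here only if every one of its occurrences has probability at least 0.5; raising k admits less certain occurrences.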

String Regularities with Don't Cares (extended version)

Computing the repetitions in a weighted sequence

Computing the Repetitions in a Weighted Sequence using Weighted Suffix Trees

... Iliopoulos, Costas S. and Makris, C. and Panagis, I. and Perdikuri, Katerina and Theodoridis, ...

Identification of protein patterns in nucleic acid sequences

New Upper Bounds on Various String Manipulation Problems

In this chapter we deal with various string manipulation problems which originate from the field of computational biology and musicology. These problems are: "approximate string matching with gaps", "inference of maximal pairs in a set of strings" and "handling of weighted sequences". We provide new upper bounds for solving these problems and for the third we propose a novel data structure, for the representation of the weighted sequences, which inherits most of the properties of the suffix tree.

New Upper Bounds on Various String

String Regularities with Don't Cares

Algorithms for computing typical regularities in strings with don't care symbols are presented. The period of a string of length n over an alphabet Σ can be computed in O(n log n log |Σ|) worst-case time. The computation of all possible borders, the border array and all covers of a string requires quadratic time in the worst case, but in practice it performs very well. The expected average running time is linear.
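To make the notion of a border in a string with don't care symbols concrete, here is a naive quadratic sketch. It assumes that `*` denotes the don't care symbol and that two symbols match when they are equal or when either is a don't care; both choices, and the names `symbols_match`, `borders` and `periods`, are illustrative assumptions rather than the paper's algorithms.

```python
DONT_CARE = "*"  # hypothetical choice of don't care symbol


def symbols_match(a, b):
    """Two symbols match if they are equal or either is a don't care."""
    return a == b or a == DONT_CARE or b == DONT_CARE


def borders(x):
    """Lengths b (0 < b < len(x)) for which the prefix of length b
    matches the suffix of length b, position by position."""
    n = len(x)
    return [
        b
        for b in range(1, n)
        if all(symbols_match(x[i], x[n - b + i]) for i in range(b))
    ]


def periods(x):
    """Under this pairwise matching, a border of length b gives a period n - b;
    the length n itself is included as the trivial period."""
    n = len(x)
    return sorted(n - b for b in borders(x)) + [n]


print(borders("a*ab"))
print(periods("a*ab"))
```

Each candidate border length is checked in O(n) time, so the whole scan is quadratic in the worst case, in line with the worst-case bound quoted in the abstract. Note that matching with don't cares is not transitive, so other definitions of a period are possible.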

Evolutionary Biomedical Signal Processing Techniques

Digital Signal Processing techniques constitute the basic scientific approach used in most of the current advances in medicine. In particular, the development of algorithms to extract, predict and model raw biomedical data series has revolutionized many routine, but data-intensive, areas of current medical practice. In this contribution, we present an evolutionary technique for modelling and analysing Non-linear Time Series (NLTS). The proposed methodology has already been used in two cases of great biomedical importance, and we therefore explore its effectiveness on other biomedical signals.

The Weighted Suffix Tree: An Efficient Data Structure for Handling Molecular Weighted Sequences and its Applications

Fundamenta Informaticae

In this paper we introduce the Weighted Suffix Tree, an efficient data structure for computing string regularities in weighted sequences of molecular data. Molecular Weighted Sequences can model important biological processes such as the DNA Assembly Process or the DNA-Protein Binding Process. Thus, pattern matching or identification of repeated patterns in biological weighted sequences is a very important procedure in the translation of gene expression and regulation. We present time and space efficient algorithms for constructing the weighted suffix tree and some applications of the proposed data structure to problems taken from the Molecular Biology area such as pattern matching, repeats discovery, discovery of the longest common subsequence of two weighted sequences and computation of covers.

Knowledge discovery in patent databases

Proceedings of the eleventh international conference on Information and knowledge management - CIKM '02, 2002

Nowadays, business, scientific and personal databases are growing at an exponential rate. However, what is truly valuable is the knowledge that can be extracted from the stored data. Knowledge Discovery in patent databases was traditionally based on manual analysis carried out by statistical experts. Nowadays the increasing interest of many actors has led to the development of ...

The Pattern Matching Problem in Biological Weighted Sequences

Fun with Algorithms, 2000

In this paper we develop new and efficient algorithms for the problems of pattern matching and identification of repeated patterns in biological weighted sequences. Biological Weighted Sequences can model important biological processes such as the DNA Assembly Process or the DNA-Protein Binding Process. Thus, pattern matching or identification of repeated patterns in biological weighted sequences is a very important ...

Motif Extraction from Weighted Sequences

Lecture Notes in Computer Science, 2004

We present in this paper three algorithms. The first extracts repeated motifs from a weighted sequence. The motifs correspond to words which occur at least q times and with Hamming distance e in a weighted sequence with probability ≥ 1/k each time, where k is a small constant. The second algorithm extracts common motifs from a set of N ≥ 2 weighted sequences with Hamming distance e. In the second case, the motifs must occur twice with probability ≥ 1/k, in 1 ≤ q ≤ N distinct sequences of the set. The third algorithm extracts maximal pairs from a weighted sequence. A pair in a sequence is the occurrence of the same substring twice. In addition, the algorithms presented in this paper improve slightly on previous work on these problems.
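The notion of a pair used above can be illustrated on an ordinary (non-weighted) string. The sketch below assumes the usual definition of a maximal pair: two occurrences of the same substring that cannot be extended to the left or to the right and still agree. The function name `maximal_pairs` and the `min_len` parameter are hypothetical, the enumeration is brute force, and the probability threshold and Hamming distance of the weighted setting are deliberately left out.

```python
def maximal_pairs(x, min_len=1):
    """Return (i, j, length) triples with i < j such that
    x[i:i+length] == x[j:j+length] and the match cannot be
    extended one symbol to the left or to the right."""
    n = len(x)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            # Not left-maximal: the preceding symbols also agree.
            if i > 0 and x[i - 1] == x[j - 1]:
                continue
            # Extend to the right until a mismatch or the end of the string.
            length = 0
            while j + length < n and x[i + length] == x[j + length]:
                length += 1
            if length >= min_len:
                pairs.append((i, j, length))
    return pairs


print(maximal_pairs("abcxabcy"))
```

In the example string, the two occurrences of `abc` form the only maximal pair; the shorter repeats `bc` and `c` are skipped because they can be extended to the left.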

Time and space efficient content queries for video databases

Indexing video content is one of the most important problems in video databases. In this paper we present a simple optimal algorithm for this problem that answers certain content queries invoking video functions in linear time and space in terms of the number of the objects appearing in the video. To accomplish this, we make a straightforward reduction of this ...

New Upper Bounds on Various String Manipulation Problems

In this chapter we deal with various string manipulation problems which originate from the field of computational biology and musicology. These problems are: "approximate string matching with gaps", "inference of maximal pairs in a set of strings" and "handling of weighted sequences". We provide new upper bounds for solving these problems and for the third we propose a novel ...

Efficient Algorithms for Handling Molecular Weighted Sequences

IFIP International Federation for Information Processing, 2004

In this paper we introduce the Weighted Suffix Tree, an efficient data structure for computing string regularities in weighted sequences of molecular data. Molecular Weighted Sequences can model important biological processes such as the DNA Assembly Process or the DNA-Protein Binding Process. Thus, pattern matching or identification of repeated patterns in biological weighted sequences is a very important procedure in the translation of gene expression and regulation. We present time and space efficient algorithms for constructing the weighted suffix tree and some applications of the proposed data structure to problems taken from the Molecular Biology area such as pattern matching, repeats discovery, discovery of the longest common subsequence of two weighted sequences and computation of covers.

Computing the repetitions in a weighted sequence

Algorithms for extracting motifs from biological weighted sequences

Journal of Discrete Algorithms, 2007

In this paper we present three algorithms for the Motif Identification Problem in Biological Weighted Sequences. The first algorithm extracts repeated motifs from a biological weighted sequence. The motifs correspond to repetitive words which are approximately equal, under a Hamming distance, with probability of occurrence ≥ 1/k, where k is a small constant. The second algorithm extracts common motifs from a set of N ≥ 2 weighted sequences. In this case, the motifs consist of words that must occur with probability ≥ 1/k, in 1 ≤ q ≤ N distinct sequences of the set. The third algorithm extracts maximal pairs from a biological weighted sequence. A pair in a sequence is the occurrence of the same word twice. In addition, the algorithms presented in this paper improve previous work on these problems.

Computation of Repetitions and Regularities of Biologically Weighted Sequences

Journal of Computational Biology, 2006

Biological Weighted Sequences are used extensively in Molecular Biology as profiles for protein families, in the representation of binding sites and often for the representation of sequences produced by a shotgun sequencing strategy. In this paper we address three fundamental problems in the area of Biological Weighted Sequences: i) Computation of Repetitions, ii) Pattern Matching and iii) Computation of Regularities. To the best of our knowledge, this is the first time these problems are tackled in the relevant literature. Our algorithms can be used as basic building blocks for more sophisticated algorithms applied on weighted sequences. Preliminary forms of the results in this paper were presented at the conferences Fun with Algorithms [Iliopoulos et al. 2004b], CompBionets [Christodoulakis et al. 2004a] and ICCMSE [Christodoulakis et al. 2004b].

Motif Extraction from Biological Sequences: Trends and Contributions to Other Scientific Fields

International Conference on Information Technology and Applications, 2005

In this paper we present algorithms for the localization and extraction of interesting motifs from biological sequences. We are especially interested in weighted sequences, which are extensively used in molecular biology as profiles for protein families and for the representation of binding sites. It is our belief that these algorithms can also be applied to other information technology applications such ...
