Pattern Discovery
Recent papers in Pattern Discovery
To uncover qualitative and quantitative patterns in a data set is a challenging task for research in the area of machine learning and data analysis. Due to the complexity of real-world data, high-order (polythetic) patterns or event... more
The increased usage of the World Wide Web (WWW) has produced a vast repository of data on users’ interactions with websites; this data, recorded in weblogs, is unstructured, unlabeled, highly redundant and less reliable. In addition, the... more
We describe a model for strings of characters that is loosely based on the Lempel Ziv model with the addition that a repeated substring can be an approximate match to the original substring; this is close to the situation of DNA, for... more
Recommender systems are helpful tools which provide an adaptive Web environment for Web users. Recently, a number of Web page recommender systems have been developed to extract the user behavior from the user's navigational path and... more
The detection of frequently occurring patterns, also called motifs, in data streams has been recognized as an important task. To find these motifs, we use an advanced event encoding and pattern discovery algorithm. As a large time series... more
This paper uses disaggregated export data to explore the relationship between economic discovery and economic development. We find that discoveries, or episodes when countries begin exporting a new product, are not limited to so-called... more
This study presents an unsupervised feature selection approach for the discovery of significant patterns in seismic wavefields. We iteratively reduce the number of features generated from seismic time series by first considering... more
With the rapid development of information technology, the World Wide Web has been widely used in various applications, such as search engines, online learning and electronic commerce. These applications are used by a diverse population of... more
Knowing patterns of relationship in a social network is very useful for law enforcement agencies to investigate collaborations among criminals, for businesses to exploit relationships to sell products, or for individuals who wish to... more
Motivation: Transcription factor binding sites often differ significantly in their primary sequence and can hardly be aligned. Often one set of sites can contain several subsets of sequences that follow not just one but several different... more
A very large percentage of business and academic data is stored in textual format. With the exception of metadata, such as author, date, title and publisher, this data is not overtly structured like the standard, mainly numerical, data in... more
Application of data mining techniques to the World Wide Web, referred to as Web mining, has been the focus of several recent research projects and papers. However, there is no established vocabulary, leading to confusion when comparing... more
Decision support nowadays is more and more targeted to large scale complicated systems and domains. The success of a decision support system relies mainly on its capability of processing large amounts of data and efficiently extracting... more
This paper gives a comprehensive overview of pattern discovery in Web usage mining. Web site designers should have a clear understanding of users' profiles and site objectives, as well as detailed knowledge of the way users will... more
Process discovery is the automated construction of structured process models from information system event logs. Such event logs often contain positive examples only. Without negative examples, it is a challenge to strike the right... more
Discovery of interesting or frequently appearing time series patterns is one of the important tasks in various time series data mining applications. However, recent research criticized that discovering subsequence patterns in time series... more
Discovering patterns in graphs has long been an area of interest. In most contemporary approaches to such pattern discovery either quantitative anomalies or frequency of substructure is used to measure the interestingness of a pattern. In... more
Organizations are taking advantage of "data-mining" techniques to leverage the vast amounts of data captured as they process routine transactions. Data mining is the process of discovering hidden structure or patterns in data. However,... more
One of the popular trends in computer science has been the development of intelligent web-based systems. Demand for such systems forces designers to make use of knowledge discovery techniques on web server logs. Web usage mining has become a... more
A major source of revenue shrink in retail stores is the cashier's intentional or unintentional failure to check out items properly. More recently, a few automated surveillance systems have been developed to monitor cashier lanes... more
We are designing new data mining techniques on boolean contexts to identify a priori interesting concepts, i.e., closed sets of objects (or transactions) and associated closed sets of attributes (or items). We propose a new algorithm... more
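As a rough illustration of the "closed sets" mentioned in the entry above, the sketch below computes the closure of an itemset in a small boolean context. It is a generic textbook construction, not the algorithm the paper proposes; the toy `context` and the function names `extent`, `intent` and `closure` are assumptions made for this example.

```python
def extent(context, items):
    """Objects (transactions) whose attribute sets contain every item in `items`."""
    return frozenset(o for o, attrs in context.items() if items <= attrs)


def intent(context, objects):
    """Attributes shared by all `objects` (every attribute if `objects` is empty)."""
    result = frozenset().union(*context.values())
    for o in objects:
        result = result & context[o]
    return result


def closure(context, items):
    """Largest attribute set shared by exactly the objects that contain `items`."""
    return intent(context, extent(context, frozenset(items)))


# Hypothetical boolean context: object -> set of attributes it has.
context = {
    "t1": {"a", "b", "c"},
    "t2": {"a", "b"},
    "t3": {"b", "c"},
    "t4": {"a", "b", "c"},
}

# An itemset is closed when it equals its own closure.
print(sorted(closure(context, {"a"})))  # every object with "a" also has "b"
print(sorted(closure(context, {"b"})))  # {"b"} is already closed
```

A pair (objects, attributes) in which each side is the closure of the other is what the entry calls a concept; here `extent(context, frozenset({"a", "b"}))` together with `{"a", "b"}` forms one such pair.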
Background: Pattern discovery in DNA sequences is one of the most fundamental problems in molecular biology with important applications in finding regulatory signals and transcription factor binding sites. An important task in this... more
A retail company's data may be geographically spread across different locations because of the huge amount of data and rapid growth in transactions. For decision making, however, knowledge workers need integrated data from all sites. Therefore the main... more
We consider the interoperability of information systems within a distributed environment, such as across statistical organisations of the Member States of the European Union. Within a logical layer between the physical storage of the data... more
Scoring sentences in documents given abstract summaries created by humans is important in extractive multi-document summarization. In this paper, we formulate extractive summarization as a two step learning problem building a generative... more
Removing objects that are noise is an important goal of data cleaning as noise hinders most types of data analysis. Most existing data cleaning methods focus on removing noise that is the result of low-level data errors that result from... more
The explosive growth of the Internet has given rise to many websites which maintain large amounts of user information. To utilize this information, identifying the usage patterns of users is very important. Web usage mining is one of the processes... more
Pattern-based Java bytecode compression techniques rely on the identification of identical instruction sequences that occur more than once. Each occurrence of such a sequence is substituted by a single instruction. The sequence defines a... more
Inductive databases (IDBs) represent a database view on data mining and knowledge discovery. IDBs contain not only data, but also generalizations (patterns and models) valid in the data. In an IDB, ordinary queries can be used to access... more
Scientific measurements are often depicted as line graphs. State-of-the-art high-throughput systems in life sciences, telemetry and electronics measurement rapidly generate hundreds to thousands of such graphs. Despite the increasing... more
Most modern lossless data compression techniques are based on dictionaries. If a string of data being compressed matches a portion previously seen, then that string is included in the dictionary and its reference is... more
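To make the dictionary idea in the entry above concrete, here is a minimal sketch of LZ78-style coding, one classic dictionary scheme. It is offered only as a general illustration and is not the method of the paper listed above, whose abstract is truncated; the function names `lz78_compress` and `lz78_decompress` are chosen for this example.

```python
def lz78_compress(text):
    """Encode `text` as (dictionary index, next character) pairs."""
    dictionary = {"": 0}  # phrase -> index; index 0 is the empty phrase
    output = []
    phrase = ""
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            # Keep extending the match against previously seen phrases.
            phrase = candidate
        else:
            # Emit a reference to the longest known phrase plus one new character.
            output.append((dictionary[phrase], ch))
            dictionary[candidate] = len(dictionary)
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit its prefix and last character.
        output.append((dictionary[phrase[:-1]], phrase[-1]))
    return output


def lz78_decompress(pairs):
    """Rebuild the text by replaying the same dictionary construction."""
    phrases = [""]
    pieces = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        pieces.append(phrase)
        phrases.append(phrase)
    return "".join(pieces)


sample = "abababab"
encoded = lz78_compress(sample)
print(encoded)
print(lz78_decompress(encoded) == sample)
```

Each output pair refers back to an earlier phrase by index, so repeated substrings cost one reference plus one character instead of being stored again.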
Several pattern discovery methods proposed in the data mining literature have the drawbacks that they discover too many obvious or irrelevant patterns and that they do not leverage to a full extent valuable prior domain knowledge that... more
The study of metabolic pathways is becoming increasingly important to exploit an integrated, systems-level approach for optimizing a desired cellular property or phenotype. In this context, the integration of genomics data with genetic,... more
To engage visitors to a Web site at a very early stage (i.e., before registration or authentication), personalization tools must rely primarily on clickstream data captured in Web server logs. The lack of explicit user ratings as well as... more