2012, Lecture Notes in Computer Science
This paper presents a novel approach to time series clustering based on the BIRCH algorithm. Our BIRCH-based approach clusters time series data using a multi-resolution transform as the feature extraction technique. The approach hinges on a cluster feature (CF) tree, which helps resolve the dilemma of choosing initial centers and significantly improves execution time and clustering quality. It not only takes full advantage of BIRCH's capacity to handle large databases but can also be viewed as a flexible clustering framework in which any selected clustering algorithm can be applied in Phase 3. Experimental results show that the proposed approach outperforms k-Means in both clustering quality and running time, and outperforms I-k-Means in clustering quality with nearly the same running time.
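As a rough illustration of multi-resolution feature extraction, the sketch below computes successively coarser pairwise averages of a series, in the style of a Haar decomposition. The abstract does not name the transform used, so the choice of Haar-style averaging is an assumption, and the function names `haar_levels` and `features` are hypothetical.

```python
def haar_levels(series):
    """Return Haar-style averages of a series, coarsest level first.

    Assumption: len(series) is a power of two. Level 0 holds the overall
    mean; the last level is the original series as floats.
    """
    n = len(series)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    levels = [[float(x) for x in series]]
    # Halve the resolution repeatedly by averaging adjacent pairs.
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([(prev[i] + prev[i + 1]) / 2.0
                       for i in range(0, len(prev), 2)])
    levels.reverse()
    return levels


def features(series, resolution):
    """Feature vector with 2**resolution averaged coefficients."""
    return haar_levels(series)[resolution]
```

A clustering step could then run on `features(series, r)` for a small `r` first and move to finer resolutions only if needed, which is one plausible reading of how a multi-resolution representation reduces the cost of clustering long series.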
Data Mining and Knowledge Discovery, 1997
Data clustering is an important technique for exploratory data analysis and has been studied for several years. It has been shown to be useful in many practical domains such as data classification and image processing. Recently, there has been a growing emphasis on exploratory analysis of very large datasets to discover useful patterns and/or correlations among attributes. This is called data mining, and data clustering is regarded as a particular branch of it. However, existing data clustering methods do not adequately address the problem of processing large datasets with a limited amount of resources (e.g., memory and CPU cycles). As the dataset size increases, they do not scale up well in terms of memory requirement, running time, and result quality.
Journal La Multiapp
A time series is one of the forms of data presentation used in many studies. It is convenient, easy, and informative. Clustering is one of the tasks of data processing, so methods for clustering time series are currently among the most relevant. Clustering time series data aims to create clusters with high similarity within a cluster and low similarity between clusters. This work is devoted to clustering time series. Various methods of time series clustering are considered, and examples are given for real data.
1996
Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters, or densely populated regions, in a multi-dimensional dataset. Prior work does not adequately address the problem of large datasets and minimization of I/O costs.
Clustering is considered the most important unsupervised learning problem. It aims to find structure in a collection of unlabeled data. Dealing with a large number of data items can be problematic because of time complexity. In addition, high-dimensional data such as time series data is a challenging area in data clustering. Novel algorithms need to be robust, scalable, efficient, and accurate to cluster these kinds of data. In this study, we propose a two-stage algorithm based on K-Means to achieve this objective.
TheScientificWorldJournal, 2014
Time series clustering is an important solution to various problems in numerous fields of research, including business, medical science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially designed for static data. This impracticality results in poor clustering accuracy in several systems. In this paper, a new hybrid clustering algorithm is proposed based on the similarity in shape of time series data. Time series data are first grouped as subclusters based on similarity in time. The subclusters are then merged using the k-Medoids algorithm based on similarity in shape. This model has two contributions: (1) it is more accurate than other conventional and hybrid approaches and (2) it determines the similarity in shape among time series data with a low complexity. To evaluate the accuracy of the proposed model, the model is tested extensively using syntactic and real-world time series datasets.
2015
Clustering divides data into groups on the basis of shared characteristics; each group, called a cluster, consists of objects that are similar to one another and dissimilar to objects in other groups. A great deal of work is now being carried out on trend analysis of time series data sets. The genetic algorithm struggles with time series data because it considers each time stamp as a single entity during clustering. Some algorithms still give good results on certain types of time series data, but overall there is no generalized algorithm that handles different types of time series data sets. This paper presents a comprehensive analysis of clustering algorithms such as k-means, hierarchical clustering, SOM, and GMM on two types of time series data: microarray and financial. These algorithms are compared on the following factors: size of data, number of clusters, type of data set, homogeneity score, separation score, si...
Data Mining and Knowledge Discovery, 2006
BIRCH is a clustering algorithm suitable for very large data sets. The algorithm builds a CF-tree in which all entries in each leaf node must satisfy a uniform threshold T, and the CF-tree is rebuilt at each stage with a different threshold. However, using a single threshold causes many shortcomings in the BIRCH algorithm. This paper proposes a solution to these shortcomings that uses multiple thresholds instead of a single threshold.
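The threshold test on a leaf entry can be sketched with the standard cluster feature triple (N, LS, SS), which is additive and therefore cheap to update. This is a minimal sketch, not the method of this paper: the class and function names are hypothetical, the test uses the radius (BIRCH can equally use the diameter), and a single `threshold` argument is assumed, although a per-entry value could be passed to mimic multiple thresholds.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterFeature:
    n: int       # number of points summarized
    ls: list     # linear sum of the points, per dimension
    ss: float    # sum of squared norms of the points

    @classmethod
    def from_point(cls, point):
        p = [float(x) for x in point]
        return cls(1, p, sum(x * x for x in p))

    def merged(self, other):
        # Additivity: the CF of a union is the component-wise sum.
        return ClusterFeature(self.n + other.n,
                              [a + b for a, b in zip(self.ls, other.ls)],
                              self.ss + other.ss)

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius(self):
        # sqrt(mean squared distance to centroid) = sqrt(SS/N - |c|^2)
        c2 = sum(x * x for x in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))


def try_absorb(entry, point, threshold):
    """Absorb point into entry if the merged radius stays within threshold."""
    candidate = entry.merged(ClusterFeature.from_point(point))
    if candidate.radius() <= threshold:
        return candidate, True
    return entry, False
```

When `try_absorb` returns `False`, a full CF-tree would start a new leaf entry for the point and split the node if it overflows; that bookkeeping is omitted here.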
2005
Existing methods for time series clustering that rely on the actual data values can become impractical, since they do not easily handle datasets with high dimensionality, missing values, or different lengths. In this paper, a dimension reduction method is proposed that replaces the raw data with some global measures of time series characteristics. These measures are then clustered using a self-organizing map.
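To make the idea of global measures concrete, the sketch below maps a series of any length to a fixed-length vector. The four measures chosen here (mean, standard deviation, linear trend slope, lag-1 autocorrelation) are illustrative assumptions, not the measures used in the paper, and `global_features` is a hypothetical name.

```python
import statistics


def global_features(series):
    """Summarize a series as [mean, std, trend slope, lag-1 autocorrelation]."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)

    # Least-squares slope of the values against the time index 0..n-1.
    t_mean = (n - 1) / 2.0
    num = sum((t - t_mean) * (x - mean) for t, x in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den

    # Lag-1 autocorrelation; defined as 0.0 for a constant series.
    var_sum = sum((x - mean) ** 2 for x in series)
    if var_sum == 0:
        acf1 = 0.0
    else:
        acf1 = sum((series[i] - mean) * (series[i + 1] - mean)
                   for i in range(n - 1)) / var_sum
    return [mean, std, slope, acf1]
```

Because every series yields a vector of the same length, series of different lengths become directly comparable, and any vector-based clusterer (a self-organizing map in the paper) can be applied to the result.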
Series in Machine Perception and Artificial Intelligence, 2004
In recent years, there has been an explosion of interest in mining time series databases. As with most computer science problems, representation of the data is the key to efficient and effective solutions. One of the most commonly used representations is piecewise linear approximation. This representation has been used by various researchers to support clustering, classification, indexing and association rule mining of time series data. A variety of algorithms have been proposed to obtain this representation, with several algorithms having been independently rediscovered several times. In this paper, we undertake the first extensive review and empirical comparison of all proposed techniques. We show that all these algorithms have fatal flaws from a data mining perspective. We introduce a novel algorithm that we empirically show to be superior to all others in the literature.