... Huang [18] proposed an algorithm based on the K-means algorithm philosophy to cluster mixed data. Fuzzy c-means (FCM), proposed by Dunn [19] and extended by Bezdek [20], is one of the most well-known methodologies in clustering analysis. ...
Performance of iterative clustering algorithms that converge to numerous local minima depends highly on the initial cluster centers. Generally, initial cluster centers are selected randomly. In this paper, we propose an algorithm to compute initial cluster centers for K-means clustering. The algorithm is based on two observations: some patterns are so similar to each other that they have the same cluster membership irrespective of the choice of initial cluster centers, and an individual attribute may provide some information about the initial cluster centers. The initial cluster centers computed using this methodology are found to be very close to the desired cluster centers of iterative clustering algorithms. The procedure is applicable to clustering algorithms for continuous data. We demonstrate its application to the K-means clustering algorithm. The experimental results show improved and consistent solutions using the proposed algorithm.
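To make the idea of data-driven seeding concrete, here is a minimal sketch that seeds K-means from per-attribute structure instead of random picks. The quantile-based heuristic, the function names, and the synthetic data are illustrative assumptions, not the paper's actual procedure.

```python
# Minimal sketch: seed K-means with data-driven centers instead of random
# ones. The per-attribute quantile heuristic below is an illustrative
# assumption, not the authors' algorithm.
import numpy as np
from sklearn.cluster import KMeans

def attribute_based_centers(X, k):
    """Place k seeds at interior quantiles of each attribute, so the
    starting centers already follow the data's per-attribute spread."""
    qs = np.linspace(0.0, 1.0, k + 2)[1:-1]      # k interior quantiles
    return np.quantile(X, qs, axis=0)            # shape: (k, n_features)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2)) for c in (0, 4, 8)])

centers = attribute_based_centers(X, k=3)
km = KMeans(n_clusters=3, init=centers, n_init=1).fit(X)  # deterministic start
print(np.round(km.cluster_centers_, 2))
```

Because the seeds are fixed by the data rather than drawn at random, repeated runs give the same solution, which is the consistency property the abstract emphasises.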
The field enhancement factor of a carbon nanotube (CNT) placed in a cluster of CNTs is smaller than that of an isolated CNT because the electric field on one tube is screened by neighbouring tubes. This screening depends on the length of the CNTs and the spacing between them. We have derived an expression to compute the field enhancement factor of CNTs under any positional distribution of CNTs, using a model of a floating sphere between parallel anode and cathode plates. Using this expression, we can compute the field enhancement factor of a CNT in a cluster (non-uniformly distributed CNTs). The expression is also used to compute the field enhancement factor of a CNT in an array (uniformly distributed CNTs). Comparisons with experimental results and existing models are presented.
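The dependence of screening on tube height and spacing can be illustrated numerically. The sketch below uses two commonly quoted approximations assumed purely for illustration (an isolated-tube factor of roughly h/r + 3.5 for a hemisphere-capped post, and an empirical exponential screening fit); neither is the closed-form expression derived in the paper.

```python
# Rough numeric sketch of screening in a CNT array. Both formulas are
# commonly quoted approximations assumed for illustration; neither is the
# expression derived in the paper.
import math

def beta_isolated(h, r):
    """Approximate enhancement factor of an isolated CNT (height h, tip radius r)."""
    return h / r + 3.5

def beta_screened(h, r, s):
    """Empirical exponential fit: screening weakens as spacing s grows past h."""
    return beta_isolated(h, r) * (1.0 - math.exp(-2.3172 * s / h))

h, r = 1000.0, 5.0                       # nm
for s in (250.0, 1000.0, 2000.0):        # inter-tube spacing, nm
    print(f"s = {s:6.0f} nm -> beta ~ {beta_screened(h, r, s):6.1f}")
```

The output shows the qualitative behaviour described above: tubes packed closer than about their own height screen each other strongly, and the enhancement factor recovers toward the isolated-tube value as the spacing grows.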
In the paper (Ahmad and Tripathi 2006 Nanotechnology 17 3798), we derived an expression to compute the field enhancement factor of CNTs under any positional distribution of CNTs by using the model of a floating sphere between parallel anode and cathode plates. Using this expression, we can compute the field enhancement factor of a CNT in a cluster (non-uniformly distributed CNTs). The expression was also used to compute the field enhancement factor of a CNT in an array (uniformly distributed CNTs). We used an approximation to calculate the field enhancement factor; hence, our expressions are valid only under that assumption. Zhbanov et al (2010 Nanotechnology 21 358001) suggest a correction that allows the field enhancement factor to be calculated without this approximation. This correction can therefore improve the applicability of the model.
Dimensionality reduction is one of the key data analysis steps. Besides increasing the speed of computation, eliminating insignificant attributes from data enhances the quality of the knowledge extracted from it. In this paper, we propose an efficient, conditional-probability-based method for computing the significance of attributes. The algorithm is highly scalable and can rank all the attributes simultaneously. The proposed method can be used to analyse pre-classified data by exploiting the attribute-to-class and class-to-attribute correlations. The effectiveness of the approach is established through the analysis of various large test data sets. The method can be extended to extract classificatory knowledge from the data.
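To make the conditional-probability idea concrete, here is a minimal sketch that scores a categorical attribute by how well its values predict the class. The specific scoring rule (frequency-weighted purity, i.e. the maximum of P(class | value) per value) and all names are illustrative assumptions, not the paper's actual significance measure.

```python
# Hedged sketch: rank categorical attributes by how strongly their values
# predict the class, via conditional probabilities P(class | value). The
# scoring rule (frequency-weighted max class-posterior per value) is an
# illustrative assumption, not the paper's exact measure.
from collections import Counter, defaultdict

def attribute_significance(rows, labels, attr_index):
    """Score one attribute: 1.0 means each value maps to a single class."""
    by_value = defaultdict(list)
    for row, y in zip(rows, labels):
        by_value[row[attr_index]].append(y)
    n = len(rows)
    score = 0.0
    for value, ys in by_value.items():
        purity = max(Counter(ys).values()) / len(ys)   # max_c P(c | value)
        score += (len(ys) / n) * purity                # weight by P(value)
    return score

rows = [("red", "s"), ("red", "m"), ("blue", "s"), ("blue", "l")]
labels = ["+", "+", "-", "-"]
ranking = sorted(range(2), key=lambda j: attribute_significance(rows, labels, j),
                 reverse=True)
print(ranking)   # attribute 0 ranks first: it fully determines the class
```

The score is computed in one pass over the data per attribute, which reflects the scalability claim: every attribute can be ranked independently and simultaneously.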
Computing similarity between categorical data objects in unsupervised learning is an important data mining problem. We propose a method to compute the distance between two values of the same attribute for unsupervised learning. The approach is based on the fact that the similarity of two attribute values depends on their relationship with the other attributes. The computational cost of the method is linear in the number of data objects in the data set. To assess the effectiveness of the proposed distance measure, we use it with the K-modes clustering algorithm to cluster various categorical data sets. Significant improvement in clustering accuracy is observed compared with the results obtained using the traditional K-modes clustering algorithm.
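The sketch below illustrates the co-occurrence idea: the distance between two values x and y of attribute i is taken as the average, over every other attribute j, of the total-variation distance between the conditional distributions P(A_j | A_i = x) and P(A_j | A_i = y). This formulation is in the spirit of the abstract, not a verified reproduction of the paper's exact measure.

```python
# Hedged sketch: distance between two values of one categorical attribute,
# derived from how differently they co-occur with the other attributes.
from collections import Counter

def conditional_dist(rows, i, x, j):
    """P(A_j | A_i = x) as a dict of relative frequencies."""
    vals = [r[j] for r in rows if r[i] == x]
    return {v: n / len(vals) for v, n in Counter(vals).items()}

def value_distance(rows, i, x, y):
    """Average total-variation distance over all other attributes."""
    others = [j for j in range(len(rows[0])) if j != i]
    total = 0.0
    for j in others:
        px, py = conditional_dist(rows, i, x, j), conditional_dist(rows, i, y, j)
        support = set(px) | set(py)
        total += 0.5 * sum(abs(px.get(v, 0) - py.get(v, 0)) for v in support)
    return total / len(others)

rows = [("a", "u"), ("a", "u"), ("b", "v"), ("b", "v"), ("c", "u")]
print(value_distance(rows, 0, "a", "b"))   # 1.0: disjoint co-occurrence patterns
print(value_distance(rows, 0, "a", "c"))   # 0.0: identical co-occurrence patterns
```

Values that appear in the same contexts come out as close even though, as raw symbols, "a" and "c" are just as different as "a" and "b"; this is exactly the relationship-based similarity the abstract describes.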
Data with multi-valued categorical attributes can cause major problems for decision trees. The high branching factor can lead to data fragmentation, where decisions have little or no statistical support. In this paper, we propose a new ensemble method, Random Ordinality Ensembles (ROE), that circumvents this problem and provides significantly improved accuracies over other popular ensemble methods. We perform a random projection of the categorical data into a continuous space by imposing a random ordinality on the categorical attribute values. A decision tree that learns in this new continuous space can use binary splits, hence avoiding the data fragmentation problem. A majority-vote ensemble is then constructed from several trees, each learnt from a different continuous space. An empirical evaluation on 13 data sets shows this simple method significantly outperforms standard techniques such as Boosting and Random Forests. A theoretical study using an information-gain framework is carried out to explain the performance of Random Ordinality (RO). The study shows that ROE is quite robust to the data fragmentation problem and that RO trees are significantly smaller than trees generated using multi-way splits.
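A compact sketch of the core step follows: shuffle each attribute's values into a random order, encode them as integer ranks, train one binary-split tree per encoding, and majority-vote the trees. It is a simplified illustration of the Random Ordinality idea, not the authors' ROE implementation.

```python
# Simplified sketch of Random Ordinality: each tree sees the categorical
# data re-encoded under a fresh random ordering of every attribute's
# values, then the trees' predictions are majority-voted.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def random_ordinality_encode(X_cat, rng):
    """Map each column's values to integer ranks under a random ordering."""
    X_num = np.empty(X_cat.shape, dtype=float)
    for j in range(X_cat.shape[1]):
        values = list(np.unique(X_cat[:, j]))
        rng.shuffle(values)                      # impose a random ordinality
        rank = {v: k for k, v in enumerate(values)}
        X_num[:, j] = [rank[v] for v in X_cat[:, j]]
    return X_num

def roe_predict(X_train, y_train, X_test, n_trees=25, seed=0):
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_trees):
        # Encode train and test together so each tree uses one consistent map.
        enc = random_ordinality_encode(np.vstack([X_train, X_test]),
                                       np.random.default_rng(rng.integers(1 << 30)))
        tree = DecisionTreeClassifier(random_state=0)
        tree.fit(enc[: len(X_train)], y_train)
        votes.append(tree.predict(enc[len(X_train):]))
    votes = np.asarray(votes, dtype=int)
    return np.array([np.bincount(col).argmax() for col in votes.T])

X = np.array([["red", "s"], ["red", "m"], ["blue", "s"],
              ["blue", "l"], ["green", "m"], ["green", "l"]])
y = np.array([0, 0, 1, 1, 0, 1])
print(roe_predict(X, y, X))   # resubstitution demo on a toy data set
```

Each random ordering yields a different continuous space, so the trees are diverse, yet every tree uses binary thresholds rather than one branch per category, which is how the method sidesteps fragmentation.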
The K-modes clustering algorithm [1] has shown great promise for clustering large data sets with categorical attributes. However, it suffers from the drawback of random selection of the initial points (modes) of the clusters: different initial points lead to different cluster formations. In this paper, the density-based multiscale data condensation [2] approach with Hamming distance [1] is used to extract K initial points. Experiments show that the K-modes clustering algorithm using these initial points produces improved and more consistent results than the random selection method.
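The sketch below shows one way density-based, Hamming-distance seeding could look: repeatedly pick the object whose r-th nearest neighbour is closest (i.e. the densest region) and prune its neighbourhood before choosing the next seed. It is a simplified illustration of the multiscale-condensation idea, not the cited method verbatim.

```python
# Hedged sketch: density-based seeding for K-modes under Hamming distance.
# Seeds are taken from the densest remaining regions, so repeated runs
# return the same starting modes (unlike random selection).

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def density_based_modes(data, k, r=2):
    data = [tuple(row) for row in data]
    remaining = list(range(len(data)))
    seeds = []
    while len(seeds) < k and remaining:
        # Radius to the r-th nearest neighbour; smaller radius = denser region.
        radii = []
        for i in remaining:
            d = sorted(hamming(data[i], data[j]) for j in remaining if j != i)
            radii.append(d[min(r, len(d)) - 1] if d else 0)
        best = min(range(len(radii)), key=radii.__getitem__)
        densest, radius = remaining[best], radii[best]
        seeds.append(data[densest])
        # Prune the seed's neighbourhood so the next seed is elsewhere.
        remaining = [i for i in remaining
                     if hamming(data[i], data[densest]) > 2 * radius]
    return seeds

data = [("a", "x", "p"), ("a", "x", "p"), ("a", "y", "p"),
        ("b", "y", "q"), ("b", "y", "q"), ("c", "z", "q")]
print(density_based_modes(data, k=2))   # one seed from each dense region
```

Because the procedure is deterministic given the data, it removes the run-to-run variability that random mode selection introduces, which matches the consistency result reported in the abstract.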