

CHAPTER 9

Conclusions

This chapter concludes the thesis by summarizing its work, findings and
contributions. It also presents some directions for future research.

9.1 Summary of Works

This thesis investigated the effectiveness of various clustering algorithms and soft
computing approaches from a pattern recognition perspective. The research began with
a general overview of pattern recognition, its paradigms and its various approaches.
A study was then made of some prominent soft computing approaches, such as fuzzy
logic and artificial neural networks, and of their applications in pattern recognition.
The K-means and FCM clustering algorithms were then analyzed thoroughly to
understand their shortcomings. Since clustering is unsupervised, various existing
validity assessment indices for validating clustering results were also studied. Based
on the study of clustering algorithms, two new clustering techniques, HBDKM and
DpsFCM, were developed, and based on the study of validity indices, the enhanced
validity indices kPBM and kPBMF were obtained. A software package was then
developed by integrating the implementations of all the clustering algorithms and soft
computing approaches investigated in the research, to serve as a research aid for
researchers working in the area of pattern recognition. These works are presented in
the chapters as follows:
Chapter 2 presents an overview of pattern recognition and its two paradigms,
namely supervised and unsupervised. It also presents some standard approaches to
pattern recognition, such as template matching, geometrical classification, the
statistical approach, the syntactic approach and neural networks. Some applications of
pattern recognition are also presented, followed by a brief discussion of pattern
recognition on data from different domains.
In chapter 3, the soft computing approaches fuzzy logic and artificial neural
networks are discussed. A specific example of pattern recognition on images using a
Hamming neural network is also presented. The experiment showed that the
classification accuracy of a Hamming network depends entirely on how well its stored
templates match the input pattern vectors. The experiment also reinforces the
conclusion that template matching is not the most effective approach to pattern
recognition, and that more sophisticated pattern recognition calls for neural networks
with dynamic learning ability.
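The dependence on template quality can be illustrated with a minimal sketch in
Python (not the MATLAB implementation used in the thesis). It assumes binary pattern
vectors and reduces the Hamming network to its net effect, namely selecting the stored
template with the smallest Hamming distance to the input. The 3x3 templates, the
labels and the function names are hypothetical.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def classify(pattern, templates):
    """Return the label of the stored template nearest to `pattern`.

    Ties go to the template that was stored first.
    """
    return min(templates,
               key=lambda label: hamming_distance(pattern, templates[label]))


# Hypothetical 3x3 binary templates, flattened row by row.
TEMPLATES = {
    "I": (0, 1, 0,
          0, 1, 0,
          0, 1, 0),
    "L": (1, 0, 0,
          1, 0, 0,
          1, 1, 1),
}

# An "I" with one distorted pixel (bottom-right corner flipped).
noisy_i = (0, 1, 0,
           0, 1, 0,
           0, 1, 1)

print(classify(noisy_i, TEMPLATES))
```

An input is classified correctly only while it stays closer to its own template than
to any other stored template, which is why poorly matching templates degrade accuracy.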
In chapter 4 the details of the analytical work on clustering algorithms are
presented. The chapter begins with a brief overview of clustering, similarity measures
and different taxonomical representations of clustering. It then discusses two
prominent clustering algorithms, K-means and FCM, and some popular hard and fuzzy
clustering validity assessment indices. The chapter also presents experimental results
on the application of the clustering algorithms to pattern recognition on numeric,
image and text data.
In chapter 5 the HBDKM clustering algorithm is presented as an efficient
deterministic version of the widely used K-means clustering algorithm. The chapter
also presents experimental results obtained by applying the HBDKM algorithm to a
number of benchmark data sets, to illustrate its efficiency relative to the K-means
clustering algorithm.
In chapter 6 the DpsFCM clustering algorithm is presented as an efficient
deterministic version of the psFCM clustering algorithm. The chapter also presents
experimental results obtained by applying the DpsFCM algorithm to a number of
benchmark data sets, to illustrate its efficiency relative to the pshFCM, psFCM and
FCM clustering algorithms.
In chapter 7 the enhanced validity indices kPBM and kPBMF are presented. The
indices were obtained by applying a common enhancement applicable to both the
PBM and PBMF validity indices. Experimental results are presented in the chapter to
show the better performance of the proposed forms over the existing forms of the
PBM validity index.
In chapter 8 the features and working of PatternWiz are presented. The software
package was developed in MATLAB 7 as one of the primary objectives of the
research. The package includes a number of useful features to facilitate the clustering
of different types of data and the validation of clustering. It also integrates a module
for assessing the performance and efficiency of Hamming neural network based
pattern recognition on images. The module was developed for the identification of
digits in distorted images of printed and handwritten digits.

9.2 Summary of Contributions

The main contributions of this thesis include the development of two new
clustering algorithms, HBDKM and DpsFCM, the formulation of a hybrid split rule for
k-d tree partitioning, the formulation of the enhanced clustering validity indices kPBM
and kPBMF, and the development of the software package PatternWiz. Other
important contributions of the thesis include the study and analysis of soft computing
approaches, of hard and fuzzy clustering algorithms, and of various hard and fuzzy
clustering validity assessment indices.
The HBDKM clustering algorithm finds better initial centroids than the K-means
clustering algorithm and obtains the same set of optimal initial centroids every time.
As a result the algorithm is deterministic and obtains the same final clustering every
time for a given data set. The DpsFCM clustering algorithm is also deterministic and
obtains the same set of better initial centroids every time, unlike the
non-deterministic (stochastic) algorithms pshFCM, psFCM and FCM. Thus the resulting
clustering is also the same every time. The hybrid split rule described in subsection
6.3.1 is a hybridization of the original split rule, the standard split rule and the
midpoint split rule, and was designed to be a computationally faster splitting rule for
k-d tree partitioning that remains in line with the standard rules. The kPBM and
kPBMF validity indices evaluate faster than the corresponding PBM and PBMF validity
indices by avoiding the computation of a constant term present in both the PBM and
PBMF indices. The validation characteristics of the proposed versions, however,
remain the same as those of the existing forms. The software package PatternWiz can
serve as a useful research aid for researchers working in the area of pattern
recognition.
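Why dropping a constant term leaves the validation behaviour unchanged can be seen
in a small Python sketch. It assumes the commonly cited form of the PBM index,
PBM(K) = ((1/K) (E_1/E_K) D_K)^2, and assumes that the constant term in question is
E_1, which depends only on the data set and not on the partition. The exact
definitions of kPBM and kPBMF are those of chapter 7 and are not reproduced here; the
function and variable names below are illustrative.

```python
import math
from itertools import combinations


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def pbm(clusters, drop_constant=False):
    """PBM-style index for a hard partition with at least two clusters.

    With drop_constant=True the data-set constant E_1 is replaced by 1
    (an assumed reading of the enhancement, see the text above).
    Undefined when every point coincides with its centroid (E_K = 0).
    """
    k = len(clusters)
    centers = [centroid(c) for c in clusters]
    # E_K: total distance of the points to their own cluster centroids.
    e_k = sum(dist(p, z) for c, z in zip(clusters, centers) for p in c)
    # D_K: largest separation between any two cluster centroids.
    d_k = max(dist(a, b) for a, b in combinations(centers, 2))
    if drop_constant:
        e_1 = 1.0
    else:
        # E_1: total distance of all points to the overall centroid.
        all_points = [p for c in clusters for p in c]
        z = centroid(all_points)
        e_1 = sum(dist(p, z) for p in all_points)
    return ((1.0 / k) * (e_1 / e_k) * d_k) ** 2


# Two partitions of the same four points: one compact, one poor.
good = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 1.0), (10.0, 1.0)]]

for name, part in (("good", good), ("bad", bad)):
    print(name, pbm(part), pbm(part, drop_constant=True))
```

Because E_1 is the same positive number for every partition of a given data set, the
two variants differ only by the fixed factor E_1^2, so they rank candidate partitions
identically while the reduced form skips one pass over the data.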

9.3 Future Directions

In this thesis, the study and analysis of soft computing approaches was carried out
by investigating fuzzy logic and artificial neural networks. As future work, other
soft computing approaches such as genetic algorithms (GA), probabilistic reasoning
(PR) and rough sets (RS) can be investigated for pattern recognition, as can their
hybridizations. Moreover, apart from fuzzy logic based clustering, other soft
computing based clustering approaches may also be investigated.
The proposed HBDKM clustering algorithm uses a deterministic approach. As a
future extension, a non-deterministic version of the algorithm can be obtained by
restricting the hyper-block partitioning process to a single application; its
performance can then be compared with that of algorithms like K-means. Further, the
performance of the hyper-block partitioning technique itself may be investigated by
using it with other clustering algorithms that require the computation of good initial
centroids for optimal convergence.
For the DpsFCM clustering algorithm too, a non-deterministic version can be
obtained by allowing the k-d tree partitioning to change between trials, using
different (possibly random) values for the maximum depth maxd; its performance may
then be compared with that of other algorithms. As with hyper-block partitioning, the
performance of the k-d tree partitioning technique may also be investigated by using
it with other clustering algorithms.
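The role of maxd can be sketched with a simplified midpoint-style partitioning in
Python. This is not the hybrid split rule of subsection 6.3.1: it is a stand-in that
bisects the bounding box of the points along its longest side until a maximum depth
is reached, and all names are hypothetical.

```python
def midpoint_partition(points, maxd):
    """Recursively bisect `points` (a list of tuples) into leaf groups.

    Simplification: the bounding box of the points, rather than the
    enclosing tree cell, is cut at the midpoint of its longest side.
    """
    if maxd == 0 or len(points) <= 1:
        return [points]
    dims = range(len(points[0]))
    lows = [min(p[d] for p in points) for d in dims]
    highs = [max(p[d] for p in points) for d in dims]
    # Split along the dimension with the widest extent.
    axis = max(dims, key=lambda d: highs[d] - lows[d])
    if highs[axis] == lows[axis]:
        return [points]  # all points coincide, nothing to split
    cut = (lows[axis] + highs[axis]) / 2.0
    left = [p for p in points if p[axis] <= cut]
    right = [p for p in points if p[axis] > cut]
    return midpoint_partition(left, maxd - 1) + midpoint_partition(right, maxd - 1)


data = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
for depth in (1, 2):
    print(depth, midpoint_partition(data, depth))
```

For a fixed maxd the leaves are identical on every run, so anything derived from them
is reproducible; varying maxd between trials changes the partition, which is the
source of variation proposed above.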
The validity indices evaluate the goodness of the clustering obtained for a given
data set. Many validity indices, however, perform better for data sets with a greater
degree of regularity in the shapes and orientation of the clusters in the hyper-volume
of the data set than for data sets with noise and overlapping clusters. An extensive
study therefore needs to be undertaken on several validity indices to gain better
insight into their properties and validation performance. Since the performance of a
validity index is also dependent on the underlying algorithm used, future work may
also investigate the performance of the validity indices by applying them to results
obtained using various clustering algorithms on the same collection of data sets.
