Towards A Hybrid Approach of K-Means and Density-Based Spatial Clustering of Applications With Noise For Image Segmentation
and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData)
Abstract— Image segmentation is the process of dividing a digital image into a number of regions for further analysis in the area of computer vision. Color images can be segmented by applying various clustering algorithms such as DBSCAN, which can identify arbitrarily shaped clusters. The drawback of DBSCAN is its high computational complexity, whilst image datasets are normally very large. This paper proposes a hybrid method of K-means and DBSCAN (Kmeans-DBSCAN) for image segmentation. K-means, a common partition-based clustering approach, is used to reduce the size of the image dataset. Four benchmarking image segmentation cases are used to evaluate the usability of the proposed Kmeans-DBSCAN method.

Keywords—Image Segmentation; Computer Vision; Clustering Analysis; K-means; DBSCAN.

I. INTRODUCTION

Image segmentation is an important topic in the field of computer vision. The representation of an image can be simplified by segmenting it into multiple regions for further analysis [1]. The pixels of an image can be clustered into regions with respect to their color similarities and spatial relationships [1-3].

Density-based clustering methods can be applied to image segmentation because of their ability to discover arbitrarily shaped clusters [4-6]. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [7] is a widely used density-based clustering method. The main disadvantage of DBSCAN is its high time complexity. As an image dataset is normally very large, it is hard to process directly with DBSCAN.

K-means [8] has been applied to image segmentation [1, 2] by clustering the pixels with respect to their color similarities. As K-means was designed for centroid-based cases, it cannot be used to detect arbitrarily shaped regions in an image.

In this paper, a hybrid approach, Kmeans-DBSCAN, is proposed by combining K-means and DBSCAN for image segmentation. In the image preprocessing, the color and spatial information of all the pixels is represented in a 2-D data matrix. The K-means algorithm is then adopted to cluster the pixels into a number of small clusters, with the centroid of each cluster representing a group of data points with similar aggregated feature values. The cluster centroids produced by K-means are further clustered by DBSCAN. The image segmentation results are finally obtained by fusing the results of K-means and DBSCAN. To demonstrate the usability of the proposed method, four images selected from the Berkeley Segmentation Dataset and Benchmark (BSDB) [10] were used in the experiment.

This paper is organized as follows. Section II presents Kmeans-DBSCAN in detail. Section III presents the experimental design and the results. Section IV concludes this study.

II. KMEANS-DBSCAN

The proposed Kmeans-DBSCAN approach consists of four stages. First, an image is preprocessed into a data matrix. Second, the data matrix is divided into a number of small clusters by K-means. Third, the clusters are merged by DBSCAN, taking the corresponding centroids as inputs. Finally, the results of K-means and DBSCAN are fused to produce the segmentation results.

A. Image Dataset Preprocessing

An RGB color image can be represented as a 3-D raw dataset by using an image reading function such as readJPEG() provided by the R package jpeg [9]. The 3-D raw dataset is a color matrix consisting of the three additive primary color (Red, Green and Blue) values of each pixel. In the preprocessing stage, the raw 3-D image dataset is reconstructed as a 2-D data matrix by combining the primary color and spatial information. The ith pixel is represented by the vector (Xi, Yi, Ri, Gi, Bi), where Xi and Yi are the spatial values computed by Eqs. (1-2) and (Ri, Gi, Bi) is the color vector from the raw image dataset.

$$X_i = \frac{1 + i \backslash x}{x} \tag{1}$$

$$Y_i = \frac{i \bmod y}{y} \tag{2}$$
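The preprocessing defined by Eqs. (1-2) can be illustrated with a minimal sketch. The paper reads images in R with readJPEG(); the Python below is only an illustration, and the function name `preprocess` is hypothetical. It assumes that pixels are indexed from i = 1 and that the backslash in Eq. (1) denotes integer division. The values x = 481, y = 321 and the first-pixel colors are the ones the paper reports for Image 3063.

```python
def preprocess(rgb_pixels, x, y):
    """Build rows (X_i, Y_i, R_i, G_i, B_i) from a flat list of RGB triples.

    Assumptions (not confirmed by the paper): pixel indices start at 1,
    and the backslash in Eq. (1) is integer division.
    """
    rows = []
    for i, (r, g, b) in enumerate(rgb_pixels, start=1):
        x_i = (1 + i // x) / x   # Eq. (1)
        y_i = (i % y) / y        # Eq. (2)
        rows.append((x_i, y_i, r, g, b))
    return rows


# First pixel of Image 3063, as reported in the paper.
first = preprocess([(0.3451, 0.4627, 0.6667)], x=481, y=321)[0]
print([round(v, 4) for v in first])
# → [0.0021, 0.0031, 0.3451, 0.4627, 0.6667]
```

Scaling the spatial values by x and y keeps them in roughly the same [0, 1] range as the color values, so a single Euclidean distance can be applied to all five attributes.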
have a matrix with 2 spatial dimensions, and each element of the matrix holds the 3-dimensional color values (R, G, B). For example, in Image 3063 the color values of the first pixel are 0.3451, 0.4627 and 0.6667 in the raw image dataset. In the 2-D data matrix, the first two attribute values are computed as (1\481+1)/481 = 0.0021 and (1 mod 321)/321 = 0.0031. The attribute values of the first 4 pixels in Image 3063 are shown in Table I.

TABLE I. A SAMPLE OF PREPROCESSED IMAGE 3063 DATA MATRIX

Pixel ID   Xi       Yi       Ri       Gi       Bi
1          0.0021   0.0031   0.3451   0.4627   0.6667
2          0.0021   0.0063   0.3686   0.4784   0.6745
3          0.0021   0.0093   0.4039   0.4980   0.6862
4          0.0021   0.0125   0.4392   0.5216   0.6902

using Kmeans-DBSCAN, the input size of DBSCAN is reduced to 50 or 100 vectors in the experiments, since the segmentation results in Figures 4-5 are reasonable.

IV. CONCLUSION

This paper proposes Kmeans-DBSCAN, a hybrid approach of K-means and DBSCAN, for image segmentation. Because of the high computational complexity of DBSCAN and the large size of image datasets, K-means is applied to reduce the size of the image datasets in the proposed approach. Four images selected from benchmarking datasets are used to evaluate the usability of the proposed method. The results of the proposed method are more reasonable than either the DBSCAN or the K-means segmentation results. Further study will extend the structure of the proposed method with more experiments.
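The overall pipeline (K-means to shrink the data, DBSCAN on the resulting centroids, then fusion) can be sketched as follows. This is a minimal illustration in plain Python rather than the authors' R implementation. The function names, the evenly spaced K-means initialization, and the fusion rule (each pixel inherits the DBSCAN label of its K-means centroid) are assumptions, since the text available here does not spell out these details.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means; returns (centroids, labels).

    Assumption: centroids start from evenly spaced input points, a
    simplification chosen here for determinism.
    """
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: mean of each cluster (an empty cluster keeps its centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, labels


def dbscan(points, eps, min_pts):
    """Basic DBSCAN; returns one label per point, with -1 meaning noise."""
    n = len(points)
    # Neighborhoods include the point itself.
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1               # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])   # expand only from core points
    return labels


def kmeans_dbscan(points, k, eps, min_pts):
    """Hybrid sketch: DBSCAN sees only the k centroids, not every pixel."""
    centroids, km_labels = kmeans(points, k)
    db_labels = dbscan(centroids, eps, min_pts)
    # Assumed fusion rule: a pixel takes the DBSCAN label of its centroid
    # (so pixels of a noise centroid are labeled -1).
    return [db_labels[c] for c in km_labels]


# Toy data: two well-separated groups of 2-D points.
group = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.2, 0.0), (0.0, 0.2)]
toy = group + [(px + 10.0, py + 10.0) for px, py in group]
labels = kmeans_dbscan(toy, k=4, eps=1.0, min_pts=2)
```

On the toy data, K-means produces four small clusters and DBSCAN merges their centroids into two regions. With real images, the rows would be the five-attribute pixel vectors, K would be 50 or 100, and the DBSCAN parameter pairs would be those listed in the captions of Figs. 4 and 5.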
Fig. 4. Segmentation results of Kmeans-DBSCAN with K set to 50 and DBSCAN parameter pairs: (a) Image 3063, (0.40, 2); (b) Image 3069, (0.35, 3); (c) Image 12003, (0.31, 2); (d) Image 135069, (0.33, 3).
Fig. 5. Segmentation results of Kmeans-DBSCAN with K set to 100 and DBSCAN parameter pairs: (a) Image 3063, (0.25, 3); (b) Image 3069, (0.35, 5); (c) Image 12003, (0.31, 4); (d) Image 135069, (0.33, 5).