Dendrogram Clustering For 3D Data Analytics in Smart City
International Conference on Geomatics and Geospatial Technology (GGT 2018), 3–5 September 2018, Kuala Lumpur, Malaysia
3D GIS Research Lab, Department of Geoinformation, Faculty of Built Environment and Survey, 81310 UTM Johor
Bahru, Malaysia - (suhaibah, mduznir, alias)@utm.my
KEY WORDS: Smart City, Dendrogram Clustering, 3D Spatial Database, 3D GIS, Data Analytics, Data Structure
ABSTRACT:
A smart city connects physical and social infrastructure with information technology to leverage the
collective intelligence of the city. Cities will build huge data centres. These data are collected from sensors, social
media, and legacy data sources. In order to be smart, cities need data analysis to identify infrastructure that needs to be
improved, to support city planning, and to carry out predictive analysis for citizen safety and security. However, no
matter how much a smart city focuses on up-to-date technology, data do not organize themselves in a database. Such tasks
require a sophisticated database structure to produce informative data output. Furthermore, the increasing number of smart
cities and the data they generate contribute to the current phenomenon called big data. These large and complex data
collections are difficult to process using regular database management tools or traditional data processing applications.
Big data poses multiple challenges, including visualization, mining, analysis, capture, storage, search, and sharing.
Efficient data analysis mechanisms are necessary to search and extract valuable patterns and knowledge from the big
data of smart cities. In this paper, we present a technique for three-dimensional data analytics using a dendrogram
clustering approach. Data are organized using this technique, and several outputs and analyses are presented to demonstrate
the efficiency of the structure for three-dimensional data analytics in a smart city.
* Corresponding author
The 3D R-Tree seems to be the most promising data structure for use in databases. In fact, it has been widely used
in commercial database software such as Oracle. However, the transition of the R-Tree to 3D has increased the overlap
among nodes and requires more storage. This leads to repetitive data and multipath queries among nodes (Azri et al.,
2015). Although efforts have been made to improve the structure of the 3D R-Tree, applications using large amounts of
data still face low data retrieval efficiency. A recent review (Azri et al., 2013) proposed merging multiple indexing
methods to form a hybrid data structure. Looking forward to a smart city, there is a need to design a specific data
handling approach or structure that enables computers and systems to analyse spatial big data for intelligent decision
making. Thus, in this paper, we propose a 3D data structure to constellate the big data of a 3D smart city into a
spatial database.

3. 3D DATA CONSTELLATION IN SMART CITY

In this section the proposed 3D data structure is introduced. The construction and development of the proposed
structure is the most important part. All of the information will be accessed through this structure prior to data
analytics.

3.1 3D Data Constellation for Efficient Data Retrieval

Classified and Clustered Data Constellation (CCDC) is a 3D data structure that constellates 3D spatial data into a
spatial database (Azri et al., 2016). The data structure is designed and developed based on two main filters,
classification and clustering, and works on a hierarchical tree concept. The classification phase classifies each
spatial object into a group based on its theme or type, for instance a zoning theme such as retail, housing or
industrial. Then, the objects in each classification group are clustered using a clustering algorithm. In (Azri et al.,
2016) the clustering process is based on the k-means++ crisp clustering algorithm by (Arthur and Vassilvitskii, 2007).
k-means++ introduced careful seeding to improve the k-means algorithm. With this approach, initial seeds are defined
and the remaining objects are then clustered based on the nearest distance to the initial seeds. This algorithm has
been shown to yield improvements in accuracy with respect to the original algorithm.

The results from the classification and clustering phases are then mapped into a hierarchical tree structure. Data are
retrieved by traversing the tree structure from the parent node to its child. The CCDC data structure offers a very
minimal overlap percentage and coverage area among nodes, which is one of the requirements for efficient data retrieval
from the database. However, CCDC still could not achieve zero overlap among nodes, and we believe that the construction
of the CCDC structure is time consuming due to its two groups of filters, classification and clustering.

3.2 Dendrogram Clustering

Dendrogram clustering is also known as hierarchical clustering algorithms or HCA. It falls into two categories,
top-down or bottom-up. A bottom-up algorithm treats each data point or object as a single cluster and then merges pairs
of clusters until all clusters are merged into a single cluster. This hierarchy of clusters is represented as a tree
(or dendrogram). The root of the tree is the unique cluster that gathers all the objects, the leaves being the clusters
with only one sample. The other category uses a top-down algorithm, also known as divisive. This approach differs from
bottom-up in that all observations start in one cluster, and splits are performed recursively as one moves down the
hierarchy. In this study, dendrogram clustering is constructed based on the bottom-up algorithm. The algorithm of
dendrogram clustering is given below, and the structure of dendrogram clustering using different distance metrics can
be seen in Figure 2.

Algorithm 3.1 Dendrogram Clustering
Input: X Data Points
Output: X Clusters
1. create a single cluster for each of the X points
2. select a distance metric that measures the distance between two clusters
3. combine the two clusters with the smallest distance (nearest neighbour) into one
4. repeat step 3 until the root of the tree is reached (one cluster which contains all data)
5. list all clusters

One of the advantages of dendrogram clustering is that it does not require the number of clusters to be specified.
Besides that, the algorithm is not sensitive to the choice of distance metric: it tends to work equally well with
different metrics, in contrast to other clustering algorithms, where the choice matters more. These advantages of
hierarchical clustering come at the cost of lower efficiency, as it has a time complexity of O(n³), unlike the linear
complexity of k-means. The dendrogram clustering algorithm is based on a distance matrix that has to be kept in memory.
The distance matrix is symmetric, and the memory it needs scales as O(N²). Besides that, average link clustering scales
as O(N³) in time, because for each cluster agglomeration, the algorithm searches through O(N²) cluster dissimilarities
in order to determine the pair of most
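The bottom-up procedure described in Section 3.2 can be illustrated with a minimal Python sketch. It is not the implementation used in this paper: it assumes Euclidean distance between 3D points and the nearest-neighbour (single-link) rule for the distance between two clusters, and the function names single_link_distance and dendrogram_clustering are hypothetical. The sketch recomputes distances on every pass instead of keeping a distance matrix, so it is only suitable for small inputs.

```python
import math
from itertools import combinations


def single_link_distance(a, b, points):
    """Nearest-neighbour distance between two clusters.

    a and b are tuples of indices into points (assumed 3D coordinates).
    """
    return min(math.dist(points[i], points[j]) for i in a for j in b)


def dendrogram_clustering(points):
    """Naive bottom-up clustering.

    Returns the merges in order as (left_cluster, right_cluster, distance);
    the last merge corresponds to the root of the dendrogram.
    """
    # Step 1: every point starts as its own cluster.
    clusters = [(i,) for i in range(len(points))]
    merges = []
    # Step 4: repeat until one root cluster contains all points.
    while len(clusters) > 1:
        # Steps 2-3: pick the pair of clusters with the smallest distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: single_link_distance(pair[0], pair[1], points),
        )
        d = single_link_distance(a, b, points)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, d))
    # Step 5: the merge list describes every cluster in the hierarchy.
    return merges


if __name__ == "__main__":
    # Illustrative coordinates only, not data from the paper.
    sample = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (11.0, 0.0, 0.5)]
    for left, right, dist in dendrogram_clustering(sample):
        print(left, right, round(dist, 3))
```

Each entry of the returned list records one agglomeration, so reading the list from first to last walks the dendrogram from the leaves up to the root. Swapping single_link_distance for an average-link or complete-link function would change only the merge criterion, not the overall procedure.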
5. CONCLUSIONS
This paper proposed 3D dendrogram clustering to produce a hierarchical tree structure for data retrieval and analytics.
The structure is constructed based on dendrogram clustering. The clustering groups objects with the same features and
then groups the objects under the root of the tree. The datasets are retrieved based on specific IDs for each object.
The k-means algorithm is used for cluster seeding to speed up the tree creation, because dendrogram clustering on its
own has lower efficiency, with a time complexity of O(n³). Based on the comprehensive tests and analyses of the proposed
structure, results and findings are discussed as follows. The first test examines the ability of the structure to
retrieve records from the database. The test shows that the structure is able to retrieve information on radiation and
solar absorption on the roof of one building. The last two experiments were performed in order to measure the
efficiency of the structure. Based on page access analysis and response time analysis, the structure is shown to
support better data analysis with low page access. We strongly believe that the data structure will benefit planners
and spatial professionals and help them to perform better data analytics for smarter cities.

REFERENCES

GUTTMAN, A. 1984. R-trees: a dynamic index structure for spatial searching. SIGMOD Rec., 14, 47-57.

HASHEM, I. A. T., CHANG, V., ANUAR, N. B., ADEWOLE, K., YAQOOB, I., GANI, A., AHMED, E. & CHIROMA, H. 2016. The role
of big data in smart city. International Journal of Information Management, 36, 748-758.

KELING, N., MOHAMAD YUSOFF, I., LATEH, H. & UJANG, U. 2017. Highly Efficient Computer Oriented Octree Data Structure
and Neighbours Search in 3D GIS. In: ABDUL-RAHMAN, A. (ed.) Advances in 3D Geoinformation. Cham: Springer International
Publishing.

LABRINIDIS, A. & JAGADISH, H. V. 2012. Challenges and opportunities with big data. Proceedings of the VLDB Endowment,
5, 2032-2033.

MOHANTY, S. P., CHOPPALI, U. & KOUGIANOS, E. 2016. Everything you wanted to know about smart cities: The Internet of
things is the backbone. IEEE Consumer Electronics Magazine, 5, 60-70.