Ijcet 10 01 005 PDF
Volume 10, Issue 1, January – February 2019, pp. 38–47, Article ID: IJCET_10_01_005
Available online at
http://www.iaeme.com/ijcet/issues.asp?JType=IJCET&VType=10&IType=1
Journal Impact Factor (2016): 9.3590 (Calculated by GISI) www.jifactor.com
ISSN Print: 0976-6367 and ISSN Online: 0976–6375
© IAEME Publication
ABSTRACT
Current development of genomic technologies concentrates heavily on large-scale gene
data. In the bioinformatics community, investigating this sizable volume of gene data
and distinguishing the behavior of genes under contrasting conditions are challenging
tasks. These tasks can be addressed with clustering techniques, which group similar
patterns across various features. Moreover, gene expression data indicate the differing
levels of gene behavior in various tissue cells and provide feature information
effectively. Gene clustering is accurate and helpful in cancer detection because it
makes cancerous and non-cancerous genes easier to distinguish. Early cancer diagnosis
is crucial for cancer prevention and treatment. Existing cancer gene clustering
techniques have several limitations, such as the time complexity of training and
testing on samples, a large number of redundant features, and high-dimensional data.
These issues severely affect clustering accuracy. This paper surveys various
techniques for cancer gene clustering with respect to cancer gene benchmark datasets.
Furthermore, the review describes the advantages and limitations of existing cancer
gene clustering techniques comprehensively.
Key words: Bioinformatics, Cancer, Clustering technique, Gene expression.
Cite this Article: Prabhuraj, P.M Mallikarjuna Shastry, S.S Patil, Review on
Clustering Cancer Genes. International Journal of Computer Engineering and
Technology, 10(1), 2019, pp. 38–47.
http://www.iaeme.com/IJCET/issues.asp?JType=IJCET&VType=10&IType=1
http://www.iaeme.com/IJCET/index.asp 38 [email protected]
Prabhuraj, P.M Mallikarjuna Shastry, S.S Patil
1. INTRODUCTION
The field of bioinformatics draws on several disciplines, such as molecular biology,
genetics, mathematics, and data-intensive computing. Bioinformatics provides information about
specific biological elements, usually represented in the form of sequences [1]. At present,
genomic sequence data is growing exponentially, which makes it necessary to analyze vast
cancer genomes. To analyze and handle such biological information, several data clustering
techniques are used [2]. Clustering approaches divide the sequence data into groups, and those
groups help to predict gene functions [3]. Clustering depends heavily on the distance and
similarity between data items. It is applied in areas such as image segmentation, pattern
recognition, and web search [4]. In the medical field, a significant objective of clustering is
to organize poorly characterized molecules so as to determine intrinsic hidden patterns and
find links between the molecules. This information helps to identify patterns for diagnosis
and treatment [5]. In microarray technology, genes store significant biological information
about each living organism. In gene data analysis, the similarity between genes is determined
from their functions and expression values, which are available in the Gene Ontology
databases [6-7].
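As an illustration of the distance and similarity measures on which clustering depends, the following minimal Python sketch compares two common choices for expression profiles: Euclidean distance and a Pearson-correlation-based distance. The function names and the three toy gene profiles are assumptions made for this example and are not taken from any of the surveyed works.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_distance(a, b):
    """1 - Pearson correlation: near 0 for co-expressed profiles, near 2 for opposite trends."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / math.sqrt(var_a * var_b)


# Hypothetical expression values for three genes measured in four samples.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]  # same trend as gene_a, larger magnitude
gene_c = [4.0, 3.0, 2.0, 1.0]      # opposite trend to gene_a

print("Euclidean a-b:", euclidean_distance(gene_a, gene_b))
print("Euclidean a-c:", euclidean_distance(gene_a, gene_c))
print("Pearson   a-b:", pearson_distance(gene_a, gene_b))
print("Pearson   a-c:", pearson_distance(gene_a, gene_c))
```

In this toy case the two measures disagree: Euclidean distance places gene_a closer to gene_c, whereas the correlation-based distance places it closer to gene_b, which shares its expression trend. The choice of measure therefore changes which genes end up in the same cluster.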
Numerous researchers have used clustering techniques to analyze gene activities and cancer
topologies. Gene-based clustering algorithms group thousands of genes into several smaller
clusters to reveal different levels of gene expression, which is useful for understanding the
functions of many genes. Sample-based clustering methods cluster samples that have similar
expression patterns to facilitate the discovery of new tumor types [8]. Cancer gene
clustering techniques are classified into two types: supervised and unsupervised clustering.
Unsupervised clustering partitions a set of data items into groups so that items in the same
group are similar according to a particular criterion. In supervised clustering, the actual
class labels of some data points are used to construct a model, which is then used to assign
class labels to unknown samples [9]. Several standard clustering algorithms, such as K-means
(KM), Fuzzy C-Means (FCM), Self-Organizing Maps (SOM), and Genetic Algorithm-based (GA)
clustering, have been used for clustering gene expression data [10].
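To make the first of these algorithms concrete, the following is a minimal sketch of standard K-means (Lloyd's algorithm) in plain Python. It is a textbook version under stated assumptions, not the implementation used in any cited work; the function names, the `seed` parameter, and the toy two-dimensional profiles are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Random initial centroids: k distinct data points drawn from the input.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical two-dimensional expression profiles forming two groups.
profiles = [
    [1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
    [5.0, 5.1], [4.9, 5.0], [5.1, 4.9],
]
centroids, labels = k_means(profiles, k=2)
print(labels)
```

The number of clusters `k` must be supplied in advance, and the result can depend on the random initial centroids, which is why the sketch exposes a `seed` argument.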
Review on Clustering Cancer Genes
N. Nidheesh, et al. [19] developed a density-based KM algorithm for cancer subtype
prediction from gene expression data. Generally, the KM algorithm has several benefits: it is
easy to implement and has linear space complexity with modest running time. KM also has
disadvantages: it is non-deterministic, because a random set of data points is taken as the
initial centroids, and this random selection of data points degrades clustering efficiency.
In this work, the density-based KM algorithm has difficulty handling outlier data; hence, its
time complexity gradually increases.
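The contrast between random and deterministic initialization can be sketched as follows. This is a hypothetical seeding scheme written for illustration only, not the algorithm of [19]: the first centroid is the point with the most neighbours within an assumed `radius`, and each further centroid is the point farthest from those already chosen. The function names and toy data are assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def density_based_seeds(points, k, radius):
    """Pick k initial centroids deterministically (illustrative scheme, assumes k <= len(points))."""
    r2 = radius ** 2
    # Local density: number of points (including the point itself) within `radius`.
    density = [
        sum(1 for q in points if squared_distance(p, q) <= r2)
        for p in points
    ]
    # First seed: the densest point. max() returns the first maximum,
    # so ties are resolved by input order and the result is repeatable.
    chosen = [max(range(len(points)), key=lambda i: density[i])]
    # Remaining seeds: the point farthest from every seed chosen so far.
    while len(chosen) < k:
        candidates = [i for i in range(len(points)) if i not in chosen]
        chosen.append(
            max(
                candidates,
                key=lambda i: min(
                    squared_distance(points[i], points[s]) for s in chosen
                ),
            )
        )
    return [list(points[i]) for i in chosen]


# Hypothetical two-dimensional expression profiles forming two groups.
profiles = [
    [1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
    [5.0, 5.1], [4.9, 5.0], [5.1, 4.9],
]
print(density_based_seeds(profiles, k=2, radius=0.5))
```

Because no random numbers are involved, repeated runs on the same data give the same seeds. The farthest-point step in this sketch would tend to select an isolated outlier as a seed, which is one way outliers can complicate deterministic initialization.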
S. Saha, et al. [20] established an ensemble-based clustering method, the Multi-Objective
(MO) fuzzy technique, for enhancing the performance of cancer gene classification. Several
processes are merged within the ensemble-based framework: (i) fuzzy logic is used to detect
overlapping clusters; (ii) a symmetry-based distance measure is used to identify clusters of
various shapes and to calculate the distance between clusters; (iii) MO optimization and MO
differential evolution are used to improve search-space efficiency for finding the optimal
partitioning in minimum time. The major limitation of this ensemble clustering method is that
it needs prior information about the number of clusters in the datasets.
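The way fuzzy logic represents overlapping clusters can be sketched with the standard Fuzzy C-Means membership formula, in which a point receives a degree of membership in every cluster rather than a single label. The sketch below uses plain Euclidean distance rather than the symmetry-based distance of [20], and the function name, fuzzifier value `m`, and toy centroids are assumptions for illustration.

```python
import math


def fuzzy_memberships(point, centroids, m=2.0):
    """Standard FCM membership of one point in each cluster; values sum to 1."""
    dists = [
        math.sqrt(sum((x - c) ** 2 for x, c in zip(point, centroid)))
        for centroid in centroids
    ]
    # A point lying exactly on a centroid belongs fully to that cluster.
    if any(d == 0 for d in dists):
        hits = [1.0 if d == 0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]


# Two hypothetical cluster centres and a point lying between them.
centres = [[0.0, 0.0], [4.0, 0.0]]
print(fuzzy_memberships([1.0, 0.0], centres))
print(fuzzy_memberships([2.0, 0.0], centres))
```

A point midway between two centres receives equal membership in both, which is how a fuzzy partition expresses that a gene may belong to more than one cluster.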
Huang, X., et al. [21] presented an efficient method, Support Vector Machine Recursive
Feature Elimination (SVM-RFE), for gene feature selection. Initially, the SVM-RFE method
randomly selects genes, then ranks the selected genes, and finally clusters genes with similar
expression profiles. According to the experimental analysis, compared with traditional
clustering methods, the SVM-RFE algorithm shows better clustering efficiency and lower
computational complexity. Also, the SVM-RFE method retains the relevant gene features while
reducing the redundant ones.
S. R. Kannan, et al. [22] developed a Kernel-based Fuzzy clustering (KF) system for
analyzing cancer data. This clustering algorithm is applied to breast cancer data, which are
high-dimensional gene expression profiles. The KF method helps to select various levels of
non-linearity to handle the complexity of the membership functions. The significant
advantages of the KF method are a reduced number of iterations in prototype initialization
and a decreased running time. However, because the features are high-dimensional, the
complexity of data clustering is somewhat increased.
Some genes can be involved in more than one pathway, while others might not be
relevant to the biological process at all. Moreover, all gene data values depend on
other gene values; hence, gene values are redundant.
Scalability: Gene data comprises large datasets with a large number of data items. The
running time of existing clustering techniques increases linearly with dataset size.
Re-scanning the data on servers can be an expensive operation, since the data are
generated by an expensive join query over a potentially distributed data warehouse.
Thus, a method should usually require only one scan of the data.
Time and Space Complexity: The computational complexity is linear in the number of
input features, objects, and iterations. In every iteration, loops search for the
nearest neighbor among the clusters and insert clusters into, or remove clusters from,
the stack. Removing a cluster from the stack influences the other clusters; hence, time
and space complexity gradually increase.
5. CONCLUSIONS
Cancer gene data is high-dimensional and has cluster structure. Numerous statistical strategies
help to detect cancer genes in order to improve cancer detection and the identification of
development stages. To detect differentially expressed genes under contrasting conditions,
various hypothesis testing methods and the false discovery rate approach are used. Existing
cancer gene clustering techniques help to identify cancerous and normal genes effectively.
Clustering techniques are classified into several types, such as partition-based,
density-based, and graph-based. This paper reviews the advantages, limitations, and similarity
measures of existing cancer gene clustering techniques. According to comparison Table 1, the
machine learning technique with an ensemble classifier shows better results with respect to
parameters such as time, memory, and accuracy. The clustering techniques are implemented on
standard cancer gene datasets.
REFERENCES
[1] Nagi, S. and Bhattacharyya, D. K. Cluster analysis of cancer data using semantic
similarity, sequence similarity and biological measures. Network Modeling Analysis in
Health Informatics and Bioinformatics, 3(1), 2014, pp. 67.
[2] Jonnalagadda, S. and Srinivasan, R. Determining distinct clusters in gene expression data
using similarity in principal component subspaces. International Journal of Advances in
Engineering Sciences and Applied Mathematics, 4(1-2), 2012, pp. 41-51.
[3] Wei, D., Jiang, Q., Wei, Y. and Wang, S. A novel hierarchical clustering algorithm for
gene sequences. BMC bioinformatics, 13(1), 2012, pp. 174.
[4] Jiang, Z., Li, T., Min, W., Qi, Z. and Rao, Y. Fuzzy c-means clustering based on weights
and gene expression programming. Pattern Recognition Letters, 90, 2017, pp. 1-7.
[5] Mehmood, R., El-Ashram, S., Bie, R. and Sun, Y. Effective cancer subtyping by
employing density peaks clustering by using gene expression microarray. Personal and
Ubiquitous Computing, 22(3), 2018, pp. 615-619.
[6] Acharya, S., Saha, S. and Pradhan, P. Novel symmetry-based gene-gene dissimilarity
measures utilizing Gene Ontology: Application in gene clustering. Gene, 679, pp. 341-351.
[7] Hosseini, B. and Kiani, K. FWCMR: A scalable and robust fuzzy weighted clustering
based on MapReduce with application to microarray gene expression. Expert Systems with
Applications, 91, 2018, pp. 198-210.
[8] Chen, X. and Jian, C. Gene expression data clustering based on graph regularized
subspace segmentation. Neurocomputing, 143, 2014, pp. 44-50.
[9] Torshizi, A. D. and Zarandi, M. H. F. A new cluster validity measure based on general
type-2 fuzzy sets: application in gene expression data clustering. Knowledge-Based
Systems, 64, 2014, pp. 81-93.
[10] Alok, A. K., Saha, S. and Ekbal, A. Semi-supervised clustering for gene-expression data
in multiobjective optimization framework. International Journal of Machine Learning and
Cybernetics, 8(2), 2017, pp. 421-439.
[11] Kim, J. and Kim, H. Partitioning of functional gene expression data using principal
points. BMC bioinformatics, 18(1), 2017, pp. 450.
[12] Lord, E., Diallo, A. B. and Makarenkov, V. Classification of bioinformatics workflows
using weighted versions of partitioning and hierarchical clustering algorithms. BMC
bioinformatics, 16(1), 2015, pp. 68.
[13] Kumar, K. M. and Reddy, A. R. M. An efficient k-means clustering filtering algorithm
using density based initial cluster centers. Information Sciences, 418-419, 2017, pp. 286-301.
[14] Sriwanna, K., Boongoen, T. and Iam-On, N. Graph clustering-based discretization
approach to microarray data. Knowledge and Information Systems, 2018, pp. 1-28.
[15] Anusha, M. and Sathiaseelan, J. G. R. Evolutionary clustering algorithm using criterion-
knowledge-ranking for multi-objective optimization. In proceedings World Conference
on Futuristic Trends in Research and Innovation for Social Welfare (Startup Conclave),
pp. 1-13.
[16] Maâtouk, O., Ayadi, W., Bouziri, H. and Duval, B. Evolutionary biclustering algorithms:
an experimental study on microarray data. Soft Computing, 2018, pp. 1-27.
[17] Yin, L. and Liu, Y. Ensemble biclustering gene expression data based on the spectral
clustering. Neural Computing and Applications, 30(8), pp. 2403-2416.
[18] Soruri, M., Sadri, J. and Zahiri, S. H. Gene clustering with hidden Markov model
optimized by PSO algorithm. Pattern Analysis and Applications, 2018, pp. 1-6.
[19] Nidheesh, N., Nazeer, K. A. and Ameer, P. M. An enhanced deterministic K-Means
clustering algorithm for cancer subtype prediction from gene expression data. Computers
in biology and medicine, 91, 2017, pp. 213-221.
[20] Saha, S., Das, R. and Pakray, P. Aggregation of multi-objective fuzzy symmetry-based
clustering techniques for improving gene and cancer classification. Soft Computing, 2017,
pp. 1-20.
[21] Huang, X., Zhang, L., Wang, B., Li, F. and Zhang, Z. Feature clustering based support
vector machine recursive feature elimination for gene selection. Applied
Intelligence, 48(3), 2018, pp. 594-607.
[22] Kannan, S. R., Siva, M., Ramathilagam, S. and Devi, R. Effective Kernel-Based Fuzzy
Clustering Systems in Analyzing Cancer Database. Data-Enabled Discovery and
Applications, 2(1), 2018, pp. 5.
[23] Jaskowiak, P. A., Campello, R. J. and Costa, I. G. Evaluating correlation coefficients for
clustering gene expression profiles of cancer. In Brazilian Symposium on
Bioinformatics, pp. 120-131, Berlin, Heidelberg.
[24] Yan, X., Liang, A., Gomez, J., Cohn, L., Zhao, H. and Chupp, G. L. A novel pathway-
based distance score enhances assessment of disease heterogeneity in gene
expression. BMC bioinformatics, 18(1), 2017, pp. 309.
[25] Chicco, D., Ciceri, E. and Masseroli, M. Extended Spearman and Kendall Coefficients for
Gene Annotation List Correlation. In International Meeting on Computational Intelligence
Methods for Bioinformatics and Biostatistics, 2014, pp. 19-32, Springer.
[26] Kumar, V., Chhabra, J. K. and Kumar, D. Performance evaluation of distance metrics in
the clustering algorithms. INFOCOMP, 13(1), 2014, pp. 38-52.
[27] Ramos, J., Castellanos-Garzón, J. A., González-Briones, A., de Paz, J. F. and Corchado, J.
M. An agent-based clustering approach for gene selection in gene expression
microarray. Interdisciplinary Sciences: Computational Life Sciences, 9(1), 2017, pp. 1-13.
[28] Chen, H., Zhang, Y. and Gutman, I. A kernel-based clustering method for gene selection
with gene expression data. Journal of biomedical informatics, 62, pp. 12-20.
[29] Ray, S. S. and Misra, S. A supervised weighted similarity measure for gene expressions
using biological knowledge. Gene, 595(2), 2016, pp. 150-160.
[30] Yu, Z., Chen, H., You, J., Liu, J., Wong, H. S., Han, G. and Li, L. Adaptive fuzzy
consensus clustering framework for clustering analysis of cancer data. IEEE/ACM
Transactions on Computational Biology and Bioinformatics (TCBB), 12(4), 2015, pp. 887-901.
[31] Wang, J., Liu, J. X., Kong, X. Z., Yuan, S. S. and Dai, L. Y., Laplacian Regularized Low-
Rank Representation for Cancer Samples Clustering. Computational Biology and
Chemistry, 2018.
[32] Zareizadeh, Z., Helfroush, M. S., Rahideh, A. and Kazemi, K., A robust gene clustering
algorithm based on clonal selection in multiobjective optimization framework. Expert
Systems with Applications, 113, 2018, pp. 301-314.