International Journal of Computer Engineering & Technology (IJCET)

Volume 10, Issue 1, January – February 2019, pp. 38–47, Article ID: IJCET_10_01_005
Available online at
http://www.iaeme.com/ijcet/issues.asp?JType=IJCET&VType=10&IType=1
Journal Impact Factor (2016): 9.3590(Calculated by GISI) www.jifactor.com
ISSN Print: 0976-6367 and ISSN Online: 0976–6375
© IAEME Publication

REVIEW ON CLUSTERING CANCER GENES


Prabhuraj
Assistant Professor, Dept. of Computer Science & Engineering
EPCET, Bengaluru, Karnataka, India

Dr P.M Mallikarjuna Shastry


Professor, School of Computing & Information Technology,
Reva University, Kattigenahalli, Yelahanka, Bengaluru, Karnataka, India

Dr. S.S Patil


Professor & Head, University Head, Dept. of Agriculture Statistics,
Applied Mathematics & Computer Science
UAS, GKVK, Bengaluru, Karnataka, India

ABSTRACT
Recent developments in genomic technologies concentrate on large-scale gene data.
In the bioinformatics community, analyzing sizable volumes of gene data and
distinguishing the behavior of genes under different conditions are challenging
tasks. These tasks can be addressed by clustering techniques, which group similar
patterns across various features. Moreover, gene expression data indicates the
differing levels of gene behavior in various tissue cells and provides feature
information effectively. Gene clustering is precise and helpful in cancer detection
because it makes it easier to distinguish cancerous from non-cancerous genes. Early
cancer diagnosis is crucial for cancer prevention and treatment. Existing cancer
gene clustering techniques have several limitations, such as the time complexity of
training and testing on samples, many redundant features and high dimensional data.
These issues severely affect clustering accuracy. This paper surveys various cancer
gene clustering techniques with respect to cancer gene benchmark datasets.
Furthermore, the review of existing cancer gene clustering techniques describes
their advantages and limitations comprehensively.
Key words: Bioinformatics, Cancer, Clustering technique, Gene expression.
Cite this Article: Prabhuraj, P.M Mallikarjuna Shastry, S.S Patil, Review on
Clustering Cancer Genes. International Journal of Computer Engineering and
Technology, 10(1), 2019, pp. 38–47.
http://www.iaeme.com/IJCET/issues.asp?JType=IJCET&VType=10&IType=1

http://www.iaeme.com/IJCET/index.asp

1. INTRODUCTION
The field of bioinformatics spans several disciplines, such as molecular biology,
genetics, mathematics and data-intensive computing. Bioinformatics provides
information about specific biological elements, usually represented in the form of
sequences [1]. At present, genomic sequence data is growing exponentially, and
analyzing such vast cancer genome data is challenging. Several data clustering
techniques are used to analyze and handle this biological information [2]. Clustering
approaches divide the sequence data into groups, and those groups help to predict
gene functions [3]. A clustering technique depends heavily on the distance and
similarity between data items. Clustering is applied in different applications such as
image segmentation, pattern recognition and web search [4]. In the medical field, the
main objective of clustering is to reveal the structure of poorly characterized
molecules, determine intrinsic hidden patterns and find links between molecules. This
information helps to identify patterns for diagnosis and treatment [5]. In microarray
technology, genes carry significant biological information about each living organism.
Gene data analysis discovers the similarity between genes based on their functions and
expression values, which are available in the Gene Ontology databases [6-7].
Numerous researchers have used clustering techniques to analyze gene activities and
cancer topologies. Gene-based clustering algorithms group thousands of genes into
several smaller clusters to reveal different levels of gene expression, which is useful
for understanding the functions of many genes. Sample-based clustering methods group
samples that have similar expression patterns to facilitate the discovery of new tumor
types [8]. Cancer gene clustering techniques are classified into two types: supervised
and unsupervised. Unsupervised clustering partitions a set of data items into groups so
that the items in a group are similar according to particular criteria. In supervised
clustering, the known class labels of some data points are used to construct a model,
which is then used to assign class labels to unknown samples [9]. Several standard
clustering algorithms, such as K-means (KM), Fuzzy C-Means (FCM), Self-Organizing
Maps (SOM) and Genetic Algorithm-based (GA) clustering, have been utilized for
clustering gene expression data [10].

2. TAXONOMY OF GENE EXPRESSION CLUSTERING


Genes are small segments of a chromosome that encode the information responsible for
generating proteins. Genes vary widely in sequence length, and some of them share
specific functions. Figure 1 shows the overall structure of a chromosome and a gene,
which is formed by a string of the nucleotides A, C, G and T, corresponding to the
adenine, cytosine, guanine and thymine bases. The analysis of DNA sequences is a
crucial application area in computational biology, and finding similarity between
genes and DNA subsequences provides essential knowledge of their structures and
functions. Figure 1 (a) shows a human chromosome, which is made of DNA sequences
that contain genes, and Figure 1 (b) shows a gene sequence.

Figure 1 (a) Structure of human chromosome (b) Gene Sequence

2.1. Significant Gene Sequence Clustering Techniques


Generally, clustering is defined as grouping objects so that objects in the same
cluster are similar to one another and dissimilar objects fall in other clusters.
Because a group of data items is represented collectively by a single cluster,
clustering is also a form of data compression. Many clustering algorithms are based on
the distance measured between two objects. Basically, the goal is to minimize the
distance of every object from the center of the cluster to which it belongs. The major
clustering methods can be classified into several categories, which are explained below.
• Partition technique: This clustering technique segments the data objects into
non-overlapping clusters, so that every data object belongs to exactly one subset.
The partition assigns every data object a cluster index value. In this technique,
every cluster is represented by a centroid (for example, in K-Means), by neighboring
data points or by a medoid. Generally, the initial cluster centers are selected
randomly, and the data points are then reassigned to optimize the clustering criterion.
Advantage: The major benefit of this approach is a reduced mean squared error between
the data points and their cluster centers.
Disadvantage: The significant limitation of K-Means is that it can produce many
different solutions [11].
• Hierarchical technique: This technique is extensively employed to identify clusters
in genomic data. A sequence of nested partitions forms a cluster hierarchy based on a
linkage criterion such as single linkage, complete linkage or average linkage. This
cluster hierarchy is also known as a cluster tree or dendrogram. Hierarchical
clustering comes in two types: the agglomerative (bottom-up) approach and the divisive
(top-down) approach. The bottom-up technique starts with single-object clusters and
then repeatedly merges the most similar clusters. This process continues until a
stopping criterion is reached.
Disadvantage: The major limitation of hierarchical clustering is that the time
complexity grows as the cluster tree gets larger [12].
• Density based technique: The clusters in this approach are dense regions of objects
in the data space that are separated by low density regions, where density is defined
by the criterion that each cluster member must have a minimum number of points in its
neighborhood. An example is the DBSCAN-KM algorithm.
Advantage: The DBSCAN-KM algorithm uses a constant radius, and the neighborhood of
every instance within that radius must enclose a minimum number of objects. By counting
these objects, the neighborhood density of every object is computed without
discretization.
Disadvantage: The significant limitation of this technique is that it cannot cluster
accurately when the gene densities differ widely [13].
• Graph based clustering: In this technique, a graph structure is formed by a set of
vertices and the edges that connect pairs of vertices. The technique represents the
data items as the vertices of a graph and then clusters the graph. Within each cluster
the graph has many edges, and only a few edges run between clusters.
Advantage: The graph based method reduces information loss and uses a minimal sample;
hence, the running time is decreased.
Disadvantage: The algorithm seems to work well for randomly scattered data points, but
it cannot properly derive regular geometrical patterns, and the clusters in such cases
have irregular structures [14].
• Evolutionary Clustering Technique (ECT): This technique concentrates on resolving
the time complexity of clustering data. The ECT includes two significant optimization
criteria: (i) the clustering of the data at any time should represent appropriate
clusters, and (ii) the clustering should not shift from one time period to another.
Advantage: The ECT effectively removes noise and produces highly consistent,
well-corresponding clusters.
Disadvantage: For high dimensional data, a long time is required to search for the
optimal solution in the search space [15-16].
• Ensemble Clustering Technique: It is a popular way of combining classification
strategies to overcome the instabilities of different classification algorithms. It
scales linearly with the number of data points and the number of repetitions, which
makes it feasible to apply to large data sets.
Advantage: The technique also improves the ability of a clustering algorithm to find
structure in a data set, as it can find clusters of any shape.
Disadvantage: It mispredicts certain cluster structures [17].
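The partition technique and its goal of minimizing the distance of every object from its cluster center can be illustrated with a minimal K-Means sketch. This is a generic Lloyd-style loop written for illustration, not the implementation used in any of the surveyed papers; the function names `kmeans` and `sse`, the parameters `iterations` and `seed`, and the toy two-dimensional expression profiles are assumptions.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd-style K-Means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Random initial centroids drawn from the data; this random choice is
    # what makes plain K-Means non-deterministic across seeds.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids


def sse(points, labels, centroids):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Hypothetical expression profiles: two genes measured under two conditions,
# forming two well-separated groups.
profiles = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9), (5.0, 5.2), (5.1, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(profiles, k=2)
```

On data this well separated every starting choice reaches the same partition; on real expression data different seeds can give different partitions, which is the limitation noted for K-Means above.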

3. ANALYSIS OF CANCER GENE CLUSTERING TECHNIQUES


In recent decades, cancer has remained a severe disease that is difficult to detect
accurately; hence, detection is a significant phase for both diagnosis and treatment.
The various kinds of cancer are classified based on gene activity in the tumor cells.
This section evaluates different kinds of cancer gene clustering techniques, for
example density based, model based and ensemble techniques. A brief evaluation of some
essential contributions in the existing literature is presented below.
M. Soruri, et al. [18] presented an efficient gene clustering method, namely the Hidden
Markov Model with Particle Swarm Optimization (HMM-PSO) method. The HMM models a
particular gene sequence and then calculates the probability of each sequence, which
helps to calculate the similarity between sequences. The PSO algorithm optimizes the
similarity values, and a symmetric distance matrix is constructed from the clustered
sequences. This model based technique clusters gene sequences with an analytical
algorithm and cannot model them by feature vectors. Achieving accurate cancer gene
clustering requires a large number of iterations.

N. Nidheesh, et al. [19] developed a density based KM algorithm for estimating cancer
subtypes in gene expression data. Generally, the KM algorithm has several benefits: it
is easy to implement and has linear space complexity. It also has several
disadvantages, namely its non-deterministic nature and the use of a random set of data
points as initial centroids. The random selection of data points degrades cluster
quality. In this work, the density based KM algorithm has difficulty handling outlier
data; hence, the time complexity increases.
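The idea of replacing random centroid selection with a density based, deterministic choice can be sketched as follows. This is an illustrative heuristic under stated assumptions, not the algorithm of [19]: the function name `density_seeds`, the `radius` parameter and the greedy selection rule are all hypothetical.

```python
import math


def density_seeds(points, k, radius):
    """Pick up to k initial centroids deterministically from dense regions."""
    # Local density: how many other points lie within `radius` of each point.
    density = [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]
    # Visit points from densest to sparsest; sorted() is stable, so ties
    # keep their input order and the result is reproducible.
    order = sorted(range(len(points)), key=lambda i: -density[i])
    seeds = []
    for i in order:
        # Accept a point only if it lies outside the neighborhood of every
        # seed chosen so far, so seeds come from different dense regions.
        if all(math.dist(points[i], s) > radius for s in seeds):
            seeds.append(points[i])
        if len(seeds) == k:
            break
    return seeds  # may hold fewer than k seeds if the data has fewer dense regions


# Hypothetical profiles: two dense groups plus one isolated outlier.
samples = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9),
           (5.0, 5.2), (5.1, 4.9), (4.9, 5.0),
           (20.0, 20.0)]
seeds = density_seeds(samples, k=2, radius=0.5)
```

Because the outlier has no neighbors within the radius, it is visited last and is never chosen as a seed here, and repeated runs return the same seeds.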
S. Saha, et al. [20] established an ensemble based clustering approach, namely a
Multi-Objective (MO) fuzzy technique, for enhancing the performance of cancer gene
classification. Several processes are merged in the ensemble based framework: (i) fuzzy
logic is used to detect overlapping clusters; (ii) a symmetry based distance measure is
used to identify clusters of various shapes and to calculate the distance between
clusters; (iii) MO optimization and MO differential evolution methods are used to
improve search efficiency so that the optimal partitioning is found in minimum time.
The major limitation of this ensemble clustering method is that it needs prior
information about the number of clusters in the datasets.
X. Huang, et al. [21] presented an efficient method, namely Support Vector Machine
Recursive Feature Elimination (SVM-RFE), for gene feature selection. Initially, the
SVM-RFE method randomly selects genes, then ranks the selected genes, and finally
clusters genes with similar expression profiles. According to the experimental
analysis, the SVM-RFE algorithm shows better clustering efficiency and lower
computational complexity than the traditional clustering method. The SVM-RFE method
also retains the relevant gene features and reduces the redundant ones.
S. R. Kannan, et al. [22] developed a Kernel based Fuzzy clustering (KF) system for
evaluating cancer data. This clustering algorithm is applied to breast cancer data,
which are high dimensional gene expression profiles. The KF method helps to select
various levels of non-linearity to capture the complexity of the membership functions.
The significant advantages of the KF method are a reduced number of iterations, owing
to its prototype initialization, and a decreased running time. However, because the
features are high dimensional, the complexity of data clustering increases somewhat.
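The membership functions used by fuzzy clustering can be made concrete with a small sketch of the standard Fuzzy C-Means membership update, in which each point receives a degree of membership in every cluster. This is the textbook FCM rule with plain Euclidean distance, shown only as an assumption-laden illustration; it does not reproduce the kernel-induced distances or the prototype initialization of the KF system in [22]. The function name `fcm_memberships` and the fuzzifier default `m=2.0` are hypothetical choices.

```python
import math


def fcm_memberships(points, centroids, m=2.0):
    """Return u[i][j], the degree to which point i belongs to cluster j."""
    power = 2.0 / (m - 1.0)  # m > 1 is the fuzzifier
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centroids]
        if 0.0 in dists:
            # A point lying exactly on one or more centroids belongs
            # fully (and equally) to those centroids.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(row)
            row = [r / total for r in row]
        else:
            # Standard FCM rule: u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1)).
            row = [
                1.0 / sum((d_j / d_k) ** power for d_k in dists)
                for d_j in dists
            ]
        memberships.append(row)
    return memberships


# Hypothetical one-gene example with two cluster prototypes.
prototypes = [(0.0, 0.0), (2.0, 0.0)]
u = fcm_memberships([(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)], prototypes)
```

Each row of `u` sums to one, and a point midway between two prototypes receives equal membership in both, which is how fuzzy methods represent overlapping clusters.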

4. SIGNIFICANT CHALLENGES OF GENE CLUSTERING


Gene data covers many genes whose expression behavior is monitored under various
experiments. Traditional studies of gene clustering across different experiments and
conditions struggle to accomplish their goals because of several limitations. The
typical challenges of clustering gene data are addressed below.
• High Dimensionality: Gene data is high dimensional because the gene matrix has a
large number of rows and columns. Moreover, as the number of attributes in the dataset
increases, distance measures have difficulty capturing the differences between
clusters.
• Noisy Data: Gene data samples measure the variation in gene expression between
cells. Public gene datasets generally include noise such as missing cell values,
unlabeled data, hard-to-identify outliers and poor-quality measurements. These kinds of
noise affect the clustering process.
• Redundancy: The biological process under scrutiny in a gene study is assumed to be
complicated and involves specific gene reactions in different pathways. Some genes can
be involved in more than one pathway, while others might not be relevant to the
biological process at all. Moreover, gene values depend on other gene values; hence,
gene values are redundant.
• Scalability: Gene datasets are large and contain many data items. The running time of
existing clustering techniques grows linearly with the size of the dataset. Re-scanning
the data on servers may be an expensive operation, since the data are sometimes
generated by an expensive join query over a potentially distributed data warehouse.
Thus, a clustering algorithm should usually require only one scan of the data.
• Time and Space Complexity: The computational complexity is linear in the number of
input features, objects and iterations. In every iteration, loops search for the
nearest neighbor among the clusters and insert clusters into, or remove clusters from,
the stack. Removing clusters from the stack affects the other clusters; hence, time and
space complexity increase gradually.

5. DISTANCE MEASURES IN GENE CLUSTERS


Generally, clusters are defined and measured using two kinds of methods: model based
approaches and distance based approaches. The model based method helps to characterize
the different data points in high dimensional space. The distance based method
calculates the pairwise relation between data points in high dimensional space. Brief
descriptions of different distance measures for clustering gene data are given below.
• Pearson correlation: This is a similarity measure used in clustering. It is the
cosine of the angle between the two mean-centered expression vectors, that is, their
dot product after centering and normalization. It measures the similarity in the shapes
of two gene profiles and does not consider the magnitude of the profiles [23].
• Euclidean Distance: In biological samples, this distance measure identifies
heterogeneity in gene clusters. It calculates the straight-line distance between two
data points in the space and reflects the absolute behavior of the genes [24].
• Jackknife: This similarity measure decreases the effect of outlier values on the
correlation between genes. If two sequences appear similar only because of values at a
single time point, those irrelevant values are removed by the Jackknife metric. If the
sequences do not have outliers, the correlation value is stable.
• Kendall: The traditional Kendall distance measure considers only lists of the same
size that contain the same gene elements. An extended Kendall rank distance measures
the difference between the ranked positions of an element present in all analyzed
lists [25].
• Manhattan Distance: It measures the distance between two data points as the sum of
the absolute differences of their coordinates. It does not depend on translation or
reflection of the coordinate system. Its one disadvantage is that it depends on the
rotation of the coordinate system [26].
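The measures listed above can be compared on small expression profiles with the following sketch. The function names and example values are hypothetical. Three choices are assumptions rather than definitions taken from the cited papers: `jackknife` uses one common formulation (the smallest leave-one-out Pearson correlation), `kendall_tau` is the classical tau-a coefficient rather than the extended rank distance of [25], and `manhattan` implements the sum of absolute coordinate differences described in the last bullet.

```python
import math


def pearson(x, y):
    """Pearson correlation: cosine of the angle between mean-centered profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def euclidean(x, y):
    """Straight-line distance; sensitive to the magnitude of the profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of the absolute coordinate differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def jackknife(x, y):
    """Smallest Pearson correlation obtained by leaving out one observation."""
    return min(
        pearson(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        for i in range(len(x))
    )


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            score += (s > 0) - (s < 0)  # +1 concordant, -1 discordant, 0 tie
    return score / (n * (n - 1) / 2)


# g2 has the same shape as g1 but a different magnitude (g2 = 2 * g1 + 1).
g1 = [1.0, 2.0, 3.0, 4.0]
g2 = [3.0, 5.0, 7.0, 9.0]

# Two profiles that look correlated only because of one shared outlier.
a = [1.0, 2.0, 1.5, 1.2, 10.0]
b = [2.0, 1.0, 1.4, 1.9, 10.0]
```

For `g1` and `g2` the correlation-type measures report perfect agreement while the Euclidean and Manhattan distances are clearly non-zero, which illustrates the difference between shape and magnitude. For `a` and `b` the plain Pearson correlation is strongly positive, but the jackknife value is negative once the shared outlier is left out.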
Table 1 presents a comparative study of existing cancer gene clustering approaches,
covering their merits, demerits, standard datasets and similarity measures.

Table 1 Related Work

| Author Name | Methodology Employed | Technique Category | Advantage | Dataset | Limitation | Similarity Measure | Performance Evaluation |
|---|---|---|---|---|---|---|---|
| M. Soruri, et al. [18] | HMM-PSO | Model | Improves cluster quality | Lung cancer-related genes data | Requires a large number of iterations | Distance matrix, similarity matrix | DBL: 0.45 |
| N. Nidheesh, [19] | KM algorithm | Density | Selects the data points that belong to dense regions | UCI dataset | High time complexity because outlier data is difficult to cluster | Euclidean distance | Adjusted Rand Index: 0.714 |
| S. Saha, et al. [20] | FCM-PSO-Differential Evolution | Ensemble | Multi-objective based clustering techniques for the allocation of data points to different clusters | Gene Ontology (GO) annotation database | Difficult to cluster the gene data because of noisy raw data | Euclidean distance | Silhouette index: 0.49, Execution time: 57 sec |
| X. Huang, et al. [21] | SVM-RFE | Ensemble | Decreases the computational complexity and the redundancy among genes | Gene expression dataset | High computational complexity | Euclidean distance | Accuracy: 88.24%, Running time: 1404 sec |
| S. R. Kannan, et al. [22] | KF Clustering System | Partition | Introduced a prototype initialization method to avoid a large number of iterations | Breast cancer data | High dimensional features, hence difficult to cluster | Laplacian kernel-induced distance, Canberra distance | Accuracy: 73.1% |
| J. Ramos, [27] | Clustering based Multi Agent system | Partition | Gene clustering through coordinated agents to discover informative gene subsets | Lung cancer data, Colon and Leukemia cancer data | Poor performance with respect to multiple datasets | Euclidean distance | Accuracy: Leukemia dataset 90.2%, CRC dataset 85.4% |
| H. Chen, et al. [28] | Kernel-Based Clustering method for Gene Selection | Distance | Searches for the best weights of genes iteratively while optimizing the clustering objective function | Public cancer datasets | High running time because of the large number of features | Euclidean distance | Accuracy: 94.5%, TPR: 0.94, FPR: 0.06 |
| S. S. Ray and S. Misra, [29] | Supervised Weighted Similarity | Distance | Improves the positive predictive value for gene pairs | Saccharomyces Genome Database | One feature depends on other features and all attributes are correlated, hence computational complexity is high | Weighted Pearson correlation | Positive Predictive Value: 0.91 |
| Z. Yu, et al. [30] | Adaptive Random Double Clustering based Cluster Ensemble Framework (A-RDCCE) | Partition | Reduces the feature dimension and the sample dimension to lessen the effect of noise | Cancer gene expression profile data | Missing values in the dataset, hence difficult to predict the values | Euclidean distance, Similarity-Rank (SimRank) | Random Index: 7.92, Purity Measure: 7.95 |
| J. Wang, et al. [31] | Laplacian regularized Low-Rank Representation Clustering technique | - | LLRR method simultaneously captures the global structures and the intrinsic local geometrical information within the data | Public cancer gene dataset | Not able to perform on large scale datasets | Symmetric similarity matrix | Accuracy: 95.83% |
| Z. Zareizadeh, et al. [32] | Multi-Objective clonal selection optimization algorithm | Evolutionary | Fast convergence to the optimal solutions and frequent updates of the solutions | Gene expression datasets | Large number of iterations, hence time complexity is high | Davies-Bouldin index | Dunn Index: 0.1744, Execution time: 1028 sec |
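Several entries in Table 1 report agreement-based scores such as the Rand index and purity. As a rough illustration of how such scores are obtained when reference labels are available, the sketch below computes the plain (unadjusted) Rand index and the purity of a clustering. The function names and the toy labels are hypothetical, and the adjusted Rand index reported for [19] additionally applies a chance correction that is not shown here.

```python
from collections import Counter
from itertools import combinations


def rand_index(true_labels, pred_labels):
    """Fraction of sample pairs on which the two labelings agree."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        # A pair agrees if it is grouped together in both labelings
        # or separated in both labelings.
        (true_labels[i] == true_labels[j]) == (pred_labels[i] == pred_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def purity(true_labels, pred_labels):
    """Share of samples that carry the majority true label of their cluster."""
    clusters = {}
    for t, p in zip(true_labels, pred_labels):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(true_labels)


# Hypothetical reference labels and one imperfect clustering of six samples.
truth = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
found = [0, 0, 1, 1, 1, 1]

ri = rand_index(truth, found)   # 10 of the 15 pairs agree
pu = purity(truth, found)       # 5 of the 6 samples match their cluster majority
```

Both scores lie between 0 and 1, with 1 meaning perfect agreement with the reference labels.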

6. CONCLUSIONS
Cancer gene data is high dimensional and has cluster structure. Numerous statistical
strategies help to detect cancer genes in order to improve the detection of cancer and
of its development stages. To detect differentially expressed genes under comparative
conditions, various hypothesis testing methods and the false discovery rate approach
are used. Existing cancer gene clustering techniques help to identify cancerous and
normal genes effectively. Clustering techniques are classified into several types, such
as partition based, density based and graph based. This paper reviewed the advantages,
limitations and similarity measures of existing cancer gene clustering techniques.
According to the comparison in Table 1, machine learning techniques with an ensemble
classifier show better results with respect to parameters such as time, memory and
accuracy. The clustering techniques are evaluated on standard cancer gene datasets.

REFERENCES
[1] Nagi, S. and Bhattacharyya, D. K. Cluster analysis of cancer data using semantic
similarity, sequence similarity and biological measures. Network Modeling Analysis in
Health Informatics and Bioinformatics, 3(1), 2014, pp. 67.
[2] Jonnalagadda, S. and Srinivasan, R. Determining distinct clusters in gene expression data
using similarity in principal component subspaces. International Journal of Advances in
Engineering Sciences and Applied Mathematics, 4(1-2), 2012, pp.41-51.
[3] Wei, D., Jiang, Q., Wei, Y. and Wang, S. A novel hierarchical clustering algorithm for
gene sequences. BMC bioinformatics, 13(1), 2012, pp. 174.
[4] Jiang, Z., Li, T., Min, W., Qi, Z. and Rao, Y. Fuzzy c-means clustering based on weights
and gene expression programming. Pattern Recognition Letters, 90, 2017, pp.1-7.
[5] Mehmood, R., El-Ashram, S., Bie, R. and Sun, Y. Effective cancer subtyping by
employing density peaks clustering by using gene expression microarray. Personal and
Ubiquitous Computing, 22(3), 2018, pp. 615-619.
[6] Acharya, S., Saha, S. and Pradhan, P. Novel symmetry-based gene-gene dissimilarity
measures utilizing Gene Ontology: Application in gene clustering. Gene, 679, pp. 341-
351.

[7] Hosseini, B. and Kiani, K. FWCMR: A scalable and robust fuzzy weighted clustering
based on MapReduce with application to microarray gene expression. Expert Systems with
Applications, 91, 2018, pp. 198-210.
[8] Chen, X. and Jian, C. Gene expression data clustering based on graph regularized
subspace segmentation. Neurocomputing, 143, 2014, pp. 44-50.
[9] Torshizi, A. D. and Zarandi, M. H. F. A new cluster validity measure based on general
type-2 fuzzy sets: application in gene expression data clustering. Knowledge-Based
Systems, 64, 2014, pp. 81-93.
[10] Alok, A. K., Saha, S. and Ekbal, A. Semi-supervised clustering for gene-expression data
in multiobjective optimization framework. International Journal of Machine Learning and
Cybernetics, 8(2), 2017, pp. 421-439.
[11] Kim, J. and Kim, H. Partitioning of functional gene expression data using principal
points. BMC bioinformatics, 18(1), 2017, pp. 450.
[12] Lord, E., Diallo, A. B. and Makarenkov, V. Classification of bioinformatics workflows
using weighted versions of partitioning and hierarchical clustering algorithms. BMC
bioinformatics, 16(1), 2015, pp. 68.
[13] Kumar, K. M. and Reddy, A. R. M. An efficient k-means clustering filtering algorithm
using density based initial cluster centers. Information Sciences, 418-419, 2017, pp. 286-
301.
[14] Sriwanna, K., Boongoen, T. and Iam-On, N. Graph clustering-based discretization
approach to microarray data. Knowledge and Information Systems, 2018, pp.1-28.
[15] Anusha, M. and Sathiaseelan, J. G. R. Evolutionary clustering algorithm using criterion-
knowledge-ranking for multi-objective optimization. In proceedings World Conference
on Futuristic Trends in Research and Innovation for Social Welfare (Startup Conclave),
pp. 1-13.
[16] Maâtouk, O., Ayadi, W., Bouziri, H. and Duval, B. Evolutionary biclustering algorithms:
an experimental study on microarray data. Soft Computing, 2018, pp. 1-27.
[17] Yin, L. and Liu, Y. Ensemble biclustering gene expression data based on the spectral
clustering. Neural Computing and Applications, 30(8), pp. 2403-2416.
[18] Soruri, M., Sadri, J. and Zahiri, S. H. Gene clustering with hidden Markov model
optimized by PSO algorithm. Pattern Analysis and Applications, 2018, pp. 1-6.
[19] Nidheesh, N., Nazeer, K. A. and Ameer, P. M. An enhanced deterministic K-Means
clustering algorithm for cancer subtype prediction from gene expression data. Computers
in biology and medicine, 91, 2017, pp. 213-221.
[20] Saha, S., Das, R. and Pakray, P. Aggregation of multi-objective fuzzy symmetry-based
clustering techniques for improving gene and cancer classification. Soft Computing, 2017,
pp. 1-20.
[21] Huang, X., Zhang, L., Wang, B., Li, F. and Zhang, Z. Feature clustering based support
vector machine recursive feature elimination for gene selection. Applied
Intelligence, 48(3), 2018, pp. 594-607.
[22] Kannan, S. R., Siva, M., Ramathilagam, S. and Devi, R. Effective Kernel-Based Fuzzy
Clustering Systems in Analyzing Cancer Database. Data-Enabled Discovery and
Applications, 2(1), 2018, pp. 5.
[23] Jaskowiak, P. A., Campello, R. J. and Costa, I. G., Evaluating correlation coefficients for
clustering gene expression profiles of cancer. In Brazilian Symposium on
Bioinformatics, pp. 120-131, Berlin, Heidelberg.

[24] Yan, X., Liang, A., Gomez, J., Cohn, L., Zhao, H. and Chupp, G. L. A novel pathway-
based distance score enhances assessment of disease heterogeneity in gene
expression. BMC bioinformatics, 18(1), 2017, pp. 309.
[25] Chicco, D., Ciceri, E. and Masseroli, M., Extended Spearman and Kendall Coefficients for
Gene Annotation List Correlation. In International Meeting on Computational Intelligence
Methods for Bioinformatics and Biostatistics, 2014, pp. 19-32, Springer.
[26] Kumar, V., Chhabra, J. K. and Kumar, D. Performance evaluation of distance metrics in
the clustering algorithms. INFOCOMP, 13(1), 2014, pp. 38-52.
[27] Ramos, J., Castellanos-Garzón, J. A., González-Briones, A., de Paz, J. F. and Corchado, J.
M. An agent-based clustering approach for gene selection in gene expression
microarray. Interdisciplinary Sciences: Computational Life Sciences, 9(1), 2017, pp. 1-13.
[28] Chen, H., Zhang, Y. and Gutman, I. A kernel-based clustering method for gene selection
with gene expression data. Journal of biomedical informatics, vol. 62, pp. 12-20.
[29] Ray, S. S. and Misra, S. A supervised weighted similarity measure for gene expressions
using biological knowledge. Gene, 595(2), 2016, pp. 150-160.
[30] Yu, Z., Chen, H., You, J., Liu, J., Wong, H. S., Han, G. and Li, L. Adaptive fuzzy
consensus clustering framework for clustering analysis of cancer data. IEEE/ACM
Transactions on Computational Biology and Bioinformatics (TCBB), 12(4), 2015, pp.
887-901.
[31] Wang, J., Liu, J. X., Kong, X. Z., Yuan, S. S. and Dai, L. Y., Laplacian Regularized Low-
Rank Representation for Cancer Samples Clustering. Computational Biology and
Chemistry, 2018.
[32] Zareizadeh, Z., Helfroush, M. S., Rahideh, A. and Kazemi, K., A robust gene clustering
algorithm based on clonal selection in multiobjective optimization framework. Expert
Systems with Applications, 113, 2018, pp. 301-314.
