Multimedia Tools and Applications
EXTREME GRADIENT BOOSTING MODEL FOR PREDICTION OF GENOME DISORDER
--Manuscript Draft--
Manuscript Number: MTAP-D-24-02268
Full Title: EXTREME GRADIENT BOOSTING MODEL FOR PREDICTION OF GENOME DISORDER
Article Type: Track 2: Medical Applications of Multimedia
Keywords: AGDPM, XGBoost, RNA, Deep Learning, DNN, SGID, MGID and CNN
Corresponding Author: Pavan Desineedi, M.Tech
Bonam Venkata Chalamayya Engineering College
Allavaram, Andhra Pradesh INDIA
Corresponding Author Secondary Information:
Corresponding Author's Institution: Bonam Venkata Chalamayya Engineering College
Corresponding Author's Secondary Institution:
First Author: Pavan Desineedi, M.Tech
First Author Secondary Information:
Order of Authors: Pavan Desineedi, M.Tech
Subba Rao Polamuri
Chandra Mouli Venkata Srinivas Aakana
Order of Authors Secondary Information:
Funding Information:
Abstract: Genetic illness prediction is an important and timely issue in biomedical science. Mutations in the genome are the root cause of many diseases with significant global mortality rates, including Alzheimer's disease, cancer, diabetes, cystic fibrosis, and Leigh syndrome. Prior research has developed theoretical and explanatory approaches to predicting genetic abnormalities. As genetic data has expanded to cover practically the entire genome and protein, methods based on machine learning and deep learning have been created to forecast genomic abnormalities. Deep learning methods emerged alongside machine learning techniques. Studies on forecasting genetic anomalies have employed a variety of learning strategies, including supervised, unsupervised, and semi-supervised approaches. Most of these studies used genetic sequence data for binary classification problems. Their results were unreliable because they were less accurate and relied on binary class prediction algorithms, which ignore the histories of individuals with genetic anomalies. The majority of the approaches relied on RNA gene sequences, which led to frequent issues when dealing with auction data. Here, we use the XGBoost algorithm to predict multiclass genome disorders from a large dataset using an advanced genome disorder prediction model (AGDPM). AGDPM, trained with the XGBoost algorithm, achieved an average accuracy of 92.65% in both the training and testing phases of the study. With the incorporation of a multi-class prediction technique, the advanced genome disorder prediction model can therefore reliably predict genome disorders and analyse a large quantity of patient genome disorder data. Multiple statistical performance metrics demonstrate that AGDPM can accurately predict diseases caused by a single gene, mitochondrial genes, and multiple genes. As a result, AGDPM will help biomedical researchers anticipate genetic disorders and manage mortality rates.
EXTREME GRADIENT BOOSTING MODEL FOR PREDICTION OF GENOME DISORDER

¹D.N.V. Pavan Manikanta, ²Dr. Subba Rao Polamuri, ³Dr. Chandra Mouli Venkata Srinivas Aakana

¹M.Tech Student, Department of CSE; ²Professor & HOD, Department of CSE (AI&DS); ³Professor & Principal, Department of CSE
¹,²,³Bonam Venkata Chalamayya Engineering College (Autonomous), Odalarevu
[email protected], [email protected], [email protected]
1. INTRODUCTION
It is estimated that almost 2,000 different human diseases can be traced back to a single faulty gene, making them monogenic syndromes. The underlying genes for each illness present themselves in somewhat different ways, resulting in a wide range of phenotypic manifestations. Establishing phenotype-gene correlations is therefore a crucial task that helps researchers and medical professionals understand the fundamental genetic pathways behind disorders. Identifying disease-causing genes aids patient diagnosis and sheds light on the complex network of genetic interactions. In other words, a possible genetic disease can be detected by studying the causative mutant genotypes during disease-gene identification. Single-nucleotide changes, single-nucleotide insertions or deletions, complete gene loss, and other genetic anomalies can all affect disease-causing genes. Positional cloning, linkage analysis, and mutation analysis are established approaches to identifying disease genes. First, linkage analysis on human pedigrees locates the susceptible chromosomal interval, which is roughly where the disease-associated candidate genes lie. Second, positional cloning is used to sequence a set of candidate genes in that region. This approach incorporates both spatial and transcriptional mapping.

A human genetic disorder is an inherited condition, present from conception, that results from a genetic or chromosomal abnormality. There are two primary categories of genetic illness: single-gene diseases and complex disorders. A single-gene disease arises from a mutation in the deoxyribonucleic acid (DNA) of one gene. Such conditions are readily passed from one generation to the next and are known as Mendelian diseases. Complex diseases are the pathological outcome of a combination of environmental, behavioural, and lifestyle factors, and genetic defects account for only a small fraction of the phenotypes associated with them.

A single-gene disorder is caused solely by a mutation in one gene. Because such a mutation can arise in any gene, single-gene illnesses are highly varied. Despite their wide variation in presentation, all single-gene disorders share the same core genetic and psychosocial care needs: the ability to make informed decisions about risk management, and emotional and practical support for those affected, whether young or old.

Mitochondrial gene inheritance disorders are associated with alterations in mitochondrial DNA, which lies outside the nucleus. A mitochondrion contains as many as ten circular molecules of DNA. The fertilised egg retains the organelles of the oocyte, so these disorders are passed on by the mother. The symptoms of mitochondrial disease include lactic acidosis, stroke-like episodes, eye abnormalities, and encephalopathy.

Inherited disorders have various root causes. Multifactorial diseases have several causes, including gene alterations that act together with dietary and environmental factors. Such a polygenic illness is also referred to as a complex illness [2]. Diabetes, Alzheimer's disease, and cancer are examples of complex genetic disorders.

2. OBJECTIVE
The AGDPM was evaluated with numerous statistical performance measures on the prediction of multifactorial gene inheritance disorders. Genetic illnesses can be multifactorial, which means that genetic factors contribute to only a subset of the phenotypes associated with the disorder. Diseases with multiple causal factors, or risk factors, include those caused by both genetic predisposition and environmental influences.

A single-gene disorder is caused solely by a mutation in one gene. Because such a mutation can arise in any gene, single-gene illnesses are highly varied. Despite their clinical distinctions, all single-gene illnesses are inherited, share a common biological basis, and require the same fundamental genetic and counselling services: the ability to make informed decisions about risk management, and emotional and practical support for those affected, whether young or old.

Mitochondrial gene inheritance disorders are associated with alterations in mitochondrial DNA, which lies outside the nucleus. A mitochondrion contains between five and ten circular molecules of DNA. The fertilised egg retains the organelles of the oocyte, so these disorders are passed on by the mother. The symptoms of mitochondrial disease include lactic acidosis, stroke-like episodes, eye abnormalities, and encephalopathy.

Multifactorial diseases, which frequently result from the interplay of environmental and nutritional factors, may involve many mutations. They are also called complex or polygenic diseases. Diabetes, Alzheimer's disease, and cancer are examples of complex genetic disorders.

Machine learning is an alternative to conventional methods of genetic prediction. Owing to advances in the field, as well as growing data sets and computing power, deep learning has become increasingly popular in recent years. These methods are useful in statistical genetics because they can identify interactions between several loci without assuming additivity and because they operate on high-dimensional data, although that high dimensionality also makes it difficult to estimate the relative importance of individual factors.

2.1 PROBLEM STATEMENT
Forecasting genome disorders is essential in genetics and medical research. Although Deep Neural Networks (DNNs) have shown considerable promise on this problem, their generalisation and performance are constrained by overfitting.
Dropout is frequently employed to avoid overfitting, but in Convolutional Neural Networks (CNNs) it is limited by the increased spatial correlation of zeroed-out values in the output feature maps.
The existing approach recommends Checkerboard Dropout, a structured dropout method, to improve performance and generalisation while also tackling the spatial correlation problem. Despite its advantages, Checkerboard Dropout may still have problems that need fixing.

2.2 EXISTING SYSTEM
Dropout is a method employed by contemporary Deep Neural Networks (DNNs) to combat overfitting. During dropout, features are removed from the feature maps at random. However, the applicability of dropout to CNNs is constrained by an increase in the spatial correlation of the zeroed-out values in the output feature maps, which in turn hinders the network's performance and generalisation.

DropBlock, a structured form of dropout that drops a contiguous region and reduces the randomness of standard dropout, has recently been used to alleviate the spatial correlation problem.
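
The difference between standard dropout and a block-wise structured dropout can be illustrated with a small sketch. This is a simplified illustration in plain Python, not the implementation used in any of the cited systems: the function names `standard_dropout` and `block_dropout`, the parameter `seed_prob`, and the 6 x 6 feature map are assumptions made for the example, and the block variant only approximates the idea behind DropBlock.

```python
import random


def standard_dropout(feature_map, drop_prob, rng):
    """Zero each activation independently with probability drop_prob."""
    keep_prob = 1.0 - drop_prob
    return [
        # Kept activations are rescaled so the expected value is unchanged.
        [value / keep_prob if rng.random() >= drop_prob else 0.0 for value in row]
        for row in feature_map
    ]


def block_dropout(feature_map, block_size, seed_prob, rng):
    """Zero contiguous block_size x block_size regions (simplified, DropBlock-style)."""
    height, width = len(feature_map), len(feature_map[0])
    mask = [[1.0] * width for _ in range(height)]
    # Each valid top-left corner starts a dropped block with probability seed_prob.
    for top in range(height - block_size + 1):
        for left in range(width - block_size + 1):
            if rng.random() < seed_prob:
                for r in range(top, top + block_size):
                    for c in range(left, left + block_size):
                        mask[r][c] = 0.0
    kept = sum(sum(row) for row in mask)
    # Rescale by the fraction of surviving activations.
    scale = (height * width) / kept if kept else 0.0
    return [
        [feature_map[r][c] * mask[r][c] * scale for c in range(width)]
        for r in range(height)
    ]


rng = random.Random(0)
fmap = [[1.0] * 6 for _ in range(6)]  # hypothetical 6 x 6 feature map
dropped = standard_dropout(fmap, drop_prob=0.25, rng=rng)
blocked = block_dropout(fmap, block_size=2, seed_prob=0.1, rng=rng)
```

With standard dropout the zeroed positions are scattered, so neighbouring activations can still carry the same information; the block variant removes whole neighbourhoods at once.
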

Disadvantage of Existing System

• The existing system recommends Checkerboard Dropout as a fix for the overfitting problem.
• Checkerboard Dropout is a structured dropout technique that mitigates the randomness and spatial correlation problems while improving model generalisation.

2.3 PROPOSED SYSTEM
Gene abnormalities can give rise to a wide variety of disorders. These include multifactorial genome disorders, mitochondrial gene inheritance disorders, and single-gene inheritance disorders.
New advances in genomic technology have made it possible to acquire genetic data with greater precision.
Many large-scale genetic studies, including those for MGD and SGID, have identified hundreds of people with abnormalities. Despite the volume of data these studies have produced, pinpointing the specific disease-causing genes has proven challenging. The fertilised egg retains the organelles of the oocyte, so mitochondrial disorders are passed on by the mother.

Advantages of Proposed System
• A gradient descent method is used to minimise the loss when adding new models.
• It is largely independent of manual feature engineering.
• The suggested model, based on the XGBoost algorithm, obtained 92.65% prediction accuracy using patients' clinical feature data.
• The suggested model, which also has favourable space and computational complexity, uses the XGBoost algorithm to predict these disorders.
• It markedly improved prediction results.

3. RELATED WORKS

New advances in genomic technology have made it possible to acquire genetic data with greater precision. Numerous large-scale genetic studies, including those for MGD and SGID, have identified hundreds of people with abnormalities [4, 5]. Finding the specific disease-causing genes has been challenging despite the abundance of data from these investigations [6]. Different disturbances within a single disorder module often produce similar phenotypes, and protein networks are closely related to phenotype networks (in which genes are connected if they are associated with related phenotypic states), which suggests that genetic information is particularly useful [7]. There is also a connection between transcription factor networks and the genome [8]. Furthermore, anomalies in distant neighbourhoods of the interactome produce distinct phenotypes [6]. Existing methods for predicting disease from genes take all of these factors into account. In one such study, a binary support vector machine was used to aggregate data from several sources. Binary learning algorithms, including semi-supervised [10] and positive-unlabeled [11] approaches, have been proposed as a means of sifting through the remaining unlabelled genes in the hope of discovering previously unknown disease genes. Recent years have seen the successful application of deep learning and machine learning in many biological domains. Although deep learning and machine learning algorithms can handle massive data sets with substantial noise, complexity, and error levels, they yield only a limited number of trustworthy estimates of the underlying probability distributions and data-generating processes.

4. METHODOLOGY OF PROJECT

Early diagnosis is crucial for clinicians and the biomedical industry to treat genetic illnesses effectively. In this investigation, we propose AGDPM for the early diagnosis of multi-class genetic anomalies. The flow of the investigation is illustrated through the training model of the XGBoost algorithm and the AGDPM. The system uses the Streamlit framework for user interaction: it outputs a prediction of mitochondrial gene inheritance disorders, single-gene inheritance disorders, or multifactorial gene inheritance disorders without the need for a physician.

MODULES NAME:
• Data gathering
• Dataset creation
• Data preparation
• Model selection
• Analysis and prediction
• Accuracy on the test set
• Saving the trained model

MODULE DESCRIPTION:
1) Data Collection:
Data collection is the first real step in building a machine learning model. It is an important stage because the amount and quality of the data we are able to gather determine how well the model works.
Data can be collected through manual intervention, web scraping, and other techniques.

2) Dataset:
The dataset contains 22,084 records. Its 45 columns are described below.
1. Patient Id: identifier of the patient.
2. Patient Age: the age of the patient.
3. Mother's side genes: whether or not maternal genes are present.
4. Inherited from father: parents use DNA to pass on characteristics or traits, such as blood type and eye colour, to their children.
5. Maternal gene: maternal genes are genes whose RNA or protein products are produced or deposited in the oocyte, or are present in the fertilised egg or embryo before zygotic gene expression begins.
6. Paternal gene: paternal inheritance refers to any characteristic that a father passes on to his offspring.
7. Blood cell count (mcL): a measure of the number of red blood cells, white blood cells, and platelets in the blood.
8. Patient First Name: the patient's first name.
9. Father's name and family name.
10. Name of mother and father.
11. Age of mother: the age of the patient's mother.
12. Age of father: the age of the patient's father.
13. Institution Name: the name of the hospital or institution.
14. Institute's Location: the location of the hospital or institution.
15. Status: whether the patient is alive or deceased.
16. Respiratory Rate (breaths/min): the breathing rate, which is controlled by the brain's respiratory centre.
17. Heart Rate (rates/min): the frequency of the heartbeat, measured as the number of beats per minute (bpm); also called the pulse rate.
18. Test 1: whether the test was completed.
19. Test 2: whether the test was completed.
20. Test 3: whether the test was completed.
21. Test 4: whether the test was completed.
22. Test 5: whether the test was completed.
23. Parental consent: parental consent laws, also known as parental involvement laws, require that one or more parents consent or be notified before their child may legally take part in a particular activity.
24. Follow-up: whether the follow-up level is high or low.
25. Gender: male, female, or indeterminate.
26. Birth asphyxia: asphyxia, also called asphyxiation, is a condition in which abnormal breathing leaves the body with insufficient oxygen; this column records asphyxia during childbirth.
27. Autopsy shows birth defect (if any): an autopsy (also called a post-mortem examination, obduction, necropsy, or autopsia cadaverum) is a surgical procedure in which a corpse is examined thoroughly by dissection to determine the cause, mode, and manner of death and to evaluate any disease or injury that may be present, for research or educational purposes.
28. Place of Birth: the birthplace.
29. Folic acid details (peri-conceptional): folic acid is a form of vitamin B that helps the body produce new, healthy cells.
30. H/O serious maternal illness: indicates an unanticipated outcome of labour and delivery that had a major short-term or long-term impact on the patient's mother.
31. H/O radiation exposure (x-ray): indicates whether the patient has ever been exposed to radiation.
32. H/O substance abuse: indicates whether a parent has previously struggled with drug addiction.
33. Assisted Conception (IVF/ART): indicates the type of infertility treatment.
34. Previous pregnancy abnormalities: any history of anomalies in prior pregnancies (yes or no).
35. Number of previous abortions: the total number of prior abortions.
36. Birth defects: indicates whether the patient has birth defects.
37. White blood cell count: the number of white blood cells, expressed in thousands per microliter.
38. Blood test result: normal, slightly abnormal, unclear, or abnormal.
39. Symptom 1: whether Symptom 1 is present (yes or no).
40. Symptom 2: whether Symptom 2 is present (yes or no).
41. Symptom 3: whether Symptom 3 is present (yes or no).
42. Symptom 4: whether Symptom 4 is present (yes or no).
43. Symptom 5: whether Symptom 5 is present (yes or no).
44. Genetic Disorder: the genetic disorder as diagnosed by a medical professional.
45. Type of Disorder: the disorder subclass.

3) Data Preparation:
Compile the data and prepare it for training. Remove duplicates, correct errors, handle missing values, normalise, convert data types, and clean up any other clutter.
Randomise the data to erase the effects of the particular order in which it was collected or otherwise processed.
Conduct further exploratory analysis, such as visualising the data to find significant class imbalances or relationships between variables (beware of bias!).
Split the data into training and evaluation sets.
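
The preparation steps listed above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the function `prepare`, the column names (`age`, `heart_rate`, `disorder`), the mean imputation, and the 80/20 split are hypothetical choices for the example and are not taken from the study's actual pipeline.

```python
import random


def prepare(records, label_key, test_fraction=0.2, seed=42):
    """Deduplicate, drop unlabelled rows, shuffle, split, and impute missing values."""
    # 1. Remove exact duplicate rows, keeping the first occurrence.
    seen, unique = set(), []
    for row in records:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))  # copy so the input is left untouched
    # 2. Rows without a label cannot be used for supervised training.
    labelled = [row for row in unique if row.get(label_key) is not None]
    # 3. Randomise the order to erase any effect of collection order.
    random.Random(seed).shuffle(labelled)
    # 4. Split into evaluation and training sets.
    n_test = max(1, int(len(labelled) * test_fraction))
    test, train = labelled[:n_test], labelled[n_test:]
    # 5. Fill missing numeric values with the mean of the training rows only,
    #    so that no information from the evaluation set leaks into training.
    numeric_keys = {
        key
        for row in train
        for key, value in row.items()
        if key != label_key and isinstance(value, (int, float))
    }
    for key in numeric_keys:
        present = [row[key] for row in train if row.get(key) is not None]
        mean = sum(present) / len(present)
        for row in train + test:
            if row.get(key) is None:
                row[key] = mean
    return train, test


# Hypothetical records; the column names are illustrative only.
records = [
    {"age": 4, "heart_rate": 90.0, "disorder": "A"},
    {"age": 4, "heart_rate": 90.0, "disorder": "A"},   # exact duplicate
    {"age": None, "heart_rate": 80.0, "disorder": "B"},
    {"age": 6, "heart_rate": None, "disorder": "C"},
    {"age": 8, "heart_rate": 70.0, "disorder": None},  # missing label
    {"age": 10, "heart_rate": 100.0, "disorder": "A"},
    {"age": 2, "heart_rate": 60.0, "disorder": "B"},
]
train_rows, test_rows = prepare(records, label_key="disorder")
```

Splitting before imputing is a design choice in this sketch; computing the fill values on the full data would let evaluation rows influence training.
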
4) Model Selection:
We evaluated the XGBoost and Support Vector Machine methods, which produced accuracies of 98% and 80% on the training set, respectively, and developed this method on that basis.

5) Analysis and Prediction:
Out of the entire dataset, we chose only two attributes:
1. A description of the health values.
2. Outcome: the type of genetic condition that the patient has.

6) Accuracy on test set:
We obtained accuracies of 92.65% and 41.40% on the test set.
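
For a multi-class problem, overall accuracy can hide weak performance on individual classes, so it is often reported together with a per-class measure. The sketch below shows how both could be computed; the labels and predictions are made-up illustrative values, not results from this study, and the helper names `accuracy` and `per_class_recall` are assumptions for the example.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each true class, the fraction of its samples predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical labels for the three disorder classes (illustrative only).
y_true = ["single", "mito", "multi", "mito", "single", "mito"]
y_pred = ["single", "mito", "mito", "mito", "multi", "mito"]

overall = accuracy(y_true, y_pred)
recall_by_class = per_class_recall(y_true, y_pred)
```
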

7) Saving the Trained Model:
Once the model has been trained and validated, the first step in deploying it in a production setting is to save it as a .pkl file using a library such as pickle. In Python, pickle is part of the standard library, so no separate installation is needed.

At this stage, the module serialises the trained model and exports it as a .pkl file.
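
A minimal sketch of the save-and-restore step, using Python's standard `pickle` module, is given below. The dictionary standing in for the trained model and the file name `agdpm_model.pkl` are assumptions for illustration; in practice the fitted classifier object would be passed in its place.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialise the model object to a .pkl file."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Restore a model object from a .pkl file."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Hypothetical stand-in for the trained classifier (any picklable object works).
model = {
    "classes": ["single-gene", "mitochondrial", "multifactorial"],
    "n_trees": 100,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "agdpm_model.pkl")  # illustrative file name
    save_model(model, model_path)
    restored = load_model(model_path)
```

Pickle files can execute arbitrary code when loaded, so a deployed application should only load model files from a trusted source.
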

5. ALGORITHM USED IN PROJECT

5.1. XGBOOST ALGORITHM:

XGBoost minimises a regularised (L1 and L2) objective function that combines a convex loss function (based on the difference between the predicted and target outputs) with a penalty term for model complexity (i.e., the regression tree functions). Training proceeds iteratively: new trees are added to predict the errors, or residuals, of the earlier trees, and their outputs are combined with those of the earlier trees to give the final prediction. Because of its high performance, scalability, and accuracy, XGBoost is widely used in image classification, text mining, and recommender system applications. The input features used by AGDPM include data on genetic diseases.
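
The idea of adding new trees that predict the residuals of earlier trees can be shown with a toy example. The sketch below is plain gradient boosting with squared loss and one-split trees (stumps) on a single feature; it is not the XGBoost implementation, which additionally uses second-order gradient information and the regularisation terms described above. The function names, the data, the number of rounds, and the learning rate are all assumptions for illustration.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree that minimises squared error on the residuals."""
    best = None
    # Candidate thresholds exclude the largest value so both sides are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Build an additive model: each new stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)  # initial prediction: the mean target
    stumps = []
    predictions = [base] * len(ys)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for p, x in zip(predictions, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


# Toy one-feature data (illustrative values only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
predict = boost(xs, ys)


def mse(fn):
    """Mean squared error of a prediction function on the toy data."""
    return sum((fn(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


baseline_mse = mse(lambda x: sum(ys) / len(ys))
boosted_mse = mse(predict)
```

Each round shrinks the training residuals a little, which is why the boosted error ends up below the error of the constant baseline.
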

6. DATA FLOW DIAGRAM

Fig. Data Flow Diagram

7. SYSTEM ARCHITECTURE

Fig. System Architecture of Project

8. RESULTS

9. FUTURE ENHANCEMENT

Further genetic disorders and additional prediction models can be added to this study in the future.

10. CONCLUSION

Technological progress in artificial intelligence has had a significant effect on biological research. In this research, we applied a machine learning model to the AGDPM. Information on genetic anomalies was gathered from an online source, and the XGBoost model was used to develop the AGDPM. The model's efficacy was measured using a wide variety of statistical criteria. AGDPM achieves a prediction accuracy of 92.65%, higher than that of ResNet-50, for identifying diseases caused by mutations in a single gene, mitochondrial disorders, and multifactorial diseases. By aiding the forecasting of genetic abnormalities, the AGDPM can advance biomedical research. Additional predictions and genetic abnormalities may be added to this study in the future to produce a more precise prediction model.

REFERENCES

[1] McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine. Online Mendelian Inheritance in Man. Accessed: Nov. 1, 2021. Available: www.ncbi.nlm.nih.gov/omim.

[2] B. Irom, “Genetic disorders: A literature review,” Genet. Mol. Biol. Res., vol. 4, no. 2, p. 30, 2020.

[3] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Commun. ACM, vol. 60, no. 2, pp. 84–90, Jun. 2012.

[4] S. J. Sanders, “First glimpses of the neurobiology of autism spectrum disorder,” Current Opinion Genet. Develop., vol. 33, pp. 80–92, Aug. 2015.

[5] Europe PMC Funders Group, “Biological insights from 108 schizophrenia-associated genetic loci,” Nature, vol. 511, no. 7510, pp. 421–427, Jul. 2014.

[6] J. Menche, A. Sharma, M. Kitsak, S. D. Ghiassian, M. Vidal, J. Loscalzo, and A.-L. Barabási, “Uncovering disease-disease relationships through the incomplete interactome,” Science, vol. 347, no. 6224, Feb. 2015, Art. no. 1257601.

[7] A. L. Barabási, N. Gulbahce, and J. Loscalzo, “Network medicine: A network-based approach to human disease,” Nature Rev. Genet., vol. 12, pp. 56–68, Oct. 2011.

[8] M. Vidal, M. E. Cusick, and A. L. Barabási, “Interactome networks and human disease,” Cell, vol. 144, no. 6, pp. 986–998, Mar. 2011.

[9] X. Wang, N. Gulbahce, and H. Yu, “Network-based methods for human disease gene prediction,” Briefings Funct. Genomics, vol. 10, no. 5, pp. 280–293, 2011.

[10] T.-P. Nguyen and T. B. Ho, “Detecting disease genes based on semi-supervised learning and protein–protein interaction networks,” Artif. Intell. Med., vol. 54, no. 1, pp. 63–71, Jan. 2012.

[11] P. Yang, X. L. Li, J. P. Mei, C. K. Kwoh, and S. K. Ng, “Positive-unlabeled learning for disease gene identification,” Bioinformatics, vol. 28, no. 20, pp. 2640–2647, 2012.

[12] A. Rishabh. Of Genomes and Genetics HackerEarth Machine Learning Challenge. Kaggle. Accessed: Oct. 27, 2021. Available: https://www.kaggle.com/aryarishabh/of-genomes-and-genetics-hackerearth-ml-challenge.

[13] P. Han, P. Yang, P. Zhao, S. Shang, Y. Liu, J. Zhou, X. Gao, and P. Kalnis, “GCN-MF: Disease-gene association identification by graph convolutional networks and matrix factorization,” in Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Jul. 2019, pp. 705–713.

[14] X. Zeng, Y. Liao, Y. Liu, and Q. Zou, “Prediction and validation of disease genes using HeteSim scores,” IEEE/ACM Trans. Comput. Biol. Bioinf., vol. 14, no. 3, pp. 687–695, May 2017.

[15] H. Zhou and J. Skolnick, “A knowledge-based approach for predicting gene–disease associations,” Bioinformatics, vol. 32, no. 18, pp. 2831–2838, Sep. 2016.

[16] Y. Li, H. Kuwahara, P. Yang, L. Song, and X. Gao, “PGCN: Disease gene prioritization by disease and gene embedding through graph convolutional neural networks,” bioRxiv, vol. 2019, Jan. 2019, Art. no. 532226, doi: 10.1101/532226.

[17] K. Yang, Y. Zheng, K. Lu, K. Chang, N. Wang, Z. Shu, J. Yu, B. Liu, Z. Gao, and X. Zhou, “PDGNet: Predicting disease genes using a deep neural network with multi-view features,” IEEE/ACM Trans. Comput. Biol. Bioinf., vol. 19, no. 1, pp. 575–584, Jan. 2022, doi: 10.1109/TCBB.2020.3002771.

[18] M. Alshahrani and R. Hoehndorf, “Semantic disease gene embeddings (SmuDGE): Phenotype-based disease gene prioritization without phenotypes,” Bioinformatics, vol. 34, no. 17, pp. i901–i907, Sep. 2018.

[19] K. Yang, R. Wang, G. Liu, Z. Shu, N. Wang, R. Zhang, J. Yu, J. Chen, X. Li, and X. Zhou, “HerGePred: Heterogeneous network embedding representation for disease gene prediction,” IEEE J. Biomed. Health Informat., vol. 23, no. 4, pp. 1805–1815, Jul. 2019.

[20] K. Yang, N. Wang, G. Liu, R. Wang, J. Yu, R. Zhang, J. Chen, and X. Zhou, “Heterogeneous network embedding for identifying symptom candidate genes,” J. Amer. Med. Inform. Assoc., vol. 25, Nov. 2018.

[21] Y. Liu, H. Q. Qu, X. Chang, L. Tian, J. Qu, J. Glessner, P. M. A. Sleiman, and H. Hakonarson, “Machine learning reduced gene/non-coding RNA features that classify schizophrenia patients accurately and highlight insightful gene clusters,” Int. J. Mol. Sci., vol. 22, no. 7, p. 3364, Mar. 2021.

[22] Y. Liu, H. Q. Qu, F. D. Mentch, J. Qu, X. Chang, K. Nguyen, L. Tian, J. Glessner, P. M. A. Sleiman, and H. Hakonarson, “Application of deep learning algorithm on whole genome sequencing data uncovers structural variants associated with multiple mental disorders in African American patients,” Mol. Psychiatry, vol. 27, no. 3, pp. 1469–1478, Mar. 2022, doi: 10.1038/s41380-021-01418-1.

[23] Rectifier (Neural Networks). Accessed: Nov. 4, 2021.

[24] Statistics #03: Standard Deviation and Variance. Accessed: Nov. 4, 2021. Available: https://towardsdatascience.com/statistics-03-standard-deviation-and-variance-9724f33b58df.

[25] Softmax Activation Function: How It Actually Works. Accessed: Nov. 4, 2021. Available: https://towardsdatascience.com/softmax-activation-function-how-it-actually-works-d292d335bd78

[26] A.-U. Rahman, S. Abbas, M. Gollapalli, R. Ahmed, S. Aftab, M. Ahmad, M. A. Khan, and A. Mosavi, “Rainfall prediction system using machine learning fusion for smart cities,” Sensors, vol. 22, no. 9, p. 3504, May 2022.

[27] M. Saleem, S. Abbas, T. M. Ghazal, M. A. Khan, N. Sahawneh, and M. Ahmad, “Smart cities: Fusion-based intelligent traffic congestion control system for vehicular networks using machine learning techniques,” Egyptian Informat. J., vol. 6, pp. 1–10, Apr. 2022.

[28] M. W. Nadeem, H. G. Goh, M. A. Khan, M. Hussain, M. F. Mushtaq, and V. A. Ponnusamy, “Fusion-based machine learning architecture for heart disease prediction,” Comput., Mater. Continua, vol. 67, no. 2, pp. 2481–2496, 2021.

[29] S. Y. Siddiqui, A. Athar, M. A. Khan, S. Abbas, Y. Saeed, M. F. Khan, and M. Hussain, “Modelling, simulation and optimization of diagnosis cardiovascular disease using computational intelligence approaches,” J. Med. Imag. Health Informat., vol. 10, no. 5, pp. 1005–1022, May 2020.

[30] N. Taleb, S. Mehmood, M. Zubair, I. Naseer, B. Mago, and M. U. Nasir, “Ovary cancer diagnosing empowered with machine learning,” in Proc. Int. Conf. Bus. Anal. Technol. Secur. (ICBATS), Feb. 2022, pp. 1–6.

[31] A.-U. Rahman, A. Alqahtani, N. Aldhafferi, M. U. Nasir, M. F. Khan, M. A. Khan, and A. Mosavi, “Histopathologic oral cancer prediction using oral squamous cell carcinoma biopsy empowered with transfer learning,” Sensors, vol. 22, no. 10, p. 3833, May 2022.
Link(s) to supporting data
https://www.kaggle.com/datasets/surbhi1425/human-genome6
