REPORT
Machine-learning techniques are used in medicine as a non-invasive aid in the prediction and
detection of diseases, such as the prediction of fibrosis and cirrhosis and of the response to
therapy in hepatitis C patients. They are used to avoid the drawbacks of biopsy. Hepatitis C is
an infectious disease of the liver caused by the hepatitis C virus (HCV) and is a major global cause
of chronic hepatitis, liver cirrhosis, and hepatocellular carcinoma. Patients with chronic hepatitis C
were divided into two sets: one categorized as mild to moderate fibrosis (F0-F2), and the other
categorized as advanced fibrosis (F3-F4) according to the METAVIR score.
We use multiple linear regression, KNN (k-nearest neighbours), decision trees, and the bagging
technique to predict the stage of fibrosis. Comparing the accuracy of these models, computed from
their confusion matrices, the highest accuracy, 72%, is obtained with decision trees using the
bagging approach.
Contents
1. Abstract
2. Introduction
3. Problem Statement
4. Machine Learning Approaches
5. Code Implementation
6. Results
References
Introduction
Machine-learning techniques have been used as prediction, classification, and diagnosis tools.
Hepatitis C is an infectious disease of the liver caused by the hepatitis C virus (HCV); it causes
inflammation and infection of the liver. Unlike hepatitis A and B, there is no vaccine for
hepatitis C. Chronic hepatitis C symptoms develop over a period of months and may not be apparent
at first. The World Health Organization (WHO) estimates that 71 million people have chronic
hepatitis C. The assessment of liver fibrosis staging in chronic hepatitis C (CHC) is mandatory for
the management of patients infected with HCV. It is essential for monitoring the prognosis of the
disease, establishing the optimal timing for therapy, choosing management strategies, and predicting
the response to treatment. HCV is a major global cause of chronic hepatitis, liver cirrhosis, and
hepatocellular carcinoma.
Problem Statement
Liver biopsy has been considered the standard for staging liver fibrosis, but it has several
drawbacks:
High cost, especially when repeated periodically to monitor the progress of the disease.
Painful for patients.
Carries a risk of infection and bleeding.
Susceptible to sampling error.
Non-invasive methods are therefore needed as an alternative for staging chronic liver diseases, to
overcome the drawbacks of biopsy.
Machine Learning Approaches
Machine-learning techniques are used in medicine as a non-invasive aid in the prediction and
detection of diseases. Patients with chronic hepatitis C were divided into two sets: one
categorized as mild to moderate fibrosis (F0-F2), and the other categorized as advanced
fibrosis (F3-F4). The HCV-Egypt data set is used. It has 1384 patient observations and
29 variables: Age, Gender, BMI, Fever, Nausea.Vomiting, Headache, Diarrhea,
Fatigue/generalized.bone.ache, Jaundice, Epigastric.pain, WBC, RBC, HGB, Plat, AST.1, ALT.1,
ALT.4, ALT.12, ALT.24, ALT.36, ALT.48, ALT.after.24.w, RNA.Base, RNA.4, RNA.12,
RNA.EOT, RNA.EF, Baseline.histological.Grading, Baselinehistological.staging (fibrosis).
Feature Selection:
Pearson's correlation coefficient is used for feature selection. The coefficient between
Baselinehistological.staging (fibrosis) and each of the other variables was computed.
Table 1
Variable      Pearson's Correlation
age           -0.0187505532
gender         0.0129803382
fever         -0.0300457134
nausea         0.0539673391
headache      -0.0009988639
boneache       0.0135885246
jaundice       0.0192461221
epigastric    -0.0511800045
wbc            0.0173873318
rbc            0.0085519190
hgb            0.0024144025
platelet      -0.0158091915
AST1          -0.0240393805
ALT1           0.0381001785
alt4          -0.0136860091
alt12          0.0023133566
alt24         -0.0040491522
alt36         -0.0050901877
alt48         -0.0130380893
rnaafter       0.0340205726
rna_base       0.0311426697
rna4          -0.0346061313
rna12          0.0354142240
rna_eot       -0.0157042581
rna_ef         0.0298398993
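A correlation ranking of this kind can be computed with base R's `cor`. The following is a minimal sketch, not the report's own code: the data frame `hcv` is synthetic (random values, not real patients), and the column names `age`, `wbc`, `alt1`, and `stage` are placeholders chosen for illustration.

```r
# Synthetic stand-in for the HCV data set (random values, not real patients).
set.seed(1)
n <- 200
hcv <- data.frame(
  age   = sample(30:60, n, replace = TRUE),
  wbc   = rnorm(n, mean = 7500, sd = 2500),
  alt1  = rnorm(n, mean = 80, sd = 25),
  stage = sample(1:4, n, replace = TRUE)   # hypothetical fibrosis stage column
)

# Pearson correlation of every predictor with the target column.
predictors <- setdiff(names(hcv), "stage")
r <- sapply(hcv[predictors], function(x) cor(x, hcv$stage, method = "pearson"))

# Rank predictors by absolute correlation, strongest first.
ranked <- sort(abs(r), decreasing = TRUE)
print(round(r[names(ranked)], 4))
```

Ranking by absolute value treats strong negative and strong positive correlations alike, which is the usual choice when the coefficient is used only to select features.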
Decision Trees:
Decision tree learning is one of the predictive modeling approaches used in statistics, data
mining and machine learning. It uses a decision tree (as a predictive model) to go from observations
about an item (represented in the branches) to conclusions about the item's target value (represented
in the leaves). Tree models where the target variable can take a discrete set of values are
called classification trees; in these tree structures, leaves represent class labels and branches
represent conjunctions of features that lead to those class labels. A decision tree graphically
illustrates all the possible alternatives, probabilities, and outcomes, which is the main benefit of
using it for decision analysis.
When we applied the decision tree to the data set, we obtained the output shown above. A tree is
built by splitting the source set, which constitutes the root node of the tree, into subsets, which
constitute the successor children. The splitting follows a set of splitting rules based on the
classification features.
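To make the idea of a splitting rule concrete, the sketch below scores candidate thresholds on one numeric feature by weighted Gini impurity and keeps the best one. Gini impurity is an assumption made for illustration: the report does not state which split criterion was used, and the function names `gini` and `best_split` are hypothetical.

```r
# Gini impurity of a vector of class labels (0 means the node is pure).
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Find the threshold on numeric feature x that minimizes weighted Gini impurity.
best_split <- function(x, y) {
  values <- sort(unique(x))
  if (length(values) < 2) return(NULL)          # nothing to split on
  # Candidate thresholds: midpoints between consecutive distinct values.
  cuts <- (head(values, -1) + tail(values, -1)) / 2
  score <- sapply(cuts, function(cut) {
    left  <- y[x <= cut]
    right <- y[x >  cut]
    (length(left) * gini(left) + length(right) * gini(right)) / length(y)
  })
  list(threshold = cuts[which.min(score)], impurity = min(score))
}

# Toy example: the two classes separate cleanly between 3 and 10.
x <- c(1, 2, 3, 10, 11, 12)
y <- c("F0-F2", "F0-F2", "F0-F2", "F3-F4", "F3-F4", "F3-F4")
split <- best_split(x, y)
print(split)
```

On this toy input the chosen threshold is 6.5 with impurity 0, because both child subsets are pure. A full tree repeats the same search recursively on each child subset and over every feature.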
Multilinear Regression
Multiple linear regression attempts to model the relationship between two or more explanatory
variables and a response variable by fitting a linear equation to observed data. Every value of the
independent variable x is associated with a value of the dependent variable y. The population
regression line for p explanatory variables $x_1, x_2, \dots, x_p$ is defined to be
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$.
Regression analysis is a form of predictive modelling technique which investigates the relationship
between a dependent variable and one or more independent variables.
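A minimal sketch of fitting such a model with base R's `lm`, using synthetic data generated from known coefficients so the estimates can be checked by eye. The variable names `x1`, `x2`, and `y` and the chosen coefficients are illustrative assumptions, not taken from the report.

```r
# Synthetic data generated from y = 2 + 0.5*x1 - 1.5*x2 + noise.
set.seed(42)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 0.5 * x1 - 1.5 * x2 + rnorm(n, sd = 0.1)
d  <- data.frame(y, x1, x2)

# Fit the multiple linear regression and inspect the estimated coefficients.
fit <- lm(y ~ x1 + x2, data = d)
print(coef(fit))   # estimates of the intercept and the two slopes

# Predict the response for a new observation.
predict(fit, newdata = data.frame(x1 = 1, x2 = 0))
```

The fitted coefficients should land close to the generating values. For the fibrosis task a numeric prediction would still have to be mapped to a stage, for example by rounding; the report does not say how this was done.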
Code Implementation
Code for loading the data from the data set and producing scatter plots to visualize the
correlations.
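As a rough sketch of this step, the block below loads a CSV file with `read.csv` and draws a scatter-plot matrix with `pairs`. It is not the report's original code: the file is a small synthetic CSV written to a temporary path so the example is self-contained, and the column names `Age`, `BMI`, and `Stage` are placeholders.

```r
# Write a tiny synthetic CSV so the example is self-contained;
# in practice `path` would point at the real data file (hypothetical here).
set.seed(3)
path <- tempfile(fileext = ".csv")
write.csv(data.frame(Age   = sample(30:60, 50, replace = TRUE),
                     BMI   = sample(22:35, 50, replace = TRUE),
                     Stage = sample(1:4, 50, replace = TRUE)),
          path, row.names = FALSE)

# Load the data and check its shape and column types.
hcv <- read.csv(path)
str(hcv)

# Scatter-plot matrix of every pair of columns for a visual check of correlation.
pairs(hcv, main = "Pairwise scatter plots")
```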
Code for performing Multilinear Regression
Code for performing Bagging:
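Bagging fits the same learner to bootstrap resamples of the training set and combines the predictions by majority vote. The sketch below is hand-rolled for illustration and is not the report's original code. It uses `rpart` (a recommended package shipped with standard R distributions) instead of the `tree` package mentioned in this report, and the data, the column names, and the helper `bagged_predict` are all hypothetical.

```r
library(rpart)

set.seed(7)
n <- 300
# Synthetic two-class data: the class depends on x1, with some label noise.
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$class <- factor(ifelse(d$x1 + rnorm(n, sd = 0.5) > 0, "F3-F4", "F0-F2"))

train <- d[1:200, ]
test  <- d[201:300, ]

# Fit one classification tree per bootstrap resample of the training rows.
n_trees <- 25
trees <- lapply(seq_len(n_trees), function(i) {
  idx <- sample(nrow(train), replace = TRUE)
  rpart(class ~ x1 + x2, data = train[idx, ], method = "class")
})

# Majority vote across the trees for each row of newdata.
bagged_predict <- function(trees, newdata) {
  votes <- do.call(cbind, lapply(trees, function(fit) {
    as.character(predict(fit, newdata, type = "class"))
  }))
  apply(votes, 1, function(v) names(which.max(table(v))))
}

pred <- bagged_predict(trees, test)
mean(pred == test$class)   # test-set accuracy of the bagged ensemble
```

Averaging many trees grown on different resamples reduces the variance of a single tree, which is the usual motivation for bagging.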
Results
Decision Trees:
The tree package must first be loaded from the CRAN repository.
The variables with the highest correlation in the feature selection step are used to build the
decision tree.
We get an accuracy of 72%.
Regression Tree
Hence the accuracy is 65%.
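Accuracy figures of this kind are computed from a confusion matrix as the share of correctly classified cases. A minimal sketch with made-up labels follows; the vectors `actual` and `predicted` are hypothetical and are not the report's results.

```r
# Hypothetical true and predicted classes for ten patients.
lv <- c("F0-F2", "F3-F4")
actual    <- factor(c("F0-F2", "F0-F2", "F3-F4", "F3-F4", "F0-F2",
                      "F3-F4", "F0-F2", "F3-F4", "F3-F4", "F0-F2"), levels = lv)
predicted <- factor(c("F0-F2", "F3-F4", "F3-F4", "F3-F4", "F0-F2",
                      "F0-F2", "F0-F2", "F3-F4", "F0-F2", "F0-F2"), levels = lv)

# Confusion matrix: rows are predictions, columns are the true classes.
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

# Accuracy = correctly classified cases (the diagonal) / all cases.
accuracy <- sum(diag(cm)) / sum(cm)
accuracy
```

In this toy example 7 of the 10 labels agree, so the accuracy is 0.7. Fixing the factor levels up front keeps the matrix square even if one class is never predicted.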
References
[1] L. Castera, “Noninvasive methods to assess liver disease in patients with hepatitis B or C,”
Gastroenterology, vol. 142, pp. 1293–1302, 2012.
[2] L. Zhang, Q.-Y. Li, Y.-Y. Duan, G.-Z. Yan, Y.-L. Yang, and R.-J. Yang, “Artificial neural network aided
non-invasive grading evaluation of hepatic fibrosis by duplex ultrasonography,” BMC Med. Informat. Decision
Making, vol. 12, 2012, Art. no. 55.
[4] L. Gravitz, “A smouldering public-health crisis,” Nature, vol. 474, no. 7350, pp. S2–S4, 2011.