REPORT


ABSTRACT

Machine-learning techniques are used in medicine to support non-invasive prediction and detection
of diseases, such as prediction of fibrosis and cirrhosis and prediction of response to therapy in
hepatitis C patients. They are used to avoid the drawbacks of biopsy. Hepatitis C is an infectious
disease of the liver caused by the hepatitis C virus (HCV) and is a major global cause of chronic
hepatitis, liver cirrhosis, and hepatocellular carcinoma. Patients with chronic hepatitis C were
divided into two sets: one categorized as mild to moderate fibrosis (F0-F2), and the other
categorized as advanced fibrosis (F3-F4) according to the METAVIR score.

Unlike hepatitis A and B, there is no vaccine for hepatitis C. Chronic hepatitis C symptoms develop
over a period of months and may not be apparent at first. The World Health Organization (WHO)
estimates that 71 million people have chronic hepatitis C. Pearson's correlation coefficient is
used for feature selection: the coefficient between Baselinehistological.staging (fibrosis) and
each variable has been assessed.

We use multilinear regression, KNN (k-nearest neighbours), decision trees, and bagging to predict
the stage of fibrosis. Based on the confusion matrix and the accuracy of each of these models, the
highest accuracy, 72%, is obtained with decision trees combined with bagging.

Contents

1. Abstract

2. Introduction

3. Problem Statement

4. Topic Explanation - Machine Learning Approaches

5. Code Implementation

6. Results

References

List of Figures

1. Fig 1: Lungs affected with Hepatitis C

2. Fig 2: Fibrosis Stage

3. Fig 3: Decision Tree

4. Fig 4: Multilinear Regression

Introduction
Machine-learning techniques have been used as prediction, classification, and diagnosis tools.
Hepatitis C is an infectious disease of the liver caused by the hepatitis C virus (HCV); it causes
inflammation and infection of the liver. Unlike hepatitis A and B, there is no vaccine for
hepatitis C. Chronic hepatitis C symptoms develop over a period of months and may not be apparent
at first. The World Health Organization (WHO) estimates that 71 million people have chronic
hepatitis C. The assessment of liver fibrosis staging in chronic hepatitis C (CHC) is mandatory for
the management of patients infected with HCV. It is essential for monitoring the prognosis of the
disease, establishing the optimal timing for therapy, choosing management strategies, and
predicting the response to treatment. HCV is a major global cause of chronic hepatitis, liver
cirrhosis, and hepatocellular carcinoma.

Problem Statement
Liver biopsy has been considered the standard for staging liver fibrosis, but it has many
drawbacks:

• High cost, especially when repeated periodically to monitor the progress of the disease.
• Painful for patients.
• Carries a risk of infection and bleeding.
• Susceptible to sampling error.

Non-invasive methods are therefore needed as an alternative for staging chronic liver diseases, to
overcome the drawbacks of biopsy.

Fig 1: Lungs Affected with Hepatitis C

Fig 2: Fibrosis Liver

Machine Learning Approaches
Machine-learning techniques are used in medicine to support non-invasive prediction and detection
of diseases. Patients with chronic hepatitis C were divided into two sets: one categorized as mild
to moderate fibrosis (F0-F2), and the other categorized as advanced fibrosis (F3-F4). The HCV-Egypt
data set is used. It has 1384 patient observations and 29 variables: Age, Gender, BMI, Fever,
Nausea/Vomting, Headache, Diarrhea, Fatigue & generalized.bone.ache, Jaundice, Epigastric.pain,
WBC, RBC, HGB, Plat, AST.1, ALT.1, ALT4, ALT.12, ALT.24, ALT.36, ALT.48, ALT.after.24.w, RNA.Base,
RNA.4, RNA.12, RNA.EOT, RNA.EF, Baseline.histological.Grading, and
Baselinehistological.staging (fibrosis).
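The two-group split described above can be sketched in a few lines of Python. This is only an illustration: the function name and the group labels "mild_to_moderate" and "advanced" are hypothetical, and the sketch assumes the stage is recorded as an integer METAVIR score from 0 to 4.

```python
def fibrosis_group(stage: int) -> str:
    """Map a METAVIR stage (F0..F4) to one of the two groups.

    The label strings are hypothetical names chosen for this sketch.
    """
    if stage not in range(5):
        raise ValueError(f"METAVIR stage must be 0..4, got {stage!r}")
    # F0-F2 -> mild to moderate fibrosis, F3-F4 -> advanced fibrosis
    return "mild_to_moderate" if stage <= 2 else "advanced"


stages = [0, 1, 2, 3, 4]
groups = [fibrosis_group(s) for s in stages]
```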

Feature Selection:
Pearson's correlation coefficient is used for feature selection. The coefficient between
Baselinehistological.staging (fibrosis) and each variable has been assessed.
Table 1: Pearson’s correlation between each variable and Baselinehistological.staging (fibrosis)

Variable      Pearson’s Correlation
age           -0.0187505532
gender         0.0129803382
fever         -0.0300457134
nausea         0.0539673391
headache      -0.0009988639
boneache       0.0135885246
jaundice       0.0192461221
epigastric    -0.0511800045
wbc            0.0173873318
rbc            0.0085519190
hgb            0.0024144025
platelet      -0.0158091915
AST1          -0.0240393805
ALT1           0.0381001785
alt4          -0.0136860091
alt12          0.0023133566
alt24         -0.0040491522
alt36         -0.0050901877
alt48         -0.0130380893
rnaafter       0.0340205726
rna_base       0.0311426697
rna4          -0.0346061313
rna12          0.0354142240
rna_eot       -0.0157042581
rna_ef         0.0298398993
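As an illustration of how such coefficients can be computed, the following is a minimal Python sketch of Pearson's correlation and a ranking of variables by absolute correlation with the target. It is not the report's own code. The helper names `pearson` and `rank_features` and the toy columns `a` and `b` are hypothetical and are not taken from the data set.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def rank_features(columns, target):
    """Return (name, r) pairs sorted by |r|, strongest correlation first."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Toy data, invented for this sketch only.
toy_columns = {"a": [1, 2, 3, 4], "b": [2, 1, 4, 3]}
toy_target = [1, 2, 3, 4]
ranking = rank_features(toy_columns, toy_target)
```

Ranking by absolute value treats strong negative and strong positive correlations alike, which is the usual convention when the coefficient is used as a filter for feature selection.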
Decision Trees:

Decision tree learning is one of the predictive modeling approaches used in statistics, data
mining and machine learning. It uses a decision tree (as a predictive model) to go from observations
about an item (represented in the branches) to conclusions about the item's target value (represented
in the leaves). Tree models where the target variable can take a discrete set of values are
called classification trees; in these tree structures, leaves represent class labels and branches
represent conjunctions of features that lead to those class labels. A decision tree graphically
illustrates all the possible alternatives, probabilities and outcomes, and identifies the benefits
of using decision analysis.

Fig 3: Decision Tree

Applying the decision tree to the data set produced the output shown above. A tree is built by
splitting the source set, which constitutes the root node of the tree, into subsets, which
constitute the successor children. The splitting is based on a set of splitting rules over the
classification features.
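The splitting step can be sketched as follows. This Python illustration searches every feature and threshold for the binary split with the lowest weighted Gini impurity. Gini impurity is an assumption made for this sketch; the report does not say which splitting criterion was used. The names `gini` and `best_split` and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best split.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns None when no feature has two distinct values.
    """
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted({row[j] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, threshold, score)
    return best


# Toy data, invented for this sketch: feature 0 separates the classes.
toy_rows = [[1, 10], [2, 20], [3, 10], [4, 20]]
toy_labels = ["mild", "mild", "advanced", "advanced"]
split = best_split(toy_rows, toy_labels)
```

A full tree would apply `best_split` recursively to the left and right subsets until a stopping rule is met, such as a pure node or a minimum node size.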
Multilinear Regression

Multiple linear regression attempts to model the relationship between two or more explanatory
variables and a response variable by fitting a linear equation to observed data. Every value of the
independent variable x is associated with a value of the dependent variable y. The population
regression line for p explanatory variables $x_1, x_2, \dots, x_p$ is defined to be
$\mu_y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$. Regression analysis is a form
of predictive modelling technique that investigates the relationship between dependent and
independent variables.
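To make the fitting step concrete, the sketch below estimates the coefficients by ordinary least squares using the normal equations, in plain Python. It is an illustration under stated assumptions, not the report's own code. The helpers `solve`, `fit_linear` and `predict` are hypothetical names, and the toy data are invented so that y = 1 + 2*x1 + 3*x2 holds exactly.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; features may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear(rows, ys):
    """Return [beta_0, beta_1, ..., beta_p] minimising squared error."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta, row):
    """Evaluate beta_0 + beta_1*x_1 + ... + beta_p*x_p for one observation."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


# Toy data, invented for this sketch.
toy_rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
toy_ys = [1, 3, 4, 6, 8]
beta = fit_linear(toy_rows, toy_ys)
```

Solving the normal equations directly is adequate for a small, well-conditioned problem; statistical packages typically use a QR decomposition instead for numerical stability.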

Fig 4: MultiLinear Regression

Code Implementation
Code for loading the data set and producing scatter plots to give a visual view of the
correlation.

Code for finding the Pearson’s correlation coefficient.

Code for performing Multilinear Regression

Code for finding the confusion matrix
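The original listing is not reproduced here. As a minimal Python sketch of the idea, the following builds a confusion matrix from actual and predicted labels and derives the accuracy from its diagonal. The function names, the label strings and the toy predictions are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes, in `labels` order."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Fraction of observations on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Toy labels, invented for this sketch.
toy_actual = ["mild", "mild", "advanced", "advanced", "mild"]
toy_predicted = ["mild", "advanced", "advanced", "advanced", "mild"]
matrix = confusion_matrix(toy_actual, toy_predicted, ["mild", "advanced"])
acc = accuracy(matrix)
```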

Code for performing Decision trees:

Code for finding the confusion matrix

Code for performing Bagging:
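The original listing is not reproduced here. The sketch below illustrates the bagging idea in Python: each model is trained on a bootstrap sample drawn with replacement, and predictions are combined by majority vote. To keep it short, the base learner is a one-split decision stump rather than a full decision tree, which is a simplification assumed for this sketch. All function names, parameters and toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Fit a one-split tree: choose the split with the fewest training errors."""
    majority = Counter(labels).most_common(1)[0][0]
    best_errors, best_rule = len(labels) + 1, None
    for j in range(len(rows[0])):
        values = sorted({row[j] for row in rows})
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_label for y in left)
            errors += sum(y != right_label for y in right)
            if errors < best_errors:
                best_errors, best_rule = errors, (j, t, left_label, right_label)
    if best_rule is None:  # no feature varies: always predict the majority class
        return lambda row: majority
    j, t, left_label, right_label = best_rule
    return lambda row: left_label if row[j] <= t else right_label


def fit_bagging(rows, labels, n_models=25, seed=0):
    """Train n_models stumps, each on a bootstrap sample of the training set."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return models


def predict_bagging(models, row):
    """Combine the individual predictions by majority vote."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]


# Toy data, invented for this sketch.
toy_rows = [[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]]
toy_labels = ["mild"] * 3 + ["advanced"] * 3
models = fit_bagging(toy_rows, toy_labels)
```

Averaging many models trained on resampled data reduces the variance of an unstable learner such as a decision tree, which is the usual motivation for bagging.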

Code for performing KNN on the data set:
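The original listing is not reproduced here. As a minimal Python sketch of k-nearest-neighbour classification, the function below labels a query point by majority vote among its k closest training points under Euclidean distance. The distance metric, the default k = 3, the function name and the toy data are assumptions made for this sketch.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=3):
    """Predict the label of `query` from its k nearest training points."""
    if not 1 <= k <= len(train_rows):
        raise ValueError("k must be between 1 and the number of training rows")
    # Sort training points by Euclidean distance to the query.
    neighbours = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Toy data, invented for this sketch.
toy_rows = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
toy_labels = ["mild", "mild", "mild", "advanced", "advanced", "advanced"]
near_mild = knn_predict(toy_rows, toy_labels, [2, 2])
near_advanced = knn_predict(toy_rows, toy_labels, [7, 7])
```

Because the method depends on distances, variables measured on very different scales are normally standardized before KNN is applied.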

Results

Decision Trees:
The tree package must first be loaded from the CRAN repository.
Based on the feature selection, the variables with the highest correlation are used to build the
decision tree.

Fig 5 : Decision Tree

The decision tree achieves an accuracy of 64.7%, approximately 65%.


After applying bagging, we get an accuracy of 72%.

Regression Tree

Fig 6 : Multilinear Regression

Hence the accuracy is 65%

References
[1] L. Castera, “Noninvasive methods to assess liver disease in patients with hepatitis B or C,”
Gastroenterology, vol. 142, pp. 1293–1302, 2012.

[2] L. Zhang, Q.-Y. Li, Y.-Y. Duan, G.-Z. Yan, Y.-L. Yang, and R.-J. Yang, “Artificial neural network
aided non-invasive grading evaluation of hepatic fibrosis by duplex ultrasonography,” BMC Med.
Informat. Decision Making, vol. 12, 2012, Art. no. 55.

[3] M. ElHefnawi, et al., “Accurate prediction of response to Interferon-based therapy in Egyptian
patients with Chronic Hepatitis C using machine-learning approaches,” in Proc. IEEE/ACM Int. Conf.
Advances Social Netw. Anal. Mining, 2012, pp. 771–778.

[4] L. Gravitz, “A smouldering public-health crisis,” Nature, vol. 474, no. 7350, pp. S2–S4, 2011.
