Statistical Analysis On Risk Factors of Prevalence of Malaria in Z/ Dugda Woreda, Oromia, East Arsi


STATISTICAL ANALYSIS ON RISK FACTORS OF PREVALENCE OF

MALARIA IN Z/ DUGDA WOREDA, OROMIA, EAST ARSI.

BY: -

UMER DERSE…………………………………… 541/05

JEMAL TUKE…………………………………….270/05

ADVISOR: DESALEGNE MESA (MSC).

SUBMITTED TO:

COLLEGE OF NATURAL AND COMPUTATIONAL SCIENCE, DEPARTMENT
OF STATISTICS, WOLAITA SODO UNIVERSITY, IN PARTIAL FULFILMENT
OF THE REQUIREMENTS FOR THE DEGREE OF BACHELOR OF SCIENCE
IN STATISTICS

JUNE, 2015

WALAITA SODO, ETHIOPIA

STATISTICAL ANALYSIS ON RISK FACTORS OF PREVALENCE OF


MALARIA IN Z/ DUGDA WOREDA, OROMIA, EAST ARSI.
Name of Students: ID:

1. UMER DERSE 541/05

2. JEMAL TUKE 270/05

3. ELLENE DEREJE 181/05

Approved by:

Name of Advisor: Mr. DESELGN.M ___Signature____ Date______________

Examiner’s Name: 1._______________ Signature _________________ Date_____________

2._______________ Signature _________________ Date_____________

Name of Chair Person: Bereket Tessema Signature ______________ Date_____________

Acknowledgements

We are greatly pleased to express our deep gratitude, next to ALLAH, to our effective, creative
and invaluable advisor Mr. DESELGN.M, who has been a continuous source of encouragement,
confidence, endurance and wisdom, and who was always available at the right time for
consultation. We are very grateful to him for his valuable outlook, ideas, suggestions and
constructive comments, which have considerably improved this paper. We would also like to
thank all Wolaita Sodo University Statistics Department staff for their generous knowledge
sharing and cooperation. We extend our thanks to the health officers of Ogolcho Clinic,
Z/dugda Woreda, East Arsi Zone, for their willingness to provide the data on which this study
is built. Last but not least, we thank our families, especially NASHIYO GANSA,
GANI H/GEMECHU, DEREJE and DINBRINASH, for their continuous financial and material
support from the very beginning until now.

Abstract
Background: Malaria is a deadly disease caused by Plasmodium parasites. The parasites
are spread to people through the bites of infected Anopheles mosquitoes, called "malaria
vectors". It remains a major challenge to public health and socio-economic development
worldwide, and in sub-Saharan Africa in particular. However, there is still a paucity of
information on the occurrence of malaria in the study area. The objective of this study was to
investigate the prevalence of malaria and related risk factors in Z/dugda Woreda.

Methods: The chi-square test of independence was used to assess the association between
malaria status and the categorical independent variables. In addition, binary logistic
regression was used to examine the effect of the predictor variables on the prevalence of
malaria in the area.
Results: The chi-square results showed that malaria status was significantly associated with
stagnant water, net usage, age and residence at the 5% significance level.
In the binary logistic regression analysis, four of the seven predictor variables (age of
patient, residence of patient, net usage and stagnant water) had a significant effect on the
outcome variable, the malaria status of the patient.
Conclusion: The results of this study revealed that age, residence, stagnant water and net
usage contributed to the malaria status of a patient.
Keywords: Malaria prevalence, Retrospective study, Logistic regression, Chi-square test.

ACRONYMS

WHO: World Health Organization

FMOH & ENMIS: Federal Ministry of Health and Ethiopia National Malaria Indicator Survey

MOH: Ministry of Health

IRS: Indoor Residual Spraying

ITNs: Insecticide-Treated Mosquito Nets

SNNP: Southern Nations, Nationalities and Peoples
Table of contents

Acknowledgements
Abstract
Acronyms
Table of contents

1. INTRODUCTION
1.1 Background of the Study
1.2 Statement of the problem
1.3 Objective of the Study
1.3.1 General Objective
1.3.2 Specific objectives
1.4 Significance of the study

2. REVIEW OF RELATED LITERATURE

3. METHODOLOGY
3.1 Study Area and Description of the Data
3.1.1 Data description
3.1.2 Description of the study Area
3.2 Sampling Design
3.3 Variables considered in the study
3.3.1 Dependent Variable
3.3.2 Independent Variables
3.4 Statistical Method
3.4.1 Logistic Regression Model
3.4.2 Univariate Logistic Regression Model
3.4.3 Multiple Logistic Regression Model
3.5 Assumptions of Logistic Regression
3.6 Variable Selection
3.7 Goodness of Fit of the Model

4. RESULTS AND DISCUSSION
4.1 Descriptive Statistics and Bivariate Analysis
4.1.1 Analysis of data using Logistic Regression
4.1.2 Variable selection
4.1.3 Assessing the logistic regression model
4.1.4 Odds Ratio
4.2 Discussion

5. CONCLUSIONS AND RECOMMENDATIONS
5.1 Conclusion
5.2 Recommendation
5.3 Limitation of the study

6. REFERENCES

7. APPENDIX
1. INTRODUCTION

1.1. Background of the Study

Malaria is a deadly disease caused by Plasmodium parasites. The parasites are spread to
people through the bites of infected Anopheles mosquitoes, called "malaria vectors". It remains
a major challenge to public health and socio-economic development worldwide, and in
sub-Saharan Africa in particular. It causes an estimated 300 to 500 million cases and 1.5 to 2.7
million deaths worldwide each year, of which 80% of the cases and 90% of the deaths occur in
sub-Saharan Africa (World Health Organization (WHO), 2009).

In Ethiopia, it is also a leading public health problem: 75% of the country's land surface
is malarious and 68% of the population is at risk of malaria infection (Federal Ministry of
Health and Ethiopia National Malaria Indicator Survey (FMOH & ENMIS), 2008). Malaria is thus a
public health concern and all age groups of the population are vulnerable, although children
under five years of age and pregnant women are generally considered to be at higher risk
(Ministry of Health (MOH), 2008). Over the last five years (2000–2005 E.C.), the proportion of
malaria among outpatient visits, admissions and inpatient deaths has been increasing, with the
highest values recorded in 2003 and 2004. In 2008, malaria was still the leading cause of health
problems, accounting for 48% of outpatient consultations, 25% of admissions and 29.6% of
inpatient deaths (MOH, 2008). According to MOH (2008) reports, approximately 70,000 people die
of malaria each year in Ethiopia.

Four Plasmodium species cause human malaria: Plasmodium falciparum, Plasmodium vivax,
Plasmodium malariae and Plasmodium ovale. Among the four, Plasmodium falciparum is by far the
most aggressive species; it is distributed globally and is especially common in Africa (WHO,
2009). In Ethiopia, Plasmodium falciparum is the dominant species, followed by P. vivax; these
two species account for 60% and 40% of all malaria cases, respectively. P. malariae accounts for
less than 1% of cases and is restricted in distribution, while P. ovale is rarely reported (MOH,
2008). However, the relative frequency of the species varies from place to place and from season
to season. For instance, in Oromia region Plasmodium falciparum accounts for 49% of confirmed
malaria cases, P. vivax for 22% and mixed infections for 11%. However, P. falciparum is
responsible for most hospital and health center admissions, morbidity and mortality (Oromia
region report, 2000).

Prevention and control activities of malaria in Ethiopia are implemented as guided by the
National Strategic Plan, with the ultimate aim of reducing the burden of malaria to a level
where it is no longer a public health problem. Four major intervention strategies are being
applied in Ethiopia to combat malaria: early diagnosis and prompt treatment, selective vector
control that involves the use of indoor residual spraying (IRS), insecticide-treated mosquito
nets (ITNs) and environmental management (Ministry of Health (MOH), 2008).

A major challenge for malaria epidemiologists is to evaluate the strengths and weaknesses of
methods for estimating malaria prevalence and time trends, especially as malaria control
programs are intensified worldwide (WHO, 2009).

Despite decades of sustained control efforts, malaria remains a major cause of morbidity,
mortality and socioeconomic problems in Ethiopia, because malaria control is challenged by
many factors (MOH, 2008).

The complexity of the disease control process, the expense of the control program, and the
resistance of the parasite to antimalarial drugs and of vectors to insecticides are some of the
challenges. Moreover, different reports currently indicate that malaria is decreasing in
Ethiopia, but the exact factors behind its reduction are not well defined (FMOH & ENMIS, 2008).

1.2. Statement of the problem

Malaria is a major global health problem, and it is common in developing countries,
particularly in sub-Saharan African countries such as Ethiopia. Many factors contribute to the
prevalence of malaria, such as seasonal variability, altitude, the parasite becoming more
complicated to treat and limited community awareness. However, these factors, and the health
care measures needed to manage or prevent their effects, are not well known by the community.
Similarly, there is a gap in health services, and people in the study area lack health
insurance and skilled medical care. In addition, no quantitative research using modern and
appropriate statistical models has been applied to these factors in the study area. Considering
these and other problems, this study is motivated to help identify the important factors that
contribute to the incidence of malaria using an appropriate statistical method, in particular
the binary logistic regression model.

The core research questions are:

• What factors enhance the prevalence of malaria in the study area?

• What factors are highly related to the prevalence of malaria in the study area?

• Among which residents is the prevalence of malaria higher: rural or urban?

• In which sex and age groups is the disease more prevalent?

1.3 Objective of the Study

1.3.1 General Objective

The general objective of this study is to investigate the prevalence of malaria and related
risk factors in Z/dugda Woreda using a binary logistic regression model.

1.3.2 Specific objectives

• To identify the major factors that significantly contribute to the prevalence of malaria.

• To examine the characteristics of patients with malaria classified by sex.

• To determine the relationship between malaria prevalence and the seasons of the year.

1.4 Significance of the study

The findings of the study could help people to better understand the risk factors for the
incidence of malaria, to review malaria-related deaths, to understand the methods used to reduce
the prevalence of malaria, to identify gaps in health services and make recommendations, to
prevent future deaths and to take appropriate action. It will also serve as a guide for those
who are interested in further studies in the area, in identifying which factors have a major
effect on deaths, and in solving real-world problems through the application of statistics.
2. REVIEW OF RELATED LITERATURE

Malaria is a major life-threatening vector-borne disease transmitted through the bite of female
Anopheles mosquitoes. It is not just a disease but an economic and social burden on many
nations globally (Sachs and Malaney, 2006).
The disease got its name from "bad air" (mal aria), as it was thought to come from fetid
marshes. Later, in 1880, it was discovered that the real cause of malaria is Plasmodium, a
single-celled parasite that is transmitted from one person to another by the bite of a female
Anopheles mosquito. Male Anopheles mosquitoes are not involved in disease transmission, as
they do not require blood to nurture eggs as their female counterparts do (Ribeir, 2006).

The disease is most prevalent in the tropical and subtropical regions of the world and causes
300 to 500 million cases and 1.5 to 2.7 million deaths worldwide annually. According to the
World Malaria Report 2009 (World Health Organization, Geneva), 90% of deaths caused by
malaria take place in Africa, primarily among young children, pregnant women and their
unborn children. A child in Africa dies every 30 seconds because of malaria, and those who
survive a severe episode of malaria might suffer from learning impairments or brain damage
(WHO/UNICEF, 2008).
Mboumb et al. (201) studied malaria in different areas of Gabon. Cross-sectional surveys were
carried out in health care facilities at four locations between 2005 and 2011: two urban areas
(Libreville and Port-Gentil), one semi-urban area (Melen) and one rural area (Oyem). Body
temperature, history of fever, age, sex and location were collected as independent variables
(predictors). The binary logistic regression model showed an increased risk of malaria
infection in different areas of Gabon, with children over five years old tending to become the
most at-risk population, suggesting a changing epidemiology. Moreover, the heterogeneity of the
malaria burden in the country highlights the importance of maintaining various malaria control
strategies and redefining their implementation.
Alexander et al. (2011) studied the prevalence of malaria among patients attending public health
facilities in Maputo city, Mozambique. The predictors included in the study were age group
(>=5 vs. <5), residence in Maputo city, house close to water, bed net in the household, bed net
hung the previous night and documented fever at enrollment. Using a logistic regression model,
the study reported that among the 706 enrolled patients, 111 (15.7%) cases were identified:
105 of Plasmodium falciparum only, two of Plasmodium ovale only, and four of both
P. falciparum and P. ovale. No cases of Plasmodium vivax or Plasmodium malariae were identified.
The rapid diagnostic tests (RDTs) were positive in 99 of the 111 patients, yielding a
sensitivity of 89.2% (95% confidence interval [CI]: 82.2–96.2%). The specificity was 97.0%
(95% CI: 95.3–98.7%), because RDTs were negative in 577 of the 595 non-cases.
Alemu et al. (2012) carried out a trend analysis of malaria prevalence in Kola Diba,
North Gondar, northwest Ethiopia. A retrospective study was conducted to determine the
prevalence of malaria from peripheral blood smear examinations at the Kola Diba Health
Center. The case notes of all malaria cases reported between 2002 and 2011 were carefully
reviewed and analyzed. Within this decade (2002–2011), a total of 59,208 blood films were
requested for malaria diagnosis in Kola Diba Health Center and 23,473 (39.6%) microscopically
confirmed malaria cases were reported in the town, with a fluctuating trend. Regarding the
identified Plasmodium species, Plasmodium falciparum and Plasmodium vivax accounted for 75%
and 25% of malaria morbidity, respectively. The researchers concluded that, after the
introduction of the current malaria control strategies, morbidity and mortality from malaria
are decreasing, but malaria is still a major health problem and the deadly species
P. falciparum is predominant. Therefore, control activities should be continued and
strengthened in the study area, considering both P. falciparum and P. vivax.
Woyesse et al. (2012) studied the prevalence of malaria in the Butajira area, south-central
Ethiopia. Using a multi-stage sampling technique, 750 households were selected. All consenting
family members were examined for malaria parasites in thick and thin blood smears. The
predictor (independent) variables used in this study were age group, sex, survey period and
Hobe. In total, 19,207 persons were examined in the six surveys. Of those tested, 178 slides
were positive for malaria, of which 154 (86.5%) were positive for Plasmodium vivax and
22 (12.4%) for Plasmodium falciparum; the remaining two (1.1%) showed mixed infections of
Plasmodium falciparum and Plasmodium vivax. The researchers concluded that the study
documented a low prevalence of malaria that varied with season and altitudinal zone in a
highland-fringe area of Ethiopia. Most of the malaria infections were attributable to
Plasmodium vivax.
A household cluster survey was conducted in the Amhara, Oromia and Southern Nations,
Nationalities and Peoples' (SNNP) regions of Ethiopia from December 2006 to January 2007,
at the end of the malaria season. A total of 224 clusters averaging 25 households each
(5,708 households in total) were selected, and 28,994 individuals participated in at least one
part of the survey. The variables (predictors) considered in the study were age distribution,
sex, pregnancy status, main source of drinking water, time to collect water, sanitation
facility, main material of wall, main material of roof, main material of floor, services and
goods (electricity, radio and TV) and altitude. The questionnaire was developed as a
modification of the Malaria Indicator Survey Household Questionnaire (WHO, 2008). Using a
multinomial logistic regression model, the study found that Oromia had a significantly lower
prevalence than the other two regions, which were not significantly different from each other.
Plasmodium falciparum prevalence was higher than that of P. vivax in all regions (Carter Center
report, August 2007).
3. METHODOLOGY

This section describes the methodology, starting with the data, the study area and the
variables, and ends with the statistical methods that are appropriate for the study.

3.1 Study Area and Description of the Data

3.1.1 Data description

The data used in the study were obtained from the clinical registry, patients' cards and log
book of Ogolcho Clinic, Z/dugda woreda.

3.1.2 Description of the study Area

Z/dugda woreda is one of the eastern parts of Arsi zone and a reformed town. The woreda is
located in the eastern part of Ethiopia, in Oromia region under East Arsi zone. It was
established before 1952 E.C. and is located 47 km from Asella town, the capital of East Arsi
zone, and 222 km from the federal city, Addis Abeba. Our study of malaria prevalence was
carried out in Z/dugda woreda, which has an unfavorable (Kola) climate (Annual city report,
2009).

3.2 Sampling Design

Sampling methods are scientific procedures for selecting the sampling units that provide the
required estimators, together with the margins of uncertainty that arise from examining only a
part of the population rather than the whole. The target population for this study was the
patients recorded with laboratory-confirmed malaria-positive slides at Ogolcho Health Center.
However, because the study is retrospective, all of the recorded data were used rather than a
sample.
3.3 Variables considered in the study

3.3.1 Dependent Variable

The response (dependent) variable in this study is binary and indicates the presence or absence
of malaria. Malaria status is coded as 1 if the patient is malaria positive and 0 otherwise.

3.3.2 Independent Variables

The predictor variables that are expected to influence the prevalence of malaria, together with
their categories and codes, are presented in Table 3.1.

Table 3.1: List of Variables with their Codes and Descriptions

Explanatory variable                                  Category
Sex                                                   (0) Male
                                                      (1) Female
Age in years                                          Continuous
Season in months                                      (0) Dry
(dry season: October-March;                           (1) Wet
wet season: April-September)
Type of malaria species diagnosed                     (0) Plasmodium falciparum
                                                      (1) Plasmodium vivax
Residence of patient                                  (0) Urban
                                                      (1) Rural
Stagnant water (mosquito comfortable zone)            (0) Yes
                                                      (1) No
Does the community use nets?                          (0) No
                                                      (1) Yes
3.4 Statistical Method
Statistical analysis can be carried out using the classical approach or the Bayesian approach
(Prop et al., 1996). In this study the response variable, malaria status (1 if the patient is
malaria positive and 0 otherwise), is dichotomous, so the effect of the explanatory variables
on the dependent variable can be investigated using a logistic regression model, which can be
formulated under either the classical or the Bayesian setup. We used binary logistic
regression analysis to determine the factors associated with the prevalence of malaria.

3.4.1 Logistic Regression Model

Logistic regression is used for the following reasons. When the dependent variable is binary
(0, 1), the ordinary least squares regression model produces inefficient parameter estimates
and a heteroscedastic error structure.

As a result, hypothesis tests and confidence intervals become inaccurate. Similarly, the fitted
model may generate predicted values outside the interval from 0 to 1, which violates a basic
property of probability. It also creates the problem of non-normal errors, which in turn leads
to a lower coefficient of determination. To alleviate these problems and produce a relevant
outcome, the most widely used qualitative response model is the logit. Therefore a logit model
is fitted to explain the relationship between each factor and the prevalence of malaria.
Accordingly, in our model the dependent variable takes the value 1 with probability π(x) if
malaria is present and the value 0 with probability 1 − π(x) otherwise.
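
The out-of-range problem described above can be illustrated with a minimal Python sketch. The toy ages, outcomes and logit coefficients below are hypothetical values chosen for illustration only; they are not taken from the Ogolcho Clinic data, and the helper names `fit_least_squares` and `logistic` are our own.

```python
import math


def fit_least_squares(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def logistic(eta):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical toy data (not from the study): age in years and malaria status.
ages = [5, 10, 15, 20, 40, 50, 60, 70]
status = [0, 0, 0, 0, 1, 1, 1, 1]

intercept, slope = fit_least_squares(ages, status)
linear_at_1 = intercept + slope * 1      # falls below 0 for these data
linear_at_90 = intercept + slope * 90    # rises above 1 for these data

# Hypothetical logit coefficients, chosen only to show the bounded range.
logit_at_1 = logistic(-3.0 + 0.1 * 1)
logit_at_90 = logistic(-3.0 + 0.1 * 90)

print(f"linear model: {linear_at_1:.3f} and {linear_at_90:.3f}")
print(f"logit model:  {logit_at_1:.3f} and {logit_at_90:.3f}")
```

For these toy data the least squares line gives a "probability" below 0 at age 1 and above 1 at age 90, while the logistic function stays strictly between 0 and 1.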

3.4.2 Univariate Logistic Regression Model

It is a single-predictor logit model. To illustrate it, let us first consider the odds:

\[ \text{Odds} = \frac{\pi(x)}{1-\pi(x)}, \]

where π(x) is the probability that malaria is present.

Since π(x) can vary on the scale [0, 1], the odds can vary on the scale (0, ∞), so the log odds
can vary on the scale (−∞, ∞).

Let θ(x) be the odds; then

\[ \theta(x) = \frac{\pi(x)}{1-\pi(x)} = \exp(\beta_0 + \beta_1 x). \]

Solving for π(x), the logistic function becomes:

\[ \pi(x) = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)} \tag{1} \]

In logistic regression for a binary variable we model the logarithm of the odds, which is
called the logit of π(x). Thus,

\[ \operatorname{logit}\,\pi(x) = \log(\text{odds}) = \log\frac{\pi(x)}{1-\pi(x)} = \beta_0 + \beta_1 x \tag{2} \]
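
As a small numerical check of the relationship between the odds, the logit and the logistic function, the following Python sketch evaluates them for hypothetical coefficients (β0 = −1.0 and β1 = 0.5 are illustrative assumptions, not estimates from this study). The function names are our own.

```python
import math


def odds(p):
    """Odds of an event with probability p (requires 0 <= p < 1)."""
    return p / (1.0 - p)


def logit(p):
    """Log odds of an event with probability p (requires 0 < p < 1)."""
    return math.log(odds(p))


def logistic_probability(beta0, beta1, x):
    """Single-predictor logistic function: exp(eta) / (1 + exp(eta))."""
    eta = beta0 + beta1 * x
    return math.exp(eta) / (1.0 + math.exp(eta))


# Hypothetical coefficients and covariate value, for illustration only.
beta0, beta1, x = -1.0, 0.5, 3.0
p = logistic_probability(beta0, beta1, x)

# Taking the logit of the probability recovers the linear predictor.
recovered = logit(p)
print(f"pi(x) = {p:.4f}, logit = {recovered:.4f}, beta0 + beta1*x = {beta0 + beta1 * x:.4f}")
```

The logit of the computed probability equals β0 + β1x, which is why the coefficients of a logit model are interpreted on the log-odds scale.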

3.4.3 Multiple Logistic Regression Model

As in univariate logistic regression, let π(x) represent the probability of the event, which
now depends on p covariates (independent variables):

\[ \pi(x) = \frac{\exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p)}{1 + \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p)} = \frac{\exp(\beta_0 + \beta' x)}{1 + \exp(\beta_0 + \beta' x)} \tag{3} \]

where

\[ \beta' = [\beta_1, \beta_2, \dots, \beta_p] \quad \text{and} \quad x = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{bmatrix}. \]

So, the form is identical to the univariate logistic regression, but now we use more than one
covariate.

The corresponding logit function can be calculated (letting x represent the whole set of
covariates) as:

\[ \operatorname{logit}(\pi(x)) = \log\frac{\pi(x)}{1-\pi(x)} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p \tag{4} \]

The observed outcome can be written as y = π(x) + ε. The error ε is assumed to have mean zero
and variance π(x)[1 − π(x)], so that y follows a binomial distribution with probability π(x).
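
The multiple-covariate form can be sketched in Python as follows. The coefficient values and the covariate vector are hypothetical placeholders chosen for illustration (they are not the fitted values of this study), and the function names are our own.

```python
import math


def linear_predictor(beta0, betas, xs):
    """Computes beta0 + beta' x for one observation."""
    if len(betas) != len(xs):
        raise ValueError("betas and xs must have the same length")
    return beta0 + sum(b * x for b, x in zip(betas, xs))


def probability(beta0, betas, xs):
    """Multiple logistic regression probability pi(x)."""
    eta = linear_predictor(beta0, betas, xs)
    return 1.0 / (1.0 + math.exp(-eta))


def bernoulli_variance(p):
    """Variance pi(x) * (1 - pi(x)) of a binary outcome with probability p."""
    return p * (1.0 - p)


# Hypothetical coefficients for (age, rural residence, no net use).
beta0 = -2.0
betas = [0.02, 0.8, 0.7]
patient = [30, 1, 0]  # a 30-year-old rural resident who uses a net

eta = linear_predictor(beta0, betas, patient)
p = probability(beta0, betas, patient)
print(f"eta = {eta:.3f}, pi(x) = {p:.4f}, variance = {bernoulli_variance(p):.4f}")
```

Note that 1 / (1 + exp(−η)) is algebraically the same as exp(η) / (1 + exp(η)); the variance term is largest when π(x) = 0.5.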
3.5 Assumptions of Logistic Regression

• Logistic regression assumes meaningful coding of the variables. Logistic coefficients are
difficult to interpret if the variables are not coded meaningfully. The convention for binary
logistic regression is to code the dependent class of greatest interest as 1 and the other
class as 0.
• Logistic regression does not assume a linear relationship between the dependent and
independent variables. The dependent variable must be categorical.
• The independent variables need not be interval, normally distributed, linearly related or of
equal variance within each group.
• The groups must be mutually exclusive and exhaustive; a case can only be in one group
and every case must be a member of one of the groups.
• Linearity in the logit: the independent variables should have a linear relationship with the
logit of the dependent variable.
• Absence of multicollinearity among the independent variables.

3.6 Variable Selection


Choosing appropriate variables is a major issue in statistical investigations. Omitting
relevant variables that are correlated with the regressors causes least squares estimates to be
biased and inconsistent. Including irrelevant variables reduces the precision of least squares.
So, from a purely technical point of view, it is important to estimate a model that has all of
the relevant variables and none that are irrelevant. It is also important to use a suitable
functional form. There are a great many algorithms and methods for selecting the variables
that should be included in the model; we will use one or more of them. There are two
approaches: the frequentist approach and the Bayesian approach. For this study we focus on the
frequentist approach, which is summarized below. For a detailed description we refer readers to
Gujarati (2004) and Hosmer and Lemeshow (1989).
• Backward, forward or backward/forward selection: applies to logistic regression as it does
to linear regression, with some caveats. For instance, it has no theoretical basis, p-values
do not retain their usual meaning, and it tends to pick models that are much too large.
• All-subsets selection: a generic term, where a criterion (such as AIC, BIC, R², adjusted R²,
etc.) is used to compare subsets of variables.
• AIC criterion: for least squares models it is calculated as AIC = n ln(SSE) − n ln(n) + 2p,
where n is the number of observations, SSE is the error sum of squares and p is the number of
parameters. For a logistic regression model fitted by maximum likelihood the general form
AIC = −2 ln(L) + 2p is used, where L is the maximized likelihood. Note that it tends to be good
for complex models and less good at finding simple models.
• R² criterion: does not apply directly to logistic regression models, as we do not have the
same kind of residuals as in linear models. With the logit function we are really dealing with
a generalized linear model. An R² measure can in fact be defined for logistic regression
models, but it does not work well and is seldom used in practice.
• Adjusted R² criterion: used when the number of independent variables is greater than one,
in the same way as R²; adjusted R² is a better measure than R² when there are several
covariates.
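
The AIC comparison listed above can be sketched in Python as follows. The log-likelihoods and parameter counts of the two candidate models are hypothetical numbers used only to show the mechanics; the function and model names are our own.

```python
import math


def aic_least_squares(n, sse, p):
    """AIC = n ln(SSE) - n ln(n) + 2p, the least squares form quoted above."""
    return n * math.log(sse) - n * math.log(n) + 2 * p


def aic_from_log_likelihood(log_likelihood, p):
    """General form AIC = -2 ln(L) + 2p for maximum likelihood models."""
    return -2.0 * log_likelihood + 2 * p


# Hypothetical candidate logistic models: (maximized log-likelihood, parameters).
candidates = {
    "reduced model": aic_from_log_likelihood(-150.0, 5),
    "full model": aic_from_log_likelihood(-149.5, 7),
}

# The model with the smallest AIC is preferred.
best = min(candidates, key=candidates.get)
for name, value in candidates.items():
    print(f"{name}: AIC = {value:.1f}")
print(f"preferred: {best}")
```

In this illustration the two extra parameters of the larger model improve the log-likelihood by only 0.5, which does not offset the penalty of 2 per parameter, so the smaller model has the lower AIC.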

3.7 Goodness of Fit of the Model

The goodness of fit, or calibration, of a model measures how well the model describes
(explains) the dependent variable. Assessing the goodness of fit involves examining how close
the values predicted by the model are to the observed values (Bewick & Jonathan, 2005).

After fitting the logistic regression model, several techniques are used to examine the
goodness, adequacy and usefulness of the model. First, the importance of each independent
variable is assessed by statistically testing its coefficient. Then the overall goodness of
fit of the model is tested (Agresti, 1996).

Additionally, the ability of the model to discriminate between the groups defined by the
response variable is evaluated. Finally, if possible, the model is validated by checking the
goodness of fit and discrimination on a set of data different from the one used to develop the
model (Bewick & Jonathan, 2005). Pearson's chi-square test, the likelihood ratio test, the
Hosmer and Lemeshow test and the Wald test are the most commonly used measures of goodness
of fit for categorical data (Hosmer & Lemeshow, 1989).
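
As an illustration of testing a single coefficient, the Wald test can be sketched in Python. The sketch assumes the usual large-sample normal approximation (z = estimate / standard error with a two-sided p-value) and uses the Age estimate and standard error reported in Table 4.1.1; the function name `wald_test` is our own.

```python
import math


def wald_test(estimate, std_error):
    """Returns the Wald z statistic and its two-sided normal p-value."""
    z = estimate / std_error
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Age coefficient and standard error as reported in Table 4.1.1.
z_age, p_age = wald_test(0.021549, 0.007444)
print(f"z = {z_age:.3f}, p = {p_age:.5f}")
```

The result agrees with the z value (2.895) and p-value (0.00379) shown for Age in the software output, which suggests that the reported tests are Wald tests of this form.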
4. RESULTS AND DISCUSSION

As already stated, the main objective of this study is to investigate the prevalence of
malaria in Z/dugda Woreda using a binary logistic regression model. The first part of this
section deals with descriptive statistics and bivariate analysis; the second part deals with
the binary logistic regression model.

For this study, the data were obtained from the clinical registry, patients' cards and log
book of Ogolcho Clinic, Z/dugda woreda. A data set of 263 records was obtained from record
reviews of all malaria patients admitted to the clinic. A 5% level of significance was used to
assess the significance of each variable.

4.1. Descriptive Statistics and Bivariate Analysis

The results displayed in the table below show the frequency and percentage of malaria status
for each category, together with the chi-square statistic, p-value and degrees of freedom. The
tests were intended to assess the association between each explanatory variable and malaria
status (negative or positive).
Table 4.1: Test of Association between Malaria Prevalence Status and Explanatory Variables.
(Ogolcho Clinic, Z/Dugda Woreda, from clinical registry, patients' cards and log book, April
2015).

                              Status of malaria
Variable        Category      Negative      Positive      Total         Chi-square        Df
                              Count (%)     Count (%)     Count (%)     Value (p-value)
Sex             Female        157 (72.0)    61 (28.0)     218 (82.9)    3.574 (0.0587)    1
                Male           26 (57.8)    19 (42.2)      45 (17.1)
Season          Wet           149 (72.7)    56 (27.3)     205 (77.9)    4.224 (0.0398)    1
                Dry            34 (58.6)    24 (41.4)      58 (22.1)
MSp             Vivax         157 (72.4)    60 (27.6)     217 (81)      2.314 (0.128)     1
                Falciparum     31 (60.8)    20 (39.2)      51 (19)
Residence       Urban         154 (74.0)    54 (26)       208 (79.1)    9.33 (0.0023)     1
                Rural          29 (52.7)    26 (47.3)      55 (20.9)
Stagnant water  No            145 (74)      51 (26)       196 (74.5)    7.03 (0.008)      1
                Yes            38 (56.7)    29 (43.3)      67 (25.5)
Net usage       Yes           135 (75.4)    44 (24.6)     179 (68.1)    9.02 (0.003)      1
                No             48 (57.1)    36 (42.9)      84 (32)

Table 4.1 shows the frequency distribution and percentage of malaria status, together with the
chi-square statistic, p-value and degrees of freedom, for each category of the categorical
explanatory variables.
The results reveal that, of the 263 patients considered in the analysis, 61 of the 218 females
(28.0%) and 19 of the 45 males (42.2%) were malaria positive, while 157 females (72.0%) and 26
males (57.8%) were malaria negative. Moreover, Table 4.1 shows that season, residence, stagnant
water and net usage have a significant association with malaria status, whereas sex
(p = 0.0587) and malaria species (p = 0.128) do not.
Since the p-values of these four variables are less than α = 0.05, the chi-square tests of
association lead us to reject independence between each of them and malaria status.
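The chi-square statistics in Table 4.1 can be checked directly from the cell counts. The sketch below does this for residence, assuming the reported values are Pearson chi-square statistics without continuity correction; under that assumption it should reproduce the reported 9.33 on 1 degree of freedom up to rounding.

```r
# Counts from Table 4.1: malaria status by residence
residence <- matrix(c(154, 54,
                       29, 26),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(Residence = c("Urban", "Rural"),
                                    Status    = c("Negative", "Positive")))

# Pearson chi-square test of association, no continuity correction
res <- chisq.test(residence, correct = FALSE)
print(res)

# Row percentages, as reported in the table
print(round(100 * prop.table(residence, margin = 1), 1))
```

The same call can be repeated for the other variables by replacing the four counts.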
4.1.1 Analysis of Data Using Logistic Regression
Here the binary logistic regression model is illustrated.
Full Model
Table 4.1.1: Full model of binary logistic regression.

Coefficients:
                 Estimate   Std. Error   z value   Pr(>|z|)
(Intercept)     -2.627921   0.405738     -6.477    9.36e-11 ***
Age              0.021549   0.007444      2.895    0.00379  **
SexMale          0.635914   0.364690      1.744    0.08121  .
ResidenceRural   0.852440   0.342242      2.491    0.01275  *
netusageNo       0.757582   0.301180      2.515    0.01189  *
stagnantYes      0.850786   0.323041      2.634    0.00845  **
MSpFalciprium    0.088372   0.368130      0.240    0.81029
SeasonDry        0.138758   0.353009      0.393    0.69427

As can be seen from the output of the model, age, residence, net usage and stagnant water were
significantly related to malaria status at the 5% level of significance.
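The z values and p-values in Table 4.1.1 are Wald statistics, that is, each estimate divided by its standard error and referred to the standard normal distribution. The sketch below recomputes them from the estimates and standard errors copied from the table; the object names are ours.

```r
# Estimates and standard errors copied from Table 4.1.1
est <- c("(Intercept)" = -2.627921, Age = 0.021549, SexMale = 0.635914,
         ResidenceRural = 0.852440, netusageNo = 0.757582,
         stagnantYes = 0.850786, MSpFalciprium = 0.088372,
         SeasonDry = 0.138758)
se  <- c(0.405738, 0.007444, 0.364690, 0.342242,
         0.301180, 0.323041, 0.368130, 0.353009)

z <- est / se                  # Wald z statistic
p <- 2 * pnorm(-abs(z))        # two-sided p-value

wald <- data.frame(estimate = est, std_error = se,
                   z = round(z, 3), p_value = signif(p, 3),
                   significant = p < 0.05)
print(wald)
```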

4.1.2 Variable Selection

Variable selection can be done either by a computer algorithm or manually, by discarding at each
step the variable with the largest p-value above the specified level of significance. Accordingly,
sex, MSp and season are the candidates for removal. For this study we used the backward
selection algorithm in R version 3.0.3 (2015-03-06), with the R code given below.

> stepwise(glm1, direction='backward', criterion='AIC')

Thus, age, sex, residence, net usage and stagnant water should be included in the model.
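For readers without the package that supplies stepwise() (to our knowledge RcmdrMisc, which is an assumption here), base R's step() performs a comparable AIC-based backward elimination. The sketch below uses simulated data with hypothetical variable names, not the study data.

```r
# Simulated data for illustration only (not the study data)
set.seed(2015)
n <- 263
sim <- data.frame(
  age       = round(runif(n, 1, 70)),
  residence = factor(rbinom(n, 1, 0.2), levels = 0:1,
                     labels = c("Urban", "Rural")),
  noise     = rnorm(n)              # predictor unrelated to the outcome
)
eta <- -2.5 + 0.02 * sim$age + 0.9 * (sim$residence == "Rural")
sim$status <- rbinom(n, 1, plogis(eta))

full <- glm(status ~ age + residence + noise, family = binomial, data = sim)

# Backward elimination: drop one term at a time while AIC decreases
best <- step(full, direction = "backward", trace = 0)

print(formula(best))
print(c(AIC_full = AIC(full), AIC_best = AIC(best)))
```

Because the criterion is AIC rather than a 5% significance test, a variable can be retained even when its individual p-value exceeds 0.05.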

Table 4.1.2: Stepwise variable selection (backward selection)

Coefficients:
                 Estimate   Std. Error   z value   Pr(>|z|)
(Intercept)     -2.632581   0.406148     -6.482    9.06e-11 ***
Age              0.022354   0.007239      3.088    0.00202  **
SexMale          0.643077   0.364453      1.765    0.07765  .
ResidenceRural   0.881543   0.335303      2.629    0.00856  **
netusageNo       0.772624   0.299190      2.582    0.00981  **
stagnantYes      0.870524   0.319489      2.725    0.00644  **

As we can see from the above output, age, residence, net usage and stagnant water have a
significant effect on malaria status at the 5% level of significance, since their p-values are
less than α = 0.05. Sex is retained by the AIC criterion although its p-value (0.078) exceeds 0.05.
4.1.3 Assessing the Logistic Regression Model
After the logistic model is formed using the predictor variables selected by the stepwise
procedure, its overall fit should be tested using appropriate methods.
The hypothesis to be tested in relation to the overall fit of the model is:
H0: The model is a good fitting model.
H1: The model is not a good fitting model.
An alternative to the model chi-square is the Hosmer-Lemeshow test, which divides the subjects
into 10 ordered groups and then compares the number observed in each group with the number
predicted by the logistic regression model. The 10 ordered groups are created based on the
estimated probability: those with estimated probability below 0.1 form one group, and so on,
up to those with probability 0.9 to 1.0. Each of these groups is further divided into two based
on the observed outcome. The expected frequencies for each of the cells are obtained from the
model. A p-value is computed from the chi-square distribution with 8 degrees of freedom to test
the fit of the logistic model. If the p-value of the Hosmer-Lemeshow goodness-of-fit test is
greater than 0.05, as we want for well-fitting models, we fail to reject the null hypothesis that
there is no difference between the observed and model-predicted values, implying that the model
fits the data at an acceptable level. That is, well-fitting models show non-significance on the
Hosmer-Lemeshow goodness-of-fit test: the model predictions do not differ significantly from
the observed values.
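The computation behind the Hosmer-Lemeshow statistic can be sketched in a few lines of R. The function below is our own illustration, not the routine used in the study: it forms the groups from deciles of the fitted probabilities, a common variant, rather than from the fixed cut points 0.1, 0.2, ..., and it is applied to simulated data.

```r
# Hypothetical helper: Hosmer-Lemeshow statistic with g quantile groups
hosmer_lemeshow <- function(y, p, g = 10) {
  breaks <- unique(quantile(p, probs = seq(0, 1, length.out = g + 1)))
  grp  <- cut(p, breaks = breaks, include.lowest = TRUE)
  n_g  <- tapply(y, grp, length)   # group sizes
  obs1 <- tapply(y, grp, sum)      # observed positives per group
  exp1 <- tapply(p, grp, sum)      # expected positives per group
  obs0 <- n_g - obs1               # observed negatives per group
  exp0 <- n_g - exp1               # expected negatives per group
  stat <- sum((obs1 - exp1)^2 / exp1 + (obs0 - exp0)^2 / exp0)
  df   <- length(n_g) - 2
  list(statistic = stat, df = df,
       p.value = pchisq(stat, df, lower.tail = FALSE))
}

# Simulated data for illustration only (not the study data)
set.seed(42)
n <- 263
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + 0.8 * x))
fit <- glm(y ~ x, family = binomial)

hl <- hosmer_lemeshow(y, fitted(fit), g = 10)
print(hl)
```

With 10 groups the statistic is referred to a chi-square distribution with 10 - 2 = 8 degrees of freedom, which matches the degrees of freedom stated above.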
4.1.4 Odds Ratio
Table 4.1.4: Odds Ratio
> exp(coef(glm2))
(Intercept)         Age     SexMale  ResidenceRural  netusageNo  stagnantYes
 0.07189265  1.02260543  1.90232562      2.41462374  2.16544200   2.38816164
We can interpret exp(β) as the multiplicative change in the odds. If the value exceeds 1, the
odds of success (being malaria positive) increase with the predictor; if the value is less than
1, an increase in the predictor variable lowers the odds of success. The results show that the
odds of being malaria positive for rural residents are 2.41 times those for urban residents.
This indicates that the problem is more severe for patients from rural areas than for those
from urban areas in Z/Dugda Woreda. The odds ratio for net usage shows that people who do not
use a net have 2.17 times the odds of malaria of those who use a net. Similarly, the odds of
malaria where stagnant water is present are 2.39 times the odds where it is absent.
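The odds ratios above are the exponentiated coefficients of the selected model. The sketch below recomputes them from the estimates in Table 4.1.2 and adds approximate 95% Wald confidence limits, which are our addition and are not reported in the study.

```r
# Coefficients and standard errors copied from Table 4.1.2
beta <- c(Age = 0.022354, SexMale = 0.643077, ResidenceRural = 0.881543,
          netusageNo = 0.772624, stagnantYes = 0.870524)
se   <- c(0.007239, 0.364453, 0.335303, 0.299190, 0.319489)

odds_ratio <- exp(beta)

# Approximate 95% Wald confidence limits (not reported in the text)
lower <- exp(beta - qnorm(0.975) * se)
upper <- exp(beta + qnorm(0.975) * se)

print(round(cbind(odds_ratio, lower, upper), 2))
```

An interval that excludes 1 corresponds to a coefficient that is significant at the 5% level, so under this approximation the interval for sex is the only one expected to include 1.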
Table 4.1.5: Hosmer and Lemeshow Goodness-of-Fit Test

Chi-square    df    p-value
12.9222       8     0.1145548

The Hosmer-Lemeshow statistic has a chi-square value of 12.9222 with a p-value of 0.1145548.
Because the p-value exceeds the level of significance (α = 0.05), the test is not statistically
significant: there is no significant difference between the observed and model-predicted values,
and hence the model fits the data well.
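The reported p-value can be checked from the chi-square statistic and its degrees of freedom alone, as in this short sketch (the object names are ours):

```r
hl_stat <- 12.9222   # chi-square statistic from Table 4.1.5
hl_df   <- 8         # 10 groups minus 2

# Upper-tail probability of the chi-square distribution
hl_p <- pchisq(hl_stat, df = hl_df, lower.tail = FALSE)
print(hl_p)
print(hl_p > 0.05)   # TRUE means H0 (good fit) is not rejected
```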
4.2. Discussion
This retrospective study attempted to identify the risk factors of malaria prevalence in
Ogolcho Clinic, Z/Dugda Woreda. The fit of the binary logistic regression model was also assessed.
The results of the study showed that malaria status was significantly associated with the predictor
variables stagnant water, net usage, age and residence. Several studies have been done on this
topic by different authors. Variables such as residence and stagnant water made a significant
contribution to malaria status in this study, which is comparable to the studies conducted by
Weyessa et al. (2012) in Butajira, Alemu et al. (2012) in Kola Diba and Kazembe et al. (2008) in
Zomba district hospital. Furthermore, the net usage status and residence of patients had a
significant impact on the prevalence of malaria in our study.
5. CONCLUSIONS AND RECOMMENDATION
5.1. Conclusions
The main objective of this study was to investigate the risk factors of the prevalence of malaria in
Z/Dugda Woreda using binary logistic regression. The following conclusions are made.

From the results, this study shows that the most important risk factors for the prevalence of
malaria are age, residence, stagnant water and net usage. As we see from the final logit model,
the coefficients of all these variables are positive, providing evidence that older age, rural
residence, the presence of stagnant water and not using a net increase the risk of malaria,
even though the size of the effect differs between variables.

5.2. Recommendations
As seen in the previous sections, the age of patients, residence of patients, stagnant water and
net usage were the risk factors most likely to affect malaria status. With this in mind, we make
the following recommendations:
To minimize the prevalence of malaria, health workers should raise community awareness, distribute
nets to individuals in the area, drain stagnant water, and so on. Moreover, rural residents are
at particularly high risk of malaria compared with urban residents. This may be due to their
treatment-seeking behaviour: an extreme delay in diagnosis allows the disease to become severe
and finally leads to high prevalence. To address this problem, the governmental and
non-governmental organizations working in the area should give due attention, especially to
continuous awareness creation about early diagnosis and treatment at health facilities.
Finally, the concerned bodies have to expand and maintain health promotion by designing
appropriate interventions tailored towards communities at high risk, and effective treatment in
home or community based care.
5.3. Limitations of the Study

The study focused on identifying some of the factors that were expected to be risk factors of
malaria prevalence in Z/Dugda Woreda, based on the data available on patient cards.
However, owing to lack of data, the study could not incorporate some other important risk
factors that may influence the prevalence of malaria, such as malaria in pregnancy, vaccination
status, socio-economic status of patients, educational status, awareness of patients about the
disease, proper utilization of different antimalarial treatments and other related issues.
Despite these limitations, the model derived in this paper may still give a reasonable prediction
of malaria status from the available patient card data of laboratory-confirmed malaria patients.
6. REFERENCES

1. Agresti, A. (1996). An Introduction to Categorical Data Analysis, 2nd edition. John Wiley and
Sons Inc., New York.
2. Alemu et al. Parasites & Vectors 2012, 173. http://www.parasitesandvectors.com/content/5/1/17
3. Alexander Macedon de Oliveira, Malaria Branch, Division of Parasitic Diseases and Malaria,
Centers for Disease Control and Prevention, Atlanta, GA. Am. J. Trop. Med. Hyg., 85(6), 2011,
pp. 1002–1007. doi:10.4269/ajtmh.2011.11-0365
4. Bewick, L. and Jonathan, B. (2005). Statistics Review 14: Logistic Regression.
5. Federal Ministry of Health and Ethiopian National Malaria Indicator Survey (ENMIS) (2008).
Report on malaria situation in Ethiopia.
6. Hosmer, D. and Lemeshow, S. (1989). Applied Logistic Regression, 3rd edition. John Wiley and
Sons Inc., New York.
7. Mawili-Mboumba et al. Malaria Journal 2013, 12:3.
http://www.malariajournal.com/content/12/1/2
8. Ministry of Health (2008). Malaria in Ethiopia: Health and Health-Related Indicators
report, Planning and Programming Department, Federal Democratic Republic of Ethiopia,
Ministry of Health, Addis Ababa, Ethiopia.
9. Ribeiro, J.M. (2006). Epidemiologic aspects of the human malaria transmission.
Am J Trop Med Hyg, 128-135.
10. Sachs, J. and Malaney, P. (2006). The burden of malaria epidemics and cost-effectiveness of
interventions in epidemic situations in Africa. Nature 415, 680-685.
11. The Carter Center: Report of Malaria and Trachoma Survey in Ethiopia.
12. Weyessa A., Gebremichael T., Ali A. (2007). An indigenous malaria transmission in the
outskirts of Addis Ababa, Akaki Town and its environs. Ethiop. J. Health Dev. 2007;
18(1):2-7.
13. World Health Organization / United Nations Children's Fund (WHO/UNICEF) (2008).
World Malaria Report 2008. Geneva, Switzerland. http://www.unicef.org/386.html
14. World Health Organization (WHO) (2009). World Malaria Report 2009. Geneva: Roll Back Malaria
and Global Malaria Control Strategy. http://www.rollbackmalaria.org
7. Appendix
A) R-code for Full Model

# Set the working directory and read the data
setwd("C:/Users/user/Desktop/GC2015")
umer <- read.table("Umer.dat", header = TRUE)

# Keep the response and the explanatory variables
Data2 <- with(umer, data.frame(age, Status, Sex, Residence, netusage, stagnant, MSp, Season))

# Recode the 0/1 variables as factors; the first label is the reference category
Data2 <- within(Data2, {
  Status    <- factor(Status, levels = 0:1, labels = c("negative", "positive"))
  Sex       <- factor(Sex, levels = 0:1, labels = c("Female", "Male"))
  Season    <- factor(Season, levels = 0:1, labels = c("Wet", "Dry"))
  netusage  <- factor(netusage, levels = 0:1, labels = c("Yes", "No"))
  stagnant  <- factor(stagnant, levels = 0:1, labels = c("No", "Yes"))
  Residence <- factor(Residence, levels = 0:1, labels = c("Urban", "Rural"))
  MSp       <- factor(MSp, levels = 0:1, labels = c("Vivax", "Falciprium"))
})

# Full binary logistic regression model
glm1 <- glm(Status ~ age + Sex + Residence + netusage + stagnant + MSp + Season,
            family = binomial, data = Data2)
summary(glm1)

Call:
glm(formula = Status ~ age + Sex + Residence + netusage + stagnant +
    MSp + Season, family = binomial, data = Data2)

B) R-code for the Best Model

# Reduced model retained by backward selection
glm2 <- glm(Status ~ age + Sex + Residence + netusage + stagnant,
            family = binomial, data = Data2)
summary(glm2)

Call:
glm(formula = Status ~ age + Sex + Residence + netusage + stagnant,
    family = binomial, data = Data2)
C) R-code for Hosmer and Lemeshow Test

library(vcdExtra)   # provides HLtest()
HL <- HLtest(glm2, g = 10)
summary(HL)
Call:
glm(formula = Status ~ age + Sex + Residence + netusage + stagnant,
    family = binomial, data = Data2)
