Using Logistic Regression To Estimate The Influence of Crash Factors On Road Crash Severity in Kathmandu Valley


Proceedings of IOE Graduate Conference, 2017

Volume: 5 ISSN: 2350-8914 (Online), 2350-8906 (Print)

Using Logistic Regression to Estimate the Influence of Crash Factors on Road Crash Severity in Kathmandu Valley
Rajeeb Shakya a , Anil Marsani b
a, b Department of Civil Engineering, Pulchowk Campus, IOE, TU, Nepal
Corresponding Email: a [email protected], b [email protected]

Abstract
Many factors contribute to Road Traffic Crashes (RTCs). In this study, logistic regression is used to estimate the influence of crash-related factors on the severity of RTCs in Kathmandu Valley. The dependent variable is the Crash Severity (Fatal or Non-Fatal). The independent variables are crash cause, vehicle type, age and sex of the driver at fault, age of the injured personnel, time of crash, collision type, location of the crash and injured type. Data are obtained from Nepal Traffic Police records for the past five years. Because the dependent variable is binary, logistic regression was found suitable. Of the nine independent variables, three were found to be significantly associated with the outcome of the dependent variable, namely age of the driver at fault, age of the injured personnel and time of the crash. These significant variables are interpreted statistically in terms of odds and odds ratios in the analysis part. The association between the time of the crash and the traffic volume is also checked in the analysis. The findings show that logistic regression as used in this study is a promising tool for providing meaningful interpretation that can be used for future safety improvement policies in Kathmandu Valley.
Keywords
Logistic regression, Road Traffic Crashes

1. Introduction

The number of vehicles on the road is increasing day by day and, alongside this increase, the number of road crashes and deaths is also rising. Every year approximately 1.25 million people around the world lose their lives and a further 20-50 million people suffer non-fatal injuries due to road traffic crashes. Unless action is taken, road traffic injuries are predicted to be the 5th leading cause of death by 2030 (World Health Organization, 2010). In 2002, the overall global road injury rate was 19 per 100000 people, with 90% of the cases in low and middle-income countries like Nepal. According to the report “Cities on the move” published in 2002 by the World Bank, nearly 0.5 million people die and 15 million people are injured in urban road traffic accidents in developing countries each year, at a direct economic cost of between 1 and 2% of worldwide Gross Domestic Product.

Kathmandu is one of the fast growing cities among the developing countries in South Asia. The population according to the 2011 census is approximately 2.5 million. The population density in the valley is 5140 people per sq. km. With such a high population density, the number of vehicles is bound to increase day by day. This rapid urbanization is taking its toll on the valley's traffic conditions, with increased congestion, pollution and crashes. Further, the road space in Kathmandu valley is just around 7-8%. With such low road space and such a high volume of vehicles, immense care needs to be taken to minimize the number of Road Traffic Crashes (RTCs).

In the year 2068/69, the total number of RTCs was 5096 (148 deaths, 396 serious injuries, 3317 minor injuries). This number was 4770 (148 deaths, 246 serious injuries, 3431 minor injuries), 4672 (143 deaths, 229 serious injuries, 3481 minor injuries) and 4999 (133 deaths, 233 serious injuries, 3642 minor injuries) in the years 69/70, 70/71 and 71/72 that followed. There is no significant improvement in these numbers

Pages: 311 – 324



over the past four years.

Logistic regression is used when the dependent variable is binary in nature. It is a powerful statistical tool that can be used to estimate the effect of the factors that are related to the crashes.

2. Literature Review

Many studies around the world have used regression analysis. Among the different regression methods, the most commonly used is conventional regression analysis, either linear or non-linear, which applies when the dependent variable is continuous in nature. However, logistic regression analysis is a promising tool when the dependent variable is binary in nature (i.e. it can take only two values), and it is used in this paper to estimate the influence of the factors involved in RTCs on crash severity.

In the logistic regression model developed by Ali S. Al-Ghamdi (2000) to study the accidents in Riyadh, Location and Crash Cause were found to be significantly associated with the outcome of the crash being fatal or non-fatal. The study showed that the odds of being in a fatal crash at a non-intersection location are 2.64 times higher than those at an intersection. The study also showed that the odds that a crash will be fatal because of running a red light (RRL) are 2.72 times higher than for a non-RRL crash. [1]

Mohadeseh Khalili & Alireza Pakgohar (2013) investigated the impact of road defects on the severity of road crashes according to the vehicle movability situation after the accident. The results of this research show that the most important factors reducing safety on suburban roads in Iran are “Insufficient road width” with respect to crash frequency and “Level difference between road & shoulder” with respect to crash severity. [2]

Wiredu Sampson & Richard Tawiah (2015) developed a logistic model which revealed that the significant predictors of severe crashes were mainly overcrowding, driver discipline on roads, driver fatigue and design & condition of roads, with estimated odds ratios of 2.42, 3.83, 10.51 and 12.06 respectively. Other variables including speed, drunk driving, not using helmets, mechanical failure, over speeding and indiscriminate use of roads by pedestrians were not found to be significant predictors of severe crashes. [3]

Mahmoud Saffarzadeh et al. (2012) developed a binary logistic regression model from forensic medicine data from the Khorasan Razavi province, Iran and found that the pedestrian’s age, the vehicle involved in the accident and the accident location had a significant impact on the probability of death at the scene. [4]

Sareh Bahrololoom et al. (2011) studied the factors that increase the likelihood of hit and run in crashes that involve at least one cyclist in Victoria, Australia. The results showed that crash time, bicyclist’s age and gender, helmet use (for bicyclists), other road user’s intent, bicyclist’s intent, traffic control (other road user’s approach), traffic control (bicyclist’s approach) and crash severity are significant variables in the Binary Logistic Regression model. [5]

Murat Karacasu et al. (2013) studied the causes of traffic accidents in Eskisehir, Turkey using logistic regression and discriminant analysis and found that traffic sign, pavement type, vehicle type, purpose, education and primary fault were significant variables. [6]

Boakye Agyemang et al. (2013) carried out a regression analysis of road traffic accidents and population growth in Ghana and found that population growth is 72.9% accountable for the changes in accidents in Ghana. Further, the model was used to forecast road traffic accidents in the near future. [7]

Gentiana Qirjako et al. (2008) studied the factors associated with fatal road crashes in Tirana, Albania and found that Younger Age (OR: 3.97, 95% CI: 2.28-6.91), High Speed (OR: 2.54, 95% CI: 1.62-3.98) and especially Alcohol Consumption (OR: 6.15, 95% CI: 3.54-10.66) were strong and significant predictors of fatal crashes. The study also showed that fatal crashes were more prevalent on Intercity Roads (OR: 4.25, 95% CI: 3.11-5.82) and involved especially Vans and Trucks (OR: 4.12, 95% CI: 2.34-7.24). [8]

S. Renuraj et al. (2014) carried out a logistic regression analysis to study the factors influencing traffic accidents in Jaffna. The study showed that Type of Vehicle and Age were significant in influencing the accident severity. [9]


3. Theoretical Background of Logistic Regression

Logistic regression is a very powerful statistical tool when the analysis contains a dependent (response) variable that is binary or dichotomous in nature (i.e. it can take only two values). The independent variables can be either continuous or categorical in nature. The response variable thus takes the value 0 or 1. The linear regression equation is of the form

$$E(Y/x) = \beta_0 + \beta_i x_i$$

where $E(Y/x)$ is the expected value of $Y$ given $x$, $\beta_0$ is the value of $Y$ when $x$ equals zero and $\beta_i$ are the model parameters. The above equation is linear and $Y$ can take any value from $-\infty$ to $+\infty$. The parameters are estimated by optimizing a function of the parameters so that the predicted values of the dependent variable come as close as possible to the actual values. In linear regression, this function is the sum of the squares of the differences between the predicted and actual values. It can be written in the following form.

$$SS(\beta) = \sum_{i=1}^{n} (Y_{pi} - Y_i)^2$$

where $Y_{pi}$ are the predicted (modeled) values and $Y_i$ are the actual (observed) values. The values of $\beta_i$ are determined by minimizing this sum. This process of computing the parameters is called the method of least squares. In logistic regression, however, the dependent variable $Y$ can take only two values (0 or 1). To make this possible, the above equation needs to be transformed as shown below.

$$\pi(x) = \frac{e^{\beta_0 + \beta_i x_i}}{1 + e^{\beta_0 + \beta_i x_i}}$$

where $\pi(x)$ is used instead of $E(Y/x)$ in logistic regression to simplify notation. With this form, the expected value of the dependent variable is limited to the range between 0 and 1. The above form of $\pi(x)$ can be transformed into a linear form, called the logit transformation, as shown below.

$$g(x) = \ln\frac{\pi(x_i)}{1 - \pi(x_i)} = \beta_0 + \beta_i x_i$$

This transformation is important because the right hand side of the equation can now be treated in the same way as in linear analysis, the dependent variable now being equal to $g(x)$. The exponential of $g(x)$, called the odds, is described in a later section (see Development of Logistic Model in the Methodology section). However, the method of least squares cannot be used to estimate the parameters in logistic regression, because the dependent variable is binary and not continuous. A convenient way to express the contribution to the likelihood function for the pair $(x_i, y_i)$ is through the term

$$\tau(x_i) = \pi(x_i)^{y_i}\,[1 - \pi(x_i)]^{1 - y_i}$$

Since the observations are assumed to be independent, the product of the terms given in the foregoing equation gives the likelihood function as follows.

$$l(\beta) = \prod_{i=1}^{n} \tau(x_i)$$

It is easier to work mathematically with the log of this equation, which gives the log-likelihood expression

$$L(\beta) = \ln[l(\beta)] = \sum_{i=1}^{n} \left( y_i \ln[\pi(x_i)] + (1 - y_i)\ln[1 - \pi(x_i)] \right)$$

Differentiating the above function with respect to $\beta$ and setting the resulting expressions to zero gives the values of $\beta_i$ that maximize it.

A statistic called the deviance is used to check the significance of the variables in the model. It is minus two times the natural logarithm of the ratio of the likelihood of the current model to the likelihood of the saturated model. The saturated model is the one that contains as many parameters as there are data points, and the current model is the one that contains only the variable in question.

$$D = -2\ln\left[\frac{\text{Likelihood of the current model}}{\text{Likelihood of the saturated model}}\right]$$

To assess the significance of an independent variable, the value of $D$ is compared with and without the independent variable in the model. The change is obtained as follows.

$$G = D(\text{for the model without the variable}) - D(\text{for the model with the variable})$$
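The estimation steps in this section (logistic function, log-likelihood, solving for the parameters, deviance and G) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the SPSS procedure used in the study: it handles a single continuous predictor, solves the likelihood equations with Newton-Raphson, and runs on made-up data. It also relies on the fact that, for individual 0/1 outcomes, the saturated model has likelihood 1, so that D reduces to -2 L(β).

```python
import math

def logistic(z):
    """pi = e^z / (1 + e^z), evaluated in a numerically stable way."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log_likelihood(b0, b1, xs, ys):
    """L(beta) = sum of y*ln(pi) + (1 - y)*ln(1 - pi) over all cases."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = logistic(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit(xs, ys, with_x=True, iterations=25):
    """Newton-Raphson solution of dL/dbeta = 0; b1 is held at 0 when with_x is False."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = logistic(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p           # dL/db0
            g1 += (y - p) * x     # dL/db1
            h00 += w              # entries of the negative Hessian
            h01 += w * x
            h11 += w * x * x
        if with_x:
            det = h00 * h11 - h01 * h01
            b0, b1 = (b0 + (h11 * g0 - h01 * g1) / det,
                      b1 + (h00 * g1 - h01 * g0) / det)
        else:
            b0 += g0 / h00
    return b0, b1

# Made-up illustrative data (NOT from the paper): age in years, 1 = fatal.
xs = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
ys = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

b0_null, _ = fit(xs, ys, with_x=False)   # model without the variable
b0, b1 = fit(xs, ys, with_x=True)        # model with the variable

# Saturated likelihood is 1 for individual binary outcomes, so D = -2 * L(beta).
D_without = -2.0 * log_likelihood(b0_null, 0.0, xs, ys)
D_with = -2.0 * log_likelihood(b0, b1, xs, ys)
G = D_without - D_with

# One parameter was added, so G is referred to chi-square with 1 degree of freedom.
p_value = math.erfc(math.sqrt(G / 2.0))
significant = G > 3.841   # 5% critical value of chi-square with 1 degree of freedom
print(f"b0={b0:.3f} b1={b1:.4f} G={G:.3f} p={p_value:.4f} significant={significant}")
```

The names `fit`, `log_likelihood` and the toy data are hypothetical. A statistics package would normally be used for real data, since it also handles several predictors and reports standard errors.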


The G-values follow the Chi-square (χ²) distribution with one degree of freedom when the variable in question contributes a single parameter to the model. The critical values of the Chi-square distribution can be obtained from statistical tables (not reproduced in this paper). If the change in deviance (the G-value) is greater than the critical value, the change is significant and so is the variable in question; if it is below the critical value, the change can be attributed to chance and the variable is insignificant.

Another important statistic is the P-value. It is the probability, if the true coefficient of the independent variable were zero, of obtaining an estimate at least as far from zero as the one observed; it corresponds to the area in the tails of the normal distribution of the estimated coefficient that lies outside the confidence interval. For significance at the 95% confidence level, the P-value should be less than 0.05.

4. Methodology

4.1 Data Description

The data for the regression analysis are collected from the traffic police records for the years 2068/69 to 2072/73. Only the serious crashes occurring inside the Kathmandu valley are used. Minor crashes are not used in this analysis. Serious crashes had to be filtered manually from the thousands of crash records in the traffic police records. The software SPSS-21 (Statistical Package for Social Sciences) is used to carry out the logistic regression analysis for this study.

The dependent variable is “Crash Severity”, which is coded as 0 if the crash results in at least one injury and no fatality and 1 if it results in at least one fatality.

There are 9 independent variables in this analysis. All of the independent variables are categorical in nature, i.e. they cannot be put in order in terms of their magnitude, except two variables: “Age of the driver at fault” and “Age of the injured personnel”. The categorical variables need to be coded in a different way, which is defined below. The list of variables is summarized in Table 1.

The categorical variables need to be interpreted in a different way from the continuous variables. The two continuous variables “Age of the driver at fault” and “Age of the injured personnel” are measured in years. A collection of design variables (also called dummy variables) is needed to define the categorical variables. One way of coding the dummy variables is to have (k-1) design variables for the k levels of the nominal scale of that categorical variable. An example of this coding is given in Table 2. The independent variable “Injured Type” is taken as an example. As can be seen in Table 2, it has 6 categories, namely “Bike Driver”, “Pedestrian”, “Bike Passenger”, “Other Vehicle Passenger”, “Multiple Injuries” and “4W Driver”. Thus it needs 5 dummy variables D1, D2, D3, D4 and D5. One category is set as the base category and all the other dummy variables are interpreted relative to this base category. Here, for example, if the crash involves “Bike Driver” (the base category) as the affected party, then all 5 dummy variables D1, D2, D3, D4 and D5 are set equal to zero. If the crash involves “Pedestrian” as the affected party, then D1 is set to 1 and D2, D3, D4 and D5 are set to zero.

If the affected party is “Bike Passenger”, then D2 is set equal to 1 and D1, D3, D4 and D5 are set to zero. Similarly, if the affected party is “Other Vehicle Passenger”, then D3 is set equal to 1 and D1, D2, D4 and D5 are set to zero. If there are multiple injuries, then D4 is set equal to 1 and D1, D2, D3 and D5 are set to zero. Finally, if the affected party is “4W Driver”, then D5 is set equal to 1 and D1, D2, D3 and D4 are set equal to zero. The coding technique is the same for all the remaining categorical variables. This coding is done automatically by the software (SPSS).

4.2 Development of Logistic Model

There are many ways of developing a logistic model. The most commonly used techniques are the Forward Selection process and the Backward Selection process. In the Forward Selection process, the regression analysis is first carried out with only one independent variable and its significance is checked at 95%. However, because significant variables might otherwise be omitted from the complete model, the critical significance level is initially kept at 75% (P-value 25%) (Hosmer & Lemeshow, 2000)[10]. The remaining variables are then added to the model one by one until only variables that are significant at the 95% confidence level are left.

The backward selection process starts with all the independent variables in the model, with no interaction terms between the variables (referred to here as the saturated model).
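The (k-1) design-variable coding described in Section 4.1 can be sketched as follows. SPSS performs this coding automatically, so the function below is only an illustration; the function name `design_variables` is hypothetical, while the category names and the choice of “Bike Driver” as the base category follow the text.

```python
# Levels of the categorical variable "Injured Type", in the order used in the paper.
INJURED_TYPES = [
    "Bike Driver",              # base category: all design variables are 0
    "Pedestrian",               # D1
    "Bike Passenger",           # D2
    "Other Vehicle Passenger",  # D3
    "Multiple Injuries",        # D4
    "4W Driver",                # D5
]

def design_variables(category, levels=INJURED_TYPES, base="Bike Driver"):
    """Return the (k-1) dummy variables [D1, ..., D(k-1)] for one observation."""
    if category not in levels:
        raise ValueError(f"unknown category: {category!r}")
    non_base = [level for level in levels if level != base]
    return [1 if category == level else 0 for level in non_base]

for injured_type in INJURED_TYPES:
    print(injured_type, design_variables(injured_type))
```

Each non-base category switches on exactly one design variable, so the coefficient of that variable is read relative to the base category.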


Table 1: List of variables

No.  Description                  Abbreviation  Coded Values
1    Crash Severity               CRA SEV       0 = Non fatal; 1 = Fatal
2    Crash Cause                  CRA CAUSE     1 = Alcohol Consumption; 2 = Negligence of the driver; 3 = Overtaking; 4 = Overspeeding; 5 = Mechanical Failure; 6 = Road Condition
3    Collision Type               COLL TYPE     1 = Head on; 2 = Right Angled; 3 = Side Swipe; 4 = Rear End; 5 = Out of control; 6 = Pedestrian Hit; 7 = Collision with Fixed Objects
4    Vehicle Type                 VEH TYPE      1 = Bus; 2 = Bike; 3 = Car; 4 = Minibus; 5 = Truck; 6 = Cycle
5    Age of the driver at fault   AGE FAULT     years
6    Age of the Injured           AGE INJ       years
7    Gender                       GENDER        0 = Male; 1 = Female
8    Location                     LOC           1 = Intersection; 2 = Turning; 3 = Straight Section
9    Injured type                 INJ TYPE      1 = Bike Driver; 2 = Pedestrian; 3 = Bike Passenger; 4 = Other Vehicle Passenger; 5 = Multiple Injuries; 6 = 4W Driver
10   Time of Crash                CRA TIME      1 = Morning Off Peak (4-8); 2 = Morning Peak (8-12); 3 = Day Off Peak (12-16); 4 = Day Peak (16-20); 5 = Evening Off Peak (20-24); 6 = Night (24-4)

Table 2: Coding of Categorical Variables

                           Design Variables
INJ TYPE                   D1  D2  D3  D4  D5
Bike Driver                0   0   0   0   0
Pedestrian                 1   0   0   0   0
Bike Passenger             0   1   0   0   0
Other Vehicle Passenger    0   0   1   0   0
Multiple Injuries          0   0   0   1   0
4W Driver                  0   0   0   0   1

The P-values of the variables in this model are then checked. Variables with P-values greater than 25% are rejected at the initial stage, and elimination continues until the model is left with only variables that are significant at the 95% confidence level. The change in deviance, i.e. the G-value, is also observed to interpret the significance of each variable.

In our model, the backward selection process is used.

The logistic model will be of the following form.

$$\begin{aligned}
\ln\frac{P(\mathrm{fatal})}{1 - P(\mathrm{fatal})} = {} & \beta_0 + \beta_{1i}(\mathrm{CRA}_{\mathrm{CAUSE}})_i + \beta_{2i}(\mathrm{COLL}_{\mathrm{TYPE}})_i + \beta_{3i}(\mathrm{VEH}_{\mathrm{TYPE}})_i + \beta_4(\mathrm{AGE}_{\mathrm{FAULT}}) \\
& + \beta_5(\mathrm{AGE}_{\mathrm{INJ}}) + \beta_{6i}(\mathrm{GENDER})_i + \beta_{7i}(\mathrm{LOC})_i + \beta_{8i}(\mathrm{INJ}_{\mathrm{TYPE}})_i + \beta_{9i}(\mathrm{CRA}_{\mathrm{TIME}})_i
\end{aligned}$$

However, one should keep in mind that not all the independent variables may be significant, and the final model will contain only those variables that are significant.

The term $\frac{P(\mathrm{fatal})}{1 - P(\mathrm{fatal})}$ is called the odds. As can be seen from the earlier equation $\ln\frac{\pi(x_i)}{1 - \pi(x_i)} = \beta_0 + \beta_i x_i$, this equation can also be written in the following form.

$$\frac{P}{1 - P} = e^{\beta_i x_i + \beta_0}$$

The odds ratio is defined as the ratio of the odds after a unit increase in the value of the independent variable to the odds before that increase.

$$OR(a+1, a) = \frac{\left[P/(1-P)\right]_{x=a+1}}{\left[P/(1-P)\right]_{x=a}} = e^{\beta_i}$$

Hence, the odds ratio is the exponential of the coefficient of the independent variable under consideration.

The odds ratio is a useful tool in the model interpretation process.

If the odds ratio is greater than 1, then the odds of success (a fatal crash in this case) increase with each unit increase in the value of the independent variable under consideration.

If the odds ratio is less than 1, then the odds of success (a fatal crash in this case) decrease with each unit increase in the value of that independent variable.

4.3 Entry of Data in SPSS

There are a total of 12 columns in the SPSS entry sheet. The 1st column records the date of the crash. The 2nd column records whether the crash is fatal or non-fatal: “1” means the case is fatal and “0” means that it is non-fatal. This is the dependent variable. The independent variables start from the 3rd column and end at the 11th column. The 3rd column records the cause of the crash. There are 6 causes that can be recorded, as shown in Table 1. The 4th column records the collision type. There are 7 types of collision, as shown in Table 1.
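The step from the definition of the odds ratio to the exponential of the coefficient in Section 4.2 can be written out explicitly. The short derivation below uses only the symbols already defined in the paper and assumes that all other variables are held fixed while $x$ changes from $a$ to $a+1$.

```latex
\begin{aligned}
\left.\frac{P}{1-P}\right|_{x=a}   &= e^{\beta_0 + \beta_i a}, \\
\left.\frac{P}{1-P}\right|_{x=a+1} &= e^{\beta_0 + \beta_i (a+1)}, \\
OR(a+1, a) &= \frac{e^{\beta_0 + \beta_i (a+1)}}{e^{\beta_0 + \beta_i a}} = e^{\beta_i}.
\end{aligned}
```

Because $a$ cancels, the odds ratio for a unit increase is the same at every starting value of $x$. By the same argument, a change of $c$ units multiplies the odds by $e^{c\beta_i}$.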


The 5th column records the type of the vehicle at fault. There are 6 vehicle types, as shown in Table 1. The 6th and 7th columns record the age of the driver at fault and the age of the injured personnel. The 8th column records whether the driver at fault is male or female: “0” for male and “1” for female. The 9th column records the location of the crash. There are 3 categories of location, as shown in Table 1. The 10th column records the injured type. There are 6 categories of injured type, as shown in Table 1. The 11th column records the time of the crash. It has 6 categories, as shown in Table 1. The 12th and last column is for remarks, if any.

4.4 Backward Selection Method

The crash cases used to develop the logit model were obtained from the fiscal years 2068/69, 2069/70 and 2070/71. There were a total of 504 serious crash cases. Out of these 504 cases, only 476 were valid. The 28 invalid cases were due to the unknown age of either the driver at fault or the person injured. These missing values are recorded as “999” during the entry process.

The first run of the analysis is done in SPSS (Statistical Package for Social Sciences) to test the significance of the independent variables at 95%.

The four independent variables “Crash Cause”, “Vehicle Type”, “Gender of the driver at fault” and “Location of the Crash” were not significant at 95%, so these four variables are eliminated and the selection process is carried out with the remaining variables. The variables “Age of the Injured” and “Crash time” are significant at 95%, as can be seen in figure 1.

The second run of the analysis can be seen in figure 2. The variables “Age of the Injured”, “Age of the driver at fault” and “Time of crash” appeared to be significant. The variable “Injured Type” appeared to be insignificant and is removed for the next selection step.

In the third run of the analysis, the variables “Age of the Injured”, “Age of the driver at fault” and “Time of crash” still appeared to be significant, as can be seen in figure 3. The variable “Collision Type” is removed in this step.

In the fourth and final run of the analysis in figure 4, all three remaining variables “Age of the Injured”, “Age of the driver at fault” and “Time of crash” appeared to be significant at the 95% confidence level.

Thus the final logistic model for our case is of the following form:

$$\begin{aligned}
\ln\frac{p}{1 - p} = {} & -0.832 + 0.025 \cdot Age_{fault} + 0.021 \cdot Age_{Injured} \\
& - 1.425 \cdot CrashTIME_{(4-8)} - 1.189 \cdot CrashTIME_{(8-12)} - 1.057 \cdot CrashTIME_{(12-16)} \\
& - 1.076 \cdot CrashTIME_{(16-20)} - 1.344 \cdot CrashTIME_{(20-24)} - 0 \cdot CrashTIME_{(24-4)}
\end{aligned}$$

The value of “p” in the above equation gives the probability that the crash will be fatal.

5. Analysis of the Results

5.1 Validation of the model

The final model needs to be validated against data that were not used to develop the model. Hence, for the validation process, the data from the fiscal years 2071/72 and 2072/73 were taken.

A cut point of 0.31 on the value of “p” is used to separate the fatal crashes from the non-fatal ones. This value of the cut point is used because it maximizes the accuracy of the model. This means that any crash case with a value of “p” below 0.31 is classified as a non-fatal case and any crash case with a value of “p” above 0.31 is classified as a fatal case.

A total of 304 cases were used, out of which 217 cases were classified correctly and 87 cases were not, which means that the accuracy of the model is 71.38%.

5.2 Model Interpretation

The exponential of the coefficient of a variable gives the odds ratio, as defined earlier in the “Development of Logistic Model” section.

As can be seen in the final result in figure 4, the exponential of the coefficient of the variable “Age of the driver at fault” is 1.025 and that of the variable “Age of the injured personnel” is 1.021. The exponentials of the coefficients of the five non-base categories of the variable “Time of crash” are 0.241, 0.305, 0.348, 0.341 and 0.261 respectively.
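The final model and the cut point of Section 5.1 can be turned into a small prediction function. The sketch below uses the coefficients reported in the final model; the function names and the two example cases are hypothetical, and the treatment of a probability exactly equal to 0.31 is an assumption because the text only defines the cases below and above the cut point.

```python
import math

# Coefficients of the final model as reported in the paper.
INTERCEPT = -0.832
B_AGE_FAULT = 0.025
B_AGE_INJURED = 0.021
B_CRASH_TIME = {        # base category "24-4" (Night) has coefficient 0
    "4-8": -1.425,
    "8-12": -1.189,
    "12-16": -1.057,
    "16-20": -1.076,
    "20-24": -1.344,
    "24-4": 0.0,
}
CUT_POINT = 0.31

def fatal_probability(age_fault, age_injured, time_band):
    """Probability p that a crash is fatal, from ln(p / (1 - p)) = g."""
    g = (INTERCEPT
         + B_AGE_FAULT * age_fault
         + B_AGE_INJURED * age_injured
         + B_CRASH_TIME[time_band])
    return 1.0 / (1.0 + math.exp(-g))

def classify(age_fault, age_injured, time_band):
    """Apply the cut point; p exactly at the cut point is treated as non-fatal (assumption)."""
    p = fatal_probability(age_fault, age_injured, time_band)
    return "fatal" if p > CUT_POINT else "non-fatal"

def accuracy_percent(correct, total):
    """Share of correctly classified validation cases, in percent."""
    return 100.0 * correct / total

# Two hypothetical cases, for illustration only.
p_night = fatal_probability(30, 40, "24-4")
p_morning = fatal_probability(20, 20, "4-8")
print(round(p_night, 3), classify(30, 40, "24-4"))
print(round(p_morning, 3), classify(20, 20, "4-8"))

# Validation result reported in the paper: 217 of 304 cases classified correctly.
validation_accuracy = accuracy_percent(217, 304)
print(round(validation_accuracy, 2))
```

Keeping the base category in the dictionary with a coefficient of 0 mirrors the last term of the model equation and avoids a special case in the code.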


Figure 1: Figure of the table from SPSS showing first run of the analysis


Figure 2: Figure of the table from SPSS showing second run of the analysis


Figure 3: Figure of the table from SPSS showing third run of the analysis

Figure 4: Figure of the table from SPSS showing fourth and final run of the analysis


5.2.1 Age Effect

Effect of the Age of the driver at fault

The exponential of the coefficient of the variable “Age of the driver at fault” is 1.025, which means that for a given “Age of the injured personnel” and a given “Time of crash”, the ratio of the odds of the crash being fatal for a unit increase in the “Age of the driver at fault” is 1.025. In other words, for every additional year in the age of the driver at fault, the odds of the crash being fatal increase by 2.5% for a given “Age of the injured personnel” and a given “Time of crash”.

Effect of the Age of the Injured Personnel

Similarly, the exponential of the coefficient of the variable “Age of the injured personnel” is 1.021, which means that for a given “Age of the driver at fault” and a given “Time of crash”, the ratio of the odds of the crash being fatal for a unit increase in the “Age of the injured personnel” is 1.021. In other words, for every additional year in the age of the injured personnel, the odds of the crash being fatal increase by 2.1% for a given “Age of the driver at fault” and a given “Time of crash”.

5.2.2 Effect of the time of Crash

Part I

Since this is a categorical variable with 6 categories, one category is used as the base category and the other 5 categories are stated relative to this base category, as defined in the “Methodology” section above.

In this case, the last category, i.e. “Night (24-4)”, is used as the base category. All other categories are represented relative to this category for a given “Age of the driver at fault” and “Age of the injured personnel”.

a) The exponential of the coefficient of the dummy variable CRA TIME(1), i.e. Morning Off Peak (4-8), is 0.241, which means that the odds of a crash being fatal in the time “Morning Off Peak (4-8)” are 75.9% less than those of a crash in the time “Night (24-4)”.

b) The exponential of the coefficient of the dummy variable CRA TIME(2), i.e. Morning Peak (8-12), is 0.305, which means that the odds of a crash being fatal in the time “Morning Peak (8-12)” are 69.5% less than those of a crash in the time “Night (24-4)”.

c) The exponential of the coefficient of the dummy variable CRA TIME(3), i.e. Day Off Peak (12-16), is 0.348, which means that the odds of a crash being fatal in the time “Day Off Peak (12-16)” are 65.3% less than those of a crash in the time “Night (24-4)”.

d) The exponential of the coefficient of the dummy variable CRA TIME(4), i.e. Day Peak (16-20), is 0.341, which means that the odds of a crash being fatal in the time “Day Peak (16-20)” are 65.9% less than those of a crash in the time “Night (24-4)”.

e) The exponential of the coefficient of the dummy variable CRA TIME(5), i.e. Evening Off Peak (20-24), is 0.261, which means that the odds of a crash being fatal in the time “Evening Off Peak (20-24)” are 73.9% less than those of a crash in the time “Night (24-4)”.

Part II

Associating “Time of the crash” with the “Traffic Volume”

From our results, the time of the crash is significantly associated with the outcome of the crash severity being fatal or non-fatal. The analysis is carried further to check whether there is any significant association between the time of the crash and the traffic volume data.

There are altogether 160 stations set up across the major corridor roads all over the country by the Department of Roads. Among them, 24 stations lie inside Kathmandu valley. Each station provides the traffic volume, along with the vehicle types, at one hour intervals over the whole day. The traffic volume data for the fiscal years 2011/12, 2012/13, 2014/15, 2015/16 and 2016/17 can be obtained from the online site of the Department of Roads (http://ssrn.aviyaan.com/traffic controller/get summary). Each station gives the traffic volume data for three consecutive days of a particular fiscal year. Hence the traffic volume is taken as the average of these three days. There is no traffic volume data for the fiscal year 2013/14 (2070/71), and hence the crash cases of the year 2070/71 cannot be analyzed. The traffic volume needs to be interpolated to get the traffic volume at a specific time of the day. Linear interpolation is used in this case.

The details of the 24 stations can be seen in table 3.

The 504 crash cases that were used to develop the
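The percentages quoted in Sections 5.2.1 and 5.2.2 follow from the reported coefficients as 100 (e^β - 1). The sketch below recomputes them; the dictionary keys and the function name are hypothetical labels. Because the published coefficients are rounded to three decimals, the results should agree with the quoted percentages only to within rounding.

```python
import math

# Coefficients of the final model as reported in the paper.
AGE_COEFFICIENTS = {
    "age_fault": 0.025,     # Age of the driver at fault (per year)
    "age_injured": 0.021,   # Age of the injured personnel (per year)
}
TIME_COEFFICIENTS = {       # relative to the base category Night (24-4)
    "4-8": -1.425,
    "8-12": -1.189,
    "12-16": -1.057,
    "16-20": -1.076,
    "20-24": -1.344,
}

def percent_change_in_odds(beta):
    """100 * (e^beta - 1): positive means higher odds, negative means lower odds."""
    return 100.0 * (math.exp(beta) - 1.0)

age_effects = {k: percent_change_in_odds(b) for k, b in AGE_COEFFICIENTS.items()}
time_effects = {k: percent_change_in_odds(b) for k, b in TIME_COEFFICIENTS.items()}

for name, change in {**age_effects, **time_effects}.items():
    print(f"{name}: odds ratio {math.exp(change / 100.0 + 0.0) if False else 1.0 + change / 100.0:.3f}, change {change:+.1f}%")
```

A negative change for a time band means that the odds of a fatal crash in that band are lower than at night by that percentage, other variables being equal.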


Table 3: 24 monitored stations inside Kathmandu Valley

S No  Station No  Location                          Link Name                                  Number of Crashes
1     58          Satdobato South (Chapagaun)       Satdobato-Sunakothi                        4
2     59          Satdobato Junction South          Satdobato-Karmanus Bridge                  1
3     60          Ring Road (Manohara Bridge)       Gwarko-Manohara River (Balkumari)          4
4     61          Ring Road (Balkhu East)           Balkhu - Ekantakuna (KTM Ringroad)         2
5     62          Kharipati                         Bhaktapur-Army camp
6     63          Hanumante Bridge                  Sallaghari-Hanumante Culvert               7
7     64          Manohara Bridge                   Koteshwar-Manohara bridge                  1
8     65          Ring Road (Sinamangal)            Tinkune - Sinamangal - Gaushala            8
9     66          Chabahil East                     Pipal Bot-Sankhu
10    67          Jorpati North                     Jorpati-Sundarijal
11    68          Ring Road (Narayan Gopal Chowk)   Sankhapark - Maharajganj (KTM Ringroad)
12    69          Gangalal Hospital North           Maharajgunj-Bansbari                       3
13    70          Balaju Bypass North               Balaju bypas-Nagarjun
14    71          Ring Road (Banasthali)            Balaju Junction - Banasthali - Swoyambhu   13
15    72          T.U. Gate                         Balkhu-Chovar                              7
16    73          Taudaha                           Chovar-Chhaimale
17    74          Nagdhunga                         Peepalmod-Nagdhunga
18    154         Narayan Gopal Chowk West          Maharajganj - Balaju Bypass Junction       8
19    155         Narayan Gopal Chowk South         Lainchaur-Maharajgunj                      5
20    156         Sitapaila South                   Kalimati-Bahiti                            1
21    157         Kalanki                           Kalanki to Balkhu                          3
22    158         Gwarko East                       Gwarko-Lubhu-Lankuri Bhanjyang             1
23    159         Byasi Chowk North                 Byasi (Bhaktapur)-Changunarayan
24    160         Satdobato North                   Satdobato-Gwarko (KTM Ringroad)            4
                                                                                       Total   72

Figure 5: Figure of the table from SPSS showing the final run of the analysis of part II of 5. Analysis section
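The traffic volume preparation described in Part II of Section 5.2.2 (averaging the three survey days, then interpolating linearly to the time of the crash) can be sketched as below. The paper does not state how the hourly counts are aligned to clock times, so the sketch assumes that entry h of a 24-value list is the volume at hour h and that the day wraps around at midnight; the function names and the sample counts are hypothetical.

```python
def average_hourly_volume(day_counts):
    """Average the hourly counts over the surveyed days (three days per station in the paper)."""
    n_days = len(day_counts)
    return [sum(values) / n_days for values in zip(*day_counts)]

def interpolate_volume(hourly, time_of_day):
    """Linear interpolation between the two neighbouring hourly values.

    hourly      -- 24 values; hourly[h] is taken as the volume at hour h (assumption)
    time_of_day -- crash time in decimal hours, e.g. 8.5 for 08:30
    """
    t = time_of_day % 24
    lower = int(t)
    upper = (lower + 1) % 24      # wrap from 23:00 to 00:00
    fraction = t - lower
    return hourly[lower] + fraction * (hourly[upper] - hourly[lower])

# Made-up counts for one station (NOT from the Department of Roads data).
days = [
    [100 * h for h in range(24)],
    [100 * h + 30 for h in range(24)],
    [100 * h - 30 for h in range(24)],
]
hourly = average_hourly_volume(days)
volume_at_crash = interpolate_volume(hourly, 8.5)
print(volume_at_crash)
```

Other alignments, such as treating each count as the total for the hour that follows, would shift the interpolation points by half an hour and would need to be checked against the station data format.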


model occurred all over the Kathmandu Valley. Traffic volume data are not available for every location in the Kathmandu Valley. Hence only those crashes that occurred along the corridors monitored by the 24 stations listed in Table 3 can be analyzed. As can be seen in Table 3, only 72 of the original 504 crash cases occurred along the monitored corridors.

The analysis of the 72 crash cases in SPSS takes Crash Severity (fatal or non-fatal) as the dependent variable. Only the significant independent variables from the initial model are carried into this analysis: “Age of the driver at fault”, “Age of the injured personnel” and “Time of crash”, together with the newly added “Traffic volume”. The traffic volume at the time of each crash is obtained by interpolating between the volumes recorded at two different times.

The analysis is done in SPSS in the same manner as before. Only one variable, “Traffic volume”, was found to be significant. The previously significant independent variables (“Age of the driver at fault”, “Age of the injured personnel” and “Time of crash”) lose their significance because the number of crash cases is reduced from 504 to 72. The final result can be seen in the SPSS table in Figure 5.

The exponent of the coefficient of the independent variable “Traffic volume” is 0.998 (0.999 in the table but 0.998 as calculated in Excel), which means that the odds of a crash being fatal decrease by 0.2% for every increase in traffic volume of 1 PCU/hour. If the traffic volume increases by 100 PCU/hour, the odds of a crash being fatal decrease by 18.27%. Expressed as an equation:

\[
\ln\left(\frac{p}{1-p}\right) = 0.970 - 0.002 \times \text{TrafficVolume}
\]

where p is the probability that a crash is fatal.

However, it should be kept in mind that this equation is applicable only to crashes occurring on the 24 station-monitored corridors. More specifically, the 72 crashes occurred in 16 of the 24 monitored corridors, as can be seen in Table 3. Hence this model applies effectively to these 16 corridors.

5.2.3 Interpretation on insignificant variables

The independent variables “Vehicle type”, “Crash cause”, “Gender of the driver at fault” and “Location of the crash”, which were eliminated in the first run of the analysis, were far from being significant.

The first reason why “Vehicle type” is not significant may be that traffic in Kathmandu Valley is homogeneous, with all kinds of vehicles occupying the same right of way. With this homogeneous nature of traffic, all kinds of vehicles are involved in both fatal and non-fatal crashes, without any significant relation between the type of vehicle at fault and the severity of the crash. The second reason may be under-reporting by the traffic authority: the actual size of the road crash problem may be greater than that shown by the official crash data recorded by the traffic authority.

“Crash cause” may not be significant because 319 of the total 476 crash cases, or 67% of the crashes, were attributed to negligence of the driver. This suggests a lack of interest on the part of the traffic authority in writing down all the details that led to the crash.

Only 2 of the 476 crash cases involved female drivers. This very small number is the reason why “Gender of the driver at fault” is not significant.

Similarly, for “Location of the crash”, it was difficult to pinpoint the exact location of the crash because only descriptive crash data were available. The data also did not include any sketch of the location.

6. Conclusion

The results of the model interpretation show that

• Increasing age of both the driver at fault and the injured personnel contributes to the increasing number of fatal crashes in Kathmandu Valley.
• The chances of fatal crashes in Kathmandu Valley are the highest in the Night time (24-4), followed by the “Day Off Peak (12-16)”, “Day Peak
(16-20)”, “Morning Peak (8-12)”, “Evening Off Peak (16-20)” and “Morning off peak (4-8)” respectively.
• For the crashes occurring along the 16 monitored corridors inside Kathmandu Valley listed in Table 3, an increase in traffic volume results in a decreasing number of fatal crashes. The increase in traffic volume reduces the speed of the vehicles, which ultimately reduces the number of fatal crashes.

Acknowledgments

The authors are grateful to Pulchowk Engineering College, Pulchowk, Lalitpur for providing the opportunity and guidance to proceed with the research. The authors would also like to thank the Metropolitan Traffic Police Division for providing the required data, and to express gratitude to the friends and family who supported them throughout the research period.

References

[1] S. AL-Ghamdi Ali. Using logistic regression to estimate the influence of accident factors on accident severity. Accident Analysis and Prevention 34, pages 729–741, 2002.
[2] Khalili Mohodeseh and Pakgohar Alireza. Logistic Regression Approach in Road Defects Impact on Accident Severity. Journal of Emerging Technologies in Web Intelligence, 05(02).
[3] Sampson Wiredu and Tawiah Richard. Exploring the predictors of Accident Severity in Urban Ghana. Developing Country Studies, ISSN 2224-607X (Paper), 5(14), 2015.
[4] Saffarzadeh Mahmoud, Dovom Zangooei Mehdi, and Nadim Navid. An analysis of Pedestrian Fatal Accident Severity Using a Binary Logistic Regression Model. ITE Journal.
[5] Bahrololoom Sareh, Moridpour Sara, and Tay Richard. A logistic Regression Model for Hit and Run Bicycle Crashes in Victoria, Australia. 2011.
[6] Karacasu Murat, Ergul Baris, and Yavuz Altin Arzu. Estimating the causes of traffic accidents using logistic regression and discriminant analysis. International Journal of Injury Control and Safety Promotion, 21(4):305–312, 2014.
[7] Agyemang Boakye, Abledu Dr.G.K., and Semevoh Reuben. Regression Analysis of Road Traffic Accidents and Population Growth in Ghana. International Journal of Business and Social Research (IJBSR), 3(10).
[8] Oirjaka Gentiana, Burazeri Genc, Hysa Bajram, and Roshi Enver. Factors associated with Fatal Traffic Accidents in Tirana, Albania: Cross Sectional Study. 2008.
[9] S Renuraj, N Varathan, and N Satkunananthan. Factors Influencing Traffic Accidents in Jaffna. Sri Lankan Journal of Applied Statistics, 16(2).
[10] H. David and Lemeshow Stanley. Applied Logistic Regression, Second Edition.
