Transportation Mode Choice Analysis Based on Classification Methods

Article in Scientific Journal of Riga Technical University Computer Sciences · January 2011
DOI: 10.2478/v10143-011-0041-2


Scientific Journal of Riga Technical University
Computer Science. Information Technology and Management Science
2011
________________________________________________________________________________________________ Volume 49

Transportation Mode Choice Analysis Based on Classification Methods

Nadezda Zenina, Arkady Borisov, Riga Technical University

Abstract – Mode choice analysis has received the most attention among discrete choice problems in the travel behavior literature. Most traditional mode choice models are based on the principle of random utility maximization derived from econometric theory. This paper investigates the performance of mode choice analysis with classification methods: decision trees, discriminant analysis and multinomial logit. Experimental results have demonstrated satisfactory quality of classification.

Keywords – Decision trees, discriminant analysis, multinomial logit, transportation mode

I. INTRODUCTION

Transportation mode is one of the major components (system user, mode, infrastructure, intermodal connections and stakeholders) of the transportation system. Mode choice analysis estimates the level of usage of different transportation modes (e.g., walking, public transport, bicycle, and vehicle), given the performance characteristics of each available mode and the characteristics of the individual user [12].

Travel mode choice has received the most attention among discrete choice problems in the travel behavior literature. Mode choice analysis and prediction are closely related to transportation system policies and congestion mitigation strategies. Most mode choice models are based on the random utility maximization principle derived from econometric theory. Since the multinomial logit (MNL) model was developed in the 1970s, parametric models with different structures have become the most commonly used tools for mode choice analysis. Several recent studies in the field of decision trees and neural networks [14], [15] have shown better performance indicators compared to discrete choice models.

To date, a variety of neural network models have been applied in traffic flow management for driver behavior modeling, vehicle detection on the road, and vehicle scheduling and routing. The study by Chi Xie, Jinyang Lu and Emily Parkany (2003) considered two data mining methods, a learning tree (C4.5 algorithm) and neural networks (backpropagation), to improve the performance of mode choice forecasts. The two data mining models were compared with the traditional multinomial logit (MNL) model. The comparative evaluation showed that the two data mining models have comparable but slightly better prediction capability than the MNL model for work travel mode choice modeling. Decision trees have problems with processing continuous data: the data first have to be grouped into ranges, either manually or automatically with a software tool [14].

The approach proposed in the study by Matthew G. Karlaftis (2001) focused on developing a recursive partitioning methodology for individual mode choice prediction. The methodology is based on a tree-structured nonparametric classification technique (Breiman, 1984); as a result, a set of decision rules represented in the form of a binary decision tree was produced. Unlike multinomial logit, this methodology allows any combination of categorical and discrete variables to be used [15].

The study by V. C. Tatineni and M. J. Demetsky investigated a supply chain modeling methodology for regional freight transportation planning. The mode choice model was developed using four different classification methods: binary logit model, linear discriminant analysis, quadratic discriminant analysis and tree classification. The quadratic discriminant and classification trees provided the most accurate modal split among the four empirical choice models (model accuracy 87% - 92% for the test set). Logit models provided the most interpretable results among the four empirical choice models [7].

The purpose of this study is to classify transportation mode choice using classification methods: discriminant analysis, decision trees and multinomial logit. The capabilities of each classification method for transportation mode choice analysis were evaluated.

II. INPUT DATA

The data were collected from July 19 to August 5, 2005. A total of 7171 personal interviews were conducted. All respondents were 16 years old or older, and data collection was organized at four places: hotel, office, shopping center and home.

The survey collected information on: (a) socioeconomic and demographic variables; (b) travel characteristics; (c) travel influence conditions. Some of these variables were qualitative and others were quantitative. The following variables were collected and used to determine the best fit model under study:
• Date. This variable has been used to determine the impact of the day of the week on the trip maker's mode choice.
• Place or Trip purpose. The distinction among trip purposes is an important step in mode choice analysis because different trip maker behaviors are expected in selecting a mode for different trip purposes [8]. In order to distinguish between trip purposes, this information should be available to the model builder. Four trip purposes (hotel, office, shopping center and home) are reviewed in this study.

• Direction. This variable has been used to determine the visitor's direction to/from data collection points (for example, a visitor is going to or from a shopping center).
• Part of City. The city was divided into four districts to get more accurate results and to determine the impact of the city part on the trip maker's mode choice.
• Transportation Mode. This variable is used to determine whether the trip maker owns a car or is captive to other modes such as public transport, taxi, bicycle and walking.
• Travel time. This is the time in minutes spent in the mode for a one-way trip, including the access time, egress time and waiting time.
• Age. This variable has been used to determine if age has an impact on intercity mode choice for the trip maker or his family.
• Temperature. This variable is used to determine the average temperature during data collection.
• Wind Speed. This variable is used to determine the wind speed during data collection.
• Conditions. This variable is used to determine the weather conditions (cloudy, rainy, sunny or thunder) during data collection.
The last three variables were taken from Internet resources.

III. DATA PREPROCESSING

Data preprocessing is an important step in the data mining process, and it has a huge impact on the success of a data mining project. The purpose of data preprocessing is to clean noisy data, extract and merge data from different sources, and then transform and convert the data into a proper format. Data preprocessing was divided into three parts: sampling, entropy and information gain calculation for attribute selection.

A. Sampling

The number of items was reduced from 7171 to approximately 500 to make the data more suitable for data mining. Sampling without replacement (once an item is selected, it is removed from the set) was used to distribute the items more uniformly across the values of the attribute “Mode”. Table I shows the distribution of items before and after preprocessing.

TABLE I
ITEMS DISTRIBUTION FOR THE ATTRIBUTE “TRANSPORTATION MODE”

                    Before preprocessing      After preprocessing
Attribute value     Items     % of items      Items     % of items
Bicycle               105         1%             83        17%
Public transport     1460        20%            110        22%
Taxi                   86         1%             86        17%
Vehicle              3394        47%            109        22%
Walking              2126        30%            110        22%
Total                7171       100%            498       100%

B. Entropy and Information Gain

As the next step, information gain was calculated for each attribute using the data after preprocessing (approximately 500 items). To calculate information gain, the entropy for each attribute was calculated. Entropy is a measure of variability in a random variable (1).

$$\mathrm{Gain}(A, Q) = H(A, S) - \sum_{i=1}^{q} \frac{|A_i|}{|A|}\, H(A_i, S), \tag{1}$$

where $A_i$ is the subset of $A$ for which attribute $Q$ has value $i$, and $H$ denotes entropy.

The information gain calculated for all attributes, sorted from largest to smallest, is given in Table II.

TABLE II
INFORMATION GAIN FOR ATTRIBUTE

Attribute       Information gain
Direction       0.81
Age             0.76
Travel time     0.70
Part of city    0.65
Conditions      0.64
Date            0.63
Wind speed      0.38
Place           0.36

According to the data obtained, the attributes “Direction” and “Age” have the maximum information gain, while the attributes “Date”, “Wind speed” and “Place” have the minimum information gain values. To understand how information gain influences the classification task, the attribute “Mode” was classified without the attributes with a small information gain.

C. Weka Filter Application (Supervised and Unsupervised)

To improve the classification of the attribute “Mode”, Weka 3-6-0 filters were used. All filters in Weka are divided into two categories: supervised and unsupervised. Each category includes attribute filters and instance filters. The supervised filters take class information into account and try to maintain the class distinctions in the grouped instances, while the unsupervised filters do not. All filters were used to classify the attribute “Mode”, and the three filters with the best classification results were chosen for further analysis:
• Resample: produces a random subsample of a dataset using sampling with replacement (items are not removed from the population as they are selected for the sample);
• Remove Misclassified: a filter that removes instances which are incorrectly classified; useful for removing outliers;
• Remove Folds: this filter takes a dataset and outputs a specified fold for cross validation.

Using the Resample filter, the number of instances remained unchanged at 498; however, some instances did not get into the set, whereas others were chosen repeatedly. In the new data set the number of correctly classified instances grew by about 10-15% in comparison with the other Weka filters (the classification results are shown in Section IV, “Learning Tree”).

Using the Remove Misclassified filter, the number of instances was reduced from 498 to 253 and the percentage of correctly classified instances was 71%. However, with this filter the distribution of the instances changed; it no longer corresponded to the initial distribution (see Table I, column “After preprocessing”).
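Returning to the attribute ranking of subsection B, the information gain of equation (1) can be sketched in a few lines. This is a minimal illustration, not the authors' Weka computation: the records below are hypothetical toy data rather than survey data, the function names are illustrative, and the base-2 logarithm (entropy in bits) is an assumption, since the paper does not state the base.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits, an assumed base) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, attribute, target):
    """Gain(A, Q) = H(A) - sum_i |A_i| / |A| * H(A_i), following equation (1)."""
    base = entropy([row[target] for row in rows])
    # Partition the class labels by the value that attribute Q takes.
    subsets = {}
    for row in rows:
        subsets.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return base - remainder


# Hypothetical toy records, not taken from the survey.
records = [
    {"direction": "to", "conditions": "sunny", "mode": "walking"},
    {"direction": "to", "conditions": "rainy", "mode": "walking"},
    {"direction": "from", "conditions": "sunny", "mode": "vehicle"},
    {"direction": "from", "conditions": "rainy", "mode": "vehicle"},
]

gains = {a: information_gain(records, a, "mode") for a in ("direction", "conditions")}
# Sort attributes from the largest to the smallest gain, as done for Table II.
ranking = sorted(gains, key=gains.get, reverse=True)
print(ranking, gains)
```

In this toy set “direction” separates the two modes perfectly, so it receives the full gain, while “conditions” carries no information about the mode; attributes at the bottom of such a ranking are the candidates for removal.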

Using the Remove Folds filter, the number of instances was reduced to 50. The initial distribution also changed and, as a result, only three instances corresponded to the values “Vehicle”, “Bicycle” and “Taxi” of the attribute “Mode”.

IV. LEARNING TREE

After the data preprocessing had been finished, classification was performed to evaluate the number of correctly and incorrectly classified instances. The classification was made with the Weka 6-0-1 software using two algorithms: C4.5 and CART (Classification and regression tree). CART is a nonparametric (binary) technique that can select important patterns from a large number of variables.

After the first part of the preprocessing (sampling and reduction of the total number of patterns to about 500 instances), both algorithms showed similar results: 65% of instances were classified correctly and 35% incorrectly. As a result, a confusion matrix was obtained; it shows the number of confused instances for the classified attribute “Mode”. The confusion matrix for the C4.5 algorithm with sampling is shown in Table III.

TABLE III
CONFUSION MATRIX FOR ALGORITHM C4.5

  a    b    c    d    e    Classified as
 77    8   10   11    3    a = Vehicle
  0   77   15    0   18    b = Walking
  0   12   86    0   12    c = Public transport
 22   17   11   31    2    d = Bicycle
 12    6   11    5   52    e = Taxi

It can be seen from Table III that, for example, for the value “Vehicle” 77 instances are classified correctly and the others are confused with other values. The main reasons for incorrect classification are poor-quality input data with noise, missing values, different types of attributes (numeric and categorical), the differing significance of the attributes, as well as overfitting and underfitting (the latter is a situation where a large number of errors is observed when checking the classifier on the training set; this means that the specific patterns in the data were not detected, and either they do not exist at all or it is necessary to choose another method of detection).

Entropy and information gain calculations were used to improve the classification results in the second part of the data preprocessing. After the information gain calculations, classification was performed without the attributes with the smallest information gain values (Table IV).

The results obtained show that the number of correctly classified instances slightly improved for the CART algorithm when seven attributes (excluding the attributes “Place” and “Wind speed”) were used to classify the class attribute “Mode”. On the other hand, the number of correctly classified instances for the C4.5 algorithm is slightly worse (up to 5%) in comparison with the CART algorithm and with classification with sampling.

TABLE IV
CONFUSION MATRIX FOR ALGORITHM C4.5

The number of classified instances
              8 attributes 1)          7 attributes 2)          6 attributes 3)
Algorithm     Correctly  Incorrectly   Correctly  Incorrectly   Correctly  Incorrectly
C4.5            65%        35%           63%        37%           63%        37%
SimpleCart      65%        35%           66%        34%           61%        39%

1) 8 attributes: classification results without the attribute “Place”;
2) 7 attributes: classification results without the attributes “Place” and “Wind speed”;
3) 6 attributes: classification results without the attributes “Place”, “Wind speed” and “Date”.

Using the information gain calculation, the classification of the attribute “Mode” improved only insignificantly. It was therefore decided to use the Weka filters to make the classification more accurate. The classification results are given in Table V.

TABLE V
CLASSIFICATION RESULTS. LEARNING TREE

Classified instances
                        Algorithm C4.5            Algorithm CART
Data preprocessing      Correctly  Incorrectly    Correctly  Incorrectly
Resample                  67%        33%            67%        33%
Remove Misclassified      71%        29%            69%        31%
Remove Folds              78%        22%            77%        23%

The use of Weka filters produced various classification results. The number of correctly classified instances varies within 67% - 78% depending on the chosen filter and classification algorithm, C4.5 or CART. In addition, to increase classification accuracy, the filters were used in combination with each other. As a result, the share of correctly classified instances was 80% for the C4.5 algorithm and 92% for the CART algorithm.

V. DISCRIMINANT ANALYSIS

Discriminant analysis is a statistical technique that assigns the values of a dependent variable to groups and calculates each respondent's probability of falling into one group or another. As a result of discriminant analysis, a discriminant function is obtained that is similar to a regression function. In discriminant analysis the initial group size and number of groups are given, and the main task is to determine how accurately it is possible to predict the membership of an object in the groups with the given set of discriminant variables [2]. The main problems of discriminant analysis are the selection of discriminant variables and the choice of the discriminant function. Multiple forward stepwise discriminant analysis was used to predict the values of five categories (Table VI).

In general, all nine variables (see Section II, “Input Data”) discriminated well between the groups and are suitable for classification purposes. The correlation coefficient between the discriminant function values and the indicator of group membership showed an average relationship for the first (correlation coefficient 0.697) and second (0.65) functions, and a weak relationship for the third (0.436) and fourth (0.303) functions. That means that the third and fourth functions do not clearly separate the groups, and this has resulted in a small number of correctly classified observations.
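The share of correctly classified instances that is reported for every method in this paper is the sum of the diagonal of a confusion matrix divided by the total number of instances. As a minimal sketch, the calculation is shown here with the counts of Table III; reading the rows as actual classes and the columns as predicted classes is an assumption, supported by the row sums matching the “After preprocessing” counts in Table I. The variable names are illustrative.

```python
# Confusion matrix from Table III (rows: actual class, columns: predicted class;
# this orientation is an assumption consistent with the row totals in Table I).
classes = ["Vehicle", "Walking", "Public transport", "Bicycle", "Taxi"]
matrix = [
    [77, 8, 10, 11, 3],
    [0, 77, 15, 0, 18],
    [0, 12, 86, 0, 12],
    [22, 17, 11, 31, 2],
    [12, 6, 11, 5, 52],
]

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(matrix)))
accuracy = correct / total

# Share of each actual class that was classified correctly (per-class recall).
recall = {name: matrix[i][i] / sum(matrix[i]) for i, name in enumerate(classes)}

print(f"correctly classified: {correct}/{total} = {accuracy:.1%}")
for name, value in recall.items():
    print(f"{name}: {value:.1%}")
```

The overall share, 323 of 498 instances, rounds to the 65% quoted in Section IV, and the per-class shares show that “Bicycle” is the category recognized least reliably.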

Forward stepwise analysis was carried out to improve the discriminant analysis results (Table VI).

TABLE VI
DISCRIMINANT ANALYSIS. CLASSIFICATION RESULTS

Discriminant analysis        Number of    Number of categories,                Correctly classified
                             variables    transportation mode                  instances, %
Stepwise analysis                9        5 (all included)                     63.9%
                                 9        4 (excluding “bicycle”)              70.6%
                                 9        3 (excluding “bicycle”, “taxi”)      79.6%
Forward stepwise analysis        6        5 (all included)                     64.1%
                                 6        4 (excluding “bicycle”)              70.8%
                                 6        3 (excluding “bicycle”, “taxi”)      79.6%

Comparative results of the stepwise and forward stepwise methods showed only a small difference in correctly classified instances, 0.2%. In the case of forward stepwise discriminant analysis, 64.1% of instances were correctly identified, against 63.9% of instances in the classification taking into account all independent variables simultaneously. The low accuracy of classification was associated with the “Taxi” and “Bicycle” categories of the dependent variable “Transportation mode”. Perhaps the learning sets are located close to each other, resulting in an increased probability of erroneous classification of these categories. The number of categories was reviewed to improve the accuracy of the classification model.

Reduction of categories from five to four (excluding the category “Taxi”) and three (excluding the categories “Taxi” and “Bicycle”) increased the classification accuracy by 10% and 25% (to 70.6% and 79.6%) for stepwise discriminant analysis and by 10% and 24% (to 70.8% and 79.6%) for forward stepwise discriminant analysis.

VI. MULTINOMIAL LOGIT

The multinomial logit model describes the choice between two or more alternatives as a relationship between several independent variables (also called predictors) and the dependent variable. The multinomial logit model (MNL) gives the choice probabilities of each alternative as a function of the systematic portion of the utility of all the alternatives.

The maximum likelihood method consists of finding the model parameters that maximize the likelihood, that is, the probability of the observed choices given the model parameters. In this study the maximum likelihood method was used to calculate the coefficients of the logistic regression [18].

$$L(Y_1, Y_2, \ldots, Y_k; \theta) = p(Y_1; \theta) \cdot \ldots \cdot p(Y_k; \theta). \tag{2}$$

The multinomial logit classification results showed that 88.6% of instances were correctly classified. The high statistical significance of the model based on the method of maximum likelihood (Sig. < 0.001) testifies to its quality and suitability for the task. The model explains 86.7% of the total variance (Nagelkerke pseudo R2 = 0.867). Assessment of the statistical significance of each of the independent variables indicated that three variables have little effect (“Place”, Sig = 0.478; “Age”, Sig = 0.728; and “Travel time”, Sig = 0.263) on the model. As a result, these variables can be excluded from the model.

TABLE VII
CLASSIFICATION RESULTS. MULTINOMIAL LOGIT

                      Predicted categories
Observed              Vehicle   Public transport   Walking   Percent correct
Vehicle                 82            6               5         88.2%
Public transport         1           78               6         91.8%
Walking                  2            9              66         85.7%
Overall percentage     33.3%        36.5%           30.2%       88.6%

VII. CONCLUSIONS

In this study the applications of learning trees, discriminant analysis and multinomial logit with different specifications in the context of mode choice analysis are presented. Socioeconomic and demographic data, travel characteristics and travel influence conditions were collected to estimate the influence of these factors on an individual's choice of travel mode. Five available modes (vehicle, public transport, walking, bicycle and taxi) were taken into account for mode choice analysis according to the survey data. All input data were cleaned of noise and transformed into a proper format for analysis.

Two algorithms, C4.5 (Quinlan) and CART (Classification and regression tree), were chosen for building decision trees. The C4.5 algorithm is one of the best-known algorithms, with a good combination of error rate and speed (Tjen-Sien Lim et al. 2000). The CART algorithm is nonparametric and can easily handle outliers in travel characteristics data that are based on traveler perception. Both classification algorithms showed roughly the same results: 67% - 78% of correctly classified instances. After additional data preprocessing (first, a random dataset was produced using sampling with replacement, and then incorrectly classified instances were removed from the dataset), the percentage of correctly classified instances was 80% for the C4.5 algorithm and 92% for the CART algorithm.

Discriminant analysis was chosen for transportation mode choice analysis because it deals with individual rather than aggregate data, and it may make fuller use of the data and more accurately reproduce the structure inherent in the data (John A. Fiedler, 1996). Two methods of discriminant analysis were used: stepwise and forward stepwise. The stepwise method includes all variables simultaneously in a model, while in the forward stepwise method a model of discrimination is built step by step. Comparative results of the stepwise and forward stepwise methods showed that the percentage of correctly identified instances for the dependent variable “transportation mode” was 63.9% - 64.1% for five categories, 70.6% - 70.8% for four categories and 79.6% for three categories.

The calculations for the multinomial logit model in Section VI illustrated the manner in which different utility specifications and the estimated parameters associated with them are used to predict choice probabilities based on the characteristics of the traveler and the attributes of the alternatives. Overall experimental results showed that 88.6% of instances were classified correctly. The category “Public transport” of the attribute “transportation mode” showed the best result, with 91.8% correctly classified instances.
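The step from utilities to choice probabilities in the multinomial logit model of Section VI, and the likelihood of equation (2), can be illustrated with a short sketch. It uses the standard multinomial logit form, in which the probability of an alternative is its exponentiated systematic utility divided by the sum over all alternatives. The utility values and function names below are hypothetical and are not the coefficients estimated in this study.

```python
import math


def choice_probabilities(utilities):
    """Multinomial logit: P(i) = exp(V_i) / sum_j exp(V_j)."""
    # Subtracting the largest utility improves numerical stability
    # and leaves the probabilities unchanged.
    v_max = max(utilities.values())
    exp_v = {mode: math.exp(v - v_max) for mode, v in utilities.items()}
    denom = sum(exp_v.values())
    return {mode: e / denom for mode, e in exp_v.items()}


def log_likelihood(observations):
    """Logarithm of equation (2): sum of log-probabilities of the chosen modes."""
    return sum(math.log(choice_probabilities(u)[chosen]) for u, chosen in observations)


# Hypothetical systematic utilities for one traveler (not estimated from the survey).
utilities = {"vehicle": 1.0, "public transport": 0.5, "walking": -0.5}
probs = choice_probabilities(utilities)
predicted = max(probs, key=probs.get)

# Two hypothetical observations: (utilities, chosen mode).
sample = [(utilities, "vehicle"), (utilities, "walking")]
ll = log_likelihood(sample)
print(probs, predicted, ll)
```

Maximum likelihood estimation searches for the utility coefficients that make `ll` as large as possible; the predicted category for a respondent is the alternative with the highest probability, which is how a classification table such as Table VII is filled.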

REFERENCES

[1] Плеханов А.В. Математико-статистические методы обработки информации с применением программы SPSS. Издательство Санкт-Петербургского государственного университета Экономики и финансов. 2010.
[2] Таганов Д.Н. SPSS: Статистический анализ в маркетинговых исследованиях. СПб: Питер, 2005.
[3] Иванов Е.Е., Шустров Д.А., Перешивкин С.А. Многомерные статистические методы. Кафедра экономической кибернетики и экономико-математических методов. 2010.
[4] Орлова И.В., Половников В.А., Габескирия В.Я., Гармаш А.Н., Гусарова О.М., Михайлов В.Н., Пилипенко А.И. Эконометрика. Москва. 2005.
[5] Мухамедиев Б.М. Эконометрика и эконометрическое прогнозирование. Алматы. 2007.
[6] Akkarapol Tangphaisankun. A study in integrating paratransit as a feeder into mass transit systems in developing countries: a study in Bangkok. September, 2010.
[7] Tatineni V. C., Demetsky M. J. Supply Chain Models for Freight Transportation Planning. Research Report No. UVACTS-14-0-85. 2005.
[8] Koppelman F.S., Bhat C. A Self Instructing Course in Mode Choice Modeling: Multinomial and Nested Logit Models. U.S. Department of Transportation Federal Transit Administration. 2006.
[9] Давнис В.В., Тинякова В.И. Прогноз и адекватный образ будущего. Вестник ВГУ, серия: экономика и управление, 2005, Н2. УДК 681.3.07.
[10] WEKA Manual for Version 3-6-0. University of Waikato, Hamilton, New Zealand. December 18, 2008.
[11] Pang-Ning Tan, Steinbach M., Kumar V. Introduction to Data Mining. Addison-Wesley. 2006.
[12] Quinlan J.R. Improved Use of Continuous Attributes in C4.5. Journal of Artificial Intelligence Research 4. 1996. p. 77-90.
[13] Meyer M.D., Miller E.J. Urban Transportation Planning. A Decision-Oriented Approach. Second Edition. McGraw-Hill, 2001.
[14] Chi Xie, Jinyang Lu, Parkany E. Work Travel Mode Choice Modeling Using Data Mining: Decision Trees and Neural Networks. Transportation Research Record, Paper No. 03-4348, 2003.
[15] Karlaftis M. G. Predicting Mode Choice through Multivariate Recursive Partitioning. European Transport Conference, 2001.
[16] Murat Y.S., Uludag N. Route choice modeling in urban transportation networks using fuzzy logic and logistic regression methods. Journal of Scientific and Industrial Research. Vol. 67, January 2008, pp. 19-27.
[17] Ben-Akiva M., Morikawa T., Shiroishi F. Analysis of the Reliability of Stated Preference Data in Estimating Mode Choice Models. Selected Proc., 5th WCTR, Vol. 4, Yokohama, Japan, pp. 263-277 (1989).
[18] “Multinomial logit models” Sep. 25, 2011. [Online]. Available: http://ru.wikipedia.org/wiki/Logit. [Accessed: Sep. 25, 2011].

Nadezda Zenina is a Doctoral student at the Institute of Information Technology, Riga Technical University (Latvia). She received her M.Sc. degree from Riga Technical University in 2006.
Her research areas include artificial neural systems and data mining methods: learning trees, multinomial logit and discriminant analysis.

Arkady Borisov is a Professor of Computer Science at the Institute of Information Technology, Riga Technical University (Latvia). He received his Doctor of Technical Sciences degree from Taganrog State Radio-Engineering University in 1986 and his Dr.habil.sci.comp. degree from the Latvian Council of Science in 1992.
His research areas include artificial intelligence, decision support systems, fuzzy set theory and its applications, and artificial neural systems.

Nadežda Zeņina, Arkadijs Borisovs. Transportation mode analysis based on classification methods
Transportation mode choice is widely covered in the literature on discrete choice problems. The choice and forecasting of transportation modes are closely related to transportation system policy, travel demand management and road congestion reduction strategies. The paper considers decision trees (algorithms C4.5 and CART), discriminant analysis and multinomial logit regression for the analysis of transportation mode choice (car, walking, public transport, taxi and bicycle). The decision tree classification results showed that 67% - 78% of instances were classified correctly. With additional preprocessing of the input data, combining several filters, the share of correctly classified instances was raised to 80% for the C4.5 algorithm and to 92% for the CART algorithm. Direct and stepwise discriminant analysis showed an insignificant difference in the number of correctly classified instances. In the case of stepwise discriminant analysis, membership was identified correctly for 64.1% of observations, against 63.9% for classification taking all variables into account simultaneously. The rather low accuracy of the classification results was associated with the categories “taxi” and “bicycle” of the dependent variable “transportation mode”. Reducing the number of categories to three (without the categories “taxi” and “bicycle”) increased the classification accuracy to 79.6% for both stepwise and direct discriminant analysis. The multinomial logistic regression classification results showed that 88.6% of respondents were classified correctly. The high statistical significance of the constructed model indicates its high quality and suitability for the task.

Nadezhda Zenina, Arkady Borisov. Analysis of transportation modes using classification methods
Among discrete choice problems, mode choice is the most widely covered in the literature. The choice and forecasting of the travel mode are closely related to transportation system policy, travel demand management and strategies for reducing road congestion. This paper considers decision trees (algorithms C4.5 and CART), discriminant analysis and multinomial logit regression for analyzing the choice of travel mode (by car, on foot, by public transport, by taxi or by bicycle). The decision tree classification results showed 67% - 78% correctly classified instances on the test set. After additional preprocessing of the input data, combining several filters, the percentage of correctly classified instances was raised to 80% for the C4.5 algorithm and to 92% for the CART algorithm. Comparative results of the direct method and stepwise discriminant analysis showed an insignificant difference in correctly classified observations. In the case of stepwise discriminant analysis, membership was determined correctly for 64.1% of observations, against 63.9% for classification taking all independent variables into account simultaneously. The low classification accuracy is associated with the categories “taxi” and “bicycle” of the dependent variable “travel mode”. Reducing the number of categories from five to three (without the categories “taxi” and “bicycle”) increased the classification accuracy to 79.6% for stepwise discriminant analysis and to 79.8% for classification taking all independent variables into account simultaneously. The multinomial logistic regression results showed that 88.6% of respondents were classified correctly. The high statistical significance of the constructed model, based on the maximum likelihood method (Sig. < 0.001), indicates its high quality and suitability for the task.
