Decision Support Systems: Philippe Baecke, Lorenzo Bocca


Decision Support Systems 98 (2017) 69–79

Contents lists available at ScienceDirect

Decision Support Systems


journal homepage: www.elsevier.com/locate/dss

The value of vehicle telematics data in insurance risk selection processes


Philippe Baecke* , Lorenzo Bocca
Vlerick Business School, Reep 1, Ghent 9000, Belgium

ARTICLE INFO

Article history:
Received 3 October 2016
Received in revised form 17 March 2017
Accepted 25 April 2017
Available online 28 April 2017

Keywords:
Internet of things
Usage-based-insurance
Risk assessment model
Logistic regression
Random forests
Artificial neural networks

ABSTRACT

The advent of the Internet of Things enables companies to collect an increasing amount of sensor generated data which creates plenty of new business opportunities. This study investigates how this sensor data can improve the risk selection process in an insurance company. More specifically, several risk assessment models based on three different data mining techniques are augmented with driving behaviour data collected from In-Vehicle Data Recorders. This study proves that including standard telematics variables significantly improves the risk assessment of customers. As a result, insurers will be better able to tailor their products to the customers’ risk profile. Moreover, this research illustrates the importance of including industry knowledge, combined with data expertise, in the variable creation process. Especially when a regulator forces the use of easily interpretable data mining techniques, expert-based telematics variables are able to improve the risk assessment model in addition to the standard telematics variables. Further, the results suggest that if a manager wants to implement Usage-Based-Insurances, Pay-As-You-Drive related variables are most valuable to tailor the premium to the risk. Finally, the study illustrates that this new type of telematics-based insurance product can quickly be implemented since three months of data is already sufficient to obtain the best risk estimations.
© 2017 Elsevier B.V. All rights reserved.

* Corresponding author.
E-mail address: [email protected] (P. Baecke).

http://dx.doi.org/10.1016/j.dss.2017.04.009
0167-9236/© 2017 Elsevier B.V. All rights reserved.

1. Introduction

The adoption of sensors is having a worldwide and growing impact on several industries [46]. Resulting from the rise of the Internet of Things (IoT), a massive flow of data will be generated [5,11]. This new source of data combined with data mining techniques can generate new opportunities to automate or support business decision making [19]. This paper investigates the value of IoT data with particular attention to telematics data in the motor insurance sector.

Over the past years, the premium calculation methodology for motor insurance companies was mainly based on general factors. Vehicle specific characteristics and drivers’ socio-demographical data were the only input for the calculations [39]. This a priori approach can be improved by taking the claim history of the policyholder into account. Through the use of a merit–demerit past claims-based model the insurers could define a more consistent level of riskiness [4,9,39]. Despite the wide adoption of such models, they still have limitations in estimating the real risk level of the policyholder. More specifically, the exposure to the risk (i.e. how much, where and when the policyholder drives his vehicle) is not yet considered.

Nowadays, under the pressure of increasing competition, trying to achieve a cost reduction for both the insurer and the policyholder, some insurance companies have developed Usage-Based-Insurance (UBI) models [42]. Through the use of In-Vehicle Data Recorders (IVDRs), the insurer is able to collect driving behaviour data of each customer. These records include the kilometres driven, subdivided based on location and time of the day [22,43]. Insurance companies can gain a strong competitive advantage by correctly using and analysing these data. Extra services such as automatic emergency calls, stolen vehicle monitoring and diagnostic services, including economically more convenient and safer driving suggestions, can simultaneously be implemented with the same device. Further, it can facilitate fraud detection in the claims handling process. Last but not least, it also gives the opportunity to tailor insurance premiums depending on a dynamic risk profile for each policyholder [15,18,22,43].

This study will investigate the added value of telematics driving behaviour data to improve motor insurance risk profiles using a unique dataset of a European insurance company. First, we will research the effect of augmenting a traditional accident risk assessment model with telematics data on the quality of the risk selection process. Three classification techniques will be used for this. From a business perspective, a traditional logistic regression model will be researched. This “white box” model is still easy to explain to the regulator. In addition, from an academic perspective, two more “black box” machine learning techniques will be investigated: random forests and artificial
neural networks models. Secondly, we will research the effect of expert domain knowledge in this context. Multiple studies in biology and medicine have proven the added value of domain knowledge to improve knowledge discovery in data (KDD) [1,2,21]. Also in more business related fields, a few studies demonstrated how domain knowledge can be used in the data mining phase of the KDD process. More specifically, in a context of churn and credit risk analysis, the interpretability of logistic regression algorithms and decision trees can be improved through the evaluation of coefficient signs and decision tables by domain experts [29,31]. This study, however, will focus on the added value of domain knowledge in the data preprocessing and transformation phase of a KDD process. This phase is selected by Kopanas et al. [26] as one of the most crucial phases in which domain knowledge should be integrated. Both D’Haen et al. [16] and Moro et al. [35] proved, in a marketing context, that integrating domain expertise in the feature creation process can significantly increase the predictive value of the original dataset. Based on these insights, this study will examine in a risk assessment context the impact of augmenting data with expert-based telematics variables. Finally, we will provide deeper insights in the relationship between several types of telematics variables and accident risk prediction.

The remainder of the paper is organized as follows. Section 2 presents an overview of the motor insurance risk selection literature and the use of telematics in this domain. The methodology is described in Section 3, including a description of the data and data mining techniques used in this study. In Section 4, the results are summarised. Section 5 will discuss these results and concludes this paper with suggestions for future research.

2. Auto insurance risk selection

2.1. Related work

Driven by the competition in the market, insurers increasingly try to differentiate their products by tailoring the price of their products depending on the expected risk [15,22,43]. Nevertheless, the main problem in developing a winning strategy through personalisation lies in the lack of valid explanatory variables. In fact, most of the traditional premium calculation models are only based on vehicle- and driver-specific variables. In order to better align the insurance product with the real risk exposure, some companies introduced a Usage-Based-Insurance product [17]. Influenced by the rise of the Internet of Things, this Pay-Per-Mile insurance business model evolves from a system where the driver is responsible for reporting his own mileage to a system that heavily relies on IVDRs to collect the data [22,27]. This technological evolution enables insurance companies to tailor the premium more to the customers’ behaviour.

Depending on the variety of information available on the driver and the usage level of telematics, Usage-Based-Insurances can have several variants. The two main categories we can distinguish are Pay-As-You-Drive (PAYD) and Pay-How-You-Drive (PHYD) insurances. Although the base idea of PAYD and PHYD insurances is similar, they differ in the variety of behaviour-based data collected and, consequently, in their use of telematics. In fact, while for PAYD insurance the premium is mainly based on general distance data, PHYD insurances make use of additional variables such as time of the day and location [22,43].

Several studies have analysed the opportunities and risks brought by Usage-Based-Insurances. Desyllas and Sako [15] defined ways to profit from this business model innovation, Troncoso et al. [42] proposed a PAYD methodology without privacy leaks, Litman [30] evaluated the pricing strategy with the insurance regulatory objectives, while Rejikumar [37] analysed the possible customers’ barriers in adopting Usage-Based-Insurances. However, none of these studies discuss in detail the potential of sensors resulting from the technological trend toward the Internet of Things.

For a couple of years, professional IVDRs that include a GPS sensor and wireless transmission capabilities have been introduced in the insurance industry. These units enable insurers to capture the driving behaviour of the policyholder. When a policyholder is driving, the position is tracked every couple of seconds and aggregated on the device level. Next, this data is transmitted to the insurer [36]. These devices need to be installed by a professional under the car’s dashboard. Several additional features can simultaneously be implemented with the device, such as automatic emergency calls, stolen vehicle monitoring or diagnostic services [43]. This study also makes use of this type of “professional” device. Despite the advantages, these devices are still quite expensive, limiting the ability to deploy them for all policyholders. However, the relevance of this research will increase even more in the future due to the democratisation of such devices. Recently, a new type of unit has entered the market, called “Smart” devices, that can be installed autonomously by the driver by connecting them to the on-board diagnostics (OBD) port of the car. While these devices have more or less the same technical features as the professional ones, such as GPS, connectivity and Bluetooth, they have the important advantage of a lower price. Finally, smartphones and applications could also be used to collect location data from vehicles [22]. Although this solution has a commercial price advantage, the technical performance is significantly lower and features such as stolen vehicle recovery and real time assistance cannot be deployed. In addition to the democratisation of these devices, the European Parliament has introduced a new regulation which requires all new cars produced in the European Union to be equipped with a telematics-based device that automatically calls the emergency services in case of an accident, providing the exact location of the car [40]. As a result, an increasing number of cars will be automatically equipped with telematics in the future.

Remarkably, the number of studies with regard to telematics in Usage-Based-Insurances is still limited. Table 1 presents an overview of the most relevant academic literature in which driving behaviour collected from IVDRs is used to improve the insurance business and more specifically explain accident risk. Vaia et al. [43] examined the first example of a fully Telematics-Based-Insurance. This insurance was developed by Unipol, an Italian insurance company, in collaboration with Octo Telematics, the main data integrator player on the market. This study discussed the benefits of a telematics-based programme for each stakeholder involved. They proposed different telematics-based products tailored to three customer segments. For the first year the premium was determined based on traditional parameters while the insurance company gathered all the data recorded by the IVDR such as total mileage driven, daytime and location distance driven per trip. Vaia et al. [43] mentioned that after this year of data collection, these data can be used to tailor their premium on the basis of personal driving styles, however without exploring this in more detail. Besides the financial benefits for the company and customer, they also mentioned, as a few other studies [22,30,37], the social and environmental benefits of telematics since the data could also be used to improve traffic flow and reduce fuel consumption.

Azzopardi and Cortis [7] provided a SWOT analysis to compare Telematics-Based-Insurances with the classical premium rating system in a context of fleet insurance covers. This was a qualitative study based on interviews of different stakeholders, such as fleet owners, insurance companies, insurance brokers and data integration companies. They noticed an interest to adopt telematics-based insurances due to several advantages. Insurers could make use of a more adequate pricing strategy, while fleet owners could have a better control of their fleet.

Husnjak et al. [22] provided an overview of telematics usage in the insurance sector, mainly focusing on the technical solutions and underlying data model behind the billing process. More specifically, they discussed the data collection process, the data extrapolation necessary to create relevant metrics from the raw data and the
integration with the customer relationship management system. Furthermore, the social, economical, environmental, insurers’ and users’ benefits of Telematics-Based-Insurances were demonstrated through the analysis of a small sample implementation of the system in Eastern Europe.

Table 1
Literature overview.

Reference | Research scope | Sample | Observation period | Predictors | Analytical technique
Vaia et al. [43] and Azzopardi and Cortis [7] | New business opportunities | - | - | - | -
Husnjak et al. [22] | System architecture and data model | - | - | - | -
Wåhlberg [45] | Relationship between acceleration and past crash involvement | 125 bus drivers | 30 min. | Acceleration | Correlation analysis
Toledo et al. [41] | Effect of driver feedback and influence of driving behaviour based risk-index on past crash involvement | 191 vehicles | Between 6 and 35 months | Acceleration, speed and position | Poisson regression
Jun et al. [23] | Speed differences between crash-involved and crash-non-involved drivers | 167 vehicles | 6 months | Speed | Non-parametric bootstrap resampling and Wilks Lambda test
Ayuso et al. [6] | The influence of customer profile and driving behaviour variables on the time and distance to the first crash | 15,940 vehicles (age < 30) | 3 years (interval censored) | 3 customer specific variables, 1 car specific variable, 4 telematics based variables | Survival analysis
Paefgen et al. [36] | The influence of driving behaviour variables on accident involvement | 1600 vehicles | 6 months | 39 telematics based variables | Logistic regression
This study | The added value of driving behaviour variables in addition to traditional accident risk predictors | 6984 vehicles (age < 30) | 4 years | 18 customer specific variables, 24 car specific variables, 7 claim history related variables, 240 telematics based variables, 48 expert-based telematics variables | Logistic regression, Random Forests, Artificial Neural Networks

Previously mentioned studies only describe the potential of telematics for car insurances, without any analytical results. Only a few studies have researched the effect of driving behaviour on accident risk. However, in these studies, the number of observations and the observation period were typically quite small, limiting the authors to univariate relationships. For one specific bus route, Wåhlberg [45] investigated the correlation between acceleration data, measured on a 30 minute bus trip using a g-analyst, and accident frequencies. Unfortunately the sample of 125 observations was probably too small to provide conclusive results. Toledo et al. [41] mainly focused on the effect of driver feedback. However, in a small experiment they also illustrated the correlation between a risk index, calculated based on IVDR data, and accident involvement using a Poisson regression model. Jun et al. [23] also made use of IVDR data to investigate if differences could be observed in the driving speed patterns between crash-involved and crash-non-involved drivers. For specific roadway types at specific times of the day, the study identified that crash-involved drivers drive at significantly higher speed. However, in order to generalize these findings over all roadway types and times of the day, more observations than the 162 vehicles were required. In the three studies of Wåhlberg [45], Toledo et al. [41] and Jun et al. [23], there is no strict separation of the time window in which the dependent variable (i.e. accident involvement) and the independent telematics-based variables are created. Hence, crashes could easily have taken place before or during the time window in which the driving behaviour was collected. Previous research has reported that the driving behaviour could strongly differ before and after a car accident [32]. This limits the predictive value of these studies.

While previously described studies research univariate relationships between a driving behaviour variable and accident risk, more recently, two studies have investigated the effect of multiple telematics-based variables simultaneously. Ayuso et al. [6] compared novice and experienced young drivers for a few customer profile and driving behaviour variables. Further, the authors researched the influence of these variables on the time and distance to the first crash in a diagnostic way. A limitation of the study is however that the information about the number of days and kilometres to the first accident is heavily interval censored on a coarse level of granularity. The average time and distance interval windows were respectively 157 days and 5023.19 km. Further, the number of variables used in the study was still quite limited to assess the predictive value of such a model. More specifically, no data about past claims were included, which is proven to be an important predictor of future claims [4,9,39]. A second study of Paefgen et al. [36] applies a multivariate logistic regression to explain the risk of accident involvement by several driving behaviour variables. This research is based on 1600 observations for which 6 months of telematics data is collected. The authors recognize that a larger sample and a longer observation period could deliver additional insights. Compared to Ayuso et al. [6], the study of Paefgen et al. [36] takes a higher variety of road type and time of the day variables into account, but no customer specific, car specific or past claims data was available. This is also recognized as a limitation by the authors. As a result, the data do not make it possible to investigate the added value of telematics data on top of the data traditionally used to estimate accident risk.

This study will build on previous work by investigating from a predictive perspective how telematics records can improve a risk assessment model that is traditionally based on customer specific, car specific and past claims data. Further, a larger sample size and a longer observation period than in the study of Paefgen et al. [36] is used. This allows us to include a variation of time windows on which
these telematics variables are constructed. In addition, also human judgement-based telematics variables will be investigated and finally, this study will explore which types of telematics data are most valuable. These topics are currently under-focused in previous literature but could significantly improve the risk selection process of an insurance company.

2.2. Risk selection process

Fig. 1 maps the risk selection process that could be used when a customer comes to an insurance company for a new motor insurance product or when an existing product is reviewed for renewal. This risk selection process can be split into several steps: risk assessment, risk evaluation and premium evaluation. In the risk assessment process a predictive model is used to estimate the accident risk for the car insurance product. The data used for this estimation will differ depending on the lifetime of the product. For new customers, only customer specific data (e.g. age, experience, zone of residence), car specific data (e.g. car segment, power, weight) and, if available, past claims data from the previous insurer are taken into account. If the policy is reviewed for existing customers, the estimation can be augmented, if available, with data from telematics devices previously installed in the car. These driving behaviour data include kilometres driven by the customer for each trip, subdivided based on time and location. If the customer is not part of the telematics programme yet, he or she could receive a discount to install the IVDR. In a second step, the decision support system will evaluate the customer’s application making use of business rules that combine the risk estimation with information about the customer’s product portfolio. Even if the risk estimation is above a crucial threshold, the application could still be accepted due to a high customer lifetime value from other insurance products. If accepted, the premium is determined taking again the customer’s portfolio, product specifications and the risk estimation into account. This study focuses on improving the first part of this decision support system. More specifically it will research the value of telematics in the risk assessment process. This will be investigated by first adding standard telematics variables to the predictive model. Next, similarly to D’Haen et al. [16], the added value of involving industry experts in the development of the predictive model is investigated by augmenting the model with expert-based telematics variables. These are additional features that are not automatically extrapolated from the raw data. Instead, these features are created as a smart combination of metrics from which experts expect a significant impact on accident risk (e.g. night trips during the weekend).

3. Methodology

3.1. Data description

Data is collected from a European car insurance company. These data include information about 6984 customers involved in the telematics programme launched by the company from 2011 until 2015. Similar to Ayuso et al. [6], this programme was targeted to drivers under the age of 30 who could receive a discount on the final premium if they agreed to install the IVDR into their car. Until 2015 no premiums were tailored based on telematics data.

Fig. 2 represents the timeline used to create a predictive model. Table 2 gives an overview of all the variables collected by the company and included in this model. This table also reports if the variable group is continuous or categorical and the number of underlying variables, in case of a continuous variable group, or the number of category values per variable group. The purpose of the model deployed is to predict if a customer will make at least one claim during the dependent period 2015. Based on this period a binary variable, claim, is created. This variable takes the value of one in 23.6% of the cases. This percentage is in line with the study of Ayuso et al. [6] in which the yearly accident probability ranged between 15.4% and 31.8% depending on the customer segment.

The calculations of the independent variables are based on 2014 and previous years of data. The explanatory variables traditionally employed to estimate accident risk can be divided into three main groups: customer specific, car specific and past claims variables. These variables are similar to the ones used in other risk assessment studies [4,27,39]. Concerning the customer specific variables, each policy can have up to three drivers assigned. Therefore, average, minimum and maximum age and driving experience are calculated per policy. In addition, the company includes an interaction term between age and driving experience. The car specific variables include some technical characteristics of the car such as type, vehicle age, power, and weight, and more commercial ones such as car brand and segment. The claim history is measured by the bonus malus and variables such as the number of years without any claim: only in

Fig. 1. Car insurance risk selection process.



Fig. 2. Model timeline.
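The timeline in Fig. 2 keeps the period used for the predictors (2014 and earlier) strictly apart from the dependent period (2015). A minimal Python sketch of that separation for the claim data is given here. The record layout (pairs of customer id and claim date) and the function name `build_claim_dataset` are illustrative assumptions, not the insurer's actual data model.

```python
from datetime import date

DEPENDENT_YEAR = 2015  # dependent period stated in the paper


def build_claim_dataset(claims, customers):
    """Return {customer_id: (past_claims, claim)}.

    claims    -- iterable of (customer_id, claim_date) pairs (hypothetical layout)
    customers -- iterable of customer ids in the telematics programme
    """
    past = {c: 0 for c in customers}    # predictor: claims before 2015
    target = {c: 0 for c in customers}  # binary dependent variable "claim"
    for customer_id, claim_date in claims:
        if customer_id not in past:
            continue  # ignore claims of customers outside the programme
        if claim_date.year < DEPENDENT_YEAR:
            past[customer_id] += 1
        elif claim_date.year == DEPENDENT_YEAR:
            target[customer_id] = 1  # at least one claim in the dependent period
    return {c: (past[c], target[c]) for c in past}


example_claims = [
    ("a", date(2013, 5, 1)),
    ("a", date(2015, 2, 3)),
    ("b", date(2014, 12, 31)),
    ("c", date(2015, 7, 7)),
    ("c", date(2015, 8, 1)),
]
dataset = build_claim_dataset(example_claims, ["a", "b", "c", "d"])
```

Because a 2015 claim can only enter the target and never a predictor, the sketch avoids the overlap between driving behaviour and crash windows that the paper criticises in earlier studies.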

tort or overall. Furthermore, there are four additional binary variables indicating if a client has filed a claim during four different time windows: the previous month, the three previous months, the previous year and the four previous years. In total 18 customer specific, 24 car specific and 7 claim history variables were included in the model. Categorical variables are included as n − 1 dummy variables, in which n represents the number of category values.

Besides these variables, traditionally used by the insurer, an additional set of variables, telematics, is created to represent the driving behaviour factors in the model. Over the four years (2011–2014), about 4.3 million single day trips have been recorded by telematics devices for the customers involved in the programme. This leads to a total of more than 230 million km driven. Fig. 3 illustrates the distribution of the kilometres driven in 2014, with an average of about 13,000 km driven per customer. For each trip, the IVDR generates data about the total distance driven. This data is already sufficient to support a PAYD insurance product. However, besides this, the total distance driven can be subdivided based on location (i.e. kilometres driven in city, other roads, highway or abroad) and time of the day. Similar to Paefgen et al. [36], five time values are identified depending on peak or off-peak hours and the moment of the day (i.e. high AM, low day, high PM, low PM or low night). These metrics already reveal more information about how the policyholder drives. Further, also the time between the first and last trip per day is calculated. Since all these telematics data are collected per trip, these values need to be aggregated on customer level. This aggregation is performed using sum, average, minimum and maximum on the four different time windows: the previous month, the three previous months, the previous year and the four previous years. Also the relative percentages of kilometres driven per location and time window are calculated. Finally, the telematics sensors also contain an experimental feature that is able to register a collision with the car and the relative G-Force linked to this collision. Although this is still an experimental feature, the variables created based on this

Table 2
Model variables.

Variable categories | Description | Var. type | Nbr. of variables or values

Dependent
Claim | A binary variable indicating whether the customer incurred in at least one claim | cat | 2

Independent
Customer specific | Age | cont | 3
Customer specific | Experience | cont | 3
Customer specific | Age * Experience | cont | 1
Customer specific | Zone of residence | cat | 5
Customer specific | Number of drivers | cat | 3
Customer specific | Payment frequency | cat | 5
Customer specific | Private usage | cat | 2
Car specific | Vehicle age | cont | 1
Car specific | Vehicle kilowatt | cont | 1
Car specific | Vehicle weight | cont | 1
Car specific | Vehicle value | cont | 1
Car specific | Petrol | cat | 2
Car specific | Five seats | cat | 2
Car specific | Car segment 1 | cat | 5
Car specific | Car segment 2 | cat | 4
Car specific | Manual transmission | cat | 2
Car specific | Car brand | cat | 11
Claim history | Years without claims | cont | 1
Claim history | Years without claims in tort | cont | 1
Claim history | Bonus malus | cont | 1
Claim history | Number of past claims | cont | 4
Telematics | Total distance | cont | 20
Telematics | Total trip time | cont | 20
Telematics | Location distance (city, highway, abroad, other) | cont | 80
Telematics | Day time distance (low night, high AM, low day, high PM, low PM) | cont | 100
Telematics | Telematics crash | cont | 8
Telematics | Telematics crash G-force | cont | 12
Expert-based | Night trip (Friday, Saturday) | cont | 16
Expert-based | Rush hours trip (morning, evening) | cont | 16
Expert-based | Rush hours trip start (morning, evening) | cont | 16
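The telematics groups in Table 2 are customer-level aggregates of per-trip records. A minimal Python sketch of such an aggregation, using sum, average, minimum and maximum over the four time windows named in the text, could look as follows. The window lengths in days, the `(trip_date, km)` record layout and the feature names are assumptions made for illustration; the paper only names the windows and does not specify how they are delimited.

```python
from datetime import date, timedelta

# Window lengths in days are an assumption; the paper only names the windows.
WINDOWS = {"1m": 30, "3m": 91, "1y": 365, "4y": 1461}


def aggregate_trips(trips, reference_date):
    """Aggregate (trip_date, km) pairs of one customer per time window.

    Returns sum, average, minimum and maximum distance for each window
    that ends at reference_date (hypothetical feature names).
    """
    features = {}
    for name, days in WINDOWS.items():
        start = reference_date - timedelta(days=days)
        kms = [km for trip_date, km in trips if start < trip_date <= reference_date]
        features[f"km_sum_{name}"] = sum(kms)
        features[f"km_avg_{name}"] = sum(kms) / len(kms) if kms else 0.0
        features[f"km_min_{name}"] = min(kms) if kms else 0.0
        features[f"km_max_{name}"] = max(kms) if kms else 0.0
    return features


example_trips = [
    (date(2014, 12, 20), 10.0),
    (date(2014, 11, 1), 30.0),
    (date(2014, 3, 1), 20.0),
    (date(2012, 6, 1), 40.0),
]
example_features = aggregate_trips(example_trips, date(2014, 12, 31))
```

The same pattern would apply to distances split by location or time of day by filtering the trips before aggregating. Windows without trips are set to zero here, which is a design choice of this sketch rather than something the paper states.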

Fig. 3. Distribution of kilometers driven.

are also taken into account. All these metrics result in a total of 240 independent telematics variables.

The variables described above are directly aggregated from the raw telematics data without a lot of business reasoning [22]. However, D’Haen et al. [16] illustrated that the creation of variables based on industry expertise can enhance the performance of predictive models. Hence, additional variables are created based on the collaboration between data and industry experts. By combining the automatically generated telematics data in a smart way, experts could create metrics that better point to driving behaviour that has a significant influence on accident risk. These expert-based variables specifically focus on identifying risky rush hour trips on weekdays and night trips during the weekend. These variables are also aggregated using the average and sum over the 4 different time windows previously defined, resulting in 48 additional variables.

3.2. Data mining techniques

In the insurance industry, a generalized linear model with Poisson distribution is often used to predict the claim counts [14]. If the number of observations without claims is very high, a zero-inflated model can even result in better predictions [47]. However, as illustrated by Paefgen et al. [36], also a logistic regression model can be used to estimate the probability that a claim would be made. Our study will use the same approach as Paefgen et al. [36] because the insurer is allowed to revise the insurance contract each time a claim is filed. Hence, it is more valuable to estimate if at least one claim will be made during the insurance period than the count of the claims.

Logistic regression is a well-known robust data mining technique for classification problems [3]. One of the main advantages of this predictive technique is its interpretability, by producing specific information about the size and the direction of the effects of the independent variables [33]. In a risk assessment context, interpretability is an important requirement of the regulator. On the other hand, this technique is less able to capture very complex non-linear relationships [34]. Based on this technique the probability of a claim can be estimated as follows:

$$P(Y_i \mid X_i) = \frac{1}{1 + e^{-(b_0 + b X_i)}} \qquad (1)$$

where $P(Y_i \mid X_i)$ contains the a posteriori probability of a claim by customer $i$; $X_i$ represents a vector containing specific values of selected variables for customer $i$; $b_0$ represents the intercept and $b$ is a vector of coefficients obtained through maximum likelihood optimisation. The data used in this study contains many correlating telematics-based variables. In order to avoid redundant variables, we employ a stepwise feature selection process, similar to Paefgen et al. [36]. The variables are iteratively removed from the model if their significance exceeds a p-value threshold of 0.05.

Although from a business perspective, the regulator requires the use of “white box” models from which the effect of all variables can be explained, this study will also investigate, from an academic perspective, the value of two “black box” models, namely random forests and artificial neural networks. These two techniques are chosen based on the benchmark study of Lessmann et al. [28]. In that study, which was also executed in a risk assessment context, random forests was ranked as the best performing homogeneous ensemble classifier and artificial neural networks as the best individual classifier. Also, other studies have found good predictive performances using these machine learning techniques [25,44]. Random Forests is a very powerful predictive technique which consists of a large ensemble of decision trees [10]. This technique overcomes the single decision tree’s limitation of sacrificing accuracy to avoid overfitting on training data [8]. Based on random feature selection techniques and bootstrap sampling a large number of trees are modelled. Finally, classification is allocated by aggregating the results of the individual trees. Based on the suggestions of Breiman [10] the number of randomly chosen predictors is set equal to the square root of the total number of variables included in the model and 1000 trees were grown per random forests model. Besides random forests, also a feed-forward neural network will be used, which consists of an input layer, one hidden layer and an output layer. The neurons of one layer are fully-connected to the neurons of the next layer, which helps to explore potential interaction effects between the input variables. The number of neurons in the input layer equals the number of input variables while the number of neurons in the output layer is set to one, namely the dependent claim variable.

3.3. Model evaluation

The predictive performance of these models will be calculated based on a k-fold cross validation approach, where, in this study, k equals ten. This technique splits the data into k different folds of a similar size, where the training sample consists of k − 1 folds and the validation sample of the remaining one. For the artificial neural

Table 3
Model performance (AUC).

Model name               Variable categories                                                       Logistic regression   Random forests   Artificial neural networks
Base model               Customer specific, Car specific, Claim history                            0.5777                0.5866           0.5820
Telematics model         Telematics                                                                0.5949                0.5840           0.5992
Combined model           Customer specific, Car specific, Claim history, Telematics                0.6083                0.5937           0.6174
Combined + Expert model  Customer specific, Car specific, Claim history, Telematics, Expert-based  0.6135                0.5974           0.6176
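
The percentage-point improvements quoted in Section 4 can be recomputed from the AUC values in Table 3. The following is a minimal Python sketch of that arithmetic; the dictionary layout and the helper name `gain_pp` are illustrative and not part of the study.

```python
# AUC values copied from Table 3.
# LR = logistic regression, RF = random forests, ANN = artificial neural networks.
AUC = {
    "Base model":              {"LR": 0.5777, "RF": 0.5866, "ANN": 0.5820},
    "Telematics model":        {"LR": 0.5949, "RF": 0.5840, "ANN": 0.5992},
    "Combined model":          {"LR": 0.6083, "RF": 0.5937, "ANN": 0.6174},
    "Combined + Expert model": {"LR": 0.6135, "RF": 0.5974, "ANN": 0.6176},
}

# The text uses the logistic regression base model as the benchmark.
BASELINE = AUC["Base model"]["LR"]


def gain_pp(model, technique):
    """AUC gain over the benchmark, expressed in percentage points."""
    return round((AUC[model][technique] - BASELINE) * 100, 2)


for technique in ("LR", "RF", "ANN"):
    print(technique, gain_pp("Combined + Expert model", technique))
```

The three gains printed for the final model correspond to the 3.58, 1.97 and 3.99 percentage points discussed in the results.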

network model, the training sample is split again, where 70% of the training sample is used to build models in which the number of neurons in the hidden layer ranges between 5 and 100, in increments of 5 neurons. The other 30% of the training sample is reserved to determine the optimal model [44]. The process of training and testing the model is repeated until each fold has served as the test set. Finally, the performance is computed as the average area under the receiver operating characteristic curve (AUC) on the validation sample across all k trials [38]. The AUC represents the probability that a randomly chosen policyholder that made a claim in 2015 is ranked higher than a randomly chosen policyholder without any claim in 2015. This evaluation criterion is preferred because it gives a general indication of the performance of the model, independent of a chosen threshold that transforms probabilities into predictions. This is relevant since all levels of probabilities will be used in the premium evaluation stage of the risk selection process (see Fig. 1). Furthermore, this evaluation criterion is insensitive to the class imbalance present in the dataset [13,20].

4. Results

This section will compare the different risk assessment models in terms of predictive performance. Next, deeper insights are provided into the impact of individual telematics variables. Table 3 shows the predictive performances of all the models deployed. The second column gives an overview of the input variables of each model, defined in more detail in Table 2. In total, 12 models will be compared. These models differ depending on 4 groups of input variables and 3 data mining techniques.

In a first step, we will focus on the performances of the logistic regression model, which is from a business perspective more easily accepted by the regulator. This algorithm makes use of a stepwise variable selection technique. Table 4 gives for each logistic regression model an overview of the average number of variables selected across the 10 samples in the cross-validation procedure. Further, the percentages represent the distribution of the number of input variables over the different categories. The number of selected variables is nearly equal for the base and the telematics model, although the number of initial input variables is a lot higher for the telematics model (see Table 2). This can be explained by a higher collinearity between the telematics variables. Combining the two data sources increases the number of variables selected, with a roughly equal balance between traditional and telematics variables. In the final model, the expert-based telematics variables account for 13.33% of all variables included in the model. Investigating the predictive performance in Table 3, a traditionally used base model leads to an AUC of 0.5777. On the other hand, a model that is only based on telematics variables reaches an AUC of 0.5949, meaning that the automatically generated telematics-based variables already outperform the traditional variables for risk assessment. However, the power lies in combining both data sources, which results in a predictive performance of 0.6083. This shows that the augmented telematics data, measuring the driving behaviour, is able to capture risk elements that were not included in the traditional customer specific, car specific and historical claims data. Next, Table 3 illustrates that involving industry experts in the variable creation process is valuable. In such a final model an improved AUC of 0.6135 could be observed. This final model is able to improve the predictive performance of a traditional approach by 3.58 percentage points, which clearly illustrates the added value of telematics data for predicting customers’ accident claims.

Besides logistic regression models, the same input variables are used for two more advanced data mining techniques. Using the traditional input variables, random forests outperforms the logistic regression model and artificial neural networks. In contrast, the model containing only telematics variables performs worse. Although random forests is an advanced classification technique, the relationships between telematics-based driving behaviour variables and the probability to make a claim can be better modelled with regression-based techniques or artificial neural networks. This is illustrated by Fig. 4, which represents the variable effect characteristic curves of the most important telematics-based variable: Total Distance (3M, Sum), which represents the sum of all daily kilometres driven during the last three months. The curves show the variation in accident risk depending on the variation of Total Distance (3M, Sum) while all other independent variables are fixed at the median level [12,24]. Note that the impact of the variable is stronger in a logistic regression model than in the two more advanced techniques. Due to the stepwise variable selection technique the logistic regression model is limited to a few telematics-based variables that capture the impact of driving behaviour (see Table 4). This effect is spread over all 240 telematics-based variables and 48 expert-based telematics variables in the two advanced models. Multiple studies have shown at least a monotonic positive, although not always proportional, relationship between total distance driven and accident risk [17,27,36]. Fig. 4 shows that this monotonicity is better captured by a regression-based or artificial neural networks model than by a random forests model. Random forests is still based on underlying decision trees that try to discretise these continuous telematics-based variables to determine classification cut-offs for each node in the decision tree. Hence, this model is less able to capture the

Table 4
Overview of variable selection of logistic regression model.

Variable categories       Base model   Telematics model   Combined model   Combined + Expert model
Customer specific         36.22%       -                  16.75%           13.33%
Car specific              47.24%       -                  22.84%           20.00%
Claim history             16.54%       -                  5.08%            4.44%
Telematics                -            100.00%            55.33%           48.89%
Expert-based              -            -                  -                13.33%
Avg. number of variables  12.7         12.4               19.7             22.5

Fig. 4. Variable effect characteristic curves.
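
The construction behind a variable effect characteristic curve, as described for Fig. 4, can be sketched as follows: one variable sweeps its observed range while every other variable is held at its median, and the model's predicted claim probability is recorded along the sweep. This is a minimal illustration only. The function name `vec_curve`, the synthetic data and the coefficients `b0` and `b` are hypothetical; the stand-in model simply has the logistic form of Eq. (1) and is not the fitted model of the study.

```python
import numpy as np


def vec_curve(predict, X, j, n_points=20):
    """Sweep feature j over its observed range with all other features at their median.

    Returns the grid of values for feature j and the model predictions on that grid.
    """
    grid = np.linspace(X[:, j].min(), X[:, j].max(), n_points)
    rows = np.tile(np.median(X, axis=0), (n_points, 1))  # every row = median profile
    rows[:, j] = grid                                    # overwrite the swept feature
    return grid, predict(rows)


# Hypothetical logistic model in the form of Eq. (1); coefficients are made up.
b0 = -3.0
b = np.array([0.0004, 0.1])


def predict(X):
    return 1.0 / (1.0 + np.exp(-(b0 + X @ b)))


# Synthetic data: column 0 plays the role of total distance, column 1 is a second variable.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 6000, 500), rng.uniform(0, 10, 500)])

grid, risk = vec_curve(predict, X, j=0)
```

Any model exposing a prediction function can be passed in place of `predict`, which is how curves for several techniques could be drawn on the same grid.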

real relationship between driving behaviour and claims. The same underperformance is reflected in a random forests model in which traditional predictors are combined with telematics variables. The final random forests model, which includes traditional variables as well as standard and expert-based telematics variables, improves on the benchmark logistic regression model by only 1.97 percentage points. This is significantly lower than the 3.58 percentage points improvement obtained with a logistic regression model.

On the other hand, artificial neural networks are better able to capture these positive monotonic relationships, resulting in the best performance of a model only based on telematics data. Also for the combined models this data mining technique is the best performer. Further, augmenting the combined model with expert-based variables no longer affects the performance. This can be explained by the fact that the expert-based telematics variables are still quite simple combinations of the original telematics data, while artificial neural networks are designed to automatically detect relevant combinations of variables in the hidden layers. The final artificial neural networks model including all possible input variables is able to improve on a logistic regression model containing traditionally used variables by 3.99 percentage points.

In order to provide more insights about the relevance of the different types of telematics variables, a series of univariate logistic regression models is estimated. A univariate analysis was preferred since the collinearity between the telematics variables is large. Fig. 5 shows the impact in terms of AUC of the top 40 telematics variables, labelled by typology such as general, time-based, location-based and expert-based driving behaviour variables. This performance is calculated as the average AUC on the validation sample using a 10-fold cross validation procedure. On top of the list, Total Distance (3M, Sum) can be found. By only using this variable to predict if a customer will file a claim, an AUC of 0.5891 can be obtained. This performance is already higher than that of all traditionally used variables in the base model. This observation is consistent with the assumption that the occurrence of an accident is still mainly driven by chance. As a result, the more exposed to the risk, the higher the probability of an accident. Further, looking at the variable typologies, Fig. 5 indicates that the general telematics-based variables typically have the highest predictive power. This category contains the top six variables in terms of univariate predictive performance. This observation is confirmed in Fig. 6, which visualises the average AUC of all telematics variables distributed by typology. On average, telematics-based variables that take into account the general driving behaviour (i.e. total distance and trip time) have the highest predictive performance. This is followed by variables that focus on specific time windows when a customer drives and locations or road types where the customer drives. Fig. 6 also shows that the average univariate predictive power of the expert-based variables is similar to that of the location-based telematics variables. As mentioned before, these expert-based telematics variables still add value on top of the standard telematics variables (see Table 3). Finally, this study also included experimental features that indicate past crashes and the G-force of these crashes recorded automatically by the IVDR. These variables currently only have a very limited added value. This can be explained by the fact that this feature was still in development and contained plenty of wrongly identified crashes.

The driving behaviour variables can be calculated based on different time windows. Fig. 7 compares the average univariate predictive performance over the four different time windows used in this research. This shows that data from three months of driving behaviour is already sufficient to obtain the highest predictive power. If data from one year or four years is used, some driving behaviour is not recent enough, which negatively affects the predictive performance. However, also decreasing the number of months

Fig. 5. Univariate predictive performance (AUC) of top 40 telematics-based variables.
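
The kind of univariate ranking summarised in Fig. 5 can be sketched in a few lines. The AUC below is computed directly from its definition in Section 3.3: the probability that a randomly chosen policyholder with a claim is scored higher than a randomly chosen one without. This is a simplified sketch under stated assumptions. It ranks by the raw variable instead of fitting a univariate logistic regression (the fitted probability is a monotone function of the single input, so the ranking of policyholders is the same up to the sign of the coefficient), and it omits the 10-fold cross-validation used in the study. The function names and the toy data are hypothetical.

```python
import numpy as np


def auc(scores, labels):
    """Pairwise AUC: share of (claim, no-claim) pairs where the claim case scores higher.

    Ties count for one half.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]          # all positive-negative pairs
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return wins / (len(pos) * len(neg))


def rank_univariate(X, y, names):
    """Return (name, AUC) pairs sorted from most to least predictive variable."""
    results = [(name, auc(X[:, k], y)) for k, name in enumerate(names)]
    return sorted(results, key=lambda item: item[1], reverse=True)


# Toy data: four policyholders, two variables (names are illustrative).
X = np.array([[100, 5], [200, 1], [300, 4], [400, 2]])
y = [0, 0, 1, 1]
ranking = rank_univariate(X, y, ["total_distance", "other_variable"])
```

The pairwise form is quadratic in the number of policyholders, which is fine for an illustration; a rank-based formula would be the usual choice for large portfolios.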

to one reduces the average AUC. This indicates that one month does not provide enough behavioural data to estimate the model parameters accurately.

5. Discussion and conclusion

With the advent of the IoT paradigm, an increasing number of sensors are available that enable insurers to collect detailed data of customers’ driving behaviour. This study has investigated the impact of these data on the risk selection process. More specifically, it is the first study that proves in detail the added predictive value of telematics data in addition to traditionally used variables such as customer specific, car specific and historical claims variables. A predictive model that is only based on this data source is already able to assess the accident risk better than traditional models. However, most value lies in combining both data sources because they capture different

Fig. 6. Average univariate predictive performance (AUC) per telematics type.

Fig. 7. Average univariate predictive performance (AUC) per time window.
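
The time windows compared in Fig. 7 come from aggregating daily driving data with a sum and an average per window. The sketch below illustrates that aggregation for daily kilometres. Several choices are assumptions made for illustration only: the day counts used to approximate one month, three months, one year and four years; the decision to average over recorded days; and all names (`WINDOWS_DAYS`, `window_features`, the feature labels).

```python
from datetime import date, timedelta

# Assumption: approximate day counts for the four windows.
WINDOWS_DAYS = {"1M": 30, "3M": 91, "1Y": 365, "4Y": 1461}


def window_features(daily_km, as_of):
    """Sum and average of daily kilometres over each window ending at `as_of`.

    daily_km maps a date to the kilometres driven on that day.
    """
    features = {}
    for label, days in WINDOWS_DAYS.items():
        start = as_of - timedelta(days=days)
        values = [km for day, km in daily_km.items() if start < day <= as_of]
        features[f"total_distance_{label}_sum"] = sum(values)
        # Assumption: the average is taken over days with a recorded value.
        features[f"total_distance_{label}_avg"] = sum(values) / len(values) if values else 0.0
    return features


# Toy example: 100 consecutive days of 10 km each.
as_of = date(2015, 12, 31)
daily_km = {as_of - timedelta(days=k): 10.0 for k in range(100)}
features = window_features(daily_km, as_of)
```

With only 100 days of history, the one-year and four-year windows return the same values, which shows that longer windows add information only when the history is long enough.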

underlying elements of the risk. Insurance companies should stimulate their clients to install In-Vehicle Data Recorders. This can generate advantages for both insurers and customers. While this clearly improves an insurer’s risk selection process, customers can benefit from a lower premium if their driving behaviour is analysed as safe, but also from additional services, such as automatic emergency calls, stolen vehicle tracking and diagnostic services.

This study shows that additional improvements can be quite easily obtained by managing the variable creation process. Hence management should put enough focus on this process. This time investment, in which data experts collaborate with industry experts, can result in an even better risk assessment model. This is especially the case in a heavily regulated environment in which only easily interpretable models are allowed. These models are not able to automatically identify valuable interactions between the original telematics variables. Hence, industry experts can easily create additional variables which are still explainable to the regulator. When more advanced machine learning techniques are used that are specialised in detecting these hidden interactions, such as artificial neural networks, the added value of expert-based variables is very limited. Even though these models are not accepted by the regulator, management could still use these techniques to obtain an indication of the potential with the same input variables.

If managers want to implement a Usage-Based-Insurance product, this study advises focusing first on PAYD insurances where the final premium is based on the total amount of kilometres driven in a predefined timeslot. Since the risk of an accident is still mainly driven by chance, the general exposure to this risk is a very good predictor. By focusing on this metric, convenient products can be created that are quick to implement for both insurers and drivers. Next, the insurer can evolve toward even more complex products that also include time and location of the driving behaviour to assess the risk even more accurately. While Vaia et al. [43] suggest reviewing the premium based on the data collected over the past year, this study shows that three months of driving behaviour data is already sufficient to make the most accurate predictions. This can significantly shorten the time necessary to up-sell customers to the new telematics-based insurance product and give a competitive advantage.

Future research could even widen the typology of the dataset by also investigating braking and acceleration data. These data types might help to understand the customers’ driving behaviour better and improve the risk selection process even more. Hence, research in this area can help management even more to assess the value of PHYD insurances. This study found that more advanced data mining techniques, such as artificial neural networks, limit the value of expert-based telematics variables. Future research could investigate the trade-off between more complex expert-based variables and the additional predictive gains. Further, this study mainly focusses on the risk assessment stage of the risk selection process. However, once more IVDRs are adopted by the market, future research could investigate the financial impact of IoT in general and telematics in particular on the insurance industry.

References

[1] F. Alonso, J.P. Carac, A.L. Gonza, Combining expert knowledge and data mining in a medical diagnosis domain, Expert Syst. Appl. 23 (2002) 367–375. http://dx.doi.org/10.1016/S0957-4174(02)00072-6.
[2] R. Ambrosino, B.G. Buchanan, The use of physician domain knowledge to improve the learning of rule-based models for decision-support, Proc. Annu. Fall Symp. Am. Med. Inform. Assoc. (1999) 192–196.
[3] D. Andriosopoulos, C. Gaganis, F. Pasiouras, C. Zopounidis, An application of multicriteria decision aid models in the prediction of open market share repurchases, Omega 40 (2012) 882–890. http://dx.doi.org/10.1016/j.omega.2012.01.009.
[4] K. Antonio, E.A. Valdez, Statistical concepts of a priori and a posteriori risk classification in insurance, AStA Adv. Stat. Anal. 96 (2012) 187–224. http://dx.doi.org/10.1007/s10182-011-0152-7.
[5] L. Atzori, A. Iera, G. Morabito, The Internet of Things: a survey, Comput. Netw. 54 (2010) 2787–2805. http://dx.doi.org/10.1016/j.comnet.2010.05.010.
[6] M. Ayuso, M. Guillén, A.M. Pérez-Marín, Time and distance to first accident and driving patterns of young drivers with pay-as-you-drive insurance, Accid. Anal. Prev. 73 (2014) 125–131. http://dx.doi.org/10.1016/j.aap.2014.08.017.
[7] M. Azzopardi, D. Cortis, Implementing automotive telematics for insurance covers of fleets, J. Technol. Manag. Innov. 8 (2013) 59–67.
[8] P. Baecke, D. Van Den Poel, Data augmentation by predicting spending pleasure using commercially available external data, J. Intell. Inf. Syst. 36 (2011) 367–383. http://dx.doi.org/10.1007/s10844-009-0111-x.
[9] J. Beirlant, V. Derveaux, A.M. De Meyer, M.J. Goovaerts, E. Labie, B. Maenhoudt, Statistical risk evaluation applied to (Belgian) car insurance, Insur. Math. Econ. 10 (1992) 289–302. http://dx.doi.org/10.1016/0167-6687(92)90060-O.
[10] L. Breiman, Random forests, Mach. Learn. 45 (2001) 5–32.
[11] M. Chen, S. Mao, Y. Liu, Big data: a survey, Mob. Netw. Appl. 19 (2014) 171–209. http://dx.doi.org/10.1007/s11036-013-0489-0.
[12] P. Cortez, M.J. Embrechts, Using sensitivity analysis and visualization techniques to open black box data mining models, Inf. Sci. 225 (2013) 1–17. http://dx.doi.org/10.1016/j.ins.2012.10.039.
[13] S. De Cnudde, D. Martens, Loyal to your city? A data mining analysis of a public service loyalty program, Decis. Support. Syst. 73 (2015) 74–84. http://dx.doi.org/10.1016/j.dss.2015.03.004.
[14] M. Denuit, X. Maréchal, S. Pitrebois, J.-F. Walhin, Actuarial Modelling of Claim Counts: Risk Classification, Credibility and Bonus–Malus Systems, Wiley. 2007.
[15] P. Desyllas, M. Sako, Profiting from business model innovation: evidence from Pay-As-You-Drive auto insurance, Res. Policy 42 (2013) 101–116. http://dx.doi.org/10.1016/j.respol.2012.05.008.
[16] J. D’Haen, D. Van den Poel, D. Thorleuchter, D.F. Benoit, Integrating expert knowledge and multilingual web crawling data in a lead qualification system, Decis. Support. Syst. 82 (2016) 69–78. http://dx.doi.org/10.1016/j.dss.2015.12.002.
[17] J. Ferreira, E. Minikel, Measuring per mile risk for pay-as-you-drive automobile insurance, Transp. Res. Rec. J. Transp. Res. Board 2297 (2012) 97–103. http://dx.doi.org/10.3141/2297-12.
[18] B.D. Gerardo, J. Lee, A framework for discovering relevant patterns using aggregation and intelligent data mining agents in telematics systems, Telematics Inform. 26 (2009) 343–352. http://dx.doi.org/10.1016/j.tele.2008.05.003.
[19] J. Gubbi, R. Buyya, S. Marusic, M. Palaniswami, Internet of Things (IoT): a vision, architectural elements, and future directions, Futur. Gener. Comput. Syst. 29 (2013) 1645–1660. http://dx.doi.org/10.1016/j.future.2013.01.010.
[20] J.A. Hanley, B.J. McNeil, The meaning and use of the area under a receiver operating characteristic (ROC) curve, Radiology 143 (1982) 29–36. http://dx.doi.org/10.1148/radiology.143.1.7063747.
[21] H. Hirsh, M. Noordewier, Using background knowledge to improve inductive learning: a case study in molecular biology, IEEE Expert 9 (1994) 3–6. http://dx.doi.org/10.1109/64.331477.
[22] S. Husnjak, D. Peraković, I. Forenbacher, M. Mumdziev, Telematics system in usage based motor insurance, Energy Procedia 100 (2015) 816–825. http://dx.doi.org/10.1016/j.proeng.2015.01.436.
[23] J. Jun, R. Guensler, J. Ogle, Differences in observed speed patterns between crash-involved and crash-not-involved drivers: application of in-vehicle monitoring technology, Transp. Res. C Emerg. Technol. 19 (2011) 569–578. http://dx.doi.org/10.1016/j.trc.2010.09.005.
[24] R.H. Kewley, M.J. Embrechts, C. Breneman, Data strip mining for the virtual design of pharmaceuticals with neural networks, IEEE Trans. Neural Netw. 11 (2000) 668–679. http://dx.doi.org/10.1109/72.846738.
[25] J. Kim, P. Kang, Late payment prediction models for fair allocation of customer contact lists to call center agents, Decis. Support. Syst. 85 (2016) 84–101. http://dx.doi.org/10.1016/j.dss.2016.03.002.
[26] I. Kopanas, N.M. Avouris, S. Daskalaki, The role of domain knowledge in a large scale data mining project, Methods Appl. Artif. Intell. 2308 (2002) 288–299.
[27] J. Lemaire, S.C. Park, K.C. Wang, The use of annual mileage as a rating variable, ASTIN Bull. 46 (2015) 39–69. http://dx.doi.org/10.1017/asb.2015.25.
[28] S. Lessmann, B. Baesens, H.V. Seow, L.C. Thomas, Benchmarking state-of-the-art classification algorithms for credit scoring: an update of research, Eur. J. Oper. Res. 247 (2015) 124–136. http://dx.doi.org/10.1016/j.ejor.2015.05.030.
[29] E. Lima, C. Mues, B. Baesens, Domain knowledge integration in data mining using decision tables: case studies in churn prediction, J. Oper. Res. Soc. 60 (2009) 1096–1106. http://dx.doi.org/10.2307/40206835.
[30] T. Litman, Pay-as-you-drive pricing and insurance regulatory objectives, J. Insur. Regul. 23 (2005) 35–53.
[31] D. Martens, J. Vanthienen, W. Verbeke, B. Baesens, Performance of classification models from a user perspective, Decis. Support. Syst. 51 (2011) 782–793. http://dx.doi.org/10.1016/j.dss.2011.01.013.
[32] R. Mayou, B. Bryant, R. Duthie, Psychiatric consequences of road traffic accidents, Br. Med. J. 307 (1993) 647–651. http://dx.doi.org/10.1136/bmj.307.6905.647.
[33] P. McCullagh, J.A. Nelder, Generalized Linear Models, 37. CRC Press. 1989.
[34] S. Moro, P. Cortez, P. Rita, A data-driven approach to predict the success of bank telemarketing, Decis. Support. Syst. 62 (2014) 22–31. http://dx.doi.org/10.1016/j.dss.2014.03.001.
[35] S. Moro, P. Cortez, P. Rita, A framework for increasing the value of predictive data-driven models by enriching problem domain characterization with novel features, Neural Comput. & Applic. (2016) 1–9. http://dx.doi.org/10.1007/s00521-015-2157-8.

[36] J. Paefgen, T. Staake, E. Fleisch, Multivariate exposure modeling of accident risk: insights from pay-as-you-drive insurance data, Transp. Res. A Policy Pract. 61 (2014) 27–40. http://dx.doi.org/10.1016/j.tra.2013.11.010.
[37] G. Rejikumar, A pre-launch exploration of customer acceptance of usage based vehicle insurance policy, IIMB Manage. Rev. 25 (2013) 19–27. http://dx.doi.org/10.1016/j.iimb.2012.11.002.
[38] J.D. Rodriguez, A. Perez, J.A. Lozano, Sensitivity analysis of k-fold cross validation in prediction error estimation, IEEE Trans. Pattern Anal. Mach. Intell. 32 (2010) 569–575. http://dx.doi.org/10.1109/TPAMI.2009.187.
[39] M.-J. Segovia-Vargas, M.-d.-M. Camacho-Miñano, D. Pascual-Ezama, Risk factors selection in automobile insurance policies: a way to improve the bottom line of insurance companies, Rev. Bus. Manag. 17 (2015) 1228–1245. http://dx.doi.org/10.7819/rbgn.v17i57.1741.
[40] The European Parliament and the Council of the European Union, Concerning type-approval requirements for the deployment of the eCall in-vehicle system based on the 112 service and amending Directive 2007/46/EC, Off. J. Eur. Union 123 (2015) 77–89.
[41] T. Toledo, O. Musicant, T. Lotan, In-vehicle data recorders for monitoring and feedback on drivers’ behavior, Transp. Res. C Emerging Technol. 16 (2008) 320–331. http://dx.doi.org/10.1016/j.trc.2008.01.001.
[42] C. Troncoso, G. Danezis, E. Kosta, B. Preneel, Pripayd: privacy friendly pay-as-you-drive insurance, Proc. 2007 ACM workshop Priv. electron. soc. (2007) 99–107. http://dx.doi.org/10.1145/1314333.1314353.
[43] G. Vaia, E. Carmel, W. DeLone, H. Trautsch, F. Menichetti, Vehicle telematics at an Italian insurer: new auto insurance products and a new industry ecosystem, MIS Q. 10 (2011) 115–117. http://dx.doi.org/10.1108/02635570910926564.
[44] V. Van Vlasselaer, C. Bravo, O. Caelen, T. Eliassi-Rad, L. Akoglu, M. Snoeck, B. Baesens, APATE: a novel approach for automated credit card transaction fraud detection using network-based extensions, Decis. Support. Syst. 75 (2015) 38–48. http://dx.doi.org/10.1016/j.dss.2015.04.013.
[45] A.E. Wåhlberg, The stability of driver acceleration behavior, and a replication of its relation to bus accidents, Accid. Anal. Prev. 36 (2004) 83–92. http://dx.doi.org/10.1016/S0001-4575(02)00130-6.
[46] L.D. Xu, W. He, S. Li, Internet of things in industries: a survey, IEEE Trans. Ind. Inf. 10 (2014) 2233–2243. http://dx.doi.org/10.1109/TII.2014.2300753.
[47] K.C.H. Yip, K.K.W. Yau, On modeling claim frequency data in general insurance with extra zeros, Insur. Math. Econ. 36 (2005) 153–163. http://dx.doi.org/10.1016/j.insmatheco.2004.11.002.

Philippe Baecke is a Professor at Vlerick Business School. He holds a Master degree in Applied Economics, an Advanced Master degree in Marketing Analysis and is also Doctor in Applied Economics (Ghent University). He is programme director of two executive programmes at Vlerick Business School: Creating Business Value with Big Data & Data Driven Marketing. Philippe is also a visiting lecturer at University of Namur and Trinity College Dublin (Ireland). His research interest mainly lies in Customer Relationship Management, Business analytics and Big Data. More specifically, he focuses on improving analytical models by creatively incorporating new data types, such as geographical, social network and sensor generated data. His research has been published in several peer reviewed journals such as Decision Support Systems, International Journal of Information Technology and Decision Making, Journal of Intelligent Information Systems and Expert Systems with Applications. His research is also recognized in the business world, which is reflected in both the SAS Student Ambassador Award and the Best Paper Award which he received at SAS Global Forum in 2012 (Orlando, U.S.). Over the past years, he has executed business research projects for multiple companies in different industries such as financial services, telecom, and retail.

Lorenzo Bocca is a researcher at Vlerick Business School (Belgium). He obtained a Master’s degree in Engineering Management from “Sapienza” University of Rome. After being awarded an Erasmus+ Traineeship scholarship he joined Vlerick Business School. His research mainly focuses on Business Analytics and Big Data. More specifically, he tries to generate business value by applying machine learning techniques on sensor generated data. For this, he is also involved in company projects in the financial services industry.
