Claim Distribution 3
Chu-Shiu Li
Department of Risk Management and Insurance
National Kaohsiung First University of Science and Technology
Kaohsiung, Taiwan
Chwen-Chi Liu
Department of Risk Management and Insurance
Feng Chia University
Taichung, Taiwan
Sheng-Chang Peng*
Department of Risk Management and Insurance
Ming Chuan University
Taipei, Taiwan
* Corresponding author, email: [email protected].
Risk Classification and Claim Prediction: An Empirical Analysis
Abstract
Using logistic regression prediction models and a unique vehicle insurance data set
from Taiwan, this paper examines whether rating characteristics are still effective
under a bonus-malus system and investigates whether extra information can help
predict claim occurrences for vehicle damage insurance. The empirical results show
that all current rating characteristics for vehicle damage insurance are significant
predictors of claim occurrence. Among them, claim coefficient, car age, and car model are
models, we find that the claim record in the previous policy year is useful information for
risk classification.
1. Introduction
Risk classification is an essential task in the insurance field from both theoretical
and practical perspectives. Unable to acquire complete information about their insureds, insurance
information. Not only does risk classification closely relate to the efficiency of
insurance market equilibrium, but it also functions as a necessary process for insurers
In the case of automobile insurance, it is common for insurers to use a number of
a priori classification variables such as the main driver’s age, gender, and occupation,
and vehicle’s usage and type, to adequately and fairly differentiate risk levels among
and driving history are also important in pricing automobile insurance rates. To incorporate
driving experience into risk classification, the bonus-malus system (BMS), or merit
rating system, has long been widely used. A BMS rewards policyholders
who file no claims with a discount (or bonus) and penalizes
policyholders involved in one or more accidents with an extra premium (or malus).
There are usually two effects when a BMS is adopted by insurers: (i) it may prompt
the insureds to drive carefully and reduce accident occurrences, and (ii) it may link
driving risk to premiums more adequately. Utilizing automobile insurance data before
and after Tunisia introduced a BMS to its vehicle insurance pricing system, Dionne and
Ghali (2005) find a reduced probability of making claims for policyholders who
remained with the same insurance company during the observed period. Moreno et al.
(2006) provide a theoretical model to prove a bonus-malus contract can help eliminate
fraud. Furthermore, a BMS might induce some policyholders not to report small claims
(Lemaire, 1977) or to accumulate small losses to file one claim (Li et al., 2012) in order
policyholder’s long-term driving behavior and subsequent claim behavior. For instance,
an insured who made a claim due to a car accident in the previous year
will drive more carefully this year to avoid filing another claim and paying a higher premium
in the following year, and thereby earn a discount on future premiums.
in this paper through analyzing vehicle damage insurance data in Taiwan. In other
words, we would like to investigate which characteristics currently applied by insurers
significantly affect the occurrence of auto insurance claims. In particular, drivers’ claim
behavior might be affected by the BMS, which raises the question of whether those
risk classification characteristics are still effective. Furthermore, we also want to
examine whether some additional information which insurers are able to get from
test the relationships between a priori variables of risk classification and occurrence of
ex post claims in order to identify which variable can differentiate risk levels of
insureds. An effective risk classification variable, however, should have the ability to
predict claim occurrence. It would be a useful way to examine the characteristics of risk
analyzing prediction results. In other words, through setting different models to predict
claim occurrence, we could investigate how different rating characteristics affect the
following policy year. Particularly, we could estimate the importance of each different
develop their risk classification method for automobile insurance, our results might
provide some ideas for them to underwrite policies or review their current pricing
strategy.
vehicle insurance data set to examine whether rating characteristics are still effective
under a BMS and to investigate whether extra information can help predict claim
occurrences for vehicle damage insurance. Our empirical results show that all
characteristics in the current rating system are significant factors to predict claim
occurrence. Among all rating characteristics, claim coefficient, car age, and car model
are relatively important information for risk classification. For checking useful factors to
prediction accuracy, the claim record in the previous policy year is useful information for
risk classification.
Taiwan’s vehicle damage insurance market in the next section. As for Section 4, we
describe our empirical data and methodology. Our estimated models and prediction
2. Literature review
approaches, the efficiency of risk classification and its influence on social benefits, such
as Hoy (1982), Crocker and Snow (1985, 1986, and 1992), Bond and Crocker (1991),
etc. Crocker and Snow (2000) later reviewed previous theoretical views and
summarized that risk classification can raise the efficiency of insurance markets under
do not target at analyzing particular insurance types but discussing issues of risk
classification through characteristics of vehicle – car usage, brand, and style, and
characteristics of policyholder – insured’s gender, age, and claim record (Lemaire,
1995). Between these two kinds of classification characteristics, there are little
behind this result is that European and American countries used to argue over whether
investigate vehicle insurance data by analyzing examples applying gender and age as
classification variables. For example, Butler et al. (1988) argued that pricing rules using
analyze car accident records for both male and female drivers, then compared the
differences between collected premiums and claim losses from insurance companies,
and found that insured US females were overcharged for their vehicle insurance.
Puelz and Kemmsies (1993) used data of three personal vehicle insurance policies
in Georgia, USA, including vehicle collision coverage, full coverage, and liability
coverage, to evaluate how gender and other demographic variables affect premium
pricing. Their empirical results showed that gender significantly affects
premium rates, yet its influence is smaller than that of other variables such as
driving record, age, location, and vehicle type. Accordingly, it might be unnecessary
for regulators to spend much time making laws against insurance rate
pricing that relies on gender. Relying on various existing viewpoints on restricting this kind
research by pointing out that although the effects of administrative measures are
unclear, they indirectly increase risk costs or claim control costs because they distort
claim incentives. As a result, both the supply and the coverage of individual insurance
might be impacted.
Until present day, scholars often propose related analyses of similar car insurance
applied the Closed Claim Survey data of year 1997 provided by the Insurance Research
characteristics and claim payment of third party liability vehicle insurance policies. Via
three possible economic theories discussing diverse risk attitudes and differences and
The results indicated that, controlling for other variables, female insureds receive lower
claim payments than male insureds, and married insureds receive higher claim payments than
single ones. The relation between age and claim payment is insignificant in
the study by Doerpinghaus et al., but other studies dispute the role of age. Brown et al.
vehicle insurance pricing characteristic, and six among ten provinces refuse to do so.
Nevertheless, other studies using car accident data have found
that age influences car accident occurrence. For instance, Braver and Trempel (2004) and
Tefft (2008) identified higher accident tendencies for young and elderly drivers. Plotting
car accident loss against age from their findings yields a
curve close to a U shape. Such results are consistent with the rate regulations in practice in
Taiwan, which apply higher rate coefficients to young and elderly insureds.
insurance risk classification characteristic because similar to the cases of gender and
age, there is dispute over whether taking marital status into account is a kind of
insurance market to envision future trends on vehicle insurance coverage, rates, market
management, and related laws. On the subject of setting rates according to one’s
marital status, the most common argument in favor is the relevance of marital status
to risk, on the view that married drivers are calmer and more responsible than
unmarried drivers. Meanwhile, opponents emphasize that not every driver is
granted the right to marry; for example, forty-five states in the USA forbid same-sex
marriage, so only heterosexual drivers qualify for the marital status rate discount.
Moreover, because fewer people are entering into marriage and the average age at
first marriage is rising, the marital status discount will become less
important to insurance purchasers. While most insurance purchasers do not agree that
marital status should be an insurance rate pricing characteristic, the empirical analysis
by Doerpinghaus et al. (2008) has shown that marital status affects total
claim payment.
characteristic did not take place until recent years. Even though there were early
practical cases of using claim experience, most countries adopted this characteristic rather
late. Dionne and Ghali (2005) used the vehicle insurance data of Tunisia before and
after it adopted a BMS and found that the probability of making claims decreased for
insureds who stayed with the same insurance company. Also, Moreno et al. (2006) designed theoretical
models which indicated that BMSs can prevent insurance fraud. The literature
above suggests that applying a BMS induces insureds to lower driving hazards, but a BMS
may also change insureds’ claim behavior, encouraging bonus hunger (Lemaire, 1977).
Because of the linkage between premium rates and filed claim numbers, insureds may
hesitate to report small claims or accumulate small losses to file one claim before
policies mature (Li et al., 2013) in order to avoid increasing their claim totals.
there is also plenty of discussion of other factors. Via statistical analysis, Kellison et al.
(2003) examined the relationship between policyholders’ credit records and policies
with claim records, and they found that those with worse credit scores report greater
losses (in both loss frequency and loss severity), which further proved that credit
yet slightly different research by Miller and Smith (2003) analyzed six types of private
car insurance policies and insurance scores only applicable to this kind of vehicle
insurance. However, these two papers did not explain how credit record assists
insurance companies in risk evaluation. To bridge this gap, Brockett and Golden (2007)
first reviewed related literature before investigating the relations between credit record
and car insurance loss from their studying of biological, psychological, and behavioral
attributes and financial assumption of risk regarding these attributes. They concluded
that credit evaluation could be turned into useful underwriting information only when
individual biological and psychological differences are reflected upon the loss risks of
insured vehicles.
Besides, Bair et al. (2012) predicted car accident occurrence based on vehicle
maintenance record through data of compulsory vehicle liability insurance and the
probability from insured cars which follow their maintenance schedules, but they did
not find significant effects on loss severity. Accordingly, Li et al. (2013) analyzed vehicle
insurance data in Taiwan, and they found a significantly lower probability of filing vehicle
physical damage claims for insureds who purchase new cars along with vehicle physical
damage policies that bundle with high insured amount voluntary vehicle liability
source for insurance providers. Based on previous statistical data, the income from
automobile insurance premium accounts for approximately fifty per cent of all premium
income; meanwhile, vehicle insurance claims comprise sixty per cent of the claim total.
Judging from both revenue and expense aspects, steadily managing automobile
For a long time, the Taiwanese automobile insurance market was regulated, so
decided basic premium for each insurance policy and related characteristics which
age and gender (gender-age coefficient) and claim record (claim coefficient) into
general, the coefficient for males is higher than that for females, and young drivers are also
assigned relatively higher coefficients. The claim coefficient is calculated from the
claim record in the previous three years, measured as cumulative claim points. As for
characteristics of insured vehicle, official rates consider insured vehicle’s usage, type,
age, brand, and style (manufacture coefficient). Therefore, the administration calculates
Premium = Basic Premium × (Gender-age Coefficient + Claim Coefficient)
× Manufacture Coefficient
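As a minimal sketch of how the rating formula above combines its parts, the Python snippet below evaluates it with the gender-age coefficients listed in Table 1. The function names and the example inputs (a basic premium of 20,000, a claim coefficient of -0.3, and a manufacture coefficient of 1.1) are hypothetical values chosen for illustration and are not taken from the official tariff.

```python
# Gender-age coefficients transcribed from Table 1:
# (lower age bound of the band, male coefficient, female coefficient).
GENDER_AGE_COEFFICIENTS = [
    (0, 1.89, 1.70),
    (20, 1.74, 1.57),
    (25, 1.15, 1.04),
    (30, 1.00, 0.90),
    (60, 1.07, 0.96),
    (70, 1.07, 0.96),
]


def gender_age_coefficient(age, is_male):
    """Look up the coefficient of the highest age band whose lower bound is <= age."""
    coefficient = None
    for lower_bound, male, female in GENDER_AGE_COEFFICIENTS:
        if age >= lower_bound:
            coefficient = male if is_male else female
    return coefficient


def premium(basic_premium, age, is_male, claim_coefficient, manufacture_coefficient):
    """Premium = Basic Premium x (Gender-age Coeff. + Claim Coeff.) x Manufacture Coeff."""
    rating = gender_age_coefficient(age, is_male) + claim_coefficient
    return basic_premium * rating * manufacture_coefficient


# Hypothetical policyholder: 35-year-old male with a claim-free discount.
example = premium(20000, 35, True, claim_coefficient=-0.3, manufacture_coefficient=1.1)
print(round(example, 2))
```

Because the gender-age and claim coefficients are added before multiplying, a negative claim coefficient acts as a discount on the demographic loading rather than on the whole premium.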
can determine their own vehicle insurance premium rates since April 2009. However,
administrative institutions.
[Insert Table 1]
Data
This paper uses a data set of private vehicle damage insurance policy and claim
information for the policy years between 2010 and 2012 from the Taiwan Insurance
policyholder (age, gender, and marital status), characteristics of the vehicle (car age,
car model, and exhaust), premium, and deductible type. The claim information includes
demographic characteristics of the claimed driver (age, gender, and marital status),
The vehicle damage insurance policy covers accidents to the car, including
rollover, lightning, fire, explosion, damage from flying objects, and collision, and it
(3,000/5,000/7,000 New Taiwan Dollar) and straight deductible. In our data set, about
different types of deductibles will affect whether an insured makes a claim for car
[Insert Table 2]
Our data are reorganized by policy year from 2010 to 2012. The number of
policies and claim ratio are shown in Table 2. According to the claim information, the
proportion of claimed policies among all policies (the claim ratio) is
about 48 percent in each policy year. Moreover, we also observe the claim coefficient, which can serve as
a measure of long-term claim history. Its average is negative in all three policy years,
which shows that many policyholders received a discount on their vehicle damage insurance
premium.
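The overall claim ratio can be cross-checked from the sub-sample rows of Table 2 as a policy-weighted average. The sketch below does this for one policy year, assuming that the first value in each group of three columns in those rows refers to the 2010 policy year; the helper name is illustrative.

```python
# (number of policies, claim ratio) taken from the Table 2 sub-sample rows,
# assuming the first column of each triple is the 2010 policy year.
new_cars = (61236, 0.5913)
old_cars = (73846, 0.3894)
new_policies = (79112, 0.5597)
renewed_policies = (55970, 0.3696)


def pooled_claim_ratio(*groups):
    """Policy-weighted average claim ratio over disjoint groups of policies."""
    total_policies = sum(count for count, _ in groups)
    claimed_policies = sum(count * ratio for count, ratio in groups)
    return claimed_policies / total_policies


by_car_age = pooled_claim_ratio(new_cars, old_cars)
by_policy_type = pooled_claim_ratio(new_policies, renewed_policies)
print(f"{by_car_age:.4f} {by_policy_type:.4f}")  # prints 0.4809 0.4809
```

Both splits cover the same 135,082 policies and give the same pooled ratio, which agrees with the figure of about 48 percent.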
[Insert Figure 1]
As shown in Figure 1, the share of new cars exceeds 40 percent, and one-year-old
cars account for about 20 percent. The shares of older cars display a declining tendency.
Accordingly, we observe the claim history of both new car and old car policies, shown
in Table 2. New car policies have higher claim ratios than old car policies in all three
policy years. Similarly, compared with renewed policies, claim ratios are relatively high
for new policies, which include new car policies and policies transferred
from other insurers. For insurance companies, renewed policyholders seem to have lower
Methodology
for prediction models. Given our data set, we use the samples in the 2010 and 2011
policy years to estimate prediction models and make in-sample predictions,
respectively. For out-of-sample predictions, the samples in 2011 and 2012 are
used correspondingly. Therefore, we have two sub data sets, the
2010-2011 and 2011-2012 policy years, for estimation and prediction to confirm
where Claim is a binary variable for whether a claim was filed during the insurance
and marital status); X2 are characteristics of the vehicle (such as car age, exhaust, car
model, and domestic car); X3 are other variables (such as insured district, insurance
company, claim experience in the last policy year, and so on); β1, β2, and β3 are
classification table. Comparing the predicted probability with the cutpoint
determines whether a predicted event will occur. If the probability of the predicted
event is greater than or equal to the cutpoint, the event is predicted to
occur; otherwise it is predicted not to occur. The classification table applied in our study is shown
in Table 3. Of the total claimed policies (A+B), A is the number of policies for which the
predicted probability correctly forecasts a claim filed during the policy period, and B is
the number of policies falsely predicted. The sensitivity (= A / (A+B)) is the percentage
and F are numbers of claimed and non-claimed policies, respectively. The cutpoint is
usually 0.5, but this paper uses a sensitivity analysis to determine it through
[Insert Table 3]
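The classification table can be illustrated with a short Python sketch. It follows the cell labels A to D of Table 3 and the definitions in the table notes, assuming that C and D count non-claimed policies predicted as claim and as no claim, so that specificity is D / (C+D) and total correct is (A+D) divided by all policies. The function name and the toy data are hypothetical, and both claimed and non-claimed policies are assumed to be present.

```python
def classification_table(actual, predicted_prob, cutpoint=0.5):
    """Cross-tabulate actual claims against predictions made at a given cutpoint."""
    a = b = c = d = 0
    for claimed, prob in zip(actual, predicted_prob):
        # A claim is predicted when the probability is >= the cutpoint.
        predicted_claim = prob >= cutpoint
        if claimed and predicted_claim:
            a += 1  # claimed, predicted claim
        elif claimed:
            b += 1  # claimed, predicted no claim
        elif predicted_claim:
            c += 1  # not claimed, predicted claim
        else:
            d += 1  # not claimed, predicted no claim
    return {
        "A": a, "B": b, "C": c, "D": d,
        "sensitivity": a / (a + b),
        "specificity": d / (c + d),
        "total_correct": (a + d) / (a + b + c + d),
    }


# Hypothetical toy data: 1 = claim filed, 0 = no claim.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted_prob = [0.9, 0.6, 0.4, 0.3, 0.7, 0.2, 0.1, 0.55]

for cutpoint in (0.3, 0.5, 0.7):
    table = classification_table(actual, predicted_prob, cutpoint)
    print(cutpoint, table["sensitivity"], table["specificity"], table["total_correct"])
```

Lowering the cutpoint raises sensitivity at the cost of specificity, which is why the choice of cutpoint matters for the reported accuracy rates.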
5. Empirical Results
After controlling for characteristics of the insured and vehicle, we predict the claim
probability using prediction models and then determine whether a claim is made during
the policy period by the cutpoint which is determined by sensitivity analysis. We apply
the percentages of correct prediction to the estimation sample (in the estimated policy
year) and holdout sample (in the next policy year) in order to evaluate the accuracy of
the prediction models. Prediction results are divided into the 2010 and 2011 policy
years. We display only the prediction results for the 2010 policy year; the others are shown
in the Appendix.
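The estimation and holdout procedure described above can be sketched in Python as follows. Everything in this example is an assumption made for illustration: the data are simulated, the model has a single covariate, the logit is fitted by plain gradient ascent, and the cutpoint is chosen as the value that maximizes total correct in the estimation sample, which is only one possible reading of a sensitivity analysis. It is not the paper's estimation code.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, rng):
    """Simulated policies: one covariate x and a claim indicator y (illustration only)."""
    rows = []
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)
        y = 1 if rng.random() < sigmoid(-0.5 + 1.5 * x) else 0
        rows.append((x, y))
    return rows


def fit_logit(rows, steps=2000, rate=0.5):
    """Fit Pr(y = 1) = sigmoid(b0 + b1 * x) by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(rows)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in rows:
            error = y - sigmoid(b0 + b1 * x)
            g0 += error
            g1 += error * x
        b0 += rate * g0 / n
        b1 += rate * g1 / n
    return b0, b1


def total_correct(rows, b0, b1, cutpoint):
    """Share of policies whose predicted outcome matches the actual outcome."""
    hits = sum((sigmoid(b0 + b1 * x) >= cutpoint) == (y == 1) for x, y in rows)
    return hits / len(rows)


rng = random.Random(0)
estimation = simulate(400, rng)  # stands in for the estimated policy year
holdout = simulate(400, rng)     # stands in for the next policy year

b0, b1 = fit_logit(estimation)

# Scan candidate cutpoints on the estimation sample only.
cutpoints = [c / 100 for c in range(5, 100, 5)]
best_cut = max(cutpoints, key=lambda c: total_correct(estimation, b0, b1, c))

in_sample = total_correct(estimation, b0, b1, best_cut)
out_of_sample = total_correct(holdout, b0, b1, best_cut)
print(best_cut, round(in_sample, 3), round(out_of_sample, 3))
```

The cutpoint and coefficients are fixed using the estimation sample alone, so the holdout accuracy is an out-of-sample measure in the same sense as the holdout rows of the result tables.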
First, we examine the individual effects of the rating characteristics; results for the
2010 policy year are shown in Table 4. According to the estimation results, all rating factors, which
include insured age, gender, claim coefficient, car age, exhaust, and car model, have
significant effects on claim occurrence under the Wald test, as shown in Model 1. We then
predict claim occurrence while excluding one of the six
rating characteristics at a time from the basic model, Model 1; the results are shown in Models
2-7. When the claim coefficient is removed from the prediction model, the ratio of total
correct in Model 3 is slightly smaller than that in Model 1 for the estimation sample, and the
difference rises for the holdout sample. We have similar findings for car age and car model.
These results demonstrate that claim coefficient, car age, and car model are relatively
important among the current rating characteristics. There are consistent results in the 2011
policy year, shown in Appendix A.
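The comparison against the basic model can be made explicit by computing the drop in total correct for each reduced model. The sketch below uses the Total Correct rows of Table 4 for the 2010 policy year as printed; models are identified by number only, since the characteristic omitted by each model has to be read from the full table. The helper name is illustrative.

```python
# Total Correct (%) from Table 4, 2010 policy year, Models 1 to 7 in order.
estimation = [67.90, 67.90, 67.90, 67.70, 67.80, 67.90, 67.50]
holdout = [66.06, 66.05, 66.04, 65.57, 65.84, 66.03, 65.69]


def drop_from_baseline(values):
    """Percentage-point loss in total correct relative to Model 1."""
    baseline = values[0]
    return {
        f"Model {i + 1}": round(baseline - value, 2)
        for i, value in enumerate(values)
        if i > 0
    }


estimation_drop = drop_from_baseline(estimation)
holdout_drop = drop_from_baseline(holdout)

# Rank the reduced models by how much holdout accuracy they lose.
largest = sorted(holdout_drop, key=holdout_drop.get, reverse=True)[:3]
print(largest)  # prints ['Model 4', 'Model 7', 'Model 5']
```

Three of the reduced models lose between 0.22 and 0.49 percentage points in the holdout sample, while the remaining three lose at most 0.03, which is the pattern the text describes as three characteristics being relatively important.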
shown in Table 5. Model 1 is the basic model, which contains all factors in the rating system.
Models 2-6 each add additional information to the basic model, including the
insured's marital status, claim filed in the last policy year, liability coverage for third-
party's body and property, and claim coefficient of liability coverage. In Model 3, we
find that the information of claim record in the previous policy year could improve
model (Model 6), the ratios of total correct for the estimation and holdout samples are
similar to those in Model 3. This finding shows that claim history in the last policy
information, however, does not yield consistent and robust estimation results, and also
does not improve prediction accuracy. We have consistent outcomes in the 2011 policy
Given the significantly different claim ratios among new car policies, new
policies, and renewed policies, this paper also conducts prediction analyses for three
sub-samples, shown in Table 6. In Panels A and B, there are similar prediction results.
The sensitivity ratios (prediction accuracy of claimed policies) are all higher than 70
percent for the estimation and holdout samples. On the contrary, the ratios of specificity
cannot get more claim information for new car policies and new policies, other
renewed policies, the ratios of sensitivity (prediction accuracy of claimed policies) are
policies) are higher than 80 percent. These prediction results for renewed policies
differ greatly from those for new policies or new car policies. From the ratios of
total correct in Models 2-6, we reach the same finding that claim history in the last
6. Conclusions
A driver’s past claim history has been considered one of the most important
variables for predicting the future number of claims. However, not only can a BMS more
adequately link driving risk to premiums, it can also prompt the insured to drive more
carefully and reduce accident occurrences. As policyholders might change their claim
still effective. In addition, this paper uses an automobile insurance data set to investigate
whether extra information can help predict claim occurrences and improve risk
regression. The percentages of correct prediction for estimation and holdout samples
The empirical results show that all characteristics in the current rating system are
significant factors in prediction models, and claim coefficient, car age, and car model
are relatively important information for risk classification. As indicated by the highest
prediction accuracy, the claim record in the previous policy year is a more
helpful predictor than the other extra information among all prediction models. From the sub-sample
References
Andersson, H., 2005, “The Value of Safety as Revealed in the Swedish Car Market: An
Application of the Hedonic Pricing Approach,” Journal of Risk and Uncertainty,
30(3): 211-239.
Bair, Shyi-Tarn, Rachel J. Huang, and Kili C. Wang, 2012, “Can Vehicle Maintenance
Records Predict Automobile Accidents?” Journal of Risk and Insurance, 79(2):
567-584.
Bond, Eric W. and Keith J. Crocker, 1991, “Smoking, Skydiving and Knitting: The
Endogenous Categorization of Risks in Insurance Markets with Asymmetric
Information,” Journal of Political Economy, 99: 177-200.
Braver, E. R. and R. E. Trempel, 2004, “Are Older Drivers Actually at Higher Risk of
Involvement in Collisions Resulting in Deaths or Non-Fatal Injuries among Their
Passengers and Other Road Users?” Injury Prevention, 10: 27–32.
Butler, P., T. Butler, and L. Williams, 1988, “Sex-Divided Mileage, Accident, and
Insurance Data Show that Auto Insurers Overcharge Most Women,” Journal of
Insurance Regulation, 6: 243-284, 373-416.
Crocker, Keith J. and Arthur Snow, 1985, “The Efficiency of Competitive Equilibria in
Insurance Markets with Asymmetric Information,” Journal of Public Economics,
26: 201-219.
Crocker, Keith J. and Arthur Snow, 1986, “The Efficiency Effects of Categorical
Discrimination in the Insurance Industry,” Journal of Political Economy, 94: 321-
344.
Crocker, Keith J. and Arthur Snow, 1992, “The Social Value of Hidden Information in
Adverse Selection Economies,” Journal of Public Economics, 48: 317-347.
Doerpinghaus, H., J. Schmit and J. J-H. Yeh, 2008, “Age and Gender Effects on Auto
Liability Insurance Payouts,” Journal of Risk and Insurance, 75(3): 527-550.
Hoy, Michael, 1982, “Categorizing Risks in the Insurance Industry,” Quarterly Journal
of Economics, 97: 321-336.
Hoy, Michael, Chu-Shiu Li, Chwen-Chi Liu and Sheng-Chang Peng, 2012, “Risk
Classification, Rating Innovation, and Multiple Contracts in Automobile
Insurance Market,” 39th Seminar of the European Group of Risk and Insurance
Economists, Palma de Mallorca, Spain.
Kellison, B., P. Brockett, S.-H. Shin, and S. Li, 2003, “A Statistical Analysis of the
Relationship Between Credit History and Insurance Losses,” Bureau of Business
Research: McCombs School of Business, The University of Texas at Austin.
Lemaire, J., 1985, Automobile Insurance: Actuarial Models, Boston, MA: Kluwer
Academic Publishers.
Li, Chu-Shiu, Chih-Hao Lin, Chwen-Chi Liu, and A. Woodside, 2012, “Dynamic
Pricing in Regulated Insurance Markets with Heterogeneous Insurers: Strategies
Nice versus Nasty for Customers,” Journal of Business Research, 65: 968-976.
Li, Chu-Shiu, Chwen-Chi Liu, and Sheng-Chang Peng, 2012, “Is Claim Experience a
Good Factor to Predict Automobile Insurance Claim Occurrences?” Asia-Pacific
Risk and Insurance Association 16th Annual Conference, Seoul, South Korea.
Li, Chu-Shiu, Chwen-Chi Liu, and Sheng-Chang Peng, 2013, “The Expiration Date
Effects of Automobile Insurance Contracts: The Curious Case of Last Policy
Month Claims in Taiwan,” Geneva Risk and Insurance Review, 38: 23-47.
Li, Chu-Shiu, Chwen-Chi Liu, and Sheng-Chang Peng, 2013, “Bundled Automobile
Insurance Coverage and Accidents,” Accident Analysis and Prevention, 50: 64-
72.
Li, Chu-Shiu, Chwen-Chi Liu, and Jia-Hsing Yeh, 2007, “The Incentive Effects of
Increasing Per-Claim Deductible Contracts in Automobile Insurance,” Journal of
Risk and Insurance, 74: 441-459.
Miller, M., and R. A. Smith, 2003, “The Relationship of Credit-based Insurance Scores
to Private Passenger Automobile Insurance Loss Propensity,” Actuarial Study,
Epic Actuaries online at http://www.epicactuaries.com.
Moreno, I., F. Vázquez, and R. Watt, 2006, “Can Bonus-Malus Alleviate Insurance
Fraud?” Journal of Risk and Insurance, 73: 123-151.
Puelz, R., and W. Kemmsies, 1993, “Implications for Unisex Statutes and Risk-pooling:
The Costs of Gender and Underwriting Attributes in the Automobile Insurance
Market,” Journal of Regulatory Economics, 5(3): 289-301.
Tefft, Brian C., 2008, “Risks Older Drivers Pose to Themselves and to Other Road
Users,” Journal of Safety Research, 39(6): 577-582.
Wang, Jennifer L., Ching-Fan Chung, and Larry Y. Tzeng, 2008, “An Empirical
Analysis of the Effects of Increasing Deductibles on Moral Hazard,” Journal of
Risk and Insurance, 75: 551-566.
Table 1. Gender-age coefficients
Age Male Female
Under 20 1.89 1.70
20 or above but under 25 1.74 1.57
25 or above but under 30 1.15 1.04
30 or above but under 60 1.00 0.90
60 or above but under 70 1.07 0.96
70 or above 1.07 0.96
Number of policies Claim ratio Average claim coefficient
Sub-sample 2010 2011 2012 2010 2011 2012 2010 2011 2012
New cars 61,236 62,963 49,957 59.13% 58.61% 60.80% -0.01 -0.02 -0.03
Old cars 73,846 78,846 80,975 38.94% 39.64% 41.24% -0.27 -0.29 -0.27
New policies 79,112 86,611 76,146 55.97% 55.67% 55.04% -0.05 -0.06 -0.07
Renewed policies 55,970 55,198 54,786 36.96% 36.13% 39.88% -0.31 -0.33 -0.33
Note: Claim ratio is the proportion of claimed policy numbers in all policies.
Figure 1. Car age pattern of the whole sample for each policy year
Table 3. Classification table
Predicted
Actual % Claim No Claim
Claim Sensitivity A B
No Claim Specificity C D
Total correct E F
Table 4. Examination results of individual effect of rating characteristics for the 2010 policy year
Predictor Model 1 Model 2 Model 3 Model 4 Model 5 Model 6 Model 7
Wald tests of individual effects (Wald Chi-Square and P-value)
Insured age 20.78 Y Y Y Y Y
(0.0004)
Male insured 17.53 Y Y Y Y Y
(<.0001)
Claim coefficient 1304.23 Y Y Y Y Y
(<.0001)
Car age 731.11 Y Y Y Y Y
(<.0001)
Exhaust 88.28 Y Y Y Y Y
(<.0001)
Car model 953.98 Y Y Y Y Y
(<.0001)
Others Deductible, Insured district, Insurance company
Classification Accuracies for Estimation and Holdout Samples (%)
Estimation sample
Sensitivity 64.00 63.90 63.90 63.60 64.20 64.20 64.20
Specificity 71.50 71.50 71.60 71.40 71.10 71.30 70.60
Total Correct 67.90 67.90 67.90 67.70 67.80 67.90 67.50
Observations 135,082
Holdout sample
Sensitivity 64.20 64.18 64.21 65.37 63.79 64.37 64.86
Specificity 67.78 67.77 67.73 65.76 67.74 67.57 66.47
Total Correct 66.06 66.05 66.04 65.57 65.84 66.03 65.69
Observations 141,809
Notes: The sensitivity is the percentage of prediction accuracy of the total claimed policies. The specificity is
the percentage of prediction accuracy of the total non-claimed policies. Total correct is the percentage of
prediction accuracy of the total policies.
Table 5. Estimation and prediction results of additional information for the 2010 policy year
Predictor Model 1 Model 2 Model 3 Model 4 Model 5 Model 6
Logit Predictor Coefficients for Estimation Sample
Basic risk factors Y Y Y Y Y Y
Table 6. Prediction results of additional information for different sub-samples for the 2010 policy year
Variable Model 1 Model 2 Model 3 Model 4 Model 5 Model 6 Model 1 Model 2 Model 3 Model 4 Model 5 Model 6
Basic risk factors Y Y Y Y Y Y Y Y Y Y Y Y
Married insured Y Y Y Y
Claim filed in last policy year Y Y Y Y
Body liability coverage Y Y Y Y
Property liability coverage Y Y Y Y
Claim coefficient of liability coverage Y Y Y Y
Estimation sample Holdout sample
Panel A: New car policies
Sensitivity 77.90 78.00 78.00 78.30 78.30 74.31 74.35 74.38 73.47 73.37
Specificity 55.10 55.30 55.20 54.50 54.50 53.45 53.37 53.35 55.04 55.27
Total Correct 68.60 68.70 68.70 68.70 68.70 65.68 65.67 65.68 65.84 65.87
Observations 61,236 62,963
Panel B: New policies
Sensitivity 77.00 77.00 77.00 77.60 78.00 71.38 71.39 71.39 70.33 70.25
Specificity 56.10 56.10 56.10 55.30 55.50 54.40 54.39 54.44 56.24 56.54
Total Correct 67.80 67.80 67.80 67.90 68.20 63.85 63.85 63.87 64.08 64.18
Observations 79,112 86,611
Panel C: Renewed policies
Sensitivity 40.50 40.50 41.00 40.70 41.00 41.50 43.15 43.23 44.83 43.50 43.29 44.71
Specificity 84.80 84.80 85.50 84.60 84.40 85.20 85.04 85.04 85.66 84.83 84.93 85.52
Total Correct 68.40 68.40 69.10 68.40 68.20 68.90 69.90 69.93 70.90 69.90 69.88 70.78
Observations 55,970 55,198
Notes: *** denotes statistical significance at the 1 percent level. Basic risk factors include a limited number of a priori classification variables for calculating premiums. The
sensitivity is the percentage of prediction accuracy of the total claimed policies. The specificity is the percentage of prediction accuracy of the total non-claimed policies. Total
correct is the percentage of prediction accuracy of the total policies.
Appendix A. Examination results of individual effect of rating characteristics for the 2011 policy year
Predictor Model 1 Model 2 Model 3 Model 4 Model 5 Model 6 Model 7
Wald tests of individual effects (Wald Chi-Square and P-value)
Insured age 24.92 Y Y Y Y Y
(<.0001)
Male insured 42.83 Y Y Y Y Y
(<.0001)
Claim coefficient 1079.57 Y Y Y Y Y
(<.0001)
Car age 1121.49 Y Y Y Y Y
(<.0001)
Exhaust 100.47 Y Y Y Y Y
(<.0001)
Car model 1548.85 Y Y Y Y Y
(<.0001)
Others Deductible, Insured district, Insurance company
Classification Accuracies for Estimation and Holdout Samples (%)
Estimation sample
Sensitivity 59.40 59.30 59.20 59.00 58.80 59.70 59.40
Specificity 73.40 73.50 73.50 73.20 73.30 73.00 71.80
Total Correct 66.70 66.60 66.60 66.30 66.40 66.60 65.90
Observations 141,809
Holdout sample
Sensitivity 52.01 52.00 51.96 52.07 52.17 51.89 52.42
Specificity 65.40 65.34 65.45 64.04 64.81 65.57 63.94
Total Correct 58.88 58.85 58.88 58.21 58.66 58.91 58.33
Observations 130,932
Notes: The sensitivity is the percentage of prediction accuracy of the total claimed policies. The specificity is
the percentage of prediction accuracy of the total non-claimed policies. Total correct is the percentage of
prediction accuracy of the total policies.
Appendix B. Estimation and prediction results of additional information for the 2011 policy year
Predictor Model 1 Model 2 Model 3 Model 4 Model 5 Model 6
Logit Predictor Coefficients for Estimation Sample
Basic risk factors Y Y Y Y Y Y
Appendix C. Prediction results of additional information for different sub-samples for the 2011 policy year
Variable Model 1 Model 2 Model 3 Model 4 Model 5 Model 6 Model 1 Model 2 Model 3 Model 4 Model 5 Model 6
Basic risk factors Y Y Y Y Y Y Y Y Y Y Y Y
Married insured Y Y Y Y
Claim filed in last policy year Y Y Y Y
Body liability coverage Y Y Y Y
Property liability coverage Y Y Y Y
Claim coefficient of liability coverage Y Y Y Y
Estimation sample Holdout sample
Panel A: New car policies
Sensitivity 74.70 77.70 78.80 78.20 80.30 61.80 66.84 69.85 64.46 68.61
Specificity 54.10 51.20 52.50 51.10 51.20 47.61 44.67 39.96 47.24 41.73
Total Correct 66.20 66.80 67.90 67.20 68.50 56.24 58.15 58.13 57.71 58.07
Observations 62,963 49,957
Panel B: New policies
Sensitivity 80.20 79.80 77.00 81.20 78.70 73.04 73.11 71.69 71.75 70.54
Specificity 45.70 46.30 51.60 44.60 50.10 45.10 45.09 43.46 46.95 45.39
Total Correct 64.90 65.00 65.80 65.30 66.30 60.49 60.51 59.00 60.61 59.23
Observations 86,611 76,146
Panel C: Renewed policies
Sensitivity 47.20 47.50 48.50 48.40 48.50 50.20 33.08 33.08 34.05 36.91 33.00 38.40
Specificity 83.40 83.20 84.90 82.50 82.60 83.40 82.56 82.50 83.75 79.42 82.51 80.27
Total Correct 70.30 70.30 71.70 70.20 70.10 71.30 62.82 62.79 63.93 62.46 62.76 63.57
Observations 55,198 54,786
Notes: Basic risk factors include a limited number of a priori classification variables for calculating premiums. The sensitivity is the percentage of prediction accuracy of the
total claimed policies. The specificity is the percentage of prediction accuracy of the total non-claimed policies. Total correct is the percentage of prediction accuracy of the
total policies.