BookSlides 3A Data Exploration
1 The Data Quality Report
2 Getting To Know The Data
3 Identifying Data Quality Issues
4 Handling Data Quality Issues
5 Summary
ID | Type | Inc. | Marital Status | Num Clmnts. | Injury Type | Hospital Stay | Claim Amnt. | Total Claimed | Num Claims | Num Soft Tiss. | % Soft Tiss. | Claim Amt Rcvd. | Fraud Flag
1 CI 0 2 Soft Tissue No 1,625 3250 2 2 1.0 0 1
2 CI 0 2 Back Yes 15,028 60,112 1 0 15,028 0
3 CI 54,613 Married 1 Broken Limb No -99,999 0 0 0 0 572 0
4 CI 0 4 Broken Limb Yes 5,097 11,661 1 1 1.0 7,864 0
5 CI 0 4 Soft Tissue No 8869 0 0 0 0 0 1
6 CI 0 1 Broken Limb Yes 17,480 0 0 0 0 17,480 0
7 CI 52,567 Single 3 Broken Limb No 3,017 18,102 2 1 0.5 0 1
8 CI 0 2 Back Yes 7463 0 0 0 0 7,463 0
9 CI 0 1 Soft Tissue No 2,067 0 0 0 0 2,067 0
10 CI 42,300 Married 4 Back No 2,260 0 0 0 0 2,260 0
. . .
. . .
. . .
300 CI 0 2 Broken Limb No 2,244 0 0 0 0 2,244 0
301 CI 0 1 Broken Limb No 1,627 92,283 3 0 0 1,627 0
302 CI 0 3 Serious Yes 270,200 0 0 0 0 270,200 0
303 CI 0 1 Soft Tissue No 7,668 92,806 3 0 0 7,668 0
304 CI 46,365 Married 1 Back No 3,217 0 0 0 1,653 0
. . .
. . .
. . .
458 CI 48,176 Married 3 Soft Tissue Yes 4,653 8,203 1 0 0 4,653 0
459 CI 0 1 Soft Tissue Yes 881 51,245 3 0 0 0 1
460 CI 0 3 Back No 8,688 729,792 56 5 0.08 8,688 0
461 CI 47,371 Divorced 1 Broken Limb Yes 3,194 11,668 1 0 0 3,194 0
462 CI 0 1 Soft Tissue No 6,821 0 0 0 0 0 1
. . .
. . .
. . .
491 CI 40,204 Single 1 Back No 75,748 11,116 1 0 0 0 1
492 CI 0 1 Broken Limb No 6,172 6,041 1 0 6,172 0
493 CI 0 1 Soft Tissue Yes 2,569 20,055 1 0 0 2,569 0
494 CI 31,951 Married 1 Broken Limb No 5,227 22,095 1 0 0 5,227 0
495 CI 0 2 Back No 3,813 9,882 3 0 0 0 1
496 CI 0 1 Soft Tissue No 2,118 0 0 0 0 0 1
497 CI 29,280 Married 4 Broken Limb Yes 3,199 0 0 0 0 0 1
498 CI 0 1 Broken Limb Yes 32,469 0 0 0 0 16,763 0
499 CI 46,683 Married 1 Broken Limb No 179,448 0 0 0 179,448 0
500 CI 0 1 Broken Limb No 8,259 0 0 0 0 0 1
Table: A data quality report for the motor insurance claims fraud detection ABT
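To make the idea of a data quality report concrete, here is a minimal sketch in Python. It assumes the report tabulates count, percentage missing, cardinality, and a few summary statistics per feature; that choice of statistics and the helper names `continuous_report` and `categorical_report` are illustrative assumptions, not taken from the slides. The sample values are the Claim Amnt. and Marital Status entries of rows 1 to 10 in the table above, with blank cells represented as `None`.

```python
from collections import Counter
from statistics import mean, median


def continuous_report(values):
    """Summarise a continuous feature; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "pct_missing": 100 * (len(values) - len(present)) / len(values),
        "cardinality": len(set(present)),
        "min": min(present),
        "mean": mean(present),
        "median": median(present),
        "max": max(present),
    }


def categorical_report(values):
    """Summarise a categorical feature; None marks a missing value."""
    present = [v for v in values if v is not None]
    mode, mode_freq = Counter(present).most_common(1)[0]
    return {
        "count": len(values),
        "pct_missing": 100 * (len(values) - len(present)) / len(values),
        "cardinality": len(set(present)),
        "mode": mode,
        "mode_pct": 100 * mode_freq / len(present),
    }


# Claim Amnt. for rows 1-10 of the table above.
claim_amount = [1625, 15028, -99999, 5097, 8869, 17480, 3017, 7463, 2067, 2260]
# Marital Status for rows 1-10; blank cells are treated as missing.
marital_status = [None, None, "Married", None, None, None,
                  "Single", None, None, "Married"]

claim_report = continuous_report(claim_amount)
marital_report = categorical_report(marital_status)
print(claim_report)
print(marital_report)
```

Even on ten rows, the minimum of -99,999 for the claim amount and the high share of missing marital status values stand out, which is the kind of signal such a report is meant to surface.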
[Figure: density plots for the features in the ABT. Labeled panels: Income, Num. Claimants, Num. Claims, Num. Soft Tissue, Marital Status (Missing, Married, Single, Divorced), Injury Type (Broken Limb, Soft Tissue, Back, Serious), Hospital Stay (No, Yes), Fraud Flag (0, 1), Insurance Type (CI).]
Uniform
Normal (Unimodal)
In a feature following an exponential distribution, a small range of low values occurs with very high likelihood, and the likelihood diminishes sharply as values increase.
Exponential
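The shape described above can be checked numerically. The following sketch assumes a rate parameter of 1.0 and unit-width bins, both chosen only for illustration, and uses the standard exponential CDF, F(x) = 1 - exp(-rate * x), to compute how much probability falls in each successive range of values.

```python
import math


def exponential_cdf(x, rate=1.0):
    """P(X <= x) for an exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def bin_probabilities(edges, rate=1.0):
    """Probability mass falling between each pair of consecutive edges."""
    return [
        exponential_cdf(upper, rate) - exponential_cdf(lower, rate)
        for lower, upper in zip(edges, edges[1:])
    ]


# Illustrative assumption: rate 1.0 and bins [0,1), [1,2), [2,3), [3,4).
probs = bin_probabilities([0, 1, 2, 3, 4], rate=1.0)
for lower, p in enumerate(probs):
    print(f"[{lower}, {lower + 1}): {p:.3f}")
```

Under these assumptions the lowest bin holds roughly 63% of the probability mass, and each later bin holds less than the one before it.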
A feature characterized by a multimodal distribution has two or more very commonly occurring ranges of values that are clearly separated.
Multimodal
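One crude way to spot clearly separated, commonly occurring ranges is to count runs of well-populated bins in a histogram. The sketch below is a heuristic under stated assumptions, not a statistical test: the helper names, the bin layout, the 10% `min_share` threshold, and the sample values are all made up for illustration.

```python
def histogram(values, low, high, bins):
    """Count values falling into equal-width bins over [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if low <= v <= high:
            # Values equal to `high` are placed in the last bin.
            index = min(int((v - low) / width), bins - 1)
            counts[index] += 1
    return counts


def count_modes(counts, min_share=0.1):
    """Count runs of adjacent bins that each hold at least min_share of the data."""
    threshold = min_share * sum(counts)
    modes = 0
    in_peak = False
    for c in counts:
        if c >= threshold:
            if not in_peak:
                modes += 1
            in_peak = True
        else:
            in_peak = False
    return modes


# Made-up illustrative values: two clusters plus one stray value near zero.
two_clusters = [-3.2, -3.0, -2.9, -3.1, -2.8, -3.3,
                2.9, 3.0, 3.1, 3.2, 2.8, 3.3, 0.1]
one_cluster = [-0.5, -0.2, 0.0, 0.1, 0.3, 0.4]

bimodal_counts = histogram(two_clusters, low=-6, high=6, bins=12)
unimodal_counts = histogram(one_cluster, low=-6, high=6, bins=12)
print(count_modes(bimodal_counts))
print(count_modes(unimodal_counts))
```

The result depends heavily on the bin width and the threshold, so a histogram plot should still be inspected by eye before a feature is labeled multimodal.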
[Figure: two density plots; x-axis: Value, ranging from −6 to 6.]
Table: The data quality plan for the motor insurance fraud prediction ABT.
Handling Outliers
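One common way to handle outliers is a clamp transformation, which replaces values beyond a lower or upper threshold with the threshold itself. The sketch below is a minimal illustration and not necessarily the approach taken in the data quality plan: the thresholds of 1.5 interquartile ranges beyond the quartiles, the inclusive quartile method, and the helper names are assumptions. The sample values are the Claim Amnt. entries of rows 491 to 500 in the ABT extract.

```python
from statistics import quantiles


def iqr_thresholds(values, k=1.5):
    """Lower and upper thresholds at k interquartile ranges beyond the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clamp(values, lower, upper):
    """Replace values below lower with lower and values above upper with upper."""
    return [min(max(v, lower), upper) for v in values]


# Claim Amnt. for rows 491-500 of the ABT extract.
claim_amount = [75748, 6172, 2569, 5227, 3813, 2118, 3199, 32469, 179448, 8259]

lower, upper = iqr_thresholds(claim_amount)
clamped = clamp(claim_amount, lower, upper)
print(lower, upper)
print(clamped)
```

With these ten values only the two largest claims are pulled down to the upper threshold; the rest are left unchanged. Whether clamping is appropriate depends on whether the extreme values are valid, so domain knowledge should guide the choice of thresholds.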
Summary