Assignment 5-Riya Mathew


Assignment 5

Riya Mathew
19021141088

Question:
Riya Mathew (19021141088): seed 088, training-data size 85%

Abstract
Machine learning (ML) is a field of study that gives a machine the ability to
understand data and to learn from it.
In this assignment, we perform linear regression and use its output, such as the
F-statistic and the coefficient p-values, to identify which variables are significant
and to remove the insignificant ones. We also compare the skewness and distribution
of the response variable after applying the sqrt and log functions.

Objective
• To understand regression output such as the F-statistic
• To determine whether there is a relationship between the predictors and the response variable
• To identify which variables are significant
• To remove the insignificant variables for better prediction
• To understand the skewness and distribution of the data set after applying the sqrt and log functions
• To predict values that are reasonably accurate

Methodology
• The method used here is linear regression.
• The response variable analysed here is Price.
• The seed assigned to me is 088 and the training-data size is 85%.
• The regression analysis is therefore carried out using these values.

Syntax
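# Load packages (ggplot2 is used for the density plots)
library(MASS)
library(ggplot2)

# Read the data (path as used in the Output section)
Toyota <- read.csv("C:/Users/Riya/Downloads/Toyota.csv")
Toyota
names(Toyota)
str(Toyota)

# Fix the random seed so the split is reproducible (R reads 088 as 88)
set.seed(088)
# Split the sample data: 85% of the rows for training, the rest for testing
row.number <- sample(1:nrow(Toyota), 0.85 * nrow(Toyota))
train <- Toyota[row.number, ]
test <- Toyota[-row.number, ]
train
dim(train)
test
dim(test)

# Compare the distribution of Price on the raw, log and sqrt scales
ggplot(train, aes(Price)) + geom_density(fill = "green")
ggplot(train, aes(log(Price))) + geom_density(fill = "green")
ggplot(train, aes(sqrt(Price))) + geom_density(fill = "green")

# Fit model 1: log(Price) regressed on all other variables
model1 <- lm(log(Price) ~ ., data = train)
summary(model1)
par(mfrow = c(2, 2))  # show the four diagnostic plots in one window
plot(model1)

# Build model 2: remove the less significant predictors
model2 <- update(model1, ~ . - MetColor - CC)
summary(model2)
plot(model2)

# Predict log(Price) for the test set and back-transform the first prediction
pred <- predict(model2, newdata = test)
pred
exp(9.616124)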
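The row arithmetic behind the 85% split can be checked on its own. The sketch below is only an illustration: it assumes the 1436 rows reported by str(Toyota) in the Output section, and the names n, idx, n_train and n_test are hypothetical, not part of the original script. It calls floor() to make explicit that 0.85 * 1436 = 1220.6 is cut down to 1220 rows, which agrees with the dim(train) result.

```r
# Number of rows, taken from the str(Toyota) output (assumed here, no CSV is read)
n <- 1436

set.seed(88)

# 0.85 * 1436 = 1220.6, so the training size is truncated to 1220
idx <- sample(1:n, floor(0.85 * n))

n_train <- length(idx)               # rows that go to the training set
n_test  <- length(setdiff(1:n, idx)) # remaining rows form the test set

print(c(train = n_train, test = n_test))
```

With the real data, Toyota[idx, ] and Toyota[-idx, ] would then give the two subsets.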
Output
> Toyota <- read.csv("C:/Users/Riya/Downloads/Toyota.csv")
> View(Toyota)
> library(MASS)
> library(ggplot2)
> Toyota
Price Age KM FuelType HP MetColor Automatic CC Doors Weight
1 13500 23 46986 Diesel 90 1 0 2000 3 1165
2 13750 23 72937 Diesel 90 1 0 2000 3 1165
3 13950 24 41711 Diesel 90 1 0 2000 3 1165
4 14950 26 48000 Diesel 90 0 0 2000 3 1165
5 13750 30 38500 Diesel 90 0 0 2000 3 1170
6 12950 32 61000 Diesel 90 0 0 2000 3 1170
7 16900 27 94612 Diesel 90 1 0 2000 3 1245
8 18600 30 75889 Diesel 90 1 0 2000 3 1245
9 21500 27 19700 Petrol 192 0 0 1800 3 1185
10 12950 23 71138 Diesel 69 0 0 1900 3 1105
11 20950 25 31461 Petrol 192 0 0 1800 3 1185
12 19950 22 43610 Petrol 192 0 0 1800 3 1185
13 19600 25 32189 Petrol 192 0 0 1800 3 1185
14 21500 31 23000 Petrol 192 1 0 1800 3 1185
15 22500 32 34131 Petrol 192 1 0 1800 3 1185
16 22000 28 18739 Petrol 192 0 0 1800 3 1185
17 22750 30 34000 Petrol 192 1 0 1800 3 1185
18 17950 24 21716 Petrol 110 1 0 1600 3 1105
19 16750 24 25563 Petrol 110 0 0 1600 3 1065
20 16950 30 64359 Petrol 110 1 0 1600 3 1105
21 15950 30 67660 Petrol 110 1 0 1600 3 1105
22 16950 29 43905 Petrol 110 0 1 1600 3 1170
23 15950 28 56349 Petrol 110 1 0 1600 3 1120
24 16950 28 32220 Petrol 110 1 0 1600 3 1120
25 16250 29 25813 Petrol 110 1 0 1600 3 1120
26 15950 25 28450 Petrol 110 1 0 1600 3 1120
27 17495 27 34545 Petrol 110 1 0 1600 3 1120
28 15750 29 41415 Petrol 110 1 0 1600 3 1120
29 16950 28 44142 Petrol 110 0 0 1600 3 1120
30 17950 30 11090 Petrol 110 1 0 1600 3 1120
31 12950 29 9750 Petrol 97 1 0 1400 3 1100
32 15750 22 35199 Petrol 97 1 0 1400 3 1100
33 15950 27 29510 Petrol 97 1 0 1400 3 1100
34 14950 26 32692 Petrol 97 1 0 1400 3 1100
35 15500 22 41000 Petrol 97 1 0 1400 3 1100
36 15750 26 43000 Petrol 97 0 0 1400 3 1100
37 15950 25 25000 Petrol 97 0 0 1400 3 1100
38 14950 23 10000 Petrol 97 1 0 1400 3 1100
39 15750 32 25329 Petrol 97 1 0 1400 3 1100
40 14750 27 27500 Petrol 97 0 0 1400 3 1100
41 13950 22 49059 Petrol 97 0 0 1400 3 1100
42 16750 27 44068 Petrol 97 1 0 1400 3 1100
43 13950 22 46961 Petrol 97 0 0 1400 3 1100
44 16950 27 110404 Diesel 90 0 0 2000 5 1255
45 16950 22 100250 Diesel 90 0 0 2000 5 1255
46 19000 23 84000 Diesel 90 0 0 2000 5 1270
47 17950 27 79375 Diesel 90 1 0 2000 5 1255
48 15800 22 75048 Petrol 97 1 0 1400 5 1110
49 17950 22 72215 Diesel 90 1 0 2000 5 1255
50 21950 31 64982 Petrol 192 1 0 1800 5 1195
51 17950 22 62636 Diesel 90 1 0 2000 5 1255
52 15750 30 57086 Petrol 97 1 0 1400 5 1110
53 20500 26 56000 Petrol 110 1 1 1600 5 1180
54 21950 27 49866 Petrol 192 1 0 1800 5 1195
55 15500 25 49163 Petrol 110 0 1 1600 5 1165
56 13250 32 45725 Petrol 110 1 0 1600 5 1075
57 15250 28 43210 Petrol 97 0 0 1400 5 1110
58 15250 26 43000 Petrol 97 0 0 1400 5 1110
59 18950 23 39704 Petrol 110 1 1 1600 5 1180
60 15999 30 38950 Petrol 110 1 0 1600 5 1130
61 14950 22 37400 Petrol 97 1 0 1400 5 1110
62 16500 27 37177 Petrol 110 0 0 1600 5 1130
63 18750 31 36544 Petrol 110 1 0 1600 5 1130
64 17950 30 33511 Petrol 110 1 0 1600 5 1130
65 17950 27 32809 Petrol 97 1 0 1400 5 1110
66 16950 26 32181 Petrol 110 1 0 1600 5 1075
67 18950 28 30993 Petrol 110 1 0 1600 5 1130
68 14950 22 30400 Petrol 97 1 0 1400 5 1110
69 22250 22 30000 Diesel 110 1 0 2000 5 1275
70 15950 25 29719 Petrol 97 1 0 1400 5 1110
71 15950 28 29206 Petrol 97 1 0 1400 5 1110
72 12995 32 29198 Petrol 97 1 0 1400 5 1060
73 18950 28 28817 Petrol 110 1 0 1598 5 1130
74 15750 23 28227 Petrol 97 1 0 1400 5 1110
75 19950 28 28000 Petrol 110 0 0 1600 5 1130
76 16950 23 28000 Petrol 110 1 0 1600 5 1115
77 18750 31 25266 Petrol 110 1 0 1600 5 1130
78 18450 27 23489 Petrol 110 0 0 1600 5 1115
79 16895 29 22575 Petrol 110 1 0 1600 5 1115
80 14900 30 22000 Petrol 97 1 0 1400 5 1110
81 18950 25 20019 Petrol 110 1 1 1600 5 1180
82 17250 29 20000 Petrol 110 1 0 1600 5 1115
83 15450 25 17003 Petrol 97 1 0 1400 5 1110
84 17950 31 16238 Petrol 110 1 1 1600 5 1180
85 16650 25 15414 Petrol 97 1 0 1400 5 1110
86 17450 28 8537 Petrol 110 1 0 1600 5 1130
87 14900 30 7000 Petrol 97 1 0 1400 5 1100
88 17950 20 66966 Diesel 90 1 0 2000 3 1245
89 15950 19 51884 Petrol 97 1 0 1400 3 1100
90 21950 19 50005 Diesel 110 1 0 2000 3 1265
91 16450 20 48110 Petrol 97 1 0 1400 3 1100
92 22250 20 37500 Diesel 90 1 0 2000 3 1260
93 19950 16 34472 Diesel 90 1 0 1995 3 1260
94 15950 20 33329 Petrol 97 1 0 1400 3 1100
95 18900 20 31850 Petrol 110 0 0 1600 3 1120
96 19950 17 30351 Diesel 90 1 0 1995 3 1260
97 15950 19 29435 Petrol 97 1 0 1400 3 1100
98 15950 19 25948 Petrol 97 1 0 1400 3 1100
99 18750 11 24500 Petrol 110 1 0 1600 3 1120
100 17450 18 23902 Petrol 97 1 0 1400 3 1100
[ reached getOption("max.print") -- omitted 1336 rows ]

> names(Toyota)
[1] "Price"     "Age"       "KM"        "FuelType"  "HP"        "MetColor"  "Automatic"
[8] "CC"        "Doors"     "Weight"

> str(Toyota)
'data.frame': 1436 obs. of 10 variables:
 $ Price    : int 13500 13750 13950 14950 13750 12950 16900 18600 21500 12950 ...
 $ Age      : int 23 23 24 26 30 32 27 30 27 23 ...
 $ KM       : int 46986 72937 41711 48000 38500 61000 94612 75889 19700 71138 ...
 $ FuelType : Factor w/ 3 levels "CNG","Diesel",..: 2 2 2 2 2 2 2 2 3 2 ...
 $ HP       : int 90 90 90 90 90 90 90 90 192 69 ...
 $ MetColor : int 1 1 1 0 0 0 1 1 0 0 ...
 $ Automatic: int 0 0 0 0 0 0 0 0 0 0 ...
 $ CC       : int 2000 2000 2000 2000 2000 2000 2000 2000 1800 1900 ...
 $ Doors    : int 3 3 3 3 3 3 3 3 3 3 ...
 $ Weight   : int 1165 1165 1165 1165 1170 1170 1245 1245 1185 1105 ...

> set.seed(088)
> #split the sample data and make the model
> row.number=sample(1:nrow(Toyota),0.85*nrow(Toyota))
> train=Toyota[row.number,]
> test=Toyota[-row.number,]

> train
Price Age KM FuelType HP MetColor Automatic CC Doors Weight
590 9950 55 27500 Petrol 97 1 0 1400 3 1025
148 24500 13 19988 Petrol 110 1 0 1600 5 1130
1063 6750 80 160000 Petrol 86 0 0 1300 3 1015
688 9450 67 99781 Petrol 110 1 0 1600 5 1085
1419 7750 73 39168 Petrol 86 0 0 1300 3 1015
1431 8450 80 23000 Petrol 86 0 0 1300 3 1015
47 17950 27 79375 Diesel 90 1 0 2000 5 1255
1087 6950 77 131307 Petrol 110 1 0 1600 3 1050
967 9500 62 49258 Petrol 110 1 0 1600 4 1035
1394 7250 69 49640 Petrol 110 1 0 1600 4 1035
784 8990 65 76155 Petrol 110 1 0 1600 4 1035
1061 7950 71 164000 Petrol 110 1 0 1600 3 1050
788 9900 68 75000 Petrol 110 1 0 1600 5 1075
258 11750 44 52084 Petrol 97 1 0 1400 3 1025
30 17950 30 11090 Petrol 110 1 0 1600 3 1120
470 11250 54 66063 Petrol 110 1 0 1600 5 1090
146 16450 16 20105 Petrol 97 0 0 1400 5 1110
842 8800 68 66550 Petrol 86 0 0 1332 3 1010
1425 7750 73 34717 Petrol 86 0 0 1300 3 1015
1081 7950 74 137741 Diesel 90 1 0 2000 5 1135
300 13750 39 40000 Petrol 110 1 0 1600 3 1055
1044 9450 66 15110 Petrol 86 0 0 1300 5 1035
1189 7950 71 90370 Petrol 86 1 0 1300 5 1035
1294 8500 77 71825 Petrol 110 0 0 1600 5 1075
1297 7750 79 71359 Petrol 110 1 0 1600 3 1050
996 9950 68 42750 Petrol 110 1 0 1600 3 1050
578 11950 56 33998 Petrol 110 0 0 1600 5 1080
1112 8500 71 120000 Petrol 110 0 0 1600 5 1085
800 8250 65 74179 Petrol 110 1 0 1600 3 1050
547 12500 56 45336 Petrol 110 1 0 1600 5 1080
278 11495 39 46694 Petrol 110 0 0 1600 3 1040
1073 6450 71 151000 CNG 110 1 0 1600 5 1094
894 7995 65 60724 Petrol 86 1 0 1300 3 1015
469 13950 52 66527 Petrol 110 1 0 1600 5 1080
1102 6450 72 123403 Petrol 110 1 0 1600 3 1050
1291 8250 78 72000 Petrol 110 1 0 1600 4 1035
617 9500 62 147636 Diesel 72 0 0 2000 5 1135
1043 10950 67 15535 Petrol 86 1 1 1300 4 1030
1149 6750 74 101000 Petrol 86 0 0 1300 3 1015
1319 8500 78 67255 Petrol 110 0 0 1600 5 1085
838 9750 67 67110 Petrol 86 1 0 1300 5 1035
599 10450 48 15000 Petrol 97 1 0 1400 3 1025
1416 6950 72 42000 Petrol 110 1 0 1600 3 1050
758 9500 68 80121 Petrol 110 0 0 1600 5 1070
318 10950 38 37320 Petrol 97 1 0 1400 3 1025
529 10500 56 48731 Petrol 110 1 0 1600 3 1055
173 19500 8 10077 Petrol 97 1 0 1400 5 1110
656 8250 59 113700 Petrol 110 1 0 1600 5 1065
1133 6640 74 106250 Petrol 110 0 0 1600 5 1070
652 7950 68 115071 Petrol 110 1 0 1600 3 1055
274 13450 34 48011 Petrol 110 1 0 1600 4 1030
65 17950 27 32809 Petrol 97 1 0 1400 5 1110
364 13450 40 23616 Petrol 110 1 0 1600 5 1075
1325 8500 74 66718 Petrol 110 1 0 1600 3 1050
108 17450 17 10000 Petrol 97 1 0 1400 3 1100
43 13950 22 46961 Petrol 97 0 0 1400 3 1100
506 11500 54 55877 Petrol 110 1 0 1600 5 1075
799 9950 64 74193 Petrol 110 0 0 1600 3 1050
273 13500 35 48052 Diesel 69 1 0 1900 3 1105
914 12950 67 58058 Petrol 110 1 0 1600 3 1065
236 11650 38 60829 Petrol 110 1 0 1600 5 1075
823 9500 65 70068 Petrol 110 1 0 1600 5 1075
958 8950 61 51235 Petrol 86 1 0 1300 4 1000
540 11750 52 46449 Petrol 110 1 0 1600 4 1035
396 9000 48 119742 Petrol 110 1 0 1600 5 1080
516 10750 55 52149 Petrol 97 1 0 1400 3 1085
952 8950 57 52548 Petrol 110 0 0 1600 3 1050
144 18500 16 20629 Petrol 110 1 0 1600 4 1090
928 9250 67 56074 Petrol 86 0 0 1300 4 1000
971 9950 63 47612 Petrol 86 0 0 1300 3 1015
325 12950 39 34599 Petrol 110 1 0 1600 5 1075
589 9950 48 28656 Petrol 97 0 0 1400 3 1085
95 18900 20 31850 Petrol 110 0 0 1600 3 1120
1074 7900 75 150000 Diesel 72 1 0 2000 3 1135
1011 9500 60 36943 Petrol 110 0 0 1600 5 1070
677 9500 63 103400 Petrol 110 1 0 1600 5 1075
388 9250 48 142130 CNG 110 0 0 1600 5 1119
856 8950 62 64966 Petrol 86 0 0 1300 3 1020
389 7750 48 140700 Diesel 69 1 0 1900 5 1110
610 5751 67 174833 Diesel 72 0 0 2000 4 1100
1322 7250 80 66880 Petrol 110 1 0 1600 3 1055
138 16250 13 25170 Petrol 110 1 0 1600 5 1105
1216 8950 80 86000 Petrol 110 0 0 1600 3 1050
730 8950 68 86714 Petrol 110 1 0 1600 4 1035
194 11750 40 130062 Diesel 69 1 0 1900 5 1140
230 12750 36 63459 Petrol 97 1 0 1400 5 1060
1234 7950 69 83133 Petrol 86 0 0 1300 3 1015
383 8900 45 174000 Diesel 69 1 0 1900 5 1095
187 6950 43 243000 Diesel 69 0 0 1900 3 1110
1190 8500 78 90345 Petrol 86 1 0 1300 5 1035
804 10950 64 73376 Petrol 110 0 0 1600 5 1070
560 13500 50 39706 Petrol 110 1 0 1600 5 1080
1350 7150 70 61000 Petrol 110 1 0 1600 4 1035
414 11950 51 98040 Petrol 110 0 0 1600 5 1080
1101 7950 74 124057 Petrol 110 1 0 1600 3 1050
190 7750 43 178858 CNG 110 0 0 1600 3 1084
959 8950 65 51000 Petrol 86 1 0 1300 3 1015
1296 7500 80 71500 Petrol 110 1 0 1600 4 1035
1378 8750 73 56307 Petrol 110 1 0 1600 3 1050
1145 7750 77 102000 Petrol 86 1 0 1300 3 1015
[ reached getOption("max.print") -- omitted 1120 rows ]

> dim(train)
[1] 1220 10
> test
Price Age KM FuelType HP MetColor Automatic CC Doors Weight
2 13750 23 72937 Diesel 90 1 0 2000 3 1165
6 12950 32 61000 Diesel 90 0 0 2000 3 1170
7 16900 27 94612 Diesel 90 1 0 2000 3 1245
9 21500 27 19700 Petrol 192 0 0 1800 3 1185
11 20950 25 31461 Petrol 192 0 0 1800 3 1185
20 16950 30 64359 Petrol 110 1 0 1600 3 1105
24 16950 28 32220 Petrol 110 1 0 1600 3 1120
29 16950 28 44142 Petrol 110 0 0 1600 3 1120
33 15950 27 29510 Petrol 97 1 0 1400 3 1100
34 14950 26 32692 Petrol 97 1 0 1400 3 1100
36 15750 26 43000 Petrol 97 0 0 1400 3 1100
38 14950 23 10000 Petrol 97 1 0 1400 3 1100
48 15800 22 75048 Petrol 97 1 0 1400 5 1110
50 21950 31 64982 Petrol 192 1 0 1800 5 1195
56 13250 32 45725 Petrol 110 1 0 1600 5 1075
62 16500 27 37177 Petrol 110 0 0 1600 5 1130
66 16950 26 32181 Petrol 110 1 0 1600 5 1075
67 18950 28 30993 Petrol 110 1 0 1600 5 1130
85 16650 25 15414 Petrol 97 1 0 1400 5 1110
92 22250 20 37500 Diesel 90 1 0 2000 3 1260
103 18500 13 18000 Petrol 71 0 0 1400 3 1125
109 17950 20 7187 Petrol 110 1 0 1600 3 1105
113 24950 8 13253 Diesel 116 1 0 2000 5 1320
118 17900 7 1 Petrol 110 1 0 1600 3 1105
124 18950 20 39115 Petrol 110 1 0 1600 5 1130
135 16500 20 29000 Petrol 97 0 0 1400 5 1110
151 17200 20 17300 Petrol 97 1 0 1400 5 1110
155 21750 13 13178 Petrol 110 1 0 1600 5 1130
156 16868 15 13157 Petrol 97 1 0 1400 4 1085
159 19750 17 11999 Petrol 110 1 0 1600 5 1130
171 18245 9 1 Petrol 110 1 0 1600 5 1075
175 21950 8 9788 Petrol 110 1 0 1600 5 1130
201 11495 44 96829 Petrol 110 1 0 1600 5 1075
203 10500 42 92204 Petrol 110 1 0 1600 5 1075
209 11450 41 84312 Petrol 110 0 0 1600 5 1080
213 11790 34 78677 Petrol 110 1 1 1600 5 1105
221 11950 43 74285 Petrol 110 1 0 1600 5 1075
253 11750 43 53773 Petrol 110 1 0 1600 5 1075
257 13500 38 53000 Petrol 110 1 0 1600 5 1075
262 12495 39 50873 Petrol 110 1 0 1600 5 1075
264 12750 40 50640 Petrol 110 1 0 1600 5 1075
266 11950 38 49500 Petrol 110 1 0 1600 5 1075
269 14750 40 48952 Diesel 90 1 0 2000 5 1205
271 13500 33 48928 Diesel 69 1 0 1900 3 1105
277 13450 39 46821 Petrol 97 1 0 1400 5 1060
279 12750 43 46515 Petrol 97 0 0 1400 5 1025
280 14990 38 46327 Petrol 110 1 0 1600 3 1055
281 12950 35 46304 Petrol 97 1 0 1400 5 1060
293 10500 35 43000 Petrol 110 0 0 1600 3 1050
296 10950 38 41754 Petrol 110 0 0 1600 3 1040
322 10750 36 36269 Petrol 110 1 0 1600 5 1075
329 12950 35 33258 Petrol 110 1 0 1600 5 1075
337 12900 33 31000 Petrol 110 1 0 1600 5 1075
341 11900 41 29716 Petrol 116 1 0 1600 5 1075
352 9950 42 27141 Petrol 97 1 0 1400 5 1060
356 13750 39 25062 Petrol 110 1 0 1600 5 1080
358 14990 33 24650 Petrol 110 1 0 1600 3 1055
360 14350 41 24475 Petrol 110 1 0 1600 5 1030
368 11950 41 21651 Petrol 97 0 0 1400 3 1025
375 12950 40 16325 Petrol 110 1 0 1600 5 1080
386 9900 51 146736 Petrol 110 1 0 1600 5 1080
406 9950 54 103454 Petrol 110 1 0 1600 5 1075
407 10950 51 103018 Diesel 69 0 0 1900 5 1140
428 12500 54 84598 Petrol 110 0 0 1600 5 1075
431 12200 50 82805 Petrol 110 1 0 1600 3 1040
434 11290 49 80320 Petrol 110 1 1 1600 3 1070
455 11950 50 72242 Petrol 110 0 1 1600 3 1070
456 9850 53 72000 Petrol 97 1 0 1400 3 1025
461 9500 55 69813 Petrol 97 1 0 1400 3 1025
492 10750 54 60239 Petrol 110 1 0 1600 5 1075
500 9950 53 57948 Petrol 97 1 0 1400 3 1025
512 11900 51 53408 Petrol 110 1 0 1600 5 1080
517 11950 55 52141 Petrol 110 1 0 1600 5 1070
524 18950 49 49568 Petrol 110 1 0 1600 3 1105
526 10250 52 49432 Petrol 110 1 0 1600 3 1050
532 10250 54 47852 Petrol 110 1 0 1600 4 1030
535 12950 53 47451 Petrol 110 1 0 1600 3 1055
538 9550 54 46856 Petrol 97 0 0 1400 5 1060
543 10500 52 46029 Petrol 110 1 0 1600 4 1030
553 12950 49 41636 Petrol 110 1 0 1600 5 1105
559 11000 47 40000 Petrol 110 0 0 1600 5 1080
570 13000 49 36000 Petrol 110 0 0 1600 5 1080
574 11710 48 35142 Petrol 110 0 0 1600 3 1055
577 11500 46 34000 Petrol 110 1 0 1600 5 1075
579 11500 55 33230 Petrol 110 0 0 1600 3 1050
580 11900 46 33021 Petrol 110 1 0 1600 5 1080
584 10450 46 30806 Petrol 97 1 0 1400 5 1060
586 12950 50 29686 Petrol 110 1 1 1600 3 1075
600 12950 50 10210 Petrol 97 0 0 1400 5 1065
607 7500 59 190900 Diesel 72 1 0 2000 3 1115
622 6900 60 139800 Diesel 72 0 0 2000 3 1115
624 8750 61 136956 Petrol 110 0 0 1600 3 1065
626 8950 64 133769 Diesel 72 1 0 2000 3 1120
630 7750 60 130270 Petrol 110 0 0 1600 3 1050
631 7500 59 130000 Diesel 72 1 0 2000 4 1135
644 10950 57 118833 Petrol 110 0 0 1600 3 1065
648 6950 68 117000 Diesel 72 0 0 2000 3 1115
650 9950 58 115715 Petrol 110 0 0 1600 5 1070
651 9450 60 115191 CNG 110 1 0 1600 4 1079
660 10500 66 112000 Petrol 110 1 0 1600 3 1065
[ reached getOption("max.print") -- omitted 116 rows ]

> dim(test)
[1] 216 10
> ggplot(train,aes(Price))+geom_density(fill="green")

> ggplot(train,aes(log(Price)))+geom_density(fill="green")
> ggplot(train,aes(sqrt(Price)))+geom_density(fill="green")

> #modelfit

> model1=lm(log(Price)~.,data=train)
> summary(model1)

Call:
lm(formula = log(Price) ~ ., data = train)

Residuals:
     Min       1Q   Median       3Q      Max
-0.78788 -0.06304  0.00473  0.07489  0.44290

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     8.611e+00  1.196e-01  71.974  < 2e-16 ***
Age            -1.055e-02  2.463e-04 -42.845  < 2e-16 ***
KM             -1.730e-06  1.240e-07 -13.949  < 2e-16 ***
FuelTypeDiesel  1.096e-01  4.927e-02   2.224  0.02632 *
FuelTypePetrol  7.422e-02  3.009e-02   2.466  0.01378 *
HP              2.921e-03  5.750e-04   5.081 4.35e-07 ***
MetColor        3.621e-03  7.091e-03   0.511  0.60969
Automatic       4.747e-02  1.520e-02   3.124  0.00183 **
CC             -5.881e-05  5.333e-05  -1.103  0.27036
Doors           1.020e-02  3.838e-03   2.658  0.00797 **
Weight          9.344e-04  1.099e-04   8.505  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1151 on 1209 degrees of freedom
Multiple R-squared: 0.8499, Adjusted R-squared: 0.8487
F-statistic: 684.8 on 10 and 1209 DF, p-value: < 2.2e-16

> par(mfrow=c(2,2))
> plot(model1)

> #model building for model2


> #remove less significant

> model2=update(model1,~.-MetColor-CC)
> summary(model2)

Call:
lm(formula = log(Price) ~ Age + KM + FuelType + HP + Automatic +
    Doors + Weight, data = train)

Residuals:
     Min       1Q   Median       3Q      Max
-0.78331 -0.06346  0.00479  0.07416  0.44180

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     8.567e+00  1.119e-01  76.581  < 2e-16 ***
Age            -1.058e-02  2.449e-04 -43.218  < 2e-16 ***
KM             -1.746e-06  1.232e-07 -14.175  < 2e-16 ***
FuelTypeDiesel  6.997e-02  3.383e-02   2.068  0.03884 *
FuelTypePetrol  7.545e-02  3.004e-02   2.511  0.01216 *
HP              2.405e-03  3.247e-04   7.406 2.43e-13 ***
Automatic       4.613e-02  1.515e-02   3.045  0.00238 **
Doors           9.557e-03  3.764e-03   2.539  0.01123 *
Weight          9.482e-04  1.090e-04   8.699  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.115 on 1211 degrees of freedom
Multiple R-squared: 0.8498, Adjusted R-squared: 0.8488
F-statistic: 856.2 on 8 and 1211 DF, p-value: < 2.2e-16

> plot(model2)

> pred=predict(model2,newdata = test)


> pred
2 6 7 9 11 20 24
29 33
9.616124 9.546472 9.611794 9.936508 9.937132 9.553716 9.645234
9.624412 9.610323
34 36 38 48 50 56 62
66 67
9.615348 9.597346 9.686727 9.612299 9.843689 9.555763 9.675754
9.642912 9.675972
85 92 103 109 113 118 124
135 151
9.684702 9.799839 9.739758 9.759390 10.107704 9.909511 9.746446
9.713886 9.734320
155 156 159 171 175 201 203
209 213
9.865821 9.761206 9.825551 9.879014 9.924654 9.339522 9.368765
9.397871 9.551620
221 253 257 262 264 266 269
271 277
9.389478 9.425302 9.479563 9.472696 9.462520 9.485676 9.535154
9.444840 9.434288
279 280 281 293 296 322 329
337 341
9.359306 9.453141 9.477520 9.485958 9.446905 9.529949 9.545790
9.570898 9.502911
352 356 358 360 368 375 386
406 407
9.436911 9.522516 9.543911 9.454967 9.404782 9.527192 9.183025
9.222128 9.212190
428 431 434 455 456 461 492
500 512
9.255060 9.248221 9.337716 9.341241 9.189860 9.172515 9.297603
9.214401 9.346021
517 524 526 532 535 538 543
553 559
9.296423 9.378484 9.294824 9.267012 9.292442 9.275491 9.291360
9.411450 9.411768
570 574 577 579 580 584 586
600 607
9.397589 9.366852 9.428088 9.291374 9.434539 9.388181 9.420306
9.386563 8.938442
622 624 626 630 631 644 648
650 651
9.017106 9.060945 8.990050 9.068982 9.073324 9.134927 8.972267
9.153644 9.056923
660 664 676 680 686 696 697
703 705
9.051619 9.019223 9.055554 9.152423 9.148037 9.054758 9.088936
9.004132 8.989548
706 712 721 727 729 744 748
750 751
9.105939 9.094656 9.147940 9.175740 9.107277 9.147243 9.116007
9.000491 9.187767
755 776 779 789 790 791 792
813 817
9.076116 9.108233 8.986443 9.021697 9.182423 9.174180 8.990144
9.119031 9.194430
818 819 825 831 834 839 859
881 891
9.142021 9.157096 8.998683 9.183275 9.167642 9.204270 9.204983
9.220518 9.098885
895 913 920 922 930 934 940
947 965
9.157711 9.203608 9.218288 9.179207 9.187371 9.185499 9.215442
9.193747 9.170585
968 969 974 990 995 1010 1014
1019 1036
9.200337 9.203484 9.144377 9.268629 9.088167 9.242540 9.089810
9.223775 9.302023
1040 1049 1056 1067 1068 1070 1071
1075 1080
9.310244 8.777094 8.843169 8.893215 8.833354 8.895318 8.864234
8.910505 8.814942
1083 1086 1113 1124 1125 1132 1134
1141 1150
8.856365 8.873355 8.939254 8.985792 8.895928 8.950888 8.903800
8.871067 8.951421
1159 1163 1169 1172 1177 1179 1180
1196 1199
9.075976 8.861659 8.847450 8.980593 8.959035 9.049788 8.932941
8.956277 8.890537
1204 1206 1222 1224 1229 1236 1262
1264 1265
8.984071 8.893845 9.010496 9.017727 9.022192 9.035657 8.984215
8.935862 9.035072
1271 1280 1286 1302 1327 1328 1330
1331 1334
9.017975 8.917620 8.989622 9.073605 9.007746 8.900718 8.890566
9.031301 9.098085
1336 1339 1341 1347 1349 1355 1357
1364 1370
8.977816 8.973948 9.007276 8.977308 9.055966 8.990395 9.011808
9.098297 8.893024
1373 1385 1401 1403 1409 1414 1421
1424 1436
9.027170 9.057469 9.144998 8.975148 9.007229 9.179873 9.142756
8.977514 9.206975

> exp(9.616124)
[1] 15004.78
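
Because the model is fitted on log(Price), each prediction has to be passed through exp() to get back to the price scale, as done above for the first test row. The sketch below applies the same step to the first three predictions and compares them with the actual prices of test rows 2, 6 and 7; all numbers are copied from the output above. The rmse helper is a hypothetical addition for illustration and is not part of the original script.

```r
# Log-scale predictions for test rows 2, 6 and 7 (copied from the pred output)
log_pred <- c(`2` = 9.616124, `6` = 9.546472, `7` = 9.611794)

# Actual prices of the same rows (copied from the test listing)
actual <- c(`2` = 13750, `6` = 12950, `7` = 16900)

# exp() undoes the log() applied to Price in the model formula
price_pred <- exp(log_pred)

# Root mean squared error on the price scale (hypothetical helper)
rmse <- function(obs, fit) sqrt(mean((obs - fit)^2))

print(round(price_pred, 2))
print(rmse(actual, price_pred))
```

With the full objects from the script, the same idea would be exp(pred) compared against test$Price.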

Interpretation
From the analysis of model 1, I understood the following about the variables:

• Age, KM, HP and Weight are highly significant (‘***’, significant at the 0.1% level).
• Automatic and Doors are significant at the 1% level (‘**’).
• FuelTypeDiesel and FuelTypePetrol are significant at the 5% level (‘*’).
• MetColor and CC are not significant (p-values 0.60969 and 0.27036), so in model 2 I remove these variables with the aim of getting more accurate values.
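
The significance codes printed by summary() follow fixed p-value cut-offs, listed in the "Signif. codes" line of the output. A minimal sketch of that mapping, using the p-values copied from the model 1 summary. The entries reported as "< 2e-16" are entered as 2e-16, which is an assumption made only to have a number to work with; the names p and stars are illustrative.

```r
# p-values from the model 1 coefficient table ("< 2e-16" entered as 2e-16)
p <- c(Age = 2e-16, KM = 2e-16, FuelTypeDiesel = 0.02632,
       FuelTypePetrol = 0.01378, HP = 4.35e-07, MetColor = 0.60969,
       Automatic = 0.00183, CC = 0.27036, Doors = 0.00797, Weight = 2e-16)

# Cut-offs taken from the "Signif. codes" legend: 0, 0.001, 0.01, 0.05, 0.1, 1
stars <- cut(p,
             breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
             labels = c("***", "**", "*", ".", " "),
             include.lowest = TRUE)

print(data.frame(p_value = p, code = stars))
```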

Age            -1.055e-02  2.463e-04 -42.845  < 2e-16 ***
KM             -1.730e-06  1.240e-07 -13.949  < 2e-16 ***
FuelTypeDiesel  1.096e-01  4.927e-02   2.224  0.02632 *
FuelTypePetrol  7.422e-02  3.009e-02   2.466  0.01378 *
HP              2.921e-03  5.750e-04   5.081 4.35e-07 ***
MetColor        3.621e-03  7.091e-03   0.511  0.60969
Automatic       4.747e-02  1.520e-02   3.124  0.00183 **
CC             -5.881e-05  5.333e-05  -1.103  0.27036
Doors           1.020e-02  3.838e-03   2.658  0.00797 **
Weight          9.344e-04  1.099e-04   8.505  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

• The F-statistic is 684.8 on 10 and 1209 degrees of freedom, with a p-value below 2.2e-16.
• Comparing the sqrt and log functions, log(Price) is closer to a normal distribution, so I have used the log function in the model.
• According to the diagnostic plots, the points are not evenly distributed on both sides of the line.

In model 2,
• MetColor and CC are not significant, so I removed these variables with the aim of getting more accurate values.
• Since the FuelType levels are still significant, though less strongly than the other predictors, I decided to keep them in my analysis.
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     8.567e+00  1.119e-01  76.581  < 2e-16 ***
Age            -1.058e-02  2.449e-04 -43.218  < 2e-16 ***
KM             -1.746e-06  1.232e-07 -14.175  < 2e-16 ***
FuelTypeDiesel  6.997e-02  3.383e-02   2.068  0.03884 *
FuelTypePetrol  7.545e-02  3.004e-02   2.511  0.01216 *
HP              2.405e-03  3.247e-04   7.406 2.43e-13 ***
Automatic       4.613e-02  1.515e-02   3.045  0.00238 **
Doors           9.557e-03  3.764e-03   2.539  0.01123 *
Weight          9.482e-04  1.090e-04   8.699  < 2e-16 ***

• In this model, all the predictors are significant.

• The F-statistic is now 856.2, an increase from 684.8 in model 1.
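The increase mainly reflects that nearly the same R-squared (0.8498 against 0.8499) is now obtained with 8 instead of 10 predictor terms.
The relationship between the predictors and the response variable is shown by the p-value of the F-test, which is below 2.2e-16 in both models.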
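
The overall F-statistic can be recomputed from R-squared, the number of predictor terms and the residual degrees of freedom, which helps to see why it rises when two terms are dropped while R-squared barely changes. A minimal sketch using the rounded values printed in the two summaries; because R-squared is rounded to four decimals, the results land close to, but not exactly on, the reported 684.8 and 856.2. The function name f_from_r2 is hypothetical.

```r
# F = (R2 / k) / ((1 - R2) / df_resid)
# r2: multiple R-squared, k: number of predictor terms, df_resid: residual df
f_from_r2 <- function(r2, k, df_resid) {
  (r2 / k) / ((1 - r2) / df_resid)
}

# Values copied from the model 1 and model 2 summaries
f1 <- f_from_r2(r2 = 0.8499, k = 10, df_resid = 1209)
f2 <- f_from_r2(r2 = 0.8498, k = 8,  df_resid = 1211)

print(round(c(model1 = f1, model2 = f2), 1))
```

The explained share of variance is almost identical in both models, so dividing it by 8 instead of 10 terms accounts for most of the increase.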

 Observation of the plot


 All the four plots look similar to the previous model and we don’t see any major effect.

Conclusion

This example shows how to approach linear regression modelling. By analysing the first and
second models, we can see the change in the F-statistic (from 684.8 to 856.2) and understand
which variables should be removed. Removing the insignificant variables, MetColor and CC,
increases the F-statistic while leaving the fit essentially unchanged (adjusted R-squared 0.8488
against 0.8487), so the simpler model can be used for prediction. With the help of the density
plots, we can decide whether to use the log or the sqrt function, depending on which plot is
closer to a normal distribution. The predictions made once the insignificant variables are
removed help in forecasting values that are close to the actual ones.
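
The choice between the log and sqrt functions can also be checked numerically with a skewness coefficient, where values nearer zero indicate a more symmetric distribution. The sketch below is only an illustration: it uses the first 20 prices of the train listing shown in the Output section rather than the full training set, and the moment-based skewness function is a hypothetical helper written here, not taken from a package.

```r
# Moment-based sample skewness: third central moment over variance^(3/2)
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# First 20 Price values from the train listing in the Output section
price <- c(9950, 24500, 6750, 9450, 7750, 8450, 17950, 6950, 9500, 7250,
           8990, 7950, 9900, 11750, 17950, 11250, 16450, 8800, 7750, 7950)

# Skewness on the raw, sqrt and log scales
s <- c(raw  = skewness(price),
       sqrt = skewness(sqrt(price)),
       log  = skewness(log(price)))

print(round(s, 3))
```

On the full data the same comparison would use train$Price in place of the short price vector.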
