Project 2 Factor Hair Revised Case Study
By:
Shreya Garg
3. psych - includes the functions most useful for personality and psychological research. Some of its functions (e.g., read.file, read.clipboard, describe, pairs.panels, error.bars, and error.dots) are useful for basic data entry and descriptive analyses.
2.1.3 Import and Read the Dataset
The given dataset is in .csv format, so the read.csv() function is used to import the file.
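The import step might look like the following sketch; the file name "Factor-Hair-Revised.csv" is an assumption, while the object name newdata matches the code shown later in the report.

```r
# Hedged sketch of the import step; the file name is an assumption.
newdata <- read.csv("Factor-Hair-Revised.csv", header = TRUE)
dim(newdata)    # number of rows and the 12 variables
str(newdata)    # quick look at the variable types
```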
A total of 12 variables were used for market segmentation in the context of product service management. From this data we are first asked to check for multicollinearity among the variables. Factor analysis is then performed to reduce the predictor variables to 4 factors, and finally a linear regression is fitted with customer satisfaction as the dependent variable and the four factors as the independent variables.
2.3 Bi-Variate Analysis
The scatter plots of all 12 variables below show the distribution and variance of each variable. All of the variables roughly follow a normal (bell-shaped) distribution.
The given dataset does not contain any missing values. This was verified in RStudio by running any(is.na(newdata)), which returned FALSE.
3 Evidence of Multicollinearity
1. Use the corrplot command to display a pictorial representation of the correlation matrix. It indicates the presence of correlations between variables; circles with darker shades of blue indicate stronger positive correlations between variables.
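A minimal sketch of the corrplot call described above, assuming Satisfaction is the 12th column so that the correlation matrix covers the 11 predictors:

```r
library(corrplot)

# Correlation matrix of the 11 predictors; assumes Satisfaction is column 12.
newdata_cor <- cor(newdata[, -12])
corrplot(newdata_cor, method = "circle")   # darker blue circles = stronger positive correlation
```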
The same can also be seen numerically to get a clearer picture.
2. After studying the above graphs, we can perform the following tests to assess multicollinearity in the dataset.
print(cortest.bartlett(newdata_cor,nrow(newdata)))
$chisq
[1] 619.2726
$p.value
[1] 1.79337e-96
$df
[1] 55
From the above output, we can see that the p-value is far below the significance level of 0.05, so we reject the null hypothesis of Bartlett's test that the correlation matrix is an identity matrix (i.e., that the variables are uncorrelated).
From the KMO test output, we can infer that the data is suitable for factor analysis, as the overall MSA > 0.5.
A KMO value above 0.5 together with a Bartlett's test significance level below 0.05 suggests there is substantial correlation in the data. KMO measures of sampling adequacy (MSA) are also reported for each variable, indicating how strongly that variable is correlated with the other variables; values above 0.5 are acceptable.
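The per-variable MSA values mentioned above come from the psych package's KMO() function; a minimal sketch:

```r
library(psych)

# Reports the overall MSA and one MSA per predictor; values above 0.5
# support proceeding with factor analysis.
KMO(newdata_cor)
```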
Code in R (using the vif function from the car package):
vif(lm(Satisfaction ~ ProdQual + Ecom + TechSup + CompRes + Advertising +
       ProdLine + SalesFImage + ComPricing + WartyClaim + OrdBilling +
       DelSpeed, data = newdata))
ProdQual Ecom TechSup CompRes
1.635797 2.756694 2.976796 4.730448
Advertising ProdLine SalesFImage ComPricing
1.508933 3.488185 3.439420 1.635000
WartyClaim OrdBilling DelSpeed
3.198337 2.902999 6.516014
As we can see, all values lie in the range 1 < VIF < 10, indicating moderate multicollinearity among the predictors; DelSpeed has the highest VIF (6.52).
1. Satisfaction~ProdQual
Call:
lm(formula = Satisfaction ~ ProdQual)
Residuals:
Min 1Q Median 3Q Max
-1.88746 -0.72711 -0.01577 0.85641 2.25220
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.67593 0.59765 6.151 1.68e-08 ***
ProdQual 0.41512 0.07534 5.510 2.90e-07 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
2. Satisfaction~Ecom
Call:
lm(formula = Satisfaction ~ Ecom)
Residuals:
Min 1Q Median 3Q Max
-2.37200 -0.78971 0.04959 0.68085 2.34580
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.1516 0.6161 8.361 4.28e-13 ***
Ecom 0.4811 0.1649 2.918 0.00437 **
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
3. Satisfaction~TechSup
Call:
lm(formula = Satisfaction ~ TechSup)
Residuals:
Min 1Q Median 3Q Max
-2.26136 -0.93297 0.04302 0.82501 2.85617
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.44757 0.43592 14.791 <2e-16 ***
TechSup 0.08768 0.07817 1.122 0.265
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
4. Satisfaction~CompRes
Call:
lm(formula = Satisfaction ~ CompRes)
Residuals:
Min 1Q Median 3Q Max
-2.40450 -0.66164 0.04499 0.63037 2.70949
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.68005 0.44285 8.310 5.51e-13 ***
CompRes 0.59499 0.07946 7.488 3.09e-11 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
5. Satisfaction~Advertising
Call:
lm(formula = Satisfaction ~ Advertising)
Residuals:
Min 1Q Median 3Q Max
-2.34033 -0.92755 0.05577 0.79773 2.53412
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.6259 0.4237 13.279 < 2e-16 ***
Advertising 0.3222 0.1018 3.167 0.00206 **
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
6. Satisfaction~ProdLine
Call:
lm(formula = Satisfaction ~ ProdLine)
Residuals:
Min 1Q Median 3Q Max
-2.3634 -0.7795 0.1097 0.7604 1.7373
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.02203 0.45471 8.845 3.87e-14 ***
ProdLine 0.49887 0.07641 6.529 2.95e-09 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
7. Satisfaction~SalesFImage
Call:
lm(formula = Satisfaction ~ SalesFImage)
Residuals:
Min 1Q Median 3Q Max
-2.2164 -0.5884 0.1838 0.6922 2.0728
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.06983 0.50874 8.000 2.54e-12 ***
SalesFImage 0.55596 0.09722 5.719 1.16e-07 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
8. Satisfaction~ComPricing
Call:
lm(formula = Satisfaction ~ ComPricing)
Residuals:
Min 1Q Median 3Q Max
-1.9728 -0.9915 -0.1156 0.9111 2.5845
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.03856 0.54427 14.769 <2e-16 ***
ComPricing -0.16068 0.07621 -2.108 0.0376 *
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
9. Satisfaction~WartyClaim
Call:
lm(formula = Satisfaction ~ WartyClaim)
Residuals:
Min 1Q Median 3Q Max
-2.36504 -0.90202 0.03019 0.90763 2.88985
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.3581 0.8813 6.079 2.32e-08 ***
WartyClaim 0.2581 0.1445 1.786 0.0772 .
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
10. Satisfaction~OrdBilling
Call:
lm(formula = Satisfaction ~ OrdBilling)
Residuals:
Min 1Q Median 3Q Max
-2.4005 -0.7071 -0.0344 0.7340 2.9673
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.0541 0.4840 8.377 3.96e-13 ***
OrdBilling 0.6695 0.1106 6.054 2.60e-08 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
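As a sketch, the simple regressions above can also be generated in one loop rather than one lm() call per predictor; the predictor names below match the columns used in the VIF call.

```r
# Fit Satisfaction against each predictor in turn and print the summaries.
predictors <- c("ProdQual", "Ecom", "TechSup", "CompRes", "Advertising",
                "ProdLine", "SalesFImage", "ComPricing", "WartyClaim",
                "OrdBilling", "DelSpeed")
models <- lapply(predictors, function(v)
  lm(reformulate(v, response = "Satisfaction"), data = newdata))
names(models) <- predictors
lapply(models, summary)
```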
5 PCA/Factor Analysis
R code:
ev <- eigen(newdata_cor)
eigenValues <- ev$values
eigenVector <- ev$vectors
Factor = 1:11   # one label per eigenvalue of the 11 predictor variables
Scree = data.frame(Factor, eigenValues)
plot(Scree, main = "Scree Plot", col = "Blue")
lines(Scree, col = "Red")
Based on the above scree plot and applying the Kaiser criterion (retain factors with eigenvalues greater than 1), we retain 4 factors and drop the remaining factors, whose eigenvalues are less than 1.
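The four-factor solution summarised below is typically obtained with psych::principal(); the arguments here are a sketch, since the report shows only the output (a varimax rotation is assumed from the rotated factor loadings that follow).

```r
library(psych)

# 4 varimax-rotated components; scores = TRUE yields the factor scores
# (Rotate$scores) used in the regression section. Assumes Satisfaction
# is column 12 and is excluded from the extraction.
Rotate <- principal(newdata[, -12], nfactors = 4, rotate = "varimax",
                    scores = TRUE)
print(Rotate$loadings, cutoff = 0.4)   # hide small loadings for readability
```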
Mean item complexity = 1.9
Test of the hypothesis that 4 components are sufficient.
Rotated Factor Loadings:
The root mean square of the residuals (RMSR) is 0.06
with the empirical chi square 39.023 with prob < 0.00177
RotateProfile=plot(Rotate,row.names(Rotate$loadings),cex=1.0)
Naming of Factors:
Factor Analysis:
6 Multiple Linear Regression
RC<-Rotate$scores
ModelRC = lm(Satisfaction~Rotate$scores)
summary(ModelRC)
Call:
lm(formula = Satisfaction ~ Rotate$scores)
Residuals:
Min 1Q Median 3Q Max
-1.6308 -0.4996 0.1372 0.4623 1.5228
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.91800 0.07089 97.589 < 2e-16 ***
Rotate$scoresRC1 0.61805 0.07125 8.675 1.12e-13 ***
Rotate$scoresRC2 0.50973 0.07125 7.155 1.74e-10 ***
Rotate$scoresRC3 0.06714 0.07125 0.942 0.348
Rotate$scoresRC4 0.54032 0.07125 7.584 2.24e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> confint(ModelRC,"Rotate$scoresRC1")
2.5 % 97.5 %
Rotate$scoresRC1 0.4766059 0.7594879
> confint(ModelRC,"Rotate$scoresRC2")
2.5 % 97.5 %
Rotate$scoresRC2 0.3682931 0.6511751
> confint(ModelRC,"Rotate$scoresRC3")
2.5 % 97.5 %
Rotate$scoresRC3 -0.07430515 0.2085769
> confint(ModelRC,"Rotate$scoresRC4")
2.5 % 97.5 %
Rotate$scoresRC4 0.398878 0.6817601
# Note: for predict() to use new factor-score values, the model must be fitted
# on a data frame with named columns RC1-RC4, not on the score matrix directly,
# so the model is refitted below; its coefficients match ModelRC above.
regdata = data.frame(Satisfaction = newdata$Satisfaction, RC)
ModelRC2 = lm(Satisfaction ~ RC1 + RC2 + RC3 + RC4, data = regdata)
newdata1 = data.frame(RC1 = 0.71, RC2 = 0.66, RC3 = 0.13, RC4 = 0.76)
prediction = predict(ModelRC2, newdata1)
prediction
prediction = predict(ModelRC2, newdata1, interval = "confidence")
prediction
Predicted = predict(ModelRC2)
Actual = newdata$Satisfaction
Backtrack = data.frame(Actual, Predicted)
Backtrack
plot(Actual,col="Red")
lines(Actual,col="Red")
plot(Predicted,col="Blue")
lines(Predicted,col="Blue")
lines(Actual,col="Red")
The actual values are shown in red and the predicted values in blue; the predicted series tracks the actual series fairly closely. Hence, the regression model developed captures the essence of the data reasonably well.
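The same comparison can be checked numerically; a quick sketch using the Actual and Predicted vectors built above:

```r
# R-squared and RMSE of the predicted vs actual satisfaction scores.
sse  <- sum((Actual - Predicted)^2)
sst  <- sum((Actual - mean(Actual))^2)
rsq  <- 1 - sse / sst                      # should match summary(ModelRC)$r.squared
rmse <- sqrt(mean((Actual - Predicted)^2))
c(R.squared = rsq, RMSE = rmse)
```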
7 Model Validity