10 Regression Analysis
Model fitting
Regression model
In a correlational study, the goal is to check whether two variables are related.

Example: Relationship of number of hours of study and test score (e.g., 100 points)
Independent variable (X): number of hours
Dependent variable (Y): test score
In correlation, the information we get is that test score is positively “related” to the number of hours of study (r = .75, p = .004).
Interpretation in correlation

Our interpretation is that as study hours increase, test scores tend to be higher.

Correlation does not provide a “prediction value”: it cannot tell us how much score to expect for a given number of study hours.
Defining regression

Types:
➢ Simple linear regression (SLR)
➢ Multiple linear regression (MLR)
SIMPLE LINEAR REGRESSION
➢ predicts the value of y given the value of x
➢ used when there is a relationship between x (independent variable) and y (dependent variable)
➢ data should be normally distributed and measured at the interval or ratio level
➢ y = a + bx
a = intercept
b = slope of the line
What is SLR?
• Simple linear regression is a linear regression model with a single
regressor variable (X).
• Simple linear model: 𝑦𝑖 = 𝛽0 + 𝛽1𝑋𝑖 + 𝑒𝑖, i = 1, 2, …, n
where:
𝑦𝑖 : the value of the response variable in the ith trial
𝛽0 and 𝛽1 : the parameters of the model
𝑋𝑖 : the value of the regressor in the ith trial
𝑒𝑖 : the random error in the ith trial
The regression model for number of hours and test score is:

𝑦𝑖 = 𝛽0 + 𝛽1𝑋𝑖 + 𝑒𝑖

Test score = 60 + 1.89 (hours of study)
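The fitted line above can be turned into a small prediction helper. This is only a sketch: the intercept 60 and slope 1.89 are taken from the slide's example, and the function name is our own.

```python
def predict_score(hours, intercept=60.0, slope=1.89):
    """Predicted test score from the fitted SLR line: y = a + b*x."""
    return intercept + slope * hours

# Predicted scores for a few study-hour values
for h in (1, 5, 10):
    print(h, "hours ->", predict_score(h))
# 1 hour  -> 61.89
# 10 hours -> 78.9
```

This is exactly the "prediction value" that correlation alone cannot give us.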
The heights of ten students were determined to be 65, 64, 64, 62,
62, 60, 60, 59, 58, and 57 inches, and their weights were
determined to be 110, 108, 107, 104, 98, 96, 94, 92, 90, and 89
pounds. Determine whether height correlates with weight, and find the
regression equation. Use the .05 level of significance.
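The least-squares estimates for this exercise can be checked without SPSS. A minimal pure-Python sketch using the standard formulas b = Sxy/Sxx and a = ȳ − b·x̄:

```python
heights = [65, 64, 64, 62, 62, 60, 60, 59, 58, 57]       # x, inches
weights = [110, 108, 107, 104, 98, 96, 94, 92, 90, 89]   # y, pounds

n = len(heights)
mx = sum(heights) / n
my = sum(weights) / n

# Sums of squares and cross-products
sxx = sum((x - mx) ** 2 for x in heights)
syy = sum((y - my) ** 2 for y in weights)
sxy = sum((x - mx) * (y - my) for x, y in zip(heights, weights))

b = sxy / sxx                  # slope
a = my - b * mx                # intercept
r2 = sxy ** 2 / (sxx * syy)    # coefficient of determination

print(f"weight = {a:.3f} + {b:.3f} * height, R^2 = {r2:.3f}")
```

Running this gives a slope of about 2.813, an intercept of about −73.084, and R² of about .953.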
ANTONIO2019
This summary table shows whether the regression line (line of best fit)
is significantly different from a horizontal line. If p < .05, the
regression model significantly predicts the outcome variable.
Since R² = .953, the model is a good fit for the data.
The coefficients table provides the necessary information to
predict the dependent variable from the independent variable.
Reporting of Results

A simple linear regression was calculated to predict weight based
on height. A significant regression equation was found (F(1, 8) =
161.880, p < .05). With an R² of .953, 95.3% of the variance in weight
can be explained by height. Participants’ predicted weight is
equal to −73.084 + 2.813 (height) pounds when height is measured in
inches. Participants’ average weight increased 2.813 pounds for each
inch of height.
EXAMPLE # 2

A study is conducted on the relationship of the number of absences and the grades of 15 students in English. Determine the relationship of the two variables.

Number of Absences | Grades in English
1 | 90
2 | 85
2 | 80
3 | 75
3 | 80
8 | 65
6 | 70
1 | 95
4 | 80
5 | 80
5 | 75
1 | 92
2 | 89
1 | 80
9 | 65
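As a sketch of what Example #2 should yield, the Pearson correlation can be computed directly from the fifteen (absences, grade) pairs, again in pure Python:

```python
import math

absences = [1, 2, 2, 3, 3, 8, 6, 1, 4, 5, 5, 1, 2, 1, 9]
grades   = [90, 85, 80, 75, 80, 65, 70, 95, 80, 80, 75, 92, 89, 80, 65]

n = len(absences)
mx = sum(absences) / n
my = sum(grades) / n

sxy = sum((x - mx) * (y - my) for x, y in zip(absences, grades))
sxx = sum((x - mx) ** 2 for x in absences)
syy = sum((y - my) ** 2 for y in grades)

r = sxy / math.sqrt(sxx * syy)
print(f"r = {r:.2f}")  # a strong negative correlation: more absences, lower grades
```

The sign of r is the key finding here; whether it is significant at .05 would still be checked with the usual t-test or SPSS output.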
Multiple Linear Regression (MLR)

[Diagram: example predictors in an MLR model, e.g., Persistence (X2) and Motivation (X3)]
Assumptions of normality/linearity
• P-plot
• Shapiro-Wilk test
• Kolmogorov-Smirnov test
In regression, the differences between what the
model predicts and the observed data are usually
called residuals.
If the points in the P-plot fall close to the diagonal line,
the error terms are normally distributed.
How to validate normality/linearity?
To further validate the assumption of normality,
Shapiro-Wilk and Kolmogorov-Smirnov tests are
used.
Outliers: in a residual plot, these are the points that lie far beyond
the scatter of the remaining residuals.
• Statistics
• Regression Plots
Statistics
• Available procedures in this
button are checking the
assumptions of no
multicollinearity (Collinearity
diagnostics) and
independence of errors
(Durbin-Watson)
Dialog box for STATISTICS
Statistics
• Estimates give us the estimated coefficients of the regression model
(i.e. the estimated 𝜷-values).
• Confidence intervals - produce confidence intervals for each of the
unstandardized regression coefficients.
Statistics
• Model fit: Provides statistical test of the model’s ability to predict the
outcome variable (the F-test), and the value of R and the adjusted R2.
• R squared change: It displays the change in R2 resulting from the
inclusion of a new predictor (for MLR only).
Statistics
• Descriptives: Displays correlation matrix to assess whether predictors
are highly correlated
• Collinearity diagnostics: This option is for obtaining collinearity
statistics such as the VIF (≤ 10.0) and tolerance (.20 – 1.00) (for MLR
only)
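For a model with two predictors, the collinearity statistics above can be sketched by hand: regress one predictor on the other and take VIF = 1/(1 − R²). The predictor values below are made up purely for illustration.

```python
import math

# Hypothetical values for two predictors (illustration only)
x1 = [2, 4, 4, 5, 7, 8, 9, 10]
x2 = [1, 3, 2, 4, 6, 6, 8, 9]

n = len(x1)
m1, m2 = sum(x1) / n, sum(x2) / n
sxy = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
sxx = sum((a - m1) ** 2 for a in x1)
syy = sum((b - m2) ** 2 for b in x2)

# With only two predictors, the R^2 from regressing x2 on x1 is just r^2
r2 = sxy ** 2 / (sxx * syy)
tolerance = 1 - r2       # tolerance is the reciprocal of VIF
vif = 1 / tolerance
print(f"R^2 = {r2:.3f}, VIF = {vif:.2f}, tolerance = {tolerance:.3f}")
```

A VIF above 10 (tolerance below .10) would flag problematic multicollinearity under the rule of thumb given above.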
Statistics
• Durbin-Watson: Displays the Durbin-Watson test statistic, which tests
for correlations between errors. Specifically, it tests whether adjacent
residuals are independent (desired value is 2.0 or close to 2.0)
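The Durbin-Watson statistic itself is simple to compute from the residuals: the sum of squared successive differences divided by the sum of squared residuals. A sketch with made-up residuals:

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); a value near 2.0
    suggests adjacent residuals are independent."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals with little autocorrelation (illustration only)
resid = [0.5, 0.3, -0.4, 0.2, -0.6, 0.1, 0.4, -0.3]
print(round(durbin_watson(resid), 2))  # close to the desired value of 2.0
```

The statistic ranges from 0 to 4; values well below 2 suggest positive autocorrelation of errors, values well above 2 suggest negative autocorrelation.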
Residual plots
• Plots provides the means to
create graphs for regression to
check for validity of
assumptions.
Dialog box for Plots
Plots
• ZRESID (standardized residuals, or errors): These values are the
standardized differences between the observed data and the values
that the model predicts.
• ZPRED (standardized predicted values of the dependent variable
based on the model): These values are standardized forms of the
values predicted by the model
Plots
• ZRESID & ZPRED are useful in:
• Determining assumptions on errors
• Identifying if errors are homoscedastic
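Outside SPSS, ZRESID and ZPRED are simply z-scores of the residuals and of the predicted values. A minimal sketch (the tiny data set is invented for illustration):

```python
import math

def zscores(values):
    """Standardize values: subtract the mean, divide by the sample SD."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

# Hypothetical observed and model-predicted values (illustration only)
observed  = [10.0, 12.0, 15.0, 18.0, 20.0]
predicted = [11.0, 12.5, 14.0, 17.0, 21.0]

residuals = [o - p for o, p in zip(observed, predicted)]
zresid = zscores(residuals)   # ZRESID: standardized residuals
zpred  = zscores(predicted)   # ZPRED: standardized predicted values
# Plotting zresid against zpred shows whether the spread of the errors
# stays constant across predicted values (homoscedasticity).
```

If the ZRESID-versus-ZPRED scatter fans out as predicted values grow, the homoscedasticity assumption is in doubt.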
Dialog box for Save
• This box saves regression
diagnostics. Each statistic has a
corresponding column
in SPSS output.
Task 7.4