Week8 Tutorial

BIG DATA

Week 8 - Tutorial
April 2024
MEASURING CONNECTIONS BETWEEN VARIABLES

Depending on the type of variables we have, there are different options for measuring
connections between variables.

Testing for connections between variables
• Contingency tables: nominal/ordinal variables
• Correlation:
  – Ordinal variables: Spearman correlation coefficient
  – Interval or ratio variables: Pearson correlation coefficient or Spearman correlation coefficient
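The decision chart above can be sketched as a small lookup function. This is a minimal illustration, not part of the original material: the function name `suggest_methods` and the rule for mixed measurement levels (decide by the lower of the two levels) are assumptions.

```python
# Measurement levels ordered from least to most informative.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]


def suggest_methods(x_level: str, y_level: str) -> list[str]:
    """Suggest ways to test for a connection, following the decision chart.

    Assumption (not from the chart): when the two variables have different
    levels, the lower (less informative) level decides.
    """
    level = min(LEVELS.index(x_level), LEVELS.index(y_level))
    if level == 0:  # nominal
        return ["contingency table"]
    if level == 1:  # ordinal: listed under both branches of the chart
        return ["contingency table", "Spearman correlation coefficient"]
    # interval or ratio
    return ["Pearson correlation coefficient", "Spearman correlation coefficient"]


print(suggest_methods("ordinal", "ratio"))
# prints ['contingency table', 'Spearman correlation coefficient']
```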

Which option is NOT a method to measure connections
between variables?
A. Correlation
B. Regression
C. Histogram
D. Chi-square
How would you evaluate the strength of the connection
between two variables using statistical methods?
A. p-value
B. R-squared value
C. Regression coefficient
D. Standard deviation
Which statistical measure would you use to analyze the relationship
between two continuous variables?
A. Mean
B. Correlation coefficient
C. Standard deviation
D. Mode
CORRELATION ANALYSIS – MEASURING
THE RELATIONSHIP BETWEEN TWO VARIABLES

• Spearman correlation uses the ranks of the data to measure monotonic relationships between ordinal or
continuous variables.
• Pearson correlation, denoted r (or R), detects linear relationships between quantitative data.

Strength and direction of r, on a scale from −1 to 1:
−1 | strong negative | moderate negative | weak negative | 0 | weak positive | moderate positive | strong positive | 1
• Negative correlation: −1 ≤ r < 0
• Positive correlation: 0 < r ≤ 1
• r = 0: no linear correlation
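To make the Pearson/Spearman distinction concrete, the sketch below computes both coefficients from scratch on a relationship that is monotonic but not linear (y = x³). The helper names and the toy data are illustrative assumptions; ties are handled with average ranks, which is the usual convention for Spearman's coefficient.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation: co-variation divided by the product of spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 0-based positions i..j -> average 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical toy data: perfectly monotonic, but curved rather than linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(round(pearson_r(x, y), 3), spearman_rho(x, y))  # prints 0.943 1.0
```

Spearman reaches 1 because the ranks agree perfectly, while Pearson stays below 1 because the points do not lie on a straight line.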

Which correlation method detects linear relationships
between quantitative data?
A. Spearman correlation
B. Pearson correlation
C. T-test
D. ANOVA
Which method would you choose to measure the
relationship between two continuous variables with a
non-linear pattern?
A. Spearman correlation
B. Pearson correlation
C. Linear regression
D. Chi-square test
What type of relationships does the Pearson correlation
detect?
A. Monotonic relationships
B. Non-linear relationships
C. Linear relationships
D. No relationships
What is the primary difference between Spearman
correlation and Pearson correlation?
A. Spearman correlation uses data rank for ordinal or continuous variables, while
Pearson correlation detects linear relationships between quantitative data.
B. Spearman correlation measures linear relationships, while Pearson correlation
measures monotonicity.
C. Spearman correlation is used for qualitative data, while Pearson correlation is used
for quantitative data.
If the sample Pearson correlation coefficient between the number of hours spent watching TikTok
and the Big Data final exam score is −0.85, what does this tell you? (Choose one or more correct
statements.)
A. The number of hours spent watching TikTok and the Big Data final exam score have a strong negative
linear relationship.
B. The sample Pearson correlation coefficient suggests that the more hours you watch TikTok,
the lower the Big Data exam score you will receive.
C. Suppose that a simple linear regression is used to predict the Big Data final exam score
from the number of hours spent watching TikTok. Then the number of hours spent watching TikTok
explains 72.25% of the variability in the Big Data final exam score.
D. All of the above.
PURPOSE OF REGRESSION ANALYSIS

The purpose of regression analysis is to analyze relationships among variables, in order to:
• answer the question of how much y changes with changes in each of the X's;
• forecast or predict the value of y based on the values of the X's.
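For the case of a single X, both purposes can be illustrated with an ordinary least-squares fit: the slope answers "how much y changes per unit change in X", and plugging a new X into the fitted line gives a prediction. This is a minimal sketch with hypothetical toy data and a hypothetical helper name; it does not cover multiple X's.

```python
def fit_simple_linear(x, y):
    """Least-squares intercept b0 and slope b1 for the line y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx       # change in y per one-unit change in x
    b0 = my - b1 * mx    # the fitted line passes through (mean of x, mean of y)
    return b0, b1


# Hypothetical toy data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_simple_linear(x, y)
# Intercept, slope, and the prediction at x = 6.
print(round(b0, 3), round(b1, 3), round(b0 + b1 * 6, 3))  # prints 2.2 0.6 5.8
```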
COEFFICIENT OF DETERMINATION, R²

• Using R², we can assess the overall fit of a model.
• R² measures the proportion of the total variability in the response variable
explained by the model.
• 0 ≤ R² ≤ 1. The closer R² is to 1, the better the explanatory power of the
model.
• Note also that R² = r² for a simple linear model (only), where r is the
sample Pearson correlation coefficient.
• When conducting simple linear regression, if we had a choice of candidates
to use as the explanatory variable, then we would prefer the one which
maximises R².
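The sketch below checks the statement R² = r² numerically for a simple linear model: R² is computed as 1 − SSE/SST (the share of total variability not left in the residuals) and compared with the square of the sample Pearson coefficient. The toy data and function name are hypothetical.

```python
import math


def r_squared_simple(x, y):
    """Fit y = b0 + b1 * x by least squares; return (R², Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    sst = sum((b - my) ** 2 for b in y)  # total variability in the response
    b1 = sxy / sxx
    b0 = my - b1 * mx
    # Unexplained variability: squared residuals around the fitted line.
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    r2 = 1 - sse / sst                   # proportion explained by the model
    r = sxy / math.sqrt(sxx * sst)       # sample Pearson correlation
    return r2, r


# Hypothetical toy data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2, r = r_squared_simple(x, y)
print(round(r2, 4), round(r ** 2, 4))  # prints 0.6 0.6
```

The equality relies on the model having a single explanatory variable and an intercept; with several X's, R² is no longer the square of any one pairwise correlation.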
When evaluating a regression analysis model, what does
a high R-squared value indicate?
A. A strong correlation between the variables
B. Overfitting of the model
C. No relationship between the variables
D. Underfitting of the model
In regression analysis, what is the coefficient of
determination used for?
A. To quantify the extent to which the dependent variable is predictable
B. To determine the slope of the regression line
C. To identify outliers in the data set
D. To calculate the standard error of the estimate
When would you use regression analysis over
correlation analysis?
A. When you want to determine the strength and direction of the relationship
between variables
B. When you want to predict one variable based on another variable
C. When you want to calculate the correlation coefficient
D. Regression analysis and correlation analysis serve different purposes
Below is the JASP output from fitting Age against prevexp in employee_dataset.

Model Summary - Age
Model   R       R²      Adjusted R²   RMSE
H₀      0.000   0.000   0.000         11.777
H₁      0.801   0.642   0.642          7.051

Coefficients
Model              Unstandardized   Standard Error   Standardized   t         p
H₀  (Intercept)    62.981           0.542                           116.304   < .001
H₁  (Intercept)    54.329           0.440                           123.481   < .001
    prevexp         0.090           0.003            0.801           29.085   < .001

1. What is the value of the sample Pearson correlation coefficient between age and prevexp? Interpret the value.

2. What is the value of the coefficient of determination between age and prevexp? Interpret the value.

3. Write down the regression line of age on prevexp. Hence calculate the predicted Age when prevexp = 100.
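To check a hand calculation for question 3, the fitted line from the H₁ row of the Coefficients table can be encoded as a function. This is a sketch assuming the rounded coefficients exactly as JASP reports them (so predictions inherit that rounding); the names `INTERCEPT`, `SLOPE_PREVEXP` and `predict_age` are hypothetical, and the usage deliberately evaluates prevexp = 50 rather than the value asked for in the exercise.

```python
# Coefficients read from the JASP "Coefficients" table (H₁ model),
# rounded to three decimals as reported there.
INTERCEPT = 54.329
SLOPE_PREVEXP = 0.090


def predict_age(prevexp: float) -> float:
    """Predicted Age from the fitted line: Age = intercept + slope * prevexp."""
    return INTERCEPT + SLOPE_PREVEXP * prevexp


print(round(predict_age(50), 3))  # prints 58.829
```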
