IWB Chapter 10 - Inter-Relationships Between Variables


Chapter 10

Inter-relationships between variables

Outcome

By the end of this session you should be able to:

• describe the principal business applications of big data and analytics;

• demonstrate the relationship between data variables: correlation coefficient
(both Pearson and Spearman) and the coefficient of determination;

• prepare a trend equation using either graphical means or regression analysis;

and answer questions relating to these areas.

The underpinning detail for this chapter in your Integrated Workbook can
be found in Chapter 10 of your Study Text.


Overview

[Overview diagram: Inter-relationships between variables. Topics shown:
Introduction (nature of relationship, strength of relationship), Big data,
Correlation (Pearson’s correlation coefficient, coefficient of determination,
rank correlation) and Regression.]


Introduction

1.1 Definition

This chapter examines the strength and nature of the relationship between two sets
of figures.

• strength of relationship ↔ correlation.

• nature of relationship ↔ regression.


Big data

2.1 Big data

• Big data allows businesses to expand their knowledge of their customers and
develop products/services that are best suited to their needs.

• Big data gives businesses a deeper understanding of how their customers
behave, allowing them to connect with customers on a more meaningful level.

• Big data can help boost marketing activities, since it provides businesses with a
chance to analyse customer behaviour on multiple channels and understand
when the customer is most likely to buy products/services.


Correlation

3.1 Pearson’s correlation coefficient = r

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$$

• This measure has the property of always lying in the range –1 to +1, where:

– r = +1 denotes perfect positive linear correlation

– r = –1 denotes perfect negative linear correlation

– r = 0 denotes no linear correlation.

• The strength of a correlation can be judged by its proximity to +1 or –1: the
nearer it is (and the further away from zero), the stronger is the linear
correlation.

• A common error is to believe that negative values of r cannot be strong. They
can be just as strong as positive values except that y is decreasing as
x increases.

3.2 The coefficient of determination = r²

• The coefficient of determination, r², gives the proportion of changes in y that can
be explained by changes in x, assuming a linear relationship between x and y.

• E.g. If r = +0.7, then r² = 0.49 and we could state that 49% of the observed
changes in y can be explained by the changes in x but that 51% of the changes
must be due to other factors.
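
The arithmetic in the example above can be checked with a short Python sketch. The function name `coefficient_of_determination` is an illustrative assumption rather than anything defined in the workbook. It also shows that r = –0.7 gives the same r² as r = +0.7.

```python
def coefficient_of_determination(r):
    """Square a correlation coefficient r, which must lie between -1 and +1."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in the range -1 to +1")
    return r * r


for r in (0.7, -0.7):
    r2 = coefficient_of_determination(r)
    # Split the changes in y into the part explained by x and the remainder.
    print(f"r = {r:+.1f}: explained by x = {r2:.0%}, other factors = {1 - r2:.0%}")
```

The sign of r is lost on squaring, so r² describes only the strength of the linear relationship, not its direction.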


Question 1
By calculating the correlation coefficient, you are demonstrating the
maths skill of following the order of precedence of operators, including
indices.

Pearson's correlation coefficient

Evaluate Pearson’s correlation coefficient for the data on sales and advertising
spend in the table below.

Advertising spend ($000)    Sales value ($000)
          10                      1,500
          12                      2,000
          15                      2,200
          18                      2,700
          19                      2,700
          21                      2,720

n = 6

   x         y         xy       x²           y²
  10     1,500     15,000      100    2,250,000
  12     2,000     24,000      144    4,000,000
  15     2,200     33,000      225    4,840,000
  18     2,700     48,600      324    7,290,000
  19     2,700     51,300      361    7,290,000
  21     2,720     57,120      441    7,398,400
 –––    ––––––    –––––––    –––––   ––––––––––
  95    13,820    229,020    1,595   33,068,400

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$$

r = (6 × 229,020 – 95 × 13,820)/√[(6 × 1,595 – 95²) (6 × 33,068,400 – 13,820²)]

r = 61,220/√[545 × 7,418,000]
r = 61,220/63,583 = 0.96
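
The working above can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library. The variable names are illustrative assumptions, and the data are the six pairs from the question.

```python
from math import sqrt

# Advertising spend (x) and sales value (y), both in $000, from Question 1.
xs = [10, 12, 15, 18, 19, 21]
ys = [1500, 2000, 2200, 2700, 2700, 2720]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

# The three bracketed terms of the formula, computed separately so they can
# be compared with the intermediate lines of the written working.
numerator = n * sum_xy - sum_x * sum_y    # 61,220
x_term = n * sum_x2 - sum_x ** 2          # 545
y_term = n * sum_y2 - sum_y ** 2          # 7,418,000

r = numerator / sqrt(x_term * y_term)
print(f"r = {r:.2f}")
```

Keeping the numerator and the two bracketed terms as separate integers makes it easy to spot which step has gone wrong if a hand calculation disagrees.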


Question 2
Pearson's correlation coefficient
Evaluate Pearson’s correlation coefficient for the data below.
n = 6, ∑x = 21, ∑y = 482, ∑xy = 2,213, ∑x² = 91, ∑y² = 60,514

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$$

r = (6 × 2,213 – 21 × 482)/√[(6 × 91 – 21²) (6 × 60,514 – 482²)]
r = 3,156/√[105 × 130,760]
r = 3,156/3,705 = 0.85
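
When only the summary totals are given, as in this question, the same formula can be wrapped in a small helper. The sketch below is one possible way to do it. The function name `pearson_from_sums` and its parameter names are assumptions made for illustration.

```python
from math import sqrt


def pearson_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson's r computed from the summary totals rather than the raw data."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # All x values (or all y values) are identical, so r is undefined.
        raise ValueError("r is undefined when x or y does not vary")
    return numerator / denominator


r = pearson_from_sums(n=6, sum_x=21, sum_y=482, sum_xy=2213, sum_x2=91, sum_y2=60514)
print(f"r = {r:.2f}")
```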

Question 3
By calculating the coefficient of determination and explaining the result,
you are developing the skill of analysis.
Coefficient of determination
If the Pearson’s correlation coefficient is 0.96, calculate the coefficient of
determination and explain what the result means.
r² = 0.96² = 0.92
92% of the variation in dependent y values can be explained by changes in the
independent x values.


3.3 Spurious correlation

• Correlation does not necessarily imply causality, especially if a third factor is
involved.

3.4 Spearman’s rank correlation = R

• Looks at the link between the ranking of a set of objects and one of their
properties.

• For example, to what extent would customer perceived quality (ranking)
correlate with the price (property) of a product?

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where d is the difference between the two ranks given to each item and n is the
number of items ranked.

• The interpretation of R-values (and warnings!) is similar to that for r.


Question 4
By calculating the correlation coefficient, you are demonstrating the
maths skill of following the order of precedence of operators, including
indices.

Spearman's rank correlation

Calculate Spearman’s rank correlation coefficient for the following data, ranking
the best scores as 1 through to the worst as 5.
        Chemistry score    Maths score
Bob          43%               55%
Jim          61%               57%
Ash          57%               88%
Zak          76%               59%
Jen          85%               96%

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

        Chemistry    Maths    Chemistry    Maths
        score        score    rank         rank      d     d²
Bob       43%         55%        5           5       0     0
Jim       61%         57%        3           4      –1     1
Ash       57%         88%        4           2       2     4
Zak       76%         59%        2           3      –1     1
Jen       85%         96%        1           1       0     0
                                                          ––
                                                   ∑d²     6

R = 1 – (6 × 6)/[5 × (25 – 1)]

R = 1 – 36/120 = 0.7
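
The ranking step and the formula can be sketched in Python as follows. The helper names `rank_descending` and `spearman_r` are illustrative assumptions. The ranking helper assumes there are no tied scores, which holds for this data, and the maths scores are taken from the table in the question.

```python
def rank_descending(scores):
    """Rank scores so the best (highest) gets rank 1. Assumes no ties."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(score) + 1 for score in scores]


def spearman_r(ranks_a, ranks_b):
    """Spearman's R = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Scores in the order Bob, Jim, Ash, Zak, Jen.
chemistry = [43, 61, 57, 76, 85]
maths = [55, 57, 88, 59, 96]

chemistry_ranks = rank_descending(chemistry)  # [5, 3, 4, 2, 1]
maths_ranks = rank_descending(maths)          # [5, 4, 2, 3, 1]

R = spearman_r(chemistry_ranks, maths_ranks)
print(f"R = {R:.1f}")
```

Only the ranks enter the formula, so any set of scores that preserves the same ordering gives the same R.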


Question 5
Spearman's rank correlation

Calculate Spearman’s rank correlation coefficient for the following data.


rank set 1    rank set 2
    5             6
    1             2
    3             4
    6             5
    2             1
    4             3

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

rank set 1    rank set 2     d     d²
    5             6         –1     1
    1             2         –1     1
    3             4         –1     1
    6             5          1     1
    2             1          1     1
    4             3          1     1
                                  ––
                           ∑d²     6

R = 1 – (6 × 6)/[6 × (36 – 1)]

R = 1 – 36/210 = 0.83
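
As a cross-check, Spearman's R is the same as Pearson's r applied to the ranks themselves, provided there are no tied ranks. The sketch below verifies this for the two rank sets in this question. The function names are illustrative assumptions.

```python
from math import sqrt

rank_set_1 = [5, 1, 3, 6, 2, 4]
rank_set_2 = [6, 2, 4, 5, 1, 3]


def spearman_r(ranks_a, ranks_b):
    """Spearman's R from the sum of squared rank differences."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def pearson_r(xs, ys):
    """Pearson's r from the usual sums formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    return numerator / sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))


R = spearman_r(rank_set_1, rank_set_2)
r_on_ranks = pearson_r(rank_set_1, rank_set_2)
print(f"Spearman R = {R:.2f}, Pearson r on ranks = {r_on_ranks:.2f}")
```

The shortcut formula with ∑d² is simply quicker to apply by hand, which is why it is the one used in the working.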

Illustrations and further practice


Now read illustrations 1 to 3 and try TYUs 1 to 9 from Chapter 10


Regression

4.1 What is it?

• Regression analysis calculates a straight line relationship between two
variables (x and y) as follows:

• y = a + bx, where y is the dependent variable and x is the independent variable.

4.2 Calculation

Step 1: calculate “b”

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$$

Step 2: calculate “a”

$$a = \bar{y} - b\bar{x}$$

where $\bar{x} = \sum x / n$ and $\bar{y} = \sum y / n$ are the means of x and y.

4.3 Limitations of linear regression

• The actual relationship between the two sets of figures may not be linear – use
the coefficient of determination to investigate further.

• Even if we have a high level of correlation, this may be due to "spurious"
correlation and the presence of other causal factors.

• We need to be careful using the regression line to make forecasts outside of the
range of the original data ("extrapolation").
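
One simple way to keep the extrapolation warning in view is to flag any forecast whose x value lies outside the observed range. The Python sketch below illustrates the idea. The function `forecast`, the line coefficients and the observed x values are all hypothetical, chosen only for illustration.

```python
def forecast(a, b, x, observed_xs):
    """Return (forecast, extrapolated) for the line y = a + b * x.

    `extrapolated` is True when x lies outside the range of the x values
    that were used to fit the line.
    """
    lowest, highest = min(observed_xs), max(observed_xs)
    extrapolated = not (lowest <= x <= highest)
    return a + b * x, extrapolated


# Hypothetical fitted line and observed x values, for illustration only.
a, b = 500.0, 110.0
observed_xs = [10, 12, 15, 18, 19, 21]

for x in (16, 30):
    y_hat, extrapolated = forecast(a, b, x, observed_xs)
    note = "extrapolation, treat with caution" if extrapolated else "within observed range"
    print(f"x = {x}: forecast y = {y_hat:,.0f} ({note})")
```

The flag does not make an extrapolated forecast wrong. It only signals that the data give no evidence the straight line still holds at that x value.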


Question 6
Least-squares regression

Calculate the line of best fit formula using least-squares regression on the
following data.
Advertising spend ($000)    Sales value ($000)
          10                      1,500
          12                      2,000
          15                      2,200
          18                      2,700
          19                      2,700
          21                      2,720

n = 6

   x         y         xy       x²
  10     1,500     15,000      100
  12     2,000     24,000      144
  15     2,200     33,000      225
  18     2,700     48,600      324
  19     2,700     51,300      361
  21     2,720     57,120      441
 –––    ––––––    –––––––    –––––
  95    13,820    229,020    1,595

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$$

b = (6 × 229,020 – 95 × 13,820)/(6 × 1,595 – 95²)
b = 61,220/545
b = 112.33

$$a = \bar{y} - b\bar{x}$$

a = 13,820/6 – 112.3 × 95/6 (taking b rounded to 112.3)

a = 525.25

y = 525.25 + 112.33x

Note that a is sensitive to the rounding of b: carrying b = 61,220/545 forward
unrounded gives a = 524.77.
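
The sketch below recomputes b and a from the raw data in Python and shows how much the intercept depends on the rounding of b. The variable names are illustrative assumptions. The written working uses b rounded to 112.3 when calculating a, which is what produces 525.25.

```python
# Advertising spend (x) and sales value (y), both in $000, from Question 6.
xs = [10, 12, 15, 18, 19, 21]
ys = [1500, 2000, 2200, 2700, 2700, 2720]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

# Step 1: the slope b.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Step 2: the intercept a, using the means of x and y.
mean_x, mean_y = sum_x / n, sum_y / n
a_unrounded_b = mean_y - b * mean_x           # b carried at full precision
a_rounded_b = mean_y - round(b, 1) * mean_x   # b rounded to 112.3 first

print(f"b = {b:.2f}")
print(f"a with unrounded b    = {a_unrounded_b:.2f}")
print(f"a with b rounded (1dp) = {a_rounded_b:.2f}")
```

Because the mean of x is about 15.8, a rounding difference of 0.03 in b moves a by roughly 0.5, so it is safer to round only at the final step.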


Question 7
By calculating the line of best fit formula, you are demonstrating the
maths skill of following the order of precedence of operators, including
indices.

Least-squares regression

Evaluate the line of best fit formula from the following information:

n = 6, ∑x = 21, ∑y = 482, ∑xy = 2,213, ∑x² = 91, ∑y² = 60,514

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$$

b = (6 × 2,213 – 21 × 482)/(6 × 91 – 21²)

b = 3,156/105 = 30.06

$$a = \bar{y} - b\bar{x}$$

a = 482/6 – 30.06 × 21/6

a = –24.88

y = –24.88 + 30.06x
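
Where only the totals are supplied, the two steps can be combined into one helper. This is a minimal sketch. The function name `regression_from_sums` is an assumption made for illustration. Note that ∑y² is not needed for the regression line.

```python
def regression_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b) for the least-squares line y = a + b * x."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n  # a = mean of y minus b times mean of x
    return a, b


a, b = regression_from_sums(n=6, sum_x=21, sum_y=482, sum_xy=2213, sum_x2=91)
print(f"b = {b:.2f}")
print(f"a = {a:.2f} (b carried at full precision)")

# The written working rounds b to 30.06 before calculating a.
a_workbook = 482 / 6 - 30.06 * 21 / 6
print(f"a = {a_workbook:.2f} (b rounded to 30.06 first)")
```

The two intercepts differ by about 0.01 purely because of when b is rounded. Either way the fitted line passes through the point given by the means of x and y.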

Question 8
Regression equation

If the regression equation linking costs (y, in $) and productivity levels (x, in
units produced) is y = 55,000 + 25x, forecast the cost when 6,000 units are
produced.

y = 55,000 + 25 × 6,000 = $205,000


Question 9
By calculating the value of b, you are demonstrating the maths skill of
following the order of precedence of operators, including indices.

Regression equation

In a forecasting model based on y = a + bx, the intercept on the y-axis is 2,580.
If the value of y is 4,100 when x is 95, calculate the value of b.

4,100 = 2,580 + b × 95

1,520 = b × 95

b = 1,520/95 = 16

Illustrations and further practice


Now read illustration 4 and try TYUs 10 to 15 from Chapter 10


You should now be able to answer all the questions from Chapter 10 of the
Study Text and questions 192 – 203 from the Exam Practice Kit.

For further reading, see Chapter 10 of the Study Text.
