Data Analysis


Created by T. Madas

DATA ANALYSIS


DISCRETE DATA


Question 1
The following set of data is given

11, 45, 47, 49, 50, 52, 53, 59, 60, 94, 117.

For this set of data, ...

a) ... determine the value of the median and the quartiles.

b) ... calculate the mean and the standard deviation.

c) ... determine with justification whether there are any outliers.

d) ... state with justification if there is any type of skew.

e) ... draw a suitably labelled box plot.

Q1 = 47, Q2 = 52, Q3 = 60, x̄ ≈ 57.91, σ ≈ 26.09,
(11, 94) and 117 are outliers depending on method, positive skew
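
The answers to Question 1 can be reproduced with a short Python sketch. It assumes one common quartile convention (take the value at position kn/4, rounding a fractional position up and averaging two neighbours when the position is whole), the population form of the standard deviation, and the 1.5 × IQR fence for outliers. The function name `quartile` is illustrative, and other conventions can give slightly different quartiles.

```python
import math

data = [11, 45, 47, 49, 50, 52, 53, 59, 60, 94, 117]  # Question 1, already sorted


def quartile(sorted_data, k):
    """k-th quartile (k = 1, 2, 3) under the 'position k*n/4' convention (an assumption)."""
    whole, rem = divmod(k * len(sorted_data), 4)
    if rem:  # fractional position: round up to the next data value
        return sorted_data[whole]
    # whole-number position: average that value and the next one
    return (sorted_data[whole - 1] + sorted_data[whole]) / 2


q1, q2, q3 = (quartile(data, k) for k in (1, 2, 3))
n = len(data)
mean = sum(data) / n
sigma = math.sqrt(sum(x * x for x in data) / n - mean ** 2)  # population form

# Outliers by the 1.5 x IQR rule (one of several possible rules)
iqr = q3 - q1
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(q1, q2, q3, round(mean, 2), round(sigma, 2), outliers)
```

If the fence mean ± 2σ is used instead, only 117 lies outside it, which is why the stated answer depends on the method.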


Question 2
The following set of data is given

25, 47, 47, 49, 50, 52, 54, 55, 56, 56, 56, 59.

For this set of data, ...

a) ... determine the value of the median and the quartiles.

b) ... calculate the mean and the standard deviation.

c) ... determine with justification whether there are any outliers.

d) ... state with justification if there is any type of skew.

e) ... draw a suitably labelled box plot.

Q1 = 48, Q2 = 53, Q3 = 56, x̄ = 50.5, σ ≈ 8.54, 25 is an outlier, negative skew


Question 3
The following set of data is given

78, 79, 79, 79, 80, 82, 82, 85, 86, 88, 89, 92, 97.

For this set of data, ...

a) ... determine the value of the median and the quartiles.

b) ... calculate the mean and the standard deviation.

c) ... determine with justification whether there are any outliers.

d) ... state with justification if there is any type of skew.

e) ... draw a suitably labelled box plot.

Q1 = 79, Q2 = 82, Q3 = 89, x̄ ≈ 84.31, σ ≈ 5.635,
no outliers or 97 is an outlier depending on method, positive skew


Question 4
The following set of data is given

50, 52, 52, 52, 53, 54, 56, 58, 61, 64, 65, 69, 75, 77.

For this set of data, ...

a) ... determine the value of the median and the quartiles.

b) ... calculate the mean and the standard deviation.

c) ... determine with justification whether there are any outliers.

d) ... state with justification if there is any type of skew.

e) ... draw a suitably labelled box plot.

Q1 = 52, Q2 = 57, Q3 = 65, x̄ ≈ 59.86, σ ≈ 8.59, no outliers, positive skew


Question 5
The following set of data is given

7, 8, 10, 12, 16, 19, 19, 20, 23, 24, 30.

For this set of data, ...

a) ... determine the value of the median and the quartiles.

b) ... calculate the mean and the standard deviation.

c) ... determine with justification whether there are any outliers.

d) ... state with justification if there is any type of skew.

Q1 = 10, Q2 = 19, Q3 = 23, x̄ ≈ 17.09, σ ≈ 6.92, no outliers, negative skew


Question 6
The following set of data is given

5, 10, 11, 11, 14, 17, 19, 24, 30, 36, 40, 45.

For this set of data, ...

a) ... determine the value of the median and the quartiles.

b) ... calculate the mean and the standard deviation.

c) ... determine with justification whether there are any outliers.

d) ... state with justification if there is any type of skew.

Q1 = 11, Q2 = 18, Q3 = 33, x̄ ≈ 21.83, σ ≈ 12.55, no outliers, positive skew


Question 7
The following set of data shows the number of posts made, in a given day, in a social
media site by a group of individuals.

1, 12, 13, 14, 16, 17, 20, 21, 23, 24, 26, 39, 55.

For this set of data, ...

a) ... determine the value of the median and the quartiles.

b) ... calculate the mean and the standard deviation.

c) ... determine with justification whether there are any outliers.

d) ... state with justification if there is any type of skew.

MMS-P, (Q1, Q2, Q3) = (14, 20, 26) or (Q1, Q2, Q3) = (13.5, 20, 25), x̄ ≈ 21.6,
σ ≈ 12.9, 55 is an outlier, no skew or positive skew depending on the method


Question 8
The following set of data is given

21, 50, 51, 53, 54, 57, 58, 60, 62, 63, 64, 68, 82, 97.

For this set of data, ...

a) ... determine the value of the median and the quartiles.

b) ... calculate the mean and the standard deviation.

c) ... determine with justification whether there are any outliers.

d) ... state with justification if there is any type of skew.

Q1 = 53, Q2 = 59, Q3 = 64, x̄ = 60, σ ≈ 16.36, 21 and 97 are outliers,
positive/negative skew


Question 9
The % marks, rounded to the nearest integer, of a recent Mathematics test taken by 16
students, were summarised in an ordered stem and leaf diagram.

4 | 7
5 | 2, 3, 8
6 | 0, 3, 4, a, b
7 | 3, 6, c, d, 8
8 | 1, 9

where 5 | 2 = 52.

a) Determine the lower quartile of the data.

b) Given the median is 68 and a ≠ b , find the value of a and the value of b .

It is further given that c ≠ d .

c) Find the possible values of the upper quartile.

Q1 = 59, a = 7, b = 9, Q3 = 76.5, 77 or 77.5


Question 10
The concentration of lactic acid, in appropriate units, after a period of intense exercise
was measured in the blood of 12 marathon runners.

Athlete A B C D E F G H I J K L
Lactic Acid Concentration 180 172 110 175 256 140 241 450 205 375 402 195

a) Determine the value of the median and the quartiles.

b) Find the mean and the standard deviation of the data.

The skewness of data can be determined by the formula

\[ \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}. \]

c) Evaluate this expression for this data and hence state its skew.

d) Draw a suitably labelled box plot for this data.


You may assume that there are no outliers in this data.

x̄ = 241.75, σ ≈ 104.64, Q1 = 173.5, Q2 = 200, Q3 = 315.5, 1.20, positive skew
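
As a check on part (c), the skewness expression 3(mean − median)/(standard deviation) can be evaluated directly. This is a minimal sketch using Python's standard `statistics` module and assuming the population standard deviation is intended; the variable names are illustrative.

```python
import statistics

# Lactic acid concentrations for athletes A to L
lactic = [180, 172, 110, 175, 256, 140, 241, 450, 205, 375, 402, 195]

mean = statistics.mean(lactic)
median = statistics.median(lactic)
sigma = statistics.pstdev(lactic)  # population standard deviation (assumed)

# A positive value indicates positive skew
skew = 3 * (mean - median) / sigma
print(round(mean, 2), median, round(sigma, 2), round(skew, 2))
```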


Question 11
The number of phone text messages sent by 11 different students is given below.

14, 25, 31, 36, 37, 41, 51, 52, 55, 79, 112.

a) Find the lower quartile, the median and the upper quartile of the data.

b) Show clearly that there is only one outlier in the data.

c) Draw a suitably labelled box plot for this data, clearly indicating any outliers.

d) Determine with justification the skewness of the data.

Q1 = 31, Q2 = 41, Q3 = 55, 112 is the only outlier, positive skew


Question 12
The number of bottles of red wine sold by a local supermarket over a two week period
is shown below.

22, 14, 11, 33, 32, 45, 4, 12, 13, 20, 27, 44, 30, 15.

a) Display the above data in an ordered stem and leaf diagram.

b) Calculate the mean and the standard deviation of the data.

c) Find the median and the quartiles of the data and use them to determine if there
are any outliers.

d) Draw a suitably labelled box plot for this data.

e) Determine with justification the skewness of the data.

x̄ = 23, σ ≈ 12.11, Q1 = 13, Q2 = 21, Q3 = 33, no outliers, positive skew


Question 13
A company decides to give their 23 employees a skills test in order to decide if any of
these employees need to be retrained.

The maximum possible score in this test is 50 and the results are summarised in an
ordered stem and leaf diagram.

0 | 5
1 | 9, 9
2 | 1, 6, 8
3 | 3, 4, 5, 7
4 | 2, 3, 4, 4, 8, 9, 9
5 | 0, 0, 0, 0, 0, 0

where 2 | 9 = 29.

a) Find the median score of the test.

b) Determine the interquartile range of the scores.

The company decides to retrain any employee whose score is less than the lower
quartile minus the interquartile range.

c) Show clearly that only one employee will undergo retraining.

d) Draw a suitably labelled box plot for this data, clearly indicating any outliers,
as found in part (c).

e) Determine with justification the skewness of the scores.

Q2 = 43 , IQR = 22 , 05 is the only outlier , negative skew


Question 14
The ages of the residents of Arnold Street are denoted by x, and the ages of the residents of
Benedict Street are denoted by y.

These are summarized in the following back to back stem and leaf diagram.

                              x |   | y
                              5 | 0 |
                     5, 5, 3, 3 | 1 |
                        9, 9, 1 | 2 | 5
9, 8, 6, 5, 5, 4, 3, 2, 2, 2, 1 | 3 | 6, 7, 8
            6, 4, 1, 0, 0, 0, 0 | 4 | 1, 2, 2, 3, 4, 8
                              9 | 5 | 1, 4, 4, 4, 4, 5, 8, 8
                                | 6 | 1, 3, 4, 4, 5, 9, 9
                                | 7 | 2, 6, 9

where 2 | 3 | 9 = 32 in Arnold Street and 39 in Benedict Street.

a) Find separately for the residents of Arnold Street and Benedict Street, ...

i. ... the mode.

ii. ... the lower quartile, the median and the upper quartile.

iii. ... the mean and the standard deviation.

You may assume Σx = 866, Σx² = 31514, Σy = 1516, Σy² = 86880.


A coefficient of skewness is defined as

\[ \frac{\text{mean} - \text{mode}}{\text{standard deviation}}. \]

b) Evaluate this coefficient for the ages in each street.

c) Compare the distribution of the ages between the two streets.

Arnold Street: mode = 40, Q1 = 29, Q2 = 34, Q3 = 40, x̄ ≈ 32.07, σx ≈ 11.77, skew ≈ −0.67
Benedict Street: mode = 54, Q1 = 42.5, Q2 = 54, Q3 = 64, ȳ ≈ 54.14, σy ≈ 13.09, skew ≈ 0.01
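
The means, standard deviations and skewness coefficients can be checked from the given sums. The sketch below assumes the sample sizes n = 27 (Arnold Street) and n = 28 (Benedict Street), which are counts of the leaves in the diagram rather than values stated in the question; the helper name `summary` is illustrative.

```python
import math


def summary(n, total, total_sq, mode):
    """Mean, population standard deviation and (mean - mode) / sd from summary sums."""
    mean = total / n
    sigma = math.sqrt(total_sq / n - mean ** 2)
    return mean, sigma, (mean - mode) / sigma


arnold = summary(27, 866, 31514, 40)     # n = 27 counted from the diagram (assumption)
benedict = summary(28, 1516, 86880, 54)  # n = 28 counted from the diagram (assumption)

print([round(v, 2) for v in arnold])
print([round(v, 2) for v in benedict])
```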


Question 15
The mean and standard deviation of 20 observations x1, x2, x3, ..., x20 are

x̄ = 18.5 and σx = 6.5.

The mean and standard deviation of 12 observations y1, y2, y3, ..., y12 are

ȳ = 25 and σy = 7.5.

Determine the mean and the standard deviation of all 32 observations.

mean ≈ 20.94 , standard deviation ≈ 7.58
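
One way to reach this answer is to recover Σx and Σx² for each group from Σx = n × mean and Σx² = n(σ² + mean²), add them, and recompute. The following is a sketch of that approach, assuming population standard deviations; the helper name `sum_and_sum_sq` is illustrative.

```python
import math


def sum_and_sum_sq(n, mean, sd):
    """Recover the sum and the sum of squares from n, the mean and the population sd."""
    return n * mean, n * (sd ** 2 + mean ** 2)


sx, sxx = sum_and_sum_sq(20, 18.5, 6.5)
sy, syy = sum_and_sum_sq(12, 25, 7.5)

# Pool the two groups and recompute the mean and standard deviation
n = 20 + 12
mean = (sx + sy) / n
sd = math.sqrt((sxx + syy) / n - mean ** 2)
print(round(mean, 2), round(sd, 2))
```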


Question 16
The mean and standard deviation of the test marks of 40 pupils in a Mathematics class
are 65 and 18 , respectively.

The mean and standard deviation of the test marks of the 24 boys of the class are 72
and 20 , respectively.

Find the mean and standard deviation of the test marks of the 16 girls of the class.

mean = 54.5 , standard deviation ≈ 5.12
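
The same identities, Σx = n × mean and Σx² = n(σ² + mean²), can be used in reverse by subtracting the boys' sums from the class sums. This is a sketch under the assumption that all standard deviations are in population form; the variable names are illustrative.

```python
import math

n_all, mean_all, sd_all = 40, 65, 18
n_boys, mean_boys, sd_boys = 24, 72, 20

# Sum and sum of squares for the whole class and for the boys
sum_all, sq_all = n_all * mean_all, n_all * (sd_all ** 2 + mean_all ** 2)
sum_boys, sq_boys = n_boys * mean_boys, n_boys * (sd_boys ** 2 + mean_boys ** 2)

# What remains belongs to the girls
n_girls = n_all - n_boys
mean_girls = (sum_all - sum_boys) / n_girls
sd_girls = math.sqrt((sq_all - sq_boys) / n_girls - mean_girls ** 2)
print(mean_girls, round(sd_girls, 2))
```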


Question 17
It is given that for a sample of data x1, x2, x3, x4, x5, …, xn the mean x̄ and standard
deviation σ are

\[ \bar{x} = \frac{1}{n}\sum_{r=1}^{n} x_r = 2 \quad \text{and} \quad \sigma = \sqrt{\frac{1}{n}\sum_{r=1}^{n} (x_r)^2 - \frac{1}{n^2}\left(\sum_{r=1}^{n} x_r\right)^2} = 3. \]

Determine, in terms of n, the value of

\[ \sum_{r=1}^{n} (x_r + 1)^2. \]

\[ \sum_{r=1}^{n} (x_r + 1)^2 = 18n \]
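
A possible route to this answer, assuming σ is the square root of the variance expression given in the question: the mean gives the sum of the x_r as 2n, and σ² = 9 gives the sum of squares as n(σ² + x̄²) = 13n. Expanding the square then yields

\[ \sum_{r=1}^{n} (x_r + 1)^2 = \sum_{r=1}^{n} x_r^2 + 2\sum_{r=1}^{n} x_r + \sum_{r=1}^{n} 1 = 13n + 2(2n) + n = 18n. \]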


CONTINUOUS DATA


Question 1
The distances achieved by a group of javelin throwers, rounded to the nearest metre, are
summarized in the table below.

Distance (nearest metre)   Frequency
30 – 35                    12
36 – 38                    18
39 – 40                    21
41                         37
42 – 49                    10
50 – 60                    2

a) Estimate by linear interpolation, the value of the median and the quartiles.

b) Estimate the mean and the standard deviation of these data.

c) Determine with justification the skewness of the data.

d) Investigate the possibility of any outliers.

Q1 ≈ 37.67, Q2 ≈ 40.40, Q3 ≈ 41.15, x̄ = 39.675, σ ≈ 4.026, negative skew
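
The quartile estimates can be checked with a small linear interpolation routine. The sketch below assumes the usual class boundaries for data rounded to the nearest metre (so "30 – 35" covers 29.5 to 35.5 and "41" covers 40.5 to 41.5), positions n/4, n/2 and 3n/4, and data spread evenly within each class; the function name `interpolate` is illustrative.

```python
# (lower boundary, upper boundary, frequency) for each class
classes = [(29.5, 35.5, 12), (35.5, 38.5, 18), (38.5, 40.5, 21),
           (40.5, 41.5, 37), (41.5, 49.5, 10), (49.5, 60.5, 2)]


def interpolate(classes, position):
    """Value at a cumulative-frequency position, assuming an even spread in each class."""
    cumulative = 0
    for lower, upper, freq in classes:
        if cumulative + freq >= position:
            # Fraction of the way through this class, scaled by the class width
            return lower + (position - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("position exceeds total frequency")


total = sum(freq for _, _, freq in classes)
q1, q2, q3 = (interpolate(classes, total * k / 4) for k in (1, 2, 3))
print(round(q1, 2), round(q2, 2), round(q3, 2))
```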


Question 2
The number of hours worked in a given week by a group of 64 individuals is
summarized in the table below.

Hours (nearest hour)   Frequency
1 – 10                 5
11 – 20                16
21 – 25                14
26 – 30                17
31 – 40                10
41 – 59                2

a) Estimate, by linear interpolation, the value of the median.

b) Estimate the mean and the standard deviation of these data.

c) Establish, with justification, the skewness of the data.

d) Determine whether the data may contain any outliers.

Q2 ≈ 24.4, x̄ ≈ 23.88, σ ≈ 9.54, negative skew


Question 3
The weights of a random sample of a variety of apples, in grams, are summarised in the
table below.

Weight (grams) Frequency


120 ≤ w < 140 6
140 ≤ w < 150 16
150 ≤ w < 160 30
160 ≤ w < 170 36
170 ≤ w < 180 30
180 ≤ w < 200 0
200 ≤ w < 220 1

a) Estimate the mean and the standard deviation of these weights.

b) Estimate the median and the quartiles of these weights.

x̄ ≈ 161, σ ≈ 13, Q1 ≈ 152.6 – 152.7, Q2 ≈ 162.1 – 162.2, Q3 ≈ 170.4 – 170.7


Question 4
The mileages of 120 journeys covered by a minicab driver over a monthly period are
summarized in the table below.

Mileages Frequency
10 ≤ m < 12 2
12 ≤ m < 17 54
17 ≤ m < 19 28
19 ≤ m < 21 16
21 ≤ m < 23 13
23 ≤ m < 25 7

a) Estimate by linear interpolation, the value of the median.

b) Estimate the mean and the standard deviation of these data.

A skewness coefficient can be determined by

\[ \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}. \]

c) Evaluate this coefficient for this data and hence state its skew.

d) Determine whether the data could be modelled by a Normal distribution.

Q2 ≈ 17.3, x̄ ≈ 17.36, σ ≈ 3.21, slight positive skew/no skew


Question 5
The daily commuting distances of 125 individuals, rounded to the nearest mile, are
summarised in the table below.

Distance (nearest mile)   Frequency
0 – 9                     12
10 – 19                   22
20 – 29                   48
30 – 39                   26
40 – 49                   8
50 – 59                   5
60 – 69                   3
70 – 79                   1

a) Estimate the mean and the standard deviation of these commuting distances.

b) Use linear interpolation to estimate the value of the median.

c) Determine with justification the skewness of the data.

d) Explain which out of the mean and standard deviation or the median and the
interquartile range are more appropriate measures to summarize this data.

x̄ ≈ 26.74, σ ≈ 13.85, Q2 = 25.3 – 25.5, positive skew, median & IQR


HISTOGRAMS


Question 1
A group of patients with a certain respiratory condition were asked to hold their breath
for as long as they could.

The results are summarized in the table below.

Time t (in seconds)   Frequency
0 < t ≤ 10            30
10 < t ≤ 15           35
15 < t ≤ 18           33
18 < t ≤ 20           20
20 < t ≤ 30           25
30 < t ≤ 50           10

a) Draw an accurate histogram to represent this data.

b) Use the histogram to estimate the number of patients that managed to hold their
breath between 24 and 36 seconds.

c) Calculate estimates for the mean and standard deviation of this data.

≈ 18 patients, x̄ ≈ 16.6, σ ≈ 8.85
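
The estimate in part (b) is the area under the histogram between 24 and 36 seconds. The sketch below computes it from the table as frequency density × overlap width, assuming the data are spread evenly within each class; the function name `estimate_count` is illustrative.

```python
# (lower, upper, frequency) for each class of breath-holding times
classes = [(0, 10, 30), (10, 15, 35), (15, 18, 33),
           (18, 20, 20), (20, 30, 25), (30, 50, 10)]


def estimate_count(classes, lo, hi):
    """Area under the histogram between lo and hi."""
    count = 0.0
    for lower, upper, freq in classes:
        density = freq / (upper - lower)  # frequency density = frequency / class width
        overlap = max(0, min(hi, upper) - max(lo, lower))
        count += density * overlap
    return count


print(estimate_count(classes, 24, 36))
```

Here the interval covers 6 seconds of the 20 – 30 class (density 2.5) and 6 seconds of the 30 – 50 class (density 0.5).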


Question 2
The number of hours worked in a given week by a group of 64 freelance electricians
is summarized in the table below.

Hours (nearest hour)   Frequency
1 – 10                 5
11 – 20                16
21 – 25                14
26 – 30                17
31 – 40                10
41 – 59                2

a) Draw an accurate histogram to represent this data.

b) Use the histogram to estimate the number of freelance electricians that worked
between 15 and 37 hours during that week.

c) Estimate the median of the data.

≈ 48 , Q2 ≈ 24.4


Question 3
The times taken to complete a 3 mile run, in minutes, by the members of a jogging
club are summarized in the table below.

Times (nearest minute)   Frequency
11 – 14                  24
15 – 17                  24
18 – 19                  19
20                       11
21 – 23                  21
24 – 28                  15

a) Estimate the mean and standard deviation of this data.

b) Estimate, by linear interpolation, the median of this data.

c) Draw an accurate histogram to represent this data.

d) Find the proportion of data which lies within 3 standard deviations of the mean.

e) Discuss briefly whether this data could be modelled by a Normal distribution.

x̄ ≈ 18.5, σ ≈ 4.33, Q2 ≈ 18.4, 100%


Question 4
The histogram below shows the distribution of the marks of 250 students.

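[Histogram (figure not reproduced): frequency density, on a scale from 0 to 2, plotted against marks, with the horizontal axis marked at 0, 20, 40, 50, 60 and 100.]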

a) Estimate how many students scored between 52 and 74 marks.

b) Use the histogram to estimate the median.

c) Calculate estimates for the mean and standard deviation of the marks of these
students.

≈ 60 students, median ≈ 49, x̄ ≈ 51.8, σ ≈ 22.22


Question 5

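[Histogram (figure not reproduced): frequency density, on a scale from 0 to 2, plotted against height, with the horizontal axis marked at 4.5, 8.5, 12.5, 14.5, 16.5 and 20.5.]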

The histogram above shows the distribution of the heights, to the nearest cm , of some
plants in a garden centre. It is further given that there were 18 plants with a height
between 5 cm and 8 cm , rounded to the nearest cm .

a) Use the histogram to estimate the median.

b) Estimate, by calculation, the mean and the standard deviation of the heights of
these plants.

median ≈ 13.6, x̄ ≈ 13.48, σ ≈ 3.45


Question 6
In a histogram the weights of baby hamsters, correct to the nearest gram, are plotted on
the x axis.

In this histogram the class 24 − 30 has a frequency of 63 and is represented by a
rectangle of base 2.8 cm and height 6 cm.

In the same histogram the class 31 − 35 has a frequency of 60 .

Determine the measurements, in cm , of the rectangle that represents the class 31 − 35 .

base = 2 cm , height = 8 cm
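
Questions of this type rest on two proportionalities: the base of a rectangle is proportional to the class width, and its area is proportional to the frequency. The sketch below applies them to this question, assuming class boundaries at the half-gram (so 24 − 30 has width 7 and 31 − 35 has width 5); the variable names are illustrative.

```python
import math

# Reference class 24 - 30 (nearest gram): boundaries 23.5 to 30.5, so width 7
ref_width, ref_freq, ref_base, ref_height = 7, 63, 2.8, 6

# Target class 31 - 35: boundaries 30.5 to 35.5, so width 5
width, freq = 5, 60

cm_per_gram = ref_base / ref_width                # horizontal scale of the histogram
area_per_item = ref_base * ref_height / ref_freq  # area is proportional to frequency

base = width * cm_per_gram
height = freq * area_per_item / base
print(round(base, 2), round(height, 2))
```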


Question 7
In a histogram the commuting times of a group of individuals, correct to the nearest
minute, are plotted on the x axis.

In this histogram the class 47 − 50 has a frequency of 48 and is represented by a
rectangle of base 6 cm and height 3.6 cm.

In the same histogram the class 51 − 55 has a frequency of 30 .

Determine the measurements, in cm , of the rectangle that represents the class 51 − 55 .

base = 7.5 cm , height = 1.8 cm


Question 8
In a histogram the weights of apples, W grams, are plotted on the x axis.

In this histogram the class 125 ≤ W < 130 has a frequency of 75 and is represented by
a rectangle of base 1.8 cm and height 12 cm .

In the same histogram the class 150 ≤ W < 170 has a frequency of 40 .

Find the measurements, in cm , of the rectangle that represents the class 150 ≤ W < 170 .

base = 7.2 cm , height = 1.6 cm


Question 9
In a histogram the weights of peaches, correct to the nearest gram, are plotted on the x
axis.

In this histogram the class 146 − 150 has a frequency of 75 and is represented by a
rectangle of base 2.8 cm and height 7.5 cm .

In the same histogram a different class is represented by a rectangle of base 5.6 cm
and height 10.5 cm.

Determine the frequency of this class.

f = 210


Question 10
In a histogram the heights, h cm , of primary school pupils are plotted on the x axis.

In this histogram the class 120 ≤ h < 130 has a frequency of 72 and is represented by a
rectangle of base 4.2 cm and height 9 cm .

In the same histogram a different class is represented by a rectangle of base 2.1 cm
and height 8 cm.

Determine the frequency of this class.

f = 32


DATA CODING


Question 1
The monthly mileages of a sales rep are summarised in the table below.

Mileages (m) Frequency


3250 ≤ m < 3300 19
3300 ≤ m < 3350 45
3350 ≤ m < 3400 16
3400 ≤ m < 3450 5
3450 ≤ m < 3500 2

By using the coding

\[ y = \frac{x - 3325}{50}, \]

where x represents the midpoint of each class, estimate the mean and the standard
deviation of this data.

x̄ ≈ 3332, σ ≈ 45.2
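
The coding method can be sketched in Python as follows: code the class midpoints, compute the mean and standard deviation of the coded values, then undo the coding. The population form of the standard deviation is assumed, and the variable names are illustrative.

```python
import math

midpoints = [3275, 3325, 3375, 3425, 3475]  # midpoints of the mileage classes
freqs = [19, 45, 16, 5, 2]

coded = [(x - 3325) / 50 for x in midpoints]  # y = (x - 3325) / 50
n = sum(freqs)
mean_y = sum(f * y for f, y in zip(freqs, coded)) / n
sd_y = math.sqrt(sum(f * y * y for f, y in zip(freqs, coded)) / n - mean_y ** 2)

# Undo the coding: x = 50y + 3325, so the mean shifts and scales, the sd only scales
mean_x = 50 * mean_y + 3325
sd_x = 50 * sd_y
print(round(mean_x), round(sd_x, 1))
```

The coded midpoints are −1, 0, 1, 2 and 3, which is what makes the arithmetic manageable by hand.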


Question 2
The masses of 68 cows, in kg, are summarised in the table below.

Mass (m) Frequency


600 < m ≤ 625 11
625 < m ≤ 650 14
650 < m ≤ 675 28
675 < m ≤ 700 7
700 < m ≤ 725 5
725 < m ≤ 750 2
750 < m ≤ 775 1

a) By using the coding

\[ y = \frac{x - 662.5}{25}, \]

where x represents the midpoint of each class, estimate the mean and standard
deviation of this data.

b) Estimate, by the method of linear interpolation, the median mass of these cows.

x̄ ≈ 659.19, σ ≈ 32.91, Q2 ≈ 658.0


Question 3
The diameters of fine sand particles, in mm, are summarised in the table below.

Diameters (d) Frequency


0.02 < d ≤ 0.04 25
0.04 < d ≤ 0.06 76
0.06 < d ≤ 0.08 111
0.08 < d ≤ 0.10 255
0.10 < d ≤ 0.12 33

a) By using the coding

y = 50 ( x − 0.09 ) ,

where x represents the midpoint of each class, estimate the mean and the
standard deviation of this data.

b) Estimate, by linear interpolation, the median diameter of these sand particles.

c) Describe, with justification, the skewness of the data.

x̄ ≈ 0.0778, σ ≈ 0.0197, Q2 ≈ 0.08298


Question 4
The masses, x kg , of 40 students were measured and the results were summarized
using the notation below.

\[ \sum_{n=1}^{40} (x_n - 50) = 140 \quad \text{and} \quad \sum_{n=1}^{40} (x_n - 50)^2 = 4490. \]

Calculate the mean and standard deviation of the masses of these 40 students.

x̄ = 53.5, σ = 10


Question 5
The test marks, x, of 20 students were coded and their results were summarized as

\[ \sum (x - 10) = 220 \quad \text{and} \quad \sum (x - 10)^2 = 2720. \]

a) Use a detailed method to show that

\[ \sum x^2 = 9120. \]

b) Calculate the mean and standard deviation of the test marks of these students.

x̄ = 21, σ = √15 ≈ 3.87
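
One possible detailed method for this question, with n = 20, starts by removing the coding from the first sum and then expands the square in the second:

\[ \sum x = \sum (x - 10) + 10n = 220 + 200 = 420, \]

\[ \sum (x - 10)^2 = \sum x^2 - 20\sum x + 100n \quad \Longrightarrow \quad \sum x^2 = 2720 + 20(420) - 100(20) = 9120. \]

The mean and standard deviation then follow as

\[ \bar{x} = \frac{420}{20} = 21, \qquad \sigma = \sqrt{\frac{9120}{20} - 21^2} = \sqrt{15} \approx 3.87. \]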


Question 6
The following information about 5 observations of x is shown below.

\[ \sum_{i=1}^{5} \left( \frac{x_i - 255}{2} \right) = 50 \quad \text{and} \quad \sum_{i=1}^{5} \left( \frac{x_i - 255}{2} \right)^2 = 1650. \]

Calculate the mean and standard deviation of x .

x̄ = 275, σ = 2√230 ≈ 30.3
