
STA101: Lesson Note 11

EDA Plot

Box & Whisker plot:

Box-and-whisker plots (also called box plots) are a great way to represent a data set when you
want to show the median and the spread of the data at the same time. A box plot is a graphical
display, based on quartiles, that helps us picture a set of data. To construct a box plot, we need
only five statistics: the minimum value, Q1 (the first quartile), Q2 (the median), Q3 (the third
quartile), and the maximum value. In general, a box-and-whisker plot might look like this:

[Figure: a general box-and-whisker plot]

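The rectangle in the center is the box: it runs from Q1 to Q3, with a line inside it at the median.
The lines extending out from the sides of the box are the whiskers, which reach toward the lowest
and highest values.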

Five-number summary:

In a five-number summary, the following five numbers are used to summarize the data.

1. Lowest value

2. Q1

3. Median (Q2)

4. Q3

5. Highest value

STA101 (Introduction to Statistics) _Lesson Note 11_Summer 2022


Example:

For a distribution, Lowest value = 25, Highest value = 40, Q1 = 29.50, Q3 = 32.1, and
Median = 30.58. Show this information in a box plot.
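Without plotting software, a rough text sketch can still show where the five numbers sit on a number line. The Python below is a minimal sketch under stated assumptions: the function name `ascii_boxplot`, the characters used (`|` for whisker ends, `[` and `]` for the box edges, `M` for the median) and the scale parameters `lo`, `hi`, `width` are illustrative choices, not a standard tool.

```python
def ascii_boxplot(minimum, q1, median, q3, maximum, lo, hi, width=51):
    """Return a one-line text box plot of a five-number summary.

    Hypothetical helper: values are mapped linearly from the scale [lo, hi]
    onto character columns 0 .. width - 1.
    """
    if not lo <= minimum <= q1 <= median <= q3 <= maximum <= hi or lo == hi:
        raise ValueError("need lo <= min <= Q1 <= median <= Q3 <= max <= hi, lo < hi")

    def col(x):
        # Map a data value to the nearest character column.
        return round((x - lo) / (hi - lo) * (width - 1))

    line = [" "] * width
    # Whisker line from the minimum to the maximum.
    for c in range(col(minimum), col(maximum) + 1):
        line[c] = "-"
    # The box covers Q1 .. Q3.
    for c in range(col(q1), col(q3) + 1):
        line[c] = "="
    line[col(minimum)] = "|"   # end of the lower whisker
    line[col(maximum)] = "|"   # end of the upper whisker
    line[col(q1)] = "["        # left edge of the box (Q1)
    line[col(q3)] = "]"        # right edge of the box (Q3)
    line[col(median)] = "M"    # median line inside the box
    return "".join(line).rstrip()


# Example values: min = 25, Q1 = 29.50, median = 30.58, Q3 = 32.1, max = 40,
# drawn on a scale from 20 to 45 (each character is 0.5 units).
print(ascii_boxplot(25, 29.50, 30.58, 32.1, 40, lo=20, hi=45))
```

With this scale the whisker ends land at 25 and 40, the box edges at 29.5 and 32.1, and `M` just right of Q1, which makes the longer right whisker easy to see.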

Example:

Consider the distribution 21, 50, 51, 52, 60, 60, 61, 61, 62, 63, 63, 70, 70, 71, 71, 80, 80, 80, 83, 90, 93
(n = 21 values). Lowest value = 21, Highest value = 93, Q1 = 60, Q3 = 80, and Median = 63. Show this
information in a box plot.

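Each quartile position is (i × n)/4, where i = 1, 2, 3 and n is the number of observations. When the
position is not a whole number, round it up to the next position:

$$Q_1:\ \frac{1 \times 21}{4} = 5.25 \rightarrow \text{6th value} = 60$$

$$Q_2:\ \frac{2 \times 21}{4} = 10.5 \rightarrow \text{11th value} = 63$$

$$Q_3:\ \frac{3 \times 21}{4} = 15.75 \rightarrow \text{16th value} = 80$$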
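The position rule can be written as a short Python function. This is a sketch, assuming the rule exactly as used in this note: position (i × n)/4, rounded up when it is fractional, and the average of the kth and (k + 1)th values when it is a whole number k (as in the Facebook example later in this note). Other textbooks and software use different quartile conventions, so their results may differ slightly. The names `quartile` and `five_number_summary` are hypothetical.

```python
def quartile(sorted_data, i):
    """Return Q_i (i = 1, 2 or 3) from sorted data using the (i * n) / 4 position rule."""
    n = len(sorted_data)
    if n == 0:
        raise ValueError("need at least one observation")
    k, remainder = divmod(i * n, 4)
    if remainder == 0:
        # Whole-number position k: average the kth and (k + 1)th values.
        return (sorted_data[k - 1] + sorted_data[k]) / 2
    # Fractional position: round up to the (k + 1)th value (list index k).
    return sorted_data[k]


def five_number_summary(data):
    """Return (lowest, Q1, median, Q3, highest) for a list of numbers."""
    s = sorted(data)
    return s[0], quartile(s, 1), quartile(s, 2), quartile(s, 3), s[-1]


example_2 = [21, 50, 51, 52, 60, 60, 61, 61, 62, 63, 63,
             70, 70, 71, 71, 80, 80, 80, 83, 90, 93]
print(five_number_summary(example_2))  # (21, 60, 63, 80, 93)
```

Because 21 is odd, the Q2 position 10.5 rounds up to the 11th value, which is the usual median position (n + 1)/2.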



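Strategies for determining skewness from a box plot:

A box plot suggests that the data are skewed to the right when the median lies closer to Q1 than to
Q3 and the right whisker is longer than the left whisker. Skewness to the left shows the mirror-image
pattern, and a roughly symmetric distribution has the median near the middle of the box with
whiskers of similar length.

[Figure: box plot of a distribution skewed to the right]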
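The strategies above are visual. One numeric counterpart, which this note does not itself use, is Bowley's quartile skewness coefficient (Q3 + Q1 − 2 × median)/(Q3 − Q1): it is positive when the median sits closer to Q1 (right skew) and negative when it sits closer to Q3 (left skew). The sketch below assumes this coefficient as a stand-in for reading the box by eye; the 0.1 cut-off for "roughly symmetric" is an arbitrary illustrative choice, the function names are hypothetical, and the coefficient looks only at the box, not at the whiskers.

```python
def bowley_skewness(q1, median, q3):
    """Quartile (Bowley) skewness: (Q3 + Q1 - 2 * median) / (Q3 - Q1)."""
    if q3 == q1:
        raise ValueError("Q1 equals Q3, so the coefficient is undefined")
    return (q3 + q1 - 2 * median) / (q3 - q1)


def box_direction(q1, median, q3, tolerance=0.1):
    """Label the box shape; `tolerance` is an assumed cut-off, not a standard value."""
    b = bowley_skewness(q1, median, q3)
    if b > tolerance:
        return "skewed to the right"
    if b < -tolerance:
        return "skewed to the left"
    return "roughly symmetric"


# Quartiles taken from the worked examples in this note.
print(box_direction(29.50, 30.58, 32.1))  # Q1 = 29.50, median = 30.58, Q3 = 32.1
print(box_direction(50, 68.5, 79))        # Facebook example: Q1 = 50, median = 68.5, Q3 = 79
```

When the box and the whiskers point in different directions, as can happen with small samples, it is worth reporting both observations rather than relying on one number.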


Interquartile Range (IQR) = Q3 - Q1

The interquartile range covers the middle 50% of the data: it trims off the lowest 25% and the
highest 25% of the observations. Therefore, unlike the ordinary range (highest value − lowest value),
it is not affected by outliers.
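A quick way to see this resistance is to change one extreme value and recompute both measures. The sketch below reuses the 21-value data set from the earlier example, where the position rule puts Q1 at the 6th value and Q3 at the 16th value; the replacement value 300 is made up purely for illustration, and the function name is hypothetical.

```python
def range_and_iqr_n21(values):
    """Return (range, IQR) for exactly 21 observations.

    With n = 21 the note's position rule puts Q1 at the 6th value and
    Q3 at the 16th value of the sorted data (list indices 5 and 15).
    """
    s = sorted(values)
    if len(s) != 21:
        raise ValueError("this sketch assumes exactly 21 observations")
    q1, q3 = s[5], s[15]
    return s[-1] - s[0], q3 - q1


example_2 = [21, 50, 51, 52, 60, 60, 61, 61, 62, 63, 63,
             70, 70, 71, 71, 80, 80, 80, 83, 90, 93]
# Same data, but the largest value 93 is replaced by a hypothetical extreme 300.
with_extreme = example_2[:-1] + [300]

print(range_and_iqr_n21(example_2))     # (72, 20)
print(range_and_iqr_n21(with_extreme))  # (279, 20)
```

The range jumps from 72 to 279, while the IQR stays at 20 because neither the 6th nor the 16th value changes.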

Outliers:

Outliers are data points that are unlike most of the rest of the data. If a data point is very far
from most of the data, then we can probably call it an outlier. To decide this objectively, we use the
1.5 × IQR rule, which identifies both high outliers (outliers above the majority of the data) and
low outliers (outliers below the majority of the data).

Outliers are the data points that are less than Q1 − (1.5 × IQR) or greater than
Q3 + (1.5 × IQR).

Interquartile Range, IQR = Q3 − Q1

Lower fence = Q1 − 1.5 × IQR

Upper fence = Q3 + 1.5 × IQR

Any observation beyond these two fences is called an outlier and is represented by a '*' on the
box plot (one * for each outlier).
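The 1.5 × IQR rule translates directly into code. This sketch takes Q1 and Q3 as inputs (however they were computed) and applies them to the 21-value data set used earlier in this note, whose quartiles are Q1 = 60 and Q3 = 80; the function names are hypothetical.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for the k * IQR rule (k = 1.5 in this note)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(data, q1, q3, k=1.5):
    """Return the observations lying beyond either fence, in sorted order."""
    lower, upper = iqr_fences(q1, q3, k)
    return sorted(x for x in data if x < lower or x > upper)


example_2 = [21, 50, 51, 52, 60, 60, 61, 61, 62, 63, 63,
             70, 70, 71, 71, 80, 80, 80, 83, 90, 93]
print(iqr_fences(60, 80))                # (30.0, 110.0)
print(find_outliers(example_2, 60, 80))  # [21]
```

Each value returned by `find_outliers` would get its own * on the box plot.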



Example:

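For a distribution, Lowest value = 21, Highest value = 93, Q1 = 60, Q3 = 80, and Median = 63.

IQR = Q3 − Q1 = 80 − 60 = 20

Lower fence = Q1 − 1.5 × IQR = 60 − (1.5 × 20) = 30

Upper fence = Q3 + 1.5 × IQR = 80 + (1.5 × 20) = 110

The lowest value, 21, is below the lower fence of 30, so it is an outlier and is marked with a * on
the box plot. No value exceeds the upper fence of 110, so there are no high outliers. Using the data
from the previous example, the lower whisker then ends at 50, the smallest value inside the fences.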

Example:

A random sample of 20 people was taken to find the time (in hours) they spent on Facebook during
the last two weeks. The recorded data were as follows:

67, 76, 85, 42, 93, 48, 93, 46, 52, 72, 77, 53, 41, 48, 86, 78, 56, 80, 70, 66

Show this data in a boxplot. Are there any outliers?



Solution:

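Sorted data (from position 1 to position 20):

41, 42, 46, 48, 48, 52, 53, 56, 66, 67, 70, 72, 76, 77, 78, 80, 85, 86, 93, 93

Here n = 20, and each quartile position (i × n)/4 is a whole number k, so the quartile is the
average of the kth and (k + 1)th values:

$$Q_1:\ \frac{1 \times 20}{4} = 5 \;\Rightarrow\; Q_1 = \frac{\text{5th value} + \text{6th value}}{2} = \frac{48 + 52}{2} = 50$$

$$Q_2:\ \frac{2 \times 20}{4} = 10 \;\Rightarrow\; Q_2 = \frac{\text{10th value} + \text{11th value}}{2} = \frac{67 + 70}{2} = 68.5$$

$$Q_3:\ \frac{3 \times 20}{4} = 15 \;\Rightarrow\; Q_3 = \frac{\text{15th value} + \text{16th value}}{2} = \frac{78 + 80}{2} = 79$$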

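Lowest value = 41, Highest value = 93, Q1 = 50, Q3 = 79, and Median = 68.5.

IQR = Q3 − Q1 = 79 − 50 = 29

Lower fence = Q1 − 1.5 × IQR = 50 − (1.5 × 29) = 6.5

Upper fence = Q3 + 1.5 × IQR = 79 + (1.5 × 29) = 122.5

Every observation lies between 6.5 and 122.5 (the lowest is 41 and the highest is 93), so there are
no outliers, and the whiskers extend to 41 and 93.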
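Statistical software does not always follow the hand rule used in this note. For example, Python's standard-library `statistics.quantiles` (Python 3.8+) uses an interpolating "exclusive" method by default, so for the Facebook data it gives slightly different quartiles. The sketch below compares the fences from both; the hand-calculated Q1 = 50 and Q3 = 79 are copied from the solution above, and the helper name `fences_and_outliers` is hypothetical.

```python
import statistics

facebook_hours = [67, 76, 85, 42, 93, 48, 93, 46, 52, 72,
                  77, 53, 41, 48, 86, 78, 56, 80, 70, 66]


def fences_and_outliers(data, q1, q3):
    """Apply the 1.5 * IQR rule and return (lower fence, upper fence, outliers)."""
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return lower, upper, [x for x in data if x < lower or x > upper]


# Hand rule from this note: Q1 = 50, Q3 = 79.
print(fences_and_outliers(facebook_hours, 50, 79))  # (6.5, 122.5, [])

# Library default (method="exclusive") interpolates between positions.
q1, median, q3 = statistics.quantiles(facebook_hours, n=4)
print(q1, median, q3)                               # 49.0 68.5 79.5
print(fences_and_outliers(facebook_hours, q1, q3))  # (3.25, 125.25, [])
```

The quartiles differ by half an hour, but both sets of fences lead to the same conclusion: no outliers.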



Exercises for practice:
1. Sarah is visiting dairy farms as part of a research project and counting the number of red
cows at each farm she visits. Here is her data:
0, 1, 1, 1, 2, 5, 5, 7, 7, 18, 24, 24
Calculate the IQR and range of the data set.
[Ans: IQR = 11.5, Range = 24]
2. Catherine counted the number of lizards she saw in her garden each week and recorded the
data in a table. What is the interquartile range of the data? [Ans: IQR = 12]
Number of lizards Frequency
2 5
5 2
8 1
12 2
13 2
15 3
21 1
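When data come as a frequency table, one way to apply the quartile position rule is to expand the table back into a sorted list first. The sketch below does that for Catherine's table and can be used to check the stated answer; the dictionary layout and the function names are illustrative assumptions.

```python
def expand(freq_table):
    """Turn a {value: frequency} table into the sorted list of observations."""
    return sorted(value for value, freq in freq_table.items() for _ in range(freq))


def quartile(sorted_data, i):
    """Q_i by the (i * n) / 4 position rule used in this note."""
    k, remainder = divmod(i * len(sorted_data), 4)
    if remainder == 0:
        # Whole-number position: average the kth and (k + 1)th values.
        return (sorted_data[k - 1] + sorted_data[k]) / 2
    # Fractional position: round up to the (k + 1)th value (list index k).
    return sorted_data[k]


lizards = {2: 5, 5: 2, 8: 1, 12: 2, 13: 2, 15: 3, 21: 1}
data = expand(lizards)
print(len(data), quartile(data, 3) - quartile(data, 1))  # 16 12.0
```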

3. The bar graph shows the number of tickets sold for the high school party each day. What is
the interquartile range of the data set? [Ans: IQR = 4]



4. These are average lifespans in years of various mammals:
35, 10, 40, 40, 20, 10, 15, 14, 18, 35
Find the five-number summary for the data and show it in a box plot.
[Ans: Min=10, Q1= 14, Median= 19, Q3=35, Max=40]

5. Create the box-and-whisker plot for the book ratings given in the following stem-and-leaf plot.
Stem | Leaf
1 | 3 7 8
2 | 1 4 6
3 | 5 5
4 |
5 | 2 6

Key: 1|3 = 13
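Before drawing the box plot, the stem-and-leaf plot has to be read back into individual values. A small sketch of that step, assuming single-digit leaves and the key 1|3 = 13 (so each value is stem × 10 + leaf); the function name is hypothetical. The resulting list can then be fed into any five-number-summary routine.

```python
def expand_stem_leaf(rows, stem_unit=10):
    """Expand {stem: "leaves"} into sorted values, using value = stem * stem_unit + leaf."""
    values = []
    for stem in sorted(rows):
        for leaf in rows[stem]:  # each character is one single-digit leaf
            values.append(stem * stem_unit + int(leaf))
    return values


book_ratings = {1: "378", 2: "146", 3: "55", 4: "", 5: "26"}
print(expand_stem_leaf(book_ratings))  # [13, 17, 18, 21, 24, 26, 35, 35, 52, 56]
```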

