Lesson Note 11
EDA Plot
Box-and-whisker plots (also called box plots) are a great way to represent a data set when you
want to show the median and spread of the data at the same time. A box plot is a graphical
display, based on quartiles, that helps us picture a set of data. To construct a box plot, we need
only five statistics: the minimum value, Q1 (the first quartile), Q2 (the median), Q3 (the third
quartile), and the maximum value. In general, a box-and-whisker plot might look like this:
The rectangle in the center is the box; it runs from Q1 to Q3, with a line inside it at the
median. The lines extending out from the sides of the box to the lowest and highest values are
the whiskers.
In a five-number summary, the following five numbers are used to summarize the data.
1. Lowest value
2. Q1
3. Median (Q2)
4. Q3
5. Highest value
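As a rough text-only illustration of how these five numbers map onto the picture, the Python sketch below draws a one-line box plot. The helper name ascii_boxplot, the symbols, and the default width are illustrative choices and not part of the lesson; the numbers are those of the 21-value example worked later in this note.

```python
def ascii_boxplot(low, q1, median, q3, high, width=73):
    """Return a one-line text box plot such as |---[==M==]---| (hypothetical helper).

    Assumes low < high and low <= q1 <= median <= q3 <= high.
    """
    def col(x):
        # Map a data value to a character column between 0 and width - 1.
        return round((x - low) / (high - low) * (width - 1))

    line = ["-"] * width                     # whiskers by default
    for c in range(col(q1), col(q3) + 1):
        line[c] = "="                        # the box spans Q1 to Q3
    line[col(low)] = "|"                     # end of the lower whisker
    line[col(high)] = "|"                    # end of the upper whisker
    line[col(q1)] = "["
    line[col(q3)] = "]"
    line[col(median)] = "M"                  # median marker inside the box
    return "".join(line)


plot = ascii_boxplot(21, 60, 63, 80, 93)
print(plot)
```

With width=73 each character stands for one unit of the data, so the box starts 39 columns from the left edge (60 − 21) and the median sits 3 columns further in.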
For a distribution, Lowest value= 25, Highest value= 40, Q1= 29.50, Q3= 32.1, and Median=
30.58. Show this information in a boxplot.
Example:
Consider the distribution 21, 50, 51, 52, 60, 60, 61, 61, 62, 63, 63, 70, 70, 71, 71, 80, 80, 80, 83, 90, 93
(n = 21 values, already sorted).
Lowest value = 21, Highest value = 93, Q1 = 60, Q3 = 80, and Median = 63. Show this information
in a boxplot.
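The position of the i-th quartile in the sorted data is $\frac{i \cdot n}{4}$, rounded up to the next whole number:
Q1: $\frac{i \cdot n}{4} = \frac{1 \times 21}{4} = 5.25 \rightarrow$ 6th value
Q2: $\frac{i \cdot n}{4} = \frac{2 \times 21}{4} = 10.5 \rightarrow$ 11th value
Q3: $\frac{i \cdot n}{4} = \frac{3 \times 21}{4} = 15.75 \rightarrow$ 16th value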
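The position calculation for this example can be checked with a short Python sketch. The function name quartile_by_position is hypothetical, and the sketch only covers the case shown here, where i × n / 4 is fractional and is rounded up.

```python
import math

data = [21, 50, 51, 52, 60, 60, 61, 61, 62, 63, 63,
        70, 70, 71, 71, 80, 80, 80, 83, 90, 93]


def quartile_by_position(values, i):
    """i-th quartile taken at position i*n/4, rounded up (fractional case only)."""
    ordered = sorted(values)
    n = len(ordered)
    position = math.ceil(i * n / 4)    # e.g. 5.25 is rounded up to the 6th value
    return ordered[position - 1]       # positions count from 1, list indices from 0


q1, q2, q3 = (quartile_by_position(data, i) for i in (1, 2, 3))
print(q1, q2, q3)
```

The 6th, 11th and 16th sorted values are 60, 63 and 80, which agrees with the Q1, Median and Q3 stated for this example.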
The interquartile range (IQR = Q3 − Q1) measures the spread of the middle 50% of the data; it
trims off the lowest 25% and the highest 25%. Therefore, unlike the ordinary range (highest value
minus lowest value), it is not affected by outliers.
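A small Python sketch can make this concrete. It uses the 21-value example from this note and a hypothetical altered copy in which the lowest value 21 is replaced by 1; the helper name iqr_and_range is an illustrative choice, and the quartile positions are rounded up as in the example above.

```python
import math


def iqr_and_range(values):
    """Return (IQR, range) using quartile positions i*n/4 rounded up.

    Assumes the positions are fractional, which holds for n = 21.
    """
    ordered = sorted(values)
    n = len(ordered)
    q1 = ordered[math.ceil(n / 4) - 1]
    q3 = ordered[math.ceil(3 * n / 4) - 1]
    return q3 - q1, ordered[-1] - ordered[0]


data = [21, 50, 51, 52, 60, 60, 61, 61, 62, 63, 63,
        70, 70, 71, 71, 80, 80, 80, 83, 90, 93]
altered = [1] + data[1:]        # hypothetical: lowest value 21 replaced by 1

print(iqr_and_range(data))      # IQR and range of the original data
print(iqr_and_range(altered))   # only the range reacts to the extreme value
```

The range grows from 72 to 92 when the lowest value moves, while the IQR stays at 20 in both cases.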
Outliers:
Outliers are data points that are unlike most of the rest of the data. If there’s a data point that’s
really far from most of the data, then we can probably call it an outlier. We use what’s called the
1.5-IQR rule, and it will identify both high outliers (outliers above the majority of the data) and
low outliers (outliers below the majority of the data).
Outliers are the data points that lie below the lower fence, Q1 – (1.5 X IQR), or above the upper
fence, Q3 + (1.5 X IQR).
Any observation whose value lies beyond these two fences is called an outlier and is
represented by a ‘*’ sign on the boxplot (one * for each outlier).
For a distribution, Lowest value= 21, Highest value= 93, Q1= 60, Q3= 80, and Median= 63.
IQR = Q3 − Q1 = 80 − 60 = 20
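Applying the 1.5-IQR rule to this distribution can be sketched in Python as follows. The quartiles are the ones stated in the note; the variable names are illustrative.

```python
data = [21, 50, 51, 52, 60, 60, 61, 61, 62, 63, 63,
        70, 70, 71, 71, 80, 80, 80, 83, 90, 93]

q1, q3 = 60, 80                       # quartiles as stated in the note
iqr = q3 - q1                         # 80 - 60 = 20

lower_fence = q1 - 1.5 * iqr          # 60 - 30 = 30
upper_fence = q3 + 1.5 * iqr          # 80 + 30 = 110

# An observation is an outlier if it lies beyond either fence.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(lower_fence, upper_fence, outliers)
```

Only the lowest value, 21, falls below the lower fence of 30, so it is the single observation that would be marked with a ‘*’ on the boxplot; nothing exceeds the upper fence of 110.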
Example:
A random sample of 20 people was taken to find out how much time (in hours) they had spent on
Facebook during the last two weeks. The recorded data were as follows:
67, 76, 85, 42, 93, 48, 93, 46, 52, 72, 77, 53, 41, 48, 86, 78, 56, 80, 70, 66
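Sorted data:
Position:  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
Value:    41  42  46  48  48  52  53  56  66  67  70  72  76  77  78  80  85  86  93  93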
Lowest value= 41, Highest value= 93, Q1= 50, Q3= 79, and Median= 68.5.
IQR = Q3 − Q1 = 79 − 50 = 29
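The Python sketch below reproduces these figures from the raw data. Here n = 20, so the positions i × n / 4 are whole numbers (5, 10, 15). The note does not spell out the rule for that case; averaging the value at that position with the next one is an assumption, chosen because it reproduces the stated Q1 = 50, Median = 68.5 and Q3 = 79. The function name quartile is illustrative.

```python
times = [67, 76, 85, 42, 93, 48, 93, 46, 52, 72,
         77, 53, 41, 48, 86, 78, 56, 80, 70, 66]


def quartile(values, i):
    """i-th quartile from position i*n/4 in the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    whole, remainder = divmod(i * n, 4)
    if remainder:
        # Fractional position: round up to the (whole + 1)-th value.
        return ordered[whole]
    # Whole-number position k (assumed rule): average the k-th and (k+1)-th values.
    return (ordered[whole - 1] + ordered[whole]) / 2


q1, q2, q3 = (quartile(times, i) for i in (1, 2, 3))
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in times if x < lower_fence or x > upper_fence]

print(min(times), q1, q2, q3, max(times), iqr)
print(lower_fence, upper_fence, outliers)
```

The fences come out at 6.5 and 122.5, so no observation in this sample is an outlier under the 1.5-IQR rule.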
3. The bar graph shows the number of tickets sold for the high school party each day. What is
the interquartile range of the data set? [Ans: IQR = 4]
5. Create the box-and-whisker plot for the book ratings given in the stem and leaf plot.
Stem | Leaf
1 | 3 7 8
2 | 1 4 6
3 | 5 5
4 |
5 | 2 6
Key: 1|3 = 13
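As a first step towards the plot, the stem-and-leaf display can be expanded back into the individual ratings using the key (stem = tens digit, leaf = units digit). The sketch below, in Python, only performs that expansion; the dictionary layout and variable names are illustrative.

```python
# Each stem maps to its string of leaf digits; stem 4 has no leaves.
stem_leaf = {1: "378", 2: "146", 3: "55", 4: "", 5: "26"}

# Key: 1|3 = 13, so a rating is 10 * stem + leaf.
ratings = [10 * stem + int(leaf)
           for stem, leaves in stem_leaf.items()
           for leaf in leaves]

print(ratings)
print(len(ratings))
```

The resulting list is already in ascending order, so it can be used directly to locate the quartile positions.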