Intro W03 Rev
Notes:
If the measures are computed for data from a sample, they are called sample statistics.
If the measures are computed for data from a population, they are called population
parameters.
A sample statistic is referred to as the point estimator of the corresponding population
parameter.
Mean
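The mean for ungrouped data is obtained by dividing the
sum of all values by the number of values in the data set. Thus,
$$\mu = \frac{\sum x}{N} \quad \text{(population mean)} \qquad \bar{x} = \frac{\sum x}{n} \quad \text{(sample mean)}$$
where N is the population size and n is the sample size.
Median
The median is the value of the middle term in a data set that has been ranked in increasing order.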
Notes:
The median gives the center of a histogram, with half the data values to the left
of the median and half to the right of the median.
The advantage of using the median as a measure of central tendency is that it is
not influenced by outliers.
Consequently, the median is preferred over the mean as a measure of center
for data sets that contain outliers.
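To see the outlier point in numbers, here is a minimal Python sketch using the standard-library statistics module. The data values are hypothetical (not from the slides): one extreme value is appended to a small data set, and the mean shifts sharply while the median barely moves.

```python
from statistics import mean, median

# Hypothetical data (not from the slides): five values, then the same values
# plus one extreme value acting as an outlier.
base = [20, 22, 24, 26, 28]
with_outlier = base + [200]

print("without outlier:", mean(base), median(base))  # both 24
print("with outlier:   ", mean(with_outlier), median(with_outlier))
# The mean jumps from 24 to about 53.3, while the median only moves to 25.
```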
Mean and Median application:
The following data give lunch expenses in Prasmul canteen (in 000 Rp) for a
sample of 20 students on Friday, 8 March 2024.
25;35;27;45;33;28;29;42;33;28;45;35;35;40;22;29;40;35;35;30
a) Calculate the mean
b) Calculate the median
c) Give some interpretation of these statistics.
d) Assume you intend to open a canteen business at this location.
What should you consider before you start the business (relevant to
the two statistics)?
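Solution
a) Mean:
$$\bar{x} = \frac{\sum x}{n} = \frac{22 + 25 + 27 + 28 + 28 + 29 + 29 + 30 + 33 + 33 + 35 + 35 + 35 + 35 + 35 + 40 + 40 + 42 + 45 + 45}{20} = \frac{671}{20} = 33.55$$
b) n = 20 is an even number, so the median position is
$$\frac{n + 1}{2} = \frac{20 + 1}{2} = 10.5$$
Ranked data: 22, 25, 27, 28, 28, 29, 29, 30, 33, 33, 35, 35, 35, 35, 35, 40, 40, 42, 45, 45
The 10th and 11th values are x10 = 33 and x11 = 35, so
$$\text{Median} = \frac{x_{10} + x_{11}}{2} = \frac{33 + 35}{2} = 34$$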
c) For these 20 students, the mean (33.55) and the median (34) are nearly equal, so both measures place
the center of lunch spending at about Rp 34,000. This suggests that the sample data are roughly symmetric
(skewness close to zero).
d) How many students are likely to become your customers? The likely revenue would be the number of
customers times the typical spending per customer (mean or median); profit also depends on your costs.
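The calculations in parts a) and b) can be checked with a short Python sketch. It follows the same steps as the solution (rank the data, average the 10th and 11th values) and then cross-checks the results with the standard-library statistics functions.

```python
from statistics import mean, median

# Lunch expenses (in 000 Rp) for the 20 sampled students, as listed in the example
lunch = [25, 35, 27, 45, 33, 28, 29, 42, 33, 28,
         45, 35, 35, 40, 22, 29, 40, 35, 35, 30]

ranked = sorted(lunch)
n = len(ranked)

# a) Mean: sum of all values divided by the number of values (671 / 20)
x_bar = sum(ranked) / n

# b) n is even, so the median is the average of the 10th and 11th ranked values.
# Python lists are 0-indexed, so those are ranked[9] and ranked[10].
med = (ranked[n // 2 - 1] + ranked[n // 2]) / 2

print(x_bar, med)                  # 33.55 34.0
print(mean(lunch), median(lunch))  # standard-library cross-check
```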
Mode
• The mode of a data set is the value that occurs with
the greatest frequency. It is more suitable for categorical
data (with a limited number of categories) than for continuous
quantitative data.
• The greatest frequency can occur at two or more
different values.
• If the data have exactly two modes, the data are
bimodal.
• If the data have more than two modes, the data are
multimodal.
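A minimal Python sketch of finding the mode(s), covering the unimodal, bimodal, and multimodal cases. The helper name modes is hypothetical, and treating a data set in which no value repeats as having no mode is an assumed convention, not something stated in these notes.

```python
from collections import Counter

def modes(data):
    """Return all values that occur with the greatest frequency, sorted.

    Assumed convention: if no value occurs more than once, return [] (no mode).
    """
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(value for value, count in counts.items() if count == top)

lunch = [25, 35, 27, 45, 33, 28, 29, 42, 33, 28,
         45, 35, 35, 40, 22, 29, 40, 35, 35, 30]

print(modes(lunch))                      # [35]  (35 occurs five times)
print(modes(["A", "B", "A", "C", "B"]))  # ['A', 'B']  bimodal, hypothetical categories
```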
Weighted Mean
$$\text{Weighted mean} = \frac{\sum xw}{\sum w}$$
where x and w denote the variable and the weights,
respectively.
Example:
Maura bought gas for her car four times during June 2015. She bought 10
gallons at a price of $2.60 a gallon, 13 gallons at a price of $2.80 a gallon, 8
gallons at a price of $2.70 a gallon, and 15 gallons at a price of $2.75 a gallon.
What is the average price that Maura paid for gas during June 2015?
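$$\text{Weighted mean} = \frac{\sum xw}{\sum w} = \frac{10(2.60) + 13(2.80) + 8(2.70) + 15(2.75)}{10 + 13 + 8 + 15} = \frac{125.25}{46} \approx \$2.72$$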
Thus, Maura paid an average of $2.72 a gallon for the gas she bought in June
2015.
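The same weighted-mean calculation as a minimal Python sketch: the gallons bought play the role of the weights w and the prices the role of x.

```python
# Maura's purchases: price per gallon (x) and gallons bought (w)
prices = [2.60, 2.80, 2.70, 2.75]
gallons = [10, 13, 8, 15]

sum_xw = sum(x * w for x, w in zip(prices, gallons))  # 125.25
sum_w = sum(gallons)                                   # 46
weighted_mean = sum_xw / sum_w

print(f"${weighted_mean:.2f} a gallon")  # $2.72 a gallon
```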
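Relationships Among the Mean, Median, and Mode
[Figure: histograms of a symmetric, a right-skewed, and a left-skewed distribution]
• Symmetric: mean = median = mode
• Skewed to right: mean > median > mode
• Skewed to left: mean < median < mode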
Geometric Mean
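1. The geometric mean is calculated by finding the nth root of the product of n
values. It is useful when the data contain very high (outlier) values, because it is less affected by them than the arithmetic mean and so gives a more “typical” value.
$$GM = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$$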
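A minimal Python sketch of the geometric mean, using hypothetical data with one high value. geo_mean is an illustrative helper that follows the definition directly (nth root of the product); statistics.geometric_mean (Python 3.8+) gives the same result via logarithms, which avoids overflow when the product is very large.

```python
import math
from statistics import geometric_mean

def geo_mean(values):
    """nth root of the product of n positive values (direct definition)."""
    return math.prod(values) ** (1 / len(values))

# Hypothetical data (not from the slides) with one high value
data = [2, 4, 8, 64]

print(geo_mean(data))         # approximately 8 (fourth root of 4096)
print(geometric_mean(data))   # same value, computed via logarithms
print(sum(data) / len(data))  # arithmetic mean 19.5, pulled up by the 64
```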
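Variance and Standard Deviation
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N} \quad \text{and} \quad s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \quad \text{and} \quad s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
where σ² and σ are the population variance and standard deviation, and s² and s are the sample variance and standard deviation.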
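The population and sample formulas differ only in the divisor (N versus n − 1). A minimal Python sketch with hypothetical data makes this explicit and cross-checks against the standard library's pvariance and variance; the function names population_variance and sample_variance are illustrative.

```python
import math
import statistics

def population_variance(xs):
    """sigma^2: squared deviations from the mean, divided by N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """s^2: squared deviations from the sample mean, divided by n - 1."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)

# Hypothetical data (not from the slides)
xs = [4, 8, 6, 2]

sigma = math.sqrt(population_variance(xs))  # treat xs as a whole population
s = math.sqrt(sample_variance(xs))          # treat xs as a sample

# Cross-check with the standard library
print(population_variance(xs), statistics.pvariance(xs))
print(sample_variance(xs), statistics.variance(xs))
```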
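Coefficient of Variation (CV)
$$CV = \frac{\sigma}{\mu} \times 100\% \quad \text{(population)} \qquad CV = \frac{s}{\bar{x}} \times 100\% \quad \text{(sample)}$$
Note that the coefficient of variation has no units of measurement, because it is
always expressed as a percent.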
The following data give lunch expenses in Prasmul canteen (in 000 Rp) for a
sample of 20 students on Friday, 8 March 2024.
25;35;27;45;33;28;29;42;33;28;45;35;35;40;22;29;40;35;35;30
Find the sample standard deviation and sample CV!
x (x - mean)^2
22 133.4025
25 73.1025
27 42.9025
28 30.8025
28 30.8025
29 20.7025
29 20.7025
30 12.6025
33 0.3025
33 0.3025
35 2.1025
35 2.1025
35 2.1025
35 2.1025
35 2.1025
40 41.6025
40 41.6025
42 71.4025
45 131.1025
45 131.1025
Total 792.95
mean 33.55
s = √(792.95 / (20 − 1)) = √41.7342 ≈ 6.460202
CV = (s / mean) × 100% = (6.460202 / 33.55) × 100% ≈ 19.26%
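The table's total, the sample standard deviation, and the sample CV can be reproduced with a short Python sketch using s = √(Σ(x − x̄)² / (n − 1)) and CV = (s / x̄) × 100%.

```python
import math

# Lunch expenses (in 000 Rp) for the 20 sampled students
lunch = [25, 35, 27, 45, 33, 28, 29, 42, 33, 28,
         45, 35, 35, 40, 22, 29, 40, 35, 35, 30]

n = len(lunch)
x_bar = sum(lunch) / n                     # sample mean
ss = sum((x - x_bar) ** 2 for x in lunch)  # total of the (x - mean)^2 column
s = math.sqrt(ss / (n - 1))                # sample standard deviation
cv = s / x_bar * 100                       # sample CV, in percent

print(f"{ss:.2f} {s:.6f} {cv:.2f}%")       # 792.95 6.460202 19.26%
```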
Measures of Position
Quartiles and Interquartile Range
Percentiles and Percentile Rank
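The kth percentile is approximated by
$$P_k = \text{value of the } \left(\frac{kn}{100}\right)\text{th term in a ranked data set}$$
For the commute times of the 12 students, the position of the 70th percentile is
$$\frac{kn}{100} = \frac{70(12)}{100} = 8.4$$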
Thus, the 70th percentile, P70, is given by the value of the 9th term in
the ranked data set. Note that we rounded 8.4 up to 9, which is always
the case when calculating a percentile.
P70 = Value of the 9th term = 42 minutes
Thus, we can state that approximately 70% of these 12 students
commute for less than or equal to 42 minutes.
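A minimal Python sketch of this percentile rule (position kn/100, always rounded up, counted from 1 in the ranked data). The commute-time data for the 12 students are not listed here, so the usage applies the hypothetical helper percentile to the lunch-expense data instead. This rule differs from the interpolation that many software tools use by default (for example numpy.percentile), so results may not match those tools.

```python
def percentile(data, k):
    """Approximate kth percentile (1 <= k <= 99): the value of the
    (kn/100)th term in the ranked data, with kn/100 always rounded up."""
    ranked = sorted(data)
    n = len(ranked)
    position = -(-k * n // 100)           # ceiling of k*n/100 in integer arithmetic
    position = max(1, min(position, n))   # keep the position inside 1..n
    return ranked[position - 1]           # 1-based position -> 0-based index

lunch = [25, 35, 27, 45, 33, 28, 29, 42, 33, 28,
         45, 35, 35, 40, 22, 29, 40, 35, 35, 30]

print(percentile(lunch, 70))  # 70(20)/100 = 14, so the 14th ranked value: 35
```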
Find how many data values are less than 42.
In the above ranked data, there are eight data values that are less than 42.
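$$\text{Percentile rank of } x_i = \frac{\text{number of values less than } x_i}{\text{total number of values}} \times 100\%$$
$$\text{Percentile rank of } 42 = \frac{8}{12} \times 100\% = 66.67\%$$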
Rounding this answer to the nearest integral value, we can state that about
67% of the students in this sample commute for less than 42 minutes.
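Percentile rank as a minimal Python sketch: count the values strictly less than the given value and divide by the number of values. percentile_rank is a hypothetical helper name; the usage again applies it to the lunch-expense data.

```python
def percentile_rank(data, value):
    """Percentage of data values strictly less than `value`."""
    below = sum(1 for x in data if x < value)
    return below / len(data) * 100

lunch = [25, 35, 27, 45, 33, 28, 29, 42, 33, 28,
         45, 35, 35, 40, 22, 29, 40, 35, 35, 30]

print(f"{percentile_rank(lunch, 40):.2f}%")  # 75.00%  (15 of 20 values are below 40)
```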
Box-and-Whisker Plot
Definition
A plot that shows the center, spread, and skewness of a data set. It is
constructed by drawing a box and two whiskers that use the median, the first
quartile, the third quartile, and the smallest and the largest values in the data
set between the lower and the upper inner fences.
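A minimal Python sketch of the numbers needed to draw a box-and-whisker plot, applied to the lunch-expense data from the earlier example. It assumes Q1 and Q3 are the medians of the lower and upper halves of the ranked data and that the inner fences lie 1.5 × IQR below Q1 and above Q3; these notes do not state the quartile method or the fence multiplier, so treat both as assumptions. box_plot_summary is a hypothetical helper.

```python
from statistics import median

def box_plot_summary(data):
    """Values used to draw a box-and-whisker plot (needs at least 2 values).

    Assumptions: Q1 and Q3 are the medians of the lower and upper halves of
    the ranked data (the middle value is left out when n is odd), and the
    inner fences lie 1.5 * IQR below Q1 and above Q3.
    """
    ranked = sorted(data)
    n = len(ranked)
    q1 = median(ranked[: n // 2])
    q3 = median(ranked[(n + 1) // 2 :])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in ranked if lower_fence <= x <= upper_fence]
    return {
        "Q1": q1,
        "median": median(ranked),
        "Q3": q3,
        "IQR": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        # Whiskers end at the smallest and largest values inside the fences
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [x for x in ranked if x < lower_fence or x > upper_fence],
    }

lunch = [25, 35, 27, 45, 33, 28, 29, 42, 33, 28,
         45, 35, 35, 40, 22, 29, 40, 35, 35, 30]

print(box_plot_summary(lunch))
```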
The following data are the incomes (in thousands of dollars) for a sample of
12 households.