Probability and Statistics Lecture Notes
The most commonly used measures of central tendency are the mean,
the median, and the mode.
2.1.1 Mean
The mean is the arithmetic average of all the scores, or of a group of scores,
in a distribution. It is denoted by the symbol µ for the population mean and
x̄ for the sample mean.
Population mean, $\mu = \dfrac{\sum X}{N}$
Sample mean, $\bar{x} = \dfrac{\text{sum of observations}}{\text{number of observations}} = \dfrac{\sum x}{n}$
Where:
X = the scores in a population
x = the scores in a sample
N = total number of observations in a population
n = total number of observations in a sample
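The definition above translates directly into code. The sketch below is a minimal illustration, not part of the original notes: the helper name `arithmetic_mean` is hypothetical, and the data are the Group 1 exam scores from Section 3. Because µ and x̄ use the same arithmetic, the same function gives the population mean when the list contains every member of the population.

```python
from statistics import fmean


def arithmetic_mean(scores):
    """Sum of observations divided by the number of observations (hypothetical helper)."""
    if not scores:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(scores) / len(scores)


group1 = [90, 82, 85, 85, 83]   # Group 1 scores from Section 3
print(arithmetic_mean(group1))  # 85.0
print(fmean(group1))            # standard-library cross-check: 85.0
```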
2.1.2 Median
The median is the midpoint of the data array. Before finding this value,
the data must be arranged in order, from least to greatest or vice versa. The
median will either be a specific value or will fall between two values.
It is the point in the distribution of scores at which 50% of the scores fall
below and 50% of the scores fall above; that is, it is the middlemost score.
When an odd number of observations are placed in an array, the median
corresponds to the [(n+1)/2]th largest observation.
When there is an even number of observations in an array, the median
corresponds to the midpoint between the (n/2)th and the [(n/2) + 1]th largest
observations.
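As a minimal sketch of the two rules above (odd and even n), the hypothetical helper below sorts the data and then either picks the middle observation or averages the two middle ones. The 1-based positions in the rules become 0-based list indices in Python, hence the `- 1` adjustments.

```python
def median(values):
    """Median via the [(n+1)/2]th rule (odd n) or the (n/2)th/[(n/2)+1]th midpoint rule (even n)."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    data = sorted(values)                 # arrange the data in an array first
    n = len(data)
    if n % 2 == 1:
        return data[(n + 1) // 2 - 1]     # [(n+1)/2]th observation, converted to a 0-based index
    lower = data[n // 2 - 1]              # (n/2)th observation
    upper = data[n // 2]                  # [(n/2)+1]th observation
    return (lower + upper) / 2            # midpoint of the two middle observations


print(median([62, 95, 85, 85, 98]))  # odd n: 85
print(median([62, 95, 85, 98]))      # even n: midpoint of 85 and 95 -> 90.0
```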
2.1.3 Mode
The third measure of central tendency is the mode, the value that
occurs most often in the data set. A data set can have more than one mode or
none at all. If two values occur with the same highest frequency, the data set
is bimodal.
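One way to apply the definition, assuming the common convention that a data set in which no value repeats has no mode, is to count frequencies and keep every value tied for the highest count. The helper name `modes` is hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # assumed convention: every value occurs once, so there is no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes([90, 82, 85, 85, 83]))  # [85] (one mode)
print(modes([1, 2, 2, 3, 3]))       # [2, 3] (bimodal)
print(modes([4, 7, 9]))             # [] (no mode)
```

Python's `statistics.multimode` also returns all tied values, but when every value occurs once it returns the whole data set rather than an empty list, so it follows a different convention from the one assumed here.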
2.2.1 Mean
Mean, $\bar{x} = \dfrac{\sum(\text{frequency of each class} \times \text{class mark})}{\text{total number of observations}}$
$\bar{x} = \dfrac{\sum fM}{n}$
This is called the long method formula.
Other formulas that can be used in finding the mean of grouped data are the
Coding and Deviation formulas:
• Coding method
$\bar{x} = A + i\left(\dfrac{\sum fu}{n}\right)$
where:
A = assumed mean
i = class size or class width
f = frequency
u = algebraic unit deviation of class mark from the assumed mean
n = total number of observations
• Deviation method
$\bar{x} = A + \dfrac{\sum fd}{n}$
where:
A = assumed mean
f = frequency
d = algebraic deviation of class mark from the assumed mean
n = total number of observations
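To check that the long, deviation, and coding methods agree, the sketch below applies all three to a small frequency distribution. The table (classes 10-19 through 50-59 with frequencies 3, 5, 9, 6, 2) and the choice of assumed mean A = 34.5 are hypothetical and not from the notes; u is computed as (M - A)/i, the deviation measured in whole class widths.

```python
# Hypothetical frequency distribution (not from the notes); class width i = 10
class_marks = [14.5, 24.5, 34.5, 44.5, 54.5]  # midpoints of 10-19, 20-29, ..., 50-59
freqs = [3, 5, 9, 6, 2]
n = sum(freqs)  # 25 observations
i = 10          # class size

# Long method: x̄ = Σ(fM) / n
long_mean = sum(f * m for f, m in zip(freqs, class_marks)) / n

# Assumed mean A: here the class mark of the middle class (any class mark works)
A = 34.5

# Deviation method: d = M - A, x̄ = A + Σ(fd) / n
deviation_mean = A + sum(f * (m - A) for f, m in zip(freqs, class_marks)) / n

# Coding method: u = (M - A) / i, x̄ = A + i * Σ(fu) / n
# round() turns u into a whole number of class steps and removes float noise
coding_mean = A + i * sum(f * round((m - A) / i) for f, m in zip(freqs, class_marks)) / n

print(long_mean, deviation_mean, coding_mean)
```

All three print approximately 34.1 (exact arithmetic gives 852.5/25 = 34.1). The shortcut methods only ease hand calculation; they give the same answer as the long method whichever class mark is chosen as A.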
2.2.2 Median
$\text{Median} = L + \left(\dfrac{\frac{n}{2} - F}{f_m}\right) i$
where:
L = exact lower limit of the median class (its lower class boundary)
n = total number of observations
F = cumulative frequency of the class preceding the median class
fm = frequency of the median class
i = class width
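The median formula can be sketched as below on a hypothetical distribution (classes 10-19 through 50-59 with frequencies 3, 5, 9, 6, 2, lower class boundaries 9.5, 19.5, ..., class width 10), which is not from the notes. The function locates the median class as the first class whose cumulative frequency reaches n/2 and then applies the formula; its name and argument layout are assumptions.

```python
def grouped_median(lower_boundaries, freqs, width):
    """Median = L + ((n/2 - F) / fm) * i for a frequency distribution."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty distribution")
    half = n / 2
    F = 0  # cumulative frequency of the classes before the current one
    for L, fm in zip(lower_boundaries, freqs):
        if F + fm >= half:  # median class: cumulative frequency first reaches n/2
            return L + (half - F) / fm * width
        F += fm
    raise ValueError("lower_boundaries and freqs must have the same length")


# Hypothetical distribution: lower class boundaries and class frequencies
lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5]
freqs = [3, 5, 9, 6, 2]
print(grouped_median(lower_boundaries, freqs, 10))  # 34.5
```

Here n/2 = 12.5, the median class is 30-39 (boundaries 29.5-39.5), F = 8 and fm = 9, so the result is 29.5 + (4.5/9)(10) = 34.5.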
2.2.3 Mode
The class with the highest frequency is called the modal class.
$\text{Mode} = L_{mo} + \left(\dfrac{d_1}{d_1 + d_2}\right) i$
where:
Lmo = lower boundary of the modal class
i = class size or class width
d1 = difference between the frequency of the modal class and that of the class preceding it
d2 = difference between the frequency of the modal class and that of the class succeeding it
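A sketch of the modal-class formula on a hypothetical distribution (classes 10-19 through 50-59, frequencies 3, 5, 9, 6, 2, width 10), not taken from the notes. The notes do not say how to treat a modal class at either end of the table; the code assumes the missing neighbouring frequency is 0, and it picks the first class if several share the highest frequency. The function name is hypothetical.

```python
def grouped_mode(lower_boundaries, freqs, width):
    """Mode = Lmo + (d1 / (d1 + d2)) * i for a frequency distribution."""
    k = freqs.index(max(freqs))                # modal class: highest frequency (first if tied)
    f_before = freqs[k - 1] if k > 0 else 0    # assumption: a missing neighbour counts as 0
    f_after = freqs[k + 1] if k + 1 < len(freqs) else 0
    d1 = freqs[k] - f_before                   # modal class minus preceding class
    d2 = freqs[k] - f_after                    # modal class minus succeeding class
    if d1 + d2 == 0:
        raise ValueError("formula undefined: modal class frequency equals both neighbours")
    return lower_boundaries[k] + d1 / (d1 + d2) * width


# Hypothetical distribution: lower class boundaries and class frequencies
lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5]
freqs = [3, 5, 9, 6, 2]
print(grouped_mode(lower_boundaries, freqs, 10))  # 29.5 + (4/7)(10), about 35.21
```

With d1 = 9 - 5 = 4 and d2 = 9 - 6 = 3, the modal class 30-39 gives an estimate of about 35.21.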
3 Measures of Variability
The previous section focused on measures of central tendency, or
averages, which are the central scores of a given set of data. However,
measures of central tendency may not reflect every feature of a data set. For
example, two different groups of five students were given the same exam in
Statistics, and the results are as follows:
Group 1 Group 2
90 62
82 95
85 85
85 85
83 98
Taking the mean, median, and mode of these two groups of data gives a
value of 85 for all three measures. Yet Group 2's scores are clearly more
widely scattered than Group 1's. This characteristic is called variability, and
averages do not reflect it.
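The claim that both groups share the same mean, median, and mode can be checked with Python's standard `statistics` module. The minimum and maximum printed alongside show the difference in spread that the averages hide.

```python
import statistics

group1 = [90, 82, 85, 85, 83]
group2 = [62, 95, 85, 85, 98]

for name, scores in (("Group 1", group1), ("Group 2", group2)):
    print(name,
          statistics.mean(scores),    # 85 for both groups
          statistics.median(scores),  # 85 for both groups
          statistics.mode(scores),    # 85 for both groups
          min(scores), max(scores))   # but the spread differs: 82-90 vs 62-98
```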
The three basic measures of variation are range, variance and standard
deviation.
3.1 Range
It is the simplest measure of variation: the difference between the highest
and the lowest values. A larger range suggests greater variation or dispersion.
For ungrouped data:
Range, R = Hs - Ls
where Hs is the highest score and Ls is the lowest score
For grouped data:
R = UBHC – LBLC
UBHC – upper boundary of the highest class
LBLC – lower boundary of the lowest class
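Both range formulas are one-liners. The hypothetical helpers below assume equal-width classes listed from lowest to highest, so the upper boundary of the highest class is its lower boundary plus the class width. The ungrouped example uses the two exam groups above; the grouped example uses an illustrative table with lower boundaries 9.5 to 49.5 and width 10, which is not from the notes.

```python
def ungrouped_range(scores):
    """R = Hs - Ls: highest score minus lowest score."""
    return max(scores) - min(scores)


def grouped_range(lower_boundaries, width):
    """R = UBHC - LBLC, assuming equal-width classes sorted from lowest to highest."""
    ubhc = lower_boundaries[-1] + width  # upper boundary of the highest class
    lblc = lower_boundaries[0]           # lower boundary of the lowest class
    return ubhc - lblc


print(ungrouped_range([90, 82, 85, 85, 83]))             # Group 1: 8
print(ungrouped_range([62, 95, 85, 85, 98]))             # Group 2: 36
print(grouped_range([9.5, 19.5, 29.5, 39.5, 49.5], 10))  # hypothetical table: 50.0
```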
3.2 Mean Absolute Deviation (MAD)
It measures the average absolute deviation of all observations in a given
set of data from the mean. It is the sum of the absolute differences between the
scores (or class marks) and the arithmetic mean, divided by the total number of
observations.
For ungrouped data:
$MAD = \dfrac{\sum_{k=1}^{n} \left| x_k - \bar{x} \right|}{n}$
where:
xk = a particular data value
x̄ = mean
n = total frequency or number of observations
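Applied to the two groups of exam scores from the start of this section, the MAD formula quantifies the scatter that the averages miss. This is a minimal sketch; the function name is hypothetical.

```python
def mean_absolute_deviation(scores):
    """MAD = Σ|x_k - x̄| / n."""
    n = len(scores)
    mean = sum(scores) / n
    return sum(abs(x - mean) for x in scores) / n


print(mean_absolute_deviation([90, 82, 85, 85, 83]))  # Group 1: 2.0
print(mean_absolute_deviation([62, 95, 85, 85, 98]))  # Group 2: 9.2
```

Group 2's MAD (9.2) is more than four times Group 1's (2.0), matching the visibly wider scatter.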
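3.3 Standard Deviation
For ungrouped data:
Population standard deviation, σ
$\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$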
Sample standard deviation, s
$s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
where:
x = individual data value
µ = population mean
x̄ = sample mean
n = sample size
N = population size
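The two ungrouped formulas differ only in the divisor: N for a population and n - 1 for a sample. The sketch below implements them directly (function names hypothetical) and compares them with `statistics.pstdev` and `statistics.stdev`, which use the same two divisors, on the Group 2 scores from the start of this section.

```python
import math
import statistics


def population_sd(values):
    """σ = sqrt(Σ(x - µ)² / N)."""
    N = len(values)
    mu = sum(values) / N
    return math.sqrt(sum((x - mu) ** 2 for x in values) / N)


def sample_sd(values):
    """s = sqrt(Σ(x - x̄)² / (n - 1)); needs at least two observations."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample standard deviation needs n >= 2")
    xbar = sum(values) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))


group2 = [62, 95, 85, 85, 98]
print(population_sd(group2), statistics.pstdev(group2))  # both about 12.63
print(sample_sd(group2), statistics.stdev(group2))       # both about 14.12
```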
For grouped data:
$\sigma = \sqrt{\dfrac{\sum f_i (M_i - \mu)^2}{N}}$
$s = \sqrt{\dfrac{\sum f_i (M_i - \bar{x})^2}{n - 1}}$
where:
M = class mark or class midpoint
µ = population mean
x̄ = sample mean
f = frequency
n = sample size
N = population size
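For a frequency distribution, each class mark stands in for all the observations in its class, weighted by its frequency. The sketch below uses a hypothetical table (class marks 14.5 to 54.5, frequencies 3, 5, 9, 6, 2), not data from the notes, and computes the mean from that table. The function name is hypothetical, and `sample=False` treats the table as the whole population.

```python
import math


def grouped_sd(class_marks, freqs, sample=True):
    """Standard deviation of a frequency distribution from class marks M and frequencies f."""
    n = sum(freqs)
    mean = sum(f * m for f, m in zip(freqs, class_marks)) / n
    ss = sum(f * (m - mean) ** 2 for f, m in zip(freqs, class_marks))  # Σ f(M - mean)²
    return math.sqrt(ss / (n - 1) if sample else ss / n)


class_marks = [14.5, 24.5, 34.5, 44.5, 54.5]  # hypothetical table
freqs = [3, 5, 9, 6, 2]
print(grouped_sd(class_marks, freqs))                # s, about 11.36
print(grouped_sd(class_marks, freqs, sample=False))  # σ, about 11.13
```

For this table the mean is 34.1 and Σf(M - x̄)² = 3096, so s² = 3096/24 = 129 and σ² = 3096/25 = 123.84.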
3.4 Variance
It is simply the square of the standard deviation.
Sample variance, s²
$s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
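Since the variance is the square of the standard deviation, it comes from the same sum of squared deviations without taking the square root. The hypothetical helper below uses the n - 1 divisor of the sample formula. For comparison it also shows the standard library's `statistics.variance` (sample) and `statistics.pvariance` (population, divisor N) on the Group 1 scores from the start of this section.

```python
import statistics


def sample_variance(values):
    """s² = Σ(x - x̄)² / (n - 1)."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


group1 = [90, 82, 85, 85, 83]
print(sample_variance(group1))       # 38 / 4 = 9.5
print(statistics.variance(group1))   # sample variance: 9.5
print(statistics.pvariance(group1))  # population variance, 38 / 5 = 7.6
```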
Deciles are values that divide a set of observations into 10 equal parts.
These values, denoted by D1, D2, …, D9, are such that 10% of the observations
fall below D1, 20% fall below D2, …, and 90% fall below D9.
Quartiles are values that divide a set of observations into 4 equal parts.
These values, denoted by Q1, Q2, and Q3, are such that 25% of the observations
fall below Q1, 50% fall below Q2, and 75% fall below Q3.
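Python's `statistics.quantiles` (Python 3.8+) returns the n - 1 cut points that divide data into n groups, so `n=4` gives Q1 to Q3 and `n=10` gives D1 to D9. Textbooks use several slightly different position formulas for quartiles and deciles, and the function's default `method='exclusive'` may not match the one taught in a given course, so hand-computed values can differ slightly. The data here pool the two exam groups from Section 3 purely for illustration.

```python
import statistics

scores = [90, 82, 85, 85, 83, 62, 95, 85, 85, 98]  # Groups 1 and 2 pooled, for illustration only
quartiles = statistics.quantiles(scores, n=4)       # [Q1, Q2, Q3]
deciles = statistics.quantiles(scores, n=10)        # [D1, ..., D9]
print(quartiles)
print(deciles)
print(quartiles[1] == statistics.median(scores))    # Q2 is the median: True
```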