Probability and Statistics Lecture Notes


2 Measures of Central Tendency

The most commonly used measures of central tendency are: the mean,
the median and the mode.

2.1 Ungrouped Data

2.1.1 Mean
The mean is the arithmetic average of all the scores in a distribution.
It is denoted by µ for the population mean and x̄ for the sample mean.
Population mean: $$\mu = \frac{\sum X}{N}$$
Sample mean: $$\bar{x} = \frac{\text{sum of observations}}{\text{number of observations}} = \frac{\sum x}{n}$$

Where:
X = the scores in population
x = the scores in a sample
N = total number of observations in a population
n = total number of observations in a sample

2.1.2 Median
The median is the midpoint of the data array. Before finding this value,
the data must be arranged in order, from least to greatest or vice versa. The
median will either be a specific value or will fall between two values.
It is the point in the distribution of scores at which 50% of the scores
fall below and 50% of the scores fall above. It is the middlemost score.
When an odd number of observations is arranged in an array, the median
is the [(n+1)/2]th observation.
When an even number of observations is arranged in an array, the median
is the midpoint between the (n/2)th and the [(n/2) + 1]th observations.
2.1.3 Mode
The third measure of central tendency is the mode. It is the value that
occurs most often in the data set. A data set can have more than one mode or
none at all. If two values are tied for the highest frequency, the data set
is bimodal.
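
The three definitions for ungrouped data can be sketched in Python. This is a minimal illustration rather than part of the original notes: the function names are hypothetical, and treating a data set in which no value repeats as having no mode is an assumed convention. The scores are the Group 1 exam scores from Section 3.

```python
from collections import Counter

def mean(scores):
    # Sum of observations divided by number of observations.
    return sum(scores) / len(scores)

def median(scores):
    ordered = sorted(scores)          # the data must be arranged in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # the [(n+1)/2]th observation
    # Even n: midpoint of the (n/2)th and [(n/2)+1]th observations.
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(scores):
    counts = Counter(scores)
    highest = max(counts.values())
    if highest == 1:
        return []                     # assumed convention: no repeated value, no mode
    return sorted(v for v, c in counts.items() if c == highest)

scores = [90, 82, 85, 85, 83]
print(mean(scores), median(scores), modes(scores))  # 85.0 85 [85]
```

A list with two values tied for the highest count, such as [1, 1, 2, 2, 3], returns both values, which matches the bimodal case described above.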

2.2 Grouped Data

2.2.1 Mean
$$\bar{x} = \frac{\sum (\text{frequency of each class} \times \text{class mark})}{\text{total number of observations}} = \frac{\sum (f \cdot M)}{n}$$
This is called the long method formula.

Other formulas that can be used in finding the mean of grouped data are the
Coding and Deviation formulas:

• Coding method
$$\bar{x} = A + i \left( \frac{\sum f u}{n} \right)$$
where:
A = assumed mean
i = class size or class width
f = frequency
u = algebraic unit deviation of class mark from the assumed mean
n = total number of observations

• Deviation method
$$\bar{x} = A + \frac{\sum f d}{n}$$

where:

A = assumed mean
f = frequency
d = algebraic deviation of class mark from the
assumed mean
n = total number of observations
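
The long, coding, and deviation methods should all give the same mean. The sketch below checks this on a hypothetical frequency table (the class marks, frequencies, assumed mean A = 22, and class width i = 5 are invented for illustration and do not come from the notes).

```python
# Hypothetical grouped data: class marks M and frequencies f.
class_marks = [12, 17, 22, 27, 32]
freqs = [2, 5, 8, 4, 1]
n = sum(freqs)   # total number of observations
A = 22           # assumed mean (here, the class mark of the middle class)
i = 5            # class width

# Long method: sum of f*M divided by n.
long_mean = sum(f * M for f, M in zip(freqs, class_marks)) / n

# Coding method: u = (M - A) / i is the unit deviation from the assumed mean.
coded_mean = A + i * sum(f * ((M - A) / i) for f, M in zip(freqs, class_marks)) / n

# Deviation method: d = M - A is the deviation from the assumed mean.
deviation_mean = A + sum(f * (M - A) for f, M in zip(freqs, class_marks)) / n

print(long_mean, coded_mean, deviation_mean)  # 21.25 21.25 21.25
```

The coding and deviation methods exist mainly to keep hand arithmetic small; in code the long method is usually the simplest choice.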

2.2.2 Median
$$\text{Median} = L + \left( \frac{\frac{n}{2} - F}{f_m} \right) i$$
where:
L = exact lower limit (lower class boundary) of the median class
n = total number of observations
F = cumulative frequency of the class preceding the median class
f_m = frequency of the median class
i = class width
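
As a rough sketch, the grouped median formula can be applied by walking the cumulative frequencies until the class containing the (n/2)th observation is found. The function name and the frequency table below are hypothetical.

```python
def grouped_median(lower_boundaries, freqs, width):
    """Median = L + ((n/2 - F) / f_m) * i for grouped data."""
    n = sum(freqs)
    cumulative = 0  # F: cumulative frequency before the current class
    for L, f in zip(lower_boundaries, freqs):
        if f > 0 and cumulative + f >= n / 2:
            # This is the median class.
            return L + (n / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("frequency table is empty")

# Hypothetical classes 10-14, 15-19, 20-24, 25-29, 30-34 (boundaries 9.5, 14.5, ...).
lower_boundaries = [9.5, 14.5, 19.5, 24.5, 29.5]
freqs = [2, 5, 8, 4, 1]
print(grouped_median(lower_boundaries, freqs, width=5))  # 21.375
```

Here n/2 = 10 falls in the third class (L = 19.5, F = 7, f_m = 8), so the median is 19.5 + (3/8)(5) = 21.375.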

2.2.3 Mode
The class with the highest frequency is called the modal class.
$$\text{Mode} = L_{mo} + \left( \frac{d_1}{d_1 + d_2} \right) i$$
where:
L_mo = lower boundary of the modal class
i = class size or class width
d_1 = difference of the frequency of the modal class and the class preceding it
d_2 = difference of the frequency of the modal class and the class succeeding it
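
The grouped mode formula can be sketched the same way. The function name and data are hypothetical, and two conventions are assumptions not stated in the notes: a missing neighbouring class is treated as having frequency 0, and the first class with the highest frequency is taken as the modal class.

```python
def grouped_mode(lower_boundaries, freqs, width):
    """Mode = L_mo + (d1 / (d1 + d2)) * i for grouped data."""
    k = freqs.index(max(freqs))                        # modal class (first highest)
    before = freqs[k - 1] if k > 0 else 0              # assumed 0 if no preceding class
    after = freqs[k + 1] if k < len(freqs) - 1 else 0  # assumed 0 if no succeeding class
    d1 = freqs[k] - before
    d2 = freqs[k] - after
    return lower_boundaries[k] + d1 / (d1 + d2) * width

# Hypothetical frequency table with class width 5.
lower_boundaries = [9.5, 14.5, 19.5, 24.5, 29.5]
freqs = [2, 5, 8, 4, 1]
print(round(grouped_mode(lower_boundaries, freqs, width=5), 2))  # 21.64
```

The modal class is the third one (frequency 8), so d1 = 8 - 5 = 3 and d2 = 8 - 4 = 4, giving 19.5 + (3/7)(5).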
3 Measures of Variability
The previous section focused on measures of central tendency or
averages which are the central scores of a given set of data. However, not all
features of a given data set may be reflected by the measures of central
tendency. For example, two different groups of five students were given the
same exam in Statistics and the results are as follows:
Group 1    Group 2
   90         62
   82         95
   85         85
   85         85
   83         98

Taking the mean, median and mode of each group gives a value of 85 for all
three measures. But Group 2 is clearly more widely scattered than Group 1:
its scores run from 62 to 98, while those of Group 1 run from 82 to 90. This
characteristic is called variability. It is not reflected by averages.
The three basic measures of variation are range, variance and standard
deviation.

3.1 Range
It is the simplest measure of variation. It is the difference between the
highest and the lowest values in the data. A larger range suggests greater
variation or dispersion.
For ungrouped data:
R = Hs - Ls
Hs = highest score
Ls = lowest score
For grouped data:
R = UBHC - LBLC
UBHC = upper boundary of the highest class
LBLC = lower boundary of the lowest class
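
A short sketch of both range formulas, using the two groups of exam scores from the start of this section. The function names and the class boundaries passed to the grouped version are hypothetical.

```python
def ungrouped_range(scores):
    # R = highest score - lowest score
    return max(scores) - min(scores)

def grouped_range(upper_boundary_highest, lower_boundary_lowest):
    # R = UBHC - LBLC
    return upper_boundary_highest - lower_boundary_lowest

group_1 = [90, 82, 85, 85, 83]
group_2 = [62, 95, 85, 85, 98]
print(ungrouped_range(group_1))   # 8
print(ungrouped_range(group_2))   # 36
print(grouped_range(34.5, 9.5))   # 25.0
```

Both groups have a mean of 85, but the ranges of 8 and 36 show the difference in spread.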
3.2 Mean Absolute Deviation (MAD)
It is the average of the absolute deviations from the mean of all
observations in a given set of data. It is the sum of the absolute differences
between the scores (or class marks) and the arithmetic mean, divided by the
total number of observations.
For ungrouped data:
$$MAD = \frac{\sum_{k=1}^{n} |x_k - \bar{x}|}{n}$$
where:
x_k = particular data
x̄ = mean
n = total frequency or observations

For grouped data:


$$MAD = \frac{\sum_{k=1}^{m} f_k |M_k - \bar{x}|}{n}$$
where:
f_k = frequency of the kth class
M_k = class mark of the kth class
m = number of class intervals
x̄ = mean
n = total frequency or observations
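
Both MAD formulas can be sketched as follows. The function names and the grouped frequency table are hypothetical; the ungrouped scores are the two exam groups from the start of this section.

```python
def mad_ungrouped(scores):
    n = len(scores)
    mean = sum(scores) / n
    # Average of the absolute deviations |x_k - mean|.
    return sum(abs(x - mean) for x in scores) / n

def mad_grouped(class_marks, freqs):
    n = sum(freqs)
    mean = sum(f * m for f, m in zip(freqs, class_marks)) / n
    # Each class mark's absolute deviation is weighted by its frequency.
    return sum(f * abs(m - mean) for f, m in zip(freqs, class_marks)) / n

group_1 = [90, 82, 85, 85, 83]
group_2 = [62, 95, 85, 85, 98]
print(mad_ungrouped(group_1))  # 2.0
print(mad_ungrouped(group_2))  # 9.2

# Hypothetical grouped data (class marks and frequencies).
print(mad_grouped([12, 17, 22, 27, 32], [2, 5, 8, 4, 1]))
```

The absolute value is essential: without it the deviations from the mean always sum to zero.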

3.3 Standard Deviation


Of all the measures of variation, the standard deviation is the most
preferred because it is used in many statistical operations. It is the square
root of the variance, that is, the square root of the average squared
deviation from the mean.
For ungrouped data:
Population standard deviation, σ
$$\sigma = \sqrt{\frac{\sum_{k=1}^{N} (x_k - \mu)^2}{N}}$$
Sample standard deviation, s
$$s = \sqrt{\frac{\sum_{k=1}^{n} (x_k - \bar{x})^2}{n - 1}}$$

where:
x_k = individual data
µ = population mean
x̄ = sample mean
n = sample size
N = total population

For grouped data:


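Population standard deviation, σ
$$\sigma = \sqrt{\frac{\sum_{k=1}^{m} f_k (M_k - \mu)^2}{N}}$$

Sample standard deviation, s
$$s = \sqrt{\frac{\sum_{k=1}^{m} f_k (M_k - \bar{x})^2}{n - 1}}$$

where:
M_k = class mark or class midpoint of the kth class
µ = population mean
x̄ = sample mean
f_k = frequency of the kth class
m = number of class intervals
n = sample size
N = total population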
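
As an illustration of the ungrouped formulas, the sketch below computes both the population and the sample standard deviation for the two exam groups. The function names are hypothetical; only the divisor (N versus n - 1) differs between the two.

```python
import math

def population_sd(scores):
    N = len(scores)
    mu = sum(scores) / N
    # Divide the sum of squared deviations by N.
    return math.sqrt(sum((x - mu) ** 2 for x in scores) / N)

def sample_sd(scores):
    n = len(scores)
    x_bar = sum(scores) / n
    # Divide the sum of squared deviations by n - 1.
    return math.sqrt(sum((x - x_bar) ** 2 for x in scores) / (n - 1))

group_1 = [90, 82, 85, 85, 83]
group_2 = [62, 95, 85, 85, 98]
print(round(population_sd(group_1), 2), round(sample_sd(group_1), 2))  # 2.76 3.08
print(round(population_sd(group_2), 2), round(sample_sd(group_2), 2))  # 12.63 14.12
```

Whether the five scores count as a population or a sample depends on the context, so both versions are shown.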

3.4 Variance
It is simply the square of the standard deviation.

For ungrouped data:


Population variance, σ²
$$\sigma^2 = \frac{\sum_{k=1}^{N} (x_k - \mu)^2}{N}$$

Sample variance, s²
$$s^2 = \frac{\sum_{k=1}^{n} (x_k - \bar{x})^2}{n - 1}$$

For grouped data:


Population variance, σ²
$$\sigma^2 = \frac{\sum_{k=1}^{m} f_k (M_k - \mu)^2}{N}$$

Sample variance, s²
$$s^2 = \frac{\sum_{k=1}^{m} f_k (M_k - \bar{x})^2}{n - 1}$$
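
The grouped variance formulas can be sketched as below, with the standard deviation recovered as the square root. The function name, the `sample` flag, and the frequency table are hypothetical.

```python
import math

def grouped_variance(class_marks, freqs, sample=True):
    n = sum(freqs)
    mean = sum(f * m for f, m in zip(freqs, class_marks)) / n
    # Frequency-weighted sum of squared deviations of the class marks.
    squared = sum(f * (m - mean) ** 2 for f, m in zip(freqs, class_marks))
    return squared / (n - 1) if sample else squared / n

# Hypothetical grouped data (class marks and frequencies).
class_marks = [12, 17, 22, 27, 32]
freqs = [2, 5, 8, 4, 1]

pop_var = grouped_variance(class_marks, freqs, sample=False)
samp_var = grouped_variance(class_marks, freqs, sample=True)
print(pop_var)                        # 25.6875
print(round(samp_var, 4))             # 27.0395
print(round(math.sqrt(samp_var), 4))  # sample standard deviation
```

The weighted sum of squared deviations here is 513.75, which is divided by 20 for the population variance and by 19 for the sample variance.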

3.5 Percentile Deviation, Decile Deviation, Quartile Deviation


Percentiles are values that divide a set of observations into 100 equal
parts. These values, denoted by P1, P2, …, P99, are such that 1% of the data
falls below P1, 2% falls below P2, … and 99% falls below P99.

Deciles are values that divide a set of observations into 10 equal parts.
These values, denoted by D1, D2, …, D9, are such that 10% of the data falls
below D1, 20% falls below D2, … and 90% falls below D9.

Quartiles are values that divide a set of observations into 4 equal parts.
These values, denoted by Q1, Q2, and Q3, are such that 25% of the data falls
below Q1, 50% falls below Q2 and 75% falls below Q3.

Percentile deviation: PD = P90 - P10
Decile deviation: DD = D9 - D1
Interquartile deviation: QD = Q3 - Q1
Semi-interquartile deviation: $$QD = \frac{Q_3 - Q_1}{2}$$

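For ungrouped data, the position of each measure in the ordered array is:

$$P_{90} = \frac{90n}{100}, \quad P_{10} = \frac{10n}{100}$$
$$D_9 = \frac{9n}{10}, \quad D_1 = \frac{n}{10}$$
$$Q_3 = \frac{3n}{4}, \quad Q_1 = \frac{n}{4}, \quad Q_2 = \frac{2n}{4}$$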

For grouped data


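$$P_{90} = L_{P_{90}} + \frac{i}{f_{P_{90}}} \left( \frac{90n}{100} - F_{P_{90}} \right)$$
Here, by analogy with the grouped median formula, L, f and F with the
subscript P90 are the lower boundary of the class containing P90, the
frequency of that class, and the cumulative frequency of the class preceding
it; i is the class width.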
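
The notes give the grouped formula only for P90. The sketch below assumes the same pattern holds for any percentile p (with 90 replaced by p), and that deciles and quartiles are read off as the matching percentiles (D1 = P10, Q1 = P25, and so on). The function name and the frequency table are hypothetical.

```python
def grouped_percentile(p, lower_boundaries, freqs, width):
    """P_p = L + (i / f) * (p*n/100 - F), generalised from the P90 formula."""
    n = sum(freqs)
    target = p * n / 100
    cumulative = 0  # F: cumulative frequency before the current class
    for L, f in zip(lower_boundaries, freqs):
        if f > 0 and cumulative + f >= target:
            return L + (width / f) * (target - cumulative)
        cumulative += f
    raise ValueError("p must be between 0 and 100")

# Hypothetical frequency table with class width 5.
lower_boundaries = [9.5, 14.5, 19.5, 24.5, 29.5]
freqs = [2, 5, 8, 4, 1]

p90 = grouped_percentile(90, lower_boundaries, freqs, 5)
p10 = grouped_percentile(10, lower_boundaries, freqs, 5)
q3 = grouped_percentile(75, lower_boundaries, freqs, 5)
q1 = grouped_percentile(25, lower_boundaries, freqs, 5)

print(p90 - p10)        # percentile deviation: 13.75
print(q3 - q1)          # interquartile deviation: 7.0
print((q3 - q1) / 2)    # semi-interquartile deviation: 3.5
```

With p = 50 the function reduces to the grouped median formula from Section 2.2.2.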

3.6 Coefficient of Variation, CV


To compare the variation of two sets of interval-scale data, the
coefficient of variation can be used. It is the ratio of the standard
deviation to the mean. In equation form:
$$CV = \frac{s}{\bar{x}}$$
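
A minimal sketch of the coefficient of variation, applied to the two exam groups from the start of this section. The function name is hypothetical, and using the sample standard deviation (`statistics.stdev`) is an assumption based on the symbol s in the formula.

```python
import statistics

def coefficient_of_variation(scores):
    # CV = s / x-bar, with s taken as the sample standard deviation (assumed).
    return statistics.stdev(scores) / statistics.mean(scores)

group_1 = [90, 82, 85, 85, 83]
group_2 = [62, 95, 85, 85, 98]

cv_1 = coefficient_of_variation(group_1)
cv_2 = coefficient_of_variation(group_2)
print(round(cv_1, 4), round(cv_2, 4))
print(cv_2 > cv_1)  # True: Group 2 varies more relative to its mean
```

Because the CV is a ratio, it has no units, which is what makes it usable for comparing data sets measured on different scales.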

3.7 Shapes of Distribution and Skewness


The three most important shapes of frequency distribution are positively
skewed, symmetrical, and negatively skewed.
• Negatively skewed distribution - the majority of the data values
fall to the right of the mean and cluster at the upper end of the
distribution.
• Positively skewed distribution - the majority of the data values
fall to the left of the mean and cluster at the lower end of the
distribution.
• Symmetrical distribution - the data values are evenly distributed
on both sides of the mean. When the distribution is also unimodal,
the mean, median and mode are the same and lie at the center of
the distribution.
Coefficient of Skewness
The skewness of a distribution can be measured with the Pearson
coefficient of skewness, given by this formula:
$$SK = \frac{3(\bar{x} - Md)}{s}$$
where:
x̄ = mean
Md = median
s = standard deviation
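
The Pearson coefficient can be sketched in a few lines. The function name and the three small data sets are hypothetical, and the sample standard deviation is assumed for s.

```python
import statistics

def pearson_skewness(scores):
    mean = statistics.mean(scores)
    median = statistics.median(scores)
    s = statistics.stdev(scores)   # sample standard deviation (assumed)
    return 3 * (mean - median) / s

right_tailed = [2, 3, 3, 4, 5, 13]     # mean 5 is above the median 3.5
symmetric = [1, 2, 3, 4, 5]            # mean equals median
left_tailed = [1, 9, 10, 11, 11, 12]   # mean 9 is below the median 10.5

print(pearson_skewness(right_tailed) > 0)   # True: positively skewed
print(pearson_skewness(symmetric) == 0)     # True: symmetrical
print(pearson_skewness(left_tailed) < 0)    # True: negatively skewed
```

The sign of SK follows the sign of (mean - median): a few large values pull the mean above the median, and a few small values pull it below.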

3.8 Measure of Kurtosis


Even if the curves of two distributions have the same coefficient of
skewness, they may still differ in the sharpness of their peaks. This property
is described by the measure of kurtosis. A symmetrical curve can be normal
(mesokurtic), more peaked (leptokurtic), or flat-topped (platykurtic).
The formulas to compute the measure of kurtosis are:
For ungrouped data:
$$K = \frac{\sum (x - \bar{x})^4}{n s^4}$$
For grouped data:
$$K = \frac{\sum f (M - \bar{x})^4}{n s^4}$$
A distribution is said to be
Mesokurtic if K=3
Leptokurtic if K>3
Platykurtic if K<3
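
A sketch of the ungrouped kurtosis formula and the classification rule, applied to the Group 1 exam scores. The function names are hypothetical. The notes do not say which standard deviation s denotes; the sample standard deviation is assumed here, and some texts use the population value instead, which changes K.

```python
import math
import statistics

def kurtosis(scores):
    n = len(scores)
    x_bar = statistics.mean(scores)
    s = statistics.stdev(scores)   # sample standard deviation (assumed)
    # K = sum of fourth-power deviations divided by n * s^4.
    return sum((x - x_bar) ** 4 for x in scores) / (n * s ** 4)

def classify(k):
    if math.isclose(k, 3.0):
        return "mesokurtic"
    return "leptokurtic" if k > 3 else "platykurtic"

group_1 = [90, 82, 85, 85, 83]
k = kurtosis(group_1)
print(round(k, 2), classify(k))  # 1.6 platykurtic
```

For Group 1 the fourth-power deviations sum to 722 and s^4 = 9.5^2 = 90.25, so K = 722 / (5 × 90.25) = 1.6.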
