DR. Waqar Al - Kubaisy


10/07/2020
Biostatistics
Descriptive Biostatistics
L 2
Prof. Dr. WAQAR AL-KUBAISY
Measures of Dispersion
(Measures of Variation)
(Measures of Scattering)
(Measures of Spread)

[Figure: two targets, Shooter A and Shooter B]
Both shooters are hitting around the “centre”, but shooter B is more “accurate”.


Measures of Dispersion
1- Range
2- Interquartile range
3- Variance
4- Standard deviation
5- Coefficient of variation

Measures of spread
Measures of spread are very useful.
There are three main measures in common use.
Once again, the type of data influences the choice of an appropriate measure: the choice of the most appropriate measure of dispersion depends crucially on the type of data involved.
The Range
• It is the simplest and most obvious measure of dispersion.
• It is the distance from the smallest to the largest value.
• It is obtained by subtracting the lowest value from the highest value in a set of data.

Pulse rate: 70 76 74 78 72 74 76
Range = 78 - 70 = 8
• The range is best written as an interval (from, to), here 70 to 78, rather than as a single-valued difference, which is much less informative.
• The range is not affected by skewness, but it is sensitive to the addition or removal of an outlier value.
70 72 74 76 76 78 78 (range 70 to 78)


66 70 74 90 100 120 124 (range 66 to 124)
66 70 74 90 100 120 (124 removed; range 66 to 120)
66 70 74 90 100 120 194 (range 66 to 194)
• Its disadvantages are that
  • it is based on only two observations (the lowest and the highest value), so it gives no idea about the others,
  • it does not take the other values in the data into consideration,
  • it is sensitive to an outlier value.
• It is not a very useful measure of variation, because it does not use the other observations.
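The sensitivity of the range to a single extreme value can be checked with a few lines of Python. This is only a sketch: the helper name `value_range` is hypothetical, and the second and third data sets assume that the largest value (124) was first removed and then replaced by 194, as the ranges quoted above suggest.

```python
def value_range(values):
    """Return (lowest, highest, highest - lowest) for a set of data."""
    lowest, highest = min(values), max(values)
    return lowest, highest, highest - lowest

# Pulse rates from the slide
pulse = [70, 76, 74, 78, 72, 74, 76]
print(value_range(pulse))

# Assumed reading of the three data sets: original, largest value removed,
# largest value replaced by an outlier
base = [66, 70, 74, 90, 100, 120, 124]
without_top = base[:-1]
with_outlier = base[:-1] + [194]

for data in (base, without_top, with_outlier):
    low, high, width = value_range(data)
    print(f"range {low} to {high} (difference {width})")
```

Only the last observation differs between the three sets, yet the reported range changes each time, which is the drawback described above.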

Therefore, we need to measure the variation of one observation from the other.


Standard deviation
Sensitive to an outlier value

Interquartile range (IQR)

75, 75, 75, 75, 75, 75    Mean = ????
75, 70, 75, 80, 85    Mean = ????
60, 65, 55, 70, 75, 75, 70, 80    Mean = ????
$\bar{X} = \frac{\sum X}{N}$
[Figure: observations X1 to X7 scattered around the mean $\bar{X}$, each at a distance d from it]
Standard deviation
• It is the mean (average) distance of all data values from the overall mean of all values.
• The smaller the mean distance is, the narrower the spread of values must be, and vice versa.
• This is known as the standard deviation.
Marks of 6 students: 6, 2, 4, 1, 3, 2

Student No.   Score (X)   $X - \bar{X}$
1st           6           6 - 3 = +3
2nd           2           2 - 3 = -1
3rd           4           4 - 3 = +1
4th           1           1 - 3 = -2
5th           3           3 - 3 = 0
6th           2           2 - 3 = -1

$\sum X = 18$, $\bar{X} = 18/6 = 3$, $\sum (X - \bar{X}) =$ zero

[Figure: the six scores X1 to X6 plotted around the mean $\bar{X}$, with deviations d1 to d6]
Student No.   Score (X)   $X - \bar{X}$   $(X - \bar{X})^2$
1st           6           6 - 3 = +3      9
2nd           2           2 - 3 = -1      1
3rd           4           4 - 3 = +1      1
4th           1           1 - 3 = -2      4
5th           3           3 - 3 = 0       0
6th           2           2 - 3 = -1      1

$\sum X = 18$, $\bar{X} = 3$, $\sum (X - \bar{X}) =$ zero, $\sum (X - \bar{X})^2 = 16$

$S^2 = \frac{\sum (X - \bar{X})^2}{N - 1} = \frac{16}{5} = 3.2$ score²
Variance ($S^2$)
It is the average of the squared deviations of the observations from the mean in a set of data.

$S^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$
The disadvantage (drawback) of the variance is that its unit is squared (kg², bacteria², ...).
So we restore the squared unit to its original form by taking the square root of this value ($S^2$); this is known as the S.D.
Standard Deviation (± S.D.)
It is the square root of the variance.

$S.D. = \sqrt{S^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$

± S.D. (S) is the square root of the average squared deviation of the observations from the mean in a set of data.

One advantage of the S.D. is that, unlike the IQR, it uses all the information in the data.
Steps in calculating S.D.
1- Determine the mean, $\bar{X}$.
2- Determine the deviation of each value from the mean, $(X - \bar{X})$.
3- Square each deviation from the mean, $(X - \bar{X})^2$.
4- Sum these squared deviations (sum of squares), $\sum (X - \bar{X})^2$.
5- Divide this sum of squares by N - 1: $\frac{\sum (X - \bar{X})^2}{N - 1}$.
6- Take the square root of the result: $S.D. = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$.
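The six steps can be sketched in Python, one line per step, using the marks of the six students from the earlier slide. The function name `standard_deviation` and the choice to return the intermediate results are illustrative assumptions, not part of the lecture.

```python
import math

def standard_deviation(values):
    """Follow the six steps: mean, deviations, squares, sum, divide by N-1, root."""
    n = len(values)
    mean = sum(values) / n                         # step 1
    deviations = [x - mean for x in values]        # step 2
    squared = [d ** 2 for d in deviations]         # step 3
    sum_of_squares = sum(squared)                  # step 4
    variance = sum_of_squares / (n - 1)            # step 5
    sd = math.sqrt(variance)                       # step 6
    return mean, sum_of_squares, variance, sd

marks = [6, 2, 4, 1, 3, 2]
mean, sum_of_squares, variance, sd = standard_deviation(marks)
print(f"mean={mean}, sum of squares={sum_of_squares}, "
      f"variance={variance:.1f}, S.D.={sd:.3f}")
```

For these marks the sum of squares is 16 and the variance is 16/5 = 3.2 score², which matches the worked table.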
Short Cut Method

$S^2 = \frac{\sum d^2}{N - 1}$, where $\sum d^2 = \sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{N}$, so

$S^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}$

Student No.   Score (X)   $X^2$
1st           6           36
2nd           2           4
3rd           4           16
4th           1           1
5th           3           9
6th           2           4
Total         $\sum X = 18$   $\sum X^2 = 70$

$S^2 = \frac{70 - \frac{18^2}{6}}{5} = \frac{70 - 54}{5} = \frac{16}{5} = 3.2$ score²

$S.D. = \sqrt{3.2} = 1.789$ score
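Example
Short Cut Method (data with frequencies)

Score (X)   Freq. (No. of students) F   XF            $X^2 F$
6           2                           6 × 2 = 12    6² × 2 = 72
2           4                           2 × 4 = 8     2² × 4 = 16
4           3                           4 × 3 = 12    4² × 3 = 48
1           5                           1 × 5 = 5     1² × 5 = 5
3           2                           3 × 2 = 6     3² × 2 = 18
2           6                           2 × 6 = 12    2² × 6 = 24
Total       22                          55            183

$S^2 = \frac{\sum (X - \bar{X})^2}{N - 1} = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}$

Here N = 22 is the total frequency, $\sum X = 55$ is the total of the XF column and $\sum X^2 = 183$ is the total of the $X^2 F$ column:

$S^2 = \frac{183 - \frac{55^2}{22}}{22 - 1} = \frac{183 - 137.5}{21} = \frac{45.5}{21} \approx 2.167$ score²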
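As a cross-check of the frequency-table arithmetic, the sketch below applies the short cut formula with each score weighted by its frequency. The function name `grouped_variance` is hypothetical; the scores and frequencies are those of the table, including the score 2 listed on two separate rows.

```python
def grouped_variance(scores, freqs):
    """Short cut variance for scores that come with frequencies."""
    n = sum(freqs)                                            # N = total frequency
    sum_xf = sum(x * f for x, f in zip(scores, freqs))        # sum of X*F
    sum_x2f = sum(x * x * f for x, f in zip(scores, freqs))   # sum of X^2*F
    variance = (sum_x2f - sum_xf ** 2 / n) / (n - 1)
    return n, sum_xf, sum_x2f, variance

scores = [6, 2, 4, 1, 3, 2]
freqs = [2, 4, 3, 5, 2, 6]
n, sum_xf, sum_x2f, variance = grouped_variance(scores, freqs)
print(n, sum_xf, sum_x2f, round(variance, 3))
```

The column totals come out as 22, 55 and 183, and the variance is 45.5/21, about 2.167 score².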
Short Cut Method for S.D.
1- Square each individual observed value, $X^2$.
2- Sum these squared values, $\sum X^2$.
3- Sum all the observed values, $X_1 + X_2 + X_3 + \dots = \sum X$.
4- Square this sum, $(\sum X)^2$.
5- Divide this squared sum by N, $\frac{(\sum X)^2}{N}$.
6- Subtract $\frac{(\sum X)^2}{N}$ from $\sum X^2$: $\sum X^2 - \frac{(\sum X)^2}{N}$.
7- Divide this result by N - 1: $S^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}$.
8- Take the square root of this last result: $S.D. = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}}$.
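A short Python sketch of the eight steps, applied to the six marks used earlier, shows that the short cut gives the same variance as the definition based on deviations. The function name `shortcut_sd` is an illustrative assumption; `statistics.variance` from the Python standard library is used only as an independent check, since it also divides by N - 1.

```python
import math
import statistics

def shortcut_sd(values):
    """Short cut method: works from sum(X) and sum(X^2), no deviations needed."""
    n = len(values)
    sum_x2 = sum(x * x for x in values)        # steps 1-2
    sum_x = sum(values)                        # step 3
    correction = sum_x ** 2 / n                # steps 4-5
    sum_of_squares = sum_x2 - correction       # step 6
    variance = sum_of_squares / (n - 1)        # step 7
    return variance, math.sqrt(variance)       # step 8

marks = [6, 2, 4, 1, 3, 2]
variance, sd = shortcut_sd(marks)
print(round(variance, 3), round(sd, 3))

# Independent check against the deviation-based sample variance
print(math.isclose(variance, statistics.variance(marks)))
```

The short cut is convenient for hand calculation because it avoids computing each deviation from the mean.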
Disadvantage (Limitation or Drawback) of S.D.
It depends on the unit of measurement, so we cannot use it to compare two or more sets of data measured in different units. To overcome this, we use the coefficient of variation.

Coefficient of Variation (C.V.)
It expresses the variation as a percentage of the mean of the data.

$C.V. = \frac{S.D.}{\bar{X}} \times 100$

C.V. is used to compare two or more sets of data
- with different units of measurement,
- with a large difference between their means.
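The following sketch computes the C.V. with Python's standard `statistics` module. The function name `coefficient_of_variation` and the height values are hypothetical and serve only to show that the C.V. does not change when the unit of measurement changes, whereas the S.D. does.

```python
import math
import statistics

def coefficient_of_variation(values):
    """C.V. = S.D. / mean x 100 (S.D. computed with N - 1)."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Marks of the six students used earlier in the lecture
marks = [6, 2, 4, 1, 3, 2]
print(round(coefficient_of_variation(marks), 1))

# Hypothetical heights, the same people measured in two different units
heights_cm = [150, 160, 170, 180]
heights_m = [1.5, 1.6, 1.7, 1.8]
print(statistics.stdev(heights_cm), statistics.stdev(heights_m))  # S.D. differs
print(math.isclose(coefficient_of_variation(heights_cm),
                   coefficient_of_variation(heights_m)))          # C.V. agrees
```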
Interpreting Standard Deviation
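[Figure: bell-shaped curve with the horizontal axis marked at $\bar{x} - 3s$, $\bar{x} - 2s$, $\bar{x} - 1s$, $\bar{x} + 1s$, $\bar{x} + 2s$, $\bar{x} + 3s$; 34% on each side of the mean within 1s (68% in total), a further 13.5% on each side out to 2s (95% in total), 2.5% in each tail beyond 2s, and 99.7% within 3s]
For bell-shaped distributions, the following statements hold:
• Approximately 68% of the data fall between $\bar{x} - 1s$ and $\bar{x} + 1s$
• Approximately 95% of the data fall between $\bar{x} - 2s$ and $\bar{x} + 2s$
• Approximately 99.7% of the data fall between $\bar{x} - 3s$ and $\bar{x} + 3s$
For NORMAL distributions, the word ‘approximately’ may be removed from the above statements.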
Example: Suppose the Hb levels of 150 women have a roughly bell-shaped distribution with a mean of 12 g/dl and a standard deviation of 0.10 g/dl.

a) Give the interval of Hb levels that approximately 68% of the women will have:
12 - 0.1 to 12 + 0.1 = 11.9 to 12.1 g/dl.
b) Give the interval of Hb levels that approximately 95% of the women will have:
12 - 2(0.1) to 12 + 2(0.1) = 11.8 to 12.2 g/dl.
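The Hb example can be reproduced with a small Python sketch that builds the intervals mean ± 1, 2 and 3 standard deviations. The function name `empirical_intervals` is hypothetical, and the percentages attached to each interval are the approximate ones quoted for bell-shaped distributions.

```python
def empirical_intervals(mean, sd):
    """Intervals mean +/- k*sd for k = 1, 2, 3, keyed by approximate coverage (%)."""
    coverage = {1: 68, 2: 95, 3: 99.7}
    return {pct: (mean - k * sd, mean + k * sd) for k, pct in coverage.items()}

# Hb example: mean 12 g/dl, standard deviation 0.10 g/dl
intervals = empirical_intervals(12, 0.10)
for pct, (low, high) in intervals.items():
    print(f"about {pct}% of women: {low:.1f} to {high:.1f} g/dl")
```

With 150 women, about 68% (roughly 102 of them) would be expected to have an Hb level inside the first interval.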
Q1
Thirty (30) pregnant women attending Al-Karak antenatal clinic on 23 February 2018 showed the following gain in weight:

Weight gain (kg)   No. of women
3-5                3
6-8                5
9-11               10
12-14              8
15-17              4

Present this data graphically.
Q2
Thirty (30) pregnant women attending Al-Karak antenatal clinic on 23 February 2018 showed the following gain in weight:

Weight gain (kg)   No. of women
4                  3
7                  5
10                 10
13                 8
16                 4

1- Compute the measures of central tendency.
2- Compute the measures of dispersion.
Interquartile range (IQR)
Calculation of percentile value
The pth percentile is the value in the p/100 × (n + 1)th position.
For example, take the 20th percentile of the birth weights (g) of 30 infants, which we put in ascending order:
2860 2994 3193 3266 3287 3303 3388 3399 3400 3421
3447 3508 3541 3594 3613 3615 3650 3666 3710 3798
3800 3886 3896 4006 4010 4090 4094 4200 4206 4490
With the BW values, the 20th percentile is the value in the 20/100 × (N + 1)th position:
20/100 × (30 + 1) = 0.2 × 31 = 6.2, that is, the 6.2th observation.
The 6th value is 3303 g and the 7th value is 3388 g, a difference of 85 g.
The 20th percentile is 3303 g + 0.2 of 85 g, which is
3303 g + 0.2 × 85 g = 3303 g + 17 g = 3320 g
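The position rule and the interpolation between neighbouring values can be sketched as a small Python function. The name `percentile` and the handling of positions that fall before the first or after the last observation (the end value is returned) are assumptions made for this sketch; the lecture does not cover those cases.

```python
def percentile(sorted_values, p):
    """pth percentile: the value in the p/100 x (n + 1)th position (1-based)."""
    n = len(sorted_values)
    position = p * (n + 1) / 100
    whole = int(position)            # e.g. 6 for position 6.2
    fraction = position - whole      # e.g. 0.2 for position 6.2
    if whole < 1:                    # assumption: clamp to the smallest value
        return sorted_values[0]
    if whole >= n:                   # assumption: clamp to the largest value
        return sorted_values[-1]
    lower = sorted_values[whole - 1]
    upper = sorted_values[whole]
    return lower + fraction * (upper - lower)

birth_weights = [
    2860, 2994, 3193, 3266, 3287, 3303, 3388, 3399, 3400, 3421,
    3447, 3508, 3541, 3594, 3613, 3615, 3650, 3666, 3710, 3798,
    3800, 3886, 3896, 4006, 4010, 4090, 4094, 4200, 4206, 4490,
]
p20 = percentile(birth_weights, 20)
print(round(p20, 2))
```

For the 20th percentile the position is 6.2, so the function starts from the 6th value (3303 g) and adds 0.2 of the 85 g gap to the 7th value.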
Similarly, we could calculate the deciles, which subdivide the data values into 10 (not 100) equal divisions, and the quintiles, which subdivide the values into five equal-sized groups.
Collectively, we call percentiles, deciles and quintiles n-tiles.
Interquartile range (IQR)
One solution to the problem of the sensitivity of the range to extreme values (outliers) is to chop a quarter (25 percent) of the values off both ends of the distribution (which removes any troublesome outliers) and then measure the range of the remaining values.
This distance is called the interquartile range, or IQR.
Calculation of IQR
To calculate the IQR we need to determine two values:
• the first quartile (Q1): the value which cuts off the bottom 25 percent of values;
• the third quartile (Q3): the value which cuts off the top 25 percent of values.

The interquartile range is then written as (Q1 to Q3).

With the BW data (the birth weights of 30 infants, which we put in ascending order):

Q1 position: 25/100 × (N + 1) = 0.25 × 31 = 7.75, so Q1 is the 7.75th value:
3388 + 0.75 × (3399 - 3388) = 3388 + 0.75 × 11 = 3388 + 8.25 = 3396.25 g
Q3 position: 75/100 × (N + 1) = 0.75 × 31 = 23.25, so Q3 is the 23.25th value:
3896 + 0.25 × (4006 - 3896) = 3896 + 0.25 × 110 = 3896 + 27.5 = 3923.50 g

Therefore IQR = 3396.25 to 3923.50 g.

This result tells us that the middle 50 percent of infants weighed between 3396.25 and 3923.50 g.
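The quartiles can also be checked with Python's standard library. In this sketch, `statistics.quantiles` is called with `method="exclusive"`, which places the cut points at positions based on (n + 1) and interpolates between neighbours, so it should agree with the hand calculation; the variable names are illustrative.

```python
import statistics

birth_weights = [
    2860, 2994, 3193, 3266, 3287, 3303, 3388, 3399, 3400, 3421,
    3447, 3508, 3541, 3594, 3613, 3615, 3650, 3666, 3710, 3798,
    3800, 3886, 3896, 4006, 4010, 4090, 4094, 4200, 4206, 4490,
]

# n=4 asks for the three cut points that split the data into quarters
q1, median, q3 = statistics.quantiles(birth_weights, n=4, method="exclusive")
print(f"IQR = {q1} to {q3} g (median {median} g)")
```

Other software may use a different default rule for percentile positions, so small differences from the hand-calculated quartiles are possible when a different method is selected.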
The interquartile range
• indicates the spread of the middle 50% of the distribution,
• together with the median, is a useful adjunct (accessory) to the range,
• is less sensitive to the size of the sample, provided that this is not too small,
• is not affected by either outliers or skewness.
But
• it does not use all of the information in the data set, since it ignores the bottom and top quarters of values.
Summary: measures of spread

Metric (quantitative) data: S.D. (±), or the interquartile range if the distribution is skewed and/or the median has already been selected as the preferred measure of location.
Ordinal data: interquartile range or range. The S.D. is not appropriate because of the non-numeric nature of ordinal data.

Don't mix and match measures: the standard deviation goes with the mean and the IQR with the median.
Choosing an appropriate measure of spread
Don't mix and match measures: the standard deviation goes with the mean and the IQR with the median. Look at the following table.

Type of variable   Range   Interquartile range   Standard deviation
Nominal            No      No                    No
Ordinal            Yes     Yes                   No
Metric             Yes     Yes, if skewed        Yes
Thank you for your attention

Stay home

[Figure: population and sample, linked by probability; NDC = Normal Distribution Curve]
