Lecture 2

BIOSTATISTICS & RESEARCH METHODOLOGY

DR. SYBIL ROSE


Methods of summarizing and displaying data

SUMMARIZING AND DISPLAYING DATA

Frequency Tables
• Histograms, polygons and ogive
• Stem and leaf plots
• Box and whisker plot
• Scatter plot
• Proportions or percentages
• Bar charts
• Pie charts

Measures of Central Tendency
• Mean
• Median
• Mode

Measures of Dispersion
• Variance
• Standard deviation
• Standard error
• Range
• Quartiles
• Coefficient of Variation (CV)
Presenting qualitative data
Charts and tables used to present qualitative data

1. Pie charts
2. Bar Charts (Simple and Clustered bar Charts)
3. Relative frequency (Percentage) Table

Pie charts and bar charts (items 1 and 2) are used for the presentation of qualitative data.
Pie Charts: Pie Charts are typically used to present the
relative frequency of qualitative data.
In most cases the data are nominal, but ordinal data can also
be displayed in a pie chart.
The complete circle represents the total number of measurements.
The circle is partitioned into slices, one for each category.
The size of a slice is proportional to the relative frequency of that category.
Determine the angle of each slice by multiplying the relative frequency by 360 degrees (recall that a circle spans 360°).

STEPS TO CREATE A PIE CHART
1. Construct a frequency table.
2. Calculate the relative frequency (%) of each category.
3. Convert the percentages into degrees, where:
$$\text{Degrees} = \frac{\text{Percentage}}{100} \times 360^\circ$$
4. Draw a circle and divide it accordingly.
For a single variable:
E.g., in a class of 40 students, 15 are boys and 25 are girls (see the pie chart).

FREQUENCY
Frequency: the number of times that something occurs.

Relative frequency: the frequency divided by the sum of all frequencies:
$$\text{Relative frequency} = \frac{\text{Frequency}}{\text{Sum of all frequencies}}$$

| Gender | Frequency | Relative frequency (%) |
|---|---|---|
| Boys | 15 | 15/40 × 100 = 37.5% |
| Girls | 25 | 25/40 × 100 = 62.5% |
| Total | 40 | |

PIE CHART
[Figure: Pie chart of the frequency distribution of gender: Boys 37.5% (shown as 37%), Girls 62.5% (shown as 63%)]

ANGLE COMPUTATIONS
Since a circle has 360 degrees, the degree measure of the sector for each category is:

Boys: 0.375 × 360° = 135°
Girls: 0.625 × 360° = 225°

Total = 360°
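Steps 2 and 3 of the pie chart procedure can be sketched in a few lines of Python for the gender example. This is a minimal illustration, not a charting tool; `pie_chart_angles` is a hypothetical helper name, and the counts are the 15 boys and 25 girls from the slide.

```python
def pie_chart_angles(counts):
    """Return {category: (relative frequency in %, slice angle in degrees)}."""
    total = sum(counts.values())               # sum of all frequencies
    result = {}
    for category, f in counts.items():
        rel = f / total                        # relative frequency as a proportion
        result[category] = (rel * 100, rel * 360)  # percentage, and proportion x 360 degrees
    return result

gender = {"Boys": 15, "Girls": 25}
for category, (pct, angle) in pie_chart_angles(gender).items():
    print(f"{category}: {pct:.1f}% -> {angle:.0f} degrees")
# Boys: 37.5% -> 135 degrees
# Girls: 62.5% -> 225 degrees
```

Working with the proportion (0.375) rather than the percentage (37.5) avoids the divide-by-100 step when converting to degrees.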
BAR CHART (BAR GRAPH)

• Place the categories on the horizontal axis.
• Place the frequency (or relative frequency) on the vertical axis.
• Construct vertical bars of equal width, one for each category. The height of each bar is proportional to the frequency (or relative frequency) of its category.

SIMPLE BAR CHART

[Figure: Frequency distribution for gender (bar chart); vertical axis: relative frequency (%), 0 to 70; bars: Boys 37.5, Girls 62.5]
TWO VARIABLES (CROSS TABULATION)
Cross tabulations (cross tabs) are often used to present the counts of two qualitative variables. Suppose the variables of interest are:
• Gender
• Wearing glasses

They are presented in this table:

| Gender | Wearing glasses: Yes | Wearing glasses: No | Total |
|---|---|---|---|
| Boys | 5 | 10 | 15 |
| Girls | 10 | 15 | 25 |
| Total | 15 | 25 | 40 |



TABLE SHOWING THE PERCENTAGE OF GENDER AND WEARING GLASSES

| Gender | Wearing glasses: Yes | Wearing glasses: No | Total |
|---|---|---|---|
| Boys | 33.33% | 66.67% | 100% |
| Girls | 40% | 60% | 100% |
| Total | 37.5% | 62.5% | 100% |

[Figure: Clustered bar chart of wearing glasses (Yes/No) for boys and girls, in percent; vertical axis 0 to 80]

CROSSTABS AND CLUSTERED BAR CHART

Expressed as percentages:
• 33.33% of the boys wear glasses
• 40% of the girls wear glasses
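The percentages in the crosstab are row percentages: each count is divided by its row total, so every row sums to 100%. A minimal Python sketch of that calculation for the glasses example follows; `row_percentages` is a hypothetical helper, and the nested-dictionary layout is just one convenient way to hold a two-way table.

```python
def row_percentages(table):
    """Convert a cross tabulation {row: {column: count}} into row percentages."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())        # e.g. 15 boys, 25 girls
        result[row] = {col: 100 * n / row_total for col, n in cells.items()}
    return result

glasses = {
    "Boys": {"Yes": 5, "No": 10},
    "Girls": {"Yes": 10, "No": 15},
}
# Total row: add each column over the two gender rows.
glasses["Total"] = {col: sum(glasses[g][col] for g in ("Boys", "Girls")) for col in ("Yes", "No")}

for row, pcts in row_percentages(glasses).items():
    print(row, {col: round(p, 2) for col, p in pcts.items()})
# Boys {'Yes': 33.33, 'No': 66.67}
# Girls {'Yes': 40.0, 'No': 60.0}
# Total {'Yes': 37.5, 'No': 62.5}
```

Dividing by the column totals instead would answer a different question (for example, what share of glasses wearers are girls), so it matters which margin is used.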


CALCULATE THE PERCENTAGES

| Smoking | Lung cancer: Yes | Lung cancer: No | Total |
|---|---|---|---|
| Yes | 70 | 100 | |
| No | 3 | 70 | |
| Total | | | |
FREQUENCY AND FREQUENCY DISTRIBUTION TABLES
Frequency distribution: a table listing all observed values of the variable being studied and how many times each value is observed.

The number of times that something occurs is known as its frequency.

The notation $f_x$ is used to denote the frequency, or number of times the value x occurs.

The relative frequency is the frequency divided by the sample size n, that is, $f_x / n$.
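TABLE: OBTAINING FREQUENCY, CUMULATIVE FREQUENCY AND PERCENTAGE

| Age | Frequency | Cumulative frequency | Relative frequency (%) | Cumulative relative frequency (%) |
|---|---|---|---|---|
| 13 | 1 | 1 | 3 | 3 |
| 14 | 7 | 8 | 23 | 26 |
| 15 | 5 | 13 | 17 | 43 |
| 16 | 6 | 19 | 20 | 63 |
| 17 | 6 | 25 | 20 | 83 |
| 18 | 2 | 27 | 7 | 90 |
| 19 | 3 | 30 | 10 | 100 |
| Total | 30 | | 100 | |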
COMPUTING RELATIVE FREQUENCY

Frequency: the number of times that something occurs.

Relative frequency: the frequency divided by the sum of all frequencies:
$$\text{Relative frequency} = \frac{\text{Frequency}}{\text{Sum of all frequencies}}$$
E.g., for ages 13 and 14: 1/30 × 100 ≈ 3% and 7/30 × 100 ≈ 23%.
Cumulative frequency: the frequencies are added up successively (a running total), e.g., 1 + 7 = 8 for age 14.
Cumulative relative frequency: the sum of all relative frequencies below and including each category.
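The whole age table can be generated from the frequency column alone. This minimal Python sketch uses `itertools.accumulate` for the running totals. One difference from the slide: the table rounds each percentage to a whole number and then adds the rounded values (3 + 23 = 26), whereas the code accumulates exact values, so age 14 comes out as 8/30 × 100 = 26.67%.

```python
from itertools import accumulate

ages = [13, 14, 15, 16, 17, 18, 19]
freq = [1, 7, 5, 6, 6, 2, 3]

n = sum(freq)                                     # sample size (30)
cum_freq = list(accumulate(freq))                 # running total of the frequencies
rel_pct = [100 * f / n for f in freq]             # relative frequency in %
cum_rel_pct = [100 * cf / n for cf in cum_freq]   # cumulative relative frequency in %

for row in zip(ages, freq, cum_freq, rel_pct, cum_rel_pct):
    print("{:>3} {:>3} {:>3} {:6.2f} {:7.2f}".format(*row))
```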
STEPS IN CONSTRUCTING THE FREQUENCY DISTRIBUTION TABLE FOR QUANTITATIVE DATA:

1. The data are first divided into a number of intervals.
2. The number of data points falling within each interval is then presented as the frequency or count for that interval.
3. Tally the data in the tally column and obtain the class frequencies.
Smoothing class intervals to obtain class boundaries (Δ)

$$\Delta = \frac{\text{Lower limit of second class} - \text{Upper limit of first class}}{2}$$

Subtract Δ from the lower class limits to get the lower class boundaries.

Add Δ to the upper class limits to get the upper class boundaries.
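As an illustration, here is a minimal Python sketch applying the Δ rule to the seven BMI classes used later in this lecture (18.0–20.9, 21.0–23.9, …, 36.0–38.9). It assumes the limits are recorded to one decimal place, so Δ = (21.0 − 20.9)/2 = 0.05. `class_boundaries` is a hypothetical helper; the `round` calls only remove floating-point noise.

```python
def class_boundaries(limits):
    """Given class limits [(lower, upper), ...], return delta and the class boundaries,
    using delta = (lower limit of 2nd class - upper limit of 1st class) / 2."""
    delta = (limits[1][0] - limits[0][1]) / 2
    bounds = [(round(lo - delta, 2), round(hi + delta, 2)) for lo, hi in limits]
    return delta, bounds

bmi_limits = [(18.0, 20.9), (21.0, 23.9), (24.0, 26.9), (27.0, 29.9),
              (30.0, 32.9), (33.0, 35.9), (36.0, 38.9)]
delta, bounds = class_boundaries(bmi_limits)
print(round(delta, 2))   # 0.05
for b in bounds:
    print(b)             # (17.95, 20.95), (20.95, 23.95), ..., (35.95, 38.95)
```

Because each upper boundary equals the next lower boundary, the smoothed classes leave no gaps, which is what a histogram needs.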


Sturges' rule: $K = 1 + 3.322 \log_{10} n$

$$C = \frac{R}{K}$$

where K = number of class intervals, n = number of observations, C = class width and R (range) = maximum value − minimum value.
The beginning and end of each interval are called its boundaries, and the point midway between the two boundaries of an interval is called the class mark or midpoint.
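A minimal Python sketch of Sturges' rule and the class-width formula, applied to the BMI example (n = 120, smallest value 18.3, largest 38.8). For n = 120 Sturges' rule gives K ≈ 7.9. The lecture uses K = 7, which is also within the usual range of 5 to 10 intervals, and that gives a width of about 2.93, rounded to 3. Function names are illustrative only.

```python
import math

def sturges_k(n):
    """Sturges' rule: K = 1 + 3.322 * log10(n), the suggested number of class intervals."""
    return 1 + 3.322 * math.log10(n)

def class_width(minimum, maximum, k):
    """C = R / K, where R = maximum - minimum is the range."""
    return (maximum - minimum) / k

n, lo, hi = 120, 18.3, 38.8                  # BMI example from the lecture
print(round(sturges_k(n), 2))                # 7.91
print(round(class_width(lo, hi, 7), 2))      # 2.93 with K = 7, rounded up to 3 on the slides
```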
TABLE: BODY MASS INDEX DATA FOR A SAMPLE OF 120 U.S. ADULTS
18.3 21.9 23.0 24.3 25.4 26.6 27.5 28.8 30.9 34.4
19.2 21.9 23.1 24.3 25.6 26.9 27.5 28.8 30.9 34.9
19.8 21.9 23.1 24.5 25.7 27.1 27.6 28.9 31.0 35.0
20.2 22.3 23.3 24.6 25.7 27.3 28.2 29.3 31.1 35.5
20.7 22.3 23.4 24.6 25.8 27.3 28.3 29.5 31.3 35.8
20.8 22.3 23.5 24.7 25.8 27.3 28.3 29.8 31.6 35.9
21.1 22.4 24.0 24.7 25.9 27.3 28.3 30.0 31.6 36.6
21.1 22.5 24.0 24.8 25.9 27.4 28.4 30.1 32.6 37.1
21.1 22.7 24.0 24.8 26.2 27.4 28.6 30.2 32.8 37.5
21.3 22.7 24.1 25.0 26.5 27.4 28.7 30.3 33.2 37.8
21.3 22.8 24.1 25.4 26.5 27.4 28.7 30.8 33.6 38.2
21.5 22.9 24.2 25.4 26.5 27.4 28.8 30.8 34.2 38.8
• Usually, for a data set of 100 to 150 observations, the number of intervals chosen ranges from about 5 to 10.
• In our example, the range of the data is 38.8 − 18.3 = 20.5. Suppose we divide the data set into seven intervals. Then 20.5 ÷ 7 = 2.93, which rounds to 3.0, so the intervals have a width of 3.

These seven intervals are as follows:
o 18.0 – 20.9
o 21.0 – 23.9
o 24.0 – 26.9
o 27.0 – 29.9
o 30.0 – 32.9
o 33.0 – 35.9
o 36.0 – 38.9


FREQUENCY DISTRIBUTION TABLE

| Class interval for BMI levels | Frequency (f) | Cumulative frequency (cf) | Relative frequency (%) | Cumulative relative frequency (%) |
|---|---|---|---|---|
| 18.0 – 20.9 | 6 | 6 | 5.00 | 5.00 |
| 21.0 – 23.9 | 24 | 30 | 20.00 | 25.00 |
| 24.0 – 26.9 | 32 | 62 | 26.67 | 51.67 |
| 27.0 – 29.9 | 28 | 90 | 23.33 | 75.00 |
| 30.0 – 32.9 | 15 | 105 | 12.50 | 87.50 |
| 33.0 – 35.9 | 9 | 114 | 7.50 | 95.00 |
| 36.0 – 38.9 | 6 | 120 | 5.00 | 100.00 |
| Total | 120 | | 100.00 | |
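The frequency table above can be reproduced from the raw BMI values with a short Python sketch (standard library only). It assumes, as in the data table, that every value is recorded to one decimal place; values are converted to integer tenths so that, for example, 20.9 and 21.0 fall into the correct classes without floating-point edge effects. `class_counts` is a hypothetical helper name.

```python
from itertools import accumulate

# BMI values copied from the data table (12 rows x 10 values = 120 adults).
BMI = [
    18.3, 21.9, 23.0, 24.3, 25.4, 26.6, 27.5, 28.8, 30.9, 34.4,
    19.2, 21.9, 23.1, 24.3, 25.6, 26.9, 27.5, 28.8, 30.9, 34.9,
    19.8, 21.9, 23.1, 24.5, 25.7, 27.1, 27.6, 28.9, 31.0, 35.0,
    20.2, 22.3, 23.3, 24.6, 25.7, 27.3, 28.2, 29.3, 31.1, 35.5,
    20.7, 22.3, 23.4, 24.6, 25.8, 27.3, 28.3, 29.5, 31.3, 35.8,
    20.8, 22.3, 23.5, 24.7, 25.8, 27.3, 28.3, 29.8, 31.6, 35.9,
    21.1, 22.4, 24.0, 24.7, 25.9, 27.3, 28.3, 30.0, 31.6, 36.6,
    21.1, 22.5, 24.0, 24.8, 25.9, 27.4, 28.4, 30.1, 32.6, 37.1,
    21.1, 22.7, 24.0, 24.8, 26.2, 27.4, 28.6, 30.2, 32.8, 37.5,
    21.3, 22.7, 24.1, 25.0, 26.5, 27.4, 28.7, 30.3, 33.2, 37.8,
    21.3, 22.8, 24.1, 25.4, 26.5, 27.4, 28.7, 30.8, 33.6, 38.2,
    21.5, 22.9, 24.2, 25.4, 26.5, 27.4, 28.8, 30.8, 34.2, 38.8,
]

def class_counts(values, first_lower=18.0, width=3.0, k=7):
    """Tally one-decimal values into k classes of the given width starting at first_lower."""
    counts = [0] * k
    start, step = round(first_lower * 10), round(width * 10)   # work in tenths
    for x in values:
        counts[(round(x * 10) - start) // step] += 1
    return counts

f = class_counts(BMI)
cf = list(accumulate(f))
n = len(BMI)
for i, (fi, cfi) in enumerate(zip(f, cf)):
    lower = 18.0 + 3.0 * i
    print(f"{lower:.1f} - {lower + 2.9:.1f}  f={fi:3d}  cf={cfi:3d}  "
          f"rel={100 * fi / n:6.2f}%  cum={100 * cfi / n:6.2f}%")
```

This is the same tally a spreadsheet would do, and it is a quick way to double-check hand counts before drawing the histogram.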
GRAPHS FOR DISPLAYING QUANTITATIVE DATA INCLUDE:

o Histogram
o Frequency polygon and ogive
o Stem-and-leaf plot
o Box and whisker plot (based on the quartiles)
o Scatter plot (used in correlation and regression analysis)
HISTOGRAM & FREQUENCY POLYGONS:
Frequency distributions are often displayed with a histogram, which looks like a bar chart except that there is no space between the bars. The heights of the bars represent either the number or the percentage of observations within each interval.

Frequency polygons, which are essentially a line connecting the midpoints of the tops of the histogram bars, are also used extensively.
TO CONSTRUCT A HISTOGRAM
• Draw the interval boundaries on a horizontal line and the frequencies on a vertical line.
• Non-overlapping intervals that cover all of the data values must be used.
• Bars are then drawn over the intervals so that the area of each bar is proportional to the frequency of its interval.
Using the above data, we can construct a histogram and a frequency polygon in Excel.
[Figure: Frequency histogram of BMI data; horizontal axis: class interval (18.0 – 20.9 to 36.0 – 38.9); vertical axis: frequency]

[Figure: Relative frequency for BMI data; horizontal axis: class interval; vertical axis: relative frequency (%)]

[Figure: Frequency polygon for BMI data; horizontal axis: class interval; vertical axis: frequency]

[Figure: Cumulative frequency polygon (ogive) for BMI data; horizontal axis: class interval; vertical axis: cumulative frequency]
[Figure: Relative frequency polygon for BMI data; horizontal axis: class interval; vertical axis: relative frequency (%); plotted values 5, 20, 26.67, 23.33, 12.5, 7.5, 5]
CUMULATIVE RELATIVE FREQUENCY USING OGIVE
Another way of representing quantitative data is the ogive, which is the graphical presentation of the cumulative relative frequency. Sometimes it is necessary to know the number of items whose values are more or less than a certain amount. We can use the ogive to estimate the cumulative relative frequencies of other values.

For example, about 75% of the respondents have a BMI below 30 (the cumulative relative frequency at the end of the 27.0 – 29.9 class).
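Reading a value off an ogive amounts to linear interpolation between its plotted points. The Python sketch below assumes the ogive is drawn as straight lines joining (upper class boundary, cumulative %) points, starting from 0% at the lowest boundary, and uses the boundaries 17.95, 20.95, …, 38.95 obtained with Δ = 0.05. `ogive_estimate` is a hypothetical helper. For BMI 30 it gives about 75.2%, in line with the 75.00% cumulative relative frequency at the end of the 27.0 – 29.9 class in the frequency table.

```python
def ogive_estimate(x, boundaries, cum_pct):
    """Estimate the cumulative relative frequency (%) at x by linear interpolation
    between consecutive ogive points (boundary, cumulative %)."""
    points = list(zip(boundaries, cum_pct))
    if x <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x <= x1:
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0)
    return points[-1][1]                      # beyond the last boundary: 100%

# BMI ogive: lowest boundary at 0%, then each upper class boundary.
boundaries = [17.95, 20.95, 23.95, 26.95, 29.95, 32.95, 35.95, 38.95]
cum_pct = [100 * c / 120 for c in (0, 6, 30, 62, 90, 105, 114, 120)]

print(round(ogive_estimate(30.0, boundaries, cum_pct), 1))   # 75.2
```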
STEM-AND-LEAF PLOT
HbA1c from diabetic patients (in %)
7.1 8.0 7.2 7.5 6.4
6.8 8.2 9.1 7.8 8.1

Stem | Leaf
6 | 4 8
7 | 1 2 5 8
8 | 0 1 2
9 | 1
(Key: 6 | 4 means 6.4%)
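The same display can be built in Python. This minimal sketch assumes non-negative values recorded to one decimal place, with the integer part as the stem and the tenths digit as the leaf. It keeps empty stems in the output so that gaps in the data remain visible. `stem_and_leaf` is a hypothetical helper name.

```python
def stem_and_leaf(values):
    """Stem-and-leaf display for one-decimal, non-negative data:
    stem = integer part, leaf = tenths digit. Empty stems are kept so gaps show."""
    tenths = sorted(round(x * 10) for x in values)   # integers avoid float digit errors
    plot = {stem: [] for stem in range(tenths[0] // 10, tenths[-1] // 10 + 1)}
    for t in tenths:
        stem, leaf = divmod(t, 10)
        plot[stem].append(leaf)
    return plot

hba1c = [7.1, 8.0, 7.2, 7.5, 6.4, 6.8, 8.2, 9.1, 7.8, 8.1]
for stem, leaves in stem_and_leaf(hba1c).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
# 6 | 4 8
# 7 | 1 2 5 8
# 8 | 0 1 2
# 9 | 1
```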
ADVANTAGES OF STEM-AND-LEAF PLOT:
• Orders the data, so that the maximum and minimum are evident
• Gaps in the data become evident
• All the data values are displayed
• The shape of the data becomes clearer

BOX AND WHISKER PLOT

It is another way to display information when the objective is to illustrate certain locations in the distribution. A box plot is a good alternative or complement to a histogram and is usually better for showing several comparisons simultaneously.

It is useful for the detection of outliers.

It displays the median, minimum, maximum, first quartile (Q1), third quartile (Q3) and interquartile range (IQR = Q3 − Q1).

1. A box is drawn with the top of the box at the third quartile (Q3) and the bottom at the first quartile (Q1).
2. The location of the midpoint of the distribution is indicated with a horizontal line in the box; this is the median (Q2).
3. Finally, straight lines, or whiskers, are drawn from the center of the top of the box to the largest observation and from the center of the bottom of the box to the smallest observation.
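The quantities a box plot displays can be computed with Python's standard `statistics` module. The sketch below uses the HbA1c values from the stem-and-leaf example. Two assumptions: quartiles come from `statistics.quantiles` with its default "exclusive" method, and other software or textbooks may use slightly different quartile rules. The outlier check uses Tukey's 1.5 × IQR fences, a common convention that is not stated on the slides. Under that convention whiskers stop at the most extreme non-outlying values rather than at the minimum and maximum.

```python
import statistics

def box_plot_summary(values):
    """Minimum, Q1, median, Q3, maximum, IQR and 1.5 x IQR outliers (Tukey's convention)."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4)   # default method='exclusive'
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {"min": data[0], "Q1": q1, "median": median, "Q3": q3,
            "max": data[-1], "IQR": iqr, "outliers": outliers}

hba1c = [7.1, 8.0, 7.2, 7.5, 6.4, 6.8, 8.2, 9.1, 7.8, 8.1]
for name, value in box_plot_summary(hba1c).items():
    print(name, value if name == "outliers" else round(value, 3))
# min 6.4, Q1 7.025, median 7.65, Q3 8.125, max 9.1, IQR 1.1, outliers []
```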


SCATTER PLOT

To illustrate the relationship between two characteristics when both are quantitative variables, we use bivariate plots (also called scatter plots or scatter diagrams).

[Figure: Scatter plot showing the height and weight of newborn babies]
SUMMATION NOTATION

Summation notation is simply a way of saying that a collection of numbers is to be added.

Generally, a letter is used to represent whatever is being measured; the letter X is the most common choice.
The notation $X_1$ is used to indicate the first observation.

The next observation is $X_2$, and so on. The letter n is typically used to represent the total number of observations, and the observations themselves are represented by $X_1, X_2, \ldots, X_n$.


In symbols, adding the numbers $X_1, X_2, \ldots, X_n$ is denoted by

$$\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n,$$

where $\sum$ is an upper-case Greek sigma. The subscript i is the index of summation, and the 1 and n that appear respectively below and above the symbol designate the range of the summation: the sum starts with $X_1$ and ends with $X_n$.
Sometimes the sum extends over all n observations, in which case it is customary to omit the index of summation and simply use the notation

$$\sum X_i = X_1 + X_2 + \cdots + X_n.$$

For example, suppose the observations are 1.2, 2.2, 6.4, 3.8, 0.9. Then

$$\sum_{i=2}^{4} X_i = 2.2 + 6.4 + 3.8 = 12.4$$

and

$$\sum X_i = 1.2 + 2.2 + 6.4 + 3.8 + 0.9 = 14.5.$$
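Summation notation maps directly onto Python's built-in `sum`. One detail to watch: the notation counts from $X_1$, while Python lists are indexed from 0, so $\sum_{i=2}^{4} X_i$ corresponds to the slice `x[1:4]`. `partial_sum` is a hypothetical helper that hides this offset.

```python
x = [1.2, 2.2, 6.4, 3.8, 0.9]   # X1 ... X5 from the example

def partial_sum(values, start, end):
    """X_start + ... + X_end, using the 1-based indices of summation notation."""
    return sum(values[start - 1:end])

print(round(partial_sum(x, 2, 4), 2))  # 12.4
print(round(sum(x), 2))                # 14.5
```

The `round` calls only tidy binary floating-point noise (for instance 12.400000000000002).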


Another common arithmetic operation is squaring each observed value and summing the results.

This is written as:
$$\sum X_i^2 = X_1^2 + X_2^2 + \cdots + X_n^2$$

Adding all the values first and then squaring the total is written as $\left(\sum X_i\right)^2$.

For example:

$$\sum X_i^2 = 1.2^2 + 2.2^2 + 6.4^2 + 3.8^2 + 0.9^2 = 62.49$$

$$\left(\sum X_i\right)^2 = (1.2 + 2.2 + 6.4 + 3.8 + 0.9)^2 = 14.5^2 = 210.25.$$
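The difference between the two operations is just the order of "square" and "add". A short Python check with the same five observations:

```python
x = [1.2, 2.2, 6.4, 3.8, 0.9]

sum_of_squares = sum(xi ** 2 for xi in x)   # square each value first, then add
square_of_sum = sum(x) ** 2                  # add first, then square the total

print(round(sum_of_squares, 2))  # 62.49
print(round(square_of_sum, 2))   # 210.25
```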


Let c be any constant. In some situations it helps to note that multiplying each value by c and adding the results is the same as first computing the sum and then multiplying by c. This is written as:

$$\sum cX_i = c\sum X_i$$

For example: $\sum 60X_i = 60\sum X_i = 60 \times 14.5 = 870$.

Another common operation is to subtract a constant from each observed value, square each difference, and add the results. In summation notation, this is written as $\sum (X_i - c)^2$.


For example:

Suppose we want to subtract 2.9 from each value, square each of the results, and then sum these squared differences. So c = 2.9, and

$$\sum (X_i - c)^2 = (1.2 - 2.9)^2 + (2.2 - 2.9)^2 + \cdots + (0.9 - 2.9)^2 = 20.44.$$
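Both constant rules can be checked numerically with the same data. Note that c = 2.9 is also the mean of the five observations (14.5 / 5), so the last quantity is the sum of squared deviations from the mean, the building block of the variance listed under the measures of dispersion.

```python
x = [1.2, 2.2, 6.4, 3.8, 0.9]
c = 2.9

# Constant multiple rule: sum(c * X_i) equals c * sum(X_i).
print(round(sum(60 * xi for xi in x), 2), round(60 * sum(x), 2))   # 870.0 870.0

# Subtract c from each value, square each difference, then add: sum((X_i - c)^2).
squared_diffs = sum((xi - c) ** 2 for xi in x)
print(round(squared_diffs, 2))    # 20.44

# Here c equals the mean of the five values.
print(round(sum(x) / len(x), 2))  # 2.9
```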


THANK YOU
Dr. Sybil Rose