Lecture 2 PDF
BIOSTATISTICS & RESEARCH METHODOLOGY
Methods of summarizing and displaying data
SUMMARIZING AND DISPLAYING DATA
Frequency Tables
• Histograms and polygons and Ogive
• Stem and leaf plots
• Box and Whisker plot
• Scatter plot
• Proportions or percentages
• Bar charts
• Pie Charts
Measures of Central Tendency
• Mean
• Median
• Mode
Measures of Dispersion
• Variance
• Standard Deviation
• Standard error
• Range
• Quartiles
• Coefficient of Variation (CV)
Presenting qualitative data
Charts and tables used to present qualitative data
1. Pie charts
2. Bar Charts (Simple and Clustered bar Charts)
3. Relative frequency (Percentage) Table
FREQUENCY
Frequency: the number of times that something (a value or a category) occurs.
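A minimal Python sketch of counting frequencies, assuming raw records with one category label per child. The labels and records are hypothetical, built to match the 15 boys and 25 girls in this lecture's cross-tabulation; `collections.Counter` does the tallying.

```python
from collections import Counter

# Hypothetical raw records: one label per child (15 boys, 25 girls).
genders = ["Boy"] * 15 + ["Girl"] * 25

# Counter tallies how many times each category occurs, i.e. its frequency.
freq = Counter(genders)
n = sum(freq.values())  # total number of observations

for category, f in freq.items():
    print(f"{category}: frequency = {f}, percentage = {100 * f / n:.1f}%")
```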
PIE CHART
FREQUENCY DISTRIBUTION
[Figure: pie chart of the frequency distribution of girls and boys (sectors of 37% and 63%)]
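ANGLE COMPUTATIONS
Since a circle has 360 degrees, the angle of each sector is the category's relative frequency multiplied by 360°:
Angle = (f / n) × 360°
The angles of all sectors add up to the whole circle: Total = 360°.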
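The angle rule can be checked with a short Python sketch. It assumes the class counts used in this lecture (15 boys, 25 girls); the names `counts` and `angles` are illustrative only.

```python
# Pie-chart sector angles: angle = (f / n) * 360 degrees.
counts = {"Boys": 15, "Girls": 25}  # class counts from the cross-tab in this lecture
n = sum(counts.values())

angles = {group: f / n * 360 for group, f in counts.items()}
for group, angle in angles.items():
    print(f"{group}: {angle:.1f} degrees")

# The sectors must fill the whole circle.
assert abs(sum(angles.values()) - 360) < 1e-9
```

Here Boys get 15/40 × 360° = 135° and Girls 25/40 × 360° = 225°.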
BAR CHART (BAR GRAPH)
CROSSTABS (CROSS TABULATION)
Cross tabulation, or cross tabs, is often used to present the counts of two qualitative variables.
Suppose the variables of interest are:
• Gender
• Wearing glasses
We cross-tabulate them:

                 Wearing Glasses
           Yes      No      Total
Boys         5      10         15
Girls       10      15         25
Total       15      25         40
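A minimal Python sketch of building this cross-tab from raw data, assuming one hypothetical (gender, wears glasses) record per child; the records are constructed to reproduce the counts in the table.

```python
from collections import Counter

# Hypothetical raw data: one (gender, wears_glasses) record per child.
records = (
    [("Boys", "Yes")] * 5 + [("Boys", "No")] * 10
    + [("Girls", "Yes")] * 10 + [("Girls", "No")] * 15
)

cells = Counter(records)                     # joint count for each cell
row_totals = Counter(g for g, _ in records)  # total per gender
col_totals = Counter(y for _, y in records)  # total per glasses answer

print(f"{'':6}{'Yes':>5}{'No':>5}{'Total':>7}")
for gender in ("Boys", "Girls"):
    print(f"{gender:6}{cells[(gender, 'Yes')]:>5}{cells[(gender, 'No')]:>5}{row_totals[gender]:>7}")
print(f"{'Total':6}{col_totals['Yes']:>5}{col_totals['No']:>5}{len(records):>7}")
```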
CROSSTABS AND CLUSTERED BAR CHART
Expressed in percentage:
[Figure: clustered bar chart of wearing glasses (Yes/No) by gender, in percentage]
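The slide does not show which percentages were used. A sketch assuming row percentages (each cell divided by its gender total), one common choice for a clustered bar chart; column percentages would divide by the Yes and No totals (15 and 25) instead.

```python
# Counts from the cross-tab (gender x wearing glasses).
table = {"Boys": {"Yes": 5, "No": 10}, "Girls": {"Yes": 10, "No": 15}}

# Row percentages (assumed): each cell divided by its row (gender) total, times 100.
row_pct = {}
for gender, row in table.items():
    total = sum(row.values())
    row_pct[gender] = {answer: 100 * f / total for answer, f in row.items()}
    print(gender, {a: round(p, 1) for a, p in row_pct[gender].items()})
```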
FREQUENCY AND FREQUENCY DISTRIBUTION TABLES
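Frequency distribution: a table showing the values (or classes of values) of a variable and the number of times each is observed.
Cumulative frequency: the running total of the frequencies.
Relative frequency: each frequency divided by the sample size n.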
TABLE: OBTAINING FREQUENCY, CUMULATIVE FREQUENCY AND PERCENTAGE
Smoothing class intervals to obtain class boundaries:
$$\Delta = \frac{\text{lower limit of the second class} - \text{upper limit of the first class}}{2}$$
Subtract Δ from the lower class limits to get the lower class boundaries, and add Δ to the upper class limits to get the upper class boundaries.
$$C = \frac{R}{K}$$
Where K = number of class intervals, n = number of
observations and C = class width
R (range) = maximum value − minimum value
The beginning and end of each interval are called its boundaries, and the point midway between the two boundaries of an interval is called the class mark or midpoint.
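A worked sketch of the class-width and boundary rules in Python. It assumes the BMI data in this lecture (minimum 18.3, maximum 38.8), K = 7 classes, data recorded to one decimal place, and a class width rounded up to a whole number; the last two are assumptions, though they match the intervals used in this lecture.

```python
import math

# Range and class width for the BMI data (minimum 18.3, maximum 38.8).
minimum, maximum = 18.3, 38.8
K = 7                     # number of class intervals chosen
R = maximum - minimum     # range = maximum - minimum
C = R / K                 # class width before rounding
width = math.ceil(C)      # assumption: round up to a convenient whole number
print(f"R = {R:.1f}, C = {C:.2f}, rounded width = {width}")

# Class limits starting at 18.0; "- 0.1" assumes data recorded to one decimal.
lower_limits = [18.0 + width * k for k in range(K)]
upper_limits = [lo + width - 0.1 for lo in lower_limits]

# Delta = half the gap between one class's upper limit and the next lower limit.
delta = (lower_limits[1] - upper_limits[0]) / 2

for lo, hi in zip(lower_limits, upper_limits):
    lb, ub = lo - delta, hi + delta  # class boundaries
    mid = (lb + ub) / 2              # class mark (midpoint)
    print(f"{lo:.1f} - {hi:.1f}: boundaries {lb:.2f} - {ub:.2f}, midpoint {mid:.2f}")
```

For the first class this gives boundaries 17.95 – 20.95 and midpoint 19.45.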
TABLE: BODY MASS INDEX DATA FOR A SAMPLE OF 120 U.S. ADULTS
18.3 21.9 23.0 24.3 25.4 26.6 27.5 28.8 30.9 34.4
19.2 21.9 23.1 24.3 25.6 26.9 27.5 28.8 30.9 34.9
19.8 21.9 23.1 24.5 25.7 27.1 27.6 28.9 31.0 35.0
20.2 22.3 23.3 24.6 25.7 27.3 28.2 29.3 31.1 35.5
20.7 22.3 23.4 24.6 25.8 27.3 28.3 29.5 31.3 35.8
20.8 22.3 23.5 24.7 25.8 27.3 28.3 29.8 31.6 35.9
21.1 22.4 24.0 24.7 25.9 27.3 28.3 30.0 31.6 36.6
21.1 22.5 24.0 24.8 25.9 27.4 28.4 30.1 32.6 37.1
21.1 22.7 24.0 24.8 26.2 27.4 28.6 30.2 32.8 37.5
21.3 22.7 24.1 25.0 26.5 27.4 28.7 30.3 33.2 37.8
21.3 22.8 24.1 25.4 26.5 27.4 28.7 30.8 33.6 38.2
21.5 22.9 24.2 25.4 26.5 27.4 28.8 30.8 34.2 38.8
• Usually, for a data set of 100 to 150 observations, the number of intervals chosen ranges from about 5 to 10.
These seven intervals are as follows:
o 18.0 – 20.9
o 21.0 – 23.9
o 24.0 – 26.9
o 27.0 – 29.9
o 30.0 – 32.9
o 33.0 – 35.9
o 36.0 – 38.9
Class interval   Frequency   Cumulative       Relative        Cumulative relative
(BMI levels)     (f)         frequency (cf)   frequency (%)   frequency (%)
18.0 – 20.9        6            6               5.00            5.00
21.0 – 23.9       24           30              20.00           25.00
24.0 – 26.9       32           62              26.67           51.67
27.0 – 29.9       28           90              23.33           75.00
30.0 – 32.9       15          105              12.50           87.50
33.0 – 35.9        9          114               7.50           95.00
36.0 – 38.9        6          120               5.00          100.00
Total            120                          100.00
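A sketch that rebuilds the frequency table from the raw BMI values, assuming classes of width 3.0 starting at 18.0 as in the table. Working in tenths of a BMI unit is an implementation choice that keeps values such as 20.9 and 21.0 in the correct class despite floating-point rounding.

```python
from itertools import accumulate

# BMI values for the 120 U.S. adults in the data table.
bmi = [
    18.3, 21.9, 23.0, 24.3, 25.4, 26.6, 27.5, 28.8, 30.9, 34.4,
    19.2, 21.9, 23.1, 24.3, 25.6, 26.9, 27.5, 28.8, 30.9, 34.9,
    19.8, 21.9, 23.1, 24.5, 25.7, 27.1, 27.6, 28.9, 31.0, 35.0,
    20.2, 22.3, 23.3, 24.6, 25.7, 27.3, 28.2, 29.3, 31.1, 35.5,
    20.7, 22.3, 23.4, 24.6, 25.8, 27.3, 28.3, 29.5, 31.3, 35.8,
    20.8, 22.3, 23.5, 24.7, 25.8, 27.3, 28.3, 29.8, 31.6, 35.9,
    21.1, 22.4, 24.0, 24.7, 25.9, 27.3, 28.3, 30.0, 31.6, 36.6,
    21.1, 22.5, 24.0, 24.8, 25.9, 27.4, 28.4, 30.1, 32.6, 37.1,
    21.1, 22.7, 24.0, 24.8, 26.2, 27.4, 28.6, 30.2, 32.8, 37.5,
    21.3, 22.7, 24.1, 25.0, 26.5, 27.4, 28.7, 30.3, 33.2, 37.8,
    21.3, 22.8, 24.1, 25.4, 26.5, 27.4, 28.7, 30.8, 33.6, 38.2,
    21.5, 22.9, 24.2, 25.4, 26.5, 27.4, 28.8, 30.8, 34.2, 38.8,
]

# Classes 18.0-20.9, 21.0-23.9, ... expressed in tenths: start 180, width 30.
n_classes, width_tenths, start_tenths = 7, 30, 180
freq = [0] * n_classes
for x in bmi:
    freq[(round(x * 10) - start_tenths) // width_tenths] += 1

n = len(bmi)
cum = list(accumulate(freq))  # cumulative frequency (running total)
for k, (f, cf) in enumerate(zip(freq, cum)):
    lo = (start_tenths + k * width_tenths) / 10
    print(f"{lo:.1f} - {lo + 2.9:.1f}  f={f:3d}  cf={cf:3d}  "
          f"rel={100 * f / n:6.2f}%  cum rel={100 * cf / n:6.2f}%")
```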
GRAPHS FOR DISPLAYING QUANTITATIVE DATA
o Histogram and frequency polygon
o Ogive
o Stem-and-leaf plot
o Box-and-whisker plot (requires constructing quartiles)
o Scatter plot
HISTOGRAM & FREQUENCY POLYGONS:
Frequency distributions are often displayed with a histogram, which looks like a bar chart except that there is no space between the bars.
Frequency polygons, which join the points at the middle of the top of each bar of the histogram, are also used extensively.
TO CONSTRUCT A HISTOGRAM
• Draw the interval boundaries on a horizontal line and the
frequencies on a vertical line.
• Non-overlapping intervals that cover all of the data values
must be used.
• Bars are then drawn over the intervals so that the area of each bar is proportional to its interval frequency.
Using the above data, we can construct the histogram and frequency polygon in Excel.
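The source builds these charts in Excel. As an alternative sketch, this Python code computes the class midpoints used as frequency polygon vertices and prints a text histogram. It assumes the seven BMI classes and frequencies from the table. Because every class has the same width, drawing each bar with height equal to its frequency keeps its area proportional to that frequency.

```python
# Frequencies of the seven BMI classes from the frequency table.
lower_limits = [18.0, 21.0, 24.0, 27.0, 30.0, 33.0, 36.0]
freq = [6, 24, 32, 28, 15, 9, 6]
width = 3.0   # every class spans 3.0 BMI units between its boundaries
delta = 0.05  # half the gap between adjacent class limits

# Midpoint of each class = lower boundary + half the width (polygon x-coordinates).
midpoints = [round(lo - delta + width / 2, 2) for lo in lower_limits]

# Equal widths: bar height = frequency keeps bar AREA proportional to frequency.
for mid, f in zip(midpoints, freq):
    print(f"{mid:6.2f} | {'#' * f} {f}")

# Polygon vertices; a zero-frequency class is added at each end so the polygon
# meets the horizontal axis (a common convention, assumed here).
polygon = (
    [(round(midpoints[0] - width, 2), 0)]
    + list(zip(midpoints, freq))
    + [(round(midpoints[-1] + width, 2), 0)]
)
print(polygon)
```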
[Figure: histogram of BMI frequency by class interval, 18.0 – 20.9 to 36.0 – 38.9]
[Figure: BMI chart by class interval, vertical axis 0 to 140]
[Figure: BMI chart by class interval, vertical axis 0 to 15]
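CUMULATIVE RELATIVE FREQUENCY USING OGIVE
Another way of representing quantitative data is the ogive, a graph of the cumulative relative frequency (%) plotted against the upper boundary of each class interval.
For example, 75% of the respondents have a BMI less than 30 (90 of the 120 adults in the frequency table).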
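A sketch of reading an ogive numerically, assuming the points are plotted at the upper class boundaries and joined by straight lines (linear interpolation). The helper `percent_below` is hypothetical.

```python
# Ogive points: cumulative relative frequency (%) at each upper class boundary.
# The curve starts at 0% at the lower boundary of the first class (17.95).
upper_boundaries = [17.95, 20.95, 23.95, 26.95, 29.95, 32.95, 35.95, 38.95]
cum_freq = [0, 6, 30, 62, 90, 105, 114, 120]
n = 120
cum_pct = [100 * cf / n for cf in cum_freq]
points = list(zip(upper_boundaries, cum_pct))


def percent_below(x):
    """Read the ogive at x by linear interpolation between adjacent points."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0)
    raise ValueError("x lies outside the plotted ogive")


print(f"{percent_below(30.0):.1f}% of the sample have BMI below 30")
```

Reading at 30.0 gives about 75.2%, in line with the 75% at the 29.95 boundary.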
STEM-AND-LEAF PLOT
HbA1c from diabetic patients (in %)
7.1 8.0 7.2 7.5 6.4
6.8 8.2 9.1 7.8 8.1
Stem Leaf
6 4 8
7 1 2 5 8
8 0 1 2
9 1
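A short Python sketch that builds the same stem-and-leaf plot, assuming stems are the integer part and leaves the first decimal digit.

```python
from collections import defaultdict

hba1c = [7.1, 8.0, 7.2, 7.5, 6.4, 6.8, 8.2, 9.1, 7.8, 8.1]  # HbA1c (%)

# Stem = integer part, leaf = first decimal digit. Converting to whole tenths
# first avoids floating-point rounding surprises when splitting the digits.
plot = defaultdict(list)
for value in sorted(hba1c):
    tenths = round(value * 10)
    plot[tenths // 10].append(tenths % 10)

for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```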
ADVANTAGES OF STEM-AND-LEAF PLOT:
• Orders the data, so that the maximum and minimum are
evident
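BOX-AND-WHISKER PLOT
1. A box is drawn with the top of the box at the third quartile and the bottom at the first quartile; a line across the box marks the median.
2. A line (whisker) is drawn from the top of the box to the largest observation and from the bottom of the box to the smallest observation.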
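A sketch of the five numbers a box-and-whisker plot needs, using the HbA1c data from the stem-and-leaf example. It relies on `statistics.quantiles` (Python 3.8+), whose default "exclusive" method is only one of several quartile conventions, so hand-calculated quartiles may differ slightly.

```python
import statistics

hba1c = [7.1, 8.0, 7.2, 7.5, 6.4, 6.8, 8.2, 9.1, 7.8, 8.1]  # HbA1c (%)

# Quartiles via the default "exclusive" method (one convention among several).
q1, median, q3 = statistics.quantiles(hba1c, n=4)

five_numbers = {
    "minimum": min(hba1c),  # end of the lower whisker
    "Q1": q1,               # bottom of the box
    "median": median,       # line across the box
    "Q3": q3,               # top of the box
    "maximum": max(hba1c),  # end of the upper whisker
}
for name, value in five_numbers.items():
    print(f"{name:>7}: {value:.3f}")
```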
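The summation sign Σ tells us to add the values of X:
$$\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$$
The i = 1 below Σ shows where the X values start and the n above it shows where they end.
Sometimes the sum extends over all n observations, in which case the limits are often omitted and we simply write $\sum X_i$.
For example, if $X_1 = 2.2$, $X_2 = 6.4$ and $X_3 = 3.8$, then
$$\sum_{i=1}^{3} X_i = 2.2 + 6.4 + 3.8 = 12.4$$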
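The summation can be sketched in Python, assuming the three values in the example are X_1, X_2 and X_3. The explicit index i runs from 1 to n to mirror the notation; in practice `sum(X)` does the same.

```python
# Hypothetical observations X_1, X_2, X_3 taken from the worked sum.
X = [2.2, 6.4, 3.8]

# sum_{i=1}^{n} X_i: add every value from the first (i = 1) to the last (i = n).
n = len(X)
total = sum(X[i - 1] for i in range(1, n + 1))  # Python lists are indexed from 0
print(round(total, 1))
```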
Multiplying each value by a constant c and adding the results is the same as first adding the values and then multiplying the sum by c:
$$\sum_{i=1}^{n} cX_i = c\sum_{i=1}^{n} X_i$$
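A quick numerical check of the constant rule in Python. The value c = 2.9 comes from the slide's example; the X values are hypothetical (reused from the summation example), since the example's own values are not shown here.

```python
c = 2.9               # constant from the slide's example
X = [2.2, 6.4, 3.8]   # hypothetical values, not the slide's own

lhs = sum(c * x for x in X)  # multiply each value by c, then add
rhs = c * sum(X)             # add the values first, then multiply once by c
print(round(lhs, 2), round(rhs, 2))

# Equal up to floating-point rounding.
assert abs(lhs - rhs) < 1e-9
```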
For example:
So c = 2.9, and