EPIMLS1

Biostatistics
and
Epidemiology
Prepared by:
Jaleh V. Gacayan, RMT, MPH, LPT
Adelle N. Sanchez, MA Physics, LPT
A Self-regulated Learning Module 1

REQUIREMENTS OF THE COURSE
1. Regular attendance in classes: You must attend online classes and live quizzes regularly by logging
in to our scheduled online activities. Online lectures will be done through Google Meet and/or Facebook
Live. Assessments shall be given through Quizizz, Pear Deck, Canvas and/or Google Forms. For offline
students, your attendance will be monitored through your responses to text information and through
timely correspondence.
2. Submission of required activities: All required activities (assignments, research work, laboratory
illustrations) should be submitted on or before the given deadline. Deadlines will be posted by the
teacher in the Google Classroom and Messenger group chat. They will also be texted to offline students.
For online students, requirements must be submitted to the teacher’s email address which will be
provided during the class orientation. For offline students, requirements must be submitted via mail or
express courier (e.g. LBC, JRS) addressed to: Instructor’s name, School of Natural Sciences,
University of Baguio, Baguio City.
3. Seventy percent (70%) passing score in all required activities: Quizzes, exams, assignments,
research work, laboratory illustrations.
Computation of grades:
▪ The Course Grade is obtained by combining the lecture and laboratory grades (50%:50%) for the
subject.
▪ Laboratory grade shall be computed as 30% enhancement activities (illustrations; research work;
case study; experiments – when possible) plus 70% class standing (quizzes and exams).
▪ The cumulative system of computing grades shall be followed. Grades computed for midterms and
finals are considered tentative. The final midterm grade is calculated by getting 1/3 of the first
grading grade plus 2/3 of the tentative midterm grade and the final grade is computed by getting 1/3
of the midterm grade plus 2/3 of the tentative final grade.
4. Study/Learning Guidelines:
a. Manage your time properly. As students of higher education (College), you are expected to be
more responsible in paying attention to course schedules, requirements, and deadlines. Schedule
how you will accomplish all the requirements in all your enrolled courses (reading the modules,
reading on research/ enhancement questions, doing assignments and laboratory illustrations) and
focus your attention when doing your tasks.
b. Observe proper conduct. Despite this online mode of learning, you must still maintain appropriate
behavior at all times. All standards of student conduct outlined in the University of Baguio Student
Handbook remain in full effect during this time of distance learning. Be honest in answering your
quizzes and exams. Work independently when accomplishing tasks and assignments.
c. Stay motivated. Your future depends on what you do today. Maintain a positive attitude towards
learning and enjoy a fun-learning environment despite the current circumstances.
d. Maintain a performance of high standard. Give your best in accomplishing all the assigned tasks.
Do not be complacent with just a 70% passing cut-off score. Remember that this is a board subject,
and the best preparation for the board/licensure examination should be during these formative
years. The board review is but supplementary to the knowledge you have already learned during
your Med Tech education.
e. Communicate properly. Promptly respond to notifications by regularly visiting our Google Classroom
and Messenger group chat. If you have questions about any part of this module, I am here
to guide you. Send your academic concerns using the same online platforms. For offline
students, text messages and mobile calls are welcome during scheduled hours of the day and
week. Be guided by this schedule when communicating:

▪ Respect private hours. I do not always open my laptop/email/messenger 24/7. Send your
queries and/or concerns during regular office hours. For concerns that need immediate
attention, send through mobile text.
▪ Be patient. Messages received between 8 AM to 8 PM will be responded to within the same
day. Messages received after 8 PM will be answered starting 8 AM the next day.
▪ Before calling my mobile number, text first for permission, because I might be giving an online
lecture, attending a meeting, or having a private moment at that time.
▪ Saturdays and Sundays are for my family and home chores. I shall respond to
queries/messages received during these days within the first office hour of Monday.
f. Show mutual support. Support one another. Let us all be responsible and supportive in making this
new learning process more effective.
g. Live lecture/Video conferencing guidelines:
g.1 Be punctual. Live lectures/Video conferences will be scheduled during the official class
period/time of this course. Log in to the platform at least 5-10 minutes before the class period.
Prepare your learning materials such as this module, pens, papers, etc. Attendance will be
checked during the lecture/video conference.
g.2 Maintain professionalism.
- Wear appropriate clothing and set your gadget in an appropriate area. You may be asked
to turn on your video/camera at any time during the lecture.
- Log in using your UB gmail account. Unidentified names like nicknames, phone models,
etc. will not be allowed in the video conference.
- Mute your microphone as soon as you log in to the platform to avoid any excess
background noise. Unmute your microphone when instructed to do so.
- Be courteous. Do not interrupt your teacher or a classmate who is speaking. You may type
your question in the Chat area, or use the “raise hand” feature if available, and wait until
you are allowed to speak.
- Respect privacy. Do not take a screenshot, picture, snapshot, etc. of your teacher or
fellow students, nor make any unnecessary audio or video recordings.
g.3 Remain focused and engaged. Do not be distracted by your gadget. Keep your
videoconference platform open and do not navigate other tabs or webpages unless directed
by your teacher.
Endorsed by:
Teresa N. Villanueva, RMT, MACT

Dean, School of Natural Sciences

STUDY SCHEDULE
WEEK | TOPIC | ACTIVITY
1 | Lesson 1: Populations and Samples | Lecture: Assessment & Quiz; Laboratory 1: Introduction to Biostatistics and Epidemiology
2 | Lesson 2: Graphical Summaries | Lecture: Assessment & Quiz; Laboratory 2: Quartiles, Deciles, and Percentiles
3 | Lesson 3: Measures of Location and Spread | Lecture: Assessment & Quiz; Laboratory 3: Mean, Median, Mode
4 | Lesson 4: T-test | Lecture: Assessment & Quiz; Laboratory 4: T-test
5 | Lesson 5: Regression | Lecture: Assessment & Quiz; Laboratory 5: Regression
6 | FIRST GRADING EXAM | Coverage Lesson 1-5
7 | Lesson 6: Correlation | Lecture: Assessment & Quiz; Laboratory 6: Correlation
8 | Lesson 7: Chi Square | Lecture: Assessment & Quiz; Laboratory 7: Chi Square
9 | Lesson 8: Analysis of Variance | Lecture: Assessment & Quiz; Laboratory 8: ANOVA
10 | Lesson 9: Relative Risk | Lecture: Assessment & Quiz; Laboratory 9
11 | Lesson 10: Odds Ratio, Prevalence and Incidence | Lecture: Assessment & Quiz; Laboratory 10: Odds Ratio
12 | MIDTERM | Coverage Lesson 1-10
13 | Lesson 11: Epidemiology | Lecture: Assessment & Quiz; Laboratory 11
14 | Lesson 12: Descriptive Epidemiology | Lecture: Assessment & Quiz; Laboratory 12
15 | Lesson 13: Analytical Epidemiology and Experimental Epidemiology | Lecture: Assessment & Quiz; Laboratory 13
16 | Lesson 14: Causal Inference | Lecture: Assessment & Quiz; Laboratory 14
17 | Lesson 15: Field Epidemiology & Outbreak Investigation | Lecture: Assessment & Quiz; Laboratory 15
18 | Lesson 16: Chronic Disease Epidemiology; Lesson 17: Clinical Epidemiology | Lecture: Assessment & Quiz; Laboratory 16 and 17
18 | FINALS | Coverage Lesson 1-17
Table of Contents
Introduction of the Module
Lesson 1 Populations and Samples
Laboratory 1: Introduction to Biostatistics and Epidemiology
Lesson 2 Graphical Summaries
Laboratory 2: Quartiles, Deciles, and Percentiles
Lesson 3 Measures of Location and Spread
Laboratory 3: Mean, Median, Mode
Lesson 4 T-test
Laboratory 4: T-test
Lesson 5 Regression
Laboratory 5: Regression
Lesson 6 Correlation
Laboratory 6: Correlation
Lesson 7 Chi Square
Laboratory 7: Chi Square
Lesson 8 Analysis of Variance
Laboratory 8: ANOVA
Lesson 9 Relative Risk
Laboratory 9
Lesson 10 Odds Ratio, Prevalence and Incidence
Laboratory 10: Odds Ratio
Lesson 11 Epidemiology
Laboratory 11
Lesson 12 Descriptive Epidemiology
Laboratory 12
Lesson 13 Analytical Epidemiology and Experimental Epidemiology
Laboratory 13
Lesson 14 Causal Inference
Laboratory 14
Lesson 15 Field Epidemiology & Outbreak Investigation
Laboratory 15
Lesson 16 Chronic Disease Epidemiology
Laboratory 16
Lesson 17 Clinical Epidemiology
Laboratory 17

Introduction of the Module
Course Code: EPIMLS1
Course Title: Biostatistics and Epidemiology
Course Description:
This course focuses on the study of infectious disease, chronic disease and health
related conditions. It also studies the distribution and determinants of health related
conditions in human populations and the application of this method to the control of health
problems. It also aims to answer a number of questions related to health problems in
human populations. This course also incorporates statistics to public health issues. It
makes use of concepts and interpretation of the different statistical tools to interpret the
results of public health research. Hypotheses are generated and tested to accurately
describe data and draw conclusions. Subjects of interest should be carefully selected to
represent the case in a particular population.
Requirements of the Course:
Quizzes and completed laboratory manuals are required to finish the course.

Lesson 1
Populations and Samples
Desired Learning Outcomes:

1. Be familiar with the different terms in sampling
2. Understand how sampling is done in research
3. Distinguish the different study designs
4. Describe variable types
Introduction:
One of the key steps to clearly defining the research question is to state clearly who or what
needs to be studied. Although many studies in public health involve human beings as subjects,
this is not a requirement. The subjects can be anything. Examples are children, extracted teeth,
pregnant women, water sources exposed to bacteria, cell cultures, households in the tropics,
muscle tissues, etc. The subjects in a study are the sources of information or data. Data are
obtained by measuring the characteristics of the subjects.
Lesson Proper:
Defining who or what is going to be studied means defining the population, which consists of
all possible subjects of interest. Defining the population is a critical step because it defines the
subjects to be studied and the subjects to whom the conclusions will be applicable.
A sample is a smaller set or a subset of the population. A representative sample is a subset
that provides an accurate picture of the whole population. The sample data should be very similar
to what would be found in the whole population. A representative sample provides an accurate
picture of the population but an unrepresentative sample may be misleading.
A biased sample occurs when certain members of the population are chosen so that the
sample systematically misrepresents the population. This happens when subjects select
themselves for the sample or when the investigator selects subjects that are convenient. To avoid
this, random sampling must be done. A sampling frame must be created in which respondents are
listed and each is assigned a unique number. A random number generator can then be used to randomly
select the sample.
Types of Random Sampling:

1. Simple random sampling
Each subject in the population has the same chance of being selected. The random
number generator randomly selects numbers from the range of unique identifiers in the
sampling frame. The selected identifiers are then matched with the subjects to select the
sample. On average, they are representative, but a single simple random sample may not
reflect the true population. For large populations, a simple random sample may not be
feasible.
2. Stratified random sampling
The sampling frame is divided into subgroups or strata and simple random samples are
conducted within the strata.
3. Systematic random sampling
The sampling frame is ordered, and a number s is selected so that every s-th subject is
selected to be in the sample. The selection process is still random because the starting
point for choosing the sample is based on random chance. Although systematic random
samples are a strategy for choosing subjects at random, the systematic component can
lead to bias. If the sampling frame is ordered so that a characteristic of the population
repeats for every s subjects, choosing every s-th subject could result in a sample with or
without the characteristic, depending on the starting point.
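The three sampling strategies above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling frame of 100 subjects, the "urban"/"rural" strata, the sample sizes, and the function names are all made up for the example and do not come from the reference text.

```python
import random

# Hypothetical sampling frame: 100 subjects with unique IDs 1..100,
# each tagged with a made-up stratum ("urban" or "rural").
frame = [{"id": i, "stratum": "urban" if i <= 60 else "rural"}
         for i in range(1, 101)]

rng = random.Random(42)  # fixed seed so the sketch is reproducible


def simple_random_sample(frame, n, rng):
    """Every subject in the frame has the same chance of being selected."""
    return rng.sample(frame, n)


def stratified_random_sample(frame, fraction, rng):
    """Draw a simple random sample of the same fraction within each stratum."""
    strata = {}
    for subject in frame:
        strata.setdefault(subject["stratum"], []).append(subject)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample


def systematic_random_sample(frame, s, rng):
    """Pick a random start among the first s subjects, then take every s-th."""
    start = rng.randrange(s)
    return frame[start::s]


srs = simple_random_sample(frame, 10, rng)
strat = stratified_random_sample(frame, 0.10, rng)
syst = systematic_random_sample(frame, 10, rng)

print("Simple:    ", sorted(x["id"] for x in srs))
print("Stratified:", sorted(x["id"] for x in strat))
print("Systematic:", [x["id"] for x in syst])
```

Note that the stratified sample always contains 6 urban and 4 rural subjects, while a simple random sample of 10 may by chance over- or under-represent either group.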
Study Designs
The study design is how information on the subjects will be collected.
Kinds of Study Designs

1. Prospective study
Subjects are identified and followed for a specific period of time. Data collection starts at
the beginning of the study and continues as subjects are followed during the study. The
investigator controls what variables are measured and how they are measured. In public
health, prospective studies are often conducted when the goal is to compare groups.
2. Cohort study
Cohort studies are a type of medical research used to investigate the causes of disease
and to establish links between risk factors and health outcomes. Cohort studies typically
observe large groups of individuals, recording their exposure to certain risk factors to find
clues as to the possible causes of disease. They can be prospective studies and gather
data going forward, or retrospective cohort studies, which look at data already collected.
3. Retrospective study
An outcome is identified after the data have already been collected. Previously collected
data are reviewed to determine whether any characteristics impacted the outcome. These
studies are conducted when the outcome is not very common and when it would require
a long time to follow subjects prospectively.
4. Case control study
Case subjects (those having the outcome) and control subjects (those not having the
outcome) are identified. Existing data are then obtained to determine what factors were
related to subjects becoming either a case or a control.
5. Cross-sectional study
Data are collected at a particular time point and represent a cross-section of time. The
outcome and the variables of interest are all measured at the same time. Surveys that
measure the responses of subjects at a particular point in time are typically conducted as
part of a cross-sectional study.
Variable Types
1. Categorical

Variables whose measurements represent a limited set of possible values. They are also known
as discrete variables. The values can be expressed either in numbers or in characters and
words. These variables are best analyzed using procedures that count the number
of subjects in a particular category or level.
a. Ordinal
These are variables with different levels or categories whose order matters. Examples
include pain scores, stages of cancer, and educational attainment.
b. Nominal
These are categorical variables with different levels or categories whose order does
not matter. Examples are tooth color, marital status, and political affiliation.
c. Dichotomous
These are variables that can have only two levels. Examples include yes/no variables
and sex.
2. Continuous
Variables whose measurements represent an unlimited set of possible values. These
variables can have only numeric values. There are no natural gaps between the numbers
and in-between values are possible. The level of detail measured by a continuous variable
is limited only by the level of detail of the measuring instrument. Examples include BMI,
Viral Load, and average probing depths. These data are best analyzed by procedures that
allow for multiplying, dividing, adding, or subtracting the values.
3. Count
These variables can take on only non-negative, whole number values (0, 1, 2, ...). Examples are
number of cavities in a mouth, number of side effects, number of sexual partners, and the number
of cigarettes.
How to describe the subjects: Numerical Summaries

For categorical variables, numerical summaries include:
• Counts – For each level or category, a subject either belongs in the category or not. The
total number of subjects with a particular category or level is a count.
• Proportions – The proportion is simply the count for a category divided by the total number
of subjects.
• Percentages – The percentage is the proportion times 100.
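The three summaries above can be computed together. The following is a minimal sketch; the list of marital-status responses is hypothetical and is used only to show the arithmetic.

```python
from collections import Counter

# Hypothetical responses for a nominal variable (marital status).
responses = ["single", "married", "single", "widowed", "married",
             "single", "married", "single", "single", "married"]

counts = Counter(responses)          # count per category
total = len(responses)               # total number of subjects
proportions = {level: count / total for level, count in counts.items()}
percentages = {level: 100 * p for level, p in proportions.items()}

for level in counts:
    print(f"{level}: count={counts[level]}, "
          f"proportion={proportions[level]:.2f}, "
          f"percentage={percentages[level]:.0f}%")
```

The proportions across all categories add up to 1, and the percentages add up to 100.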
For continuous variables, there are many options for summarizing the responses numerically.
Typically, numerical summaries of continuous variables are the mean, standard deviation,
median, first and third quartiles, and minimum and maximum. Some of these numerical
summaries describe the center of the distribution, and some describe the spread.
A measure of center provides a description of the average response. A measure of spread
provides a description of how varied the responses are. A measure of how spread out the
responses are tells the investigator whether the responses are clustered close to the center or
dispersed farther away from the center.
The mean is commonly used to describe the center of the responses. However, when
extremely large or small values are present, the mean is no longer a good measure of the center.
In this case, the median, which is the middle of the responses, is a better measure of the center.
The choice for the measure of spread depends on the measure of center. Mixing and matching
measures of center and spread is not appropriate. If the mean is chosen as the measure of center,
then an appropriate measure of spread is the standard deviation. If the median is chosen to
summarize the center of the data, then an appropriate measure of spread is the range. The range
can be presented either as the difference between the maximum and the minimum or as the
interval of values.
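To see why the median is preferred when extreme values are present, consider the short sketch below. The nine weights are hypothetical; the last one is deliberately extreme.

```python
import statistics

# Hypothetical weights (kg) of nine children; the last value is extreme.
weights = [20, 21, 22, 22, 23, 24, 25, 26, 60]

mean = statistics.mean(weights)        # pulled upward by the extreme value
sd = statistics.stdev(weights)         # sample standard deviation
median = statistics.median(weights)    # middle of the ordered responses
value_range = max(weights) - min(weights)

print(f"mean = {mean}, standard deviation = {sd:.1f}")
print(f"median = {median}, range = {value_range} "
      f"(from {min(weights)} to {max(weights)})")
```

Here the mean (27) is larger than eight of the nine observations, while the median (23) still sits in the middle of the typical responses.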
Notation for Parameters and Statistics

Parameters
These are numerical summaries that describe the population. Parameters are the numerical
summaries that an investigator wants but cannot obtain directly because collecting data on the
entire population is not feasible.
Statistics
These are numerical summaries that describe the sample. If the sample is representative of the
population, then the numerical summaries obtained from the sample will provide a good
approximation of the population.
Listed below are examples of notations for statistics and parameters.

Example Question | Example Numerical Summaries | Notation for a Population (Parameter) | Notation for a Sample (Statistic)
What do the children weigh? | The mean (and standard deviation) weight | Mean (μ); Standard deviation (σ) | Mean (x̄); Standard deviation (s)
Are they participating in school lunch programs? | The proportions participating in school lunch programs | Proportion (p) | Proportion (p̂)
What color is their favorite fruit or vegetable? | The proportion whose favorite fruit/vegetable is green, yellow/white, orange, red, blue/purple | Green (p_g); Yellow/white (p_yw); Orange (p_o); Red (p_r); Blue/purple (p_bp) | Green (p̂_g); Yellow/white (p̂_yw); Orange (p̂_o); Red (p̂_r); Blue/purple (p̂_bp)
What is the activity level? | The proportion who are very sedentary, somewhat sedentary, moderately active, or very active | Very sedentary (p_1); Somewhat sedentary (p_2); Moderately active (p_3); Very active (p_4) | Very sedentary (p̂_1); Somewhat sedentary (p̂_2); Moderately active (p̂_3); Very active (p̂_4)
How many hours are spent watching television or playing video games? | The median (and range) number of hours | Median (M); Range (R) | Median (m̄); Range (R̄)
Synthesis:
Ideally, samples are chosen from complete sampling frames using an element of random
chance. When a sample systematically misses a group of subjects in a population, the sample
suffers from selection bias. Bias can occur when the sampling frame omits a particular group of
subjects, when subjects are either self-selected or investigator-selected. Although eliminating all
sources of bias from the sample selection process may not be possible, selecting a sample
requires careful consideration and procedures should be carefully planned so that potential biases
are minimized.

Parameters describe the population while statistics can be calculated from the data. In other
words, statistics describe the sample and parameters describe the population. Statistics need to
be presented using notations to differentiate them from parameters. A statistic is presented as a
letter with a bar or hat on top but a parameter never has a bar or hat.
To help you with this topic, you can watch the video links below:
https://www.youtube.com/watch?v=Mb9BuEkbaHQ
https://www.youtube.com/watch?v=VPM84_yfx5Q
Assessment:
Offline Learners: Answer the given activity and send it to your instructor’s email.
Online Learners: Answer the quiz posted in Google Classroom.
Give brief and concise answers for the following questions. Be guided by the rubric below: (20
points)
Content – 3 points Coherence – 2 points
1. Consider the population of children ages 6-10 who attend public schools. How could a
representative sample of this population be obtained? Identify the problems with using
the following methods to obtain a sample of this population:
a. Send a survey (with a stamped return envelope) to every public elementary school.
The students who return the survey are the sample.
b. The investigator has worked with educational leaders in states in the Southeast and
knows that the schools in this region will be motivated to participate. The investigator
collects data only on children from these schools.
c. Children attending elementary schools with high test scores may be more likely to
participate, so the investigator chooses only children from schools in the top 10%.
d. The investigator conducts a random sample using the home telephone numbers of
parents with children ages 6-10 in public schools.
Reference:
Bush, H., Macera, C., Shaffer, R., and Shaffer, P. (2020). Biostatistics and Epidemiology.
Taguig: Cengage Learning Asia Pte Ltd.

LABORATORY 1
INTRODUCTION TO BIOSTATISTICS AND EPIDEMIOLOGY
I. BACKGROUND
Epidemiology and biostatistics are the basic sciences of public health. Public health
investigations use quantitative methods, which combine the two disciplines of epidemiology and
biostatistics. Epidemiology is about the understanding of disease development and the methods
used to uncover the etiology, progression, and treatment of the disease. Information (data) is
collected to investigate a question. The methods and tools of biostatistics are used to analyze the
data to aid decision making. (Johns Hopkins Bloomberg School of Public Health)
II. OBJECTIVES
At the end of the lesson you are expected to:
1. be able to identify the basic concepts and principles of biostatistics & epidemiology.
2. trace the historical development of Biostatistics & Epidemiology
III. MATERIALS
Coupon bond
Writing & Drawing materials
References
IV. ACTIVITY
1. Create a timeline on the historical development of Biostatistics and Epidemiology

a) International level
b) National level
2. Define the following terminologies:
a) Statistics
b) Biostatistics
c) Epidemiology
d) Mathematical Statistics
e) Applied Statistics
f) sample
g) population
h) Qualitative variables
i) Quantitative variables
- nominal - ordinal
- interval - ratio
3. Define observational study. List and describe the types of observational studies

V. QUESTIONS/SUPPLEMENTAL ACTIVITIES/EXERCISES
* Based on the research activities conducted, answer the following exercises.
A. Identify the type of data (nominal, ordinal, interval and ratio) represented by each of the
following.
1. Blood group
2. Temperature (Celsius)
3. Ethnic group
4. Job satisfaction index (1-5)
5. Number of heart attacks
6. Calendar year
7. Serum uric acid (mg/100ml)
8. Number of accidents in 3 - year period
9. Number of cases of each reportable disease reported by a health
worker
10. The average weight gain of six 1-year-old dogs (with a special diet supplement)
was 950 grams last month.
B. Refer to the journal attached to answer the following questions

1. Identify the population of the study (encircle it with black ink)
2. Identify the sample of the study (encircle it with red ink)
3. Identify the different statistical methods used in the study (highlight)
4. Identify the qualitative variables in the study (encircle with blue ink)
5. Identify the type of observational study utilized in the journal (box it with red ink)

LESSON 2
GRAPHICAL SUMMARIES
Desired Learning Outcomes:

1. Recall frequency distribution and normal distribution
2. Accurately interpret graphs and charts
3. Give meaning to p-values
Introduction:
Each type of variable has its own special properties, and the distribution of each type of
variable has a particular shape and characteristics. The distribution of a variable consists of a
summary of the possible values the variable can have and the number of subjects with each of
these values. A distribution that uses counts to describe the number of subjects with a particular
value is called a frequency distribution. A distribution that uses proportions to describe the
number of the subjects with a particular value is called a probability distribution.
Lesson Proper:
Two types of graphs are used to summarize categorical variables: pie charts and bar graphs.
Pie charts can be presented using frequencies or proportions. A pie chart describes how the
pieces relate to the whole. So the counts for all the slices (categories) must add up to the total
number of subjects, and/or the proportions for all the slices (categories) must add up to 1 (or
100% if percentages are used). Pie charts are generally used when trying to demonstrate how
the categories within a variable relate to each other. Bar graphs are used to describe the
distributions of categorical variables. The height of the bar indicates how many subjects are in each
value or category. The height of the bar can represent either the number of subjects (frequency)
or the proportion (probability) of subjects in a particular category. Because the shape of the graph
does not change when using the count or proportion of participants, many times the proportion of
participants is preferred because it takes the total number of subjects into account. Because the
heights of the bars can easily be compared, these graphs are particularly useful when the
research question involves comparisons.
Binomial distributions are used when the data include a variable with two options. These
variables are dichotomous and each subject has only two possibilities: have the characteristic or
do not have the characteristic. Often these two options are measured by a variable where subjects
are assigned a 0 for not having the characteristic and 1 for having it. Since there is a gap between
the two possible values, binomial variables are said to be discrete. The mean of the binomial
distribution is the expected number of subjects with the characteristic, that is, the proportion
multiplied by the sample size. The variance is a little more complicated. For the number of subjects
with the characteristic, it is the sample size multiplied by the proportion with the characteristic and
by the proportion without it. For the sample proportion, it is the proportion with the characteristic
multiplied by the proportion without it, all divided by the sample size.
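The mean and variance of a binomial variable can be checked numerically. The sketch below assumes a hypothetical sample size of 50 and a proportion of 0.2. It distinguishes the variance of the count, n × p × (1 − p), from the variance of the sample proportion, p × (1 − p) / n, and verifies the first against the binomial probabilities themselves.

```python
from math import comb

n = 50     # sample size (hypothetical)
p = 0.2    # proportion with the characteristic (hypothetical)

mean_count = n * p                 # expected number with the characteristic
var_count = n * p * (1 - p)        # variance of that number
var_proportion = p * (1 - p) / n   # variance of the sample proportion

# Check against the binomial probabilities P(X = k) for k = 0..n.
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
mean_from_pmf = sum(k * pk for k, pk in enumerate(pmf))
var_from_pmf = sum((k - mean_from_pmf) ** 2 * pk for k, pk in enumerate(pmf))

print(f"mean of count:          {mean_count:.2f} (from probabilities: {mean_from_pmf:.2f})")
print(f"variance of count:      {var_count:.2f} (from probabilities: {var_from_pmf:.2f})")
print(f"variance of proportion: {var_proportion:.4f}")
```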
Histograms best describe the distribution of a continuous variable. A histogram is a graphical
representation of a variable in which the observed values are categorized, a bar is drawn for each
category, and the number of participants in each category is represented by the height of the bar.
It provides a quick picture of the distribution of a variable and it can be presented with counts or
proportions of participants. There are no gaps between the bars of the histogram which
demonstrates that all in-between values are possible. The chart must have enough bars to
present a good picture of the distribution results. Too few or too many categories (bars) hide
interesting trends in the data. Histograms provide information about how spread out the
responses are, which responses are common, which responses are in the center, and the overall
shape of the distribution.

Symmetric distributions can be folded in half so that each half is close to a mirror image of
the other. The mode of the distribution represents the most common response. A unimodal
distribution has one mode or one most common value. It is represented by the peak in the
histogram. A distribution with two peaks is called bimodal.
When the histogram is bell-shaped, unimodal, and symmetric, with the mean, median, and
most common value at the center at the peak, the data come from a normal distribution.
Because there are no gaps between possible values, normally distributed values are continuous
variables. Normally distributed variables are characterized by two values: the mean and the
standard deviation (or variance). Therefore, normally distributed variables have the same overall
shape but may have different centers and/or different spreads, depending on the mean and the
standard deviation. In the normal distribution, knowing the mean indicates nothing about the
spread, and knowing the standard deviation indicates nothing about the center.
Using the standard deviation as a unit of distance, about 68% of all subjects are within one standard
deviation of the mean, 95.4% are within two standard deviations of the mean and 99.7% are within
three standard deviations of the mean. This rule, known as the empirical rule, can be applied to any
normal distribution. The empirical rule can be used to determine if observations are common or
extreme. Observations in the middle of the distribution are considered common and those more
than three standard deviations from the mean are considered to be very rare.
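The percentages in the empirical rule can be reproduced with the NormalDist class in Python's standard statistics module. This is a small check only, using the standard normal distribution (mean 0, standard deviation 1); the same shares hold for any mean and standard deviation.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Share of the distribution within k standard deviations of the mean.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, share in within.items():
    print(f"within {k} standard deviation(s) of the mean: {share:.4f}")
```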
A distribution is left skewed when it has a tail that extends longer to
the left, that is, there is a set of observations with lower values than those of the majority of the
observed responses. A distribution is right skewed when it has a tail that extends
longer to the right, that is, there is a set of observations with higher values than those of the
majority of the observed responses. Unlike these, the normal distribution is symmetric.
A Poisson distribution is a discrete probability distribution whose possible values are whole
numbers from 0 to infinity. It is not necessarily symmetric. When the mean is small, the distribution
is right skewed. As the mean increases, the distribution becomes more symmetric. One of
the assumptions of this distribution is that the mean and the variance are the same. An advantage
of this distribution is that it approximates the binomial distribution when the sample size is large
and the proportion with the characteristic is very small.
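The properties of the Poisson distribution described above can be checked numerically. The sketch below is illustrative only: the function names and the chosen means (1, 4, and 16) are assumptions, and the sums are cut off at 100, which is accurate only for means well below that cutoff.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of observing exactly k events when the mean is `mean`."""
    return exp(-mean) * mean**k / factorial(k)


def summarize(mean, k_max=100):
    """Return (mean, variance, skewness) computed from the probabilities.

    The sum stops at k_max, so this is a close approximation only when
    the mean is far below k_max.
    """
    probs = [poisson_pmf(k, mean) for k in range(k_max + 1)]
    m = sum(k * pk for k, pk in enumerate(probs))
    v = sum((k - m) ** 2 * pk for k, pk in enumerate(probs))
    skew = sum((k - m) ** 3 * pk for k, pk in enumerate(probs)) / v**1.5
    return m, v, skew


for mean in (1.0, 4.0, 16.0):
    m, v, skew = summarize(mean)
    print(f"mean={m:.2f}  variance={v:.2f}  skewness={skew:.2f}")
```

In each case the computed variance equals the mean, and the skewness shrinks toward 0 as the mean grows, which is what "becomes more symmetric" means in practice.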
A percentile is the percentage of all the observations that are less than the value of interest.
Percentiles are used to determine whether a particular value is common or rare. They are easily
obtained using statistical tables or online calculators.
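Besides tables and online calculators, a percentile can be computed directly. The sketch below shows two hedged illustrations: one from a small hypothetical data set, and one from an assumed normal distribution with mean 100 and standard deviation 15. The function name percentile_rank and all values are made up for the example.

```python
from statistics import NormalDist


def percentile_rank(value, data):
    """Percentage of observations in `data` that are less than `value`."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)


# Hypothetical scores of ten subjects.
scores = [52, 55, 60, 61, 63, 65, 68, 70, 74, 90]
rank_70 = percentile_rank(70, scores)

# For a variable assumed to be normally distributed, the percentile
# comes from the cumulative distribution function.
normal_pct = 100 * NormalDist(mu=100, sigma=15).cdf(130)

print(f"A score of 70 is at percentile {rank_70:.0f} of the sample")
print(f"A value of 130 is at about percentile {normal_pct:.1f} of the assumed normal distribution")
```

A value of 130 lies two standard deviations above the assumed mean, so it falls at roughly percentile 97.7 and would be considered uncommon.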
Variability in measurements occurs when multiple measurements are taken on the same subject.
If there is little measurement variability, the measurement has reliability. Variability also exists
between subjects. In any study, the subjects are generally not the same. Subject-to-subject
variability is one of the primary reasons numerical summaries are necessary. The idea that
samples may be different is called sampling variability. Because samples are different, the
numerical summaries from these samples are expected to be different. Therefore, it is important
not to overinterpret the results obtained from a single sample. Although it may seem troubling
that different samples could result in different summaries, the study of statistics, in particular
statistical inference, provides some reassurance. The values of the statistic and the number of
times each value occurs across all possible samples are known as the distribution of samples or
the sampling distribution. It provides a description of all possible statistics obtained from
samples.
The central limit theorem is the characterization of all sample means. According to this
theorem, the distribution of the means obtained from all possible samples will result in a normally
shaped distribution, in which the center of the distribution is the true parameter and one standard
deviation of the sampling distribution is the standard error of the mean. This theorem holds true
for large sample sizes.
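The theorem can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is a hypothetical right-skewed (exponential) one with mean 10, and the sample size, number of samples, and random seed are arbitrary choices made for the illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical right-skewed population: exponential with mean 10
population_mean = 10.0
n = 50              # size of each sample
num_samples = 2000  # number of samples drawn

sample_means = [
    statistics.fmean(random.expovariate(1 / population_mean) for _ in range(n))
    for _ in range(num_samples)
]

center = statistics.fmean(sample_means)          # should be near the true mean
standard_error = statistics.stdev(sample_means)  # spread of the sample means
theoretical_se = population_mean / n ** 0.5      # sd of an exponential equals its mean

print(f"center of sample means: {center:.2f}")
print(f"observed standard error: {standard_error:.2f}")
print(f"sigma / sqrt(n): {theoretical_se:.2f}")
```

Even though the population itself is skewed, the sample means cluster around the true mean, and their standard deviation is close to the standard error of the mean.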
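Estimation (Confidence Intervals)

Common confidence levels are 90%, 95%, and 99%. For a normally distributed sampling
distribution:
90% of all statistics are within 1.645 standard errors of the true parameter
95% of all statistics are within 1.96 standard errors of the true parameter
99% of all statistics are within 2.576 standard errors of the true parameter
These confidence intervals can be calculated as:
90% confidence interval: point estimate ± (1.645 x standard error)
95% confidence interval: point estimate ± (1.96 x standard error)
99% confidence interval: point estimate ± (2.576 x standard error)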
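The calculation can be sketched in a few lines. The point estimate (120) and standard error (2) below are hypothetical values used only to show the arithmetic; the multipliers come from the standard normal distribution (about 1.645, 1.960, and 2.576 for the 90%, 95%, and 99% levels).

```python
from statistics import NormalDist

def confidence_interval(point_estimate, standard_error, level=0.95):
    """Normal-based confidence interval: estimate ± z * standard error."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # two-sided multiplier
    margin = z * standard_error
    return point_estimate - margin, point_estimate + margin

# Hypothetical estimate: sample mean 120 with standard error 2
for level in (0.90, 0.95, 0.99):
    low, high = confidence_interval(120, 2, level)
    print(f"{level:.0%} CI: {low:.2f} to {high:.2f}")
```

Note that a higher confidence level gives a wider interval for the same point estimate and standard error.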
Hypothesis Testing
A statement claiming that the null parameter is the true parameter is called the null
hypothesis. An alternative or research hypothesis is a hypothesis that states the true parameter
is not (or is less than or is greater than) the null parameter. To reject the null parameter as the
true parameter, the observed statistic needs to be far enough away from the center so that the
observed statistic clearly came from a sampling distribution that was not centered at the null
parameter. On the other hand, when the observed statistic is very close to the null parameter,
claiming that this statistic would not be expected from a sampling distribution centered at the null
parameter would be difficult.
The proportion of statistics that are at least as far from the null parameter as the observed
statistic is called the p-value. When the p-value is small, the observed statistic is rare and
provides evidence against the null hypothesis. When the p-value is large, the observed statistic
is common and does not provide sufficient evidence against the null hypothesis. Generally,
p-values that are smaller than 0.05 are considered small enough to reject the null hypothesis. When
studies are observational or exploratory, researchers may consider p-values smaller than 0.10 as
small enough to reject the null hypothesis. In contrast, when the researcher wants to be really
sure that the null parameter is not the true parameter, 0.01 or even 0.001 is defined as small
enough to reject the null hypothesis.
Type 1 Error
This occurs when a statistic provides evidence against the null parameter, but the null
parameter is really the true parameter. This happens when the null hypothesis is rejected even
though it is really true. If the cutoff for defining a small enough p-value to reject the null is set at
0.05 or 5%, then 5% of the possible statistics in the sampling distribution are far enough away to
reject the null parameter even though it is really the true parameter. Therefore, there is a 5%
chance that a type 1 error will be made because these are rare statistics from a sampling
distribution centered around the null parameter. The rule for when to reject the null parameter
as the true parameter is often written as p-value < 𝛼, where 𝛼 is the significance level, for
example, p-value < 0.05.
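The 5% chance of a type 1 error can be seen in a simulation. The sketch below assumes a hypothetical population in which the null parameter really is the true mean (mean 100, known standard deviation 15) and uses a z-test so that only the Python standard library is needed; none of these values come from the module.

```python
import random
from statistics import NormalDist, fmean

random.seed(7)  # fixed seed so the sketch is repeatable
alpha = 0.05
null_mean, sigma, n = 100.0, 15.0, 25  # hypothetical values
trials = 4000

def two_sided_p_value(sample_mean):
    """p-value of a z-test when the population standard deviation is known."""
    z = (sample_mean - null_mean) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

rejections = 0
for _ in range(trials):
    # Every sample is drawn from a population where the null hypothesis is true
    sample = [random.gauss(null_mean, sigma) for _ in range(n)]
    if two_sided_p_value(fmean(sample)) < alpha:
        rejections += 1  # rejecting a true null hypothesis: a type 1 error

type1_rate = rejections / trials
print(f"proportion of type 1 errors: {type1_rate:.3f}")
```

The proportion of rejections should land close to the chosen cutoff of 0.05.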
Type 2 Error
This happens if the research hypothesis is true and the null hypothesis is not rejected. When
the significance level is set to a very small value, then it is more likely that a type 2 error will be
made. Thus, as the chance of a type 1 error decreases, the chance of a type 2 error increases. A
type 2 error occurs when the null parameter is not the true parameter, but the null hypothesis is
not rejected. The probability of a type 2 error is presented less often than the significance level.
Power
The goal of any hypothesis test is to have a sample with good power, that is, a sample with a
good chance of supplying enough evidence so that an incorrect null parameter can be rejected
as the true parameter. The probability that the null hypothesis will be rejected when it is indeed
false is called power. Power is the complement of the probability of a type 2 error (𝛽) and can
be written as 1 − 𝛽. A study with good power has power of 80-90%. If a study has 90% power, this
means that there is a 90% chance that a false null parameter will be rejected as the true parameter.
Power, type 1 error, and type 2 error all work together. The probabilities of type 1 and type 2
errors are inversely related. Because power is the complement of the probability of a type 2 error,
these two are also inversely related. Hence, when the probability of a type 1 error increases, the
power also increases.
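The relationship between the significance level, the type 2 error, and power can be sketched numerically. The example assumes a hypothetical upper-tailed z-test (null mean 100, true mean 105, standard deviation 15, n = 50); these numbers and the function name are illustrative only.

```python
from statistics import NormalDist

def power_one_sided_z(true_mean, null_mean, sigma, n, alpha):
    """Power of an upper-tailed z-test (reject when the sample mean is large)."""
    se = sigma / n ** 0.5
    critical_mean = null_mean + NormalDist().inv_cdf(1 - alpha) * se
    # Probability that the sample mean exceeds the cutoff when true_mean is correct
    return 1 - NormalDist(true_mean, se).cdf(critical_mean)

# Hypothetical study: null mean 100, true mean 105, sigma 15, n = 50
for alpha in (0.01, 0.05, 0.10):
    power = power_one_sided_z(105, 100, 15, 50, alpha)
    print(f"alpha = {alpha:.2f}  power = {power:.3f}  beta = {1 - power:.3f}")
```

As the significance level grows from 0.01 to 0.10, the power rises and the probability of a type 2 error falls by the same amount.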
Synthesis:
Many research questions remain to be answered in public health. Statistical methods can be
utilized in the development of the question, deciding whom or what to study, determining what to
measure, summarizing the measurements, and testing hypotheses. After the research question
has been posed, the sample selected, and the measurements made, statistical methods provide
a strategy for moving from data points to answers.
There are many statistical procedures for describing the data, estimating the
parameter and testing the hypothesis. Organizing the methods according to the research question
simplifies choosing the appropriate statistical method.
To help you understand the topic better, visit the following links:
https://www.youtube.com/watch?v=U6fBc_SPVoM
https://www.youtube.com/watch?v=s6y3ykvDols
Assessment:
Offline Learners: Answer the activity below and send it to your instructor via email.
Online Learners: Answer the quiz posted on Google Classroom
Answer the following in 5-7 sentences. Be guided by the rubric below: (20 points)
Content – 7 points Coherence – 3 points
1. Suppose you were conducting a study to investigate the effects of an exercise program
on a participant’s weight. The participants will be coming in weekly to be weighed.
Describe how reliability might be a factor in this study.
2. You collect information on a random sample of women and obtain estimates on vitamin D
exposure. However, the results that you find are different from those obtained at another
study site. Should you be concerned? Explain your answer using the concept of sampling
variability.
References:
Beligan, S. (2016). Statistics for the Social Sciences. La Trinidad: Benguet State University. ISBN
971-0330-05-5

LABORATORY 2
QUARTILES, DECILES, AND PERCENTILES
I. BACKGROUND
Quartiles divide the set of observations into four equal parts. They are named as quartile 1
(Q1), quartile 2 (Q2), and quartile 3 (Q3). Q2 is the median and Q1 and Q3 are lower and upper
quartiles.
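Formula for ungrouped data (each result is a position in the ordered data; when N is even, average
the observations at the two positions inside the brackets):

If N is odd: $Q_i = \dfrac{i(N+1)}{4}$
If N is even: $Q_i = \dfrac{1}{2}\left[\dfrac{iN}{4} + \left(\dfrac{iN}{4} + 1\right)\right]$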
Formula for grouped data:

$Q_i = L + \left(\dfrac{\frac{iN}{4} - F}{f}\right) C$
Where:
L – lower limit of the quartile class
N – total frequency
F – cumulative frequency of the class just above the quartile class
f - frequency of the quartile class
C – class interval of the quartile class
**The quartile class is the first class for which the cumulative frequency is ≥ $\dfrac{iN}{4}$
Deciles divide the set of observations into 10 equal parts. In a set, there are 9 deciles.
Formula for ungrouped data:

If N is odd: $D_i = \dfrac{i(N+1)}{10}$
If N is even: $D_i = \dfrac{1}{2}\left[\dfrac{iN}{10} + \left(\dfrac{iN}{10} + 1\right)\right]$
Formula for grouped data:

$D_i = L + \left(\dfrac{\frac{iN}{10} - F}{f}\right) C$
Where:
L – lower limit of the decile class
F – cumulative frequency for the class just above the decile class
f – frequency of the decile class
C – class interval of the decile class
**The decile class is the first class for which the cumulative frequency is ≥ $\dfrac{iN}{10}$
Percentiles divide the set of observations into 100 equal parts and there are 99 percentiles.
Formula for ungrouped data:

If N is odd: $P_i = \dfrac{i(N+1)}{100}$
If N is even: $P_i = \dfrac{1}{2}\left[\dfrac{iN}{100} + \left(\dfrac{iN}{100} + 1\right)\right]$
Formula for grouped data:

$P_i = L + \left(\dfrac{\frac{iN}{100} - F}{f}\right) C$
Where:
L – lower limit of the percentile class
F – cumulative frequency for the class just above the percentile class
f – frequency of the percentile class
C – class interval of the percentile class
**The percentile class is the first class for which the cumulative frequency is ≥ $\dfrac{iN}{100}$
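The three grouped-data formulas differ only in the divisor (4, 10, or 100), so they can be sketched with one function. The frequency table below is hypothetical (it is not the activity data), and the function and variable names are chosen for illustration; each class is described by its lower limit L, class interval C, and frequency f.

```python
def grouped_quantile(classes, i, parts):
    """i-th quantile of grouped data when the data are cut into `parts` parts.

    classes: list of (lower_limit, class_interval, frequency), in ascending order.
    parts: 4 for quartiles, 10 for deciles, 100 for percentiles.
    """
    total = sum(freq for _, _, freq in classes)   # N
    target = i * total / parts                    # iN/4, iN/10 or iN/100
    cumulative_before = 0                         # F: cumulative frequency before the class
    for lower, interval, freq in classes:
        if cumulative_before + freq >= target:    # first class reaching the target
            return lower + (target - cumulative_before) / freq * interval
        cumulative_before += freq
    raise ValueError("i must be between 1 and parts")

# Hypothetical frequency table: (lower limit, class interval, frequency); N = 30
table = [(10, 10, 4), (20, 10, 6), (30, 10, 10), (40, 10, 8), (50, 10, 2)]

q1 = grouped_quantile(table, 1, 4)       # target 7.5 falls in the class starting at 20
d5 = grouped_quantile(table, 5, 10)      # target 15 falls in the class starting at 30
p90 = grouped_quantile(table, 90, 100)   # target 27 falls in the class starting at 40
print(q1, d5, p90)
```

For this table, Q1 = 20 + (7.5 − 4)/6 × 10 ≈ 25.83, D5 = 30 + (15 − 10)/10 × 10 = 35, and P90 = 40 + (27 − 20)/8 × 10 = 48.75.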
To better understand this, watch the videos by clicking on the links below:
https://www.youtube.com/watch?v=40o82o3uNfk
https://www.youtube.com/watch?v=uYIl2M9YwHE
https://www.youtube.com/watch?v=XiJV6Lm1En0
II. OBJECTIVES
1. Accurately solve for the percentile, decile and quartile of grouped and ungrouped data
2. Understand the use of percentile, decile, and quartile in biostatistics
III. MATERIALS
Coupon bond
Calculator
Writing materials
IV. ACTIVITY
Use a separate sheet of bond paper to write the solution of the following data.
1. Find the quartiles, 3rd and 7th deciles, and 50th percentile of the ungrouped data below.
Height (inches) 58 59 60 61 62 63 64 65 66
# of students 15 20 32 35 33 22 20 10 8
2. Find the quartiles, 4th and 6th deciles, and 30th percentile of the grouped data below.
Marks Frequency Cumulative frequency
0-10 3
11-21 4
22-32 6
33-43 8
44-55 4
1. Why is it necessary to solve for percentile, decile, and quartile in a given set of data?

2. What is the application of percentile, decile, and quartile in biostatistics and
epidemiology?
LESSON 3
MEASURES OF LOCATION AND SPREAD

1. Understand and recall mean, median, mode, range, variance, standard deviation, and
coefficient of variation
2. Explain the use of measures of location and spread in interpreting the given data
3. Solve for the mean, median, mode, range, variance, standard deviation and coefficient
of variation
Introduction:
A measure of central tendency (position or location) is a single value about which the set of
observations tends to cluster. Measures of position provide precise, objectively determined
values that can easily be manipulated, interpreted, and compared with one another. In short, they
permit a more careful analysis of the data than do the general impressions conveyed by tabular
and graphical summaries. Some popular and commonly used measures of position are the mean,
median, and mode.
Lesson Proper
MEASURES OF LOCATION
Mean
One measure of location for a sample is the arithmetic mean (colloquially called the average).
The arithmetic mean (or mean or sample mean) is usually denoted by 𝑥̅ . The arithmetic mean is
the sum of all the observations divided by the number of observations. It is written in statistical
terms as:
$\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$
The arithmetic mean is, in general, a very natural measure of location. One of its main
limitations, however, is that it is oversensitive to extreme values. In this instance, it may not be
representative of the location of the great majority of sample points.
Median
An alternative measure of location, perhaps second in popularity to the arithmetic mean, is
the median or, more precisely, the sample median. Suppose there are n observations in a sample.
If these observations are ordered from smallest to largest, then the median is defined as follows:
• The $\left(\dfrac{n+1}{2}\right)$th largest observation if n is odd
• The average of the $\left(\dfrac{n}{2}\right)$th and $\left(\dfrac{n}{2} + 1\right)$th largest observations if n is even
The rationale for these definitions is to ensure an equal number of sample points on both sides
of the sample median. The median is defined differently when n is even and odd because it is
impossible to achieve this goal with one uniform definition. Samples with an odd sample size have
a unique central point; for example, for samples of size 7, the fourth largest point is the central
point in the sense that 3 points are smaller than it and 3 points are larger. Samples with an even
sample size have no unique central point, and the middle two values must be averaged. Thus, for
samples of size 8 the fourth and fifth largest points would be averaged to obtain the median,
because neither is the central point. The main strength of the sample median is that it is insensitive
to very large or very small values. The main weakness of the sample median is that it is
determined mainly by the middle points in a sample and is less sensitive to the actual numeric
values of the remaining data points.
Mode
The mode is the most frequently occurring value among all the observations in a sample. In
certain cases, mode can be an extremely helpful measure of central tendency. One of its biggest
advantages is that it can be applied to any type of data. It is also not affected by extreme values
in datasets with quantitative data. Thus, it can provide insight into almost any dataset regardless
of the data distribution. However, the mode cannot be further treated mathematically and cannot
be used for more detailed analysis. It is also not based on all values in the dataset; therefore,
it is difficult to draw conclusions regarding the dataset relying on the mode only.
Example 1:
Determine the mean, median, and mode of the following data:
a) Solving for the mean

$\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i = \dfrac{1}{20}(3265 + 3260 + 3245 + 3484 + 4146 + 3323 + 3649 + 3200 + 3031 + 2069 + 2581 + 2841 + 3609 + 2838 + 3541 + 2759 + 3248 + 3314 + 3101 + 2834)$
$\bar{x} = 3166.9$
b) To determine the median, arrange the values in ascending order, then determine the
middle value
2069, 2581, 2759, 2834, 2838, 2841, 3031, 3101, 3200, 3245, 3248, 3260, 3265, 3314,
3323, 3484, 3541, 3609, 3649, 4146
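$\tilde{x} = \dfrac{1}{2}\left[\left(\dfrac{n}{2}\right)\text{th value} + \left(\dfrac{n}{2} + 1\right)\text{th value}\right] = \dfrac{1}{2}(3245 + 3248) = 3246.5$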
c) There is no mode in the given sample since there is no repeated term.
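The results of Example 1 can be verified with Python's standard `statistics` module. This is only a checking aid, not part of the original solution; the variable names are illustrative.

```python
import statistics

# The 20 values used in Example 1
data = [3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069,
        2581, 2841, 3609, 2838, 3541, 2759, 3248, 3314, 3101, 2834]

mean = statistics.mean(data)        # sum of the values divided by n
median = statistics.median(data)    # average of the 10th and 11th ordered values
modes = statistics.multimode(data)  # every value tied for the highest count

# With no repeated value, multimode returns all 20 values, so there is no mode
has_mode = len(modes) < len(set(data))

print(mean, median, has_mode)
```

The mean and median agree with the hand computation (3166.9 and 3246.5), and `has_mode` is `False` because no value repeats.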
MEASURES OF SPREAD

Range
The range is the difference between the largest and smallest observations in a sample.
Quantiles
Another approach that addresses some of the shortcomings of the range in quantifying the
spread in a data set is the use of quantiles or percentiles. Intuitively, the pth percentile is the value
Vp such that p percent of the sample points are less than or equal to Vp. The median, being the
50th percentile, is a special case of a quantile. As was the case for the median, a different
definition is needed for the pth percentile, depending on whether or not np/100 is an integer.
The pth percentile is defined by

(1) The (k + 1)th largest sample point if np/100 is not an integer (where k is the largest integer
less than np/100).
(2) The average of the (np/100)th and (np/100 + 1)th largest observations if np/100 is an
integer.
Percentiles are also sometimes called quantiles.
The spread of a distribution can be characterized by specifying several percentiles. For
example, the 10th and 90th percentiles are often used to characterize spread. Percentiles have
the advantage over the range of being less sensitive to outliers and of not being greatly affected
by the sample size (n). To compute percentiles, the sample points must be ordered.
Example 2:
Solve for the range, 10th and 90th percentile of example 1.
a) Range
𝑅 = 𝐻𝑉 − 𝐿𝑉 = 4146 − 2069 = 2077
b) 10th percentile
Multiply n by the decimal equivalent of the 10th percentile
20 x 0.1 = 2
10th percentile = average of 2nd and 3rd values
10th percentile = (2581 + 2759)/2 = 2670 g
c) 90th percentile
20 x 0.9 = 18
90th percentile = average of 18th and 19th values
90th percentile = (3609 + 3649)/2 = 3629 g
We would estimate that 80% of birthweights will fall between 2670 g and 3629 g, which gives
an overall impression of the spread of the distribution.
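The two-case definition of the pth percentile can be written as a short function and checked against Example 2. This is a sketch that follows the definition in the text; other software may use a different interpolation rule and give slightly different percentiles. The function name is illustrative.

```python
def percentile(values, p):
    """p-th percentile (0 < p < 100) using the two-case definition in the text."""
    ordered = sorted(values)
    position = len(ordered) * p / 100        # np/100
    if position == int(position):            # integer: average two neighbours
        k = int(position)
        return (ordered[k - 1] + ordered[k]) / 2
    return ordered[int(position)]            # otherwise the (k + 1)th value

# The 20 values from Example 1
data = [3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069,
        2581, 2841, 3609, 2838, 3541, 2759, 3248, 3314, 3101, 2834]

data_range = max(data) - min(data)
p10 = percentile(data, 10)
p90 = percentile(data, 90)
print(data_range, p10, p90)
```

Because Python lists are indexed from 0, the kth ordered value is `ordered[k - 1]`. The output matches the range (2077) and the 10th and 90th percentiles (2670 and 3629) found by hand.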
Variance
The variance combines all the values in a data set to produce a measure of spread. It tells how
spread out the data are. It is defined as:
$s^2 = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$
Standard Deviation
The standard deviation measures spread around the mean. Because of its close links with the
mean, standard deviation can be greatly affected if the mean gives a poor measure of central
tendency.

Standard deviation is also influenced by outliers: a single value could contribute largely to the
results of the standard deviation. In that sense, the standard deviation is a good indicator of the
presence of outliers. This makes standard deviation a very useful measure of spread for
symmetrical distributions with no outliers.
Standard deviation is also useful when comparing the spread of two separate data sets that
have approximately the same mean. The data set with the smaller standard deviation has a
narrower spread of measurements around the mean and therefore usually has comparatively
fewer high or low values. An item selected at random from a data set whose standard deviation
is low has a better chance of being close to the mean than an item from a data set whose standard
deviation is higher. The standard deviation is defined as:
$s = \sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}} = \sqrt{s^2}$
Example 3: Solve for the variance and the standard deviation of Example 1
It would be easier to solve it if the values are placed on a table
$x_i$    $x_i - \bar{x}$    $(x_i - \bar{x})^2$
2069 -1097.9 1205384
2581 -585.9 343278.8
2759 -407.9 166382.4
2834 -332.9 110822.4
2838 -328.9 108175.2
2841 -325.9 106210.8
3031 -135.9 18468.81
3101 -65.9 4342.81
3200 33.1 1095.61
3245 78.1 6099.61
3248 81.1 6577.21
3260 93.1 8667.61
3265 98.1 9623.61
3314 147.1 21638.41
3323 156.1 24367.21
3484 317.1 100552.4
3541 374.1 139950.8
3609 442.1 195452.4
3649 482.1 232420.4
4146 979.1 958636.8
Total 3768148
$s^2 = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} = \dfrac{3768148}{20-1} = 198323.58$
$s = \sqrt{s^2} = \sqrt{198323.58} = 445.34$
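The table-based computation can be reproduced step by step in code. This sketch mirrors the columns of the table (deviation, squared deviation, total) and then compares the result with `statistics.variance`, which also divides by n − 1. Small differences in the last decimal places come from rounding in the hand-built table.

```python
import statistics

# The 20 values from Example 1
data = [3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069,
        2581, 2841, 3609, 2838, 3541, 2759, 3248, 3314, 3101, 2834]

mean = statistics.fmean(data)
squared_deviations = [(x - mean) ** 2 for x in data]   # third column of the table
total = sum(squared_deviations)                        # about 3768148
variance = total / (len(data) - 1)                     # sample variance, n - 1 in the denominator
std_dev = variance ** 0.5

print(f"total = {total:.1f}, variance = {variance:.2f}, sd = {std_dev:.2f}")
```

The standard deviation rounds to 445.34, in agreement with Example 3.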

Coefficient of Variation
This value relates the mean to the standard deviation. It is defined by:
$CV = \dfrac{s}{\bar{x}} \times 100\%$
The CV is most useful in comparing the variability of several different samples, each with different
arithmetic means. This is because a higher variability is usually expected when the mean
increases, and the CV is a measure that accounts for this variability. The CV is also useful for
comparing the reproducibility of different variables.
The CV of example 1 is:

$CV = \dfrac{s}{\bar{x}} \times 100\% = \dfrac{445.34}{3166.9} \times 100\% = 14.06\%$
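A short sketch of the same calculation is given below. It also converts the values from grams to kilograms (an illustrative step that is not in the original example) to show that the CV does not depend on the unit of measurement.

```python
import statistics

def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean, times 100."""
    return statistics.stdev(values) / statistics.fmean(values) * 100

# The 20 values from Example 1, in grams
grams = [3265, 3260, 3245, 3484, 4146, 3323, 3649, 3200, 3031, 2069,
         2581, 2841, 3609, 2838, 3541, 2759, 3248, 3314, 3101, 2834]
kilograms = [g / 1000 for g in grams]   # same data in a different unit

cv_grams = coefficient_of_variation(grams)
cv_kilograms = coefficient_of_variation(kilograms)
print(f"CV (g) = {cv_grams:.2f}%, CV (kg) = {cv_kilograms:.2f}%")
```

Both values round to 14.06%, which is why the CV is convenient for comparing data sets measured in different units.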
Synthesis:
The mean reflects the magnitude of every observation, since every observation contributes to
the value of the mean. It is easily affected by the presence of extreme values, and hence not a
good measure of central tendency when extreme observations occur. Means of subgroups may
be combined when properly weighted. The combined mean is called the weighted arithmetic mean.
The median is a positional value and is not affected by the presence of extreme values. It is
not suitable for further computations and hence medians of subgroups cannot be combined in the
same manner as the mean.
The mode is determined by the frequency and not by the values of the observations. It cannot
be manipulated algebraically, but it can be defined for qualitative or quantitative variables.
The range is a quick but rough measure of dispersion. The larger the value of the range, the
more dispersed are the observations. It considers only the lowest and highest values in the
data set.
The variance is always non-negative. It is easy to manipulate for further mathematical
treatment. It makes use of all observations.
The standard deviation is always non-negative. It is easy to manipulate for further
mathematical treatment and makes use of all observations.
The coefficient of variation is a quantity without units. It can be used to compare the dispersion
of two or more sets of data measured in the same or different units.
To better understand this topic, watch the video link below:

https://www.youtube.com/watch?v=E4HAYd0QnRc
Assessment:
Offline Learners: Answer the activity below and submit it via email to your instructor.
Online Learners: Answer the quiz posted in Google Classroom

Given the data on the right, determine the mean,
median, mode, range, variance, standard deviation,
and coefficient of variation. Use the data on cholesterol
difference only. (40 points)
References:
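Beligan, S. (2016). Statistics for the Social Sciences. La Trinidad: Benguet State University. ISBN
971-0330-05-5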
Rosner, B. (2016). Fundamentals of Biostatistics 8th edition. USA: Cengage Learning.

LABORATORY 3
MEASURES OF DISPERSION AND VARIABILITY
I. BACKGROUND
The arithmetic mean or average of a set of n measurements is equal to the sum of the
measurements divided by n. The median of a set of n measurements is the value of x that falls in
the middle position when the measurements are ordered from smallest to largest. The mode is
the category that occurs most frequently, or the most frequently occurring value of x. When the
measurements on a continuous variable have been grouped as frequency or relative frequency
histogram, the class with the highest peak or frequency is called the modal class, and the midpoint
of that class is taken to be the mode.
Measures of variability can help you create a mental picture of the spread of the data. The
range of a set of n measurements is defined as the difference between the largest and smallest
measurements. The variance of a population of N measurements is the average of the squares
of the deviations of the measurements about their mean. The variance of a sample of n
measurements is the sum of the squared deviations of the measurements about their mean
divided by (n – 1). The standard deviation of a set of measurements is equal to the positive
square root of the variance.
II. OBJECTIVES
1. Apply the formulas for the mean, median, and mode in the medical field
2. Accurately solve for the mean, median, and mode of ungrouped and grouped data
3. Apply the different measures of variability to the medical field
4. Accurately solve for the range, variance, and standard deviation of a given data set
III. MATERIALS
Coupon bond
Writing materials
Calculator
IV. ACTIVITY
1. Given the data on the age of the population affected by SARS in a city, find the mean,
median, and mode: 30, 25, 7, 40, 15, 36, 27, 35, 48, 10, 20, 28, 33, 45, 10
2. You are given n = 8 measurements: 3, 1, 5, 6, 4, 4, 3, 5. Calculate the range, sample
mean, sample variance, and standard deviation.
3. An article in Archaeometry involved an analysis of 26 samples of Romano-British pottery
found at four different kiln sites in the United Kingdom. The samples were analyzed to
determine their chemical composition. The percentage of iron oxide in each of five
samples collected at the Island Thorns site was: 1.28, 2.39, 1.50, 1.88, 1.51. Calculate
the range, sample variance, and the standard deviation. Compare the range and the
standard deviation.
1. Why is it necessary to solve for the mean, median, and mode?
2. What is the practical significance of the range, variance, and standard deviation?
3. A report on a study by an MIT researcher indicates that later born children are more likely
to challenge the establishment, more open to new ideas, and more accepting of change.
In fact, the number of later born children is increasing. During the Depression years of the
1930s, families averaged 2.5 children (59% later born), whereas the parents of baby
boomers averaged 3 to 4 children (68% later born). What does the author mean by an
average of 2.5 children?
4. In a psychological experiment, the time on task was recorded for 10 subjects under a 5-minute
time constraint. These measurements are in seconds: 175, 200, 190, 185, 250,
190, 230, 225, 240, 265. Find the average time and the median time on task. If you were
writing a report to describe these data, which measure of central tendency would you use?
Explain.

LESSON 4
T-TEST

1. Learn how to state the null and alternative hypothesis
2. Test the hypothesis using t-test
3. Formulate a conclusion based on the results of the t-test
Introduction:
Most of the time we do not know the variance of a population; hence, to determine the variability
of the data, we simply compute its estimate, the sample variance. When we divide the difference
between the sample mean and the population mean by the standard error of the sample mean, the
resulting quantity follows the t-distribution. The t-test is a tool used for testing a population
mean when the variance is unknown and/or the sample size is small (n < 30).
Lesson Proper:
The One Sample t Test is commonly used to test the following:
• Statistical difference between a sample mean and a known or hypothesized value of the
mean in the population.
• Statistical difference between the sample mean and the sample midpoint of the test
variable.
• Statistical difference between the sample mean of the test variable and chance.
• This approach involves first calculating the chance level on the test variable. The
chance level is then used as the test value against which the sample mean of the
test variable is compared.
• Statistical difference between a change score and zero.
• This approach involves creating a change score from two variables, and then
comparing the mean change score to zero, which will indicate whether any
change occurred between the two time points for the original measures. If the
mean change score is not significantly different from zero, no significant change
occurred.
Note: The One Sample t Test can only compare a single sample mean to a specified constant. It
cannot compare sample means between two or more groups. If you wish to compare the means
of multiple groups to each other, you will likely want to run an Independent Samples t Test (to
compare the means of two groups) or a One-Way ANOVA (to compare the means of two or more
groups).
Your data must meet the following requirements:

• Test variable that is continuous (i.e., interval or ratio level)
• Scores on the test variable are independent (i.e., independence of observations)
• There is no relationship between scores on the test variable
• Violation of this assumption will yield an inaccurate p value
• Random sample of data from the population

• Normal distribution (approximately) of the sample and population on the test variable
• Non-normal population distributions, especially those that are thick-tailed or heavily
skewed, considerably reduce the power of the test
• Among moderate or large samples, a violation of normality may still yield accurate p
values
• Homogeneity of variances (i.e., variances approximately equal in both the sample and
population)
• No outliers
One Sample t-test

$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
Where:
$\bar{x}$ – sample mean
$\mu_0$ – proposed constant for the population mean
$s$ – sample standard deviation
$n$ – sample size
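For a one-tailed test:
If $t \ge t_{n-1,\alpha}$, then we reject $H_0$.
If $t < t_{n-1,\alpha}$, then we do not reject $H_0$.
(When the alternative hypothesis is in the lower tail, reject $H_0$ if $t \le -t_{n-1,\alpha}$ instead.)
For a two-tailed test:
If $|t| > t_{n-1,\alpha/2}$, then we reject $H_0$.
If $|t| < t_{n-1,\alpha/2}$, then we do not reject $H_0$.
Here $n - 1$ is called the degrees of freedom and $\alpha$ is the level of significance. The
value of $t_{n-1,\alpha}$ can be determined using the table below.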

Example 1:
Suppose we want to test the hypothesis that mothers with low socio-economic status (SES)
deliver babies whose birthweights are lower than “normal.” To test this hypothesis, a list is
obtained of birthweights from 100 consecutive, full-term, live-born deliveries from the maternity
ward of a hospital in a low-SES area. The mean birthweight (𝑥̅ ) is found to be 115 oz with a
sample standard deviation (s) of 24 oz. Suppose we know from nationwide surveys based on
millions of deliveries that the mean birthweight in the United States is 120 oz. Can we actually
say the underlying mean birthweight from this hospital is lower than the national average?
Assume that the sample size is 10 000.
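$H_0$: The mean birthweight is equal to the national average ($\mu = 120$ oz)
$H_a$: The mean birthweight is lower than the national average ($\mu < 120$ oz)
$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{115 - 120}{24/\sqrt{10000}} = -20.83$
$t_{9999,0.05} = 1.645$, so the critical value for this lower-tailed test is $-1.645$
Since $-20.83 < -1.645$, reject $H_0$
Conclusion:
The mean birthweight is lower than the national average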
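The computation in this example can be checked with a short sketch. It assumes the hypotheses $H_0: \mu = 120$ and $H_a: \mu < 120$ (a lower-tailed test) and, because the Python standard library has no t table, it uses the normal quantile as a stand-in for the critical value, which is a close approximation only because the degrees of freedom (9999) are very large. The function name is illustrative.

```python
from statistics import NormalDist

def one_sample_t(sample_mean, mu0, s, n):
    """t = (sample mean - mu0) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (s / n ** 0.5)

t = one_sample_t(115, 120, 24, 10_000)

# With 9999 degrees of freedom the t distribution is practically normal,
# so the normal quantile is used here in place of the t table.
alpha = 0.05
critical = NormalDist().inv_cdf(alpha)   # about -1.645 for a lower-tailed test
reject_null = t < critical

print(f"t = {t:.2f}, critical value = {critical:.3f}, reject H0: {reject_null}")
```

The computed t is about −20.83, far below −1.645, so the null hypothesis is rejected. With the 100 deliveries mentioned in the problem statement, the same function gives about −2.08.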
Synthesis:
The t-test is used to test hypotheses involving the mean of a study. The results tell us
whether there is a significant difference between means. Constructing the null and
alternative hypotheses is important because they are the basis for the mathematical test. If the
absolute value of the computed t-value is greater than the tabulated t-value, then the null
hypothesis is rejected. The one sample t-test is especially suited to samples of size less than
30 with unknown population variance; for larger samples, the t-distribution approaches the normal
distribution and the same procedure still applies.
To better understand this topic, watch the video by clicking on the link below:
https://www.youtube.com/watch?v=pTmLQvMM-1M
Assessment:
Offline Students: Do the activity below and submit it to your instructor via email.
Online Students: Answer the quiz posted in Google Classroom
Use the t-test to test the hypothesis in each of the following problems. (20 points)
1. The mean serum-creatinine level measured in 12 patients 24 hours after they received a
newly proposed antibiotic was 1.2 mg/dL. If the mean and standard deviation of serum
creatinine in the general population are 1.0 and 0.4 mg/dL, respectively, then, using a
significance level of .05, test whether the mean serum-creatinine level in this group is
different from that of the general population.
2. Plasma-glucose levels are used to determine the presence of diabetes. Suppose the
mean ln (plasma-glucose) concentration (mg/dL) in 35- to 44-year-olds is 4.86 with
standard deviation = 0.54. A study of 100 sedentary people in this age group is planned
to test whether they have a higher or lower level of plasma glucose than the general
population.
References:
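Beligan, S. (2016). Statistics for the Social Sciences. La Trinidad: Benguet State University. ISBN
971-0330-05-5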

LABORATORY 4
T-TEST
I. BACKGROUND
T-test for two population means
If the sample sizes are small (n < 30), the samples were taken at random from two populations,
and the level of measurement used is at least an interval scale, then the appropriate
test statistic is the t-test.
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
Where:
$\bar{x}_1$ – sample mean for the first group
$\bar{x}_2$ – sample mean for the second group
$n_1$, $n_2$ – sample sizes of the first and second groups
$s_p^2 = \dfrac{\sum(x_i - \bar{x}_1)^2 + \sum(x_j - \bar{x}_2)^2}{n_1 + n_2 - 2}$ – pooled variance
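The formula for two independent groups can be sketched as follows. The two small data sets are hypothetical (they are not the activity data) and the function name is illustrative. Note that this formula treats the groups as independent; measurements taken twice on the same subjects would call for a different (paired) analysis.

```python
from statistics import fmean

def pooled_t(group1, group2):
    """Two-sample t statistic with a pooled variance, and its degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    mean1, mean2 = fmean(group1), fmean(group2)
    ss1 = sum((x - mean1) ** 2 for x in group1)   # sum of squared deviations, group 1
    ss2 = sum((x - mean2) ** 2 for x in group2)   # sum of squared deviations, group 2
    pooled_variance = (ss1 + ss2) / (n1 + n2 - 2)
    t = (mean1 - mean2) / (pooled_variance * (1 / n1 + 1 / n2)) ** 0.5
    return t, n1 + n2 - 2

# Hypothetical measurements for two independent groups
treated = [5, 7, 9, 6, 8]
control = [4, 5, 6, 3, 7]
t, df = pooled_t(treated, control)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

Here the group means are 7 and 5, the pooled variance is 20/8 = 2.5, and t = 2/√(2.5 × 0.4) = 2.0 with 8 degrees of freedom. The computed t is then compared with the tabulated value.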
II. OBJECTIVES
1. Accurately solve for the t-value and compare it with the tabulated value
2. State the null and alternative hypothesis given a problem in biostatistics
3. Interpret the result of the t-value obtained
III. MATERIALS
Coupon bond
Calculator
Writing materials
IV. ACTIVITY
Formulate the null and alternative hypothesis, compute the t-value, and give a proper
interpretation of the result.
1. Sleep researchers decide to test the impact of REM sleep deprivation on a computerized
assembly line task. Subjects are required to participate in two nights of testing. On the nights
of testing EEG, EMG, and EOG measures are taken. On each night of testing, the subject is
allowed a total of four hours of sleep. However, on one of the nights, the subject is awakened
immediately upon achieving REM sleep. On the alternate night, subjects are randomly
awakened at various times throughout the 4 hour total sleep session. Testing conditions are
counterbalanced so that half of the subjects experience REM deprivation on the second night
of testing. Each subject after the sleep session is required to complete a computerized
assembly line task. The task involves five rows of widgets slowly passing across the computer
screen. Randomly placed on a one/five ratio are widgets missing a component that must be
“fixed” by the subject. Number of missed widgets is recorded. Compute the appropriate t-test
for the data provided below:

REM Deprived Control
condition
26 20
15 4
8 9
44 36
26 20
13 3
38 25
24 10
17 6
29 14
2. Researchers want to examine the effect of perceived control on health complaints of geriatric
patients in a long term care facility. Thirty patients are randomly selected to participate in the
study. Half are given a plant to care for and half are given a plant but the care is conducted
by the staff. The number of health complaints is recorded for each patient over the following
seven days.
Control over plant No control over plant
23 35
12 21
6 26
13 24
18 17
5 23
21 37
18 22
34 16
10 38
23 23
14 41
19 27
23 24
8 32
1. Use one sample t-test to solve for the following:
a. The weights of 11 one month old breastfed babies are found to be 4.64, 4.41, 4.60, 4.50,
3.50, 4.01, 3.99, 4.55, 4.62, 4.80, and 4.00 kg. Based on the standard weights, one
month old babies should weigh 4 kg. Does this indicate that breastfeeding is best for
babies? Use α = 0.05.
b. The reaction time of a coagulant is recorded as follows: 30.26s, 31.4s, 35.6s, 34.8s,
35.26s. The standard reaction time is 32.67 s. Is this sufficient evidence that the
coagulant is effective?

LESSON 5
REGRESSION

1. Understand the use of regression
2. Determine the best fit line of a set of data
Introduction:
In many studies, the concern is to determine the cause and effect relationship of two variables
taken from a bivariate distribution. One might be interested in determining the best statistical
relation among the variables or simply in knowing the degree of relationship among variables.
Problems such as these can be solved using regression techniques.
Lesson Proper:
Linear regression is a basic and commonly used type of predictive analysis. The overall idea
of regression is to examine two things: (1) does a set of predictor variables do a good job in
predicting an outcome (dependent) variable? (2) Which variables in particular are significant
predictors of the outcome variable, and in what way do they–indicated by the magnitude and sign
of the beta estimates–impact the outcome variable? These regression estimates are used to
explain the relationship between one dependent variable and one or more independent variables.
The simplest form of the regression equation with one dependent and one independent variable
is defined by the formula y = c + b*x, where y = estimated dependent variable score, c = constant,
b = regression coefficient, and x = score on the independent variable.
There are many names for a regression’s dependent variable. It may be called an outcome
variable, criterion variable, endogenous variable, or regressand. The independent variables can
be called exogenous variables, predictor variables, or regressors.
Three major uses for regression analysis are (1) determining the strength of predictors, (2)
forecasting an effect, and (3) trend forecasting.
First, the regression might be used to identify the strength of the effect that the independent
variable(s) have on a dependent variable. Typical questions are what is the strength of
relationship between dose and effect, sales and marketing spending, or age and income.
Second, it can be used to forecast effects or impact of changes. That is, the regression
analysis helps us to understand how much the dependent variable changes with a change in one
or more independent variables.
Types of Regression Models

Coefficient of Determination
The coefficient of determination is the proportion of the total variation in the dependent variable that
is explained by variation in the independent variable.
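In symbols, this definition can be sketched as follows. The notation is an assumption added for illustration and does not appear in the original text: $\hat{y}_i$ is the value predicted by the regression line for $x_i$, and $\bar{y}$ is the mean of the observed y values.

```latex
R^2 = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2}
    = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}
```

For a regression with a single independent variable, $R^2$ equals the square of the correlation coefficient r, so r = 0.53 corresponds to $R^2 \approx 0.28$: about 28% of the variation in y is explained by x.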
Regression line or best fit line
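Equation of a line:
$$y = bx + m$$
where b is the slope and m is the y-intercept:
$$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad m = \bar{y} - b\bar{x}$$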
Example: Find the best fit line of the given data.
Create a table to easily solve for the values

𝑖 𝑥𝑖 𝑦𝑖 𝑥𝑖 − 𝑥̅ 𝑦𝑖 − 𝑦̅ (𝑥𝑖 − 𝑥̅ )(𝑦𝑖 − 𝑦̅) (𝑥𝑖 − 𝑥̅ )2
1 7 25 -8.06 -3.38 27.24 64.96
2 9 25 -6.06 -3.38 20.48 36.72
3 9 25 -6.06 -3.38 20.48 36.72
4 12 27 -3.06 -1.38 4.22 9.36
5 14 27 -1.06 -1.38 1.46 1.12
6 16 27 0.94 -1.38 -1.30 0.88
7 16 24 0.94 -4.38 -4.12 0.88
8 14 30 -1.06 1.62 -1.72 1.12
9 16 30 0.94 1.62 1.52 0.88
10 16 31 0.94 2.62 2.46 0.88
11 17 30 1.94 1.62 3.14 3.76
12 19 31 3.94 2.62 10.32 15.52
13 21 30 5.94 1.62 9.62 35.28
14 24 28 8.94 -0.38 -3.40 79.92
15 15 32 -0.06 3.62 -0.22 0.00
16 16 32 0.94 3.62 3.40 0.88
Mean: 15.06 28.38    Sum: 93.62 288.94
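$$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{93.62}{288.94} = 0.32$$
$$m = \bar{y} - b\bar{x} = 28.38 - (0.32)(15.06) = 23.56$$

The best fit equation of the line, with slope b and y-intercept m, is:
$$y = bx + m$$
$$y = 0.32x + 23.56$$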
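As a check on the table above, the same sums can be computed in a few lines of Python. This is a minimal sketch and not part of the original module; the function name `fit_line` is illustrative. Carrying full precision gives an intercept of about 23.49, slightly below the 23.56 obtained when the slope is first rounded to 0.32.

```python
# Data from the worked example (16 pairs of x and y values).
xs = [7, 9, 9, 12, 14, 16, 16, 14, 16, 16, 17, 19, 21, 24, 15, 16]
ys = [25, 25, 25, 27, 27, 27, 24, 30, 30, 31, 30, 31, 30, 28, 32, 32]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sum of squared x deviations, as in the table.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line(xs, ys)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```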
Synthesis:
A line of best fit represents the data with the equation of a straight line so that values not
displayed on the plot can be predicted. The line is determined by the linear relationship between
the two variables on a scatter plot. Outliers (data points that are located far away from the rest
of the data) pull the line toward themselves, so even a few of them can noticeably shift the
fitted line.
To understand this lesson better, the video links below may help:
https://www.youtube.com/watch?v=WWqE7YHR4Jc
https://www.youtube.com/watch?v=ZkjP5RJLQF4
Assessment:
Online Learners: Answer the activity posted in Google Classroom.
The data in Table 11.17 are given for 9 patients with aplastic anemia. Fit a regression line relating
the percentage of reticulocytes (x) to the number of lymphocytes (y). (30 points)

References:
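Beligan, S. (2016). Statistics for the Social Sciences. La Trinidad: Benguet State University.
ISBN 971-0330-05-5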
Frost, J. (2019). Choosing the Correct Type of Regression Analysis. Retrieved from:
http://www.statisticsbyjim.com/regression/choosing-regression-analysis/

LABORATORY 5
REGRESSION
I. BACKGROUND
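The linear regression line can predict the value of y when a new x value is substituted into the
equation. The equation for the line is $y = bx + m$, where b is the slope and m is the y-intercept:
$$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad m = \bar{y} - b\bar{x}$$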
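To illustrate prediction, the sketch below substitutes a new x value into a fitted line. The default coefficients are those of the Lesson 5 worked example (slope 0.32, intercept 23.56); the function name `predict` and the new value x = 20 are assumptions chosen only for illustration.

```python
def predict(x, slope=0.32, intercept=23.56):
    """Return the y value that the fitted line gives for a new x value."""
    return slope * x + intercept


new_x = 20  # hypothetical new observation
predicted_y = predict(new_x)
print(f"predicted y at x = {new_x}: {predicted_y:.2f}")
```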
II. OBJECTIVES
1. Determine the regression line that fits the given data
2. Create a scatter plot based on the data given
III. MATERIALS
Calculator
Writing materials
Coupon bond
IV. ACTIVITY
1. The table below shows the height, x in inches and the pulse rate y per minute for 9
people.
a. Determine the linear regression line that would fit the data.
b. Create a scatter graph using the given data and the calculated linear regression
line.
x 68 72 65 70 62 75 78 64 68
y 90 85 88 100 105 98 70 65 72
2. The table below shows the lengths and corresponding ideal weights of sand sharks.
Determine the equation of the best fit line to predict the weight of a sand shark
whose length is 75 inches.
Length 60 62 64 66 68 70 72
Weight 105 114 124 131 139 149 158
3. The data below shows the height and shoe sizes of six randomly selected men. If a
man has a shoe size of 10.5, what would be his predicted height?
Height 67 70 73.5 75 78 66
Shoe Size 8.5 9.5 11 12 13 8

LESSON 6
CORRELATION

1. Solve for the correlation coefficient of a set of data
2. Interpret the correlation coefficient
3. Determine the correlation of two variables in a set of data
Introduction:
Correlation is a statistical technique that can show whether and how strongly pairs of variables
are related. For example, height and weight are related; taller people tend to be heavier than
shorter people. The relationship isn't perfect. People of the same height vary in weight, and you
can easily think of two people you know where the shorter one is heavier than the taller one.
Nonetheless, the average weight of people 5'5'' is less than the average weight of people 5'6'',
and their average weight is less than that of people 5'7'', etc. Correlation can tell you just how
much of the variation in people's weights is related to their heights.
Although this correlation is fairly obvious, your data may contain unsuspected correlations. You
may also suspect there are correlations, but don't know which are the strongest. An intelligent
correlation analysis can lead to a greater understanding of your data.
Lesson Proper:
The correlation coefficient is needed to obtain a measure of relatedness independent of the
units of X and Y. The correlation coefficient is a dimensionless quantity that is independent of the
units of X and Y and ranges between −1 and 1. For random variables that are approximately
linearly related, a correlation coefficient of 0 implies independence. A correlation coefficient close
to 1 implies nearly perfect positive dependence with large values of X corresponding to large
values of Y and small values of X corresponding to small values of Y. An example of a strong
positive correlation is between forced expiratory volume (FEV), a measure of pulmonary function,
and height (Figure a). A somewhat weaker positive correlation exists between serum cholesterol
and dietary intake of cholesterol (Figure b). A correlation coefficient close to −1 implies nearly perfect
negative dependence, with large values of X corresponding to small values of Y and vice versa,
as is evidenced by the relationship between resting pulse rate and age in children under the age
of 10 (Figure c). A somewhat weaker negative correlation exists between FEV and number of
cigarettes smoked per day in children (Figure d).

Interpretation of the Sample Correlation Coefficient
(1) If the correlation is greater than 0, such as for birthweight and estriol, then the variables are
said to be positively correlated. Two variables (x, y) are positively correlated if as x increases, y
tends to increase, whereas as x decreases, y tends to decrease.
(2) If the correlation is less than 0, such as for pulse rate and age, then the variables are said to
be negatively correlated. Two variables (x, y) are negatively correlated if as x increases, y tends
to decrease, whereas as x decreases, y tends to increase.
(3) If the correlation is exactly 0, such as for birthweight and birthday, then the variables are said
to be uncorrelated. Two variables (x, y) are uncorrelated if there is no linear relationship
between x and y.
Thus the sample correlation coefficient provides a quantitative estimate of the dependence
between two variables: the closer |r| is to 1, the more closely related the variables are; if |r| = 1,
then one variable can be predicted exactly from the other.
• POSITIVE CORRELATION – exists when high scores in one variable are associated
with high scores in the second variable or low scores in one variable are associated with
low scores in the other
• NEGATIVE CORRELATION – exists when high scores in one variable are associated
with low scores in the second or vice versa.
• ZERO CORRELATION– exists when the points on the scatter diagram are spread in a
random manner.
• PERFECT CORRELATION– all points lie on a straight line
The strength or degree of the relationship is based on the following ranges of the correlation
coefficient:

Correlation Coefficient
$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}}$$
Where:
𝑟 correlation coefficient
𝑛 sample size
𝑥 value of the independent variable
𝑦 value of the dependent variable
Example: Determine the correlation coefficient of the data
x y x² y² xy
7 25 49 625 175
9 25 81 625 225
9 25 81 625 225
12 27 144 729 324
14 27 196 729 378
16 27 256 729 432
16 24 256 576 384
14 30 196 900 420
16 30 256 900 480
16 31 256 961 496
17 30 289 900 510
19 31 361 961 589
21 30 441 900 630
24 28 576 784 672
15 32 225 1024 480
16 32 256 1024 512
Sum: 241 454 3919 12992 6932
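$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}} = \frac{16(6932) - (241)(454)}{\sqrt{\left(16(3919) - 241^2\right)\left(16(12992) - 454^2\right)}} = 0.53$$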
From the computed r value, there is moderate or substantial correlation between estriol and
birthweight.
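The arithmetic in this example can be checked with a short Python sketch that applies the same computational formula to the tabulated data. The function name `pearson_r` is illustrative and not part of the module.

```python
from math import sqrt

# Data from the worked example (16 pairs of x and y values).
xs = [7, 9, 9, 12, 14, 16, 16, 14, 16, 16, 17, 19, 21, 24, 15, 16]
ys = [25, 25, 25, 27, 27, 27, 24, 30, 30, 31, 30, 31, 30, 28, 32, 32]


def pearson_r(xs, ys):
    """Correlation coefficient from the sums n, Σx, Σy, Σx², Σy² and Σxy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(xs, ys)
print(f"r = {r:.2f}")
```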
Synthesis:
A key thing to remember when working with correlations is never to assume a correlation
means that a change in one variable causes a change in another. Sales of personal computers
and athletic shoes have both risen strongly over the years and there is a high correlation between
them, but you cannot assume that buying computers causes people to buy athletic shoes (or vice
versa).
The Pearson correlation technique works best with linear relationships: as one variable gets
larger, the other gets larger (or smaller) in direct proportion. It does not work well with curvilinear
relationships (in which the relationship does not follow a straight line). An example of a curvilinear
relationship is age and health care. They are related, but the relationship doesn't follow a straight
line. Young children and older people both tend to use much more health care than teenagers or
young adults. Multiple regression (also included in the Statistics Module) can be used to examine
curvilinear relationships.
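The limitation with curvilinear relationships can be seen in a small sketch. The data here are hypothetical and chosen only for illustration: one set of y values follows a straight line, and the other follows a U-shaped curve in which y depends entirely on x. The function name `pearson_r` is illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient from the sums n, Σx, Σy, Σx², Σy² and Σxy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


xs = [-3, -2, -1, 0, 1, 2, 3]           # hypothetical x values
ys_linear = [2 * x + 1 for x in xs]     # straight-line relationship
ys_curved = [x ** 2 for x in xs]        # U-shaped (curvilinear) relationship

r_linear = pearson_r(xs, ys_linear)
r_curved = pearson_r(xs, ys_curved)
print(f"linear: r = {r_linear:.2f}, curved: r = {r_curved:.2f}")
```

In the U-shaped case y is completely determined by x, yet r is 0, which is why a scatter plot should be inspected before r is interpreted.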
Watch the video link below to help you understand the topic better:
https://www.youtube.com/watch?v=4EXNedimDMs
Assessment:
Offline Learners: Answer the activity below and pass it to your instructor via email.
Online Learners: Answer the activity posted in Google Classroom
Compute the correlation between 5-year lung cancer mortality and annual cigarette
consumption when each is expressed in the log10 scale. (30 points)

References:
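Beligan, S. (2016). Statistics for the Social Sciences. La Trinidad: Benguet State University.
ISBN 971-0330-05-5
Creative Research Systems. (2016). Correlation. Retrieved from:
https://www.surveysystem.com/correlation.htm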

LABORATORY 6
CORRELATION
I. BACKGROUND
Correlation seeks to find the relationship between two variables. It can be solved using the
formula below:
$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}}$$
The computed r value will be a number between -1 and 1. If r is close to 1, then the variables
are positively correlated. If r is close to -1 the variables are negatively correlated. If r is
close to 0, the variables are not correlated.
II. OBJECTIVES:
1. Accurately solve for the correlation coefficient.
2. Give an appropriate interpretation for the computed r value
III. MATERIALS:
Coupon Bond
Calculator
Writing Materials
IV. ACTIVITY:
1. The World Bank collects information on the life expectancy of a person in each
country ("Life expectancy at," 2013) and the fertility rate per woman in the country
("Fertility rate," 2013). The data for 24 randomly selected countries for the year 2011
are listed below. Based on the correlation coefficient, is there a correlation between
life expectancy and fertility?

V. QUESTIONS/SUPPLEMENTAL ACTIVITIES/EXERCISES:
1. Does correlation imply causation? Why or why not?
2. Give an interpretation of the r values generated from the study below:
a. Difference in scar formation at different sites, in different directions at the same
site, but with changes in the elasticity of skin with age, sex, and race or in some
pathological conditions, is well known to clinicians. Spearman correlation
coefficients of collagen and elastic fibers in horizontal and vertical directions were
measured from the respective quantitative fraction data in 320 skin samples from
32 human cadavers collected at five selected sites over extremities. The results
yielded p < 0.01, with the following coefficients: shoulder joint area r = 0.66, wrist r = 0.75,
forearm r = 0.75, thigh r = 0.80, ankle r = 0.26.
b. Tap water consumption and hemoglobin change were studied using 30 individuals.
The results show that r = -0.20 for tap water consumption and baseline hemoglobin
and r = -0.40 for tap water consumption and final hemoglobin. Give an
interpretation of the r values to answer the hypothesis that there is a correlation
between hemoglobin change and tap water consumption.

LESSON 7
CHI SQUARE GOODNESS OF FIT TEST

1. Understand the use of Chi-square given a set of data
2. Accurately interpret the result of the test statistic
3. Solve for the test statistic
Introduction:
The Chi Square statistic is commonly used for testing relationships between categorical
variables. The null hypothesis of the Chi-Square test is that no relationship exists between the
categorical variables in the population; they are independent. An example research question that
could be answered using a Chi-Square analysis would be:
Is there a significant relationship between voter intent and political party membership?
Lesson Proper:
How does the Chi-Square statistic work?
The Chi-Square statistic is most commonly used to evaluate Tests of Independence when
using a crosstabulation (also known as a bivariate table). Crosstabulation presents the
distributions of two categorical variables simultaneously, with the intersections of the categories
of the variables appearing in the cells of the table. The Test of Independence assesses whether
an association exists between the two variables by comparing the observed pattern of responses
in the cells to the pattern that would be expected if the variables were truly independent of each
other. Calculating the Chi-Square statistic and comparing it against a critical value from the Chi-
Square distribution allows the researcher to assess whether the observed cell counts are
significantly different from the expected cell counts.
The calculation of the Chi-Square statistic is quite straight-forward and intuitive:
$$\chi_c^2 = \sum \frac{(O - E)^2}{E}$$
Where O is the observed frequency (the observed counts in the cells) and E is the expected
frequency if NO relationship existed between the variables. As depicted in the formula, the Chi-
Square statistic is based on the difference between what is actually observed in the data and what
would be expected if there was truly no relationship between the variables.
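The sketch below shows one way the expected frequencies and the statistic can be computed for a crosstabulation. It relies on the standard rule that, under independence, the expected count in a cell is (row total × column total) / grand total; this rule is not stated in the text above. The 2 × 2 table of counts is hypothetical.

```python
# Hypothetical observed counts: rows are party membership (member, non-member),
# columns are voter intent (intends to vote, does not intend to vote).
observed = [
    [30, 20],
    [10, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell if the two variables were independent.
expected = [
    [row_total * col_total / grand_total for col_total in col_totals]
    for row_total in row_totals
]

# Chi-square statistic: sum of (O - E)^2 / E over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print("expected:", expected)
print(f"chi-square = {chi_square:.2f}")
```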
Steps in solving $\chi^2$
1. State the null and alternative hypotheses
2. Determine the critical value of $\chi^2$ based on the table
For one tailed: $\chi^2_{\alpha,(v-1)}$
For two tailed: $\chi^2_{\alpha/2,(v-1)}$
3. Compute
$$\chi_c^2 = \sum \frac{(O - E)^2}{E}$$
4. Decision: Reject Ho if $\chi_c^2 > \chi_\alpha^2$
5. State the conclusion

The critical value of $\chi^2$ can be obtained from the table below:

Example:
Hypertension Diastolic blood-pressure measurements were collected at home in a community-
wide screening program of 14,736 adults ages 30−69 in East Boston, Massachusetts, as part
of a nationwide study to detect and treat hypertensive people. The people in the study were
each screened in the home, with two measurements taken during one visit. A frequency
distribution of the mean diastolic blood pressure is given in the table below in 10-mm Hg
intervals. We would like to assume these measurements came from an underlying normal
distribution because standard methods of statistical inference could then be applied on these
data as presented in this text. How can the validity of this assumption be tested?
1. Ho: The distribution is normal
Ha: The distribution is not normal
2. $\chi^2_{0.05/2,(8-1)} = 16.01$ (upper-tail critical value)
3. $$\chi_c^2 = \sum \frac{(O - E)^2}{E}$$
$$\chi_c^2 = \frac{(57 - 69)^2}{69} + \frac{(330 - 502.5)^2}{502.5} + \frac{(2132 - 2018.4)^2}{2018.4} + \frac{(4584 - 4200.9)^2}{4200.9} + \frac{(4604 - 4538.6)^2}{4538.6} + \frac{(2119 - 2545.9)^2}{2545.9} + \frac{(659 - 740.4)^2}{740.4} + \frac{(251 - 120.2)^2}{120.2} = 326.44$$
4. Since 326.44 > 16.01, Reject Ho
5. Conclusion: The distribution is not normal.
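The computed value in step 3 can be verified with a short Python sketch that uses the observed and expected frequencies from the example. The function name `chi_square_statistic` is illustrative and not part of the module.

```python
# Observed and expected frequencies for the eight blood-pressure intervals,
# taken from the terms of the computation in step 3.
observed = [57, 330, 2132, 4584, 4604, 2119, 659, 251]
expected = [69, 502.5, 2018.4, 4200.9, 4538.6, 2545.9, 740.4, 120.2]


def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


chi_square = chi_square_statistic(observed, expected)
print(f"chi-square = {chi_square:.2f}")
```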
Synthesis:
A chi-square (χ²) statistic is a test that measures how expectations compare to actual observed
data (or model results). The data used in calculating a chi-square statistic must be random,
raw, mutually exclusive, drawn from independent variables, and drawn from a large enough
sample. Chi-square tests are often used in hypothesis testing. The null hypothesis is rejected if
the computed value is greater than the tabulated value.
To help you understand this lesson, you can watch the video links below:

https://www.youtube.com/watch?v=7_cs1YlZoug
https://www.youtube.com/watch?v=WXPBoFDqNVk
Assessment:
Offline Learners: Do the activity below and submit it to your instructor’s email.
Online Learners: Answer the activity posted in Google Classroom.
The National Center for Health Statistics (NCHS) provided data on the distribution of weight (in
categories) among Americans in 2002. The distribution was based on specific values of body
mass index (BMI) computed as weight in kilograms over height in meters squared. Underweight
was defined as BMI< 18.5, Normal weight as BMI between 18.5 and 24.9, overweight as BMI
between 25 and 29.9 and obese as BMI of 30 or greater. Americans in 2002 were distributed as
follows: 2% Underweight, 39% Normal Weight, 36% Overweight, and 23% Obese. Suppose we
want to assess whether the distribution of BMI is different in the Framingham Offspring sample.
Using data from the n=3,326 participants who attended the seventh examination of the Offspring
in the Framingham Heart Study we created the BMI categories as defined and observed the
following:
                            Underweight   Normal           Overweight       Obese
                            BMI < 18.5    BMI 18.5-24.9    BMI 25.0-29.9    BMI ≥ 30
Observed Frequencies (O)    20            932              1374
Expected Frequencies (E)    66.5          1297.1           1197.4
References:
Beligan, S. (2016). Statistics for the Social Sciences. La Trinidad: Benguet State University.
ISBN 971-0330-05-5
Gunderson, B. (2015). Chi-square Tests. Retrieved from:
https://open.umich.edu/sites/default/files/downloads/workbook_13-lab_12-_chi-square_tests.pdf
______. (2020). Using Chi-square Statistic in Research. Retrieved from:
statisticssolutions.com/using-chi-square-statistic-in-research

LABORATORY 7
CHI-SQUARE
I. BACKGROUND:
You will see three Chi-Square tests: the tests of goodness of fit, independence, and
homogeneity. For all three tests, the data are generally presented in the form of a
contingency table (a rectangular array of numbers in cells). All three tests are based on the
Chi-Square statistic.
Goodness of Fit Test: This test answers the question, “Do the data fit well compared to
a specified distribution?” It considers one categorical response, and assesses whether the
proportion of sampled observations falling into each category matches well to a specified
distribution. The null hypothesis specifies this distribution which describes the population
proportion of observations in each category.
Test of Homogeneity: This test answers the question, “Do two or more populations have
the same distribution for one categorical variable?” It considers one categorical response,
and assesses whether the model for this response is the same in two (or more) populations.
The null hypothesis is that the distribution of the categorical variable is the same for the two
(or more) populations.
Test of Independence: This test answers the question, “Are two factors (or variables)
independent for a population under study?” It considers two categorical variables
(sometimes one is a response and the other is explanatory), and assesses whether there
appears to be a relationship between these two variables for a single population. The null
hypothesis is that the two categorical variables are independent (not related) for the
population of interest.
II. OBJECTIVES:
1. Accurately determine 𝜒 2
2. Give an appropriate interpretation of the test statistic
III. MATERIALS:
Coupon Bond
Calculator
Writing Materials
IV. ACTIVITY:
1. A clinical trial was conducted among children 1–10 years of age with prior symptoms of
otorrhea comparing efficacy of (i) antibiotic eardrops, (ii) oral antibiotics, and (iii)
observation without treatment, referred to below as observation. Children were seen at
home by study physicians at 2 weeks and 6 months after randomization. The primary
outcome was the presence of otorrhea at 2 weeks observed by study physicians. The
results are given in Table 10.22. Do the results agree with the expectation?

V. QUESTIONS/SUPPLEMENTAL ACTIVITIES/EXERCISES:
Match each research question with the appropriate Chi-Square test that should be used to answer
the question
1. Is student status (in-state versus out-of-state) associated with one’s eventual graduation
outcome (graduating versus not graduating)?
Answer: Chi-Square test of _____ _ ________
2. To test a theory that people have no preference among four different outdoor activities,
you ask 100 people to select among jogging, bicycling, hiking, or swimming.
3. A biostatistician would like to determine if the ratio of the blood type in the storage for
transfusions should be different in Hawaii from the mainland. She collected a sample of
blood types of 10,000 people in Hawaii and that of 100,000 people in the mainland. She
wishes to see if the breakdown of blood types (A, B, AB and O) is the same for both
populations.
4. A researcher wants to determine if scoring high or low on an artistic ability test depends
on being right or left-handed.
5. A national organization wants to compare the distribution of level of highest education
completed (high school, college, masters, doctoral) for Republicans versus Democrats.
6. A preservation society has the percentages of five main types of fish in the river from 10
years ago. After noticing an imbalance recently, they add some fish from hatcheries to the
river. How can they determine if they restored the ecosystem from a new sample of fish?

LESSON 8
ANALYSIS OF VARIANCE

1. Describe the use and application of ANOVA
2. Understand and master the steps to solve for the F value
3. Accurately interpret the meaning of the F value
Introduction:
An ANOVA test is a way to find out if survey or experiment results are significant. In other
words, it helps you to figure out if you need to reject the null hypothesis or accept the alternate
hypothesis.
Basically, you’re testing groups to see if there’s a difference between them. Examples of
when you might want to test different groups:
• A group of psychiatric patients are trying three different therapies: counseling, medication
and biofeedback. You want to see if one therapy is better than the others.
• A manufacturer has two different processes to make light bulbs. They want to know if one
process is better than the other.
• Students from different colleges take the same exam. You want to see if one college
outperforms the other.
Lesson Proper:
One Way ANOVA

A one way ANOVA is used to compare the means of two or more independent (unrelated) groups using
the F-distribution. The null hypothesis for the test is that all the group means are equal. Therefore, a
significant result means that at least two of the means are unequal.
Examples of when to use a one way ANOVA

Situation 1: You have a group of individuals randomly split into smaller groups and completing
different tasks. For example, you might be studying the effects of tea on weight loss and form
three groups: green tea, black tea, and no tea.
Situation 2: Similar to situation 1, but in this case the individuals are split into groups based on an
attribute they possess. For example, you might be studying leg strength of people according to
weight. You could split participants into weight categories (obese, overweight and normal) and
measure their leg strength on a weight machine.
Limitations of the One Way ANOVA

A one way ANOVA will tell you that at least two groups were different from each other. But it
won’t tell you which groups were different. If your test returns a significant F-statistic, you may
need to run a post hoc test (like the Least Significant Difference test) to tell you exactly which
groups had a difference in means.
Tabulated F-value
The tables below are needed to obtain the critical F value. If the computed F value is greater
than the tabulated F value, then the null hypothesis is rejected. If the computed F value is less
than the tabulated F value, then the null hypothesis is accepted.

Steps in solving the F value
1. State the null and alternative hypothesis
2. Determine the tabulated F value (refer to the tables)
3. Solve for the correction factor: $CF = \frac{GT^2}{n}$
4. Determine the degrees of freedom
a. total df: $tdf = n - 1$
b. treatment df: $trtdf = k - 1$
c. error df: $edf = tdf - trtdf$
Here, n is the sample size and k is the number of methods being compared
5. Solve for the sum of squares
a. Total sum of squares (TSS)
$$TSS = \sum y_{ij}^2 - CF$$
b. Treatment sum of squares (TrSS)
$$TrSS = \sum \frac{y_i^2}{n_i} - CF$$
c. Error sum of squares (ESS)
$$ESS = TSS - TrSS$$
6. Mean Square
a. Treatment Mean Square (MSTr)
$$MSTr = \frac{TrSS}{trtdf}$$
b. Error Mean Square (MSE)
$$MSE = \frac{ESS}{edf}$$
7. Computed F value: $F_c = \frac{MSTr}{MSE}$
8. Decision: accept or reject H0
9. State the conclusion
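Steps 3 to 7 can be sketched in Python as follows. The function name `one_way_anova` and the data are assumptions made for illustration: the three groups are hypothetical weight-loss values for green tea, black tea, and no tea. In the code, `sum(g)` plays the role of the treatment total $y_i$ and `len(g)` the role of $n_i$.

```python
def one_way_anova(groups):
    """Compute the F value for a list of groups, following steps 3 to 7."""
    all_values = [y for group in groups for y in group]
    n = len(all_values)              # total sample size
    k = len(groups)                  # number of treatments
    grand_total = sum(all_values)    # GT

    cf = grand_total ** 2 / n                                   # step 3
    tdf = n - 1                                                 # step 4a
    trtdf = k - 1                                               # step 4b
    edf = tdf - trtdf                                           # step 4c
    tss = sum(y ** 2 for y in all_values) - cf                  # step 5a
    trss = sum(sum(g) ** 2 / len(g) for g in groups) - cf       # step 5b
    ess = tss - trss                                            # step 5c
    mstr = trss / trtdf                                         # step 6a
    mse = ess / edf                                             # step 6b
    return {"CF": cf, "TSS": tss, "TrSS": trss, "ESS": ess,
            "MSTr": mstr, "MSE": mse, "F": mstr / mse,
            "df": (trtdf, edf)}                                 # step 7


# Hypothetical data: green tea, black tea, no tea.
groups = [[4, 5, 6], [2, 3, 4], [1, 2, 3]]
result = one_way_anova(groups)
print(f"F = {result['F']:.2f} with df = {result['df']}")
```

For these hypothetical data the computed F is 7.0 with 2 and 6 degrees of freedom. The tabulated F at α = 0.05 for those degrees of freedom is 5.14, so the null hypothesis would be rejected.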
To understand the steps, let us look at an example
Example: Pulmonary Disease

The authors identified 200 males and 200 females in each of the six groups except for the NI
group, which was limited to 50 males and 50 females because of the small number of such
people available. The mean and standard deviation of FEF for each of the six groups for males
are presented in the table below. How can the means of these six groups be compared?
1. Ho: The means are equal

Ha: The means are not equal
2. $F_{1,10} = 4.96$, assuming that $\alpha = 0.05$
3. Take the sum of Mean FEF and sd FEF
$$CF = \frac{GT^2}{n} = \frac{(18.95 + 4.83)^2}{12} = 47.12$$
4. $tdf = n - 1 = 12 - 1 = 11$
$trtdf = 2 - 1 = 1$
$edf = tdf - trtdf = 11 - 1 = 10$
5. $$TSS = \sum y_{ij}^2 - CF$$
$$TSS = (3.78^2 + 3.30^2 + 3.32^2 + 3.23^2 + 2.73^2 + 2.59^2 + 0.79^2 + 0.77^2 + 0.86^2 + 0.78^2 + 0.81^2 + 0.82^2) - 47.12 = 17.57$$
$$TrSS = \sum \frac{y_i^2}{n_i} - CF$$
$$TrSS = \left[\frac{18.95^2}{6} + \frac{4.83^2}{6}\right] - 47.12 = 16.62$$
$$ESS = TSS - TrSS = 17.57 - 16.62 = 0.95$$
6. $$MSTr = \frac{TrSS}{trtdf} = \frac{16.62}{1} = 16.62$$
$$MSE = \frac{ESS}{edf} = \frac{0.95}{10} = 0.095$$
7. $$F_c = \frac{MSTr}{MSE} = \frac{16.62}{0.095} = 174.94$$
8. Since 174.94 > 4.96, Reject Ho
9. Conclusion: The means are not equal
Synthesis:
The ANOVA is used when the research question involves the comparisons of means from more
than two independent groups. It is assumed that:
• The groups are independent
• The variance for each of the groups is the same
• The outcome comes from the normal distribution
In this analysis, the goal is to take the variability of the outcome and divide it into the variability
between the groups and the variability within groups. This statistical tool helps us to answer a
hypothesis by splitting up the sources of variability. Therefore, the ANOVA provides a statistical
test for determining whether there is enough evidence to reject the null hypothesis that all the
means are equal.
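The split described above can be written as an identity. This is a sketch in notation that is assumed here rather than taken from the module: $y_{ij}$ is observation j in group i, $\bar{y}_i$ is the mean of group i, $\bar{y}$ is the overall mean, and $n_i$ is the size of group i.

```latex
\underbrace{\sum_i \sum_j (y_{ij} - \bar{y})^2}_{TSS}
  = \underbrace{\sum_i n_i (\bar{y}_i - \bar{y})^2}_{TrSS}
  + \underbrace{\sum_i \sum_j (y_{ij} - \bar{y}_i)^2}_{ESS}
```

The F value compares the first term on the right (variability between groups) with the second (variability within groups), each divided by its degrees of freedom.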
To help you understand this lesson, watch the video link below:
https://www.youtube.com/watch?v=oOuu8IBd-yo
Assessment:
Offline Learners: Answer the activity listed below and send it to your instructor via email.
Twenty-two young asthmatic volunteers were studied to assess the short-term effects of sulfur
dioxide (SO2) exposure under various conditions. The baseline data in Table 12.30 were
presented regarding bronchial reactivity to SO2 stratified by lung function (as defined by forced
expiratory volume / forced vital capacity [FEV1/FVC]) at screening. Test the hypothesis that there
is an overall mean difference in bronchial reactivity among the three lung-function groups.
References:
Beligan, S. (2016). Statistics for the Social Sciences. La Trinidad: Benguet State
University. ISBN 971-0330-05-5
Bush, H., Macera, C., Shaffer, R., and Shaffer, P. (2020). Biostatistics and
Epidemiology. Taguig: Cengage Learning Asia Pte Ltd.
LABORATORY 8
ANALYSIS OF VARIANCE
I. BACKGROUND
The responses that are generated in an experimental situation always exhibit a certain
amount of variability. In an analysis of variance, you divide the total variation in the response
measurements into portions that may be attributed to various factors of interest to the
experimenter. If the experiment has been properly designed, these portions can then be
used to answer questions about the effects of the various factors on the response of interest.
II. OBJECTIVE
1. Accurately solve for the F value given the raw data
2. Give an appropriate interpretation of the F value
3. Analyze and interpret a given Excel ANOVA output
III. MATERIALS
Calculator
Writing materials
Coupon bond
IV. ACTIVITY
1. A clinical trial is run to compare weight loss programs and participants are randomly
assigned to one of the comparison programs and are counseled on the details of the
assigned program. Participants follow the assigned program for 8 weeks. The outcome of
interest is weight loss, defined as the difference in weight measured at the start of the study
(baseline) and weight measured at the end of the study (8 weeks), measured in
pounds. Three popular weight loss programs are considered. The first is a low calorie diet.
The second is a low fat diet and the third is a low carbohydrate diet. For comparison
purposes, a fourth group is considered as a control group. Participants in the fourth group
are told that they are participating in a study of healthy behaviors with weight loss only one
component of interest. The control group is included here to assess the placebo effect (i.e.,
weight loss due to simply participating in the study). A total of twenty patients agree to
participate in the study and are randomly assigned to one of the four diet groups. Weights
are measured at baseline and patients are counseled on the proper implementation of the
assigned diet (with the exception of the control group). After 8 weeks, each patient's weight
is again measured and the difference in weights is computed by subtracting the 8 week
weight from the baseline weight. Positive differences indicate weight losses and negative
differences indicate weight gains. For interpretation purposes, we refer to the differences in
weights as weight losses and the observed weight losses are shown below.
Low Calorie Low Fat Low Carbohydrate Control

8 2 3 2
9 4 5 2
6 3 4 -1
7 5 2 0
3 1 3 3
Is there a statistically significant difference in the mean weight loss among the four diets?
2. Calcium is an essential mineral that regulates the heart, is important for blood clotting and
for building healthy bones. The National Osteoporosis Foundation recommends a daily
calcium intake of 1000-1200 mg/day for adult men and women. While calcium is contained
in some foods, most adults do not get enough calcium in their diets and take supplements.
Unfortunately, some of the supplements have side effects such as gastric distress, making
them difficult for some patients to take on a regular basis. A study is designed to test
whether there is a difference in mean daily calcium intake in adults with normal bone
density, adults with osteopenia (a low bone density which may lead to osteoporosis) and
adults with osteoporosis. Adults 60 years of age with normal bone density, osteopenia and
osteoporosis are selected at random from hospital records and invited to participate in the
study. Each participant's daily calcium intake is measured based on reported food intake
and supplements. The data are shown below.

Normal Bone Density   Osteopenia   Osteoporosis
1200                  1000         890
1000                  1100         650
980                   700          1100
900                   800          900
750                   500          400
800                   700          350
Is there a statistically significant difference in mean calcium intake in patients with normal bone
density as compared to patients with osteopenia and osteoporosis?
Given the results generated from Excel, state the alternative hypothesis, determine whether the null
hypothesis should be rejected or not rejected, and interpret the F value and p-value.
1. Ho: The mean calcium content is the same for the four different storage times.
Anova: Single Factor
SUMMARY
Groups      Count   Sum      Average    Variance
0 months    6       344.96   57.49333   1.890107
1 month     6       347.71   57.95167   1.765937
2 months    6       357.32   59.55333   0.804507
3 months    6       362.03   60.33833   2.119657
ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups        32.13815   3    10.71272   6.512085   0.002982   3.098391
Within Groups         32.90103   20   1.645052
Total                 65.03918   23
2. Ho: There are no significant differences in the efficacy of the new antidepressant based
on the dosage given
Anova: Single Factor
SUMMARY
Groups          Count   Sum   Average   Variance
placebo         5       191   38.2      66.7
low dose        5       103   20.6      69.3
moderate dose   5       74    14.8      61.7
ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups        1484.933   2    742.4667   11.26657   0.001761   3.885294
Within Groups         790.8      12   65.9
Total                 2275.733   14
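To see how the columns of the Excel output relate to one another, the short sketch below recomputes MS and F from the SS and df values printed in the two ANOVA tables above. The function name `f_statistic` is an illustrative assumption and not part of the Excel output. Interpreting F and the p-value is still left to the student.

```python
def f_statistic(ss_between, df_between, ss_within, df_within):
    """Return (MS between, MS within, F) for a single-factor ANOVA table."""
    ms_between = ss_between / df_between  # MS = SS / df
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within  # F = MS between / MS within


# SS and df values copied from the two Excel tables above
storage = f_statistic(32.13815, 3, 32.90103, 20)
dosage = f_statistic(1484.933, 2, 790.8, 12)

print([round(value, 4) for value in storage])
print([round(value, 4) for value in dosage])
```

The printed values should agree with the MS and F columns of the Excel tables up to rounding.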
LESSON 9
RELATIVE RISK

1. Interpret relative risk from a given set of data
2. Determine the exposed and unexposed group in a population
Introduction:
Risk as defined for public health planning is the probability of the occurrence of a disease or
other health outcome of interest during a specified period, usually one year. Risk is calculated by
dividing the number who got the disease during the defined period by the total population of
interest during that period. For example, if there were 1000 births in a health jurisdiction in one
year and 72 of those babies weighed less than 2500 grams, the risk of low birth weight (LBW) in
the community would be 72/1000 = 0.072 or 7.2%.
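The low birth weight calculation can be written as a minimal Python sketch. The function name `risk` is an illustrative assumption; the numbers are the ones given in the example.

```python
def risk(cases, population):
    """Risk = number who developed the outcome / total population of interest."""
    return cases / population


lbw_risk = risk(72, 1000)  # 72 low-birth-weight babies out of 1000 births
print(f"{lbw_risk:.3f} or {lbw_risk:.1%}")
```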
Relative risk is the calculated ratio of incidence rates of a health condition or outcome in two
groups of people, those exposed to a factor of interest and those not exposed. It is used to
determine if exposure to a specific risk factor is associated with an increase, decrease, or no
change in the disease or outcome rate when compared to those without the exposure. Relative
risk is a statistical measure of the strength of the association between a risk factor and an
outcome.
Lesson Proper:
The fundamental comparison of rates using a ratio in epidemiology is known as the rate ratio.
If the rates being compared are incidence rates, epidemiologists call those comparisons risk
ratios, also referred to as relative risk (RR). Relative risk is defined as a measure of
association that provides the strength of association between exposure and outcome in a
population. This definition has several key parts that need to be highlighted.
First, relative risk is a measure of association, which means that it has the ability to tell if two
comparable groups are related to each other. The second key part of the definition of relative risk
is that it provides the strength of association, which means that it results in a number that tells
how related the comparable groups are. So a resulting relative risk of 2 indicates that the rate
above the fraction line is twice as large as the rate below the fraction line. A relative risk of 3 is
said to be stronger than a relative risk of 2. The third key part of the definition of relative risk is
that it is between the exposure and outcome in a population. Although the terms “exposure” and
“outcome” are used in the definition, the reality is that the relative risk can compare rates between
any two groups. The two groups could be two populations, two geographic locations, two time
periods, or two diseases, but most often relative risk is used to compare the rates of a disease in
the group of people exposed to the risk factor of interest and in the group of people not exposed
to the risk factor of interest. Relative risk is a very flexible tool.
The generic formula for assessing the relationship between exposure and outcome using the
relative risk is:
$$\text{Relative risk} = \frac{\text{Incidence rate in the exposed group}}{\text{Incidence rate in the nonexposed group}}$$
In this form, the exposed group is identified by the hypothesis of interest to the investigator.
So, for example, the exposed group could be those who smoke cigarettes, and the nonexposed
group would be those who do not smoke cigarettes. The resulting relative risk will then identify
the relationship between smoking cigarettes and the outcome of interest. In addition to comparing
exposed and unexposed groups, the rates above and below the fraction line can be flexible and
represent any groupings that are to be compared.
Interpreting Relative Risk

The summary of the interpretation of relative risk is presented in the table below. The
interpretation of relative risk is the same as any ratio measure. If the rate above the fraction line
is the same quantity as the rate below the fraction line, the result will be a relative risk equal to 1,
which is interpreted as no relationship between the outcome being assessed among the groups
of exposed and unexposed. The implication is that if both groups have the same rate of the
outcome, then being in either group is not related to a change in the outcome. Consequently,
a value of 1 for the relative risk is referred to as the null value because it means that there is no
relationship between exposure and outcome.

Relative Risk = 1    Null value. Same rate of outcome in both groups being compared. No relationship exists between the groups being compared in the ratio.
Relative Risk > 1    Positive association. Rate above the fraction line is greater than the rate below the fraction line. Subjects in the exposed group are more likely to have the outcome of interest.
Relative Risk < 1    Negative association. Rate above the fraction line is less than the rate below the fraction line. Subjects in the exposed group are less likely to have the outcome of interest.
In practice, a relative risk need not equal exactly 1 to be read as the null value. Sometimes a relative risk close
to 1 is still considered to show no association between exposure and outcome. But when the result of
a relative risk is different from 1, the relationship between the exposure and outcome is indicated
by the strength of association and the direction of the result. When the relative risk is above 1,
the interpretation is that those in the exposed group are more likely to have the outcome than
those in the nonexposed group. This is known as a positive association between exposure and
outcome. The larger the number, the stronger the relationship between being exposed and having
the outcome. The sentence that interprets a relative risk above 1 is:
The risk that those in the exposed group will develop the outcome is XX.XX times as
likely as those in the nonexposed group developing the outcome.
In this interpretation, the numeric result of the relative risk is inserted in place of XX.XX.
Also, the actual characteristic or attribute that forms the exposure group should replace the terms
exposed and nonexposed. Finally, the actual outcome or disease should replace the outcome.
As an example, examine this calculated relative risk:
$$\frac{\text{Incidence rate of lung cancer in the smoking group}}{\text{Incidence rate of lung cancer in the nonsmoking group}} = \frac{12.8\%}{3.2\%} = 4.0$$
This relative risk of 4 means that there is a positive relationship between smoking and lung cancer.
The interpretation of this relative risk is:
The risk that those in the smoking group will develop lung cancer is 4.0 times as likely
as those in the nonsmoking group developing lung cancer.
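The smoking example can be reproduced with a small Python sketch that computes the relative risk from the two incidence rates and fills in the interpretation sentence. The function names `relative_risk` and `interpret` are illustrative assumptions, not part of the module.

```python
def relative_risk(rate_exposed, rate_unexposed):
    """Incidence rate in the exposed group divided by that in the nonexposed group."""
    return rate_exposed / rate_unexposed


def interpret(rr, exposed, unexposed, outcome):
    """Fill in the interpretation sentence used in this lesson."""
    return (
        f"The risk that those in the {exposed} group will develop {outcome} "
        f"is {rr:.1f} times as likely as those in the {unexposed} group "
        f"developing {outcome}."
    )


# Incidence rates (in percent) taken from the lung cancer example
rr = relative_risk(12.8, 3.2)
sentence = interpret(rr, "smoking", "nonsmoking", "lung cancer")
print(sentence)
```

Both rates must be in the same units (both percentages or both proportions) so that the units cancel in the ratio.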
A negative association is represented by a relative risk that is less than 1. In this case, the higher
rate of outcome is below the fraction line. This finding is also an indication that the exposure is
protective for the outcome. The sentence that interprets a relative risk below 1 is the same as for
the relative risk above 1, except that it indicates the exposed group is less likely to develop the
outcome. As an example, examine the relative risk presented here:
$$\frac{\text{Incidence rate of heart disease in the exercise group}}{\text{Incidence rate of heart disease in the nonexercise group}} = \frac{2.4\%}{8.8\%} = 0.27$$
This relative risk means that there is a negative relationship between exercise and heart disease.
Those in the exercise group have lower rates of heart disease than those in the nonexercise
group. Further, from this result, there is an indication that exercise is protective against heart
disease. The interpretation of this relative risk is:
The risk that those in the exercise group will develop heart disease is 0.27 times
as likely as those in the nonexercise group developing heart disease.

An important concept in the discussion of the interpretation of measures of association is that
the interpretation must describe the study design. The study design used to collect the data should
be reflected in the sentence used to interpret the measure of association. In the case of relative
risk, the data collected must be incidence data, and incidence data come only from cohort
studies. Further, a cohort study compares those with exposure to those without exposure, so the
sentence interpreting a relative risk will look like the cohort study design. For example, notice in
the previous interpretation of relative risk, the exercise group (exposure) is compared to the non-
exercise group (no exposure). This reflects the actual cohort study design. Finally, because
relative risk can only come from a cohort study design, the sentence interpreting the relative risk
will always be organized like a cohort study (comparing exposure to no exposure).
One further comment about relative risk resulting in a negative association is that many
investigators find it difficult to present findings of relative risk less than 1 because there can be
confusion about the direction of the association using the sentence as written previously. For
example, it is mathematically correct to say “0.27 times as likely,” but some readers may misread this
as meaning that exercisers are “more likely” to develop the outcome. To alleviate this concern, some investigators
may choose to reverse the location of each of the rates above and below the fraction line such
as:
$$\frac{\text{Incidence rate of heart disease in the nonexercise group}}{\text{Incidence rate of heart disease in the exercise group}} = \frac{8.8\%}{2.4\%} = 3.67$$
This new arrangement for the relative risk still represents the same relationship between
exercise and heart disease, but the investigator eliminates the possible confusion of having to
interpret a relative risk less than 1. If the rates are reversed, it is critical to ensure that the correct
corresponding exposure group is represented in the interpretation. Consequently, in this example,
one would interpret it by saying that non-exercisers are 3.67 times as likely to develop heart
disease as exercisers or that exercisers are 0.27 times as likely to develop heart disease as
non-exercisers. The relationship between exercise and heart disease in both statements is the
same. Finally, notice that the mathematical relationship between relative risks when the rates are
reversed above and below the fraction line is that each relative risk is the reciprocal of the
other, so 0.27 ≈ 1/3.67.
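The reciprocal relationship can be checked numerically. The sketch below uses the exercise and heart disease rates from the example; the variable names are illustrative assumptions.

```python
rate_exercise = 2.4      # incidence of heart disease (%) in the exercise group
rate_nonexercise = 8.8   # incidence of heart disease (%) in the nonexercise group

rr_exercise = rate_exercise / rate_nonexercise     # exercisers vs. non-exercisers
rr_nonexercise = rate_nonexercise / rate_exercise  # non-exercisers vs. exercisers

print(round(rr_exercise, 2), round(rr_nonexercise, 2))
# The two relative risks are reciprocals, so their product is 1
print(round(rr_exercise * rr_nonexercise, 10))
```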
               OUTCOME   NO OUTCOME   Total
EXPOSED        a         b            a+b
NOT EXPOSED    c         d            c+d
Total          a+c       b+d

$$\text{Relative Risk (RR)} = \frac{a/(a+b)}{c/(c+d)}$$
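As a sketch of how the 2 x 2 formula is applied, the Python function below computes the relative risk directly from the four cell counts. The function name `relative_risk_2x2` and the counts are hypothetical and chosen only for illustration; they do not come from any study in this module.

```python
def relative_risk_2x2(a, b, c, d):
    """Relative risk from a 2 x 2 table.

    a = exposed with outcome,     b = exposed without outcome
    c = not exposed with outcome, d = not exposed without outcome
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed develop the outcome
rr = relative_risk_2x2(a=30, b=70, c=10, d=90)
print(round(rr, 2))
```

With these made-up counts the risk is 30% in the exposed group and 10% in the unexposed group, so the relative risk is about 3.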
Synthesis:
A risk ratio (RR), also called relative risk, compares the risk of a health event (disease, injury,
risk factor, or death) among one group with the risk among another group. It does so by dividing
the risk (incidence proportion, attack rate) in group 1 by the risk (incidence proportion, attack rate)
in group 2. The two groups are typically differentiated by such demographic factors as sex (e.g.,
males versus females) or by exposure to a suspected risk factor (e.g., did or did not eat potato
salad). Often, the group of primary interest is labeled the exposed group, and the comparison
group is labeled the unexposed group.
A risk ratio of 1.0 indicates identical risk among the two groups. A risk ratio greater than 1.0
indicates an increased risk for the group in the numerator, usually the exposed group. A risk ratio
less than 1.0 indicates a decreased risk for the exposed group, indicating that perhaps exposure
actually protects against disease occurrence.
Assessment:
1. Induction of labor. Meconium staining of the fetus during childbirth is a sign of fetal distress.
In a randomized trial, 11 pregnant women had elective induction of labor between 39 and 40
weeks of gestation and 117 control women were managed expectantly until 41 weeks of
gestation. One case of meconium staining occurred in the treatment group. Thirteen (13) occurred
in the control group. Express the association between induction and meconium staining as a risk
ratio.
2. Joseph Lister and antiseptic surgery. When Joseph Lister introduced the antiseptic method
for surgical operations he demonstrated that post-operative mortality dropped from 16 per 35
procedures to 6 per 40 procedures. Determine the risks of post-operative mortality in each group
and determine whether the difference is statistically significant.
References:
Andrade, C. (2015). Understanding Relative Risk, Odds Ratio, and Related Terms: As Simple
As It Can Get. Retrieved from: http://www.pitt.edu/~bertsch/risk.pdf
FHOP Planning Guide. (n.d.). Calculating and Interpreting Attributable Risk and Population
Attributable Risk. Retrieved from:
https://fhop.ucsf.edu/sites/fhop.ucsf.edu/files/wysiwyg/pg_apxIIIB.pdf
LABORATORY 9
RELATIVE RISK
I. BACKGROUND:
Relative risk is a measure of association that provides the strength of association between
exposure and outcome in a population. The generic formula for assessing the
relationship between exposure and outcome using the relative risk is:

$$RR = \frac{\text{Incidence rate in the exposed group}}{\text{Incidence rate in the nonexposed group}}$$

The relative risk is interpreted using the table below:
RR = 1    No relationship exists between the groups being compared in the ratio
RR > 1    Subjects in the exposed group are more likely to have the outcome of interest
RR < 1    Subjects in the exposed group are less likely to have the outcome of interest
II. OBJECTIVES:
1. Solve for the relative risk of a given set of data.
2. Correctly interpret the result of the relative risk
III. MATERIALS:
Coupon Bond
Writing Materials
Calculator
IV. ACTIVITY:
1. The table below examines the risk of wound infections with incidental appendectomy
during a staging laparotomy for Hodgkin disease. Calculate the relative risk and give
the interpretation.
Had incidental appendectomy   Wound infection   No wound infection
Yes                           7                 124
No                            1                 78
2. A study of raloxifene and incidence of fractures was conducted among women with
evidence of osteoporosis. The women were initially divided into two groups: those
with and those without pre-existing fractures. The women were then randomized to
raloxifene or placebo and followed for 3 years to determine the incidence of new
vertebral fractures, with the results shown in Table 13.53. Among those with no pre-
existing fractures, compute the relative risk of new fractures among those randomized
to raloxifene vs. placebo, along with its associated 95% CI.
3. A cohort study examined the association between smoking and lung cancer after
following 400 smokers and 600 non-smokers for 15 years. At the conclusion of the
study the investigators found a relative risk of 17. What would be the interpretation of
the given relative risk?
4. A study is done to examine whether there is an association between daily use of
vitamins C and E and risk of coronary artery disease over a 10 year period. When
subjects who took both vitamins were compared to those who did not take the vitamins
at all, the risk ratio was found to be 0.70. What is the interpretation of the risk ratio?
LESSON 10
ODDS RATIO, PREVALENCE, INCIDENCE

1. Understand odds ratio as a measure of association
2. Calculate the odds ratio given the data
3. Correctly interpret the meaning of the odds ratio
Introduction:
In the previous lesson, the RR (or relative risk) was introduced. The relative risk can be
expressed as the ratio of the probability of disease among exposed subjects (p1) divided by the
probability of disease among unexposed subjects (p2). Although easily understood, the RR has
the disadvantage of being constrained by the denominator probability (p2). For example, if p2 =
.5, then the RR can be no larger than 1/.5 = 2; if p2 = .8, then the RR can be no larger than 1/.8
= 1.25. To avoid this restriction, another comparative measure relating two proportions is
sometimes used, called the odds ratio (OR). The odds in favor of a success are defined as follows:
If the probability of a success = p, then the odds in favor of success = p/(1 −p). If two proportions
p1, p2 are considered and the odds in favor of success are computed for each proportion, then
the ratio of odds, or OR, becomes a useful measure for relating the two proportions.
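The ceiling on the relative risk and the definition of odds can be illustrated with a short Python sketch. The function names are illustrative assumptions, and the pair p1 = 0.9, p2 = 0.8 is a hypothetical example chosen to show that the odds ratio is not held below 1/p2.

```python
def odds(p):
    """Odds in favor of success for a probability p, where 0 <= p < 1."""
    return p / (1 - p)


def max_relative_risk(p2):
    """Largest possible RR = p1/p2, reached when p1 = 1."""
    return 1 / p2


def odds_ratio(p1, p2):
    """Ratio of the odds for p1 to the odds for p2."""
    return odds(p1) / odds(p2)


ceiling_half = max_relative_risk(0.5)  # denominator probability .5, as in the text
ceiling_high = max_relative_risk(0.8)  # denominator probability .8, as in the text

# Hypothetical probabilities of disease among exposed (p1) and unexposed (p2)
p1, p2 = 0.9, 0.8
rr = p1 / p2
or_value = odds_ratio(p1, p2)

print(ceiling_half, ceiling_high)
print(round(rr, 3), round(or_value, 3))
```

In this hypothetical case the relative risk stays below its ceiling of 1.25, while the odds ratio is free to take a larger value.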
Lesson Proper:
When incidence rates are available, comparing rates using a ratio measure of association is
best done with a relative risk. However, when incidence data are not available, the most
commonly used method for comparing rates as a ratio is the odds ratio (OR). Equally important
is that an odds ratio can be used to measure associations in any study design. This makes the
odds ratio a very widely used measure of association.
An odds ratio is a measure of association that provides the strength and direction of the
association between exposure and outcome in a population. This sounds very similar to the
definition of a relative risk, and, in fact, the results of the odds ratio are interpreted in the same
manner as the relative risk. An odds ratio equal to 1 indicates that there is no relationship between
exposure and outcome in the observed populations. An odds ratio greater than 1 indicates a
positive association between exposure and outcome, and an odds ratio less than 1 indicates a
negative association between exposure and outcome. The formula for the odds ratio is best
described by referring to the 2 x 2 table. The odds ratio can be either the ratio of the:
1. Exposure odds in those with the outcome to the exposure odds in those without outcome
2. Outcome odds in those with exposure to the outcome odds in those without exposure.
               Outcome   No Outcome
Exposed        a         b
Not exposed    c         d
Exposure Odds Ratio

The first way that the odds ratio can be calculated is by comparing those with the outcome to
those without the outcome. This method is based on the concept of odds, which is the probability
that the event will occur divided by the probability that the event will not occur. So using the
concept, the exposure odds in those with outcome and the exposure odds in those without the
outcome can be calculated by the formula:
$$\frac{a/c}{b/d}$$
This formula is known as the exposure odds ratio. Notice that the exposure odds in those with
outcome is represented by a/c because it is the number of people in the outcome group with
exposure divided by the number of people in the outcome group without exposure. Likewise, the
exposure odds in those without outcome are represented by b/d because it is the number of
people without the outcome and with exposure divided by the number of people without the
outcome and without exposure.
Solving the formula results in the traditional odds ratio formula as follows:
$$\frac{a/c}{b/d} = \frac{a}{c} \times \frac{d}{b} = \frac{ad}{bc}$$

As with the relative risk, the odds ratio interpretation must reflect the study design that produced
the data. The exposure odds ratio compares those with the outcome (cases) to those without the
outcome (controls), which is consistent with the case control study design. So when interpreting
an odds ratio from a case control study, the exposure odds ratio is used.
Outcome Odds Ratio
The second way that the odds ratio can be calculated is by comparing those groups with the
exposure to those without the exposure. This method is also based on the concept of odds, which
is the probability that the event will occur divided by the probability that the event will not occur.
Using this concept, the outcome odds in those with exposure and the outcome odds in those
without the exposure are compared in the formula:
$$\frac{a/b}{c/d} = \frac{a}{b} \times \frac{d}{c} = \frac{ad}{bc}$$
The outcome odds in those with exposure is represented by a/b because it is the number of
people with exposure and with outcome divided by the number of people with exposure and
without the outcome. Likewise, the outcome odds in those without exposure are represented by
c/d because it is the number of people with outcome and without the exposure divided by the
number of people without the outcome and without exposure.
In the case of outcome odds ratio, the interpretation compares the odds of outcome in the
exposed group to the odds of outcome in the not exposed group. The study design that collects
data in this fashion is the cohort study design. So in order for the interpretation of the odds ratio
to match the study design, the outcome odds ratio is used to measure the association in a cohort
study design. This understandably can be confusing because it would seem that if a cohort study
was performed, then the chosen measure of association would be the relative risk. But remember
that an odds ratio can be used in any study design, so the odds ratio must be flexible enough to
reflect whatever design is used to collect the data. As a general rule, for a prospective cohort
study, the relative risk is the measure of choice, but an outcome odds ratio can be used when
incidence data are not available, such as in a retrospective cohort study, or because an
investigator believes the odds ratio is more appropriate. Of course, if the study design is a case
control study, the only option for a measure of association is the exposure odds ratio.
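Although the exposure odds ratio and the outcome odds ratio are interpreted differently, both reduce to the same cross-product ad/bc. The Python sketch below checks this on a hypothetical 2 x 2 table; the function names and counts are illustrative assumptions.

```python
def exposure_odds_ratio(a, b, c, d):
    """(Exposure odds in those with outcome) / (exposure odds in those without)."""
    return (a / c) / (b / d)


def outcome_odds_ratio(a, b, c, d):
    """(Outcome odds in those exposed) / (outcome odds in those not exposed)."""
    return (a / b) / (c / d)


def cross_product_ratio(a, b, c, d):
    """Traditional odds ratio formula ad/bc."""
    return (a * d) / (b * c)


# Hypothetical counts: a, b = exposed with/without outcome; c, d = not exposed with/without
table = dict(a=40, b=20, c=10, d=30)

print(round(exposure_odds_ratio(**table), 4))
print(round(outcome_odds_ratio(**table), 4))
print(round(cross_product_ratio(**table), 4))
```

All three lines print the same value, so the choice between the two forms affects the wording of the interpretation, not the number.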
Once the odds ratio is calculated, it must be interpreted according to one of the two methods:
Exposure odds ratio: The odds that those with the outcome were exposed are xx.xx times
the odds that those without the outcome were exposed.
Outcome odds ratio: The odds that those with the exposure have the outcome are xx.xx
times the odds that those without the exposure have the outcome.
Prevalence Ratio
The prevalence ratio is a measure of association that provides strength and direction of the
association between existing exposure and outcome in the population. The prevalence ratio can
be used in a cross sectional study or any study where the outcome data is prevalence. A
prevalence ratio has the same interpretation as the relative risk and the odds ratio with respect to
its null value of 1 and values greater or less than 1. The main difference is the prevalence ratio
compares two prevalence rates. The two rates are compared as a ratio in the following generic
formula:

$$\text{Prevalence ratio} = \frac{\text{Prevalence rate in the exposed group}}{\text{Prevalence rate in the nonexposed group}}$$
Incidence Density Ratio

The incidence density ratio, a measure of association between exposure and outcome, provides
strength and direction using two incidence densities. An incidence density is a way to present
disease occurrence using a denominator that is composed of person-time, and it has some
advantages over using rates. Incidence densities can be compared using a ratio to determine
whether either of the two groups has a greater (or lower) density of a disease. Incidence density
ratios can be used as an indication of an increased or decreased risk of outcome. The two
densities are compared using the following generic formula:
$$\text{Incidence density ratio} = \frac{\text{Incidence density in the exposed group}}{\text{Incidence density in the non-exposed group}}$$
The incidence density ratio has the same interpretation as relative risk and the odds ratio.
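A minimal Python sketch of the incidence density ratio follows. It assumes that each incidence density is computed as the number of new cases divided by the total person-time at risk; the function names and all counts are hypothetical and serve only as an illustration.

```python
def incidence_density(new_cases, person_time):
    """New cases per unit of person-time (for example, per person-year)."""
    return new_cases / person_time


def incidence_density_ratio(cases_exp, time_exp, cases_unexp, time_unexp):
    """Incidence density in the exposed group over that in the non-exposed group."""
    return incidence_density(cases_exp, time_exp) / incidence_density(cases_unexp, time_unexp)


# Hypothetical data: 30 cases in 1500 person-years (exposed),
# 20 cases in 4000 person-years (non-exposed)
idr = incidence_density_ratio(30, 1500, 20, 4000)
print(round(idr, 2))
```

With these made-up figures the exposed group has about four times the incidence density of the non-exposed group, which is read as a positive association.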
Example:
A study looking at breast cancer in women compared cases with non-cases, and found that
75/100 cases did not use calcium supplements compared with 25/100 of the non-cases. (a)
Calculate the odds of exposure in cases and non-cases. (b) Calculate the odds ratio using the
cross-product ratio. (c) How does the difference between the two prevalences of exposure
(75% vs 25%) compare to the odds ratio?
Risk factor/exposure      Disease Group
                          Case    Control
No calcium supplement     75      25
Calcium supplement        25      75
a) The odds of exposure in:
- case group: $a/c = 75/25 = 3$
- control group: $b/d = 25/75 = 1/3$
b) $OR = \frac{ad}{bc} = \frac{(75)(75)}{(25)(25)} = 9$
c) Comparing the two measures, we observe that a 3-fold difference in the prevalence of exposure (75%
vs. 25%) becomes a 9-fold difference in the odds ratio. Clearly, the two methods can produce
results of very different magnitude, even though both point in the same direction.
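The arithmetic of this example can be reproduced with the short Python sketch below, using the counts from the table. The variable names are illustrative assumptions.

```python
# Counts from the example table
a, b = 75, 25  # no calcium supplement: cases, controls
c, d = 25, 75  # calcium supplement: cases, controls

odds_cases = a / c                # odds of exposure in the case group
odds_controls = b / d             # odds of exposure in the control group
odds_ratio = (a * d) / (b * c)    # cross-product ratio

# Ratio of the two exposure proportions (75% vs. 25%), for comparison
proportion_ratio = (a / (a + c)) / (b / (b + d))

print(odds_cases, round(odds_controls, 3), odds_ratio, proportion_ratio)
```

The odds ratio (9) is the square of the proportion ratio (3) here only because the table is symmetric; in general the two measures simply differ in magnitude.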
Synthesis:
The great value of the odds ratio is that it is simple to calculate, very easy to interpret, and provides
results upon which clinical decisions can be made. Furthermore, it is sometimes helpful in clinical
situations to be able to provide the patient with information on the odds of one outcome versus
another. Patients may decide to accept or forego painful or expensive treatments if they
understand what their odds are for obtaining a desired result from the treatment. Many patients
want to be involved in decisions about their treatment, but to be able to participate effectively,
they must have information about their likely results in terms they can understand. At least in the
industrialized world, most patients have received enough schooling to understand basic
percentages and the meaning of probabilities. The odds ratio provides information that both
clinicians and their patients can use for decision-making.
Odds ratios are one of a category of statistics clinicians often use to make treatment decisions.
Other statistics commonly used to make treatment decisions include risk assessment statistics
such as absolute risk reduction and relative risk reduction statistics. The odds ratio supports
clinical decisions by providing information on the odds of a particular outcome relative to the odds
of another outcome. For example, the risk (or odds) of dying if treated with a new
drug can be expressed relative to the risk (odds) of dying if treated with the standard treatment protocol.
Relative risk assessment statistics are particularly suited to diagnostic and treatment decision-making.
Assessment:
Offline Learners: Do the activity below and submit it to your instructor via email.
Online Learners: Do the activity posted in Google Classroom
1. The following is the abstract of a paper (Illi et al., 2001). Read and understand the case
and answer the questions that follow.
Objective: To investigate the association between early childhood infections and subsequent
development of asthma.
Design: Longitudinal birth cohort study.
Setting: Five children's hospitals in five German cities.
Participants: 1314 children born in 1990 followed from birth to the age of 7 years.
Main outcome measures: Asthma and asthmatic symptoms assessed longitudinally by parental
questionnaires; atopic sensitisation assessed longitudinally by determination of IgE
concentrations to various allergens; bronchial hyperreactivity assessed by bronchial histamine
challenge at age 7 years.
Results: Compared with children with 1 episode of runny nose before the age of 1 year, those
with 2 episodes were less likely to have a doctor's diagnosis of asthma at 7 years old (odds ratio
0.52 (95% confidence interval 0.29 to 0.92)) or to have wheeze at 7 years old (0.60 (0.38 to 0.94)),
and were less likely to be atopic before the age of 5 years. Similarly, having 1 viral infection of the
herpes type in the first 3 years of life was inversely associated with asthma at age 7 (odds ratio
0.48 (0.26 to 0.89)). Repeated lower respiratory tract infections in the first 3 years of life showed
a positive association with wheeze up to the age of 7 years (odds ratio 3.37 (1.92 to 5.92) for 4
infections v 3 infections).
Conclusion: Repeated viral infections other than lower respiratory tract infections early in
life may reduce the risk of developing asthma up to school age.
a) What is meant by odds ratio 0.52 for runny nose and asthma and what does it tell us?
b) What is meant by 95% confidence interval 0.29 to 0.92 and what further information
does this provide?
c) What is meant by odds ratio 3.37 (1.92 to 5.92) for lower respiratory tract infections and
wheeze?
d) On a less statistical point, what is wrong with the way the conclusion is phrased?

2. Treatment of acute otitis media. A trial on the treatment of otitis media studied clearance
of infection within 14 days of treatment in two groups. Group 1 received cefaclor and group
2 received amoxicillin.
Cross-tabulated data are shown below.
AB       1     2     TOTAL
1        89    61    150
2        56    72    128
TOTAL    145   133   278
a) Calculate the incidence of the clearance of infection in each ear.
b) Calculate the incidence proportion ratio associated with cefaclor. Interpret this statistic.
References:
LABORATORY 10
ODDS RATIO
I. BACKGROUND

Odds ratio is a measure of association that provides the strength and direction of the association
between exposure and outcome in a population. The results of the odds ratio are interpreted in
the same manner as the relative risk.
$$\text{Exposure odds ratio} = \frac{a/c}{b/d}$$
$$\text{Outcome odds ratio} = \frac{a/b}{c/d}$$
II. OBJECTIVES
1. Accurately solve for the odds ratio given a set of data
2. Formulate a correct interpretation of the odds ratio
III. MATERIALS
Coupon bond
Calculator
Writing materials
IV. ACTIVITY
1. Using the table below, calculate and interpret:
a. the odds ratio for the relationship between NOT smoking and lung cancer
b. the odds ratio for the relationship between smoking and lung cancer
              Lung Cancer   No lung cancer   Total
Smoker        127           3,400            3,527
Non-smoker    45            3,100            3,145
Total         172           6,500            6,672
2. Create a table presenting the data, calculate the odds ratio, and give an appropriate
interpretation based on the data below:
200 pairs where the case is exposed and the control is not
50 pairs where the control is exposed and the case is not
130 pairs where cases and controls are exposed
85 pairs where the cases and controls are unexposed
3. An investigator conducts a study to determine whether there is an association between
caffeine intake and Parkinson’s disease. He assembles 230 incident cases of PD and
samples 455 controls from the general population. After interviewing all subjects, he
finds that 64 of the cases had high daily intake of caffeine (exposed) prior to diagnosis
and 277 of the controls had low daily intake of caffeine (unexposed) prior to the date of
the matched case’s diagnosis.
a. Assemble the 2x2 table for this study using the information given.
b. Calculate the odds of being a case among the exposed
c. Calculate the odds ratio for disease given exposure to high daily intake of
caffeine
LESSON 11
EPIDEMIOLOGY

1. To describe key features and applications of descriptive and analytic epidemiology.
2. To utilize basic statistical techniques in the analysis and presentation of health-related
data
Introduction
This lesson begins with the definition of epidemiology, an overview of its objectives, some
of the approaches used in epidemiology, and examples of the applications of epidemiology
to human health problems. It also discusses how diseases are transmitted. Diseases do
not arise in a vacuum; they result from an interaction of human beings with their
environment. An understanding of the concepts and mechanisms underlying the
transmission and acquisition of disease is critical to exploring the epidemiology of human
disease and to preventing and controlling many infectious diseases.
Lesson Proper:
What is epidemiology?
Epidemiology is the study of how disease is distributed in populations and the factors
that influence or determine this distribution. Why does a disease develop in some people
and not in others? The premise underlying epidemiology is that disease, illness, and ill
health are not randomly distributed in human populations. Rather, each of us has certain
characteristics that predispose us to, or protect us against, a variety of different diseases.
These characteristics may be primarily genetic in origin or may be the result of exposure to
certain environmental hazards. Perhaps most often, we are dealing with an interaction of
genetic and environmental factors in the development of disease.
A broader definition of epidemiology than that given above has been widely accepted. It
defines epidemiology as “the study of the distribution and determinants of health-related
states or events in specified populations and the application of this study to control of health
problems”. What is noteworthy about this definition is that it includes both a description of
the content of the discipline and the purpose or application for which epidemiologic
investigations are carried out.
What are the dynamics of disease transmission?
Human disease does not arise in a vacuum. It results from an interaction of the host (a
person), the agent (e.g., a bacterium), and the environment (e.g., a contaminated water
supply). Although some diseases are largely genetic in origin, virtually all disease results
from an interaction of genetic and environmental factors, with the exact balance differing for
different diseases. Many of the underlying principles governing the transmission of disease
are most clearly demonstrated using communicable diseases as a model. Hence, this
lesson primarily uses such diseases as examples in reviewing these principles. However,
the concepts discussed are also applicable to diseases that do not appear to be of
infectious origin.
Disease has been classically described as the result of an epidemiologic triad. According to
this diagram, it is the product of an interaction of the human host, an
infectious or other type of agent, and the environment that promotes the exposure. A
vector, such as the mosquito or the deer tick, is often involved. For such an interaction to
take place, the host must be susceptible. Human susceptibility is determined by a variety of
factors including genetic background and nutritional and immunologic characteristics. The
immune status of an individual is determined by many factors including prior experience
both with natural infection and with immunization.
The factors that can cause human disease include biologic, physical, and chemical factors
as well as other types, such as stress, that may be harder to classify.

Modes of Transmission
Diseases can be transmitted directly or indirectly. For example, a disease can be
transmitted person to person (direct transmission) by means of direct contact. Indirect
transmission can occur through a common vehicle such as a contaminated air or water
supply, or by a vector such as the mosquito.
An infected individual can transmit influenza or the common cold to others in the course
of an hour in a crowded room. A venereal infection also must spread progressively from
person to person if it is to maintain itself in nature, but it would be a formidable task to transmit
venereal infection on such a scale.
Thus, different organisms spread in different ways, and the potential of a given organism
for spreading and producing outbreaks depends on the characteristics of the organism, such
as its rate of growth and the route by which it is transmitted from one person to another.
Clinical and Subclinical Disease
It is important to recognize the broad spectrum of disease severity, which is illustrated
through the iceberg concept of disease. Just as most of an iceberg is underwater and
hidden from view with only its tip visible, so it is with disease: only clinical illness is readily
apparent. But infections without clinical illness are important, particularly in the web of
disease transmission, although they are not visible clinically. The iceberg concept is
important because it is not sufficient to count only the clinically apparent cases we see; for
example, most cases of polio in prevaccine days were subclinical, but they were still
capable of spreading the virus. The epidemiology of polio cannot be explained without a
recognition and assessment of the pool of inapparent cases.
As clinical and biologic knowledge has increased over the years, so has our ability to distinguish
different stages of disease. These include clinical and nonclinical disease:
Clinical Disease: characterized by signs and symptoms.
Nonclinical (Inapparent) Disease
Nonclinical disease may include the following:
1. Preclinical Disease
➢ Disease that is not yet clinically apparent, but is destined to progress to clinical
disease.
2. Subclinical Disease
➢ Disease that is not clinically apparent and is not destined to become clinically
apparent. This type of disease is often diagnosed by serologic (antibody) response
or culture of the organism.
3. Persistent (Chronic) Disease
➢ A person fails to “shake off” the infection, and it persists for years, at times for life.
In recent years, an interesting phenomenon has been the manifestation of
symptoms many years after an infection was thought to have been resolved. Some
adults who recovered from poliomyelitis in childhood are now reporting severe
fatigue and weakness; this has been called post-polio syndrome in adult life. These
have thus become cases of clinical disease, albeit somewhat different from the
initial illness.
4. Latent Disease
➢ An infection with no active multiplication of the agent, as when viral nucleic acid is
incorporated into the nucleus of a cell as a provirus. In contrast to persistent
infection, only the genetic message is present in the host, not the viable organism.

CARRIER STATUS
In this situation, the individual harbors the organism, but is not infected as measured by
serologic studies (no evidence of an antibody response) or by evidence of clinical illness.
This person can still infect others, although the infectivity is often lower than with other
infections. Carrier status may be of limited duration or may be chronic, lasting for months or
years. One of the best-known examples of a long-term carrier was Typhoid Mary, who carried
Salmonella typhi and died in 1938. Over a period of many years, she worked as a cook in the
New York City area, moving from household to household under different names. She was
considered to have caused at least 10 typhoid fever outbreaks that included 51 cases and 3
deaths.
ENDEMIC, EPIDEMIC, AND PANDEMIC
Three other terms need to be defined: endemic, epidemic, and pandemic. Endemic is
defined as the habitual presence of a disease within a given geographic area. It may also
refer to the usual occurrence of a given disease within such an area. Epidemic is defined as
the occurrence in a community or region of a group of illnesses of similar nature, clearly in
excess of normal expectancy, and derived from a common or from a propagated source.
Pandemic refers to a worldwide epidemic.

DISEASE OUTBREAKS
Suppose that a food becomes contaminated with a microorganism. If an outbreak occurs
in the group of people who have eaten the food, it is called a common-vehicle exposure,
because all the cases that developed were in persons exposed to the food in question. The
food may be served only once, for example, at a catered luncheon, resulting in a single
exposure to the people who eat it, or the food may be served more than once, resulting in
multiple exposures to people who eat it more than once. When a water supply is
contaminated with sewage because of leaky pipes, the contamination can be either periodic,
causing multiple exposures as a result of changing pressures in the water supply system that
may cause intermittent contamination, or continuous, in which a constant leak leads to
persistent contamination. The epidemiologic picture that is manifested depends on whether
the exposure is single, multiple, or continuous.
For purposes of this discussion, the focus will be on the single-exposure, common-vehicle
outbreak because the issues discussed are most clearly seen in this type of outbreak. What
are the characteristics of such an outbreak? First, such outbreaks are explosive; there is a
sudden and rapid increase in the number of cases of a disease in a population. Second, the
cases are limited to people who share the common exposure. This is self-evident, because
in the first wave of cases we would not expect the disease to develop in people who were not
exposed unless there were another source of the disease in the community. Third, in a food-
borne outbreak, cases rarely occur in persons who acquire the disease from a primary case.
The reason for the relative rarity of such secondary cases in this type of outbreak is not well
understood.
DETERMINANTS OF DISEASE OUTBREAKS
The amount of disease in a population depends on a balance between the number of
people in that population who are susceptible, and therefore at risk for the disease, and the
number of people who are not susceptible, or immune, and therefore not at risk. They may
be immune because they have had the disease previously or because they have been
immunized. They also may not be susceptible on a genetic basis. Clearly, if the entire
population is immune, no epidemic will develop. But the balance is usually struck somewhere
in between immunity and susceptibility, and when it moves toward susceptibility, the
likelihood of an outbreak increases. This has been observed particularly in formerly isolated
populations who were exposed to disease. For example, in the 19th century, Panum
observed that measles occurred in the Faroe Islands in epidemic form when infected
individuals entered the isolated and susceptible population. In another example, severe
outbreaks of streptococcal sore throats developed when new susceptible recruits arrived at
the Great Lakes Naval Station.
INCUBATION PERIOD
The incubation period is defined as the interval from receipt of infection to the time of onset
of clinical illness. If a person becomes infected today, the disease may not develop
for a number of days or weeks. During this time, the incubation
period, the person feels completely well and shows no signs of the disease.

Why doesn't disease develop immediately at the time of infection? What accounts for the
incubation period? It may reflect the time needed for the organism to replicate sufficiently
until it reaches the critical mass needed for clinical disease to result. It probably also relates
to the site in the body at which the organism replicates—whether it replicates superficially,
near the skin surface, or deeper in the body. The dose of the infectious agent received at the
time of infection may also influence the length of the incubation period. With a large dose,
the incubation period may be shorter.
The incubation period is also of historical interest because it is related to what may have
been the only medical advance associated with the Black Death in Europe. In 1374, when
people were terribly frightened of the Black Death, the Venetian Republic appointed three
officials who were to be responsible for inspecting all ships entering the port and for excluding
ships that had sick people on board. It was hoped that this intervention would protect the
community. In 1377, in the Italian seaport of Ragusa, travelers were detained in an isolated
area for 30 days (trentini giorni) after arrival to see whether infection developed. This period
was found to be insufficient, and the period of detention was lengthened to 40 days (quarante
giorni). This is the origin of the word quarantine.
How long should a person be isolated? The person should be isolated until he or she is no
longer infectious to others. When a person is clinically ill, there is generally a clear sign of
potential infectiousness. An important problem arises before the person becomes clinically
ill, that is, during the incubation period. If we know when a person became
infected and also know the general length of the incubation period for the disease, the
infected person should be isolated during this period to prevent the transmission of the
disease to others. In most situations, however, it is difficult to know that a person has been
infected, and nobody may know until signs of clinical disease become manifest.
This leads to an important question: Is it worthwhile to quarantine (isolate) a patient,
such as a child with chickenpox? The problem is that, during at least part of the incubation
period, when a person is still free of clinical illness, he or she can transmit the disease to
others. Thus, we have people who are not (yet) clinically ill, but who have been infected and
are able to transmit the disease. For many common childhood diseases, by the time clinical
disease develops in the child, he or she has already transmitted the disease to others.
Therefore, isolating such a person at the point at which he or she becomes clinically ill will
not necessarily be effective. On the other hand, isolation can be very valuable. In February
2003 a serious respiratory illness was first reported in Asia (having occurred in 2002) and
was termed severe acute respiratory syndrome (SARS). The disease is characterized by
fever over 38°C, headache, overall discomfort, and, after 2 to 7 days, development of cough
and difficulty in breathing in some patients. The cause of SARS has been shown to be
infection with a previously unrecognized human coronavirus, called SARS-associated
coronavirus.
SARS appears to spread by close, person-to-person contact. Because modern travel,
particularly air travel, facilitates rapid and extensive spread of disease, within a few months
the illness had spread to more than two dozen countries in North America, South America,
Europe, and Asia. However, by late July 2003, no new cases were being reported and the
outbreak was considered contained.

ATTACK RATE
An attack rate is defined as:
\[ \text{Attack rate} = \frac{\text{Number of people at risk in whom a certain illness develops}}{\text{Total number of people at risk}} \]
The attack rate is similar to the incidence rate, which is also used for less acute diseases.
The attack rate (or the incidence rate) is useful for comparing the risk of disease in groups
with different exposures. The attack rate can be specific for a given exposure. For example,
the attack rate in people who ate a certain food is called a food-specific attack rate. It is
calculated by:
\[ \text{Food-specific attack rate} = \frac{\text{Number of people who ate a certain food and became ill}}{\text{Total number of people who ate that food}} \]
In general, time is not explicitly specified in an attack rate; given what is usually known
about how long after an exposure most cases develop, the time period is implicit in the attack
rate.
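The calculation can be sketched in a few lines of Python, taking the attack rate as the number of people who become ill divided by the number of people at risk. The function name, the food names, and all of the counts below are hypothetical and serve only to illustrate the arithmetic.

```python
def attack_rate(ill, at_risk):
    """Proportion of the people at risk in whom the illness develops."""
    if at_risk <= 0:
        raise ValueError("at_risk must be positive")
    return ill / at_risk


# Hypothetical outbreak: 200 people attended a luncheon and 50 became ill
overall = attack_rate(50, 200)

# Hypothetical food-specific counts: (ill, total) among those who ate each food
foods = {"chicken salad": (40, 100), "rice": (25, 125)}
food_specific = {food: attack_rate(ill, total) for food, (ill, total) in foods.items()}

print(f"Overall attack rate: {overall:.1%}")
for food, rate in food_specific.items():
    print(f"Attack rate among those who ate {food}: {rate:.1%}")
```

In this made-up example, the attack rate is higher among those who ate the chicken salad than among those who ate the rice, which is the kind of comparison a food-specific attack rate is meant to support.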
A person who acquires the disease from that exposure (e.g., from a contaminated food)
is called a primary case. A person who acquires the disease from exposure to a primary case
is called a secondary case. The secondary attack rate is therefore defined as the attack rate
in susceptible people who have been exposed to a primary case. It is a good measure of
person-to-person spread of disease after the disease has been introduced into a population,
and it can be thought of as a ripple moving out from the primary case. We often calculate the
secondary attack rate in family members of the index case.
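A minimal Python sketch of a household secondary attack rate follows. It assumes one primary case per household and removes the primary case and any already-immune members from the denominator, so that only susceptible contacts are counted. The household data and the function name are hypothetical.

```python
def secondary_attack_rate(secondary_cases, susceptible_contacts):
    """Attack rate among susceptible people who were exposed to a primary case."""
    if susceptible_contacts <= 0:
        raise ValueError("susceptible_contacts must be positive")
    return secondary_cases / susceptible_contacts


# Hypothetical households, each with exactly one primary case.
# Each tuple is (household size, immune members, secondary cases).
households = [(5, 1, 2), (4, 0, 1), (6, 2, 0)]

# Susceptible contacts exclude the primary case and anyone already immune
susceptible = sum(size - 1 - immune for size, immune, _ in households)
secondary = sum(cases for _, _, cases in households)

sar = secondary_attack_rate(secondary, susceptible)
print(f"{secondary} secondary cases among {susceptible} susceptible contacts: {sar:.1%}")
```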
The secondary attack rate also has application in noninfectious diseases when family
members are examined to determine the extent to which a disease clusters among first-
degree relatives of an index case, which may yield a clue regarding the relative contributions
of genetic and environmental factors to the cause of a disease.
Synthesis:
This lesson reviewed some basic concepts that underlie the epidemiologic approach to
acute communicable diseases. Many of these concepts apply equally well to nonacute
diseases that at this time do not appear to be infectious in origin. Moreover, for an increasing
number of chronic diseases originally thought to be noninfectious, infection seems to play
some role. Thus, hepatitis B infection is a major cause of primary liver cancer.
Papillomaviruses have been implicated in cervical cancer, and Epstein-Barr virus has been
implicated in Hodgkin disease. The boundary between the epidemiology of infectious and
noninfectious diseases has blurred in many areas. In addition, even for diseases that are not
infectious in origin, the patterns of spread share many of the same dynamics, and the
methodologic issues in studying them are similar.
For additional materials you may go to the following links below:

https://www.youtube.com/watch?v=r9poHB-ldqk
https://www.youtube.com/watch?v=6izxFrf8aBs
Assessment:
QUIZ: A quiz will be given through Quizziz or Google Forms; for students with weak connectivity, it will be sent
through email.
SEATWORK (15 points):

• Answer in Word format on letter-sized paper with 1-inch margins on all sides. Submit your
work to the Google Classroom assigned to the class.
• Compute and show all solutions (1 point for the formula, 1 point for the substitution of values, 1 point for the answer).
1. In an outbreak of cholera, there were 120 cases among a population at risk of 2,400. Compute
the attack rate of the disease. (3 points)
2. After a party attended by 80 people, 25 individuals became ill. All 80 people were
interviewed about their food consumption at the dinner. The interviews show that 18 of the
25 people who are ill and 25 of the 55 who are healthy ate fish. Show the 2x2 table (6
points) and compute the:
a) Attack rate among the individuals who ate fish (3 points)
b) Attack rate among the individuals who did not eat fish (3 points)
References:
Dawson, B. and Trapp, R. (2004) Basic & Clinical Biostatistics 4th Edition. McGraw-Hill.
Gordis, L. (2008) Epidemiology 4th Edition. Saunders, an imprint of Elsevier Inc.

LABORATORY 11
EPIDEMIOLOGY
I. BACKGROUND
Diseases have always been around, many in the same form as we see today. Every disease
has its own pattern, usually described by who is affected, where it occurs, and when it occurs.
There is a collection of selected models of health and disease that can be used in assessing, planning, and
decision making for the community. However, no single model is complete enough to capture the
totality of community and public health. Several theories and models support the practice
of health promotion and disease prevention. Theories and models are used in program planning to
understand and explain health behavior and to guide the identification, development, and
implementation of interventions.
When identifying a theory or model to guide health promotion or disease prevention
programs, it is important to consider a range of factors, such as the specific health problem being
addressed, the population(s) being served, and the contexts within which the program is being
implemented. Health promotion and disease prevention programs typically draw from one or more
theories or models.
II. OBJECTIVES
1. Explain the concepts of community health and disease
2. Illustrate concepts of community organization
3. Explain the different models of health and disease
III. MATERIALS
Powerpoint presentation
References
IV. ACTIVITY
1. Students are assigned one model of community health
a) epidemiology triangle/epidemiological triad of disease
b) health field model
c) health belief model
d) iceberg theory of disease
e) model of health and community ecosystem
f) medical model
g) biopsychosocial model
h) salutogenic model
i) stages of change model (transtheoretical model)
j) theory of reasoned action/planned behavior
2. The models are to be presented in class using a group-prepared PowerPoint, through video
presentation or Google Meet
3. Report should cover the following details
a) define the model of health
b) identify the major components of the model
c) suggest ways on how the model could be used to address a particular health concern
d) identify the advantages and disadvantages of the model
e) evaluate the model with regard to its likely success in various situations
V. QUESTIONS FOR RESEARCH

1. Describe the concept of natural history of disease
2. Describe disease occurrence in terms of person, place and time
3. List the guidelines used to assess causality in epidemiological studies of infectious and
noninfectious diseases
4. Differentiate primary, secondary, and tertiary prevention using a table

LESSON 12
DESCRIPTIVE EPIDEMIOLOGY

1. To utilize the basic terminology and definitions of epidemiology to define public health
problems in terms of magnitude, person, place, and time
2. To be able to identify the basic concepts and principles of epidemiology, vital
statistics, and population indicators
3. To be able to apply epidemiological concepts to diverse health problems and in various
health settings.
Introduction
A good news story is often said to cover the five W’s: what, who, where, when, and why/how.
Epidemiologists strive for similar comprehensiveness in characterizing an
epidemiologic event, whether it be a pandemic of influenza or a local increase in all-terrain
vehicle crashes. However, epidemiologists tend to use synonyms for the five W’s:
case definition, person, place, time, and causes/risk factors/modes of
transmission. Descriptive epidemiology covers time, place, and person.
This lesson also covers the different descriptive study designs used in
epidemiological research. Descriptive studies are observational studies that describe
the patterns of disease occurrence in relation to variables such as person, place, and
time. While descriptive studies can highlight associations between variables or between
exposure and outcome variables, they cannot establish causality.
Lesson Proper:
I. DESCRIPTIVE EPIDEMIOLOGY ACCORDING TO PERSON, PLACE and TIME
Compiling and analyzing data by time, place, and person is desirable for several
reasons.
First, by looking at the data carefully, the epidemiologist becomes very familiar
with the data. He or she can see what the data can or cannot reveal based on the
variables available, its limitations (for example, the number of records with missing
information for each important variable), and its eccentricities (for example, all cases
range in age from 2 months to 6 years, plus one 17-year-old).
Second, the epidemiologist learns the extent and pattern of the public health
problem being investigated — which months, which neighborhoods, and which groups of
people have the most and least cases.
Third, the epidemiologist creates a detailed description of the health of a
population that can be easily communicated with tables, graphs, and maps.
Fourth, the epidemiologist can identify areas or groups within the population that
have high rates of disease. This information in turn provides important clues to the
causes of the disease, and these clues can be turned into testable hypotheses.
Time
The occurrence of disease changes over time. Some of these changes occur
regularly, while others are unpredictable. Two diseases that occur during the same
season each year include influenza (winter) and West Nile virus infection (August–
September). In contrast, diseases such as hepatitis B and salmonellosis can occur at any
time. For diseases that occur seasonally, health officials can anticipate their occurrence
and implement control and prevention measures, such as an influenza vaccination
campaign or mosquito spraying. For diseases that occur sporadically, investigators can
conduct studies to identify the causes and modes of spread, and then develop
appropriately targeted actions to control or prevent further occurrence of the disease.
In either situation, displaying the patterns of disease occurrence by time is critical for
monitoring disease occurrence in the community and for assessing whether the public
health interventions made a difference.
Time data are usually displayed with a two-dimensional graph. The vertical or y-axis
usually shows the number or rate of cases; the horizontal or x-axis shows the time periods
such as years, months, or days. The number or rate of cases is plotted over time. Graphs
of disease occurrence over time are usually plotted as line graphs or histograms.
Sometimes a graph shows the timing of events that are related to disease trends
being displayed. For example, the graph may indicate the period of exposure or the date
control measures were implemented. Studying a graph that notes the period of exposure
may lead to insights into what may have caused illness. Studying a graph that notes the
timing of control measures shows what impact, if any, the measures may have had on
disease occurrence.
As noted above, time is plotted along the x-axis. Depending on the disease, the time
scale may be as broad as years or decades, or as brief as days or even hours of the
day. For some conditions — many chronic diseases, for example — epidemiologists
tend to be interested in long-term trends or patterns in the number of cases or the rate.
For other conditions, such as foodborne outbreaks, the relevant time scale is likely to be
days or hours.
Secular (long-term) trends. Graphing the annual cases or rate of a disease over a
period of years shows long-term or secular trends in the occurrence of the disease.
Health officials use these graphs to assess the prevailing direction of disease occurrence
(increasing, decreasing, or essentially flat), help them evaluate programs or make policy
decisions, infer what caused an increase or decrease in the occurrence of a disease
(particularly if the graph indicates when related events took place), and use past trends
as a predictor of future incidence of disease.
Seasonality. Disease occurrence can be graphed by week or month over the course
of a year or more to show its seasonal pattern, if any. Some diseases such as influenza
and West Nile infection are known to have characteristic seasonal distributions.
Seasonal patterns may suggest hypotheses about how the infection is transmitted, what
behavioral factors increase risk, and other possible contributors to the disease or
condition.
Day of week and time of day. For some conditions, displaying data by day of the
week or time of day may be informative. Analysis at these shorter time periods is
particularly appropriate for conditions related to occupational or environmental
exposures that tend to occur at regularly scheduled intervals.
Epidemic period. To show the time course of a disease outbreak or epidemic,
epidemiologists use a graph called an epidemic curve. As with the other graphs
presented so far, an epidemic curve’s y-axis shows the number of cases, while the x-axis
shows time as either date of symptom onset or date of diagnosis. Depending on the
incubation period (the length of time between exposure and onset of symptoms) and
routes of transmission, the scale on the x-axis can be as broad as weeks (for a very
prolonged epidemic) or as narrow as minutes (e.g., for food poisoning by chemicals that
cause symptoms within minutes). Conventionally, the data are displayed as a histogram
(which is similar to a bar chart but has no gaps between adjacent columns).
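The counts behind an epidemic curve can be sketched in Python as follows. This is only an illustration: the onset dates are hypothetical, and the curve is printed sideways as text (one row per day) rather than drawn as a chart. Days with no cases are kept so that the time axis has no gaps.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical line list: date of symptom onset for each case
onset_dates = [
    date(2024, 3, 1), date(2024, 3, 2), date(2024, 3, 2), date(2024, 3, 3),
    date(2024, 3, 3), date(2024, 3, 3), date(2024, 3, 4), date(2024, 3, 6),
]

cases_per_day = Counter(onset_dates)

# Walk every day in the range so that days with zero cases still appear
first_day, last_day = min(onset_dates), max(onset_dates)
curve = []
day = first_day
while day <= last_day:
    curve.append((day, cases_per_day.get(day, 0)))
    day += timedelta(days=1)

# Text histogram: one row per day, one mark per case
for day, count in curve:
    print(f"{day.isoformat()} | {'#' * count} ({count})")
```

The same daily counts could be passed to any plotting tool to draw the histogram with time on the x-axis and number of cases on the y-axis.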
Place
Describing the occurrence of disease by place provides insight into the geographic
extent of the problem and its geographic variation. Characterization by place refers not
only to place of residence but to any geographic location relevant to disease occurrence.
Such locations include place of diagnosis or report, birthplace, site of employment,
school district, hospital unit, or recent travel destinations. The unit may be as large as a
continent or country or as small as a street address, hospital wing, or operating room.
Sometimes place refers not to a specific location at all but to a place category such as
urban or rural, domestic or foreign, and institutional or noninstitutional.
Place data can be shown through a table but a map provides a more striking visual
display of place data. On a map, different numbers or rates of disease can be depicted
using different shadings, colors, or line patterns.

Another type of map for place data is a spot map. Spot maps generally are used for
clusters or outbreaks with a limited number of cases. A dot or X is placed on the location
that is most relevant to the disease of interest, usually where each victim lived or
worked, just as John Snow did in his spot map of the Golden Square area of London. If
known, sites that are relevant, such as probable locations of exposure, are usually noted
on the map.
Analyzing data by place can identify communities at increased risk of disease. Even
if the data cannot reveal why these people have an increased risk, it can help generate
hypotheses to test with additional studies. For example, is a community at increased risk
because of characteristics of the people in the community such as genetic susceptibility,
lack of immunity, risky behaviors, or exposure to local toxins or contaminated food? Can
the increased risk, particularly of a communicable disease, be attributed to
characteristics of the causative agent such as a particularly virulent strain, hospitable
breeding sites, or availability of the vector that transmits the organism to humans? Or
can the increased risk be attributed to the environment that brings the agent and the host
together, such as crowding in urban areas that increases the risk of disease
transmission from person to person, or more homes being built in wooded areas close to
deer that carry ticks infected with the organism that causes Lyme disease?
Person
“Person” attributes include age, sex, ethnicity/race, and socioeconomic status.
Because personal characteristics may affect illness, organization and analysis of
data by “person” may use inherent characteristics of people (for example, age, sex,
race), biologic characteristics (immune status), acquired characteristics (marital status),
activities (occupation, leisure activities, use of medications/tobacco/drugs), or the
conditions under which they live (socioeconomic status, access to medical care). Age
and sex are included in almost all data sets and are the two most commonly analyzed
“person” characteristics. However, depending on the disease and the data available,
analyses of other person variables are usually necessary. Usually epidemiologists begin
the analysis of person data by looking at each variable separately. Sometimes, two
variables such as age and sex can be examined simultaneously. Person data are usually
displayed in tables or graphs.
Age. Age is probably the single most important “person” attribute, because almost
every health-related event varies with age. A number of factors that also vary with age
include: susceptibility, opportunity for exposure, latency or incubation period of the
disease, and physiologic response (which affects, among other things, disease
development).
When analyzing data by age, epidemiologists try to use age groups that are narrow
enough to detect any age-related patterns that may be present in the data. For some
diseases, particularly chronic diseases, 10-year age groups may be adequate. For other
diseases, 10-year and even 5-year age groups conceal important variations in disease
occurrence by age.
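The effect of age-group width can be illustrated with a short Python sketch. The ages below are hypothetical, and the helper names `age_group` and `tabulate` are introduced here only for the example.

```python
from collections import Counter


def age_group(age, width):
    """Return a label such as '10-19' for the age band of the given width."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def tabulate(ages, width):
    """Count cases per age band, ordered from the youngest to the oldest band."""
    counts = Counter(age_group(age, width) for age in ages)
    return dict(sorted(counts.items(), key=lambda item: int(item[0].split("-")[0])))


# Hypothetical ages (in years) of reported cases
ages = [0, 1, 1, 2, 2, 2, 3, 4, 7, 9, 12, 17]

ten_year = tabulate(ages, 10)
five_year = tabulate(ages, 5)

print("10-year groups:", ten_year)
print(" 5-year groups:", five_year)
```

In this made-up data set, the 10-year grouping places almost all cases in a single band, while the 5-year grouping shows that most of them are children under 5 years of age.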
Sex. Males have higher rates of illness and death than do females for many
diseases. For some diseases, this sex-related difference is because of genetic,
hormonal, anatomic, or other inherent differences between the sexes. These inherent
differences affect susceptibility or physiologic responses.
Ethnic and racial groups. Sometimes epidemiologists are interested in analyzing

person data by biologic, cultural or social groupings such as race, nationality, religion, or
social groups such as tribes and other geographically or socially isolated groups.
Differences in racial, ethnic, or other group variables may reflect differences in
susceptibility or exposure, or differences in other factors that influence the risk of
disease, such as socioeconomic status and access to health care.
Socioeconomic status. Socioeconomic status is difficult to quantify. It is made up of
many variables such as occupation, family income, educational achievement or census
tract, living conditions, and social standing. The variables that are easiest to measure
may not accurately reflect the overall concept. Nevertheless, epidemiologists commonly
use occupation, family income, and educational achievement, while recognizing that
these variables do not measure socioeconomic status precisely.
The frequency of many adverse health conditions increases with decreasing
socioeconomic status. For example, tuberculosis is more common among persons in
lower socioeconomic strata. Infant mortality and time lost from work due to disability are
both associated with lower income. These patterns may reflect more harmful exposures,
lower resistance, and less access to health care. Or they may in part reflect an
interdependent relationship that is impossible to untangle: Does low socioeconomic
status contribute to disability, or does disability contribute to lower socioeconomic status,
or both? What accounts for the disproportionate prevalence of diabetes and asthma in
lower socioeconomic areas?
A few adverse health conditions occur more frequently among persons of higher
socioeconomic status. Gout was known as the “disease of kings” because of its
association with consumption of rich foods. Other conditions associated with higher
socioeconomic status include breast cancer, Kawasaki syndrome, chronic fatigue
syndrome, and tennis elbow. Differences in exposure account for at least some if not
most of the differences in the frequency of these conditions.
II. DESCRIPTIVE STUDIES IN EPIDEMIOLOGY
What are descriptive studies?
Descriptive studies are observational studies which describe the patterns of
disease occurrence in relation to variables such as person, place and time. They are
often the first step or initial enquiry into a new topic, event, disease or condition.
Descriptive studies can be divided into two groups: those that emphasize features
of a new condition and those that describe the health status of communities or
populations. Case reports, case-series reports, before-and-after studies, cross-sectional
studies and surveillance studies deal with individuals. Ecological studies examine
populations. Common misuses of descriptive studies include the lack of a clear, specific
and reproducible case definition and the claim of a causal relationship that the data
cannot support. Descriptive studies can highlight associations between variables
or between exposure and outcome variables, but they cannot establish causality. Because
descriptive studies do not have a comparison (control) group, they do not allow
inferences to be drawn about associations, causal or otherwise. However, they can
suggest hypotheses which can be tested in analytical observational studies.
Uses of Descriptive Studies

1. Health care planning- Descriptive studies provide knowledge about which populations
or subgroups are most or least affected by disease. This enables public health
administrators to target particular segments of the population for education or
prevention programs and can help allocate resources more efficiently.
2. Hypothesis generation- Descriptive studies identify descriptive characteristics.
This is frequently an important first step in the search for determinants or risk
factors that can be altered or eliminated to reduce or prevent disease.
3. Trend Analysis - Time-trend analysis is a longitudinal descriptive study that can
provide a dynamic view of a population's health status. Data is collected over time,
place and person to look for trends and changes.
Types of Descriptive Studies
Case reports
➢ Case reports describe the experience of a single patient or a group of
patients with a similar diagnosis. These types of studies typically depict an
observant clinician identifying an unusual feature of a disease or a patient's
history. They can represent the first clues in the identification of new diseases
or adverse effects of an exposure. A case report can prompt further
investigations with more rigorous study design. Case reports are quite
common in medical journals. A systematic review found that they accounted
for over one third of all articles published. They are useful to public health as
they can provide an interface between clinical medicine and epidemiology.
Case Series
➢ A case series is a report that describes clinical findings seen in a succession
of patients who seem to display a similar condition or an outcome of interest.
Another way of defining a case series is that case series are collections of
individual case reports which may occur within a fairly short period of time
and these are aggregated into one publication. No control group is involved.
Something unexpected has been observed - e.g. more cases than usual of a
rare disorder or new signs and symptoms of an emerging disease - hence the
motivation to write it up and share it with the wider clinical community.
➢ This study design has historical importance in epidemiology. It was often
used as an early means to identify the beginning or presence of an epidemic.
Even now, the routine surveillance of accumulating case reports often
suggests the emergence of a new disease or epidemic. A convenient feature
of case-series is that they can provide a case group for a case-control study.
An advantage of case series over case report is that a case series can help
formulate a new and useful hypothesis rather than merely documenting an
interesting medical oddity. However, its disadvantage is that it cannot be
used to test for the presence of a valid statistical association.
Cross-sectional (Prevalence) Study
➢ This is the observation of a defined population at a single point in time or time
interval. Exposure and outcome are determined simultaneously. The cross-
sectional study describes the presence and/or absence of various clinical
features, so it provides a cross-sectional comparison. This means that costs
are small and loss to follow up is not a problem. However, because exposure
and outcome are measured at the same time point, the temporal sequence is
often impossible to determine. Sometimes the cross-sectional study can be
considered an analytic study, when it is used to test an epidemiologic
hypothesis. This can only occur when the current values of the exposure
variables are unaltered over time, thus representing the value present at the
initiation of the disease. For example, factors at birth.
➢ The cross-sectional survey is sometimes referred to as a prevalence study
and it can survey or assess the health status of a population - e.g. Health
Survey for England. A survey can be defined as a special inquiry which
collects planned information from individuals (usually a sample) about their
history, habits, knowledge, attitudes or behaviour. The principles involved
include sampling, instrument design, non-response and accuracy. Reasons
for non-response incorporate the effect of the topic, study design (postal,
telephone or face-to-face interviews), age, sex, social class, urban/rural
location and general attitudes to surveys.
➢ It is worth noting that the term 'cross-sectional' study is also used in social
research. Here, the cross-sectional study refers to a snapshot of a population
at a particular point in time. This contrasts with longitudinal studies which
follow a population over a period of time (i.e. cohort and panel), with cross-
comparative, where one population is compared with another within the same
country and cross-national, where one country population is compared with
other countries.
Ecological Study (or Ecological Correlational Study)
➢ Ecological correlational studies look for associations between exposures and
outcomes in populations rather than in individuals. They use data that have
already been collected. (This could be argued to be a form of what social
scientists call secondary statistical analysis.) The measure of association
between exposure and outcome is the correlation coefficient r. This is a
measure of how linear the relationship is between the exposure and outcome
variables. (Note that correlation is a specific form of association and
requires two continuous variables.)
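As an illustration of the correlation coefficient r, the following is a minimal sketch of the Pearson formula applied to population-level data. The function name `pearson_r` and the region-level figures are assumptions made for this example and do not come from any real study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for five regions (one value per population, not per person):
# average exposure level and disease rate per 100,000.
exposure = [2.0, 4.0, 6.0, 8.0, 10.0]
outcome = [12.0, 15.0, 21.0, 24.0, 28.0]

r = pearson_r(exposure, outcome)
print(f"r = {r:.3f}")
```

A value of r close to +1 or -1 indicates a nearly linear relationship between the regions' exposure levels and disease rates. Because each data point is a population, a strong r does not show that exposed individuals are the ones who became ill.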
III. GENERAL HEALTH AND POPULATION INDICATORS
Health is defined as “a state of complete physical, mental & social
well-being, and not merely an absence of disease or infirmity” (WHO). This
statement has been amplified to include the ability to lead a “socially and
economically productive life”. Health cannot be measured in exact terms.
Hence measurements have been framed in terms of illness (or lack of
health) and of the economic, occupational and domestic factors that promote ill health.
INDICATORS OF HEALTH
An indicator, also termed an index or variable, is only an indication of a given
situation or a reflection of that situation. A health indicator is a variable, susceptible
to direct measurement, that reflects the state of health of persons in a
community. Indicators help to measure the extent to which the objectives and
targets of a program are being attained. Health status indicators measure
different aspects of the health of a population. Health determinant indicators
measure things that influence health.
CHARACTERISTICS OF INDICATORS OF HEALTH

1. Valid- should actually measure what they are supposed to measure
2. Reliable- the results should be the same when measured by different people
in similar circumstances
3. Sensitive- should be sensitive to changes in the situation concerned
4. Specific- should reflect changes only in the situation concerned
5. Feasible – should have the ability to obtain data when needed
6. Relevant- should contribute to the understanding of the phenomenon of
interest.
USES OF INDICATORS OF HEALTH

✓ Measurement of the health of the community
✓ Description of the health of the community
✓ Comparison of the health of different communities
✓ Identification of health needs and prioritizing them
✓ Evaluation of health services
✓ Planning and allocation of health resources
✓ Measurement of health successes
CLASSIFICATION OF INDICATORS
1. Mortality indicators- crude death rate, expectation of life, infant mortality rate,
under 5 mortality rate, child mortality rate, maternal mortality ratio, disease
specific death rate, proportional mortality rate
2. Morbidity indicators- incidence rate, prevalence rate
3. Disability rates – event type and person type, Sullivan’s index, health adjusted
life expectancy, disability adjusted life years, quality adjusted life year
4. Nutritional indicators- BMI, growth monitoring
5. Health care delivery indicators
6. Utilization Rates
7. Indicators of social and mental health
8. Environmental indicators
9. Socio-economic indicators
10. Health policy indicators
11. Other indicators
Synthesis:
Descriptive epidemiology searches for patterns by examining characteristics of
person, place, and time. These characteristics are carefully considered when a disease
outbreak occurs, because they provide important clues regarding the source of the
outbreak. Descriptive epidemiology provides a way of organizing and analyzing data on
health and disease in order to understand variations in disease frequency geographically
and over time and how disease varies among people based on a host of personal
characteristics (person, place, and time). Epidemiology had its origins in the desire to
understand the determinants of acute infectious diseases, but its methods and applicability
have expanded to include chronic diseases as well.
Epidemiologic research uses different design strategies. It is important to note
that the research strategies/design can be broadly categorized according to whether
investigations focus on describing the distribution of disease or elucidating its
determinants. Each of the descriptive study designs discussed in this lesson provides
information on various characteristics of person, place or time, and each has unique
strengths and limitations.
For additional materials you may go to the following links below:

https://www.youtube.com/watch?v=Jd3gFT0-C4s
Assessment:
email
TASK 1: Complete the table below. Indicate references.
STUDY DESIGN | APPLICATION | STRENGTH/ADVANTAGES | WEAKNESS/DISADVANTAGES | STATISTICAL TOOL
CASE REPORT | | | |
CASE SERIES | | | |
CROSS SECTIONAL | | | |
ECOLOGICAL | | | |
TASK 2: Define the following. For numbers 5-11, give examples in addition to the definition.
1. Mortality indicators
➢ crude death rate
➢ expectation of life
➢ infant mortality rate
➢ under 5 mortality rate
➢ child mortality rate
➢ maternal mortality ratio
➢ disease specific death rate
➢ proportional mortality rate
2. Morbidity indicators
➢ incidence rate
➢ prevalence rate,
3. Disability rates
➢ event type and person type
➢ sullivan’s index
➢ health adjusted life expectancy
➢ disability adjusted life years
➢ quality adjusted life year
4. Nutritional indicators-
➢ BMI
➢ growth monitoring
5. Health care delivery indicators
6. Utilization Rates
7. Indicators of social and mental health
8. Environmental indicators
9. Socio-economic indicators
10. Health policy indicators
11. Other indicators
References:
Centers for Disease Control and Prevention. (2012). Principles of Epidemiology in Public Health
Practice, Third Edition: An Introduction to Applied Epidemiology and Biostatistics. Retrieved
from https://www.cdc.gov/csels/dsepd/ss1978/lesson1/section6.html on August 5, 2020
Grimes, D.A. & Schulz, K.F. (2002). Descriptive studies: what they can and cannot do. The
Lancet, 359, 145-49.
Heffernan, C. (n.d.). Descriptive Studies. Retrieved from http://www.drcath.net/toolkit/descriptive-studies
on August 5, 2020
Goodman RA, Smith JD, Sikes RK, Rogers DL, Mickey JL. Fatalities associated with farm
tractor injuries: an epidemiologic study. Public Health Rep 1985;100:329–33.
Heyman DL, Rodier G. Global surveillance, national surveillance, and SARS. Emerg Infect Dis.
2003;10:173–5.
American Cancer Society [Internet]. Atlanta: The American Cancer Society, Inc. Available from:
http://www.cancer.org/Research/CancerFactsFigures/cancer-facts-figures-2005/
Centers for Disease Control and Prevention. Current trends. Lung cancer and breast cancer
trends among women–Texas. MMWR 1984;33(MM19):266.
Liao Y, Tucker P, Okoro CA, Giles WH, Mokdad AH, Harris VB, et. al. REACH 2010
surveillance for health status in minority communities — United States, 2001–2002.
MMWR 2004;53:1–36.

LABORATORY 12
EPIDEMIOLOGIC MEASURES AND MEASURES OF DISEASE OCCURRENCE/
GENERAL HEALTH & POPULATION INDICATORS
I. BACKGROUND
Epidemiology is a quantitative field, so certain measures are essential for understanding the state of
health, illnesses, and health outcomes in a population. The essence of epidemiology is to measure disease
occurrence and make comparisons between population groups. The current section introduces the commonly
used measures that help our understanding of the distribution of disease in a given population.
$$\text{incidence rate} = \frac{\text{number of new cases over a time period}}{\text{total population at risk}} \times \text{multiplier (e.g., } 100{,}000)$$

$$\text{crude death rate} = \frac{\text{number of deaths in a given year}}{\text{reference population (during midpoint of the year)}} \times 100{,}000$$

$$\text{CFR (\%)} = \frac{\text{number of deaths due to disease ``x''}}{\text{number of cases of disease ``x''}} \times 100 \quad \text{(during a time period)}$$

$$\text{cause-specific rate} = \frac{\text{mortality (frequency of a given disease)}}{\text{population size at midpoint of the time period}} \times 100{,}000$$

$$\text{prevalence} = \frac{\text{\# of people in sample with characteristic}}{\text{total \# of people in sample}}$$

Population density

$$\text{population density} = \frac{\text{population}}{\text{area}}$$

Population growth rate

$$r\,\% = \frac{\text{crude births} - \text{crude deaths}}{10} \quad \text{OR} \quad r\,\% = \frac{\text{births} - \text{deaths}}{\text{total population}} \times 100$$

Doubling time of a population: The rule of 70. (This only applies if the population is growing exponentially.)

$$\text{doubling time } (dt) \text{ in years} = \frac{70}{r \text{ (in percent form)}} \quad \text{or} \quad \frac{0.7}{r \text{ (in decimal form)}}$$

Population from growth rate

$$\text{final population} = (\text{initial population}) \times (\text{growth rate})^{\text{years}}$$

NOTE: a growth rate of 3% is expressed as 1.03; a growth rate of 0.25% is 1.0025
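The measures defined above can be sketched as small Python functions. This is only an illustrative sketch: the function names are assumptions, and the figures for the imaginary town are made up so that the arithmetic is easy to check by hand. They are not taken from the activity below.

```python
def rate(events, population, multiplier=100_000):
    """Generic rate: events divided by population, times a multiplier."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events / population * multiplier


def case_fatality_rate(deaths, cases):
    """CFR in percent: deaths from a disease divided by cases of that disease."""
    return rate(deaths, cases, multiplier=100)


def prevalence(with_characteristic, sample_size):
    """Prevalence as a proportion (decimal) of the sample."""
    return rate(with_characteristic, sample_size, multiplier=1)


def population_density(population, area):
    """People per unit of area."""
    return population / area


def growth_rate_percent(births, deaths, total_population):
    """Population growth rate r, in percent per year."""
    return (births - deaths) / total_population * 100


def doubling_time_years(r_percent):
    """Rule of 70; assumes exponential growth with r given in percent."""
    if r_percent <= 0:
        raise ValueError("the rule of 70 needs a positive growth rate")
    return 70 / r_percent


def future_population(initial, r_percent, years):
    """Initial population times (1 + r) raised to the number of years."""
    return initial * (1 + r_percent / 100) ** years


# Hypothetical town (illustrative figures only).
population, area_km2 = 50_000, 25
new_cases, existing_cases = 40, 150
births, deaths = 900, 400
disease_deaths, disease_cases = 8, 200

incidence = rate(new_cases, population)        # per 100,000
crude_death_rate = rate(deaths, population)    # per 100,000
cfr = case_fatality_rate(disease_deaths, disease_cases)
prev = prevalence(existing_cases, population)
density = population_density(population, area_km2)
r = growth_rate_percent(births, deaths, population)
dt = doubling_time_years(r)
pop_in_10_years = future_population(population, r, 10)

print(incidence, crude_death_rate, cfr, prev, density, r, dt, round(pop_in_10_years))
```

Note that `future_population` converts a percent growth rate into the multiplier form described in the NOTE above (for example, 3% becomes 1.03) before raising it to the number of years.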
II. OBJECTIVES
1. Learn about commonly used epidemiological measurements to describe the occurrence of
disease
2. Accurately solve for the following epidemiological measurements
a) Incidence rate
b) Prevalence rate
c) Crude death rate
d) Crude birth rate
e) Case fatality rate
f) Cause-specific rate
g) Population density
h) Population growth rate
i) Doubling time of a population
j) Future population from growth rate
III. MATERIALS
Calculator
References
IV. ACTIVITY
1. Epidemiologists have noticed the cyclic occurrence of the dreaded polio. Children afflicted with
this disease exhibit fever, sore throat and stiffness. In Baguio, doctors have been monitoring the
situation. Their observation began with week 0, the first week in October. The data is shown in
the following table:
Polio Cases: Raw Data

a) For weeks 1-4, calculate the prevalence and incidence rates, and express them as decimals.
Polio: Incidence and Prevalence rates

b) From the data above, convert the incidence and prevalence rates to rates per 100 and fill in
the table below with your answers.
c) Consider the incidence rates you have calculated. Based on these data, when should
epidemiologists expect the greatest increase in polio?
2. Baguio City contains 2.3 million people and covers 800,000 square miles. In the year after the
last census, there were 109,000 new children born and 111,000 people died.
a) What is the current population density?
b) What are the birth and death rates?
c) What is the population growth rate (r)?
d) In how many years will the population of Baguio City double?
e) Given a 2010 world population growth rate of about 1.3% per year, how long would it take
the world’s population to double?
f) How old will you be when this doubling occurs?
g) If a country doubles its population in 56 years, what was its population growth rate during
that time?
3. Compute the cause-specific death rate from the following information: 6,309 colon cancer deaths
in the Philippines during calendar years 2015, 2016, and 2017; 38,128,753 is the sum of the estimated
2015, 2016, and 2017 midyear populations.
4. Compute the case fatality rate from the following information: 137 deaths due to HIV during
calendar year 2018, with an estimated 110,787 HIV-infected individuals in 2018.

1. Describe the concept of natural history of disease
a) Population size l) incidence
b) Population density m) prevalence
c) Population distribution n) crude rate
d) Age structure o) case fatality rate
e) Sex ratio p) cause specific rate
f) Exponential growth q) life expectancy
g) Biotic potential r) maternal mortality
h) Carrying capacity s) infant mortality rate
i) Endemic t) fetal mortality
j) Epidemic u) crude birth rate
k) Pandemic
2. Gather five years of data (2014, 2015, 2016, 2017, 2018) on natality, mortality, and morbidity from the DOH.
Present the data in table form, then analyze and interpret the table.

LESSON 13
ANALYTICAL and EXPERIMENTAL EPIDEMIOLOGY

1. To identify the design strategies and interpret statistical methods used in analytical
epidemiology
2. To identify experimental and observational studies in epidemiology
3. To analyze and interpret the data in experimental and observational studies in
epidemiology
Introduction
As noted in Lesson 12, descriptive epidemiology can identify patterns among cases and
in populations by time, place and person. From these observations, epidemiologists
develop hypotheses about the causes of these patterns and about the factors that increase
risk of disease. In other words, epidemiologists can use descriptive epidemiology to
generate hypotheses, but only rarely to test those hypotheses. For that, epidemiologists
must turn to analytic epidemiology.
Lesson Proper:
Analytic epidemiologic studies measure the association between a particular
exposure and a disease, using information collected from individuals, rather than from
the aggregate population. Exposure is defined broadly to include behavioral factors such
as smoking or diet, environmental pollutants such as asbestos, personal characteristics
such as obesity or tendency to sunburn, anthropometric measurements such as body
mass index, and genetic traits and other measurable biologic factors that may affect
cancer.
Analytic epidemiology is concerned with the search for causes and effects, or the
why and the how. Epidemiologists use analytic epidemiology to quantify the association
between exposures and outcomes and to test hypotheses about causal relationships. It
has been said that epidemiology by itself can never prove that a particular exposure
caused a particular outcome. Often, however, epidemiology provides sufficient evidence
to take appropriate control and prevention measures.

Epidemiologic studies fall into two categories: experimental and observational.
EXPERIMENTAL STUDIES
Experimental epidemiology is the study of the relationships of various factors
determining the frequency and distribution of diseases in a community. Experimental
epidemiology contains three study types:
1. randomized controlled trial
➢ often used for new medicine or drug testing
➢ the epitome of all research designs because it provides the strongest
evidence for concluding causation
➢ it provides the best assurance that the result was due to the
intervention
2. field trial
➢ conducted on those at a high risk of contracting a disease
3. community trial
➢ research on diseases of social origin
OBSERVATIONAL STUDIES
In an observational study, the epidemiologist simply observes the exposure and
disease status of each study participant. John Snow’s studies of cholera in London were
observational studies. The two most common types of observational studies are cohort
studies and case-control studies; a third type is cross-sectional studies.
1. Cohort study
➢ A cohort study is similar in concept to the experimental study. In a
cohort study the epidemiologist records whether each study
participant is exposed or not, and then tracks the participants to see if
they develop the disease of interest. Note that this differs from an
experimental study because, in a cohort study, the investigator
observes rather than determines the participants’ exposure status.
After a period of time, the investigator compares the disease rate in
the exposed group with the disease rate in the unexposed group. The
unexposed group serves as the comparison group, providing an
estimate of the baseline or expected amount of disease occurrence in
the community. If the disease rate is substantively different in the
exposed group compared to the unexposed group, the exposure is
said to be associated with illness.
➢ The Framingham study is a well-known cohort study that has followed
over 5,000 residents of Framingham, Massachusetts, since the early
1950s to establish the rates and risk factors for heart disease. The
Nurses Health Study and the Nurses Health Study II are cohort studies
established in 1976 and 1989, respectively, that have followed over
100,000 nurses each and have provided useful information on oral
contraceptives, diet, and lifestyle risk factors. These studies are
sometimes called follow-up or prospective cohort studies, because
participants are enrolled as the study begins and are then followed
prospectively over time to identify occurrence of the outcomes of
interest.

➢ An alternative type of cohort study is a retrospective cohort study. In
this type of study both the exposure and the outcomes have already
occurred. Just as in a prospective cohort study, the investigator
calculates and compares rates of disease in the exposed and
unexposed groups. Retrospective cohort studies are commonly used in
investigations of disease in groups of easily identified people such as
workers at a particular factory or attendees at a wedding. For example,
a retrospective cohort study was used to determine the source of
infection of cyclosporiasis, a parasitic disease that caused an outbreak
among members of a residential facility in Pennsylvania in 2004. The
investigation indicated that consumption of snow peas was implicated
as the vehicle of the cyclosporiasis outbreak.
2. Case-control study
➢ In a case-control study, investigators start by enrolling a group of
people with disease (at CDC such persons are called case-patients
rather than cases, because case refers to occurrence of disease, not a
person). As a comparison group, the investigator then enrolls a group of
people without disease (controls). Investigators then compare previous
exposures between the two groups. The control group provides an
estimate of the baseline or expected amount of exposure in that
population. If the amount of exposure among the case group is
substantially higher than the amount you would expect based on the
control group, then illness is said to be associated with that exposure.
The study of hepatitis A traced to green onions, described above, is an
example of a case-control study.
➢ The key in a case-control study is to identify an appropriate control
group, comparable to the case group in most respects, in order to
provide a reasonable estimate of the baseline or expected exposure.
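The two comparisons described above can be made concrete with a short sketch. The text does not name a specific measure; the risk ratio (for cohort studies) and the odds ratio (for case-control studies) used here are the standard choices and are introduced as an assumption, as are the function names and the made-up counts.

```python
def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Cohort study: disease rate in the exposed divided by the rate in the unexposed."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Case-control study: odds of exposure among cases divided by odds among controls."""
    return (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)


# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed people became ill.
rr = risk_ratio(30, 100, 10, 100)

# Hypothetical case-control study: 40 of 50 case-patients and 20 of 50 controls
# reported the exposure.
oratio = odds_ratio(40, 10, 20, 30)

print(f"risk ratio = {rr:.1f}, odds ratio = {oratio:.1f}")
```

A ratio near 1 suggests no association between exposure and illness. A ratio well above 1, as in both made-up examples, suggests that the exposure is associated with the disease, although chance and bias would still need to be considered.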
Synthesis:
In summary, the purpose of an analytic study in epidemiology is to identify and
quantify the relationship between an exposure and a health outcome. The hallmark of
such a study is the presence of at least two groups, one of which serves as a
comparison group. In an experimental study, the investigator determines the exposure
for the study subjects; in an observational study, the subjects are exposed under more
natural conditions. In an observational cohort study, subjects are enrolled or grouped on
the basis of their exposure, then are followed to document occurrence of disease.
Differences in disease rates between the exposed and unexposed groups lead
investigators to conclude that exposure is associated with disease. In an observational
case-control study, subjects are enrolled according to whether they have the disease or
not, then are questioned or tested to determine their prior exposure. Differences in
exposure prevalence between the case and control groups allow investigators to
conclude that the exposure is associated with the disease. Cross-sectional studies
measure exposure and disease status at the same time, and are better suited to
descriptive epidemiology than causation.
Assessment:
email
SEATWORK: Complete the table below. Indicate references.
STUDY DESIGN | APPLICATION | STRENGTH/ADVANTAGES | WEAKNESS/DISADVANTAGES | STATISTICAL TOOL
RANDOMIZED CONTROL TRIAL | | | |
COHORT | | | |
CASE CONTROL | | | |
FIELD TRIAL | | | |
COMMUNITY TRIAL | | | |
References:
Centers for Disease Control and Prevention. (2012). Principles of Epidemiology in Public Health
Practice, Third Edition: An Introduction to Applied Epidemiology and Biostatistics.
Retrieved from https://www.cdc.gov/csels/dsepd/ss1978/lesson1/section6.html on August
5, 2020
Dawson, B. and Trapp, R. (2004) Basic & Clinical Biostatistics 4th Edition. McGrawHill.
Kannel WB. The Framingham Study: its 50-year legacy and future promise. J Atheroscler
Thromb 2000;6:60-6.
Libretexts. (2019). Experimental Epidemiology. Retrieved from
https://bio.libretexts.org/Bookshelves/Microbiology/Book%3A_Microbiology_(Boundless)/10%3A_Epidemiology/10.5%3A_Epidemiology_and_Public_Health/10.5C%3A_Experimental_Epidemiology
on August 5, 2020
Thun, M.J. and Jemal, A. (2003). Analytic Epidemiology. Holland-Frei Cancer Medicine. 6th edition.
Retrieved from https://www.ncbi.nlm.nih.gov/books/NBK13832/ on August 5, 2020
LABORATORY 13
STUDY/RESEARCH DESIGNS
I. BACKGROUND
A study design is a specific plan or protocol for conducting an epidemiological study, which
allows the investigator to translate the conceptual hypothesis into an operational one. The research
design refers to the overall strategy that you choose to integrate the different components of the
study in a coherent and logical way, thereby, ensuring you will effectively address the research
problem; it constitutes the blueprint for the collection, measurement, and analysis of data.
II. OBJECTIVES
1. List and describe the goals of a study design
2. Discuss the difference between observational versus experimental study designs and the
difference between descriptive versus analytical studies.
3. Describe commonly used study designs- case report, case series, case control, cohort, and
cross sectional
4. Explain the advantages and disadvantages of each of the observational study designs
5. Explain how to interpret basic results of each study design
III. MATERIALS
References
IV. ACTIVITY
1. A group of students is assigned a study design:
a) case report d) cohort
b) case series e) cross sectional
c) case control f) randomized control trial
2. The study designs should be presented through a PowerPoint presentation covering:
a) description of the study design
b) advantages and disadvantages of the study design
c) how to interpret basic results of the study design
3. Another group of students will be assigned to present journal articles per study design, to be
presented in IMRAD format using a PowerPoint presentation.

1. List and describe the goals of a study design
2. Discuss the difference between observational versus experimental study designs and the
difference between descriptive versus analytical studies. Present it through a table.
3. Describe commonly used study designs- case report, case series, case control, cohort, and
cross sectional
4. Explain the advantages and disadvantages of each of the observational study designs
5. Explain how to interpret basic results of each study design
LESSON 14
CAUSAL INFERENCE

1. To describe the processes, uses, and evaluation of public health surveillance.
2. To identify and comprehend key sources for epidemiologic data
Introduction
Causal inference -- the art and science of making a causal claim about the relationship
between two factors -- is in many ways the heart of epidemiologic research. Under most
circumstances, if there is an association between an exposure and a health outcome of
interest, the most commonly asked question is: is one causing the other? Causal inference is
important because, ultimately, it can guide interventions to improve public health:
interventions can be targeted at removing known causes of adverse health outcomes (or
adding known causes of beneficial health outcomes).
Lesson Proper:
Epidemiology is primarily focused on establishing valid associations between
'exposures' and health outcomes. However, establishing an association does not
necessarily mean that the exposure is a cause of the outcome. Most definitions of
"cause" include the notion that it is something that has an effect or a consequence.
Certainly, establishing a valid association between exposure and outcome is a necessary
first step that must be accomplished before wrestling with the more complicated, and
frequently controversial, question of whether the relationship is causal. For
most epidemiologists, the second step in the process is to consider the entire body of
available evidence, from epidemiology and from other sources (e.g., in
vitro, animal, and other types of human studies), to try to arrive at a reasonable
conclusion about the relationship.
Historical Views of Causation

1. Four Vital Humors
➢ Historically, there have been many efforts to account for the occurrence of
disease outcomes. Religions often attributed disease outbreaks or other
misfortunes to divine retribution - punishment for mankind's sins.
Hippocrates promoted the concept that disease was the result of an
imbalance among four vital "humors" within us: Yellow Bile, Black Bile,
Phlegm, Blood
➢ Hippocrates believed that if one of the humors became excessive or
deficient, health would deteriorate and symptoms would develop.
Hippocrates was a keen observer and tried to relate an individual's exposures
(e.g., diet, exercise, occupation, and other behaviors) to subsequent health
outcomes.
➢ Consequently, his recommendations and "prescriptions" were often based
on his observations and his perception of cause and effect. His disease
model, however crude, also suggested seemingly logical interventions. For
example, if he surmised that an individual suffered from too much of the
humor "blood", he prescribed blood letting to alleviate the problem. The
scene depicted to the right shows a female physician in the process of letting
blood from one of her patients.
2. Miasmas
➢ Another popular theory that persisted until the end of the 19th century was
that miasmas were responsible for disease. Bad odors were equated with
disease. Miasmas were toxic vapors or gases that emanated from cesspools
or swamps or filth, and it was believed that if one inhaled the vapors, disease
would result. This theory provided an explanation for outbreaks of infectious
disease, including cholera and plague. As a result many ineffective
interventions were pursued. Bonfires and smoking urns were used to prevent
both plague and cholera. In the 14th century "plague doctors" wore masks
with beak-like projections filled with aromatic herbs in order to counteract
the effect of miasmas. Echoes of the miasmatic theory can be found in the
name "malaria", derived from the Italian for "bad air" (mala, aria). It reflects
the correct observation that the disease was more common in swampy
areas, but it misidentified the cause as the foul odors rather than the
parasite transmitted by the mosquitoes that bred there.
3. Germ Theory - Koch's Postulates
➢ Even though there was a "germ" of truth in miasmatic theory, in that it
focused attention on environmental causes of disease and partly explained
social disparities in health (poor people being more likely to live near foul
odors), the theory began to fall into disfavor as the germ theory gained
acceptance. Louis Pasteur and others introduced the germ theory in 1878.
➢ In 1890 Robert Koch proposed specific criteria that should be met before
concluding that a disease was caused by a particular bacterium. These
became known as Koch's Postulates, which are as follows:
✓ The bacteria must be present in every case of the disease.
✓ The bacteria must be isolated from the host with the disease and
grown in pure culture.
✓ The specific disease must be reproduced when a pure culture of the
bacteria is inoculated into a healthy susceptible host.
✓ The bacteria must be recoverable from the experimentally infected
host.
➢ Koch's postulates established standard criteria for drawing conclusions about
the cause of infectious disease, but the criteria obviously don't apply to non-
infectious diseases. In addition, the criteria also had some limitations even
with respect to infectious disease. For example, not all infectious diseases
have good animal models. Another problem was that bacteria that we regard
as "normal flora", such as the Staphylococcus aureus on our skin, are
generally harmless but can cause disease under certain conditions.
Moreover, when people are exposed to a bacterium, such as the TB bacillus,
they don't necessarily become infected. There appear to be many other
factors that play a role in determining whether a given individual becomes
infected after they are exposed. Factors such as nutritional status or immune
status clearly have an impact in "causing" TB, but all these other
determinants aren't accounted for by Koch's postulates.
4. Webs of Causation
➢ The germ theory obviously didn't provide insights regarding the causes of
chronic diseases, and over time it became increasingly apparent that for
most diseases there were many contributory factors. Researchers began
thinking about complex "webs" of causation. The image below summarizes a
web of causation for obesity in the context of a socio-ecologic perspective.
Note that some factors are more "proximate" or immediate, such as
decreased energy expenditure and increased food intake, while other factors
or perhaps root causes are more distal, such as globalization of markets,
development, and advertising.
What is a Cause?
➢ In epidemiology, the “cause” is an agent (microbial germs, polluted water, smoking,
etc.) that modifies health, and the “effect” describes the way that the health is
changed by the agent. The agent is often potentially pathogenic (in which case it is
known as a “risk factor”).

➢ The effect is therefore essentially a comparison of risks. We can define two different
types of effect in this context:
- The absolute effect of a cause expresses the increase in the risk or the
additional number of cases of illness that result or could result from
exposure to this cause. It is measured by the attributable risk and its
derivatives.
- The relative effect of a cause expresses the strength of the association
between the causal agent and the illness.
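The two kinds of effect described above can be illustrated with a short calculation. The sketch below is only an illustration under stated assumptions: the cohort counts are hypothetical, the function names are invented for this example, and the relative effect is computed as the relative risk (risk in the exposed divided by risk in the unexposed), which is a standard measure but is not named in the text above.

```python
def risk(cases, total):
    """Proportion of a group that developed the illness."""
    return cases / total


def absolute_and_relative_effect(exposed_cases, exposed_total,
                                 unexposed_cases, unexposed_total):
    """Return (attributable risk, relative risk) for a simple cohort."""
    risk_exposed = risk(exposed_cases, exposed_total)
    risk_unexposed = risk(unexposed_cases, unexposed_total)
    # Absolute effect: the extra risk among the exposed.
    attributable_risk = risk_exposed - risk_unexposed
    # Relative effect: how many times higher the risk is among the exposed.
    relative_risk = risk_exposed / risk_unexposed
    return attributable_risk, relative_risk


# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed become ill.
ar, rr = absolute_and_relative_effect(30, 100, 10, 100)
print(f"Attributable risk: {ar:.2f}")
print(f"Relative risk: {rr:.2f}")
```

With these made-up counts the attributable risk is 0.20 (20 extra cases per 100 exposed persons) and the relative risk is 3.00 (the exposed are three times as likely to become ill).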
Characteristics of a Cause
➢ To be a cause, the factor:
- Must precede the effect
- Can be either a host or environmental factor (e.g., characteristics,
conditions, actions of individuals, events, natural, social or economic
phenomena)
- May be positive (presence of a causative exposure) or negative (lack of a
preventive exposure)
Risk Factors versus Causes
Epidemiologists often use the term "risk factor" to indicate a factor that is associated with
a given outcome. However, a risk factor is not necessarily a cause. The term risk factor includes
surrogates for underlying causes.
It is important to distinguish between risk factors and causes. Nevertheless, before one
can wrestle with the difficult question of causation, it is first necessary to establish that a valid
association exists. Consequently, if we accept Susser's assertion that a cause is something that
makes a difference, one might then ask how to tell if a factor makes a difference. Most
epidemiologists would agree that, in a broad sense, this is a two-step process.
First, the evidence must be examined to determine that there is a valid association between
an exposure and an outcome. This is achieved by conducting epidemiologic studies and
critically reviewing the available studies to determine whether random error, bias, or
confounding might explain the apparent association.
Second, if it is determined that there is a valid association, then one must wrestle with the
question of whether the association is causal. Not all associations are causal. There are no
standardized rules for determining whether a relationship is causal.
Hill's Criteria for Causality
➢ Hill's Criteria of Causation outline the minimal conditions needed to establish a
causal relationship between two items.

➢ These criteria were originally presented by Austin Bradford Hill (1897-1991), a
British medical statistician, as a way of determining the causal link between a
specific factor (e.g., cigarette smoking) and a disease (such as emphysema or lung
cancer).
➢ Hill's Criteria form the basis of modern epidemiological research, which attempts to
establish scientifically valid causal connections between potential disease agents
and the many diseases that afflict humankind.
➢ The principles set forth by Hill form the basis of evaluation used in all modern
scientific research. While it is quite easy to claim that agent "A" (e.g., smoking)
causes disease "B" (lung cancer), it is quite another matter to establish a
meaningful, statistically valid connection between the two phenomena.
➢ Hill's Criteria simply provide an additional valuable measure by which to evaluate
the many theories and explanations proposed within the social sciences.
1. Temporal Relationship:
Exposure always precedes the outcome. If factor "A" is believed to cause a disease, then it is
clear that factor "A" must necessarily always precede the occurrence of the disease. This is the
only absolutely essential criterion. This criterion negates the validity of all functional
explanations used in the social sciences, including the functionalist explanations that dominated
British social anthropology for so many years and the ecological functionalism that pervades
much American cultural ecology.
2. Strength:
This is defined by the size of the association as measured by appropriate statistical tests. The
stronger the association, the more likely it is that the relation of "A" to "B" is causal. For
example, the more highly correlated hypertension is with a high sodium diet, the stronger is the
relation between sodium and hypertension. Similarly, the higher the correlation between
patrilocal residence and the practice of male circumcision, the stronger is the relation between
the two social practices.
3. Dose-Response Relationship:
An increasing amount of exposure increases the risk. If a dose-response relationship is
present, it is strong evidence for a causal relationship. However, as with specificity (see below),
the absence of a dose-response relationship does not rule out a causal relationship. A
threshold may exist above which a relationship may develop. At the same time, if a specific
factor is the cause of a disease, the incidence of the disease should decline when exposure to
the factor is reduced or eliminated. An anthropological example of this would be the relationship
between population growth and agricultural intensification. If population growth is a cause of
agricultural intensification, then an increase in the size of a population within a given area
should result in a commensurate increase in the amount of energy and resources invested in
agricultural production. Conversely, when a population decrease occurs, we should see a
commensurate reduction in the investment of energy and resources per acre. This is precisely
what happened in Europe before and after the Black Plague. The same analogy can be applied
to global temperatures. If increasing levels of CO2 in the atmosphere cause increasing global
temperatures, then "other things being equal", we should see both a commensurate increase
and a commensurate decrease in global temperatures following an increase or decrease
respectively in CO2 levels in the atmosphere.
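A dose-response relationship can be checked informally by computing the risk at each exposure level and seeing whether it rises steadily. The sketch below is a minimal illustration, assuming hypothetical counts and invented function names. A steadily rising risk is supportive evidence only, and, as noted above, its absence does not rule out causation.

```python
def risks_by_level(counts):
    """counts: list of (label, cases, total), ordered from lowest to highest exposure."""
    return [(label, cases / total) for label, cases, total in counts]


def is_monotonic_increasing(values):
    """True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


# Hypothetical data: cases of disease per 1,000 persons at each exposure level.
hypothetical = [
    ("none", 5, 1000),
    ("low", 10, 1000),
    ("medium", 20, 1000),
    ("high", 40, 1000),
]

level_risks = risks_by_level(hypothetical)
for label, r in level_risks:
    print(f"{label:>6}: {r:.3f}")

gradient_present = is_monotonic_increasing([r for _, r in level_risks])
print("Risk rises with exposure:", gradient_present)
```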
4. Consistency:
The association is consistent when results are replicated in studies in different settings using
different methods. That is, if a relationship is causal, we would expect to find it consistently in
different studies and among different populations. This is why numerous experiments have to
be done before meaningful statements can be made about the causal relationship between two
or more factors. For example, it required thousands of highly technical studies of the
relationship between cigarette smoking and cancer before a definitive conclusion could be made
that cigarette smoking is a cause of cancer (one that raises risk without making the disease
inevitable in every smoker). Similarly, it would
require numerous studies of the difference between male and female performance of specific
behaviors by a number of different researchers and under a variety of different circumstances
before a conclusion could be made regarding whether a gender difference exists in the
performance of such behaviors.
5. Plausibility:
The association agrees with currently accepted understanding of pathological processes. In
other words, there needs to be some theoretical basis for positing an association between a
vector and disease, or one social phenomenon and another. One may, by chance, discover a
correlation between the price of bananas and the election of dog catchers in a particular
community, but there is not likely to be any logical connection between the two phenomena. On
the other hand, the discovery of a correlation between population growth and the incidence of
warfare among Yanomamo villages would fit well with ecological theories of conflict under
conditions of increasing competition over resources. At the same time, research that disagrees
with established theory is not necessarily false; it may, in fact, force a reconsideration of
accepted beliefs and principles.
6. Consideration of Alternate Explanations:
In judging whether a reported association is causal, it is necessary to determine the extent to
which researchers have taken other possible explanations into account and have effectively
ruled out such alternate explanations. In other words, it is always necessary to consider
multiple hypotheses before making conclusions about the causal relationship between any two
items under investigation.
7. Experiment:
The condition can be altered (prevented or ameliorated) by an appropriate experimental
regimen.
8. Specificity:
This is established when a single putative cause produces a specific effect. This is considered
by some to be the weakest of all the criteria. The diseases attributed to cigarette smoking, for
example, do not meet this criterion. When specificity of an association is found, it provides
additional support for a causal relationship. However, absence of specificity in no way negates
a causal relationship. Because outcomes (be they the spread of a disease, the incidence of a
specific human social behavior or changes in global temperature) are likely to have multiple
factors influencing them, it is highly unlikely that we will find a one-to-one cause-effect
relationship between two phenomena. Causality is most often multiple. Therefore, it is
necessary to examine specific causal relationships within a larger systemic perspective.
9. Coherence:
The association should be compatible with existing theory and knowledge. In other words, it is
necessary to evaluate claims of causality within the context of the current state of knowledge
within a given field and in related fields. What do we have to sacrifice about what we currently
know in order to accept a particular claim of causality? What, for example, do we have to reject
regarding our current knowledge in geography, physics, biology and anthropology in order to
accept the Creationist claim that the world was created as described in the Bible a few thousand
years ago? Similarly, how consistent are racist and sexist theories of intelligence with our
current understanding of how genes work and how they are inherited from one generation to the
next? However, as with the issue of plausibility, research that disagrees with established theory
and knowledge is not automatically false. It may, in fact, force a reconsideration of
accepted beliefs and principles. All currently accepted theories, including Evolution, Relativity
and non-Malthusian population ecology, were at one time new ideas that challenged orthodoxy.
Thomas Kuhn has referred to such changes in accepted theories as "Paradigm Shifts".
The Sufficient-Component Cause Model
The model has similarities to the "web of causation", but is more developed in the sense
that it simultaneously provides a general model for the conditions necessary to cause (and
prevent) disease in a single individual and for the epidemiological study of the causes of
disease among groups of individuals.
Synthesis:
Epidemiology has a vested interest in causation as, despite its numerous and often vague
definitions, it is a discipline with the goal of identifying causes of disease (both modifiable and
nonmodifiable) so that the disease or its consequences might be prevented.
Three Essential Attributes of a Cause
1. Association: variation in a causal factor must result in a change in probability of the
outcome; if X changes, Y changes
2. Time order: a cause must precede the effect; can be either proximate (e.g., food
poisoning) or distant (e.g., carcinogen)
3. Direction: the exposure results in the outcome, not vice versa.
There are different models to illustrate causation, namely Hill's Criteria for Causality, the web of
causation, and the Sufficient-Component Cause Model.

Assessment:
email
SEATWORK:
Draw the following models of causation: Hill's Criteria for Causality, the web of causation,
and the Sufficient-Component Cause Model. Give a 3-5 sentence explanation of each in your own words.
(5 points for each model)
RUBRIC:
INDICATOR     DESCRIPTION                               RATING
Appearance    Artistic presentation of the image        2
Content       In-depth explanation using own words      2
Mechanics     Impressive use of language and grammar    1
References:
Boston University School of Public Health (n.d.). Causal Inference. Retrieved from
https://sphweb.bumc.bu.edu/otlt/MPH-Modules/EP/EP713_Causality/EP713_Causality_print.html on
August 6, 2020
Hill's Criteria of Causation. Retrieved from http://www.drabruzzi.com/hills_criteria_of_causation.htm
on August 6, 2020
Lamorte, W. (2019). Elements of a Cause. Boston University School of Public Health. Retrieved
from https://sphweb.bumc.bu.edu/otlt/MPH-Modules/PH717-QuantCore/PH717-Module1A-Populations/Module1A-Populations7.html
on August 6, 2020
Susser, M. (2003). Causal Inference. Columbia University Mailman School of Public Health.
Retrieved from https://epiville.ccnmtl.columbia.edu/causal_inference/ on August 6, 2020
Cause and Effect in Epidemiology (2008). In: The Concise Encyclopedia of Statistics. Springer,
New York, NY. https://doi.org/10.1007/978-0-387-32833-1_48

LABORATORY 14
CAUSAL INFERENCE
I. BACKGROUND
Causal inference -- the art and science of making a causal claim about the relationship between
two factors -- is in many ways the heart of epidemiologic research. When there is an association
between an exposure and a health outcome of interest, the most commonly asked question
is: is one causing the other? Causal inference is important because, ultimately, it guides
interventions to improve public health: interventions can be targeted at removing known causes of
adverse health outcomes (or adding known causes of beneficial health outcomes).
II. OBJECTIVES
1. Distinguish between association and a causal relationship.
2. Describe and apply Hill's criteria for a judgment of causality.
3. Describe the sufficient-component cause model.
4. Discuss in general the differences in the weight of evidence needed for determining causality
versus taking public health action.
III. MATERIALS
References

IV. ACTIVITY
1. The students are assigned a concept or theory
a) Miasma
b) Germ Theory
c) Web of Causation
d) Hill’s Criteria of Causality
e) The Sufficient-Component Cause Model
2. The theory or concept should be presented through a PowerPoint presentation covering:
a) description of the theory/concept
b) history of the theory/concept
c) importance of the concept/theory in epidemiology
d) public health applications of the theory

1. Distinguish between association and a causal relationship
2. Discuss in general the differences in the weight of evidence needed for determining
causality versus taking public health action
LESSON 15
FIELD EPIDEMIOLOGY/ OUTBREAK INVESTIGATION

1. To describe the steps of an outbreak investigation.
Introduction
Field epidemiology is the way epidemics and outbreaks are investigated. It is
used to implement measures to protect and improve the health of the public. Field
epidemiologists are the individuals who deal with unexpected, sometimes urgent problems
that demand immediate solutions.
Lesson Proper:
Introduction
Field epidemiology consists of investigations initiated in response to urgent public health
problems. A primary goal of field epidemiology is to guide, as quickly as possible, the
processes of selecting and implementing interventions to lessen or prevent illness or death
when such problems arise. Although field investigations of acute problems share many
characteristics with prospectively planned epidemiologic studies, they differ in at least three
important aspects.
✓ Because field investigations often start without specific hypotheses about the
cause or source of disease, they require the use of descriptive studies to
generate hypotheses before analytic studies can be designed and conducted to
test these hypotheses.
✓ As noted previously, when acute problems occur, an immediate need exists to
protect the community’s health and address its concerns. These responsibilities
drive the epidemiologic field investigation beyond the confines of data collection
and analysis and into the realm of public health policy and action.
✓ Field epidemiology forces the epidemiologist to consider when the findings are
sufficient to take action rather than to ask what additional questions might be
answered by additional data collection or analyses or, alternatively, to take initial
actions that might be modified as additional information is obtained through
further investigation.
Determinants for Field Investigations
Health departments become aware of possible disease outbreaks or other acute public
health problems in different ways. Situations might gain attention because astute clinicians
recognize unusual patterns of disease among their patients and alert health departments,
surveillance systems for monitoring disease or hazard trends detect increases, the
diagnosis of a single case of a rare disease heralds a broader problem or potential threat, or
members of the public are concerned and contact authorities.
After such alerts, the first step is to decide whether to conduct a field investigation. Initial
assessments might dispel concerns or affirm that further investigation is warranted. After an
investigation is initiated, decisions must be made at successive stages about how far to pursue
it. These decisions are necessary to make the most effective use of public health
resources, including capacities to conduct field investigations and optimize opportunities for
disease prevention.
In addition to the need to develop and implement control measures to end threats to the
public’s health, such as the Legionnaires’ disease and Ebola virus disease (EVD) outbreaks, other determinants
that shape field investigations include (1) epidemiologic, programmatic, and resource
considerations; (2) public and political considerations; (3) research and learning
opportunities; (4) legal obligations; and (5) training needs.
Unique Challenges to Epidemiologists in Field Investigations
An epidemiologist investigating problems in the field faces unique challenges that
sometimes constrain the ideal use of scientific methods. In contrast to prospectively planned
studies, which generally are based on carefully developed and refined protocols, field
investigations must rely on data sources that are immediately available, less readily controlled,
and subject to change with successive hours or days. In addition to possible limitations in data
sources, factors that pose challenges for epidemiologists during field investigations include
sampling considerations, availability of specimens, effects of publicity, reluctance of persons to
participate, and conflicting pressures to intervene. New technologies hold the promise of
mitigating some of these challenges.
Data Sources
Field investigations often use information abstracted from different sources, such as
hospital, outpatient medical, or school health records. These records vary substantially in
completeness and accuracy among patients, healthcare providers, and facilities because entries
are made for purposes other than conducting epidemiologic studies. Moreover, rapid and
substantive transitions have occurred for several key information sources—as, for example, in
the growing use of electronic medical records, hospital and managed-care data systems, and
laboratory information management systems. These automated systems can facilitate access to
needed records but might not be compatible with meeting the needs of or supporting specific
record access by external investigators. Thus, the quality of such records as sources of data for
epidemiologic investigations can be substantially less than the quality of information obtained
when investigators can exert greater control through the use of standardized, pretested
questionnaires; physical or laboratory examinations; or other prospectively designed, rather
than retrospective, data collection methods. Because of these transitions, epidemiologists
involved in field investigations increasingly need to know how to use these data sources
and to possess the skills needed to analyze them.
The increasing use of social media and email can facilitate outreach to and queries of
persons who might have common exposures in an outbreak situation, such as participants in an
organized event linked to a common-source exposure. Recently, social media networks have
been used to assist in identifying contacts of persons with sexually transmitted diseases who
might be at high risk and should be considered for targeted prophylaxis. These communication
tools have provided added insight into social links and high-risk behaviors and have been used
to guide and augment data collected from traditional case investigation methodologies.
OUTBREAK INVESTIGATION
Once the decision to conduct a field investigation of an acute outbreak has been made,
working quickly is essential — as is getting the right answer. In other words, epidemiologists
cannot afford to conduct an investigation that is “quick and dirty.” They must conduct
investigations that are “quick and clean.” Under such circumstances, epidemiologists find it
useful to have a systematic approach to follow. This approach ensures that the investigation
proceeds without missing important steps along the way.
Epidemiologic Steps of an Outbreak Investigation
1. Prepare for field work
2. Establish the existence of an outbreak
3. Verify the diagnosis
4. Construct a working case definition
5. Find cases systematically and record information
6. Perform descriptive epidemiology
7. Develop hypotheses
8. Evaluate hypotheses epidemiologically
9. As necessary, reconsider, refine, and re-evaluate hypotheses
10. Compare and reconcile with laboratory and/or environmental studies
11. Implement control and prevention measures
12. Initiate or maintain surveillance
13. Communicate findings
Step 1: Prepare for field work
The numbering scheme for this step is problematic, because preparing for field work
often is not the first step. Only occasionally do public health officials decide to conduct a field
investigation before confirming an increase in cases and verifying the diagnosis. More
commonly, officials discover an increase in the number of cases of a particular disease and then
decide that a field investigation is warranted. Sometimes investigators collect enough
information to perform descriptive epidemiology without leaving their desks, and decide that a
field investigation is necessary only if they cannot reach a convincing conclusion without one.
Regardless of when the decision to conduct a field investigation is made, you should be
well prepared before leaving for the field. The preparations can be grouped into two broad
categories: (a) scientific and investigative issues, and (b) management and operational issues.
Good preparation in both categories is needed to facilitate a smooth field experience.
Step 2: Establish the existence of an outbreak
An outbreak or an epidemic is the occurrence of more cases of disease than expected in
a given area or among a specific group of people over a particular period of time. Usually, the
cases are presumed to have a common cause or to be related to one another in some way.
Many epidemiologists use the terms outbreak and epidemic interchangeably, but the public is
more likely to think that epidemic implies a crisis situation. Some epidemiologists apply the term
epidemic to situations involving larger numbers of people over a wide geographic area. Indeed,
the Dictionary of Epidemiology defines outbreak as an epidemic limited to a localized increase in
the incidence of disease, e.g., village, town, or closed institution.(23)
In contrast to outbreak and epidemic, a cluster is an aggregation of cases in a given area over a
particular period without regard to whether the number of cases is more than expected. This
aggregation of cases seems to be unusual, but frequently the public (and sometimes the health
agency) does not know the denominator. For example, the diagnosis in one neighborhood of
four adults with cancer may be disturbing to residents but may well be within the expected level
of cancer occurrence, depending on the size of the population, the types of cancer, and the
prevalence of risk factors among the residents.
One of the first tasks of the field investigator is to verify that a cluster of cases is indeed
an outbreak. Some clusters turn out to be true outbreaks with a common cause, some are
sporadic and unrelated cases of the same disease, and others are unrelated cases of similar
but unrelated diseases.
Even if the cases turn out to be the same disease, the number of cases may not exceed
what the health department normally sees in a comparable time period. Here, as in other areas
of epidemiology, the observed is compared with the expected. The expected number is usually
the number from the previous few weeks or months, or from a comparable period during the
previous few years. For a notifiable disease, the expected number is based on health
department surveillance records. For other diseases and conditions, the expected number may
be based on locally available data such as hospital discharge records, mortality statistics, or
cancer or birth defect registries. When local data are not available, a health department may
use rates from state or national data, or, alternatively, conduct a telephone survey of physicians
to determine whether they are seeing more cases of the disease than usual. Finally, a survey of
the community may be conducted to establish the background or historical level of disease.
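The comparison of observed with expected cases can be sketched numerically. The example below is an illustration only: the weekly counts are hypothetical, the function names are invented, and the use of a Poisson model to judge how unusual the observed count is represents an added assumption that does not come from the text. A small probability suggests an excess worth a closer look; it does not by itself establish an outbreak.

```python
import math
from statistics import mean


def expected_count(baseline_counts):
    """Expected number of cases: the average of comparable earlier periods."""
    return mean(baseline_counts)


def poisson_tail(observed, expected):
    """Probability of seeing at least `observed` cases if counts follow a
    Poisson distribution with mean `expected` (a modeling assumption)."""
    below = sum(
        math.exp(-expected) * expected ** i / math.factorial(i)
        for i in range(observed)
    )
    return 1.0 - below


# Hypothetical surveillance data: cases reported in the same week
# of the four previous years, and in the current week.
baseline = [4, 6, 5, 5]
observed = 12

expected = expected_count(baseline)
p = poisson_tail(observed, expected)
print(f"Expected: {expected}, observed: {observed}")
print(f"Observed/expected ratio: {observed / expected:.1f}")
print(f"Probability of {observed} or more cases by chance: {p:.4f}")
```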
Even if the current number of reported cases exceeds the expected number, the excess
may not necessarily indicate an outbreak. Reporting may rise because of changes in local
reporting procedures, changes in the case definition, increased interest because of local or
national awareness, or improvements in diagnostic procedures. A new physician, infection
control nurse, or healthcare facility may more consistently report cases, when in fact there has
been no change in the actual occurrence of the disease. Some apparent increases are actually
the result of misdiagnosis or laboratory error. Finally, particularly in areas with sudden changes
in population size such as resort areas, college towns, and migrant farming areas, changes in
the numerator (number of reported cases) may simply reflect changes in the denominator (size
of the population).
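The point about numerators and denominators can be made concrete by converting case counts into rates. The sketch below assumes a hypothetical resort town whose population triples in peak season; the figures and the function name are illustrative only.

```python
import math


def rate_per_100k(cases, population):
    """Cases per 100,000 population for a given period."""
    return cases / population * 100_000


# Hypothetical resort town: population triples during peak season.
off_season_rate = rate_per_100k(cases=10, population=10_000)
peak_season_rate = rate_per_100k(cases=30, population=30_000)

print(f"Off-season rate:  {off_season_rate:.1f} per 100,000")
print(f"Peak-season rate: {peak_season_rate:.1f} per 100,000")

# The case count tripled, but the rate did not change.
same_rate = math.isclose(off_season_rate, peak_season_rate)
print("Rates are the same:", same_rate)
```

In this made-up example the number of reported cases rises from 10 to 30, yet the rate is unchanged, so the increase reflects the larger population rather than a true rise in disease occurrence.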
Whether an apparent problem should be investigated further is not strictly tied to
verifying the existence of an epidemic (more cases than expected). Sometimes, health agencies
respond to small numbers of cases, or even a single case of disease, that may not exceed the
expected or usual number of cases. As noted earlier, the severity of the illness, the potential for
spread, availability of control measures, political considerations, public relations, available
resources, and other factors all influence the decision to launch a field investigation.
Step 3: Verify the diagnosis
The next step, verifying the diagnosis, is closely linked to verifying the existence of an
outbreak. In fact, often these two steps are addressed at the same time. Verifying the diagnosis
is important: (a) to ensure that the disease has been properly identified, since control measures
are often disease-specific; and (b) to rule out laboratory error as the basis for the increase in
reported cases.
First, review the clinical findings and laboratory results. If you have questions about the
laboratory findings (for example, if the laboratory tests are inconsistent with the clinical and
epidemiologic findings), ask a qualified laboratorian to review the laboratory techniques being
used. If you need specialized laboratory work such as confirmation in a reference laboratory,
DNA or other chemical or biological fingerprinting, or polymerase chain reaction, you must
secure a sufficient number of appropriate specimens, isolates, and other laboratory material as
soon as possible.
Second, many investigators — clinicians and non-clinicians — find it useful to visit one
or more patients with the disease. If you do not have the clinical background to verify the
diagnosis, bring a qualified clinician with you. Talking directly with some patients gives you a
better understanding of the clinical features, and helps you to develop a mental image of the
disease and the patients affected by it. In addition, conversations with patients are very useful in
generating hypotheses about disease etiology and spread. They may be able to answer some
critical questions: What were their exposures before becoming ill? What do they think caused
their illness? Do they know anyone else with the disease? Do they have anything in common
with others who have the disease?
Third, summarize the clinical features using frequency distributions. Are the clinical
features consistent with the diagnosis? Frequency distributions of the clinical features are useful
in characterizing the spectrum of illness, verifying the diagnosis, and developing case
definitions. These clinical frequency distributions are considered so important in establishing the
credibility of the diagnosis that they are frequently presented in the first table of an
investigation’s report or manuscript.
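A frequency distribution of clinical features is simply a count of how many case-patients reported each sign or symptom, usually shown with percentages. The sketch below tallies one from a small, invented line list; the symptom names, the data, and the helper `symptom_frequencies` are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical line list: each entry is the set of symptoms one case-patient reported.
case_symptoms = [
    {"diarrhea", "nausea", "fever"},
    {"diarrhea", "vomiting"},
    {"diarrhea", "nausea"},
    {"nausea", "headache"},
]

def symptom_frequencies(cases):
    """Return (symptom, count, percent of cases) tuples, most frequent first."""
    counts = Counter(symptom for symptoms in cases for symptom in symptoms)
    total = len(cases)
    # Sort by descending count, then alphabetically so ties appear in a stable order.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(name, n, 100 * n / total) for name, n in ordered]

table = symptom_frequencies(case_symptoms)
for name, n, pct in table:
    print(f"{name:<10} {n:>3} {pct:5.0f}%")
```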
Step 4: Construct a working case definition
A case definition is a standard set of criteria for deciding whether an individual should be
classified as having the health condition of interest. A case definition includes clinical criteria
and — particularly in the setting of an outbreak investigation — restrictions by time, place, and
person. The clinical criteria should be based on simple and objective measures such as “fever ≥
38.3°C (101°F),” “three or more loose bowel movements per day,” or “myalgias (muscle pain)
severe enough to limit the patient’s usual activities.” The case definition may be restricted by
time (for example, to persons with onset of illness within the past 2 months), by place (for
example, to residents of the nine-county area or to employees of a particular plant) and by
person (for example, to persons with no previous history of a positive tuberculin skin test, or to
premenopausal women). Whatever the criteria, they must be applied consistently to all persons
under investigation.
The case definition must not include the exposure or risk factor you are interested in
evaluating. This is a common mistake. For example, if one of the hypotheses under
consideration is that persons who worked in the west wing were at greater risk of disease, do
not define a case as “illness among persons who worked in the west wing with onset
between…” Instead, define a case as “illness among persons who worked in the facility with
onset between…” Then conduct the appropriate analysis to determine whether those who
worked in the west wing were at greater risk than those who worked elsewhere.
Diagnoses may be uncertain, particularly early in an investigation. As a result,
investigators often create different categories of a case definition, such as confirmed, probable,
and possible or suspect, that allow for uncertainty. To be classified as confirmed, a case usually
must have laboratory verification. A case classified as probable usually has typical clinical
features of the disease without laboratory confirmation. A case classified as possible usually
has fewer of the typical clinical features.
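The three categories can be written as a simple decision rule. The sketch below is one possible way to express it; the thresholds (`probable_min`, `possible_min`) and the function name `classify_case` are assumptions chosen for illustration, since real case definitions set their own criteria for each category.

```python
def classify_case(lab_confirmed, n_typical_features, probable_min=3, possible_min=1):
    """Classify a person as a confirmed, probable, or possible case.

    lab_confirmed: True if laboratory verification is available.
    n_typical_features: how many of the typical clinical features are present.
    probable_min, possible_min: assumed cut-offs, not taken from any official definition.
    """
    if lab_confirmed:
        return "confirmed"
    if n_typical_features >= probable_min:
        return "probable"
    if n_typical_features >= possible_min:
        return "possible"
    return "not a case"

print(classify_case(lab_confirmed=True, n_typical_features=2))
print(classify_case(lab_confirmed=False, n_typical_features=3))
print(classify_case(lab_confirmed=False, n_typical_features=1))
```

Whatever cut-offs are chosen, the same rule is applied to every person under investigation.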
In the outbreak setting, the investigators would need to specify time and place to
complete the outbreak case definition. For example, if investigating an epidemic of
meningococcal meningitis in Bamako, the case definition might be the clinical features as
described in the box with onset between January and April of this year among residents and
visitors of Bamako.
Classifications such as confirmed-probable-possible are helpful because they provide
flexibility to the investigators. A case might be temporarily classified as probable or possible
while laboratory results are pending. Alternatively, a case may be permanently classified as
probable or possible if the patient’s physician decided not to order the confirmatory laboratory
test because the test is expensive, difficult to obtain, or unnecessary. For example, while
investigating an outbreak of diarrhea on a cruise ship, investigators usually try to identify the
causative organism from stool samples from a few afflicted persons. If the tests confirm that all
of those case-patients were infected with the same organism, for example norovirus, the other
persons with compatible clinical illness are all presumed to be part of the same outbreak and to
be infected with the same organism. Note that while this approach is typical in the United
States, some countries prefer to acquire laboratory samples from every affected person, and
only those with a positive laboratory test are counted as true cases.
A case definition is a tool for classifying someone as having or not having the disease of
interest, but few case definitions are 100% accurate in their classifications. Some persons with
mild illness may be missed, and some persons with a similar but not identical illness may be
included. Generally, epidemiologists strive to ensure that a case definition includes most if not
all of the actual cases, but very few or no false-positive cases. However, this ideal is not always
met. For example, case definitions often miss infected people who have mild or no symptoms,
because they have little reason to be tested.
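These two goals can be expressed as proportions. The share of actual cases that the definition captures is commonly called sensitivity, and the share of classified cases that are truly cases is commonly called positive predictive value; the text above does not use these names, so treat them as added terminology. The counts in this sketch are invented.

```python
def case_definition_accuracy(true_pos, false_neg, false_pos):
    """Return (sensitivity, positive predictive value) for a case definition.

    true_pos:  actual cases that met the case definition
    false_neg: actual cases that the case definition missed
    false_pos: persons who met the definition but did not have the disease
    """
    sensitivity = true_pos / (true_pos + false_neg)
    ppv = true_pos / (true_pos + false_pos)
    return sensitivity, ppv

# Hypothetical counts for illustration only.
sensitivity, ppv = case_definition_accuracy(true_pos=45, false_neg=5, false_pos=5)
print(f"Sensitivity: {sensitivity:.0%}")
print(f"Positive predictive value: {ppv:.0%}")
```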
Step 5: Find cases systematically and record information
As noted earlier, many outbreaks are brought to the attention of health authorities by
concerned healthcare providers or citizens. However, the cases that prompt the concern are
often only a small and unrepresentative fraction of the total number of cases. Public health
workers must therefore look for additional cases to determine the true geographic extent of the
problem and the populations affected by it.
Usually, the first effort to identify cases is directed at healthcare practitioners and
facilities — physicians’ clinics, hospitals, and laboratories — where a diagnosis is likely to be
made. Investigators may conduct what is sometimes called stimulated or enhanced passive
surveillance by sending a letter describing the situation and asking for reports of similar cases.
Alternatively, they may conduct active surveillance by telephoning or visiting the facilities to
collect information on any additional cases.
In some outbreaks, public health officials may decide to alert the public directly, usually
through the local media. In other situations, the media may have already spread the word. For
example, in an outbreak of listeriosis in 2002 caused by contaminated sliceable turkey deli
meat, announcements in the media alerted the public to avoid the implicated product and
instructed them to see a physician if they developed symptoms compatible with the disease in
question.
If an outbreak affects a restricted population such as persons on a cruise ship, in a school, or at
a work site, and if many cases are mild or asymptomatic and therefore undetected, a survey of
the entire population is sometimes conducted to determine the extent of infection. A
questionnaire could be distributed to determine the true occurrence of clinical symptoms, or
laboratory specimens could be collected to determine the number of asymptomatic cases.
Finally, investigators should ask case-patients if they know anyone else with the same
condition. Frequently, one person with an illness knows or hears of others with the same illness.
In some investigations, investigators develop a data collection form tailored to the specific
details of that outbreak. In others, investigators use a generic case report form. Regardless of
which form is used, the data collection form should include the following types of information
about each case.
✓ Identifying information. A name, address, and telephone number are essential if
investigators need to contact patients for additional questions and to notify them of
laboratory results and the outcome of the investigation. Names also help in checking for
duplicate records, while the addresses allow for mapping the geographic extent of the
problem.
✓ Demographic information. Age, sex, race, occupation, etc. provide the person
characteristics of descriptive epidemiology needed to characterize the populations at
risk.
✓ Clinical information. Signs and symptoms allow investigators to verify that the case
definition has been met. Date of onset is needed to chart the time course of the
outbreak. Supplementary clinical information, such as duration of illness and whether
hospitalization or death occurred, helps characterize the spectrum of illness.
✓ Risk factor information. This information must be tailored to the specific disease in
question. For example, since food and water are common vehicles for hepatitis A but not
hepatitis B, exposure to food and water sources must be ascertained in an outbreak of
the former but not the latter.
✓ Reporter information. The case report must include the reporter or source of the report,
usually a physician, clinic, hospital, or laboratory. Investigators will sometimes need to
contact the reporter, either to seek additional clinical information or report back the
results of the investigation.
Traditionally, the information described above is collected on a standard case report form,
questionnaire, or data abstraction form.
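When the form is kept electronically, the five types of information map naturally onto the fields of a record. The sketch below is a hypothetical layout, not a standard form: the class `CaseReport`, its field names, and the sample entries are all invented for illustration. It also shows the duplicate check on names mentioned above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List

@dataclass
class CaseReport:
    # Identifying information
    name: str
    address: str
    phone: str
    # Demographic information
    age: int
    sex: str
    occupation: str
    # Clinical information
    onset_date: date
    symptoms: List[str] = field(default_factory=list)
    hospitalized: bool = False
    died: bool = False
    # Risk factor information (tailored to the disease under investigation)
    exposures: Dict[str, bool] = field(default_factory=dict)
    # Reporter information
    reporter: str = ""

def duplicate_names(reports):
    """Return normalized names that appear on more than one report."""
    seen, dupes = set(), set()
    for report in reports:
        key = " ".join(report.name.lower().split())  # ignore case and extra spaces
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return sorted(dupes)

# Invented records for illustration.
reports = [
    CaseReport("Ana Cruz", "Barangay A", "000-0000", 34, "F", "teacher",
               date(2020, 9, 28), symptoms=["nausea", "diarrhea"], reporter="Hospital 1"),
    CaseReport("ana  cruz", "Barangay A", "000-0000", 34, "F", "teacher",
               date(2020, 9, 28), reporter="Clinic 2"),
    CaseReport("Ben Reyes", "Barangay B", "000-0001", 51, "M", "driver",
               date(2020, 9, 28)),
]
dupes = duplicate_names(reports)
print(dupes)
```

A name match only flags records for review; investigators would still compare addresses and dates before merging them.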
Step 6: Perform descriptive epidemiology
Conceptually, the next step after identifying and gathering basic information on the
persons with the disease is to systematically describe some of the key characteristics of those
persons. This process, in which the outbreak is characterized by time, place, and person, is
called descriptive epidemiology. It may be repeated several times during the course of an
investigation as additional cases are identified or as new information becomes available.
This step is critical for several reasons.
✓ Summarizing data by key demographic variables provides a comprehensive
characterization of the outbreak: trends over time, geographic distribution (place), and
the populations (persons) affected by the disease.
✓ From this characterization you can identify or infer the population at risk for the disease.
✓ The characterization often provides clues about etiology, source, and modes of
transmission that can be turned into testable hypotheses (see Step 7).
✓ Descriptive epidemiology describes the where and whom of the disease, allowing you to
begin intervention and prevention measures.
✓ Early (and continuing) analysis of descriptive data helps you to become familiar with
those data, enabling you to identify and correct errors and missing values.
Time
Traditionally, a special type of histogram is used to depict the time course of an
epidemic. This graph, called an epidemic curve, or epi curve for short, provides a simple visual
display of the outbreak’s magnitude and time trend.
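The data behind an epidemic curve are just counts of cases by date of onset. As a rough illustration, the sketch below builds those counts from invented onset dates and prints a text version of the curve; the dates and the helper `epi_curve` are assumptions, and a real curve would normally be drawn with graphing software.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical dates of onset, one per case-patient.
onset_dates = [
    date(2020, 9, 1), date(2020, 9, 2), date(2020, 9, 2),
    date(2020, 9, 3), date(2020, 9, 3), date(2020, 9, 3),
    date(2020, 9, 5),
]

def epi_curve(onsets):
    """Return (day, number of cases) for every day from first to last onset."""
    counts = Counter(onsets)
    day, last = min(onsets), max(onsets)
    curve = []
    while day <= last:
        curve.append((day, counts[day]))  # a Counter gives 0 for days with no cases
        day += timedelta(days=1)
    return curve

curve = epi_curve(onset_dates)
for day, n in curve:
    print(f"{day.isoformat()} {'#' * n} ({n})")
```

Days with no cases are kept in the output so that gaps in the time course remain visible.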
Place
Assessment of an outbreak by place not only provides information on the geographic
extent of a problem, but may also demonstrate clusters or patterns that provide important
etiologic clues. A spot map is a simple and useful technique for illustrating where cases live,
work, or may have been exposed.
Person
Characterization of the outbreak by person provides a description of who the
case-patients are and who is at risk. Person characteristics that are usually described include both
host characteristics (age, race, sex, and medical status) and possible exposures (occupation,
leisure activities, and use of medications, tobacco, and drugs). Both of these influence
susceptibility to disease and opportunities for exposure.
The two most commonly described host characteristics are age and sex because they
are easily collected and because they are often related to exposure and to the risk of disease.
Depending on the outbreak, occupation, race, or other personal characteristics specific to the
disease under investigation and the setting of the outbreak may also be important. For example,
investigators of an outbreak of hepatitis B might characterize the cases by intravenous drug use
and sexual contacts, two of the high risk exposures for that disease. Investigators of a school-
based gastroenteritis outbreak might describe occurrence by grade or classroom, and by
student versus teacher or other staff.
Early in an investigation, investigators may restrict the descriptive epidemiology to
numbers of cases. However, in many circumstances the investigators also calculate rates
(number of cases divided by the population or number of people at risk). Numbers indicate the
burden of disease and are useful for planning and service delivery. Rates are essential for
identifying groups with elevated risk of disease.
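The difference between numbers and rates is easiest to see with a small example. The sketch below uses invented figures for a school-based outbreak (the groups, counts, and the helper `attack_rates` are illustrative assumptions) and divides the number of cases in each group by the number of people at risk.

```python
# Hypothetical data: group -> (number of cases, number of people at risk).
groups = {
    "Grade 1": (12, 40),
    "Grade 2": (3, 60),
    "Staff": (3, 15),
}

def attack_rates(groups):
    """Return the attack rate (cases divided by people at risk) for each group."""
    return {name: cases / at_risk for name, (cases, at_risk) in groups.items()}

rates = attack_rates(groups)
highest = max(rates, key=rates.get)

for name, rate in rates.items():
    cases, at_risk = groups[name]
    print(f"{name:<8} {cases:>3} cases / {at_risk:>3} at risk = {rate:.0%}")
print("Highest attack rate:", highest)
```

In this illustration Grade 2 and Staff have the same number of cases, but the rate among staff is four times higher, which the counts alone would not reveal.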
Step 7: Develop hypotheses
Although the next conceptual step in an investigation is formulating hypotheses, in reality,
investigators usually begin to generate hypotheses at the time of the initial telephone call.
Depending on the outbreak, the hypotheses may address the source of the agent, the mode
(and vehicle or vector) of transmission, and the exposures that caused the disease. The
hypotheses should be testable, since evaluating hypotheses is the next step in the investigation.
In an outbreak context, hypotheses are generated in a variety of ways. First, consider what you
know about the disease itself: What is the agent’s usual reservoir? How is it usually transmitted?
What vehicles are commonly implicated? What are the known risk factors? In other words, by
being familiar with the disease, you can, at the very least, “round up the usual suspects.”
Another useful way to generate hypotheses is to talk to a few of the case-patients, as discussed
in Step 3. The conversations about possible exposures should be open-ended and wide-
ranging, not necessarily confined to the known sources and vehicles. In some challenging
investigations that yielded few clues, investigators have convened a meeting of several case-
patients to search for common exposures. In addition, investigators have sometimes found it
useful to visit the homes of case-patients and look through their refrigerators and shelves for
clues to an apparent foodborne outbreak.
Just as case-patients may have important insights into causes, so too may the local health
department staff. The local staff know the people in the community and their practices, and
often have hypotheses based on their knowledge.
The descriptive epidemiology may provide useful clues that can be turned into hypotheses. If
the epidemic curve points to a narrow period of exposure, what events occurred around that
time? Why do the people living in one particular area have the highest attack rate? Why are
some groups with particular age, sex, or other person characteristics at greater risk than other
groups with different person characteristics? Such questions about the data may lead to
hypotheses that can be tested by appropriate analytic techniques.
Step 8: Evaluate hypotheses epidemiologically
After a hypothesis that might explain an outbreak has been developed, the next step is
to evaluate the plausibility of that hypothesis. Typically, hypotheses in a field investigation are
evaluated using a combination of environmental evidence, laboratory science, and
epidemiology. From an epidemiologic point of view, hypotheses are evaluated in one of two
ways: either by comparing the hypotheses with the established facts or by using analytic
epidemiology to quantify relationships and assess the role of chance.
The first method is likely to be used when the clinical, laboratory, environmental, and/or
epidemiologic evidence so obviously supports the hypotheses that formal hypothesis testing is
unnecessary. For example, in an outbreak of hypervitaminosis D that occurred in
Massachusetts in 1991, investigators found that all of the case-patients drank milk delivered to
their homes by a local dairy. Therefore, investigators hypothesized that the dairy was the source
and the milk was the vehicle. When they visited the dairy, they quickly recognized that the dairy
was inadvertently adding far more than the recommended dose of vitamin D to the milk. No
analytic epidemiology was really necessary to evaluate the basic hypothesis in this setting or to
implement appropriate control measures, although investigators did conduct additional studies
to identify additional risk factors.
In many other investigations, however, the circumstances are not as straightforward, and
information from the series of cases is not sufficiently compelling or convincing. In such
investigations, epidemiologists use analytic epidemiology to test their hypotheses. The key
feature of analytic epidemiology is a comparison group. The comparison group allows
epidemiologists to compare the observed pattern among case-patients or a group of exposed
persons with the expected pattern among noncases or unexposed persons. By comparing the
observed with expected patterns, epidemiologists can determine whether the observed pattern
differs substantially from what should be expected and, if so, by what degree. In other words,
epidemiologists can use analytic epidemiology with its hallmark comparison group to quantify
relationships between exposures and disease, and to test hypotheses about causal
relationships. The two most common types of analytic epidemiology studies used in field
investigations are retrospective cohort studies and case-control studies.
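Both study types are usually summarized in a two-by-two table of exposure by illness. The measures most often computed from that table are the risk ratio (for cohort studies) and the odds ratio (for case-control studies); the text above does not name these measures, so this is added detail. The sketch below computes both from invented counts, and the function names are illustrative.

```python
def risk_ratio(a, b, c, d):
    """Risk ratio from a 2x2 table (cohort study).

    a: exposed and ill      b: exposed and well
    c: unexposed and ill    d: unexposed and well
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    if risk_unexposed == 0:
        raise ValueError("risk ratio is undefined when no unexposed person is ill")
    return risk_exposed / risk_unexposed

def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table (case-control study), same cell layout."""
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when cell b or c is zero")
    return (a * d) / (b * c)

# Hypothetical counts: 30 of 50 exposed persons ill, 10 of 50 unexposed persons ill.
rr = risk_ratio(30, 20, 10, 40)
odds = odds_ratio(30, 20, 10, 40)
print(f"Risk ratio: {rr:.1f}")
print(f"Odds ratio: {odds:.1f}")
```

A value near 1 suggests no association between exposure and illness. Assessing the role of chance would also require a significance test or confidence interval, which this sketch leaves out.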
Step 9: Reconsider, refine, and re-evaluate hypotheses
Unfortunately, analytic studies sometimes are unrevealing. This is particularly true if the
hypotheses were not well founded at the outset. It is an axiom of field epidemiology that if you
cannot generate good hypotheses (for example, by talking to some case-patients or local staff
and examining the descriptive epidemiology and outliers), then proceeding to analytic
epidemiology, such as a case-control study, is likely to be a waste of time.

When analytic epidemiology is unrevealing, rethink your hypotheses. Consider
convening a meeting of the case-patients to look for common links or visiting their homes to look
at the products on their shelves. Consider new vehicles or modes of transmission. Even when
an analytic study identifies an association between an exposure and disease, the hypothesis
may need to be honed. Sometimes a more specific control group is needed to test a more
specific hypothesis. Finally, recall that one reason to investigate outbreaks is research. An
outbreak may provide an “experiment of nature” that would be unethical to set up deliberately
but from which the scientific community can learn when it does happen to occur. When an
outbreak occurs, whether it is routine or unusual, consider what questions remain unanswered
about that particular disease and what kind of study you might do in this setting to answer some
of those questions. The circumstances may allow you to learn more about the disease, its
modes of transmission, the characteristics of the agent, host factors, and the like.
Step 10: Compare and reconcile with laboratory and environmental studies
While epidemiology can implicate vehicles and guide appropriate public health action,
laboratory evidence can confirm the findings. Environmental studies are equally important in
some settings. They are often helpful in explaining why an outbreak occurred. In an
investigation in which a well was implicated, for example, the epidemiologic, environmental, and
laboratory arms of the investigation complemented one another and led to the inescapable
conclusion that the well had been contaminated and was the source of the outbreak.
While you may not be an expert in these other areas, you can help. Use a camera to
photograph working or environmental conditions. Coordinate with the laboratory, and bring back
physical evidence to be analyzed.
Step 11: Implement control and prevention measures
In most outbreak investigations, the primary goal is control of the outbreak and
prevention of additional cases. Indeed, although implementing control and prevention measures
is listed as Step 11 in the conceptual sequence, in practice control and prevention activities
should be implemented as early as possible. The health department’s first responsibility is to
protect the public’s health, so if appropriate control measures are known and available, they
should be initiated even before an epidemiologic investigation is launched. For example, a child
with measles in a community with other susceptible children may prompt a vaccination
campaign before an investigation of how that child became infected.
Confidentiality is an important issue in implementing control measures. Healthcare
workers need to be aware of the confidentiality issues relevant to collection, management and
sharing of data. If patient information is disclosed to unauthorized persons without the patient’s
permission, the patient may be stigmatized or experience rejection from family and friends, lose
a job, or be evicted from housing.
In general, control measures are usually directed against one or more segments in the
chain of transmission (agent, source, mode of transmission, portal of entry, or host) that are
susceptible to intervention. For some diseases, the most appropriate intervention may be
directed at controlling or eliminating the agent at its source.

Some interventions are aimed at blocking the mode of transmission. Interruption of direct
transmission may be accomplished by isolation of someone with infection, or counseling
persons to avoid the specific type of contact associated with transmission. Similarly, to control
an outbreak of influenza-like illness in a nursing home, affected residents could be cohorted,
that is, put together in a separate area to prevent transmission to others. Vehicle-borne
transmission may be interrupted by elimination or decontamination of the vehicle. For airborne
diseases, strategies may be directed at modifying ventilation or air pressure, and filtering or
treating the air. To interrupt vector-borne transmission, measures may be directed toward
controlling the vector population. Some simple and effective strategies protect portals of entry.
Some interventions aim to increase a host’s defenses. Vaccinations promote development of
specific antibodies that protect against infection. Similarly, prophylactic use of antimalarial
drugs, recommended for visitors to malaria-endemic areas, does not prevent exposure through
mosquito bites but does prevent infection from taking root.
Step 12: Initiate or maintain surveillance
Once control and prevention measures have been implemented, they must continue to be
monitored. If surveillance has not been ongoing, now is the time to initiate active surveillance. If
active surveillance was initiated as part of case finding efforts, it should be continued. The
reasons for conducting active surveillance at this time are twofold. First, you must continue to
monitor the situation and determine whether the prevention and control measures are working.
Is the number of new cases slowing down or, better yet, stopping? Or are new cases continuing
to occur? If so, where are the new cases? Are they occurring throughout the area, indicating
that the interventions are generally ineffective, or are they occurring only in pockets, indicating
that the interventions may be effective but that some areas were missed?
Second, you need to know whether the outbreak has spread outside its original area or the area
where the interventions were targeted. If so, effective disease control and prevention measures
must be implemented in these new areas.
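The monitoring questions above can be answered from weekly counts of new cases by area. The sketch below is a simplified illustration with invented counts; the area names, the helper functions, and the rule of looking only at the latest week are assumptions, and real surveillance would consider trends over several reporting periods.

```python
# Hypothetical weekly counts of new cases per area after control measures began.
weekly_new_cases = {
    "Area A": [12, 7, 2, 0],
    "Area B": [9, 4, 1, 0],
    "Area C": [8, 8, 9, 10],
}

def areas_still_reporting(weekly_counts):
    """Return the areas that reported at least one new case in the latest week."""
    return sorted(
        area for area, counts in weekly_counts.items() if counts and counts[-1] > 0
    )

def summarize(weekly_counts):
    """Describe whether new cases have stopped, persist everywhere, or persist in pockets."""
    active = areas_still_reporting(weekly_counts)
    if not active:
        return "no new cases in the latest week"
    if len(active) == len(weekly_counts):
        return "new cases throughout the area"
    return "new cases only in pockets: " + ", ".join(active)

status = summarize(weekly_new_cases)
print(status)
```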
Step 13: Communicate findings
As noted in Step 1, developing a communications plan and communicating with those who
need to know during the investigation are critical. The final task is to summarize the investigation,
its findings, and its outcome in a report, and to communicate this report in an effective manner.
This communication usually takes two forms:
• An oral briefing for local authorities. If the field investigator is responsible for the
epidemiology but not disease control, then the oral briefing should be attended by the
local health authorities and persons responsible for implementing control and prevention
measures. Often these persons are not epidemiologists, so findings must be presented
in clear and convincing fashion with appropriate and justifiable recommendations for
action. This presentation is an opportunity for the investigators to describe what they did,
what they found, and what they think should be done about it. They should present their
findings in a scientifically objective fashion, and they should be able to defend their
conclusions and recommendations.
• A written report. Investigators should also prepare a written report that follows the usual
scientific format of introduction, background, methods, results, discussion, and
recommendations. By formally presenting recommendations, the report provides a
blueprint for action. It also serves as a record of performance and a document for
potential legal issues. It serves as a reference if the health department encounters a
similar situation in the future. Finally, a report that finds its way into the public health
literature serves the broader purpose of contributing to the knowledge base of
epidemiology and public health.
In recent years, the public has become more aware of and interested in public health. In
response, health departments have made great strides in attempting to keep the public
informed. Many health departments strive to communicate directly with the public, usually
through the media, both during an investigation and when the investigation is concluded.
Synthesis:
Key developments in public health practice during recent decades reflect the growing
recognition and formalization of field epidemiology, including establishment of field epidemiology
training programs in affiliation with ministries of health and other national-level public health
agencies around the world.
As the discipline of field epidemiology continues to evolve, new developments and trends
are shaping its ongoing incorporation within public health practice. Examples of these
developments include the following.
➢ The importance of global epidemiologic capacity building to protect the United
States and other populations in an era of expanded travel and population
connectivity.
➢ The potential for parties affected in outbreaks to threaten or actually bring lawsuits
and how threatened or actual litigation might affect an ongoing investigation (e.g.,
complicate or otherwise interfere with data collection or create or increase response
bias).
➢ The importance of ethical public health practice, including the ongoing need to
respect privacy and protect confidentiality in the face of the ever-evolving landscape
of culture, policy, law, and technology.
➢ The persistent awareness of and concerns about intentionality as a cause of
disease outbreaks, including lower thresholds for considering intentional actions as
a primary or contributing determinant for an outbreak and, when criminal or terrorist
acts are suspected, the resulting need for public health and law enforcement
agencies to coordinate investigations.
➢ Uses during field investigations of Internet-based and other advanced information
technologies for connecting jurisdictions, identifying cases and contacts, conducting
surveys or collecting electronically stored health data, and communicating findings
and control measures.
➢ The use of new laboratory methods for multipathogen detection, genetic
sequencing, and environmental testing to increase opportunities for detecting and
investigating epidemics, emphasizing the need for increased close communication
between epidemiologists and laboratory scientists.
➢ The increasing expectation from the public for government transparency and for
timely information about unfolding events, combined with the advent of social media
and the 24-hour news cycle for transmitting instant, if not consistently accurate,
information, each of which underscores the heightened importance of evidence-
based decision-making and enhanced communication skills.

Field epidemiology draws on general epidemiologic principles and methods, and field
epidemiologists face questions that are familiar to all epidemiologists regardless of where they
work, including questions about how study methods are shaped by logistical constraints and
about the amount of information necessary to recommend or take action. Likewise, field
epidemiologists are affected by trends that influence the practice of epidemiology in general,
such as public concerns about the privacy of health information, the increasing automation of
health information, and the growth in use of the Internet. Field epidemiology is unique, however,
in compressing and pressurizing these concerns in the context of acute public health
emergencies and other events and in thrusting the epidemiologist irretrievably into the midst of
the administrative, legal, and ethical domains of policy-making and public health action.
Assessment:

• Write your answers in a Word document formatted for short bond paper with 1-inch margins on all
sides, and submit it to the Google Classroom assigned to the class.
• Answer each of the following questions in 3-5 sentences.
• Answers should be in your own words.
• Each question is worth 5 points.
QUESTIONS:
1. What are the 2 issues encountered in field work? Briefly explain each.
2. Why are epidemic curves informative?
3. Explain “When the epidemiology does not fit the natural pattern, think unnatural, i.e.,
intentional.”
RUBRIC

References:
Blank S, Scanlon KS, Sinks TH, Lett S, Falk H. An outbreak of hypervitaminosis D associated
with the overfortification of milk from a home-delivery dairy. Am J Public Health
1995;85:656–9.
Centers for Disease Control and Prevention. (2016). Lesson 6: Investigating an Outbreak
Section 2: Steps of an Outbreak Investigation. Principles of Epidemiology in Public Health
Practice, Third Edition An Introduction to Applied Epidemiology and Biostatistics.
Retrieved from https://www.cdc.gov/csels/dsepd/ss1978/lesson6/section2.html on August
7, 2020
Goodman, R., Buehler, J., Mott, J. (2018). Defining Field Epidemiology. The CDC Field Epidemiology
Manual. Retrieved from https://www.cdc.gov/eis/field-epi-manual/chapters/Defining-Field-Epi.html
on August 7, 2020
Huang, F. and Bayona, M. (2004). Disease Outbreak Investigation. The Young Epidemiology
Scholars Program (YES). The Robert Wood Johnson Foundation and administered by the
College Board. Retrieved from
https://secure-media.collegeboard.org/digitalservices/pdf/yes/disease_outbreak.pdf on August 7, 2020
King, M., Bensyl, D., Goodman, R., Rasmussen, S. (2018). Conducting a Field Investigation.
The CDC Field Epidemiology Manual. Retrieved from
https://www.cdc.gov/eis/field-epi-manual/chapters/Field-Investigation.html on August 7, 2020
Palmer SR. Epidemiology in search of infectious diseases: methods in outbreak investigation. J
Epidemiol Comm Health 1989;43:311–4.
Last JM. A dictionary of epidemiology, 4th ed. New York: Oxford U Press, 2001:129.
Jacobus CH, Holick MF, Shao Q, Chen TC, Holm IA, Kolodny JM, et al. Hypervitaminosis D
associated with drinking milk. New Engl J Med 1992;326:1173–7.

LABORATORY 15
FIELD EPIDEMIOLOGY/OUTBREAK INVESTIGATION
I. BACKGROUND
Field epidemiology consists of investigations initiated in response to urgent public health
problems. A primary goal of field epidemiology is to guide, as quickly as possible, the processes of
selecting and implementing interventions to lessen or prevent illness or death when such problems arise.
Once the decision to conduct a field investigation of an acute outbreak has been made,
working quickly is essential — as is getting the right answer. In other words, epidemiologists
cannot afford to conduct an investigation that is “quick and dirty.” They must conduct
investigations that are “quick and clean.” Under such circumstances, epidemiologists find it

useful to have a systematic approach to follow. This approach ensures that the investigation
proceeds without missing important steps along the way.
Epidemiologic Steps of an Outbreak Investigation
1. Prepare for field work
2. Establish the existence of an outbreak
3. Verify the diagnosis
4. Construct a working case definition
5. Find cases systematically and record information
6. Perform descriptive epidemiology
7. Develop hypotheses
8. Evaluate hypotheses epidemiologically
9. As necessary, reconsider, refine, and re-evaluate hypotheses
10. Compare and reconcile with laboratory and/or environmental studies
11. Implement control and prevention measures
12. Initiate or maintain surveillance
13. Communicate findings
II. OBJECTIVES
1. To be familiar with the steps that are taken to conduct an epidemic investigation, particularly
for an unknown disease
2. To illustrate the procedures of an outbreak investigation by using historical cases of outbreak
investigations
III. MATERIALS
References/Internet
IV. ACTIVITY (CASE ANALYSIS): ANSWER THE QUESTIONS WRITTEN IN RED INK
Food-Borne Outbreak
Background
An outbreak (epidemic) of food poisoning occurred in Barangay A in Baguio City on the
evening of September 28. A total of 89 people went to the emergency departments of the three
local hospitals during that evening. No more cases were reported afterward. These patients
complained of headache, severe stomach ache, nausea, vomiting and diarrhea. The disease was
severe enough in 19 patients to require hospitalization for rehydration. Food poisoning outbreaks
like this are usually caused by the consumption of contaminated food or water. However, acute
outbreaks are more often produced by toxins from bacteria such as Staphylococcus spp.,
Clostridium perfringens, Campylobacter, Salmonella spp. and Vibrio cholerae, and by viruses such as
norovirus. Food poisoning can also be caused by chemicals or heavy metals, such as copper,
cadmium or zinc, or by shellfish toxins.
Please discuss these findings.
a) Nature of food poisoning: the action of the causative agents mentioned
b) Research news articles reporting any food poisoning outbreak in the last 5 years
Outbreak Investigation
The local health department was notified of a potential food-borne outbreak of food
poisoning in Barangay A in Baguio City and the epidemic team, including a medical epidemiologist,
a microbiologist and a nurse, visited the local hospitals to interview the attending physicians, the
patients and some of their relatives. Some stool samples were obtained from patients for
microbiologic identification of the causative agent. The epidemic team knew that this type of
outbreak usually occurs within a very short period, no more than a few hours or one to two
days after people ingest a contaminated meal.
Epidemic investigators gather data to define the distribution of the disease by time (onset
time and epidemic curve), place (potential places where the implicated meal was served, such as
canteens, restaurants and picnics) and person (the distribution of the disease by age, gender and
food items eaten). The findings of the initial investigation included the following information. The
distribution of the disease by person (age and gender) was found as follows:
Please calculate the totals for each column and row and their corresponding percentages to try to
determine if there are any important differences by age or by gender. Such a task is carried out to
investigate if there are any high-risk groups and if the age and gender distribution can give some
clues about the source of the outbreak. Interpret your findings.
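The totals and percentages asked for here can be checked with a short script. The following is a minimal Python sketch only: the counts in `cases` are hypothetical placeholders (they are not the outbreak data from the table), and the age groups and the function name `summarize` are assumptions made for illustration.

```python
# Hypothetical counts of cases by age group and gender (NOT the module's data).
cases = {
    "0-14": {"male": 4, "female": 6},
    "15-29": {"male": 10, "female": 10},
    "30+": {"male": 8, "female": 12},
}


def summarize(table):
    """Return row totals, column totals, the grand total, and each total
    expressed as a percentage of the grand total (rounded to 1 decimal)."""
    row_totals = {age: sum(by_gender.values()) for age, by_gender in table.items()}
    col_totals = {}
    for by_gender in table.values():
        for gender, n in by_gender.items():
            col_totals[gender] = col_totals.get(gender, 0) + n
    grand = sum(row_totals.values())
    row_pct = {k: round(100 * v / grand, 1) for k, v in row_totals.items()}
    col_pct = {k: round(100 * v / grand, 1) for k, v in col_totals.items()}
    return row_totals, col_totals, grand, row_pct, col_pct


row_totals, col_totals, grand, row_pct, col_pct = summarize(cases)
print("Row totals:", row_totals, "Percent:", row_pct)
print("Column totals:", col_totals, "Percent:", col_pct)
print("Grand total:", grand)
```

A group whose share of cases is much larger than its share of the exposed population would be a candidate high-risk group; the percentages alone do not show this without knowing how many people of each age and gender were exposed.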

The epidemic curve above shows the onset time of illness in the 89 patients involved in the
outbreak. The epidemic team studied the curve and recognized that this was a typical single source
acute outbreak. The team also could see that the onset of symptoms in all patients occurred during
a six-hour period. Given the symptoms mentioned above and the epidemic curve, the epidemic
team concluded that this type of epidemic usually corresponds to intoxication or food poisoning and
that the potentially implicated meal was probably served and consumed within a period of a few
hours before the onset of the symptoms. Therefore the epidemic team investigated the places
where affected persons, their relatives and neighbors ate that day (September 28). The following
table shows the team's findings:

Please calculate the attack rates per 100 (incidence rates per 100) by place to try to determine
where the contaminated meal was served. For each place compare attack rates (AR) for those who
attended with attack rates for those who did not, by using the relative risk (i.e., RR = AR
in attendees/AR in non-attendees). Interpret your findings.
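As a worked illustration of the calculation requested here, the sketch below computes attack rates per 100 and the relative risk for a single place. The numbers are hypothetical and are not taken from the module's table; the function names `attack_rate` and `relative_risk` are assumptions introduced only for this example.

```python
def attack_rate(ill, total):
    """Attack rate per 100 persons: (number ill / number at risk) x 100."""
    return 100 * ill / total


def relative_risk(ill_exposed, total_exposed, ill_unexposed, total_unexposed):
    """RR = attack rate in the exposed / attack rate in the unexposed."""
    ar_exposed = attack_rate(ill_exposed, total_exposed)
    ar_unexposed = attack_rate(ill_unexposed, total_unexposed)
    return ar_exposed / ar_unexposed


# Hypothetical place: 40 of 50 attendees became ill,
# while 10 of 100 non-attendees became ill.
ar_attendees = attack_rate(40, 50)
ar_non_attendees = attack_rate(10, 100)
rr = relative_risk(40, 50, 10, 100)

print(f"AR in attendees: {ar_attendees:.1f} per 100")
print(f"AR in non-attendees: {ar_non_attendees:.1f} per 100")
print(f"RR: {rr:.1f}")
```

In this hypothetical case the relative risk is 8, meaning attendees were eight times as likely to become ill as non-attendees. A relative risk close to 1 would suggest that attending the place was not associated with illness.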
Once the implicated place was determined, the investigation centered on the food. The following
table includes the food items served in that place on September 28:

Important note: None of the kitchen personnel were ill. The names of the kitchen personnel and their
participation in the food preparation are as follows: Manuel prepared the cheese burgers and French
fries, John prepared hotdogs and pork barbeque, Sally prepared the pancit and Jane prepared and sold
the fishballs.
Please calculate the attack rates per 100 (incidence rates per 100) by food item to try to determine the
one that was probably contaminated. Compare attack rates (AR) for those who ate the food item with
attack rates for those who did not eat the food item, by using the relative risk (i.e., RR = AR in those who
ate the food/AR in those who did not eat the food). Interpret your findings.
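When several food items must be compared, it can help to compute the relative risk for each item and rank them. The sketch below is one possible way to do this in Python. The item names ("food A" and so on) and all counts are hypothetical, not the module's data, and treating a zero attack rate among those who did not eat the item as an infinite relative risk is a simplifying assumption of this example.

```python
import math

# Hypothetical data per item:
# (ill who ate, total who ate, ill who did not eat, total who did not eat)
items = {
    "food A": (30, 40, 5, 50),
    "food B": (18, 45, 17, 45),
    "food C": (20, 30, 0, 60),
}


def food_rr(ill_ate, total_ate, ill_not, total_not):
    """Relative risk for one food item, using attack rates per 100."""
    ar_ate = 100 * ill_ate / total_ate
    ar_not = 100 * ill_not / total_not
    if ar_not == 0:
        # Division is undefined; this sketch reports it as infinite,
        # assuming at least one person who ate the item became ill.
        return math.inf
    return ar_ate / ar_not


# Rank items from highest to lowest relative risk.
ranked = sorted(items, key=lambda name: food_rr(*items[name]), reverse=True)
for name in ranked:
    print(name, food_rr(*items[name]))
```

The item with a high attack rate among those who ate it and a low attack rate among those who did not is the most likely vehicle. An item with a relative risk near 1, such as the hypothetical "food B", is unlikely to be implicated.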
Given that the epidemic team worked fast enough and the implicated meal(s) was (were) identified
before all food leftovers were discarded, food samples from some meal leftovers were taken to the
laboratory. In addition, stool samples were taken from the kitchen personnel who prepared or handled
each different food item. The laboratory confirmed that Salmonella toxin was present in some of the
food samples and that one of the kitchen personnel of that place had the same Salmonella species.
Furthermore, the Salmonella species found in the food and the kitchen worker was the same species
found in stool samples of the patients.
Please discuss these findings and identify the kitchen worker possibly responsible for the outbreak.
QUESTIONS FOR RESEARCH

1. What is a case definition? How important is it in an outbreak investigation?
2. Differentiate active from passive surveillance.
3. Why is there a need to perform descriptive epidemiology in an outbreak
investigation?
4. Define the following:
a) point source epidemic
b) continuous common-source epidemic
c) propagated epidemic
RUBRIC

LESSON 16
CHRONIC DISEASE EPIDEMIOLOGY

1. To assess & understand public health issues as determinants of population health
and illness
Introduction
Chronic disease epidemiology addresses the etiology, prevention, distribution,
natural history, and treatment outcomes of chronic health disorders, including cancer
(particularly breast, colon, lung, prostate, ovary and pancreas), cardiovascular disease,
diabetes, gastrointestinal and pulmonary disease, and obesity. Many of the greatest
population health problems are in chronic disease and include large contributions from a
lack of prioritization and appropriate implementation regarding known hazards and effective
prevention. Robust quantitative evidence is critical to addressing these.
Lesson Proper:
Chronic diseases are defined broadly as conditions that last 1 year or more and
require ongoing medical attention or limit activities of daily living or both. According to the
World Health Organization, chronic diseases are diseases of long duration and generally
slow progression. So once someone has the condition they will have to manage and control
it. These are some of the major features of chronic diseases:
➢ They have an uncertain etiology: no direct causes have been identified for the
emergence of these diseases; studies show relationships between the
emergence of the disease and exposure to certain factors referred to as ‘risk
factors’.
➢ A cluster of factors, such as the ones mentioned above, are shown to have a
strong predictive relationship to these diseases, even if exposure to these factors
does not necessarily lead to such disease; for the chronic diseases major risk
factors relate to life conditions and practices.
One of the major contributions of the field of epidemiology is establishing causal
relationships between the emergence of the disease and factors that the affected persons
have been exposed to.
- They have multiple risk factors: unlike most infectious diseases they result from
exposure to several risk factors;
- They have a long latency period: the disease proceeds over a long course of
time without symptoms;
- They show a prolonged course of illness;
- They are generally non-contagious in origin;
- They result in functional impairment or disability;
- They are incurable;
- They require long-term and systematic approach to treatment.

Chronic diseases or Non-communicable diseases (NCDs) represent more than
half the global burden of disease. Cardiovascular disease causes roughly half of NCD
deaths. For a long time, NCDs were dismissed as “rich-country problems” and not
worthy of global attention. But NCDs are a larger problem in low-income countries than
in high-income ones. They are the price paid for economic development, prosperity and major
achievements in healthcare, which bring us longer, less arduous, but perhaps more
stressful lives.
When more than one chronic condition occurs at the same time, the picture gets
more complicated. One in three adults worldwide has multiple chronic conditions:
cardiovascular disease alongside diabetes, depression as well as cancer, or a
combination of three, four, or even five or six diseases at the same time.
Advances in medical technology have resulted in people living longer, and
therefore the ageing population is increasing. In many parts of the world, especially
developed countries such as Sweden that have a high proportion of older people,
the prevalence of NCDs tends to be higher.
Previously, chronic NCDs were known as diseases of affluence. However,
current data show that low- and middle-income countries now have the highest
mortality rates due to NCDs, which suggests a change in NCD trends. Vulnerable and
disadvantaged communities also tend to have lower life expectancy than people from
higher social classes – determined by education, occupation, income, gender and
ethnicity.
Chronic NCDs have been said to place a burden on individuals, families, health
systems and the economy, brought about by loss of independence, loss of income,
increased budget for medication and loss of economically active workforce.
Effects on the individual and family

People living with chronic diseases are affected socially and economically. Chronic
disease has major adverse effects on the quality of life of affected individuals; it causes
premature death and creates significant, underappreciated adverse economic effects
on families, communities and societies in general.
Effects on the workforce

Chronic diseases have not only social, but also economic effects. A significant
proportion of affected people are those of working age – family breadwinners and people
who should be productive members of the economy. In addition, in the case of chronic
disease there is a need for regular visits to the health facility, which impacts on time at
work, and productivity. Healthier individuals are less likely to be absent from work.
THE IMPACT OF CHRONIC DISEASES ON HEALTH SERVICES

Chronic diseases threaten to overwhelm already over-stretched health services.
While historically the health care system has focused on treating acute illnesses, today
there is growing pressure for the health care system to effectively manage the increasing
number of chronic disease sufferers as well. Chronic conditions are long-term illnesses
that limit life activities and require ongoing care. Yet many people do not have access to
ongoing medical attention, especially in developing countries, and particularly in the
African region where resources are scarce. Lives are then lost due to the fact that acute
care models and available services cannot accommodate the needs of chronically ill
individuals. These are often people from the most needy groups, where the result is
further increased stress on families due to loss of breadwinners.
CHRONIC DISEASE EPIDEMIOLOGISTS
Chronic disease epidemiologists (CDEs) perform functions that are critical to
health departments. Collecting, analyzing, interpreting, and disseminating data on
chronic diseases and related risk factors is vital to understanding and raising awareness
about morbidity, mortality, associated costs, and disparities. These data are also vital
inputs throughout the process of implementing evidence-based public health approaches
to reduce the burden of chronic diseases.
CHRONIC DISEASE SURVEILLANCE
Chronic disease surveillance is changing, with new priorities that are more
upstream, more clinical, more cross-cutting, and more granular than previous priorities;
new data sources, such as electronic health records, to supplement traditional sources;
and new technologies. Today’s state, territorial, local, and tribal CDEs increasingly need
to be strategic, innovative, collaborative, and efficient while wearing many hats and
taking on leadership roles: statistician, informaticist, demographer, cartographer,
evaluator, communications specialist, privacy officer, strategist, convener, and others.
CDEs need to expand partnerships across multiple sectors to leverage data and
resources to address social, environmental, and economic conditions that affect health
and advance health equity. Timely and locally relevant data, metrics, and analytics are of
utmost importance in this work to guide, focus, and assess the effect of prevention
initiatives, including those targeting the social determinants of health and enhancing
equity.
Chronic disease surveillance is challenged by data gaps, limitations in data
access and timeliness, increases in data collection costs, decreases in funding, and
inadequate staffing.
To achieve excellence in chronic disease epidemiology and to build capacity, the
following are needed:
1) identify champions for enhancing capacity,
2) continually review and update the essential roles of CDEs,
3) expand the skills and competencies of the current and future workforce,
4) develop and enhance partnerships to improve data sharing,
5) leverage and link existing data sources,
6) improve the availability of local data,
7) fill data gaps to better measure determinants of health and health disparities, and
8) make data more actionable.
Strong commitment is vital to building and maintaining capacity-building efforts in
chronic disease epidemiology and surveillance in state, territorial, local, and tribal public
health agencies. Throughout these capacity-building efforts and across all chronic
disease epidemiology and surveillance efforts, the default view must be through a health
equity lens.
Synthesis:
Chronic diseases account for 6 of the top 7 causes of death according to the Centers for
Disease Control and Prevention. From cardiovascular disease to diabetes to cancer and
pulmonary disease, chronic diseases claim far more lives than such infectious diseases as
pneumonia and influenza. And as the population ages, the burden of these diseases is only
likely to increase. The same is true overseas and, increasingly, in developing countries.
These trends have spurred myriad efforts at international, national, state, and local
levels and in academia and industry – all of which call for interdisciplinary researchers and
practitioners with the ability to identify risk factors, understand the social context, and
develop prevention strategies for the most significant chronic diseases.
Assessment:
1. List the top 10 chronic diseases worldwide.
2. List the top 5 chronic diseases in the Philippines.
References:
Australian National University (n.d.). Chronic Disease Epidemiology and Pharmacoepidemiology.
Retrieved from
https://rsph.anu.edu.au/research/groups/epidemiology-policy-and-practice/chronic-disease-epidemiology-and
on August 7, 2020
Calanan, R. (2018). Achieving Excellence in the Practice of Chronic Disease Epidemiology.
Centers for Disease Control. Preventing Chronic Disease: Public Health Research,
Practice, and Policy. Retrieved from https://www.cdc.gov/pcd/issues/2018/18_0526.htm on
August 7, 2020
Columbia Mailman School of Public Health. (2020). Chronic Disease Epidemiology. Retrieved
from
https://www.publichealth.columbia.edu/academics/departments/epidemiology/research/chronic-disease-epidemiology
on August 7, 2020
Council of State and Territorial Epidemiologists. Essential functions of chronic disease
epidemiology in state health departments.
http://www.cste2.org/webpdfs/EssentialFunctionsWhitePaperEditedFinal092204.pdf.
Accessed October 2, 2018.
DeSalvo KB, Wang YC, Harris A, Auerbach J, Koo D, O’Carroll P. Public Health 3.0: a call to
action for public health to meet the challenges of the 21st century. Prev Chronic Dis
2017;14:E78.
Huang, F. and Bayona, M. (2004). Disease Outbreak Investigation. The Young Epidemiology
Scholars Program (YES). The Robert Wood Johnson Foundation and administered by the
College Board. Retrieved from
https://secure-media.collegeboard.org/digitalservices/pdf/yes/disease_outbreak.pdf on August 7, 2020
School of Public Health University of the Western Cape (n.d). Epidemiology and Control of Non-
communicable Diseases – Unit 1. Retrieved from https://www.google.com/url?sa
=t&rct=j&q=&esrc=s&source=web&cd=&cad=rja&uact=8&ved=2ahUKEwj70OKNwojrAhV
syIsBHf2sDwUQFjADegQIAhAB&url=https%3A%2F%2Fwww.uwc.ac.za%2FFaculties%2
FCHS%2Fsoph%2FDocuments%2FSOPH%2520UWC-%2520 Epidemiology%2520File
%25202%2520Unit%25201(JB).doc&usg=AOvVaw1Wb56ogz-d9-BJ1MnzGNLM on
August 7, 2020
Waxman, A. (2020). This is the biggest challenge to our health. World Economic Forum.
Retrieved from
https://www.weforum.org/agenda/2017/12/healthcare-future-multiple-chronic-disease-ncd/
on August 7, 2020

LABORATORY 16
CHRONIC DISEASES EPIDEMIOLOGY
I. BACKGROUND
Chronic disease epidemiology addresses the etiology, prevention, distribution, natural
history, and treatment outcomes of chronic health disorders, including cancer (particularly breast, colon,
lung, prostate, ovary and pancreas), cardiovascular disease, diabetes, gastrointestinal and pulmonary
disease, and obesity. Many of the greatest population health problems are in chronic disease and
include large contributions from a lack of prioritization and appropriate implementation regarding
known hazards and effective prevention. Robust quantitative evidence is critical to addressing these.
II. OBJECTIVES
1. Explain the terms ‘chronic disease’ and ‘chronic non-communicable disease’.
2. Describe the extent of the problem.
3. Explain why chronic diseases are a concern.
4. Understand the global and local burden of chronic diseases.
5. Understand basic concepts in chronic disease epidemiology.
6. Understand the implication of these chronic diseases in relation to health and development at
the global, country and family level
III. MATERIALS
References
IV. ACTIVITY
- Complete the following tasks
TASK 1: Develop your own definition of chronic diseases
(a) As health workers, we encounter many different types of diseases: in your
understanding, what is a ‘chronic disease’?
(b) What conditions fall within the category of ‘chronic disease’?
TASK 2: Identify some of the implications of the burden of chronic disease
Look for various definitions of chronic diseases. Based on your research, present a picture
of the nature of chronic disease. What then might be the implications of chronic disease?
• What costs or losses might a chronic disease predispose one to? Think of costs,
or the burden of suffering that chronic disease presents to individuals, families and
society at large. You may categorise your response according to these affected
populations.
• What opportunities does this picture show us for arresting the course of disease?
TASK 3: Identify prevalent chronic diseases
Now that you have an overview of chronic diseases, refer to the NCD country profile 2019
presented by WHO to answer the following questions:
• What chronic NCDs are prevalent in the world, the Philippines and your own province?
• Prioritize them from the most urgent to the least urgent.
• How does your country compare to other countries within the same income group?
TASK 4: Consider the case of South African health services
Bearing in mind the strain on health services brought about by the burden of chronic diseases,
explain the South African health service status within a developing country, and highlight how you
see health systems accommodating people from different socio-economic statuses (especially the
poor).

1. Based on this statement: ‘Chronic diseases are diseases of affluence’
a) Can chronic NCDs really still be considered ‘diseases of affluence’?
b) Do chronic NCDs only affect rich countries?
c) Do chronic NCDs affect only the rich in rich countries?
d) Are chronic NCDs a problem only for the elderly?
2. What are the effects of chronic diseases on labour supply and productivity
(workforce)?
3. Discuss the coping mechanisms you see occurring within families and the working
environment, to withstand the conditions brought about by the burden of chronic
diseases in your area.
LESSON 17
CLINICAL EPIDEMIOLOGY

1. To apply and interpret measures of disease occurrence and correlates in populations
Introduction
Clinical epidemiology is the study of the patterns, causes, and effects of health and
disease in patient populations and the relationships between exposures or treatments
and health outcomes.
Lesson Proper:
Clinical epidemiology is the application of epidemiological principles and methods
to the clinical setting. In short, clinical epidemiology is generally focused on applied
decision-making, for the purpose of improving patient-level outcomes. Classical
epidemiology is generally focused on the distribution and determinants of disease
(population level), while clinical epidemiology is the application of the principles and
methods of epidemiology to conduct, appraise, or apply clinical research for the purpose
of improving prevention, diagnosis, prognosis, and treatment of diseases in patients.
The movement toward evidence-based medicine and evidence-informed decision-
making in the clinical setting and in healthcare more generally is a direct derivative of the
field of clinical epidemiology.
Clinical Epidemiology is the study of groups of people to obtain the background
evidence needed for clinical decisions in patient care. It must generate the best possible
evidence from groups of individuals regarding the effectiveness and efficiency of various
clinical courses of action. It must also translate this evidence (or the lack thereof) into
rational clinical decisions pertaining to the management of individual patients.
Clinical Epidemiology utilizes techniques developed by classical epidemiology
and adapts these to the study of individual patients. It incorporates concepts from
related fields such as Biostatistics, Health Social Science, and Health Economics. It
deals mainly with the teaching of clinical research methodology and evidence-based
medicine. Clinical Epidemiology is an evolving discipline and is considered as a basic
science in clinical medicine.
The main predictors for the prognosis – diagnosis and treatment – are thus key
concepts in clinical epidemiology and the practice of clinical medicine. Whereas much of
population epidemiology is directed towards the general population, clinical
epidemiology is more focused on the individual.
In 1938, Paul used the term clinical epidemiology for the first time and defined it
as a new basic science for preventive medicine, but Paul’s description does not entirely
cover the modern description of clinical epidemiology, whose concepts have been
developed since the mid-1960s, in particular by Sackett, Feinstein, the Fletchers, and
Weiss. Clinical epidemiology interfaces with many other areas. Thus, practical
application of clinical epidemiology is a key part of evidence-based medicine and clinical
decision making. In recent years, clinical epidemiology has become important for the
health care system because of the need for assessments in the areas of quality of care,
patient safety, health economics, and use of resources, all of which are based on clinical
epidemiology thinking. Furthermore, clinical epidemiology supplies data and evidence
needed in organization and planning of the health care system. Biostatistics is an
important basic tool for clinical epidemiology.
Synthesis:
Clinical Epidemiology is primarily focused on research on clinical questions, and on
the application of epidemiological principles and questions relating to patients and clinical
care in terms of prevention, diagnosis, prognosis, and treatment.
Assessment:
Offline Learners: Do the activity below and submit it to your instructor via email.
Online Learners: Do the activity posted in Google Classroom.
1. Why is clinical epidemiology important?
2. In table form, differentiate population epidemiology from clinical epidemiology.
RUBRIC

References:
Fletcher RH, Fletcher SW. Clinical Epidemiology. The Essentials. 4th ed. Philadelphia, PA:
Lippincott Williams and Wilkins; 2005.
Feinstein AR. Clinical Epidemiology. The Architecture of Clinical Research. Philadelphia, PA:
W.B. Saunders Company; 1985
Oregon Health and Science University (n.d). Clinical Epidemiology Research. Department of
Medical Informatics and Clinical Epidemiology. Retrieved from
https://www.ohsu.edu/school-of-medicine/medical-informatics-and-clinical-
epidemiology/clinical-epidemiology-research on August 7, 2020
Paul J. Clinical epidemiology. J Clin Invest. 1938;17:539–541
Sackett DL, Straus SE, Richardson WS, Rosenberg W, Haynes RB. (2000) Evidence-based
Medicine. How to Practice and Teach EBM. 2nd ed. Edinburgh, UK: Churchill Livingstone.
Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical Epidemiology. A Basic Science for
Clinical Medicine. 2nd ed. Boston, MA: Little, Brown and Company; 1991
Sorensen, H. (2009). Clinical Epidemiology – a fast new way to publish important research.
Dovepress. Retrieved from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2943161/
on August 7, 2020
University of the Philippines (2018). Clinical Epidemiology. Department of Clinical Epidemiology,
College of Medicine. Retrieved from http://upcm.ph/clinical-epidemiology/ on August 7,
2020
Western University (2020). Clinical Epidemiology. Retrieved from
https://www.schulich.uwo.ca/epibio/research/research_clusters/metholological_approaches_and_disciplines/clinical_epi.html
on August 7, 2020
Weiss NS. (2006) Clinical Epidemiology. The Study of the Outcome of Illness. 3rd ed. Oxford,
UK: Oxford University Press

LABORATORY 17
CLINICAL EPIDEMIOLOGY
I. BACKGROUND
Clinical epidemiology is the study of the patterns, causes, and effects of health and
disease in patient populations and the relationships between exposures or treatments
and health outcomes.
II. OBJECTIVES
1. To apply and interpret measures of disease occurrence and correlates in populations
III. MATERIALS
References
IV. ACTIVITY
1. Look for a research article on clinical epidemiology.
2. Present it in a PowerPoint presentation using the IMRAD format.

1. What topics are usually covered by Clinical epidemiology?
2. What is the contribution of Clinical Epidemiology to Public Health?
Module Evaluation Questionnaire
The learner’s feedback is vital to us. Taking into account your assessment and impression will
help us enrich the content and enhance the quality of your learning engagement with us.
From this view, we would appreciate if you could spend some time completing this evaluation by
checking the column you think is appropriate and then providing a qualitative response to the
questions raised in this form.
The questionnaire is anonymous and though your participation is voluntary, your utmost
cooperation is encouraged.
Once completed the results of these questionnaires will be analysed and an overview compiled
which will be reported to the next cohort of students in the module handbook. The overview will
also be used to inform discussion at programme team conference.
INDICATORS (rate each as: Strongly agree / Moderately agree / Slightly agree / Disagree / Strongly disagree)
The Module
a) was effectively designed
b) had clear learning outcomes
c) was well organized
d) contained relevant information
e) had clear images
f) had sufficient parts
g) had lessons that related to life experiences
The Assessment
a) rubrics were clear
b) instructions were comprehensive
c) was sufficiently challenging
d) was aligned to the lessons
e) was done within the prescribed time
f) had enriched my knowledge about the lessons
g) were of different types
h) contained critical thinking questions
What do you like most about the module?
What could have been improved on the module?
What other things do you suggest to improve the module?
How satisfied are you with the module?
Thank you.

