EPIMLS1
and
Epidemiology
Prepared by:
Jaleh V. Gacayan, RMT, MPH, LPT
Adelle N. Sanchez, MA Physics, LPT
1. Regular attendance in classes: You must attend online classes and live quizzes regularly by logging
in to our scheduled online activities. Online lectures will be conducted through Google Meet and/or Facebook
Live. Assessments will be given through Quizizz, Pear Deck, Canvas, and/or Google Forms. For offline
students, your attendance will be monitored through your responses to text messages and through
timely correspondence.
2. Submission of required activities: All required activities (assignments, research work, laboratory
illustrations) should be submitted on or before the given deadline. Deadlines will be posted by the
teacher in Google Classroom and the Messenger group chat. They will also be texted to offline students.
For online students, requirements must be submitted to the teacher’s email address which will be
provided during the class orientation. For offline students, requirements must be submitted via mail or
express courier (e.g. LBC, JRS) addressed to: Instructor’s name, School of Natural Sciences,
University of Baguio, Baguio City.
3. Seventy percent (70%) passing score in all required activities: Quizzes, exams, assignments,
research work, laboratory illustrations.
Computation of grades:
▪ The Course Grade is obtained by combining the lecture and laboratory grades (50%:50%) for the
subject.
▪ Laboratory grade shall be computed as 30% enhancement activities (illustrations; research work;
case study; experiments – when possible) plus 70% class standing (quizzes and exams).
▪ The cumulative system of computing grades shall be followed. Grades computed for midterms and
finals are considered tentative. The final midterm grade is calculated by getting 1/3 of the first
grading grade plus 2/3 of the tentative midterm grade and the final grade is computed by getting 1/3
of the midterm grade plus 2/3 of the tentative final grade.
4. Study/Learning Guidelines:
a. Manage your time properly. As students of higher education (College), you are expected to be
more responsible in paying attention to course schedules, requirements, and deadlines. Schedule
how you will accomplish all the requirements in all your enrolled courses (reading the modules,
reading on research/ enhancement questions, doing assignments and laboratory illustrations) and
focus your attention when doing your tasks.
b. Observe proper conduct. Despite this online mode of learning, you must still maintain appropriate
behavior at all times. All standards of student conduct outlined in the University of Baguio Student
Handbook remain in full effect during this time of distance learning. Be honest in answering your
quizzes and exams. Work independently when accomplishing tasks and assignments.
c. Stay motivated. Your future depends on what you do today. Maintain a positive attitude towards
learning and enjoy a fun-learning environment despite the current circumstances.
d. Maintain a performance of high standard. Give your best in accomplishing all the assigned tasks.
Do not be complacent with just a 70% passing cut-off score. Remember that this is a board subject,
and the best preparation for the board/licensure examination should be during these formative
years. The board review is but supplementary to the knowledge you have already learned during
your Med Tech education.
e. Communicate properly. Promptly respond to notifications by regularly visiting our Google Classroom
and Messenger group chat. If you have questions about any part of this module, I am here
to guide you. Send your academic concerns using the same online platforms. For offline
students, text messages and mobile calls are welcome during scheduled hours of the day and
week. Be guided by this schedule when communicating:
Endorsed by:
Introduction:
One of the key steps to clearly defining the research question is to state clearly who or what
needs to be studied. Although many studies in public health involve human beings as subjects,
this is not a requirement. The subjects can be anything. Examples are children, extracted teeth,
pregnant women, water sources exposed to bacteria, cell cultures, households in the tropics,
muscle tissues, etc. The subjects in a study are the sources of information or data. Data are
obtained by measuring the characteristics of the subjects.
Lesson Proper:
Defining who or what is going to be studied means defining the population, which consists of
all possible subjects of interest. Defining the population is a critical step because it defines the
subjects to be studied and the subjects to whom the conclusions will be applicable.
A sample is a smaller set or a subset of the population. A representative sample is a subset
that provides an accurate picture of the whole population. The sample data should be very similar
to what would be found in the whole population. A representative sample provides an accurate
picture of the population but an unrepresentative sample may be misleading.
A biased sample occurs when certain members of the population are chosen so that the
sample systematically misrepresents the population. This happens when subjects select
themselves for the sample or the investigator selects subjects that are convenient. To avoid
this, random sampling must be done. A sampling frame must be created where respondents are
listed and assigned a unique number. A random number generator can be used to randomly
select the sample.
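The procedure just described (list the respondents, number them, then let a random number generator pick) can be sketched in a few lines of Python. This is only an illustration: the function name `draw_simple_random_sample`, the 50-person frame, and the seed are all hypothetical and not part of the module.

```python
import random

def draw_simple_random_sample(sampling_frame, sample_size, seed=None):
    """Number every unit in the frame, then pick sample_size numbers at random."""
    rng = random.Random(seed)  # the seed only makes the draw reproducible
    # Assign each respondent a unique number starting at 1.
    numbered_frame = {number: unit
                      for number, unit in enumerate(sampling_frame, start=1)}
    # sample() draws without replacement, so no respondent is picked twice.
    chosen_numbers = rng.sample(sorted(numbered_frame), sample_size)
    return [numbered_frame[number] for number in chosen_numbers]

# Hypothetical sampling frame of 50 respondents.
frame = [f"Respondent {i}" for i in range(1, 51)]
sample = draw_simple_random_sample(frame, sample_size=10, seed=2020)
print(sample)
```

Because every numbered respondent has the same chance of being drawn, neither the subjects nor the investigator decides who ends up in the sample.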
Study Designs
The study design is how information on the subjects will be collected.
Variable Types
1. Categorical
For continuous variables, there are many options for summarizing the responses numerically.
Typically, numerical summaries of continuous variables are the mean, standard deviation,
median, first and third quartiles, and minimum and maximum. Some of these numerical
summaries describe the center of the distribution, and some describe the spread.
A measure of center provides a description of the average response. A measure of spread
provides a description of how varied the responses are. A measure of how spread out the
responses are tells the investigator whether the responses are clustered close to the center or
dispersed farther away from the center.
The mean is commonly used to describe the center of the responses. However, when
extremely large or small values are present, the mean is no longer a good measure of the center.
In this case, the median, which is the middle of the responses, is a better measure of the center.
The choice for the measure of spread depends on the measure of center. Mixing and matching
measures of center and spread is not appropriate. If the mean is chosen as the measure of center,
then an appropriate measure of spread is the standard deviation. If the median is chosen to
summarize the center of the data, then an appropriate measure of spread is the range. The range
Synthesis:
Ideally, samples are chosen from complete sampling frames using an element of random
chance. When a sample systematically misses a group of subjects in a population, the sample
suffers from selection bias. Bias can occur when the sampling frame omits a particular group of
subjects, or when subjects are either self-selected or investigator-selected. Although eliminating all
sources of bias from the sample selection process may not be possible, selecting a sample
requires careful consideration and procedures should be carefully planned so that potential biases
are minimized.
To help you with this topic, you can watch the video links below:
https://www.youtube.com/watch?v=Mb9BuEkbaHQ
https://www.youtube.com/watch?v=VPM84_yfx5Q
Assessment:
Offline Learners: Answer the given activity and send it to your instructor’s email.
Online Learners: Answer the quiz posted in Google Classroom.
Give brief and concise answers for the following questions. Be guided by the rubric below: (20
points)
1. Consider the population of children ages 6-10 who attend public schools. How could a
representative sample of this population be obtained? Identify the problems with using
the following methods to obtain a sample of this population:
a. Send a survey (with a stamped return envelope) to every public elementary school.
The students who return the survey are the sample.
b. The investigator has worked with educational leaders in states in the Southeast and
knows that the schools in this region will be motivated to participate. The investigator
collects data only on children from these schools.
c. Children attending elementary schools with high test scores may be more likely to
participate, so the investigator chooses only children from schools in the top 10%.
d. The investigator draws a random sample using the home telephone numbers of
parents with children ages 6-10 in public schools.
Reference:
Bush, H., Macera, C., Shaffer, R., and Shaffer, P. (2020). Biostatistics and Epidemiology.
Taguig: Cengage Learning Asia Pte Ltd.
I. BACKGROUND
Epidemiology and biostatistics are the basic sciences of public health. Public health
investigations use quantitative methods, which combine the two disciplines of epidemiology and
biostatistics. Epidemiology is about the understanding of disease development and the methods
used to uncover the etiology, progression, and treatment of the disease. Information (data) is
collected to investigate a question. The methods and tools of biostatistics are used to analyze the
data to aid decision making. (Johns Hopkins Bloomberg School of Public Health)
II. OBJECTIVES
At the end of the lesson you are expected to:
1. Identify the basic concepts and principles of biostatistics and epidemiology
2. Trace the historical development of biostatistics and epidemiology
III. MATERIALS
Coupon bond
Writing & Drawing materials
References
IV. ACTIVITY
3. Define observational study. List and describe the types of observational studies
A. Identify the type of data (nominal, ordinal, interval and ratio) represented by each of the
following.
1. Blood group
2. Temperature (Celsius)
3. Ethnic group
4. Job satisfaction index (1-5)
5. Number of heart attacks
6. Calendar year
7. Serum uric acid (mg/100ml)
8. Number of accidents in 3 - year period
9. Number of cases of each reportable disease reported by a health
worker
10. The average weight gain of six 1-year-old dogs (with a special diet supplement)
was 950 grams last month.
Introduction:
Each type of variable has its own special properties, and the distribution of each type of
variable has a particular shape and characteristics. The distribution of a variable consists of a
summary of the possible values the variable can have and the number of subjects with each of
these values. A distribution that uses counts to describe the number of subjects with a particular
value is called a frequency distribution. A distribution that uses proportions to describe the
number of the subjects with a particular value is called a probability distribution.
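To make the difference between the two kinds of distribution concrete, the sketch below builds both from the same data. The ten blood-group values are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical blood groups recorded for 10 subjects.
blood_groups = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "B"]

# Frequency distribution: count of subjects with each value.
frequency = Counter(blood_groups)

# Probability distribution: proportion of subjects with each value.
total = sum(frequency.values())
proportion = {value: count / total for value, count in frequency.items()}

for value in sorted(frequency):
    print(f"{value:>2}: count = {frequency[value]}, "
          f"proportion = {proportion[value]:.2f}")
```

The counts add up to the number of subjects and the proportions add up to 1, which is the same check that applies to the slices of a pie chart.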
Lesson Proper:
Two types of graphs are used to summarize categorical variables: pie charts and bar graphs.
Pie charts can be presented using frequencies or proportions. A pie chart describes how the
pieces relate to the whole. So the counts for all the slices (categories) must add up to the total
number of subjects, and/or the proportions for all the slices (categories) must add up to 1 (or
100% if percentages are used). Pie charts are generally used when trying to demonstrate how
the categories within a variable relate to each other. Bar graphs are used to describe the
distributions of categorical variables. The height of the bar indicates how many subjects are in each
value or category. The height of the bar can represent either the number of subjects (frequency)
or the proportion (probability) of subjects in a particular category. Because the shape of the graph
does not change when using the count or proportion of participants, many times the proportion of
participants is preferred because it takes the total number of subjects into account. Because the
heights of the bars can easily be compared, these graphs are particularly useful when the
research question involves comparisons.
Binomial distributions are used when the data include a variable with two options. These
variables are dichotomous and each subject has only two possibilities: have the characteristic or
do not have the characteristic. Often these two options are measured by a variable where subjects
are assigned a 0 for not having the characteristic and 1 for having it. Since there is a gap between
two possible values, binomial variables are said to be discrete. The mean of the binomial
distribution is simply the expected number of subjects with the characteristic, that is, the
proportion multiplied by the sample size. The variance depends on what is being summarized.
For the number of subjects with the characteristic, it is the sample size multiplied by the
proportion with the characteristic and by the proportion without it. For the sample proportion
itself, it is the proportion with the characteristic multiplied by the proportion without it, all
divided by the sample size.
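A short numerical sketch may help here. It assumes a hypothetical 0/1 variable for ten subjects and computes the mean of the count together with two variances: the one for the count of subjects, n p (1 - p), and the one divided by the sample size, p (1 - p) / n, which applies to the sample proportion.

```python
# Hypothetical 0/1 indicator: 1 = has the characteristic, 0 = does not.
has_characteristic = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]

n = len(has_characteristic)        # sample size
count = sum(has_characteristic)    # number of subjects with the characteristic
p = count / n                      # proportion with the characteristic

mean_count = n * p                       # mean of the binomial count
variance_count = n * p * (1 - p)         # variance of the binomial count
variance_proportion = p * (1 - p) / n    # variance of the sample proportion

print(mean_count, variance_count, variance_proportion)
```

Note that the mean of the count equals the number of 1s in the data, as the paragraph states.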
Histograms best describe the distribution of a continuous variable. A histogram is a graphical
representation of a variable in which the observed values are categorized, a bar is drawn for each
category, and the number of participants in each category is represented by the height of the bar.
It provides a quick picture of the distribution of a variable and it can be presented with counts or
proportions of participants. There are no gaps between the bars of the histogram which
demonstrates that all in-between values are possible. The chart must have enough bars to
present a good picture of the distribution results. Too few or too many categories (bars) hide
interesting trends in the data. Histograms provide information about how spread out the
responses are, which responses are common, which responses are in the center, and the overall
shape of the distribution.
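The categorizing step behind a histogram can be sketched as follows. The helper `histogram_counts`, the equal-width binning rule, and the twelve ages are assumptions made for this illustration; other binning rules are equally valid.

```python
def histogram_counts(values, n_bins):
    """Split the range of the data into n_bins equal-width categories
    and count how many values fall in each one."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The largest value would land one past the end, so cap the index.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical ages (in years) of 12 participants.
ages = [21, 25, 27, 30, 31, 33, 34, 36, 38, 41, 45, 52]

for n_bins in (2, 4, 8):
    print(n_bins, "bars:", histogram_counts(ages, n_bins))
```

Printing the counts for 2, 4, and 8 bars shows how the picture of the same data changes with the number of categories.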
Hypothesis Testing
A statement claiming that the null parameter is the true parameter is called the null
hypothesis. An alternative or research hypothesis is a hypothesis that states the true parameter
is not (or is less than or is greater than) the null parameter. To reject the null parameter as the
true parameter, the observed statistic needs to be far enough away from the center so that the
observed statistic clearly came from a sampling distribution that was not centered at the null
parameter. On the other hand, when the observed statistic is very close to the null parameter,
claiming that this statistic would not be expected from a sampling distribution centered at the null
parameter would be difficult.
The proportion of statistics that are at least as far from the null parameter as the observed
statistic is called the p-value. When the p-value is small, the observed statistic is rare and
provides evidence against the null hypothesis. When the p-value is large, the observed statistic
is common and does not provide sufficient evidence against the null hypothesis. Generally, p-
values that are smaller than 0.05 are considered small enough to reject the null hypothesis. When
studies are observational or exploratory, researchers may consider p-values smaller than 0.10 as
small enough to reject the null hypothesis. In contrast, when the researcher wants to be really
sure that the null parameter is not the true parameter, 0.01 or even 0.001 is defined as small
enough to reject the null hypothesis.
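As a rough illustration of the cutoff rule, the sketch below computes a two-sided p-value for a standardized statistic. It assumes the sampling distribution is approximately standard normal; the function names and the three observed z values are hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z_observed):
    """Proportion of a standard normal sampling distribution at least as far
    from the null value (in either direction) as the observed z statistic."""
    upper_tail = 1 - NormalDist().cdf(abs(z_observed))
    return 2 * upper_tail

def decision(p_value, alpha=0.05):
    """Apply the cutoff rule: reject the null hypothesis when p-value < alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"

for z in (0.5, 1.96, 2.8):   # hypothetical observed statistics
    p = two_sided_p_value(z)
    print(f"z = {z}: p-value = {p:.4f} -> {decision(p)}")
```

Passing `alpha=0.10` or `alpha=0.01` to `decision` reproduces the looser and stricter cutoffs mentioned above.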
Type 1 Error
This occurs when a statistic provides evidence against the null parameter, but the null
parameter is really the true parameter. This happens when the null hypothesis is rejected even
though it is really true. If the cutoff for defining a small enough p-value to reject the null is set at
0.05 or 5%, then 5% of the possible statistics in the sampling distribution are far enough away to
reject the null parameter even though it is really the true parameter. Therefore, there is a 5%
chance that a type 1 error will be made because these are rare statistics from a sampling
distribution centered around the null parameter. The rule for when to reject the null parameter
as the true parameter is often written as p-value < 𝛼, for example, p-value < 0.05.
Type 2 Error
This happens if the research hypothesis is true and the null hypothesis is not rejected. When
the significance level is set to a very small value, then it is more likely that a type 2 error will be
made. Thus, as the chance of a type 1 error decreases, the chance of a type 2 error increases. A
type 2 error occurs when the null parameter is not the true parameter, but the null hypothesis is
not rejected. The probability of a type 2 error is presented less often than the significance level.
Power
The goal of any hypothesis test is to have a sample with good power, that is, a sample with a
good chance of supplying enough evidence so that an incorrect null parameter can be rejected
as the true parameter. The probability that the null hypothesis will be rejected when it is indeed
A Self-regulated Learning Module 21
false is called power. Power is the complement of the probability of a type 2 error (𝛽) and can be written
as 1 − 𝛽. A study with good power has power of 80-90%. If a study has 90% power, this means
that there is a 90% chance that a false null parameter will be rejected as the true parameter.
Power, type 1 error, and type 2 error all work together. The probabilities of type 1 and type 2
errors are inversely related. Because the probability of a type 2 error and power are complements,
they are also inversely related. Hence, when the probability of a type 1 error increases, then the
power also increases.
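The relationship between the significance level and power can be checked numerically. The sketch below assumes an upper-tailed z test with a known standard deviation, which is a simplification chosen for illustration; the effect size, standard deviation, and sample size are hypothetical.

```python
from statistics import NormalDist

def power_one_sided_z(effect, sigma, n, alpha):
    """Power of an upper-tailed z test: the chance of rejecting a false null
    when the true mean exceeds the null mean by `effect`."""
    std_normal = NormalDist()
    critical_z = std_normal.inv_cdf(1 - alpha)   # rejection cutoff under H0
    shift = effect / (sigma / n ** 0.5)          # true difference in standard errors
    beta = std_normal.cdf(critical_z - shift)    # probability of a type 2 error
    return 1 - beta

# Hypothetical study: true difference 5, standard deviation 20, 100 subjects.
for alpha in (0.01, 0.05, 0.10):
    power = power_one_sided_z(5, 20, 100, alpha)
    print(f"alpha = {alpha:.2f}: power = {power:.3f}")
```

Under these assumptions the power rises as the type 1 error probability is allowed to rise, and it reaches roughly 80% at a significance level of 0.05.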
Synthesis:
Many research questions remain to be answered in public health. Statistical methods can be
utilized in the development of the question, deciding whom or what to study, determining what to
measure, summarizing the measurements, and testing hypotheses. After the research question
has been posed, the sample selected, and the measurements made, statistical methods provide
a strategy for moving from data points to answers.
There are many statistical procedures for describing the data, estimating the
parameter and testing the hypothesis. Organizing the methods according to the research question
simplifies choosing the appropriate statistical method.
To help you understand the topic better, visit the following links:
https://www.youtube.com/watch?v=U6fBc_SPVoM
https://www.youtube.com/watch?v=s6y3ykvDols
Assessment:
Offline Learners: Answer the activity below and send it to your instructor via email.
Online Learners: Answer the quiz posted on Google Classroom
Answer the following in 5-7 sentences. Be guided by the rubric below: (20 points)
1. Suppose you were conducting a study to investigate the effects of an exercise program
on a participant’s weight. The participants will be coming in weekly to be weighed.
Describe how reliability might be a factor in this study.
2. You collect information on a random sample of women and obtain estimates on vitamin D
exposure. However, the results that you find are different from those obtained at another
study site. Should you be concerned? Explain your answer using the concept of sampling
variability.
References:
Beligan, S. (2016). Statistics for the Social Sciences. La Trinidad: Benguet State University. ISBN
971-0330-05-5
Bush, H., Macera, C., Shaffer, R., and Shaffer, P. (2020). Biostatistics and Epidemiology.
Taguig: Cengage Learning Asia Pte Ltd.
I. BACKGROUND
Quartiles divide the set of observations into four equal parts. They are named as quartile 1
(Q1), quartile 2 (Q2), and quartile 3 (Q3). Q2 is the median and Q1 and Q3 are lower and upper
quartiles.
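If N is even, where \(x_{(k)}\) denotes the k-th observation in the ordered data:
\[ Q_i = \frac{1}{2}\left[x_{\left(\frac{iN}{4}\right)} + x_{\left(\frac{iN}{4} + 1\right)}\right] \]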
Deciles divide the set of observations into 10 equal parts. In a set, there are 9 deciles.
Percentiles divide the set of observations into 100 equal parts, and there are 99 percentiles.
Formula for ungrouped data
If N is odd:
\[ P_i = \frac{i(N+1)}{100} \]
To better understand this, watch the video by clicking on the link below:
https://www.youtube.com/watch?v=40o82o3uNfk
https://www.youtube.com/watch?v=uYIl2M9YwHE
https://www.youtube.com/watch?v=XiJV6Lm1En0
II. OBJECTIVES
At the end of the lesson you are expected to:
1. Accurately solve for the percentile, decile and quartile of grouped and ungrouped data
2. Understand the use of percentile, decile, and quartile in biostatistics
III. MATERIALS
Coupon bond
Calculator
Writing materials
IV. ACTIVITY
Use a separate sheet of bond paper to write the solution of the following data.
1. Find the quartile, 3rd and 7th decile, and 50th percentile of the ungrouped data below.
Height (inches) 58 59 60 61 62 63 64 65 66
# of students 15 20 32 35 33 22 20 10 8
2. Find the quartile, 4th and 6th decile, and 30th percentile of the grouped data below.
Marks Frequency Cumulative frequency
0-10 3
11-21 4
22-32 6
33-43 8
44-55 4
V. QUESTIONS/SUPPLEMENTAL ACTIVITIES/EXERCISES
1. Why is it necessary to solve for percentile, decile, and quartile in a given set of data?
Introduction:
A measure of central tendency (position or location) is a single value about which the set of
observations tend to cluster. The measures of position provide precise, objectively determined
values that can easily be manipulated, interpreted, and compared with one another. In short, they
permit a more careful analysis of the data than do the general impressions conveyed by tabular
and graphical summaries. Some popular and commonly used measures of position are the mean,
median, and mode.
Lesson Proper
MEASURES OF LOCATION
Mean
One measure of location for a sample is the arithmetic mean (colloquially called the average).
The arithmetic mean (or mean or sample mean) is usually denoted by 𝑥̅ . The arithmetic mean is
the sum of all the observations divided by the number of observations. It is written in statistical
terms as:
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
The arithmetic mean is, in general, a very natural measure of location. One of its main
limitations, however, is that it is oversensitive to extreme values. In this instance, it may not be
representative of the location of the great majority of sample points.
Median
An alternative measure of location, perhaps second in popularity to the arithmetic mean, is
the median or, more precisely, the sample median. Suppose there are n observations in a sample.
If these observations are ordered from smallest to largest, then the median is defined as follows:
• The \(\left(\frac{n+1}{2}\right)\)th largest observation if n is odd
• The average of the \(\left(\frac{n}{2}\right)\)th and \(\left(\frac{n}{2} + 1\right)\)th largest observations if n is even
The rationale for these definitions is to ensure an equal number of sample points on both sides
of the sample median. The median is defined differently when n is even and odd because it is
impossible to achieve this goal with one uniform definition. Samples with an odd sample size have
a unique central point; for example, for samples of size 7, the fourth largest point is the central
point in the sense that 3 points are smaller than it and 3 points are larger. Samples with an even
sample size have no unique central point, and the middle two values must be averaged. Thus, for
samples of size 8 the fourth and fifth largest points would be averaged to obtain the median,
because neither is the central point. The main strength of the sample median is that it is insensitive
to very large or very small values. The main weakness of the sample median is that it is
determined mainly by the middle points in a sample and is less sensitive to the actual numeric
values of the remaining data points.
Mode
The mode is the most frequently occurring value among all the observations in a sample. In
certain cases, mode can be an extremely helpful measure of central tendency. One of its biggest
advantages is that it can be applied to any type of data. It is also not affected by extreme values
in datasets with quantitative data. Thus, it can provide the insights into almost any dataset despite
the data distribution. The measure of mode cannot be further treated mathematically and cannot
be used for more detailed analysis. It is also not based on all values in the dataset, therefore, it is
difficult to draw conclusions regarding the dataset relying on mode only.
Example 1:
Determine the mean, median, and mode of the following data:
b) To determine the median, arrange the values in ascending order, then determine the
middle value
2069, 2581, 2759, 2834, 2838, 2841, 3031, 3101, 3200, 3245, 3248, 3260, 3265, 3314,
3323, 3484, 3541, 3609, 3649, 4146
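\[ \tilde{x} = \frac{1}{2}\left[x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2} + 1\right)}\right] = \frac{1}{2}(3245 + 3248) = 3246.5 \]
where \(x_{(k)}\) is the k-th value in the ordered list.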
c) There is no mode in the given sample since there is no repeated term.
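The three measures for this sample can be cross-checked with Python's standard `statistics` module. This is a sketch for verification only; the variable names are illustrative, and the "no mode" check is a simple convention assumed here (a sample in which no value repeats is reported as having no mode).

```python
from statistics import mean, median, multimode

birthweights = [2069, 2581, 2759, 2834, 2838, 2841, 3031, 3101, 3200, 3245,
                3248, 3260, 3265, 3314, 3323, 3484, 3541, 3609, 3649, 4146]

sample_mean = mean(birthweights)
sample_median = median(birthweights)  # averages the 10th and 11th values (n = 20)
modes = multimode(birthweights)       # every value tied for most frequent

# When no value repeats, every value is "tied", so there is no real mode.
has_mode = len(modes) < len(set(birthweights))

print("mean:", sample_mean)
print("median:", sample_median)
print("mode:", modes if has_mode else "none")
```

The median agrees with the hand computation in part b), and the mean of 3166.9 is the value the deviations in Example 3 are measured from.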
MEASURES OF SPREAD
Quantiles
Another approach that addresses some of the shortcomings of the range in quantifying the
spread in a data set is the use of quantiles or percentiles. Intuitively, the pth percentile is the value
Vp such that p percent of the sample points are less than or equal to Vp. The median, being the
50th percentile, is a special case of a quantile. As was the case for the median, a different
definition is needed for the pth percentile, depending on whether or not np/100 is an integer.
Example 2:
Solve for the range, 10th and 90th percentile of example 1.
a) Range
𝑅 = 𝐻𝑉 − 𝐿𝑉 = 4146 − 2069 = 2077
b) 10th percentile
Multiply n by the decimal equivalent of the 10th percentile
20 x 0.1 = 2
10th percentile = average of 2nd and 3rd values
10th percentile = (2581 + 2759)/2 = 2670 g
c) 90th percentile
20 x 0.9 = 18
90th percentile = (3609 + 3649)/2 = 3629 g
We would estimate that 80% of birthweights will fall between 2670 g and 3629 g, which gives
an overall impression of the spread of the distribution.
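The percentile rule used in Example 2 can be written as a small function. The whole-number branch follows the example directly. The other branch (take the next value up when n × p/100 is not a whole number) is a commonly paired convention and is an assumption here, since the example does not cover that case.

```python
import math

def percentile(values, p):
    """p-th percentile (0 < p < 100) of a sample.

    If n*p/100 is a whole number k, average the k-th and (k+1)-th values of
    the ordered data. Otherwise take the next value up (assumed convention).
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    position = len(ordered) * p / 100
    if position == int(position):
        k = int(position)
        return (ordered[k - 1] + ordered[k]) / 2   # k-th and (k+1)-th, 1-based
    return ordered[math.floor(position)]           # (floor + 1)-th value, 1-based

birthweights = [2069, 2581, 2759, 2834, 2838, 2841, 3031, 3101, 3200, 3245,
                3248, 3260, 3265, 3314, 3323, 3484, 3541, 3609, 3649, 4146]

data_range = max(birthweights) - min(birthweights)
print(data_range, percentile(birthweights, 10), percentile(birthweights, 90))
```

With p = 50 the same function returns the median found in Example 1, which is the special case mentioned above.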
Variance
The variance combines all the values in a data set to produce a measure of spread. It tells how
spread out the data are. It is defined as:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
Standard Deviation
The standard deviation measures spread around the mean. Because of its close links with the
mean, standard deviation can be greatly affected if the mean gives a poor measure of central
tendency.
Example 3: Solve for the variance and the standard deviation of Example 1
It is easier to solve if the values are placed in a table:
𝑥𝑖 𝑥𝑖 − 𝑥̅ (𝑥𝑖 − 𝑥̅ )2
2069 -1097.9 1205384
2581 -585.9 343278.8
2759 -407.9 166382.4
2834 -332.9 110822.4
2838 -328.9 108175.2
2841 -325.9 106210.8
3031 -135.9 18468.81
3101 -65.9 4342.81
3200 33.1 1095.61
3245 78.1 6099.61
3248 81.1 6577.21
3260 93.1 8667.61
3265 98.1 9623.61
3314 147.1 21638.41
3323 156.1 24367.21
3484 317.1 100552.4
3541 374.1 139950.8
3609 442.1 195452.4
3649 482.1 232420.4
4146 979.1 958636.8
Total 3768148
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{3768148}{20 - 1} = 198323.58 \]
\[ s = \sqrt{s^2} = \sqrt{198323.58} = 445.34 \]
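The table computation can be reproduced step by step in Python and compared with the library functions. This is a verification sketch with illustrative variable names. Its variance may differ from the hand calculation in the second decimal place, because the table rounds the total of the squared deviations before dividing.

```python
from statistics import mean, stdev, variance

birthweights = [2069, 2581, 2759, 2834, 2838, 2841, 3031, 3101, 3200, 3245,
                3248, 3260, 3265, 3314, 3323, 3484, 3541, 3609, 3649, 4146]

n = len(birthweights)
x_bar = mean(birthweights)

# Column 3 of the table: squared deviation of each value from the mean.
squared_deviations = [(x - x_bar) ** 2 for x in birthweights]

s_squared = sum(squared_deviations) / (n - 1)   # sample variance
s = s_squared ** 0.5                            # sample standard deviation

print(round(sum(squared_deviations), 1), round(s_squared, 2), round(s, 2))

# Cross-check against the standard library (also uses n - 1).
print(round(variance(birthweights), 2), round(stdev(birthweights), 2))
```

Both routes divide by n - 1, so they describe the sample variance rather than the population variance.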
Synthesis:
The mean reflects the magnitude of every observation, since every observation contributes to
the value of the mean. It is easily affected by the presence of extreme values, and hence not a
good measure of central tendency when extreme observations occur. Means of subgroups may
be combined when properly weighted. The combined mean is called the weighted arithmetic mean.
The median is a positional value and is not affected by the presence of extreme values. It is
not suitable to further computations and hence medians of subgroups cannot be combined in the
same manner as the mean.
The mode is determined by the frequency and not by the values of the observations. It cannot be
manipulated algebraically, but it can be defined for qualitative or quantitative variables.
The range is a quick but rough measure of dispersion. The larger the value of the range, the
more dispersed are the observations. It considers only the lowest and highest values in the
population.
The variance is always non-negative. It is easy to manipulate for further mathematical
treatment. It makes use of all observations.
The standard deviation is always non-negative. It is easy to manipulate for further
mathematical treatment and makes use of all observations.
The coefficient of variation is a quantity without units. It can be used to compare the dispersion
of two or more sets of data measured in the same or different units.
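As a brief illustration of comparing dispersion across units, the sketch below uses the usual definition of the coefficient of variation: the standard deviation divided by the mean, expressed as a percentage. That definition is stated here as an assumption, since the module does not give the formula, and the height and weight values are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean (unit-free)."""
    return stdev(values) / mean(values) * 100

# Hypothetical measurements on the same five subjects, in different units.
heights_cm = [150, 160, 165, 170, 180]
weights_kg = [45, 60, 62, 70, 88]

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)
print(f"height CV = {cv_height:.1f}%, weight CV = {cv_weight:.1f}%")
```

Because the units cancel, the two percentages can be compared directly even though one data set is in centimeters and the other in kilograms.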
Assessment:
Offline Learners: Answer the activity below and submit it via email to your instructor.
References:
Beligan, S. (2016). Statistics for the Social Sciences. La Trinidad: Benguet State University. ISBN
971-0330-05-5
Bush, H., Macera, C., Shaffer, R., and Shaffer, P. (2020). Biostatistics and Epidemiology.
Taguig: Cengage Learning Asia Pte Ltd.
I. BACKGROUND
The arithmetic mean or average of a set of n measurements is equal to the sum of the
measurements divided by n. The median of a set of n measurements is the value of x that falls in
the middle position when the measurements are ordered from smallest to largest. The mode is
the category that occurs most frequently, or the most frequently occurring value of x. When the
measurements on a continuous variable have been grouped as frequency or relative frequency
histogram, the class with the highest peak or frequency is called the modal class, and the midpoint
of that class is taken to be the mode.
Measures of variability can help you create a mental picture of the spread of the data. The
range of a set of n measurements is defined as the difference between the largest and smallest
measurements. The variance of a population of N measurements is the average of the squared
deviations of the measurements about their mean. The variance of a sample of n
measurements is the sum of the squared deviations of the measurements about their mean
divided by (n – 1). The standard deviation of a set of measurements is equal to the positive
square root of the variance.
II. OBJECTIVES
At the end of the lesson you are expected to:
1. Apply the formulas for the mean, median, and mode in the medical field
2. Accurately solve for the mean, median, and mode of ungrouped and grouped data
3. Apply the different measures of variability to the medical field
4. Accurately solve for the range, variance, and standard deviation of a given data set
III. MATERIALS
Coupon bond
Writing materials
Calculator
IV. ACTIVITY
1. Given the data on the age of the population affected by SARS in a city, find the mean,
median, and mode: 30, 25, 7, 40, 15, 36, 27, 35, 48, 10, 20, 28, 33, 45, 10
2. You are given n = 8 measurements: 3, 1, 5, 6, 4, 4, 3, 5. Calculate the range, sample
mean, sample variance, and standard deviation.
3. An article in Archaeometry involved an analysis of 26 samples of Romano-British pottery
found at four different kiln sites in the United Kingdom. The samples were analyzed to
determine their chemical composition. The percentage of iron oxide in each of five
samples collected at the Island Thorns site was: 1.28, 2.39, 1.50, 1.88, 1.51. Calculate
the range, sample variance, and the standard deviation. Compare the range and the
standard deviation.
V. QUESTIONS/SUPPLEMENTAL ACTIVITIES/EXERCISES
1. Why is it necessary to solve for the mean, median, and mode?
2. What is the practical significance of the range, variance, and standard deviation?
3. A report on a study by an MIT researcher indicates that later born children are more likely
to challenge the establishment, more open to new ideas, and more accepting of change.
In fact, the number of later born children is increasing. During the Depression years of the
1930s, families averaged 2.5 children (59% later born), whereas the parents of baby
boomers averaged 3 to 4 children (68% later born). What does the author mean by an
average of 2.5 children?
4. In a psychological experiment, the time on task was recorded for 10 subjects under a
5-minute time constraint. These measurements are in seconds: 175, 200, 190, 185, 250,
190, 230, 225, 240, 265. Find the average time and the median time on task. If you were
writing a report to describe these data, which measure of central tendency would you use?
Explain.
Introduction:
Most of the time we do not know the variance of a population, so to describe the variability of
the data we compute its estimate, the sample variance. When we divide the difference between the
sample mean and the population mean by the standard error of the sample mean, the resulting
quantity follows the t-distribution. The t-test is another tool used for testing a population
mean when the variance is unknown and/or the sample size is small (n < 30).
Lesson Proper:
The One Sample t Test is commonly used to test the following:
• Statistical difference between a sample mean and a known or hypothesized value of the
mean in the population.
• Statistical difference between the sample mean and the sample midpoint of the test
variable.
• Statistical difference between the sample mean of the test variable and chance.
• This approach involves first calculating the chance level on the test variable. The
chance level is then used as the test value against which the sample mean of the
test variable is compared.
• Statistical difference between a change score and zero.
• This approach involves creating a change score from two variables, and then
comparing the mean change score to zero, which will indicate whether any
change occurred between the two time points for the original measures. If the
mean change score is not significantly different from zero, no significant change
occurred.
Note: The One Sample t Test can only compare a single sample mean to a specified constant. It
cannot compare sample means between two or more groups. If you wish to compare the means
of multiple groups to each other, you will likely want to run an Independent Samples t Test (to
compare the means of two groups) or a One-Way ANOVA (to compare the means of two or more
groups).
\[
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{115 - 120}{24/\sqrt{10000}} = -20.83
\]
\[
t_{9999,\,0.95} = 1.645
\]
Since \(-20.83 < -1.645\), reject \(H_0\).
Conclusion:
The mean birthweight is lower than the national average
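The arithmetic of this worked example can be checked with a short Python sketch. The function name `one_sample_t` is hypothetical, and the decision rule assumes a lower-tailed alternative (mean below 120), which is what the stated conclusion implies; the values themselves are the ones quoted in the example.

```python
import math

def one_sample_t(sample_mean, mu0, s, n):
    """Return the one-sample t statistic (x-bar - mu0) / (s / sqrt(n))."""
    standard_error = s / math.sqrt(n)
    return (sample_mean - mu0) / standard_error

# Values quoted in the worked example
t_value = one_sample_t(sample_mean=115, mu0=120, s=24, n=10000)
critical = 1.645  # tabulated value quoted in the example

print(round(t_value, 2))  # -20.83

# Assumed lower-tailed rule: reject H0 when t falls below -critical
decision = "reject H0" if t_value < -critical else "do not reject H0"
print(decision)
```

Because the standard error is 24/100 = 0.24, even a 5-unit difference gives a t-value far into the rejection region.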
Synthesis:
The t-test is used to test a hypothesis involving the mean of a study variable. The results tell us
whether there is a significant difference between the sample mean and the hypothesized mean.
Constructing the null and alternative hypotheses is important because they are the basis for the
mathematical test. For a two-tailed test, if the absolute value of the computed t-value is greater
than the tabulated t-value, then the null hypothesis is rejected. The one-sample t-test is usually
recommended for sample sizes less than 30; however, there is also a t-test that can be used for
samples greater than 30.
To better understand this topic, watch the video by clicking on the link below:
https://www.youtube.com/watch?v=pTmLQvMM-1M
Assessment:
Offline Students: Do the activity below and pass it to your instructor via email.
Online Students: Answer the quiz posted in Google Classroom.
Use the t-test to test the hypothesis in each of the following problems. (20 points)
1. The mean serum-creatinine level measured in 12 patients 24 hours after they received a
newly proposed antibiotic was 1.2 mg/dL. If the mean and standard deviation of serum
creatinine in the general population are 1.0 and 0.4 mg/dL, respectively, then, using a
significance level of .05, test whether the mean serum-creatinine level in this group is
different from that of the general population.
2. Plasma-glucose levels are used to determine the presence of diabetes. Suppose the
mean ln (plasma-glucose) concentration (mg/dL) in 35- to 44-year-olds is 4.86 with
standard deviation = 0.54. A study of 100 sedentary people in this age group is planned
to test whether they have a higher or lower level of plasma glucose than the general
population.
References:
Beligan, S. (2016). Statistics for the Social Sciences. La Trinidad: Benguet State University. ISBN
971-0330-05-5
Bush, H., Macera, C., Shaffer, R., and Shaffer, P. (2020). Biostatistics and Epidemiology.
Taguig: Cengage Learning Asia Pte Ltd.
I. BACKGROUND
T-test for two population means
If the sample sizes are small (n < 30), the samples were taken at random from two populations,
and the level of measurement used is at least an interval scale, then the appropriate
test statistic is the t-test.
\[
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
\]
Where:
\(\bar{x}_1\) – sample mean for the first group
\(\bar{x}_2\) – sample mean for the second group
\[
s_p^2 = \frac{\sum (x_i - \bar{x}_1)^2 + \sum (x_j - \bar{x}_2)^2}{n_1 + n_2 - 2}
\]
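The pooled-variance formula above can be sketched in Python as follows. The function name `pooled_t` and the two small data sets are hypothetical, chosen only so the arithmetic is easy to follow by hand; they are not taken from the activity.

```python
import math

def pooled_t(group1, group2):
    """Return (t, degrees of freedom) for the pooled two-sample t-test."""
    n1, n2 = len(group1), len(group2)
    mean1 = sum(group1) / n1
    mean2 = sum(group2) / n2
    # Sums of squared deviations from each group's own mean
    ss1 = sum((x - mean1) ** 2 for x in group1)
    ss2 = sum((x - mean2) ** 2 for x in group2)
    # Pooled variance s_p^2
    sp2 = (ss1 + ss2) / (n1 + n2 - 2)
    t = (mean1 - mean2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical data for illustration only
group_a = [5, 7, 9]
group_b = [1, 3, 5]
t_value, df = pooled_t(group_a, group_b)
print(round(t_value, 3), df)  # 2.449 4
```

Here the means are 7 and 3, the pooled variance is 16/4 = 4, and t = 4/√(4 × 2/3) = √6. The computed t-value is then compared with the tabulated value at n1 + n2 − 2 degrees of freedom.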
II. OBJECTIVES
At the end of the lesson you are expected to:
1. Accurately solve for the t-value and compare it with the tabulated value
2. State the null and alternative hypothesis given a problem in biostatistics
3. Interpret the result of the t-value obtained
III. MATERIALS
Coupon bond
Calculator
Writing materials
IV. ACTIVITY
Formulate the null and alternative hypothesis, compute the t-value, and give a proper
interpretation of the result.
1. Sleep researchers decide to test the impact of REM sleep deprivation on a computerized
assembly line task. Subjects are required to participate in two nights of testing. On the nights
of testing EEG, EMG, and EOG measures are taken. On each night of testing, the subject is
allowed a total of four hours of sleep. However, on one of the nights, the subject is awakened
immediately upon achieving REM sleep. On the alternate night, subjects are randomly
awakened at various times throughout the 4 hour total sleep session. Testing conditions are
counterbalanced so that half of the subjects experience REM deprivation on the second night
of testing. Each subject after the sleep session is required to complete a computerized
assembly line task. The task involves five rows of widgets slowly passing across the computer
screen. Randomly placed on a one/five ratio are widgets missing a component that must be
“fixed” by the subject. Number of missed widgets is recorded. Compute the appropriate t-test
for the data provided below:
2. Researchers want to examine the effect of perceived control on health complaints of geriatric
patients in a long term care facility. Thirty patients are randomly selected to participate in the
study. Half are given a plant to care for and half are given a plant but the care is conducted
by the staff. The number of health complaints is recorded for each patient over the following
seven days.
Control over plant     No control over plant
23                     35
12                     21
6                      26
13                     24
18                     17
5                      23
21                     37
18                     22
34                     16
10                     38
23                     23
14                     41
19                     27
23                     24
8                      32
V. QUESTIONS/SUPPLEMENTAL ACTIVITIES/EXERCISES
1. Use one sample t-test to solve for the following:
a. The weights of 11 one month old breastfed babies are found to be 4.64, 4.41, 4.60, 4.50,
3.50, 4.01, 3.99, 4.55, 4.62, 4.80, and 4.00 kg. Based on the standard weights, one
month old babies should weigh 4 kg. Does this indicate that breastfeeding is best for
babies? Use α = 0.05.
b. The reaction time of a coagulant is recorded as follows: 30.26s, 31.4s, 35.6s, 34.8s,
35.26s. The standard reaction time is 32.67 s. Is this sufficient evidence that the
coagulant is effective?
Introduction:
In many studies, the concern is to determine the cause and effect relationship of two variables
taken from a bivariate distribution. One might be interested in determining the best statistical
relation among the variables or simply just to know the degree of relationship among variables.
Problems such as these can be solved using regression techniques.
Lesson Proper:
Linear regression is a basic and commonly used type of predictive analysis. The overall idea
of regression is to examine two things: (1) Does a set of predictor variables do a good job of
predicting an outcome (dependent) variable? (2) Which variables in particular are significant
predictors of the outcome variable, and in what way do they impact the outcome variable, as
indicated by the magnitude and sign of the beta estimates? These regression estimates are used to
explain the relationship between one dependent variable and one or more independent variables.
The simplest form of the regression equation with one dependent and one independent variable
is defined by the formula y = c + b*x, where y = estimated dependent variable score, c = constant,
b = regression coefficient, and x = score on the independent variable.
There are many names for a regression’s dependent variable. It may be called an outcome
variable, criterion variable, endogenous variable, or regressand. The independent variables can
be called exogenous variables, predictor variables, or regressors.
Three major uses for regression analysis are (1) determining the strength of predictors, (2)
forecasting an effect, and (3) trend forecasting.
First, the regression might be used to identify the strength of the effect that the independent
variable(s) have on a dependent variable. Typical questions are what is the strength of
relationship between dose and effect, sales and marketing spending, or age and income.
Second, it can be used to forecast effects or impact of changes. That is, the regression
analysis helps us to understand how much the dependent variable changes with a change in one
or more independent variables.
Synthesis:
A line of best fit is useful for representing data with the equation of a straight line in
order to predict values that may not be displayed on the plot. The line of best fit is determined by
the relationship between the two variables on a scatter plot. When there are a few outliers
(data points that are located far away from the rest of the data), the line shifts toward them,
so outliers can noticeably change the fitted line.
To understand this lesson better, the video links below may help:
https://www.youtube.com/watch?v=WWqE7YHR4Jc
https://www.youtube.com/watch?v=ZkjP5RJLQF4
Assessment:
Offline Learners: Answer the activity below and send it to your instructor via email.
Online Learners: Answer the activity posted in Google Classroom.
The data in Table 11.17 are given for 9 patients with aplastic anemia. Fit a regression line relating
the percentage of reticulocytes (x) to the number of lymphocytes (y). (30 points)
References:
Bush, H., Macera, C., Shaffer, R., and Shaffer, P. (2020). Biostatistics and Epidemiology.
Taguig: Cengage Learning Asia Pte Ltd.
Frost, J. (2019). Choosing the Correct Type of Regression Analysis. Retrieved from:
http://www.statisticsbyjim.com/regression/choosing-regression-analysis/
I. BACKGROUND
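The linear regression line can predict the value of y when a new x value is substituted into the
equation. The equation for a line is \(y = mx + b\), where the slope \(m\) and the intercept \(b\) are:
\[
m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}
\]
\[
b = \bar{y} - m\bar{x}
\]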
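As an illustration, the least-squares line can be computed with a short Python sketch. It takes m as the slope and b as the intercept of y = mx + b, using the standard least-squares formulas (an assumption about the intended notation). The function name `fit_line` and the data are hypothetical; the points were chosen to lie exactly on y = 2x + 1 so the result is easy to verify.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: mean of y minus slope times mean of x
    intercept = sum_y / n - slope * (sum_x / n)
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)

# Predict y for a new x value by substituting it into the fitted equation
predicted = slope * 5 + intercept
print(slope, intercept, predicted)  # 2.0 1.0 11.0
```

With real data the points will not fall exactly on the line, so plotting the scatter graph together with the fitted line helps show how well the line describes the data.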
II. OBJECTIVES
At the end of the lesson you are expected to:
1. Determine the regression line that fits the given data
2. Create a scatter plot based on the data given
III. MATERIALS
Calculator
Writing materials
Coupon bond
IV. ACTIVITY
1. The table below shows the height, x in inches and the pulse rate y per minute for 9
people.
a. Determine the linear regression line that would fit the data.
b. Create a scatter graph using the given data and the calculated linear regression
line.
x 68 72 65 70 62 75 78 64 68
y 90 85 88 100 105 98 70 65 72
V. QUESTIONS/SUPPLEMENTAL ACTIVITIES/EXERCISES
1. The table below shows the lengths and corresponding ideal weights of sand sharks.
Determine the equation of the best fit line to predict the weight of a sand shark
whose length is 75 inches.
Length 60 62 64 66 68 70 72
Weight 105 114 124 131 139 149 158
2. The data below shows the height and shoe sizes of six randomly selected men. If a
man has a shoe size of 10.5, what would be his predicted height?
Height 67 70 73.5 75 78 66
Shoe Size 8.5 9.5 11 12 13 8
Introduction:
Correlation is a statistical technique that can show whether and how strongly pairs of variables
are related. For example, height and weight are related; taller people tend to be heavier than
shorter people. The relationship isn't perfect. People of the same height vary in weight, and you
can easily think of two people you know where the shorter one is heavier than the taller one.
Nonetheless, the average weight of people 5'5'' is less than the average weight of people 5'6'',
and their average weight is less than that of people 5'7'', etc. Correlation can tell you just how
much of the variation in people's weights is related to their heights.
Although this correlation is fairly obvious, your data may contain unsuspected correlations. You
may also suspect there are correlations, but don't know which are the strongest. An intelligent
correlation analysis can lead to a greater understanding of your data.
Lesson Proper:
The correlation coefficient is needed to obtain a measure of relatedness independent of the
units of X and Y. The correlation coefficient is a dimensionless quantity that is independent of the
units of X and Y and ranges between −1 and 1. For random variables that are approximately
linearly related, a correlation coefficient of 0 implies independence. A correlation coefficient close
to 1 implies nearly perfect positive dependence with large values of X corresponding to large
values of Y and small values of X corresponding to small values of Y. An example of a strong
positive correlation is between forced expiratory volume (FEV), a measure of pulmonary function,
and height (Figure a). A somewhat weaker positive correlation exists between serum cholesterol
and dietary intake of cholesterol (Figure b). A correlation coefficient close to −1 implies nearly perfect
negative dependence, with large values of X corresponding to small values of Y and vice versa,
as is evidenced by the relationship between resting pulse rate and age in children under the age
of 10 (Figure c). A somewhat weaker negative correlation exists between FEV and number of
cigarettes smoked per day in children (Figure d).
Thus the sample correlation coefficient provides a quantitative estimate of the dependence
between two variables: the closer |r| is to 1, the more closely related the variables are; if |r| = 1,
then one variable can be predicted exactly from the other.
• POSITIVE CORRELATION – exists when high scores in one variable are associated
with high scores in the second variable or low scores in one variable are associated with
low scores in the other
• NEGATIVE CORRELATION – exists when high scores in one variable are associated
with low scores in the second or vice versa.
• ZERO CORRELATION– exists when the points on the scatter diagram are spread in a
random manner.
• PERFECT CORRELATION– all points lie on a straight line
The strength or degree of the relationship is based on the following ranges of the correlation
coefficient:
x      y      x²     y²      xy
7      25     49     625     175
9      25     81     625     225
9      25     81     625     225
12     27     144    729     324
14     27     196    729     378
16     27     256    729     432
16     24     256    576     384
14     30     196    900     420
16     30     256    900     480
16     31     256    961     496
17     30     289    900     510
19     31     361    961     589
21     30     441    900     630
24     28     576    784     672
15     32     225    1024    480
16     32     256    1024    512
Σx = 241   Σy = 454   Σx² = 3919   Σy² = 12992   Σxy = 6932
\[
r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}
= \frac{16(6932) - (241)(454)}{\sqrt{\left(16(3919) - 241^2\right)\left(16(12992) - 454^2\right)}} = 0.53
\]
From the computed r value, there is moderate or substantial correlation between estriol and
birthweight.
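The computation above can be reproduced with a short Python sketch. The x and y lists are the 16 pairs from the table; the variable names are only illustrative.

```python
import math

# The 16 (x, y) pairs from the table above
x = [7, 9, 9, 12, 14, 16, 16, 14, 16, 16, 17, 19, 21, 24, 15, 16]
y = [25, 25, 25, 27, 27, 27, 24, 30, 30, 31, 30, 31, 30, 28, 32, 32]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Pearson correlation coefficient from the computational formula
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # 241 454 3919 12992 6932
print(round(r, 2))  # 0.53
```

The column totals match the last row of the table, and the numerator works out to 110912 − 109414 = 1498.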
Synthesis:
A key thing to remember when working with correlations is never to assume a correlation
means that a change in one variable causes a change in another. Sales of personal computers
and athletic shoes have both risen strongly over the years and there is a high correlation between
them, but you cannot assume that buying computers causes people to buy athletic shoes (or vice
versa).
The Pearson correlation technique works best with linear relationships: as one variable gets
larger, the other gets larger (or smaller) in direct proportion. It does not work well with curvilinear
relationships (in which the relationship does not follow a straight line). An example of a curvilinear
relationship is age and health care. They are related, but the relationship doesn't follow a straight
line. Young children and older people both tend to use much more health care than teenagers or
young adults. Multiple regression (also included in the Statistics Module) can be used to examine
curvilinear relationships.
Watch the video link below to help you understand the topic better:
https://www.youtube.com/watch?v=4EXNedimDMs
Assessment:
Offline Learners: Answer the activity below and pass it to your instructor via email.
Online Learners: Answer the activity posted in Google Classroom
Compute the correlation between 5-year lung cancer mortality and annual cigarette
consumption when each is expressed in the log10 scale. (30 points)
References:
Beligan, S. (2016). Statistics for the Social Sciences. La Trinidad: Benguet State University. ISBN
971-0330-05-5
Bush, H., Macera, C., Shaffer, R., and Shaffer, P. (2020). Biostatistics and Epidemiology.
Taguig: Cengage Learning Asia Pte Ltd.
I. BACKGROUND
Correlation seeks to find the relationship between two variables. It can be solved using the
formula below:
\[
r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}
\]
The computed r value will be a number between −1 and 1. If r is close to 1, the variables
are positively correlated. If r is close to −1, the variables are negatively correlated. If r is
close to 0, the variables are not linearly correlated.
II. OBJECTIVES:
At the end of the lesson you are expected to:
1. Accurately solve for the correlation coefficient.
2. Give an appropriate interpretation for the computed r value
III. MATERIALS:
Coupon Bond
Calculator
Writing Materials
IV. ACTIVITY:
1. The World Bank collects information on the life expectancy of a person in each
country ("Life expectancy at," 2013) and the fertility rate per woman in the country
("Fertility rate," 2013). The data for 24 randomly selected countries for the year 2011
are listed below. Based on the correlation coefficient, is there a correlation between
life expectancy and fertility?
Introduction:
The Chi-Square statistic is commonly used for testing relationships between categorical
variables. The null hypothesis of the Chi-Square test is that no relationship exists between the
categorical variables in the population; they are independent. An example research question that
could be answered using a Chi-Square analysis would be:
Is there a significant relationship between voter intent and political party membership?
Lesson Proper:
How does the Chi-Square statistic work?
The Chi-Square statistic is most commonly used to evaluate Tests of Independence when
using a crosstabulation (also known as a bivariate table). Crosstabulation presents the
distributions of two categorical variables simultaneously, with the intersections of the categories
of the variables appearing in the cells of the table. The Test of Independence assesses whether
an association exists between the two variables by comparing the observed pattern of responses
in the cells to the pattern that would be expected if the variables were truly independent of each
other. Calculating the Chi-Square statistic and comparing it against a critical value from the Chi-
Square distribution allows the researcher to assess whether the observed cell counts are
significantly different from the expected cell counts.
The calculation of the Chi-Square statistic is quite straight-forward and intuitive:
\[
\chi_c^2 = \sum \frac{(O - E)^2}{E}
\]
Where O is the observed frequency (the observed counts in the cells) and E is the expected
frequency if NO relationship existed between the variables. As depicted in the formula, the Chi-
Square statistic is based on the difference between what is actually observed in the data and what
would be expected if there was truly no relationship between the variables.
Steps in solving \(\chi^2\)
1. State the null and alternative hypotheses.
2. Determine the critical value of \(\chi^2\) based on the table.
For one tailed: \(\chi^2_{\alpha,(v-1)}\)
For two tailed: \(\chi^2_{\alpha/2,(v-1)}\)
3. Compute
\[
\chi_c^2 = \sum \frac{(O - E)^2}{E}
\]
4. Decision: Reject Ho if \(\chi_c^2 > \chi_\alpha^2\)
5. State the conclusion.
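The steps above can be sketched in Python. The function name `chi_square` and the observed and expected counts are hypothetical (four categories with equal expected frequencies); the critical value 7.815 is the tabulated chi-square value for 3 degrees of freedom at α = 0.05.

```python
def chi_square(observed, expected):
    """Return the chi-square statistic, the sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for four categories (100 observations in total)
observed = [18, 22, 20, 40]
expected = [25, 25, 25, 25]

chi2 = chi_square(observed, expected)
critical = 7.815  # tabulated value for v - 1 = 3 degrees of freedom, alpha = 0.05

print(round(chi2, 2))  # 12.32
# Decision rule from step 4: reject Ho when the computed value exceeds the tabulated value
decision = "reject Ho" if chi2 > critical else "do not reject Ho"
print(decision)
```

In this illustration the squared differences are 49, 9, 25, and 225, so the statistic is 308/25 = 12.32, which exceeds 7.815.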
Synthesis:
A chi-square (χ2) statistic is a test that measures how expectations compare to actual observed
data (or model results). The data used in calculating a chi-square statistic must be random,
raw, mutually exclusive, drawn from independent variables, and drawn from a large enough
sample. Chi-square tests are often used in hypothesis testing. The null hypothesis is rejected if
the computed value is greater than the tabulated value.
To help you understand this lesson, you can watch the video links below:
Assessment:
Offline Learners: Do the activity below and submit it to your instructor’s email.
Online Learners: Answer the activity posted in Google Classroom.
The National Center for Health Statistics (NCHS) provided data on the distribution of weight (in
categories) among Americans in 2002. The distribution was based on specific values of body
mass index (BMI) computed as weight in kilograms over height in meters squared. Underweight
was defined as BMI< 18.5, Normal weight as BMI between 18.5 and 24.9, overweight as BMI
between 25 and 29.9 and obese as BMI of 30 or greater. Americans in 2002 were distributed as
follows: 2% Underweight, 39% Normal Weight, 36% Overweight, and 23% Obese. Suppose we
want to assess whether the distribution of BMI is different in the Framingham Offspring sample.
Using data from the n=3,326 participants who attended the seventh examination of the Offspring
in the Framingham Heart Study we created the BMI categories as defined and observed the
following:
                           Underweight   Normal           Overweight       Obese
                           BMI < 18.5    BMI 18.5-24.9    BMI 25.0-29.9    BMI ≥ 30
Observed Frequencies (O)   20            932              1374             1000
Expected Frequencies (E)   66.5          1297.1           1197.4           765.0
References:
Beligan, S. (2016). Statistics for the Social Sciences. La Trinidad: Benguet State University.
ISBN 971-0330-05-5
Bush, H., Macera, C., Shaffer, R., and Shaffer, P. (2020). Biostatistics and Epidemiology.
Taguig: Cengage Learning Asia Pte Ltd.
I. BACKGROUND:
You will see three Chi-Square tests: the tests of goodness of fit, independence, and
homogeneity. For all three tests, the data are generally presented in the form of a
contingency table (a rectangular array of numbers in cells). All three tests are based on the
Chi-Square statistic.
Goodness of Fit Test: This test answers the question, “Do the data fit well compared to
a specified distribution?” It considers one categorical response, and assesses whether the
proportion of sampled observations falling into each category matches well to a specified
distribution. The null hypothesis specifies this distribution which describes the population
proportion of observations in each category.
Test of Homogeneity: This test answers the question, “Do two or more populations have
the same distribution for one categorical variable?” It considers one categorical response,
and assesses whether the model for this response is the same in two (or more) populations.
The null hypothesis is that the distribution of the categorical variable is the same for the two
(or more) populations.
Test of Independence: This test answers the question, “Are two factors (or variables)
independent for a population under study?” It considers two categorical variables
(sometimes one is a response and the other is explanatory), and assesses whether there
appears to be a relationship between these two variables for a single population. The null
hypothesis is that the two categorical variables are independent (not related) for the
population of interest.
II. OBJECTIVES:
1. Accurately determine 𝜒 2
2. Give an appropriate interpretation of the test statistic
III. MATERIALS:
Coupon Bond
Calculator
Writing Materials
IV. ACTIVITY:
1. A clinical trial was conducted among children 1–10 years of age with prior symptoms of
otorrhea comparing efficacy of (i) antibiotic eardrops, (ii) oral antibiotics, and (iii)
observation without treatment, referred to below as observation. Children were seen at
home by study physicians at 2 weeks and 6 months after randomization. The primary
outcome was the presence of otorrhea at 2 weeks observed by study physicians. The
results are given in Table 10.22. Do the results agree with the expectation?
2. To test a theory that people have no preference among four different outdoor activities,
you ask 100 people to select among jogging, bicycling, hiking, or swimming.
Answer: Chi-Square test of _____ _ ________
3. A biostatistician would like to determine whether the ratio of blood types kept in storage for
transfusions should be different in Hawaii from that on the mainland. She collected a sample of
blood types of 10,000 people in Hawaii and of 100,000 people on the mainland. She
wishes to see if the breakdown of blood types (A, B, AB, and O) is the same for both
populations.
Answer: Chi-Square test of _____ _ ________
4. A researcher wants to determine if scoring high or low on an artistic ability test depends
on being right or left-handed.
Answer: Chi-Square test of _____ _ ________
5. A preservation society has the percentages of five main types of fish in the river from 10
years ago. After noticing an imbalance recently, they add some fish from hatcheries to the
river. How can they determine if they restored the ecosystem from a new sample of fish?
Answer: Chi-Square test of _____ _ ________
Introduction:
An ANOVA test is a way to find out if survey or experiment results are significant. In other
words, it helps you figure out whether you need to reject the null hypothesis in favor of the
alternative hypothesis.
Basically, you’re testing groups to see if there’s a difference between them. Examples of
when you might want to test different groups:
• A group of psychiatric patients are trying three different therapies: counseling, medication
and biofeedback. You want to see if one therapy is better than the others.
• A manufacturer has two different processes to make light bulbs. They want to know if one
process is better than the other.
• Students from different colleges take the same exam. You want to see if one college
outperforms the other.
Lesson Proper:
Situation 2: Similar to situation 1, but in this case the individuals are split into groups based on an
attribute they possess. For example, you might be studying leg strength of people according to
weight. You could split participants into weight categories (obese, overweight and normal) and
measure their leg strength on a weight machine.
Tabulated F-value
The tables below are needed to obtain the critical F value. If the computed F value is greater
than the tabulated F value, then the null hypothesis is rejected. If the computed F value is less
than the tabulated F value, then the null hypothesis is accepted.
4. \(tdf = n - 1 = 12 - 1 = 11\)
\(trtdf = 2 - 1 = 1\)
\(edf = tdf - trtdf = 11 - 1 = 10\)
5. \(TSS = \sum y_{ij}^2 - CF\)
\[
TSS = (3.78^2 + 3.30^2 + 3.32^2 + 3.23^2 + 2.73^2 + 2.59^2 + 0.79^2 + 0.77^2 + 0.86^2 + 0.78^2 + 0.81^2 + 0.82^2) - 47.12 = 17.57
\]
\[
TrSS = \sum \frac{y_i^2}{n_i} - CF
\]
\[
TrSS = \left[\frac{18.95^2}{6} + \frac{4.83^2}{6}\right] - 47.12 = 16.62
\]
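The sums of squares above can be checked with a short Python sketch. The twelve observations are the ones shown in the TSS computation, split into two groups of six according to the group totals 18.95 and 4.83. The correction factor is rounded to two decimals, as in the worked example. The error sum of squares and the F value at the end follow the standard one-way ANOVA continuation and are an assumption, since that part of the computation is not shown here.

```python
# Observations from the worked example, grouped by treatment
group1 = [3.78, 3.30, 3.32, 3.23, 2.73, 2.59]
group2 = [0.79, 0.77, 0.86, 0.78, 0.81, 0.82]
groups = [group1, group2]

all_values = [v for g in groups for v in g]
n = len(all_values)

# Correction factor (grand total squared over n), rounded as in the example
cf = round(sum(all_values) ** 2 / n, 2)

# Total and treatment sums of squares
tss = sum(v * v for v in all_values) - cf
trss = sum(sum(g) ** 2 / len(g) for g in groups) - cf

# Degrees of freedom
t_df = n - 1
trt_df = len(groups) - 1
e_df = t_df - trt_df

print(cf, round(tss, 2), round(trss, 2))  # 47.12 17.57 16.62

# Assumed continuation: error sum of squares, mean squares, and F
ess = tss - trss
f_value = (trss / trt_df) / (ess / e_df)
print(round(f_value, 1))
```

The computed F value is then compared with the tabulated F value at (trtdf, edf) = (1, 10) degrees of freedom.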
Synthesis:
The ANOVA is used when the research question involves the comparisons of means from more
than two independent groups. It is assumed that:
• The groups are independent
• The variance for each of the groups is the same
• The outcome comes from the normal distribution
In this analysis, the goal is to take the variability of the outcome and divide it into the variability
between the groups and the variability within groups. This statistical tool helps us to answer a
hypothesis by splitting up the sources of variability. Therefore, the ANOVA provides a statistical
test for determining whether there is enough evidence to reject the null hypothesis that all the
means are equal.
To help you understand this lesson, watch the video link below:
https://www.youtube.com/watch?v=oOuu8IBd-yo
Assessment:
Offline Learners: Answer the activity listed below and send it to your instructor via email.
Online Learners: Answer the activity posted in Google Classroom
Twenty-two young asthmatic volunteers were studied to assess the short-term effects of sulfur
dioxide (SO2) exposure under various conditions. The baseline data in Table 12.30 were
presented regarding bronchial reactivity to SO2 stratified by lung function (as defined by forced
expiratory volume / forced vital capacity [FEV1/FVC]) at screening. Test the hypothesis that there
References:
Beligan, S. (2016). Statistics for the Social Sciences. La Trinidad: Benguet State
University. ISBN 971-0330-05-5
Bush, H., Macera, C., Shaffer, R., and Shaffer, P. (2020). Biostatistics and
Epidemiology. Taguig: Cengage Learning Asia Pte Ltd.
LABORATORY 8
ANALYSIS OF VARIANCE
I. BACKGROUND
The responses that are generated in an experimental situation always exhibit a certain
amount of variability. In an analysis of variance, you divide the total variation in the response
measurements into portions that may be attributed to various factors of interest to the
experimenter. If the experiment has been properly designed, these portions can then be
used to answer questions about the effects of the various factors on the response of interest.
II. OBJECTIVE
At the end of the lesson you are expected to:
1. Accurately solve for the F value given the raw data
2. Give an appropriate interpretation of the F value
3. Analyze and interpret a given Excel ANOVA output
III. MATERIALS
Calculator
Writing materials
Coupon bond
IV. ACTIVITY
1. A clinical trial is run to compare weight loss programs and participants are randomly
assigned to one of the comparison programs and are counseled on the details of the
assigned program. Participants follow the assigned program for 8 weeks. The outcome of
interest is weight loss, defined as the difference in weight measured at the start of the study
(baseline) and weight measured at the end of the study (8 weeks), measured in
pounds. Three popular weight loss programs are considered. The first is a low calorie diet.
The second is a low fat diet and the third is a low carbohydrate diet. For comparison
purposes, a fourth group is considered as a control group. Participants in the fourth group
are told that they are participating in a study of healthy behaviors with weight loss only one
component of interest. The control group is included here to assess the placebo effect (i.e.,
weight loss due to simply participating in the study). A total of twenty patients agree to
participate in the study and are randomly assigned to one of the four diet groups. Weights
are measured at baseline and patients are counseled on the proper implementation of the
assigned diet (with the exception of the control group). After 8 weeks, each patient's weight
is again measured and the difference in weights is computed by subtracting the 8 week
weight from the baseline weight. Positive differences indicate weight losses and negative
differences indicate weight gains. For interpretation purposes, we refer to the differences in
weights as weight losses and the observed weight losses are shown below.
Is there a statistically significant difference in the mean weight loss among the four diets?
2. Calcium is an essential mineral that regulates the heart, is important for blood clotting and
for building healthy bones. The National Osteoporosis Foundation recommends a daily
calcium intake of 1000-1200 mg/day for adult men and women. While calcium is contained
in some foods, most adults do not get enough calcium in their diets and take supplements.
Unfortunately, some of the supplements have side effects such as gastric distress, making
them difficult for some patients to take on a regular basis. A study is designed to test
whether there is a difference in mean daily calcium intake in adults with normal bone
density, adults with osteopenia (a low bone density which may lead to osteoporosis) and
adults with osteoporosis. Adults 60 years of age with normal bone density, osteopenia and
osteoporosis are selected at random from hospital records and invited to participate in the
study. Each participant's daily calcium intake is measured based on reported food intake
and supplements. The data are shown below.
Is there a statistically significant difference in mean calcium intake in patients with normal bone
density as compared to patients with osteopenia and osteoporosis?
V. QUESTIONS/SUPPLEMENTAL ACTIVITIES/EXERCISES
Given the result generated from Excel, state the alternative hypothesis, determine whether the null
hypothesis should be rejected or not rejected, and interpret the F value and p-value.
1. Ho: The mean calcium content is the same for the four different storage times.
SUMMARY
Groups      Count   Sum      Average    Variance
0 months    6       344.96   57.49333   1.890107
1 month     6       347.71   57.95167   1.765937
2 months    6       357.32   59.55333   0.804507
3 months    6       362.03   60.33833   2.119657
ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups        32.13815   3    10.71272   6.512085   0.002982   3.098391
Within Groups         32.90103   20   1.645052
Total                 65.03918   23
2. Ho: There are no significant differences in the efficacy of the new antidepressant based
on the dosage given
Anova: Single Factor
SUMMARY
ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups        1484.933   2    742.4667   11.26657   0.001761   3.885294
Within Groups         790.8      12   65.9
Total                 2275.733   14
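The figures in an Excel ANOVA table can be checked by hand, since each mean square is SS divided by df and F is the ratio of the between-groups mean square to the within-groups mean square. The following is a minimal Python sketch of that arithmetic using the figures from the storage-time table; the function name `f_statistic` is illustrative and not part of the module. The usual decision rule, assumed here, is that Ho is rejected when F exceeds F crit (equivalently, when the p-value is below the chosen significance level).

```python
def f_statistic(ss_between, df_between, ss_within, df_within):
    """Return (MS between, MS within, F) for a one-way ANOVA table."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

# Figures copied from the storage-time ANOVA table (exercise 1).
ms_b, ms_w, f_value = f_statistic(32.13815, 3, 32.90103, 20)
f_crit = 3.098391  # critical value reported by Excel

print(round(ms_b, 5), round(ms_w, 6), round(f_value, 4))
print("F exceeds F crit:", f_value > f_crit)
```

The same function can be applied to the second table by substituting its SS and df values.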
LESSON 9
RELATIVE RISK
Introduction:
Risk as defined for public health planning is the probability of the occurrence of a disease or
other health outcome of interest during a specified period, usually one year. Risk is calculated by
dividing the number who got the disease during the defined period by the total population of
interest during that period. For example, if there were 1000 births in a health jurisdiction in one
Lesson Proper:
The fundamental comparison of rates using a ratio in epidemiology is known as the rate ratio.
If the rates being compared are incidence rates, epidemiologists call those comparisons risk
ratios, also referred to as relative risk (RR). Relative risk is defined as a measure of
association that provides the strength of association between exposure and outcome in a
population. This definition has several key parts that need to be highlighted.
First, relative risk is a measure of association, which means that it has the ability to tell if two
comparable groups are related to each other. The second key part of the definition of relative risk
is that it provides the strength of association, which means that it results in a number that tells
how related the comparable groups are. So a resulting relative risk of 2 indicates that the rate
above the fraction line is twice as large as the rate below the fraction line. A relative risk of 3 is
said to be stronger than a relative risk of 2. The third key part of the definition of relative risk is
that it is between the exposure and outcome in a population. Although the terms “exposure” and
“outcome” are used in the definition, the reality is that the relative risk can compare rates between
any two groups. The two groups could be two populations, two geographic locations, two time
periods, or two diseases, but most often relative risk is used to compare the rates of a disease in
the group of people exposed to the risk factor of interest and in the group of people not exposed
to the risk factor of interest. Relative risk is a very flexible tool.
The generic formula for assessing the relationship between exposure and outcome using the
relative risk is:
\[ \text{Relative risk} = \frac{\text{Incidence rate in the exposed group}}{\text{Incidence rate in the nonexposed group}} \]
In this form, the exposed group is identified by the hypothesis of interest to the investigator.
So, for example, the exposed group could be those who smoke cigarettes, and the nonexposed
group would be those who do not smoke cigarettes. The resulting relative risk will then identify
the relationship between smoking cigarettes and the outcome of interest. In addition to comparing
exposed and unexposed groups, the rates above and below the fraction line can be flexible and
represent any groupings that are to be compared.
The null value of the relative risk is 1, which indicates no association between exposure and
outcome. In practice, the result does not have to equal exactly 1; a relative risk close to 1 is
often still considered as no association between exposure and outcome. But when the result of
a relative risk is different than 1, the relationship between the exposure and outcome is indicated
by the strength of association and the direction of the result. When the relative risk is above 1,
the interpretation is that those in the exposed group are more likely to have the outcome than
those in the nonexposed group. This is known as a positive association between exposure and
outcome. The larger the number, the stronger the relationship between being exposed and having
the outcome. The sentence that interprets a relative risk above 1 is:
The risk that those in the exposed group will develop the outcome is XX.XX times as
likely as those in the nonexposed group developing the outcome.
In this interpretation, the numeric result of the relative risk is inserted in place of XX.XX.
Also, the actual characteristic or attribute that forms the exposure group should replace the terms
exposed and nonexposed. Finally, the actual outcome or disease should replace the outcome.
As an example, examine this calculated relative risk:
\[ \frac{\text{Incidence rate of lung cancer in the smoking group}}{\text{Incidence rate of lung cancer in the nonsmoking group}} = \frac{12.8\%}{3.2\%} = 4.0 \]
This relative risk of 4 means that there is a positive relationship between smoking and lung cancer.
The interpretation of this relative risk is:
The risk that those in the smoking group will develop lung cancer is 4.0 times as likely
as those in the nonsmoking group developing lung cancer.
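As a minimal sketch of the calculation above, the Python functions below divide the incidence rate in the exposed group by the rate in the nonexposed group and report the direction of the association. The function names are illustrative, and treating only a result of exactly 1 as the null value is a simplifying assumption.

```python
def relative_risk(rate_exposed, rate_nonexposed):
    """Incidence rate in the exposed group divided by the rate in the nonexposed group."""
    return rate_exposed / rate_nonexposed

def direction(rr):
    """Describe the association implied by a relative risk."""
    if rr > 1:
        return "positive association"
    if rr < 1:
        return "negative (protective) association"
    return "no association"

# Lung cancer example from the text: 12.8% in smokers, 3.2% in nonsmokers.
rr_smoking = relative_risk(12.8, 3.2)
print(round(rr_smoking, 2), direction(rr_smoking))
```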
A negative association is represented by a relative risk that is less than 1. In this case, the higher
rate of outcome is below the fraction line. This finding is also an indication that the exposure is
protective for the outcome. The sentence that interprets a relative risk below 1 is the same as for
the relative risk above 1, except that it indicates the exposed group is less likely to develop the
outcome. As an example, examine the relative risk presented here:
\[ \frac{\text{Incidence rate of heart disease in the exercise group}}{\text{Incidence rate of heart disease in the nonexercise group}} = \frac{2.4\%}{8.8\%} = 0.27 \]
This relative risk means that there is a negative relationship between exercise and heart disease.
Those in the exercise group have lower rates of heart disease than those in the nonexercise
group. Further, from this result, there is an indication that exercise is protective against heart
disease. The interpretation of this relative risk is:
The risk that those in the exercise group will develop heart disease is 0.27 times
as likely as those in the nonexercise group developing heart disease.
To avoid interpreting a relative risk below 1, the investigator can reverse the rates so that the
higher rate is above the fraction line: 8.8% / 2.4% = 3.67. This new arrangement for the relative
risk still represents the same relationship between exercise and heart disease, but the investigator
eliminates the possible confusion of having to interpret a relative risk less than 1. If the rates
are reversed, it is critical to ensure that the correct corresponding exposure group is represented
in the interpretation. Consequently, in this example, one would interpret it by saying that
non-exercisers are 3.67 times as likely to develop heart disease as exercisers or that exercisers
are 0.27 times as likely to develop heart disease as non-exercisers. The relationship between
exercise and heart disease in both statements is the same. Finally, notice that the mathematical
relationship between relative risks when the rates are reversed above and below the fraction line
is that each relative risk is the reciprocal of the other, so 0.27 = 1/3.67.
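The reciprocal relationship described above can be checked numerically. This short Python sketch uses the exercise and heart disease rates from the text; the variable names are illustrative.

```python
# Heart disease example from the text: 2.4% in exercisers, 8.8% in non-exercisers.
rate_exercise = 2.4
rate_nonexercise = 8.8

rr_exercise = rate_exercise / rate_nonexercise      # exercisers above the fraction line
rr_nonexercise = rate_nonexercise / rate_exercise   # rates reversed

print(round(rr_exercise, 2))     # 0.27
print(round(rr_nonexercise, 2))  # 3.67

# Each relative risk is the reciprocal of the other (up to floating-point rounding).
assert abs(rr_exercise - 1 / rr_nonexercise) < 1e-12
```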
\[ \text{Relative Risk (RR)} = \frac{a / (a + b)}{c / (c + d)} \]
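The formula can be applied directly to the counts in a 2 x 2 table. The sketch below assumes the usual cell layout, which this lesson does not spell out: a = exposed with the outcome, b = exposed without the outcome, c = nonexposed with the outcome, d = nonexposed without the outcome. The counts in the example are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def relative_risk_2x2(a, b, c, d):
    """RR = [a / (a + b)] / [c / (c + d)] from the four cell counts."""
    risk_exposed = a / (a + b)
    risk_nonexposed = c / (c + d)
    return risk_exposed / risk_nonexposed

# Hypothetical counts: 30 of 100 exposed and 10 of 100 nonexposed develop the outcome.
rr = relative_risk_2x2(a=30, b=70, c=10, d=90)
print(round(rr, 2))  # 3.0
```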
Synthesis:
A risk ratio (RR), also called relative risk, compares the risk of a health event (disease, injury,
risk factor, or death) among one group with the risk among another group. It does so by dividing
the risk (incidence proportion, attack rate) in group 1 by the risk (incidence proportion, attack rate)
in group 2. The two groups are typically differentiated by such demographic factors as sex (e.g.,
males versus females) or by exposure to a suspected risk factor (e.g., did or did not eat potato
Assessment:
Offline Learners: Answer the activity below and send it to your instructor via email.
Online Learners: Answer the activity posted in Google Classroom
1. Induction of labor. Meconium staining of the fetus during childbirth is a sign of fetal distress.
In a randomized trial, 11 pregnant women had elective induction of labor between 39 and 40
weeks of gestation and 117 control women were managed expectantly until 41 weeks of
gestation. One case of meconium staining occurred in the treatment group. Thirteen (13) occurred
in the control group. Express the association between induction and meconium staining as a risk
ratio.
2. Joseph Lister and antiseptic surgery. When Joseph Lister introduced the antiseptic method
for surgical operations he demonstrated that post-operative mortality dropped from 16 per 35
procedures to 6 per 40 procedures. Determine the risks of post-operative mortality in each group
and determine whether the difference is statistically significant.
References:
Andrade, C. (2015). Understanding Relative Risk, Odds Ratio, and Related Terms: As Simple
As It Can Get. Retrieved from: http://www.pitt.edu/~bertsch/risk.pdf
FHOP Planning Guide. (n.d.). Calculating and Interpreting Attributable Risk and Population
Attributable Risk. Retrieved from:
https://fhop.ucsf.edu/sites/fhop.ucsf.edu/files/wysiwyg/pg_apxIIIB.pdf
LABORATORY 9
RELATIVE RISK
I. BACKGROUND:
II. OBJECTIVES:
1. Solve for the relative risk of a given set of data.
2. Correctly interpret the result of the relative risk
III. MATERIALS:
Coupon Bond
Writing Materials
Calculator
IV. ACTIVITY:
1. The table below examines the risk of wound infections with incidental appendectomy
during a staging laparotomy for Hodgkin disease. Calculate the relative risk and give
the interpretation.
Had incidental appendectomy    Wound infection    No wound infection
Yes                            7                  124
No                             1                  78
2. A study of raloxifene and incidence of fractures was conducted among women with
evidence of osteoporosis. The women were initially divided into two groups: those
with and those without pre-existing fractures. The women were then randomized to
raloxifene or placebo and followed for 3 years to determine the incidence of new
vertebral fractures, with the results shown in Table 13.53. Among those with no pre-
existing fractures, compute the relative risk of new fractures among those randomized
to raloxifene vs. placebo, along with its associated 95% CI.
V. QUESTIONS/SUPPLEMENTAL ACTIVITIES/EXERCISES
1. A cohort study examined the association between smoking and lung cancer after
following 400 smokers and 600 non-smokers for 15 years. At the conclusion of the
study the investigators found a relative risk of 17. What would be the interpretation of
the given relative risk?
LESSON 10
ODDS RATIO, PREVALENCE, INCIDENCE
Introduction:
In the previous lesson, the RR (or relative risk) was introduced. The relative risk can be
expressed as the ratio of the probability of disease among exposed subjects (p1) divided by the
probability of disease among unexposed subjects (p2). Although easily understood, the RR has
the disadvantage of being constrained by the denominator probability (p2). For example, if p2 =
.5, then the RR can be no larger than 1/.5 = 2; if p2 = .8, then the RR can be no larger than 1/.8
= 1.25. To avoid this restriction, another comparative measure relating two proportions is
sometimes used, called the odds ratio (OR). The odds in favor of a success are defined as follows:
If the probability of a success = p, then the odds in favor of success = p/(1 −p). If two proportions
p1, p2 are considered and the odds in favor of success are computed for each proportion, then
the ratio of odds, or OR, becomes a useful measure for relating the two proportions.
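A small numerical sketch may make the two ideas in this introduction concrete: the ceiling that p2 places on the RR, and the conversion of a probability into odds. The function names are illustrative, and the proportions p1 = 0.9 and p2 = 0.5 used in the last line are hypothetical.

```python
def max_relative_risk(p2):
    """Largest possible RR for a given denominator probability (p1 cannot exceed 1)."""
    return 1 / p2

def odds(p):
    """Odds in favor of success for a probability p (requires p < 1)."""
    return p / (1 - p)

def odds_ratio(p1, p2):
    """Ratio of the odds for p1 to the odds for p2."""
    return odds(p1) / odds(p2)

print(max_relative_risk(0.5))   # 2.0
print(max_relative_risk(0.8))   # 1.25
print(odds(0.5))                # 1.0
print(round(odds_ratio(0.9, 0.5), 2))  # 9.0, while RR = 0.9 / 0.5 = 1.8
```

With p2 = 0.5 the RR cannot exceed 2, but the odds ratio has no such ceiling.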
Lesson Proper:
When incidence rates are available, comparing rates using a ratio measure of association is
best done with a relative risk. However, when incidence data are not available, the most
commonly used method for comparing rates as a ratio is the odds ratio (OR). Equally important
is that an odds ratio can be used to measure associations in any study design. This makes the
odds ratio a very widely used measure of association.
An odds ratio is a measure of association that provides the strength and direction of the
association between exposure and outcome in a population. This sounds very similar to the
definition of a relative risk, and, in fact, the results of the odds ratio are interpreted in the same
manner as the relative risk. An odds ratio equal to 1 indicates that there is no relationship between
exposure and outcome in the observed populations. An odds ratio greater than 1 indicates a
positive association between exposure and outcome, an odds ratio less than 1 indicates a
negative association between exposure and outcome. The formula for the odds ratio is best
described by referring to the 2 x 2 table. The odds ratio can be either the ratio of the:
1. Exposure odds in those with the outcome to the exposure odds in those without outcome
2. Outcome odds in those with exposure to the outcome odds in those without exposure.
               Outcome    No Outcome
Exposed        A          B
Not exposed    C          D
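The formula itself is not reproduced above, so the following Python sketch assumes the standard cross-product form for this table, OR = (A x D) / (B x C). It also checks that the two ratios listed above (exposure odds and outcome odds) give the same number. The counts are hypothetical.

```python
def odds_ratio_2x2(a, b, c, d):
    """Cross-product odds ratio for a 2 x 2 table: (A * D) / (B * C)."""
    return (a * d) / (b * c)

# Hypothetical counts: A = 40, B = 60 (exposed); C = 20, D = 80 (not exposed).
a, b, c, d = 40, 60, 20, 80

exposure_odds_ratio = (a / c) / (b / d)   # exposure odds: with outcome vs without outcome
outcome_odds_ratio = (a / b) / (c / d)    # outcome odds: exposed vs not exposed

print(round(odds_ratio_2x2(a, b, c, d), 3))  # 2.667
print(round(exposure_odds_ratio, 3), round(outcome_odds_ratio, 3))
```

Both ways of forming the ratio reduce algebraically to the same cross-product, which is why the odds ratio can be used whichever way the study sampled its subjects.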
Prevalence Ratio
The prevalence ratio is a measure of association that provides strength and direction of the
association between existing exposure and outcome in the population. The prevalence ratio can
be used in a cross sectional study or any study where the outcome data is prevalence. A
prevalence ratio has the same interpretation as the relative risk and the odds ratio with respect to
its null value of 1 and values greater or less than 1. The main difference is the prevalence ratio
compares two prevalence rates. The two rates are compared as a ratio in the following generic
formula:
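The formula does not survive in this copy of the module. Assuming it mirrors the relative risk formula, with prevalence in place of incidence, it can be written as:

```latex
\[
\text{Prevalence ratio} = \frac{\text{Prevalence rate in the exposed group}}{\text{Prevalence rate in the nonexposed group}}
\]
```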
The incidence density ratio has the same interpretation as relative risk and the odds ratio.
Example:
A study looking at breast cancer in women compared cases with non-cases, and found that
75/100 cases did not use calcium supplements compared with 25/100 of the non-cases. (a)
Calculate the odds of exposure in cases and non-cases. (b) Calculate the odds ratio using the
cross-product ratio. (c) How does the difference between the two prevalences of breast cancer
(75% vs 25%) compare to the odds ratio?
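One way to work through parts (a) and (b) is sketched below in Python. It treats not using calcium supplements as the exposure, which is an assumption about how the example is meant to be read, and the variable names are illustrative.

```python
# Cases: 75 of 100 did not use calcium supplements (exposed), 25 did (unexposed).
# Non-cases: 25 of 100 did not use calcium supplements, 75 did.
cases_exposed, cases_unexposed = 75, 25
noncases_exposed, noncases_unexposed = 25, 75

# (a) Odds of exposure in each group.
odds_cases = cases_exposed / cases_unexposed            # 75 / 25 = 3.0
odds_noncases = noncases_exposed / noncases_unexposed   # 25 / 75, about 0.33

# (b) Odds ratio by the cross-product: (75 * 75) / (25 * 25).
odds_ratio = (cases_exposed * noncases_unexposed) / (cases_unexposed * noncases_exposed)

print(odds_cases, round(odds_noncases, 2), odds_ratio)  # 3.0 0.33 9.0
```

For part (c), note that the two proportions differ by a factor of 3 (75% vs 25%), whereas the odds ratio is 9.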
Synthesis:
The great value of the odds ratio is that it is simple to calculate, very easy to interpret, and provides
results upon which clinical decisions can be made. Furthermore, it is sometimes helpful in clinical
situations to be able to provide the patient with information on the odds of one outcome versus
another. Patients may decide to accept or forego painful or expensive treatments if they
understand what their odds are for obtaining a desired result from the treatment. Many patients
want to be involved in decisions about their treatment, but to be able to participate effectively,
they must have information about their likely results in terms they can understand. At least in the
industrialized world, most patients have received enough schooling to understand basic
percentages and the meaning of probabilities. The odds ratio provides information that both
clinicians and their patients can use for decision-making.
Odds ratios are one of a category of statistics clinicians often use to make treatment decisions.
Other statistics commonly used to make treatment decisions include risk assessment statistics
such as absolute risk reduction and relative risk reduction statistics. The odds ratio supports
clinical decisions by providing information on the odds of a particular outcome relative to the odds
of another outcome. In the endocarditis example, the risk (or odds) of dying if treated with the new
drug is relative to the risk (odds) of dying if treated with the standard treatment antibiotic protocol.
Relative risk assessment statistics are particularly suited to diagnostic and treatment decision-
making and will be addressed in a future paper.
Assessment:
Offline Learners: Do the activity below and submit it to your instructor via email.
Online Learners: Do the activity posted in Google Classroom
1. The following is the abstract of a paper (Illi et al., 2001). Read and understand the case
and answer the questions that follow.
Objective: To investigate the association between early childhood infections and subsequent
development of asthma.
Design: Longitudinal birth cohort study.
Setting: Five children's hospitals in five German cities.
Participants: 1314 children born in 1990 followed from birth to the age of 7 years.
Main outcome measures: Asthma and asthmatic symptoms assessed longitudinally by parental
questionnaires; atopic sensitisation assessed longitudinally by determination of IgE
concentrations to various allergens; bronchial hyperreactivity assessed by bronchial histamine
challenge at age 7 years.
Results: Compared with children with 1 episode of runny nose before the age of 1 year, those
with 2 episodes were less likely to have a doctor's diagnosis of asthma at 7 years old (odds ratio
0.52 (95% confidence interval 0.29 to 0.92)) or to have wheeze at 7 years old (0.60 (0.38 to 0.94)),
and were less likely to be atopic before the age of 5 years. Similarly, having 1 viral infection of the
herpes type in the first 3 years of life was inversely associated with asthma at age 7 (odds ratio
0.48 (0.26 to 0.89)). Repeated lower respiratory tract infections in the first 3 years of life showed
a positive association with wheeze up to the age of 7 years (odds ratio 3.37 (1.92 to 5.92) for 4
infections v 3 infections).
Conclusion: Repeated viral infections other than lower respiratory tract infections early in
life may reduce the risk of developing asthma up to school age.
a) What is meant by odds ratio 0.52 for runny nose and asthma and what does it tell us?
b) What is meant by 95% confidence interval 0.29 to 0.92 and what further information
does this provide?
c) What is meant by odds ratio 3.37 (1.92 to 5.92) for lower respiratory tract infections and
wheeze?
d) On a less statistical point, what is wrong with the way the conclusion is phrased?
References:
Bush, H., Macera, C., Shaffer, R., and Shaffer, P. (2020). Biostatistics and Epidemiology.
Taguig: Cengage Learning Asia Pte Ltd.
LABORATORY 10
ODDS RATIO
I. BACKGROUND
III. MATERIALS
Coupon bond
Calculator
Writing materials
IV. ACTIVITY
1. Using the table below, calculate and interpret:
a. the odds ratio for the relationship between NOT smoking and lung cancer
b. the odds ratio for the relationship between smoking and lung cancer
              Lung Cancer    No lung cancer    Total
Smoker        127            3,400             3,527
Non-smoker    45             3,100             3,145
Total         172            6,500             6,672
V. QUESTIONS/SUPPLEMENTAL ACTIVITIES/EXERCISES
1. Create a table presenting the data and calculate the odds ratio and give an appropriate
interpretation based on the data below:
200 pairs where the case is exposed and the control is not
50 pairs where the control is exposed and the case is not
130 pairs where cases and controls are exposed
85 pairs where the cases and controls are unexposed
2. An investigator conducts a study to determine whether there is an association between
caffeine intake and Parkinson’s disease. He assembles 230 incident cases of PD and
samples 455 controls from the general population. After interviewing all subjects, he
finds that 64 of the cases had high daily intake of caffeine (exposed) prior to diagnosis
and 277 of the controls had low daily intake of caffeine (unexposed) prior to the date of
the matched case’s diagnosis.
a. Assemble the 2x2 table for this study using the information given.
b. Calculate the odds of being a case among the exposed
c. Calculate the odds ratio for disease given exposure to high daily intake of
caffeine
LESSON 11
EPIDEMIOLOGY
Introduction
This lesson begins with the definition of epidemiology, followed by an overview of its
objectives, some of the approaches used in epidemiology, and examples of the applications
of epidemiology to human health problems. It also discusses how diseases are transmitted.
Diseases do not arise in a vacuum; they result from an interaction of human beings with
their environment. An understanding of the concepts and mechanisms underlying the
transmission and acquisition of disease is critical to exploring the epidemiology of human
disease and to preventing and controlling many infectious diseases.
Lesson Proper:
What is epidemiology?
Epidemiology is the study of how disease is distributed in populations and the factors
that influence or determine this distribution. Why does a disease develop in some people
and not in others? The premise underlying epidemiology is that disease, illness, and ill
health are not randomly distributed in human populations. Rather, each of us has certain
characteristics that predispose us to, or protect us against, a variety of different diseases.
These characteristics may be primarily genetic in origin or may be the result of exposure to
certain environmental hazards. Perhaps most often, we are dealing with an interaction of
genetic and environmental factors in the development of disease.
A broader definition of epidemiology than that given above has been widely accepted. It
defines epidemiology as “the study of the distribution and determinants of health-related
states or events in specified populations and the application of this study to control of health
problems”. What is noteworthy about this definition is that it includes both a description of
the content of the discipline and the purpose or application for which epidemiologic
investigations are carried out.
Human disease does not arise in a vacuum. It results from an interaction of the host (a
person), the agent (e.g., a bacterium), and the environment (e.g., a contaminated water
supply). Although some diseases are largely genetic in origin, virtually all disease results
from an interaction of genetic and environmental factors, with the exact balance differing for
different diseases. Many of the underlying principles governing the transmission of disease
are most clearly demonstrated using communicable diseases as a model. Hence, this
chapter primarily uses such diseases as examples in reviewing these principles. However,
the concepts discussed are also applicable to diseases that do not appear to be of
infectious origin.
The factors that can cause human disease include biologic, physical, and chemical factors
as well as other types, such as stress, that may be harder to classify.
An infected individual can transmit influenza or the common cold to others in the course
of an hour in a crowded room. A venereal infection also must spread progressively from
person to person if it is to maintain itself in nature, but it would be a formidable task to transmit
venereal infection on such a scale.
Thus, different organisms spread in different ways, and the potential of a given organism
for spreading and producing outbreaks depends on the characteristics of the organism, such
as its rate of growth and the route by which it is transmitted from one person to another.
As clinical and biologic knowledge has increased over the years, so has our ability to distinguish
different stages of disease. These include clinical and nonclinical disease:
1. Preclinical Disease
➢ Disease that is not yet clinically apparent, but is destined to progress to clinical
disease.
2. Subclinical Disease
➢ Disease that is not clinically apparent and is not destined to become clinically
apparent. This type of disease is often diagnosed by serologic (antibody) response
or culture of the organism.
3. Persistent (Chronic) Disease
➢ A person fails to “shake off” the infection, and it persists for years, at times for life.
In recent years, an interesting phenomenon has been the manifestation of
symptoms many years after an infection was thought to have been resolved. Some
adults who recovered from poliomyelitis in childhood are now reporting severe
fatigue and weakness; this has been called post-polio syndrome in adult life. These
have thus become cases of clinical disease, albeit somewhat different from the
initial illness.
4. Latent Disease
➢ An infection with no active multiplication of the agent, as when viral nucleic acid is
incorporated into the nucleus of a cell as a provirus. In contrast to persistent
infection, only the genetic message is present in the host, not the viable organism.
A carrier is an individual who harbors the organism, but is not infected as measured by
serologic studies (no evidence of an antibody response) or by evidence of clinical illness.
This person can still infect others, although the infectivity is often lower than with other
infections. Carrier status may be of limited duration or may be chronic, lasting for months or
years. One of the best-known examples of a long-term carrier was Typhoid Mary, who carried
Salmonella typhi and died in 1938. Over a period of many years, she worked as a cook in the
New York City area, moving from household to household under different names. She was
considered to have caused at least 10 typhoid fever outbreaks that included 51 cases and 3
deaths.
Three other terms need to be defined: endemic, epidemic, and pandemic. Endemic is
defined as the habitual presence of a disease within a given geographic area. It may also
refer to the usual occurrence of a given disease within such an area. Epidemic is defined as
the occurrence in a community or region of a group of illnesses of similar nature, clearly in
excess of normal expectancy, and derived from a common or from a propagated source.
Pandemic refers to a worldwide epidemic.
For purposes of this discussion, the focus will be on the single-exposure, common-vehicle
outbreak because the issues discussed are most clearly seen in this type of outbreak. What
are the characteristics of such an outbreak? First, such outbreaks are explosive; there is a
sudden and rapid increase in the number of cases of a disease in a population. Second, the
cases are limited to people who share the common exposure. This is self-evident, because
in the first wave of cases we would not expect the disease to develop in people who were not
exposed unless there were another source of the disease in the community. Third, in a food-
borne outbreak, cases rarely occur in persons who acquire the disease from a primary case.
The reason for the relative rarity of such secondary cases in this type of outbreak is not well
understood.
INCUBATION PERIOD
The incubation period is defined as the interval from receipt of infection to the time of onset
of clinical illness. If a person becomes infected today, the disease with which the person is
infected may not develop for a number of days or weeks. During this time, the incubation
period, the person feels completely well and shows no signs of the disease.
The incubation period is also of historical interest because it is related to what may have
been the only medical advance associated with the Black Death in Europe. In 1374, when
people were terribly frightened of the Black Death, the Venetian Republic appointed three
officials who were to be responsible for inspecting all ships entering the port and for excluding
ships that had sick people on board. It was hoped that this intervention would protect the
community. In 1377, in the Italian seaport of Ragusa, travelers were detained in an isolated
area for 30 days (trentini giorni) after arrival to see whether infection developed. This period
was found to be insufficient, and the period of detention was lengthened to 40 days (quarante
giorni). This is the origin of the word quarantine.
How long would a person be isolated? The person should be isolated until he or she is no
longer infectious to others. When a person is clinically ill, there is generally a clear sign of
potential infectiousness. An important problem arises before the person becomes clinically
ill, that is, during the incubation period. If we know when a person became
infected and also know the general length of the incubation period for the disease, the
infected person should be isolated during this period to prevent the transmission of the
disease to others. In most situations, however, it is difficult to know that a person has been
infected, and nobody may know until signs of clinical disease become manifest.
The attack rate is similar to the incidence rate, which is also used for less acute diseases.
The attack rate (or the incidence rate) is useful for comparing the risk of disease in groups
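with different exposures. The attack rate can be specific for a given exposure. For example,
the attack rate in people who ate a certain food is called a food-specific attack rate. It is
calculated by:
\[ \text{Food-specific attack rate} = \frac{\text{Number of people who ate the food and became ill}}{\text{Total number of people who ate the food}} \]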
In general, time is not explicitly specified in an attack rate; given what is usually known
about how long after an exposure most cases develop, the time period is implicit in the attack
rate.
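As an illustration, a food-specific attack rate can be computed as the number of people who ate a food and became ill divided by the total number who ate that food. The Python sketch below uses hypothetical counts and an illustrative function name.

```python
def attack_rate(number_ill, number_at_risk):
    """Proportion of the people at risk (e.g. those who ate a food) who became ill."""
    return number_ill / number_at_risk

# Hypothetical outbreak: 30 of 40 people who ate the food became ill,
# compared with 5 of 50 people who did not eat it.
ate = attack_rate(30, 40)
did_not_eat = attack_rate(5, 50)

print(f"Ate the food: {ate:.1%}")                  # 75.0%
print(f"Did not eat the food: {did_not_eat:.1%}")  # 10.0%
```

Comparing the two rates side by side is what points to a particular food as the likely vehicle of an outbreak.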
A person who acquires the disease from that exposure (e.g., from a contaminated food)
is called a primary case. A person who acquires the disease from exposure to a primary case
is called a secondary case. The secondary attack rate is therefore defined as the attack rate
in susceptible people who have been exposed to a primary case. It is a good measure of
person-to-person spread of disease after the disease has been introduced into a population,
and it can be thought of as a ripple moving out from the primary case. We often calculate the
secondary attack rate in family members of the index case.
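A minimal sketch of the secondary attack rate, following the definition above: cases among susceptible people exposed to a primary case, divided by the number of such susceptible people. Excluding the primary cases themselves from the denominator is the usual convention and is assumed here; the household figures are hypothetical.

```python
def secondary_attack_rate(secondary_cases, susceptible_contacts):
    """Attack rate among susceptible people exposed to a primary case."""
    return secondary_cases / susceptible_contacts

# Hypothetical households: 10 primary cases live with 40 other susceptible
# family members, 8 of whom later develop the disease.
household_members = 50
primary_cases = 10
susceptible_contacts = household_members - primary_cases  # primary cases are excluded
sar = secondary_attack_rate(8, susceptible_contacts)

print(f"{sar:.0%}")  # 20%
```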
The secondary attack rate also has application in noninfectious diseases when family
members are examined to determine the extent to which a disease clusters among first-
degree relatives of an index case, which may yield a clue regarding the relative contributions
of genetic and environmental factors to the cause of a disease.
Synthesis:
This lesson reviewed some basic concepts that underlie the epidemiologic approach to
acute communicable diseases. Many of these concepts apply equally well to nonacute
diseases that at this time do not appear to be infectious in origin. Moreover, for an increasing
number of chronic diseases originally thought to be noninfectious, infection seems to play
some role. Thus, hepatitis B infection is a major cause of primary liver cancer.
Papillomaviruses have been implicated in cervical cancer, and Epstein-Barr virus has been
implicated in Hodgkin disease. The boundary between the epidemiology of infectious and
noninfectious diseases has blurred in many areas. In addition, even for diseases that are not
infectious in origin, the patterns of spread share many of the same dynamics, and the
methodologic issues in studying them are similar.
Assessment:
Offline Learners: Do the activity below and submit it to your instructor via email.
Online Learners: Do the activity posted in Google Classroom
QUIZ: The quiz will be given through Quizziz or Google Forms; for learners with weak connectivity,
it will be sent through email.
1. In an outbreak of cholera, there were 120 cases among a population at risk of 2,400. Compute
the attack rate of the disease. (3 points)
2. After a party attended by 80 people, 25 individuals become ill. All 80 people were
interviewed about their food consumption at the dinner. The interviews show that 18 of the
25 people who are ill and 25 of the 80 who are healthy ate fish. Show the 2x2 table (6
points) and compute the:
a) Attack rate of the individuals who became ill after eating fish (3 points)
b) Attack rate of the individuals who did not eat fish ( 3 points)
References:
Dawson, B. and Trapp, R. (2004). Basic & Clinical Biostatistics, 4th Edition. McGraw-Hill.
I. BACKGROUND
Diseases have always been around, many in the same form as we see today. Every disease
has its own pattern, usually described by who is affected, where this takes place, and when it takes place.
There is a collection of selected models of health and disease that can be used in assessing, planning and
decision making for the community. However, no single model is complete enough to capture the
totality of community and public health. There are several theories and models that support the practice
of health promotion and disease prevention. Theories and models are used in program planning to
understand and explain health behavior and to guide the identification, development, and
implementation of interventions.
II. OBJECTIVES
At the end of the lesson you are expected to:
1. Explain the concepts of community health and disease
2. Illustrate concepts of community organization
3. Explain the different models of health and disease
III. MATERIALS
Powerpoint presentation
References
IV. ACTIVITY
1. Students are assigned one model of community health
a) epidemiology triangle/epidemiological triad of disease
b) health field model
c) health belief model
d) iceberg theory of disease
e) model of health and community ecosystem
f) medical model
g) biopsychosocial model
h) salutogenic model
i) stages of change model (transtheoretical model)
j) Theory of reasoned action/planned behavior
2. The models are to be presented in class using a group-prepared PowerPoint, through a video
presentation or Google Meet
3. Report should cover the following details
a) define the model of health
b) identify the major components of the model
c) suggest ways on how the model could be used to address a particular health concern
d) advantages and disadvantages
e) evaluate the model with regard to its likely success in various situations
Introduction
Lesson Proper:
Compiling and analyzing data by time, place, and person is desirable for several
reasons.
First, by looking at the data carefully, the epidemiologist becomes very familiar
with the data. He or she can see what the data can or cannot reveal based on the
variables available, its limitations (for example, the number of records with missing
information for each important variable), and its eccentricities (for example, all cases
range in age from 2 months to 6 years, plus one 17-year-old).
Second, the epidemiologist learns the extent and pattern of the public health
problem being investigated — which months, which neighborhoods, and which groups of
people have the most and least cases.
Third, the epidemiologist creates a detailed description of the health of a
population that can be easily communicated with tables, graphs, and maps.
Fourth, the epidemiologist can identify areas or groups within the population that
have high rates of disease. This information in turn provides important clues to the
causes of the disease, and these clues can be turned into testable hypotheses.
Time
The occurrence of disease changes over time. Some of these changes occur
regularly, while others are unpredictable. Two diseases that occur during the same
season each year are influenza (winter) and West Nile virus infection (August–
September). In contrast, diseases such as hepatitis B and salmonellosis can occur at any
time. For diseases that occur seasonally, health officials can anticipate their occurrence
and implement control and prevention measures, such as an influenza vaccination
campaign or mosquito spraying. For diseases that occur sporadically, investigators can
conduct studies to identify the causes and modes of spread, and then develop
appropriately targeted actions to control or prevent further occurrence of the disease.
In either situation, displaying the patterns of disease occurrence by time is critical for
monitoring disease occurrence in the community and for assessing whether the public
health interventions made a difference.
Time data are usually displayed with a two-dimensional graph. The vertical or y-axis
usually shows the number or rate of cases; the horizontal or x-axis shows the time periods
such as years, months, or days. The number or rate of cases is plotted over time. Graphs
of disease occurrence over time are usually plotted as line graphs or histograms.
Sometimes a graph shows the timing of events that are related to disease trends
being displayed. For example, the graph may indicate the period of exposure or the date
control measures were implemented. Studying a graph that notes the period of exposure
may lead to insights into what may have caused illness. Studying a graph that notes the
timing of control measures shows what impact, if any, the measures may have had on
disease occurrence.
As noted above, time is plotted along the x-axis. Depending on the disease, the time
scale may be as broad as years or decades, or as brief as days or even hours of the
day. For some conditions — many chronic diseases, for example — epidemiologists
tend to be interested in long-term trends or patterns in the number of cases or the rate.
For other conditions, such as foodborne outbreaks, the relevant time scale is likely to be
days or hours.
Secular (long-term) trends. Graphing the annual cases or rate of a disease over a
period of years shows long-term or secular trends in the occurrence of the disease.
Health officials use these graphs to assess the prevailing direction of disease occurrence
(increasing, decreasing, or essentially flat), help them evaluate programs or make policy
decisions, infer what caused an increase or decrease in the occurrence of a disease
(particularly if the graph indicates when related events took place), and use past trends
as a predictor of future incidence of disease.
Seasonality. Disease occurrence can be graphed by week or month over the course
of a year or more to show its seasonal pattern, if any. Some diseases such as influenza
and West Nile infection are known to have characteristic seasonal distributions.
Seasonal patterns may suggest hypotheses about how the infection is transmitted, what
behavioral factors increase risk, and other possible contributors to the disease or
condition.
Day of week and time of day. For some conditions, displaying data by day of the
week or time of day may be informative. Analysis at these shorter time periods is
particularly appropriate for conditions related to occupational or environmental
exposures that tend to occur at regularly scheduled intervals.
Place
Describing the occurrence of disease by place provides insight into the geographic
extent of the problem and its geographic variation. Characterization by place refers not
only to place of residence but to any geographic location relevant to disease occurrence.
Such locations include place of diagnosis or report, birthplace, site of employment,
school district, hospital unit, or recent travel destinations. The unit may be as large as a
continent or country or as small as a street address, hospital wing, or operating room.
Sometimes place refers not to a specific location at all but to a place category such as
urban or rural, domestic or foreign, and institutional or noninstitutional.
Place data can be shown through a table but a map provides a more striking visual
display of place data. On a map, different numbers or rates of disease can be depicted
using different shadings, colors, or line patterns.
Analyzing data by place can identify communities at increased risk of disease. Even
if the data cannot reveal why these people have an increased risk, it can help generate
hypotheses to test with additional studies. For example, is a community at increased risk
because of characteristics of the people in the community such as genetic susceptibility,
lack of immunity, risky behaviors, or exposure to local toxins or contaminated food? Can
the increased risk, particularly of a communicable disease, be attributed to
characteristics of the causative agent such as a particularly virulent strain, hospitable
breeding sites, or availability of the vector that transmits the organism to humans? Or
can the increased risk be attributed to the environment that brings the agent and the host
together, such as crowding in urban areas that increases the risk of disease
transmission from person to person, or more homes being built in wooded areas close to
deer that carry ticks infected with the organism that causes Lyme disease?
Person
Age. Age is probably the single most important “person” attribute, because almost
every health-related event varies with age. Factors that also vary with age
include susceptibility, opportunity for exposure, latency or incubation period of the
disease, and physiologic response (which affects, among other things, disease
development).
When analyzing data by age, epidemiologists try to use age groups that are narrow
enough to detect any age-related patterns that may be present in the data. For some
diseases, particularly chronic diseases, 10-year age groups may be adequate. For other
diseases, 10-year and even 5-year age groups conceal important variations in disease
occurrence by age.
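The effect of age-group width can be illustrated with a short Python sketch. The ages below are made up for illustration, and the function name count_by_age_group is a hypothetical helper, not part of any standard package.

```python
from collections import Counter

def count_by_age_group(ages, width):
    """Count cases in consecutive age groups that are `width` years wide."""
    counts = Counter()
    for age in ages:
        lower = (age // width) * width  # lower bound of the age group
        counts[lower] += 1
    # Label each group as "lower-upper", ordered from youngest to oldest
    return {f"{lower}-{lower + width - 1}": counts[lower] for lower in sorted(counts)}

# Hypothetical ages (in years) of ten reported cases
case_ages = [0, 0, 1, 1, 1, 2, 3, 7, 12, 15]

print(count_by_age_group(case_ages, 10))  # 10-year age groups
print(count_by_age_group(case_ages, 5))   # 5-year age groups
```

With 10-year groups, these hypothetical cases fall into just two broad bands. With 5-year groups, it becomes visible that most of them are children under 5 years of age.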
Sex. Males have higher rates of illness and death than do females for many
diseases. For some diseases, this sex-related difference is because of genetic,
hormonal, anatomic, or other inherent differences between the sexes. These inherent
differences affect susceptibility or physiologic responses.
A few adverse health conditions occur more frequently among persons of higher
socioeconomic status. Gout was known as the “disease of kings” because of its
association with consumption of rich foods. Other conditions associated with higher
socioeconomic status include breast cancer, Kawasaki syndrome, chronic fatigue
syndrome, and tennis elbow. Differences in exposure account for at least some if not
most of the differences in the frequency of these conditions.
Case reports
➢ Case reports describe the experience of a single patient or a group of
patients with a similar diagnosis. These types of studies typically depict an
observant clinician identifying an unusual feature of a disease or a patient's
history. They can represent the first clues in the identification of new diseases
or adverse effects of an exposure. A case report can prompt further
investigations with more rigorous study design. Case reports are quite
common in medical journals. A systematic review found that they accounted
for over one third of all articles published. They are useful to public health as
they can provide an interface between clinical medicine and epidemiology.
Case Series
➢ It is worth noting that the term 'cross-sectional' study is also used in social
research. Here, the cross-sectional study refers to a snapshot of a population
at a particular point in time. This contrasts with longitudinal studies which
follow a population over a period of time (i.e. cohort and panel), with cross-
comparative, where one population is compared with another within the same
country and cross-national, where one country population is compared with
other countries.
INDICATORS OF HEALTH
An indicator, also termed an index or variable, is only an indication of a given
situation or a reflection of that situation. A health indicator is a variable, susceptible
to direct measurement, that reflects the state of health of persons in a
community. Indicators help to measure the extent to which the objectives and
targets of a program are being attained. Health status indicators measure
different aspects of the health of a population. Health determinant indicators
measure things that influence health.
CLASSIFICATION OF INDICATORS
1. Mortality indicators - crude death rate, expectation of life, infant mortality rate,
under 5 mortality rate, child mortality rate, maternal mortality ratio, disease
specific death rate, proportional mortality rate
2. Morbidity indicators - incidence rate, prevalence rate
3. Disability rates - event type and person type, Sullivan’s index, health-adjusted
life expectancy, disability-adjusted life years, quality-adjusted life years
4. Nutritional indicators - BMI, growth monitoring
5. Health care delivery indicators
6. Utilization Rates
7. Indicators of social and mental health
8. Environmental indicators
9. Socio-economic indicators
10. Health policy indicators
11. Other indicators
Synthesis:
Assessment:
Offline Learners: Do the activity below and submit it to your instructor via email.
Online Learners: Do the activity posted in Google Classroom
QUIZ: The quiz will be given through Quizizz or Google Forms; for learners with weak connectivity, it will be
sent through email.
TASK 2: Define the following. For numbers 5-11, give examples in addition to the definitions.
1. Mortality indicators
➢ crude death rate
➢ expectation of life
➢ infant mortality rate
➢ under 5 mortality rate
➢ child mortality rate
➢ maternal mortality ratio
➢ disease specific death rate
➢ proportional mortality rate
2. Morbidity indicators
➢ incidence rate
➢ prevalence rate
3. Disability rates
➢ event type and person type
➢ Sullivan’s index
➢ health-adjusted life expectancy
➢ disability-adjusted life years
➢ quality-adjusted life years
4. Nutritional indicators
➢ BMI
➢ growth monitoring
5. Health care delivery indicators
6. Utilization Rates
7. Indicators of social and mental health
8. Environmental indicators
9. Socio-economic indicators
10. Health policy indicators
11. Other indicators
References:
Centers for Disease Control and Prevention. (2012). Principles of Epidemiology in Public Health
Practice, Third Edition: An Introduction to Applied Epidemiology and Biostatistics. Retrieved
from https://www.cdc.gov/csels/dsepd/ss1978/lesson1/section6.html on August 5, 2020
Grimes, D.A. & Schulz, K.F. (2002). Descriptive studies: what they can and cannot do. The
Lancet, 359, 145-49.
Goodman RA, Smith JD, Sikes RK, Rogers DL, Mickey JL. Fatalities associated with farm
tractor injuries: an epidemiologic study. Public Health Rep 1985;100:329–33.
Heyman DL, Rodier G. Global surveillance, national surveillance, and SARS. Emerg Infect Dis.
2003;10:173–5.
American Cancer Society [Internet]. Atlanta: The American Cancer Society, Inc. Available from:
http://www.cancer.org/Research/CancerFactsFigures/cancer-facts-figures-2005/
Centers for Disease Control and Prevention. Current trends. Lung cancer and breast cancer
trends among women–Texas. MMWR 1984;33(MM19):266.
Liao Y, Tucker P, Okoro CA, Giles WH, Mokdad AH, Harris VB, et al. REACH 2010
surveillance for health status in minority communities — United States, 2001–2002.
MMWR 2004;53:1–36.
I. BACKGROUND
Epidemiology is a quantitative field, and certain measures are essential to understanding the state of
illnesses and health outcomes in a population. The essence of epidemiology is to measure disease occurrence and make
comparisons between population groups. The current section introduces the commonly used measures
that help our understanding of the distribution of disease in a given population.
Doubling time of a population: the rule of 70 (this applies only if the population is growing exponentially).
$$ dt = \frac{70}{r\ \text{(in percent form)}} \quad \text{or} \quad dt = \frac{0.7}{r\ \text{(in decimal form)}} $$
where dt is the doubling time in years.
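The rule of 70 is an approximation of the exact doubling time under continuous exponential growth, ln(2)/r, since ln(2) is about 0.693. The Python sketch below compares the two and also projects a future population. It assumes growth follows N(t) = N0 * e^(r*t); that growth formula, the function names, and the 2% rate and starting population are assumptions chosen for illustration only.

```python
import math

def doubling_time_rule_of_70(rate_percent):
    """Approximate doubling time in years: 70 divided by r in percent form."""
    return 70.0 / rate_percent

def doubling_time_exact(rate_percent):
    """Doubling time in years under continuous exponential growth: ln(2) / r."""
    r = rate_percent / 100.0  # convert percent form to decimal form
    return math.log(2) / r

def future_population(initial_population, rate_percent, years):
    """Projected population after `years`, assuming N(t) = N0 * e^(r*t)."""
    r = rate_percent / 100.0
    return initial_population * math.exp(r * years)

rate = 2.0  # hypothetical growth rate, percent per year

print(doubling_time_rule_of_70(rate))        # 35.0
print(round(doubling_time_exact(rate), 1))   # 34.7
# After one exact doubling time, the hypothetical population of 1,000 becomes 2,000
print(round(future_population(1000, rate, doubling_time_exact(rate))))  # 2000
```

The two doubling times differ by well under a year at this rate, which is why the simpler rule of 70 is usually adequate for hand computation.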
II. OBJECTIVES
At the end of the lesson you are expected to:
1. Learn about commonly used epidemiological measurements to describe the occurrence of
disease
2. Accurately solve for the following epidemiological measurements
a) Incidence rate
b) Prevalence rate
c) Crude death rate
d) Crude birth rate
e) Case fatality rate
f) Cause-specific rate
g) Population density
h) Population growth rate
i) Doubling time of a population
j) Future population from growth rate
III. MATERIALS
Calculator
Writing & Drawing materials
References
IV. ACTIVITY
1. Epidemiologists have noticed the cyclic occurrence of the dreaded polio. Children afflicted with
this disease exhibit fever, sore throat and stiffness. In Baguio, doctors have been monitoring the
situation. Their observation began with week 0, the first week in October. The data is shown in
the following table:
2. Baguio City contains 2.3 million people and covers 800,000 square miles. In the year after the
last census, there were 109,000 new children born and 111,000 people died.
a) What is the current population density?
b) What are the birth and death rates?
c) What is the population growth rate (r)?
d) In how many years will the population of Baguio City double?
e) Given a 2010 world population growth rate of about 1.3% per year, how long would it take
the world’s population to double?
f) How old will you be when this doubling occurs?
g) If a country doubles its population in 56 years, what was its population growth rate during
that time?
3. Compute the cause-specific death rate from the following information: there were 6,309 colon cancer deaths
in the Philippines during calendar years 2015, 2016, and 2017, and the sum of the estimated 2015,
2016, and 2017 midyear populations was 38,128,753.
4. Compute the case fatality rate from the following information: there were 137 deaths due to HIV during
calendar year 2018, with an estimated 110,787 HIV-infected individuals in 2018.
Introduction
As noted in Lesson 12, descriptive epidemiology can identify patterns among cases and
in populations by time, place and person. From these observations, epidemiologists
develop hypotheses about the causes of these patterns and about the factors that increase
risk of disease. In other words, epidemiologists can use descriptive epidemiology to
generate hypotheses, but only rarely to test those hypotheses. For that, epidemiologists
must turn to analytic epidemiology.
Lesson Proper:
EXPERIMENTAL STUDIES
Experimental epidemiology is the study of the relationships of various factors
determining the frequency and distribution of diseases in a community, in which the
investigator determines each participant's exposure. Experimental epidemiology
includes three types of study:
1. randomized control trial
➢ often used for new medicine or drug testing
➢ the epitome of all research designs because it provides the strongest
evidence for concluding causation
➢ it provides the best assurance that the result was due to the
intervention
2. field trial
➢ conducted on those at high risk of contracting a disease
3. community trial
➢ research on diseases of social origin
OBSERVATIONAL STUDIES
In an observational study, the epidemiologist simply observes the exposure and
disease status of each study participant. John Snow’s studies of cholera in London were
observational studies. The two most common types of observational studies are cohort
studies and case-control studies; a third type is cross-sectional studies.
1. Cohort study
➢ A cohort study is similar in concept to the experimental study. In a
cohort study the epidemiologist records whether each study
participant is exposed or not, and then tracks the participants to see if
they develop the disease of interest. Note that this differs from an
experimental study because, in a cohort study, the investigator
observes rather than determines the participants’ exposure status.
After a period of time, the investigator compares the disease rate in
the exposed group with the disease rate in the unexposed group. The
unexposed group serves as the comparison group, providing an
estimate of the baseline or expected amount of disease occurrence in
the community. If the disease rate is substantively different in the
exposed group compared to the unexposed group, the exposure is
said to be associated with illness.
2. Case-control study
➢ In a case-control study, investigators start by enrolling a group of
people with disease (at CDC such persons are called case-patients
rather than cases, because case refers to occurrence of disease, not a
person). As a comparison group, the investigator then enrolls a group of
people without disease (controls). Investigators then compare previous
exposures between the two groups. The control group provides an
estimate of the baseline or expected amount of exposure in that
population. If the amount of exposure among the case group is
substantially higher than the amount you would expect based on the
control group, then illness is said to be associated with that exposure.
The study of hepatitis A traced to green onions, described above, is an
example of a case-control study.
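The comparisons described for the two designs can be sketched in Python. In a cohort study the disease rates of the exposed and unexposed groups are compared, commonly as a risk ratio; in a case-control study the exposure of cases and controls is compared, commonly as an odds ratio. These two measures are standard in epidemiology but are not named in this lesson, and the function names and all counts below are hypothetical.

```python
def risk_ratio(exposed_ill, exposed_total, unexposed_ill, unexposed_total):
    """Cohort study: disease rate in the exposed divided by the rate in the unexposed."""
    risk_exposed = exposed_ill / exposed_total
    risk_unexposed = unexposed_ill / unexposed_total
    return risk_exposed / risk_unexposed

def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Case-control study: odds of exposure among cases divided by odds among controls."""
    return (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)

# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed people became ill
rr = risk_ratio(30, 100, 10, 100)
print(f"Risk ratio: {rr:.1f}")   # Risk ratio: 3.0

# Hypothetical case-control study: 40 of 50 cases and 20 of 50 controls were exposed
or_ = odds_ratio(40, 10, 20, 30)
print(f"Odds ratio: {or_:.1f}")  # Odds ratio: 6.0
```

A value well above 1 in either measure suggests that exposure is associated with illness; a value near 1 suggests no association.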
Synthesis:
Assessment:
Offline Learners: Do the activity below and submit it to your instructor via email.
Online Learners: Do the activity posted in Google Classroom
QUIZ: The quiz will be given through Quizizz or Google Forms; for learners with weak connectivity, it will be
sent through email.
References:
Centers for Disease Control and Prevention. (2012). Principles of Epidemiology in Public Health
Practice, Third Edition: An Introduction to Applied Epidemiology and Biostatistics.
Retrieved from https://www.cdc.gov/csels/dsepd/ss1978/lesson1/section6.html on August
5, 2020
Dawson, B. and Trapp, R. (2004) Basic & Clinical Biostatistics 4th Edition. McGrawHill.
Kannel WB. The Framingham Study: its 50-year legacy and future promise. J Atheroscler
Thromb 2000;6:60-6.
Thun, J and Jehmal A. (2003). Analytic Epidemiology. Holland-Frei Cancer Medicine. 6th edition.
Retrieved from https://www.ncbi.nlm.nih.gov/books/NBK13832/ on August 5, 2020
LABORATORY 13
STUDY/RESEARCH DESIGNS
I. BACKGROUND
A study design is a specific plan or protocol for conducting an epidemiological study, which
allows the investigator to translate the conceptual hypothesis into an operational one. The research
design refers to the overall strategy that you choose to integrate the different components of the
study in a coherent and logical way, thereby, ensuring you will effectively address the research
problem; it constitutes the blueprint for the collection, measurement, and analysis of data.
II. OBJECTIVES
At the end of the lesson you are expected to:
1. List and describe the goals of a study design
2. Discuss the difference between observational versus experimental study designs and the
difference between descriptive versus analytical studies.
3. Describe commonly used study designs- case report, case series, case control, cohort, and
cross sectional
4. Explain the advantages and disadvantages of each of the observational study designs
5. Explain how to interpret basic results of each study design
III. MATERIALS
Writing & Drawing materials
References
IV. ACTIVITY
1. Each group of students is assigned a study design:
a) case report
b) case series
c) case control
d) cohort
e) cross sectional
f) randomized control trial
2. The study designs should be presented through a powerpoint presentation
3. Report should cover the following details
a) description of the study design
b) advantages and disadvantages of the study design
c) Explain how to interpret basic results of the study design
4. Another group of students will be assigned to present a journal article for each study design. These are to be
presented in IMRAD format using a PowerPoint presentation.
LESSON 14
CAUSAL INFERENCE
Introduction
Causal inference -- the art and science of making a causal claim about the relationship
between two factors -- is in many ways the heart of epidemiologic research. In most
circumstances, when there is an association between an exposure and a health outcome of
interest, the most commonly asked question is: is one causing the other? Causal inference is
important because, ultimately, it can guide interventions to improve public health:
interventions can be targeted at removing known causes of adverse health outcomes (or
adding known causes of beneficial health outcomes).
Lesson Proper:
➢ Historically, there have been many efforts to account for the occurrence of
disease outcomes. Religions often attributed disease outbreaks or other
misfortunes to divine retribution - punishment for mankind's sins.
Hippocrates promoted the concept that disease was the result of an
imbalance among four vital "humors" within us: Yellow Bile, Black Bile,
Phlegm, and Blood.
What is a Cause?
➢ In epidemiology, the “cause” is an agent (microbial germs, polluted water, smoking,
etc.) that modifies health, and the “effect” describes the way that the health is
changed by the agent. The agent is often potentially pathogenic (in which case it is
known as a “risk factor”).
- The absolute effect of a cause expresses the increase in the risk or the
additional number of cases of illness that result or could result from
exposure to this cause. It is measured by the attributable risk and its
derivatives.
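As a worked illustration of the absolute effect, the Python sketch below computes the attributable risk as the difference in risk between exposed and unexposed groups, together with one common derivative, the attributable fraction among the exposed. The function names and the counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def attributable_risk(exposed_ill, exposed_total, unexposed_ill, unexposed_total):
    """Risk in the exposed minus risk in the unexposed (the absolute effect)."""
    return exposed_ill / exposed_total - unexposed_ill / unexposed_total

def attributable_fraction_exposed(exposed_ill, exposed_total, unexposed_ill, unexposed_total):
    """Share of the risk among the exposed that is attributable to the exposure."""
    risk_exposed = exposed_ill / exposed_total
    risk_unexposed = unexposed_ill / unexposed_total
    return (risk_exposed - risk_unexposed) / risk_exposed

# Hypothetical data: 30 of 1,000 exposed and 10 of 1,000 unexposed people became ill
ar = attributable_risk(30, 1000, 10, 1000)
af = attributable_fraction_exposed(30, 1000, 10, 1000)

print(f"Attributable risk: {ar * 1000:.1f} per 1,000")  # 20.0 per 1,000
print(f"Attributable fraction (exposed): {af:.1%}")     # 66.7%
```

In this made-up example, exposure accounts for 20 additional cases per 1,000 exposed persons, or about two thirds of the illness seen in the exposed group.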
Characteristics of a Cause
Epidemiologists often use the term "risk factor" to indicate a factor that is associated with
a given outcome. However, a risk factor is not necessarily a cause. The term risk factor includes
surrogates for underlying causes.
It is important to distinguish between risk factors and causes. Nevertheless, before one
can wrestle with the difficult question of causation, it is first necessary to establish that a valid
association exists. Consequently, if we accept Susser's assertion that a cause is something that
makes a difference, one might then ask how to tell if a factor makes a difference. Most
epidemiologists would agree that, in a broad sense, this is a two step process.
The evidence must be examined to determine that there is a valid association between
an exposure and an outcome. This is achieved by conducting epidemiologic studies and
critically reviewing the available studies to determine whether random error or bias or
confounding might explain the apparent association.
If it is determined that there is a valid association, then one must wrestle with the
question of whether the association was causal. Not all associations are causal. There are no
standardized rules for determining whether a relationship is causal.
➢ Hill's Criteria underpin much of modern epidemiological research, which attempts to
establish scientifically valid causal connections between potential disease agents
and the many diseases that afflict humankind.
➢ The principles set forth by Hill are widely used to evaluate causal claims in
scientific research. While it is quite easy to claim that agent "A" (e.g., smoking)
causes disease "B" (lung cancer), it is quite another matter to establish a
meaningful, statistically valid connection between the two phenomena.
1. Temporal Relationship:
Exposure always precedes the outcome. If factor "A" is believed to cause a disease, then it is
clear that factor "A" must necessarily always precede the occurrence of the disease. This is the
only absolutely essential criterion. This criterion negates the validity of all functional
explanations used in the social sciences, including the functionalist explanations that dominated
British social anthropology for so many years and the ecological functionalism that pervades
much American cultural ecology.
2. Strength:
This is defined by the size of the association as measured by appropriate statistical tests. The
stronger the association, the more likely it is that the relation of "A" to "B" is causal. For
example, the more highly correlated hypertension is with a high sodium diet, the stronger is the
relation between sodium and hypertension. Similarly, the higher the correlation between
patrilocal residence and the practice of male circumcision, the stronger is the relation between
the two social practices.
3. Dose-Response Relationship:
4. Consistency:
The association is consistent when results are replicated in studies in different settings using
different methods. That is, if a relationship is causal, we would expect to find it consistently in
different studies and among different populations. This is why numerous experiments have to
be done before meaningful statements can be made about the causal relationship between two
or more factors. For example, it required thousands of highly technical studies of the
relationship between cigarette smoking and cancer before a definitive conclusion could be made
that cigarette smoking increases the risk of cancer. Similarly, it would
require numerous studies of the difference between male and female performance of specific
behaviors by a number of different researchers and under a variety of different circumstances
before a conclusion could be made regarding whether a gender difference exists in the
performance of such behaviors.
5. Plausibility:
7. Experiment:
8. Specificity:
This is established when a single putative cause produces a specific effect. This is considered
by some to be the weakest of all the criteria. The diseases attributed to cigarette smoking, for
example, do not meet this criterion. When specificity of an association is found, it provides
additional support for a causal relationship. However, absence of specificity in no way negates
a causal relationship. Because outcomes (be they the spread of a disease, the incidence of a
specific human social behavior or changes in global temperature) are likely to have multiple
factors influencing them, it is highly unlikely that we will find a one-to-one cause-effect
relationship between two phenomena. Causality is most often multiple. Therefore, it is
necessary to examine specific causal relationships within a larger systemic perspective.
9. Coherence:
The association should be compatible with existing theory and knowledge. In other words, it is
necessary to evaluate claims of causality within the context of the current state of knowledge
within a given field and in related fields. What do we have to sacrifice about what we currently
know in order to accept a particular claim of causality? What, for example, do we have to reject
regarding our current knowledge in geography, physics, biology and anthropology in order to
accept the Creationist claim that the world was created as described in the Bible a few thousand
years ago? Similarly, how consistent are racist and sexist theories of intelligence with our
current understanding of how genes work and how they are inherited from one generation to the
next? However, as with the issue of plausibility, research that disagrees with established theory
and knowledge is not automatically false. It may, in fact, force a reconsideration of
accepted beliefs and principles. All currently accepted theories, including Evolution, Relativity
and non-Malthusian population ecology, were at one time new ideas that challenged orthodoxy.
Thomas Kuhn has referred to such changes in accepted theories as "Paradigm Shifts".
The model has similarities to the "web of causation", but is more developed in the sense
that it simultaneously provides a general model for the conditions necessary to cause (and
prevent) disease in a single individual and for the epidemiological study of the causes of
disease among groups of individuals.
Synthesis:
Epidemiology has a vested interest in causation as, despite its numerous and often vague
definitions, it is a discipline with the goal of identifying causes of disease (both modifiable and
nonmodifiable) so that the disease or its consequences might be prevented.
Three Essential Attributes of a Cause
SEATWORK:
Draw the following Models of Causation namely: Hill's Criteria for Causality, Web of causation
and The Sufficient-Component Cause Model. Give a 3-5 sentence explanation of each in your own words.
(5 points for each model)
RUBRIC:
INDICATOR     DESCRIPTION                               RATING
Appearance    Artistic presentation of the image        2
Content       In-depth explanation using own words      2
Mechanics     Impressive use of language and grammar    1
References:
Boston University School of Public Health (n.d.). Causal Inference. Retrieved from
https://sphweb.bumc.bu.edu/otlt/MPH-Modules/EP/EP713_Causality/EP713_Causality_print.html on
August 6, 2020
LaMorte, W. (2019). Elements of a Cause. Boston University School of Public Health. Retrieved
from https://sphweb.bumc.bu.edu/otlt/MPH-Modules/PH717-QuantCore/PH717-Module1A-Populations/Module1A-Populations7.html
on August 6, 2020
Susser, M. (2003). Causal Inference. Columbia University Mailman School of Public Health
Retrieved from https://epiville.ccnmtl.columbia.edu/causal_inference/ on August 6, 2020
Cause and Effect in Epidemiology (2008). In: The Concise Encyclopedia of Statistics. Springer,
New York, NY. https://doi.org/10.1007/978-0-387-32833-1_48
I. BACKGROUND
Causal inference -- the art and science of making a causal claim about the relationship between
two factors -- is in many ways the heart of epidemiologic research. Under most circumstances, if there is
an association between an exposure and a health outcome of interest, the most commonly asked question
is: is one causing the other? Causal inference is important because, ultimately, it can guide
interventions to improve public health: interventions can target the removal of known causes of
adverse health outcomes (or the addition of known causes of beneficial health outcomes).
II. OBJECTIVES
At the end of the lesson you are expected to:
III. MATERIALS
Writing & Drawing materials
References
LESSON 15
FIELD EPIDEMIOLOGY/ OUTBREAK INVESTIGATION
Introduction
Field epidemiology is the practice of investigating epidemics and outbreaks in the field. Its
findings are used to implement measures that protect and improve the health of the public. Field
epidemiologists are the individuals who deal with unexpected, sometimes urgent problems
that demand immediate solutions.
Lesson Proper:
Introduction
✓ Because field investigations often start without specific hypotheses about the
cause or source of disease, they require the use of descriptive studies to
Health departments become aware of possible disease outbreaks or other acute public
health problems in different ways. Situations might gain attention because astute clinicians
recognize unusual patterns of disease among their patients and alert health departments,
surveillance systems for monitoring disease or hazard trends detect increases, the
diagnosis of a single case of a rare disease heralds a broader problem or potential threat, or
members of the public are concerned and contact authorities.
After such alerts, the first step is to decide whether to conduct a field investigation. Initial
assessments might dispel concerns or affirm that further investigation is warranted. Once an
investigation is initiated, decisions must be made at successive stages about how far to pursue
it. These decisions are necessary to make the most effective use of public health
resources, including capacities to conduct field investigations and optimize opportunities for
disease prevention.
In addition to the need to develop and implement control measures to end threats to the
public’s health, such as the Legionnaires’ disease and Ebola virus disease (EVD) outbreaks, other determinants
that shape field investigations include (1) epidemiologic, programmatic, and resource
considerations; (2) public and political considerations; (3) research and learning
opportunities; (4) legal obligations; and (5) training needs.
Data Sources
Field investigations often use information abstracted from different sources, such as
hospital, outpatient medical, or school health records. These records vary substantially in
completeness and accuracy among patients, healthcare providers, and facilities because entries
are made for purposes other than conducting epidemiologic studies. Moreover, rapid and
substantive transitions have occurred for several key information sources—as, for example, in
the growing use of electronic medical records, hospital and managed-care data systems, and
laboratory information management systems. These automated systems can facilitate access to
needed records but might not be compatible with meeting the needs of or supporting specific
record access by external investigators. Thus, the quality of such records as sources of data for
epidemiologic investigations can be substantially less than the quality of information obtained
when investigators can exert greater control through the use of standardized, pretested
questionnaires; physical or laboratory examinations; or other prospectively designed, rather
than retrospective, data collection methods. Because of these transitions, epidemiologists
involved in field investigations increasingly need to know how to use these data sources
and must therefore possess the skills needed to analyze them.
The increasing use of social media and email can facilitate outreach to and queries of
persons who might have common exposures in an outbreak situation, such as participants in an
organized event linked to a common-source exposure. Recently, social media networks have
been used to assist in identifying contacts of persons with sexually transmitted diseases who
might be at high risk and should be considered for targeted prophylaxis. These communication
tools have provided added insight into social links and high-risk behaviors and have been used
to guide and augment data collected from traditional case investigation methodologies.
OUTBREAK INVESTIGATION
Once the decision to conduct a field investigation of an acute outbreak has been made,
working quickly is essential — as is getting the right answer. In other words, epidemiologists
cannot afford to conduct an investigation that is “quick and dirty.” They must conduct
investigations that are “quick and clean.” Under such circumstances, epidemiologists find it
useful to have a systematic approach to follow. This approach ensures that the investigation
proceeds without missing important steps along the way.
Step 1: Prepare for field work
The numbering scheme for this step is problematic, because preparing for field work
often is not the first step. Only occasionally do public health officials decide to conduct a field
investigation before confirming an increase in cases and verifying the diagnosis. More
commonly, officials discover an increase in the number of cases of a particular disease and then
decide that a field investigation is warranted. Sometimes investigators collect enough
information to perform descriptive epidemiology without leaving their desks, and decide that a
field investigation is necessary only if they cannot reach a convincing conclusion without one.
Regardless of when the decision to conduct a field investigation is made, you should be
well prepared before leaving for the field. The preparations can be grouped into two broad
categories: (a) scientific and investigative issues, and (b) management and operational issues.
Good preparation in both categories is needed to facilitate a smooth field experience.
Step 2: Establish the existence of an outbreak
In contrast to outbreak and epidemic, a cluster is an aggregation of cases in a given area over a
particular period without regard to whether the number of cases is more than expected. This
aggregation of cases seems to be unusual, but frequently the public (and sometimes the health
agency) does not know the denominator. For example, the diagnosis in one neighborhood of
four adults with cancer may be disturbing to residents but may well be within the expected level
of cancer occurrence, depending on the size of the population, the types of cancer, and the
prevalence of risk factors among the residents.
One of the first tasks of the field investigator is to verify that a cluster of cases is indeed
an outbreak. Some clusters turn out to be true outbreaks with a common cause, some are
sporadic and unrelated cases of the same disease, and others are unrelated cases of similar
but unrelated diseases.
Even if the cases turn out to be the same disease, the number of cases may not exceed
what the health department normally sees in a comparable time period. Here, as in other areas
of epidemiology, the observed is compared with the expected. The expected number is usually
the number from the previous few weeks or months, or from a comparable period during the
previous few years. For a notifiable disease, the expected number is based on health
department surveillance records. For other diseases and conditions, the expected number may
be based on locally available data such as hospital discharge records, mortality statistics, or
cancer or birth defect registries. When local data are not available, a health department may
use rates from state or national data, or, alternatively, conduct a telephone survey of physicians
Even if the current number of reported cases exceeds the expected number, the excess
may not necessarily indicate an outbreak. Reporting may rise because of changes in local
reporting procedures, changes in the case definition, increased interest because of local or
national awareness, or improvements in diagnostic procedures. A new physician, infection
control nurse, or healthcare facility may more consistently report cases, when in fact there has
been no change in the actual occurrence of the disease. Some apparent increases are actually
the result of misdiagnosis or laboratory error. Finally, particularly in areas with sudden changes
in population size such as resort areas, college towns, and migrant farming areas, changes in
the numerator (number of reported cases) may simply reflect changes in the denominator (size
of the population).
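The comparison of observed with expected counts described above can be sketched in a few lines of Python. This is only an illustration, not a surveillance method prescribed by this module: the counts are hypothetical, the expected number is taken as the mean of comparable earlier periods, and the Poisson tail probability is one assumed way of judging how unusual the observed count is.

```python
import math

def expected_count(baseline_counts):
    """Expected cases = mean of the counts from comparable earlier periods."""
    return sum(baseline_counts) / len(baseline_counts)

def poisson_tail(observed, expected):
    """P(X >= observed) when X follows a Poisson distribution with mean `expected`."""
    below = sum(math.exp(-expected) * expected**k / math.factorial(k)
                for k in range(observed))
    return 1.0 - below

# Hypothetical data: cases reported in the same week of the previous four years.
baseline = [3, 5, 4, 4]
observed = 12

expected = expected_count(baseline)
ratio = observed / expected
p_value = poisson_tail(observed, expected)

print(f"expected={expected:.1f} observed={observed} ratio={ratio:.1f} p={p_value:.4f}")
```

A large ratio or a small tail probability only says that the count is higher than the baseline. As the text notes, changes in reporting, case definitions, diagnostic procedures, or population size still have to be ruled out, and when the denominator has changed, rates rather than raw counts should be compared.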
Step 3: Verify the diagnosis
The next step, verifying the diagnosis, is closely linked to verifying the existence of an
outbreak. In fact, often these two steps are addressed at the same time. Verifying the diagnosis
is important: (a) to ensure that the disease has been properly identified, since control measures
are often disease-specific; and (b) to rule out laboratory error as the basis for the increase in
reported cases.
First, review the clinical findings and laboratory results. If you have questions about the
laboratory findings (for example, if the laboratory tests are inconsistent with the clinical and
epidemiologic findings), ask a qualified laboratorian to review the laboratory techniques being
used. If you need specialized laboratory work such as confirmation in a reference laboratory,
DNA or other chemical or biological fingerprinting, or polymerase chain reaction, you must
secure a sufficient number of appropriate specimens, isolates, and other laboratory material as
soon as possible.
Second, many investigators — clinicians and non-clinicians — find it useful to visit one
or more patients with the disease. If you do not have the clinical background to verify the
diagnosis, bring a qualified clinician with you. Talking directly with some patients gives you a
better understanding of the clinical features, and helps you to develop a mental image of the
disease and the patients affected by it. In addition, conversations with patients are very useful in
generating hypotheses about disease etiology and spread. They may be able to answer some
critical questions: What were their exposures before becoming ill? What do they think caused
their illness? Do they know anyone else with the disease? Do they have anything in common
with others who have the disease?
Third, summarize the clinical features using frequency distributions. Are the clinical
features consistent with the diagnosis? Frequency distributions of the clinical features are useful
in characterizing the spectrum of illness, verifying the diagnosis, and developing case
definitions. These clinical frequency distributions are considered so important in establishing the
credibility of the diagnosis that they are frequently presented in the first table of an
investigation’s report or manuscript.
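A frequency distribution of clinical features can be tabulated directly from a line listing. The sketch below assumes a hypothetical list of four case-patients with made-up symptoms; the field names are illustrative and do not come from any real case report form.

```python
from collections import Counter

# Hypothetical line listing: symptoms recorded for each case-patient.
cases = [
    {"id": 1, "symptoms": ["diarrhea", "fever", "vomiting"]},
    {"id": 2, "symptoms": ["diarrhea", "abdominal cramps"]},
    {"id": 3, "symptoms": ["diarrhea", "fever"]},
    {"id": 4, "symptoms": ["vomiting", "diarrhea"]},
]

def symptom_frequencies(cases):
    """Return (symptom, count, percent of cases), most frequent first."""
    # set() guards against a symptom being listed twice for the same patient.
    counts = Counter(s for case in cases for s in set(case["symptoms"]))
    total = len(cases)
    return [(s, n, 100.0 * n / total) for s, n in counts.most_common()]

table = symptom_frequencies(cases)
for symptom, n, pct in table:
    print(f"{symptom:<18}{n:>3}{pct:>7.0f}%")
```

The resulting table is the kind of summary that is usually presented first in an investigation report, because it shows whether the clinical picture fits the suspected diagnosis.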
Step 4: Construct a working case definition
A case definition is a standard set of criteria for deciding whether an individual should be
classified as having the health condition of interest. A case definition includes clinical criteria
and — particularly in the setting of an outbreak investigation — restrictions by time, place, and
person. The clinical criteria should be based on simple and objective measures such as “fever ≥
38.3°C (101°F),” “three or more loose bowel movements per day,” or “myalgias (muscle pain)
severe enough to limit the patient’s usual activities.” The case definition may be restricted by
time (for example, to persons with onset of illness within the past 2 months), by place (for
example, to residents of the nine-county area or to employees of a particular plant) and by
person (for example, to persons with no previous history of a positive tuberculin skin test, or to
premenopausal women). Whatever the criteria, they must be applied consistently to all persons
under investigation.
The case definition must not include the exposure or risk factor you are interested in
evaluating. This is a common mistake. For example, if one of the hypotheses under
consideration is that persons who worked in the west wing were at greater risk of disease, do
not define a case as “illness among persons who worked in the west wing with onset
between…” Instead, define a case as “illness among persons who worked in the facility with
onset between…” Then conduct the appropriate analysis to determine whether those who
worked in the west wing were at greater risk than those who worked elsewhere.
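The two rules above (apply the criteria consistently, and keep the exposure of interest out of the definition) can be illustrated with a small function. Everything in this sketch is hypothetical: the dates, the facility, the field names, and the four persons are invented for illustration.

```python
from datetime import date

# Hypothetical outbreak case definition: clinical, time, and place criteria only.
ONSET_START = date(2020, 9, 1)
ONSET_END = date(2020, 9, 30)

def meets_case_definition(person):
    """Apply the same criteria to every person under investigation."""
    clinical = person["loose_stools_per_day"] >= 3
    in_time = ONSET_START <= person["onset"] <= ONSET_END
    in_place = person["works_in_facility"]
    # The exposure under study (working in the west wing) is deliberately
    # NOT part of the definition; it is analyzed afterwards.
    return clinical and in_time and in_place

people = [
    {"name": "A", "loose_stools_per_day": 4, "onset": date(2020, 9, 12),
     "works_in_facility": True, "west_wing": True},
    {"name": "B", "loose_stools_per_day": 3, "onset": date(2020, 9, 14),
     "works_in_facility": True, "west_wing": False},
    {"name": "C", "loose_stools_per_day": 1, "onset": date(2020, 9, 13),
     "works_in_facility": True, "west_wing": True},
    {"name": "D", "loose_stools_per_day": 5, "onset": date(2020, 8, 2),
     "works_in_facility": True, "west_wing": False},
]

case_names = [p["name"] for p in people if meets_case_definition(p)]
print(case_names)
```

Because the west-wing field plays no part in the classification, cases from both parts of the facility can be counted and compared in the later analysis.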
In the outbreak setting, the investigators would need to specify time and place to
complete the outbreak case definition. For example, if investigating an epidemic of
meningococcal meningitis in Bamako, the case definition might be the clinical features as
described in the box with onset between January and April of this year among residents and
visitors of Bamako.
A case definition is a tool for classifying someone as having or not having the disease of
interest, but few case definitions are 100% accurate in their classifications. Some persons with
mild illness may be missed, and some persons with a similar but not identical illness may be
included. Generally, epidemiologists strive to ensure that a case definition includes most if not
all of the actual cases, but very few or no false-positive cases. However, this ideal is not always
met. For example, case definitions often miss infected people who have mild or no symptoms,
because they have little reason to be tested.
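How well a case definition classifies people can be summarized with two standard measures that the text describes in words: the share of actual cases it captures (sensitivity) and the share of classified cases that are real (positive predictive value). The terms and the counts below are brought in for illustration and are not taken from this module.

```python
def sensitivity(true_pos, false_neg):
    """Share of actual cases that the case definition captures."""
    return true_pos / (true_pos + false_neg)

def positive_predictive_value(true_pos, false_pos):
    """Share of persons classified as cases who truly have the disease."""
    return true_pos / (true_pos + false_pos)

# Hypothetical counts: 45 true cases captured, 5 mild cases missed,
# 3 persons with a similar illness wrongly included.
tp, fn, fp = 45, 5, 3

sens = sensitivity(tp, fn)
ppv = positive_predictive_value(tp, fp)
print(f"sensitivity={sens:.2f} positive predictive value={ppv:.2f}")
```

In practice the number of missed cases is rarely known exactly, which is why the ideal of "most cases, few false positives" is stated as a goal rather than a guarantee.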
Step 5: Find cases systematically and record information
As noted earlier, many outbreaks are brought to the attention of health authorities by
concerned healthcare providers or citizens. However, the cases that prompt the concern are
often only a small and unrepresentative fraction of the total number of cases. Public health
workers must therefore look for additional cases to determine the true geographic extent of the
problem and the populations affected by it.
Usually, the first effort to identify cases is directed at healthcare practitioners and
facilities — physicians’ clinics, hospitals, and laboratories — where a diagnosis is likely to be
made. Investigators may conduct what is sometimes called stimulated or enhanced passive
surveillance by sending a letter describing the situation and asking for reports of similar cases.
Alternatively, they may conduct active surveillance by telephoning or visiting the facilities to
collect information on any additional cases.
In some outbreaks, public health officials may decide to alert the public directly, usually
through the local media. In other situations, the media may have already spread the word. For
example, in an outbreak of listeriosis in 2002 caused by contaminated sliceable turkey deli
meat, announcements in the media alerted the public to avoid the implicated product and
instructed them to see a physician if they developed symptoms compatible with the disease in
question.
Finally, investigators should ask case-patients if they know anyone else with the same
condition. Frequently, one person with an illness knows or hears of others with the same illness.
In some investigations, investigators develop a data collection form tailored to the specific
details of that outbreak. In others, investigators use a generic case report form. Regardless of
which form is used, the data collection form should include the following types of information
about each case.
Traditionally, the information described above is collected on a standard case report form,
questionnaire, or data abstraction form.
Step 6: Perform descriptive epidemiology
Conceptually, the next step after identifying and gathering basic information on the
persons with the disease is to systematically describe some of the key characteristics of those
persons. This process, in which the outbreak is characterized by time, place, and person, is
called descriptive epidemiology. It may be repeated several times during the course of an
investigation as additional cases are identified or as new information becomes available.
Time
Traditionally, a special type of histogram is used to depict the time course of an
epidemic. This graph, called an epidemic curve, or epi curve for short, provides a simple visual
display of the outbreak’s magnitude and time trend.
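An epidemic curve is simply a count of cases by time of onset. The sketch below builds one by day and prints it as a text histogram; the onset dates are hypothetical, and a daily interval is an assumption (investigators choose the interval to suit the disease).

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical dates of symptom onset for seven case-patients.
onsets = [date(2020, 9, 27), date(2020, 9, 28), date(2020, 9, 28),
          date(2020, 9, 28), date(2020, 9, 29), date(2020, 9, 29),
          date(2020, 10, 1)]

def epi_curve(onsets):
    """Count cases per day of onset, keeping days with zero cases."""
    counts = Counter(onsets)
    day, last = min(onsets), max(onsets)
    curve = []
    while day <= last:
        curve.append((day, counts.get(day, 0)))
        day += timedelta(days=1)
    return curve

curve = epi_curve(onsets)
for day, n in curve:
    print(f"{day.isoformat()} {'#' * n}")
```

Days with no cases are kept on purpose, because gaps in the curve are part of the time trend.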
Place
Assessment of an outbreak by place not only provides information on the geographic
extent of a problem, but may also demonstrate clusters or patterns that provide important
etiologic clues. A spot map is a simple and useful technique for illustrating where cases live,
work, or may have been exposed.
Person
Characterization of the outbreak by person provides a description of who the case-patients
are and who is at risk. Person characteristics that are usually described include both
host characteristics (age, race, sex, and medical status) and possible exposures (occupation,
leisure activities, and use of medications, tobacco, and drugs). Both of these influence
susceptibility to disease and opportunities for exposure.
The two most commonly described host characteristics are age and sex because they
are easily collected and because they are often related to exposure and to the risk of disease.
Depending on the outbreak, occupation, race, or other personal characteristics specific to the
disease under investigation and the setting of the outbreak may also be important. For example,
investigators of an outbreak of hepatitis B might characterize the cases by intravenous drug use
and sexual contacts, two of the high risk exposures for that disease. Investigators of a school-
based gastroenteritis outbreak might describe occurrence by grade or classroom, and by
student versus teacher or other staff.
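Describing an outbreak by person usually means comparing attack rates across groups. The sketch below uses the school example from the text with invented numbers; the group labels and counts are hypothetical.

```python
def attack_rates(groups):
    """groups maps a label to (ill, persons at risk); returns attack rates per 100."""
    return {label: 100.0 * ill / at_risk
            for label, (ill, at_risk) in groups.items()}

# Hypothetical school-based gastroenteritis outbreak.
by_group = {"Grade 1": (12, 40), "Grade 2": (3, 50), "Staff": (1, 20)}

rates = attack_rates(by_group)
highest = max(rates, key=rates.get)

for label, rate in rates.items():
    print(f"{label:<8}{rate:>6.1f} per 100")
print("Highest attack rate:", highest)
```

A group that stands out in this way is a clue for hypothesis generation, not proof of the source.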
Step 7: Develop hypotheses
In an outbreak context, hypotheses are generated in a variety of ways. First, consider what you
know about the disease itself: What is the agent’s usual reservoir? How is it usually transmitted?
What vehicles are commonly implicated? What are the known risk factors? In other words, by
being familiar with the disease, you can, at the very least, “round up the usual suspects.”
Another useful way to generate hypotheses is to talk to a few of the case-patients, as discussed
in Step 3. The conversations about possible exposures should be open-ended and wide-
ranging, not necessarily confined to the known sources and vehicles. In some challenging
investigations that yielded few clues, investigators have convened a meeting of several case-
patients to search for common exposures. In addition, investigators have sometimes found it
useful to visit the homes of case-patients and look through their refrigerators and shelves for
clues to an apparent foodborne outbreak.
Just as case-patients may have important insights into causes, so too may the local health
department staff. The local staff know the people in the community and their practices, and
often have hypotheses based on their knowledge.
The descriptive epidemiology may provide useful clues that can be turned into hypotheses. If
the epidemic curve points to a narrow period of exposure, what events occurred around that
time? Why do the people living in one particular area have the highest attack rate? Why are
some groups with particular age, sex, or other person characteristics at greater risk than other
groups with different person characteristics? Such questions about the data may lead to
hypotheses that can be tested by appropriate analytic techniques.
Step 8: Evaluate hypotheses epidemiologically
After a hypothesis that might explain an outbreak has been developed, the next step is
to evaluate the plausibility of that hypothesis. Typically, hypotheses in a field investigation are
evaluated using a combination of environmental evidence, laboratory science, and
epidemiology. From an epidemiologic point of view, hypotheses are evaluated in one of two
ways: either by comparing the hypotheses with the established facts or by using analytic
epidemiology to quantify relationships and assess the role of chance.
The first method is likely to be used when the clinical, laboratory, environmental, and/or
epidemiologic evidence so obviously supports the hypotheses that formal hypothesis testing is
unnecessary. For example, in an outbreak of hypervitaminosis D that occurred in
Massachusetts in 1991, investigators found that all of the case-patients drank milk delivered to
their homes by a local dairy. Therefore, investigators hypothesized that the dairy was the source
and the milk was the vehicle. When they visited the dairy, they quickly recognized that the dairy
was inadvertently adding far more than the recommended dose of vitamin D to the milk. No
analytic epidemiology was really necessary to evaluate the basic hypothesis in this setting or to
implement appropriate control measures, although investigators did conduct additional studies
to identify additional risk factors.
In many other investigations, however, the circumstances are not as straightforward, and
information from the series of cases is not sufficiently compelling or convincing. In such
investigations, epidemiologists use analytic epidemiology to test their hypotheses. The key
feature of analytic epidemiology is a comparison group. The comparison group allows
epidemiologists to compare the observed pattern among case-patients or a group of exposed
persons with the expected pattern among noncases or unexposed persons. By comparing the
observed with expected patterns, epidemiologists can determine whether the observed pattern
differs substantially from what should be expected and, if so, by what degree. In other words,
epidemiologists can use analytic epidemiology with its hallmark comparison group to quantify
relationships between exposures and disease, and to test hypotheses about causal
relationships. The two most common types of analytic epidemiology studies used in field
investigations are retrospective cohort studies and case-control studies.
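The comparison at the heart of both study types can be written as a calculation on a two-by-two table. The sketch below uses the standard risk ratio for a cohort study and odds ratio for a case-control study; the cell labels a, b, c, d and the counts are assumptions made for illustration.

```python
def risk_ratio(a, b, c, d):
    """Cohort study: a, b = ill, well among exposed; c, d = ill, well among unexposed."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed

def odds_ratio(a, b, c, d):
    """Case-control study: a, c = exposed, unexposed cases; b, d = exposed, unexposed controls."""
    return (a * d) / (b * c)

# Hypothetical two-by-two table.
a, b, c, d = 30, 20, 5, 45

rr = risk_ratio(a, b, c, d)
odds = odds_ratio(a, b, c, d)
print(f"risk ratio={rr:.1f} odds ratio={odds:.1f}")
```

A value near 1 suggests no association between exposure and disease; the further the value is from 1, the stronger the association. A statistical test or confidence interval is still needed to assess the role of chance.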
Step 9: As necessary, reconsider, refine, and re-evaluate hypotheses
Unfortunately, analytic studies sometimes are unrevealing. This is particularly true if the
hypotheses were not well founded at the outset. It is an axiom of field epidemiology that if you
cannot generate good hypotheses (for example, by talking to some case-patients or local staff
and examining the descriptive epidemiology and outliers), then proceeding to analytic
epidemiology, such as a case-control study, is likely to be a waste of time.
Step 10: Compare and reconcile with laboratory and environmental studies
While epidemiology can implicate vehicles and guide appropriate public health action,
laboratory evidence can confirm the findings. Environmental studies are equally important in
some settings. They are often helpful in explaining why an outbreak occurred. When the
epidemiologic, environmental, and laboratory arms of an investigation complement one
another, they can lead to a firm conclusion, for example that a particular well had been
contaminated and was the source of the outbreak.
While you may not be an expert in these other areas, you can help. Use a camera to
photograph working or environmental conditions. Coordinate with the laboratory, and bring back
physical evidence to be analyzed.
Step 11: Implement control and prevention measures
In most outbreak investigations, the primary goal is control of the outbreak and
prevention of additional cases. Indeed, although implementing control and prevention measures
is listed as Step 11 in the conceptual sequence, in practice control and prevention activities
should be implemented as early as possible. The health department’s first responsibility is to
protect the public’s health, so if appropriate control measures are known and available, they
should be initiated even before an epidemiologic investigation is launched. For example, a child
with measles in a community with other susceptible children may prompt a vaccination
campaign before an investigation of how that child became infected.
In general, control measures are usually directed against one or more segments in the
chain of transmission (agent, source, mode of transmission, portal of entry, or host) that are
susceptible to intervention. For some diseases, the most appropriate intervention may be
directed at controlling or eliminating the agent at its source.
Step 12: Initiate or maintain surveillance
Once control and prevention measures have been implemented, they must continue to be
monitored. If surveillance has not been ongoing, now is the time to initiate active surveillance. If
active surveillance was initiated as part of case finding efforts, it should be continued. The
reasons for conducting active surveillance at this time are twofold. First, you must continue to
monitor the situation and determine whether the prevention and control measures are working.
Is the number of new cases slowing down or, better yet, stopping? Or are new cases continuing
to occur? If so, where are the new cases? Are they occurring throughout the area, indicating
that the interventions are generally ineffective, or are they occurring only in pockets, indicating
that the interventions may be effective but that some areas were missed?
Second, you need to know whether the outbreak has spread outside its original area or the area
where the interventions were targeted. If so, effective disease control and prevention measures
must be implemented in these new areas.
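The monitoring questions above (are new cases stopping, and where are they still occurring?) can be answered by tabulating new reports by week and area. This is a rough sketch with hypothetical reports and area names, not a description of any real surveillance system.

```python
from collections import defaultdict

# Hypothetical new case reports: (week after the intervention, area of residence).
reports = [(1, "North"), (1, "North"), (1, "South"), (1, "East"),
           (2, "North"), (2, "East"), (3, "East"), (3, "East")]

def weekly_counts_by_area(reports):
    """Return counts[area][week] = number of new cases reported."""
    counts = defaultdict(lambda: defaultdict(int))
    for week, area in reports:
        counts[area][week] += 1
    return counts

counts = weekly_counts_by_area(reports)
weeks = sorted({week for week, _ in reports})
latest = weeks[-1]

# Areas that still report new cases in the most recent week.
still_active = [area for area in sorted(counts) if counts[area][latest] > 0]

for area in sorted(counts):
    print(area, [counts[area][w] for w in weeks])
print("Still active in latest week:", still_active)
```

In this invented example, cases persist in only one area, which is the "pockets" pattern described in the text: the interventions may be working overall while one area was missed.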
Step 13: Communicate findings
As noted in Step 1, development of a communications plan and communicating with those who
need to know during the investigation is critical. The final task is to summarize the investigation,
its findings, and its outcome in a report, and to communicate this report in an effective manner.
This communication usually takes two forms:
• An oral briefing for local authorities. If the field investigator is responsible for the
epidemiology but not disease control, then the oral briefing should be attended by the
local health authorities and persons responsible for implementing control and prevention
measures. Often these persons are not epidemiologists, so findings must be presented
in clear and convincing fashion with appropriate and justifiable recommendations for
action. This presentation is an opportunity for the investigators to describe what they did,
what they found, and what they think should be done about it. They should present their
findings in a scientifically objective fashion, and they should be able to defend their
conclusions and recommendations.
• A written report. Investigators should also prepare a written report that follows the usual
scientific format of introduction, background, methods, results, discussion, and
recommendations. By formally presenting recommendations, the report provides a
blueprint for action. It also serves as a record of performance and a document for
potential legal issues. It serves as a reference if the health department encounters a
similar situation in the future. Finally, a report that finds its way into the public health
literature serves the broader purpose of contributing to the knowledge base of
epidemiology and public health.
In recent years, the public has become more aware of and interested in public health. In
response, health departments have made great strides in attempting to keep the public
informed. Many health departments strive to communicate directly with the public, usually
through the media, both during an investigation and when the investigation is concluded.
Synthesis:
Key developments in public health practice during recent decades reflect the growing
recognition and formalization of field epidemiology, including establishment of field epidemiology
training programs in affiliation with ministries of health and other national-level public health
agencies around the world.
As the discipline of field epidemiology continues to evolve, new developments and trends
are shaping its ongoing incorporation within public health practice. Examples of these
developments include the following.
Assessment:
Offline Learners: Do the activity below and submit it to your instructor via email.
Online Learners: Do the activity posted in Google Classroom
QUIZ: The quiz will be given through Quizizz or Google Forms; for learners with weak connectivity, it
will be sent through email.
QUESTIONS:
1. What are the 2 issues encountered in field work? Briefly explain each.
2. Why are epidemic curves informative?
3. Explain “When the epidemiology does not fit the natural pattern, think unnatural, i.e.,
intentional.”
RUBRIC
References:
Blank S, Scanlon KS, Sinks TH, Lett S, Falk H. An outbreak of hypervitaminosis D associated
with the overfortification of milk from a home-delivery dairy. Am J Public Health
1995;85:656–9.
Centers for Disease Control and Prevention. (2016). Lesson 6: Investigating an Outbreak
Section 2: Steps of an Outbreak Investigation. Principles of Epidemiology in Public Health
Practice, Third Edition An Introduction to Applied Epidemiology and Biostatistics.
Retrieved from https://www.cdc.gov/csels/dsepd/ss1978/lesson6/section2.html on August
7, 2020
Goodman, R., Buehler, J., Mott, J. (2018). Defining Field Epidemiology. The CDC Field Epidemiology
Manual. Retrieved from https://www.cdc.gov/eis/field-epi-manual/chapters/Defining-Field-Epi.html
on August 7, 2020
Huang, F. and Bayona, M. (2004). Disease Outbreak Investigation. The Young Epidemiology
Scholars Program (YES). The Robert Wood Johnson Foundation and administered by the
College Board. Retrieved from
https://secure-media.collegeboard.org/digitalservices/pdf/yes/disease_outbreak.pdf on August 7, 2020
King, M., Bensyl, D., Goodman, R., Rasmussen, S. (2018). Conducting a Field Investigation.
The CDC Field Epidemiology Manual. Retrieved from
https://www.cdc.gov/eis/field-epi-manual/chapters/Field-Investigation.html on August 7, 2020
Last JM. A dictionary of epidemiology, 4th ed. New York: Oxford U Press, 2001:129.
Jacobus CH, Holick MF, Shao Q, Chen TC, Holm IA, Kolodny JM, et al. Hypervitaminosis D
associated with drinking milk. New Engl J Med 1992;326:1173–7.
I. BACKGROUND
Field epidemiology comprises investigations initiated in response to urgent public health
problems. A primary goal of field epidemiology is to guide, as quickly as possible, the processes of
selecting and implementing interventions to lessen or prevent illness or death when such problems arise.
Once the decision to conduct a field investigation of an acute outbreak has been made,
working quickly is essential, as is getting the right answer. In other words, epidemiologists
cannot afford to conduct an investigation that is “quick and dirty.” They must conduct
investigations that are “quick and clean.” Under such circumstances, epidemiologists find it
useful to have a systematic approach to follow.
II. OBJECTIVES
At the end of the lesson you are expected to:
1. Be familiar with the steps taken to conduct an epidemic investigation, particularly
for an unknown disease
2. Illustrate the procedures of an outbreak investigation by using historical cases of outbreak
investigations
III. MATERIALS
Writing & Drawing materials
References/Internet
IV. ACTIVITY (CASE ANALYSIS): ANSWER THE QUESTIONS WRITTEN IN RED INK
Food-Borne Outbreak
Background
Outbreak Investigation
The local health department was notified of a potential food-borne outbreak of food
poisoning in Barangay A in Baguio City and the epidemic team, including a medical epidemiologist,
a microbiologist and a nurse, visited the local hospitals to interview the attending physicians, the
patients and some of their relatives. Some stool samples were obtained from patients for
microbiologic identification of the causative agent. The epidemic team knew that outbreaks of this
type usually unfold over a very short period, with onset within a few hours to one or two
days after people ingest a contaminated meal.
Epidemic investigators gather data to define the distribution of the disease by time (onset
time and epidemic curve), place (potential places where the implicated meal was served, such as
canteens, restaurants and picnics) and person (the distribution of the disease by age, gender and
food items eaten). The findings of the initial investigation included the following information. The
distribution of the disease by person (age and gender) was found as follows:
Please calculate the totals for each column and row and their corresponding percentages to try to
determine if there are any important differences by age or by gender. Such a task is carried out to
investigate if there are any high-risk groups and if the age and gender distribution can give some
clues about the source of the outbreak. Interpret your findings.
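The totals and percentages requested above can be sketched in a few lines of Python. This is only an illustration: the age groups and counts in `cases` are hypothetical placeholders, not the figures from the investigation, and the helper name `summarize` is an assumption made for this example.

```python
# Hypothetical case counts by age group and gender (NOT the module's data).
cases = {
    "0-14": {"Male": 4, "Female": 6},
    "15-44": {"Male": 10, "Female": 12},
    "45+": {"Male": 3, "Female": 5},
}


def summarize(table):
    """Return row totals, column totals, the grand total, and each total
    expressed as a percentage of the grand total."""
    row_totals = {row: sum(cols.values()) for row, cols in table.items()}
    col_totals = {}
    for cols in table.values():
        for col, n in cols.items():
            col_totals[col] = col_totals.get(col, 0) + n
    grand = sum(row_totals.values())
    row_pct = {row: 100 * n / grand for row, n in row_totals.items()}
    col_pct = {col: 100 * n / grand for col, n in col_totals.items()}
    return row_totals, col_totals, grand, row_pct, col_pct


row_totals, col_totals, grand, row_pct, col_pct = summarize(cases)
for row, n in row_totals.items():
    print(f"Age {row}: {n} cases ({row_pct[row]:.1f}%)")
for col, n in col_totals.items():
    print(f"{col}: {n} cases ({col_pct[col]:.1f}%)")
print(f"Total: {grand} cases")
```

Percentages of cases describe only how the cases are distributed. Judging whether an age or gender group is truly at higher risk also requires the number of people exposed in each group.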
outbreak. The epidemic team studied the curve and recognized that this was a typical single source
acute outbreak. The team also could see that the onset of symptoms in all patients occurred during
a six-hour period. Given the symptoms mentioned above and the epidemic curve, the epidemic
team concluded that this type of epidemic usually corresponds to intoxication or food poisoning and
that the potentially implicated meal was probably served and consumed within a period of a few
hours before the onset of the symptoms. Therefore, the epidemic team investigated the places
where affected persons, their relatives and neighbors ate that day (September 28). The following
table shows the team's findings:
Once the implicated place was determined, the investigation centered on the food. The following
table includes the food items served in that place on September 28:
Please calculate the attack rates per 100 (incidence rates per 100) by food item to try to determine the
one that was probably contaminated. Compare the attack rate (AR) for those who ate each food item with
the attack rate for those who did not eat it by using the relative risk (i.e., RR = AR in those who
ate the food / AR in those who did not eat the food). Interpret your findings.
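As a rough illustration of the calculation described above, the sketch below computes attack rates per 100 and the relative risk for each food item. The food names and counts in `foods` are hypothetical and are not the data from this outbreak; the function names are likewise assumptions made for the example.

```python
# Hypothetical counts per food item (NOT the module's data):
# (ill among eaters, total eaters, ill among non-eaters, total non-eaters)
foods = {
    "Chicken salad": (40, 50, 5, 50),
    "Rice": (30, 60, 15, 40),
    "Fruit juice": (20, 45, 25, 55),
}


def attack_rate(ill, total):
    """Attack rate per 100 persons."""
    return 100 * ill / total


def relative_risk(ill_ate, n_ate, ill_not, n_not):
    """Return (AR among eaters, AR among non-eaters, RR).

    RR is None when nobody in the non-eating group became ill,
    because the ratio is then undefined.
    """
    ar_ate = attack_rate(ill_ate, n_ate)
    ar_not = attack_rate(ill_not, n_not)
    rr = ar_ate / ar_not if ar_not > 0 else None
    return ar_ate, ar_not, rr


for food, counts in foods.items():
    ar_ate, ar_not, rr = relative_risk(*counts)
    rr_text = "undefined" if rr is None else f"{rr:.2f}"
    print(f"{food}: AR ate = {ar_ate:.1f}, AR did not eat = {ar_not:.1f}, RR = {rr_text}")
```

An RR well above 1 for one item, with values close to 1 for the others, points to that item as the likely vehicle. An RR near or below 1 suggests the item is probably not implicated.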
Given that the epidemic team worked fast enough and the implicated meal(s) was (were) identified
before all food leftovers were discarded, food samples from some meal leftovers were taken to the
laboratory. In addition, stool samples were taken from the kitchen personnel who prepared or handled
each different food item. The laboratory confirmed that Salmonella toxin was present in some of the
food samples and that one of the kitchen personnel of that place had the same Salmonella species.
Furthermore, the Salmonella species found in the food and the kitchen worker was the same species
found in stool samples of the patients.
Please discuss these findings and identify the kitchen worker possibly responsible for the outbreak.
RUBRIC
Introduction
Chronic disease epidemiology addresses the etiology, prevention, distribution,
natural history, and treatment outcomes of chronic health disorders, including cancer
(particularly breast, colon, lung, prostate, ovary and pancreas), cardiovascular disease,
diabetes, gastrointestinal and pulmonary disease, and obesity. Many of the greatest
population health problems are in chronic disease and include large contributions from a
lack of prioritization and appropriate implementation regarding known hazards and effective
prevention. Robust quantitative evidence is critical to addressing these.
Lesson Proper:
Chronic diseases are defined broadly as conditions that last 1 year or more and
require ongoing medical attention or limit activities of daily living or both. According to the
World Health Organization, chronic diseases are diseases of long duration and generally
slow progression. So once someone has the condition they will have to manage and control
it. These are some of the major features of chronic diseases:
➢ They have an uncertain etiology: no direct causes have been identified for the
emergence of these diseases; studies show relationships between the
emergence of the disease and exposure to certain factors referred to as ‘risk
factors’.
➢ A cluster of factors, such as the ones mentioned above, are shown to have a
strong predictive relationship to these diseases, even if exposure to these factors
does not necessarily lead to such disease; for the chronic diseases major risk
factors relate to life conditions and practices.
➢ They have multiple risk factors: unlike most infectious diseases, they result from
exposure to several risk factors;
➢ They have a long latency period: the disease proceeds over a long course of
time without symptoms;
➢ They show a prolonged course of illness;
➢ They are generally non-contagious in origin;
➢ They result in functional impairment or disability;
➢ They are incurable;
➢ They require a long-term and systematic approach to treatment.
When more than one chronic condition occurs at the same time, the picture gets
more complicated. One in three adults worldwide has multiple chronic conditions:
cardiovascular disease alongside diabetes, depression as well as cancer, or a
combination of three, four, or even five or six diseases at the same time.
Chronic NCDs have been said to place a burden on individuals, families, health
systems and the economy, brought about by loss of independence, loss of income,
increased budget for medication and loss of economically active workforce.
Chronic diseases account for 6 of the top 7 causes of death according to the Centers for
Disease Control and Prevention. From cardiovascular disease to diabetes to cancer and
pulmonary disease, chronic diseases claim far more lives than such infectious diseases as
pneumonia and influenza. And as the population ages, the burden of these diseases is only
likely to increase. The same is true overseas and, increasingly, in developing countries.
These trends have spurred myriad efforts at international, national, state, and local
levels and in academia and industry – all of which call for interdisciplinary researchers and
practitioners with the ability to identify risk factors, understand the social context, and
develop prevention strategies for the most significant chronic diseases.
Assessment:
Offline Learners: Do the activity below and submit it to your instructor via email.
Online Learners: Do the activity posted in Google Classroom
QUIZ: The quiz will be given through Quizziz or Google Forms; for learners with weak connectivity,
it will be sent through email.
References:
Columbia Mailman School of Public Health. (2020). Chronic Disease Epidemiology. Retrieved
from https://www.publichealth.columbia.edu/academics/departments/epidemiology/research/chronic-disease-epidemiology
on August 7, 2020
DeSalvo KB, Wang YC, Harris A, Auerbach J, Koo D, O’Carroll P. Public Health 3.0: a call to
action for public health to meet the challenges of the 21st century. Prev Chronic Dis
2017;14:E78.
Huang, F. and Bayona, M. (2004). Disease Outbreak Investigation. The Young Epidemiology
Scholars Program (YES), supported by the Robert Wood Johnson Foundation and administered by the
College Board. Retrieved from
https://secure-media.collegeboard.org/digitalservices/pdf/yes/disease_outbreak.pdf on August 7, 2020
School of Public Health University of the Western Cape (n.d). Epidemiology and Control of Non-
communicable Diseases – Unit 1. Retrieved from https://www.google.com/url?sa
=t&rct=j&q=&esrc=s&source=web&cd=&cad=rja&uact=8&ved=2ahUKEwj70OKNwojrAhV
syIsBHf2sDwUQFjADegQIAhAB&url=https%3A%2F%2Fwww.uwc.ac.za%2FFaculties%2
FCHS%2Fsoph%2FDocuments%2FSOPH%2520UWC-%2520 Epidemiology%2520File
%25202%2520Unit%25201(JB).doc&usg=AOvVaw1Wb56ogz-d9-BJ1MnzGNLM on
August 7, 2020
Waxman, A. (2020). This is the biggest challenge to our health. World Economic Forum.
Retrieved from https://www.weforum.org/agenda/2017/12/healthcare-future-multiple-chronic-disease-ncd/
on August 7, 2020
I. BACKGROUND
Chronic disease epidemiology addresses the etiology, prevention, distribution, natural
history, and treatment outcomes of chronic health disorders, including cancer (particularly breast, colon,
lung, prostate, ovary and pancreas), cardiovascular disease, diabetes, gastrointestinal and pulmonary
disease, and obesity. Many of the greatest population health problems are in chronic disease and
include large contributions from a lack of prioritization and appropriate implementation regarding
known hazards and effective prevention. Robust quantitative evidence is critical to addressing these.
II. OBJECTIVES
At the end of the lesson you are expected to:
1. Explain the terms ‘chronic diseases’ and ‘chronic non-communicable diseases’.
2. Describe the extent of the problem.
3. Explain why chronic diseases are a concern.
4. Understand the global and local burden of chronic diseases.
5. Understand basic concepts in chronic disease epidemiology.
6. Understand the implication of these chronic diseases in relation to health and development at
the global, country and family level
III. MATERIALS
Writing & Drawing materials
References
ACTIVITY
- Complete the following tasks
Now that you have an overview of chronic diseases, refer to the NCD country profile 2019
presented by WHO to answer the following questions:
• What chronic NCDs are prevalent in the world, Philippines and your own province?
• How does your country compare to other countries within the same income group?
Bearing in mind the strain on health services brought about by the burden of chronic diseases,
explain the South African health service status within a developing country, and highlight how you
see health systems accommodating people from different socio-economic statuses (especially the
poor).
Look for various definitions of chronic diseases. Based on your research present a picture
of the nature of chronic disease. What then might be the implications of chronic disease?
• What costs or losses might a chronic disease predispose one to? Think of costs,
or the burden of suffering that chronic disease presents individuals, families and the
society at large. You may categorise your response according to these affected
populations.
Introduction
Clinical epidemiology is the study of the patterns, causes, and effects of health and
disease in patient populations and the relationships between exposures or treatments
and health outcomes.
Lesson Proper:
In 1938, Paul used the term clinical epidemiology for the first time and defined it
as a new basic science for preventive medicine, but Paul’s description does not entirely
cover the modern description of clinical epidemiology, whose concepts have been
developed since the mid-1960s, in particular by Sackett, Feinstein, the Fletchers, and
Weiss. Clinical epidemiology interfaces with many other areas. Thus, practical
application of clinical epidemiology is a key part of evidence-based medicine and clinical
decision making. In recent years, clinical epidemiology has become important for the
health care system because of the need for assessments in the areas of quality of care,
patient safety, health economics, and use of resources, all of which are based on clinical
epidemiology thinking. Furthermore, clinical epidemiology supplies data and evidence
needed in organization and planning of the health care system. Biostatistics is an
important basic tool for clinical epidemiology.
Synthesis:
Clinical Epidemiology is primarily focused on research on clinical questions, and on
the application of epidemiological principles and questions relating to patients and clinical
care in terms of prevention, diagnosis, prognosis, and treatment.
Assessment:
Offline Learners: Do the activity below and submit it to your instructor via email.
Online Learners: Do the activity posted in Google Classroom
QUIZ: The quiz will be given through Quizziz or Google Forms; for learners with weak connectivity,
it will be sent through email.
RUBRIC
References:
Fletcher RH, Fletcher SW. Clinical Epidemiology. The Essentials. 4th ed. Philadelphia, PA:
Lippincott Williams and Wilkins; 2005.
Feinstein AR. Clinical Epidemiology. The Architecture of Clinical Research. Philadelphia, PA:
W.B. Saunders Company; 1985
Oregon Health and Science University (n.d). Clinical Epidemiology Research. Department of
Medical Informatics and Clinical Epidemiology. Retrieved from
https://www.ohsu.edu/school-of-medicine/medical-informatics-and-clinical-
epidemiology/clinical-epidemiology-research on August 7, 2020
Sackett DL, Straus SE, Richardson WS, Rosenberg W, Haynes RB. (2000) Evidence-based
Medicine. How to Practice and Teach EBM. 2nd ed. Edinburgh, UK: Churchill Livingstone.
Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical Epidemiology. A Basic Science for
Clinical Medicine. 2nd ed. Boston, MA: Little, Brown and Company; 1991
Sorensen, H. (2009). Clinical Epidemiology – a fast new way to publish important research.
Dovepress. Retrieved from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2943161/
on August 7, 2020
Weiss NS. (2006) Clinical Epidemiology. The Study of the Outcome of Illness. 3rd ed. Oxford,
UK: Oxford University Press
I. BACKGROUND
Clinical epidemiology is the study of the patterns, causes, and effects of health and
disease in patient populations and the relationships between exposures or treatments
and health outcomes.
II. OBJECTIVES
At the end of the lesson you are expected to:
1. Apply and interpret measures of disease occurrence and correlates in populations
III. MATERIALS
Writing & Drawing materials
References
IV. ACTIVITY
1. Look for a research study on clinical epidemiology
2. Present it through a PowerPoint presentation using the IMRAD format
The learner’s feedback is vital to us. Taking into account your assessment and impression will
help us enrich the content and enhance the quality of your learning engagement with us.
From this view, we would appreciate it if you could spend some time completing this evaluation by
checking the column you think is appropriate and then providing a qualitative response to the
questions raised in this form.
The questionnaire is anonymous and though your participation is voluntary, your utmost
cooperation is encouraged.
Once completed, the results of these questionnaires will be analysed and an overview compiled,
which will be reported to the next cohort of students in the module handbook. The overview will
also be used to inform discussion at the programme team conference.
Thank you.