Probability Concept - Topic 4


Probability concept

Kurnia Wahyudi

Department of Public Health
Faculty of Medicine Universitas Padjadjaran
2023 - 2024
Outline

• Probability
• Random variable
• Discrete Probability Distribution
• Continuous Probability Distribution
Recommended reference

• Rosner, B. (2015). Fundamentals of biostatistics (8th ed.).
Boston, MA: Cengage Learning.
• White, S. E. (2020). Basic and clinical biostatistics (5th
ed.). New York: Lange/McGraw-Hill.
Probability – Introduction

• Understanding probability is essential in calculating and
interpreting p-values in the statistical tests of subsequent
topics.
• It also permits the discussion of sensitivity, specificity,
and predictive values of screening tests.
Probability – Definition

• The sample space is the set of all possible outcomes.
• In referring to probabilities of events, an event is any set
of outcomes of interest.
• The probability of an event is the relative frequency of
this set of outcomes over an indefinitely large (or
infinite) number of trials.
Probability – Empirical vs theoretical

• In real life, experiments cannot be performed an infinite
number of times. Instead, probabilities of events are
estimated from the empirical probabilities obtained from
large samples.
• In other instances, theoretical probability models are
constructed from which probabilities of many different
kinds of events can be computed.
• The probability of developing stomach cancer over a 1-year
period in 45- to 49-year-old women, based on SEER Tumor
Registry data from 2002 to 2006, is 3.7 per 100,000, for
instance.
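As a rough illustration of how an empirical probability approaches the theoretical one when the number of trials grows, the sketch below simulates rolls of a fair die. The event (rolling a six), the seed, and the trial counts are assumptions chosen for illustration and do not come from the lecture.

```python
import random

def empirical_probability(n_trials, seed=2023):
    """Relative frequency of rolling a six in n_trials simulated fair-die rolls."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(1 for _ in range(n_trials) if rng.randint(1, 6) == 6)
    return hits / n_trials

theoretical = 1 / 6  # theoretical probability of a six on a fair die

# The relative frequency tends to settle near 1/6 as n grows.
for n in (100, 10_000, 100_000):
    print(n, round(empirical_probability(n), 4), round(theoretical, 4))
```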
Probability – Basic properties

• The probability of an event E, denoted by Pr(E), always
satisfies 0 ≤ Pr(E) ≤ 1
• If outcomes A and B are two events that cannot both
happen at the same time (mutually exclusive), then:
Pr(A or B occurs) = Pr(A) + Pr(B)
Probability – Notation

• The multiplication law and addition law of probability
will not be discussed
• Interested readers can find more in the references
Probability – Conditional probability

• The quantity Pr(A ∩ B)/Pr(A) is defined as the
conditional probability of B given A, which is written
Pr(B|A).
• Example:
• Let A = {mammogram+}, B = {breast cancer}, then Pr(B|A) is the
probability of breast cancer (B) given that the mammogram is
positive (A).
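A minimal worked example of the definition Pr(B|A) = Pr(A ∩ B)/Pr(A), using the mammogram example above. All counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Hypothetical screening counts (illustrative only, not real data)
n_total = 10_000      # women screened
n_a = 1_000           # A: mammogram positive
n_a_and_b = 90        # A and B: mammogram positive AND breast cancer

pr_a = n_a / n_total                 # Pr(A)
pr_a_and_b = n_a_and_b / n_total     # Pr(A ∩ B)
pr_b_given_a = pr_a_and_b / pr_a     # Pr(B|A) = Pr(A ∩ B) / Pr(A)

print(f"Pr(B|A) = {pr_b_given_a:.3f}")
```

With these assumed counts, about 9 in 100 women with a positive mammogram have breast cancer.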
Probability – Application

• Epidemiology
• The prevalence of a disease is the probability of currently having
the disease regardless of the duration of time one has had the
disease. Prevalence is obtained by dividing the number of
people who currently have the disease by the number of people
(population at risk) in the study population.
• The cumulative incidence of a disease is the probability that a
person with no prior disease will develop a new case of the
disease over some specified time period.
• Odds = Pr(A) / (1 − Pr(A)) = Pr(A) / Pr(Ā)
• Survival curve i.e. Kaplan-Meier curve
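The relation between a probability such as prevalence and the corresponding odds, Pr(A)/(1 − Pr(A)), can be sketched as follows. The function name `odds` and the case counts are hypothetical.

```python
def odds(p):
    """Convert a probability Pr(A) into odds Pr(A) / (1 - Pr(A))."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)

# Hypothetical study population: 50 current cases among 1,000 people
prevalence = 50 / 1000
print(f"prevalence = {prevalence:.3f}, odds = {odds(prevalence):.4f}")
```

For a rare disease the odds and the probability are nearly equal, which is visible in the output.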
Probability – Application

• Hypothesis testing, i.e. the p-value method based on a
certain probability distribution (the p-value is the tail
area under the curve beyond the observed test statistic)
• Z test → based on Z distribution
• t test → based on t distribution
• F test → based on F distribution
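A minimal sketch of how a p-value is read off the Z (standard normal) distribution as a tail area. It uses `statistics.NormalDist` from the Python standard library; the function name `z_test_p_value` and the value 1.96 are illustrative assumptions.

```python
from statistics import NormalDist

def z_test_p_value(z):
    """Two-sided p-value: area in both tails of N(0,1) beyond |z|."""
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return 2 * upper_tail

# An observed test statistic of 1.96 gives a two-sided p-value of about 0.05
print(round(z_test_p_value(1.96), 3))
```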
Random variable

• A random variable is a numeric function that assigns
numeric values to different events in a sample space
• During discussion of topic 1, the concept of variable was based
on observation of study samples

• A random variable for which there exists a discrete set of
numeric values is a discrete random variable
• A random variable whose possible values cannot be
enumerated is a continuous random variable
Random variable – example

1. Let X be the random variable that represents the
number of episodes of otitis media in the first 2 years of
life. Then X is a discrete random variable, which takes
on the values 0, 1, 2, and so on.
2. Let X be a blood-pressure measurement. Assume there is
no measurement error in the sphygmomanometer (the device
that measures blood pressure); then X can take on a
continuum of possible values and is a continuous random
variable.
Discrete Probability Distribution

• The values taken by a discrete random variable and its
associated probabilities can be expressed by a rule or
relationship called a probability-mass function (pmf).
• A probability-mass function is a mathematical
relationship, or rule, that assigns the probability to any
possible value r of a discrete random variable X, Pr(X = r).
• This assignment is made for all values r that have positive
probability. The probability-mass function is sometimes
also called a probability distribution.
Discrete Probability Distribution

• Measures of location and spread can be developed for a
random variable in much the same way as they were
developed for samples.
• The analog to the arithmetic mean x̄ is called the
expected value of a random variable, or population mean,
and is denoted by E(X) or µ.
• The analog to the sample variance (s²) for a random
variable is called the variance of the random variable, or
population variance, and is denoted by Var(X) or σ². The
standard deviation of a random variable X, denoted by
sd(X) or σ, is defined by the square root of its variance.
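The expected value, variance, and standard deviation of a discrete random variable can be computed directly from its probability-mass function, using E(X) = Σ r·Pr(X = r) and Var(X) = Σ (r − µ)²·Pr(X = r). The pmf below is hypothetical.

```python
import math

# Hypothetical pmf: Pr(X = r) for r = 0, 1, 2 (probabilities sum to 1)
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

# Expected value: each value weighted by its probability
mean = sum(r * p for r, p in pmf.items())

# Variance: expected squared distance from the mean
variance = sum((r - mean) ** 2 * p for r, p in pmf.items())

# Standard deviation: square root of the variance
sd = math.sqrt(variance)

print(f"E(X) = {mean:.2f}, Var(X) = {variance:.2f}, sd(X) = {sd:.2f}")
```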
Discrete Probability Distribution

• How to construct a probability-mass function?
• In some instances previous data can be obtained on the same
type of random variable being studied, and the probability-mass
function can be computed from these data.
• In other instances previous data may not be available, but the
probability-mass function from some well-known distribution
can be used to see how well it fits actual sample data (e.g.
the binomial and Poisson distributions)
Discrete Probability Distribution -
Binomial distribution
• Number of successes in n statistically independent trials,
where the probability of success on each trial is p
(constant on each trial)
• Finite number of trials, and the number of events ≤ n
Discrete Probability Distribution -
Binomial distribution
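• Mostly expressed as a proportion (number of successes
divided by n), therefore
• Expected value = p
• Variance = p(1 – p)/n
• For the number of successes itself, the expected value is
np and the variance is np(1 – p)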
• Application:
• To model the inheritability of a particular trait in genetics
• To estimate the occurrence of a specific reaction
• To estimate death of a cancer cell in an in vitro test of a new
chemotherapeutic agent
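A minimal sketch of the binomial probability-mass function, Pr(X = k) = C(n, k) p^k (1 − p)^(n − k), together with a numerical check of the mean and variance of the sample proportion X/n. The values n = 10 and p = 0.3 are hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Pr(X = k): probability of k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # hypothetical number of trials and success probability
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance of the number of successes X
mean_count = sum(k * pr for k, pr in enumerate(probs))
var_count = sum((k - mean_count) ** 2 * pr for k, pr in enumerate(probs))

# Mean and variance of the proportion X / n
mean_prop = mean_count / n
var_prop = var_count / n**2

print(f"count: mean = {mean_count:.2f}, variance = {var_count:.2f}")
print(f"proportion: mean = {mean_prop:.2f}, variance = {var_prop:.3f}")
```

The proportion has mean p and variance p(1 − p)/n, while the count has mean np and variance np(1 − p).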
Discrete Probability Distribution –
Poisson distribution
• The probability of k events occurring in a time period t
for a Poisson random variable with parameter λ
(expected number of events per unit time) is
Pr(X = k) = e^(−µ) µ^k / k!, k = 0, 1, 2, ..., where µ = λt
• Infinite number of trials and the number of events can be
indefinitely large, although the probability of k events
becomes very small as k gets large
• Expected value = µ = λt (expected number of events over
time period t)
• Variance = µ = λt
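A minimal sketch of the Poisson probability-mass function, Pr(X = k) = e^(−µ) µ^k / k! with µ = λt, where λ is the expected number of events per unit time. The rate (2 events per year) and period (1.5 years) are hypothetical.

```python
from math import exp, factorial

def poisson_pmf(k, lam, t=1.0):
    """Pr(X = k) for a Poisson variable with rate lam over time period t."""
    mu = lam * t  # expected number of events over the period
    return exp(-mu) * mu**k / factorial(k)

lam, t = 2.0, 1.5  # hypothetical: 2 events per year, observed for 1.5 years
probs = [poisson_pmf(k, lam, t) for k in range(60)]  # tail beyond 59 is negligible

mean = sum(k * pr for k, pr in enumerate(probs))
variance = sum((k - mean) ** 2 * pr for k, pr in enumerate(probs))
print(f"mean = {mean:.3f}, variance = {variance:.3f}")
```

Both the mean and the variance come out equal to µ = λt, which is 3 for these assumed values.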
Discrete Probability Distribution –
Poisson distribution
• Application:
• Number of deaths attributable to polio during the years 1968 –
1977
• To plan the number of beds needed for a hospital’s intensive care
unit
• To plan the number of ambulances needed on call
• To model the number of cells in a given volume of fluid
• To model the number of bacterial colonies growing in a certain
amount of medium
• To model emission of radioactive particles from a specified
amount of radioactive material
Continuous Probability Distribution

• Consider the distribution of diastolic blood-pressure (DBP)
measurements in 35- to 44-year-old men.
• In actual practice, this distribution is discrete because only a finite
number of blood-pressure values are possible since the
measurement is only accurate to within 2 mm Hg.
• However, assume there is no measurement error and hence the
random variable can take on a continuum of possible values.
• One consequence of this assumption is that the probabilities of
specific blood-pressure measurement values such as 102.3 are 0, and
thus, the concept of a probability-mass function cannot be used.
Continuous Probability Distribution

• The probability-density function (pdf) of the continuous
random variable X is a function such that the area under
the density-function curve between any two points a and
b is equal to the probability that the random variable X
falls between a and b. Thus, the total area under the
density-function curve over the entire range of possible
values for the random variable is 1.
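The idea that probability equals area under the density curve can be checked numerically. The sketch below assumes, purely for illustration, that DBP follows a normal distribution with mean 80 and standard deviation 12 mm Hg; the helper `area_under_pdf` is a hypothetical name and uses a simple trapezoidal rule.

```python
from statistics import NormalDist

# Assumed distribution for illustration only: DBP ~ N(80, 12^2)
dbp = NormalDist(mu=80, sigma=12)

def area_under_pdf(dist, a, b, steps=10_000):
    """Trapezoidal approximation of the area under dist.pdf between a and b."""
    h = (b - a) / steps
    total = 0.5 * (dist.pdf(a) + dist.pdf(b))
    total += sum(dist.pdf(a + i * h) for i in range(1, steps))
    return total * h

# Pr(90 <= X <= 100) as an area, compared with the cdf difference
area = area_under_pdf(dbp, 90, 100)
exact = dbp.cdf(100) - dbp.cdf(90)

# Area over (practically) the whole range of values is 1
total_area = area_under_pdf(dbp, 80 - 10 * 12, 80 + 10 * 12)

print(round(area, 4), round(exact, 4), round(total_area, 4))
```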
Continuous Probability Distribution

• Areas A, B, and C correspond to the probabilities of being
mildly hypertensive, moderately hypertensive, and
severely hypertensive, respectively.
• Furthermore, the most likely range of values for DBP
occurs around 80 mmHg, with the values becoming
increasingly less likely as we move farther away from 80.
• However, not all continuous random variables have
symmetric bell-shaped distributions
Continuous Probability Distribution –
Normal Distribution
• The normal distribution is the most widely used
continuous distribution.
• It is also frequently called the Gaussian distribution, after
the well-known mathematician Karl Friedrich Gauss.
• Many other distributions that are not themselves normal
can be made approximately normal by transforming the
data onto a different scale.
• A normal distribution with mean µ and variance σ² will
generally be referred to as an N(µ,σ²) distribution.
• Expected value = µ
• Expected variance = σ²
Continuous Probability Distribution –
Standard Normal Distribution
• A normal distribution with mean 0 and variance 1 is
called a standard, or unit, normal distribution.
• This distribution is also called an N(0,1) distribution.
• Expected value = 0
• Expected variance = 1
Continuous Probability Distribution –
Normal Distribution
• Application:
• The normal distribution is vital to statistical work, and most
estimation procedures and hypothesis tests that we will study
assume the random variable being considered has an underlying
normal distribution.
• As an assumption in the t test or F test (e.g. one-way ANOVA)
• Linear regression: the error terms or the residuals are assumed to
follow N(0,σ²)
Continuous Probability Distribution –
Standard Normal Distribution
• Application of standard normal distribution:
• Approximating distribution to other distributions
• Binomial distribution → npq ≥ 5
• Poisson distribution → µ ≥ 10
• Hypothesis testing
• Comparing means of two groups with known variances
• Comparing proportion of two groups
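The normal approximation to the binomial distribution mentioned above (npq ≥ 5) can be compared with the exact binomial probability. The values of n, p, and k are hypothetical, and the continuity correction (k + 0.5) is an assumption added here as a common refinement; it is not stated in the slides.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact Pr(X <= k) for a binomial random variable."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation to Pr(X <= k), with a continuity correction."""
    mu = n * p                      # binomial mean
    sigma = sqrt(n * p * (1 - p))   # binomial standard deviation
    return NormalDist(mu, sigma).cdf(k + 0.5)

n, p, k = 100, 0.3, 25  # hypothetical values; npq = 21, so npq >= 5 holds
exact = binomial_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)
print(round(exact, 4), round(approx, 4))
```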
Continuous Probability Distribution –
Standard Normal Distribution
• Application of standard normal distribution:
• Confidence intervals of proportion
• One sample
• Proportion difference
• Sample size calculation
• Descriptive study i.e. Z_{1−α/2}
• Analytical study i.e. Z_{1−α/2} for two-sided hypothesis, Z_{1−α} for
one-sided hypothesis, and Z_{1−β} for selected power
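The Z values used in confidence intervals and sample size calculations are quantiles of the standard normal distribution. A minimal sketch using `statistics.NormalDist.inv_cdf` follows; α = 0.05 and β = 0.20 (80% power) are assumed, commonly used choices rather than values taken from the slides.

```python
from statistics import NormalDist

alpha = 0.05  # assumed significance level
beta = 0.20   # assumed type II error rate (power = 1 - beta = 0.80)

std_normal = NormalDist()  # N(0, 1)

z_two_sided = std_normal.inv_cdf(1 - alpha / 2)  # Z_{1-alpha/2}
z_one_sided = std_normal.inv_cdf(1 - alpha)      # Z_{1-alpha}
z_power = std_normal.inv_cdf(1 - beta)           # Z_{1-beta}

print(round(z_two_sided, 3), round(z_one_sided, 3), round(z_power, 3))
```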
Continuous Probability Distribution –
Other Distribution
• t distribution
• For testing H0: µ1 = µ2
• For testing H0: µ1 − µ2 = 0
• For testing H0: ρ = 0 in correlation
• For testing H0: βp = 0 in linear regression
• F distribution
• For testing H0: µ1 = ⋯ = µk in one-way ANOVA
• For testing H0: β1 = ⋯ = βp = 0 in multiple linear regression
