
Unit – II: Probability Distributions

Dr. Anil B. Gavade


May 23, 2024

Introduction to Probability Distributions


Probability theory is the mathematical framework that enables us to analyze
uncertainty and randomness in various phenomena. Probability distributions
are fundamental in this theory, as they provide a structured way to describe
the likelihood of different outcomes occurring in random experiments. In this
section, we will delve deeper into the concept of probability distributions and
their significance in statistical analysis.

0.1 Understanding Uncertainty


In real-world scenarios, uncertainty is ubiquitous. Whether it’s predicting the
outcome of an election, assessing the risk of a financial investment, or estimating
the likelihood of a medical diagnosis, uncertainty surrounds us. Probability
theory offers a systematic approach to quantify and manage this uncertainty,
providing a foundation for making informed decisions in the face of randomness.

0.2 The Role of Probability Distributions


Probability distributions serve as the building blocks of probability theory. They
characterize the probabilities associated with various outcomes of a random
experiment, providing insights into the likelihood of different events occurring.
By understanding the properties of different probability distributions, we can
model and analyze complex phenomena with uncertainty.

0.3 Types of Probability Distributions


• Discrete Probability Distributions: These distributions describe the
probabilities associated with discrete, countable outcomes. Examples in-
clude the outcomes of rolling a die, flipping a coin, or counting the number
of defective items in a production batch.
• Continuous Probability Distributions: Continuous distributions, on
the other hand, describe the probabilities associated with continuous, un-
countable outcomes. They are commonly encountered in scenarios involv-
ing measurements, such as height, weight, or temperature.

0.4 Properties of Probability Distributions
Probability distributions possess several key properties that are essential for
understanding and analyzing random phenomena.

• Probability Mass Function (PMF): For discrete distributions, the
probability mass function specifies the probability of each possible
outcome. It assigns a probability to each value that the random variable
can take.
• Probability Density Function (PDF): Continuous distributions are
characterized by probability density functions, which represent the density
of probabilities over a continuous range of values. The area under the
PDF curve within a certain interval corresponds to the probability of the
random variable falling within that interval.
• Expected Value (Mean): The expected value, or mean, of a probability
distribution represents the average value of the random variable over many
trials. It provides a measure of central tendency and is calculated as the
weighted sum of all possible outcomes, with each outcome weighted by its
probability.
• Variance and Standard Deviation: Variance measures the spread or
dispersion of a probability distribution around its mean, while the stan-
dard deviation provides a measure of the average deviation from the mean.
They quantify the degree of uncertainty or variability associated with the
random variable.

Discrete Random Variables: Explained


In probability theory and statistics, a random variable is a variable whose pos-
sible values are outcomes of a random phenomenon. A discrete random variable
is one that can take on only a countable number of distinct values. These values
are typically integers or a finite set of values.

Characteristics of Discrete Random Variables:


1. Countable Values: Discrete random variables have a finite or countably
infinite number of possible values. For example, the number of heads
obtained when flipping a coin multiple times can only be 0, 1, 2, ..., up to
the total number of flips.
2. Probability Mass Function (PMF): Discrete random variables are
characterized by a probability mass function, which assigns probabilities
to each possible value the random variable can take. The PMF provides
the probability of each outcome occurring.

3. Specific Probabilities: Each value of a discrete random variable has a
specific probability associated with it. These probabilities sum up to 1,
ensuring that one of the possible outcomes must occur.
4. Examples of Discrete Random Variables: Common examples in-
clude:
• The number of students in a classroom.
• The number of defects in a batch of products.
• The number of goals scored in a soccer match.
• The number of cars passing through a toll booth in an hour.

Probability Mass Function (PMF):


The PMF of a discrete random variable X, denoted by f (x), specifies the proba-
bility that X takes on a particular value x. Mathematically, it can be represented
as:

f (x) = P (X = x)
Where:
• x represents a specific value of the random variable X.
• f(x) is the probability mass function.
• P(X = x) denotes the probability that the random variable X takes on
the value x.

The PMF satisfies two key properties:
1. 0 ≤ f(x) ≤ 1 for all values of x.
2. ∑_{all x} f(x) = 1, where the summation is taken over all possible
values of x.

Example:
Consider a fair six-sided die. Let X represent the outcome of a single roll of the
die. The possible values of X are 1, 2, 3, 4, 5, and 6. Since the die is fair, each
outcome has an equal probability of 1/6. The PMF for X is:

f(x) = 1/6  if x = 1, 2, 3, 4, 5, 6
f(x) = 0    otherwise
This PMF satisfies both properties: each probability is between 0 and 1, and
the sum of all probabilities is 1.
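Both PMF properties can be checked directly in a short Python sketch; exact fractions avoid any floating-point rounding (the function name `pmf` is just an illustrative choice):

```python
from fractions import Fraction

# PMF of a fair six-sided die: f(x) = 1/6 for x in {1,...,6}, 0 otherwise.
def pmf(x):
    return Fraction(1, 6) if x in {1, 2, 3, 4, 5, 6} else Fraction(0)

# Property 1: every probability lies in [0, 1].
assert all(0 <= pmf(x) <= 1 for x in range(1, 7))

# Property 2: the probabilities sum to 1 over all possible values.
total = sum(pmf(x) for x in range(1, 7))
print(total)  # 1
```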

1 Fundamental Rules of Probability
Probability theory relies on a set of fundamental rules that provide the founda-
tion for understanding and calculating probabilities. These rules ensure consis-
tency and coherence in probabilistic reasoning, guiding us in making informed
decisions based on uncertain outcomes. Let’s explore each fundamental rule in
depth:

1. Non-negativity Axiom:
The non-negativity axiom states that probabilities must be non-negative.
In other words, the probability of an event occurring cannot be negative.
Mathematically, for any event E, we have:

P (E) ≥ 0

where P (E) represents the probability of event E.


Explanation: This rule reflects the intuitive notion that probabilities are
measures of likelihood and, as such, cannot be negative. If an event has
a positive probability, it means that it is possible for that event to
occur, even if that probability is very small.
2. Normalization Axiom:
The normalization axiom, also known as the sum rule or the total proba-
bility rule, states that the sum of probabilities over all possible outcomes
in a sample space equals 1. Mathematically, if S is the sample space
containing all possible outcomes, then:
∑_E P(E) = 1

where the sum is taken over all events E in the sample space S.
Explanation: This rule ensures that the total probability space is fully
accounted for, leaving no room for uncertainty. It establishes a basis
for interpreting probabilities as proportions or fractions of certainty, with
1 representing complete certainty (i.e., certainty that something in the
sample space will occur).
3. Addition Rule for Disjoint Events:
The addition rule for disjoint events, also known as the sum rule, states
that if A and B are disjoint events (i.e., they cannot both occur simulta-
neously), then the probability of either event occurring is the sum of their
individual probabilities. Mathematically, for disjoint events A and B, we
have:

P (A ∪ B) = P (A) + P (B)

Explanation: This rule captures the idea that when events are mutually
exclusive (i.e., if one event occurs, the other cannot), the probability of
their union is simply the sum of their individual probabilities. It allows us
to calculate the probability of at least one of the events occurring without
double-counting the overlap between them.

These fundamental rules provide a solid framework for reasoning about un-
certainty and making probabilistic predictions. Understanding and applying
these rules correctly are essential for proper interpretation and manipulation of
probabilities in various real-world scenarios, ranging from gambling and finance
to scientific research and decision-making processes.
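As a minimal sketch, all three rules can be verified on the sample space of one roll of a fair die (the helper `prob` and the events A and B below are illustrative choices, not part of the rules themselves):

```python
from fractions import Fraction

# Sample space for one roll of a fair die, with equal probabilities.
sample_space = {outcome: Fraction(1, 6) for outcome in range(1, 7)}

def prob(event):
    """Probability of an event, represented as a set of outcomes."""
    return sum(sample_space[o] for o in event)

A = {1, 2}   # roll a 1 or a 2
B = {5, 6}   # roll a 5 or a 6 (disjoint from A)

assert prob(A) >= 0                        # non-negativity axiom
assert prob(set(sample_space)) == 1        # normalization axiom
assert prob(A | B) == prob(A) + prob(B)    # addition rule for disjoint events
print(prob(A | B))  # 2/3
```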

2 Bayes’ Rule: Explained in Depth


Bayes’ Theorem:
Bayes’ theorem, also known as Bayes’ rule or Bayes’ law, is a fundamental
concept in probability theory that allows us to update our beliefs or probabil-
ities based on new evidence. It provides a systematic way to incorporate new
information into our existing knowledge, enabling more accurate predictions and
decision-making in uncertain situations.
Understanding Bayes’ Theorem:
At its core, Bayes’ theorem relates the conditional probability of an event
given some evidence to the probability of that evidence given the event. Math-
ematically, it is expressed as:

P(A|B) = P(B|A) × P(A) / P(B)
Where:

• P(A|B) is the probability of event A occurring given that event B has
occurred. This is the posterior probability.
• P(B|A) is the probability of observing evidence B given that event A has
occurred. This is the likelihood.
• P(A) is the prior probability of event A, representing our initial belief
or knowledge about the likelihood of A before considering evidence B.
• P(B) is the total probability of observing evidence B, acting as a
normalization factor.
Explanation through Example:
Let’s illustrate Bayes’ theorem with a classic example known as the “medical
diagnosis problem”:
Suppose a certain medical test is 99% sensitive: it correctly detects the
disease in 99% of the people who have it. The disease affects 1% of the
population, and the test gives a false positive in 5% of healthy people.
Now, let A represent the event that a person has the disease, and B represent
the event that the medical test indicates the presence of the disease.

• Prior probability: P(A) = 0.01 (1% prevalence).
• Likelihood: P(B|A) = 0.99 (99% sensitivity).
• False positive rate: P(B|¬A) = 0.05.
• Total probability: P(B) = P(B|A) × P(A) + P(B|¬A) × P(¬A)

Since the test can either correctly indicate the disease given its presence
(P(B|A) × P(A)) or incorrectly indicate the disease given its absence
(P(B|¬A) × P(¬A)), we can calculate P(B):

P(B) = (0.99 × 0.01) + (0.05 × 0.99)
     = 0.0099 + 0.0495
     = 0.0594

Now, applying Bayes’ theorem:

P(A|B) = P(B|A) × P(A) / P(B) = 0.0099 / 0.0594 ≈ 0.1667

So, given that the test indicates the presence of the disease, the probability
that the person actually has the disease is only about 16.7%, far lower than
the test’s accuracy alone would suggest, because the disease is rare.
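Carrying out the same calculation in Python makes it easy to see how the low prevalence drags the posterior down (the rates are the ones stated in the example: 1% prevalence, 99% sensitivity, 5% false positives):

```python
# Bayes' rule for the medical-diagnosis example.
p_A = 0.01             # P(A): prior probability of having the disease
p_B_given_A = 0.99     # P(B|A): test positive given disease (sensitivity)
p_B_given_notA = 0.05  # P(B|~A): false-positive rate among healthy people

# Total probability of a positive test result.
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Posterior: probability of disease given a positive test.
posterior = p_B_given_A * p_A / p_B
print(round(posterior, 4))  # 0.1667
```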
Significance and Applications:
Bayes’ theorem has widespread applications across various fields, including
but not limited to:
• Medical diagnosis

• Spam filtering in emails


• Pattern recognition in machine learning
• Risk assessment in finance and insurance
• Fault diagnosis in engineering

By enabling the integration of new evidence with prior knowledge, Bayes’
theorem provides a robust framework for decision-making under uncertainty,
leading to more accurate predictions and informed choices.

3 Independence and Conditional Independence


3.1 Independence
Two events A and B are considered independent if the occurrence of one event
does not affect the occurrence of the other. Mathematically, two events are
independent if and only if the probability of their joint occurrence is equal to
the product of their individual probabilities:

P (A ∩ B) = P (A) × P (B)

In other words, the probability of A given B (or vice versa) is the same as
the probability of A without considering B, and vice versa.
Example: Consider two events: flipping a fair coin and rolling a fair six-
sided die. The outcome of the coin flip (heads or tails) is independent of the
outcome of the die roll (1, 2, 3, 4, 5, or 6). The probability of getting heads on
the coin flip is P(heads) = 0.5, and the probability of rolling a 4 on the die
is P(4) = 1/6. The joint probability of getting heads on the coin flip and
rolling a 4 on the die is P(heads ∩ 4) = P(heads) × P(4) = 0.5 × 1/6 = 1/12,
demonstrating independence.
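This check can be reproduced by enumerating the twelve equally likely (coin, die) pairs; exact fractions keep the arithmetic clean:

```python
from fractions import Fraction
from itertools import product

# Joint sample space of a fair coin flip and a fair die roll:
# 12 equally likely (coin, die) pairs, each with probability 1/12.
outcomes = list(product(["H", "T"], range(1, 7)))
p = Fraction(1, 12)

p_heads = sum(p for c, d in outcomes if c == "H")             # 1/2
p_four = sum(p for c, d in outcomes if d == 4)                # 1/6
p_joint = sum(p for c, d in outcomes if c == "H" and d == 4)  # 1/12

# Independence: P(heads ∩ 4) equals P(heads) * P(4).
assert p_joint == p_heads * p_four
print(p_joint)  # 1/12
```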

3.2 Conditional Independence


Conditional independence occurs when two events are independent of each other
given the occurrence of a third event. Formally, events A and B are conditionally
independent given event C if:

P (A ∩ B|C) = P (A|C) × P (B|C)


This implies that the knowledge of event C does not provide any additional
information about the relationship between events A and B. Even though A
and B may not be independent in general, they become independent when
conditioned on event C.
Example: Suppose we have three events: A represents drawing a red ball
from a bag, B represents drawing a blue ball, and C represents drawing a ball
from a specific compartment within the bag. If we know that A and B are
independent events (i.e., the bag contains equal numbers of red and blue balls),
and if event C specifies the compartment from which the ball is drawn (e.g., left
compartment or right compartment), then events A and B become conditionally
independent given C.
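A small enumeration makes the definition concrete. The toy joint distribution below uses made-up numbers chosen so that A and B are conditionally independent given C by construction, yet dependent marginally; none of it comes from the bag example above:

```python
from itertools import product

def p_joint(a, b, c):
    """Joint probability of binary events A, B and condition C."""
    p_c = 0.5                         # C is a fair coin
    p_hit = 0.9 if c == 1 else 0.1    # P(A=1|C=c) = P(B=1|C=c), assumed
    pa = p_hit if a == 1 else 1 - p_hit
    pb = p_hit if b == 1 else 1 - p_hit
    return p_c * pa * pb              # A and B independent given C, by design

def p(event):
    """Probability of an event, by enumerating all 8 (a, b, c) triples."""
    return sum(p_joint(a, b, c)
               for a, b, c in product([0, 1], repeat=3) if event(a, b, c))

# Conditional independence given C = 1: P(A,B|C) == P(A|C) * P(B|C).
p_c1 = p(lambda a, b, c: c == 1)
lhs = p(lambda a, b, c: a == 1 and b == 1 and c == 1) / p_c1
rhs = (p(lambda a, b, c: a == 1 and c == 1) / p_c1) * \
      (p(lambda a, b, c: b == 1 and c == 1) / p_c1)
assert abs(lhs - rhs) < 1e-12

# But marginally, A and B are NOT independent: 0.41 vs 0.25.
print(p(lambda a, b, c: a == 1 and b == 1),
      p(lambda a, b, c: a == 1) * p(lambda a, b, c: b == 1))
```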

3.3 Block Diagrams


Block diagrams are graphical representations used to illustrate the relationships
between events and conditions in probabilistic systems. In the context of inde-
pendence and conditional independence, block diagrams can visually depict how
events interact with each other and how conditional independence is determined
based on certain conditions.
Example Block Diagram:

          Condition C (Given)
           /             \
      Event A          Event B
This block diagram represents events A and B being conditioned on event
C. If A and B are conditionally independent given C, they are represented as
separate branches from C, indicating that the occurrence of C does not affect
the relationship between A and B.

4 Continuous Random Variables
Continuous random variables are variables that can take on any value within
a certain range, often representing measurements or quantities that can vary
continuously. Unlike discrete random variables, which can only assume specific,
distinct values, continuous random variables can take on an infinite number of
values within their defined intervals. Understanding continuous random vari-
ables is essential in various fields such as physics, engineering, economics, and
statistics.

4.1 Probability Density Function (PDF)


In the context of continuous random variables, probabilities are described using
probability density functions (PDFs). A PDF, denoted as f (x), represents the
probability density or likelihood of the random variable taking on a particular
value x within a given interval. The area under the curve of a PDF over a
specific interval corresponds to the probability of the random variable falling
within that interval.
Properties of PDFs:

1. Non-negativity: The PDF must be non-negative for all values of x, i.e.,
f(x) ≥ 0 for all x.
2. Normalization: The total area under the PDF curve must equal 1 over
the entire range of possible values of the random variable, i.e.,
∫_{−∞}^{∞} f(x) dx = 1.

Example: Let’s consider a continuous random variable X representing the
height of students in a class. Suppose the PDF of X is given by:

f(x) = (1 / (√(2π) σ)) e^{−(x−µ)² / (2σ²)}
Where:
• µ is the mean (expected value) of the distribution.
• σ is the standard deviation, representing the spread of the distribution.
This PDF represents the normal (or Gaussian) distribution, which is com-
monly used to model various natural phenomena.
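The normalization property can be checked numerically for this PDF. The sketch below uses the standard normal (µ = 0, σ = 1) and an arbitrary integration grid of ±8σ with step 0.001; both are illustrative choices:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """PDF of the normal distribution N(mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) \
        / (math.sqrt(2 * math.pi) * sigma)

# Riemann-sum approximation of the area under the curve over [-8, 8];
# the tails beyond 8 standard deviations are negligible.
dx = 0.001
area = sum(normal_pdf(-8 + i * dx) * dx for i in range(int(16 / dx)))
print(round(area, 6))  # ≈ 1.0
```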

4.2 Cumulative Distribution Function (CDF)


The cumulative distribution function (CDF), denoted as F (x), gives the prob-
ability that a continuous random variable X is less than or equal to a specified
value x. Mathematically, the CDF is defined as:
F(x) = ∫_{−∞}^{x} f(t) dt

Where:
• f (t) is the PDF of the continuous random variable X.
The CDF provides a convenient way to calculate probabilities for continuous
random variables and is particularly useful in statistical analysis and hypothesis
testing.
Example: For the normal distribution described earlier, the CDF can be
calculated by integrating the PDF from negative infinity to a specified value x:
F(x) = ∫_{−∞}^{x} (1 / (√(2π) σ)) e^{−(t−µ)² / (2σ²)} dt
This integral gives the probability that the random variable X is less than or
equal to x, providing valuable information about the distribution of the variable.
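In practice this integral is usually evaluated through the error function rather than by direct numerical integration; a small sketch using Python's math.erf:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of N(mu, sigma^2), via the closed form in terms of erf."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# By symmetry, P(X <= mu) is exactly 1/2.
print(normal_cdf(0.0))  # 0.5

# About 68% of the probability mass lies within one standard
# deviation of the mean.
print(round(normal_cdf(1) - normal_cdf(-1), 4))  # 0.6827
```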

4.3 Expectation and Variance


Similar to discrete random variables, continuous random variables also have
expectations and variances, which provide measures of central tendency and
dispersion, respectively.
• Expectation (Mean): The expectation (or mean) of a continuous random
variable X is calculated as:

µ = ∫_{−∞}^{∞} x · f(x) dx

• Variance: The variance of a continuous random variable X is calculated
as:

σ² = ∫_{−∞}^{∞} (x − µ)² · f(x) dx

These measures help characterize the central tendency and spread of the dis-
tribution, providing valuable insights into the behavior of the random variable.
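Both integrals can be approximated numerically. The sketch below uses a hypothetical normal height distribution (µ = 170 cm, σ = 10 cm, chosen purely for illustration) and recovers the mean and variance from the PDF alone:

```python
import math

def normal_pdf(x, mu, sigma):
    """PDF of N(mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) \
        / (math.sqrt(2 * math.pi) * sigma)

mu, sigma = 170.0, 10.0  # hypothetical height distribution (cm)

# Riemann-sum approximations of the expectation and variance
# integrals over mu +/- 8 sigma.
dx = 0.01
xs = [mu - 8 * sigma + i * dx for i in range(int(16 * sigma / dx))]

mean = sum(x * normal_pdf(x, mu, sigma) * dx for x in xs)
var = sum((x - mean) ** 2 * normal_pdf(x, mu, sigma) * dx for x in xs)

print(round(mean, 2), round(var, 2))  # ≈ 170.0  ≈ 100.0 (= sigma^2)
```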

4.4 Applications and Examples


Continuous random variables are prevalent in various fields, including:
• Physics: Modeling the position, velocity, or energy of particles.
• Finance: Modeling stock prices or interest rates.
• Engineering: Analyzing the strength or durability of materials.
• Medicine: Studying patient vital signs or medical test results.
Understanding continuous random variables and their associated distribu-
tions is crucial for making informed decisions and predictions in these and many
other areas of study and application.

5 Quantiles, Mean, and Variance
Quantiles
Quantiles are values that divide a probability distribution into equally-sized
intervals. They provide insight into the spread and distribution of data. The
most commonly used quantiles include the median (50th percentile), quartiles
(25th and 75th percentiles), and percentiles (any value between 0 and 100).

• Median: The median is the value that separates the lower and upper
halves of a dataset. It is the 50th percentile, meaning that 50% of the data
lies below it and 50% lies above it. The median is resistant to outliers and
provides a robust measure of central tendency.
• Quartiles: Quartiles divide a dataset into four equal parts. The first
quartile (Q1) is the value below which 25% of the data lies, the second
quartile is the median (Q2), and the third quartile (Q3) is the value below
which 75% of the data lies. Quartiles help assess the spread and skewness
of the data.
• Percentiles: Percentiles generalize the concept of quartiles to divide a
dataset into hundred equal parts. For example, the 90th percentile rep-
resents the value below which 90% of the data lies. Percentiles are useful
for comparing individual data points to the overall distribution.
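Python's standard library computes these directly. The sample below is made up, with 80 as a deliberate outlier; note that the median is unaffected by it:

```python
import statistics

# Small made-up sample; 80 is an outlier.
data = [12, 15, 17, 19, 21, 24, 26, 30, 35, 80]

# Median: the 50th percentile, robust to the outlier.
print(statistics.median(data))  # 22.5

# Quartiles Q1, Q2, Q3 (default 'exclusive' interpolation method).
q1, q2, q3 = statistics.quantiles(data, n=4)
print(q1, q2, q3)  # 16.5 22.5 31.25
```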

Mean (Expected Value)


The mean, also known as the expected value, is a measure of central tendency
that represents the average value of a random variable. It provides insight into
the “typical” value of the variable.
For a discrete random variable X with probability mass function f (x), the
mean µ is calculated as:
µ = ∑_x x · f(x)

For a continuous random variable X with probability density function f(x),
the mean µ is calculated using integration:

µ = ∫_{−∞}^{∞} x · f(x) dx

The mean is sensitive to extreme values (outliers) and may not accurately
represent the central tendency if the distribution is skewed.
Variance
Variance measures the dispersion or spread of a probability distribution. It
quantifies how much the values of a random variable deviate from the mean.
For a discrete random variable X with probability mass function f (x), the
variance σ 2 is calculated as:

σ² = ∑_x (x − µ)² · f(x)

For a continuous random variable X with probability density function f(x),
the variance σ² is calculated using integration:

σ² = ∫_{−∞}^{∞} (x − µ)² · f(x) dx

The square root of the variance, known as the standard deviation (σ), is
often used as a measure of spread, providing the same unit of measurement as
the random variable.
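As a worked example, the discrete formulas applied to one roll of a fair six-sided die, using exact fractions:

```python
from fractions import Fraction

# Fair six-sided die: f(x) = 1/6 for each value in 1..6.
values = range(1, 7)
f = Fraction(1, 6)

mu = sum(x * f for x in values)              # mean: sum of x * f(x)
var = sum((x - mu) ** 2 * f for x in values) # variance: sum of (x-mu)^2 * f(x)

print(mu)   # 7/2   (= 3.5)
print(var)  # 35/12 (≈ 2.92); standard deviation is sqrt(35/12) ≈ 1.71
```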
Interpretation
Quantiles, mean, and variance provide complementary information about
the distribution of data. While quantiles describe the spread of data at spe-
cific points, the mean represents the average value, and the variance quantifies
the dispersion around the mean. Together, these measures offer a comprehen-
sive understanding of the characteristics of a probability distribution and are
essential for statistical analysis and inference.
