Unit 5 Overview of Probability


Silver Oak College Of Engineering And Technology

Unit 5: Overview of Probability

Outline

• Statistical tools in Machine Learning
• Concepts of probability
• Random variables
• Discrete distributions
• Continuous distributions
• Multiple random variables
• Central limit theorem
• Sampling distributions
• Hypothesis testing
• Monte Carlo Approximation

Concepts of probability
• Probability represents a degree of certainty: the weight you would assign to the chance that an event happens.
• Probability is the bedrock of machine learning.
• Algorithms are designed using probability (e.g., Naive Bayes).
• Learning algorithms make decisions using probability (e.g., information gain).
• Sub-fields of study are built on probability (e.g., Bayesian networks).
1. Probability of a union of two events: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
2. Joint probability: P(A, B) = P(A ∩ B) = P(A | B) P(B)
3. Conditional probability: P(A | B) = P(A, B) / P(B), where P(B) > 0
4. Bayes rule: P(A | B) = P(B | A) P(A) / P(B)

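The four rules above can be sketched numerically. The scenario and numbers below are illustrative (not from the slides): a diagnostic test with 99% sensitivity and 95% specificity for a condition with 1% prevalence.

```python
# Bayes rule on a toy example (illustrative numbers):
p_disease = 0.01                  # prior P(D)
p_pos_given_disease = 0.99        # P(+ | D), the sensitivity
p_pos_given_healthy = 0.05        # P(+ | not D) = 1 - specificity

# Total probability: P(+) = P(+ | D) P(D) + P(+ | not D) P(not D)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes rule: P(D | +) = P(+ | D) P(D) / P(+)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))   # 0.167
```

Despite the accurate test, the posterior is only about 1/6, because the prior P(D) is small; this is the kind of reasoning Bayes rule makes mechanical.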
Probability Theory – Terminology
• Random Experiment – an experiment in which the outcome is not known with certainty.
• Sample Space – the universal set consisting of all possible outcomes of an experiment, usually denoted "S". Individual outcomes are called elementary events. A sample space can be finite or infinite.
• Event – a subset of the sample space; probability is usually calculated with respect to an event. Examples of events include:
  • The number of cancellations of orders placed at an e-commerce portal exceeding 10%.
  • The number of fraudulent credit card transactions exceeding 1%.

Random variables
• Random variables play an important role in describing, measuring, and analyzing uncertain events such as customer churn, employee attrition, and demand for a product. A random variable is a function that maps every outcome in the sample space to a real number.
• If a random variable X can assume only a finite or countably infinite set of values, it is a discrete random variable, e.g., the number of orders received at an e-commerce retailer. Discrete random variables are described using a probability mass function (PMF) and a cumulative distribution function (CDF).
• A random variable X that can take any value from an uncountably infinite set of values is a continuous random variable, e.g., the percentage of attrition of employees. Continuous random variables are described using a probability density function (PDF) and a cumulative distribution function (CDF).
• The PDF gives the probability that a continuous random variable takes a value in a small neighborhood of "x":

  f(x) = lim(δ→0) P(x ≤ X ≤ x + δ) / δ

Continuous random variables
• Suppose X is some uncertain continuous quantity. The probability that X lies in any interval a < X ≤ b can be computed as follows. Define the events A = (X ≤ a), B = (X ≤ b) and W = (a < X ≤ b). We have B = A ∨ W, and since A and W are mutually exclusive, the sum rule gives

  p(B) = p(A) + p(W), and hence p(W) = p(B) − p(A)

• Define the function F(q) ≜ p(X ≤ q). This is called the cumulative distribution function or CDF of X. It is a monotonically non-decreasing function.
• Using this notation we have

  p(a < X ≤ b) = F(b) − F(a)

Continuous random variables
• Now define f(x) = (d/dx) F(x) (we assume this derivative exists); this is called the probability density function or PDF.

Binomial Distribution
• The binomial distribution is a discrete probability distribution.
• It has several applications in many business contexts.
• A random variable X is said to follow a binomial distribution when:
1. The random variable can have only two outcomes, success and failure (such trials are known as Bernoulli trials).
2. The objective is to find the probability of getting x successes out of n trials.
3. The probability of success is p and thus the probability of failure is (1 − p).
4. The probability p is constant and does not change between trials.
• The PMF of the binomial distribution (the probability that the number of successes will be exactly x out of n trials) is given by

  P(X = x) = C(n, x) p^x (1 − p)^(n − x)

• The CDF of the binomial distribution (the probability that the number of successes will be x or fewer out of n trials) is given by

  F(x) = P(X ≤ x) = Σ(k = 0 to x) C(n, k) p^k (1 − p)^(n − k)

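The PMF and CDF above can be computed directly with Python's standard library (a minimal sketch; the example values n = 10, p = 0.5, x = 3 are arbitrary):

```python
import math

# Binomial PMF: P(X = x) = C(n, x) p^x (1-p)^(n-x)
def binom_pmf(x, n, p):
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

# Binomial CDF: P(X <= x), summing the PMF from 0 to x
def binom_cdf(x, n, p):
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

# Example: exactly 3 successes, and at most 3 successes, in 10 fair trials
print(binom_pmf(3, 10, 0.5))   # 0.1171875
print(binom_cdf(3, 10, 0.5))   # 0.171875
```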
Poisson Distribution
• In many situations, we may be interested in calculating the number of events that may occur over a period of time or space.
• E.g., the number of order cancellations by customers at an e-commerce portal, the number of customer complaints, the number of cash withdrawals at an ATM, the number of typographical errors in a book, or the number of potholes on Bangalore roads.
• To find the probability of a number of events, we use the Poisson distribution. The PMF of a Poisson distribution is given by

  P(X = x) = (e^(−λ) λ^x) / x!,  x = 0, 1, 2, …

where λ is the average number of events in the interval.

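The Poisson PMF is short enough to write out directly (a sketch; the rate λ = 2 cancellations per day is an invented example):

```python
import math

# Poisson PMF: P(X = x) = exp(-lam) * lam**x / x!
def poisson_pmf(x, lam):
    return math.exp(-lam) * lam**x / math.factorial(x)

# Example: average of 2 order cancellations per day (lam = 2);
# probability of exactly 3 cancellations on a given day
print(round(poisson_pmf(3, 2.0), 4))   # 0.1804
```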
Exponential Distribution
• The exponential distribution is a single-parameter continuous distribution that is traditionally used for modeling the time-to-failure of electronic components.
• It represents a process in which events occur continuously and independently at a constant average rate λ.
• The probability density function is given by

  f(x) = λ e^(−λx),  x ≥ 0

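A small sketch of the exponential PDF and its CDF (the mean time-to-failure of 100 hours is an invented example):

```python
import math

# Exponential pdf: f(x) = lam * exp(-lam * x) for x >= 0
def exp_pdf(x, lam):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

# Its cdf, P(X <= x) = 1 - exp(-lam * x), gives interval probabilities directly
def exp_cdf(x, lam):
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

# Example: mean time-to-failure 100 hours => lam = 1/100;
# probability a component fails within the first 50 hours
print(round(exp_cdf(50, 0.01), 4))   # 0.3935
```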
Normal Distribution
• The normal distribution is also known as the Gaussian distribution or bell curve (as it is shaped like a bell).
• It is one of the most popular continuous distributions in the field of analytics, especially due to its use in multiple contexts.
• The normal distribution is observed across many naturally occurring measures such as age, salary, sales volume, birth weight, and height.
• The normal distribution is parameterized by two parameters: the mean of the distribution µ and the variance σ². Its PDF is

  f(x) = (1 / (σ√(2π))) e^(−(x − µ)² / (2σ²))

• The sample mean of n observations drawn from a normal distribution is given by

  X̄ = (1/n) Σ(i = 1 to n) xᵢ

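The normal density and a sample mean can be computed directly (a minimal sketch; the values µ = 0, σ = 1 and the three data points are illustrative):

```python
import math

# Normal pdf with mean mu and standard deviation sigma
def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

# Sample mean of observations drawn from the distribution
def sample_mean(xs):
    return sum(xs) / len(xs)

print(round(normal_pdf(0.0, 0.0, 1.0), 4))   # 0.3989, the peak of the standard normal
print(sample_mean([4.0, 6.0, 5.0]))          # 5.0
```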
Central Limit Theorem
• It is one of the most important theorems in statistics.
• The CLT is key to hypothesis testing, which primarily deals with sampling distributions.
• Let S1, S2, …, Sk be samples of size n drawn from an independent and identically distributed population with mean µ and standard deviation σ.
• Let X̄1, X̄2, …, X̄k be the sample means (of the samples S1, S2, …, Sk).
• According to the CLT, for large n the distribution of X̄1, X̄2, …, X̄k approaches a normal distribution with mean µ and standard deviation σ/√n.

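The CLT is easy to see by simulation. Below is a sketch using a uniform population (a deliberately non-normal choice; the sample size and count are arbitrary):

```python
import random
import statistics

random.seed(42)

# Population: uniform on [0, 1], with mean 0.5 and std dev 1/sqrt(12) ~ 0.2887
n = 100      # size of each sample
k = 2000     # number of samples drawn

# Draw k samples of size n and record each sample mean
sample_means = [statistics.mean(random.random() for _ in range(n)) for _ in range(k)]

# CLT: the sample means cluster around mu with std dev sigma / sqrt(n)
print(round(statistics.mean(sample_means), 2))    # close to 0.5
print(round(statistics.stdev(sample_means), 3))   # close to 0.2887 / 10 ~ 0.029
```

Even though the population is uniform, the histogram of `sample_means` is bell-shaped, which is the content of the theorem.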
Hypothesis Test
• Hypothesis testing consists of two complementary statements: the null hypothesis and the alternative hypothesis.
• The null hypothesis is an existing belief, and the alternative hypothesis is what we intend to establish with new evidence (samples).
• The objective of hypothesis testing is to either reject or retain the null hypothesis with the help of data.
• Hypothesis tests are broadly classified into parametric tests and non-parametric tests.
1. Parametric tests are about population parameters of a distribution such as the mean, proportion, and standard deviation.
2. Non-parametric tests are about other characteristics such as independence of events or whether data follow a certain distribution such as the normal distribution.
Steps for hypothesis tests:
1. Define the null and alternative hypotheses. Normally, H0 denotes the null hypothesis and HA the alternative hypothesis.
2. Identify the test statistic to be used for testing the validity of the null hypothesis (e.g., Z-test or t-test).
3. Decide the criterion for rejection or retention of the null hypothesis, called the significance level (α). A typical value for α is 0.05.
4. Calculate the p-value, which is the conditional probability of observing a test statistic value at least as extreme as the one observed, given that the null hypothesis is true.
5. Decide to reject or retain the null hypothesis based on the p-value and α.

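The five steps above can be sketched as a one-sample two-sided Z-test. The data, µ0 = 50, and the assumption of a known population standard deviation are all invented for illustration:

```python
import math

# Step 1: H0: population mean mu = 50;  HA: mu != 50 (two-sided)
mu0 = 50.0
sigma = 10.0        # assumed known population std dev (illustrative)
xs = [52.1, 55.3, 48.7, 53.9, 51.2, 54.8, 49.5, 56.0, 52.7, 50.6]

n = len(xs)
x_bar = sum(xs) / n

# Step 2: test statistic Z = (x_bar - mu0) / (sigma / sqrt(n))
z = (x_bar - mu0) / (sigma / math.sqrt(n))

# Standard normal CDF via the error function
def phi(q):
    return 0.5 * (1 + math.erf(q / math.sqrt(2)))

# Step 4: two-sided p-value
p_value = 2 * (1 - phi(abs(z)))

# Steps 3 and 5: compare with alpha = 0.05 and decide
alpha = 0.05
print("reject H0" if p_value < alpha else "retain H0")
```

With these numbers the evidence is weak (p ≈ 0.4), so H0 is retained; shrinking sigma or shifting the data would flip the decision.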
Analysis Of Variance (Anova)
• One-way ANOVA can be used to study the impact of a single treatment (also known as a factor) at different levels (thus forming different groups) on a continuous response variable (or outcome variable).
• The null and alternative hypotheses of one-way ANOVA for comparing 3 groups are given by

  H0: µ1 = µ2 = µ3
  HA: not all the means are equal

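A one-way ANOVA F statistic for three groups can be computed by hand as a sketch (the group data below are invented for illustration; the final p-value step, which needs the F distribution, is omitted):

```python
import statistics

# Three groups: one treatment at three levels (illustrative data)
groups = [
    [23.0, 25.0, 21.0, 24.0],   # treatment level 1
    [30.0, 28.0, 29.0, 31.0],   # treatment level 2
    [22.0, 24.0, 23.0, 25.0],   # treatment level 3
]

k = len(groups)                          # number of groups
n = sum(len(g) for g in groups)          # total observations
grand_mean = sum(sum(g) for g in groups) / n

# Between-group sum of squares: spread of group means around the grand mean
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)

# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)

# F = (SSB / (k - 1)) / (SSW / (n - k)); a large F favours rejecting H0
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(round(f_stat, 1))
```

Here level 2 sits well above the others, so the between-group variation dominates and F comes out large.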
Monte Carlo Approximation
• Monte Carlo methods are a class of techniques for randomly sampling from a probability distribution.
• Often, we cannot calculate a desired quantity in probability exactly, but we can define the probability distributions for the random variables directly or indirectly.
• Monte Carlo sampling provides the foundation for many machine learning methods such as resampling, hyperparameter tuning, and ensemble learning.
• In principle, Monte Carlo methods can be used to solve any problem having a probabilistic interpretation.
• By the law of large numbers, integrals described by the expected value of some random variable can be approximated by taking the empirical mean (a.k.a. the sample mean) of independent samples of the variable.

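The law-of-large-numbers idea can be made concrete in a few lines. The sketch below approximates E[X²] for X uniform on [0, 1], whose exact value is the integral of x² over [0, 1], i.e. 1/3 (the choice of g(x) = x² is an invented example):

```python
import random

random.seed(0)

# Monte Carlo approximation of E[g(X)] for X ~ Uniform(0, 1), with g(x) = x**2.
# Exact answer: integral of x**2 over [0, 1] = 1/3.
n = 100_000
estimate = sum(random.random() ** 2 for _ in range(n)) / n
print(round(estimate, 2))   # close to 1/3
```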
Monte Carlo Approximation
• Need for Sampling
• There are many problems in probability, and more broadly in machine learning, where we cannot calculate an analytical solution directly.
• In fact, there is an argument that exact inference may be intractable for most practical probabilistic models.
• Sampling provides a flexible way to approximate many sums and integrals at reduced cost.

Monte Carlo Methods
• Monte Carlo methods, or MC for short, are a class of techniques for randomly sampling a probability distribution.
• There are three main reasons to use Monte Carlo methods to randomly sample a probability distribution:
  • Estimate a density: gather samples to approximate the distribution of a target function.
  • Approximate a quantity: such as the mean or variance of a distribution.
  • Optimize a function: locate a sample that maximizes or minimizes the target function.

Monte Carlo Methods
• Monte Carlo methods are defined in terms of the way that samples are drawn or the constraints imposed on the sampling process.
• Some examples of Monte Carlo sampling methods include direct sampling, importance sampling, and rejection sampling.
  • Direct Sampling: sampling the distribution directly without prior information.
  • Importance Sampling: sampling from a simpler approximation of the target distribution.
  • Rejection Sampling: sampling from a broader distribution and only considering samples within a region of the sampled distribution.
• It is a huge topic, with many books dedicated to it. For example, Monte Carlo methods can be used for:
1. Calculating the probability of a move by an opponent in a complex game.
2. Calculating the probability of a weather event in the future.
3. Calculating the probability of a vehicle crash under specific conditions.

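Of the three methods listed, rejection sampling is the simplest to sketch. The target density f(x) = 2x on [0, 1] and the uniform proposal are invented for illustration; the only requirement is a bound M with f(x) ≤ M for all x:

```python
import random

random.seed(1)

# Rejection sampling sketch: draw from a target density on [0, 1]
# using the uniform distribution as the broader proposal.
# Hypothetical target: f(x) = 2x on [0, 1], bounded above by M = 2.
def target_pdf(x):
    return 2 * x

M = 2.0  # bound such that target_pdf(x) <= M * proposal_pdf(x) = M

def rejection_sample():
    while True:
        x = random.random()            # propose from Uniform(0, 1)
        u = random.random()            # uniform height for accept/reject
        if u <= target_pdf(x) / M:     # accept with probability f(x) / M
            return x

samples = [rejection_sample() for _ in range(20_000)]
mean = sum(samples) / len(samples)
print(round(mean, 2))   # close to 2/3, the mean of f(x) = 2x on [0, 1]
```

The acceptance test keeps proposals in proportion to the target density, so the accepted samples follow f even though only uniform draws were ever generated.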

Prof. Monali Suthar (SOCET-CE)
