Fundamentals of Data Analytics Lecture 01. Probability: Instructional Team


Fundamentals of Data Analytics

Lecture 01. Probability


Instructional Team
About this Course

- Probability
- Statistics
- Hands-on programming skills
- Meet your instructors & classmates
Instructional Team

- Vinh Dang: PhD., Data Science Lead, Tyme Digital
- Thuy Nguyen: M.Sc., Data Analyst, Trusting Social
- Trung Le: BSc., Research Engineer, Trusting Social
- Sang Nguyen: M.Sc., Data Scientist, FE Credit
- Huy Pham: PharmB, Data Scientist, Talosix
Welcome to FDA

✓ Fundamentals of Probability and Statistics


✓ Gentle Introduction to Python
✓ Real-Life Case Studies
✓ Networking
✓ Preparing for Advanced course DA / ML

? DA / ML / DS / BI / AI / 4th IR
? SQL / R…
⇒ Discussing with Instructional team & classmates (Piazza…)
Content of Lecture
➔ Counting Rules
➔ Sample Space, Event
➔ Independent Event
➔ Conditional Probability
➔ Bayes’ Theorem
Motivation Example

Blaise Pascal (1623 - 1662) Pierre de Fermat (1601 - 1665)

Whoever gets to 3 first wins the game and takes all the money.
INTRODUCTION
Probability & Statistics

Example
What is the average height of Vietnamese males?
1. Produce Data: Determine what to measure, then collect the data.
→ Select 1,000 adult males at random.
→ Measure and record their heights.
2. Explore the Data: Analyze and summarize the data.
→ In the sample, the average height is 165.7 cm.
3. Draw a Conclusion: Use the data, probability, and statistical inference
→ Draw a conclusion about the population.
Probability & Statistics
COUNTING
Counting rules

Rule of counting
Event A can occur in n1 ways & Event B can occur in n2 ways
⟹ Events A and B can occur in n1 × n2 ways.
In general, the number of ways that m events can occur is n1 × n2 × . . . × nm.

Example:
How many unique stock-keeping unit (SKU) labels can a chain of hardware stores
create by using two letters (ranging from AA to ZZ) followed by four numbers
(digits 0 through 9)?

Solution:
26 x 26 x 10 x 10 x 10 x 10 = 6,760,000
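
The SKU count can be reproduced with a few lines of Python. This is a minimal sketch; the variable names are illustrative, and the brute-force cross-check uses a smaller label format (two letters plus one digit) so that it stays fast.

```python
from itertools import product
from string import ascii_uppercase, digits

# Multiplication rule: multiply the number of choices at each position.
letter_choices = len(ascii_uppercase)  # 26 letters, A to Z
digit_choices = len(digits)            # 10 digits, 0 to 9
sku_count = letter_choices ** 2 * digit_choices ** 4
print(sku_count)

# Cross-check the rule by enumerating a smaller format: two letters + one digit.
small_labels = ["".join(p) for p in product(ascii_uppercase, ascii_uppercase, digits)]
print(len(small_labels) == 26 * 26 * 10)
```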
Counting rules

Factorials
The number of unique ways that n items can be arranged in a particular order is n!
n! = n × (n-1) × (n-2) × … × 2 × 1

Example:
A home appliance service truck must make three stops (A, B, C). In how many ways
could the three stops be arranged?

Solution:
3! = 3 x 2 x 1 = 6
That is {ABC, ACB, BAC, BCA, CAB, CBA}
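
The six orderings can also be listed programmatically. A small sketch using the standard library's `itertools.permutations`; the names `stops` and `routes` are illustrative.

```python
from itertools import permutations
from math import factorial

stops = ["A", "B", "C"]

# Every ordering of the three stops, joined into strings such as "ABC".
routes = ["".join(p) for p in permutations(stops)]
print(routes)

# The number of orderings should equal 3!.
print(len(routes) == factorial(len(stops)))
```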
Counting rules

Permutations
The number of possible permutations of n items taken r at a time, in a particular order, is
nPr = n! / (n − r)!

Example:
Five home appliance customers (A, B, C, D, E) need service calls, but the field technician
can service only three of them before noon. The order in which they are serviced
is important (to the customers, anyway), so each possible arrangement of three service
calls is different. The dispatcher must assign the sequence. How many permutations
are possible?
Solution:
5P3 = 5! / (5 − 3)! = 120 / 2 = 60
Counting rules
Combinations
A combination is a collection of r items chosen at random without replacement
from n items where the order of the selected items is not important.
The number of possible combinations of r items chosen from n items is
nCr = n! / (r! (n − r)!)

Example:
Suppose that five customers (A, B, C, D, E) need service calls and the maintenance
worker can only service three of them this morning. The customers don’t care when
they are serviced as long as it’s before noon, so the dispatcher does not care who is
serviced first, second, or third. How many combinations are possible?
Solution:
5C3 = 5! / (3! × 2!) = 120 / 12 = 10
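
The two service-call examples (ordered and unordered selection of three customers out of five) can be checked by enumeration. A minimal sketch, assuming Python 3.8 or later for `math.perm` and `math.comb`; the variable names are illustrative.

```python
from itertools import combinations, permutations
from math import comb, perm

customers = "ABCDE"

# Order matters: each sequence of three customers is a distinct schedule.
ordered = list(permutations(customers, 3))
# Order does not matter: only the set of three customers counts.
unordered = list(combinations(customers, 3))

print(perm(5, 3), len(ordered))    # closed-form count vs. enumeration
print(comb(5, 3), len(unordered))
```

Each unordered group of three corresponds to 3! = 6 ordered schedules, which is why the permutation count is six times the combination count.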
PROBABILITY
Sample Spaces & Events
Definition

The sample space S is the set of all possible outcomes of an experiment.
Sample outcomes (realizations) are the points ω in the sample space.
Events (E) are subsets of the sample space.
Example:
● If we toss a coin twice, then S = {HH, HT, TH, TT}.
The event that the first toss is heads is A = {HH, HT}.

● If we toss a coin forever, then S is the infinite set
S = {ω = (ω1, ω2, ω3, ...), ωi ∈ {H,T}}
The event that the first head appears on the third toss is
E = {(ω1, ω2, ω3, ...): ω1 = T, ω2 = T, ω3 = H, ωi ∈ {H,T} for i > 3}
Sample Spaces & Events
S          Sample space
ω          Outcome
A, E, ...  Event (subset of S)
|A|        Number of points in A (if A is finite)
A^C        Complement of A (not A)
A ∪ B      Union of A and B
A ∩ B      Intersection of A and B
A − B      Set difference (points in A that are not in B)
A ⊂ B      Set inclusion
∅          Null event
Sample Spaces & Events
Mutually Exclusive Events

A1, A2, … are disjoint (mutually exclusive) if Ai ∩ Aj = ∅ whenever i ≠ j.

Collectively Exhaustive Events

A1, A2, … are collectively exhaustive if A1 ∪ A2 ∪ … = S.
MECE?
Probability
The probability of an event is a number that measures the relative likelihood that the
event will occur.

Axioms of Probability

◉ P(A) ≥ 0 for every A


◉ P(S) = 1 (S is Sample space)
◉ If A1, A2, … are disjoint (mutually exclusive), then P(A1 ∪ A2 ∪ …) = P(A1) + P(A2) + …
Views of Probability

Approach | How Assigned? | Example
Empirical | Estimated from observed outcome frequency | There is a 2 percent chance of twins in a randomly chosen birth.
Classical | Known a priori by the nature of the experiment | There is a 50 percent chance of heads on a coin flip.
Subjective | Based on informed opinion or judgment | There is a 60 percent chance that Toronto will bid for the 2024 Winter Olympics.
How Assigned? → Empirical Approach
Empirical approach

⦿ Collecting empirical data through observations or experiments


⦿ The number of observations is n
⦿ The frequency of observed outcomes is f
⟹ The estimated probability is f/n

Example:
A company interviewed 280 production workers before hiring 70 of them.
Let H = event that a randomly chosen interviewee is hired ⇒ P(H) = f/n = 70/280 = 0.25
Law of Large Numbers
As the number of trials increases, any empirical probability approaches its theoretical
limit.
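
A simulated coin gives a feel for this convergence. The sketch below is illustrative only: it assumes a fair coin (theoretical probability 0.5), and the function name `running_heads_frequency` and the fixed seed are choices made here for reproducibility, not part of the lecture.

```python
import random

def running_heads_frequency(n_tosses, seed=0):
    """Return the empirical frequency f/n of heads after each toss."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = 0
    freqs = []
    for n in range(1, n_tosses + 1):
        heads += rng.random() < 0.5  # True counts as 1 head
        freqs.append(heads / n)
    return freqs

freqs = running_heads_frequency(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(freqs[n - 1], 4))
```

With small n the frequency can sit far from 0.5; by the last checkpoints it should be close to it.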
How Assigned? → Empirical Approach
CASE STUDY: Practical Actuaries Issues
Actuaries help companies calculate payout rates on life insurance, pension plans, and health
care plans by estimating empirical probabilities.
Actuaries created the tables that guide IRA withdrawal rates for individuals from age 70 to
99. Here are a few challenges that actuaries face:
1. Is n “large enough” to say that f/n has become a good approximation to the probability
of the event of interest? (Data collection costs money, and decisions must be made)
2. Was the experiment repeated identically? (Subtle variations may exist in the
experimental conditions and data collection procedures)
3. Is the underlying process stable over time? (For example, default rates on 2007
student loans may not apply in 2017, due to changes in attitudes and interest rates)
How Assigned? → Classical Approach
Classical approach
In the classical approach, we do not actually have to perform an experiment because the
nature of the process allows us to envision the entire sample space.
→ We can use deduction to determine P(A).

A priori: the process of assigning probabilities before we actually observe the event
or try an experiment.

Example:
In the two-dice experiment, there are 36 possible outcomes.
H = rolling a seven
Assuming fair dice, six of the 36 equally likely outcomes sum to seven, so P(H) = 6/36 ≈ 0.1667.
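
Because the whole sample space can be envisioned, it can also be enumerated. A minimal sketch for the two-dice experiment, assuming fair six-sided dice so that all 36 outcomes are equally likely; variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))

# H = rolling a seven: the outcomes whose faces sum to 7.
sevens = [o for o in outcomes if sum(o) == 7]

p_seven = Fraction(len(sevens), len(outcomes))
print(len(outcomes), sevens, p_seven)
```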
How Assigned? → Subjective Approach
Subjective approach
A subjective probability reflects someone’s informed judgment about the likelihood of
an event when there is no repeatable random experiment.

Example:
● What is the probability that a new truck product program will show a return on
investment of at least 10 percent?
● What is the probability that the price of Ford’s stock will rise within the next 30
days?
Notes:
In such cases, we rely on personal judgment or expert opinion. However, such a
judgment is not random because it is typically based on experience with similar
events and knowledge of the underlying causal processes.
Interpretations of Probability

“Frequencies” approach
P(A) is the long-run proportion of times that A is true in repetitions.
E.g.: The probability that a coin will land heads is 0.5. If we flip the coin many times,
we expect it to land heads about half the time.

“Degrees of belief” approach
P(A) measures an observer’s strength of belief that A is true, or uncertainty of A.
E.g.: The probability that a coin will land heads is 0.5. The coin is equally likely to
land heads or tails on the next toss.
Properties of Probability

⦿ P(∅) = 0
⦿ A ⊂ B ⇒ P(A) ≤ P(B)
⦿ 0 ≤ P(A) ≤ 1
⦿ P(A^C) = 1 - P(A)
⦿ A ∩ B = ∅ ⇒ P(A ∪ B) = P(A) + P(B)
⦿ P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
Independent Events
Definition
Two events A and B are independent if
P(A ∩ B) = P(A) P(B)

A set of events {Ai} is independent if, for every finite subset J of indices,
P(⋂_{i∈J} Ai) = ∏_{i∈J} P(Ai)

Example: Tossing a fair die
Let A = {2, 4, 6} and B = {1, 2, 3, 4} ⇒ A ∩ B = {2, 4}
P(A ∩ B) = 2/6 = ⅓ and P(A) P(B) = (½) × (⅔) = ⅓
⇒ P(A ∩ B) = P(A) P(B) ⇒ A and B are independent
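
The die example can be verified with exact fractions. This is a sketch under the assumption of a fair die; the helper `prob` and the extra event `C` (added here only as a contrasting, non-independent case) are not from the slides.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Classical probability on a fair die: favourable outcomes / all outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}
B = {1, 2, 3, 4}
C = {1, 2, 3}  # hypothetical extra event, for contrast

a_b_independent = prob(A & B) == prob(A) * prob(B)
a_c_independent = prob(A & C) == prob(A) * prob(C)
print(a_b_independent, a_c_independent)
```

For A and C the joint probability is 1/6 while the product of the marginals is 1/4, so that pair fails the test.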
Contingency Tables
Definition
A contingency table is a cross-tabulation of frequencies into rows and columns.

Example: Tuition cost versus five-year net salary gains for MBA degree recipients at 67
top-tier graduate schools of business
Contingency Tables
Calculation From Contingency Tables
● Marginal Probability
● Joint probability
● Conditional Probability
● Independence
● Relative Frequencies
Contingency Tables
Marginal Probability
The marginal probability of an event is a relative frequency that is found by dividing a
row or column total by the total sample size.

The marginal probability of a medium salary gain is P(S2) = 33/67 = 0.4925


The marginal probability of low tuition is P(T1) = 16/67 = 0.2388
Contingency Tables
Joint Probability
Each cell is used to calculate a joint probability representing the intersection of
two events.

The joint probability that the school has low tuition (T1) and large salary gains (S3)
is P(T1 ∩ S3) = 1/67 = 0.0149
Contingency Tables
Conditional Probability
Conditional probabilities may be found by restricting ourselves to a single row or
column (the condition).

The conditional probability that salary gains are small (S1) given that the MBA tuition
is large (T3) is P(S1 | T3) = 5/32 = 0.1563
Contingency Tables
Independence
To check whether events in a contingency table are independent, we can look at
conditional probabilities.
Example: Is large salary gain (S3) independent of low tuition (T1) ?

Method 1: No, because


P(S3) P(T1) = (17/67)(16/67) = 0.0606
P(S3 ∩ T1) = 1/67 = 0.0149
⇒ P(S3) P(T1) ≠ P(S3 ∩ T1)

Method 2: No, because


P(S3 | T1) = 1/16 = 0.0625 ≠ P(S3) = 17/67 = 0.2537
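
Both methods can be reproduced from the counts quoted on these slides (67 schools, 17 with large salary gains, 16 with low tuition, 1 with both). A minimal sketch; only those four counts are used, and the variable names are illustrative.

```python
from fractions import Fraction

n = 67               # total number of schools
count_S3 = 17        # large salary gain (column total)
count_T1 = 16        # low tuition (row total)
count_S3_and_T1 = 1  # cell: low tuition and large salary gain

p_S3 = Fraction(count_S3, n)
p_T1 = Fraction(count_T1, n)
p_joint = Fraction(count_S3_and_T1, n)

# Method 1: compare the product of marginals with the joint probability.
method1_independent = p_S3 * p_T1 == p_joint

# Method 2: compare the conditional probability with the marginal.
p_S3_given_T1 = p_joint / p_T1
method2_independent = p_S3_given_T1 == p_S3

print(round(float(p_S3 * p_T1), 4), round(float(p_joint), 4), method1_independent)
print(round(float(p_S3_given_T1), 4), round(float(p_S3), 4), method2_independent)
```

With sample data, exact equality is a strict criterion; here the gap is large enough that both methods agree the events are not independent.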
Contingency Tables
Relative frequency
To facilitate probability calculations, we can divide each cell frequency fij by the total
sample size to get the relative frequencies fij / n
Contingency Tables
Confusion matrix
Tree Diagrams
Definition
Events and probabilities can be displayed in the form of a tree diagram or decision
tree to help visualize all possible outcomes.

How to build a tree diagram?


(1) Make the Contingency Table
(2) Calculate the conditional probabilities.
(3) Calculate the joint probabilities from
conditional probabilities.
P(A ∩ B) = P(B)P(A | B)
Tree Diagrams
Step 1. Make the Contingency Table
Tree Diagrams
Step 2. Calculate the conditional probabilities.
Tree Diagrams
Step 3. Calculate the joint probabilities from the conditional probabilities.
Tree Diagram
Example (Product Launching Plan)
① A small technology company wishes to launch a new and innovative product to the
market. There are 3 options: Direct approach, Internet only, or License.
② Based on market research, the demand for the product can be classed into three
categories: high, medium, or low, with probabilities of 0.2, 0.35, and 0.45 respectively.
③ The likely profits to be earned under each plan, by demand level, are in the table:

Plan      High  Medium  Low
Direct    100   55      -25
Internet  46    25      15
License   20    20      20

How should the company launch the product ?
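
One common way to compare the plans is by expected profit, weighting each payoff by the probability of its demand level. The slides do not state the decision criterion, so treating "highest expected profit" as the rule is an assumption of this sketch, as are the variable names.

```python
demand_probs = {"High": 0.20, "Medium": 0.35, "Low": 0.45}

profits = {
    "Direct":   {"High": 100, "Medium": 55, "Low": -25},
    "Internet": {"High": 46,  "Medium": 25, "Low": 15},
    "License":  {"High": 20,  "Medium": 20, "Low": 20},
}

# Expected profit of a plan = sum over demand levels of P(demand) * profit.
expected = {
    plan: sum(demand_probs[d] * payoff[d] for d in demand_probs)
    for plan, payoff in profits.items()
}

# Assumed criterion: pick the plan with the highest expected profit.
best_plan = max(expected, key=expected.get)
for plan, value in expected.items():
    print(plan, round(value, 2))
print("Highest expected profit:", best_plan)
```

A risk-averse company might weigh the possible loss under the Direct plan differently; expected profit ignores that spread.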


CONDITIONAL PROBABILITY &
BAYES’ THEOREM
Conditional Probability
Conditional Probability
If P(B) > 0 then the conditional probability of A given B is
P(A | B) = P(A ∩ B) / P(B)

Example:
Of the population age 16–21 and not in college:
● 13.50% are unemployed (U)
● 29.05% are high school dropouts (D)
● 5.32% are unemployed high school dropouts (U ∩ D).
→ The probability that a youth is unemployed given that the person dropped out:
P(U | D) = P(U ∩ D) / P(D) = 0.0532 / 0.2905 = 0.1831
The conditional probability of being unemployed (18.31%) is greater than
the unconditional probability of being unemployed (13.50%).
→ In other words, knowing that someone is a high school dropout
alters the probability that the person is unemployed.
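
The dropout example follows directly from the definition P(U | D) = P(U ∩ D) / P(D). A short sketch using the three percentages quoted above; variable names are illustrative.

```python
p_U = 0.1350        # unemployed
p_D = 0.2905        # high school dropout
p_U_and_D = 0.0532  # unemployed and dropout

# Definition of conditional probability (requires P(D) > 0).
p_U_given_D = p_U_and_D / p_D

print(round(p_U_given_D, 4))
print(p_U_given_D > p_U)  # conditioning on D raises the probability of U
```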
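Bayes’ Theorem
Theorem
Let A and B be events with P(A) > 0:
P(B | A) = P(A | B) P(B) / P(A)

General form
If event B has n mutually exclusive and collectively exhaustive
categories (B1, B2, ..., Bn), then
P(Bi | A) = P(A | Bi) P(Bi) / [P(A | B1) P(B1) + P(A | B2) P(B2) + ... + P(A | Bn) P(Bn)]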
Bayes’ Theorem
Example: Rare Disease detection
A medical test for a rare disease D has outcomes (+) and (−). The joint probabilities are:

       D      D^C
(+)    0.009  0.099
(−)    0.001  0.891

Suppose you go for a test and get a positive.
What is the probability you have the disease?

Most people choose P(+|D) = 0.009 / (0.009 + 0.001) = 0.9 = 90%

However, the correct answer is
P(D|+) = P(+|D) P(D) / P(+)

With:
P(+|D) = 0.009 / (0.009 + 0.001) = 0.9
P(D) = (0.009 + 0.001) / (0.009 + 0.001 + 0.099 + 0.891) = 0.01
P(+) = (0.009 + 0.099) / (0.009 + 0.099 + 0.001 + 0.891) = 0.108
→ P(D|+) = 0.9 × 0.01 / 0.108 = 0.083 = 8.3%
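
The same calculation can be run from the joint probability table. A minimal sketch; the dictionary layout and the key `"Dc"` (standing for the complement of D) are illustrative choices.

```python
# Joint probabilities P(test result ∩ disease status) from the table.
joint = {
    ("+", "D"): 0.009, ("+", "Dc"): 0.099,
    ("-", "D"): 0.001, ("-", "Dc"): 0.891,
}

p_D = joint[("+", "D")] + joint[("-", "D")]     # marginal: has the disease
p_pos = joint[("+", "D")] + joint[("+", "Dc")]  # marginal: tests positive
p_pos_given_D = joint[("+", "D")] / p_D         # the figure most people quote

# Bayes' theorem: P(D | +) = P(+ | D) P(D) / P(+)
p_D_given_pos = p_pos_given_D * p_D / p_pos

print(round(p_pos_given_D, 3), round(p_D_given_pos, 3))
```

The posterior is small because the disease is rare: most positives come from the much larger healthy group.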
Bayes’ Theorem
Example: Email Filter
A: The email contains the word “free”
B1: “spam”
B2: “low priority”
B3: “high priority”
From previous experience, we can determine P(A|Bi) and P(Bi):

          B1    B2    B3
P(A|Bi)   0.90  0.01  0.01
P(Bi)     0.70  0.20  0.10

⇒ What is the probability that an email is spam given that it contains the word “free”?
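
Reading the question as asking for P(B1 | A), the general form of Bayes' theorem applies with three categories. This interpretation is an assumption, and the dictionary names below are illustrative.

```python
# P(A | Bi): probability that an email in each category contains "free".
likelihood = {"spam": 0.90, "low priority": 0.01, "high priority": 0.01}
# P(Bi): prior probability of each category.
prior = {"spam": 0.70, "low priority": 0.20, "high priority": 0.10}

# Total probability of A: sum over categories of P(A | Bi) P(Bi).
evidence = sum(likelihood[c] * prior[c] for c in prior)

# Bayes' theorem, general form: P(Bi | A) for every category.
posterior = {c: likelihood[c] * prior[c] / evidence for c in prior}

for category, p in posterior.items():
    print(category, round(p, 4))
```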
Case Study for Practice - Box Office revenue prediction
Case: A Kaggle Competition at https://www.kaggle.com/c/tmdb-box-office-prediction/overview
Purposes:
❏ To provide simple questions / examples for theoretical concepts, such as:
  ❏ How many ways are there to watch the movies in a collection if we care about the order?
  ❏ What is the probability of reaching $100 million in revenue given that the budget is more than $20 million?

❏ To practice Python programming with a real dataset (will be used in office hours)
Dataset:
All Columns:
Id, Belongs_to_collection, Budget, Genres, Homepage, Imdb_id, Original_language, Original_title, Overview, Popularity,
Poster_path, Production_companies, Production_countries, Release_date, Runtime, Spoken_languages, Status, Tagline,
Title, Keywords, Cast, Crew
Frequently used Columns:
Budget, Genres, Original_language, Popularity, Production_companies, Production_countries, Release_date, Runtime,
Spoken_languages, Tagline, Title, Keywords, Cast, Crew
Programming
● A Crash Course in Python: https://nbviewer.jupyter.org/gist/rpmuller/5920182
● Programming tutorial:
https://colab.research.google.com/drive/1IOysoRfcxFyGJnjVKY8pPTCJ0gG7EiZD
● Python tutorial: https://github.com/jerry-git/learn-python3
Reference
1. Doane, David P., and Lori E. Seward - Applied statistics in business and economics
2. Wasserman, Larry - All of statistics: a concise course in statistical inference
3. https://luminousmen.com/
4. http://www.mas.ncl.ac.uk/~ndah6/teaching/MAS1403/notes_chapter6.pdf
5. lumenlearning.com
End of Lecture 01
● What you have learned
○ Counting Rules
○ Sample Space, Event
○ Independent Event
○ Conditional Probability
○ Bayes’ Theorem
● Questions?
Exercises for discussion
● Ignoring leap years, and assuming birthdays are equally likely to be any day of the
year, what is the chance of a tie in birthdays among the students in this class?
● In any 15-minute interval, there is a 20% probability that you will see at least one
shooting star. What is the probability that you see at least one shooting star in the
period of an hour?
● A certain couple tells you that they have two children, at least one of which is a girl.
What is the probability that they have two girls?
● How can you generate a random number between 1 and 7 using only a die?
