Basics of Probability and Statistics

Outline
 Probability
 Statistical Measures
Probability
Idea of Probability
• Probability is the science of chance behavior
• Chance behavior is unpredictable in the short run but has a regular and predictable pattern in the long run
Randomness
• Random: individual outcomes are uncertain
• But there is a regular distribution of outcomes in a large number of repetitions.
• Example: select any number from a bag of numbers {1,2,3,…,100}
Random Experiment
• A random experiment is an action or process that leads to one of several possible outcomes.
• An experiment may have n possible outcomes, all equally likely to occur.
For example:
Experiment                    Outcomes
Flip a coin                   Heads, Tails
Selecting a color ball        Green, red, blue
Rolling a die                 1,2,3,4,5,6
Picking a card from a deck    52 cards
Relative-Frequency Probabilities
 Relative frequency (proportion of occurrences) of an
outcome settles down to one value over the long run.

That one value is then defined to be the probability of
that outcome.
 Can be determined (or checked) by observing a long
series of independent trials (empirical data)
 experience with many samples
 simulation
Relative-Frequency Probabilities
Coin flipping:
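The long-run behavior described above can be checked by simulation. The following is a minimal sketch, assuming a fair coin modeled with Python's pseudo-random generator; the function name `relative_frequency_of_heads` and the seed value are illustrative choices, not part of the original slides.

```python
import random


def relative_frequency_of_heads(num_flips, seed=0):
    """Simulate fair coin flips and return the fraction that came up heads."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips


if __name__ == "__main__":
    # The relative frequency wanders for small n and settles near 0.5 for large n.
    for n in (10, 100, 10_000, 100_000):
        print(n, relative_frequency_of_heads(n))
```

With few flips the proportion can be far from 1/2; with many flips it should settle close to 1/2, which is the relative-frequency notion of probability.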
Probability Models
 The sample space S of a random phenomenon is the set
of all possible outcomes.
 An event is an outcome or a set of outcomes (subset of
the sample space).
 A probability model is a mathematical description of long-
run regularity consisting of a sample space S and a way of

assigning probabilities to events.
Sample Space and Events
[Figure: a sample space drawn as a region containing Event 1 through Event 5]
Example
Sample Space = {1,2,3,4,5,6}
Rolling an odd number = {1,3,5}
Rolling an even number = {2,4,6}
Rolling a prime number = {2,3,5}
Probability Model for Two Dice
Random phenomenon: roll pair of fair dice.
Sample space:
Event: rolling even numbers on both dice
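The sample space for two dice is small enough to enumerate. The sketch below builds all 36 ordered outcomes and counts the ones in an event; the helper name `probability` is a hypothetical choice, and every outcome is assumed equally likely because the dice are fair.

```python
from fractions import Fraction
from itertools import product

# Sample space: every ordered pair (first die, second die).
sample_space = list(product(range(1, 7), repeat=2))


def probability(event):
    """Probability of an event given as a predicate on one outcome."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))


# Event: rolling even numbers on both dice.
p_both_even = probability(lambda o: o[0] % 2 == 0 and o[1] % 2 == 0)
# Event: the pips on the two up-faces add up to 5.
p_sum_is_5 = probability(lambda o: sum(o) == 5)

print(len(sample_space), p_both_even, p_sum_is_5)
```

Using `Fraction` keeps the answers exact, so they can be compared directly with hand calculations.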

Probability Model for 52 card deck
Random phenomenon: Arrange 52 card deck in a zigzag way
Sample space:
Event: pick an ace

Probability
What is a PROBABILITY?
- Probability is the chance that some event

will happen
- It is the ratio of the number of ways a

certain event can occur to the number of
possible outcomes
Probability
$P(\text{event}) = \frac{\text{number of favorable outcomes}}{\text{number of possible outcomes}}$
Examples that use Probability:

(1) Dice, (2) Spinners, (3) Coins, (4) Deck of Cards, (5)
Evens/Odds, (6) Alphabet, etc.
Probability
Value         Likelihood
0             Impossible
¼ or .25      Not Very Likely
½ or .5       Equally Likely
¾ or .75      Somewhat Likely
1             Certain
Probability of Simple Events
Example 2: Roll a die.
What is the probability of rolling an even number?
$P(\text{even}) = \frac{\#\,\text{favorable outcomes}}{\#\,\text{possible outcomes}} = \frac{3}{6} = \frac{1}{2}$
The probability of rolling an even number is 3 out of 6.
Example 3: Roll a pair of dice.
Random phenomenon: roll a pair of fair dice and count the number of pips on the up-faces.
Find the probability of rolling a 5.
P(roll a 5) = P(1,4) + P(2,3) + P(3,2) + P(4,1)
= 1/36 + 1/36 + 1/36 + 1/36
= 4/36
= 0.111
Example 4: Spinners.
What is the probability of spinning green?
$P(\text{green}) = \frac{\#\,\text{favorable outcomes}}{\#\,\text{possible outcomes}} = \frac{1}{4}$
The probability of spinning green is 1 out of 4
Example 5: Flip a coin.
What is the probability of flipping a tail?
$P(\text{tail}) = \frac{\#\,\text{favorable outcomes}}{\#\,\text{possible outcomes}} = \frac{1}{2}$
The probability of flipping a tail is 1 out of 2
Example 6: Deck of Cards.
• What is the probability of picking a heart?
$P(\text{heart}) = \frac{\#\,\text{favorable outcomes}}{\#\,\text{possible outcomes}} = \frac{13}{52} = \frac{1}{4}$
The probability of picking a heart is 1 out of 4
• What is the probability of picking a non-heart?
$P(\text{non-heart}) = \frac{\#\,\text{favorable outcomes}}{\#\,\text{possible outcomes}} = \frac{39}{52} = \frac{3}{4}$
The probability of picking a non-heart is 3 out of 4
Key Concepts:
- Probability is the chance that some event will happen
- It is the ratio of the number of ways a certain event can occur to the total number of possible outcomes
Guided Practice: Calculate the probability of each independent

event.
1) P(black) =
2) P(1) =
3) P(odd) =
4) P(prime) =
Guided Practice: Answers
1) P(black) = 4/8
2) P(1) = 1/8
3) P(odd) = 1/2
4) P(prime) = 1/2
Independent Practice: Calculate the probability of each

independent event.
1) P(red) =
2) P(2) =
3) P(not red) =
4) P(even) =
Independent Practice: Answers
1) P(red) =1/2
2) P(2) = 1/4
3) P(not red) = 1/2
4) P(even) = 1/2
Real World Example:
A computer company manufactures 2,500 computers each day. An average of 100 of these computers are returned with defects. What is the probability that the computer you purchased is not defective?
$P(\text{not defective}) = \frac{\#\,\text{favorable outcomes}}{\#\,\text{possible outcomes}} = \frac{2400}{2500} = \frac{24}{25}$
Complementary Events
• The complement of an event E is the set of all outcomes in a sample space that are not included in event E.
• The complement of an event E is denoted by $E'$ or $\bar{E}$.
Properties of Probability:
$0 \le P(E) \le 1$
$P(E) + P(\bar{E}) = 1$
$P(E) = 1 - P(\bar{E})$
$P(\bar{E}) = 1 - P(E)$
• Example I: A sequence of 5 bits is randomly generated. What is the probability that at least one of these bits is zero?
• Solution: There are $2^5 = 32$ possible outcomes of generating such a sequence.
Define event E as "at least one of the bits is zero".
Then event $\bar{E}$, "none of the bits is zero", includes only one of these outcomes, namely the sequence 11111.
Therefore, $p(\bar{E}) = 1/32$.
Now p(E) can easily be computed as
$p(E) = 1 - p(\bar{E}) = 1 - 1/32 = 31/32$.
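The complement argument can be cross-checked by brute force. This is a small sketch that enumerates all 5-bit sequences, assuming each sequence is equally likely; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# All 2**5 sequences of 5 bits, each written as a tuple of "0"/"1" characters.
sequences = list(product("01", repeat=5))

# Complement event: none of the bits is zero (only the sequence 11111).
no_zero = [s for s in sequences if "0" not in s]
p_complement = Fraction(len(no_zero), len(sequences))

# Complement rule: p(E) = 1 - p(not E).
p_at_least_one_zero = 1 - p_complement

# Direct count of the event itself, for comparison.
direct_count = sum(1 for s in sequences if "0" in s)

print(p_complement, p_at_least_one_zero, direct_count)
```

Counting the complement needs one outcome; counting the event directly needs 31, which is why the complement rule is the easier route here.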
• Example II: What is the probability that at least two out of 36 people have the same birthday?
• Solution: The sample space S encompasses all possibilities for the birthdays of the 36 people, so $|S| = 365^{36}$.
Let us consider the event $\bar{E}$ ("no two people out of 36 have the same birthday").
$\bar{E}$ includes P(365, 36) outcomes (365 possibilities for the first person's birthday, 364 for the second, and so on).
Then $p(\bar{E}) = P(365, 36)/365^{36} = 0.168$,
so p(E) = 0.832
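The ratio P(365, 36)/365^36 is awkward to evaluate by hand but easy to compute as a running product. The sketch below assumes 365 equally likely birthdays and ignores leap years; the function name `prob_shared_birthday` is a hypothetical choice.

```python
def prob_shared_birthday(people, days=365):
    """Probability that at least two of `people` share a birthday."""
    p_all_distinct = 1.0
    for k in range(people):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    print(round(prob_shared_birthday(36), 3))
```

Multiplying the factors one at a time avoids forming the very large numbers 365^36 and P(365, 36) explicitly.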
The Multiplication Rule
• If events A and B are independent, then the probability of two events, A and B, occurring in a sequence (or simultaneously) is:
$P(A \cap B) = P(A) \cdot P(B)$
• This rule can extend to any number of independent events.
• Two events are independent if the occurrence of the first event does not affect the probability of the occurrence of the second event.
Mutually Exclusive
Two events A and B are mutually exclusive if and only if:
$P(A \cap B) = 0$
In a Venn diagram this means that event A is disjoint from event B.
[Figure: two Venn diagrams. Left: A and B do not overlap (A and B are M.E.). Right: A and B overlap (A and B are not M.E.).]
The Addition Rule
• The probability that at least one of the events A or B will occur, P(A or B), is given by:
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
• If events A and B are mutually exclusive, then the addition rule is simplified to:
$P(A \cup B) = P(A) + P(B)$
• This simplified rule can be extended to any number of mutually exclusive events.
The Addition and Multiplication Rule
• Example: What is the probability of a positive integer selected at random from the set of positive integers {1,2,…,100} to be divisible by 2 or 5?
• Solution:
E2: "integer is divisible by 2"
E5: "integer is divisible by 5"
• E2 = {2, 4, 6, …, 100} and |E2| = 50
p(E2) = 0.5
• E5 = {5, 10, 15, …, 100} and |E5| = 20
p(E5) = 0.2
$E_2 \cap E_5$ = {10, 20, 30, …, 100} and $|E_2 \cap E_5| = 10$
$p(E_2 \cap E_5) = 0.1$
$p(E_2 \cup E_5) = p(E_2) + p(E_5) - p(E_2 \cap E_5)$
$p(E_2 \cup E_5) = 0.5 + 0.2 - 0.1 = 0.6$
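The addition rule in this example can be verified with Python sets, where `|` is union and `&` is intersection. This is a sketch under the assumption that each integer from 1 to 100 is equally likely to be selected; the helper `p` is illustrative.

```python
from fractions import Fraction

numbers = range(1, 101)
e2 = {n for n in numbers if n % 2 == 0}  # divisible by 2
e5 = {n for n in numbers if n % 5 == 0}  # divisible by 5


def p(event):
    """Probability of an event (a set of integers) under equal likelihood."""
    return Fraction(len(event), len(numbers))


direct = p(e2 | e5)                        # count the union directly
by_rule = p(e2) + p(e5) - p(e2 & e5)       # addition rule

print(direct, by_rule)
```

Subtracting the intersection removes the multiples of 10, which would otherwise be counted twice.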
Conditional Probability
 We talk about conditional probability when the probability of one
event depends on whether or not another event has occurred.
 e.g. There are 2 red and 3 blue counters in a bag and, without
looking, we take out one counter and do not replace it.
 The probability of a 2nd counter taken from the bag being red
depends on whether the 1st was red or blue.

 Conditional probability problems can be solved by considering the
individual possibilities or by using a table, a Venn diagram, a tree
diagram or a formula.
Notation
$P(A \mid B)$ means "the probability that event A occurs given that B has occurred". This is conditional probability.
Example
e.g. 1. The following table gives data on the type of car, grouped by petrol consumption, owned by 100 people.

          Low   Medium   High   Total
Male       12     33       7
Female     23     21       4
Total      35                    100

One person is selected at random.
L is the event "the person owns a low rated car"
F is the event "a female is chosen".
Find (i) P(L) (ii) P(F and L) (iii) P(F | L)
We need to be careful which row or column we look at.

Solution:
(i) $P(L) = \frac{35}{100} = \frac{7}{20}$
(ii) $P(F \text{ and } L) = \frac{23}{100}$ (the probability of selecting a female with a low rated car)
(iii) $P(F \mid L) = \frac{23}{35}$ (the probability of selecting a female given the car is low rated)
We must be careful with the denominators in (ii) and (iii). Here we are given the car is low rated. We want the total of that column.
Notice that
$P(L) \times P(F \mid L) = \frac{7}{20} \times \frac{23}{35} = \frac{23}{100} = P(F \text{ and } L)$
So, P(F and L) = P(F | L) × P(L)
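The three probabilities come from the same table with different denominators, which a short script makes explicit. This sketch stores the counts from the example in a dictionary; the dictionary layout and variable names are illustrative assumptions.

```python
from fractions import Fraction

# Counts from the table: (sex, petrol-consumption rating) -> number of people.
counts = {
    ("Male", "Low"): 12, ("Male", "Medium"): 33, ("Male", "High"): 7,
    ("Female", "Low"): 23, ("Female", "Medium"): 21, ("Female", "High"): 4,
}

total = sum(counts.values())                                   # all 100 people
low_total = sum(n for (_, rating), n in counts.items() if rating == "Low")

p_low = Fraction(low_total, total)                             # (i)   P(L)
p_female_and_low = Fraction(counts[("Female", "Low")], total)  # (ii)  P(F and L)
# (iii) P(F | L): the denominator is the Low column total, not the grand total.
p_female_given_low = Fraction(counts[("Female", "Low")], low_total)

print(p_low, p_female_and_low, p_female_given_low)
```

The only difference between (ii) and (iii) is the denominator: the grand total for the joint probability, the column total for the conditional one.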
Conditional Probability
P(F and L) = P(F | L) × P(L)
 This result can be used to help solve harder conditional probability

problems.
 However, I haven’t proved the formula, just shown that it works for
one particular problem.
 We’ll just illustrate it again on a simple problem using a Venn

diagram.
e.g. 2. I have 2 packets of seeds. One contains 20 seeds and although they
look the same, 8 will give red flowers and 12 blue. The 2nd packet has 25
seeds of which 15 will be red and 10 blue.
Draw a Venn diagram and use it to illustrate the conditional probability formula.
Solution: Let R be the event “ Red flower ” and F be the event “ First packet ”
[Venn diagram: overlapping circles F and R inside the sample space. F only: 12 (blue in the 1st packet). F and R: 8 (red in the 1st packet). R only: 15 (red in the 2nd packet). Outside both: 10 (blue in the 2nd packet). Total: 20 + 25 = 45.]
$P(R \text{ and } F) = \frac{8}{45}$
$P(R \mid F) = \frac{8}{20}$
$P(F) = \frac{20}{45}$
$P(R \mid F) \times P(F) = \frac{8}{20} \times \frac{20}{45} = \frac{8}{45}$
So, P(R and F) = P(R | F) × P(F)
Summary
The probability that both event A and event B occur is given by
$P(A \text{ and } B) = P(A \mid B) \times P(B)$
We often use this in the form
$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$
In words, this is "the probability of event A given that B has occurred, equals the probability of both A and B occurring divided by the probability of B".
Reminder:
P(A and B) can also be written as $P(A \cap B)$
Example
• Three jars contain colored balls as described in the table below.
• One jar is chosen at random and a ball is selected. If the ball is red, what is the probability that it came from the 2nd jar?

Jar #   Red   White   Blue
1        3      4       1
2        1      2       3
3        4      3       2
Example
 We will define the following events:
 J1 is the event that first jar is chosen
 J2 is the event that second jar is chosen
 J3 is the event that third jar is chosen
 R is the event that a red ball is selected

Example
• The events J1, J2, and J3 are mutually exclusive
  - Why?
  - You can't choose two different jars at the same time
• Because of this, our sample space has been divided or partitioned along these three events
Venn Diagram
 Let’s look at the Venn Diagram
Venn Diagram
 All of the red balls are in the first, second, and
third jar so their set overlaps all three sets of our
partition
Finding Probabilities
• What are the probabilities for each of the events in our sample space?
• How do we find them?
$P(A \cap B) = P(A \mid B)\,P(B)$
Computing Probabilities
$P(J_1 \cap R) = P(R \mid J_1)\,P(J_1) = \frac{3}{8} \cdot \frac{1}{3} = \frac{1}{8}$
• Similar calculations show:
$P(J_2 \cap R) = P(R \mid J_2)\,P(J_2) = \frac{1}{6} \cdot \frac{1}{3} = \frac{1}{18}$
$P(J_3 \cap R) = P(R \mid J_3)\,P(J_3) = \frac{4}{9} \cdot \frac{1}{3} = \frac{4}{27}$
Venn Diagram
 Updating our Venn Diagram with these
probabilities:
Where are we going with this?
• Our original problem was:
  - One jar is chosen at random and a ball is selected. If the ball is red, what is the probability that it came from the 2nd jar?
• In terms of the events we've defined we want:
$P(J_2 \mid R) = \frac{P(J_2 \cap R)}{P(R)}$
Finding our Probability
• We already know what the numerator portion is from our Venn Diagram
• What is the denominator portion?
$P(J_2 \mid R) = \frac{P(J_2 \cap R)}{P(R)} = \frac{P(J_2 \cap R)}{P(J_1 \cap R) + P(J_2 \cap R) + P(J_3 \cap R)}$
Arithmetic!
• Plugging in the appropriate values:
$P(J_2 \mid R) = \frac{P(J_2 \cap R)}{P(J_1 \cap R) + P(J_2 \cap R) + P(J_3 \cap R)} = \frac{\frac{1}{18}}{\frac{1}{8} + \frac{1}{18} + \frac{4}{27}} = \frac{12}{71} \approx 0.17$
Bayes' Theorem:
$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$
$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{\sum_n P(B \mid A_n)\,P(A_n)}$
The important consequence of Bayes' Theorem is that it relates inverse probabilities: P(A|B) and P(B|A)
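Bayes' Theorem can be applied mechanically to the three-jar example. The sketch below assumes each jar is chosen with probability 1/3 and each ball within a jar is equally likely; the `jars` dictionary and the function names are illustrative, not from the slides.

```python
from fractions import Fraction

# Ball counts per jar, taken from the three-jar example.
jars = {
    1: {"red": 3, "white": 4, "blue": 1},
    2: {"red": 1, "white": 2, "blue": 3},
    3: {"red": 4, "white": 3, "blue": 2},
}
prior = Fraction(1, len(jars))  # each jar is equally likely to be chosen


def likelihood(color, jar):
    """P(color | jar): fraction of the balls in that jar with that color."""
    balls = jars[jar]
    return Fraction(balls[color], sum(balls.values()))


def posterior(jar, color):
    """P(jar | color) by Bayes' Theorem, summing over the jar partition."""
    evidence = sum(likelihood(color, j) * prior for j in jars)
    return likelihood(color, jar) * prior / evidence


print(posterior(2, "red"), float(posterior(2, "red")))
```

The denominator is the total probability of drawing red, built from the same three joint probabilities that appear in the hand calculation.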
Random Variables
• A random variable is a variable whose value is a numerical outcome of a random experiment
  - often denoted with capital alphabetic symbols (X, Y, etc.)
  - a normal random variable may be denoted as X ~ N(µ, σ)
• The probability distribution of a random variable X tells us what values X can take and how to assign probabilities to those values
Discrete Random Variables
• Random variables that have a finite or countably infinite list of possible outcomes, with probabilities assigned to each of these outcomes, are called discrete
• Discrete random variables
  - number of pets owned (0, 1, 2, … )
  - numerical day of the month (1, 2, …, 31)
  - the total number of tails you get if you flip 100 coins
Discrete example: roll of a die
[Figure: bar chart of p(x) against x = 1, 2, 3, 4, 5, 6, with every bar at height 1/6]
$\sum_{\text{all } x} p(x) = 1$
Probability Distribution Function (PDF)
x       p(x)
1       p(x=1)=1/6
2       p(x=2)=1/6
3       p(x=3)=1/6
4       p(x=4)=1/6
5       p(x=5)=1/6
6       p(x=6)=1/6
Total   1.0
Cumulative Distribution Function (CDF)
[Figure: step plot of P(x) against x = 1, ..., 6, rising through 1/6, 1/3, 1/2, 2/3, 5/6 to 1.0]
x    P(x≤A)
1    P(x≤1)=1/6
2    P(x≤2)=2/6
3    P(x≤3)=3/6
4    P(x≤4)=4/6
5    P(x≤5)=5/6
6    P(x≤6)=6/6
Examples
1. What’s the probability that you roll a 3 or less?
P(x≤3)=1/2
2. What’s the probability that you roll a 5 or

higher?
P(x≥5) = 1 – P(x≤4) = 1-2/3 = 1/3
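Both example answers follow from the cumulative table, which can be rebuilt from the individual probabilities. A minimal sketch for a fair six-sided die; the names `pmf` and `cdf` are illustrative.

```python
from fractions import Fraction

# Probability of each face of a fair die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def cdf(a):
    """P(x <= a): add up the probabilities of every face not exceeding a."""
    return sum((p for x, p in pmf.items() if x <= a), Fraction(0))


p_three_or_less = cdf(3)      # question 1
p_five_or_more = 1 - cdf(4)   # question 2, via the complement

print(p_three_or_less, p_five_or_more)
```

For "5 or higher" the complement of "4 or less" is used, matching the calculation in the example.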

Important discrete distributions in
epidemiology…
 Binomial
 Yes/no outcomes (dead/alive,
treated/untreated, smoker/non-smoker,
sick/well, etc.)
 Poisson
 Counts (e.g., how many cases of disease
in a given area)
Continuous Random Variables
 Random variables that can take on any
value in an interval, with probabilities given
as areas under a density curve, are called
continuous
 Continuous random variables
 weight
 temperature
Probability Density Function (PDF)
• The probability function that accompanies a continuous random variable is a continuous mathematical function that integrates to 1.
• The probabilities associated with continuous functions are just areas under the curve (integrals!).
• Probabilities are given for a range of values, rather than a particular value.
• For example, the negative exponential function (in probability, this is called an "exponential distribution"):
$f(x) = e^{-x}$
• This function integrates to 1:
$\int_0^{\infty} e^{-x}\,dx = \left. -e^{-x} \right|_0^{\infty} = 0 + 1 = 1$
[Figure: the curve p(x) = e^{-x}]
The probability that x is any exact particular value (such as 1.9976) is 0; we can only assign probabilities to possible ranges of x.
For example, the probability of x falling within 1 to 2:
[Figure: the curve p(x) = e^{-x} with the region between x = 1 and x = 2 marked]
$P(1 \le x \le 2) = \int_1^2 e^{-x}\,dx = \left. -e^{-x} \right|_1^2 = -e^{-2} + e^{-1} = -.135 + .368 = .23$
Cumulative Distribution Function (CDF)
As in the discrete case, we can specify the "cumulative distribution function" (CDF):
The CDF here is
$P(X \le A) = \int_0^A e^{-x}\,dx = \left. -e^{-x} \right|_0^A = -e^{-A} + e^{0} = 1 - e^{-A}$
[Figure: the curve p(x) with the region from 0 to x = 2 marked]
$P(x \le 2) = 1 - e^{-2} = 1 - .135 = .865$
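The integrals for the density e^{-x} can be checked numerically. The sketch below compares the closed-form CDF 1 - e^{-A} with a simple midpoint-rule integration; the function names and the number of integration steps are illustrative assumptions.

```python
import math


def exp_pdf(x):
    """Density f(x) = e^{-x} for x >= 0, and 0 otherwise."""
    return math.exp(-x) if x >= 0 else 0.0


def exp_cdf(a):
    """Closed form P(X <= a) = 1 - e^{-a} for a >= 0."""
    return 1.0 - math.exp(-a) if a >= 0 else 0.0


def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f from lo to hi."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


if __name__ == "__main__":
    print(round(integrate(exp_pdf, 1, 2), 3))   # area between 1 and 2
    print(round(exp_cdf(2), 3))                 # P(x <= 2)
```

The area between 1 and 2 is also `exp_cdf(2) - exp_cdf(1)`, which ties the range probability back to the CDF.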
Uniform Density
The uniform distribution: all values are equally likely
The uniform distribution:
$f(x) = 1$, for $0 \le x \le 1$
[Figure: p(x) is a flat line at height 1 between x = 0 and x = 1]
We can see it's a probability distribution because it integrates to 1 (the area under the curve is 1):
$\int_0^1 1\,dx = \left. x \right|_0^1 = 1 - 0 = 1$
Uniform Density
What's the probability that x is between ¼ and ½?
[Figure: the flat density p(x) with the region between x = ¼ and x = ½ marked]
P(1/4 ≤ x ≤ 1/2) = ¼
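The Normal Density Function
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
This is a bell shaped curve with different centers and spreads depending on $\mu$ and $\sigma$
Note constants:
$\pi$ = 3.14159
e = 2.71828
It's a probability function, so no matter what the values of $\mu$ and $\sigma$, it must integrate to 1!
$\int_{-\infty}^{+\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx = 1$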
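The Shape of Normal Density
Normal distribution is bell shaped, and symmetrical around µ.
[Figure: bell curve centered at µ = 100 with the points 90 and 110 marked]
Why symmetrical? Let µ = 100. Suppose x = 110. Now suppose x = 90.
$f(110) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{110-100}{\sigma}\right)^2} = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{10}{\sigma}\right)^2}$
$f(90) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{90-100}{\sigma}\right)^2} = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{-10}{\sigma}\right)^2}$
Squaring removes the sign of the deviation, so f(110) = f(90).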
Normal Probability Density
• The expected value (also called the mean) E(X) (or µ) can be any number
• The standard deviation σ can be any positive number
• The total area under every normal curve is 1
• There are infinitely many normal distributions
Normal Probability Density
Total area =1; symmetric around µ
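Both properties, total area 1 and symmetry around µ, can be checked numerically from the density formula. In this sketch µ = 100 follows the symmetry example, while σ = 15 is an assumed value chosen only for illustration; the integration range of ten standard deviations either side is also an assumption.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma > 0."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def area_under_curve(mu, sigma, half_width=10, steps=200_000):
    """Midpoint-rule area over mu +/- half_width standard deviations."""
    lo = mu - half_width * sigma
    hi = mu + half_width * sigma
    width = (hi - lo) / steps
    return sum(normal_pdf(lo + (i + 0.5) * width, mu, sigma)
               for i in range(steps)) * width


if __name__ == "__main__":
    mu, sigma = 100, 15  # sigma = 15 is an illustrative assumption
    print(normal_pdf(110, mu, sigma), normal_pdf(90, mu, sigma))
    print(area_under_curve(mu, sigma))
```

Changing `mu` slides the curve left or right and changing `sigma` widens or narrows it, but the computed area should stay at 1 in every case.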

The effects of µ and σ
How does the standard deviation affect the shape of f(x)?
[Figure: normal curves for σ = 2, σ = 3, and σ = 4]
How does the expected value affect the location of f(x)?
[Figure: normal curves for µ = 10, µ = 11, and µ = 12]
Statistical Measures
Statistical Measures
 Center of the data
 Mean
 Median
 Variation
 Range
 Quartiles
 Variance
 Standard Deviation
 Covariance
 Correlation
Mean or Average or Expectation
• Traditional measure of center
• Sum the values and divide by the number of values
$E(\vec{x}) = \frac{1}{n}\left(\vec{x}_1 + \vec{x}_2 + \cdots + \vec{x}_n\right) = \frac{1}{n}\sum_{i=1}^{n} \vec{x}_i$
• In general
$E(\vec{x}) = p_1\vec{x}_1 + p_2\vec{x}_2 + \cdots + p_n\vec{x}_n = \sum_{i=1}^{n} p_i\,\vec{x}_i$
Mean or Average
Mean = $\frac{1}{11}$ [(1,2) + (3,4) + (5,6) + (2,4) + (1,1) + (4,2) + (6,5) + (3,1) + (2,1) + (5,3) + (5,5)] = (3.3636, 3.0909)
[Figure: scatter plot of the 11 points with the mean (3.3636, 3.0909) marked]
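The mean of two-dimensional data is taken coordinate by coordinate. A minimal sketch using the 11 points from this example; the variable names are illustrative.

```python
# The 11 two-dimensional data points from the example.
points = [(1, 2), (3, 4), (5, 6), (2, 4), (1, 1), (4, 2),
          (6, 5), (3, 1), (2, 1), (5, 3), (5, 5)]

n = len(points)
# Average the x-coordinates and the y-coordinates separately.
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n

print(round(mean_x, 4), round(mean_y, 4))
```

The resulting pair is the center of the point cloud and need not coincide with any observed point.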
Median (M)
 A resistant measure of the data’s center
 At least half of the ordered values are less than or equal to
the median value
 At least half of the ordered values are greater than or equal
to the median value
 If n is odd, the median is the middle ordered value
 If n is even, the median is the average of the two middle
ordered values
Median (M)
Location of the median: L(M) = (n+1)/2 ,
where n = sample size.
Example: If 25 data values are recorded, the Median would

be the
(25+1)/2 = 13th ordered value.
Median
• Example 1 data: 2 4 6
Median (M) = 4
• Example 2 data: 2 4 6 8
Median = 5 (average of 4 and 6)
• Example 3 data: 6 2 4
Median ≠ 2
(order the values: 2 4 6, so Median = 4)
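The rule for odd and even sample sizes translates directly into code. This is a small sketch; the function name `median_of` is a hypothetical choice, and sorting a copy first guards against the mistake shown in Example 3.

```python
def median_of(values):
    """Median: middle ordered value, or the average of the two middle ones."""
    ordered = sorted(values)  # the data must be ordered first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


if __name__ == "__main__":
    print(median_of([2, 4, 6]), median_of([2, 4, 6, 8]), median_of([6, 2, 4]))
```

For n = 25 the index `mid` is 12 in zero-based counting, which is the 13th ordered value, in agreement with L(M) = (n+1)/2.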
Comparing the Mean & Median
• Computation of the mean is easier.
• Finding the median in higher dimensions is much more complex.
• The mean is sensitive to noise, while the median is a resistant measure.
• The mean and median of data from a symmetric distribution should be close together. The actual (true) mean and median of a symmetric distribution are exactly the same.
Spread or Variability
• If all values are the same, then they are all equal to the mean. There is no variability.
 Eg: 2, 2, 2, 2, 2, 2; mean = 2
 Variability exists when some values are different from

(above or below) the mean.
 Eg: 10, 15,-20,-22,30, 22
 We will discuss the following measures of spread: range,

quartiles, variance, and standard deviation
Range
• One way to measure spread is to give the smallest (minimum) and largest (maximum) values in the data set;
Range = max − min
• Eg: 10,-2,-7,22,0,11; Range = 22-(-7) = 29
• The range is strongly affected by outliers
Quartiles
 Three numbers which divide the ordered data into four equal
sized groups.
 Q1 has 25% of the data below it.
 Q2 has 50% of the data below it. (Median)
 Q3 has 75% of the data below it.
Quartiles Uniform Distribution
[Figure: uniform distribution split into 1st, 2nd, 3rd, and 4th quarters by Q1, Q2, and Q3]
Obtaining the Quartiles
 Order the data.
 For Q2, just find the median.
 For Q1, look at the lower half of the data values, those to the left
of the median location; find the median of this lower half.
 For Q3, look at the upper half of the data values, those to the
right of the median location; find the median of this upper half.
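The steps above can be sketched as follows. This uses the median-of-halves convention described in the list, with the middle value excluded from both halves when n is odd; other software may use different quartile conventions. The function name `quartiles` and the sample data are illustrative assumptions.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)          # step 1: order the data
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # values left of the median location
    upper = ordered[half + n % 2:]    # values right of the median location
    return median(lower), median(ordered), median(upper)


if __name__ == "__main__":
    print(quartiles([7, 1, 5, 3, 2, 6, 4]))     # odd sample size
    print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))  # even sample size
```

The expression `half + n % 2` skips the middle value when n is odd, so that value belongs to neither half.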
Variance and Standard Deviation
• Recall that variability exists when some values are different from (above or below) the mean.
• Each data value has an associated deviation from the mean:
$x_i - \bar{x}$
Deviations
 what is a typical deviation from the mean?
(standard deviation)
 small values of this typical deviation indicate
small variability in the data
 large values of this typical deviation indicate
large variability in the data
Variance
Variance is the average squared deviation from the mean of a set of data. It is used to find the standard deviation.
[Figure sequence: find the mean, take the difference between each data point and the mean, square each difference, add the squares, and divide by the number of data points]
Variance Formula
$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$
Standard Deviation
$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$
[ standard deviation = square root of the variance ]
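The two formulas can be evaluated directly. The sketch below uses the metabolic-rate data from the worked example in this section and divides by n, as in the formula above (many textbooks divide by n - 1 for a sample, which gives a slightly larger value). Variable names are illustrative.

```python
import math

# Metabolic rates of 7 men (cal./24hr.), from the worked example.
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

n = len(rates)
mean = sum(rates) / n
deviations = [x - mean for x in rates]
# Average squared deviation, dividing by n as in the formula above.
variance = sum(d ** 2 for d in deviations) / n
std_dev = math.sqrt(variance)

print(mean, round(variance, 2), round(std_dev, 2))
```

The deviations always add up to zero, which is why they are squared before averaging.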

Metabolic rates of 7 men (cal./24hr.) :
1792 1666 1362 1614 1460 1867 1439
$\bar{x} = \frac{1792 + 1666 + 1362 + 1614 + 1460 + 1867 + 1439}{7} = \frac{11{,}200}{7} = 1600$

Observations    Deviations              Squared deviations
$x_i$           $x_i - \bar{x}$         $(x_i - \bar{x})^2$
1792            1792 − 1600 = 192       (192)² = 36,864
1666            1666 − 1600 = 66        (66)² = 4,356
1362            1362 − 1600 = -238      (-238)² = 56,644
1614            1614 − 1600 = 14        (14)² = 196
1460            1460 − 1600 = -140      (-140)² = 19,600
1867            1867 − 1600 = 267       (267)² = 71,289
1439            1439 − 1600 = -161      (-161)² = 25,921
                sum = 0                 sum = 214,870

$\sigma^2 = \frac{214{,}870}{7} = 30695.71$
$\sigma = \sqrt{30695.71} = 175.20$ calories
Variance (2D)
[Figure sequence: variance of two-dimensional data]
Variance doesn't explore the relationship between variables
Covariance
$\text{Variance}(x) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})$
$\text{Covariance}(x, y) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
• Covariance(x, x) = var(x)
• Covariance(x, y) = Covariance(y, x)
Covariance
$\text{Covariance}(x, y) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
[Figure sequence: scatter plots divided into four quadrants by the lines $x = \bar{x}$ and $y = \bar{y}$, with the sign of $x_i - \bar{x}$ and $y_i - \bar{y}$ marked for sample points]
Positive Relation: most points lie in the quadrants where $x_i - \bar{x}$ and $y_i - \bar{y}$ have the same sign, so $(x_i - \bar{x})(y_i - \bar{y}) > 0$.
Negative Relation: most points lie in the quadrants where $x_i - \bar{x}$ and $y_i - \bar{y}$ have opposite signs, so $(x_i - \bar{x})(y_i - \bar{y}) < 0$.
No Relation: points fall in all four quadrants, so products with $(x_i - \bar{x})(y_i - \bar{y}) > 0$ and products with $(x_i - \bar{x})(y_i - \bar{y}) < 0$ both occur.
Covariance
$\text{Covariance}(x, y) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$

$(x, y)$             $(x - \bar{x},\ y - \bar{y})$
(2 , 1)              (-2.4545, -2.8182)
(2 , 2)              (-2.4545, -1.8182)
(4 , 3)              (-0.4545, -0.8182)
(6 , 1)              (1.5455, -2.8182)
(8 , 3)              (3.5455, -0.8182)
(1 , 5)              (-3.4545, 1.1818)
(4 , 6)              (-0.4545, 2.1818)
(4 , 7)              (-0.4545, 3.1818)
(6 , 3)              (1.5455, -0.8182)
(6 , 5)              (1.5455, 1.1818)
(6 , 6)              (1.5455, 2.1818)
Mean:
(4.4545, 3.8182)     (0, 0)

$\text{Covariance}(x, y) = \frac{1}{11}\,(x - \bar{x})^T (y - \bar{y})$
$\text{Covariance}(x, y) = E\left[(x - \bar{x})^T (y - \bar{y})\right]$
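The covariance of the 11 points in the table can be computed from the deviation pairs. This sketch divides by n, matching the formula in this section; the function name `covariance` is illustrative.

```python
# The (x, y) observations from the table.
xs = [2, 2, 4, 6, 8, 1, 4, 4, 6, 6, 6]
ys = [1, 2, 3, 1, 3, 5, 6, 7, 3, 5, 6]


def covariance(a, b):
    """Average product of paired deviations from the means (divides by n)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n


cov_xy = covariance(xs, ys)
var_x = covariance(xs, xs)  # Covariance(x, x) = var(x)

print(round(cov_xy, 4), round(var_x, 4))
```

The covariance here is small and positive, so positive and negative products nearly cancel for this data set.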
Covariance Matrix
$$\text{Cov}(\Sigma) = \begin{pmatrix} cov(x_1, x_1) & cov(x_1, x_2) & \cdots & cov(x_1, x_n) \\ cov(x_2, x_1) & cov(x_2, x_2) & \cdots & cov(x_2, x_n) \\ \vdots & \vdots & \ddots & \vdots \\ cov(x_n, x_1) & cov(x_n, x_2) & \cdots & cov(x_n, x_n) \end{pmatrix}$$
• Diagonal elements are variances, i.e. Cov(x, x) = var(x).
• Covariance Matrix is symmetric.
• It is a positive semi-definite matrix.
Correlation
[Figure: three scatter plots labeled Positive relation, Negative relation, and No relation]
• Covariance determines whether the relation is positive or negative, but it cannot measure the degree to which the variables are related.
• Correlation is another way to determine how two variables are related.
• In addition to whether variables are positively or negatively related, correlation also tells the degree to which the variables are related to each other.
Correlation
$\rho_{xy} = \text{Correlation}(x, y) = \frac{cov(x, y)}{\sqrt{var(x)}\,\sqrt{var(y)}}$
$-1 \le \text{Correlation}(x, y) \le +1$
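Dividing the covariance by the two standard deviations rescales it into the range from -1 to +1. The sketch below applies this to the 11 points from the covariance example; the helper names are illustrative, and the code assumes neither variable is constant (otherwise the denominator is zero).

```python
import math

xs = [2, 2, 4, 6, 8, 1, 4, 4, 6, 6, 6]
ys = [1, 2, 3, 1, 3, 5, 6, 7, 3, 5, 6]


def covariance(a, b):
    """Average product of paired deviations from the means (divides by n)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n


def correlation(a, b):
    """Covariance divided by the product of the standard deviations."""
    return covariance(a, b) / math.sqrt(covariance(a, a) * covariance(b, b))


print(round(correlation(xs, ys), 4))
```

Unlike covariance, the result does not change if either variable is rescaled by a positive constant, which is what makes it a measure of the degree of relation.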
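Multivariate Gaussians (or "multinormal distribution" or "multivariate normal distribution")
Univariate case: single mean $\mu$ and variance $\sigma^2$
$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
Multivariate case: vector of observations x, vector of means $\mu$ and covariance matrix $\Sigma$
$p(x) = \frac{1}{(2\pi)^{m/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$
where m is the dimension of x and $|\Sigma|$ is the determinant of $\Sigma$.
Multivariate Gaussians
In both the univariate case and the multivariate case, the normalization constants in front of the exponential do not depend on x. The quadratic term inside the exponential depends on x and is positive.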
The mean vector
$$\mu = E(x) = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_m \end{pmatrix}$$
Covariance of two random variables
• Recall for two random variables $x_i$, $x_j$
$\sigma_{ij}^2 = Cov(x_i, x_j) = E[(x_i - \mu_i)(x_j - \mu_j)] = E(x_i x_j) - E(x_i)E(x_j)$
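The covariance matrix
$$\Sigma = E[(x - \mu)(x - \mu)^T] = E\left[\begin{pmatrix} x_1 - \mu_1 \\ \vdots \\ x_m - \mu_m \end{pmatrix} \begin{pmatrix} x_1 - \mu_1 & \cdots & x_m - \mu_m \end{pmatrix}\right] = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1m} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{m1} & \sigma_{m2} & \cdots & \sigma_m^2 \end{pmatrix}$$
where T is the transpose operator and $Var(x_m) = Cov(x_m, x_m)$.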
An example: 2 variate case
The pdf of the multivariate will be:
[Equation with the covariance matrix and its determinant labeled]
An example: 2 variate case
[Equation: the pdf factorized into two independent Gaussians]
Factorized into two independent Gaussians! They are independent!
Recall that in the general case independence implies uncorrelatedness, but uncorrelatedness does not necessarily imply independence.
The multivariate Gaussian is a special case where uncorrelatedness implies independence as well.
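The factorization can be checked numerically for the 2 variate case. The sketch below implements the standard bivariate Gaussian density with an explicit 2×2 inverse and determinant, then compares it with the product of two univariate densities. The specific means, variances, and evaluation point are assumed values chosen only for illustration.

```python
import math


def normal_pdf(x, mu, var):
    """Univariate Gaussian density with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def bivariate_pdf(x, mu, cov):
    """Bivariate Gaussian density for a 2x2 covariance matrix cov."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx0, dx1 = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T cov^{-1} (x - mu), using the 2x2 inverse.
    quad = (d * dx0 * dx0 - (b + c) * dx0 * dx1 + a * dx1 * dx1) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))


if __name__ == "__main__":
    point, mu = (1.0, 1.0), (0.0, 0.0)       # illustrative values
    diagonal = ((2.0, 0.0), (0.0, 3.0))      # zero off-diagonal terms
    joint = bivariate_pdf(point, mu, diagonal)
    product = normal_pdf(1.0, 0.0, 2.0) * normal_pdf(1.0, 0.0, 3.0)
    print(joint, product)
```

With a diagonal covariance matrix the joint density equals the product of the two marginals; adding a non-zero off-diagonal term breaks that equality.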
Diagonal covariance matrix
If all the variables are independent from each other, the covariance matrix will be a diagonal one.
For Gaussian variables the reverse is also true: if the covariance matrix is a diagonal one, they are independent.
$$\Sigma = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}$$
Diagonal matrix: a matrix where the off-diagonal terms are zero
$\sigma_{ij}^2 = E[(x_i - \mu_i)(x_j - \mu_j)] = 0$ for $i \ne j$
Gaussian Intuitions: Size of Σ
[Figure: three Gaussians, each with µ = [0 0], for Σ = I (identity matrix), Σ = 0.6I, and Σ = 2I]
As Σ becomes larger, the Gaussian becomes more spread out
Gaussian Intuitions: Off-diagonal
As the off-diagonal entries increase, there is more correlation between the value of x and the value of y
Gaussian Intuitions: off-diagonal and diagonal
• Decreasing non-diagonal entries (#1-2)
• Increasing variance of one dimension in diagonal (#3)