Random Variables and Probability Distribution
[Figure: pmf of a fair die: p(x) = 1/6 for each of x = 1, 2, 3, 4, 5, 6]

Σ_{all x} P(x) = 1
Probability mass function (pmf)
The probabilities associated with all values
must be non-negative and sum up to 1.
x     p(x)
1     p(x=1) = 1/6
2     p(x=2) = 1/6
3     p(x=3) = 1/6
4     p(x=4) = 1/6
5     p(x=5) = 1/6
6     p(x=6) = 1/6
Sum   1.0
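A minimal Python sketch (standard library only; the variable names are illustrative) checking both pmf conditions for the die:

    # Fair-die pmf: every probability is non-negative and they sum to 1
    pmf = {x: 1 / 6 for x in range(1, 7)}

    assert all(p >= 0 for p in pmf.values())      # non-negativity
    assert abs(sum(pmf.values()) - 1.0) < 1e-9    # probabilities sum to 1
    print(sum(pmf.values()))                      # 1.0, up to floating-point rounding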
Problem - PMF
Let X be a discrete random variable with the
following PMF
Problem - PMF: Solution
Cumulative distribution function (CDF)
• The cumulative distribution function (CDF) of a random
variable is another method to describe the distribution of
random variables.
• The advantage of the CDF is that it can be defined for any
kind of random variable (discrete, continuous, and
mixed).
• The cumulative distribution function (CDF) of random variable X is defined as F(A) = P(X ≤ A).
[Figure: CDF of a fair die, a step function rising by 1/6 at each of x = 1, ..., 6, from 1/6 up to 1.0]
Cumulative distribution function (CDF)

x     P(x ≤ A)
1     P(x ≤ 1) = 1/6
2     P(x ≤ 2) = 2/6
3     P(x ≤ 3) = 3/6
4     P(x ≤ 4) = 4/6
5     P(x ≤ 5) = 5/6
6     P(x ≤ 6) = 6/6
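The CDF column is just the running sum of the pmf; a minimal sketch (standard library only):

    # Build the die CDF by cumulative summation of the pmf
    from itertools import accumulate

    pmf = [1 / 6] * 6                    # p(x) for x = 1..6
    cdf = list(accumulate(pmf))          # P(x <= A) = running sum of p
    for x, F in enumerate(cdf, start=1):
        print(x, round(F, 3))            # 1 0.167 ... 6 1.0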
Problem - CDF
Suppose you toss a coin twice. Let X be the number of observed heads. Find the CDF of X.
Problem - CDF: Solution
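Each toss is heads with probability 1/2, so X takes the values 0, 1, 2 with p(0) = 1/4, p(1) = 1/2, p(2) = 1/4. The CDF is therefore:

F(x) = P(X ≤ x) = 0 for x < 0
                = 1/4 for 0 ≤ x < 1
                = 3/4 for 1 ≤ x < 2
                = 1 for x ≥ 2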
Additional Examples
1. What’s the probability that you roll a 3 or less?
P(x≤3)=1/2
2. Which of the following are valid probability functions?
Answer (b)
b. f(x) = (3-x)/2 for x = 1, 2, 3, 4

x     f(x)
1     (3-1)/2 = 1.0
2     (3-2)/2 = .5
3     (3-3)/2 = 0
4     (3-4)/2 = -.5

Though this sums to 1, you can't have a negative probability; therefore, it's not a probability function.
Answer (c)
c. f(x) = (x² + x + 1)/25 for x = 0, 1, 2, 3

x     f(x)
0     1/25
1     3/25
2     7/25
3     13/25
Sum   24/25

Doesn't sum to 1. Thus, it's not a probability function.
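The same two conditions can be checked programmatically; a small sketch (standard library only; the helper is_pmf is hypothetical):

    # Test candidates (b) and (c) against the pmf conditions
    def is_pmf(values):
        return all(v >= 0 for v in values) and abs(sum(values) - 1.0) < 1e-9

    b = [(3 - x) / 2 for x in (1, 2, 3, 4)]           # sums to 1, but contains -0.5
    c = [(x**2 + x + 1) / 25 for x in (0, 1, 2, 3)]   # non-negative, but sums to 24/25

    print(is_pmf(b), is_pmf(c))  # False False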
Practice Problem:
The number of ships to arrive at a harbor on any given day is a random variable represented by x. The probability distribution for x is:

x      10   11   12   13   14
P(x)   .4   .2   .2   .1   .1
Continuous case: "probability density function" (pdf)

Example: p(x) = e^(-x) for x ≥ 0. This is a legitimate pdf because it is non-negative everywhere and integrates to 1:

∫₀^∞ e^(-x) dx = [-e^(-x)]₀^∞ = -e^(-∞) + e^0 = 0 + 1 = 1

[Figure: the density p(x) = e^(-x) plotted against x]
For example, the probability that x falls between 1 and 2:

P(1 ≤ x ≤ 2) = ∫₁² e^(-x) dx = [-e^(-x)]₁² = -e^(-2) + e^(-1) = .368 - .135 = .23
Cumulative distribution function
As in the discrete case, we can specify the “cumulative
distribution function” (CDF):
F(A) = P(x ≤ A) = ∫₀^A e^(-x) dx = [-e^(-x)]₀^A = -e^(-A) + e^0 = 1 - e^(-A)
Example
P(x ≤ 2) = F(2) = 1 - e^(-2) = 1 - .135 = .865
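A quick numeric check of both results (standard library only):

    # Exponential pdf p(x) = e^(-x): spot-check the two probabilities above
    import math

    p_between = math.exp(-1) - math.exp(-2)   # P(1 <= x <= 2)
    print(round(p_between, 3))                # 0.233

    F = lambda a: 1 - math.exp(-a)            # CDF: F(A) = 1 - e^(-A)
    print(round(F(2), 3))                     # 0.865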
Expected Value and Variance
[Figure: a distribution with the mean (μ) marked and one standard deviation (σ) shown on either side of the mean]
Expected value, or mean
If we understand the underlying probability function of a phenomenon, then we can make informed decisions based on how we expect X to behave on average over the long run (the so-called "frequentist" theory of probability).
Discrete case:
E(X) = Σ_{all x} x_i p(x_i)

Continuous case:
E(X) = ∫_{all x} x p(x) dx
Empirical Mean is a special case of
Expected Value…
X̄ = Σ_{i=1}^{n} x_i / n = Σ_{i=1}^{n} x_i · (1/n)

i.e., the empirical mean is the expected value computed with p(x_i) = 1/n for each observation.
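A short sketch of this connection (numpy is an assumption, not part of the original slides): the empirical mean of many simulated die rolls approaches E(X).

    # E(X) from the pmf vs. the empirical mean of simulated rolls
    import numpy as np

    values = np.arange(1, 7)
    pmf = np.full(6, 1 / 6)

    expected = np.sum(values * pmf)                        # E(X) = 3.5
    rolls = np.random.choice(values, size=100_000, p=pmf)
    print(expected, round(rolls.mean(), 3))                # 3.5, approx 3.5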
Extension to continuous case: uniform distribution

For the uniform distribution on [0, 1], p(x) = 1:

E(X) = ∫₀¹ x(1) dx = [x²/2]₀¹ = 1/2 - 0 = 1/2

[Figure: the flat density p(x) = 1 on the interval [0, 1]]
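A one-line Monte Carlo check (numpy assumed):

    # Uniform[0, 1]: the sample mean of many draws approaches E(X) = 1/2
    import numpy as np

    print(round(np.random.uniform(0, 1, 1_000_000).mean(), 3))  # approx 0.5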
Symbol Interlude
E(X) = µ
◦ these symbols are used interchangeably
Expected Value
Expected value is an extremely
useful concept for good decision-
making!
Example: the lottery
The Lottery
A certain weekly lottery works by picking 6 numbers from 1 to 49. It costs Rs. 1.00 to play the lottery, and if you win, you win Rs. 20 lakhs (Rs. 2.0 × 10⁶) after taxes.
P(win) = 1 / (49 choose 6) = 1 / (49! / (43! 6!)) = 1 / 13,983,816 ≈ 7.2 × 10⁻⁸

"49 choose 6": out of 49 numbers, this is the number of distinct combinations of 6.

The probability function (note, it sums to 1.0):
x (Rs)         p(x)
-1             .999999928
+2,000,000     7.2 × 10⁻⁸
Expected Value
E(X) = P(win) × Rs. 2,000,000 + P(lose) × (-Rs. 1.00)
= (7.2 × 10⁻⁸)(2.0 × 10⁶) + (.999999928)(-1) = .144 - .999999928 ≈ -Rs. 0.86
Negative expected value is never good!
You shouldn’t play if you expect to lose money!
Expected Value
If you play the lottery every week for 10 years, what are your
expected winnings or losses?
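By linearity of expectation, 10 years of weekly play is 52 × 10 = 520 plays, so the expected total is 520 × (-Rs. 0.86) ≈ -Rs. 447: an expected loss of about Rs. 447.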
Variance

Var(X) = σ² = E[(x - μ)²] = Σ_{all x} (x_i - μ)² p(x_i)

Continuous case:

Var(X) = σ² = ∫_{all x} (x - μ)² p(x) dx
Similarity to empirical
variance
s² = Σ_{i=1}^{n} (x_i - x̄)² / (n - 1) = Σ_{i=1}^{n} (x_i - x̄)² · (1 / (n - 1))
σ = √.997 ≈ .99
The standard deviation is Rs. 0.99. Interpretation: on average, you're about one rupee above or one rupee below the mean, which is just under zero. Makes sense!
Handy calculation formula!
Var(X) = Σ_{all x} (x_i - μ)² p(x_i) = Σ_{all x} x_i² p(x_i) - μ²
= E(x²) - μ²
= E(x²) - [E(x)]²
For example, what's the variance and standard deviation of the roll of a die?

x     p(x)
1     p(x=1) = 1/6
2     p(x=2) = 1/6
3     p(x=3) = 1/6
4     p(x=4) = 1/6
5     p(x=5) = 1/6
6     p(x=6) = 1/6
Sum   1.0

[Figure: the flat pmf of a die, p(x) = 1/6 for x = 1, ..., 6, with the mean marked at 3.5 and the "average distance from the mean" indicated]
E(x) = Σ_{all x} x_i p(x_i) = (1)(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 21/6 = 3.5

E(x²) = Σ_{all x} x_i² p(x_i) = (1)(1/6) + 4(1/6) + 9(1/6) + 16(1/6) + 25(1/6) + 36(1/6) = 91/6 ≈ 15.17
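Finishing with the handy formula: Var(X) = E(x²) - [E(x)]² = 15.17 - (3.5)² = 15.17 - 12.25 ≈ 2.92, so σ = √2.92 ≈ 1.71.

The same calculation as a minimal Python sketch (numpy is an assumption, not part of the original slides):

    # Die roll: variance via the shortcut formula E(x^2) - [E(x)]^2
    import numpy as np

    x = np.arange(1, 7)          # faces 1..6
    p = np.full(6, 1 / 6)        # p(x) = 1/6 each

    mean = np.sum(x * p)         # E(x) = 3.5
    mean_sq = np.sum(x**2 * p)   # E(x^2) = 91/6, approx 15.17
    var = mean_sq - mean**2      # 2.92
    print(round(var, 2), round(np.sqrt(var), 2))  # 2.92 1.71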
Var(c + X) = Var(X)
Adding a constant to every instance of a random variable doesn’t
change the variability. It just shifts the whole distribution by c. If
everybody grew 5 inches suddenly, the variability in the population
would still be the same.
Var(cX) = c²Var(X)
Multiplying each instance of the random variable by c makes the distribution c times as wide, which corresponds to c² as much variance (deviation squared).
For example, if everyone suddenly became twice as tall, there’d be twice
the deviation and 4 times the variance in heights in the population.
Var(X + Y) = Var(X) + Var(Y) ONLY IF X and Y are independent!
With two random variables, you have more opportunity for variation, unless they vary together (are dependent, or have covariance):
Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)
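A quick simulation of both rules (numpy, the seed, and the chosen distributions are assumptions for illustration):

    # Var(X + Y) for independent vs. dependent random variables
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(0, 1, 1_000_000)   # Var(X) = 1
    y = rng.normal(0, 2, 1_000_000)   # Var(Y) = 4, independent of X

    print(round(np.var(x + y), 2))    # approx 5.0 = Var(X) + Var(Y)

    z = x + y                          # z covaries with x: Cov(X, Z) = Var(X) = 1
    print(round(np.var(x + z), 2))    # approx 8.0 = Var(X) + Var(Z) + 2 Cov(X, Z) = 1 + 5 + 2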
Practice Problem
You toss a coin 100 times. What’s
the expected number of heads?
What’s the variance of the number
of heads?
Answer: expected value
Intuitively, we’d probably all agree that we expect around 50 heads, right?
Another way to show this
Think of tossing 1 coin. Let X = the number of heads on one toss. Then
E(X) = (1)P(heads) + (0)P(tails) = 1(.5) + 0 = .5
If we do this 100 times, we’re looking for the sum of 100 tosses,
where we assign 1 for a heads and 0 for a tails. (these are 100
“independent, identically distributed (i.i.d)” events)
E(X1 + X2 + X3 + ... + X100) = E(X1) + E(X2) + E(X3) + ... + E(X100)
= 100 E(X1) = 100(.5) = 50
Answer: variance
What’s the variability, though? More tricky. But, again, we
could do this for 1 coin and then use our rules of variance.
Think of tossing 1 coin.
E(X² = number of heads squared) = 1²P(heads) + 0²P(tails)
E(X²) = 1(.5) + 0 = .5
Var(X) = .5 - .5² = .5 - .25 = .25
Then, using our rule: Var(X + Y) = Var(X) + Var(Y) (coin tosses are independent!)
Var(X1 + X2 + X3 + ... + X100) = Var(X1) + Var(X2) + Var(X3) + ... + Var(X100) = 100 Var(X1) = 100(.25) = 25
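A simulation check (numpy assumed; the seed and sample size are arbitrary):

    # 100 fair coin tosses: expect about 50 heads, with variance about 25
    import numpy as np

    rng = np.random.default_rng(1)
    heads = rng.binomial(n=100, p=0.5, size=200_000)      # heads per 100-toss experiment
    print(round(heads.mean(), 1), round(heads.var(), 1))  # approx 50.0 25.0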
Covariance
σ_xy = E[(x - μ_x)(y - μ_y)] = Σ_{i=1}^{N} (x_i - μ_x)(y_i - μ_y) P(x_i, y_i)
The Sample Covariance
The sample covariance:
cov(x, y) = Σ_{i=1}^{n} (x_i - x̄)(y_i - ȳ) / (n - 1)
Interpreting Covariance
Covariance between two random variables:
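cov(X, Y) > 0: X and Y tend to move in the same direction.
cov(X, Y) < 0: X and Y tend to move in opposite directions.
cov(X, Y) = 0: X and Y are uncorrelated (no linear relationship; independence implies zero covariance, but zero covariance does not imply independence).

A small sketch of the sign of the sample covariance (numpy and the simulated data are assumptions for illustration):

    # Sample covariance: positive when y moves with x, negative when it moves against x
    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(0, 1, 10_000)
    y_pos = 2 * x + rng.normal(0, 1, 10_000)    # moves with x
    y_neg = -2 * x + rng.normal(0, 1, 10_000)   # moves against x

    print(round(np.cov(x, y_pos)[0, 1], 2))  # approx +2 (positive covariance)
    print(round(np.cov(x, y_neg)[0, 1], 2))  # approx -2 (negative covariance)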