iQRM Warm Up Week 5 February 17 Corrected


Spring 2024 Saturdays 8:30 - 9:00 am

Instructor: Dr. Oleksandra (Sasha) Lyulko

iQRM Warm-Up: Last Week Review

• Probability
o Conditional probability of A given B is the probability of event A given that
event B occurred: P(A|B)=P(A&B)/P(B)
o Law of Total Probability: $P(A) = \sum_i P(A \mid B_i)\,P(B_i)$, where the events $B_i$ partition the sample space
o Independent events A and B: P(A)=P(A|B)
o Tree diagrams are useful to visualize conditional probability
• Random Variables and Probability Distributions
o Discrete RV: PMF (probability mass function)
o Continuous RV: PDF (probability density function)
o Cumulative Distribution Function (CDF): $F_X(x) = P(X \le x)$
o Distributions: Normal, Poisson, Lognormal, Binomial correspond to RV of
different nature
iQRM Warm-Up: Today’s Topics

• Measures of central tendency and variation
o Mean, median, mode
o Expected value
o Standard deviation and variance
• Normal Distribution

Measures of Central Tendency – Arithmetic Mean

• Arithmetic mean – the average of all values of the random variable:

$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

• Only valid when all values are equally likely!


• Fair die: $\mu = \frac{1+2+3+4+5+6}{6} = 3.5$

What if the values are not equally likely?

• Example: February is usually the snowiest month in New York City, with
snowfalls occurring on 3-4 days. Two of these days will record 1 inch (2.5 cm)
or more and 1 day will record 3 inches (7.6 cm) or more. Averaging the daily
snowfall (in inches) over the 28 days of the month:

$$\mu = \frac{3 + 1 + 1 + 0 + 0 + \cdots + 0}{28} \approx 0.179$$

Measures of Central Tendency – Expected Value

$$\mu = \frac{3 + 1 + 1 + 0 + 0 + \cdots + 0}{28} = \frac{1}{28}\cdot 3 + \frac{2}{28}\cdot 1 + \frac{25}{28}\cdot 0 \approx 0.179$$

• Weighted average!

• $\mu = P(3)\cdot 3 + P(1)\cdot 1 + P(0)\cdot 0 \approx 0.179$

Measures of Central Tendency – Expected Value

• Expected value, or expectation, of a random variable with a finite number of
values is the weighted average of all possible values:

$$E[X] = \sum_i x_i \, p(x_i)$$

• Expected value of a continuous random variable:

$$E[X] = \int_{-\infty}^{\infty} x \, f_X(x)\,dx$$
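
The weighted-average definition for the discrete case can be checked with a short Python sketch. The helper name `expected_value` and the dictionary layout (value mapped to probability) are illustrative choices, not part of the slides; the numbers are the snowfall and fair-die examples from earlier.

```python
from fractions import Fraction


def expected_value(pmf):
    """Weighted average of a discrete distribution given as {value: probability}."""
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(value * prob for value, prob in pmf.items())


# Daily snowfall in inches over a 28-day February (numbers from the example above).
snow_pmf = {3: Fraction(1, 28), 1: Fraction(2, 28), 0: Fraction(25, 28)}

# Fair die: every face has probability 1/6, so the weighted average
# reduces to the ordinary arithmetic mean.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

snow_mean = expected_value(snow_pmf)
die_mean = expected_value(die_pmf)

print(snow_mean, round(float(snow_mean), 3))
print(die_mean, float(die_mean))
```

Exact fractions are used so that the check that the probabilities sum to 1 does not suffer from floating-point rounding.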

Measures of Central Tendency – Median and Mode

• Median - the value separating the upper half from the lower half
of a data sample, a population, or a probability distribution.

• Examples:
• 5, 6, 8, 10, 12, 15, 20 – with an odd number of data points, the
median is the middle number: 10
• 5, 6, 8, 10, 12, 15, 20, 24 – with an even number of data
points, the median is the average of the two middle
numbers: (10 + 12) / 2 = 11

Measures of Central Tendency – Median and Mode

• Mode - the most frequent number in a data set.

• Examples:
• 5, 6, 6, 10, 12, 15, 20 – mode is 6
• Data sets can be multi-modal: 5, 6, 6, 10, 10, 15, 20 – modes
are 6 and 10
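
The median and mode examples can be reproduced with Python's standard `statistics` module. This is only a quick check of the numbers on these slides; the variable names are illustrative.

```python
from statistics import median, multimode

odd_sample = [5, 6, 8, 10, 12, 15, 20]
even_sample = [5, 6, 8, 10, 12, 15, 20, 24]
bimodal_sample = [5, 6, 6, 10, 10, 15, 20]

median_odd = median(odd_sample)      # the middle value
median_even = median(even_sample)    # average of the two middle values
modes = multimode(bimodal_sample)    # every most-frequent value, in first-seen order

print(median_odd, median_even, modes)
```

`multimode` (Python 3.8+) returns a list, so it handles both the single-mode and the multi-modal case.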

Measures of Variation

• Measures of variation are characteristics of a distribution that show how far
apart the data points are from each other, or how spread out the distribution is.

• Range – the difference between the highest and lowest values
• Interquartile range – the range of the middle half of the distribution
• Variance – average of the squared distances from the mean
• Standard deviation – square root of the variance

Measures of Variation – Variance

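• Variance – the average of the squared distances from the mean,
i.e. the expected value of the squared deviation from the mean:

$$\mathrm{Var}(X) = E\left[(X - \mu)^2\right]$$

• Discrete random variable:
• $\mathrm{Var}(X) = \sum_i (x_i - \mu)^2 \, p(x_i)$
• $\mathrm{Var}(X) = \frac{\sum_i (x_i - \mu)^2}{N}$ (when all $N$ values are equally likely)

• Continuous random variable: $\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f_X(x)\,dx$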
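
As a small worked check of the discrete formulas, the sketch below computes the variance of a fair die both ways: as a probability-weighted sum and as a plain average over equally likely values. The function names are illustrative, not from the slides.

```python
from fractions import Fraction


def pmf_mean(pmf):
    """E[X] for a distribution given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())


def pmf_variance(pmf):
    """Var(X) = sum of (x - mu)^2 * p(x) over all values x."""
    mu = pmf_mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


die = {face: Fraction(1, 6) for face in range(1, 7)}
weighted_var = pmf_variance(die)

# Equal-weights form: average of the squared deviations over N values.
faces = list(die)
mu = Fraction(sum(faces), len(faces))
equal_weight_var = sum((x - mu) ** 2 for x in faces) / len(faces)

print(weighted_var, float(weighted_var))
```

Both routes agree because every face of a fair die has the same probability 1/6.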

Measures of Variation - Variance

• Question 1:
• Why does the distribution with the smaller variance have a higher peak than
the one with the larger variance?
• Answer:
• Because the entire area under the curve (the integral of the PDF) must equal 1,
so a narrower distribution has to be taller.

Measures of Variation - Variance

• Question 2:
• Why can’t we simply find the average deviation?

• Answer:
• Because the average (signed) deviation from the mean is always 0: positive and
negative deviations cancel out.

Properties of Expected Value and Variance

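• $E[aX + bY] = aE[X] + bE[Y]$
• Example: expected value of the sum of two dice: 3.5 + 3.5 = 7

• $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$
(if all values of the distribution are multiplied by $a$, squared deviations
will be multiplied by $a^2$)

• $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$ – holds when X and Y are independent (more generally, uncorrelated)

• Variance of the mean of n independent, identically distributed variables:
• $\mathrm{Var}(\bar{X}) = \mathrm{Var}\left(\frac{\sum X_i}{n}\right) = \frac{1}{n^2}\,\mathrm{Var}\left(\sum X_i\right) = \frac{1}{n^2}\cdot n \cdot \mathrm{Var}(X) = \frac{1}{n}\,\mathrm{Var}(X)$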
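
These properties can be verified exactly for two independent fair dice by enumerating all 36 equally likely outcomes. The sketch below is one way to do that; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)


def mean(values):
    values = list(values)
    return Fraction(sum(values), len(values))


def variance(values):
    """Population variance of equally likely values."""
    values = list(values)
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


one_die_mean = mean(faces)
one_die_var = variance(faces)

# All 36 equally likely outcomes of two independent dice.
pairs = list(product(faces, repeat=2))
sums = [a + b for a, b in pairs]
averages = [Fraction(a + b, 2) for a, b in pairs]

sum_mean = mean(sums)            # E[X + Y] = E[X] + E[Y]
sum_var = variance(sums)         # Var(X + Y) = Var(X) + Var(Y)
avg_var = variance(averages)     # Var of the mean of n = 2 dice = Var(X) / 2
scaled_var = variance(3 * f for f in faces)   # Var(3X) = 9 Var(X)

print(sum_mean, sum_var, avg_var, scaled_var)
```

Enumeration works here only because the sample space is tiny; for larger problems a simulation would be the usual substitute.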
Population vs Sample

• Population – the entire group of people, items or elements that we want to
draw conclusions about
• Sample – a subset of the population that we include in our study and collect
data from

Measures of Variation – Standard Deviation

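• Standard deviation – square root of the variance:

$$\sigma_X = \sqrt{\mathrm{Var}(X)} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$

• Standard deviation of the sample when $\mu$ is not known:

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$$

• Standard deviation of the sample mean: $\sigma(\bar{X}) = \frac{\sigma(X)}{\sqrt{n}}$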
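
The difference between dividing by N and by N − 1 is easy to see numerically. The sketch below reuses the seven-point data set from the median example purely for illustration, and compares hand-computed values against `statistics.pstdev` (population) and `statistics.stdev` (sample). Using `s` in place of the unknown σ for the standard deviation of the mean is an assumption of this sketch.

```python
from math import sqrt
from statistics import pstdev, stdev

data = [5, 6, 8, 10, 12, 15, 20]   # illustrative data, reused from the median example

n = len(data)
x_bar = sum(data) / n
squared_devs = sum((x - x_bar) ** 2 for x in data)

sigma = sqrt(squared_devs / n)          # population formula: divide by N
s = sqrt(squared_devs / (n - 1))        # sample formula: divide by N - 1
std_of_mean = s / sqrt(n)               # sigma(X-bar), estimated with s in place of sigma

print(round(sigma, 4), round(s, 4), round(std_of_mean, 4))
print(round(pstdev(data), 4), round(stdev(data), 4))
```

The sample value `s` is always slightly larger than `sigma` for the same data, and the gap shrinks as N grows.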

Parameters vs Statistics

• A parameter is a number that describes a population
• A statistic is a number that describes a sample

Population parameters                      Sample statistics
Population mean – $\mu$                    Sample mean – $\bar{X}$
Population standard deviation – $\sigma$   Sample standard deviation – $s$
Population variance – $\sigma^2$           Sample variance – $s^2$
Applications of Standard Deviation – Rule of 16

• Volatility ~ standard deviation of the return.

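• Rule of 16: Given daily volatility, the annualized volatility can be
calculated by multiplying by 16, because there are about 252 trading days in a
year and √252 ≈ 16 (the variances of independent daily returns add up).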

• If annualized volatility is 80%, what is the daily volatility?
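
A minimal sketch of the Rule of 16 applied to the question above. The 252-trading-day convention and the function names are assumptions made for illustration; they are not stated on the slide.

```python
from math import sqrt

TRADING_DAYS = 252   # assumed number of trading days per year


def daily_vol_rule_of_16(annual_vol):
    """Approximate daily volatility: divide annualized volatility by 16."""
    return annual_vol / 16


def daily_vol_sqrt_time(annual_vol, periods=TRADING_DAYS):
    """Square-root-of-time scaling that the Rule of 16 approximates."""
    return annual_vol / sqrt(periods)


annual_vol = 0.80
approx_daily = daily_vol_rule_of_16(annual_vol)
scaled_daily = daily_vol_sqrt_time(annual_vol)

print(f"Rule of 16: {approx_daily:.2%}, sqrt(252) scaling: {scaled_daily:.2%}")
```

So an annualized volatility of 80% corresponds to a daily volatility of roughly 5%, and the shortcut differs from the square-root scaling only in the second decimal place.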

Other Measures – Skewness and Kurtosis

• Skewness measures how symmetrical the distribution is around the mean:

$$s = \frac{E\left[(X - \mu)^3\right]}{\sigma^3}$$

or, for $n$ observations: $s = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \mu}{\sigma}\right)^3$

Other Measures – Skewness and Kurtosis

• Positive skew:
mean > median
• Negative skew:
mean < median

Other Measures – Skewness and Kurtosis

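• Kurtosis shows how heavy the tails of the distribution are, i.e. how much of
the spread comes from values far from the mean:

$$K = \frac{E\left[(X - \mu)^4\right]}{\sigma^4}$$

or, for $n$ observations: $K = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \mu}{\sigma}\right)^4$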
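
Skewness and kurtosis are the third and fourth standardized moments, so one helper covers both data-point formulas. The sketch below uses the population standard deviation for σ and two small made-up data sets; both choices are assumptions for illustration.

```python
from statistics import fmean, median, pstdev


def standardized_moment(data, k):
    """(1/n) * sum(((x_i - mu) / sigma) ** k), with the population sigma."""
    mu = fmean(data)
    sigma = pstdev(data)
    return sum(((x - mu) / sigma) ** k for x in data) / len(data)


symmetric = [1, 2, 3, 4, 5]          # made-up symmetric data
right_tailed = [1, 1, 2, 2, 3, 10]   # made-up data with one large value

skew_symmetric = standardized_moment(symmetric, 3)
skew_right = standardized_moment(right_tailed, 3)
kurt_symmetric = standardized_moment(symmetric, 4)
kurt_right = standardized_moment(right_tailed, 4)

print(round(skew_symmetric, 6), round(skew_right, 3))
print(round(kurt_symmetric, 3), round(kurt_right, 3))
print(fmean(right_tailed) > median(right_tailed))   # mean vs median under positive skew
```

The symmetric data set has skewness of essentially zero, while the single large value in the second set gives a positive skew and pulls the mean above the median.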

Excel Functions

• Mean: =AVERAGE(array)
• Median: =MEDIAN(array)
• Mode: =MODE.SNGL(array), =MODE.MULT(array) (use with
Ctrl+Shift+Enter)
• Standard deviation: =STDEV.P(array) for a population, =STDEV.S(array) for a sample
• Variance: =VAR.P(array) for a population, =VAR.S(array) for a sample

Probability Distributions – Normal Distribution

• Most widely occurring in nature
• Examples:
• People’s heights
• Measurement errors
• SAT scores
• IQ scores
• Weight at birth

[Figure: The population distribution of IQ scores drawn from a sample of 10,000 observations. From: “Learning Statistics with R” by Danielle Navarro]


History of Normal Distribution – Famous Contributors

• 1738: Abraham de Moivre in his study “The Doctrine of Chances” approximated
the coefficients of the binomial expansion $(a + b)^n$. The approximation curve was
bell-shaped.
• 1801-1823: Carl Friedrich Gauss discovered the normal distribution. He
introduced the two-parameter exponential function in 1809. Within
astronomical studies of the dwarf planet Ceres, he justified the method of
least squares under the assumption of normally distributed
measurement errors and used the method to calculate the orbit of Ceres.
• 1782: Pierre-Simon Laplace calculated the integral $\int_{-\infty}^{\infty} e^{-t^2}\,dt = \sqrt{\pi}$ that
normalizes the Normal Distribution. In 1810 he proved the Central Limit Theorem.
• 1859-1866: James Clerk Maxwell developed the kinetic theory of gases,
where he derived the distribution of velocities of particles and
demonstrated that the Normal Distribution occurs in natural phenomena.

Normal Distribution - Definition

• Normal Distribution (Gaussian Distribution) – continuous distribution for a
real-valued random variable with probability density function (PDF)

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$

Normal Distribution – Probability Density Function

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$
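
The density formula translates directly into code. The sketch below implements it by hand and compares it with `statistics.NormalDist.pdf` from the Python standard library. The IQ parameters (mean 100, standard deviation 15) are an assumed example, not values given on the slide.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma=1.0):
    """f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-0.5 * ((x - mu) / sigma) ** 2)."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))


iq = NormalDist(mu=100, sigma=15)   # assumed example parameters

for score in (70, 100, 130):
    print(score, round(normal_pdf(score, 100, 15), 6), round(iq.pdf(score), 6))

# A smaller sigma gives a taller peak, since the total area must stay equal to 1.
narrow_peak = normal_pdf(0, 0, 0.5)
wide_peak = normal_pdf(0, 0, 2.0)
print(narrow_peak > wide_peak)
```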

Normal Distribution

• Notation: $X \sim N(\mu, \sigma^2)$ means “random variable $X$ is drawn from
a normal distribution with mean $\mu$ and variance $\sigma^2$ (standard deviation $\sigma$)”
• How is it derived? The distribution of sample means
approaches the normal distribution as $n \to \infty$ (Central Limit
Theorem – textbook pp. 73-76)
• Example: distribution of log returns ($r = \ln(1 + R)$)
• Properties: if $X$ and $Y$ are independent normal random variables with means $\mu_X$
and $\mu_Y$ and standard deviations $\sigma_X$ and $\sigma_Y$, and $Z = aX + bY$, then
$Z$ is also a normal random variable with mean $\mu_Z = a\mu_X + b\mu_Y$ and
variance $\sigma_Z^2 = a^2\sigma_X^2 + b^2\sigma_Y^2$
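
The linear-combination property can be illustrated with `statistics.NormalDist`, which supports scaling by a constant and adding two distributions that it treats as independent. The parameter values below are arbitrary and chosen only for the example.

```python
from statistics import NormalDist

# Arbitrary example parameters.
X = NormalDist(mu=1.0, sigma=2.0)
Y = NormalDist(mu=3.0, sigma=4.0)
a, b = 2.0, 0.5

# NormalDist arithmetic assumes the two operands are independent.
Z = a * X + b * Y

expected_mean = a * X.mean + b * Y.mean
expected_variance = a ** 2 * X.variance + b ** 2 * Y.variance

print(Z.mean, expected_mean)
print(round(Z.variance, 10), expected_variance)
```

If X and Y were correlated, a covariance term would have to be added to the variance, which this sketch does not model.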
Standard Normal Distribution

• Standard Normal Distribution - normal distribution with a mean of 0 and a
standard deviation of 1
• If $X \sim N(\mu, \sigma^2)$, then $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$ (standard normal)

• To get a normal variable from the standard normal: $X = \sigma Z + \mu$
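
The two conversions are one-liners. In the sketch below the function names and the IQ-style parameters (mean 100, standard deviation 15) are assumptions for illustration.

```python
def to_standard(x, mu, sigma):
    """Z = (X - mu) / sigma"""
    return (x - mu) / sigma


def from_standard(z, mu, sigma):
    """X = sigma * Z + mu"""
    return sigma * z + mu


mu, sigma = 100, 15                  # assumed example parameters

z = to_standard(130, mu, sigma)      # how many standard deviations above the mean
x = from_standard(z, mu, sigma)      # back to the original scale

print(z, x)
```

A score of 130 sits 2 standard deviations above the mean, and converting back recovers the original value.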

What percent of the data lies within 1, 2 or 3
standard deviations of the mean?

• Integrate the PDF of the Standard Normal Distribution from 0 to 1 to find what
percent of the data lies between the mean (0) and mean + one standard
deviation (1):

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}$$
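
The integral has no closed form, so it has to be evaluated numerically. The sketch below uses a simple composite trapezoidal rule (one possible choice of method) for the 0-to-1 integral, then uses `statistics.NormalDist.cdf` to get the share of data within 1, 2 and 3 standard deviations.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def std_normal_pdf(x):
    return exp(-x * x / 2) / sqrt(2 * pi)


def integrate(f, a, b, steps=10_000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    inner = sum(f(a + i * h) for i in range(1, steps))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))


# Share of data between the mean and one standard deviation above it.
half_band = integrate(std_normal_pdf, 0, 1)

# Share of data within k standard deviations on both sides of the mean.
std = NormalDist()
within = {k: std.cdf(k) - std.cdf(-k) for k in (1, 2, 3)}

print(f"0 to 1: {half_band:.4f}")
for k, share in within.items():
    print(f"within {k} sd: {share:.2%}")
```

Doubling the 0-to-1 result gives the one-standard-deviation band; the values for 1, 2 and 3 standard deviations come out at roughly 68%, 95% and 99.7%.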

[Figure: percent of the data within 1, 2 and 3 standard deviations of the mean]

[Figure: Normally Distributed Data]
Applying Normal Distribution

• Example:
• Although there is no law for IQ cutoff to become a police officer, in
some states the police departments are allowed not to hire people
who are too intelligent. Suppose in the Happytown Police
Department the cutoff IQ score is 2 standard deviations above the
population mean. Among 1000 people chosen at random,
approximately how many will NOT get hired because they are too
smart?

• Answer: about 2.3% of people lie more than 2 standard deviations above the
mean (1 − Φ(2) ≈ 0.0228), so roughly 23 of 1000 persons
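
The tail probability in this example can be computed from the standard normal CDF. The sketch below uses `statistics.NormalDist`; since the cutoff is given in standard deviations, the actual IQ mean and standard deviation are not needed.

```python
from statistics import NormalDist

population = 1000
cutoff_in_sd = 2

# P(Z > 2) for a standard normal variable.
tail_probability = 1 - NormalDist().cdf(cutoff_in_sd)
expected_rejected = population * tail_probability

print(f"P(Z > {cutoff_in_sd}) = {tail_probability:.4f}")
print(f"Expected number above the cutoff: {expected_rejected:.1f}")
```

The exact tail is a little under 2.3%, which is why rounded answers of 22 or 23 people both show up for this kind of question.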
Normal Distribution – Excel Functions

• Probability of a value under a specific normal distribution:
• Probability of a number given the standard normal distribution:
• Inverse of the normal cumulative distribution (given mean and stdev, what
number is higher than the specified % of the data?):
• Inverse of the standard normal cumulative distribution:
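
The function names for these four operations are not preserved in the slide text. In current Excel the matching functions are most likely NORM.DIST, NORM.S.DIST, NORM.INV and NORM.S.INV, but that mapping is an assumption. The same four calculations can be sketched in Python with `statistics.NormalDist`; the IQ parameters are again an assumed example.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)    # assumed example distribution
std = NormalDist()                   # standard normal: mean 0, stdev 1

# Cumulative probability of a value under a specific normal distribution.
p_below_130 = iq.cdf(130)

# Cumulative probability of a number under the standard normal distribution.
p_below_z2 = std.cdf(2)

# Inverse cumulative: which value is higher than 97.5% of the data?
iq_at_975 = iq.inv_cdf(0.975)

# Inverse of the standard normal cumulative distribution.
z_at_975 = std.inv_cdf(0.975)

print(round(p_below_130, 4), round(p_below_z2, 4))
print(round(iq_at_975, 2), round(z_at_975, 4))
```

The first two results agree because an IQ of 130 is exactly 2 standard deviations above the assumed mean.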
Summary

• We use mean (expected value), median and mode as measures of central tendency
• We use variance and standard deviation as measures of variation
• Normal distribution is bell-shaped and arises as the distribution of the
means of many samples
• A fixed percentage of the data lies within 1, 2 and 3 standard
deviations of the mean

Next Time

• Hypothesis testing

Thank you! Any questions?
