

AE 248: AI and Data science

Parameter Estimation

Prabhu Ramachandran

2024-03-01

Parameter Estimation

• Probability theory: you are given 𝐹


• Statistics: observed data → infer unknown parameters

Estimates

• Given 𝑋1 , … , 𝑋𝑛 drawn from 𝐹𝜃


• 𝐹𝜃 not fully specified, 𝜃 unknown
• Example:
– Exponential distribution with unknown mean
– Normal with unknown mean and variance.

Estimates/Estimators

• Point estimates
• Interval estimates
• Confidence
• Estimator: statistic to estimate unknown parameter 𝜃

Maximum Likelihood Estimators

• Assume unknown parameter 𝜃


• Find joint PDF/PMF, 𝑓(𝑥1 , ..., 𝑥𝑛 |𝜃)
• Maximize 𝑓 w.r.t. 𝜃 to get 𝜃̂
• 𝑓(𝑥1 , … , 𝑥𝑛 |𝜃) is called the likelihood function

. . .

• Provides a point estimate


• Note: 𝑓 and log(𝑓) have the same maximum (see the sketch below)
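In practice the maximization can be done numerically. A minimal sketch, assuming an exponential sample with unknown mean (the data, scale, and bounds are made up for illustration):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100)   # sample with true mean 2

def neg_log_likelihood(mean):
    # Exponential PDF: f(x|mean) = (1/mean) * exp(-x/mean)
    return -np.sum(-np.log(mean) - x / mean)

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10.0), method='bounded')
print(res.x, x.mean())   # numerical MLE should match the sample mean
```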

MLE Example: Bernoulli Parameter

• 𝑛 Bernoulli trials with success probability 𝑝


• What is the MLE of 𝑝?
• Data consist of values 𝑋1 , … , 𝑋𝑛

Solution

$$P\{X_i = x\} = p^x (1-p)^{1-x}, \quad x = 0, 1$$

$$f(x_1, \ldots, x_n \mid p) = p^{\sum_i x_i} (1 - p)^{n - \sum_i x_i}$$

$$\text{maximize } \log f(x_1, \ldots, x_n \mid p)$$

Answer
$$\hat{p} = \frac{\sum_{i=1}^{n} x_i}{n}$$
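A quick numerical sanity check of this result, with hypothetical trial outcomes: evaluate the log-likelihood on a grid and confirm the maximizer is the sample proportion.

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1])   # hypothetical outcomes

p_grid = np.linspace(0.01, 0.99, 999)
# Bernoulli log-likelihood as a function of p
log_lik = x.sum() * np.log(p_grid) + (len(x) - x.sum()) * np.log(1 - p_grid)
print(p_grid[np.argmax(log_lik)], x.mean())    # both close to 0.7
```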

MLE Example: Poisson Parameter

• 𝑛 independent Poisson RVs with mean 𝜆


• Find 𝜆̂

Solution

$$f(x_1, \ldots, x_n \mid \lambda) = \frac{e^{-n\lambda}\, \lambda^{\sum_i x_i}}{x_1! \cdots x_n!}$$

$$\text{maximize } \log f(x_1, \ldots, x_n \mid \lambda)$$

Answer
$$\hat{\lambda} = \frac{\sum_{i=1}^{n} x_i}{n}$$
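As in the Bernoulli case, a numerical optimizer should recover the sample mean. A small sketch with hypothetical counts, assuming scipy is available:

```python
import numpy as np
from scipy.stats import poisson
from scipy.optimize import minimize_scalar

x = np.array([3, 1, 4, 2, 5, 2, 0, 3])   # hypothetical counts

# Negative Poisson log-likelihood over the whole sample
nll = lambda lam: -poisson.logpmf(x, lam).sum()
res = minimize_scalar(nll, bounds=(1e-6, 20.0), method='bounded')
print(res.x, x.mean())   # numerical MLE agrees with the sample mean, 2.5
```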

Problem

The number of traffic accidents in Mumbai on 10 randomly chosen non-rainy days in 2003 is
as follows:

4, 0, 6, 5, 2, 1, 2, 0, 4, 3

Use this to estimate the proportion of non-rainy days that had 2 or fewer accidents that year.
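One way to proceed, modelling the daily accident count as Poisson as in the preceding slides: the MLE is $\hat{\lambda} = \bar{x} = 2.7$, and the estimated proportion is $P\{X \le 2\} = e^{-2.7}(1 + 2.7 + 2.7^2/2) \approx 0.49$. A minimal check:

```python
import numpy as np
from scipy.stats import poisson

accidents = np.array([4, 0, 6, 5, 2, 1, 2, 0, 4, 3])
lam_hat = accidents.mean()        # Poisson MLE: the sample mean, 2.7
print(poisson.cdf(2, lam_hat))    # P{X <= 2} is roughly 0.49
```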

MLE for Normal Population

• Self-study
• Same idea and approach
• Two parameters, so maximize w.r.t. each

MLE for a Uniform Distribution

• If $x \in (0, \theta)$, the likelihood of the sample is $\theta^{-n}$ (and 0 unless $\theta \ge \max_i x_i$)
• So 𝜃 should be as small as possible
• But large enough to cover the largest $X_i$ (see below)
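Putting the bullets together: the likelihood $\theta^{-n}$ decreases in $\theta$, so the MLE is the smallest value consistent with the data:

$$\hat{\theta} = \max(X_1, \ldots, X_n)$$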

Interval estimates

• Given 𝑋1 , … , 𝑋𝑛 drawn from 𝒩(𝜇, 𝜎)


• Unknown 𝜇 but known 𝜎
• MLE 𝜇̂ = 𝑋̄

. . .

• Is the MLE equal to the actual 𝜇?


• Can we provide an interval in which 𝜇 lies?

Interval estimates
• $\sqrt{n}\,\frac{\bar{X} - \mu}{\sigma}$ is a standard normal
• So, for example:

$$P\left\{-1.96 < \sqrt{n}\,\frac{\bar{X} - \mu}{\sigma} < 1.96\right\} = 0.95$$

Interval estimates

• Can be modified to:

$$P\left\{\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right\} = 0.95$$

Example

• Given some 𝑥,̄ this means


– With 95% confidence the mean lies within $\pm 1.96\,\sigma/\sqrt{n}$ of $\bar{x}$

• 95 percent confidence interval estimate of 𝜇

Interpretation

• If we estimated the interval 100 times (with 100 samples), we expect the number
of intervals that contain the true mean 𝜇 to tend to 95.
• There is a 95% probability that a CI computed from a future experiment will
encompass the true mean.

Example from textbook

Suppose that when a signal having value 𝜇 is transmitted from location A, the value received
at location B is normally distributed with mean 𝜇 and variance 4. That is, if 𝜇 is sent, then
the value received is 𝜇 + 𝑁, where 𝑁, representing noise, is normal with mean 0 and variance
4. To reduce error, suppose the same value is sent 9 times. If the successive values received
are 5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5, let us construct a 95 percent confidence interval for 𝜇.
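A quick numeric check of this example (here 𝜎 = 2, since the variance is 4):

```python
import numpy as np
from scipy.stats import norm

received = np.array([5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5])
sigma = 2.0                                  # known noise std dev
half = norm.ppf(0.975) * sigma / np.sqrt(len(received))
print(received.mean() - half, received.mean() + half)
# sample mean is 9, so roughly (7.69, 10.31)
```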

Two-sided vs one-sided

• With 95% confidence, assert that 𝜇 is at least as large as a computed value:

$$P\left\{\sqrt{n}\,\frac{\bar{X} - \mu}{\sigma} < 1.645\right\} = 0.95$$

$$P\left\{\bar{X} - 1.645\,\frac{\sigma}{\sqrt{n}} < \mu\right\} = 0.95$$

One-sided intervals

• One-sided upper CI for 𝜇: $(\bar{x} - 1.645\,\sigma/\sqrt{n},\ \infty)$

• One-sided lower CI for 𝜇: $(-\infty,\ \bar{x} + 1.645\,\sigma/\sqrt{n})$

Using the tables

• Recall $P\{Z > z_\alpha\} = \alpha$
• $P\{-z_{\alpha/2} < Z < z_{\alpha/2}\} = 1 - \alpha$

$$P\left\{\bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right\} = 1 - \alpha$$
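Rather than tables, the quantiles can be pulled from scipy. A small sketch:

```python
from scipy.stats import norm

alpha = 0.05
print(norm.ppf(1 - alpha / 2))   # z_{alpha/2}, about 1.96, for two-sided CIs
print(norm.ppf(1 - alpha))       # z_alpha, about 1.645, for one-sided CIs
```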

Finding suitable n

• Given a desired interval width

• Find 𝑛 to satisfy it (see below)
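Concretely, writing $w$ for the desired total width of the two-sided interval:

$$2\, z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le w \quad\Longrightarrow\quad n \ge \left(\frac{2\, z_{\alpha/2}\,\sigma}{w}\right)^2$$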

So what if variance is not known?


• Cannot use $\sqrt{n}\,\frac{\bar{X} - \mu}{\sigma}$ as a standard normal, since 𝜎 is unknown
• We can, however, compute the sample variance $S^2$

So what if variance is not known?


• $\sqrt{n}\,\frac{\bar{X} - \mu}{S} \sim t_{n-1}$

$$P\left\{\bar{X} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}\right\} = 1 - \alpha$$
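A sketch of the 𝑡-based interval with made-up data, assuming scipy:

```python
import numpy as np
from scipy import stats

x = np.array([5.2, 4.8, 6.1, 5.5, 4.9, 5.8])   # hypothetical sample
n = len(x)
# ddof=1 gives the sample standard deviation S
half = stats.t.ppf(0.975, df=n - 1) * x.std(ddof=1) / np.sqrt(n)
print(x.mean() - half, x.mean() + half)         # 95% CI for mu
```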

Non-normal populations

• Central limit theorem applies, so if 𝑛 is "large enough" we should be good.

Confidence intervals for the variance


• Recall that $(n-1)\,\frac{S^2}{\sigma^2} \sim \chi^2_{n-1}$
• Homework.
• Note that $\chi^2$ is not symmetric
• Both $\chi^2_{\alpha/2,\,n-1}$ and $\chi^2_{1-\alpha/2,\,n-1}$ are needed (see the interval below)
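For reference, inverting the pivotal quantity above gives:

$$P\left\{\frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}}\right\} = 1 - \alpha$$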

Example

The weights of 5 students were found to be 61, 65, 68, 58, and 70 kg. Determine a 95%
confidence interval for their mean. Also determine a 95% lower confidence interval for this
mean.
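A sketch of both computations (𝜎 is unknown, so the 𝑡 interval applies; the one-sided interval follows the convention from the earlier slide):

```python
import numpy as np
from scipy import stats

weights = np.array([61, 65, 68, 58, 70])     # kg
n = len(weights)
se = weights.std(ddof=1) / np.sqrt(n)        # s / sqrt(n)
half = stats.t.ppf(0.975, df=n - 1) * se
print(weights.mean() - half, weights.mean() + half)   # about (58.3, 70.5)
upper = weights.mean() + stats.t.ppf(0.95, df=n - 1) * se
print(upper)   # lower CI in the slides' convention: (-inf, about 69.1)
```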

Difference in means

• 𝑋1 , ..., 𝑋𝑛 from 𝒩(𝜇1 , 𝜎1 )


• 𝑌1 , ..., 𝑌𝑚 from 𝒩(𝜇2 , 𝜎2 )
• CI for 𝜇1 − 𝜇2 ?
• Recall: the sum (or difference) of two independent normal RVs is normal

Difference in means

• MLE of $\mu_1 - \mu_2$ is $\bar{X} - \bar{Y}$

$$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}} \sim \mathcal{N}(0, 1)$$

When variances are not known?

• If $\sigma_1 \ne \sigma_2$ we have a problem
• If they are equal, the same approach as before can be used:

$$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma^2}{n} + \frac{\sigma^2}{m}}} \sim \mathcal{N}(0, 1)$$

Variances unknown

• $\bar{X}$, $S_1^2$, $\bar{Y}$, $S_2^2$ are independent


• Consider the pooled estimator of the common variance:

$$S_p^2 = \frac{(n-1)S_1^2 + (m-1)S_2^2}{n + m - 2}$$

Variances unknown

$$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\frac{1}{n} + \frac{1}{m}\right)}} \sim t_{n+m-2}$$
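A sketch of the pooled-variance interval with hypothetical samples:

```python
import numpy as np
from scipy import stats

x = np.array([10.2, 9.8, 11.1, 10.5])       # hypothetical sample 1
y = np.array([9.1, 9.9, 8.8, 9.5, 9.3])     # hypothetical sample 2
n, m = len(x), len(y)
# Pooled variance estimate S_p^2
sp2 = ((n - 1) * x.var(ddof=1) + (m - 1) * y.var(ddof=1)) / (n + m - 2)
se = np.sqrt(sp2 * (1 / n + 1 / m))
t = stats.t.ppf(0.975, df=n + m - 2)
diff = x.mean() - y.mean()
print(diff - t * se, diff + t * se)          # 95% CI for mu1 - mu2
```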

Approximate CI for Bernoulli RV


• When 𝑛 is large, $\frac{X - np}{\sqrt{np(1-p)}} \sim \mathcal{N}(0, 1)$
• To get a CI, let $\hat{p} = X/n$
• So $\frac{X - np}{\sqrt{n\hat{p}(1-\hat{p})}} \sim \mathcal{N}(0, 1)$
• $P\left\{\hat{p} - z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} < p < \hat{p} + z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}\right\} \approx 1 - \alpha$ (sketch below)
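A sketch with hypothetical counts:

```python
import numpy as np
from scipy.stats import norm

n, successes = 400, 136                  # hypothetical trial counts
p_hat = successes / n
half = norm.ppf(0.975) * np.sqrt(p_hat * (1 - p_hat) / n)
print(p_hat - half, p_hat + half)        # approximate 95% CI for p
```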

Evaluating point estimators

• How good is an estimator, 𝑑(𝑋1 , … , 𝑋𝑛 )?


• One measure is the mean-square error $E[(d(\mathbf{X}) - \theta)^2]$
• A desirable quality is unbiasedness

Unbiased estimators

• Bias is defined as $b_\theta(d) = E[d(\mathbf{X})] - \theta$


• Unbiased if $b_\theta(d) = 0$
• If $d$ is unbiased then $E[(d(\mathbf{X}) - \theta)^2] = \mathrm{Var}(d(\mathbf{X}))$ (see the decomposition below)
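The last bullet is a special case of the usual bias-variance decomposition of the mean-square error:

$$E[(d(\mathbf{X}) - \theta)^2] = \mathrm{Var}(d(\mathbf{X})) + b_\theta(d)^2$$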

Bayes estimator

• Prior information on distribution of 𝜃, i.e. 𝑝(𝜃)


• Use data to find posterior density

$$f(\theta \mid x_1, \ldots, x_n) = \frac{f(\theta, x_1, \ldots, x_n)}{f(x_1, \ldots, x_n)} = \frac{p(\theta)\, f(x_1, \ldots, x_n \mid \theta)}{\int f(x_1, \ldots, x_n \mid \theta)\, p(\theta)\, d\theta}$$

Bayes estimator

• Best estimate of 𝜃 is the mean of the posterior:

$$E[\theta \mid X_1 = x_1, \ldots, X_n = x_n] = \int \theta\, f(\theta \mid x_1, \ldots, x_n)\, d\theta$$

• See examples in the book
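As a concrete instance, a minimal sketch assuming a Beta prior on a Bernoulli 𝑝 (the data and prior here are made up; Beta-Bernoulli conjugacy makes the posterior Beta as well):

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])   # hypothetical Bernoulli data
a, b = 2.0, 2.0                                 # Beta(2, 2) prior on p

# Conjugacy: posterior is Beta(a + #successes, b + #failures)
post_a = a + x.sum()
post_b = b + len(x) - x.sum()
print(post_a / (post_a + post_b))   # Bayes estimate: the posterior mean
print(x.mean())                     # compare with the MLE
```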
