

AE 248: AI and Data science

Parameter Estimation

Prabhu Ramachandran

2024-03-01

Parameter Estimation

• Probability theory: you are given 𝐹


• Statistics: observed data → infer unknown parameters

Estimates

• Given 𝑋1 , … , 𝑋𝑛 drawn from 𝐹𝜃


• 𝐹𝜃 not fully specified, 𝜃 unknown
• Example:
– Exponential distribution with unknown mean
– Normal with unknown mean and variance.

Estimates/Estimators

• Point estimates
• Interval estimates
• Confidence
• Estimator: statistic to estimate unknown parameter 𝜃

Maximum Likelihood Estimators

• Assume unknown parameter 𝜃


• Find joint PDF/PMF, 𝑓(𝑥1 , ..., 𝑥𝑛 |𝜃)
• Maximize 𝑓 w.r.t. 𝜃 to get 𝜃̂
• 𝑓(𝑥1 , … , 𝑥𝑛 |𝜃) is called the likelihood function

. . .

• Provides a point estimate


• Note: 𝑓 and log(𝑓) have the same maximum (see the sketch below)
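In practice the maximization can be done numerically. A minimal sketch, assuming an exponential sample with unknown mean (the data, scale, and bounds are made up for illustration):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100)   # sample with true mean 2

def neg_log_likelihood(mean):
    # Exponential PDF: f(x|mean) = (1/mean) * exp(-x/mean)
    return -np.sum(-np.log(mean) - x / mean)

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10.0), method='bounded')
print(res.x, x.mean())   # numerical MLE should match the sample mean
```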

MLE Example: Bernoulli Parameter

• 𝑛 Bernoulli trials with success probability 𝑝


• What is the MLE of 𝑝?
• Data consist of values 𝑋1 , … , 𝑋𝑛

Solution

$$P\{X_i = x\} = p^x (1-p)^{1-x}, \quad x = 0, 1$$

$$f(x_1, \ldots, x_n \mid p) = p^{\sum_i x_i} (1 - p)^{n - \sum_i x_i}$$

$$\text{maximize } \log f(x_1, \ldots, x_n \mid p)$$

Answer
$$\hat{p} = \frac{\sum_{i=1}^{n} x_i}{n}$$
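A quick numerical sanity check of this result, with hypothetical trial outcomes: evaluate the log-likelihood on a grid and confirm the maximizer is the sample proportion.

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1])   # hypothetical outcomes

p_grid = np.linspace(0.01, 0.99, 999)
# Bernoulli log-likelihood as a function of p
log_lik = x.sum() * np.log(p_grid) + (len(x) - x.sum()) * np.log(1 - p_grid)
print(p_grid[np.argmax(log_lik)], x.mean())    # both close to 0.7
```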

MLE Example: Poisson Parameter

• 𝑛 independent Poisson RVs with mean 𝜆


• Find 𝜆̂

Solution

$$f(x_1, \ldots, x_n \mid \lambda) = \frac{e^{-n\lambda}\, \lambda^{\sum_i x_i}}{x_1! \cdots x_n!}$$

$$\text{maximize } \log f(x_1, \ldots, x_n \mid \lambda)$$

Answer
$$\hat{\lambda} = \frac{\sum_{i=1}^{n} x_i}{n}$$
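As in the Bernoulli case, a numerical optimizer should recover the sample mean. A small sketch with hypothetical counts, assuming scipy is available:

```python
import numpy as np
from scipy.stats import poisson
from scipy.optimize import minimize_scalar

x = np.array([3, 1, 4, 2, 5, 2, 0, 3])   # hypothetical counts

# Negative Poisson log-likelihood over the whole sample
nll = lambda lam: -poisson.logpmf(x, lam).sum()
res = minimize_scalar(nll, bounds=(1e-6, 20.0), method='bounded')
print(res.x, x.mean())   # numerical MLE agrees with the sample mean, 2.5
```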

Problem

The number of traffic accidents in Mumbai on 10 randomly chosen non-rainy days in 2003 is
as follows:

4, 0, 6, 5, 2, 1, 2, 0, 4, 3

Use this to estimate the proportion of non-rainy days that had 2 or fewer accidents that year.
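One way to proceed, modelling the daily accident count as Poisson as in the preceding slides: the MLE is $\hat{\lambda} = \bar{x} = 2.7$, and the estimated proportion is $P\{X \le 2\} = e^{-2.7}(1 + 2.7 + 2.7^2/2) \approx 0.49$. A minimal check:

```python
import numpy as np
from scipy.stats import poisson

accidents = np.array([4, 0, 6, 5, 2, 1, 2, 0, 4, 3])
lam_hat = accidents.mean()        # Poisson MLE: the sample mean, 2.7
print(poisson.cdf(2, lam_hat))    # P{X <= 2} is roughly 0.49
```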

MLE for Normal Population

• Self-study
• Same idea and approach
• Two parameters, so maximize w.r.t. each

MLE for a Uniform Distribution

• If $x \in (0, \theta)$, the likelihood of the sample is $\theta^{-n}$ (and 0 unless $\theta \ge \max_i x_i$)
• So 𝜃 should be as small as possible
• But large enough to cover the largest $X_i$ (see below)
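Putting the bullets together: the likelihood $\theta^{-n}$ decreases in $\theta$, so the MLE is the smallest value consistent with the data:

$$\hat{\theta} = \max(X_1, \ldots, X_n)$$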

Interval estimates

• Given 𝑋1 , … , 𝑋𝑛 drawn from 𝒩(𝜇, 𝜎)


• Unknown 𝜇 but known 𝜎
• MLE 𝜇̂ = 𝑋̄

. . .

• Is the MLE equal to the actual 𝜇?


• Can we provide an interval in which 𝜇 lies?

Interval estimates
• $\sqrt{n}\,\frac{\bar{X} - \mu}{\sigma}$ is a standard normal
• So, for example:

$$P\left\{-1.96 < \sqrt{n}\,\frac{\bar{X} - \mu}{\sigma} < 1.96\right\} = 0.95$$

Interval estimates

• Can be modified to:

$$P\left\{\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right\} = 0.95$$

Example

• Given some 𝑥,̄ this means


– With 95% confidence the mean lies within $\pm 1.96\,\sigma/\sqrt{n}$ of $\bar{x}$

• 95 percent confidence interval estimate of 𝜇

Interpretation

• If we estimated the interval 100 times (with 100 samples), we expect the number
of intervals that contain the true mean 𝜇 to tend to 95.
• There is a 95% probability that a CI computed from a future experiment will
encompass the true mean.

Example from textbook

Suppose that when a signal having value 𝜇 is transmitted from location A, the value received
at location B is normally distributed with mean 𝜇 and variance 4. That is, if 𝜇 is sent, then
the value received is 𝜇 + 𝑁, where 𝑁, representing noise, is normal with mean 0 and variance
4. To reduce error, suppose the same value is sent 9 times. If the successive values received
are 5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5, let us construct a 95 percent confidence interval for 𝜇.
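A quick numeric check of this example (here 𝜎 = 2, since the variance is 4):

```python
import numpy as np
from scipy.stats import norm

received = np.array([5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5])
sigma = 2.0                                  # known noise std dev
half = norm.ppf(0.975) * sigma / np.sqrt(len(received))
print(received.mean() - half, received.mean() + half)
# sample mean is 9, so roughly (7.69, 10.31)
```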

Two-sided vs one-sided

• With 95% confidence, assert that 𝜇 is at least as large as a computed value:

$$P\left\{\sqrt{n}\,\frac{\bar{X} - \mu}{\sigma} < 1.645\right\} = 0.95$$

$$P\left\{\bar{X} - 1.645\,\frac{\sigma}{\sqrt{n}} < \mu\right\} = 0.95$$

One-sided intervals

• One-sided upper CI for 𝜇: $(\bar{x} - 1.645\,\sigma/\sqrt{n},\ \infty)$

• One-sided lower CI for 𝜇: $(-\infty,\ \bar{x} + 1.645\,\sigma/\sqrt{n})$

Using the tables

• Recall $P\{Z > z_\alpha\} = \alpha$
• $P\{-z_{\alpha/2} < Z < z_{\alpha/2}\} = 1 - \alpha$

$$P\left\{\bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right\} = 1 - \alpha$$
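Rather than tables, the quantiles can be pulled from scipy. A small sketch:

```python
from scipy.stats import norm

alpha = 0.05
print(norm.ppf(1 - alpha / 2))   # z_{alpha/2}, about 1.96, for two-sided CIs
print(norm.ppf(1 - alpha))       # z_alpha, about 1.645, for one-sided CIs
```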

Finding suitable n

• Given a desired interval width

• Find 𝑛 to satisfy it (see below)
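Concretely, writing $w$ for the desired total width of the two-sided interval:

$$2\, z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le w \quad\Longrightarrow\quad n \ge \left(\frac{2\, z_{\alpha/2}\,\sigma}{w}\right)^2$$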

So what if variance is not known?


• Cannot use $\sqrt{n}\,\frac{\bar{X} - \mu}{\sigma}$ as a standard normal, since 𝜎 is unknown
• We can, however, compute the sample variance $S^2$

So what if variance is not known?


• $\sqrt{n}\,\frac{\bar{X} - \mu}{S} \sim t_{n-1}$

$$P\left\{\bar{X} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}\right\} = 1 - \alpha$$
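A sketch of the 𝑡-based interval with made-up data, assuming scipy:

```python
import numpy as np
from scipy import stats

x = np.array([5.2, 4.8, 6.1, 5.5, 4.9, 5.8])   # hypothetical sample
n = len(x)
# ddof=1 gives the sample standard deviation S
half = stats.t.ppf(0.975, df=n - 1) * x.std(ddof=1) / np.sqrt(n)
print(x.mean() - half, x.mean() + half)         # 95% CI for mu
```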

Non-normal populations

• Central limit theorem applies, so if 𝑛 is "large enough" we should be good.

Confidence intervals for the variance


• Recall that $(n-1)\,\frac{S^2}{\sigma^2} \sim \chi^2_{n-1}$
• Homework.
• Note that $\chi^2$ is not symmetric
• Both $\chi^2_{\alpha/2,\,n-1}$ and $\chi^2_{1-\alpha/2,\,n-1}$ are needed (see the interval below)
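For reference, inverting the pivotal quantity above gives:

$$P\left\{\frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}}\right\} = 1 - \alpha$$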

Example

The weights of 5 students were found to be 61, 65, 68, 58, and 70 kg. Determine a 95%
confidence interval for their mean. Also determine a 95% lower confidence interval for this
mean.
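A sketch of both computations (𝜎 is unknown, so the 𝑡 interval applies; the one-sided interval follows the convention from the earlier slide):

```python
import numpy as np
from scipy import stats

weights = np.array([61, 65, 68, 58, 70])     # kg
n = len(weights)
se = weights.std(ddof=1) / np.sqrt(n)        # s / sqrt(n)
half = stats.t.ppf(0.975, df=n - 1) * se
print(weights.mean() - half, weights.mean() + half)   # about (58.3, 70.5)
upper = weights.mean() + stats.t.ppf(0.95, df=n - 1) * se
print(upper)   # lower CI in the slides' convention: (-inf, about 69.1)
```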

Difference in means

• 𝑋1 , ..., 𝑋𝑛 from 𝒩(𝜇1 , 𝜎1 )


• 𝑌1 , ..., 𝑌𝑚 from 𝒩(𝜇2 , 𝜎2 )
• CI for 𝜇1 − 𝜇2 ?
• Recall: the sum (or difference) of two independent normal RVs is normal

Difference in means

• MLE of $\mu_1 - \mu_2$ is $\bar{X} - \bar{Y}$

$$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}} \sim \mathcal{N}(0, 1)$$

When variances are not known?

• If $\sigma_1 \ne \sigma_2$ we have a problem
• If they are equal, the same approach as before can be used:

$$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma^2}{n} + \frac{\sigma^2}{m}}} \sim \mathcal{N}(0, 1)$$

Variances unknown

• $\bar{X}$, $S_1^2$, $\bar{Y}$, $S_2^2$ are independent


• Consider the pooled estimator of the common variance:

$$S_p^2 = \frac{(n-1)S_1^2 + (m-1)S_2^2}{n + m - 2}$$

Variances unknown

$$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\frac{1}{n} + \frac{1}{m}\right)}} \sim t_{n+m-2}$$
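A sketch of the pooled-variance interval with hypothetical samples:

```python
import numpy as np
from scipy import stats

x = np.array([10.2, 9.8, 11.1, 10.5])       # hypothetical sample 1
y = np.array([9.1, 9.9, 8.8, 9.5, 9.3])     # hypothetical sample 2
n, m = len(x), len(y)
# Pooled variance estimate S_p^2
sp2 = ((n - 1) * x.var(ddof=1) + (m - 1) * y.var(ddof=1)) / (n + m - 2)
se = np.sqrt(sp2 * (1 / n + 1 / m))
t = stats.t.ppf(0.975, df=n + m - 2)
diff = x.mean() - y.mean()
print(diff - t * se, diff + t * se)          # 95% CI for mu1 - mu2
```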

Approximate CI for Bernoulli RV


• When 𝑛 is large, $\frac{X - np}{\sqrt{np(1-p)}} \sim \mathcal{N}(0, 1)$
• To get a CI, let $\hat{p} = X/n$
• So $\frac{X - np}{\sqrt{n\hat{p}(1-\hat{p})}} \sim \mathcal{N}(0, 1)$
• $P\left\{\hat{p} - z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} < p < \hat{p} + z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}\right\} \approx 1 - \alpha$ (sketch below)
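A sketch with hypothetical counts:

```python
import numpy as np
from scipy.stats import norm

n, successes = 400, 136                  # hypothetical trial counts
p_hat = successes / n
half = norm.ppf(0.975) * np.sqrt(p_hat * (1 - p_hat) / n)
print(p_hat - half, p_hat + half)        # approximate 95% CI for p
```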

Evaluating point estimators

• How good is an estimator, 𝑑(𝑋1 , … , 𝑋𝑛 )?


• One measure is the mean-square error $E[(d(\mathbf{X}) - \theta)^2]$
• A desirable quality is unbiasedness

Unbiased estimators

• Bias is defined as $b_\theta(d) = E[d(\mathbf{X})] - \theta$


• Unbiased if $b_\theta(d) = 0$
• If $d$ is unbiased then $E[(d(\mathbf{X}) - \theta)^2] = \mathrm{Var}(d(\mathbf{X}))$ (see the decomposition below)
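The last bullet is a special case of the usual bias-variance decomposition of the mean-square error:

$$E[(d(\mathbf{X}) - \theta)^2] = \mathrm{Var}(d(\mathbf{X})) + b_\theta(d)^2$$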

Bayes estimator

• Prior information on distribution of 𝜃, i.e. 𝑝(𝜃)


• Use data to find posterior density

$$f(\theta \mid x_1, \ldots, x_n) = \frac{f(\theta, x_1, \ldots, x_n)}{f(x_1, \ldots, x_n)} = \frac{p(\theta)\, f(x_1, \ldots, x_n \mid \theta)}{\int f(x_1, \ldots, x_n \mid \theta)\, p(\theta)\, d\theta}$$

Bayes estimator

• Best estimate of 𝜃 is the mean of the posterior:

$$E[\theta \mid X_1 = x_1, \ldots, X_n = x_n] = \int \theta\, f(\theta \mid x_1, \ldots, x_n)\, d\theta$$

• See examples in the book
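As a concrete instance, a minimal sketch assuming a Beta prior on a Bernoulli 𝑝 (the data and prior here are made up; Beta-Bernoulli conjugacy makes the posterior Beta as well):

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])   # hypothetical Bernoulli data
a, b = 2.0, 2.0                                 # Beta(2, 2) prior on p

# Conjugacy: posterior is Beta(a + #successes, b + #failures)
post_a = a + x.sum()
post_b = b + len(x) - x.sum()
print(post_a / (post_a + post_b))   # Bayes estimate: the posterior mean
print(x.mean())                     # compare with the MLE
```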
