STAT 5 Week 9 Notes Part 1

May 24th, 2022
Taught by TA Jacob. Their slides are typed below.
Review:
Summary Statistics
● Idea: Characterize an aspect of the data in a single value
● Single Variable Summary Stats
Mean: Characterizes center of the distribution
Median: Also characterizes the center, robust to outliers
Mode: Most common value(s)
Standard Deviation: Characterizes spread of distribution around the mean
Histograms
● Two Variable Summary Statistics
Correlation Coefficient: characterizes linear relationship between variables
Scatterplots
Regression Line (can also be used to make predictions)
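The summary statistics above can be computed in a few lines. This is a minimal sketch using Python's standard library; the data are made up for illustration, and the SD is assumed to be the "divide by n" version (pstdev), which may differ from the convention used in class.

```python
import statistics

# Hypothetical paired data, made up purely for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Single-variable summaries of y.
mean_y = statistics.mean(y)        # center of the distribution
median_y = statistics.median(y)    # center, robust to outliers
modes_y = statistics.multimode(y)  # most common value(s)
sd_y = statistics.pstdev(y)        # spread around the mean (divides by n)

# Two-variable summaries.
mean_x = statistics.mean(x)
sd_x = statistics.pstdev(x)
# Correlation coefficient: average product of deviations, divided by both SDs.
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / len(x)
r = cov_xy / (sd_x * sd_y)

# Regression line for predicting y from x.
slope = r * sd_y / sd_x
intercept = mean_y - slope * mean_x


def predict(new_x):
    """Predicted y for a new x value, using the regression line."""
    return intercept + slope * new_x


print(mean_y, median_y, modes_y, round(sd_y, 3), round(r, 3), round(predict(6), 2))
```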
Box Models:
● Box represents the overall population
● We take random draws from the box
● Can find the expected value and standard error
              Expected (new average)        SE (new SD)
Sum           (sample size) x (box avg)     √(sample size) x (box SD)
Average       box average                   (box SD) ÷ √(sample size)
Percentage    (box average) x 100%          ((box SD) ÷ √(sample size)) x 100%
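As a worked example of the expected value and SE formulas for sums and averages, here is a small sketch. It assumes draws are made with replacement, and the box (a fair die) and the helper name box_summary are chosen only for illustration.

```python
import math
import statistics


def box_summary(box, n_draws):
    """Expected value and SE for the sum and average of n_draws from the box.

    Assumes independent draws made with replacement.
    """
    box_avg = statistics.mean(box)
    box_sd = statistics.pstdev(box)  # SD of the box (divides by n)
    return {
        "ev_sum": n_draws * box_avg,
        "se_sum": math.sqrt(n_draws) * box_sd,
        "ev_avg": box_avg,
        "se_avg": box_sd / math.sqrt(n_draws),
    }


# Hypothetical box: the six faces of a fair die, 100 draws.
result = box_summary([1, 2, 3, 4, 5, 6], 100)
for name, value in result.items():
    print(name, round(value, 3))
```

Note that the SE of the sum grows with the number of draws, while the SE of the average shrinks.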
Box Models Continued:

Law of Large Numbers:
● As the number of draws grows, the average/percentage of the draws → box average
Central Limit Theorem
● Sums/averages of draws from the box → approximately Normal(Expected Value, Standard Error) when the number of draws is large
● Assumes independent draws
● Number of draws needed depends on probability distribution of the box
● General rule of thumb: the more draws, the better the normal approximation
● Only true for sums/averages, not products
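One way to see the Central Limit Theorem at work is to simulate it. The sketch below repeatedly draws from a hypothetical box (a fair die), sums the draws, and checks how often the sum lands within 1 and 2 SEs of its expected value. The box, the number of draws, the number of repeats, and the seed are all arbitrary choices for illustration.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

box = [1, 2, 3, 4, 5, 6]  # hypothetical box: a fair die
n_draws = 100             # draws per sum
n_repeats = 2000          # how many sums to simulate

ev_sum = n_draws * statistics.mean(box)
se_sum = math.sqrt(n_draws) * statistics.pstdev(box)

# Each entry is the sum of n_draws independent draws (with replacement).
sums = [sum(rng.choices(box, k=n_draws)) for _ in range(n_repeats)]

# Fraction of simulated sums within 1 SE and 2 SE of the expected value.
within_1_se = sum(abs(s - ev_sum) <= se_sum for s in sums) / n_repeats
within_2_se = sum(abs(s - ev_sum) <= 2 * se_sum for s in sums) / n_repeats

print(round(within_1_se, 3), round(within_2_se, 3))
```

If the normal approximation is working, the two fractions should come out near 68% and 95%.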
Statistical Inference
Thus far, we have assumed that we know what's inside the box
● Then we can describe what happens for random samples from the box (CLT, LLN)
But what if we don't know the contents of the box?
● This is the core problem of statistics
● We have a random sample from the box, and want to guess what is inside
● In other words, can we find the box average/standard deviation from the sample?
● Can we quantify how uncertain we are?
Estimating Averages/Percentages
● Sample Mean/Percentage = Population Mean/Percentage + Chance Error
● Idea: We can approximate the population mean by the sample mean
● We know from the Law of Large Numbers that this estimate will get better as the sample
size gets bigger
● But how certain are we?
Bootstrapping
● To quantify our uncertainty, we need the SE
● To get the standard error, we need the box SD
○ But we don't have the box standard deviation!
● Idea: Use the SD of the sample to approximate the SD of the box
● This is called the "Bootstrap"
Confidence Intervals:
● We want a concise summary of how accurate our estimate is
● We can use the SE and the Normal approximation
● About 68% of the time, the sample mean will be within 1 SE of the true mean
● About 95% of the time, the sample mean will be within 2 SE of the true mean
(more precisely, 1.96 SE)
● About 99% of the time, the sample mean will be within 2.58 SE of the true mean
Confidence Intervals Continued

95% Confidence Interval: Sample Mean ± 1.96 SE
● First find sample average and sample SD
● Next compute the SE
● Then find the lower and upper bounds of the Confidence Interval
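The three steps above can be sketched as a short function. This assumes the bootstrap idea from earlier (the sample SD stands in for the box SD) and the "divide by n" SD. The function name and the sample values are hypothetical, and the sample is kept small only so the arithmetic is easy to follow.

```python
import math
import statistics


def confidence_interval_95(sample):
    """Approximate 95% confidence interval for the population mean."""
    n = len(sample)
    # Step 1: sample average and sample SD.
    sample_avg = statistics.mean(sample)
    sample_sd = statistics.pstdev(sample)  # used in place of the unknown box SD
    # Step 2: SE for the average.
    se_avg = sample_sd / math.sqrt(n)
    # Step 3: lower and upper bounds.
    return sample_avg - 1.96 * se_avg, sample_avg + 1.96 * se_avg


# Hypothetical sample of 16 values with average 5 and SD 2.
sample = [2, 4, 4, 4, 5, 5, 7, 9] * 2
low, high = confidence_interval_95(sample)
print(round(low, 2), round(high, 2))
```

Here the SE is 2 ÷ √16 = 0.5, so the interval is 5 ± 0.98.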
Important reminders for SEs:

SE for sum = √(# of draws) x SD of box
SE for average = (SE for sum) ÷ (# of draws)
SE for count = SE for sum, from a 0-1 box
SE for percent = ((SE for count) ÷ (# of draws)) x 100%
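For counts and percentages, the box holds only 0s and 1s, so the same SE formulas apply: the SE for the count is √(# of draws) x (SD of box), and the SE for the percent is that value divided by the number of draws, times 100%. The sketch below uses the standard shortcut that the SD of a 0-1 box is √(fraction of 1s x fraction of 0s), which is not stated on the slides. The function name and the survey numbers are made up for illustration.

```python
import math


def se_count_and_percent(n_draws, fraction_of_ones):
    """SE for the count and the percent of 1s in n_draws from a 0-1 box."""
    # SD of a 0-1 box: sqrt(fraction of 1s * fraction of 0s).
    box_sd = math.sqrt(fraction_of_ones * (1 - fraction_of_ones))
    se_count = math.sqrt(n_draws) * box_sd  # same as SE for the sum
    se_percent = se_count / n_draws * 100
    return se_count, se_percent


# Hypothetical survey: 400 people sampled, 160 say "yes" (40%).
# The sample fraction stands in for the unknown box fraction.
se_count, se_percent = se_count_and_percent(400, 160 / 400)
print(round(se_count, 2), round(se_percent, 2))
```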
Interpreting Confidence Intervals:

● The randomness is in the sample, not the underlying parameter (i.e., mean/percentage)
● The true parameter is treated as fixed; the randomness comes from the chance error in
the sampling procedure
● We build the confidence interval from the sample, which means that the confidence
interval is random
● The intervals from any two samples will generally be different
Interpreting Confidence Intervals:

● A common incorrect interpretation:
○ The true mean will fall inside the 95% Confidence interval with 95% probability
(Wrong!)
○ The true mean is fixed. It's either inside or outside the Confidence Interval
● The correct Interpretation:

○ Confidence intervals are random objects generated by the sample
○ For each sample, you'll have a different confidence interval
○ If you take 100 samples, about 95 of the 100 confidence intervals will contain the true
mean
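This interpretation can be checked by simulation when the box is known. The sketch below draws many samples from a hypothetical box (a fair die), builds a 95% confidence interval from each sample, and counts how many intervals contain the true mean. The box, sample size, number of samples, and seed are arbitrary choices for illustration.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

box = [1, 2, 3, 4, 5, 6]          # hypothetical box: a fair die
true_mean = statistics.mean(box)  # fixed, not random
n_draws = 100                     # size of each sample
n_samples = 1000                  # number of samples (and intervals)

covered = 0
for _ in range(n_samples):
    sample = rng.choices(box, k=n_draws)  # draws with replacement
    avg = statistics.mean(sample)
    # Sample SD stands in for the box SD.
    se = statistics.pstdev(sample) / math.sqrt(n_draws)
    # The interval changes from sample to sample; the true mean does not.
    if avg - 1.96 * se <= true_mean <= avg + 1.96 * se:
        covered += 1

coverage = covered / n_samples
print(coverage)
```

The fraction of intervals that cover the true mean should come out close to 95%, even though any single interval either contains the true mean or does not.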
