STAT 5 Week 9 Notes Part 1
Review:
Summary Statistics
● Idea: Characterize an aspect of the data in a single value
● Single Variable Summary Stats
Mean: characterizes the center of the distribution
Median: also characterizes the center; robust to outliers
Mode: most common value(s)
Standard Deviation: characterizes the spread of the distribution around the mean
Histograms
● Two Variable Summary Statistics
Correlation Coefficient: characterizes the strength and direction of the linear relationship between two variables
Scatterplots
Regression Line (can also be used to make predictions)
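The summary statistics reviewed above can be computed with Python's standard `statistics` module. This is a minimal sketch on a small hypothetical data set (`x` and `y` are made up for illustration). It assumes the SD is computed by dividing by n (`pstdev`), the convention used for the box SD; the helper names `correlation` and `predict` are hypothetical. The correlation is the average of the products of x and y in standard units, and the regression line has slope r × SD_y / SD_x and passes through the point of averages.

```python
import statistics

# Hypothetical paired data, for illustration only
x = [2, 4, 4, 5, 7, 9]
y = [1, 3, 2, 5, 6, 8]

# Single-variable summary statistics
mean_x = statistics.mean(x)
median_x = statistics.median(x)
modes_x = statistics.multimode(x)  # a list, since there can be more than one mode
sd_x = statistics.pstdev(x)        # SD dividing by n (assumed course convention)

def correlation(xs, ys):
    """Correlation coefficient: average of the products of x and y in standard units."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    return statistics.mean([(a - mx) / sx * (b - my) / sy for a, b in zip(xs, ys)])

# Two-variable summary statistics
r = correlation(x, y)

# Regression line for predicting y from x: slope = r * SD_y / SD_x, and the
# line passes through the point of averages (mean of x, mean of y).
slope = r * statistics.pstdev(y) / sd_x
intercept = statistics.mean(y) - slope * mean_x

def predict(new_x):
    """Predicted y for a given x, read off the regression line."""
    return intercept + slope * new_x

print(mean_x, median_x, modes_x, sd_x)
print(r, slope, intercept, predict(6))
```

Because r is a ratio, dividing by n or by n − 1 gives the same correlation as long as the choice is applied consistently.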
Box Models:
● The box represents the population
● We take random draws from the box
● We can find the expected value and standard error of the sum (or average) of the draws
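As a sketch of the box model, assuming n draws made at random with replacement: the expected value of the sum is n × (box average), and its SE is √n × (box SD); for the average of the draws, the expected value is the box average and the SE is (box SD) / √n. The box contents below are hypothetical, and the simulation with `random.Random.choices` is only a check that the formulas match what repeated draws actually do.

```python
import math
import random
import statistics

# Hypothetical box of tickets, for illustration only
box = [0, 2, 3, 4, 6]
n_draws = 25

box_avg = statistics.mean(box)
box_sd = statistics.pstdev(box)      # SD of the box (divide by number of tickets)

# Formulas for n draws made at random WITH replacement
ev_sum = n_draws * box_avg
se_sum = math.sqrt(n_draws) * box_sd
ev_avg = box_avg
se_avg = box_sd / math.sqrt(n_draws)

def simulate_sums(repetitions, seed=0):
    """Draw n_draws tickets with replacement, many times, and record each sum."""
    rng = random.Random(seed)
    return [sum(rng.choices(box, k=n_draws)) for _ in range(repetitions)]

sums = simulate_sums(10_000)
print(ev_sum, se_sum)                                   # 75 10.0
print(statistics.mean(sums), statistics.pstdev(sums))   # simulated; should be close to 75 and 10
```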
Statistical Inference
Thus far, we have assumed that we know what's inside the box
● Then we can describe what happens for random samples from the box (Central Limit Theorem, Law of Large Numbers)
But what if we don't know the contents of the box?
● This is the core problem of statistics
● We have a random sample from the box, and want to guess what is inside
● In other words, can we estimate the box average and standard deviation from the sample?
● Can we quantify how uncertain we are?
Estimating Averages/Percentages
● Sample Mean/Percentage = Population Mean/Percentage + Chance Error
● Idea: We can approximate the population mean by the sample mean
● We know from the Law of Large Numbers that this estimate tends to get closer to the population mean as the sample size grows
● But how certain are we?
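A small simulation can illustrate "sample mean = population mean + chance error". This sketch assumes a hypothetical box whose contents we pretend not to know (its average is 3) and draws with replacement; the helper `sample_mean` is hypothetical. By the Law of Large Numbers the chance error typically shrinks as n grows, although a single run at small n can go either way.

```python
import random
import statistics

# Hypothetical box; we pretend not to know it, but its average is 3
box = [0, 2, 3, 4, 6]
population_mean = statistics.mean(box)
rng = random.Random(1)   # fixed seed so the run is reproducible

def sample_mean(n):
    """Average of n draws made at random with replacement from the box."""
    return statistics.mean(rng.choices(box, k=n))

for n in (10, 100, 10_000):
    estimate = sample_mean(n)
    chance_error = estimate - population_mean   # sample mean = population mean + chance error
    print(f"n={n:>6}  sample mean={estimate:.3f}  chance error={chance_error:+.3f}")
```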
Bootstrapping
● To quantify our uncertainty, we need the standard error (SE)
● To get the SE, we need the box SD
○ But we don't have the box standard deviation!
● Idea: Use the SD of the sample to approximate the SD of the box
● This is called the "Bootstrap"
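The bootstrap step above amounts to plugging the sample SD into the SE formula. This is a minimal sketch; the function names and sample data are hypothetical. It assumes the sample SD divides by n (some courses divide by n − 1, which makes little difference for large samples). For percentages it assumes a 0-1 box, where the box SD is √(fraction of 1s × fraction of 0s), estimated from the sample fractions.

```python
import math
import statistics

def bootstrap_se_of_average(sample):
    """SE of the sample average, with the sample SD standing in for the unknown box SD."""
    sample_sd = statistics.pstdev(sample)   # SD of the sample, dividing by n (assumed convention)
    return sample_sd / math.sqrt(len(sample))

def bootstrap_se_of_percentage(sample_of_0s_and_1s):
    """SE of the sample percentage for a 0-1 box, in percentage points.
    Box SD = sqrt(fraction of 1s * fraction of 0s), estimated from the sample."""
    n = len(sample_of_0s_and_1s)
    p_hat = sum(sample_of_0s_and_1s) / n
    box_sd_estimate = math.sqrt(p_hat * (1 - p_hat))
    return 100 * box_sd_estimate / math.sqrt(n)

# Hypothetical samples, for illustration only
measurements = [2, 4, 4, 4, 5, 5, 7, 9]   # sample SD is 2, so SE is 2 / sqrt(8), about 0.71
votes = [1] * 100 + [0] * 300             # 25% ones, so SE is about 2.2 percentage points

print(bootstrap_se_of_average(measurements))
print(bootstrap_se_of_percentage(votes))
```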
Confidence Intervals:
● We want a concise summary of how accurate our estimate is
● We can use the SE and the normal approximation (justified by the Central Limit Theorem)
● About 68% of the time, the true mean will be within 1 SE of the sample mean
● About 95% of the time, the true mean will be within 2 SE of the sample mean (1.96 SE to be exact)
● About 99% of the time, the true mean will be within 2.58 SE of the sample mean
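Putting the pieces together, a normal-approximation confidence interval is sample mean ± z × SE, with the SE estimated by the bootstrap. This sketch uses a hypothetical helper `confidence_interval` and made-up data; the cutoffs z come from `statistics.NormalDist().inv_cdf`, which gives about 1.96 for 95% and about 2.58 for 99% (and about 0.99 for 68%, i.e. roughly 1 SE).

```python
import math
from statistics import NormalDist, mean, pstdev

def confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the box average:
    sample mean +/- z * SE, with the SE estimated from the sample SD (bootstrap)."""
    se = pstdev(sample) / math.sqrt(len(sample))
    # z is the cutoff leaving (1 - level) / 2 in each tail of the standard normal curve
    z = NormalDist().inv_cdf(0.5 + level / 2)
    center = mean(sample)
    return center - z * se, center + z * se

# Hypothetical sample, for illustration only
sample = [2, 4, 4, 4, 5, 5, 7, 9]
for level in (0.68, 0.95, 0.99):
    low, high = confidence_interval(sample, level)
    print(f"{level:.0%} CI: ({low:.2f}, {high:.2f})")
```

The normal approximation relies on the Central Limit Theorem, so it is meant for large samples; with only 8 draws, as here, the intervals are rough.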