Advanced Statistical Approaches To Quality: INSE 6220 - Week 4
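\[ P(Z > z_{\alpha/2}) = 1 - P(Z \le z_{\alpha/2}) = 1 - \Phi(z_{\alpha/2}) = \frac{\alpha}{2} \]
\[ \Rightarrow \Phi(z_{\alpha/2}) = 1 - \frac{\alpha}{2} \]
\[ \Rightarrow z_{\alpha/2} = \Phi^{-1}\left(1 - \frac{\alpha}{2}\right) \]
Also, we have: \( P(Z \le -z_{\alpha/2}) = \Phi(-z_{\alpha/2}) = \frac{\alpha}{2} \)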
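Mean
  Population: \( \mu_X = E(X) \)
  Sample: \( \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \)
Variance
  Population: \( \sigma_X^2 = Var(X) = E[(X - \mu_X)^2] = E(X^2) - [E(X)]^2 \)
  Sample: \( S_X^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2 \)
Standard Deviation
  Population: \( \sigma_X = \sqrt{\sigma_X^2} \)
  Sample: \( S_X = \sqrt{S_X^2} \)
Covariance
  Population: \( \sigma_{XY} = Cov(X,Y) = E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - E(X)E(Y) \)
  Sample: \( S_{XY} = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) \)
Correlation Coefficient
  Population: \( \rho_{XY} = \frac{Cov(X,Y)}{\sqrt{Var(X)Var(Y)}} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} \)
  Sample: \( r_{XY} = \frac{S_{XY}}{S_X S_Y} \)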
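As a rough illustration of the sample formulas for the mean, variance, covariance, and correlation coefficient, the sketch below computes them with plain Python. The function name `sample_stats` and the paired data are hypothetical and chosen only for illustration.

```python
import math

def sample_stats(x, y):
    """Sample mean, variance (n-1 divisor), covariance, and correlation."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Unbiased sample variances use the n-1 divisor.
    s2_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    s2_y = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    # Sample covariance of the paired observations.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    # Sample correlation coefficient r = S_XY / (S_X * S_Y).
    r_xy = s_xy / math.sqrt(s2_x * s2_y)
    return {"x_bar": x_bar, "y_bar": y_bar, "s2_x": s2_x,
            "s2_y": s2_y, "s_xy": s_xy, "r_xy": r_xy}

# Hypothetical paired measurements, for illustration only.
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]
stats = sample_stats(x, y)
print(stats)
```

The correlation coefficient is unit-free and always lies between -1 and 1, which is why it is usually easier to interpret than the covariance.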
Statistical Inference
The purpose of statistical inference is to obtain information about a population from
information contained in a sample.
A population is the set of all the elements of interest.
A sample is a subset of the population.
The sample results provide only estimates of the values of the population
characteristics.
A parameter is a numerical characteristic of a population.
With proper sampling methods, the sample results will provide good estimates of
the population characteristics.
In point estimation we use the data from the sample to compute a value of a
sample statistic that serves as an estimate of a population parameter.
We refer to \( \bar{X} \) as the point estimator of the population mean \( \mu \).
\( s \) is the point estimator of the population standard deviation \( \sigma \).
When the expected value of a point estimator is equal to the population parameter,
the point estimator is said to be unbiased.
Sampling and Estimation
Sampling: act of making observations from populations
Random sampling: when each observation is identically and
independently distributed (i.i.d.)
Statistic: a function of sample data; a value that can be computed from
data (contains no unknowns)
Examples: average, median, standard deviation
A statistic is a random variable, which itself has a sampling distribution
i.e., if we take multiple random samples, the value for the statistic will be different
for each set of samples, but will be governed by the same sampling distribution
If we know the appropriate sampling distribution, we can reason about the
population based on the observed value of a statistic
E.g. we calculate a sample mean from a random sample; in what range do we
think the actual (population) mean really sits?
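To make the idea of a sampling distribution concrete, here is a minimal simulation sketch. It assumes a hypothetical Normal population (mean 400, standard deviation 50) and repeatedly draws random samples of size 25; these numbers are assumptions, not values from the lecture. The sample mean changes from sample to sample, but its spread settles near \( \sigma/\sqrt{n} \).

```python
import random
import statistics

random.seed(6220)  # fixed seed so the run is repeatable

MU, SIGMA = 400.0, 50.0   # hypothetical population mean and std. dev.
N = 25                    # observations in each random sample
NUM_SAMPLES = 2000        # number of repeated random samples

# Each entry is the statistic (sample mean) computed from one i.i.d. sample.
sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"mean of sample means: {mean_of_means:.2f}")
print(f"std. dev. of sample means: {sd_of_means:.2f}")
print(f"sigma / sqrt(n): {SIGMA / N ** 0.5:.2f}")
```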
Point and Interval Estimators
A point estimator draws inferences about a population by estimating the
value of an unknown parameter using a single value or point.
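An interval estimator draws inferences about a population by estimating the
value of an unknown parameter using an interval.
That is, we say (with some ___% certainty) that the population parameter of
interest is between some lower and upper bounds.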
Point & Interval Estimation
For example, suppose we want to estimate the mean summer income of a
class of Quality Systems Engineering students. For n=25 students,
\( \bar{X} \) is calculated to be 400 $/week.
Estimation Process
Sample
General Formula
Confidence Level
The confidence that the interval will contain the unknown population parameter
A percentage (less than 100%)
Finding the Critical Value, \( z_{\alpha/2} \)
Consider a 95% confidence interval:
\( 1 - \alpha = 0.95 \), so \( \alpha = 0.05 \) and \( \alpha/2 = 0.025 \) in each tail
\( z_{\alpha/2} = 1.96 \)
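The critical value can also be computed rather than read from a table. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for a given confidence level."""
    alpha = 1.0 - confidence
    # z_{alpha/2} is the standard Normal quantile at 1 - alpha/2.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```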
Solution:
\[ \bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 2.20 \pm 1.96\,(0.35/\sqrt{11}) = 2.20 \pm 0.2068 \]
\[ 1.9932 \le \mu \le 2.4068 \]
We are 95% confident that the true mean resistance is between 1.9932 and
2.4068 ohms.
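The resistance interval can be reproduced numerically. This is a minimal sketch using the values in the solution (sample mean 2.20 ohms, \( \sigma = 0.35 \), n = 11); the function name `z_interval` is hypothetical.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * sigma / math.sqrt(n)  # half-width of the interval
    return x_bar - margin, x_bar + margin

lower, upper = z_interval(x_bar=2.20, sigma=0.35, n=11)
print(f"{lower:.4f} <= mu <= {upper:.4f}")
```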
Although the true mean may or may not be in this interval, 95% of intervals
formed in this manner will contain the true mean
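The statement that 95% of such intervals contain the true mean can be checked by simulation. In this sketch the true mean is treated as known only inside the simulation, and the population is assumed Normal with the same \( \sigma = 0.35 \) and n = 11 as the resistance example; the number of trials is an arbitrary choice.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(4)  # fixed seed so the run is repeatable

MU, SIGMA, N = 2.20, 0.35, 11    # assumed "true" population and sample size
Z = NormalDist().inv_cdf(0.975)  # z_{alpha/2} for 95% confidence
TRIALS = 2000

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = fmean(sample)
    margin = Z * SIGMA / math.sqrt(N)
    # Count the interval as a hit if it covers the true mean.
    if x_bar - margin <= MU <= x_bar + margin:
        hits += 1

coverage = hits / TRIALS
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

Any single interval either contains the true mean or it does not; the 95% refers to the long-run fraction of intervals built this way.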
Do You Ever Truly Know \( \sigma \)?
Probably not!
In virtually all real world situations, \( \sigma \) is not known.
If there is a situation where \( \sigma \) is known then \( \mu \) is also known (since to calculate
\( \sigma \) you need to know \( \mu \)).
If you truly know \( \mu \) there would be no need to gather a sample to estimate it.
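\[ \frac{\bar{X} - \mu}{S/\sqrt{n}} \]
will be Student-t distributed with n-1 degrees of freedom.
How do I know this? Remember that the Student t distribution is the
sampling distribution of the standardized sample mean when \( \sigma \) is replaced
by its estimate S and the underlying distribution is Normal (or close to Normal).
The difference from the Normal distribution matters most when the sample size, n, is small.
Calculate the upper and lower CI limits using the Student-t distribution and \( 1-\alpha \) such
that \( P(LCL \le \mu \le UCL) = 1-\alpha \)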
Sampling: the Chi-Square distribution
\[ \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1} \]
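For \( k \to \infty \), \( t_k \to N(0,1) \)
Consider \( X_i \sim N(\mu, \sigma^2) \). Then
\[ \frac{\bar{X}-\mu}{s/\sqrt{n}} = \frac{(\bar{X}-\mu)/(\sigma/\sqrt{n})}{s/\sigma} \sim \frac{N(0,1)}{\sqrt{\chi^2_{n-1}/(n-1)}} \sim t_{n-1} \]
This is just the normalized distance from the mean (normalized by our
estimate of the standard deviation of the sample mean, \( s/\sqrt{n} \)).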
Confidence intervals: variance unknown
Case where we don't know the variance a priori
Now we have to estimate not only the mean based on our data, but also
estimate the variance
Our interval estimate of the mean with \( (1-\alpha)100\% \) confidence becomes
\[ \bar{X} - t_{\alpha/2,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}} \]
Note that the t distribution is slightly wider than the normal distribution, so that
our confidence interval on the true mean is not as tight as when we know the
variance.
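A short sketch of the t-based interval follows. The critical value \( t_{0.025,15} = 2.131 \) is the tabulated value for n = 16 and \( \alpha = 0.05 \); the sample mean and sample standard deviation are hypothetical, and the function name `t_interval` is an assumption. The critical value is passed in because the Python standard library has no t quantile function.

```python
import math

def t_interval(x_bar, s, n, t_crit):
    """Confidence interval for the mean when the variance is estimated by s^2."""
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

T_CRIT = 2.131   # t_{0.025,15}, read from a t table
Z_CRIT = 1.96    # z_{0.025}, for comparison

# Hypothetical sample summary: x_bar = 10.0, s = 2.0, n = 16.
t_lower, t_upper = t_interval(x_bar=10.0, s=2.0, n=16, t_crit=T_CRIT)
z_lower, z_upper = t_interval(x_bar=10.0, s=2.0, n=16, t_crit=Z_CRIT)

print(f"t interval: [{t_lower:.4f}, {t_upper:.4f}]")
print(f"z interval: [{z_lower:.4f}, {z_upper:.4f}]")
```

With the same data, the t interval is wider than the z interval, which is the price of estimating the variance from the sample.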
Confidence intervals: Estimate of variance
\[ \frac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}} \]
Example:
\( n = 16 \), \( \alpha = 0.05 \)
\( t_{\alpha/2,n-1} = t_{0.025,15} = 2.131 \)
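The confidence interval on the variance can be sketched in the same way for n = 16 and \( \alpha = 0.05 \). The chi-square critical values below are standard table values for 15 degrees of freedom (upper-tail areas 0.025 and 0.975); the sample variance is hypothetical, and the function name `variance_interval` is an assumption.

```python
def variance_interval(s2, n, chi2_upper, chi2_lower):
    """CI for sigma^2: (n-1)s^2/chi2_{alpha/2} <= sigma^2 <= (n-1)s^2/chi2_{1-alpha/2}."""
    lower = (n - 1) * s2 / chi2_upper
    upper = (n - 1) * s2 / chi2_lower
    return lower, upper

CHI2_0025_15 = 27.488  # chi-square value with upper-tail area 0.025, 15 d.o.f. (table)
CHI2_0975_15 = 6.262   # chi-square value with upper-tail area 0.975, 15 d.o.f. (table)

S2 = 4.0  # hypothetical sample variance
var_lower, var_upper = variance_interval(S2, 16, CHI2_0025_15, CHI2_0975_15)
print(f"{var_lower:.4f} <= sigma^2 <= {var_upper:.4f}")
```

The interval is not centered on the sample variance: the upper limit lies much farther from it than the lower limit, reflecting the asymmetry of the chi-square distribution.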
Confidence Interval on the Variance of a Normal Distribution
Summary of Procedure
Data Distribution: Normal, i.e. \( X \sim N(\mu, \sigma^2) \)
Population parameters:
Variance, \( \sigma^2 \): unknown, to be estimated
Sample Size: any
Sample Statistic: Sample variance
Sampling Distribution: Chi-squared Distribution
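Because the chi-square distribution is asymmetric, the confidence interval bounds are not symmetric about \( s^2 \).
Confidence Intervals (CIs): \( \frac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}} \)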
MATLAB
>> icdf('F',1-0.05/2,19,19)   % upper critical value of F(19,19): 2.5265
>> icdf('F',0.05/2,19,19)     % lower critical value of F(19,19): 0.3958
Hypothesis Testing
Null Hypothesis (pronounced "H nought"): \( H_0: \mu = 1.10 \)
Alternative Hypothesis: \( H_1: \mu \ne 1.10 \)
Example 3: Suppose that someone says that the average price of a liter of regular unleaded
gas in Montreal is $1.10. How would you decide whether this statement is true? You could try
to find out what every gas station in the city was charging and how many liters they were
selling at that price. That approach might be definitive, but it could end up costing more than
the information is worth. A simpler approach is to find out the price of gas at a small number of
randomly chosen stations around the city and compare the average price to $1.10.
Of course, the average price you get will probably not be exactly $1.10 due to variability in
price from one station to the next. Suppose your average price was $1.18. Is this eight-cent
difference a result of chance variability, or is the original assertion incorrect? A hypothesis test
can provide an answer.
Hypothesis Test Terminology
The significance level is related to the degree of certainty you require in order to reject the
null hypothesis in favor of the alternative. By taking a small sample you cannot be certain
about your conclusion. So you decide in advance to reject the null hypothesis if the
probability of observing your sampled result is less than the significance level. For a
typical significance level of 5%, the notation is \( \alpha = 0.05 \). For this significance level, the
probability of incorrectly rejecting the null hypothesis when it is actually true is 5%. If you
need more protection from this error, then choose a lower value of \( \alpha \).
The p-value is the probability of observing a sample result at least as extreme as the one
obtained, under the assumption that the null hypothesis is true. If the p-value is less than
\( \alpha \), then you reject the null hypothesis. For example, if \( \alpha = 0.05 \) and the p-value is 0.03,
then you reject the null hypothesis. The converse is not true. If the p-value is greater than
\( \alpha \), you have insufficient evidence to reject the null hypothesis.
The null hypothesis is always about a population parameter, not about a sample
statistic:
\( H_0: \mu = 3 \) is a valid null hypothesis; \( H_0: \bar{X} = 3 \) is not.
Type I and Type II Errors
Since hypothesis tests are based on sample data, we must allow
for the possibility of errors.
A Type I error is rejecting H0 when it is true.
The person conducting the hypothesis test specifies the
maximum allowable probability of making a
Type I error, denoted by \( \alpha \) and called the level of significance.
A Type II error is accepting H0 when it is false.
Generally, we cannot control the probability of making a Type
II error, denoted by \( \beta \).
Statisticians avoid the risk of making a Type II error by saying "do
not reject H0" rather than "accept H0".
P(Type I error) = \( \alpha \)
P(Type II error) = \( \beta \)
Inference on the mean of a population, variance known
\[ H_0: \mu = \mu_0 \]
\[ H_1: \mu \ne \mu_0 \]   (3-22)
\[ Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \]   (3-23)
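As a sketch of how the test statistic is used, the example below applies it to the gas price example (hypothesized mean $1.10, observed average $1.18). The population standard deviation (0.20) and the number of stations sampled (25) are hypothetical values chosen for illustration, and the function name `z_test_two_sided` is an assumption.

```python
import math
from statistics import NormalDist

def z_test_two_sided(x_bar, mu0, sigma, n):
    """Return Z0 and the two-sided p-value for H0: mu = mu0, sigma known."""
    z0 = (x_bar - mu0) / (sigma / math.sqrt(n))
    # Probability of a result at least as extreme as |z0| in either tail.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z0)))
    return z0, p_value

ALPHA = 0.05  # significance level chosen in advance

# sigma and n are hypothetical; x_bar and mu0 come from the gas price example.
z0, p_value = z_test_two_sided(x_bar=1.18, mu0=1.10, sigma=0.20, n=25)
reject_h0 = p_value < ALPHA

print(f"Z0 = {z0:.3f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Under these assumed values the p-value falls just below 0.05, so the null hypothesis would be rejected at the 5% level; a larger \( \sigma \) or a smaller sample would change that conclusion.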
F-test statistic
Introduction to control charts
Principal purpose: early detection of an out-of-control process
A process is out of control if it is producing items which are
off target or
too variable
An out-of-control process is likely to produce many nonconforming items
If an assignable cause can be found, the process can be corrected and
brought back into control.
A capable, in-control process will produce fewer nonconforming items.
An \( \bar{x} \) chart is used if the quality of the output is measured in terms of a variable such as
length, weight, temperature, and so on.
\( \bar{x} \) represents the mean value found in a sample of the output.
An R chart is used to monitor the range of the measurements in the sample.
A p chart is used to monitor the proportion defective in the sample.
An np chart is used to monitor the number of defective items in the sample.
>> controlchart(data,'chart','xbar','sigma','range','rules','we6');
Shewhart Control Charts
Suppose we have a general statistic W
We plot W over time
We specify control limits of the form
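\[ UCL = \mu_W + 3\sigma_W \]
\[ CL = \mu_W \]
\[ LCL = \mu_W - 3\sigma_W \]
where \( \mu_W \) is the mean of W and \( \sigma_W \) is the standard deviation of W.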
A control chart based on a number of standard deviations of the statistic
from the mean of the statistic is called a Shewhart Control Chart
Some commonly used Ws
X-bar: Average
R: Range
s: Standard deviation
We can also specify control charts using probability limits
So an unbiased estimator of \( \sigma \) is given by: \( \hat{\sigma} = \bar{R}/d_2 \)
X-bar Control Charts
Control Limits
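\[ UCL = \mu_{\bar{X}} + 3\sigma_{\bar{X}} \]
\[ CL = \mu_{\bar{X}} \]
\[ LCL = \mu_{\bar{X}} - 3\sigma_{\bar{X}} \]
Therefore, with \( A_2 = \frac{3}{d_2\sqrt{n}} \):
\[ LCL = \bar{\bar{X}} - \frac{3}{d_2\sqrt{n}}\bar{R} = \bar{\bar{X}} - A_2\bar{R} \]
\[ CL = \bar{\bar{X}} \]
\[ UCL = \bar{\bar{X}} + \frac{3}{d_2\sqrt{n}}\bar{R} = \bar{\bar{X}} + A_2\bar{R} \]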
R-Control Charts
We are looking to make control charts of the form
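\[ LCL = \mu_R - 3\sigma_R \]
\[ UCL = \mu_R + 3\sigma_R \]
So
\[ \hat{\sigma}_R = d_3\frac{\bar{R}}{d_2} \]
Thus
\[ LCL = \bar{R} - 3d_3\frac{\bar{R}}{d_2} = \left(1 - \frac{3d_3}{d_2}\right)\bar{R} = D_3\bar{R} \]
\[ UCL = \bar{R} + 3d_3\frac{\bar{R}}{d_2} = \left(1 + \frac{3d_3}{d_2}\right)\bar{R} = D_4\bar{R} \]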
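The chart constants follow directly from \( d_2 \), \( d_3 \), and the subgroup size. The sketch below assumes subgroups of size n = 5 and uses the standard table values \( d_2 = 2.326 \) and \( d_3 = 0.864 \) for that size. It also truncates \( D_3 \) at zero, which is the usual convention because a range cannot be negative; that truncation is an addition to the algebra and is flagged in the code. The function name `chart_constants` is hypothetical.

```python
import math

def chart_constants(n, d2, d3):
    """Compute A2, D3, D4 for x-bar and R charts from d2, d3, and subgroup size n."""
    a2 = 3.0 / (d2 * math.sqrt(n))
    # 1 - 3*d3/d2 can be negative for small n; the usual convention sets D3 to 0
    # in that case because a range cannot fall below zero.
    d3_factor = max(0.0, 1.0 - 3.0 * d3 / d2)
    d4_factor = 1.0 + 3.0 * d3 / d2
    return a2, d3_factor, d4_factor

# Standard table values for subgroups of size 5.
A2, D3, D4 = chart_constants(n=5, d2=2.326, d3=0.864)
print(f"A2 = {A2:.3f}, D3 = {D3:.3f}, D4 = {D4:.3f}")
```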
Example
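\( \bar{x} \) Chart:
UCL \( = \bar{\bar{x}} + A_2\bar{R} = 1.5056 + (0.577)(0.32521) = 1.69325 \)
Central line \( = \bar{\bar{x}} = 1.5056 \)
LCL \( = \bar{\bar{x}} - A_2\bar{R} = 1.5056 - (0.577)(0.32521) = 1.31795 \)
R Chart:
UCL \( = D_4\bar{R} = (2.114)(0.32521) = 0.68749 \)
Central line \( = \bar{R} = 0.32521 \)
LCL \( = D_3\bar{R} = (0)(0.32521) = 0 \)
\( \bar{\bar{x}} = 1.50345 \)
\( \bar{R} = 0.3360 \)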
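The control limits in this example can be reproduced with a few lines of code. This sketch uses the grand mean 1.5056, the average range 0.32521, and the constants 0.577, 0, and 2.114 from the example; the function name `xbar_r_limits` is hypothetical.

```python
def xbar_r_limits(x_double_bar, r_bar, a2, d3, d4):
    """Return (LCL, CL, UCL) for the x-bar chart and for the R chart."""
    return {
        "xbar": (x_double_bar - a2 * r_bar, x_double_bar, x_double_bar + a2 * r_bar),
        "range": (d3 * r_bar, r_bar, d4 * r_bar),
    }

limits = xbar_r_limits(1.5056, 0.32521, a2=0.577, d3=0.0, d4=2.114)

for chart, (lcl, cl, ucl) in limits.items():
    print(f"{chart}: LCL = {lcl:.5f}, CL = {cl:.5f}, UCL = {ucl:.5f}")
```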
X-bar chart using MATLAB
>> load parts
>> controlchart(runout,'chart','xbar','sigma','range','rules','we6');
Interpreting Charts:
Observations outside control limits indicate the process
is probably out-of-control
Significant patterns in the observations indicate the
process is probably out-of-control
On rare occasions, random causes alone will make an in-control
process appear out-of-control (a false alarm)
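The simplest interpretation rule, a point outside the control limits, can be sketched as follows. The subgroup means are hypothetical; the limits are the x-bar chart limits 1.31795 and 1.69325 from the worked example. Pattern-based rules, such as the run rules selected by the 'rules' option in the MATLAB call, are not implemented in this sketch.

```python
def points_outside_limits(values, lcl, ucl):
    """Return the indices of plotted points that fall outside the control limits."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

LCL, UCL = 1.31795, 1.69325  # x-bar chart limits from the worked example

# Hypothetical subgroup means, for illustration only.
subgroup_means = [1.51, 1.48, 1.72, 1.55, 1.30, 1.60]

flagged = points_outside_limits(subgroup_means, LCL, UCL)
print("out-of-control signals at subgroups:", flagged)
```

A flagged point is a signal to look for an assignable cause, not proof that one exists, since an in-control process will occasionally produce a point outside the limits.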