Advanced Statistical Approaches To Quality: INSE 6220 - Week 4

INSE 6220 -- Week 4

Advanced Statistical Approaches to Quality
Inferences about Process Control

Sampling and Estimation
Confidence intervals
Control Charts and hypothesis testing
Statistical basis for Control Charts
Dr. A. Ben Hamza Concordia University

Using the normal cdf and pdf

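We often want to talk about percentage points of the distribution, i.e. the portion in the tails.
$P(Z > z_{\alpha/2}) = 1 - P(Z \le z_{\alpha/2}) = 1 - \Phi(z_{\alpha/2}) = \alpha/2$
$\Phi(z_{\alpha/2}) = 1 - \alpha/2$
$z_{\alpha/2} = \Phi^{-1}(1 - \alpha/2)$
>> icdf('normal',1-alpha/2,0,1)

Also, we have: $P(Z \le -z_{\alpha/2}) = \Phi(-z_{\alpha/2}) = \alpha/2$
Example: $z_{0.20/2} = z_{0.10} = 1.2816$
$z_{0.05/2} = z_{0.025} = 1.96$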
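The two example values can be checked with a short Python sketch. It uses `statistics.NormalDist.inv_cdf` from the standard library as a stand-in for the MATLAB `icdf` call; the helper name `z_critical` is an assumption made for this illustration.

```python
from statistics import NormalDist


def z_critical(alpha: float) -> float:
    """Return z_{alpha/2}, the upper alpha/2 percentage point of N(0, 1)."""
    # Inverse cdf of the standard normal evaluated at 1 - alpha/2.
    return NormalDist(mu=0.0, sigma=1.0).inv_cdf(1.0 - alpha / 2.0)


print(round(z_critical(0.20), 4))  # z_{0.10}
print(round(z_critical(0.05), 4))  # z_{0.025}
```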
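Moments of the population vs. sample statistics
Mean
  Population: $\mu_X = E(X)$
  Sample: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Variance
  Population: $\sigma_X^2 = Var(X) = E[(X - \mu_X)^2] = E(X^2) - [E(X)]^2$
  Sample: $S_X^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
Standard Deviation
  Population: $\sigma_X = \sqrt{\sigma_X^2}$
  Sample: $S_X = \sqrt{S_X^2}$
Covariance
  Population: $\sigma_{XY}^2 = Cov(X,Y) = E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - E(X)E(Y)$
  Sample: $S_{XY}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$
Correlation Coefficient
  Population: $\rho_{XY} = \frac{Cov(X,Y)}{\sqrt{Var(X)\,Var(Y)}} = \frac{\sigma_{XY}^2}{\sigma_X \sigma_Y}$
  Sample: $r_{XY} = \frac{S_{XY}^2}{S_X S_Y}$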
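As a rough illustration of the sample-side formulas, the following Python sketch computes the sample mean, variance, covariance and correlation coefficient with the n-1 divisor. The function name `sample_stats` and the data are hypothetical, chosen only so the result is easy to check by hand.

```python
from math import sqrt


def sample_stats(x, y):
    """Sample mean, variance, covariance and correlation (divisor n - 1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s2_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    s2_y = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    # Sample covariance: average cross-product of deviations.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    # Correlation: covariance scaled by both standard deviations.
    r_xy = s_xy / (sqrt(s2_x) * sqrt(s2_y))
    return {"x_bar": x_bar, "y_bar": y_bar, "s2_x": s2_x,
            "s2_y": s2_y, "s_xy": s_xy, "r_xy": r_xy}


# Hypothetical data: y is exactly 2x, so the correlation should be 1.
stats = sample_stats([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(stats)
```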
Statistical Inference
The purpose of statistical inference is to obtain information about a population from
information contained in a sample.
A population is the set of all the elements of interest.
A sample is a subset of the population.
The sample results provide only estimates of the values of the population
characteristics.
A parameter is a numerical characteristic of a population.
With proper sampling methods, the sample results will provide good estimates of
the population characteristics.
In point estimation we use the data from the sample to compute a value of a
sample statistic that serves as an estimate of a population parameter.
We refer to $\bar{X}$ as the point estimator of the population mean $\mu$.
$s$ is the point estimator of the population standard deviation $\sigma$.
When the expected value of a point estimator is equal to the population parameter,
the point estimator is said to be unbiased.
Sampling and Estimation
Sampling: act of making observations from populations
Random sampling: when each observation is identically and
independently distributed (i.i.d.)
Statistic: a function of sample data; a value that can be computed from
data (contains no unknowns)
average, median, standard deviation
A statistic is a random variable, which itself has a sampling distribution
i.e., if we take multiple random samples, the value for the statistic will be different
for each set of samples, but will be governed by the same sampling distribution
If we know the appropriate sampling distribution, we can reason about the
population based on the observed value of a statistic
E.g. we calculate a sample mean from a random sample; in what range do we
think the actual (population) mean really sits?
Point and Interval Estimators
A point estimator draws inferences about a population by estimating the
value of an unknown parameter using a single value or point.
An interval estimator draws inferences about a population by estimating the
value of an unknown parameter using an interval.
That is, we say (with some ___% certainty) that the population parameter of
interest is between some lower and upper bounds.
Point & Interval Estimation
For example, suppose we want to estimate the mean summer income of a
class of Quality Systems Engineering students. For n = 25 students,
the sample mean is calculated to be 400 $/week. This is a point estimate.
An alternative statement is an interval estimate:
The mean income is between 380 and 420 $/week.
Population vs. Sampling Distribution
Sampling Distributions
The probability distribution of a statistic is called a sampling distribution.
Sampling distribution for the sample mean when the sample is large: Normal distribution
Sampling distribution for the sample mean when the sample is small: Student-t distribution
Sampling distribution for the sample variance: Chi-squared distribution
Sampling distribution of the ratio of two sample variances: F distribution
Estimation Process
[Figure: a random sample with sample mean $\bar{X} = 50$ is drawn from a population whose mean $\mu$ is unknown, leading to the statement "I am 95% confident that $\mu$ is between 40 and 60."]
General Formula
The general formula for all confidence intervals is:
Point Estimate $\pm$ (Critical Value)(Standard Error)

Where:
Point Estimate is the sample statistic estimating the population
parameter of interest
Critical Value is a table value based on the sampling distribution
of the point estimate and the desired confidence level
Standard Error is the standard deviation of the point estimate
Confidence Level, $(1-\alpha)$
Confidence level: the confidence that the interval will contain the unknown population parameter
A percentage (less than 100%)
Suppose confidence level = 95%
Also written $(1 - \alpha) = 0.95$ (so $\alpha = 0.05$)
A relative frequency interpretation:
95% of all the confidence intervals that can be constructed will contain
the unknown true parameter
A specific interval either will contain or will not contain the true
parameter
No probability involved in a specific interval
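The relative frequency interpretation can be illustrated with a small simulation. The sketch below repeatedly draws samples from a normal population with known $\sigma$, builds a 95% interval each time, and counts how often the interval contains the true mean. The population values, sample size, number of trials and seed are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(mu=50.0, sigma=10.0, n=25, alpha=0.05, trials=4000, seed=1):
    """Fraction of simulated (1 - alpha) intervals that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sigma / sqrt(n)  # known-sigma interval half-width
    hits = 0
    for _ in range(trials):
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # Each specific interval either contains mu or it does not.
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials


print(coverage())  # should be close to 0.95
```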
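Confidence interval on the mean: variance known
We know $\sigma$, e.g. from historical data
Estimate the mean in some interval to $(1-\alpha)100\%$ confidence:
$\bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
The width of this interval is $2 z_{\alpha/2}\,\sigma/\sqrt{n}$.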
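Finding the Critical Value, $z_{\alpha/2}$
Consider a 95% confidence interval:
$1 - \alpha = 0.95$, so $\alpha = 0.05$ and $\alpha/2 = 0.025$ in each tail, giving $z_{\alpha/2} = 1.96$
[Figure: standard normal curve with area 0.025 in each tail. Z units: $-z_{\alpha/2} = -1.96$, 0, $z_{\alpha/2} = 1.96$. X units: lower confidence limit, point estimate, upper confidence limit.]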
Example
A sample of 11 circuits from a large normal population has a mean
resistance of 2.20 ohms. We know from past testing that the population
standard deviation is 0.35 ohms. Determine a 95% confidence interval
for the true mean resistance of the population.
Solution:
$\bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
$= 2.20 \pm 1.96\,(0.35/\sqrt{11})$
$= 2.20 \pm 0.2068$
$1.9932 \le \mu \le 2.4068$
We are 95% confident that the true mean resistance is between 1.9932 and
2.4068 ohms.
Although the true mean may or may not be in this interval, 95% of intervals
formed in this manner will contain the true mean
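The circuit example can be reproduced with a few lines of Python. This is a minimal sketch that assumes a normal population with known $\sigma$; the function name `z_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Two-sided confidence interval on the mean when sigma is known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Values from the example: n = 11, mean 2.20 ohms, sigma = 0.35 ohms.
lower, upper = z_confidence_interval(2.20, 0.35, 11)
print(round(lower, 4), round(upper, 4))
```

Small differences from the slide values in the fourth decimal place come from using the unrounded critical value instead of 1.96.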
Do You Ever Truly Know $\sigma$?
Probably not!
In virtually all real world situations, $\sigma$ is not known.
If there is a situation where $\sigma$ is known then $\mu$ is also known (since to calculate
$\sigma$ you need to know $\mu$).
If you truly know $\mu$ there would be no need to gather a sample to estimate it.
Confidence Interval for $\mu$ ($\sigma$ Unknown)

If the population standard deviation is unknown, we can substitute the
sample standard deviation, S
This introduces extra uncertainty, since S is variable from sample to
sample
So we use the t distribution instead of the normal distribution
Confidence Interval on the Mean of a Normal Distribution, Variance Unknown
Want to estimate the population mean, $\mu$, of data that is Normally
distributed with unknown variance (i.e. small sample)
Take a sample of size n from Normally distributed data (Note: n < 40)
Find the sample mean, $\bar{X}$
$\frac{\bar{X} - \mu}{S/\sqrt{n}}$ will be Student-t distributed with n-1 degrees of freedom
How do I know this? Remember that the Student t distribution is the
sampling distribution of the standardized sample mean when the sample size, n, is small
and the underlying distribution is Normal (or close to Normal)
Calculate the upper and lower CI limits using the Student-t distribution and $1-\alpha$ such
that $P(LCL \le \mu \le UCL) = 1-\alpha$
Sampling: the Chi-Square distribution
$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$
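Sampling: the Student t distribution

If $Z \sim N(0,1)$ and $Y \sim \chi^2_k$, then $\frac{Z}{\sqrt{Y/k}} \sim t_k$ is distributed as a Student
t distribution with k degrees of freedom.
Typical use: Find distribution of average when $\sigma$ is NOT known
For $k \to \infty$, $t_k \to N(0,1)$
Consider $X_i \sim N(\mu, \sigma^2)$. Then
$\frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{s/\sigma} \sim \frac{N(0,1)}{\sqrt{\chi^2_{n-1}/(n-1)}} \sim t_{n-1}$
This is just the normalized distance from the mean (normalized by our
sample estimate of the standard deviation).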
Confidence intervals: variance unknown
Case where we don't know the variance a priori
Now we have to estimate not only the mean based on our data, but also
estimate the variance
Our estimate of the mean to some interval with $(1-\alpha)100\%$ confidence becomes
$\bar{X} - t_{\alpha/2,n-1}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,n-1}\,\frac{s}{\sqrt{n}}$
Note that the t distribution is slightly wider than the normal distribution, so that
our confidence interval on the true mean is not as tight as when we know the
variance.
Confidence intervals: Estimate of variance
$\frac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}}$
The appropriate sampling distribution is the Chi-square
Because chi-square is asymmetric, the confidence interval bounds are not symmetric.
Confidence Interval on the Mean of a Normal Distribution, Small Sample Size

Summary of Procedure
Data Distribution: Normal, i.e. $X \sim N(\mu, \sigma^2)$
Population parameters:
Mean, $\mu$: unknown, to be estimated
Variance, $\sigma^2$: unknown, estimate using $S^2$
Sample Size: small (i.e. n < 40)
Sample Statistic: Sample Mean
Sampling Distribution: Student t Distribution
Confidence Intervals (CIs):
2-sided CI: $\bar{X} \pm t_{\alpha/2,n-1}\,\frac{S}{\sqrt{n}}$
Upper 1-sided CI: $\mu \le \bar{X} + t_{\alpha,n-1}\,\frac{S}{\sqrt{n}}$
Lower 1-sided CI: $\mu \ge \bar{X} - t_{\alpha,n-1}\,\frac{S}{\sqrt{n}}$
t-distribution table
The shaded area is equal to $\alpha$ for $t \ge t_{\alpha,\nu}$
$\nu$ = degrees of freedom
Example:
$n = 16$, $\alpha = 0.05$
$t_{\alpha/2,n-1} = t_{0.025,15} = 2.131$
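To show how the table value is used, here is a hedged Python sketch of a two-sided t interval for n = 16. The critical value 2.131 is the table entry quoted in the example; the data set and the function name `t_confidence_interval` are made up for illustration. The Python standard library has no t quantile function, so the critical value is passed in by hand.

```python
from math import sqrt
from statistics import mean, stdev

T_0025_15 = 2.131  # t_{0.025,15}, the table value quoted in the example


def t_confidence_interval(data, t_crit):
    """Two-sided CI on the mean with sigma estimated by S (divisor n - 1)."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation
    half_width = t_crit * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical sample of n = 16 measurements (not from the slides).
sample = [9.0, 11.0] * 8
lower, upper = t_confidence_interval(sample, T_0025_15)
print(round(lower, 4), round(upper, 4))
```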
Confidence Interval on the Variance of a Normal Distribution
Summary of Procedure
Data Distribution: Normal, i.e. $X \sim N(\mu, \sigma^2)$
Population parameters:
Variance, $\sigma^2$: unknown, to be estimated
Sample Size: any
Sample Statistic: Sample variance
Sampling Distribution: Chi-squared Distribution
Because chi-square is asymmetric, the confidence interval bounds are not symmetric.
Confidence Intervals (CIs):
2-sided CI: $\frac{(n-1)S^2}{\chi^2_{\alpha/2,n-1}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,n-1}}$
Upper 1-sided CI: $\sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha,n-1}}$
Lower 1-sided CI: $\sigma^2 \ge \frac{(n-1)S^2}{\chi^2_{\alpha,n-1}}$
Sampling: the F distribution
MATLAB
>> icdf('F',1-0.05/2,19,19)   % returns 2.5265
>> icdf('F',0.05/2,19,19)     % returns 0.3958
Hypothesis Testing
Null hypothesis (pronounced "H nought"): $H_0: \mu = 1.10$
Alternative hypothesis: $H_1: \mu \ne 1.10$
A hypothesis test is a procedure for determining if an assertion about a characteristic of a
population is reasonable.
Example1: The mean monthly cell phone bill in this city is $\mu$ = $42
Example2: The proportion of adults in this city with cell phones is p = 0.68
Example3: Suppose that someone says that the average price of a liter of regular unleaded
gas in Montreal is $1.10. How would you decide whether this statement is true? You could try
to find out what every gas station in the city was charging and how many liters they were
selling at that price. That approach might be definitive, but it could end up costing more than
the information is worth. A simpler approach is to find out the price of gas at a small number of
randomly chosen stations around the city and compare the average price to $1.10.
Of course, the average price you get will probably not be exactly $1.10 due to variability in
price from one station to the next. Suppose your average price was $1.18. Is this eight-cent
difference a result of chance variability, or is the original assertion incorrect? A hypothesis test
can provide an answer.
Hypothesis Test Terminology
The significance level is related to the degree of certainty you require in order to reject the
null hypothesis in favor of the alternative. By taking a small sample you cannot be certain
about your conclusion. So you decide in advance to reject the null hypothesis if the
probability of observing your sampled result is less than the significance level. For a
typical significance level of 5%, the notation is $\alpha$ = 0.05. For this significance level, the
probability of incorrectly rejecting the null hypothesis when it is actually true is 5%. If you
need more protection from this error, then choose a lower value of $\alpha$.
The p-value is the probability of observing a sample result at least as extreme as the given one
under the assumption that the null hypothesis is true. If the p-value is less than $\alpha$, then you
reject the null hypothesis. For example, if $\alpha$ = 0.05 and the p-value is 0.03, then you reject the null
hypothesis. The converse is not true. If the p-value is greater than $\alpha$, you have insufficient
evidence to reject the null hypothesis.
The null hypothesis is always about a population parameter, not about a sample
statistic: $H_0: \mu = 3$ is a valid null hypothesis, $H_0: \bar{X} = 3$ is not.
Type I and Type II Errors
Since hypothesis tests are based on sample data, we must allow
for the possibility of errors.
A Type I error is rejecting H0 when it is true.
The person conducting the hypothesis test specifies the
maximum allowable probability of making a
Type I error, denoted by $\alpha$ and called the level of significance.
A Type II error is accepting H0 when it is false.
Generally, we cannot control the probability of making a Type
II error, denoted by $\beta$.
Statisticians avoid the risk of making a Type II error by saying
"do not reject H0" rather than "accept H0".
P(Type I error) = $\alpha$
P(Type II error) = $\beta$
Inference on the mean of a population, variance known
$H_0: \mu = \mu_0$
$H_1: \mu \ne \mu_0$ (3-22)
$Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ (3-23)
H1 in equation (3-22) is a two-sided alternative hypothesis
The procedure for testing this hypothesis is to:
take a random sample of n observations on the random variable x,
compute the test statistic, and
reject H0 if $|Z_0| > Z_{\alpha/2}$, where $Z_{\alpha/2}$ is the upper $\alpha/2$ percentage point of the
standard normal distribution.
In some situations we may wish to reject H0 only if the true mean is larger
than $\mu_0$
Thus, the one-sided alternative hypothesis is $H_1: \mu > \mu_0$, and we would reject
$H_0: \mu = \mu_0$ only if $Z_0 > Z_{\alpha}$
If rejection is desired only when $\mu < \mu_0$
Then the alternative hypothesis is $H_1: \mu < \mu_0$, and we reject H0 only if $Z_0 < -Z_{\alpha}$
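The test procedure can be sketched in Python as follows. The function name `z_test` and the `alternative` labels are assumptions made for this illustration. The numbers reuse the gas price setting (hypothesized mean $1.10, observed average $1.18), with a made-up $\sigma$ = 0.20 and n = 25.

```python
from math import sqrt
from statistics import NormalDist


def z_test(x_bar, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """Return (Z0, reject) for a test on the mean with sigma known."""
    z0 = (x_bar - mu0) / (sigma / sqrt(n))  # test statistic (3-23)
    std_normal = NormalDist()
    if alternative == "two-sided":      # H1: mu != mu0
        reject = abs(z0) > std_normal.inv_cdf(1.0 - alpha / 2.0)
    elif alternative == "greater":      # H1: mu > mu0
        reject = z0 > std_normal.inv_cdf(1.0 - alpha)
    elif alternative == "less":         # H1: mu < mu0
        reject = z0 < -std_normal.inv_cdf(1.0 - alpha)
    else:
        raise ValueError("alternative must be two-sided, greater or less")
    return z0, reject


# Hypothetical sigma and n; mu0 and x_bar follow the gas price example.
z0, reject = z_test(x_bar=1.18, mu0=1.10, sigma=0.20, n=25)
print(round(z0, 3), reject)
```

With these assumed values the statistic is about 2.0, which exceeds 1.96, so the two-sided test rejects H0 at the 5% level.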
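Confidence interval on the mean, variance known
Furthermore, a $100(1-\alpha)\%$ upper confidence bound on $\mu$ is
$\mu \le \bar{X} + z_{\alpha}\,\sigma/\sqrt{n}$
whereas a $100(1-\alpha)\%$ lower confidence bound on $\mu$ is
$\bar{X} - z_{\alpha}\,\sigma/\sqrt{n} \le \mu$
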
A Summary of Forms for Null and Alternative Hypotheses about a Population Mean
The equality part of the hypotheses always appears in the null
hypothesis.
In general, a hypothesis test about the value of a population mean
must take one of the following three forms (where $\mu_0$ is the
hypothesized value of the population mean).
One-tailed: $H_0: \mu \ge \mu_0$, $H_1: \mu < \mu_0$
One-tailed: $H_0: \mu \le \mu_0$, $H_1: \mu > \mu_0$
Two-tailed: $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$

Example
Example
F-test statistic
Introduction to control charts
Principal purpose: early detection of an out-of-control process
A process is out of control if it is producing items which are
off target or
too variable
An out-of-control process is likely to produce many nonconforming items
If an assignable cause can be found, the process can be corrected and
brought back into control.
A capable, in-control process will produce fewer nonconforming items.
Basic principle: Samples of measurements are periodically taken at one
or more stages of a production process to provide data for the monitoring
of the process.
Based on each sample, a statistic is computed and plotted against time.
The result is a time series of the observed statistic values.
Control Charts
A Control Chart is a graphical method to spot assignable
cause variation quickly.
Two important lines on a control chart are the upper
control limit (UCL) and lower control limit (LCL).
These lines are chosen so that when the process is in
control, there will be a high probability that the sample
finding will be between the two lines.
Values outside of the control limits provide strong
evidence that the process is out of control.
A range is specified within which the statistic is likely to
have come from the same distribution as the preceding
data.
A control chart is like a hypothesis test.
Control Limits are used to determine if the process
is in a state of statistical control (i.e., is producing
consistent output).
Specification Limits are used to determine if the
product will function in the intended fashion.
Control charts and hypothesis testing

Null hypothesis H0: process is in-control
Alternative hypothesis H1: process is out-of-control

When a point plots within the control limits, the null hypothesis is not
rejected
When a point plots outside the control limits, the null hypothesis is
rejected
Type I error:
1. Rejecting the null hypothesis when it is true
2. Concluding the process is out of control when it isn't
3. False Alarm: an in-control point plots outside the control limits
Type II error:
1. Not rejecting the null hypothesis when it is false
2. Failing to detect an out of control condition: an out-of-control point plots inside the
control limits
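The Type I error (false alarm) probability for a single plotted point can be computed directly. The sketch assumes that the plotted statistic is normally distributed when the process is in control and that the limits sit k standard deviations from the center line. The function name `false_alarm_probability` is hypothetical.

```python
from statistics import NormalDist


def false_alarm_probability(k: float = 3.0) -> float:
    """P(in-control point plots outside +/- k sigma limits), normal case."""
    # Two tails of the standard normal beyond +k and -k.
    return 2.0 * (1.0 - NormalDist().cdf(k))


alpha = false_alarm_probability(3.0)
print(round(alpha, 4))  # about 0.0027 for 3-sigma limits
```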
Types of Control Charts
An $\bar{x}$ chart is used if the quality of the output is measured in terms of a variable such as
length, weight, temperature, and so on.
$\bar{x}$ represents the mean value found in a sample of the output.
An R chart is used to monitor the range of the measurements in the sample.
A p chart is used to monitor the proportion defective in the sample.
An np chart is used to monitor the number of defective items in the sample.
>> controlchart(data,'chart','xbar','sigma','range','rules','we6');
Shewhart Control Charts
Suppose we have a general statistic W
We plot W over time
We specify control limits of the form
$UCL = \mu_W + 3\sigma_W$
$CL = \mu_W$
$LCL = \mu_W - 3\sigma_W$
where $\mu_W$ is the mean of W and $\sigma_W$ is the standard deviation of W
A control chart based on a number of standard deviations of the statistic
from the mean of the statistic is called a Shewhart Control Chart
Some commonly used W's
X-bar: Average
R: Range
s: Standard deviation
We can also specify control charts using probability limits
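X-bar Control Charts

We don't know $\mu$ and $\sigma$, so we must estimate them
If we have m subgroups, with averages
$\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_m$
then the best estimate for $\mu$ is
$\bar{\bar{X}} = \frac{\bar{X}_1 + \bar{X}_2 + \cdots + \bar{X}_m}{m}$
Suppose we have subgroup ranges ($R = X_{max} - X_{min}$)
$R_1, R_2, \ldots, R_m$, with average
$\bar{R} = \frac{R_1 + R_2 + \cdots + R_m}{m}$
It turns out that $\bar{R}$ is a biased estimator of $\sigma$, with biasing term $d_2$
So an unbiased estimator is given by: $\hat{\sigma} = \bar{R}/d_2$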
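X-bar Control Charts
Control Limits
$UCL = \mu_{\bar{X}} + 3\sigma_{\bar{X}}$
$CL = \mu_{\bar{X}}$
$LCL = \mu_{\bar{X}} - 3\sigma_{\bar{X}}$
Therefore, with $A_2 = \frac{3}{d_2\sqrt{n}}$:
$LCL = \bar{\bar{X}} - \frac{3}{d_2\sqrt{n}}\,\bar{R} = \bar{\bar{X}} - A_2\bar{R}$
$CL = \bar{\bar{X}}$
$UCL = \bar{\bar{X}} + \frac{3}{d_2\sqrt{n}}\,\bar{R} = \bar{\bar{X}} + A_2\bar{R}$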
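A minimal Python sketch of these limits, computed from raw subgroups, is given below. The $d_2$ values are the usual tabulated constants for subgroup sizes 2 to 5; the function name `xbar_limits` and the data are hypothetical.

```python
from math import sqrt

# Tabulated d2 constants for subgroup sizes n = 2..5 (standard SPC tables).
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}


def xbar_limits(subgroups):
    """Return (LCL, CL, UCL) of the X-bar chart from equal-size subgroups."""
    m = len(subgroups)
    n = len(subgroups[0])
    xbars = [sum(g) / n for g in subgroups]          # subgroup averages
    ranges = [max(g) - min(g) for g in subgroups]    # subgroup ranges
    grand_mean = sum(xbars) / m                      # estimate of mu
    r_bar = sum(ranges) / m                          # average range
    a2 = 3.0 / (D2[n] * sqrt(n))                     # A2 = 3 / (d2 sqrt(n))
    return grand_mean - a2 * r_bar, grand_mean, grand_mean + a2 * r_bar


# Hypothetical data: m = 2 subgroups of size n = 3.
lcl, cl, ucl = xbar_limits([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
print(round(lcl, 4), cl, round(ucl, 4))
```

In practice many more than two subgroups are needed before the limits are trustworthy; the tiny data set is only there to keep the arithmetic easy to follow.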
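R-Control Charts
We are looking to make control charts of the form
$LCL = \mu_R - 3\sigma_R$
$UCL = \mu_R + 3\sigma_R$
The best estimate for $\mu_R$ is $\bar{R}$
What about $\sigma_R$?
It turns out that: $\sigma_R = d_3\,\sigma$
So $\hat{\sigma}_R = d_3\,\frac{\bar{R}}{d_2}$
Thus
$LCL = \bar{R} - 3 d_3\,\frac{\bar{R}}{d_2} = \left(1 - \frac{3 d_3}{d_2}\right)\bar{R} = D_3\bar{R}$
$UCL = \bar{R} + 3 d_3\,\frac{\bar{R}}{d_2} = \left(1 + \frac{3 d_3}{d_2}\right)\bar{R} = D_4\bar{R}$
Since a range cannot be negative, $D_3$ is taken as 0 whenever $1 - 3d_3/d_2 < 0$.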
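The factors $D_3$ and $D_4$ follow directly from $d_2$ and $d_3$. The sketch below evaluates them for subgroup size n = 5, using the tabulated values $d_2$ = 2.326 and $d_3$ = 0.864, and clips $D_3$ at zero because a range cannot be negative. The function name `r_chart_factors` is hypothetical.

```python
def r_chart_factors(d2: float, d3: float):
    """Return (D3, D4) for the R chart from the constants d2 and d3."""
    ratio = 3.0 * d3 / d2
    d3_factor = max(0.0, 1.0 - ratio)  # lower factor, clipped at zero
    d4_factor = 1.0 + ratio            # upper factor
    return d3_factor, d4_factor


# Tabulated constants for subgroup size n = 5.
D3, D4 = r_chart_factors(d2=2.326, d3=0.864)
print(D3, round(D4, 3))
```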
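Example
$\bar{x}$ Chart:
$UCL = \bar{\bar{x}} + A_2\bar{R} = 1.5056 + (0.577)(0.32521) = 1.69325$
Central line $= \bar{\bar{x}} = 1.5056$
$LCL = \bar{\bar{x}} - A_2\bar{R} = 1.5056 - (0.577)(0.32521) = 1.31795$
R Chart:
$UCL = D_4\bar{R} = (2.114)(0.32521) = 0.68749$
Central line $= \bar{R} = 0.32521$
$LCL = D_3\bar{R} = (0)(0.32521) = 0$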
x 1.50345
R 0.3360
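The limits in this example can be checked with a short Python sketch. It uses the constants $A_2$ = 0.577, $D_3$ = 0 and $D_4$ = 2.114 together with the center-line values 1.5056 and 0.32521 from the example; the function name `xbar_r_limits` is hypothetical.

```python
A2, D3, D4 = 0.577, 0.0, 2.114      # chart constants used in the example
X_DBAR, R_BAR = 1.5056, 0.32521     # grand mean and average range


def xbar_r_limits(x_dbar, r_bar, a2, d3, d4):
    """Return ((LCL, CL, UCL) for the x-bar chart, same for the R chart)."""
    xbar_chart = (x_dbar - a2 * r_bar, x_dbar, x_dbar + a2 * r_bar)
    r_chart = (d3 * r_bar, r_bar, d4 * r_bar)
    return xbar_chart, r_chart


xbar_chart, r_chart = xbar_r_limits(X_DBAR, R_BAR, A2, D3, D4)
print([round(v, 5) for v in xbar_chart])
print([round(v, 5) for v in r_chart])
```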
X-bar chart using MATLAB
>> load parts
>> controlchart(runout,'chart','xbar','sigma','range','rules','we6');
Interpreting Charts:
Observations outside control limits indicate the process
is probably out-of-control
Significant patterns in the observations indicate the
process is probably out-of-control
Random causes will on rare occasions indicate the
process is probably out-of-control when it actually is
not
