QC Minitab PDF
CONTENTS
INDEX
MEET MTB
UGUIDE 1
UGUIDE 2
SC QREF
HOW TO USE
MINITAB
User's Guide 2:
Data Analysis
and Quality Tools
Release 13
for Windows
Windows 95, Windows 98, and Windows NT
February 2000
ISBN 0-925636-44-4
© 2000 by Minitab Inc. All rights reserved.
MINITAB is a U.S. registered trademark of Minitab Inc. Other brands or product names are trademarks or registered
trademarks of their respective holders.
Table of Contents
Welcome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
How to Use this Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Register as a MINITAB User . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Global Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
Customer Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
MINITAB on the Internet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
About the Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Sample Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xviii
Part I Statistics
1 Basic Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1
Basic Statistics Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2
Descriptive Statistics Available for Display or Storage . . . . . . . . . . . . . . . . . . . 1-4
Display Descriptive Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-6
Store Descriptive Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-9
One-Sample Z-Test and Confidence Interval . . . . . . . . . . . . . . . . . . . . . . . . 1-12
One-Sample t-Test and Confidence Interval . . . . . . . . . . . . . . . . . . . . . . . . . 1-15
Two-Sample t-Test and Confidence Interval . . . . . . . . . . . . . . . . . . . . . . . . . 1-18
Paired t-Test and Confidence Interval. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-22
Test and Confidence Interval of a Proportion . . . . . . . . . . . . . . . . . . . . . . . . 1-26
Test and Confidence Interval of Two Proportions . . . . . . . . . . . . . . . . . . . . . 1-30
Test for Equal Variances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-34
Correlation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-37
Covariance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-41
Normality Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-43
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-45
2 Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-1
Regression Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
Regression. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3
Stepwise Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-14
Best Subsets Regression. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-20
Fitted Line Plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-24
Residual Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-27
Logistic Regression Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-29
Binary Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-33
Ordinal Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-44
Nominal Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-51
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-58
5 Nonparametrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-1
Nonparametrics Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-2
One-Sample Sign Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-3
One-Sample Wilcoxon Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-7
Two-Sample Mann-Whitney Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-11
Kruskal-Wallis Test for a One-Way Design. . . . . . . . . . . . . . . . . . . . . . . . . . . 5-13
Mood's Median Test for a One-Way Design . . . . . . . . . . . . . . . . . . . . . . . . . 5-16
Friedman Test for a Randomized Block Design . . . . . . . . . . . . . . . . . . . . . . . 5-18
Runs Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-22
Pairwise Averages. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-24
Pairwise Differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-25
Pairwise Slopes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-26
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-27
6 Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1
Tables Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2
Arrangement of Input Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-3
Cross Tabulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-3
Tally Unique Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-12
Chi-Square Test for Association . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-14
Chi-Square Goodness-of-Fit Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-19
Simple Correspondence Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-21
Multiple Correspondence Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-31
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-36
NP Chart. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-7
C Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-9
U Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-12
Options for Attributes Control Charts . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-14
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-18
INDEX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .I-1
Welcome
How to Use this Guide
This guide is not designed to be read from cover to cover. It is designed to provide you with quick access to the
information you need to complete tasks. If it fails to meet that objective, please let us know in any way you find
convenient, including using the Info form at the back of this book, or sending e-mail to
[email protected].
This guide is half of a two-book set and provides reference information on the following topics:
statistics
quality control
reliability and survival analysis
design of experiments
We provide task-oriented documentation based on using the menus and dialog boxes. We hope you can now easily
learn how to complete the specific task you need to accomplish. We welcome your comments.
See Documentation for MINITAB for Windows, Release 13 on page iii for information about the entire documentation
set for this product.
Assumptions
This guide assumes that you know the basics of using your operating system (such as Windows 95, Windows 98, or
Windows NT). This includes using menus, dialog boxes, a mouse, and moving and resizing windows. If you are not
familiar with these operations, see your operating system documentation.
Global Support
Minitab Inc. and its international subsidiaries and partners provide sales and support services to Minitab customers
throughout the world. Please refer to the International Partners Card included in your software product box. You can
also access the most up-to-date international partner information via our web site at
http://www.minitab.com.
Customer Support
For technical help, contact your central computing support group if one exists. You may also be eligible to receive
customer support from your distributor, or from Minitab Inc., Minitab Ltd., or Minitab SARL directly, subject to the
terms and conditions of your License Agreement. Eligible users may contact their distributor, Minitab Ltd., Minitab
SARL, or Minitab Inc. (phone 814-231-2MTB (2682), fax 814-238-4383, or send e-mail through our web site at
http://www.minitab.com/contacts). Technical support at Minitab Inc. is available Monday through Friday, between the
hours of 9:00 a.m. and 5:00 p.m. Eastern time. When you call for technical support, it helps to be
at your computer. Please have your serial and software version numbers handy (from the Help > About MINITAB
screen), along with a detailed description of the problem.
Troubleshooting information is provided in a file called ReadMe.txt, installed in the main MINITAB directory, and in
Help under the topics Troubleshooting and How Do I. You can also visit the Support section of our web site at
http://www.minitab.com/support.
To order from Minitab Inc. from within the U.S. or Canada call: 800-448-3555. Additional contact information for
Minitab Inc., Minitab Ltd., and Minitab SARL is given on the back cover of this book.
Related Documentation
Companion Text List, 1996, Minitab Inc., State College, PA. More than 300 textbooks, textbook supplements, and
other related teaching materials that include MINITAB are featured in the Companion Text List. For a complete
bibliography, the Companion Text List is available online at http://www.minitab.com.
MINITAB Handbook, Third Edition, 1994, Barbara F. Ryan, and Brian L. Joiner, Duxbury Press, Belmont, CA. A
supplementary text that teaches basic statistics using MINITAB. The Handbook features the creative use of plots,
application of standard statistical methods to real data, in-depth exploration of data, simulation as a learning tool,
screening data for errors, manipulating data, transformation of data, and performing multiple regressions. Please
contact your bookstore, Minitab Inc., or Duxbury Press to order this book.
Click OK.
Enter Pulse1.
Examples
We have designed the examples in the guides so you can follow along and duplicate the results. Here is an example
with both Session window and Graph window output:
Example of displaying descriptive statistics
You want to examine characteristics of the height (in inches) of male (Sex = 1) and female (Sex = 2) students who
participated in the pulse study. You choose to display descriptive statistics with the option of a boxplot of the data.
1 Open the worksheet PULSE.MTW.
2 Choose Stat > Basic Statistics > Display Descriptive Statistics.
3 In Variables, enter Height. Check By variable and enter Sex in the text box.
4 Click Graphs. Check Boxplot of data. Click OK in each dialog box.
Variable   Sex    N     Mean   Median   TrMean   StDev
Height     1     57   70.754   71.000   70.784   2.583
           2     35   65.400   65.500   65.395   2.563

Variable   Sex   SE Mean   Minimum   Maximum       Q1       Q3
Height     1       0.342    66.000    75.000   69.000   73.000
           2       0.433    61.000    70.000   63.000   68.000

Graph window output
Basic Statistics
Correlation, 1-36
Covariance, 1-40
Chapter 1
Basic Statistics Overview
hypothesis tests and confidence intervals for a proportion or the difference in proportions
measuring association
Display Descriptive Statistics produces descriptive statistics for each column or subset within
a column. You can print the statistics in the Session window and/or display them in a graph.
Store Descriptive Statistics stores descriptive statistics for each column or subset within a
column.
For a list of descriptive statistics available for display or storage see page 1-4. To calculate
descriptive statistics individually and store them as constants, see the Calculations chapter in
MINITAB User's Guide 1.
1-Sample Z computes a confidence interval or performs a hypothesis test of the mean when
the population standard deviation, σ, is known. This procedure is based upon the normal
distribution, so for small samples, this procedure works best if your data were drawn from a
normal distribution or one that is close to normal. From the Central Limit Theorem, you may
use this procedure if you have a large sample, substituting the sample standard deviation for σ.
A common rule of thumb is to consider samples of size 30 or higher to be large samples. Many
analysts choose the t-procedure over the Z-procedure whenever σ is unknown.
1-Sample t computes a confidence interval or performs a hypothesis test of the mean when
σ is unknown. This procedure is based upon the t-distribution, which is derived from a normal
distribution with unknown σ. For small samples, this procedure works best if your data were
drawn from a distribution that is normal or close to normal. This procedure is more
conservative than the Z-procedure and should always be chosen over the Z-procedure with
small sample sizes and an unknown σ. Many analysts choose the t-procedure over the
Z-procedure anytime σ is unknown. According to the Central Limit Theorem, you can have
increasing confidence in the results of this procedure as sample size increases, because the
distribution of the sample mean becomes more like a normal distribution.
2-Sample t computes a confidence interval and performs a hypothesis test of the difference
between two population means when σ's are unknown and samples are drawn independently
from each other. This procedure is based upon the t-distribution, and for small samples it
works best if data were drawn from distributions that are normal or close to normal. You can
have increasing confidence in the results as the sample sizes increase.
Paired t computes a confidence interval and performs a hypothesis test of the difference
between two population means when observations are paired. When data are paired, as with
before-and-after measurements, the paired t-procedure results in a smaller variance and
greater power of detecting differences than would the above 2-sample t-procedure, which
assumes that the samples were independently drawn.
2 Proportions computes a confidence interval and performs a hypothesis test of the difference
between two population proportions.
2 Variances computes a confidence interval and performs a hypothesis test for the equality, or
homogeneity, of variance of two samples.
Measures of association
Correlation calculates the Pearson product moment coefficient of correlation (also called the
correlation coefficient or correlation) for pairs of variables. The correlation coefficient is a
measure of the degree of linear relationship between two variables. You can obtain a p-value
to test if there is sufficient evidence that the correlation coefficient is not zero.
By using a combination of MINITAB commands, you can also compute Spearman's
correlation and a partial correlation coefficient. Spearman's correlation is simply the
correlation computed on the ranks of the two samples. A partial correlation coefficient is the
correlation coefficient between two variables while adjusting for the effects of other variables.
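The two definitions above can be sketched outside MINITAB in a few lines of Python. This is only an illustration, not the MINITAB command sequence: the function names are hypothetical, tied values are assumed to share the average of their ranks, and the partial correlation shown adjusts for a single third variable using the standard first-order formula.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks (assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_correlation(x, y, z):
    """Correlation of x and y adjusted for one other variable z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

For a monotone but nonlinear relationship such as y = x squared on positive x, `spearman` returns 1 while `pearson` stays below 1, which is the practical difference between the two measures.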
Covariance calculates the covariance for pairs of variables. The covariance is a measure of the
relationship between two variables but it has not been standardized, as is done with the
correlation coefficient, by dividing by the standard deviation of both variables.
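A short Python sketch may help show what the standardization does. It assumes the usual sample covariance with an n − 1 divisor, which this overview does not state explicitly, and the data values are invented for illustration. Rescaling one variable changes the covariance but leaves the correlation unchanged.

```python
from math import sqrt


def covariance(x, y):
    """Sample covariance with an n - 1 divisor (assumed convention)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Covariance divided by the standard deviations of both variables."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


# Invented data: the second x column is the first one in different units.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
x_rescaled = [10.0 * value for value in x]

cov_original = covariance(x, y)
cov_rescaled = covariance(x_rescaled, y)    # ten times cov_original
corr_original = correlation(x, y)
corr_rescaled = correlation(x_rescaled, y)  # same as corr_original
```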
Distribution test
Normality Test generates a normal probability plot and performs a hypothesis test to examine
whether or not the observations follow a normal distribution. Some statistical procedures, such
as a Z- or t-test, assume that the samples were drawn from a normal distribution. Use this
procedure to test the normality assumption.
Statistic (table columns: Session window, Graphical summary, Store)
Number of nonmissing values
Total number
Cumulative number
Percent
Cumulative percent
Mean
Trimmed mean
Standard deviation
Variance
Sum
Minimum
Maximum
Range
Median
Interquartile range
Sums of squares
Skewness
Kurtosis
MSSD
Calculations
Trimmed Mean. To calculate the trimmed mean (TrMean), MINITAB removes the smallest 5%
and the largest 5% of the values (rounded to the nearest integer), and then averages the
remaining data.
Standard Error of Mean. Calculated as $\mathrm{StDev} / \sqrt{N}$.
Standard Deviation. If the column contains $x_1, x_2, \ldots, x_n$, with mean $\bar{x}$, then
standard deviation = $\sqrt{\sum (x_i - \bar{x})^2 / (n - 1)}$
$\dfrac{(n - 1)s^2}{\chi^2_{n-1,\,1-\alpha/2}}$ to $\dfrac{(n - 1)s^2}{\chi^2_{n-1,\,\alpha/2}}$
Display Descriptive Statistics
Median. If sample size is odd, the median is the (n+1) / 2th ordered value. If sample size is even,
the median is the mean of the two middle ordered values.
Confidence Interval for Median. Uses one-sample sign confidence interval described on page
5-3.
Quartiles. To calculate quartiles, MINITAB orders the data from smallest to largest. The first
quartile (Q1) is the observation at position (n + 1) / 4, and the third quartile (Q3) is the
observation at position 3(n + 1) / 4, where n is the number of observations. If the position is not an
integer, interpolation is used.
Sums of Squares. This is the uncorrected sum of squares, or the sum of squared data values.
Skewness. This is a measure of distribution asymmetry or the tendency of one tail to be heavier
than the other. A negative value indicates skewness to the left and a positive value indicates
skewness to the right, though a value of zero does not necessarily indicate symmetry. Skewness is
calculated as
$\dfrac{n}{(n - 1)(n - 2)} \sum \left( \dfrac{x_i - \bar{x}}{s} \right)^3$
Kurtosis. This is a measure of how different a distribution is from the normal distribution. A
positive value typically indicates that the distribution has a sharper peak, thinner shoulders, and
fatter tails than the normal distribution. A negative value means that a distribution has a flatter
peak, fatter shoulders, and thinner tails than the normal distribution. Kurtosis is calculated as
$\dfrac{n(n + 1)}{(n - 1)(n - 2)(n - 3)} \sum \left( \dfrac{x_i - \bar{x}}{s} \right)^4 - \dfrac{3(n - 1)^2}{(n - 2)(n - 3)}$
MSSD. This is half the Mean of Successive Squared Differences. For example, if the data are 1,
2, 4, 10, successive differences are 1, 2, 6, and the MSSD is
(mean of 1², 2², 6²) / 2, or 6.833
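The calculation rules described in prose above (trimmed mean, standard error of the mean, quartiles, and MSSD) can be sketched in Python as follows. This is an illustrative reading of those rules, not MINITAB's own code: the function names are hypothetical, halves are assumed to round up when counting the values to trim, and quartile positions that fall outside the data are assumed to be clamped to the first or last observation.

```python
from math import floor, sqrt
from statistics import mean, stdev


def trimmed_mean(data):
    """Drop the smallest 5% and largest 5% of values, then average the rest."""
    x = sorted(data)
    # Number trimmed from each end, rounded to the nearest integer
    # (rounding halves up is an assumption).
    k = int(0.05 * len(x) + 0.5)
    return mean(x[k:len(x) - k])


def se_mean(data):
    """Standard error of the mean: StDev divided by the square root of N."""
    return stdev(data) / sqrt(len(data))


def value_at_position(sorted_x, position):
    """Value at a 1-based position, interpolating between neighbors."""
    n = len(sorted_x)
    position = min(max(position, 1.0), float(n))  # clamping is an assumption
    lower = floor(position)
    fraction = position - lower
    if fraction == 0:
        return sorted_x[lower - 1]
    return sorted_x[lower - 1] + fraction * (sorted_x[lower] - sorted_x[lower - 1])


def quartiles(data):
    """Q1 at position (n + 1) / 4 and Q3 at position 3(n + 1) / 4."""
    x = sorted(data)
    n = len(x)
    return value_at_position(x, (n + 1) / 4), value_at_position(x, 3 * (n + 1) / 4)


def mssd(data):
    """Half the mean of the squared successive differences."""
    squared = [(b - a) ** 2 for a, b in zip(data, data[1:])]
    return mean(squared) / 2
```

With the data 1, 2, 4, 10 from the MSSD example, `mssd` reproduces the value 6.833, and `quartiles` interpolates at positions 1.25 and 3.75.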
Data
The data columns must be numeric. The optional grouping column (also called a By column)
can be numeric, text, or date/time and must be the same length as the data columns. If you wish
to change the order in which text categories are processed from their default alphabetical order,
you can define your own order (see Ordering Text Categories in the Manipulating Data chapter
in MINITAB User's Guide 1).
MINITAB automatically omits missing data from the calculations.
2 In Variables, enter the column(s) containing the data you want to describe.
3 If you like, use one or more of the options listed below, then click OK.
Options
Display Descriptive Statistics dialog box
generate a histogram, a histogram with a normal curve, a dotplot, or a boxplot of the data in
separate Graph windows.
display statistics in a single graphical summary. You can specify the confidence level for the
displayed confidence intervals. The default level is 95%.
There is no restriction on the number of columns or levels when producing output in the Session
window.
Tip
If you exceed the maximum number of graphs because of the number of levels of your By
variable, you can decrease the number of graphs by unstacking your data and displaying
descriptive statistics for data subsets. See the Manipulating Data chapter in MINITAB User's
Guide 1 for more information.
You want to compare the height (in inches) of male (Sex = 1) and female (Sex = 2) students who
participated in the pulse study. You choose to display a boxplot of the data.
1 Open the worksheet PULSE.MTW.
2 Choose Stat > Basic Statistics > Display Descriptive Statistics.
3 In Variables, enter Height. Check By variable and enter Sex in the text box.
4 Click Graphs. Check Boxplot of data. Click OK in each dialog box.
Session window output

Variable   Sex    N     Mean   Median   TrMean   StDev
Height     1     57   70.754   71.000   70.784   2.583
           2     35   65.400   65.500   65.395   2.563

Variable   Sex   SE Mean   Minimum   Maximum       Q1       Q3
Height     1       0.342    66.000    75.000   69.000   73.000
           2       0.433    61.000    70.000   63.000   68.000

Graph window output
Data
The data columns must be numeric. The optional grouping column (also called a By column)
can be numeric, text, or date/time and must be the same length as the data columns. If you wish
to change the order in which text categories are processed from their default alphabetical order,
you can define your own order (see Ordering Text Categories in the Manipulating Data chapter
in MINITAB User's Guide 1).
MINITAB automatically omits missing data from the calculations.
To store descriptive statistics
1 Choose Stat > Basic Statistics > Store Descriptive Statistics.
2 In Variables, enter the column(s) containing the data you want to describe.
3 If you like, use one or more of the options listed below, then click OK.
Options
Store Descriptive Statistics dialog box
select the statistics that you wish to store. The defaults are sample mean and sample size
(nonmissing).
Store Descriptive Statistics
store a row of output for each row of input. By default, MINITAB stores the requested statistics
at the top of the worksheet only. If you check Store a row of output for each row of input,
MINITAB will append the appropriate statistics to each row of input data.
store statistics for empty cells (default); see Storing Descriptive Statistics on page 1-10
store the distinct values of the By variables (default); see Storing Descriptive Statistics on page
1-10
Material   Width
A          3.04
A          3.06
A          3.07
A          3.01
A          2.94
A          2.98
B          3.02
B          3.00
B          3.03
B          3.02
B          3.02
B          3.01
B          3.04

ByVar1   ByVar2   Mean1     N1
1        A        3.05667   3
1        B        *         0
2        A        2.97667   3
2        B        3.01500   2
3        A        *         0
3        B        3.01667   3
ByVar2   Mean1     N1
.        .         .
.        .         .
.        .         .
A        *         0
B        3.03000   2
Data
Enter each sample in a single numeric column. You can generate a hypothesis test or confidence
interval for more than one column at a time.
MINITAB automatically omits missing data from the calculations.
To do a Z-test and confidence interval of the mean
1 Choose Stat > Basic Statistics > 1-Sample Z.
Options
1-Sample Z dialog box
to perform a hypothesis test, specify a null hypothesized test value in Test mean.
specify a confidence level for the confidence interval. The default is 95%.
define the alternative hypothesis by choosing less than (lower-tailed), not equal (two-tailed), or
greater than (upper-tailed). The default is a two-tailed test.
Note that if you choose a lower-tailed or an upper-tailed hypothesis test, an upper or lower
confidence bound will be constructed, respectively, rather than a confidence interval.
display a histogram, dotplot, and boxplot for each column. The graphs show the sample mean
and a confidence interval (or bound) for the mean. When you do a hypothesis test, the graphs
also show the null hypothesis test value.
Method
Confidence interval
The confidence interval is calculated as
$\bar{x} - z_{\alpha/2}(\sigma / \sqrt{n})$ to $\bar{x} + z_{\alpha/2}(\sigma / \sqrt{n})$
where $\bar{x}$ is the mean of the data, σ is the population standard deviation, n is the sample size, and
$z_{\alpha/2}$ is the value from the normal table where α is 1 − confidence level / 100.
Note that the appropriate confidence bound is constructed in a similar fashion with α/2 replaced
by α. Then the lower bound is the sample mean minus the error margin and the upper bound is
the sample mean plus the error margin.
You can specify a confidence level by entering any number between 1 and 100 in Level. The
confidence level is 95% by default.
Hypothesis test
MINITAB calculates the test statistic by
$Z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
where $\bar{x}$ is the mean of the data, $\mu_0$ is the hypothesized population mean, σ is the population
standard deviation, and n is the sample size.
MINITAB performs a two-tailed test unless you specify a one-tailed test.
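As a cross-check on the method, the interval and test statistic can be recomputed from summary values in Python. The sketch below is not MINITAB's implementation. The function name is hypothetical, the two-sided p-value formula is the standard normal-tail calculation, and the inputs are the summary figures reported in this section's one-sample Z example (n = 9, sample mean 4.7889, σ = 0.2, test mean 5, 90% confidence level).

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, sigma, n, test_mean, conf_level=95.0):
    """Two-sided Z confidence interval and test from summary statistics."""
    standard_normal = NormalDist()
    se = sigma / sqrt(n)                      # standard error of the mean
    alpha = 1 - conf_level / 100
    z_crit = standard_normal.inv_cdf(1 - alpha / 2)
    interval = (sample_mean - z_crit * se, sample_mean + z_crit * se)
    z = (sample_mean - test_mean) / se        # test statistic
    p_value = 2 * (1 - standard_normal.cdf(abs(z)))
    return se, interval, z, p_value


se, interval, z, p_value = one_sample_z(4.7889, 0.2, 9, 5.0, conf_level=90.0)
print(f"SE Mean = {se:.4f}")
print(f"90% CI  = ({interval[0]:.4f}, {interval[1]:.4f})")
print(f"Z = {z:.2f}, P = {p_value:.3f}")
```

Because the inputs are rounded summary values rather than the raw worksheet data, the last printed digit of an interval endpoint can differ slightly from the Session window output.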
Example of one-sample Z-test and confidence interval
Measurements were made on nine widgets. You know that the distribution of measurements has
historically been close to normal with σ = 0.2. Since you know σ, and you wish to test if the
population mean is 5 and obtain a 90% confidence interval for the mean, you use the
Z-procedure.
1 Open the worksheet EXH_STAT.MTW.
2 Choose Stat > Basic Statistics > 1-Sample Z.
3 In Variables, enter Values.
4 In Sigma, enter 0.2.
Session window output

One-Sample Z: Values

Test of mu = 5 vs mu not = 5
The assumed sigma = 0.2

Variable   N   Mean     StDev    SE Mean
Values     9   4.7889   0.2472   0.0667

Variable   90.0% CI               Z       P
Values     ( 4.6792, 4.8985)    -3.17   0.002

Graph window output
Data
Enter each sample in a single numeric column. You can generate a hypothesis test or confidence
interval for more than one column at a time.
MINITAB automatically omits missing data from the calculations.
To compute a t-test and confidence interval of the mean
1 Choose Stat > Basic Statistics > 1-Sample t.
Options
1-Sample t dialog box
perform a hypothesis test by specifying a null hypothesized test value in Test mean.
specify a confidence level for the confidence interval. The default is 95%.
define the alternative hypothesis by choosing less than (lower-tailed), not equal (two-tailed),
or greater than (upper-tailed). The default is a two-tailed test.
Note that if you choose a lower-tailed or an upper-tailed hypothesis test, an upper or lower
confidence bound will be constructed, respectively, rather than a confidence interval.
display a histogram, dotplot, and boxplot for each column. The graphs show the sample mean
and a confidence interval (or bound) for the mean. In addition, the null hypothesis test value is
displayed when you do a hypothesis test.
Method
Confidence interval
The confidence interval is calculated as
$\bar{x} - t_{\alpha/2}(s / \sqrt{n})$ to $\bar{x} + t_{\alpha/2}(s / \sqrt{n})$
where $\bar{x}$ is the mean of the data, s is the sample standard deviation, n is the sample size, and $t_{\alpha/2}$
is the value from a t-distribution table where α is 1 − confidence level / 100 and degrees of
freedom are (n − 1).
Note that the appropriate confidence bound is constructed in a similar fashion with α/2 replaced
by α. Then the lower bound is the sample mean minus the error margin and the upper bound is
the sample mean plus the error margin.
You can specify a confidence level by entering any number between 1 and 100 in Confidence
level. The confidence level is 95% by default.
Hypothesis test
MINITAB calculates the test statistic by
$t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$
where $\bar{x}$ is the mean of the data, $\mu_0$ is the hypothesized population mean, s is the sample
standard deviation, and n is the sample size.
MINITAB performs a two-tailed test unless you specify a one-tailed test.
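The same kind of cross-check works for the t-procedure. The Python sketch below is illustrative only and uses a hypothetical function name. Python's standard library has no t-distribution, so the critical value is passed in by the caller; the value 1.860 used here is the usual t-table entry for 8 degrees of freedom and an upper-tail probability of 0.05, and the p-value is not computed. The inputs are the summary figures from this section's one-sample t example (n = 9, mean 4.7889, s = 0.2472, test mean 5, 90% confidence level).

```python
from math import sqrt


def one_sample_t(sample_mean, s, n, test_mean, t_crit):
    """Two-sided t confidence interval and test statistic from summary values.

    t_crit is the table value t(alpha/2) with n - 1 degrees of freedom,
    supplied by the caller.
    """
    se = s / sqrt(n)                       # standard error of the mean
    margin = t_crit * se                   # error margin
    interval = (sample_mean - margin, sample_mean + margin)
    t = (sample_mean - test_mean) / se     # test statistic
    return se, interval, t


# 1.860: t-table value for 8 degrees of freedom, upper-tail probability 0.05.
se_t, interval_t, t_stat = one_sample_t(4.7889, 0.2472, 9, 5.0, t_crit=1.860)
print(f"SE Mean = {se_t:.4f}")
print(f"90% CI  = ({interval_t[0]:.4f}, {interval_t[1]:.4f})")
print(f"T = {t_stat:.2f}")
```

Comparing this with the Z calculation on the same data shows why the t interval is wider: the sample standard deviation (0.2472) replaces the assumed σ (0.2), and the t critical value exceeds the normal one.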
Example of a one-sample t-test and confidence interval
Measurements were made on nine widgets. You know that the distribution of widget
measurements has historically been close to normal, but suppose that you do not know σ. To test
if the population mean is 5 and to obtain a 90% confidence interval for the mean, you use a
t-procedure.
1 Open the worksheet EXH_STAT.MTW.
2 Choose Stat ➤ Basic Statistics ➤ 1-Sample t.
3 In Variables, enter Values.
4 In Test mean, enter 5.
5 Click Options. In Confidence level enter 90. Click OK in each dialog box.
Session window output

One-Sample T: Values

Test of mu = 5 vs mu not = 5

Variable    N     Mean    StDev   SE Mean
Values      9   4.7889   0.2472    0.0824

Variable          90.0% CI             T       P
Values      ( 4.6357,  4.9421)     -2.56   0.034
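The figures in this output can be checked by hand from the formulas under Method. The following Python sketch is illustrative only and is not MINITAB code: the function name `one_sample_t` is hypothetical, and the critical value 1.860 (the upper 5% point of a t-distribution with 8 degrees of freedom) is assumed to be read from a standard t table rather than computed.

```python
import math

def one_sample_t(mean, stdev, n, mu0, t_crit):
    """Return (se_mean, t, ci_low, ci_high) from summary statistics.

    t_crit is the tabulated t(alpha/2) value with n - 1 degrees of
    freedom; the caller supplies it because the Python standard
    library has no t-distribution.
    """
    se_mean = stdev / math.sqrt(n)      # s / sqrt(n)
    t = (mean - mu0) / se_mean          # test statistic
    margin = t_crit * se_mean           # error margin
    return se_mean, t, mean - margin, mean + margin

# Summary values from the widget example; 1.860 is an assumed table
# lookup for a 90% two-sided interval with 8 degrees of freedom.
se_mean, t, low, high = one_sample_t(4.7889, 0.2472, 9, 5.0, 1.860)
print(f"SE Mean = {se_mean:.4f}, T = {t:.2f}, 90% CI = ({low:.3f}, {high:.3f})")
```

The results agree with the session output up to rounding of the tabulated critical value.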
H0: μ1 - μ2 = δ0 versus H1: μ1 - μ2 ≠ δ0
where μ1 and μ2 are the population means and δ0 is the hypothesized difference between the two
population means.
Data
Data can be entered in one of two ways:
both samples in a single numeric column with another grouping column (called subscripts)
to identify the population. The grouping column may be numeric, text, or date/time.
The sample sizes do not need to be equal. MINITAB automatically omits missing data from the
calculations.
3 If you like, use one or more of the options listed below, and click OK.
Options
2-Sample t dialog box
assume that the populations have equal variances. The default is to assume unequal
variances; see Equal or unequal variances on page 1-19.
specify a confidence level for the confidence interval. The default is 95%.
specify a null hypothesized test value in Test mean to perform a hypothesis test. The default is
zero, or that the two population means are equal.
define the alternative hypothesis by choosing less than (lower-tailed), not equal (two-tailed), or
greater than (upper-tailed). The default is a two-tailed test.
Note that if you choose a lower-tailed or an upper-tailed hypothesis test, an upper or lower
confidence bound will be constructed, respectively, rather than a confidence interval.
display a dotplot or boxplot of each sample in the same graph. The graphs also display the
sample means.
Method
Confidence interval
The confidence interval is calculated as
$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2}\,s \quad\text{to}\quad (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2}\,s$$
where $t_{\alpha/2}$ is the value from a t-distribution table where $\alpha$ is 1 - confidence level / 100. The sample
standard deviation, $s$, of $\bar{x}_1 - \bar{x}_2$ and the degrees of freedom depend upon the variance
assumption.
When a one-tailed test is specified, the appropriate confidence bound is constructed in a similar
fashion with $\alpha/2$ replaced by $\alpha$. Then the lower bound is the difference in sample means minus the
error margin and the upper bound is the difference in sample means plus the error margin.
You can specify a confidence level of any number between 1 and 100 in Confidence level. The
confidence level is 95% by default.
Hypothesis test
MINITAB calculates the test statistic, t, by
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - \delta_0}{s}$$
The sample standard deviation, $s$, of $\bar{x}_1 - \bar{x}_2$ depends upon the variance assumption. Recall that
$\delta_0$ is the hypothesized difference between the two population means.
Standard deviations
When you assume unequal variances, the sample standard deviation of $\bar{x}_1 - \bar{x}_2$ is
$$s = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
and the degrees of freedom are
$$df = \frac{(VAR_1 + VAR_2)^2}{\dfrac{(VAR_1)^2}{n_1 - 1} + \dfrac{(VAR_2)^2}{n_2 - 1}}$$
where $VAR_1 = s_1^2/n_1$ and $VAR_2 = s_2^2/n_2$. MINITAB truncates the degrees of freedom to an integer,
if necessary. This is a more conservative approach than rounding.
When you assume equal variances, the pooled sample standard deviation is
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
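As a rough illustration of the two variance assumptions, the sketch below evaluates the formulas above from summary statistics. It is not MINITAB code; the function names and the input values are illustrative assumptions, and `math.floor` stands in for the truncation of the degrees of freedom.

```python
import math

def unequal_variance_s_df(s1, n1, s2, n2):
    """Standard deviation of (xbar1 - xbar2) and truncated degrees of freedom."""
    var1 = s1 ** 2 / n1
    var2 = s2 ** 2 / n2
    s = math.sqrt(var1 + var2)
    df = (var1 + var2) ** 2 / (var1 ** 2 / (n1 - 1) + var2 ** 2 / (n2 - 1))
    return s, math.floor(df)    # truncate rather than round

def pooled_stdev(s1, n1, s2, n2):
    """Pooled sample standard deviation under the equal-variance assumption."""
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

# Illustrative summary statistics (assumed values, sample sizes 40 and 50).
s1, n1, s2, n2 = 3.02, 40, 2.77, 50
s_unequal, df = unequal_variance_s_df(s1, n1, s2, n2)
sp = pooled_stdev(s1, n1, s2, n2)
print(f"unequal variances: s = {s_unequal:.4f}, df = {df}")
print(f"equal variances:   sp = {sp:.4f}")
```

With unequal variances the degrees of freedom fall below n1 + n2 - 2 = 88, which is why truncating rather than rounding gives the more conservative test.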
A study was performed in order to evaluate the effectiveness of two devices for improving the
efficiency of gas home-heating systems. Energy consumption in houses was measured after one of
the two devices was installed. The two devices were an electric vent damper (Damper = 1) and a
thermally activated vent damper (Damper = 2). The energy consumption data (BTU.In) are
stacked in one column with a grouping column (Damper) containing identifiers or subscripts to
denote the population. Previously, you performed a variance test and found no evidence for
variances being unequal (see Example of a test for equal variances on page 1-34). Now you want to
compare the effectiveness of these two devices by determining whether or not there is any
evidence that the difference between the devices is different from zero.
1 Open the worksheet FURNACE.MTW.
2 Choose Stat ➤ Basic Statistics ➤ 2-Sample t.
3 Choose Samples in one column.
4 In Samples, enter 'BTU.In'.
5 In Subscripts, enter Damper.
6 Check Assume equal variances. Click OK.
Session window output

Damper     N    Mean   StDev   SE Mean
1         40    9.91    3.02      0.48
2         50   10.14    2.77      0.39
Data
The data from each sample must be in separate numeric columns of equal length. Each row
contains the paired measurements for an observation. If either measurement in a row is missing,
MINITAB automatically omits that row from the calculations.
To compute a paired t-test and confidence interval
1 Choose Stat ➤ Basic Statistics ➤ Paired t.
Options
Graphs subdialog box
display a histogram, dotplot, and boxplot of the paired differences. The graphs show the
sample mean of the differences and a confidence interval (or bound) for the mean of the
differences. In addition, the null hypothesis test value is displayed when you do a hypothesis
test.
specify a confidence level for the confidence interval. The default is 95%.
define the alternative hypothesis by choosing less than (lower-tailed), not equal (two-tailed), or
greater than (upper-tailed). The default is a two-tailed test.
Note that if you choose a lower-tailed or an upper-tailed hypothesis test, an upper or lower
confidence bound will be constructed, respectively, rather than a confidence interval.
Method
Confidence interval
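For a two-tailed test, the confidence interval is calculated as
$$\bar{d} - t_{\alpha/2}\,(s_d/\sqrt{n}) \quad\text{to}\quad \bar{d} + t_{\alpha/2}\,(s_d/\sqrt{n})$$
where:
$\bar{d}$ is the mean of the paired differences
$t_{\alpha/2}$ is the value from a t-distribution table with $n - 1$ degrees of freedom, where $\alpha$ is 1 - confidence level / 100
$s_d$ is the standard deviation of the paired differences
Note that the appropriate confidence bound is constructed in a similar fashion with $\alpha/2$ replaced
by $\alpha$. Then the lower bound is the mean difference minus the error margin and the upper bound is
the mean difference plus the error margin.
The standard deviation of the differences is calculated by:
$$s_d = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}}$$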
You can specify a confidence level of any number between 1 and 100. The confidence level is
95% by default.
Hypothesis test
MINITAB calculates the test statistic, t, by:
$$t = \frac{\bar{d} - \mu_0}{s_d/\sqrt{n}}$$
When $\mu_0$ is not specified in Test mean, $\mu_0 = 0$ is used. MINITAB performs a two-tailed test unless
you specify a one-tailed test.
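The paired calculation reduces to a one-sample procedure on the row-by-row differences. The sketch below is a minimal illustration under stated assumptions, not MINITAB's implementation: `paired_t` is a hypothetical name, `None` is assumed to mark a missing value, and the measurements are invented for the demonstration.

```python
import math
import statistics

def paired_t(first, second, mu0=0.0):
    """Return (n, dbar, sd, se, t) for paired samples.

    Rows where either measurement is missing (None) are omitted,
    as the Data description for paired samples requires.
    """
    diffs = [a - b for a, b in zip(first, second)
             if a is not None and b is not None]
    n = len(diffs)
    dbar = statistics.mean(diffs)
    sd = statistics.stdev(diffs)        # divides by n - 1
    se = sd / math.sqrt(n)
    t = (dbar - mu0) / se
    return n, dbar, sd, se, t

# Hypothetical measurements; the third pair has a missing value.
first = [10.0, 12.0, None, 9.0, 11.0]
second = [11.0, 14.0, 13.0, 10.0, 11.0]
n, dbar, sd, se, t = paired_t(first, second)
print(f"N = {n}, Mean = {dbar:.3f}, StDev = {sd:.3f}, "
      f"SE Mean = {se:.3f}, T = {t:.2f}")
```

Only four of the five rows contribute, because the incomplete pair is dropped before the differences are formed.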
Example of a test and confidence interval for paired data
A shoe company wants to compare two materials, A and B, for use on the soles of boys' shoes. In
this example, each of ten boys in a study wore a special pair of shoes with the sole of one shoe
made from Material A and the sole on the other shoe made from Material B. The sole types were
randomly assigned to account for systematic differences in wear between the left and right foot.
After three months, the shoes are measured for wear.
For these data, you would use a paired design rather than an unpaired design. A paired
t-procedure would probably have a smaller error term than the corresponding unpaired
procedure because it removes variability that is due to differences between the pairs. For example,
one boy may live in the city and walk on pavement most of the day, while another boy may live in
the country and spend much of his day on unpaved surfaces.
1 Open the worksheet EXH_STAT.MTW.
2 Choose Stat ➤ Basic Statistics ➤ Paired t.
3 In First Sample, enter Mat-A. In Second Sample, enter Mat-B. Click OK.
Session window output

              N      Mean    StDev   SE Mean
Mat-A        10    10.630    2.451     0.775
Mat-B        10    11.040    2.518     0.796
Difference   10    -0.410    0.387     0.122

P-Value = 0.009
Data
You can have data in two forms: raw or summarized.
Raw data
Enter each sample in a numeric, text, or date/time column in your worksheet. Columns must be
all of the same type. Each column contains both the success and failure data for that sample.
Successes and failures are determined by numeric or alphabetical order. MINITAB defines the
lowest value as the failure; the highest value as the success. For example:
for the numeric column entries of 20 and 40, observations of 20 are considered failures;
observations of 40 are considered successes.
for the text column entries of alpha and omega, observations of alpha are considered
failures; observations of omega are considered successes. If the data entries are red and
yellow, observations of red are considered failures; observations of yellow are considered
successes.
You can reverse the definition of success and failure in a text column by applying a value
order (see Ordering Text Categories in the Manipulating Data chapter of MINITAB User's
Guide 1).
With raw data, you can generate a hypothesis test or confidence interval for more than one
column at a time. When you enter more than one column, MINITAB performs a separate
analysis for each column.
MINITAB omits missing data from the calculations.
Summarized data
Enter the number of trials and one or more values for the number of successes directly in the 1
Proportion dialog box. When you enter more than one success value, MINITAB performs a
separate analysis for each one.
If you have raw data, choose Samples in columns, and enter the columns containing the
raw data.
4 If you like, use one or more of the options listed below, and click OK.
Options
Options subdialog box
specify a confidence level for the confidence interval. The default is 95%.
define the alternative hypothesis by choosing less than (lower-tailed), not equal (two-tailed), or
greater than (upper-tailed). The default is a two-tailed test.
Note that if you choose a lower-tailed or an upper-tailed hypothesis test, an upper or lower
confidence bound will be constructed, respectively, rather than a confidence interval.
use a normal approximation rather than the exact test for both the hypothesis test and
confidence interval; see Method on page 1-27.
Method
Confidence interval
By default, MINITAB uses an exact method [5] to calculate the confidence interval limits (pL, pU):
Lower limit ($p_L$):
$$p_L = \frac{\nu_1 F}{\nu_2 + \nu_1 F}$$
where:
$\nu_1 = 2x$
$\nu_2 = 2(n - x + 1)$
$x$ = number of successes
$n$ = number of trials
$F$ = lower $\alpha/2$ point of F with $\nu_1$ and $\nu_2$ degrees of freedom
Upper limit ($p_U$):
$$p_U = \frac{\nu_1 F}{\nu_2 + \nu_1 F}$$
where:
$\nu_1 = 2(x + 1)$
$\nu_2 = 2(n - x)$
$x$ = number of successes
$n$ = number of trials
$F$ = upper $\alpha/2$ point of F with $\nu_1$ and $\nu_2$ degrees of freedom
If you choose to use a normal approximation, MINITAB calculates the confidence interval as:
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
where:
$\hat{p}$ is the observed probability, $\hat{p} = x / n$, where $x$ is the
observed number of successes in $n$ trials
$z_{\alpha/2}$ is the value from the z-distribution where $\alpha$ is
1 - confidence level / 100
$n$ is the number of trials
Note that the appropriate confidence bound is constructed in a similar fashion with $\alpha/2$ replaced
by $\alpha$. Then the lower bound is the sample proportion minus the error margin and the upper bound is
the sample proportion plus the error margin.
You can specify a confidence level of any number between 1 and 100 in Confidence level. The
confidence level is 95% by default.
Hypothesis test
By default, MINITAB uses an exact method to calculate the test probability. If you choose to use a
normal approximation, MINITAB calculates the test statistic (Z) as:
$$Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$$
where:
$\hat{p}$ is the observed probability, $\hat{p} = x / n$, where $x$ is the
observed number of successes in $n$ trials
$p_0$ is the hypothesized probability
$n$ is the number of trials
The probabilities are obtained from a standard normal distribution table (Z table).
When p0 is not specified in Test proportion, p0 = 0.5 is used. MINITAB performs a two-tailed test
unless you specify a one-tailed test.
Example of a test and confidence interval for a proportion
A county district attorney would like to run for the office of state district attorney. She has decided
that she will give up her county office and run for state office if more than 65% of her party
constituents support her. You need to test H0: p = 0.65 versus H1: p > 0.65.
As her campaign manager, you collected data on 950 randomly selected party members and find
that 560 party members support the candidate. A test of proportion was performed to determine
whether or not the proportion of supporters was greater than the required proportion of 0.65. In
addition, a 95% confidence bound was constructed to determine the lower bound for the
proportion of supporters.
1 Choose Stat ➤ Basic Statistics ➤ 1 Proportion.
2 Choose Summarized data.
3 In Number of trials, enter 950. In Number of successes, enter 560.
4 Click Options.
5 In Test proportion, enter 0.65.
6 From Alternative, choose greater than. Click OK in each dialog box.
Session window output

                                              Exact
  X     N   Sample p   95.0% Lower Bound    P-Value
560   950   0.589474            0.562515      1.000
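The output above comes from the default exact method. For comparison, the normal approximation described under Method can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than MINITAB's code: `one_proportion_normal` is a hypothetical name, and only the upper-tailed alternative used in this example is handled.

```python
import math
from statistics import NormalDist

def one_proportion_normal(x, n, p0, conf_level=95.0):
    """Upper-tailed normal-approximation test and lower confidence bound."""
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 1 - NormalDist().cdf(z)            # H1: p > p0
    alpha = 1 - conf_level / 100
    z_alpha = NormalDist().inv_cdf(1 - alpha)    # one-sided bound uses alpha, not alpha/2
    lower = p_hat - z_alpha * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, z, p_value, lower

p_hat, z, p_value, lower = one_proportion_normal(560, 950, 0.65)
print(f"Sample p = {p_hat:.6f}, Z = {z:.2f}, "
      f"P-Value = {p_value:.3f}, 95% lower bound = {lower:.6f}")
```

The approximate lower bound lands close to, but not exactly on, the exact bound of 0.562515, which is expected for a sample of this size.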
see if you have more responses from the group that received the sample than from those who did
not. For a two-tailed test of two proportions:
H0: p1 - p2 = p0 versus H1: p1 - p2 ≠ p0
where p1 and p2 are the proportions of success in populations 1 and 2, respectively, and p0 is
the hypothesized difference between the two proportions.
To test one proportion, use Stat ➤ Basic Statistics ➤ 1 Proportion described on page 1-25.
Data
Data can be in two forms: raw or summarized.
Raw data
Raw data can be entered in two ways: stacked and unstacked.
enter both samples in a single column (stacked) with a group column to identify the
population. Columns may be numeric, text, or date/time. Successes and failures are
determined by numeric or alphabetical order. MINITAB defines the lowest value as the failure;
the highest value as the success. For example:
for the numeric column entries of 5 and 10, observations of 5 are considered failures;
observations of 10 are considered successes.
for the text column entries of agree and disagree, observations of agree are considered
failures; observations of disagree are considered successes. If the data entries are yes and
no, observations of no are considered failures; observations of yes are considered
successes.
enter each sample (unstacked) in separate numeric or text columns. Both columns must be
the same type: numeric or text. Successes and failures are defined as above for stacked data.
You can reverse the definition of success and failure in a text column by applying a value order
(see Ordering Text Categories in the Manipulating Data chapter of MINITAB User's Guide 1).
The sample sizes do not need to be equal. MINITAB automatically omits missing data from the
calculations.
Summarized data
Enter the number of trials and the number of successes for each sample directly in the 2
Proportions dialog box.
If your raw data are unstacked, that is, each sample is in a separate column:
1 Choose Samples in different columns.
2 In First, enter the column containing the first sample.
3 In Second, enter the column containing the other sample.
3 If you like, use one or more of the options listed below, and click OK.
Options
Options subdialog box
specify a confidence level for the confidence interval. The default is 95%.
define the alternative hypothesis by choosing less than (lower-tailed), not equal (two-tailed), or
greater than (upper-tailed). The default is a two-tailed test.
Note that if you choose a lower-tailed or an upper-tailed hypothesis test, an upper or lower
confidence bound will be constructed, respectively, rather than a confidence interval.
use a pooled estimate of p to calculate the test statistic. See Hypothesis test on page 1-31.
Method
Confidence interval
The confidence interval is calculated as
$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
where:
$\hat{p}_1$ and $\hat{p}_2$ are the observed probabilities of sample one and sample two respectively, $\hat{p} = x / n$,
where $x$ is the observed number of successes in $n$ trials
$z_{\alpha/2}$ is the value from a Z-distribution where $\alpha$ is 1 - confidence level / 100
Note that the appropriate confidence bound is constructed in a similar fashion with $\alpha/2$ replaced
by $\alpha$. Then the lower bound is the difference in sample proportions minus the error margin and the
upper bound is the difference in sample proportions plus the error margin.
You can specify a confidence level of any number between 1 and 100 in Confidence level. The
confidence level is 95% by default.
Hypothesis test
The calculation of the test statistic, Z, depends on the method used to estimate p.
By default, MINITAB uses separate estimates of p for each population and calculates Z by:
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - d_0}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$$
where $d_0$ is the hypothesized difference. When $d_0$ is not specified in Test difference, $d_0 = 0$ is
used.
If you choose to use a pooled estimate of p for the test, MINITAB calculates Z by:
$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c(1 - \hat{p}_c)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where the pooled estimate is
$$\hat{p}_c = \frac{x_1 + x_2}{n_1 + n_2}$$
Note
It is only appropriate to use a pooled estimate when the hypothesized difference is zero
(d0 = 0).
As your corporation's purchasing manager, you need to authorize the purchase of twenty new
photocopy machines. After comparing many brands in terms of price, copy quality, warranty, and
features, you have narrowed the choice to two: Brand X and Brand Y. You decide that the
determining factor will be the reliability of the brands as defined by the proportion requiring
service within one year of purchase.
Because your corporation already uses both of these brands, you were able to obtain information
on the service history of 50 randomly selected machines of each brand. Records indicate that six
Brand X machines and eight Brand Y machines needed service. Use this information to guide
your choice of brand for purchase.
1 Choose Stat ➤ Basic Statistics ➤ 2 Proportions.
2 Choose Summarized data.
3 In First sample, under Trials, enter 50. Under Successes, enter 44.
4 In Second sample, under Trials, enter 50. Under Successes, enter 42. Click OK.
Session window output

Sample    X    N   Sample p
1        44   50   0.880000
2        42   50   0.840000

P-Value = 0.564
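The reported p-value can be checked against the two Z formulas under Method. The sketch below is a minimal Python illustration, not MINITAB's implementation; the function name `two_proportions_z` and its `pooled` flag are assumptions made for the example, and only the two-tailed alternative is computed.

```python
import math
from statistics import NormalDist

def two_proportions_z(x1, n1, x2, n2, d0=0.0, pooled=False):
    """Return (z, two_tailed_p) for the difference p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    if pooled:
        if d0 != 0:
            raise ValueError("a pooled estimate is only appropriate when d0 = 0")
        pc = (x1 + x2) / (n1 + n2)
        se = math.sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))
    else:
        se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2 - d0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Counts from the photocopier example: 44 and 42 successes out of 50 each.
z_sep, p_sep = two_proportions_z(44, 50, 42, 50)
z_pool, p_pool = two_proportions_z(44, 50, 42, 50, pooled=True)
print(f"separate estimates: Z = {z_sep:.2f}, P-Value = {p_sep:.3f}")
print(f"pooled estimate:    Z = {z_pool:.2f}, P-Value = {p_pool:.3f}")
```

With separate estimates the two-tailed p-value rounds to the 0.564 shown in the session output; the pooled version differs only slightly here because the two sample proportions are close.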
Data
Data can be entered in one of two ways:
both samples in a single numeric column with another grouping column (called subscripts)
to identify the population. The grouping column may be numeric, text, or date/time.
The sample sizes do not need to be equal. MINITAB automatically omits missing data from the
calculations.
To perform a variance test
1 Choose Stat ➤ Basic Statistics ➤ 2 Variances.
3 If you like, use one or more of the options listed below, and click OK.
Test for Equal Variances
Options
Options subdialog box
specify a confidence level for the confidence interval (the default is 95%)
store standard deviations, variances, and upper and lower confidence limits for σ by factor
levels
A study was performed in order to evaluate the effectiveness of two devices for improving the
efficiency of gas home-heating systems. Energy consumption in houses was measured after one of
the two devices was installed. The two devices were an electric vent damper (Damper = 1) and a
thermally activated vent damper (Damper = 2). The energy consumption data (BTU.In) are
stacked in one column with a grouping column (Damper) containing identifiers or subscripts to
denote the population. You are interested in comparing the variances of the two populations so
that you can construct a two-sample t-test and confidence interval to compare the two dampers.
(See Example of a two-sample t-test and confidence interval on page 1-20.)
1 Open the worksheet FURNACE.MTW.
2 Choose Stat ➤ Basic Statistics ➤ 2 Variances.
3 Choose Samples in one column.
4 In Samples, enter 'BTU.In'.
5 In Subscripts, enter Damper. Click OK.
Session window output

  Lower     Sigma     Upper     N   Factor Levels
2.40655   3.01987   4.02726    40   1
2.25447   2.76702   3.56416    50   2

Graph window output
That is, these data do not provide enough evidence to claim that the two populations have
unequal variances. Thus, it is reasonable to assume equal variances when using a two-sample
t-procedure.
Correlation
You can use the Pearson product moment correlation coefficient to measure the degree of linear
relationship between two variables. The correlation coefficient assumes a value between -1 and
+1. If one variable tends to increase as the other decreases, the correlation coefficient is negative.
Conversely, if the two variables tend to increase together the correlation coefficient is positive.
For a two-tailed test of the correlation:
H0: ρ = 0 versus H1: ρ ≠ 0
where ρ is the correlation between a pair of variables.
Data
Data must be in numeric columns of equal length.
MINITAB omits missing data from calculations using a method that is often called pairwise
deletion. MINITAB omits from the calculations for each column pair only those rows that contain
a missing value for that pair.
If you are calculating correlations between multiple columns at the same time, pairwise deletion
may result in different observations being included in the various correlations. Although this
method is the best for each individual correlation, the correlation matrix as a whole may not be
well behaved (for example, it may not be positive definite).
To calculate the Pearson product moment correlation
1 Choose Stat ➤ Basic Statistics ➤ Correlation.
Options
display the p-value for individual hypothesis tests. This is the default.
store the correlation matrix. MINITAB does not display the correlation matrix when you store
the matrix. To display the matrix, choose Manip ➤ Display Data.
Method
For the two variables x and y,
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\,s_x s_y}$$
where $\bar{x}$ and $s_x$ are the sample mean and standard deviation for the first sample, and $\bar{y}$ and $s_y$ are
the sample mean and standard deviation for the second sample.
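The formula above, together with pairwise deletion of missing values, can be sketched as follows. This is an illustrative Python example rather than MINITAB's implementation: `pearson_r` is a hypothetical name, `None` is assumed to mark a missing value, and the data are invented.

```python
import statistics

def pearson_r(x, y):
    """Pearson r for two equal-length columns; None marks a missing value."""
    # Pairwise deletion: keep only rows where both values are present.
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    n = len(pairs)
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cross = sum((a - xbar) * (b - ybar) for a, b in pairs)
    return cross / ((n - 1) * sx * sy)

# Hypothetical columns; the fourth row is dropped for this pair.
x = [1.0, 2.0, 3.0, None, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

Because deletion is done per pair of columns, a different pair drawn from the same worksheet could be computed from a different set of rows.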
Example of Pearson correlations
We have verbal and math SAT scores and first-year college grade-point averages for 200 students
and we wish to investigate the relatedness of these variables. We use correlation with the default
choice for displaying p-values.
1 Open the worksheet GRADES.MTW.
2 Choose Stat ➤ Basic Statistics ➤ Correlation.
3 In Variables, enter Verbal Math GPA. Click OK.
Session window output

         Verbal     Math
Math      0.275
          0.000

GPA       0.322    0.194
          0.000    0.006
Spearman's ρ
You can also use Correlation to obtain Spearman's ρ (rank correlation coefficient). Like the
Pearson product moment correlation coefficient, Spearman's ρ is a measure of the relationship
between two variables. However, Spearman's ρ is calculated on ranked data.
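In other words, Spearman's ρ is the Pearson correlation of the ranks. The sketch below illustrates the idea in Python and is not MINITAB code. It assumes that tied values receive the average of their ranks and that rows with missing values have already been removed; all function names and data are hypothetical.

```python
import statistics

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1           # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation of two complete, equal-length columns."""
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    cross = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    return cross / ((len(x) - 1) * statistics.stdev(x) * statistics.stdev(y))

def spearman_rho(x, y):
    """Pearson correlation computed on the ranked data."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data with one tie in x.
x = [10, 20, 20, 40, 50]
y = [1.2, 3.4, 2.2, 8.0, 9.5]
rho = spearman_rho(x, y)
print(f"ranks of x = {average_ranks(x)}, rho = {rho:.3f}")
```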
To calculate Spearman's ρ
1 Delete any rows that contain missing values.
2 If the data are not already ranked, use Manip ➤ Rank to rank them. See the Manipulating
2-3.
2 Regress the second variable on the other variables and store the residuals.
3 Calculate the correlation between the two columns of residuals.
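The residual-based steps above can be sketched in Python for the simplest case of a single control variable. This is a hedged illustration, not MINITAB's procedure: the function names are hypothetical, the data are invented, and the variable names are borrowed only to echo the example that follows.

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation of two complete, equal-length columns."""
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    cross = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    return cross / ((len(x) - 1) * statistics.stdev(x) * statistics.stdev(y))

def residuals(response, predictor):
    """Residuals from the least-squares line of response on one predictor."""
    xbar, ybar = statistics.mean(predictor), statistics.mean(response)
    slope = (sum((a - xbar) * (b - ybar) for a, b in zip(predictor, response))
             / sum((a - xbar) ** 2 for a in predictor))
    intercept = ybar - slope * xbar
    return [b - (intercept + slope * a) for a, b in zip(predictor, response)]

def partial_correlation(first, second, control):
    """Correlate the two residual columns after regressing out the control."""
    return pearson_r(residuals(first, control), residuals(second, control))

# Hypothetical data (not taken from any MINITAB worksheet).
sales = [12.0, 15.0, 11.0, 20.0, 18.0, 25.0]
newcap = [1.0, 1.8, 1.1, 2.9, 2.0, 3.5]
value = [30.0, 25.0, 40.0, 38.0, 60.0, 52.0]
partial = partial_correlation(sales, newcap, value)
print(f"partial correlation of sales and newcap given value = {partial:.3f}")
```

With more than one control variable, each residual column would come from a multiple regression instead of the single-predictor fit used here.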
Example of computing a partial correlation coefficient
The remaining steps calculate partial correlation between Sales and Newcap.
Step 2: Regress Sales on Value and store the residuals (Resi1)
1 Choose Stat ➤ Regression ➤ Regression.
2 In Response, enter Sales. In Predictors, enter Value.
3 Click Storage, and check Residuals. Click OK in each dialog box.
Session window output
Sales
0.615
0.000
Newcap
0.803
0.000
0.734
0.000
Session window output
Covariance
You can calculate the covariance for all pairs of columns. Like the Pearson correlation
coefficient, the covariance is a measure of the relationship between two variables. However, the
covariance has not been standardized, as is done with the correlation coefficient. The correlation
coefficient is standardized by dividing the covariance by the standard deviations of both variables.
Data
Data must be in numeric columns of equal length.
MINITAB omits missing data from calculations using a method that is often called pairwise
deletion. MINITAB omits from the calculations for each column pair only those rows that contain
a missing value for that pair.
If you are calculating covariances between multiple columns at the same time, pairwise deletion
may result in different observations being included in the various covariances. Although this
method is the best for each individual covariance, the covariance matrix as a whole may not be
well behaved (for example, it may not be positive definite).
To calculate the covariance
1 Choose Stat ➤ Basic Statistics ➤ Covariance.
Options
You can store the covariance matrix. MINITAB does not display the covariance matrix when you
store the matrix. To display the matrix, choose Manip ➤ Display Data.
Method
The covariance between each pair of columns is calculated using the formula
$$S_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}$$
where $\bar{x}$ is the sample mean for the first sample and $\bar{y}$ is the sample mean for the second
sample.
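A short Python sketch may help connect this formula to the correlation coefficient, since dividing the covariance by both standard deviations gives Pearson's r. The example is illustrative only; the function name and the data are assumptions, and the columns are taken to be complete.

```python
import statistics

def covariance(x, y):
    """Sample covariance of two complete, equal-length columns."""
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (len(x) - 1)

# Hypothetical data.
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]
s_xy = covariance(x, y)
# Standardizing the covariance yields the Pearson correlation coefficient.
r = s_xy / (statistics.stdev(x) * statistics.stdev(y))
print(f"S_xy = {s_xy:.4f}, r = {r:.4f}")
```

Unlike r, the covariance carries the units of both variables, so rescaling either column changes its value.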
Normality Test
Normality test generates a normal probability plot and performs a hypothesis test to examine
whether or not the observations follow a normal distribution. For the normality test, the
hypotheses are,
H0: data follow a normal distribution vs. H1: data do not follow a normal distribution
Data
You need one numeric column. MINITAB automatically omits missing data from the
calculations.
To perform a normality test
1 Choose Stat ➤ Basic Statistics ➤ Normality Test.
Options
mark reference probabilities and corresponding data values on the plot; see Method on page
1-42
Ryan-Joiner test [4], [7] (similar to the Shapiro-Wilk test [8], [9]) which is a correlation based
test
The Anderson-Darling and Ryan-Joiner tests have similar power for detecting non-normality. The
Kolmogorov-Smirnov test has lesser power; see [3] and [7] for discussions of these tests for
normality.
The common null hypothesis for these three tests is H0: data follow a normal distribution. If the
p-value of the test is less than your α level, reject H0.
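To give a feel for what a correlation-based normality statistic measures, the sketch below correlates the ordered data with approximate normal scores. It is only an illustration of the general idea and should not be read as MINITAB's Ryan-Joiner implementation: the plotting positions (i - 3/8)/(n + 1/4) are an assumption, the function name and data are hypothetical, and no critical values or p-values are computed.

```python
import statistics
from statistics import NormalDist

def normal_score_correlation(data):
    """Correlation between the ordered data and approximate normal scores."""
    values = sorted(v for v in data if v is not None)   # omit missing data
    n = len(values)
    # Assumed plotting positions (i - 3/8) / (n + 1/4); not taken from this guide.
    scores = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25))
              for i in range(1, n + 1)]
    vbar, sbar = statistics.mean(values), statistics.mean(scores)
    cross = sum((v - vbar) * (s - sbar) for v, s in zip(values, scores))
    return cross / ((n - 1) * statistics.stdev(values) * statistics.stdev(scores))

# Hypothetical measurements.
sample = [4.9, 5.1, 4.7, 5.0, 5.3, 4.8, 5.2, 5.0, 4.6]
r = normal_score_correlation(sample)
print(f"normal-score correlation = {r:.3f}")
```

Values near 1 indicate that the points would fall close to a straight line on a normal probability plot; how far below 1 counts as evidence against normality depends on the sample size and the chosen α level.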
Method
The input data are plotted as the x-values. MINITAB calculates the probability of occurrence,
assuming a normal distribution, and plots the calculated probabilities as y-values. The grid on the
graph resembles the grids found on normal probability paper, with a log scale for the
probabilities. A least-squares line is fit to the plotted points and drawn on the plot for reference.
The line forms an estimate of the cumulative distribution function for the population from which
data are drawn. MINITAB also displays the sample mean, standard deviation, and sample size of
the input data on the plot.
When you enter the optional reference probabilities, they are marked with horizontal reference
lines. At the point where the reference line intersects the least-squares fit, a vertical reference line
is drawn and labeled with the corresponding data value. To include reference probabilities on the
plot:
In an operating engine, parts of the crankshaft move up and down. AtoBDist is the distance (in
mm) from the actual (A) position of a point on the crankshaft to a baseline (B) position. To ensure
production quality, a manager took five measurements each working day in a car assembly plant,
from September 28 through October 15, and then ten per day from the 18th through the 25th.
You wish to see if these data follow a normal distribution, so you use Normality test.
1 Open the worksheet CRANKSH.MTW.
2 Choose Stat ➤ Basic Statistics ➤ Normality Test.
3 In Variable, enter AtoBDist. Click OK.
Graph window output
References
[1] S.F. Arnold (1990). Mathematical Statistics, Prentice-Hall, pp.383-384.
[2] M.B. Brown and A.B. Forsythe (1974). Journal of the American Statistical Association, 69,
364-367.
[3] R.B. D'Agostino and M.A. Stephens, Eds. (1986). Goodness-of-Fit Techniques, Marcel
Dekker.
[4] J.J. Filliben (1975). "The Probability Plot Correlation Coefficient Test for Normality,"
Technometrics, Vol 17, p.111.
[5] N.L. Johnson and S. Kotz (1969). Discrete Distributions, John Wiley & Sons, pp.58-61.
[6] H. Levene (1960). Contributions to Probability and Statistics, pp.278-292. Stanford
University Press, CA.
[7] T.A. Ryan, Jr. and B.L. Joiner (1976). "Normal Probability Plots and Tests for Normality,"
Technical Report, Statistics Department, The Pennsylvania State University. (Available from
MINITAB Inc.)
[8] S.S. Shapiro and R.S. Francia (1972). "An Approximate Analysis of Variance Test for
Normality," Journal of the American Statistical Association, Vol 67, p. 215.
[9] S.S. Shapiro and M.B. Wilk (1965). "An Analysis of Variance Test for Normality (Complete
Samples)," Biometrika, Vol 52, p. 591.
Regression
Regression Overview, 2-2
Regression, 2-3
See also,
Regression Overview
Regression analysis is used to investigate and model the relationship between a response variable
and one or more predictors. MINITAB provides various least-squares and logistic regression
procedures.
Both least squares and logistic regression methods estimate parameters in the model so that the fit
of the model is optimized. Least squares minimizes the sum of squared errors to obtain parameter
estimates, whereas MINITAB's logistic regression commands obtain maximum likelihood
estimates of the parameters. See Logistic Regression Overview on page 2-28 for more information
about logistic regression.
Use the table below to assist in selecting a procedure:
Use                             Response type   Estimation method
Regression (page 2-3)           continuous      least squares
Stepwise (page 2-13)            continuous      least squares
Best Subsets (page 2-19)        continuous      least squares
Fitted Line Plot (page 2-23)    continuous      least squares
Residual Plots (page 2-26)      continuous      least squares
Binary Logistic (page 2-32)     categorical     maximum likelihood
Ordinal Logistic (page 2-43)    categorical     maximum likelihood
Nominal Logistic (page 2-49)    categorical     maximum likelihood
Regression
You can use Regression to perform simple and multiple regression using the method of least
squares. Use this procedure for fitting general least squares models, storing regression statistics,
examining residual diagnostics, generating point estimates, generating prediction and
confidence intervals, and performing lack-of-fit tests.
You can also use this command to fit polynomial regression models. However, if you want to fit a
polynomial regression model with a single predictor, you may find it more advantageous to use
Fitted Line Plot (page 2-23).
Data
Enter response and predictor variables in numeric columns of equal length so that each row in
your worksheet contains measurements on one observation or subject.
MINITAB omits all observations that contain missing values in the response or in the predictors
from calculations of the regression equation and the ANOVA table items.
To do a linear regression
1 Choose Stat > Regression > Regression.
Options
Graphs subdialog box
draw five different residual plots for regular, standardized, or deleted residuals; see Choosing
a residual type on page 2-5. Available residual plots include a:
histogram.
normal probability plot.
plot of residuals versus the fitted values ($\hat{Y}$).
plot of residuals versus data order. The row number for each data point is shown on the
x-axis (for example, 1 2 3 4 ... n).
separate plot for the residuals versus each specified column.
For a discussion, see Residual plots on page 2-5.
exclude the intercept term from the regression by unchecking Fit Intercept; see Regression
through the origin on page 2-7.
display the variance inflation factor (VIF, a measure of multicollinearity effect) associated
with each predictor; see Variance inflation factor on page 2-7.
perform a pure error lack-of-fit test for testing model adequacy when there are predictor
replicates; see Testing lack-of-fit on page 2-8.
perform a data subsetting lack-of-fit test to test the model adequacy; see Testing lack-of-fit on
page 2-8.
predict the response, confidence interval, and prediction interval for new observations; see
Prediction of new observations on page 2-8.
store the coefficients, fits, and regular, standardized, and deleted residuals; see Choosing a
residual type on page 2-5.
store the leverages, Cook's distances, and DFITS, for identifying outliers; see Identifying
outliers on page 2-9.
store the mean square error, the $(X'X)^{-1}$ matrix, and the R matrix of the QR or Cholesky
decomposition. (The variance-covariance matrix of the coefficients is $\text{MSE} \cdot (X'X)^{-1}$.) See
Help for information on these matrices.
Residual type    Calculation
regular          response - fit
standardized     (residual) / (standard deviation of the residual)
Studentized      (residual) / (standard deviation of the residual). The ith
                 Studentized residual is computed with the ith observation removed.
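The three calculations in the table can be sketched for a regression with a single predictor. This is a minimal sketch rather than MINITAB's implementation: the function names and data are made up, and the formulas used are the standard textbook ones (standard deviation of a residual = s * sqrt(1 - h), with the deleted version using the error variance estimated without the ith observation).

```python
from math import sqrt

def simple_fit(x, y):
    """Least-squares intercept, slope, and leverages for one predictor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return y_bar - b1 * x_bar, b1, leverages

def residual_types(x, y):
    """Return (regular, standardized, Studentized) residual lists."""
    n, p = len(x), 2  # p counts the intercept and the slope
    b0, b1, h = simple_fit(x, y)
    regular = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in regular)
    mse = sse / (n - p)
    standardized = [e / sqrt(mse * (1 - hi)) for e, hi in zip(regular, h)]
    studentized = []
    for e, hi in zip(regular, h):
        # Error variance estimated with the ith observation removed.
        mse_without_i = (sse - e * e / (1 - hi)) / (n - p - 1)
        studentized.append(e / sqrt(mse_without_i * (1 - hi)))
    return regular, standardized, studentized

x = [1, 2, 3, 4, 5, 6]
y = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9]  # made-up responses
regular, standardized, studentized = residual_types(x, y)
```

The Studentized value grows faster than the standardized one for an observation that fits poorly, because that observation no longer inflates its own variance estimate.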
Residual plots
MINITAB generates residual plots that you can use to examine the goodness of model fit. You can
choose the following residual plots:
Normal plot of residuals. The points in this plot should generally form a straight line if the
residuals are normally distributed. If the points on the plot depart from a straight line, the
normality assumption may be invalid. To perform a statistical test for normality, use Stat >
Basic Statistics > Normality Test (page 1-41).
Histogram of residuals. This plot should resemble a normal (bell-shaped) distribution with a
mean of zero. Substantial clusters of points away from zero may indicate that factors other than
those in the model are influencing your results.
Residuals versus fits. This plot should show a random pattern of residuals on both sides of 0.
There should not be any recognizable patterns in the residual plot. The following may
indicate error that is not random:
a series of increasing or decreasing points
a predominance of positive residuals, or a predominance of negative residuals
patterns such as increasing residuals with increasing fits
Residuals versus order. This is a plot of all residuals in the order that the data were collected
and can be used to find non-random error, especially of time-related effects.
Residuals versus other variables. This is a plot of all residuals versus another variable.
Commonly, you might use a predictor or a variable left out of the model and see if there is a
pattern that you may wish to fit.
If certain residual values are of concern, you can brush your graph to identify these values. See
the Brushing Graphs chapter in MINITAB User's Guide 1 for more information.
Weighted regression
Weighted least squares regression is a method for dealing with observations that have nonconstant
variances. If the variances are not constant, observations with
The usual choice of weights is the inverse of pure error variance in the response.
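As an illustration of weighting by the inverse of the variance, the following Python sketch fits a weighted least-squares line with one predictor by minimizing the weighted sum of squared residuals, sum of w * (y - fit)^2. The function name, data, and variances are hypothetical.

```python
def weighted_line_fit(x, y, weights):
    """Return (intercept, slope) minimizing sum(w * (y - fit) ** 2)."""
    total = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - x_bar) * (yi - y_bar)
              for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.6, 10.9]
variances = [0.1, 0.1, 0.1, 0.4, 1.6]    # hypothetical pure error variances
weights = [1.0 / v for v in variances]   # inverse-variance weights
intercept, slope = weighted_line_fit(x, y, weights)
```

An observation with weight zero has no influence on the fit, which is why weights must not be negative.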
To perform weighted regression
1 Choose Stat > Regression > Regression > Options.
2 In Weights, enter the column containing the weights. The weights must be greater than or
[ wi ( y fit )
A graph of residuals versus data order (1 2 3 4 ... n) can provide a means to visually inspect
residuals for autocorrelation.
The Durbin-Watson statistic tests for the presence of autocorrelation in regression residuals
by determining whether or not the correlation between two adjacent error terms is zero. The
test is based upon an assumption that errors are generated by a first-order autoregressive
process. If there are missing observations, these are omitted from the calculations, and only the
nonmissing observations are used.
To reach a conclusion from the test, you will need to compare the displayed statistic with lower
and upper bounds in a table. If D > upper bound, no correlation; if D < lower bound, positive
correlation; if D is in between the two bounds, the test is inconclusive. For additional
information, see [4], [22].
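The statistic and the decision rule above can be sketched in Python. The formula used is the standard Durbin-Watson definition (sum of squared differences of adjacent residuals divided by the sum of squared residuals); the function names are made up, and the bounds in the usage lines are illustrative values that you would replace with entries from a published table for your sample size and number of predictors.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in data order."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator

def interpret(d, lower_bound, upper_bound):
    """Apply the rule stated in the text to a computed statistic."""
    if d > upper_bound:
        return "no correlation"
    if d < lower_bound:
        return "positive correlation"
    return "inconclusive"

residuals = [0.5, 0.6, 0.4, 0.3, -0.2, -0.4, -0.5, -0.3, 0.1, 0.2]  # made up
d = durbin_watson(residuals)
conclusion = interpret(d, lower_bound=1.08, upper_bound=1.36)  # illustrative bounds
```

Residuals that drift slowly, as in this made-up series, give a statistic well below 2.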
Testing lack-of-fit
MINITAB provides two lack-of-fit tests so you can determine whether or not the regression model
adequately fits your data. The pure error lack-of-fit test requires replicates; the data subsetting
lack-of-fit test does not require replicates.
Pure error lack-of-fit test: If your predictors contain replicates (repeated x values with one
predictor or repeated combinations of x values with multiple predictors), MINITAB can
calculate a pure error test for lack-of-fit. The error term will be partitioned into pure error
(error within replicates) and lack-of-fit error. The F-test can be used to test if you have chosen
an adequate regression model. For additional information, see [9], [22], [29].
Data subsetting lack-of-fit test: MINITAB also performs a lack-of-fit test that does not require
replicates but involves subsetting the data, and attempts to identify the nature of any lack-of-fit.
This test is nonstandard, but it can provide information about the lack-of-fit relative to each
variable. See [6] and Help for more information.
MINITAB performs 2k+1 hypothesis tests, where k is the number of predictors, and then
combines them using Bonferroni inequalities to give an overall significance level of 0.1. A
message is printed out for each test for which there is evidence of lack-of-fit. For each
predictor, a curvature test and an interaction test are performed by comparing the fit above
and below the predictor mean using indicator variables. A test can also be performed by fitting
the model to the central portion of the data and then comparing the error sums of squares of
that central data portion to the error sums of squares of all the data.
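The pure error test described above can be sketched for one predictor with replicated x values. This is a minimal sketch with made-up data and function names. It partitions the error sum of squares into pure error and lack-of-fit and forms the usual F ratio, but does not compute a p-value.

```python
from collections import defaultdict

def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1

def pure_error_lack_of_fit(x, y):
    """Split the error sum of squares into pure error and lack-of-fit parts."""
    n, p = len(x), 2  # p counts the intercept and the slope
    b0, b1 = fit_line(x, y)
    ss_error = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

    # Pure error: variation of the responses around their replicate means.
    replicates = defaultdict(list)
    for xi, yi in zip(x, y):
        replicates[xi].append(yi)
    ss_pure = 0.0
    for ys in replicates.values():
        group_mean = sum(ys) / len(ys)
        ss_pure += sum((yi - group_mean) ** 2 for yi in ys)

    c = len(replicates)  # number of distinct x values
    ss_lof = ss_error - ss_pure
    f_stat = (ss_lof / (c - p)) / (ss_pure / (n - c))
    return {"SS Error": ss_error, "SS Pure Error": ss_pure,
            "SS Lack of Fit": ss_lof, "F": f_stat}

x = [1, 1, 2, 2, 3, 3, 4, 4]
y = [1.0, 1.2, 3.9, 4.1, 9.1, 8.9, 15.8, 16.2]  # curved, so a line fits poorly
result = pure_error_lack_of_fit(x, y)
```

A large F, as with these deliberately curved data, indicates that a straight line is not an adequate model.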
Identifying outliers
In addition to graphs, you can store three additional measures for the purpose of
identifying outliers, or unusual observations that can have a significant influence upon
the regression. The measures are leverages, Cook's distance, and DFITS:
Leverages are the diagonals of the hat matrix, $H = X (X'X)^{-1} X'$, where X is the
design matrix. Note that $h_i$ depends only on the predictors; it does not involve the
response Y. Many people consider $h_i$ to be large enough to merit checking if it is
more than 2p/n or 3p/n, where p is the number of predictors (including one for the
constant). MINITAB displays these in a table of unusual observations with high
leverage. Those with leverage over 3p/n or 0.99, whichever is smaller, are marked
with an X and those with leverage greater than 5p/n are marked with XX.
Cook's distance combines leverages and Studentized residuals into one overall
measure of how unusual the predictor values and response are for each observation.
Large values signify unusual observations. Geometrically, Cook's distance is a
measure of the distance between coefficients calculated with and without the ith
observation. Cook [7] and Weisberg [29] suggest checking observations with Cook's
distance > F(0.50, p, n - p), where F is a value from an F-distribution.
DFITS, like Cook's distance, combines the leverage and the Studentized residual
into one overall measure of how unusual an observation is. DFITS (also called
DFFITS) is the difference between the fitted values calculated with and without the
ith observation, scaled by stdev($\hat{Y}_i$). Belsley, Kuh, and Welsch [3] suggest that
observations with $|\mathrm{DFITS}| > 2\sqrt{p/n}$ should be considered as unusual. See Help for
more details on these measures.
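For a regression with one predictor, the three measures can be sketched in Python using their standard closed forms. The function name and data are made up, and the code is not MINITAB's implementation. It flags leverages above 3p/n and |DFITS| above 2 * sqrt(p/n), as suggested above.

```python
from math import sqrt

def influence_measures(x, y):
    """Leverage, Cook's distance, and DFITS for a one-predictor regression."""
    n, p = len(x), 2  # p counts the constant and the predictor
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in residuals)
    mse = sse / (n - p)
    rows = []
    for xi, e in zip(x, residuals):
        h = 1 / n + (xi - x_bar) ** 2 / sxx          # diagonal of the hat matrix
        r = e / sqrt(mse * (1 - h))                  # standardized residual
        mse_deleted = (sse - e * e / (1 - h)) / (n - p - 1)
        t = e / sqrt(mse_deleted * (1 - h))          # Studentized residual
        rows.append({"leverage": h,
                     "cook": r * r * h / (p * (1 - h)),
                     "dfits": t * sqrt(h / (1 - h))})
    return rows

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 25]                  # last x is far from the rest
y = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 7.1, 8.0, 9.2, 20.0]
rows = influence_measures(x, y)
n, p = len(x), 2
high_leverage = [i for i, row in enumerate(rows) if row["leverage"] > 3 * p / n]
large_dfits = [i for i, row in enumerate(rows)
               if abs(row["dfits"]) > 2 * sqrt(p / n)]
```

The indices are zero-based, so index 9 is the tenth observation. Its leverage is high purely because of its x value, whatever the response.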
You are a manufacturer who wishes to easily obtain a quality measure on a product, but
the procedure is expensive. However, there is a quick-and-dirty way of doing the same
thing that is much less expensive but also is slightly less precise. You examine the
relationship between the two scores to see if you can predict the desired score (Score2)
from the score that is easy to obtain (Score1). You also obtain a prediction interval for
an observation with Score1 being 8.2.
1 Open the worksheet EXH_REGR.MTW.
2 Choose Stat > Regression > Regression.
3 In Response, enter Score2. In Predictors, enter Score1.
4 Click Options.
5 In Predict intervals for new observations, type 8.2. Click OK in each dialog box.
Session window output

Regression Analysis: Score2 versus Score1

The regression equation is
Score2 = 1.12 + 0.218 Score1

Predictor       Coef   SE Coef       T       P
Constant      1.1177    0.1093   10.23   0.000
Score1       0.21767   0.01740   12.51   0.000

S = 0.1274     R-Sq = 95.7%     R-Sq(adj) = 95.1%

Analysis of Variance

Source            DF       SS       MS        F       P
Regression         1   2.5419   2.5419   156.56   0.000
Residual Error     7   0.1136   0.0162
Total              8   2.6556

Unusual Observations
Obs   Score1   Score2      Fit   SE Fit   Residual   St Resid
  9     7.50   2.5000   2.7502   0.0519    -0.2502     -2.15R

SE Fit      95.0% CI              95.0% PI
0.0597      (2.7614, 3.0439)      (2.5697, 3.2356)

Score1
  8.20
0.0005. These values indicate that there is sufficient evidence that the coefficients are not zero
for likely Type I error rates ($\alpha$ levels).
S = 0.1274. This is an estimate of $\sigma$, the estimated standard deviation about the regression line.
Note that
$$s^2 = \text{MS Error}$$
R-Sq = 95.7%. This is $R^2$, also called the coefficient of determination. Note that
$R^2 = \text{Correlation}(Y, \hat{Y})^2$. Also,
$$R^2 = \frac{\text{SS Regression}}{\text{SS Total}}$$
The $R^2$ value is the proportion of variability in the Y variable (in this example, Score2)
accounted for by the predictors (in this example, Score1).
R-Sq(adj) = 95.1%. This is $R^2$ adjusted for degrees of freedom. If a variable is added to an
equation, $R^2$ will almost always get larger even if the added variable is of no real value. To
compensate for this, MINITAB also prints R-Sq (adj), which is an approximately unbiased
estimate of the population $R^2$ that is calculated by the formula
$$R^2(\text{adj}) = 1 - \frac{\text{SS Error} / (n - p)}{\text{SS Total} / (n - 1)}$$
converted to a percent, where p is the number of coefficients fit in the regression equation (2 in
our example). In the same notation, the usual $R^2$ is
$$R^2 = 1 - \frac{\text{SS Error}}{\text{SS Total}}$$
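As a check on these formulas, the following Python sketch recomputes S, R-Sq, and R-Sq(adj) from the sums of squares printed in the Analysis of Variance table of this example (SS Error = 0.1136, SS Total = 2.6556, with n = 9 observations and p = 2 coefficients). The function names are made up for illustration.

```python
from math import sqrt

def r_squared(ss_error, ss_total):
    """Proportion of variability in the response accounted for by the model."""
    return 1 - ss_error / ss_total

def r_squared_adj(ss_error, ss_total, n, p):
    """R-squared adjusted for degrees of freedom; p counts all coefficients."""
    return 1 - (ss_error / (n - p)) / (ss_total / (n - 1))

ss_error, ss_total = 0.1136, 2.6556   # from the Analysis of Variance table
n, p = 9, 2                           # Total DF is 8, so n = 9

s = sqrt(ss_error / (n - p))          # square root of MS Error
r_sq_percent = 100 * r_squared(ss_error, ss_total)
r_sq_adj_percent = 100 * r_squared_adj(ss_error, ss_total, n, p)
```

Rounded to the precision of the session window, these reproduce S = 0.1274, R-Sq = 95.7%, and R-Sq(adj) = 95.1%.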
Analysis of Variance. This table contains sums of squares (abbreviated SS). SS Regression is
sometimes written SS (Regression | b0) and sometimes called SS Model. SS Error is sometimes
written as SS Residual, SSE, or RSS. MS Error is often written as MSE. SS Total is the total sum
of squares corrected for the mean. Use the analysis of variance table to assess the overall fit. The
F-test is a test of the hypothesis $H_0$: All regression coefficients, excepting $\beta_0$, are zero.
Unusual Observations. Unusual observations are marked with an X if the predictor is unusual
(large leverage), and they are marked with an R if the response is unusual (large standardized
residual). See Choosing a residual type on page 2-5 and Identifying outliers on page 2-9. The
default is to print only unusual observations. You can choose to print a full table of fitted values
by selecting this option in the Results subdialog box.
The Fit or fitted Y value is sometimes called predicted Y value or $\hat{Y}$. SE Fit is the (estimated)
standard error of the fitted value. St Resid is the standardized residual.
Predicted Values. The interval displayed under 95% CI is the confidence interval for the
population mean of all responses (Score2) that correspond to the given value of the predictor
(Score1 = 8.2). The interval displayed under 95% PI is the prediction interval for an individual
observation taken at Score1 = 8.2. The confidence interval is appropriate for the data used in the
regression. If you have new observations, use the prediction interval. See Prediction of new
observations on page 2-8.
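The two intervals can be approximately reproduced from the printed output with the standard formulas: fit plus or minus t times SE Fit for the confidence interval, and fit plus or minus t times sqrt(S^2 + SE Fit^2) for the prediction interval. In this sketch the critical value 2.3646 (the 97.5th percentile of a t distribution with 7 degrees of freedom) is taken from a table and is an assumption, as are the variable names; the other numbers come from the session window output above.

```python
from math import sqrt

coef_constant, coef_score1 = 1.1177, 0.21767   # fitted coefficients
s, se_fit = 0.1274, 0.0597                     # S and SE Fit at Score1 = 8.2
t_crit = 2.3646                                # assumed table value, 7 error DF

fit = coef_constant + coef_score1 * 8.2

ci_half_width = t_crit * se_fit
pi_half_width = t_crit * sqrt(s ** 2 + se_fit ** 2)

confidence_interval = (fit - ci_half_width, fit + ci_half_width)
prediction_interval = (fit - pi_half_width, fit + pi_half_width)
```

The results agree with the printed intervals to about three decimal places; the small differences come from rounding in the displayed coefficients. The prediction interval is wider because it includes the variation of an individual observation about the regression line.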
Regression analysis would not be complete without examining residual patterns. The following
multiple regression example and residual plots procedure provide additional information about
regression analysis.
Example of a multiple regression
As part of a test of solar thermal energy, you measure the total heat flux from homes. You wish to
examine whether total heat flux (Heatflux) can be predicted by insolation, by the position of the
focal points in the east, south, and north directions, and by the time of day. Data are from [21],
page 486. You found, using best subsets regression on page 2-22, that the best two-predictor model
included the variables North and South and the best three-predictor model added the variable East. You
would like to evaluate the three-predictor model using multiple regression.
1 Open the worksheet EXH_REGR.MTW.
2 Choose Stat > Regression > Regression.
3 In Response, enter Heatflux.
4 In Predictors, enter North South East. Click OK.
Session window output

Predictor       Coef   SE Coef        T       P
Constant      389.17     66.09     5.89   0.000
North        -24.132     1.869   -12.92   0.000
South         5.3185    0.9629     5.52   0.000
East           2.125     1.214     1.75   0.092

S = 8.598     R-Sq = 87.4%     R-Sq(adj) = 85.9%

Analysis of Variance

Source            DF        SS       MS       F       P
Regression         3   12833.9   4278.0   57.87   0.000
Residual Error    25    1848.1     73.9
Total             28   14681.9

Source    DF    Seq SS
North      1   10578.7
South      1    2028.9
East       1     226.3

Unusual Observations
Obs   North   HeatFlux      Fit   SE Fit   Residual   St Resid
  4    17.5     230.70   210.20     5.03      20.50      2.94R
 22    17.6     254.50   237.16     4.24      17.34      2.32R
Stepwise Regression
Stepwise regression removes and adds variables to the regression model for the purpose of
identifying a useful subset of the predictors. MINITAB provides three commonly used procedures:
standard stepwise regression (adds and removes variables), forward selection (adds variables), and
backward elimination (removes variables).
Data
Enter response and predictor variables in the worksheet in numeric columns of equal length so
that each row in your worksheet contains measurements on one observation or subject. MINITAB
automatically omits rows with missing values from the calculations.
To do a stepwise regression
1 Choose Stat > Regression > Stepwise.
2 In Response, enter the numeric column containing the response (Y) data.
3 In Predictors, enter the numeric columns containing the predictor (X) variables.
4 If you like, use one or more of the options listed below, then click OK.
Options
Stepwise dialog box
By entering variables in Predictors to include in every model, you can designate a set of
predictor variables that cannot be removed from the model, even when their p-values are greater
than the Alpha to remove value.
perform standard stepwise regression (adds and removes variables), forward selection (adds
variables), or backward elimination (removes variables).
when you choose the stepwise method, you can enter a starting set of predictor variables in
Predictors in initial model. These variables are removed if their p-values are greater than the
Alpha to remove value. If you want to keep variables in the model regardless of their p-values, enter
them in Predictors to include in every model in the main dialog box. See Stepwise regression
(default) on page 2-15.
when you choose the stepwise or forward selection method, you can set the value of $\alpha$ for
entering a new variable in the model in Alpha to enter. See Stepwise regression (default) and
Forward selection on page 2-16.
when you choose the stepwise or backward elimination method, you can set the value of $\alpha$ for
removing a variable from the model in Alpha to remove. See Stepwise regression (default) and
Backward elimination on page 2-16.
display the next best alternate predictors up to the number requested. If a new predictor is
entered into the model, MINITAB displays the predictor which was the second best choice, the
third best choice, and so on, up to the requested number.
set the number of steps between pauses. See User intervention on page 2-16.
exclude the intercept term from the regression by unchecking Fit Intercept. See Regression
through the origin on page 2-7.
Method
MINITAB provides three commonly used procedures: standard stepwise regression (page 2-15),
forward selection (page 2-16), and backward elimination (page 2-16)
Stepwise regression (default)
The first step in stepwise regression is to calculate an F-statistic and p-value for each variable in
the model. If the model contains j variables, then F for any variable, Xr, is
$$F(1,\; n - j - 1) = \frac{\mathrm{SSE}\{j - X_r\} - \mathrm{SSE}_j}{\mathrm{MSE}_j}$$
where n is the number of observations, SSE{j - Xr} is the error sum of squares for the model after
Xr is removed, and SSEj and MSEj are the error sums of squares and mean squared errors
(respectively) for the model before Xr is removed.
If the p-value for any variable is greater than Alpha to remove, then the variable with the largest
p-value is removed from the model, the regression equation is calculated, the results are printed,
and the next step is initiated.
If no variable can be removed, the procedure attempts to add a variable. An F-statistic and
p-value are calculated for each variable that is not in the model. If the model contains j variables,
then F for any variable, Xa, is
$$F(1,\; n - j - 2) = \frac{\mathrm{SSE}_j - \mathrm{SSE}\{j + X_a\}}{\mathrm{MSE}\{j + X_a\}}$$
where n is the number of observations, SSEj is calculated before Xa is added to the model, and
SSE{j + Xa} and MSE{j + Xa} are calculated after Xa is added to the model.
If the p-value corresponding to the F-value for any variable is smaller than Alpha to enter, the
variable with the smallest p-value is then added to the model. The regression equation is then
calculated, results are displayed, and the procedure goes to a new step. When no more variables
can be entered into or removed from the model, the stepwise procedure ends.
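The remove-then-add cycle described above can be sketched in Python. To stay within the standard library, this sketch compares each F-statistic with a fixed threshold (as the FENTER and FREMOVE subcommands do) instead of converting it to a p-value and comparing with Alpha to enter or Alpha to remove. The thresholds of 4.0, the helper names, and the data are all assumptions made for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def sse(columns, y):
    """Error sum of squares for y regressed on an intercept plus the columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def stepwise(data, y, f_enter=4.0, f_remove=4.0):
    """Return the names of the selected predictors, in order of entry."""
    n = len(y)
    model = []
    while True:
        j = len(model)
        sse_j = sse([data[v] for v in model], y)
        if model:
            # F to remove: (SSE{j - Xr} - SSEj) / MSEj for each variable in the model.
            mse_j = sse_j / (n - j - 1)
            f_to_remove = {
                v: (sse([data[u] for u in model if u != v], y) - sse_j) / mse_j
                for v in model
            }
            weakest = min(f_to_remove, key=f_to_remove.get)
            if f_to_remove[weakest] < f_remove:
                model.remove(weakest)
                continue
        # F to enter: (SSEj - SSE{j + Xa}) / MSE{j + Xa} for each candidate.
        best, best_f = None, f_enter
        for v in data:
            if v in model:
                continue
            sse_new = sse([data[u] for u in model] + [data[v]], y)
            f = (sse_j - sse_new) / (sse_new / (n - j - 2))
            if f > best_f:
                best, best_f = v, f
        if best is None:
            return model  # nothing can be removed or entered
        model.append(best)

data = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "x2": [5, 3, 8, 1, 9, 2, 7, 4, 10, 6],
    "x3": [1, 0, 1, 0, 0, 1, 1, 0, 0, 1],   # unrelated to the response
}
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.2, -0.05, -0.15, 0.1, -0.1]
y = [3 + 2 * a - 1.5 * b + e for a, b, e in zip(data["x1"], data["x2"], noise)]
selected = stepwise(data, y)
```

Choosing the largest F among candidates is equivalent to choosing the smallest p-value, because all candidates at a given step share the same degrees of freedom.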
Forward selection
This procedure adds variables to the model using the same method as the stepwise procedure.
Once added, however, a variable is never removed. The forward selection procedure ends when
none of the candidate variables have a p-value smaller than Alpha to enter.
Backward elimination
This procedure starts with the model that contains all the predictors and then removes variables,
one at a time, using the same method as the stepwise procedure. No variable, however, can
re-enter the model. The backward elimination procedure ends when none of the variables
included in the model have a p-value greater than Alpha to remove.
User intervention
Stepwise proceeds automatically by steps and then pauses. You can set the number of steps
between pauses in the Options subdialog box.
The number of steps can start at one with the default and maximum determined by the output
width. Set a smaller value if you wish to intervene more often. You must check Editor > Enable
Commands in order to intervene and use the procedure interactively. If you do not, the
procedure will run to completion without pausing.
At the pause, MINITAB displays a MORE? prompt. At this prompt, you can continue the display of
steps, terminate the procedure, or intervene by typing a subcommand.
To
Type
YES
NO
ENTER C...C
REMOVE C...C
FORCE C...C
BEST K
STEPS K
To                            Type
change F to enter             FENTER K
change F to remove            FREMOVE K
change $\alpha$ to enter      AENTER K
change $\alpha$ to remove     AREMOVE K
Since the procedures automatically snoop through many models, the model selected may
fit the data too well. That is, the procedure can look at many variables and select ones
which, by pure chance, happen to fit well.
The three automatic procedures are heuristic algorithms, which often work very well but
which may not select the model with the highest $R^2$ value (for a given number of predictors).
Automatic procedures cannot take into account special knowledge the analyst may have
about the data. Therefore, the model selected may not be the best from a practical point of
view.
Session window output

6 predictors, with N = 92

Step              1        2
Constant      10.28    44.48

Pulse1        0.957    0.912
T-Value        7.42     9.74
P-Value       0.000    0.000

Ran                    -19.1
T-Value                -9.05
P-Value                0.000

S              13.5     9.82
R-Sq          37.97    67.71
R-Sq(adj)     37.28    66.98
C-p           103.2     13.5
More? (Yes, No, Subcommand, or Help)

Step              3
Constant      42.62

Pulse1        0.812
T-Value        8.88
P-Value       0.000

Ran           -20.1
T-Value      -10.09
P-Value       0.000

Sex             7.8
T-Value        3.74
P-Value       0.000

S              9.18
R-Sq          72.14
R-Sq(adj)     71.19
C-p             1.9
More? (Yes, No, Subcommand, or Help)
two steps. For each model, MINITAB displays the constant term, the coefficient and its t-value for
each variable in the model, S (square root of MSE), and $R^2$.
Because you answered YES at the MORE? prompt, the automatic procedure continued for one
more step, adding the variable Sex. At this point, no more variables could enter or leave, so the
automatic procedure stopped and again allowed you to intervene. Because you do not want to
intervene, you typed NO.
The stepwise output is designed to present a concise summary of a number of fitted models. If
you want more information on any of the models, you can use the regression procedure (page
2-3).
Data
Enter response and predictor variables in the worksheet in numeric columns of equal length so
that each row in your worksheet contains measurements on one unit or subject. MINITAB
automatically omits rows with missing values from all models.
You can use as many as 31 free predictors. However, the analysis can take a long time when 15 or
more free predictors are used. When analyzing a very large data set, forcing certain predictors to
be in the model by entering them in Predictors in all models can decrease the length of time
required to run the analysis. The total number of predictors (forced and free) in the analysis
cannot be more than 100.
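To see why run time grows so quickly with the number of free predictors, the following Python sketch counts the distinct non-empty subsets of the free predictors. The function name is made up, and the count is only an upper bound on the search space: the text does not describe how MINITAB searches, and an implementation may avoid fitting every subset.

```python
from math import comb

def candidate_models(free_predictors, min_size=1, max_size=None):
    """Number of distinct subsets of the free predictors within a size range."""
    if max_size is None:
        max_size = free_predictors
    return sum(comb(free_predictors, k) for k in range(min_size, max_size + 1))

models_with_5 = candidate_models(5)
models_with_15 = candidate_models(15)
models_with_31 = candidate_models(31)
```

Forced predictors do not add to this count because they appear in every model, which is consistent with the advice above to force predictors when the data set is large.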
2 In Response, enter the numeric column containing the response (Y) data.
3 In Free predictors, enter from 1 to 31 numeric columns containing the candidate predictor
(X) variables.
4 If you like, use one or more of the options listed below, then click OK.
Options
Best Subsets Regression dialog box
specify a set of predictors to be included in all models by entering these variables in Predictors
in all models. The maximum number of variables which can be entered is equal to 100 minus
the number of variables entered in Free predictors.
specify the minimum and maximum number of free predictors to include under Free
Predictor(s) In Each Model. For example, if you specify a minimum of 3 and a maximum of
6 free predictors, MINITAB will determine the best models that contain 3, 4, 5, and 6 free
predictors. (Note, in addition to the specified number of free predictors, these models will also
contain any variables entered in Predictors in all models.)
specify the number of models to display for each number of variables by entering the desired
value in Models of each size to print. For example, if you enter 3, MINITAB will display the
best, second best, and third best models for each number of free predictors. You can enter a
value from 1 to 5 (the default is 2).
exclude the intercept term from the regression by unchecking Fit Intercept; see Regression
through the origin on page 2-7.
fitting the underlying structure of the data (a structure that will appear in other data sets
gathered in the same way)
fitting the peculiarities of the one particular data set you analyze
Unfortunately, when you search through many models to find the best, as you do in best
subsets regression, a good fit is often chosen largely for the second reason. There are two ways
that you can verify a model obtained by a variable selection procedure. You can
Best Subsets Regression
take the original data set and randomly divide it into two parts. Then use the variable selection
procedure on one part to select a model and verify the fit using the second part.
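The split-sample check can be sketched in Python. For brevity, this sketch fits a fixed one-predictor model on the first part instead of running a variable selection procedure, then measures how well that fit predicts the second part. The function names, the random seeds, and the simulated data are assumptions.

```python
import random

def fit_line(points):
    """Least-squares intercept and slope from a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in points) / sxx
    return y_bar - b1 * x_bar, b1

def r_squared(points, b0, b1):
    """R-squared of a given line evaluated on a set of (x, y) pairs."""
    y_bar = sum(y for _, y in points) / len(points)
    ss_error = sum((y - (b0 + b1 * x)) ** 2 for x, y in points)
    ss_total = sum((y - y_bar) ** 2 for _, y in points)
    return 1 - ss_error / ss_total

def split_sample(points, seed=0):
    """Randomly divide the data into two parts of (nearly) equal size."""
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = random.Random(1)
points = [(x, 1.0 + 2.0 * x + rng.gauss(0, 0.5)) for x in range(20)]  # simulated
selection_part, validation_part = split_sample(points)
b0, b1 = fit_line(selection_part)
validation_r2 = r_squared(validation_part, b0, b1)
```

A validation R-squared much lower than the one obtained on the selection part would suggest that the model fit peculiarities of that part and not the underlying structure.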
Total heat flux is measured as part of a solar thermal energy test. You wish to see how total heat
flux is predicted by other variables: insolation, the position of the focal points in the east, south,
and north directions, and the time of day. Data are from Montgomery and Peck [21], page 486.
1 Open the worksheet EXH_REGR.MTW.
2 Choose Stat > Regression > Best Subsets.
3 In Response, enter Heatflux.
4 In Free Predictors, enter Insolation-Time. Click OK.
Session window output

Vars   R-Sq   R-Sq(adj)     C-p         S   Predictors included
   1   72.1        71.0    38.5    12.328   X
   1   39.4        37.1   112.7    18.154   X
   2   85.9        84.8     9.1    8.9321   X X
   2   82.0        80.6    17.8    10.076   X X
   3   87.4        85.9     7.6    8.5978   X X X
   3   86.5        84.9     9.7    8.9110   X X X
   4   89.1        87.3     5.8    8.1698   X X X X
   4   88.0        86.0     8.2    8.5550   X X X X
   5   89.9        87.7     6.0    8.0390   X X X X X

[Predictor columns in the output, left to right: Insolation, East, South, North, Time. The column position of each X is not preserved here.]
example on page 2-12 and the residual plots example on page 2-26 indicate that adding the
variable East does not improve the fit of the model.
Data
Enter your response and single predictor variables in the worksheet in numeric columns of equal
length so that each row in your worksheet contains measurements on one unit or subject.
MINITAB automatically omits rows with missing values from the calculations.
To do a fitted line plot
1 Choose Stat > Regression > Fitted Line Plot.
2 In Response (Y), enter the numeric column containing the response data.
3 In Predictor (X), enter the numeric column containing the predictor variable.
4 If you like, use one or more of the options listed below, then click OK.
Options
Fitted Line Plot dialog box
choose a linear (default), quadratic, or cubic regression model to automatically include all
lower order terms. See Polynomial regression model choices on page 2-24.
Fitted Line Plot
transform the y-variable by log10Y. You can also choose to display the y-scale in the log10 scale.
transform the x-variable by log10X. You can also choose to display the plot x scale in the log10
scale. If you use this option with polynomials of order greater than one, then the polynomial
regression will be based on powers of the log10X.
display confidence bands and prediction bands about the regression line. You can also change
the confidence level from the default of 95%.
store the residuals, fits, and regression model coefficients (b0, b1, b2, up to b3 down the
column, where bi is the coefficient of the ith power of the predictor or transformed predictor).
store the scaled residuals and scaled fits when using the y-variable transformation, log10Y.
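Model type   Order    Statistical model
linear       first    $Y = \beta_0 + \beta_1 X + \varepsilon$
quadratic    second   $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$
cubic        third    $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \varepsilon$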
Another way of modeling curvature is to generate additional models by using the log10 of X and/
or Y for linear, quadratic, and cubic models. In addition, taking the log10 of Y may be used to
reduce right-skewness or nonconstant variance of residuals.
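The polynomial models and the log10 transformations described above can be sketched outside MINITAB. The following Python example is a minimal illustration, not MINITAB's implementation: the function names `solve` and `fit_polynomial` and the data values are assumptions made for this sketch. It fits a polynomial by least squares (normal equations) after optionally taking log10 of X and/or Y, and returns the coefficients b0, b1, b2, ... in order of increasing power.

```python
import math

def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_polynomial(x, y, order=2, log_y=False, log_x=False):
    """Least-squares fit of Y on powers of X; returns [b0, b1, ..., b_order]."""
    if log_x:
        x = [math.log10(v) for v in x]
    if log_y:
        y = [math.log10(v) for v in y]
    powers = range(order + 1)
    # Normal equations (X'X) b = X'y for the design columns 1, x, x^2, ...
    xtx = [[sum(v ** (i + j) for v in x) for j in powers] for i in powers]
    xty = [sum((v ** i) * w for v, w in zip(x, y)) for i in powers]
    return solve(xtx, xty)

# Hypothetical data: log10(Y) is exactly quadratic in X
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [10 ** (0.5 - 0.2 * v + 0.05 * v ** 2) for v in x]
b = fit_polynomial(x, y, order=2, log_y=True)
print([round(v, 4) for v in b])
```

Because lower-order terms are always included, a cubic fit in this sketch uses the columns 1, x, x^2, and x^3, which mirrors the dialog box behavior described above.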
Example of plotting a fitted regression line
You are studying the relationship between a particular machine setting and the amount of energy
consumed. This relationship is known to have considerable curvature, and you believe that a log
transformation of the response variable will produce a more symmetric error distribution. You
choose to model the relationship between the machine setting and the amount of energy
consumed with a quadratic model.
1 Open the worksheet EXH_REGR.MTW.
2 Choose Stat > Regression > Fitted Line Plot.
3 In Response (Y), enter EnergyConsumption.
R-Sq = 93.1 %     R-Sq(adj) = 91.1 %

Analysis of Variance

Source        DF        SS        MS         F       P
Regression     2   2.65326   1.32663   47.1743   0.000
Error          7   0.19685   0.02812
Total          9   2.85012

Source        DF    Seq SS         F       P
Linear         1   0.03688    0.1049   0.754
Quadratic      1   2.61638   93.0370   0.000

Graph window output
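The entries in the analysis of variance output are related by simple arithmetic. The following Python sketch is an illustration under stated assumptions, not MINITAB code: the function name `anova_summary` is hypothetical, and the formulas are the usual textbook definitions (mean square = SS / DF, F = MS regression / MS error, R-Sq = SS regression / SS total, and R-Sq(adj) based on mean squares). Applied to the sums of squares shown above, it reproduces the displayed values up to rounding.

```python
def anova_summary(ss_regression, ss_error, df_regression, df_error):
    """Derive mean squares, F, R-Sq, and R-Sq(adj) from sums of squares."""
    ss_total = ss_regression + ss_error
    df_total = df_regression + df_error
    ms_regression = ss_regression / df_regression
    ms_error = ss_error / df_error
    return {
        "MS regression": ms_regression,
        "MS error": ms_error,
        "F": ms_regression / ms_error,
        "R-Sq (%)": 100 * ss_regression / ss_total,
        "R-Sq(adj) (%)": 100 * (1 - ms_error / (ss_total / df_total)),
    }

# Sums of squares and degrees of freedom taken from the output above
summary = anova_summary(2.65326, 0.19685, 2, 7)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Small differences in the last digits (for example, in F) are expected because the sums of squares in the output are themselves rounded.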
Residual Plots
You can generate a set of plots to use for residual analysis by storing fits and residuals using
another procedure, such as regression, and then using the Residual Plots procedure to produce a
normal score plot, a chart of individual residuals, a histogram of residuals, and a plot of fits versus
residuals, all on the same graph.
Data
You must save a column of residuals and a column of fits from another MINITAB procedure.
MINITAB automatically omits rows with missing values from the calculations.
To display the residual plots
1 Choose Stat > Regression > Residual Plots.
Options
You can replace the default title with your own title.
Example of residual plots
You examine the residuals from the best two-predictor model of the best subsets regression
example on page 2-22. You determined in the multiple regression example on page 2-12 that
adding the third variable from the best three-predictor model may not add appreciably to the fit.
You now examine residual patterns from the best two-predictor model to further examine
goodness-of-fit.
Step 1: Store the residuals and fits from a regression analysis
1 Open the worksheet EXH_REGR.MTW.
2 Choose Stat > Regression > Regression.
3 In Response, enter Heatflux.
4 In Predictors, enter South North.
5 Click Storage. Check Fits and Standardized residuals.
6 Click OK in each dialog box.
Session window output
Residual Plots
TEST 1. One point more than 3.00 sigmas from center line.
Test Failed at points: 22
TEST 2. 9 points in a row on same side of center line.
Test Failed at points: 16
Graph window output
You can identify points in the plots using the brushing capabilities. See the Brushing
Graphs chapter in MINITAB User's Guide 1.
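The two tests reported in the Session window output can be sketched as follows. This Python example is a simplified illustration, not MINITAB's algorithm: the function names are hypothetical, the center line and sigma are supplied by the caller (how MINITAB estimates sigma is not stated here), and the residual values are invented.

```python
def points_beyond_sigma(values, center, sigma, k=3.0):
    """Test 1: 1-based positions of points more than k sigmas from the center line."""
    return [i for i, v in enumerate(values, start=1)
            if abs(v - center) > k * sigma]

def runs_on_one_side(values, center, run_length=9):
    """Test 2: 1-based positions that complete run_length points in a row on one side."""
    flagged, run, last_side = [], 0, 0
    for i, v in enumerate(values, start=1):
        side = (v > center) - (v < center)   # +1 above, -1 below, 0 on the line
        if side != 0 and side == last_side:
            run += 1
        else:
            run = 1 if side != 0 else 0
        last_side = side
        if run >= run_length:
            flagged.append(i)
    return flagged

# Hypothetical standardized residuals (center line 0, sigma 1)
residuals = [0.5] * 8 + [0.2, -0.1, 4.0]
print(points_beyond_sigma(residuals, center=0.0, sigma=1.0))
print(runs_on_one_side(residuals, center=0.0))
```

In this sketch a point is reported at the position where the run first reaches the required length, and at each later position while the run continues.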
The plot of residuals versus fits shows that the fit tends to be better for higher predicted values.
Investigation shows that the highest residual coincides with the highest value of the variable East.
Including East in the model and repeating the residual plots procedure showed that no points are
flagged as unusual (not shown). The contribution to the fit by the variable East may warrant
further investigation.
Logistic Regression Overview
Variable type   Number of categories   Characteristics                      Examples
Binary          2                      two levels                           success, failure
                                                                            yes, no
Ordinal         3 or more              natural ordering of the levels
Nominal         3 or more              no natural ordering of the levels
Both logistic and least squares regression methods estimate parameters in the model so that the fit
of the model is optimized. Least squares minimizes the sum of squared errors to obtain parameter
estimates, whereas logistic regression obtains maximum likelihood estimates of the parameters
using an iterative-reweighted least squares algorithm [19].
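To make the idea of iteratively reweighted least squares concrete, the following Python sketch fits a logit model with an intercept and a single covariate. It is a minimal illustration under stated assumptions, not MINITAB's algorithm: the function name `fit_logistic_irls`, the convergence tolerance, the starting values of zero, and the data are all assumptions, and the 2 x 2 weighted least squares system is solved in closed form.

```python
import math

def fit_logistic_irls(x, y, max_iter=20, tol=1e-8):
    """Maximum likelihood estimates (b0, b1) for logit P(y = 1) = b0 + b1 * x."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)              # weight for this observation
            z = eta + (yi - p) / w         # working (adjusted) response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Weighted least squares of z on (1, x): solve the 2 x 2 normal equations
        det = s_w * s_wxx - s_wx ** 2
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = max(abs(new_b0 - b0), abs(new_b1 - b1)) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

# Hypothetical binary responses observed at eight covariate values
x_demo = [1, 2, 3, 4, 5, 6, 7, 8]
y_demo = [0, 0, 1, 0, 1, 1, 0, 1]
b0, b1 = fit_logistic_irls(x_demo, y_demo)
print(round(b0, 4), round(b1, 4))
```

Each pass is an ordinary weighted least squares fit; only the weights and the working response change between iterations, which is why the method is described as iteratively reweighted.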
covariates that are crossed with each other or with factors, or nested within factors
Model continuous predictors as covariates and categorical predictors as factors. Here are some
examples. A is a factor and X is a covariate.
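Model terms    Description
A X A*X        fits a full model with a covariate crossed with a factor
A|X            an alternative way to specify the previous model
A X X*X        fits a model with a covariate crossed with itself making a squared term
A X(A)         fits a model with a covariate nested within a factor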
The model for logistic regression is a generalization of the model used in MINITAB's general
linear model (GLM) procedure. Any model fit by GLM can also be fit by the logistic regression
procedures. For a discussion of specifying models in general, see Specifying the model terms on
page 3-19 and Specifying reduced models on page 3-21. In the logistic regression commands,
MINITAB assumes any variable in the model is a covariate unless the variable is specified as a
factor. In contrast, GLM assumes that any variable in the model is a factor unless the variable is
specified as a covariate.
Model restrictions
Logistic regression models in MINITAB have the same restrictions as GLM models:
There must be enough data to estimate all the terms in your model, so that the model is full
rank. MINITAB will automatically determine if your model is full rank and display a message.
In most cases, eliminating some unimportant high-order interactions in your model should
solve your problem.
For numeric factors, the reference level is the level with the least numeric value.
For date/time factors, the reference level is the level with the earliest date/time.
For text factors, the reference level is the level that is first in alphabetical order.
You can change the default reference level in the Options subdialog box.
Note  If you have defined a value order for a text factor, the default rule above does not
apply. MINITAB designates the first value in the defined order as the reference value. See
Ordering Text Categories in the Manipulating Data chapter in MINITAB User's Guide 1.
For more information, see Interpreting the parameter estimates relative to the event and the reference
levels on page 2-37.
Logistic regression creates a set of design variables for each factor in the model. If there are k
levels, there will be k - 1 design variables and the reference level will be coded as 0. Here are two
examples of the default coding scheme:
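The default coding scheme can be illustrated with a short Python sketch. This is an assumption-laden illustration, not MINITAB output: the function name `design_variables` and the factor levels are hypothetical. The sketch picks the reference level by the default rule described above (least numeric value, or first in alphabetical order for text), codes it as 0 in every design variable, and gives each remaining level its own indicator.

```python
def design_variables(levels, reference=None):
    """Return (reference level, other levels, coding) for a factor with k levels.

    Each level maps to a list of k - 1 zeros and ones; the reference level
    maps to all zeros.
    """
    ordered = sorted(set(levels))
    if reference is None:
        # Default rule: least numeric value, or first in alphabetical order.
        # Note that Python sorts text by character code, so case matters.
        reference = ordered[0]
    others = [lv for lv in ordered if lv != reference]
    coding = {lv: [1 if lv == other else 0 for other in others] for lv in ordered}
    return reference, others, coding

# Hypothetical numeric factor with 4 levels
ref_a, cols_a, coding_a = design_variables([3, 1, 2, 4, 1])
print(ref_a, coding_a)

# Hypothetical text factor with 3 levels
ref_b, cols_b, coding_b = design_variables(["Temp", "Pressure", "Humidity"])
print(ref_b, coding_b)
```

Passing `reference=` explicitly corresponds to changing the default reference level, which in MINITAB is done in the Options subdialog box.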
For numeric factors, the reference event is the greatest numeric value.
For date/time factors, the reference event is the most recent date/time.
For text factors, the reference event is the last in alphabetical order.
You can change the default reference event in the Options subdialog box.
For more information, see Interpreting the parameter estimates relative to the event and the reference
levels on page 2-37.
Note  If you have defined a value order for a text factor, the default rule above does not
apply. MINITAB designates the last value in the defined order as the reference event. See
Ordering Text Categories in the Manipulating Data chapter in MINITAB User's Guide 1.
Worksheet structure
Data used for input to the logistic regression procedures may be arranged in two different ways in
your worksheet: as raw (categorical) data, or as frequency (collapsed) data. For binary logistic
regression, there are three additional ways to arrange the data in your worksheet: as successes and
trials, as successes and failures, or as failures and trials. These ways are illustrated here for the
same data.
The response entered as raw data or as frequency data

Raw Data: one row for each observation

C1         C2       C3
Response   Factor   Covar
0          1         12
1          1         12
1          1         12
.          .          .
.          .          .
.          .          .
1          1         12
0          2         12
1          2         12
.          .          .
.          .          .
.          .          .
1          2         12
.          .          .
.          .          .
.          .          .

Frequency Data

C1         C2      C3       C4
Response   Count   Factor   Covar
0            1     1         12
1           19     1         12
0            1     2         12
1           19     2         12
0            5     1         24
1           15     1         24
0            4     2         24
1           16     2         24
0            7     1         50
1           13     1         50
0            8     2         50
1           12     2         50
0           11     1        125
1            9     1        125
0            9     2        125
1           11     2        125
0           19     1        200
1            1     1        200
0           18     2        200
1            2     2        200

The response entered as successes and trials, as successes and failures, or as failures and trials

Successes and trials

C1    C2    C3       C4
S     T     Factor   Covar
19    20    1         12
19    20    2         12
15    20    1         24
16    20    2         24
13    20    1         50
12    20    2         50
 9    20    1        125
11    20    2        125
 1    20    1        200
 2    20    2        200

Successes and failures

C1    C2    C3       C4
S     F     Factor   Covar
19     1    1         12
19     1    2         12
15     5    1         24
16     4    2         24
13     7    1         50
12     8    2         50
 9    11    1        125
11     9    2        125
 1    19    1        200
 2    18    2        200

Failures and trials

C1    C2    C3       C4
F     T     Factor   Covar
 1    20    1         12
 1    20    2         12
 5    20    1         24
 4    20    2         24
 7    20    1         50
 8    20    2         50
11    20    1        125
 9    20    2        125
19    20    1        200
18    20    2        200
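The worksheet arrangements are different summaries of the same observations. The following Python sketch, which is an illustration and not part of MINITAB, collapses raw rows into the frequency form and into the successes-and-trials form. The function names are hypothetical, and the raw rows reconstruct only the Covar = 12 portion of the data shown above.

```python
from collections import Counter

def to_frequency(raw_rows):
    """Collapse (response, factor, covar) rows into (response, count, factor, covar)."""
    counts = Counter(raw_rows)
    rows = [(resp, n, fac, cov) for (resp, fac, cov), n in counts.items()]
    return sorted(rows, key=lambda r: (r[3], r[2], r[0]))

def to_successes_trials(raw_rows, event=1):
    """Collapse raw rows into (successes, trials, factor, covar) per factor/covariate pattern."""
    successes, trials = Counter(), Counter()
    for resp, fac, cov in raw_rows:
        trials[(fac, cov)] += 1
        successes[(fac, cov)] += int(resp == event)
    patterns = sorted(trials, key=lambda k: (k[1], k[0]))
    return [(successes[k], trials[k], k[0], k[1]) for k in patterns]

# Raw rows for Covar = 12: one failure (0) and nineteen successes (1) per factor level
raw = ([(0, 1, 12)] + [(1, 1, 12)] * 19 +
       [(0, 2, 12)] + [(1, 2, 12)] * 19)

print(to_frequency(raw))
print(to_successes_trials(raw))
```

The successes-and-failures and failures-and-trials forms follow in the same way, because failures = trials - successes for each factor/covariate pattern.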
Binary Logistic Regression
Data
Your data must be arranged in your worksheet in one of five ways: as raw data, as frequency data,
as successes and trials, as successes and failures, or as failures and trials. See Worksheet structure
on page 2-31.
Factors, covariates, and response data can be numeric, text, or date/time. The reference level and
the reference event depend on the data type. See Reference levels for factors on page 2-30 and
Reference event for the response variable on page 2-30 for details.
The predictors may either be factors (nominal variables) or covariates (continuous variables).
Factors may be crossed or nested. Covariates may be crossed with each other or with factors, or
nested within factors.
The model can include up to 9 factors and 50 covariates. Unless you specify a predictor in the
model as a factor, the predictor is assumed to be a covariate. Model continuous predictors as
covariates and categorical predictors as factors. See How to specify the model terms on page 2-28
for more information.
MINITAB automatically omits observations with missing values from all calculations.
To do a binary logistic regression
1 Choose Stat > Regression > Binary Logistic Regression.
If your data is in raw form, choose Response and enter the column containing the
response variable.
If your data is in frequency form, choose Response and enter the column containing the
response variable. In Frequency, enter the column containing the count or frequency
variable.
3 In Model, enter the model terms. See How to specify the model terms on page 2-28.
4 If you like, use one or more of the options listed below, then click OK.
Options
Binary Logistic Regression dialog box
plot delta Pearson χ², delta deviance, delta β based on standardized Pearson residuals, and
delta β based on Pearson residuals versus:
the estimated event probability for each distinct factor/covariate pattern
the leverage for each distinct factor/covariate pattern
See Regression diagnostics and residual analysis on page 2-36.
specify the link function: logit (the default), normit (also called probit), or gompit (also called
complementary log-log); see Link functions on page 2-45
change the reference event of the response or the reference levels for the factors; see
Interpreting the parameter estimates relative to the event and the reference levels on page 2-37
specify initial values for model parameters or parameter estimates for a validation model; see
Entering initial values for parameter estimates on page 2-36
change the maximum number of iterations for reaching convergence (the default is 20)
change the number of groups for the Hosmer-Lemeshow goodness-of-fit test from the default
of 10; see Groups for the Hosmer-Lemeshow goodness-of-fit test on page 2-36
Link functions
MINITAB provides three link functions: logit (the default), normit (also called probit), and
gompit (also called complementary log-log). These allow you to fit a broad class of binary response
models. These are the inverse of the cumulative logistic distribution function (logit), the inverse
of the cumulative standard normal distribution function (normit), and the inverse of the
Gompertz distribution function (gompit). This class of models is defined by:
$g(\pi_j) = \beta_0 + x_j' \beta$, where
$\pi_j$ = the probability of a response for the jth factor/covariate pattern
$\beta_0$ = the intercept
$x_j$ = a vector of predictor variables associated with the jth factor/covariate pattern
$\beta$ = a vector of unknown coefficients associated with the predictors
The link function is the inverse of a distribution function. The link functions and their
corresponding distributions are summarized below ($\pi$ in the variance is 3.14159):
Name     Link function                              Distribution   Mean                                              Variance
logit    $g(\pi_j) = \log_e(\pi_j / (1 - \pi_j))$    logistic       0                                                 $\pi^2 / 3$
normit   $g(\pi_j) = \Phi^{-1}(\pi_j)$               normal         0                                                 1
gompit   $g(\pi_j) = \log_e(-\log_e(1 - \pi_j))$     Gompertz       negative of Euler's constant (about -0.5772)      $\pi^2 / 6$
You want to choose a link function that results in a good fit to your data. Goodness-of-fit statistics
can be used to compare fits using different link functions. Certain link functions may be used for
historical reasons or because they have a special meaning in a discipline.
An advantage of the logit link function is that it provides an estimate of the odds ratios. For a
comparison of link functions, see [19].
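The three link functions and their inverses can be written out directly. The following Python sketch is an illustration only; the function names are chosen for this example and are not MINITAB commands. Each link maps a probability to the linear predictor scale, and each inverse maps a linear predictor value back to a probability.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def logit(p):
    """Inverse of the cumulative logistic distribution function."""
    return math.log(p / (1 - p))

def normit(p):
    """Inverse of the cumulative standard normal distribution function (probit)."""
    return _STD_NORMAL.inv_cdf(p)

def gompit(p):
    """Inverse of the Gompertz distribution function (complementary log-log)."""
    return math.log(-math.log(1 - p))

def inv_logit(eta):
    return 1 / (1 + math.exp(-eta))

def inv_normit(eta):
    return _STD_NORMAL.cdf(eta)

def inv_gompit(eta):
    return 1 - math.exp(-math.exp(eta))

for p in (0.1, 0.5, 0.9):
    print(p, round(logit(p), 4), round(normit(p), 4), round(gompit(p), 4))
```

The logit and normit links are symmetric about a probability of 0.5 (both give 0 there), whereas the gompit link is not, which is one reason the choice of link can affect the fit.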
Convergence: The maximum likelihood solution may not converge if the starting estimates
are not in the neighborhood of the true solution. If the algorithm does not converge to a
solution, you can specify what you think are good starting values for parameter estimates in
Starting estimates for algorithm in the Options subdialog box.
Validation: You may also wish to validate the model with an independent sample. Typically,
this is done by splitting the data into two subsets. Use the first set to estimate and store the
coefficients. If you enter these estimates in Estimates for validation model in the Options
subdialog box, MINITAB will use these values as the parameter estimates rather than
calculating new parameter estimates. Then, you can assess the model fit for the independent
sample.
In both cases, enter a column with the first entry being the constant estimate, and the remaining
entries corresponding to the model terms in the order in which they appear in the Model box or
the output.
that are either poorly fit by the model, have a strong influence upon the estimated parameters, or
which have a high leverage. MINITAB provides different options for each of these, as listed in the
following table (See Help for computational details). Hosmer and Lemeshow [16] suggest that
you interpret these diagnostics jointly to understand any potential problems with the model.
To identify...                                   Use...
poorly fit factor/covariate patterns             standardized Pearson residual
                                                 deviance residual
                                                 delta chi-square
                                                 delta deviance
factor/covariate patterns with a strong          delta beta
influence on parameter estimates
factor/covariate patterns with a large           leverage (Hi)
leverage
The graphs available in the Graphs subdialog box allow you to visualize some of these
diagnostics jointly; you can plot a measure useful for identifying poorly fit factor/covariate
patterns (delta chi-square or delta deviance) or a measure useful for identifying a factor/covariate
pattern with a strong influence on parameter estimates (one of the delta beta statistics) versus
either the estimated event probability or leverage. The estimated event probability is the
probability of the event, given the data and model. Leverages are used to assess how unusual the
predictor values are (see Identifying outliers on page 2-9). See [16] for a further discussion of
diagnostic plots. You can use MINITAB's graph brushing capabilities to identify points. See the
Brushing Graphs chapter in MINITAB User's Guide 1 for more information.
each unit change in the predictor, while all other predictors are held constant. A unit change in a
factor refers to a comparison of a certain level to the reference level.
The logit link provides the most natural interpretation of the parameter estimates and is therefore
the default link in MINITAB. A summary of the interpretation follows:
The odds of a reference event is the ratio of P(event) to P(not event). The parameter estimate
of a predictor (factor or covariate) is the estimated change in the log of P(event)/P(not event)
for each unit change in the predictor, assuming the other predictors remain constant.
The parameter estimates can also be used to calculate the odds ratio, or the ratio between two
odds. Exponentiating the parameter estimate of a factor yields the ratio of P(event)/P(not
event) for a certain factor level compared to the reference level. The odds ratios at different
values of the covariate can be constructed relative to zero. In the covariate case, it may be
more meaningful to interpret the odds and not the odds ratio. Note that a parameter estimate
of zero or an odds ratio of one both imply the same thing: the factor or covariate has no
effect.
You can change the event or reference levels in the Options subdialog box if you wish to change
how you view the parameter estimates. To change the event, specify the new event value in the
Event box. To change the reference level for a factor, specify the factor variable followed by the
new reference level in the Reference factor level box. You can specify reference levels for more
than one factor at the same time. If the levels are text or date/time, enclose them in double
quotes.
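The relationship between a parameter estimate and its odds ratio can be checked with a few lines of Python. This sketch is an illustration under stated assumptions: the function name `odds_ratio` is hypothetical, and the confidence limits assume the usual normal approximation (coefficient plus or minus 1.96 standard errors, then exponentiated). The coefficients and standard errors are the ones reported for Smokes and Weight in the binary logistic regression example in this chapter.

```python
import math

def odds_ratio(coef, se=None, units=1.0, z=1.96):
    """Odds ratio for a change of `units` in the predictor, with optional CI."""
    ratio = math.exp(units * coef)
    if se is None:
        return ratio, None
    lower = math.exp(units * (coef - z * se))
    upper = math.exp(units * (coef + z * se))
    return ratio, (lower, upper)

# Smokes (Yes compared to the reference level No)
print(odds_ratio(-1.1930, se=0.5530))
# Weight, per 1 pound
print(odds_ratio(0.02502, se=0.01226))
# Weight, per 10 pounds
print(odds_ratio(0.02502, units=10))
```

Rounded to two decimals, these calculations agree with the odds ratios and 95% confidence limits in the example output (0.30 with limits 0.10 and 0.90 for Smokes; 1.03 with limits 1.00 and 1.05 for Weight) and with the 10 pound odds ratio of 1.28 quoted in the interpretation.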
Example of a binary logistic regression
You are a researcher who is interested in understanding the effect of smoking and weight upon
resting pulse rate. Because you have categorized the response, pulse rate, into low and high, a
binary logistic regression analysis is appropriate to investigate the effects of smoking and weight
upon pulse rate.
1 Open the worksheet EXH_REGR.MTW.
2 Choose Stat > Regression > Binary Logistic Regression.
3 In Response, enter RestingPulse. In Model, enter Smokes Weight. In Factors (optional), enter
Smokes.
4 Click Graphs. Check Delta chi-square vs probability and Delta chi-square vs leverage.
Click OK.
5 Click Results. Choose In addition, list of factor level values, tests for terms with more than
1 degree of freedom, and 2 additional goodness-of-fit tests. Click OK in each dialog box.
Session window output

Binary Logistic Regression: RestingPulse versus Smokes, Weight

Response Information

Variable   Value   Count
RestingP   Low        70   (Event)
           High       22
           Total      92

Factor Information

Factor   Levels   Values
Smokes        2   No Yes

Logistic Regression Table
                                                  Odds      95% CI
Predictor       Coef   SE Coef       Z       P   Ratio   Lower   Upper
Constant      -1.987     1.679   -1.18   0.237
Smokes
 Yes         -1.1930    0.5530   -2.16   0.031    0.30    0.10    0.90
Weight       0.02502   0.01226    2.04   0.041    1.03    1.00    1.05

Log-Likelihood = -46.820
Test that all slopes are zero: G = 7.574, DF = 2, P-Value = 0.023

Goodness-of-Fit Tests

Method                   Chi-Square   DF       P
Pearson                      40.848   47   0.724
Deviance                     51.201   47   0.312
Hosmer-Lemeshow               4.745    8   0.784
Brown:
 General Alternative          0.905    2   0.636
 Symmetric Alternative        0.463    1   0.496

Table of Observed and Expected Frequencies

                              Group
Value      1     2     3     4     5     6     7     8     9    10   Total
Low
 Obs       4     6     6     8     8     6     8    12    10     2      70
 Exp     4.4   6.4   6.3   6.6   6.9   7.2   8.3  12.9   9.1   1.9
High
 Obs       5     4     3     1     1     3     2     3     0     0      22
 Exp     4.6   3.6   2.7   2.4   2.1   1.8   1.7   2.1   0.9   0.1
Total      9    10     9     9     9     9    10    15    10     2      92

Measures of Association:
(Between the Response Variable and Predicted Probabilities)

Pairs         Number   Percent     Summary Measures
Concordant      1045     67.9%     Somers' D                0.38
Discordant       461     29.9%     Goodman-Kruskal Gamma    0.39
Ties              34      2.2%     Kendall's Tau-a          0.14
Total           1540    100.0%

Graph window output
Interpretation of results
The Session window output contains the following seven parts:
A Response Information: displays the number of missing observations and the number of
observations that fall into each of the two response categories. The response value that has
been designated as the reference event is the first entry under Value and labeled as the event.
In this case, the reference event is low pulse rate (see Reference event for the response variable
on page 2-30).
B Factor Information: displays all the factors in the model, the number of levels for each
factor, and the factor level values. The factor level that has been designated as the reference
level is the first entry under Values; here, the subject does not smoke (see Reference levels for
factors on page 2-30).
C Logistic Regression Table: shows the estimated coefficients (parameter estimates), standard
error of the coefficients, z-values, and p-values. When you use the logit link function, you also
see the odds ratio and a 95% confidence interval for the odds ratio.
From the output, you can see that both Smokes (z = -2.16, p = 0.031) and Weight
(z = 2.04, p = 0.041) have p-values less than 0.05, indicating that there is sufficient
evidence that the parameters are not zero using a significance level of α = 0.05.
The coefficient of -1.193 for Smokes represents the estimated change in the log of P(low
pulse)/P(high pulse) when the subject smokes compared to when he/she does not smoke,
with the covariate Weight held constant. The coefficient of 0.0250 for Weight is the
estimated change in the log of P(low pulse)/P(high pulse) with a 1 unit (lb) increase in
Weight, with the factor Smokes held constant.
Although there is evidence that the parameter of Weight is not zero, the odds ratio is very
close to one (1.03), indicating that a one pound increase in weight minimally affects a
person's resting pulse rate. A more meaningful difference would be found if you compared
subjects with a larger weight difference (for example, if the weight unit is 10 pounds, the
odds ratio becomes 1.28, indicating that the odds of a subject having a low pulse increase
by 1.28 times with each 10 pound increase in weight).
For Smokes, the negative coefficient of -1.193 and the odds ratio of 0.30 indicate that
subjects who smoke tend to have a higher resting pulse rate than subjects who do not
smoke. Given that subjects have the same weight, the odds ratio can be interpreted as the
odds of smokers in the sample having a low pulse being 30% of the odds of non-smokers
having a low pulse.
D Next, the last Log-Likelihood from the maximum likelihood iterations is displayed along with
the statistic G. This statistic tests the null hypothesis that all the coefficients associated with
predictors equal zero versus these coefficients not all being equal to zero. In this example, G =
7.574, with a p-value of 0.023, indicating that there is sufficient evidence that at least one of
the coefficients is different from zero, given that your accepted α level is greater than 0.023.
Note that for factors with more than 1 degree of freedom, MINITAB performs a multiple
degrees of freedom test with a null hypothesis that all the coefficients associated with the
factor are equal to 0 versus them not all being equal to 0. This example does not have a
factor with more than 1 degree of freedom.
E Goodness-of-Fit Tests: displays Pearson, deviance, and Hosmer-Lemeshow goodness-of-fit
tests. In addition, two Brown tests (general alternative and symmetric alternative) are
displayed because you have chosen the logit link function and the selected option in the
Results subdialog box. The goodness-of-fit tests, with p-values ranging from 0.312 to 0.784,
indicate that there is insufficient evidence to claim that the model does not fit the data
adequately. If the p-value is less than your accepted α level, the test would reject the null
hypothesis of an adequate fit.
F Table of Observed and Expected Frequencies: allows you to see how well the model fits the
data by comparing the observed and expected frequencies. There is insufficient evidence that
the model does not fit the data well, as the observed and expected frequencies are similar. This
supports the conclusions made by the Goodness-of-Fit Tests.
G Measures of Association: displays a table of the number and percentage of concordant,
discordant, and tied pairs, as well as common rank correlation statistics. These values measure
the association between the observed responses and the predicted probabilities.
The table of concordant, discordant, and tied pairs is calculated by pairing the observations
with different response values. Here, you have 70 individuals with a low pulse and 22 with a
high pulse, resulting in 70 × 22 = 1540 pairs with different response values. Based on the
model, a pair is concordant if the individual with a low pulse rate has a higher probability of
having a low pulse, discordant if the opposite is true, and tied if the probabilities are equal.
In this example, 67.9% of pairs are concordant and 29.9% are discordant. You can use these
values as a comparative measure of prediction, for example in comparing fits with different
sets of predictors or with different link functions.
Somers' D, Goodman-Kruskal Gamma, and Kendall's Tau-a are summaries of the table of
concordant and discordant pairs. These measures most likely lie between 0 and 1 where
larger values indicate that the model has a better predictive ability. In this example, the
measures range from 0.14 to 0.39, which implies less than desirable predictive ability.
Plots: In the example, you chose two diagnostic plots: delta Pearson χ² versus the estimated
event probability and delta Pearson χ² versus the leverage. Delta Pearson χ² for the jth factor/
covariate pattern is the change in the Pearson χ² when all observations with that factor/covariate
pattern are omitted. These two graphs indicate that two observations are not well fit by the model
(high delta χ²). A high delta χ² can be caused by a high leverage and/or a high Pearson residual.
In this case, a high Pearson residual caused the large delta χ², because the leverages are less than
0.1. Hosmer and Lemeshow indicate that delta χ² or delta deviance greater than 3.84 is large.
If you choose Editor > Brush, brush these points, and then click on them, they will be identified
as data values 31 and 66. These are individuals with a high resting pulse, who do not smoke, and
who have smaller than average weights (Weight = 116, 136 pounds). You might further
investigate these cases to see why the model did not fit them well.
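The measures of association in part G can be reproduced from the pair counts. The following Python sketch is an illustration that uses the standard definitions of these statistics; the function names are hypothetical, and the sketch is not MINITAB's code. `count_pairs` classifies every pair made of one observation with the event and one without it, and `association_measures` summarizes the counts.

```python
def count_pairs(event_probs, nonevent_probs):
    """Count concordant, discordant, and tied pairs of predicted event probabilities."""
    concordant = discordant = ties = 0
    for pe in event_probs:            # observations that had the event
        for pn in nonevent_probs:     # observations that did not
            if pe > pn:
                concordant += 1
            elif pe < pn:
                discordant += 1
            else:
                ties += 1
    return concordant, discordant, ties

def association_measures(concordant, discordant, ties, n_observations):
    """Somers' D, Goodman-Kruskal Gamma, and Kendall's Tau-a from pair counts."""
    total_pairs = concordant + discordant + ties
    diff = concordant - discordant
    all_pairs = n_observations * (n_observations - 1) / 2
    return {
        "Somers' D": diff / total_pairs,
        "Goodman-Kruskal Gamma": diff / (concordant + discordant),
        "Kendall's Tau-a": diff / all_pairs,
    }

# Pair counts and sample size (70 + 22 = 92) from the example output
measures = association_measures(1045, 461, 34, 92)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

With the counts from the example, the three values round to 0.38, 0.39, and 0.14, matching the Summary Measures in the output.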
Ordinal Logistic Regression
Data
Your data may be arranged in one of two ways: as raw data or as frequency data. See Worksheet
structure on page 2-31.
Factors, covariates, and response data can be numeric, text, or date/time. The reference level and
the reference event depend on the data type. See Reference levels for factors on page 2-30 and
Reference event for the response variable on page 2-30 for details.
The predictors may either be factors (nominal variables) or covariates (continuous variables).
Factors may be crossed or nested. Covariates may be crossed with each other or with factors, or
nested within factors.
The model can include up to 9 factors and 50 covariates. Unless you specify a predictor in the
model as a factor, the predictor is assumed to be a covariate. Model continuous predictors as
covariates and categorical predictors as factors. See How to specify the model terms on page 2-28
for more information.
MINITAB automatically omits observations with missing values from all calculations.
To do an ordinal logistic regression
1 Choose Stat > Regression > Ordinal Logistic Regression.
If you have raw response data, in Response, enter the numeric column containing the
response data.
If you have frequency data, in Response, enter the numeric column containing the
response values. Then, in Frequency, enter the variable containing the counts.
Options
Ordinal Logistic Regression dialog box
specify the link function: logit (the default), normit (also called probit), or gompit (also called
complementary log-log); see Link functions on page 2-45
specify the order of the response values or change the reference levels for the factors; see
Interpreting the parameter estimates relative to the order of response values and the reference
levels on page 2-46
specify initial values for model parameters or parameter estimates for a validation model; see
Entering initial values for parameter estimates on page 2-36
change the maximum number of iterations for reaching convergence (default is 20)
estimated model coefficients, their standard errors, and the variance-covariance matrix of
the estimated coefficients.
the log-likelihood for the last maximum likelihood iteration.
store the following event probabilities and/or aggregated data:
the number of trials for each factor/covariate pattern.
event probabilities, cumulative event probabilities, and number of occurrences for each
factor/covariate pattern. Note that the number of events (distinct response values) must be
specified to store these values.
Link functions
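MINITAB provides three link functions: logit (the default), normit (also called probit), and
gompit (also called complementary log-log). These allow you to fit a broad class of ordinal response
models. These are the inverse of the cumulative logistic distribution function (logit), the inverse
of the cumulative standard normal distribution function (normit), and the inverse of the
Gompertz distribution function (gompit). This class of models is defined by:
$g(\gamma_{ij}) = \theta_i + x_j' \beta$, $i = 1, \ldots, k-1$
where
$k$ = the number of distinct values of the response or the number of possible events
$\gamma_{ij}$ = the cumulative probability up to and including event i for the jth factor/
covariate pattern
$\theta_i$ = the constant associated with the ith event
$x_j$ = a vector of predictor variables associated with the jth factor/covariate pattern
$\beta$ = a vector of unknown coefficients associated with the predictors
The link function is the inverse of a distribution function. The link functions and their
corresponding distributions are summarized below ($\pi$ in the variance is 3.14159):
Name     Link function                                                Distribution   Mean                                              Variance
logit    $g(\gamma_{ij}) = \log_e(\gamma_{ij} / (1 - \gamma_{ij}))$    logistic       0                                                 $\pi^2 / 3$
normit   $g(\gamma_{ij}) = \Phi^{-1}(\gamma_{ij})$                     normal         0                                                 1
gompit   $g(\gamma_{ij}) = \log_e(-\log_e(1 - \gamma_{ij}))$           Gompertz       negative of Euler's constant (about -0.5772)      $\pi^2 / 6$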
You want to choose a link function that results in a good fit to your data. Goodness-of-fit statistics
can be used to compare the fits using different link functions. Certain link functions may be
used for historical reasons or because they have a special meaning in a discipline.
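
To compare how the three link functions transform a cumulative probability, you can evaluate them side by side. The following is a minimal sketch using only the Python standard library; the function names are illustrative and are not MINITAB commands.

```python
import math
from statistics import NormalDist


def logit(p: float) -> float:
    """Inverse of the cumulative logistic distribution function."""
    return math.log(p / (1.0 - p))


def normit(p: float) -> float:
    """Inverse of the cumulative standard normal distribution function (probit)."""
    return NormalDist().inv_cdf(p)


def gompit(p: float) -> float:
    """Complementary log-log: inverse of F(x) = 1 - exp(-exp(x))."""
    return math.log(-math.log(1.0 - p))


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        print(f"p={p:.1f}  logit={logit(p):+.4f}  "
              f"normit={normit(p):+.4f}  gompit={gompit(p):+.4f}")
```

The logit and normit links are symmetric about a probability of 0.5, whereas the gompit link is not, which is one reason the fits can differ.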
An advantage of the logit link function is that it provides an estimate of the odds ratios. The logit
link function is the default. For a comparison of link functions, see [19].
The odds of a reference event is the ratio of P(event) to P(not event). The estimated coefficient
of a predictor (factor or covariate) is the estimated change in the log of P(event)/P(not event)
for each unit change in the predictor, assuming the other predictors remain constant.
The estimated coefficient can also be used to calculate the odds ratio, or the ratio between two
odds. Exponentiating the parameter estimate of a factor yields the ratio of P(event)/P(not
event) for a certain factor level compared to the reference level. The odds ratios at different
values of the covariate can be constructed relative to zero. In the covariate case, it may be
more meaningful to interpret the odds and not the odds ratio. Note that a coefficient of zero or
an odds ratio of one both imply the same thing: the factor or covariate has no effect.
You can change the order of response values or the reference level in the Options subdialog box if
you wish to change how you view the parameter estimates. If your responses were coded Low,
Medium, and High, rather than 1, 2, 3, the default alphabetical ordering of the responses would
be improper. To change the order of response values, specify the new order in the Order of the
response values box. To order as Low, Medium, and High, enter these values in this order, each
enclosed in double quotes. To change the reference level for a factor, specify the factor variable
and the new reference level in the Reference factor level box. You can specify reference levels for
more than one factor at the same time. If the levels are text or date/time, enclose them in double
quotes.
Example of an ordinal logistic regression
Suppose you are a field biologist and you believe that the adult population of salamanders in the
Northeast has gotten smaller over the past few years. You would like to determine whether any
association exists between the length of time a hatched salamander survives and level of water
toxicity, as well as whether there is a regional effect. Survival time is coded as 1 if it is less than 10
days, 2 if it is equal to 10 to 30 days, and 3 if it is equal to 31 to 60 days.
1 Open the worksheet EXH_REGR.MTW.
2 Choose Stat > Regression > Ordinal Logistic Regression.
3 In Response, enter Survival. In Model, enter Region ToxicLevel. In Factors (optional), enter
Region.
4 Click Results. Choose In addition, list of factor level values, and tests for terms with more
Session window output

Response Information
Value    Count
1           15
2           46
3           12
Total       73

Factor Information
Factor   Levels   Values
Region        2   1 2

Logistic Regression Table
                                                    Odds      95% CI
Predictor        Coef   SE Coef       Z       P    Ratio   Lower   Upper
Const(1)       -7.043     1.680   -4.19   0.000
Const(2)       -3.523     1.471   -2.39   0.017
Region
 2             0.2015    0.4962    0.41   0.685     1.22    0.46    3.23
ToxicLevel    0.12129   0.03405    3.56   0.000     1.13    1.06    1.21

Log-likelihood = -59.290
Test that all slopes are zero: G = 14.713, DF = 2, P-Value = 0.001

Goodness-of-Fit Tests
Method     Chi-Square    DF       P
Pearson       122.799   122   0.463
Deviance      100.898   122   0.918

Measures of Association:
(Between the Response Variable and Predicted Probabilities)
Pairs        Number   Percent
Concordant     1127     79.3%
Discordant      288     20.3%
Ties              7      0.5%
Total          1422    100.0%

Summary Measures
Somers' D                0.59
Goodman-Kruskal Gamma    0.59
Kendall's Tau-a          0.32
A Response Information displays the number of observations that fall into each of the
response categories and the number of missing observations. The ordered response values,
from lowest to highest, are shown. Here, you use the default coding scheme, which orders the
values from lowest to highest: 1 is less than 10 days, 2 is equal to 10 to 30 days, and 3 is equal to
31 to 60 days (see Reference event for the response variable on page 2-30).
B Factor Information displays all the factors in the model, the number of levels for each
factor, and the factor level values. The factor level that has been designated as the reference
level is the first entry under Values, region 1 (see Reference levels for factors on page 2-30).
C Logistic Regression Table shows the estimated coefficients (parameter estimates), standard
error of the coefficients, z-values, and p-values. When you use the logit link function, MINITAB
displays the calculated odds ratio and a 95% confidence interval for the odds ratio.
The values labeled Const(1) and Const(2) are estimated intercepts for the logits of the
cumulative probabilities of survival for less than 10 days, and for 10 to 30 days, respectively.
Because the cumulative probability for the last response value is 1, there is no need to
estimate an intercept for 31 to 60 days.
The coefficient of 0.2015 for Region is the estimated change in the logit of the cumulative
survival time probability when the region is 2 compared to region being 1, with the
covariate ToxicLevel held constant. Because the p-value for this parameter estimate is
0.685, there is insufficient evidence to conclude that Region has an effect upon survival
time.
There is one parameter estimated for each covariate, which gives parallel lines for the
factor levels. Here, the estimated coefficient for the single covariate, ToxicLevel, is 0.121,
with a p-value of less than 0.0005. The p-value indicates that there is sufficient evidence to
conclude that the toxic level affects survival. The positive coefficient, and an odds ratio that
is greater than one, indicates that higher toxic levels tend to be associated with lower values
of survival.
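
The odds ratios and confidence limits in the Logistic Regression Table can be approximated from the coefficients and their standard errors. The sketch below assumes Wald-type limits of the form exp(coef ± 1.96 × SE). The function name is illustrative, and because the code works from the rounded values shown in the output, a result may differ from the tabulated one in the last digit.

```python
import math


def odds_ratio_with_ci(coef: float, se: float, z: float = 1.96):
    """Return (odds ratio, lower, upper) for a logit-link coefficient.

    Assumes a Wald-type interval: exp(coef - z*se) to exp(coef + z*se).
    """
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


# Coefficients and standard errors copied from the Logistic Regression Table.
estimates = {
    "Region (2 vs. 1)": (0.2015, 0.4962),
    "ToxicLevel": (0.12129, 0.03405),
}

for name, (coef, se) in estimates.items():
    ratio, lower, upper = odds_ratio_with_ci(coef, se)
    print(f"{name}: odds ratio {ratio:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

An interval that contains 1, as for Region, is consistent with the large p-value for that predictor.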
Next, MINITAB displays the last Log-Likelihood from the maximum likelihood iterations
along with the statistic G. This statistic tests the null hypothesis that all the coefficients
associated with predictors are equal to 0 versus the alternative that they are not all equal to 0. In this example, G
= 14.713 with a p-value of 0.001, indicating that there is sufficient evidence to conclude
that at least one of the coefficients is different from zero.
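
The p-value reported for G can be checked by hand. The sketch below assumes that G is referred to a chi-square distribution with DF degrees of freedom, as is usual for a likelihood-ratio statistic, and uses the closed form of the chi-square upper tail that holds when DF is even. The function name is illustrative.

```python
import math


def chi_square_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2m the tail has the closed form
    exp(-x/2) * sum_{i=0}^{m-1} (x/2)**i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even degrees of freedom")
    half = x / 2.0
    terms = sum(half**i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


p_value = chi_square_sf_even_df(14.713, 2)
print(f"G = 14.713, DF = 2, p = {p_value:.3f}")
```

For odd degrees of freedom a statistics library would be needed in place of this closed form.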
The table of concordant, discordant, and tied pairs is calculated by pairing the observations
with different response values. Here, you have fifteen 1s, forty-six 2s, and twelve 3s,
resulting in (15 × 46) + (15 × 12) + (46 × 12) = 1422 pairs of different response values. Pairs
involving the lowest coded response value (the 1-2 and 1-3 value pairs in the example) are
concordant if the cumulative probability up to the lowest response value (here 1) is greater
for the observation with the lowest value. This pattern applies to other value pairs. Pairs
involving responses coded as 2 and 3 in this example are concordant if the cumulative
probability up to 2 is greater for the observation coded as 2. The pair is discordant if the
opposite is true and tied if the cumulative probabilities are equal. In this example, 79.3% of
pairs are concordant, 20.3% are discordant, and 0.5% are ties. You can use these values as a
comparative measure of prediction (for example, when evaluating predictors and different
link functions).
Somers' D, Goodman-Kruskal Gamma, and Kendall's Tau-a are summaries of the table of
concordant and discordant pairs. The measures have the same numerator (the number of
concordant pairs minus the number of discordant pairs). The denominators are the total
number of pairs for Somers' D, the total number of pairs excluding ties for
Goodman-Kruskal Gamma, and the number of all possible observation pairs for Kendall's
Tau-a. These measures most likely lie between 0 and 1, where larger values indicate that
the model has a better predictive ability.
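
The pair count and the three summary measures follow directly from the definitions above. This short sketch recomputes them from the response counts and the concordant, discordant, and tied pair counts in the session window output; the variable names are illustrative.

```python
from itertools import combinations

# Response counts and pair counts copied from the session window output.
response_counts = [15, 46, 12]
concordant, discordant, ties = 1127, 288, 7

# Pairs of observations with different response values.
total_pairs = sum(a * b for a, b in combinations(response_counts, 2))

# All possible observation pairs, used by Kendall's Tau-a.
n = sum(response_counts)
all_pairs = n * (n - 1) // 2

numerator = concordant - discordant
somers_d = numerator / total_pairs
gamma = numerator / (total_pairs - ties)
tau_a = numerator / all_pairs

print(f"pairs with different responses: {total_pairs}")
print(f"Somers' D = {somers_d:.2f}")
print(f"Goodman-Kruskal Gamma = {gamma:.2f}")
print(f"Kendall's Tau-a = {tau_a:.2f}")
```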
Data
Your data may be arranged in one of two ways: as raw data or as frequency data. See Worksheet
structure on page 2-31.
Factors, covariates, and response data can be numeric, text, or date/time. The reference level and
the reference event depend on the data type. See Reference levels for factors on page 2-30 and
Reference event for the response variable on page 2-30 for details.
The predictors may either be factors (nominal variables) or covariates (continuous variables).
Factors may be crossed or nested. Covariates may be crossed with each other or with factors, or
nested within factors.
The model can include up to 9 factors and 50 covariates. Unless you specify a predictor in the
model as a factor, the predictor is assumed to be a covariate. Model continuous predictors as
covariates and categorical predictors as factors. See How to specify the model terms on page 2-28
for more information.
MINITAB automatically omits observations with missing values from all calculations.
Nominal Logistic Regression
If you have raw response data, in Response, enter the numeric column containing the
response data.
If you have frequency data, in Response, enter the numeric column containing the
response values. Then, in Frequency, enter the variable containing the counts.
Options
Nominal Logistic Regression dialog box
change the reference event of the response or the reference levels for the factors; see
Interpreting the parameter estimates relative to the reference event and reference levels on page
2-52
specify initial values for model parameters or parameter estimates for a validation model; see
Entering initial values for parameter estimates on page 2-36
change the maximum number of iterations for reaching convergence from the default of 20
no output
basic information on response, parameter estimates, the log-likelihood, and the test for all
slopes being zero
the default output, which includes the above output plus two goodness-of-fit tests (Pearson
and deviance)
the default output, plus factor level values and tests for terms with more than one degree of
freedom
$k$ = the number of distinct values of the response or the number of possible events
$\pi_{ij}$ = the probability of the $i$th event for the $j$th factor/covariate pattern
($\pi_{1j}$ is the probability of the reference event for the $j$th factor/covariate pattern)
$\beta_{i0}$ = the intercept for the $(i-1)$st logit function
$x_j$ = a vector of predictor variables for the $j$th factor/covariate pattern
$\beta_i$ = a vector of unknown coefficients associated with the predictors for the $(i-1)$st
logit function
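
The definitions above are consistent with the usual baseline-category logit model, in which each non-reference event is compared with the reference event. The form below is an assumption reconstructed from those definitions, writing the event probability as $\pi_{ij}$ and the intercept and coefficient vector of the $(i-1)$st logit function as $\beta_{i0}$ and $\beta_i$:

```latex
\log\!\left(\frac{\pi_{ij}}{\pi_{1j}}\right) = \beta_{i0} + x_j'\beta_i ,
\qquad i = 2, \ldots, k
```

With $k$ possible events this gives $k-1$ logit functions, one for each event other than the reference event.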
The coefficient of a predictor (factor or covariate) is the estimated change in the log of
P(response level)/P(reference event) for each unit change in the predictor, assuming the other
predictors remain constant.
The coefficient can also be used to calculate the odds ratio, or the ratio between two odds.
Exponentiating the parameter estimate of a factor yields the ratio of P(response level)/
P(reference event) for a certain factor level compared to the reference level. The odds ratios at
different values of the covariate can be constructed relative to zero. In the covariate case, it
may be more meaningful to interpret the odds and not the odds ratio. Note that a coefficient of
zero or an odds ratio of one both imply the same thing: the factor or covariate has no effect.
You can change the reference event or reference levels in the Options subdialog box if you wish
to change how you view the parameter estimates. To change the event, specify the new event
value in the Reference event box. To change the reference event, give the new level. To change
the reference level for a factor, specify the factor variable followed by the new reference level in
the Reference factor level box. You can specify reference levels for more than one factor at the
same time. If the levels are text or date/time, enclose them in double quotes.
Example of a nominal logistic regression
Suppose you are a grade school curriculum director interested in what children identify as their
favorite subject and how this subject is associated with their age or the teaching method
employed. Thirty children, 10 to 13 years old, had classroom instruction in science, math, and
language arts that employed either lecture or discussion techniques. At the end of the school
year, they were asked to identify their favorite subject. You use nominal logistic regression
because the response is categorical but possesses no implicit categorical ordering.
1 Open the worksheet EXH_REGR.MTW.
2 Choose Stat > Regression > Nominal Logistic Regression.
3 In Response, enter Subject. In Model, enter TeachingMethod Age. In Factors (optional),
enter TeachingMethod.
4 Click Results. Choose In addition, list of factor level values, tests for terms with more than
Session window output

Response Information
Variable   Value     Count
Subject    science      10   (Reference Event)
           math         11
           arts          9
           Total        30

Factor Information
Factor     Levels   Values
Teaching        2   discuss lecture

Logistic Regression Table
                                                  Odds       95% CI
Predictor       Coef   SE Coef       Z       P    Ratio   Lower    Upper
Logit 1: (math/science)
Constant      -1.123     4.564   -0.25   0.806
Teaching
 lecture     -0.5631    0.9376   -0.60   0.548     0.57    0.09     3.58
Age           0.1247    0.4011    0.31   0.756     1.13    0.52     2.49
Logit 2: (arts/science)
Constant     -13.848     7.243   -1.91   0.056
Teaching
 lecture       2.770     1.372    2.02   0.044    15.96    1.08   234.91
Age           1.0135    0.5845    1.73   0.083     2.76    0.88     8.66

Log-likelihood = -26.446
Test that all slopes are zero: G = 12.825, DF = 4, P-Value = 0.012

Goodness-of-Fit Tests
Method     Chi-Square   DF       P
Pearson         6.953   10   0.730
Deviance        7.886   10   0.640
level is the first entry under Values. Here, the default coding scheme defines the reference
level as discussion using alphabetical order.
C Logistic Regression Table shows the estimated coefficients (parameter estimates), standard
error of the coefficients, z-values, and p-values. MINITAB also displays the odds ratio and a
95% confidence interval for the odds ratio. The coefficient associated with a predictor is the
estimated change in the logit with a one unit change in the predictor, assuming that all other
factors and covariates are the same.
If there are k distinct response values, MINITAB estimates k - 1 sets of parameter estimates,
here labeled as Logit(1) and Logit(2). These are the estimated differences in log odds or
logits of math and language arts, respectively, compared to science as the reference event.
Each set contains a constant and coefficients for the factors (here, teaching method) and
the covariates (here, age). The TeachingMethod coefficient is the estimated change in the
logit when TeachingMethod is lecture compared to the teaching method being discussion,
with Age held constant. The Age coefficient is the estimated change in the logit with a one
year increase in age with teaching method held constant. These sets of parameter
estimates give nonparallel lines for the response values.
The first set of estimated logits, labeled Logit(1), are the parameter estimates of the change
in logits of math relative to the reference event, science. The p-values of 0.548 and 0.756
for TeachingMethod and Age, respectively, indicate that there is insufficient evidence to
conclude that a change in teaching method from discussion to lecture or a change in age
affected the choice of math as favorite subject compared to science.
The second set of estimated logits, labeled Logit(2), are the parameter estimates of the
change in logits of language arts relative to the reference event, science. The p-values of
0.044 and 0.083 for TeachingMethod and Age, respectively, indicate that there is sufficient
evidence, if the p-values are less than your acceptable level, to conclude that a change in
teaching method from discussion to lecture or a change in age affected the choice of
language arts as the favorite subject compared to science. The positive coefficient for
teaching method indicates students given a lecture style of teaching tend to prefer
language arts over science compared to students given a discussion style of teaching. The
estimated odds ratio of 15.96 implies that the odds of choosing language arts over science
is about 16 times higher for these students when the teaching method changes from
discussion to lecture. The positive coefficient associated with age indicates that students
tend to like language arts over science as they become older.
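
The two fitted logits can be turned into estimated probabilities for all three subjects. The sketch below is illustrative only: it assumes that the TeachingMethod factor enters the model as an indicator that is 0 for the reference level (discussion) and 1 for lecture, and it uses the rounded coefficients from the Logistic Regression Table. The function and variable names are not part of MINITAB.

```python
import math

# Coefficients copied from the Logistic Regression Table.
# Each logit compares one subject with the reference event, science.
LOGITS = {
    "math": {"constant": -1.123, "lecture": -0.5631, "age": 0.1247},
    "arts": {"constant": -13.848, "lecture": 2.770, "age": 1.0135},
}


def subject_probabilities(age: float, lecture: bool) -> dict:
    """Convert the two fitted logits into probabilities for all three subjects."""
    odds = {"science": 1.0}  # odds of the reference event against itself
    indicator = 1.0 if lecture else 0.0  # assumed 0/1 coding of the factor
    for subject, b in LOGITS.items():
        linear = b["constant"] + b["lecture"] * indicator + b["age"] * age
        odds[subject] = math.exp(linear)
    total = sum(odds.values())
    return {subject: value / total for subject, value in odds.items()}


for method, lecture in (("discussion", False), ("lecture", True)):
    probs = subject_probabilities(age=12, lecture=lecture)
    summary = ", ".join(f"{s}: {p:.3f}" for s, p in probs.items())
    print(f"age 12, {method}: {summary}")
```

Dividing the arts-to-science odds under lecture by the same odds under discussion, at any fixed age, returns the odds ratio of about 16 discussed above.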
D Next, MINITAB displays the last Log-Likelihood from the maximum likelihood iterations
along with the statistic G. G is the difference in -2 log-likelihood between a model that has only
the constant terms and the fitted model shown in the Logistic Regression Table. G is the test
statistic for testing the null hypothesis that all the coefficients associated with predictors are
equal to 0 versus the alternative that they are not all equal to 0. Here G = 12.825 with a p-value of 0.012, which
indicates that at α = 0.05 there is sufficient evidence that at least one coefficient is different
from 0.
E Goodness-of-Fit Tests displays Pearson and deviance goodness-of-fit tests. The p-value for
the Pearson test is 0.730 and the p-value for the deviance test is 0.640, indicating that there is
insufficient evidence for the model not fitting the data adequately. If the p-value is less than
your selected α level, the test would indicate sufficient evidence for an inadequate fit.
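
The statistic G described in item D can be recomputed from the output. The sketch below assumes that the constants-only model reproduces the observed proportion of each response value, so that its log-likelihood depends only on the response counts. The function name is illustrative, and the fitted log-likelihood is the rounded value from the session window output.

```python
import math


def null_log_likelihood(counts):
    """Log-likelihood of a constants-only model fitted to the response counts."""
    total = sum(counts)
    return sum(c * math.log(c / total) for c in counts if c > 0)


# Counts and fitted log-likelihood copied from the session window output.
subject_counts = [10, 11, 9]  # science, math, arts
fitted_log_likelihood = -26.446

null_ll = null_log_likelihood(subject_counts)
g_statistic = 2.0 * (fitted_log_likelihood - null_ll)

print(f"constants-only log-likelihood = {null_ll:.3f}")
print(f"G = {g_statistic:.3f}")
```

The result agrees with the reported G = 12.825 up to rounding of the log-likelihood.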
References
[1] A. Agresti (1984). Analysis of Ordinal Categorical Data, John Wiley & Sons, Inc.
[2] A. Agresti (1990). Categorical Data Analysis, John Wiley & Sons, Inc.
[3] D.A. Belsley, E. Kuh, and R.E. Welsch (1980). Regression Diagnostics, John Wiley & Sons,
Inc.
[4] A. Bhargava (1989). Missing Observations and the Use of the Durbin-Watson Statistic,
Biometrika 76, 4, pp. 828-831.
[5] C.C. Brown (1982). On a Goodness of Fit Test for the Logistic Model Based on Score
Statistics, Communications in Statistics, 11, pp. 1087-1105.
[6] D.A. Burn and T.A. Ryan, Jr. (1983). A Diagnostic Test for Lack of Fit in Regression
Models, ASA 1983 Proceedings of the Statistical Computing Section, pp. 286-290.
[7] R.D. Cook (1977). Detection of Influential Observations in Linear Regression,
Technometrics 19, pp. 15-18.
[8] R.D. Cook and S. Weisberg (1982). Residuals and Influence in Regression, Chapman and
Hall.
[9] N.R. Draper and H. Smith (1981). Applied Regression Analysis, Second Edition, John Wiley
& Sons, Inc.
[10] S.E. Fienberg (1987). The Analysis of Cross-Classified Categorical Data, The MIT Press.
[11] M.J. Garside (1971). Some Computational Procedures for the Best Subset Problem,
Applied Statistics 20, pp. 8-15.
[12] James H. Goodnight (1979). A Tutorial on the Sweep Operator, The American Statistician
33, pp. 149-158.
[13] W.W. Hauck and A. Donner (1977). Wald's Test as Applied to Hypotheses in Logit Analysis,
Journal of the American Statistical Association 72, pp. 851-853.
[14] D.C. Hoaglin and R.E. Welsch (1978). The Hat Matrix in Regression and ANOVA, The
American Statistician 32, pp. 17-22, and Corrigenda 32, p. 146.
[15] R.R. Hocking (1976). A Biometrics Invited Paper: The Analysis and Selection of Variables
in Linear Regression, Biometrics 32, pp. 1-49.
[16] D.W. Hosmer and S. Lemeshow (1989). Applied Logistic Regression, John Wiley & Sons,
Inc.
[17] LINPACK (1979). Linpack User's Guide by J.J. Dongarra, J.R. Bunch, C.B. Moler, and
G.W. Stewart, Society for Industrial and Applied Mathematics, Philadelphia, PA.
[18] J.H. Maindonald (1984). Statistical Computation, John Wiley & Sons, Inc.
[19] P. McCullagh and J.A. Nelder (1992). Generalized Linear Models, Chapman & Hall.
[20] W. Miller (1978). Performing Armchair Roundoff Analysis of Statistical Algorithms,
Communications in Statistics, pp. 243-255.
[21] D.C. Montgomery and E.A. Peck (1982). Introduction to Linear Regression Analysis, John
Wiley & Sons.
[22] J. Neter, W. Wasserman, and M. Kutner (1985). Applied Linear Statistical Models, Richard
D. Irwin, Inc.
[23] S.J. Press and S. Wilson (1978). Choosing Between Logistic Regression and Discriminant
Analysis, Journal of the American Statistical Association 73, pp. 699-705.
[24] M. Schatzoff, R. Tsao, and S. Fienberg (1968). Efficient Calculation of All Possible
Regressions, Technometrics 10, pp. 769-779.
[25] G.W. Stewart (1973). Introduction to Matrix Computations, Academic Press.
[26] R.A. Thisted (1988). Elements of Statistical Computing: Numerical Computation,
Chapman & Hall.
[27] P. Velleman and R. Welsch (1981). Efficient Computation of Regression Diagnostics,
The American Statistician 35, pp. 234-242.
[28] P.F. Velleman, J. Seaman, and I.E. Allen (1977). Evaluating Package Regression
Routines, ASA 1977 Proceedings of the Statistical Computing Section.
[29] S. Weisberg (1980). Applied Linear Regression, John Wiley & Sons, Inc.
Acknowledgments
We are very grateful for help in the design of the regression algorithm from W. Miller and from
G.W. Stewart and for useful suggestions from P.F. Velleman and S. Weisberg and many others.
Analysis of Variance
See also
Analysis of Variance Overview
One-way analysis of variance tests the equality of population means when classification is by
one variable. The classification variable, or factor, usually has three or more levels (one-way
ANOVA with two levels is equivalent to a t-test), where the level represents the treatment
applied. For example, if you conduct an experiment where you measure durability of a
product made by one of three methods, these methods constitute the levels. The one-way
procedure also allows you to examine differences among means using multiple comparisons.
Two-way analysis of variance performs an analysis of variance for testing the equality of
population means when classification of treatments is by two variables or factors. In two-way
ANOVA, the data must be balanced (all cells must have the same number of observations) and
factors must be fixed.
If you wish to specify certain factors to be random, use Balanced ANOVA if your data are
balanced; use General Linear Models if your data are unbalanced or if you wish to compare
means using multiple comparisons.
Analysis of Means
Analysis of Means (ANOM) is a graphical analog to ANOVA for the testing of the equality of
population means. ANOM [16] was developed to test main effects from a designed experiment in
which all factors are fixed. This procedure is used for one-way designs. MINITAB uses an extension
of ANOM or Analysis of Mean treatment Effects (ANOME) [23] to test the significance of mean
treatment effects for two-way designs.
ANOM can be used if you assume that the response follows a normal distribution (similar to
ANOVA) and the design is one-way or two-way. You can also use ANOM when the response
follows either a binomial or Poisson distribution.
Balanced ANOVA performs univariate (one response) analysis of variance when you have a
balanced design (though one-way designs can be unbalanced). Balanced designs are ones in
which all cells have the same number of observations. Factors can be crossed or nested, fixed
or random. You can also use General Linear Models to analyze balanced, as well as
unbalanced, designs.
General linear model (GLM) fits the general linear model for univariate responses. In
matrix form, this model is $Y = X\beta + E$, where $Y$ is the response vector, $X$ contains the
predictors, $\beta$ contains parameters to be estimated, and $E$ represents errors assumed to be
normally distributed with mean vector 0 and variance $\Sigma$. Using the general linear model, you
can perform a univariate analysis of variance with balanced and unbalanced designs, analysis
of covariance, and regression. GLM also allows you to examine differences among means
using multiple comparisons.
Fully nested ANOVA fits a fully nested (hierarchical) analysis of variance and estimates
variance components. All factors are implicitly assumed to be random.
General MANOVA is used to perform multivariate analysis of variance with either balanced
or unbalanced designs that can also include covariates. You cannot specify factors to be
random as you can for balanced MANOVA, although you can work around this restriction by
specifying the error term for testing different model terms.
The table below summarizes the differences between Balanced and General MANOVA:

                                                             Balanced   General
                                                             MANOVA     MANOVA
Can fit unbalanced data                                      no         yes
Can specify factors as random                                yes        no
Can fit covariates                                           no         yes
Can fit restricted and unrestricted forms of a mixed model   yes        no; unrestricted only

Test for equal variances performs Bartlett's (or F-test if 2 levels) and Levene's hypothesis tests
for testing the equality or homogeneity of variance. Many statistical procedures, including
ANOVA, are based upon the assumption that samples from different populations have the
same variance.
Interval plot for mean creates a plot of means with either error bars or confidence intervals
when you have a one-way design.
Main effects plot creates a main effects plot for either raw response data or fitted values from a
model-fitting procedure. The points in the plot are the means at the various levels of each
factor with a reference line drawn at the grand mean of the response data. Use the main effects
plot to compare magnitudes of marginal means.
Interactions plot creates a single interaction plot if two factors are entered, or a matrix of
interaction plots if 3 to 9 factors are entered. An interactions plot is a plot of means for each
level of a factor with the level of a second factor held constant. Interactions plots are useful for
judging the presence of interaction, which means that the difference in the response at two
levels of one factor depends upon the level of another factor. Parallel lines in an interactions
plot indicate no interaction. The greater the departure of the lines from being parallel, the
higher the degree of interaction. To use an interactions plot, data must be available from all
combinations of levels.
Use the main effects plot and the interactions plot in Chapter 19 to generate main effects plots
and interaction plots specifically for 2-level factorial designs, such as those generated by Create
Factorial Design and Create RS Design.
One-Way Analysis of Variance
Data
The response variable must be numeric. You can enter the sample data from each population
into separate columns of your worksheet (unstacked case), or you can stack the response data in
one column with another column of level values identifying the population (stacked case). In the
stacked case, the factor level column can be numeric, text, or date/time. If you wish to change
the order in which text levels are processed from their default alphabetical order, you can define
your own order. See Ordering Text Categories in the Manipulating Data chapter of MINITAB
User's Guide 1. You do not need to have the same number of observations in each level. You can
use Calc > Make Patterned Data to enter repeated factor levels. See the Generating Patterned
Data chapter in MINITAB User's Guide 1.
To perform a one-way analysis of variance with stacked data
1 Choose Stat > ANOVA > One-way.
2 In Responses (in separate columns), enter the columns containing the separate response
variables.
3 If you like, use one or more of the options described below, then click OK.
store residuals and fitted values (the means for each level).
display confidence intervals for the differences between means, using four different multiple
comparison methods: Fisher's LSD, Tukey's, Dunnett's, and Hsu's MCB (multiple
comparisons with the best). See Multiple comparisons of means on page 3-6.
draw boxplots, dotplots, and residual plots. You can draw five different residual plots:
histogram.
normal probability plot.
plot of residuals versus the fitted values ($\hat{Y}$).
plot of residuals versus data order. The row number for each data point is shown on the
x-axis (for example, 1 2 3 4 ... n).
separate plot for the residuals versus each specified column.
For a discussion of the residual plots, see Residual plots on page 2-5.
draw boxplots and dotplots that display the sample mean for each sample.
Method          Purpose                    Error rate
Fisher's LSD    pairwise differences       individual
Tukey           pairwise differences       family
Dunnett         comparison to a control    family
Hsu's MCB       comparison with the best   family
stated, resulting in conservative confidence intervals [4], [22]. The Dunnett family error rates are
exact for unequal sample sizes.
The F-test and multiple comparisons
The results of the F-test and multiple comparisons can conflict. For example, it is possible for the
F-test to reject the null hypothesis of no differences among the level means, and yet all the Tukey
pairwise confidence intervals may contain zero. Conversely, it is possible for the F-test to fail to
reject the null hypothesis, and yet have one or more of the Tukey pairwise confidence intervals
not include zero. The F-test has been used to protect against the occurrence of false positive
differences in means. However, the Tukey, Dunnett, and MCB methods have protection against
false positives built in, while the Fisher method only benefits from this protection when all means
are equal. If the use of multiple comparisons is conditioned upon the significance of the F-test,
the error rate can be higher than the error rate in the unconditioned application of multiple
comparisons [15].
See Help for computational details of the multiple comparison methods.
You design an experiment to assess the durability of four experimental carpet products. You
place a sample of each of the carpet products in four homes and you measure durability after 60
days. Because you wish to test the equality of means and to assess the differences in means, you
use the one-way ANOVA procedure (data in stacked form) with multiple comparisons.
Generally, you would choose one multiple comparison method as appropriate for your data.
However, two methods are selected here to demonstrate MINITAB's capabilities.
1 Open the worksheet EXH_AOV.MTW.
2 Choose Stat > ANOVA > One-way.
3 In Response, enter Durability. In Factor, enter Carpet.
4 Click Comparisons. Check Tukey's, family error rate and enter 10 in the text box. Check
Hsu's MCB, family error rate and enter 10 in the text box. Click OK in each dialog box.
Session window output

Level    N     Mean    StDev
1        4   14.483    3.157
2        4    9.735    3.566
3        4   12.808    1.506
4        4   17.005    5.691

Pooled StDev = 3.786        F = 2.60        P = 0.101

Hsu's MCB: intervals for level mean minus largest of other level means
(interval plot axis runs from -12.0 to 6.0)

Level     Lower    Center     Upper
1        -7.527    -2.522     2.482
2       -12.274    -7.270     0.000
3        -9.202    -4.198     0.807
4        -2.482     2.522     7.527

Tukey's pairwise comparisons: intervals for (column level mean) - (row level mean)

              1          2          3
2        -2.106
         11.601
3        -5.178     -9.926
          8.528      3.781
4        -9.376    -14.123    -11.051
          4.331     -0.417      2.656
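The F statistic and pooled standard deviation in the session output can be reproduced from the per-level summary statistics alone. The Python sketch below is not part of MINITAB; the function name one_way_f is an illustrative assumption, and the calculation uses the standard one-way ANOVA formulas with the N, Mean, and StDev values taken from the output.

```python
from math import sqrt

# Summary statistics read from the session output (four carpets, four homes each).
n = [4, 4, 4, 4]
means = [14.483, 9.735, 12.808, 17.005]
stdevs = [3.157, 3.566, 1.506, 5.691]


def one_way_f(n, means, stdevs):
    """Return (F, pooled standard deviation) from per-level summary statistics."""
    total_n = sum(n)
    k = len(n)
    grand_mean = sum(ni * m for ni, m in zip(n, means)) / total_n
    # Between-level and within-level sums of squares.
    ss_between = sum(ni * (m - grand_mean) ** 2 for ni, m in zip(n, means))
    ss_within = sum((ni - 1) * s ** 2 for ni, s in zip(n, stdevs))
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (total_n - k)
    return ms_between / ms_within, sqrt(ms_within)


f_stat, pooled_sd = one_way_f(n, means, stdevs)
print(f"F = {f_stat:.2f}, pooled StDev = {pooled_sd:.3f}")
```

Working from rounded summary values reproduces the printed F and pooled StDev to the displayed precision; the p-value would additionally require the F distribution, which this sketch does not compute.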
eliminated as a choice for the best. By the Tukey method, the mean durabilities of carpets 2
and 4 appear to differ.
Data
The response variable must be numeric and in one worksheet column. You must have a single
factor level column for each of the two factors. These can be numeric, text, or date/time. If you
wish to change the order in which text categories are processed from their default alphabetical
order, you can define your own order. See Ordering Text Categories in the Manipulating Data
chapter of MINITAB User's Guide 1. You must have a balanced design (same number of
observations in each treatment combination) with fixed and crossed factors. See Balanced
designs on page 3-18, Fixed vs. random factors on page 3-19, and Crossed vs. nested factors on
page 3-18. You can use Calc > Make Patterned Data to enter repeated factor levels. See the
Generating Patterned Data chapter in MINITAB User's Guide 1.
To perform a two-way analysis of variance
1 Choose Stat > ANOVA > Two-way.
5 If you like, use one or more of the options described below, then click OK.
Options
Two-way dialog box
print sample means and 95% confidence intervals for factor level means.
fit an additive model, that is, a model without the interaction term. In this case, the fitted value
for cell (i, j) is (mean of observations in row i) + (mean of observations in column j) − (mean of all
observations).
draw five different residual plots. You can display the following plots:
  histogram.
  normal probability plot.
  plot of residuals versus the fitted values (Ŷ).
  plot of residuals versus data order. The row number for each data point is shown on the
  x-axis, for example, 1 2 3 4 ... n.
  separate plot for the residuals versus each specified column.
For a discussion of the residual plots, see Residual plots on page 2-5.
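The additive-model fitted value (row mean plus column mean minus the mean of all observations) can be illustrated with a few lines of Python. This is a sketch only: the function name additive_fits and the small data set are hypothetical and are not taken from any MINITAB worksheet, and a balanced design is assumed.

```python
def additive_fits(cells):
    """Fitted values for an additive two-way model.

    cells maps (row_level, column_level) to the list of observations in that
    cell; a balanced design is assumed.
    """
    all_obs = [y for obs in cells.values() for y in obs]
    grand_mean = sum(all_obs) / len(all_obs)

    def marginal_mean(position, level):
        values = [y for key, obs in cells.items() if key[position] == level for y in obs]
        return sum(values) / len(values)

    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    return {
        (r, c): marginal_mean(0, r) + marginal_mean(1, c) - grand_mean
        for r in rows
        for c in cols
    }


# Hypothetical 2 x 2 layout with two observations per cell.
data = {
    ("r1", "c1"): [10, 12],
    ("r1", "c2"): [14, 16],
    ("r2", "c1"): [20, 22],
    ("r2", "c2"): [30, 32],
}
fits = additive_fits(data)
print(fits)
```

Because the interaction term is omitted, the fitted values generally differ from the cell means; here the fit for cell (r2, c2) is 29.5 while the cell mean is 31.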
You are a biologist who is studying how zooplankton live in two lakes. You set up twelve tanks in
your laboratory, six with water from each lake. You add one of three nutrient
supplements to each tank and after 30 days you count the zooplankton in a unit volume of water.
You use two-way ANOVA to test if the population means are equal, or equivalently, to test
whether there is significant evidence of interactions and main effects.
1 Open the worksheet EXH_AOV.MTW.
2 Choose Stat > ANOVA > Two-way.
3 In Response, enter Zooplankton.
4 In Row factor, enter Supplement. Check Display means.
5 In Column factor, enter Lake. Check Display means. Click OK.
Session window output

Source           F       P
Supplement    9.25   0.015
Lake          0.21   0.666
Interaction   2.71   0.145

Supplement means, in output order (individual 95% CIs plotted on an axis from 30.0 to 75.0)
Mean
43.5
68.3
39.8

Lake means (individual 95% CIs plotted on an axis from 42.0 to 60.0)
Lake       Mean
Dennison   51.8
Rose       49.2
Analysis of Means
Analysis of Means (ANOM), a graphical analog to ANOVA, tests the equality of population
means. ANOM [16] was developed to test main effects from a designed experiment in which all
factors are fixed. This procedure is used for one-factor designs. MINITAB uses an extension of
ANOM or ANalysis Of Mean treatment Effects (ANOME) [23] to test the significance of mean
treatment effects for two-factor designs.
An ANOM chart can be described in two ways: by its appearance and by its function. In
appearance, it resembles a Shewhart control chart. In function, it is similar to ANOVA for
detecting differences in population means [13]. There are some important differences between
ANOM and ANOVA, however. The hypotheses they test are not identical [16]. ANOVA tests
whether the treatment means are different from each other; ANOM tests whether the treatment
means differ from the grand mean.
For most cases, ANOVA and ANOM will likely give similar results. However, there are some
scenarios where the two methods might be expected to differ: 1) if one group of means is above
the grand mean and another group of means is below the grand mean, ANOVA's F-test might
indicate evidence for differences where ANOM might not; 2) if the mean of one group is
separated from the other means, the ANOVA F-test might not indicate evidence for differences
whereas ANOM might flag this group as being different from the grand mean. Refer to [20], [21],
[22], and [23] for an introduction to the analysis of means.
ANOM can be used if you assume that the response follows a normal distribution, similar to
ANOVA, and the design is one-way or two-way. You can also use ANOM when the response
follows either a binomial distribution or a Poisson distribution.
Data
Response data from a normal distribution
Your response data must be numeric and entered into one column. Factor columns may be
numeric, text, or date/time and may contain any values. MINITAB's capability to enter patterned
data can be helpful in entering numeric factor levels. If you wish to change the order in which
text categories are processed from their default alphabetical order, you can define your own order.
See Ordering Text Categories in the Manipulating Data chapter of MINITAB User's Guide 1. You
can use Calc > Make Patterned Data to enter repeated factor levels. See the Generating
Patterned Data chapter in MINITAB User's Guide 1.
One-way designs may be balanced or unbalanced and can have up to 100 levels. Two-way designs
must be balanced and can have up to 50 levels for each factor. All factors must be fixed. See Fixed
vs. random factors on page 3-19.
Rows with missing data are automatically omitted from calculations. If you have two factors, the
design must be balanced after omitting rows with missing values.
Response data from a binomial distribution
The response data are the numbers of defectives (or defects) found in each sample, with a
maximum of 500 samples. These data must be entered into one column.
Since the decision limits in the ANOM chart are based upon the normal distribution, one of the
assumptions that must be met when the response data are binomial is that the sample size must
be large enough to ensure that the normal approximation to the binomial is valid. A general rule
of thumb is to only use ANOM if np > 5 and n(1 − p) > 5, where n is the sample size and p is the
proportion of defectives. The second assumption is that all of the samples are the same size. See
[23] for more details.
A sample with a missing response value (*) is automatically omitted from the analysis.
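The rule of thumb for the normal approximation is easy to check before running ANOM on binomial data. In the Python sketch below, the function name normal_approx_ok and the three proportions are hypothetical; only the rule itself (np > 5 and n(1 − p) > 5) comes from the text.

```python
def normal_approx_ok(n, p):
    """Rule of thumb from the text: np > 5 and n(1 - p) > 5."""
    return n * p > 5 and n * (1 - p) > 5


# Hypothetical proportions of defectives for samples of size 80.
checks = {p: normal_approx_ok(80, p) for p in (0.02, 0.10, 0.95)}
for p, ok in checks.items():
    print(f"n = 80, p = {p}: normal approximation reasonable? {ok}")
```

With samples of size 80, a proportion of 0.02 or 0.95 fails the check because one of the two expected counts falls below 5.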
4 If you like, use one or more of the options described below, then click OK.
Options
change the experiment-wide error rate, or alpha level (default is 0.05). This will change the
location of the decision lines on the graph.
print a summary table of level statistics for normal (prints means, standard errors, sample size)
or binomial data (prints number, proportion of defectives).
You perform an experiment to assess the effect of three process time levels and three strength
levels on density. You use analysis of means for normal data and a two-way design to identify any
significant interactions or main effects.
1 Open the worksheet EXH_AOV.MTW.
2 Choose Stat > ANOVA > Analysis of Means.
3 In Response, enter Density.
4 Choose Normal.
5 In Factor 1, enter Minutes. In Factor 2, enter Strength. Click OK.
Graph window output [graph not reproduced]
You count the number of rejected welds from samples of size 80 in order to identify samples
whose proportions of rejects are out of line with the other samples. Because the data are
binomial (two possible outcomes, constant proportion of success, and independent samples) you
use analysis of means for binomial data.
1 Open the worksheet EXH_AOV.MTW.
2 Choose Stat > ANOVA > Analysis of Means.
3 In Response, enter WeldRejects.
4 Choose Binomial and enter 80 in Sample size. Click OK.
Graph window output [graph not reproduced]
can fit univariate models to balanced data with up to 31 factors. Here are some of the other
options:
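                                              Balanced ANOVA   GLM
Can fit unbalanced data                       no               yes
Can specify factors as random and obtain
  expected mean squares                       yes              yes
Fits covariates                               no               yes
Performs multiple comparisons                 no               yes
Fits restricted/unrestricted forms of
  mixed model                                 yes              unrestricted only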
You can use balanced ANOVA to analyze data from balanced designs; see Balanced designs on
page 3-18. You can use GLM to analyze data from any balanced design, though you cannot
choose to fit the restricted case of the mixed model, which only balanced ANOVA can fit; see
Restricted and unrestricted form of mixed models on page 3-26.
To determine how to classify your variables, see Crossed vs. nested factors on page 3-18, Fixed vs.
random factors on page 3-19, and Covariates on page 3-19.
For information on how to specify the model, see Specifying the model terms on page 3-19,
Specifying terms involving covariates on page 3-20, Specifying reduced models on page 3-21, and
Specifying models for some specialized designs on page 3-22.
For easy entering of repeated factor levels into your worksheet, see Using patterned data to set up
factor levels on page 3-24.
Balanced designs
Your design must be balanced to use balanced ANOVA, with the exception of a one-way design. A
balanced design is one with equal numbers of observations at each combination of your treatment
levels. A quick test to see whether or not you have a balanced design is to use Stat > Tables >
Cross Tabulation. Enter your classification variables and see if you have equal numbers of
observations in each cell, indicating balanced data.
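The same cross-tabulation check can be sketched outside MINITAB. The Python below is an illustration under assumptions: the function names cell_counts and is_balanced are hypothetical, the factor columns are plain lists, and the factors are assumed to be fully crossed (every combination of levels is expected to occur).

```python
from collections import Counter
from itertools import product


def cell_counts(*factor_columns):
    """Count the observations in each cell (combination of factor levels)."""
    return Counter(zip(*factor_columns))


def is_balanced(*factor_columns):
    """True if every combination of levels occurs equally often and none is empty."""
    counts = cell_counts(*factor_columns)
    levels = [sorted(set(column)) for column in factor_columns]
    sizes = {counts.get(cell, 0) for cell in product(*levels)}
    return len(sizes) == 1 and 0 not in sizes


# Hypothetical factor columns for a 2 x 2 crossed design, one observation per cell.
factor_a = ["lo", "lo", "hi", "hi"]
factor_b = ["x", "y", "x", "y"]
print(cell_counts(factor_a, factor_b))
print(is_balanced(factor_a, factor_b))           # every cell has one observation
print(is_balanced(factor_a[:3], factor_b[:3]))   # cell (hi, y) is now empty
```

Nested factors would need a different check, because not every combination of levels is expected to occur in a nested layout.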
In general, if each level of factor A occurs with each level of factor B, factors A and B are crossed.
If each level of factor B occurs within only one level of factor A, then factor B is nested within
factor A. The designation of whether a factor is crossed or nested within MINITAB occurs with the
specification of the model. See Specifying the model terms on page 3-19. It is important to make the
correct designation in order to obtain the correct error term for factors.
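The crossed and nested definitions can be tested directly against the factor columns of a data set. The Python sketch below is illustrative only; the function name classify and the example columns are hypothetical, and the test applies the two definitions literally to the level combinations that actually occur.

```python
def classify(a_levels, b_levels):
    """Classify factor B relative to factor A from two parallel columns."""
    pairs = set(zip(a_levels, b_levels))
    a_set = {a for a, _ in pairs}
    b_set = {b for _, b in pairs}
    # Crossed: each level of A occurs with each level of B.
    if len(pairs) == len(a_set) * len(b_set):
        return "crossed"
    # Nested: each level of B occurs within only one level of A.
    parents = {}
    for a, b in pairs:
        parents.setdefault(b, set()).add(a)
    if all(len(p) == 1 for p in parents.values()):
        return "B nested within A"
    return "neither"


# Hypothetical columns.
crossed = classify([1, 1, 2, 2], ["x", "y", "x", "y"])
nested = classify([1, 1, 2, 2], ["b1", "b2", "b3", "b4"])
print(crossed, "|", nested)
```

If factor A has a single level, both definitions hold and this sketch reports "crossed"; the classification that matters for the analysis is the one you declare when specifying the model.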
Covariates
A covariate is a quantitative variable included in an ANOVA model. A covariate may be a
variable for which the level is not controlled as part of the design, but has been measured and it
is entered into the model to reduce the error variance. A covariate may also be a quantitative
variable for which the levels have been controlled as part of the experiment. Regardless of the
origin, the statistical model contains a coefficient for the covariate as if the covariate was a
predictor in a regression model.
textbooks. Here are some examples of statistical models and the terms to enter in Model. A, B,
and C represent factors.
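Case: Factors A, B crossed
  Statistical model: $y_{ijk} = \mu + a_i + b_j + ab_{ij} + e_{k(ij)}$
  Terms in model: A B A*B
Case: Factors A, B, C crossed
  Statistical model: $y_{ijkl} = \mu + a_i + b_j + c_k + ab_{ij} + ac_{ik} + bc_{jk} + abc_{ijk} + e_{l(ijk)}$
  Terms in model: A B C A*B A*C B*C A*B*C
Case: 3 factors nested (B within A, C within A and B)
  Statistical model: $y_{ijkl} = \mu + a_i + b_{j(i)} + c_{k(ij)} + e_{l(ijk)}$
  Terms in model: A B(A) C(A B)
Case: B nested within A, both crossed with C
  Statistical model: $y_{ijkl} = \mu + a_i + b_{j(i)} + c_k + ac_{ik} + bc_{jk(i)} + e_{l(ijk)}$
  Terms in model: A B(A) C A*C B*C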
In MINITAB's models you omit the subscripts, μ, e, and +'s that appear in textbook models. An * is
used for an interaction term and parentheses are used for nesting. For example, when B is nested
within A, you enter B (A), and when C is nested within both A and B, you enter C (A B). Enter
B(A) C(B) for the case of 3 sequentially nested factors. Terms in parentheses are always factors in
the model and are listed with blanks between them. Thus, D*F (A B E) is correct but D*F (A*B
E) and D (A*B*C) are not. Also, one set of parentheses cannot be used inside another set. Thus,
C (A B) is correct but C (A B (A)) is not. An interaction term between a nested factor and the
factor it is nested within is invalid.
See Specifying terms involving covariates on page 3-20 for details on specifying models with
covariates.
Several special rules apply to naming columns. You may omit the quotes around variable names.
Because of this, variable names must start with a letter and contain only letters and numbers.
Alternatively, you can use C notation (C1, C2, etc.) to denote data columns. You can use special
symbols in a variable name, but then you must enclose the name in single quotes.
You can specify multiple responses. In this case, a separate analysis of variance will be performed
for each response.
in the model. The default adjusted sums of squares (sums of squares with all other terms in the
model), however, will be the same, regardless of model order.
GLM allows terms containing covariates crossed with each other and with factors, and covariates
nested within factors. Here are some examples of these models, where A is a factor.
Case                              Covariates   Terms in model
covariate crossed with factor     X            A X A*X
same as previous                  X            A|X
covariate crossed with itself     X            A X X*X
two covariates crossed            X Z          A X Z X*X Z*Z X*Z
covariate nested within factor    X            A X(A)

Long form                     Short form
A B C A*B A*C B*C A*B*C       A|B|C
A B C A*B A*C B*C             A|B|C - A*B*C
A B C B*C E                   A B|C E
A B(A) C A*C B*C              A|B(A)|C
In general, all crossings are done for factors separated by bars unless the cross results in an illegal
term. For example, in the last example, the potential term A*B(A) is illegal and MINITAB
automatically omits it. If a factor is nested, you must indicate this when using the vertical bar, as
in the last example with the term B(A).
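The vertical-bar shorthand amounts to listing every product of the factors joined by bars. The Python sketch below only illustrates that expansion and is not MINITAB's parser: the function names expand_bar and expand_model are hypothetical, and nested terms such as B(A) and the minus notation for removing terms are deliberately not handled.

```python
from itertools import combinations


def expand_bar(factors):
    """Expand factors joined by bars, e.g. ["A", "B", "C"], into all crossings."""
    terms = []
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            terms.append("*".join(combo))
    return terms


def expand_model(model):
    """Expand each blank-separated token; tokens without a bar pass through unchanged."""
    terms = []
    for token in model.split():
        for term in expand_bar(token.split("|")):
            if term not in terms:
                terms.append(term)
    return terms


print(" ".join(expand_model("A|B|C")))    # A B C A*B A*C B*C A*B*C
print(" ".join(expand_model("A B|C E")))  # A B C B*C E
```

Terms are generated in order of increasing size, which matches the order in which the long forms are written in this section.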
is the proper term for testing factor B and that the remaining error (which is Block*A*B) will be
used for testing A*B. However, it is often assumed that the Block*B and Block*A*B interactions
do not exist and these are then lumped together into error [6]. You might also pool the two terms
if the mean square for Block*B is small relative to Block*A*B. If you don't pool, enter Block A
Block*A B Block*B A*B in Model and what is labeled as Error is really Block*A*B. If you do
pool terms, enter Block A Block*A B A*B in Model and what is labeled as Error is the set of
pooled terms. In both cases enter Block in Random Factors.
Latin square with repeated measures design
A repeated measures design is a design where repeated measurements are made on the same
subject. There are a number of ways in which treatments can be assigned to subjects. With living
subjects especially, systematic differences (due to learning, acclimation, resistance, etc.) between
successive observations may be suspected. One common way to assign treatments to subjects is
to use a Latin square design. An advantage of this design for a repeated measures experiment is
that it ensures a balanced fraction of a complete factorial (i.e. all treatment combinations
represented) when subjects are limited and the sequence effect of treatment can be considered
to be negligible.
A Latin square design is a blocking design with two orthogonal blocking variables. In an
agricultural experiment there might be perpendicular gradients that might lead you to choose
this design. For a repeated measures experiment, one blocking variable is the group of subjects
and the other is time. If the treatment factor B has three levels, b1, b2, and b3, then one of twelve
possible Latin square randomizations of the levels of B to subject groups over time is:
           Time 1   Time 2   Time 3
Group 1    b2       b3       b1
Group 2    b3       b1       b2
Group 3    b1       b2       b3
The subjects receive the treatment levels in the order specified across the row. In this example,
group 1 subjects would receive the treatment levels in order b2, b3, b1. The interval between
administering treatments should be chosen to minimize carryover effect of the previous
treatment.
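The defining property of a Latin square, that each treatment level appears exactly once in every row (group) and every column (time period), can be verified mechanically. The Python sketch below checks the arrangement shown above; the function name is_latin_square is an illustrative assumption.

```python
def is_latin_square(square):
    """True if every symbol appears exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    # Every row must be a permutation of the same n symbols.
    if any(len(row) != n or set(row) != symbols for row in square):
        return False
    # Every column must contain each symbol as well.
    return all({row[j] for row in square} == symbols for j in range(n))


# Rows are groups 1 to 3, columns are time periods 1 to 3.
design = [
    ["b2", "b3", "b1"],
    ["b3", "b1", "b2"],
    ["b1", "b2", "b3"],
]
print(is_latin_square(design))
```

Swapping two entries within a single row keeps that row valid but breaks two columns, so the check then fails.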
This design is commonly modified to provide information on one or more additional factors. If
each group was assigned a different level of factor A, then information on the A and A*B effects
could be made available with minimal effort if an assumption about the sequence effect given to
the groups can be made. If the sequence effects are negligible compared to the effects of factor
A, then the group effect could be attributed to factor A. If interactions with time are negligible,
then partial information on the A*B interaction may be obtained [27]. In the language of
repeated measures designs, factor A is called a between-subjects factor and factor B a
within-subjects factor.
Let's consider how to enter the model terms into MINITAB. If the group or A factor, subject, and
time variables were named A, Subject, and Time, respectively, enter A Subject(A) Time B A*B in
Model and enter Subject in Random Factors.
Balanced ANOVA
Use Balanced ANOVA to perform univariate analysis of variance for each response variable.
Your design must be balanced, with the exception of one-way designs. Balanced means that all
treatment combinations (cells) must have the same number of observations. See Balanced designs
on page 3-18. Use General Linear Model (page 3-35) to analyze balanced and unbalanced
designs.
Factors may be crossed or nested, fixed or random. See Crossed vs. nested factors on page 3-18 and
Fixed vs. random factors on page 3-19. You may include up to 50 response variables and up to 31
factors at one time.
Data
You need one column for each response variable and one column for each factor, with each row
representing an observation. Regardless of whether factors are crossed or nested, use the same
form for the data. Factor columns may be numeric, text, or date/time. If you wish to change the
order in which text categories are processed from their default alphabetical order, you can define
your own order. See Ordering Text Categories in the Manipulating Data chapter in MINITAB
User's Guide 1.
Balanced data are required except for one-way designs. The requirement for balanced data
extends to nested factors as well. Suppose A has 3 levels, and B is nested within A. If B has 4 levels
within the first level of A, B must have 4 levels within the second and third levels of A. MINITAB
will tell you if you have unbalanced nesting. In addition, the subscripts used to indicate the 4
levels of B within each level of A must be the same. Thus, the four levels of B cannot be (1 2 3 4)
in level 1 of A, (5 6 7 8) in level 2 of A, and (9 10 11 12) in level 3 of A. However, you can use
GLM to analyze data coded in this way.
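If you prefer to keep using balanced ANOVA, the nested factor can be recoded so that the same labels are reused within each level of the nesting factor. The Python sketch below illustrates one way to do that outside MINITAB; the function name relabel_nested is hypothetical, and levels are numbered in order of first appearance within each level of A.

```python
def relabel_nested(a_levels, b_levels):
    """Recode B within each level of A as 1, 2, 3, ... in order of first appearance."""
    seen = {}
    recoded = []
    for a, b in zip(a_levels, b_levels):
        mapping = seen.setdefault(a, {})
        if b not in mapping:
            mapping[b] = len(mapping) + 1
        recoded.append(mapping[b])
    return recoded


# B is coded 1 to 12 across three levels of A, as in the example in the text.
a = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
recoded_b = relabel_nested(a, b)
print(recoded_b)
```

After recoding, B takes the values 1 to 4 within every level of A, which is the form the text describes as required.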
If any response or factor column specified contains missing data, that entire observation (row) is
excluded from all computations. The requirement that data be balanced must be preserved after
missing data are omitted. If an observation is missing for one response variable, that row is
eliminated for all responses. If you want to eliminate missing rows separately for each response,
perform balanced ANOVA separately for each response.
To perform a balanced ANOVA
1 Choose Stat > ANOVA > Balanced ANOVA.
Options
Balanced Analysis of Variance dialog box
specify which factors are random factors; see Fixed vs. random factors on page 3-19.
use the restricted form of the mixed models (both fixed and random effects). The restricted
model forces mixed interaction effects to sum to zero over the fixed effects. By default,
MINITAB fits the unrestricted model. See Restricted and unrestricted form of mixed models on
page 3-26.
draw five different residual plots. You can display the following plots:
  histogram.
  normal probability plot.
  plot of residuals versus the fitted values (Ŷ).
  plot of residuals versus data order. The row number for each data point is shown on the
  x-axis, for example, 1 2 3 4 ... n.
  separate plot for the residuals versus each specified column.
For a discussion of the residual plots, see Residual plots on page 2-5.
display expected mean squares, estimated variance components, and error terms used in each
F-test. See Expected mean squares on page 3-27.
display a table of means corresponding to specified terms from the model. For example, if you
specify A B D A*B*D, four tables of means will be printed, one for each main effect, A, B, D,
and one for the three-way interaction, A*B*D.
store the fits and residuals separately for each response. If you fit a full model, fits are cell
means. If you fit a reduced model, fits are least squares estimates. See Specifying reduced
models on page 3-21.
An experiment was conducted to test how long it takes to use a new and an older model of
calculator. Six engineers each work on both a statistical problem and an engineering problem
using each calculator model; the time in minutes to solve the problem is recorded. The
engineers can be considered as blocks in the experimental design. There are two factors, type of
problem and calculator model, each with two levels. Because each level of one factor occurs in
combination with each level of the other factor, these factors are crossed. The example and data
are from Neter, Wasserman, and Kutner [18], page 936.
1 Open the worksheet EXH_AOV.MTW.
2 Choose Stat > ANOVA > Balanced ANOVA.
3 In Responses, enter SolveTime.
4 In Model, type Engineer ProbType | Calculator.
5 In Random Factors, enter Engineer.
6 Click Results. In Display means corresponding to the terms, type
Session window output

ANOVA: SolveTime versus Engineer, ProbType, Calculator

Factor     Type    Levels  Values
Engineer   random       6  Adams Dixon Erickson Jones Maynes Williams
ProbType   fixed        2  Eng Stat
Calculat   fixed        2  New Old

Source               DF       SS       MS        F      P
Engineer              5    1.053    0.211     3.13  0.039
ProbType              1   16.667   16.667   247.52  0.000
Calculat              1   72.107   72.107  1070.89  0.000
ProbType*Calculat     1    3.682    3.682    54.68  0.000
Error                15    1.010    0.067
Total                23   94.518

Means

ProbType    N  SolveTim
Eng        12    3.8250
Stat       12    5.4917

Calculat    N  SolveTim
New        12    2.9250
Old        12    6.3917

ProbType  Calculat   N  SolveTim
Eng       New        6    2.4833
Eng       Old        6    5.1667
Stat      New        6    3.3667
Stat      Old        6    7.6167
The following example contains data from Winer [27], p. 546, to illustrate a complex repeated
measures model. An experiment was run to see how several factors affect subject accuracy in
adjusting dials. Three subjects perform tests conducted at one of two noise levels. At each of
three time periods, the subjects monitored three different dials and made adjustments as needed.
The response is an accuracy score. The noise, time, and dial factors are crossed, fixed factors.
Subject is a random factor, nested within noise. Noise is a between-subjects factor, time (variable
ETime) and dial are within-subjects factors.
The model terms are entered in a certain order so that the error terms used for the fixed factors
are just below the terms for whose effects they test. (With a single random factor, the interaction
of a fixed factor with the random factor becomes the error term for that fixed effect.) Because
Subject was specified as Subject(Noise) the first time, you do not need to repeat (Noise) in the
interactions involving Subject. The interaction ETime*Dial*Subject, the error term for
ETime*Dial, is not entered in the model because there would be zero degrees of freedom left
over for error. By not entering ETime*Dial*Subject in the model, it is labeled as Error and you
have the error term that is needed.
1 Open the worksheet EXH_AOV.MTW.
2 Choose Stat > ANOVA > Balanced ANOVA.
3 In Responses, enter Score.
4 In Model, enter Noise Subject(Noise) ETime Noise*ETime ETime*Subject Dial
box.
Session window output

Source               DF        SS        MS       F      P
Noise                 1    468.17    468.17    0.75  0.435
Subject(Noise)        4   2491.11    622.78   78.39  0.000
ETime                 2   3722.33   1861.17   63.39  0.000
Noise*ETime           2    333.00    166.50    5.67  0.029
ETime*Subject         8    234.89     29.36    3.70  0.013
Dial                  2   2370.33   1185.17   89.82  0.000
Noise*Dial            2     50.33     25.17    1.91  0.210
Dial*Subject          8    105.56     13.19    1.66  0.184
ETime*Dial            4     10.67      2.67    0.34  0.850
Noise*ETime*Dial      4     11.33      2.83    0.36  0.836
Error                16    127.11      7.94
Total                53   9924.83
repeated measures model can detect smaller differences in means within subjects as compared
to between subjects.
Of the four interactions among fixed factors, the noise by time interaction was the only one with
a low p-value (0.029). This implies that there is significant evidence for judging that a subject's
sensitivity to noise changed over time. Because this interaction is significant, at least at α = 0.05,
the noise and time main effects are not examined. There is also significant evidence for a dial
effect (p-value < 0.0005). Among random terms, there is significant evidence for time by subject
(p-value = 0.013) and subject (p-value < 0.0005) effects.
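The choice of error term can be checked by dividing mean squares by hand. The Python sketch below recomputes four of the F ratios from sums of squares and degrees of freedom in the session output. It rests on an assumption: the rows of the output are taken to follow the order of the model terms, so the pairing of values with term names is a reconstruction. The dictionary names are hypothetical.

```python
# Sums of squares and degrees of freedom, paired with terms under the stated assumption.
ss = {
    "Noise": 468.17, "Subject(Noise)": 2491.11, "ETime": 3722.33,
    "Noise*ETime": 333.00, "ETime*Subject": 234.89,
    "ETime*Dial": 10.67, "Error": 127.11,
}
df = {
    "Noise": 1, "Subject(Noise)": 4, "ETime": 2,
    "Noise*ETime": 2, "ETime*Subject": 8,
    "ETime*Dial": 4, "Error": 16,
}

# Each fixed term is tested against the random term listed just below it in the model.
error_term = {
    "Noise": "Subject(Noise)",
    "ETime": "ETime*Subject",
    "Noise*ETime": "ETime*Subject",
    "ETime*Dial": "Error",  # Error here is really ETime*Dial*Subject
}

ms = {term: ss[term] / df[term] for term in ss}
f_ratios = {term: ms[term] / ms[err] for term, err in error_term.items()}

for term, f in f_ratios.items():
    print(f"{term:12s} F = {f:6.2f}  (error term: {error_term[term]})")
```

The within-subjects terms are tested against much smaller mean squares than the between-subjects term Noise, which is why smaller differences can be detected within subjects.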
Example of both restricted and unrestricted forms of the mixed model
A company ran an experiment to see how several conditions affect the thickness of a coating
substance that it manufactures. The experiment was run at two different times, in the morning
and in the afternoon. Three operators were chosen from a large pool of operators employed by
the company. The manufacturing process was run at three settings, 35, 44, and 52. Two
determinations of thickness were made by each operator at each time and setting. Thus, the
three factors are crossed. One factor, operator, is random; the other two, time and setting, are
fixed.
The statistical model is
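\[ Y_{ijkl} = \mu + T_i + O_j + S_k + TO_{ij} + TS_{ik} + OS_{jk} + TOS_{ijk} + e_{ijkl}, \]
where $T_i$ is the time effect, $O_j$ is the operator effect, and $S_k$ is the setting effect, and $TO_{ij}$,
$TS_{ik}$, $OS_{jk}$, and $TOS_{ijk}$ are the interaction effects.
Operator, all interactions with operator, and error are random. The random terms are:
\[ O_j, \quad TO_{ij}, \quad OS_{jk}, \quad TOS_{ijk}, \quad e_{ijkl} \]
These terms are all assumed to be normally distributed random variables with mean zero and
variances given by
\[ \mathrm{var}(O_j) = V(O), \quad \mathrm{var}(TO_{ij}) = V(TO), \quad \mathrm{var}(OS_{jk}) = V(OS), \]
\[ \mathrm{var}(TOS_{ijk}) = V(TOS), \quad \mathrm{var}(e_{ijkl}) = V(e) = \sigma^2 \]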
These variances are called variance components. The output from expected mean squares
contains estimates of these variances.
In the unrestricted model, all these random variables are independent. The remaining terms in
this model are fixed.
In the restricted model, any term which contains one or more subscripts corresponding to fixed
factors is required to sum to zero over each fixed subscript. In the example, this means
\[ \sum_i T_i = 0, \qquad \sum_k S_k = 0, \]
\[ \sum_i TO_{ij} = 0, \qquad \sum_k OS_{jk} = 0, \]
\[ \sum_i TS_{ik} = \sum_k TS_{ik} = 0, \]
\[ \sum_i TOS_{ijk} = \sum_k TOS_{ijk} = 0. \]
Your choice of model does not affect the sums of squares, degrees of freedom, mean squares, or
marginal and cell means. However, it does affect the expected mean squares, error term for the
F-tests, and the estimated variance components.
Step 1: Fit the restricted form of the model
1 Open the worksheet EXH_AOV.MTW.
2 Choose Stat > ANOVA > Balanced ANOVA.
3 In Responses, enter Thickness.
4 In Model, type Time | Operator | Setting.
5 In Random Factors, enter Operator.
6 Click Options. Check Use the restricted form of the model. Click OK.
7 Click Results. Check Display expected mean squares and variance components.
8 Click OK in each dialog box.
Session window output (restricted form)

Source                   DF        SS       MS       F      P
Time                      1       9.0      9.0    0.29  0.644
Operator                  2    1120.9    560.4  165.38  0.000
Setting                   2   15676.4   7838.2   73.18  0.001
Time*Operator             2      62.0     31.0    9.15  0.002
Time*Setting              2     114.5     57.3    2.39  0.208
Operator*Setting          4     428.4    107.1   31.61  0.000
Time*Operator*Setting     4      96.0     24.0    7.08  0.001
Error                    18      61.0      3.4
Total                    35   17568.2
Session window output (unrestricted form)

Source                   DF        SS       MS      F      P
Time                      1       9.0      9.0   0.29  0.644
Operator                  2    1120.9    560.4   4.91  0.090 x
Setting                   2   15676.4   7838.2  73.18  0.001
Time*Operator             2      62.0     31.0   1.29  0.369
Time*Setting              2     114.5     57.3   2.39  0.208
Operator*Setting          4     428.4    107.1   4.46  0.088
Time*Operator*Setting     4      96.0     24.0   7.08  0.001
Error                    18      61.0      3.4
Total                    35   17568.2

* Synthesized Test.

Error Terms for Synthesized Tests

Source        Error DF   Error MS   Synthesis of Error MS
2 Operator        3.73      114.1   (4) + (6) - (7)
test. If an interaction is significant, any lower order interactions and main effects involving terms
of the significant interaction are not considered meaningful.
Let's examine where these models give different output. The Operator*Setting F-test is different,
because the error terms are Error in the restricted case and Time*Operator*Setting in the
unrestricted case, giving p-values of < 0.0005 and 0.088, respectively. Likewise, the
Time*Operator F-test differs for the same reason, giving p-values of 0.002 and 0.369 for the
restricted and unrestricted cases, respectively. The estimated variance components for Operator,
Time*Operator, and Operator*Setting also differ.
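As a rough check of the comparison above, the following sketch recomputes the two F ratios that change between the restricted and unrestricted forms. The sums of squares and degrees of freedom are taken from the example output; the function names are illustrative only. Small differences from the printed values come from rounding in the printed sums of squares.

```python
# Sums of squares and degrees of freedom from the example output.
ss = {"Time*Operator": 62.0, "Operator*Setting": 428.4,
      "Time*Operator*Setting": 96.0, "Error": 61.0}
df = {"Time*Operator": 2, "Operator*Setting": 4,
      "Time*Operator*Setting": 4, "Error": 18}

def mean_square(term):
    return ss[term] / df[term]

def f_ratio(term, error_term):
    """F statistic for `term` tested against `error_term`."""
    return mean_square(term) / mean_square(error_term)

terms = ("Time*Operator", "Operator*Setting")
# Restricted form: both terms are tested against Error.
restricted = {t: f_ratio(t, "Error") for t in terms}
# Unrestricted form: both terms are tested against Time*Operator*Setting.
unrestricted = {t: f_ratio(t, "Time*Operator*Setting") for t in terms}

for t in terms:
    print(f"{t}: restricted F = {restricted[t]:.2f}, "
          f"unrestricted F = {unrestricted[t]:.2f}")
```

The numerators are identical in both forms; only the denominator changes, which is why the sums of squares and mean squares agree while the F-tests and p-values do not.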
Data
Set up your worksheet in the same manner as with balanced ANOVA: one column for each
response variable, one column for each factor, and one column for each covariate, so that there
is one row for each observation. The factor columns may be numeric, text, or date/time. If you
wish to change the order in which text categories are processed from their default alphabetical
order, you can define your own order. See Ordering Text Categories in the Manipulating Data
chapter in MINITAB User's Guide 1.
Although models can be unbalanced in GLM, they must be full rank, that is, there must be
enough data to estimate all the terms in your model. For example, suppose you have a two-factor
crossed model with one empty cell. Then you can fit the model with terms A B, but not A B A*B.
MINITAB will tell you if your model is not full rank. In most cases, eliminating some of the high
order interactions in your model (assuming, of course, they are not important) can solve this
problem.
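The full rank requirement can be illustrated with a small sketch. It builds the design matrix for a hypothetical 2 x 2 crossed layout in which cell (2, 2) is empty, using +1/-1 coding for each two-level factor (an assumption made for illustration), and compares the rank of the matrix with its number of columns. The helper names are not part of MINITAB.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix by Gauss-Jordan elimination with exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def code(level):
    """Code a two-level factor as +1 (level 1) or -1 (level 2)."""
    return 1 if level == 1 else -1

# Observed cells of a 2 x 2 layout; cell (2, 2) has no data.
cells = [(1, 1), (1, 2), (2, 1)]

# Columns: constant, A, B.
main_effects = [[1, code(a), code(b)] for a, b in cells]
# Columns: constant, A, B, A*B.
with_interaction = [[1, code(a), code(b), code(a) * code(b)] for a, b in cells]

for name, design in (("A B", main_effects), ("A B A*B", with_interaction)):
    full = matrix_rank(design) == len(design[0])
    print(f"model {name}: rank {matrix_rank(design)} of {len(design[0])} columns, "
          f"full rank = {full}")
```

With three occupied cells there are only three distinct rows, so a model with four columns cannot be full rank, while the main effects model can.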
Nesting does not need to be balanced. A nested factor must have at least 2 levels at some level of
the nesting factor. If factor B is nested within factor A, there can be unequal levels of B within
each level of A. In addition, the subscripts used to identify the B levels can differ within each
level of A. This means, for example, that the B levels can be (1 2 3 4) in level 1 of A, (5 6 7 8) in
level 2 of A, and (9 10 11 12) in level 3 of A.
General Linear Model
If any response, factor, or covariate column contains missing data, that entire observation (row) is
excluded from all computations. If you want to eliminate missing rows separately for each
response, perform GLM separately for each response.
To perform an analysis using general linear model
1 Choose Stat > ANOVA > General Linear Model.
Options
General Linear Model dialog box
- specify which factors are random factors; see Fixed vs. random factors on page 3-19.
- select adjusted (Type III) or sequential (Type I) sums of squares for calculations. See Adjusted
  vs. sequential sums of squares on page 3-40.
- perform multiple comparison of treatment means with the mean of a control level. You can
  also choose
- draw five different residual plots for regular, standardized, or deleted residuals; see Choosing
  a residual type on page 2-5. Available residual plots include a
    - histogram.
    - normal probability plot.
    - plot of residuals versus the fitted values (Ŷ).
    - plot of residuals versus data order. The row number for each data point is shown on the
      x-axis, for example, 1 2 3 4 ... n.
    - separate plot for the residuals versus each specified column.
  For a discussion of the residual plots, see Residual plots on page 2-5.
- display expected mean squares, estimated variance components, and error terms used in
  each F-test; see Expected mean squares on page 3-27.
- display the adjusted or least squares means (fitted values) corresponding to specified terms
  from the model.
- store coefficients for the model, in separate columns for each response.
- store fits and regular, standardized, and deleted residuals separately for each response; see
  Choosing a residual type on page 2-5.
- store leverages, Cook's distances, and DFITS, for identifying outliers; see Identifying outliers
  on page 2-9.
- store the design matrix. The design matrix multiplied by the coefficients will yield the fitted
  values. See Design matrix used by General Linear Model on page 3-41.
- enter factors to construct a main effects plot; see Main Effects Plot on page 3-64.
less than the stated one. In comparing larger numbers of means, there is no proof that the Tukey
method is conservative for the general linear model. The Dunnett method uses a factor analytic
method to approximate the probabilities of the comparisons. Because it uses the factor analytic
approximation, the Dunnett method is not generally conservative. The Bonferroni and Sidak
methods are conservative methods based upon probability inequalities. The Sidak method is
slightly less conservative than the Bonferroni method.
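Some characteristics of the multiple comparison methods are summarized below:

Comparison method   Properties
Dunnett             not generally conservative (uses a factor analytic approximation)
Tukey               not proven conservative when comparing larger numbers of means
Bonferroni          most conservative
Sidak               conservative, but slightly less so than Bonferroni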
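To see why the Sidak method is slightly less conservative than the Bonferroni method, compare the per-comparison error rates that each allows for a given family error rate. The sketch below uses the standard textbook forms of the two adjustments; the guide does not print these formulas, so treat the function names and the choice of six comparisons (all pairs among four levels) as illustrative assumptions.

```python
def bonferroni_level(family_alpha, n_comparisons):
    """Per-comparison error rate from the Bonferroni inequality."""
    return family_alpha / n_comparisons

def sidak_level(family_alpha, n_comparisons):
    """Per-comparison error rate from the Sidak inequality."""
    return 1 - (1 - family_alpha) ** (1 / n_comparisons)

family_alpha = 0.05
n_comparisons = 6  # all pairwise comparisons among four levels

b = bonferroni_level(family_alpha, n_comparisons)
s = sidak_level(family_alpha, n_comparisons)
print(f"Bonferroni: {b:.6f}  Sidak: {s:.6f}")
```

The Sidak level is always a little larger than the Bonferroni level for more than one comparison, so its intervals are slightly narrower.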
level of A   A1   A2   A3
    1         1    0    0
    2         0    1    0
    3         0    0    1
    4        -1   -1   -1

Suppose factor B has 3 levels nested within each level of A. Then its block contains (3 − 1) × 4 =
8 columns, call them B11, B12, B21, B22, B31, B32, B41, B42, coded as follows:

level of A   level of B   B11  B12  B21  B22  B31  B32  B41  B42
    1            1          1    0    0    0    0    0    0    0
    1            2          0    1    0    0    0    0    0    0
    1            3         -1   -1    0    0    0    0    0    0
    2            1          0    0    1    0    0    0    0    0
    2            2          0    0    0    1    0    0    0    0
    2            3          0    0   -1   -1    0    0    0    0
    3            1          0    0    0    0    1    0    0    0
    3            2          0    0    0    0    0    1    0    0
    3            3          0    0    0    0   -1   -1    0    0
    4            1          0    0    0    0    0    0    1    0
    4            2          0    0    0    0    0    0    0    1
    4            3          0    0    0    0    0    0   -1   -1
To calculate the dummy variables for an interaction term, just multiply all the corresponding
dummy variables for the factors and/or covariates in the interaction. For example, suppose factor
A has 6 levels, C has 3 levels, D has 4 levels, and Z and W are covariates. Then the term
A*C*D*Z*W*W has 5 × 2 × 3 × 1 × 1 × 1 = 30 dummy variables. To obtain them, multiply
each dummy variable for A by each for C, by each for D, by the covariates Z once and W twice.
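The coding and the interaction rule can be sketched in a few lines. The example assumes that a factor with k levels gets k − 1 dummy variables and that the last level is coded −1 in every column; the function names and the covariate values are hypothetical.

```python
from math import prod

def effect_code(level, n_levels):
    """Dummy variables for a 1-based factor level (last level coded -1)."""
    if level == n_levels:
        return [-1] * (n_levels - 1)
    return [1 if j == level else 0 for j in range(1, n_levels)]

def interaction_columns(*blocks):
    """Multiply every combination of one value from each block."""
    result = [1]
    for block in blocks:
        result = [r * value for r in result for value in block]
    return result

# Number of dummy variables for A*C*D*Z*W*W:
# (6 - 1) for A, (3 - 1) for C, (4 - 1) for D, and 1 for each covariate.
n_columns = prod([6 - 1, 3 - 1, 4 - 1, 1, 1, 1])

# One observation: A at level 2, C at level 1, D at level 4, Z = 2.5, W = 3.0.
row = interaction_columns(effect_code(2, 6), effect_code(1, 3),
                          effect_code(4, 4), [2.5], [3.0], [3.0])

print(effect_code(4, 4))       # coding of the last level of a 4-level factor
print(n_columns, len(row))     # both counts should agree
```

Each entry of `row` is the product of one dummy variable from each factor and the covariate values, with W entering twice.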
Example of using GLM to fit linear and quadratic effects
An experiment is conducted to test the effect of temperature and glass type upon the light output
of an oscilloscope. There are three glass types and three temperature levels: 100, 125, and 150
degrees Fahrenheit. These factors are fixed because we are interested in examining the response
at those levels. The example and data are from Montgomery [14], page 252.
When a factor is quantitative with three or more levels it is appropriate to partition the sums of
squares from that factor into effects of polynomial orders [12]. If there are k levels to the factor,
you can partition the sums of squares into k-1 polynomial orders. In this example, the effect due
to the quantitative variable temperature can be partitioned into linear and quadratic effects.
Similarly, you can partition the interaction. To do this, you must code the quantitative variable
with the actual treatment values (that is, code Temperature levels as 100, 125, and 150), use
GLM to analyze your data, and declare the quantitative variable to be a covariate.
1 Open the worksheet EXH_AOV.MTW.
2 Choose Stat > ANOVA > General Linear Model.
3 In Responses, enter LightOutput.
4 In Model, type Temperature Temperature*Temperature GlassType GlassType*Temperature
GlassType*Temperature*Temperature.
Session window output

Source                               DF    Seq SS   Adj SS   Adj MS        F      P
Temperature                           1   1779756   262884   262884   719.21  0.000
Temperature*Temperature               1    190579   190579   190579   521.39  0.000
GlassType                             2    150865    41416    20708    56.65  0.000
GlassType*Temperature                 2    226178    51126    25563    69.94  0.000
GlassType*Temperature*Temperature     2     64374    64374    32187    88.06  0.000
Error                                18      6579     6579      366
Total                                26   2418330

Term                                     Coef   SE Coef        T      P
Constant                              -4968.8     191.3   -25.97  0.000
Temperature                            83.867     3.127    26.82  0.000
Temperature*Temperature              -0.28516   0.01249   -22.83  0.000
GlassType*Temperature
                                      -24.400     4.423    -5.52  0.000
                                      -27.867     4.423    -6.30  0.000
GlassType*Temperature*Temperature
                                      0.11236   0.01766     6.36  0.000
                                      0.12196   0.01766     6.91  0.000

Unusual observations

    Fit   SE Fit   Residual   St Resid
1035.00    11.04      35.00      2.24R
1035.00    11.04     -35.00     -2.24R
The significant interaction effects of glass type with both linear and quadratic temperature terms
imply that the coefficients of second order regression models of the effect of temperature upon
light output depend upon the glass type.
The next table gives the estimated coefficients for the covariate, Temperature, and the
interactions of Temperature with GlassType, their standard errors, t-statistics, and p-values.
Following the table of coefficients is a table of unusual values. Observations with large
standardized residuals or large leverage values are flagged. In our example, two values have
standardized residuals whose absolute values are greater than 2.
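To make the dependence on glass type concrete, the sketch below combines the printed coefficients into a linear and a quadratic temperature coefficient for each glass type. It rests on two assumptions that the output does not state outright: the two printed interaction coefficients belong to the first two glass types, and the coefficient for the third glass type is minus the sum of the other two, as it would be under coding in which the effects for each factor sum to zero. Intercepts are not computed because the GlassType main-effect coefficients are not shown.

```python
# Coefficients read from the table of estimated coefficients.
temperature = 83.867
temperature_squared = -0.28516
glass_by_temperature = [-24.400, -27.867]          # assumed: glass types 1 and 2
glass_by_temperature_squared = [0.11236, 0.12196]  # assumed: glass types 1 and 2

def all_levels(effects):
    """Append the last level's effect, assumed to be minus the sum of the others."""
    return effects + [-sum(effects)]

linear = [temperature + e for e in all_levels(glass_by_temperature)]
quadratic = [temperature_squared + e
             for e in all_levels(glass_by_temperature_squared)]

for glass_type, (b1, b2) in enumerate(zip(linear, quadratic), start=1):
    print(f"glass type {glass_type}: linear = {b1:.3f}, quadratic = {b2:.5f}")
```

Under these assumptions the third glass type has a much steeper linear term and stronger curvature than the other two, which is what the significant interactions suggest.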
Example of using GLM and multiple comparisons with an unbalanced nested design
Four chemical companies produce insecticides that can be used to kill mosquitoes, but the
composition of the insecticides differs from company to company. An experiment is conducted to
test the efficacy of the insecticides by placing 400 mosquitoes inside a glass container treated with
a single insecticide and counting the live mosquitoes 4 hours later. Three replications are
performed for each product. The goal is to compare the product effectiveness of the different
companies. The factors are fixed because you are interested in comparing the particular brands.
The factors are nested because each insecticide for each company is unique. The example and
data are from Milliken and Johnson [13], page 414. You use GLM to analyze your data because
the design is unbalanced and you will use multiple comparisons to compare the mean response
for the company brands.
1 Open the worksheet EXH_AOV.MTW.
2 Choose Stat > ANOVA > General Linear Model.
3 In Responses, enter NMosquito.
4 In Model, enter Company Product(Company).
5 Click Comparisons. Under Pairwise Comparisons, enter Company in Terms. Click OK in
Session window output

Source              DF    Seq SS    Adj SS   Adj MS        F      P
Company              3   22813.3   22813.3   7604.4   132.78  0.000
Product(Company)     7    1500.6    1500.6    214.4     3.74  0.008
Error               22    1260.0    1260.0     57.3
Total               32   25573.9

Confidence intervals for differences of Company means

Company = A subtracted from:
Company     Lower   Center     Upper
B           -2.92     8.17     19.25
C          -52.25   -41.17    -30.08
D          -61.69   -52.42    -43.14

Company = B subtracted from:
Company     Lower   Center     Upper
C          -61.48   -49.33    -37.19
D          -71.10   -60.58    -50.07

Company = C subtracted from:
Company     Lower   Center     Upper
D          -21.77   -11.25   -0.7347

Tests for differences of Company means

Company = A subtracted from:
Level     Difference        SE of             Adjusted
Company     of Means   Difference   T-Value    P-Value
B               8.17        3.989      2.05     0.2016
C             -41.17        3.989    -10.32     0.0000
D             -52.42        3.337    -15.71     0.0000
Company = B subtracted from:
Level     Difference        SE of             Adjusted
Company     of Means   Difference   T-Value    P-Value
C             -49.33        4.369    -11.29     0.0000
D             -60.58        3.784    -16.01     0.0000

Company = C subtracted from:
Level     Difference        SE of             Adjusted
Company     of Means   Difference   T-Value    P-Value
D             -11.25        3.784    -2.973     0.0329
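Each T-value in these tables is the difference of means divided by its standard error. The short sketch below recomputes them from the printed differences and standard errors; the company labels A and B are inferred from the layout of the output, and the dictionary name is illustrative.

```python
# (reference company, compared company): (difference of means, SE of difference)
comparisons = {
    ("A", "B"): (8.17, 3.989),
    ("A", "C"): (-41.17, 3.989),
    ("A", "D"): (-52.42, 3.337),
    ("B", "C"): (-49.33, 4.369),
    ("B", "D"): (-60.58, 3.784),
    ("C", "D"): (-11.25, 3.784),
}

t_values = {pair: diff / se for pair, (diff, se) in comparisons.items()}

for (reference, other), t in t_values.items():
    print(f"{other} - {reference}: T = {t:.3f}")
```

The standard errors differ from pair to pair because the design is unbalanced. The adjusted p-values additionally depend on the comparison method and are not reproduced here.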
Data
Set up your worksheet in the same manner as with Balanced ANOVA or GLM: one column for
each response variable and one column for each factor, so that there is one row for each
observation. The factor columns may be numeric, text, or date/time. If you wish to change the
order in which text categories are processed from their default alphabetical order, you can define
your own order. See Ordering Text Categories in the Manipulating Data chapter in MINITAB
User's Guide 1.
Nesting does not need to be balanced. A nested factor must have at least 2 levels at some level of
the nesting factor. If factor B is nested within factor A, there can be unequal levels of B within
each level of A. In addition, the subscripts used to identify the B levels can differ within each
level of A.
If any response or factor column contains missing data, that entire observation (row) is excluded
from all computations. If an observation is missing for one response variable, that row is
eliminated for all responses. If you want to eliminate missing rows separately for each response,
perform a fully nested ANOVA separately for each response.
To perform an analysis using fully nested ANOVA
1 Choose Stat > ANOVA > Fully Nested ANOVA.
below.
Fully Nested ANOVA
MINITAB uses sequential (Type I) sums of squares for all calculations of fully nested ANOVA.
This usually makes sense for a hierarchical model. GLM offers the choice of sequential or
adjusted (Type III) sums of squares and uses the adjusted sums of squares by default. These sums
of squares can differ when your design is unbalanced. Use GLM if you want to use adjusted sums
of squares for calculations.
Example of a fully nested ANOVA
You are an engineer trying to understand the sources of variability in the manufacture of glass
jars. The process of making the glass requires mixing materials in small furnaces for which the
temperature setting is to be 475 degrees F. Your company has a number of plants where the jars
are made, so you select four as a random sample. You conduct an experiment and measure
furnace temperature three times during a work shift for each of four operators from each plant
over four different shifts. Because your design is fully nested, you use Fully Nested ANOVA to
analyze your data.
1 Open the worksheet FURNTEMP.MTW.
2 Choose Stat > ANOVA > Fully Nested ANOVA.
3 In Responses, enter Temp.
4 In Factors, enter Plant-Batch. Click OK.
Session window output

Source       DF          SS         MS       F      P
Plant         3    731.5156   243.8385   5.854  0.011
Operator     12    499.8125    41.6510   1.303  0.248
Shift        48   1534.9167    31.9774   2.578  0.000
Batch       128   1588.0000    12.4062
Total       191   4354.2448

Variance Components

Source     Var Comp.   % of Total   StDev
Plant          4.212        17.59   2.052
Operator       0.806         3.37   0.898
Shift          6.524        27.24   2.554
Batch         12.406        51.80   3.522
Total         23.948                4.894
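The variance components in this output can be recovered from the mean squares. The sketch below assumes the balanced layout described in the example (4 plants, 4 operators per plant, 4 shifts per operator, 3 batches per shift) and the usual expected mean squares for a balanced fully nested random model; the guide does not print these formulas, so treat them as an assumption that happens to reproduce the table.

```python
from math import sqrt

# Mean squares from the analysis of variance table.
ms = {"Plant": 243.8385, "Operator": 41.6510, "Shift": 31.9774, "Batch": 12.4062}

# Observations within one level of each factor: 4*4*3, 4*3, and 3.
obs_per_level = {"Plant": 48, "Operator": 12, "Shift": 3}

order = ["Plant", "Operator", "Shift", "Batch"]
var_comp = {}
for upper, lower in zip(order, order[1:]):
    # Each component is the excess of a mean square over the one nested below it.
    var_comp[upper] = (ms[upper] - ms[lower]) / obs_per_level[upper]
var_comp["Batch"] = ms["Batch"]  # lowest level estimates the residual variance

total = sum(var_comp.values())
percent = {source: 100 * value / total for source, value in var_comp.items()}

for source in order:
    print(f"{source:9s} {var_comp[source]:7.3f} {percent[source]:6.2f} "
          f"{sqrt(var_comp[source]):6.3f}")
print(f"{'Total':9s} {total:7.3f} {'':6s} {sqrt(total):6.3f}")
```

In this example more than half of the total variability comes from batch-to-batch variation within shifts.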
Balanced MANOVA
Use Balanced MANOVA to perform multivariate analysis of variance (MANOVA) for balanced
designs. You can take advantage of the data covariance structure to simultaneously test the
equality of means from different responses.
Your design must be balanced, with the exception of one-way designs. Balanced means that all
treatment combinations (cells) must have the same number of observations. Use General
MANOVA (page 3-55) to analyze either balanced or unbalanced MANOVA designs, or if you
have covariates. You cannot designate factors to be random with general MANOVA, unlike for
balanced ANOVA, though you can work around this restriction by supplying error terms to test
the model terms.
Factors may be crossed or nested, fixed or random. See Crossed vs. nested factors on page 3-18
and Fixed vs. random factors on page 3-19.
Data
You need one column for each response variable and one column for each factor, with each row
representing an observation. Regardless of whether factors are crossed or nested, use the same
form for the data. Factor columns may be numeric, text, or date/time. If you wish to change the
order in which text categories are processed from their default alphabetical order, you can define
your own order. See Ordering Text Categories in the Manipulating Data chapter in MINITAB
User's Guide 1. You may include up to 50 response variables and up to 31 factors at one time.
Balanced data are required except for one-way designs. The requirement for balanced data
extends to nested factors as well. Suppose A has 3 levels, and B is nested within A. If B has 4 levels
within the first level of A, B must have 4 levels within the second and third levels of A. MINITAB
will tell you if you have unbalanced nesting. In addition, the subscripts used to indicate the 4
levels of B within each level of A must be the same. Thus, the four levels of B cannot be (1 2 3 4)
in level 1 of A, (5 6 7 8) in level 2 of A, and (9 10 11 12) in level 3 of A. You can use general
MANOVA if you have different levels of B within the levels of A.
If any response or factor column specified contains missing data, that entire observation (row) is
excluded from all computations. The requirement that data be balanced must be preserved after
missing data are omitted.
To perform a balanced MANOVA
1 Choose Stat > ANOVA > Balanced MANOVA.
Options
Balanced MANOVA dialog box
- specify which factors are random factors; see Fixed vs. random factors on page 3-19.
- use the restricted form of the mixed models (both fixed and random effects). The restricted
  model forces mixed interaction effects to sum to zero over the fixed effects. By default,
  MINITAB fits the unrestricted model. See Restricted and unrestricted form of mixed models on
  page 3-26.
- draw five different residual plots. You can display the following plots:
    - histogram
    - normal probability plot
    - plot of residuals versus the fitted values (Ŷ)
    - plot of residuals versus data order. The row number for each data point is shown on the
      x-axis, for example, 1 2 3 4 ... n
    - separate plot for the residuals versus each specified column
  For a discussion of the residual plots, see Residual plots on page 2-5.
- display different MANOVA output. You can request the display of the hypothesis matrix H,
  the error matrix E, and a matrix of partial correlations (see MANOVA tests on page 3-52), the
  eigenvalues and eigenvectors for the matrix $E^{-1}H$, univariate analysis of variance for each
  response, and when you have requested univariate analyses of variance, the expected mean
  squares.
- display a table of means corresponding to specified terms from the model. For example, if you
  specify A B D A*B*D, four tables of means will be printed, one for each main effect, A, B, D,
  and one for the three-way interaction, A*B*D.
- perform four multivariate tests for model terms that you specify. See Specifying terms to test on
  page 3-51. Default tests are performed for all model terms.
- store the fits and residuals separately for each response. If you fit a full model, fits are cell
  means. If you fit a reduced model, fits are least squares estimates. See Specifying reduced
  models on page 3-21.
MANOVA tests
MINITAB automatically performs four multivariate tests (Wilks' test, Lawley-Hotelling test,
Pillai's test, and Roy's largest root test) for each term in the model and for specially requested
terms (see above). All four tests are based on two SSCP (sums of squares and cross products)
matrices: H, the hypothesis matrix and E, the error matrix. There is one H associated with each
term. E is the matrix associated with the error for the test. These matrices are printed when you
request the hypothesis matrices and are labeled by SSCP Matrix.
The test statistics can be expressed in terms of either H and/or E or the eigenvalues of $E^{-1}H$. You
can request to have these eigenvalues printed. (If the eigenvalues are repeated, corresponding
eigenvectors are not unique and in this case, the eigenvectors MINITAB prints and those in books
or other software may not agree. The MANOVA tests, however, are always unique.) See Help for
computational details on the tests.
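As an illustration of how the four statistics relate to the eigenvalues, the sketch below uses the standard textbook expressions. These are assumptions in the sense that this guide refers you to Help for the computations, but they reproduce the statistics printed for Extrusion in the balanced MANOVA example later in this section. The first eigenvalue is taken as 1.61877, the printed Lawley-Hotelling trace, because the other two eigenvalues in that example are zero.

```python
from math import prod

def manova_statistics(eigenvalues):
    """Four MANOVA test statistics from the eigenvalues of E^-1 H."""
    return {
        "Wilks": prod(1.0 / (1.0 + lam) for lam in eigenvalues),
        "Lawley-Hotelling": sum(eigenvalues),
        "Pillai": sum(lam / (1.0 + lam) for lam in eigenvalues),
        # Reported here as the largest eigenvalue; some texts instead
        # report lam / (1 + lam) for the largest root.
        "Roy": max(eigenvalues),
    }

statistics = manova_statistics([1.61877, 0.0, 0.0])
for name, value in statistics.items():
    print(f"{name:17s} {value:.5f}")
```

With only one nonzero eigenvalue, the four statistics carry the same information, which is why the first three share one F value and p-value in that example.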
You can also print the matrix of partial correlations, which are the correlations among the
residuals, or alternatively, the correlations among the responses conditioned on the model. The
formula for this matrix is $W^{-0.5} E W^{-0.5}$, where E is the error matrix and W has the diagonal
of E as its diagonal and 0s off the diagonal.
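The formula amounts to dividing each element of E by the square roots of the two corresponding diagonal elements. The sketch below applies it to the error SSCP matrix printed in the balanced MANOVA example later in this section (responses Tear, Gloss, Opacity); the function name is illustrative.

```python
from math import sqrt

# Error SSCP matrix E for the responses Tear, Gloss, Opacity.
E = [
    [1.764, 0.020, -3.070],
    [0.020, 2.628, -0.552],
    [-3.070, -0.552, 64.924],
]

def partial_correlations(e):
    """Compute W^-0.5 E W^-0.5, where W is the diagonal of E."""
    scale = [1.0 / sqrt(e[i][i]) for i in range(len(e))]
    return [[scale[i] * e[i][j] * scale[j] for j in range(len(e))]
            for i in range(len(e))]

R = partial_correlations(E)
for row in R:
    print("  ".join(f"{value:8.5f}" for value in row))
```

The result has ones on the diagonal and the partial correlations off the diagonal.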
Hotelling's T² Test
Hotelling's T² test to compare the mean vectors of two groups is a special case of MANOVA,
using one factor that has two levels. MINITAB's MANOVA option can be used to do this test. The
usual T² test statistic can be calculated from MINITAB's output using the relationship
$T^2 = (N - 2)U$, where N is the total number of observations and U is the Lawley-Hotelling
trace. See Help for calculations.
Example of balanced MANOVA
You perform a study in order to determine optimum conditions for extruding plastic film. You
measure three responses (tear resistance, gloss, and opacity) five times at each combination of
two factors (rate of extrusion and amount of an additive), each set at low and high levels. The
data and example are from Johnson and Wichern [10], page 266. You use Balanced MANOVA to
test the equality of means because the design is balanced.
1 Open the worksheet EXH_MVAR.MTW.
2 Choose Stat > ANOVA > Balanced MANOVA.
3 In Responses, enter Tear Gloss Opacity.
4 In Model, enter Extrusion | Additive.
5 Click Results. Under Display of Results, check Matrices (hypothesis, error, partial
Session window output

ANOVA: Tear, Gloss, Opacity versus Extrusion, Additive

MANOVA for Extrusio          s = 1     m = 0.5     n = 6.0

Criterion          Test Statistic       F        DF        P
Wilk's                    0.38186   7.554   ( 3, 14)   0.003
Lawley-Hotelling          1.61877   7.554   ( 3, 14)   0.003
Pillai's                  0.61814   7.554   ( 3, 14)   0.003
Roy's                     1.61877

SSCP Matrix for Extrusio

             Tear     Gloss   Opacity
Tear        1.740    -1.504    0.8555
Gloss      -1.504     1.301   -0.7395
Opacity     0.855    -0.739    0.4205

SSCP Matrix for Error

             Tear     Gloss   Opacity
Tear        1.764    0.0200    -3.070
Gloss       0.020    2.6280    -0.552
Opacity    -3.070   -0.5520    64.924

Partial Correlations for the Error SSCP Matrix

               Tear      Gloss    Opacity
Tear        1.00000    0.00929   -0.28687
Gloss       0.00929    1.00000   -0.04226
Opacity    -0.28687   -0.04226    1.00000

EIGEN Analysis for Extrusio

Eigenvalue     1.619   0.00000   0.00000
Proportion     1.000   0.00000   0.00000
Cumulative     1.000   1.00000   1.00000

Eigenvector        1         2         3
Tear          0.6541    0.4315    0.0604
Gloss        -0.3385    0.5163    0.0012
Opacity       0.0359    0.0302   -0.1209
Examine the p-values for the Wilks', Lawley-Hotelling, and Pillai's test statistics to judge whether
there is significant evidence for model effects. These values are 0.003 for the model term
Extrusion, indicating that there is significant evidence for Extrusion main effects at levels
greater than 0.003. The corresponding p-values for Additive and for Additive*Extrusion are 0.025
and 0.302, respectively (not shown), indicating that there is no significant evidence for
interaction, but there is significant evidence for Extrusion and Additive main effects at levels of
0.05 or 0.10.
You can use the SSCP matrices to assess the partitioning of variability in a similar way as you
would look at univariate sums of squares. The matrix labeled as SSCP Matrix for Extrusion is the
hypothesis sums of squares and cross-products matrix, or H, for the three responses with model
term Extrusion. The diagonal elements of this matrix, 1.740, 1.301, and 0.4205, are the univariate
ANOVA sums of squares for the model term Extrusion when the response variables are Tear,
Gloss, and Opacity, respectively. The off-diagonal elements of this matrix are the cross products.
The matrix labeled as SSCP Matrix for Error is the error sums of squares and cross-products
matrix, or E. The diagonal elements of this matrix, 1.764, 2.6280, and 64.924, are the univariate
ANOVA error sums of squares when the response variables are Tear, Gloss, and Opacity,
respectively. The off-diagonal elements of this matrix are the cross products. This matrix is printed
once, after the SSCP matrix for the first model term.
You can use the matrix of partial correlations, labeled as Partial Correlations for the Error SSCP
Matrix, to assess how related the response variables are. These are the correlations among the
residuals or, equivalently, the correlations among the responses conditioned on the model.
Examine the off-diagonal elements. The partial correlations between Tear and Gloss of 0.00929
and between Gloss and Opacity of −0.04226 are small. The partial correlation of −0.28687
between Tear and Opacity is not large. Because the correlation structure is weak, you might be
satisfied with performing univariate ANOVA for these three responses. This matrix is printed
once, after the SSCP matrix for error.
You can use the eigen analysis to assess how the response means differ among the levels of the
different model terms. The eigen analysis is of $E^{-1}H$, where E is the error SSCP matrix and H is
the hypothesis SSCP matrix. These are the eigenvalues that are used to calculate the four
MANOVA tests.
Place the highest importance on the eigenvectors that correspond to high eigenvalues. In the
example, the second and third eigenvalues are zero and therefore the corresponding eigenvectors
are meaningless. For both factors, Extrusion and Additive, the first eigenvectors contain similar
information. The first eigenvector for Extrusion is 0.6541, −0.3385, 0.0359 and for Additive it is
0.6630, 0.3214, 0.0684 (not shown). The highest absolute value within these eigenvectors is
for the response Tear, the second highest is for Gloss, and the value for Opacity is small. This
implies that the Tear means have the largest differences between the two factor levels of either
Extrusion or Additive, the Gloss means have the next largest differences, and the Opacity means
have small differences.
General MANOVA
Use general MANOVA to perform multivariate analysis of variance (MANOVA) with balanced
and unbalanced designs or if you have covariates. This procedure takes advantage of the data
covariance structure to simultaneously test the equality of means from different responses.
Calculations are done using a regression approach. A full rank design matrix is formed from
the factors and covariates and each response variable is regressed on the columns of the design
matrix.
Factors may be crossed or nested, but they cannot be declared as random; it is possible to work
around this restriction by specifying the error term to test model terms (see Specifying terms to
test on page 3-57). Covariates may be crossed with each other or with factors, or nested within
factors. You can analyze up to 50 response variables with up to 31 factors and 50 covariates at one
time.
Data
Set up your worksheet in the same manner as with balanced MANOVA: one column for each
response variable, one column for each factor, and one column for each covariate, so that there
is one row of the worksheet for each observation. The factor columns may be numeric, text, or
date/time. If you wish to change the order in which text categories are processed from their
default alphabetical order, you can define your own order. See Ordering Text Categories in the
Manipulating Data chapter in MINITAB User's Guide 1.
Although models can be unbalanced in general MANOVA, they must be full rank. That is,
there must be enough data to estimate all the terms in your model. For example, suppose you
have a two-factor crossed model with one empty cell. Then you can fit the model with terms A B,
but not A B A*B. MINITAB will tell you if your model is not full rank. In most cases, eliminating
some of the high order interactions in your model (assuming, of course, they are not important)
can solve non-full rank problems.
Nesting does not need to be balanced. If factor B is nested within factor A, there can be unequal
levels of B within each level of A. In addition, the subscripts used to identify the B levels can
differ within each level of A.
If any response, factor, or covariate column contains missing data, that entire observation (row) is
excluded from all computations.
To perform an analysis using general MANOVA
1 Choose Stat > ANOVA > General MANOVA.
on page 3-17.
4 If you like, use one or more of the options described below, then click OK.
Options
Covariates subdialog box
- draw five different residual plots for regular, standardized, or deleted residuals; see Choosing
  a residual type on page 2-5. Available residual plots include a
    - histogram
    - normal probability plot
    - plot of residuals versus the fitted values (Ŷ)
    - plot of residuals versus data order. The row number for each data point is shown on the
      x-axis, for example, 1 2 3 4 ... n
    - separate plot for the residuals versus each specified column
  For a discussion of the residual plots, see Residual plots on page 2-5.
- display different MANOVA output. You can request the display of the hypothesis matrix H,
  the error matrix E, and a matrix of partial correlations (see MANOVA tests on page 3-57), the
  eigenvalues and eigenvectors for the matrix $E^{-1}H$, and univariate analysis of variance for each
  response.
- display a table of means corresponding to specified terms from the model. For example, if you
  specify A B D A*B*D, four tables of means will be printed, one for each main effect, A, B, D,
  and one for the three-way interaction, A*B*D.
- perform 4 multivariate tests for model terms that you specify. See Specifying terms to test on
  page 3-57. Default tests are performed for all model terms.
- store model coefficients and fits in separate columns for each response.
- store regular, standardized, and deleted residuals separately for each response; see Choosing a
  residual type on page 2-5.
- store leverages, Cook's distances, and DFITS, for identifying outliers; see Identifying outliers
  on page 2-9.
- store the design matrix. The design matrix multiplied by the coefficients will yield the fitted
  values. See Design matrix used by General Linear Model on page 3-41.
MANOVA tests
The MANOVA tests with general MANOVA are similar to those performed for balanced
MANOVA. See MANOVA tests on page 3-52 for details.
However, with general MANOVA, there are two SSCP matrices associated with each term in the
model, the sequential SSCP matrix and the adjusted SSCP matrix. These matrices are analogous
to the sequential SS and adjusted SS in univariate General Linear Model (see page 3-35). In fact,
the univariate SSs are along the diagonal of the corresponding SSCP matrix. If you do not specify
an error term in Error when you enter terms in Custom multivariate tests for the following
terms, then the adjusted SSCP matrix is used for H and the SSCP matrix associated with MSE is
used for E. If you do specify an error term, the sequential SSCP matrices associated with H and E
are used. Using sequential SSCP matrices guarantees that H and E are statistically independent.
See Help for details on these tests.
You can also perform Hotellings T2 test to compare the mean vectors of two groups (see
Hotellings T2 Test on page 3-52). Refer to Example of balanced MANOVA on page 3-52 for an
example of MANOVA. The dialog operation of general MANOVA is similar to that of balanced
MANOVA.
Data
Set up your worksheet with one column for the response variable and one column for each factor,
so that there is one row for each observation. Your response data must be in one column. You may
have up to 9 factors. Factor columns may be numeric, text, or date/time, and may contain any
value. If there are many cells (factors and levels), the print in the output chart can get very small.
Rows where the response column contains missing data (*) are automatically omitted from the
calculations. When one or more factor columns contain missing data, MINITAB displays the chart
and Bartlett's test results, but not the Levene's test results.
Data limitations include: (1) if none of the cells have multiple observations, nothing is
calculated. In addition, there must be at least one nonzero standard deviation; (2) the F-test for 2
levels requires both cells to have multiple observations; (3) Bartlett's test requires two or more
cells to have multiple observations; (4) Levene's test requires two or more cells to have multiple
observations, but one cell must have three or more.
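The counting rules in the data limitations above can be sketched as a small check. This Python fragment is illustrative only: the function name available_tests and its input (a list of observation counts, one per cell) are assumptions, not part of MINITAB. It does not check the requirement of at least one nonzero standard deviation, and it does not model which tests MINITAB chooses to display for a given design.

```python
def available_tests(cell_sizes):
    """Return the equal-variance tests that the stated counting rules allow.

    cell_sizes: number of observations in each cell (factor-level combination).
    Hypothetical helper; it encodes only the observation-count rules.
    """
    multi = sum(1 for n in cell_sizes if n >= 2)   # cells with multiple observations
    has_three = any(n >= 3 for n in cell_sizes)    # some cell has three or more
    tests = []
    if multi == 0:
        return tests                               # rule (1): nothing is calculated
    if len(cell_sizes) == 2 and multi == 2:
        tests.append("F-test")                     # rule (2): two levels, both with multiple obs
    if multi >= 2:
        tests.append("Bartlett")                   # rule (3)
    if multi >= 2 and has_three:
        tests.append("Levene")                     # rule (4)
    return tests


# Six cells of three observations each, as in a 2 x 3 layout with 3 replicates.
print(available_tests([3, 3, 3, 3, 3, 3]))
```

With a single observation in every cell the function returns an empty list, matching the statement that nothing is calculated.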
Options
Test for Equal Variances dialog box
specify a confidence level for the confidence interval (the default is 95%)
store standard deviations, variances, and/or upper and lower confidence limits for sigma by factor
levels
You study conditions conducive to potato rot by injecting potatoes with bacteria that cause rotting
and subjecting them to different temperature and oxygen regimes. Before performing analysis of
variance, you check the equal variance assumption using the test for equal variances.
1 Open the worksheet EXH_AOV.MTW.
2 Choose Stat > ANOVA > Test for Equal Variances.
3 In Response, enter Rot.
4 In Factors, enter Temp Oxygen. Click OK.
Session window output
  Lower     Sigma     Upper    N  Factor Levels
2.26029   5.29150    81.890    3  10   2
1.28146   3.00000    46.427    3  10   6
2.80104   6.55744   101.481    3  10  10
1.54013   3.60555    55.799    3  16   2
1.50012   3.51188    54.349    3  16   6
3.55677   8.32666   128.862    3  16  10
Graph window output
Interval Plot for Mean
Data
The response (Y variable) data must be stacked in one numeric column. You must also have a
column that contains the group identifiers. The grouping column can be numeric, text, or
date/time. If you wish to change the order in which text levels are processed, you can define your
own order. See Ordering Text Categories in the Manipulating Data chapter in MINITAB User's
Guide 1.
Special cases include one observation in a group or a standard deviation of 0 (such as when all
observations are the same). In the first case, the mean is plotted, but not the interval bar. In the
second case, you see a symbol for the mean and a horizontal interval bar.
MINITAB automatically omits rows with missing responses or factor levels from the calculations.
To display an interval plot for the mean
1 Choose Stat > ANOVA > Interval Plot.
Options
Interval Plot for Mean dialog box
determine the type of interval displayed on the plot. You can display
the default plot, which uses standard error bars. That is, the error bars are s/√n away
from the mean, where s is the sample standard deviation of the group and n is its number of
observations. You can also specify a multiplier for the standard error bars. For example,
specifying a multiplier of 2 displays error bars that are two times the standard
error away from the mean.
display error bars that show a normal distribution confidence interval for the mean (rather
than using the standard error). You can change the confidence level from the default 95%.
display a symbol at the mean position or a bar that extends from the x-axis (or a specified base)
to the mean.
display error bars (or confidence intervals) above the mean (upper one-sided), below the mean
(lower one-sided), or both above and below the mean (two-sided).
pool the standard error across all subgroups instead of calculating the standard error for each
subgroup separately.
replace the default x- and y-axis labels with your own labels.
set the type, color, and size of symbols at each subgroup mean.
set the fill type, foreground color, background color, edge size, and base (the y-value to which
bars extend from the mean) of the bars.
specify the type, color, and size of the error bar lines at each subgroup mean.
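The standard error bars described in the options above can be sketched in a few lines of Python. This is a minimal illustration, not MINITAB's implementation: the function name standard_error_interval, the multiplier argument, and the sample values are assumptions made for the example. It computes the per-group (unpooled) standard error only.

```python
import math
import statistics


def standard_error_interval(values, multiplier=1.0):
    """Return (mean, lower, upper) for bars multiplier * s / sqrt(n) away from the mean."""
    n = len(values)
    mean = statistics.mean(values)
    if n < 2:
        # One observation in a group: the mean can be plotted, but there is no bar.
        return mean, None, None
    s = statistics.stdev(values)                 # sample standard deviation
    half_width = multiplier * s / math.sqrt(n)   # multiplier times the standard error
    return mean, mean - half_width, mean + half_width


yields = [3.2, 3.8, 3.5, 3.9]   # made-up values, not taken from ALFALFA.MTW
mean, lower, upper = standard_error_interval(yields)
mean2, lower2, upper2 = standard_error_interval(yields, multiplier=2)
print(mean, lower, upper)
```

Passing multiplier=2 doubles the half-width, which corresponds to error bars two standard errors away from the mean.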
Six varieties of alfalfa were grown on plots within four different fields. You are interested in
comparing yields of the different varieties. After harvest, you wish to examine means with their
standard errors using an error bar plot.
1 Open the worksheet ALFALFA.MTW.
2 Choose Stat > ANOVA > Interval Plot.
3 In Y variable, enter Yield.
4 In Group variable, enter Variety. Click OK.
Graph window output
Main Effects Plot
Data
Set up your worksheet with one column for the response variable and one column for each factor,
so that each row in the response and factor columns represents one observation. It is not required
that your data be balanced.
The factor columns may be numeric, text, or date/time and may contain any values. If you wish
to change the order in which text levels are processed, you can define your own order. See
Ordering Text Categories in the Manipulating Data chapter in MINITAB User's Guide 1. You may
have up to 9 factors.
Missing values are automatically omitted from calculations.
To perform a main effects plot
1 Choose Stat > ANOVA > Main Effects Plot.
Options
Options subdialog box
specify the y-value(s) to use for the minimum and/or the maximum of the graph scale.
replace the default graph title with your own title.
You grow six varieties of alfalfa on plots within four different fields and you weigh the yield of the
cuttings. You are interested in comparing yields from the different varieties and consider the
fields to be blocks. You want to preview the data and examine yield by variety and field using the
main effects plot.
1 Open the worksheet ALFALFA.MTW.
2 Choose Stat > ANOVA > Main Effects Plot.
3 In Responses, enter Yield.
4 In Factors, enter Variety Field. Click OK.
Graph window output
Interactions Plot
Interactions Plot creates a single interaction plot for two factors, or a matrix of interaction plots
for three to nine factors. An interactions plot is a plot of means for each level of a factor with the
level of a second factor held constant. Interactions plots are useful for judging the presence of
interaction.
Interaction is present when the response at a factor level depends upon the level(s) of other
factors. Parallel lines in an interactions plot indicate no interaction. The greater the departure of
the lines from the parallel state, the higher the degree of interaction. To use interactions plot,
data must be available from all combinations of levels.
Use the Interactions Plot in Chapter 19 to generate interaction plots specifically for 2-level
factorial designs, such as those generated by Fractional Factorial Design, Central Composite
Design, and Box-Behnken Design.
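The quantities behind an interactions plot, one mean per combination of factor levels, can be sketched as follows. This Python fragment is an illustration under stated assumptions: the function names and the numeric "departure from parallel" summary are hypothetical and are not something MINITAB reports, and the data are made up. As the text notes, every combination of levels must be present; a missing combination raises a KeyError here.

```python
from collections import defaultdict
from statistics import mean


def cell_means(rows):
    """rows: iterable of (level_a, level_b, response). Returns {(a, b): mean response}."""
    cells = defaultdict(list)
    for a, b, y in rows:
        cells[(a, b)].append(y)
    return {key: mean(ys) for key, ys in cells.items()}


def max_departure_from_parallel(means):
    """Largest spread, across levels of A, in the change between consecutive levels of B.

    Zero means the lines (one per level of A) are exactly parallel.
    """
    a_levels = sorted({a for a, _ in means})
    b_levels = sorted({b for _, b in means})
    worst = 0.0
    for b0, b1 in zip(b_levels, b_levels[1:]):
        steps = [means[(a, b1)] - means[(a, b0)] for a in a_levels]
        worst = max(worst, max(steps) - min(steps))
    return worst


# Made-up data: two levels of factor A, two levels of factor B, two replicates per cell.
parallel = [("g1", 100, 10), ("g1", 100, 12), ("g1", 150, 20), ("g1", 150, 22),
            ("g2", 100, 15), ("g2", 100, 17), ("g2", 150, 25), ("g2", 150, 27)]
crossing = parallel[:6] + [("g2", 150, 35), ("g2", 150, 37)]

print(max_departure_from_parallel(cell_means(parallel)))
print(max_departure_from_parallel(cell_means(crossing)))
```

In the first data set both lines rise by the same amount, so the departure is zero; in the second, one line rises twice as fast as the other, which is the pattern an interactions plot shows as non-parallel lines.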
Data
Set up your worksheet with one column for the response variable and one column for each factor,
so that each row in the response and factor columns represents one observation. Your data is not
required to be balanced.
The factor columns may be numeric, text, or date/time and may contain any values. If you wish
to change the order in which text levels are processed, you can define your own order. See
Ordering Text Categories in the Manipulating Data chapter in MINITAB User's Guide 1. You may
have from 2 through 9 factors.
Missing data are automatically omitted from calculations.
To display an interactions plot
1 Choose Stat > ANOVA > Interactions Plot.
Options
Interactions Plot dialog box
display the full interaction matrix for more than two factors, rather than the default upper right
portion of the matrix.
specify the y-value(s) to use for the minimum of the graph scale. You can enter one value to
be used for all plots or one value for each response.
specify the y-value(s) to use for the maximum of the graph scale. You can enter one value to
be used for all plots or one value for each response.
You conduct an experiment to test the effect of temperature and glass type upon the light output
of an oscilloscope (example and data from [14], page 252). There are three glass types and three
temperatures, 100, 125, and 150 degrees Fahrenheit. You choose interactions plot to visually
assess interaction in the data. You enter the quantitative variable second because you want this
variable as the x variable in the plot.
1 Open the worksheet EXH_AOV.MTW.
2 Choose Stat > ANOVA > Interactions Plot.
3 In Responses, enter LightOutput.
4 In Factors, enter GlassType Temperature. Click OK.
Graph window output
Plywood is made by cutting thin layers of wood from logs as they are spun on their axis.
Considerable force is required to turn a log hard enough so that a sharp blade can cut off a layer.
Chucks are inserted into the ends of the log to apply the torque necessary to turn the log. You
conduct an experiment to study factors that affect torque. These factors are diameter of the logs,
penetration distance of the chuck into the log, and the temperature of the log. You wish to
preview the data to check for the presence of interaction.
1 Open the worksheet PLYWOOD.MTW.
2 Choose Stat > ANOVA > Interactions Plot.
3 In Responses, enter Torque.
4 In Factors, enter Diameter-Temp. Click OK.
Graph window output
References
[1] R.E. Bechhofer and C.W. Dunnett (1988). Percentage points of multivariate Student t
distributions, Selected Tables in Mathematical Statistics, Vol. 11. American Mathematical
Society, Providence, R.I.
[2] M.B. Brown and A.B. Forsythe (1974). Journal of the American Statistical Association, 69,
pp. 364-367.
[3] H.L. Harter (1970). Order Statistics and Their Uses in Testing and Estimation, Vol. 1. U.S.
Government Printing Office, Washington, D.C.
[4] A.J. Hayter (1984). A proof of the conjecture that the Tukey-Kramer multiple comparisons
procedure is conservative, Annals of Statistics, 12, pp. 61-75.
[5] D.L. Heck (1960). Charts of Some Upper Percentage Points of the Distribution of the
Largest Characteristic Root, The Annals of Mathematical Statistics, pp. 625-642.
[6] C.R. Hicks (1982). Fundamental Concepts in the Design of Experiments, Third Edition,
CBC College Publishing.
[7] Y. Hochberg and A.C. Tamhane (1987). Multiple Comparison Procedures. John Wiley &
Sons, New York.
[8] J.C. Hsu (1984). Constrained Two-Sided Simultaneous Confidence Intervals for Multiple
Comparisons with the Best, Annals of Statistics, 12, pp. 1136-1144.
[9] J.C. Hsu (1996). Multiple Comparisons, Theory and methods, Chapman & Hall, New York.
[10] R. Johnson and D. Wichern (1992). Applied Multivariate Statistical Analysis, Third
Edition, Prentice Hall.
[11] H. Levene (1960). Contributions to Probability and Statistics, pp. 278-292. Stanford
University Press, CA.
[12] T.M. Little (1981). Interpretation and Presentation of Result, HortScience, 19,
pp. 637-640.
[13] G.A. Milliken and D.E. Johnson (1984). Analysis of Messy Data. Volume I: Designed
Experiments, Van Nostrand Reinhold.
[14] D.C. Montgomery (1991). Design and Analysis of Experiments, Third Edition, John Wiley
& Sons.
[15] D. Morrison (1967). Multivariate Statistical Methods, McGraw-Hill.
[16] L.S. Nelson (1974). Factors for the Analysis of Means, Journal of Quality Technology, 6,
pp. 175-181.
[17] P.R. Nelson (1983). A Comparison of Sample Sizes for the Analysis of Means and the
Analysis of Variance, Journal of Quality Technology, 15, pp. 33-39.
[18] J. Neter, W. Wasserman and M.H. Kutner (1985). Applied Linear Statistical Models,
Second Edition, Irwin, Inc.
[19] R.A. Olshen (1973). The conditional level of the F-test, Journal of the American Statistical
Association, 68, pp. 692-698.
[20] E.R. Ott (1983). Analysis of Means: A Graphical Procedure, Journal of Quality
Technology, 15, pp. 10-18.
[21] E.R. Ott and E.G. Schilling (1990). Process Quality Control: Troubleshooting and
Interpretation of Data, 2nd Edition, McGraw-Hill.
[22] P.R. Ramig (1983). Applications of the Analysis of Means, Journal of Quality Technology,
15, pp. 19-25.
[23] E.G. Schilling (1973). A Systematic Approach to the Analysis of Means, Journal of Quality
Technology, 5, pp. 93-108, 147-159.
[24] S.R. Searle, G. Casella, and C.E. McCulloch (1992). Variance Components, John Wiley &
Sons.
[25] N.R. Ullman (1989). The Analysis of Means (ANOM) for Signal and Noise, Journal of
Quality Technology, 21, pp. 111-127.
[26] E. Uusipaikka (1985). Exact simultaneous confidence intervals for multiple comparisons
among three or four mean values, Journal of the American Statistical Association, 80,
pp. 196-201.
[27] B.J. Winer (1971). Statistical Principles in Experimental Design, Second Edition,
McGraw-Hill.
Acknowledgment
We are grateful for assistance in the design and implementation of multiple comparisons from
Jason C. Hsu, Department of Statistics, Ohio State University and for the guidance of
James L. Rosenberger, Statistics Department, The Pennsylvania State University, in developing
the Balanced ANOVA, Analysis of Covariance, and General Linear Models procedures.
Multivariate Analysis
Multivariate Analysis Overview
analyze the data covariance structure for the sake of understanding it or to reduce the data
dimension
Analyzing the data covariance structure and assigning observations to groups are characterized by
their non-inferential nature; that is, tests of significance are not computed. There may be no
single answer, and finding what works best for your data may require knowledge of the situation.
Principal Components Analysis is used to help you to understand the covariance structure in
the original variables and/or to create a smaller number of variables using this structure.
Factor Analysis, like principal components, is used to summarize the data covariance
structure in a smaller number of dimensions. The emphasis in factor analysis, however, is the
identification of underlying factors that might explain the dimensions associated with large
data variability.
Grouping observations
MINITAB offers discriminant analysis and three cluster analysis methods for grouping
observations:
Discriminant Analysis is used for classifying observations into two or more groups if you have
a sample with known groups. Discriminant analysis can also be used to investigate how the
predictors contribute to the groupings.
Cluster Observations is used to group or cluster observations that are close to each other,
when the groups are initially unknown. This method is a good choice when there is no outside
information about grouping. The choice of final grouping is usually made according to what
makes sense for your data after viewing clustering statistics.
Cluster Variables is used to group or cluster variables that are close to each other, when the
groups are initially unknown. The procedure is similar to clustering of observations. One
reason to cluster variables may be to reduce their number.
K-means clustering, like clustering of observations, is used to group observations that are
close to each other. K-means clustering works best when sufficient information is available
to make good starting cluster designations.
Principal Components Analysis
Data
Set up your worksheet so that each row contains measurements on a single item or subject. You
must have two or more numeric columns, with each column representing a different
measurement (response).
MINITAB automatically omits rows with missing data from the analysis.
To perform principal components analysis
1 Choose Stat > Multivariate > Principal Components.
Options
Principal Components dialog box
specify the number of principal components to calculate (the default number is the number
of variables).
use the correlation or covariance matrix to calculate the principal components. Use the
correlation matrix if it makes sense to standardize variables (the usual choice when variables
are measured by different scales); use the covariance matrix if you do not wish to standardize.
display an eigenvalue profile plot (also called a scree plot). Scree plots display the eigenvalues
versus their order. Use this plot to judge the relative magnitude of eigenvalues.
plot second principal component scores (y-axis) versus the first principal component scores
(x-axis). You can also create plots for other components by storing the scores and using
Graph > Plot.
store the coefficients and scores of the principal components. Coefficients are eigenvector
coefficients and scores are the linear combinations of your data using the coefficients.
Nonuniqueness of coefficients
The coefficients are unique (except for a change in sign) if the eigenvalues are distinct and not
zero. If an eigenvalue is repeated, then the space spanned by all the principal component
vectors corresponding to the same eigenvalue is unique, but the individual vectors are not.
Therefore, the coefficients that MINITAB prints and those in a book or another program may not
agree, though the eigenvalues (variances) will always be the same.
If the covariance matrix has rank r < p, where p is the number of variables, then there will be
p − r eigenvalues equal to zero. Eigenvectors corresponding to these eigenvalues may not be unique.
This can happen if the number of observations is less than p or if there is multicollinearity.
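The sign ambiguity and the zero eigenvalues described above can be checked numerically. The following Python sketch uses two made-up 2 x 2 matrices; the helper names matvec and is_eigenvector are assumptions for the example and are unrelated to MINITAB's routines.

```python
import math


def matvec(a, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def is_eigenvector(a, v, eigenvalue, tol=1e-12):
    """True if a @ v equals eigenvalue * v within tol."""
    return all(abs(x - eigenvalue * y) <= tol for x, y in zip(matvec(a, v), v))


cov = [[2.0, 1.0], [1.0, 2.0]]               # made-up covariance matrix, eigenvalues 3 and 1
v = [1 / math.sqrt(2), 1 / math.sqrt(2)]     # unit eigenvector for eigenvalue 3
flipped = [-x for x in v]                    # same axis, opposite sign

sign_ok = is_eigenvector(cov, v, 3.0) and is_eigenvector(cov, flipped, 3.0)

# Rank 1 with p = 2 variables: p - r = 1 eigenvalue equals zero.
singular = [[1.0, 1.0], [1.0, 1.0]]
zero_ok = is_eigenvector(singular, [1.0, -1.0], 0.0)

print(sign_ok, zero_ok)
```

Both v and its negation satisfy the eigenvector equation, which is why coefficients printed by two programs can differ in sign while the eigenvalues agree.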
You record the following characteristics for 14 census tracts: total population (Pop), median years
of schooling (School), total employment (Employ), employment in health services (Health),
and median home value (Home). The data were obtained from [5], Table 8.2.
You wish to understand the underlying data structure so you perform principal components
analysis. You use the correlation matrix to standardize the measurements because they are not
measured with the same scale.
1 Open the worksheet EXH_MVAR.MTW.
2 Choose Stat > Multivariate > Principal Components.
3 In Variables, enter Pop-Home.
4 Under Type of Matrix, choose Correlation.
5 Click Graphs. Check Eigenvalue (Scree) plot. Click OK in each dialog box.
Session window output
Eigenvalue   3.0289   1.2911   0.5725   0.0954   0.0121
Proportion    0.606    0.258    0.114    0.019    0.002
Cumulative    0.606    0.864    0.978    0.998    1.000

Variable        PC1      PC2      PC3      PC4      PC5
Pop          -0.558   -0.131    0.008    0.551   -0.606
School       -0.313   -0.629   -0.549   -0.453    0.007
Employ       -0.568   -0.004    0.117    0.268    0.769
Health       -0.487    0.310    0.455   -0.648   -0.201
Home          0.174   -0.701    0.691    0.015    0.014
Graph window output
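In the session output, each eigenvalue is accompanied by its proportion of the total variance and the cumulative proportion. The short Python sketch below recomputes those two quantities from the five printed eigenvalues. It is an illustration only; because the printed eigenvalues are rounded to four decimals, the recomputed values can differ from the printed ones in the last digit.

```python
from itertools import accumulate

eigenvalues = [3.0289, 1.2911, 0.5725, 0.0954, 0.0121]   # from the session output

# For a correlation matrix the eigenvalues sum to the number of variables (5 here).
total = sum(eigenvalues)
proportions = [ev / total for ev in eigenvalues]
cumulative = list(accumulate(proportions))

for k, (ev, p, c) in enumerate(zip(eigenvalues, proportions, cumulative), start=1):
    print(f"PC{k}: eigenvalue={ev:.4f}  proportion={p:.3f}  cumulative={c:.3f}")
```

The first two components account for roughly 86% of the total variance, which is the kind of reading the scree plot supports visually.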
Factor Analysis
Use factor analysis, like principal components analysis, to summarize the data covariance
structure in a few dimensions of the data. However, the emphasis in factor analysis is the
identification of underlying factors that might explain the dimensions associated with large data
variability.
Data
You can have three types of input data:
The typical case is to use raw data. Set up your worksheet so that a row contains measurements on
a single item or subject. You must have two or more numeric columns, with each column
representing a different measurement (response).
If you want to store coefficients, factor scores, or the residual matrix, or view an
eigenvalue or scores plot, you must enter raw data.
Usually the factor analysis procedure calculates the correlation or covariance matrix from which
the loadings are calculated. However, you can enter a matrix as input data. You can also enter
both raw data and a matrix of correlations or covariances. If you do, MINITAB uses the matrix to
calculate the loadings. MINITAB then uses these loadings and the raw data to calculate storage
values and generate graphs. See Using a matrix as input data on page 4-10.
If you store initial factor loadings, you can later input these initial loadings to examine the effect
of different rotations. You can also use stored loadings to predict factor scores of new data. See
Using stored loadings as input data on page 4-11.
To perform factor analysis with raw data
1 Choose Stat > Multivariate > Factor Analysis.
Options
Factor Analysis dialog box
specify the number of factors to extract (required if you use maximum likelihood as your
method of extraction). With principal components extraction, the default number is the
number of variables.
use maximum likelihood rather than principal components for the initial solution; see The
maximum likelihood method on page 4-9.
perform an equimax, varimax, quartimax, or orthomax rotation of the initial factor loadings;
see Rotating the factor loadings on page 4-10.
use a correlation or covariance matrix. Use the correlation matrix if it makes sense to
standardize variables (the usual choice when variables are measured by different scales); use
the covariance matrix if you do not wish to standardize.
enter a covariance or correlation matrix as input data; see Using a matrix as input data on
page 4-10.
use stored loadings for the initial solution; see Using stored loadings as input data on page
4-11.
display an eigenvalue profile plot (also called a scree plot). Scree plots display the eigenvalues
versus their order. Use this plot to judge the relative magnitude of eigenvalues.
plot the second factor scores (y-axis) versus the first factor scores (x-axis). You can create plots
for other factors by storing the scores and using Graph > Plot.
plot the second factor loadings (y-axis) versus the first factor loadings (x-axis). You can create
loadings plots for other factors by storing the loadings and using Graph > Plot.
store the loadings, factor score coefficients, factor or standard scores, the rotation matrix,
residual matrix, eigenvalues, and matrix of eigenvectors; see Factor analysis storage on page
4-11.
sort the loadings in the Session window display (within a factor if the maximum absolute
loading occurs there). You can also display all loadings less than a given value as zero.
estimates, or change the convergence value, you may see differences in estimated loadings,
especially if the solution lies in a relatively flat place on the maximum likelihood surface.
Rotation method   Goal is ...        Orthomax gamma
equimax                              number of factors / 2
varimax                              1
quartimax         simple loadings    0
orthomax                             0-1
3 Click Options.
To examine the effect of a different rotation method, choose an option under Type of
Rotation. See Rotating the factor loadings on page 4-10 for a discussion of the various
rotations.
To predict factor scores with new data, in Variables, enter the columns containing the new
data.
You can also store the rotation matrix and residual matrix. Enter a matrix name or matrix number.
The rotation matrix is the matrix used to rotate the initial loadings. If L is the matrix of initial
loadings and M is the rotation matrix that you store, LM is the matrix of rotated loadings. The
residual matrix is (A − LL′), where A is the correlation or covariance matrix and L is a matrix of
loadings. The residual matrix is the same for initial and rotated solutions.
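The statement that the residual matrix is the same for initial and rotated solutions follows because an orthogonal rotation M leaves LL′ unchanged: (LM)(LM)′ = LMM′L′ = LL′. The Python sketch below checks this with the unrotated two-factor maximum likelihood loadings from the census tract example later in this chapter. The rotation angle is an assumption, inferred from the Employ row of the rotated loadings (0.831, 0.556); MINITAB's varimax routine is not reproduced here, and the helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


# Unrotated loadings; rows are Pop, School, Employ, Health, Home.
L = [[0.971, 0.160], [0.494, 0.833], [1.000, 0.000], [0.848, -0.395], [-0.249, 0.375]]

# Assumed orthogonal rotation: angle inferred from the rotated Employ loadings.
theta = math.atan2(0.556, 0.831)
M = [[math.cos(theta), math.sin(theta)],
     [-math.sin(theta), math.cos(theta)]]

rotated = matmul(L, M)                         # LM, the rotated loadings
before = matmul(L, transpose(L))               # LL'
after = matmul(rotated, transpose(rotated))    # (LM)(LM)'
max_change = max(abs(x - y) for r0, r1 in zip(before, after) for x, y in zip(r0, r1))
communalities = [sum(x * x for x in row) for row in rotated]

print(max_change)
print([round(c, 3) for c in communalities])
```

Because LL′ does not change, neither does A − LL′, and the communalities (the row sums of squared loadings) are identical before and after rotation.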
You can also store the eigenvalues and eigenvectors of the correlation or covariance matrix
(depending on which is factored) if you chose the initial factor extraction via principal
components. Enter a single column name or number for storing eigenvalues, which are stored
from largest to smallest. Enter a matrix name or number to store the eigenvectors in an order
corresponding to the sorted eigenvalues.
Example of factor analysis using the principal components method
You record the following characteristics of 14 census tracts (see also Example of principal
components analysis on page 4-5): total population (Pop), median years of schooling (School),
total employment (Employ), employment in health services (Health), and median home value
(Home) (data from [5], Table 8.2). You would like to investigate what factors might explain
most of the variability. As the first step in your factor analysis, you use the principal components
extraction method and examine an eigenvalues (scree) plot in order to help you to decide upon
the number of factors.
1 Open the worksheet EXH_MVAR.MTW.
2 Choose Stat > Multivariate > Factor Analysis.
3 In Variables, enter Pop-Home.
4 Click Graphs. Check Eigenvalue (Scree) plot. Click OK in each dialog box.
Session window output
Unrotated Factor Loadings and Communalities
Variable   Factor1   Factor2   Factor3   Factor4   Factor5   Communality
Pop         -0.972    -0.149     0.006     0.170    -0.067         1.000
School      -0.545    -0.715    -0.415    -0.140     0.001         1.000
Employ      -0.989    -0.005     0.089     0.083     0.085         1.000
Health      -0.847     0.352     0.344    -0.200    -0.022         1.000
Home         0.303    -0.797     0.523     0.005     0.002         1.000
Variance    3.0289    1.2911    0.5725    0.0954    0.0121        5.0000
% Var        0.606     0.258     0.114     0.019     0.002         1.000

Factor Score Coefficients
Variable   Factor1   Factor2   Factor3   Factor4   Factor5
Pop         -0.321    -0.116     0.011     1.782    -5.511
School      -0.180    -0.553    -0.726    -1.466     0.060
Employ      -0.327    -0.004     0.155     0.868     6.988
Health      -0.280     0.272     0.601    -2.098    -1.829
Home         0.100    -0.617     0.914     0.049     0.129
Graph window output
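The unrotated loadings in this output are closely tied to the principal components example on the same data. A standard result, not stated in this guide, is that with principal components extraction each unrotated loading equals the corresponding eigenvector coefficient multiplied by the square root of its eigenvalue. The Python sketch below checks this for the first two factors using the printed (rounded) values, so agreement is only to about two or three decimals.

```python
import math

eigenvalues = [3.0289, 1.2911]          # first two eigenvalues from the session output

# PC1 and PC2 coefficients from the principal components example on the same data.
pc_coefficients = {
    "Pop":    [-0.558, -0.131],
    "School": [-0.313, -0.629],
    "Employ": [-0.568, -0.004],
    "Health": [-0.487,  0.310],
    "Home":   [ 0.174, -0.701],
}

# loading = coefficient * sqrt(eigenvalue), factor by factor
loadings = {
    name: [c * math.sqrt(ev) for c, ev in zip(coefs, eigenvalues)]
    for name, coefs in pc_coefficients.items()
}

for name, (f1, f2) in loadings.items():
    print(f"{name:<7} Factor1={f1:7.3f}  Factor2={f2:7.3f}")
```

The same scaling explains why the Variance row of the loadings output repeats the eigenvalues: the sum of squared loadings for a factor equals its eigenvalue when the eigenvector has unit length.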
See the example below for a rotation of loadings extracted by the maximum likelihood method
with a selection of two factors.
Example of factor analysis using maximum likelihood and a rotation
You decide to examine the factor analysis fit with two factors in the above census tract example.
You perform a maximum likelihood extraction with varimax rotation.
1 Open the worksheet EXH_MVAR.MTW.
2 Choose Stat > Multivariate > Factor Analysis.
3 In Variables, enter Pop-Home.
4 In Number of factors to extract, enter 2.
5 Under Method of Extraction, choose Maximum likelihood.
6 Under Type of Rotation, choose Varimax.
7 Click Graphs. Check Loading plot for first 2 factors. Uncheck Eigenvalue (Scree) plot.
Click OK.
8 Click Results. Check Sort loadings. Click OK in each dialog box.
Session window output
Unrotated Factor Loadings and Communalities
Variable   Factor1   Factor2   Communality
Pop          0.971     0.160         0.968
School       0.494     0.833         0.938
Employ       1.000     0.000         1.000
Health       0.848    -0.395         0.875
Home        -0.249     0.375         0.202
Variance    2.9678    1.0159        3.9837
% Var        0.594     0.203         0.797

Rotated Factor Loadings and Communalities (Varimax Rotation)
Variable   Factor1   Factor2   Communality
Pop          0.718     0.673         0.968
School      -0.052     0.967         0.938
Employ       0.831     0.556         1.000
Health       0.924     0.143         0.875
Home        -0.415     0.173         0.202
Variance    2.2354    1.7483        3.9837
% Var        0.447     0.350         0.797

Sorted Rotated Factor Loadings and Communalities
Variable   Factor1   Factor2   Communality
Health       0.924     0.143         0.875
Employ       0.831     0.556         1.000
Pop          0.718     0.673         0.968
Home        -0.415     0.173         0.202
School      -0.052     0.967         0.938
Variance    2.2354    1.7483        3.9837
% Var        0.447     0.350         0.797

Factor Score Coefficients
Variable   Factor1   Factor2
Pop         -0.165     0.246
School      -0.528     0.789
Employ       1.150     0.080
Health       0.116    -0.173
Home        -0.018     0.027
Graph window output
loadings on Health (0.924), Employ (0.831), and Pop (0.718), and a -0.415 loading on Home
while the loading on School is small. Factor 2 has a large positive loading on School of 0.967 and
loadings of 0.556 and 0.673, respectively, on Employ and Pop, and small loadings on Health and
Home.
You can view the rotated loadings graphically in the loadings plot. What stands out for factor 1 are
the high loadings on the variables Pop, Employ, and Health and the negative loading on Home.
School has a high positive loading for factor 2 and somewhat lower values for Pop and Employ.
Let's give a possible interpretation to the factors. The first factor positively loads on population
size and on two variables, Employ and Health, that generally increase with population size. It
negatively loads on home value, but this may be largely influenced by one point. We might
consider factor 1 to be a "health care - population size" factor. The second factor might be
considered to be an "education - population size" factor. Both Health and School are correlated
with Pop and Employ, but not much with each other.
In addition, MINITAB displays a table of factor score coefficients. These show you how the factors
are calculated. MINITAB calculates factor scores by multiplying factor score coefficients and your
data after they have been centered by subtracting means.
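The calculation described above, centering the data and multiplying by the factor score coefficients, can be sketched as follows. This Python fragment is illustrative only: the function name factor_scores, the data, and the coefficients are made up, and any additional scaling MINITAB applies (for example when a correlation matrix is factored) is not modeled.

```python
def factor_scores(data, coefficients):
    """Center each column of data, then multiply by the factor score coefficients.

    data: list of rows (observations x variables).
    coefficients: list of rows (variables x factors).
    Returns a list of rows (observations x factors).
    """
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]                   # column means
    centered = [[x - m for x, m in zip(row, means)] for row in data]
    n_factors = len(coefficients[0])
    return [[sum(x * coefficients[j][f] for j, x in enumerate(row))
             for f in range(n_factors)]
            for row in centered]


# Made-up example: three observations on two variables, two factors.
data = [[1, 2], [3, 4], [5, 6]]
coefficients = [[1, 0], [0, 2]]
scores = factor_scores(data, coefficients)
print(scores)
```

Each row of the result holds one observation's scores, with one column per factor, which is the layout used when scores are stored in worksheet columns.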
You might repeat this factor analysis with three factors to see if it makes more sense for your data.
Discriminant Analysis
Use discriminant analysis to classify observations into two or more groups if you have a sample
with known groups. Discriminant analysis can also be used to investigate how variables contribute to
group separation.
MINITAB offers both linear and quadratic discriminant analysis. With linear discriminant analysis,
all groups are assumed to have the same covariance matrix. Quadratic discrimination does not
make this assumption but its properties are not as well understood.
In the case of classifying new observations into one of two categories, logistic regression may be
superior to discriminant analysis [3], [9]. See Logistic Regression Overview on page 2-28.
Data
Set up your worksheet so that a row of data contains information about a single item or subject.
You must have one or more numeric columns containing measurement data, or predictors, and a
single grouping column containing up to 20 groups. The column of group codes may be
numeric, text, or date/time. If you wish to change the order in which text groups are processed
from their default alphabetized order, you can define your own order. See Ordering Text
Categories in the Manipulating Data chapter in MINITAB User's Guide 1. MINITAB automatically
omits observations with missing measurements or group codes from the calculations.
If a high degree of multicollinearity exists (that is, if one or more predictors are highly correlated with
another) or one or more of the predictors is essentially constant, discriminant analysis calculations
cannot be done and MINITAB displays a message to that effect.
To perform linear discriminant analysis
1 Choose Stat > Multivariate > Discriminant Analysis.
Options
Discriminant Analysis dialog box
- perform cross-validation (see Cross-Validation on page 4-19). You can store the fitted values
  from cross-validation.
- store the fitted values. The fitted value for an observation is the group into which it is
  classified.
- predict group membership for new observations (see Predicting group membership for new
  observations on page 4-19).
Prior probabilities
Sometimes items or subjects from different groups are encountered according to different
probabilities. If you know or can estimate these probabilities a priori, discriminant analysis can
use these so-called prior probabilities in calculating the posterior probabilities, or probabilities of
assigning observations to groups given the data. With the assumption that the data have a normal
distribution, the linear discriminant function is increased by ln(p_i), where p_i is the prior
probability of group i. Because observations are assigned to groups according to the smallest
generalized distance, or equivalently the largest linear discriminant function, the effect is to
increase the posterior probabilities for a group with a high prior probability.
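To see the effect of prior probabilities, the following sketch computes posterior probabilities from squared distances. It assumes normal data with a common covariance matrix, so that the posterior probability for group i is proportional to p_i * exp(-d_i^2 / 2), where d_i^2 is the squared distance to the group mean. The function name is hypothetical, the squared distances are borrowed from one misclassified fish in the salmon example later in this section, and the unequal priors are invented for illustration.

```python
import math


def posterior_probabilities(squared_distances, priors):
    """Posterior probability of each group from squared distances and priors.

    Assumes the posterior for group i is proportional to
    prior_i * exp(-0.5 * squared_distance_i).
    """
    weights = [p * math.exp(-0.5 * d2)
               for d2, p in zip(squared_distances, priors)]
    total = sum(weights)
    return [w / total for w in weights]


# Squared distances to (Alaska, Canada) for one fish in the salmon example.
d2 = [2.271, 1.985]

# Equal priors: the fish is closer to Canada, so Canada is more probable.
equal = posterior_probabilities(d2, [0.5, 0.5])

# Hypothetical priors favoring Alaska: the classification switches.
unequal = posterior_probabilities(d2, [0.7, 0.3])
```

With equal priors the result rounds to 0.464 and 0.536, which agrees with the probabilities printed for that fish in the example output. With a prior of 0.7 for Alaska, the Alaska posterior rises above 0.5.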
one or more observations. The number of constants or columns must be equivalent to the
number of predictors.
Cross-Validation
Cross-validation is one technique that is used to compensate for an optimistic apparent error
rate. The apparent error rate is the percent of misclassified observations. This number tends to
be optimistic because the data being classified are the same data used to build the classification
function.
The cross-validation routine works by omitting each observation one at a time, recalculating the
classification function using the remaining data, and then classifying the omitted observation.
Computation takes approximately four times longer with this procedure. When
cross-validation is performed, MINITAB prints an additional summary table.
Another technique that you can use to calculate a more realistic error rate is to split your data
into two parts. Use one part to create the discriminant function, and the other part as a validation
set. Predict group membership for the validation set and calculate the error rate as the percent of
these data that are misclassified.
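The difference between the apparent error rate and a cross-validated error rate can be sketched as follows. The classifier here is a simple nearest-group-mean rule on one predictor, used only as a stand-in for a discriminant function; it is not MINITAB's procedure. The data and function names are hypothetical.

```python
def nearest_mean_classifier(train_x, train_groups):
    """Return a function that assigns x to the group with the closest mean.

    This one-predictor rule is only a stand-in for a discriminant function.
    """
    sums, counts = {}, {}
    for x, g in zip(train_x, train_groups):
        sums[g] = sums.get(g, 0.0) + x
        counts[g] = counts.get(g, 0) + 1
    means = {g: sums[g] / counts[g] for g in sums}
    return lambda x: min(means, key=lambda g: abs(x - means[g]))


def apparent_error_rate(xs, groups):
    """Percent misclassified when the same data build and test the rule."""
    classify = nearest_mean_classifier(xs, groups)
    wrong = sum(classify(x) != g for x, g in zip(xs, groups))
    return 100.0 * wrong / len(xs)


def cross_validated_error_rate(xs, groups):
    """Percent misclassified when each observation is left out in turn."""
    wrong = 0
    for i in range(len(xs)):
        # Rebuild the rule without observation i, then classify observation i.
        rest_x = xs[:i] + xs[i + 1:]
        rest_g = groups[:i] + groups[i + 1:]
        classify = nearest_mean_classifier(rest_x, rest_g)
        if classify(xs[i]) != groups[i]:
            wrong += 1
    return 100.0 * wrong / len(xs)


# Hypothetical measurements for two groups, A and B.
xs = [1.0, 2.0, 3.0, 5.4, 4.8, 6.0, 7.0, 8.0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
apparent = apparent_error_rate(xs, groups)
cross_validated = cross_validated_error_rate(xs, groups)
```

For these made-up data the apparent error rate is 12.5% and the cross-validated rate is 25%, which illustrates why the apparent rate tends to be optimistic.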
Example of discriminant analysis
In order to regulate catches of salmon stocks, it is desirable to identify fish as being of Alaskan or
Canadian origin. Fifty fish from each place of origin were caught and growth ring diameters of
scales were measured for the time when they lived in freshwater and for the subsequent time
when they lived in saltwater. The goal is to be able to identify newly caught fish as being from
Alaskan or Canadian stocks. The example and data are from [5], pages 519-520.
1 Open the worksheet EXH_MVAR.MTW.
2 Choose Stat > Multivariate > Discriminant Analysis.
3 In Groups, enter SalmonOrigin. In Predictors, enter Freshwater Marine. Click OK.
Session window output

Group      Alaska    Canada
Count          50        50

Summary of Classification

Put into      ....True Group....
Group          Alaska    Canada
Alaska             44         1
Canada              6        49
Total N            50        50
N Correct          44        49
Proportion      0.880     0.980

N = 100      N Correct = 93
Observation   True     Pred     Group    Squared    Probability
              Group    Group             Distance
  1 **        Alaska   Canada   Alaska     3.544       0.428
                                Canada     2.960       0.572
  2 **        Alaska   Canada   Alaska     8.1131      0.019
                                Canada     0.2729      0.981
 12 **        Alaska   Canada   Alaska     4.7470      0.118
                                Canada     0.7270      0.882
 13 **        Alaska   Canada   Alaska     4.7470      0.118
                                Canada     0.7270      0.882
 30 **        Alaska   Canada   Alaska     3.230       0.289
                                Canada     1.429       0.711
 32 **        Alaska   Canada   Alaska     2.271       0.464
                                Canada     1.985       0.536
 71 **        Canada   Alaska   Alaska     2.045       0.948
                                Canada     7.849       0.052
Clustering of Observations
Use clustering of observations to classify observations into groups when the groups are initially not
known.
This procedure uses an agglomerative hierarchical method that begins with all observations
being separate, each forming its own cluster. In the first step, the two observations closest together
are joined. In the next step, either a third observation joins the first two, or two other observations
join together into a different cluster. This process continues until all clusters are joined into
one. However, this single cluster is not useful for classification purposes, so you must
decide how many groups are logical for your data and classify accordingly. See Determining the
final cluster grouping on page 4-25.
Data
You can have two types of input data: columns of raw data or a matrix of distances.
Typically, you would use raw data. Each row contains measurements on a single item or subject.
You must have two or more numeric columns, with each column representing a different
measurement. You must delete rows with missing data from the worksheet before using this
procedure.
If you store an n × n distance matrix, where n is the number of observations, you can use this
matrix as input data. The (i, j) entry in this matrix is the distance between observations i and j. If
you use the distance matrix as input, statistics on the final partition are not available.
To perform clustering of observations
1 Choose Stat > Multivariate > Cluster Observations.
2 In Variables or distance matrix, enter either columns containing the raw (measurement) data
or a matrix of distances.
3 If you like, use one or more of the options listed below, then click OK.
Options
Cluster Observations dialog box
- specify the method to measure distance between observations if you enter raw data. Available
  methods are Euclidean (default), Squared Euclidean, Pearson, Squared Pearson, or
  Manhattan. See Distance measures for observations on page 4-23.
- standardize all variables by subtracting the means and dividing by the standard deviation
  before the distance matrix is calculated. This is a good idea if variables are in different units and you
  wish to minimize the effect of scale differences. If you standardize, cluster centroids and
  distance measures are in standardized variable space.
- determine the final partition by the specified number of clusters (default is 1) or by the
  similarity level. See Determining the final cluster grouping on page 4-25.
- store distances between observations and cluster centroids for each cluster group
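The standardization option can be pictured with a short sketch. The function name and the data are hypothetical, and the use of the sample standard deviation (divisor n - 1) is an assumption made for this illustration.

```python
import math


def standardize(columns):
    """Subtract each column's mean and divide by its standard deviation."""
    result = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        # Sample standard deviation (divisor n - 1); assumed here.
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        result.append([(v - mean) / sd for v in col])
    return result


# Hypothetical columns measured in very different units.
calories = [110.0, 90.0, 130.0, 70.0]
protein = [2.0, 3.0, 6.0, 1.0]
z_calories, z_protein = standardize([calories, protein])
```

After standardizing, every column has mean 0 and standard deviation 1, so no single variable dominates the distance calculation simply because of its units.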
The Euclidean method is a standard mathematical measure of distance (square root of the
sum of squared differences).
The Pearson method is the square root of the sum of squared distances divided by variances. This
method is for standardizing.
Manhattan distance is the sum of absolute distances, so that outliers receive less weight than
they would if the Euclidean method were used.
The squared Euclidean and squared Pearson methods use the square of the Euclidean and
Pearson methods, respectively. Therefore, distances that are large under the Euclidean and
Pearson methods will be even larger under the squared Euclidean and squared Pearson
methods.
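The distance measures described above can be written out as a minimal sketch. The function names, the two observations, and the variances are hypothetical; in practice the variances would be the sample variances of each variable.

```python
import math


def euclidean(x, y):
    """Square root of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson(x, y, variances):
    """Square root of the sum of squared differences divided by variances."""
    return math.sqrt(sum((a - b) ** 2 / v for a, b, v in zip(x, y, variances)))


def manhattan(x, y):
    """Sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


# Two hypothetical observations on two variables.
x = [0.0, 0.0]
y = [3.0, 4.0]
variances = [9.0, 4.0]  # hypothetical variances of the two variables

d_euclidean = euclidean(x, y)
d_pearson = pearson(x, y, variances)
d_manhattan = manhattan(x, y)

# The squared measures are simply the squares of the measures above.
d_squared_euclidean = d_euclidean ** 2
d_squared_pearson = d_pearson ** 2
```

Here the Euclidean distance is 5 and the Manhattan distance is 7, while squaring raises the Euclidean value to 25, which shows how the squared measures stretch large distances.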
Tip
If you choose average, centroid, median, or Ward as the linkage method, it is generally
recommended [7] that you use one of the squared distance measures.
Linkage methods
The linkage method that you choose determines how the distance between two clusters is
defined. At each amalgamation stage, the two closest clusters are joined. At the beginning, when
each observation constitutes a cluster, the distance between clusters is simply the
inter-observation distance. Subsequently, after observations are joined together, a linkage rule is
necessary for calculating inter-cluster distances when there are multiple observations in a cluster.
You may wish to try several linkage methods and compare results. Depending on the
characteristics of your data, some methods may provide better results than others.
With single linkage, or nearest neighbor, the distance between two clusters is the minimum
distance between an observation in one cluster and an observation in the other cluster. Single
linkage is a good choice when clusters are clearly separated. When observations lie close
together, single linkage tends to identify long chain-like clusters that can have a relatively large
distance separating observations at either end of the chain [5].
With average linkage, the distance between two clusters is the mean distance between an
observation in one cluster and an observation in the other cluster. Whereas the single or
complete linkage methods group clusters based upon single pair distances, average linkage
uses a more central measure of location.
With centroid linkage, the distance between two clusters is the distance between the cluster
centroids or means. Like average linkage, this method is another averaging technique.
With complete linkage, or furthest neighbor, the distance between two clusters is the
maximum distance between an observation in one cluster and an observation in the other
cluster. This method ensures that all observations in a cluster are within a maximum distance
and tends to produce clusters with similar diameters. The results can be sensitive to outliers
[8].
With median linkage, the distance between two clusters is the median distance between an
observation in one cluster and an observation in the other cluster. This is another averaging
technique, but uses the median rather than the mean, thus downweighting the influence of
outliers.
With McQuitty's linkage, when two clusters are joined, the distance of the new cluster to any
other cluster is calculated as the average of the distances of the soon-to-be-joined clusters to
that other cluster. For example, if clusters 1 and 3 are to be joined into a new cluster, say 1*,
then the distance from 1* to cluster 4 is the average of the distances from 1 to 4 and 3 to 4.
Here, distance depends on a combination of clusters rather than individual observations in
the clusters.
With Ward's linkage, the distance between two clusters is the sum of squared deviations from
points to centroids. The objective of Ward's linkage is to minimize the within-cluster sum of
squares. It tends to produce clusters with similar numbers of observations, but it is sensitive to
outliers [8]. In Ward's linkage, it is possible for the distance between two clusters to be larger
than dmax, the maximum value in the original distance matrix. If this happens, the similarity
will be negative.
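Several of the linkage rules above reduce to a one-line calculation on the inter-observation distance matrix. The sketch below covers single, complete, and average linkage and the McQuitty update. The function names and the distance matrix are hypothetical, and the centroid, median, and Ward methods are not shown.

```python
def single_linkage(dist, a, b):
    """Minimum distance between an observation in a and one in b."""
    return min(dist[i][j] for i in a for j in b)


def complete_linkage(dist, a, b):
    """Maximum distance between an observation in a and one in b."""
    return max(dist[i][j] for i in a for j in b)


def average_linkage(dist, a, b):
    """Mean distance over all pairs with one observation in each cluster."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))


def mcquitty_update(d_first_to_other, d_second_to_other):
    """Distance from a newly joined cluster to another cluster."""
    return (d_first_to_other + d_second_to_other) / 2.0


# Hypothetical distance matrix for four observations, indexed 0 to 3.
dist = [
    [0.0, 1.0, 4.0, 6.0],
    [1.0, 0.0, 3.0, 5.0],
    [4.0, 3.0, 0.0, 2.0],
    [6.0, 5.0, 2.0, 0.0],
]
cluster_a, cluster_b = [0, 1], [2, 3]

d_single = single_linkage(dist, cluster_a, cluster_b)
d_complete = complete_linkage(dist, cluster_a, cluster_b)
d_average = average_linkage(dist, cluster_a, cluster_b)

# Observations 0 and 1 join; McQuitty distance from the new cluster
# to observation 2 is the average of the two earlier distances.
d_mcquitty = mcquitty_update(dist[0][2], dist[1][2])
```

For the same pair of clusters, single linkage gives the smallest value (3), complete linkage the largest (6), and average linkage a value in between (4.5), which is why the choice of linkage can change the order in which clusters are joined.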
For some data sets, average, centroid, median and Ward's methods may not produce a
hierarchical dendrogram. That is, the amalgamation distances do not always increase
with each step. In the dendrogram, such a step will produce a join that goes downward
rather than upward.
enter will cycle until one is assigned to each cluster. For line type and line color, enter numbers
that correspond to the types and colors below.
Line types               Line colors
0  null (invisible)      0  white               8  dark red
1  solid (default)       1  black (default)     9  dark green
2  dashes                2  red                10  dark blue
3  dots                  3  green              11  dark cyan
4  dash 1-dot            4  blue               12  dark magenta
5  dash 2-dots           5  cyan               13  dark yellow
6  dash 3-dots           6  magenta            14  dark gray
7  long dashes           7  yellow             15  light gray
You can specify any positive real number for the line sizes. Larger values yield wider lines. The
default size is 1.
Example of cluster observations
You make measurements on five nutritional characteristics (protein, carbohydrate, and fat
content, calories, and percent of the daily allowance of Vitamin A) of 12 breakfast cereal brands.
The example and data are from p. 623 of [5]. The goal is to group cereal brands with similar
characteristics. You use clustering of observations with the complete linkage method, squared
Euclidean distance, and you choose standardization because the variables have different units.
You also request a dendrogram and assign different line types and colors to each cluster.
1 Open the worksheet CEREAL.MTW.
2 Choose Stat > Multivariate > Cluster Observations.
3 In Variables or distance matrix, enter Protein-VitaminA.
4 For Linkage Method, choose Complete. For Distance Measure, choose Squared Euclidean.
5 Check Standardize variables.
6 Under Specify Final Partition by, choose Number of clusters and enter 4.
7 Check Show dendrogram.
8 Click Customize. In Title, enter Dendrogram for Cereal Data. In Type, enter 1 2 1. In Color,
Session window output

Final Partition

Number of clusters: 4

            Number of      Within cluster    Average distance   Maximum distance
            observations   sum of squares    from centroid      from centroid
Cluster1               2            2.485               1.115              1.115
Cluster2               7            8.999               1.043              1.769
Cluster3               2            2.280               1.068              1.068
Cluster4               1            0.000               0.000              0.000

Cluster Centroids

Variable    Cluster1   Cluster2   Cluster3   Cluster4   Grand centrd
Protein       1.9283    -0.3335    -0.2030    -1.1164         0.0000
Carbo        -0.7587     0.5419     0.1264    -2.5289        -0.0000
Fat           0.3385    -0.0967     0.3385    -0.6770         0.0000
Calories      0.2803     0.2803     0.2803    -3.0834        -0.0000
VitaminA     -0.6397    -0.2559     2.0471    -1.0235        -0.0000
Distances Between Cluster Centroids

            Cluster1   Cluster2   Cluster3   Cluster4
Cluster1      0.0000     2.6727     3.5418     4.9896
Cluster2      2.6727     0.0000     2.3838     4.7205
Cluster3      3.5418     2.3838     0.0000     5.4460
Cluster4      4.9896     4.7205     5.4460     0.0000
Graph window output
Clustering of Variables
Use Clustering of Variables to classify variables into groups when the groups are initially not
known. One reason to cluster variables may be to reduce their number. This technique may give
new variables that are more intuitively understood than those found using principal
components.
This procedure is an agglomerative hierarchical method that begins with all variables separate,
each forming its own cluster. In the first step, the two variables closest together are joined. In the
next step, either a third variable joins the first two, or two other variables join together into a
different cluster. This process will continue until all clusters are joined into one, but you must
decide how many groups are logical for your data. See Determining the final cluster grouping on
page 4-25.
Data
You can have two types of input data to cluster variables: columns of raw data or a matrix of
distances.
Typically, you would use raw data. Each row contains measurements on a single item or subject.
You must have two or more numeric columns, with each column representing a different
measurement. You must delete rows with missing data from the worksheet before using this
procedure.
If you store a p × p distance matrix, where p is the number of variables, you can use this matrix as
input data. The (i, j) entry in this matrix is the distance between variables i and j. If you use
the distance matrix as input, statistics on the final partition are not available.
To perform clustering of variables
1 Choose Stat > Multivariate > Cluster Variables.
2 In Variables or distance matrix, enter either columns containing the raw (measurement)
Options
Cluster Variables dialog box
- choose the linkage method that will determine how the distance between two clusters is
  defined: single (default), average, centroid, complete, McQuitty, median, or Ward's. See Linkage
  methods on page 4-24.
- choose correlation or absolute correlation as a distance measure if you use raw data (see
  Distance measures for variables on page 4-30).
- determine the final partition by the specified number of clusters or the specified level of
  similarity (see Determining the final cluster grouping on page 4-25).
If it makes sense to consider negatively correlated data to be farther apart than positively
correlated data, then use the correlation method.
If you think that the strength of the relationship is important in considering distance and not
the sign, then use the absolute correlation method.
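The difference between the two measures is easiest to see with a pair of perfectly negatively correlated variables. The sketch below assumes that the correlation distance is 1 - r and the absolute correlation distance is 1 - |r|, where r is the correlation between two variables. Both formulas, the function names, and the data are assumptions made for illustration.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient between two columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_distance(x, y):
    """Assumed form: 1 - r, so negative correlation means far apart."""
    return 1.0 - correlation(x, y)


def absolute_correlation_distance(x, y):
    """Assumed form: 1 - |r|, so only the strength of the relationship counts."""
    return 1.0 - abs(correlation(x, y))


# Two hypothetical variables that are perfectly negatively correlated.
x = [1.0, 2.0, 3.0, 4.0]
y = [8.0, 6.0, 4.0, 2.0]

d_correlation = correlation_distance(x, y)
d_absolute = absolute_correlation_distance(x, y)
```

Under the correlation method these two variables are as far apart as possible (distance 2), while under the absolute correlation method they are as close as possible (distance 0).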
groupings. However, if the purpose behind clustering of variables is data reduction, you may
decide to use your knowledge of the data to a greater degree in determining the final clusters of
variables. See the following example.
Example of clustering variables
You conduct a study to determine the long-term effect of a change in environment on blood
pressure. The subjects are 39 Peruvian males over 21 years of age who had migrated from the
Andes mountains to larger towns at lower elevations. You recorded their age (Age), years since
migration (Years), weight in kg (Weight), height in mm (Height), skin fold of the chin, forearm,
and calf in mm (Chin, Forearm, Calf), pulse rate in beats per minute (Pulse), and systolic and
diastolic blood pressure (Systol, Diastol).
Your goal is to reduce the number of variables by combining variables with similar
characteristics. You use clustering of variables with the default correlation distance measure,
average linkage, and a dendrogram.
1 Open the worksheet PERU.MTW.
2 Choose Stat > Multivariate > Cluster Variables.
3 In Variables or distance matrix, enter Age-Diastol.
4 For Linkage Method, choose Average.
5 Check Show dendrogram. Click OK.
Session window output

Cluster Analysis of Variables: Age, Years, Weight, Height, Chin, Forearm, Calf,

Correlation Coefficient Distance, Average Linkage

Amalgamation Steps

Step   Number of   Similarity   Distance
       clusters    level        level
   1           9        86.78      0.264
   2           8        79.41      0.412
   3           7        78.85      0.423
   4           6        76.07      0.479
   5           5        71.74      0.565
   6           4        65.55      0.689
   7           3        61.34      0.773
   8           2        56.60      0.868
   9           1        55.44      0.891
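The similarity and distance levels printed for the amalgamation steps appear to be linked by similarity = 100 * (1 - d / dmax), with dmax = 2, the largest value a distance of the form 1 - r can take. That relationship is inferred here from the printed values rather than quoted from this guide, so treat the following check as a sketch. The function name is hypothetical.

```python
def similarity_level(distance, d_max=2.0):
    """Similarity as a percentage: 100 * (1 - distance / d_max)."""
    return 100.0 * (1.0 - distance / d_max)


# (similarity level, distance level) pairs from the amalgamation steps.
steps = [
    (86.78, 0.264), (79.41, 0.412), (78.85, 0.423),
    (76.07, 0.479), (71.74, 0.565), (65.55, 0.689),
    (61.34, 0.773), (56.60, 0.868), (55.44, 0.891),
]

# Largest disagreement between the printed and the recomputed similarity.
largest_gap = max(abs(similarity_level(d) - s) for s, d in steps)
```

The recomputed values agree with the printed similarity levels to within the rounding of the three-decimal distances.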
Graph window output
Data
You must use raw data as input to K-means clustering of observations. Each row contains
measurements on a single item or subject. You must have two or more numeric columns, with
each column representing a different measurement. You must delete rows with missing data
from the worksheet before using this procedure.
To initialize the clustering process using a data column, you must have a column that contains a
cluster membership value for each observation. The initialization column must contain positive,
consecutive integers or zeros (it should not contain all zeros). Initially, each observation is
assigned to the cluster identified by the corresponding value in this column. An initialization of
zero means that an observation is initially unassigned to a group. The number of distinct positive
integers in the initial partition column equals the number of clusters in the final partition.
To perform K-means clustering of observations
1 Choose Stat > Multivariate > Cluster K-Means.
Options
Cluster K-Means dialog box
- specify the number of clusters to form or specify a column containing cluster membership to
  begin the partition process (see Initializing the K-means clustering process on page 4-34).
- standardize all variables by subtracting the means and dividing by the standard deviation
  before the distance matrix is calculated. This is a good idea if the variables are in different
  units and you wish to minimize the effect of scale differences. If you standardize, cluster
  centroids and distance measures are in standardized variable space.
- store the distance between each observation and each cluster centroid
the one which has the smallest Euclidean distance between the observation and the centroid
of the cluster.
2 When a cluster changes, by losing or gaining an observation, MINITAB recalculates the cluster
centroid.
3 This process is repeated until no more observations can be moved into a different cluster. At
this point, all observations are in their nearest cluster according to the criterion listed above.
Unlike hierarchical clustering of observations, it is possible for two observations to be split into
separate clusters after they are joined together.
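The steps above can be sketched as a small routine. This is not MINITAB's implementation: the order in which observations are visited, the handling of ties, and the cap on the number of passes are assumptions, and the function names and data are hypothetical. The seeds are the first k observations, as in the number-of-clusters option.

```python
def squared_distance(x, c):
    """Squared Euclidean distance; it has the same minimizer as the distance."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def centroid(points):
    """Mean of each coordinate over a list of points."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def k_means(data, k, max_passes=100):
    """Cluster rows of data into k groups, seeding with the first k rows."""
    centroids = [list(row) for row in data[:k]]
    membership = [None] * len(data)
    for _ in range(max_passes):
        moved = False
        for i, row in enumerate(data):
            # Step 1: find the cluster whose centroid is nearest.
            nearest = min(range(k),
                          key=lambda c: squared_distance(row, centroids[c]))
            if nearest != membership[i]:
                old = membership[i]
                membership[i] = nearest
                moved = True
                # Step 2: recalculate the centroid of each cluster that changed.
                for c in (old, nearest):
                    if c is None:
                        continue
                    members = [data[m] for m in range(len(data))
                               if membership[m] == c]
                    if members:
                        centroids[c] = centroid(members)
        # Step 3: stop when a full pass moves no observation.
        if not moved:
            break
    return membership, centroids


# Hypothetical data: two well-separated groups; rows 0 and 1 act as seeds.
data = [[1.0, 1.0], [5.0, 5.0], [1.2, 0.8], [5.2, 4.9], [0.9, 1.1], [4.8, 5.3]]
membership, centroids = k_means(data, 2)
```

For these data the routine settles after one pass, with observations alternating between the two clusters in worksheet order.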
K-means procedures work best when you provide good starting points for clusters [8]. There are
two ways to initialize the clustering process: specifying a number of clusters or supplying an initial
partition column that contains group codes.
To initialize the process by specifying the number of clusters
1 Choose Stat > Multivariate > Cluster K-Means.
2 In Variables, enter the columns containing the measurement data.
3 Under Specify Partition by, choose Number of clusters and enter a number, k, in the box.
MINITAB will use the first k observations as initial cluster seeds, or starting locations. Click OK.
For guidance in setting up your worksheet, see below.
To initialize the process using a data column
1 Choose Stat > Multivariate > Cluster K-Means.
2 In Variables, enter the columns containing the measurement data.
3 Under Specify Partition by, choose Initial partition column. Enter the column containing
If you specify the number of clusters, you must rearrange your data in the Data window to
move observations 2, 5 and 9 to the top of the worksheet, and then specify 3 for Number of
clusters.
If you enter an initial partition column, you do not need to rearrange your data in the Data
window. In the initial partition worksheet column, enter group numbers 1, 2, and 3, for
observations 2, 5, and 9, respectively, and enter 0 for the other observations. See the following
example.
The final partition will depend to some extent on the initial partition that MINITAB uses. You
might try different initial partitions.
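As an illustration of what an initial partition column holds, the sketch below builds one for the case just described, with observations 2, 5, and 9 as the seeds for clusters 1, 2, and 3. The function name and the worksheet length of 12 rows are hypothetical.

```python
def initial_partition(n_observations, seeds):
    """Build an initial partition column.

    seeds maps a 1-based observation number to its cluster number;
    every other observation gets 0 (initially unassigned).
    """
    column = [0] * n_observations
    for observation, cluster in seeds.items():
        column[observation - 1] = cluster
    return column


# Observations 2, 5, and 9 seed clusters 1, 2, and 3.
initial = initial_partition(12, {2: 1, 5: 2, 9: 3})

# The number of distinct positive codes is the number of final clusters.
n_clusters = len({code for code in initial if code > 0})
```

The resulting column contains 1, 2, and 3 in rows 2, 5, and 9 and zeros elsewhere, so the positive codes are consecutive integers as required.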
Example of K-means clustering
You live-trap, anesthetize, and measure one hundred forty-three black bears. The measurements
are total length and head length (Length, Head.L), total weight and head weight (Weight,
Weight.H), and neck girth and chest girth (Neck.G, Chest.G). You wish to classify these 143
bears as small, medium-sized, or large bears. You know that the second, seventy-eighth, and
fifteenth bears in the sample are typical of the three respective categories. First, you create an
initial partition column with the three seed bears designated as 1 = small, 2 = medium-sized, 3 =
large, and with the remaining bears as 0 (unknown) to indicate initial cluster membership. Then
you perform K-means clustering and store the cluster membership in a column named BearSize.
1 Open the worksheet BEARS.MTW.
2 To create the initial partition column, choose Calc > Make Patterned Data > Simple Set of
Numbers.
3 In Store patterned data in, type Initial for the storage column name. In both From first
value and To last value, enter 0. In List each value, type 143. Click OK.
4 Go to the Data window and type 1, 2, and 3 in the second, seventy-eighth, and fifteenth rows,
Session window output

K-means Cluster Analysis: Head.L, Head.W, Neck.G, Length, Chest.G, Weight

Standardized Variables

Final Partition

Number of clusters: 3

            Number of      Within cluster
            observations   sum of squares
Cluster1              41           63.075
Cluster2              67           78.947
Cluster3              35           65.149

Cluster Centroids

Variable    Cluster1   Cluster2   Cluster3   Grand centrd
Head.L       -1.0673     0.0126     1.2261        -0.0000
Head.W       -0.9943    -0.0155     1.1943         0.0000
Neck.G       -1.0244    -0.1293     1.4476        -0.0000
Length       -1.1399     0.0614     1.2177         0.0000
Chest.G      -1.0570    -0.0810     1.3932        -0.0000
Weight       -0.9460    -0.2033     1.4974        -0.0000

Distances Between Cluster Centroids

            Cluster1   Cluster2   Cluster3
Cluster1      0.0000     2.4233     5.8045
Cluster2      2.4233     0.0000     3.4388
Cluster3      5.8045     3.4388     0.0000
References
[1] T.W. Anderson (1984). An Introduction to Multivariate Statistical Analysis, Second Edition,
John Wiley & Sons.
[2] W. Dillon and M. Goldstein (1984). Multivariate Analysis: Methods and Applications, John
Wiley & Sons.
[3] S.E. Fienberg (1987). The Analysis of Cross-Classified Categorical Data, The MIT Press.
[4] H. Harman (1976). Modern Factor Analysis, Third Edition, University of Chicago Press.
[5] R. Johnson and D. Wichern (1992). Applied Multivariate Statistical Analysis, Third Edition,
Prentice Hall.
[6] K. Joreskog (1977). "Factor Analysis by Least Squares and Maximum Likelihood Methods,"
Statistical Methods for Digital Computers, ed. K. Enslein, A. Ralston and H. Wilf, John Wiley
& Sons.
[7] G.N. Lance and W.T. Williams (1967). "A General Theory of Classificatory Sorting
Strategies, I. Hierarchical Systems," Computer Journal, 9, 373-380.
[8] G.W. Milligan (1980). "An Examination of the Effect of Six Types of Error Perturbation on
Fifteen Clustering Algorithms," Psychometrika, 45, 325-342.
[9] S.J. Press and S. Wilson (1978). "Choosing Between Logistic Regression and Discriminant
Analysis," Journal of the American Statistical Association, 73, 699-705.
Nonparametrics
Nonparametrics Overview
MINITAB provides the following types of nonparametric procedures:
- tests of the population location (sign test, Wilcoxon test, Mann-Whitney test, Kruskal-Wallis
  test, Mood's median test, and Friedman test)
- procedures for calculating pairwise statistics (pairwise averages, pairwise differences, and
  pairwise slopes)
Parametric implies that a distribution is assumed for the population. Often, an assumption is
made when performing a hypothesis test that the data are a sample from a certain distribution,
commonly the normal distribution. Nonparametric implies that there is no assumption of a
specific distribution for the population.
An advantage of a parametric test is that if the assumptions hold, the power, or the probability of
rejecting H0 when it is false, is higher than is the power of a corresponding nonparametric test
with equal sample sizes. An advantage of nonparametric tests is that the test results are more
robust against violation of the assumptions. Therefore, if assumptions are violated for a test based
upon a parametric model, the conclusions based on parametric test p-values may be more
misleading than conclusions based upon nonparametric test p-values. See [1] for comparing the
power of some of these nonparametric tests to their parametric equivalent.
1-Sample Sign performs a one-sample sign test of the median and calculates the
corresponding point estimate and confidence interval. Use this test as a nonparametric
alternative to one-sample Z and one-sample t-tests.
1-Sample Wilcoxon performs a one-sample Wilcoxon signed rank test of the median and
calculates the corresponding point estimate and confidence interval. Use this test as a
nonparametric alternative to one-sample Z and one-sample t-tests.
Mann-Whitney performs a hypothesis test of the equality of two population medians and
calculates the corresponding point estimate and confidence interval. Use this test as a
nonparametric alternative to the two-sample t-test.
Kruskal-Wallis performs a hypothesis test of the equality of population medians for a one-way
design (two or more populations). This test is a generalization of the procedure used by the
Mann-Whitney test and, like Mood's median test, offers a nonparametric alternative to the
one-way analysis of variance. The Kruskal-Wallis test looks for differences among the
population medians.
The Kruskal-Wallis test is more powerful (the confidence interval is narrower, on average)
than Mood's median test for analyzing data from many distributions, including data from the
normal distribution, but is less robust against outliers.
versus
Use the sign test as a nonparametric alternative to one-sample Z (page 1-11) and one-sample
t-tests (page 1-14), which use the mean rather than the median.
Data
You need at least one column of numeric data. If you enter more than one column of data,
MINITAB performs a one-sample sign test separately for each column. MINITAB automatically
omits missing data from the calculations.
To calculate a sign confidence interval and test for the median
1 Choose Stat > Nonparametrics > 1-Sample Sign.
to calculate a sign confidence interval for the median, choose Confidence interval
4 If you like, use one or more of the options listed below, then click OK.
Options
specify a level of confidence for the confidence interval. The default is 95%.
define the alternative hypothesis by choosing less than (lower-tailed), not equal (two-tailed), or
greater than (upper-tailed). The default is a two-tailed test.
Method
Sign test for the median
The sign hypothesis test is based upon the binomial distribution. You can choose an alternative
hypothesis that is one-tailed or two-tailed.
If the alternative hypothesis is one-tailed (in Alternative you chose less than or greater than),
MINITAB tallies the number of observations less than the hypothesized value (for a
lower-tailed test) or greater than the hypothesized value (for an upper-tailed test). For each
test, the p-value is the binomial probability of observing:
the number of tallied observations or fewer for a lower-tailed test
the number of tallied observations or more for an upper-tailed test
using the observed sample size (n) and a probability of occurrence (p) of 0.5.
If you perform a two-tailed test (in Alternative you choose not equal), the procedure uses the
larger number of tallied values above or below the hypothesized one. The p-value of the sign
test is two times the binomial probability of observing the tallied number of observations or
fewer with the observed n and p = 0.5.
MINITAB omits observations (for both alternative hypotheses) equal to the hypothesized value
from the calculations, and n is reduced by one for each omitted value. When n ≤ 50, the
probability calculations are exact. When n > 50, MINITAB uses a normal approximation to the
binomial.
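The tally-and-binomial calculation can be sketched in a few lines of Python. This is a minimal illustration, not MINITAB's implementation: the function names are hypothetical, the exact binomial is used for every n (no normal approximation for n > 50), and each p-value is taken as the probability of a count at least as extreme as the one observed in the direction of the alternative, which is the usual convention and should be treated as an assumption wherever the wording above differs.

```python
from math import comb


def upper_tail(k, n):
    """P(X >= k) for X ~ Binomial(n, 0.5), computed exactly."""
    return sum(comb(n, i) for i in range(max(k, 0), n + 1)) / 2 ** n


def sign_test(data, hypothesized, alternative="not equal"):
    """Return (below, equal, above, p) for a one-sample sign test (sketch)."""
    below = sum(1 for x in data if x < hypothesized)
    above = sum(1 for x in data if x > hypothesized)
    equal = len(data) - below - above
    n = below + above  # observations equal to the hypothesized value are dropped
    if alternative == "greater than":
        p = upper_tail(above, n)
    elif alternative == "less than":
        p = upper_tail(below, n)
    elif alternative == "not equal":
        p = min(1.0, 2 * upper_tail(max(below, above), n))
    else:
        raise ValueError("unknown alternative: " + alternative)
    return below, equal, above, p


# Made-up values that only mimic counts of 12 below and 17 above a median of 115.
illustrative = [100] * 12 + [150] * 17
print(sign_test(illustrative, 115, "greater than"))
```

With 12 values below and 17 above the hypothesized median, the upper-tailed probability is about 0.229.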
Sign confidence interval for the median
MINITAB calculates three sign confidence intervals. The output below illustrates the three
intervals:
                            Achieved
             N   Median   Confidence   Confidence interval   Position
Chemical    70    51.50       0.9270   ( 49.00, 55.00)             28
                              0.9500   ( 48.35, 55.00)            NLI
                              0.9578   ( 48.00, 55.00)             27
The first row gives the achievable confidence level (0.9270) just below the requested confidence
level (0.95); the third row gives the achievable confidence level (0.9578) just above the requested
level (0.95).
The calculation of the first and third intervals uses a method similar to the sign method used
when doing a hypothesis test of the median. Observations are first ordered. The interval that goes
from the dth smallest observation to the dth largest observation has confidence 1 − 2P(X < d)
using the binomial distribution with p = 0.5. The intervals with confidence coefficients just
above and below the requested level are those selected. Only rarely can you achieve the
requested confidence with these intervals.
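As a rough check on the 1 − 2P(X < d) rule, the following Python sketch computes the achieved confidence for a given position d and searches for the two positions that bracket a requested level. The function names are hypothetical and the nonlinear interpolation (NLI) step is not attempted.

```python
from math import comb


def achieved_confidence(n, d):
    """Confidence of the interval from the d-th smallest to the d-th largest
    observation: 1 - 2 * P(X < d), with X ~ Binomial(n, 0.5)."""
    lower_tail = sum(comb(n, i) for i in range(0, d)) / 2 ** n
    return 1 - 2 * lower_tail


def bracketing_positions(n, requested=0.95):
    """Return (d_below, d_above): the positions whose achieved confidence is
    just below and just above the requested level."""
    if achieved_confidence(n, 1) < requested:
        raise ValueError("sample too small to reach the requested confidence")
    d = 1
    # Confidence shrinks as d grows, so stop at the last d still at or above the target.
    while achieved_confidence(n, d + 1) >= requested:
        d += 1
    return d + 1, d


print(bracketing_positions(70))
print(round(achieved_confidence(29, 10), 4), round(achieved_confidence(29, 9), 4))
```

For n = 70 the search lands on positions 28 and 27, matching the Position column in the output above.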
MINITAB finds the middle confidence interval by a nonlinear interpolation procedure developed
by Hettmansperger and Sheather [2]. The confidence coefficient of this interval will be as close
to the requested level as possible. This method has the following properties:
the actual confidence level is between the confidence levels for the bounding intervals
the interpolation is a very good approximation for a wide variety of symmetric distributions,
including the normal distribution, the Cauchy distribution, and the uniform distribution
the interpolation tends to be not quite as good for asymmetric distributions as for symmetric
distributions but it is much more accurate than linear interpolation [2]
Boxplot (in the Core Graphs chapter in MINITAB User's Guide 1) also uses this interpolation
procedure to calculate the confidence interval for the median.
Example of a one-sample sign test of the median
Price index values for 29 homes in a suburban area in the Northeast were determined. Real estate
records indicate the population median for similar homes the previous year was 115. This test will
determine if there is sufficient evidence for judging if the median price index for the homes was
greater than 115 using α = 0.10.
1 Open the worksheet EXH_STAT.MTW.
2 Choose Stat > Nonparametrics > 1-Sample Sign.
3 In Variables, enter PriceIndex.
4 Choose Test median and enter 115 in the text box.
5 In Alternative, choose greater than. Click OK.
Session window output
Sign test of median = 115.0 versus > 115.0
     N   Below   Equal   Above        P   Median
    29      12       0      17   0.2291    144.0
Using data for the 29 houses in the previous example, you also want to obtain a 95% confidence
interval for the population median.
1 Open the worksheet EXH_STAT.MTW.
2 Choose Stat > Nonparametrics > 1-Sample Sign.
Session window output
                            Achieved
             N   Median   Confidence   Confidence interval   Position
PriceInd    29    144.0       0.9386   ( 110.0, 210.0)             10
                              0.9500   ( 108.5, 211.7)            NLI
                              0.9759   ( 101.0, 220.0)              9
versus
An assumption for the one-sample Wilcoxon test and confidence interval is that the data are a
random sample from a continuous, symmetric population. When the population is normally
distributed, this test is slightly less powerful (the confidence interval is wider, on the average)
than the t-test. It may be considerably more powerful (the confidence interval is narrower, on the
average) for other populations.
Data
You need at least one column of numeric data. If you enter more than one column of data,
MINITAB performs a one-sample Wilcoxon test separately for each column. MINITAB
automatically omits missing data from the calculations.
To calculate a one-sample Wilcoxon confidence interval and test for the median
1 Choose Stat > Nonparametrics > 1-Sample Wilcoxon.
to calculate a Wilcoxon confidence interval for the median, choose Confidence interval
4 If you like, use one or more of the options listed below, then click OK.
Note
If you do not specify a hypothesized median, a one-sample Wilcoxon test tests whether
the sample median is different from zero.
Options
specify a level of confidence for the confidence interval. The default is 95%.
define the alternative hypothesis by choosing less than (lower-tailed), not equal (two-tailed), or
greater than (upper-tailed). The default is a two-tailed test.
Method
Test for the median
MINITAB first eliminates any observations equal to the hypothesized median. Then the pairwise
(Walsh) averages, (Yi + Yj) / 2 for i ≤ j, are formed. The Wilcoxon statistic is the number of Walsh
averages exceeding the hypothesized median, plus one half the number of Walsh averages equal
to the hypothesized median. This statistic is approximately normally distributed. Under H0, the
distribution mean for the Wilcoxon is N (N + 1) / 4, where N is the number of observations for
the test. The attained p-value is calculated using a normal approximation with a continuity
correction.
An algebraically equivalent form of the test is based on ranks. Subtract the hypothesized median
from each observation, discard any zeros, and rank the absolute values of these differences. The
number of differences is the sample size reduced by one for each observation equal to the
median. If two or more absolute differences are tied, assign the average rank to each. The
Wilcoxon statistic is the sum of ranks corresponding to positive differences.
The Wilcoxon point estimate of the population median is the median of the Walsh averages.
MINITAB obtains the test statistic and point estimate of the population median using an
algorithm based on Johnson and Mizoguchi [4].
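The two equivalent forms of the statistic, and the point estimate, can be compared with a short Python sketch. It uses brute-force enumeration rather than the Johnson and Mizoguchi algorithm, the function names are hypothetical, and the null variance N(N + 1)(2N + 1) / 24 used for the normal approximation is the standard textbook value (an assumption here, with no adjustment for ties). The scores are illustrative values chosen for this sketch, not read from the worksheet.

```python
from math import erf, sqrt
from statistics import median


def walsh_averages(values):
    """All pairwise averages (Yi + Yj) / 2 for i <= j."""
    n = len(values)
    return [(values[i] + values[j]) / 2 for i in range(n) for j in range(i, n)]


def wilcoxon_by_walsh(data, hypothesized):
    """Count Walsh averages above the hypothesized median (ties count one half)."""
    kept = [x for x in data if x != hypothesized]
    walsh = walsh_averages(kept)
    w = sum(1 for a in walsh if a > hypothesized)
    w += 0.5 * sum(1 for a in walsh if a == hypothesized)
    return len(kept), w


def wilcoxon_by_ranks(data, hypothesized):
    """Sum of average ranks of |difference| over the positive differences."""
    diffs = [x - hypothesized for x in data if x != hypothesized]
    ordered = sorted(abs(d) for d in diffs)

    def average_rank(v):
        return ordered.index(v) + 1 + (ordered.count(v) - 1) / 2

    return sum(average_rank(abs(d)) for d in diffs if d > 0)


def two_sided_p(w, n):
    """Normal approximation with continuity correction (no tie adjustment)."""
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = max(abs(w - mean) - 0.5, 0.0) / sd
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))


scores = [77, 88, 85, 74, 75, 62, 80, 70, 83]  # illustrative values only
n_used, w = wilcoxon_by_walsh(scores, 77)
print(n_used, w, wilcoxon_by_ranks(scores, 77))
print(median(walsh_averages(scores)), round(two_sided_p(w, n_used), 3))
```

Both forms give the same statistic for these values, and the point estimate uses all of the observations, including the one equal to the hypothesized median.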
Confidence interval
The confidence interval is the set of values (d) for which the test of H0: median = d is not
rejected in favor of H1: median ≠ d, using α = 1 − (percent confidence) / 100. Because of the
discreteness of the Wilcoxon test statistic, it will seldom be possible to achieve the specified
confidence. The procedure prints the closest value, which is computed using a normal
approximation with a continuity correction.
Example of a one-sample Wilcoxon test for the median
Achievement test scores in science were recorded for 9 students. This test enables you to judge if
there is sufficient evidence for the population median being different from 77 using α = 0.05.
1 Open the worksheet EXH_STAT.MTW.
2 Choose Stat > Nonparametrics > 1-Sample Wilcoxon.
3 In Variables, enter Achievement.
4 Choose Test median, and enter 77 in the box. Click OK.
Session window output
                N for   Wilcoxon               Estimated
            N    Test  Statistic        P         Median
Achievem    9       8       19.5    0.889          77.50
A 95% confidence interval for the population median can be calculated by the one-sample
Wilcoxon method.
1 Open the worksheet EXH_STAT.MTW.
2 Choose Stat > Nonparametrics > 1-Sample Wilcoxon.
3 In Variables, enter Achievement.
4 Choose Confidence interval. Click OK.
Session window output
                        Achieved
     N   Median       Confidence   Confidence Interval
     9     77.5             95.6   ( 70.0, 84.0)
H0: η1 = η2 versus H1: η1 ≠ η2, where η1 and η2 are the two population medians.
Data
You will need two columns containing numeric data drawn from two populations. The columns
do not need to be the same length. MINITAB automatically omits missing data from the
calculations.
To calculate a Mann-Whitney test
1 Choose Stat > Nonparametrics > Mann-Whitney.
2 In First Sample, enter the column containing the sample data from one population.
3 In Second Sample, enter the column containing the other sample data.
4 If you like, use one or more of the options listed below, then click OK.
Options
specify a level of confidence for the confidence interval. The default is 95%.
define the alternative hypothesis by choosing less than (lower-tailed), not equal (two-tailed), or
greater than (upper-tailed). The default is a two-tailed test.
Method
To calculate the test statistic, W:
1 MINITAB ranks the two combined samples, with the smallest observation given rank 1, the
W.
The point estimate of the population median is the median of all the pairwise differences
between observations in the first sample and the second sample.
Mann-Whitney determines the attained significance level of the test using a normal
approximation with a continuity correction factor. If there are ties in the data, MINITAB adjusts
the significance level. The unadjusted significance level is conservative if ties are present; the
adjusted significance level is usually closer to the correct values, but is not always conservative.
The confidence interval is the set of values d for which the test of H0: η1 − η2 = d versus
H1: η1 − η2 ≠ d is not rejected, at α = 1 − (percent confidence) / 100. The method used to calculate the
confidence interval is described in [6].
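The ranking step and the point estimate can be sketched as follows. The description of W above is truncated, so the sketch assumes the usual rank-sum definition (W is the sum of the ranks of the first sample in the combined ranking, with tied observations given their average rank). The function names and the data are hypothetical.

```python
from statistics import median


def average_ranks(values):
    """Rank values from smallest (rank 1) up, giving ties their average rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(ordered):
        rank_of[v] = ordered.index(v) + 1 + (ordered.count(v) - 1) / 2
    return [rank_of[v] for v in values]


def mann_whitney_w(first, second):
    """Sum of the ranks of the first sample in the combined ranking."""
    ranks = average_ranks(list(first) + list(second))
    return sum(ranks[:len(first)])


def shift_estimate(first, second):
    """Median of all pairwise differences (first sample minus second sample)."""
    return median(x - y for x in first for y in second)


sample_1 = [3, 5, 8]      # hypothetical data
sample_2 = [1, 5, 6, 9]   # hypothetical data
print(mann_whitney_w(sample_1, sample_2), shift_estimate(sample_1, sample_2))
```

The significance level and confidence interval are not computed here; they need the tie-adjusted normal approximation and the algorithm in [6].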
Example of a two-sample Mann-Whitney test
Samples were drawn from two populations and diastolic blood pressure was measured. You
want to determine if there is evidence of a difference in the population locations without
assuming a parametric model for the distributions. Therefore, you choose to test the equality of
population medians using the Mann-Whitney test with α = 0.05 rather than using a two-sample
t-test, which tests the equality of population means.
1 Open the worksheet EXH_STAT.MTW.
2 Choose Stat > Nonparametrics > Mann-Whitney.
3 In First Sample, enter DBP1. In Second Sample, enter DBP2. Click OK.
Data
The response (measurement) data must be stacked in one numeric column. You must also have
a column that contains the factor levels or population identifiers. Factor levels can be numeric,
text, or date/time data. If you wish to change the order in which text levels are processed, you can
define your own order. See Ordering Text Categories in the Manipulating Data chapter in
MINITAB User's Guide 1. Calc > Make Patterned Data can be helpful in entering the level
values of a factor. See the Generating Patterned Data chapter in MINITAB User's Guide 1.
MINITAB automatically omits rows with missing responses or factor levels from the calculations.
Method
To calculate the test statistic, H:
1 First, MINITAB ranks the combined samples, with the smallest observation given rank 1, the
$$H = \frac{12 \sum_i n_i (\bar{R}_i - \bar{R})^2}{N(N + 1)}$$
where $n_i$ is the number of observations in group i, N is the total sample size, $\bar{R}_i$ is the average
of the ranks in group i, and $\bar{R}$ is the average of all the ranks.
Under the null hypothesis, the distribution of H can be approximated by a χ² distribution with
k − 1 degrees of freedom. The approximation is reasonably accurate if no group has fewer than five
observations. Large values of H suggest that there are some differences in location among the
populations.
Some authors (such as Lehmann [5]) suggest adjusting H when there are ties in the data.
Suppose there are J distinct values among the N observations, and for the jth distinct value, there
are $d_j$ tied observations ($d_j$ = 1 if there are no ties). Then the adjusted test statistic is:
$$H(\mathrm{adj}) = \frac{H}{1 - \left[ \sum_j (d_j^3 - d_j) / (N^3 - N) \right]}$$
When there are no ties, H(adj) = H. Under the null hypothesis, the distribution of H(adj) is also
approximately a χ² with k − 1 degrees of freedom. For small samples, the use of exact tables is
suggested (such as Hollander and Wolfe [3]). MINITAB displays H(adj) if there are ties.
MINITAB also displays a z-value for each group. The value of $z_i$ indicates how the mean rank ($\bar{R}_i$)
for group i differs from the mean rank ($\bar{R}$) for all N observations. For group i:
$$z_i = \frac{\bar{R}_i - (N + 1)/2}{\sqrt{(N + 1)(N/n_i - 1)/12}}$$
Under the null hypothesis, $z_i$ is approximately normal with mean 0 and variance 1.
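The formulas for H, H(adj), and the group z-values translate directly into code. The sketch below is a plain Python rendering under the assumption that the groups are passed as separate lists; the function names and the data are hypothetical, and no p-value is computed.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from smallest (rank 1) up, giving ties their average rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(ordered):
        rank_of[v] = ordered.index(v) + 1 + (ordered.count(v) - 1) / 2
    return [rank_of[v] for v in values]


def kruskal_wallis(groups):
    """Return (H, H_adj, mean_ranks, z_values) for a list of samples."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    overall = (n_total + 1) / 2  # average of all the ranks
    mean_ranks, z_values, weighted_ss, start = [], [], 0.0, 0
    for g in groups:
        n_i = len(g)
        r_bar = sum(ranks[start:start + n_i]) / n_i
        start += n_i
        mean_ranks.append(r_bar)
        weighted_ss += n_i * (r_bar - overall) ** 2
        z_values.append((r_bar - overall) / sqrt((n_total + 1) * (n_total / n_i - 1) / 12))
    h = 12 * weighted_ss / (n_total * (n_total + 1))
    # Tie adjustment: d_j is the number of observations sharing the j-th distinct value.
    tie_term = sum(pooled.count(v) ** 3 - pooled.count(v) for v in set(pooled))
    h_adj = h / (1 - tie_term / (n_total ** 3 - n_total))
    return h, h_adj, mean_ranks, z_values


h, h_adj, mean_ranks, z_values = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # hypothetical data
print(round(h, 3), round(h_adj, 3), mean_ranks, [round(z, 2) for z in z_values])
```

When no values are tied, the tie term is zero and the adjusted statistic equals H, as stated above.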
Example of a Kruskal-Wallis test
Measurements in growth were made on samples that were each given one of three treatments.
Rather than assuming a data distribution and testing the equality of population means with
one-way ANOVA, you decide to select the Kruskal-Wallis procedure to test H0: η1 = η2 = η3,
versus H1: not all η's are equal, where the η's are the population medians.
1 Open the worksheet EXH_STAT.MTW.
2 Choose Stat > Nonparametrics > Kruskal-Wallis.
3 In Response, enter Growth.
4 In Factor, enter Treatment. Click OK.
Session window output
             N   Median   Ave Rank       Z
             5    13.20        7.7   -0.45
             5    12.90        4.3   -2.38
             6    15.60       12.7    2.71
Overall     16                 8.5
DF = 2 P = 0.013
DF = 2 P = 0.013 (adjusted for ties)
An assumption of Mood's median test is that the data from each population are independent
random samples and the population distributions have the same shape. Mood's median test is
robust against outliers and errors in data and is particularly appropriate in the preliminary stages
of analysis. Mood's median test is more robust than the Kruskal-Wallis test against outliers, but
is less powerful for data from many distributions, including the normal.
Data
The response (measurement) data must be stacked in one numeric column. You must also have a
column that contains the factor levels or population identifiers. Factor levels can be numeric,
text, or date/time data. If you wish to change the order in which text levels are processed, you can
define your own order. See Ordering Text Categories in the Manipulating Data chapter in
MINITAB User's Guide 1. Calc > Make Patterned Data can be helpful in entering the level
values of a factor. See the Generating Patterned Data chapter in MINITAB User's Guide 1.
MINITAB automatically omits rows with missing responses or factor levels from the calculations.
To do a Mood's median test
1 Choose Stat > Nonparametrics > Mood's Median Test.
Options
Method
The overall median is the median of all the data. For each level, Mood's median test prints the
number of observations less than or equal to the overall median, and the number of observations
greater than the overall median. If there are k different levels, Mood's median test gives a 2 × k
table of counts. A χ² test for association is done on this table. Large values of χ² indicate that the
null hypothesis may be false. Only groups containing two or more observations are included in
the analysis.
If there are relatively few observations above the median due to ties with the median, then
observations equal to the median may be counted with those above the median.
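The counting step and the χ² statistic can be sketched in Python. This is an illustration with hypothetical function names: it uses the ordinary Pearson χ² for a 2 × k table, always counts values equal to the overall median with those below it, and leaves out the p-value. The counts passed to `chi_square` are the N<= and N> columns from the session output of the example in this section.

```python
from statistics import median


def mood_table(groups):
    """Return (overall_median, counts_at_or_below, counts_above) per group."""
    groups = [g for g in groups if len(g) >= 2]  # groups with fewer than two observations are dropped
    overall = median(x for g in groups for x in g)
    at_or_below = [sum(1 for x in g if x <= overall) for g in groups]
    above = [sum(1 for x in g if x > overall) for g in groups]
    return overall, at_or_below, above


def chi_square(at_or_below, above):
    """Pearson chi-square statistic and degrees of freedom for the 2 x k table."""
    table = [at_or_below, above]
    row_totals = [sum(row) for row in table]
    col_totals = [a + b for a, b in zip(at_or_below, above)]
    total = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / total
            stat += (observed - expected) ** 2 / expected
    return stat, len(col_totals) - 1


print(mood_table([[1, 2, 3], [4, 5, 6]]))         # hypothetical data
print(chi_square([47, 29, 15], [9, 24, 55]))      # counts from the example's output
```

For the example's counts the statistic is about 49.08 on 2 degrees of freedom, which is consistent with the reported P = 0.000.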
Example of Mood's median test
One hundred seventy-nine participants were given a lecture with cartoons to illustrate the
subject matter. Subsequently, they were given the OTIS test, which measures general
intellectual ability. Participants were rated by educational level: 0 = preprofessional, 1 =
professional, 2 = college student. The Mood's median test was selected to test H0: η1 = η2 = η3,
versus H1: not all η's are equal, where the η's are the median population OTIS scores for the
three education levels.
1 Open the worksheet CARTOON.MTW.
2 Choose Stat > Nonparametrics > Mood's Median Test.
3 In Response, enter Otis. In Factor, enter Ed. Click OK.
Session window output
DF = 2   P = 0.000
                                     Individual 95.0% CIs
   N<=    N>   Median   Q3-Q1   ----+---------+---------+---------+-
    47     9     97.5    17.3   (-----+-----)
    29    24    106.0    21.5            (------+------)
    15    55    116.5    16.3                            (----+----)
                                ----+---------+---------+---------+-
                                  96.0      104.0     112.0     120.0
versus
Randomized block experiments are a generalization of paired experiments, and the Friedman test
is a generalization of the paired sign test. Additivity (fit is sum of treatment and block effect) is not
required for the test, but is required for the estimate of the treatment effects.
Data
The response (measurement) data must be stacked in one numeric column. You must also have a
column that contains the treatment levels and a column that contains the block levels. Treatment
and block levels can be numeric, text, or date/time data. If you wish to change the order in which
text levels are processed, you can define your own order. See Ordering Text Categories in the
Manipulating Data chapter in MINITAB User's Guide 1. Calc > Make Patterned Data can be
helpful in entering the level values of a factor. See the Generating Patterned Data chapter in
MINITAB User's Guide 1.
You must have exactly one nonmissing observation per treatment-block combination. MINITAB
automatically omits rows with missing responses, treatment levels, or block levels from the
calculations.
To do a Friedman test
1 Choose Stat > Nonparametrics > Friedman.
Options
store the residuals. The residuals are the (observation adjusted for treatment effect) −
(adjusted block median).
store the fitted values. The fits are the (treatment effect) + (adjusted block median) or
observation − residual.
Method
To calculate the test statistic, S:
1 MINITAB first ranks the data separately within each block.
2 Next, sum the ranks for each treatment.
3 Calculate the test statistic (S), which is a constant times
$$\sum_j (R_j - \bar{R})^2$$
where $R_j$ is the sum of ranks for treatment j and $\bar{R}$ is the average of the $R_j$'s. See standard
nonparametric texts (such as [3]) for details on computing S adjusted for ties.
The test statistic has an approximately χ² distribution, with associated degrees of freedom of
(number of treatments − 1). If there are ties within one or more blocks, MINITAB uses the
average rank and prints a test statistic that has been corrected for ties. If there are many ties, the
uncorrected test statistic is conservative; the corrected version is usually closer, but may be either
conservative or liberal. For details of the method used, see [3].
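The three steps for S can be sketched in Python. The text gives S only as "a constant times" the sum of squares, so the constant 12 / (b k (k + 1)), with b blocks and k treatments, is the standard textbook value and an assumption here; the tie correction is not attempted. The function names are hypothetical, and the data are the treatment-by-block values used in the treatment effects illustration in this section.

```python
from math import exp


def average_ranks(values):
    """Rank values from smallest (rank 1) up, giving ties their average rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(ordered):
        rank_of[v] = ordered.index(v) + 1 + (ordered.count(v) - 1) / 2
    return [rank_of[v] for v in values]


def friedman(blocks):
    """blocks[b][j] is the response for treatment j in block b.
    Returns (rank_sums, S), with S not corrected for ties."""
    n_blocks, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for block in blocks:                     # step 1: rank within each block
        for j, r in enumerate(average_ranks(block)):
            rank_sums[j] += r                # step 2: sum the ranks per treatment
    r_bar = sum(rank_sums) / k
    s = 12 / (n_blocks * k * (k + 1)) * sum((r - r_bar) ** 2 for r in rank_sums)
    return rank_sums, s


blocks = [
    [0.15, 0.55, 0.55],
    [0.26, 0.26, 0.66],
    [0.23, -0.22, 0.77],
    [0.99, 0.99, 0.99],
]
rank_sums, s = friedman(blocks)
print(rank_sums, s, round(exp(-s / 2), 3))  # exp(-S/2) is the chi-square upper tail for 2 DF only
```

With three treatments there are 2 degrees of freedom, and for that case the χ² upper-tail probability has the closed form exp(−S / 2) used in the last line.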
                 Block
Treatment     1      2      3      4
    1       0.15   0.26   0.23   0.99
    2       0.55   0.26  -0.22   0.99
    3       0.55   0.66   0.77   0.99
To calculate treatment effects (Doksum method, [3] pages 158–161), first find the median
difference between pairs of treatments. The pairwise differences for treatment 1 minus treatment 2
are 0.15 − 0.55 = −0.4, 0.26 − 0.26 = 0, 0.23 − (−0.22) = 0.45, and 0.99 − 0.99 = 0. The median of
the differences is 0. Doing this for the other two pairs gives −0.4 for treatment 1 minus treatment 3,
and −0.2 for treatment 2 minus treatment 3.
The effect for each treatment is the average of the median differences of that treatment with all
other treatments (including itself). For the data in this example, effect(2) = [median (2 − 1) +
median (2 − 2) + median (2 − 3)]/3 = (0.00 + 0.00 − 0.20)/3 = −0.0667. Similarly, effect(1) =
−0.1333 and effect(3) = 0.20.
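The arithmetic in this illustration can be checked with a few lines of Python. The function name is hypothetical; the data are the twelve values from the table above, arranged one list per treatment with the blocks in the same order.

```python
from statistics import median


def treatment_effects(by_treatment):
    """by_treatment[t] lists the responses for treatment t, one per block.
    Each effect is the average, over all treatments (including itself),
    of the median blockwise difference."""
    k = len(by_treatment)
    effects = []
    for a in range(k):
        medians = [
            median(x - y for x, y in zip(by_treatment[a], by_treatment[b]))
            for b in range(k)
        ]
        effects.append(sum(medians) / k)
    return effects


by_treatment = [
    [0.15, 0.26, 0.23, 0.99],   # treatment 1
    [0.55, 0.26, -0.22, 0.99],  # treatment 2
    [0.55, 0.66, 0.77, 0.99],   # treatment 3
]
print([round(e, 4) for e in treatment_effects(by_treatment)])
```

The effects sum to zero, because the median difference for b minus a is the negative of the one for a minus b.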
Example of a Friedman test
A randomized block experiment was conducted to evaluate the effect of a drug treatment on
enzyme activity. Three different drug therapies were given to four animals, with each animal
belonging to a different litter. The Friedman test provides the desired test of H0: all treatment
effects are zero vs. H1: not all treatment effects are zero.
1 Open the worksheet EXH_STAT.MTW.
2 Choose Stat > Nonparametrics > Friedman.
3 In Response, enter EnzymeActivity.
4 In Treatment, enter Therapy. In Blocks, enter Litter. Click OK.
Session window output
DF = 2 P = 0.305
DF = 2 P = 0.150 (adjusted for ties)
                    Est   Sum of
Therapy    N     Median    Ranks
      1    4     0.2450      6.5
      2    4     0.3117      7.0
      3    4     0.5783     10.5
Grand median = 0.3783
Runs Test
Use Runs Test to see if the data order is random. This is a nonparametric test because no
assumption is made about population distribution parameters. Use this test when you want to
determine if the order of responses above or below a specified value is random. A run is a set of
consecutive observations either all less than or all greater than some value.
Stat > Quality Tools > Run Chart generates a run chart and performs other tests for
randomness. See Run Chart on page 10-2 for more information.
Data
You need at least one column of numeric data. If you have more than one column of data,
MINITAB performs a runs test separately for each column.
You may have missing data at the beginning or end of a data column, but not in the middle. You
must omit missing data from the middle of a worksheet column before using this procedure.
To do a runs test
1 Choose Stat > Nonparametrics > Runs Test.
2 In Variables, enter the column(s) containing the data you want to test for randomness.
3 If you like, use the option listed below, and click OK.
Options
You can specify a value other than the mean as the value for defining the runs.
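Counting runs about a reference value is easy to sketch in Python. The function name and data are hypothetical, and one convention is assumed rather than taken from the text: an observation equal to the reference value is counted with those below it. The significance calculation is not included.

```python
from statistics import mean


def count_runs(data, k=None):
    """Return (k, runs, n_above, n_not_above) for runs above / not above k.
    k defaults to the mean of the data."""
    if not data:
        raise ValueError("data must contain at least one observation")
    if k is None:
        k = mean(data)
    above = [x > k for x in data]
    # A new run starts whenever consecutive observations fall on different sides of k.
    runs = 1 + sum(1 for a, b in zip(above, above[1:]) if a != b)
    n_above = sum(above)
    return k, runs, n_above, len(data) - n_above


responses = [0, 0, 1, 3, 2, 0, 1, 1]  # hypothetical coded answers
print(count_runs(responses))
print(count_runs(responses, k=0.5))   # a user-specified value instead of the mean
```

Passing `k` corresponds to the option of defining the runs about a value other than the mean.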
Example of a runs test
Suppose an interviewer selects 30 people at random and asks them each a question for which
there are four possible answers. Their responses are coded 0, 1, 2, and 3. You wish to perform a
runs test in order to check the randomness of answers. Answers that are not in random order may
indicate that there is a gradual bias in the phrasing of the questions or that subjects are not being
selected at random.
1 Open the worksheet EXH_STAT.MTW.
2 Choose Stat > Nonparametrics > Runs Test.
3 In Variables, enter Response. Click OK.
Session window output
1.2333
Pairwise Averages
Pairwise Averages calculates and stores the average for each possible pair of values in a single
column, including each value with itself. Pairwise averages are also called Walsh averages.
Pairwise averages are used, for example, for the Wilcoxon method.
Data
You must have one numeric data column. If you have missing data, the pairwise averages
involving the missing values are set to missing.
To calculate pairwise averages
1 Choose Stat > Nonparametrics > Pairwise Averages.
2 In Variable, enter the column for which you want to obtain averages.
3 In Store averages in, enter a column name or number to store the pairwise (Walsh) averages.
Options
You can store the indices for each average (in two columns). The Walsh average, (xi + xj) / 2, has
indices i and j. The value of i is put in the first storage column and the value of j is put in the
second storage column.
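A minimal Python sketch of the same calculation follows. The function name is hypothetical, missing values are represented by `None`, and the order in which the pairs are produced is an assumption, not necessarily the order MINITAB stores them in.

```python
def pairwise_averages(values):
    """Return (average, i, j) for every pair with i <= j, using 1-based indices.
    An average involving a missing value (None) is itself missing."""
    result = []
    n = len(values)
    for i in range(n):
        for j in range(i, n):
            if values[i] is None or values[j] is None:
                average = None
            else:
                average = (values[i] + values[j]) / 2
            result.append((average, i + 1, j + 1))
    return result


for row in pairwise_averages([1, 3, None]):  # hypothetical column with one missing value
    print(row)
```

A column of n values yields n(n + 1) / 2 averages, because each value is also paired with itself.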
Pairwise Differences
Pairwise Differences calculates and stores the differences between all possible pairs of values
formed from two columns. These differences are useful for nonparametric tests and confidence
intervals. For example, the point estimate given by Mann-Whitney (page 5-11) can be computed
as the median of the differences.
Data
You must have two numeric data columns. If you have missing data, the pairwise differences
involving the missing values are set to missing.
To calculate pairwise differences
1 Choose Stat > Nonparametrics > Pairwise Differences.
2 In First variable, enter a column. The column you enter in Second variable will be
Options
You can store the indices for each difference (in two columns). The difference, (xi − yj), has
indices i and j. The value of i is put in the first storage column and the value of j is put in the
second storage column.
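The same idea for two columns can be sketched in Python. The function name is hypothetical, missing values are represented by `None`, and the storage order is an assumption.

```python
def pairwise_differences(first, second):
    """Return (x_i - y_j, i, j) for every pair, using 1-based indices.
    A difference involving a missing value (None) is itself missing."""
    result = []
    for i, x in enumerate(first, start=1):
        for j, y in enumerate(second, start=1):
            difference = None if x is None or y is None else x - y
            result.append((difference, i, j))
    return result


for row in pairwise_differences([3, 5], [1, 4, None]):  # hypothetical columns
    print(row)
```

Taking the median of the nonmissing differences gives the kind of point estimate described for Mann-Whitney.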
Pairwise Slopes
Pairwise Slopes calculates and stores the slope between all possible pairs of points, where a row
in the y-x columns defines a point in the plane. This procedure is useful for finding robust estimates
of the slope of a line through the data.
Data
You must have two numeric data columns, one that contains the response variable (y) and one
that contains the predictor variable (x). If you have missing data or the slope is not defined (e.g.
slope of a line parallel to the y axis), the slope will be stored as missing.
To calculate pairwise slopes
1 Choose Stat > Nonparametrics > Pairwise Slopes.
Options
You can store the indices for each slope in two columns. The slope between points i and j has indices i and j.
The value of i is put in the first storage column and the value of j is put in the second storage
column.
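A short Python sketch shows how pairwise slopes lead to a robust slope estimate. The function name and data are hypothetical. Two choices are assumptions made for the sketch: each pair of points is taken once (i < j), and the median of the defined slopes is used as the robust estimate.

```python
from statistics import median


def pairwise_slopes(y, x):
    """Return (slope, i, j) for every pair of points with i < j (1-based).
    The slope is missing (None) if a value is missing or the x values coincide."""
    result = []
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            values = (x[i], x[j], y[i], y[j])
            if any(v is None for v in values) or x[i] == x[j]:
                slope = None  # undefined slope
            else:
                slope = (y[j] - y[i]) / (x[j] - x[i])
            result.append((slope, i + 1, j + 1))
    return result


x = [1, 2, 3, 3]  # hypothetical predictor
y = [2, 4, 7, 9]  # hypothetical response
slopes = pairwise_slopes(y, x)
print(slopes)
print(median(s for s, _, _ in slopes if s is not None))
```

The pair with equal x values produces a missing slope, mirroring the rule in the Data section above.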
References
[1] J.D. Gibbons (1976). Nonparametric Methods for Quantitative Analysis, Holt, Rinehart,
and Winston.
[2] T.P. Hettmansperger and S.J. Sheather (1986). Confidence Intervals Based on Interpolated
Order Statistics, Statistics and Probability Letters, 4, pp. 75–79.
[3] M. Hollander and D.A. Wolfe (1973). Nonparametric Statistical Methods, John Wiley &
Sons.
[4] D.B. Johnson and T. Mizoguchi (1978). Selecting the Kth Element in X + Y and X1 + X2
+ ... + Xm, SIAM Journal of Computing, 7, pp. 147–153.
[5] E.L. Lehmann (1975). Nonparametrics: Statistical Methods Based on Ranks, Holden-Day.
[6] J.W. McKean and T.A. Ryan, Jr. (1977). An Algorithm for Obtaining Confidence Intervals
and Point Estimates Based on Ranks in the Two Sample Location Problem, Transactions on
Mathematical Software, pp. 183–185.
[7] G. Noether (1971). Statistics: A Non-Parametric Approach, Houghton-Mifflin.
Tables
See also,
Tables Overview
Table procedures summarize data into table form or perform a further analysis of a tabled
summary. Your data need to be arranged in the worksheet in a certain way in order to do these
procedures. The different ways of arranging your data are shown in Arrangement of Input
Data on page 6-3. The Tables procedures are described below.
Cross Tabulation displays one-way, two-way, and multi-way tables containing counts,
percents, and summary statistics, such as means, standard deviations, and maximums, for
associated variables. To use this procedure, your data must be in raw form, or they can be in
frequency form if summary statistics for associated variables are not desired.
Tally displays counts, cumulative counts, percents, and cumulative percents for each unique
value of a variable when input data are in raw form.
Chi-square tests
Chi-Square Test for Association tests for non-independence in a two-way classification. Use
this procedure to test if the probabilities of items or subjects being classified for one variable
depend upon the classification of the other variable. Your data can be in raw, collapsed, or
contingency table form.
Chi-Square Test for Goodness-of-Fit tests if the sample outcomes result from a known
discrete probability model.
Correspondence analysis
Raw data
  C1   C2
   1    1
   2    1
   2    3
   1    2
   .    .
   .    .
   .    .

Frequency data
Each row represents a unique combination of group codes:
C1 = gender
C2 = politics
C3 = the number of observations at that level
  C1   C2   C3
   1    1   17
   1    2   10
   1    3   19
   2    1   18
   2    2   19
   2    3   17

Contingency table
Each cell contains counts:
C1 = males
C2 = females
Rows 1-3 represent the three levels for politics, respectively
  C1   C2
  17   18
  10   19
  19   17

Indicator variables
One row for each observation:
C1 = 1 if male, = 0 if female
C2 = 1 if female, = 0 if male
C3 = 1 if Democrat, = 0 otherwise
C4 = 1 if Republican, = 0 otherwise
C5 = 1 if Other, = 0 otherwise
  C1   C2   C3   C4   C5
   1    0    1    0    0
   0    1    1    0    0
   0    1    0    0    1
   1    0    0    1    0
   .    .    .    .    .
   .    .    .    .    .
   .    .    .    .    .
To obtain frequency data from raw data, see Store Descriptive Statistics on page 1-9. To create a
contingency table from raw data or frequency data, see Cross Tabulation on page 6-3, and copy
and paste the table output into your worksheet. To create indicator variables from raw data, use
Calc > Make Indicator Variables.
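Outside MINITAB, the relationship between the four arrangements can be sketched in Python. The function names are hypothetical, and the codes (gender 1 = male, 2 = female; politics 1 = Democrat, 2 = Republican, 3 = Other) are inferred from the rows shown above, not stated in the text.

```python
from collections import Counter


def to_frequency(raw_rows):
    """Raw (C1, C2) rows -> sorted (C1, C2, count) rows."""
    counts = Counter(raw_rows)
    return [(c1, c2, n) for (c1, c2), n in sorted(counts.items())]


def to_contingency(freq_rows):
    """Frequency rows -> table with one row per C2 level and one column per C1 level."""
    row_levels = sorted({c2 for _, c2, _ in freq_rows})
    col_levels = sorted({c1 for c1, _, _ in freq_rows})
    lookup = {(c1, c2): n for c1, c2, n in freq_rows}
    return [[lookup.get((c1, c2), 0) for c1 in col_levels] for c2 in row_levels]


def to_indicators(column):
    """One 0/1 indicator column per distinct level, levels in sorted order."""
    levels = sorted(set(column))
    return [[1 if value == level else 0 for value in column] for level in levels]


frequency = [(1, 1, 17), (1, 2, 10), (1, 3, 19), (2, 1, 18), (2, 2, 19), (2, 3, 17)]
print(to_contingency(frequency))
print(to_indicators([1, 2, 2, 1]), to_indicators([1, 1, 3, 2]))
print(to_frequency([(1, 1), (2, 1), (2, 3), (1, 2), (1, 1)]))  # hypothetical raw rows
```

Applied to the frequency data shown above, `to_contingency` reproduces the contingency table, and the indicator columns match the first four indicator rows.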
Cross Tabulation
Cross tabulation prints one-way, two-way, and multi-way tables containing counts, percents, and
summary statistics, such as means, standard deviations, and maximums, for associated variables.
To use this procedure, your data must be in raw form, or they can be in collapsed form if
summary statistics for associated variables are not desired. See the Chi-Square Test for Association
on page 6-14 for performing a χ² test for association.
Data
You can arrange your data in the worksheet in raw or frequency form. See Arrangement of Input
Data on page 6-3. The data must be in raw form to display summary statistics for associated
variables.
If your data are in raw form, you can have between two and ten classification columns with each
row representing one observation. The classification or category data may be numeric, text, or
date/time. If you wish to change the order in which text categories are processed from their
default alphabetized order, you can define your own order. See Ordering Text Categories in the
Manipulating Data chapter in MINITAB Users Guide 1. Associated variables must be numeric
and can contain any numeric values. By default, Cross Tabulation omits rows with missing
classification values. Optionally, you can include these rows.
If your data are in frequency or collapsed form, you can have between two and ten columns
containing your categories and another column containing the frequencies for the category
combinations. The category data may be numeric, text, or date/time, and may contain any values.
The frequency data must be integers. By default, Cross Tabulation omits rows with missing
data. Optionally, you can include these rows.
If you have two category columns, a two-way table will be tabulated. Otherwise you can obtain
multiple two-way tables (the default) or multi-way tables.
To cross tabulate data
1 Choose Stat > Tables > Cross Tabulation.
2 To enter raw data, enter the columns containing the raw data in Classification variables.
3 If you like, use one or more of the options listed below, then click OK.
Options
Cross Tabulation dialog box
display the percent of each cell within its row, its column, or the total two-way table. If you do
not choose a percent option, MINITAB displays counts by default. If you choose a percent
option and want counts, you must check Counts.
perform a χ² test for association for each two-way table. See Chi-Square Test for Association on
page 6-14.
calculate and display the mean, median, minimum, maximum, sum, and standard deviation
for associated variables
display the data, the number of nonmissing data, the number of missing data, the proportion
of observations equal to a specified value, and the proportion of observations between
specified values for associated variables
adjust the table layout. See Changing the table layout below.
use missing values as a level accepted by MINITAB. However, MINITAB does not include
missing levels in calculations of marginal statistics, percents, or the χ² test for association.
specify the values to display marginal statistics for. By default, marginal statistics are printed
for all rows and columns. Marginal statistics are summaries for rows and columns of a table.
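The three percent options and the marginal counts can be illustrated with a short Python sketch. It is not MINITAB's implementation; the `margins` and `percents` helpers are hypothetical names, and the counts are the Gender by Activity counts used in the examples of this chapter.

```python
def margins(table):
    """Return (row totals, column totals, grand total) for a dict-of-dicts table."""
    row_totals = {r: sum(cells.values()) for r, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for c, n in cells.items():
            col_totals[c] = col_totals.get(c, 0) + n
    return row_totals, col_totals, sum(row_totals.values())

def percents(table, basis):
    """Cell percents relative to the 'row', 'column', or 'total' count."""
    row_totals, col_totals, total = margins(table)

    def denominator(r, c):
        return {"row": row_totals[r], "column": col_totals[c], "total": total}[basis]

    return {r: {c: 100.0 * n / denominator(r, c) for c, n in cells.items()}
            for r, cells in table.items()}

# Gender by Activity counts from the cross tabulation examples.
counts = {
    "Female": {"Slight": 4, "Moderate": 26, "A lot": 5},
    "Male": {"Slight": 5, "Moderate": 35, "A lot": 16},
}

row_totals, col_totals, grand_total = margins(counts)
print(row_totals, col_totals, grand_total)
for basis in ("row", "column", "total"):
    print(basis, percents(counts, basis))
```

Row percents sum to 100 within each row, column percents within each column, and total percents over the whole two-way table.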
2 To assign the row variables, enter a number in Use the first ___ classification variables for
rows. MINITAB will assign the specified number of variables to rows using the order in which
they were entered in the main dialog box.
3 To assign the column variables, enter a 0, 1, or 2 in and the next ___ for columns. MINITAB
will assign the specified number of variables to columns using the order in which they were
entered in the main dialog box.
MINITAB uses category variables that are not assigned to define the combination of categories at a
higher level. For default two-way and higher tables, the first category variable is the row variable
and the second is the column variable.
You will not see the order of levels changed in the worksheet but commands will process these
levels in the new order.
Tip
Between examples, set the dialog box settings to their default state by pressing F3.
Resetting the dialog boxes to their default settings eliminates unwanted dialog box
changes made previously. Value ordering will not be affected by resetting dialogs to their
default state.
The following example illustrates output for a two-way table with summary statistics for
associated variables.
1 Open the worksheet EXH_TABL.MTW. Set the value order for the variable Activity as shown
in Tips on doing the examples on page 6-6, if you have not already done so.
2 Choose Stat > Tables > Cross Tabulation.
3 In Classification variables, enter Gender Activity. Under Display, check Counts.
4 Click Summaries. In Associated variables, enter Height Weight. Under Display, check
Columns: Activity

            Slight  Moderate     A lot       All

Female           4        26         5        35
            65.000    65.615    64.600    65.400
            123.00    124.46    121.00    123.80

Male             5        35        16        56
            72.400    70.429    71.125    70.804
            170.00    158.09    155.50    158.41

All              9        61        21        91
            69.111    68.377    69.571    68.725
            149.11    143.75    147.29    145.10
contains four observations. These four women have a mean height of 65 inches and a mean
weight of 123 pounds.
The column headed All contains the row margins. For example, the first number in this column,
35, is the total number of observations in row one, and the second number, 65.400, is the mean
height for these 35 women. The row headed All contains the corresponding column margins.
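As a check on how the margins relate to the cells, a marginal mean is the count-weighted mean of the cell means in that row or column. The Python sketch below illustrates this with the height figures from the table; `pooled_mean` is a hypothetical helper name, and small differences from the printed margins are due to rounding of the cell means.

```python
def pooled_mean(cells):
    """Combine (count, mean) pairs into the overall count and weighted mean."""
    total_n = sum(n for n, _ in cells)
    return total_n, sum(n * m for n, m in cells) / total_n

# (count, mean height) for Slight, Moderate, A lot, taken from the table.
female_height = [(4, 65.000), (26, 65.615), (5, 64.600)]
male_height = [(5, 72.400), (35, 70.429), (16, 71.125)]

print(pooled_mean(female_height))  # compare with the Female row margin
print(pooled_mean(male_height))    # compare with the Male row margin
```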
Example of using cross tabulation to display data
This example shows how to display data values for a variable associated with the classification
variables.
1 Open the worksheet EXH_TABL.MTW.
2 Choose Stat > Tables > Cross Tabulation.
3 In Classification variables, enter Gender Smokes.
4 Click Summaries. In Associated variables, enter Pulse.
5 Under Display, check Data. Click OK in each dialog box.
Session
window
output
Columns: Smokes
No
Yes
Female 96.000
62.000
82.000
68.000
96.000
78.000
80.000
84.000
61.000
64.000
60.000
72.000
58.000
66.000
84.000
62.000
66.000
80.000
78.000
68.000
72.000
82.000
87.000
78.000
78.000
100.000
88.000
62.000
94.000
88.000
76.000
90.000
68.000
86.000
76.000
Male
No
Yes
64.000
58.000
64.000
74.000
84.000
68.000
62.000
76.000
80.000
68.000
60.000
62.000
72.000
70.000
74.000
66.000
62.000
60.000
62.000
76.000
74.000
74.000
68.000
68.000
64.000
58.000
54.000
76.000
88.000
70.000
78.000
90.000
72.000
68.000
84.000
74.000
68.000
62.000
66.000
90.000
92.000
66.000
70.000
68.000
70.000
72.000
68.000
54.000
72.000
82.000
70.000
62.000
90.000
70.000
92.000
60.000
Example of cross tabulation with three classification variables
in Some simple tables on page 6-6, if you have not already done so.
2 Choose Stat > Tables > Cross Tabulation.
3 In Classification variables, enter Gender Activity Smokes. Click OK.
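Session window output

Smokes = No
Columns: Activity

            Slight  Moderate     A lot       All

Female           3        20         4        27
Male             3        22        12        37
All              6        42        16        64

Smokes = Yes
Columns: Activity

            Slight  Moderate     A lot       All

Female           1         6         1         8
Male             2        13         4        19
All              3        19         5        27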
This example uses the same data in a three-way table as in the example above, but the table
layout is different.
1 Open the worksheet EXH_TABL.MTW. Set the value order for the variable Activity as shown
in Some simple tables on page 6-6, if you have not already done so.
2 Choose Stat > Tables > Cross Tabulation.
3 In Classification variables, enter Gender Activity Smokes.
4 Click Options. In Use the first ___ classification variables for rows enter 1. In and the next
               Slight          Moderate           A lot          All
            -------------    -------------    -------------     -----
              No     Yes       No     Yes       No     Yes       All

Female         3       1       20       6        4       1        35
Male           3       2       22      13       12       4        56
All            6       3       42      19       16       5        91
This example displays a table of descriptive statistics for two measurement variables classified in
two ways.
1 Open the worksheet EXH_TABL.MTW. Set the value order for the variable Activity as shown
in Some simple tables on page 6-6, if you have not already done so.
2 Choose Stat > Tables > Cross Tabulation.
3 In Classification variables, enter Gender Activity.
4 Click Options. In Use the first ___ classification variables for rows enter 2. In and the next
Session window output

                     Height    Weight    Height    Weight    Height    Weight
                       Mean      Mean     StDev     StDev         N         N

Female  Slight       65.000    123.00     2.160      7.70         4         4
        Moderate     65.615    124.46     2.735     12.78        26        26
        A lot        64.600    121.00     2.074     21.02         5         5

Male    Slight       72.400    170.00     2.510     19.69         5         5
        Moderate     70.429    158.09     2.521     20.58        35        35
        A lot        71.125    155.50     2.649     13.21        16        16

All     All          68.725    145.10     3.679     23.87        91        91
Data
Your data must be arranged in your worksheet as columns of raw data. See Arrangement of Input
Data on page 6-3. Data may be numeric, text, or date/time. If you wish to change the order in
which text data are processed from their default alphabetized order, you can define your own
order. See Ordering Text Categories in the Manipulating Data chapter in MINITAB Users Guide
1. Column lengths do not need to be equal.
To tally
1 Choose Stat > Tables > Tally.
Options
You can display the counts, percents, cumulative counts, and cumulative percents of each
nonmissing value.
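The four tally statistics are simple running summaries of the category counts. The Python sketch below shows one way to compute them; it is an illustration rather than MINITAB's implementation, the `tally` function is a hypothetical name, and the raw column is rebuilt from the Activity margins (9, 61, and 21) shown in the cross tabulation examples. Missing values are represented here by `None`, which is an assumption of this sketch.

```python
from itertools import accumulate

def tally(values, order):
    """Return (level, count, cumulative count, percent, cumulative percent) rows.

    Values not listed in `order` (for example None for missing) are ignored.
    """
    counts = [sum(1 for v in values if v == level) for level in order]
    n = sum(counts)
    cum_counts = list(accumulate(counts))
    return [(level, c, cc, 100.0 * c / n, 100.0 * cc / n)
            for level, c, cc in zip(order, counts, cum_counts)]

# Raw Activity column rebuilt from the counts in the examples, plus one missing value.
activity = ["Slight"] * 9 + ["Moderate"] * 61 + ["A lot"] * 21 + [None]

for level, count, cum, pct, cum_pct in tally(activity, ["Slight", "Moderate", "A lot"]):
    print(f"{level:<10}{count:>6}{cum:>8}{pct:>9.2f}{cum_pct:>9.2f}")
```

Passing the levels in an explicit order plays the same role as defining a value order for a text column.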
Example of tally with all four statistics
This example generates frequency counts, cumulative counts, percents, and cumulative
percents.
1 Open the worksheet EXH_TABL.MTW. Set the value order for the variable Activity as shown
in Some simple tables on page 6-6, if you have not already done so.
2 Choose Stat > Tables > Tally.
3 In Variables, enter Activity. Under Display, check Counts, Percents, Cumulative counts,
Data
You can have data arranged in your worksheet in raw, frequency, or contingency table form. See
Arrangement of Input Data on page 6-3. The form of the worksheet data determines acceptable
data values.
Raw data
If your data are in raw form, you can have between two and ten columns with each row
representing one observation. The data represent categories and can be numeric, text, or date/
time, and can contain any values. If you wish to change the order in which text categories are
processed from their default alphabetized order, you can define your own order. See Ordering
Text Categories in the Manipulating Data chapter in MINITAB Users Guide 1. When you enter:
Frequency data
If your data are in frequency or collapsed form, you can have between two and ten columns
containing your categories with another column containing the frequencies for the category
combinations. The category data may be numeric, text, or date/time, and may contain any values.
If you wish to change the order in which text categories are processed from their default
alphabetized order, you can define your own order. See Ordering Text Categories in the
Manipulating Data chapter in MINITAB Users Guide 1. The frequency data must be integers.
When you enter:
more than two category columns, MINITAB tabulates multiple two-way tables and performs a
χ² test for association on each table
category combination. You must delete rows with missing data from the worksheet before using
this procedure.
To calculate a χ² statistic for a contingency table with more than the seven columns allowed
with Cross Tabulation, you can use Simple Correspondence Analysis (page 6-21).
More
For raw data, enter the columns containing the raw data in Classification variables
Options
display the expected count for each cell. By default, MINITAB displays the expected count
with contingency table data.
display the standardized residual for each cell; its square is the contribution to χ² from that
cell. By default, MINITAB displays the standardized residual with contingency table data.
2 In Columns containing the table, enter the columns containing the contingency table data.
Click OK.
Method
Under the null hypothesis of no association, expected frequencies for each (i, j) cell of the r × c
table are:

$$E_{ij} = \frac{(\text{total of row } i)\,(\text{total of column } j)}{\text{total number of observations}}$$

The total χ² is calculated by

$$\chi^2 = \sum_{i} \sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where $O_{ij}$ = observed frequency in cell (i, j) and $E_{ij}$ = expected frequency for cell (i, j). The
degrees of freedom associated with a contingency table possessing r rows and c columns equals
(r − 1)(c − 1).
The standardized residual for each cell is:

$$\text{Standardized residual} = \frac{\text{observed count} - \text{expected count}}{\sqrt{\text{expected count}}}$$

The square of the standardized residual is that cell's contribution to the χ² statistic. Use the χ²
contribution from each cell to see how different cells contribute to a judgement about
the degree of association.
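The calculations above can be sketched in Python as follows. This is an illustration under the formulas just given, not MINITAB's own code; `chi_square_table` is a hypothetical function name, and the observed counts are the Gender by Activity counts from the raw-data example in this section.

```python
from math import sqrt

def chi_square_table(observed):
    """Return expected counts, standardized residuals, total chi-square, and df."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # E_ij = (total of row i)(total of column j) / (total number of observations)
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    # Standardized residual = (observed - expected) / sqrt(expected)
    residuals = [[(o - e) / sqrt(e) for o, e in zip(o_row, e_row)]
                 for o_row, e_row in zip(observed, expected)]
    # Each squared residual is one cell's contribution to the total chi-square.
    chi_square = sum(r * r for row in residuals for r in row)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, residuals, chi_square, df

# Rows: Female, Male. Columns: Slight, Moderate, A lot.
observed = [[4, 26, 5], [5, 35, 16]]
expected, residuals, chi_square, df = chi_square_table(observed)

for e_row, r_row in zip(expected, residuals):
    print([round(e, 2) for e in e_row], [round(r, 2) for r in r_row])
print("chi-square =", round(chi_square, 3), "df =", df)
```

Rounded to two decimals, the expected counts and standardized residuals agree with the values printed in the example output for this table.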
Exercise caution when there are small expected counts. MINITAB will give a count of the number
of cells that have expected frequencies less than five. Some statisticians hesitate to use the χ² test
if more than 20% of the cells have expected frequencies below five, especially if the p-value is
small and these cells give a large contribution to the total χ² value.
If, in addition, some cells have expected frequencies less than one, the total χ² is not printed,
since most statisticians would not use the χ² test in this case. If some cells have small expected
frequencies, combining or omitting row and/or column categories can often help.
Yates' correction for 2 × 2 tables is not used.
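The small-expected-count checks described above are easy to reproduce by hand. The Python sketch below is illustrative only: `small_cell_report` is a hypothetical helper, the thresholds of five and one are the ones quoted in the text, and the counts are the gender by political party counts from the contingency table example in this section.

```python
def expected_counts(observed):
    """Expected counts under no association: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]

def small_cell_report(observed):
    """Count cells with expected frequency below five and below one."""
    cells = [e for row in expected_counts(observed) for e in row]
    below_five = sum(1 for e in cells if e < 5)
    below_one = sum(1 for e in cells if e < 1)
    return {
        "cells": len(cells),
        "below_five": below_five,
        "share_below_five": below_five / len(cells),
        "below_one": below_one,
    }

# Rows: males, females. Columns: Democrat, Republican, Other.
party = [[28, 18, 4], [22, 27, 1]]
report = small_cell_report(party)
print(expected_counts(party))
print(report)
```

Here two of the six cells have expected counts below five, which is more than the 20% guideline, while no cell falls below one.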
Example of χ² test with raw data
in Some simple tables on page 6-6 if you have not already done so.
2 Choose Stat > Tables > Cross Tabulation.
3 In Classification variables, enter Gender Activity. Check Chi-Square analysis and then
Columns: Activity

            Slight  Moderate     A lot       All

Female           4        26         5        35
              3.46     23.46      8.08     35.00
              0.29      0.52     -1.08        --

Male             5        35        16        56
              5.54     37.54     12.92     56.00
             -0.23     -0.41      0.86        --

All              9        61        21        91
              9.00     61.00     21.00     91.00
                --        --        --        --
Suppose you are interested in the connection between gender and political party preference. You
query 100 people about their political affiliation and record the number of males (row 1) and
females (row 2) for each political party. The worksheet data appears as follows:
C1          C2            C3
Democrat    Republican    Other
28          18            4
22          27            1
Session window output

         Democrat  Republic     Other     Total
    1          28        18         4        50
            25.00     22.50      2.50

    2          22        27         1        50
            25.00     22.50      2.50

Total          50        45         5       100

Chi-Sq =
typing Outcomes in the name cell. This column is already present in the worksheet
EXH_TABL.MTW
2 Choose Calc > Probability Distributions > Binomial.
3 In Number of trials, enter 5. In Probability of success, enter .5.
4 Choose Input column, then enter Outcomes. In Optional storage, enter Probs to name the
Outcomes      0      1      2      3      4      5
Observed     39    166    298    305    144     48
the name cell. This column is already present in the worksheet EXH_TABL.MTW
2 Choose Calc > Calculator.
3 In Store result in variable, enter Chisquare to name the storage column.
4 In Expression, enter SUM((Observed - Expected)**2 / Expected). Click OK.
5 Choose Calc > Probability Distributions > Chi-Square.
6 Choose Cumulative probability, and in Degrees of freedom, enter 5. The degrees of freedom
View the calculated p-value in the Data window, or use Manip > Display Data and display the
p-value in the Session window. The p-value of 0.0205 associated with the χ² statistic of 13.3216
indicates the binomial probability model with p = 0.5 is probably not a good model for this
experiment. That is, the observed numbers of outcomes are not consistent with the expected
numbers of outcomes under a binomial model.
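The same goodness-of-fit calculation can be followed step by step in Python. This is a sketch for checking the arithmetic, not a description of how MINITAB computes it. The observed counts, the five trials, and the success probability of 0.5 come from the example; the helper `chi2_upper_tail_odd_df` is a hypothetical name that uses a standard closed-form expression valid only for odd degrees of freedom.

```python
from math import comb, erfc, exp, pi, sqrt

observed = [39, 166, 298, 305, 144, 48]   # counts for outcomes 0 through 5
n_trials, p_success = 5, 0.5

# Binomial probabilities for 0..5 successes, then expected counts.
probs = [comb(n_trials, k) * p_success**k * (1 - p_success)**(n_trials - k)
         for k in range(n_trials + 1)]
total = sum(observed)
expected = [total * q for q in probs]

# Same expression as in the Calculator step: SUM((Observed - Expected)**2 / Expected)
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_upper_tail_odd_df(x, df):
    """Upper-tail chi-square probability for odd df (closed-form series)."""
    root = sqrt(x)
    density = exp(-x / 2) / sqrt(2 * pi)      # standard normal density at sqrt(x)
    term, series = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        series += term                         # adds root**(2r-1) / (1*3*...*(2r-1))
        term *= x / (2 * r + 1)
    return erfc(root / sqrt(2)) + 2 * density * series

df = len(observed) - 1                         # six outcome categories, so 5
p_value = chi2_upper_tail_odd_df(chi_square, df)
print(round(chi_square, 4), round(p_value, 4))
```

The p-value is the upper-tail probability, that is, one minus the cumulative probability obtained from the Chi-Square distribution dialog.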
Data
Worksheet data may be arranged in two ways: raw or contingency table form. See Arrangement of
Input Data on page 6-3. Worksheet data arrangement determines acceptable data values.
If your data are in raw form, you can have two, three, or four classification columns with each
row representing one observation. The data represent categories and may be numeric, text, or
date/time. If you wish to change the order in which text categories are processed from their
default alphabetized order, you can define your own order. See Ordering Text Categories in
the Manipulating Data chapter in MINITAB Users Guide 1. You must delete missing data
before using this procedure. Because simple correspondence analysis works with a two-way
classification, the standard approach is to use two worksheet columns. However, you can
obtain a two-way classification with three or four variables by crossing variables within the
simple correspondence analysis procedure. See Crossing variables to create a two-way table on
page 6-25.
If your data are in contingency table form, worksheet columns must contain integer
frequencies of your category combinations. You must delete any rows or columns with missing
data or combine them with other rows or columns. Unlike the χ² test for association
procedure, there is no set limit on the number of contingency table columns. You could use
simple correspondence analysis to obtain χ² statistics for large tables.
Supplementary data
When performing a simple correspondence analysis, you have a main classification set of data on
which you perform your analysis. However, you may also have additional or supplementary data
in the same form as the main set, and you may want to see how these supplementary data are
scored using the results from the main set. These supplementary data may be further
information from the same study, information from other studies, or target profiles [2]. MINITAB
does not include these data when calculating the components, but you can obtain a profile and
display supplementary data in graphs.
You can have row supplementary data or column supplementary data. Row supplementary data
constitutes an additional row(s) of the contingency table, while column supplementary data
constitutes an additional column(s) of the contingency table. Supplementary data must be
entered in contingency table form. Therefore, each worksheet column of these data must contain
c entries (where c is the number of contingency table columns) or r entries (where r is the
number of contingency table rows).
To perform a simple correspondence analysis
1 Choose Stat > Tables > Simple Correspondence Analysis.
2 How you enter your data depends on the form of the data and the number of categorical
variables.
If you have three or four categorical variables, you must cross some variables before
entering data as shown above. See Crossing variables to create a two-way table on page 6-25.
3 If you like, use one or more of the options listed below, then click OK.
Options
Simple Correspondence Analysis dialog box
name the rows and/or columns by entering a text column that contains an entry for each row
and/or column of the contingency table. MINITAB prints the first eight characters of names in
tables, but prints all characters on graphs.
specify the number of components to calculate. The default is two, the minimum number is
one, and the maximum number for a contingency table with r rows and c columns is the
smaller of (r − 1) or (c − 1).
print row and/or column profiles. See Interpreting the results on page 6-29.
cross two category variables to create a single variable. See Crossing variables to create a
two-way table on page 6-25.
name the supplementary rows and/or columns by entering a text column that contains an
entry for each supplementary row and/or column of the contingency table. MINITAB prints
the first eight characters of names in tables, but prints all characters on graphs.
store the contingency table. MINITAB stores each column of the contingency table in a
separate worksheet column.
store principal and standardized coordinates for rows and columns. See Method below for
definitions. If you enter names for k columns, MINITAB stores the coordinates for the first k
components. When you have supplementary data, MINITAB stores their coordinates at the
end of the column.
Method
Simple correspondence analysis performs a weighted principal components analysis of a
contingency table. If the contingency table has r rows and c columns, the number of underlying
dimensions is the smaller of (r − 1) or (c − 1). As with principal components, variability is
partitioned, but rather than partitioning the total variance, simple correspondence analysis
partitions the Pearson χ² statistic (the same statistic calculated in the χ² test for association).
Traditionally, correspondence analysis uses χ² / n, which is termed inertia or total inertia, rather
than χ².
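To make the link between the Pearson χ² statistic and total inertia concrete, the short Python sketch below computes both for a small table. It is illustrative only; `total_inertia` is a hypothetical function name, and the counts are the gender by political party counts from the χ² test example earlier in this chapter.

```python
def total_inertia(table):
    """Return (Pearson chi-square, total inertia = chi-square / n) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected
    return chi_square, chi_square / n

# Rows: males, females. Columns: Democrat, Republican, Other.
party = [[28, 18, 4], [22, 27, 1]]
chi_square, inertia = total_inertia(party)
print(chi_square, inertia)
```

Because this table has two rows and three columns, it has only one underlying dimension, so a single component would account for all of the total inertia.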
Principal axes
Lower dimensional subspaces are spanned by principal components, also called principal axes.
The first principal axis is chosen so that it accounts for the maximum amount of the total inertia;
the second principal axis is chosen so that it accounts for the maximum amount of the remaining
inertia; and so on. The first principal axis spans the best (i.e., closest to the profiles using an
appropriate metric) one-dimensional subspace; the first two principal axes span the best
two-dimensional subspace; and so on. These subspaces are nested, i.e., the best one-dimensional
subspace is a subspace of the best two-dimensional subspace, and so on.
Principal and standardized coordinates
The principal coordinate for row profile i and component (axis) k is the coordinate of the
projection of row profile i onto component k. The row standardized coordinates for component k
are the principal coordinates for component k divided by the square root of the kth inertia.
Likewise, the principal coordinate for column profile j and component k is the coordinate of the
projection of column profile j onto component k. The column standardized coordinates for
component k are the column principal coordinates for component k divided by the square root of
the kth inertia.
Row and column profiles
The contingency table can be analyzed in terms of row profiles or column profiles. A row profile
is a list of row proportions that are calculated from the counts in the contingency table.
Specifically, the profile for row i is $(n_{i1}/n_{i\cdot},\ n_{i2}/n_{i\cdot},\ \ldots,\ n_{ic}/n_{i\cdot})$, where $n_{ij}$ is the
frequency in row i and column j of the table and $n_{i\cdot}$ is the sum of the frequencies in row i.
A column profile is a list of column proportions. Specifically, the profile for column j is
$(n_{1j}/n_{\cdot j},\ n_{2j}/n_{\cdot j},\ \ldots,\ n_{rj}/n_{\cdot j})$, where $n_{\cdot j}$ is the sum of the frequencies in column j.
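Row and column profiles are straightforward to compute from a table of counts, as the following Python sketch shows. The function names are hypothetical, the `masses` helper assumes that a mass is a row (or column) total divided by the grand total, and the data are the gender by political party counts from the χ² test example earlier in this chapter.

```python
def row_profiles(table):
    """Each row divided by its own row total."""
    return [[n / sum(row) for n in row] for row in table]

def column_profiles(table):
    """Each column divided by its own column total (one list per column)."""
    return [[n / sum(col) for n in col] for col in zip(*table)]

def masses(table):
    """Row and column totals as proportions of the grand total."""
    grand_total = sum(sum(row) for row in table)
    row_mass = [sum(row) / grand_total for row in table]
    col_mass = [sum(col) / grand_total for col in zip(*table)]
    return row_mass, col_mass

# Rows: males, females. Columns: Democrat, Republican, Other.
party = [[28, 18, 4], [22, 27, 1]]

print(row_profiles(party))      # two profiles of length c = 3
print(column_profiles(party))   # three profiles of length r = 2
print(masses(party))
```

Each profile sums to one, which is why row profiles of length c lie in a space of lower dimension than c.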
The two analyses are mathematically equivalent. The one that you choose will depend on which
is more natural for a given analysis. Most of the time, a researcher is interested in studying either
how the row profiles differ from each other or how the column profiles differ from each other.
Row profiles are vectors of length c and therefore lie in a c-dimensional space (similarly, column
profiles lie in an r-dimensional space). Since this dimension is usually too high to allow easy
interpretation, you will want to try to find a subspace of lower dimension (preferably not more
than two or three) that lies close to all the row profile points (or column profile points). You can
then project the profile points onto this subspace and study the projections. If the projections are
close to the profiles, we do not lose much information. Working in two or three dimensions
allows you to study the data more easily and, in particular, allows you to examine plots. This
process is analogous to choosing a small number of principal components to summarize the
variability of continuous data.
If d = the smaller of (r − 1) and (c − 1), then the row profiles (or equivalently the column profiles)
will lie in a d-dimensional subspace of the full c-dimensional space (or equivalently the full
r-dimensional space). Thus, there are at most d principal components.
Inertia
MINITAB prints the inertia associated with each component, and also displays these in a
histogram. The inertias associated with all of the principal components add up to the total
inertia. Ideally, the first one, two, or three components account for most of the total inertia for
the table.
See Simple Correspondence Analysis in Help for additional definitions and calculations.
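The proportion and cumulative proportion of inertia follow directly from the component inertias. The Python sketch below uses the four inertias printed in the example output of this section (0.0391, 0.0304, 0.0109, and 0.0025); because those printed values are rounded, the computed proportions can differ from the printed ones in the last digit or two.

```python
from itertools import accumulate

# Component inertias as printed in the example output (rounded to four decimals).
inertias = [0.0391, 0.0304, 0.0109, 0.0025]

total = sum(inertias)
proportions = [value / total for value in inertias]
cumulative = list(accumulate(proportions))

print(f"total inertia = {total:.4f}")
for axis, (value, prop, cum) in enumerate(zip(inertias, proportions, cumulative), start=1):
    print(f"{axis}  {value:.4f}  {prop:.4f}  {cum:.4f}")
```

In that example the first two components account for roughly 84% of the total inertia, which is why the interpretation concentrates on two components.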
a symmetric plot
A row plot is a plot of row principal coordinates. A column plot is a plot of column principal
coordinates. See Method on page 6-24 for definitions.
A symmetric plot is a plot of row and column principal coordinates in a joint display. An advantage
of this plot is that the profiles are spread out for better viewing of distances between them. The
row-to-row and column-to-column distances are approximate χ² distances between the respective
profiles. However, this same interpretation cannot be made for row-to-column distances. Because
these distances are two different mappings, you must interpret these plots carefully [2].
An asymmetric row plot is a plot of row principal coordinates and of column standardized
coordinates in the same plot. Distances between row points are approximate χ² distances between
the row profiles. Choose the asymmetric row plot over the asymmetric column plot if rows are of
primary interest.
An asymmetric column plot is a plot of column principal coordinates and row standardized
coordinates. Distances between column points are approximate χ² distances between the column
profiles. Choose an asymmetric column plot over an asymmetric row plot if columns are of
primary interest.
An advantage of asymmetric plots is that there can be an intuitive interpretation of the distances
between row points and column points, especially if the two displayed components represent a
large proportion of the total inertia [2]. Suppose you have an asymmetric row plot, as shown in
Example of simple correspondence analysis on page 6-27. This graph plots both the row profiles
and the column vertices for components 1 and 2. The closer a row profile is to a column vertex,
the higher the row profile is with respect to the column category. In this example, of the row
points, Biochemistry is closest to column category E, implying that biochemistry as a discipline
has the highest percentage of unfunded researchers in this study. A disadvantage of asymmetric
plots is that the profiles of interest are often bunched in the middle of the graph [2], as happens
with the asymmetric plot of this example.
To display simple correspondence analysis plots
1 Perform steps 1-2 of To perform a simple correspondence analysis on page 6-22.
2 Click Graphs.
and 15 component pairs in Axis pairs for all plots (Y then X). MINITAB plots the first
component in each pair on the vertical or y-axis of the plot; the second component in the pair
on the horizontal or x-axis of the plot.
5 If you have supplementary data and would like to include this data in the plot(s), check Show
showing rows only and Asymmetric row plot showing rows and columns. Click OK in each
dialog box.
Session window output

Row Profiles

                B       C       D       E    Mass
Geology     0.224   0.459   0.165   0.118   0.107
Biochemi    0.069   0.448   0.034   0.414   0.036
Chemistr    0.192   0.377   0.162   0.223   0.163
Zoology     0.125   0.342   0.292   0.217   0.151
Physics     0.193   0.412   0.079   0.228   0.143
Engineer    0.125   0.284   0.170   0.386   0.111
Microbio    0.162   0.378   0.135   0.297   0.046
Botany      0.140   0.395   0.198   0.267   0.108
Statisti    0.172   0.379   0.138   0.241   0.036
Mathemat    0.141   0.474   0.103   0.256   0.098
Mass        0.161   0.389   0.162   0.249

Axis   Inertia  Proportion  Cumulative  Histogram
   1    0.0391      0.4720      0.4720  ******************************
   2    0.0304      0.3666      0.8385  ***********************
   3    0.0109      0.1311      0.9697  ********
   4    0.0025      0.0303      1.0000  *
Total   0.0829
Row Contributions

                                     ----Component 1----    ----Component 2----
ID  Name       Qual   Mass  Inert    Coord   Corr  Contr    Coord   Corr  Contr
 1  Geology   0.916  0.107  0.137   -0.076  0.055  0.016   -0.303  0.861  0.322
 2  Biochemi  0.881  0.036  0.119   -0.180  0.119  0.030    0.455  0.762  0.248
 3  Chemistr  0.644  0.163  0.021   -0.038  0.134  0.006   -0.073  0.510  0.029
 4  Zoology   0.929  0.151  0.230    0.327  0.846  0.413   -0.102  0.083  0.052
 5  Physics   0.886  0.143  0.196   -0.316  0.880  0.365   -0.027  0.006  0.003
 6  Engineer  0.870  0.111  0.152    0.117  0.121  0.039    0.292  0.749  0.310
 7  Microbio  0.680  0.046  0.010   -0.013  0.009  0.000    0.110  0.671  0.018
 8  Botany    0.654  0.108  0.067    0.179  0.625  0.088    0.039  0.029  0.005
 9  Statisti  0.561  0.036  0.012   -0.125  0.554  0.014   -0.014  0.007  0.000
10  Mathemat  0.319  0.098  0.056   -0.107  0.240  0.029    0.061  0.079  0.012
Supplementary Rows

                                     ----Component 1----    ----Component 2----
ID  Name       Qual   Mass  Inert    Coord   Corr  Contr    Coord   Corr  Contr
 1  Museums   0.556  0.067  0.353    0.314  0.225  0.168   -0.381  0.331  0.318
 2  MathSci   0.559  0.134  0.041   -0.112  0.493  0.043    0.041  0.066  0.007

Column Contributions

                                     ----Component 1----    ----Component 2----
ID  Name       Qual   Mass  Inert    Coord   Corr  Contr    Coord   Corr  Contr
 1  A         0.587  0.039  0.187   -0.478  0.574  0.228   -0.072  0.013  0.007
 2  B         0.816  0.161  0.110   -0.127  0.286  0.067   -0.173  0.531  0.159
 3  C         0.465  0.389  0.094   -0.083  0.341  0.068   -0.050  0.124  0.032
 4  D         0.968  0.162  0.347    0.390  0.859  0.632   -0.139  0.109  0.103
 5  E         0.990  0.249  0.262    0.032  0.012  0.006    0.292  0.978  0.699

Graph window output
The column labeled Qual, or quality, is the proportion of the row inertia represented by the
two components. The rows Zoology and Geology, with quality = 0.929 and 0.916,
CONTENTS
6-29
INDEX
MEET MTB
UGUIDE 1
UGUIDE 2
SC QREF
HOW TO USE
CONTENTS
INDEX
MEET MTB
UGUIDE 1
UGUIDE 2
SC QREF
Chapter 6
HOW TO USE
respectively, are best represented among the rows by the two component breakdown, while
Math has the poorest representation, with a quality value of 0.319.
The column labeled Mass has the same meaning as in the Row Profiles table: the proportion
of the class in the whole data set.
The column labeled Inert is the proportion of the total inertia contributed by each row. Thus,
Geology contributes 13.7% to the total χ² statistic.
Next, MINITAB displays information for each of the two components (axes).
The column labeled Coord gives the principal coordinates of the rows.
The column labeled Corr represents the contribution of the component to the inertia of the
row. Thus, Component 1 accounts for most of the inertia of Zoology and Physics (Corr =
0.846 and 0.880, respectively), but explains little of the inertia of Microbiology (Corr =
0.009).
Contr, the contribution of each row to the axis inertia, shows that Zoology and Physics
contribute the most, with Botany contributing to a smaller degree, to Component 1. Geology,
Biochemistry, and Engineering contribute the most to Component 2.
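The Contr and Corr columns for the rows discussed above can be reproduced, to rounding, from Mass, Inert, and Coord. The sketch below assumes the usual correspondence-analysis definitions (mass times squared principal coordinate, divided by the axis inertia for Contr and by the row's own inertia for Corr); this passage does not state those formulas, so treat them as an assumption. The numbers are copied from the Session window output for Component 1.

```python
# Values copied from the Session window output shown above.
total_inertia = 0.0829   # total inertia (sum over all axes)
axis1_inertia = 0.0391   # inertia of Component 1

rows = {
    # name: (Mass, Inert, Coord on Component 1)
    "Geology": (0.107, 0.137, -0.076),
    "Zoology": (0.151, 0.230, 0.327),
    "Physics": (0.143, 0.196, -0.316),
}

def contr_and_corr(mass, inert, coord):
    """Return (Contr, Corr) for one row on Component 1 (assumed definitions)."""
    weighted_sq = mass * coord ** 2               # row's share of axis inertia, unscaled
    contr = weighted_sq / axis1_inertia           # contribution of the row to the axis
    corr = weighted_sq / (inert * total_inertia)  # contribution of the axis to the row
    return contr, corr

results = {name: contr_and_corr(*vals) for name, vals in rows.items()}
for name, (contr, corr) in results.items():
    print(f"{name:8s} Contr = {contr:.3f}  Corr = {corr:.3f}")
```

Because the inputs are rounded to three decimals, the results agree with the printed table only to about two decimal places.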
Supplementary rows. You can interpret this table in a similar fashion as the row contributions
table.
Column Contributions. The fifth table shows that two components explain most of the
variability in funding categories B, D, and E. The funded categories A, B, C, and D contribute
most to component 1, while the unfunded category, E, contributes most to component 2.
Row Plot. This plot displays the row principal coordinates. Component 1, which best explains
Zoology and Physics, shows these two classes well removed from the origin, but with opposite
sign. Component 1 might be thought of as contrasting the biological sciences Zoology and
Botany with Physics. Component 2 might be thought of as contrasting Biochemistry and
Engineering with Geology.
Asymmetric Row Plot. Here, the rows are scaled in principal coordinates and the columns are
scaled in standard coordinates. Among funding classes, Component 1 contrasts levels of funding,
while Component 2 contrasts being funded (A to D) with not being funded (E). Among the
disciplines, Physics tends to have the highest funding level and Zoology has the lowest.
Biochemistry tends to be in the middle of the funding level, but highest among unfunded
researchers. Museums tend to be funded, but at a lower level than academic researchers.
procedure, you gain information on a potentially larger number of variables, but you may lose
information on how rows and columns relate to each other.
Data
Worksheet data may be arranged in two ways: raw or indicator variable form. See Arrangement of
Input Data on page 6-3. Worksheet data arrangement determines acceptable data values.
If your data are in raw form, you can have one or more classification columns with each row
representing one observation. The data represent categories and may be numeric, text, or
date/time. If you wish to change the order in which text categories are processed from their
default alphabetized order, you can define your own order. See Ordering Text Categories in
the Manipulating Data chapter in MINITAB User's Guide 1. You must delete missing data
before using this procedure.
If your data are in indicator variable form, each row will also represent one observation. There
will be one indicator column for each category level. You can use Calc > Make Indicator
Variables to create indicator variables from raw data. You must delete missing data before
using this procedure.
Supplementary data
When performing a multiple correspondence analysis, you have a main classification set of data
on which you perform your analysis. However, you may also have additional or supplementary
data in the same form as the main set, and you might want to see how these supplementary data
are scored using the results from the main set. These supplementary data are typically a
classification of your variables that can help you to interpret the results. MINITAB does not
include these data when calculating the components, but you can obtain a profile and display
supplementary data in graphs.
Set up your supplementary data in your worksheet using the same form, either raw data or
indicator variables, as you did for the input data. Because your supplementary data will provide
additional information about your observations, your supplementary data column(s) must be the
same length as your input data.
For raw data, enter the columns containing the raw data in Categorical variables
For indicator variable data, enter the columns containing the indicator variable data in
Indicator variables
3 If you like, use one or more of the options listed below, then click OK.
Options
Multiple Correspondence Analysis dialog box
name the categories by entering a text column that has one row for each category of all input
variables. For example, suppose there are 3 categorical variables: Gender (male, female), Hair
color (blond, brown, black), and Age (under 20, from 20 to 50, over 50), and no
supplementary variables. You would assign eight category names (2 + 3 + 3), and enter the
names in a column. MINITAB prints the first eight characters of names in tables, but prints all
characters on graphs.
specify the number of components to calculate. The default is two, the minimum number is
one, and the maximum is the number of underlying dimensions. If the number of categories
in the j categorical columns are $c_1, c_2, \ldots, c_j$, the number of underlying dimensions is the sum
of $(c_i - 1)$, where $i = 1, 2, \ldots, j$.
print a Burt table. The Burt table is a symmetric matrix with one column and one row for each
level (category) of a categorical variable that contains the frequencies.
name the supplementary data categories by entering a text column that contains an entry for
each category of all supplementary variables. MINITAB prints the first eight characters of
names in tables, but prints all characters on graphs.
store component coordinates. See Method on page 6-34 for definitions. If you enter names for
k columns, MINITAB stores the coordinates for the first k components. When you have
supplementary data, MINITAB stores their coordinates at the end of the column.
Method
Multiple correspondence analysis decomposes a matrix of indicator variables formed from all
entered variables. Unlike simple correspondence analysis, where all row classes are from one
variable and all column classes are from another variable, here all variable classes are column
contributors.
Multiple correspondence analysis performs a weighted principal-components analysis of the
matrix of indicator variables. If the number of categories in the j categorical columns are $c_1, c_2,
\ldots, c_j$, the number of underlying dimensions is the sum of $(c_i - 1)$, where $i = 1, 2, \ldots, j$. As with
simple correspondence analysis, multiple correspondence analysis partitions the Pearson χ²
statistic. Unlike simple correspondence analysis, there is no choice of examining either row or
column profiles; there are only column profiles. See Method under simple correspondence
analysis on page 6-24 for additional information and definitions. Because there are no rows,
multiple correspondence analysis offers only one graph: a plot of column coordinates.
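As a rough illustration of the objects named above, the following sketch builds an indicator matrix from raw categorical data, forms the Burt table as its cross-product, and counts the underlying dimensions as the sum of (c_i − 1). The five observations are hypothetical, the helper name `indicator_matrix` is invented for this example, and categories are ordered alphabetically within each variable as an assumption. It is not MINITAB's implementation.

```python
import numpy as np

# Hypothetical raw data: one row per observation, one column per variable
# (Gender, Hair color, Age), as in the category-naming example.
raw = [
    ("male", "blond", "under 20"),
    ("female", "brown", "20 to 50"),
    ("female", "black", "over 50"),
    ("male", "brown", "20 to 50"),
    ("female", "blond", "under 20"),
]

def indicator_matrix(records):
    """Return (Z, levels): Z has one 0/1 column per category of every variable."""
    n_vars = len(records[0])
    # Alphabetical category order within each variable (an assumption).
    levels = [sorted({rec[j] for rec in records}) for j in range(n_vars)]
    columns = [(j, lev) for j in range(n_vars) for lev in levels[j]]
    Z = np.array([[1 if rec[j] == lev else 0 for j, lev in columns]
                  for rec in records])
    return Z, levels

Z, levels = indicator_matrix(raw)
burt = Z.T @ Z                                      # symmetric table of frequencies
n_dimensions = sum(len(lev) - 1 for lev in levels)  # sum of (c_i - 1)

print(Z.shape, n_dimensions)
print(burt)
```

With 2 + 3 + 3 = 8 categories, the indicator matrix has eight columns and the number of underlying dimensions is 1 + 2 + 2 = 5. The diagonal of the Burt table holds the frequency of each category.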
Example of multiple correspondence analysis
Automobile accidents are classified [3] (data from [1]) according to the type of accident
(collision or rollover), severity of accident (not severe or severe), whether or not the driver was
ejected, and the size of the car (small or standard). Multiple correspondence analysis was used to
examine how the categories in this four-way table are related to each other.
1 Open the worksheet EXH_TABL.MTW.
2 Choose Stat > Tables > Multiple Correspondence Analysis.
Session window output

Axis    Inertia  Proportion  Cumulative  Histogram
1       0.4032   0.4032      0.4032      ******************************
2       0.2520   0.2520      0.6552      ******************
3       0.1899   0.1899      0.8451      **************
4       0.1549   0.1549      1.0000      ***********
Total   1.0000
Column Contributions

                                       Component 1             Component 2
ID  Name       Qual   Mass  Inert    Coord   Corr  Contr    Coord   Corr  Contr
 1  Small     0.965  0.042  0.208    0.381  0.030  0.015   -2.139  0.936  0.771
 2  Standard  0.965  0.208  0.042   -0.078  0.030  0.003    0.437  0.936  0.158
 3  NoEject   0.474  0.213  0.037   -0.284  0.472  0.043   -0.020  0.002  0.000
 4  Eject     0.474  0.037  0.213    1.659  0.472  0.250    0.115  0.002  0.002
 5  Collis    0.613  0.193  0.057   -0.426  0.610  0.087    0.034  0.004  0.001
 6  Rollover  0.613  0.057  0.193    1.429  0.610  0.291   -0.113  0.004  0.003
 7  NoSevere  0.568  0.135  0.115   -0.652  0.502  0.143   -0.237  0.066  0.030
 8  Severe    0.568  0.115  0.135    0.769  0.502  0.168    0.280  0.066  0.036

Graph window output
Column Contributions. Use the column contributions to interpret the different components.
Since we did not specify the number of components, MINITAB calculates 2 components.
The column labeled Qual, or quality, is the proportion of the column inertia represented by
all calculated components. The car-size categories (Small, Standard) are best represented
by the two-component breakdown with Qual = 0.965, while the ejection categories are the
least represented with Qual = 0.474. When there are only two categories for each class, each
is represented equally well by any component, but this rule would not necessarily be true for
more than two categories.
The column labeled Mass is the proportion of the class in the whole data set. In this example,
the categories within each of the CarWt, DrEject, AccType, and AccSever classes combine for a proportion of 0.25.
The column labeled Inert is the proportion of inertia contributed by each column. The
categories small cars, ejections, and rollovers have the highest inertia, summing to 61.4%
(0.208 + 0.213 + 0.193), which indicates that these categories are more dissociated from the others.
Next, MINITAB displays information for each of the two components (axes).
The column labeled Coord gives the column coordinates. Eject and Rollover have the largest
absolute coordinates for component 1 and Small has the largest absolute coordinate for
component 2. The sign and relative size of the coordinates are useful in interpreting
components.
The column labeled Corr represents the contribution of the respective component to the
inertia of the column. Here, Component 1 accounts for 47 to 61% of the inertia of the ejection,
collision type, and accident severity categories, but explains only 3.0% of the inertia of car
size.
Contr, the contribution of the column to the axis inertia, shows Eject and Rollover contributing
the most to Component 1 (Contr = 0.250 and 0.291, respectively). Component 2, on the
other hand, accounts for 93.6% of the inertia of the car size categories, with Small
contributing 77.1% of the axis inertia.
Column Plot. As the contribution values for Component 1 indicate, Eject and Rollover are
most distant from the origin. This component contrasts Eject and Rollover, and to some extent
Severe, with NoSevere. Component 2 separates Small from the other categories. Two
components may not adequately explain the variability of these data, however.
References
[1] S. E. Fienberg (1987). The Analysis of Cross-Classified Categorical Data. The MIT Press,
Cambridge, Massachusetts.
[2] M. J. Greenacre (1993). Correspondence Analysis in Practice. Academic Press, Harcourt,
Brace & Company, New York.
Acknowledgment
We are grateful for the collaboration of James R. Allen of Allen Data Systems, Cross Plains,
Wisconsin in the development of the cross tabulation procedure.
Time Series
Decomposition, 7-10
Differences, 7-35
Lag, 7-36
Autocorrelation, 7-37
ARIMA, 7-44
See also,
Time Series Overview
Command / Forecast / Example

Trend Analysis
Static
Length: long. Profile: extension of trend line.

Decomposition
Separates the time series into linear trend and seasonal components, as well as error. Choose
whether the seasonal component is additive or multiplicative with the trend. Use this procedure
to forecast when there is a seasonal component to your series or if you simply want to examine
the nature of the component parts.
Length: long. Profile: trend with seasonal pattern.

Moving Average
Smooths your data by averaging consecutive observations in a series. This procedure can be a
likely choice when your data do not have a trend or seasonal component. There are ways,
however, to use moving averages when your data possess trend and/or seasonality.
Length: short. Profile: flat line.

Dynamic
Length: short. Profile: flat line.
Length: short. Profile: straight line with slope equal to last trend estimate.

Winters' Method
Smooths your data by Holt-Winters exponential smoothing. Use this procedure when trend and
seasonality are present, with these two components being either additive or multiplicative.
Winters' Method calculates dynamic estimates for three components: level, trend, and seasonal.
Length: short to medium. Profile: trend with seasonal pattern.
Differences computes and stores the differences between data values of a time series. If you
wish to fit an ARIMA model but there is trend or seasonality present in your data, differencing
data is a common step in assessing likely ARIMA models. Differencing is used to simplify the
correlation structure and to reveal any underlying pattern.
Lag computes and stores the lags of a time series. When you lag a time series, MINITAB moves
the original values down the column and inserts missing values at the top of the column. The
number of missing values inserted depends on the length of the lag.
Partial Autocorrelation computes and plots the partial autocorrelations of a time series.
Partial autocorrelations, like autocorrelations, are correlations between sets of ordered data
pairs of a time series. As with partial correlations in the regression case, partial autocorrelations
measure the strength of relationship with other terms being accounted for. The partial
autocorrelation at a lag of k is the correlation between residuals at time t from an
autoregressive model and observations at lag k with terms for all intervening lags present in the
autoregressive model. The plot of partial autocorrelations is called the partial autocorrelation
function or pacf. View the pacf to guide your choice of terms to include in an ARIMA model.
Cross Correlation computes and graphs correlations between two time series.
ARIMA fits a Box-Jenkins ARIMA model to a time series. ARIMA stands for Autoregressive
Integrated Moving Average. The terms in the name (Autoregressive, Integrated, and Moving
Average) represent filtering steps taken in constructing the ARIMA model until only random
noise remains. Use ARIMA to model time series behavior and to generate forecasts.
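The Lag and Differences operations described above can be sketched in a few lines. This is a minimal illustration, not MINITAB's code: the function names are hypothetical, missing values are represented by NaN, and a difference at lag k is assumed to be the value at time t minus the value at time t − k.

```python
import math

def lag(series, k=1):
    """Shift values down by k positions, inserting k missing values at the top."""
    return [math.nan] * k + list(series[:len(series) - k])

def differences(series, k=1):
    """Difference each value with the value k positions earlier (assumed definition)."""
    return [x - prev for x, prev in zip(series, lag(series, k))]

series = [4, 5, 8, 9, 10]   # hypothetical data
lagged = lag(series, 1)
diffs = differences(series, 1)
print(lagged)
print(diffs)
```

The number of missing values at the top of each result equals the lag, which matches the description of how lagged columns are filled.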
Trend Analysis
Trend analysis fits a general trend model to time series data and provides forecasts. Choose
among the linear, quadratic, exponential growth or decay, and S-curve models. Use this
procedure to fit trend when there is no seasonal component to your series.
Data
The time series must be in one numeric column. If you choose the S-curve trend model, you
must delete missing data from the worksheet before performing the trend analysis. MINITAB
automatically omits missing values from the calculations when you use one of the other three
trend models.
To do a trend analysis
1 Choose Stat > Time Series > Trend Analysis.
Options
Trend Analysis dialog box
fit a linear (default), quadratic, exponential growth curve, or S-curve (Pearl-Reed logistic)
trend model. See Trend models on page 7-6.
specify the origin of forecasts (time unit before first forecast). The default is the end of the
data.
the default Session window output, which includes the length of the series, number of
missing values, the fitted trend equation, and three measures to help you determine the
accuracy of the fitted values: MAPE, MAD, and MSD
the default Session window output, plus the data, fits, and residuals (the detrended data)
Options Subdialog box
apply coefficients (weights) from fitting other data to obtain weighted average fit. See Weighted
average trend analysis on page 7-7.
enter weights of coefficients of current data when obtaining weighted average fit. See
Weighted average trend analysis on page 7-7.
Trend models
There are four different trend models you can choose from: linear (default), quadratic,
exponential growth curve, or S-curve (Pearl-Reed logistic). Use care when interpreting the
coefficients from the different models, as they have different meanings. See [4] for details.
Trend analysis by default uses the linear trend model:
$$y_t = \beta_0 + \beta_1 t + e_t$$
In this model, $\beta_1$ represents the average change from one period to the next.
The quadratic trend model, which can account for simple curvature in the data, is:
$$y_t = \beta_0 + \beta_1 t + \beta_2 t^2 + e_t$$
The exponential growth trend model accounts for exponential growth or decay. For example, a
savings account might exhibit exponential growth. The model is:
$$y_t = \beta_0 \beta_1^{t} + e_t$$
The S-curve model fits the Pearl-Reed logistic trend model. This accounts for the case where the
series follows an S-shaped curve. The model is:
$$y_t = \frac{10^{a}}{\beta_0 + \beta_1 \left( \beta_2^{\,t-1} \right)}$$
coefficients. Default weights of 0.2 will be used for each coefficient if you dont enter any. If
you do enter weights, the number that you enter must be equal to the number of coefficients.
MINITAB generates a time series plot of the data, plus a second time series plot that shows trend
lines for three models. The Session window displays the coefficients and accuracy measures for
all three models.
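The linear and quadratic trend models described under Trend models can be sketched as an ordinary least-squares polynomial fit. The sketch assumes the time index runs t = 1, 2, ..., n and uses NumPy's `polyfit`; the function names and the data are hypothetical, and this is not presented as MINITAB's fitting routine.

```python
import numpy as np

def fit_trend(y, degree=1):
    """Least-squares polynomial trend; returns (beta_0, beta_1, ...) and the fits."""
    t = np.arange(1, len(y) + 1)             # assumed time index 1..n
    coeffs = np.polyfit(t, y, degree)[::-1]  # reorder so beta_0 comes first
    fits = sum(b * t ** i for i, b in enumerate(coeffs))
    return coeffs, fits

def forecast_trend(coeffs, n_obs, n_forecasts):
    """Extrapolate the fitted trend beyond the end of the data."""
    t = np.arange(n_obs + 1, n_obs + n_forecasts + 1)
    return sum(b * t ** i for i, b in enumerate(coeffs))

# Hypothetical series that follows y_t = 2 + 3t + 0.5t^2 exactly.
y = [2 + 3 * t + 0.5 * t ** 2 for t in range(1, 11)]
coeffs, fits = fit_trend(y, degree=2)
next_value = forecast_trend(coeffs, n_obs=len(y), n_forecasts=1)
print(coeffs, next_value)
```

Setting `degree=1` gives the linear model, in which the second coefficient is the average change from one period to the next.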
Measures of accuracy
MINITAB computes three measures of accuracy of the fitted model: MAPE, MAD, and MSD for
each of the simple forecasting and smoothing methods. For all three measures, the smaller the
value, the better the fit of the model. Use these statistics to compare the fits of the different
methods.
MAPE, or Mean Absolute Percentage Error, measures the accuracy of fitted time series values. It
expresses accuracy as a percentage.
$$\mathrm{MAPE} = \frac{\sum_{t=1}^{n} \left| (y_t - \hat{y}_t) / y_t \right|}{n} \times 100 \qquad (y_t \neq 0)$$
where $y_t$ equals the actual value, $\hat{y}_t$ equals the forecast value, and $n$ equals the number of
forecasts.
MAD, which stands for Mean Absolute Deviation, measures the accuracy of fitted time series
values. It expresses accuracy in the same units as the data, which helps conceptualize the amount
of error.
$$\mathrm{MAD} = \frac{\sum_{t=1}^{n} \left| y_t - \hat{y}_t \right|}{n}$$
where $y_t$ equals the actual value, $\hat{y}_t$ equals the forecast value, and $n$ equals the number of
forecasts.
MSD stands for Mean Squared Deviation. It is very similar to MSE, mean squared error, a
commonly-used measure of accuracy of fitted time series values. MSD is always computed using
the same denominator, n, regardless of the model, so you can compare MSD values across
models. MSEs are computed with different degrees of freedom for different models, so you
cannot always compare MSE values across models.
$$\mathrm{MSD} = \frac{\sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2}{n}$$
where $y_t$ equals the actual value, $\hat{y}_t$ equals the forecast value, and $n$ equals the number of
forecasts.
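The three accuracy measures can be written directly from their definitions. The sketch below is illustrative only: the function names and data are hypothetical, and raising an error when an actual value is zero is one possible way to honor the MAPE condition that the actual values be nonzero.

```python
def mape(actual, fitted):
    """Mean Absolute Percentage Error, in percent."""
    if any(y == 0 for y in actual):
        raise ValueError("MAPE is undefined when an actual value is 0")
    return 100 * sum(abs((y - f) / y) for y, f in zip(actual, fitted)) / len(actual)

def mad(actual, fitted):
    """Mean Absolute Deviation, in the units of the data."""
    return sum(abs(y - f) for y, f in zip(actual, fitted)) / len(actual)

def msd(actual, fitted):
    """Mean Squared Deviation; the denominator is always n."""
    return sum((y - f) ** 2 for y, f in zip(actual, fitted)) / len(actual)

actual = [100, 200, 400]   # hypothetical observations
fitted = [110, 190, 400]   # hypothetical fitted values
print(mape(actual, fitted), mad(actual, fitted), msd(actual, fitted))
```

For all three measures, a smaller value indicates a better fit, so the same functions can be applied to the fits of competing models and compared.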
Forecasting
Forecasts are extrapolations of the trend model fits. Data prior to the forecast origin are used to fit
the trend.
Example of a trend analysis
You collect employment in a trade business over 60 months and wish to predict employment for
the next 12 months. Because there is an overall curvilinear pattern to the data, you use trend
analysis and fit a quadratic trend model. Because there is also a seasonal component, you save the
fits and residuals to perform decomposition of the residuals (see Example of decomposition on
page 7-13).
1 Open the worksheet EMPLOY.MTW.
2 Choose Stat > Time Series > Trend Analysis.
3 In Variable, enter Trade.
4 Under Model Type, choose Quadratic.
5 Check Generate forecasts and enter 12 in Number of forecasts.
6 Click Storage.
7 Check Fits (trend line), Residuals (detrended data), and Forecasts. Click OK in each dialog
box.
Session window output

Trend Analysis

Data      Trade
Length    60.0000
NMissing  0

MAPE:  1.70760
MAD:   5.95655
MSD:   59.1305

Row  Period    FORE1
  1      61  391.818
  2      62  393.649
  3      63  395.502
  4      64  397.376
  5      65  399.271
  6      66  401.188
  7      67  403.127
  8      68  405.087
  9      69  407.068
 10      70  409.071
 11      71  411.096
 12      72  413.142

Graph window output
the accuracy of the fitted values: MAPE, MAD, and MSD. The trade employment data show a
general upward trend, though with an evident seasonal component. The trend model appears to
fit well to the overall trend, but the seasonal pattern is not well fit. To better fit these data, you also
use decomposition on the stored residuals and add the trend analysis and decomposition fits and
forecasts (see Example of decomposition on page 7-13).
Decomposition
You can use decomposition to separate the time series into linear trend and seasonal components,
as well as error, and provide forecasts. You can choose whether the seasonal component is
additive or multiplicative with the trend. Use this procedure when you wish to forecast and there
is a seasonal component to your series, or if you simply want to examine the nature of the
component parts. See [6] for a discussion of decomposition methods.
Data
The time series must be in one numeric column. MINITAB automatically omits missing data from
the calculations.
The data that you enter depends upon how you use this procedure. Usually, decomposition is
performed in one step by simply entering the time series. Alternatively, you can perform a
decomposition of the trend model residuals. This process may improve the fit of the model by
combining the information from the trend analysis and the decomposition. See Decomposition of
trend model residuals on page 7-12.
To do a decomposition
1 Choose Stat > Time Series > Decomposition.
Options
Decomposition dialog box
specify if the trend and seasonal components should be additive rather than multiplicative, or
if you wish to omit trend from the model. See The decomposition model on page 7-11.
specify where the first observation is in the seasonal period (default is 1). For example, if you
have an annual cycle starting in January with monthly data (seasonal length is 12) and your
first observation is in June, specify 6.
specify the origin of forecasts (time unit before first forecast). The default is the end of the
data.
display a summary of the fits, plus a table of the data, the trend, the seasonal component, the
detrended data (seasonal plus residual), the seasonally adjusted data (trend plus residual), fits
(trend plus seasonal), and residuals
store the trend line, detrended data, the seasonal component, the seasonally adjusted data,
forecasts, residuals, and fits
trend component but you omit it from the decomposition, this can influence the estimates of the
seasonal indices.
Method
Decomposition involves the following steps:
1 MINITAB fits a trend line to the data, using least squares regression.
2 Next, the data are detrended by either dividing the data by the trend component
(multiplicative model) or subtracting the trend component from the data (additive model).
3 Then, the detrended data are smoothed using a centered moving average with a length equal
to the length of the seasonal cycle. When the seasonal cycle length is an even number, a
two-step moving average is required to synchronize the moving average correctly.
4 Once the moving average is obtained, it is either divided into (multiplicative model) or
subtracted from (additive model) the detrended data to obtain what are often referred to as raw
seasonals.
5 Within each seasonal period, the median value of the raw seasonals is found. The medians are
also adjusted so that their mean is one (multiplicative model) or their sum is zero (additive
model). These adjusted medians constitute the seasonal indices.
6 The seasonal indices are used in turn to seasonally adjust the data.
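The six steps above can be sketched as follows. This is a simplified illustration rather than MINITAB's implementation: the function names are hypothetical, the series is assumed to have no missing values, the first observation is assumed to fall in the first seasonal period, and the series is assumed to cover at least two full cycles.

```python
import numpy as np

def centered_moving_average(x, length):
    """Centered moving average; an even length uses the two-step average."""
    n, half = len(x), length // 2
    out = np.full(n, np.nan)
    for i in range(half, n - half):
        if length % 2 == 1:
            out[i] = x[i - half:i + half + 1].mean()
        else:
            first = x[i - half:i + half].mean()
            second = x[i - half + 1:i + half + 1].mean()
            out[i] = (first + second) / 2
    return out

def decompose(y, season_length, multiplicative=True):
    y = np.asarray(y, dtype=float)
    t = np.arange(1, len(y) + 1)
    # Step 1: fit a trend line by least squares.
    slope, intercept = np.polyfit(t, y, 1)
    trend = intercept + slope * t
    # Step 2: detrend by division (multiplicative) or subtraction (additive).
    detrended = y / trend if multiplicative else y - trend
    # Step 3: smooth the detrended data over one seasonal cycle.
    smooth = centered_moving_average(detrended, season_length)
    # Step 4: raw seasonals.
    raw = detrended / smooth if multiplicative else detrended - smooth
    # Step 5: median per seasonal period, adjusted to mean one or sum zero.
    period = (t - 1) % season_length
    medians = np.array([np.nanmedian(raw[period == p])
                        for p in range(season_length)])
    indices = medians / medians.mean() if multiplicative else medians - medians.mean()
    # Step 6: seasonally adjust the data.
    seasonal = indices[period]
    adjusted = y / seasonal if multiplicative else y - seasonal
    return trend, indices, adjusted

# Hypothetical additive series: trend 10 + 2t plus a 4-period seasonal pattern.
pattern = [3.0, -1.0, -2.0, 0.0]
y = [10 + 2 * t + pattern[(t - 1) % 4] for t in range(1, 13)]
trend, indices, adjusted = decompose(y, season_length=4, multiplicative=False)
print(indices)
```

In this constructed example the seasonal pattern sums to zero, so the additive indices recover it and the seasonally adjusted series is the underlying straight line.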
Note
If you want these components to be additive, add the respective fits together.
The MAPE, MAD, MSD accuracy measures from decomposition used in this manner are
not comparable to these statistics calculated from other procedures, but you can
calculate the comparable values fairly easily. We demonstrate this with MSD in the
decomposition example.
Forecasts
Decomposition calculates the forecast as the linear regression line multiplied by (multiplicative
model) or added to (additive model) the seasonal indices. Data prior to the forecast origin are
used for the decomposition.
Example of decomposition
You wish to predict trade employment for the next 12 months using data collected over 60
months. Because the data have a trend that is fit well by the quadratic trend model from trend
analysis and possess a seasonal component, you use the residuals from the trend analysis example (see
Example of a trend analysis on page 7-8) to combine both trend analysis and decomposition for
forecasting.
1 Do the trend analysis example on page 7-8.
2 Choose Stat > Time Series > Decomposition.
3 In Variable, enter the name of the residual column you stored from trend analysis.
4 In Seasonal length, enter 12.
5 Under Model Type, choose Additive. Under Model Components, choose Seasonal only.
6 Check Generate forecasts and enter 12 in Number of forecasts.
7 Click Storage.
8 Check Forecasts and Fits. Click OK in each dialog box.
Session window output

Period     Index
     1  -8.48264
     2  -13.3368
     3  -11.4410
     4  -5.81597
     5  0.559028
     6   3.55903
     7   1.76736
     8   3.47569
     9   3.26736
    10   5.39236
    11   8.49653
    12   12.5590

Accuracy of Model
MAPE:  881.582
MAD:     2.802
MSD:    11.899

Forecasts
Row  Period     FORE2
  1      61   -8.4826
  2      62  -13.3368
  3      63  -11.4410
  4      64   -5.8160
  5      65    0.5590
  6      66    3.5590
  7      67    1.7674
  8      68    3.4757
  9      69    3.2674
 10      70    5.3924
 11      71    8.4965
 12      72   12.5590
Graph
window
output
a time series plot that shows the original series with the fitted trend line, predicted values, and
forecasts
a component analysis: in separate plots are the series, the detrended data, the seasonally
adjusted data, and the seasonally adjusted and detrended data (the residuals)
a seasonal analysis: charts of seasonal indices and percent variation within each season
relative to the sum of variation by season, and boxplots of the data and of the residuals by
seasonal period
In addition, MINITAB displays the fitted trend line, the seasonal indices, the three accuracy
measures (MAPE, MAD, and MSD; see Measures of accuracy on page 7-7), and forecasts in
the Session window.
In the example, the first graph shows that the detrended residuals from trend analysis are fit fairly
well by decomposition, except that part of the first annual cycle is underpredicted and the last
annual cycle is overpredicted. This is also evident in the lower right plot of the second graph; the
residuals are highest in the beginning of the series and lowest at the end.
Example of fits and forecasts of combined trend analysis and decomposition
Now, let's look at the combined trend analysis and decomposition results:
Step 1: Calculate the fits and forecasts of the combined trend analysis and decomposition
1 Choose Calc > Calculator.
2 In Store result in variable, enter NewFits.
3 In Expression, add the fits from trend analysis to the fits from decomposition. Click OK.
4 Choose Calc > Calculator. Clear the Expression box by selecting the contents and pressing
Step 2: Plot the fits and forecasts of the combined trend analysis and decomposition
1 Choose Stat > Time Series > Time Series Plot.
2 In Graph variables, enter Trade, NewFits, and NewFore in rows 1–3, respectively.
3 Choose Frame > Multiple Graphs.
4 Under Generation of Multiple Graphs, choose Overlay graphs on the same page. Click OK.
5 Click Options. In Start time, enter 1 1 61 in rows 1–3, respectively. Click OK in each
dialog box.
Step 3: Calculate MSD
1 Choose Calc > Calculator.
2 In Store result in variable, enter MSD.
3 Clear the Expression box by selecting the contents and pressing the delete key.
4 In Functions, double-click Sum. Within the parentheses in Expression, enter ((Trade
Graph
window
output
Session
window
output
Data Display
MSD
11.8989
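The combined forecasts plotted as NewFore are the trend analysis forecasts plus the decomposition forecasts. As a small sketch of that addition, the code below sums the FORE1 and FORE2 values printed in the two Session window outputs for periods 61 to 72. The Python variable names are illustrative only.

```python
# FORE1: quadratic trend forecasts (periods 61-72), from the trend analysis output.
fore1 = [391.818, 393.649, 395.502, 397.376, 399.271, 401.188,
         403.127, 405.087, 407.068, 409.071, 411.096, 413.142]

# FORE2: seasonal forecasts of the residuals, from the decomposition output.
fore2 = [-8.4826, -13.3368, -11.4410, -5.8160, 0.5590, 3.5590,
         1.7674, 3.4757, 3.2674, 5.3924, 8.4965, 12.5590]

# Combined forecast: add the two forecast columns period by period.
new_fore = [round(a + b, 4) for a, b in zip(fore1, fore2)]

for period, value in zip(range(61, 73), new_fore):
    print(period, value)
```

The same element-by-element addition applies to the stored fits when building NewFits.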
Moving Average
Moving Average smooths your data by averaging consecutive observations in a series and provides
short-term forecasts. This procedure can be a likely choice when your data do not have a trend or
seasonal component. There are ways, however, to use moving averages when your data possess
trend and/or seasonality.
Data
The time series must be in one numeric column. MINITAB automatically omits missing data from
the calculations.
To do a moving average
1 Choose Stat > Time Series > Moving Average.
Options
Moving Average dialog box
center the moving averages. See Centering moving average values on page 7-19.
specify the origin of forecasts (time unit before first forecast). The default is the end of the data.
store the moving averages, fits or predicted values (uncentered moving average from time
t − 1), residuals, forecasts, and upper and lower 95% prediction limits
Method
To calculate a moving average, MINITAB averages consecutive groups of observations in a series.
For example, suppose a series begins with the numbers 4, 5, 8, 9, 10 and you use a moving
average length of 3. The first two values of the moving average are missing. The third value of
the moving average is the average of 4, 5, 8; the fourth value is the average of 5, 8, 9; the fifth
value is the average of 8, 9, 10.
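The worked example above can be sketched in a few lines of Python. This is only an illustration of the averaging rule described here, not MINITAB's implementation; the function name moving_average and the use of None for a missing value are assumptions made for the sketch.

```python
def moving_average(series, length):
    """Uncentered moving average (hypothetical helper).

    The value at each position averages the `length` observations that end
    at that position; earlier positions are missing (None).
    """
    out = []
    for i in range(len(series)):
        if i + 1 < length:
            out.append(None)  # not enough observations yet
        else:
            window = series[i + 1 - length : i + 1]
            out.append(sum(window) / length)
    return out


values = moving_average([4, 5, 8, 9, 10], 3)
# Fit at time t is the uncentered moving average from time t - 1.
fits = [None] + values[:-1]
print(values)
print(fits)
```

The first two entries of values are missing, and the remaining three are the averages of 4, 5, 8; of 5, 8, 9; and of 8, 9, 10.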
Centering moving average values
When you center the moving averages, they are placed at the center of the range rather than at
the end of the range. This is done to position the moving average values at their central positions
in time.
If the moving average length is odd: Suppose the moving average length is 3. In this case,
MINITAB places the first numeric moving average value at time 2, the next at time 3, and so on,
with missing moving average values for the first and last times.
If the moving average length is even: Suppose the moving average length is 4. The center of that
range is 2.5, but you cannot place a moving average value at time 2.5. Instead, data values 1–4
and 2–5 are averaged separately, then averaged together and placed at time 3. This process is
repeated throughout the series, with missing values placed at the first two and the last two
positions.
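The centering rules for odd and even lengths can be sketched as follows. The function name centered_moving_average, the sample series, and the use of None for missing values are assumptions made for illustration; the placement logic follows the two cases described above.

```python
def centered_moving_average(series, length):
    """Centered moving average (hypothetical helper, None = missing)."""
    n = len(series)
    out = [None] * n
    half = length // 2
    for c in range(half, n - half):
        if length % 2 == 1:
            # Odd length: one window centered on position c.
            out[c] = sum(series[c - half : c + half + 1]) / length
        else:
            # Even length: average two adjacent windows, then average those.
            first = sum(series[c - half : c + half]) / length
            second = sum(series[c - half + 1 : c + half + 1]) / length
            out[c] = (first + second) / 2
    return out


odd = centered_moving_average([4, 5, 8, 9, 10], 3)
even = centered_moving_average([4, 5, 8, 9, 10, 12], 4)
print(odd)
print(even)
```

With length 4, the first numeric value lands in the third position (time 3) and is the average of the means of observations 1–4 and 2–5; the first two and last two positions stay missing.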
Forecasting
The fitted value at time t is the uncentered moving average at time t − 1. The forecasts are the
fitted values at the forecast origin. If you forecast 10 time units ahead, the forecasted value for
each time will be the fitted value at the origin. Data up to the origin are used for calculating the
moving averages.
You can use the linear moving average method by performing consecutive moving averages. This
is often done when there is a trend in the data. First, compute and store the moving average of
the original series. Then compute and store the moving average of the previously stored column
to obtain a second moving average.
In naive forecasting, the forecast for time t is the data value at time t − 1. Using the moving average
procedure with a moving average of length one gives naive forecasting.
See [1], [4], and [6] for a discussion of forecasting.
Example of moving average
You wish to predict employment over the next 6 months in a segment of the metals industry using
data collected over 60 months. You use the moving average method as there is no well-defined
trend or seasonal pattern in the data.
1 Open the worksheet EMPLOY.MTW.
2 Choose Stat ➤ Time Series ➤ Moving Average.
3 In Variable, enter Metals. In MA length, enter 3.
4 Check Center the moving averages.
5 Check Generate forecasts, and enter 6 in Number of forecasts. Click OK.
Session window output

Moving average

Data      Metals
Length    60.0000
NMissing  0

Moving Average
Length: 3

Accuracy Measures
MAPE: 1.55036
MAD:  0.70292
MSD:  0.76433

Row  Period  Forecast    Lower    Upper
  1      61      49.2  47.4865  50.9135
  2      62      49.2  47.4865  50.9135
  3      63      49.2  47.4865  50.9135
  4      64      49.2  47.4865  50.9135
  5      65      49.2  47.4865  50.9135
  6      66      49.2  47.4865  50.9135

Graph window output
(moving average plot for Metals; not reproduced)
Single Exponential Smoothing
Data
Your time series must be in a numeric column.
The time series cannot include any missing values. If you have missing values, you may want to
provide estimates of the missing values. If you have seasonal data, estimate the missing values as
the fitted values from the decomposition procedure on page 7-10; if you do not have seasonal data,
estimate them as the fitted values from the moving average procedure on page 7-18.
Options
Single Exponential Smoothing dialog box
specify a smoothing weight between 0 and 1 rather than using the calculated optimal weight.
See Choosing a weight on page 7-23.
specify the origin of forecasts (time unit before first forecast). The default is the end of the
data.
set the initial smoothed value to be the average of the first k observations when you specify the
weight. You can specify k, which is 6 by default.
store the smoothed values, the fits or predicted values (smoothed value at time t − 1), the
residuals (data − fits), forecasts, and upper and lower 95% prediction limits
Choosing a weight
The weight is the smoothing parameter. You can have MINITAB supply the optimal weight (the
default) or you can specify the weight. See Method on page 7-24 for more information.
Large weights result in more rapid changes in the fitted line; small weights result in less rapid
changes in the fitted line. Therefore, the larger the weights the more the smoothed values follow
the data; the smaller the weights the smoother the pattern in the smoothed values. Thus, small
weights are usually recommended for a series with a high noise level around the signal or
pattern. Large weights are usually recommended for a series with a small noise level around the
pattern.
Among single exponential smoothing fits, the MSD accuracy measure will be smallest with
optimal weights, but it is possible to obtain smaller MAPE and MAD values with non-optimal
weights. See Measures of accuracy on page 7-7.
To specify your own weight
In the main Single Exponential Smoothing dialog box, choose Use under Weight to use in
smoothing, and enter a value between 0 and 2, although the usual choices are between 0 and 1.
You can use a rule of thumb for choosing a weight.
Method
The smoothed (predicted) values are obtained in one of two ways: with an optimal weight or with
a specified weight.
Optimal weight
1 MINITAB fits the ARIMA (0,1,1) model and stores the fits.
2 The smoothed values are the ARIMA model fits, but lagged one time unit.
3 Initial smoothed value (at time zero) by backcasting:
initial smoothed value = [smoothed in period two − α (data in period 1)] / (1 − α)
where α is the weight.
Specified weight
1 MINITAB uses the average of the first six (or N, if N < 6) observations for the initial
smoothed value (at time zero).
2 Subsequent smoothed values are calculated from the formula:
smoothed value at time t = α (data at t) + (1 − α)(smoothed at t − 1)
where α is the weight.
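The specified-weight recursion can be sketched in Python as shown below. This is a minimal illustration, not MINITAB's code: the function name single_exp_smoothing, its parameter names, and the sample data are assumptions. The optimal-weight path (fitting an ARIMA (0,1,1) model) is not reproduced here.

```python
def single_exp_smoothing(data, weight, k=6):
    """Single exponential smoothing with a specified weight (sketch).

    The initial smoothed value (time zero) is the average of the first k
    observations, or of all N observations if N < k.
    """
    k = min(k, len(data))
    previous = sum(data[:k]) / k        # smoothed value at time zero
    smoothed, fits = [], []
    for y in data:
        fits.append(previous)           # fit at t = smoothed value at t - 1
        previous = weight * y + (1 - weight) * previous
        smoothed.append(previous)
    return smoothed, fits


data = [4, 5, 8, 9, 10]
smoothed, fits = single_exp_smoothing(data, weight=0.5)
forecast = smoothed[-1]                 # same value for every period ahead
print(smoothed, fits, forecast)
```

Calling the same function with a weight of 1 makes each fit equal to the previous data value.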
Forecasting
The fitted value at time t is the smoothed value at time t − 1. The forecasts are the fitted value at
the forecast origin. If you forecast 10 time units ahead, the forecasted value for each time will be
the fitted value at the origin. Data up to the origin are used for the smoothing.
In naive forecasting, the forecast for time t is the data value at time t − 1. Perform single
exponential smoothing with a weight of one to give naive forecasting.
Example of single exponential smoothing
You wish to predict employment over 6 months in a segment of the metals industry using data
collected over 60 months. You use single exponential smoothing because there is no clear trend
or seasonal pattern in the data.
1 Open the worksheet EMPLOY.MTW.
2 Choose Stat ➤ Time Series ➤ Single Exp Smoothing.
3 In Variable, enter Metals.
Graph window output
(single exponential smoothing plot for Metals; not reproduced)
Double Exponential Smoothing
Data
Your time series must be in a numeric column.
The time series cannot include any missing values. If you have missing values, you may want to
provide estimates of the missing values. If you have seasonal data, estimate the missing values as
the fitted values from the decomposition procedure on page 7-10; if you do not have seasonal data,
estimate them as the fitted values from the moving average procedure on page 7-18.
Options
Double Exponential Smoothing dialog box
specify smoothing weights for the level and trend components rather than using the calculated
optimal weight. See Choosing weights on page 7-27.
specify the origin of forecasts (time unit before first forecast). The default is the end of the data.
Choosing weights
The weights are the smoothing parameters. You can have MINITAB supply the optimal weights
(the default) or you can specify weights between 0 and 1 for the trend and level components. See
Method on page 7-27 for more information.
Regardless of the component, large weights result in more rapid changes in that component;
small weights result in less rapid changes. Therefore, the larger the weights the more the
smoothed values follow the data; the smaller the weights the smoother the pattern in the
smoothed values. The components in turn affect the smoothed values and the predicted values.
Thus, small weights are usually recommended for a series with a high noise level around the
signal or pattern. Large weights are usually recommended for a series with a small noise level
around the signal.
Among double exponential smoothing fits, the MSD accuracy measure will be smallest with
optimal weights, but it is possible to obtain smaller MAPE and MAD values with nonoptimal
weights. See Measures of accuracy on page 7-7.
To specify your own weights
In the main Double Exponential Smoothing dialog box, choose Use under Weight to use in
smoothing, and enter a value between 0 and 1 in the boxes for the level and/or the trend.
Method
Double exponential smoothing employs a level component and a trend component at each
period. It uses two weights, or smoothing parameters, to update the components at each period.
The double exponential smoothing equations are:
L_t = α Y_t + (1 − α) [L_{t−1} + T_{t−1}]
T_t = γ [L_t − L_{t−1}] + (1 − γ) T_{t−1}
Ŷ_t = L_{t−1} + T_{t−1}
where L_t is the level at time t, α is the weight for the level, T_t is the trend at time t, γ is the weight
for the trend, Y_t is the data value at time t, and Ŷ_t is the fitted value, or one-period-ahead
forecast, at time t.
If the first observation is numbered one, then level and trend estimates at time zero must be
initialized in order to proceed. The initialization method depends on which of two ways the
smoothed values are obtained: with optimal weights or with specified weights.
Optimal weights
1 MINITAB fits an ARIMA (0,2,2) model to the data, in order to minimize the sum of squared
errors.
2 The trend and level components are then initialized by backcasting.
Specified weights
1 MINITAB fits a linear regression model to time series data (y variable) versus time (x
variable).
2 The constant from this regression is the initial estimate of the level component, the slope
coefficient is the initial estimate of the trend component.
When you specify weights that do not correspond to an equal-root ARIMA (0,2,2) model,
MINITAB employs Holt's method. If you specify weights that do correspond to an equal-root
ARIMA (0,2,2) model, MINITAB employs Brown's method.
Forecasting
Double exponential smoothing uses the level and trend components to generate forecasts. The
forecast for m periods ahead from a point at time t is
L_t + m T_t, where L_t is the level and T_t is the trend at time t.
Data up to the forecast origin time will be used for the smoothing.
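The specified-weights case can be sketched in Python: the level and trend at time zero come from a regression of the data on time, the components are then updated each period, and forecasts extend the final level by the final trend. The function names linear_fit and double_exp_smoothing, the parameter names, and the sample data are assumptions made for this sketch; the optimal-weights path (ARIMA (0,2,2) fit with backcasting) is not shown.

```python
def linear_fit(y):
    """Least-squares constant and slope of y regressed on time 1..n."""
    n = len(y)
    t_mean = (n + 1) / 2
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y, start=1))
    sxx = sum((t - t_mean) ** 2 for t in range(1, n + 1))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


def double_exp_smoothing(y, alpha, gamma, n_forecasts=0):
    """Level/trend updates with specified weights (illustrative sketch)."""
    level, trend = linear_fit(y)        # initial estimates at time zero
    fits = []
    for value in y:
        fits.append(level + trend)      # one-period-ahead forecast
        new_level = alpha * value + (1 - alpha) * (level + trend)
        trend = gamma * (new_level - level) + (1 - gamma) * trend
        level = new_level
    forecasts = [level + m * trend for m in range(1, n_forecasts + 1)]
    return fits, forecasts


series = [3, 5, 7, 9, 11, 13]           # exactly linear: 2t + 1
fits, forecasts = double_exp_smoothing(series, alpha=0.3, gamma=0.2, n_forecasts=3)
print(fits)
print(forecasts)
```

For an exactly linear series the fits reproduce the data and the forecasts continue the line, which is a quick check that the updates are wired correctly.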
Example of double exponential smoothing
You wish to predict employment over six months in a segment of the metals industry. You use
double exponential smoothing as there is no clear trend or seasonal pattern in the data, and you
want to compare the fit by this method with that from single exponential smoothing (see
Example of single exponential smoothing on page 7-24).
1 Open the worksheet EMPLOY.MTW.
2 Choose Stat ➤ Time Series ➤ Double Exp Smoothing.
3 In Variable, enter Metals.
4 Check Generate forecasts and enter 6 in Number of forecasts. Click OK.
Session window output

Row  Period  Forecast    Lower    Upper
  1      61   48.0961  46.7717  49.4206
  2      62   48.1357  46.0599  50.2114
  3      63   48.1752  45.3134  51.0369
  4      64   48.2147  44.5545  51.8748
  5      65   48.2542  43.7898  52.7185
  6      66   48.2937  43.0220  53.5653

Graph window output
(double exponential smoothing plot for Metals; not reproduced)
Winters' Method
Winters' Method smooths your data by Holt-Winters exponential smoothing and provides short to
medium-range forecasting. You can use this procedure when both trend and seasonality are
present, with these two components being either additive or multiplicative. Winters' Method
calculates dynamic estimates for three components: level, trend, and seasonal.
Data
Your time series must be in one numeric column.
The time series cannot include any missing values. If you have missing values, you may want to
provide estimates of the missing values. If you have seasonal data, estimate the missing values as
the fitted values from the decomposition procedure on page 7-10; if you do not have seasonal data,
estimate them as the fitted values from the moving average procedure on page 7-18.
Options
Winters' Method dialog box
specify that the level and seasonal components should be additive rather than multiplicative.
See An additive or a multiplicative model? on page 7-32.
specify weights for the level, trend, and seasonal components. The defaults are 0.2. See
Choosing weights on page 7-32.
specify the origin of forecasts (time unit before first forecast). The default is the end of the
data.
store the smoothed values, level, trend, and seasonal estimates, the fits or predicted values
(one-period-ahead forecasts), the residuals (data − fits), forecasts, and upper and lower 95%
prediction limits
Choosing weights
You can enter weights, or smoothing parameters, for the level, trend, and seasonal components.
The default weights are 0.2 and you can enter values between 0 and 1. Since an equivalent
ARIMA model exists only for a very restricted form of the Holt-Winters model, MINITAB does not
compute optimal parameters for Winters' method as it does for single and double exponential
smoothing.
Regardless of the component, large weights result in more rapid changes in that component;
small weights result in less rapid changes. The components in turn affect the smoothed values
and the predicted values. Thus, small weights are usually recommended for a series with a high
noise level around the signal or pattern. Large weights are usually recommended for a series with
a small noise level around the signal.
Method
Winters' method employs a level component, a trend component, and a seasonal component at
each period. It uses three weights, or smoothing parameters, to update the components at each
period. Initial values for the level and trend components are obtained from a linear regression on
time. Initial values for the seasonal component are obtained from a dummy-variable regression
using detrended data. The Winters' method smoothing equations are:
Additive model:
L_t = α (Y_t − S_{t−p}) + (1 − α) [L_{t−1} + T_{t−1}]
T_t = γ [L_t − L_{t−1}] + (1 − γ) T_{t−1}
S_t = δ (Y_t − L_t) + (1 − δ) S_{t−p}
Ŷ_t = L_{t−1} + T_{t−1} + S_{t−p}
Multiplicative model:
L_t = α (Y_t / S_{t−p}) + (1 − α) [L_{t−1} + T_{t−1}]
T_t = γ [L_t − L_{t−1}] + (1 − γ) T_{t−1}
S_t = δ (Y_t / L_t) + (1 − δ) S_{t−p}
Ŷ_t = (L_{t−1} + T_{t−1}) S_{t−p}
where L_t is the level at time t, α is the weight for the level, T_t is the trend at time t, γ is the weight
for the trend, S_t is the seasonal component at time t, δ is the weight for the seasonal component,
p is the seasonal period, Y_t is the data value at time t, and Ŷ_t is the fitted value, or
one-period-ahead forecast, at time t.
Forecasting
Winters' method uses the level, trend, and seasonal components to generate forecasts. The
forecast for m periods ahead from a point at time t is
L_t + m T_t, where L_t is the level and T_t is the trend at time t, multiplied by (or added to for an
additive model) the seasonal component for the same period from the previous year.
Winters' Method uses data up to the forecast origin time to generate the forecasts.
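The component updates and the forecast rule can be sketched in Python for both model types. This is an illustration under stated assumptions, not MINITAB's implementation: the function name winters and its parameters are hypothetical, and the initial level, trend, and seasonal values are passed in by the caller rather than estimated by the regressions described in Method.

```python
def winters(y, p, alpha, gamma, delta, level0, trend0, seasonal0,
            multiplicative=True, n_forecasts=0):
    """Level, trend, and seasonal updates with specified weights (sketch).

    seasonal0 holds the p initial seasonal values for the p periods
    before the first observation, in time order.
    """
    level, trend = level0, trend0
    seasonal = list(seasonal0)
    fits = []
    for t, value in enumerate(y):
        s_old = seasonal[t]             # seasonal value from p periods back
        if multiplicative:
            fits.append((level + trend) * s_old)
            new_level = alpha * (value / s_old) + (1 - alpha) * (level + trend)
            s_new = delta * (value / new_level) + (1 - delta) * s_old
        else:
            fits.append(level + trend + s_old)
            new_level = alpha * (value - s_old) + (1 - alpha) * (level + trend)
            s_new = delta * (value - new_level) + (1 - delta) * s_old
        trend = gamma * (new_level - level) + (1 - gamma) * trend
        level = new_level
        seasonal.append(s_new)
    n = len(y)
    forecasts = []
    for m in range(1, n_forecasts + 1):
        s = seasonal[n + (m - 1) % p]   # latest estimate for that season
        base = level + m * trend
        forecasts.append(base * s if multiplicative else base + s)
    return fits, forecasts


add_y = [10, 13, 12, 15]                # level 10, trend 1, seasonal -1, +1
add_fits, add_fc = winters(add_y, 2, 0.2, 0.2, 0.2, 10, 1, [-1, 1],
                           multiplicative=False, n_forecasts=2)
mul_y = [5, 15, 5, 15]                  # level 10, no trend, seasonal 0.5, 1.5
mul_fits, mul_fc = winters(mul_y, 2, 0.2, 0.2, 0.2, 10, 0, [0.5, 1.5],
                           n_forecasts=3)
print(add_fits, add_fc)
print(mul_fits, mul_fc)
```

Both sample series were built to follow their model exactly, so the fits match the data and the forecasts continue the pattern.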
Example of Winters' method
You wish to predict employment for the next six months in a food preparation industry using data
collected over the last 60 months. You use Winters' method with the default multiplicative
model, because there is a seasonal component, and possibly trend, apparent in the data.
1 Open the worksheet EMPLOY.MTW.
2 Choose Stat ➤ Time Series ➤ Winters' Method.
3 In Variable, enter Food, and 12 in Seasonal length.
4 Under Model Type, choose Multiplicative.
5 Check Generate forecasts and enter 6 in Number of forecasts. Click OK.
Session window output

Period  Forecast    Lower    Upper
    61   57.8102  55.0645  60.5558
    62   57.3892  54.5864  60.1921
    63   57.8332  54.9687  60.6977
    64   57.9307  55.0005  60.8609
    65   58.8311  55.8313  61.8309
    66   62.7415  59.6686  65.8145

Graph window output
(Winters' method plot for Food; not reproduced)
For these data, MAPE, MAD, and MSD were 1.88, 1.12, and 2.87, respectively, with the
multiplicative model. MAPE, MAD, and MSD were 1.95, 1.15, and 2.67, respectively (output
not shown) with the additive model, indicating that the multiplicative model provided a slightly
better fit according to two of the three accuracy measures.
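The three accuracy measures compared here are defined under Measures of accuracy on page 7-7, which is outside this section. The sketch below assumes their usual definitions (mean absolute percentage error, mean absolute deviation, and mean squared deviation of the data from the fits); the function name accuracy_measures and the sample numbers are hypothetical.

```python
def accuracy_measures(data, fits):
    """Return (MAPE, MAD, MSD) over the rows that have a fit (sketch)."""
    pairs = [(y, f) for y, f in zip(data, fits) if f is not None]
    n = len(pairs)
    mape = sum(abs((y - f) / y) for y, f in pairs) / n * 100
    mad = sum(abs(y - f) for y, f in pairs) / n
    msd = sum((y - f) ** 2 for y, f in pairs) / n
    return mape, mad, msd


mape, mad, msd = accuracy_measures([10, 20, 40], [11, 18, 40])
print(mape, mad, msd)
```

Because MSD squares each error, a few large misses affect it more than they affect MAPE or MAD, which is one way two models can rank differently across the three measures.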
Differences
Differences computes the differences between data values of a time series. If you wish to fit an
ARIMA model but there is trend or seasonality present in your data, differencing data is a
common step in assessing likely ARIMA models. Differencing is used to simplify the correlation
structure and to help reveal any underlying pattern.
Data
Your time series must be in one numeric column. MINITAB stores the difference for missing data
as missing (*).
To do differencing
Options
You can change the lag period from the default of one.
Method
MINITAB calculates the differences between data values. The values that are differenced depend
on the length of the lag. If you request a lag of k, the entries in the stored column are the data
values in the original column minus the data value k rows above. For example, suppose you
difference a column using a lag of two:
Input  Stored
    1       *
    3       *
    8       7
   12       9
    7      -1
Since the lag = 2, MINITAB stores asterisks (*) in rows 1 and 2 of Stored. Row 3 of Stored contains
8 − 1, row 4 contains 12 − 3, row 5 contains 7 − 8.
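The same calculation can be sketched in Python, using None in place of the missing-value asterisk. The function name differences is an assumption made for the sketch.

```python
def differences(column, lag=1):
    """Each entry minus the entry `lag` rows above (None = missing)."""
    out = []
    for i, value in enumerate(column):
        if i < lag or value is None or column[i - lag] is None:
            out.append(None)            # no earlier value to subtract
        else:
            out.append(value - column[i - lag])
    return out


stored = differences([1, 3, 8, 12, 7], lag=2)
print(stored)  # [None, None, 7, 9, -1]
```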
Lag
Lag computes lags of a column and stores them in a new column. To lag a time series, MINITAB
moves the data down the column and inserts missing values at the top of the column. The
number of missing values inserted depends on the length of the lag.
Data
Your time series must be in one numeric column. MINITAB stores the lag for missing data as
missing.
To lag a time series
1 Choose Stat ➤ Time Series ➤ Lag.
Options
You can change the lag period from the default of one.
Method
To lag a time series, MINITAB moves the data down the column and inserts missing values at the
top of the column. The number of missing values inserted depends on the length of the lag. If
you request a lag of k time units, the entries in the stored column are the same as those of the
original column shifted down k cells, with k missing values inserted at the top. For example,
suppose you lag a column using a lag of three:
Input  Stored
    5       *
    3       *
   18       *
    7       5
   10       3
    2      18
Since the lag = 3, MINITAB stores asterisks (*) in rows 1, 2, and 3 of Stored. Beginning with row
4, the original data are stored down the column until the column of lagged data is the same length
as the original time series data.
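A lag is a shift with padding, which can be sketched in one line of Python. The function name lag and the use of None for a missing value are assumptions made for the sketch.

```python
def lag(column, k=1):
    """Shift the column down k rows, padding the top with None (missing)."""
    return ([None] * k + list(column))[: len(column)]


stored = lag([5, 3, 18, 7, 10, 2], k=3)
print(stored)  # [None, None, None, 5, 3, 18]
```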
Autocorrelation
Autocorrelation computes and plots the autocorrelations of a time series. Autocorrelation is the
correlation between observations of a time series separated by k time units. The plot of
autocorrelations is called the autocorrelation function or acf. View the acf to guide your choice
of terms to include in an ARIMA model. See Fitting an ARIMA model on page 7-45.
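As a rough sketch of what is being plotted, the sample autocorrelation at lag k is commonly computed as the sum of products of deviations from the mean, k time units apart, divided by the total sum of squared deviations. The code below uses that standard definition; it is assumed here, and the function name autocorrelations is hypothetical.

```python
def autocorrelations(series, max_lag):
    """Sample autocorrelations for lags 1..max_lag (standard definition)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


acf = autocorrelations([1, 2, 3, 4, 5], max_lag=2)
print(acf)
```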
Data
Your time series must be entered in one numeric column. You must either estimate or delete
missing data before using this procedure.
To do an autocorrelation function
1 Choose Stat ➤ Time Series ➤ Autocorrelation.
Options
change the number of lags for which to display autocorrelations. The default is n / 4 for a series
with less than or equal to 240 observations or √n + 45 for a series with more than 240
observations, where n is the number of observations in the series.
Example of autocorrelation
You wish to predict employment in a food preparation industry using past employment data. You
want to use ARIMA to do this but first you use autocorrelation in order to help identify a likely
model. Because the data exhibit a strong 12-month seasonal component, you take a difference at
lag 12 in order to induce stationarity and look at the autocorrelation of the differenced series.
There may be some long-term trend in these data, but the magnitude of it appears to be small
compared to the seasonal component. If the trend were larger, you might consider taking another
difference at lag 1 to induce stationarity.
1 Open the worksheet EMPLOY.MTW.
2 Choose Stat ➤ Time Series ➤ Differences.
3 In Series, enter Food.
4 In Store differences in, enter Food2.
5 In Lag, enter 12. Click OK.
6 Choose Stat ➤ Time Series ➤ Autocorrelation.
7 In Series, enter Food2. Click OK.
Graph window output
(autocorrelation function plot for Food2; not reproduced)
You can use the Ljung-Box Q (LBQ) statistic to test the null hypothesis that the autocorrelations
for all lags up to lag k equal zero. Let's test that all autocorrelations up to a lag of 6 are zero. The
LBQ statistic is 56.03.
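The test can also be sketched in Python. The statistic below uses the standard Ljung-Box formula, Q = n(n + 2) Σ r_k² / (n − k), which is assumed here, and the chi-square cumulative probability is computed from a series expansion of the incomplete gamma function so that only the standard library is needed. The function names are hypothetical; the degrees of freedom equal the lag of the test.

```python
import math


def ljung_box_q(acf_values, n):
    """Ljung-Box Q from autocorrelations at lags 1..k for a series of length n."""
    return n * (n + 2) * sum(
        r * r / (n - k) for k, r in enumerate(acf_values, start=1)
    )


def chi_square_cdf(x, df):
    """P(X <= x) for a chi-square variable, via the lower incomplete gamma series."""
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    term = 1 / a
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


cumprob = chi_square_cdf(56.03, 6)      # LBQ value and lag from the text
p_value = 1 - cumprob                   # small p-value: reject zero autocorrelation
print(cumprob, p_value)
```

The cumulative probability for 56.03 on 6 degrees of freedom is very close to 1, so the p-value is very close to zero and the hypothesis that all six autocorrelations are zero is rejected at any conventional level.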
To compute the cumulative probability function:
1 Choose Calc ➤ Probability Distributions ➤ Chi-Square.
2 Check Cumulative Probability.
3 In Degrees of freedom, enter 6 (the lag of your test).
4 Choose Input constant and enter 56.03 (the LBQ value).
5 In Optional storage, enter Cumprob. This stores the cumulative probability function in a
Partial Autocorrelation
Partial Autocorrelation computes and plots the partial autocorrelations of a time series. Partial
autocorrelations, like autocorrelations, are correlations between sets of ordered data pairs of a
time series. As with partial correlations in the regression case, partial autocorrelations measure
the strength of relationship with other terms being accounted for. The partial autocorrelation at a
lag of k is the correlation between residuals at time t from an autoregressive model and
observations at lag k with terms for all intervening lags present in the autoregressive model. The
plot of partial autocorrelations is called the partial autocorrelation function or pacf. View the
pacf to guide your choice of terms to include in an ARIMA model.
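One common way to obtain partial autocorrelations is the Durbin-Levinson recursion, which takes the autocorrelations as input and returns, for each lag k, the last coefficient of an autoregressive model of order k. The sketch below uses that recursion as an assumption; it is not necessarily the algorithm MINITAB uses, and the function name is hypothetical.

```python
def partial_autocorrelations(acf_values):
    """Durbin-Levinson recursion; acf_values[k - 1] is the lag-k autocorrelation."""
    pacf = []
    phi_prev = []                       # coefficients of the order k - 1 model
    for k in range(1, len(acf_values) + 1):
        r_k = acf_values[k - 1]
        num = r_k - sum(phi_prev[j] * acf_values[k - 2 - j] for j in range(k - 1))
        den = 1 - sum(phi_prev[j] * acf_values[j] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [
            phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)
        ] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Autocorrelations of an AR(1) process with coefficient 0.5 decay as 0.5 ** k.
pacf = partial_autocorrelations([0.5, 0.25, 0.125])
print(pacf)  # [0.5, 0.0, 0.0]
```

The single spike at lag 1 followed by zeros is the pattern that points to a first-order autoregressive process.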
Data
Your time series must be entered in one numeric column. You must either estimate or delete
missing data before using this procedure.
To do a partial autocorrelation function
1 Choose Stat ➤ Time Series ➤ Partial Autocorrelation.
Options
change the number of lags for which to display partial autocorrelations. The default is n / 4 for
a series with less than or equal to 240 observations or √n + 45 for a series with more than 240
observations, where n is the number of observations in the series.
Example of partial autocorrelation
You obtain a pacf of the food industry employment data, after taking a difference of lag 12, in
order to help determine a likely ARIMA model.
1 Open the worksheet EMPLOY.MTW.
2 Choose Stat ➤ Time Series ➤ Differences.
3 In Series, enter Food.
4 In Store differences in, enter Food2.
5 In Lag, enter 12. Click OK.
6 Choose Stat ➤ Time Series ➤ Partial Autocorrelation.
7 In Series, enter Food2. Click OK.
Graph window output
(partial autocorrelation function plot for Food2; not reproduced)
Cross Correlation
Cross correlation computes and graphs correlations between two time series.
Data
You must have two time series in separate numeric columns of equal length. You must either
estimate or delete missing data before using this procedure.
To do a cross correlation
1 Choose Stat ➤ Time Series ➤ Cross Correlation.
Options
You can specify the number of lags for which to display cross correlations. The default is
−(√n + 10) to (√n + 10) lags.
ARIMA
ARIMA fits a Box-Jenkins ARIMA model to a time series. ARIMA stands for Autoregressive
Integrated Moving Average with each term representing steps taken in the model construction
until only random noise remains. Use ARIMA to model time series behavior and to generate
forecasts. ARIMA modeling differs from the other time series methods discussed in this chapter in
the fact that ARIMA modeling uses correlational techniques. ARIMA can be used to model
patterns that may not be visible in plotted data. The concepts used in this procedure follow Box
and Jenkins [2]. For an elementary introduction to time series, see [3], [11].
Data
Your time series must be in a numeric column. Missing data in the middle of your series are not
allowed. If you have missing values, you may want to provide estimates of the missing values.
To fit an ARIMA model
1 Choose Stat ➤ Time Series ➤ ARIMA.
enter the number of parameters. See Entering the ARIMA model on page 7-47.
4 If you like, use one or more of the options listed below, then click OK.
Options
ARIMA dialog box
fit a seasonal model and specify the period. The default period is 12.
specify the number of nonseasonal or seasonal differences to take. See Entering the ARIMA
model on page 7-47.
specify starting values for the parameter estimates. Default starting values are 0.1 except for
the constant. See Entering the ARIMA model on page 7-47.
display a time series plot with forecasts and 95% confidence limits of the raw data
display scatter plots of the residuals vs. fits, the residuals vs. data order (1 2 3 4 … n), or the
residuals vs. specified columns
specify the origin of forecasts (time unit before first forecast). The default is the end of the
data.
model adequacy, and forecasting, if desired. The model identification step generally requires
judgement from the analyst.
1 First, decide if the data are stationary. That is, do the data possess constant mean and variance?
Examine a time series plot to see if a transformation is required to give constant variance.
Examine the acf to see if large autocorrelations do not die out, indicating that differencing
may be required to give a constant mean.
A seasonal pattern that repeats every kth time interval suggests taking the kth difference to
remove a portion of the pattern. Most series should not require more than two difference
operations or orders. Be careful not to overdifference. If spikes in the acf die out rapidly, there
is no need for further differencing. A sign of an overdifferenced series is the first
autocorrelation close to −0.5 and small values elsewhere [11].
Use Stat ➤ Time Series ➤ Differences to take and store differences. Then, to examine the acf
and pacf of the differenced series, use Stat ➤ Time Series ➤ Autocorrelation and Stat ➤ Time
Series ➤ Partial Autocorrelation.
2 Next, examine the acf and pacf of your stationary data in order to identify what autoregressive
An acf with large spikes at initial lags that decay to zero or a pacf with a large spike at the
first and possibly at the second lag indicates an autoregressive process.
An acf with a large spike at the first and possibly at the second lag and a pacf with large
spikes at initial lags that decay to zero indicates a moving average process.
The acf and the pacf both exhibiting large spikes that gradually die out indicates that both
autoregressive and moving averages processes are present.
For most data, no more than two autoregressive parameters or two moving average parameters
are required in ARIMA models. See [11] for more details on identifying ARIMA models.
3 Once you have identified one or more likely models, you are ready to use the ARIMA
procedure.
Fit the likely models and examine the significance of parameters and select one model that
gives the best fit. See Entering the ARIMA model on page 7-47.
Check that the acf and pacf of residuals indicate a random process, signified when there are
no large spikes. You can easily obtain an acf and a pacf of the residuals using ARIMA's Graphs
subdialog box. If large spikes remain, consider changing the model.
You may perform several iterations in finding the best model. When you are satisfied with
the fit, go ahead and make forecasts.
The ARIMA algorithm will perform up to 25 iterations to fit a given model. If the solution does
not converge, store the estimated parameters and use them as starting values for a second fit. You
can store the estimated parameters and use them as starting values for a subsequent fit as often as
necessary.
If you want to fit a seasonal model, check Fit seasonal model and enter a number to specify
the period. The period is the span of the seasonality or the interval at which the pattern is
repeated. The default period is 12.
You must check Fit seasonal model before you can enter the seasonal autoregressive and
moving average parameters or the number of seasonal differences to take.
To specify the number of nonseasonal and/or seasonal differences to take, enter a number in
the appropriate box. If you request one seasonal difference with k as the seasonal period, the
kth difference will be taken.
To include the constant in the model, check Include constant term in model.
You may want to specify starting values for the parameter estimates. You must first enter the
starting values in a worksheet column in the following order: ARs (autoregressive
parameters), seasonal ARs, MAs (moving average parameters), seasonal MAs, and if you
checked Include constant term in model enter the starting value for the constant in the last
row of the column. This is the same order in which the parameters appear on the output.
Check Starting values for coefficients, and enter the column containing the starting values
for each parameter included in the model. Default starting values are 0.1 except for the
constant.
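The differencing options described above can be illustrated outside MINITAB. The following is a minimal Python sketch, not MINITAB's implementation: the function names `difference` and `acf` are hypothetical, and the autocorrelation uses the usual textbook estimator. It shows that one seasonal difference with period k is simply the kth difference.

```python
def difference(series, lag=1):
    """Return y[t] - y[t - lag] for every t where both values exist.

    lag=1 is an ordinary (nonseasonal) difference; lag=k is one seasonal
    difference for a seasonal period of k.
    """
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def acf(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag (textbook estimator)."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


# Hypothetical monthly series: a repeating 12-month pattern plus a yearly step.
monthly = [i % 12 + i // 12 for i in range(36)]
seasonally_differenced = difference(monthly, lag=12)
```

In this made-up series the seasonal pattern disappears after one difference of order 12, which is the situation in which a low-order model for the differenced series becomes plausible.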
The acf and pacf of the food employment data (see Example of autocorrelation on page 7-39 and
Example of partial autocorrelation on page 7-42) suggest an autoregressive model of order 1, or
AR(1), after taking a difference of order 12. You fit that model here, examine diagnostic plots,
and examine the goodness of fit. To take a seasonal difference of order 12, you specify the
seasonal period to be 12, and the order of the difference to be 1. In the subsequent example, you
perform forecasting.
1 Open the worksheet EMPLOY.MTW.
2 Choose Stat > Time Series > ARIMA.
3 In Series, enter Food.
4 Check Fit seasonal model. In Period, enter 12. Under Nonseasonal, enter 1 in
Session window output (only the T and P columns of the coefficient table are recoverable):
T       P
7.42    0.000
1.31    0.196
Graph window output (not reproduced)
In the previous example, you found that an AR(1) model with a twelfth seasonal difference gave
a good fit to the food sector employment data. You now use this fit to predict employment for the
next 12 months. The Session window output is not shown; you can see this output on page 7-48.
Step 1: Refit the ARIMA model without displaying the acf and pacf of the residuals:
1 Perform steps 1-5 of Example of fitting an ARIMA model on page 7-47.
2 Uncheck ACF of residuals and PACF of residuals.
References
[1] B.L. Bowerman and O'Connell (1987). Time Series Forecasting: Unified Concepts and
Computer Implementation, Duxbury, Boston.
[2] G.E.P. Box and G.M. Jenkins (1976). Time Series Analysis: Forecasting and Control, Revised
Edition, Holden Day.
[3] J.D. Cryer (1986). Time Series Analysis, Duxbury Press.
[4] N.R. Farnum and L.W. Stanton (1989). Quantitative Forecasting Methods. PWS-Kent,
Boston.
[5] G.M. Ljung and G.E.P. Box (1978). On a Measure of Lack of Fit in Time Series Models,
Biometrika 65, pp. 67-72.
[6] S. Makridakis, S.C. Wheelwright, and V.E. McGee (1983). Forecasting: Methods and
Applications. Wiley, New York.
[7] D.W. Marquardt (1963). An Algorithm for Least Squares Estimation of Nonlinear
Parameters, Journal Soc. Ind. Applied Mathematics 11, pp. 431-441.
[8] W.Q. Meeker, Jr. (1977). TSERIES - A User-oriented Computer Program for Identifying,
Fitting and Forecasting ARIMA Time Series Models, ASA 1977 Proceedings of the Statistical
Computing Section.
[9] W.Q. Meeker, Jr. (1977). TSERIES User's Manual, Statistical Laboratory, Iowa State
University.
[10] R.B. Miller and D.W. Wichern (1977). Intermediate Business Statistics, Holt, Rinehart and
Winston.
[11] W. Vandaele (1983). Applied Time Series and Box-Jenkins Models. Academic Press, Inc, New
York.
Acknowledgment
The ARIMA algorithm is based on the fitting routine in the TSERIES package written by
Professor William Q. Meeker, Jr., of Iowa State University [8], [9]. We are grateful to Professor
Meeker for his help in the adaptation of his routine to MINITAB.
Exploratory Data Analysis
Letter Values generates a letter-value display. Use this procedure to describe the location and
spread of sample distributions.
Median Polish fits an additive model to a two-way design and identifies data patterns not
explained by row and column effects. This procedure is similar to analysis of variance except
medians are used instead of means, thus adding robustness against the effect of outliers.
Resistant Line uses a method that is resistant to outliers to fit a straight line to your data. You
can fit a resistant line before using a least squares regression to see if the relationship is linear,
to find re-expressions to linearize the relationship if necessary, and to identify outliers.
Resistant Smooth smooths an ordered sequence of data, usually collected over time, to
remove random fluctuations. Smoothing is useful for discovering and summarizing both data
trends and outliers.
Letter Values
Use letter-value displays to describe the location and spread of sample distributions. The statistics
given depend on the sample size, and include median, hinges, eighths, and more.
Data
You need one column that contains numeric or date/time data, but no missing values. Delete any
missing values from the worksheet before displaying letter values.
2 In Variable, enter the column that contains the data for which you want to obtain letter
values.
3 If you like, use the option listed below, then click OK.
Options
You can store the letter values, middle values, and spreads.
Method
Letter values are defined by their depth. We use n for the number of observations.
depth of median:      d(M) = (n + 1) / 2
depth of hinges:      d(H) = ([d(M)] + 1) / 2
depth of eighths:     d(E) = ([d(H)] + 1) / 2
depth of sixteenths:  d(D) = ([d(E)] + 1) / 2
Remaining depths are found by continuing the pattern (depth of thirty-seconds, sixty-fourths,
etc.). The depth is determined by the amount of data in the column. Remaining depths are
labeled sequentially C, B, A, Z, Y, X, W, V, U, and so on.
To find the letter values, MINITAB first orders the data. The lower hinge is the observation at a
distance d(H) from the smallest observation; the upper hinge is the observation at a distance
d(H) from the largest observation. Similarly, the lower and upper eighths are the observations at
a depth d(E), and so on. If the depth for the data does not coincide with a data value, the average
of the nearest neighbors is taken.
The middle value for a given depth is the average of the upper and lower letter values at that
depth. The spread is (upper - lower).
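The depth rule and the "average of the nearest neighbors" rule can be sketched in a few lines of Python. This is an illustrative sketch only, not MINITAB's code: the function names are hypothetical, and the brackets in the depth formulas are assumed to mean "integer part" (floor), which is what reproduces the depths in the example output for n = 92.

```python
import math


def depths(n, count):
    """First `count` letter-value depths: median, hinges, eighths, ..."""
    out = []
    d = (n + 1) / 2                      # depth of the median
    for _ in range(count):
        out.append(d)
        d = (math.floor(d) + 1) / 2      # next depth from the integer part
    return out


def letter_value(sorted_data, depth):
    """Return (lower, upper, mid, spread) at a given depth.

    `sorted_data` must be in ascending order. A depth ending in .5 falls
    between two observations, so the two neighbors are averaged.
    """
    lo_idx = math.floor(depth) - 1       # 0-based index counted from the low end
    hi_idx = math.ceil(depth) - 1
    lower = (sorted_data[lo_idx] + sorted_data[hi_idx]) / 2
    upper = (sorted_data[-1 - lo_idx] + sorted_data[-1 - hi_idx]) / 2
    return lower, upper, (lower + upper) / 2, upper - lower
```

For example, `depths(92, 6)` gives 46.5, 23.5, 12.0, 6.5, 3.5, and 2.0, the depths labeled M, H, E, D, C, and B.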
When you store the letter values, the column will contain all the numbers on the output listed
under Lower (starting from the bottom and going up), then the median, and then the numbers
listed under Upper (starting from the top and going down).
Example of letter values
Session window output

        Depth     Lower     Upper       Mid    Spread
N=         92
M        46.5    71.000    71.000    71.000
H        23.5    64.000    80.000    72.000    16.000
E        12.0    62.000    88.000    75.000    26.000
D         6.5    59.000    91.000    75.000    32.000
C         3.5    56.000    95.000    75.500    39.000
B         2.0    54.000    96.000    75.000    42.000
            1    48.000   100.000    74.000    52.000
The median for this data is the average of the forty-sixth and forty-seventh ordered observations
and is 71.
The hinges are the average of the twenty-third and twenty-fourth observations from either
end (depth 23.5), with values of 64 and 80, the average of these being the Mid, or 72. The
difference between the upper and lower hinges is the Spread, or 16.
The eighths (E), sixteenths (D), and other letter values are calculated in a similar fashion.
Median Polish
Median Polish fits an additive model to a two-way design and identifies data patterns not
explained by row and column effects. This procedure is similar to analysis of variance except
medians are used instead of means, thus adding robustness against the effect of outliers. For a
complete discussion, see [1] and [2].
Median Polish does not print results. Use Stat > Tables > Cross Tabulation to display the data,
stored fits, or residuals.
Data
Arrange your data in three numeric columns in the worksheet: a response, a row factor, and a
column factor. Each row represents one observation. Row levels and column levels must be
consecutive integers starting at one. The table may be unbalanced and may have empty cells,
but you cannot have any missing values. Delete any missing values from the worksheet before
performing a median polish.
To perform a median polish
1 Choose Stat > EDA > Median Polish.
Options
specify the number of iterations to find the solution. The default is four. See Method on page
8-6.
use column medians rather than row medians for the first iteration. Starting with rows and
starting with columns does not necessarily yield the same fits, even if many iterations are
done.
store the comparison values. See Improving the fit of an additive model on page 8-6.
Method
Median Polish uses an iterative algorithm.
1 On the first iteration, MINITAB finds the median for each row of the table, subtracts these from
the numbers in the corresponding rows, and uses them as preliminary values for the row
effects. This gives a column of row medians and a new table from which the row medians have
been subtracted.
2 On the second iteration, it finds the median for each column of the new table, subtracts these
from the numbers in the columns, and uses them as preliminary values for the column effects.
In addition, it finds the median of the row effects, subtracts it from each row effect, and uses it
as the preliminary common value.
3 Median Polish now goes back to rows. This time when it finds row medians, it also finds the
median of the preliminary column effects, subtracts it from each column effect, and adds it to the
common value.
4 This procedure continues, working on rows and columns alternately. After the last iteration,
the row of column effects is corrected by itself: the median of this row is subtracted from each
column effect and is added to the common.
The numbers remaining in the table are the residuals. The margins of the table contain the
common, row, and column effects. The fitted value for row i, column j is common + (row effect
i) + (column effect j). As in analysis of variance, data = fit + residual.
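The iteration described above can be sketched as follows. This is a simplified illustration and not MINITAB's routine: the function name `median_polish` is hypothetical, the table is assumed to be complete with exactly one observation per cell, and the final re-centering of both sets of effects is a choice made for this sketch. Whatever the number of iterations, data = common + row effect + column effect + residual holds at every step.

```python
from statistics import median


def median_polish(table, iterations=4):
    """Return (common, row_effects, col_effects, residuals) for a full table."""
    resid = [list(row) for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    common = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols

    for it in range(iterations):
        if it % 2 == 0:
            # Row step: sweep each row's median into its row effect.
            for i in range(n_rows):
                m = median(resid[i])
                row_eff[i] += m
                resid[i] = [v - m for v in resid[i]]
            # Move the median of the column effects into the common value.
            m = median(col_eff)
            col_eff = [c - m for c in col_eff]
            common += m
        else:
            # Column step: sweep each column's median into its column effect.
            for j in range(n_cols):
                m = median(resid[i][j] for i in range(n_rows))
                col_eff[j] += m
                for i in range(n_rows):
                    resid[i][j] -= m
            # Move the median of the row effects into the common value.
            m = median(row_eff)
            row_eff = [r - m for r in row_eff]
            common += m

    # Final correction so that each set of effects has median zero.
    m = median(col_eff)
    col_eff = [c - m for c in col_eff]
    common += m
    m = median(row_eff)
    row_eff = [r - m for r in row_eff]
    common += m
    return common, row_eff, col_eff, resid
```

For a perfectly additive table, such as 10 plus row effects (-1, 0, 1) plus column effects (-2, 0, 2), the sketch recovers those effects exactly and leaves residuals of zero.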
j:
comparison value =
2 Plot each residual against its comparison value for visual inspection of the data.
3 Fit a straight line to the data using the Resistant Line procedure on page 8-9.
4 Determine whether or not a transformation will improve the fit of the additive model. Let p =
If p = , Y , where Y is the data, is likely to be more nearly additive (and thus better
analyzed by median polish).
The exploratory technique described above is similar to Tukey's one degree of freedom for
non-additivity method.
Example of median polish
Suppose you want to fit a model to experimental data in a two-way design. The experiment
involved three types of helmets, with a force applied to the front and to the back of each
helmet. The two factors of interest are helmet type and the location of the applied force; the
response is impact. Impact was measured to determine whether any identifiable patterns in the
data indicate a difference in the level of protection between the three helmet types or between
the front and back of the helmet. Here, we fit an additive model to a two-way design using a
median polish. Because Median Polish does not display any results, use Display Data and Cross
Tabulation to display results in the Session window.
Step 1: Perform the median polish
1 Open the worksheet EXH_STAT.MTW. Choose Stat > EDA > Median Polish.
2 In Response, enter Impact.
3 In Row factor, enter HelmetType. In Column factor, enter Location.
4 In Common effect, enter CommonEffect. In Row effects, enter RowEffect. In Column
Session window output

Data Display

CommonEffect    44.5000

Row   RowEffect   ColumnEffect
  1           0             -1
  2          23              1
  3          -3
box.
Session window output

Rows: HelmetType     Columns: Location

            1          2
1      3.5000     0.5000
      -0.5000    -5.5000
2     -4.5000    -1.5000
       1.5000     2.5000
3      0.5000    -0.5000
      -1.5000     3.5000
Resistant Line
Resistant line fits a straight line to your data using a method that is resistant to outliers. Velleman
and Hoaglin [2] suggest fitting a resistant line before using least squares regression to see if the
relationship is linear, to find re-expressions to linearize the relationship if necessary, and to
identify outliers.
Data
You must have two numeric columns, a response variable column and a predictor variable
column, with at least six, but preferably nine or more, observations.
MINITAB automatically omits missing data from the calculations.
To fit a resistant line
1 Choose Stat > EDA > Resistant Line.
2 In Response, enter the column that contains the measurement data (Y).
3 In Predictor, enter the column that contains the predictor variable data (X).
4 If you like, use one or more of the options listed below, then click OK.
Options
Resistant Line dialog box
specify the maximum number of iterations used to find a solution. The default is 10. This
procedure will stop before the specified number of iterations if the value of the slope does not
change very much.
Method
First the data are partitioned into three groups: data with low x-values, middle x-values, and high
x-values. The resistant line is fit so that the median residual in the left (low x) partition is equal
to the median residual in the right partition.
Resistant Line uses an iterative method to find this solution. It will usually reach a solution in
fewer than the default 10 iterations but, for some data sets, it may not converge at all. Failure to
converge is especially likely to happen if the data have extraordinary x-values. If you wish to print
the slope for each iteration, choose In addition, the slope for each iteration in the Results
subdialog box.
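The idea behind the method can be illustrated with a short Python sketch. It is a simplified version written for this illustration, not the algorithm MINITAB uses: the function name `resistant_line` is hypothetical, the outer groups are simply the lowest and highest thirds of the data by x, the slope update is a plain fixed-point step, and the intercept is taken as the median of all residuals.

```python
from statistics import median


def resistant_line(x, y, max_iter=10, tol=1e-6):
    """Fit y = intercept + slope * x so the left and right thirds
    have (approximately) equal median residuals."""
    pairs = sorted(zip(x, y))
    k = len(pairs) // 3                      # size of each outer group
    left, right = pairs[:k], pairs[-k:]
    # Distance between the median x-values of the outer groups
    # (assumed nonzero in this sketch).
    dx = median(px for px, _ in right) - median(px for px, _ in left)

    slope = 0.0
    for _ in range(max_iter):
        res_left = median(py - slope * px for px, py in left)
        res_right = median(py - slope * px for px, py in right)
        step = (res_right - res_left) / dx   # slope change that closes the gap
        slope += step
        if abs(step) <= tol:                 # slope no longer changes much
            break

    intercept = median(py - slope * px for px, py in pairs)
    return slope, intercept


# Hypothetical data: y = 2x + 1 with one wild value in the last observation.
xs = list(range(1, 10))
ys = [2 * v + 1 for v in xs]
ys[-1] = 100
slope, intercept = resistant_line(xs, ys)
```

Because only group medians enter the calculation, the single wild value does not move the fitted line in this example, whereas a least squares line would be pulled toward it.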
Resistant Smooth
Resistant Smooth smooths an ordered series of data, usually collected over time, to remove
random fluctuations. Smoothing is useful for discovering and summarizing both data trends and
outliers. Resistant Smooth offers two smoothing methods: 4253H, twice and 3RSSH, twice. See
Method on page 8-11.
Data
You must have a numeric column with at least seven observations. You can have missing data at
the beginning and end of the column, but not in the middle.
2 In Variable, enter the column that contains the raw data to be smoothed.
3 In Rough, enter a column to store the rough data (rough data = raw data - smoothed data).
4 In Smooth, enter a column to store the smoothed data.
5 If you like, use the option below, then click OK.
Options
You can choose to use the 3RSSH, twice smoother. The default method is to use the 4253H,
twice smoother.
Method
The smoothers are built up by successive applications of simple smoothers, such as running
medians and hanning. A running median replaces each observation with the median of a short
window of consecutive observations around it. Resistant Smooth uses medians of 2, 3, 4, and 5
consecutive observations. Hanning replaces y_t with the running weighted average
0.25 y_{t-1} + 0.5 y_t + 0.25 y_{t+1}. Special methods are used at the ends of the sequence. MINITAB provides two
smoothing methods: 4253H, twice and 3RSSH, twice.
4253H, twice (the default) consists of a running median of 4, then 2, then 5, then 3, followed
by hanning (H). Each residual, or rough, is then smoothed by the same smoother. The
smooth of the residual is then added to the smooth of the first pass to produce the full
smoother, 4253H, twice.
3RSSH, twice is made up of three simple smoothers: 3R, followed by SS, followed by H. 3R
says to repeatedly use running medians of length 3 until there are no changes. SS, or splitting,
uses a special method to remove flat spots that often appear after 3R. H is hanning.
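Two of the building blocks named above, the running median of 3 (and its repeated form, 3R) and hanning, are simple enough to sketch in Python. This is an illustration only: the function names are hypothetical, and the end points are left unchanged, which is an assumption made here rather than the special end-point rules the procedure actually uses. Splitting (SS) and the even-length medians are not shown.

```python
from statistics import median


def running_median3(y):
    """Replace each interior value by the median of itself and its two neighbors."""
    out = list(y)
    for t in range(1, len(y) - 1):
        out[t] = median(y[t - 1:t + 2])
    return out


def repeated_median3(y):
    """3R: apply running medians of 3 until the sequence stops changing."""
    current = list(y)
    while True:
        smoothed = running_median3(current)
        if smoothed == current:
            return smoothed
        current = smoothed


def hanning(y):
    """Replace each interior value by 0.25*y[t-1] + 0.5*y[t] + 0.25*y[t+1]."""
    out = list(y)
    for t in range(1, len(y) - 1):
        out[t] = 0.25 * y[t - 1] + 0.5 * y[t] + 0.25 * y[t + 1]
    return out


raw = [1, 10, 2, 3, 4]                        # hypothetical series with one spike
smooth = hanning(repeated_median3(raw))
rough = [r - s for r, s in zip(raw, smooth)]  # rough = raw - smooth
```

The median step removes the isolated spike completely, while hanning alone would only spread it over its neighbors, which is why the compound smoothers apply medians first.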
See [2] for a full description of these methods. Detailed analyses of the properties of these
smoothers can be found in [1].
Rootogram
A suspended rootogram is a histogram with a normal distribution fit to it, which displays the
deviations from the fitted normal distribution. Since a rootogram is fit using percentiles, it
protects against outliers and extraordinary bin counts. For further details see [2].
Data
Your data can be in one of two forms: raw or frequency. To use
raw data, you need one column of numeric or date/time data. By default, the rootogram
procedure will determine the bin boundaries.
frequency data, you need one numeric column that contains the count (frequency) of
observations for each bin. The frequencies need to be ordered down the column from the
upper-most bin to the lower-most bin (equivalent to the left-most and right-most bins in a
histogram, respectively). By default, the bins have a width of 1.
Optionally, you can specify the bin boundaries for both raw and frequency data in another
column. In the bin boundary column, enter the bin boundaries down the column from the
smallest to largest.
If you are using bin boundaries with frequency data, the first row of the frequency data column is
the count for the number of observations that fall below the smallest bin boundary. If no
observations fall below the first bin boundary, the count in the first row is zero. Similarly, the last
row of the frequency data column contains the count for the number of observations that fall
above the largest bin boundary. The frequency data column will have one more entry than the
column of bin boundaries.
MINITAB automatically omits missing data from the calculations.
under Source of Data, choose Variable, and enter the column that contains the raw data,
or
choose Frequencies, and enter the column that contains the counts
3 If you like, use one or more of the options listed below, then click OK.
Options
specify bin boundaries. Enter a column that contains bin boundaries ordered from smallest to
largest. If no bin boundaries are given and you enter frequency data, the bins are set to a
width of 1.
enter known values for μ and σ, overriding the automatic estimation of the mean and
standard deviation used in fitting the Gaussian comparison curve. If you enter a value for one
parameter, you must enter a value for the other parameter.
store bin boundaries, counts, double root residuals, and fitted counts.
Method
Let x_1, ..., x_k be the bin boundaries (the same as class boundaries for a histogram). These
determine k + 1 bins. Let b_i = bin from x_{i-1} to x_i. Let b_1 = half-open bin below x_1, and b_{k+1} =
half-open bin above x_k. Let n_i = (number of observations in b_i). If an observation falls on x_i, it is put
in b_{i+1}. If a mean and standard deviation are not specified, they are calculated as m = (1/2) (HL
+ HU) and s = (HU - HL) / 1.349, where HL and HU are the lower and upper hinges. See Method
for Letter Values on page 8-3 for definitions of hinges.
The fitting of the normal distribution is based upon square roots of the counts in each bin to
stabilize variance. The fitted count, f_i, is N × (area under the normal curve with the specified mean
and stdev, in bin i), where N = total number of observations. The raw residuals, RawRes, are
(n_i - f_i). The double root residuals, DRRes, are
DRRes = sqrt(2 + 4n_i) - sqrt(1 + 4f_i)    if n_i is not zero
DRRes = 1 - sqrt(1 + 4f_i)                 if n_i is zero
Double root residuals are essentially 2 (sqrt(n_i) - sqrt(f_i)), with a minor modification to avoid some
difficulties with small counts.
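The calculations in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not MINITAB's code: the function names are hypothetical, the normal areas come from the standard library's error function, and the zero-count case is assumed to use the fitted count f_i (1 - sqrt(1 + 4f_i)), which is the form consistent with the DRRes values in the rootogram example output.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def hinge_estimates(lower_hinge, upper_hinge):
    """Resistant estimates m and s computed from the hinges HL and HU."""
    m = (lower_hinge + upper_hinge) / 2
    s = (upper_hinge - lower_hinge) / 1.349
    return m, s


def fitted_counts(boundaries, mean, stdev, total):
    """Fitted count for each of the k + 1 bins defined by k boundaries,
    including the two half-open end bins."""
    cdf = [0.0] + [normal_cdf((x - mean) / stdev) for x in boundaries] + [1.0]
    return [total * (cdf[i + 1] - cdf[i]) for i in range(len(cdf) - 1)]


def double_root_residual(count, fitted):
    """Double root residual for one bin."""
    if count == 0:
        return 1.0 - math.sqrt(1.0 + 4.0 * fitted)
    return math.sqrt(2.0 + 4.0 * count) - math.sqrt(1.0 + 4.0 * fitted)
```

As a quick check, a bin with an observed count of 2 and a fitted count of about 2.8 gives a double root residual of roughly -0.33, close to 2 (sqrt(2) - sqrt(2.8)) as the text indicates.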
Here, we use a rootogram to determine whether or not the weight measurements from 92
students follow a normal distribution.
1 Open the worksheet PULSE.MTW.
2 Choose Stat > EDA > Rootogram.
3 In Variable, enter Weight. Click OK.
Session window output

Rootogram: Weight

Bin   Count   RawRes   DRRes
  1     0.0     -0.7   -0.90
  2     0.0     -1.2   -1.44
  3     2.0     -0.8   -0.35
  4     5.0     -0.5   -0.10
  5    12.0      3.0    0.99
  6    12.0     -0.5   -0.06
  7    11.0     -3.7   -0.94
  8    17.0      2.4    0.66
  9    16.0      3.7    1.03
 10     5.0     -3.8   -1.34
 11     5.0     -0.4   -0.04
 12     5.0      2.2    1.23
 13     1.0     -0.2    0.04
 14     0.0     -0.4   -0.66
 15     1.0      0.9    1.20
 16     0.0     -0.0   -0.09
Suspended Rootogram (character plot of the double root residuals by bin; not reproduced)
References
[1] P.F. Velleman (1980). Definition and Comparison of Robust Nonlinear Data Smoothing
Algorithms, Journal of the American Statistical Association, Volume 75, Number 371, pp.
609-615.
[2] P.F. Velleman and D.C. Hoaglin (1981). ABCs of EDA, Duxbury Press.
Acknowledgments
MINITAB's EDA commands use the programs in the book ABCs of EDA by P. Velleman and D.
Hoaglin [2]. See this book for a full explanation of these commands and guidance on how to use
them. We thank Paul Velleman and David Hoaglin for permission to use their routines and for
assistance in adapting them to MINITAB.
Power and Sample Size
A prospective study is used before collecting data to consider design sensitivity. You want to be
sure that you have enough power to detect differences (effects) that you have determined to be
important. For example, you can increase the design sensitivity by increasing the sample size
or by taking measures to decrease the error variance.
A retrospective study is used after collecting data to help understand the power of the tests that
you have performed. For example, suppose you conduct an experiment and the data analysis
does not reveal any statistically significant results. You can then calculate power based on the
minimum difference (effect) you wish to detect. If the power to detect this difference is low,
you may want to modify your experimental design to increase the power and continue to
evaluate the same problem. However, if the power is high, you may want to conclude that
there is no meaningful difference (effect) and discontinue experimentation.
MINITAB provides power, sample size, and difference (effect) calculations (also the number of
center points for factorial and Plackett-Burman designs) for the following procedures:
one-sample Z
one-sample proportion
one-sample t
two-sample proportion
Plackett-Burman designs
two-sample t
What is power?
There are four possible outcomes for a hypothesis test. The outcomes depend on whether the
null hypothesis (H0) is true or false and whether you decide to reject or fail to reject H0. The
power of a test is the probability of correctly rejecting H0 when it is false. In other words, power is
the likelihood that you will identify a significant difference (effect) when one exists.
The four possible outcomes are summarized below:

                      Null hypothesis
Decision              True                  False
fail to reject H0     correct decision      Type II error
                      p = 1 - α             p = β
reject H0             Type I error          correct decision (power)
                      p = α                 p = 1 - β

When H0 is true and you reject it, you make a type I error. The probability (p) of making a
type I error is called alpha (α) and is sometimes referred to as the level of significance for the
test.
When H0 is false and you fail to reject it, you make a type II error. The probability (p) of
making a type II error is called beta (β).
severity of making an error: The more serious the error, the less often you should be willing
to allow it to occur. Therefore, you should assign smaller probability values to more serious
errors.
magnitude of effect you want to detect: Power is the probability (p = 1 - β) of correctly
rejecting H0 when it is false. Ideally, you want to have high power to detect a difference that
you care about, and low power for a meaningless difference.
For example, suppose you want to claim that children in your school scored higher than the
general population on a standardized achievement test. You need to decide how much higher
than the general population your test scores need to be so you are not making claims that are
misleading. If your mean test score is only 0.7 points higher than the general population on a
100 point test, do you really want to detect a difference? Probably not. Therefore, you should
choose your sample size so that you only have power to detect differences that you consider
meaningful.
the size of the population difference (effect). As the size of population difference (effect)
decreases, power decreases.
Z-Test and t-Tests
power
sample size
You need to determine what are acceptable values for any two of these parameters and MINITAB
will solve for the third.
For example, if you specify values for power and the minimum difference, Minitab will determine
the sample size required to detect the specified difference at the specified level of power. See
Defining the minimum difference on page 9-5.
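The "specify two, solve for the third" idea can be illustrated for the simplest case, a two-tailed one-sample Z-test with known σ. The Python sketch below uses the textbook normal-theory power formula; it is not taken from MINITAB, the function names are hypothetical, and results for t-tests (which involve the noncentral t distribution) will differ from these values.

```python
from statistics import NormalDist


def power_one_sample_z(difference, sigma, n, alpha=0.05):
    """Power of a two-tailed one-sample Z-test to detect `difference`."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)          # two-tailed critical value
    shift = difference * n ** 0.5 / sigma    # standardized true difference
    return z.cdf(-crit + shift) + z.cdf(-crit - shift)


def sample_size_one_sample_z(difference, sigma, target_power, alpha=0.05):
    """Smallest n whose power reaches `target_power`.

    Assumes difference != 0 and target_power < 1 so the search terminates.
    """
    n = 1
    while power_one_sample_z(difference, sigma, n, alpha) < target_power:
        n += 1
    return n


n_needed = sample_size_one_sample_z(difference=1.0, sigma=1.0, target_power=0.8)
actual_power = power_one_sample_z(1.0, 1.0, n_needed)
```

Because the sample size must be an integer, the power actually achieved at the returned n is usually somewhat higher than the target, which is why a sample size calculation reports an actual power alongside the target power.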
To calculate power, sample size, or minimum difference
1 Choose Stat > Power and Sample Size > 1-Sample Z, 1-Sample t, or 2-Sample t.
enter is considered the sample size for each group. For example, if you want to
determine power for an analysis with 10 observations in each group for a total of 20, you
would enter 10.
2 In Differences, enter one or more numbers.
MINITAB will solve for all combinations of the specified values. For example, if you enter 3
values in Sample sizes and 2 values in Differences, MINITAB will compute the power for all 6
combinations of sample sizes and differences.
For a discussion of the value needed in Differences, see Defining the minimum difference on
page 9-5.
3 In Sigma, enter an estimate of the population standard deviation (σ) for your data. See
4 If you like, use one or more of the options listed below, then click OK.
Options
Options subdialog box
define the alternative hypothesis by choosing less than (lower-tailed), not equal (two-tailed),
or greater than (upper-tailed). The default is a two-tailed test.
store the sample sizes, differences (effects), and power values. When calculating sample size,
MINITAB stores the power value that will generate the nearest integer sample size.
For a one-sample Z- or t-test, express the difference in terms of the null hypothesis.
For example, suppose you are testing whether or not your students' mean test score is different
from the population mean. You would like to detect a difference of three points. In the dialog
box, you would enter 3 in Differences.
For a two-sample t-test, express the difference as the difference between the population means
that you would like to be able to detect.
For example, suppose you are investigating the effects of water acidity on the growth of two
groups of tadpoles. You decide that any difference in growth between the two groups that is
smaller than 4 mm is not important. In the dialog box, you would enter 4 in Differences.
Estimating σ
For power or minimum difference calculations, the estimate of σ depends on whether or not you
have already collected data.
Prospective studies are done before collecting data, so σ has to be estimated. You can use
related research, pilot studies, or subject-matter knowledge to estimate σ.
Retrospective studies are done after data have been collected, so you can use the sample
standard deviation to estimate σ. You could also use related research, pilot studies, or
subject-matter knowledge. Use Display Descriptive Statistics (page 1-6) to calculate the
sample standard deviation.
For sample size calculations, the data have not been collected yet, so the population standard
deviation (σ) has to be estimated. You can use related research, pilot studies, or subject-matter
knowledge to estimate σ.
Note
By default, MINITAB sets σ to 1.0. This is fine if the differences (effects) are standardized,
but will give erroneous results if they are not. When the differences (effects) are not
standardized, be sure to enter an estimate of σ.
Suppose you are the production manager at a dairy plant. In order to meet state requirements,
you must maintain strict control over the packaging of ice cream. The volume cannot vary more
than 3 oz for a half-gallon (64-oz) container. The packaging machine tolerances are set so the
process σ is 1. How many samples must be taken to estimate the mean package volume at a
confidence level of 99% (α = 0.01) for power values of 0.7, 0.8, and 0.9?
1 Choose Stat > Power and Sample Size > 1-Sample t.
2 In Differences, enter 3. In Power values, enter 0.7 0.8 0.9.
3 In Sigma, enter 1.
4 Click Options. In Significance level, enter 0.01. Click OK in each dialog box.
Session window output

Sample   Target   Actual
  Size    Power    Power
     5   0.7000   0.8947
     5   0.8000   0.8947
     6   0.9000   0.9827
Tests of Proportions
Proportion tests are used to perform hypothesis tests of a proportion (one-sample) or the
difference in proportions (two-sample). For these tests, you can calculate power, sample size,
or minimum difference.
You need to determine what are acceptable values for any two of these parameters and MINITAB
will solve for the third.
For example, if you specify values for power and the minimum difference, Minitab will
determine the sample size required to detect the specified difference at the specified level of
power. See Defining the minimum difference on page 9-9.
To calculate power, sample size, or minimum difference
1 Choose Stat > Power and Sample Size > 1 Proportion or 2 Proportions.
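2 Do one of the following:
To solve for power:
1 In Sample sizes, enter one or more numbers. For a two-sample test, the number you
enter is considered the sample size for each group. For example, if you want to
determine power for an analysis with 10 observations in each group for a total of 20, you
would enter 10.
2 In Alternative values of p or Proportion 1 values, enter one or more proportions.
To solve for sample size:
1 In Alternative values of p or Proportion 1 values, enter one or more proportions.
2 In Power values, enter one or more numbers.
To solve for minimum difference:
1 In Sample sizes, enter one or more numbers. For a two-sample test, the number you
enter is considered the sample size for each group, not the total number for the
experiment.
2 In Power values, enter one or more numbers.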
MINITAB will solve for all combinations of the specified values. For example, if you enter 3
values in Sample sizes and 2 values in Alternative values of p, MINITAB will compute the
power for all 6 combinations of sample sizes and alternative proportions.
For a discussion of the values needed in Alternative values of p and Proportion 1 values, see
Defining the minimum difference on page 9-9.
3 Do one of the following:
For a one-sample test, enter the expected proportion under the null hypothesis in
Hypothesized p. The default is 0.5.
For a two-sample test, enter the second proportion in Proportion 2. The default is 0.5.
For a discussion of the values needed in Hypothesized p and Proportion 2, see Defining the
minimum difference on page 9-9.
4 If you like, use one or more of the options listed below, then click OK.
Options
Options subdialog box
define the alternative hypothesis by choosing less than (lower-tailed), not equal (two-tailed), or
greater than (upper-tailed). The default is a two-tailed test.
specify the significance level of the test. The default is α = 0.05.
store the sample sizes, alternative values of p or proportion 1 values, and power values. When
calculating sample size, MINITAB stores the power value that will generate the nearest integer
sample size.
Defining the minimum difference
For a one-sample test of proportion, enter the expected proportion under the null hypothesis
for Hypothesized p in the dialog box.
Suppose you are testing whether the data are consistent with the following null hypothesis
and would like to detect any differences where the true proportion is greater than 0.73.
H0: p = 0.7
For a two-sample test of proportion, enter the expected proportions under the null hypothesis
for Proportion 2 in the dialog box.
Suppose a biologist wants to test whether or not there is a difference in the proportion of fish
that have been affected by pollution in two lakes. Previous research suggests that
approximately 25% of fish have been affected. The biologist would like to detect a difference
in proportions of 0.03.
H0: p1 = p2
H1: p1 ≠ p2
As a political advisor, you want to determine whether there is a difference between the
proportion of men and the proportion of women who support a tax reform bill. Results of a
previous survey of registered voters indicate that 30% (p = 0.30) of the voters support the tax bill.
If you mail 1000 surveys, what is the power to detect differences greater than 0.05 between the
proportions of men and women who support the tax bill?
1 Choose Stat > Power and Sample Size > 2 Proportions.
2 In Sample sizes, enter 1000.
3 In Proportion 1 values, enter 0.25 0.35.
4 In Proportion 2, enter 0.30. Click OK.
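The calculation requested in this example can be approximated by hand. The following is a minimal sketch in Python, assuming a two-sided test and the usual normal approximation (pooled standard error under the null hypothesis, unpooled under the alternative). The function name two_proportion_power is hypothetical, and the formula is an assumption for illustration, not a statement of the method MINITAB uses internally.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_power(p1, p2, n, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of proportions.

    n is the sample size for EACH group (not the total).
    Assumption: normal approximation to the binomial.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    # Standard error under H0 (pooled) and under the alternative (unpooled).
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n)
    se_alt = sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)
    diff = p1 - p2
    # Probability of rejecting in the upper tail plus the lower tail.
    upper = 1 - z.cdf((z_crit * se_null - diff) / se_alt)
    lower = z.cdf((-z_crit * se_null - diff) / se_alt)
    return upper + lower


# Values from the example: 1000 per group, Proportion 2 = 0.30.
powers = {p1: two_proportion_power(p1, 0.30, n=1000) for p1 in (0.25, 0.35)}
for p1, power in powers.items():
    print(f"Proportion 1 = {p1:.2f}  n = 1000 per group  power = {power:.4f}")
```

Under this approximation, the power to detect a difference of 0.05 is roughly 0.7 for Proportion 1 = 0.25 and somewhat lower for Proportion 1 = 0.35, because the variance of a proportion is larger nearer 0.5.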
Session window output
Proportion 1
0.250000
0.350000
One-Way Analysis Of Variance
For a one-way analysis of variance, you can calculate power, sample size, or the minimum
detectable difference between the smallest and largest factor means (maximum difference).
You need to determine what are acceptable values for any two of these parameters and MINITAB
will solve for the third.
For example, if you specify values for power and the maximum difference between the factor level
means, Minitab will determine the sample size required to detect the specified difference at the
specified level of power. See Defining the maximum difference on page 9-12.
Each number you enter in Sample sizes is considered the number of observations in every
factor level. For example, if you have 3 factor levels with 5 observations each, you would enter 5.
2 In Values of the maximum difference between means, enter one or more numbers.
MINITAB will solve for all combinations of the specified values. For example, if you enter 3
values in Sample sizes and 2 values in Values of the maximum difference between means,
MINITAB will compute the power for all 6 combinations of sample sizes and maximum
differences. See Defining the maximum difference on page 9-12.
3 In Sigma, enter an estimate of the population standard deviation (σ) for your data.
4 If you like, use one or more of the options listed below, then click OK.
Options
Options subdialog box
Suppose you are about to undertake an investigation to determine whether or not 4 treatments
affect the yield of a product using 5 observations per treatment. You know that the mean of the
control group should be around 8, and you would like to find significant differences of +4. Thus,
the maximum difference you are considering is 4 units. Previous research suggests the population
σ is 1.64.
1 Choose Stat > Power and Sample Size > One-way ANOVA.
2 In Number of levels, enter 4.
3 In Sample sizes, enter 5.
4 In Values of the maximum difference between means, enter 4.
5 In Sigma, enter 1.64. Click OK.
Session window output
Number of Levels = 4
Sample              Maximum
  Size    Power  Difference
     5   0.8269           4
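For two-level factorial and Plackett-Burman designs, you can calculate the number of replicates,
power, minimum effect, or number of center points.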
You need to determine what are acceptable values for any three of these parameters and
MINITAB will solve for the fourth.
For example, to calculate the number of replicates, you need to specify the minimum effect,
power, and the number of center points that you consider to be acceptable. Then, MINITAB
solves for the number of replicates you need to be able to reject the null hypothesis when the
true value differs from the hypothesized value by the specified minimum effect. See Defining the
effect on page 9-15.
To calculate power, replicates, minimum effect, or number of center points
1 Choose Stat > Power and Sample Size > 2-Level Factorial Design or Plackett-Burman
Design.
on page 9-14.
4 Do one of the following:
For information on the value needed in Effects, see Defining the effect on page 9-15.
5 In Sigma, enter an estimate of the population standard deviation (σ) for your data.
6 If you like, use one or more of the options listed below, then click OK.
Options
Designs subdialog box
As a quality engineer, you need to determine the best settings for 4 input variables (factors) to
improve the transparency of a plastic part. You have determined that a 4-factor, 8-run design (1/2
fraction) with 3 center points will allow you to estimate the effects you are interested in.
Although you would like to perform as few replicates as possible, you must be able to detect
effects of 5 or more. Previous experimentation suggests that 4.5 is a reasonable estimate of σ.
1 Choose Stat > Power and Sample Size > 2-Level Factorial Design.
2 In Number of factors, enter 4.
3 In Number of corner points, enter 8.
4 In Replicates, enter 1 2 3 4.
5 In Effects, enter 5.
6 In Number of center points, enter 3.
7 In Sigma, enter 4.5. Click OK.
Session window output
Factors: 4    Base Design: 4, 8
Blocks: none
Effect   Reps    Power
     5      1   0.1577
     5      2   0.5189
     5      3   0.7305
     5      4   0.8565
10
Quality Planning Tools
Run charts detect patterns in your process data, and perform two tests for non-random
behavior. See Run Chart on page 10-2.
Pareto charts help you identify which of your problems are most significant, so you can focus
improvement efforts on areas where the largest gains can be made. See Pareto Chart on page
10-11.
Multi-Vari charts present analysis of variance data in graphical form to give you a look at
your data. See Multi-Vari Chart on page 10-17.
Symmetry plots can help you assess whether your data come from a symmetric distribution.
See Symmetry Plot on page 10-19.
Run Chart
Use the Run Chart command to look for evidence of patterns in your process data, and perform
two tests for non-random behavior. Run Chart plots all of the individual observations versus the
subgroup number, and draws a horizontal reference line at the median. When the subgroup size
is greater than 1, Run Chart also plots the subgroup means or medians and connects them with a
line.
The two tests for non-random behavior detect trends, oscillation, mixtures, and clustering in your
data. Such patterns suggest that the variation observed is due to special causes (causes arising
from outside the system that can be corrected). Common cause variation, on the other hand, is
variation that is inherent or a natural part of the process. A process is in control when only
common causes, not special causes, affect the process output.
Data
You can use individual observations or subgroup data. Subgroup data can be structured in a
single column or in rows across several columns. When you have subgroups of unequal size,
enter the subgroups in a single column, then set up a second column of subgroup indicators. See
Data on page 12-3 for examples.
When subgroups or individual observations are in one column, enter the data column in
Single column. In Subgroup size, enter a subgroup size or column of subgroup
indicators. For individual observations, enter a subgroup size of 1.
When subgroups are in rows, enter a series of columns in Subgroups across rows of.
3 If you like, use any of the options described below, then click OK.
Options
Run Chart dialog box
Characteristics of a normal pattern include a random distribution of points; the actual
number of runs should be close to the expected number of runs.
The first of Run Chart's two tests for randomness is based on the number of runs about the
median. The second test is based on the number of runs up or down. The methods used to count
the number of runs are described in Interpreting the test for number of runs about the median on
page 10-6 and Interpreting the test for number of runs up or down on page 10-8. The following
table illustrates what these two tests can tell you:
Test for randomness               Condition                           Indicates
number of runs about the median   more runs observed than expected    mixtures in the data
                                  fewer runs observed than expected   clustering of data
number of runs up or down         more runs observed than expected    oscillation (data varies up and down rapidly)
                                  fewer runs observed than expected   trending of data
Both tests are based on the individual observations when the subgroup size is equal to one. When
the subgroup size is greater than one, the tests are based on either the subgroup means (the
default) or the subgroup medians.
With both tests, the null hypothesis is that the data is a random sequence. Run Chart converts
the observed number of runs into a test statistic that is approximately standard normal, then uses
the normal distribution to obtain p-values. See [1] for details. The two p-values correspond to the
one-sided probabilities associated with the test statistic. When either p-value is smaller than your
α-value, also known as the significance level, you should reject the hypothesis of randomness.
The α-value is the probability that you will incorrectly reject the hypothesis of randomness when
the hypothesis is true. For illustrative purposes in the examples, assume the test for randomness
is significant at an α-value of 0.05.
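The two tests can be illustrated with a short calculation. The following is a minimal sketch in Python, assuming the textbook normal approximations for the number of runs about the median and the number of runs up or down. The function names are hypothetical, the handling of ties (points on the median, or equal neighboring values, are dropped) is an assumption, and the results are not guaranteed to match MINITAB's output digit for digit.

```python
from math import sqrt
from statistics import NormalDist, median


def runs_about_median(values):
    """Return (observed runs, expected runs, z) for runs about the median."""
    center = median(values)
    # Assumption: points exactly on the median are dropped before counting.
    sides = [v > center for v in values if v != center]
    m = sum(sides)              # number of points above the median
    n = len(sides) - m          # number of points below the median
    runs = 1 + sum(a != b for a, b in zip(sides, sides[1:]))
    expected = 2 * m * n / (m + n) + 1
    variance = 2 * m * n * (2 * m * n - m - n) / ((m + n) ** 2 * (m + n - 1))
    return runs, expected, (runs - expected) / sqrt(variance)


def runs_up_or_down(values):
    """Return (observed runs, expected runs, z) for runs up or down."""
    # Assumption: equal neighboring values are dropped before counting.
    steps = [b > a for a, b in zip(values, values[1:]) if b != a]
    runs = 1 + sum(a != b for a, b in zip(steps, steps[1:]))
    n = len(values)
    expected = (2 * n - 1) / 3
    variance = (16 * n - 29) / 90
    return runs, expected, (runs - expected) / sqrt(variance)


data = [1, 5, 2, 6, 3, 7, 4, 8]    # made-up, strongly alternating sequence
z_dist = NormalDist()

med_runs, med_expected, med_z = runs_about_median(data)
p_clustering = z_dist.cdf(med_z)       # fewer runs than expected
p_mixtures = 1 - z_dist.cdf(med_z)     # more runs than expected

ud_runs, ud_expected, ud_z = runs_up_or_down(data)
p_trends = z_dist.cdf(ud_z)            # fewer runs than expected
p_oscillation = 1 - z_dist.cdf(ud_z)   # more runs than expected

print(med_runs, med_expected, round(p_clustering, 4), round(p_mixtures, 4))
print(ud_runs, ud_expected, round(p_trends, 4), round(p_oscillation, 4))
```

With only eight points the normal approximation is rough; the sequence is kept short so that the run counts can be checked by eye.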
Mixture pattern
Clusters may indicate variation due to special causes, such as
measurement problems, or sampling from a bad group of parts.
Clusters are groups of points in one area of the chart.
Cluster pattern
Oscillating pattern
Since the p-value is less than 0.05, you would reject the null
hypothesis of randomness in favor of the alternative for
oscillation. Here, rapid oscillation (data that varies up and
down quickly) is suggested.
Trend pattern
Example of a run chart
Suppose you work for a company that produces different kinds of devices to measure radiation. As
the quality control engineer, you are concerned with a membrane-type device's ability to
consistently measure the amount of radiation. You want to analyze the data from tests of twenty
devices (in groups of two) collected in an experimental chamber. After every test, you record the
amount of radiation that each device measured.
As an exploratory measure, you decide to construct a run chart to evaluate the variation in your
measurements.
1 Open the worksheet RADON.MTW.
2 Choose Stat > Quality Tools > Run Chart.
3 In Single column, enter Membrane.
4 In Subgroup size, enter 2. Click OK.
Graph
window
output
The 0.05 level of significance was chosen for illustrative purposes, because it is
conventional in many fields. You could evaluate the significance of the tests for
non-random patterns at any level you choose. When the p-value displayed is less than the
chosen level of significance, you reject the null hypothesis (a random sequence of data)
in favor of one of the alternatives. See Interpreting the tests for randomness on page 10-4
for a complete discussion.
Pareto Chart
Pareto charts are a type of bar chart in which the horizontal axis represents categories of interest,
rather than a continuous scale. The categories are often defects. By ordering the bars from
largest to smallest, a Pareto chart can help you determine which of the defects comprise the
"vital few" and which are the "trivial many." A cumulative percentage line helps you judge the
added contribution of each category. Pareto charts can help to focus improvement efforts on
areas where the largest gains can be made.
Pareto chart can draw one chart for all your data (the default), or separate charts for groups
within your data.
Data
You can structure your data in one of two ways:
as one column of raw data, where each observation is an occurrence of a type of defect
as two columns: one column of defect names and a corresponding column of counts
If you have a column of raw data, enter the column in Chart defects data in.
3 If you like, use any of the options described below, then click OK.
Options
draw separate Pareto charts for groups within your data. You can arrange the group charts one
of three ways:
All on one page, same ordering of bars. All of the charts will be in the same Graph
window, with the ordering of the bars determined by the first group. This means the bars in
subsequent groups will usually not be in Pareto order (largest to smallest). But this can be
useful for comparing importance of categories relative to a baseline, which is the first
group.
One chart per page, same ordering of bars. Each chart is full-size in its own Graph
window, with the ordering of the bars determined by the first group, as above. This means
that the bars in subsequent groups will usually not be in Pareto order. But this can be useful
for comparing importance of categories relative to a baseline, which is the first group.
One chart per page, independent ordering of bars. Each chart is full-size in its own
Graph window, in Pareto order. In most cases, the order will be different between groups.
Your worksheet must be structured as a column of raw data (not counts) and a By column to
use this option. See Example of a Pareto chart with a by column on page 10-14 for an
example.
specify a cumulative percentage at which to stop generating bars for individual defects. By
default, Pareto Chart generates bars until the cumulative percent of defects surpasses 95, then
groups the remaining defects into a bar named Others. You may want to stop at a different
cumulative percentage, such as 90.
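The cumulative percentage rule described above can be sketched in a few lines. The following Python example is an illustration only: the function name pareto_bars and the defect counts are made up, and the rule (stop once the cumulative percent exceeds the cutoff, then pool the rest into Others) is taken from the description above rather than from MINITAB's implementation.

```python
from collections import Counter


def pareto_bars(defects, cutoff=95.0):
    """Return (name, count, cumulative percent) for each bar of a Pareto chart.

    Bars are generated from largest to smallest until the cumulative percent
    surpasses the cutoff; remaining defects are pooled into "Others".
    """
    counts = Counter(defects).most_common()   # sorted largest to smallest
    total = sum(count for _, count in counts)
    bars, running = [], 0
    for i, (name, count) in enumerate(counts):
        running += count
        bars.append((name, count, 100 * running / total))
        if 100 * running / total > cutoff:
            rest = sum(c for _, c in counts[i + 1:])
            if rest:
                bars.append(("Others", rest, 100.0))
            break
    return bars


# Made-up raw data: one entry per observed defect.
damage = (["Scratch"] * 50 + ["Chip"] * 30 + ["Bend"] * 12
          + ["Dent"] * 5 + ["Rust"] * 2 + ["Tear"] * 1)

bars = pareto_bars(damage)
for name, count, cum_pct in bars:
    print(f"{name:<8} {count:>3} {cum_pct:6.1f}")
```

Passing cutoff=90 stops one bar earlier for these counts, which shows how the option changes the number of individual bars drawn.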
The company you work for manufactures metal bookcases. During final inspection, a certain
number of bookcases are rejected due to scratches, chips, bends, or dents. You want to make a
Pareto chart to see which defect is causing most of your problems. First you count the number of
times each defect occurred, then you enter the name of the defect each time it occurs into a
worksheet column called Damage.
1 Open the worksheet EXH_QC.MTW.
2 Choose Stat > Quality Tools > Pareto Chart.
3 Choose Chart defects data in and enter Damage in the text box. Click OK.
Graph
window
output
Suppose you work for a company that manufactures motorcycles. You hope to reduce quality
costs arising from defective speedometers. During inspection, a certain number of speedometers
are rejected, and the types of defects recorded. You enter the name of the defect into a worksheet
column called Defects, and the corresponding counts into a column called Counts. You know
that you can save the most money by focusing on the defects responsible for most of the
rejections. A Pareto chart will help you identify which defects are causing most of your problems.
1 Open the worksheet EXH_QC.MTW.
2 Choose Stat > Quality Tools > Pareto Chart.
3 Choose Chart defects table. Enter Defects in Labels in and Counts in Frequencies in. Click
OK.
Graph
window
output
Imagine you work for a company which manufactures dolls. Lately, you have noticed that an
increasing number of dolls are being rejected at final inspection due to scratches, peels, and
smudges in their paint. You want to see if a relationship exists between the type and number of
flaws, and the work shift producing the dolls.
1 Open the worksheet EXH_QC.MTW.
2 Choose Stat > Quality Tools > Pareto Chart.
3 Choose Chart defects data in and enter Flaws in the text box. In BY variable in, enter Period.
Click OK.
Graph
window
output
Cause-and-Effect Diagram
Use a fishbone (cause-and-effect, or Ishikawa) diagram to organize brainstorming information
about potential causes of a problem. Diagramming helps you to see relationships among
potential causes. You can draw a blank diagram, or a diagram filled in as much as you like.
Although there is no correct way to construct a fishbone diagram, some types lend themselves
well to many different situations.
Option: On each branch, you can list specific causes in that category.
Data
If you want to enter causes on the branches of the diagram, create a column of causes for each
branch.
To make a cause-and-effect diagram
1 Choose Stat > Quality Tools > Cause-and-Effect.
2 If you like, use any of the options described below, then click OK.
Options
customize the diagram with your own labels, causes, and name of problem (or effect) you
would like to solve
draw a blank diagram; see Example of drawing three common diagrams on page 10-16 for an
illustration
Using a Pareto chart (see page 10-11) you discovered that your parts were rejected most often due
to surface flaws. This afternoon, you are meeting with members of various departments to
brainstorm potential causes for these flaws. Beforehand, you decide to print a cause-and-effect
diagram to help organize your notes during the meeting.
1 Choose Stat > Quality Tools > Cause-and-Effect.
2 Do one of the following:
Multi-Vari Chart
MINITAB draws Shainin multi-vari charts for up to four factors. Multi-vari charts are a way of
presenting analysis of variance data in a graphical form providing a visual alternative to
analysis of variance. These charts may also be used in the preliminary stages of data analysis to
get a look at the data. The chart displays the means at each factor level for every factor. A chart
for two factors (MetalType and SinterTime), each with three levels, is shown below:
[Figure: multi-vari chart with one line for each SinterTime level (ST=100, ST=150, ST=200). At
each level of MetalType, MINITAB plots and connects the three level means for SinterTime.]
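The means that such a chart displays can be computed directly. The following is a minimal sketch in Python for two factors; the observations, the MetalType codes 1 and 2, and the variable names are made up for illustration and are not taken from any MINITAB worksheet.

```python
from collections import defaultdict
from statistics import mean

# Made-up observations: (MetalType, SinterTime, Strength).
rows = [
    (1, 100, 20), (1, 100, 22), (1, 150, 24), (1, 150, 26), (1, 200, 21), (1, 200, 23),
    (2, 100, 18), (2, 100, 20), (2, 150, 19), (2, 150, 21), (2, 200, 22), (2, 200, 24),
]

# Collect the responses for every MetalType/SinterTime combination.
cells = defaultdict(list)
for metal, sinter_time, strength in rows:
    cells[(metal, sinter_time)].append(strength)

# One mean per combination: these are the points that are plotted and connected.
cell_means = {key: mean(values) for key, values in cells.items()}

# One mean per MetalType level, across all sintering times.
metal_means = {
    metal: mean(s for m, _, s in rows if m == metal)
    for metal in sorted({m for m, _, _ in rows})
}

for metal in metal_means:
    line = [cell_means[(metal, t)] for t in (100, 150, 200)]
    print(f"MetalType {metal}: SinterTime means {line}, overall {metal_means[metal]:.2f}")
```

Non-parallel lines between the MetalType panels would hint at an interaction between the two factors, which is the kind of pattern the chart is meant to reveal before a formal analysis of variance.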
Data
You need one numeric column for the response variable and up to four numeric, text, or
date/time factor columns. Each row contains the data for a single observation.
Text categories (factor levels) are processed in alphabetical order by default. If you wish, you can
define your own order; see Ordering Text Categories in the Manipulating Data chapter in
MINITAB User's Guide 1 for details.
MINITAB automatically omits missing data from the calculations.
To draw a multi-vari chart
1 Choose Stat > Quality Tools > Multi-Vari Chart.
Options
Options subdialog box
connect the factor level means for each factor with a line
You are responsible for evaluating the effects of sintering time on the compressive strength of
three different metals. Compressive strength was measured for five specimens for each metal type
at each of the sintering times: 100 minutes, 150 minutes, and 200 minutes. Before you engage in
a full data analysis, you want to view the data to see if there are any visible trends or interactions
by viewing a multi-vari chart.
1 Open the worksheet SINTER.MTW.
Graph
window
output
Symmetry Plot
Symmetry plots can be used to assess whether sample data come from a symmetric distribution.
Many statistical procedures assume that data come from a normal distribution. However, many
procedures are robust to violations of normality, so having data from a symmetric distribution is
often sufficient. Other procedures, such as nonparametric methods, assume symmetric
distributions rather than normal distributions. Therefore, a symmetry plot is a useful tool in
many circumstances.
Data
The data columns must be numeric. If you enter more than one data column, MINITAB draws a
separate symmetry plot for each column.
MINITAB automatically omits missing data from the calculations.
To draw a symmetry plot
1 Choose Stat > Quality Tools > Symmetry Plot.
2 In Variables, enter the columns containing the numeric data you want to plot.
3 If you like, use the option listed below, then click OK.
Options
Options subdialog box
Method
MINITAB plots the distances from the median of ordered pairs of the data from the sample. The
distances for each ordered pair make up the X and Y coordinates of a single point for each pair:
1 The first pair consists of the two values that are closest to the median, one above and one
below.
2 The second pair consists of the two values that are second closest to the median, one above
and one below.
The distance from the median for the point in each pair that is less than the median becomes the
Y coordinate for that point. The distance from the median for the point in each pair that is greater
than the median becomes the X coordinate for that point.
MINITAB also displays a histogram to provide an alternative view of the distribution.
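The pairing described in this method can be written out as a short calculation. The following is a minimal sketch in Python; the function name symmetry_points and the two data sets are made up for illustration, and the pairing by rank (first value below the median with first value above, and so on) is one straightforward reading of the description above.

```python
from statistics import median


def symmetry_points(data):
    """Return the (X, Y) coordinates of a symmetry plot.

    X is the distance above the median, Y the distance below the median,
    for the pair of values that are equally ranked on either side of it.
    """
    ordered = sorted(data)
    center = median(ordered)
    half = len(ordered) // 2
    below = ordered[:half][::-1]   # values below the median, closest first
    above = ordered[-half:]        # values above the median, closest first
    return [(hi - center, center - lo) for lo, hi in zip(below, above)]


symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_skewed = [1, 2, 3, 4, 5, 7, 10, 15, 25]

print(symmetry_points(symmetric))
print(symmetry_points(right_skewed))
```

For the symmetric sample every point has X equal to Y. For the right-skewed sample X is larger than Y in every pair, because the values above the median lie farther from it than the values below.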
Interpreting the symmetry plot
When the sample data follow a symmetric distribution, the X and Y coordinates will be
approximately equal for all points and the data will fall in a straight line. MINITAB draws a line on
the plot to represent exact X-Y equality (a perfectly symmetric sample). By comparing the data
points to the line, you can assess the degree of symmetry present in the data. The more symmetric
the data, the closer the points will be to the line. Even with normally distributed data, you can
expect to see runs of points above or below the line. The important thing to look for is whether
the points remain close to or parallel to the line, versus the points diverging from the line. You
can detect the following asymmetric conditions:
Data points diverging above the line indicate skewness to the left.
Data points diverging below the line indicate skewness to the right.
Points far away from the line in the upper right corner (where distances are large) indicate
some degree of skewness in the tails of the distribution.
Caution
As a rule of thumb, you should have at least 25 to 30 data points. Interpreting a plot with
too few data points may lead to incorrect conclusions.
Before doing further analysis, you would like to determine whether or not the sample data come
from a symmetric distribution.
1 Open the worksheet EXH_QC.MTW.
2 Choose Stat > Quality Tools > Symmetry Plot.
Graph
window
output
References
[1] J.D. Gibbons (1986). "Randomness, Tests of," Encyclopedia of Statistical Sciences, 7, John
Wiley & Sons, pp. 555-562.
[2] T.P. Ryan (1989). Statistical Methods for Quality Improvement, John Wiley & Sons.
[3] W.A. Taylor (1991). Optimization & Variation Reduction in Quality, McGraw-Hill, Inc.
11
Measurement Systems
Analysis
Gage R&R (Crossed), Gage R&R (Nested), and Gage Run Chart examine measurement
system precision.
Any time you measure the results of a process you will see some variation. This variation comes
from two sources: one, there are always differences between parts made by any process, and two,
any method of taking measurements is imperfect; thus, measuring the same part repeatedly does
not result in identical measurements.
Statistical Process Control (SPC) is concerned with identifying sources of part-to-part variation,
and reducing that variation as much as possible to get a more consistent product. But before you
do any SPC analyses, you may want to check that the variation you observe is not overly due to
errors in your measurement system.
Accuracy describes the difference between the measurement and the part's actual value.
Precision describes the variation you see when you measure the same part repeatedly with the
same device.
Within any measurement system, you can have one or both of these problems. For example, you
can have a device which measures parts precisely (little variation in the measurements) but not
accurately. You can also have a device that is accurate (the average of the measurements is very
close to the actual value), but not precise, that is, the measurements have large variance. You
can also have a device that is neither accurate nor precise.
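The distinction can be made concrete with a small calculation. The following Python sketch uses made-up readings of a single part and a hypothetical master value; it treats the difference between the average reading and the master value as a summary of accuracy and the standard deviation of the repeated readings as a summary of precision. It is an illustration only, not one of the gage studies described in this chapter.

```python
from statistics import mean, stdev

master_value = 10.00                              # hypothetical reference value for the part
readings = [10.21, 10.19, 10.20, 10.22, 10.18]    # made-up repeated measurements of that part

bias = mean(readings) - master_value   # accuracy: average reading minus master value
spread = stdev(readings)               # precision: variation among repeated readings

print(f"bias = {bias:.3f}, standard deviation = {spread:.4f}")
```

Here the readings agree closely with one another but are all about 0.2 too high, so this made-up device would be precise but not accurate.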
Accuracy
The accuracy of a measurement system is usually broken into three components:
linearity: a measure of how the size of the part affects the accuracy of the measurement
system. It is the difference in the observed accuracy values through the expected range of
measurements.
accuracy: a measure of the bias in the measurement system. It is the difference between the
observed average measurement and a master value.
stability: a measure of how accurately the system performs over time. It is the total variation
obtained with a particular device, on the same part, when measuring a single characteristic
over time.
To examine your measurement system's accuracy, see Gage Linearity and Accuracy Study on
page 11-26.
Precision
Precision, or measurement variation, can be broken down into two components:
repeatability: the variation due to the measuring device. It is the variation observed when the
same operator measures the same part repeatedly with the same device.
reproducibility: the variation observed when different operators measure the same part with
the same device.
To examine your measurement system's precision, see Gage R&R Study on page 11-4.
To look at a plot of all of the measurements by operator/part combination, and thus visualize the
repeatability and reproducibility components of the measurement variation, see Gage Run
Chart on page 11-22.
The Gage R&R Study and Gage Run Chart examples use two data sets:
GAGE2.MTW, in which measurement system variation has a large effect on the overall
observed variation
GAGEAIAG.MTW, in which measurement system variation has a small effect on the overall
observed variation
You can compare the output for the two data sets, as well as compare results from the various
analyses.
The Gage Linearity and Accuracy Study example uses the GAGELIN.MTW data set.
GAGEAIAG.MTW and GAGELIN.MTW are reprinted with permission from the Measurement
Systems Analysis Reference Manual (Chrysler, Ford, General Motors Supplier Quality
Requirements Task Force).
Gage R&R Study
Use Gage R&R Study (Crossed) when each part is measured multiple times by each operator.
Use Gage R&R Study (Nested) when each part is measured by only one operator, such as in
destructive testing. In destructive testing, the measured characteristic is different after the
measurement process than it was at the beginning. Crash testing is an example of destructive
testing.
MINITAB provides two methods for assessing repeatability and reproducibility: Xbar and R, and
ANOVA. The Xbar and R method breaks down the overall variation into three categories:
part-to-part, repeatability, and reproducibility. The ANOVA method goes one step further and
breaks down reproducibility into its operator and operator-by-part components.
[Diagram: Overall Variation divides into Part-to-Part Variation, Repeatability, and
Reproducibility; Reproducibility divides into Operator and Operator by Part.]
The ANOVA method is more accurate than the Xbar and R method, in part because it considers
the operator by part interaction [3] and [4]. Gage R&R Study (Crossed) allows you to choose
between the Xbar and R method and the ANOVA method. Gage R&R Study (Nested) uses the
ANOVA method only.
If you need to use destructive testing, you must be able to assume that all parts within a single
batch are identical enough to claim that they are the same part. If you are unable to make that
assumption then part-to-part variation within a batch will mask the measurement system
variation.
If you can make that assumption, then choosing between a crossed or nested Gage R&R Study for
destructive testing depends on how your measurement process is set up. If all operators measure
parts from each batch, then use Gage R&R Study (Crossed). If each batch is only measured by a
single operator, then you must use Gage R&R Study (Nested). In fact, whenever operators
measure unique parts, you have a nested design.
Data
Gage R&R Study (Crossed)
Structure your data so that each row contains the part name or number, operator (optional), and
the observed measurement. Parts and operators can be text or numbers.
PartNum   Operator   Measure
1         Daryl      1.48
1         Daryl      1.43
2         Daryl      1.83
2         Daryl      1.83
3         Daryl      1.53
3         Daryl      1.38
1         Beth       1.78
1         Beth       1.33
The Gage R&R studies require balanced designs (equal numbers of observations per cell) and
replicates. You can estimate any missing observations with the methods described in [2].
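The balance requirement can be checked before running a crossed study. The following is a minimal Python sketch, not part of MINITAB; the function name `is_balanced` and the sample rows are hypothetical and serve only as illustration. It assumes a crossed layout, where every operator measures every part.

```python
from collections import Counter

def is_balanced(rows):
    """rows: iterable of (part, operator, measurement) tuples (crossed layout)."""
    counts = Counter((part, operator) for part, operator, _ in rows)
    parts = {p for p, _ in counts}
    operators = {o for _, o in counts}
    # Every part/operator cell must be present ...
    all_cells_present = len(counts) == len(parts) * len(operators)
    # ... and every cell must hold the same number of observations,
    # with at least two observations per cell (replicates).
    sizes = set(counts.values())
    replicated = bool(sizes) and min(sizes) >= 2
    return all_cells_present and len(sizes) == 1 and replicated

# Hypothetical measurements: 2 parts x 2 operators x 2 replicates.
rows = [
    (1, "Daryl", 1.48), (1, "Daryl", 1.43),
    (2, "Daryl", 1.83), (2, "Daryl", 1.83),
    (1, "Beth", 1.78), (1, "Beth", 1.33),
    (2, "Beth", 1.90), (2, "Beth", 1.85),
]
print(is_balanced(rows))       # complete layout
print(is_balanced(rows[:-1]))  # last cell is one observation short
```

A layout that fails this check has missing observations, which would need to be estimated or re-measured before the study is run.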
Gage R&R Study (Nested)
Structure your data so that each row contains the part name or number, operator, and the
observed measurement. Parts and operators can be text or numbers. Part is nested within
operator, because each operator measures unique parts.
Note
If you use destructive testing, you must be able to assume that all parts within a single
batch are identical enough to claim that they are the same part.
In the first table below, PartNum 1 for Daryl is truly a different part from PartNum 1 for Beth.
The second table shows the same measurements with part numbers that are unique across
operators.

PartNum   Operator   Measure
1         Daryl      1.48
1         Daryl      1.43
2         Daryl      1.83
2         Daryl      1.83
3         Daryl      1.53
3         Daryl      1.52
1         Beth       1.38
1         Beth       1.78
2         Beth       1.33

PartNum   Operator   Measure
1         Daryl      1.48
1         Daryl      1.43
2         Daryl      1.83
2         Daryl      1.83
3         Daryl      1.53
3         Daryl      1.52
4         Beth       1.38
4         Beth       1.78
5         Beth       1.33
The Gage R&R studies require balanced designs (equal numbers of observations per cell) and
replicates. You can estimate any missing observations with the methods described in [2].
To do a Gage R&R Study (Crossed)
1 Choose Stat > Quality Tools > Gage R&R Study (Crossed).
2 In Part or batch numbers, enter the column of part or batch names or numbers.
Options
Gage R&R Study dialog box
(Gage R&R (Crossed) only) use the ANOVA or Xbar and R (default) method of analysis
change the multiple in the Study Var (5.15*SD) column by entering a study variation; see
StudyVar in Session window output on page 11-9
display a column showing the percentage of process tolerance taken up by each variance
component (a measure of precision-to-tolerance for each component)
display a column showing the percentage of process standard deviation taken up by each
variance component
ANOVA method
When both Parts and Operators are entered
When you enter Operators as well as Parts, your data are analyzed using a balanced two-factor
factorial design. Both factors are considered to be random. The model includes the main effects
of Parts and Operators, plus the Part by Operator interaction. (When operators are not entered,
the model is a balanced one-way ANOVA with Part as a random factor, as described in the next
section.)
MINITAB first calculates the ANOVA table for the appropriate model. That table is then used to
calculate the variance components, which appear in the Gage R&R tables.
Note
Some of the variance components could be estimated as negative numbers when the
Part by Operator term in the full model is not significant. MINITAB will first display an
ANOVA table for the full model. If the p-value for the Part by Operator term is > 0.25, a
reduced model is then fitted and used to calculate the variance components. This
reduced model includes only the main effects of Part and Operator.
With the full model, the variance component for Reproducibility is further broken down into
variation due to Operator and variation due to the Part by Operator interaction:
The Operator component is the variation observed between different operators measuring
the same part.
The Part by Operator interaction is the variation among the average part sizes measured by
each operator. This interaction takes into account cases where, for instance, one operator
gets more variation when measuring smaller parts, whereas another operator gets more
variation when measuring larger parts.
Use the table of variance components to interpret these effects.
With the reduced model, the variance component for Reproducibility is simply the variance
component for Operator.
MINITAB first calculates the ANOVA table for the appropriate model. That table is then used to
calculate the variance components: Repeatability, Reproducibility, and Part-to-Part.
Note
Some of the variance components could be estimated as negative numbers when the
Part by Operator term in the full model is not significant. MINITAB will first display an
ANOVA table for the full model. If the p-value for the Part by Operator term is > 0.25, a
reduced model is then fitted and used to calculate the variance components. This
reduced model includes only the main effects of Part and Operator.
ANOVA Table (ANOVA method only): displays the usual analysis of variance output for the
fitted effects. See Note under ANOVA method on page 11-8 for more information.
Gage R&R
VarComp (or Variance): the variance component contributed by each source.
%Contribution: the percent contribution to the overall variation made by each variance
component. (Each variance component divided by the total variation, then multiplied by
100.) The percentages in this column add to 100.
StdDev: the standard deviation for each variance component.
StudyVar: the standard deviations multiplied by 5.15. You can change the multiple from
5.15 to some other number. The default is 5.15*sigma, because 5.15 is the number of
standard deviations needed to capture 99% of your process measurements. The last entry
in the StudyVar column is 5.15*Total. This number, usually referred to as the study
variation, estimates the width of the interval you need to capture 99% of your process
measurements.
%Study Var: the percent of the study variation for each component (the standard
deviation for each component divided by the total standard deviation). These percentages
do not add to 100.
Number of Distinct Categories: the number of distinct categories within the process data
that the measurement system can discern. For instance, imagine you measured ten different
parts, and MINITAB reported that your measurement system could discern four distinct
categories. This means that some of those ten parts are not different enough to be discerned as
being different by your measurement system. If you want to distinguish a higher number of
distinct categories, you need a more precise gage.
The number is calculated by dividing the standard deviation for Parts by the standard
deviation for Gage, then multiplying by 1.41 and rounding down to the nearest integer. This
number represents the number of non-overlapping confidence intervals that will span the
range of product variation.
The Automotive Industry Action Group (AIAG) [1] suggests that when the number of
categories is less than two, the measurement system is of no value for controlling the process,
since one part cannot be distinguished from another. When the number of categories is two,
the data can be divided into two groups, say high and low. When the number of categories is
three, the data can be divided into three groups, say low, middle and high. A value of five or
more denotes an acceptable measurement system.
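The column definitions above can be traced with a short calculation. The following Python sketch is an illustration, not MINITAB's implementation; the function name `gage_rr_table` is hypothetical. It takes three variance components as input (the values used are the Repeatability, Reproducibility, and Part-To-Part components from the GAGEAIAG ANOVA output in this chapter) and assumes the Total Gage R&R variance is greater than zero.

```python
import math

def gage_rr_table(var_repeat, var_reprod, var_part, multiple=5.15):
    """Build the Gage R&R columns from three variance components."""
    var_gage = var_repeat + var_reprod      # Total Gage R&R
    var_total = var_gage + var_part         # Total Variation
    sd_total = math.sqrt(var_total)
    sources = [
        ("Total Gage R&R", var_gage),
        ("Repeatability", var_repeat),
        ("Reproducibility", var_reprod),
        ("Part-To-Part", var_part),
        ("Total Variation", var_total),
    ]
    table = {}
    for name, var in sources:
        sd = math.sqrt(var)
        table[name] = {
            "VarComp": var,
            "%Contribution": 100 * var / var_total,  # variances add to 100
            "StdDev": sd,
            "StudyVar": multiple * sd,               # 5.15*SD by default
            "%StudyVar": 100 * sd / sd_total,        # does not add to 100
        }
    # Number of distinct categories: 1.41 * SD(parts) / SD(gage), rounded down.
    ndc = math.floor(1.41 * math.sqrt(var_part) / math.sqrt(var_gage))
    return table, ndc

table, ndc = gage_rr_table(0.001292, 0.003146, 0.037164)
for name, row in table.items():
    print(f"{name:<16} {row['%Contribution']:7.2f} {row['StudyVar']:8.5f} {row['%StudyVar']:7.2f}")
print("Number of Distinct Categories =", ndc)
```

Because %Study Var is a ratio of standard deviations rather than variances, its entries for Total Gage R&R and Part-To-Part sum to more than 100.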
Components of Variation is a visualization of the final table in the Session window output,
showing bars for: Total Gage R&R, Repeatability, Reproducibility (but not Operator and
Operator by Part), and Part-to-Part variation.
R Chart by Operator displays the variation in the measurements made by each operator, so
you can compare operators to each other. This helps you determine if each operator has the
variability of their measurements in control.
Xbar Chart by Operator displays the measurements in relation to the overall mean for each
operator, so you can compare operators to each other and to the mean. This helps you
determine if each operator has the average of their measurements in control.
By Part displays the main effect for Part, so you can compare the mean measurement for each
part. If you have many replicates, boxplots are displayed on the By Part graph.
By Operator displays the main effect for Operator, so you can compare the mean
measurement for each operator. If you have many replicates, boxplots are displayed on the By
Operator graph.
Operator by Part Interaction (Gage R&R Study (Crossed) only) displays the Operator by Part
effect, so you can see how the relationship between Operator and Part changes depending on
the operator.
In this example, we do a gage R&R study on two data sets: one in which measurement system
variation contributes little to the overall observed variation (GAGEAIAG.MTW), and one in
which measurement system variation contributes a lot to the overall observed variation
(GAGE2.MTW). For comparison, we analyze the data using both the Xbar and R method and the
ANOVA method. You can also look at the same data plotted on a Gage Run Chart (page 11-23).
For the GAGEAIAG data set, ten parts were selected that represent the expected range of the
process variation. Three operators measured the ten parts, two times per part, in a random order.
For the GAGE2 data, three parts were selected that represent the expected range of the process
variation. Three operators measured the three parts, three times per part, in a random order.
1 Open the file GAGEAIAG.MTW.
2 Choose Stat > Quality Tools > Gage R&R Study (Crossed).
3 In Part numbers, enter Part. In Operators, enter Operator. In Measurement data, enter
Response.
4 Under Method of Analysis, choose Xbar and R.
3 In Part numbers, enter Part. In Operators, enter Operator. In Measurement data, enter
Response.
4 Under Method of Analysis, choose ANOVA.
5 Click OK.
6 Now repeat steps 2 and 3 using the GAGE2.MTW data set.
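                                  %Contribution
Source              Variance      (of Variance)
Total Gage R&R      2.08E-03           6.33
  Repeatability     1.15E-03           3.51
  Reproducibility   9.29E-04           2.82
Part-To-Part        3.08E-02          93.67
Total Variation     3.29E-02         100.00

                    StdDev      Study Var    %Study Var
Source              (SD)        (5.15*SD)    (%SV)
Total Gage R&R      0.045650    0.235099      25.16
  Repeatability     0.033983    0.175015      18.73
  Reproducibility   0.030481    0.156975      16.80
Part-To-Part        0.175577    0.904219      96.78
Total Variation     0.181414    0.934282     100.00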
A The measurement system variation (Total Gage R&R) is much smaller than what was found
for the same data with the ANOVA method. That is because the Xbar and R method does not
account for the Operator by Part effect, which was very large for this data set. Here you get
misleading estimates of the percentage of variation due to the measurement system.
B According to AIAG, 4 represents an adequate measuring system. However, as explained above,
you would be better off using the ANOVA method for this data. (See Session window output on
page 11-9.)
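                                  %Contribution
Source              Variance      (of Variance)
Total Gage R&R      7229.94           78.11
  Repeatability     7229.94           78.11
  Reproducibility      0.00            0.00
Part-To-Part        2026.05           21.89
Total Variation     9255.99          100.00

                    StdDev      Study Var    %Study Var
Source              (SD)        (5.15*SD)    (%SV)
Total Gage R&R      85.0291     437.900       88.38
  Repeatability     85.0291     437.900       88.38
  Reproducibility    0.0000       0.000        0.00
Part-To-Part        45.0116     231.810       46.79
Total Variation     96.2081     495.471      100.00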
A A large percentage (78.111%) of the variation in the data is due to the measuring system
(Gage R&R); little is due to differences between parts (21.889%).
B A 1 tells you the measurement system is poor; it cannot distinguish differences between parts.
(See Session window output on page 11-9.)
A A low percentage of variation (6%) is due to the measurement system (Gage R&R), and a high
percentage (94%) is due to differences between parts.
B Although the Xbar and R method does not account for the Operator by Part interaction, this plot
shows you that the interaction is significant. Here, the Xbar and R method grossly overestimates
the capability of the gage. You may want to use the ANOVA method, which accounts for the
Operator by Part interaction.
C Most of the points in the Xbar Chart are outside the control limits when the variation is mainly
due to part-to-part differences.
Source           DF    SS         MS          F          P
Part              9    2.05871    0.228745    39.7178    0.00000
Operator          2    0.04800    0.024000     4.1672    0.03256
Operator*Part    18    0.10367    0.005759     4.4588    0.00016
Repeatability    30    0.03875    0.001292
Total            59    2.24912

Gage R&R
                                  %Contribution
Source              VarComp       (of VarComp)
Total Gage R&R      0.004437          10.67
  Repeatability     0.001292           3.10
  Reproducibility   0.003146           7.56
    Operator        0.000912           2.19
    Operator*Part   0.002234           5.37
Part-To-Part        0.037164          89.33
Total Variation     0.041602         100.00

                    StdDev      Study Var    %Study Var
Source              (SD)        (5.15*SD)    (%SV)
Total Gage R&R      0.066615    0.34306       32.66
  Repeatability     0.035940    0.18509       17.62
  Reproducibility   0.056088    0.28885       27.50
    Operator        0.030200    0.15553       14.81
    Operator*Part   0.047263    0.24340       23.17
Part-To-Part        0.192781    0.99282       94.52
Total Variation     0.203965    1.05042      100.00
A When the p-value for Operator by Part is < 0.25, MINITAB fits the full model. In this case, the
ANOVA method will be more accurate than the Xbar and R method, which does not account for
this interaction.
B The percent contribution from Part-To-Part is larger than that of Total Gage R&R. This tells
you that most of the variation is due to differences between parts; very little is due to
measurement system error.
C According to AIAG, 4 represents an adequate measuring system. (See Session window output
on page 11-9.)
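The link between the ANOVA table and the variance components can be sketched in Python. This is an illustration under stated assumptions, not MINITAB's code: the function name `crossed_variance_components` is hypothetical, the formulas are the usual expected-mean-square estimators for a balanced two-factor random model (the full model with interaction), and negative estimates are set to zero. The inputs are the MS values from the GAGEAIAG ANOVA table (10 parts, 3 operators, 2 replicates).

```python
def crossed_variance_components(ms_part, ms_oper, ms_inter, ms_repeat,
                                n_parts, n_opers, n_reps):
    """Full-model variance components for a balanced crossed study."""
    def nonneg(value):
        # Assumption: a negative estimate is reported as zero.
        return max(value, 0.0)

    repeat = ms_repeat
    inter = nonneg((ms_inter - ms_repeat) / n_reps)
    oper = nonneg((ms_oper - ms_inter) / (n_parts * n_reps))
    part = nonneg((ms_part - ms_inter) / (n_opers * n_reps))
    reprod = oper + inter
    gage = repeat + reprod
    return {
        "Total Gage R&R": gage,
        "Repeatability": repeat,
        "Reproducibility": reprod,
        "Operator": oper,
        "Operator*Part": inter,
        "Part-To-Part": part,
        "Total Variation": gage + part,
    }

vc = crossed_variance_components(0.228745, 0.024000, 0.005759, 0.001292,
                                 n_parts=10, n_opers=3, n_reps=2)
for source, value in vc.items():
    print(f"{source:<16} {value:.6f}")
```

The printed values can be compared with the VarComp column of the output; small differences in the last digit come from rounding in the printed mean squares.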
Source           DF    SS    MS
Part              2
Operator          2
Operator*Part     4
Repeatability    18
Total            26

Source           DF    SS    MS
Part              2
Operator          2
Repeatability    22
Total            26

Gage R&R
                                  %Contribution
Source              VarComp       (of VarComp)
Total Gage R&R      7304.7            84.36
  Repeatability     7304.7            84.36
  Reproducibility      0.0             0.00
    Operator           0.0             0.00
Part-To-Part        1354.5            15.64
Total Variation     8659.2           100.00

                    StdDev      Study Var    %Study Var
Source              (SD)        (5.15*SD)    (%SV)
Total Gage R&R      85.4673     440.157       91.85
  Repeatability     85.4673     440.157       91.85
  Reproducibility    0.0000       0.000        0.00
    Operator         0.0000       0.000        0.00
Part-To-Part        36.8036     189.538       39.55
Total Variation     93.0547     479.232      100.00

Number of Distinct Categories = 1
A When the p-value for Operator by Part is > 0.25, MINITAB fits the model without the
interaction and uses the reduced model to define the Gage R&R statistics.
B The percent contribution from Total Gage R&R is larger than that of Part-To-Part. Thus, most
of the variation arises from the measuring system; very little is due to differences between
parts.
C A 1 tells you the measurement system is poor; it cannot distinguish differences between parts.
(See Session window output on page 11-9.)
A The percent contribution from Part-To-Part is larger than that of Total Gage R&R, telling you
that most of the variation is due to differences between parts; little is due to the measurement
system.
B There are large differences between parts, as shown by the non-level line.
C There are small differences between operators, as shown by the nearly level line.
D Most of the points in the Xbar Chart are outside the control limits, indicating the variation is
mainly due to differences between parts.
E This graph is a visualization of the p-value for Oper*Part (0.00016 in this case), indicating a
significant interaction between Part and Operator.
A The percent contribution from Total Gage R&R is larger than that of Part-to-Part, telling you
that most of the variation is due to the measurement system, primarily repeatability; little is
due to differences between parts.
B There is little difference between parts, as shown by the nearly level line.
C Most of the points in the Xbar chart are inside the control limits, indicating the observed
variation is mainly due to the measurement system.
D There are no differences between operators, as shown by the level line.
E This graph is a visualization of the p-value for Oper*Part (0.48352 in this case), indicating
the differences between each operator/part combination are insignificant compared to the
total amount of variation.
In this example, three operators each measured five different parts twice, for a total of 30
measurements. Each part is unique to operator; no two operators measured the same part.
Because of this you decide to conduct a gage R&R study (nested) to determine how much of your
observed process variation is due to measurement system variation.
1 Open the worksheet GAGENEST.MTW.
2 Choose Stat > Quality Tools > Gage R&R Study (Nested).
3 In Part or batch numbers, enter Part.
4 In Operators, enter Operator.
5 In Measurement data, enter Response.
6 Click OK.
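Source             DF    SS         MS
Operator            2     0.0142
Part (Operator)    12    22.0548
Repeatability      15    19.3403
Total              29    41.4093

Gage R&R
                                  %Contribution
Source              VarComp       (of VarComp)
Total Gage R&R      1.28935           82.46
  Repeatability     1.28935           82.46
  Reproducibility   0.00000            0.00
Part-To-Part        0.27427           17.54
Total Variation     1.56363          100.00

                    StdDev      Study Var    %Study Var
Source              (SD)        (5.15*SD)    (%SV)
Total Gage R&R      1.13550     5.84781       90.81
  Repeatability     1.13550     5.84781       90.81
  Reproducibility   0.00000     0.00000        0.00
Part-To-Part        0.52371     2.69712       41.88
Total Variation     1.25045     6.43982      100.00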
A The percent contribution for differences between parts (Part-To-Part) is much smaller than
the percentage contribution for measurement system variation (Total Gage R&R). This
indicates that most of the variation is due to measurement system error; very little is due to
differences between parts.
B A 1 in number of distinct categories tells you that the measurement system is not able to
distinguish between parts.
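The nested variance components can be reproduced from the sums of squares and degrees of freedom in the ANOVA table. The Python sketch below is an illustration, not MINITAB's code: the function name `nested_variance_components` is hypothetical, the formulas are the usual expected-mean-square estimators for a balanced nested random model, and negative estimates are set to zero. The inputs come from the GAGENEST output (3 operators, 5 parts per operator, 2 measurements per part).

```python
def nested_variance_components(ss_oper, df_oper, ss_part, df_part,
                               ss_repeat, df_repeat, parts_per_oper, n_reps):
    """Variance components for a balanced nested study (Part within Operator)."""
    ms_oper = ss_oper / df_oper
    ms_part = ss_part / df_part
    ms_repeat = ss_repeat / df_repeat
    repeat = ms_repeat
    # Assumption: a negative estimate is reported as zero.
    part = max((ms_part - ms_repeat) / n_reps, 0.0)
    reprod = max((ms_oper - ms_part) / (parts_per_oper * n_reps), 0.0)
    gage = repeat + reprod
    return {
        "Total Gage R&R": gage,
        "Repeatability": repeat,
        "Reproducibility": reprod,
        "Part-To-Part": part,
        "Total Variation": gage + part,
    }

vc = nested_variance_components(0.0142, 2, 22.0548, 12, 19.3403, 15,
                                parts_per_oper=5, n_reps=2)
for source, value in vc.items():
    print(f"{source:<16} {value:.5f}")
```

Here the Operator mean square is smaller than the Part (Operator) mean square, so the Reproducibility estimate is clipped to zero, which agrees with the 0.00000 shown in the output.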
Gage R&R Study (Nested)
A Most of the variation is due to measurement system error (Gage R&R), while a low
percentage of variation is due to differences between parts.
B Most of the points in the Xbar chart are inside the control limits when the variation is mostly due
to measurement system error.
Gage Run Chart
Data
Structure your data so each row contains the part name or number, operator (optional), and the
observed measurement. Parts and operators can be text or numbers.
PartNum   Operator   Measure
1         Daryl      1.48
1         Daryl      1.43
2         Daryl      1.83
2         Daryl      1.83
3         Daryl      1.53
3         Daryl      1.38
1         Beth       1.78
1         Beth       1.33
Options
Gage Run Chart dialog box
enter a location other than the mean for the horizontal reference line
In this example, you draw a gage run chart with two data sets: one in which measurement system
variation contributes little to the overall observed variation (GAGEAIAG.MTW), and one in
which measurement system variation contributes a lot to the overall observed variation
(GAGE2.MTW). For comparison, see the same data sets analyzed by the gage R&R study using
the ANOVA and Xbar and R methods (page 11-10).
For the GAGEAIAG data, ten parts were selected that represent the expected range of the
process variation. Three operators measured the ten parts, two times per part, in a random order.
For the GAGE2 data, three parts were selected that represent the expected range of the process
variation. Three operators measured the three parts, three times per part, in a random order.
1 Open the worksheet GAGEAIAG.MTW.
2 Choose Stat > Quality Tools > Gage Run Chart.
3 In Part numbers, enter C1.
4 In Operators, enter C2.
5 In Measurement data, enter C3. Click OK.
6 Repeat these steps, using the GAGE2.MTW data set.
A For each part, you can compare both the variation between measurements made by each
operator, and differences in measurements between operators.
B You can also look at the measurements in relationship to the horizontal reference line. In this
example, the reference line is the mean of all observations.
Most of the variation is due to differences between parts. Some smaller patterns also appear. For
example, Operator 2's second measurement is consistently (seven times out of ten) smaller than
the first measurement. Operator 2's measurements are consistently (eight times out of ten)
smaller than Operator 1's measurements.
A For each part, you can compare both the variation between measurements made by each
operator, and differences in measurements between operators.
B You can also look at the measurements in relationship to the horizontal reference line. In this
example, the reference line is the mean of all observations.
The dominant factor here is repeatability: large differences in measurements when the same
operator measures the same part. Oscillations might suggest the operators are adjusting how
they measure between measurements.
Data
Structure your data so each row contains a part, master measurement, and the observed
measurement on that part (the response). Parts can be text or numbers.
PartNum   Master   Response
1         2        2.7
1         2        2.5
2         4        5.1
2         4        3.9
2 In Part numbers, enter the column of part names or numbers. In Master Measurements,
enter the column of master measurements. In Measurement data, enter the column of
observed measurements.
3 In Process Variation, enter a value. You can get this value from the Gage R&R Study
ANOVA method: it is the value in the Total row of the 5.15*Sigma column. This is the
number that is usually associated with process variation. If you do not know the value for the
process variation, you can enter the process tolerance instead.
4 If you like, use any of the options described below, then click OK.
Options
Gage Info subdialog box
Method
Both studies are done by selecting parts whose measurements cover the normal range of values
for a particular process, measuring the parts with a master system, then having an operator make
several measurements on each part using a common gage. MINITAB subtracts each
measurement taken by the operator from the master measurement, then calculates, for each
part, an average deviation from the master measurement.
To calculate the linearity of the gage, MINITAB finds the best-fit line relating the average
deviations to the master measurements. Then,
Linearity = slope * process sigma
Generally, the closer the slope is to zero, the better the gage linearity. Linearity can also be
expressed as a percentage of the process variation by multiplying the slope of the line by 100.
To calculate the accuracy of the gage, MINITAB combines the deviations from the master
measurement for all parts. The mean of this combined sample is the gage accuracy. Accuracy
can also be expressed as a percentage of the overall process variation by dividing the mean
deviation by the process sigma, and multiplying by 100.
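The calculation described above can be sketched in Python. This is an illustration under stated assumptions, not MINITAB's implementation: the function name `linearity_and_accuracy` and the sample data are hypothetical, each deviation is taken as measurement minus master (the sign convention is an assumption), and the absolute value of the slope is used so that linearity is reported as a positive number (also an assumption).

```python
from collections import defaultdict

def linearity_and_accuracy(rows, process_variation):
    """rows: (part, master, measurement) tuples; returns a dict of results."""
    by_part = defaultdict(list)
    for part, master, measured in rows:
        # Assumed sign convention: deviation = measurement - master.
        by_part[part].append((master, measured - master))

    # One point per part: (master value, average deviation).
    xs = [devs[0][0] for devs in by_part.values()]
    ys = [sum(d for _, d in devs) / len(devs) for devs in by_part.values()]

    # Ordinary least-squares slope of average deviation on master value.
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx

    # Accuracy: mean of all deviations, pooled over parts.
    all_devs = [d for devs in by_part.values() for _, d in devs]
    accuracy = sum(all_devs) / len(all_devs)

    return {
        "slope": slope,
        "linearity": abs(slope) * process_variation,
        "%linearity": 100 * abs(slope),
        "accuracy": accuracy,
        "%accuracy": 100 * abs(accuracy) / process_variation,
    }

# Hypothetical data: three parts, two measurements each.
rows = [
    (1, 2.0, 2.5), (1, 2.0, 2.3),
    (2, 4.0, 4.3), (2, 4.0, 4.1),
    (3, 6.0, 6.1), (3, 6.0, 5.9),
]
result = linearity_and_accuracy(rows, process_variation=10.0)
for name, value in result.items():
    print(f"{name:<11} {value:.4f}")
```

In this made-up data the gage reads high on small parts and is unbiased on the largest part, so the fitted slope is negative and the linearity is nonzero even though the overall accuracy is fairly small.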
Example of a gage linearity and accuracy study
A plant foreman chose five parts that represented the expected range of the measurements. Each
part was measured by layout inspection to determine its reference value. Then, one operator
randomly measured each part 12 times. A Gage R&R Study using the ANOVA method was done
to get the process variation (the number in the Total row of the 5.15*Sigma column), in this
case 14.1941.
The data set used in this example has been reprinted with permission from the Measurement
Systems Analysis Reference Manual (Chrysler, Ford, General Motors Supplier Quality
Requirements Task Force).
1 Open the worksheet GAGELIN.MTW.
2 Choose Stat > Quality Tools > Gage Linearity Study.
3 In Part numbers, enter C1.
4 In Master measurements, enter C2. In Measurement data, enter C3.
5 In Process Variation, enter 14.1941. Click OK.
A %Linearity is the linearity expressed as a percent of the process variation. For a gage
that measures consistently across parts, %Linearity will be close to zero. Here, the %Linearity is
13.
B The variation due to accuracy for this gage is less than 1% of the overall process variation.
References
[1] Automotive Industry Action Group (AIAG) (1994). Measurement Systems Analysis Reference
Manual. Chrysler, Ford, General Motors Supplier Quality Requirements Task Force.
[2] R.J.A. Little and D.B. Rubin (1987). Statistical Analysis With Missing Data, John Wiley &
Sons, New York.
[3] Douglas C. Montgomery and George C. Runger (1993-4). Gauge Capability and Designed
Experiments. Part I: Basic Methods, Quality Engineering 6(1), pp. 115-135.
[4] Douglas C. Montgomery and George C. Runger (1993-4). Gauge Capability Analysis and
Designed Experiments. Part II: Experimental Design Models and Variance Component
Estimation, Quality Engineering 6(2), pp. 289-305.
[5] S.R. Searle, G. Casella, and C.E. McCulloch (1992). Variance Components, John Wiley &
Sons, New York.
12 Variables Control Charts
Variables Control Charts Overview, 12-2
Defining Tests for Special Causes, 12-5
Box-Cox Transformation for Non-Normal Data, 12-6
Control Charts for Data in Subgroups, 12-9
Xbar Chart, 12-10
R Chart, 12-14
S Chart, 12-17
Xbar and R Chart, 12-19
Xbar and S Chart, 12-22
I-MR-R/S (Between/Within) Chart, 12-24
[Figure: Structure of a control chart. Labels: Quality characteristic, Center line.]
A process is in control when most of the points fall within the bounds of the control limits, and
the points do not display any nonrandom patterns. The tests for special causes offered with
MINITAB's control charts will detect nonrandom patterns in your data. You can change the
threshold values for triggering a test failure; see Defining Tests for Special Causes on page 12-5.
Once a process is in control, control charts can be used to estimate process parameters needed to
determine capability; see also Chapter 14, Process Capability.
Charts are available for data in subgroups, for individual observations, and for subgroup
combinations.

To plot this...            Use...             On page
subgroup means             Xbar               12-10
subgroup ranges            R                  12-14
                           S                  12-17
                           Xbar and R         12-19
                           Xbar and S         12-22
individual observations    Individuals        12-29
moving ranges              Moving Range       12-31
                           I-MR               12-33
                           EWMA               12-36
moving averages            Moving Average     12-40
cumulative sums            CUSUM              12-42
                           Zone               12-47
                           Z-MR               12-53
Data
Structure individual observations down one column.
Structure subgroup data down a column or across rows. Here is the same data set, with
subgroups of size 5, structured both ways. Note that the first 5 observations in the left-side data
set (subgroup 1) are the first row of the right-side data set, the second 5 observations are the
second row, and so on.
C1                        C1      C2      C3      C4      C5
40.13   Subgroup 1        40.13   39.68   40.84   41.52   40.02
39.68                     39.54   39.87   40.25   40.47   41.41
40.84                     39.64   37.36   39.20   38.72   39.39
41.52                     39.15   41.15   39.96   40.48   40.05
40.02
39.54   Subgroup 2
39.87
40.25
40.47
41.41
39.20
38.72
39.39
etc.
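The relationship between the two layouts can be shown with a short Python sketch. It is an illustration only and not a MINITAB command; the helper name `to_subgroups` is hypothetical. The values are the four rows of the right-side data set above.

```python
# Subgroups across rows (one row per subgroup of size 5).
rows = [
    [40.13, 39.68, 40.84, 41.52, 40.02],
    [39.54, 39.87, 40.25, 40.47, 41.41],
    [39.64, 37.36, 39.20, 38.72, 39.39],
    [39.15, 41.15, 39.96, 40.48, 40.05],
]

# The same data stacked down a single column, subgroup after subgroup.
column = [value for row in rows for value in row]

def to_subgroups(values, size):
    """Cut a single column back into consecutive subgroups of equal size."""
    return [values[i:i + size] for i in range(0, len(values), size)]

subgroups = to_subgroups(column, 5)
for number, group in enumerate(subgroups, start=1):
    mean = sum(group) / len(group)
    spread = max(group) - min(group)
    print(f"Subgroup {number}: mean = {mean:.3f}, range = {spread:.2f}")
```

Stacking the rows and cutting the column back into groups of five returns the original rows, which is why either layout can be used for equal-size subgroups.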
When subgroups are of unequal size, you must enter your data in one column, then create a
second column of subscripts which serve as subgroup indicators. In the following example, C1
contains the process data and C2 contains subgroup indicators:
C1      C2
39.68   1    Subgroup 1
40.84   1
41.52   1
39.54   2    Subgroup 2
39.87   2
40.25   2
40.47   2
41.41   2
39.20   3    Subgroup 3
38.72   3
39.39   3
Each time a subscript changes in C2, a new subgroup begins in C1. In this example, subgroup 1
has three observations, subgroup 2 has six observations, and so on.
For information on data for specific charts, see the following sections:
Non-normal data
To properly interpret MINITAB's quality control charts, you must enter data which approximate a
normal distribution. If your data are highly skewed, you may want to use the Box-Cox
transformation to induce normality.
You can access the Box-Cox transformation two ways: by using the Box-Cox transformation
option provided with the control chart commands, or by using the stand-alone Box-Cox
command. The stand-alone command can be used as an exploratory tool to help you determine
the best lambda value for the transformation. Then, you can use the transformation option to
transform the data at the same time you draw the control chart.
For information on the stand-alone Box-Cox transformation command, see Box-Cox
Transformation for Non-Normal Data on page 12-6.
For information on the Box-Cox transformation option, see Use the Box-Cox power
transformation for non-normal data on page 12-67.
Values of K (default in parentheses):
1-6 (3)
7-11 (9)
5-8 (6)
12-14 (14)
3-6 (4)
12-15 (15)
6-10 (8)
Note: R Chart, S Chart, Moving Range Chart, and the Attributes Control Charts (P, NP, C, and
U) only support tests 1 through 4.
2 In one or more of the Argument boxes, enter a value for K. Click OK.
Data
Use this command with subgroup data or individual observations. Structure individual
observations down a single column.
Structure subgroup data in a single column or in rows across several columns; see Data on page
12-3 for examples.
To do a Box-Cox transformation
1 Choose Stat > Control Charts > Box-Cox Transformation.
2 Do one of the following:
  For subgroups or individual observations in one column, enter the data column in Single
  column. In Subgroup size, enter a subgroup size or column of subgroup indicators. For
  individual observations, enter a subgroup size of 1.
  For subgroups in rows, enter a series of columns in Subgroups across rows of.
3 Do one of the following:
  To estimate the best lambda value for the transformation: click OK.
  To estimate the best lambda value for the transformation, transform the data, and store the
  transformed data in the column(s) you specify: in Store transformed data in, enter a
  column(s) in which to store the transformed data, then click OK.
  To transform the data with a lambda value you enter, and store the transformed data in a
  column(s) you specify: in Store transformed data in, enter a column(s) in which to store the
  transformed data. Click Options. In Use lambda, enter a value. Click OK in each dialog box.
Method
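Box-Cox Transformation estimates a lambda value, as shown below, which minimizes the
standard deviation of a standardized transformed variable.
Lambda value     Transformation
λ = 2            $Y' = Y^2$
λ = 0.5          $Y' = \sqrt{Y}$
λ = 0            $Y' = \log_e Y$
λ = -0.5         $Y' = 1/\sqrt{Y}$
λ = -1           $Y' = 1/Y$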
See [18] for more details on this procedure. A Fibonacci search [2] is used to find the smallest
standard deviation (and therefore the best transformation).
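The idea of choosing lambda by minimizing a standard deviation can be sketched in Python as follows. Several things here are assumptions rather than MINITAB's documented procedure: the geometric-mean standardization of the transformed values, the search range of -5 to 5, and the use of a plain grid search in place of the Fibonacci search mentioned above. The function names are hypothetical.

```python
import math

def standardized_sd(y, lam):
    """Standard deviation of the standardized Box-Cox transform of y.

    Assumes the common standardization by the geometric mean g, so that
    standard deviations are comparable across lambda values.
    """
    n = len(y)
    log_y = [math.log(v) for v in y]
    g = math.exp(sum(log_y) / n)              # geometric mean of the data
    if abs(lam) < 1e-12:
        w = [g * lv for lv in log_y]          # limiting case, lambda = 0
    else:
        w = [(v ** lam - 1.0) / (lam * g ** (lam - 1.0)) for v in y]
    mean_w = sum(w) / n
    return math.sqrt(sum((x - mean_w) ** 2 for x in w) / (n - 1))

def best_lambda(y, lo=-5.0, hi=5.0, step=0.01):
    """Grid search (not a Fibonacci search) for the smallest standard deviation."""
    count = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(count + 1)]
    return min(grid, key=lambda lam: standardized_sd(y, lam))

# Hypothetical right-skewed data whose logarithms are evenly spaced.
data = [math.exp(3.0 + d) for d in (-1.0, -0.5, 0.0, 0.5, 1.0)]
estimate = best_lambda(data)
```

All data values must be positive for the logarithm and power transformations to be defined.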
Graphical output
When you ask MINITAB to estimate a lambda value, you get a graph which displays:
In some cases, one of the closely competing values of lambda may end up having a
slightly smaller standard deviation than the best estimate.
The data used in the example are highly right skewed, and consist of 50 subgroups each of size 5.
If you like, you can look at the spread of the data both before and after the transformation using
Graph > Histogram.
Graph window output
Xbar chart: a chart of the subgroup means. See Xbar Chart on page 12-10.
R chart: a chart of the subgroup ranges. See R Chart on page 12-14.
S chart: a chart of the subgroup standard deviations. See S Chart on page 12-17.
Xbar and R chart: an Xbar chart and R chart in one window. See Xbar and R Chart on page 12-19.
Xbar and S chart: an Xbar chart and S chart in one window. See Xbar and S Chart on page 12-22.
I-MR-R/S chart: an individuals chart, moving range chart, and R chart in one window. See
I-MR-R/S (Between/Within) Chart on page 12-24.
The charts in this section (except Xbar chart) require that you have two or more observations in at
least one subgroup. Subgroups do not need to be the same size. MINITAB calculates summary
statistics for each subgroup. These summary statistics are plotted on the charts and used to
estimate process parameters.
With Xbar chart, you can also plot individual observations by entering a subgroup size of 1 in the
dialog box.
An important consideration when constructing control charts for data in subgroups is in choosing
subgroups that are free of special causes. The variation within a subgroup should be
representative of the process variation if all special causes were removed.
Missing data
If a single observation is missing, it is omitted from the calculations of the summary statistics for
the subgroup it was in. All formulas are adjusted accordingly. This may cause the control chart
limits and the center line to have different values for that subgroup.
If an entire subgroup is missing, there is a gap in the chart where the summary statistic for that
subgroup would have been plotted.
Unequal-size subgroups
All of the control chart commands in this section will handle unequal-size subgroups. Since the
control limits are functions of the subgroup size, they are affected by unequal-size subgroups. If
the sample sizes do not vary by very much, you may want to force the control limits to be
constant. See Force control limits and center line to be constant on page 12-67 for details.
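To see why the limits depend on subgroup size, consider the textbook Xbar limits, center ± kσ/√n. The Python sketch below is an illustration only, with hypothetical values for the center line and σ; it is not MINITAB's implementation.

```python
import math

def xbar_limits(center, sigma, sizes, k=3.0):
    """Return (lower, upper) control limits for each subgroup size n."""
    return [(center - k * sigma / math.sqrt(n),
             center + k * sigma / math.sqrt(n))
            for n in sizes]

# Hypothetical process: center line 40.0, sigma 1.0, subgroups of size 5 and 10.
limits = xbar_limits(center=40.0, sigma=1.0, sizes=[5, 10])
```

The larger subgroup gets the narrower limits, which is why a chart with unequal-size subgroups shows stepped control limits unless they are forced to be constant.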
Xbar Chart
An Xbar chart is a control chart of subgroup means. You can use Xbar charts to track the process level
and detect the presence of special causes.
By default, MINITAB's Xbar chart estimates the process variation, σ, using a pooled standard
deviation. You can also base the estimate on the average of the subgroup ranges or standard
deviations, or enter an historical value for σ.
You can also plot individual observations with the Xbar chart. When you plot individual
observations, MINITAB estimates σ with $\overline{MR}/d_2$, the average of the moving range divided by an
unbiasing constant. By default, the moving range is of length 2, since consecutive values have
the greatest chance of being alike. You can also estimate σ using the median of the moving
range, or change the length of the moving range.
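The moving-range estimate of σ for individual observations can be sketched as follows. This is an illustrative Python example with made-up data, not MINITAB output; it uses the standard tabulated constant d2 = 1.128 for ranges of two observations, and the function names are hypothetical.

```python
D2_LENGTH_2 = 1.128  # unbiasing constant d2 for a moving range of length 2

def moving_ranges(x, length=2):
    """Range of each run of `length` consecutive observations."""
    return [max(x[i:i + length]) - min(x[i:i + length])
            for i in range(len(x) - length + 1)]

def sigma_from_moving_range(x):
    """Average moving range of length 2 divided by d2."""
    mr = moving_ranges(x, 2)
    return (sum(mr) / len(mr)) / D2_LENGTH_2

# Hypothetical individual observations.
obs = [10, 12, 11, 13]
mr = moving_ranges(obs)
sigma_hat = sigma_from_moving_range(obs)
```

A different moving range length would need the d2 value for that length, which this sketch does not include.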
For more information, see Variables Control Charts Overview on page 12-2 and Control Charts
for Data in Subgroups on page 12-9.
Data
Use this command with subgroup data or individual observations. Subgroup data can be
structured in a single column, or in rows across several columns. When you have subgroups of
unequal size, structure the subgroups in a single column, then set up a second column of
subgroup identifiers. See Data on page 12-3 for examples.
To make an Xbar chart
1 Choose Stat > Control Charts > Xbar.
2 Do one of the following:
  When subgroups are in one column, enter the data column in Single column. In
  Subgroup size, enter a subgroup size or column of subgroup indicators.
  When subgroups are in rows, enter a series of columns in Subgroups across rows of.
3 If you like, use any of the options listed below, then click OK.
Options
Xbar Chart dialog box
enter historical values for μ (the mean of the population distribution) and σ (the standard
deviation of the population distribution) if you have a goal for μ or σ, or known parameters
from prior data; see Use historical values of μ and σ on page 12-62. If you do not specify a
value for μ or σ, they are estimated from the data.
customize the chart annotation, frame, and region (placement of the chart within the Graph
window); see Customize the data display, annotation, frame, and regions on page 12-73.
do eight tests for special causes; see Do tests for special causes on page 12-63. To adjust the
sensitivity of the tests, see Defining Tests for Special Causes on page 12-5.
omit certain subgroups when estimating μ and σ; see Omit subgroups from the estimate of
μ or σ on page 12-65.
estimate σ various ways; see Control how σ is estimated on page 12-66.
  with subgroup size > 1: base the estimate on the average of the subgroup ranges or standard
  deviations. The default estimate uses a pooled standard deviation.
  with subgroup size = 1: base the estimate on the median of the moving range, or change the
  length of the moving range. The default method uses $\overline{MR}/d_2$, the average of the moving
  range divided by an unbiasing constant. By default, the moving range is of length 2, since
  consecutive values have the greatest chance of being alike.
force the control limits and center line to be constant when subgroups are of unequal size;
see Force control limits and center line to be constant on page 12-67.
choose the positions at which to draw the upper and lower control (sigma) limits in relation to
the center line; see Customize the control (sigma) limits on page 12-69. The default line is 3σ
above and below the center line. You can draw more than one set of lines. For example, you
can draw specification limits along with control limits on the chart.
place bounds on the upper and lower control limits; see Customize the control (sigma) limits
on page 12-69.
choose the line type, color, and size for the control limits; see Customize the control (sigma)
limits on page 12-69. The default line is solid red.
add another row of tick labels below the default tick labels; see Add additional rows of tick
labels on page 12-70. For example, you can place time stamp labels (or other descriptive
labels) on your graph.
choose the text font, color, and size for the axis and tick labels. The default labels are black
Arial.
use the Box-Cox transformation when you have very skewed data; see Use the Box-Cox power
transformation for non-normal data on page 12-67.
choose the symbol type, color, and size. The default symbol is a black cross.
choose the connection line type, color, and size. The default line is solid black.
estimate control limits and center line independently for different groups (draws a historical
chart); see page 12-60.
Suppose you work at a car assembly plant in a department that assembles engines. In an
operating engine, parts of the crankshaft move up and down a certain distance from an ideal
baseline position. AtoBDist is the distance (in mm) from the actual (A) position of a point on the
crankshaft to the baseline (B) position.
To ensure production quality, you took five measurements each working day, from September 28
through October 15, and then ten per day from the 18th through the 25th. You want to draw an
Xbar chart to track the process level through that time period and to test for the presence of special
causes.
1 Open the worksheet CRANKSH.MTW.
2 Choose Stat > Control Charts > Xbar.
3 In Single column, enter AtoBDist. In Subgroup size, enter 5.
4 Click Tests. Check Perform all eight tests. Click OK.
5 Click S Limits. In Sigma limit positions, enter 1 2 3. Click OK in each dialog box.
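The effect of entering 1 2 3 as sigma limit positions can be pictured with a short Python sketch. It is an illustration under stated assumptions, not MINITAB's algorithm: lines are placed at center ± kσ/√n, and the simplest special-cause check (a point more than K = 3 standard deviations from the center line) is applied. The function names and the numbers are hypothetical.

```python
import math

def sigma_lines(center, sigma, n, positions=(1, 2, 3)):
    """Lower and upper lines at each requested multiple of sigma/sqrt(n)."""
    se = sigma / math.sqrt(n)
    return {k: (center - k * se, center + k * se) for k in positions}

def beyond_k_sigma(means, center, sigma, n, k=3):
    """Indices of subgroup means more than k standard errors from center."""
    se = sigma / math.sqrt(n)
    return [i for i, m in enumerate(means) if abs(m - center) > k * se]

# Hypothetical values: center 0.0, sigma 5.0, subgroups of size 25.
lines = sigma_lines(0.0, 5.0, 25)
flagged = beyond_k_sigma([0.5, 3.2, -3.5, 2.9], 0.0, 5.0, 25)
```

The remaining tests look at runs and patterns of points relative to these same lines, which is why drawing the 1σ and 2σ lines helps when reading the test results.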
Session window output
Graph window output
R Chart
An R chart is a control chart of subgroup ranges. You can use R charts to track process variation
and detect the presence of special causes. R charts are typically used to track process variation for
samples of size 5 or less, while S charts (page 12-17) are used for larger samples.
By default, MINITAB's R Chart command bases the estimate of the process variation, σ, on the
average of the subgroup ranges. You can also use a pooled standard deviation, or enter an
historical value for σ.
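A minimal Python sketch of the average-range approach follows, assuming equal subgroups of size 5 and the standard tabulated constants d2 = 2.326 and d3 = 0.864 for that size. It is not MINITAB's implementation; the function name is hypothetical, and the data are the first two subgroups shown on page 12-3.

```python
D2_N5 = 2.326  # expected range / sigma for subgroups of size 5
D3_N5 = 0.864  # standard deviation of the range / sigma for size 5

def r_chart_limits(subgroups, k=3.0, d2=D2_N5, d3=D3_N5):
    """Estimate sigma from the average range and derive R chart limits."""
    ranges = [max(g) - min(g) for g in subgroups]
    r_bar = sum(ranges) / len(ranges)
    sigma = r_bar / d2                       # estimate of process variation
    spread = k * d3 * sigma
    return {"ranges": ranges, "sigma": sigma, "center": r_bar,
            "lcl": max(0.0, r_bar - spread),  # a range cannot be negative
            "ucl": r_bar + spread}

subgroups = [[40.13, 39.68, 40.84, 41.52, 40.02],
             [39.54, 39.87, 40.25, 40.47, 41.41]]
chart = r_chart_limits(subgroups)
```

Two subgroups are far too few for a real chart; they are used here only to keep the arithmetic easy to follow.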
For more information, see Variables Control Charts Overview on page 12-2 and Control Charts for
Data in Subgroups on page 12-9.
Data
Subgroup data can be structured in a single column, or in rows across several columns. When
you have subgroups of unequal size, structure the subgroups in a single column, then set up a
second column of subgroup identifiers. Subgroup size must be less than or equal to 100. See
Data on page 12-3 for examples.
To make an R chart
1 Choose Stat > Control Charts > R.
2 Do one of the following:
When subgroups are in one column, enter the data column in Single column. In
Subgroup size, enter a subgroup size or column of subgroup indicators.
When subgroups are in rows, enter a series of columns in Subgroups across rows of.
3 If you like, use any of the options listed below, then click OK.
Options
R Chart dialog box
enter an historical value for σ (the standard deviation of the population distribution) if you
have a goal for σ, or a known σ from past data. If you do not specify a value for σ, it is
estimated from the data.
customize the chart annotation, frame, and region (placement of the chart within the Graph
window); see page 12-73.
do four tests for special causes; see page 12-63. To adjust the sensitivity of the tests, see
Defining Tests for Special Causes on page 12-5.
choose the positions at which to draw the upper and lower control (sigma) limits in relation to
the center line; see page 12-69. The default line is 3σ above and below the center line. You
can draw more than one set of lines. For example, you can draw specification limits along
with control limits on the chart.
place bounds on the upper and lower control limits; see page 12-69.
choose the line type, color, and size for the control limits; see page 12-69. The default line is
solid red.
add another row of tick labels below the default tick labels; see page 12-70. For example, you
can place time stamp labels (or other descriptive labels) on your graph.
choose the text font, color, and size for the axis and tick labels; see page 12-70. The default
labels are black Arial.
use the Box-Cox transformation when you have very skewed data; see page 12-67.
choose the connection line type, color, and size. The default line is solid black.
choose the symbol type, color, and size. The default symbol is a black cross.
estimate control limits and center line independently for different groups (draws a historical
chart); see page 12-60.
Example of an R chart
Suppose you work at a car assembly plant in a department that assembles engines. In an operating
engine, parts of the crankshaft move up and down a certain distance from an ideal baseline
position. AtoBDist is the distance (in mm) from the actual (A) position of a point on the
crankshaft to the baseline (B) position.
To ensure production quality, you took five measurements each working day, from September 28
through October 15, and then ten per day from the 18th through the 25th. You have already
made an Xbar chart with the data to track the process level and test for special causes. Now you want
to draw an R chart to track the process variation using the same data.
1 Open the worksheet CRANKSH.MTW.
2 Choose Stat > Control Charts > R.
3 In Single column, enter AtoBDist. In Subgroup size, enter 5.
4 Click Tests. Check Perform all four tests. Click OK in each dialog box.
Graph window output
S Chart
An S Chart is a control chart of subgroup standard deviations. You can use S charts to track the
process variation and detect the presence of special causes. S charts are typically used to track
process variation for samples larger than size 5, while R charts (page 12-14) are used for smaller
samples.
By default, MINITAB's S Chart command bases the estimate of the process variation, σ, on the
average of the subgroup standard deviations. You can also use a pooled standard deviation, or
enter an historical value for σ.
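The average-standard-deviation approach can be sketched in Python as below. The sketch assumes equal-size subgroups and the standard unbiasing constant c4(n); it illustrates the textbook calculation rather than MINITAB's exact procedure, and the function names and data are hypothetical.

```python
import math
import statistics

def c4(n):
    """Unbiasing constant: expected sample standard deviation / sigma."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2.0) / math.gamma((n - 1) / 2.0)

def s_chart(subgroups, k=3.0):
    """Estimate sigma from the average subgroup s and derive S chart limits."""
    n = len(subgroups[0])                    # assumes equal subgroup sizes
    s_bar = sum(statistics.stdev(g) for g in subgroups) / len(subgroups)
    sigma = s_bar / c4(n)
    spread = k * sigma * math.sqrt(1.0 - c4(n) ** 2)
    return {"sigma": sigma, "center": s_bar,
            "lcl": max(0.0, s_bar - spread), "ucl": s_bar + spread}

# Hypothetical subgroups of size 5.
chart = s_chart([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]])
```

For small subgroups the lower limit is truncated at zero, because a standard deviation cannot be negative.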
For more information, see Variables Control Charts Overview on page 12-2 and Control Charts
for Data in Subgroups on page 12-9.
Data
Subgroup data can be structured in a single column, or in rows across several columns. When
you have subgroups of unequal size, structure the subgroups in a single column, then set up a
second column of subgroup identifiers. See Data on page 12-3 for examples.
To make an S chart
1 Choose Stat > Control Charts > S.
2 Do one of the following:
When subgroups are in one column, enter the data column in Single column. In
Subgroup size, enter a subgroup size or column of subgroup indicators.
When subgroups are in rows, enter a series of columns in Subgroups across rows of.
3 If you like, use any of the options listed below, then click OK.
Options
S Chart dialog box
enter an historical value for σ (the standard deviation of the population distribution) if you
have a goal for σ, or a known σ from prior data; see page 12-62. If you do not specify a value
for σ, it is estimated from the data.
customize the chart annotation, frame, and region (placement of the chart within the Graph
window); see page 12-73.
do four tests for special causes; see page 12-63. To adjust the sensitivity of the tests, see
Defining Tests for Special Causes on page 12-5.
choose the positions at which to draw the upper and lower control (sigma) limits in relation to
the center line; see page 12-69. The default line is 3σ above and below the center line. You
can draw more than one set of lines. For example, you can draw specification limits along with
control limits on the chart.
place bounds on the upper and lower control limits; see page 12-69.
choose the line type, color, and size for the control limits; see page 12-69. The default line is
solid red.
add another row of tick labels below the default tick labels; see page 12-70. For example, you
can place time stamp labels (or other descriptive labels) on your graph.
choose the text font, color, and size for the axis and tick labels; see page 12-70. The default
labels are black Arial.
use the Box-Cox transformation when you have very skewed data; see page 12-67.
choose the connection line type, color, and size. The default line is solid black.
choose the symbol type, color, and size. The default symbol is a black cross.
estimate control limits and center line independently for different groups (draws a historical
chart); see page 12-60.
Data
Subgroup data can be structured in a single column, or in rows across several columns. When
you have subgroups of unequal size, structure the subgroups in a single column, then set up a
second column of subgroup identifiers. See Data on page 12-3 for examples.
To use an X and R chart your subgroup size must be less than or equal to 100. If your subgroup
size is greater than 100, use an X and S chart.
Xbar and R Chart
When subgroups are in one column, enter the data column in Single column. In
Subgroup size, enter a subgroup size or column of subgroup indicators.
When subgroups are in rows, enter a series of columns in Subgroups across rows of.
3 If you like, use any of the options listed below, then click OK.
Options
Xbar/R Chart dialog box
enter historical values for μ (the mean of the population distribution) and σ (the standard
deviation of the population distribution) if you have a goal for μ or σ, or known parameters
from prior data; see page 12-62. If you do not specify a value for μ or σ, they are estimated
from the data.
do eight tests for special causes; see page 12-63. To adjust the sensitivity of the tests, see
Defining Tests for Special Causes on page 12-5.
place an additional row of tick labels, such as dates or shifts, below the subgroup numbers on
the x-axis; see page 12-70.
use the Box-Cox transformation when you have very skewed data; see page 12-67.
choose the positions at which to draw the upper and lower control (sigma) limits in relation to
the center line; see page 12-69. The default line is 3σ above and below the center line. You
can draw more than one set of lines. For example, you can draw specification limits along
with control limits on the chart.
estimate control limits and center line independently for different groups (draws a historical
chart); see page 12-60.
Suppose you work at an automobile manufacturer in a department that assembles engines. One
of the parts, a camshaft, must be 600 mm ± 2 mm long to meet engineering specifications. There
has been a chronic problem with camshaft length being out of specification, a problem which
has caused poor-fitting assemblies down the production line and high scrap and rework rates.
Your supervisor wants to run Xbar and R charts to monitor this characteristic, so for a month, you
collect a total of 100 observations (20 samples of 5 camshafts each) from all the camshafts used at
the plant, and 100 observations from each of your suppliers. First you will look at camshafts
produced by Supplier 2.
1 Open the worksheet CAMSHAFT.MTW.
2 Choose Stat > Control Charts > Xbar-R.
3 In Single column, enter Supp2. In Subgroup size, enter 5. Click OK.
Graph window output
Xbar and S Chart
Data
Subgroup data can be structured in a single column, or in rows across several columns. When
you have subgroups of unequal size, structure the subgroups in a single column, then set up a
second column of subgroup identifiers. See Data on page 12-3 for examples.
When subgroups are in one column, enter the data column in Single column. In
Subgroup size, enter a subgroup size or column of subgroup indicators.
When subgroups are in rows, enter a series of columns in Subgroups across rows of.
3 If you like, use any of the options listed below, then click OK.
Options
Xbar/S Chart dialog box
enter historical values for μ (the mean of the population distribution) and σ (the standard
deviation of the population distribution) if you have goals for μ or σ, or known parameters
from prior data; see page 12-62. If you do not specify a value for μ or σ, they are estimated
from the data.
do eight tests for special causes; see page 12-63. To adjust the sensitivity of the tests, see
Defining Tests for Special Causes on page 12-5.
place an additional row of tick labels, such as dates or shifts, below the subgroup numbers on
the x-axis; see page 12-70.
use the Box-Cox transformation when you have very skewed data; see page 12-67.
choose the positions at which to draw the upper and lower control (sigma) limits in relation to
the center line; see page 12-69. The default line is 3σ above and below the center line. You
can draw more than one set of lines. For example, you can draw specification limits along with
control limits on the chart.
estimate control limits and center line independently for different groups (draws a historical
chart); see page 12-60.
When collecting data in subgroups, random error may not be the only source of variation. For
example, if you sample five parts in close succession every hour, the only differences should be
due to random error. Over time, the process could shift or drift, so the next sample of five parts
may be different from the previous sample. Under these conditions, the overall process variation
is due to both between-sample variation and random error.
Variation within each sample also contributes to overall process variation. Suppose you sample
one part every hour, and measure five locations across the part. While the parts can vary hour to
hour, the measurements taken at the five locations can also be consistently different in all parts.
Perhaps one location almost always produces the largest measurement, or is consistently smaller.
This variation due to location is not accounted for, and the within-sample standard deviation no
longer estimates random error, but actually estimates both random error and the location effect.
This results in a standard deviation that is too large, causing control limits that are too wide, with
most points on the control chart placed very close to the centerline. This process appears to be too
good, and it probably is.
You can solve this problem by using I-MR-R/S (Between/Within) to create three separate
evaluations of process variation:
Individuals chart: charts the means from each sample on an individuals control chart, rather
than on an Xbar chart. This chart uses a moving range between consecutive means to determine
the control limits. Since the distribution of the sample means is related to the random error,
using a moving range to estimate the standard deviation of the distribution of sample means is
similar to estimating just the random error component. This eliminates the within-sample
component of variation in the control limits.
Moving range chart: charts the subgroup means using a moving range to remove the
within-sample variation. Use this chart, along with the Individuals chart, to track both process
location and process variation, using the between-sample component of variation.
R chart or S chart: charts process variation using the within-sample component of variation.
Whether MINITAB displays an R chart or an S chart depends on the chosen estimation method
and the size of the subgroup. If you base estimates on the average of subgroup ranges, then an R
chart will be displayed. If you base estimates on the average of subgroup standard deviations,
then an S chart will be displayed. If you base estimates on the pooled standard deviation and
your subgroup size is less than ten, an R chart will be displayed. If you base estimates on the
pooled standard deviation and your subgroup size is ten or greater, an S chart will be displayed.
Thus, the combination of the three charts provides a method of assessing the stability of process
location, the between-sample component of variation, and the within-sample component of
variation.
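The Individuals chart described above, which treats each subgroup mean as a single observation, can be sketched in Python as follows. This is an illustration only: the center line is assumed to be the mean of the subgroup means, the moving range has length 2 with d2 = 1.128, and the function names and data are hypothetical rather than taken from MINITAB.

```python
def subgroup_means(subgroups):
    """Mean of each subgroup."""
    return [sum(g) / len(g) for g in subgroups]

def individuals_limits_from_means(subgroups, k=3.0, d2=1.128):
    """Control limits for subgroup means based on their moving ranges."""
    means = subgroup_means(subgroups)
    # Moving ranges of length 2 between consecutive subgroup means.
    mrs = [abs(b - a) for a, b in zip(means, means[1:])]
    sigma_means = (sum(mrs) / len(mrs)) / d2
    center = sum(means) / len(means)
    return center, center - k * sigma_means, center + k * sigma_means, mrs

# Hypothetical data: three subgroups of two measurements each.
center, lcl, ucl, mrs = individuals_limits_from_means([[1, 3], [4, 6], [2, 4]])
```

Because the limits come from the variation between consecutive means, a consistent within-sample effect such as a location difference does not widen them.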
Data
Subgroup data can be structured in a single column, or in rows across several columns. When
you have subgroups of unequal size, structure the subgroups in a single column, then set up a
second column of subgroup indicators. See Data on page 12-3 for examples.
To make an I-MR-R/S (Between/Within) Chart
1 Choose Stat > Control Charts > I-MR-R/S (Between/Within).
2 Do one of the following:
When subgroups are in one column, enter the data column in Single column. In
Subgroup size, enter a subgroup size or column of subgroup indicators.
When subgroups are in rows, enter a series of columns in Subgroups across rows of.
3 If you like, use any of the dialog box options, then click OK.
Options
I-MR-R/S (Between/Within) Chart dialog box
enter a historical value for μ (the mean of the population distribution) if you have a goal for μ,
or known parameters from prior data; see Use historical values of μ and σ on page 12-62. If you
do not specify a value for μ, it is estimated from the data.
do eight tests for special causes; see Do tests for special causes on page 12-63. To adjust the
sensitivity of the tests, see Defining Tests for Special Causes on page 12-5.
omit certain subgroups when estimating μ and σ; see Omit subgroups from the estimate of
μ or σ on page 12-65.
estimate σ various ways; see Control how σ is estimated on page 12-66.
  for I and MR chart only: base the estimate on the median of the moving range or on the
  square root of the mean of squared successive differences, or change the length of the
  moving range. The default method uses $\overline{MR}/d_2$, the average of the moving range divided by
  an unbiasing constant. By default, the moving range is of length 2, since consecutive values
  have the greatest chance of being alike.
  for R chart or S chart only: base the estimate of σ on the average of subgroup standard
  deviations (displays S chart) or on a pooled standard deviation. With estimates based on
  pooled standard deviation, an R chart is displayed if subgroup size is less than ten, while an
  S chart is displayed if subgroup size is ten or greater. By default, the estimate is based on the
  average of the subgroup ranges (displays R chart).
place an additional row of tick labels, such as dates or shifts, below the subgroup numbers on
the x-axis; see Add additional rows of tick labels on page 12-70.
use the Box-Cox transformation when you have very skewed data; see Use the Box-Cox power
transformation for non-normal data on page 12-67.
choose the positions at which to draw the upper and lower control (sigma) limits in relation to
the center line; see Customize the control (sigma) limits on page 12-69. The default line is
3σ above and below the center line. You can draw more than one set of lines. For example,
you can draw specification limits along with control limits on the chart.
estimate control limits and center line independently for different groups (draws a historical
chart); see Estimate control limits and center line independently for different groups on page
12-60.
Suppose you are interested in determining whether or not a process that coats rolls of paper with
a thin film is in control [27]. You are concerned that the paper is being coated with the correct
thickness of film and that the coating is evenly distributed across the length of the roll. You take
3 samples from 15 consecutive rolls and measure coating weight.
Because you are interested in whether or not the coating is even throughout a roll and whether
each roll is correctly coated, you use MINITAB to create an I-MR-R/S chart.
1 Open the worksheet COATING.MTW.
2 Choose Stat > Control Charts > I-MR-R/S (Between/Within).
3 In Single column, enter Coating. In Subgroup size, enter Roll. Click OK.
Session window output
0.0000
13.5854
13.5854
Graph window output
Individuals chart: a chart of the individual observations. See Individuals Chart on page
12-29.
Moving Range chart: a chart of the moving ranges. See Moving Range Chart on page 12-31.
I-MR chart: an Individuals and Moving Range chart on one screen. See I-MR Chart on page
12-33.
Other charts that work with individual observations are Xbar, EWMA, Moving Average, CUSUM,
and Zone chart.
You must have all of the process data in a single column when using these commands.
Missing data
If an observation is missing, there is a gap in the Individuals chart where that observation would
have been plotted. When calculating moving ranges, each value is the range of K consecutive
observations, where K is the length of the moving ranges. If any of the observations for a
particular moving range are missing, it is not calculated. Hence, there is a gap in the Moving
Range chart corresponding to each of the moving ranges that includes the missing observation.
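The gap rule above can be illustrated with a short Python sketch. This is not MINITAB code: the function name moving_ranges and the use of None for a missing observation are assumptions made only for illustration.

```python
from typing import List, Optional


def moving_ranges(values: List[Optional[float]], k: int = 2) -> List[Optional[float]]:
    """Return one entry per observation; None marks a gap on the chart.

    Each moving range is the range (max - min) of k consecutive
    observations. If any of those observations is missing (None),
    the moving range is not calculated.
    """
    result: List[Optional[float]] = []
    for i in range(len(values)):
        if i < k - 1:
            result.append(None)  # fewer than k observations so far
            continue
        window = values[i - k + 1 : i + 1]
        if any(v is None for v in window):
            result.append(None)  # a missing observation leaves a gap
        else:
            result.append(max(window) - min(window))
    return result


# Hypothetical data with one missing observation in position 3
print(moving_ranges([10.0, 12.0, None, 11.0, 15.0]))
```

With a moving range of length 2, one missing observation removes two moving ranges: the one ending at the missing value and the one starting at it.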
Individuals Chart
An individuals chart is a control chart of individual observations. You can use individuals charts
to track the process level and detect the presence of special causes when your sample size is 1.
By default, Individuals chart estimates the process variation, σ, with MR̄ / d2, the average of the
moving range divided by an unbiasing constant. Moving ranges are artificial subgroups created
from the individual measurements. By default, the moving range is of length 2, since
consecutive values have the greatest chance of being alike. You can also estimate σ using the
median of the moving range, change the length of the moving range, or enter historical values of
σ.
For more information, see Variables Control Charts Overview on page 12-2 and Control Charts
for Individual Observations on page 12-28.
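As a rough illustration of the default estimate, the following Python sketch computes MR̄ / d2 for moving ranges of length 2 and places limits 3σ above and below the center line. It is a minimal sketch, not MINITAB's implementation: the function name, the sample data, and the use of the data mean as the center line are assumptions, and d2 = 1.128 is the standard unbiasing constant for ranges of two observations.

```python
def individuals_limits(values, d2=1.128, n_sigma=3.0):
    """Return (LCL, center line, UCL, sigma estimate) for an individuals chart.

    Assumes moving ranges of length 2, for which d2 = 1.128.
    """
    # Moving ranges of length 2: absolute difference of consecutive values
    ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(ranges) / len(ranges)
    sigma_hat = mr_bar / d2
    center = sum(values) / len(values)  # assumed: mean estimated from the data
    return center - n_sigma * sigma_hat, center, center + n_sigma * sigma_hat, sigma_hat


# Hypothetical observations
lcl, center, ucl, sigma_hat = individuals_limits([10, 12, 11, 13, 12])
print(f"LCL={lcl:.3f}  center={center:.3f}  UCL={ucl:.3f}  sigma={sigma_hat:.3f}")
```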
Data
Structure individual observations in one column.
To make an individuals chart
1 Choose Stat > Control Charts > Individuals.
Options
Individuals Chart dialog box
enter historical values for μ (the mean of the population distribution) and σ (the standard
deviation of the population distribution) if you have a goal for μ or σ, or known parameters
from prior data; see page 12-62. If you do not specify a value for μ or σ, they are estimated
from the data.
customize the chart annotation, frame, and region (placement of the chart within the Graph
window); see page 12-73.
do eight tests for special causes; see page 12-63. To adjust the sensitivity of the tests, see
Defining Tests for Special Causes on page 12-5.
choose the positions at which to draw the upper and lower control (sigma) limits in relation to
the center line; see page 12-69. The default line is 3σ above and below the center line. You
can draw more than one set of lines. For example, you can draw specification limits along with
control limits on the chart.
place bounds on the upper and lower control limits; see page 12-69.
choose the line type, color, and size for the control limits; see page 12-69. The default line is
solid red.
add another row of tick labels below the default tick labels; see page 12-70. For example, you
can place time stamp labels (or other descriptive labels) on your graph.
choose the text font, color, and size for the axis and tick labels. The default labels are black
Arial.
use the Box-Cox transformation when you have very skewed data; see page 12-67.
choose the connection line type, color, and size. The default line is solid black.
choose the symbol type, color, and size. The default symbol is a black cross.
estimate control limits and center line independently for different groups (draws a historical
chart); see page 12-60.
In the following example, Weight contains the weight in pounds of each batch of raw material.
1 Open the worksheet EXH_QC.MTW.
2 Choose Stat > Control Charts > Individuals.
3 In Variable, enter Weight.
4 Click Tests. Check the first four tests. Click OK in each dialog box.
Session window output
TEST 1. One point more than 3.00 sigmas from center line.
Test Failed at points: 14 23 30 31 44 45
TEST 2. 9 points in a row on same side of center line.
Test Failed at points: 9 10 11 12 13 14 15 16 17 18 19 20 21 33 34 35 36
Graph window output: [Individuals chart of Weight not reproduced]
Moving Range Chart
consecutive values have the greatest chance of being alike. You can also estimate σ using the
median of the moving range, change the length of the moving range, or enter an historical value
for σ.
For more information, see Variables Control Charts Overview on page 12-2 and Control Charts for
Individual Observations on page 12-28.
Data
Structure individual observations in one column.
To make a moving range chart
1 Choose Stat > Control Charts > Moving Range.
Options
Moving Range Chart dialog box
enter an historical value for σ (the standard deviation of the population distribution) if you
have a goal for σ, or a known σ from prior data; see page 12-62. If you do not specify a value
for σ, it is estimated from the data.
customize the chart annotation, frame, and region (placement of the chart within the Graph
window); see page 12-73.
do four tests for special causes; see page 12-63. To adjust the sensitivity of the tests, see
Defining Tests for Special Causes on page 12-5.
choose the positions at which to draw the upper and lower control (sigma) limits in relation to
the center line; see page 12-69. The default line is 3σ above and below the center line. You
can draw more than one set of lines. For example, you can draw specification limits along
with control limits on the chart.
place bounds on the upper and lower control limits; see page 12-69.
choose the line type, color, and size for the control limits; see page 12-69. The default line is
solid red.
add another row of tick labels below the default tick labels; see page 12-70. For example, you
can place time stamp labels (or other descriptive labels) on your graph.
choose the text font, color, and size for the axis and tick labels; see page 12-70. The default
labels are black Arial.
use the Box-Cox transformation when you have very skewed data; see page 12-67.
choose the connection line type, color, and size. The default line is solid black.
choose the symbol type, color, and size. The default symbol is a black cross.
estimate control limits and center line independently for different groups (draws a historical
chart); see page 12-60.
I-MR Chart
An I-MR chart is an Individuals chart and Moving Range chart in the same graph window. The
Individuals chart is drawn in the upper half of the screen; the Moving Range chart in the lower
half. Seeing both charts together allows you to track both the process level and process variation
at the same time, as well as detect the presence of special causes. See [25] for a discussion of how
to interpret joint patterns in the two charts.
By default, I-MR Chart estimates the process variation, σ, with MR̄ / d2, the average of the
moving range divided by an unbiasing constant. The moving range is of length 2, since
consecutive values have the greatest chance of being alike. You can also estimate σ using the
median of the moving range, change the length of the moving range, or enter an historical value
for σ.
For more information, see Variables Control Charts Overview on page 12-2 and Control Charts for
Individual Observations on page 12-28.
Data
Structure individual observations in one column.
To make an I-MR chart
1 Choose Stat > Control Charts > I-MR.
Options
I/MR Chart dialog box
enter historical values for μ (the mean of the population distribution) and σ (the standard
deviation of the population distribution) if you have a goal for μ or σ, or known parameters
from prior data; see page 12-62. If you do not specify a value for μ or σ, they are estimated
from the data.
do eight tests for special causes; see page 12-63. To adjust the sensitivity of the tests, see
Defining Tests for Special Causes on page 12-5.
base the estimate of σ on the median of the moving range, or change the length of the
moving range; see page 12-66. The default estimate of σ is based on the average of the
moving range of length 2.
place an additional row of tick labels, such as dates or shifts, below the subgroup numbers on
the x-axis; see page 12-70.
use the Box-Cox transformation when you have very skewed data; see page 12-67.
choose the positions at which to draw the upper and lower control (sigma) limits in relation to
the center line; see page 12-69. The default line is 3σ above and below the center line. You
can draw more than one set of lines. For example, you can draw specification limits along
with control limits on the chart.
estimate control limits and center line independently for different groups (draws a historical
chart); see page 12-60.
EWMA chart: a chart of exponentially weighted moving averages. See page 12-36.
Moving Average chart: a chart of unweighted moving averages. See page 12-40.
CUSUM chart: a chart of the cumulative sum of the deviations from a nominal specification.
See page 12-42.
Zone chart: a chart that assigns a weight to each point depending on its distance from the
center line, and plots the cumulative scores. See page 12-47.
EWMA, Moving Average, CUSUM, and Zone Chart produce control charts for either data in
subgroups or individual observations. They are typically used to evaluate the process level.
However, both EWMA and CUSUM Chart may also be used to plot control charts for subgroup
ranges or standard deviations to evaluate process variation. See [8] and [21] for a discussion.
EWMA, Moving Average, and Zone Chart work with equal or unequal-size subgroups, but
CUSUM Chart requires all subgroups to be the same size. Zone Chart generates a standardized
zone chart when subgroup sizes are unequal.
Missing data
If a single observation is missing and you have data in subgroups, it is omitted from the
calculations of the summary statistics for the subgroup it was in. All formulas are adjusted
accordingly. This may cause the EWMA and Moving Average Chart to produce control chart
limits that are not straight lines.
Suppose an entire subgroup is missing:
EWMA Chart plots an exponentially weighted moving average of all past subgroup means.
Hence, once it finds a missing subgroup, it cannot calculate any more values. The chart will
be blank starting with the missing subgroup.
CUSUM Chart plots a cumulative sum of deviations. Like EWMA Chart, when CUSUM
Chart encounters a missing subgroup, the chart will be blank beginning with the missing
subgroup.
Zone Chart leaves a gap in the chart corresponding to the missing subgroup.
EWMA Chart
An EWMA chart is a chart of exponentially weighted moving averages. Each EWMA point
incorporates information from all of the previous subgroups or observations. An EWMA chart
can be custom tailored to detect any size shift in the process. Because of this, EWMA charts are
often used to monitor in-control processes for detecting small shifts away from the target.
The plot points can be based on either subgroup means or individual observations. When you
have data in subgroups, the mean of all the observations in each subgroup is calculated.
Exponentially weighted moving averages are then formed from these means. By default, the
process standard deviation, σ, is estimated using a pooled standard deviation. You can also base
the estimate on the average of subgroup ranges or subgroup standard deviations, or enter an
historical value for σ.
When you have individual observations, exponentially weighted moving averages are formed
from the individual observations. By default, σ is estimated with MR̄ / d2, the average of the
moving range divided by an unbiasing constant. Moving ranges are artificial subgroups created
from the individual measurements. The moving range is of length 2, since consecutive values
have the greatest chance of being alike. You can also estimate σ using the median of the moving
range, change the length of the moving range, or enter an historical value for σ.
For more information, see Variables Control Charts Overview on page 12-2 and Control Charts
Using Subgroup Combinations on page 12-35.
Data
You can use this command with subgroup data or individual observations. Subgroup data can be
structured in a single column, or in rows across several columns. When you have subgroups of
unequal size, structure the subgroups in a single column, then set up a second column of
subgroup identifiers. See Data on page 12-3 for examples.
Individual observations should be structured in a single column.
To make an EWMA chart
1 Choose Stat > Control Charts > EWMA.
2 Do one of the following:
When subgroups or individual observations are in one column, enter the data column in
Single column. In Subgroup size, enter a subgroup size or column of subgroup
indicators. For individual observations, enter a subgroup size of 1.
When subgroups are in rows, enter a series of columns in Subgroups across rows of.
3 If you like, use any of the options listed below, then click OK.
Options
EWMA Chart dialog box
specify the weight used in the exponentially weighted moving average; see page 12-38.
Choose a value between 0 and 1. The default weight is 0.2.
enter historical values for μ (the mean of the population distribution) and σ (the standard
deviation of the population distribution) if you have goals for μ or σ, or known parameters
from prior data; see page 12-62. If you do not specify a value for μ or σ, they are estimated
from the data.
customize the chart annotation, frame, and region (placement of the chart within the Graph
window); see page 12-73.
choose the positions at which to draw the upper and lower control (sigma) limits in relation to
the center line; see page 12-69. The default line is 3σ above and below the center line. You
can draw more than one set of lines. For example, you can draw specification limits along with
control limits on the chart.
place bounds on the upper and lower control limits; see page 12-69.
choose the line type, color, and size for the control limits; see page 12-69. The default line is
solid red.
add another row of tick labels below the default tick labels; see page 12-70. For example, you
can place time stamp labels (or other descriptive labels) on your graph.
choose the text font, color, and size for the axis and tick labels; see page 12-70. The default
labels are black Arial.
use the Box-Cox transformation when you have very skewed data; see page 12-67.
choose the connection line type, color, and size. The default line is solid black.
choose the symbol type, color, and size. The default symbol is a black cross.
Subgroup   Mean     EWMA
1          14.000   10.400
2           9.000   10.120
3           7.000    9.496
4           9.000    9.397
5          13.000   10.117
6           4.000    8.894
7           9.000    8.915
8          11.000    9.332
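To get started, the EWMA for subgroup 0 is set to the mean of all data, 9.5. The EWMA for
subgroup 1 is .2(14) + .8(9.5) = 10.4. The EWMA for subgroup 2 is .2(9) + .8(10.4) = 10.12. In
general, the EWMA, z_i, for subgroup i is
z_i = w \bar{x}_i + (1 - w) z_{i-1}
or
z_i = w \bar{x}_i + w(1 - w) \bar{x}_{i-1} + w(1 - w)^2 \bar{x}_{i-2} + \cdots + w(1 - w)^{i-1} \bar{x}_1 + (1 - w)^i z_0
where w is the weight, \bar{x}_i is the mean of subgroup i, and z_0 is the starting value (the mean of all data).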
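The recursion can be checked with a few lines of Python. This is a minimal sketch rather than MINITAB's implementation; the function name ewma is hypothetical. It uses the eight subgroup means from the example and the default weight of 0.2, and reproduces the worked values 10.4 and 10.12.

```python
def ewma(means, weight=0.2, start=None):
    """Exponentially weighted moving averages of subgroup means.

    The starting value z_0 defaults to the mean of all data, as in the text.
    """
    z = sum(means) / len(means) if start is None else start
    result = []
    for x in means:
        z = weight * x + (1 - weight) * z  # z_i = w*x_i + (1 - w)*z_(i-1)
        result.append(z)
    return result


subgroup_means = [14, 9, 7, 9, 13, 4, 9, 11]
for i, z in enumerate(ewma(subgroup_means), start=1):
    print(i, round(z, 3))
```

Because each value depends on the one before it, a larger weight makes the chart respond faster to recent subgroups, and a smaller weight gives more smoothing.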
In the following example, Weight contains the weight in pounds of each batch of raw material.
1 Open the worksheet EXH_QC.MTW.
2 Choose Stat > Control Charts > EWMA.
3 In Single column, enter Weight. In Subgroup size, enter 5. Click OK.
Graph window output: [EWMA chart of Weight not reproduced]
Moving Average Chart
Data
You can use this command with subgroup data or individual observations. Subgroup data can be
structured in a single column, or in rows across several columns. When you have subgroups of
unequal size, structure the subgroups in a single column, then set up a second column of
subgroup identifiers. See Data on page 12-3 for examples.
Individual observations should be structured in a single column.
To make a moving average chart
1 Choose Stat > Control Charts > Moving Average.
2 Do one of the following:
When subgroups or individual observations are in one column, enter the data column in
Single column. In Subgroup size, enter a subgroup size or column of subgroup
indicators. For individual observations, enter a subgroup size of 1.
When subgroups are in rows, enter a series of columns in Subgroups across rows of.
3 If you like, use any of the options listed below, then click OK.
Options
Moving Average Chart dialog box
specify the length of the moving averages; see page 12-42. The default length is 3.
enter historical values for μ (the mean of the population distribution) and σ (the standard
deviation of the population distribution) if you have goals for μ or σ, or known parameters
from prior data; see page 12-62. If you do not specify a value for μ or σ, they are estimated
from the data.
customize the chart annotation, frame, and region (placement of the chart within the Graph
window); see page 12-73.
choose the positions at which to draw the upper and lower control (sigma) limits in relation to
the center line; see page 12-69. The default line is 3σ above and below the center line. You
can draw more than one set of lines. For example, you can draw specification limits along
with control limits on the chart.
place bounds on the upper and lower control limits; see page 12-69.
choose the line type, color, and size for the control limits; see page 12-69. The default line is
solid red.
add another row of tick labels below the default tick labels; see page 12-70. For example, you
can place time stamp labels (or other descriptive labels) on your graph.
choose the text font, color, and size for the axis and tick labels; see page 12-70. The default
labels are black Arial.
use the Box-Cox transformation when you have very skewed data; see page 12-67.
choose the connection line type, color, and size. The default line is solid black.
choose the symbol type, color, and size. The default symbol is a black cross.
Subgroup   Mean     MA
1          14.000   14.000
2           9.000   11.500
3           7.000   10.000
4           9.000    8.333
5          13.000    9.667
6           4.000    8.667
7           9.000    8.667
8          11.000    8.000
The MA for the first subgroup is 14.0, the first subgroup mean. The MA for the second subgroup
is the average of the first two means, (14 + 9) / 2 = 11.5. These two are special because we do not
have three subgroup means to average yet. Because of this, the UCL and LCL will be farther out
than for the rest of the subgroups. The remaining values of the MA follow a general pattern. The
MA for subgroup 3 is the average of the first 3 means, that is (14 + 9 + 7) / 3 = 10.0. The MA for
subgroup 4 is the average of the means from subgroups 2 through 4, (9 + 7 + 9) / 3 = 8.333. In
general, the MA for subgroup i is the average of the means from subgroups i − 2, i − 1, and i.
If you have individual observations, these are used in place of the subgroup means in all
calculations.
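The calculation described above can be sketched in Python as follows. This is an illustration under the stated rules, not MINITAB's code; the function name moving_average is hypothetical. The first points average only the means available so far, which is why the first two values are special.

```python
def moving_average(means, length=3):
    """Unweighted moving averages of subgroup means.

    Until `length` means are available, the average uses all means so far.
    """
    result = []
    for i in range(len(means)):
        window = means[max(0, i - length + 1) : i + 1]
        result.append(sum(window) / len(window))
    return result


subgroup_means = [14, 9, 7, 9, 13, 4, 9, 11]
for i, ma in enumerate(moving_average(subgroup_means), start=1):
    print(i, round(ma, 3))
```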
You can specify the length of the moving average used, that is, the number of subgroup means to
include in each average.
To change the length of the moving average
In the Moving Average Chart main dialog box, enter the number of subgroup means to be
included in each average in Length of MA. The default is 3. If you have individual observations
(that is you specified a subgroup size of 1), these are used in place of the subgroup means in all
calculations.
CUSUM Chart
A cumulative sum (CUSUM) chart plots the cumulative sums of the deviations of each sample
value from the target value. You can plot a chart based on subgroup means or individual
observations. With in-control processes, both the CUSUM chart and EWMA chart (page 12-36)
are good at detecting small shifts away from the target.
You can plot two types of CUSUM chart. The default chart plots two one-sided CUSUMs. The
upper CUSUM detects upward shifts in the level of the process, the lower CUSUM detects
downward shifts. This chart uses control limits (UCL and LCL) to determine when an
out-of-control situation has occurred. See [22] and [23] for a discussion of one-sided CUSUMs.
You can also plot one two-sided CUSUM. This chart uses a V-mask, rather than the usual 3σ
control limits, to determine when an out-of-control situation has occurred. See [14] and [24] for
a discussion of the V-mask chart.
When you have data in subgroups, the mean of all the observations in each subgroup is
calculated. CUSUM statistics are then formed from these means. All subgroups must be the
same size. By default, the process standard deviation, σ, is estimated using a pooled standard
deviation. You can also base the estimate on the average of subgroup ranges or subgroup
standard deviations, or enter an historical value for σ.
When you have individual observations, CUSUM statistics are formed from the individual
observations. By default, σ is estimated with MR̄ / d2, the average of the moving range divided
by an unbiasing constant. Moving ranges are artificial subgroups created from the individual
measurements. The moving range is of length 2, since consecutive values have the greatest
chance of being alike. You can also estimate σ using the median of the moving range, change
the length of the moving range, or enter an historical value for σ.
For more information, see Variables Control Charts Overview on page 12-2 and Control Charts
Using Subgroup Combinations on page 12-35.
Data
Subgroup data can be structured in a single column or in rows across several columns.
Subgroups must be of equal size. See Data on page 12-3 for examples.
Individual observations should be structured in a single column.
When subgroups or individual observations are in one column, enter the data column in
Single column. In Subgroup size, enter a subgroup size. For individual observations, enter
a subgroup size of 1.
When subgroups are in rows, enter a series of columns in Subgroups across rows of.
3 If you like, use any of the options listed below, then click OK.
To plot one two-sided (V-mask) CUSUM
1 Choose Stat > Control Charts > CUSUM.
2 Do one of the following:
When subgroups or individual observations are in one column, enter the data column in
Single column. In Subgroup size, enter a subgroup size. For individual observations, enter
a subgroup size of 1.
When subgroups are in rows, enter a series of columns in Subgroups across rows of.
Options
CUSUM Chart dialog box
specify a value other than 0 for the target. CUSUM statistics are cumulative deviations from
this target, or nominal, specification.
enter an historical value for σ (the standard deviation of the population distribution) if you
have a goal for σ, or a known σ from prior data; see page 12-62. If you do not specify a value
for σ, it is estimated from the data.
specify a CUSUM plan, which is defined by the parameters h and k; see page 12-46.
use the FIR (Fast Initial Response) method to initialize the one-sided CUSUMs, then specify
the number of standard deviations above and below the center line. Normally, the CUSUMs are
initialized at 0, but if the process is out of control at startup, they will not detect the
situation for several subgroups. The FIR method has been shown by [15] to reduce the number of
subgroups needed to detect problems at startup.
reset the CUSUMs to their initial values whenever an out-of-control signal is generated
(one-sided CUSUMS only). When a process goes out of control, an attempt should be made
to find and eliminate the cause of the problem. If the problem has been corrected, the
CUSUMs should be reset to their initial values.
Method
With in-control processes, CUSUM charts are good at detecting small shifts away from the
target, because they incorporate information from the sequence of sample values. The plot
points are the cumulative sums of the deviations of the sample values from the target. These
points should fluctuate randomly around zero. If a trend develops upwards or downwards, it
should be considered as evidence that the process mean has shifted, and you should look for
special causes.
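The idea can be illustrated with a short Python sketch. The first function accumulates deviations from the target exactly as described above. The second shows a common textbook form of the two one-sided CUSUMs with a slack value; it is an assumption made for illustration, not a statement of how MINITAB scales h and k, and the function names and sample data are hypothetical.

```python
def cusum_deviations(values, target=0.0):
    """Cumulative sums of the deviations of each value from the target."""
    total = 0.0
    result = []
    for x in values:
        total += x - target
        result.append(total)
    return result


def one_sided_cusums(values, target=0.0, slack=0.0):
    """Upper and lower one-sided CUSUMs (textbook form, slack in data units).

    The upper CUSUM accumulates deviations above target + slack and never
    drops below 0; the lower CUSUM accumulates deviations below
    target - slack and never rises above 0.
    """
    upper, lower = 0.0, 0.0
    uppers, lowers = [], []
    for x in values:
        upper = max(0.0, upper + (x - target) - slack)
        lower = min(0.0, lower + (x - target) + slack)
        uppers.append(upper)
        lowers.append(lower)
    return uppers, lowers


# Hypothetical subgroup means with a target of 2
print(cusum_deviations([1, 2, 3], target=2))
print(one_sided_cusums([3, 3, 1], target=2, slack=0.5))
```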
two one-sided CUSUMs (the default). The upper CUSUM detects upward shifts in the level
of the process and the lower CUSUM detects downward shifts. This chart uses control limits
(UCL and LCL) to determine when an out-of-control situation has occurred. See [22] and
[23] for a discussion of one-sided CUSUMs.
one two-sided CUSUM. This chart uses a V-mask, rather than control limits, to determine
when an out-of-control situation has occurred. See [14] and [24] for a discussion of the V-mask
chart.
[Table defining h and k for the one-sided CUSUM and the two-sided (V-mask) CUSUM; cell contents not reproduced]
Suppose you work at a car assembly plant in a department that assembles engines. In an operating
engine, parts of the crankshaft move up and down a certain distance from an ideal baseline
position. AtoBDist is the distance (in mm) from the actual (A) position of a point on the
crankshaft to the baseline (B) position.
To ensure production quality, you took five measurements each working day, from September 28
through October 15, and then ten per day from the 18th through the 25th. You already drew an X̄
chart (page 12-13) and an R chart (page 12-16) of this data. On the X̄ chart, subgroup 5 failed a
test for special causes. Now, to look for small shifts away from the target, you want to plot the
CUSUMs.
1 Open the worksheet CRANKSH.MTW.
2 Choose Stat > Control Charts > CUSUM.
3 In Single column, enter AtoBDist. In Subgroup size, enter 5. Click OK.
Graph window output: [CUSUM chart of AtoBDist not reproduced]
Zone Chart
A zone chart is a hybrid between an X̄ (or Individuals) chart and a CUSUM chart. It plots a
cumulative score, based on zones at 1, 2, and 3 sigmas from the center line. Zone charts are
usually preferred over X̄ or Individuals charts because of their simplicity: by default, a point
is out of control if its cumulative score is greater than or equal to 8. Thus, you do not need to
recognize the patterns associated with non-random behavior as on a Shewhart chart. This
method is equivalent to four of the standard tests for special causes in an X̄ or Individuals chart.
You can also modify the zone chart weighting scheme to provide the sensitivity needed for a
specific process.
A zone chart is illustrated and further defined on page 12-49.
You can plot a chart based on subgroup means or individual observations. With data in
subgroups, the mean of the observations in each subgroup is calculated, then plotted on the
chart. When subgroup sizes are unequal, MINITAB generates a standardized zone chart. By
default, the process standard deviation, σ, is estimated using a pooled standard deviation. You
can also base the estimate on the average of subgroup ranges or subgroup standard deviations, or
enter a historical value for σ.
With individual observations, a point is plotted for each observation. By default, σ is estimated
with MR̄ / d2, the average of the moving range divided by an unbiasing constant. Moving ranges
are artificial subgroups created from the individual measurements. The moving range is of length
2, since consecutive values have the greatest chance of being alike. You can also estimate σ using
the median of the moving range, change the length of the moving range, or enter an historical
value for σ.
For more information, see Variables Control Charts Overview on page 12-2 and Control Charts
Using Subgroup Combinations on page 12-35.
Data
You can use this command with subgroup data or individual observations. Subgroup data can be
structured in a single column, or in rows across several columns. When you have subgroups of
unequal size, structure the subgroups in a single column, then set up a second column of
subgroup identifiers. See Data on page 12-3 for examples.
Individual observations should be structured in a single column.
To make a zone chart
1 Choose Stat > Control Charts > Zone.
2 Do one of the following:
When subgroups or individual observations are in one column, enter the data column in
Single column. In Subgroup size, enter a subgroup size or column of subgroup indicators.
For individual observations, enter a subgroup size of 1.
When subgroups are in rows, enter a series of columns in Subgroups across rows of.
3 If you like, use any of the options listed below, then click OK.
Options
Zone Chart dialog box
enter historical values for μ (the mean of the population distribution) and σ (the standard
deviation of the population distribution) if you have goals for μ or σ, or known parameters
from prior data; see page 12-62. If you do not specify values for μ or σ, they are estimated
from the data.
change the weights or scores assigned to the points in each zone; see page 12-49. The weight
assigned to Zone 4 is also used as the critical value for determining when a process is out of
control. If you do not use this option, the default scores are 0, 2, 4, and 8. See [3] and [9] for a
discussion of the various weighting schemes.
reset the cumulative score to zero following each out-of-control signal; see page 12-49.
When a process goes out of control, you should try to find and eliminate the cause of the
problem. When the problem is corrected, the cumulative score should be reset to zero.
display all subgroups on the zone chart, or the last n subgroups. By default, MINITAB plots the
last 25 observations.
store the cumulative zone scores that appear in the circles at each point on the graph. Zone
chart stores the exact cumulative score for each subgroup.
Method
The zone chart classifies observations or subgroup means according to their distance from the
center line. For each observation or subgroup mean, the corresponding plot point is derived as
follows.
1 Each observation is assigned a zone score, as shown in this table:
Zone   Distance from center line   Default score
1      Within 1σ                   0
2      Between 1σ and 2σ           2
3      Between 2σ and 3σ           4
4      Beyond 3σ                   8
2 Each observation is assigned a cumulative score, which is the value that is actually plotted:
The first point is simply the zone score for the first observation or subgroup mean.
For subsequent points, weights are summed sequentially. Each time a new point crosses the
center line, the sum is reset to zero. If the sum totals 8 or more, the process is declared
out-of-control.
You can specify weights other than 0, 2, 4, and 8 for the zone scores in the Options subdialog box.
You can also choose to reset the cumulative score after each signal.
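The scoring steps above can be sketched in Python. This is an illustrative sketch, not MINITAB's implementation: the function name, the treatment of a point lying exactly on the center line (it is not counted as a crossing), and the assignment of a point exactly on a zone boundary to the outer zone are all assumptions.

```python
def zone_cumulative_scores(values, center, sigma,
                           weights=(0, 2, 4, 8), reset_after_signal=False):
    """Return the cumulative zone score plotted for each point.

    weights[i] is the score for Zone i+1; weights[3] (Zone 4) is also the
    critical value that signals an out-of-control condition.
    """
    plotted = []
    cumulative = 0
    previous_side = 0  # +1 above the center line, -1 below, 0 not yet known
    for x in values:
        deviation = x - center
        side = (deviation > 0) - (deviation < 0)
        # A point that crosses the center line resets the sum to zero.
        if side != 0 and previous_side != 0 and side != previous_side:
            cumulative = 0
        zone_index = min(int(abs(deviation) / sigma), 3)
        cumulative += weights[zone_index]
        plotted.append(cumulative)
        # Optional reset after an out-of-control signal.
        if reset_after_signal and cumulative >= weights[3]:
            cumulative = 0
        if side != 0:
            previous_side = side
    return plotted


points = [0.5, 1.5, 1.2, -0.3, 2.5, 2.1, 3.5]
print(zone_cumulative_scores(points, center=0.0, sigma=1.0))
print(zone_cumulative_scores(points, center=0.0, sigma=1.0,
                             reset_after_signal=True))
```

With the default weights, the sixth point reaches a cumulative score of 8 and would be flagged; the two calls differ only in whether the seventh point starts again from zero.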
A point in Zone 4 is given a score of 8. Keep in mind that a cumulative score equal to or
greater than the weight assigned to Zone 4 signals an out-of-control situation. This is
equivalent to Shewhart chart Rule 1: a single value beyond three standard deviations from
the center line.
A point in Zone 3 is given a score of 4. A second point in the same zone gives another score of
4. The cumulative sum of these two points is 8, which signals an out-of-control situation. This
is equivalent to Shewhart chart Rule 5: two out of three points in a row more than two
standard deviations from the center line.
A point in Zone 2 is given a score of 2. Three more points in the same zone give a cumulative
score of 8, which signals an out-of-control situation. This is equivalent to Shewhart chart
Rule 6: four out of five points in a row more than one standard deviation from the center
line.
For further discussions of zone control chart properties, refer to [3], [9], [11], and [12].
Suppose you work in a manufacturing plant concerned about quality control. You decide to
measure the length of ten sets of cylinders produced during each of five shifts for a total of 50
samples daily. Because a zone control chart is very easy to interpret, you decide to evaluate your
data with it. You also decide to reset the cumulative score following each out-of-control signal.
1 Open the worksheet EXH_QC.MTW.
2 Choose Stat > Control Charts > Zone.
3 In Single column, enter Length. In Subgroup size, enter 5.
4 Click Options. Check Reset cumulative score after each signal. Click OK in each dialog
box.
Graph window output
Z-MR Chart
A Z-MR chart is a chart of standardized individual observations (Z) and moving ranges (MR)
from a short run process. The chart for individual observations (Z) displays above the chart for
moving ranges (MR). Seeing both charts together lets you track both the process level and
process variation at the same time. See [25] for a discussion of how to interpret joint patterns in
the two charts.
Use Z-MR Chart with short run processes when there is not enough data in each run to produce
good estimates of process parameters. Z-MR Chart standardizes the measurement data by
subtracting the mean to center the data, then dividing by the standard deviation. Standardizing
allows data collected from different runs to be evaluated by interpreting a single control chart.
You can estimate the mean and process variation from the data in various ways, or supply
historical values.
See page 12-2 for a control charts overview, or page 12-53 for information specific to control
charts for short run processes.
Data
Your worksheet should consist of a pair of columns: a data column and a column containing the
corresponding part/product name or number. The part/product data defines the groupings for
estimating process parameters. In addition, each time MINITAB encounters a change in the part/
product name column, a new run is defined. You may find Calc > Make Patterned Data >
Simple Set of Numbers useful for entering the part/product number.
To make a Z-MR chart
1 Choose Stat > Control Charts > Z-MR.
2 In Variable, enter a data column. In Part, enter a column containing the part/product name
Options
Estimate subdialog box
standardize the data with historical means, or target values, for each part/product, rather than
the means estimated from the data. See Estimating the process means on page 12-55 for more
information. When you use historical means to center the data, you can compare your process
with past performance. When you use target values to center the data, you can compare your
process to the desired performance.
estimate σ in various ways, or enter historical values for each part/product. See Estimating the
process standard deviations on page 12-55 for more information.
place an additional row of tick labels, such as dates or shifts, below the subgroup numbers on
the x-axis (see page 12-70).
display all observations on the chart, or the last n observations. By default, the last 25
observations are displayed.
By Parts is a good choice when you have very short runs and want to combine runs of the same
part or product to obtain a more reliable estimate of σ. If the runs are sufficiently long, By Runs
can provide reliable estimates of σ.
Regardless of the method used to estimate σ, Z-MR Chart uses a moving range of length 2 to
estimate σ for each group of pooled data.
The methods that can be used to estimate σ, the process standard deviation, result in different
standardized values that are plotted on the control charts. The assumptions you are willing to
make about your process variation will determine the estimation method you choose. See the
table above for guidance in choosing a method.
Suppose you are measuring the thickness, in centimeters, of fibers from a spinning process. There
are 3 different fibers being made (#134, #221, #077) as seen in the Fiber # column in the table
below. The table shows the estimated σ for each measurement, using the various methods:
Run #  Fiber #  Thickness  Mean    Constant  Relative to size  By Parts  By Runs
1      134      1.435      1.5015  .0716     .0463             .0696     .0988
1      134      1.572      1.5015  .0716     .0463             .0696     .0988
1      134      1.486      1.5015  .0716     .0463             .0696     .0988
2      221      1.883      1.7847  .0716     .0461             .0821     .1117
2      221      1.715      1.7847  .0716     .0461             .0821     .1117
2      221      1.799      1.7847  .0716     .0461             .0821     .1117
3      134      1.511      1.5015  .0716     .0463             .0696     .0643
3      134      1.457      1.5015  .0716     .0463             .0696     .0643
3      134      1.548      1.5015  .0716     .0463             .0696     .0643
4      221      1.768      1.7847  .0716     .0461             .0821     .0789
4      221      1.711      1.7847  .0716     .0461             .0821     .0789
4      221      1.832      1.7847  .0716     .0461             .0821     .0789
5      077      1.427      1.3883  .0716     .0459             .0634     .0634
5      077      1.344      1.3883  .0716     .0459             .0634     .0634
5      077      1.404      1.3883  .0716     .0459             .0634     .0634
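As a rough check on the By Runs estimates, the sketch below computes the average moving range of length 2 within each run and divides it by the unbiasing constant d2 = 1.128. The function and variable names are illustrative assumptions, and the run numbers 1 to 5 are inferred from the order of the fiber data; this is not MINITAB's code.

```python
D2_LENGTH_2 = 1.128  # unbiasing constant d2 for a moving range of length 2


def sigma_from_moving_range(values):
    """Estimate sigma as (average moving range of length 2) / d2."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(moving_ranges) / len(moving_ranges)) / D2_LENGTH_2


# Thickness values grouped by run, in the order given for the fiber data.
runs = {
    1: [1.435, 1.572, 1.486],  # fiber 134
    2: [1.883, 1.715, 1.799],  # fiber 221
    3: [1.511, 1.457, 1.548],  # fiber 134
    4: [1.768, 1.711, 1.832],  # fiber 221
    5: [1.427, 1.344, 1.404],  # fiber 077
}

sigma_by_run = {run: sigma_from_moving_range(x) for run, x in runs.items()}
for run, sigma in sigma_by_run.items():
    print(run, round(sigma, 4))
```

The By Parts method applies the same moving-range calculation after pooling all runs of the same part, so a part that appears in only one run (fiber 077 here) gets the same estimate under both methods.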
Z-MR Chart estimates the mean for each different part or product separately. To calculate the
mean, Z-MR Chart pools all the data for a common part, and obtains the average of the pooled
data. This average is the estimate of μ for that part. The part name data are used to define the
groupings for estimating the process means.
Notice the mean for each fiber is the same for all runs of that fiber. You can see in the table that
the means for run 1 and run 3 are the same, since both are runs of the same fiber, fiber #134. Also,
the means for run 2 and run 4 are the same, since both are runs of fiber #221.
When the Relative to size option is used, the means are taken on the natural log of the data.
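The pooling of data by part can be sketched as follows. This is a minimal illustration using the fiber #134 and #221 observations only; the function name and the use_log flag (standing in for the Relative to size option) are assumptions made for the example.

```python
import math
from collections import defaultdict

# (fiber, thickness) pairs for runs 1 to 4 of the fiber example.
observations = [
    ("134", 1.435), ("134", 1.572), ("134", 1.486),
    ("221", 1.883), ("221", 1.715), ("221", 1.799),
    ("134", 1.511), ("134", 1.457), ("134", 1.548),
    ("221", 1.768), ("221", 1.711), ("221", 1.832),
]


def part_means(pairs, use_log=False):
    """Pool all observations of each part and average them.

    With use_log=True the average is taken on the natural log of the data.
    """
    pooled = defaultdict(list)
    for part, value in pairs:
        pooled[part].append(math.log(value) if use_log else value)
    return {part: sum(v) / len(v) for part, v in pooled.items()}


means = part_means(observations)
log_means = part_means(observations, use_log=True)
print({part: round(m, 4) for part, m in means.items()})
```

Because the grouping key is the part, not the run, runs 1 and 3 share one mean and runs 2 and 4 share another.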
In this example, the data comes from a process where the variance increases as the size of the
measurement increases. For example,
Fiber #  Mean    σ
221      1.7847  .0821
134      1.5015  .0696
077      1.3883  .0634
You probably want to use the Relative to size method for estimating σ for this process.
When you use the Constant option, Z-MR Chart subtracts the mean for each part from the raw
data of that part. This process is required to center the data before estimating σ. The deviations
from the mean are then pooled into one sample, and the average moving range of the deviations
is used to estimate σ.
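A minimal sketch of the Constant calculation follows, assuming a moving range of length 2 and the unbiasing constant d2 = 1.128. The part means are taken as printed in the fiber example rather than recomputed, and all names are illustrative.

```python
D2_LENGTH_2 = 1.128  # unbiasing constant d2 for a moving range of length 2

# Part means as printed in the fiber example.
part_mean = {"134": 1.5015, "221": 1.7847, "077": 1.3883}

# (fiber, thickness) in time order across the five runs.
observations = [
    ("134", 1.435), ("134", 1.572), ("134", 1.486),
    ("221", 1.883), ("221", 1.715), ("221", 1.799),
    ("134", 1.511), ("134", 1.457), ("134", 1.548),
    ("221", 1.768), ("221", 1.711), ("221", 1.832),
    ("077", 1.427), ("077", 1.344), ("077", 1.404),
]

# Step 1: center each observation on the mean of its own part.
deviations = [x - part_mean[part] for part, x in observations]

# Step 2: pool the deviations into one sample and average the moving ranges.
moving_ranges = [abs(b - a) for a, b in zip(deviations, deviations[1:])]
sigma_constant = (sum(moving_ranges) / len(moving_ranges)) / D2_LENGTH_2

print(round(sigma_constant, 4))
```

Under these assumptions the result agrees with the single Constant estimate of .0716 listed for every row of the fiber example.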
Fiber #  Thickness  Mean    Constant  Relative to size  By Parts  By Runs
134      1.435      1.5015  [...]     -0.9674           -0.9554   -0.6731
134      1.572      1.5015   0.9846    1.0022            1.0129    0.7136
134      1.486      1.5015  -0.2165   -0.2131           -0.2227   -0.1569
221      1.883      1.7847   1.3729    1.1774            1.1973    0.8800
221      1.715      1.7847  -0.9735   -0.8518           -0.8490   -0.6240
221      1.799      1.7847   0.1997    0.1865            0.1742    0.1280
134      1.511      1.5015   0.1327    0.1473            0.1365    0.1477
134      1.457      1.5015  -0.6215   -0.6388           -0.6394   -0.6921
134      1.548      1.5015   0.6494    0.6698            0.6681    0.7232
221      1.768      1.7847  -0.2332   -0.1909           -0.2034   -0.2117
221      1.711      1.7847  -1.0293   -0.9024           -0.8977   -0.9341
221      1.832      1.7847   0.6606    0.5812            0.5761    0.5995
077      1.427      1.3883   0.5405    0.5529            0.6104    0.6104
077      1.344      1.3883  -0.6187   -0.7520           -0.6987   -0.6987
077      1.404      1.3883   0.2193    0.1991            0.2476    0.2476
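The standardized values are obtained by subtracting the part mean and dividing by the estimated σ. The sketch below reproduces the values for the observation 1.572 of fiber #134 using the rounded means and σ estimates printed in the fiber example; the function names are illustrative, and the Relative to size case assumes that both the observation and the mean are taken on the natural log scale.

```python
import math


def standardize(x, mean, sigma):
    """Z value: distance from the part mean in units of the estimated sigma."""
    return (x - mean) / sigma


def moving_ranges(z_values):
    """Moving ranges of length 2 of consecutive Z values (the MR chart)."""
    return [abs(b - a) for a, b in zip(z_values, z_values[1:])]


x, mean_134 = 1.572, 1.5015
z_constant = standardize(x, mean_134, 0.0716)
z_by_parts = standardize(x, mean_134, 0.0696)
z_by_runs = standardize(x, mean_134, 0.0988)

# Relative to size: work on the natural log of the data for fiber 134.
fiber_134 = [1.435, 1.572, 1.486, 1.511, 1.457, 1.548]
log_mean_134 = sum(math.log(v) for v in fiber_134) / len(fiber_134)
z_relative = standardize(math.log(x), log_mean_134, 0.0463)

print(round(z_constant, 4), round(z_by_parts, 4), round(z_by_runs, 4))
print(round(z_relative, 3))
```

Because the inputs are rounded to four decimals, the Relative to size value can differ from the printed one in the last digit.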
Suppose you work in a paper manufacturing plant and are concerned about quality control.
Because your process makes paper in short runs, you need to employ standardized control
charting techniques to assess quality control. You know that the variation in your process is
proportional to the thickness of the paper being produced, so you plan to use the Relative to size
option to estimate σ.
You collect data from 5 runs including 3 different grades of paper. You then use MINITAB's Z-MR
chart command to produce a standardized control chart for the individual observations (Z) and
the moving ranges (MR) from your short run paper-making process.
1 Open the worksheet EXH_QC.MTW.
2 Choose Stat > Control Charts > Z-MR.
3 In Variable, enter Thicknes. In Part, enter Grade.
4 Click Estimate. Choose Relative to size. Click OK in each dialog box.
Graph window output
Option                                           Page
Control how σ is estimated                       12-66
  Use average of subgroup ranges
  Use average of subgroup standard deviations
  Specify length of moving range
Use Box-Cox transformation on data               12-67
Customize control (sigma) limits                 12-69
Other option pages listed: 12-60, 12-62, 12-65, 12-70, 12-73
Data types: data in subgroups, individual observations, subgroup combinations, short runs
Charts: Xbar, Xbar-R, Xbar-S, I-MR-R/S, I-MR, Individuals, Moving Range, EWMA,
Moving Average, CUSUM, Zone, Z-MR (page 12-52)
Xbar, R, S, Xbar-R, Xbar-S, Individuals, I-MR, I-MR-R/S, Moving Range, all attributes
control charts (P, NP, C, and U)
You can display stages in your process by drawing a historical chart, a control chart in which
the control limits and center line are estimated independently for different groups in your data.
Historical charts are particularly useful for comparing data before and after a process
improvement.
Suppose you work for a company which manufactures light bulbs. As the light bulbs move along
a conveyer belt, they are stamped with the company logo. Sometimes the stamp lands off center.
To improve the process, you first tighten the conveyer belt. To improve the process further, you
adjust the stamping device.
This chart groups the data collected before and after each adjustment:
Note
With the following charts, you must have at least one subgroup with two or more
observations: R chart, S chart, I-MR-R/S chart, or Xbar chart with the Rbar or Sbar estimation
method.
To define stages in your process, you must set up a column of grouping indicators. The indicators
can be numbers, dates, or text. When executing the command, you can tell MINITAB to start a
new stage in one of two ways:
The column must be the same length as the data column (or columns, when subgroups are
across rows).
1 In the control chart main dialog box, click Estimate Parameters BY Groups in.
2 In Variable used to define groups for estimating parameters, enter the column which
to start a new stage each time the value in the column changes, choose New groups start
at each new value in group variable (the default).
to start a new stage at the first occurrence of a certain value, choose New groups start at
the first occurrence of these values and enter the values, or a column containing those
values, in the box. Date/time or text entries must be enclosed in double quotes. You can
enter the same value more than once; each repeat will be treated as a separate occurrence.
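The two ways of starting a new stage can be sketched as follows. This is an illustration only: the function names are hypothetical, and the second function handles distinct start values only (it does not model the case where the same value is entered more than once).

```python
def stages_on_change(groups):
    """Start a new stage each time the value in the group column changes."""
    stage, previous, out = 0, object(), []
    for g in groups:
        if g != previous:
            stage += 1
        out.append(stage)
        previous = g
    return out


def stages_at_first_occurrence(groups, start_values):
    """Start a new stage at the first occurrence of each listed value."""
    remaining = set(start_values)
    stage, out = 1, []
    for i, g in enumerate(groups):
        if g in remaining:
            remaining.discard(g)
            if i > 0:  # the first row always belongs to stage 1
                stage += 1
        out.append(stage)
    return out


months = ["Jun", "Jun", "Jul", "Jul", "Aug", "Aug"]
print(stages_on_change(months))
print(stages_at_first_occurrence(months, ["Jul"]))
```

Once each row has a stage number, the center line and control limits would be estimated separately from the rows of each stage.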
You can
As manager of a hospital's intensive care unit, you are concerned about the length of time it takes
to admit patients to your unit. To gain an understanding of the process, you begin monitoring
admission times. You find that the process is in control, but the variability is large. So before
making any changes in the process, your team decides to first standardize the admission
procedure for all shifts. This standardization takes place in July.
While studying the admissions process, you discover that you can cut down on switchover time by
using the same type of IV line used in the operating room. You implement this change in August.
To share your findings with the staff, you draw an historical chart.
1 Open the worksheet ICU.MTW.
2 Choose Stat > Control Charts > Individuals.
3 In Variable, enter ICUadmit.
4 Click Estimate Parameters BY Groups In. In Variable used to define groups for estimating
Historical μ: Xbar, Xbar-R, Xbar-S, I-MR-R/S, Individuals, I-MR, EWMA, Moving Average,
Zone, C, U
Historical σ: All control charts except Z-MR and I-MR-R/S
For variables control charts, the process is assumed to produce data from a stable population that
often follows a normal distribution. The mean and standard deviation of a population distribution
are denoted by mu (μ) and sigma (σ), respectively. If μ and σ are not specified, they are estimated
from the data. Alternatively, you can enter known process parameters, estimates obtained from
past data, or your goals.
When you choose to enter historical values for μ and σ, it overrides any options relating to
estimating μ or σ from the data, specifically: Omit the following samples when estimating
parameters, and any of the Methods for estimating sigma.
To use historical values of μ and σ
In the chart's main dialog box, enter a value in Historical mean and/or Historical sigma.
Xbar, Xbar-R, Xbar-S, I-MR-R/S, Individuals, and I-MR, all attributes charts (P, NP, C, and
U), Capability Sixpack (Normal), Capability Sixpack (Between/Within), and Capability
Sixpack (Weibull)
Tests 1-4 only: R, S, and Moving Range
Each of the tests for special causes, shown in Exhibit 12.1, detects a specific pattern in the data
plotted on the chart. The occurrence of a pattern suggests a special cause for the variation, one
that should be investigated. See [5] and [25] for guidance on using these tests.
When a point fails a test, it is marked with the test number on the chart. If a point fails more than
one test, the number of the first test in your list is the number printed on the chart. In addition, a
summary table is printed in the Session window with complete information.
You can change the threshold values for triggering a test failure; see Defining Tests for Special
Causes on page 12-5 for details.
Subgroup sizes must be equal to perform these tests.
Test 1
Test 2
Test 3
Test 4  Fourteen points in a row, alternating up and down
Test 5
Test 6
Test 7  Fifteen points in a row within 1 sigma of center line (either side)
Test 8
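The two test descriptions that survive in the exhibit can be checked with a short sketch like the one below. The function names, the return convention (index of the point that completes the pattern), and the strict inequality for "within 1 sigma" are assumptions made for illustration; they are not taken from MINITAB.

```python
def alternating_run_hits(points, run=14):
    """Indices completing `run` points in a row that alternate up and down."""
    hits = []
    for end in range(run - 1, len(points)):
        window = points[end - run + 1:end + 1]
        diffs = [b - a for a, b in zip(window, window[1:])]
        # Consecutive differences must change sign every time.
        if all(d1 * d2 < 0 for d1, d2 in zip(diffs, diffs[1:])):
            hits.append(end)
    return hits


def within_one_sigma_hits(points, center, sigma, run=15):
    """Indices completing `run` points in a row within 1 sigma of center."""
    hits = []
    for end in range(run - 1, len(points)):
        window = points[end - run + 1:end + 1]
        if all(abs(p - center) < sigma for p in window):
            hits.append(end)
    return hits


zigzag = [0.0, 1.0] * 7            # 14 points alternating up and down
hugging = [0.1, -0.2, 0.3] * 5     # 15 points close to the center line
print(alternating_run_hits(zigzag))
print(within_one_sigma_hits(hugging, center=0.0, sigma=1.0))
```

The run lengths are parameters because the threshold values for triggering a test failure can be changed.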
To select certain tests, choose Choose specific tests to perform (the default). Check the
tests you would like to perform.
To do all of the tests, choose Perform all eight tests or Perform all four tests, depending
on the command.
All variables control charts except Z-MR and all attributes control charts
By default, MINITAB estimates the process parameters from all the data. But you may want to
omit certain data if it shows abnormal behavior.
observation (sample) numbers that you want to omit from the calculations. With some charts
(Xbar, R, S, Individuals, Moving Range, EWMA and Moving Average), you can also enter a
column which contains those values.
Note
MINITAB assumes the values you enter are subgroup numbers, except with the I-MR-R/S,
Individuals, Moving Range, and I-MR charts. With these charts, the values are interpreted
as observation (sample) numbers.
3 Click OK.
Pooled standard deviation: All control charts except I-MR, Individuals, Moving Range,
and capability charts not based on normal distributions
Average of subgroup ranges: Xbar, R, Xbar-R, I-MR-R/S, EWMA, Moving Average,
CUSUM, Zone, Capability Analysis (Normal), Capability Sixpack (Normal)
Average of subgroup standard deviations: Xbar, S, Xbar-S, I-MR-R/S, EWMA, Moving
Average, CUSUM, Zone, Capability Analysis (Normal), Capability Sixpack (Normal)
Median of moving range/Specify length of moving range: Xbar, I-MR-R/S, Individuals,
Moving Range, I-MR, EWMA, Moving Average, CUSUM, Zone, Capability Analysis
(Normal), Capability Sixpack (Normal)
Square root of mean of squared successive differences: I-MR-R/S
MINITAB has several methods of estimating σ, depending on whether your data is in subgroups or
individual observations.
Data in subgroups
All commands, except for R chart, S chart, and I-MR-R/S chart, estimate σ with a pooled
standard deviation. The pooled standard deviation is the most efficient method of estimating
sigma when you can assume constant variation across subgroups. Choose Rbar to base the
estimate on the average of the subgroup ranges. Choose Sbar to base your estimate on the
average of the subgroup standard deviations. See [1] for a discussion of the relative merits of each
estimator.
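The three subgroup-based estimators can be compared with a short sketch. It assumes equal subgroup sizes between 2 and 5 and uses the standard control chart constants d2 and c4 for those sizes; the function names are illustrative, and the pooled estimate is shown without any unbiasing constant, which may differ from what MINITAB applies.

```python
import math
import statistics

# Standard unbiasing constants for subgroup sizes 2 to 5.
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}
C4 = {2: 0.7979, 3: 0.8862, 4: 0.9213, 5: 0.9400}


def sigma_rbar(subgroups):
    """Rbar method: average subgroup range divided by d2."""
    n = len(subgroups[0])
    rbar = sum(max(g) - min(g) for g in subgroups) / len(subgroups)
    return rbar / D2[n]


def sigma_sbar(subgroups):
    """Sbar method: average subgroup standard deviation divided by c4."""
    n = len(subgroups[0])
    sbar = sum(statistics.stdev(g) for g in subgroups) / len(subgroups)
    return sbar / C4[n]


def sigma_pooled(subgroups):
    """Pooled standard deviation across subgroups (no unbiasing constant)."""
    num = sum((len(g) - 1) * statistics.variance(g) for g in subgroups)
    den = sum(len(g) - 1 for g in subgroups)
    return math.sqrt(num / den)


data = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
print(sigma_rbar(data), sigma_sbar(data), sigma_pooled(data))
```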
Individual observations
The estimate of σ is based on MRbar / d2, the average of the moving range divided by an unbiasing
constant. By default, the moving range is of length 2, since consecutive values have the greatest
chance of being alike. Use moving range of length to change the length of the moving range.
Alternatively, use Median moving range to estimate σ using the median of the moving range.
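For individual observations, the calculation can be sketched as below. The d2 constants are the standard values for range lengths 2 to 5. The divisor 0.954 used for the median moving range of length 2 is an assumption of this sketch (it is the median of the range of two standard normal values); the text above does not state which constant is used.

```python
import statistics

D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}
MEDIAN_CONSTANT_LENGTH_2 = 0.954  # assumed unbiasing constant for the median


def moving_ranges(values, length=2):
    """Range (max - min) of each window of `length` consecutive values."""
    return [max(values[i:i + length]) - min(values[i:i + length])
            for i in range(len(values) - length + 1)]


def sigma_average_mr(values, length=2):
    """Average moving range divided by d2 for the chosen length."""
    mrs = moving_ranges(values, length)
    return (sum(mrs) / len(mrs)) / D2[length]


def sigma_median_mr(values):
    """Median moving range of length 2 divided by its unbiasing constant."""
    return statistics.median(moving_ranges(values, 2)) / MEDIAN_CONSTANT_LENGTH_2


x = [10.0, 12.0, 11.0, 15.0, 14.0]
print(moving_ranges(x))             # length 2
print(moving_ranges(x, length=3))   # length 3
print(sigma_average_mr(x), sigma_median_mr(x))
```

The median-based estimate is less affected by a single large jump (the move from 11 to 15 here) than the average-based one.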
To choose how σ is estimated
1 In the chart's main dialog box, click Estimate.
2 Under Methods of estimating sigma, click the method of choice, then click OK.
Note
When Omit the following samples when estimating parameters is used with Use
moving range of length, any moving ranges which include omitted data are excluded
from the calculations.
When subgroup sizes are not equal, the control limits will not be straight lines, but will vary with
the subgroup size. The center line of charts for ranges and standard deviations also varies with
the subgroup size. If the sizes do not vary much, you may want to force these lines to be constant.
For instance, you could enter the average sample size as the subgroup size. When you use this
option, the plot points themselves are not changed; only the control limits and center line.
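The effect of the subgroup size on the limits can be sketched for an Xbar chart, assuming the usual limits of center ± kσ/√n. The function name and the forced_size parameter (standing in for the Subgroup size entry) are assumptions of this sketch.

```python
import math


def xbar_limits(center, sigma, sizes, k=3.0, forced_size=None):
    """Return (lower, upper) control limits for each subgroup.

    With forced_size set, every subgroup is treated as having that size,
    so the limits become straight lines.
    """
    limits = []
    for n in sizes:
        n_used = forced_size if forced_size is not None else n
        half_width = k * sigma / math.sqrt(n_used)
        limits.append((center - half_width, center + half_width))
    return limits


print(xbar_limits(10.0, 2.0, sizes=[4, 16]))
print(xbar_limits(10.0, 2.0, sizes=[4, 16], forced_size=4))
```

Only the limits change when a size is forced; the plotted subgroup means are computed from the actual data.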
To force control limits and center line to be constant
1 In the chart's main dialog box, click Estimate.
2 Under Calculate control limits using, enter a value in Subgroup size. For example, entering
a value of 6 says to calculate the control limits and center line as if all subgroups were of size
6. Click OK.
Note
It is usually recommended that you force the control limits and center line to be constant
only when the difference in size between the largest and smallest subgroup is no more
than 25%.
Xbar, R, S, Xbar-R, Xbar-S, I-MR-R/S, Individuals, Moving Range, I-MR, EWMA, Moving
Average, Z-MR, Capability Analysis (Normal), Capability Sixpack (Normal), Capability
Analysis (Between/Within), Capability Sixpack (Between/Within)
You can use the Box-Cox power transformation to make the data more normal when your data are
very skewed or when the within-subgroup variation is unstable. The transformation takes
the original data to the power λ, unless λ = 0, in which case the natural log is taken. (λ is
pronounced lambda.)
To use this option, the data must be positive.
The Options subdialog box lists the common transformations natural log (λ = 0) and square root
(λ = 0.5). You can also choose any value between -5 and 5 for λ. In most cases, you should not
choose a λ outside the range of -2 and 2. You may want to first run the command described
under Box-Cox Transformation for Non-Normal Data on page 12-6 to help you find the optimal
transformation value.
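The transformation as described here can be written in a few lines. This sketch follows the simple power form given in the text (data raised to the power λ, natural log when λ = 0); the function name and the error messages are illustrative.

```python
import math


def box_cox(values, lam):
    """Raise positive data to the power lam, or take the natural log if lam is 0."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox transformation requires positive data")
    if not -5 <= lam <= 5:
        raise ValueError("lambda must be between -5 and 5")
    if lam == 0:
        return [math.log(v) for v in values]
    return [v ** lam for v in values]


print(box_cox([1.0, 4.0, 9.0], 0.5))   # square root
print(box_cox([1.0, math.e], 0))       # natural log
```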
Caution
If you use Stat > Control Charts > Box-Cox Transformation to find the optimal lambda
value and choose to store the transformed data with that command, take care not to
select the Box-Cox option if making the control chart with that data; you will double
transform the data.
When you use this transformation, MINITAB does not accept any values you enter in Historical
mean or Historical sigma.
Box-Cox transformation with control charts
When you use the Box-Cox power transformation, the control chart will be based on the
transformed data. The process parameters (mean and standard deviation) are also calculated
using the transformed data.
Box-Cox transformation with process capability commands
When you use the Box-Cox power transformation, MINITAB displays a capability histogram for
the transformed data. (A small histogram of the original data displays in the upper left side of the
plot.) The normal curve included in the capability histogram helps you determine whether the
transformation was successful in making the data more normal.
This method also transforms the specification limits and target automatically, so that all the data
are on the same scale. Process parameters (mean, short-term standard deviation, and long-term
standard deviation) and capability statistics (both long-term and short-term) are calculated using
the transformed data and specification limits. The transformed statistics display with an next to
their names in the table Process Data.
With Capability Sixpack (Normal) and Capability Sixpack (Between/Within), when you enter a
λ > 0, the capability plot is in the original scale; when λ < 0, the plot is in the transformed scale.
To do the Box-Cox power transformation
1 From the main control chart or capability dialog box, click Options. We use the Xbar dialog
use the square root of the data: choose Lambda = 0.5 (square root)
transform the data using some other lambda value: choose Other and enter a value
between -5 and 5 in the box
For help choosing a lambda value, see the independent Box-Cox transformation command
described in Box-Cox Transformation for Non-Normal Data on page 12-6.
3 Click OK.
Full options: Xbar, R, S, Individuals, Moving Range, EWMA, and Moving Average
Partial option: Xbar-R, Xbar-S, I-MR-R/S, and I-MR
With the full and partial options, you can draw control limits above and below the mean at
any multiple of the standard deviation.
To specify the positions of control limits, you enter positive numbers, or a column containing the
values. Each value you give draws two horizontal lines, one above and one below the mean. For
example, entering a 2 draws control limits at two standard deviations above and below the center
line. Entering 1 2 3 gives three lines above and three lines below the center line at 1σ, 2σ, and
3σ. Entering C1 also gives three lines above and below the center line at 1σ, 2σ, and 3σ when
C1 contains the values 1 2 3.
With the full option (S Limits subdialog box), you can also:
set bounds on the upper and lower control limits. When the calculated upper control limit is
greater than the upper bound, a horizontal line labeled UB will be drawn at the upper bound
instead. Similarly, if the calculated lower control limit is less than the lower bound, a
horizontal line labeled LB will be drawn at the lower bound instead.
specify the line type, color, and size. By default, the line is solid red.
For an example, see Example of an Xbar chart with tests and customized control limits on page
12-13.
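The placement of sigma limits and bounds can be sketched as follows. The function name and the "+k sigma" labels are hypothetical; only the UB and LB labels come from the text. The sketch assumes that bounds, like limit positions, are given as numbers of standard deviations from the center line.

```python
def sigma_limit_lines(center, sigma, positions,
                      upper_bound=None, lower_bound=None):
    """Return (label, value) pairs for the lines drawn on the chart."""
    lines = []
    for k in positions:
        if k <= 0:
            raise ValueError("sigma limit positions must be positive")
        upper_k, upper_label = k, f"+{k} sigma"
        lower_k, lower_label = k, f"-{k} sigma"
        # A limit beyond a bound is replaced by a line at the bound.
        if upper_bound is not None and k > upper_bound:
            upper_k, upper_label = upper_bound, "UB"
        if lower_bound is not None and k > lower_bound:
            lower_k, lower_label = lower_bound, "LB"
        lines.append((lower_label, center - lower_k * sigma))
        lines.append((upper_label, center + upper_k * sigma))
    return lines


print(sigma_limit_lines(10.0, 2.0, [1, 2, 3]))
print(sigma_limit_lines(10.0, 2.0, [3], upper_bound=2.5))
```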
Tip
You can also modify the control limits using the graph editing features explained in
MINITAB User's Guide 1.
To specify where control limits are drawn: in Sigma limit positions, enter one or more
values, or a column of values. Each value is the number of standard deviations the lines
should be drawn at, above and below the mean.
To set bounds on the control limits: check Place bound on upper sigma limits at (and/or
Place bound on lower sigma limits at) and enter a value. Each value represents the
number of standard deviations above and below the mean.
To change line attributes: under Line type or Line color, click a choice. In Line size, enter
a positive real number. Larger numbers correspond to wider lines. The number you enter is
in relation to the base unit of 1 pixel.
3 Click OK in each dialog box.
To customize the control limits (partial option)
1 In the chart's main dialog box, click Options.
2 In Sigma limit positions, specify where control limits are drawn by entering one or more
values, or a column of values. Each value is the number of standard deviations the lines should
be drawn at, above and below the mean.
3 Click OK.
Full option: Xbar, R, S, Individuals, Moving Range, EWMA, Moving Average, all attributes
charts
Partial option: Xbar-R, Xbar-S, I-MR-R/S, I-MR, CUSUM, Zone
add row(s) of tick labels below the regular tick labels on the horizontal (x-) axis. This allows
you to place time stamp labels (or other descriptive labels) on your chart.
specify a label for the added line. The default label is the name of the column variable
specified in the control chart command. If the column has a name (for example, Sales), that
name is the default label. If not, the column number (for example, C20) is the default label.
specify the font, color, and size of the tick labels. By default, the labels are black Arial.
With the partial option, you can add rows of tick labels below the regular labels, but you cannot
label the line.
The column used for tick labels can contain date/time, text, or numeric data, but must contain
the same number of entries as the column of data you use to generate the control chart.
To control the number of tick marks and whether the tick marks appear on the top or bottom of a
chart, see Tick in on-line Help. Changing the number of tick marks also changes the number of
tick (stamp) labels.
Tip
You can modify stamp text using the graph editing features explained in MINITAB User's
Guide 1. For example, you can control the number of tick marks and whether the tick
marks appear on the top or bottom of a chart. Changing the number of tick marks also
changes the number of tick (stamp) labels.
In Axis Label, enter an axis label for the added line. The default is the name of the column
of labels.
In Text Font, specify a font. The default is Arial.
In Text Color, specify a color. The default is black.
4 Click OK.
To add additional rows of tick labels (partial option)
1 In the control chart's main dialog box, click Stamp.
2 In Stamp, enter the column of labels.
3 Click OK.
Example of adding a time stamp
In this example, we add two rows of tick labels below the original tick labels (the subgroup
number): Month and Day.
1 Open the worksheet CRANKSHD.MTW (a variation of CRANKSH.MTW).
2 Choose Stat > Control Charts > R.
3 In Single column, enter AtoBDist. In Subgroup size, enter 5.
4 Click Stamp. Under Tick Labels, enter Month in row 1 and Day in row 2, then click OK in
Graph window output
Full option: Xbar, R, S, Individuals, Moving Range, EWMA, Moving Average, all attributes
charts
Partial option: Xbar-R, Xbar-S, I-MR-R/S, I-MR, CUSUM, Zone
The control charts share basic options with other MINITAB graphs. These options can be
accessed in the main dialog box through the Annotation, Frame, and Regions drop-down lists,
and in the Options subdialog box. For more information, refer to the indicated chapters in
MINITAB User's Guide 1.
Core Graphs:      Core Graphs:  Core Graphs:                Core Graphs:
Displaying Data   Annotating    Customizing the Frame       Controlling Regions
Symbols           Titles        Axes                        Figure
Connection lines  Footnotes     Ticks                       Data
                  Text          Grids
                  Lines         Reference lines
                  Polygons      Suppressing frame elements
                  Markers
Note
With Xbar-R, Xbar-S, I-MR-R/S, I-MR, and Z-MR, you can enter your own graph title; no
other options are available.
To save an active graph window, use File > Save Window As. To view it later, use File > Open
Graph.
References
[1] I.W. Burr (1976). Statistical Quality Control Methods, Marcel Dekker, Inc.
[2] Ward Cheney and David Kincaid (1985). Numerical Mathematics and Computing, Second
Edition, Brooks/Cole Publishing Company.
[3] J. Fang and K.E. Case (1990). Improving the Zone Control Chart, ASQC Quality Congress
Transactions, San Francisco.
[4] N.R. Farnum (1994). Modern Statistical Quality Control and Improvement, Wadsworth
Publishing.
[5] Ford Motor Company (1983). Continuing Process Control and Process Capability
Improvement, Ford Motor Company, Dearborn, Michigan.
[6] E.L. Grant and R.S. Leavenworth (1988). Statistical Quality Control, 6th Edition,
McGraw-Hill.
[7] G.K. Griffith (1989). Statistical Process Control Methods for Long and Short Runs, ASQC
Quality Press, Milwaukee.
[8] D.M. Hawkins (1981). A Cusum for a Scale Parameter, Journal of Quality Technology, 13,
pp. 228-231.
[9] C.D. Hendrix and J.L. Hansen (1990). Zone Charts: an SPC Tool for the 1990s, Union
Carbide Chemicals & Plastics, South Charleston.
[10] K. Ishikawa (1967). Guide to Quality Control, Asian Productivity Organization.
[11] A. Jaehn (1987). Zone Control Charts: A New Tool for Quality Control, Tappi,
pp. 159-161.
[12] A. Jaehn (1987). Zone Control Charts: SPC Made Easy, Quality, pp. 51-52.
[13] V.E. Kane (1989). Defect Prevention, Marcel Dekker, Inc.
[14] J.M. Lucas (1976). The Design and Use of V-Mask Control Schemes, Journal of Quality
Technology, 8, pp. 1-12.
[15] J.M. Lucas and R.B. Crosier (1982). Fast Initial Response for CUSUM Quality-Control
Schemes: Give Your CUSUM a Head Start, Technometrics, 24, pp. 199-205.
[16] J.M. Lucas and M.S. Saccucci (1990). Exponentially Weighted Moving Average Control
Schemes: Properties and Enhancements, Technometrics, 32, pp. 1-12.
[17] D.C. Montgomery (1985). Introduction to Statistical Quality Control, John Wiley & Sons.
[18] Raymond H. Myers (1990). Classical and Modern Regression with Applications, Second
Edition, PWS-KENT Publishing Company.
[19] L.S. Nelson (1984). The Shewhart Control Chart: Tests for Special Causes, Journal of
Quality Technology, 16, pp. 237-239.
[20] John Neter, William Wasserman, and Michael Kutner (1990). Applied Linear Statistical
Models: Regression, Analysis of Variance, and Experimental Designs, Third Edition, Richard
D. Irwin, Inc.
[21] C.H. Ng and K.E. Case (1989). Development and Evaluation of Control Charts Using
Exponentially Weighted Moving Averages, Journal of Quality Technology, 21, pp. 242-250.
[22] E.S. Page (1961). Cumulative Sum Charts, Technometrics, 3, pp. 1-9.
[23] T.P. Ryan (1989). Statistical Methods for Quality Improvement, John Wiley & Sons.
[24] H.M. Wadsworth, K.S. Stephens, and A.B. Godfrey (1986). Modern Methods for Quality
Control and Improvement, John Wiley & Sons.
[25] Western Electric (1956). Statistical Quality Control Handbook, Western Electric
Corporation, Indianapolis, Indiana.
[26] Donald J. Wheeler and David S. Chambers (1992). Understanding Statistical Process
Control, Second Edition, SPC Press, Inc.
[27] Donald J. Wheeler (1995). Advanced Topics in Statistical Process Control: The Power of
Shewhart Charts, SPC Press, Inc.
13 Attributes Control Charts
P Chart, 13-4
NP Chart, 13-6
C Chart, 13-8
U Chart, 13-12
[Figure: Structure of a control chart, labeling the process statistic and the center line]
A process is in control when most of the points fall within the bounds of the control limits and the
points do not display any nonrandom patterns. The tests for special causes will detect
nonrandom patterns. If you like, you can change the threshold values for triggering a test failure.
Special causes are causes arising from outside the system that can be corrected. Examples of
special causes include supplier, shift, or day of the week differences. Common cause variation, on
the other hand, is variation that is inherent or a natural part of the process. A process is in control
when only common causes, not special causes, affect the process output.
- C Chart, which charts the number of defects in each subgroup. Use C Chart when the
  subgroup size is constant.
- U Chart, which charts the number of defects per unit sampled in each subgroup. Use U
  Chart when the subgroup size varies.
For example, if you were counting the number of flaws on the inner surface of a television
screen, C Chart would chart the actual number of flaws while U Chart would chart the number
of flaws per square inch sampled.
See [2], [3], [6], [8], [9], and [10] for a discussion of these charts.
Data
Each entry in the worksheet column should contain the number of defectives or defects for a
subgroup. When subgroup sizes are unequal, you must also enter a corresponding column of
subgroup sizes.
Suppose you have collected daily data on the number of parts that have been inspected and the
number of parts that failed to pass inspection. On any given day both numbers may vary. You
enter the number that failed to pass inspection in one column. In this case, the total number
inspected varies from day to day, so you enter the subgroup size in another column:
Failed   Inspect
     8       968
    13      1216
    13      1004
    16      1101
    14      1076
    15       995
    13      1202
    10      1028
    24      1184
    12       992
P Chart (and U Chart) divide the number of defectives or defects by the subgroup size to get the
proportion of defectives, or defects per unit. NP Chart and C Chart plot raw data.
P Chart, NP Chart, and U Chart handle unequal-size subgroups. With P Chart and U Chart, the
control limits are a function of the subgroup size, while the center line is always constant. With
NP Chart, both the control limits and the center line are affected by differing subgroup sizes. In
general, the control limits are further from the center line for smaller subgroups than they are for
larger ones. You can force the control limits and center line to be constant, as described on page
13-16.
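The dependence of the control limits on subgroup size can be sketched for a P chart in a few lines of Python. This is only an illustration, not MINITAB's implementation: it assumes the usual binomial limits (overall proportion plus or minus k standard errors, k = 3), and the function name p_chart_limits and the clamping of the limits to the range 0 to 1 are choices made for this sketch. The data are the Failed and Inspect columns shown above.

```python
import math

# Daily counts from the worksheet illustration above.
failed = [8, 13, 13, 16, 14, 15, 13, 10, 24, 12]
inspected = [968, 1216, 1004, 1101, 1076, 995, 1202, 1028, 1184, 992]


def p_chart_limits(defectives, sizes, k=3.0):
    """Return (p_bar, rows); each row is (proportion, lcl, ucl) for one subgroup."""
    p_bar = sum(defectives) / sum(sizes)  # overall proportion = constant center line
    rows = []
    for d, n in zip(defectives, sizes):
        half_width = k * math.sqrt(p_bar * (1.0 - p_bar) / n)
        lcl = max(0.0, p_bar - half_width)  # assumption: clamp, a proportion cannot be < 0
        ucl = min(1.0, p_bar + half_width)  # assumption: clamp, a proportion cannot be > 1
        rows.append((d / n, lcl, ucl))
    return p_bar, rows


p_bar, rows = p_chart_limits(failed, inspected)
print(f"center line p = {p_bar:.5f}")
for day, (p, lcl, ucl) in enumerate(rows, start=1):
    print(f"day {day:2d}: p = {p:.5f}  LCL = {lcl:.5f}  UCL = {ucl:.5f}")
```

The center line is the same for every day, while the day with 968 parts inspected gets wider limits than the day with 1216.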
When an observation is missing, a gap exists in the chart where the summary statistic for that
subgroup would have been plotted.
P Chart
Use P chart to draw a chart of the proportion of defectives, that is, the number of defectives divided by
the subgroup size. P charts track the proportion defective and detect the presence of special
causes. Each entry in the worksheet column is the number of defectives for one subgroup,
assumed to have come from a binomial distribution with parameters n and p.
By default, the process proportion defective, p, is estimated by the overall sample proportion. This
is the value of the center line on the chart. The control limits are also calculated using this value.
Data
Arrange the data in your worksheet as illustrated in Data on page 13-3.
To make a P chart
1 Choose Stat > Control Charts > P.
2 In Variable, enter the column containing the number of defectives.
3 Do one of the following:
- When subgroups are of equal size, enter their size in Subgroup size.
- When subgroups are of unequal size, choose Subgroups in, and enter the column of
  subgroup sizes.
4 If you like, use any of the options described below, then click OK.
Options
P Chart dialog box
- enter an historical value for p that will be used for calculating the center line and control
  limits; see Use historical values of p on page 13-14. Use this option if you have a goal for p, or
  a known p from prior data.
- customize the chart annotation, frame, and region (placement of the chart within the Graph
  window); see Customize the data display, annotation, frame, and regions on page 13-17.
- do four tests for special causes; see Do tests for special causes on page 13-15. To adjust the
  sensitivity of the tests, see Defining Tests for Special Causes on page 12-5.
- omit certain subgroups when estimating p, for calculating the center line and control limits;
  see Omit subgroups from the estimate of μ or p on page 13-15.
- force the control limits to be constant when subgroups are of unequal size; see Force control
  limits and center line to be constant on page 13-16.
- choose the positions at which to draw the upper and lower control (sigma) limits in relation to
  the center line; see Customize the control (sigma) limits on page 13-16. The default line is
  3σ above and below the center line. You can draw more than one set of lines. For example,
  you can draw specification limits along with control limits on the chart.
- place bounds on the upper and/or lower control limits; see Customize the control (sigma)
  limits on page 13-16.
- choose the line type, color, and size for the control limits; see Customize the control (sigma)
  limits on page 13-16. The default line is solid red.
- add another row of tick labels below the default tick labels; see Add additional rows of tick
  labels on page 12-70. For example, you can place time stamp labels, or other descriptive
  labels, on your graph.
- choose the text font, color, and size for the axis and tick labels; see Customize the data
  display, annotation, frame, and regions on page 13-17. The default labels are black Arial.
- choose the symbol type, color, and size; see page 13-17. The default symbol is a black cross.
- choose the connection line type, color, and size; see page 13-17. The default line is solid
  black.
- estimate control limits and center line independently for different groups (draws a historical
  chart); see page 12-60.
Suppose you work in a plant that manufactures picture tubes for televisions. For each lot, you pull
some of the tubes and do a visual inspection. If a tube has scratches on the inside, it is rejected. If
a lot has too many rejects, you will do a 100% inspection on that lot. A P chart can define when
you need to inspect the whole lot.
1 Open the worksheet EXH_QC.MTW.
2 Choose Stat > Control Charts > P.
3 In Variable, enter Rejects.
4 Choose Subgroups in and enter Sampled in the text box. Click OK.
[Graph window output]
NP Chart
Use NP chart to draw a chart of the number of defectives. NP charts track the number of
defectives and detect the presence of special causes. Each entry in the worksheet column is the
number of defectives for one subgroup, assumed to have come from a binomial distribution with
parameters n and p.
By default, the process proportion defective, p, is estimated by the overall sample proportion. The
center line and control limits are then calculated using this value.
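As a rough illustration of how the center line and limits follow from the estimated p, the sketch below uses the usual binomial form: center n times p, plus or minus k times the square root of n p (1 - p). It is not MINITAB's code; the function name np_chart_limits, the clamping of the limits to the range 0 to n, and the small data set are all hypothetical.

```python
import math


def np_chart_limits(defectives, sizes, k=3.0):
    """Return (p_bar, rows); each row is (center, lcl, ucl) for one subgroup."""
    p_bar = sum(defectives) / sum(sizes)  # overall sample proportion
    rows = []
    for n in sizes:
        center = n * p_bar  # expected number of defectives in a subgroup of size n
        sd = math.sqrt(n * p_bar * (1.0 - p_bar))
        lcl = max(0.0, center - k * sd)       # assumption: a count cannot be negative
        ucl = min(float(n), center + k * sd)  # assumption: a count cannot exceed n
        rows.append((center, lcl, ucl))
    return p_bar, rows


# Hypothetical data: number of rejects and number sampled per lot.
rejects = [4, 6, 5, 5]
sampled = [100, 120, 100, 80]
p_bar, rows = np_chart_limits(rejects, sampled)
for n, (center, lcl, ucl) in zip(sampled, rows):
    print(f"n = {n:3d}: center = {center:.2f}  LCL = {lcl:.2f}  UCL = {ucl:.2f}")
```

With unequal subgroup sizes, both the center line and the limits change from subgroup to subgroup, which is the behavior described for NP Chart in the Data section.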
Data
Arrange the data in your worksheet as illustrated in Data on page 13-3.
To make an NP chart
1 Choose Stat > Control Charts > NP.
2 In Variable, enter the column containing the number of defectives.
3 Do one of the following:
- When subgroups are of equal size, enter their size in Subgroup size.
- When subgroups are of unequal size, choose Subgroups in and enter the column of
  subgroup sizes.
4 If you like, use any of the options described below, then click OK.
Options
NP Chart dialog box
- enter an historical value for p that will be used for calculating the center line and control
  limits; see page 13-14. Use this option if you have a goal for p, or a known p from prior data.
- customize the chart annotation, frame, and region (placement of the chart within the Graph
  window); see page 13-17.
- do four tests for special causes; see page 13-15. To adjust the sensitivity of the tests, see
  Defining Tests for Special Causes on page 12-5.
- omit certain subgroups when estimating p, for calculating the center line and control limits;
  see page 13-15.
- force the control limits and center line to be constant when subgroups are of unequal size;
  see page 13-16.
- choose the positions at which to draw the upper and lower control (sigma) limits in relation to
  the center line; see page 13-16. The default line is 3σ above and below the center line. You
  can draw more than one set of lines. For example, you can draw specification limits along with
  control limits on the chart.
- place bounds on the upper and/or lower control limits; see page 13-16.
- choose the line type, color, and size for the control limits; see page 13-16. The default line is
  solid red.
- add another row of tick labels below the default tick labels; see page 12-70. For example, you
  can place time stamp labels, or other descriptive labels, on your graph.
- choose the text font, color, and size for the axis and tick labels; see page 13-17. The default
  labels are black Arial.
- choose the symbol type, color, and size; see page 13-17. The default symbol is a black cross.
- choose the connection line type, color, and size; see page 13-17. The default line is solid
  black.
- estimate control limits and center line independently for different groups (draws a historical
  chart); see page 12-60.
C Chart
Use C chart to draw a chart of the number of defects. C charts track the number of defects and
detect the presence of special causes. Each entry in the specified column contains the number of
defects for one subgroup, assumed to have come from a Poisson distribution with parameter μ.
This is both the mean and the variance.
By default, the process average number of defects, μ, is estimated from the data. This value is the
center line on the C Chart. The control limits are also calculated using this value.
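Because a Poisson count has equal mean and variance, the limits follow directly from the estimated average. The sketch below assumes the average is estimated by the simple mean of the counts and that the limits sit k standard deviations from it (k = 3); the function name c_chart_limits, the floor of 0 on the lower limit, and the blemish counts are hypothetical, and none of this is MINITAB's own code.

```python
import math


def c_chart_limits(defects, k=3.0):
    """Return (center, lcl, ucl) for a C chart of equal-size subgroups."""
    c_bar = sum(defects) / len(defects)  # estimated Poisson mean = center line
    sd = math.sqrt(c_bar)                # Poisson: variance equals the mean
    lcl = max(0.0, c_bar - k * sd)       # assumption: a count cannot be negative
    ucl = c_bar + k * sd
    return c_bar, lcl, ucl


# Hypothetical blemish counts, one per equal-size piece of fabric.
blemishes = [2, 4, 3, 5, 1, 3]
center, lcl, ucl = c_chart_limits(blemishes)
print(f"center = {center:.3f}  LCL = {lcl:.3f}  UCL = {ucl:.3f}")
out_of_control = [i for i, c in enumerate(blemishes, start=1) if c < lcl or c > ucl]
print("subgroups outside the limits:", out_of_control)
```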
Data
Each entry in the worksheet column should contain the number of defects for one subgroup.
Each subgroup must be the same size.
To make a C chart
1 Choose Stat > Control Charts > C.
Options
C Chart dialog box
- enter an historical value for μ that will be used for calculating the center line and control
  limits; see page 13-14. Use this option if you have a goal for μ, or a known μ from prior data.
- customize the chart annotation, frame, and region (placement of the chart within the Graph
  window); see page 13-17.
- do four tests for special causes; see page 13-15. To adjust the sensitivity of the tests, see
  Defining Tests for Special Causes on page 12-5.
- omit certain subgroups when estimating μ, for calculating the center line and control
  limits; see page 13-15.
- choose the positions at which to draw the upper and lower control (sigma) limits in relation to
  the center line; see page 13-16. The default line is 3σ above and below the center line. You
  can draw more than one set of lines. For example, you can draw specification limits along with
  control limits on the chart.
- place bounds on the upper and/or lower control limits; see page 13-16.
- choose the line type, color, and size for the control limits; see page 13-16. The default line is
  solid red.
- add another row of tick labels below the default tick labels; see page 12-70. For example, you
  can place time stamp labels, or other descriptive labels, on your graph.
- choose the text font, color, and size for the axis and tick labels; see page 13-17. The default
  labels are black Arial.
- choose the symbol type, color, and size; see page 13-17. The default symbol is a black cross.
- choose the connection line type, color, and size; see page 13-17. The default line is solid
  black.
- estimate control limits and center line independently for different groups (draws a historical
  chart); see page 12-60.
Suppose you work for a linen manufacturer. Each 100 square yards of fabric is allowed to contain
a certain number of blemishes before it is rejected. For quality control, you want to track the
number of blemishes per 100 square yards over a period of several days, to see if your process is
behaving predictably. You would like the control chart to show control limits at 1σ and 2σ, as well as
3σ, above and below the center line.
1 Open the worksheet EXH_QC.MTW.
2 Choose Stat > Control Charts > C.
3 In Variable, enter Blemish.
4 Click S Limits. In Sigma limit positions, enter 1 2 3.
5 Check Place bound on lower sigma limits at and enter 0 in the box. Click OK in each
dialog box.
[Graph window output]
U Chart
Use U Chart to draw a chart of the number of defects per unit sampled, X / n. U charts track the
number of defects per unit sampled and detect the presence of special causes. Each entry in the
worksheet column is the number of defects in a sample (or subgroup), assumed to come from a
Poisson distribution with the parameter μ, which is both the mean and the variance.
By default, the process average number of defects, μ, is estimated from the data. This value is the
center line on a U Chart. The control limits are also calculated using this value.
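A minimal sketch of the U chart arithmetic, assuming the common textbook form: the center line is total defects divided by total units, and the limits for a subgroup of n units are the center line plus or minus k times the square root of (center / n). The function name u_chart_limits, the floor of 0 on the lower limit, and the data are hypothetical; this is not taken from MINITAB's implementation.

```python
import math


def u_chart_limits(defects, sizes, k=3.0):
    """Return (u_bar, rows); each row is (defects_per_unit, lcl, ucl)."""
    u_bar = sum(defects) / sum(sizes)  # average defects per unit = center line
    rows = []
    for x, n in zip(defects, sizes):
        half_width = k * math.sqrt(u_bar / n)
        lcl = max(0.0, u_bar - half_width)  # assumption: a rate cannot be negative
        rows.append((x / n, lcl, u_bar + half_width))
    return u_bar, rows


# Hypothetical data: defects found and number of units sampled per subgroup.
defects = [6, 9, 3]
units = [2, 3, 1]
u_bar, rows = u_chart_limits(defects, units)
for n, (u, lcl, ucl) in zip(units, rows):
    print(f"n = {n}: u = {u:.3f}  LCL = {lcl:.3f}  UCL = {ucl:.3f}")
```

The plotted value is X / n for each subgroup, and the limits tighten as n grows.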
For general information on attributes control charts, see Attributes Control Charts Overview on
page 13-2.
Data
Each entry in the worksheet column should contain the number of defects in a sample (or
subgroup). Subgroups need not be of equal size. When they are unequal, a second column
should contain, in the corresponding row, the subgroup size. See Data on page 13-3 for an
illustration.
To make a U chart
1 Choose Stat > Control Charts > U.
2 In Variable, enter the column containing the number of defects.
3 Do one of the following:
- When subgroups are of equal size, enter their size in Subgroup size.
- When subgroups are of unequal size, choose Subgroups in and enter the column of
  subgroup sizes.
4 If you like, use any of the options described below, then click OK.
Options
U Chart dialog box
- enter an historical value for μ that will be used for calculating the center line and control
  limits; see page 13-14. Use this option if you have a goal for μ, or a known μ from prior data.
- customize the chart annotation, frame, and region (placement of the chart within the Graph
  window); see page 13-17.
- do four tests for special causes; see page 13-15. To adjust the sensitivity of the tests, see
  Defining Tests for Special Causes on page 12-5.
- omit certain subgroups when estimating μ, for calculating the center line and control
  limits; see page 13-15.
- force the control limits to be constant when subgroups are of unequal size; see page 13-16.
- choose the positions at which to draw the upper and lower control (sigma) limits in relation to
  the center line; see page 13-16. The default line is 3σ above and below the center line. You
  can draw more than one set of lines. For example, you can draw specification limits along
  with control limits on the chart.
- place bounds on the upper and/or lower control limits; see page 13-16.
- choose the line type, color, and size for the control limits; see page 13-16. The default line is
  solid red.
- add another row of tick labels below the default tick labels; see page 12-70. For example, you
  can place time stamp labels, or other descriptive labels, on your graph.
- choose the text font, color, and size for the axis and tick labels; see page 13-17. The default
  labels are black Arial.
- choose the connection line type, color, and size; see page 13-17. The default line is solid
  black.
- choose the symbol type, color, and size; see page 13-17. The default symbol is a black cross.
In the attributes control charts main dialog box, enter a value in Historical p.
Test 2   Nine points in a row on same side of the center line
Test 3   Six points in a row all increasing or decreasing
Test 4   Fourteen points in a row alternating up and down
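The three run rules listed here can be expressed as simple scans over the plotted points. The sketch below is one possible reading of the rules, not MINITAB's algorithm: the function names are hypothetical, each function returns the 0-based index of the point that completes a failing run, and points lying exactly on the center line (or two equal consecutive points) are assumed to break a run.

```python
def _sign(x):
    return (x > 0) - (x < 0)


def flag_same_side(points, center, run=9):
    """Test 2: `run` points in a row on the same side of the center line."""
    flagged, streak, prev = [], 0, 0
    for i, x in enumerate(points):
        side = _sign(x - center)
        streak = streak + 1 if (side != 0 and side == prev) else (1 if side != 0 else 0)
        prev = side
        if streak >= run:
            flagged.append(i)
    return flagged


def flag_trend(points, run=6):
    """Test 3: `run` points in a row all increasing or all decreasing."""
    flagged, streak, prev = [], 0, 0
    for i in range(1, len(points)):
        step = _sign(points[i] - points[i - 1])
        streak = streak + 1 if (step != 0 and step == prev) else (1 if step != 0 else 0)
        prev = step
        if streak >= run - 1:  # run points means run - 1 consecutive steps
            flagged.append(i)
    return flagged


def flag_alternating(points, run=14):
    """Test 4: `run` points in a row alternating up and down."""
    flagged, streak, prev = [], 0, 0
    for i in range(1, len(points)):
        step = _sign(points[i] - points[i - 1])
        streak = streak + 1 if (step != 0 and step == -prev) else (1 if step != 0 else 0)
        prev = step
        if streak >= run - 1:
            flagged.append(i)
    return flagged


print(flag_same_side([1.0] * 10, center=0.0))  # ninth and tenth points fail
print(flag_trend([1, 2, 3, 4, 5, 6]))          # sixth point completes the trend
print(flag_alternating([0, 1] * 7))            # fourteenth point completes the pattern
```

The run lengths are parameters so that a different threshold for triggering a test failure can be tried without changing the logic.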
Note
It is usually recommended that you force the control limits and/or center line to be
constant only when the difference in size between the largest and smallest subgroup is
no more than 25%.
- draw control limits above and below the mean at any multiple of the standard deviation.
- set bounds on the upper and/or lower control limits. When the calculated upper control limit
  is greater than the upper bound, a horizontal line labeled UB will be drawn at the upper
  bound instead. Similarly, if the calculated lower control limit is less than the lower bound, a
  horizontal line labeled LB will be drawn at the lower bound instead.
- specify the line type, color, and size. By default, the line is solid red.
For an example, see Example of an Xbar chart with tests and customized control limits on page
12-13.
Tip
You can also modify the control limits using the graph editing features explained in
MINITAB User's Guide 1.
To specify where control limits are drawn: In Sigma limit positions, enter one or more
values, or a column of values. Each value you enter draws two horizontal lines, one above
and one below the mean. For example, entering a 2 draws control limits at two standard
deviations above and below the center line. Entering 1 2 3 gives three lines above and
three lines below the center line at 1σ, 2σ, and 3σ. When C1 contains the values 1 2 3,
entering C1 also draws three lines above and below the center line at 1σ, 2σ, and 3σ.
To set bounds on the control limits: Check Place bound on upper sigma limits at (and/or
Place bound on lower sigma limits at) and enter a value. Each value represents the
number of standard deviations above and below the mean.
To change line attributes: Under Line type or Line color, click a choice. In Line size,
enter a positive real number. Larger numbers correspond to wider lines. The number you
enter is in relation to the base unit of 1 pixel.
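How sigma limit positions and bounds interact can be sketched as follows. The function name sigma_limit_lines and its return format are hypothetical, and the sketch treats each bound as a value on the chart's own scale, as in the C chart example where the lower bound is 0; that interpretation is an assumption, and this is an illustration rather than MINITAB's code.

```python
import math


def sigma_limit_lines(center, sigma, positions, lower_bound=None, upper_bound=None):
    """Return a list of (label, value) lines, two per requested sigma position."""
    lines = []
    for k in positions:
        upper = center + k * sigma
        lower = center - k * sigma
        # A calculated limit beyond a bound is replaced by a line at the bound.
        if upper_bound is not None and upper > upper_bound:
            lines.append(("UB", upper_bound))
        else:
            lines.append((f"+{k} sigma", upper))
        if lower_bound is not None and lower < lower_bound:
            lines.append(("LB", lower_bound))
        else:
            lines.append((f"-{k} sigma", lower))
    return lines


# Hypothetical C chart with center line 3, so sigma = sqrt(3); lower bound at 0.
c_bar = 3.0
lines = sigma_limit_lines(c_bar, math.sqrt(c_bar), [1, 2, 3], lower_bound=0.0)
for label, value in lines:
    print(f"{label:>8}: {value:.3f}")
```

In this example the lines two and three standard deviations below the center line would be negative, so both are drawn at the bound and labeled LB.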
Core Graphs:        Core Graphs:    Core Graphs:                  Core Graphs:
Displaying Data     Annotating      Customizing the Frame         Controlling Regions
Symbols             Titles          Axes                          Figure
Connection lines    Footnotes       Ticks                         Data
                    Text            Grids
                    Lines           Reference lines
                    Polygons        Suppressing frame elements
                    Markers
To save an active graph window, use File > Save Window As. To view it later, use File > Open
Graph.
References
[1] I.W. Burr (1976). Statistical Quality Control Methods, Marcel Dekker, Inc.
[2] Ford Motor Company (1983). Continuing Process Control and Process Capability
Improvement, Ford Motor Company, Dearborn, Michigan.
[3] E.L. Grant and R.S. Leavenworth (1988). Statistical Quality Control, 6th Edition,
McGraw-Hill.
[4] K. Ishikawa (1967). Guide to Quality Control, Asian Productivity Organization.
[5] V.E. Kane (1989). Defect Prevention, Marcel Dekker, Inc.
[6] D.C. Montgomery (1985). Introduction to Statistical Quality Control, John Wiley & Sons.
[7] L.S. Nelson (1984). The Shewhart Control Chart - Tests for Special Causes, Journal of
Quality Technology, 16, pp. 237-239.
[8] T.P. Ryan (1989). Statistical Methods for Quality Improvement, John Wiley & Sons.
[9] H.M. Wadsworth, K.S. Stephens, and A.B. Godfrey (1986). Modern Methods for Quality
Control and Improvement, John Wiley & Sons.
[10] Western Electric (1956). Statistical Quality Control Handbook, Western Electric
Corporation, Indianapolis, Indiana.
14 Process Capability
Process Capability Overview
Note
It is essential to choose the correct distribution when conducting a capability analysis. For
example, MINITAB provides capability analyses based on both normal and Weibull probability
models. The commands that use a normal probability model provide a more complete set of
statistics, but your data must approximate the normal distribution for the statistics to be
appropriate for the data.
For example, Capability Analysis (Normal) estimates expected parts per million out-of-spec using
the normal probability model. Interpretation of these statistics rests on two assumptions: that the
data are from a stable process, and that they follow an approximately normal distribution.
Similarly, Capability Analysis (Weibull) calculates parts per million out-of-spec using a Weibull
distribution. In both cases, the validity of the statistics depends on the validity of the assumed
distribution.
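To make the idea of expected parts per million out-of-spec concrete, the sketch below evaluates the normal tail areas beyond each specification limit and scales them by one million. It is a simplified illustration: the function names are hypothetical, the mean and standard deviation are taken as given rather than estimated from data, and MINITAB's own output may differ in detail.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def expected_ppm(mean, sd, lsl=None, usl=None):
    """Expected parts per million below LSL, above USL, and in total."""
    below = normal_cdf(lsl, mean, sd) if lsl is not None else 0.0
    above = (1.0 - normal_cdf(usl, mean, sd)) if usl is not None else 0.0
    return {
        "ppm_below_lsl": below * 1e6,
        "ppm_above_usl": above * 1e6,
        "ppm_total": (below + above) * 1e6,
    }


# Hypothetical process: specification limits three standard deviations from the mean.
result = expected_ppm(mean=0.0, sd=1.0, lsl=-3.0, usl=3.0)
print({name: round(value, 1) for name, value in result.items()})
```

If the data are badly skewed, these tail areas can be far from the real out-of-spec rates, which is why the choice of distribution matters.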
If the data are badly skewed, probabilities based on a normal distribution could give rather poor
estimates of the actual out-of-spec probabilities. In that case, it is better to either transform the data
to make the normal distribution a more appropriate model, or choose a different probability
model for the data. With MINITAB, you can use the Box-Cox power transformation or a Weibull
probability model. Non-normal data on page 14-5 compares these two methods.
If you suspect that there may be a strong between-subgroup source of variation in your process,
use Capability Analysis (Between/Within) or Capability Sixpack (Between/Within). Subgroup
data may have, in addition to random error within subgroups, random variation between
subgroups. Understanding both sources of subgroup variation may provide you with a more
realistic estimate of the potential capability of a process. Capability Analysis (Between/Within)
and Capability Sixpack (Between/Within) compute both within and between standard
deviations and then pool them to calculate the total standard deviation.
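One common way to pool two independent sources of variation is to add their variances, so the total standard deviation is the square root of the sum of the squared within and between standard deviations. The sketch below assumes that form; the text does not spell out the pooling formula, so treat it as an illustration, and note that the function name is hypothetical.

```python
import math


def total_std_dev(within_sd, between_sd):
    """Pool within- and between-subgroup standard deviations (assumed root-sum-of-squares)."""
    return math.sqrt(within_sd ** 2 + between_sd ** 2)


# Hypothetical estimates of the two components.
print(total_std_dev(within_sd=3.0, between_sd=4.0))
```

When the between-subgroup component is zero, the total reduces to the within-subgroup standard deviation.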
MINITAB also provides capability analyses for attributes (count) data, based on the binomial and
Poisson probability models. For example, products may be compared against a standard and
classified as defective or not (use Capability Analysis (Binomial)). You can also classify products
based on the number of defects (use Capability Analysis (Poisson)).
Capability Sixpack (Normal) combines the following charts into a single display, along with
a subset of the capability statistics:
- an Xbar (or Individuals), R or S (or Moving Range), and run chart, which can be used to
  verify that the process is in a state of control
- a capability histogram and normal probability plot, which can be used to verify that the
  data are normally distributed
- a capability plot, which displays the process variability compared to the specifications
Capability Sixpack (Weibull) combines the following charts into a single display, along with a
subset of the capability statistics:
- an Xbar (or Individuals), R (or Moving Range), and run chart, which can be used to verify that
  the process is in a state of control
- a capability histogram and Weibull probability plot, which can be used to verify that the
  data come from a Weibull distribution
- a capability plot, which displays the process variability compared to the specifications
Although the Capability Sixpack commands give you fewer statistics than the Capability Analysis
commands, the array of charts can be used to verify that the process is in control and that the data
follow the chosen distribution.
Note
Capability statistics are simple to use, but they have distributional properties that are not
fully understood. In general, it is not good practice to rely on a single capability statistic
to characterize a process. See [2], [4], [5], [6], [9], [10], and [11] for a discussion.
Capability Analysis (Binomial) is appropriate when your data consists of the number of
defectives out of the total number of parts sampled. The report draws a P chart, which helps
you verify that the process is in a state of control. The report also includes a chart of
cumulative %defectives, histogram of %defectives, and defective rate plot.
Capability Analysis (Poisson) is appropriate when your data take the form of the number of
defects per item. The report draws a U chart, which helps you to verify that the process is in a
state of control. The report also includes a chart of the cumulative mean DPU (defects per
unit), histogram of DPU, and a defect rate plot.
Capability statistics
Process capability statistics are numerical measures of process capability; that is, they measure
how capable a process is of meeting specifications. These statistics are simple and unitless, so you
can use them to compare the capability of different processes. Capability statistics are basically a
ratio between the allowable process spread (the width of the specification limits) and the actual
process spread (6σ). Some of the statistics take into account the process mean or target.
For more information, see Capability statistics on page 14-8, Capability statistics on page 14-20,
and Capability statistics on page 14-25.
Many practitioners consider 1.33 to be a minimum acceptable value for the process capability
statistics, and few believe that a value less than 1 is acceptable. A value less than 1 indicates that
your process variation is wider than the specification tolerance.
Here are some guidelines for how the statistics are used:
This statistic    is used when    Definition
Cp or Pp                          (USL - LSL) / 6σ
Cpk or Ppk                        minimum of CPU and CPL (or of PPU and PPL)
CPU or PPU                        (USL - μ) / 3σ
CPL or PPL                        (μ - LSL) / 3σ
Note
If the process target is not the midpoint between specifications, you may prefer to use
Cpm in place of Cpk, since Cpm measures process mean relative to the target rather than
the midpoint between specifications. See [9] for a discussion. You can calculate Cpm by
entering a target in the Options subdialog box.
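The ratios described in this section can be computed directly once a mean and standard deviation are available. The sketch below is an illustration under stated assumptions: the function name capability is hypothetical, the process mean and standard deviation are supplied rather than estimated, and Cpk is taken as the smaller of CPU and CPL, which is the usual definition. Cpm is left out because its formula is not given here.

```python
def capability(mean, sd, lsl=None, usl=None):
    """Return Cp, CPU, CPL and Cpk; a statistic is None when its limit is missing."""
    cpu = (usl - mean) / (3.0 * sd) if usl is not None else None
    cpl = (mean - lsl) / (3.0 * sd) if lsl is not None else None
    both = lsl is not None and usl is not None
    cp = (usl - lsl) / (6.0 * sd) if both else None
    one_sided = [value for value in (cpu, cpl) if value is not None]
    cpk = min(one_sided) if one_sided else None
    return {"Cp": cp, "CPU": cpu, "CPL": cpl, "Cpk": cpk}


# Hypothetical process with specifications 6 to 14 and standard deviation 1.
centered = capability(mean=10.0, sd=1.0, lsl=6.0, usl=14.0)
shifted = capability(mean=11.0, sd=1.0, lsl=6.0, usl=14.0)
print(centered)
print(shifted)
```

For the centered process every statistic is about 1.33, the commonly quoted minimum. Shifting the mean toward the upper limit leaves Cp unchanged but drops Cpk to 1.0, which shows why Cp alone can overstate capability when the process is off-center.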
Non-normal data
When you have non-normal data, you can either transform the data in such a way that the
normal distribution is a more appropriate model, or choose a Weibull probability model for the
data.
To transform the data, use Capability Analysis (Normal), Capability Sixpack (Normal),
Capability Analysis (Between/Within), or Capability Sixpack (Between/Within) with the
optional Box-Cox power transformation. See Box-Cox Transformation for Non-Normal Data
on page 12-6.
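A power transformation of this kind can be sketched as below. The sketch assumes the simple form in which each value is raised to the power lambda, with the natural log used when lambda is 0; the function name box_cox is hypothetical, and the choice of lambda (normally estimated from the data) is left to the caller. The data must be positive.

```python
import math


def box_cox(values, lam):
    """Apply a simple power transformation (assumed form: x**lam, or ln x when lam is 0)."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox transformation requires positive data")
    if lam == 0:
        return [math.log(v) for v in values]
    return [v ** lam for v in values]


# Hypothetical right-skewed measurements; lam = 0.5 is the square-root transformation.
skewed = [1.0, 4.0, 9.0, 100.0]
print(box_cox(skewed, 0.5))
print(box_cox(skewed, 0))
```

Any specification limits would need the same transformation before being compared with the transformed data.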
To use a Weibull probability model, use Capability Analysis (Weibull) and Capability Sixpack
(Weibull).
Which method is better? The only way to answer that question is to see which model fits the data
better. If both models fit the data about the same, it is probably better to choose the normal
model, since it provides estimates of both overall and within process capability.
Data
You can use individual observations or data in subgroups. Individual observations should be
structured in one column. Subgroup data can be structured in one column, or in rows across
several columns. When you have subgroups of unequal size, enter the data in a single column,
then set up a second column of subgroup indicators. For examples, see Data on page 12-3.
If you have data in subgroups, you must have two or more observations in at least one subgroup
in order to estimate the process standard deviation.
To use the Box-Cox transformation, data must be positive.
If an observation is missing, MINITAB omits it from the calculations.
To perform a capability analysis (normal probability model)
1 Choose Stat > Quality Tools > Capability Analysis (Normal).
2 Do one of the following:
- When subgroups or individual observations are in one column, enter the data column in
  Single column. In Subgroup size, enter a subgroup size or column of subgroup
  indicators. For individual observations, enter a subgroup size of 1.
- When subgroups are in rows, choose Subgroups across rows of, and enter the columns
  containing the subgroup rows in the box.
3 In Lower spec or Upper spec, enter a lower and/or upper specification limit, respectively. You
Options
Capability Analysis (Normal) dialog box
- define the upper and lower specification limits as boundaries, meaning measurements
cannot fall outside those limits. As a result, the expected % out of spec is set to 0 for
boundaries. If you choose boundaries, then USL (upper specification limit) and LSL
(lower specification limit) will be replaced by UB (upper boundary) and LB (lower boundary)
on the analysis.
Note
When you define the upper and lower specification limits as boundaries, MINITAB still
calculates the observed % out-of-spec. If the observed % out-of-spec comes up nonzero,
this is an obvious indicator of incorrect data.
- enter historical values for μ (the process mean) and σ (the process potential standard
deviation) if you have known process parameters or estimates from past data. If you do not
specify a value for μ or σ, MINITAB estimates them from the data.
- estimate the process standard deviation (σ) various ways; see Estimating the process variation
on page 14-9.
- use the Box-Cox power transformation when you have very skewed data; see Use the Box-Cox
power transformation for non-normal data on page 12-67.
- enter a process target, or nominal specification. MINITAB calculates Cpm in addition to the
standard capability statistics.
- calculate the capability statistics using an interval other than six standard deviations wide
(three on either side of the process mean) by entering a sigma tolerance. For example,
entering 12 says to use an interval 12 standard deviations wide, six on either side of the process
mean.
- perform only the within-subgroup analysis or only the overall analysis. The default is to
perform both.
- display benchmark Z scores instead of capability statistics. The default is to display capability
statistics.
- display the capability analysis graph or not. The default is to display the graph.
- store your choice of statistics in worksheet columns. The statistics available for storage depend
on the options you have chosen in the Capability Analysis (Normal) dialog box and subdialog
boxes.
Capability statistics
When you use the normal distribution model for the capability analysis, MINITAB calculates the
capability statistics associated with within variation (Cp, Cpk, CPU, and CPL) and with overall
variation (Pp, Ppk, PPU, and PPL). To interpret these statistics, see Capability statistics on page 14-4.
Cp, Cpk, CPU, and CPL represent the potential capability of your process: what your process
would be capable of if the process did not have shifts and drifts in the subgroup means. To
calculate these, MINITAB estimates σ_within considering the variation within subgroups, but not
the shift and drift between subgroups.
Note
When your subgroup size is one, the within variation estimate is based on a moving
range, so that adjacent observations are effectively treated as subgroups.
Pp, Ppk, PPU, and PPL represent the overall capability of the process. When calculating these
statistics, MINITAB estimates σ_overall considering the variation for the whole study.
[Figure: Each small curve represents within (or potential) variation, or variation for one
subgroup (one moment in time). The large curve represents overall variation, the variation
for the whole study.]
Overall capability depicts how the process is actually performing relative to the specification
limits. Within capability depicts how the process could perform relative to the specification
limits, if shifts and drifts could be eliminated. A substantial difference between overall and
within variation may indicate that the process is out of control, or it may indicate sources of
variation not estimated by within capability.
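The relationship between the indices and the two standard deviation estimates can be sketched in Python. This is a minimal illustration using the usual textbook formulas (specification width divided by process spread), not MINITAB's own code. The function name, dictionary keys, and sample numbers are hypothetical.

```python
def capability_indices(mean, sigma, lsl=None, usl=None, tolerance=6.0):
    """Capability indices for one standard deviation estimate.

    Pass sigma_within to get Cp, CPU, CPL, Cpk; pass sigma_overall to get
    Pp, PPU, PPL, Ppk. `tolerance` is the sigma tolerance (6 by default).
    """
    half = tolerance / 2.0
    upper = (usl - mean) / (half * sigma) if usl is not None else None
    lower = (mean - lsl) / (half * sigma) if lsl is not None else None
    both = lsl is not None and usl is not None
    potential = (usl - lsl) / (tolerance * sigma) if both else None
    one_sided = [v for v in (upper, lower) if v is not None]
    worst = min(one_sided) if one_sided else None
    return {"potential": potential, "upper": upper, "lower": lower, "worst": worst}


# Hypothetical process: mean 599.5, specification limits 598 and 602
within = capability_indices(599.5, sigma=0.5, lsl=598, usl=602)     # Cp, CPU, CPL, Cpk
overall = capability_indices(599.5, sigma=0.625, lsl=598, usl=602)  # Pp, PPU, PPL, Ppk
print(within)
print(overall)
```

Because σ_overall is at least as large as σ_within in this hypothetical case, the overall indices come out smaller than the within indices, which mirrors the gap between actual and potential performance described above.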
Estimate.
For subgroup sizes greater than one, to base the estimate on:
- the average of the subgroup ranges, choose Rbar.
- the average of the subgroup standard deviations, choose Sbar. To not use an unbiasing
constant in the estimation, uncheck Use unbiasing constants.
- the pooled standard deviation (the default), choose Pooled standard deviation. To not
use an unbiasing constant in the estimation, uncheck Use unbiasing constants.
For individual observations (subgroup size is one), to base the estimate on:
- the average of the moving range (the default), choose Average moving range. To
change the length of the moving range from 2, check Use moving range of length and
enter a number in the box.
- the median of the moving range, choose Median moving range. To change the length
of the moving range from 2, check Use moving range of length and enter a number in
the box.
- the square root of MSSD (mean of the squared successive differences), choose Square
root of MSSD. To not use an unbiasing constant in the estimation, uncheck Use
unbiasing constants.
3 Click OK.
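Three of the estimators listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not MINITAB's implementation: it uses the standard d2 control-chart constants, assumes equal subgroup sizes for Rbar, and omits the optional unbiasing constant for the pooled standard deviation. Function names and data are hypothetical.

```python
import statistics

# d2 constants for ranges of 2 to 5 observations (standard control chart tables)
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}


def sigma_rbar(subgroups):
    """Rbar method: average subgroup range divided by d2 (equal sizes only)."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups):
        raise ValueError("this sketch assumes equal subgroup sizes")
    rbar = statistics.mean([max(g) - min(g) for g in subgroups])
    return rbar / D2[n]


def sigma_pooled(subgroups):
    """Pooled standard deviation, without an unbiasing constant."""
    ss, df = 0.0, 0
    for g in subgroups:
        m = statistics.mean(g)
        ss += sum((x - m) ** 2 for x in g)
        df += len(g) - 1
    return (ss / df) ** 0.5


def sigma_average_moving_range(values, length=2):
    """Average moving range of the given length, divided by d2(length)."""
    ranges = [max(values[i:i + length]) - min(values[i:i + length])
              for i in range(len(values) - length + 1)]
    return statistics.mean(ranges) / D2[length]


subgroups = [[1, 2, 3], [2, 4, 6], [1, 1, 4]]   # hypothetical subgroup data
individuals = [10, 12, 11, 15]                  # hypothetical individual observations
print(sigma_rbar(subgroups))
print(sigma_pooled(subgroups))
print(sigma_average_moving_range(individuals))
```

The different estimators generally give different values for the same data, which is why the choice of method can change the reported within capability.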
Example of a capability analysis (normal probability model)
Suppose you work at an automobile manufacturer in a department that assembles engines. One
of the parts, a camshaft, must be 600 mm ± 2 mm long to meet engineering specifications. There
has been a chronic problem with camshaft lengths being out of specification, a problem which
has caused poor-fitting assemblies down the production line and high scrap and rework rates.
Upon examination of the inventory records, you discovered that there were two suppliers for the
camshafts. An Xbar and R chart showed you that Supplier 2's camshaft production was out of
control, so you decided to stop accepting production runs from them until they get their
production under control.
After dropping Supplier 2, the number of poor quality assemblies has dropped significantly, but
the problems have not completely disappeared. You decide to run a capability study to see
whether Supplier 1 alone is capable of meeting your engineering specifications.
1 Open the worksheet CAMSHAFT.MTW.
2 Choose Stat > Quality Tools > Capability Analysis (Normal).
3 In Single column, enter Supp1. In Subgroup size, enter 5.
4 In Lower spec, enter 598. In Upper spec, enter 602.
5 Click Options. In Target (adds Cpm to table), enter 600. Click OK in each dialog box.
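Entering a target adds Cpm to the output. As a rough sketch of what such an index measures, the Python below uses one common textbook definition, in which the spread combines the sample variance with the squared distance of the mean from the target. Whether MINITAB uses exactly this form is an assumption here, and the data values are hypothetical (they are not taken from CAMSHAFT.MTW).

```python
import math
import statistics


def cpm(values, lsl, usl, target, tolerance=6.0):
    """Cpm as spec width over tolerance * sqrt(s^2 + (mean - target)^2)."""
    mean = statistics.mean(values)
    variance = statistics.variance(values)  # sample variance (n - 1 divisor)
    spread = math.sqrt(variance + (mean - target) ** 2)
    return (usl - lsl) / (tolerance * spread)


lengths = [599, 600, 601, 600, 601]  # hypothetical lengths in mm
print(round(cpm(lengths, lsl=598, usl=602, target=600), 3))
```

Unlike Cp, this index drops when the process mean drifts away from the target, even if the variation stays the same.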
[Graph window output]
Example of a capability analysis with a Box-Cox transformation
Suppose you work for a company that manufactures floor tiles and are concerned about warping
in the tiles. To ensure production quality, you measure warping in ten tiles each working day for
ten days.
A histogram shows that your data do not follow a normal distribution, so you decide to use the
Box-Cox power transformation to try to make the data more normal.
First you need to find the optimal lambda (λ) value for the transformation. Then you will do the
capability analysis, performing the Box-Cox transformation with that value.
1 Open the worksheet TILES.MTW.
2 Choose Stat > Control Charts > Box-Cox Transformation.
3 In Single column, enter Warping. In Subgroup size, type 10. Click OK.
[Graph window output]
The best estimate of lambda is 0.449, but practically speaking, you may want a lambda value that
corresponds to an intuitive transformation, such as the square root (a lambda of 0.5). In our
example, 0.5 is a reasonable choice because it falls within the 95% confidence interval, as marked
by vertical lines on the graph. So you will run the Capability Analysis with a Box-Cox
transformation, using λ = 0.5.
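A power transformation with λ = 0.5 is simply a square root. The Python sketch below shows the transformation W = Y**λ applied to a few values. The data are hypothetical (not from TILES.MTW), the upper specification of 8 is used only for illustration, and transforming the specification limit with the same λ is an assumption about how the transformed analysis stays comparable.

```python
import math


def box_cox(y, lam):
    """Power transformation W = Y**lambda; natural log when lambda is 0."""
    if y <= 0:
        raise ValueError("the Box-Cox transformation requires positive data")
    return math.log(y) if lam == 0 else y ** lam


warping = [1.6, 4.0, 0.25, 9.0]  # hypothetical measurements
transformed = [box_cox(y, 0.5) for y in warping]
upper_spec_transformed = box_cox(8, 0.5)  # limit expressed on the transformed scale
print(transformed)
print(upper_spec_transformed)
```

The positivity check reflects the requirement stated earlier that data must be positive to use the Box-Cox transformation.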
1 Choose Stat > Quality Tools > Capability Analysis (Normal).
(pooled) to compute the total standard deviation. The total standard deviation will be used to
calculate the capability statistics, such as Cp and Cpk.
The report includes a capability histogram overlaid with two normal curves, and a complete table
of overall and total (between and within) capability statistics. The normal curves are generated
using the process mean and overall standard deviation and the process mean and total standard
deviation.
The report also includes statistics of the process data, such as the process mean, target, if you
enter one, total (between and within) and overall standard deviation, and observed and expected
performance.
Data
You can use data in subgroups, with two or more observations. Subgroup data can be structured
in one column, or in rows across several columns.
To use the Box-Cox transformation, data must be positive.
Ideally, all subgroups should be the same size. If your subgroups are not all the same size, due to
missing data or unequal subgroup sizes, only subgroups of the majority size are used for
estimating the between-subgroup variation.
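The majority-size rule can be pictured with a short Python sketch. The function name and data are hypothetical, and breaking ties by the first size encountered is a choice made for this sketch, not documented MINITAB behavior.

```python
from collections import Counter


def majority_size_subgroups(subgroups):
    """Keep only the subgroups whose size is the most common one.

    Missing observations (None) are dropped before sizes are compared.
    """
    cleaned = [[x for x in g if x is not None] for g in subgroups]
    sizes = Counter(len(g) for g in cleaned)
    majority, _count = sizes.most_common(1)[0]
    return [g for g in cleaned if len(g) == majority]


# Hypothetical rolls: one subgroup lost an observation
rolls = [[50.1, 49.8, 50.3], [49.9, None, 50.2], [50.0, 50.4, 49.7], [49.6, 50.1, 50.0]]
used = majority_size_subgroups(rolls)
print(len(used), "of", len(rolls), "subgroups used")
```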
To perform a capability analysis (between/within)
1 Choose Stat > Quality Tools > Capability Analysis (Between/Within).
2 Do one of the following:
- When subgroups are in one column, enter the data column in Single column. In
Subgroup size, enter a subgroup size or column of subgroup indicators.
- When subgroups are in rows, choose Subgroups across rows of, and enter the columns
containing the rows in the box.
3 In Lower spec or Upper spec, enter a lower and/or upper specification limit, respectively. You
must enter at least one of them.
4 If you like, use any of the options listed below, then click OK.
Options
Capability Analysis (Between/Within) dialog box
- define the upper and lower specification limits as boundaries, meaning measurements
cannot fall outside those limits. As a result, the expected % out of spec is set to 0 for a
boundary. If you choose a boundary, MINITAB does not calculate capability statistics for that
side.
Note
When you define the upper and lower specification limits as boundaries, MINITAB still
calculates the observed % out-of-spec. If the observed % out-of-spec comes up nonzero,
this is an obvious indicator of incorrect data.
- enter historical values for μ (the process mean) and σ within subgroups and/or σ between
subgroups if you have known process parameters or estimates from past data. If you do not
specify a value for μ or σ, MINITAB estimates them from the data.
- estimate the within and between standard deviations (σ) various ways; see Estimating the
process variation on page 14-16.
- use the Box-Cox power transformation when you have very skewed data; see Use the
Box-Cox power transformation for non-normal data on page 12-67.
- enter a process target, or nominal specifications. MINITAB calculates Cpm in addition to the
standard capability statistics.
- calculate the capability statistics using an interval other than six standard deviations wide
(three on either side of the process mean) by entering a sigma tolerance. For example,
entering 12 says to use an interval 12 standard deviations wide, six on either side of the process
mean.
- perform the between/within subgroup analysis only, or the overall analysis only. The default is
to perform both.
- display the capability analysis graph or not. The default is to display the graph.
- store your choice of statistics in worksheet columns. The statistics available for storage depend
on the options you have chosen in the Capability Analysis (Between/Within) dialog box and
subdialog boxes.
Capability statistics
When you use Capability Analysis (Between/Within), MINITAB calculates both overall capability
statistics (Pp, Ppk, PPU, and PPL) and between/within capability statistics (Cp, Cpk, CPU, and
CPL). To interpret these statistics, see Capability statistics on page 14-4.
Cp, Cpk, CPU, and CPL represent the potential capability of your process: what your process
would be capable of if the process did not have shifts and drifts in the subgroup means. To
calculate these, MINITAB estimates σ_within and σ_between and pools them to estimate σ_total. Then,
σ_total is used to calculate the capability statistics.
Pp, Ppk, PPU, and PPL represent the overall capability of the process. When calculating these
statistics, MINITAB estimates σ_overall considering the variation for the whole study.
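Pooling the within and between estimates can be sketched as adding variances and taking the square root. The Python below is a minimal illustration; treating the pooling as a root sum of squares is an assumption of this sketch, and the input values are hypothetical.

```python
import math


def sigma_total(sigma_within, sigma_between):
    """Combine within and between standard deviations by adding their variances."""
    return math.sqrt(sigma_within ** 2 + sigma_between ** 2)


print(sigma_total(3.0, 4.0))
```

Because variances add, the total standard deviation is never smaller than either component, so between/within capability indices are never larger than indices based on the within estimate alone.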
2 To change the method for estimating σ_within, choose one of the following:
- the average of the subgroup standard deviations, choose Sbar. To not use an unbiasing
constant in the estimation, uncheck Use unbiasing constants.
- the pooled standard deviation (the default), choose Pooled standard deviation. To not
use an unbiasing constant in the estimation, uncheck Use unbiasing constants.
3 To change the method for estimating σ_between, choose one of the following:
- the average of the moving range (the default), choose Average moving range. To change
the length of the moving range from 2, check Use moving range of length and enter a
number in the box.
- the median of the moving range, choose Median moving range. To change the length of
the moving range from 2, check Use moving range of length and enter a number in the
box.
- the square root of MSSD (mean of the squared successive differences), choose Square
root of MSSD. To not use an unbiasing constant in the estimation, uncheck Use
unbiasing constants.
4 Click OK.
Example of a capability analysis (between/within)
Suppose you are interested in the capability of a process that coats rolls of paper with a thin film.
You are concerned that the paper is being coated with the correct thickness of film and that the
coating is applied evenly throughout the roll. You take three samples from 25 consecutive rolls
and measure coating thickness. The thickness must be 50 ± 3 to meet engineering specifications.
1 Open the worksheet BWCAPA.MTW.
2 Choose Stat > Quality Tools > Capability Analysis (Between/Within).
3 In Single column, enter Coating. In Subgroup size, enter Roll.
[Graph window output]
Interpreting results
You can see that the process mean (49.8829) falls close to the target of 50. The Cpk index
indicates whether the process will produce units within the tolerance limits. The Cpk index is
only 1.21, indicating that the process is fairly capable, but could be improved.
The PPM Total for Expected Between/Within Performance is 193.94. This means that
approximately 194 out of a million coatings will not meet the specification limits. This analysis
tells you that your process is fairly capable.
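An expected parts-per-million figure of this kind comes from the tail areas of the fitted normal curve beyond the specification limits. The Python sketch below shows the calculation in general form using only the standard library. The function names are hypothetical, and the example call uses a standard normal process with limits at ±3 rather than the coating data, because the standard deviation from the output is not reproduced in the text.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative probability, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def expected_ppm(mean, sigma, lsl, usl):
    """Expected parts per million outside [lsl, usl] for a normal process."""
    below = normal_cdf((lsl - mean) / sigma)
    above = normal_cdf(-(usl - mean) / sigma)  # upper tail by symmetry
    return 1e6 * (below + above)


# Sanity check: limits three standard deviations either side of the mean
print(round(expected_ppm(mean=0.0, sigma=1.0, lsl=-3.0, usl=3.0), 1))
```

For the coating example you would pass the process mean (49.8829), the limits 47 and 53, and the total standard deviation reported in the output.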
page 14-6 with the optional Box-Cox power transformation. For a comparison of the methods
used for non-normal data, see Non-normal data on page 14-5.
Data
You can enter your data in a single column or in multiple columns if you have arranged
subgroups across rows. Because the Weibull capability analysis does not calculate within
capability statistics, MINITAB does not use subgroups in calculations. For examples, see Data
on page 12-3.
Data must be positive.
If an observation is missing, MINITAB omits it from the calculations.
To perform a capability analysis (Weibull probability model)
1 Choose Stat > Quality Tools > Capability Analysis (Weibull).
2 Do one of the following:
- When subgroups or individual observations are in one column, choose Single column
and enter the column containing the data.
- When subgroups are in rows, choose Subgroups across rows of, and enter the columns
containing the rows in the box.
3 In Lower spec or Upper spec, enter a lower and/or upper specification limit, respectively. You
must enter at least one of them. These limits must be positive numbers, though the lower spec
can be 0.
4 If you like, use any of the options listed below, then click OK.
Options
Capability Analysis (Weibull) dialog box
- define the upper and lower specification limits as boundaries, meaning that it is impossible
for a measurement to fall outside that limit. As a result, when calculating the expected %
out-of-spec, MINITAB sets this value to 0 for a boundary.
Note
When you define the upper or lower specification limits as boundaries, MINITAB still
calculates the observed % out-of-spec. If the observed % out-of-spec comes up nonzero,
this is an obvious indicator of incorrect data.
- enter historical values for the Weibull shape and scale parameters; see Weibull family of
distributions on page 14-20.
- enter a process target or nominal specification. MINITAB calculates Cpm in addition to the
standard capability statistics.
- calculate the capability statistics using an interval other than six standard deviations wide
(three on either side of the process mean) by entering a sigma tolerance. For example,
entering 12 says to use an interval 12 standard deviations wide, six on either side of the process
mean.
Capability statistics
When you use the Weibull model for the capability analysis, MINITAB only calculates the overall
capability statistics, Pp, Ppk, PPU, and PPL. The calculations are based on maximum likelihood
estimates of the shape and scale parameters for the Weibull distribution, rather than mean and
variance estimates as in the normal case.
To interpret these statistics, see Capability statistics on page 14-4.
Pp, Ppk, PPU, and PPL represent the overall capability of the process. When calculating these
statistics, MINITAB estimates overall considering the variation for the whole study.
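One way capability indices can be built from shape and scale parameters instead of a mean and variance is to replace the six-sigma spread with the distance between extreme percentiles of the fitted distribution. The Python sketch below uses the 0.135th and 99.865th percentiles, which bracket the same probability as ±3 standard deviations under a normal model. Treat this percentile form as an assumption made for illustration, since the text does not state MINITAB's formula. The parameter values are hypothetical.

```python
import math


def weibull_quantile(p, shape, scale):
    """Inverse CDF of a two-parameter Weibull distribution."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


def weibull_pp(lsl, usl, shape, scale):
    """Spec width over the 0.135th to 99.865th percentile spread."""
    spread = (weibull_quantile(0.99865, shape, scale)
              - weibull_quantile(0.00135, shape, scale))
    return (usl - lsl) / spread


def weibull_ppu(usl, shape, scale):
    """Upper one-sided index, measured from the median."""
    median = weibull_quantile(0.5, shape, scale)
    return (usl - median) / (weibull_quantile(0.99865, shape, scale) - median)


shape, scale = 1.7, 3.3  # hypothetical parameter estimates
print(weibull_pp(lsl=0.0, usl=8.0, shape=shape, scale=scale))
print(weibull_ppu(usl=8.0, shape=shape, scale=scale))
```

The median takes the place of the mean as the center because a skewed distribution has unequal spread on its two sides.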
If you like, you can enter historical values for the shape and scale. If you do not enter historical
values, MINITAB obtains maximum likelihood estimates from the data.
Caution
Because the shape and scale parameters define the properties of the Weibull distribution,
they also define the probabilities used to calculate the capability statistics. If you enter
known values for the parameters, keep in mind that small changes in the parameters,
especially the shape, can have large effects on the associated probabilities.
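The sensitivity described in the caution is easy to see numerically. The Python sketch below computes the Weibull probability of exceeding an upper limit for a few shape values while holding the scale fixed. All numbers are hypothetical.

```python
import math


def weibull_fraction_above(usl, shape, scale):
    """P(X > usl) for a two-parameter Weibull distribution."""
    return math.exp(-((usl / scale) ** shape))


for shape in (1.0, 1.25, 1.5):
    p = weibull_fraction_above(usl=3.0, shape=shape, scale=1.0)
    print(f"shape={shape:<5} expected % above limit = {100 * p:.3f}")
```

In this hypothetical case, moving the shape from 1.0 to 1.5 cuts the expected fraction above the limit by roughly a factor of nine.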
Options.
1 (Exponential)
2 (Rayleigh)
3 In Scale parameter, choose Historical value, and enter a positive value for the scale. Click
OK.
Example of a capability analysis (Weibull probability model)
Suppose you work for a company that manufactures floor tiles, and are concerned about warping
in the tiles. To ensure production quality, you measured warping in ten tiles each working day
for ten days.
A histogram of the data showed that it did not come from a normal distribution; see Example of
a capability analysis with a Box-Cox transformation on page 14-12. So you decide to perform a
capability analysis based on a Weibull probability model.
1 Open the worksheet TILES.MTW.
2 Choose Stat > Quality Tools > Capability Analysis (Weibull).
3 In Single column, enter Warping.
[Graph window output]
within and overall capability statistics: Cp, Cpk, Cpm (if you enter a target), and σ_within; Pp,
Ppk, and σ_overall
The Xbar, R, and run charts can be used to verify that the process is in a state of control. The
histogram and normal probability plot can be used to verify that the data are normally
distributed. Lastly, the capability plot gives a graphical view of the process variability compared
to the specifications. Combined with the capability statistics, this information can help you
assess whether your process is in control and the product meets specifications.
A model that assumes the data are from a normal distribution suits most process data. If your data
are either very skewed or the within-subgroup variation is not constant (for example, when this
variation is proportional to the mean), see the discussion under Non-normal data on page 14-5.
Data
You can enter individual observations or data in subgroups. Individual observations should be
structured in one column. Subgroup data can be structured in one column, or in rows across
several columns. When you have subgroups of unequal size, enter the subgroups in a single
column, then set up a second column of subgroup indicators. For examples, see Data on page
12-3.
To use the Box-Cox transformation, data must be positive.
If you have data in subgroups, you must have two or more observations in at least one subgroup
in order to estimate the process standard deviation. Subgroups need not be the same size.
If a single observation in the subgroup is missing, MINITAB omits it from the calculations of the
statistics for that subgroup. Such an omission may cause the control chart limits and the center
line to have different values for that subgroup. If an entire subgroup is missing, there is a gap in
the chart where the statistic for that subgroup would have been plotted.
- When subgroups or individual observations are in one column, enter the data column in
Single column. In Subgroup size, enter a subgroup size or column of subgroup indicators.
For individual observations, enter a subgroup size of 1.
- When subgroups are in rows, choose Subgroups across rows of, and enter the columns
containing the rows in the box.
3 In Lower spec or Upper spec, enter a lower and/or upper specification limit, respectively. You
must enter at least one of them.
Options
Capability Sixpack (Normal) dialog box
- enter your own value for μ (the process mean) and σ (the process potential standard deviation)
if you have known process parameters or estimates from past data. If you do not specify a value
for μ or σ, MINITAB estimates them from the data.
- do your choice of eight tests for special causes; see Do tests for special causes on page 12-63.
To adjust the sensitivity of the tests, use Defining Tests for Special Causes on page 12-5.
Note
- When you estimate σ using the average of subgroup ranges (Rbar), MINITAB displays
an R chart.
- When you estimate σ using the average of subgroup standard deviations (Sbar),
MINITAB displays an S chart.
- When you estimate σ using the pooled standard deviation and your subgroup size is
less than ten, MINITAB displays an R chart.
- When you estimate σ using the pooled standard deviation and your subgroup size is
ten or greater, MINITAB displays an S chart.
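The four rules in this note amount to a small decision table, which can be written out as a Python sketch. The function name and the method labels ("rbar", "sbar", "pooled") are hypothetical names chosen for this illustration.

```python
def variation_chart(method, subgroup_size):
    """Return 'R' or 'S' according to the rules in the note above."""
    if method == "rbar":
        return "R"
    if method == "sbar":
        return "S"
    if method == "pooled":
        return "R" if subgroup_size < 10 else "S"
    raise ValueError(f"unknown estimation method: {method!r}")


print(variation_chart("pooled", 5))
print(variation_chart("pooled", 10))
```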
- use the Box-Cox power transformation when you have very skewed data; see Use the
Box-Cox power transformation for non-normal data on page 12-67.
- change the number of subgroups or observations to display in the run chart. The default is 25.
- enter the process target or nominal specification. MINITAB calculates Cpm in addition to the
standard capability statistics.
- calculate the capability statistics using an interval other than six standard deviations wide
(three on either side of the process mean) by entering a sigma tolerance. For example,
entering 12 says to use an interval 12 standard deviations wide, six on either side of the process
mean.
Capability statistics
Capability Sixpack (Normal) displays both the within and overall capability statistics: Cp, Cpk,
Cpm (if you specify a target), and σ_within; and Pp, Ppk, and σ_overall. To interpret these statistics,
see Capability statistics on page 14-4.
Cp, Cpk, CPU, and CPL represent the potential capability of your process: what your process
would be capable of if the process did not have shifts and drifts in the subgroup means. To
calculate these, MINITAB estimates σ_within considering the variation within subgroups, but not
the shift and drift between subgroups.
Pp, Ppk, PPU, and PPL represent the overall capability of the process. When calculating these
statistics, MINITAB estimates σ_overall considering the variation for the whole study.
Example of a capability sixpack (normal probability model)
Suppose you work at an automobile manufacturer in a department that assembles engines. One
of the parts, a camshaft, must be 600 mm ± 2 mm long to meet engineering specifications. There
has been a chronic problem with camshaft lengths being out of specification, a problem which
has caused poor-fitting assemblies down the production line and high scrap and rework rates.
Upon examination of the inventory records, you discovered that there were two suppliers for the
camshafts. An Xbar and R chart showed you that Supplier 2's camshaft production was out of
control, so you decided to stop accepting production runs from them until they get their
production under control.
After dropping Supplier 2, the number of poor quality assemblies has dropped significantly, but
the problems have not completely disappeared. You decide to run a capability sixpack to see
whether Supplier 1 alone is capable of meeting your engineering specifications.
1 Open the worksheet CAMSHAFT.MTW.
2 Choose Stat > Quality Tools > Capability Sixpack (Normal).
3 In Single column, enter Supp1. In Subgroup size, enter 5.
4 In Lower spec, enter 598. In Upper spec, enter 602. Click OK.
[Graph window output]
curve. On the normal probability plot, the points approximately follow a straight line. These
patterns indicate that the data are normally distributed.
But from the capability plot, you can see that the process tolerance falls below the lower
specification limit. This means you will sometimes see camshafts that do not meet the lower
specification of 598 mm. Also, the values of Cp (1.16) and Cpk (0.90) are below the guideline of
1.33, indicating that Supplier 1 needs to improve their process.
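The reported indices are enough to reconstruct roughly where the process sits. The Python sketch below backs out the within standard deviation and the mean from Cp = 1.16 and Cpk = 0.90 with limits of 598 and 602. It assumes the default six-sigma tolerance and that Cpk is determined by the lower limit, which matches the description of the process tolerance falling below the lower specification limit. Because the indices are rounded, the results are approximate.

```python
def implied_process(cp, cpk, lsl, usl):
    """Back out sigma_within, the mean, and the lower 3-sigma edge.

    Assumes a six-sigma tolerance and that Cpk equals CPL, that is,
    the mean sits closer to the lower specification limit.
    """
    sigma = (usl - lsl) / (6.0 * cp)
    mean = lsl + cpk * 3.0 * sigma
    return sigma, mean, mean - 3.0 * sigma


sigma, mean, lower_edge = implied_process(cp=1.16, cpk=0.90, lsl=598, usl=602)
print(round(sigma, 3), round(mean, 2), round(lower_edge, 2))
```

The lower edge of the implied process spread lands slightly under 598, consistent with the capability plot showing some camshafts below the lower specification.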
Example of a capability sixpack with a Box-Cox transformation
Suppose you work for a company that manufactures floor tiles, and are concerned about warping
in the tiles. To ensure production quality, you measure warping in ten tiles each working day for
ten days.
From previous analyses, you found that the tile data do not come from a normal distribution, and
that a Box-Cox transformation using a lambda value of 0.5 makes the data more normal. For
details, see Example of a capability analysis with a Box-Cox transformation on page 14-12.
So you will run the capability sixpack using a Box-Cox transformation on the data.
1 Open the worksheet TILES.MTW.
2 Choose Stat > Quality Tools > Capability Sixpack (Normal).
3 In Single column, enter Warping. In Subgroup size, type 10.
4 In Upper spec, type 8.
5 Click Options.
6 Check Box-Cox power transformation (W = Y**Lambda). Choose Lambda = 0.5 (square root).
[Graph window output]
an Individuals chart
an R chart or S chart
between/within and overall capability statistics: Cp, Cpk, Cpm (if you specify a target),
σ_within, σ_between, and σ_total; Pp, Ppk, and σ_overall.
The Individuals, Moving Range, and R or S charts can verify whether or not the process is in
control. The histogram and normal probability plot can verify whether or not the data are
normally distributed. Lastly, the capability plot gives a graphical view of the process variability
compared to specifications. Combined with the capability statistics, this information can help
you assess whether your process is in control and the product meets specifications.
A model that assumes that the data are from a normal distribution suits most process data. If your
data are either very skewed or the within subgroup variation is not constant (for example, when
the variation is proportional to the mean), see the discussion under Non-normal data on page
14-5.
Data
You can enter data in subgroups, with two or more observations per subgroup. Subgroup data
can be structured in one column or in rows across several columns.
To use the Box-Cox transformation, data must be positive.
Ideally, all subgroups should be the same size. If your subgroups are not all the same size, due to
missing data or unequal sample sizes, only subgroups of the majority size are used for estimating
the between-subgroup variation. Control limits for the Individuals and Moving Range charts are
based on the majority subgroup size.
- When subgroups are in one column, enter the data column in Single column. In
Subgroup size, enter a subgroup size or column of subgroup indicators.
- When subgroups are in rows, choose Subgroups across rows of, and enter the columns
containing the rows in the box.
3 In Lower spec or Upper spec, enter a lower and/or upper specification limit, respectively. You
must enter at least one of them.
Options
Capability Sixpack (Between/Within) dialog box
- enter a historical value for μ (the process mean) and/or σ (within-subgroup and/or
between-subgroup standard deviations) if you have known process parameters or estimates
from past data. If you do not specify a value for μ or σ, MINITAB estimates them from the data.
- do your choice of the eight tests for special causes; see Do tests for special causes on page
12-63. To adjust the sensitivity of the tests, use Defining Tests for Special Causes on page 12-5.
Note
- When you estimate σ using the average of subgroup ranges (Rbar), MINITAB displays
an R chart.
- When you estimate σ using the average of subgroup standard deviations (Sbar),
MINITAB displays an S chart.
- When you estimate σ using the pooled standard deviation and your subgroup size is
less than ten, MINITAB displays an R chart.
- When you estimate σ using the pooled standard deviation and your subgroup size is
ten or greater, MINITAB displays an S chart.
- use the Box-Cox power transformation when you have very skewed data; see Non-normal
data on page 14-5.
- enter the process target or nominal specification. MINITAB calculates Cpm in addition to the
standard capability statistics.
- calculate the capability statistics using an interval other than six standard deviations wide
(three on either side of the process mean) by entering a sigma tolerance. For example,
entering 12 says to use an interval 12 standard deviations wide, six on either side of the process
mean.
Capability statistics
When you use Capability Analysis (Between/Within), MINITAB calculates both overall capability
statistics (Pp, Ppk, PPU, and PPL) and between/within capability statistics (Cp, Cpk, CPU, and
CPL). To interpret these statistics, see Capability statistics on page 14-4.
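As a rough illustration of how a sigma tolerance enters the capability indices, the sketch below computes Cp, CPL, CPU, and Cpk for a two-sided specification. The function name, the sample mean and standard deviations, and the root-sum-of-squares combination of the between and within standard deviations are assumptions made for illustration; they are not MINITAB output.

```python
import math


def capability_indices(mean, sigma, lsl, usl, tolerance=6.0):
    """Return (Cp, CPL, CPU, Cpk) for a two-sided specification.

    tolerance is the total interval width in standard deviations
    (6 means three on either side of the process mean).
    """
    half = tolerance / 2.0
    cp = (usl - lsl) / (tolerance * sigma)
    cpl = (mean - lsl) / (half * sigma)
    cpu = (usl - mean) / (half * sigma)
    return cp, cpl, cpu, min(cpl, cpu)


# Hypothetical estimates for a process with specification 50 +/- 3.
sigma_within = 0.6
sigma_between = 0.8
# Assumed way of combining the two variance components.
sigma_bw = math.sqrt(sigma_within**2 + sigma_between**2)

cp6, cpl6, cpu6, cpk6 = capability_indices(50.0, sigma_bw, 47.0, 53.0)
cp12, _, _, cpk12 = capability_indices(50.0, sigma_bw, 47.0, 53.0, tolerance=12.0)
print(round(cp6, 3), round(cpk6, 3), round(cp12, 3), round(cpk12, 3))
```

Doubling the tolerance from 6 to 12 halves every index, which is why a statistic computed with a nonstandard tolerance should not be compared against guidelines stated for the six-sigma interval.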
Example of a capability sixpack (between/within)
Suppose you are interested in the capability of a process that coats rolls of paper with a thin film.
You are concerned that the paper is being coated with the correct thickness of film and that the
coating is applied evenly throughout the roll. You take three samples from 25 consecutive rolls
and measure coating thickness. The thickness must be 50 ± 3 to meet engineering specifications.
Because you are interested in determining whether or not the coating is even throughout a roll,
you use MINITAB to conduct a Capability Sixpack (Between/Within).
1 Open the worksheet BWCAPA.MTW.
2 Choose Stat > Quality Tools > Capability Sixpack (Between/Within).
3 In Single column, enter Coating. In Subgroup size, enter Roll.
Graph window output
Interpreting results
If you want to interpret the process capability statistics, your data need to come from a normal
distribution. This criterion appears to have been met. In the capability histogram, the data
approximately follow the normal curve. Also, on the normal probability plot, the points
approximately follow a straight line.
No points failed the eight tests for special causes, thereby implying that your process is in control.
The points on the Individuals and Moving Range chart do not appear to follow each other, again
indicating a stable process.
The capability plot shows that the process is meeting specifications. The values of Cpk (1.21) and
Ppk (1.14) fall just below the guideline of 1.33, so your process could use some improvement.
The X̄, R, and run charts can be used to verify that the process is in a state of control. The
histogram and Weibull probability plot can be used to verify that the data approximate a Weibull
distribution. Lastly, the capability plot gives a graphical view of the process variability compared
to the specifications. Combined with the capability statistics, this information can help you
assess whether your process is in control and can produce output that consistently meets the
specifications.
When using the Weibull model, MINITAB only calculates the overall capability statistics, Pp and
Ppk. The calculations are based on maximum likelihood estimates of the shape and scale
parameters for the Weibull distribution, rather than mean and variance estimates as in the
normal case. If you have data that do not follow a normal distribution, and you want to calculate
the within statistics (Cp, Cpk, within), see Capability Analysis (Normal Distribution) on page
14-6 with the optional Box-Cox power transformation. For a comparison of the methods used for
non-normal data, see Non-normal data on page 14-5.
Tip
To make a control chart that you can interpret properly, your data must follow a normal
distribution. If the Weibull distribution fits your data well, a lognormal distribution would
probably also provide a good fit. To transform your data, use the control chart command
with the optional Box-Cox transformation, entering Lambda = 0 (natural log). For more
details, see Use the Box-Cox power transformation for non-normal data on page 12-67.
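The role of Lambda = 0 can be sketched as follows. This minimal example assumes the simple power form of the transformation (the data raised to the power lambda, with the natural log used when lambda is 0); some texts use a rescaled form instead. The function name and sample values are hypothetical.

```python
import math


def box_cox(values, lam):
    """Power-transform positive data; lam == 0 means the natural log."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(v) for v in values]
    return [v ** lam for v in values]


# Hypothetical skewed measurements.
sample = [0.8, 1.1, 1.6, 2.4, 5.9]
print([round(v, 3) for v in box_cox(sample, 0)])
print([round(v, 3) for v in box_cox(sample, 0.5)])
```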
Data
You can enter individual observations or data in subgroups. Individual observations should be
structured in one column. Subgroup data can be structured in one column or in rows across
several columns. When you have subgroups of unequal size, enter the subgroups in a single
column, then set up a second column of subgroup indicators. For examples, see Data on page
12-3.
Data must be positive.
If a single observation in the subgroup is missing, MINITAB omits it from the calculations of the
statistics for that subgroup. This may cause the control chart limits and the center line to have
different values for that subgroup. If an entire subgroup is missing, there is a gap in the chart
where the statistic for that subgroup would have been plotted.
To make a capability sixpack (Weibull probability model)
1 Choose Stat > Quality Tools > Capability Sixpack (Weibull).
2 Do one of the following:
When subgroups or individual observations are in one column, enter the data column in
Single column. In Subgroup size, enter a subgroup size or column of subgroup indicators.
For individual observations, enter a subgroup size of 1.
When subgroups are in rows, choose Subgroups across rows of, and enter the columns
containing the rows in the box.
3 In Lower spec or Upper spec, enter a lower and/or upper specification limit. You must enter at
least one of them. These limits must be positive numbers, though the lower spec can be 0.
4 If you like, use any of the options listed below, then click OK.
Options
Options subdialog box
enter your own value for the Weibull shape and scale parameters (see Weibull family of
distributions on page 14-20). If you do not enter values, MINITAB obtains maximum likelihood
estimates from the data.
Caution
When you enter known values for the parameters, keep in mind that small changes in
the parameters, especially the shape, can have large effects on the associated
probabilities.
change the number of subgroups or observations to display in the run chart. The default is 25.
calculate the capability statistics using an interval other than six standard deviations wide
(three on either side of the process mean) by entering a sigma tolerance. For example,
entering 12 says to use an interval 12 standard deviations wide, six on either side of the process
mean.
Capability statistics
Capability Sixpack (Weibull) displays the overall capability statistics, Pp and Ppk. These
calculations are based on maximum likelihood estimates of the shape and scale parameters for
the Weibull distribution, rather than mean and variance estimates as in the normal case.
For information on interpreting these statistics, see Capability statistics on page 14-4.
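To make the idea of capability statistics built from shape and scale parameters concrete, here is a minimal sketch of one common percentile-based approach. The formulas (the 0.135th, 50th, and 99.865th percentiles standing in for the mean and the mean plus or minus three standard deviations of the normal case) are an assumption, not a statement of MINITAB's exact calculation, and the shape and scale values are hypothetical.

```python
import math


def weibull_quantile(p, shape, scale):
    """Inverse CDF of the two-parameter Weibull distribution."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


def weibull_pp_ppk(shape, scale, lsl=None, usl=None):
    """Percentile-based overall capability (assumed formulas)."""
    if lsl is None and usl is None:
        raise ValueError("enter at least one specification limit")
    lo = weibull_quantile(0.00135, shape, scale)
    med = weibull_quantile(0.5, shape, scale)
    hi = weibull_quantile(0.99865, shape, scale)
    ppl = (med - lsl) / (med - lo) if lsl is not None else None
    ppu = (usl - med) / (hi - med) if usl is not None else None
    both = lsl is not None and usl is not None
    pp = (usl - lsl) / (hi - lo) if both else None
    ppk = min(v for v in (ppl, ppu) if v is not None)
    return pp, ppl, ppu, ppk


# Hypothetical parameter values with an upper specification limit only.
pp, ppl, ppu, ppk = weibull_pp_ppk(shape=1.7, scale=3.3, usl=8.0)
print(pp, ppl, round(ppu, 3), round(ppk, 3))
```

With only one specification limit, Pp is undefined and Ppk reduces to the one-sided index for that limit.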
Example of a capability sixpack (Weibull probability model)
Suppose you work for a company that manufactures floor tiles and are concerned about warping
in the tiles. To ensure production quality, you measured warping in ten tiles each working day
for ten days.
A histogram of the data revealed that they did not come from a normal distribution (see Example
of a capability analysis with a Box-Cox transformation on page 14-12), so you decide to make a
capability sixpack based on a Weibull probability model.
1 Open the worksheet TILES.MTW.
2 Choose Stat > Quality Tools > Capability Sixpack (Weibull).
3 In Single column, enter Warping. In Subgroup size, type 10.
4 In Upper spec, type 8. Click OK.
Graph window output
Capability Analysis (Binomial)
each item can result in one of two possible outcomes (success/failure, go/no-go)
Capability Analysis (Binomial) produces a process capability report that includes the following:
Chart of cumulative %defective: verifies that you have collected data from enough samples
to have a stable estimate of %defective
Defective rate plot: verifies that the %defective is not influenced by the number of items
sampled
Data
Use data from a binomial distribution. Each entry in the worksheet column should contain the
number of defectives for a subgroup. When subgroup sizes are unequal, you must also enter a
corresponding column of subgroup sizes.
Suppose you have collected data on the number of parts inspected and the number of parts that
failed inspection. On any given day, both numbers may vary. Enter the number that failed
inspection in one column. If the total number inspected varies, enter subgroup size in another
column:
Failed  Inspected
11      1003
12      968
9       897
13      1293
9       989
15      1423
Missing data
If an observation is missing, there is a gap in the P chart where that subgroup would have been
plotted. The other plots and charts simply exclude the missing observations.
Unequal subgroup sizes
In the P chart, the control limits are a function of the subgroup size. In general, the control
limits are further from the center line for smaller subgroups than they are for larger ones. When
you do have unequal subgroup sizes, the plot of %defective versus sample size will permit you to
verify that there is no relationship between the two. For example, if you tend to have a smaller
%defective when more items are sampled, this could be caused by fatigued inspectors, a
common problem. The subgroup size has no bearing on the other charts because they only
display the %defective.
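The dependence of the P chart limits on subgroup size can be sketched with the standard three-sigma binomial limits. The data are the six illustrative Failed/Inspected pairs from the Data section; the variable names are hypothetical, and truncating the limits to the range 0 to 1 is an assumption of this sketch.

```python
import math

failed = [11, 12, 9, 13, 9, 15]
inspected = [1003, 968, 897, 1293, 989, 1423]

# Center line: overall proportion defective across all subgroups.
p_bar = sum(failed) / sum(inspected)

rows = []
for d, n in zip(failed, inspected):
    # Three-sigma half-width shrinks as the subgroup size n grows.
    half_width = 3.0 * math.sqrt(p_bar * (1.0 - p_bar) / n)
    lcl = max(0.0, p_bar - half_width)
    ucl = min(1.0, p_bar + half_width)
    rows.append((n, d / n, lcl, ucl))

for n, p, lcl, ucl in rows:
    print(f"n={n:5d}  p={p:.4f}  LCL={lcl:.4f}  UCL={ucl:.4f}")
```

The subgroup with 1423 items gets the narrowest limits and the subgroup with 897 items the widest, which is the behavior described above.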
When your sample size is constant, enter the sample size value in Constant size.
When your sample sizes vary, enter the column containing sample sizes in Use sizes in.
4 If you like, use any of the options listed below, then click OK.
Options
Capability Analysis (Binomial) dialog box
enter a historical value for the proportion of defectives. This value must be between 0 and 1.
perform your choice of the four tests for special causes (see Do tests for special causes on page
13-15). To adjust the sensitivity of the tests, use Defining Tests for Special Causes on page 12-5.
Suppose you are responsible for evaluating the responsiveness of your telephone sales
department, that is, how capable it is of answering incoming calls. You record the number of calls
that were not answered (a defective) by sales representatives due to unavailability each day for 20
days. You also record the total number of incoming calls.
1 Open the worksheet BPCAPA.MTW.
2 Choose Stat > Quality Tools > Capability Analysis (Binomial).
3 In Defectives, enter Unavailable.
4 In Use sizes in, enter Calls. Click OK.
Graph window output
Interpreting results
The P chart indicates that there is one point out of control. The chart of cumulative %defective
shows that the estimate of the overall defective rate appears to be settling down around 22%, but
more data may need to be collected to verify this. The rate of defectives does not appear to be
affected by sample size. The process Z is around 0.75, which is very poor. This process could use
a lot of improvement.
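The link between a defective rate near 22% and a process Z near 0.75 can be checked with a short calculation. This sketch assumes that process Z is the standard normal quantile that leaves the overall proportion defective in the upper tail; the function name is hypothetical.

```python
from statistics import NormalDist


def process_z(proportion_defective):
    """Standard normal quantile with the defective proportion in the upper tail."""
    return NormalDist().inv_cdf(1.0 - proportion_defective)


z = process_z(0.22)
print(round(z, 2))
```

Under this assumption a 22% defective rate gives a Z between 0.76 and 0.78, close to the value read from the report; a lower defective rate gives a larger Z.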
Capability Analysis (Poisson)
Use Capability Analysis (Poisson) when your data meet the following conditions:
the rate of defects per unit of space or time is the same for each item
the numbers of defects observed in the items are independent of each other
Capability Analysis (Poisson) produces a process capability report for data from a Poisson
distribution. The report includes the following:
U chart: verifies that the process was in a state of control at the time the report was generated
Chart of cumulative mean DPU (defects per unit): verifies that you have collected data
from enough samples to have a stable estimate of the mean
Histogram of DPU: displays the overall distribution of the defects per unit from the samples
collected
Defect rate plot: verifies that DPU is not influenced by the size of the items sampled
Data
Each entry in the worksheet column should contain the number of defects for a subgroup.
When subgroup sizes are unequal, you must also enter a corresponding column of subgroup
sizes.
Suppose you have collected data on the number of defects per unit and the size of the unit. For
any given unit, both numbers may vary. Enter the number of defects in one column. If the unit
size varies, enter unit size in another column:
Failed  Inspected
3       89
4       94
7       121
2       43
11      142
6       103
Missing data
If an observation is missing, there is a gap in the U chart where the subgroup would have been
plotted. The other plots and charts simply exclude the missing observation(s).
Unequal subgroup sizes
In the U chart, the control limits are a function of the subgroup size. In general, the control limits
are further from the centerline for smaller subgroups than they are for larger ones. When you do
have unequal subgroup sizes, the plot of defects per unit (DPU) versus sample size will permit
you to verify that there is no relationship between the two. For example, if you tend to have a
smaller DPU when more items are sampled, this could be caused by fatigued inspectors, a
common problem. The subgroup size has no bearing on the other charts, because they only
display the DPU.
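A minimal sketch of the U chart limits, using the standard three-sigma Poisson limits and the six illustrative defect counts and unit sizes from the Data section. Variable names are hypothetical, and flooring the lower limit at zero is an assumption of this sketch.

```python
import math

defects = [3, 4, 7, 2, 11, 6]
unit_size = [89, 94, 121, 43, 142, 103]

# Center line: total defects divided by total unit size.
u_bar = sum(defects) / sum(unit_size)

limits = []
for c, n in zip(defects, unit_size):
    # Smaller units give wider limits around the same center line.
    half_width = 3.0 * math.sqrt(u_bar / n)
    limits.append((n, c / n, max(0.0, u_bar - half_width), u_bar + half_width))

for n, u, lcl, ucl in limits:
    print(f"size={n:4d}  DPU={u:.4f}  LCL={lcl:.4f}  UCL={ucl:.4f}")
```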
When your unit size is constant, enter the unit size value in Constant size.
When your unit sizes vary, enter the column containing unit sizes in Use sizes in.
4 If you like, use any of the options listed below, then click OK.
Options
Capability Analysis (Poisson) dialog box
enter a historical value for μ (the process mean) if you have known process parameters or
estimates from past data. If you do not specify a value for μ, MINITAB estimates it from the
data.
enter a target DPU (defects per unit) for the process.
perform the four tests for special causes (see Do tests for special causes on page 13-15). To
adjust the sensitivity of the tests, use Defining Tests for Special Causes on page 12-5.
choose to use a full color, partial color, or black and white color scheme for printing.
Example of capability analysis (Poisson probability distribution)
Suppose you work for a wire manufacturer and are concerned about the effectiveness of the wire
insulation process. You take random lengths of electrical wiring and test them for weak spots in
their insulation by subjecting them to a test voltage. You record the number of weak spots and the
length of each piece of wire (in feet).
1 Open the worksheet BPCAPA.MTW.
2 Choose Stat > Quality Tools > Capability Analysis (Poisson).
3 In Defects, enter Weak Spots.
4 In Use sizes in, enter Length. Click OK.
Graph window output
Interpreting results
The U Chart indicates that there are three points out of control. The chart of cumulative mean
DPU (defects per unit) has settled down around the value 0.0265, signifying that enough samples
were collected to have a good estimate of the mean DPU. The rate of DPU does not appear to be
affected by the lengths of the wire.
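The cumulative mean DPU that this chart tracks can be sketched as a running total of defects divided by a running total of unit size. The numbers below are the illustrative pairs from the Data section, not the wire worksheet, and the variable names are hypothetical.

```python
from itertools import accumulate

defects = [3, 4, 7, 2, 11, 6]
unit_size = [89, 94, 121, 43, 142, 103]

# Running defects per unit after each additional sample.
cumulative_dpu = [
    d / n for d, n in zip(accumulate(defects), accumulate(unit_size))
]

for k, value in enumerate(cumulative_dpu, start=1):
    print(f"after {k} samples: {value:.4f}")
```

When successive values stop moving appreciably, enough samples have been collected for a stable estimate of the mean DPU.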
References
[1] L.K. Chan, S.W. Cheng, and F.A. Spiring (1988). "A New Measure of Process Capability:
Cpm," Journal of Quality Technology, 20, July, pp. 162-175.
[2] Y. Chou, D. Owen, S. Borrego (1990). "Lower Confidence Limits on Process Capability
Indices," Journal of Quality Technology, 22, July, pp. 223-229.
[3] Ford Motor Company (1983). Continuing Process Control and Process Capability
Improvement, Ford Motor Company, Dearborn, Michigan.
[4] L.A. Franklin and G.S. Wasserman (1992). "Bootstrap Lower Confidence Limits for
Capability Indices," Journal of Quality Technology, 24, October, pp. 196-210.
[5] B. Gunter (1989). "The Use and Abuse of Cpk, Part 2," Quality Progress, 22, March, pp. 108,
109.
[6] B. Gunter (1989). "The Use and Abuse of Cpk, Part 3," Quality Progress, 22, May, pp. 79, 80.
[7] A.H. Jaehn (1989). "How to Estimate Percentage of Product Failing Specifications,"
Tappi, 72, pp. 227-228.
[8] V.E. Kane (1986). "Process Capability Indices," Journal of Quality Technology, 18, pp. 41-52.
[9] R.H. Kushler and P. Hurley (1992). "Confidence Bounds for Capability Indices," Journal of
Quality Technology, 24, October, pp. 188-195.
[10] W.L. Pearn, S. Kotz, and N.L. Johnson (1992). "Distributional and Inferential Properties of
Process Capability Indices," Journal of Quality Technology, 24, October, pp. 216-231.
[11] R.N. Rodriguez (1992). "Recent Developments in Process Capability Analysis," Journal of
Quality Technology, 24, October, pp. 176-187.
[12] T.P. Ryan (1989). Statistical Methods for Quality Improvement, John Wiley & Sons.
[13] L.P. Sullivan (1984). "Reducing Variability: A New Approach to Quality," Quality Progress,
July, 1984, pp. 15-21.
[14] H.M. Wadsworth, K.S. Stephens, and A.B. Godfrey (1986). Modern Methods for Quality
Control and Improvement, John Wiley & Sons.
[15] Western Electric (1956). Statistical Quality Control Handbook, Western Electric
Corporation, Indianapolis, Indiana.
15
Distribution Analysis
Distribution Analysis Overview
Use the parametric distribution analysis commands when you can assume your data follow a
parametric distribution.
Use the nonparametric distribution analysis commands when you cannot assume a
parametric distribution.
Then, once you have decided which type of analysis to use, you need to choose whether you will
use the right censoring or arbitrary censoring commands, which perform similar analyses.
Use the right-censoring commands when you have exact failures and right-censored data.
Use the arbitrary-censoring commands when your data include both exact failures and a
varied censoring scheme, including right-censoring, left-censoring, and interval-censoring.
For details on creating worksheets for censored data, see Distribution Analysis Data on page 15-4.
Estimation methods
As described above, MINITAB provides both parametric and nonparametric methods to estimate
functions. If a parametric distribution fits your data, then use the parametric estimates. If no
parametric distribution adequately fits your data, then use the nonparametric estimates.
For the parametric estimates in this chapter, you can choose either the maximum likelihood
method or least squares approach. Nonparametric methods differ, depending on the type of
censoring. For the formulas used, see Help.

Estimation methods
Parametric methods (assumes parametric distribution):
    Maximum likelihood (using Newton-Raphson algorithm)
    Least squares estimation
Nonparametric methods (no distribution assumed):
    Kaplan-Meier
    Actuarial
    Turnbull

Distribution Analysis Data
Life data are often censored or incomplete in some way. Suppose you are monitoring air
conditioner fans to find out the percentage of fans that fail within a three-year warranty period.
This table describes the types of observations you can have:
Type of observation    Description    Example
Right censored
Left censored
Interval censored
How you set up your worksheet depends, in part, on the type of censoring you have:
when your data consist of exact failures and right-censored observations, see Distribution
analysis - right censored data on page 15-5.
when your data have exact failures and a varied censoring scheme, including right-censoring,
left-censoring, and interval-censoring, see Distribution analysis - arbitrarily censored data on
page 15-8.
In these two examples, the Months column contains failure times, and the Censor column
contains indicators that say whether that failure was censored (C) or an exact failure time (F):
Months  Censor        Months  Censor
50      F             50      F
53      F             50      F
60      C             53      F
65      C             53      F
70      F             60      F
70      F             65      F
50      F             70      C
53      F             70      C
etc.    etc.          etc.    etc.
time censored, meaning that you run the study for a specified period of time. All units still
running at the end time are time censored. This is known as Type I censoring on the right.
failure censored, meaning that you run the study until you observe a specified number of
failures. All units running from the last specified failure onward are failure censored. This is
known as Type II censoring on the right.
Worksheet structure
Do one of the following, depending on the type of censoring you have:
Singly censored data
to use a constant failure time to define censoring, enter a column of failure times for each
sample. Later, when executing the command, you will specify the failure time at which to
begin censoring.
to use a specified number of failures to define censoring, enter a column of failure times for
each sample. Later, when executing the command, you will specify the number of failures at
which to begin censoring.
to use censoring columns to define censoring, enter two columns for each sample: one
column of failure times and a corresponding column of censoring indicators. You must use
this method for multiply censored data.
Censoring indicators can be numbers or text. If you don't specify which value indicates
censoring in the Censor subdialog box, MINITAB assumes the lower of the two values indicates
censoring, and the higher of the two values indicates an exact failure.
The data column and associated censoring column must be the same length, although pairs of
data and censor columns (from different samples) can have different lengths.
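The three ways of defining censoring described above can be sketched as small helper functions. The function names are hypothetical, the C/F labels follow the worksheet examples in this section, and treating an observation exactly at the censoring time as censored is an assumption of this sketch.

```python
def time_censor(times, censor_time):
    """Type I: observations at or beyond censor_time are censored."""
    return ["C" if t >= censor_time else "F" for t in times]


def failure_censor(times, n_failures):
    """Type II: censor ordered observations from the n-th failure on."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    flags = ["F"] * len(times)
    for rank, i in enumerate(order, start=1):
        if rank >= n_failures:
            flags[i] = "C"
    return flags


def default_censor_value(indicators):
    """The lower of the two indicator values is taken to mean censoring."""
    return min(set(indicators))


months = [50, 50, 53, 53, 60, 65, 70, 70]
print(time_censor(months, 70))
print(failure_censor(months, 7))
print(default_censor_value(["F", "C", "F"]), default_censor_value([1, 0, 1]))
```

For this small data set, censoring at 70 months and censoring from the seventh ordered observation onward mark the same two units, which is why both count as singly censored data.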
Months  Censor
50      F
60      F
53      F
40      F
51      F
99      C
35      F
55      F
etc.    etc.

Days  Censor
140   F
150   F
150   F
150   F
150   F
151   C
151   F
151   F
etc.  etc.

Days  Censor  Freq
140   F       1
150   F       4
151   C       1
151   F       35
153   F       42
161   C       1
170   F       39
199   F       1
etc.  etc.    etc.
Frequency columns are useful for data where you have large numbers of observations with
common failure and censoring times. For example, warranty data usually includes large
numbers of observations with common censoring times.
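The relationship between raw rows and a frequency column can be sketched as a pair of conversions. The eight raw rows below are the first rows of the Days/Censor example; the function names are hypothetical.

```python
from collections import Counter

raw = [
    (140, "F"), (150, "F"), (150, "F"), (150, "F"),
    (150, "F"), (151, "C"), (151, "F"), (151, "F"),
]


def to_frequency_table(rows):
    """Collapse repeated (time, censor) pairs into (time, censor, freq)."""
    counts = Counter(rows)
    return [(t, c, counts[(t, c)]) for t, c in sorted(counts)]


def expand(freq_rows):
    """Inverse operation: repeat each pair freq times."""
    return [(t, c) for t, c, f in freq_rows for _ in range(f)]


table = to_frequency_table(raw)
for row in table:
    print(row)
```

A censored and an uncensored observation at the same time (151 days here) stay on separate rows, because the pair of time and censoring indicator defines the row.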
Drug B
2
3
6
14
24
26
27
31

Stacked data
Drug  Group
20    A
30    A
43    A
51    A
57    A
82    A
85    A
89    A
2     B
3     B
6     B
14    B
24    B
26    B
27    B
31    B

Note
You cannot analyze more than one column of stacked data per analysis. So when you use
grouping indicators, the data for each sample must be in one column.
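Stacking can be sketched as turning one list per sample into a single data column plus a parallel column of grouping indicators. The values are those of the stacked example in this section; the function and variable names are hypothetical.

```python
def stack(columns):
    """Turn {group label: values} into parallel data and group columns."""
    data, group = [], []
    for label, values in columns.items():
        data.extend(values)
        group.extend([label] * len(values))
    return data, group


unstacked = {
    "A": [20, 30, 43, 51, 57, 82, 85, 89],
    "B": [2, 3, 6, 14, 24, 26, 27, 31],
}
drug, grp = stack(unstacked)
for value, label in zip(drug, grp):
    print(value, label)
```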
failure time
failure time
Right censored
Left censored
Interval censored
This data set illustrates tabled data, as well as the use of a frequency column. Frequency
columns are described in Using frequency columns on page 15-7.
Start   End     Frequency
*       10000   20
10000   20000   10
20000   30000   10
30000   30000   2
30000   40000   20
40000   50000   40
50000   50000   7
50000   60000   50
60000   70000   120
70000   80000   230
80000   90000   310
90000   *       190
When you have more than one sample, you can use separate columns for each sample.
Alternatively, you can stack all of the samples in one column, then set up a column of grouping
indicators. Grouping indicators can be numbers or text. For an illustration, see Stacked vs.
unstacked data on page 15-8.
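How a Start/End pair encodes the type of observation can be sketched as below. The convention used (a missing start means left censored, a missing end means right censored, equal values mean an exact failure, anything else is an interval) is an assumption of this sketch, as are the function name, the use of None for a missing value, and the four illustrative rows.

```python
def censoring_type(start, end):
    """Classify one Start/End pair; None stands for a missing value."""
    if start is None and end is None:
        raise ValueError("at least one of start and end is required")
    if start is None:
        return "left censored"
    if end is None:
        return "right censored"
    if start == end:
        return "exact failure"
    if start > end:
        raise ValueError("start must not exceed end")
    return "interval censored"


# Illustrative (start, end, frequency) rows.
rows = [
    (None, 10000, 20),
    (10000, 20000, 10),
    (30000, 30000, 2),
    (90000, None, 190),
]

totals = {}
for start, end, freq in rows:
    kind = censoring_type(start, end)
    totals[kind] = totals.get(kind, 0) + freq
print(totals)
```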
Distribution ID Plot
Use Distribution ID Plot to plot up to four different probability plots (with distributions chosen
from Weibull, extreme value, exponential, normal, lognormal base e, lognormal base 10, logistic,
and loglogistic) to help you determine which of these distributions best fits your data. Usually
this is done by comparing how closely the plot points lie to the best-fit lines, in particular those
points in the tails of the distribution.
MINITAB also provides two goodness-of-fit tests (Anderson-Darling for the maximum likelihood
and least squares estimation methods, and Pearson correlation coefficient for the least squares
estimation method) to help you assess how the distribution fits your data. See Goodness-of-fit
statistics on page 15-13.
The data you gather are the individual failure times, which may be censored. For example, you
might collect failure times for units running at a given temperature. You might also collect
samples of failure times under different temperatures, or under varying conditions of any
combination of stress variables.
You can display up to ten samples on each plot. All of the samples display on a single plot, in
different colors and symbols.
For a discussion of probability plots, see Probability plots on page 15-36.
Data
Distribution ID Plot accepts different kinds of data:
Distribution ID Plot - Right Censoring accepts exact failure times and right censored data.
For information on how to set up your worksheet see Distribution analysis - right censored
data on page 15-5.
Distribution ID Plot - Arbitrary Censoring accepts exact failure times and right-, left-, and
interval-censored data. For information on how to set up your worksheet see Distribution
analysis - arbitrarily censored data on page 15-8.
You can enter up to ten samples per analysis. For general information on life data and censoring,
see Distribution Analysis Data on page 15-4.
To make a distribution ID plot (uncensored/right censored data)
1 Choose Stat > Reliability/Survival > Distribution ID Plot - Right Cens.
2 In Variables, enter the columns of failure times. You can enter up to ten columns (ten
different samples).
3 If you have frequency columns, enter the columns in Frequency columns.
4 If all of the samples are stacked in one column, check By variable, and enter a column of
grouping indicators.
5 Click Censor.
For data with censoring columns: Choose Use censoring columns, then enter the
censoring columns in the box. The first censoring column is paired with the first data
column, the second censoring column is paired with the second data column, and so on.
If you like, enter the value you use to indicate censoring in Censoring value. If you do not
enter a value, MINITAB uses the lowest value in the censoring column.
For time censored data: Choose Time censor at, then enter a failure time at which to
begin censoring. For example, entering 500 says that any observation from 500 time units
onward is considered censored.
For failure censored data: Choose Failure censor at, then enter a number of failures at
which to begin censoring. For example, entering 150 says to censor all (ordered)
observations from the 150th observed failure on, and leave all other observations
uncensored.
7 If you like, use any of the options listed below, then click OK.
To make a distribution ID plot (arbitrarily censored data)
1 Choose Stat > Reliability/Survival > Distribution ID Plot - Arbitrary Cens.
2 In Start variables, enter the column of start times. You can enter up to ten columns (ten
different samples).
3 In End variables, enter the column of end times. You can enter up to ten columns (ten
different samples). The first start column is paired with the first end column, the second start
column is paired with the second end column, and so on.
4 If you have frequency columns, enter the columns in Frequency columns.
5 If all of the samples are stacked in one column, check By variable, and enter a column of
grouping indicators.
Options
Distribution ID Plot dialog box
choose to create up to four probability plots. The default is to create four plots.
choose to fit up to four common lifetime distributions for the parametric analysis, including
the Weibull, extreme value, exponential, normal, lognormal base e, lognormal base 10, logistic,
and loglogistic distributions. The four default distributions are Weibull, lognormal base e,
exponential, and normal.
More
MINITAB's extreme value distribution is the smallest extreme value (Type 1).
estimate parameters using the maximum likelihood (default) or least squares methods.
estimate percentiles for additional percents. The default is 1, 5, 10, and 50.
obtain the plot points for the probability plot using various nonparametric methods (see
Probability plots on page 15-36).
With Distribution ID Plot - Right Censoring, you can choose the Default method,
Modified Kaplan-Meier method, Herd-Johnson method, or Kaplan-Meier method. The
Default method is the normal score for uncensored data; the modified Kaplan-Meier
method for censored data.
With Distribution ID Plot - Arbitrary Censoring, you can choose the Turnbull or Actuarial
method. The Turnbull method is the default.
(Distribution ID Plot - Right Censoring only) handle ties by plotting all of the points
(default), the maximum of the tied points, or the average (median) of the tied points.
Output
The default output consists of:
table of percents and their percentiles, standard errors, and 95% confidence intervals
table of MTTFs (mean time to failures) and their standard errors and 95% confidence
intervals
four probability plots for the Weibull, lognormal base e, exponential, and normal distributions
Goodness-of-fit statistics
MINITAB provides two goodness-of-fit statistics (Anderson-Darling for the maximum likelihood
and least squares estimation methods, and Pearson correlation coefficient for the least squares
estimation method) to help you compare the fit of competing distributions.
The Anderson-Darling statistic is a measure of how far the plot points fall from the fitted line in a
probability plot. The statistic is a weighted squared distance from the plot points to the fitted line
with larger weights in the tails of the distribution. Minitab uses an adjusted Anderson-Darling
statistic, because the statistic changes when a different plot point method is used. A smaller
Anderson-Darling statistic indicates that the distribution fits the data better.
For least squares estimation, Minitab calculates a Pearson correlation coefficient. If the
distribution fits the data well, then the plot points on a probability plot will fall on a straight line.
The correlation measures the strength of the linear relationship between the X and Y variables
on a probability plot. The correlation will range between 0 and 1, and higher values indicate a
better fitting distribution.
Use the Anderson-Darling statistic and Pearson correlation coefficient to compare the fit of
different distributions.
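The correlation idea above can be sketched in a few lines of Python. This is a minimal illustration, not MINITAB's calculation: the plot positions use Benard's median-rank approximation, which is an assumption made here for simplicity (the guide's own plot point methods are the normal score, Kaplan-Meier variants, and so on), and the function names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def median_rank_points(n):
    """Plot positions for n uncensored failures (Benard's approximation, an assumption)."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]

def weibull_plot_correlation(times):
    """Correlation on Weibull probability paper: x = ln(t), y = ln(-ln(1 - p))."""
    t = sorted(times)
    p = median_rank_points(len(t))
    x = [math.log(v) for v in t]
    y = [math.log(-math.log(1 - q)) for q in p]
    return pearson_r(x, y)

def exponential_plot_correlation(times):
    """Correlation on exponential probability paper: x = t, y = -ln(1 - p)."""
    t = sorted(times)
    p = median_rank_points(len(t))
    y = [-math.log(1 - q) for q in p]
    return pearson_r(t, y)

if __name__ == "__main__":
    # Hypothetical uncensored failure times, for illustration only.
    sample = [12.0, 19.5, 25.1, 31.0, 36.4, 42.8, 50.3, 59.9, 71.2, 90.6]
    print("Weibull    :", round(weibull_plot_correlation(sample), 4))
    print("Exponential:", round(exponential_plot_correlation(sample), 4))
```

The distribution whose transformed axes give the correlation closest to 1 has the straightest probability plot, which is the comparison described above.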
Example of a distribution ID plot for right-censored data
Suppose you work for a company that manufactures engine windings for turbine assemblies.
Engine windings may decompose at an unacceptable rate at high temperatures. You want to
know, at given high temperatures, the time at which 1% of the engine windings fail. You plan
to get this information by using the Parametric Distribution Analysis–Right Censoring
command, which requires you to specify the distribution for your data. Distribution ID
Plot–Right Censoring can help you choose that distribution.
First you collect failure times for the engine windings at two temperatures. In the first sample,
you test 50 windings at 80° C; in the second sample, you test 40 windings at 100° C. Some of the
units drop out of the test for unrelated reasons. In the MINITAB worksheet, you use a column of
censoring indicators to designate which times are actual failures (1) and which are censored
units removed from the test before failure (0).
1 Open the worksheet RELIABLE.MTW.
2 Choose Stat > Reliability/Survival > Distribution ID Plot–Right Cens.
3 In Variables, enter Temp80 Temp100.
4 Click Censor. Choose Use censoring columns and enter Cens80 Cens100 in the box. Click
Distribution ID Plot

Variable: Temp80

Goodness of Fit
Distribution        Anderson-Darling
Weibull                        67.64
Lognormal base e               67.22
Exponential                    70.33
Normal                         67.73

Table of Percentiles
                                           Standard       95% Normal CI
Distribution        Percent   Percentile      Error      Lower      Upper
Weibull                   1      10.0765     2.7845     5.8626    17.3193
Lognormal base e          1      19.3281         --         --    25.7722
Exponential               1       0.8097         --         --     1.1176
Normal                    1      -0.5493         --         --    15.8592
Weibull                   5      20.3592    3.79130    14.1335    29.3273
Lognormal base e          5      26.9212    3.02621    21.5978    33.5566
Exponential               5       4.1326    0.67939     2.9942     5.7037
Normal                    5      18.2289    6.40367     5.6779    30.7798

Table of MTTF
                                 Standard       95% Normal CI
Distribution            Mean        Error      Lower      Upper
Weibull              64.9829       4.6102    56.5472     74.677
Lognormal base e     67.4153       5.5525    57.3656     79.225
Exponential          80.5676      13.2452    58.3746    111.198
Normal               63.5518       4.0694    55.5759     71.528

Variable: Temp100

Goodness of Fit
Distribution        Anderson-Darling
Weibull                        16.60
Lognormal base e               16.50
Exponential                    18.19
Normal                         17.03

Table of Percentiles
                                           Standard       95% Normal CI
Distribution        Percent   Percentile      Error      Lower      Upper
Weibull                   1       2.9819         --         --     6.8290
Lognormal base e          1       6.8776         --         --    10.9034
Exponential               1       0.5025         --         --     0.7033
Normal                    1     -18.8392         --         --    -1.5727
Weibull                   5       8.1711    2.36772     4.6306    14.4189
Lognormal base e          5      11.3181    2.07658     7.8995    16.2162
Exponential               5       2.5647    0.43984     1.8325     3.5893
Normal                    5      -0.2984    6.86755   -13.7585    13.1618

Table of MTTF
                                 Standard       95% Normal CI
Distribution            Mean        Error      Lower      Upper
Weibull              45.9448      4.87525    37.3177    56.5663
Lognormal base e     49.1969      6.91761    37.3465    64.8076
Exponential          50.0000      8.57493    35.7265    69.9761
Normal               44.4516      4.37371    35.8793    53.0240

Graph window output
Example of a distribution ID plot for arbitrarily censored data
Suppose you work for a company that manufactures tires. You are interested in finding out how
many miles it takes for various proportions of the tires to fail, or wear down to 2/32 of an inch of
tread. You are especially interested in knowing how many of the tires last past 45,000 miles. You
plan to get this information by using the Parametric Distribution Analysis–Arbitrary Censoring
command, which requires you to specify the distribution for your data. Distribution ID
Plot–Arbitrary Censoring can help you choose that distribution.
You inspect each good tire at regular intervals (every 10,000 miles) to see if the tire has failed,
then enter the data into the MINITAB worksheet.
1 Open the worksheet TIREWEAR.MTW.
2 Choose Stat > Reliability/Survival > Distribution ID Plot–Arbitrary Cens.
3 In Start variables, enter Start. In End variables, enter End.
4 In Frequency columns, enter Freq.
5 Under Distribution 4, choose Extreme value. Click OK.
Session window output

Distribution ID Plot

Variable
Start: Start   End: End
Frequency: Freq

Goodness of Fit
Distribution        Anderson-Darling
Weibull                        2.534
Lognormal base e               2.685
Exponential                    3.903
Extreme value                  2.426

Table of Percentiles
                                           Standard     95.0% Normal CI
Distribution        Percent   Percentile      Error      Lower      Upper
Weibull                   1      27623.0     998.00    25734.6    29650.0
Lognormal base e          1      27580.2     781.26    26090.7    29154.8
Exponential               1        762.4      28.80      708.0      821.0
Extreme value             1      13264.5    2216.24     8920.8    17608.3
Weibull                   5      39569.8     975.59    37703.1    41528.9
Lognormal base e          5      35793.9     795.52    34268.2    37387.6
Exponential               5       3891.0     146.96     3613.4     4190.0
Extreme value             5      36038.3    1522.71    33053.9    39022.8

Table of MTTF
                                 Standard       95% Normal CI
Distribution            Mean        Error      Lower      Upper
Weibull              69545.4       629.34    68322.8    70789.9
Lognormal base e     72248.6      1066.42    70188.4    74369.3
Exponential          75858.8      2865.18    70446.0    81687.6
Extreme value        69473.3       646.64    68205.9    70740.7
Graph window output
Interpreting results
The points fall approximately on the straight line on the extreme value probability plot, so the
extreme value distribution would be a good choice when running the parametric distribution
analysis.
You can also compare the Anderson-Darling goodness-of-fit values to determine which
distribution best fits the data. A smaller Anderson-Darling statistic means that the distribution
provides a better fit. Here, the Anderson-Darling value for the extreme value distribution (2.426)
is lower than the values for the other distributions, which supports the conclusion
that the extreme value distribution provides the best fit.
The tables of percentiles and MTTFs let you see how your conclusions may change with
different distributions.
Distribution Overview Plot
Data
Distribution Overview Plot accepts different kinds of data:
- Distribution Overview Plot–Right Censoring accepts exact failure times and right-censored
  data. For information on how to set up your worksheet, see Distribution analysis–right
  censored data on page 15-5.
- Distribution Overview Plot–Arbitrary Censoring accepts exact failure times and right-, left-,
  and interval-censored data. The data must be in tabled form. For information on how to set
  up your worksheet, see Distribution analysis–arbitrarily censored data on page 15-8.
You can enter up to ten samples per analysis. For general information on life data and censoring,
see Distribution Analysis Data on page 15-4.
2 In Variables, enter the columns of failure times. You can enter up to ten columns (ten
different samples).
3 If you have frequency columns, enter the columns in Frequency columns.
4 If all of the samples are stacked in one column, check By variable, and enter a column of
Note
6 Click Censor.
- For data with censoring columns: Choose Use censoring columns, then enter the
  censoring columns in the box. The first censoring column is paired with the first data
  column, the second censoring column is paired with the second data column, and so on.
  If you like, enter the value you use to indicate censoring in Censoring value. If you don't
  enter a value, by default MINITAB uses the lowest value in the censoring column.
- For time censored data: Choose Time censor at, then enter a failure time at which to
  begin censoring. For example, entering 500 says that any observation from 500 time units
  onward is considered censored.
- For failure censored data: Choose Failure censor at, then enter a number of failures at
  which to begin censoring. For example, entering 150 says to censor all (ordered)
  observations from the 150th observed failure on, and to leave all other observations
  uncensored.
8 If you like, use any of the options listed below, then click OK.
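The time-censoring and failure-censoring rules described in this procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the indicator coding (1 = failure, 0 = censored) follows the examples in this chapter, and treating the 150th ordered observation itself as censored is one reading of "from the 150th observed failure on".

```python
def time_censor(times, censor_at):
    """Censoring indicators for time censoring.

    Any observation at or beyond `censor_at` time units is marked
    censored (0); earlier observations are failures (1).
    """
    return [0 if t >= censor_at else 1 for t in times]


def failure_censor(times, censor_at_failure):
    """Censoring indicators for failure censoring.

    Observations are ranked by time; the observation with rank
    `censor_at_failure` and all later ones are marked censored (0).
    Whether the boundary observation itself is censored is an assumption.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    indicators = [1] * len(times)
    for rank, index in enumerate(order, start=1):
        if rank >= censor_at_failure:
            indicators[index] = 0
    return indicators


if __name__ == "__main__":
    # Hypothetical failure times, for illustration only.
    times = [120, 500, 730, 499]
    print(time_censor(times, 500))
    print(failure_censor(times, 3))
```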
To make a distribution overview plot (arbitrarily censored data)
1 Choose Stat > Reliability/Survival > Distribution Overview Plot–Arbitrary Cens.
2 In Start variables, enter the columns of start times. You can enter up to ten columns (ten
different samples).
3 In End variables, enter the columns of end times. You can enter up to ten columns (ten
different samples).
4 If you have frequency columns, enter the columns in Frequency columns.
5 If all of the samples are stacked in one column, check By variable, and enter a column of
7 If you like, use any of the options described below, then click OK.
Options
Distribution Overview Plot dialog box
- for the parametric display of plots, choose one of eight common lifetime distributions for the
  data: Weibull (default), extreme value, exponential, normal, lognormal base e, lognormal
  base 10, logistic, or loglogistic.
More
MINITAB's extreme value distribution is the smallest extreme value (Type 1).
- estimate parameters using the maximum likelihood (default) or least squares methods
- obtain the plot points for the probability plot using various nonparametric methods; see
  Probability plots on page 15-36. You can choose the Default method, Modified Kaplan-Meier,
  Herd-Johnson, or Kaplan-Meier method. The Default method is the normal score for
  uncensored data; the modified Kaplan-Meier method is for censored data.
- handle ties by plotting all of the points (default), the maximum of the tied points, or the
  average (median) of the tied points.
- estimate parameters using the maximum likelihood (default) or least squares methods
- obtain the plot points for the probability plot using various nonparametric methods; see
  Probability plots on page 15-36. You can choose from the Turnbull method (default) or
  Actuarial method.
- estimate parameters using the Turnbull method (default) or Actuarial method.
Output
The distribution overview plot display differs depending on whether you select the parametric or
nonparametric display.
When you select a parametric display, you get:
- a probability plot, which displays estimates of the cumulative distribution function F(y) vs.
  failure time; see Probability plots on page 15-36.
- a parametric survival (or reliability) plot, which displays the survival (or reliability) function
  1 - F(y) vs. failure time; see Survival plots on page 15-40.
- a probability density function, which displays the curve that describes the distribution of your
  data, or f(y).
- a parametric hazard plot, which displays the hazard function or instantaneous failure rate,
  f(y)/(1 - F(y)) vs. failure time; see Hazard plots on page 15-41.
The Kaplan-Meier survival estimates, Turnbull survival estimates, and empirical hazard function
change values only at exact failure times, so the nonparametric survival and hazard curves are
step functions. Parametric survival and hazard estimates are based on a fitted distribution and the
curve will therefore be smooth.
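To see why the nonparametric survival curve is a step function, here is a minimal sketch of the standard Kaplan-Meier product-limit estimate for right-censored data. It is a textbook version, not MINITAB's implementation; the function name is hypothetical and the indicator coding (1 = failure, 0 = censored) follows the examples in this chapter.

```python
def kaplan_meier(times, indicators):
    """Kaplan-Meier survival estimates for right-censored data.

    times      : failure or censoring times
    indicators : 1 for an exact failure, 0 for a censored unit
    Returns a list of (time, survival) pairs, one per distinct failure
    time. The estimate is constant between those times.
    """
    data = sorted(zip(times, indicators))
    at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        removed = 0
        # Gather every unit that leaves the risk set at time t.
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            removed += 1
            i += 1
        if failures:
            survival *= (at_risk - failures) / at_risk
            steps.append((t, survival))
        at_risk -= removed
    return steps


if __name__ == "__main__":
    # Hypothetical data: two of the five units are censored.
    for t, s in kaplan_meier([5, 8, 8, 12, 15], [1, 0, 1, 1, 0]):
        print(t, round(s, 3))
```

The estimate changes only where a failure occurs; censored units reduce the number at risk without producing a step.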
Example of a distribution overview plot with right-censored data
Suppose you work for a company that manufactures engine windings for turbine assemblies.
Engine windings may decompose at an unacceptable rate at high temperatures. You want to
know, at given high temperatures, the time at which 1% of the engine windings fail. You plan to get
this information by using the Parametric Distribution Analysis–Right Censoring command, but
you first want to have a quick look at your data from different perspectives.
First you collect data for times to failure for the engine windings at two temperatures. In the first
sample, you test 50 windings at 80° C; in the second sample, you test 40 windings at 100° C.
Some of the units drop out of the test due to failures from other causes. These units are
considered to be right censored because their failures were not due to the cause of interest. In the
MINITAB worksheet, you use a column of censoring indicators to designate which times are actual
failures (1) and which are censored units removed from the test before failure (0).
1 Open the worksheet RELIABLE.MTW.
2 Choose Stat > Reliability/Survival > Distribution Overview Plot–Right Cens.
3 In Variables, enter Temp80 Temp100.
4 From Distribution, choose Lognormal base e.
5 Click Censor. Choose Use censoring columns and enter Cens80 Cens100 in the box. Click
Graph window output
Example of a distribution overview plot with arbitrarily censored data
Suppose you work for a company that manufactures tires. You are interested in finding out how
many miles it takes for various proportions of the tires to fail, or wear down to 2/32 of an inch of
tread. You are especially interested in knowing how many of the tires last past 45,000 miles. You
plan to get this information by using the Parametric Distribution Analysis–Arbitrary Censoring
command, but first you want to have a quick look at your data from different perspectives.
You inspect each good tire at regular intervals (every 10,000 miles) to see if the tire has failed,
then enter the data into the MINITAB worksheet.
1 Open the worksheet TIREWEAR.MTW.
2 Choose Stat > Reliability/Survival > Distribution Overview Plot–Arbitrary Cens.
3 In Start variables, enter Start. In End variables, enter End.
4 In Frequency columns, enter Freq.
5 From Distribution, choose Extreme value. Click OK.
Session window output
Graph window output
Parametric Distribution Analysis
Data
The parametric distribution analysis commands accept different kinds of data:
You can enter up to ten samples per analysis. For general information on life data and censoring,
see Distribution Analysis Data on page 15-4.
Occasionally, you may have life data with no failures. Under certain conditions, MINITAB allows
you to draw conclusions based on that data. See Drawing conclusions when you have few or no
failures on page 15-33.
2 In Variables, enter the columns of failure times. You can enter up to ten columns (ten
different samples).
3 If you have frequency columns, enter the columns in Frequency columns.
4 If all of the samples are stacked in one column, check By variable, and enter a column of
grouping indicators in the box. In Enter number of levels, enter the number of levels the
indicator column contains.
Note
5 Click Censor.
- For data with censoring columns: Choose Use censoring columns, then enter the
  censoring columns in the box. The first censoring column is paired with the first data
  column, the second censoring column is paired with the second data column, and so on.
  If you like, enter the value you use to indicate censoring in Censoring value. If you don't
  enter a value, by default MINITAB uses the lowest value in the censoring column.
- For time censored data: Choose Time censor at, then enter a failure time at which to
  begin censoring. For example, entering 500 says that any observation from 500 time units
  onward is considered censored.
- For failure censored data: Choose Failure censor at, then enter a number of failures at
  which to begin censoring. For example, entering 150 says to censor all (ordered)
  observations from the 150th observed failure on, and to leave all other observations
  uncensored.
7 If you like, use any of the options listed below, then click OK.
To do a parametric distribution analysis (arbitrarily censored data)
1 Choose Stat > Reliability/Survival > Parametric Dist Analysis–Arbitrary Cens.
2 In Start variables, enter the columns of start times. You can enter up to ten columns (ten
different samples).
3 In End variables, enter the columns of end times. You can enter up to ten columns (ten
different samples).
4 If you have frequency columns, enter the columns in Frequency columns.
5 If all of the samples are stacked in one column, check By variable, and enter a column of
grouping indicators in the box. In Enter number of levels, enter the number of levels the
indicator column contains.
6 If you like, use any of the options described below, then click OK.
Options
Parametric Distribution Analysis dialog box
- fit one of eight common lifetime distributions for the parametric analysis, including Weibull
  (default), extreme value, exponential, normal, lognormal base e, lognormal base 10, logistic,
  and loglogistic
More
MINITAB's extreme value distribution is the smallest extreme value (Type 1).
- estimate parameters using the maximum likelihood (default) or least squares methods; see
  Estimating the distribution parameters on page 15-42.
- estimate the scale parameter while holding the shape fixed (Weibull distribution), or estimate
  the location parameter while keeping the scale fixed (all other distributions); see Estimating
  the distribution parameters on page 15-42.
- draw conclusions when you have few or no failures; see Drawing conclusions when you have few
  or no failures on page 15-33.
- estimate survival probabilities for times (values) you specify; see Survival probabilities on page
  15-39.
- specify a confidence level for all of the confidence intervals. The default is 95.0%.
- choose to calculate two-sided confidence intervals, or lower or upper bounds. The default is
  two-sided.
- test whether the distribution parameters (scale, shape, or location) are consistent with specified
  values; see Comparing parameters on page 15-34.
- test whether two or more samples come from the same population; see Comparing
  parameters on page 15-34.
- test whether the shape, scale, or location parameters from K distributions are the same; see
  Comparing parameters on page 15-34.
- obtain the plot points for the probability plot using various nonparametric methods; see
  Probability plots on page 15-36.
  - With Parametric Distribution Analysis–Right Censoring, you can choose the Default
    method, Modified Kaplan-Meier method, Herd-Johnson method, or Kaplan-Meier method.
    The Default method is the normal score for uncensored data; the modified Kaplan-Meier
    method for censored data.
  - With Parametric Distribution Analysis–Arbitrary Censoring, choose the Turnbull or
    Actuarial method. Turnbull is the default method.
- (Parametric Distribution Analysis–Right Censoring only) handle tied failure times in the
  probability plot by plotting all of the points (default), the average (median) of the tied points,
  or the maximum of the tied points.
- enter starting values for model parameters; see Estimating the distribution parameters on
  page 15-42.
- change the maximum number of iterations for reaching convergence (the default is 20).
  MINITAB obtains maximum likelihood estimates through an iterative process. If the
  maximum number of iterations is reached before convergence, the command terminates;
  see Estimating the distribution parameters on page 15-42.
- use historical estimates for the parameters rather than estimate them from the data. In this
  case, no estimation is done; all results, such as the percentiles and survival probabilities, are
  based on these historical estimates. See Estimating the distribution parameters on page 15-42.
Output
The default output for Parametric Distribution Analysis–Right Censoring and Parametric
Distribution Analysis–Arbitrary Censoring consists of:
probability plot
Fitting a distribution
You can fit one of eight common lifetime distributions to your data, including the Weibull
(default), extreme value, exponential, normal, lognormal base e, lognormal base 10, logistic, and
loglogistic distributions.
More
MINITAB's extreme value distribution is the smallest extreme value (Type 1).
The Session window output includes two tables that describe the distribution. Here is some
sample output from a default Weibull distribution:
Estimation Method: Maximum Likelihood
Distribution: Weibull

Parameter Estimates
                          Standard     95.0% Normal CI
Parameter    Estimate        Error      Lower      Upper
Shape          2.3175       0.3127     1.7790     3.0191
Scale          73.344        5.203     63.824     84.286

Log-Likelihood = -186.128

Goodness-of-Fit
Anderson-Darling = 67.6366

Characteristics of Distribution
                                         Standard     95.0% Normal CI
                            Estimate        Error      Lower      Upper
Mean(MTTF)                   64.9829       4.6102    56.5472    74.6771
Standard Deviation           29.7597       4.1463    22.6481    39.1043
Median                       62.6158       4.6251    54.1763    72.3700
First Quartile(Q1)           42.8439       4.3240    35.1546    52.2151
Third Quartile(Q3)           84.4457       6.2186    73.0962    97.5575
Interquartile Range(IQR)     41.6018       5.5878    31.9730    54.1305
Parameter Estimates displays the maximum likelihood or the least squares estimates of the
distribution parameters, their standard errors and approximate 95.0% confidence intervals,
and the log-likelihood and Anderson-Darling goodness-of-fit statistic for the fitted distribution.
Characteristics of Distribution displays common measures of the center and spread of the
distribution with 95.0% lower and upper confidence intervals. The mean and standard
deviation are not resistant to large lifetimes, while the median, Q1 (25th percentile), Q3
(75th percentile), and the IQR (interquartile range) are resistant.
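The point estimates in Characteristics of Distribution follow from the fitted shape and scale through the standard two-parameter Weibull formulas. The sketch below recomputes them from the sample output above (shape 2.3175, scale 73.344). The function name is hypothetical, and the sketch covers only the estimates, not the standard errors or confidence intervals.

```python
import math


def weibull_characteristics(shape, scale):
    """Center and spread measures of a two-parameter Weibull distribution."""
    g1 = math.gamma(1 + 1 / shape)
    g2 = math.gamma(1 + 2 / shape)

    def quantile(p):
        # Inverse of F(t) = 1 - exp(-(t / scale) ** shape)
        return scale * (-math.log(1 - p)) ** (1 / shape)

    q1, median, q3 = quantile(0.25), quantile(0.50), quantile(0.75)
    return {
        "mean": scale * g1,                      # MTTF
        "sd": scale * math.sqrt(g2 - g1 ** 2),
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


if __name__ == "__main__":
    for name, value in weibull_characteristics(2.3175, 73.344).items():
        print(f"{name:>6}: {value:.4f}")
```

Because the shape and scale printed in the output are rounded, the recomputed values agree with the table only to about two or three decimal places.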
Drawing conclusions when you have few or no failures
You provide a historical value for the shape parameter (Weibull or exponential).
MINITAB provides lower confidence bounds for the scale parameter (Weibull or exponential),
percentiles, and survival probabilities. The lower confidence bound helps you to draw some
conclusions; if the value of the lower confidence bound is better than the specifications, then you
may be able to terminate the test.
For example, your reliability specifications require that the 5th percentile is at least 12 months.
You run a Bayes analysis on data with no failures, and then examine the lower confidence bound
to substantiate that the product is at least as good as specifications. If the lower confidence bound
for the 5th percentile is 13.1 months, then you can conclude that your product meets
specifications and terminate the test.
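The kind of lower bound described above can be sketched with the standard zero-failure ("Weibayes") formula. This is an assumption about the general technique, not a statement of MINITAB's exact calculation: with a known shape, no failures, and test times t_i, the one-sided lower bound for the scale at confidence C is (sum of t_i^shape / -ln(1 - C))^(1/shape). The function names and the test plan in the example are hypothetical.

```python
import math


def zero_failure_lower_scale(times, shape, confidence=0.95):
    """One-sided lower confidence bound for the Weibull scale, no failures.

    times : accumulated test time of each unit (all censored)
    shape : historical shape value supplied by the analyst
    """
    total = sum(t ** shape for t in times)
    return (total / (-math.log(1 - confidence))) ** (1 / shape)


def lower_percentile_bound(percent, shape, lower_scale):
    """Lower bound for a percentile, obtained from the lower scale bound."""
    return lower_scale * (-math.log(1 - percent / 100)) ** (1 / shape)


if __name__ == "__main__":
    # Hypothetical test: 30 units run 24 months each with no failures,
    # historical shape of 2, specification "5th percentile >= 12 months".
    scale_lb = zero_failure_lower_scale([24.0] * 30, shape=2.0)
    p5_lb = lower_percentile_bound(5, 2.0, scale_lb)
    print("lower bound, scale         :", round(scale_lb, 2))
    print("lower bound, 5th percentile:", round(p5_lb, 2))
    print("meets specification        :", p5_lb >= 12)
```

If the lower bound for the percentile exceeds the specification, as in the 13.1-month example above, the test can be stopped.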
To draw conclusions when you have no failures
1 In the main dialog box, click Estimate.
2 In Set shape (Weibull) or scale (other distributions) at, enter the shape or scale value. Click
OK.
Comparing parameters
Are the distribution parameters for a sample equal to specified values; for example, does the scale
equal 1.1? Does the sample come from the historical distribution? Do two or more samples come
from the same population? Do two or more samples share the same shape, scale, or location
parameters? To answer these questions you need to perform hypothesis tests on the distribution
parameters.
MINITAB performs Wald Tests [7] and provides Bonferroni 95.0% confidence intervals for the
following hypothesis tests:
- Test whether the distribution parameters (scale, shape, or location) are consistent with
  specified values
- Test whether two or more samples come from the same population
- Test whether two or more samples share the same shape, scale, or location parameters
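As a rough illustration of the two ideas named above, the sketch below shows the textbook one-parameter Wald test and a Bonferroni-adjusted confidence interval. It is not MINITAB's implementation: the function names are hypothetical, and MINITAB may work on a transformed (for example, log) scale, so its results need not match these numbers.

```python
from statistics import NormalDist


def wald_test(estimate, std_error, hypothesized):
    """Two-sided Wald test of H0: parameter == hypothesized.

    Returns the z statistic and its normal-theory p-value.
    """
    z = (estimate - hypothesized) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def bonferroni_interval(estimate, std_error, n_intervals, family_level=0.95):
    """Confidence interval widened so that n_intervals intervals hold jointly.

    Each interval uses level 1 - (1 - family_level) / n_intervals.
    """
    alpha_each = (1 - family_level) / n_intervals
    z = NormalDist().inv_cdf(1 - alpha_each / 2)
    return estimate - z * std_error, estimate + z * std_error


if __name__ == "__main__":
    # Shape estimate and standard error from the sample Weibull output
    # in this chapter; the hypothesized value 1.0 is illustrative.
    z, p = wald_test(2.3175, 0.3127, 1.0)
    print("z =", round(z, 3), " p =", round(p, 5))
    print(bonferroni_interval(2.3175, 0.3127, n_intervals=3))
```

Dividing the error rate among the intervals makes each one wider, so that all of them cover their targets simultaneously with at least the stated family confidence.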
2 In Test shape (Weibull) or scale (other distributions) equal to or Test scale (Weibull or
expo) or location (other distributions) equal to, enter the value to be tested. Click OK.
To test whether a sample comes from a historical distribution
1 In the main dialog box, click Test.
2 In Test shape (Weibull) or scale (other distributions) equal to and Test scale (Weibull or
expo) or location (other distributions) equal to, enter the parameters of the historical
distribution. Click OK.
To determine whether two or more samples come from the same population
1 In the main dialog box, click Test.
2 Check Test for equal shape (Weibull) or scale (other distributions) and Test for equal scale
Percentiles
By what time do half of the engine windings fail? How long until 10% of the blenders stop
working? You are looking for percentiles. The parametric distribution analysis commands
automatically display a table of percentiles in the Session window. By default, MINITAB displays
the percentiles 1-10, 20, 30, 40, 50, 60, 70, 80, and 90-99.
In this example, we entered failure times (in months) for engine windings.
Table of Percentiles
                        Standard     95.0% Normal CI
Percent   Percentile       Error      Lower      Upper
      1      10.0765      2.7845     5.8626    17.3193
      2      13.6193      3.2316     8.5543    21.6834
      3      16.2590      3.4890    10.6767    24.7601
      4      18.4489      3.6635    12.5009    27.2270

At about 10 months, 1% of the windings failed.
The values in the Percentile column are estimates of the times at which the corresponding
percent of the units failed. The table also includes standard errors and approximate 95.0%
confidence intervals for each percentile.
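For a Weibull fit, a percentile is the time t at which F(t) equals the requested percent. The sketch below recomputes the percentiles in the table from the sample Weibull estimates shown earlier in this chapter (shape 2.3175, scale 73.344). The interval helper assumes a normal approximation applied on the log scale. That assumption is inferred from the fact that it reproduces the limits in the table, and the guide does not state it here. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def weibull_percentile(percent, shape, scale):
    """Time by which `percent` % of units are expected to fail."""
    p = percent / 100
    return scale * (-math.log(1 - p)) ** (1 / shape)


def log_scale_ci(estimate, std_error, level=0.95):
    """Approximate CI for a positive quantity, built on the log scale.

    Assumption: limits are estimate / k and estimate * k with
    k = exp(z * std_error / estimate).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    k = math.exp(z * std_error / estimate)
    return estimate / k, estimate * k


if __name__ == "__main__":
    for percent in (1, 2, 3, 4):
        print(percent, round(weibull_percentile(percent, 2.3175, 73.344), 4))
    # Standard error for the 1% row taken from the table above.
    print(log_scale_ci(10.0765, 2.7845))
```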
In the Estimate subdialog box, you can specify a different confidence level for all confidence
intervals. You can also request percentiles to be added to the default table.
To request additional percentiles
1 In the main dialog box, click Estimate.
2 In Estimate percentiles for these additional percents, enter the additional percents for which
you want to estimate percentiles. You can enter individual percents (0 < P < 100) or a column
of percents. Click OK.
Probability plots
Use the probability plot to assess whether a particular distribution fits your data. The plot consists
of:
- plot points, which represent the proportion of failures up to a certain time. The plot points are
  calculated using a nonparametric method, which assumes no parametric distribution; for
  formulas, see Calculations in Help. The proportions are transformed and used as the y
  variable, while their corresponding times may be transformed and used as the x variable.
- the fitted line, which is a graphical representation of the percentiles. To make the fitted line,
  MINITAB first calculates the percentiles for the various percents, based on the chosen
  distribution. The associated probabilities are then transformed and used as the y variables.
  The percentiles may be transformed, depending on the distribution, and are used as the x
  variables. The transformed scales, chosen to linearize the fitted line, differ depending on the
  distribution used.
Because the plot points do not depend on any distribution, they would be the same (before being
transformed) for any probability plot made. The fitted line, however, differs depending on the
parametric distribution chosen. So you can use the probability plot to assess whether a particular
distribution fits your data. In general, the closer the points fall to the fitted line, the better the fit.
Tip
To quickly compare the fit of up to four different distributions at once, see Distribution ID
Plot on page 15-9.
MINITAB provides two goodness of fit measures to help assess how the distribution fits your data:
the Anderson-Darling statistic for both the maximum likelihood and the least squares methods
and the Pearson correlation coefficient for the least squares method. A smaller Anderson-Darling
statistic indicates that the distribution provides a better fit. A larger Pearson correlation
coefficient indicates that the distribution provides a better fit. See Goodness-of-fit statistics on
page 15-13.
Here is a Weibull probability plot for failure times associated with running engine windings at a
temperature of 80° C:
[Figure: Weibull probability plot; callouts mark the fitted line and the 95% confidence intervals]
With the commands in this chapter, you can choose from various methods to estimate the plot
points. You can also choose the method used to obtain the fitted line. The task below describes all
the ways you can modify the probability plot.
To modify the default probability plot
1 In the main dialog box, click Graphs.
- specify the method used to obtain the plot points: under Obtain plot points using, choose
  one of the following:
  - with Parametric Distribution Analysis–Right Censoring: Default method, Modified
    Kaplan-Meier method, Herd-Johnson method, or Kaplan-Meier method. The Default
    method is the normal score for uncensored data; the modified Kaplan-Meier method
    for censored data.
  - with Parametric Distribution Analysis–Arbitrary Censoring: Turnbull method or
    Actuarial method.
- Parametric Distribution Analysis–Right Censoring only: choose what to plot when you
  have tied failure times. Under Handle tied failure times by plotting, choose All points
  (default), Maximum of the tied points, or Average (median) of tied points.
- turn off the 95.0% confidence interval: uncheck Display confidence intervals on above
  plots.
3 Click OK.
4 If you want to change the confidence level for the 95.0% confidence interval to some other
Survival probabilities
What is the probability of an engine winding running past a given time? How likely is it that a
cancer patient will live five years after receiving a certain drug? You are looking for survival
probabilities, which are estimates of the proportion of units that survive past a given time.
When you request survival probabilities in the Estimate subdialog box, the parametric
distribution analysis commands display them in the Session window. Here, for example, we
requested a survival probability for engine windings running at 70 months:
40.76% of the engine windings last past 70 months.
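A parametric survival probability is simply 1 - F(t) evaluated from the fitted distribution. The sketch below assumes that the 40.76% figure comes from the Weibull fit shown in the sample output earlier in this chapter (shape 2.3175, scale 73.344); the guide does not say so explicitly, but the numbers agree. The function name is hypothetical.

```python
import math


def weibull_survival(t, shape, scale):
    """Probability that a unit survives past time t: 1 - F(t)."""
    return math.exp(-((t / scale) ** shape))


if __name__ == "__main__":
    s70 = weibull_survival(70, 2.3175, 73.344)
    print(f"Survival past 70 months: {100 * s70:.2f}%")
```

The same function evaluated at a percentile returns the complement of that percent; for example, survival at the 1st percentile is 0.99.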
2 In Estimate survival probabilities for these times (values), enter one or more times or a
column of times for which you want to calculate survival probabilities. Click OK.
Survival plots
Survival (or reliability) plots display the survival probabilities versus time. Each plot point
represents the proportion of units surviving at time t. The survival curve is surrounded by two
outer lines, the approximate 95.0% confidence interval for the curve, which provide reasonable
values for the true survival function.
[Figure: survival plot showing the survival curve and its 95% confidence interval]
turn off the 95.0% confidence interval: uncheck Display confidence intervals on above
plots. Click OK.
change the confidence level for the 95.0% confidence interval: first, click OK in the
Graphs subdialog box. Click Estimate. In Confidence level for confidence intervals,
enter a value. Click OK.
Hazard plots
The hazard plot displays the instantaneous failure rate for each time t. Often, the hazard rate is
high at the beginning of the plot, low in the middle of the plot, then high again at the end of the
plot. Thus, the curve often resembles the shape of a bathtub. The early period with high failure
rate is often called the infant mortality stage. The middle section of the curve, where the failure
rate is low, is the normal life stage. The end of the curve, where failure rate increases again, is
the wearout stage.
Note
MINITAB's distributions will not resemble a bathtub curve. The failures at different parts of
the bathtub curve are likely caused by different failure modes. MINITAB estimates the
distribution of the failure time caused by one failure mode.
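To see why a single fitted distribution cannot trace the whole bathtub, it can help to look at one hazard function directly. The following is a minimal sketch, assuming a two-parameter Weibull distribution with hazard h(t) = (shape/scale)(t/scale)^(shape - 1); the function name and the parameter values are illustrative only and are not MINITAB output.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate of a two-parameter Weibull distribution at time t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)

times = [10, 50, 100, 200]  # illustrative times, in arbitrary units

# One shape value gives one kind of curve: falling, flat, or rising.
for shape, label in [(0.5, "decreasing, like infant mortality"),
                     (1.0, "constant, like normal life"),
                     (3.0, "increasing, like wearout")]:
    rates = [weibull_hazard(t, shape, scale=100.0) for t in times]
    print(label, [round(r, 5) for r in rates])
```

Each shape value reproduces only one stage of the bathtub, which is consistent with the Note: a distribution fitted to one failure mode gives a hazard that falls, stays flat, or rises, but not all three.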
change the maximum number of iterations for reaching convergence (the default is 20).
MINITAB obtains maximum likelihood estimates through an iterative process. If the maximum
number of iterations is reached before convergence, the command terminates.
Why enter starting values for the algorithm? The maximum likelihood solution may not converge
if the starting estimates are not in the neighborhood of the true solution, so you may want to
specify what you think are good starting values for parameter estimates. In these cases, enter the
distribution parameters. For the Weibull distribution, enter the shape and scale. For the
exponential distribution, enter the scale. For all other distributions, enter the location and scale.
You can also choose to
estimate the scale parameter while keeping the shape fixed (Weibull and exponential
distributions)
estimate the location parameter while keeping the scale fixed (other distributions)
To estimate the distribution parameters from the data (the default), choose Estimate
parameters of distribution.
If you like, do any of the following:
Enter starting estimates for the parameters: In Use starting estimates, enter one
column of values to be used for all samples, or several columns of values that match the
order in which the corresponding variables appear in the Variables box in the main
dialog box.
Specify the maximum number of iterations: In Maximum number of iterations, enter
a positive integer.
To enter your own estimates for the distribution parameters, choose Use historical
estimates and enter one column of values to be used for all samples, or several columns of
values that match the order in which the corresponding variables appear in the Variables
box in the main dialog box.
3 Click OK.
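The roles of the starting estimate and the maximum number of iterations can be illustrated with a small iterative fit. This is a sketch only, assuming a Weibull model with right-censored data and a Newton-Raphson update on the profile score equation for the shape; it is not MINITAB's algorithm, and the function names, data, and tolerance are hypothetical.

```python
import math

def profile_score(shape, failures, censored):
    """Score of the Weibull profile log-likelihood in the shape, and its slope."""
    times = list(failures) + list(censored)
    s0 = sum(t ** shape for t in times)
    s1 = sum(t ** shape * math.log(t) for t in times)
    s2 = sum(t ** shape * math.log(t) ** 2 for t in times)
    mean_log_fail = sum(math.log(t) for t in failures) / len(failures)
    score = s1 / s0 - 1.0 / shape - mean_log_fail
    slope = (s2 * s0 - s1 ** 2) / s0 ** 2 + 1.0 / shape ** 2
    return score, slope

def weibull_mle(failures, censored, start_shape=1.0, max_iter=20, tol=1e-9):
    """Iterate from start_shape; stop at convergence or after max_iter iterations."""
    shape = start_shape
    for iteration in range(1, max_iter + 1):
        score, slope = profile_score(shape, failures, censored)
        new_shape = shape - score / slope      # Newton-Raphson step
        if new_shape <= 0:                     # safeguard: keep the shape positive
            new_shape = shape / 2.0
        converged = abs(new_shape - shape) < tol
        shape = new_shape
        if converged:
            break
    else:
        # Mirrors the documented behavior: terminate if the limit is reached first.
        raise RuntimeError("no convergence within max_iter iterations")
    times = list(failures) + list(censored)
    scale = (sum(t ** shape for t in times) / len(failures)) ** (1.0 / shape)
    return shape, scale, iteration

failures = [10, 20, 30, 40, 50]   # hypothetical exact failure times
censored = [50, 50]               # hypothetical right-censoring times
shape, scale, n_iter = weibull_mle(failures, censored)
print(round(shape, 4), round(scale, 2), n_iter)
```

A starting estimate far from the solution can need more iterations than the limit allows, which is the situation in which you would supply your own starting values or raise the maximum.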
To choose the method for estimating parameters
1 In the main dialog box, choose Estimate.
2 Under Estimation Method, choose Maximum Likelihood (the default) or Least Squares.
Click OK.
You can estimate the scale parameter while keeping the shape parameter fixed (Weibull and
exponential) or estimate the location parameter while keeping the scale fixed (other
distributions).
1 In the main dialog box, click Estimate.
2 Do one of the following:
Estimate the scale parameter while keeping the shape fixed (Weibull and exponential
distributions): In Set shape (Weibull) or scale (other distributions) at, enter one value to
be used for all samples, or a series of values that match the order in which the
corresponding variables appear in the Variables box in the main dialog box.
Estimate the location parameter while keeping the scale fixed (other distributions): In Set
shape (Weibull) or scale (other distributions) at, enter one value to be used for all
samples, or a series of values that match the order in which the corresponding variables
appear in the Variables box in the main dialog box.
3 Click OK.
Advantages of maximum likelihood (MLE):
Distribution parameter estimates are more precise than least squares (XY).
MLE allows you to perform an analysis when there are no failures. When there is only one
failure and some right-censored observations, the maximum likelihood parameter estimates
may exist for a Weibull distribution.
Advantages of least squares (LSXY):
Better graphical display to the probability plot because the line is fitted to the points on a
probability plot.
For small or heavily censored samples, LSXY is more accurate than MLE. MLE tends to
overestimate the shape parameter for a Weibull distribution and underestimate the scale
parameter in other distributions. Therefore, MLE will tend to overestimate the low
percentiles.
When possible, both methods should be tried; if the results are consistent, then there is more
support for your conclusions. Otherwise, you may want to use the more conservative estimates or
consider the advantages of both approaches and make a choice for your problem.
Example of a parametric distribution analysis with exact failure/right-censored data
Suppose you work for a company that manufactures engine windings for turbine assemblies.
Engine windings may decompose at an unacceptable rate at high temperatures. You decide to
look at failure times for engine windings at two temperatures, 80° C and 100° C. You want to find out
the following information for each temperature:
the times at which various percentages of the windings fail. You are particularly interested in
the 0.1st percentile.
the proportion of windings that survive past 70 months.
You also want to draw two plots: a probability plot to see if the lognormal base e distribution provides a
good fit for your data, and a survival plot.
In the first sample, you collect failure times (in months) for 50 windings at 80° C; in the second
sample, you collect failure times for 40 windings at 100° C. Some of the windings drop out of the
test for unrelated reasons. In the MINITAB worksheet, you use a column of censoring indicators
to designate which times are actual failures (1) and which are censored units removed from the
test before failure (0).
1 Open the worksheet RELIABLE.MTW.
2 Choose Stat > Reliability/Survival > Parametric Dist Analysis–Right Cens.
3 In Variables, enter Temp80 Temp100.
4 From Assumed distribution, choose Lognormal base e.
5 Click Censor. Choose Use censoring columns and enter Cens80 Cens100 in the box. Click
OK.
6 Click Estimate. In Estimate percentiles for these additional percents, enter .1.
7 In Estimate survival probabilities for these times (values), enter 70. Click OK.
8 Click Graphs. Check Survival plot. Click OK in each dialog box.
Session window output

Variable: Temp80

                         Count
Uncensored value            37
Right censored value        13

                          Standard      95.0% Normal CI
Parameter     Estimate       Error      Lower      Upper
Location       4.09267     0.07197    3.95161    4.23372
Scale          0.48622     0.06062    0.38080    0.62082

Log-Likelihood = -181.625

Goodness-of-Fit
Anderson-Darling = 67.2208

Characteristics of Distribution
                                        Standard      95.0% Normal CI
                            Estimate       Error      Lower      Upper
Mean(MTTF)                   67.4153      5.5525    57.3656    79.2255
Standard Deviation           34.8145      6.7983    23.7435    51.0476
Median                       59.8995      4.3109    52.0192    68.9735
First Quartile(Q1)           43.1516      3.2953    37.1531    50.1186
Third Quartile(Q3)           83.1475      7.3769    69.8763    98.9392
Interquartile Range(IQR)     39.9959      6.3332    29.3245    54.5505

Table of Percentiles
                         Standard      95.0% Normal CI
Percent   Percentile        Error      Lower      Upper
    0.1      13.3317       2.5156     9.2103    19.2975
    1.0      19.3281       2.8375    14.4953    25.7722
    2.0      22.0674       2.9256    17.0178    28.6154
    3.0      24.0034       2.9726    18.8304    30.5975
    4.0      25.5709       3.0036    20.3126    32.1906
    5.0      26.9212       3.0262    21.5978    33.5566
    6.0      28.1265       3.0440    22.7506    34.7727
    7.0      29.2276       3.0588    23.8074    35.8819
    8.0      30.2501       3.0717    24.7910    36.9113
    9.0      31.2110       3.0833    25.7170    37.8788
   10.0      32.1225       3.0941    26.5962    38.7970
   20.0      39.7837       3.2100    33.9646    46.5999
   30.0      46.4184       3.4101    40.1936    53.6073
   40.0      52.9573       3.7567    46.0833    60.8568
   50.0      59.8995       4.3109    52.0192    68.9735
Variable: Temp100

                         Count
Uncensored value            34
Right censored value         6

                          Standard      95.0% Normal CI
Parameter     Estimate       Error      Lower      Upper
Location        3.6287      0.1178     3.3978     3.8595
Scale          0.73094     0.09198    0.57117    0.93540

Log-Likelihood = -160.688

Goodness-of-Fit
Anderson-Darling = 16.4987

Characteristics of Distribution
                                        Standard      95.0% Normal CI
                            Estimate       Error      Lower      Upper
Mean(MTTF)                   49.1969      6.9176    37.3465    64.8076
Standard Deviation           41.3431     11.0416    24.4947    69.7806
Median                       37.6636      4.4362    29.8995    47.4439
First Quartile(Q1)           23.0044      2.9505    17.8910    29.5791
Third Quartile(Q3)           61.6643      8.4984    47.0677    80.7876
Interquartile Range(IQR)     38.6600      7.2450    26.7759    55.8185

Table of Percentiles
                         Standard      95.0% Normal CI
Percent   Percentile        Error      Lower      Upper
    0.1       3.9350       1.1729     2.1940     7.0577
    1.0       6.8776       1.6170     4.3383    10.9034
    2.0       8.3941       1.7942     5.5212    12.7619
    3.0       9.5253       1.9111     6.4283    14.1144
    4.0      10.4756       2.0015     7.2036    15.2338
    5.0      11.3181       2.0766     7.8995    16.2162
    6.0      12.0884       2.1419     8.5418    17.1076
    7.0      12.8069       2.2003     9.1453    17.9343
    8.0      13.4863       2.2538     9.7195    18.7129
    9.0      14.1354       2.3034    10.2707    19.4544
   10.0      14.7606       2.3502    10.8036    20.1667
   20.0      20.3589       2.7526    15.6197    26.5362
   30.0      25.6717       3.1662    20.1592    32.6916
   40.0      31.2967       3.6950    24.8316    39.4451
   50.0      37.6636       4.4362    29.8995    47.4439
Graph window output
You can find the 0.1st percentile, which you requested, within the Table of Percentiles. At 80° C,
0.1% of the windings fail by 13.3317 months; at 100° C, 0.1% of the windings fail by 3.9350
months. So the increase in temperature decreased the percentile by about 9.4 months.
What proportion of windings would you expect to still be running past 70 months? In the Table
of Survival Probabilities you find your answer. At 80° C, 37.43% survive past 70 months; at 100°
C, 19.82% survive.
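You can check these figures by hand from the parameter estimates. The sketch below assumes the lognormal base e parameterization in which the logarithm of the failure time is normal with mean equal to the location and standard deviation equal to the scale; the function names are illustrative, and the location and scale values are the rounded estimates from the Session window output for this example.

```python
import math
from statistics import NormalDist

def lognormal_percentile(percent, location, scale):
    """Time by which the given percent of units have failed."""
    z = NormalDist().inv_cdf(percent / 100.0)
    return math.exp(location + scale * z)

def lognormal_survival(time, location, scale):
    """Proportion of units still running past the given time."""
    z = (math.log(time) - location) / scale
    return 1.0 - NormalDist().cdf(z)

# (location, scale) estimates for each temperature, as printed in the output
estimates = {"80": (4.09267, 0.48622), "100": (3.6287, 0.73094)}

for temp, (location, scale) in estimates.items():
    p01 = lognormal_percentile(0.1, location, scale)
    s70 = lognormal_survival(70, location, scale)
    print(temp, round(p01, 2), round(s70, 4))
```

Because the inputs are rounded, the results agree with the Table of Percentiles and the survival probabilities quoted above only to about the precision shown.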
Example of parametric distribution analysis with arbitrarily censored data
Suppose you work for a company that manufactures tires. You are interested in finding out how
many miles it takes for various proportions of the tires to fail, or wear down to 2/32 of an inch of
tread. You are especially interested in knowing how many of the tires last past 45,000 miles.
You inspect each good tire at regular intervals (every 10,000 miles) to see if the tire has failed,
then enter the data into the MINITAB worksheet.
1 Open the worksheet TIREWEAR.MTW.
2 Choose Stat > Reliability/Survival > Parametric Dist Analysis–Arbitrary Cens.
3 In Start variables, enter Start. In End variables, enter End.
4 In Frequency columns, enter Freq.
5 From Assumed distribution, choose Extreme value.
6 Click Graphs. Check Survival plot, then click OK.
7 Click Estimate. In Estimate survival probabilities for these times (values), enter 45000.
                            Count
Right censored value           71
Interval censored value       694
Left censored value             8

                          Standard      95.0% Normal CI
Parameter     Estimate       Error      Lower      Upper
Location       77538.0       547.0    76465.8    78610.2
Scale          13972.0       445.0    13126.5    14872.1

Characteristics of Distribution
                                        Standard       95.0% Normal CI
                            Estimate       Error       Lower       Upper
Mean(MTTF)                  69473.32    646.6352    68205.94    70740.70
Standard Deviation          17919.83    570.7594    16835.36    19074.15
Median                      72417.04    599.5413    71241.97    73592.12
First Quartile(Q1)          60130.23    849.0361    58466.15    61794.31
Third Quartile(Q3)          82101.72    538.9283    81045.44    83158.00
Interquartile Range(IQR)    21971.49    699.8078    20641.82    23386.80

Table of Percentiles
                         Standard       95.0% Normal CI
Percent   Percentile        Error       Lower       Upper
      1     13264.55     2216.243    8920.791    17608.30
      2     23019.97     1916.275    19264.14    26775.80
      3     28756.49     1741.644    25342.93    32170.05
      4     32847.96     1618.183    29676.38    36019.54
      5     36038.31     1522.706    33053.87    39022.76
      6     38658.95     1444.905    35826.99    41490.91
      7     40886.63     1379.291    38183.26    43589.99
      8     42826.87     1322.593    40234.64    45419.11
      9     44547.76     1272.702    42053.31    47042.21
     10     46095.77     1228.182    43688.58    48502.97
     20     56580.77     939.3041    54739.76    58421.77
     30     63133.78     777.3208    61610.26    64657.30
     40     68152.58     670.9556    66837.54    69467.63
     50     72417.04     599.5413    71241.97    73592.12
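The question of how many tires last past 45,000 miles can be approximated from the location and scale estimates. The sketch below assumes that the extreme value distribution here is the smallest extreme value distribution, with survival function exp(-exp((t - location)/scale)); that assumption reproduces the median and the 1st percentile in the output above. The function names are illustrative.

```python
import math

def sev_survival(time, location, scale):
    """Survival function of the smallest extreme value distribution."""
    return math.exp(-math.exp((time - location) / scale))

def sev_percentile(percent, location, scale):
    """Time by which the given percent of units have failed."""
    p = percent / 100.0
    return location + scale * math.log(-math.log(1.0 - p))

location, scale = 77538.0, 13972.0  # rounded estimates from the output

print(round(sev_percentile(50, location, scale), 1))    # compare with the Median row
print(round(sev_survival(45000, location, scale), 4))   # proportion lasting past 45,000 miles
```

The survival probability this gives is a little above 0.90, which is consistent with the Table of Percentiles, where 45,000 miles falls between the 9th and 10th percentiles.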
Graph window output
Data
The nonparametric distribution analysis commands accept different kinds of data:
You can enter up to ten samples per analysis. For general information on life data and censoring,
see Distribution Analysis Data on page 15-4.
2 In Variables, enter the columns of failure times. You can enter up to ten columns (ten
different samples).
3 If you have frequency columns, enter the columns in Frequency columns.
4 If all of the samples are stacked in one column, enter a column of grouping indicators in By
variable. In Enter number of levels, enter the number of levels the indicator column
contains.
Note
5 Click Censor.
6 Do one of the following, then click OK.
For data with censoring columns: Choose Use censoring columns, then enter the
censoring columns in the box. The first censoring column is paired with the first data
column, the second censoring column is paired with the second data column, and so on.
If you like, enter the value you use to indicate a censored value in Censoring value. If you
do not enter a censoring value, MINITAB uses the lowest value in the censoring column by
default.
For time censored data: Choose Time censor at, then enter a failure time at which to
begin censoring. For example, entering 500 says that any observation from 500 time units
onward is considered censored.
For failure censored data: Choose Failure censor at, then enter a number of failures at
which to begin censoring. For example, entering 150 says to censor all (ordered)
observations starting with the 150th observed failure, and leave all other observations
uncensored.
7 If you like, use any of the options listed below, then click OK.
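The time censoring and failure censoring rules in step 6 amount to deriving a censoring indicator for each observation. The following is a minimal sketch of those two rules, using 1 for an exact failure and 0 for a censored observation as in the examples in this chapter; the function names and data are hypothetical and this is not MINITAB code.

```python
def time_censor(times, censor_at):
    """Censor (0) every observation at or beyond censor_at; others are failures (1)."""
    return [0 if t >= censor_at else 1 for t in times]

def failure_censor(times, censor_at_failure):
    """Censor (0) the ordered observations starting with the k-th; earlier ones are failures (1)."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    indicators = [1] * len(times)
    for rank, index in enumerate(order, start=1):
        if rank >= censor_at_failure:
            indicators[index] = 0
    return indicators

print(time_censor([120, 500, 730, 499], censor_at=500))       # [1, 0, 0, 1]
print(failure_censor([30, 10, 20, 40], censor_at_failure=3))  # [0, 1, 1, 0]
```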
2 In Start variables, enter the columns of start times. You can enter up to ten columns (ten
different samples).
3 In End variables, enter the columns of end times. You can enter up to ten columns (ten
different samples).
4 When you have frequency columns, enter the columns in Frequency columns.
5 If all of the samples are stacked in one column, enter a column of grouping indicators in By
variable. In Enter number of levels, enter the number of levels the indicator column
contains.
6 If you like, use any of the options described below, then click OK.
Options
Estimate subdialog box
specify a confidence level for all confidence intervals. The default is 95.0%.
choose to calculate two-sided confidence intervals, or lower or upper bounds. The default is
two-sided.
draw a survival plot, with or without confidence intervals; see Nonparametric survival plots on
page 15-61.
Output
The nonparametric distribution analysis output differs depending on whether your data are
uncensored/right censored or arbitrarily censored.
When your data are uncensored/right censored you get
characteristics of the variable, which includes the mean, its standard error and 95%
confidence intervals, median, interquartile range, Q1, and Q3
Turnbull estimates of the survival probabilities and their standard errors and 95% confidence
intervals
Survival probabilities
What is the probability of an engine winding running past a given time? How likely is it that a
cancer patient will live five years after receiving a certain drug? You are looking for survival
probabilities. Survival probabilities estimate the proportion of units surviving at time t.
You can choose various estimation methods, depending on the command.
To plot the survival probabilities versus time, see Nonparametric survival plots on page 15-61.
Kaplan-Meier survival estimates (Nonparametric Distribution Analysis–Right
Censoring only)
With Nonparametric Distribution Analysis–Right Censoring, the default output includes the
characteristics of the variable, and a table of Kaplan-Meier survival estimates. You can also
request hazard estimates (empirical hazard function) in the Results subdialog box.
Characteristics of Variable
               Standard      95.0% Normal CI
Mean(MTTF)        Error      Lower      Upper
   55.7000       2.2069    51.3746    60.0254
Median = 55.0000   IQR = *   Q1 = 48.0000   Q3 = *

Kaplan-Meier Estimates
            Number    Number      Survival   Standard     95.0% Normal CI
   Time    at Risk    Failed   Probability      Error     Lower     Upper
23.0000         50         1        0.9800     0.0198    0.9412    1.0000
24.0000         49         1        0.9600     0.0277    0.9057    1.0000
27.0000         48         2        0.9200     0.0384    0.8448    0.9952
31.0000         46         1        0.9000     0.0424    0.8168    0.9832
34.0000         45         1        0.8800     0.0460    0.7899    0.9701
35.0000         44         1        0.8600     0.0491    0.7638    0.9562
etc.
Characteristics of Variable displays common measures of the center and spread of the
distribution. The mean is not resistant to large lifetimes, while the median, Q1 (25th
percentile), Q3 (75th percentile) and the IQR (interquartile range) are resistant.
For each failure time t, MINITAB also displays the number of units at risk, the number failed, and
the standard error and 95.0% confidence interval for the survival probabilities.
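The columns of the Kaplan-Meier table can be reproduced from the number at risk and the number failed at each time. The sketch below assumes the usual product-limit estimate, Greenwood's formula for the standard error, and a normal-approximation interval truncated to the range 0 to 1; these choices match the six rows shown above to four decimals, but the function is illustrative rather than MINITAB's implementation.

```python
import math

def kaplan_meier(rows, z=1.96):
    """rows: (time, number at risk, number failed), in time order.

    Returns (time, survival, standard error, lower, upper) for each row.
    """
    survival = 1.0
    greenwood = 0.0
    results = []
    for time, at_risk, failed in rows:
        survival *= (at_risk - failed) / at_risk
        if at_risk > failed:  # Greenwood's sum is undefined once no units remain
            greenwood += failed / (at_risk * (at_risk - failed))
        se = survival * math.sqrt(greenwood)
        lower = max(0.0, survival - z * se)
        upper = min(1.0, survival + z * se)
        results.append((time, survival, se, lower, upper))
    return results

# First six rows of the engine windings table at 80 degrees C
rows = [(23, 50, 1), (24, 49, 1), (27, 48, 2), (31, 46, 1), (34, 45, 1), (35, 44, 1)]
for time, s, se, lo, hi in kaplan_meier(rows):
    print(time, round(s, 4), round(se, 4), round(lo, 4), round(hi, 4))
```

Note that the interval is simply cut off at 1.0000 when the upper limit would exceed it, as in the first two rows of the table.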
Additional output
You can request this additional output in the Results subdialog box:
Empirical Hazard Function
              Hazard
   Time    Estimates
23.0000      0.02000
24.0000      0.02041
27.0000      0.02128
31.0000      0.02174
34.0000      0.02222
etc.
Hazard Estimates are measures of the instantaneous failure rate for each time t.
Turnbull Estimates
      Interval           Probability   Standard
   Lower      Upper       of Failure      Error
       *   10000.00           0.0103     0.0036
10000.00   20000.00           0.0129     0.0041
20000.00   30000.00           0.0181     0.0048
30000.00   40000.00           0.0323     0.0064
40000.00   50000.00           0.0479     0.0077
50000.00   60000.00           0.1125     0.0114
60000.00   70000.00           0.1876     0.0140
70000.00   80000.00           0.2988     0.0165
80000.00   90000.00           0.1876     0.0140
90000.00          *           0.0918          *

              Survival   Standard     95.0% Normal CI
    Time   Probability      Error     lower     upper
10000.00        0.9897     0.0036    0.9825    0.9968
20000.00        0.9767     0.0054    0.9661    0.9873
30000.00        0.9586     0.0072    0.9446    0.9726
40000.00        0.9263     0.0094    0.9078    0.9447
50000.00        0.8784     0.0118    0.8554    0.9014
60000.00        0.7658     0.0152    0.7360    0.7957
70000.00        0.5783     0.0178    0.5435    0.6131
80000.00        0.2794     0.0161    0.2478    0.3111
90000.00        0.0918     0.0104    0.0715    0.1122

At 40,000 miles, 92.63% of the tires have survived.
The Probability of Failure column contains estimates of the probability of failing during the
interval.
The Survival Probability column contains estimates of the proportion of units still surviving at
time t (in our case, the number of miles).
For each time t, the table also displays the standard errors for both the probability of failures and
survival probabilities, and 95.0% approximate confidence intervals for the survival probabilities.
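The two columns are linked: the survival probability at the end of an interval is one minus the sum of the failure probabilities up to that point. A minimal sketch of that relationship, using the rounded Probability of Failure values from the tire example (the variable names are illustrative):

```python
# Probability of failure in each interval, in order; the last interval is open-ended.
failure_probs = [0.0103, 0.0129, 0.0181, 0.0323, 0.0479,
                 0.1125, 0.1876, 0.2988, 0.1876, 0.0918]
# Upper endpoints (miles) of the closed intervals.
upper_endpoints = [10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000]

survival = []
remaining = 1.0
for miles, prob in zip(upper_endpoints, failure_probs):
    remaining -= prob                 # proportion still surviving at this mileage
    survival.append((miles, remaining))

for miles, prob in survival:
    print(miles, round(prob, 4))
```

Because the printed failure probabilities are rounded to four decimals, the running totals can differ from the Survival Probability column in the last digit.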
Actuarial survival estimates
Instead of the default Kaplan-Meier or Turnbull survival estimates, you can request Actuarial
estimates in the Estimate subdialog box. Actuarial output includes median residual lifetimes,
conditional probabilities of failure, and survival probabilities. When using the actuarial method,
you can also request hazard estimates and density estimates in the Results subdialog box.
With Nonparametric Distribution Analysis–Right Censoring, you can request specific time
intervals. In this example, we requested equally spaced time intervals from 0 to 110, in increments
of 20:

Characteristics of Variable
           Standard     95.0% Normal CI
 Median       Error     lower     upper
56.1905      3.3672   49.5909   62.7900

Additional Time from Time T until 50% of Running Units Fail
          Proportion of   Additional   Standard     95.0% Normal CI
 Time T   Running Units         Time      Error     lower     upper
20.0000          1.0000      36.1905     3.3672   29.5909   42.7900
40.0000          0.8400      20.0000     3.0861   13.9514   26.0486

Actuarial Table
                                                       Conditional
      Interval           Number    Number     Number   Probability   Standard
   lower      upper    Entering    Failed   Censored    of Failure      Error
0.000000    20.0000          50         0          0        0.0000     0.0000
 20.0000    40.0000          50         8          0        0.1600     0.0518
 40.0000    60.0000          42        21          0        0.5000     0.0772
 60.0000    80.0000          21         8          4        0.4211     0.1133
 80.0000   100.0000           9         0          6        0.0000     0.0000
100.0000   120.0000           3         0          3        0.0000     0.0000

              Survival   Standard     95.0% Normal CI
    Time   Probability      Error     lower     upper
 20.0000        1.0000     0.0000    1.0000    1.0000
 40.0000        0.8400     0.0518    0.7384    0.9416
 60.0000        0.4200     0.0698    0.2832    0.5568
 80.0000        0.2432     0.0624    0.1208    0.3655
100.0000        0.2432     0.0624    0.1208    0.3655
Characteristics of Variable displays the median, its standard error, and 95% confidence interval.
Additional Time from Time T Until 50% of Running Units Fail
Additional Time contains the median residual lifetimes, which estimate the additional time
from Time t until half of the running units fail. For example, at 40 months, it will take an
estimated additional 20 months until 42% (1/2 of 84%) of the running units fail.
Actuarial Table
Survival Probability displays the survival probabilities, which estimate the probability that a
unit is running at a given time. For example, 0.8400 of the units are running at 40 months.
For each estimate, MINITAB displays the associated standard errors and, for the survival
probabilities, 95.0% approximate confidence intervals.
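The conditional probabilities of failure and the survival probabilities in the Actuarial Table can be recomputed from the interval counts. The sketch below assumes the standard life-table convention that units censored during an interval count as exposed for half of it, and a binomial standard error for the conditional probability; both assumptions reproduce the values in this example, but the function itself is illustrative.

```python
import math

# (lower, upper, number entering, number failed, number censored)
intervals = [
    (0, 20, 50, 0, 0),
    (20, 40, 50, 8, 0),
    (40, 60, 42, 21, 0),
    (60, 80, 21, 8, 4),
    (80, 100, 9, 0, 6),
    (100, 120, 3, 0, 3),
]

def actuarial_table(intervals):
    """Conditional failure probability per interval and survival at each upper endpoint."""
    survival = 1.0
    rows = []
    for lower, upper, entering, failed, censored in intervals:
        exposed = entering - censored / 2.0       # censored units count as half exposed
        cond_fail = failed / exposed if exposed > 0 else 0.0
        cond_se = math.sqrt(cond_fail * (1.0 - cond_fail) / exposed) if exposed > 0 else 0.0
        survival *= 1.0 - cond_fail               # survival at the end of the interval
        rows.append({"time": upper, "cond_fail": cond_fail,
                     "cond_se": cond_se, "survival": survival})
    return rows

for row in actuarial_table(intervals):
    print(row["time"], round(row["cond_fail"], 4),
          round(row["cond_se"], 4), round(row["survival"], 4))
```

For the 60 to 80 interval, for example, 21 units enter and 4 are censored, so 19 are treated as exposed and the conditional probability of failure is 8/19.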
Additional output
You can request this additional output in the Results subdialog box:
              Hazard   Standard     Density   Standard
    Time   Estimates      Error   Estimates      Error
 10.0000    0.000000          *    0.000000          *
 30.0000    0.008696   0.003063    0.008000   0.002592
 50.0000     0.03333   0.006858     0.02100   0.003490
 70.0000     0.02667   0.009087    0.008842   0.002796
 90.0000    0.000000          *    0.000000          *
110.0000    0.000000          *    0.000000          *
Hazard Estimates estimate the hazard function at the midpoint of the interval. The hazard
function is a measure of the instantaneous failure rate for each time t.
Density Estimates estimate the density function at the midpoint of the interval. The density
function describes the distribution of failure times.
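The hazard and density estimates at the interval midpoints follow from the same interval counts as the Actuarial Table. The sketch below assumes the standard life-table formulas: the hazard is the number failed divided by the interval width times the exposed units less half the failures, and the density is the survival probability at the start of the interval times the conditional probability of failure, divided by the width. These reproduce the estimates shown above; the function name is illustrative.

```python
# (lower, upper, number entering, number failed, number censored)
intervals = [
    (0, 20, 50, 0, 0),
    (20, 40, 50, 8, 0),
    (40, 60, 42, 21, 0),
    (60, 80, 21, 8, 4),
    (80, 100, 9, 0, 6),
    (100, 120, 3, 0, 3),
]

def hazard_and_density(intervals):
    """Return (midpoint, hazard estimate, density estimate) for each interval."""
    survival_at_start = 1.0
    rows = []
    for lower, upper, entering, failed, censored in intervals:
        width = upper - lower
        exposed = entering - censored / 2.0       # censored units count as half exposed
        cond_fail = failed / exposed if exposed > 0 else 0.0
        at_risk_mid = exposed - failed / 2.0      # average number at risk in the interval
        hazard = failed / (width * at_risk_mid) if at_risk_mid > 0 else 0.0
        density = survival_at_start * cond_fail / width
        rows.append(((lower + upper) / 2.0, hazard, density))
        survival_at_start *= 1.0 - cond_fail
    return rows

for midpoint, hazard, density in hazard_and_density(intervals):
    print(midpoint, round(hazard, 6), round(density, 6))
```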
click OK:
use equally spaced time intervals: choose 0 to_by_ and enter numbers in the boxes. For
example, 0 to 100 by 20 gives you these time intervals: 0–20, 20–40, and so on up to
80–100.
use unequally spaced time intervals: choose Enter endpoints of intervals, and enter a
series of numbers, or a column of numbers, in the box. For example, entering 0 4 6 8 10 20
30 gives you these time intervals: 0–4, 4–6, 6–8, 8–10, 10–20, and 20–30.
More
To display hazard and density estimates in the Actuarial table, from the main dialog box,
click Results. Do one of the following, then click OK:
You can interpret the nonparametric survival curve in a similar manner as you would the
parametric survival curve on page 15-40. The major difference is that the nonparametric survival
curve is a step function while the parametric survival curve is a smoothed function.
To draw a nonparametric survival plot, check Survival plot in the Graphs subdialog box. By
default, the survival plot uses Kaplan-Meier (Nonparametric Distribution Analysis–Right
Censoring) or Turnbull (Nonparametric Distribution Analysis–Arbitrary Censoring) estimates
of the survival function. If you want to plot Actuarial estimates, choose Actuarial method in the
Estimate subdialog box. See To request actuarial estimates on page 15-60.
More
DF    P-Value
 1     0.0055
 1     0.0003
This table contains measures that tell you if the survival curves for various samples are
significantly different. A p-value < α indicates that the survival curves are significantly different.
To get more detailed log-rank and Wilcoxon statistics, choose In addition, hazard, density
(actuarial method) estimates and log-rank and Wilcoxon statistics in the Results subdialog box.
Hazard plots
Nonparametric hazard estimates are calculated various ways:
For a general description, see Hazard plots on page 15-41. For computations, see Help.
To draw a hazard plot (Nonparametric Distribution Analysis–Right Censoring
command)
1 In the Nonparametric Distribution Analysis–Right Censoring dialog box, click Graphs.
More
To draw a hazard plot (Nonparametric Distribution Analysis–Arbitrary Censoring
command)
1 In the Nonparametric Distribution Analysis–Arbitrary Censoring dialog box, click Estimate.
2 Click Graphs.
Example of a nonparametric distribution analysis with exact failure/right-censored data
Suppose you work for a company that manufactures engine windings for turbine assemblies.
Engine windings may decompose at an unacceptable rate at high temperatures. You decide to
look at failure times for engine windings at two temperatures, 80° C and 100° C. You want to find
out the following information for each temperature:
You also want to know whether or not the survival curves at the two temperatures are
significantly different.
In the first sample, you collect times to failure for 50 windings at 80° C; in the second sample,
you collect times to failure for 40 windings at 100° C. Some of the windings drop out of the test
for unrelated reasons. In the MINITAB worksheet, you use a column of censoring indicators to
designate which times are actual failures (1) and which are censored units removed from the test
before failure (0).
1 Open the worksheet RELIABLE.MTW.
2 Choose Stat > Reliability/Survival > Nonparametric Dist Analysis–Right Cens.
3 In Variables, enter Temp80 Temp100.
4 Click Censor. Choose Use censoring columns and enter Cens80 Cens100 in the box. Click
OK.
5 Click Graphs. Check Survival plot and Display confidence intervals on plot. Click OK in
Session window output
Variable: Temp80

                         Count
Uncensored value            37
Right censored value        13

Nonparametric Estimates

Characteristics of Variable
               Standard      95.0% Normal CI
Mean(MTTF)        Error      Lower      Upper
   55.7000       2.2069    51.3746    60.0254
Median = 55.0000   IQR = *   Q1 = 48.0000   Q3 = *

Kaplan-Meier Estimates
            Number    Number      Survival   Standard     95.0% Normal CI
   Time    at Risk    Failed   Probability      Error     Lower     Upper
23.0000         50         1        0.9800     0.0198    0.9412    1.0000
24.0000         49         1        0.9600     0.0277    0.9057    1.0000
27.0000         48         2        0.9200     0.0384    0.8448    0.9952
31.0000         46         1        0.9000     0.0424    0.8168    0.9832
34.0000         45         1        0.8800     0.0460    0.7899    0.9701
35.0000         44         1        0.8600     0.0491    0.7638    0.9562
37.0000         43         1        0.8400     0.0518    0.7384    0.9416
40.0000         42         1        0.8200     0.0543    0.7135    0.9265
41.0000         41         1        0.8000     0.0566    0.6891    0.9109
45.0000         40         1        0.7800     0.0586    0.6652    0.8948
Variable: Temp100

                         Count
Uncensored value            34
Right censored value         6

Nonparametric Estimates

Characteristics of Variable
               Standard      95.0% Normal CI
Mean(MTTF)        Error      Lower      Upper
   41.6563       3.4695    34.8561    48.4564
Median = 38.0000   IQR = 30.0000   Q1 = 24.0000   Q3 = 54.0000

Kaplan-Meier Estimates
            Number    Number      Survival   Standard     95.0% Normal CI
   Time    at Risk    Failed   Probability      Error     Lower     Upper
 6.0000         40         1        0.9750     0.0247    0.9266    1.0000
10.0000         39         1        0.9500     0.0345    0.8825    1.0000
11.0000         38         1        0.9250     0.0416    0.8434    1.0000
14.0000         37         1        0.9000     0.0474    0.8070    0.9930
16.0000         36         1        0.8750     0.0523    0.7725    0.9775
18.0000         35         3        0.8000     0.0632    0.6760    0.9240
22.0000         32         1        0.7750     0.0660    0.6456    0.9044
24.0000         31         1        0.7500     0.0685    0.6158    0.8842
25.0000         30         1        0.7250     0.0706    0.5866    0.8634
27.0000         29         1        0.7000     0.0725    0.5580    0.8420
29.0000         28         1        0.6750     0.0741    0.5299    0.8201
30.0000         27         1        0.6500     0.0754    0.5022    0.7978
32.0000         26         1        0.6250     0.0765    0.4750    0.7750
35.0000         25         1        0.6000     0.0775    0.4482    0.7518
DF    P-Value
 1     0.0055
 1     0.0003
Graph window output
Suppose you work for a company that manufactures tires. You are interested in finding out how
likely it is that a tire will fail, or wear down to 2/32 of an inch of tread, within given mileage
intervals. You are especially interested in knowing how many of the tires last past 45,000 miles.
You inspect each good tire at regular intervals (every 10,000 miles) to see if the tire fails, then
enter the data into the MINITAB worksheet.
1 Open the worksheet TIREWEAR.MTW.
2 Choose Stat > Reliability/Survival > Nonparametric Dist Analysis–Arbitrary Cens.
3 In Start variables, enter Start. In End variables, enter End.
4 In Frequency columns, enter Freq, then click OK.
Session window output
                            Count
Right censored value           71
Interval censored value       694
Left censored value             8

Turnbull Estimates
      Interval           Probability   Standard
   Lower      Upper       of Failure      Error
       *   10000.00           0.0103     0.0036
10000.00   20000.00           0.0129     0.0041
20000.00   30000.00           0.0181     0.0048
30000.00   40000.00           0.0323     0.0064
40000.00   50000.00           0.0479     0.0077
50000.00   60000.00           0.1125     0.0114
60000.00   70000.00           0.1876     0.0140
70000.00   80000.00           0.2988     0.0165
80000.00   90000.00           0.1876     0.0140
90000.00          *           0.0918          *

              Survival   Standard     95.0% Normal CI
    Time   Probability      Error     Lower     Upper
10000.00        0.9897     0.0036    0.9825    0.9968
20000.00        0.9767     0.0054    0.9661    0.9873
30000.00        0.9586     0.0072    0.9446    0.9726
40000.00        0.9263     0.0094    0.9078    0.9447
50000.00        0.8784     0.0118    0.8554    0.9014
60000.00        0.7658     0.0152    0.7360    0.7957
70000.00        0.5783     0.0178    0.5435    0.6131
80000.00        0.2794     0.0161    0.2478    0.3111
90000.00        0.0918     0.0104    0.0715    0.1122
References
[1] R.B. D'Agostino and M.A. Stephens (1986). Goodness-of-Fit Techniques, Marcel Dekker.
[2] J.D. Kalbfleisch and R.L. Prentice (1980). The Statistical Analysis of Failure Time Data, John
Wiley & Sons.
[3] D. Kececioglu (1991). Reliability Engineering Handbook, Vols 1 and 2, Prentice Hall.
[4] J.F. Lawless (1982). Statistical Models and Methods for Lifetime Data, John Wiley & Sons,
Inc.
[5] W.Q. Meeker and L.A. Escobar (1998). Statistical Methods for Reliability Data, John Wiley
& Sons, Inc.
[6] W. Murray, Ed. (1972). Numerical Methods for Unconstrained Optimization, Academic
Press.
[7] W. Nelson (1982). Applied Life Data Analysis, John Wiley & Sons.
[8] R. Peto (1973). Experimental Survival Curves for Interval-censored Data, Applied Statistics
22, pp. 86-91.
[9] B.W. Turnbull (1976). The Empirical Distribution Function with Arbitrarily Grouped,
Censored and Truncated Data, Journal of the Royal Statistical Society 38, pp. 290-295.
[10] B.W. Turnbull (1974). Nonparametric Estimation of a Survivorship Function with Doubly
Censored Data, Journal of the American Statistical Association 69, 345, pp. 169-173.
16 Regression with Life Data
Accelerated Life Testing performs a simple regression with one predictor that is used to model
failure times for highly reliable products. The predictor is an accelerating variable; its levels
exceed those normally found in the field. The data obtained under the high stress conditions
can then be used to extrapolate back to normal use conditions. In order to do this, you must
have a good model of the relationship between failure time and the accelerating variable.
Regression with Life Data performs a regression with one or more predictors. The model can
include factors, covariates, interactions, and nested terms. This model will help you
understand how different factors and covariates affect the lifetime of your part or product.
Both regression with life data commands differ from other regression commands in MINITAB in
that they use different distributions and accept censored data. You can choose to model your data
on one of the following eight distributions: Weibull, extreme value, exponential, normal,
lognormal base e, lognormal base 10, logistic, and loglogistic.
Life data are often incomplete or censored in some way. Censored observations are those for which
an exact failure time is unknown. Suppose you are testing how long a product lasts and you plan
to end the study after a certain amount of time. Any products that have not failed before the study
ends are right-censored, meaning that you know only that the failure occurs sometime after the
end of the study. Similarly, you may only know that a product failed before a certain time, which
is left-censored. Failure times that are known only to fall within a certain interval of time are
interval-censored.
MINITAB uses a modified Newton-Raphson algorithm to calculate maximum likelihood estimates
of the model parameters.
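To make the idea of iterative maximum likelihood concrete, here is a minimal Python sketch of a plain Newton-Raphson iteration. It is an illustration under stated assumptions, not MINITAB's modified algorithm: it fits only a one-parameter exponential model to right-censored data, works on the log of the failure rate, and uses a hypothetical function name, starting value, and tolerance. The sample data are the eight times and F/C indicators from the uncensored/right censored worksheet illustration in this chapter.

```python
import math

def fit_exponential_rate(times, failed, max_iter=20, tol=1e-10):
    """Maximum likelihood failure rate for right-censored exponential data.

    times  : failure or censoring times
    failed : True for an exact failure, False for a right-censored unit
    """
    d = sum(failed)        # number of exact failures
    total = sum(times)     # total time on test, censored units included
    if d == 0:
        raise ValueError("at least one exact failure is needed")
    theta = math.log(len(times) / total)  # crude starting value for log(rate)
    for _ in range(max_iter):
        rate = math.exp(theta)
        score = d - total * rate    # first derivative of the log-likelihood in theta
        hessian = -total * rate     # second derivative (always negative)
        step = -score / hessian     # Newton-Raphson update
        theta += step
        if abs(step) < tol:
            return math.exp(theta)
    raise RuntimeError("no convergence within max_iter iterations")

times = [53, 60, 53, 40, 51, 99, 35, 53]
failed = [True, True, True, True, True, False, True, True]
rate = fit_exponential_rate(times, failed)
print(rate)
```

For this simple model the answer can be checked by hand, because the maximum likelihood rate is the number of failures divided by the total time on test (7/444). The models in this chapter have several parameters, so the same idea is applied to a vector of coefficients rather than a single number.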
Worksheet Structure for Regression with Life Data
- censoring indicators (for the failure times, if needed); see Uncensored/right censored data on
  page 16-4
- predictor variables
  For Accelerated Life Testing, enter one predictor column containing various levels of an
  accelerating variable. For example, an accelerating variable may be stresses or catalysts
  whose levels exceed normal operating conditions.
  For Regression with Life Data, enter one or more predictor columns. These predictor
  variables may be treated as factors or covariates in the model. For more information, see
  How to specify the model terms on page 16-23.
Structure each column so that it contains individual observations (one row = one observation),
or unique observations with a corresponding column of frequencies. Frequency columns are
useful when you have large numbers of data with common failure or censoring times, and
identical predictor values. Here is the same worksheet structured both ways:
Raw Data: one row for each observation

C1        C2      C3      C4
Response  Censor  Factor  Covar
29        F       1       12
31        F       1       12
31        F       1       12
.         .       .       .
.         .       .       .
.         .       .       .
37        F       1       12
37        C       2       12
41        F       2       12
.         .       .       .
.         .       .       .
.         .       .       .

Frequency Data: one row for each unique observation

C1        C2      C3     C4      C5
Response  Censor  Covar  Factor  Count
29        F       12     1       1
31        F       12     1       19
37        F       12     1       1
37        C       12     2       1
41        F       12     2       19
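If your data start out as one row per unit, the frequency layout can be produced by counting identical rows. The Python sketch below shows the idea with a handful of hypothetical rows. It is not a MINITAB feature, and the function name and sample values are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical raw rows: (response, censor, factor, covar), one tuple per unit.
raw_rows = [
    (29, "F", 1, 12),
    (31, "F", 1, 12),
    (31, "F", 1, 12),
    (37, "F", 1, 12),
    (37, "C", 2, 12),
    (41, "F", 2, 12),
    (41, "F", 2, 12),
]

def to_frequency_rows(rows):
    """Collapse identical rows into one row each, with the count appended."""
    counts = Counter(rows)
    return [row + (n,) for row, n in sorted(counts.items())]

frequency_rows = to_frequency_rows(raw_rows)
for row in frequency_rows:
    print(row)
```

Rows are merged only when the response, the censoring indicator, and every predictor value all match, which is the condition under which a frequency column is useful.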
Text categories (factor levels) are processed in alphabetical order by default. If you wish, you can
define your own order; see Ordering Text Categories in the Manipulating Data chapter of
MINITAB User's Guide 1 for details.
The way you set up the worksheet depends on the type of censoring you have, as described in
Failure times on page 16-4.
Failure times
The response data you gather for the commands in this chapter are the individual failure times.
For example, you might collect failure times for units running at a given temperature. You might
also collect samples under different temperatures, or under varying conditions of any
combination of accelerating variables. Individual failure times are the same type of data used for
Distribution Analysis on page 15-1.
Life data is often censored or incomplete in some way. Suppose you are monitoring air
conditioner fans to find out the percentage of fans which fail within a three-year warranty period.
The table below describes the types of observations you can have:
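Type of observation    Description
Right censored         You know only that the failure occurred after a particular time.
Left censored          You know only that the failure occurred before a particular time.
Interval censored      You know only that the failure occurred between two particular times.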
How you set up your worksheet depends, in part, on the type of censoring you have:
When your data consist of exact failures and right-censored observations, see Uncensored/right
censored data on page 16-4.
When your data have a varied censoring scheme, see Uncensored/arbitrarily censored data on
page 16-5.
Uncensored/right censored data

Time   Censor
53     F
60     F
53     F
40     F
51     F
99     C
35     F
53     F
...    ...
etc.   etc.
Censoring indicators can be numbers, text, or date/time values. If you do not specify which value
indicates censoring in the Censor subdialog box, MINITAB assumes the lower of the two values
indicates censoring, and the higher of the two values indicates an exact failure.
The data column and associated censoring column must be the same length, although pairs of
data/censor columns (each pair corresponds to a sample) can have different lengths.
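The default rule for censoring indicators can be sketched in a few lines of Python. This is an illustration of the rule as stated here, not MINITAB code. The function name and the optional censoring_value argument (standing in for the Censoring value box) are assumptions, and the sketch requires exactly two distinct indicator values when it has to apply the default.

```python
def censoring_flags(indicators, censoring_value=None):
    """Return True for each censored observation, False for each exact failure."""
    distinct = set(indicators)
    if censoring_value is None:
        if len(distinct) != 2:
            raise ValueError("the default rule needs exactly two distinct values")
        censoring_value = min(distinct)  # default: the lower value marks censoring
    return [value == censoring_value for value in indicators]

print(censoring_flags(["F", "F", "C", "F"]))   # "C" sorts before "F"
print(censoring_flags([1, 0, 1]))              # 0 is lower than 1
print(censoring_flags(["F", "C"], censoring_value="F"))
```

With text indicators the lower value is the one that comes first alphabetically, so a column of C and F values treats C as censored unless you say otherwise.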
Uncensored/arbitrarily censored data
When you have any combination of exact failure times, right-, left- and interval-censored data,
enter your data using a Start column and End column:
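For this observation...   Enter in the Start column...            Enter in the End column...
Exact failure time        failure time                            failure time
Right censored            time after which the failure occurred   the missing value symbol '*'
Left censored             the missing value symbol '*'            time before which the failure occurred
Interval censored         time at the start of the interval       time at the end of the interval

Start    End      Frequency
*        10000    20
10000    20000    10
20000    30000    10
30000    30000    2
30000    40000    20
40000    50000    40
50000    50000    7
50000    60000    50
60000    70000    120
70000    80000    230
80000    90000    310
90000    *        190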
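The Start/End convention can be checked mechanically. The Python sketch below classifies one observation from its Start and End values. It assumes that a missing Start means left censoring, a missing End means right censoring, and equal values mean an exact failure. None stands in for the missing value symbol '*', and the function name is hypothetical.

```python
def classify(start, end):
    """Classify one Start/End pair; None stands for the missing value symbol '*'."""
    if start is None and end is None:
        raise ValueError("Start and End cannot both be missing")
    if start is None:
        return "left censored"        # failed sometime before `end`
    if end is None:
        return "right censored"       # still running at `start`
    if start == end:
        return "exact failure"
    if start < end:
        return "interval censored"    # failed between `start` and `end`
    raise ValueError("Start must not exceed End")

rows = [(None, 10000), (10000, 20000), (30000, 30000), (90000, None)]
for start, end in rows:
    print(start, end, classify(start, end))
```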
Accelerated Life Testing
process. The variable is thus called the accelerating variable. Accelerated tests are performed to
save time and money, since, under normal field conditions, it can take a very long time for a unit
to fail. Accelerated life testing requires knowledge of the relationship between the accelerating
variable and failure time.
Here are the steps:
1 Impose levels of the accelerating variable on the units.
2 Record the failure (or censoring) times.
3 Run the Accelerated Life Testing analysis, asking MINITAB to extrapolate to the design value,
or common field condition. This way, you can find out how the units behave under normal
field conditions.
You can request an Arrhenius, inverse temperature, log base e, or log base 10 transformation, or
no transformation, for the accelerating variable. By default, MINITAB assumes the relationship is
linear (no transformation).
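As a rough guide to what these transformations do to the predictor, here is a Python sketch. The two log transformations are standard. The constants in the Arrhenius and inverse temperature forms (11604.83 and 273.16, with temperature entered in degrees Celsius) are assumptions that are not stated in this section, so treat them as illustrative rather than as MINITAB's documented definition.

```python
import math

KELVIN_OFFSET = 273.16          # assumed offset from degrees Celsius to kelvin
ARRHENIUS_CONSTANT = 11604.83   # assumed constant in the Arrhenius form

# Each entry maps a level of the accelerating variable to the value used
# as the predictor in the regression.
TRANSFORMS = {
    "linear": lambda x: x,
    "arrhenius": lambda celsius: ARRHENIUS_CONSTANT / (celsius + KELVIN_OFFSET),
    "inverse temperature": lambda celsius: 1.0 / (celsius + KELVIN_OFFSET),
    "log base e": math.log,
    "log base 10": math.log10,
}

for name, transform in TRANSFORMS.items():
    print(f"{name:>20}: {transform(110.0):.6f}")
```

Because the Arrhenius and inverse temperature forms decrease as temperature rises, a positive regression coefficient on the transformed predictor corresponds to shorter life at higher temperature.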
The simplest output includes a regression table, relation plot, and probability plot for each level
of the accelerating variable based on the fitted model. The relation plot displays the relationship
between the accelerating variable and failure time by plotting percentiles for each level of the
accelerating variable. By default, lines are drawn at the 10th, 50th, and 90th percentiles. The
50th percentile is a good estimate for the time a part will last when exposed to various levels of the
accelerating variable. The probability plot is created for each level of the accelerating variable
based on the fitted model (line) and based on a nonparametric model (points).
Data
See Worksheet Structure for Regression with Life Data on page 16-3.
MINITAB automatically excludes all observations with missing values from all calculations.
How you run the analysis depends on whether your data is uncensored/right censored or
uncensored/arbitrarily censored.
2 In Variables/Start variables, enter the columns of failure times. You can enter up to ten
columns (ten different samples).
5 Click Censor.
6 In Use censoring columns, enter the censoring columns. The first censoring column is
paired with the first data column, the second censoring column is paired with the second data
column, and so on.
By default, MINITAB uses the lowest value in the censoring column to indicate a censored
observation. To use some other value, enter that value in Censoring value.
Note
If your censoring indicators are date/time values, store the value as a constant and then
enter the constant.
7 If you like, use any of the options listed below, then click OK.
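3 In Variables/Start variables, enter the columns of start times. You can enter up to ten columns
(ten different samples).
4 In End variables, enter the columns of end times. You can enter up to ten columns (ten
different samples).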
5 If you have frequency columns, enter them in Freq. columns.
6 In Accelerating variable, enter the column of predictor values.
7 If you like, use any of the options described below, then click OK.
Options
Accelerated Life Testing dialog box
- transform the accelerating variable one of four common ways: Arrhenius, inverse temperature,
  log base e, or log base 10. By default, MINITAB uses no transformation (linear). See
  Transforming the accelerating variable on page 16-12.
- choose one of eight common lifetime distributions for the error distribution, including the
  Weibull (default), extreme value, exponential, normal, lognormal base e, lognormal base 10,
  logistic, and loglogistic distributions.
More
MINITAB's extreme value distribution is the smallest extreme value (Type 1).
- enter predictor values (levels of accelerating variable) for which to estimate percentiles and/or
  survival probabilities. Most often, you would enter the design value. You can also use the
  predictor values (levels of accelerating variable) from the data.
- estimate percentiles for the percents you specify; see Percentiles and survival probabilities on
  page 16-16. By default, MINITAB estimates the 50th percentile.
- estimate survival probabilities for the times you specify; see Percentiles and survival
  probabilities on page 16-16. For example, if you enter 10 hours, MINITAB estimates (for each
  predictor value) the proportion of units that survive at least 10 hours.
- specify a confidence level for all of the confidence intervals. The default is 95.0%.
- enter a design value to include on the plots based on the fitted model (relation plot and
  probability plot for each accelerating level).
- draw a relation plot to display the relationship between an accelerating variable and failure
  time; see Relation plot on page 16-11. You can:
  - plot percentiles for the percents you specify. By default, MINITAB plots the 10th, 50th, and
    90th percentiles.
  - display confidence intervals for all of the percentiles or the middle percentiles only. You
    can also suppress their display.
  - display points for failure times (exact failure time or midpoint of interval for interval
    censored observation) on the plot.
- draw a probability plot for each level of the accelerating variable based on the fitted model;
  see Probability plots on page 16-14. You can:
  - display confidence intervals for the design value or for all levels of the accelerating
    variable. You can also suppress their display.
- draw a probability plot for each level of the accelerating variable based on the individual
  fits; see Probability plots on page 16-14.
- draw a probability plot for the standardized residuals and an exponential probability plot for
  the Cox-Snell residuals; see Probability plots on page 16-14.
- enter starting values for model parameters for the Newton-Raphson algorithm; see
  Estimating the model parameters on page 16-28.
- change the maximum number of iterations for reaching convergence (the default is 20).
  MINITAB obtains maximum likelihood estimates through an iterative process. If the maximum
  number of iterations is reached before convergence, the command terminates; see
  Estimating the model parameters on page 16-28.
- estimate other model coefficients while holding the shape parameter (Weibull) or the scale
  parameter (other distributions) fixed at a specific value; see Estimating the model parameters
  on page 16-28.
- use historical estimates for the parameters rather than estimate them from the data; see
  Estimating the model parameters on page 16-28. In this case, no estimation is done; all
  results, such as the percentiles, are based on these parameters.
- store information on the estimated equation, including the estimated coefficients, their
  standard errors and confidence intervals, the variance/covariance matrix, and the
  log-likelihood for the last iteration.
Output
The default output consists of the regression table, a relation plot, a probability plot for each level
of the accelerating variable based on the fitted model, and Anderson-Darling goodness-of-fit
statistics for the probability plot.
Regression table
The regression table displays:
the log-likelihood.
Anderson-Darling goodness-of-fit statistics for each level of the accelerating variable based on
the fitted model.
Relation plot
The relation plot displays failure time versus an accelerating variable. By default, lines are drawn
at the 10th, 50th, and 90th percentiles. The 50th percentile is a good estimate for the time a part
will last for the given conditions. For an illustration, see Example of accelerated life testing on
page 16-17.
You can optionally specify up to ten percentiles to plot and display the failure times (exact failure
time or midpoint of interval for interval censored observation) on the plot. You can enter a
design value to include on the plot.
To modify the relation plot
1 In the Accelerated Life Testing dialog box, click Graphs.
2 Do any of the following:
  - To include a design value on the plot, specify a value in Enter design value to include on
    plots.
  - To plot percentiles for the percents you specify, enter the percents or a column of percents
    in Plot percentiles for percents. For example, to plot the 30th percentile (how long it
    takes 30% of the units to fail), enter 30. By default, MINITAB plots the 10th, 50th, and 90th
    percentiles.
  - Choose one:
    - Display confidence intervals for middle percentile
    - Display confidence intervals for all percentiles
    - Display no confidence intervals
  - To include failure times (exact failure time or midpoint of interval for interval censored
    observation) on the plot, check Display failure times on plot.
3 Click OK.
4 If you like, change the confidence level for the intervals (default = 95%): Click Estimate. In
- For the Weibull, exponential, lognormal base e, lognormal base 10, and loglogistic
  distributions, Yp = log (failure time)
- For the normal, extreme value, and logistic distributions, Yp = failure time
When Yp = log (failure time), MINITAB takes the antilog to display the percentiles on the
original scale.
You can find the values for the y-intercept, the regression coefficient(s), and the shape or scale
parameter in the regression table. When you enter predictor values in the Estimate subdialog
box, the percentiles are displayed in the table of percentiles.
Note
You will often have more than one regression coefficient and predictor (X) with Regression
with Life Data on page 16-19.
The value of the pth percentile of the error distribution also depends on the distribution chosen.
- For the normal distribution, the error distribution is the standard normal distribution,
  normal (0, 1). For the lognormal base 10 and lognormal base e distributions, MINITAB takes
  the log base 10 or log base e of the data, respectively, and uses a normal distribution.
- For the logistic distribution, the error distribution is the standard logistic distribution,
  logistic (0, 1). For the loglogistic distribution, MINITAB takes the log of the data and uses a
  logistic distribution.
- For the extreme value distribution, the error distribution is the standard extreme value
  distribution, extreme value (0, 1). For the Weibull distribution and the exponential
  distribution (a type of Weibull distribution), MINITAB takes the log of the data and uses the
  extreme value distribution.
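The percentiles of the three standardized error distributions have simple closed forms, sketched here in Python. This is an illustration, not MINITAB code. It assumes the usual definitions of the standard normal, standard logistic, and standard smallest extreme value distributions, and the function name and distribution labels are hypothetical.

```python
import math
from statistics import NormalDist

def error_percentile(distribution, p):
    """pth percentile (0 < p < 1) of the standardized error distribution."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if distribution in ("normal", "lognormal base e", "lognormal base 10"):
        return NormalDist().inv_cdf(p)        # standard normal quantile
    if distribution in ("logistic", "loglogistic"):
        return math.log(p / (1.0 - p))        # standard logistic quantile
    if distribution in ("extreme value", "Weibull", "exponential"):
        return math.log(-math.log(1.0 - p))   # smallest extreme value quantile
    raise ValueError(f"unknown distribution: {distribution}")

for name in ("normal", "logistic", "Weibull"):
    print(name, round(error_percentile(name, 0.5), 4))
```

The normal and logistic distributions are symmetric, so their 50th percentile is 0. The smallest extreme value distribution is skewed, so its 50th percentile is negative (about -0.3665).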
Probability plots
The Accelerated Life Testing command draws several probability plots to help you assess the fit of
the chosen distribution. You can draw probability plots for the standardized and Cox-Snell
residuals. You can use these plots to assess whether a particular distribution fits your data. In
general, the closer the points fall to the fitted line, the better the fit.
You can also choose to draw probability plots for each level of the accelerating variable based on
individual fits or on the fitted model. You can use these plots to assess whether the distribution,
transformation, and assumption of equal shape (Weibull or exponential) or scale (other
distributions) are appropriate. The probability plot based on the fitted model includes fitted lines
that are based on the chosen distribution and transformation. If the points do not fit the lines
adequately, then consider a different transformation or distribution.
The probability plot based on the individual fits includes fitted lines that are calculated by
individually fitting the distribution to each level of the accelerating variable. If the distributions
have equal shape (Weibull or exponential) or scale (other distributions) parameters, then the
fitted lines should be approximately parallel. The points should fit the line adequately if the
chosen distribution is appropriate.
MINITAB provides one goodness-of-fit measure: the Anderson-Darling statistic. A smaller
Anderson-Darling statistic indicates that the distribution provides a better fit. You can use the
Anderson-Darling statistic to compare the fit of competing models.
For a discussion of probability plots, see Probability plots on page 15-36.
- To plot based on the fitted model, check Probability plot for each accelerating level
  based on fitted model. Choose one of the following:
  - Display confidence intervals for design value
  - Display confidence intervals for all levels
  - Display no confidence intervals
- To include a design value on the fitted model plot, enter a value in Enter a design value to
  include on plots.
- To plot based on the individual fits, check Probability plot for each accelerating level
  based on individual fits.
- To plot the standardized residuals, check Probability plot for standardized residuals.
- To plot the Cox-Snell residuals, check Exponential probability plot for Cox-Snell
  residuals.
Tip
To draw a probability plot with more options, store the residuals in the Storage subdialog
box, then use the probability plot included with Parametric Distribution Analysis on page
15-27.
2 In Enter new predictor values, enter one new value or column of new values. Often you will
enter the design value, or common running condition, for the units.
3 Do any of the following, then click OK:
  - To estimate survival probabilities, enter the times in Estimate survival probabilities for
    times. For example, when you enter 70 (units in hours), MINITAB estimates the probability,
    for each predictor value, that the unit will survive past 70 hours.
More
Sometimes you may want to estimate percentiles or survival probabilities for the
accelerating variable levels used in the study. In the Estimate subdialog box, choose Use
predictor values in data (storage only). Because of the potentially large amount of output,
MINITAB stores the results in the worksheet rather than printing them in the Session window.
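For a Weibull model, a survival probability at a given time follows directly from the fitted linear predictor and the shape parameter. The Python sketch below is a minimal illustration and not MINITAB code. It assumes the model log(failure time) = linear predictor + error / shape, where the error follows the standard smallest extreme value distribution. The function name and the sample values 12.0 and 2.0 are hypothetical.

```python
import math

def weibull_survival(time, linear_predictor, shape):
    """Probability that a unit survives past `time` under a Weibull model.

    linear_predictor : intercept plus coefficient times the (transformed) predictor
    shape            : Weibull shape parameter
    """
    if time <= 0:
        return 1.0
    z = shape * (math.log(time) - linear_predictor)  # standardized log time
    return math.exp(-math.exp(z))                    # extreme value survival function

# Hypothetical fitted values, for illustration only.
linear_predictor, shape = 12.0, 2.0
median = math.exp(linear_predictor + math.log(math.log(2.0)) / shape)
print(weibull_survival(70.0, linear_predictor, shape))
print(weibull_survival(median, linear_predictor, shape))
```

The second line prints 0.5 (up to rounding), which shows how percentiles and survival probabilities are two views of the same fitted distribution: the 50th percentile is the time at which the survival probability is one half.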
Example of accelerated life testing
Suppose you want to investigate the deterioration of an insulation used for electric motors. The
motors normally run between 80 and 100° C. To save time and money, you decide to use
accelerated life testing.
First you gather failure times for the insulation at abnormally high temperatures (110, 130, 150,
and 170° C) to speed up the deterioration. With failure time information at these temperatures,
you can then extrapolate to 80 and 100° C. It is known that an Arrhenius relationship exists
between temperature and failure time.
To see how well the model fits, you will draw a probability plot based on the standardized
residuals.
1 Open the worksheet INSULATE.MTW.
2 Choose Stat > Reliability/Survival > Accelerated Life Testing.
3 In Variables/Start variables, enter FailureT. In Accelerating variable, enter Temp.
4 From Relationship, choose Arrhenius.
5 Click Censor. In Use censoring columns, enter Censor, then click OK.
6 Click Graphs. In Enter design value to include on plot, enter 80. Click OK.
7 Click Estimate. In Enter new predictor values, enter Design, then click OK in each dialog box.
Session window output

Censoring Information     Count
Uncensored value             66
Right censored value         14

                         Standard                     95.0% Normal CI
Predictor        Coef       Error        Z      P       Lower       Upper
Intercept    -15.1874      0.9862   -15.40  0.000    -17.1203    -13.2546
Temp          0.83072     0.03504    23.71  0.000     0.76204     0.89940
Shape          2.8246      0.2570                      2.3633      3.3760

Log-Likelihood = -564.693
                        Standard       95.0% Normal CI
    Temp  Percentile       Error       Lower       Upper
 80.0000    159584.5    27446.85    113918.2    223557.0
100.0000    36948.57    4216.511    29543.36    46209.94
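The percentiles in this output can be reproduced approximately from the regression coefficients. The Python sketch below is a check on the arithmetic, not MINITAB code. It assumes the prediction equation log(percentile) = intercept + coefficient x transformed temperature + (1/shape) x error percentile, and an Arrhenius transformation of the form 11604.83 / (degrees Celsius + 273.16). Both the equation and the constants are assumptions that are not spelled out in this example, and the function names are hypothetical.

```python
import math

# Coefficients copied from the regression table in the Session window output.
INTERCEPT = -15.1874
TEMP_COEF = 0.83072
SHAPE = 2.8246

def arrhenius(celsius):
    """Assumed Arrhenius transformation of a temperature in degrees Celsius."""
    return 11604.83 / (celsius + 273.16)

def weibull_percentile(celsius, percent=50.0):
    """Failure time by which `percent` of units are expected to have failed."""
    p = percent / 100.0
    error_p = math.log(-math.log(1.0 - p))   # smallest extreme value quantile
    log_time = INTERCEPT + TEMP_COEF * arrhenius(celsius) + error_p / SHAPE
    return math.exp(log_time)

median_80 = weibull_percentile(80.0)
median_100 = weibull_percentile(100.0)
print(round(median_80, 1), round(median_100, 1))
```

Under these assumptions the results land within a fraction of a percent of the tabled 50th percentiles (159584.5 at 80 and 36948.57 at 100). The small differences come from the rounded coefficients.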
Graph window output
Regression with Life Data
model terms, which consist of any number of predictor variables and when appropriate,
various interactions between predictors. See How to specify the model terms on page 16-23.
Some of these terms may be factors.
Data
Enter three types of columns in the worksheet:
the number of levels), to compare the effect of different levels on the response variable. For
covariates, MINITAB estimates the coefficient associated with the covariate to describe its effect
on the response variable.
Unless you specify a predictor as a factor, the predictor is assumed to be a covariate. In the
model, terms may be created from these predictor variables and treated as factors, covariates,
interactions, or nested terms. The model can include up to 9 factors and 50 covariates. Factors
may be crossed or nested. Covariates may be crossed with each other or with factors, or nested
within factors. See How to specify the model terms on page 16-23.
You can enter up to ten samples per analysis.
Depending on the type of censoring you have, you will set up your worksheet in column or table
form. You can also structure the worksheet as raw data, or as frequency data. For details, see
Worksheet Structure for Regression with Life Data on page 16-3.
Factor columns can be numeric or text. MINITAB by default designates the lowest numeric or text
value as the reference level. To change the reference level, see Factor variables and reference levels
on page 16-24.
MINITAB automatically excludes all observations with missing values from all calculations.
How you run the analysis depends on whether your data are uncensored/right censored or
uncensored/arbitrarily censored.
To perform regression with uncensored/right censored data
1 Choose Stat > Reliability/Survival > Regression with Life Data.
2 In Variables/Start variables, enter up to ten columns of failure times (ten different samples).
3 If you have frequency columns, enter them in Freq. columns.
4 In Model, enter the model terms; see How to specify the model terms on page 16-23. If any of
5 Click Censor.
6 In Use censoring columns, enter the censoring columns. The first censoring column is
paired with the first data column, the second censoring column is paired with the second data
column, and so on.
By default, MINITAB uses the lowest value in the censoring column to indicate a censored
value. To use some other value, enter that value in Censoring value.
Note
If your censoring indicators are date/time values, store the values as constants and then
enter them as constants.
7 If you like, use any of the options listed below, then click OK.
To perform regression with uncensored/arbitrarily censored data
1 Choose Stat > Reliability/Survival > Regression with Life Data.
2 Choose Responses are uncens/arbitrarily censored data.
3 In Variables/Start variables, enter up to ten columns of start times (ten different samples).
4 In End variables, enter up to ten columns of end times (ten different samples).
5 If you have frequency columns, enter them in Freq. columns.
6 In Model, enter the model terms; see How to specify the model terms on page 16-23. If any of
Options
Regression with Life Data dialog box
- choose one of eight common lifetime distributions for the error distribution, including the
  Weibull (default), extreme value, exponential, normal, lognormal base e, lognormal base 10,
  logistic, and loglogistic distributions.
More
MINITAB's extreme value distribution is the smallest extreme value (Type 1).
- enter new predictor values for which to estimate percentiles and/or survival probabilities; see
  Percentiles and survival probabilities on page 16-26. You can also use the predictor values from
  the data.
- estimate percentiles for the percents you specify; see Percentiles and survival probabilities on
  page 16-26. By default, MINITAB estimates the 50th percentile.
- estimate survival probabilities for the times you specify; see Percentiles and survival
  probabilities on page 16-26. For example, if you enter ten hours, MINITAB estimates (for each
  predictor value) the proportion of units that survive past ten hours.
- specify a confidence level for all of the confidence intervals. The default is 95.0%.
- draw a probability plot for the standardized residuals or an exponential probability plot for the
  Cox-Snell residuals; see Probability plots on page 16-25.
- enter starting values for model parameters for the Newton-Raphson algorithm; see
  Estimating the model parameters on page 16-28.
- change the maximum number of iterations for reaching convergence (the default is 20).
  MINITAB obtains maximum likelihood estimates through an iterative process. If the
  maximum number of iterations is reached before convergence, the command terminates;
  see Estimating the model parameters on page 16-28.
- estimate other model coefficients while holding the shape parameter (Weibull) or the scale
  parameter (other distributions) fixed at a specific value; see Estimating the model parameters
  on page 16-28.
- use historical estimates for the parameters rather than estimate them from the data; see
  Estimating the model parameters on page 16-28. In this case, no estimation is done; all
  results, such as the percentiles, are based on these parameters.
- change the reference levels for the factors; see Factor variables and reference levels on page
  16-24.
- store the information on the estimated equation, including the coefficients, their standard
  errors and confidence intervals, the variance/covariance matrix, and the log-likelihood for the
  last iteration.
Default output
The default output consists of the regression table which displays:
the log-likelihood.
covariates that are crossed with each other or with factors, or nested within factors
A|X        fits a model with a covariate crossed with a factor (the same as A X A*X)
A X X*X    fits a model with a covariate crossed with itself making a squared term
A X(A)     fits a model with a covariate nested within a factor
This model is a generalization of the model used in MINITAB's general linear model (GLM)
procedure. Any model fit by GLM can also be fit by the life data procedures. For a general
discussion of specifying models, see Specifying the model terms on page 3-19 and Specifying
reduced models on page 3-21. In the regression with life data commands, MINITAB assumes any
variable in the model is a covariate unless the variable is specified as a factor. In contrast, GLM
assumes any variable in the model is a factor unless the variable is specified as a covariate.
Model restrictions
Life data models in MINITAB have the same restrictions as general linear models:
- The model must be full rank, meaning there must be enough data to estimate all the terms in
  your model. Suppose you have a two-factor crossed model with one empty cell. You can then
  fit the model with terms A B, but not A B A*B. You do not need to work out for yourself
  whether your model is of full rank; MINITAB will tell you if it is not. In most cases,
  eliminating some of the high order interactions in your model (assuming, of course, they are
  not important) will solve your problem.
Regression with Life Data creates a set of design variables for each factor in the model. If there
are k levels, there will be k - 1 design variables and the reference level will be coded as 0.
By default, MINITAB designates the lowest numeric, date/time, or text value as the reference
factor level. If you like, you can change this reference value in the Options subdialog box.
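The coding of a factor into k - 1 design variables can be sketched in Python as follows. This is an illustration, not MINITAB code. It assumes plain 0/1 indicator coding in which each non-reference level gets its own column, and the function name and sample levels are hypothetical.

```python
def design_variables(values, reference=None):
    """Code a factor column as k - 1 indicator columns.

    Returns the non-reference levels (one per design variable) and one row
    of 0/1 codes per observation. The reference level is coded all zeros.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]          # default: the lowest value
    elif reference not in levels:
        raise ValueError(f"reference level {reference!r} not found in the factor")
    others = [level for level in levels if level != reference]
    rows = [[1 if value == level else 0 for level in others] for value in values]
    return others, rows

levels, rows = design_variables(["B", "A", "C", "A"])
print(levels)   # one design variable for each non-reference level
print(rows)     # rows for level "A" are all zeros
```

Passing a different reference value, for example design_variables(["B", "A", "C", "A"], reference="C"), mirrors what happens when you change the reference factor level: the same information is kept, but each coefficient is then interpreted relative to the new reference level.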
To change the reference factor level
1 In the Regression with Life Data dialog box, click Options.
2 In Reference factor level, for each factor you want to set the reference level for, enter a factor
column followed by a value specifying the reference level. For text values, the value must be
in double quotes. For date/time values, store the value as a constant and then enter the
constant. Click OK.
Probability plots
The Regression with Life Data command draws probability plots for the standardized and
Cox-Snell residuals. You can use these plots to assess whether a particular distribution fits your
data. In general, the closer the points fall to the fitted line, the better the fit.
MINITAB provides one goodness-of-fit measure: the Anderson-Darling statistic. The
Anderson-Darling statistic is useful in comparing the fit of different distributions. It measures the
distances from the plot points to the fitted line; therefore, a smaller Anderson-Darling statistic
indicates that the distribution provides a better fit.
Tip
To plot the standardized residuals, check Probability plot for standardized residuals
To plot the Cox-Snell residuals, check Exponential probability plot for Cox-Snell
residuals
To draw a probability plot with more options, store the residuals in the Storage subdialog
box, then use the probability plot included with Parametric Distribution Analysis on page
15-27.
To enter new predictor values: In Enter new predictor values, enter a set of predictor
values (or columns containing sets of predictor values) for which you want to estimate
percentiles or survival probabilities. The predictor values must be in the same order as the
main effects in the model. For an illustration, see Example of regression with life data on
page 16-29.
To use the predictor values in the data, choose Use predictor values in data (storage
only). Because of the potentially large amount of output, MINITAB stores the results in the
worksheet rather than printing them in the Session window.
2 Choose In addition, list of factor level values, tests for terms with more than 1 degree of
change the maximum number of iterations for reaching convergence (the default is 20).
MINITAB obtains maximum likelihood estimates through an iterative process. If the maximum
number of iterations is reached before convergence, the command terminates.
estimate other model coefficients while holding the shape parameter (Weibull) or the scale
parameter (other distributions) fixed at a specific value.
Why enter starting values for the algorithm? The maximum likelihood solution may not converge
if the starting estimates are not in the neighborhood of the true solution, so you may want to
specify what you think are good starting values for parameter estimates.
In all cases, enter a column with entries which correspond to the model terms in the order you
entered them in the Model box. With complicated models, find out the order of entries for the
starting estimates column by looking at the regression table in the output.
To control estimation of the parameters
1 In the Regression with Life Data dialog box, click Options.
To estimate the model parameters from the data (the default), choose Estimate model
parameters.
To enter starting estimates for the parameters: In Use starting estimates, enter one
column to be used for all of the response variables, or a number of columns equal to the
number of response variables.
To specify the Maximum number of iterations, enter a positive integer.
To estimate other model coefficients while holding the shape (Weibull) or the scale
(other distributions) parameter fixed: In Set shape (Weibull) or scale parameter (other
distributions) at, enter one value to be used for all of the response variables, or a
number of values equal to the number of response variables.
To enter your own estimates for the model parameters, choose Use historical estimates
and enter one column to be used for all of the response variables, or a number of columns
equal to the number of response variables.
3 Click OK.
Example of regression with life data
Suppose you want to investigate the deterioration of an insulation used for electric motors. You
want to know if you can predict failure times for the insulation based on the plant in which it was
manufactured, and the temperature at which the motor runs. It is known that an Arrhenius
relationship exists between temperature and failure time.
You gather failure times at plant 1 and plant 2 for the insulation at four temperatures: 110, 130,
150, and 170° C. Because the motors generally run at between 80 and 100° C, you want to
predict the insulation's behavior at those temperatures.
To see how well the model fits, you will draw a probability plot based on the standardized
residuals.
1 Open the worksheet INSULATE.MTW.
2 Choose Stat ➤ Reliability/Survival ➤ Regression with Life Data.
3 In Variables/Start variables, enter FailureT.
4 In Model, enter ArrTemp Plant. In Factors (optional), enter Plant.
5 Click Censor. In Use censoring columns, enter Censor, then click OK.
6 Click Estimate. In Enter new predictor values, enter ArrNewT NewPlant, then click OK.
7 Click Graphs. Check Probability plot for standardized residuals, then click OK in each
dialog box.
Session window output

Censoring Information     Count
Uncensored value             66
Right censored value         14

Regression Table
                           Standard                     95.0% Normal CI
Predictor        Coef         Error        Z       P      Lower      Upper
Intercept    -15.3411        0.9508   -16.13   0.000   -17.2047   -13.4775
ArrTemp       0.83925       0.03397    24.71   0.000    0.77267    0.90584
Plant 2      -0.18077       0.08457    -2.14   0.033   -0.34652   -0.01501
Shape          2.9431        0.2707                      2.4577     3.5244

Log-Likelihood = -562.525

Anderson-Darling (adjusted) Goodness-of-Fit
Standardized Residuals = 0.5078

Table of Percentiles
           Predictor                   Standard       95.0% Normal CI
Percent   Row Number   Percentile         Error       Lower       Upper
     50            1     182093.6      32466.16    128389.8    258260.9
     50            2     151980.8      25286.65    109689.6    210577.6
     50            3     41530.38      5163.756    32548.44    52990.94
     50            4     34662.51      3913.866    27781.00    43248.61

Graph window output
[Figure: probability plot for the standardized residuals]
For motors running at 80° C, insulation from plant 1 lasts about 182093.6 hours or 20.77
years; insulation from plant 2 lasts about 151980.8 hours or 17.34 years.
For motors running at 100° C, insulation from plant 1 lasts about 41530.38 hours or 4.74
years; insulation from plant 2 lasts about 34662.51 hours or 3.95 years.
As you can see from the low p-values, the plants are significantly different at the α = .05 level,
and temperature is a significant predictor.
The probability plot for standardized residuals will help you determine whether the distribution,
transformation, and equal shape (Weibull or exponential) or scale parameter (other
distributions) assumptions are appropriate. Here, the plot points fall close to the fitted line;
therefore you can assume the model is appropriate.
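To see where the 50th percentiles come from, the following Python sketch recomputes them from the coefficients in the regression table. It rests on several assumptions that this section does not state: that the fitted distribution is Weibull (so the row without Z and P values is the shape parameter), that plant 1 is the reference level, and that ArrTemp is the Arrhenius transformation 11604.83 / (°C + 273.16). The function names are hypothetical. Because the coefficients are rounded, the results agree with the table only approximately.

```python
import math

# Coefficients copied from the regression table in the Session window output.
INTERCEPT = -15.3411
B_ARRTEMP = 0.83925
B_PLANT2 = -0.18077  # assumed: effect of plant 2 relative to plant 1
SHAPE = 2.9431       # assumed: Weibull shape parameter


def arrhenius(temp_c):
    """Assumed Arrhenius transformation of a temperature in degrees Celsius."""
    return 11604.83 / (temp_c + 273.16)


def median_life(temp_c, plant):
    """50th percentile of failure time (hours) under the assumed Weibull model."""
    linear = INTERCEPT + B_ARRTEMP * arrhenius(temp_c)
    if plant == 2:
        linear += B_PLANT2
    # ln(ln 2) is the 50th percentile of the standard smallest extreme
    # value distribution; dividing by the shape rescales it.
    return math.exp(linear + math.log(math.log(2.0)) / SHAPE)


for temp_c in (80, 100):
    for plant in (1, 2):
        hours = median_life(temp_c, plant)
        years = hours / (24 * 365.25)
        print(f"{temp_c} C, plant {plant}: {hours:.0f} hours ({years:.2f} years)")
```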
References
[1] J.D. Kalbfleisch and R.L. Prentice (1980). The Statistical Analysis of Failure Time Data,
John Wiley & Sons, Inc.
[2] J.F. Lawless (1982). Statistical Models and Methods for Lifetime Data, John Wiley & Sons,
Inc.
[3] W.Q. Meeker and L.A. Escobar (1998). Statistical Methods for Reliability Data, John Wiley
& Sons, Inc.
[4] W. Murray (Ed.) (1972). Numerical Methods for Unconstrained Optimization, Academic
Press.
[5] W. Nelson (1990). Accelerated Testing, John Wiley & Sons, Inc.
17
Probit Analysis
Use probit analysis when you want to estimate percentiles, survival probabilities, and cumulative
probabilities for the distribution of a stress, and draw probability plots. When you enter a factor
and choose a Weibull, lognormal, or loglogistic distribution, you can also compare the potency of
the stress under different conditions.
MINITAB calculates the model coefficients using a modified Newton-Raphson algorithm.
Data
Enter the following columns in the worksheet:
Response variable
The response data is binomial, so you have two possible outcomes, success or failure. You can
enter the data in either success/trial or response/frequency format. Here is the same data
arranged both ways:
Success/trial format
Temp   Success   Trials
  80         2       10
 120         4       10
 140         7       10
 160         9       10

Response/frequency format
Response   Frequency   Temp
       1           2     80
       0           8     80
       1           4    120
       0           6    120
       1           7    140
       0           3    140
       1           9    160
       0           1    160
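The two layouts carry the same information. As an illustration only (this is not a MINITAB command, and the function name is hypothetical), the following Python sketch converts the success/trial rows shown above into response/frequency rows, coding a success as 1 and a failure as 0.

```python
def to_response_frequency(rows):
    """Convert (stress, successes, trials) rows to (response, frequency, stress) rows."""
    converted = []
    for stress, successes, trials in rows:
        converted.append((1, successes, stress))           # successes
        converted.append((0, trials - successes, stress))  # failures
    return converted


# Success/trial data from the example: (Temp, Success, Trials).
success_trial = [(80, 2, 10), (120, 4, 10), (140, 7, 10), (160, 9, 10)]
response_frequency = to_response_frequency(success_trial)

for response, frequency, temp in response_frequency:
    print(response, frequency, temp)
```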
Factors
Text categories (factor levels) are processed in alphabetical order by default. If you wish, you can
define your own order; see Ordering Text Categories in the Manipulating Data chapter of
MINITAB User's Guide 1 for details.
To perform a probit analysis
How you run the analysis depends on whether your worksheet is in success/trial or
response/frequency format.
1 Choose Stat ➤ Reliability/Survival ➤ Probit Analysis.
2 Do one of the following:
Options
Probit Analysis dialog box
choose one of seven common lifetime distributions: the normal (default), lognormal
base e, lognormal base 10, logistic, loglogistic, Weibull, and extreme value distributions.
estimate percentiles for the percents you specify; see Percentiles on page 17-7. These
percentiles are added to the default table of percentiles.
estimate survival probabilities for the stress values you specify; see Survival and cumulative
probabilities on page 17-8.
specify a confidence level for all of the confidence intervals. The default is 95%.
plot the Pearson or deviance residuals versus the event probability. Use these plots to identify
poorly fit observations.
enter starting values for model parameters; see Estimating the model parameters on page
17-11.
change the maximum number of iterations for reaching convergence (the default is 20).
MINITAB obtains maximum likelihood estimates through an iterative process. If the
maximum number of iterations is reached before convergence, the command terminates;
see Estimating the model parameters on page 17-11.
use historical estimates for the model parameters. In this case, no estimation is done; all
results, such as the percentiles, are based on these historical estimates. See Estimating the
model parameters on page 17-11.
estimate the natural response rate from the data or specify a value; see Natural response rate
on page 17-12.
if you have response/frequency data, define the value used to signify the occurrence
of a success. Otherwise, the highest value in the column is used.
enter a reference level for the factor; see Factor variables and reference levels on page 17-10.
Otherwise, the lowest value in the column is used.
perform a Hosmer-Lemeshow test to assess how well your model fits the data. This test bins
the data into 10 groups by default; if you like, you can specify a different number.
Note
When you select fiducial confidence intervals, MINITAB will display fiducial confidence
intervals for the median, Q1, and Q3 and normal confidence intervals for mean, standard
deviation, and IQR in the characteristics of distribution table.
Output
The default output consists of:
the regression table, which includes the estimated coefficients and their
standard errors.
Z-values and p-values. The Z-test tests whether the coefficient is significantly different from 0; in
other words, is it a significant predictor?
natural response rate: the probability that a unit fails without being exposed to any of the
stress.
the test for equal slopes, which tests whether the slopes associated with the factor levels are
significantly different.
two goodness-of-fit tests, which evaluate how well the model fits the data. The null hypothesis
is that the model fits the data adequately. Therefore, the higher the p-value the better the
model fits the data.
the parameter estimates for the distribution and their standard errors and 95% confidence
intervals. The parameter estimates are transformations of the estimated coefficients in the
regression table.
the table of percentiles, which includes the estimated percentiles, standard errors, and 95%
fiducial confidence intervals.
the probability plot, which helps you to assess whether the chosen distribution fits your data;
see Probability plots on page 17-9.
the relative potency, which compares the potency of a stress for two levels of a factor. To get this
output, you must have a factor, and choose a Weibull, lognormal, or loglogistic distribution.
Suppose you want to compare how the amount of voltage affects two types of light bulbs, and
the relative potency is .98. This means that light bulb 1 running at 117 volts would fail at
approximately the same time as light bulb 2 running at 114.66 volts (117 × .98).
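$\pi_j = c + (1 - c)\, g(\beta_0 + x_j \beta)$
where
$\pi_j$ = the probability of a response at the jth stress level
$x_j$ = the jth stress level
$c$ = the natural response rate
$g$ = the distribution function
$\beta_0$, $\beta$ = the model coefficients (constant and slope)

Distribution     Distribution function            Mean         Variance
logistic         $g(y_j) = 1 / (1 + e^{-y_j})$    0            $\pi^2 / 3$
normal           $g(y_j) = \Phi(y_j)$             0            1
extreme value    $g(y_j) = 1 - e^{-e^{y_j}}$      $-\gamma$    $\pi^2 / 6$

Here $\gamma \approx 0.5772$ is Euler's constant.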
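The model $\pi_j = c + (1 - c)\, g(\beta_0 + x_j \beta)$ can be sketched in a few lines of Python. This is an illustration under stated assumptions, not MINITAB's implementation: the function and argument names are hypothetical, the three distribution functions are the standard logistic, standard normal, and smallest extreme value forms, and any log transformation of the stress (as used with the Weibull, lognormal, and loglogistic choices) is assumed to have been applied to x beforehand.

```python
import math


def g_logistic(y):
    """Standard logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-y))


def g_normal(y):
    """Standard normal distribution function, written with the error function."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def g_extreme_value(y):
    """Standard smallest extreme value distribution function."""
    return 1.0 - math.exp(-math.exp(y))


def response_probability(x, beta0, beta, c=0.0, g=g_normal):
    """Probability of a response at stress x, with natural response rate c."""
    return c + (1.0 - c) * g(beta0 + beta * x)


# With a natural response rate of 0.2, even a unit at the distribution's
# midpoint (g = 0.5) responds with probability 0.2 + 0.8 * 0.5.
print(response_probability(0.0, 0.0, 1.0, c=0.2, g=g_logistic))
```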
Percentiles
At what stress level do half of the units fail? How much pesticide do you need to apply to kill 90%
of the insects? You are looking for percentiles.
Common percentiles used are the 10th, 50th, and 90th percentiles, also known in the life
sciences as the ED 10, ED 50 and ED 90 (ED = effective dose).
The probit analysis automatically displays a table of percentiles in the Session window, along
with 95% fiducial confidence intervals. You can also request:
The Percentile column contains the stress level required for the corresponding percent of the
events to occur.
In this example, you exposed light bulbs to various voltages and recorded whether or not the bulb
burned out before 800 hours.
Table of Percentiles
                          Standard     95.0% Fiducial CI
Percent    Percentile        Error      Lower       Upper
      1      104.9931       1.3715   101.9273    107.3982
      2      106.9313       1.2661   104.1104    109.1598
      3      108.1795       1.1997   105.5144    110.2980
      4      109.1281       1.1504   106.5795    111.1656
etc.
At 104.9931 volts, 1% of the bulbs burn out before 800 hours.
In Estimate percentiles for these additional percents, enter the percents or a column of
percents.
Change the confidence level for the percentiles (default is 95%): In Confidence level,
enter a value. This changes the confidence level for all confidence intervals.
burned out before 800 hours. Then we requested a survival probability for light bulbs subjected
to 117 volts:
Table of Survival Probabilities
                           95.0% Fiducial CI
  Stress    Probability     Lower      Upper
117.0000         0.7692    0.6224     0.8825
To calculate cumulative probabilities (the likelihood of failing rather than surviving), subtract the
survival probability from 1. In this case, the probability of failing before 800 hours at 117 volts is
0.2308.
To request survival probabilities
1 In the Probit Analysis main dialog box, click Estimate.
2 In Estimate survival probabilities for these stress values, enter one or more stress values or
Probability plots
A probability plot displays the percentiles. You can use the probability plot to assess whether a
particular distribution fits your data. In general, the closer the points fall to the fitted line, the
better the fit.
For a discussion of probability plots, see Probability plots on page 15-36.
When you have more than one factor level, lines and confidence intervals are drawn for each
level. If the plot looks cluttered, you can turn off the confidence intervals in the Graphs
subdialog box. You can also change the confidence level from the default of 95% by entering a
new value in the Estimate subdialog box.
Survival plots
Survival plots display the survival probabilities versus stress. Each point on the plot represents the
proportion of units surviving at a stress level. The survival curve is surrounded by two outer
lines, the 95% confidence interval for the curve, which provides reasonable values for the true
survival function.
For an illustration of a survival plot, see Survival plots on page 15-40.
To draw a survival plot
1 In the Probit Analysis dialog box, click Graphs.
By default, MINITAB designates the lowest numeric, date/time, or text value as the reference factor
level. If you like, you can change this reference value in the Options subdialog box.
change the maximum number of iterations for reaching convergence (the default is 20).
MINITAB obtains maximum likelihood estimates through an iterative process. If the
maximum number of iterations is reached before convergence, the command terminates.
Why enter starting values for the algorithm? The maximum likelihood solution may not
converge if the starting estimates are not in the neighborhood of the true solution, so you may
want to specify what you think are good starting values for parameter estimates.
To control estimation of the parameters
1 In the Probit Analysis main dialog box, click Options.
To estimate the model parameters from the data (the default), choose Estimate model
parameters.
To enter starting estimates for the parameters: In Use starting estimates, enter one
starting value for each coefficient in the regression table. Enter the values in the order
that they appear in the regression table.
Note
Do not enter a starting value for the natural response rate here.
To enter your own estimates for the model parameters, choose Use historical estimates
and enter one value for each coefficient in the regression table. Enter the values in
the order that they appear in the regression table.
Suppose you work for a lightbulb manufacturer and have been asked to determine bulb life for
two types of bulbs at typical household voltages. The typical line voltage entering a house is 117
volts ± 10% (or 105 to 129 volts).
You subject the two bulbs to five stress levels within that range (108, 114, 120, 126, and 132
volts) and define a success as: The bulb fails before 800 hours.
1 Open the worksheet LIGHTBUL.MTW.
2 Choose Stat ➤ Reliability/Survival ➤ Probit Analysis.
3 Choose Response in success/trial format.
4 In Number of successes, enter Blows. In Number of trials, enter Trials.
5 In Stress (stimulus), enter Volts.
6 In Factor (optional), enter Type. In Enter number of levels, enter 2.
7 From Assumed distribution, choose Weibull.
8 Click Estimate. In Estimate survival probabilities for these stress values, enter 117. Click
OK.
9 Click Graphs. Uncheck Display confidence intervals on above plots. Click OK in each
dialog box.
Session window output

Distribution: Weibull

Response Information
Variable    Value      Count
Blows       Success      192
            Failure      308
Trials      Total        500

Factor Information
Factor    Levels    Values
Type           2    A B
Regression Table
                           Standard
Variable          Coef        Error         Z        P
Constant       -97.019        7.673    -12.64    0.000
Volts           20.019        1.587     12.61    0.000
Type
 B              0.1794       0.1598      1.12    0.262
Natural
Response         0.000

Test for equal slopes: DF = 1, P-Value = 0.611

Goodness-of-Fit Tests
Method      Chi-Square    DF        P
Pearson          2.516     7    0.926
Deviance         2.492     7    0.928

Type = A
Tolerance Distribution
Parameter Estimates
                          Standard      95.0% Normal CI
Parameter    Estimate        Error      Lower       Upper
Shape          20.019        1.587     17.138      23.384
Scale         127.269        0.737    125.832     128.722

Table of Percentiles
                          Standard     95.0% Fiducial CI
Percent    Percentile        Error      Lower       Upper
      1      101.1409       1.8424    96.9868    104.3407
      2      104.7307       1.6355   101.0429    107.5731
      3      106.9008       1.5090   103.5009    109.5267
      4      108.4760       1.4171   105.2866    110.9457
      5      109.7203       1.3449   106.6975    112.0680
      6      110.7531       1.2854   107.8683    113.0007
      7      111.6387       1.2348   108.8717    113.8017
      8      112.4158       1.1909   109.7516    114.5057
      9      113.1096       1.1523   110.5364    115.1354
     10      113.7373       1.1177   111.2458    115.7062
     20      118.0817       0.8986   116.1208    119.7003
     30      120.8808       0.7901   119.2012    122.3424
     40      123.0693       0.7358   121.5505    124.4720
     50      124.9600       0.7179   123.5231    126.3718
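The percentiles for Type = A can be checked against the shape and scale estimates. The Python sketch below assumes the standard two-parameter Weibull distribution and a natural response rate of 0 (as the regression table reports); the function name is hypothetical. Because the estimates are rounded to three decimals, the results match the table only to about that precision.

```python
import math


def weibull_percentile(percent, shape, scale):
    """Stress level at which the given percent of units respond (Weibull)."""
    p = percent / 100.0
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


# Type = A estimates from the Parameter Estimates table.
SHAPE_A, SCALE_A = 20.019, 127.269

for percent in (1, 10, 50):
    print(percent, round(weibull_percentile(percent, SHAPE_A, SCALE_A), 4))
```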
Type = B
Tolerance Distribution
Parameter Estimates
                          Standard      95.0% Normal CI
Parameter    Estimate        Error      Lower       Upper
Shape          20.019        1.587     17.138      23.384
Scale         126.134        0.704    124.761     127.522

Table of Percentiles
                          Standard     95.0% Fiducial CI
Percent    Percentile        Error      Lower       Upper
      1      100.2388       1.8617    96.0399    103.4706
      2      103.7965       1.6562   100.0595    106.6728
      3      105.9472       1.5303   102.4960    108.6073
      4      107.5084       1.4386   104.2667    110.0121
      5      108.7416       1.3663   105.6661    111.1226
      6      109.7652       1.3065   106.8277    112.0453
      7      110.6429       1.2556   107.8234    112.8374
      8      111.4131       1.2113   108.6967    113.5335
      9      112.1007       1.1722   109.4760    114.1558
     10      112.7228       1.1371   110.1805    114.7197
     20      117.0285       0.9108   115.0289    118.6590
     30      119.8026       0.7929   118.1018    121.2561
     40      121.9716       0.7280   120.4520    123.3436
     50      123.8454       0.6989   122.4294    125.2031

Relative     95.0% Fiducial CI
 Potency      Lower       Upper
  0.9911     0.9754      1.0068
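The parameter estimates are transformations of the coefficients in the regression table, and the relative potency follows from them. The Python sketch below shows one way to recover these numbers. It assumes that, for the Weibull choice, the model is fit to the natural log of the stress with the extreme value distribution function, so that shape = slope and scale = exp(−constant / slope); this parameterization is an assumption that happens to reproduce the output, not a statement of MINITAB's internals. Small differences from the tables come from rounding in the coefficients.

```python
import math

# Coefficients from the regression table (Type A is the reference level).
CONSTANT = -97.019
B_VOLTS = 20.019
B_TYPE_B = 0.1794

# Assumed transformations for the Weibull tolerance distribution.
shape = B_VOLTS
scale_a = math.exp(-CONSTANT / B_VOLTS)
scale_b = math.exp(-(CONSTANT + B_TYPE_B) / B_VOLTS)

# With a common shape, every Type B percentile is the same multiple of
# the corresponding Type A percentile; that multiple is the relative potency.
relative_potency = scale_b / scale_a

print(round(scale_a, 3), round(scale_b, 3), round(relative_potency, 4))
```

The same ratio can be read directly off the tables: for example, the 50th percentiles give 123.8454 / 124.9600, which rounds to 0.9911.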
Graph window output
[Figure: probability plot]
References
[1] D.J. Finney (1971). Probit Analysis, Cambridge University Press.
[2] D.W. Hosmer and S. Lemeshow (1989). Applied Logistic Regression, John Wiley & Sons, Inc.
[3] P. McCullagh and J.A. Nelder (1992). Generalized Linear Models, Chapman & Hall.
[4] W. Murray, Ed. (1972). Numerical Methods for Unconstrained Optimization, Academic
Press.
[5] W. Nelson (1982). Applied Life Data Analysis, John Wiley & Sons.
18
Design of Experiments
Overview
Planning
Careful planning can help you avoid problems that can occur during the execution of the
experimental plan. For example, personnel, equipment availability, funding, and the mechanical
aspects of your system may affect your ability to complete the experiment. If your project has low
priority, you may want to carry out small sequential experiments. That way, if you lose resources
to a higher priority project, you will not have to discard the data you have already collected.
When resources become available again, you can resume experimentation.
The preparation required before beginning experimentation depends on your problem. Here are
some steps you may need to go through:
Define the problem. Developing a good problem statement helps make sure you are studying
the right variables. At this step, you identify the questions that you want to answer.
Define the objective. A well-defined objective will ensure that the experiment answers the right
questions and yields practical, usable information. At this step, you define the goals of the
experiment.
Develop an experimental plan that will provide meaningful information. Be sure to review
relevant background information, such as theoretical principles, and knowledge gained
through observation or previous experimentation. For example, you may need to identify
which factors or process conditions affect process performance and contribute to process
variability. Or, if the process is already established and the influential factors have been
identified, you may want to determine optimal process conditions.
Make sure the process and measurement systems are in control. Ideally, both the process and
the measurements should be in statistical control as measured by a functioning statistical
process control (SPC) system. Even if you do not have the process completely in control, you
must be able to reproduce process settings. You also need to determine the variability in the
measurement system. If the variability in your system is greater than the difference/effect that
you consider important, experimentation will not yield useful results.
MINITAB provides numerous tools to evaluate process control and analyze your measurement
system.
Screening
In many process development and manufacturing applications, potentially influential variables
are numerous. Screening reduces the number of variables by identifying the key variables that
affect product quality. This reduction allows you to focus process improvement efforts on the
really important variables, or the vital few. Screening may also suggest the best or optimal
settings for these factors, and indicate whether or not curvature exists in the responses. Then, you
can use optimization methods to determine the best settings and define the nature of the
curvature.
Chapter 19, Factorial Designs, describes methods that are often used for screening:
two-level full and fractional factorial designs are used extensively in industry
Plackett-Burman designs have low resolution, but their usefulness in some screening
experimentation and robustness testing is widely recognized
general full factorial designs (designs with more than two levels) may also be useful for small
screening experiments
Optimization
After you have identified the vital few by screening, you need to determine the best or
optimal values for these experimental factors. Optimal factor values depend on the process
objective. For example, you may want to maximize process yield or reduce product variability.
The optimization methods available in MINITAB include general full factorial designs (designs
with more than two levels), response surface designs, mixture designs, and Taguchi designs.
Chapter 19, Factorial Designs, describes methods for designing and analyzing general full
factorial designs.
Chapter 20, Response Surface Designs, describes methods for designing and analyzing central
composite and Box-Behnken designs.
Chapter 21, Mixture Designs, describes methods for designing and analyzing simplex
centroid, simplex lattice, and extreme vertices designs. Mixture designs are a special class of
response surface designs where the proportions of the components (factors), rather than their
magnitude, are important.
Chapter 23, Response Optimization, describes methods for optimizing multiple responses.
MINITAB provides numerical optimization, an interactive graph, and an overlaid contour plot
to help you determine the best settings to simultaneously optimize multiple responses.
Chapter 24, Taguchi Designs, describes methods for analyzing Taguchi designs. Taguchi
designs may also be called orthogonal array designs, robust designs, or inner-outer array
designs. These designs are used for creating products that are robust to conditions in their
expected operating environment.
Verification
Verification involves performing a follow-up experiment at the predicted best processing
conditions to confirm the optimization results. For example, you may perform a few verification
runs at the optimal settings, then obtain a confidence interval for the mean response.
More
Our intent is to provide only a brief introduction to the design of experiments. There are
many resources that provide a thorough treatment of these methods. For a list of
resources, see References on pages 19-63, 20-37, 21-54, and 24-39.
[Figure: design worksheet showing the StdOrder, RunOrder, and Blocks columns]
If you want to analyze your design with the Analyze Design procedures, you must follow certain
rules when modifying worksheet data. If you make changes that corrupt your design, you may still
be able to analyze it with the Analyze Design procedures after you use one of the Define Custom
Design procedures.
You cannot delete or move the columns that contain the design.
You can enter, edit, and analyze data in all the other columns of the worksheet, that is, all
columns beyond the last design column. You can place the response and covariate data here,
or any other data you want to enter into the worksheet.
You can delete runs from your design. If you delete runs, you may not be able to fit all terms
in your model. In that case, MINITAB will automatically remove any terms that cannot be fit
and do the analysis using the remaining terms.
You can add runs to your design. For example, you may want to add center points or a
replicate of a particular run of interest. Make sure the levels are appropriate for each factor or
component and that you enter appropriate values in StdOrder, RunOrder, CenterPt, PtType
and Blocks. These columns and the factor or component columns must all be the same
length. You can use any numbers that seem reasonable for StdOrder and RunOrder. MINITAB
uses these two columns to order data in the worksheet.
You can change the level of a factor for a botched run in the Data window; see Analyzing
designs with botched runs on page 19-43.
You can change factor level settings using Modify Design. However, you cannot change a
factor type from numeric to text or text to numeric.
You can change the name of factors and components using Modify Design.
You can use any procedures to analyze the data in your design, not just the procedures in the
DOE menu.
You can add factors to your design by entering them in the worksheet. Then, use one of the
Define Custom Design procedures.
If you make changes that corrupt your design, you may still be able to analyze it. You can
redefine the design using one of the Define Custom Design procedures.
19
Factorial Designs
Factorial Designs Overview
Screening designs
In many process development and manufacturing applications, the number of potential input
variables (factors) is large. Screening (process characterization) is used to reduce the number of
input variables by identifying the key input variables or process conditions that affect product
quality. This reduction allows you to focus process improvement efforts on the few really
important variables, or the vital few. Screening may also suggest the best or optimal settings
for these factors, and indicate whether or not curvature exists in the responses. Optimization
experiments can then be done to determine the best settings and define the nature of the
curvature.
In industry, two-level full and fractional factorial designs, and Plackett-Burman designs are often
used to screen for the really important factors that influence process output measures or
product quality. These designs are useful for fitting first-order models (which detect linear
effects), and can provide information on the existence of second-order effects (curvature) when
the design includes center points.
In addition, general full factorial designs (designs with more than two levels) may be used with
small screening experiments.
[Figure: two-level designs with two factors and with three factors, showing two levels of each factor. Each point is one combination of factor levels, for example A is high, B is low, C is low.]
full factorial design. When you do not run all factor level combinations, some of the effects will
be confounded. Confounded effects cannot be estimated separately and are said to be aliased.
MINITAB displays an alias table which specifies the confounding patterns. Because some effects
are confounded and cannot be separated from other effects, the fraction must be carefully chosen
to achieve meaningful results. Choosing the best fraction often requires specialized knowledge
of the product or process under investigation.
Plackett-Burman designs
Plackett-Burman designs are a class of resolution III, two-level fractional factorial designs that are
often used to study main effects. In a resolution III design, main effects are aliased with two-way
interactions.
MINITAB generates designs for up to 47 factors. Each design is based on the number of runs, from
8 to 48 and always a multiple of 4. The number of factors must be less than the number of runs.
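The run-count rule above can be sketched in a few lines. This is only an illustration of the stated rule (multiples of 4 from 8 to 48, with fewer factors than runs); the function names are hypothetical and are not MINITAB commands, and the list MINITAB actually offers in its dialog box may be narrower.

```python
def plackett_burman_run_counts(min_runs=8, max_runs=48):
    """Return the run counts described above: multiples of 4 from 8 to 48."""
    return [n for n in range(min_runs, max_runs + 1) if n % 4 == 0]


def smallest_design(num_factors):
    """Return the smallest run count that exceeds the number of factors."""
    for runs in plackett_burman_run_counts():
        if num_factors < runs:
            return runs
    raise ValueError("at most 47 factors are supported")


print(plackett_burman_run_counts())
print(smallest_design(9))  # 12
```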
More
Our intent is to provide only a brief introduction to factorial designs. There are many
resources that provide a thorough treatment of these designs. For a list of resources, see
References on page 19-63.
example, you must determine what the influencing factors are, that is, what processing
conditions influence the values of the response variable. See Planning on page 18-2.
2 In MINITAB, create a new design or use data that is already in your worksheet.
Use Create Factorial Design to generate a full or fractional factorial design; see Creating
Two-Level Factorial Designs on page 19-6, Creating Plackett-Burman Designs on page
19-23, and Creating General Full Factorial Designs on page 19-31.
Use Define Custom Factorial Design to create a design from data you already have in the
worksheet. Define Custom Factorial Design allows you to specify which columns are your
factors and other design characteristics. You can then easily fit a model to the design and
generate plots. See Defining Custom Designs on page 19-34.
3 Use Modify Design to rename the factors, change the factor levels, replicate the design, and
randomize the design. For two-level designs, you can also fold the design, add axial points, and
add center points to the axial block. See Modifying Designs on page 19-37.
4 Use Display Design to change the display order of the runs and the units (coded or uncoded)
in which MINITAB expresses the factors in the worksheet. See Displaying Designs on page
19-41.
5 Perform the experiment and collect the response data. Then, enter the data in your MINITAB
7 Display plots to look at the design and the effects. Use Factorial Plots to display main effects,
interactions, and cube plots; see page 19-52. For two-level designs, use Contour/Surface
(Wireframe) Plots to display contour and surface plots; see page 19-59.
8 If you are trying to optimize responses, use Response Optimizer (page 23-2) or Overlaid
Choosing a Design
The design, or layout, provides the specifications for each experimental run. It includes the
blocking scheme, randomization, replication, and factor level combinations. This information
defines the experimental conditions for each run. When performing the experiment, you
measure the response (observation) at the predetermined settings of the experimental conditions.
Each experimental condition that is employed to obtain a response measurement is a run.
MINITAB provides two-level full and fractional factorial designs, Plackett-Burman designs, and
full factorials for designs with more than two levels. When choosing a design you need to
determine the impact that other considerations (such as cost, time, or the availability of
facilities) have on your choice of a design.
Depending on your problem, there are other considerations that make a design desirable. You
may want to choose a design that allows you to
increase the order of the design sequentially. That is, you may want to build up the initial
design for subsequent experimentation.
perform the experiment in orthogonal blocks. Orthogonally blocked designs allow for model
terms and block effects to be estimated independently and minimize the variation in the
estimated coefficients.
estimate the effects that you believe are important by choosing a design with adequate
resolution. The resolution of a design describes how the effects are confounded. Some
common design resolutions are summarized below:
Resolution III designs: no main effect is aliased with any other main effect. However,
main effects are aliased with two-factor interactions and two-factor interactions are aliased
with each other.
Resolution IV designs: no main effect is aliased with any other main effect or two-factor
interaction. Two-factor interactions are aliased with each other.
Resolution V designs: no main effect or two-factor interaction is aliased with any other
main effect or two-factor interaction. Two-factor interactions are aliased with three-factor
interactions.
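To make the resolution definitions concrete, here is a minimal sketch that derives the defining relation from a set of design generators and reports the resolution as the length of its shortest word. It assumes the generators are independent and written as single letters, such as D = AB; the function names are illustrative and not part of MINITAB.

```python
from itertools import combinations


def multiply(word_a, word_b):
    """Multiply two effect words; repeated letters cancel (A * A = I)."""
    return "".join(sorted(set(word_a) ^ set(word_b)))


def defining_relation(generators):
    """Return all words of the defining relation for generators like {'D': 'AB'}."""
    base_words = [multiply(factor, word) for factor, word in generators.items()]
    words = set()
    for size in range(1, len(base_words) + 1):
        for combo in combinations(base_words, size):
            result = ""
            for word in combo:
                result = multiply(result, word)
            words.add(result)
    return sorted(words, key=lambda w: (len(w), w))


def resolution(generators):
    """Resolution is the length of the shortest word in the defining relation."""
    return min(len(word) for word in defining_relation(generators))


print(defining_relation({"D": "AB", "E": "AC"}))  # ['ABD', 'ACE', 'BCDE']
print(resolution({"D": "AB", "E": "AC"}))         # 3
```

A shortest word of length 3, such as ABD, means a main effect (D) is aliased with a two-factor interaction (AB), which is the resolution III case described above.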
More
You can use default designs from MINITAB's catalog (these designs are shown in the Display
Available Designs subdialog box) or create your own design by specifying the design generators
(see Specifying generators to add factors to the base design on page 19-8).
The default designs cover many industrial product design and development applications. They
are fully described in the Summary of Two-Level Designs on page 19-27.
To create full factorial designs when any factor has more than two levels or you have more than
seven factors, see Creating General Full Factorial Designs on page 19-31.
Note
To create a design from data that you already have in the worksheet, see Defining Custom
Designs on page 19-34.
2 If you want to see a summary of the factorial designs, click Display Available Designs. Use
6 In the box at the top, highlight the design you want to create. If you like, use any of the
Options
Designs subdialog box
replicate the corner points of the design; see Replicating the design on page 19-11
block a design that was created using the default generators; see Blocking the design on page
19-12
for fractional factorials, specify the fraction to use; see Choosing a fraction on page 19-15
Caution
if you choose to display the alias table, you can specify the highest order interaction to print in
the alias table. The default alias table for designs with
up to 7 factors, shows all terms.
8 to 10 factors, shows up to four-way interactions.
11 or more factors, shows up to three-way interactions.
Be careful! High-order interactions with a large number of factors could take a very long
time to compute.
5 In the box at the top, highlight the design you want to create. The selected design will serve as
the base design. If you like, use any of the options listed under Designs subdialog box on page
19-7.
6 Click Generators.
7 In Add factors to the base design by listing their generators, enter the generators for up to
15 additional factors in alphabetical order. Click OK in the Generators and Design subdialog
boxes.
8 If you want to block the design, in Define blocks by listing their generators, enter the block
generators. Click OK in the Generators and Design subdialog boxes. For more information,
see Blocking the design on page 19-12.
9 If you like, click Options, Factors, and/or Results to use any of the options listed on page
Suppose you want to add two factors to a base design with three factors and eight runs.
1 Choose Stat > DOE > Factorial > Create Factorial Design.
Session window output

Factorial Design

Fractional Factorial Design

Factors:    5    Base Design:         3, 8    Resolution: III
Runs:       8    Replicates:             1    Fraction:   1/4
Blocks:  none    Center pts (total):     0

*** NOTE *** Some main effects are confounded with two-way interactions

Design Generators: D = AB  E = AC
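The design summarized in this output can be reproduced by hand. The sketch below builds the three-factor, eight-run base design in standard (Yates) order and then computes the added factors from the generators D = AB and E = AC. It is an illustration only; the function name and the use of -1/+1 dictionaries are assumptions, not MINITAB's internal representation.

```python
from itertools import product


def fractional_factorial(base_factors, generators):
    """Build runs in standard (Yates) order: the first base factor changes fastest."""
    runs = []
    for levels in product([-1, 1], repeat=len(base_factors)):
        # product() varies the last position fastest, so reverse to make A fastest.
        run = dict(zip(base_factors, reversed(levels)))
        for factor, word in generators.items():
            value = 1
            for letter in word:
                value *= run[letter]
            run[factor] = value
        runs.append(run)
    return runs


design = fractional_factorial("ABC", {"D": "AB", "E": "AC"})
for run in design:
    print(" ".join("+" if run[f] > 0 else "-" for f in "ABCDE"))
```

Each printed row is one run; columns D and E are the element-wise products of the base columns named in their generators.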
The way MINITAB adds center points to the design depends on whether you have text, numeric,
or a combination of text and numeric factors. Here is how MINITAB adds center points:
When all of the factors in a design are text, you cannot add center points.
When you have a combination of numeric and text factors, there is no true center to the
design. In this case, center points are called pseudo-center points. When the design is
not blocked, MINITAB adds the specified number of center points for each combination of
the levels of the text factors. When the design is blocked, MINITAB adds the specified
number of center points for each combination of the levels of the text factors to each block.
For example, consider an unblocked 2^3 design. Factors A and C are numeric with levels 0, 10
and .2, .3, respectively. Factor B is text indicating whether a catalyst is present or absent. If you
specify three center points in the Designs subdialog box, MINITAB adds a total of 2 × 3 = 6
pseudo-center points, three points for the low level of factor B and three for the high level.
These six points are
5 present .25
5 present .25
5 present .25
5 absent .25
5 absent .25
5 absent .25
Next, consider a blocked 2^5 design where three factors are text, and there are two blocks.
There are 2 × 2 × 2 = 8 combinations of text levels. If you specify two center points per block,
MINITAB will add 8 × 2 = 16 pseudo-center points to each of the two blocks.
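The counting in both examples can be checked with a short sketch. It places each numeric factor at the midpoint of its low and high levels and repeats that point for every combination of text-factor levels. The function name and argument layout are hypothetical; the result describes one block, so a blocked design would repeat it in each block.

```python
from itertools import product


def pseudo_center_points(numeric_levels, text_levels, points_per_combination):
    """Return center points for one block of a design with numeric and text factors.

    numeric_levels: {factor: (low, high)}; text_levels: {factor: (level1, level2)}.
    """
    centers = {f: (low + high) / 2 for f, (low, high) in numeric_levels.items()}
    text_factors = list(text_levels)
    points = []
    for combo in product(*(text_levels[f] for f in text_factors)):
        for _ in range(points_per_combination):
            point = dict(centers)
            point.update(zip(text_factors, combo))
            points.append(point)
    return points


points = pseudo_center_points(
    {"A": (0, 10), "C": (0.2, 0.3)}, {"B": ("present", "absent")}, 3
)
print(len(points))  # 6
```

With three two-level text factors and two center points, the same function returns 8 × 2 = 16 points, matching the per-block count in the blocked example.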
To add center points to the design
1 In the Create Factorial Design dialog box, click Designs.
2 From Number of center points, choose a number up to 25. Click OK.
The runs that would be added to a two-factor full factorial design are as follows:
Initial design
(one replicate)
A B
- -
+ -
- +
+ +
Each added replicate repeats these four runs.
True replication provides an estimate of the error or noise in your process and may allow for more
precise estimates of effects.
To replicate the design
1 In the Create Factorial Design dialog box, click Designs.
2 From Number of replicates, choose a number up to 10. Click OK.
More
You can also replicate a design after it has been created using Modify Design (page
19-37).
When you have more than one block, MINITAB randomizes each block independently.
The list shows all the possible blocking combinations for the selected design with the number of
specified replicates. If you change the design or the number of replicates, the list will reflect a
new set of possibilities.
If your design has replicates, MINITAB attempts to put the replicates in different blocks. For
details, see Rule for blocks with replicates for default designs on page 19-27.
To block a design created by specifying your own generators
You need to specify your own block generators because MINITAB cannot automatically
determine good generators when you are adding factors.
Suppose you generate a 64 run design with 8 factors (labeled alphabetically) and specify the
block generators to be ABC CDE. This gives four blocks which are shown in standard (Yates)
order below:
Block  ABC  CDE
1      -    -
2      +    -
3      -    +
4      +    +
Note
Blocking a design can reduce its resolution. Let r1 = the resolution before blocking.
Let r2 = the length of the shortest term that is confounded with blocks. Then the
resolution after blocking is the smaller of r1 and (r2 + 1).
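The rule in this note is easy to apply by hand, but a small sketch makes the arithmetic explicit. The function below is hypothetical; it assumes the caller already knows the resolution before blocking and supplies every term that is confounded with blocks (for two block generators, that includes their product).

```python
def resolution_after_blocking(resolution_before, confounded_terms):
    """Apply the rule above: min(r1, r2 + 1), where r2 is the shortest confounded term."""
    shortest = min(len(term) for term in confounded_terms)
    return min(resolution_before, shortest + 1)


# Block generators ABC and CDE also confound their product, ABDE, with blocks.
print(resolution_after_blocking(5, ["ABC", "CDE", "ABDE"]))  # 4
print(resolution_after_blocking(3, ["ABC"]))                 # 3
```

In the first call the shortest confounded term has three letters, so a resolution V design drops to resolution IV; in the second, the design was already resolution III and blocking does not lower it further.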
2 In Define blocks by listing their generators, type the block generators. Click OK.
main effects and two-factor interactions. If you fold on all factors, then all main effects will be free
from each other and from all two-factor interactions.
To fold the design
1 In the Create Factorial Design dialog box, click Options.
Choose Fold on all factors to make all main effects free from each other and all two-factor
interactions.
Choose Fold just on factor and then choose a factor from the list to make the specified
factor and all its two-factor interactions free from other main effects and two-factor
interactions.
Method
For example, suppose you are creating a three-factor design in four runs.
When you fold on all factors, MINITAB adds to the original four runs, four runs with all the
signs reversed thereby doubling the number of runs.
When you fold on one factor, MINITAB reverses the signs on the specified factor while the
signs on the remaining factors are left alone. These rows are then appended to the end of the
data matrix, doubling the number of runs.
Original fraction
A B C
- - +
+ - -
- + -
+ + +
Folded on factor A
A B C
- - +
+ - -
- + -
+ + +
+ - +
- - -
+ + -
- + +
When you fold a design, the defining relation is usually shortened. Specifically, any word in the
defining relation that has an odd number of the letters on which you folded the design is omitted.
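The folding method described above can be sketched as follows, using the three-factor, four-run fraction generated by C = AB. The `fold` helper is a hypothetical name, and runs are represented as -1/+1 dictionaries purely for illustration.

```python
def fold(runs, factors_to_reverse):
    """Append a copy of every run with the signs of the chosen factors reversed."""
    folded = [
        {f: (-v if f in factors_to_reverse else v) for f, v in run.items()}
        for run in runs
    ]
    return runs + folded


# Three-factor design in four runs, generated by C = AB, in standard order.
original = [
    {"A": a, "B": b, "C": a * b}
    for b in (-1, 1)
    for a in (-1, 1)
]

folded_all = fold(original, {"A", "B", "C"})
folded_on_a = fold(original, {"A"})

for run in folded_on_a:
    print(" ".join("+" if run[f] > 0 else "-" for f in "ABC"))
```

In this small case the only word in the defining relation, ABC, contains an odd number of the folded letters either way, so it is omitted and both folded designs contain all eight runs of the full 2^3 factorial.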
If you fold a design and the defining relation is not shortened, then the folding just adds
replicates. It does not reduce confounding. In this case, MINITAB gives you an error message.
If you fold a design that is blocked, the same block generators are used for the folded design as
for the unfolded design.
Choosing a fraction
When you create a fractional factorial design, MINITAB uses the principal fraction by default.
The principal fraction is the fraction where all signs are positive. However, there may be
situations when a design contains points that are impractical to run and choosing an appropriate
fraction can avoid these points.
A full factorial design with 5 factors requires 32 runs. If you want just 8 runs, you need to use a
one-fourth fraction. You can use any of the four possible fractions of the design. MINITAB
numbers the runs in standard (Yates) order using the design generators as follows:
1   D = -AB   E = -AC
2   D = AB    E = -AC
3   D = -AB   E = AC
4   D = AB    E = AC
The default fraction is called the principal fraction. This is the fraction where all signs are
positive (D = AB, E = AC). In the blocking example, shown on page 19-20, we asked for the
third fraction. This is the one with design generators D = -AB and E = AC.
Choosing an appropriate fraction can avoid points that are impractical or impossible to run. For
example, suppose you could not run the design in the previous example with all five factors set at
their high level. The principal fraction contains this point, but the third fraction does not.
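The claim about the all-high point can be verified directly. The sketch below enumerates the eight runs of a fraction from the signs on its two generators; the function name is hypothetical, and the sign convention (D = ±AB, E = ±AC) follows the fraction numbering above.

```python
from itertools import product


def fraction_runs(sign_d, sign_e):
    """Runs of the 5-factor, 8-run design with D = sign_d*AB and E = sign_e*AC."""
    runs = []
    for c, b, a in product([-1, 1], repeat=3):
        runs.append((a, b, c, sign_d * a * b, sign_e * a * c))
    return runs


all_high = (1, 1, 1, 1, 1)
principal = fraction_runs(+1, +1)  # D = AB,  E = AC
third = fraction_runs(-1, +1)      # D = -AB, E = AC

print(all_high in principal)  # True
print(all_high in third)      # False
```

Taken together, the four sign combinations split the 32 runs of the full factorial into four non-overlapping sets of eight.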
Note
If you choose to use a fraction other than the principal fraction, you cannot use minus
signs for the design generators in the Generators subdialog box. Using minus signs in this
case is not useful anyway.
StdOrder shows what the order of the runs in the experiment would be if the experiment was
done in standard order, also called Yates order.
RunOrder shows what the order of the runs in the experiment would be if the experiment was
run in random order.
If you do not randomize, the run order and standard order are the same.
If you want to re-create a design with the same ordering of the runs (that is, the same design
order), you can choose a base for the random data generator. Then, when you want to re-create
the design, you just use the same base.
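The idea of a reproducible random order can be sketched with a seeded generator. This is an analogy only: Python's random number generator is not MINITAB's, so the same base will not reproduce MINITAB's run order. The function name and the tuple layout (StdOrder, RunOrder, Block) are assumptions made for the example.

```python
import random


def randomize_run_order(blocks, base):
    """Assign RunOrder by shuffling StdOrder within each block, using a fixed seed.

    blocks: list of block numbers, one per run, listed in standard order.
    Returns a list of (StdOrder, RunOrder, Block) tuples.
    """
    rng = random.Random(base)
    std_orders_by_block = {}
    for std_order, block in enumerate(blocks, start=1):
        std_orders_by_block.setdefault(block, []).append(std_order)

    rows = []
    run_order = 0
    for block in sorted(std_orders_by_block):
        members = std_orders_by_block[block]
        rng.shuffle(members)  # each block is shuffled on its own
        for std_order in members:
            run_order += 1
            rows.append((std_order, run_order, block))
    return rows


for row in randomize_run_order([1, 1, 1, 1, 2, 2, 2, 2], base=9):
    print(row)
```

Calling the function again with the same base returns the same ordering, which is the behavior the text describes for re-creating a design.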
Note
When you have more than one block, MINITAB randomizes each block independently.
More
You can use Stat > DOE > Display Design (page 19-41) to switch back and forth
between a random and standard order display in the worksheet.
C3 (CenterPt) (two-level factorials and Plackett-Burman designs only) contains a 0 if the row
is a center point run. Otherwise, it contains a 1.
C4 (Blocks) stores the blocking variable. When the design is not blocked, MINITAB sets all
column values to 1.
C5 through Cn store the factors. MINITAB stores each factor in your design in a separate column.
If you name the factors, these names display in the worksheet. If you did not provide names,
MINITAB names the factors alphabetically. After you create the design, you can change the factor
names directly in the Data window or with Modify Design (page 19-41).
If you did not assign factor levels in the Factors subdialog box, MINITAB stores factor levels in
coded form (all factor levels are -1 or +1). If you assigned factor levels, the uncoded levels display
in the worksheet. After you create the design, you can change the factor levels with Modify
Design (page 19-41).
More
You can use Stat > DOE > Display Design (page 19-41) to switch back and forth
between a coded and uncoded display in the worksheet.
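For numeric factors, the relationship between coded and uncoded levels is a linear rescaling. The sketch below assumes the usual convention that the low level maps to -1 and the high level to +1; the function names are illustrative, and text factors are outside its scope.

```python
def uncode(coded, low, high):
    """Map a coded level (-1 to +1) to the factor's actual units."""
    return (low + high) / 2 + coded * (high - low) / 2


def code(value, low, high):
    """Map an actual level back to coded units."""
    return (value - (low + high) / 2) / ((high - low) / 2)


print(uncode(-1, 0, 10), uncode(0, 0, 10), uncode(1, 0, 10))
print(code(5, 0, 10))
```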
Caution
When you create a design using Create Factorial Design, MINITAB stores the appropriate
design information in the worksheet. MINITAB needs this stored information to analyze
and plot data. If you want to use Analyze Factorial Design, you must follow certain rules
when modifying the worksheet data. If you do not, you may corrupt your design. See
Modifying and Using Worksheet Data on page 18-4.
If you make changes that corrupt your design, you may still be able to analyze it with
Analyze Factorial Design after you use Define Custom Factorial Design (page 19-34).
Naming factors
By default, MINITAB names the factors alphabetically, skipping the letter I.
To name factors
1 In the Create Factorial Design dialog box, click Factors.
2 Under Name, click in the first row and type the name of the first factor. Then, use the down arrow key
to move down the column and enter the remaining factor names. Click OK.
More
After you have created the design, you can change the factor names by typing new
names in the Data window or with Modify Design (page 19-37).
2 Under Low, click in the factor row to which you would like to assign values and enter any numeric or
text value. Use the right arrow key to move to High and enter a value. For numeric levels, the High
value must be larger than the Low value.
3 Repeat step 2 to assign levels for other factors. Click OK.
More
To change the factor levels after you have created the design, use Stat > DOE > Modify
Design. Unless some runs result in botched runs, do not change levels by typing them in
the worksheet.
Suppose you want to study the influence six input variables (factors) have on shrinkage of a plastic
fastener of a toy. The goal of your pilot study is to screen these six factors to determine which ones
have the greatest influence. Because you assume that three-way and four-way interactions are
negligible, a resolution IV factorial design is appropriate. You decide to generate a 16-run
fractional factorial design from MINITAB's catalog.
1 Choose Stat > DOE > Factorial > Create Factorial Design.
2 From Number of factors, choose 6.
3 Click Designs.
4 In the box at the top, highlight the line for 1/4 fraction. Click OK.
5 Click Results. Choose Summary table, alias table, data table, defining relation.
6 Click OK in each dialog box.
Session window output

Factorial Design

Fractional Factorial Design

Factors:    6    Base Design:         6, 16    Resolution: IV
Runs:      16    Replicates:              1    Fraction:   1/4
Blocks:  none    Center pts (total):      0
DEF + ABCDF
CDF + ABDEF
BDF + ACDEF
BCF + ABCDE
ADF + BCDEF
BCD + ABCEF
ACDF + BDEF
ABDF + CDEF
ABCF + BCDE
DF + ABCDEF
ABCD + BCEF
ABEF + ACDE
ABDE + ACEF
+ BEF + CDE
+ BDE + CEF
[Data table: design matrix for factors A through F, 16 runs]
MINITAB randomizes the design by default, so if you try to replicate this example, your
runs may not match the order shown.
AE + BC + DF + ABCDEF
You can assign the remaining three factors to D, E, and F in any way.
If you also wanted to study the three-way interaction among pressure, speed, and cooling, this
assignment would not work because ABC is confounded with E. However, you could assign
pressure to A, speed to B, and cooling to D.
Example of creating a blocked design
You would like to study the effects of five input variables on the impurity of a vaccine. Each batch
only contains enough raw material to manufacture four tubes of the vaccine. To remove the
effects due to differences in the four batches of raw material, you decide to perform the
experiment in four blocks. To determine the experimental conditions that will be used for each
run, you create a 5-factor, 16-run design, in 4 blocks.
1 Choose Stat > DOE > Factorial > Create Factorial Design.
2 From Number of factors, choose 5.
3 Click Designs.
4 In the box at the top, highlight the line for 1/2 fraction.
5 From Number of blocks, choose 4.
6 Click Results. Choose Summary table, alias table, data table, defining relation. Click OK
Factorial Design

Fractional Factorial Design

Factors:    5    Base Design:         5, 16
Runs:      16    Replicates:              1
Blocks:     4    Center pts (total):      0
AB AC
[Data table: design matrix for factors A through E, 16 runs in 4 blocks]
MINITAB randomizes the design by default, so if you try to replicate this example, your
runs may not match the order shown.
6 From Number of runs, choose the number of runs for your design. This list contains only
acceptable numbers of runs based on the number of factors you choose in step 4. (Each design
is based on the number of runs, from 8 to 48, and is always a multiple of 4. The number of
factors must be less than the number of runs.)
7 If you like, use any of the options listed under Design subdialog box below.
Even if you do not use any of these options, click OK. This selects the design and brings you
back to the main dialog box.
8 If you like, click Options or Factors to use any of the options listed below, then click OK to
Options
Design subdialog box
replicate the corner points of the design. For example, suppose you are creating a design with
3 factors and 12 runs, and you specify 2 replicates. Each of the 12 runs will be repeated for a
total of 24 runs in the experiment. MINITAB does not replicate center points. See Replicating
the design on page 19-11.
When all factors are numeric, MINITAB adds the specified number of center points to the
design.
When all of the factors in a design are text, you cannot add center points.
When you have a combination of numeric and text factors, there is no true center to the
design. In this case, center points are called pseudo-center points. MINITAB adds the specified
number of center points for each combination of the levels of the text factors.
Suppose you want to study the effects of 9 factors using only 12 runs, with 3 center points. In this
12-run design, each main effect is partially confounded with more than one two-way interaction.
1 Choose Stat > DOE > Factorial > Create Factorial Design.
2 Choose Plackett-Burman design.
3 From Number of factors, choose 9.
4 Click Designs.
Session window output

Factorial Design

Plackett-Burman Design

Factors:    9    Replicates:             1
Runs:      15    Center pts (total):     3
Design:    12
[Data table: design matrix for factors A through J (skipping I), 15 runs including 3 center points]
MINITAB randomizes the design by default, so if you try to replicate this example, your
runs may not match the order shown.
Number of factors
3
full
IV
III
III
III
full
IV
full
III
8
16
32
10
11
12
13
14
15
IV
IV
III
III
III
III
III
III
III
full
VI
IV
IV
IV
IV
IV
IV
IV
IV
IV
16
16
full
VII
IV
IV
IV
IV
IV
IV
IV
32
16
16
16
16
16
16
16
16
16
full
VIII
VI
IV
IV
IV
IV
64
32
16
16
16
16
16
16
16
64
128
Then k = 3, b = 6, r = 15, and n = 8. The greatest common divisor of b and r is 3. Then B = 2 and
R = 5. Start with the design for 3 factors, 8 runs, and 2 blocks. Replicate this design 15 times. This
gives a total of 2 × 15 = 30 blocks, numbered 1, 2, 1, 2, 1, 2, ..., 1, 2. Renumber these blocks as 1,
2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, ..., 1, 2, 3, 4, 5, 6. This gives 6 blocks, each replicated 5 times.
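The arithmetic in this example can be sketched as follows, reading B and R as b and r divided by their greatest common divisor, which is what the numbers above imply. The function name is hypothetical and only the block renumbering is modeled, not the design itself.

```python
from math import gcd


def blocks_with_replicates(b, r):
    """Return (B, R, block_numbers) for b requested blocks and r replicates.

    B is the number of blocks in the starting design, R is how many times each
    of the b final blocks is replicated, and block_numbers is the renumbered list.
    """
    divisor = gcd(b, r)
    B = b // divisor
    R = r // divisor
    total_blocks = B * r  # blocks after replicating the design r times
    block_numbers = [(i % b) + 1 for i in range(total_blocks)]
    return B, R, block_numbers


B, R, numbers = blocks_with_replicates(6, 15)
print(B, R, len(numbers))  # 2 5 30
print(numbers[:12])        # [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
```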
factor  runs  design generators
2       4     full
              2(AB)3
3       4     C=AB
              no blocking
3       8     full
              2(ABC)4 4(AB,AC)3
4       8     D=ABC
              2(AB)3 4(AB,AC)3
4       16    full
              2(ABCD)5 4(BC,ABD)3 8(AB,BC,CD)3
5       8     D=AB E=AC
              2(BC)3
5       16    E=ABCD
              2(AB)3 4(AB,AC)3
5       32    full
              2(ABCDE)6 4(ABC,CDE)4 8(AC,BD,ADE)3 16(AB,AC,CD,DE)3
6       8     D=AB E=AC F=BC
              2(BE)3
6       16    E=ABC F=BCD
              2(ACD)4 4(AE,ACD)3 8(AB,BC,BF)3
6       32    F=ABCDE
              2(ABF)4 4(BC,ABF)3 8(AD,BC,ABF)3 16(AB,BC,CD,DE)3
6       64    full
              2(ABCDEF)7 4(ABCF,ABDE)5 8(ACE,ADF,BCF)4
              16(AD,BE,CE,ABF)3 32(AB,BC,CD,DE,EF)
7       8     D=AB E=AC F=BC G=ABC
              no blocking
7       16    E=ABC F=BCD G=ACD
factor  runs  design generators
              2(ABD)4 4(AB,AC)3 8(AB,AC,AD)3
7       32    F=ABCD G=ABDE
              2(CDE)4 4(CF,CDE)3 8(AB,AD,CG)3
7       64    G=ABCDEF
              2(CDE)4 4(ACF,CDE)4 8(ACF,ADG,CDE)4 16(AB,AC,EF,EG)3
7       128   full
              2(ABCDEFG)8 4(ABDE,ABCFG)5 8(ABC,AFG,DEF)4 16(ABE,ADG,CDE,EFG)4
              32(AC,BD,CE,DF,ABG)3 64(AB,BC,CD,DE,EF,FG)3
8       16    E=BCD F=ACD G=ABC H=ABD
              2(AB)3 4(AB,AC)3 8(AB,AC,AD)3
8       32    F=ABC G=ABD H=BCDE
              2(ABE)4 4(EH,ABE)3 8(AB,AC,BD)3
8       64    G=ABCD H=ABEF
              2(ACE)4 4(ACE,BDF)4 8(BC,FH,BDF)3 16(BC,DE,FH,BDF)3
8       128   H=ABCDEFG
              2(ABCD)5 4(ABCD,ABEF)5 8(ABCD,ABEF,BCEG)5
              16(BF,DE,ABG,AEH)3 32(AC,BD,BF,DE,AEH)3
9       16    E=ABC F=BCD G=ACD H=ABD J=ABCD
              2(AB)3 4(AB,AC)3
9       32    F=BCDE G=ACDE H=ABDE J=ABCE
              2(AEF)4 4(AB,CD)3 8(AB,AC,CD)3
9       64    G=ABCD H=ACEF J=CDEF
              2(BCE)4 4(ABF,ACJ)4 8(AD,AH,BDE)3 16(AC,AD,AJ,BF)3
9       128   H=ACDFG J=BCEFG
              2(CDEJ)5 4(ABFJ,CDEJ)5 8(ACF,AHJ,BCJ)4 16(AE,CG,BCJ,BDE)3
10      16    E=ABC F=BCD G=ACD H=ABD J=ABCD K=AB
              2(AC)3 4(AD,AG)3
10      32    F=ABCD G=ABCE H=ABDE J=ACDE K=BCDE
              2(AB)3 4(AB,BC)3 8(AB,AC,AH)3
10      64    G=BCDF H=ACDF J=ABDE K=ABCE
              2(AGJ)4 4(CD,AGJ)3 8(AG,CJ,CK)3 16(AC,AG,CJ,CK)3
10      128   H=ABCG J=BCDE K=ACDF
              2(ADG)4 4(ADG,BDF)4 8(AEH,AGK,CDH)4 16(BH,EG,JK,ADG)3
11      16    E=ABC F=BCD G=ACD H=ABD J=ABCD K=AB L=AC
              2(AD)3 4(AE,AH)3
11      32    F=ABC G=BCD H=CDE J=ACD K=ADE L=BDE
              2(ABD)4 4(AK,ABD)3 8(AB,AC,AD)3
11      64    G=CDE H=ABCD J=ABF K=BDEF L=ADEF
              2(AHJ)4 4(FL,AHJ)3 8(CD,CE,DL)3 16(AB,AC,AE,AF)3
11      128   H=ABCG J=BCDE K=ACDF L=ABCDEFG
              2(ADJ)4 4(ADJ,BFH)4 8(ADJ,AHL,BFH)4 16(BC,DF,GL,BFH)3
factor  runs  design generators
12      16
12      32
12      64
12      128
13      16
13      32
13      64
13      128
14      16
14      32
14      64
14      128
15      16
15      32
15      64
15      128
Plackett-Burman designs
These are the designs given in [4], up through n = 48, where n is the number of runs. In all cases
except n = 28, the design can be specified by giving just the first column of the design matrix. In
the table below, we give this first column (written as a row to save space). This column is
permuted cyclically to get an (n - 1) × (n - 1) matrix. Then a last row of all minus signs is added.
For n = 28, we start with the first 9 rows. These are then divided into 3 blocks of 9 columns each.
Then the 3 blocks are permuted (rowwise) cyclically and a last column of all minus signs is
added to get the full design.
Each design can have up to k = (n - 1) factors. If you specify a k that is less than (n - 1), just the
first k columns are used.
8 Runs
+++-+--
12 Runs
++-+++---+-
16 Runs
++++-+-++--+---
20 Runs
++--++++-+-+----++-
24 Runs
+++++-+-++--++--+-+----
28 Runs
+-++++----+---+--+++-+-++-+
++-+++-----++--+---++++-++-
-+++++---+---+--+-+-+-++-++
---+-++++--+-+---++-+++-+-+
---++-++++----++--++--++++-
----+++++-+-+---+--+++-+-++
+++---+-+--+--+-+-+-++-+++-
+++---++-+--+----+++-++--++
+++----++-+--+-+---++-+++-+
32 Runs
----+-+-+++-++---+++++--++-+--+
36 Runs
-+-+++---+++++-+++--+----+-+-++--+-
40 Runs (derived by duplicating the 20 run design)
++--++++-+-+----++-++--++++-+-+----++-
44 Runs
++--+-+--+++-+++++---+-+++-----+---++-+-++-
48 Runs
+++++-++++--+-+-+++--+--++-++---+-+-++----+----
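The cyclic construction described above can be sketched for the 12-run design, whose first column is + + - + + + - - - + -. The function below is illustrative only: it places the generator in the first column, shifts it down by one position for each later column, and appends the row of minus signs. Row order may differ from MINITAB's worksheet, and the 28-run case needs the block construction instead.

```python
def plackett_burman(first_column, num_factors=None):
    """Build a design from a generating column such as '++-+++---+-' (12 runs)."""
    signs = [1 if ch == "+" else -1 for ch in first_column]
    size = len(signs)  # n - 1
    # Column j is the first column shifted cyclically by j positions.
    rows = [[signs[(i - j) % size] for j in range(size)] for i in range(size)]
    rows.append([-1] * size)  # last row of all minus signs
    k = size if num_factors is None else num_factors
    return [row[:k] for row in rows]  # keep just the first k columns


design = plackett_burman("++-+++---+-", num_factors=9)
for row in design:
    print(" ".join("+" if v > 0 else "-" for v in row))
```

Every column of the result has six plus signs and six minus signs, and any two columns are orthogonal, which is what allows main effects to be estimated independently of one another.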
5 Click in Number of Levels in the row for Factor A and enter a number from 2 to 10. Use the
down arrow key to move down the column and specify the number of levels for each factor.
6 If you like, use any of the options listed under Designs subdialog box on page 19-7.
7 Click OK. This selects the design and brings you back to the main dialog box.
8 If you like, click Options or Factors and use any of the options listed on page 19-33, then click
Options
Design subdialog box
name factors.
replicate the design up to 10 times. For example, suppose you are creating a design with 3
factors and 12 runs, and you specify 2 replicates. Each of the 12 runs will be repeated for a
total of 24 runs in the experiment.
block the design on replicates. Each set of replicate points will be placed in a separate block.
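The replicate and block options above amount to simple bookkeeping, sketched here in Python. The function name replicate_design and the column names StdOrder and Block are illustrative assumptions, not MINITAB's internal representation.

```python
def replicate_design(runs, replicates, block_on_replicates=False):
    """Repeat every run of the base design once per replicate.

    When block_on_replicates is True, each replicate gets its own block number.
    """
    out = []
    for rep in range(1, replicates + 1):
        for run in runs:
            row = dict(run)
            if block_on_replicates:
                row["Block"] = rep
            out.append(row)
    return out


# A base design with 12 runs and 2 replicates gives 24 runs in total.
base = [{"StdOrder": i + 1} for i in range(12)]
design = replicate_design(base, 2, block_on_replicates=True)
print(len(design))
```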
Naming factors
By default, MINITAB names the factors alphabetically, skipping the letter I.
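As a small illustration of this naming rule, the following Python sketch lists the default names for a given number of factors. The function name default_factor_names is hypothetical, and the sketch assumes at most 25 factors, since this section does not describe what happens once the alphabet is exhausted.

```python
import string


def default_factor_names(k):
    """Return k default factor names: A, B, C, ... with the letter I skipped."""
    letters = [c for c in string.ascii_uppercase if c != "I"]
    if not 1 <= k <= len(letters):
        raise ValueError("this sketch handles 1 to 25 factors only")
    return letters[:k]


print(default_factor_names(10))
```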
To name factors
1 In the Create Factorial Design dialog box, click Factors.
2 Under Name, click in the first row and type the name of the first factor. Then, use the down
arrow key to move down the column and enter the remaining factor names. Click OK.
More
After you have created the design, you can change the factor names by typing new
names in the Data window, or with Modify Design (page 19-37).
Defining Custom Designs
2 Under Level Values, click in the factor row to which you would like to assign values and enter
any numeric or text value. Enter numeric levels from lowest to highest.
3 Use the down arrow key to move down the column and assign levels for the remaining factors. Click
OK.
More
To change the factor levels after you have created the design, use Stat > DOE > Modify
Design. Unless some runs were botched, do not change levels by typing them in
the worksheet.
low level; the highest value in a factor column as the high level.
1 Under Low and High Values for Factors, choose These values are specified below.
2 Under Low, click in the factor row to which you would like to assign values and enter the
appropriate numeric or text value. Use the right arrow key to move to High and enter a value.
For numeric levels, the High value must be larger than the Low value.
3 Repeat step 2 to assign levels for other factors.
4 Under Worksheet Data Are, choose Coded or Uncoded.
5 Click OK.
6 Do one of the following:
If you do not have any worksheet columns containing the standard order, run order, center
point indicators, or blocks, click OK in each dialog box.
If you have worksheet columns that contain data for the blocks, center point identification
(two-level designs only), run order, or standard order, click Designs.
1 If you have a column that contains the standard order of the experiment, under
Standard Order Column, choose Specify by column and enter the column containing
the standard order.
2 If you have a column that contains the run order of the experiment, under Run Order
Column, choose Specify by column and enter the column containing the run order.
3 For two-level designs, if you have a column that contains the center point identification
values, under Center points, choose Specify by column and enter the column
containing these values. The column must contain only 0s and 1s. MINITAB considers 0
a center point; 1 not a center point.
4 If your design is blocked, under Blocks, choose Specify by column and enter the
Modifying Designs
After creating a factorial design and storing it in the worksheet, you can use Modify Design to
make the following modifications:
add axial points to the design. You can also add center points to the axial block.
By default, MINITAB will replace the current design with the modified design.
3 Enter new factor names or factor levels as shown in Naming factors on page 19-17 and Setting
You can also type new factor names directly into the Data window.
True replication provides an estimate of the error or noise in your process and may allow for more
precise estimates of effects.
To replicate the design
1 Choose Stat > DOE > Modify Design.
2 Choose Replicate design and click Specify.
2 Choose Randomize design and click Specify.
Choose Randomize just block, and choose a block number from the list.
4 If you like, in Base for random data generator, enter a number. Click OK.
More
You can use Stat > DOE > Display Design (page 19-41) to switch back and forth
between a random and standard order display in the worksheet.
Choose Fold on all factors to make all main effects free from each other and all two-factor
interactions.
Choose Fold just on factors and then choose a factor from the list to make the specified
factor and all its two-factor interactions free from other main effects and two-factor
interactions.
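Folding can be pictured as appending a mirror image of the design in which the signs of the chosen factors are reversed. The Python sketch below illustrates the idea on a hypothetical half fraction with C = AB; the function name fold and the dictionary layout are assumptions made for the example.

```python
def fold(design, factors=None):
    """Append mirror-image runs with the signs of the chosen factors reversed.

    factors=None folds on all factors; otherwise pass a list of factor names.
    """
    flip = set(design[0]) if factors is None else set(factors)
    mirrored = [
        {name: (-level if name in flip else level) for name, level in run.items()}
        for run in design
    ]
    return design + mirrored


# Hypothetical 4-run half fraction of a three-factor design, with C = AB.
half = [{"A": a, "B": b, "C": a * b} for a in (-1, 1) for b in (-1, 1)]
folded_all = fold(half)         # fold on all factors
folded_a = fold(half, ["A"])    # fold just on factor A
print(len(folded_all), len(folded_a))
```

In this small case, folding on all factors recovers the full eight-run factorial, so no main effect remains aliased with a two-factor interaction.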
Orthogonally blocked designs allow for model terms and block effects to be estimated
independently and minimize the variation in the estimated coefficients. Rotatable designs
provide the desirable property of constant prediction variance at all points that are
equidistant from the design center, thus improving the quality of the prediction.
4 If you want to add center points to the axial block, enter a number in Add the following
If you are building up a factorial design into a central composite design and would like to
consider the properties of orthogonal blocking and rotatability, use the table on page
20-17 for guidance on choosing α and the number of center points to add.
Displaying Designs
After you create the design, you can use Display Design to change the way the design points
display in the worksheet. You can change the design points in two ways:
display the points in either random or standard order. Standard order is the order of the runs if
the experiment was done in Yates order. Run order is the order of the runs if the experiment
was done in random order.
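The difference between the two orders can be sketched in Python. This is only an illustration: the function names are hypothetical, the seed argument plays the role of a base for the random number generator, and Python's generator will not reproduce the run orders that MINITAB produces.

```python
import itertools
import random


def standard_order(k):
    """Two-level full factorial in Yates order (first factor changes fastest)."""
    return [tuple(reversed(levels))
            for levels in itertools.product((-1, 1), repeat=k)]


def add_run_order(runs, seed=None):
    """Attach a standard order number and a randomized run order number."""
    rng = random.Random(seed)
    order = list(range(1, len(runs) + 1))
    rng.shuffle(order)
    return [{"StdOrder": i + 1, "RunOrder": order[i], "Levels": run}
            for i, run in enumerate(runs)]


def display(rows, by="RunOrder"):
    """Return the rows sorted for display by the chosen order column."""
    return sorted(rows, key=lambda row: row[by])


rows = add_run_order(standard_order(3), seed=9)
for row in display(rows, by="StdOrder"):
    print(row)
```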
2 Choose Run order for the design or Standard order for the design. If you do not randomize
a design, the columns that contain the standard order and run order are the same.
3 Do one of the following:
If you want to reorder all worksheet columns that are the same length as the design
columns, click OK.
Collecting and Entering Data
and factor settings in the worksheet. These columns constitute the basis of your data collection
form. If you did not name factors or specify factor levels when you created the design, and you
want names or levels to appear on the form, use Modify Design (page 19-37).
2 In the worksheet, name the columns in which you will record the measurement data obtained
More
You can also copy the worksheet cells to the Clipboard by choosing Edit > Copy Cells.
Then paste the Clipboard contents into a word-processing application, such as Microsoft
WordPad or Microsoft Word, where you can create your own form.
create a design from data that you already have in the worksheet with Define Custom
Factorial Design
Data
Enter up to 25 numeric response data columns that are equal in length to the design variables in
the worksheet. Each row will contain data corresponding to one run of your experiment. You
may enter the response data in any column(s) not occupied by the design data. The number of
columns reserved for the design data is dependent on the number of factors in your design.
If there is more than one response variable, MINITAB fits separate models for each response.
MINITAB omits missing data from all calculations.
Note
When all the response variables do not have the same missing value pattern, MINITAB
displays a message. Since you would get different results, you may want to repeat the
analysis separately for each response variable.
When you have a botched run, you need to determine the extent to which the actual
factor settings deviate from the planned settings. When the executed settings fall within
the normal range of their set points, you may not wish to alter the factor levels in the
worksheet. The variability in the actual factor levels will simply contribute to the overall
experimental error. However, if the executed levels differ notably from the planned levels,
you should change them in the worksheet.
Analyzing Factorial Designs
Options
Graphs subdialog box
for two-level factorial and Plackett-Burman designs, draw two effects plots, a normal plot and
a Pareto chart (see Effects plots on page 19-47).
draw five different residual plots for regular, standardized, or deleted residuals (see Choosing
a residual type on page 2-5). Available residual plots include a
histogram.
normal probability plot.
plot of residuals versus the fitted values (Ŷ).
plot of residuals versus data order. The row number for each data point is shown on the
x-axis, for example, 1 2 3 4 ... n.
separate plot for the residuals versus each specified column.
For a discussion, see Residual plots on page 2-5.
fit a model by specifying the maximum order of the terms, or choose which terms to include
from a list of all estimable terms (see Specifying the model on page 19-46).
for two-level factorial and Plackett-Burman designs, include center points in the model.
Caution
Be careful! High-order interactions with a large number of factors could take a very long
time to compute.
display the adjusted (also called fitted or least squares) means for factors and interactions
that are in the model. If the design is orthogonal and there are no covariates, each adjusted
mean is just the average of all the observations in the corresponding cell.
for general full factorial designs, display the following in the Session window:
no results.
the ANOVA table.
the default results, which includes the ANOVA table, covariate coefficients, and unusual
observations.
the ANOVA table, all coefficients, and unusual observations.
include up to 50 covariates in your model. Covariates are fit first, then the blocks, then all
other terms.
store the fits and regular, standardized, and deleted residuals separately for each response
(see Choosing a residual type on page 2-5).
for two-level factorial and Plackett-Burman designs, store the effects for each response in a
separate column. The effects for the constant, covariates, center points or blocks are not
stored.
store the coefficients and the design matrix for the model, separately for each response. The
design matrix multiplied by the coefficients will yield the fitted values.
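The relationship between the stored design matrix, the coefficients, and the fitted values can be checked with a short Python sketch. The two-factor design, the coefficient values, and the function names below are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def design_matrix(runs):
    """Columns for a two-factor model in coded units: constant, A, B, A*B."""
    return [[1, a, b, a * b] for a, b in runs]


def fitted_values(matrix, coefficients):
    """Multiply each row of the design matrix by the coefficient vector."""
    return [sum(x * c for x, c in zip(row, coefficients)) for row in matrix]


runs = [(-1, -1), (1, -1), (-1, 1), (1, 1)]   # coded factor levels
coefficients = [45.0, 1.5, 1.0, 0.5]          # constant, A, B, A*B (made up)
fits = fitted_values(design_matrix(runs), coefficients)
print(fits)
```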
for two-level factorial and Plackett-Burman designs, store information about the fitted model
in a column by checking Factorial (see Help for the structure of this column).
store the leverages, Cook's distances, and DFITS for identifying outliers (see Identifying
outliers on page 2-9).
For a full factorial design, by default MINITAB fits all terms up to the maximum order. For
example, MINITAB will fit all terms up to the four-way interaction for a four-factor design. If
you do not want the default model, you can select terms by specifying the maximum order, or you
can fit a model that is a subset of these terms. The table below shows the terms that would be
fit for a four-factor design.

If you choose             Terms fit
linear                    A B C D

linear                    A B C D
two-way interactions      AB AC AD BC BD CD

linear                    A B C D
two-way interactions      AB AC AD BC BD CD
three-way interactions    ABC ABD ACD BCD

linear                    A B C D
two-way interactions      AB AC AD BC BD CD
three-way interactions    ABC ABD ACD BCD
four-way interaction      ABCD
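The list of terms for a given maximum order is just every combination of factors up to that size. A minimal Python sketch, with the hypothetical function name model_terms, shows how the four-factor table can be generated.

```python
from itertools import combinations


def model_terms(factors, max_order):
    """List all model terms up through the given order, lowest order first."""
    return ["".join(combo)
            for order in range(1, max_order + 1)
            for combo in combinations(factors, order)]


for order in range(1, 5):
    print(order, " ".join(model_terms("ABCD", order)))
```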
For a fractional factorial design, the default terms selected are based on the alias structure. For
example, suppose a five-factor design has the following alias structure:

A  + BD + CE  + ABCDE
B  + AD + CDE + ABCE
C  + AE + BDE + ABCD
D  + AB + BCE + ACDE
E  + AC + BCD + ABDE
BC + DE + ABE + ACD
BE + CD + ABC + ADE

By default, MINITAB fits the highlighted terms. If you do not want the default model, select
terms by specifying the maximum order, or fit a model that is a subset of these terms.
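An alias structure like the one in this example follows from the design generators by multiplying terms, with any letter that appears twice cancelling out. The Python sketch below assumes the generators D = AB and E = AC, written as the words ABD and ACE. These generators are not stated in this section; they are an assumption chosen because they are consistent with the alias structure in the example. The function names are hypothetical.

```python
from itertools import combinations


def multiply(word1, word2):
    """Multiply two terms; letters that appear in both cancel."""
    return "".join(sorted(set(word1) ^ set(word2)))


def defining_relation(generator_words):
    """All products of the generator words (the identity is left out)."""
    words = set()
    for size in range(1, len(generator_words) + 1):
        for combo in combinations(generator_words, size):
            product = ""
            for word in combo:
                product = multiply(product, word)
            words.add(product)
    return words


def aliases(term, generator_words):
    """Terms aliased with the given term, shortest first."""
    found = (multiply(term, word) for word in defining_relation(generator_words))
    return sorted(found, key=lambda t: (len(t), t))


generators = ["ABD", "ACE"]   # assumed: D = AB and E = AC
for term in ["A", "B", "C", "D", "E", "BC", "BE"]:
    print(term, "+", " + ".join(aliases(term, generators)))
```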
from Include terms in the model up through order, choose a number. The available
choices depend on the number of factors in your design.
3 Move the terms you want to fit from the Available box to the Selected box using the arrow buttons.
the model, check Include center point column as a term in the model. Click OK.
Effects plots
The primary goal of screening designs is to identify the vital few factors or key variables that
influence the response. MINITAB provides two graphs that help you identify these influential
factors: a normal plot and a Pareto chart. These graphs allow you to compare the relative
magnitude of the effects and evaluate their statistical significance.
The normal probability plot labels important effects using α = 0.10, by default. You can change
the α-level in the Graphs subdialog box.
When there is no error term, MINITAB uses Lenth's method [2] to identify important effects.
When there is an error term, MINITAB uses the corresponding p-values shown in the Session
window to identify important effects.
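For orientation, the pseudo standard error (PSE) at the heart of Lenth's method [2] can be sketched in Python. This follows the published definition and is not a description of MINITAB's internal implementation; the effect values are made up, and the final step of comparing each effect with a t-based margin of error is left out.

```python
from statistics import median


def lenth_pse(effects):
    """Lenth's pseudo standard error for a set of estimated effects."""
    absolute = [abs(e) for e in effects]
    s0 = 1.5 * median(absolute)                      # initial scale estimate
    trimmed = [e for e in absolute if e < 2.5 * s0]  # drop apparently active effects
    return 1.5 * median(trimmed)


# Hypothetical effects from an unreplicated design: two large, five small.
effects = [10.0, -8.0, 0.5, -0.4, 0.3, 0.2, -0.1]
pse = lenth_pse(effects)
ratios = [abs(e) / pse for e in effects]
print(round(pse, 4), [round(r, 1) for r in ratios])
```

Effects whose ratio to the PSE is large stand out from the noise; the cutoff itself depends on the chosen α-level.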
Pareto chart of the effects
You can also draw a Pareto chart of the effects. MINITAB displays the
absolute value of the unstandardized effects when there is not an error term
The Pareto chart allows you to look at both the magnitude and the importance of an effect. This
chart displays the absolute value of the effects, and draws a reference line on the chart. Any effect
that extends past this reference line is potentially important.
The reference line corresponds to α = 0.10, by default. You can change the α-level in the
Graphs subdialog box.
When there is no error term, MINITAB uses Lenth's method [2] to draw the line. When there is
an error term, MINITAB uses the corresponding t-value shown in the Session window to identify
important effects.
Example of analyzing a full factorial design with replicates and blocks
You are an engineer investigating how processing conditions affect the yield of a chemical
reaction. You believe that three processing conditions (factors), reaction time, reaction
temperature, and type of catalyst, affect the yield. You have enough resources for 16 runs, but
you can only perform 8 in a day. Therefore, you create a full factorial design, with two replicates,
and two blocks.
1 Open the worksheet YIELD.MTW. (The design and response data have been saved for you.)
2 Choose Stat > DOE > Factorial > Analyze Factorial Design.
3 In Responses, enter Yield.
3 In Responses, enter Yield.
4 Click Graphs. Under Effects Plots, check Normal and Pareto. In Alpha, enter 0.05. Click
Session window output

Term                   Effect      Coef   SE Coef        T      P
Constant                        45.5592   0.09546   477.25  0.000
Block                           -0.0484   0.09546    -0.51  0.628
Time                   2.9594    1.4797   0.09546    15.50  0.000
Temp                   2.7632    1.3816   0.09546    14.47  0.000
Catalyst               0.1618    0.0809   0.09546     0.85  0.425
Time*Temp              0.8624    0.4312   0.09546     4.52  0.003
Time*Catalyst          0.0744    0.0372   0.09546     0.39  0.708
Temp*Catalyst         -0.0867   -0.0434   0.09546    -0.45  0.663
Time*Temp*Catalyst     0.0230    0.0115   0.09546     0.12  0.907

Source                 DF    Seq SS    Adj SS    Adj MS       F      P
Blocks                  1    0.0374    0.0374    0.0374    0.26  0.628
Main Effects            3   65.6780   65.6780   21.8927  150.15  0.000
2-Way Interactions      3    3.0273    3.0273    1.0091    6.92  0.017
3-Way Interactions      1    0.0021    0.0021    0.0021    0.01  0.907
Residual Error          7    1.0206    1.0206    0.1458
Total                  15   69.7656
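The columns of this output are tied together by simple arithmetic. For a two-level design in coded units, each effect is twice its coefficient, T is the coefficient divided by its standard error, and each F is the term's Adj MS divided by the residual Adj MS. The Python sketch below checks these relations against the printed values, allowing for rounding; the variable and function names are illustrative only.

```python
SE_COEF = 0.09546
MS_ERROR = 0.1458

# (Effect, Coef, T) for the seven factorial terms, copied from the output.
terms = [
    (2.9594, 1.4797, 15.50), (2.7632, 1.3816, 14.47), (0.1618, 0.0809, 0.85),
    (0.8624, 0.4312, 4.52), (0.0744, 0.0372, 0.39), (-0.0867, -0.0434, -0.45),
    (0.0230, 0.0115, 0.12),
]
# (Adj MS, F) for blocks, main effects, two-way and three-way interactions.
anova = [(0.0374, 0.26), (21.8927, 150.15), (1.0091, 6.92), (0.0021, 0.01)]


def effects_match(rows, tol=2e-4):
    """True when every effect equals twice its coefficient, within rounding."""
    return all(abs(effect - 2 * coef) < tol for effect, coef, _ in rows)


def t_values_match(rows, tol=0.01):
    """True when every T equals Coef / SE Coef, within rounding."""
    return all(abs(coef / SE_COEF - t) < tol for _, coef, t in rows)


def f_values_match(rows, tol=0.01):
    """True when every F equals Adj MS / residual Adj MS, within rounding."""
    return all(abs(ms / MS_ERROR - f) < tol for ms, f in rows)


print(effects_match(terms), t_values_match(terms), f_values_match(anova))
```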
Effect                   P-Value   Significant (a)
Blocks                   0.628     no
Main                     0.000     yes
Two-way interactions     0.017     yes
Three-way interactions   0.907     no
(a) significant at α = 0.05
The nonsignificant block effect indicates that the results are not affected by the fact that you had
to collect your data on two different days.
Displaying Factorial Plots
After identifying the significant effects (main and two-way interactions) in the analysis of variance
table, look at the estimated effects and coefficients table. This table shows the p-values associated
with each individual model term. The p-values indicate that just one two-way interaction, Time*Temp
(p = 0.003), and two main effects, Time (p = 0.000) and Temp (p = 0.000), are significant.
However, because both of these main effects are involved in an interaction, you need to
understand the nature of the interaction before you can consider these main effects. See Example
of factorial plots on page 19-57 for a discussion of this interaction.
The residual error that is shown in the ANOVA table can be made up of three parts:
(1) curvature, if there are center points in the data, (2) lack of fit, if a reduced model was fit, and
(3) pure error, if there are any replicates. If the residual error is just due to lack of fit, MINITAB
does not print this breakdown. In all other cases, it does.
The normal and Pareto plots of the effects allow you to visually identify the important effects and
compare the relative magnitude of the various effects.
You should also plot the residuals versus the run order to check for any time trends or other
nonrandom patterns. Residual plots are found in the Graphs subdialog box. For a discussion, see
Residual plots on page 2-5.
factorial plots: main effects, interactions, and cube plots. These plots can be used to show
how a response variable relates to one or more factors.
response surface plots: contour and surface (wireframe) plots. These plots show how a
response variable relates to two factors based on a model equation. See Displaying Response
Surface Plots on page 19-59.
You must have a design in the worksheet created by Create Factorial Design or Define Custom
Factorial Design before using Factorial Plots.
raw response data: the means of the response variable for each level of a factor
fitted values after you have analyzed the design: predicted values for each level of a factor
For a balanced design, the main effects plots using the two types of responses are identical.
However, with an unbalanced design, the plots are sometimes quite different. While you can use
raw data with unbalanced designs to obtain a general idea of which main effects may be
important, it is generally good practice to use the predicted values to obtain more precise results.
MINITAB plots the means at each level of the factor and connects them with a line. Center points
and factorial points are represented by different symbols. A reference line at the grand mean of
the response data is drawn.
MINITAB draws a single main effects plot if you enter one factor, or a series of plots if you enter
more than one factor. You can use these plots to compare the magnitudes of the various main
effects. MINITAB also draws a separate plot for each factor-response combination.
A main effect occurs when the mean response changes across the levels of a factor. You can use
main effects plots to compare the relative strength of the effects across factors. Notice on the
plots below that the main effect for pressure (on the left) is much smaller than the main effect for
temperature (on the right).
[Figure: main effects plots for Pressure (left) and Temperature (right)]
Note
Although you can use these plots to compare factor effects, be sure to evaluate
significance by looking at the effects in the analysis of variance table.
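The points on a main effects plot are level means, and the plotted main effect is the difference between them. The Python sketch below computes both for a small, made-up two-factor data set; the factor names, response values, and function names are assumptions for illustration.

```python
def level_means(runs, factor, response="Yield"):
    """Mean of the response at each level of one factor."""
    groups = {}
    for run in runs:
        groups.setdefault(run[factor], []).append(run[response])
    return {level: sum(values) / len(values)
            for level, values in sorted(groups.items())}


def main_effect(runs, factor, response="Yield"):
    """Mean response at the high level minus mean response at the low level."""
    means = level_means(runs, factor, response)
    return means[1] - means[-1]


# Hypothetical data in coded units.
runs = [
    {"Pressure": -1, "Temperature": -1, "Yield": 40.0},
    {"Pressure": 1, "Temperature": -1, "Yield": 41.0},
    {"Pressure": -1, "Temperature": 1, "Yield": 50.0},
    {"Pressure": 1, "Temperature": 1, "Yield": 53.0},
]
grand_mean = sum(run["Yield"] for run in runs) / len(runs)  # reference line
print(level_means(runs, "Pressure"), main_effect(runs, "Pressure"))
print(level_means(runs, "Temperature"), main_effect(runs, "Temperature"))
```

With these numbers the Temperature line is much steeper than the Pressure line, which is the kind of contrast the plots are meant to make visible.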
Interactions plots
You can plot two-factor interactions for each pair of factors in your design. An interactions plot is
a plot of means for each level of a factor with the level of a second factor held constant. You can
draw an interactions plot for either the
raw response data: the means of the response variable for each level of a factor
fitted values after you have analyzed the design: predicted values for each level of a factor
For a balanced design, the interactions plots using the two types of responses are identical.
However, with an unbalanced design, the plots are sometimes quite different. While you can use
raw data with unbalanced designs to obtain a general idea of which interactions may be important,
it is generally good practice to use the predicted values to obtain more precise results.
MINITAB draws a single interactions plot if you enter two factors, or a matrix of interactions plots
if you enter more than two factors.
An interaction between factors occurs when the change in response from the low level to the high
level of one factor is not the same as the change in response at the same two levels of a second
factor. That is, the effect of one factor is dependent upon a second factor. You can use
interactions plots to compare the relative strength of the effects across factors. Notice on the plots
below that the interaction between pressure and rate (on the left) is much smaller than the
interaction between temperature and rate (on the right).
[Figure: interactions plots for Pressure by Rate (left) and Temperature by Rate (right)]
Note
Although you can use these plots to compare interaction effects, be sure to evaluate
significance by looking at the effects in the analysis of variance table.
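An interactions plot is built from cell means, and the size of a two-factor interaction can be summarized as half the difference between the effect of one factor at the high and at the low level of the other. The Python sketch below uses made-up data for two factors; the names and values are assumptions for illustration.

```python
def cell_means(runs, factor1, factor2, response="Yield"):
    """Mean of the response for each combination of levels of two factors."""
    groups = {}
    for run in runs:
        groups.setdefault((run[factor1], run[factor2]), []).append(run[response])
    return {cell: sum(values) / len(values) for cell, values in groups.items()}


def interaction_effect(runs, factor1, factor2, response="Yield"):
    """Half the change in the effect of factor1 between the levels of factor2."""
    m = cell_means(runs, factor1, factor2, response)
    effect_at_high = m[(1, 1)] - m[(-1, 1)]
    effect_at_low = m[(1, -1)] - m[(-1, -1)]
    return (effect_at_high - effect_at_low) / 2


# Hypothetical data in coded units.
runs = [
    {"Temperature": -1, "Rate": -1, "Yield": 40.0},
    {"Temperature": 1, "Rate": -1, "Yield": 41.0},
    {"Temperature": -1, "Rate": 1, "Yield": 50.0},
    {"Temperature": 1, "Rate": 1, "Yield": 53.0},
]
print(interaction_effect(runs, "Temperature", "Rate"))
```

Parallel lines on the plot correspond to an interaction effect of zero; the further the lines are from parallel, the larger the value.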
Cube plots
Cube plots can be used to show the relationship between up to eight factors, with or without a
response measure. Viewing the factors without the response allows you to see what a design
looks like. If there are only two factors, MINITAB displays a square plot. You can draw a cube
plot for either the
data means: the means of the raw response variable data for each factor level
fitted means after analyzing the design: predicted values for each factor level
The plots below illustrate a three-factor cube plot with and without a response variable.
[Figure: three-factor cube plots, without a response (left) and with a response (right)]
Data
You must create a factorial design, and enter the response data in your worksheet for both main
effects and interactions plots.
For cube plots, you do not need to have a response variable, but you must create a factorial
design first. If you enter a response column, MINITAB displays the means for the raw response
data or fitted values at each point in the cube where observations were measured. If you do not
enter a response column, MINITAB draws points on the cube for the effects that are in your
model.
If you are plotting the means of the raw response data, you can generate the plots before you fit a
model to the data. If you are using the fitted values (least-squares means), you need to use
Analyze Factorial Design before you can display a factorial plot.
To display factorial plots
1 Choose Stat > DOE > Factorial > Factorial Plots.
To generate a main effects plot, check Main effects (response versus levels of 1 factor),
then click Setup.
To generate a cube plot, check Cube (response versus levels of 2 to 8 factors), then click
Setup.
The setup subdialog box shown above is for a main effects plot. The setup dialog box for the
other factorial plots will differ slightly.
3 In Responses, enter the numeric columns that contain the response (measurement) data.
MINITAB draws a separate plot for each column. (You can create a cube plot without entering
any response columns.)
4 Move the factors you want to plot from the Available box to the Selected box using the arrow buttons.
Options
Factorial Plots dialog box
plot the data means or the fitted (least-squares) means as the response
set the minimum and maximum values on the y-axis. Each plot will then be on the same
scale, which can be useful when you are comparing several plots of related data.
for interactions plots, display the full interaction matrix when you have more than two
factors. By default, MINITAB only displays the upper right portion of the matrix. In the full
matrix, the transpose of each plot in the upper right displays in the lower left portion of the
matrix.
In the Example of analyzing a full factorial design with replicates and blocks on page 19-49, you
were investigating how processing conditions (factors), reaction time, reaction temperature,
and type of catalyst, affect the yield of a chemical reaction. You determined that there was a
significant interaction between reaction time and reaction temperature and you would like to
view the factorial plots to help you understand the nature of the relationship. Because the effects
due to block and catalyst are not significant, you will not include them in the plots.
1 Open the worksheet YIELDPLT.MTW. (The design, response data, and model information
6 Click
You can also produce three factorial plots: main effects, interactions, and cube plots.
These plots can be used to show how a response variable relates to one or more factors.
See Displaying Factorial Plots on page 19-52.
Note
When the model has more than two factors, the factor(s) that are not in the plot are held
constant. Any covariates in the model are also held constant. You can specify the
constant values at which to hold the remaining factors and covariates. See Settings for
covariates and extra factors on page 19-61.
Data
Contour plots and surface plots are model dependent. Thus, you must fit a model using Analyze
Factorial Design before you can generate response surface plots. MINITAB looks in the worksheet
for the necessary model information to generate these plots.
to generate a contour plot, check Contour plot and click Setup. If you have analyzed more
than one response, from Response, choose the desired response.
to generate a surface (wireframe) plot, check Surface (wireframe) plot and click Setup. If
you have analyzed more than one response, from Response, choose the desired response.
3 If you like, use any of the options listed below, then click OK in each dialog box.
Options
Setup subdialog box
display separate graphs for every combination of numeric factors in the model
for contour plots, specify the number or location of the contour levels, and the contour line
color and style (see Controlling the number, type, and color of the contour lines on page 19-61)
for surface (wireframe) plots, specify the color of the wireframe (mesh) and the surface
specify values for covariates and factors that are not included in the response surface plot. By
default, MINITAB holds factors at their low levels and covariates at their middle (calculated
median) levels. See Settings for covariates and extra factors on page 19-61.
define minimum and maximum values for the x-axis and y-axis.
To use the preset values, choose High settings, Middle settings, or Low settings under
Hold extra factors at and/or Hold covariates at. When you use a preset value, all factors or
covariates not in the plot will be held at their specified settings.
To specify the value at which to hold a factor or covariate, enter a number in Setting for
each one you want to control. This option allows you to set a different holding value for each
factor or covariate.
4 Click OK.
Choose Values and enter from 2 to 15 contour level values in the units of your data. You
must enter the values in increasing order.
4 To define the line style, choose Make all lines solid or Use different types under Line Styles.
5 To define the line color, choose Make all lines black or Use different colors under Line
In the Example of analyzing a full factorial design with replicates and blocks on page 19-49, you
were investigating how processing conditions (factors), reaction time, reaction temperature, and
type of catalyst, affect the yield of a chemical reaction. You determined that there was a
significant interaction between reaction time and reaction temperature and you would like to
view the response surface plots to help you understand the nature of the relationship. Because the
effects due to block and catalyst are not significant, you did not include them in the plots.
To view the main effects and interactions plots, see page 19-58.
1 Open the worksheet YIELDPLT.MTW. (The design, response data, and model information
4 Choose Surface (wireframe) plot and click Setup. Click OK in each dialog box.
References
[1] G.E.P. Box, W.G. Hunter, and J.S. Hunter (1978). Statistics for Experimenters. An
Introduction to Design, Data Analysis, and Model Building. New York: John Wiley & Sons.
[2] R.V. Lenth (1989). Quick and Easy Analysis of Unreplicated Factorials, Technometrics, 31,
pp. 469-473.
[3] D.C. Montgomery (1991). Design and Analysis of Experiments, Third Edition, John Wiley
& Sons.
[4] R.L. Plackett and J.P. Burman (1946). The Design of Optimum Multifactorial
Experiments, Biometrika, 34, pp. 255-272.
Acknowledgment
The two-level factorial and Plackett-Burman design and analysis procedures were developed
under the guidance of James L. Rosenberger, Statistics Department, The Pennsylvania State
University.
20 Response Surface Designs
See also,
find factor settings (operating conditions) that produce the best response
identify new operating conditions that produce demonstrated improvement in product quality
over the quality achieved by current conditions
Many response surface applications are sequential in nature in that they require more than one
stage of experimentation and analysis. The steps shown below are typical of a response surface
experiment. Depending on your experiment, you may carry out some of the steps in a different
order, perform a given step more than once, or eliminate a step.
1 Choose a response surface design for the experiment. Before you begin using MINITAB, you
must determine what the influencing factors are, that is, what the process conditions are that
influence the values of the response variable. See Choosing a Design on page 20-3.
2 Use Create Response Surface Design to generate a central composite or Box-Behnken
Choosing a Design
Before you use MINITAB, you need to determine what design is most appropriate for your
experiment. Choosing your design correctly will ensure that the response surface is fit in the
most efficient manner. MINITAB provides central composite and Box-Behnken designs. When
choosing a design you need to
ensure adequate coverage of the region of interest on the response surface. You should choose
a design that will adequately predict values in the region of interest.
determine the impact that other considerations (such as cost, time, or the availability of
facilities) have on your choice of a design.
Depending on your problem, there are other considerations that make a design desirable. You
need to choose a design that shows consistent performance in the criteria that you consider
important, such as the ability to
More
perform the experiment in orthogonal blocks. Orthogonally blocked designs allow for model
terms and block effects to be estimated independently and minimize the variation in the
estimated coefficients.
rotate the design. Rotatable designs provide the desirable property of constant prediction
variance at all points that are equidistant from the design center, thus improving the quality of
the prediction.
center points
A central composite design with two factors is shown below. Points on the diagrams represent the
experimental runs that are performed:
Central composite designs are often recommended when the design plan calls for sequential
experimentation because these designs can incorporate information from a properly planned
factorial experiment. The factorial or cube portion and center points may serve as a preliminary
stage where you can fit a first-order (linear) model, but still provide evidence regarding the
importance of a second-order contribution or curvature.
You can then build the cube portion of the design up into a central composite design to fit a
second-degree model by adding axial and center points. Central composite designs allow for
efficient estimation of the quadratic terms in the second-order model, and it is also easy to obtain
the desirable design properties of orthogonal blocking and rotatability.
More
Orthogonally blocked designs allow for model terms and block effects to be estimated
independently and minimize the variation in the regression coefficients. Rotatable
designs provide the desirable property of constant prediction variance at all points that
are equidistant from the design center, thus improving the quality of the prediction.
Box-Behnken designs
You can create blocked or unblocked Box-Behnken designs. The illustration below shows a
three-factor Box-Behnken design. Points on the diagram represent the experimental runs that are
performed:
You may want to use Box-Behnken designs when performing non-sequential experiments. That
is, you are only planning to perform the experiment once. These designs allow efficient
estimation of the first- and second-order coefficients. Because Box-Behnken designs have fewer
design points, they are less expensive to run than central composite designs with the same
number of factors.
Box-Behnken designs can also prove useful if you know the safe operating zone for your process.
Central composite designs usually have axial points outside the cube (unless you specify an α
that is less than or equal to one). These points may not be in the region of interest, or may be
impossible to run because they are beyond safe operating limits. Box-Behnken designs do not
have axial points, thus, you can be sure that all design points fall within your safe operating zone.
Box-Behnken designs also ensure that all factors are never set at their high levels simultaneously.
To create a response surface design
1 Choose Stat > DOE > Response Surface > Create Response Surface Design.
5 Click Designs.
The subdialog box that displays depends on whether you choose Central composite or
Box-Behnken in step 3.
Central Composite Design
Box-Behnken Design
for a central composite design, choose the design you want to create from the list shown at
the top of the subdialog box
for a Box-Behnken design, you do not have to choose a design because the number of
factors determines the number of runs
7 If you like, use any of the options listed under Design subdialog box below.
8 Click OK even if you do not change any of the options. This selects the design and brings you
Central composite designs on page 20-17 shows the central composite designs you can
generate with Create Response Surface Design. The default values of α provide
orthogonal blocking and, whenever possible, rotatability. The cube portions of central
composite designs are identical to those generated by Create Factorial Design with the
same number of center points and blocks. Thus, a design generated by Create Factorial
Design with the same number of runs, center points, and blocks can be built up into an
orthogonally-blocked central composite design.
Any factorial design with the right number of runs and blocks can be built up into a
blocked central composite design. However, to make the blocks orthogonal, Create
Factorial Design must use the number of center points shown in Central composite
designs on page 20-17.
Options
Design subdialog box
change the number of center points (see Changing the number of center points on page 20-8)
for a central composite design, change the position of the axial settings (α) (see Changing
the value of α for a central composite design on page 20-9)
for a central composite design, define the low and high values of the experiment in terms of
the axial points rather than the cube points (see Setting factor levels on page 20-11)
display the summary and data tables or suppress all Session window results
More
Box-Behnken designs
For a Box-Behnken design, the number of ways to block a design depends on the number of
factors. All of the blocked designs have orthogonal blocks. A design with
When you are creating a design, MINITAB displays the appropriate choices.
Note
For a central composite design, under Number of center points choose Custom and
enter a number in in cube. When you have more than one block in your design, you also
need to enter a number to indicate the number of center points in in axial block.
For a Box-Behnken design, under Number of center points choose Custom and enter a
number in the box.
When a Box-Behnken design is blocked, the center points are divided equally (as much as
possible) among the blocks.
Note
To set α equal to 1, choose Face Centered. When α = 1, the axial points are placed on the
cube portion of the design. This is an appropriate choice when the cube points of the
design are at the operational limits.
Choose Custom and enter a positive number in the box. A value less than one places the
axial points inside the cube portion of the design; a value greater than one places the
axial points outside the cube.
A value of $\alpha = F^{1/4}$, where F is the number of factorial points in the design, guarantees
rotatability.
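As a quick check on the rotatability rule, the following sketch computes the fourth root of the number of factorial points for the cube sizes that appear in this chapter. The function name `rotatable_alpha` is hypothetical and is not part of MINITAB; it only illustrates the arithmetic.

```python
def rotatable_alpha(n_factorial_points: int) -> float:
    """Return F ** (1/4), the axial distance that gives rotatability
    for a central composite design with F factorial (cube) points."""
    if n_factorial_points <= 0:
        raise ValueError("number of factorial points must be positive")
    return n_factorial_points ** 0.25


# Cube sizes used by the designs in this chapter (4 through 64 points).
for f in (4, 8, 16, 32, 64):
    print(f, round(rotatable_alpha(f), 3))
```

A design whose default α differs from this value (for example 1.633 with 8 cube points) trades rotatability for orthogonal blocking.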
StdOrder shows what the order of the runs in the experiment would be if the experiment was
done in standard order, also called Yates order.
RunOrder shows what the order of the runs in the experiment would be if the experiment was
run in random order.
If you do not randomize, the run order and standard order are the same.
If you want to re-create a design with the same ordering of the runs (that is, the same design
order), you can choose a base for the random data generator. Then, when you want to re-create
the design, you just use the same base.
You can use Stat > DOE > Display Design (page 20-23) to switch back and forth
between a random and standard order display in the worksheet.
More
C3 (Blocks) stores the blocking variable. When the design is not blocked, MINITAB sets all
column values to one.
C4 through Cn store the factors. MINITAB stores each factor in your design in a separate column.
If you named the factors, these names display in the worksheet. If you did not provide names,
MINITAB names the factors alphabetically. After you create the design, you can change the factor
names directly in the Data window or with Stat > DOE > Modify Design (page 20-19).
If you did not assign factor levels in the Factors subdialog box, MINITAB stores factor levels in
coded form (all factor levels are -1 or +1). If you assigned factor levels, the uncoded levels display
in the worksheet. To switch back and forth between a coded and an uncoded display, use Stat >
DOE > Display Design (page 20-23).
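The relationship between the StdOrder and RunOrder columns, and the role of a base for the random data generator, can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's own `random` module, not MINITAB's generator, so the orderings it produces will not match MINITAB's, and the function name `make_run_order` is hypothetical.

```python
import random


def make_run_order(n_runs, base=None):
    """Return the standard-order run numbers in the order they would be run.

    Position i in the result corresponds to RunOrder i + 1, and the value
    stored there is that run's StdOrder. Passing the same base reproduces
    the same ordering; base=None gives a different ordering each time.
    """
    rng = random.Random(base)
    std_order = list(range(1, n_runs + 1))
    rng.shuffle(std_order)
    return std_order


# Re-creating a design: the same base yields the same randomization.
first = make_run_order(20, base=9)
again = make_run_order(20, base=9)
print(first == again)  # the two orderings are identical

# Worksheet-style listing of (StdOrder, RunOrder) pairs for the first runs.
for run_order, std in enumerate(first[:5], start=1):
    print(std, run_order)
```

Sorting the rows by the first value gives the standard order display; sorting by the second gives the run order display.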
Caution
When you create a design using Create Response Surface Design, MINITAB stores the
appropriate design information in the worksheet. MINITAB needs this stored information
to analyze and plot data. If you want to use Analyze Response Surface Design, you must
follow certain rules when modifying the worksheet data. See Modifying and Using
Worksheet Data on page 18-4.
If you make changes that corrupt your design, you may still be able to analyze it with
Analyze Response Surface Design after you use Define Custom Response Surface Design
(page 20-18).
Naming factors
By default, MINITAB names the factors alphabetically.
To name factors
1 In the Create Response Surface Design dialog box, click Factors.
2 Under Name, click in the first row and type the name of the first factor. Then, use the down arrow key
to move down the column and enter the remaining factor names. Click OK.
More
After you have created the design, you can change the factor names by typing new
names in the Data window or with Modify Design (page 20-19).
By default, MINITAB sets the low level of all factors to -1 and high level to +1.
To assign factor levels
1 In the Create Response Surface Design dialog box, click Factors.
2 Under Low, click in the row for the factor you would like to assign values and enter any
numeric value. Use the right arrow key to move to High and enter a numeric value that is greater than
the value you entered in Low.
3 Repeat step 2 to assign levels for other factors.
4 For a central composite design, under Levels Define, choose Cube points or Axial points to
specify which values you entered in Low and High. Click OK.
Note
In a central composite design, the values you enter for the factor levels are usually not the
minimum and maximum values in the design. They are the low and high settings for the
cube portion of the design. The axial points are usually outside the cube (unless you
specify an α that is less than or equal to 1). If you are not careful, this could lead to axial
points that are not in the region of interest or may be impossible to run.
Choosing Axial points in the Factors subdialog box guarantees all of the design points
will fall between the defined minimum and maximum value for the factor(s). MINITAB will
then determine the appropriate low and high settings for the cube as follows:
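$$\text{Low level for cube} = \frac{(\alpha - 1)\,\text{max} + (\alpha + 1)\,\text{min}}{2\alpha}$$
$$\text{High level for cube} = \frac{(\alpha - 1)\,\text{min} + (\alpha + 1)\,\text{max}}{2\alpha}$$
More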
To change the factor levels after you have created the design, use Stat > DOE > Modify
Design.
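When Low and High are defined in terms of the axial points, the cube settings follow from placing the axial points α cube half-widths from the design center. The sketch below works through that arithmetic; the function name `cube_levels_from_axial` and the example values are hypothetical and are not taken from MINITAB.

```python
def cube_levels_from_axial(axial_min, axial_max, alpha):
    """Return (cube_low, cube_high) so that the axial points of a central
    composite design land exactly on axial_min and axial_max.

    Assumes alpha > 0 and axial_min < axial_max.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if axial_min >= axial_max:
        raise ValueError("axial_min must be less than axial_max")
    cube_low = ((alpha - 1) * axial_max + (alpha + 1) * axial_min) / (2 * alpha)
    cube_high = ((alpha - 1) * axial_min + (alpha + 1) * axial_max) / (2 * alpha)
    return cube_low, cube_high


# Hypothetical factor that may only range from 0 to 10, with alpha = 2.
low, high = cube_levels_from_axial(0, 10, 2)
print(low, high)  # 2.5 7.5
```

With α = 1 (face centered) the cube levels coincide with the stated minimum and maximum, which matches the description of the Face Centered option.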
Suppose you want to conduct an experiment to maximize crystal growth. You have determined
that three variables (time the crystals are exposed to a catalyst, temperature in the exposure
chamber, and percentage of the catalyst in the air inside the chamber) explain much of the
variability in the rate of crystal growth.
You generate the default central composite design for three factors and two blocks (to represent
the two days you conduct the experiment). You assign the factor levels and randomize the
design.
1 Choose Stat > DOE > Response Surface > Create Response Surface Design.
2 Under Type of Design, choose Central Composite.
3 From Number of factors, choose 3.
4 Click Designs. To create the design with 2 blocks, highlight the second row in the Design
Factor   Name          Low    High
A        Time
B        Temperature   40     60
C        Catalyst      3.5    7.5
6 Click OK.
7 Click Results. Choose Summary table and data table. Click OK in each dialog box.
Session window output

Factors: 3    Blocks: 2       Center points in cube: 4
Runs: 20      Alpha: 1.633    Center points in axial block: 2
MINITAB randomizes the design by default, so if you try to replicate this example, your
runs may not match the order shown.
Suppose you have a process for pressure treating utility poles with creosote. In the treating step of
the process, you place air-dried poles inside a treatment chamber. The pressure in the chamber
is increased and the chamber is flooded with hot creosote. The poles are left in the chamber
until they have absorbed 12 pounds of creosote per cubic foot. You would like to experiment
with different settings for the air pressure, temperature of the creosote, and time in the chamber.
Your goal is to get the creosote absorption as close to 12 pounds per cubic foot as possible, with
minimal variation. Previous investigation suggests that the response surface for absorption
exhibits curvature.
The chamber will withstand internal pressures up to 220 psi, although the strain on equipment
is pronounced at over 200 psi. The current operating value is at 175 psi, so you feel comfortable
with a range of values between 150 and 200. Current operating values for temperature and time
are 210°F and 5 hours, respectively. You feel that temperature cannot vary by more than 10°F
from the current value. Time can be varied from 4 to 6 hours.
A Box-Behnken design is a practical choice when you cannot run all of the factors at their high
(or low) levels at the same time. Here, the high level for pressure is already at the limit of what
the chamber can handle. If temperature were also at its high level, this increases the effective
pressure, and running at these settings for a long period of time is not recommended. The
Box-Behnken design will assure that no runs require all factors to be at their high settings
simultaneously.
1 Choose Stat > DOE > Response Surface > Create Response Surface Design.
2 Under Type of Design, choose Box-Behnken.
3 From Number of factors, choose 3.
4 Click Designs. Click OK.
5 Click Factors. Complete the Name, Low, and High columns of the table as shown below:
Factor   Name          Low    High
A        Pressure      150    200
B        Temperature   200    220
C        Time          4      6
6 Click OK.
7 Click Results. Choose Summary table and data table. Click OK in each dialog box.
Session window output

Factors: 3    Blocks: none
Runs: 15      Center points: 3

[Data matrix: coded level (-, 0, or +) of factors A, B, and C for each of the 15 runs]
MINITAB randomizes the design by default, so if you try to replicate this example, your
runs may not match the order shown.
Central composite designs
                                        total    cube     axial
         total  total   cube    cube    center   center   center   default  orthogonal
factors  runs   blocks  blocks  points  points   points   points   alpha    blocks      rotatable
2        13                     4       5                          1.414                y
2        14                     4       6                          1.414                y
3        20                     8       6                          1.682                y
3        20     2       1       8       6        4        2        1.633    y           n
3        20     3       2       8       6        4        2        1.633    y           n
4        31                     16      7                          2.000                y
4        30     2       1       16      6        4        2        2.000    y           y
4        30     3       2       16      6        4        2        2.000    y           y
5 half   32                     16      6                          2.000                y
5 half   33                     16      7                          2.000                y
5        52                     32      10                         2.378                y
5        54     2       1       32      12       8        4        2.366    y           n
5        54     3       2       32      12       8        4        2.366    y           n
6 half   53                     32      9                          2.378                y
6 half   54     2       1       32      10       8        2        2.366    y           n
6 half   54     3       2       32      10       8        2        2.366    y           n
6        90                     64      14                         2.828                y
6        90     2       1       64      14       8        6        2.828    y           y
6        90     3       2       64      14       8        6        2.828    y           y
6        90     5       4       64      14       8        6        2.828    y           y
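The run totals in the central composite table are consistent with a simple count: cube points, plus two axial points per factor, plus center points. The sketch below checks that arithmetic for several of the tabulated designs; the function name `ccd_total_runs` is hypothetical, and the tuples are values read from the table.

```python
def ccd_total_runs(n_factors, cube_points, center_points):
    """Total runs in a central composite design: cube points,
    two axial points per factor, and the center points."""
    axial_points = 2 * n_factors
    return cube_points + axial_points + center_points


# (factors, cube points, total center points, total runs) from the table.
designs = [
    (2, 4, 5, 13),
    (3, 8, 6, 20),
    (4, 16, 7, 31),
    (5, 16, 6, 32),   # half fraction in the cube portion
    (5, 32, 10, 52),
    (6, 32, 9, 53),   # half fraction in the cube portion
    (6, 64, 14, 90),
]

for k, cube, center, runs in designs:
    assert ccd_total_runs(k, cube, center) == runs
print("all tabulated run counts agree")
```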
Box-Behnken designs
factors  runs  blocks  center points
3        15    1       3
4        27    3       3
5        46    2       6
6        54    2       6
7        62    2       6
Defining Custom Designs
If you do not have any columns containing the blocks, run order, or standard order, click
OK.
If you have additional columns that contain data for the blocks, run order, or standard
order, click Designs.
1 If your design is blocked, under Blocks, choose Specify by column and enter the
Column, choose Specify by column and enter the column containing the run order.
3 If you have a column that contains the standard order of the experiment, under
Standard Order Column, choose Specify by column and enter the column
containing the standard order. Click OK in each dialog box.
Modifying Designs
After creating a design and storing it in the worksheet, you can use Modify Design to make the
following modifications:
rename the factors and change the factor levels (see Renaming factors and changing factor
levels below)
By default, MINITAB will replace the current design with the modified design.
You can also change factor levels from the default values of -1 and +1 or change previously
assigned values.
To rename factors or change factor levels
1 Choose Stat > DOE > Modify Design.
3 Enter new factor names or factor levels as shown in Naming factors on page 20-11 and Setting
You can also type new factor names directly into the Data window.
-1.414 0.000
1.000 -1.000
0.000 1.414
0.000 0.000
0.000 0.000
1.414 0.000
0.000 0.000
0.000 0.000
0.000 0.000
0.000 -1.414
1.000 1.000
-1.000 -1.000
-1.000 1.000
-1.414 0.000
1.000 -1.000
0.000 1.414
0.000 0.000
0.000 0.000
1.414 0.000
0.000 0.000
0.000 0.000
0.000 0.000
0.000 -1.414
1.000 1.000
-1.000 -1.000
-1.000 1.000
True replication provides an estimate of the error or noise in your process and may allow for
more precise estimates of effects.
To replicate the design
1 Choose Stat > DOE > Modify Design.
2 Choose Replicate design and click Specify.
Choose Randomize just block, and choose a block number from the list.
4 If you like, in Base for random data generator, enter a number. You can recreate a design by
You can use Stat > DOE > Display Design (page 20-23) to switch back and forth
between a random and standard order display in the worksheet.
Displaying Designs
After you create the design, you can use Display Design to change the way the design points are
stored in the worksheet. You can change the design points in two ways:
display the points in either run or standard order. Standard order is the order of the runs if the
experiment was done in Yates order. Run order is the order of the runs if the experiment was
done in random order.
2 Choose Run order for the design or Standard order for the design. If you do not randomize
a design, the columns that contain the standard order and run order are the same.
3 Do one of the following:
If you want to reorder all worksheet columns that are the same length as the design
columns, click OK.
Collecting and Entering Data
and factor settings in the worksheet. These columns constitute the basis of your data collection
form. If you did not name factors or specify factor levels when you created the design, and you
want names or levels to appear on the form, use Modify Design (page 20-19).
2 In the worksheet, name the columns in which you will enter the measurement data obtained
More
You can also copy the worksheet cells to the Clipboard by choosing Edit > Copy Cells.
Then paste the Clipboard contents into a word-processing application, such as Microsoft
Wordpad or Microsoft Word, where you can create your own form.
all linear terms, all squared terms, and all two-way interactions (the default)
The model you fit will determine the nature of the effect, linear or curvilinear, that you can
detect (see Selecting model terms on page 20-27).
Data
Enter up to 25 numeric response data columns that are equal in length to the design variables in
the worksheet. Each row will contain data corresponding to one run of your experiment. You
may enter the response data in any column(s) not occupied by the design data. The number of
columns reserved for the design data is dependent on the number of factors in your design.
If there is more than one response variable, MINITAB fits separate models for each response.
MINITAB omits missing data from all calculations.
Note
When all the response variables do not have the same missing value pattern, MINITAB
displays a message. Since you would get different results, you may want to repeat the
analysis separately for each response variable.
Options
Analyze Response Surface Design dialog box
include blocks in the model. This option is available when the blocks column contains more
than one distinct value.
fit the model with coded or uncoded factor levels. See Choosing data units on page 20-27.
draw five different residual plots for regular, standardized, or deleted residuals (see Choosing
a residual type on page 2-5). Available residual plots include a
histogram.
normal probability plot.
plot of residuals versus the fitted values (Ŷ).
plot of residuals versus data order. The row number for each data point is shown on the
x-axis, for example, 1 2 3 4 ... n.
separate plot for the residuals versus each specified column.
For a discussion, see Residual plots on page 2-5.
fit a model by specifying the maximum order of the terms, or choose which terms to include
from a list of all estimable termssee Selecting model terms on page 20-27.
store the fits and regular, standardized, and deleted residuals separately for each response
(see Choosing a residual type on page 2-5).
store the coefficients for the model and the design matrix, separately for each response. The
design matrix multiplied by the coefficients will yield the fitted values.
store information about the fitted model in a column by checking Quadratic in the Storage
subdialog box (see Help for the structure of this column).
store leverages, Cook's distances, and DFITS for identifying outliers (see Identifying outliers
on page 2-9).
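The statement that the design matrix multiplied by the coefficients yields the fitted values can be illustrated with a small sketch. Everything here is an assumption made for illustration: a two-factor full quadratic model, a column order of constant, linear, squared, then interaction terms, and made-up coefficients. The order MINITAB actually uses for stored columns is described in Help.

```python
def quadratic_row(a, b):
    """One design-matrix row for a two-factor full quadratic model.
    Assumed column order: constant, A, B, A*A, B*B, A*B."""
    return [1.0, a, b, a * a, b * b, a * b]


def fitted_values(design_matrix, coefficients):
    """Multiply each design-matrix row by the coefficient vector."""
    return [
        sum(x * c for x, c in zip(row, coefficients))
        for row in design_matrix
    ]


# Hypothetical coefficients, in the same assumed column order.
coef = [10.0, 1.0, -2.0, 0.5, 0.25, -1.0]

# A few coded design points: two cube points, a center point, a cube point.
points = [(-1, -1), (1, -1), (0, 0), (1, 1)]
X = [quadratic_row(a, b) for a, b in points]

print(fitted_values(X, coef))  # [10.75, 14.75, 10.0, 8.75]
```

Subtracting these fitted values from the observed responses gives the regular residuals that the same subdialog box can store.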
Additional results would be the same, including which terms in the model are significant.
To specify the data units for analysis
1 In the Analyze Response Surface Design dialog box, under Analyze data using, choose coded
Analyze Response Surface Design uses the same method of coding as General Linear
Model (see Design matrix used by General Linear Model on page 3-41).
You can fit a linear, linear and squares, linear and two-way interactions, or full quadratic (default)
model. Or, you can fit a model that is a subset of these terms. The table below shows what terms
would be fit for a model with four factors.
This model type                    fits these terms
linear                             A B C D
linear and squares                 A B C D
                                   AA BB CC DD
linear and two-way interactions    A B C D
                                   AB AC AD BC BD CD
full quadratic (default)           A B C D
                                   AA BB CC DD
                                   AB AC AD BC BD CD
move the terms you do not want to include in the model from Selected Terms to Available
Terms using the arrow buttons
to move one or more factors, highlight the desired terms then click
or
to move all of the terms, click
or
You can also move a term by double-clicking it.
The following examples use data from [3]. The experiment uses three factors (nitrogen,
phosphoric acid, and potash), all ingredients in fertilizer. The effect of the fertilizer on snap bean
yield was studied in a central composite design using the default (coded) factor levels.
The actual (uncoded) units for the -1 and +1 levels are 2.03 and 5.21 for nitrogen, 1.07 and 2.49
for phosphoric acid, 1.35 and 3.49 for potash. If we were to analyze the design in uncoded units,
a few things would change: the coefficients and their standard deviations, and the t-value and
p-value for the constant term. Additional results would be the same, including which terms in
the model are significant.
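The link between coded and uncoded units is a linear map that sends the low level to -1 and the high level to +1. The sketch below applies it to the nitrogen levels quoted above; the function names `to_uncoded` and `to_coded` are hypothetical and are not MINITAB commands.

```python
def to_uncoded(coded, low, high):
    """Convert a coded level to actual units, where -1 maps to low
    and +1 maps to high."""
    center = (high + low) / 2
    half_range = (high - low) / 2
    return center + coded * half_range


def to_coded(uncoded, low, high):
    """Convert a level in actual units back to coded units."""
    center = (high + low) / 2
    half_range = (high - low) / 2
    return (uncoded - center) / half_range


# Nitrogen: the -1 and +1 levels are 2.03 and 5.21 in actual units.
NITROGEN_LOW, NITROGEN_HIGH = 2.03, 5.21

for coded in (-1, 0, 1):
    print(coded, round(to_uncoded(coded, NITROGEN_LOW, NITROGEN_HIGH), 2))
```

Because the map is linear, refitting in uncoded units rescales the coefficients without changing which terms are significant, which is the point made in the paragraph above.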
Step 1: Generating the central composite design
1 Choose Stat > DOE > Response Surface > Create Response Surface Design.
2 Under Type of Design, choose Central composite.
3 From Number of factors, choose 3.
4 Click Designs. Click OK.
5 Click Factors. In the Name column, enter Nitrogen PhosAcid Potash in rows one through
Session window output

Term          Coef      StDev    T        P
Constant      10.1980   0.3473   29.364   0.000
Nitrogen      -0.5738   0.4203   -1.365   0.191
PhosAcid       0.1834   0.4203    0.436   0.668
Potash         0.4555   0.4203    1.084   0.295

s = 1.553    R-Sq = 16.8%    R-Sq(adj) = 1.2%

Analysis of Variance for BeanYiel
Source           DF   Seq SS   Adj SS   Adj MS   F      P
Regression        3    7.789    7.789   2.5962   1.08   0.387
  Linear          3    7.789    7.789   2.5962   1.08   0.387
Residual Error   16   38.597   38.597   2.4123
  Lack-of-Fit    11   36.057   36.057   3.2779   6.45   0.026
  Pure Error      5    2.540    2.540   0.5079
Total            19   46.385

BeanYiel   Fit      StDev Fit   Residual   St Resid
 8.260     11.163   0.788       -2.903     -2.17R
13.190     10.500   0.807        2.690      2.03R
In the previous example, you determined that the linear model did not adequately represent the
response surface. The next step is to fit the quadratic model. The quadratic model allows
detection of curvature in the response surface.
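The lack-of-fit evidence behind that conclusion can be reproduced by hand from the sums of squares and degrees of freedom reported in the analysis of variance for the linear model. The sketch below only recomputes the F ratios; it does not compute p-values, and the function name `f_ratio` is hypothetical.

```python
def f_ratio(ss_numerator, df_numerator, ss_denominator, df_denominator):
    """Ratio of two mean squares, each a sum of squares divided by
    its degrees of freedom."""
    ms_numerator = ss_numerator / df_numerator
    ms_denominator = ss_denominator / df_denominator
    return ms_numerator / ms_denominator


# Values read from the linear-model analysis of variance for BeanYiel.
lack_of_fit_f = f_ratio(36.057, 11, 2.540, 5)    # lack-of-fit vs. pure error
regression_f = f_ratio(7.789, 3, 38.597, 16)     # regression vs. residual error

print(round(lack_of_fit_f, 2))  # 6.45
print(round(regression_f, 2))   # 1.08
```

A large lack-of-fit ratio together with a small regression ratio is the pattern that points to missing higher-order terms rather than to noise alone.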
1 Open the worksheet CCD_EX1.MTW. (The design and response data have been saved for
you.)
2 Choose Stat > DOE > Response Surface > Analyze Response Surface Design.
3 In Responses, enter BeanYield.
4 Click Terms.
5 From Include the following terms, choose Full quadratic. Click OK.
6 Click Graphs.
7 Under Residual Plots, check Histogram, Normal plot, Residuals versus fits, and Residuals
Term                 Coef      StDev    T        P
Constant             10.4623   0.4062   25.756   0.000
Nitrogen             -0.5738   0.2695   -2.129   0.059
PhosAcid              0.1834   0.2695    0.680   0.512
Potash                0.4555   0.2695    1.690   0.122
Nitrogen*Nitrogen    -0.6764   0.2624   -2.578   0.027
PhosAcid*PhosAcid     0.5628   0.2624    2.145   0.058
Potash*Potash        -0.2734   0.2624   -1.042   0.322
Nitrogen*PhosAcid    -0.6775   0.3521   -1.924   0.083
Nitrogen*Potash       1.1825   0.3521    3.358   0.007
PhosAcid*Potash       0.2325   0.3521    0.660   0.524

R-Sq = 78.6%    R-Sq(adj) = 59.4%

Source           DF   Seq SS   Adj SS   Adj MS   F      P
Regression        9   36.465   36.465   4.0517   4.08   0.019
  Linear          3    7.789    7.789   2.5962   2.62   0.109
  Square          3   13.386   13.386   4.4619   4.50   0.030
  Interaction     3   15.291   15.291   5.0970   5.14   0.021
Residual Error   10    9.920    9.920   0.9920
  Lack-of-Fit     5    7.380    7.380   1.4760   2.91   0.133
  Pure Error      5    2.540    2.540   0.5079
Total            19   46.385

BeanYiel   Fit      StDev Fit   Residual   St Resid
11.060     12.362   0.776       -1.302     -2.09R
 8.260      9.514   0.776       -1.254     -2.01R
13.190     12.004   0.815        1.186      2.07R
Graph window output
Note
When the model has more than two factors, the factor(s) that are not in the plot are held
constant. You can specify the constant values at which to hold the remaining factors. See
Settings for extra factors on page 20-35.
Data
Contour plots and surface plots are model dependent. Thus, you must fit a model using Analyze
Response Surface Design before you can generate response surface plots with Contour/Surface
(Wireframe) Plots. MINITAB looks in the worksheet for the necessary model information to
generate these plots.
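To show why these plots depend on a fitted model and on hold values, the sketch below evaluates a full quadratic model over a grid of two factors while holding the third constant. It is not how MINITAB draws its plots. The coefficients are the ones reported for the full quadratic fit of the fertilizer example, and their assignment to terms (constant, linear, squares, then interactions) is an assumption; the names `COEF`, `predict`, and `response_grid` are hypothetical.

```python
# Coefficients from the full quadratic fit in coded units. The mapping of
# each value to a term is an assumption based on the usual term order.
COEF = {
    "const": 10.4623,
    "N": -0.5738, "P": 0.1834, "K": 0.4555,
    "NN": -0.6764, "PP": 0.5628, "KK": -0.2734,
    "NP": -0.6775, "NK": 1.1825, "PK": 0.2325,
}


def predict(n, p, k):
    """Predicted response at coded settings of nitrogen (n),
    phosphoric acid (p), and potash (k)."""
    return (
        COEF["const"]
        + COEF["N"] * n + COEF["P"] * p + COEF["K"] * k
        + COEF["NN"] * n * n + COEF["PP"] * p * p + COEF["KK"] * k * k
        + COEF["NP"] * n * p + COEF["NK"] * n * k + COEF["PK"] * p * k
    )


def response_grid(levels, hold_k=0.0):
    """Grid of predictions over nitrogen (columns) and phosphoric acid
    (rows), with potash held at hold_k (0.0 is the middle setting)."""
    return [[predict(n, p, hold_k) for n in levels] for p in levels]


levels = [-1.0, -0.5, 0.0, 0.5, 1.0]
grid = response_grid(levels)            # potash held at its middle setting
grid_high = response_grid(levels, 1.0)  # potash held at its high setting
print(len(grid), len(grid[0]))
```

A contour or wireframe plot is a drawing of such a grid, so changing the hold value for the extra factor changes the surface that is displayed.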
Plotting the Response Surface
to generate a surface (wireframe) plot, check Surface (wireframe) plot and click Setup
3 If you like, use any of the options listed below, then click OK in each dialog box.
Options
Setup subdialog box
specify values for factors that are not included in the response surface plot, instead of using the
default of median (middle) values; see Settings for extra factors on page 20-35
for contour plots, specify the number or location of the contour levels, and the contour line
color and style; see Controlling the number, type, and color of the contour lines on page 20-35
for surface (wireframe) plots, specify the color of the wireframe (mesh) and the surface
define minimum and maximum values for the x-axis and y-axis
To use the preset values, choose High settings, Middle settings, or Low settings under
Hold extra factors at. When you use a preset value, all factors not in the plot will be held
at their high, middle (calculated median), or low settings. (Not available for custom
designs.)
To specify the value(s) at which to hold the factor(s), enter a number in Setting for each
factor you want to control. This option allows you to set different hold settings for different
factors.
4 Click OK.
Choose Values and enter from 2 to 15 contour level values in the units of your data. You
must enter the values in increasing order.
4 To define the line style, choose Make all lines solid or Use different types under Line Styles.
5 To define the line color, choose Make all lines black or Use different colors under Line
In the fertilizer example on page 20-28, you generated a design, supplied the response data, and
fit a linear model. Because this linear model suggested that a higher-order model is needed to
adequately model the response surface, you fit the full quadratic model. The full quadratic model
provides a better fit, with the squared terms for nitrogen and phosphoric acid and the nitrogen by
potash interaction being important. The example below is a continuation of this analysis. Now you
want to try to understand these effects by looking at a contour plot and a surface plot of snap bean
yield versus the significant factors, nitrogen and phosphoric acid. By default, MINITAB selects the
first factor, in this case nitrogen, for the vertical axis, and the second factor, phosphoric acid,
for the horizontal axis.
1 Open the worksheet CCD_EX1.MTW. (The design, response data, and model information
Graph
window
output
References
[1] G.E.P. Box and D.W. Behnken (1960). Some New Three Level Designs for the Study of
Quantitative Variables, Technometrics 2, pp. 455-475.
[2] G.E.P. Box and N.R. Draper (1987). Empirical Model-Building and Response Surfaces, John
Wiley & Sons. p.249.
[3] A.I. Khuri and J.A. Cornell (1987). Response Surfaces: Designs and Analyses, Marcel Dekker,
Inc.
[4] D.C. Montgomery (1991). Design and Analysis of Experiments, Third Edition, John Wiley
& Sons.
21
Mixture Designs
See also,
Mixture Designs Overview
mixture experiments
Response depends on
Example
mixture
mixture-amounts
mixture-process variable
determine what design is appropriate for your problem. See Choosing a Design on page 21-3.
2 Use Create Mixture Design to generate a simplex centroid, simplex lattice, or extreme
vertices mixture design (page 21-5). In addition, you can include amounts or process variables
in your design to create mixture-amounts designs (page 21-11) and mixture-process variable
designs (page 21-14).
Use Define Custom Mixture Design to create a design from data you already have in the
worksheet. Define Custom Mixture Design allows you to specify which columns contain your
components and other design characteristics. You can then easily fit a model to the design.
See Defining Custom Designs on page 21-28.
3 Use Modify Design to rename the components, replicate the design, randomize the design,
MINITAB expresses the components or process variables in the worksheet. See Displaying
Designs on page 21-35.
5 Perform the mixture experiment and collect the response data. Then, enter the data in your
(page 21-24) to view the design space, or Response Trace Plot (page 21-45) and Contour/
Surface (Wireframe) Plots (page 21-49) to visualize response surface patterns.
8 If you are trying to optimize responses, use Response Optimizer (page 23-2) or Overlaid
Choosing a Design
Before you use MINITAB, you need to determine what design is most appropriate for your
experiment. MINITAB provides simplex centroid, simplex lattice, and extreme vertices designs.
When you are choosing a design you need to
identify the components, process variables, and mixture amounts that are of interest
determine the model you want to fit; see Selecting model terms on page 21-41
determine the impact that other considerations (such as cost, time, availability of facilities, or
lower and upper bound constraints) have on your choice of a design
Simplex Lattice
Degree 2
Simplex Lattice
Degree 3
permits fitting of up to
a special cubic model
permits fitting of a
linear model
permits fitting of up to
a quadratic model
permits fitting of up to
a full cubic model
permits fitting of up to
a special cubic model
permits fitting of up to
a full cubic model
Augmented
Unaugmented
Simplex
Centroid
Note
When selecting a design, it is important to consider the maximum order of the fitted
model required to adequately model the response surface. Mixture experiments
frequently require a higher-order model than is initially planned. Therefore, it is usually a
good idea, whenever possible, to perform additional runs beyond the minimum required
to fit the model. For guidelines, see [1].
The light gray lines represent the lower and upper bound constraints on the components. The dark
gray area represents the design space. The points are placed at the extreme vertices of the design space.
More
For a discussion of upper and lower bound constraints, see Setting lower and upper bounds
on page 21-12.
Note
To create a design from data that you already have in the worksheet, see Defining Custom
Designs on page 21-28.
Creating Mixture Designs
2 If you want to see a summary of the simplex designs, click Display Available Designs. Use this
3 Under Type of Design, choose Simplex centroid, Simplex lattice, or Extreme vertices.
4 From Number of components, choose a number.
5 Click Designs.
6 If you like, use any of the options listed under Design subdialog box on page 21-7.
7 Click OK even if you do not change any of the options. This selects the design and brings you
Options
Design subdialog box
choose the degree of a simplex lattice or extreme vertices design; see Choosing a Design on
page 21-3 and Calculation of design points on page 21-56
add a center point (simplex lattice and extreme vertices designs only) or add axial points to the
interior of the design (by default, MINITAB adds these points to the design); see Augmenting
the design on page 21-8
generate the design in units of the actual measurements rather than the proportions of the
components; see Generating the design in actual measurements on page 21-10
set lower and upper bounds for constrained designs; see Setting lower and upper bounds on
page 21-12
for extreme vertices designs, set linear constraints for the set of components; see Setting
linear constraints for extreme vertices designs on page 21-13
specify the type of design (full or fractional factorial designs) and the fraction number to use
for fractional factorial designs; see Fractionating a mixture-process variable design on page
21-15
set the high and low levels for the process variables; see Setting process variable levels on
page 21-17
store the design parameters (amounts, upper and lower bounds of the components, and linear
constraints) in separate columns in the worksheet; see Storing the design on page 21-19
Unaugmented
Augmented
To compare some other three-component designs, see the table under Choosing a Design on
page 21-3. To view any design in MINITAB, use Simplex Design Plot.
Note
If you do not want to augment your design, uncheck Augment the design with a
center point and/or Augment the design with axial points in the Designs subdialog
box.
When you replicate the whole design, you duplicate the complete set of design points from the
base design. The design points that would be added to a first-degree three-component simplex
lattice design are as follows:
Base design
A    B    C
1    0    0
0    1    0
0    0    1

A    B    C
1    0    0
0    1    0
0    0    1
1    0    0
0    1    0
0    0    1
When you choose which types of points to replicate, you duplicate only the design points of the
specified types of points from the base design. For example, the design points for a replicated
second-degree three-component simplex lattice design are as follows:
Base design
A    B    C
1    0    0
.5   .5   0
.5   0    .5
0    1    0
0    .5   .5
0    0    1

A    B    C
1    0    0
0    1    0
0    0    1
.5   .5   0
.5   0    .5
0    .5   .5
True replication provides an estimate of the error or noise in your process and may allow for
more precise estimates of effects.
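The two kinds of replication described above can be sketched in a few lines of Python. This is an illustration only, not how MINITAB builds the worksheet. The function names are hypothetical, the replicate count is treated as the total number of copies of each point, and the point type is taken to be the number of nonzero components (1 for a vertex, 2 for a two-component blend), which holds for lattice points but not for center or axial points.

```python
def replicate_whole_design(base_design, copies):
    """Return the complete base design repeated `copies` times in total."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return [point for _ in range(copies) for point in base_design]


def point_type(point):
    """Hypothetical classifier: count the nonzero components of a lattice point."""
    return sum(1 for proportion in point if proportion > 0)


def replicate_by_point_type(base_design, copies_by_type):
    """Repeat each point according to its type; unlisted types appear once."""
    design = []
    for point in base_design:
        design.extend([point] * copies_by_type.get(point_type(point), 1))
    return design


# First-degree and second-degree three-component simplex lattice designs.
first_degree = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
second_degree = [(1, 0, 0), (0.5, 0.5, 0), (0.5, 0, 0.5),
                 (0, 1, 0), (0, 0.5, 0.5), (0, 0, 1)]

whole = replicate_whole_design(first_degree, 2)
vertices_twice = replicate_by_point_type(second_degree, {1: 2})
print(whole)
print(vertices_twice)
```

Replicating the whole first-degree design twice gives six runs. Replicating only the vertices of the second-degree design gives nine runs, with each vertex appearing twice and each two-component blend once.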
To replicate the entire base design, choose Number of replicates for the whole design and
choose a number up to 50.
To replicate only certain types of points, choose Number of replicates for the selected
types of points and enter the number of replicates for each point type in the Number
column of the table.
2 Click OK.
2 Under Total Mixture Amount, choose Single total and enter the sum of all the component
measurements. Suppose you measure all the components of your mixture in liters. If the
measurements add up to a total of 5.2 liters, you would enter 5.2. Click OK.
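The relationship between proportions and actual measurements is a simple scaling by the mixture total. The sketch below is a hypothetical Python helper, not a MINITAB command, and the blend proportions in it are made up for illustration.

```python
def proportions_to_amounts(proportions, total):
    """Scale component proportions (which must sum to 1) to actual amounts."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return [proportion * total for proportion in proportions]


blend = [0.5, 0.25, 0.25]  # hypothetical proportions of three components
amounts = proportions_to_amounts(blend, 5.2)  # single total of 5.2 liters
print(amounts)
```

The amounts always add back up to the single total, so a design expressed in liters carries the same information as one expressed in proportions.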
Mixture-amounts designs
In the simplest mixture experiment, the response is assumed to only depend on the proportions
of the components in the mixture. In the mixture-amounts experiment, the response is assumed
to depend on the proportions of the components and the amount of the mixture. For example,
the amount applied and the proportions of the ingredients of a plant food may affect the growth
of a house plant. When a mixture experiment is performed at two or more levels of the total
mixture amount, it is called a mixture-amounts experiment.
To create a mixture-amounts design
1 In the Create Mixture Design dialog box, click Components.
2 Under Total Mixture Amount, choose Multiple totals and enter up to five mixture totals.
Suppose you are testing plant food and would like to evaluate plant growth when one gram
versus two grams of food are applied. You would enter 1 2. Click OK.
More
Naming components
By default, MINITAB names the components alphabetically, skipping the letter T.
To name components
1 In the Create Mixture Design dialog box, click Components.
2 Under Name, click in the first row and type the name of the first component. Then, use the
down arrow key to move down the column and enter the remaining names.
More
After you have created the design, you can change the component names by typing new
names in the Data window, or with Modify Design (page 21-30).
Lower bounds are necessary when any of the components must be present in the mixture. For
example, lemonade must contain lemon juice.
Upper bounds are necessary when the mixture cannot contain more than a given proportion
of an ingredient. For example, a cake mix cannot contain more than 5% baking powder.
Constrained designs (those in which you specify lower or upper bounds) produce coefficients that
are highly correlated. Generally, you can reduce the correlations among the coefficients by
2 Under Lower, click in the component row for which you want to set a lower bound, and type a
positive number.
Each lower bound must be less than the corresponding upper bound. The sum of the lower
bounds for all the components must be less than the value of Single total or the first value in
Multiple totals.
3 Use the right arrow key to move to Upper and enter a positive number.
Each upper bound must be greater than the corresponding lower bound. Each upper bound
must be less than the value of Single total or the first value in Multiple totals. The sum of the
upper bounds for all the components must be greater than the value of Single total or the first
value in Multiple totals.
4 Repeat steps 2 and 3 to assign bounds for other components. Click OK.
When you change the default lower or upper bounds of a component, the achievable bounds on
the other components may need to be adjusted. See Help for calculations.
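The consistency rules for lower and upper bounds can be collected into a single check. The following Python sketch is a hypothetical helper, not part of MINITAB. One deliberate deviation is labeled in the code: an upper bound equal to the total is accepted, because the default upper bound shown in session output equals the mixture total.

```python
def check_bounds(lower, upper, total=1.0):
    """Return a list of problems with the bounds; an empty list means none found."""
    problems = []
    for index, (lo, hi) in enumerate(zip(lower, upper), start=1):
        if lo < 0 or hi < 0:
            problems.append(f"component {index}: bounds must not be negative")
        if not lo < hi:
            problems.append(f"component {index}: lower bound must be less than upper bound")
        # Assumption: a bound equal to the total (the default) is allowed.
        if hi > total:
            problems.append(f"component {index}: upper bound exceeds the total")
    if not sum(lower) < total:
        problems.append("sum of lower bounds must be less than the total")
    if not sum(upper) > total:
        problems.append("sum of upper bounds must be greater than the total")
    return problems


# Five components; only the third has an upper bound below the default.
ok = check_bounds([0.425, 0.30, 0.025, 0.10, 0.10], [1.0, 1.0, 0.05, 1.0, 1.0])
bad = check_bounds([0.6, 0.6], [1.0, 1.0])
print(ok)
print(bad)
```

The second call fails because the lower bounds alone already add up to more than the mixture total, so no blend could satisfy them.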
component coefficients are all 1. Examples for a four-component blend are shown in the table
below:
Coefficients
Condition
Lower
Value
10
5A + 3B + 8D < 0.1
0.5B + 0.8D > 0.9
0.9
0.5
Upper
Value
20
0.1
0.8
3 In the first column of the table, enter a coefficient for one or more of the components and a
lower and/or upper value. Use the down arrow key to move down the column and enter desired values.
The lower and upper values that you enter must be consistent with the value of Single total or the
first value in Multiple totals.
You must enter at least one coefficient and an upper or lower value. If you do not enter a
coefficient for a component, MINITAB assumes it to be zero.
4 Repeat step 3 to enter up to ten different linear constraints on the set of components. Click
OK.
generated at each combination of levels of the process variables or at a fraction of the level
combinations.
Fractionating a mixture-process variable design
When you generate a complete mixture-process variable design, the mixture design is
generated at each combination of levels of the process variables. This may result in a prohibitive
number of runs because the number of design points in the complete design increases quickly as
the number of process variables increases. For example, a complete simplex centroid design with
3 mixture components and 2 process variables has 28 runs. The same 3-component design with
three process variables has 56 runs; this design with 4 process variables has 112 runs.
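The run counts quoted above follow from multiplying the number of mixture design points by the number of process variable combinations. The Python sketch below assumes an unaugmented simplex centroid design, which has one point for every nonempty subset of components, crossed with a two-level factorial in the process variables. The function names and the `fraction` parameter are hypothetical.

```python
def simplex_centroid_points(components):
    """One blend per nonempty subset of components: 2**q - 1 points."""
    return 2 ** components - 1


def mixture_process_runs(components, process_variables, fraction=1):
    """Runs when the mixture design is repeated at each factorial point.

    `fraction` is the denominator of the factorial fraction
    (1 = full factorial, 2 = 1/2 fraction, and so on).
    """
    factorial_points = 2 ** process_variables // fraction
    return simplex_centroid_points(components) * factorial_points


for k in (2, 3, 4):
    print(k, "process variables:", mixture_process_runs(3, k), "runs")

print("1/2 fraction, 3 process variables:", mixture_process_runs(3, 3, fraction=2))
```

With three components the loop gives 28, 56, and 112 runs, and a 1/2 fraction of the three-variable factorial halves the 56 runs to 28.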
Tip
You can also use an optimal design to reduce the number of runs; see Chapter 22,
Optimal Designs.
Fraction
Notice that the full factorial design contains twice as many design points as the fraction
design. The response is only measured at four of the possible eight corner points of the factorial
portion of the design.
The types of factorial designs that are available depend on the number of process variables.
Factorial design availability is summarized in the table below:
Number of
process variables
one
two
three
1/2
fraction
1/4
fraction
1/8
fraction
1/16
fraction
Number of
process variables
1/2
fraction
1/4
fraction
1/8
fraction
four
five
six
seven
1/16
fraction
2 Under Name, click in the first row and type the name of the first process variable. Then, use
the down arrow key to move down the column and enter the remaining names.
3 Click OK.
More
After you have created the design, you can change the process variable names by typing
new names in the Data window or with Modify Design (page 21-30).
2 Under Low, click in the process variable row to which you would like to assign values and
enter any value. Use the right arrow key to move to High and enter a value. If you use numeric
levels, the value you enter in High must be larger than the value you enter in Low.
3 Repeat step 2 to assign levels for other process variables. Click OK.
More
To change the process variable levels after you have created the design, use Stat > DOE >
Modify Design.
StdOrder shows what the order of the runs in the experiment would be if the experiment was
done in standard order.
RunOrder shows what the order of the runs in the experiment would be if the experiment was
run in random order.
If you did not randomize, the run order and standard order are the same.
If you want to re-create a design with the same ordering of the runs (that is, the same design
order), you can choose a base for the random data generator. Then, when you want to re-create
the design, you just use the same base.
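The idea of a base for the random data generator is the same as a seed in most programming languages: the same base always yields the same random ordering. The Python sketch below only illustrates that idea. It uses Python's own generator, so the order it produces will not match the order MINITAB produces for the same base, and the function name is hypothetical.

```python
import random


def randomized_run_order(n_runs, base=None):
    """Return the standard order 1..n_runs shuffled into a run order.

    Supplying the same `base` (seed) reproduces the same run order.
    """
    order = list(range(1, n_runs + 1))
    random.Random(base).shuffle(order)
    return order


first = randomized_run_order(10, base=9)
second = randomized_run_order(10, base=9)
print(first)
print(first == second)
```

Calling the function without a base gives a different ordering each time, which is the behavior to expect when no base is entered.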
You can use Stat > DOE > Display Design (page 21-35) to switch back and forth
between a random and standard order display in the worksheet.
More
C4 (Blocks) stores the blocking variable. When a design is not blocked, as with mixture
designs, MINITAB sets all column values to one.
In addition, depending on your design and storage options, MINITAB may store the following:
When you create a design using Create Mixture Design, MINITAB stores the appropriate
design information in the worksheet. MINITAB needs this stored information to analyze the
data properly. If you want to use Analyze Mixture Design, you must follow certain rules
when modifying the worksheet data. See Modifying and Using Worksheet Data on page
18-4.
If you make changes that corrupt your design, you may still be able to analyze it with
Analyze Mixture Design after you use Define Custom Mixture Design (page 21-28).
Suppose you want to study how the proportions of three ingredients in an herbal blend household
deodorizer affect the acceptance of the product based on scent. The three components are neroli
oil, rose oil, and tangerine oil.
1 Choose Stat > DOE > Mixture > Create Mixture Design.
2 Under Type of Design, choose Simplex centroid.
3 From Number of components, choose 3.
4 Click Designs. Make sure Augment the design with axial points is checked. Click OK.
5 Click Components. In Name, enter Neroli, Rose, and Tangerine in rows 1 to 3, respectively.
Click OK.
6 Click Results. Choose Detailed description and data table.
7 Click OK in each dialog box.
Session window output

Design points: 10
Design degree: 3
Mixture total: 1

Number of Boundaries for Each Dimension

Point Type      1    2    0
Dimension       0    1    2
Number          3    3    1

Point Type      1    2    3    0   -1
Distinct        3    3    0    1    3
Replicates      1    1    0    1    1
Total number    3    3    0    1    3

            Amount            Proportion       Pseudocomponent
Comp     Lower   Upper      Lower   Upper      Lower   Upper
A       0.0000  1.0000     0.0000  1.0000     0.0000  1.0000
B       0.0000  1.0000     0.0000  1.0000     0.0000  1.0000
C       0.0000  1.0000     0.0000  1.0000     0.0000  1.0000

     A        B        C
0.1667   0.1667   0.6667
0.6667   0.1667   0.1667
0.5000   0.5000   0.0000
0.5000   0.0000   0.5000
0.1667   0.6667   0.1667
0.0000   0.5000   0.5000
0.0000   0.0000   1.0000
0.0000   1.0000   0.0000
0.3333   0.3333   0.3333
1.0000   0.0000   0.0000
Note
MINITAB randomizes the design by default, so if you try to replicate this example, your
runs may not match the order shown.
Suppose you need to determine the proportions of flour, milk, baking powder, eggs, and oil in a
pancake mix that would produce an optimal product based on taste. Because previous
experimentation suggests that a mix that does not contain all of the ingredients or has too much
baking powder will not meet the taste requirements, you decide to constrain the design by setting
lower bounds and upper bounds.
You decide that a quadratic model will sufficiently model the response surface, so you decide to
create a second-degree design.
1 Choose Stat > DOE > Mixture > Create Mixture Design.
2 Under Type of Design, choose Extreme vertices.
3 From Number of components, choose 5.
4 Click Designs. From Degree of design, choose 2.
5 Make sure Augment the design with center point and Augment the design with axial points
Name             Lower    Upper
Flour            .425
Milk             .30
Baking powder    .025     .05
Eggs             .10
Oil              .10
7 Click Results. Choose Detailed description and data table. Click OK in each dialog box.
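Only baking powder is given an explicit upper bound in this example, yet the session window output reports upper bounds for every component and a pseudocomponent upper bound of 0.5 for baking powder. The Python sketch below shows one standard way such values arise. It is not taken from MINITAB's documentation: it assumes the usual implied-bound rule (a component cannot exceed the total minus the other lower bounds) and the usual lower-bound pseudocomponent transform, and the function names are hypothetical.

```python
def implied_upper_bounds(lower, upper, total=1.0):
    """Tighten each upper bound to what the other lower bounds leave available."""
    lower_sum = sum(lower)
    return [min(hi, total - (lower_sum - lo)) for lo, hi in zip(lower, upper)]


def to_pseudocomponent(value, own_lower, lower, total=1.0):
    """Rescale a component value so its lower bound maps to 0."""
    return (value - own_lower) / (total - sum(lower))


# Flour, milk, baking powder, eggs, oil.
lower = [0.425, 0.30, 0.025, 0.10, 0.10]
# Assumption: unspecified upper bounds default to the mixture total.
upper = [1.0, 1.0, 0.05, 1.0, 1.0]

adjusted = implied_upper_bounds(lower, upper)
pseudo_upper = [to_pseudocomponent(hi, lo, lower)
                for lo, hi in zip(lower, adjusted)]

print([round(value, 3) for value in adjusted])
print([round(value, 3) for value in pseudo_upper])
```

Under these assumptions the adjusted upper bounds come out as 0.475, 0.35, 0.05, 0.15, and 0.15, and the pseudocomponent upper bounds as 1, 1, 0.5, 1, and 1, which agrees with the bounds listed in the session window output for this example.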
Session window output

Design points: 33
Design degree: 2
Mixture total: 1

Number of Boundaries for Each Dimension

Point Type      1    2    3    4    0
Dimension       0    1    2    3    4
Number          8   16   14    6    1

Point Type      1    2    3    4    5    0   -1
Distinct        8   16    0    0    0    1    8
Replicates      1    1    0    0    0    1    1
Total number    8   16    0    0    0    1    8

               Amount               Proportion           Pseudocomponent
Comp       Lower     Upper       Lower     Upper        Lower     Upper
A       0.425000  0.475000    0.425000  0.475000     0.000000  1.000000
B       0.300000  0.350000    0.300000  0.350000     0.000000  1.000000
C       0.025000  0.050000    0.025000  0.050000     0.000000  0.500000
D       0.100000  0.150000    0.100000  0.150000     0.000000  1.000000
E       0.100000  0.150000    0.100000  0.150000     0.000000  1.000000

Type         A         B         C         D         E
   2  0.462500  0.300000  0.037500  0.100000  0.100000
  -1  0.429687  0.304688  0.043750  0.104688  0.117188
   2  0.425000  0.300000  0.037500  0.100000  0.137500
  -1  0.454687  0.304688  0.031250  0.104688  0.104688
   2  0.425000  0.300000  0.037500  0.137500  0.100000
   1  0.475000  0.300000  0.025000  0.100000  0.100000
   2  0.425000  0.312500  0.050000  0.112500  0.100000
  -1  0.429687  0.304688  0.031250  0.129688  0.104688
   2  0.437500  0.312500  0.050000  0.100000  0.100000
   2  0.450000  0.300000  0.025000  0.100000  0.125000
   2  0.437500  0.300000  0.050000  0.112500  0.100000
   2  0.425000  0.325000  0.025000  0.125000  0.100000
  -1  0.429687  0.304688  0.043750  0.117188  0.104688
   1  0.425000  0.300000  0.025000  0.150000  0.100000
   2  0.450000  0.325000  0.025000  0.100000  0.100000
  -1  0.429687  0.304688  0.031250  0.104688  0.129688
  -1  0.429687  0.317188  0.043750  0.104688  0.104688
   1  0.425000  0.300000  0.025000  0.100000  0.150000
   1  0.425000  0.350000  0.025000  0.100000  0.100000
  -1  0.442187  0.304688  0.043750  0.104688  0.104688
  -1  0.429687  0.329687  0.031250  0.104688  0.104688
   0  0.434375  0.309375  0.037500  0.109375  0.109375
   2  0.425000  0.337500  0.037500  0.100000  0.100000
   2  0.437500  0.300000  0.050000  0.100000  0.112500
   2  0.425000  0.325000  0.025000  0.100000  0.125000
   2  0.425000  0.312500  0.050000  0.100000  0.112500
   2  0.425000  0.300000  0.025000  0.125000  0.125000
   1  0.425000  0.300000  0.050000  0.100000  0.125000
   1  0.425000  0.325000  0.050000  0.100000  0.100000
   2  0.450000  0.300000  0.025000  0.125000  0.100000
   1  0.450000  0.300000  0.050000  0.100000  0.100000
   2  0.425000  0.300000  0.050000  0.112500  0.112500
   1  0.425000  0.300000  0.050000  0.125000  0.100000
MINITAB randomizes the design by default, so if you try to replicate this example, your
runs may not match the order shown.
components only
Data
You must create and store a design using Create Mixture Design.
To display a single simplex design plot for any three components, choose Select a triplet of
components for a single plot. Then, choose any three components that are in your design.
To display a layout with four simplex design plots (each plot displays three components),
choose Select four components for a matrix plot. Then, choose any four components that
are in your design.
To display a simplex design plot for all combinations of components, each in a separate
window, choose Generate plots for all triplets of components.
3 If you like, use any of the options listed below, then click OK.
Options
Simplex Design Plot dialog box
use the run order, number of replicates, or point type for design point labels on the plot
include process variables, and for a single simplex design plot include all the levels of the
process variables in a single layout
include an amount variable (by default, MINITAB will plot the amount variable at its first
defined value), and for a single simplex design plot include all the levels of the amount
variable in a single layout
specify values for design variables that are not included in the plot; see Settings for extra
components, process variables, and an amount variable below
define minimum and maximum values for the x-axis, y-axis, and z-axis
Note
To set the holding level for design variables not in the plot
1 In the Simplex Design Plot dialog box, click Settings.
For components (only available for design with more than three components):
To use the preset values for components, choose Lower bound setting or Upper bound
setting under Hold components at. When you use a preset value, all components not
in the plot will be held at their lower bound or upper bound.
To specify the value at which to hold the components, enter a number in Setting for
each component that you want to control. This option allows you to set a different
holding value for each component.
3 Click OK.
Example of a simplex design plot
In the Example of a simplex centroid design on page 21-20, you created a design to study how the
proportions of three ingredients in an herbal blend household deodorizer affect the acceptance
of the product based on scent. The three components are neroli oil, rose oil, and tangerine oil.
To help you visualize the design space, you want to display a simplex design plot.
1 Open the worksheet DEODORIZ.MTW.
2 Choose Stat > DOE > Mixture > Simplex Design Plot. Click OK.
Graph
window
output
Defining Custom Designs
three pure mixtures, one for each component (Neroli, Rose, and Tangerine). These points are
found at the vertices of the triangle.
three binary blends, one for each possible two-component blend (Neroli-Rose,
Rose-Tangerine, and Tangerine-Neroli). These design points are found at the midpoint of each
edge of the triangle.
three complete blends. All three components are included in these blends, but not in equal
proportions.
one center point (or centroid). Equal proportions of all three components are included in this
blend.
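The ten points listed above for the deodorizer design (three pure mixtures, three binary blends, three complete blends, and one center point) can be generated programmatically. The Python sketch below is an illustration rather than MINITAB's algorithm. It assumes that each axial point lies halfway between a vertex and the overall centroid, which is consistent with the values 0.6667 and 0.1667 in the design table for this example. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def simplex_centroid(components):
    """Equal-proportion blends of every nonempty subset of components."""
    points = []
    for size in range(1, components + 1):
        for subset in combinations(range(components), size):
            points.append(tuple(
                Fraction(1, size) if index in subset else Fraction(0)
                for index in range(components)))
    return points


def axial_points(components):
    """Assumed placement: midway between each vertex and the centroid."""
    centroid = Fraction(1, components)
    points = []
    for vertex in range(components):
        points.append(tuple(
            ((Fraction(1) if index == vertex else Fraction(0)) + centroid) / 2
            for index in range(components)))
    return points


design = simplex_centroid(3) + axial_points(3)
for point in design:
    print("  ".join(f"{float(value):.4f}" for value in point))
```

Exact fractions are used so that every blend sums to exactly one before it is formatted to four decimal places for display.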
2 In Components, enter the columns that contain the component data. Data must be in the
form of amounts. (When the mixture total is one, amounts and proportions are equivalent.)
For information on the data units, see Mixture-amounts designs on page 21-11 and Specifying the
units for components on page 21-35.
3 If you have process variables in your design, enter the columns in Process variables.
4 If you have an amount variable, under Mixture Amount, choose In column, and enter the
6 If you have a mixture-amounts experiment, MINITAB will enter the smallest value in your
amount column in Total value matching Lower/Upper bounds. If this is not the value you
want, change it to any other total in your amount column.
7 MINITAB will fill in the lower and upper bound table from the worksheet. Make any necessary
If you do not have any columns containing the standard order, run order, point type, or
blocks, click OK.
If you have columns that contain data for the standard order, run order, point type, or
blocks, click Designs.
1 If you have a column that contains the standard order of the experiment, under
Standard Order Column, choose Specify by column and enter the column containing
the standard order.
2 If you have a column that contains the run order of the experiment, under Run Order
Column, choose Specify by column and enter the column containing the run order.
3 If you have a column that contains the design point type, under Point Type Column,
choose Specify by column and enter the column containing the point types.
4 If your design is blocked, under Blocks, choose Specify by column and enter the
Options
Lower/Upper subdialog box
store the design parameters (amounts, upper and lower bounds of the components, and linear
constraints) in separate columns in the worksheet; see Storing the design on page 21-19
set one or more linear constraints for the set of components; see Setting linear constraints for
extreme vertices designs on page 21-13
Modifying Designs
After creating a design and storing it in the worksheet, you can use Modify Design to make the
following modifications:
By default, MINITAB will replace the current design with the modified design in the worksheet.
To store the modified design in a new worksheet, check Put modified design in a new
worksheet.
Renaming components
To rename components
1 Choose Stat > DOE > Modify Design.
3 Under Name, click in the first row and type the name of the first component. Then, use the
down arrow key to move down the column and enter the remaining names. Click OK.
Tip
You can also type new component or process variable names directly into the Data
window.
Under Name, click in the first row and type the name of the first process variable. Then,
use the down arrow key to move down the column and name the remaining process variables.
Under Low, click in the row of the process variable to which you want to assign values and
enter any numeric or text value. Use the right arrow key to move to High and enter a value.
For numeric levels, the High value must be larger than the Low value.
Repeat to assign levels for other factors.
5 Click OK.
Tip
You can also type new component or process variable names directly into the Data
window.
A  B  C
1  0  0
0  1  0
0  0  1
1  0  0
0  1  0
0  0  1
True replication provides an estimate of the error or noise in your process and may allow for
more precise estimates of effects.
To replicate the design
1 Choose Stat > DOE > Modify Design.
2 Choose Replicate design and click Specify.
2 Choose Randomize design and click Specify.
Choose Randomize just block, and choose a block number from the list. (Mixture designs
are not usually blocked.)
4 If you like, in Base for random data generator, enter a number. Click OK.
More
You can use Stat > DOE > Display Design (page 21-35) to switch back and forth
between a random and standard order display in the worksheet.
Displaying Designs
After you create a design, you can use Display Design to change the way the design points are
stored in the worksheet. You can change the design points in two ways:
display the points in either run order or standard order. Run order is the order of the runs if
the experiment was done in random order.
2 Choose Run order for the design or Standard order for the design. If you do not randomize
a design, the columns that contain the standard order and run order are the same.
3 Do one of the following:
If you want to reorder all worksheet columns that are the same length as the design
columns, click OK.
You can choose one of three scales to represent the design: amounts, proportions, or
pseudocomponents. With certain combinations of the mixture total and lower bound constraints,
the various scalings are equivalent as shown in the following table:
Total mixture     Lower bounds      Equivalent scales
equal to 1        equal to 0        amounts, proportions, pseudocomponents
equal to 1        greater than 0    amounts, proportions
not equal to 1    equal to 0        proportions, pseudocomponents
not equal to 1    greater than 0    none
Pseudocomponents
Constrained designs (those in which you specify lower or upper bounds) produce coefficients
that are highly correlated.
Lower bounds are necessary when any of the components must be present in the mixture. For
example, lemonade must contain lemon juice.
Upper bounds are necessary when the mixture cannot contain more than a given proportion
of an ingredient. For example, a cake mix cannot contain more than 5% baking powder.
Generally, you can reduce the correlations among the coefficients by transforming the
components to pseudocomponents. For a complete discussion, see [1] and [3].
Pseudocomponents, in effect, rescale the constrained data area so the minimum allowable
amount (the lower bound) of each component is zero. This makes a constrained design in
pseudocomponents the same as an unconstrained design in proportions.
The table below shows two components expressed in amounts, proportions, and
pseudocomponents. Suppose the total mixture is 50 ml. Let X1 and X2 be the amounts of the
two components.
Thus X1 + X2 = 50. Suppose X1 has a lower bound of 20 (this means that the upper bound of X2
is 50 minus 20, or 30). Here are some points on the three scales:
    Amounts       Proportions    Pseudocomponents
  X1      X2      X1      X2      X1      X2
  50       0     1.0     0.0     1.0     0.0
  20      30     0.4     0.6     0.0     1.0
  35      15     0.7     0.3     0.5     0.5
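
The three scales in the table can be reproduced with a short calculation. The following is a minimal Python sketch, not MINITAB code. It assumes the usual lower-bound pseudocomponent rescaling, (amount minus lower bound) divided by (total minus the sum of the lower bounds), which agrees with every row of the table. The function names are illustrative only.

```python
def to_proportions(amounts):
    """Divide each amount by the mixture total."""
    total = sum(amounts)
    return [a / total for a in amounts]


def to_pseudocomponents(amounts, lower_bounds):
    """Rescale so that each component's lower bound maps to zero.

    pseudo_i = (amount_i - lower_i) / (total - sum of lower bounds)
    """
    total = sum(amounts)
    span = total - sum(lower_bounds)
    if span <= 0:
        raise ValueError("lower bounds must sum to less than the mixture total")
    return [(a - lb) / span for a, lb in zip(amounts, lower_bounds)]


# X1 has a lower bound of 20; X2 has a lower bound of 0 (total is 50 ml).
lower = [20, 0]
for blend in ([50, 0], [20, 30], [35, 15]):
    print(blend, to_proportions(blend), to_pseudocomponents(blend, lower))
```

For the blend (35, 15) this gives proportions (0.7, 0.3) and pseudocomponents (0.5, 0.5), matching the last row of the table.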
process variables, and amount variable in the worksheet. These columns constitute the basis
of your data collection form. If you did not name components or process variables when you
created the design, and you want names on the form, use Modify Design (page 21-30).
2 In the worksheet, name the columns in which you will record the measurement data
More
You can also copy the worksheet cells to the Clipboard by choosing Edit > Copy Cells.
Then paste the Clipboard contents into a word-processing application, such as Microsoft
WordPad or Microsoft Word, where you can create your own form.
Analyzing Mixture Designs
mixture regression
stepwise regression
forward selection
backward elimination
Data
Enter numeric response data column(s) that are equal in length to the design variables in the
worksheet. Each row in the worksheet will contain the data for one run of your experiment. You
may enter the response data in any columns not occupied by the design data. The number of
columns reserved for the design data is dependent on the number of components in your design
and whether or not you chose to store the design parameters (see Storing the design on page
21-19).
If there is more than one response variable, MINITAB fits separate models for each response.
MINITAB omits the rows containing missing data from all calculations.
Note
When all the response variables do not have the same missing value pattern, MINITAB
displays a message. When the responses do not have the same missing value pattern, you
may want to perform the analysis separately for each response because you would get
different results than if you included them all in a single analysis.
Options
Analyze Mixture Design dialog box
fit a model for mixture components only (default), or include process variables or amounts in
the model. See Mixture-amounts designs on page 21-11 and Mixture-process variable designs
on page 21-14 for more information.
choose from four model fitting methods: mixture regression (default), stepwise regression,
forward selection, backward elimination.
draw five different residual plots for regular, standardized, or deleted residuals; see Choosing
a residual type on page 2-5. Available residual plots include a:
histogram.
normal probability plot.
plot of residuals versus the fitted values (Ŷ).
plot of residuals versus data order. The row number for each data point is shown on the
x-axis (for example, 1 2 3 4 ... n).
separate plot for the residuals versus each specified column.
For a discussion, see Residual plots on page 2-5.
fit a model by specifying the maximum order of the terms, or choose which terms to include
from a list of all estimable terms; see Selecting model terms on page 21-41.
include inverse component terms, process variable terms, or an amount term in the model.
You cannot include inverse terms if the lower bound for any component is zero or if you
choose to analyze the design in pseudocomponents.
If you choose the forward selection model fitting method, you can
designate a set of predictor variables that cannot be removed from the model, even when
their p-values are less than the α to enter.
set the α-value for entering a new variable in the model.
If you choose the backward elimination model fitting method, you can
designate a set of predictor variables that cannot be removed from the model, even when
their p-values are less than the α to enter.
set the α-value for removing a variable from the model.
display the next best alternate predictors up to the number requested. If a new predictor is
entered into the model, MINITAB displays the predictor which was the second best choice, the
third best choice, and so on, up to the requested number.
store the fits, and regular, standardized, and deleted residuals separately for each response;
see Choosing a residual type on page 2-5.
store the coefficients for the model, the design matrix, and model terms separately for each
response. The design matrix multiplied by the coefficients will yield the fitted values. Since
Analyze Mixture Design does not allow a constant in the model, the design matrix does not
contain a column of ones.
store the leverages, Cook's distances, and DFITS, for identifying outliers; see Identifying
outliers on page 2-9.
linear (first-order)
    fits these terms: linear
    models this blending: additive
quadratic (second-order)
    fits these terms: linear and quadratic
    models this blending: additive and nonlinear synergistic binary, or additive and
    nonlinear antagonistic binary
special cubic (third-order)
    fits these terms: linear, quadratic, and special cubic
    models this blending: additive, nonlinear synergistic ternary, nonlinear antagonistic
    ternary
full cubic (third-order)
    fits these terms: linear, quadratic, special cubic, and full cubic
    models this blending: additive, nonlinear synergistic binary, nonlinear antagonistic
    binary, nonlinear synergistic ternary, nonlinear antagonistic ternary
special quartic (fourth-order)
    fits these terms: linear, quadratic, special cubic, full cubic, and special quartic
    models this blending: additive, nonlinear synergistic binary, nonlinear antagonistic
    binary, nonlinear synergistic ternary, nonlinear antagonistic ternary, nonlinear
    synergistic quaternary, nonlinear antagonistic quaternary
full quartic (fourth-order)
    fits these terms: linear, quadratic, special cubic, full cubic, special quartic, and
    full quartic
    models this blending: additive, nonlinear synergistic binary, nonlinear antagonistic
    binary, nonlinear synergistic ternary, nonlinear antagonistic ternary, nonlinear
    synergistic quaternary, nonlinear antagonistic quaternary
You can fit inverse terms with any of the above models as long as no component has a lower
bound of zero and you choose to analyze the design in proportions. Inverse terms allow
you to model extreme changes in the response as the proportion of one or more components
nears its boundary. Suppose you are formulating lemonade and you are interested in the
acceptance rating for taste. An extreme change in the acceptance of lemonade occurs when the
proportion of sweetener goes to zero. That is, the taste becomes unacceptably sour.
Analyze Mixture Design fits a model without a constant term. For example, a quadratic in three
components is as follows:
Y = b1A + b2B + b3C + b12AB + b13AC + b23BC
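
To make the no-constant form concrete, the following Python sketch builds one design-matrix row for the three-component quadratic model and multiplies it by a coefficient vector to get a fitted value. It is an illustration, not MINITAB's implementation. The coefficients are the six values from the deodorizer example's Session window output later in this chapter, and the term order (A, B, C, AB, AC, BC) is an assumption because the term labels are not shown there.

```python
def quadratic_row(a, b, c):
    """One design-matrix row for Y = b1A + b2B + b3C + b12AB + b13AC + b23BC.

    There is no leading 1 because the model has no constant term.
    """
    return [a, b, c, a * b, a * c, b * c]


def fitted_value(row, coefs):
    """Design-matrix row multiplied by the coefficient vector."""
    return sum(x * beta for x, beta in zip(row, coefs))


# Coefficients from the deodorizer example; the assignment of each value
# to a term (A, B, C, AB, AC, BC) is assumed, not confirmed by the output.
coefs = [5.856, 7.141, 7.448, 1.795, 5.090, -1.941]

pure_a = quadratic_row(1, 0, 0)                # a blend of component A only
centroid = quadratic_row(1 / 3, 1 / 3, 1 / 3)  # equal parts of A, B, and C

print(fitted_value(pure_a, coefs))
print(round(fitted_value(centroid, coefs), 4))
```

At a pure blend, every cross-product is zero, so the fitted value is simply that component's linear coefficient. This is why the linear coefficients of a mixture model can be read as the predicted responses at the vertices.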
To specify the model
1 In the Analyze Mixture Design dialog box, click Terms.
from Include the component terms up through order, choose one of the following:
linear, quadratic, special cubic, full cubic, special quartic, or full quartic
move the terms you want to include in the model to Selected Terms using the arrow
buttons
to move one or more terms, highlight the desired terms, then click the arrow button for
the direction you want
to move all of the terms, click the arrow button that moves all terms in the direction
you want
You can also move a term by double-clicking it.
Note
MINITAB represents components with the letters A, B, C, ..., skipping the letter T; process
variables with X1, ..., Xn; and amounts with the letter T.
to include all the inverse component terms, check Include inverse component terms
to include a subset of the inverse component terms, highlight the desired terms, then click
the arrow button
4 If you want to include process variable or amount terms, do one of the following:
from Include process variables/mixture amount terms up through order, choose an
order
move terms you want to include in the model to Selected Terms using the arrow buttons
to move one or more terms, highlight the desired terms, then click the arrow button for
the direction you want
to move all of the terms, click the arrow button that moves all terms in the direction
you want
You can also move a term by double-clicking it.
This example fits a model for the design created in Example of a simplex centroid design on page
21-20. Recall that you are trying to determine how the proportions of the components in an herbal
blend household deodorizer affect the acceptance of the product based on scent. The three
components are neroli oil, rose oil, and tangerine oil. Based on the design points, you mixed ten
blends. The response measure (Acceptance) is the mean of five acceptance scores for each of the
blends.
1 Open the worksheet DEODORIZ.MTW.
2 Choose Stat > DOE > Mixture > Analyze Mixture Design.
3 In Responses, enter Acceptance. Click OK.
Session window output

   Coef   SE Coef       T       P     VIF
  5.856    0.4728       *       *   1.964
  7.141    0.4728       *       *   1.964
  7.448    0.4728       *       *   1.964
  1.795    2.1791    0.82   0.456   1.982
  5.090    2.1791    2.34   0.080   1.982
 -1.941    2.1791   -0.89   0.423   1.982

S = 0.49023       PRESS = 11.440
R-Sq = 73.84%     R-Sq(pred) = 0.00%     R-Sq(adj) = 41.14%

DF    Seq SS    Adj SS     Adj MS      F       P
 5   2.71329   2.71329   0.542659   2.26   0.225
 2   1.04563   1.56873   0.784366   3.26   0.144
 3   1.66766   1.66766   0.555887   2.31   0.218
 4   0.96132   0.96132   0.240329
 9   3.67461
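
The summary statistics in this output follow from the sums of squares by simple arithmetic. The Python sketch below rechecks them. It assumes that the row with 5 degrees of freedom is the regression, the row with 4 is the residual error, and the row with 9 is the total; the row labels are not shown in the output, so this reading is an inference from the degrees of freedom.

```python
# (DF, Adj SS) pairs read from the analysis of variance values above.
# Which row is which is an assumption based on the degrees of freedom.
regression = (5, 2.71329)
residual = (4, 0.96132)
total = (9, 3.67461)


def mean_square(df, ss):
    """A mean square is a sum of squares divided by its degrees of freedom."""
    return ss / df


mse = mean_square(*residual)
f_regression = mean_square(*regression) / mse
s = mse ** 0.5
r_sq = 1 - residual[1] / total[1]
r_sq_adj = 1 - mse / mean_square(*total)

print(round(f_regression, 2))      # compare with F = 2.26
print(round(s, 5))                 # compare with S = 0.49023
print(round(100 * r_sq, 2))        # compare with R-Sq = 73.84%
print(round(100 * r_sq_adj, 2))    # compare with R-Sq(adj) = 41.14%
```

The gap between R-Sq and R-Sq(adj) reflects how few residual degrees of freedom remain (4) after fitting six terms to ten blends.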
These plots are described in Chapter 19, Factorial Designs. See Displaying Factorial Plots on
page 19-52 for details.
surface (wireframe) plot; see Contour and surface (wireframe) plots on page 21-49
These plots show how a response variable relates to the design variables based on a model
equation.
More
You can use a simplex design plot to visualize the mixture design space (or a slice of the
design space if you have more than three components). MINITAB plots the design points
on triangular axes. See Displaying Simplex Design Plots on page 21-24.
Data
Trace plots, contour plots, and surface plots are model dependent. Thus, you must fit a model
using Analyze Mixture Design before you can display these plots. MINITAB looks in the
worksheet for the necessary model information to generate these plots.
Displaying Mixture Plots
2 From Response, choose a response to plot. If an expected response is not in the list, fit a model
to it with Analyze Mixture Design.
Options
Response Trace Plot dialog box
define the reference blend (the default is the centroid of the experimental region)
specify the line style and line color for the trace curves
specify hold values for process variables (the default is the low setting) and the amount variable
(the default is the average amount)
define minimum and maximum values for the x-axis and y-axis
Component direction
When changing the proportion of a component in a mixture to determine its effect on a
response, you must make offsetting changes in the other mixture components because the sum
of the proportions must always be one. The changes in the component whose effect you are
evaluating along with the offsetting changes in the other components can be thought of as a
direction through the experimental region.
There are two commonly used trace directions along which the estimated responses are
calculated: Cox's direction and Piepel's direction.
When the design is not constrained and the reference point lies at the centroid of the
unconstrained experimental region, both Cox's directions and Piepel's directions are the axes
of the simplex.
When the design is constrained, the default reference mixture point lies at the centroid of the
constrained experimental region, which differs from the centroid of the unconstrained
experimental region. In this case, Cox's direction is defined in the original design space,
whereas Piepel's direction is defined in the L-pseudocomponent space.
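
The idea of offsetting changes along a direction can be sketched numerically. The Python example below uses the definition of Cox's direction commonly given in the mixture literature: when one component changes, the remaining components keep the same ratios to one another that they have in the reference blend. That definition is an assumption here, since this guide does not spell it out, and the function name is illustrative only.

```python
def cox_direction_point(reference, i, new_value):
    """Move component i to new_value along Cox's direction.

    The other components keep the ratios to one another that they have
    in the reference blend, and the proportions still sum to one.
    """
    s_i = reference[i]
    if not 0 <= new_value <= 1:
        raise ValueError("a proportion must lie between 0 and 1")
    if s_i >= 1:
        raise ValueError("the reference blend must contain other components")
    scale = (1 - new_value) / (1 - s_i)
    return [new_value if j == i else s_j * scale
            for j, s_j in enumerate(reference)]


centroid = [1 / 3, 1 / 3, 1 / 3]
for value in (0.0, 0.5, 1.0):
    print(cox_direction_point(centroid, 0, value))
```

With the centroid of an unconstrained three-component design as the reference, the points traced for the first component run from the middle of the opposite edge to the first vertex, which is an axis of the simplex.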
In the Example of a simplex centroid design on page 21-20, you created a design to study how the
proportions of three ingredients (neroli oil, rose oil, and tangerine oil) in an herbal blend
household deodorizer affect the acceptance of the product based on scent. Next, you analyzed
the response (Acceptance) in the Example of analyzing a simplex centroid design on page 21-43.
Now, to help you visualize the component effects, you display a response trace plot.
1 Open the worksheet DEODORIZ2.MTW.
2 Choose Stat > DOE > Mixture > Response Trace Plot.
3 Click Curves.
4 Under Line Styles, choose Use different types. Click OK in each dialog box.
Graph
window
output
Keep the following in mind when you are interpreting a response trace plot:
Components with the greatest effect on the response will have the steepest response traces.
Components with larger ranges (upper bound minus lower bound) will have longer response
traces; components with smaller ranges will have shorter response traces.
The total effect of a component depends on both the range of the component and the
steepness of its response trace. The total effect is defined as the difference in the response
between the effect direction point at which the component is at its upper bound and the effect
direction point at which the component is at its lower bound.
Components with approximately horizontal response traces, with respect to the reference
blend, have virtually no effect on the response.
Components with similar response traces will have similar effects on the response.
to generate a surface (wireframe) plot, check Surface (wireframe) plot and click Setup
(These options are only available for contour plots.)
3 From Response, choose a response to plot. If an expected response is not in the list, fit a model
to it with Analyze Mixture Design.
Options
Setup subdialog box
(This button is labeled Wireframe for the Surface (wireframe) Plot.)
for a single contour plot, include all the levels of the process variables in a single layout
include an amount variable (by default, MINITAB will hold the amount variable at its first
defined value)
for contour plots, specify the number or location of the contour levels, and the contour line
color and style; see Controlling the number, type, and color of the contour lines on page 21-52
for surface (wireframe) plots, specify the color of the wireframe (mesh) and the surface
define minimum and maximum values for the x-axis, y-axis, and z-axis
for contour plots, define the background grid or suppress grid lines
Note
To set the holding level for design variables not in the plot
1 In the Setup subdialog box, click Settings.
For components (only available for design with more than three components):
To use the preset values for components, choose Lower bound setting, Middle setting,
or Upper bound setting under Hold components at. When you use a preset value, all
components not in the plot will be held at their lower bound, middle, or upper bound.
To specify the value at which to hold the components, enter a number in Setting for
each component that you want to control. This option allows you to set a different
holding value for each component.
3 Click OK.
Choose Values and enter from 2 to 15 contour level values in the units of your data. You
must enter the values in increasing order.
4 To define the line style, choose Make all lines solid or Use different types under Line Styles.
5 To define the line color, choose Make all lines black or Use different colors under Line
Colors.
6 Click OK.
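
If you prepare contour level values outside MINITAB before typing them into Values, a small check can catch entries that break the rules stated above (from 2 to 15 values, in increasing order). The Python sketch below is a hypothetical helper, not part of MINITAB, and it assumes that "increasing" means strictly increasing.

```python
def validate_contour_levels(levels):
    """Return the levels as floats if they satisfy the stated rules."""
    values = [float(v) for v in levels]
    if not 2 <= len(values) <= 15:
        raise ValueError("enter from 2 to 15 contour level values")
    # Assumption: repeated values are treated as not increasing.
    if any(later <= earlier for earlier, later in zip(values, values[1:])):
        raise ValueError("contour levels must be in increasing order")
    return values


print(validate_contour_levels([5.5, 6, 6.5, 7]))
```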
Example of a contour plot and a surface plot
In the deodorizer example on page 21-43, you fit a model to try to determine how the
proportions of the components in an herbal blend household deodorizer affect the acceptance of
the product based on scent. The three components are neroli oil, rose oil, and tangerine oil.
Based on the design points, you mixed ten blends. The response measure (Acceptance) is the
mean of five acceptance scores for each of the blends.
Now you generate a contour and a surface plot to help identify the component proportions that
yield the highest acceptance score for the herbal blend.
1 Open the worksheet DEODORIZ2.MTW.
2 Choose Stat > DOE > Mixture > Contour/Surface (Wireframe) Plots.
3 Choose Contour plot and click Setup. Click OK.
4 Choose Surface (wireframe) plot and click Setup. Click OK in each dialog box.
Graph
window
output
References
[1] J.A. Cornell (1990). Experiments With Mixtures: Designs, Models, and the Analysis of
Mixture Data, John Wiley & Sons.
[2] D.C. Montgomery and S.R. Voth (1994). "Multicollinearity and Leverage in Mixture
Experiments," Journal of Quality Technology 26, pp. 96-108.
[3] R.H. Myers and D.C. Montgomery (1995). Response Surface Methodology: Process and
Product Optimization Using Designed Experiments, John Wiley & Sons.
[4] R.D. Snee and D.W. Marquardt (1974). "Extreme Vertices Designs for Linear Mixture
Models," Technometrics 16 (3), pp. 399-408.
[5] R.C. St. John (1984). "Experiments With Mixtures in Conditioning and Ridge Regression,"
Journal of Quality Technology 16, pp. 81-96.
[Figure: triangular coordinate system. The vertices are pure blends, including x2 = 1 at
(0,1,0) and x3 = 1 at (0,0,1).]
Any points along the edges of the triangle represent blends where one of the components is
absent. The illustrations below show the location of different blends.
[Figure: three triangular coordinate plots. The first marks the lines along which x2 and x3
equal 0, 1/3, and 2/3, and the points at which a component equals 1 (for example, x1 = 1 at
(1,0,0)). The second marks the vertices (1,0,0), (0,1,0), and (0,0,1) and the edge trisectors
(2/3,1/3,0), (1/3,2/3,0), (2/3,0,1/3), (1/3,0,2/3), (0,2/3,1/3), and (0,1/3,2/3). The third marks
the vertices, the edge midpoints (1/2,1/2,0), (1/2,0,1/2), and (0,1/2,1/2), and the centroid
(1/3,1/3,1/3).]
Each location on the triangles in the above illustrations represents a different blend of the
mixture. For example,
edge midpoints are two-blend mixtures in which one component makes up 1/2 and a second
component makes up 1/2 of the mixture.
edge trisectors are two-blend mixtures in which one component makes up 1/3 and another
component makes up 2/3 of the mixture. These points divide the triangle edge into 3 equal
parts.
the center point (or centroid) is the complete mixture in which all components are present in
equal proportions (1/3,1/3,1/3). Complete mixtures are on the interior of the design space and
are mixtures in which all of the components are simultaneously present.
all points (x1, x2, ..., xq) where one component, xi = 1, and the rest are 0. These are called
vertex points.
all points where one component, xi = 1/2, another component, xj = 1/2, and the rest are 0.
all points where one component, xi = 1/3, another component, xj = 1/3, another component,
xk = 1/3, and the rest are 0.
this pattern continues until all components are 1/q. This last point (where all components are
equal) is called the center or centroid of the design.
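
The pattern described above can be enumerated directly: for each group size k from 1 to q, every choice of k components contributes one blend in which those components are each 1/k and the rest are 0. The Python sketch below is an illustration of that rule, not MINITAB's generator; the function name is hypothetical. It uses exact fractions so that 1/3 prints as 1/3.

```python
from fractions import Fraction
from itertools import combinations


def simplex_centroid_points(q):
    """All blends in which k of the q components are present in equal parts."""
    points = []
    for k in range(1, q + 1):
        for present in combinations(range(q), k):
            points.append(tuple(Fraction(1, k) if i in present else Fraction(0)
                                for i in range(q)))
    return points


points = simplex_centroid_points(3)
print(len(points))  # one point per nonempty subset of components: 2**3 - 1
for p in points:
    print(tuple(str(x) for x in p))
```

For three components this yields 7 points: three vertices, three edge midpoints, and the centroid. In general the count is 2^q - 1, which grows quickly with the number of components.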
Number of
components (q)
2 to 20
2 to 20
2 to 17
2 to 11
2 to 8
2 to 7
2 to 6
2 to 5
2 to 5
10
2 to 5
22
Optimal Designs
See also,
Optimal Designs Overview
augment (add points to) an existing design; see Augmenting or Improving a Design on page
22-9
MINITAB provides two optimality criteria for the selection of design points:
D-optimality: A design selected using this criterion minimizes the variance in the regression
coefficients of the fitted model. You specify the model, then MINITAB selects design points that
satisfy the D-optimal criterion from a set of candidate design points.
distance-based optimality: A design selected using this criterion spreads the design points
uniformly over the design space. The distance-based method can be used when it is not
possible or desirable to select a model in advance.
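
The D-optimality value that MINITAB reports later in this chapter is the determinant of X'X, where X is the design matrix for the chosen model; a larger determinant corresponds to less variance in the coefficient estimates. The Python sketch below compares two small candidate selections on that basis. It is an illustration only: the model (constant, A, B), the candidate points, and the function names are all hypothetical, and it does not reproduce MINITAB's selection algorithm.

```python
from fractions import Fraction


def xtx(rows):
    """Return X'X for a design matrix given as a list of rows."""
    p = len(rows[0])
    return [[sum(Fraction(r[i]) * Fraction(r[j]) for r in rows)
             for j in range(p)] for i in range(p)]


def determinant(matrix):
    """Determinant by Gaussian elimination with exact fractions."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def model_row(a, b):
    """Row for a hypothetical first-order model: constant, A, B."""
    return [1, a, b]


corners = [model_row(a, b) for a, b in [(-1, -1), (-1, 1), (1, -1), (1, 1)]]
clustered = [model_row(a, b) for a, b in [(-1, -1), (-1, 1), (0, 0), (0, 0)]]

print(determinant(xtx(corners)))
print(determinant(xtx(clustered)))
```

The four corner points give the larger determinant, so under this criterion they would be preferred over the selection that repeats the center point.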
Data
The worksheet must contain a design generated by Create Response Surface Design, Define
Custom Response Surface Design, Create Mixture Design, or Define Custom Mixture Design.
For information on creating these designs, see Chapters 20 and 21.
The design columns in the worksheet comprise the candidate set of design points. For
descriptions of a DOE worksheet, see Storing the design on page 20-10 and page 21-19.
2 Under Criterion, choose D-optimality. See Method on page 22-6 for a discussion.
3 In Number of points in optimal design, enter the number of points to be selected for the
optimal design. You must select at least as many design points as there are terms in the model.
More
The feasible number of design points is dictated by various constraints (for example,
time, budget, or ease of data collection). It is strongly recommended that you select
more than the minimum number so you obtain estimates of pure error and lack-of-fit of
the fitted model.
from Include the following terms, choose the order of the model you want to fit:
move the terms you want to include in the model to Selected Terms using the arrow
buttons
to move one or more terms, highlight the desired terms, then click the arrow button for
the direction you want
to move all of the terms, click the arrow button that moves all terms in the direction
you want
You can also move a term by double-clicking it.
7 Click OK.
Note
MINITAB represents factors and components with the letters A, B, C, ..., skipping the letter
I for factors and the letter T for components. For mixture designs, process variables are
represented by X1, ..., Xn, and the amount variable by the letter T.
More
For more on specifying a response surface model, see Selecting model terms on page
20-27. For more information on specifying a mixture model, see Selecting model terms on
page 21-41.
8 If you like, use one or more of the options listed below, then click OK.
Options
Select Optimal Design dialog box
for mixture designs, you can analyze the design in proportions or pseudocomponents; see
Pseudocomponents on page 21-36.
for mixture designs, you can include inverse component terms, process variable terms, or an
amount term in the model. You cannot include inverse terms if the lower bound for any
component is zero or if you choose to analyze the design in pseudocomponents.
specify whether the initial design is generated using a sequential or random algorithm, or a
combination of both methods; see Method on page 22-6.
choose the search procedure for improving the initial design; see Method on page 22-6.
store a column (named OptPoint) in the original worksheet that indicates how many times a
design point has been selected by the optimal procedure.
store the design points that have been selected by the optimal procedure in a new worksheet.
in addition to the design columns, store the rows of any non-design columns for the design
points that were selected in a new worksheet.
2 Under Criterion, choose Distance-based optimality. See Method on page 22-6 for a
discussion.
3 In Number of points in optimal design, enter the number of points to be included in the
design. The number of points you enter must be less than or equal to the number of distinct
design points in the candidate set.
4 In Specify design columns, delete the design columns that you do not want to include in the
optimal design.
For a response surface design, you can include all the factors or a subset of the factors.
For a mixture design, you must include all components. You can also include all the
process variables or a subset of the process variables, and an amount variable.
By default, MINITAB will include all input variables in the candidate design.
5 Under Task, choose Select optimal design.
6 If you like, use one or more of the options listed below, then click OK.
Options
Select Optimal Design dialog box
for mixture designs, you can analyze the design in proportions or pseudocomponents; see
Pseudocomponents on page 21-36
store a column (named OptPoint) in the original worksheet that indicates how many times a
design point has been selected by the optimal procedure
store the design points that have been selected by the optimal procedure in a new worksheet
in addition to the design columns, store the rows of any non-design columns for the design
points that were selected in a new worksheet
Method
There are two optimality criteria for MINITAB's select optimal design capability: D-optimality and
distance-based optimality.
D-optimality
The D-optimality criterion minimizes the variance of the regression coefficients in the model.
You specify the model, then MINITAB selects design points that satisfy the D-optimal criterion
from a set of candidate points. The selection process consists of two steps:
The design columns in the worksheet comprise the candidate set of design points. The two-step
optimization process is summarized below.
1 MINITAB selects design points from the candidate set to obtain the initial design. You can
choose which algorithm will be used to select these points in the Methods subdialog box.
Choices include: sequential selection, random selection, or a combination of sequential and
random selection.
By default, MINITAB selects all points sequentially.
2 MINITAB then tries to improve the initial design by adding and removing points to obtain the
final design (referred to simply as the optimal design). You can choose the improvement
method in the Methods subdialog box. Choices include:
exchange method. MINITAB will first add the best points from the candidate set, and then
drop the worst points until the D-optimality of the design cannot be improved further. You
can specify the number of points to be exchanged in the Methods subdialog box.
suppress improvement of the initial design. In this case, the final design will be the same as
the initial design.
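The two-step process above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MINITAB's implementation: the candidate set is five hypothetical levels of a single factor, the model is a one-factor quadratic, the initial design is simply the first three candidate rows (a stand-in for sequential selection), and one point is exchanged per step. All function names are made up for the example.

```python
def model_row(x):
    # Hypothetical model: constant, linear, and squared term for one factor.
    return [1.0, x, x * x]

def information_det(points):
    # Determinant of X'X, computed by Gaussian elimination with partial pivoting.
    rows = [model_row(x) for x in points]
    p = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    det = 1.0
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0  # singular: the model cannot be fit with these points
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, p):
            f = m[r][col] / m[col][col]
            for c in range(col, p):
                m[r][c] -= f * m[col][c]
    return det

def exchange_once(design, candidates):
    # Add the candidate that increases det(X'X) the most ...
    added = max((design + [c] for c in candidates), key=information_det)
    # ... then drop the point whose removal hurts det(X'X) the least.
    return max((added[:i] + added[i + 1:] for i in range(len(added))),
               key=information_det)

def improve(design, candidates, tol=1e-9):
    # Repeat the exchange until D-optimality cannot be improved further.
    best = list(design)
    while True:
        trial = exchange_once(best, candidates)
        if information_det(trial) <= information_det(best) + tol:
            return best
        best = trial

candidates = [-1.0, -0.5, 0.0, 0.5, 1.0]   # candidate set (assumed)
initial = candidates[:3]                    # stand-in for the initial design
final = improve(initial, candidates)
print(sorted(final), information_det(final))
```

In this small case the exchange moves the design from the three leftmost levels to the two extremes plus the center, which is the familiar three-point design for a quadratic in one factor.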
Suppose you want to conduct an experiment to maximize crystal growth. You have determined
that four variables (the time the crystals are exposed to a catalyst, the temperature in the
exposure chamber, the pressure within the chamber, and the percentage of the catalyst in the
air inside the chamber) explain much of the variability in the rate of crystal growth.
You generate the default central composite design for four factors and two blocks (the blocks
represent the two days you conduct the experiment). This design, which contains 30 design
points, serves as the candidate set for the D-optimal design.
Available resources restrict the number of design points that you can include in your experiment
to 20. You want to obtain a D-optimal design which reduces the number of design points.
1 Open the worksheet OPTDES.MTW.
2 Choose Stat > DOE > Response Surface > Select Optimal Design.
Session window output

Optimal Design

Response surface design selected according to D-optimality

Model terms: Block A B C D AA BB CC DD AB AC AD BC BD CD

Optimal Design

Row number of selected design points: 24 14 27 25 22 30 26 28 21 1 5 6 19 10 17 16

Condition number:                            1.5138E+04
D-optimality (determinant of X'X):           1.2622E+18
A-optimality (trace of inv(X'X)):            5.9014E+03
G-optimality (avg leverage/max leverage):    0.8000
Maximum leverage:                            1.0000
Average leverage:                            0.8000
These are the full quadratic model terms that were the default in the Terms subdialog box.
Remember, a design that is D-optimal for one model will most likely not be D-optimal for
another model.
C This section summarizes the method by which the initial design was generated and whether or
not an improvement of the initial design was requested. In this example, the initial design was
generated sequentially and the exchange method (using one design point) was used to
improve the initial design.
D The selected design points in the order they were chosen. The numbers shown identify the
row of the design points in the original worksheet.
Note
The design points that are selected depend on the row order of the points in the
candidate set. Therefore, MINITAB may select a different optimal design from the same set
of candidate points if they are in a different order. This can occur because there may be
more than one D-optimal design for a given candidate set of points.
E MINITAB displays some variance-minimizing optimality measures. You can use this
information to compare designs.
Data
The worksheet must contain a design generated by Create Response Surface Design, Define
Custom Response Surface Design, Create Mixture Design, or Define Custom Mixture Design.
For information on creating these designs, see Chapters 20 and 21.
The design columns in the worksheet comprise the candidate set of design points. For
descriptions of a DOE worksheet, see Storing the design on page 20-10 and page 21-19.
In addition to the design columns, you may also have a column that indicates how many times a
design point is to be included in the initial design, and whether a point must be kept in
(protected) or may be omitted from the final design. See below for more information.
Design indicator column
There are two ways that you can define the initial design. You can use all of the rows of the
design columns in the worksheet or you can create an indicator column to specify certain rows to
include in the initial design. In addition, you can use this column to protect design points
during the optimization process. If you protect a point, MINITAB will not drop this design point
from the final design. The indicator column can contain any positive or negative integers.
MINITAB interprets the indicators as follows:
the magnitude of the indicator determines the number of replicates of the corresponding
design point in the initial design
the sign of the indicator determines whether or not the design point will be protected during
the optimization process
a positive sign indicates that the design point may be excluded from the final design
a negative sign indicates that the design point may not be excluded from the final design
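The sign and magnitude rules above can be sketched as a small decoder. This is a minimal illustration, not MINITAB code: the function name is hypothetical, and treating an indicator of 0 as "row not in the initial design" is an assumption, since the text states only how positive and negative integers are interpreted.

```python
def decode_indicators(indicators):
    """Turn an indicator column into (row, replicates, protected) records."""
    plan = []
    for row, value in enumerate(indicators, start=1):
        if value == 0:
            # Assumption: 0 marks a candidate row left out of the initial design.
            continue
        plan.append({
            "row": row,
            "replicates": abs(value),   # magnitude = number of replicates
            "protected": value < 0,     # negative sign = may not be dropped
        })
    return plan

# Row 1: two replicates, may be dropped; row 2: one replicate, protected;
# row 3: not part of the initial design.
print(decode_indicators([2, -1, 0]))
```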
To augment or improve a D-optimal design
1 Choose Stat > DOE > Response Surface or Mixture > Select Optimal Design.
2 Under Criterion, choose D-optimality. See Method on page 22-6 for a discussion.
3 Under Task, choose Augment/improve design. If you have a design point indicator column,
enter this column in the box. See Design indicator column on page 22-9.
4 Do one of the following:
To augment (add points) a design, in Number of points in optimal design, enter the
number of points to be included in the final design. The number of points you enter must
be greater than the number of points in the design you are augmenting.
To improve a design's D-optimality without adding any points, in Number of points
in optimal design, enter 0. In this case, the final design will have the same number of
design points as the initial design.
5 Click Terms.
[Terms subdialog box. The list items vary depending on the type of design; one option is
available only for mixture designs.]
from Include the following terms, choose the order of the model you want to fit:
for response surface designs, choose one of the following:
linear, linear + squares, linear + interactions, or full quadratic
for mixture designs, choose one of the following:
linear, quadratic, special cubic, full cubic, special quartic, or full quartic
move the terms you want to include in the model to Selected Terms using the arrow
buttons
to move one or more terms, highlight the desired terms, then click the arrow button for
the direction you want
to move all of the terms, click the corresponding move-all arrow button
You can also move a term by double-clicking it.
Note
MINITAB represents factors and components with the letters A, B, C, ..., skipping the letter
I for factors and the letter T for components. For mixture designs, process variables are
represented by X1, ..., Xn, and the amount variable by the letter T.
More
For more on specifying a response surface model, see Selecting model terms on page
20-27. For more information on specifying a mixture model, see Selecting model terms on
page 21-41.
8 If you like, use one or more of the options listed below, then click OK.
Options
Select Optimal Design dialog box
for mixture designs, you can analyze the design in proportions or pseudocomponents; see
Pseudocomponents on page 21-36
for mixture designs, you can include inverse component terms, process variable terms, or an
amount term in the model. You cannot include inverse terms if the lower bound for any
component is zero or if you choose to analyze the design in pseudocomponents.
choose the search procedure for improving the initial design; see Method on page 22-14
store a column (named OptPoint) in the original worksheet that indicates how many times a
design point has been selected by the optimal procedure
store the design points that have been selected by the optimal procedure in a new worksheet
in addition to the design columns, store the rows of any non-design columns for the design
points that were selected in a new worksheet
2 Under Criterion, choose Distance-based optimality. See Method on page 22-6 for a
discussion.
3 In Number of points in optimal design, enter the number of points to be in the final design.
The number of points you enter must be greater than the number of points in the initial
design but not greater than the number of distinct points in the candidate set.
4 In Specify design columns, delete the design columns that you do not want to include in the
optimal design.
For a response surface design, you can include all the factors or a subset of the factors.
For a mixture design, you must include all components. You can also include all the
process variables or a subset of the process variables, and an amount variable.
By default, MINITAB will include all design variables in the candidate design.
5 Under Task, choose Augment/improve design. If you have a design point indicator column,
enter this column in the box. See Design indicator column on page 22-9.
6 If you like, use one or more of the options listed below, then click OK.
Options
Select Optimal Design dialog box
for mixture designs, you can analyze the design in proportions or pseudocomponents; see
Pseudocomponents on page 21-36
store a column (named OptPoint) in the original worksheet that indicates how many times a
design point has been selected by the optimal procedure
store the design points that have been selected by the optimal procedure in a new worksheet
in addition to the design columns, store the rows of any non-design columns for the design
points that were selected in a new worksheet
Method
There are two optimality criteria for MINITAB's augment/improve optimal design capability:
D-optimality and distance-based optimality.
D-optimality
The D-optimality criterion minimizes the variance of the regression coefficients in the model.
You specify the model, then MINITAB selects design points that satisfy the D-optimal criterion
from a set of candidate points. The selection process consists of two steps:
The design columns in the worksheet make up the candidate set of design points. The two-step
optimization process is summarized below.
1 The initial design can be obtained in one of two ways:
You can use all the design points in the worksheet for the initial design.
You can use an indicator column to specify which design points and how many replicates of
each point comprise the initial design. For information on the structure of this indicator
column, see Design indicator column on page 22-9.
If you are augmenting the design, MINITAB adds the best points in the candidate set
sequentially.
2 MINITAB then tries to improve the initial design by adding and removing points to obtain the
final design (referred to simply as the optimal design). You can choose the improvement
method in the Methods subdialog box. Choices include:
exchange method. MINITAB will first add the best points from the candidate set, and then
drop the worst points until the D-optimality of the design cannot be improved further. You
can specify the number of points to be exchanged in the Methods subdialog box.
suppress improvement of the initial design. In this case, the final design will be the same as
the initial design.
create an indicator column (OptPoint) in the original worksheet that shows whether
or not a point was selected and the number of replicates of that design point
There is only one trial possible if you generate the initial design by purely sequential
selection or if you specify the initial design with an indicator column.
Distance-based optimality
If you do not want to select a model in advance, a good strategy is to spread the design points
uniformly over the design space. In this case, the distance-based method provides one solution
for selecting the design points.
The distance-based optimality algorithm selects design points from a candidate set, such that the
points are spread evenly over the design space. You may choose to begin the optimization from all
the design points in the candidate set or just points that you specify with an indicator column. If
you begin with the entire candidate set, MINITAB selects the candidate point with the largest
Euclidean distance from the origin (response surface design) or the point that is closest to a pure
component (mixture design) as the starting point. Then, MINITAB adds additional design points
in a stepwise manner such that each new point is as far as possible from the points already
selected for the design.
Distance-based designs are selected without replacement and contain no replicates.
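The stepwise selection described above can be sketched as follows for a response surface design. This is an illustration only: the function name is hypothetical, and "as far as possible from the points already selected" is interpreted here as maximizing the smallest Euclidean distance to the selected points, which is an assumption about the exact rule.

```python
import math

def distance_based_select(candidates, n_points):
    """Greedy spread-out selection from a candidate set (no replicates)."""
    remaining = list(candidates)
    # Starting point: the candidate farthest from the origin.
    start = max(remaining, key=lambda p: math.hypot(*p))
    selected = [start]
    remaining.remove(start)
    while len(selected) < n_points and remaining:
        # Next point: the candidate whose nearest selected point is farthest away.
        nxt = max(remaining,
                  key=lambda p: min(math.dist(p, s) for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)   # selection without replacement
    return selected

# Hypothetical candidate set: a 3 x 3 grid in coded units.
grid = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)]
chosen = distance_based_select(grid, 5)
print(chosen)
```

For this grid the four corners are chosen first and the center point fifth, so the selected points cover the design space evenly.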
Example of augmenting a D-optimal design
In the Example of selecting a D-optimal response surface design on page 22-7, you selected a
subset of 20 design points from a candidate set of 30 points. After you collected the data for the 20
selected design points, you found out that you could run five additional design points. Because
you already collected the data for the original design, you need to protect these points in the
augmented design so they cannot be excluded during the augmentation/optimization procedure.
To protect these points, you need to have negative indicators for the design points that were
already selected for the first optimal design.
1 Open the worksheet OPTDES2.MTW. (The design and indicator column have been saved
for you.)
2 Choose Stat > DOE > Response Surface > Select Optimal Design.
3 Choose Augment/improve design and type OptPoint in the box.
4 In Number of points in optimal design, type 25.
5 Click Terms. Click OK in each dialog box.
Session window output

Optimal Design

Response surface design augmented according to D-optimality

Model terms: Block A B C D AA BB CC DD AB AC AD BC BD CD

Optimal Design

Row number of selected design points: 1 3 4 5 6 8 9 10 25 26 27 28 30 15 2 7 14 11 16 13 17 19 21 22 24

Condition number:                            1.7779E+04
D-optimality (determinant of X'X):           1.7219E+20
A-optimality (trace of inv(X'X)):            5.7881E+03
G-optimality (avg leverage/max leverage):    0.6400
Maximum leverage:                            1.0000
Average leverage:                            0.6400
These full quadratic model terms are the default in the Terms subdialog box. Remember, a
design that is D-optimal for one model will most likely not be D-optimal for another model.
C This section summarizes the method by which the initial design was augmented and whether
or not an improvement of the initial design was requested. In this example, five design points
were added sequentially and the exchange method (using one design point) was used to
improve the initial design.
D The selected design points in the order they were chosen. The numbers shown identify the
row of the design points in the worksheet.
Note
The design points that are selected depend on the row order of the points in the
candidate set. Therefore, MINITAB may select a different optimal design from the same
candidate points if they are in a different order. This can occur because there may be
more than one D-optimal design for a given candidate set of points.
E MINITAB displays some variance-minimizing optimality measures. You can use this
information to compare designs. For example, if you compare the optimality of the original
20-point design shown on page 22-8 with this 25-point design, you will notice that the
D-optimality increased from 1.2622E+18 to 1.7219E+20.
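To make the measures in the output concrete, the sketch below computes them for a small design matrix X, following the labels used in the output: D-optimality as the determinant of X'X, A-optimality as the trace of inv(X'X), and G-optimality as average leverage divided by maximum leverage. The function names and the tiny 2 x 2 factorial example are assumptions for illustration; the condition number is omitted. The average leverage always equals the number of model coefficients divided by the number of runs, which is consistent with the values 0.8000 and 0.6400 shown above if the model has 16 coefficients (constant, block, and 14 terms) for 20 and 25 runs.

```python
def cross_product(X):
    # X'X for a design matrix given as a list of rows.
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)]
            for i in range(p)]

def invert_with_det(a):
    # Gauss-Jordan elimination with partial pivoting; returns (inverse, det).
    n = len(a)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular for this model and design")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det
        piv = aug[col][col]
        det *= piv
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], det

def design_measures(X):
    inv, det = invert_with_det(cross_product(X))
    p = len(inv)
    # Leverage of a run x is x' inv(X'X) x.
    lev = [sum(x[i] * inv[i][j] * x[j] for i in range(p) for j in range(p))
           for x in X]
    avg = sum(lev) / len(lev)
    return {"D": det,
            "A": sum(inv[i][i] for i in range(p)),
            "max_leverage": max(lev),
            "avg_leverage": avg,
            "G": avg / max(lev)}

# Hypothetical example: 2 x 2 factorial, model with constant, A, and B.
X = [[1, -1, -1], [1, 1, -1], [1, -1, 1], [1, 1, 1]]
measures = design_measures(X)
print(measures)
```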
Evaluating a Design
If you have a response surface or a mixture design in your worksheet, you can evaluate this design.
MINITAB will display a number of optimality statistics. You can use this information to compare
designs or to evaluate changes in the optimality of a design if you change the model.
For example, recall that a design is D-optimal for a specific model only. Suppose you
generated a D-optimal design for a certain model, but then decided to fit a model with different
terms. You can determine the change in optimality using the Evaluate design task.
Data
The worksheet must contain a design generated by Create Response Surface Design, Define
Custom Response Surface Design, Create Mixture Design, or Define Custom Mixture Design.
For information on creating these designs, see Chapters 20 and 21.
In addition to the design columns, you may also have a column that indicates how many times a
design point is to be included in the evaluation. This column must contain only positive integers.
See below for more information.
Design indicator column
There are two ways that you can define the design you want to evaluate. You can use all of the
rows of the design columns in the worksheet or you can create an indicator column to specify
certain rows to include in the design. The magnitude of the indicator determines the number of
replicates of the corresponding design point.
To evaluate a design
1 Choose Stat > DOE > Response Surface or Mixture > Select Optimal Design.
2 Under Task, choose Evaluate design. If you have an indicator column that defines the
design, enter the column in the box. See Design indicator column above.
3 Click Terms.
[Terms subdialog box. The list items vary depending on the type of design; one option is
available only for mixture designs.]
from Include the component terms up through order, choose the order of the model you
want to fit:
for response surface designs, choose one of the following:
linear, linear + squares, linear + interactions, or full quadratic
for mixture designs, choose one of the following:
linear, quadratic, special cubic, full cubic, special quartic, or full quartic
move the terms you want to include in the model to Selected Terms using the arrow
buttons
to move one or more terms, highlight the desired terms, then click the arrow button for
the direction you want
to move all of the terms, click the corresponding move-all arrow button
You can also move a term by double-clicking it.
5 Click OK.
Note
MINITAB represents factors and components with the letters A, B, C, ..., skipping the letter
I for factors and the letter T for components. For mixture designs, process variables are
represented by X1, ..., Xn, and the amount variable by the letter T.
More
For more on specifying a response surface model, see Selecting model terms on page
20-27. For more information on specifying a mixture model, see Selecting model terms on
page 21-41.
6 If you like, use one or more of the options listed below, then click OK.
Options
Select Optimal Design dialog box
for mixture designs, you can analyze the design in proportions or pseudocomponents; see
Pseudocomponents on page 21-36
for mixture designs, you can include inverse component terms, process variable terms, or an
amount term in the model. You cannot include inverse terms if the lower bound for any
component is zero or if you choose to analyze the design in pseudocomponents.
in addition to the design columns, store the rows of any non-design columns for the selected
design points in a new worksheet
Suppose you want to determine how reducing the model changes the optimality for the 20-point
experimental design obtained in the Example of selecting a D-optimal response surface design on
page 22-7. Remember that a design is D-optimal for a given model only.
1 Open the worksheet OPTDES3.MTW. (The design and indicator column have been saved
for you.)
2 Choose Stat > DOE > Response Surface > Select Optimal Design.
3 Choose Evaluate design and type OptPoint in the box.
4 Click Terms.
5 From Include the following terms, choose Linear.
6 Click OK in each dialog box.
Session window output

Optimal Design

Evaluation of Specified Response Surface Design

Number of design points in optimal design: 20

Model terms: Block A B C D

Specified Design

14 16 17 19 21 22 24

Condition number:                            1.4311E+00
D-optimality (determinant of X'X):           4.1544E+07
A-optimality (trace of inv(X'X)):            1.1894E+01
G-optimality (avg leverage/max leverage):    0.8715
V-optimality (average leverage):             0.3000
Maximum leverage:                            0.3442
These are the linear model terms that you chose in the Terms subdialog box. Remember, a
design that is D-optimal for one model will most likely not be D-optimal for another model.
C The selected design points. The numbers shown identify the row of the design points in the
worksheet.
D In addition to the design's D-optimality, MINITAB displays various optimality measures. You
can use this information to evaluate or compare designs. If you compare the optimality of the
20-point design for a full quadratic model shown on page 22-8 with this 20-point design for a
linear model, you will notice that the D-optimality decreased from 1.2622E+18 to
4.1544E+07.
References
[1] A.C. Atkinson, A.N. Donev (1992). Optimum Experimental Designs, Oxford Press.
[2] G.E.P. Box and N.R. Draper (1987). Empirical Model-Building and Response Surfaces, John
Wiley & Sons. p.249.
[3] A.I. Khuri and J.A. Cornell (1987). Response Surfaces: Designs and Analyses, Marcel Dekker,
Inc.
[4] R.H. Myers and D.C. Montgomery (1995). Response Surface Methodology: Process and
Product Optimization Using Designed Experiments, John Wiley & Sons.
23 Response Optimization
Response Optimizer provides you with an optimal solution for the input variable
combinations and an optimization plot. The optimization plot is interactive; you can adjust
input variable settings on the plot to search for more desirable solutions.
Overlaid Contour Plot shows how each response considered relates to two continuous design
variables (factorial and response surface designs) or three continuous design variables (mixture
designs), while holding the other variables in the model at specified levels. The contour plot
allows you to visualize an area of compromise among the various responses.
Response Optimization
You can use MINITAB's Response Optimizer to help identify the combination of input variable
settings that jointly optimize a single response or a set of responses. Joint optimization must satisfy
the requirements for all the responses in the set. The overall desirability (D) is a measure of how
well you have satisfied the combined goals for all the responses. Overall desirability has a range of
zero to one. One represents the ideal case; zero indicates that one or more responses are outside
their acceptable limits.
MINITAB calculates an optimal solution and draws a plot. The optimal solution serves as the
starting point for the plot. This optimization plot allows you to interactively change the input
variable settings to perform sensitivity analyses and possibly improve the initial solution.
Note
Although numerical optimization along with graphical analysis can provide useful
information, it is not a substitute for subject matter expertise. Be sure to use relevant
background information, theoretical principles, and knowledge gained through
observation or previous experimentation when applying these methods.
Data
Before you use MINITAB's Response Optimizer, you must
1 Create and store a design using one of MINITAB's Create Design commands or create a design
from data that you already have in the worksheet with Define Custom Design.
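Command                                    on page
Create Factorial Design                    19-6, 19-23
Create Response Surface Design             20-4
Create Mixture Design                      21-5
Define Custom Factorial Design             19-34
Define Custom Response Surface Design      20-18
Define Custom Mixture Design               21-28

Command                                    on page
Analyze Factorial Design                   19-43
Analyze Response Surface Design            20-25
Analyze Mixture Design                     21-38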
You can fit a model with different design variables for each response. If an input variable was not
included in the model for a particular response, the optimization plot for that response-input
variable combination will be blank.
MINITAB automatically omits missing data from the calculations. If you optimize more than one
response and there are missing data, MINITAB excludes the row with missing data from
calculations for all of the responses.
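The missing-data rule above amounts to row-wise deletion across all responses being optimized. A minimal sketch, with a hypothetical function name and `None` standing in for a missing value:

```python
def complete_rows(responses):
    """Keep only the rows where every response has a value.

    responses: dict mapping a response name to a list of values,
    with None marking a missing observation (an assumption of this sketch).
    """
    n = len(next(iter(responses.values())))
    keep = [i for i in range(n)
            if all(col[i] is not None for col in responses.values())]
    filtered = {name: [col[i] for i in keep] for name, col in responses.items()}
    return filtered, keep

data = {"Y1": [1.0, None, 3.0], "Y2": [4.0, 5.0, 6.0]}
filtered, kept = complete_rows(data)
print(kept, filtered)
```

Here the second row is dropped for both responses even though only Y1 is missing there.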
To optimize responses
1 Choose Stat > DOE > Factorial, Response Surface, or Mixture > Response Optimizer.
[Response Optimizer dialog box. One option is available only for mixture designs.]
2 Move up to 25 responses that you want to optimize from Available to Selected using the arrow
buttons. (If an expected response column does not show in Available, fit a model to it using
Analyze Factorial Design, Analyze Response Surface Design, or Analyze Mixture Design.)
Under Goal, choose Minimize, Target, or Maximize from the drop-down list.
Under Lower, Target, and Upper, enter numeric values for the target and necessary
bounds as follows:
1 If you choose Minimize under Goal, enter values in Target and Upper.
2 If you choose Target under Goal, enter values in Lower, Target, and Upper.
3 If you choose Maximize under Goal, enter values in Target and Lower.
In Weight, enter a number from 0.1 to 10 to define the shape of the desirability function.
See Setting the weight for the desirability function on page 23-8.
In Importance, enter a number from 0.1 to 10 to specify the relative importance of the
response. See Specifying the importance for composite desirability on page 23-10.
4 Click OK.
5 If you like, use any of the options listed below, then click OK.
Options
Response Optimizer dialog box
define a starting point for the search algorithm by providing a value for each input variable in
your model. Each value must be between the minimum and maximum levels for that input
variable.
for factorial designs, define settings at which to hold any covariates that are in the model.
Method
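MINITAB's Response Optimizer searches for a combination of input variable levels that jointly
optimize a set of responses by satisfying the requirements for each response in the set. The
optimization is accomplished by obtaining an individual desirability for each response,
combining the individual desirabilities into a composite desirability, and then maximizing the
composite desirability.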
Note
If you have only one response, the overall desirability is equal to the individual desirability.
Suppose you have a response that you want to minimize. You need to determine a target value
and an allowable maximum response value. The desirability for this response below the target
value is one; above the maximum acceptable value the desirability is zero. The closer the
response is to the target, the closer the desirability is to one. The illustration below shows the
default desirability function (also called utility transfer function) used to determine the individual
desirability (d) for a smaller is better goal:
[Figure: desirability function for a smaller-is-better goal, with d ranging from 0 to 1.
Any response value smaller than the target has a desirability of one; any response value
greater than the upper bound has a desirability of zero. Between the target and the upper
bound, the desirability increases as the response decreases.]
The shape of the desirability function between the upper bound and the target is determined by
the choice of weight. The illustration above shows a function with a weight of one. To see how
changing the weight affects the shape of the desirability function, see Setting the weight for the
desirability function on page 23-8.
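The smaller-is-better desirability described above can be sketched as a short function. The behavior at and beyond the target and upper bound follows the text; the power form used between them is an assumption (a common way to write such functions), chosen because it is a straight line at a weight of one. The function name is hypothetical.

```python
def desirability_minimize(y, target, upper, weight=1.0):
    """Individual desirability d for a smaller-is-better goal."""
    if y <= target:
        return 1.0          # at or below the target: fully desirable
    if y >= upper:
        return 0.0          # at or above the upper bound: unacceptable
    # Between target and upper bound: assumed power form, linear when weight = 1.
    return ((upper - y) / (upper - target)) ** weight

print(desirability_minimize(15, target=10, upper=20))            # halfway, weight 1
print(desirability_minimize(15, target=10, upper=20, weight=2))  # same point, weight 2
```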
Obtaining the composite desirability
After MINITAB calculates an individual desirability for each response, they are combined to
provide a measure of the composite, or overall, desirability of the multi-response system. This
measure of composite desirability (D) is the weighted geometric mean of the individual
desirabilities for the responses. The individual desirabilities are weighted according to the
importance that you assign each response. For a discussion, see Specifying the importance for
composite desirability on page 23-10.
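A weighted geometric mean of the individual desirabilities can be sketched as below. The function name is hypothetical, and using the importance values directly as exponents that are normalized by their sum is the usual definition of a weighted geometric mean, assumed here rather than taken from MINITAB's documentation.

```python
import math

def composite_desirability(desirabilities, importances):
    """Weighted geometric mean D of individual desirabilities d_i."""
    if any(d == 0 for d in desirabilities):
        return 0.0   # one unacceptable response makes the whole solution unacceptable
    total = sum(importances)
    log_mean = sum(w * math.log(d)
                   for d, w in zip(desirabilities, importances)) / total
    return math.exp(log_mean)

print(composite_desirability([1.0, 0.25], [1, 1]))   # equal importance
print(composite_desirability([0.7], [3]))            # single response: D equals d
```

With a single response the composite desirability reduces to the individual desirability, as the note earlier in this section states.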
Maximizing the composite desirability
Finally, MINITAB employs a reduced gradient algorithm with multiple starting points that
maximizes the composite desirability to determine the numerical optimal solution (optimal
input variable settings).
More
You may want to fine tune the solution by adjusting the input variable settings using the
interactive optimization plot. See Using the optimization plot on page 23-10.
Specifying bounds
In order to calculate the numerical optimal solution, you need to specify a target and lower and/
or upper bounds for each response. The boundaries needed depend on your goal:
If your goal is to minimize (smaller is better) the response, you need to determine a target
value and the upper bound. You may want to set the target value at the point of diminishing
returns, that is, although you want to minimize the response, going below a certain value
makes little or no difference. If there is no point of diminishing returns, use a very small
number, one that is probably not achievable, for the target value.
If your goal is to target the response, you probably have upper and lower specification limits
for the response that can be used as lower and upper bounds.
If your goal is to maximize (larger is better) the response, you need to determine a target value
and the lower bound. Again, you may want to set the target value at the point of diminishing
returns, although now you need a value on the upper end instead of the lower end of the
range.
The illustrations below show how the shape of the desirability function changes with the
weight when the goal is to maximize the response:
[Table of illustrations: desirability function, from d = 0 to d = 1 at the target, for
weights of 0.1, 1, and 10.]
[Illustration: desirability curves for weight = 0.1, weight = 1, and weight = 10 under each goal.]
Minimize the response (target and upper bound): As response decreases, the desirability increases.
Target the response (lower bound, target, and upper bound): Below the lower bound, the response
desirability is 0; at the target, it is 1; above the upper bound, it is 0.
Maximize the response (lower bound and target): As response increases, the desirability increases.
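The behavior described above can be sketched in code. This is a minimal sketch, assuming the power-law desirability form of Derringer and Suich (reference [2]); the function name and argument names are hypothetical, and the exact formula MINITAB uses is not stated in this section.

```python
def desirability(y, goal, lower=None, target=None, upper=None, weight=1.0):
    """Individual desirability of a predicted response y, between 0 and 1.

    Assumed form: linear distance from the bound to the target,
    raised to the power `weight`.
    """
    if goal == "minimize":
        if y <= target:
            return 1.0
        if y >= upper:
            return 0.0
        return ((upper - y) / (upper - target)) ** weight
    if goal == "maximize":
        if y >= target:
            return 1.0
        if y <= lower:
            return 0.0
        return ((y - lower) / (target - lower)) ** weight
    if goal == "target":
        if y <= lower or y >= upper:
            return 0.0
        if y <= target:
            return ((y - lower) / (target - lower)) ** weight
        return ((upper - y) / (upper - target)) ** weight
    raise ValueError(f"unknown goal: {goal!r}")


# A response halfway between a lower bound of 35 and a target of 45,
# evaluated for the three weights shown in the illustrations.
for w in (0.1, 1, 10):
    print(w, desirability(40, "maximize", lower=35, target=45, weight=w))
```

Under this assumed form, a small weight gives high desirability to any response inside the bounds, while a large weight gives high desirability only to responses close to the target.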
to search for lower-cost input variable settings with near optimal properties
When you change an input variable to a new level, the graphs are redrawn and the predicted
responses and desirabilities are recalculated. If you discover a setting combination that has a
composite desirability higher than the initial optimal setting, MINITAB replaces the initial optimal
setting with the new optimal setting. You will then have the option of adding the previous optimal
setting to the saved settings list.
clicking on the red input variable settings located at the top and entering a new value in
the dialog box that appears
Note
You can return to the initial or optimal settings at any time by clicking the corresponding Toolbar
button or by right-clicking and choosing Reset to Optimal Settings.
Note
For factorial designs with center points in the model: If you move one factor to the center
on the optimization plot, then all factors will move to the center. If you move one factor
away from the center, then all factors will move with it, away from the center.
Note
For a mixture design, you cannot change a component setting independently of the
other component settings. If you want one or more components to stay at their current
settings, you need to lock them. See To lock components (mixture designs only) on page
23-12.
Note
The saved settings are stored in a sequential list. You can cycle forwards and backwards
through the setting list by clicking the corresponding buttons on the Toolbar or by right-clicking and
choosing the appropriate command from the menu.
component at a value that would prevent any other component from changing. In addition,
you must leave at least two components unlocked.
To view a list of all saved settings
1 View a list of all saved settings by clicking the corresponding button on the Toolbar.
More
You can copy the saved setting list to the Clipboard by right-clicking and choosing Select
All and then choosing Copy.
You are an engineer assigned to optimize the responses from a chemical reaction experiment.
You have determined that three factors (reaction time, reaction temperature, and type of
catalyst) affect the yield and cost of the process. You want to find the factor settings that
maximize the yield and minimize the cost of the process.
1 Open the worksheet FACTOPT.MTW. (We have saved the design, response data, and model
4 Click Setup. Complete the Goal, Lower, Target, and Upper columns of the table as shown
below:
Response   Goal       Lower   Target   Upper
Yield      Maximize   35      45
Cost       Minimize           28       35
Session window output

Response Optimization

Parameters
         Goal      Lower   Target   Upper   Weight   Import
Yield    Maximum   35      45       45      1        1
Cost     Minimum   28      28       35      1        1

Global Solution
Time     =  46.062
Temp     = 150.000
Catalyst =  -1.000 (A)

Predicted Responses
Yield
Cost

Composite Desirability = 0.92445

Graph window output
[Optimization plot]
Interpreting results
The individual desirability for Yield is 0.98081; the individual desirability for Cost is 0.87132.
The composite desirability for these two responses is 0.92445.
To obtain this desirability, you would set the factor levels at the values shown under Global
Solution in the Session window. That is, time would be set at 46.062, temperature at 150, and
you would use catalyst A.
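The relationship between the individual and composite desirabilities reported above can be checked with a short calculation. This is a sketch under an assumption: that the composite desirability is the geometric mean of the individual desirabilities, weighted by the Import (importance) values, as in Derringer and Suich (reference [2]). The function name is hypothetical.

```python
import math


def composite_desirability(desirabilities, importances=None):
    """Importance-weighted geometric mean of individual desirabilities."""
    if importances is None:
        importances = [1.0] * len(desirabilities)
    # Any response with zero desirability makes the composite zero.
    if any(d == 0 for d in desirabilities):
        return 0.0
    total = sum(importances)
    log_sum = sum(w * math.log(d) for d, w in zip(desirabilities, importances))
    return math.exp(log_sum / total)


# Individual desirabilities for Yield and Cost, both with Import = 1.
d = composite_desirability([0.98081, 0.87132], importances=[1, 1])
print(d)
```

With the two values from this example, the result agrees with the reported 0.92445 to within the rounding of the inputs, which supports (but does not prove) the assumed formula.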
If you want to try to improve this initial solution, you can use the plot. Move the red vertical bars
to change the factor settings and see how the individual desirability of the responses and the
composite desirability change.
Example of a response optimization experiment for a response surface design
You need to create a product that satisfies the criteria for both seal strength and variability in seal
strength. Parts are placed inside a bag, which is then sealed with a heat-sealing machine. The seal
must be strong enough so that product will not be lost in transit, yet not so strong that the
consumer cannot open the bag. The lower and upper specifications for the seal strength are 24
and 28 lbs, with a target of 26 lbs. For the variability in seal strength, the goal is to minimize and
the maximum acceptable value is 1.
Previous experimentation has indicated that the following are important factors for controlling
the strength of the seal: hot bar temperature (HotBarT), dwell time (DwelTime), hot bar pressure
(HotBarP), and material temperature (MatTemp). Hot bar temperature (HotBarT) and dwell
time (DwelTime) are important for reducing the variation in seal strength.
Your goal is to optimize both responses: strength of the seal (Strength) and variability in the
strength of the seal (VarStrength).
1 Open the worksheet RSOPT.MTW. (The design, response data, and model information have
4 Click Setup. Complete the Goal, Lower, Target, and Upper columns of the table as shown
below:
Response      Goal       Lower   Target   Upper
Strength      Target     24      26       28
VarStrength   Minimize           0        1
Session window output

Response Optimization

Parameters
              Goal      Lower   Target   Upper   Weight   Import
Strength      Target    24      26       28      1        1
VarStrength   Minimum   0       0        1       1        1

Global Solution
HotBarT  = 125.000
DwelTime =   1.197
HotBarP  = 163.842
MatTemp  = 104.552

Predicted Responses
Strength    = 26.0000, desirability = 1.00000
VarStrength =  0.0000, desirability = 1.00000

Composite Desirability = 1.00000

Graph window output
[Optimization plot]
The compound normally used to make a plastic pipe is made of two materials: Material A and
Material B. As a research engineer, you would like to determine whether or not a filler can be
added to the existing formulation and still satisfy certain physical property requirements. You
would like to include as much filler in the formulation as possible and still satisfy the response
specifications. The pipe must meet the following specifications:
Using an augmented simplex centroid design, you collected data and are now going to optimize
on three responses: impact strength (Impact), deflection temperature (Temp), and yield strength
(Strength).
1 Open the worksheet MIXOPT.MTW. (The design, response data, and model information
below:
Response   Goal       Lower   Target   Upper
Impact     Maximize   1       3
Temp       Maximize   190     200
Strength   Maximize   5000    5200
Session window output

Response Optimization

Parameters
           Goal      Lower   Target   Upper   Weight   Import
Impact     Maximum   1       3        3       1        1
Temp       Maximum   190     200      200     1        1
Strength   Maximum   5000    5200     5200    1        1

Global Solution
Components
Mat-A  = 0.575
Mat-B  = 0.425
Filler = 0.000

Predicted Responses
Impact   =    7.26, desirability = 1
Temp     =  203.93, desirability = 1
Strength = 5255.47, desirability = 1

Composite Desirability = 1.00000

Graph window output
[Optimization plot]
Data
Before you use Overlaid Contour Plot, you must
1 Create and store a design using one of MINITAB's Create Design commands or create a design
from data that you already have in the worksheet with Define Custom Design.
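[Table: commands and page references; command names not recovered. Pages listed: 19-6, 19-23, 20-4, 21-5, 19-34, 20-18, 21-28]
[Table: commands and page references; command names not recovered. Pages listed: 19-43, 20-25, 21-38]
Note
Overlaid Contour Plot is not available for general full factorial designs.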
2 Under Responses, move up to ten responses that you want to include in the plot from
or
or
Note
For factorial and response surface designs, under Factors, choose a factor from X Axis and a
factor from Y Axis.
Only numeric process variables are valid candidates for X and Y axes.
2 To plot process variables, under Select components or process variables as axes, choose
2 process variables.
3 Click Contours.
4 For each response, enter a number in Low and High. See Defining contours on page 23-22.
Click OK.
5 If you like, use any of the options listed below, then click OK.
Options
Overlaid Contour Plot dialog box
for factorial and response surface designs, display the plot in coded or uncoded units
specify values for factors, components, or process variables that are not used as axes in the
contour plot, instead of using the default of median (middle) values; see Settings for extra
factors, covariates, components, and process variables on page 23-22
for factorial designs, specify values for covariates in the design, instead of using the default of
mean (middle) values; see Settings for extra factors, covariates, components, and process
variables on page 23-22
for mixture designs that include an amount variable, specify the hold value, instead of using
the mean as the default
for factorial and response surface designs, define minimum and maximum values for the
x-axis and y-axis
for mixture designs, define minimum values for the x-axis, y-axis, and z-axis
Defining contours
For each response, you need to define a low and a high contour. These contours should be
chosen depending on your goal for the responses. Here are some examples:
If your goal is to minimize (smaller is better) the response, you may want to set the Low value
at the point of diminishing returns, that is, although you want to minimize the response, going
below a certain value makes little or no difference. If there is no point of diminishing returns,
use a very small number, one that is probably not achievable. Use your maximum acceptable
value in High.
If your goal is to target the response, you probably have upper and lower specification limits for
the response that can be used as the values for Low and High. If you do not have specification
limits, you may want to use lower and upper points of diminishing returns.
If your goal is to maximize (larger is better) the response, again, you may want to set the High
value at the point of diminishing returns, although now you need a value on the upper end
instead of the lower end of the range. Use your minimum acceptable value in Low.
In all of these cases, the goal is to have the response fall between these two values.
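The rule that every response must fall between its Low and High contours can be expressed as a simple check. This is an illustrative sketch only: the function name and the predicted values are hypothetical, and the Low and High values are the ones used for Yield and Cost in the factorial example in this chapter.

```python
def within_contours(predicted, contours):
    """Return True if every predicted response lies between its Low and High values."""
    return all(
        low <= predicted[name] <= high
        for name, (low, high) in contours.items()
    )


# Low and High contours for each response.
contours = {"Yield": (35, 45), "Cost": (28, 35)}

# Hypothetical predicted responses at two candidate factor settings.
print(within_contours({"Yield": 44.0, "Cost": 29.0}, contours))
print(within_contours({"Yield": 44.0, "Cost": 36.5}, contours))
```

A setting that passes this check for all responses corresponds to a point in the region of the overlaid contour plot where all criteria are satisfied.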
Factorial Design
Mixture Design
2 Do one of the following to set the holding value for extra factors, components, process
variables, or covariates:
For components:
To use the preset values for components, choose Lower bound setting, Middle setting,
or Upper bound setting under Hold components at. When you use a preset value, all
components not in the plot will be held at their lower bound, middle, or upper bound.
To specify the value at which to hold the components, enter a number in Setting for
each component that you want to control. This option allows you to set a different holding
value for each component.
3 Click OK.
Example of an overlaid contour plot for factorial design
This contour plot is a continuation of the factorial response optimization example on page 23-12.
A chemical engineer conducted a 2^3 full factorial design to examine the effects of reaction time,
reaction temperature, and type of catalyst on the yield and cost of the process. The goal is to
maximize yield and minimize cost. In this example, you will create contour plots using time and
temperature as the two axes in the plot and holding type of catalyst at levels A and B respectively.
Step 1: Display the overlaid contour plot for Catalyst A
1 Open the worksheet FACTOPT.MTW. (The design information and response data have been
4 Click Contours. Complete the Low and High columns of the table as shown below, then
        Low   High
Yield   35    45
Cost    28    35
Graph
window
output
Interpreting results
Above are two overlaid contour plots. The two factors, temperature and time, are used as the two
axes in the plots and the third factor, catalyst, has been held at levels A and B respectively.
The white area inside each plot shows the range of time and temperature where the criteria for
both response variables are satisfied. Use this plot in combination with the optimization plot
shown on page 23-12 to find the best operating conditions for maximizing yield and minimizing
cost.
This contour plot is a continuation of the analysis for the heat-sealing process experiment
introduced on page 23-14. Parts are placed inside a sealable bag, which is then sealed with a
heat-sealing machine. The seal must be strong enough so that product will not be lost in transit,
yet not so strong that the consumer cannot open the bag. The lower and upper specifications for
the seal strength are 24 and 28 lbs, with a target of 26 lbs.
Previous experimentation has indicated that the important factors for controlling the strength of
the seal are: hot bar temperature (HotBarT), dwell time (DwelTime), hot bar pressure (HotBarP),
and material temperature (MatTemp). Hot bar temperature (HotBarT) and dwell time
(DwelTime) are important for reducing the variation in seal strength.
Your goal is to optimize both responses: strength of the seal (Strength) and variability in the
strength of the seal (VarStrength). With an overlaid contour plot, you can only look at two factors
at a time. You will use the optimal solution values shown on page 23-14 as the holding values for
factors that are not in the plot (HotBarP and MatTemp).
1 Open the worksheet RSOPT.MTW.
2 Choose Stat > DOE > Response Surface > Overlaid Contour Plots.
3 Click
4 Click Contours. Complete the Low and High columns of the table as shown below, then
click OK.
Name          Low   High
Strength      24    28
VarStrength   [values not recovered]
5 Click Settings. In Setting, enter 163.842 for HotBarP and 104.552 for MatTemp.
6 Click OK in each dialog box.
Graph
window
output
This overlaid contour plot is a continuation of the analysis for the plastic pipe experiment
introduced on page 23-16. The compound normally used to make a plastic pipe is made of two
materials: Mat-A and Mat-B. As a research engineer, you would like to determine whether or not
a filler can be added to the existing formulation and still satisfy certain physical property
requirements. You would like to include as much filler in the formulation as possible and still
satisfy the response specifications. The pipe must meet the following specifications:
Using an augmented simplex centroid design, you collected data and are now going to create an
overlaid contour plot for three responses: impact strength (Impact), deflection temperature
(Temp), and yield strength (Strength).
1 Open the worksheet MIXOPT.MTW. (The design, response data, and model information
4 Click Contours. Complete the Low and High columns of the table as shown below.
Name       Low    High
Impact     [values not recovered]
Temp       190    205
Strength   5000   5800
Graph
window
output
References
[1] Koons, G.F. and Wilt, M.H. (1985). "Design and Analysis of an ABS Pipe Compound
Experiment," Experiments in Industry: Design, Analysis, and Interpretation of Results.
American Society for Quality Control, Milwaukee, 111-117.
[2] Derringer, G. and Suich, R. (1980). "Simultaneous Optimization of Several Response
Variables," Journal of Quality Technology, 12, 214-219.
[3] Myers, R.H. and Montgomery, D.C. (1995). Response Surface Methodology. John Wiley &
Sons, New York.
[4] Castillo, E.D., Montgomery, D.C., and McCarville, D.R. (1996). "Modified Desirability
Functions for Multiple Response Optimization," Journal of Quality Technology, 28, 337-345.
24
Taguchi Designs
References, 24-39
Taguchi Design Overview
In a static response experiment, the quality characteristic of interest has a fixed level.
In a dynamic response experiment, the quality characteristic operates over a range of values
and the goal is to improve the relationship between an input signal and an output response.
signal-to-noise ratios (S/N ratios, which provide a measure of robustness) vs. the control factors
means (static design) or slopes (dynamic design) vs. the control factors
the natural log of the standard deviations vs. the control factors
Use these tables and plots to determine what factors and interactions are important and evaluate
how they affect responses. To get a complete understanding of factor effects it is advisable to
evaluate S/N ratios, means (static design), slopes (dynamic design), and standard deviations.
Make sure that you choose an S/N ratio that is appropriate for the type of data you have and your
goal for optimizing the response; see Analyzing static designs on page 24-29.
Note
example, you need to choose control factors for the inner array and noise factors for the outer
array. Control factors are factors you can control to optimize the process. Noise factors are
factors that can influence the performance of a system but are not under control during the
intended use of the product. Note that while you cannot control noise factors during the
process or product use, you need to be able to control noise factors for experimentation
purposes.
2 Use Create Taguchi Design to generate a Taguchi design (orthogonal array); see Creating
factor levels, add a signal factor to a static design, ignore an existing signal factor (treat the
design as static), and add new levels to an existing signal factor. See Modifying Designs on
page 24-18.
4 After you create the design, you may use Display Design to change the units (coded or
uncoded) in which MINITAB expresses the factors in the worksheet. See Displaying Designs
on page 24-21.
5 Perform the experiment and collect the response data. Then, enter the data in your MINITAB
Choosing a Taguchi Design
determine the impact of other considerations (such as cost, time, or facility availability) on
your choice of design
represent the control factors, the table rows represent the runs (combination of factor levels), and
each table cell represents the factor level for that run.
L8 (2^7) Taguchi Design
[Table: the L8 (2^7) orthogonal array; columns are control factors, rows are runs; not recovered]
In the above example, levels 1 and 2 occur 4 times in each factor in the array. If you compare the
levels in factor A with the levels in factor B, you will see that B1 and B2 each occur 2 times in
conjunction with A1 and 2 times in conjunction with A2. Each pair of factors is balanced in this
manner, allowing factors to be evaluated independently.
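The balance property described above can be verified directly. The sketch below hard-codes the standard textbook L8 (2^7) array, which is assumed (not confirmed by this section) to match the array in MINITAB's catalog; the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations

# Standard L8 (2^7) orthogonal array: 8 runs (rows), 7 two-level columns.
L8 = [
    (1, 1, 1, 1, 1, 1, 1),
    (1, 1, 1, 2, 2, 2, 2),
    (1, 2, 2, 1, 1, 2, 2),
    (1, 2, 2, 2, 2, 1, 1),
    (2, 1, 2, 1, 2, 1, 2),
    (2, 1, 2, 2, 1, 2, 1),
    (2, 2, 1, 1, 2, 2, 1),
    (2, 2, 1, 2, 1, 1, 2),
]


def is_balanced(array):
    """Check that levels occur equally often in each column and in each pair of columns."""
    columns = list(zip(*array))
    for col in columns:
        if len(set(Counter(col).values())) != 1:
            return False
    for a, b in combinations(columns, 2):
        pair_counts = Counter(zip(a, b))
        expected_pairs = len(set(a)) * len(set(b))
        if len(pair_counts) != expected_pairs:
            return False
        if len(set(pair_counts.values())) != 1:
            return False
    return True


columns = list(zip(*L8))
print(Counter(columns[0]))                    # levels of factor A
print(Counter(zip(columns[0], columns[1])))   # level pairs of factors A and B
print(is_balanced(L8))
```

For this array, each level occurs 4 times in every column and each pair of levels occurs 2 times in every pair of columns, which is the property that lets factors be evaluated independently.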
Orthogonal array designs focus primarily on main effects. Some of the arrays offered in
MINITAB's catalog permit a few selected interactions to be studied. See Estimating selected
interactions on page 24-9.
You can also add a signal factor to the Taguchi design in order to create a dynamic response
experiment. A dynamic response experiment is used to improve the functional relationship
between an input signal and an output response. See Adding a signal factor for a dynamic
response experiment on page 24-7.
To create a Taguchi design
1 Choose Stat > DOE > Taguchi > Create Taguchi Design.
2 If you want to see a summary of the Taguchi designs available, click Display Available
6 In the Designs box, highlight the design you want to create. If you like, use the option
Options
Design subdialog box
add a signal factor; see Adding a signal factor for a dynamic response experiment on page 24-7
store the design in the worksheet; see Storing the design on page 24-9
select interactions to include in the design and allow MINITAB to assign factors to columns of
the array to allow estimation of selected interactions; see Estimating selected interactions on
page 24-9
assign factors to columns of the array in order to allow estimation of selected interactions;
see Estimating selected interactions on page 24-9
name the signal factor and define signal factor levels; see Adding a signal factor for a dynamic
response experiment on page 24-7
factor with 2 levels to an L4 (2^3) design, which has 4 runs, creates a design with 8 total runs;
adding a signal factor with 3 levels creates a design with 12 total runs.
Static design (No signal factor)
A   B
1   1
1   2
2   1
2   2

Dynamic design (Signal factor with 2 levels)
A   B   Signal factor
1   1   1
1   1   2
1   2   1
1   2   2
2   1   1
2   1   2
2   2   1
2   2   2

Dynamic design (Signal factor with 3 levels)
A   B   Signal factor
1   1   1
1   1   2
1   1   3
1   2   1
1   2   2
1   2   3
2   1   1
2   1   2
2   1   3
2   2   1
2   2   2
2   2   3

Note
When you add a signal factor while creating a new Taguchi design, the run order will be
different from the order that results from adding a signal factor using Modify Design; see
Adding a signal factor to an existing static design on page 24-19. The order of the rows
does not affect the Taguchi analysis.
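The two run orders can be sketched as follows. This is an illustration of the row patterns shown in the tables of this chapter, not MINITAB's own procedure; the function names are hypothetical.

```python
# Static design: two control factors (A, B), 4 runs.
static_design = [(1, 1), (1, 2), (2, 1), (2, 2)]


def add_signal_at_creation(runs, signal_levels):
    """Signal factor varies fastest: every signal level follows each control run."""
    return [run + (s,) for run in runs for s in signal_levels]


def add_signal_with_modify(runs, signal_levels):
    """Whole static design is repeated once per signal level (order shown on page 24-19)."""
    return [run + (s,) for s in signal_levels for run in runs]


created = add_signal_at_creation(static_design, [1, 2, 3])
modified = add_signal_with_modify(static_design, [1, 2, 3])

print(len(created), len(modified))           # total runs in each design
print(created[:3])                           # first rows, signal varies fastest
print(modified[:3])                          # first rows, signal held at level 1
print(sorted(created) == sorted(modified))   # same runs, different order
```

Both constructions contain the same set of runs (number of static runs times number of signal levels), which is consistent with the statement that row order does not affect the analysis.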
4 If you like, in the signal factor table under Name, click in the first row and type the name of
You can also specify signal factor levels using a range and increments. You can specify a
range by typing two numbers separated by a colon. For example, 1:5 displays the
numbers 1, 2, 3, 4, and 5. You can specify an increment by typing a slash / and a
number. For example, 1:5/2 displays every other number in a range: 1, 3, and 5.
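The range and increment notation can be illustrated with a small parser. This is a sketch of the notation as described above, not MINITAB's implementation; the function name is hypothetical, and support for ascending ranges only is an assumption.

```python
from decimal import Decimal


def expand_levels(spec):
    """Expand 'start:stop' or 'start:stop/step' into a list of level values."""
    body, _, step_text = spec.partition("/")
    start_text, colon, stop_text = body.partition(":")
    if not colon:
        raise ValueError("expected a range such as 1:5 or 1:5/2")
    start = Decimal(start_text.strip())
    stop = Decimal(stop_text.strip())
    step = Decimal(step_text.strip()) if step_text else Decimal(1)
    if step <= 0 or stop < start:
        raise ValueError("this sketch handles ascending ranges with a positive increment")
    levels = []
    value = start
    while value <= stop:
        levels.append(value)
        value += step
    return levels


print([int(v) for v in expand_levels("1:5")])    # every number in the range
print([int(v) for v in expand_levels("1:5/2")])  # every other number in the range
```

Decimal arithmetic is used so that fractional increments accumulate without floating-point drift.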
Interaction tables show confounded columns, which can help you to assign factors to array
columns. For interaction tables of MINITAB's catalog of Taguchi designs (orthogonal arrays), see
Help. The interaction table for the L8 (2^7) array is shown below.
[Table: interaction table for the L8 (2^7) array; not recovered]
The columns and rows represent the column numbers of the Taguchi design (orthogonal array).
Each table cell contains the interactions confounded for the two columns of the orthogonal
array.
For example, the entry in cell (1, 2) is 3. This means that the interaction between columns 1 and
2 is confounded with column 3. Thus, if you assigned factors A, B, and C to columns 1, 2, and 3,
you could not study the AB interaction independently of factor C. If you suspect that there is a
substantial interaction between A and B, you should not assign any factors to column 3. Similarly,
the column 1 and 3 interaction is confounded with column 2, and the column 2 and 3
interaction is confounded with column 1.
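The confounding pattern described above can be reproduced from the array itself. The sketch below uses the standard textbook L8 (2^7) array, assumed to match the one in MINITAB's catalog, and finds which column coincides with the interaction of two other columns. The function name is hypothetical.

```python
# Standard L8 (2^7) orthogonal array: 8 runs (rows), 7 two-level columns.
L8 = [
    (1, 1, 1, 1, 1, 1, 1),
    (1, 1, 1, 2, 2, 2, 2),
    (1, 2, 2, 1, 1, 2, 2),
    (1, 2, 2, 2, 2, 1, 1),
    (2, 1, 2, 1, 2, 1, 2),
    (2, 1, 2, 2, 1, 2, 1),
    (2, 2, 1, 1, 2, 2, 1),
    (2, 2, 1, 2, 1, 1, 2),
]


def interaction_column(array, i, j):
    """Return the 1-based column confounded with the interaction of columns i and j."""
    # For two-level columns, the interaction contrast is level 1 where the
    # two columns agree and level 2 where they differ.
    interaction = [1 if row[i - 1] == row[j - 1] else 2 for row in array]
    for k in range(len(array[0])):
        if [row[k] for row in array] == interaction:
            return k + 1
    return None


print(interaction_column(L8, 1, 2))  # column confounded with the 1 x 2 interaction
print(interaction_column(L8, 1, 3))
print(interaction_column(L8, 2, 3))
```

For this particular array, the confounded column number equals the bitwise exclusive-or of the two column numbers, so the whole interaction table can be generated rather than looked up.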
Note
Assigning factors to columns of the array does not change how the design is displayed in
the worksheet. For example, if you assigned factor A to column 3 of the array and factor
B to column 2 of the array, factor A would still appear in column 1 in the worksheet and
factor B would still appear in column 2 in the worksheet.
To select interactions
1 In the Create Taguchi Design dialog box, click Factors.
2 Under Assign Factors, choose To allow estimation of selected Interactions and then click
Interactions.
3 Move the interactions that you want to include in the design from Available Terms to
or
or
Note
Assigning factors to columns of the array does not change how the design is displayed in
the worksheet. For example, if you assigned factor A to column 3 of the array and factor
B to column 2 of the array, factor A would still appear in column 1 in the worksheet and
factor B would still appear in column 2 in the worksheet.
3 In the factor table, click under Column in the cell that corresponds to the factor that you want
to assign. From the drop-down list, choose the array column to which you want to assign the
factor. Then, use the down arrow key to move down the table and assign the factors to the remaining
array columns. Click OK.
More
See Help for interaction tables of MINITAB's catalog of Taguchi designs (orthogonal
arrays).
Naming factors
By default, MINITAB names the factors alphabetically.
To name factors
1 In the Create Taguchi Design dialog box, click Factors.
2 Under Name in the factor table, click in the first row and type the name of the first factor.
Then, use the down arrow key to move down the column and enter the remaining factor names.
3 Click OK.
More
After you have created the design, you can change the factor names by typing new
names in the Data window, or with Modify Design (page 24-18).
2 Under Level Values in the factor table, click in the first row and type the levels of the first
factor. Then, use the down arrow key to move down the column and enter the remaining levels. Click
OK.
[Tables: the L9 (3^4) array by run, shown without and with dummy treatments; not recovered]
In the L9 (3^4) orthogonal array with dummy treatment above, factor A has repeated level 1, in
place of level 3. This results in an L9 (3^4) array with one factor at 2 levels and three factors at 3
levels. The array is still orthogonal, although it is not balanced.
When choosing which factor level to use as the dummy treatment, you may want to consider the
amount of information about the factor level and the availability of experimental resources. For
example, if you know more about level 1 than level 2, you may want to choose level 2 as your
dummy treatment. Similarly, if level 2 is more expensive than level 1, requiring more resources
or time to test, you may want to choose level 1 as your dummy treatment.
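The dummy treatment technique can be sketched as follows. The code uses the standard textbook L9 (3^4) array, assumed to match the one in MINITAB's catalog, and treats "orthogonal" as the proportional-frequency condition; that definition and the function names are assumptions made for illustration.

```python
from collections import Counter

# Standard L9 (3^4) orthogonal array: 9 runs (rows), 4 three-level columns.
L9 = [
    (1, 1, 1, 1),
    (1, 2, 2, 2),
    (1, 3, 3, 3),
    (2, 1, 2, 3),
    (2, 2, 3, 1),
    (2, 3, 1, 2),
    (3, 1, 3, 2),
    (3, 2, 1, 3),
    (3, 3, 2, 1),
]


def with_dummy_treatment(array, column, replaced_level, dummy_level):
    """Replace one level of one column (0-based index) with a repeated level."""
    return [
        tuple(
            dummy_level if (k == column and value == replaced_level) else value
            for k, value in enumerate(row)
        )
        for row in array
    ]


def is_orthogonal(array):
    """Proportional frequencies: count(a, b) * N == count(a) * count(b) for every column pair."""
    n = len(array)
    columns = list(zip(*array))
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            ci, cj = Counter(columns[i]), Counter(columns[j])
            pairs = Counter(zip(columns[i], columns[j]))
            for a in ci:
                for b in cj:
                    if pairs[(a, b)] * n != ci[a] * cj[b]:
                        return False
    return True


# Factor A (column 0): repeat level 1 in place of level 3.
dummy = with_dummy_treatment(L9, column=0, replaced_level=3, dummy_level=1)
print(Counter(row[0] for row in dummy))  # level counts in factor A are now unequal
print(is_orthogonal(dummy))
```

The unequal level counts in factor A show the loss of balance, while the proportional-frequency check still passes for every pair of columns.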
Design        Number of factors
L4 (2^3)      2-3
L8 (2^7)      2-7
L9 (3^4)      2-4
L12 (2^11)    2-11
L16 (2^15)    2-15
L16 (4^5)     2-5
L25 (5^6)     2-6
L27 (3^13)    2-13
L32 (2^31)    2-31
[Remaining table entries could not be matched to designs: 1-7, 1-11, 2-12, 1-3, 13]
Number of levels
Designs
L54 (2^1 3^25)   3-25
L8 (2^4 4^1)
[Mixed-level design table not recovered; surviving entries: 1-4, 2-12, 1-9, 1-6, 1-3, 2-9, 1-8, 3 level, 6 level, 1-6]
You can also use Define Custom Taguchi Design to redefine a design that you created with
Create Taguchi Design and then modified directly in the worksheet.
Define Custom Taguchi Design allows you to specify which columns contain your factors and to
include a signal factor. After you define your design, you can use Modify Design (page 24-18),
Display Design (page 24-21), and Analyze Taguchi Design (page 24-22).
To define a custom Taguchi design
1 Choose Stat > DOE > Taguchi > Define Custom Taguchi Design.
Modifying Designs
After creating a Taguchi design and storing it in the worksheet, you can use Modify Design to
make the following modifications:
rename the factors and change the factor levels for the control factors in the inner array; see
Renaming factors and changing factor levels on page 24-18
add a signal factor to a static design; see Adding a signal factor to an existing static design on
page 24-19
ignore the signal factor (treat the design as static); see Ignoring the signal factor on page 24-20
add new levels to the signal factor in an existing dynamic design; see Adding new levels to the
signal factor on page 24-21
By default, MINITAB will replace the current design with the modified design. To store the
modified design in a new worksheet, check Put modified design in a new worksheet in the
Modify Design dialog box.
Static Design
Dynamic Design
3 Under Name, click in the first row and type the name of the first factor. Then, use the down arrow key
to move down the column and enter the remaining factor names.
4 Under Level Values, click in the first row and type the levels of the first factor. Then, use the
down arrow key to move down the column and enter the remaining levels. Click OK in each dialog
box.
Static design (No signal factor)
A   B
1   1
1   2
2   1
2   2

Dynamic design (Added signal factor with 2 levels)
A   B   Signal factor
1   1   1
1   2   1
2   1   1
2   2   1
1   1   2
1   2   2
2   1   2
2   2   2

Dynamic design (Added signal factor with 3 levels)
A   B   Signal factor
1   1   1
1   2   1
2   1   1
2   2   1
1   1   2
1   2   2
2   1   2
2   2   2
1   1   3
1   2   3
2   1   3
2   2   3

Note
When you add a signal factor to an existing static design, the run order will be different
from the order that results from adding a signal factor while creating a new design; see
Adding a signal factor for a dynamic response experiment on page 24-7. The order of the
rows does not affect the Taguchi analysis.
INDEX
MEET MTB
UGUIDE 1
UGUIDE 2
SC QREF
HOW TO USE
CONTENTS
INDEX
MEET MTB
UGUIDE 1
UGUIDE 2
SC QREF
Chapter 24
HOW TO USE
Modifying Designs
3 If you like, in the signal factor table under Name, click in the first row and type the name of
You can also specify signal factor levels using a range and increments. You can specify a
range by typing two numbers separated by a colon. For example, 1:5 displays the
numbers 1, 2, 3, 4, and 5. You can specify an increment by typing a slash / and a
number. For example, 1:5/2 displays every other number in a range: 1, 3, and 5.
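The range-and-increment notation described in this note can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not MINITAB's own parser: the function name expand_levels is hypothetical, and the sketch assumes ascending ranges with a positive increment.

```python
def expand_levels(spec):
    """Expand a level specification such as "1:5/2" into a list of numbers.

    Hypothetical helper: tokens are separated by spaces; each token is a
    single number, a range "start:end", or a range with an increment
    "start:end/step". Only ascending ranges are handled (an assumption).
    """
    levels = []
    for token in spec.split():
        if ":" not in token:
            levels.append(float(token))
            continue
        range_part, _, step_part = token.partition("/")
        start_text, end_text = range_part.split(":")
        start, end = float(start_text), float(end_text)
        step = float(step_part) if step_part else 1.0
        if step <= 0 or end < start:
            raise ValueError(f"unsupported range: {token!r}")
        # Small tolerance guards against floating-point round-off.
        count = int((end - start) / step + 1e-9) + 1
        levels.extend(start + i * step for i in range(count))
    return levels


print(expand_levels("1:5"))    # [1.0, 2.0, 3.0, 4.0, 5.0]
print(expand_levels("1:5/2"))  # [1.0, 3.0, 5.0]
```

The two calls mirror the examples in the note: 1:5 yields every integer from 1 to 5, and 1:5/2 yields every other value.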
3 Select Ignore signal factor (treat as non-dynamic). Click OK in each dialog box.
When you add new signal factor levels to an existing dynamic design, the run order will
be different from the order that results from adding a signal factor while creating a new
design. The order of the rows does not affect the Taguchi analysis.
3 Choose Add new levels to signal factor. Enter the new signal factor levels. Click OK.
Note
You can also specify signal factor levels using a range and increments. You can specify a
range by typing two numbers separated by a colon. For example, 1:5 displays the
numbers 1, 2, 3, 4, and 5. You can specify an increment by typing a slash / and a
number. For example, 1:5/2 displays every other number in a range: 1, 3, and 5.
Displaying Designs
After you create the design, you can use Display Design to change the way the design points are
stored in the worksheet. You can display the factor levels in coded or uncoded form.
If you assigned factor levels in the Factors subdialog box, the uncoded (actual) factor levels are
initially displayed in the worksheet. If you did not assign factor levels (that is, you used the default
factor levels, which are 1, 2, 3, ...), the coded and uncoded units are the same.
Collecting and Entering Data
worksheet. These columns constitute the basis of your data collection form. If you did not
name factors or specify factor levels when you created the design and you want names or levels
to appear on the form, see Modifying Designs on page 24-18.
2 In the worksheet, name the columns in which you will enter the measurement data obtained
More
You can also copy the worksheet cells to the Clipboard by choosing Edit > Copy Cells.
Then paste the Clipboard contents into a word-processing application, such as Microsoft
Word, where you can create your own form.
create and store the design using Create Taguchi Design (page 24-4), or create a design from
data already in the worksheet using Define Custom Taguchi Design (page 24-17) and
generate main effects and interaction plots of the S/N ratios, means (static design), slopes
(dynamic design), and standard deviations vs. the control factors
display response tables for S/N ratios, means (static design), slopes (dynamic design), and
standard deviations
The response tables and main effects and interaction plots can help you determine which factors
affect variation and process location. See Two-step optimization on page 24-23.
Two-step optimization
Two-step optimization, an important part of robust parameter design, involves first reducing
variation and then adjusting the mean on target. Use two-step optimization when you are using
either Nominal is Best signal-to-noise ratio. First, try to identify which factors have the greatest
effect on variation and choose levels of these factors that minimize variation. Then, once you
have reduced variation, the remaining factors are possible candidates for adjusting the mean on
target (scaling factors).
A scaling factor is a factor in which the mean and standard deviation are proportional. You can
identify scaling factors by examining the response tables for each control factor. A scaling factor
has a significant effect on the mean with a relatively small effect on signal-to-noise ratio. This
indicates that the mean and standard deviation scale together. Thus, you can use the scaling
factor to adjust the mean on target but not affect the S/N ratio.
Use main effects plots to help you visualize the relative value of the effects of different factors.
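As a rough illustration of the two steps, the sketch below ranks factors by their Delta values (the highest level average minus the lowest level average in a response table). The numbers are the S/N ratio and mean Deltas from the seal strength example later in this chapter. The function name two_step_candidates and the choice to treat only the single largest S/N effect as the variation-control factor are assumptions made for this sketch, not a MINITAB procedure.

```python
# Delta values taken from the response tables of the seal strength example.
sn_delta = {"Temperature": 3.6378, "Pressure": 8.2926, "Thickness": 4.1235}
mean_delta = {"Temperature": 1.9500, "Pressure": 0.6167, "Thickness": 0.5333}


def two_step_candidates(sn_delta, mean_delta, n_variation_factors=1):
    """Split factors into variation-control factors and adjustment candidates.

    Step 1: the factors with the largest effect on the S/N ratio are used
            to reduce variation.
    Step 2: the remaining factors are ordered by their effect on the mean;
            the first ones are the most promising scaling factors.
    n_variation_factors is an illustrative cutoff, not a MINITAB setting.
    """
    by_sn = sorted(sn_delta, key=sn_delta.get, reverse=True)
    variation_factors = by_sn[:n_variation_factors]
    remaining = by_sn[n_variation_factors:]
    adjustment_candidates = sorted(remaining, key=mean_delta.get, reverse=True)
    return variation_factors, adjustment_candidates


variation, adjustment = two_step_candidates(sn_delta, mean_delta)
print("Reduce variation with:", variation)
print("Adjust the mean with (in order):", adjustment)
```

With these Deltas, Pressure is used to reduce variation, and Temperature, which has the largest effect on the mean and the smallest effect on the S/N ratio, is the first candidate for adjusting the mean. In practice the response tables and main effects plots should be examined together rather than reduced to a single cutoff.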
[Figure: process distributions shown against the Lower, Target, and Upper limits at three stages: initial process performance (high variation); variation minimized; variation minimized and process on target (robust design).]
Data
Structure your data in the worksheet so that each row contains the control factors in the inner
array and the response values from one complete run of the noise factors in the outer array. You
must have from 2 to 50 response columns. Here is an example:
Time  Pressure  Catalyst  Temperature  Noise 1  Noise 2
1     1         1         1            50       52
1     1         1         2            44       51
1     2         2         1            56       59
1     2         2         2            65       77
2     1         2         1            47       43
2     1         2         2            42       51
2     2         1         1            68       62
2     2         1         2            51       38
This example, which is an L8 (2^4) design, has four factors in the inner array (Time, Pressure, Catalyst,
and Temperature). Recall that the inner array represents the control factors. There are two noise
conditions in the outer array (Noise 1 and Noise 2). There are two responses, one for each noise
condition, in the outer array for each run in the inner array.
You can have 1 response column if you are using the Larger is Better or Smaller is Better
signal-to-noise ratio and you are not going to analyze or store the standard deviation.
If you have a design and response data in your worksheet that was
you can use Analyze Taguchi Design, which will prompt you to define your design; see
Defining Custom Taguchi Designs on page 24-17.
To fit a model to the data
1 Choose Stat > DOE > Taguchi > Analyze Taguchi Design.
2 In Response data are in, enter the columns that contain the measurement data.
3 If you like, use any of the options listed below, then click OK.
Options
Graphs subdialog box
for static designs, display main effects plots and selected interaction plots for the
signal-to-noise (S/N) ratios, the process means, and/or the process standard deviations
for dynamic designs, display main effects plots and selected interaction plots for the S/N
ratios, the slopes, and/or the process standard deviations. Also, display scatter plots with fitted
lines.
display interaction plots for selected interactions; see Selecting terms for the interaction plots
on page 24-28
display the interaction plots in a matrix on a single graph, or display each interaction plot
on a separate page; see Selecting terms for the interaction plots on page 24-28
for static designs, display response tables for signal-to-noise ratios, the means, and the standard
deviations; see Displaying response tables on page 24-29
for dynamic designs, display response tables for signal-to-noise ratios, the slopes, and the
standard deviations; see Displaying response tables on page 24-29
for static designs, choose the signal-to-noise (S/N) ratio that is consistent with your goal and
data; see Analyzing static designs on page 24-29
for dynamic designs, enter a response reference value and a signal reference value for the fitted
line, or choose to fit the line with no fixed reference point; see Analyzing dynamic designs on
page 24-30
2 Under Generate plots of main effects and selected interactions for, check Signal-to-noise
ratios, Means (for static design) or Slopes (for dynamic design), and/or Standard deviations.
Click OK.
3 Move the interactions that you want to include in the plot from Available Terms to Selected
The available terms in the Interactions subdialog box list the interactions available to plot.
The second factor in the term (B in AB) is used as the horizontal scale for the plot. Thus,
you can view the AB interaction both ways by selecting both AB and BA.
2 Under Display response tables for check Signal-to-noise ratios, Means (for static design) or
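Larger is better: $S/N = -10\log_{10}\left(\frac{1}{n}\sum \frac{1}{Y^{2}}\right)$; data are positive
Nominal is best: $S/N = -10\log_{10}\left(s^{2}\right)$
Nominal is best (default): $S/N = 10\log_{10}\left(\bar{Y}^{2}/s^{2}\right)$; data are non-negative with an absolute zero in which the standard deviation is zero when the mean is zero
Smaller is better: $S/N = -10\log_{10}\left(\frac{1}{n}\sum Y^{2}\right)$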
Note
The Nominal is Best (default) S/N ratio is good for analyzing or identifying scaling factors,
which are factors in which the mean and standard deviation vary proportionally. Scaling
factors can be used to adjust the mean on target without affecting S/N ratios.
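The four S/N ratio formulas listed above can be written out as a short Python sketch. It assumes that log means the base-10 logarithm and that s is the sample standard deviation (divisor n - 1); both are assumptions of this sketch, and the function names are hypothetical. The sample run is the first row of the L8 data example earlier in this section (responses 50 and 52).

```python
import math
from statistics import mean, variance  # variance() uses the n - 1 divisor


def sn_larger_is_better(y):
    """S/N = -10 log10( sum(1 / Y^2) / n )"""
    return -10 * math.log10(sum(1 / v**2 for v in y) / len(y))


def sn_nominal_is_best_s2(y):
    """S/N = -10 log10( s^2 ), based on the standard deviation only."""
    return -10 * math.log10(variance(y))


def sn_nominal_is_best_default(y):
    """S/N = 10 log10( Ybar^2 / s^2 ), based on the mean and standard deviation."""
    return 10 * math.log10(mean(y) ** 2 / variance(y))


def sn_smaller_is_better(y):
    """S/N = -10 log10( sum(Y^2) / n )"""
    return -10 * math.log10(sum(v**2 for v in y) / len(y))


run = [50, 52]  # Noise 1 and Noise 2 responses for one inner-array run
for fn in (sn_larger_is_better, sn_nominal_is_best_s2,
           sn_nominal_is_best_default, sn_smaller_is_better):
    print(f"{fn.__name__}: {fn(run):.4f}")
```

Both Nominal is Best ratios need at least two responses per run because they use the standard deviation, whereas the Larger is Better and Smaller is Better ratios can be computed from a single response.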
2 Under Signal-to-Noise Ratio, choose the S/N ratio that best fits the goals of the design.
2 In Response reference value, enter a numeric value corresponding to the desired output
(response) value.
3 In Signal reference value, enter a signal factor level corresponding to the response reference
Suppose you are an engineer and need to evaluate the factors that affect the seal strength of
plastic bags used to ship your product. You have identified three controllable factors
(Temperature, Pressure, and Thickness) and two noise conditions (Noise 1 and Noise 2) that
may affect seal strength. You want to ensure that seal strength meets specifications. If the seal is
too weak, it may break, contaminating the product and resulting in returns. If the seal is too
strong, customers may have difficulty opening the bag. The target specification is 18.
1 Open the worksheet SEAL.MTW. The design and response data have been saved for you.
2 Choose Stat > DOE > Taguchi > Analyze Taguchi Design.
3 In Response data are in, enter Noise1 Noise2.
4 Click Graphs. Under Generate plots of main effects and selected interactions for, check
Session window output

Response table for signal-to-noise ratios
Level   Temperature   Pressure   Thickness
1       29.4219       21.9191    28.2568
2       27.0652       30.2117    29.0690
3       25.7842       30.1406    24.9455
Delta    3.6378        8.2926     4.1235
Rank     3             1          2

Response table for means
Level   Temperature   Pressure   Thickness
1       17.6500       17.5833    17.6833
2       18.3333       17.7000    17.1500
3       16.3833       17.0833    17.5333
Delta    1.9500        0.6167     0.5333
Rank     1             2          3

Response table for standard deviations
Level   Temperature   Pressure   Thickness
1       0.91924       1.53206    0.96638
2       0.94281       0.75425    0.68354
3       1.01352       0.58926    1.22565
Delta   0.09428       0.94281    0.54212
Rank    3             1          2

Graph window output
Suppose you are an engineer trying to increase the robustness of a measurement system. A
measurement system is dynamic because as the input signal changes, the output response
changes. A measurement system ideally should have a 1:1 correspondence between the value
being measured (signal factor) and the measured response (system output). Similarly, zero
should serve as the fixed reference point (all lines should be fit through the origin) because an
input signal of zero should result in a measurement of zero.
You have identified two components of your measurement system that will serve as the control
factors: Sensing and Reporting. The signal factor is the actual value of the item being measured
and the output response is the measurement. You have also selected two noise conditions.
1 Open the worksheet MEASURE.MTW. The design and response data have been saved for
you.
2 Choose Stat > DOE > Taguchi > Analyze Taguchi Design.
3 In Response data are in, enter Noise1 and Noise2.
4 Click Graphs. Under Generate plots of main effects and selected interactions for, check
Standard deviations.
5 Check Display scatter plots with fitted lines. Click OK.
6 Click Tables. Under Display response tables for, check Standard deviations.
7 Click OK in each dialog box.
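The idea of fitting every line through a fixed reference point (here the origin) can be sketched as an ordinary least-squares fit with no intercept. The sketch below is not MINITAB's implementation: the function name, the sample data, and the use of n - 1 degrees of freedom for the residual standard deviation are all assumptions made for illustration.

```python
import math


def fit_through_reference(signal, response, signal_ref=0.0, response_ref=0.0):
    """Least-squares slope of a line forced through (signal_ref, response_ref).

    Returns the slope and the standard deviation of the residuals about
    the fitted line (n - 1 degrees of freedom, an assumption of this sketch).
    """
    dx = [m - signal_ref for m in signal]
    dy = [y - response_ref for y in response]
    sxx = sum(x * x for x in dx)
    if sxx == 0:
        raise ValueError("all signal levels equal the signal reference value")
    slope = sum(x * y for x, y in zip(dx, dy)) / sxx
    sse = sum((y - slope * x) ** 2 for x, y in zip(dx, dy))
    resid_sd = math.sqrt(sse / (len(dx) - 1))
    return slope, resid_sd


# Hypothetical measurements: three signal levels under two noise conditions.
signal = [1, 2, 3, 1, 2, 3]
response = [1.1, 1.9, 3.0, 0.9, 2.1, 3.0]
slope, resid_sd = fit_through_reference(signal, response)
print(f"slope = {slope:.4f}, residual sd = {resid_sd:.4f}")
```

For a measurement system, a slope close to 1 with a small residual standard deviation corresponds to the ideal 1:1 relationship between the value being measured and the reported measurement.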
Session window output

Response table for signal-to-noise ratios
Level   Sensing    Reporting
1       20.3270    18.3400
2       14.2224    16.2095
Delta    6.1047     2.1305
Rank     1          2

Response table for slopes
Level   Sensing    Reporting
1       1.52738    1.03293
2       1.48734    1.98179
Delta   0.04004    0.94886
Rank    2          1

Response table for standard deviations
Level   Sensing     Reporting
1       0.165537    0.141439
2       0.287448    0.311545
Delta   0.121911    0.170106
Rank    2           1

Graph window output
Interpreting results
The response tables show the average of the selected characteristic for each level of the factors.
The response tables include ranks based on Delta statistics, which compare the relative
magnitude of effects. The Delta statistic is the highest average minus the lowest average for each
factor. Ranks are assigned based on Delta values; rank 1 is assigned to the highest Delta value,
rank 2 to the second highest Delta value, and so on. The main effects plots provide a graph of the
averages in the response table.
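The Delta and rank calculation is simple enough to sketch directly. The example below uses the S/N ratio level averages for Sensing and Reporting from this example's response table; the function name delta_and_rank is hypothetical. Because the printed averages are rounded, a recomputed Delta can differ from the printed one in the last decimal place.

```python
def delta_and_rank(level_averages):
    """Compute Delta (max - min of the level averages) and its rank per factor.

    Rank 1 goes to the factor with the largest Delta.
    """
    deltas = {factor: max(avgs) - min(avgs)
              for factor, avgs in level_averages.items()}
    ordered = sorted(deltas, key=deltas.get, reverse=True)
    ranks = {factor: position + 1 for position, factor in enumerate(ordered)}
    return deltas, ranks


# S/N ratio level averages (level 1, level 2) from the response table.
sn_averages = {
    "Sensing": [20.3270, 14.2224],
    "Reporting": [18.3400, 16.2095],
}
deltas, ranks = delta_and_rank(sn_averages)
for factor in sn_averages:
    print(f"{factor}: Delta = {deltas[factor]:.4f}, Rank = {ranks[factor]}")
```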
Because you are trying to improve the quality of a measurement system, you want to maximize
the signal-to-noise (S/N) ratio. If you examine the response table and main effects plot for S/N
ratio, you can see that the Sensing component (Delta = 6.1047, Rank = 1) has a greater effect on
S/N ratio than Reporting (Delta = 2.1305, Rank = 2).
Here, the response table and main effects plots for slopes both show that Reporting (Delta =
0.94886, Rank = 1) has a much greater effect on slope than Sensing (Delta = 0.04004, Rank =
2). Thus, it is likely that Reporting can be used as a scaling factor to adjust the mean on target
after minimizing sensitivity to noise.
The response table and main effects plot show that Reporting (Delta = 0.1701, Rank = 1) has a
greater effect on standard deviation than Sensing (Delta = 0.1219, Rank = 2).
Based on these results, you might first want to maximize S/N ratio using the low level of the
Sensing factor and then adjust the slope to the target of 1 using the Reporting factor.
Predicting Results
Use Predict Results after you have run a Taguchi experiment and examined the response tables
and main effects plots to determine which factor settings should achieve a robust product design.
Predict Results allows you to predict S/N ratios and response characteristics for selected factor
settings.
For example, you might choose the best settings for the factors that have the greatest effect on the
S/N ratio, and then wish to predict the S/N ratio and mean response for several combinations of other
factors. Predict Results would provide the expected responses for those settings. You should
choose the result that comes closest to the desired mean without significantly reducing the S/N
ratio. You should then perform a follow-up experiment using the selected levels to determine
how well the prediction matches the observed result.
If there are minimal interactions among the factors or if the interactions have been correctly
accounted for by the predictions, the observed results should be close to the prediction, and you
will have succeeded in producing a robust product. On the other hand, if there is substantial
disagreement between the prediction and the observed results, then there may be unaccounted-for
interactions or unforeseen noise effects. This would indicate that further investigation is
necessary.
You can specify the terms in the model used to predict results. For example, you may decide not
to include a factor in the prediction because the response table and main effects plot indicate
that the factor does not have much of an effect on the response. You can also decide whether or
not to include selected interactions in the model. Interactions included in the model will affect
the predicted results.
Data
In order to predict results, you need to have
created and stored the design using Create Taguchi Design (page 24-4) or created a design
from data already in the worksheet with Define Custom Taguchi Design (page 24-17) and
To predict results
1 Choose Stat > DOE > Taguchi > Predict Results.
Static Design
Dynamic Design
signal-to-noise ratio
standard deviations
3 Click Terms.
4 Move the factors that you do not want to include in the model from Selected Terms to
Options
Predict results dialog box
We will now predict results for the seal strength experiment introduced on page 24-31. You had
identified three controllable factors that you thought would influence seal strength: Temperature,
Pressure, and Thickness. Because you first want to maximize the signal-to-noise (S/N) ratio, you
chose factor settings that increase S/N ratios: Temperature 60, Pressure 36, and Thickness 1.25.
1 Open the worksheet SEAL2.MTW. The design and response information have been saved for
you.
2 Choose Stat > DOE > Taguchi > Predict Results.
3 Click Levels.
4 Under Method of specifying new factor levels, choose Select levels from a list.
5 Under Levels, click in the first row and choose the factor level according to the table below.
Then, use the down arrow key to move down the column and choose the remaining factor levels
according to the table below.
Factor        Levels
Temperature   60
Pressure      36
Thickness     1.25
Session window output

Predicted values
S/N Ratio   Mean      StDev      Log(StDev)
33.8551     17.5889   0.439978   -1.03172
Interpreting results
The predicted results for the chosen factor settings are: S/N ratio of 33.8551, mean of 17.5889,
and standard deviation of 0.439978. Next, you might run an experiment using these factor
settings to test the accuracy of the model.
Note
The predicted values for the standard deviation and log of the standard deviation use
different models of the data.
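The predicted S/N ratio and mean can be checked by hand with a main-effects (additive) model: start from the overall average and add each chosen level's deviation from it. The sketch below applies this to the level averages in the seal strength response tables on page 24-32. It rests on several assumptions: the settings Temperature 60, Pressure 36, and Thickness 1.25 are taken to be levels 1, 2, and 2 (the levels with the highest S/N averages), only main effects are in the model, and the design is balanced so that the overall average equals the average of the level averages. The function name predict_additive is hypothetical.

```python
# Level averages (levels 1, 2, 3) from the seal strength response tables.
sn_averages = {
    "Temperature": [29.4219, 27.0652, 25.7842],
    "Pressure": [21.9191, 30.2117, 30.1406],
    "Thickness": [28.2568, 29.0690, 24.9455],
}
mean_averages = {
    "Temperature": [17.6500, 18.3333, 16.3833],
    "Pressure": [17.5833, 17.7000, 17.0833],
    "Thickness": [17.6833, 17.1500, 17.5333],
}

# Assumed mapping of the chosen settings to level numbers (1-based).
chosen_levels = {"Temperature": 1, "Pressure": 2, "Thickness": 2}


def predict_additive(level_averages, chosen_levels):
    """Overall average plus the sum of main-effect deviations for the chosen levels."""
    all_averages = [a for avgs in level_averages.values() for a in avgs]
    overall = sum(all_averages) / len(all_averages)
    effects = sum(level_averages[factor][level - 1] - overall
                  for factor, level in chosen_levels.items())
    return overall + effects


print(f"Predicted S/N ratio: {predict_additive(sn_averages, chosen_levels):.4f}")
print(f"Predicted mean:      {predict_additive(mean_averages, chosen_levels):.4f}")
```

Under these assumptions the results agree with the S/N ratio and mean in the session window output to about three decimal places; the small differences come from rounding in the printed level averages. Omitting a factor from chosen_levels corresponds to leaving that term out of the prediction model.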
References
[1] G.S. Peace (1993). Taguchi Methods. Addison-Wesley Publishing Company.
[2] J.H. Lochner and J.E. Matar (1990). Designing for Quality. ASQC Quality Press.
[3] W. Y. Fowlkes and C.M. Creveling (1995). Engineering Methods for Robust Product Design.
Addison-Wesley Publishing Company.
[4] S.H. Park (1996). Robust Design and Analysis for Quality Engineering. Chapman & Hall.
[5] M.S. Phadke (1989). Quality Engineering Using Robust Design. Prentice-Hall.
INDEX
Numerics
1 proportion
confidence interval 1-26
example 1-29
method 1-28
power 9-7
sample size 9-7
test 1-26
1-sample sign test 5-3
1-sample t
confidence interval 1-15
example 1-17
method 1-17
power 9-4
sample size 9-4
sample size example 9-6
test 1-15
1-sample Wilcoxon test 5-7
1-sample Z
confidence interval 1-12
example 1-14
method 1-14
power 9-4
sample size 9-4
test 1-12
2 proportions
confidence interval 1-30
example 1-33
method 1-32
power 9-7
power example 9-9
sample size 9-7
test 1-30
2 variances 1-34
example 1-36
2-sample Mann-Whitney test 5-11
2-sample t
confidence interval 1-18
example 1-21
method 1-20
power 9-4
sample size 9-4
test 1-18
3-17
16-16
example 16-17
interpret regression equation
16-13
options 16-8
percentiles 16-16
probability plots 16-14, 16-26
relation plot 16-11
survival probabilities 16-16
transform accelerating variable
16-12
uncensored/arbitrarily censored
data 16-8
uncensored/right censored data
16-7
accuracy, gage linearity and accuracy
11-2
3-7
nested factors 3-19
one-way 3-5
overview 3-2
overview, balanced and GLM
3-18
ANOME
see analysis of means
ANOVA
see analysis of variance
arbitrarily censored data 15-8
distribution overview plot 15-21
distributionID plot 15-12
nonparametric distribution
analysis 15-54
parametric distribution analysis
13-15, 14-25
asymmetric plot
column 6-26
row 6-26, 6-30
attributes control charts 13-1
C chart 13-9
NP chart 13-7
options 13-14
overview 13-2
P chart 13-4
U chart 13-12
autocorrelation 7-38
in residuals 2-8
testing, example 7-40
autoregressive integrated moving
average 7-44
average
linkage 4-24
of moving range 14-17
of subgroup ranges 12-67, 14-10,
14-17
3-31
3-28
two crossed factors example 3-29
unrestricted form of mixed models
3-28
balanced MANOVA 3-51
Bartlett's test 3-60, 3-61
basic statistics 1-1
overview 1-2
Bayes analysis 15-33
best subsets regression
data 2-20
example 2-23
how to use 2-22
options 2-21
between-subgroups variation 14-5,
14-17
between/within (I-MR-R/S) chart 12-24
bias, gage linearity and accuracy 11-3
binary logistic regression 2-33
data 2-34
diagnostic plots 2-38
example 2-40
Hosmer-Lemeshow statistic 2-38
initial parameter estimates 2-37
link functions 2-36
options 2-35
parameter estimates, interpreting
2-39
residual analysis 2-38
Session window output description
14-37
15-29
ARIMA 7-44
entering the model 7-47
fitting a model 7-46
fitting a model, example 7-48
forecasting with a model, example
7-49
ARL 12-40, 12-47
2-42
worksheet structure 2-32
binomial
analysis of means 3-14
control charts 13-4, 13-7
7-25
C
C charts 13-9
capability 14-1
normal versus Weibull probability
model 14-6
overall variation 14-6
overview 14-2
within variation 14-6
capability analysis
between/within 14-14
binomial distribution 14-37
normal distribution 14-6
Poisson distribution 14-41
Weibull distribution 14-19
capability sixpack
between/within 14-30
normal distribution 14-24
Weibull distribution 14-34
capability statistics 14-4, 14-10, 14-17,
censoring
arbitrarily censored data 15-8,
6-35
16-5
16-4, 16-5
6-30
column plot 6-26, 6-35
comparing distribution parameters
16-4, 16-5
multiply censored data 15-6
right censored data 15-5, 16-4
singly censored data 15-6
center points 20-9
analyzing 19-44
factorial designs 19-11, 19-37
mixture designs 21-8, 21-56
Plackett-Burman designs 19-26
response surface designs 20-9
central composite designs 20-4
analyzing 20-26
example 20-14
summary 20-18
centroid linkage 4-24
centroid, mixture designs 21-8, 21-56
chi-square test
contingency data, example 6-18
goodness-of-fit test 6-19
raw data, example 6-17
classification variables 6-2, 6-4, 6-31
cluster analysis
cluster observations 4-22
cluster variables 4-29
K-means clustering 4-32
cluster observations 4-22
data 4-22
distance measures 4-23
example 4-26
final cluster grouping 4-25
K-means 4-32
linkage methods 4-24
options 4-23
cluster variables 4-29
data 4-29
distance measures 4-30
example 4-31
final cluster grouping 4-30
in practice 4-31
linkage methods 4-30
coefficient of determination 2-11
column contributions
15-34
complementary log-log link function
2-36, 2-46
complete linkage 4-24
components 21-2
composite desirability 23-6
maximizing 23-7
obtaining 23-7
confidence interval
1 proportion 1-26
1-sample t 1-16
1-sample Wilcoxon 5-8
1-sample Z 1-13
2 proportions 1-30
2-sample t 1-18
for median 1-6
for paired data 1-22
for sigma 1-6
for the mean 1-14
intervals about the means, graph
3-63
confounding 19-9, 19-14, 19-40,
19-46
constraints
linear, mixture designs 21-14
lower and upper bounds, mixture
designs 21-13
contingency 6-3
contour plot
factorial designs 19-60
factorial example 19-63
mixture designs 21-49
mixture example 21-53
overlaid 23-19
response surface designs 20-34
response surface example 20-37
contour plot, overlaid 23-19
factorial example 23-24
mixture example 23-27
response surface example 23-26
control charts 12-1, 13-1
attributes data 13-1
12-36
variables data 12-1
control limits 12-70, 13-16
Cook's distance 2-9
correlation 1-37
example 1-39
method 1-38
partial 1-40
correlation coefficient, Pearson 15-13
correspondence analysis
multiple 6-31
simple 6-21
covariance 1-41
method 1-42
covariates 3-20
Cox's direction 21-47
Cp 14-5, 14-6, 14-10, 14-17
Cpk 14-5, 14-6, 14-10, 14-17
CPL 14-5, 14-6, 14-10, 14-17
Cpm 14-5, 14-6
CPU 14-5, 14-6, 14-10, 14-17
cross correlation 7-43
cross tabulation 6-3
change table layout 6-5
change table layout, example 6-10
three classification variables,
example 6-10
to display data, example 6-8
cross-validation 4-20
crossed and nested model 3-31
crossed factors 3-19, 3-26, 3-37, 3-51,
3-57
cube plot
factorial designs 19-55
mixture designs 21-45
cube points 20-9
cubic regression model 2-25
cumulative % defective 14-38
cumulative counts, tally 6-12
17-9
curvature 19-53
curve fitting 2-24
customer support xiv
CUSUM
plan 12-47
two-sided 12-44
13-15
D
D-optimality 22-6, 22-14
data in subgroups
(CUSUM chart) 12-46
(Moving Average chart) 12-41
(R chart, Xbar-R chart, Xbar-S
chart, S chart, Xbar chart,
I-MR-R/S chart) 12-10
data limitations
capability analysis commands
14-2
capability analysis, normal 14-7
capability analysis, Weibull 14-19
Pareto chart 10-12
data sets, sample xviii
data subsetting lack of fit test 2-8
date/time stamp 12-72
decomposition 7-10
example 7-13
forecasting 7-13
model 7-12
trend model residuals 7-13
%defective
cumulative chart 14-38
histogram 14-38
defective rate plot 14-38
defectives data
control charts 13-2
process capability 14-37
defects control charts 13-9
defects data, capability analysis 14-41
defects (Pareto chart) 10-11
defects per unit
cumulative mean 14-41
histogram 14-41
15-10, 15-12
right censored data 15-9, 15-10
distribution overview plot 15-19
arbitrarily censored data 15-19,
15-21
right censored data 15-19, 15-20
distribution parameters
comparing 15-34
distributors of MINITAB xiv
documentation for MINITAB xv
DOE
factorial designs 19-1
inner-outer array design 24-1
mixture designs 21-1
optimization 18-3
orthogonal array designs 24-1
overview 18-1
planning 18-2
response surface 20-1
robust designs 24-1
screening 18-3
Taguchi designs 24-1
verification 18-4
worksheet, modifying 18-4
double exponential smoothing 7-25
choosing weights 7-27
forecasting 7-28
DPU
see defects per unit
dummy treatments 24-13
Dunnett method 3-7
with GLM 3-42
dynamic response experiment 24-2
creating 24-8, 24-19
treat as non-dynamic 24-20
E
EDA
see exploratory data analysis
equal variances
example 1-36
test for 1-34
equimax rotation method 4-10
error bar graphs 3-63
estimate parameters for control charts
12-66, 13-15
estimating distribution parameters,
parametric distribution analysis 15-4,
15-42
EWMA charts 12-37
calculating the EWMA 12-39
examples, how to use them xvii
expected mean squares 3-28
experimental designs
see DOE
exploratory data analysis 8-1
overview 8-2
exponential growth trend model 7-6
exponentially weighted moving average
control chart 12-37
extreme vertices design 21-5
creating 21-5
example 21-22
F
F-test 3-60
versus Levene's test 1-35
face-centered design 19-41, 20-10
factor analysis 4-6
data 4-6
in practice 4-9
input data, matrix 4-10
input data, stored loadings 4-11
maximum likelihood method 4-9
maximum likelihood method
example 4-14
options 4-7
principal components method
example 4-12
rotating factor loadings 4-10
storage 4-12
varimax rotation example 4-14
factor information
binary logistic regression 2-42
nominal logistic regression 2-57
ordinal logistic regression 2-50
factor levels, using patterned data to set
up 3-25
factor variables
logistic regression 2-31
probit analysis 17-11
regression with life data 16-25
factorial designs 19-1
analyzing 19-44
analyzing example 19-50
center points 19-11, 19-37
choosing 19-5
collecting and entering data 19-43
contour plot 19-60
contour plot example 19-63
creating, example 19-19, 19-22,
19-26
data 19-44
displaying 19-42
factor levels, changing 19-38
factorial plots 19-53
fractional 19-3
full 19-3
general full factorial designs,
creating 19-33
modifying 19-38
naming factors 19-38
optimization example 23-12
overview 19-2
Plackett-Burman 19-4
Plackett-Burman, creating 19-24
power 9-13
power example 9-15
randomizing 19-39
replicating 19-8, 19-12, 19-39
resolution 19-6
sample size 9-13
specifying the model 19-47
surface (wireframe) plot example
19-63
two-level, creating 19-6
factorial plots
factorial designs 19-53
mixture designs 21-44
3-7
fast initial response 12-46
FIR
see fast initial response
fishbone diagram 10-14
Fisher's least significant difference 3-7
fitted line plot 2-24
data 2-24
example 2-26
models 2-25
options 2-25
fitted regression line 2-24
example 2-26
fitting a distribution, parametric
distribution analysis 15-32
fixed factors 3-20, 3-26, 3-51
folding 19-8, 19-14, 19-40
foldover design 19-15, 19-40
forecasting in trend analysis 7-8
forecasting method, how to select 7-2
forward selection 2-17
fractional factorial designs 19-3, 19-6
analyzing 19-44
example 19-19
frame of control charts 12-74
frequency counts, tally 6-12
Friedman test for a randomized block
design 5-18
full cubic model, mixture designs
21-41
full factorial designs 19-6
analyzing 19-44
general 19-33
full quartic model, mixture designs
21-41
full rank
design matrix 3-37
models 3-37, 3-57
fully nested ANOVA 3-48
example 3-50
fully nested model 3-50
hierarchical model 3-50
furthest neighbor cluster distance 4-24
G
gage linearity and accuracy study 11-27
gage R&R
crossed 11-4
methods 11-1
nested 11-4
overview 11-2
study 11-4
gage run chart 11-23
general linear model
see GLM
general MANOVA 3-57
general trend model 7-5
generators, factorial designs
example 19-10
GLM 3-37
adjusted means 3-39
adjusted sums of squares 3-43
and balanced ANOVA, overview
3-18
design matrix used 3-43
fit linear and quadratic effects,
example 3-44
least squares means 3-39
multiple comparisons of means
nonparametric distribution
analysis 15-62
parametric distribution analysis
15-41
hierarchical model with fully nested
ANOVA 3-50
hinges, letter-value display 8-3
histogram of
%defective 14-38
DPU 14-41
process data 14-24, 14-34
historical chart 12-61
with other control chart options
12-62
Holt double exponential smoothing
7-25
Holt-Winters exponential smoothing
7-30
homogeneity of variance 1-34
test 3-60
test example 3-62
Hosmer-Lemeshow statistic 2-38
Hotelling's T² test 3-54
Hsu's MCB method 3-7
3-40
multiple comparisons with an
unbalanced nested design,
example 3-46
sequential sums of squares 3-43
global support xiv
gompit link function 2-36, 2-46
goodness-of-fit statistics 15-13
goodness-of-fit test 16-14
binary logistic regression 2-43
chi-square 6-19
nominal logistic regression 2-58
ordinal logistic regression 2-50
grand median 5-20
K
K-means clustering 4-32
data 4-33
example 4-35
initialize the process 4-34
options 4-33
Kaplan-Meier survival estimates 15-4,
15-56
I and MR and R/S chart 12-24
I and MR chart 12-34
I charts 12-29
I-MR chart 12-34
I-MR-R/S (between/within) chart
5-13
kurtosis 1-6
12-24
identify outliers 2-9
individual desirability 23-6
individual error rate, multiple
comparisons 3-7
individual observations control chart
12-29, 12-34
14-30
24-29
Latin square with repeated measures
design 3-24
21-14
linear discriminant analysis 4-18
linear model
factorial designs 19-47
mixture designs 21-41
regression 2-25
response surface designs 20-29
trend 7-6
linearity, gage linearity and accuracy
M
MA charts 12-41
MAD 7-8
Mahalanobis distance 4-18
main effects 19-4, 19-24, 19-52
main effects plot 3-66
example 3-67
factorial designs 19-53
mixture designs 21-44
Mann-Whitney test, 2-sample 5-11
MANOVA
see multivariate analysis of
variance
MAPE 7-8
master measurement 11-27
matrix of interaction plots 3-68
maximum 1-5
maximum likelihood estimates 15-4,
15-44
11-3
link functions
binary logistic regression 2-36
ordinal logistic regression 2-46
linkage methods
cluster observations 4-24
cluster variables 4-30
log-likelihood
binary logistic regression 2-42
nominal logistic regression 2-57
ordinal logistic regression 2-50
logistic regression
binary 2-33
model restrictions 2-30
nominal 2-51
ordinal 2-44
overview 2-29
reference event 2-31
reference level 2-31
specify model terms 2-30
worksheet structure 2-32
logistic regression table
binary logistic regression 2-42
nominal logistic regression 2-57
ordinal logistic regression 2-50
logit link function 2-36, 2-46
1-6
mean squared deviation
see MSD
mean, using historical values
(C chart, U chart) 13-14
variables control charts 12-64
measurement system variation
components of 11-4
diagram of components 11-4
measurement systems analysis 11-1
overview 11-2
measures of association
binary logistic regression 2-43
ordinal logistic regression 2-50
median 1-6
linkage 4-25
of moving range 12-67, 14-18
polish 8-5
test 5-3, 5-16
21-32
contour plot example 21-53
creating 21-5
creating, example 21-20, 21-22
display order 21-35
displaying 21-35
displaying results 21-40
fitting a model 21-38
linear constraints 21-14
lower and upper bound constraints
21-13
mixture-amounts designs 21-11
mixture-process variable designs
21-15
modifying 21-31
naming components 21-12,
21-31
naming process variables 21-32
optimal designs 22-2
optimization example 23-16
optimizing responses 23-2
options, analyzing 21-39
options, creating 21-7
overview 21-2
plots 21-45
process variables 21-15
randomizing 21-19, 21-34
renumbering 21-34
replicating 21-9, 21-33
simplex design plot 21-24
simplex design plot example
21-27
21-53
units for components 21-36
using actual measurements 21-11
worksheet display 21-35
mixture-amounts designs 21-11
mixture-process variable designs 21-15
fractionating 21-15
model parameters, estimate 16-28
model specification
factorial designs 19-47
response surface designs 20-29
model terms
logistic regression 2-30
specifying 3-21
modifying designs
factorial 19-38
mixture 21-31
response surface 20-20
Taguchi 24-18
Mood's median test for a one-way
design 5-16
moving average 7-18
centering values 7-19
determining the length 7-19
forecasting 7-20
time series plot 7-18
moving average control chart 12-41
calculating the moving average
3-8
one-way ANOVA example 3-9
pairwise comparisons 3-41
multiple correspondence analysis 6-31
multiple degrees of freedom test,
regression with life data 16-28
multiple regression 2-3
multiple response optimization 23-2
numerical optimization 23-2
optimization plot 23-2
overlaid contour plot 23-19
multiplicative model 7-12
multiply censored data 15-6
multivariate analysis 4-1
overview 4-2
multivariate analysis of variance 3-26
balanced 3-51
example 3-54
general 3-57
general, example 3-58
general, nesting 3-57
specify terms to test 3-53, 3-59
tests 3-54
12-43
moving range control chart 12-24,
example 2-55
model 2-54
options 2-52
parameter estimates, interpreting
2-54
Session window output description
2-56
worksheet structure 2-32
nominal specification for capability
analysis 14-8
non-normal data 14-6
with control charts 12-6, 12-68
nonparametric distribution analysis
15-3, 15-52
actuarial survival estimates 15-4,
15-58
arbitrarily censored data 15-54
density function 15-60
draw a hazard plot, arbitrary
censoring 15-63
draw a hazard plot, right censoring
15-62
hazard function 15-57, 15-60
hazard plots 15-62
Kaplan-Meier survival estimates
15-4, 15-56
nonparametric survival plots
15-61
options 15-54
request actuarial estimates 15-60
right censored data 15-53
survival curve, comparing in
survival probabilities 15-56
Turnbull survival estimates 15-4,
17-12
nearest neighbor cluster distance 4-24
nested factors 3-19, 3-26, 3-37, 3-51,
3-57
15-62
15-58
uncensored/right censored data
15-53
nonparametric survival plots,
nonparametric distribution analysis
15-61
nesting
in ANOVA 3-49
in general MANOVA 3-57
in GLM 3-37
noise factors 24-5
nominal is best, analyze Taguchi design
24-29
nominal logistic regression 2-51
data 2-51
nonparametrics 5-1
overview 5-2
normal probability plot 1-43, 14-24
normality test 1-43
example 1-44
normit link function 2-36, 2-46
NP charts 13-7
number of defectives control chart 13-7
O
one proportion
confidence interval 1-26
example 1-29
method 1-28
power 9-7
sample size 9-7
test 1-26
one-sample
sign test 5-3
Wilcoxon test 5-7
one-sample t
confidence interval 1-15
example 1-17
method 1-17
power 9-4
sample size 9-4
sample size example 9-6
test 1-15
one-sample Z
confidence interval 1-12
example 1-14
method 1-14
power 9-4
sample size 9-4
test 1-12
one-way analysis of variance 3-5
power 9-10
power example 9-12
sample size 9-10
stacked data 3-5
unstacked data 3-6
one-way table 6-3
optimal designs 22-2
augmenting 22-9
augmenting example 22-16
D-optimal 22-6, 22-14
distance-based 22-6, 22-14
evaluating 22-18
evaluating example 22-21
overview 22-2
selecting 22-2
2-47
Session window output description
2-49
worksheet structure 2-32
orthogonal array designs 24-1
orthogonal arrays 24-2, 24-4
summary 24-14
orthomax rotation method 4-10
outliers, identify in regression 2-9
overall variation 14-5, 14-10, 14-17
overlaid contour plots 23-19
factorial example 23-24
mixture example 23-27
response surface example 23-26
P
P charts 13-4, 14-38
p, using historical values (NP chart, P
chart) 13-14
p-value 1-15, 1-18
paired t-test
confidence interval 1-22
example 1-25
method 1-24
test 1-22
pairwise
averages 5-24
differences 5-25
slopes 5-26
parametric distribution analysis 15-3,
15-27
arbitrarily censored data 15-29
comparing parameters 15-34
control estimation of parameters
15-43
draw parametric survival plot
15-41
15-4, 15-42
fitting a distribution 15-32
hazard plots 15-41
modify default probability plot
15-38
options 15-30
percentiles 15-36
probability plots 15-37
request additional percentiles
15-36
request parametric survival
probabilities 15-40
right censored data 15-28
survival plots 15-40
survival probabilities 15-39
uncensored/right censored data
15-28
Pareto chart 10-11
data limitations 10-12
numeric data 10-11
partial autocorrelation 7-41
partial correlation coefficient 1-40
example 1-40
PCI 14-1, 14-5, 14-6, 14-24, 14-34
Pearl-Reed logistic trend model 7-7
Pearson correlation coefficient 15-13
example 1-39
Pearson product moment 1-37
percentiles
accelerated life testing 16-16
parametric distribution analysis
15-36
probit analysis 17-8
regression with life data 16-27
percents, tally 6-12
Piepel's direction 21-47
Pillai's test 3-54
Plackett-Burman designs 19-4
creating 19-24
example 19-26
options 19-26
power 9-13
replicating 19-26
sample size 9-13
plot data means 3-66
14-24
16-26
Poisson
analysis of means 3-14
control charts 13-9, 13-12
distribution, process capability
15-37
probit analysis 17-10
regression with life data 16-14,
14-41
polynomial regression 2-24
model choices 2-25
pooled standard deviation 12-67,
16-26
14-10, 14-17
potential variation 14-5
power 9-1
1 proportion 9-7
1-sample t 9-4
1-sample Z 9-4
2 proportions 9-7
2 proportions example 9-9
2-sample t 9-4
definition 9-2
estimating sigma 9-6
factorial design example 9-15
factorial designs 9-13
factors that influence 9-3
one-way ANOVA 9-10
one-way ANOVA example 9-12
overview 9-2
Plackett-Burman designs 9-13
two-level factorial designs 9-13
Pp 14-5, 14-6, 14-10, 14-17, 14-21
Ppk 14-5, 14-6, 14-10, 14-17, 14-21
PPL 14-5, 14-6, 14-10, 14-17, 14-21
Ppm 14-5, 14-6
PPU 14-5, 14-6, 14-10, 14-17, 14-21
precision of process measurements
11-2, 11-3
predicting results, Taguchi designs
24-35
prediction
group membership, discriminant
analysis 4-19
responses, regression 2-12
principal components analysis 4-3
data 4-3
example 4-5
nonuniqueness of coefficients 4-4
options 4-4
13-4
proportions, mixture designs 21-35
prospective study 9-2
pseudo-center points 19-12
pseudocomponents 21-13, 21-35,
21-36
pure error 19-53, 20-31
lack-of-fit test 2-8
Weibull 14-34
probit analysis 17-2
confidence intervals 17-8, 17-10
control estimation of parameters
17-12
cumulative probabilities 17-9
distribution function 17-7
example 17-13
factor variables 17-11
model parameters, estimating
17-11
natural rate response 17-12
options 17-4
overview 17-2
percentiles 17-8
performing 17-3
probability plots 17-10
reference levels 17-11
survival plots 17-10
survival probabilities 17-9
table of percentiles, modify 17-8
probit link function 2-36, 2-46
process capability 14-1
binomial distribution 14-37
capability plot 14-24, 14-34
overview 14-2
Poisson distribution 14-41
report 14-6, 14-19
sixpack combination graph 14-24
process capability statistics 14-4,
14-10, 14-17
Q
QQ plots 1-43
quadratic discriminant analysis 4-18
quadratic model
mixture designs 21-41
regression 2-25
response surface designs 20-32
trend 7-6
quality control graphs 10-1, 12-1,
13-1, 14-1
quality planning tools 10-1
overview 10-2
quantile-quantile plots 1-43
quartiles 1-6
quartimax rotation method 4-10
R
R and I and MR chart 12-24
R and X-bar chart 12-19
R chart 12-14, 12-24, 14-24, 14-30,
14-34
R-squared 2-11
random factors 3-20, 3-26, 3-51
randomized block design 3-23
randomized block experiment 5-3,
5-18
randomness, tests for 5-22, 10-5
ranges in R charts 12-14
reduced models, specify 3-22
reference event, logistic regression 2-31
reference levels
16-27
example 16-30
factor variables 16-25
how to specify the model terms
16-24
interpreting the regression
equation 16-13
16-28
options 16-22
overview 16-2
percentiles 16-27
probability plots 16-14, 16-26
reference factor level, change
16-26
reference levels 16-25
survival probabilities 16-27
uncensored/arbitrarily censored
data 16-22
uncensored/right censored data
16-21
worksheet structure 16-3
repeatability, gage R&R 11-3
repeated measures design
example 3-31
Latin square 3-24
replication
factorial designs 19-8, 19-12,
19-39
mixture designs 21-9, 21-33
Plackett-Burman designs 19-26
response surface designs 20-22
reproducibility, gage R&R 11-3
residual analysis 2-5
residual plots 2-6, 2-27, 20-27
data 2-27
example 2-28
options 2-27
resistant line 8-9
resistant smoothers 8-10
resolution of factorial designs 19-6,
19-29
response information
binary logistic regression 2-42
nominal logistic regression 2-56
ordinal logistic regression 2-49
response optimization 23-2, 23-19
factorial design example 23-12
mixture design example 23-16
numerical optimization 23-2
optimization plot 23-2
response surface design example
23-14
response surface designs 20-1
analyzing 20-26
analyzing example 20-30, 20-32
20-37
response surface methods 20-1
response surface plots
factorial designs 19-60
mixture designs 21-49
response surface designs 20-34
response trace plot 21-45
example 21-47
restricted form of mixed models 3-28
example 3-33
restricted model in ANOVA 3-33
results, predicting for Taguchi designs
24-35
retrospective study 9-2
right censored data 15-5
distribution ID plot 15-10
distribution overview plot 15-20
nonparametric distribution
analysis 15-53
parametric distribution analysis
15-28
robust designs 24-1
overview 24-2
robust parameter design 24-2
rootogram, suspended 8-12
rotatable designs 20-8
row contributions 6-29
row plot 6-26, 6-30
row profiles 6-29
Roy's largest root test 3-54
RSM 20-1
run chart 10-2, 14-24, 14-34
run order 19-17, 20-11, 21-19
runs test 5-22
1-sample 5-3
for the median 5-5
signal factor
adding 24-8, 24-19
adding levels 24-20, 24-21
ignoring 24-20
modifying 24-21
signal-to-noise ratio 24-29
signed rank test 5-7
simple correspondence analysis 6-21
simple linear regression 2-3
simplex centroid design 21-4
analyzing example 21-43
creating 21-5
creating example 21-20
simplex design plot example
S
S and I and MR chart 12-24
S and X-bar chart 12-22
S chart 12-17, 12-24, 14-30
S-curve trend model 7-7
S/N ratio 24-29
sample data sets xviii
sample size 9-1
1 proportion 9-7
1-sample t 9-4
1-sample t example 9-6
1-sample Z 9-4
2 proportions 9-7
2-sample t 9-4
estimating sigma 9-6
factorial designs 9-13
one-way ANOVA 9-10
overview 9-2
Plackett-Burman designs 9-13
two-level factorial designs 9-13
scale parameter, capability analysis
21-27
simplex design plot 21-24
simplex lattice design 21-4
creating 21-5
single linkage 4-24
singly censored data 15-6
skewed data with control charts 12-6,
12-68
14-22
screening designs 19-2
selecting a forecasting or smoothing
method 7-2
sequential sums of squares 3-43, 3-57
Shainin multi-vari charts 10-17
shape parameter, capability analysis
14-22
skewness 1-6
smaller is better, analyze Taguchi
design 24-29
smoothers, resistant smoothing 8-11
smoothing method, how to select 7-2
Spearman's ρ 1-39
special causes, tests for 12-66, 13-15,
14-25
special cubic model, mixture designs
21-41
special quartic model, mixture designs
21-41
5-5
sign scores test 5-3, 5-16
sign test 5-4
3-23
specification limits 14-8, 14-20
specify
length of moving range 12-67
model terms 3-21
model terms, regression with life
data 16-24
reduced models 3-22
terms involving covariates 3-22
11-3
standard deviation 1-5
standard deviations control chart
12-17, 12-22
standard error bars 3-63
standard error of mean 1-5
standard order 19-17, 20-11, 21-19
standardized control chart 12-54
star points 20-9
static designs 24-2, 24-20
analyzing 24-29
stem-and-leaf plot for exploratory data
analysis 8-1
stepwise regression 2-14
backwards elimination 2-17
data 2-14
example 2-18
forward selection 2-17
method 2-16
options 2-15
user intervention 2-17
variable selection procedures 2-18
store descriptive statistics 1-9
comparing display and storage 1-4
study variation, gage R&R 11-10
subgroup indicators in control charts
12-4
subgroup means control chart 12-11,
12-19, 12-22
subgroup ranges control chart 12-14,
12-19
subgroup sizes unequal, defectives
control charts 13-3
subgroup standard deviations control
chart 12-17, 12-22
subgroups data control charts 12-10,
12-36
sums of squares 1-6, 19-52, 20-33
adjusted 3-57
sequential 3-57
supplementary rows 6-30
support, customer xiv
surface (wireframe) plot
15-62
survival plots
nonparametric distribution
analysis 15-61
parametric distribution analysis
15-40
probit analysis 17-10
survival probabilities
accelerated life testing 16-16
nonparametric distribution
analysis 15-56
parametric distribution analysis
15-39
probit analysis 17-9
regression with life data 16-27
suspended rootogram 8-12
symmetric plot, simple correspondence
analysis 6-26
symmetry plot 10-20
T
t-test
one sample confidence interval
1-15
one sample test 1-15
paired data 1-22
two sample confidence interval
1-18
two sample test 1-18
T² test statistic 3-54
table of coefficients in regression
output 2-11
tables 6-1, 6-3
arrangement of input data 6-3
overview 6-2
Taguchi designs 24-1, 24-2, 24-4
analyzing 24-23
choosing 24-4
creating 24-4
defining custom 24-17
displaying 24-21
estimating interactions 24-10
modifying 24-18
planning 24-3
predicting results 24-35
summary 24-14
Taguchi, Genichi 24-2
tally unique values 6-12
technical support xiv
test and confidence interval
1 proportion 1-26
1-sample t 1-15
1-sample Z 1-12
2 proportions 1-30
2-sample t 1-18
test equality of medians 5-13, 5-16
test for
equal variances 1-34
equal variances example 1-36
equality 3-60
homogeneity of variance 3-60
median, 1-sample Wilcoxon 5-8
randomness 10-5
test for association (independence),
chi-square 6-14
tests for special causes 12-64, 12-66,
13-15, 14-25
defining 12-5, 12-64
time series 7-1
ARIMA 7-44
ARIMA modeling 7-4
autocorrelation 7-38
correlation analysis 7-4
cross correlation 7-43
decomposition 7-10
differences between data values
7-35
double exponential smoothing
7-25
lag 7-36
moving average 7-18
overview 7-2
partial autocorrelation 7-41
plot 7-1
simple forecasting and smoothing
methods 7-2
smoother 8-10
15-58
two proportions
confidence interval 1-30
example 1-33
method 1-32
power 9-7
power example 9-9
sample size 9-7
test 1-30
two variances 1-34
example 1-36
two-level designs 19-6
two-level factorial designs
adding factors 19-9
analyzing 19-44
creating 19-7
options 19-8
power 9-13
power example 9-15
sample size 9-13
two-level full factorial designs, example
with replicates 19-50
two-sample Mann-Whitney test 5-11
two-sample t
U
U chart 13-12, 14-41
unequal subgroup sizes
defectives control charts 13-3
(R chart, Xbar-R chart, Xbar-S
chart, S chart, Xbar chart) 12-10
univariate analysis of variance 3-26,
3-37
unrestricted form of mixed models
3-28
example 3-33
unusual observations in regression 2-12
utility transfer function 23-6
V
V-mask 12-44
variable selection with stepwise
regression 2-18
variables control charts 12-1
add rows of tick labels 12-72
between/within chart 12-24
Box-Cox transformation for
non-normal data 12-5
control charts for data in
subgroups 12-10
control charts for individual
observations 12-28
control charts for short runs 12-54
12-70
CUSUM chart 12-44
defining tests for special causes
12-5
estimate control limits and center
line independently for different
groups 12-61
EWMA chart 12-37
force control limits and center line
to be constant 12-68
I (individuals) chart 12-29
I-MR chart 12-34
I-MR-R/S chart 12-24
moving average chart 12-41
moving range chart 12-32
omit subgroups from estimate of μ or σ 12-66
options 12-66
overview 12-2
R chart 12-14
S chart 12-17
tests for special causes 12-64
time stamp 12-72
use historical values of μ and σ 12-64
X-bar and R chart 12-19
X-bar and S chart 12-22
X-bar chart 12-11
Z-MR chart 12-54
zone chart 12-48
variance 1-6
inflation factor 2-7
test 3-60
test example 3-62
test for equality 1-34
varimax rotation method 4-10
VIF 2-7
Weibull distribution
capability analysis 14-21
control charts 14-34
Weibull probability plot 14-34
weighted least squares regression 2-6
Wilcoxon
signed rank test 5-7
test, 1-sample 5-7
Wilks' test 3-54
Winters exponential smoothing 7-30
additive model 7-32
choosing weights 7-32
forecasting 7-33
multiplicative model 7-32
wireframe plots 19-60, 20-34
within-subgroups variation 14-5,
14-10, 14-17
worksheet structure
accelerated life testing 16-3
arbitrarily censored data 15-8
frequency column 15-7
multiply censored data 15-6
probit analysis 17-2
regression with life data 16-3,
16-20
right censored data 15-5
singly censored data 15-6
stacked vs. unstacked data 15-8
worksheet structure, logistic regression
2-32
WWW address xiv
X
X-bar and R chart 12-19
X-bar and S chart 12-22
X-bar chart 12-11, 14-24, 14-34
Z
Z and MR chart 12-54
Z-test
one-sample confidence interval
1-12
one-sample test 1-12
zone control chart 12-48
comparing with a Shewhart chart
12-52