Analyse-It CompleteGuide PDF
Preface: Welcome to Analyse-it 11
Part I: Administrator's Guide 13
Chapter 1: Installation 15
System requirements 15
Installing the software 15
Installing the software for concurrent use 16
Transferring the software to another computer 16
Uninstalling the software 17
Chapter 2: Licensing 19
Activating a license 19
Deactivating a license 21
Acquiring a concurrent license 22
Releasing a concurrent license 22
Finding out who's using a concurrent license 22
Chapter 3: Software updates 25
Maintenance 25
Updating the software 25
Checking maintenance expiry and renewing 26
Chapter 4: Troubleshooting 27
Checking the version installed 27
Enabling the add-in 27
Repairing the installation 28
Contacting technical support 29
Part II: User's Guide 31
Setting the minimum, maximum and units of a variable 39
Ordering categorical data 40
Setting the number format of a variable 40
Labeling cases 41
Assigning labels to categories 41
Assigning colors/symbols to categories 41
Transforming variables 42
Transform functions 42
Analyzing a subset of the data 42
Chapter 9: Distribution 55
Continuous distributions 55
Univariate descriptive statistics 55
Calculating univariate descriptive statistics 56
Univariate plot 57
Dot plot 57
Box plot 57
Mean plot 58
Creating a univariate plot 58
Frequency distribution 58
Cumulative distribution function plot 59
Creating a CDF plot 59
Histogram 59
Creating a histogram 60
Normality 60
Normal distribution 60
Normal probability (Q-Q) plot 61
Creating a normal probability plot 61
Normality hypothesis test 61
Tests for normality 62
Testing the normality of a distribution 62
Central limit theorem and the normality assumption 62
Inferences about distribution parameters 63
Point and interval estimation 63
Parameter estimate 64
Estimators for the central location parameter of a distribution 65
Estimators for the dispersion parameter of a distribution 65
Estimating the parameters of a distribution 65
Hypothesis testing 65
Parameter hypothesis test 66
Parameter equivalence hypothesis test 66
Tests for the central location parameter of a distribution 67
Tests for the dispersion parameter of a distribution 67
Testing distribution parameters 67
Testing equivalence of distribution parameters 68
Discrete distributions 69
Frequency distribution 69
Frequency table 69
Frequency plot 69
Whole-to-part plot 70
Creating a frequency plot 70
Creating a whole-to-part plot 70
Inferences about Binomial distribution parameters 71
Binomial distribution parameter estimate 71
Estimators for the parameter of a Binomial distribution 71
Binomial distribution parameter hypothesis test 71
Tests for the parameter of a Binomial distribution 71
Testing Binomial distribution parameters 72
Inferences about Multinomial distribution parameters 72
Multinomial distribution parameters hypothesis test 72
Tests for the parameters of a Multinomial distribution 73
Testing Multinomial distribution parameters 73
Study design 74
Chapter 10: Compare groups 75
Calculating univariate descriptive statistics, by group 75
Side-by-side univariate plots 76
Creating side-by-side univariate plots 76
Equality of means/medians hypothesis test 76
Equivalence of means hypothesis test 77
Tests for means/medians 77
Testing equality of means/medians 78
Testing equivalence of means 79
Difference between means/medians effect size 79
Estimators for the difference in means/medians 79
Estimating the difference between means/medians 80
Multiple comparisons 80
Mean-Mean scatter plot 81
Multiple comparison procedures 82
Comparing multiple means/medians 83
Homogeneity of variance hypothesis test 84
Tests for homogeneity of variance 84
Testing homogeneity of variance 84
Study design 86
Chapter 11: Compare pairs 87
Difference plot 87
Creating a Tukey mean-difference plot 88
Equality of means/medians hypothesis test 88
Equivalence of means hypothesis test 88
Tests for means/medians 89
Testing equality of means/medians 90
Testing equivalence of means 90
Difference between means/medians effect size 91
Estimators for the difference in means/medians 91
Estimating the difference between means/medians 91
Study design 93
Chapter 12: Contingency tables 95
Contingency table 95
Creating a contingency table 96
Creating a contingency table (related data) 96
Grouped frequency plot 97
Effect size 97
Estimators 97
Estimating the odds ratio 98
Estimating the odds ratio (related data) 98
Relative risk 99
Inferences about equality of proportions 99
Equality of proportions hypothesis test 99
Exact and asymptotic p-values 100
Wald, Score, Likelihood ratio 100
Tests for equality of proportions (independent samples) 101
Testing equality of proportions (independent samples) 101
Tests for equality of proportions (related samples) 101
Testing equality of proportions (related samples) 102
Inferences about independence 102
Independence hypothesis test 102
Continuity correction 103
Tests for independence 103
Testing independence 103
Mosaic plot 104
Creating a mosaic plot 104
Study design 105
Chapter 14: Principal component analysis (PCA) 115
Principal components 115
Scree plot 116
Calculating principal components 116
Biplot 116
Monoplot 118
Creating a biplot 119
Creating a correlation monoplot 119
Fitting an advanced logistic model 144
Fitting a simple probit regression 145
Parameter estimates 145
Odds ratio estimates 146
Effect of model hypothesis test 146
Effect of term hypothesis test 146
Study design 147
Estimating the bias of a measurement system 178
Testing against an assigned value 179
Testing bias against allowable bias 180
Linearity 180
Estimating the linearity of a measurement procedure 181
Interferences 182
Detection capability 183
Limit of blank (LoB) 183
Limit of detection (LoD) 183
Estimating the detection limit of a measurement system 183
Estimating the detection limit using a precision profile 184
Estimating the detection limit using a probit fit 185
Limit of quantitation (LoQ) 185
Study design 186
Xbar-R chart 211
Xbar-S chart 211
Xbar chart 211
R chart 211
S chart 212
I-MR chart 212
I chart 212
MR chart 213
Creating a Xbar-R / S control chart 213
Creating an I-MR control chart 214
Shewhart attributes control charts 215
NP chart 215
P chart 216
C chart 216
U chart 216
Creating an NP / P control chart 217
Creating a C / U control chart 217
Shewhart control chart rules 218
Applying rules to a Shewhart control chart 219
Formatting a control chart 219
Labeling control chart points 220
Displaying stratification in a control chart 220
Applying phases/stages for each change in a process 220
Time-weighted control charts 221
Uniformly Weighted Moving Average (UWMA) chart 221
Exponentially Weighted Moving Average (EWMA) chart 222
Creating a UWMA control chart 223
Creating an EWMA control chart 223
CUmulative SUM (CUSUM) chart 224
Creating a CUSUM control chart 225
Study design 226
Preface: Welcome to Analyse-it
Analyse-it brings powerful statistical analysis and data visualization into Microsoft Excel.
This guide provides information to help you understand important concepts and instructions to
accomplish tasks in Analyse-it. In addition to this book, other resources are available to help you:
• Tutorials let you quickly get started using the main features of Analyse-it with real-world
examples. To view tutorials anytime, open Analyse-it and on the Analyse-it ribbon tab, choose
Analyse-it > Tutorials.
• On-screen help contains all the information in this book at your fingertips. To open on-screen
help, open Analyse-it and on the Analyse-it ribbon tab, choose Analyse-it > Help.
• Context-sensitive help provides brief text descriptions for most on-screen items. To see a help
tag, hover the mouse pointer over a user interface item for a few seconds.
• Our support website provides all the latest information: www.analyse-it.com/support.
Part I: Administrator's Guide
Chapter 1: Installation
You must install the software on your computer before you can use it.
System requirements
Hardware and software requirements to run Analyse-it.
Installing the software for concurrent use
Install the software for a concurrent-user license.
Note: Installation for concurrent-user licenses differs from versions earlier than 3.00. The
software is now installed and activated on each computer, and your internet connection is used
to manage concurrent use. Earlier versions installed the software on a network share and used
local license control. The new method reduces network traffic, allows installation onto laptops
and computers that are not on the same network, and, depending on policy, allows end-users to
install the software without systems administrator assistance.
• Ensure that your computer meets the system requirements.
• Ensure that you have the necessary administrative rights to install the software.
• Ensure that TCP port 443 (HTTPS) on the firewall is open for communication to
secure.analyse-it.com on each computer. Analyse-it uses your internet connection to communicate
with our license server to manage the concurrent use of the software.
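The firewall prerequisite above can be checked before installing with a short script. The following is a minimal sketch, not part of Analyse-it: the function name can_reach and its default timeout are assumptions made for illustration. It only tests whether a TCP connection can be opened; it does not verify proxy settings or the license service itself.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and attempts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example (requires network access), using the host named in this guide:
# print(can_reach("secure.analyse-it.com"))
```

If the call returns False, ask your systems administrator to check the firewall rule for outbound HTTPS traffic to that host.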
1. To install the software:
• From a downloaded .EXE file, double-click the file you downloaded.
• From a CD-ROM, insert the CD into the CD-ROM drive. The installer should automatically
start, but if not, click Start > Run, and then type: D:\SETUP.EXE (where D: is the drive
letter of your CD-ROM drive).
The Setup window opens.
2. Click Next.
The installer checks that your computer meets the system requirements and, if necessary, installs
Microsoft .NET 4.0.
3. Read the End-User License Agreement (EULA) and then:
• If you agree, select I accept the terms of the License Agreement, and then click Next.
• If you have any questions, contact [email protected]. Click Cancel to abort the
installation.
4. Optional: If you want to change the installation folder, click Change and select the folder.
5. Click Next.
6. Click Install.
Wait while the software installs on your computer.
7. Click Close.
8. Repeat installation on each computer where you want to use the software. There is no limit to
the number of installations as our license server manages the concurrent use of the software.
Analyse-it automatically starts whenever you start Microsoft Excel, but a license is only acquired
when you use a command on the Analyse-it ribbon tab.
To use the software, you must activate a license.
Uninstalling the software
Uninstall the software to remove it from your computer.
Note: You do not need to uninstall the software to install an update.
1. Start the Windows Control Panel.
2. Open the list of installed programs:
• On Windows XP, double-click Add/Remove Programs
• On Windows Vista or Windows 7, and later, click Programs, and then click Programs and
Features.
3. In the list of installed applications, select Analyse-it for Microsoft Excel, and then click
Uninstall.
Wait while the software uninstalls from your computer.
Chapter 2: Licensing
License activation helps you meet the terms of the software license agreement and reduces piracy,
ensuring continued development of the software.
When you request a 30-day trial or purchase a license to use the software you receive a product
key. The product key identifies features of the software you can use, the length of time you can
use it for, and how many users can use it.
For all products we offer either:
• A perpetual license that does not expire and includes 1 year of software maintenance, so you
receive all software updates within the year. You can renew maintenance to continue receiving
software updates, but it is not mandatory. If you do not renew maintenance you can continue
using the software without further cost, although you only have access up to the last version
released within your maintenance period.
• An annual license that expires 1 year after the initial activation and includes software
maintenance, so you receive all software updates within the year. After expiry, you must renew
the license to continue using the software.
There are two types of license seats:
• Fixed, per-user licenses allow a fixed number of users to install and use the software on their
computer, and optionally on their laptop computer for use when out of the office. The number
of per-user licenses determines the number of users who can install the software. For example, a
2-user license allows two users to install and use the software. The user can transfer the license
to another computer when replacing computers.
• Floating, concurrent-user licenses allow an unlimited number of users to install the software
on their computer or laptop computer, but limit the maximum number of users of the
software at any time. For example, a 5 concurrent-user license allows any number of users to
install the software, but only allows up to 5 users to use it at the same time. All computers
must have internet access so the software can manage concurrent use.
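The difference between installing and using the software under a concurrent-user license can be pictured with a small model. The sketch below is a toy illustration only and does not describe how the Analyse-it license server is implemented; the class ConcurrentLicensePool and the user names are hypothetical.

```python
class ConcurrentLicensePool:
    """Toy model of a floating license: unlimited installs, capped simultaneous use."""

    def __init__(self, seats):
        self.seats = seats      # maximum number of simultaneous users
        self.in_use = set()     # users currently holding a license

    def acquire(self, user):
        """Give the user a license if one is free; return True on success."""
        if user in self.in_use:
            return True         # the user already holds a license
        if len(self.in_use) >= self.seats:
            return False        # all licenses are currently in use
        self.in_use.add(user)
        return True

    def release(self, user):
        """Return the user's license to the pool."""
        self.in_use.discard(user)


# A 5 concurrent-user license with six people trying to work at once.
pool = ConcurrentLicensePool(seats=5)
results = [pool.acquire(f"user{i}") for i in range(1, 7)]
print(results)                  # the sixth request is refused

pool.release("user1")           # one user finishes and releases the license
print(pool.acquire("user6"))    # the sixth user can now acquire it
```

Any number of users can be "installed" in this model; only the number holding a license at the same moment is limited.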
When you activate a license on your computer the product key and a site identifier are sent to
our activation server. The site identifier is a non-unique string generated using the computer's
hardware configuration. In the request, we also include diagnostic information such as the version
of Microsoft Windows and Microsoft Excel you are using, plus some performance metrics of your
computer. We use the information to shape the future development of Analyse-it. We do not
collect any personal information except the name, organization and e-mail address you enter
during activation.
Activating a license
Activate a license to use the software on your computer.
• Ensure that you have the product key. If you have lost your product key, you can retrieve it at
http://analyse-it.com/support/lost-key.
• Ensure that TCP port 443 (HTTPS) on the firewall is open for communication to
secure.analyse-it.com on your computer. Analyse-it uses your internet connection to simplify activation.
1. Start Microsoft Excel to start Analyse-it.
2. On the Analyse-it ribbon tab, click Analyse-it, and then click Activate.
The Activate window opens.
3. Click Next.
The Activate window shows the license details.
4. Enter the Product Key and the Name, Organisation, and E-mail of the registered user of the
software.
The registered user may occasionally receive an e-mail with important information about
updates to Analyse-it. They can opt-out via the unsubscribe link in the e-mail they receive.
5. Click Next.
The software uses your internet connection to verify the product key and activates the license
on your computer.
6. If activation fails for any reason you must activate manually:
a) Click Get Activation Code.
Your browser opens the manual activation page at http://analyse-it.com/activate. If your
computer does not have an internet connection, you can open the page on any computer
or device with an internet connection.
b) On the manual activation page, enter the Product Key and Site ID from the Activate
window.
Note: The Site ID is unique to each computer. Do not contact technical support asking for
a Site ID as we cannot provide it. It only shows on the Activate window in the software.
c) Enter the Name, Organisation, and E-mail of the registered user of the software.
d) Click Get Activation Code.
The activation code shows in your browser window.
e) In the Activate window, enter the Activation code from your browser window.
f) Click Next.
7. Click Close.
Deactivating a license
Deactivate the license before you move the software to a different computer.
• Ensure that TCP port 443 (HTTPS) on the firewall is open for communication to
secure.analyse-it.com on your computer. Analyse-it uses your internet connection to simplify deactivation.
1. Start Microsoft Excel to start Analyse-it.
2. On the Analyse-it ribbon tab, click Analyse-it, and then click Deactivate.
The Deactivate window opens.
3. Click Next.
The software uses your internet connection to deactivate the license.
4. If deactivation fails for any reason you must manually deactivate:
a) Click Deactivate to manually deactivate the license.
Your browser opens the manual deactivation page at http://analyse-it.com/deactivate. If
your computer does not have an internet connection, you can open the page from any
computer or device with an internet connection to complete the deactivation of the license.
b) On the manual deactivation page, enter the Product Key and Site ID from the Deactivate
window.
Note: The Site ID is unique to each computer. Do not contact technical support asking for
a Site ID as we cannot provide it. It only shows on the Deactivate window in the software.
c) In the Deactivate window, click Next.
5. Click Close.
You can no longer use the software on your computer.
Acquiring a concurrent license
Acquire a concurrent-use license to use the software.
The software acquires a license when you use a command on the Analyse-it ribbon tab, so you do
not normally need to acquire a license manually.
• Ensure that you have activated the software with a concurrent-user license.
• Ensure that TCP port 443 (HTTPS) on the firewall is open for communication to
secure.analyse-it.com on each computer. Analyse-it uses your internet connection to communicate
with our license server to manage the concurrent use of the software.
1. Start Microsoft Excel to start Analyse-it.
2. On the Analyse-it ribbon tab, click Analyse-it, and then click Acquire.
The software connects to our license server and acquires a license so you can use the software.
If a license is acquired, you can now use the software. If all licenses are currently in use, you can
try again later, ask others to release their license if they are no longer using the software, or
consider purchasing more concurrent-user licenses to avoid the problem in future.
3. Click Close.
Chapter 3: Software updates
Software updates keep your software up to date with the latest features and bug fixes, and keep it
working when Microsoft releases a new version of Microsoft Excel.
We regularly make updates to the software:
• Major updates include new features and important bug fixes.
• Minor updates address bug fixes that only affect a few customers or special situations.
The software periodically uses your internet connection to check for updates and notifies you if
an update to the software is available. We do not notify you of minor updates that affect only a
few customers; instead, we notify only the affected customers, directly by e-mail or through the
crash reporting system built into the software.
You must have active maintenance to download and install updates to the software.
Maintenance
Maintenance provides you with access to all software updates released during your maintenance
period.
If you purchase a perpetual license, the initial purchase includes 1 year of maintenance. Renewing
maintenance is optional. We notify you by e-mail 30 days before maintenance on your licenses
expires and again on the day it expires. If maintenance expires you can continue to use the
software but you do not receive any updates released after the maintenance expiry date.
If you purchase an annual license, maintenance is included and terminates when the license
expires. You must renew the license to continue using the software.
When installing an update do not uninstall the existing version. If you do, the license is
deactivated, and you must activate the license again before you can use the software.
5. Click Close.
Chapter 4: Troubleshooting
If Analyse-it does not start when you start Microsoft Excel, you should troubleshoot the
installation.
You can determine if Analyse-it is running by looking for the Analyse-it tab in the Excel ribbon. If
you cannot see the tab you should follow these steps:
4. If Analyse-it appears in the Disabled Application Add-ins section of the Add-ins list:
a) In the Manage drop-down list, select Disabled Items, and then click Go....
The Disabled Items window opens.
b) Select Add-in: Analyse-it for Microsoft Excel.
c) Click Enable and then click Close.
The COM Add-Ins window closes.
d) Click OK.
e) Exit and restart Microsoft Excel.
Analyse-it should automatically start.
Contacting technical support
Contact us if you have any problems or questions.
E-mail [email protected].
Part II: User's Guide
Dataset context
When you are on a standard Excel worksheet the Analyse-it ribbon tab shows commands to
manage datasets and create new statistical analyses. The dataset task pane shows the dataset and
variable properties.
1. Analyse-it ribbon tab.
2. Dataset commands.
3. Statistical analyses commands.
4. Dataset task pane.
5. Analyse-it command.
Analysis context
When you switch to an analysis report the Analyse-it ribbon tab shows commands to manage
the report and perform additional statistical tasks. The analysis task pane shows options for the
statistical analysis grouped in panels that facilitate a logical workflow.
1. Analyse-it ribbon tab.
2. Report commands.
3. Analysis task pane.
4. Task panels.
5. Add analysis task commands.
6. Remove analysis tasks.
Datasets
A dataset is a range of contiguous cells on an Excel worksheet containing data to analyze.
When arranging data on an Excel worksheet you must follow a few simple rules so that Analyse-it
works with your data:
1. A title that clearly describes the data. If you do not specify a title, the cell range of the dataset (such
as A3:C13) is used to refer to the dataset.
2. A header row containing variable labels. Each variable name should be unique. Units of
measurement can be included in the label by enclosing them in brackets after the name.
3. Rows containing the data for each case. The number of rows is only limited by Excel (currently
over one million).
4. Columns containing the data for each variable.
5. Optional: Labels in the first column to provide a meaningful name/identifier for each case.
When you use an Analyse-it command, the extent of a dataset is determined by scanning
outwards from the active cell to include all surrounding contiguous cells. The extent is known
when a blank row or column surrounding the dataset, or the edge of the worksheet, is reached.
Most Excel commands (for example, Sort, Pivot Table) use the same technique to determine the
range of cells to operate on. It avoids the need to select often-large ranges of cells using the
mouse, which is laborious and error-prone. It also ensures that if you add or remove cases or
variables, subsequent analyses automatically reflect the changes to the dataset.
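The outward scan described above can be sketched in a few lines. This is an illustration of the general technique under simple assumptions (a worksheet modelled as a list of rows, with None or an empty string meaning a blank cell); it is not Analyse-it's actual implementation, and the function name dataset_extent and the sample sheet are hypothetical.

```python
def dataset_extent(grid, row, col):
    """Return (top, left, bottom, right), inclusive, of the block around (row, col)."""
    n_rows, n_cols = len(grid), len(grid[0])

    def filled(r, c):
        return grid[r][c] not in (None, "")

    top = bottom = row
    left = right = col
    changed = True
    while changed:
        changed = False
        # Grow the box by one row or column wherever the neighbouring
        # row/column segment contains at least one non-blank cell.
        if top > 0 and any(filled(top - 1, c) for c in range(left, right + 1)):
            top -= 1
            changed = True
        if bottom < n_rows - 1 and any(filled(bottom + 1, c) for c in range(left, right + 1)):
            bottom += 1
            changed = True
        if left > 0 and any(filled(r, left - 1) for r in range(top, bottom + 1)):
            left -= 1
            changed = True
        if right < n_cols - 1 and any(filled(r, right + 1) for r in range(top, bottom + 1)):
            right += 1
            changed = True
    return top, left, bottom, right


_ = None  # blank cell
sheet = [
    [_, _,       _,        _,      _, _],
    [_, "Id",    "Height", "Eye",  _, "x"],
    [_, 1,       175,      "Blue", _, "y"],
    [_, 2,       180,      "Blue", _, _],
    [_, _,       _,        _,      _, _],
    [_, "other", _,        _,      _, _],
]

# Active cell is the value 175 (row 2, column 2, zero-based).
extent = dataset_extent(sheet, 2, 2)
print(extent)
```

The scan stops at the blank row and blank column around the block, so the unrelated cells to the right and below are not included. This is why separate datasets on one worksheet must be divided by at least one blank row and column.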
When you analyze data, any data in hidden rows on the worksheet are excluded. This feature lets
you easily limit analysis to a subset of the cases in the dataset. You can hide rows manually or use
a filter to hide them based on criteria.
You can locate a dataset anywhere on a worksheet, and you can keep multiple datasets on a
single worksheet provided you separate them from each other by at least one blank row and
column. However, we recommend that you use a separate worksheet for each dataset. Using
separate worksheets allows you to name datasets using the Excel worksheet tabs, navigate
between datasets using the worksheet tabs, and ensures that filtering a dataset does not affect
other datasets on the same worksheet.
Variables
A variable is an attribute of an object. The value of a variable can vary from one thing to another.
There are two different types of variables:
• Quantitative (or numeric) - data are numeric values of a quantity.
• Qualitative (or categorical) - data differ only in kind.
Record quantitative data as the numeric value.
Record qualitative data as a numeric coding (for example, 0, 1, 2) or labels (for example, Low,
Medium, High). Numeric codings are often preferred as they succinctly represent the values and
allow quicker data entry. However, codings can be difficult to understand by anyone other than
the person who assigned them. To make their meaning clear, you can also assign a label to each
numeric coding.
Data are ordered by numeric value or numeric coding or, if a numeric coding is not used, in
alphabetical label order. For some statistical tests, the sort order determines how observations are
rank-ordered and can affect the results. Often alphabetic sort order does not represent the correct
ordering. For example, Low, Medium, High is sorted alphabetically into High, Low, Medium, which
does not represent their correct, natural order. In such cases, you must explicitly set the order
before analyzing the data.
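The ordering problem is easy to reproduce outside Analyse-it. The following generic sketch uses the Low, Medium, High labels from the example; the variable names and sample data are illustrative only.

```python
labels = ["Medium", "High", "Low", "Medium", "Low"]

# Default alphabetical order of the distinct categories.
alphabetical = sorted(set(labels))
print(alphabetical)

# Explicit, natural order: map each label to its rank and sort by that rank.
natural_order = ["Low", "Medium", "High"]
rank = {label: position for position, label in enumerate(natural_order)}
ordered = sorted(set(labels), key=rank.__getitem__)
print(ordered)
```

Alphabetical sorting places High before Low, so any rank-based statistic computed on that order would treat High as the smallest category. Supplying the order explicitly avoids this.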
Measurement scales
Five different scales are used to classify measurements based on how much information each
measurement conveys.
Nominal
Two things are assigned the same symbol if they have the same value of the attribute.
For example, Gender (Male, Female); Religion (coded as 0=None, 1=Christian, 2=Buddhist).
Permissible transformations are any one-to-one or many-to-one transformation, although a
many-to-one transformation loses information.
Ordinal
Things are assigned numbers such that the order of the numbers reflects an order relation defined
on the attribute. Two things x and y with attribute values a(x) and a(y) are assigned numbers m(x)
and m(y) such that if m(x) > m(y), then a(x) > a(y).
For example, Mohs scale for the hardness of minerals; academic performance grades (A, B, C, ...).
Permissible transformations are any monotone increasing transformation, although a
transformation that is not strictly increasing loses information.
Interval
Things are assigned numbers such that differences between the numbers reflect differences of the
attribute. If m(x) - m(y) > m(u) - m(v), then a(x) - a(y) > a(u) - a(v).
For example, Temperature measured in degrees Fahrenheit or Celsius.
Permissible transformations are any affine transformation t(m) = c * m + d, where c and d are
constants; another way of saying this is that the origin and unit of measurement are arbitrary.
Ratio
Things are assigned numbers such that differences and ratios between the numbers reflect
differences and ratios of the attribute.
For example, Temperature measured on the Kelvin scale; Length in centimeters.
Permissible transformations are any linear (similarity) transformation t(m) = c * m, where c is a
constant; another way of saying this is that the unit of measurement is arbitrary.
Absolute
Things are assigned numbers such that all properties of the numbers reflect analogous properties
of the attribute.
For example, Number of children in a family, Frequency of occurrence.
The only permissible transformation is the identity transformation.
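A short numerical check can make the idea of permissible transformations concrete. The sketch below uses hypothetical temperatures and lengths: the affine Celsius-to-Fahrenheit conversion keeps the ordering of differences (interval scale) but changes ratios, whereas the centimeter-to-inch conversion, a pure rescaling, keeps ratios too (ratio scale).

```python
import math


def celsius_to_fahrenheit(c):
    # Affine transformation t(m) = c * m + d, with c = 9/5 and d = 32.
    return c * 9 / 5 + 32


def cm_to_inches(cm):
    # Similarity transformation t(m) = c * m, with c = 1/2.54.
    return cm / 2.54


# Hypothetical temperatures in degrees Celsius.
t1, t2, t3, t4 = 10.0, 20.0, 30.0, 60.0
f1, f2, f3, f4 = (celsius_to_fahrenheit(t) for t in (t1, t2, t3, t4))

# The ordering of differences survives the affine transformation...
differences_preserved = ((t4 - t3) > (t2 - t1)) == ((f4 - f3) > (f2 - f1))
# ...but ratios do not: 20 C is not "twice as hot" as 10 C.
ratio_celsius = t2 / t1
ratio_fahrenheit = f2 / f1

# Hypothetical lengths in centimeters: ratios survive a change of unit.
ratio_cm = 20.0 / 10.0
ratio_inches = cm_to_inches(20.0) / cm_to_inches(10.0)

print(differences_preserved, ratio_celsius, ratio_fahrenheit)
print(math.isclose(ratio_cm, ratio_inches))
```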
While the measurement scale cannot determine a single best statistical method appropriate for
data analysis, it does define which statistical methods are inappropriate. Where possible the use of
a variable is restricted when its measurement scale is not appropriate for the analysis. For example,
a nominal variable cannot be used in a t-test.
When the measurement scale of a variable is unknown, the scale is inferred from its role in the
analysis and the type of data in the variable. If the measurement scale cannot be inferred, you
must set the measurement scale.
1 175 Blue
2 180 Blue
3 160 Hazel
4 190 Green
5 180 Green
6 150 Brown
7 140 Blue
8 160 Brown
… … …
Brown 221
Blue 215
Hazel 93
Green 64
Note: Examples of dataset layouts are included in the Statistical Reference Guide.
Labeling cases
Set a variable to use for labeling data points in plots for easy identification.
• Ensure that the variable containing the labels is the first (leftmost) column in the dataset.
1. Activate the dataset worksheet.
2. On the Analyse-it ribbon tab, in the Dataset group, click Dataset.
The dataset task pane opens.
3. Select the Labels in first column check box.
4. Click Apply.
The changes are saved and will be used in future analyses.
Transforming variables
Transform a variable to normalize, shift, scale or otherwise change the shape of the distribution so
that it meets the assumptions of a statistical test.
1. Activate the dataset worksheet.
2. In an empty column adjoining the dataset, enter the transformation function.
For example, =LOG(A1)
3. Double-click on the drag handle at the bottom right of the active cell to copy the formula
down the entire column.
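The effect of such a transformation can be previewed with a small script. This is a minimal sketch using hypothetical right-skewed values; math.log10 is used because the Excel LOG function defaults to base 10 when no base is given. The skewness helper is a simple moment-based measure written for this illustration, not a function of Analyse-it.

```python
import math
from statistics import mean, pstdev


def skewness(xs):
    """Moment-based skewness: average cubed deviation divided by sd cubed."""
    m = mean(xs)
    sd = pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * sd ** 3)


# Hypothetical right-skewed measurements (all positive, as required for logs).
values = [1, 2, 2, 3, 4, 5, 8, 13, 40, 120]

# Equivalent of entering =LOG(cell) alongside each value.
logged = [math.log10(v) for v in values]

print(round(skewness(values), 2), round(skewness(logged), 2))
```

The log-transformed values are noticeably less skewed than the raw values, which is the usual reason for applying the transformation before a test that assumes normality.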
Transform functions
Useful Microsoft Excel functions for transforming data.
Examples
To filter to Female cases only, click the drop-down button alongside Sex, and then select Female in
the list of values to match against:
Creating an analysis
Create a statistical analysis and present the results.
The basic sequence of steps to create an analysis is:
1. Click a cell in the dataset.
2. On the Analyse-it ribbon tab, in the Statistical Analyses group, click the analysis command,
and then click a specific task command.
The analysis task pane opens.
3. Select the variables to analyze.
4. Add any additional plots, tests or statistics.
5. Set any options for analysis.
6. Click Calculate.
The analysis report opens.
Analysis reports
An analysis report is a standard Excel worksheet containing statistics and plots calculated by
a statistical analysis. You can print, send, move, copy, and save analysis reports like any other
worksheet.
Along with the results of the analysis, the report includes a header that describes the analysis, the
dataset and variables analyzed, who performed the analysis and when, and the version of Analyse-
it used.
Analysis reports are static. If the underlying data changes, the statistics and plots do not update
automatically. You must recalculate an analysis to update it. For example, if you exclude or change
the value of an observation, add or remove a case, or change the active filter, you must recalculate
the analysis to see the effect of the changes.
You can also edit an analysis to make changes such as adding or removing a plot, performing a
statistical test, or estimating a parameter.
Note: The analysis report stores a link to the dataset; the link is used to fetch the latest data
when you recalculate the analysis. The link to the dataset can become broken if you choose to
keep the dataset and analyses in separate workbooks, then move, rename or delete the dataset
workbook on disk. When the link is broken, the analysis cannot be changed or updated. To avoid
this potential problem we recommend you keep the dataset and associated analyses in the same
workbook.
Editing an analysis
Change an analysis and update the report.
1. Activate the report worksheet.
2. On the Analyse-it ribbon tab, in the Report group, click Edit.
The analysis task pane opens.
3. Change the variables, set the analysis options, or add/remove tasks.
4. Click Recalculate.
The analysis report updates.
Recalculating an analysis
Recalculate an analysis when the data has changed, and you want to update the report to reflect
those changes.
Note: Be aware that when you recalculate an analysis, the current analysis report worksheet is
deleted and replaced with a new worksheet. If you have made changes to the layout or formatting
of the analysis report, the changes are lost.
1. Activate the report worksheet.
2. On the Analyse-it ribbon tab, in the Report group, click Recalculate.
Note: Prior to version 5.10 the active filter was not saved with the analysis. When you
recalculated an analysis the rows currently visible were analyzed, regardless of the filter you
used when creating the analysis. From version 5.10 onwards, by default the filter criteria are
saved with the analysis and re-applied when you recalculate or make other changes to the analysis.
The drop-down arrow next to the Filter command on the ribbon of the active analysis lets
you switch between two options: Use active filter, which always uses the currently active filter
when recalculating (pre-version 5.10 behavior), and Save Filter & Re-apply (version 5.10 or later
behavior), which saves the filter when the analysis is calculated and re-applies it on subsequent
recalculation.
The analysis recalculates and the report updates.
Numerical accuracy
All calculations use highly reliable numerical methods in IEEE 754 double-precision arithmetic,
and the results are tested rigorously.
To ensure that Analyse-it produces accurate results it does not use any Excel mathematical,
statistical or distribution functions. Many of these functions in early versions of Microsoft Excel
used poor algorithms that often produced inaccurate results. Functions like STDEV and LINEST
were affected, as were add-ins that used those functions. Microsoft has now addressed most
of the problems, though you may still hear warnings against using Microsoft Excel for statistical
analysis.
To avoid any of these problems Analyse-it uses reliable numerical algorithms and all calculations
are performed using double-precision (IEEE 754 standard, effectively 15 significant digits). We then
verify the statistics are correct, and remain so, throughout the development and release process
using an in-house library of thousands of validation unit-tests.
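To illustrate why the choice of algorithm matters, the following Python sketch compares a naive one-pass variance formula with Welford's numerically stable method. It is an illustration only and not Analyse-it's implementation; the function names and the data are assumptions chosen for the example.

```python
def naive_variance(values):
    """Textbook one-pass formula: (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(values)
    total = sum(values)
    total_sq = sum(x * x for x in values)
    return (total_sq - total * total / n) / (n - 1)


def welford_variance(values):
    """Welford's method: updates the mean and squared deviations incrementally."""
    mean = 0.0
    m2 = 0.0
    for count, x in enumerate(values, start=1):
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return m2 / (len(values) - 1)


# Hypothetical data: small spread around a large offset; the true sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(naive_variance(data))    # loses precision to cancellation
print(welford_variance(data))  # 30.0
```

Both functions are algebraically equivalent, but the naive formula subtracts two very large, nearly equal numbers, so most of the significant digits are lost.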
Sending feedback
Submit your ideas for improvements, feature requests, and bug reports.
1. On the Analyse-it ribbon tab, click Analyse-it, and then click Send Feedback.
The Send Feedback window opens.
2. Enter an E-mail address where we can contact you.
We respect your privacy at all times. We do not collect personal information, and we only use
your e-mail address if we need to contact you about your problem or suggestion.
3. Enter a Subject and Feedback to describe your comment, suggestion, or bug report. Include
as much information as possible in the feedback, including the steps to reproduce the problem
if applicable.
4. Select Include a snapshot of my desktop if it helps to show the problem or suggestion you
are making.
5. Select Include the file I am working with to include the active Excel workbook if it contains
data that helps to show the problem or suggestion you are making.
We treat any data received as private and confidential. It is used only for the purpose you
submit it, and we delete it when no longer needed.
6. Click Send.
Your feedback is sent using your internet connection.
2. Enter an E-mail address where we can contact you.
We respect your privacy at all times. We do not collect any personal information, and we only
use your e-mail address if we need to contact you about the crash report.
3. Enter Feedback to describe what you were doing when the crash occurred and, if possible, the
steps to reproduce the problem.
4. Select Include a snapshot of my desktop if it helps to show the problem.
5. Select Include the file I am working with to include the active Excel workbook if it contains
data that helps to show the problem.
We treat any data received as private and confidential. It is used only for the purpose you
submit it, and we delete it when no longer needed.
Chapter 9: Distribution
Distribution analysis describes the distribution of a variable and makes inferences about population
parameters.
Continuous distributions
A continuous distribution describes a variable that can take on any numeric value (for example,
weight, age, or height).
Statistic Purpose
N The number of non-missing values in a set of data.
Sum The sum of the values in a set of data.
Mean Measure the central tendency using the arithmetic mean.
Harmonic mean Measure the central tendency using the reciprocal of the arithmetic
mean of the reciprocals.
Useful when the values are rates and ratios.
Geometric mean Measure the central tendency using the nth root of the product of the n values.
Useful when the values are percentages.
Statistic Purpose
Uses the Fisher-Pearson standardized moment coefficient G1 definition of
sample skewness.
Kurtosis Measure the "tailedness" of the distribution. That is, whether the tails
are heavy or light. Excess kurtosis is the kurtosis minus 3 and provides a
comparison to the normal distribution.
Positive excess kurtosis (called leptokurtic) indicates the distribution has
fatter tails than a normal distribution. Negative excess kurtosis (called
platykurtic) indicates the distribution has thinner tails than a normal
distribution.
Uses the G2 definition of the sample excess kurtosis.
Median Measure the central tendency using the middle value in a set of data.
Minimum Smallest value in a set of data.
Maximum Largest value in a set of data.
Range The difference between the maximum and minimum.
1st quartile The middle value between the smallest value and median in a set of
data.
3rd quartile The middle value between the median and largest value in a set of data.
Interquartile range Measure the spread between the 1st and 3rd quartile.
Mode The value that appears the most in a set of data.
Quantiles A set of values that divide the range of the distribution into contiguous
intervals defined by probabilities.
For normally distributed data, the mean and standard deviation provide the best measures of
central location and dispersion.
For data with a non-normal or highly-skewed distribution, or data with extreme values, the median
and the first and third quartiles provide better measures of central location and dispersion. When
the distribution of the data is symmetric, the inter-quartile range (IQR) is a useful measure of
dispersion. Quantiles further describe the distribution of the data, providing an interval containing
a specified proportion (for example, 95%) of the data or by breaking the data into intervals each
containing a proportion of the data (for example, deciles each containing 10% of the data).
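As a rough illustration of the statistics described above, the following Python sketch computes several of them with the standard library. The G1 and G2 formulas shown are the commonly published bias-adjusted definitions and are assumed here to match the definitions named in the table; the data and function names are hypothetical.

```python
import math
import statistics


def sample_skewness_g1(values):
    """Bias-adjusted skewness G1 = g1 * sqrt(n(n-1)) / (n-2), with g1 = m3 / m2^1.5."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


def sample_excess_kurtosis_g2(values):
    """Bias-adjusted excess kurtosis G2 = (n-1)/((n-2)(n-3)) * ((n+1)*g2 + 6)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g2 = m4 / m2 ** 2 - 3.0
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
print("Mean:", statistics.fmean(data))
print("Harmonic mean:", statistics.harmonic_mean(data))
print("Geometric mean:", statistics.geometric_mean(data))
print("Median:", statistics.median(data))
print("Skewness (G1):", sample_skewness_g1(data))
print("Excess kurtosis (G2):", sample_excess_kurtosis_g2(data))
```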
Mean and Moments Show mean, variance, standard deviation, skewness, and kurtosis.
Median and Quartiles Show minimum, maximum, median, and quartiles.
Quantiles Show quantiles.
4. Optional: To customize the statistics to show, click Customize... and then select or clear the
appropriate check boxes. To save the options as the defaults to use for future analyses, click
Save as Defaults.
5. Click Calculate.
Univariate plot
A univariate plot shows the data and summarizes its distribution.
Dot plot
A dot plot, also known as a strip plot, shows the individual observations.
A dot plot gives an indication of the spread of the data and can highlight clustering or extreme
values.
Box plot
A box plot shows the five-number summary of the data – the minimum, first quartile, median,
third quartile, and maximum. An outlier box plot is a variation of the skeletal box plot that also
identifies possible outliers.
A skeletal box plot shows the median as a line, a box from the 1st to 3rd quartiles, and whiskers
with end caps extending to the minimum and maximum. Optional notches in the box represent
the confidence interval around the median.
An outlier box plot is a variation of the skeletal box plot, but instead of extending to the minimum
and maximum, the whiskers extend to the furthest observation within 1.5 x IQR from the quartiles.
Possible near outliers (orange plus symbol) are identified as observations further than 1.5 x IQR
from the quartiles, and possible far outliers (red asterisk symbol) as observations further than 3.0
x IQR from the quartiles. You should investigate each possible outlier before deciding whether to
exclude it, as even in normally distributed data, an outlier box plot identifies approximately 0.7%
of observations as possible outliers.
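The outlier rule described above can be sketched in Python as follows. The quartile method (the standard library's "inclusive" method) is an assumption, since quartile definitions vary between packages; the data and the function name are hypothetical.

```python
import statistics


def classify_outliers(values):
    """Return whisker ends plus possible near and far outliers (1.5 x and 3.0 x IQR rules)."""
    ordered = sorted(values)
    q1, _, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    near_low, near_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    far_low, far_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr

    # Whiskers extend to the furthest observations within 1.5 x IQR of the quartiles.
    inside = [x for x in ordered if near_low <= x <= near_high]
    far = [x for x in ordered if x < far_low or x > far_high]
    near = [x for x in ordered if x not in inside and x not in far]
    return {"whiskers": (inside[0], inside[-1]), "near": near, "far": far}


data = [10, 12, 12, 13, 13, 14, 14, 15, 16, 22, 60]  # hypothetical observations
result = classify_outliers(data)
print(result)  # {'whiskers': (10, 16), 'near': [22], 'far': [60]}
```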
A quantile box plot is a variation on the skeletal box plot and shows the whiskers extending to
specific quantiles rather than the minimum and maximum value.
Mean plot
A mean plot shows the mean and standard deviation of the data.
A line or dot represents the mean. A standard error or confidence interval measures uncertainty in
the mean and is represented as either an error bar or diamond.
An optional error bar or band represents the standard deviation. The standard deviation gives the
impression that the data is from a normal distribution centered at the mean value, with most of
the data within two standard deviations of the mean. Therefore, the data should be approximately
normally distributed. If the distribution is skewed, the plot is likely to mislead.
Frequency distribution
A frequency distribution reduces a large amount of data into a more easily understandable form.
Cumulative distribution function plot
A cumulative distribution function (CDF) plot shows the empirical cumulative distribution function
of the data.
The empirical CDF is the proportion of values less than or equal to X. It is an increasing step
function that has a vertical jump of 1/N at each value of X equal to an observed value. CDF plots
are useful for comparing the distribution of different sets of data.
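A minimal Python sketch of the empirical CDF defined above; the function name and data are hypothetical.

```python
from bisect import bisect_right


def empirical_cdf(values):
    """Return a function F where F(x) is the proportion of values <= x."""
    ordered = sorted(values)
    n = len(ordered)

    def cdf(x):
        # bisect_right counts how many ordered values are <= x.
        return bisect_right(ordered, x) / n

    return cdf


cdf = empirical_cdf([3, 1, 2, 2])  # hypothetical data
print(cdf(0), cdf(1), cdf(2), cdf(3))  # 0.0 0.25 0.75 1.0
```

The value 2 occurs twice in this sample, so the step at 2 is 2/N rather than 1/N.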
Histogram
A histogram shows the distribution of the data to assess the central tendency, variability, and
shape.
A histogram for a quantitative variable divides the range of the values into discrete classes, and
then counts the number of observations falling into each class interval. The area of each bar in
the histogram is proportional to the frequency in the class. When the class widths are equal, the
height of the bar is also proportional to the frequency in the class.
Choosing the number of classes to use can be difficult as there is no "best," and different class
widths can reveal or hide features of the data. Scott's and Freedman-Diaconis' rules provide a
default starting point, though sometimes particular class intervals make sense for a particular
problem.
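As an illustration, the two rules can be sketched in Python using their commonly published forms: Scott's rule sets the class width to 3.49 s n^(-1/3) and the Freedman-Diaconis rule to 2 IQR n^(-1/3). Whether Analyse-it uses exactly these constants and this quartile method is an assumption; the data and function names are hypothetical.

```python
import statistics


def scott_width(values):
    """Scott's rule: h = 3.49 * s * n^(-1/3), where s is the sample standard deviation."""
    n = len(values)
    return 3.49 * statistics.stdev(values) * n ** (-1 / 3)


def freedman_diaconis_width(values):
    """Freedman-Diaconis rule: h = 2 * IQR * n^(-1/3)."""
    n = len(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return 2 * (q3 - q1) * n ** (-1 / 3)


data = list(range(1, 28))  # hypothetical data: the integers 1 to 27
print(round(scott_width(data), 2))
print(round(freedman_diaconis_width(data), 2))
```

Because the Freedman-Diaconis rule uses the IQR rather than the standard deviation, it is less affected by extreme values.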
The histogram reveals if the distribution of the data is normal, skewed (shifted to the left or right),
bimodal (has two peaks), and so on. Skewed data can sometimes be made approximately
normal using a transformation. Bimodality often indicates that there is more than one underlying
population in the data. Individual bars distanced from the bulk of the data can indicate the
presence of an outlier.
Creating a histogram
Plot a histogram to visualize the distribution of a quantitative variable.
1. Select a cell in the dataset.
2. On the Analyse-it ribbon tab, in the Statistical Analyses group, click Distribution >
Histogram, and then click the plot type.
The analysis task pane opens.
3. In the Y drop-down list, select the variable.
4. In the Plot drop-down list, select the frequency scale.
5. Optional: To label the bars with the frequencies, select the Label bars check box.
6. Optional: To show the frequency table, select the Frequency table check box.
7. Optional: To customize the classes of the frequency distribution, in the Start at, Width, and
Classes edit boxes, type the values to form the classes.
8. Click Calculate.
Normality
Normality is the assumption that the underlying random variable is normally distributed, or
approximately so.
In some cases, the normality of the data itself may be important in describing a process that
generated the data. However, in many cases, it is hypothesis tests and parameter estimators that
rely on the assumption of normality, although many are robust against moderate departures from
normality due to the central limit theorem.
Normal distribution
A normal (or Gaussian) distribution is a continuous probability distribution that has a bell-shaped
probability density function. It is the most prominent probability distribution used in statistics.
Normal distributions are a family of distributions with the same general symmetric bell-shaped
curve, with more values concentrated in the middle than in the tails. Two parameters describe a
normal distribution: the mean and the standard deviation. The mean is the central location (the peak),
and the standard deviation is the dispersion (the spread). Skewness and excess kurtosis are zero for
a normal distribution.
The normal distribution is the basis of much statistical theory. Statistical tests and estimators based
on the normal distribution are often more powerful than their non-parametric equivalents. When
the distribution assumption can be met they are preferred, as the increased power lets you use a
smaller sample size to detect the same difference.
Normal probability (Q-Q) plot
A normal probability plot, or more specifically a quantile-quantile (Q-Q) plot, shows the distribution
of the data against the expected normal distribution.
For normally distributed data, observations should lie approximately on a straight line. If the data
is non-normal, the points form a curve that deviates markedly from a straight line. Possible outliers
are points at the ends of the line, distanced from the bulk of the observations.
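The points of a normal Q-Q plot can be sketched in Python as follows. The plotting position (i - 0.5)/n is an assumption, as several conventions exist and the one used by Analyse-it is not stated here; the data and function name are hypothetical.

```python
from statistics import NormalDist, fmean, stdev


def normal_qq_points(values):
    """Pair each ordered observation with the normal quantile expected at its rank."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(fmean(ordered), stdev(ordered))
    # Plotting position (i - 0.5) / n is an assumption; other conventions exist.
    expected = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, ordered))


data = [4.1, 5.0, 5.2, 5.9, 6.3, 6.8, 7.4, 9.0]  # hypothetical measurements
points = normal_qq_points(data)
for expected, observed in points:
    print(f"{expected:6.2f}  {observed:6.2f}")
```

Plotting the observed values against the expected values gives the Q-Q plot; points close to the identity line suggest approximate normality.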
The null hypothesis states that the population is normally distributed, against the alternative
hypothesis that it is not normally distributed. If the test p-value is less than the predefined
significance level, you can reject the null hypothesis and conclude the data are not from a
population with a normal distribution. If the p-value is greater than the predefined significance
level, you cannot reject the null hypothesis.
Note that small deviations from normality can produce a statistically significant p-value when
the sample size is large, and conversely it can be impossible to detect non-normality with a small
sample. You should always examine the normal plot and use your judgment, rather than rely
solely on the hypothesis test. Many statistical tests and estimators are robust against moderate
departures from normality due to the central limit theorem.
Test Purpose
Shapiro-Wilk Test if the distribution is normal.
A powerful test that detects most departures from normality when the
sample size ≤ 5000.
approximated by a normal distribution even if the data are not normally distributed. For many
samples, the test statistic often approaches a normal distribution for non-skewed data when the
sample size is as small as 30, and for moderately skewed data when the sample size is larger than
100. The downside in such situations is a reduction in statistical power, and there may be more
powerful non-parametric tests.
Sometimes a transformation such as a logarithm can remove the skewness and allow you to use
powerful tests based on the normality assumption.
the same underlying evidence function should be used to form the confidence interval and test the
hypotheses. Be aware that not many statistical software packages follow this rule!
The following example illustrates the use of a point estimate, confidence interval and hypothesis
test in making inferences.
A scientist might study the difference in blood cholesterol between a new drug treatment and
a placebo. Improvements in cholesterol greater than 20 mg/dL would be considered practically
important, and lead to a change in the treatment of patients, but smaller differences would not.
The possible outcomes of the study in terms of a point-estimate, confidence interval estimate, and
hypothesis test might be:
Point estimate   Confidence interval   Hypothesis test   Interpretation
0 mg/dL          -5 to 5 mg/dL         Not significant   The confidence interval is within the range of no practical difference, 20 mg/dL. There is clear evidence the treatment does not produce a difference of practical importance.
10 mg/dL         -5 to 25 mg/dL        Not significant   Part of the confidence interval lies outside the range of no practical difference, 20 mg/dL. Although the hypothesis test is not significant, there may be an important practical difference, though a larger sample size is required to make any sharper inferences.
Parameter estimate
A parameter estimate is either a point or interval estimate of an unknown population parameter.
A point estimate is a single value that is the best estimate of the true unknown parameter; a
confidence interval is a range of values and indicates the uncertainty of the estimate.
Estimator Purpose
Mean Estimate the population mean using the sample mean estimator.
Median Estimate the middle of the distribution using the sample median
estimator.
Hodges-Lehmann pseudo-median Estimate the population pseudo-median using the Hodges-Lehmann
pseudo-median estimator.
The pseudo-median is equivalent to the mean/median when the
distribution is symmetric.
Estimator Purpose
Standard deviation Estimate the population standard deviation using the sample standard
deviation estimator.
The sample standard deviation is a biased estimator of the population
standard deviation.
Variance Estimate the population variance using the sample variance estimator.
The sample variance is an unbiased estimator of the population variance.
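To make the location estimators concrete, the following Python sketch computes the Hodges-Lehmann pseudo-median as the median of the Walsh averages (the averages of all pairs of observations, including each observation with itself). This is the usual textbook definition and is assumed here; the data and function name are hypothetical.

```python
import statistics
from itertools import combinations_with_replacement


def hodges_lehmann(values):
    """Median of the Walsh averages (x_i + x_j) / 2 for all pairs with i <= j."""
    walsh = [(a + b) / 2 for a, b in combinations_with_replacement(values, 2)]
    return statistics.median(walsh)


data = [1, 2, 3, 4, 100]  # hypothetical sample with one extreme value
print("Mean:", statistics.fmean(data))                # 22.0
print("Median:", statistics.median(data))             # 3
print("Hodges-Lehmann:", hodges_lehmann(data))        # 3.0
print("Sample variance:", statistics.variance(data))  # divides by n - 1
```

The single extreme value pulls the mean far from the bulk of the data, while the median and the pseudo-median are barely affected.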
Hypothesis testing
Hypothesis testing is the formal process of inferring from a sample whether or not a
statement about the population appears to be true.
A hypothesis test is a method of making decisions. You must state a null hypothesis and an
alternative hypothesis to perform a hypothesis test. The null hypothesis states what the study is
intending to reject and disprove. The alternative hypothesis is usually the negation of the null and
states what the study is trying to prove.
When the hypotheses have been stated a statistical test calculates a test statistic and p-value. The
p-value is the probability of obtaining a test statistic at least as extreme as that observed when
the null hypothesis is true. It is a measure of evidence against the null hypothesis. When the
p-value is small, the data are unlikely to have occurred if the null hypothesis is true, so you can reject
the null hypothesis and accept the alternative hypothesis. When the p-value is large, you cannot
reject the null hypothesis; there is insufficient evidence against it. It is not possible to prove the
null hypothesis, only disprove it. The p-value does not allow you to make any statements about
the probability of the null hypothesis being true; it is a statement about the probability of observing
the data given that the null hypothesis is true.
Often a fixed significance level (denoted by the lowercase Greek letter alpha) is used to decide
whether the test is statistically significant or not. The significance level is the probability of
rejecting the null hypothesis when it is true. When the p-value is less than the significance level,
you can declare the test statistically significant. A 5% significance level is typical, which implies
there is a 5% chance of wrongly rejecting the null hypothesis when in fact it is true. If more
certainty is required, use a 1% significance level. Regardless, you should always report the p-value
rather than just a statement of statistically significant, or not.
It is important to remember that statistical significance does not imply practical importance.
The difference might be so small as to be practically useless even though it was statistically
significant. Alternatively, the sample size may have been so small that a hypothesis test was
not powerful enough to detect anything but a huge difference as statistically significant. It is,
therefore, essential that you always interpret the p-value together with a point and interval
estimate of the parameter or effect size.
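The relationship between the test statistic, p-value, and significance level can be sketched in Python with a one-sample Z test, chosen because the normal distribution is available in the standard library. The study values and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, hypothesized_mean, population_sd, n):
    """Two-sided one-sample Z test; returns the test statistic and p-value."""
    z = (sample_mean - hypothesized_mean) / (population_sd / sqrt(n))
    # Probability of a statistic at least as extreme as z when the null is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical study: n = 100, sample mean 52, null mean 50, known SD 10.
z, p = z_test(52, 50, 10, 100)
alpha = 0.05
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.00, p = 0.0455
print("Statistically significant" if p < alpha else "Not statistically significant")
```

At a 5% significance level this result is statistically significant, but at a 1% level it is not, which is one reason to report the p-value itself.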
Tests for the central location parameter of a distribution
Tests for the location parameter of a distribution and their properties and assumptions.
Test Purpose
Z Test if the mean is equal to a hypothesized value when the population
standard deviation is known.
Student's t Test if the mean is equal to a hypothesized value.
Assumes the population is normally distributed. Due to the central limit
theorem, the test may still be useful when this assumption is not true if
the sample size is moderate. However, the Wilcoxon test may be more
powerful in this situation.
TOST (two-one-sided t-test) Test if the mean is equivalent to a hypothesized value within the
equivalence bounds that specify the smallest effect size of interest.
Assumes the population is normally distributed. Due to the central limit
theorem, the test may still be useful when this assumption is not true if
the sample size is moderate.
Test Purpose
X² Test if the variance/standard deviation is equal to a hypothesized value.
Assumes the population is normally distributed. When this assumption
is not true, you should be cautious of using the test, as it is extremely
sensitive to deviations from normality.
6. Optional: To compare the p-value against a predefined significance level, in the Significance
level edit box, type the maximum probability of rejecting the null hypothesis when in fact it is
true (typically 5% or 1%).
7. Click Calculate.
Discrete distributions
A discrete distribution describes a variable that can only take discrete values (for example, the
number of males and females, or the number of people with a specific eye color).
Frequency distribution
A frequency distribution reduces a large amount of data into a more easily understandable form.
Frequency table
A frequency table is a simple table of the frequencies in each class.
There are many ways of expressing the frequencies:
Statistic Purpose
Frequency The number of occurrences in the class.
Relative frequency The frequency divided by the total.
Frequency density The relative frequency divided by the width of the class interval.
Use with unequal class intervals.
Cumulative frequency The number of occurrences in the class and all previous classes.
Cumulative relative frequency The cumulative frequency divided by the total.
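The frequencies in the table above can be computed with a short Python sketch. The eye-color values are the ten example cases shown under Dataset layout later in this chapter; the variable names and the alphabetical class order are assumptions.

```python
from collections import Counter
from itertools import accumulate

# Eye color of the ten example cases, one value per case.
eye_color = ["Blue", "Blue", "Hazel", "Green", "Green",
             "Brown", "Blue", "Brown", "Green", "Hazel"]

counts = Counter(eye_color)
classes = sorted(counts)                      # class order is an arbitrary choice here
frequency = [counts[c] for c in classes]
total = sum(frequency)
relative = [f / total for f in frequency]
cumulative = list(accumulate(frequency))
cumulative_relative = [c / total for c in cumulative]

# Columns: class, frequency, relative, cumulative, cumulative relative.
for row in zip(classes, frequency, relative, cumulative, cumulative_relative):
    print("{:<6} {:>3} {:>6.2f} {:>3} {:>6.2f}".format(*row))
```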
Frequency plot
A frequency plot shows the distribution of a qualitative variable.
A frequency plot shows rectangular bars for each class with the height of the bar proportional to
the frequency.
Whole-to-part plot
A whole-to-part plot shows how the parts make up the whole.
A pie chart is a common representation which shows a circle divided into sectors for each class,
with the angle of each sector proportional to the frequency.
A stacked bar plot shows rectangular bars for each class with the size of each bar proportional to
the frequency.
The major problem with the pie chart is that it is difficult to judge the angles and areas of the
sectors. It is useful if you want to compare a single class relative to the whole. For comparing
classes to each other, we recommend the frequency plot or stacked bar plot for greater clarity and
ease of interpretation.
The analysis task pane opens.
3. In the Y drop-down list, select the categorical variable.
4. If the data are in frequency form, in the Frequency drop-down list, select the frequency count
variable.
5. Optional: To show the frequency table, select the Frequency table check box.
6. Optional: To label the bars/sectors with the frequencies, select the Label bars/Label sectors
check box.
7. Click Calculate.
Estimator Purpose
Proportion Estimate the population proportion of occurrences of the outcome of
interest using the sample proportion estimator.
Odds Estimate the population odds of the outcome of interest occurring using
the sample odds estimator.
Odds is an expression of the relative probabilities in favor of an event.
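A minimal Python sketch of the two estimators; the counts and the function name are hypothetical.

```python
def proportion_and_odds(successes, n):
    """Sample proportion p = x / n and sample odds p / (1 - p) = x / (n - x)."""
    proportion = successes / n
    odds = successes / (n - successes)
    return proportion, odds


p, odds = proportion_and_odds(20, 100)  # hypothetical: 20 of 100 cases have the outcome
print(p, odds)  # 0.2 0.25
```

Odds of 0.25 mean the outcome occurs once for every four times it does not, often written as 1:4.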
Test Purpose
Binomial exact Test if the proportion with the outcome of interest is equal to a
hypothesized value.
Uses the binomial distribution and computes an exact p-value. The test
is conservative, that is, the type I error is guaranteed to be less than or
equal to the desired significance level. Recommended for small sample
sizes.
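An exact binomial p-value can be sketched in Python as follows. Several definitions of the two-sided exact p-value exist; this sketch sums the probabilities of all outcomes that are no more likely than the observed one, which is an assumption and may differ from the definition Analyse-it uses. The counts and the function name are hypothetical.

```python
from math import comb


def binomial_exact_p(successes, n, p0):
    """Two-sided exact p-value: total probability of outcomes no more likely than the observed."""
    pmf = [comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(n + 1)]
    observed = pmf[successes]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-7)))


print(binomial_exact_p(7, 10, 0.5))  # 0.34375
```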
The test is an omnibus test and does not tell you which proportions differ from the hypothesized
values.
Test Purpose
Pearson X² Tests if the proportions are equal to the hypothesized values.
Uses the score statistic and computes an asymptotic p-value.
Likelihood ratio G² Tests if the proportions are equal to the hypothesized values.
Uses the likelihood ratio statistic and computes an asymptotic p-value.
Pearson X² usually converges to the chi-squared distribution more
quickly than G². The likelihood ratio test is commonly used in statistical
modeling as the G² statistic is easier to compare between different
models.
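The two statistics can be sketched in Python using their standard formulas: X² is the sum of (observed - expected)² / expected, and G² is twice the sum of observed × ln(observed / expected). The counts are the eye-color frequencies from the Dataset layout example later in this chapter; the hypothesized equal proportions and the function name are assumptions. The p-value, which comes from a chi-squared distribution with k-1 degrees of freedom, is not computed because the standard library has no chi-squared distribution.

```python
from math import log


def goodness_of_fit_statistics(observed, hypothesized_proportions):
    """Return the Pearson X2 and likelihood ratio G2 statistics."""
    total = sum(observed)
    expected = [total * p for p in hypothesized_proportions]
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Classes with zero observed count contribute nothing to G2.
    g2 = 2 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)
    return x2, g2


observed = [221, 215, 93, 64]      # Brown, Blue, Hazel, Green
hypothesized = [0.25] * 4          # assumed null: all eye colors equally likely
x2, g2 = goodness_of_fit_statistics(observed, hypothesized)
print(f"X2 = {x2:.2f}, G2 = {g2:.2f}")
```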
Study design
Distribution analysis study requirements and dataset layout.
Requirements
• A categorical or quantitative variable.
Dataset layout
Use a column for each variable (Height, Eye color); each row has the values of the variables for a
case (Subject).
Subject Height Eye color
1 175 Blue
2 180 Blue
3 160 Hazel
4 190 Green
5 180 Green
6 150 Brown
7 140 Blue
8 160 Brown
9 165 Green
10 180 Hazel
… … …

Eye color Frequency
Brown 221
Blue 215
Hazel 93
Green 64
Chapter 10: Compare groups
Compare Groups examines independent samples and makes inferences about the differences
between them.
Independent samples occur when observations are made on different sets of items or subjects. If
the values in one sample do not tell you anything about the values in the other sample, then the
samples are independent. If knowing the values in one sample could tell you something about
the values in the other sample, then the samples are related.
Side-by-side univariate plots
Side-by-side univariate plots summarize the distribution of data stratified into groups.
Equivalence of means hypothesis test
An equivalence hypothesis test formally tests if two population means are equivalent, that is,
practically the same.
An equality hypothesis test can never prove that the means are equal, it can only ever disprove the
null hypothesis of equality. It is therefore of interest when comparing say a new treatment against
a placebo, where the null hypothesis (assumption of what is true without evidence to the contrary)
is that the treatment has no effect, and you want to prove the treatment produces a useful effect.
By contrast, an equivalence hypothesis test is of interest when comparing say a generic treatment
to an existing treatment where the aim is to prove that they are equivalent, that is the difference is
less than some small negligible effect size. An equivalence hypothesis test therefore constructs the
null hypothesis of non-equivalence and the goal is to prove the means are equivalent.
The null hypothesis states that the means are not equivalent, against the alternative hypothesis
that the difference between the means is within the bounds of the equivalence interval, that
is, the effect size is less than some small difference that is considered practically zero. The
hypothesis is tested as a composite of two one-sided t-tests (TOST): H01 tests the hypothesis
that the mean difference is less than the lower bound of the equivalence interval, and H02 tests that the
mean difference is greater than the upper bound of the equivalence interval. The p-value is the
greater of the two one-sided t-test p-values. When the test p-value is small, you can reject the null
hypothesis and conclude the samples are from populations with practically equivalent means.
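The logic of combining the two one-sided tests can be sketched in Python. To keep the example within the standard library, this sketch substitutes normal (Z) one-sided tests for the t-tests described above, so it is only an approximation for large samples; the numbers and the function name are hypothetical.

```python
from statistics import NormalDist


def tost_z(difference, standard_error, lower_bound, upper_bound):
    """TOST p-value using normal (Z) one-sided tests in place of t-tests."""
    normal = NormalDist()
    # H01: difference <= lower bound; a small p supports difference > lower bound.
    p_lower = 1 - normal.cdf((difference - lower_bound) / standard_error)
    # H02: difference >= upper bound; a small p supports difference < upper bound.
    p_upper = normal.cdf((difference - upper_bound) / standard_error)
    # Both one-sided nulls must be rejected, so report the greater p-value.
    return max(p_lower, p_upper)


# Hypothetical: observed difference 0.5, standard error 1, equivalence interval -3 to 3.
p = tost_z(0.5, 1.0, -3.0, 3.0)
print(f"p = {p:.4f}")  # p = 0.0062
```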
Test Purpose
Z Test if the difference between means is equal to a hypothesized value
when the population standard deviation is known.
Student's t Test if the difference between means is equal to a hypothesized value.
Assumes the populations are normally distributed. Due to the central
limit theorem, the test may still be useful when this assumption is not
true if the sample sizes are equal and of moderate size, and the distributions
have a similar shape. However, in this situation the Wilcoxon-Mann-Whitney
test may be more powerful.
Assumes the population variances are equal. This assumption can be
tested using the Levene test. The test may still be useful when this
assumption is not true if the sample sizes are equal. However, in this
situation, the Welch t-test may be preferred.
Test Purpose
true if the sample sizes are equal and of moderate size, and the distributions
have a similar shape.
Note: The TOST can be performed using either a Student's t-test or
Welch t-test.
b) In the Hypothesized difference edit box, type the expected difference under the null
hypothesis.
6. If performing a k sample hypothesis test:
a) In the Hypothesis drop-down list, select the null and alternative hypotheses.
7. Optional: To compare the p-value against a predefined significance level, in the Significance
level edit box, type the maximum probability of rejecting the null hypothesis when in fact it is
true (typically 5% or 1%).
8. Click Calculate.
Estimator Purpose
Mean difference Estimate the difference between the means.
Standardized mean difference Estimate the standardized difference between the means.
Estimator Purpose
Cohen's d is the most popular estimator using the difference between
the means divided by the pooled sample standard deviation. Cohen's
d is a biased estimator of the population standardized mean difference
(although the bias is very small and disappears for moderate to large
samples) whereas Hedges' g applies an unbiasing constant to correct for
the bias.
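The two estimators can be sketched in Python as follows. The correction factor 1 - 3 / (4(n1 + n2) - 9) is the widely used approximation to the exact unbiasing constant and is an assumption here; the groups and function names are hypothetical.

```python
from math import sqrt
from statistics import fmean, variance


def cohens_d(group1, group2):
    """Difference between means divided by the pooled sample standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (fmean(group1) - fmean(group2)) / sqrt(pooled_var)


def hedges_g(group1, group2):
    """Cohen's d multiplied by the approximate correction 1 - 3 / (4(n1 + n2) - 9)."""
    n = len(group1) + len(group2)
    return cohens_d(group1, group2) * (1 - 3 / (4 * n - 9))


a, b = [2, 4, 6, 8], [1, 3, 5, 7]  # hypothetical groups
print(round(cohens_d(a, b), 4), round(hedges_g(a, b), 4))  # 0.3873 0.3368
```

With only four observations per group the correction is noticeable; it shrinks toward 1 as the sample sizes grow.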
Multiple comparisons
Multiple comparisons make simultaneous inferences about a set of parameters.
When making inferences about more than one parameter (such as comparing many means, or
the differences between many means), you must use multiple comparison procedures to make
inferences about the parameters of interest. The problem when making multiple comparisons
using individual tests such as Student's t-test applied to each comparison is that the chance of a
type I error increases with the number of comparisons. If you use a 5% significance level with
a hypothesis test to decide if two groups are significantly different, there is a 5% probability of
observing a significant difference that is simply due to chance (a type I error). If you made 20 such
comparisons, the probability that one or more of the comparisons is statistically significant simply
due to chance increases to 64%. With 50 comparisons, the chance increases to 92%. Another
problem is that the dependencies among the parameters of interest also alter the significance level.
Therefore, you must use multiple comparison procedures to maintain the simultaneous probability
close to the nominal significance level (typically 5%).
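The percentages quoted above follow from the formula 1 - (1 - alpha)^m for m comparisons, which assumes the comparisons are independent. A short Python sketch, with a hypothetical function name:

```python
def familywise_error_rate(alpha, comparisons):
    """Chance of at least one type I error across independent comparisons."""
    return 1 - (1 - alpha) ** comparisons


for m in (1, 20, 50):
    print(m, f"{familywise_error_rate(0.05, m):.0%}")  # 5%, 64%, 92%
```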
Multiple comparison procedures are classified by the strength of inference that can be made
and the error rate controlled. A test of homogeneity controls the probability of falsely declaring
any pair to be different when in fact all are the same. A stronger level of inference is confident
inequalities and confident directions which control the probability of falsely declaring any pair to
be different regardless of the values of the others. An even stronger level is a set of simultaneous
confidence intervals that guarantees that the simultaneous coverage probability of the intervals is
at least 100(1-alpha)% and also has the advantage of quantifying the possible effect size rather
than producing just a p-value. A higher strength inference can be used to make an inference of
a lesser strength but not vice-versa. Therefore, a confidence interval can be used to perform a
confident directions/inequalities inference, but a test of homogeneity cannot make a confident
directions/inequalities inference.
The best-known multiple comparison procedures, Bonferroni and Šidák, are not multiple
comparison procedures per se. Rather, they are inequalities useful in producing easy-to-compute
multiple comparison methods of various types. In most scenarios, there are more powerful
procedures available. A useful application of the Bonferroni inequality is when there are a small
number of pre-planned comparisons. In these cases, you can use the standard hypothesis test or
confidence interval with the significance level (alpha) set to the Bonferroni-adjusted level (alpha divided
by the number of comparisons).
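The adjusted per-comparison significance levels can be sketched in Python. The Bonferroni level is alpha / m as described above; the Šidák level 1 - (1 - alpha)^(1/m) is its standard form and assumes independent comparisons. The function names are hypothetical.

```python
def bonferroni_alpha(alpha, comparisons):
    """Per-comparison significance level from the Bonferroni inequality."""
    return alpha / comparisons


def sidak_alpha(alpha, comparisons):
    """Per-comparison significance level from the Sidak inequality."""
    return 1 - (1 - alpha) ** (1 / comparisons)


# Hypothetical: five pre-planned comparisons at an overall 5% significance level.
print(round(bonferroni_alpha(0.05, 5), 5))  # 0.01
print(round(sidak_alpha(0.05, 5), 5))       # 0.01021
```

The Šidák level is always slightly larger than the Bonferroni level, so it is marginally less conservative.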
A side effect of maintaining the significance level is a lowering of the power of the test. Different
procedures have been developed to keep the power as high as possible depending on
the strength of inference required and the number of comparisons to be made. All contrasts
comparisons allow for any possible contrast; all pairs forms the k(k-1)/2 pairwise contrasts;
with best forms k contrasts, each group with the best of the others; and against control forms
k-1 contrasts, each group against the control group. You should choose the appropriate contrasts of
interest before you perform the analysis. If you decide after inspecting the data, then you should
only use all contrasts comparison procedures.
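To make the contrast counts concrete, the following sketch enumerates the contrasts for k = 4 groups. The group labels are hypothetical, and "A" is arbitrarily treated as the control group.

```python
from itertools import combinations

groups = ["A", "B", "C", "D"]  # hypothetical group labels
control = "A"                  # assumption: "A" is the control group
k = len(groups)

# All pairs: every unordered pair of groups, k(k-1)/2 contrasts.
all_pairs = list(combinations(groups, 2))

# Against control: each non-control group against the control, k-1 contrasts.
against_control = [(g, control) for g in groups if g != control]

# With best: each group against the best of the remaining groups, k contrasts.
with_best = [(g, "best of the others") for g in groups]

print(len(all_pairs), len(against_control), len(with_best))
```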
The mean-mean scatter plot shows the mean of a group on the horizontal axis against the mean
of the other group on the vertical axis with a dot at the intersection. A vector centered at the
intersection with a slope of -1 and a length proportional to the width of the confidence interval
represents the confidence interval. A gray identity line represents equality of means; that is, the
difference is equal to zero. If the vector does not cross the identity line, you can conclude there is a
significant difference between the means.
To make interpretation easier, a 45-degree rotated version of the plot shows the difference
between means and its confidence interval on the horizontal axis against the average of the means on
the vertical axis.
Procedure                      Purpose
Student's t (Fisher's LSD)     Compare the means of each pair of groups using the Student's t
                               method.
                               When making all pairwise comparisons, this procedure is also known
                               as unprotected Fisher's LSD or, when only performed following a
                               significant ANOVA F-test, as protected Fisher's LSD.
                               Controls the type I error rate for each contrast.
Wilcoxon-Mann-Whitney          Compare the medians/means of each pair of groups using the Wilcoxon
                               nonparametric method.
                               Controls the type I error rate for each contrast.
Tukey-Kramer                   Compare the means of all pairs of groups using the Tukey-Kramer
                               method.
                               Controls the error rate simultaneously for all k(k-1)/2 contrasts.
Steel-Dwass-Critchlow-Fligner  Compare the medians/means of all pairs of groups using the
                               Steel-Dwass-Critchlow-Fligner pairwise ranking nonparametric method.
                               Controls the error rate simultaneously for all k(k-1)/2 contrasts.
Hsu                            Compare the means of all groups against the best of the other groups
                               using the Hsu method.
                               Controls the error rate simultaneously for all k contrasts.
Dunnett                        Compare the means of all groups against a control using the Dunnett
                               method.
                               Controls the error rate simultaneously for all k-1 contrasts.
Steel                          Compare the medians/means of all groups against a control using the
                               Steel pairwise ranking nonparametric method.
                               Controls the error rate simultaneously for all k-1 contrasts.
Scheffé                        Compare the means of all groups against all other groups using the
                               Scheffé F method.
                               Controls the error rate simultaneously for all possible contrasts.
Duncan                         Not implemented.
                               As discussed in Hsu (1996), it is not a confident inequalities method
                               and cannot be recommended.
10. Click Calculate.
Test  Purpose
F     Test if the ratio of the variances is equal to a hypothesized value.
      Assumes the populations are normally distributed and is extremely sensitive
      to departures from normality. The Levene and Brown-Forsythe tests are
      more robust against violations of the normality assumption than the F-test.
4. In the X drop-down list, select the categorical factor variable identifying the groups.
5. In the Hypotheses drop-down list, select the null and alternative hypothesis.
6. Optional: To compare the p-value against a predefined significance level, in the Significance
level edit box, type the maximum probability of rejecting the null hypothesis when in fact it is
true (typically 5% or 1%).
7. Click Calculate.
Study design
Compare groups analysis study requirements and dataset layout.
Requirements
• A quantitative response variable.
• A categorical factor variable with 2 or more levels.
Dataset layout
Use a column for the response variable (Height) and a column for the factor variable (Sex); each
row has the values of the variables for a case (Subject).
Subject  Sex     Height
1        Male    175
2        Male    180
3        Male    160
4        Male    190
5        Male    180
6        Female  150
7        Female  140
8        Female  160
9        Female  165
10       Female  180
…        …       …
Note: Do not split groups into separate columns. For example, you should not record the heights
of males in one column and heights of females in another.
Chapter 11: Compare pairs
Compare Pairs examines related samples and makes inferences about the differences between
them.
Related samples occur when observations are made on the same set of items or subjects at
different times, or when another form of matching has occurred. If knowing the values in
one sample could tell you something about the values in the other sample, then the samples are
related.
There are a few different study designs that produce related data. A paired study design
takes individual observations from a pair of related subjects. A repeated measures study design
takes multiple observations on the same subject. A matched pair study design takes individual
observations on multiple subjects that are matched on other covariates. The purpose of matching
similar subjects is often to reduce or eliminate the effects of a confounding factor.
Difference plot
A difference plot shows the differences between two observations on the same sampling unit.
The difference plot shows the difference between two observations on the vertical axis against the
average of the two observations on the horizontal axis. A gray identity line represents equality, that
is, no difference.
If the second observation is always greater than the first, the points lie above the line of equality,
and vice versa. If the differences are not related to the magnitude, the points form a horizontal band.
If the points form an increasing, decreasing, or non-constant width band, then the variance is not
constant.
It is common to combine the difference plot with a histogram and a normality plot of the
differences to check if the differences are normally distributed, which is an assumption of some
statistical tests and estimators.
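The coordinates of a difference plot can be sketched as follows, using the blood pressure values from the example dataset at the end of this chapter. Taking the difference as second observation minus first is an assumption made for illustration; the variable names are not part of Analyse-it.

```python
before = [123, 109, 112, 102, 98, 114, 119, 112, 110, 117]
after = [124, 97, 113, 105, 95, 119, 114, 114, 121, 118]

# Vertical axis: difference between the two observations (second minus first).
differences = [a - b for b, a in zip(before, after)]

# Horizontal axis: average of the two observations.
averages = [(a + b) / 2 for b, a in zip(before, after)]

mean_difference = sum(differences) / len(differences)
print(list(zip(averages, differences)))
print(mean_difference)
```

Points above zero are subjects whose second observation is greater than the first.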
Creating a Tukey mean-difference plot
Summarize the differences between two related observations.
1. Select a cell in the dataset.
2. On the Analyse-it ribbon tab, in the Statistical Analyses group, click Compare Pairs, and
then click Difference.
The analysis task pane opens.
3. If the data consists of paired/repeated measurements in separate variables:
a) In the Model drop-down menu, select Paired/Repeated.
b) In the Y list, select the quantitative variables.
4. If the data consist of matched pairs with a separate response variable, factor variable, and
blocking variable:
a) In the Model drop-down menu, select Matched.
b) In the Y drop-down list, select the quantitative response variable.
c) In the X (factor) drop-down list, select the categorical factor variable identifying the
groups.
d) In the Block drop-down list, select the blocking variable identifying the matching.
5. Optional: To show a histogram of the distribution of the differences, select the Histogram
check box.
6. Optional: To show a normality plot of the differences, select the Normal plot check box.
7. Click Calculate.
less than some small negligible effect size. An equivalence hypothesis test therefore constructs the
null hypothesis of non-equivalence, and the goal is to prove the means are equivalent.
The null hypothesis states that the means are not equivalent, against the alternative hypothesis
that the difference between the means is within the bounds of the equivalence interval, that is,
the effect size is less than some small difference that is considered practically zero. The hypothesis
is tested as a composite of two one-sided t-tests (TOST): H01 tests the hypothesis that the mean
difference is less than the lower bound of the equivalence interval, and H02 tests the hypothesis that
the mean difference is greater than the upper bound of the equivalence interval. The p-value is the
greater of the two one-sided t-test p-values. When the test p-value is small, you can reject the null
hypothesis and conclude the samples are from populations with practically equivalent means.
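The logic of combining the two one-sided tests can be sketched as below. To stay within the Python standard library, this sketch swaps in a standard normal approximation in place of the Student's t distribution described above, so its p-values are somewhat smaller than t-based ones for small samples. The function name, the ±5 equivalence bounds, and the use of the blood pressure example data are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def tost_paired_normal(first, second, lower, upper):
    """Return (mean difference, TOST p-value) using a normal approximation."""
    differences = [b - a for a, b in zip(first, second)]
    mean_diff = mean(differences)
    std_error = stdev(differences) / sqrt(len(differences))
    z = NormalDist()
    # H01: mean difference is at or below the lower bound.
    p_lower = 1.0 - z.cdf((mean_diff - lower) / std_error)
    # H02: mean difference is at or above the upper bound.
    p_upper = z.cdf((mean_diff - upper) / std_error)
    # The TOST p-value is the greater of the two one-sided p-values.
    return mean_diff, max(p_lower, p_upper)


before = [123, 109, 112, 102, 98, 114, 119, 112, 110, 117]
after = [124, 97, 113, 105, 95, 119, 114, 114, 121, 118]
mean_diff, p_value = tost_paired_normal(before, after, lower=-5, upper=5)
print(mean_diff, p_value)
```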
Test  Purpose
Z     Test if the difference between means is equal to a hypothesized value
      when the population standard deviation is known.
      Assumes the population differences are normally distributed. Due to the
      central limit theorem, the test may still be useful when this assumption
      is not true if the sample size is moderate. However, in this case, the
      Wilcoxon test may be more powerful.
Test Purpose
Assumes the populations are normally distributed. Due to the central
limit theorem, the test may still be useful when the assumption is
violated if the sample sizes are equal and of moderate size. However, in this
situation the Friedman test may be more powerful.
4. If the data consist of matched pairs with a separate response variable, factor variable, and
blocking variable:
a) In the Model drop-down menu, select Matched.
b) In the Y drop-down list, select the quantitative response variable.
c) In the X (factor) drop-down list, select the categorical factor variable identifying the
groups.
d) In the Block drop-down list, select the blocking variable identifying the matching.
5. In the Equivalence interval edit boxes, type the lower and upper bounds of the equivalence
interval. For example, type -5 and +5 if a difference between the means of ±5 is the effect size
considered to be practically the same.
6. Optional: To compare the p-value against a predefined significance level, in the Significance
level edit box, type the maximum probability of rejecting the null hypothesis when in fact it is
true (typically 5% or 1%).
7. Click Calculate.
Estimator                     Purpose
Mean difference               Estimate the difference between the means.
Standardized mean difference  Estimate the standardized difference between the means.
                              Cohen's d is the most popular estimator, using the difference between
                              the means divided by the pooled sample standard deviation. Cohen's
                              d is a biased estimator of the population standardized mean difference
                              (although the bias is very small and disappears for moderate to large
                              samples), whereas Hedges' g applies an unbiasing constant to correct
                              for the bias.
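A minimal sketch of the two standardized estimators, following the pooled standard deviation definition given above. The correction factor 1 - 3/(4·df - 1) is a commonly used approximation to the unbiasing constant; whether Analyse-it uses this approximation or the exact constant is not stated here. The function names are illustrative, and the data are the male and female heights from the Compare groups example dataset, treated as two samples.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(x, y):
    """Difference between means divided by the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


def hedges_g(x, y):
    """Cohen's d multiplied by an approximate unbiasing constant."""
    df = len(x) + len(y) - 2
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)  # approximation, an assumption
    return cohens_d(x, y) * correction


males = [175, 180, 160, 190, 180]
females = [150, 140, 160, 165, 180]
print(cohens_d(males, females), hedges_g(males, females))
```

With samples this small the correction is noticeable; it shrinks toward 1 as the sample sizes grow.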
1. Select a cell in the dataset.
2. On the Analyse-it ribbon tab, in the Statistical Analyses group, click Compare Pairs, and
then click the effect size estimator.
The analysis task pane opens.
3. If the data consists of paired/repeated measurements in separate variables:
a) In the Model drop-down menu, select Paired/Repeated.
b) In the Y list, select the quantitative variables.
4. If the data consist of matched pairs with a separate response variable, factor variable, and
blocking variable:
a) In the Model drop-down menu, select Matched.
b) In the Y drop-down list, select the quantitative response variable.
c) In the X (factor) drop-down list, select the categorical factor variable identifying the
groups.
d) In the Block drop-down list, select the blocking variable identifying the matching.
5. In the Difference drop-down list, select the direction of the difference.
6. In the Confidence interval edit box, type the confidence level as a percentage, or type - to
suppress the confidence interval, and then in the drop-down list, select the confidence bounds.
7. In the Method drop-down list, select the interval estimator.
8. Click Calculate.
Study design
Compare pairs analysis study requirements and dataset layout.
Requirements
• 2 or more repeated measurements on a quantitative response variable.
     Blood pressure
     Before  After
1    123     124
2    109     97
3    112     113
4    102     105
5    98      95
6    114     119
7    119     114
8    112     114
9    110     121
10   117     118
…    …       …
Matched dataset layout
Use a column for the response variable (Blood pressure), a column for the factor variable
(Intervention), and a column for the blocking variable (Pair); each row has the values of the
variables for a case.
Pair  Intervention  Blood pressure
1     Before        123
2     Before        109
3     Before        112
4     Before        102
5     Before        98
6     Before        114
7     Before        119
8     Before        112
9     Before        110
10    Before        117
…     Before        …
1     After         124
2     After         97
3     After         113
4     After         105
5     After         95
6     After         119
7     After         114
8     After         114
9     After         121
10    After         118
…     After         …
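The two layouts hold the same information. As an illustration outside Analyse-it, the following sketch stacks paired columns into the matched layout of (Pair, Intervention, Blood pressure) rows; the variable names are hypothetical.

```python
before = [123, 109, 112, 102, 98, 114, 119, 112, 110, 117]
after = [124, 97, 113, 105, 95, 119, 114, 114, 121, 118]

# Stack the two columns: one row per observation, keyed by the pair number.
matched_rows = []
for intervention, values in (("Before", before), ("After", after)):
    for pair, value in enumerate(values, start=1):
        matched_rows.append((pair, intervention, value))

for row in matched_rows[:3]:
    print(row)
```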
Chapter 12: Contingency tables
Contingency analysis describes and visualizes the distribution of categorical variables, and makes
inferences about the equality of proportions, independence of the variables, or agreement
between variables.
Contingency table
A contingency table, also known as a cross-classification table, describes the relationships between
two or more categorical variables.
A table cross-classifying two variables is called a 2-way contingency table and forms a rectangular
table with rows for the R categories of the X variable and columns for the C categories of a Y
variable. Each intersection is called a cell and represents the possible outcomes. The cells contain
the frequency of the joint occurrences of the X, Y outcomes. A contingency table having R rows
and C columns is called an R x C table.
A variable having only two categories is called a binary variable. When both variables are binary,
the resulting contingency table is a 2 x 2 table, also commonly known as a four-fold table
because there are four cells.
                     Smoke
Alcohol consumption  Yes  No   Total
Low                  10   80   90
High                 50   40   90
Total                60   120  180
A contingency table can summarize three probability distributions – joint, marginal, and
conditional.
• The joint distribution describes the proportion of the subjects jointly classified by a category of
X and a category of Y. The cells of the contingency table divided by the total provide the joint
distribution. The sum of the joint distribution is 1.
• The marginal distributions describe the distribution of the X (row) or Y (column) variable alone.
The row and column totals of the contingency table divided by the total provide the marginal
distributions. The sum of a marginal distribution is 1.
• The conditional distributions describe the distribution of one variable given the levels of the
other variable. The cells of the contingency table divided by the row or column totals provide
the conditional distributions. The sum of a conditional distribution is 1.
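The three distributions can be computed from the Alcohol consumption by Smoke table above, as in this sketch. The dictionary layout and variable names are illustrative only.

```python
# Rows: alcohol consumption (X); columns: smoke (Y).
table = {
    "Low": {"Yes": 10, "No": 80},
    "High": {"Yes": 50, "No": 40},
}
total = sum(sum(row.values()) for row in table.values())

# Joint distribution: each cell divided by the grand total.
joint = {x: {y: n / total for y, n in row.items()} for x, row in table.items()}

# Marginal distributions: row or column totals divided by the grand total.
marginal_x = {x: sum(row.values()) / total for x, row in table.items()}
marginal_y = {y: sum(table[x][y] for x in table) / total for y in ("Yes", "No")}

# Conditional distribution of Y given X: each cell divided by its row total.
conditional_y_given_x = {
    x: {y: n / sum(row.values()) for y, n in row.items()}
    for x, row in table.items()
}

print(joint)
print(marginal_x, marginal_y)
print(conditional_y_given_x)
```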
When both variables are random, you can describe the data using the joint distribution, the
conditional distribution of Y given X, or the conditional distribution of X given Y.
When one variable is an explanatory variable (X, fixed) and the other a response variable (Y,
random), the notion of a joint distribution is meaningless, and you should describe the data using
the conditional distribution of Y given X. Likewise, if Y is a fixed variable and X random, you
should describe the data using the conditional distribution of X given Y.
When the variables are matched-pairs or repeated measurements on the same sampling unit, the
table is square R=C, with the same categories on both the rows and columns. For these tables, the
cells may exhibit a symmetric pattern about the main diagonal of the table, or the two marginal
distributions may differ in some systematic way.
            After 6 months
Before      Approve  Disapprove  Total
Approve     794      150         944
Disapprove  86       570         656
Total       880      720         1600
7. Click Calculate.
Effect size
An effect size estimates the magnitude of the difference of proportions or the association between
two categorical variables.
The effect size that best describes a 2 x 2 contingency table depends on the study design that
produced the data:
• When both variables are random variables, the odds ratio provides the best measure of
association between the variables.
• When one variable is an explanatory variable (a fixed variable) and the other a response
variable (a random variable), the effect size between the two groups of the response variable
can be expressed as the odds ratio, the difference between proportions, or ratio of proportions.
• When the variables are matched-pairs or repeated measurements, the odds ratio or the
difference between proportions are appropriate. The ratio of proportions is meaningless in this
scenario.
A point estimate is a single value that is the best estimate of the true unknown parameter; a
confidence interval is a range of values and indicates the uncertainty of the estimate.
For tables larger than 2 x 2, you must partition the contingency table into a series of 2 x 2 sub-
tables to estimate the effect size.
Estimators
Estimators for contingency tables and their properties and assumptions.
Estimator              Purpose
Proportion difference  Estimate the difference of proportions (also known as Risk Difference).
Proportion ratio       Estimate the ratio of proportions (also known as Risk Ratio).
Odds ratio             Estimate the odds ratio.
                       Values of the odds ratio range from zero to infinity. Values further from
                       1 in either direction represent a stronger association. The inverse value
                       of an odds ratio represents the same degree of association in the opposite
                       direction.
                       The odds ratio does not change value when the orientation of a
                       contingency table is reversed so that rows become columns and
                       columns become rows. Therefore, it is useful in situations where both
                       variables are random.
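The three estimators can be sketched for a 2 x 2 table as follows, using the Alcohol consumption by Smoke counts from the contingency table example (High: 50 yes, 40 no; Low: 10 yes, 80 no). Treating alcohol consumption as the grouping variable and comparing High with Low is an assumption made for illustration, as is the function name.

```python
def effect_sizes(a, b, c, d):
    """Point estimates for the 2 x 2 table [[a, b], [c, d]].

    Row 1 is group 1 (a events, b non-events); row 2 is group 2 (c, d).
    """
    p1 = a / (a + b)
    p2 = c / (c + d)
    return {
        "proportion difference": p1 - p2,
        "proportion ratio": p1 / p2,
        "odds ratio": (a * d) / (b * c),
    }


high_vs_low = effect_sizes(50, 40, 10, 80)
print(high_vs_low)

# Swapping rows and columns leaves the odds ratio unchanged.
transposed = effect_sizes(50, 10, 40, 80)
print(transposed["odds ratio"])
```

Only the odds ratio survives the transposition; the proportion difference and proportion ratio change because they condition on the rows.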
9. Click Calculate.
Relative risk
Relative risk is the ratio of the probability of the event occurring in one group versus another
group.
Medical statistics uses categorical data analysis extensively to describe the association between the
occurrence of a disease and exposure to a risk factor. In this application, the terminology is often
used to frame the statistics in terms of risk. Risk difference is equivalent to a difference between
proportions, and a risk ratio is equivalent to a ratio of proportions.
The term relative risk refers to the ratio of the probability of the event occurring in the exposed
group versus a non-exposed group. It is the same as the risk ratio. Although a relative risk is different
from an odds ratio, in some circumstances, such as a low prevalence of the disease, the odds ratio
asymptotically approaches the relative risk and is, therefore, an estimate of the relative risk. It is
better to avoid confusion and use the terms risk ratio and odds ratio for the effect size estimates
and use the term relative risk for the population parameter.
A prospective cohort study or a clinical trial can estimate the risk ratio or the odds ratio of the
occurrence of a disease given exposure to a risk factor, whereas a retrospective case-control study
can estimate the odds ratio of the risk factor given the disease, which is equivalent to the odds
ratio of the occurrence of the disease given the risk factor. In a case-control study, a risk ratio is
not useful because it refers to the risk factor given the disease, which is not normally of interest.
However, as stated above, the odds ratio can be used as an estimate of the relative risk when the
prevalence of the disease is low.
Practitioners often prefer the risk ratio due to its more direct interpretation. Statisticians tend to
prefer the odds ratio as it applies to a wide range of study designs, allowing comparison between
different studies and meta-analysis based on many studies. It also forms the basis of logistic
regression.
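The way the odds ratio approaches the risk ratio when the disease is rare can be illustrated with a short sketch. All counts below are hypothetical and chosen only so that the risk ratio is 2 in both scenarios.

```python
def risk_ratio(a, b, c, d):
    """Risk in the exposed group (a of a+b) over risk in the non-exposed (c of c+d)."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds of disease in the exposed group over odds in the non-exposed group."""
    return (a * d) / (b * c)


# Hypothetical counts: (exposed cases, exposed non-cases, non-exposed cases, non-exposed non-cases)
rare_disease = (20, 9980, 10, 9990)         # risks of 0.2% and 0.1%
common_disease = (4000, 6000, 2000, 8000)   # risks of 40% and 20%

print(risk_ratio(*rare_disease), odds_ratio(*rare_disease))
print(risk_ratio(*common_disease), odds_ratio(*common_disease))
```

With the rare disease, the odds ratio is almost identical to the risk ratio; with the common disease, it overstates it.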
Tests for contingency tables larger than 2 x 2 are omnibus tests and do not tell you which groups
differ from each other or in which categories. You should use the mosaic plot to examine the
association, or partition the contingency table into a series of 2 x 2 sub-tables and test each table.
Test          Purpose
Fisher exact  Test if the two proportions are equal.
              Assumes fixed marginal distributions. Although not naturally fixed in
              most studies, this test still applies by conditioning on the marginal totals.
              Uses hypergeometric distribution and computes an exact p-value. The
              exact p-value is conservative, that is, the actual rejection rate is below
              the nominal significance level. Recommended for small sample sizes or
              sparse data.
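A minimal sketch of an exact p-value computed from the hypergeometric distribution with the margins held fixed. It uses one common two-sided definition, summing the probabilities of all tables that are no more probable than the observed table; other two-sided definitions exist, and the one Analyse-it uses is not stated here. The function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def weight(x):
        # Integer numerator of the hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = weight(a)
    # Sum over tables with the same margins that are as or less probable than observed.
    extreme = sum(weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed)
    return extreme / comb(n, col1)


print(fisher_exact_two_sided(3, 1, 1, 3))
```

Comparing integer weights rather than floating-point probabilities avoids rounding problems when deciding which tables count as extreme.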
Continuity correction
Continuity corrections such as Yates' X² are no longer needed with modern computing power.
Continuity corrections have historically been used to make adjustments to the p-value when a
continuous distribution approximates a discrete distribution. Yates' correction for the Pearson
chi-square (X²) test is probably the most well-known continuity correction. In some cases,
the continuity correction may adjust the p-value too far, and the test then becomes overly
conservative.
Modern computing power makes such corrections unnecessary, as exact tests that use the discrete
distributions are available for moderate and in many cases even large sample sizes. Hirji
(2005) states, “An applied statistician today, in our view, may regard such corrections as interesting
historical curiosities.”
Test        Purpose
Pearson X²  Test if the variables are independent.
            Uses the score statistic and computes an asymptotic p-value.
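The Pearson X² statistic compares each observed frequency with the frequency expected under independence (row total × column total / grand total). The sketch below computes the statistic and its degrees of freedom for the Alcohol consumption by Smoke counts from the contingency table example; the p-value step is omitted because it requires the chi-square distribution, which is not in the Python standard library. The function name is illustrative.

```python
def pearson_chi_square(table):
    """Return (X² statistic, degrees of freedom) for an R x C table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency if the row and column variables are independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


statistic, df = pearson_chi_square([[10, 80], [50, 40]])
print(statistic, df)
```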
Testing independence
Test if two categorical variables are independent in a bivariate population represented by the
sample.
1. Select a cell in the dataset.
2. On the Analyse-it ribbon tab, in the Statistical Analyses group, click Compare Groups, and
then click the hypothesis test.
The analysis task pane opens.
3. In the X and Y drop-down lists, select the categorical variables.
4. In the Hypotheses drop-down list, select the independence null and alternative hypotheses
(the 2nd in the list).
5. Optional: To compare the p-value against a predefined significance level, in the Significance
level edit box, type the maximum probability of rejecting the null hypothesis when in fact it is
true (typically 5% or 1%).
6. Click Calculate.
You can use the mosaic plot to discover the association between two variables. Red tiles indicate
significant negative residuals, where the frequency is less than expected. Blue tiles indicate
significant positive residuals, where the frequency is greater than expected. The intensity of the
color represents the magnitude of the residual.
Requirements
• 2 categorical variables.
or:
• A qualitative response variable.
• A categorical factor variable with 2 or more levels.
or:
• 2 or more repeated measurements on a qualitative response variable.
Dataset layout
Use a column for each variable (Eye color, Hair color); each row has the values of the variables.
     Eye color  Hair color
1    Brown      Black
2    Blue       Red
3    Blue       Blonde
4    Brown      Red
5    Green      Blonde
6    Hazel      Brown
7    Blue       Blonde
8    Brown      Black
9    Green      Red
10   Green      Blonde
…    …          …

Eye color  Hair color  Frequency
Brown      Black       69
Brown      Brown       119
Brown      Red         26
Brown      Blonde      7
Blue       Black       20
Blue       Brown       84
Blue       Red         17
Blue       Blonde      94
Hazel      Black       15
Hazel      Brown       54
Hazel      Red         14
Hazel      Blonde      10
Green      Black       5
Green      Brown       29
Green      Red         14
Green      Blonde      16
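The case-per-row layout and the frequency layout carry the same information. As an illustration outside Analyse-it, the following sketch tallies the ten example cases shown above into category-pair frequencies; the variable names are hypothetical.

```python
from collections import Counter

# One (eye color, hair color) tuple per case, taken from the first ten example rows.
cases = [
    ("Brown", "Black"), ("Blue", "Red"), ("Blue", "Blonde"), ("Brown", "Red"),
    ("Green", "Blonde"), ("Hazel", "Brown"), ("Blue", "Blonde"), ("Brown", "Black"),
    ("Green", "Red"), ("Green", "Blonde"),
]

# Tally the joint occurrences to obtain the frequency layout.
frequencies = Counter(cases)
for (eye, hair), count in sorted(frequencies.items()):
    print(eye, hair, count)
```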
Scatter plot
A scatter plot shows the association between two variables. A scatter plot matrix shows all
pairwise scatter plots for many variables.
If the variables tend to increase and decrease together, the association is positive. If one variable
tends to increase as the other decreases, the association is negative. If there is no pattern, the
association is zero.
When a straight line describes the relationship between the variables, the association is linear.
When a constantly increasing or decreasing nonlinear function describes the relationship, the
association is monotonic. Other relationships may be nonlinear or non-monotonic.
The type of relationship determines the statistical measures and tests of association that are
appropriate.
If the association is a linear relationship, a bivariate normal density ellipse summarizes the
correlation between variables. The narrower the ellipse, the greater the correlation between
the variables. The wider and more round it is, the more the variables are uncorrelated. If the
association is nonlinear, it is often worth trying to transform the data to make the relationship
linear as there are more statistics for analyzing linear relationships and their interpretation is easier
than nonlinear relationships.
An observation that appears detached from the bulk of observations may be an outlier requiring
further investigation. An individual observation on each of the variables may be perfectly
reasonable on its own but appear as an outlier when plotted on a scatter plot. Outliers can badly
affect the product-moment correlation coefficient, whereas other correlation coefficients are more
robust to them.
1. Select a cell in the dataset.
2. On the Analyse-it ribbon tab, in the Statistical Analyses group, click Correlation, and then
click Scatterplot or Scatterplot Matrix, and then click the plot type.
The analysis task pane opens.
3. If the model is bivariate, in the X and Y drop-down lists, select the variables.
4. If the model is multivariate, in the Variables list, select the variables.
5. Optional: To show stratification of the observations, select Vary point color/symbol, and then
in the Group / Color / Symbol drop-down list, select a variable.
6. Optional: To label the observations, select the Label observations check box.
7. Click Calculate.
Covariance
Covariance is a measure of how much two variables change together. A covariance matrix
measures the covariance between many pairs of variables.
When the variables tend to show similar behavior, the covariance is positive; that is, greater
values of one variable mainly correspond with greater values of the other variable, and lesser
values of one variable correspond with lesser values of the other variable. When the variables tend
to show opposite behavior, the covariance is negative; that is, the greater values of one
variable mainly correspond to the lesser values of the other, and vice versa.
The magnitude of the covariance is not meaningful to interpret. However, the standardized version
of the covariance, the correlation coefficient, indicates by its magnitude the strength of the
relationship.
A covariance matrix measures the covariance between many variables. Because the covariance of
a variable with itself is that variable's variance, the diagonal of the covariance matrix is simply the
variance of each variable.
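A minimal sketch of the sample covariance and a covariance matrix, using the Height and Weight values from the example dataset later in this chapter. The n - 1 divisor (sample covariance) is an assumption, and the function names are illustrative.

```python
from statistics import mean, variance


def covariance(x, y):
    """Sample covariance of two equal-length variables (n - 1 divisor)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def covariance_matrix(variables):
    """Covariance between every pair of variables, as a list of rows."""
    return [[covariance(x, y) for y in variables] for x in variables]


heights = [175, 180, 160, 190, 180, 150, 140, 160, 165, 180]
weights = [65, 70, 90, 55, 100, 55, 75, 80, 80, 95]

matrix = covariance_matrix([heights, weights])
for row in matrix:
    print(row)
```

The diagonal entries equal the variances of Height and Weight, and the matrix is symmetric.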
Correlation coefficient
A correlation coefficient measures the association between two variables. A correlation matrix
measures the correlation between many pairs of variables.
The type of relationship between the variables determines the best measure of association:
• When the association between the variables is linear, the product-moment correlation
coefficient describes the strength of the linear relationship.
The correlation coefficient ranges from -1 to +1. +1 indicates a perfect positive linear
relationship, and -1 indicates a perfect negative linear relationship. Zero indicates the variables
are uncorrelated and there is no linear relationship. Normally the correlation coefficient lies
somewhere between these values.
• When the association between the variables is not linear, a rank correlation coefficient
describes the strength of association.
Rank correlation coefficients range from -1 to +1. A positive rank correlation coefficient
describes the extent to which as one variable increases the other variable also tends to
increase, without requiring that increase to be linear. If one variable increases, as the other
tends to decrease, the rank correlation coefficient is negative.
It is best to use a scatter plot to identify the type of association between the variables and then
use an appropriate measure of association for the relationship. Do not be tempted just to look for
the highest correlation coefficient.
A correlation matrix measures the correlation between many variables. It is equivalent to a
covariance matrix of the standardized variables.
Red indicates negative values; blue indicates positive values. The intensity of the color represents
the magnitude of the value: the darker the color, the more extreme the value.
Parameter estimate
A parameter estimate is either a point or interval estimate of the unknown population correlation
coefficient.
A point estimate is a single value that is the best estimate of the true unknown population
correlation coefficient; a confidence interval is a range of values and indicates the uncertainty of
the estimate.
Estimator    Purpose
Pearson r    Estimate the Pearson rho correlation coefficient using the sample
             Pearson correlation coefficient r.
             Use when a linear function best describes the relationship between the
             variables.
             Susceptible to outliers. Assumes a bivariate normal distribution.
Spearman rs  Estimate the Spearman rho rank correlation coefficient using the sample
             Spearman rs correlation coefficient.
             Use when a monotonic function best describes the relationship between
             variables.
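The difference between the two estimators can be sketched as follows. The Spearman coefficient is computed here as the Pearson coefficient of the ranks, with tied values given their average rank, which is the usual textbook definition. The function names and the small cubic example are illustrative.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Sample product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rs(x, y):
    """Sample rank correlation: Pearson r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))


# A monotonic but nonlinear relationship: y is the cube of x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(pearson_r(x, y), spearman_rs(x, y))
```

For this monotonic relationship the rank correlation is exactly 1, while the product-moment correlation is below 1 because the relationship is not linear.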
Testing correlation/association
Test if the correlation between 2 variables is equal to a hypothesized value, or test for
independence.
1. Select a cell in the dataset.
2. On the Analyse-it ribbon tab, in the Statistical Analyses group, click Correlation, and then
click the parameter estimator.
The analysis task pane opens.
3. If analyzing two variables:
a) In the Model drop-down menu, select Bivariate.
b) In the X drop-down list, select the first variable.
c) In the Y drop-down list, select the second variable.
4. If analyzing more than two variables:
a) In the Model drop-down menu, select Multivariate.
b) In the Variables list, select the variables.
5. Select the Hypothesis test check box.
Requirements
• 2 or more quantitative variables.
Dataset layout
Use a column for each variable (Height, Weight); each row has the values of the variables for a
case (Subject).
Subject  Height  Weight
1        175     65
2        180     70
3        160     90
4        190     55
5        180     100
6        150     55
7        140     75
8        160     80
9        165     80
10       180     95
…        …       …
Principal components
Principal components are the linear combinations of the original variables.
Variances
Variances of each principal component show how much of the original variation in the dataset is
explained by the principal component.
When the data is standardized, a component with a variance of 1 indicates that the principal
component accounts for the variation equivalent to one of the original variables. Also, the sum of
all the variances is equal to the number of original variables.
Coefficients
Coefficients are the linear combinations of the original variables that make up the principal
component. The coefficients for each principal component can sometimes reveal the structure of
the data. Absolute values near zero indicate that a variable contributes little to the component,
whereas larger absolute values indicate variables that contribute more to the component.
Often, when the data is centered and standardized, the coefficients are normalized so that the
sum of the squares of the coefficients of a component is equal to the variance of the component.
In this normalization, the coefficients can be interpreted as the correlation between the original
variable and the principal component, and are often called loadings (a term borrowed from factor
analysis).
Scores
Scores are new variables that are the value of the linear combination of the original variables. The
scores are normalized so that the sum of squares equals the variance of the principal component.
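For two standardized variables, the principal components have a closed form, which makes the statements above easy to check by hand. The correlation matrix is [[1, r], [r, 1]]; its eigenvalues, the component variances, are 1 + |r| and 1 - |r|. The sketch below is limited to this two-variable case and uses the Height and Weight values from the Correlation chapter's example dataset; the names are illustrative and this is not how Analyse-it computes components in general.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Sample product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


heights = [175, 180, 160, 190, 180, 150, 140, 160, 165, 180]
weights = [65, 70, 90, 55, 100, 55, 75, 80, 80, 95]
r = pearson_r(heights, weights)

# Variances of the two principal components of the standardized data.
variances = [1 + abs(r), 1 - abs(r)]

# Unit-length coefficients (eigenvectors) for each component.
sign = 1.0 if r >= 0 else -1.0
coefficients = [(1 / sqrt(2), sign / sqrt(2)), (1 / sqrt(2), -sign / sqrt(2))]

# Loadings: coefficients rescaled so their squares sum to the component variance.
loadings = [
    tuple(c * sqrt(v) for c in component)
    for component, v in zip(coefficients, variances)
]

print(variances, sum(variances))
print(loadings)
```

The variances sum to 2, the number of original variables, and each loading is the correlation between an original variable and the component.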
Scree plot
A scree plot visualizes the dimensionality of the data.
The scree plot shows the cumulative variance explained by each principal component. You can
decide on the number of components to keep to adequately describe a dataset using
ad-hoc rules, such as keeping components with a variance > 0.7 or keeping components until the
cumulative proportion of variation is > 80% or > 90% (Jolliffe 2002).
Biplot
A biplot simultaneously plots information on the observations and the variables in a
multidimensional dataset.
A biplot can optimally represent any two of the following characteristics:
• distances between observations
• relationships between variables
• inner products between observations and variables
There are 3 types of biplot based on which of these characteristics they represent:
Type                       Characteristics
PCA                        Distances between the observations and also the inner products between observations and variables.
Covariance / Correlation   Relationships between the variables and the inner products between observations and variables.
Joint                      Distances between observations and also the relationship between variables.
A 2-dimensional biplot represents the information contained in two of the principal components. It
is an approximation of the original multidimensional space.
The classical biplot (Gabriel 1971) plots points representing the observations and vectors
representing the variables.
A more recent innovation, the PCA biplot (Gower & Hand 1996), represents the variables with
calibrated axes and observations as points allowing you to project the observations onto the axes
to make an approximation of the original values of the variables.
Monoplot
A monoplot plots information on the observations or the variables in a multidimensional dataset.
Creating a biplot
A biplot simultaneously shows information on the observations and the variables in a
multidimensional dataset.
1. Select a cell in the dataset.
2. On the Analyse-it ribbon tab, in the Statistical Analyses group, click Multivariate > Biplot /
Monoplot, and then click the plot type.
The analysis task pane opens.
3. In the Variables list, select the variables.
4. Optional: To label the observations, select the Label points check box.
5. Optional: To label the variables, select the Label vectors check box.
6. Optional: To show stratification, select Vary point color/symbol, and then in the Group /
Color / Symbol drop-down list, select a variable.
7. Click Calculate.
EFA
Exploratory factor analysis (EFA) identifies the underlying relationships between a large number of
interrelated variables when there are no prior hypotheses about factors or patterns amongst the
variables.
EFA is a technique based on the common factor model which describes the measured variables by
a function of the common factors, unique factors, and error of measurements. Common factors
are those that influence two or more measured variables, while unique factors influence only one
measured variable.
Pattern matrix
The factor pattern matrix loadings are the coefficients of the linear combinations of the factors that
make up the original standardized variables.
Structure matrix
The factor structure matrix loadings are the correlation coefficients between the factors and the
variables.
Correlation matrix
The factor correlation matrix coefficients are the correlation coefficients between the factors.
Note: When the factors are not rotated, or the rotation is orthogonal, there is no correlation
between the factors and the correlation matrix is equal to the identity matrix. Also, the loadings
in the pattern matrix and structure matrix are identical, although it can be useful to remember the
different interpretations - as linear coefficients or correlation coefficients.
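The link between the three matrices can be illustrated with the standard relationship from the common factor model: structure loadings equal the pattern loadings multiplied by the factor correlation matrix. The sketch below uses hypothetical loadings and function names, and is not taken from Analyse-it.

```python
def matrix_product(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def structure_matrix(pattern, factor_correlation):
    """Structure loadings = pattern loadings x factor correlation matrix."""
    return matrix_product(pattern, factor_correlation)

# Hypothetical pattern loadings: 3 variables (rows) on 2 factors (columns).
pattern = [[0.8, 0.1],
           [0.7, 0.0],
           [0.1, 0.6]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # unrotated, or an orthogonal rotation
oblique = [[1.0, 0.3], [0.3, 1.0]]   # factors correlated at 0.3

print(structure_matrix(pattern, identity))  # same values as the pattern matrix
print(structure_matrix(pattern, oblique))   # differs from the pattern matrix
```

With the identity matrix the two sets of loadings coincide; with correlated factors they differ, which is why the two interpretations matter after an oblique rotation.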
Factor rotation
Rotations minimize the complexity of the factor loadings to make the structure simpler to
interpret.
Factor loading matrices are not unique; for any solution involving two or more factors there are an
infinite number of orientations of the factors that explain the original data equally well. Rotation
of the factor loading matrices attempts to give a solution with the best simple structure.
There are two types of rotation:
• Orthogonal rotations constrain the factors to be uncorrelated. Although often favored, in
many cases it is unrealistic to expect the factors to be uncorrelated, and forcing them to be
uncorrelated makes it less likely that the rotation produces a solution with a simple structure.
• Oblique rotations permit the factors to be correlated with one another. They often produce
solutions with a simpler structure.
Matrix rotations
Orthogonal and oblique matrix rotations.
p = number of variables, m = number of factors.
Extracting factors
Extract the underlying factors (latent variables) among a large number of interrelated variables in a
multidimensional dataset.
1. Select a cell in the dataset.
2. On the Analyse-it ribbon tab, in the Statistical Analyses group, click Multivariate, and then
click Common Factors.
The analysis task pane opens.
3. In the Variables list, select the variables.
4. In the Factors to extract edit box, type the number of underlying factors to attempt to
extract.
Chapter 16: Item reliability
Item reliability is the consistency of a set of items (variables); that is to what extent they measure
the same thing. When a set of items are consistent, they can make a measurement scale such as a
sum scale.
Cronbach’s alpha is the most popular measure of item reliability; it reflects the average correlation
between the items in a measurement scale, together with the number of items. If the items have
variances that significantly differ, standardized alpha is preferred.
When all items are consistent and measure the same thing, then the coefficient alpha is equal to 1.
A high value for alpha does not imply that the measure is unidimensional. To prove that a scale is
unidimensional, you can use factor analysis to check the dimensionality.
It is possible to see the effect of an individual item on the overall alpha value by recomputing
Cronbach's alpha excluding that item. If alpha increases when you exclude an item, that item
does not highly correlate with the other items in the scale. If the alpha decreases, that item does
correlate with the other items in the scale.
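The following sketch computes coefficient alpha with the usual variance-based formula, alpha = k/(k-1) × (1 - sum of item variances / variance of the sum scale), and then recomputes it with each item excluded. The function names and scores are hypothetical; the code is an illustration, not the Analyse-it implementation.

```python
def sample_variance(values):
    """Sample variance with an n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def cronbach_alpha(items):
    """items: one list of scores per item, all for the same cases."""
    k = len(items)
    totals = [sum(case) for case in zip(*items)]  # sum scale for each case
    item_variances = sum(sample_variance(item) for item in items)
    return k / (k - 1) * (1 - item_variances / sample_variance(totals))

def alpha_if_item_deleted(items):
    """Alpha recomputed with each item excluded in turn (needs 3+ items)."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]

# Hypothetical scores for five cases on three items.
items = [[1, 2, 3, 4, 5],
         [2, 2, 3, 5, 5],
         [3, 1, 4, 1, 5]]
print(cronbach_alpha(items))
print(alpha_if_item_deleted(items))
```

An item whose exclusion raises alpha above the overall value is a candidate for removal from the scale.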
Chapter 17: Fit model
Fit model describes the relationship between a response variable and one or more predictor
variables.
There are many different models that you can fit including simple linear regression, multiple linear
regression, analysis of variance (ANOVA), analysis of covariance (ANCOVA), and binary logistic
regression.
Linear fit
A linear model describes the relationship between a continuous response variable and the
explanatory variables using a linear function.
Advanced models
Advanced models describe the relationship between a response variable and multiple predictor
terms.
An advanced model is built from simple terms, polynomial terms and their interactions.
Some common forms of advanced models are:
Type                             Description
Polynomial                       A series of polynomial terms of the nth degree (X, X², X³...).
Two-way ANOVA with interaction   Two categorical terms and their interaction (A, B, A*B).
Fully factorial ANOVA            All the simple categorical terms and all the crossed interaction terms (A, B, C, A*B, A*C, B*C, A*B*C).
Performing ANOVA
Test if there is a difference between population means when a response variable is classified by
one or more categorical variables (factors).
Performing ANCOVA
Test if there is a difference between population means when a response variable is classified
by one or more categorical variables (factors) while adjusting for the effect of one or more
quantitative variables (covariates).
1. Select a cell in the dataset.
2. On the Analyse-it ribbon tab, in the Statistical Analyses group, click Fit Model, and then
click ANCOVA.
The analysis task pane opens.
3. In the Y drop-down list, select the response variable.
4. In the Available variables list, select the variable(s):
• To select a single variable, click the variable.
• To select multiple variables, click the first variable then hold down the CTRL key and click
each additional variable.
• To select a range of variables, click the first variable then hold down the SHIFT key and click
the last variable in the range.
5. To add the variables to the model as categorical factors, click Add Factor.
6. To add the variables to the model as quantitative covariates, click Add Covariate.
7. To add the interaction between two or more factors/covariates, in the Terms list box, click
the first term then hold down the CTRL key and click each additional term to include in the
interaction, and then click Cross.
8. To remove a term, in the Terms list box, click the term and then click Remove.
Note: It is sometimes quicker to use the Factorial command to build a model and then
remove any unnecessary terms.
9. To remove a factor/covariate and all terms using the factor/covariate, in the Factors or
Covariates list box, click the factor/covariate and then click Remove.
When you select a factor or covariate, all terms that include the factor or covariate are
selected. If you hold down the CTRL key and click additional factors or covariates, then all
terms which include all the selected factors or covariates are selected.
10. Repeat steps 4 through 9 to build the model.
11. Click Calculate.
Scatter plot
A scatter plot shows the relationship between variables.
The scatter plot identifies the relationship that best describes the data, whether a straight line,
polynomial or some other function.
A scatter plot matrix shows the relationship between each predictor and the response, and the
relationship between each pair of predictors. You can use the matrix to identify the relationship
between variables, to identify where additional terms such as polynomials or interactions are
needed, and to see if transformations are needed to make the predictors or response linear. The
scatter plot matrix does not convey the joint relationship between each predictor and the response
since it does not take into account the effect on the response of the other variables in the model.
Effect leverage and residual plots fulfill this purpose after fitting the model.
Summary of fit
R² and similar statistics measure how much variability is explained by the model.
R² is the proportion of variability in the response explained by the model. It is 1 when the model
fits the data perfectly, though it can only attain this value when all sets of predictors are different.
Zero indicates the model fits no better than the mean of the response. You should not use R²
when the model does not include a constant term, as the interpretation is undefined.
For models with more than a single term, R² can be deceptive as it increases as you add more
parameters to the model, eventually reaching saturation at 1 when the number of parameters
equals the number of observations. Adjusted R² is a modification of R² that adjusts for the number
of parameters in the model. It only increases when the terms added to the model improve the fit
more than would be expected by chance. It is preferred when building and comparing models
with a different number of parameters.
For example, if you fit a straight-line model, and then add a quadratic term to the model, the
value of R² increases. If you continue to add polynomial terms until there are as many
parameters as observations, the R² value reaches 1. The adjusted R² statistic takes into account
the number of parameters in the model, so it increases only when a new term serves some useful
purpose rather than simply moving the number of parameters closer to saturation.
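A short sketch makes the adjustment concrete. It assumes the common definition adjusted R² = 1 - (1 - R²)(n - 1)/(n - p), where n is the number of observations and p the number of parameters including the constant. The function names and the R² values are hypothetical.

```python
def r_squared(observed, fitted):
    """Proportion of variability in the response explained by the fit."""
    mean = sum(observed) / len(observed)
    ss_total = sum((y - mean) ** 2 for y in observed)
    ss_residual = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return 1 - ss_residual / ss_total

def adjusted_r_squared(r2, n_observations, n_parameters):
    """n_parameters counts every estimated coefficient, including the constant."""
    return 1 - (1 - r2) * (n_observations - 1) / (n_observations - n_parameters)

# Hypothetical fits to 10 observations: adding a quadratic term raises R²
# slightly, from 0.900 to 0.905, but adjusted R² falls.
print(adjusted_r_squared(0.900, 10, 2))  # straight line: constant + slope
print(adjusted_r_squared(0.905, 10, 3))  # with the quadratic term added
```

In this made-up case the small gain in R² does not compensate for the extra parameter, so the adjusted statistic favors the straight-line model.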
Parameter estimates
Parameter estimates (also called coefficients) are the change in the response associated with a
one-unit change of the predictor, all other predictors being held constant.
The unknown model parameters are estimated using least-squares estimation.
A coefficient describes the size of the contribution of that predictor; a near-zero coefficient
indicates that variable has little influence on the response. The sign of the coefficient indicates the
direction of the relationship, although the sign can change if more terms are added to the model,
so the interpretation is not particularly useful. A confidence interval expresses the uncertainty
in the estimate, under the assumption of normally distributed errors. Due to the central limit
theorem, violation of the normality assumption is not a problem if the sample size is moderate.
• For quantitative terms, the coefficient represents the rate of change in the response per 1
unit change of the predictor, assuming all other predictors are held constant. The units of
measurement for the coefficient are the units of response per unit of the predictor.
For example, a coefficient for Height of 0.75, in a simple model for the response Weight (kg)
with predictor Height (cm), could be expressed as 0.75 kg per cm which indicates a 0.75 kg
weight increase per 1 cm in height.
When a predictor is a logarithm transformation of the original variable, the coefficient is the
rate of change in the response per 1 unit change in the log of the predictor. Commonly base 2
log and base 10 log are used as transforms. For base 2 log, the coefficient can be interpreted as
the change in the response for a doubling of the predictor value. For base 10 log,
the coefficient can be interpreted as the change in the response when the predictor is
multiplied by 10, or as the % change in the response per % change in the predictor.
• For categorical terms, there is a coefficient for each level:
• For nominal predictors the coefficients represent the difference between the level mean and
the grand mean.
Analyse-it uses effect coding for nominal terms (also known as the mean deviation coding).
The sum of the parameter estimates for a categorical term using effect coding is equal to 0.
• For ordinal predictors, the coefficients represent the difference between the level mean and
the baseline mean.
Analyse-it uses reference coding for ordinal terms. The first level is used as the baseline or
reference level.
• For the constant term, the coefficient is the response when all predictors are 0, and the units of
measurement are the same as the response variable.
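The two coding schemes for categorical terms can be compared on a small example. The sketch assumes a model with a single categorical predictor and equal group sizes, so the grand mean equals the mean of the level means. The data and function names are hypothetical.

```python
def level_means(groups):
    """groups: dict mapping each level to its list of response values."""
    return {level: sum(values) / len(values) for level, values in groups.items()}

def effect_coded_estimates(groups):
    """Nominal term: each level mean minus the grand mean."""
    means = level_means(groups)
    grand_mean = sum(means.values()) / len(means)
    return {level: mean - grand_mean for level, mean in means.items()}

def reference_coded_estimates(groups):
    """Ordinal term: each level mean minus the mean of the first level."""
    means = level_means(groups)
    baseline = next(iter(means.values()))  # first level is the reference
    return {level: mean - baseline for level, mean in means.items()}

# Hypothetical response values for three levels of a factor.
dose = {"Low": [10, 12, 14], "Medium": [15, 17, 19], "High": [20, 24, 25]}
print(effect_coded_estimates(dose))     # estimates sum to 0
print(reference_coded_estimates(dose))  # the first level is 0 by construction
```

The level means are identical under both schemes; only the quantity each coefficient is measured against changes.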
A standardized parameter estimate (commonly known as a standardized beta coefficient) removes
the units of measurement of the predictor and response variables. It represents the change in
standard deviations of the response for a 1 standard deviation change of the predictor. You can use
standardized estimates to compare the relative effects of predictors measured on different scales.
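As an illustration, the sketch below fits a simple straight-line model and standardizes the slope by multiplying it by the ratio of the predictor and response standard deviations. Changing the units of Height from cm to m changes the raw coefficient but not the standardized one. The data and function names are hypothetical.

```python
import math

def slope(x, y):
    """Least-squares slope of y on x in a simple linear model."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def standard_deviation(values):
    """Sample standard deviation with an n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def standardized_slope(x, y):
    """Change in SDs of the response per 1 SD change of the predictor."""
    return slope(x, y) * standard_deviation(x) / standard_deviation(y)

# Hypothetical data: the same heights expressed in cm and in m.
height_cm = [150, 160, 165, 170, 180, 190]
height_m = [h / 100 for h in height_cm]
weight_kg = [52, 60, 63, 70, 79, 88]

print(slope(height_cm, weight_kg), slope(height_m, weight_kg))  # differ 100-fold
print(standardized_slope(height_cm, weight_kg),
      standardized_slope(height_m, weight_kg))                  # the same
```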
VIF, the variance inflation factor, represents the increase in the variance of the parameter estimate
due to correlation (collinearity) between predictors. Collinearity between the predictors can lead to
ANOVA table
An analysis of variance (ANOVA) table shows the sources of variation.
An ANOVA table uses various abbreviations:
Abbreviation   Description
SS             Sum of squares, the sum of the squared deviations from the expected value.
DF             Degrees of freedom, the number of values that are free to vary.
MS             Mean square, the sum of squares divided by the degrees of freedom.
F              F-statistic, the ratio of two mean squares that forms the basis of a hypothesis test.
p-value        The probability of obtaining an F statistic at least as extreme as that observed when the null hypothesis is true.
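To show how the columns relate, the following sketch builds the table for a simple straight-line fit. The function name and data are hypothetical. The p-value is omitted because it requires the F distribution, which is not in the Python standard library.

```python
def regression_anova(x, y):
    """SS, DF, MS, and F for a straight-line least-squares fit of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    fitted = [b0 + b1 * a for a in x]
    ss_model = sum((f - mean_y) ** 2 for f in fitted)
    ss_error = sum((obs - f) ** 2 for obs, f in zip(y, fitted))
    df_model, df_error = 1, n - 2
    ms_model, ms_error = ss_model / df_model, ss_error / df_error
    return {
        "Model": {"SS": ss_model, "DF": df_model, "MS": ms_model,
                  "F": ms_model / ms_error},
        "Error": {"SS": ss_error, "DF": df_error, "MS": ms_error},
        "Total": {"SS": ss_model + ss_error, "DF": n - 1},
    }

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
for source, row in regression_anova(x, y).items():
    print(source, row)
```

The model and error sums of squares add up to the total sum of squares, and the degrees of freedom add up in the same way.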
For a good fit, the points should be close to the fitted line, with narrow confidence bands. Points
on the left or right of the plot, furthest from the mean, have the most leverage and effectively try
to pull the fitted line toward the point. Points that are vertically distant from the line represent
possible outliers. Both types of points can adversely affect the fit.
The plot is also a visualization of the ANOVA table, except each observation is shown so you
can gain much more insight than just the hypothesis test. The confidence interval around the
full model line portrays the F-test that all the parameters except the intercept are zero. When
the confidence interval does not include the horizontal null model line, the hypothesis test is
significant.
Lack of Fit
An F-test or χ²-test formally tests how well the model fits the data.
When the model fitted is correct, the residual (model error) mean square provides an unbiased
estimate of the true variance. If the model is wrong, then the mean square is larger than the true
variance. It is possible to test for lack of fit by comparing the model error mean square to the true
variance.
When the true variance is known, a χ² test formally tests whether the model error is
equal to the hypothesized value.
When the true variance is unknown, and there are multiple observations for each set of predictor
values, an F-test formally tests whether there is a difference between pure error and the model
error. The pure error is the pooled variance calculated for all unique sets of predictor values. It
Test         Purpose
Partial      Measures the effect of the term adjusted for all other terms in the model. The sum of squares in the ANOVA table is known as the Type III sum of squares. A partial F-test is equivalent to testing if all the parameter estimates for the term are equal to zero.
Sequential   Measures the effect of the term adjusting only for the previous terms included in the model. The sum of squares in the ANOVA table is known as the Type I sum of squares. A sequential F-test is often useful when fitting a polynomial regression.
Note: The squared t-statistic for a coefficient t-test is equivalent to the F statistic when using the
partial F-test. The t-test is not suitable when the model includes categorical variables coded as
dummy predictor variables, as each term consists of multiple coefficients, each with its own t-test.
A horizontal line shows the constrained model without the term; a slanted line shows the
unconstrained model with the term. The plot shows the unique effect of adding a term to a model
assuming the model contains all the other terms and the influence of each point on the effect of
term hypothesis test. Points further from the horizontal line than the slanted line effectively try to
make the hypothesis test more significant, and those closer to the horizontal than the slanted line
try to make the hypothesis test less significant. When the confidence band fully encompasses the
horizontal line, you can conclude the term does not contribute to the model. It is equivalent to a
non-significant F-test for the partial effect of the term.
You can also use the plot to identify cases that are likely to influence the parameter estimates.
Points furthest from the intersection of the horizontal and slanted lines have high leverage, and
effectively try to pull the line towards them. A high leverage point that is distant from the bulk of
the points can have a large influence on the parameter estimates for the term.
Finally, you can use the plot to spot near collinearity between terms. When the terms are collinear,
the points collapse toward a vertical line.
Effect means
Effect means are least-squares estimates predicted by the model for each combination of levels in
a categorical term, adjusted for the other model effects.
Least-squares means are predictions made at specific values of the predictors. For each categorical
variable in the term, the value is the combination of levels. For all other variables that are not part
of the term, the values are neutral values. The neutral value for a quantitative variable is the mean.
The neutral value of a nominal variable is the average over all levels which is equivalent to values
Multiple comparisons
Multiple comparisons make simultaneous inferences about a set of parameters.
When making inferences about more than one parameter (such as comparing many means, or
the differences between many means), you must use multiple comparison procedures to make
inferences about the parameters of interest. The problem with making multiple comparisons
using individual tests, such as Student's t-test applied to each comparison, is that the chance of a
type I error increases with the number of comparisons. If you use a 5% significance level with
a hypothesis test to decide if two groups are significantly different, there is a 5% probability of
observing a significant difference that is simply due to chance (a type I error). If you made 20 such
independent comparisons, the probability that one or more of the comparisons is statistically
significant simply due to chance increases to 64%. With 50 comparisons, the chance increases to 92%.
Another problem is that dependencies among the parameters of interest also alter the significance level.
Therefore, you must use multiple comparison procedures to maintain the simultaneous probability
close to the nominal significance level (typically 5%).
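The 64% and 92% figures follow from the probability of at least one type I error among independent comparisons, 1 - (1 - alpha)^m. A minimal sketch, with a hypothetical function name and assuming the comparisons are independent:

```python
def familywise_error_rate(alpha, comparisons):
    """Probability of at least one type I error among independent tests."""
    return 1 - (1 - alpha) ** comparisons

print(round(familywise_error_rate(0.05, 1), 2))   # 0.05
print(round(familywise_error_rate(0.05, 20), 2))  # 0.64
print(round(familywise_error_rate(0.05, 50), 2))  # 0.92
```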
Multiple comparison procedures are classified by the strength of inference that can be made
and the error rate controlled. A test of homogeneity controls the probability of falsely declaring
any pair to be different when in fact all are the same. A stronger level of inference is confident
inequalities and confident directions which control the probability of falsely declaring any pair to
be different regardless of the values of the others. An even stronger level is a set of simultaneous
confidence intervals that guarantees that the simultaneous coverage probability of the intervals is
at least 100(1-alpha)% and also have the advantage of quantifying the possible effect size rather
than producing just a p-value. A higher strength inference can be used to make an inference of
a lesser strength but not vice-versa. Therefore, a confidence interval can be used to perform a
confident directions/inequalities inference, but a test of homogeneity cannot make a confident
directions/inequalities inference.
The best-known multiple comparison procedures, Bonferroni and Šidák, are not multiple
comparison procedures per se. Rather, they are inequalities useful in producing easy-to-compute
multiple comparison methods of various types. In most scenarios, there are more powerful
procedures available. A useful application of the Bonferroni inequality is when there are a small
number of pre-planned comparisons. In these cases, you can use the standard hypothesis test or
Tukey-Kramer   Compare the means of all pairs of groups using the Tukey-Kramer method. Controls the error rate simultaneously for all k(k-1)/2 contrasts.
Hsu            Compare the means of all groups against the best of the other groups using the Hsu method. Controls the error rate simultaneously for all k contrasts.
Dunnett        Compare the means of all groups against a control using the Dunnett method. Controls the error rate simultaneously for all k-1 contrasts.
Scheffé        Compare the means of all groups against all other groups using the Scheffé F method. Controls the error rate simultaneously for all possible contrasts.
Residual plot
A residual plot shows the difference between the observed response and the fitted response
values.
The ideal residual plot, called the null residual plot, shows a random scatter of points forming an
approximately constant width band around the horizontal zero line.
It is important to check the fit of the model and its assumptions (constant variance, normality,
and independence of the errors) using the residual plot, along with the normal, sequence, and lag
plots.
Independence When the order of the cases in the dataset is the order in which they
occurred:
Examine a sequence plot of the residuals against the order to identify
any dependency between the residual and time.
Examine a lag-1 plot of each residual against the previous residual to
identify a serial correlation, where observations are not independent,
and there is a correlation between an observation and the previous
observation.
Time-series analysis may be more suitable to model data where serial
correlation is present.
For a model with many terms, it can be difficult to identify specific problems using the residual
plot. A non-null residual plot indicates that there are problems with the model, but not necessarily
what these are.
Residuals - normality
Normality is the assumption that the underlying residuals are normally distributed, or
approximately so.
While a residual plot, or normal plot of the residuals can identify non-normality, you can formally
test the hypothesis using the Shapiro-Wilk or similar test.
The null hypothesis states that the residuals are normally distributed, against the alternative
hypothesis that they are not normally distributed. If the test p-value is less than the predefined
significance level, you can reject the null hypothesis and conclude the residuals are not from a
normal distribution. If the p-value is greater than the predefined significance level, you cannot
reject the null hypothesis.
Violation of the normality assumption only becomes an issue with small sample sizes. For large
sample sizes, the assumption is less important due to the central limit theorem, and the fact that
Residuals - independence
Autocorrelation occurs when the residuals are not independent of each other. That is, when the
value of e[i+1] is not independent from e[i].
While a residual plot, or lag-1 plot allows you to visually check for autocorrelation, you can
formally test the hypothesis using the Durbin-Watson test. The Durbin-Watson statistic is used
to detect the presence of autocorrelation at lag 1 (or higher) in the residuals from a regression.
The value of the test statistic lies between 0 and 4; small values indicate successive residuals
are positively correlated. If the Durbin-Watson statistic is much less than 2, there is evidence of
positive autocorrelation; if it is much greater than 2, there is evidence of negative autocorrelation.
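The statistic itself is simple to compute: it is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below uses hypothetical residuals and a hypothetical function name, and does not compute the bootstrap p-value.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals.
runs = [1, 1, 1, -1, -1, -1]   # neighbors alike: positive autocorrelation
alternating = [1, -1, 1, -1]   # neighbors opposite: negative autocorrelation
print(durbin_watson(runs))         # well below 2
print(durbin_watson(alternating))  # well above 2
```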
The null hypothesis states that the residuals are not autocorrelated, against the alternative
hypothesis that they are. If the test p-value is less than the predefined significance level, you can
reject the null hypothesis and conclude the residuals are correlated. If the p-value is greater than
the predefined significance level, you cannot reject the null hypothesis.
Note: The p-value is computed using the bootstrap method and can take a long time to compute.
Plotting residuals
Plot the residuals to check the fit and assumptions of the model.
1. Activate the analysis report worksheet.
2. On the Analyse-it ribbon tab, in the Diagnostics group, click Residuals, and then click:
Prediction
Prediction is the use of the model to predict the population mean or value of an individual future
observation, at specific values of the predictors.
When making predictions, it is important that the data used to fit the model is similar to future
populations to which you want to apply the prediction. You should be careful of making
predictions outside the range of the observed data. Assumptions met for the observed data may
not be met outside the range. Non-constant variance can cause confidence intervals for the
predicted values to become unrealistically narrow or so large as to be useless. Alternatively, a
different fit function may better describe the unobserved data outside the range.
When making multiple predictions of the population mean at different sets of predictor values
the confidence intervals can be simultaneous or individual. A simultaneous interval ensures you
achieve the confidence level simultaneously for all predictions, whereas individual intervals only
ensure confidence for the individual prediction. With individual inferences, the chance of at least
one interval not including the true value increases with the number of predictions.
Making predictions
Predict the value of the mean at specific values of the predictors, or the value of an individual
future observation.
1. Activate the analysis report worksheet.
2. On the Analyse-it ribbon tab, click Predict.
The analysis task pane Predict Y panel opens.
3. In the Terms list box, under the #1 column, type the value for each predictor.
You can leave a value blank in which case the average value is used. Leaving the value blank
is useful in ANOVA models where you can predict the mean at specific combinations averaged
over other effects.
Saving variables
Save the values computed by the analysis back to the dataset for further analysis.
1. Activate the analysis report worksheet.
2. On the Analyse-it ribbon tab, click Save Variable, and then select the variable to store to the
dataset.
Note: If the Save Variable command is shaded, on the Analyse-it ribbon tab, in the Report
group, click Recalculate to update the analysis.
Type               Description
Logit / Logistic   Fit a model to a binary response variable expressed by the logit link function (log odds ratio) and binomial error distribution. Note: This model is very common as the parameter estimates can be interpreted as the log-odds or back transformed into an odds ratio.
Probit             Fit a model to a binary response variable expressed by the probit function and binomial error distribution.
Parameter estimates
Parameter estimates (also called coefficients) are the log odds ratio associated with a one-unit
change of the predictor, all other predictors being held constant.
The unknown model parameters are estimated using maximum-likelihood estimation.
A coefficient describes the size of the contribution of that predictor; a large coefficient indicates
that the variable strongly influences the probability of that outcome, while a near-zero coefficient
indicates that variable has little influence on the probability of that outcome. A positive sign
indicates that the explanatory variable increases the probability of the outcome, while a negative
sign indicates that the variable decreases the probability of that outcome. A confidence interval for
each parameter shows the uncertainty in the estimate.
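For a quantitative predictor in a logit model, exponentiating the coefficient gives the odds ratio for a one-unit increase in the predictor. The sketch below checks this numerically; the intercept, coefficient, and function names are hypothetical and are not output from Analyse-it.

```python
import math

def odds_ratio(coefficient):
    """Back-transform a logit coefficient (log odds ratio) to an odds ratio."""
    return math.exp(coefficient)

def predicted_probability(intercept, coefficient, x):
    """Probability of the outcome from a one-predictor logit model."""
    log_odds = intercept + coefficient * x
    return 1 / (1 + math.exp(-log_odds))

def odds(probability):
    return probability / (1 - probability)

# Hypothetical fitted model: log odds = -3 + 0.5 * x.
intercept, coefficient = -3.0, 0.5
p4 = predicted_probability(intercept, coefficient, 4)
p5 = predicted_probability(intercept, coefficient, 5)
print(odds_ratio(coefficient))  # odds multiply by this per 1-unit increase
print(odds(p5) / odds(p4))      # the same value, from predicted probabilities
```

The odds ratio is constant across the range of the predictor, whereas the change in probability per unit is not.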
When the model contains categorical variables, the interpretation of the coefficients is more
complex. For each term involving a categorical variable, a number of dummy predictor variables
are created to predict the effect of each different level. There are different ways to code the
predictors for a categorical variable; the most common method in logistic regression is called
reference cell coding or dummy coding. In reference cell coding, the first category acts as a
Requirements
• 1 or more categorical or quantitative predictor variables.
• A categorical or quantitative response variable.
Dataset layout
Use a column for each predictor variable (Height, Sex) and a column for the response variable
(Weight); each row has the values of the variables for a case (Subject).
Subject   Height   Sex   Weight
1         175      M     65
2         180      M     70
3         160      F     90
4         190      F     55
5         180      M     100
6         150      F     55
7         140      M     75
8         160      M     80
9         165      F     80
10        180      M     95
…         …        …     …
Chapter 18: Method comparison
Method comparison measures the closeness of agreement between the measured values of two
methods.
Note: The term method is used as a generic term and can include different measurement
procedures, measurement systems, laboratories, or any other variable that you want to test to
determine if there are differences between measurements.
Correlation coefficient
A correlation coefficient measures the association between two methods.
The correlation coefficient is probably the most commonly reported statistic in method comparison
studies. However, it is irrelevant for a number of reasons (Bland & Altman, 1986):
• It is a measure of the strength of linear association between two methods, the extent to which
as one variable increases the other variable also tends to increase, not the agreement between
them.
• A change in the scale of measurement does not affect the correlation, even though it affects
the agreement. For example, if one method reports double the value of another method the
correlation coefficient would still be high even though the agreement between the methods is
poor.
• It simply represents the ratio of variation between the subjects relative to the measurement
variation. The measuring interval chosen in study design can affect the correlation coefficient.
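The second point is easy to demonstrate. In the sketch below, a hypothetical method B always reports double the value of method A: the correlation is perfect, yet the methods disagree by a large amount. The data and function names are made up for illustration.

```python
import math

def pearson_correlation(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    saa = sum((p - mean_a) ** 2 for p in a)
    sbb = sum((q - mean_b) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)

def mean_difference(a, b):
    """Average of b - a, a simple summary of disagreement."""
    return sum(q - p for p, q in zip(a, b)) / len(a)

# Hypothetical measurements: method B reports double the value of method A.
method_a = [50, 60, 70, 80, 90]
method_b = [2 * value for value in method_a]
print(pearson_correlation(method_a, method_b))  # 1, up to rounding
print(mean_difference(method_a, method_b))      # 70.0
```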
The correlation coefficient is sometimes re-purposed as an adequate range test (CLSI, 2002) on the
basis that the ratio of variation between subjects, relative to measurement variation, is an indicator
of the quality of the data (Stöckl & Thienpont, 1998). When the correlation coefficient is greater
than 0.975 the parameters of an ordinary linear regression are not significantly biased by the error
in the X variable, and so linear regression is sometimes recommended. However, with the wide
range of proper regression procedures available for analyzing method comparison studies, there is
little need to use inappropriate models.
Scatter plot
A scatter plot shows the relationship between two methods.
The scatter plot shows measured values of the reference or comparison method on the horizontal
axis, against the test method on the vertical axis.
The relationship between the methods may indicate a constant, or proportional bias, and the
variability in the measurements across the measuring interval. If the points form a constant-width
band, the method has a constant standard deviation (constant SD). If the points form a band that
is narrower at small values and wider at large values, there is a constant relationship between the
standard deviation and value, and the method has a constant coefficient of variation (CV). Some
measurement procedures exhibit constant SD in the low range and constant CV in the high range.
If both methods measure on the same scale, a gray identity line shows ideal agreement and
serves as a reference against which to compare the relationship.
Fit Y on X
Regression of Y on X describes the linear relationship between the methods.
Deming regression
Deming regression is an errors-in-variables model that fits a line describing the relationship
between two variables. Unlike ordinary linear regression, it is suitable when there is measurement
error in both variables.
Deming regression (Cornbleet & Gochman, 1979) finds the line of best fit by minimizing the sum
of the distances between the measured values and the regression line, at an angle specified by the
variance ratio. It assumes both variables are measured with error. The ratio of the error variances
of the X and Y variables (X / Y) is required and assumed to be constant across the measuring interval. If you
measure the items in replicate, the measurement error of each method is estimated from the data
and the variance ratio calculated.
In the case where the variance ratio is equal to 1, Deming regression is equivalent to orthogonal
regression. When only single measurements are made by each method and the ratio of variances is
unknown, a variance ratio of 1 is sometimes used as a default. When the range of measurements
is large compared to the measurement error this can be acceptable. However, in cases where the
range is small the estimates are biased and the standard error underestimated, leading to incorrect
hypothesis tests and confidence intervals (Linnet, 1998).
Weighted Deming regression (Linnet, 1990) is a modification of Deming regression that assumes
the ratio of the coefficient of variation (CV), rather than the ratio of variances, is constant across
the measuring interval.
Confidence intervals for parameter estimates use a t-distribution and the standard errors are
computed using a Jackknife procedure (Linnet, 1993).
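As a rough illustration of the point estimates only, the following sketch uses the standard closed-form Deming slope. The function name deming_fit and the parameter delta are hypothetical. Here delta is assumed to be the ratio of the Y error variance to the X error variance, so if the variance ratio is expressed as X / Y, delta would be its reciprocal. The sketch does not compute the Jackknife standard errors or confidence intervals.

```python
from math import sqrt

def deming_fit(x, y, delta=1.0):
    """Closed-form Deming regression estimates.

    delta is ASSUMED to be (Y error variance) / (X error variance);
    delta = 1 corresponds to orthogonal regression.
    Returns (slope, intercept). Fails if the sample covariance is zero.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    syy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    term = syy - delta * sxx
    slope = (term + sqrt(term ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Illustrative data lying exactly on y = 2x + 1.
slope, intercept = deming_fit([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(slope, intercept)
```

For data that lie exactly on a line, the fitted line is recovered whatever value of delta is supplied; the choice of delta matters only when both variables contain measurement error.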
Passing-Bablok regression
Passing-Bablok regression fits a line describing the relationship between two variables. It is robust,
non-parametric, and is not sensitive to outliers or the distribution of errors.
Passing-Bablok regression (Passing & Bablok, 1983) finds the line of best fit using a shifted median
of all possible pairwise slopes between points. It does not make assumptions of the distribution of
the measurement errors, and the variance of the measurement errors need not remain constant
over the measuring interval though their ratio should remain proportional to β² (the slope squared,
in many cases β ≈ 1).
There is a modification to the procedure for use when the methods measure on different scales,
or where the purpose is to transform results from one method to another rather than to compare
two methods for equality (Passing & Bablok, 1988).
Confidence intervals for parameter estimates are based on a normal approximation or bootstrap.
Confidence curves around the regression line and at specific points on the line are obtained by
bootstrap.
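The shifted median of pairwise slopes can be sketched as follows. This is a simplified illustration of the point estimates under the usual description of the 1983 procedure, not the Analyse-it implementation: the function name passing_bablok is hypothetical, no confidence intervals are computed, and the method is only meaningful when the methods are positively related.

```python
from statistics import median

def passing_bablok(x, y):
    """Simplified Passing-Bablok point estimates: returns (slope, intercept)."""
    slopes = []
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx, dy = x[j] - x[i], y[j] - y[i]
            if dx == 0 and dy == 0:
                continue                      # identical points give no slope
            if dx == 0:
                s = float("inf") if dy > 0 else float("-inf")
            else:
                s = dy / dx
            if s == -1:
                continue                      # slopes of exactly -1 are discarded
            slopes.append(s)
    slopes.sort()
    m = len(slopes)
    k = sum(1 for s in slopes if s < -1)      # offset that shifts the median
    if m % 2 == 1:
        slope = slopes[(m - 1) // 2 + k]
    else:
        slope = 0.5 * (slopes[m // 2 - 1 + k] + slopes[m // 2 + k])
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept

# Illustrative data on y = 2x + 1 with one gross outlier in the last result.
print(passing_bablok([1, 2, 3, 4, 5], [3, 5, 7, 9, 100]))
```

Because the estimate is a median of slopes, the single outlying point in the example does not move the fitted line away from y = 2x + 1.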
Ordinary Least Squares Fit an ordinary regression where the test method measurement
error SD is constant throughout the measuring interval.
Weighted Least Squares Fit an ordinary regression where the test method measurement
error CV is constant throughout the measuring interval.
1st X, 1st Y Uses only the 1st X replicate and 1st Y replicate in the regression.
Mean X, 1st Y Uses the Mean of X replicates and the 1st Y replicate in the
regression.
Mean X, Mean Y Uses the Mean of the X replicates and the Mean of the Y replicates in
the regression.
Note: All items must have the same number of replicates. Items that do not are excluded from
analysis.
7. Click Calculate.
Ordinary Deming Fit a Deming regression where the ratio of measurement error SD is
constant throughout the measuring interval.
1st X, 1st Y Uses only the 1st X replicate and 1st Y replicate in the regression.
Mean X, 1st Y Uses the Mean of X replicates and the 1st Y replicate in the
regression.
Mean X, Mean Y Uses the Mean of the X replicates and the Mean of the Y replicates in
the regression.
Note: All items must have the same number of replicates. Items that do not are excluded from
analysis.
7. If the items are not measured in replicate, in the Variance ratio edit box, type the ratio of the
variances, or in the SD/CV X and SD/CV Y edit boxes, type the standard deviation or CV of the
measurement error for each method.
Note: The regression procedure uses all replicate measurements to estimate the precision of
each method.
8. Click Calculate.
1st X, 1st Y Uses only the 1st X replicate and 1st Y replicate in the regression.
Mean X, 1st Y Uses the Mean of X replicates and the 1st Y replicate in the
regression.
Mean X, Mean Y Uses the Mean of the X replicates and the Mean of the Y replicates in
the regression.
Note: All items must have the same number of replicates. Items that do not are excluded from
analysis.
7. In the Method drop-down list, select:
Part I When the methods are measured on the same scale and the purpose is to
compare if the methods are equal.
Part III When the methods are measured on different scales, or the purpose is to
convert the values from one method to the other using the regression formula.
8. Click Calculate.
Linearity
Linearity is the assumption that the relationship between the methods is linear.
The regression procedures used in method comparison studies assume the relationship between
the methods is linear. A CUSUM is a measure of the linearity, defined as a running sum of the
number of observations above and below the fitted regression line. When the relationship is linear
it is expected the points above and below the line are randomly scattered, and the CUSUM statistic
is small. Clusters of points on one side of the regression line produce a large CUSUM statistic.
A formal hypothesis test for linearity is based on the largest CUSUM statistic and the Kolmogorov-
Smirnov test. The null hypothesis states that the relationship is linear, against the alternative
hypothesis that it is not linear. When the test p-value is small, you can reject the null hypothesis
and conclude that the relationship is nonlinear.
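The idea behind the CUSUM statistic can be sketched with a simplified version that adds 1 for each point above the fitted line and subtracts 1 for each point below it, taking the points in order of X. This is an assumption made for illustration: the formal test may score and order the points differently, and the sketch does not compute a p-value. The function name max_cusum is hypothetical.

```python
def max_cusum(x, y, slope, intercept):
    """Largest absolute running sum of +1 (above line) / -1 (below line)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    running = 0
    largest = 0
    for i in order:
        residual = y[i] - (intercept + slope * x[i])
        if residual > 0:
            running += 1
        elif residual < 0:
            running -= 1
        largest = max(largest, abs(running))
    return largest

x = [1, 2, 3, 4, 5, 6, 7, 8]
# Points alternate above and below the line y = x: random-looking scatter.
scattered = [xi + (0.1 if i % 2 == 0 else -0.1) for i, xi in enumerate(x)]
# Points cluster below the line at low values and above it at high values.
clustered = [xi - 0.1 if xi <= 4 else xi + 0.1 for xi in x]

print(max_cusum(x, scattered, 1.0, 0.0), max_cusum(x, clustered, 1.0, 0.0))
```

The scattered data keep the running sum close to zero, whereas the clustered data drive it far from zero, which is the pattern the linearity test looks for.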
Average bias
Bias is a measure of a systematic measurement error, the component of measurement error that
remains constant in replicate measurements on the same item. When measuring a method against
a reference method using many items the average bias is an estimate of bias that is averaged over
all the items.
Bias is the term used when a method is compared against a reference method. When the
comparison is not against a reference method but instead another routine comparative laboratory
method, it is simply an average difference between methods rather than an average bias. For
clarity of writing we will use the term average bias.
The average bias is usually expressed as the constant and proportional bias from a regression
procedure, or as a constant or proportional bias from the mean of the differences or relative
differences. If there are other sources of systematic error present, such as nonlinearity or
interferences, the average bias will be incorrect.
The average bias is an estimate of the true unknown average bias in a single study. If the study
were repeated, the estimate would be expected to vary from study to study. Therefore, if a
The classic difference plot shows the difference between the methods on the vertical axis, against
the best estimate of the true value on the horizontal axis. When one method is a reference
method, it is used as the best estimate of the true value and plotted on the horizontal axis
(Krouwer, 2008). In other cases, the average of the methods is used as the best estimate of the true
value, to avoid an artificial relationship between the difference and magnitude (Bland & Altman,
1995).
A relative difference plot (Pollock et al., 1993) shows the relative differences on the vertical axis,
against the best estimate of the true value on the horizontal axis. It is useful when the methods
show variability related to increasing magnitude, that is where the points on a difference plot form
a band starting narrow and becoming wider as X increases. Another alternative is to plot the ratio
of the methods on the vertical axis, against the best estimate of the true value on the horizontal
axis.
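The quantities plotted on these difference plots can be computed as in the sketch below. The sign convention (Y minus X), the definition of the relative difference as the difference divided by the best estimate of the true value, and the function name difference_rows are assumptions made for illustration; the values are illustrative only.

```python
from statistics import mean, median

def difference_rows(x, y, x_is_reference=False):
    """Return (best_estimate, difference, relative_difference, ratio) per item."""
    rows = []
    for xi, yi in zip(x, y):
        best = xi if x_is_reference else (xi + yi) / 2   # horizontal axis
        diff = yi - xi                                   # classic difference plot
        rows.append((best, diff, diff / best, yi / xi))
    return rows

# Illustrative single measurements by two methods.
x = [120, 113, 167, 185, 122]
y = [121, 118, 150, 181, 122]

rows = difference_rows(x, y)
diffs = [row[1] for row in rows]
print(mean(diffs), median(diffs))
```

The mean and median of the differences are two simple summaries of the average difference between the methods; they can differ noticeably when a few items show large differences, as in this example.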
Fit differences
Regression of the differences on the true value describes the relationship between the methods.
Mean difference measures the constant relationship between the variables. An assumption is
that the difference is not relative to magnitude across the measuring interval. If the differences
are related to the magnitude, the relationship should be modeled by using relative differences or
regressing the differences on the true value.
Mean difference Estimate the average difference using the mean of the differences.
Median difference Estimate the average difference using the median of the differences.
Useful when the differences are not normally
distributed.
X X is a reference method.
(X+Y)/2 Neither X nor Y is a reference method.
8. Click Calculate.
Krouwer and Monti (1995) devised the mountain plot (also known as a folded empirical
cumulative distribution plot) as a complementary representation of the difference plot. It shows
the distribution of the differences with an emphasis on the center and the tails of the distribution.
You can use the plot to estimate the median of the differences, the central 95% interval, the
range, and the percentage of observations outside the total allowable error bands.
The plot is simply the empirical cumulative distribution function of the differences folded around
the median (that is, the plotted function = p where p < 0.5 otherwise 1-p). Unlike the histogram, it
is unaffected by choice of class intervals; however, it should be noted that although the mountain
plot looks like a frequency polygon it does not display the density function. It has recently been
proven that the area under the plot is equal to the mean absolute deviation from the median (Xue
and Titterington, 2010).
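The folding step can be sketched as follows. The empirical cumulative probability is taken here as rank / n, which is one common convention and an assumption of this sketch; the function name mountain_plot_points is hypothetical.

```python
def mountain_plot_points(differences):
    """Return (difference, folded cumulative probability) pairs, sorted by difference."""
    ordered = sorted(differences)
    n = len(ordered)
    points = []
    for rank, value in enumerate(ordered, start=1):
        p = rank / n                       # empirical CDF at this difference
        folded = p if p < 0.5 else 1 - p   # fold the upper half back down
        points.append((value, folded))
    return points

# Illustrative differences between two methods.
for value, height in mountain_plot_points([1, 5, -17, -4, 0]):
    print(value, round(height, 2))
```

The heights rise towards the center of the distribution and fall away on either side, which produces the mountain shape.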
The positive agreement (PPA) and negative agreement (NPA) have a natural interpretation:
• Positive agreement is the proportion of comparative/reference method positive results in which
the test method result is positive.
Symmetric agreement measures are not affected by interchanging the X and Y variable. These
are useful in many other cases, such as comparing observers, laboratories, or other factors where
neither is a natural comparator. There are various measures based on the mean of the proportions
to which X agrees with Y, and Y agrees with X. The Kulczynski, Dice-Sørensen, and Ochiai are
three such measures that use arithmetic, harmonic, and geometric mean of the proportions,
respectively.
The harmonic mean weights the smaller proportion more heavily and produces the smallest value
amongst the three measures. The geometric mean is the square root of Bangdiwala’s B statistic,
which is the ratio of the observed agreement to maximum possible agreement in the agreement
plot. The arithmetic mean has the greatest value. In most cases of moderate to high agreement,
there is very little to choose between the measures.
The average positive agreement and average negative agreement using the Dice-Sørensen
measure are:
• Average positive agreement is the number of positive matches as a proportion of the average
of the number of positive results by X, Y.
• Average negative agreement is the number of negative matches as a proportion of the average
of the number of negative results by X, Y.
The overall proportion of agreement is the sum of the diagonal entries divided by the total.
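The measures above can be sketched for a 2 by 2 table. The cell names and the function name agreement_measures are hypothetical, and the counts are illustrative only: both_pos is the number of items positive by both X and Y, x_only the number positive by X only, y_only the number positive by Y only, and both_neg the number negative by both.

```python
from math import sqrt

def agreement_measures(both_pos, x_only, y_only, both_neg):
    """Agreement measures for a 2 x 2 table of counts (X taken as the comparator)."""
    total = both_pos + x_only + y_only + both_neg
    y_agrees_x = both_pos / (both_pos + x_only)   # positive agreement against X
    x_agrees_y = both_pos / (both_pos + y_only)
    return {
        "positive_agreement": y_agrees_x,
        "negative_agreement": both_neg / (both_neg + y_only),
        "kulczynski": (y_agrees_x + x_agrees_y) / 2,                 # arithmetic mean
        "dice_sorensen": 2 * both_pos / (2 * both_pos + x_only + y_only),  # harmonic mean
        "ochiai": sqrt(y_agrees_x * x_agrees_y),                     # geometric mean
        "average_negative": 2 * both_neg / (2 * both_neg + x_only + y_only),
        "overall": (both_pos + both_neg) / total,
    }

result = agreement_measures(both_pos=32, x_only=1, y_only=3, both_neg=38)
for name, value in result.items():
    print(name, round(value, 4))
```

With these counts the three symmetric measures differ only in the third decimal place, in the order harmonic, geometric, arithmetic.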
Agreement plot
An agreement plot shows the agreement between two binary or semi-quantitative methods.
Bangdiwala (2013) devised the agreement plot as a complement to the kappa or B-statistics. It
is invaluable for assessing agreement as it gives a visual impression that no summary statistic can
convey.
The agreement plot is a visual representation of a k by k square contingency table. Each black
rectangle represents the marginal totals of the rows and columns. Shaded boxes represent the
agreement based on the diagonal cell frequencies; they are positioned inside the rectangles
Reference Compute the proportion of the Y (Test method) that agree with the
X (Reference / Comparative method). Useful to make comparisons
against a reference method or the current laboratory method. Note:
This method is asymmetric and the agreement is different depending
on the assignment of X and Y.
Average Compute the Dice-Sørensen measure of average agreement. Useful
when comparing say 2 observers or laboratories where neither is
a natural comparator. Note: This method is symmetric and doesn't
depend on assignment of X and Y.
7. On the Analyse-it ribbon tab, in the Method Comparison group, click Estimate and then
select an overall agreement estimator.
8. Click Calculate.
Requirements
• 2 quantitative variables.
• A recommended minimum of at least 40 cases.
• Each measurement in singlicate or replicate.
1 120 121
2 113 118
3 167 150
4 185 181
5 122 122
… … …
1 120 121
1 122 120
2 113 118
2 110 119
3 167 150
3 170 155
4 185 181
4 188 180
5 122 122
5 123 130
… … …
1 X 120
1 X 122
1 Y 121
1 Y 120
2 X 113
2 X 110
2 Y 118
2 Y 119
3 X 167
3 X 170
3 Y 150
3 Y 155
4 X 185
4 X 188
4 Y 181
4 Y 180
5 X 122
5 X 123
5 Y 122
5 Y 130
… … …
Requirements
• 2 qualitative (binary / semi-quantitative) variables.
1 + +
2 - -
3 + +
4 - -
5 - -
6 - -
7 + +
8 + -
9 - -
10 - -
… … …
+ + 32
+ - 1
- + 3
- - 38
Precision
Precision is the closeness of agreement between measured quantity values obtained by replicate
measurements on the same or similar objects under specified conditions.
Precision is not a quantity and therefore it is not expressed numerically. Rather, it is expressed by
measures such as the variance, standard deviation, or coefficient of variation under the specified
conditions of measurement.
Conditions of measurement
Many different factors may contribute to the variability between replicate measurements, including
the operator; the equipment used; the calibration of the equipment; the environment; the time
elapsed between measurements.
Two conditions of precision, termed repeatability and reproducibility conditions, are useful for
describing the variability of a measurement procedure. Other intermediate conditions between
these two extreme conditions of precision are also conceivable and useful, such as conditions
within a single laboratory.
Reproducibility
Reproducibility is a measure of precision under a defined set of conditions: different locations,
operators, measuring systems, and replicate measurements on the same or similar objects.
Intermediate precision
Intermediate precision (also called within-laboratory or within-device) is a measure of precision
under a defined set of conditions: same measurement procedure, same measuring system, same
location, and replicate measurements on the same or similar objects over an extended period of
time. It may include changes to other conditions such as new calibrations, operators, or reagent
lots.
Repeatability
Repeatability (also called within-run precision) is a measure of precision under a defined set
of conditions: same measurement procedure, same operators, same measuring system, same
operating conditions and same location, and replicate measurements on the same or similar
objects over a short period of time.
Variance components
Variance components are estimates of a part of the total variability accounted for by a specified
source of variability.
Random factors are factors where a number of levels are randomly sampled from the population,
and the intention is to make inferences about the population. For example, a study might examine
the precision of a measurement procedure in different laboratories. In this case, there are 2
variance components: variation within an individual laboratory and the variation among all
laboratories. When performing the study it is impractical to study all laboratories, so instead a
random sample of laboratories is used. More complex studies may examine the precision within a
single run, within a single laboratory, and across laboratories.
Most precision studies use a nested (or hierarchical) model where each level of a nested factor is
unique amongst each level of the outer factor. The basis for estimating the variance components
is the nested analysis of variance (ANOVA). Estimates of the variance components are extracted
from the ANOVA by equating the mean squares to the expected mean squares. If a variance estimate is
negative, usually due to a small sample size, it is set to zero. Variance components are combined
by summing them to estimate the precision under different conditions of measurements.
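For the simplest case, a single random factor (for example, laboratory) with the same number of replicates at each level, the calculation can be sketched as below. The function name variance_components is hypothetical and the data are illustrative; studies with more nesting levels follow the same pattern with more mean squares.

```python
def variance_components(groups):
    """One random factor, balanced design.

    groups: list of equally sized lists of replicate measurements.
    Returns (within_variance, between_variance, total_variance).
    """
    k = len(groups)          # number of levels of the factor
    n = len(groups[0])       # replicates per level
    grand_mean = sum(sum(g) for g in groups) / (k * n)
    means = [sum(g) / n for g in groups]
    ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    ms_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g) / (k * (n - 1))
    within = ms_within
    # Equate mean squares to expected mean squares; a negative estimate is set to zero.
    between = max(0.0, (ms_between - ms_within) / n)
    return within, between, within + between

print(variance_components([[1, 3], [5, 7]]))   # levels differ: between component > 0
print(variance_components([[1, 3], [1, 3]]))   # identical levels: between component set to 0
```

The total is the sum of the components, and its square root gives the corresponding SD.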
The variance components can be expressed as a variance, standard deviation (SD), or coefficient
of variation (CV). A point estimate is a single value that is the best estimate of the true unknown
parameter; a confidence interval is a range of values and indicates the uncertainty of the estimate.
A larger estimate reflects less precision.
There are numerous methods for constructing confidence intervals for variance components:
Estimator Description
Exact Used to form intervals on the inner-most nested variance components.
Based on the F-distribution.
Satterthwaite Used to form intervals on the sum of the variances. Based on the
F-distribution, as above, but uses modified degrees of freedom.
Works well when the factors have equal or a large number of levels,
though when the differences between them are large it can produce
unacceptably liberal confidence intervals.
Modified Large Sample (MLS) Used to form intervals on the sum of variances or individual
components. A modification of the large sample normal theory
approach to constructing confidence intervals. Provides good coverage
close to the nominal level in a wide range of cases.
Variance function
A variance function describes the relationship between the variance and the measured quantity
value.
There are numerous models that describe the relationship between the variance and measured
quantity value across the measuring interval:
Fit Description
Constant variance Fit constant variance across the measuring interval.
Constant CV Fit constant coefficient of variation across the measuring interval.
A variance function can be useful to estimate the limit of detection or limit of quantitation.
Bias
Bias is a measure of a systematic measurement error, the component of measurement error that
remains constant in replicate measurements.
The bias can be expressed in absolute measurement units or as a percentage relative to the known
value. A point estimate is a single value that is the best estimate of the true unknown parameter; a
confidence interval is a range of values and indicates the uncertainty of the estimate.
The bias is an estimate of the true unknown bias in a single study. If the study were repeated,
the estimate would be expected to vary from study to study. Therefore, if a single estimate is
compared directly to 0 or compared to the allowable bias the statement is only applicable to the
single study. To make inferences about the true unknown bias you must perform a hypothesis test.
There are two common hypotheses of interest that can be tested:
• Equality test
The null hypothesis states that the bias is equal to 0, against the alternative hypothesis that it is
not equal to zero. When the test p-value is small, you can reject the null hypothesis and conclude
that the bias is different to zero.
It is important to remember that a statistically significant p-value tells you nothing about the
practical importance of what was observed. For a large sample, the bias for a statistically
significant hypothesis test may be so small as to be practically useless. Conversely, although
there may be some evidence of bias, the sample size may be too small for the test to reach
statistical significance, and you may miss an opportunity to discover a true meaningful bias.
Lack of evidence against the null hypothesis does not mean it has been proven to be true; the
belief before you perform the study is that the null hypothesis is true and the purpose is to look
for evidence against it. An equality test at the 5% significance level is equivalent to comparing
a 95% confidence interval to see if it includes zero.
• Equivalence test
The null hypothesis states that the bias is outside an interval of practical equivalence, against
the alternative hypothesis that the bias is within the interval considered practically equivalent.
When the test p-value is small, you can reject the null hypothesis and conclude that the bias is
practically equivalent, and within the specified interval.
An equivalence test is used to prove a bias requirement can be met. The null hypothesis states
the methods are not equivalent and looks for evidence that they are in fact equivalent. An
equivalence hypothesis test at the 5% significance level is the same as comparing the 90%
confidence interval to the allowable bias interval.
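The confidence-interval form of both tests can be sketched as follows. The sketch assumes the confidence intervals have already been computed and that the allowable bias interval is symmetric about zero; the function names and the numbers are hypothetical.

```python
def equality_rejected(ci95):
    """Equality test at the 5% level: reject when the 95% CI excludes zero."""
    lower, upper = ci95
    return not (lower <= 0.0 <= upper)

def equivalence_shown(ci90, allowable_bias):
    """Equivalence test at the 5% level: the 90% CI must lie wholly inside
    the interval (-allowable_bias, +allowable_bias)."""
    lower, upper = ci90
    return -allowable_bias < lower and upper < allowable_bias

# Illustrative intervals for an estimated bias of 0.8 units.
print(equality_rejected((0.2, 1.4)))                      # True: bias differs from zero
print(equivalence_shown((0.3, 1.3), allowable_bias=2.0))  # True: within allowable bias
```

The example shows a bias that is statistically different from zero yet still within the allowable bias, which is why statistical significance alone says nothing about practical importance.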
Linearity
Nonlinear bias is a component of bias that cannot be represented by a linear relationship between
the measured and true values.
A measurement procedure is linear when there is a mathematically verified straight-line
relationship between the measured and true values. It is an important parameter as it allows linear
interpolation of results between points.
Best significant term (default) Fits 2nd- and 3rd-order polynomials, then determines whether either
polynomial is better than the linear fit by testing whether the nonlinear terms
(polynomial fit coefficients) are statistically significant. The 3rd-order
polynomial is used if it has significant nonlinear terms; otherwise the 2nd-order
polynomial is used if it has significant nonlinear terms; otherwise the
measurement procedure is assumed to be linear. Recommended by
CLSI EP6.
Note: If the precision of the measurement procedure is poor, nonlinearity can be difficult to
detect as neither of the polynomial fits will be significantly better than the linear fit, due to the
amount of random error in the measurements.
7. Click Calculate.
Interferences
Interference bias is a component of bias caused by nonspecificity attributable to the presence of a
specific interfering substance.
Normal quantile Estimate the LoB using the 5% upper of the distribution of blank values
(levels with assigned value 0).
Use when values are normally distributed and not truncated at 0.
Quantile Estimate the LoB using the 95th percentile of the distribution of blank
values.
Use when values are truncated at zero.
5. In the Alpha edit box, type the probability that a blank material gives a result greater than the
critical value (LoB), when in fact the substance is not present (typically 5%).
6. In the SD drop-down list, select Precision profile function.
7. In the Beta edit box, type the probability that a non-blank material gives a result less than the
critical value (LoB) when in fact the substance is present (typically 5%).
8. Click Calculate.
Requirements
• A quantitative variable.
• 1 or more optional factor variables indicating the random effects of interest.
• At least 2 replicates at each level.
Dataset layout
Use a column for the measured variable (Measured value) and optionally a by variable (Level); each
row is a separate measurement.
Level (optional) Measured value
120 121
120 118
120 124
120 120
120 116
… …
240 240
240 246
240 232
240 241
240 240
… …
120 1 121
120 1 118
120 1 …
120 2 120
120 2 116
120 2 …
120 3 …
120 … …
… … …
240 1 240
240 1 242
240 1 …
240 2 260
240 2 238
240 2 …
240 3 …
240 … …
… … …
120 1 1 121
120 1 1 118
120 1 1 …
120 1 2 120
120 1 2 116
120 1 2 …
120 1 3 …
120 1 … …
120 2 1 124
120 2 1 119
120 2 1 …
120 2 2 118
120 2 2 121
120 2 2 …
120 2 3 …
120 2 … …
… … …
1 0.7 0.9 …
2 4.6 4.1 …
3 6.5 6.9 …
4 11 12.2 …
Note: All the above dataset layouts can arrange replicate measurements in a single row rather
than in multiple rows.
Chapter 20: Reference interval
A reference interval (sometimes called a reference range or normal range) describes the range of
values of a measured quantity in healthy individuals.
Reference limits
A reference limit is a value that a given proportion of reference values are less than or
equal to. A reference interval defines the interval between a lower and upper reference limit that
includes a given proportion of the reference values.
The process of defining a reference interval is that reference individuals comprise a reference
population from which is selected a reference sample group on which are determined reference
values on which is observed a reference distribution from which are calculated reference limits that
define a reference interval.
A point estimate of a reference limit is a single value that is the best estimate of the true unknown
parameter; a confidence interval is a range of values and indicates the uncertainty of the estimate.
It is important to remember that the width of the confidence interval is dependent upon sample
size and that large sample sizes are required to produce narrow confidence intervals, particularly
for skewed distributions (Linnet, 2000).
There are numerous quantile estimators that may define the reference limits. The best estimator
depends heavily on the shape of the distribution.
Estimator Description
MVUE The uniformly minimum variance unbiased quantile estimator uses the
sample mean and unbiased standard deviation as the best estimate of
the population parameters (Xbar ± Z(alpha) * (s / c4(n))). The factor c4(n)
is applied to the sample standard deviation to account for the bias in the
estimate for small sample sizes.
t-based The t-based prediction interval quantile estimator uses the Student's t
distribution for a prediction interval (Xbar ± t(alpha, n-1) * s * sqrt(1+1/
n)) for a single future observation (Horn, 2005). Note that this method
produces a wider interval than the MVUE estimator for small samples.
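The MVUE formula can be sketched as below, using the standard definition of c4(n) in terms of the gamma function. The function names are hypothetical, the data are illustrative, and Z(alpha) is taken here to be the standard normal quantile that leaves the stated proportion in the center of the distribution. The t-based estimator is not shown because it needs a Student's t quantile, which the Python standard library does not provide.

```python
from math import exp, lgamma, sqrt
from statistics import NormalDist, mean, stdev

def c4(n):
    """Bias-correction factor for the sample standard deviation of n normal values."""
    return sqrt(2.0 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

def mvue_reference_interval(values, proportion=0.95):
    """Xbar +/- Z * (s / c4(n)) for the central `proportion` of values."""
    n = len(values)
    z = NormalDist().inv_cdf(0.5 + proportion / 2)
    half_width = z * stdev(values) / c4(n)
    center = mean(values)
    return center - half_width, center + half_width

# Illustrative reference values (far fewer than a real study would use).
print(round(c4(10), 4))
print(mvue_reference_interval([121, 118, 124, 120, 116]))
```

Because c4(n) is less than 1 and approaches 1 as n grows, the correction widens the interval noticeably only for small samples.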
In many cases, the data are skewed to the right and do not follow a normal distribution. A Box-
Cox (or logarithmic) transform can correct the skewness, allowing you to use the Normal theory
quantile. If not, a distribution-free estimator may be more powerful (IFCC, 1987).
Quantile
A distribution-free (non-parametric) quantile estimator based on the order statistics (the sorted
values in the sample).
There are several definitions for the quantile estimator useful in defining reference limits.
A 95% reference interval (0.025 and 0.975 quantiles) requires a minimum sample size of 39. A
90% confidence interval for a 95% reference interval requires a minimum sample size of 119.
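One common definition places the quantile at rank p(n + 1) in the sorted sample, interpolating between adjacent order statistics. Under that definition, which is an assumption of this sketch since several definitions exist, the minimum sample size of 39 follows directly: 0.025 x 40 = 1 and 0.975 x 40 = 39, the first and last order statistics. The function name order_statistic_quantile is hypothetical.

```python
def order_statistic_quantile(values, p):
    """Quantile at rank p * (n + 1), interpolating between order statistics."""
    ordered = sorted(values)
    n = len(ordered)
    rank = round(p * (n + 1), 9)       # rounding guards against float error
    if rank < 1 or rank > n:
        raise ValueError("sample too small to estimate this quantile")
    lower = int(rank)                  # 1-based index of the lower order statistic
    if lower == n:
        return ordered[-1]
    fraction = rank - lower
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

sample = list(range(1, 40))            # 39 illustrative values: 1, 2, ..., 39
print(order_statistic_quantile(sample, 0.025), order_statistic_quantile(sample, 0.975))
```

With 38 values the same call fails, because the required rank falls outside the sample.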
Bootstrap quantile
A distribution-free (non-parametric) quantile estimator that is the median of a set of quantiles
calculated by re-sampling the original sample a large number of times and computing a quantile
for each sample.
Linnet (2000) recommends the bootstrap quantile as providing the lowest root-mean-squared-error (an estimate
of the bias and precision in the estimate) for both normal and skewed distributions. Another
advantage is that confidence intervals can be computed for smaller sample sizes, although Linnet
still recommends a sample size of at least 100.
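The resampling procedure can be sketched as follows. The quantile definition used within each resample (rank p(n + 1), clamped to the sample), the number of resamples, the seed, and the function names are all assumptions made for illustration.

```python
import random
from statistics import median

def sample_quantile(sorted_values, p):
    """Quantile at rank p * (n + 1), clamped to the range of the sample."""
    n = len(sorted_values)
    rank = min(max(p * (n + 1), 1.0), float(n))
    lower = int(rank)
    if lower >= n:
        return sorted_values[-1]
    fraction = rank - lower
    return sorted_values[lower - 1] + fraction * (sorted_values[lower] - sorted_values[lower - 1])

def bootstrap_quantile(values, p, resamples=1000, seed=0):
    """Median of the quantiles computed on many resamples drawn with replacement."""
    rng = random.Random(seed)          # fixed seed makes the sketch repeatable
    n = len(values)
    estimates = []
    for _ in range(resamples):
        resample = sorted(rng.choices(values, k=n))
        estimates.append(sample_quantile(resample, p))
    return median(estimates)

# Illustrative reference values: 1, 2, ..., 120.
print(bootstrap_quantile(list(range(1, 121)), 0.975, resamples=200, seed=1))
```

The spread of the individual resample estimates can also be used to form a confidence interval for the reference limit.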
Harrell-Davis quantile
A distribution-free (non-parametric) estimator that is a weighted linear combination of order
statistics. It is substantially more efficient than the traditional estimator based on one or two order
statistics.
Reference interval Estimate lower and upper reference limits that a stated proportion of
values are between.
Reference limit Estimate a single reference limit that a stated proportion of values are
less than.
5. In the Proportion inside interval/Proportion less than edit box, type the reference level as a
percentage.
6. In the Method drop-down list, select the computation method.
7. In the Confidence interval edit box, type the confidence level as a percentage, or type - to
suppress the confidence interval, and then in the drop-down list, select the confidence bounds.
8. Optional: To split the reference values by factors such as gender or age:
a) On the Analyse-it ribbon tab, in the Reference Interval group, click Partition
b) In the Factors list, select the factor variables.
c) In the Partitions list, select a partition.
The tasks change to the analysis options for that partition.
d) Optional: By default, the analysis options are the same for each partition; set the specific
analysis options for the partition as required.
e) Repeat steps 8.c through 8.d for each partition.
9. Click Calculate.
Histogram
A histogram shows the distribution of the data to assess the central tendency, variability, and
shape.
A histogram for a quantitative variable divides the range of the values into discrete classes, and
then counts the number of observations falling into each class interval. The area of each bar in
the histogram is proportional to the frequency in the class. When the class widths are equal, the
height of the bar is also proportional to the frequency in the class.
Choosing the number of classes to use can be difficult as there is no "best," and different class
widths can reveal or hide features of the data. Scott's and Freedman-Diaconis' rules provide a
Box plot
A box plot shows the five-number summary of the data – the minimum, first quartile, median,
third quartile, and maximum. An outlier box plot is a variation of the skeletal box plot that also
identifies possible outliers.
An outlier box plot is a variation of the skeletal box plot, but instead of extending to the minimum
and maximum, the whiskers extend to the furthest observation within 1.5 x IQR from the quartiles.
Possible near outliers (orange plus symbol) are identified as observations further than 1.5 x IQR
from the quartiles, and possible far outliers (red asterisk symbol) as observations further than 3.0
x IQR from the quartiles. You should investigate each possible outlier before deciding whether to
exclude it, as even in normally distributed data, an outlier box plot identifies approximately 0.7%
of observations as possible outliers.
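The outlier rules and the 0.7% figure can be checked with a short sketch. The quartile definition (the default of Python's statistics.quantiles) is an assumption, since quartiles can be computed in several ways, and the function names and data are illustrative.

```python
from statistics import NormalDist, quantiles

def classify_outliers(values):
    """Return (near_outliers, far_outliers) using the 1.5 x IQR and 3.0 x IQR rules."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    near, far = [], []
    for v in values:
        distance = max(q1 - v, v - q3)     # distance beyond the nearest quartile
        if distance > 3.0 * iqr:
            far.append(v)
        elif distance > 1.5 * iqr:
            near.append(v)
    return near, far

def normal_outlier_fraction():
    """Fraction of a normal distribution lying beyond 1.5 x IQR from the quartiles."""
    standard_normal = NormalDist()
    q3 = standard_normal.inv_cdf(0.75)
    upper_fence = q3 + 1.5 * (2 * q3)      # IQR of the standard normal is 2 * q3
    return 2 * (1 - standard_normal.cdf(upper_fence))

print(classify_outliers([10, 11, 12, 13, 14, 15, 16, 17, 18, 40]))
print(round(100 * normal_outlier_fraction(), 2), "% flagged for normal data")
```

The second function confirms that roughly 0.7% of observations from a normal distribution fall outside the fences by chance alone.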
For normally distributed data, observations should lie approximately on a straight line. If the data
is non-normal, the points form a curve that deviates markedly from a straight line. Possible outliers
are points at the ends of the line, distanced from the bulk of the observations.
Requirements
• A quantitative variable of the reference values.
• 1 or more factor variables.
Dataset layout
Use a column for the variable (Reference value), and optionally additional columns for each
partition factor (Sex); each row has the values of the variables for a case (Subject).
1 Male 121
2 Male 118
3 Female 124
4 Female 120
5 Male 116
6 Male …
7 Female 100
8 Male 115
9 Female 102
10 Female 98
11 Male 118
… … …
Chapter 21: Diagnostic performance
Diagnostic performance evaluates the ability of a qualitative or quantitative test to discriminate
between two subclasses of subjects.
True positive (TP) Test result correctly identifies the presence of the condition.
False positive (FP) Test result incorrectly identifies the presence of the condition when it
was absent.
True negative (TN) Test result correctly identifies the absence of the condition.
False negative (FN) Test result incorrectly identifies the absence of the condition when it was
present.
A perfect diagnostic test can discriminate all subjects with and without the condition and results
in no false positives or false negatives. However, this is rarely achievable, as misdiagnosis of some
subjects is inevitable. Measures of diagnostic accuracy quantify the discriminative ability of a test.
Sensitivity / Specificity
Sensitivity and specificity are the probability of a correct test result in subjects with and without a
condition, respectively.
Sensitivity (true positive fraction, TPF) measures the ability of a test to detect the condition when it
is present. It is the probability that the test result is positive when the condition is present.
Specificity (true negative fraction, TNF) measures the ability of a test to detect the absence of
the condition when it is not present. It is the probability that the test result is negative when the
condition is absent.
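As a small worked example, both measures can be computed directly from the counts of true and false results. The function names and the counts below are illustrative assumptions, not part of the software.

```python
def sensitivity(tp, fn):
    """Probability of a positive result when the condition is present."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Probability of a negative result when the condition is absent."""
    return tn / (tn + fp)


# Illustrative counts (assumed): 3 TP, 1 FN, 0 FP, 5 TN.
sens = sensitivity(tp=3, fn=1)
spec = specificity(tn=5, fp=0)
print(sens, spec)  # 0.75 1.0
```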
Likelihood ratios
Likelihood ratios are the ratio of the probability of a specific test result for subjects with the
condition against the probability of the same test result for subjects without the condition.
A likelihood ratio of 1 indicates that the test result is equally likely in subjects with and without
the condition. A ratio > 1 indicates that the test result is more likely in subjects with the condition
than without the condition, and conversely, a ratio < 1 indicates that the test result is more likely
in subjects without the condition. The larger the ratio, the more likely the test result is in subjects
with the condition than without; likewise, the smaller the ratio, the more likely the test result is in
subjects without than with the condition.
The likelihood ratio of a positive test result is the ratio of the probability of a positive test result in
a subject with the condition (true positive fraction) against the probability of a positive test result
in a subject without the condition (false positive fraction). The likelihood ratio of a negative test
result is the ratio of the probability of a negative test result in a subject with the condition (false
negative fraction) to the probability of a negative test result in a subject without the condition
(true negative fraction).
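Written as formulas, using the fractions named above (the symbols LR+ and LR- are shorthand introduced here for the positive and negative likelihood ratios):

```latex
LR_{+} = \frac{TPF}{FPF} = \frac{\text{sensitivity}}{1 - \text{specificity}},
\qquad
LR_{-} = \frac{FNF}{TNF} = \frac{1 - \text{sensitivity}}{\text{specificity}}
```

For example, a test with sensitivity 0.9 and specificity 0.8 has LR+ = 0.9 / 0.2 = 4.5 and LR- = 0.1 / 0.8 = 0.125.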
Predictive values
Predictive values are the probability of correctly identifying a subject's condition given the test
result.
Predictive values use Bayes' theorem along with a pre-test prior probability (such as the prevalence
of the condition in the population) and the sensitivity and specificity of the test to compute the
post-test probability (predictive value).
The positive predictive value is the probability that a subject has the condition given a positive test
result; the negative predictive value is the probability that a subject does not have the condition
given a negative test result.
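The Bayes' theorem calculation can be sketched as below. The function name and the example figures are assumptions chosen for illustration; the prior is taken to be the prevalence.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from Bayes' theorem with prevalence as the prior."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumed example: sensitivity 0.9, specificity 0.8, prevalence 10%.
ppv, npv = predictive_values(0.9, 0.8, 0.1)
print(round(ppv, 3), round(npv, 3))  # 0.333 0.986
```

The example shows how a low prevalence gives a modest positive predictive value even when sensitivity and specificity are high.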
Youden J
Youden's J is the likelihood of a positive test result in subjects with the condition versus those
without the condition. It is also the probability of an informed decision (as opposed to a random
guess).
Youden's J index combines sensitivity and specificity into a single measure (Sensitivity + Specificity -
1) and, for a test that performs at least as well as chance, has a value between 0 and 1. In a perfect
test, Youden's index equals 1. It is also equivalent to the vertical distance from the diagonal
no-discrimination (chance) line up to the ROC curve for a single decision threshold.
ROC plot
ROC (receiver operating characteristic) curves show the ability of a quantitative diagnostic test to
classify subjects correctly as the decision threshold is varied.
The ROC plot shows sensitivity (true positive fraction) on the vertical axis against 1-specificity
(false positive fraction) on the horizontal axis over all possible decision thresholds.
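The points of an empirical ROC curve can be sketched by sweeping the decision threshold over the observed values. This is a hedged illustration: `roc_points` is a hypothetical helper, values at or above the threshold are assumed to be positive, and the data are ten illustrative measurements.

```python
def roc_points(diseased, healthy):
    """Return (threshold, FPF, TPF) tuples, treating value >= threshold as positive."""
    thresholds = sorted(set(diseased) | set(healthy), reverse=True)
    points = []
    for t in thresholds:
        tpf = sum(v >= t for v in diseased) / len(diseased)
        fpf = sum(v >= t for v in healthy) / len(healthy)
        points.append((t, fpf, tpf))
    return points


# Illustrative measurements (assumed).
diseased = [121, 118, 124, 120, 116]
healthy = [100, 115, 102, 98, 118]

points = roc_points(diseased, healthy)
# Threshold with the largest Youden's J (TPF - FPF).
best = max(points, key=lambda p: p[2] - p[1])
print(best)  # (116, 0.2, 1.0)
```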
Decision thresholds
A decision threshold is a value that dichotomizes the result of a quantitative test to a simple binary
decision.
The test result of a quantitative diagnostic test is dichotomized by treating the values above or
equal to a threshold as positive, and those below as negative, or vice-versa.
There are many ways to choose a decision threshold for a diagnostic test. For a simple screening
test, the decision threshold is often chosen to incur a fixed true positive or false positive rate.
In more complex cases, the optimal decision threshold depends on both the cost of performing
the test and the cost of the consequences of the test result. The costs may include both financial
and health-related costs. A simple formula to determine the optimal decision threshold (Zweig &
Campbell, 1993) maximizes: Sensitivity - m * (1- Specificity) where m = (Cost FP - Cost TN) / (Cost
FN - Cost TP).
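A minimal sketch of this rule, exactly as stated above, is shown below. The function names, candidate thresholds, and costs are hypothetical; in practice the costs need careful justification.

```python
def cost_slope(cost_fp, cost_tn, cost_fn, cost_tp):
    """m = (Cost FP - Cost TN) / (Cost FN - Cost TP)."""
    return (cost_fp - cost_tn) / (cost_fn - cost_tp)


def optimal_threshold(candidates, m):
    """Pick the candidate maximizing Sensitivity - m * (1 - Specificity).

    candidates: iterable of (threshold, sensitivity, specificity).
    """
    return max(candidates, key=lambda c: c[1] - m * (1 - c[2]))


# Hypothetical candidates and costs for illustration only.
candidates = [(116, 1.0, 0.8), (118, 0.8, 0.8), (120, 0.6, 1.0)]
m_cheap_fp = cost_slope(cost_fp=1, cost_tn=0, cost_fn=5, cost_tp=0)
m_costly_fp = cost_slope(cost_fp=4, cost_tn=0, cost_fn=1, cost_tp=0)

print(optimal_threshold(candidates, m_cheap_fp)[0])   # 116
print(optimal_threshold(candidates, m_costly_fp)[0])  # 120
```

When false positives are relatively cheap the rule favors a sensitive threshold; when they are costly it favors a specific one.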
Decision plot
A decision plot shows a measure of performance (such as sensitivity, specificity, likelihood ratios, or
predictive values) against all decision thresholds to help identify an optimal decision threshold.
Fixed FPF                Predict the TPF (Sensitivity) and Threshold at a fixed FPF.
Fixed TPF (Sensitivity)  Predict the FPF and Threshold at a fixed TPF.
Threshold                Predict the TPF (Sensitivity) and FPF at a decision threshold.
Requirements
• A qualitative or quantitative variable of the measured values or indications (positive/negative) of
the diagnostic test.
• A qualitative variable indicating the true state (positive/negative) of each subject.
1 Diseased 121
2 Diseased 118
3 Diseased 124
4 Diseased 120
5 Diseased 116
6 … …
7 Healthy 100
8 Healthy 115
9 Healthy 102
10 Healthy 98
11 Healthy 118
… … …
1 Diseased 121 86
2 Diseased 118 90
3 Diseased 124 91
4 Diseased 120 99
5 Diseased 116 89
6 … … …
7 Healthy 100 70
8 Healthy 115 80
9 Healthy 102 79
10 Healthy 98 87
11 Healthy 118 90
… … …
1 Diseased 1 121
2 Diseased 1 118
3 Diseased 1 124
4 Diseased 1 120
5 Diseased 1 116
6 Healthy 1 100
7 Healthy 1 115
8 Healthy 1 102
9 Healthy 1 98
10 Healthy 1 118
11 … … …
12 Diseased 2 86
13 Diseased 2 90
14 Diseased 2 91
15 Diseased 2 99
16 Diseased 2 89
17 Healthy 2 70
18 Healthy 2 80
19 Healthy 2 79
20 Healthy 2 87
21 Healthy 2 90
… … … …
1 Diseased P
2 Diseased P
3 Diseased P
4 Diseased N
5 Diseased P
6 Healthy N
7 Healthy N
8 Healthy N
9 Healthy N
10 Healthy N
… … …
Diseased Positive 3
Diseased Negative 1
Healthy Positive 0
Healthy Negative 5
Chapter 22: Control charts
Control charts determine if a process is in a state of statistical control.
A control chart plots a quality characteristic statistic in a time-ordered sequence. A center line
indicates the process average, and two other horizontal lines called the lower and upper control
limits represent process variation.
All processes have some natural degree of variation. A control chart for a process that is in-control
has points randomly distributed within the control limits. That is, it has variation only from sources
common to the process (called common-cause variation). An out-of-control process has points
falling outside the control limits or non-random patterns of points (called special-cause variation).
If the process is in-control, no corrections or changes to the process are needed.
If the process is out-of-control, the control chart can help determine the sources of variation in
need of further investigation. It is appropriate to determine if the results with the special-cause are
better than or worse than results from common causes alone. If worse, then that cause should be
eliminated if possible. If better, it may be appropriate to investigate the system further as it may
lead to improvements in the process.
Typically control limits are defined as a multiple of the process sigma. For a Shewhart control chart
with 3-sigma control limits and assuming normality, the probability of exceeding the upper control
limit is 0.00135 and the probability of falling below the lower control limit is also 0.00135. Their
sum is 0.0027 (0.27%). Therefore the probability of a point between the control limits for an in-
control process is 0.9973 (99.73%). An alternative is to define the control limits as probability
limits based on a specified distribution rather than assuming a normal distribution.
Another way to look at the performance of a control chart is the average run length (ARL). The
in-control ARL is the average number of observations plotted while a process is in control before
a false alarm occurs. The out-of-control ARL is the average number of observations plotted while
a process is out of control before a shift is detected, and depends on the size of the shift to be
detected. The Shewhart control chart described above has an in-control ARL = 1/0.0027 = 370.37.
That is, when a process is in control, you should expect a false alarm out-of-control signal
approximately once every 370 observations.
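The tail probabilities and the in-control ARL quoted above can be reproduced with the standard normal distribution. This is a small sketch with hypothetical function names, assuming normality and independent points.

```python
from statistics import NormalDist


def false_alarm_probability(k=3.0):
    """Probability that an in-control point falls outside k-sigma limits."""
    upper_tail = 1 - NormalDist().cdf(k)
    return 2 * upper_tail


def in_control_arl(k=3.0):
    """Average number of points plotted before a false alarm."""
    return 1 / false_alarm_probability(k)


print(round(false_alarm_probability(3.0), 4))  # 0.0027
print(round(in_control_arl(3.0), 1))           # 370.4
```

The unrounded probability gives an ARL of about 370.4; the figure of 370.37 in the text comes from the rounded value 0.0027.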
Although most examples of control charts show quality characteristics that are of interest to the
end-user (such as length, diameter, or weight) they are most beneficial applied to process variables
further upstream (such as the temperature of the furnace or content of tin in the raw material).
Note: It is important not to confuse control limits used in control charts with specification limits
used in process capability. Natural variation in a process defines the control limits, whereas
customer requirements define the specification limits. Likewise, the center line should not be
confused with a target value.
Shewhart control charts
A Shewhart control chart detects changes in a process.
Xbar-S chart
An Xbar-S chart is a combination of control charts used to monitor the process variability (as the
standard deviation) and average (as the mean) when measuring subgroups at regular intervals
from a process.
Xbar chart
An Xbar-chart is a type of control chart used to monitor the process mean when measuring
subgroups at regular intervals from a process.
Each point on the chart represents the value of a subgroup mean.
The center line is the process mean. If unspecified, the process mean is the weighted mean of the
subgroup means.
The control limits are either:
• A multiple (k) of sigma above and below the center line. Default k=3.
• Probability limits, defined as the probability (alpha) of a point exceeding the limits. Default
alpha=0.27%.
If unspecified, the process sigma is the pooled standard deviation of the subgroups, unless the
chart is combined with an R- or S- chart where it is estimated as described for the respective chart.
The observations are assumed to be independent, and the means normally distributed. Individual
observations need not be normally distributed. Due to the central limit theorem, the subgroup
means are often approximately normally distributed for subgroup sizes larger than 4 or 5
regardless of the distribution of the individual observations.
For data with different subgroup sizes, the control limits vary. A standardized version of the control
chart plots the points in standard deviation units. Such a control chart has a constant center line at
0, and upper and lower control limits of +3 and -3 respectively, making patterns in the data easier
to see.
It is important to ensure process variability is in a state of statistical control before using the
Xbar-chart to investigate if the process mean is in control. Therefore an Xbar-chart is often combined
with an R- or S-chart to monitor process variability. If the variability is not under control, the
control limits may be too wide, leading to an inability to detect special causes of variation affecting
the process mean.
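The Xbar-chart calculation can be sketched as follows. This is an illustration under stated assumptions, not the software's algorithm: `xbar_chart` is a hypothetical name, the process sigma is taken as the plain pooled standard deviation with no bias-correction constant, and the three small subgroups are made-up data.

```python
import math
import statistics


def xbar_chart(subgroups, k=3.0):
    """Return (center, sigma, means, limits) for k-sigma Xbar-chart limits."""
    sizes = [len(g) for g in subgroups]
    means = [statistics.fmean(g) for g in subgroups]
    # Center line: mean of subgroup means weighted by subgroup size.
    center = sum(n * m for n, m in zip(sizes, means)) / sum(sizes)
    # Pooled SD: within-subgroup squared deviations over total degrees of freedom.
    ss_within = sum((x - m) ** 2 for g, m in zip(subgroups, means) for x in g)
    sigma = math.sqrt(ss_within / (sum(sizes) - len(subgroups)))
    # Limits vary with subgroup size because SD(mean) = sigma / sqrt(n).
    limits = [(center - k * sigma / math.sqrt(n),
               center + k * sigma / math.sqrt(n)) for n in sizes]
    return center, sigma, means, limits


# Made-up subgroups of size 2 for illustration.
subgroups = [[7.96, 8.52], [9.24, 7.96], [10.04, 8.68]]
center, sigma, means, limits = xbar_chart(subgroups)
print(round(center, 3), round(sigma, 3))  # 8.733 0.796
```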
R chart
An R-chart is a type of control chart used to monitor the process variability (as the range) when
measuring small subgroups (n ≤ 10) at regular intervals from a process.
Each point on the chart represents the value of a subgroup range.
The center line for each subgroup is the expected value of the range statistic. Note that the center
line varies when the subgroup sizes are unequal.
The control limits are either:
• A multiple (k) of sigma above and below the center line. Default k=3.
• Probability limits, defined as the probability (alpha) of a point exceeding the limits. Default
alpha=0.27%.
S chart
An S-chart is a type of control chart used to monitor the process variability (as the standard
deviation) when measuring subgroups (n ≥ 5) at regular intervals from a process.
Each point on the chart represents the value of a subgroup standard deviation.
The center line for each subgroup is the expected value of the standard deviation statistic. Note
that the center line varies when the subgroup sizes are unequal.
The control limits are either:
• A multiple (k) of sigma above and below the center line. Default k=3.
• Probability limits, defined as the probability (alpha) of a point exceeding the limits. Default
alpha=0.27%.
If unspecified, the process sigma is the weighted average of the unbiased subgroup estimates of
sigma based on the standard deviation statistics.
The individual observations are assumed to be independent and normally distributed, although
the chart is fairly robust and nonnormality is not a serious problem unless there is a considerable
deviation from normality (Burr, 1967).
For subgroup sizes less than 6 with k-sigma control limits, the lower limit is zero. In these cases,
probability limits may be more appropriate.
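The zero lower limit can be seen from the k-sigma limits for the standard deviation statistic. The sketch below uses the standard unbiasing constant c4 for normally distributed data; the function names are hypothetical and the process sigma is assumed known.

```python
import math


def c4(n):
    """Expected value of s / sigma for a normal sample of size n."""
    return math.sqrt(2 / (n - 1)) * math.exp(
        math.lgamma(n / 2) - math.lgamma((n - 1) / 2))


def s_chart_limits(sigma, n, k=3.0):
    """Return (lower, center, upper); the lower limit is truncated at zero."""
    center = c4(n) * sigma
    spread = k * sigma * math.sqrt(1 - c4(n) ** 2)
    return max(0.0, center - spread), center, center + spread


for n in (4, 5, 6, 7):
    lower, center, upper = s_chart_limits(sigma=1.0, n=n)
    print(n, round(lower, 3), round(center, 3), round(upper, 3))
```

With k=3, the lower limit is zero for n=4 and n=5 and becomes positive from n=6.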
I-MR chart
An I-MR chart is a combination of control charts used to monitor the process variability (as the
moving range between successive observations) and average (as the mean) when measuring
individuals at regular intervals from a process.
I chart
An I-chart is a type of control chart used to monitor the process mean when measuring individuals
at regular intervals from a process.
It is typical to use individual observations instead of rational subgroups when there is no basis
for forming subgroups, when there is a long interval between observations becoming available,
when testing is destructive or expensive, or for many other reasons. Other charts such as the
exponentially weighted moving average and cumulative sum may be more appropriate to detect
smaller shifts more quickly.
Each point on the chart represents the value of an individual observation.
The center line is the process mean. If unspecified, the process mean is the mean of the individual
observations.
The control limits are either:
• A multiple (k) of sigma above and below the center line. Default k=3.
• Probability limits, defined as the probability (alpha) of a point exceeding the limits. Default
alpha=0.27%.
If unspecified, the process sigma is the standard deviation of the individual observations, unless
the chart is combined with an MR-chart where it is estimated as described for the respective chart.
MR chart
An MR-chart is a type of control chart used to monitor process variability (as the moving range of
successive observations) when measuring individuals at regular intervals from a process.
When data are individual observations, it is not possible to use the standard deviation or range of
subgroups to assess the variability; instead, the moving range is used to estimate the variability.
Given a series of observations and a fixed subset size, the first element of the moving range is the
range of the initial subset of the number series. Then the subset is modified by "shifting forward";
that is, excluding the first number of the series and including the next number following the subset
in the series. The next element of the moving range is the range of this subset. This process is
repeated over the entire series creating a moving range statistic.
The moving range requires:
• A fixed subset size, the number of successive observations (span) in the moving range. Span
must satisfy 1 < span ≤ n. Default span=2.
Larger values have a dampening effect on the statistic and may be preferable when the data
are cyclical.
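The moving range described above can be sketched in a few lines. `moving_range` is a hypothetical helper name and the data are made up.

```python
def moving_range(values, span=2):
    """Range of each window of `span` successive observations."""
    if not 1 < span <= len(values):
        raise ValueError("span must satisfy 1 < span <= n")
    return [max(values[i:i + span]) - min(values[i:i + span])
            for i in range(len(values) - span + 1)]


print(moving_range([1, 4, 2, 7], span=2))  # [3, 2, 5]
print(moving_range([1, 4, 2, 7], span=3))  # [3, 5]
```

A series of n observations yields n - span + 1 moving ranges, and neighboring values share observations, which is why they are correlated.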
Each point on the chart represents the value of a moving range.
The center line is the expected value of the range statistic.
The control limits are either:
• A multiple (k) of sigma above and below the center line. Default k=3.
• Probability limits, defined as the probability (alpha) of a point exceeding the limits. Default
alpha=0.27%.
If unspecified, the process sigma is the weighted average of the unbiased moving range estimates
of sigma based on the range statistics.
You should be careful when interpreting a moving range chart because the values of the statistic
are correlated. Correlation may appear as a pattern of runs or cycles on the chart. Some authors
(Rigdon et al., 1994) recommend not plotting a moving range chart as the moving range does not
provide any useful information about shifts in process variability beyond the I-chart.
I-MR Plot Shewhart control charts of the individual observations and moving range of
consecutive observations.
I Plot a Shewhart control chart of the individual observations.
MR Plot a Shewhart control chart of the moving range of consecutive observations.
NP chart
An np-chart is a type of control chart used to monitor the number of nonconforming units when
measuring subgroups at regular intervals from a process.
Each point on the chart represents the number of nonconforming units in a subgroup.
The center line is the average number of nonconforming units. If unspecified, the process average
proportion of nonconforming units is the total nonconforming units divided by the sum of
subgroup sizes. Note that the center line varies when the subgroup sizes are unequal.
The control limits are either:
• A multiple (k) of sigma above and below the center line. Default k=3.
• Probability limits, defined as the probability (alpha) of a point exceeding the limits. Default
alpha=0.27%.
A Binomial distribution is assumed. That is, units are either conforming or nonconforming, and
nonconforming units occur independently; the occurrence of a nonconforming unit at a particular
point in time does not affect the probability of a nonconforming unit in the periods that
immediately follow. Violation of this assumption can cause overdispersion: the presence of greater
variance than would be expected based on the distribution. When k-sigma limits are used, the
normal approximation to the binomial distribution is assumed adequate, which may require large
subgroup sizes when the proportion of nonconforming units is small.
An np-chart is useful when the number of units in each subgroup is constant as interpretation is
easier than a p-chart. For data with different subgroup sizes the center line and control limits both
vary making interpretation difficult. In this case, you should use a p-chart which has a constant
center line but varying control limits.
For small subgroup sizes, the lower control limit is zero in many situations. The lack of a lower limit
is troublesome if the chart's use is for quality improvement as the lower limit is desirable as points
P chart
A p-chart is a type of control chart used to monitor the proportion of nonconforming units when
measuring subgroups at regular intervals from a process.
A p-chart is a scaled version of the np-chart representing a proportion of nonconforming units
rather than the number of nonconforming units. The same assumptions and recommendations
apply.
For data with different subgroup sizes, the control limits vary although the center line is constant.
A standardized version of the control chart plots the points in standard deviation units. Such a
control chart has a constant center line at 0, and upper and lower control limits of +3 and -3
respectively, making it easier to see patterns.
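A minimal sketch of k-sigma p-chart limits and the standardized points is given below. It assumes the normal approximation to the binomial; `p_chart` is a hypothetical name and the subgroup sizes and counts are illustrative.

```python
import math


def p_chart(sizes, counts, k=3.0):
    """Return (p_bar, rows) with rows of (proportion, lcl, ucl, z) per subgroup."""
    p_bar = sum(counts) / sum(sizes)  # constant center line
    rows = []
    for n, x in zip(sizes, counts):
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)  # varies with subgroup size
        lcl = max(0.0, p_bar - k * sigma)
        ucl = min(1.0, p_bar + k * sigma)
        z = (x / n - p_bar) / sigma  # standardized point
        rows.append((x / n, lcl, ucl, z))
    return p_bar, rows


# Illustrative subgroup sizes and nonconforming counts (assumed).
sizes = [100, 80, 80, 100, 110, 110, 100, 90, 90, 120]
counts = [7, 8, 12, 6, 10, 12, 16, 10, 6, 20]

p_bar, rows = p_chart(sizes, counts)
print(round(p_bar, 4))  # 0.1092
```

On the standardized scale every point is compared with the same limits of -3 and +3, whatever the subgroup size.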
C chart
A c-chart is a type of control chart used to monitor the total number of nonconformities when
measuring subgroups at regular intervals from a process.
Each point on the chart represents the total number of nonconformities in a subgroup.
The center line is the average number of nonconformities. If unspecified, the process average
number of nonconformities per unit is the total number of nonconformities divided by the sum of
subgroup sizes. Note that the center line varies when the subgroup sizes are unequal.
The control limits are either:
• A multiple (k) of sigma above and below the center line. Default k=3.
• Probability limits, defined as the probability (alpha) of a point exceeding the limits. Default
alpha=0.27%.
A Poisson distribution is assumed. That is, the probability of observing a nonconformity in the
inspection unit should be small, but a large number of nonconformities should be theoretically
possible, and the size of an inspection unit should also be constant over time. When k-sigma limits
are used the normal approximation to the Poisson distribution is assumed adequate which usually
requires the average number of nonconformities to be at least 5.
A c-chart is useful when the number of units in each subgroup is constant as interpretation is
easier than a u-chart. For data with different subgroup sizes, the center line and control limits both
vary making interpretation difficult. In this case, you should use a u-chart which has a constant
center line but varying control limits.
U chart
A u-chart is a type of control chart used to monitor the average number of nonconformities per
unit when measuring subgroups at regular intervals from a process.
A u-chart is a scaled version of the c-chart representing the average number of nonconformities
per unit rather than the number of nonconformities. The same assumptions and recommendations
apply.
For data with different subgroup sizes, the control limits vary although the center line is constant.
A standardized version of the control chart plots the points in standard deviation units. Such a
control chart has a constant center line at 0, and upper and lower control limits of +3 and -3
respectively, making it easier to see patterns.
You should choose tests in advance of looking at the control chart based on your knowledge
of the process. Applying test 1 to a Shewhart control chart for an in-control process with
observations from a normal distribution leads to a false alarm once every 370 observations on
average. Additional tests make the chart more sensitive to detecting special-cause variation, but
also increase the chance of false alarms. For example, applying tests 1, 2, 5, 6 raises the false
alarm rate to once every 91.75 observations.
Given a series of observations and a fixed subset size, the first element of the moving average
is the average of the initial subset of the number series. Then the subset is modified by "shifting
forward"; that is, excluding the first number of the series and including the next number following
the subset in the series. The next element of the moving average is the average of this subset. This
process is repeated over the entire series creating the moving average statistic.
The UWMA requires:
• A fixed subset size, the number of successive observations (span) in the moving average. Span
must satisfy 1 < span ≤ n. Default span=3.
A small span reduces the influence of older observations; a large span slows the response to
large shifts. In general, the magnitude of the shift to detect and the span are inversely related.
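The uniformly weighted moving average can be sketched as follows. The helper names are hypothetical. The limits function assumes independent individual observations with a known process sigma, so that an average of `span` observations has standard deviation sigma / sqrt(span).

```python
import math


def moving_average(values, span=3):
    """Average of each window of `span` successive observations."""
    if not 1 < span <= len(values):
        raise ValueError("span must satisfy 1 < span <= n")
    return [sum(values[i:i + span]) / span
            for i in range(len(values) - span + 1)]


def uwma_limits(center, sigma, span, L=3.0):
    """L-sigma limits for a full window of `span` individual observations."""
    half_width = L * sigma / math.sqrt(span)
    return center - half_width, center + half_width


print(moving_average([1, 2, 3, 4, 5], span=3))     # [2.0, 3.0, 4.0]
print(uwma_limits(center=10.0, sigma=3.0, span=9))  # (7.0, 13.0)
```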
Each point on the chart represents the value of a moving average.
The center line is the process mean. If unspecified, the process mean is the weighted mean of the
subgroup means or the mean of the individual observations.
The control limits are a multiple (L) of sigma above and below the center line. Default L=3. If
unspecified, the process sigma is the pooled standard deviation of the subgroups, or the standard
deviation of the individual observations, unless the chart is combined with an R-, S-, or MR- chart
where it is estimated as described for the respective chart.
UWMA is sensitive to small shifts in the process mean, but is not as effective as either the CUSUM
or EWMA (Montgomery 2012).
Given a series of observations and a fixed weight, each element of the exponentially weighted
moving average is computed as (1 - weight) * previous EWMA + weight * current observation.
The calculation then "shifts forward" to the next observation and is repeated over the entire
series, creating the exponentially weighted moving average statistic.
The EWMA requires:
• A weight for the most recent observation. Weight must satisfy 0 < weight ≤ 1. Default
weight=0.2.
The "best" value is a matter of personal preference and experience. A small weight reduces the
influence of the most recent sample; a large value increases the influence of the most recent
sample. A value of 1 reduces the chart to a Shewhart Xbar chart. Recommendations suggest a
weight between 0.05 and 0.25 (Montgomery 2012).
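The EWMA recursion can be sketched as below. The starting value is an assumption: this sketch starts from a supplied process mean, or from the mean of the data if none is given. The function name is hypothetical.

```python
import statistics


def ewma(values, weight=0.2, start=None):
    """Exponentially weighted moving average of a series.

    Assumption: the recursion starts at `start` (the process mean),
    defaulting to the mean of the observations.
    """
    if not 0 < weight <= 1:
        raise ValueError("weight must satisfy 0 < weight <= 1")
    previous = statistics.fmean(values) if start is None else start
    result = []
    for x in values:
        previous = (1 - weight) * previous + weight * x
        result.append(previous)
    return result


print(ewma([10, 20], weight=0.5, start=10))  # [10.0, 15.0]
print(ewma([1, 2, 3], weight=1))             # [1.0, 2.0, 3.0]
```

The second call illustrates that a weight of 1 returns the observations unchanged, so no smoothing is applied.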
When designing an EWMA chart it is necessary to consider the average run length and shift to be
detected. Extensive guidance is available on suitable parameters (Montgomery 2012).
It is possible to modify the EWMA, so it responds more quickly to detect a process that is out-of-
control at start-up. This modification is done using a further exponentially decreasing adjustment
to narrow the limits of the first few observations (Montgomery 2012).
Each point on the chart represents the value of the exponentially weighted moving average.
The center line is the process mean. If unspecified, the process mean is the weighted mean of the
subgroup means or the mean of the individual observations.
The control limits are a multiple (L) of sigma above and below the center line. Default L=3. If
unspecified, the process sigma is the pooled standard deviation of the subgroups, or the standard
deviation of the individual observations, unless the chart is combined with an R-, S-, or MR- chart
where it is estimated as described for the respective chart.
Because the EWMA is a weighted average of all past and the current observations, it is very
insensitive to the assumption of normality. It is, therefore, an ideal replacement for a Shewhart I-
chart when normality cannot be assumed.
Requirements
• A categorical or quantitative variable.
• A stratification variable.
• A subgroup variable.
• A phase/stage variable.
Copper
7.96
8.52
9.24
7.96
10.04
8.68
7.46
8.84
8.9
9.28
…
1 SNH IQ 7.96
1 SNH IQ 8.52
2 JDH IQ 9.24
2 GMH IQ 7.96
3 SNH IQ 10.04 Electrode failure
3 SNH IQ 8.68
… … IQ …
20 JDH OQ 8.68
20 SNH OQ 8.76
21 JDH OQ 8.02
21 GMH OQ 8.7
… … OQ …
1 100 7
2 80 8
3 80 12
4 100 6
5 110 10
6 110 12
7 100 16
8 90 10
9 90 6
10 120 20
… … …
Chapter 23: Process capability
Capability analysis measures the ability of a process to meet specifications when the process is in
statistical control.
A process must be in control before attempting to assess the capability. An out-of-control process
is unpredictable and not capable of being characterized by a probability distribution.
Most process capability indices assume a normally distributed quality characteristic. If the
distribution is non-normal, it may be possible to transform the data to be normally distributed. The
process mean and process sigma define the normal distribution.
Capability indices are either "long-term" or "short-term" depending on the definition of the
process sigma:
• Long-term indices measure the process performance and represent the quality the end-user
experiences. They are computed using the process sigma that includes both within-subgroup
and between-subgroup variation (the standard deviation of the individual measurements).
• Short-term indices measure the potential process performance ignoring differences between
subgroups. They are computed using the process sigma that includes only within-subgroup
variation (the Xbar-, R-, S-, or MR- control chart process sigma).
If the process is stable over time, the estimates of short-term sigma and long-term sigma are very
similar. They are both estimates of the same parameter, although statistically speaking the long-
term sigma is a slightly more efficient estimator.
However, if there are any changes in the process mean over time, the estimate of long-term sigma
is greater than that of short-term sigma. The larger the difference between the values of long-term
and short-term indices, the more opportunity there is to improve the process by eliminating drift,
shifts and other sources of variation.
Note: There is much confusion over the meaning of the phrases long-term and short-term. It is
important not to confuse them with the collection period of the sample data.
the Pp indices are much smaller than the Cp indices, it indicates that there are improvements you
could make by eliminating shifts and drifts in the process mean.
Various indices measure how the process is performing against the specification limits:
Index Purpose
Cp/Pp Estimates the capability of a process if the process mean were to be centered
between the specification limits.
Note: If the process mean is not centered between the specification limits the value
is only the potential capability, and you should not report it as the capability of the
process.
Cpl/Ppl Estimates the capability of a process to meet the lower specification limit. Defined as
how close the process mean is to the lower specification limit.
Cpu/Ppu Estimates the capability of a process to meet the upper specification limit. Defined as
how close the process mean is to the upper specification limit.
Cpk/Ppk Estimates the capability of a process, considering that the process mean may not be
centered between the specification limits. Defined as the lesser of Cpl and Cpu.
Note: If Cpk is equal to Cp, then the process is centered at the midpoint of the
specification limits. The magnitude of Cpk relative to Cp is a measure of how off
center the process is and the potential improvement possible by centering the
process.
Cpm/Ppm Estimates the capability of a process, and is dependent on the deviation of the
process mean from the target.
Note: Cpm increases as the process mean moves towards the target. Cpm, Cpk,
and Cp all coincide when the target is the center of the specification limits and the
process mean is centered.
Note: There is some confusion between the terms "Cp" and "Pp" as some authors suggest the
use of Pp indices when a process is not in control and Cp indices when a process is in control.
However, it is nonsense to interpret the indices when the process is not in control as no probability
distribution can describe the process performance. We make the distinction between Pp and Cp
indices on the estimate of sigma used, not on the state of the process.
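The indices in the table can be sketched using their conventional textbook definitions. This is an assumption-laden illustration, not the software's estimators: the function name is hypothetical, the process is assumed normal, and passing the short-term or long-term sigma yields the Cp-type or Pp-type indices respectively.

```python
import math


def capability_indices(mean, sigma, lsl, usl, target=None):
    """Conventional capability indices for a normal process."""
    cpl = (mean - lsl) / (3 * sigma)
    cpu = (usl - mean) / (3 * sigma)
    indices = {
        "Cp": (usl - lsl) / (6 * sigma),
        "Cpl": cpl,
        "Cpu": cpu,
        "Cpk": min(cpl, cpu),
    }
    if target is not None:
        # Cpm penalizes deviation of the process mean from the target.
        indices["Cpm"] = (usl - lsl) / (
            6 * math.sqrt(sigma ** 2 + (mean - target) ** 2))
    return indices


# Made-up process: centered, then shifted one sigma above target.
centered = capability_indices(mean=10, sigma=1, lsl=4, usl=16, target=10)
shifted = capability_indices(mean=11, sigma=1, lsl=4, usl=16, target=10)
print(centered["Cp"], centered["Cpk"], centered["Cpm"])  # 2.0 2.0 2.0
print(round(shifted["Cpk"], 3), round(shifted["Cpm"], 3))  # 1.667 1.414
```

The centered case shows Cp, Cpk, and Cpm coinciding; after the shift, Cp is unchanged while Cpk and Cpm fall.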
Z benchmark
Z benchmark describes the sigma capability of a process.
Z benchmark indices are an alternative to Cp and Pp indices. They are the definition of the sigma
capability of a Six Sigma process.
All of the indices assume a normally distributed process quality characteristic with the parameters
specified by the process mean and sigma. The process sigma is either the short-term or long-term
sigma estimate.
Various indices measure how the process is performing against the specification limits:
Index Description
< LSL The number of sigma units from the process mean to the lower specification limit.
> USL The number of sigma units from the process mean to the upper specification limit.
Z shift is the difference between the short-term and long-term indices. The larger the Z shift, the
more scope there is to improve the process by eliminating shifts and drifts in the process mean.
Some industries define the sigma capability of a process as the long-term Z benchmark + a 1.5 Z
shift, meaning a process with a long-term Z benchmark of 4.5 is quoted as a Six Sigma process.
It is best to avoid such rules and directly measure the short-term and long-term Z benchmark
capability.
Nonconforming units
Nonconforming units is the number of units a process produces that fall outside the specification
limits, expressed in parts per million.
The number of nonconforming units is an alternative to traditional Cp and Pp indices. It is
easily understood by end-users.
The number of nonconforming units is either:
• The actual number of nonconforming units in the sample.
• The expected number of nonconforming units, assuming a normally distributed process quality
characteristic with the parameters specified by the process mean and sigma. The process sigma
is either the short-term or long-term sigma estimate.
Various indices measure how the process is performing against the specification limits:
Index Description
< LSL The number of nonconforming units that are less than the lower specification limit.
> USL The number of nonconforming units that are greater than the upper specification
limit.
< SL > The total number of nonconforming units that are outside the specification limits.
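The actual and expected numbers of nonconforming units can be sketched in Python as follows. The function names and the data are hypothetical, and the expected values assume a normally distributed quality characteristic, as described above. This is an illustration rather than the software's implementation.

```python
from statistics import NormalDist


def observed_ppm(values, lsl, usl):
    """Actual nonconforming units in the sample, in parts per million."""
    n = len(values)
    below = sum(v < lsl for v in values)
    above = sum(v > usl for v in values)
    return {"< LSL": 1e6 * below / n,
            "> USL": 1e6 * above / n,
            "< SL >": 1e6 * (below + above) / n}


def expected_ppm(lsl, usl, mean, sigma):
    """Expected nonconforming units under a normal model, in parts per million."""
    dist = NormalDist(mean, sigma)
    below = dist.cdf(lsl)        # proportion expected below the LSL
    above = 1.0 - dist.cdf(usl)  # proportion expected above the USL
    return {"< LSL": 1e6 * below,
            "> USL": 1e6 * above,
            "< SL >": 1e6 * (below + above)}


# Hypothetical sample and process parameters.
observed = observed_ppm([6.5, 8.0, 10.0, 12.0, 13.5], lsl=7.0, usl=13.0)
expected = expected_ppm(lsl=7.0, usl=13.0, mean=10.0, sigma=1.0)
print(observed)
print(expected)
```

Passing the short-term or the long-term sigma estimate to `expected_ppm` gives the corresponding short-term or long-term expected values.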
3-up Compute capability ratios and plot a histogram and univariate plot.
4-up Compute capability ratios and plot a histogram, univariate plot, and normal
probability plot.
6-up Compute capability ratios and plot a histogram, univariate plot, normal probability
plot and control charts.
Pareto analysis
Pareto analysis identifies the most important quality-related problems to resolve in a process.
The Pareto principle (also known as the 80/20 rule) states that for many events, roughly 80% of
the problems come from 20% of the causes. This statement is merely a rule of thumb and is not
an immutable law of nature. More generally, a small subset of issues tends to cause most problems,
and it is useful to identify those issues as they have the most impact on the process.
Pareto chart
A Pareto chart shows the frequency of occurrences of quality-related problems to highlight those
that need the most attention.
Bars represent the individual values ordered by decreasing magnitude. A line represents the
cumulative total. The left vertical axis is the frequency of occurrence or some other unit of
measurement (such as cost or time). The right vertical axis is the cumulative frequency expressed
as a percentage or total of the unit of measurement (such as total cost, total time).
Note: In user-interface controls and documentation, it is typical to refer to the count/frequency
for usability and clarity rather than more generic terms such as measure/unit of measurement.
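The quantities behind a Pareto chart (bars ordered by decreasing frequency and a cumulative percentage line) can be computed with a short Python sketch. The function name and the failure categories are hypothetical, and the sketch only tabulates the values; it does not draw the chart.

```python
from collections import Counter


def pareto_table(failures):
    """Return (category, count, cumulative %) rows ordered by decreasing count."""
    counts = Counter(failures).most_common()  # sorted by decreasing count
    total = sum(count for _, count in counts)
    rows = []
    running = 0
    for category, count in counts:
        running += count
        rows.append((category, count, 100.0 * running / total))
    return rows


# Hypothetical data: one entry per observed failure.
failures = ["Scratch"] * 5 + ["Dent"] * 3 + ["Crack"] * 2
table = pareto_table(failures)
for category, count, cumulative in table:
    print(f"{category:10s} {count:3d} {cumulative:6.1f}%")
```

The counts correspond to the bars on the left vertical axis and the cumulative percentages to the line on the right vertical axis.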
5. In the Primary axis drop-down list, select:
Flow Lay out the plots across the width of the page, before flowing onto a new row.
Matrix Lay out the plots in a matrix as defined by the row and column factors.
11. Click Calculate.
Requirements
• A categorical variable of failures.
• 1 or 2 factor variables.
Dataset layout
Use a column for the variable (Failure), an optional column for the factor variable (Operator); each
row has the values of the variables for a failure (ID).
for backup or archival purposes, or (b) install the SOFTWARE in accordance with the Grant of License provided you keep the original
solely for backup or archival purposes. You may not copy the printed materials accompanying the SOFTWARE.
LIMITED WARRANTY
ANALYSE-IT SOFTWARE LIMITED makes no warranty, expressed or implied, about the merchantability or fitness for a particular purpose,
of the SOFTWARE during the evaluation period. You install it, and evaluate it, entirely at your own risk.
Only when a license has been purchased does ANALYSE-IT SOFTWARE LIMITED warrant that the SOFTWARE will perform substantially
in accordance with the accompanying documentation for a period of ninety (90) days from the date of receipt. Any implied warranties
on the SOFTWARE are limited to ninety (90) days, or the shortest period permitted by applicable law, whichever is greater.
To the maximum extent permitted by the applicable law, ANALYSE-IT SOFTWARE LIMITED disclaims all other warranties, either express
or implied, including but not limited to implied warranties of merchantability and fitness for a particular purpose, with respect to
the SOFTWARE, the accompanying product manual(s) and written material. The Limited Warranty contained herein gives you specific
legal rights.
CUSTOMER REMEDIES
ANALYSE-IT SOFTWARE LIMITED’s entire liability and your exclusive remedy shall be, at ANALYSE-IT SOFTWARE LIMITED’s option,
either (a) return of the price paid or (b) repair or replacement of the SOFTWARE that does not meet ANALYSE-IT SOFTWARE LIMITED’s
Limited Warranty and which is returned to ANALYSE-IT SOFTWARE LIMITED with a copy of the receipt. This Limited Warranty is void
if failure of the SOFTWARE has resulted from an accident, abuse, or misapplication. Any replacement SOFTWARE will be warranted
for the remainder of the original warranty period or thirty (30) days, whichever is longer.
LIMITED LIABILITY
To the maximum extent permitted by applicable law, ANALYSE-IT SOFTWARE LIMITED shall not be liable for any other damages
whatsoever (including, without limitation, damages for loss of business profits, business interruption, business information, or other
pecuniary loss) arising out of the use or inability to use this product, even if ANALYSE-IT SOFTWARE LIMITED has been advised of
the possibility of such damages. In any case, ANALYSE-IT SOFTWARE LIMITED’s entire liability under any provision of this agreement
shall be limited to the price paid for the SOFTWARE.
TERMINATION
Without prejudice to any other rights, ANALYSE-IT SOFTWARE LIMITED may terminate this EULA if you fail to comply with the terms
and conditions herein. In such event, you must destroy all copies of the SOFTWARE.
GOVERNING LAW
This EULA is governed by the laws of England, UK.
Cornbleet, P. J., & Gochman, N. (1979). Incorrect least-squares regression coefficients in method-
comparison analysis. Clinical Chemistry, 25(3), 432-438.
Currie, L. A. (1999). Detection and quantification limits: origins and historical overview. Analytica
Chimica Acta, 391(2), 127-134.
DeLong, E. R., DeLong, D. M., & Clarke-Pearson, D. L. (1988). Comparing the areas under two or
more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics,
44(3), 837-845.
Gabriel, K. R. (1971). The biplot graphic display of matrices with application to principal
component analysis. Biometrika, 58(3), 453-467.
Emancipator, K., & Kroll, M. H. (1993). A quantitative measure of nonlinearity. Clinical Chemistry,
39(5), 766-772.
Feinstein, A. R., & Cicchetti, D. V. (1990). High agreement but low kappa: I. The problems of two
paradoxes. Journal of Clinical Epidemiology, 43(6), 543-549.
Gower, J. C., Lubbe, S. G., & Le Roux, N. J. (2011). Understanding Biplots. John Wiley & Sons.
Harrell, F. E., & Davis, C. E. (1982). A new distribution-free quantile estimator. Biometrika, 69(3),
635-640.
Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating
characteristic (ROC) curve. Radiology, 143(1), 29-36.
Hanley, J. A., & McNeil, B. J. (1983). A method of comparing the areas under receiver operating
characteristic curves derived from the same cases. Radiology, 148(3), 839-843.
Hsieh, E., & Liu, J. P. (2008). On statistical evaluation of the linearity in assay validation. Journal of
Biopharmaceutical Statistics, 18(4), 677-690.
Hsieh, E., Hsiao, C. F., & Liu, J. P. (2009). Statistical methods for evaluating the linearity in assay
validation. Journal of Chemometrics, 23(1), 56-63.
Hirji, K. F. (2005). Exact analysis of discrete data. CRC Press.
Horn, P. S. (1990). Robust quantile estimators for skewed populations. Biometrika, 77(3), 631-636.
Horn, P. S., Pesce, A. J., & Copeland, B. E. (1998). A robust approach to reference interval
estimation and evaluation. Clinical Chemistry, 44(3), 622-631.
Horn, P. S., Pesce, A. J., & Copeland, B. E. (1999). Reference interval computation using robust vs.
parametric and nonparametric analyses. Clinical Chemistry, 45(12), 2284-2285.
Horn, P. S., & Pesce, A. J. (2005). Reference intervals: a user's guide. American Association for
Clinical Chemistry.
Hsu, J. C. (1996). Multiple Comparisons: Theory and Methods. CRC Press.
ISO. (1994). ISO 5725: Accuracy (Trueness and Precision) of Measurement Methods and Results.
International Organization for Standardization.
Jolliffe, I. (2002). Principal component analysis. Springer-Verlag, New York, Inc.
Kroll, M. H., & Emancipator, K. (1993). A theoretical evaluation of linearity. Clinical Chemistry,
39(3), 405-413.
Krouwer, J. S. (2008). Why Bland–Altman plots should use X, not (Y + X)/2 when X is a reference
method. Statistics in Medicine, 27(5), 778-780.
Krouwer, J. S., & Monti, K. L. (1995). A simple, graphical method to evaluate laboratory assays.
European Journal of Clinical Chemistry and Clinical Biochemistry, 33(8), 525-528.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data.
Biometrics, 33(1), 159-174.
Linnet, K. (1993). Evaluation of regression procedures for methods comparison studies. Clinical
Chemistry, 39(3), 424-432.
Linnet, K. (1990). Estimation of the linear relationship between the measurements of two methods
with proportional errors. Statistics in Medicine, 9(12), 1463-1473.
Linnet, K. (1998). Performance of Deming regression analysis in case of misspecified analytical
error ratio in method comparison studies. Clinical Chemistry, 44(5), 1024-1031.
Linnet, K. (2000). Nonparametric estimation of reference intervals by simple and bootstrap-based
procedures. Clinical Chemistry, 46(6), 867-869.
Krouwer, J. S. (2002). Setting performance goals and evaluating total analytical error for diagnostic
assays. Clinical Chemistry, 48(6), 919-927.
Linnet, K., & Kondratovich, M. (2004). Partly nonparametric approach for determining the limit of
detection. Clinical Chemistry, 50(4), 732-740.
Liu, A., & Bandos, A. I. (2012). Statistical evaluation of diagnostic performance: topics in ROC
analysis. CRC Press.
Monti, K. L. (1995). Folded empirical distribution function curves—mountain plots. The American
Statistician, 49(4), 342-345.
Montgomery, D. C. (2012). Introduction to statistical quality control. John Wiley & Sons.
Nelson, L. S. (1984). Technical Aids: The Shewhart Control Chart - Tests for Special Causes. Journal
of Quality Technology, 16(4).
Passing, H., & Bablok, W. (1983). A new biometrical procedure for testing the equality of
measurements from two different analytical methods. Application of linear regression procedures
for method comparison studies in clinical chemistry, Part I. Clinical Chemistry and Laboratory
Medicine, 21(11), 709-720.
Passing, H., & Bablok, W. (1984). Comparison of Several Regression Procedures for Method
Comparison Studies and Determination of Sample Sizes. Application of linear regression procedures
for method comparison studies in Clinical Chemistry, Part II. Clinical Chemistry and Laboratory
Medicine, 22(6), 431-445.
Pepe, M. S. (2003). The statistical evaluation of medical tests for classification and prediction.
Oxford University Press.
Pollock, M. A., Jefferson, S. G., Kane, J. W., Lomax, K., MacKinnon, G., & Winnard, C. B. (1992).
Method comparison - a different approach. Annals of Clinical Biochemistry, 29, 556-560.
Rigdon, S. E., Cruthis, E. N., & Champ, C. W. (1994). Design strategies for individuals and moving
range control charts. Journal of Quality Technology, 26(4), 274-287.
Ryan, T. P. (2011). Statistical methods for quality improvement. John Wiley & Sons.
Sadler, W. A., & Smith, M. H. (1986). A reliable method of estimating the variance function in
immunoassay. Computational Statistics & Data Analysis, 3, 227-239.
Sadler, W. A., Smith, M. H., & Legge, H. M. (1988). A method for direct estimation of imprecision
profiles, with reference to immunoassay data. Clinical Chemistry, 34(6), 1058-1061.
Sadler, W. A. (2016). Using the variance function to estimate limit of blank, limit of detection and
their confidence intervals. Annals of Clinical Biochemistry, 53, 141-149.
Strike, P. W. (2014). Statistical methods in laboratory medicine. Butterworth-Heinemann.
NIST/SEMATECH. (2012). e-Handbook of Statistical Methods.
Sahai, H., & Ojeda, M. M. (2004). Analysis of Variance for Random Models, Volume 1: Balanced
Data. Springer Science & Business Media.
Sahai, H., & Ojeda, M. M. (2004). Analysis of Variance for Random Models, Volume 2:
Unbalanced Data. Springer Science & Business Media.
Solberg, H. E. (1987). Approved recommendation on the theory of reference values. Part 5.
Statistical treatment of collected reference values. Determination of reference limits. Clinica
Chimica Acta, 170(2), S13-S32.
Stöckl, D., Dewitte, K., & Thienpont, L. M. (1998). Validity of linear regression in method
comparison studies: is it limited by the statistical model or the quality of the analytical input data?.
Clinical Chemistry, 44(11), 2340-2346.
Western Electric Company. (1958). Statistical Quality Control Handbook. AT&T.
Xue, J., & Titterington, D. M. (2010, unpublished). The p-folded Cumulative Distribution Function and
the Mean Absolute Deviation from the p-quantile.
Zhou, X. H., Obuchowski, N. A., & McClish, D. K. (2011). Statistical methods in diagnostic
medicine. Wiley-Blackwell.
Zweig, M. H., & Campbell, G. (1993). Receiver-operating characteristic (ROC) plots: a fundamental
evaluation tool in clinical medicine. Clinical Chemistry, 39(4), 561-577.
Index
A
agreement measures 162, 163
agreement plot 164
analysis
  cloning 46
  creating 45
  editing 46
  printing 47
  printing charts 47
  recalculating 46
  report 45
  task pane 33
ANCOVA 128, 130
Anderson-Darling test
  normality 62, 62, 193
ANOVA
  factorial 128, 129
  one factor between subjects 77, 78
  repeated measures 89, 90
  table 133
  two factor 128, 129
association 107
AUC
  difference 201, 202
autocorrelation 141

B
Bartlett test
  independent samples 84, 84
beta coefficients 132, 145
bias
  average 155, 156, 158
Binomial test
  one sample 71, 72
biplot 116, 119
Bland-Altman plot 159
Bonferroni
  multiple comparisons 82, 83, 138, 139
box plot
  side-by-side 76
  skeletal 57
  Tukey outlier 57, 192
Brown-Forsythe test
  independent samples 84, 84

C
capability indices 229, 231
case form data 38
categories
  color 41
  labeling 41
  ordering 40
  symbol 41
CDF plot 59, 59
central limit theorem 62
CLSI EP15 174, 178, 179
CLSI EP17 183, 184, 185
CLSI EP5 174, 175
CLSI EP6 181
coefficient of variation 55
Cohen's d
  independent samples 79, 80
  related samples 91, 91
color map 109
compare groups 75
compare pairs 87
confidence interval 63
contingency table 95, 96, 96
control chart
  CUSUM 224, 225
  EWMA 222, 223
  Shewhart 210
  UWMA 221, 223
Cook's D 141, 142
correlation
  coefficient 108
  matrix 108, 109
  method comparison 149
  monoplot 118, 119
covariance
  matrix 108, 109
  monoplot 118
Cronbach's alpha 125, 125
CUSUM control chart 224, 225

D
dataset
  definition 35
  filtering 42
  go to 47
  task pane 33
dataset layout
  compare groups 86
  compare pairs 93
  contingency table 105
  correlation 113
  diagnostic performance 205
  distribution 74
  fit model 147
  method comparison 166, 169
  MSA 186
  multivariate 113
  reference interval 196
decision thresholds 203, 203, 203, 204
Deming regression 151, 152
descriptive statistics
  multivariate 108, 108
  univariate 55, 56
  univariate, by group 75
detection capability 183
detection limit
  probit 185
diagnostic performance 197
difference between means/medians
  independent samples 79
  related samples 91
difference plot
  method comparison 157, 158, 159
distribution
  continuous 55
  discrete 69
dot plot
  side-by-side 76
Dunnett
  multiple comparisons 82, 82, 83, 83, 138, 138, 139, 139
Durbin-Watson 141

E
EFA 121, 122
eigenvalues 115
eigenvectors 115

F
factor analysis 121, 121, 122
false negative 197
false positive 197
Fisher's LSD
  multiple comparisons 82, 83, 138, 139
Fisher exact test
  independent samples 101, 101
fit model
  linear 127
  logistic 144
folded CDF plot 161, 161
frequency
  count 69
  cumulative 69
  cumulative relative 69
  density 69
  relative 69
frequency distribution 58, 69
frequency form data 38
frequency plot
  bar 69, 70
  grouped bar 97
  pie 70
  spie 70
  stacked bar 70, 70
Friedman test
  related samples 89, 90
F test
  effect of model 133
  effect of term 135
  independent samples 84, 84
  lack of fit 134

G
geometric mean 55

H
harmonic mean 55
Hedges' g
  independent samples 79, 80
  related samples 91, 91
histogram 59, 60, 191
Hodges-Lehmann location shift
  independent samples 79, 80
  related samples 91, 91
Hodges-Lehmann pseudo-median 65, 65
Hsu
  multiple comparisons 82, 83, 138, 139
hypothesis testing 65
hypothesis tests
  association 111
  binomial 71
  correlation 111
  equality of means/medians
    independent samples 76
    related samples 88
  equality of proportions 99
  equivalence of means
    independent samples 77
    related samples 88
  homogeneity of variance 84
  independence 102
  mean 66, 66
  median 66
  multinomial 72
  normality 61
  variance 66

I
influence plot 141
interaction plot 137
interferences 182
intermediate precision 173
interquartile range (IQR) 55
ISO 5725 174
item reliability 125, 125

K
kappa 163
Kendall tau 110, 110, 111
Kolmogorov-Smirnov test
  normality 62, 62
Kruskal-Wallis test
  independent samples 77, 78
kurtosis 55

L
lag plot 139, 141
least-squares means 136
Levene test
  independent samples 84, 84
leverage 141, 142
leverage plot 136
license
  acquiring 22
  activating 19
  agreement 239
  deactivating 21
  releasing 22
  transferring 16
  who's using 22
likelihood ratio 197
likelihood ratio G² test
  effect of model 146
  effect of term 146
  independence 103, 103
  one sample 73, 73
limit of blank 183
limit of detection 183
limit of quantitation 185
limits of agreement 159, 159
linearity 154, 155, 180, 181
logit 144
log odds ratio 146

M
main effects plot 137
maintenance
  checking expiry 26
  renewing 26
matrix rotation 122
maximum 55
McNemar-Mosteller exact test
  related samples 101, 102
McNemar X² test
  related samples 101, 102
mean 55, 65, 65
mean difference
  independent samples 79, 80
  related samples 91, 91
mean-mean scatter plot 81
mean plot
  side-by-side 76
measurement scale 36
measurement systems analysis 173
median 55, 65, 65
median difference
  related samples 91
method comparison 149
minimum 55
missing values 38
mode 55
monoplot 116, 118, 119
mosaic plot 104, 104
mountain plot 161, 161
moving average control chart
  EWMA 222, 223
  UWMA 221, 223
MSA 173
multiple comparisons 80, 83, 137, 139

N
nonconforming units 231, 231
normal distribution 60
normality 60, 62, 140
normal probability plot 61, 61, 193, 193
numerical accuracy 48

O
odds 71
odds ratio 97, 98, 98, 146
ordinary linear regression
  method comparison 150, 151
outlier 141
outliers 46, 57, 142

P
Pareto analysis 233
Pareto chart 233
Passing-Bablok regression 151, 153
pattern matrix 121
PCA 115, 116
Pearson r 110, 110, 111
Pearson X² test
  independence 103, 103
  independent samples 101, 101
  one sample 73, 73
point estimate 63
PPA/NPA 162
precision 173, 173, 174, 175
precision profile 176, 176, 176, 177, 184
predicted against actual Y plot 134
predictive value 198
principal components 115, 115, 116
process capability 229
proportion 71
proportion difference 97
proportion ratio 97
p-value
  asymptotic 100
  exact 100
  hypothesis testing 65

Q
Q-Q plot 61, 61, 193
qualitative variable 36
quantile
  bootstrap 190
  Harrell-Davis 190
  normal 189
  robust bi-weight 190
quantiles 55
quantitative variable 36
quartiles 55

R
range 55
reference interval 189, 189, 190, 194
reference limit 189, 190, 194
regression
  advanced 128, 130
  exponential 127, 127
  linear 127, 127
  logarithmic 127, 127
  logistic 144, 144, 144
  multiple linear 128, 128
  polynomial 127, 127
  power 127, 127
  probit 144, 145
relative risk 97, 99
relative standard deviation 55
repeatability 173
reproducibility 173
residual plot 139, 141, 155, 155
ribbon 33
risk difference 97
risk ratio 97
ROC plot 199, 200, 200
R² 131

S
scatter plot
  correlation / association 107, 107
  matrix 107, 107
  method comparison 150
  regression 131
Scheffé
  multiple comparisons 82, 83, 138, 139
Score Z test
  independent samples 101, 101
  one sample 71, 72
  related samples 101, 102
scree plot 116
sensitivity 197, 198, 199
sequence plot 139, 141
Shapiro-Wilk test
  normality 62, 62, 193
Shewhart control chart
  attributes 215
  c 216, 217
  I 212, 214
  I-MR 212, 214
  MR 213, 214
  np 215, 217
  p 216, 217
  R 211, 213
  rules 218
  S 212, 213
  u 216, 217
  variables 210
  Xbar 211, 213
  Xbar-R 211, 213
  Xbar-S 211, 213
Sign test
  one sample 67, 67
  related samples 89, 90
skewness 55
software
  installing 15
  installing for concurrent-use 16
  uninstalling 17
  updating 25
Spearman rs 110, 110
specificity 197, 198, 199
standard deviation 55, 65, 65
standardized beta coefficients 132
standardized mean difference
  independent samples 79, 80
  related samples 91, 91
Steel
  multiple comparisons 82, 83
Steel-Dwass-Critchlow-Fligner
  multiple comparisons 82, 83
strip plot 57
structure matrix 121
Student's t test
  independent samples 77, 78
  multiple comparisons 82, 83, 138, 139
  one sample 67, 67
  related samples 89, 90
Student-Newman-Keuls
  multiple comparisons 82, 83
summary of fit 131
system requirements 15

T
TOST
  one sample 68
TOST (two-one-sided t-tests)
  independent samples 77, 79
  repeated measures 89, 90
troubleshooting 27
true negative 197
trueness 178
true positive 197
Tukey-Kramer
  multiple comparisons 82, 83, 138, 139
two-one-sided t tests
  one sample 68

U
univariate plot
  side-by-side 76, 76
  single 57, 58

V
variable
  definition 36
  saving 143
  setting axis scale 39
  setting measurement scale 39
  setting number format 40
  transforming 42
variance 55, 65, 65
variance components 173
variance inflation factors 132

W
weighted Deming regression 151, 152
weighted linear regression
  method comparison 150, 151
Welch ANOVA
  independent samples 77, 78
Welch t test
  independent samples 77, 78
Wilcoxon-Mann-Whitney test
  independent samples 77, 78
  multiple comparisons 82, 83
Wilcoxon test
  one sample 67, 67
  related samples 89, 90

X
X² test
  lack of fit 134
X² variance test
  one sample 67, 67

Y
Yates correction 103
Youden J 198

Z
Z benchmark indices 230, 231
Z test
  independent samples 77, 78
  one sample 67, 67
  related samples 89, 90