Understanding Bland-Altman Analyses

Lessons in biostatistics
Understanding Bland Altman analysis

Davide Giavarina
Clinical Chemistry and Hematology Laboratory, San Bortolo Hospital, Vicenza, Italy
Corresponding author: [email protected]
Abstract
In a contemporary clinical laboratory it is very common to have to assess the agreement between two quantitative methods of measurement. The
correct statistical approach to assess this degree of agreement is not obvious. Correlation and regression studies are frequently proposed. However,
correlation studies the relationship between one variable and another, not the differences, and it is not recommended as a method for assessing the
comparability between methods.
In 1983 Altman and Bland (B&A) proposed an alternative analysis, based on the quantification of the agreement between two quantitative measurements by studying the mean difference and constructing limits of agreement.
The B&A plot analysis is a simple way to evaluate a bias between the mean differences, and to estimate an agreement interval, within which 95% of the differences of the second method, compared to the first one, fall. Data can be analyzed both as unit differences plot and as percentage differences plot.
The B&A plot method only defines the intervals of agreement; it does not say whether those limits are acceptable or not. Acceptable limits must be defined a priori, based on clinical necessity, biological considerations or other goals.
The aim of this article is to provide guidance on the use and interpretation of Bland Altman analysis in method comparison studies.
Key words: Bland-Altman; agreement analysis; laboratory research; method comparison; correlation of data
Received: February 23, 2015 Accepted: April 30, 2015
Introduction
Medical laboratories often need to assess the agreement between two measurement methods. Every time we have to change one method for another one, or evaluate a new or alternative method, or quite simply we have an alignment problem between two instruments, we need some tools to measure and appraise the differences as well as the cause of these differences.

Validation of a clinical measurement should include all of the procedures that demonstrate that a particular method used for the quantitative measurement of the variable concerned is both reliable and reproducible for the intended use.

The measurement of variables always implies some degree of error. When two methods are compared, neither provides an unequivocally correct measurement, so it could be interesting trying to assess the degree of agreement.

To assess this degree of agreement, the correct statistical approach is not obvious. Many studies give the product–moment correlation coefficient (r) between the results of two measurement methods as an indicator of agreement. However, correlation studies the relationship between one variable and another, not the differences, and it is not recommended as a method for assessing the comparability between methods.
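
The point that correlation looks at the relationship rather than at the differences can be illustrated with a minimal sketch. The five paired values below are invented for illustration only (they are not from this article), and `pearson_r` is a small helper written here, not a library function. Method B is assumed to read exactly twice method A plus 5 units, so the two methods are perfectly correlated and yet never agree.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation: covariance divided by the product of the SDs."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired results (assumption: B = 2 * A + 5, chosen to make the point).
method_a = [10.0, 20.0, 30.0, 40.0, 50.0]
method_b = [2 * a + 5 for a in method_a]

r = pearson_r(method_a, method_b)
differences = [a - b for a, b in zip(method_a, method_b)]
mean_difference = sum(differences) / len(differences)

print(f"r = {r:.3f}")                              # perfect linear relationship
print(f"mean difference = {mean_difference:.1f}")  # large systematic disagreement
```

In this constructed case r equals 1 while method A reads 35 units lower than method B on average, which is the situation the differences, and not the correlation coefficient, reveal.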
http://dx.doi.org/10.11613/BM.2015.015 Biochemia Medica 2015;25(2):141–51

©Copyright by Croatian Society of Medical Biochemistry and Laboratory Medicine. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc-nd/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
In 1983 Altman and Bland re-proposed an alternative analysis, first presented by Eksborg in 1981 (1), based on the quantification of the agreement between two quantitative measurements by studying the mean difference and constructing limits of agreement (2).

Correlation and linear regression

Correlation is a statistical technique that can show whether, and how strongly, pairs of variables are related. There are several different correlation techniques, including the Pearson or product-moment correlation, probably the most common one. The main result of a correlation is called the correlation coefficient (or “r”). It is computed as the ratio of covariance between the variables to the product of their standard deviations. The numerical value of r ranges from -1.0 to +1.0. This enables us to get an idea of the strength of relationship, or rather the strength of linear relationship between the variables. The closer the coefficients are to +1.0 or -1.0, the greater the strength of the linear relationship is. Usually, a linear regression study is performed together with correlation measurement. Actually, linear regression can be calculated only if the correlation exists and correlation coefficient can be interpreted only if the P value is significant. However, P is significant and regression can be calculated for most cases of method comparison. Linear regression finds the best line that predicts one variable from the other one. Linear regression quantifies goodness of fit with r², the coefficient of determination. Correlation describes linear relationship between two sets of data but not their agreement (3). Moreover, frequently a null hypothesis is used to verify if the two methods are not linearly related. With even a minimal trend, the probability of null hypothesis is very small and it can be safely, but sometimes erroneously, concluded that the two measurement methods are indeed related.

However, the two methods that are designed to measure the same variable should have good correlation when a set of samples are chosen in such manner that the property to be determined varies considerably. In the case of method comparison, this means that samples should cover a wide concentration range. A high correlation for any two methods designed to measure the same property could thus, in itself, just be a sign that one has chosen a widespread sample.

Correlation quantifies the degree to which two variables are related. But a high correlation does not automatically imply that there is good agreement between the two methods. The correlation coefficient and regression technique are sometimes inadequate and can be misleading when assessing agreement, because they evaluate only the linear association of two sets of observations. The r measures the strength of a relation between two variables, not the agreement between them. Similarly, r², named the coefficient of determination, only tells us the proportion of variance that the two variables have in common. Finally, the test of significance may show that the two methods are related, but it is obvious that two methods designed to measure the same variable are related. Moreover, the test of significance could be misleading; the significance of the correlation depends on the values of the correlation coefficient. Only if the correlation coefficient is statistically significant with respect to the set limit (P < 0.05) can we interpret its value; which means that if we get for example r = 0.22 and P = 0.027 we should not conclude that there is a “significant relationship”, but we can claim that there is no relationship between the variables, because the calculated correlation coefficient, which indicates the absence of correlation, is statistically significant.

The Passing and Bablok regression analysis, proposed to overcome some limits of correlation analysis, partially solves problems related with data distribution and with the detection of a constant or proportional difference between two methods. Compared with the other frequently proposed method, the Deming regression (4), the Passing and Bablok regression could be preferred for comparing clinical methods, because it does not assume measurement error is normally distributed, and is robust against outliers. However, it needs the residuals analysis, the distribution of difference around fitted regression line, for a complete interpretation of regression results (5). This is quite similar, but more complicated than the analysis of differences, described below.

Table 1. Hypothetical data of an agreement between two methods (Method A and B).
Method A   Method B   Mean (A+B)/2   (A – B)   (A – B)/Mean
(units)    (units)    (units)        (units)   (%)
    1.0        8.0        4.5          -7.0     -155.6%
    5.0       16.0       10.5         -11.0     -104.8%
   10.0       30.0       20.0         -20.0     -100.0%
   20.0       24.0       22.0          -4.0      -18.2%
   50.0       39.0       44.5          11.0       24.7%
   40.0       54.0       47.0         -14.0      -29.8%
   50.0       40.0       45.0          10.0       22.2%
   60.0       68.0       64.0          -8.0      -12.5%
   70.0       72.0       71.0          -2.0       -2.8%
   80.0       62.0       71.0          18.0       25.4%
   90.0      122.0      106.0         -32.0      -30.2%
  100.0       80.0       90.0          20.0       22.2%
  150.0      181.0      165.5         -31.0      -18.7%
  200.0      259.0      229.5         -59.0      -25.7%
  250.0      275.0      262.5         -25.0       -9.5%
  300.0      380.0      340.0         -80.0      -23.5%
  350.0      320.0      335.0          30.0        9.0%
  400.0      434.0      417.0         -34.0       -8.2%
  450.0      479.0      464.5         -29.0       -6.2%
  500.0      587.0      543.5         -87.0      -16.0%
  550.0      626.0      588.0         -76.0      -12.9%
  600.0      648.0      624.0         -48.0       -7.7%
  650.0      738.0      694.0         -88.0      -12.7%
  700.0      766.0      733.0         -66.0       -9.0%
  750.0      793.0      771.5         -43.0       -5.6%
  800.0      851.0      825.5         -51.0       -6.2%
  850.0      871.0      860.5         -21.0       -2.4%
  900.0      957.0      928.5         -57.0       -6.1%
  950.0     1001.0      975.5         -51.0       -5.2%
 1000.0      960.0      980.0          40.0        4.1%

mean (d)                              -27.17     -17.40%
standard deviation (s)                 34.81     -12.64%
Mean differences (d) and standard deviation (s) are shown.

The analysis of differences: the Bland and Altman method

Bland and Altman introduced the Bland-Altman (B&A) plot to describe agreement between two quantitative measurements (6). They established a method to quantify agreement between two quantitative measurements by constructing limits of agreement. These statistical limits are calculated by using the mean and the standard deviation (s) of the differences between two measurements. To check the assumptions of normality of differences and other characteristics, they used a graphical approach.

The resulting graph is a scatter plot XY, in which the Y axis shows the difference between the two paired measurements (A-B) and the X axis represents the average of these measures ((A+B)/2). In other words, the difference of the two paired measurements is plotted against the mean of the two measurements. B&A recommended that 95% of the data points should lie within ± 2s of the mean difference. This is the most common way to plot the B&A method, but it is also possible to plot the differences as percentages or ratios, and one can use the first method or the second one, instead of the mean of both methods.

The following example could help in familiarizing with the B&A graph plot. Table 1 shows a hypothetical series of paired data, from which it is possible to construct the B&A plot and to evaluate the agreement. In the first column a series of hypothetical variable measurements is shown, obtained by a method, named method A. The data is sorted from smallest to largest. The second column shows the measurements obtained for the same specimens but with a second, different method, B. Therefore, each line shows paired data. Figure 1 indicates the regression line between the two methods; correlation coefficient between the two methods is r = 0.996 (95% confidence interval, CI = 0.991-0.998, P < 0.001), and the regression equation is y = 7.08 (-0.30 to 19.84) + 1.06 (1.02 to 1.09) x; that could be evaluated as a very good agreement.
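
As a small illustration of how the derived columns of Table 1 are obtained from each pair, the sketch below recomputes the mean, the unit difference and the percentage difference for three rows copied from the table. The function name `derive_columns` is introduced here for illustration only.

```python
def derive_columns(a, b):
    """Return (mean, unit difference, percentage difference) for one paired result."""
    mean = (a + b) / 2          # column 3: (A + B) / 2
    diff = a - b                # column 4: A - B
    pct = 100 * diff / mean     # column 5: (A - B) / mean, as a percentage
    return mean, diff, pct


# Three rows copied from Table 1 (first row, fifth row, last row).
pairs = [(1.0, 8.0), (50.0, 39.0), (1000.0, 960.0)]

for a, b in pairs:
    mean, diff, pct = derive_columns(a, b)
    print(f"{a:7.1f} {b:7.1f} {mean:7.1f} {diff:7.1f} {pct:8.1f}%")
```

The percentage column divides by the mean of the pair, which is why small concentrations produce very large percentage differences for modest unit differences.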
[Figure 1: scatter plot, method A on the X axis and method B on the Y axis, with the regression line.]
Figure 1. The regression line between hypothetical measurements done by method A and method B. Regression equation is expressed as: y = a (95% CI) + b (95% CI) x (Passing & Bablok regression) (21). Regression line has a slope of 1.06 (1.02 to 1.09) and an intercept of 7.08 (-0.30 to 19.84). Correlation coefficient between the two methods is r = 0.996 (95% confidence interval, CI = 0.991-0.998, P < 0.001).

If the aim is to evaluate the agreement between the two measurements, it could be interesting to statistically study the behaviors of the differences between one measurement and the other. Column 4 shows these differences. An ideal model would claim that the measurements obtained by one method or another gave exactly the same results. So, all the differences would be equal to zero. But any measurement of variables always implies some degree of error. Even the mere analytical imprecision for method A and method B generates a variability of the differences. However, if the variability of the differences were only linked to analytical imprecision of each of the two methods, the average of these differences should be zero. This is the first point required to evaluate the agreement between the two methods: look at the average of the differences between the paired data.

From our example, the average of the differences is -27.17 units (bottom line of table 1). This mean difference (d) is not zero, and this means that on average the second method (B) measures 27.17 units more than the first one. This bias could be a constant or an average result arising from problems for specific concentrations or values. It is important to evaluate the differences at different magnitudes of the measured variable. If neither of the two methods is a “reference”, the differences could be compared with the mean of the two paired values. The average can be seen in column 3. The B&A graph plot simply represents every difference between two paired methods against the average of the measurement, as shown in Figure 2. The differences between method A and method B are plotted against the mean of the two measurements.

[Figure 2: difference plot, mean of method A and method B on the X axis and method A – method B on the Y axis, with the mean (bias) line at -27.2.]
Figure 2. Plot of differences between method A and method B vs. the mean of the two measurements (data from table 1). The bias of -27.2 units is represented by the gap between the X axis, corresponding to zero differences, and the parallel line to the X axis at -27.2 units.

Plotting difference against mean also allows us to investigate any possible relationship between measurement error and the true value. But since we do not know the true value, the mean of the two measurements is the best estimate we have (7). If the first method is a standard or reference method, we can use these values instead of the mean of the two measurements (8), although this is controversial, because a plot of the difference against a “standard measurement” will always appear to show a relation between difference and magnitude when there is none (9).
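
The concern about plotting differences against a “standard measurement” can be explored with a small simulation. This is only a sketch under stated assumptions that are not taken from the article: both methods measure the same true value with independent Gaussian errors of equal size, so by construction there is no real relation between difference and magnitude. The sample size, the seed and the helper `pearson_r` are arbitrary choices made here.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Assumed model: same true value, independent errors with the same SD in both methods.
true_values = [rng.gauss(100, 10) for _ in range(n)]
method_a = [t + rng.gauss(0, 10) for t in true_values]
method_b = [t + rng.gauss(0, 10) for t in true_values]

differences = [a - b for a, b in zip(method_a, method_b)]
means = [(a + b) / 2 for a, b in zip(method_a, method_b)]

r_vs_reference = pearson_r(method_a, differences)  # difference against method A alone
r_vs_mean = pearson_r(means, differences)          # difference against the mean

print(f"difference vs method A: r = {r_vs_reference:.2f}")
print(f"difference vs mean:     r = {r_vs_mean:.2f}")
```

Under these assumptions the difference correlates clearly with method A, because A's own error enters both axes, while the correlation with the mean stays close to zero. A real reference method with much smaller error than the test method would show a weaker artefact.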
The bias of -27.2 units is represented by the gap between the X axis, corresponding to zero differences, and the parallel line to the X axis at -27.2 units. This negative bias seems to be due to measurements over 200 units, while for lower concentrations data are closer to each other. A negative trend seems to be evident along the graph, as better shown in Figure 3. Drawing a regression line of the differences could help in detecting a proportional difference (10-12). The visual examination of the plot allows us to evaluate the global agreement between the two measurements. In our example, we can summarize the lack of agreement by calculating the bias, estimated by the mean difference (d) and the standard deviation of the differences (s). We would expect most of the differences to lie between d - 2s and d + 2s, or more precisely, 95% of differences will be between d - 1.96s and d + 1.96s, if the differences are normally distributed (Gaussian). Normal distribution of the differences must always be verified, for example by drawing a histogram. If this is skewed or has very long tails the assumption of normality may not be valid. From the example of table 1, the measurements of the two methods are not distributed normally, but on the other hand the differences do seem to be (Figure 4). Statistical tests should always be used to determine if the distribution is normal, since in some cases normality cannot be determined simply by observing the histogram plot. If any statistical software is available, a test for normal distribution (such as Shapiro-Wilk test (13), D’Agostino-Pearson test (14), Kolmogorov-Smirnov test (15)) can be done, for the hypothesis that the distribution of the observations in the sample is normal (if P < 0.05 then reject normality). If differences are not normally distributed, a logarithmic transformation of original data can be tried.

[Figure 4: histogram, Method A – Method B on the X axis and relative frequency (%) on the Y axis.]
Figure 4. Distribution plot of differences between measurement by methods A and B. The dotted line represents Normal distribution. Shapiro-Wilk test for normal distribution accepted normality (P = 0.814).

After ensuring that our differences are normally distributed, we can use the s to define the limits of agreement. From data of table 1, s = 34.8, so 95% of differences will be between
d - 1.96s = -27.2 – (1.96 x 34.8) = -95.4
d + 1.96s = -27.2 + (1.96 x 34.8) = 41.1
So, results measured by method A may be 95 units below or 41 above method B (Figure 5).
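
The bias, the standard deviation of the differences and the limits of agreement quoted above can be recomputed from the paired data of Table 1, as in the following sketch. The trend line is fitted here by ordinary least squares, which is an assumption: the text does not say which regression technique was used for the line drawn through the differences. Variable names are chosen for illustration.

```python
from statistics import mean, stdev

# Paired measurements copied from Table 1 (method A, method B).
method_a = [1, 5, 10, 20, 50, 40, 50, 60, 70, 80,
            90, 100, 150, 200, 250, 300, 350, 400, 450, 500,
            550, 600, 650, 700, 750, 800, 850, 900, 950, 1000]
method_b = [8, 16, 30, 24, 39, 54, 40, 68, 72, 62,
            122, 80, 181, 259, 275, 380, 320, 434, 479, 587,
            626, 648, 738, 766, 793, 851, 871, 957, 1001, 960]

differences = [a - b for a, b in zip(method_a, method_b)]
means = [(a + b) / 2 for a, b in zip(method_a, method_b)]

d = mean(differences)    # bias: mean of the differences
s = stdev(differences)   # sample standard deviation (n - 1 in the denominator)
lower = d - 1.96 * s     # lower limit of agreement
upper = d + 1.96 * s     # upper limit of agreement

# Ordinary least squares line of the differences on the means (assumed technique).
m_bar = mean(means)
slope = (sum((m - m_bar) * (x - d) for m, x in zip(means, differences))
         / sum((m - m_bar) ** 2 for m in means))
intercept = d - slope * m_bar

print(f"d = {d:.2f}, s = {s:.2f}")
print(f"limits of agreement: {lower:.2f} to {upper:.2f}")
print(f"trend of differences: slope = {slope:.3f}, intercept = {intercept:.2f}")
```

A negative slope means the differences become more negative as the measured magnitude increases, which is the pattern a proportional difference would produce.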
[Figure 3: difference plot with the same axes as Figure 2, mean line at -27.2, with regression line.]
Figure 3. The same plot as Figure 2, including regression line and confidence interval limits. Dotted line represents the regression line (y = -0.05 (-0.08 to -0.01)x – 10.15 (-28.07 to 7.77)); confidence interval limits are presented as continuous lines.

Bias and agreement limits

The B&A plot system does not say if the agreement is sufficient or suitable to use a method or
the other indifferently. It simply quantifies the bias and a range of agreement, within which 95% of the differences between one measurement and the other are included. It is possible to say that the bias is significant, because the line of equality is not within the confidence interval of the mean difference (Figure 6), but only analytical, biological or clinical goals could define whether the agreement interval is too wide or sufficiently narrow for our purpose. The best way to use the B&A plot system would be to define a priori the limits of maximum acceptable differences (limits of agreement expected), based on biologically and analytically relevant criteria, and then to obtain the statistics to see if these limits are exceeded, or not.

[Figure 5: difference plot with the same axes as Figure 2, showing the mean at -27.2 and the limits of agreement at +1.96 s = 41.1 and -1.96 s = -95.4.]
Figure 5. Bland and Altman plot for data from the table 1, with the representation of the limits of agreement (dotted line), from -1.96s to +1.96s.

Precision of estimated limits of agreement

As with any statistical evaluation, we only estimate a value which applies to whole population. Our estimating precision depends on the amount of observed data, i.e. on the sample size. It would be opportune to calculate the confidence interval (CI) in order to see how precise our estimates are. In particular, the 95% CI of the mean difference illustrates the magnitude of the systematic difference. If the line of equality is not in the interval, there is a significant systematic difference, i.e. the second method constantly under- or over-estimates compared to the first one.

The 95% CI of agreement limits allows for the estimate of the size of the possible sampling error. It can be measured by using standard error provided the differences follow a distribution which is approximately normal (16). Standard error of d is $\sqrt{s^2/n}$ and standard error of d - 2s and d + 2s is about $\sqrt{3s^2/n}$. 95% CI corresponds to the observed value minus t standard errors to the observed value plus t standard errors, where t is the value of t distribution (17) with n-1 degrees of freedom. Table 2 shows all the B&A plot statistics, including CIs. But usually simple statistic programs can perform all these calculations and what matters is to understand the significance
Table 2. Bland and Altman plot statistics from data of table 1, including the elements to calculate confidence intervals.

Parameter                  Value     Standard error   Standard     t value for 29       Confidence   Confidence interval
                           (units)   formula          error (se)   degrees of freedom   (se * t)     from        to
number (n)                 30
degrees of freedom (n-1)   29
difference mean (d)        -27.17    sqrt(s^2/n)       6.35        2.05                 13.00        -40.16     -14.17
standard deviation (s)     34.81
d – 1.96s                  -95.39    sqrt(3s^2/n)     11.01        2.05                 22.51        -117.90    -72.88
d + 1.96s                  41.05     sqrt(3s^2/n)     11.01        2.05                 22.51         18.54      63.56
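
The entries of Table 2 can be approximately reproduced from n, d and s alone, as in the sketch below. The t value of 2.045 is the two-sided 95% quantile for 29 degrees of freedom taken from a standard t table (Table 2 shows it rounded to 2.05); it is entered by hand here as an assumption rather than computed. The helper name `confidence_interval` is illustrative.

```python
from math import sqrt

n = 30        # number of paired measurements
d = -27.17    # mean difference (bias), from Table 1
s = 34.81     # standard deviation of the differences, from Table 1
t = 2.045     # assumed: 97.5th percentile of t with n - 1 = 29 degrees of freedom

se_mean = sqrt(s ** 2 / n)        # standard error of the mean difference
se_limit = sqrt(3 * s ** 2 / n)   # approximate standard error of each agreement limit

lower_limit = d - 1.96 * s
upper_limit = d + 1.96 * s


def confidence_interval(estimate, se):
    """Observed value minus and plus t standard errors."""
    return estimate - t * se, estimate + t * se


ci_mean = confidence_interval(d, se_mean)
ci_lower = confidence_interval(lower_limit, se_limit)
ci_upper = confidence_interval(upper_limit, se_limit)

print(f"se of d = {se_mean:.2f}, se of limits = {se_limit:.2f}")
print(f"95% CI of d:           {ci_mean[0]:.2f} to {ci_mean[1]:.2f}")
print(f"95% CI of lower limit: {ci_lower[0]:.2f} to {ci_lower[1]:.2f}")
print(f"95% CI of upper limit: {ci_upper[0]:.2f} to {ci_upper[1]:.2f}")
```

Because the inputs are the rounded values of d and s, the last decimal can differ slightly from Table 2. The interval for d does not contain zero, which is what the text means by a significant bias.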
of the areas of confidence around the mean difference and the agreement limits, as shown in Figure 6. In summary, the CIs of mean difference and of the agreement limits simply describe a possible error in the estimate, due to a sampling error. The greater the number of samples used for the evaluation of the difference between the methods, the narrower will be the CIs, both for the mean difference and for the agreement limits.

[Figure 6: difference plot, mean of method A and method B on the X axis and method A – method B on the Y axis; mean -27.2, +1.96 s = 41.1, -1.96 s = -95.4.]
Figure 6. Same plot as Figure 2, with the representation of confidence interval limits for mean and agreement limits (shaded areas, data from table 2).

Bland and Altman method: plot difference as percentage

In a B&A plot system the differences can be also expressed as percentages of the values on the axis (i.e. proportionally to the magnitude of measurements, [(method A – method B)/mean %]). This option is useful when there is an increase in variability of the differences as the magnitude of the measurement increases. Figure 7 represents the same data as Figure 6, plotted as percentage of differences. The bias (mean difference) is -17.4%, almost constant for all the measured concentrations, with the exception of very low values. As for the plot of unit values, this bias is significant, since the line of equality is not in the CI. The agreement limits are from -93.2% to 58.4%.

[Figure 7: percentage difference plot, mean of method A and method B on the X axis and (method A – method B)/Average % on the Y axis; mean -17.4%, +1.96 s = 58.4%, -1.96 s = -93.2%.]
Figure 7. Plot of differences between method A and method B, expressed as percentages of the values on the axis [(method A – method B)/mean %], vs. the mean of the two measurements (data from table 1). Shaded areas present confidence interval limits for mean and agreement limits.

Common instances in laboratory diagnostics

Proposed in 1983 (2), the B&A plot method is now widespread. Their paper in the Lancet “Statistical methods for assessing agreement between two methods of clinical measurement” (17) has been cited more than 30,000 times by a large number of peer reviewed scientific papers (18). Many examples are available in scientific literature, usually as supplements to regression analysis and the scatter plot (19), a practice that is also recommended by the Clinical and Laboratory Standards Institute (CLSI) (20).

In Figure 8 some common models, which could represent general behaviors of agreement analysis, are reported. Five cases are proposed, one for each line, each one analyzed by regression analysis and B&A plot, in unit (second column) and percentage values (third column) versus the mean of the two methods.

In the first example, case A, two highly correlated measurements are compared. Notwithstanding a determination coefficient of 0.9992, differences between the two measurements can be seen better in the B&A plot, that defines a bias of -7.1 units and an agreement range from -60.5 to 46.4 units. A difference plot allows us to evaluate a moderate
negative trend of differences, proportional to the of plus 15 units in method B, given the same pro-
magnitude of the measurement. The bias seems portional variability (CV%) of 5%, as in case C. An
to change with concentration, becoming lower example of a constant systematic error could be
when the concentration is higher. Moreover, the an error in the blank reagent, or a matrix effect in-
differences seem to be constant, with a slight en- terfering with one method but not with the other.
largement of the agreement limits, correlating This constant error is immediately returned as a
with the concentration levels (absolute values, A2). bias of -15 units in the unit difference plot. The
However, a difference of plus 46 or less 60 units percentage difference plot shows how this error
would be important for a measurement of 100 or affected more measurements of low concentra-
200 or 300 units, while they would not be signifi- tions, while the percentage bias verges to 0% for
cant for 1000 or 2000 or 3000 unit measurements. higher ones.
This information is better represented when the The last case, E, hypothesizes a proportional con-
differences are plotted as percentage of the constant error, overlapped with the same proportion-
centration (A3). The bias is -0.5% and the 2s agree- al variability (CV%) of 5%, as in case C. An example
ment range is ± 11% (from -11.5% to 10.5%), princi- could be a calibration error in one method, or a
pally caused by the lower measurements; above problem in some constants in an equation when
500 units, the 2s agreement range seems to be computing the final results. The effect is that the
less than 5%. magnitude of difference (bias) changes in a linear
A model of this behavior in the differences com- fashion. The widening trend of data with increas-
parison is case B, where a constant s = ± 50 units ing concentrations is due to the constant CV% =
was hypothesized. If the variability of the differ- 5%. If a proportional constant error was over-
ences between the two measurements proce- lapped with a constant variability, the variability of
dures is constant, the two plots will appear as they the differences will be consistent across the meas-
do in case B; the spread of the differences remain uring interval, but the bias will show a linear slope.
consistent across the range of concentration on Case E could be a model for data from Table 2,
the reported units difference plot (B2), but it in- plotted in Figures 3 and 7. Case E is the only case in
creases significantly with decreasing concentra- which the linear regression provides clear informa-
tion on the percentage difference plot (B3). tion about a problem of agreement between the
In the case of proportional difference variability two measurements, with a significant change in
between measurements, i.e. constant coefficient the slope of the regression line. On the contrary,
of variation across the range of concentration, the when the agreement analysis is conducted on a
effect on the B&A plot in reported unit difference wide range of concentrations, correlation and lin-
is a widening trend of the agreement range with ear regression are not particularly informative, and
increasing concentrations (C2). Intuitively, in the could also be misunderstanding. Cases A to D are
percentage difference plot, the trends remain par- quite similar if only correlation is taken into ac-
allel to the x axis (C3). count.
For constant differences across the intervals of
concentrations, the reporting unit difference pro- Summary and highlights
vides a better representation of the difference be-
If you want to evaluate whether the differences
tween the two measurements, while percentage
between two measurements of the same sub-
difference plot is preferable for proportional dif-
stance are significant, study the differences, not
ference variability (constant coefficient of varia-
the agreement. The correlation between methods
tion).
is always misleading and should not be used for
If other errors overlap these sources of variability, assessing the method comparability. The B&A plot
they add their effects to the previous one. For in- analysis is a simple way to evaluate a bias between
stance, in case D we hypothesized a constant error the mean differences, and to estimate an agree-

148
Case A
A2 A3
(Method A – Method B) / Average %

3000 60 15
+1.96 SD
Method A – Method B
+1.96 SD
2500 40 46.4 10 10.5
20 5
2000
Method A
0 Mean 0 Mean
1500 – 7.1 – 0.5
–20 –5
1000
–40 –10 –1.96 SD
500 –1.96 SD –11.5
Y = 4.8 (–6.1 to 15.6) + 1.0 (1.0 to 1.0) x –60 –15
–60.5
R2 = 0.999
0 –80 –20
0 500 1000 1500 2000 2500 3000 0 500 1000 1500 2000 2500 3000 3500 0 500 1000 1500 2000 2500 3000 3500
Method B Mean of Method A and Method B Mean of Method A and Method B
Case B
(Method_A – Method_B) / Average %

3000 150 B2 120 B3
Method_A – Method_B
+1.96 SD
2500 100 80
99.3 +1.96 SD
2000 50
Method_A
40 58.2
Mean
1500 0
0.0 Mean
0
1000 –50 0.0
–100 –1.96 SD –40

500 Y = 4.1 (–22.6 to 30.9) + 1.0 (1.0 to 1.0) x –1.96 SD
–99.3
R2 = 0.996 –58.2
–150 –80
0 500 1000 1500 2000 2500 3000 0 500 1000 1500 2000 2500 3000 0 500 1000 1500 2000 2500 3000
Method_B Mean of Method_A and Method_B Mean of Method_A and Method_B
Case C
B2 C3
3000 150 15
+1.96 SD
2500 100 +1.96 SD
109.0 10
9.9
2000 50 5
Method_A
Mean Mean
1500 0 0
0.0 0.0
1000 –50 –5
–100 –1.96 SD –1.96 SD
500 –10
Y = 4.2 (–20.5 to 28.9) + 1.0 (1.0 to 1.0) x –109.0 –9.9
2
R = 0.995 –150 –15
0 500 1000 1500 2000 2500 3000 0 500 1000 1500 2000 2500 3000 0 500 1000 1500 2000 2500 3000
Figure 8. Method comparisons of two measurements in five different cases presented as regression analysis (column 1), Bland and
Altman plot where differences are presented as units (column 2) and Bland and Altman plot where differences are presented as per-
centage (column 3).
Cases A, B, C, D and E represent hypothetical examples: A - random variability; B - constant variability, s = ± 50 units; C - constant coef-
ficient of variation, CV% = 5%; D - constant error of plus 15 units in method B, given the same proportional variability (CV%) of 5%, as
in case C; E - proportional constant error over CV% = 5%. Regression equation is expressed as: y= a (95% CI) + b (95% CI)x.
CI – confidence interval.

149
Case D

3000 150 D2 100 D3
+1.96 SD
2500 100 +1.96 SD 50 60.5
94.0
0
Method_A
2000 50 Mean
–18.2
1500 0 Mean –50
–15.0 –1.96 SD
1000 –50 –100
–96.9
–150
500 Y = –10.8 (–35.7 to 14.2) + 1.0 (1.0 to 1.0) –100
x –1.96 SD
R2 = 0.995 –150
–124.0 –200
0
0 500 1000 1500 2000 2500 3000 0 500 1000 1500 2000 2500 3000 0 500 1000 1500 2000 2500 3000 3500
Case E
E2 E3

3000 600 30 +1.96 SD
28.0
2500 500
20 Mean
400 +1.96 SD
16.4
369.0
Method_A
2000 10
300 –1.96 SD
1500 200 4.8
0
Mean
1000 100 119.3 –10
0
500 –1.96 SD –20
Y = 5.7 (–23.3 to 34.8) + 1.2 (1.1 to 1.2) x –100
0 R2 = 0.993 –131.3
–200 –30
0 500 1000 1500 2000 2500 3000 0 500 1000 1500 2000 2500 3000 0 500 1000 1500 2000 2500 3000
Figure 8. Method comparisons of two measurements in five different cases presented as regression analysis (column 1), Bland and
Altman plot where differences are presented as units (column 2) and Bland and Altman plot where differences are presented as per-
centage (column 3).
Cases A, B, C, D and E represent hypothetical examples: A - random variability; B - constant variability, s = ± 50 units; C - constant coef-
ficient of variation, CV% = 5%; D - constant error of plus 15 units in method B, given the same proportional variability (CV%) of 5%, as
in case C; E - proportional constant error over CV% = 5%. Regression equation is expressed as: y= a (95% CI) + b (95% CI)x.
CI – confidence interval.
ment interval, within which 95% of the differences of the second method, compared to the first one, fall. Data can be logarithmically transformed if the differences do not seem to be normally distributed. For the bias and the limits of agreement, appropriate CIs can be computed in order to account for the sampling error in relation to the size of the sample. Data can be analyzed as a unit differences plot or as a percentage differences plot. Both plots may be considered, to allow a better evaluation. The B&A plot method only defines the intervals of agreement; it does not say whether those limits are acceptable or not. Acceptable limits must be defined a priori, based on clinical necessity, biological considerations or other goals.

Potential conflict of interest

None declared.
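
As a worked illustration of the concluding summary above, the bias, the limits of agreement and their confidence intervals can be computed in a few lines. This is a minimal sketch under stated assumptions: the function name `agreement_with_ci` is hypothetical, the critical t value is passed in by the user (read from a t table for n - 1 degrees of freedom) to keep the example free of external libraries, and the standard error of each limit uses the approximation sqrt(3s²/n) described by Bland and Altman.

```python
import math
import statistics


def agreement_with_ci(a, b, t_value):
    """Bias and 95% limits of agreement, each with a confidence interval.

    t_value is the two-sided critical t for n - 1 degrees of freedom,
    supplied by the caller (for example 2.045 for n = 30 at 95%).
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)              # sample SD of the differences
    se_bias = sd / math.sqrt(n)               # standard error of the mean difference
    se_limit = math.sqrt(3.0 * sd ** 2 / n)   # approximate standard error of each limit

    result = {}
    for label, value, se in (
        ("bias", bias, se_bias),
        ("lower", bias - 1.96 * sd, se_limit),
        ("upper", bias + 1.96 * sd, se_limit),
    ):
        # Each entry: (estimate, lower CI bound, upper CI bound).
        result[label] = (value, value - t_value * se, value + t_value * se)
    return result


# Hypothetical paired results from two methods (three pairs, so df = 2).
method_a = [9.0, 10.0, 11.0]
method_b = [10.0, 10.0, 10.0]
for label, (est, ci_lo, ci_hi) in agreement_with_ci(method_a, method_b, 4.303).items():
    print(f"{label}: {est:.2f} (95% CI {ci_lo:.2f} to {ci_hi:.2f})")
```

If the differences do not look normally distributed, one option is to pass the logarithms of both series to the same function; exponentiating the results then gives limits for the ratio of the two methods rather than for their difference. Whether the resulting limits are acceptable still has to be judged against criteria fixed a priori.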



http://dx.doi.org/10.11613/BM.2015.015 Biochemia Medica 2015;25(2):141–51
