Understanding Bland-Altman Analyses
Clinical Chemistry and Hematology Laboratory, San Bortolo Hospital, Vicenza, Italy
Abstract
In a contemporary clinical laboratory it is very common to have to assess the agreement between two quantitative methods of measurement. The correct statistical approach to assess this degree of agreement is not obvious. Correlation and regression studies are frequently proposed. However, correlation studies the relationship between one variable and another, not the differences, and it is not recommended as a method for assessing the comparability between methods.
In 1983 Altman and Bland (B&A) proposed an alternative analysis, based on the quantification of the agreement between two quantitative measurements by studying the mean difference and constructing limits of agreement.
The B&A plot analysis is a simple way to evaluate the bias (the mean difference between the two methods) and to estimate an agreement interval within which 95% of the differences between the second method and the first one fall. Data can be analyzed both as a unit difference plot and as a percentage difference plot.
The B&A plot method only defines the limits of agreement; it does not say whether those limits are acceptable. Acceptable limits must be defined a priori, based on clinical necessity, biological considerations or other goals.
The aim of this article is to provide guidance on the use and interpretation of Bland-Altman analysis in method comparison studies.
Key words: Bland-Altman; agreement analysis; laboratory research; method comparison; correlation of data
Introduction
Medical laboratories often need to assess the agreement between two measurement methods. Every time we have to replace one method with another, evaluate a new or alternative method, or simply resolve an alignment problem between two instruments, we need tools to measure and appraise the differences as well as their causes.
Validation of a clinical measurement should include all of the procedures that demonstrate that a particular method used for the quantitative measurement of the variable concerned is both reliable and reproducible for the intended use.
The measurement of variables always implies some degree of error. When two methods are compared, neither provides an unequivocally correct measurement, so it is useful to assess the degree of agreement between them.
The correct statistical approach to assessing this agreement is not obvious. Many studies report the product–moment correlation coefficient (r) between the results of two measurement methods as an indicator of agreement. However, correlation studies the relationship between one variable and another, not the differences, and it is not recommended as a method for assessing the comparability between methods.
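To illustrate why a high r says nothing about agreement, here is a minimal Python sketch with hypothetical data (not taken from this article) in which method B reads 20% higher than method A plus 10 units. The helper `pearson_r` is written out by hand to mirror the definition of r as the covariance divided by the product of the standard deviations; it is an illustrative name, not a library function.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment r: covariance / (SD of x * SD of y).

    The (n - 1) factors cancel, so plain sums of squares are used.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: method B = 1.2 * method A + 10 (proportional + constant error).
method_a = [50, 100, 200, 400, 800]
method_b = [70, 130, 250, 490, 970]

r = pearson_r(method_a, method_b)
diffs = [a - b for a, b in zip(method_a, method_b)]
mean_diff = sum(diffs) / len(diffs)

print(f"r = {r:.3f}, mean difference (A - B) = {mean_diff:.1f}")
# r = 1.000, mean difference (A - B) = -72.0
```

The correlation is perfect, yet method B reads between 20 and 170 units higher than method A: r measures linear association, not agreement.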
In 1983 Altman and Bland re-proposed an alternative analysis, first presented by Eksborg in 1981 (1), based on the quantification of the agreement between two quantitative measurements by studying the mean difference and constructing limits of agreement (2).
Correlation and linear regression
Correlation is a statistical technique that can show whether, and how strongly, pairs of variables are related. There are several different correlation techniques, the Pearson or product-moment correlation probably being the most common. The main result of a correlation analysis is the correlation coefficient (r), computed as the ratio of the covariance between the variables to the product of their standard deviations. The numerical value of r ranges from -1.0 to +1.0 and gives an idea of the strength of the relationship, or rather the strength of the linear relationship, between the variables: the closer r is to +1.0 or -1.0, the stronger the linear relationship. Usually, a linear regression study is performed together with the correlation measurement. Strictly, linear regression should be calculated only if a correlation exists, and the correlation coefficient can be interpreted only if its P value is significant; in practice, however, P is significant and regression can be calculated in most method comparison studies. Linear regression finds the best line that predicts one variable from the other and quantifies the goodness of fit with r2, the coefficient of determination. Correlation describes the linear relationship between two sets of data, but not their agreement (3). Moreover, a test of the null hypothesis that the two methods are not linearly related is frequently used. With even a minimal trend, the P value is very small and it can be concluded, apparently safely but sometimes erroneously, that the two measurement methods are indeed related.
However, two methods designed to measure the same variable should show good correlation whenever the samples are chosen so that the property to be determined varies considerably. In method comparison, this means that samples should cover a wide concentration range. A high correlation between any two methods designed to measure the same property could thus, in itself, just be a sign that a widely spread sample has been chosen.
Correlation quantifies the degree to which two variables are related, but a high correlation does not automatically imply good agreement between the two methods. The correlation coefficient and regression techniques are sometimes inadequate, and can be misleading, when assessing agreement, because they evaluate only the linear association of two sets of observations. r measures the strength of a relation between two variables, not the agreement between them. Similarly, r2, the coefficient of determination, only tells us the proportion of variance that the two variables have in common. Finally, a test of significance may show that the two methods are related, but it is obvious that two methods designed to measure the same variable are related. Moreover, the test of significance can itself be misleading, because the significance of the correlation depends on both the value of the correlation coefficient and the sample size. The value of r should be interpreted only if it is statistically significant at the chosen level (P < 0.05), and even then a significant P value does not mean a strong relationship: if, for example, r = 0.22 and P = 0.027, we should not conclude that there is an important relationship between the variables. The significant P value only shows that the correlation coefficient differs from zero, while its value indicates that the relationship is weak.
The Passing and Bablok regression analysis, proposed to overcome some limits of correlation analysis, partially solves problems related to the data distribution and to the detection of a constant or proportional difference between two methods. Compared with the other frequently proposed method, Deming regression (4), Passing and Bablok regression may be preferred for comparing clinical methods because it does not assume that measurement error is normally distributed and is robust against outliers. However, a complete interpretation of its results requires an analysis of the residuals, i.e. the distribution of the differences around the fitted regression line (5). This is quite similar to, but more complicated than, the analysis of differences described below.
The analysis of differences: the Bland and Altman method
Bland and Altman introduced the Bland-Altman (B&A) plot to describe agreement between two quantitative measurements (6). They established a method to quantify agreement between two quantitative measurements by constructing limits of agreement. These statistical limits are calculated using the mean and the standard deviation (s) of the differences between two measurements. To check the assumption of normality of the differences and other characteristics, they used a graphical approach.
The resulting graph is an XY scatter plot, in which the Y axis shows the difference between the two paired measurements (A-B) and the X axis represents the average of these measures ((A+B)/2). In other words, the difference between the two paired measurements is plotted against their mean. B&A recommended that 95% of the data points should lie within ± 2s (more precisely, ± 1.96 s) of the mean difference. This is the most common way to draw the B&A plot, but it is also possible to plot the differences as percentages or ratios, and to plot them against the values of the first or the second method instead of the mean of both methods.
The following example may help readers become familiar with the B&A plot. Table 1 shows a hypothetical series of paired data, from which it is possible to construct the B&A plot and to evaluate the agreement. In the first column a series of [...] differences between the paired data.
Table 1. Hypothetical data of an agreement between two methods (Method A and B).
Method A   Method B   Mean (A+B)/2   (A – B)    (A – B)/Mean
(units)    (units)    (units)        (units)    (%)
1.0        8.0        4.5            -7.0       -155.6%
5.0        16.0       10.5           -11.0      -104.8%
10.0       30.0       20.0           -20.0      -100.0%
20.0       24.0       22.0           -4.0       -18.2%
50.0       39.0       44.5           11.0       24.7%
40.0       54.0       47.0           -14.0      -29.8%
50.0       40.0       45.0           10.0       22.2%
60.0       68.0       64.0           -8.0       -12.5%
70.0       72.0       71.0           -2.0       -2.8%
80.0       62.0       71.0           18.0       25.4%
90.0       122.0      106.0          -32.0      -30.2%
100.0      80.0       90.0           20.0       22.2%
150.0      181.0      165.5          -31.0      -18.7%
200.0      259.0      229.5          -59.0      -25.7%
250.0      275.0      262.5          -25.0      -9.5%
300.0      380.0      340.0          -80.0      -23.5%
350.0      320.0      335.0          30.0       9.0%
400.0      434.0      417.0          -34.0      -8.2%
450.0      479.0      464.5          -29.0      -6.2%
500.0      587.0      543.5          -87.0      -16.0%
550.0      626.0      588.0          -76.0      -12.9%
600.0      648.0      624.0          -48.0      -7.7%
650.0      738.0      694.0          -88.0      -12.7%
700.0      766.0      733.0          -66.0      -9.0%
750.0      793.0      771.5          -43.0      -5.6%
800.0      851.0      825.5          -51.0      -6.2%
850.0      871.0      860.5          -21.0      -2.4%
900.0      957.0      928.5          -57.0      -6.1%
950.0      1001.0     975.5          -51.0      -5.2%
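The calculations described above reduce to a few lines. Below is a minimal Python sketch, assuming the differences are approximately normally distributed; the function names are illustrative, not taken from any library. To keep the arithmetic easy to follow, it uses only the first five rows of Table 1, so the bias and limits it prints are not the full-table statistics. The percentage differences, however, can be checked directly against the last column of the table.

```python
from statistics import mean, stdev


def bland_altman(a, b, z=1.96):
    """Bias, SD of the differences and limits of agreement for A - B."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    s = stdev(diffs)  # sample SD, n - 1 denominator
    return bias, s, bias - z * s, bias + z * s


def plot_points(a, b):
    """B&A plot coordinates: X = (A + B) / 2, Y = A - B."""
    return [((x + y) / 2, x - y) for x, y in zip(a, b)]


def percent_differences(a, b):
    """(A - B) / mean of A and B * 100, as in the last column of Table 1."""
    return [(x - y) / ((x + y) / 2) * 100 for x, y in zip(a, b)]


# First five rows of Table 1 only (an illustration, not the full data set).
method_a = [1.0, 5.0, 10.0, 20.0, 50.0]
method_b = [8.0, 16.0, 30.0, 24.0, 39.0]

bias, s, lower, upper = bland_altman(method_a, method_b)
print(f"bias = {bias:.2f}, s = {s:.2f}, limits = {lower:.2f} to {upper:.2f}")
print([round(p, 1) for p in percent_differences(method_a, method_b)])
# [-155.6, -104.8, -100.0, -18.2, 24.7]  (matches Table 1)
```

Passing the output of `plot_points` to any plotting library, together with horizontal lines at `bias`, `lower` and `upper`, gives the unit difference plot; using `percent_differences` as the Y values gives the percentage difference plot.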
From our example, the average of the differences is -27.17 units (bottom line of Table 1). This mean difference (d) is not zero, which means that, on average, the second method (B) measures 27.17 units more than the first one. This [...]
Figure 2. Plot of differences between method A and method B vs. the mean of the two measurements (data from Table 1). The bias of -27.2 units is represented by the gap between the X axis, corresponding to a zero difference, and the line parallel to the X axis at -27.2 units.
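Whether a non-zero mean difference reflects a real bias or only sampling error can be judged from its 95% confidence interval, d ± t × s/√n. Below is a minimal Python sketch, assuming normally distributed differences. `bias_with_ci` is a hypothetical helper, and the two-sided 95% Student t value for n - 1 degrees of freedom must be supplied by the caller, either from a t table or, if SciPy is installed, from scipy.stats.t.ppf(0.975, n - 1).

```python
from math import sqrt
from statistics import mean, stdev


def bias_with_ci(diffs, t_crit):
    """Mean difference and its 95% CI: d_mean +/- t_crit * s / sqrt(n)."""
    d_mean = mean(diffs)
    se = stdev(diffs) / sqrt(len(diffs))  # standard error of the mean difference
    return d_mean, d_mean - t_crit * se, d_mean + t_crit * se


# Differences (A - B) from the first five rows of Table 1; t(0.975, 4) = 2.776.
diffs = [-7.0, -11.0, -20.0, -4.0, 11.0]
d_mean, ci_low, ci_high = bias_with_ci(diffs, t_crit=2.776)

# A bias is statistically significant when its CI excludes zero.
bias_is_significant = not (ci_low <= 0.0 <= ci_high)
print(f"bias = {d_mean:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f}), "
      f"significant: {bias_is_significant}")
```

For these five differences the interval includes zero, so this small subset on its own would not demonstrate a significant bias. A larger sample narrows the interval in proportion to 1/√n.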
Figure 3. The same plot as Figure 1, including the regression line and confidence interval limits. The dotted line represents the regression line, y = -0.05 (-0.08 to -0.01)x – 10.15 (-28.07 to 7.77); the confidence interval limits are presented as a continuous line.
Bias and agreement limits
The B&A plot does not say whether the agreement is sufficient or suitable to use a method or [...]
Precision of estimated limits of agreement
Table 2. Bland and Altman plot statistics from data of Table 1, including the elements to calculate confidence intervals.
[Figure 6: Y axis, method A – method B (units); X axis, mean of method A and method B.]
Figure 6. Same plot as Figure 2, with the representation of confidence interval limits for the mean and the agreement limits (shaded areas, data from Table 2).
[Figure 7: Y axis, (method A – method B)/mean (%); X axis, mean of method A and method B.]
Figure 7. Plot of differences between method A and method B, expressed as percentages of the values on the X axis [(method A – method B)/mean %], vs. the mean of the two measurements (data from Table 1). Shaded areas represent the confidence interval limits for the mean and the agreement limits.
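Figures 6 and 7 shade confidence intervals around the mean difference and the agreement limits. One common way to obtain the interval around each limit uses the approximate standard error given by Bland and Altman (16), √(3s²/n), combined with a Student t value for n - 1 degrees of freedom. The sketch below works under that assumption; `limits_with_ci` is a hypothetical name, and this is not necessarily the exact procedure used to build Table 2.

```python
from math import sqrt
from statistics import mean, stdev


def limits_with_ci(diffs, t_crit, z=1.96):
    """Limits of agreement with approximate 95% CIs.

    SE(limit) is approximated as sqrt(3 * s**2 / n) (Bland and Altman);
    t_crit is the two-sided 95% t value with n - 1 degrees of freedom.
    Each entry is (estimate, CI lower bound, CI upper bound).
    """
    n = len(diffs)
    d_mean, s = mean(diffs), stdev(diffs)
    se_limit = sqrt(3 * s ** 2 / n)
    limits = {"lower": d_mean - z * s, "upper": d_mean + z * s}
    return {name: (value, value - t_crit * se_limit, value + t_crit * se_limit)
            for name, value in limits.items()}


diffs = [-7.0, -11.0, -20.0, -4.0, 11.0]  # first five rows of Table 1
for name, (value, lo, hi) in limits_with_ci(diffs, t_crit=2.776).items():
    print(f"{name} limit = {value:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Because the SE of each limit is about 1.7 times the SE of the mean difference, the limits of agreement are estimated much less precisely than the bias, which is why their shaded bands in Figures 6 and 7 are wider.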
[...] negative trend of differences, proportional to the magnitude of the measurement. The bias seems to change with concentration, becoming lower when the concentration is higher. Moreover, the differences seem to be constant, with a slight widening of the agreement limits as concentration increases (absolute values, A2). However, a difference of +46 or -60 units would be important for a measurement of 100, 200 or 300 units, while it would not be significant for measurements of 1000, 2000 or 3000 units. This information is better represented when the differences are plotted as a percentage of the concentration (A3). The bias is -0.5% and the 2s agreement range is about ± 11% (from -11.5% to 10.5%), mainly because of the lower measurements; above 500 units, the 2s agreement range seems to be less than 5%.
A model of this behavior of the differences is case B, where a constant s = ± 50 units was hypothesized. If the variability of the differences between the two measurement procedures is constant, the two plots will appear as they do in case B: the spread of the differences remains consistent across the concentration range in the unit difference plot (B2), but it increases markedly with decreasing concentration in the percentage difference plot (B3).
In the case of proportional variability of the differences between measurements, i.e. a constant coefficient of variation across the concentration range, the effect on the unit difference B&A plot is a widening of the agreement range with increasing concentrations (C2). Intuitively, in the percentage difference plot the agreement limits remain parallel to the X axis (C3).
For constant differences across the concentration range, the unit difference plot provides a better representation of the difference between the two measurements, while the percentage difference plot is preferable for proportional variability of the differences (constant coefficient of variation).
If other errors overlap these sources of variability, they add their effects to the previous ones. For instance, in case D we hypothesized a constant error of +15 units in method B, with the same proportional variability (CV% = 5%) as in case C. An example of a constant systematic error could be an error in the reagent blank, or a matrix effect interfering with one method but not with the other. This constant error appears directly as a bias of -15 units in the unit difference plot. The percentage difference plot shows that this error affects low-concentration measurements more, while the percentage bias tends to 0% at higher concentrations.
The last case, E, hypothesizes a proportional constant error superimposed on the same proportional variability (CV% = 5%) as in case C. An example could be a calibration error in one method, or an error in some constants of an equation used to compute the final results. The effect is that the magnitude of the difference (bias) changes linearly with concentration. The widening spread of the data with increasing concentrations is due to the constant CV% = 5%. If a proportional constant error were superimposed on a constant variability, the spread of the differences would be consistent across the measuring interval, but the bias would show a linear slope. Case E could be a model for the data of Table 1, plotted in Figures 3 and 7. Case E is the only case in which linear regression provides clear information about a problem of agreement between the two measurements, with a significant change in the slope of the regression line. By contrast, when the agreement analysis is conducted on a wide range of concentrations, correlation and linear regression are not particularly informative and can even be misleading: if only correlation is taken into account, cases A to D look quite similar.
[Figure 8: cases A to E in rows, three panels each. Column 1: Method_A vs. Method_B with regression line; column 2: Method_A – Method_B vs. mean of Method_A and Method_B, with mean and ± 1.96 SD lines; column 3: (Method_A – Method_B)/Average % vs. mean of Method_A and Method_B. Regression equations: case A, y = 4.8 (–6.1 to 15.6) + 1.0 (1.0 to 1.0)x, R2 = 0.999; case C, y = 4.2 (–20.5 to 28.9) + 1.0 (1.0 to 1.0)x, R2 = 0.995; case D, y = –10.8 (–35.7 to 14.2) + 1.0 (1.0 to 1.0)x, R2 = 0.995; case E, y = 5.7 (–23.3 to 34.8) + 1.2 (1.1 to 1.2)x, R2 = 0.993.]
Figure 8. Method comparisons of two measurements in five different cases presented as regression analysis (column 1), Bland and Altman plot where differences are presented as units (column 2) and Bland and Altman plot where differences are presented as percentage (column 3). Cases A, B, C, D and E represent hypothetical examples: A - random variability; B - constant variability, s = ± 50 units; C - constant coefficient of variation, CV% = 5%; D - constant error of plus 15 units in method B, given the same proportional variability (CV%) of 5%, as in case C; E - proportional constant error over CV% = 5%. Regression equation is expressed as: y = a (95% CI) + b (95% CI)x. CI – confidence interval.
Summary and highlights
If you want to evaluate whether the differences between two measurements of the same substance are significant, study the differences (the agreement), not the correlation. Correlation between methods is misleading as a measure of agreement and should not be used for assessing method comparability. The B&A plot analysis is a simple way to evaluate the bias (the mean difference between the two methods) and to estimate an agreement interval within which 95% of the differences between the second method and the first one fall. Data can be logarithmically transformed if the differences do not seem to be normally distributed. For the bias and the agreement limits, appropriate CIs can be computed to take into account the sampling error related to the sample size. Data can be analyzed as a unit difference plot or as a percentage difference plot; both plots may be considered to allow a better evaluation. The B&A plot method only defines the limits of agreement; it does not say whether those limits are acceptable. Acceptable limits must be defined a priori, based on clinical necessity, biological considerations or other goals.
Potential conflict of interest
None declared.
References
1. Eksborg S. Evaluation of method-comparison data. Clin Chem 1981;27:1311-2.
2. Altman DG, Bland JM. Measurement in medicine: the analysis of method comparison studies. Statistician 1983;32:307-17. http://dx.doi.org/10.2307/2987937.
3. Udovičić M, Baždarić K, Bilić-Zulle L, Petrovečki M. What we need to know when calculating the coefficient of correlation? Biochem Med (Zagreb) 2007;17:10-5. http://dx.doi.org/10.11613/BM.2007.002.
4. Martin RF. General Deming regression for estimating systematic bias and its confidence interval in method-comparison studies. Clin Chem 2000;46:100-4.
5. Bilić-Zulle L. Comparison of methods: Passing and Bablok regression. Biochem Med (Zagreb) 2011;21:49-52. http://dx.doi.org/10.11613/BM.2011.010.
6. Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res 1999;8:135-60. http://dx.doi.org/10.1191/096228099673819272.
7. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Int J Nurs Stud 2010;47:931-6. http://dx.doi.org/10.1016/j.ijnurstu.2009.10.001.
8. Krouwer JS. Why Bland-Altman plots should use X, not (Y+X)/2 when X is a reference method. Stat Med 2008;27:778-80. http://dx.doi.org/10.1002/sim.3086.
9. Bland JM, Altman DG. Comparing methods of measurement: why plotting difference against standard method is misleading. Lancet 1995;346:1085-7. http://dx.doi.org/10.1016/S0140-6736(95)91748-9.
10. Armitage P, Berry G, Matthews JNS, eds. Statistical methods in medical research. 4th ed. Malden, MA: Blackwell Science, 2002. http://dx.doi.org/10.1002/9780470773666.
11. Bland M. An introduction to medical statistics. 3rd ed. Oxford University Press, Oxford, 2000.
12. Eisenhauer JG. Regression through the origin. Teach Stat 2003;25:76-80. http://dx.doi.org/10.1111/1467-9639.00136.
13. Shapiro SS, Wilk MB. An analysis of variance test for normality (complete samples). Biometrika 1965;52:591-611. http://dx.doi.org/10.1093/biomet/52.3-4.591.
14. Sheskin DJ. Handbook of parametric and nonparametric statistical procedures. 5th ed. Chapman & Hall/CRC Press, Boca Raton, FL, 2011.
15. Neter J, Wasserman W, Whitmore GA, eds. Applied statistics. 3rd ed. Allyn and Bacon, Boston, MA, 1998.
16. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;327:307-10. http://dx.doi.org/10.1016/S0140-6736(86)90837-8.
17. MedCalc manual. Available at: http://www.medcalc.org/manual/t-distibution.php. Accessed January 23rd, 2015.
18. Google Scholar search engine. Available at: http://scholar.google.it/scholar?cites=7296362022254018043&as_sdt=2005&sciodt=0,5&hl=it. Accessed February 2nd, 2015.
19. Dewitte K, Fierens C, Stöckl D, Thienpont LM. Application of the Bland-Altman plot for interpretation of method-comparison studies: a critical investigation of its practice. Clin Chem 2002;48:799-801.
20. Clinical and Laboratory Standards Institute (CLSI). Measurement procedure comparison and bias estimation using patient samples. Approved guideline - Third Edition. CLSI document EP09-A3. Wayne, PA, USA, 2013.
21. Passing H, Bablok W. A new biometrical procedure for testing the equality of measurements from two different analytical methods. Application of linear regression procedures for method comparison studies in Clinical Chemistry, Part I. J Clin Chem Clin Biochem 1983;21:709-20. http://dx.doi.org/10.1515/cclm.1983.21.11.709.