Comparison of Reliability, Validity, and Responsiveness of the Mini-BESTest and Berg
Balance Scale in Patients With Balance Disorders
Marco Godi, Franco Franchignoni, Marco Caligari,
Andrea Giordano, Anna Maria Turcato and Antonio
Nardone
PHYS THER. 2013; 93:158-167.
Originally published online September 27, 2012
doi: 10.2522/ptj.20120171
The online version of this article, along with updated information and services, can be
found online at: http://ptjournal.apta.org/content/93/2/158
Collections This article, along with others on similar topics, appears

in the following collection(s):
Balance
Tests and Measurements
Research Report

Background. Recently, a new tool for assessing dynamic balance impairments has been presented: the 14-item Mini-BESTest.

Objective. The aim of this study was to compare the psychometric performance of the Mini-BESTest and the Berg Balance Scale (BBS).

Design. A prospective, single-group, observational design was used in the study.

Methods. Ninety-three participants (mean age=66.2 years, SD=13.2; 53 women, 40 men) with balance deficits were recruited. Interrater (3 raters) and test-retest (1–3 days) reliability were calculated using intraclass correlation coefficients (ICCs). Responsiveness and minimal important change were assessed (after 10 sessions of physical therapy) using both distribution-based and anchor-based methods (external criterion: the 15-point Global Rating of Change [GRC] scale).

Results. At baseline, neither floor effects nor ceiling effects were found in either the Mini-BESTest or the BBS. After treatment, the maximum score was found in 12 participants (12.9%) with BBS and in 2 participants (2.1%) with Mini-BESTest. Test-retest reliability for total scores was significantly higher for the Mini-BESTest (ICC=.96) than for the BBS (ICC=.92), whereas interrater reliability was similar (ICC=.98 versus .97, respectively). The standard error of measurement (SEM) was 1.26 and the minimum detectable change at the 95% confidence level (MDC95) was 3.5 points for Mini-BESTest, whereas the SEM was 2.18 and the MDC95 was 6.2 points for the BBS. In receiver operating characteristic curves, the area under the curve was 0.92 for the Mini-BESTest and 0.91 for the BBS. The best minimal important change (MIC) was 4 points for the Mini-BESTest and 7 points for the BBS. After treatment, 38 participants evaluated with the Mini-BESTest and only 23 participants evaluated with the BBS (out of the 40 participants who had a GRC score of ≥3.5) showed a score change equal to or greater than the MIC values.

Limitations. The consecutive sampling method drawn from a single rehabilitation facility and the intrinsic weakness of the GRC for calculating MIC values were limitations of the study.

Conclusions. The 2 scales behave similarly, but the Mini-BESTest appears to have a lower ceiling effect, slightly higher reliability levels, and greater accuracy in classifying individual patients who show significant improvement in balance function.

M. Godi, PT, MS, Posture and Movement Laboratory, Division of Physical Medicine and Rehabilitation, Salvatore Maugeri Foundation–IRCCS, Veruno, Italy.

F. Franchignoni, MD, Unit of Occupational Rehabilitation and Ergonomics, Salvatore Maugeri Foundation–IRCCS. Mailing address: Fondazione Salvatore Maugeri, Clinica del Lavoro e della Riabilitazione–IRCCS, Via Revislate 13, I-28010, Veruno, Italy. Address all correspondence to Dr Franchignoni at: franco.franchignoni@fsm.it.

M. Caligari, PT, Posture and Movement Laboratory, Division of Physical Medicine and Rehabilitation, Salvatore Maugeri Foundation–IRCCS.

A. Giordano, PhD, Unit of Bioengineering, Salvatore Maugeri Foundation–IRCCS.

A.M. Turcato, PT, Posture and Movement Laboratory, Division of Physical Medicine and Rehabilitation, Salvatore Maugeri Foundation–IRCCS.

A. Nardone, MD, PhD, Division of Physical Medicine and Rehabilitation, Salvatore Maugeri Foundation–IRCCS, and Department of Translational Medicine, University of Eastern Piedmont, Novara, Italy.

[Godi M, Franchignoni F, Caligari M, et al. Comparison of reliability, validity, and responsiveness of the Mini-BESTest and Berg Balance Scale in patients with balance disorders. Phys Ther. 2013;93:158–167.]

© 2013 American Physical Therapy Association
Published Ahead of Print: September 27, 2012
Accepted: September 17, 2012
Submitted: April 14, 2012
158 f Physical Therapy Volume 93 Number 2 February 2013

Psychometric Properties of the Mini-BESTest and BBS in Patients With Balance Disorders
Body balance relies on feedback circuits fed by the input from different receptors, including somatosensory, labyrinthine, and visual.1 These inputs have to be adequately integrated in the central nervous system in order to produce appropriate changes in motor output to correct internal and external balance perturbations.2 If one or more of these inputs, their integration, or the motor output are impaired, balance disorders occur.3

Because balance control is a complex task, simple tests of postural stability, such as one-leg stance, are not appropriate for a comprehensive assessment of patients with balance impairment.4 People with balance disorders may be unstable in many different daily life situations (eg, when walking, when turning, when reaching for a far object, after an external perturbation).5–7 Clinical scales have been developed to provide a comprehensive view of balance performances, as close as possible to real-life situations.8 To evaluate postural stability in a more functional context, these clinical scales would appear to be more appropriate than simple tests of postural stability.

The Berg Balance Scale (BBS)9 is one of the most widely used tools for balance assessment.10 Its psychometric properties have been well assessed, and the scale has shown to be a valid and reliable measure of balance.11 However, some important limitations of the BBS have been described, such as the need for some rescoring of the rating scale,12 a ceiling effect,11 and relatively low responsiveness.13 Moreover, dynamic balance (eg, reacting to a perturbation, gait) is unexplored by the BBS.

Recently, a new clinical tool for assessing balance impairments has been presented: the Balance Evaluation Systems Test (BESTest).14 This 36-item test, at variance with the BBS, also scores dynamic balance and gait performance, and it has shown good reliability and validity for assessing balance in individuals with Parkinson disease (PD).15 However, the drawbacks of the BESTest are that it takes about 45 minutes to administer and it comprises multiple dimensions.16 Thus, with the aid of factor analysis and Rasch analysis, a short form of the BESTest with 14 items only, named the Mini-BESTest, was produced, with improved rating category, high reliability, and structural validity.16 The Mini-BESTest includes important aspects of dynamic balance control, such as the capability to react to postural perturbations, to stand on a compliant or inclined surface, and to walk while performing a cognitive task. All of these features of balance control are known to be important in assessing balance disorders in different types of patients and reflect balance challenges during activities of daily living.14,17 Recent articles have been published18–20 in which some important psychometric characteristics of the Mini-BESTest (eg, responsiveness) compared favorably with those of the BBS in patients with PD.

The aim of this study was to perform a head-to-head comparison of the psychometric performance of the Mini-BESTest and the BBS in a convenience sample of patients with balance disorders of different origins. For this purpose, we estimated interrater and test-retest reliability, concurrent validity, sensitivity to change, and responsiveness of both scales.

Method
Participants
Ninety-nine patients (mean age=66.1 years, SD=13.1; 56 women, 43 men) consecutively admitted to our free-standing rehabilitation center (320 beds) for assessment and rehabilitation treatment were recruited, representing a convenience sample of inpatients with balance disorders. Patients were referred from surrounding acute care hospitals and general practitioners and were screened for rehabilitation potential. The inclusion criterion was the ability to fully participate in the study procedures (eg, absence of severe cognitive impairments, tolerance of balance and gait tasks without fatigue). Of the 99 patients recruited, 2 were unable to perform the assessment due to the severity of their illness, and 4 declined to participate. Thus, 93 patients (mean age=66.2 years, SD=13.2; 53 women, 40 men) took part in the study. The participants' diagnoses were as follows: 25 had PD, 25 had hemiparesis (9 right, 12 left), 6 had multiple sclerosis, 5 had vestibular disorders, 6 had neuromuscular diseases, 8 had hereditary ataxia, 8 had sensorimotor polyneuropathy, 4 had central nervous system neoplasm, and 6 had unspecific age-related balance disorders. Prior to taking part in the study, all participants signed an informed consent statement that had been approved by the Central Ethics Committee of the Salvatore Maugeri Foundation.

Assessment
Mini-BESTest. The Mini-BESTest is a 14-item balance scale that takes about 15 minutes to administer, is unidimensional, and is highly reliable.16 It contains items covering a broad spectrum of performance tasks, including transitions and anticipatory postural adjustments, postural responses to perturbation, sensory orientation while standing on a compliant or inclined base of support, and dynamic stability in gait. Items are scored from 0 (unable to perform or requiring help) to 2 (normal performance). The maximum total score is 28.
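The scoring rule just described (14 items, each scored 0 to 2, summed to a total of at most 28) can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function and constant names are hypothetical, and the study itself does not describe any software for scoring.

```python
from typing import Sequence

MINI_BESTEST_ITEMS = 14     # number of items, as stated in the text
MINI_BESTEST_ITEM_MAX = 2   # each item is scored 0, 1, or 2


def mini_bestest_total(item_scores: Sequence[int]) -> int:
    """Sum the 14 item scores into a total between 0 and 28."""
    if len(item_scores) != MINI_BESTEST_ITEMS:
        raise ValueError(f"expected {MINI_BESTEST_ITEMS} item scores")
    for score in item_scores:
        # Reject anything outside the 0-2 rating range.
        if score not in range(MINI_BESTEST_ITEM_MAX + 1):
            raise ValueError(f"item score out of range: {score}")
    return sum(item_scores)


# A participant rated "normal performance" on every item reaches the ceiling.
ceiling_total = mini_bestest_total([2] * MINI_BESTEST_ITEMS)
```

A total equal to 28 is what the ceiling-effect analysis counts as a maximum score.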

BBS. The BBS is the most widely used and validated instrument for assessing balance performance in neurological conditions.9 It is composed of 14 items that require subjects to maintain positions of varying difficulty and perform specific tasks such as standing and sitting unsupported, transfers (sit to stand and stand to sit), turn to look over shoulders, pick up an object from the floor, turn 360° and place alternate feet on a stool. Scoring is based on the subject's ability to perform the 14 tasks independently and/or meet certain time or distance requirements. Each item is scored on a 5-point ordinal scale ranging from 0 (unable to perform) to 4 (normal performance) so that the aggregate score ranges from 0 to 56.

Global Rating of Change (GRC). The GRC is a rating scale designed to quantify patients' improvement or deterioration over time. It is used to determine the effect of an intervention or chart the clinical course of a condition. The GRC was completed at the time of the final assessment (after the rehabilitation treatment) by each participant and the treating physical therapist. Participants were asked to independently rate the overall change in their balance from when they began treatment using a 15-point scale ranging from −7 ("a very great deal worse") to +7 ("a very great deal better"), with 0 indicating "unchanged."21,22 We decided to use 2 external indicators (clinician and patient rating, respectively) because the use of independent anchors is recommended23 and may reduce problems reported when using only the patient GRC.21 Therefore, the mean value of the 2 GRC scores (physical therapist and patient) was used as a reference standard: participants with a rating from 0 to +3 ("a little bit better") were considered to have minimally changed or not changed, and those with a rating greater than 3 were considered moderately to largely improved.24

Procedure
All participants were evaluated with the Mini-BESTest and the BBS by the same rater before and after a physical therapy program for balance disorders. The raters for all procedures were 3 licensed physical therapists (M.G., M.C., and A.M.T.) who were specifically trained in administering the 2 balance scales. The raters were always blinded to their previous ratings.

For both the Mini-BESTest and the BBS, test-retest reliability and interrater reliability were analyzed in a subset of 32 consecutive participants (mean age=67.3 years, SD=13.5; 19 women, 13 men; 8 with PD, 7 with hemiparesis, 10 with other neurological disorders, 3 with vestibular disorders, and 4 with age-related balance disorders). For interrater reliability, each of the 3 physical therapists performed a simultaneous independent balance assessment at baseline; for test-retest reliability, participants were reassessed (by 1 of the 3 therapists) after 1 to 3 days. This sample size was determined on the basis of a pilot study, expecting to obtain intraclass correlation coefficient (ICC) values of about .90, with a 95% confidence interval (CI) of .20.25

The physical therapy program consisted of ten 1-hour sessions for 2 weeks of the following exercises: (1) static and dynamic functional balance activities (eg, reaching while standing, standing on one leg, sit-to-stand maneuver, turning, walking training); (2) exercises for training specific balance skills (eg, "push and release" techniques, stance on a foam surface, dual-task training); (3) flexibility and strength training; and (4) perturbation-based training on a platform continuously moving on the horizontal plane.14,26,27 Each treatment session was individually tailored according to the participant's functional status and clinical indications.

At the end of the treatment, the GRC was completed by each participant and by the treating physical therapist (4 different physical therapists who were not involved in the study procedures). The participants and therapists were unaware of each other's responses.

Data Analysis
Descriptive statistics, including central tendency (median) and spread (25th–75th percentiles), were calculated for both balance scales and the GRC. Floor and ceiling effects were analyzed, calculating the percentages of individuals obtaining the lowest and the highest scores for the 2 scales. The Stata/IC version 10.1 software package (StataCorp LP, College Station, Texas) was used for the statistical analyses.

Reliability. The internal consistency of the Mini-BESTest and the BBS was assessed by means of the Cronbach alpha coefficient at both baseline and follow-up. Alpha values ≥.70 are recommended for group-level comparison, whereas a minimum of .85 to .90 is desirable for individual judgments.28

For both scales, test-retest and interrater reliability of global scores was calculated, using the ICC (2,1) and corresponding CI. For clinical measurements, ICC values should exceed .90 to ensure reasonable reliability.29 Z-transformed ICCs obtained with 1,000 bootstrap samples were used to test ICC difference between measures.30

Validity. Convergent validity was assessed by calculating the Pearson correlation coefficient (r) of the total scores of the Mini-BESTest and the BBS (at both the first evaluation and

follow-up) and their changes (after versus before rehabilitation). Confidence intervals and comparisons of the correlation coefficients between the measures were calculated.31

In addition, because the GRC was considered the anchor (ie, the reference standard against which we judged whether a real improvement in the participants had occurred), it was used to provide a valid assessment of the same construct measured by the tools under longitudinal investigation.24 Thus, a Pearson correlation between the GRC (mean value of the participant's and therapist's scores) and the change (after versus before rehabilitation) in the 2 balance scales was calculated and tested for differences between measures. Moreover, the correlation between the GRC rated by the participant and that rated by the physical therapist was used to investigate their relationship. For all of these correlations, we expected a "nontrivial" association between measures (ie, r>.30).23

Responsiveness. There are 2 types of approach for evaluating responsiveness and clinical significance23: distribution-based methods and anchor-based methods. The distribution-based methods are based on the statistical characteristics of the obtained sample and analyze the ability to detect change in general. The anchor-based methods require an external criterion to determine whether changes in outcome scores are clinically meaningful. We used both approaches in order to have a wide range of results on which to draw inferences about the minimal important change (MIC) for both scales, aware of the large variation and lack of convergence that these different methods could show.32

For the distribution-based methods, we calculated the standard error of measurement (SEM), which links the reliability of the measurement instrument to the standard deviation of the population.33 The SEM and its CI were calculated on the basis of the analysis of variance used to produce the ICC.34 Starting from the SEM, we calculated the minimum detectable change (MDC). The MDC represents the smallest change in score that likely reflects true change rather than measurement error alone. The calculation is the result of the multiplication of the SEM × z value × √2. The 95% confidence level (MDC95) was established, corresponding to a z value of 1.96. As an example, if a participant has a change score equal to or above the MDC95 threshold, it is possible to state with 95% confidence that this change is reliable and not due to an error.

The second approach for evaluating responsiveness is the use of anchor-based methods. These methods were based on GRC assessment as an external criterion. The following 2 parameters were analyzed: (1) for the mean change approach, we calculated the mean change of participants graded on the GRC as not improved (GRC ≤3), moderately improved (3<GRC<5), or largely improved (GRC ≥5); and (2) for the receiver operating characteristic (ROC) curve approach,29 we determined the optimal cutoff score and the area under the curve (AUC) after having split the participants based on a GRC ≤3 or higher, and thus having considered a GRC >3 as an index of meaningful change.

A ROC curve plots sensitivity (y-axis) against 1 − specificity (x-axis). In this context, sensitivity was calculated as the number of participants correctly identified as improved based on the cutoff value divided by all participants identified as having undergone a meaningful change (GRC >3), whereas specificity refers to the number of participants who were correctly identified as not improved based on the cutoff value divided by all participants who truly did not undergo a meaningful change (GRC ≤3). The optimal cutoff score was chosen as the point that jointly maximized sensitivity and specificity (being associated with the least amount of misclassification).

The AUC can be interpreted as the probability of correctly identifying a patient who has improved in randomly selected pairs of patients who have and have not shown an improvement. The greater the AUC, the greater a measure's ability to distinguish patients who improved from those who do not improve; as a general rule, an AUC >0.8 is considered to have excellent discrimination.29 Based on the study by Turner et al,24 our ROC analysis used the entire cohort in order to increase precision and obtain more logical estimates of the MIC values.

Formal testing for a difference in the AUCs between scales was performed according to the procedure of DeLong et al.35 To obtain CIs for the ROC analysis results, we drew 500 bootstrap samples and calculated the AUC, as well as the sensitivity and specificity values associated with the best cutoff scores in each bootstrap replication. The mean of the bootstrap values was taken as the best estimate, with the CI calculated as 1.96 × SD (as an estimate of the standard error) of the 500 bootstrap values.32

Role of the Funding Source
This study was supported, in part, by "Giovani Ricercatori 2009" grant (GR-2009-1471033) to Mr Godi and "Progetto Strategico 2007" grant (RFPS-2007-1-641398) to Dr Nardone from the Italian Ministry of Health. The study sponsor was not involved in: study design; collection, analysis,

Table 1.
Descriptive Statistics Related to Values of the Mini-BESTest, the Berg Balance Scale (BBS), and the Global Rating of Change (GRC) in the Whole Group (n=93) and to Values of the Mini-BESTest and the BBS in the Test-Retest and Interrater Reliability Subgroup (n=32)

Measure                                             Minimum  Maximum(a)  Mean  SD    1st Quartile  Median  3rd Quartile
Mini-BESTest
  Baseline                                          1        27          12.8  6.9   8             12      19
  After treatment                                   1        28 (2)      15.8  6.9   11            15      22
  Change                                            −1       10          3.1   2.4   1             4       5
  Test-retest and interrater reliability subgroup   1        25          11.1  7.6   5             11      15
BBS
  Baseline                                          4        55          42    11.2  38            45      50
  After treatment                                   4        56 (12)     46.3  10.3  42            49      54
  Change                                            −2       17          4.2   3.9   1             4       6
  Test-retest and interrater reliability subgroup   4        55          38.4  14.2  30            42      48
GRC                                                 0        6           2.9   1.2   2             3       3.5

(a) Number of participants recording a ceiling effect shown in parentheses.
or interpretation of data; writing of the report; or the decision to submit the manuscript for publication.

Results
Descriptive Statistics
Table 1 provides the descriptive statistics for 3 measures (both at baseline and after treatment for the Mini-BESTest and the BBS and only after treatment for the GRC) in the whole group (n=93) and for Mini-BESTest and the BBS in the test-retest and interrater reliability subgroup (n=32). No clinical problems were encountered during assessment procedures. No dropouts occurred.

Figure 1 shows the score distribution of the 2 scales before and after treatment. In both the Mini-BESTest and the BBS, neither top scores at baseline nor floor scores at any time were found. After treatment, 12 participants (12.9%) reached the maximum BBS score, whereas 2 participants (2.1%) reached the Mini-BESTest top score (Tab. 1).

The mean GRC was ≤3 in 53 participants (57%, small or null improvement), 3<GRC<5 in 34 participants (36.5%, moderate improvement), and GRC ≥5 in 6 participants (6.4%, large improvement). No participants worsened according to the GRC.

Reliability
There was a statistically significant difference in test-retest reliability between the Mini-BESTest and the BBS, whereas both Cronbach alpha and interrater reliability were similar in both groups (Tab. 2).

Validity
The scores of the Mini-BESTest and the BBS were highly correlated at both baseline and follow-up (for both, r=.85, CI=.78–.90) (Fig. 2). The correlation between score changes of the Mini-BESTest and the BBS over the course of the rehabilitation program was r=.58 (P<.001).

The correlation between mean GRC and the score changes (after versus before rehabilitation) was r=.72 (CI=.61–.81) for the Mini-BESTest and r=.62 (CI=.48–.73) for BBS; the difference between the correlation coefficients was not statistically significant. The GRC rated by the participant and that by the physical therapist were significantly correlated (r=.61, P<.001).

Responsiveness
Distribution-based methods. The SEM and MDC95 values for both the Mini-BESTest and the BBS are shown in Table 2.

Anchor-based methods. For both scales, the mean score changes in those participants who were rated as having a small or null improvement (GRC ≤3), moderate improvement (3<GRC<5), or large improvement (GRC ≥5) are shown in Table 2.

Splitting data according to the presence of a moderate to large GRC improvement (GRC ≤3 versus GRC >3), both AUCs were high and similar (Tab. 2, Fig. 3). The cutoff score that best identified meaningful improvement in clinical status (as measured by GRC >3) was 4 points for the Mini-BESTest and 6 points for the BBS.

Overall, a MIC value of 4 points for the Mini-BESTest and 7 points for the BBS represented the best triangulation of these results, adopting values

higher than the respective MDC95 value for each scale. Among the 40 participants who had a moderate to large improvement in balance (GRC >3) after physical therapy, 38 showed a change of ≥4 points on the Mini-BESTest, whereas only 23 showed a change of ≥7 points on the BBS.

Discussion
Valid inferences about the efficacy of treatment trials require high-quality outcome measures that meet rigorous measurement standards. The present study was conducted to analyze reliability and validity issues in both the Mini-BESTest and the BBS and to compare their responsiveness after a 10-session physical therapy program for balance disorders. Our results are in line with the recent literature13,18,19 and indicate that the Mini-BESTest shows sound psychometric properties, which compare favorably with those of the BBS, particularly when measuring change at the individual level.

At the follow-up evaluation, 2 participants (2.1%) reached the top score on the Mini-BESTest, whereas 12 participants (about 13%) reached the maximum score on the BBS. Our findings are in agreement with those of previous studies that showed the BBS to have a ceiling effect in people with PD, as well as in other populations.13,18,19 Recently, in people with PD, a lesser ceiling effect and skewed distribution were found for the Mini-BESTest with respect to the BBS.13 Usually only subgroups of patients with severely limited function do not show a ceiling effect on the BBS.18 This fact raises an important concern about the use of the BBS as an outcome measure to evaluate balance impairments: it represents a limited ability of the tool to discriminate among patients with quite good balance function. On the contrary, the absence of a significant ceiling compression effect on the Mini-BESTest speaks in favor of the use of this scale, which represents a more comprehensive measure of balance, with items (eg, compensatory steps, walking with dual task) that are able to challenge patients with even minimal impairment in balance function.

Reliability
The Cronbach alpha showed high values (≥.90) in both tests. On the basis of the ICC, both the Mini-BESTest and the BBS performed very well in terms of test-retest reliability (.96 versus .92) and interrater reliability (.98 versus .97).

The high reliability of both balance scales is in accordance with previous findings. A recent study19 performed in individuals with PD demonstrated similar levels of reliability for the Mini-BESTest (interrater reliability ICC=.91, test-retest reliability ICC=.92). In earlier reliability studies using the BBS, test-retest ICCs ranged from .80 to .99,11,15,36 whereas interrater ICCs were usually ≥.95.36,37

Figure 1.
Histogram of grouped frequency distribution (%) for Mini-BESTest scores (range=0–28) and Berg Balance Scale scores (range=0–56), before (white columns) and after (black columns) physical therapy program.
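The link between reliability and detectable change can be made concrete with the MDC formula given in the Method section (MDC = SEM × z × √2, with z = 1.96 for MDC95). The sketch below is a minimal Python illustration, not the authors' Stata procedure; the function name is hypothetical, and the only study value used is the SEM of 1.26 reported for the Mini-BESTest.

```python
import math


def minimum_detectable_change(sem: float, z: float = 1.96) -> float:
    """Return MDC = SEM * z * sqrt(2); z = 1.96 gives the MDC95."""
    if sem < 0:
        raise ValueError("SEM must be non-negative")
    return sem * z * math.sqrt(2)


# SEM reported for the Mini-BESTest in this study.
mdc95_mini_bestest = minimum_detectable_change(1.26)
print(round(mdc95_mini_bestest, 1))  # prints 3.5
```

The √2 factor reflects that a change score combines the measurement error of two assessments (before and after treatment).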

Figure 2.
Scatterplot showing the relationship between the Mini-BESTest and the Berg Balance Scale (BBS) raw scores, before and after the physical therapy program.

Validity
The high correlation between the 2 scales supports the convergent validity of the Mini-BESTest with the BBS, the most commonly used scale for balance assessment. A high correlation between the 2 scales also was found in a recent study of individuals with PD,18 and a high correlation of the Mini-BESTest with the BBS, the Timed "Up & Go" Test, and the Falls Efficacy Scale was reported in individuals with both PD and stroke.38 In addition, the ability of both the participants and the physical therapist to acceptably estimate the change in balance performance (during a 2-week transition period) is confirmed by the correlation of their GRC assessments with each other and with change in the Mini-BESTest and BBS scores.

Responsiveness
If rating scales are used as primary outcome measures in clinical studies, there is a need to know the extent to which changes in their scores reflect clinically important changes in patients' health status. There is a lack of consensus regarding the best method to determine the MIC, and a recent study recommended using multiple approaches followed by a triangulation to obtain one value or a small range of values for the MIC,32 as we did in the present study.

Distribution-based methods. The MDC95 value was 3.5 points for the Mini-BESTest and 6.2 points for the BBS. In the only study that had sufficient data to calculate the MDC for the Mini-BESTest,19 this value was about 4 (ie, very close to our result). Romero et al39 recently found an MDC95 value of 6.5 points for the BBS and noted that this value was not constant across different levels of function, being lower in individuals with better performance. Our findings also appear to be confirmed by the observation that reported MDC90–95 values for the BBS range from 5 to 8 points.36,39–41

Anchor-based methods. The mean score change in participants who were rated as having had a moderate improvement (3<GRC<5) was 4.6 points for the Mini-BESTest and 7.0 points for the BBS. Using ROC curves, the relative discriminatory accuracy of the 2 tests was excellent (>90%) and statistically equivalent. The Mini-BESTest showed a higher sensitivity than the BBS (94% versus 77%, respectively) (Tab. 2), which indicates a higher capacity to identify those participants who underwent a clinically important change, which is crucial in clinical settings. Likewise, Duncan et al20 found a comparable accuracy of the 2 tests in predicting individuals with PD who were prone to falling at 6 months, whereas King et al18 reported that the Mini-BESTest was slightly more successful than the BBS at discriminating subgroups of PD severity as measured by the Hoehn and Yahr scale.

In general, the results of anchor-based methods (and related values of MIC) should be considered more important than those of the distribution-based methods (including values of MDC),23 and, as Turner et al24 stated, distribution-based approaches should act only as a temporary surrogate, pending availability of empirically established anchor-based MIC values. However, the large variations of MIC indexes that can be found among populations and methods32 indicate that in the puzzle to establish the MIC, we should select only MIC values that are above the MDC.24

Accordingly, the overall results of our study suggest a change of 4 points in the Mini-BESTest as the most acceptable MIC value. The MIC value was higher than MDC95 value for this scale and represents a score

change just slightly lower than the mean change in our group of participants who showed a moderate balance improvement (corresponding to 3<GRC<5). Similarly, in our sample, a change of 7 points appears to be the most adequate MIC value for the BBS: again, it was higher than its MDC95 value (6.2 points) and corresponds to the mean change in our participants who showed a moderate balance improvement. Furthermore, these MIC values represent a change of similar size on the 2 scales. A change of 4 points represents a variation of about 14% for the Mini-BESTest (maximum score: 28), and a change of 7 points represents a variation of 13% for the BBS (maximum score: 56). However, switching from group level to person level, 38 (95%) of the 40 participants who had a moderate to large improvement in balance (GRC ≥3.5) showed a change after physical therapy equal to or higher than the MIC value (4 points) for the Mini-BESTest, whereas only 23 (58%) showed a change equal to or higher than the MIC value (7 points) for the BBS.

Table 2.
Reliability and Responsiveness Indexes for Mini-BESTest and Berg Balance Scaleᵃ

Variable                                         Mini-BESTest        BBS
Reliability
  Cronbach alpha: baseline/follow-up             .90/.91             .93/.93
  Test-retest reliability: ICC                   .96 (.94–.99)ᵇ      .92 (.87–.97)ᵇ
  Interrater reliability: ICC                    .98 (.97–.99)       .97 (.96–.99)
Responsiveness: distribution-based methods
  SEM                                            1.26 (1.01–1.65)    2.18 (1.76–2.87)
  MDC95                                          3.5                 6.2
Responsiveness: anchor-based methods
  Mean score change in patients with:
    null/small improvement (GRC ≤3)              1.6                 1.9
    moderate/medium improvement (3<GRC<5)        4.6                 7.0
    large improvement (GRC ≥5)                   7.0                 9.2
  Area under the ROC curve                       0.92 (0.84–0.97)    0.91 (0.84–0.98)
  Sensitivity                                    94 (87–100)         77 (65–89)
  Specificity                                    81 (70–92)          97 (92–100)
  Optimal cutoff score                           4 (3.0–4.9)         6 (4.4–7.6)

ᵃ Data were calculated on the whole sample (n=93), except for test-retest and interrater reliability (n=32); 95% confidence intervals are shown in parentheses. ICC=intraclass correlation coefficient, SEM=standard error of measurement, MDC95=minimum detectable change at 95% confidence interval, GRC=Global Rating of Change, ROC=receiver operating characteristic.
ᵇ Italics denote significant difference between measures (P<.001).
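
The relative MIC sizes and person-level success rates quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the numbers are copied from the text, while the dictionary and function names are hypothetical and not part of the study's analysis.

```python
# Figures taken from the text: MIC values, maximum scores, and the number of
# participants (out of the 40 with GRC >= 3.5) whose change reached the MIC.
# All names below are illustrative, not from the original study.
MIC = {"Mini-BESTest": 4, "BBS": 7}
MAX_SCORE = {"Mini-BESTest": 28, "BBS": 56}
REACHED_MIC = {"Mini-BESTest": 38, "BBS": 23}
N_IMPROVED = 40


def relative_change(scale: str) -> float:
    """MIC expressed as a percentage of the scale's maximum score."""
    return 100 * MIC[scale] / MAX_SCORE[scale]


def success_rate(scale: str) -> float:
    """Percentage of improved participants whose change reached the MIC."""
    return 100 * REACHED_MIC[scale] / N_IMPROVED


for scale in MIC:
    print(f"{scale}: MIC = {relative_change(scale):.1f}% of maximum score, "
          f"success rate = {success_rate(scale):.1f}%")
```

Under these inputs the unrounded values are about 14.3% and 12.5% for the relative MIC, and 95.0% and 57.5% for the success rates, which agree with the rounded figures reported in the text (14%, 13%, 95%, and 58%).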
Figure 3.
Comparison between the receiver operating characteristic curves of the Mini-BESTest and the Berg Balance Scale, showing their overall accuracy in identifying a balance improvement according to a Global Rating of Change score of ≤3 versus >3. Arrows show the point that jointly maximizes sensitivity and specificity.

These findings are the first analyzing in depth the responsiveness of the Mini-BESTest and are in line with those concerning the BBS. Moreover, Romero et al39 recently underscored that measurement error (and parameters derived from it) often are not constant across different levels of function and related scores. As a consequence, caution is mandatory when interpreting and using these MIC values in different populations and settings, particularly considering the intrinsic weaknesses of GRC.21,42 The GRC (and the MIC values derived from it) suffers from the problem of the subjective retrospective judgments of change (eg, due to “recall bias,” or problematic patient ability to understand the context of improvement).21 To reduce these drawbacks, we used the mean of 2 ratings (participant and therapist),

after reporting the correlations between them.

An additional limitation of the present study is the selection criteria of our convenience sample (recruited with a consecutive sampling method), which may represent a threat to external validity. Our sample was a cross-section of adults drawn from a single rehabilitation facility and with balance disorders of very different origins and severities. Finally, even if raters were blinded to their previous ratings, a memory effect cannot be ruled out.

In conclusion, this study showed, within the context analyzed and our specific patient group, the high reliability levels of the Mini-BESTest, confirmed those of the BBS, and proved the validity of both scales for measuring balance function and its change over time. In addition, our findings show how much the calculation of success rates (ie, percentages of patients having a change greater than the MIC value) can be useful from a clinical point of view.32

Most responsiveness indexes of the Mini-BESTest were equivalent or compared favorably with those of the BBS. The main advantages of the Mini-BESTest over the BBS appear to be that it has a lower ceiling effect together with slightly higher reliability levels, which led to greater accuracy in classifying individual patients who showed significant improvement in balance function.

Further studies are needed to confirm and expand the present results (to increase their generalizability), including analyses based on Rasch-transformed rating scores. Nevertheless, our results for the Mini-BESTest are in line with those of previous studies conducted in different countries and contexts using the same instrument, thus increasing our confidence in the relative validity of these findings.

Mr Godi, Dr Franchignoni, Mr Caligari, and Dr Nardone provided concept/idea/research design. Mr Godi, Dr Franchignoni, Mr Caligari, Dr Giordano, and Dr Nardone provided writing and data analysis. Mr Godi, Mr Caligari, and Ms Turcato provided data collection. Dr Franchignoni and Dr Nardone provided project management and study participants. Dr Nardone provided facilities/equipment and institutional liaisons. Ms Turcato and Dr Nardone provided consultation (including review of manuscript before submission).

This work was supported, in part, by “Giovani Ricercatori 2009” and “Progetto Strategico 2007” grants from the Italian Ministry of Health.

DOI: 10.2522/ptj.20120171

References
1 Johansson R, Magnusson M. Human postural dynamics. Crit Rev Biomed Eng. 1991;18:413–437.
2 Goodworth AD, Peterka RJ. Sensorimotor integration for multi-segmental frontal plane balance control in humans. J Neurophysiol. 2012;107:12–28.
3 Buchanan JJ, Horak FB. Voluntary control of postural equilibrium patterns. Behav Brain Res. 2003;143:121–140.
4 Briggs RC, Gossman MR, Birch R, et al. Balance performance among noninstitutionalized elderly women. Phys Ther. 1989;69:748–756.
5 Czernuszenko A, Członkowska A. Risk factors for falls in stroke patients during inpatient rehabilitation. Clin Rehabil. 2009;23:176–188.
6 Orr R. Contribution of muscle weakness to postural instability in the elderly: a systematic review. Eur J Phys Rehabil Med. 2010;46:183–220.
7 Plotnik M, Giladi N, Dagan Y, Hausdorff JM. Postural instability and fall risk in Parkinson’s disease: impaired dual tasking, pacing, and bilateral coordination of gait during the “On” medication state. Exp Brain Res. 2011;210:529–538.
8 Yelnik A, Bonan I. Clinical tools for assessing balance disorders. Neurophysiol Clin. 2008;38:439–445.
9 Berg KO, Wood-Dauphinée SL, Williams JI, Maki B. Measuring balance in the elderly: validation of an instrument. Can J Public Health. 1992;83(suppl 2):S7–S11.
10 Tyson SF, Connell LA. How to measure balance in clinical practice: a systematic review of the psychometrics and clinical utility of measures of balance activity for neurological conditions. Clin Rehabil. 2009;23:824–840.
11 Blum L, Korner-Bitensky N. Usefulness of the Berg Balance Scale in stroke rehabilitation: a systematic review. Phys Ther. 2008;88:559–566.
12 Kornetti DL, Fritz SL, Chiu YP, et al. Rating scale analysis of the Berg Balance Scale. Arch Phys Med Rehabil. 2004;85:1128–1135.
13 Pardasaney PK, Latham NK, Jette AM, et al. Sensitivity to change and responsiveness of four balance measures for community-dwelling older adults. Phys Ther. 2012;92:388–397.
14 Horak FB, Wrisley DM, Frank J. The Balance Evaluation Systems Test (BESTest) to differentiate balance deficits. Phys Ther. 2009;89:484–498.
15 Leddy AL, Crowner BE, Earhart GM. Functional gait assessment and balance evaluation system test: reliability, validity, sensitivity, and specificity for identifying individuals with Parkinson disease who fall. Phys Ther. 2011;91:102–113.
16 Franchignoni F, Horak F, Godi M, et al. Using psychometric techniques to improve the Balance Evaluation Systems Test: the mini-BESTest. J Rehabil Med. 2010;42:323–331.
17 Horak FB. Postural orientation and equilibrium: what do we need to know about neural control of balance to prevent falls? Age Ageing. 2006;35:ii7–ii11.
18 King LA, Priest KC, Salarian A, et al. Comparing the Mini-BESTest with the Berg Balance Scale to evaluate balance disorders in Parkinson’s disease. Parkinsons Dis. 2012;2012:375419. Epub 2011 Oct 24.
19 Leddy AL, Crowner BE, Earhart GM. Utility of the Mini-BESTest, BESTest, and BESTest sections for balance assessments in individuals with Parkinson disease. J Neurol Phys Ther. 2011;35:90–97.
20 Duncan RP, Leddy AL, Cavanaugh JT, et al. Accuracy of fall prediction in Parkinson disease: six-month and 12-month prospective analyses. Parkinsons Dis. 2012;2012:237673. Epub 2011 Nov 30.
21 Kamper SJ, Maher CG, Mackay G. Global rating of change scales: a review of strengths and weaknesses and considerations for design. J Man Manip Ther. 2009;17:163–170.
22 Jaeschke R, Singer J, Guyatt G. Measurement of health status: ascertaining the minimal clinically important difference. Control Clin Trials. 1989;10:407–415.
23 Revicki D, Hays RD, Cella D, Sloan J. Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. J Clin Epidemiol. 2008;61:102–109.
24 Turner D, Schünemann HJ, Griffith LE, et al. Using the entire cohort in the receiver operating characteristic analysis maximizes precision of the minimal important difference. J Clin Epidemiol. 2009;62:374–379.
25 Bonett DG. Sample size requirements for estimating intraclass correlations with desired precision. Stat Med. 2002;21:1331–1335.

26 Shubert TE. Evidence-based exercise prescription for balance and falls prevention: a current review of the literature. J Geriatr Phys Ther. 2011;34:100–108.
27 Corna S, Nardone A, Prestinari A, et al. Comparison of Cawthorne-Cooksey exercises and sinusoidal support surface translations to improve balance in patients with unilateral vestibular deficit. Arch Phys Med Rehabil. 2003;84:1173–1184.
28 Bland JM, Altman DG. Cronbach’s alpha. BMJ. 1997;314:572.
29 Portney LG, Watkins MP. Foundations of Clinical Research: Applications to Practice. 3rd ed. Upper Saddle River, NJ: Prentice Hall Health; 2009.
30 Stratford PW, Binkley JM, Riddle DL. Development and initial validation of the back pain functional scale. Spine. 2000;25:2095–2102.
31 Norman GR, Streiner DL. Biostatistics: The Bare Essentials. 3rd ed. Shelton, CT: PMPH USA Inc; 2008.
32 Terwee CB, Roorda LD, Dekker J, et al. Mind the MIC: large variation among populations and methods. J Clin Epidemiol. 2010;63:524–534.
33 de Vet HC, Terwee CB, Ostelo RW, et al. Minimal changes in health status questionnaires: distinction between minimally detectable change and minimally important change. Health Qual Life Outcomes. 2006;4:54.
34 Stratford PW, Goldsmith CH. Use of the standard error as a reliability index of interest: an applied example using elbow flexor strength data. Phys Ther. 1997;77:745–750.
35 DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating curves: a nonparametric approach. Biometrics. 1988;44:837–845.
36 Steffen T, Seney M. Test-retest reliability and minimal detectable change on balance and ambulation tests, the 36-Item Short-Form Health Survey, and the Unified Parkinson Disease Rating Scale in people with parkinsonism. Phys Ther. 2008;88:733–746.
37 de Figueiredo KM, de Lima KC, Cavalcanti Maciel AC, Guerra RO. Interobserver reproducibility of the Berg Balance Scale by novice and experienced physiotherapists. Physiother Theory Pract. 2009;25:30–36.
38 Bergström M, Lenholm E, Franzén E. Translation and validation of the Swedish version of the mini-BESTest in subjects with Parkinson’s disease or stroke: a pilot study. Physiother Theory Pract. 2012;28:509–514.
39 Romero S, Bishop MD, Velozo CA, Light K. Minimum detectable change of the Berg Balance Scale and Dynamic Gait Index in older persons at risk for falling. J Geriatr Phys Ther. 2011;34:131–137.
40 Stevenson TJ. Detecting change in patients with stroke using the Berg Balance Scale. Aust J Physiother. 2001;47:29–38.
41 Conradsson M, Lundin-Olsson L, Lindelöf N, et al. Berg Balance Scale: intrarater test-retest reliability among older people dependent in activities of daily living and living in residential care facilities. Phys Ther. 2007;87:1155–1163.
42 Cook CE. Clinimetrics corner. The Minimal Clinically Important Change Score (MCID): a necessary pretense. J Man Manip Ther. 2008;16:E82–E83.

