The Geriatric Anxiety Inventory (GAI) and its short form (GAI-SF) are scales used internationally to assess
anxiety symptoms in older adults. The objective of this study was to conduct the first critical review of the psychometric
properties of these tools. Relevant studies (n = 31) of both versions of the GAI were retrieved from electronic
databases and through a hand search. The quality of the studies was assessed with the COSMIN checklist. The GAI and GAI-SF
showed adequate internal consistency and test-retest reliability. Convergent validity showed high
correlations with measures of generalized anxiety, whereas low correlations were found with measures
that include somatic symptoms. Substantial overlap was found with measures of depressive symptoms.
While there is no consensus on the factor structure of the GAI, the GAI-SF is unidimensional. Despite good
sensitivity and specificity for detecting anxiety, the recommended cut-off scores varied considerably. The GAI and
GAI-SF are instruments with satisfactory psychometric properties. To broaden their use, however,
some of these properties require closer examination. This review highlights the importance of paying attention
to certain methodological weaknesses found in the studies.
The Geriatric Anxiety Inventory (GAI) and its short form (GAI-SF) are self-reported scales used internationally to assess
anxiety symptoms in older adults. In this study, we conducted the first critical comprehensive review of these scales’
psychometric properties. We rated the quality of 31 relevant studies with the COSMIN checklist. Both the GAI and GAI-SF
showed adequate internal consistency and test-retest reliability. Convergent validity indices were highest with generalized
anxiety measures; lowest with instruments relating to somatic symptoms. We detected substantial overlap with depression
measures. While there was no consensus on the GAI’s factorial structure, we found the short version to be unidimensional.
Although we found good sensitivity and specificity for detecting anxiety, cut-off scores varied. The GAI and GAI-SF are
relevant instruments showing satisfactory psychometric properties; to broaden their use, however, some psychometric
properties warrant closer examination. This review calls attention to weaknesses in the methodological quality of the studies.
School of Psychology, Université Laval, Québec
Centre d’excellence sur le vieillissement de Québec, Québec
Department of Psychology, Université de Sherbrooke, Sherbrooke
Institut universitaire de première ligne en santé et services sociaux – Centre intégré universitaire en santé et services sociaux de
l’Estrie – CHUS (CIUSSS de l’Estrie – CHUS), Sherbrooke
The authors wish to thank Alexandra Michel for assistance with the search and with the assessment of the methodological quality of the studies.
… these databases are representative of the literature published on this topic. We made additional efforts to locate
relevant studies through a handsearching process. We filtered relevant studies by searching for the keywords
“Geriatric Anxiety Inventory” in the title and abstract fields of the databases. We decided on this approach
after conducting different tests (e.g., with broader keywords like “anxiety” or “assessment” or by including
them in the “any field” section), which considerably increased the number of non-relevant articles
retrieved. The search was restricted to articles published between January 1, 2007 (the GAI was developed in
2007 [Pachana et al., 2007]) and December 31, 2018. We retained articles according to the following criteria:
(a) written in English or French, (b) presented original empirical research, and (c) expressed the primary
objective of exploring the psychometric properties of the GAI and/or the GAI-SF. We excluded the following
types of articles because either the information provided was limited or the articles were frequently
non-peer reviewed: unpublished manuscripts, editorials, dissertations, theses, randomized controlled trials, case
reports, and published abstracts. The first author and a research assistant independently screened the titles and
abstracts of the retrieved studies to determine their eligibility. When a disagreement emerged between the
two reviewers, a discussion ensued in order to reach a consensus. When necessary, a third reviewer made the
decision.

Quality Assessment
We assessed the methodological quality of the included studies with the “COnsensus-based Standards for the
selection of health status Measurement Instruments” (COSMIN) checklist (Mokkink, Terwee, Knol, et al.,
2010; Mokkink, Terwee, Patrick, et al., 2010). The COSMIN checklist consists of eight boxes that each refer to a
specific measurement property (i.e., internal consistency, reliability, measurement error, content validity,
structural validity, hypothesis testing, cross-cultural validity, and responsiveness). Each box contains 5 to 18 items that
assess methodological standards, and items are scored on a 4-point rating scale (i.e., poor, fair, good, or
excellent) using specific criteria. For example, the fifth item on the internal consistency box assesses whether the
unidimensionality of the scale was verified. Criteria proposed by the COSMIN checklist for rating this item
follow: a factor analysis was performed in the study population (excellent); the authors refer to another study
in which factor analysis was performed in a similar study population (good); authors refer to another study in
which factor analysis was performed but not in a similar study (fair); factor analysis was not performed and
contains no reference to another study (poor) (for more information on rating, see …). For each measurement
property, an overall score is determined by taking the lowest rating of any of the box items (worst score counts
method; Terwee et al., 2012). Quality assessment of studies was independently performed by the first author and a
research assistant. Discrepancies were resolved through a discussion. When necessary, a third reviewer made the
decision.

Data Collection Process
Data extraction was conducted by the first author. Extracted data included basic information about study
demographics (e.g., publication year, country in which the study was conducted, language in which the
instrument was administered) as well as sample characteristics (e.g., type of sample, sample size, mean age). When
available, we collected data on the different measurement properties (e.g., results, statistical methods used,
time interval, comparator instruments) defined in the COSMIN checklist. More specifically, these properties
were: internal consistency, test-retest reliability, measurement errors, content validity, structural validity
(factor analysis), hypothesis testing, cross-cultural validity, criterion validity, and responsiveness.

Results
Search Results
As shown in Figure 1, the database search retrieved a total of 485 articles. Duplicates (n = 232) were removed
and of the 253 remaining records, we excluded 222. The main reasons for exclusion were that the GAI or
GAI-SF was not the topic of interest (n = 179) or that the article was not presented as an original published
manuscript (i.e., conference proceeding; n = 21). Four articles were excluded based on language (i.e., were
not written in English or in French). Thus, we retained a final list of 31 articles for the purpose of the current
review.

Methodological Quality of the Included Studies
The results of COSMIN ratings for the 31 studies retained are displayed in Table 1. The studies assessed
an average of 2.7 psychometric properties out of the nine COSMIN criteria. Most of the COSMIN boxes
were rated as having “poor” (43.5%) or “fair” (40%) quality. The most frequent reasons for these ratings
were low sample size or a lack of information concerning the number of missing items and how they were
handled. This information corresponds to key criteria because it is assessed in almost all COSMIN boxes.
Only 11.8 per cent of the rated boxes were rated as having “good” quality, and 4.7 per cent as having
“excellent” quality.
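The worst-score-counts rule described under Quality Assessment amounts to taking the minimum over an ordered rating scale. The following is a minimal Python sketch, assuming the four COSMIN labels are ordered poor < fair < good < excellent; the function name `worst_score` and the example ratings are hypothetical and are not taken from any reviewed study.

```python
# Ordered COSMIN item ratings, from lowest to highest quality.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def worst_score(item_ratings):
    """Return the overall box rating: the lowest rating among its items."""
    if not item_ratings:
        raise ValueError("a box needs at least one rated item")
    # min() with the scale position as key picks the lowest-ranked label.
    return min(item_ratings, key=RATING_ORDER.index)


# Hypothetical box: a single "fair" item caps the whole box at "fair".
example_box = ["excellent", "good", "fair", "excellent"]
overall = worst_score(example_box)
```

Under this rule, one weakly reported item (for example, missing-data handling) is enough to lower the rating of an otherwise well-conducted analysis, which may help explain why so many boxes were rated “poor” or “fair”.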
[Figure 1: flow diagram of study selection. Records identified through database searching (n = 484); additional records identified through other sources (n = 1); (n = 253); studies included (n = 31).]
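The counts reported in the Search Results text and in Figure 1 can be checked with simple arithmetic. A short Python sketch using only the figures stated in the text (the variable names are illustrative):

```python
# Counts as reported in the Search Results text and in Figure 1.
from_databases = 484
from_other_sources = 1
duplicates_removed = 232
excluded_after_screening = 222

identified = from_databases + from_other_sources        # total records retrieved
after_duplicates = identified - duplicates_removed      # records left to screen
included = after_duplicates - excluded_after_screening  # studies retained
```

The three derived values match the 485 articles retrieved, the 253 remaining records, and the 31 retained studies given in the text.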
Study and Participant Characteristics
Basic characteristics of the studies retained for the current review and their samples are presented in Table 2.
Psychometric properties of the GAI were examined by 22 studies, while only one study investigated the
properties of the GAI’s short form, and eight studies examined both forms. The latter studies generally extracted
GAI-SF scores from the GAI. We examined psychometric properties of 15 versions of the GAI: Brazilian
Portuguese, Chinese, Czech (long and short forms), English (long and short forms), French Canadian (long and
short forms), Italian (long and short forms), Norwegian (long and short forms), Portuguese (long and short
forms) and Spanish.

The 31 retained studies provided data for 8,174 patients who completed the GAI and/or the GAI-SF. Sample
sizes ranged from 32 to 1,318 patients. Most studies had samples composed mainly of women (on average,
64.9% of the samples were composed of women). Participants were aged between 52 and 94 years old,
excluding participants in the study by Matheson et al. (2012) that included young adults aged 37 years old and
older. Mean age of the participants was 72.5 years.

Sample recruitment source was categorized as either non-clinical (e.g., community-dwelling seniors),
psychiatric (e.g., in-patient, outpatient, or institutionalized patients, or individuals with a psychiatric diagnosis),
medical (i.e., having a medical diagnosis or receiving medical care), or mixed (i.e., different sources of
recruitment in the sample). Of the 31 selected studies, nine used mixed samples (non-clinical and/or psychiatric
and/or medical). Other studies’ recruitment sources were for the most part exclusively non-clinical (n = 9),
medical (n = 9), or, in a smaller proportion, psychiatric (n = 4 studies). Mean scores on the GAI varied between
.58 and 16.3, with a mean of 5.5. Those for the GAI-SF ranged between .17 and 3.64, with a mean of 1.8.

Internal Consistency. The alpha coefficient of the GAI ranged between .71 and .97 with a mean of .91, and
between .61 and .84 with a mean of .80 for the GAI short form (see Table 2). According to the COSMIN checklist
results, internal consistency was mostly (63%) rated as poorly assessed. The items rated as “poor” referred
Table 1: Methodological quality of each study per measurement property
Table 2: Characteristics of the retained studies on the GAI and GAI-SF and their reliability coefficients
Version | Study | Country | Sample type | Sample | Gender (% women) | Mean age & range | Mean score (SD) | Internal consistency | Test-retest
Brazilian Portuguese version (GAI-BR), long form | Massena et al. (2015) | Brazil | Mixed | Mixed sample from community and outpatient psychogeriatric clinic (n = 72) | 82 | 72.2 | 8.77 | .91 | .85 (1 week)
Chinese version (GAI-CV), long form | Yan et al. (2014) | Beijing | Nonclinical | Community-dwelling seniors (n = 1,047) | 59.4 | 70.8 | 2.17 (4.19) | .94 | –
 | Guan (2016) | Beijing | Nonclinical | Community-dwelling seniors (n = 1,318) | 59.4 | 71.4 | – | .94 | –
 | Dow et al. (2018) | Australia | Nonclinical | Community-dwelling Chinese immigrants (n = 87) | 66 | 76.9 (60–92) | – | .95 | –
Czech version, long form | Heissler et al. (2018) | Czech Republic | Nonclinical | Community-dwelling seniors (n = 485) | 52 | 75.5 | Men: 2.27 (2.85); Women: 3.44 | .85 | –
Czech version, short form | Heissler et al. (2018) | Czech Republic | Nonclinical | Community-dwelling seniors (n = 485) | 52 | 75.5 | Men: .64 (1.11); Women: 1.08 | .75 | –
English version (GAI), long form | Cheung (2007) | New Zealand | Psychiatric | Geriatric psychiatry patients (n = 32) | 63 | 75.5 (66–85) | 7.59 (6.5) | – | –
 | Pachana et al. (2007) | Australia | Mixed – Nonclinical | Community-dwelling (n = 452) | 64.4 | 71.7 (60–90) | 2.3 (3.8) | .91 | –
 | | | Mixed – Psychiatric | Patients attending a psychogeriatric service (n = 46) | 74 | 78.8 (66–94) | 5.22 (5.83) | .83 | .91 (1 week)
 | Boddice and Byrne (2008) | Australia | Mixed – Nonclinical | Community-dwelling seniors (n = 31) | 52 | 75.8 | – | – | –
 | | | Mixed – Medical | Older adults living in nursing homes (n = 27) | 62.9 | 82.8 | 2.3 (4.2) | – | –
 | Diefenbach et al. (2009) | United States | Medical | Older home care recipients (n = 66); data on the GAI available only for a subset of the … | 83.3 | 76.6 (65–92) | 4.63 (5.57) | .93 | .95 (1 to 2 weeks)
… | | | | | | | | |
Norwegian version (GAI), long form | Bakkane Bendixen et al. (2016) | Norway | Mixed | Total sample: patients who were admitted to a department of geriatric psychiatry (n = 428) | 67 | 75.7 | 8.5 (6.6) | .92 for the 1st factor; .85 for the 2nd factor | –
 | | | Mixed – Psychiatric | Patients with a diagnosis of depression (n = 220) | 67.6 | 75.6 | 11.1 (6.0) | – | –
 | | | | Patients with a diagnosis of nonorganic psychosis (n = 68) | 64.3 | 76.9 | 5.9 (6.1) | – | –
 | | | Mixed – Medical | Patients with a diagnosis of dementia (n = 140) | 70.6 | 73.3 | 5.5 (5.9) | – | –
 | Molde et al. (2017) | Norway | Psychiatric | Psychogeriatric mixed in-and-out patient sample (n = 543) | 67.9 | 75.7 (62–78) | 8.2 (6.5) | .94 | –
Norwegian version, short form | Molde et al. (2017) | Norway | Psychiatric | Psychogeriatric mixed in-and-out patient sample (n = 543) | 67.9 | 75.7 (62–78) | – | .84 | –
Portuguese version (GAI-PT), long form | Ribeiro et al. (2011) | Portugal | Mixed | Total sample (n = 217) | – | – | – | .96 | –
 | | | Mixed – Nonclinical | Community-dwelling seniors (n = 152) | 56.6 | 73.9 (59–92) | With PD: 16.3 (4.9); Without PD: 4.1 | .97 | .99 (ICC; 2 weeks)
 | | | Mixed – Psychiatric | Patients with depression (n = 32) | 71.9 | 70.5 (55–85) | 15.2 (5.58) | – |
 | | | | Patients with anxiety disorders (n = 23) | 47.8 | 72.3 (56–89) | With AD: 14.8 (4); With GAD: 16.1 | – | –
 | | | Mixed – Medical | Patients with an early Alzheimer’s disease … | 80 | 74.6 (63–88) | 11.9 (5.7) | – | –
Note. AD = anxiety disorder; GAD = generalized anxiety disorder; IQR = interquartile range; MDN = median; PD = psychological distress.
mostly to the absence of information on missing items and unidimensionality of the scale.

Test-Retest Reliability. Test-retest coefficients (mostly Pearson’s r and the intraclass correlation [ICC]) ranged
between .53 and .99 with a mean of .79 for the GAI, and between .80 and .97 with a mean of .90 for the short form
(see Table 2). Aside from the two lowest coefficients of the long form (r = .53 and .58; Kneebone, Fife-Schaw,
Lincoln, & Harder, 2016; Silva et al., 2016), the lowest coefficient was .85. These large differences in the coefficients
obtained are difficult to explain, and authors did not comment on their results. In general, the interval of time
between the two administrations of the scale was one to two weeks, except in the study of Silva et al. (2016) in
which the interval was 30 weeks. Surprisingly, this longer interval generated a coefficient of r = .58 for the GAI and
the highest coefficient for the short form (r = .97).

Test-retest reliability was chiefly rated as poorly assessed (72.7%) according to the COSMIN checklist
because of a lack of information concerning missing items, stability of participants, and similarity of test
conditions between the two administrations. The COSMIN checklist asks whether there were any important
flaws in the study design or method, and in light of certain retest research recommendations, there are
several other weaknesses present. In the majority of the studies, little information was provided on sampling
and rationale for major decisions that were made (e.g., length of the retest interval). Polit (2014) has suggested
that seeking input from patients or experts regarding the stability of the construct being assessed can help
support decisions regarding retest interval. Park, Kang, Jang, Lee, and Chang (2018) have recommended that
the sample size be about five times the number of items, which was not the case for any of the studies since they
generally assessed the retest reliability on a subgroup of the sample. Moreover, the attrition rate for the retest
assessment was rarely reported although there is evidence that high rates of attrition can depress reliability
estimates (Polit, 2014). Although the COSMIN checklist prioritizes the use of the ICC to analyze retest reliability,
Vaz, Falkmer, Passmore, Parsons, and Andreou (2013) made a case to consider measurement error indices such
as the coefficient of repeatability (CoR) or the smallest real difference (SRD) over coefficients like the Pearson’s
r and the ICC.

Validity
Content Validity. Only two studies addressed content validity. This very low number could be explained by
the fact that validation studies may have assumed that items of the GAI and GAI-SF are relevant and
comprehensive. As the developers of the GAI, Pachana, Byrne, et al. (2007) thoroughly evaluated the content of this
scale. Molde et al. (2017) performed a content analysis on data retrieved from a panel of older adults and a
group of clinical psychologists and psychiatrists who were invited to comment on the items. Content validity
was rated as “excellent” for the study by Pachana, Byrne, et al. (2007) and “poor” for the one by Molde
et al. (2017), according to COSMIN criteria. The latter study obtained such a rating because it is not clear
whether all items were assessed to determine whether they comprehensively covered the construct of interest
in regard to its theoretical foundation, and whether items were relevant to the purpose of the instrument.

Convergent Validity. Convergent validity has been established between the GAI and GAI-SF and a variety of
other instruments that also assess anxiety and related constructs (e.g., symptoms of GAD, worry, general
anxiety, or both anxiety and symptoms of depression at the same time). As shown in Table 3, correlations vary
between .25 and .86 for the GAI and between .55 and .79 for the short form. Convergent validity with GAD scales
appears to be the highest (r = .65 to .86). Data are scarce on the association between the GAI and measures that
assess other anxiety disorders. Available evidence reveals only a moderate relationship (r = .56) with a
measure of post-traumatic symptoms (Gould et al., 2014). The weakest associations were found for scales
that contain somatic items such as the Hamilton Anxiety Scale (HAMA) (r = .25; Ball, Lipsius, & Escobar,
2015) and the Beck Anxiety Inventory (BAI) (r = .28; Gould et al., 2014). In contrast, the GAI focuses
predominantly on psychological symptoms. Another low correlation was found with the state subscale of the
State-Trait Anxiety Inventory (STAI) (r = .28; Massena, de Araújo, Pachana, Laks, & de Pádua, 2015). The authors
explained this result as due to a possible bias in the formulation of the questions of the STAI-state, where
symptoms were assessed according to participants’ feelings at the time of the interview rather than those
experienced over the past week.

Convergent validity was not evaluated in depth with the COSMIN checklist because only two items referred
to it in the hypothesis test box. For the purpose of this review, we rarely used these items to assess convergent
validity since most of the retained studies did not provide hypotheses to test. Despite this, the general
trend was that studies provided a poor description of the constructs measured by the comparator instrument.
In addition, it was not always clear whether the comparator instrument was an established and validated
instrument for use with elderly individuals.

Divergent Validity. We assessed divergent validity in some studies by examining the association with a
measure of depression symptoms. Correlations ranged between .28 and .86 for the GAI and between .37 and .63
for the short form (see Table 3). The lowest correlations (r = .28) between the GAI and the Hospital Anxiety and
Depression Scale – Depression Scale (HADS-D) are explained by the fact that patients with major
depressive disorder were excluded (Ball et al., 2015) and by the low prevalence of depression symptoms (Kneebone
et al., 2016). These results suggest that there may exist different patterns of divergent validity where highly
uniform samples with low rates of depression symptoms could facilitate distinction from anxiety symptoms
assessed with the GAI and GAI-SF.

Diefenbach, Bragdon, and Blank (2014) and Bakkane Bendixen, Hartberg, Selbæk, and Engedal (2016) shed
new light on the association between the GAI and GAI-SF and measures of depression. Diefenbach et al. (2014)
found that depressive symptoms were more strongly correlated with the “central nervous system
hyperarousal” factor and to a lesser extent with “gastrointestinal symptoms”. These results suggest that there may
be a certain response pattern in patients with greater co-morbid depressive symptoms. Bakkane Bendixen
et al. (2016) found that in comparison to those with dementia or psychosis, a group of patients with
depression presents a different pattern of results on the GAI; that is, with a higher total score and a higher
endorsement of 18 of the 20 items (except items 3 and 18).

Factorial Validity. The GAI was first described as being unidimensional although no factor analysis was
presented to support this assumption (Byrne & Pachana, 2011; Pachana, Byrne, et al., 2007). Ten studies
investigated the factorial validity of the GAI and half of them confirmed the one-factor structure (Champagne,
Landreville, Gosselin, & Carmichael, 2016; Johnco, Knight, Tadic, & Wuthrich, 2014; Molde et al., 2017; Ribeiro, Paul,
Simoes, & Firmino, 2011; Yan, Xin, Wang, & Tang, 2014). The other five studies that investigated the factorial
validity of the GAI found a two-factor structure (Bakkane Bendixen et al., 2016), a three-factor structure
(Guan, 2016; Mababu & RuizSánchez, 2016; Marquez-Gonzalez, Losada, Fernandez-Fernandez, & Pachana,
2012), and a four-factor structure (Diefenbach et al., 2014) (see Table 4). The identified factors can be
grouped into three categories: (a) cognitive symptoms (includes the following factors: worries, excessive
worry symptoms, decision-making symptoms, and mental anxiety), (b) physical symptoms of anxiety
(includes the following factors: central nervous system hyperarousal, arousal and somatic symptoms), and
(c) negative anxiety. Cognitive and physical symptoms of anxiety were found across all five studies. In contrast,
negative anxiety, which refers to the motives and behaviours related to anxiety disorders, was found only
by Guan (2016). Most of the GAI items were not consistently associated with the same symptom category.
Only items 1 (“I worry a lot of the time”) and 2 (“I find it difficult to make a decision”) were always related to
cognitive symptoms, and items 12 (“I get an upset stomach due to my worrying”) and 18 (“I sometimes
feel a great knot in my stomach”) were always associated with physical symptoms. This variability may be
due to the type of sample (i.e., three studies used non-clinical samples; one, a mixed sample of psychiatric and
medical patients; and one, a sample of elderly people with cognitive impairment) and cultural differences
because four versions were used (Norwegian, English, Spanish, and Chinese).

Four studies investigated the factor structure of the GAI-SF and all confirmed its unidimensionality (Champagne
et al., 2016; Diefenbach et al., 2014; Johnco et al., 2014; Molde et al., 2017). Most items with high factor loadings
referred to cognitive symptoms of anxiety.

Criterion Validity. At first, Pachana, Byrne, et al. (2007) recommended a GAI cut-off score of 9 for the
identification of any anxiety disorder and of 11 for the detection of GAD. Further studies suggested cut-off scores that
varied between 3 and 13 out of 20 for the identification of an anxiety disorder (see Table 5). Multiple factors can
explain this variability such as the type of sample (non-clinical vs. clinical), the proportion of patients who
actually met the criteria for an anxiety disorder, cultural differences in the expression of anxiety, and the external
criterion used for the diagnosis. The much lower cut-off score of 3 found by Cheung, Patrick, Sullivan, Cooray,
and Chang (2012) may be attributable to differences in the nature of the sample as their participants had
chronic obstructive pulmonary disease; the mean score on the GAI was low (M = 3.3; SD = 4.6) as was the
proportion of participants with an anxiety disorder (25.5%). Test sensitivity values for the GAI ranged
between 30 and 100 per cent, while specificity values ranged between 43 and 100 per cent. The area under the
ROC curve (AUC) ranged between 79 and 98.1.

For the GAI short form, a score of 3 or more was originally found to be optimal for the detection of
GAD in a non-clinical sample (Byrne & Pachana, 2011). Results of subsequent studies were similar with
optimal thresholds at 2 to 3 out of 5 for the identification of an anxiety disorder. Sensitivity varied between
72 and 100 per cent and specificity ranged between 35 and 98.3 per cent. The AUC ranged between 78 and …

With regard to the different diagnostic parameters, the performance of the standard and short forms of the GAI
seemed quite comparable. According to the COSMIN checklist, we largely rated criterion validity as “fair” for
different reasons (e.g., no information on how missing items were handled; unclear if the criterion was a “gold
standard”).
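The dependence of sensitivity and specificity on the chosen cut-off, discussed under Criterion Validity, can be illustrated with a short Python sketch. The scores and diagnoses below are entirely hypothetical, the function name is illustrative, and the sketch assumes that a score at or above the cut-off counts as screen-positive and that the sample contains both diagnosed and non-diagnosed participants.

```python
def sensitivity_specificity(scores, has_disorder, cutoff):
    """Compare screen results (score >= cutoff) with the external criterion."""
    tp = fp = tn = fn = 0
    for score, diagnosed in zip(scores, has_disorder):
        positive = score >= cutoff
        if positive and diagnosed:
            tp += 1  # correctly flagged
        elif positive and not diagnosed:
            fp += 1  # flagged without a diagnosis
        elif diagnosed:
            fn += 1  # missed case
        else:
            tn += 1  # correctly not flagged
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical GAI totals (0-20) and hypothetical criterion diagnoses.
scores = [2, 10, 12, 4, 9, 1, 15, 8]
has_disorder = [False, True, True, False, False, False, True, True]

at_9 = sensitivity_specificity(scores, has_disorder, cutoff=9)    # (0.75, 0.75)
at_11 = sensitivity_specificity(scores, has_disorder, cutoff=11)  # (0.5, 1.0)
```

In this toy sample, raising the cut-off from 9 to 11 trades sensitivity for specificity, which is one reason the optimal threshold can shift with the sample's score distribution and the prevalence of anxiety disorders.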
Table 3: Convergent and divergent validity of the GAI and GAI-SF
Measure | r (as listed) | Studies
Anxiety Measure | |
ASI = Anxiety Inventory Status | .85 | Rozzini et al. (2009)
BAI = Beck Anxiety Inventory | .28–.75; .58 | Diefenbach et al. (2009); Gould et al. (2014); Massena et al. (2015); Pachana et al. (2007); Silva et al. (2016); Yan et al. (2014)
GADS = Goldberg Anxiety and Depression Scale – Anxiety scale | .57 | Pachana et al. (2007)
GAI = Geriatric Anxiety Inventory | .77–.94 | Byrne and Pachana (2011); Champagne et al. (2016); Gerolimatos et al. (2013); Heissler et al. (2018); Johnco et al. (2015); Silva et al. (2016)
GAI-SF = Geriatric Anxiety Inventory – Short Form | .77–.94 | Champagne et al. (2016); Gerolimatos et al. (2013); Heissler et al. (2018); Johnco et al. (2015); Silva et al. (2016)
GAS = Geriatric Anxiety Scale | .60–.82 | Cheung (2007); Pachana et al. (2007); Gould et al. (2014)
HADS-A = Hospital Anxiety and Depression Scale – Anxiety Scale | .51–.71; .61 | Ball et al. (2015); Creighton et al. (2018); Dow et al. (2018); Ferrari et al. (2017); Kneebone et al. (2016)
HAMA = Hamilton Anxiety Scale | .25–.47 | Ball et al. (2015); Gould et al. (2014)
RAID = Rating Anxiety in Dementia Scale | .61 | Creighton et al. (2018)
SAS = Self-Rating Anxiety Scale | .52 | Yan et al. (2014)
SRQ-20 = Self-Reporting Questionnaire | .74; .55 | Silva et al. (2016)
STAI = State-Trait Anxiety Inventory | .61–.69 | Cheung (2007); Massena et al. (2015); Matheson et al. (2012); Ribeiro et al. (2011)
STAI-S = State-Trait Anxiety Inventory – subscale state | .28–.80; .48–.50 | Byrne and Pachana (2011); Byrne et al. (2010); Ferrari et al. (2017); Massena et al. (2015); Pachana et al. (2007)
STAI-T = State-Trait Anxiety Inventory – subscale trait | .55; .53 | Ferrari et al. (2017); Massena et al. (2015)
Anxiety/Depression Measure | |
GHQ = General Health Questionnaire | .76 | Ribeiro et al. (2011)
Generalized Anxiety Disorder Measure | |
GAD-7 = Generalized Anxiety Disorder-7 | .86; .79 | Champagne et al. (2016)
GADQ-IV = Generalized Anxiety Disorder Questionnaire for … | .65 | Diefenbach et al. (2009)
GADSS = Generalized Anxiety Disorder Severity Scale | .84 | Diefenbach et al. (2009)
Intolerance to Uncertainty | |
IUI = Intolerance of Uncertainty Inventory | .62; .58 | Champagne et al. (2016)
Neuroticism Measure | |
NEO-N = NEO Five-Factor Inventory – neuroticism | .63 | Byrne et al. (2010)
Posttraumatic Stress Disorder Measure | |
PCL-C = Posttraumatic stress disorder checklist – civilian version | .56 | Gould et al. (2014)
[Table 3, continued (Depression Measure; Depression Scale): the measure names and coefficients for these rows are not recoverable. Studies listed: Byrne and Pachana (2011); Champagne et al. (2016); Diefenbach et al. (2009); Gerolimatos et al. (2013); Diefenbach et al. (2014); Johnco et al. (2015); Ribeiro et al. (2011); Ball et al. (2015); Kneebone et al. (2016).]

To our knowledge, only Ball et al. (2015) explicitly assessed sensitivity to treatment of the GAI in a clinical
controlled trial. They concluded that the GAI is a useful tool for monitoring the outcome of treatment.
According to the COSMIN checklist, responsiveness was rated as “poor” because no analyses were conducted between
the score on the GAI and the gold standard to demonstrate the good performance of the former. Although it
was not their primary aim, there are studies that support the sensitivity to change of the GAI in the treatment …

Problematic Issues
Cross-Cultural Adaptation. Simple translation of a questionnaire is insufficient if it is to be used with a
population …
Item | Two-factor structure (Bakkane Bendixen et al., 2016) | Four-factor structure (Diefenbach et al., 2014) | Three-factor structure | Three-factor structure | Three-factor structure (Guan, 2016)
1 | Worries | Excessive worry symptoms | Cognitive symptoms | Cognitive symptoms | Mental anxiety
2 | Worries | Decision-making symptoms | Cognitive symptoms | Cognitive symptoms | Mental anxiety
3 | Physical symptoms | NA | Cognitive symptoms | Cognitive symptoms | Mental anxiety
4 | Worries | CNS hyperarousal symptoms | Arousal symptoms | Arousal symptoms | Mental anxiety
5 | Worries | CNS hyperarousal symptoms | Cognitive symptoms | Cognitive symptoms | Mental anxiety
6 | Worries | Excessive worry symptoms | Arousal symptoms | Arousal symptoms | Mental anxiety
7 | Physical symptoms | Gastrointestinal symptoms | Somatic symptoms | Somatic symptoms | Mental anxiety
8 | Worries | Excessive worry symptoms | Cognitive symptoms | Cognitive symptoms | Physical anxiety
9 | Worries | Gastrointestinal symptoms | Cognitive symptoms | Cognitive symptoms | Mental anxiety
10 | Physical symptoms | CNS hyperarousal symptoms | Arousal symptoms | Arousal symptoms | Mental anxiety
11 | Worries | CNS hyperarousal symptoms | Cognitive symptoms | Cognitive symptoms | Mental anxiety
12 | Physical symptoms | Gastrointestinal symptoms | Somatic symptoms | Somatic symptoms | Physical anxiety
13 | Worries | Gastrointestinal symptoms | Arousal symptoms | Arousal symptoms | Mental anxiety
14 | Worries | Excessive worry symptoms | Cognitive symptoms | Cognitive symptoms | Negative anxiety
15 | Physical symptoms | Gastrointestinal symptoms | Somatic symptoms | Somatic symptoms | Negative anxiety
16 | Worries | Excessive worry symptoms | Cognitive symptoms | Cognitive symptoms | Negative anxiety
17 | Worries | Excessive worry symptoms | Cognitive symptoms | Cognitive symptoms | Negative anxiety
18 | Physical symptoms | Gastrointestinal symptoms | Somatic symptoms | Somatic symptoms | Physical anxiety
19 | Worries | Excessive worry symptoms | Cognitive symptoms | Cognitive symptoms | Negative anxiety
20 | Physical symptoms | CNS hyperarousal symptoms | Arousal symptoms | Arousal symptoms | Mental anxiety
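The item-by-factor assignments tabulated above can be used to check which GAI items fall in the same broad symptom category in every factor solution. The following Python sketch is illustrative only: the short codes, the function names, and the grouping of "physical symptoms", "gastrointestinal symptoms", and "physical anxiety" under the physical category are assumptions made for this example, and "NA" entries are ignored.

```python
# Assumed grouping of factor labels into broad categories (short codes are ours).
COGNITIVE = {"W", "EW", "DM", "COG", "MEN"}          # worries, excessive worry, decision-making, cognitive, mental
PHYSICAL = {"P", "CNS", "GI", "ARO", "SOM", "PHY"}   # physical, CNS hyperarousal, gastrointestinal, arousal, somatic

# One row per GAI item: the factor label in each of the five multi-factor solutions.
LOADINGS = {
    1: "W EW COG COG MEN",    2: "W DM COG COG MEN",
    3: "P NA COG COG MEN",    4: "W CNS ARO ARO MEN",
    5: "W CNS COG COG MEN",   6: "W EW ARO ARO MEN",
    7: "P GI SOM SOM MEN",    8: "W EW COG COG PHY",
    9: "W GI COG COG MEN",    10: "P CNS ARO ARO MEN",
    11: "W CNS COG COG MEN",  12: "P GI SOM SOM PHY",
    13: "W GI ARO ARO MEN",   14: "W EW COG COG NEG",
    15: "P GI SOM SOM NEG",   16: "W EW COG COG NEG",
    17: "W EW COG COG NEG",   18: "P GI SOM SOM PHY",
    19: "W EW COG COG NEG",   20: "P CNS ARO ARO MEN",
}


def category(label):
    """Map a factor code to a broad category; "NA" maps to None."""
    if label in COGNITIVE:
        return "cognitive"
    if label in PHYSICAL:
        return "physical"
    if label == "NEG":
        return "negative"
    return None


def consistent_items(loadings):
    """Return the items whose labels all share a single category."""
    result = {}
    for item, labels in loadings.items():
        categories = {category(code) for code in labels.split()} - {None}
        if len(categories) == 1:
            result[item] = categories.pop()
    return result


consistent = consistent_items(LOADINGS)
# {1: 'cognitive', 2: 'cognitive', 12: 'physical', 18: 'physical'}
```

Under these grouping assumptions, the result agrees with the observation in the text that only items 1 and 2 were always cognitive and only items 12 and 18 were always physical.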
… 7 (“I often feel like I have butterflies in my stomach”) and 18 (“I sometimes feel a great knot in my stomach”),
10 (“I often feel nervous”) and 13 (“I think of myself as a nervous person”), 6 (“Little things bother me a lot”) and
9 (“I can’t help worrying about even trivial things”), 1 (“I worry a lot of the time”) and 8 (“I think of myself as
a worrier”), and 16 (“I think that my worries interfere with my life”) and 17 (“My worry often overwhelms
me”). To a certain extent, the same phenomenon of redundancy was observed for items 8, 10, and 11 of
the GAI short version. Thus, the detected item overlap suggests that there may be redundant items in the GAI
and the GAI-SF that do not provide any additional information because of their similar content (Molde
et al., 2017).

Floor Effects. Some authors have made assumptions about the possible presence of floor effects in the GAI
and GAI-SF. Yan et al. (2014) and Johnco et al. (2014) hypothesized the presence of floor effects when they
observed that the GAI may be less suitable for elderly people with low-level anxiety as this would mean that
they would not endorse several items suggesting high-level anxiety and serious outcomes.

Discussion
Since their development, the GAI and, to a lesser extent, the GAI-SF, have undergone extensive psychometric
testing in a wide range of populations and countries. These tools have been the subject of different reviews
(Balsamo et al., 2018; Creighton et al., 2018; Dissanayaka et al., 2015; Edelstein et al., 2008; Lin
et al., 2016; Pachana & Byrne, 2012; Therrien & Hunsley, 2012). However, these reviews were mostly dedicated
to the GAI long form and examined it in specific populations or settings (e.g., Parkinson’s disease; residential
aged care facilities). Moreover, these reviews examined only a small sample of studies. The study that reviewed
the largest number of articles on the GAI included only 18 (Balsamo et al., 2018), a little more than half of
what our own search retrieved.

The goals of our study were to summarize existing evidence on the psychometric properties of the GAI
and GAI-SF, to assess the methodological quality of the studies, and to provide guidance for future psychometric
validation studies. For the current review, we identified 31 studies that purposely studied the psychometric
properties of these scales. As in reviews by other researchers (Balsamo et al., 2018; Creighton et al., 2018;
Dissanayaka et al., 2015; Edelstein et al., 2008; Lin et al., 2016; Pachana & Byrne, 2012; Therrien & Hunsley, 2012),
we generally found appropriate psychometric properties for the GAI and GAI-SF among various clinical and
non-clinical populations of older adults. However, we also mostly found low levels of methodological quality in the
studies retained for this review.

To our knowledge, this is the first systematic review to examine the methodological quality of research
conducted on the psychometric properties of the GAI and GAI-SF. The majority of the COSMIN boxes were rated
as “poor” or “fair” (83.5% of all boxes) for the different psychometric properties evaluated. The results of …
Table 5: Criterion validity and cut-off point of the GAI and GAI-SF
Note. ADIS-IV = Anxiety Disorder Interview Schedule for DSM-IV; AUC = area under the curve; GAD = generalized anxiety disorder; ICD-10 = International Classification of
Diseases, 10th Edition; MINI = Mini International Neuropsychiatric Interview; NOS = not otherwise specified; NPV = negative predictive value; PPV = positive predictive value;
studies with low methodological quality were not
ignored in this review. A “poor” or “fair” methodo-
anxiety), and a low correlation found with a measure of state anxiety supports this hypothesis. The higher retest reliability of the short form versus the long form found in Silva et al. (2016) also leads us to question whether the items of the short version refer to more stable anxiety

Inconsistency in divergent, factorial, and criterion validity was found across studies. Whereas some authors have concluded that moderate to high correlations with a measure of depressive symptoms are evidence of poor divergent validity, others have argued that it has not been well established that anxiety and depression are completely independent disorders in the elderly population considering the overlap of symptoms (Cassidy, Lauderdale, & Sheikh, 2005). The associations found are not specific to the GAI and GAI-SF, but are rather characteristic of other measures commonly used to assess anxiety in the elderly population (Therrien & Hunsley, 2012).

Further, conflicting results were found for the factorial structure of the GAI concerning the unidimensionality and multidimensionality of the scale. However, the fact that three of the four studies that presented the highest methodological quality (either “good” or “excellent”) for structural validity concluded that the scale is unidimensional leads us to support this result as well. Mababu and RuizSánchez (2016) suggested that the different factorial structures found in previous studies for the GAI could be due in part to the dichotomous response format. Among the possible impacts of a dichotomous scale are a decrease in the percentage of explained variance and lower loadings (Lozano, García-Cueto, & Muñiz, 2008; Velicer, DiClemente, & Corriveau, 1984). Molde et al. (2017) also proposed different explanations for the lack of factorial consistency: different cultural response styles, differences in semantics due to translation processes, different sample characteristics, and true cultural differences in the structure of anxiety across countries.

The unidimensionality of the GAI raises the question as to whether it reflects all manifestations of anxiety in a context where the GAI was designed to assess a range of anxiety presentations (Pachana, Byrne, et al., 2007). There is currently a consensus on the unidimensionality of the GAI-SF, which is not surprising for a 5-item scale. An obvious issue when designing the short form of an instrument is to ensure that the target content domain is still adequately represented despite the reduced number of items (Smith, McCarthy, & Anderson, 2000). This does not seem to be the case with the GAI-SF as it is composed largely of items that relate to cognitive symptoms. Evidence on criterion validity shows that the GAI and the GAI-SF can screen for probable cases of anxiety disorders. However, no specific cut-off score for the detection of an anxiety disorder can be established because of significant variability in the results.

Some psychometric properties of the GAI were sometimes found to be slightly better than those of the GAI-SF, but most authors concluded that the results were nevertheless comparable. Most studies that assessed the psychometric properties of the short form had extracted data from the GAI long form. Although we can only speculate about the consequences of this procedure, it is possible that the psychometric properties of the GAI-SF differ when administered independently because of context, primacy and recency, and warm-up effects. When validating a brief scale, it should be considered as a completely new measure and thus submitted to independent validation procedures (Smith et al., 2000). Until further data from an independent assessment of the short version become available, the findings of this review suggest that psychometric properties are not a major issue when choosing between the GAI and the GAI-SF. The choice of one instrument or the other depends on the user’s needs. For example, the short form may be the best option in specific situations such as when time is limited, when there is a demanding clinical context (e.g., acute geriatric settings), with elderly people who are easily fatigued or distracted, when multiple questionnaires are administered to patients, or when patients are frequently monitored.

Suggestions for Future Research

Future efforts to validate the GAI and GAI-SF should pay particular attention to the previously identified problems and aim for a higher degree of methodological quality. Because content validity is considered the most important psychometric property according to COSMIN and was hardly tested in previous research, more studies should address it. Researchers should not assume that the scales’ content is equivalent across cultures. Also, the ability of the GAI and GAI-SF to distinguish between anxiety and depression symptoms is limited. Therefore, it would be interesting to further examine this issue by going beyond standard correlational analyses (e.g., by using the heterotrait-monotrait ratio of correlations method, by comparing the answer profiles of depressed and non-depressed elderly individuals, or by identifying specific items that spark confusion as to the true nature of symptoms [i.e., related to depression or anxiety]).

The appropriateness of the GAI and GAI-SF for monitoring treatment change also requires further attention. Considering that the GAI and GAI-SF were developed to assess a range of anxiety disorders rather than a
